AWS Solutions Architect Exam  >  AWS Solutions Architect Notes  >  Associate Level  >  Case Studies: ElastiCache — Redis vs Memcached

Case Studies: ElastiCache — Redis vs Memcached

Case Study 1

A financial services company operates a trading platform that processes thousands of transactions per second. The application currently uses an on-premises Oracle database with significant read load during market hours. The Solutions Architect needs to implement a caching layer to reduce database read latency and improve response times. The application requires caching of complex data structures including sorted sets of stock prices, hash maps of user portfolios, and the ability to automatically expire cache entries after 15 minutes. The operations team has limited Redis expertise and wants to minimize cluster management overhead while ensuring automatic failover capabilities for high availability.

Which ElastiCache solution best meets these requirements?

  1. Deploy ElastiCache for Memcached with multiple nodes and implement client-side sharding to distribute the complex data structures across nodes
  2. Deploy ElastiCache for Redis in cluster mode disabled with Multi-AZ automatic failover enabled
  3. Deploy ElastiCache for Redis in cluster mode enabled with automatic partitioning across multiple shards
  4. Deploy ElastiCache for Memcached with Auto Discovery enabled and use ElastiCache parameter groups to configure TTL settings

Answer & Explanation

Correct Answer: 2 - Deploy ElastiCache for Redis in cluster mode disabled with Multi-AZ automatic failover enabled

Why this is correct: This solution addresses all requirements effectively. Redis supports complex data structures like sorted sets and hash maps natively, which Memcached does not. The cluster mode disabled configuration provides a simpler operational model with less management overhead while still offering Multi-AZ automatic failover for high availability. Redis natively supports TTL (time-to-live) for automatic expiration of cache entries. This configuration provides a single primary node with read replicas, which is operationally simpler for teams with limited Redis expertise while meeting all functional requirements.

Why the other options are wrong:

  • Option 1: Memcached does not support complex data structures like sorted sets or hash maps. It only supports simple key-value pairs with string values. Client-side sharding adds complexity and doesn't solve the fundamental limitation. Additionally, Memcached doesn't provide built-in automatic failover capabilities.
  • Option 3: While Redis cluster mode enabled does support all the data structure requirements, it introduces significantly more operational complexity with manual partitioning decisions and resharding considerations. This violates the constraint of minimizing management overhead for a team with limited Redis expertise. Cluster mode is better suited for scenarios requiring horizontal scaling beyond what a single node can handle.
  • Option 4: Memcached fundamentally cannot store complex data structures regardless of configuration. Auto Discovery helps clients automatically identify nodes but doesn't add data structure capabilities. TTL is supported in Memcached, but the lack of complex data type support makes this solution non-viable.

Key Insight: The requirement for complex data structures (sorted sets, hash maps) immediately eliminates all Memcached options. The deciding factor between Redis configurations is balancing high availability needs with operational simplicity: cluster mode disabled with Multi-AZ provides automatic failover without the operational complexity of managing shards and partitions.
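The Case 1 requirements map directly onto three Redis primitives. A minimal sketch, assuming a redis-py-style client (e.g. `redis.Redis(...)`); the key names and data are illustrative:

```python
# Sorted set of stock prices, hash of a user portfolio, 15-minute TTL.
# Works against any client exposing redis-py-style zadd/hset/expire.

TTL_SECONDS = 15 * 60  # requirement: cache entries expire after 15 minutes

def cache_stock_prices(r, symbol_prices):
    """Store stock prices in a sorted set, ordered by price (ZADD)."""
    r.zadd("stocks:prices", symbol_prices)  # e.g. {"AMZN": 178.5, "GOOG": 141.2}
    r.expire("stocks:prices", TTL_SECONDS)  # native per-key TTL

def cache_user_portfolio(r, user_id, holdings):
    """Store a user's portfolio as a Redis hash (HSET)."""
    key = f"portfolio:{user_id}"
    r.hset(key, mapping=holdings)           # e.g. {"AMZN": "10", "GOOG": "5"}
    r.expire(key, TTL_SECONDS)
```

Neither structure is expressible in Memcached without serializing the whole collection on every update, which is why the data-structure requirement alone rules options 1 and 4 out.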

Case Study 2

An e-commerce company runs a product catalog service that serves millions of product lookups per hour during peak shopping seasons. Each product record is a simple JSON object averaging 2 KB in size that changes infrequently. The application is distributed across multiple Availability Zones and uses a horizontally scaled fleet of EC2 instances. The CTO has mandated a caching solution that is the most cost-effective while supporting horizontal scaling to handle traffic spikes during Black Friday sales. The application architecture can be modified if needed. There is no requirement for data persistence, complex querying, or automatic failover since the cache is treated as disposable and can be repopulated from the database.

What is the MOST cost-effective caching solution for this scenario?

  1. Deploy ElastiCache for Redis with cluster mode enabled and use read replicas for horizontal read scaling
  2. Deploy ElastiCache for Memcached with multiple cache nodes and use Auto Discovery for dynamic node management
  3. Deploy ElastiCache for Redis in cluster mode disabled with a single large node instance type
  4. Deploy ElastiCache for Redis with cluster mode enabled and enable Multi-AZ for automatic failover

Answer & Explanation

Correct Answer: 2 - Deploy ElastiCache for Memcached with multiple cache nodes and use Auto Discovery for dynamic node management

Why this is correct: Memcached is the most cost-effective solution for this use case. The scenario requires only simple key-value caching of JSON strings with no need for complex data types, persistence, or automatic failover. Memcached's multi-threaded architecture uses every core on a node, delivering strong throughput per dollar for simple caching operations, and because the cache is treated as disposable there is no cost for replica nodes. The ability to horizontally scale by adding nodes matches the requirement for handling traffic spikes. Auto Discovery simplifies client configuration when adding or removing nodes. Since the data is disposable and can be repopulated, the lack of persistence and replication features in Memcached is not a concern.

Why the other options are wrong:

  • Option 1: Redis cluster mode with read replicas is more expensive than necessary. This configuration is designed for scenarios requiring advanced features like complex data types, persistence, or pub/sub functionality. The added cost of the replica nodes cannot be justified when the use case only requires simple key-value caching.
  • Option 3: A single large Redis node creates a potential bottleneck and doesn't provide the horizontal scaling capability required for traffic spikes. Redis's single-threaded command processing also extracts less throughput from a large node than multi-threaded Memcached does for simple key-value operations, and vertical scaling is generally more expensive than horizontal scaling with Memcached.
  • Option 4: This is the most expensive option presented. Cluster mode enabled with Multi-AZ failover provides capabilities (data durability, automatic failover, complex data structures) that the scenario explicitly does not require. This violates the cost-effectiveness constraint while providing unnecessary features.

Key Insight: When a scenario explicitly states that data persistence, complex data structures, and automatic failover are not required, and cost-effectiveness is the primary driver, Memcached is almost always the correct choice. Many candidates default to Redis because it has more features, but understanding when NOT to use a more feature-rich (and expensive) service is critical for cost-optimization questions.
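The access pattern in Case 2 is classic cache-aside: check the cache, fall through to the database on a miss, and repopulate. A minimal sketch assuming a pymemcache-style client (`get`/`set` with an `expire` argument); the key scheme and TTL are illustrative:

```python
import json

def get_product(cache, db_fetch, product_id, ttl=3600):
    """Return a product record, reading through the cache on a miss."""
    key = f"product:{product_id}"
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)                  # cache hit: skip the database
    record = db_fetch(product_id)               # cache miss: query the database
    cache.set(key, json.dumps(record), expire=ttl)
    return record
```

Because the cache is disposable, a lost node simply means more misses until the keys are repopulated, which is exactly the trade-off the CTO's cost mandate accepts.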

Case Study 3

A healthcare analytics company processes patient data and generates complex reports that aggregate information from multiple databases. The reporting queries execute stored procedures that produce result sets which are then cached for 24 hours. Different reports require different retention periods, and some reports must be available even if the caching layer experiences a failure. The development team wants to implement a caching strategy that supports automatic persistence to disk, the ability to set individual TTL values for different report types, and the capability to perform read operations against cached data even during primary node failure scenarios. Compliance requirements mandate that cached data must be encrypted at rest and in transit.

Which ElastiCache configuration meets all of these requirements? (Select TWO)

  1. Use ElastiCache for Redis with cluster mode disabled, enable automatic backups, enable Multi-AZ with automatic failover, and enable encryption at rest and in transit
  2. Use ElastiCache for Memcached with multiple nodes across Availability Zones and enable encryption in transit using TLS
  3. Use ElastiCache for Redis with cluster mode enabled, configure AOF (Append Only File) persistence, and enable at-rest and in-transit encryption
  4. Use ElastiCache for Memcached with Auto Discovery enabled and implement application-level encryption before storing data in the cache
  5. Use ElastiCache for Redis with cluster mode disabled, enable Redis persistence using RDB snapshots, enable Multi-AZ, and enable encryption at rest and in transit

Answer & Explanation

Correct Answers: 1 and 5 - Both Redis configurations with appropriate persistence and encryption settings

Why these are correct: Both options use ElastiCache for Redis, which is the only ElastiCache engine that supports data persistence to disk, encryption at rest, and automatic failover capabilities. Option 1 provides automatic backups and Multi-AZ failover, ensuring data availability during failures. Option 5 uses RDB snapshots for persistence and also includes Multi-AZ for read availability during failover. Both configurations support individual TTL values per cache key (a Redis feature) and meet the encryption requirements through native Redis encryption at rest and in transit. Both approaches ensure that read replicas can serve read requests during primary node unavailability.

Why the other options are wrong:

  • Option 2: Memcached does not support data persistence to disk, cannot perform automatic failover, and does not support encryption at rest. While it can use TLS for encryption in transit, the lack of persistence and failover capabilities means cached reports would be lost during node failures, violating the availability requirement.
  • Option 3: While Redis does support AOF persistence in self-managed Redis installations, ElastiCache for Redis does not expose AOF configuration. ElastiCache for Redis only supports RDB (snapshot-based) persistence and automatic backup mechanisms. This option references a feature not available in the managed service.
  • Option 4: Memcached lacks persistence capabilities entirely: data is stored only in memory and is lost when nodes restart or fail. Application-level encryption doesn't solve the fundamental persistence and failover requirements. Additionally, Memcached does not support encryption at rest, which is a compliance requirement.

Key Insight: This question tests understanding of what persistence features are actually available in ElastiCache for Redis versus self-managed Redis. Many candidates know Redis supports both RDB and AOF persistence but don't realize that ElastiCache only exposes RDB and automatic backups. The requirement for "read operations during failure" requires Multi-AZ with read replicas, which only Redis provides.
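The "different retention periods per report type" requirement in Case 3 comes down to per-key TTLs. One way to sketch it, assuming a redis-py-style client; the TTL table and key scheme are illustrative:

```python
# Per-report-type retention, set atomically with the value via SETEX.

REPORT_TTLS = {                      # seconds
    "daily_summary": 24 * 3600,      # the baseline 24-hour retention
    "hourly_metrics": 3600,
    "compliance_audit": 7 * 24 * 3600,
}

def cache_report(r, report_type, report_id, payload):
    """Cache a report result set with the TTL appropriate to its type."""
    ttl = REPORT_TTLS.get(report_type, 24 * 3600)
    # SETEX writes the value and its expiry in a single atomic command
    r.setex(f"report:{report_type}:{report_id}", ttl, payload)
```

Memcached also supports per-item expiry, but only Redis pairs it with the persistence, replication, and at-rest encryption the scenario mandates.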

Case Study 4

A gaming company has deployed a real-time leaderboard feature that tracks player scores across millions of active users. The leaderboard must display the top 100 players globally and update in real-time as players complete game levels. The current implementation queries a relational database using complex ORDER BY and LIMIT clauses, which is causing performance degradation under load. The development team wants to implement a caching solution that can natively maintain sorted rankings without requiring the application to perform sorting operations. The solution must support atomic increment operations for score updates and handle thousands of concurrent score updates per second with sub-millisecond latency.

Which solution best addresses these requirements?

  1. Implement ElastiCache for Memcached and store player scores as individual key-value pairs, then retrieve all scores and sort them in the application layer
  2. Implement ElastiCache for Redis and use the Sorted Set data structure with ZADD for score updates and ZREVRANGE to retrieve top players
  3. Implement ElastiCache for Redis using Hash data structures to store player data and use Redis Lua scripting to perform sorting operations
  4. Implement ElastiCache for Memcached with client-side consistent hashing and maintain separate cache keys for each ranking position

Answer & Explanation

Correct Answer: 2 - Implement ElastiCache for Redis and use the Sorted Set data structure with ZADD for score updates and ZREVRANGE to retrieve top players

Why this is correct: Redis Sorted Sets are purpose-built for exactly this use case. They maintain elements in sorted order by score automatically and efficiently. The ZADD command atomically adds or updates member scores, and ZREVRANGE retrieves the top N members in descending order with O(log(N)+M) time complexity. This eliminates the need for application-level sorting and provides sub-millisecond performance even with millions of entries. The sorted set automatically maintains ranking order, supports atomic increment operations via ZINCRBY, and handles concurrent updates efficiently. This is a native Redis feature designed specifically for leaderboard scenarios.

Why the other options are wrong:

  • Option 1: Memcached only supports simple key-value storage without any built-in data structures for maintaining sorted order. Retrieving all scores and sorting in the application layer defeats the purpose of caching for performance: it would require fetching potentially millions of records and performing expensive sorting operations in application memory, creating the same performance problem as the database query.
  • Option 3: While Redis Hashes can store player data and Lua scripting can perform operations, this approach is unnecessarily complex and less performant than using Sorted Sets, which are optimized specifically for this use case. Lua scripts for sorting would need to load data into memory and sort it, which is significantly slower than the native sorted set operations. This violates the requirement for sub-millisecond latency at scale.
  • Option 4: Memcached cannot maintain sorted order natively. Trying to maintain separate cache keys for each ranking position would require complex coordination logic to update rankings when scores change, leading to race conditions and consistency issues with thousands of concurrent updates. This approach is architecturally unsound for real-time leaderboards.

Key Insight: This question tests deep knowledge of Redis data structures. Candidates who only understand Redis as a "key-value store" may miss that Sorted Sets provide native, performant ranking capabilities. The scenario's emphasis on "natively maintain sorted rankings" and "atomic increment operations" points directly to Sorted Sets, one of Redis's most powerful features for gaming and ranking scenarios.
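The whole Case 4 leaderboard reduces to two commands. A sketch assuming a redis-py-style client; the key name is illustrative:

```python
# ZINCRBY atomically adjusts a player's score inside the sorted set;
# ZREVRANGE reads the top N in descending score order.

LEADERBOARD = "leaderboard:global"

def record_score(r, player_id, points):
    """Atomically add points to a player's score; returns the new score."""
    return r.zincrby(LEADERBOARD, points, player_id)

def top_players(r, n=100):
    """Top N players, highest score first, as (member, score) pairs."""
    return r.zrevrange(LEADERBOARD, 0, n - 1, withscores=True)
```

No application-side sorting, no read-modify-write race: the atomicity of ZINCRBY is what makes thousands of concurrent score updates safe.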

Case Study 5

A media streaming company has implemented an ElastiCache for Redis cluster to cache user preferences and viewing history. After deployment, the operations team notices that during peak evening hours, the application experiences intermittent connection timeouts to the Redis cluster, while during off-peak hours, performance is acceptable. CloudWatch metrics show that the CPU utilization on the Redis primary node reaches 85-90% during peak times, while network throughput remains well below limits. The cache hit ratio is consistently above 90%. The application performs a mix of read and write operations with a 70/30 read-to-write ratio. The current implementation uses a single-node Redis instance with no read replicas.

What is the MOST LIKELY cause of the connection timeouts, and what action should be taken to resolve it?

  1. The network bandwidth limit is being reached; upgrade to a larger node instance type with higher network capacity
  2. The Redis primary node is CPU-bound because Redis is single-threaded for command execution; add read replicas to offload read operations from the primary node
  3. The ElastiCache cluster is experiencing memory pressure causing evictions; increase the memory capacity by upgrading to a larger node type
  4. The application connection pool is exhausted; increase the number of connections allowed in the ElastiCache parameter group and scale the application servers

Answer & Explanation

Correct Answer: 2 - The Redis primary node is CPU-bound because Redis is single-threaded for command execution; add read replicas to offload read operations from the primary node

Why this is correct: The scenario indicates high CPU utilization (85-90%) during peak times while network throughput is well below limits, which points to a CPU bottleneck. Redis uses a single-threaded event loop for processing commands, meaning all operations are processed sequentially on one CPU core. When this core reaches capacity, commands queue up, leading to increased latency and timeouts. With a 70/30 read-to-write ratio and a single-node configuration, all read and write operations compete for the same processing thread. Adding read replicas allows the application to distribute read operations across multiple nodes, reducing the load on the primary node's CPU and improving overall throughput. The high cache hit ratio confirms that the cache is effective and memory is not the issue.

Why the other options are wrong:

  • Option 1: The scenario explicitly states that network throughput remains well below limits, ruling out network bandwidth as the bottleneck. The problem manifests as CPU exhaustion, not network saturation. Upgrading the instance type for more network capacity would not address the single-threaded CPU constraint.
  • Option 3: The high cache hit ratio (90%+) indicates that the cache is not experiencing significant eviction pressure. If memory pressure were causing evictions, the cache hit ratio would be declining during peak periods. The CPU metric, not memory exhaustion, correlates with the timeout issues.
  • Option 4: While connection pool exhaustion can cause timeouts, the correlation with high CPU utilization and the timing pattern (only during peak load) indicates a processing bottleneck rather than a connection limit issue. Additionally, ElastiCache has high default connection limits, and scaling application servers without addressing the Redis CPU bottleneck would only increase load on the already saturated primary node.

Key Insight: Understanding Redis's single-threaded architecture is critical for diagnosing performance issues. Many candidates might assume network or memory issues, but the combination of high CPU, adequate network capacity, and good cache hit ratio points specifically to CPU-bound command processing. This tests the ability to correlate metrics with architectural knowledge to identify root causes.
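The fix for Case 5 is routing reads away from the primary. A sketch of a read/write split, assuming redis-py-style clients; in ElastiCache the reader endpoint load-balances across replicas, and both endpoint names here are placeholders:

```python
# Writes go to the primary endpoint; reads go to the reader endpoint,
# relieving the single-threaded primary of ~70% of the traffic.

class SplitCache:
    def __init__(self, primary, reader):
        self.primary = primary   # e.g. client for the primary endpoint
        self.reader = reader     # e.g. client for the reader endpoint

    def set(self, key, value, ttl=None):
        if ttl:
            self.primary.setex(key, ttl, value)  # write with expiry
        else:
            self.primary.set(key, value)

    def get(self, key):
        # Reads served by replicas may be slightly stale (async replication)
        return self.reader.get(key)
```

The trade-off to note: replica reads are eventually consistent, which is acceptable for cached preferences and viewing history.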

Case Study 6

A logistics company is migrating its warehouse management system from on-premises infrastructure to AWS. The existing system uses Memcached on physical servers for caching inventory queries. The application code uses consistent hashing to distribute cache keys across multiple Memcached nodes and relies on this distribution pattern for performance. The company wants to minimize code changes during migration while improving operational efficiency. The application does not require data persistence, advanced data structures, or automatic failover, but the operations team wants to eliminate the need to manually reconfigure application servers when cache nodes are added or removed.

Which migration strategy minimizes code changes while meeting the operational requirements?

  1. Migrate to ElastiCache for Redis in cluster mode enabled and modify the application to use Redis client libraries with cluster-aware routing
  2. Migrate to ElastiCache for Memcached and implement Auto Discovery in the application by updating the client library to use the cluster configuration endpoint
  3. Migrate to ElastiCache for Redis in cluster mode disabled and update the application to use Redis connection string instead of Memcached protocol
  4. Migrate to ElastiCache for Memcached and configure Application Load Balancer to distribute requests across Memcached nodes using round-robin routing

Answer & Explanation

Correct Answer: 2 - Migrate to ElastiCache for Memcached and implement Auto Discovery in the application by updating the client library to use the cluster configuration endpoint

Why this is correct: This solution maintains compatibility with the existing Memcached protocol, minimizing code changes since the application already uses Memcached. ElastiCache Auto Discovery for Memcached allows the application to automatically discover cache nodes in the cluster without manual reconfiguration. The application points to a single configuration endpoint, and the ElastiCache-aware Memcached client library automatically maintains the list of available nodes. This replaces the manual consistent hashing configuration while preserving the distribution pattern behavior. The code changes are minimal (primarily updating the client library and changing the endpoint configuration), while the operational overhead of manually updating node lists is eliminated.

Why the other options are wrong:

  • Option 1: Migrating from Memcached to Redis requires significant code refactoring because Redis uses a different protocol, different client libraries, and different command syntax. Even though Redis cluster mode provides automatic sharding, the application code would need substantial changes to work with Redis clients and handle potential differences in behavior. This violates the constraint to minimize code changes.
  • Option 3: Similar to Option 1, switching from Memcached to Redis requires changing client libraries, connection handling, and potentially command syntax. Redis connection strings and protocols are incompatible with Memcached, requiring more extensive code modifications. Additionally, cluster mode disabled doesn't provide the same scaling characteristics as the multi-node Memcached setup being replaced.
  • Option 4: Application Load Balancer is designed for HTTP/HTTPS traffic, not for the Memcached protocol. ALB cannot effectively load balance Memcached connections, and this approach would break the consistent hashing distribution pattern that the application relies on for performance. This would also introduce unnecessary latency and complexity.

Key Insight: Migration scenarios often test whether candidates understand protocol compatibility and incremental modernization. When existing code uses Memcached and requirements don't demand Redis-specific features, staying with ElastiCache for Memcached with Auto Discovery provides the path of least resistance. The key phrase "minimize code changes" should immediately suggest maintaining protocol compatibility.
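Under the hood, an Auto Discovery client periodically sends `config get cluster` to the configuration endpoint and re-derives the node list from the response. A sketch of the parsing step, following the payload shape AWS documents (a version line, then space-separated `hostname|ip|port` entries); the sample data in the test is illustrative:

```python
def parse_cluster_config(payload):
    """Parse an Auto Discovery payload into a version and (host, ip, port) tuples."""
    lines = payload.strip().splitlines()
    version = int(lines[0])          # increments whenever the node set changes
    nodes = []
    for entry in lines[1].split():   # "host|ip|port host|ip|port ..."
        host, ip, port = entry.split("|")
        nodes.append((host, ip, int(port)))
    return version, nodes
```

This is exactly the work the ElastiCache-aware client library does for you, which is why switching to the configuration endpoint is nearly the only code change the migration needs.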

Case Study 7

A social media analytics platform processes sentiment analysis on millions of posts and stores aggregated metrics in a caching layer. The architecture team is evaluating ElastiCache options. The application needs to perform the following operations: increment counters for post likes atomically, maintain lists of the most recent 1,000 comments per post, implement a publish/subscribe mechanism to notify multiple analytics workers when new data is available, and set expiration times on cached metrics. The platform experiences variable traffic with unpredictable spikes, and the team wants a solution that can scale horizontally when needed without application downtime.

Which ElastiCache configuration best supports these requirements?

  1. ElastiCache for Memcached with multiple nodes to handle traffic spikes, using client-side logic to implement atomic counters and pub/sub functionality
  2. ElastiCache for Redis in cluster mode disabled with Multi-AZ, using Redis Lists for comments, INCR for atomic counters, and pub/sub channels
  3. ElastiCache for Redis in cluster mode enabled, using Redis data structures for lists and counters, with pub/sub for notifications and online scaling for horizontal growth
  4. ElastiCache for Memcached with Auto Discovery enabled and application-level implementation of list management and counter operations

Answer & Explanation

Correct Answer: 3 - ElastiCache for Redis in cluster mode enabled, using Redis data structures for lists and counters, with pub/sub for notifications and online scaling for horizontal growth

Why this is correct: This solution addresses all requirements comprehensively. Redis supports native atomic increment operations (INCR, INCRBY) without requiring application-level locking. Redis Lists can efficiently maintain ordered collections like recent comments with operations like LPUSH and LTRIM. Redis pub/sub provides built-in publish/subscribe messaging for notifying workers. Cluster mode enabled allows horizontal scaling across multiple shards, and Redis supports online resharding where nodes can be added to the cluster without application downtime. Redis also natively supports TTL for automatic expiration. This configuration provides all required functionality with built-in features rather than requiring application-level implementations.

Why the other options are wrong:

  • Option 1: Memcached does not provide atomic increment operations that are truly atomic under concurrent access from multiple clients without application-level locking mechanisms. It also has no built-in pub/sub functionality; this would need to be implemented externally using another service like SNS or SQS. Memcached also lacks native list data structures, requiring serialization of entire lists for updates, which is inefficient for maintaining "most recent 1,000 comments."
  • Option 2: While Redis in cluster mode disabled does provide all the required data structures and pub/sub functionality, it cannot scale horizontally across multiple shards. The single primary node (even with read replicas) has a vertical scaling limit and cannot distribute write operations across multiple nodes. With "unpredictable spikes" and the need to "scale horizontally," this configuration would hit capacity limits.
  • Option 4: Memcached lacks native support for complex data structures like lists, atomic operations, and pub/sub. Implementing these at the application level defeats the purpose of using a caching service and introduces complexity, potential race conditions, and performance overhead. Auto Discovery helps with node management but doesn't add missing functionality.

Key Insight: The combination of requirements (atomic operations, complex data structures such as lists, pub/sub, and horizontal scaling) points specifically to Redis cluster mode enabled. This question tests whether candidates recognize that certain architectural patterns require not just Redis, but specifically the clustered configuration. The phrase "scale horizontally without downtime" is the key differentiator from cluster mode disabled.
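The three Case 7 operations each map to a built-in command. A sketch assuming a redis-py-style client; key and channel names are illustrative:

```python
# INCR for atomic like counters, LPUSH + LTRIM for a capped recent-comments
# list, PUBLISH to notify analytics workers subscribed to a channel.

MAX_COMMENTS = 1000

def like_post(r, post_id):
    """Atomic like counter: safe under concurrent clients, no locking."""
    return r.incr(f"post:{post_id}:likes")

def add_comment(r, post_id, comment):
    """Prepend a comment, cap the list at 1,000, and notify workers."""
    key = f"post:{post_id}:comments"
    r.lpush(key, comment)                      # newest comment at the head
    r.ltrim(key, 0, MAX_COMMENTS - 1)          # keep only the most recent 1,000
    r.publish("analytics:new-data", post_id)   # fan out to subscribed workers
```

In cluster mode enabled, each key hashes to a shard, so these writes spread across primaries; one caveat worth knowing is that pub/sub messages in a cluster are broadcast cluster-wide, so subscribers can connect to any node.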

Case Study 8

A government agency is developing a citizen services portal that will cache sensitive personal information temporarily during user sessions. Compliance regulations require that all data be encrypted at rest and in transit, and that the caching infrastructure must support automated backup and recovery capabilities to prevent data loss. The application needs to cache user session data that includes nested JSON structures with user preferences and form data. Sessions must automatically expire after 30 minutes of inactivity. The security team has mandated that authentication to the cache must use IAM credentials rather than managing separate database passwords. The infrastructure must be deployed across multiple Availability Zones for resilience.

Which solution meets all security and compliance requirements?

  1. ElastiCache for Memcached with encryption in transit using TLS, deployed across multiple Availability Zones with Auto Discovery
  2. ElastiCache for Redis with encryption at rest and in transit, Redis AUTH enabled, Multi-AZ deployment, and automated daily backups
  3. ElastiCache for Redis with encryption at rest and in transit, IAM authentication enabled using Redis AUTH token, Multi-AZ deployment with automatic failover, and automated backups
  4. ElastiCache for Memcached with application-level encryption before caching, deployed in multiple AZs, with manual backup scripts to export cache data to S3

Answer & Explanation

Correct Answer: 3 - ElastiCache for Redis with encryption at rest and in transit, IAM authentication enabled using Redis AUTH token, Multi-AZ deployment with automatic failover, and automated backups

Why this is correct: This is the only option that satisfies all security and compliance requirements. ElastiCache for Redis supports IAM authentication: IAM policies control access to the cache, and the client presents a short-lived, IAM-generated token in place of a static Redis AUTH password, meeting the requirement to avoid managing separate passwords. Redis provides encryption at rest and in transit natively. Multi-AZ with automatic failover ensures resilience across Availability Zones. Automated backups provide the recovery capability required by compliance regulations. Redis can store nested JSON structures as strings and supports per-key TTL for the 30-minute session expiration requirement. This configuration addresses security (encryption, IAM authentication), compliance (backups), availability (Multi-AZ), and functional requirements (TTL, complex data).

Why the other options are wrong:

  • Option 1: Memcached does not support encryption at rest, which violates the compliance requirement. It also lacks automated backup capabilities: data is purely in-memory and lost on node failure. Memcached does not support IAM authentication. While it can encrypt data in transit with TLS, the missing encryption at rest and backup capabilities make this non-compliant.
  • Option 2: While this option includes most necessary features (encryption, Multi-AZ, backups), it only mentions Redis AUTH (password-based authentication) and does not include IAM authentication, which was explicitly mandated by the security team. Redis AUTH uses a static password that must be managed and rotated separately from IAM.
  • Option 4: Memcached does not support encryption at rest as a service feature. Application-level encryption addresses data protection but doesn't meet the compliance requirement for "infrastructure encryption at rest." More critically, Memcached has no backup capabilities; it's a pure in-memory cache with no persistence. Manual backup scripts cannot effectively export Memcached data since it's volatile and not designed for persistence. Memcached also lacks IAM authentication support.

Key Insight: Security and compliance questions often have multiple options that meet some requirements but fail on specific mandates. The key phrase "IAM credentials rather than managing separate passwords" eliminates options with only Redis AUTH. Understanding that Memcached lacks encryption at rest and backup capabilities immediately rules out those options. This tests detailed knowledge of security features across both ElastiCache engines.
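The connection shape for Case 8 is worth seeing once. A sketch assuming redis-py: with IAM authentication the client presents a short-lived SigV4-signed token where the Redis AUTH password would normally go. Generating that token (via botocore request signing) is omitted here, and the user name and endpoint are illustrative:

```python
def connection_kwargs(endpoint, cache_user, iam_token):
    """Build redis-py connection settings for a TLS + IAM-auth cluster."""
    return {
        "host": endpoint,
        "port": 6379,
        "ssl": True,              # encryption in transit is required
        "username": cache_user,   # ElastiCache user configured for IAM auth
        "password": iam_token,    # short-lived IAM token, not a static password
    }

# Hypothetical usage, with the token obtained from a SigV4 signer:
# client = redis.Redis(**connection_kwargs(primary_endpoint, "app-user", token))
```

Because the token is short-lived, the application refreshes it rather than rotating a stored password, which is precisely the operational benefit the security team's mandate is after.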

Case Study 9

A SaaS company provides a multi-tenant application where each tenant has isolated data requirements. The application currently stores session state and frequently accessed tenant configuration in an ElastiCache for Redis cluster mode disabled setup with a single primary node and two read replicas. As the customer base has grown to over 500 tenants, the operations team notices increasing write latency during business hours and observes that CloudWatch metrics show the primary node's write operations per second reaching the single-node throughput limit. Read operations are well-distributed across replicas and performing adequately. The architecture team wants to scale write capacity without changing the application's Redis client libraries or implementing application-level sharding logic.

What is the most operationally efficient solution to increase write throughput?

  1. Vertically scale the Redis primary node to a larger instance type with more CPU and memory capacity
  2. Enable Redis cluster mode by creating a new cluster mode enabled Redis cluster and migrate data, which will distribute write operations across multiple shards
  3. Add additional read replicas to the existing cluster to distribute write operations across more nodes
  4. Implement Redis Streams to buffer write operations and process them asynchronously across multiple consumer groups

Answer & Explanation

Correct Answer: 2 - Enable Redis cluster mode by creating a new cluster mode enabled Redis cluster and migrate data, which will distribute write operations across multiple shards

Why this is correct: The problem is a write throughput bottleneck on the primary node. Redis cluster mode enabled distributes data across multiple primary shards, with each shard handling its own write operations. This provides horizontal scaling of write capacity, which is exactly what's needed. While migration requires some planning and potentially involves a short cutover window, it doesn't require changing application client libraries (modern Redis clients support both cluster modes) and doesn't require implementing custom sharding logic at the application layer. Each shard in cluster mode can process writes independently, effectively multiplying write throughput by the number of shards. This is the architecturally correct solution for scaling beyond single-node write limits.

Why the other options are wrong:

  • Option 1: Vertical scaling provides limited improvement and is a temporary solution. Even larger instance types have write throughput limits because Redis uses a single thread for command processing on each primary node. While this might provide short-term relief, it doesn't address the fundamental architectural limitation and will eventually hit the same ceiling as tenants continue to grow. This is not operationally efficient in the long term.
  • Option 3: Read replicas only handle read operations in ElastiCache for Redis; they do not accept write operations. All writes must go to the primary node, which then replicates changes to read replicas. Adding more read replicas does nothing to address write throughput limitations. This reflects a fundamental misunderstanding of Redis replication architecture.
  • Option 4: Redis Streams is a data structure for log-style messaging, not a solution for scaling write throughput. While Streams can buffer messages, the writes to add items to the stream still occur on the primary node, so this doesn't solve the underlying write bottleneck. This adds complexity without addressing the core issue and would require significant application changes.

Key Insight: This question tests understanding of how to scale Redis write capacity. Many candidates know that read replicas help with reads but may incorrectly assume they also help with writes. The key constraint is "without implementing application-level sharding logic"; cluster mode provides automatic sharding. Understanding that single-node write throughput is a hard limit that can only be overcome by distributing writes across multiple primary nodes (cluster mode) is essential for designing scalable Redis architectures.
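Cluster mode's "automatic sharding" works because every key maps to one of 16,384 hash slots via CRC16, and a cluster-aware client routes each command to the shard that owns the slot, so no application-level sharding code is needed. A minimal sketch of the slot calculation (pure Python, no cluster required; the hash-tag rule shown lets related keys such as {tenant42}:session and {tenant42}:config land on the same shard, which matters for multi-key operations):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem variant), the checksum the Redis Cluster spec defines."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots.

    If the key contains a non-empty {tag}, only the tag is hashed,
    pinning related keys to the same shard (the "hash tag" rule).
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(key_slot("123456789"))  # 12739 (0x31C3, the spec's CRC16 test vector)
print(key_slot("{tenant42}:session") == key_slot("{tenant42}:config"))  # True
```

Because the client computes this routing transparently, adding shards raises aggregate write throughput without touching application logic, which is the property the correct answer relies on.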

Case Study 10

An online education platform uses ElastiCache to improve performance for their learning management system. They have implemented ElastiCache for Redis to cache course content, user progress data, and quiz results. The platform experiences a pattern where cache performance is excellent during normal operation, but every morning at 6 AM UTC when the cache is at its coldest (after overnight low traffic), the application experiences high database load and slow response times for the first 30-45 minutes until the cache warms up naturally. The database team has reported that this daily spike is causing database connection pool exhaustion. The development team wants to implement a solution that proactively warms the cache before peak morning traffic begins at 6:30 AM UTC without overloading the database.

Which approach most effectively addresses the cache warming requirement while protecting the database?

  1. Schedule an AWS Lambda function to run at 5:30 AM UTC that queries the most frequently accessed data from the database and loads it into Redis using batch operations with rate limiting
  2. Configure ElastiCache for Redis to restore from the previous day's automated backup snapshot at 5:45 AM, ensuring the cache contains recent data
  3. Increase the TTL values for all cached objects to 48 hours so that cache entries from the previous day remain valid during the morning cold start period
  4. Implement lazy loading in the application with a longer timeout grace period during the 6-6:45 AM window to allow the cache to warm gradually without impacting users

Answer & Explanation

Correct Answer: 1 - Schedule an AWS Lambda function to run at 5:30 AM UTC that queries the most frequently accessed data from the database and loads it into Redis using batch operations with rate limiting

Why this is correct: This solution proactively warms the cache before peak traffic begins, addressing the root cause. A Lambda function on a schedule can be triggered via EventBridge (CloudWatch Events) to run at 5:30 AM, giving 60 minutes for cache warming before peak traffic at 6:30 AM. By querying the most frequently accessed data (which could be determined from access patterns or a prioritized list), the function populates the cache with high-value items first. Rate limiting the database queries ensures the database isn't overwhelmed: queries can be spread over the 60-minute window with controlled concurrency. The Lambda function can batch Redis SET operations for efficiency. This approach allows fine-grained control over what gets cached and how quickly, protecting the database while ensuring the cache is warm when real users arrive.
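A minimal sketch of the warming loop such a Lambda might run. The names fetch_fn and store_fn are placeholders, not a specific client API: in practice fetch_fn would issue the database queries and store_fn would pipeline Redis SET commands; the pause between batches is the rate limit that protects the database connection pool.

```python
import time

def warm_cache(hot_keys, fetch_fn, store_fn, batch_size=50, pause_s=0.5):
    """Pre-populate the cache in small, paced batches.

    hot_keys   : keys to warm, highest-priority first
    fetch_fn   : callable(batch) -> {key: value}, reads from the database
    store_fn   : callable(mapping), writes into the cache
                 (e.g. a Redis pipeline of SETs with a TTL)
    pause_s    : sleep between batches, spreading DB load over the window

    Returns the number of keys loaded.
    """
    loaded = 0
    for i in range(0, len(hot_keys), batch_size):
        batch = hot_keys[i:i + batch_size]
        store_fn(fetch_fn(batch))  # one paced round-trip to DB, then to cache
        loaded += len(batch)
        if i + batch_size < len(hot_keys):
            time.sleep(pause_s)    # the rate limit protecting the database
    return loaded
```

An EventBridge schedule rule (for example, cron(30 5 * * ? *)) would invoke the Lambda handler, which calls warm_cache with the prioritized key list and batch/pause values tuned to fit inside the 60-minute window.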

Why the other options are wrong:

  • Option 2: Restoring from a backup snapshot requires creating a new Redis cluster from the snapshot, which involves significant time (often 10-30+ minutes depending on data size) and requires changing the application endpoint to point to the new cluster or performing a blue/green swap. This operational complexity is high, and the restored data would be from the previous day's backup time, potentially containing stale information. Additionally, this approach replaces the entire cluster rather than warming it, causing disruption.
  • Option 3: Simply increasing TTL to 48 hours doesn't solve the cold cache problem: if keys have expired or were evicted due to memory pressure, they won't exist regardless of TTL setting. More importantly, extending TTL to 48 hours may cache stale data that has changed in the database, leading to data consistency issues. This doesn't proactively warm the cache; it only delays expiration. If the cache is genuinely cold (empty or mostly empty), TTL changes have no effect.
  • Option 4: Lazy loading (cache-aside pattern) is reactive, not proactive: it only populates the cache when users request data. Extending the timeout window during 6-6:45 AM doesn't prevent the database from being hit by all those requests; it just makes users wait longer for slow responses. This doesn't solve the database connection pool exhaustion problem and doesn't prevent the poor user experience during the warm-up period.

Key Insight: Cache warming is a proactive strategy that requires scheduled pre-population before traffic arrives. This question tests understanding of practical caching patterns and operational scheduling. Many candidates might consider restoring from backup, but the operational complexity and timing issues make a controlled Lambda-based warming approach more practical and flexible. The phrase "without overloading the database" is key: rate-limited querying from Lambda provides the necessary control.
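For contrast, the lazy-loading (cache-aside) read path from option 4 can be sketched as follows. The cache_get, cache_set, and db_load names are placeholder callables, not a specific client API; the point is that every cold key still costs exactly one database read, which is why a purely reactive pattern cannot prevent the morning stampede.

```python
def get_with_cache_aside(key, cache_get, cache_set, db_load, ttl_s=900):
    """Classic cache-aside: check the cache first, fall back to the
    database on a miss, then write the value back with a TTL.

    Reactive by design: the database is hit once per cold key, so a
    cold cache at 6 AM translates directly into a database load spike.
    """
    value = cache_get(key)
    if value is None:                  # cache miss
        value = db_load(key)           # the DB read that warming would avoid
        cache_set(key, value, ttl_s)   # e.g. Redis SET key value EX 900
    return value
```

Pre-warming and cache-aside are complementary: the scheduled warmer fills the cache with high-value keys before 6:30 AM, and cache-aside remains the steady-state read path for everything else.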

The document Case Studies: ElastiCache — Redis vs Memcached is a part of the AWS Solutions Architect Course AWS Solutions Architect: Associate Level.