Cheat Sheet: Kinesis — Streaming Data

1. Kinesis Overview

1.1 Service Family

2. Kinesis Data Streams

2.1 Core Concepts

2.2 Capacity and Limits

2.3 Capacity Modes

2.4 Resharding Operations

  • Shard Splitting: Divide one shard into two to increase capacity
  • Shard Merging: Combine two shards into one to reduce capacity and cost
  • UpdateShardCount API: Scales the stream to a target shard count in one call (uniform scaling); it is invoked explicitly, not triggered automatically
  • Cannot split into more than two shards or merge more than two shards in single operation
  • Parent shard moves to the CLOSED state after a split or merge (no new writes) and stays readable until its data passes the retention period
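
The split and merge rules above act on each shard's hash key range. The following is a minimal sketch, assuming a shard is modelled as an inclusive (start, end) pair of 128-bit hash keys. `split_range` and `merge_ranges` are hypothetical helpers for illustration, not AWS API calls, and splitting at the midpoint is only one possible choice of split point.

```python
MAX_HASH_KEY = 2**128 - 1  # Kinesis hash keys span 0 .. 2^128 - 1


def split_range(start, end):
    """Split one inclusive hash key range into two child ranges at the midpoint."""
    if start >= end:
        raise ValueError("range is too small to split")
    mid = (start + end) // 2
    return (start, mid), (mid + 1, end)


def merge_ranges(a, b):
    """Merge two adjacent inclusive ranges into a single range."""
    first, second = sorted([a, b])
    if first[1] + 1 != second[0]:
        raise ValueError("only adjacent ranges can be merged")
    return (first[0], second[1])


parent = (0, MAX_HASH_KEY)
left, right = split_range(*parent)   # one shard becomes two
restored = merge_ranges(left, right)  # two adjacent shards become one
```

Each operation produces or consumes exactly two ranges, which mirrors the rule that a single operation cannot split into, or merge, more than two shards.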

2.5 Consumers

2.5.1 Shared Consumer (Standard)

  • Pull model using GetRecords API
  • 2 MB/sec per shard shared across all consumers
  • 5 GetRecords API calls per second per shard
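  • Roughly 200 ms average propagation latency with a single consumer; latency grows as more consumers poll the same shard
  • Lower cost option
  • Practical ceiling of about 5 consumers per shard, because they share the 5 GetRecords calls per second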

2.5.2 Enhanced Fan-Out Consumer

  • Push model using SubscribeToShard API with HTTP/2
  • 2 MB/sec per shard per consumer
  • Roughly 70 ms average propagation latency
  • Higher cost (consumer-hour charges)
  • Supports up to 20 consumers per stream
  • Use when multiple consumers need dedicated throughput
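
The difference between the two consumer types comes down to simple arithmetic on the per-shard limits listed above. This is a minimal sketch using those figures; the function names are hypothetical and the model ignores real-world effects such as uneven polling.

```python
READ_MB_PER_SEC_PER_SHARD = 2.0  # read throughput per shard
GET_RECORDS_CALLS_PER_SEC = 5    # shared-consumer call limit per shard


def shared_throughput_per_consumer(num_consumers):
    """Shared consumers divide the 2 MB/sec of a shard between them."""
    return READ_MB_PER_SEC_PER_SHARD / num_consumers


def efo_throughput_per_consumer(num_consumers):
    """Each enhanced fan-out consumer gets its own 2 MB/sec per shard."""
    return READ_MB_PER_SEC_PER_SHARD


def min_poll_interval_ms(num_consumers):
    """Shortest polling interval each shared consumer can sustain
    without the group exceeding 5 GetRecords calls per second."""
    return 1000 * num_consumers / GET_RECORDS_CALLS_PER_SEC
```

For example, with four shared consumers each one gets 0.5 MB/sec per shard and can poll at most every 800 ms, while four enhanced fan-out consumers each keep the full 2 MB/sec.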

2.6 Producer Options

2.7 Data Ordering

  • Records with same partition key go to same shard
  • Order guaranteed within a shard
  • To maintain strict ordering, use single shard or consistent partition key
  • No ordering guarantee across shards; a consumer that needs a cross-shard order must sort records itself, for example by timestamp
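
The "same partition key, same shard" rule follows from how a key is hashed: Kinesis applies MD5 to the partition key and places the record in the shard whose hash key range contains the result. The sketch below illustrates that mapping locally. The helper names and the evenly divided ranges are assumptions for illustration, and interpreting the digest as a big-endian integer is also an assumption.

```python
import hashlib


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(num_shards):
    """Divide the 128-bit hash key space into equal inclusive ranges."""
    size = 2**128 // num_shards
    ranges = {}
    for i in range(num_shards):
        start = i * size
        end = 2**128 - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges[f"shardId-{i:012d}"] = (start, end)
    return ranges


def shard_for(partition_key, shard_ranges):
    """Return the ID of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for shard_id, (start, end) in shard_ranges.items():
        if start <= h <= end:
            return shard_id
    raise LookupError("hash key is not covered by any shard range")


ranges = even_ranges(4)
first = shard_for("user-42", ranges)
second = shard_for("user-42", ranges)  # always the same shard as `first`
```

Because the mapping is deterministic, all records for one key stay in order within one shard, while different keys may land on different shards with no order between them.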

2.8 Security

  • Encryption at rest: KMS encryption with customer or AWS managed keys
  • Encryption in transit: HTTPS endpoints
  • Access control: IAM policies for producer and consumer permissions
  • VPC endpoints: Private connectivity via PrivateLink (Interface VPC Endpoint)
  • CloudWatch: Monitor metrics (IncomingBytes, IncomingRecords, WriteProvisionedThroughputExceeded)
  • CloudTrail: API call logging

2.9 Error Handling

3. Kinesis Data Firehose

3.1 Core Features

  • Fully managed; no administration required
  • Automatic scaling
  • Near real-time delivery: records are buffered and delivered once a time or size threshold is reached (commonly cited minimums: 60 seconds or 1 MB)
  • Serverless data transformation using Lambda
  • Supports compression (GZIP, ZIP, Snappy for S3; GZIP only for Redshift)
  • Pay only for the volume of data ingested

3.2 Destinations

3.3 Buffer Configuration

  • Firehose delivers when either buffer size or interval is reached first
  • Smaller buffers = lower latency but higher cost per record
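
The "whichever is reached first" rule can be modelled with a small buffer that flushes on size or age. This is a simplified local simulation, not Firehose code; the `Buffer` class and its thresholds are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class Buffer:
    """Collects records and flushes when size OR age crosses its threshold."""

    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None  # time the first record of the batch arrived

    def add(self, record, now):
        """Add a record; return the flushed batch, or None if still buffering."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        return self._maybe_flush(now)

    def tick(self, now):
        """Check the age threshold when no new record has arrived."""
        return self._maybe_flush(now)

    def _maybe_flush(self, now):
        if not self.records:
            return None
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        if too_big or too_old:
            batch = self.records
            self.records, self.size, self.opened_at = [], 0, None
            return batch
        return None
```

A busy source tends to hit the size threshold first, while a quiet source waits for the interval, which is why smaller thresholds lower latency at the price of more, smaller deliveries.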

3.4 Data Transformation

  • Invoke Lambda function to transform records before delivery
  • Lambda receives batches up to 3 MB or 500 records
  • Lambda timeout: maximum 5 minutes for Firehose invocation
  • Lambda response payload limit: 6 MB per invocation (the synchronous invocation limit), so the transformed batch must fit within it
  • Can enable source record backup to S3 before transformation
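
A transformation function receives a batch of base64-encoded records and must return one result per `recordId`, marked `Ok`, `Dropped`, or `ProcessingFailed`. The sketch below shows that contract under assumptions: the payloads are JSON objects, and the `level` and `processed` fields are hypothetical examples of filter and enrichment logic.

```python
import base64
import json


def _decode(record):
    """Decode one Firehose record into a dict, or raise ValueError."""
    payload = json.loads(base64.b64decode(record["data"]))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        try:
            payload = _decode(record)
        except ValueError:
            # Undecodable input: hand the original data back as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        if payload.get("level") == "DEBUG":  # hypothetical filter rule
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue

        payload["processed"] = True  # hypothetical enrichment
        line = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(line).decode("ascii")})
    return {"records": output}
```

Appending a newline to each transformed record is a common choice so that objects delivered to S3 are line-delimited, but it is optional.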

3.5 Data Sources

  • Direct PUT using SDK or Kinesis Agent
  • Kinesis Data Streams (use Firehose as consumer)
  • CloudWatch Logs
  • CloudWatch Events (now Amazon EventBridge)
  • AWS IoT

3.6 Data Firehose vs Data Streams

4. Kinesis Data Analytics

4.1 Overview

  • Process and analyze streaming data in real-time
  • Two runtime options: SQL or Apache Flink (Java, Scala, Python)
  • Serverless; automatic scaling
  • Pay for resources consumed (KPU - Kinesis Processing Unit)

4.2 SQL Applications

  • Use standard SQL to query streaming data
  • Sources: Kinesis Data Streams or Kinesis Data Firehose
  • Reference data from S3 for enrichment
  • Destinations: Kinesis Data Streams, Kinesis Data Firehose, Lambda
  • Built-in templates for common patterns (anomaly detection, top-K items)

4.3 Apache Flink Applications

  • Advanced processing with Java, Scala, or Python
  • Supports event time processing and windowing
  • Stateful processing with automatic checkpointing
  • Sources: Kinesis Data Streams, MSK (Managed Kafka), custom sources
  • Sinks: Kinesis Data Streams, Kinesis Data Firehose, S3, custom sinks
  • Use for complex transformations, machine learning inference, custom business logic
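
Event-time windowing, mentioned above, groups records by the time they occurred rather than the time they arrived. The following plain-Python sketch illustrates a tumbling (fixed, non-overlapping) window count. It is a conceptual model only and does not use the Flink API; the function name and the (event_time, key) tuple format are assumptions.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) using each event's own timestamp.

    `events` is an iterable of (event_time_seconds, key) tuples and may
    arrive out of order; the window is chosen from the event time alone.
    """
    counts = defaultdict(int)
    for event_time, key in events:
        window_start = (event_time // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


clicks = [(0, "a"), (59, "a"), (60, "a"), (30, "b")]
per_minute = tumbling_window_counts(clicks, window_seconds=60)
```

A real stream processor also has to decide when a window is complete (for example with watermarks) and keep its state across failures, which is where checkpointing comes in.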

4.4 Use Cases

  • Real-time dashboards and metrics
  • Anomaly detection
  • Filtering and transformation before storage
  • Time-series analytics
  • Click stream analysis
  • IoT data processing

5. Kinesis Video Streams

5.1 Core Features

  • Ingest video from millions of devices
  • Durably stores, encrypts, and indexes video data
  • Access video through APIs for playback and processing
  • Retention: 1 hour to 10 years
  • Not for data analytics use cases

5.2 Integration

  • Producers: Cameras, IoT devices, smartphones using Kinesis Video Streams SDK
  • Consumers: Amazon Rekognition Video for ML analysis, custom applications
  • Playback: HLS, DASH streaming protocols
  • Integration with SageMaker for custom ML model training

6. Architecture Patterns

6.1 Real-Time Analytics Pipeline

  • Producers → Kinesis Data Streams → Kinesis Data Analytics → Kinesis Data Firehose → S3/Redshift
  • Use for real-time aggregation, filtering, and analysis before storage

6.2 Lambda Processing

  • Kinesis Data Streams → Lambda → DynamoDB/S3/SNS
  • Lambda batch size: 1 to 10,000 records (default 100)
  • Lambda batch window: 0 to 300 seconds
  • Lambda processes records in order per shard
  • On error, Lambda retries entire batch until success or data expires
  • Enable BisectBatchOnFunctionError on the event source mapping to split a failing batch in two and isolate bad records
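
The idea behind bisecting a failed batch can be shown with a small recursive model: retry each half separately until the failing records stand alone. This is a simplified illustration of the concept, not Lambda's actual retry implementation; `process_with_bisect` and the sample handler are hypothetical.

```python
def process_with_bisect(batch, handler):
    """Run `handler` on the batch; on failure, split it in half and retry.

    Returns the records that still fail when processed on their own.
    """
    try:
        handler(batch)
        return []
    except Exception:
        if len(batch) == 1:
            return list(batch)  # isolated a bad record
        mid = len(batch) // 2
        return (process_with_bisect(batch[:mid], handler)
                + process_with_bisect(batch[mid:], handler))


def reject_negatives(batch):
    """Hypothetical handler: fails the whole batch if any value is negative."""
    for value in batch:
        if value < 0:
            raise ValueError(f"bad record: {value}")


bad = process_with_bisect([1, 2, -3, 4, 5, -6, 7], reject_negatives)
```

Note that good records in a failing batch are handed to the handler more than once, so processing should be idempotent.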

6.3 Simple Data Ingestion

  • Producers → Kinesis Data Firehose → S3 (with optional Lambda transformation)
  • Use when no real-time processing required, only storage

6.4 Multi-Consumer Fan-Out

  • Single Kinesis Data Stream with multiple enhanced fan-out consumers
  • Each consumer gets dedicated 2 MB/sec throughput per shard
  • Use for real-time dashboards, archival, and analytics from same stream

6.5 Log Aggregation

  • CloudWatch Logs → Subscription Filter → Kinesis Data Streams/Firehose → S3/OpenSearch Service (formerly Elasticsearch Service)
  • Kinesis Agent on EC2 instances → Kinesis Data Streams
  • Use for centralized log analysis and monitoring

7. Monitoring and Troubleshooting

7.1 Key CloudWatch Metrics (Data Streams)

7.2 Key CloudWatch Metrics (Data Firehose)

7.3 Best Practices

  • Use partition keys with high cardinality to distribute load evenly
  • Enable enhanced monitoring for detailed shard-level metrics
  • Implement exponential backoff for retries on throttling
  • Use enhanced fan-out for multiple consumers requiring low latency
  • Monitor IteratorAgeMilliseconds to detect processing delays
  • Enable server-side encryption for sensitive data
  • Use VPC endpoints for private connectivity
  • Set appropriate retention based on replay requirements
  • Use KPL for high-throughput producers to benefit from batching
  • Configure CloudWatch alarms for throughput exceptions and iterator age
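
The exponential backoff recommendation above can be sketched as a small retry wrapper. This is a minimal example under assumptions: `ThrottledError` is a hypothetical stand-in for a throttling exception such as a provisioned-throughput error, and the delays and attempt count are illustrative defaults rather than recommended values.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error from the service."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call `operation`, retrying on ThrottledError with capped, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": wait a random time up to the current ceiling so
            # that many throttled clients do not all retry at the same moment.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies.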

8. Cost Optimization

8.1 Data Streams Pricing

  • Provisioned: Shard Hour + PUT Payload Unit (25 KB increments)
  • On-Demand: Per GB data written/read + data retention beyond 24 hours
  • Extended retention: Additional charge per shard hour for retention beyond 24 hours
  • Enhanced fan-out: Consumer-shard hour + data retrieval per GB
  • Switch to on-demand for unpredictable workloads
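
Because provisioned-mode PUT charges are counted in 25 KB payload units, record size affects cost in steps. The sketch below counts units only and deliberately leaves out prices. It assumes 25 KB means 25 × 1024 bytes and that every record consumes at least one unit; the function names are hypothetical.

```python
import math

PAYLOAD_UNIT_BYTES = 25 * 1024  # assumption: 25 KB = 25 * 1024 bytes


def put_payload_units(record_bytes):
    """Number of PUT payload units one record consumes (rounded up, minimum 1)."""
    return max(1, math.ceil(record_bytes / PAYLOAD_UNIT_BYTES))


def total_units(record_sizes):
    """Total PUT payload units for a list of record sizes in bytes."""
    return sum(put_payload_units(size) for size in record_sizes)


# 1,000 separate 1 KB records versus the same data packed into 25 KB records.
unaggregated = total_units([1024] * 1000)
aggregated = total_units([25 * 1024] * 40)
```

Here the same 1,000 KB of data costs 1,000 units as small records but 40 units when packed into full 25 KB records, which is the saving that KPL aggregation aims for.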

8.2 Data Firehose Pricing

  • Per GB ingested (pricing scales down at higher volumes)
  • Data format conversion charged per GB
  • VPC delivery charged per hour and per GB
  • No charge for data retention (no retention capability)

8.3 Optimization Tips

  • Right-size shard count based on actual throughput needs
  • Use Data Firehose instead of Data Streams if real-time access not required
  • Aggregate small records using KPL to reduce PUT costs
  • Reduce retention period if replay not needed
  • Use standard consumers if 200 ms latency acceptable
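
Right-sizing a provisioned stream amounts to taking the largest of three per-shard requirements. This sketch assumes the standard provisioned limits of 1 MB/sec or 1,000 records/sec written per shard and 2 MB/sec read per shard (shared consumers); `required_shards` is a hypothetical helper and the inputs should be peak figures, not averages.

```python
import math

WRITE_MB_PER_SEC_PER_SHARD = 1.0      # assumed provisioned write limit
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000  # assumed provisioned record limit
READ_MB_PER_SEC_PER_SHARD = 2.0       # shared-consumer read limit


def required_shards(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC_PER_SHARD),
    )
```

For example, a peak of 3.5 MB/sec in, 2,000 records/sec, and 6 MB/sec out needs 4 shards, driven by the write bandwidth. The estimate assumes partition keys spread load evenly; a hot key can still throttle a single shard.
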
The document Cheat Sheet: Kinesis — Streaming Data is a part of the AWS Solutions Architect Course AWS Solutions Architect: Associate Level.