
Case Studies: Data Transfer

Data Transfer - Domain 4: Migration & Modernization - Case Studies

Case Study 1

A multinational pharmaceutical company is migrating 450 TB of clinical trial data from an on-premises data center in Frankfurt to Amazon S3 in the eu-central-1 region. The data consists of high-resolution medical imaging files, genomic sequencing data, and associated metadata. The company's current internet connection provides 1 Gbps bandwidth, but the connection is shared with 2,000 employees and cannot be saturated for more than 4 hours per day during off-peak hours. Regulatory compliance requires that all data must be encrypted both in transit and at rest, and the company must maintain a detailed chain of custody log showing exactly when each file left the premises and arrived in AWS. The migration must be completed within 45 days, and the IT team has a maximum of 5 staff members who can dedicate 20% of their time to managing the migration process.

Which migration approach will meet these requirements with the least operational complexity?

  1. Order multiple AWS Snowball Edge Storage Optimized devices, configure them with S3-compatible endpoints, copy data in parallel batches, ship devices back to AWS, and use AWS Snowball job reports combined with S3 inventory reports for chain of custody documentation
  2. Establish an AWS Direct Connect connection with a 10 Gbps port, implement AWS DataSync with bandwidth throttling to use only during off-peak hours, enable CloudWatch Logs for transfer tracking, and configure S3 bucket versioning for compliance
  3. Deploy AWS Storage Gateway in File Gateway mode on-premises, mount it as an NFS share, copy files during off-peak hours using robocopy scripts with bandwidth limiting, and use CloudTrail logs combined with S3 server access logs for audit trails
  4. Use AWS Transfer Family with SFTP endpoints, write custom scripts to upload files in parallel during maintenance windows with bandwidth throttling, and implement AWS Config rules to track object uploads with detailed timestamps

Answer & Explanation

Correct Answer: 1 - Order multiple AWS Snowball Edge Storage Optimized devices with detailed job tracking

Why this is correct: With 450 TB of data and only 4 hours per day available at 1 Gbps (shared bandwidth), a network-based transfer would take roughly 250 days, well over eight months even under ideal conditions, far exceeding the 45-day requirement. AWS Snowball Edge Storage Optimized devices hold up to 80 TB of usable capacity each, so 6 devices would be sufficient. The devices provide built-in encryption (256-bit encryption keys managed through AWS KMS), support parallel data loading, and generate comprehensive job completion reports that document when data was loaded and when AWS received and imported the data. The physical chain of custody is tracked through device shipping manifests and AWS import job logs. With minimal staff time required (primarily for initial device setup and data copying), this approach provides the lowest operational complexity while meeting all constraints.

Why the other options are wrong:

  • Option 2: AWS Direct Connect requires 2-4 weeks minimum for circuit provisioning and often longer for cross-connects and BGP configuration, consuming a significant portion of the 45-day window. Even at 10 Gbps, a 4-hour daily window moves roughly 18 TB per day, so 450 TB would need about 25 full transfer days, leaving almost no margin once provisioning time is accounted for. Additionally, Direct Connect implementation requires substantial staff expertise and ongoing management, violating the operational complexity constraint with only 5 staff at 20% time allocation.
  • Option 3: File Gateway caches data locally before asynchronously uploading to S3, which introduces unpredictability in transfer timing and makes it extremely difficult to complete 450 TB within 45 days given the bandwidth constraints. The cache management, bandwidth throttling configuration, and monitoring would require significant ongoing staff intervention. Furthermore, the chain of custody documentation would be fragmented across CloudTrail (API calls), S3 server access logs (object creation), and File Gateway CloudWatch metrics, requiring complex correlation and custom reporting that increases operational burden.
  • Option 4: AWS Transfer Family is designed for ongoing file transfer workflows and user access management, not bulk one-time migrations of this scale. Custom scripting for 450 TB of parallel uploads with bandwidth management, error handling, retry logic, and progress tracking would require substantial development and testing effort. The operational overhead of managing SFTP endpoints, monitoring transfer progress across thousands of files, handling failures, and correlating AWS Config rules with upload timestamps significantly exceeds what 5 staff members at 20% time can reasonably manage within 45 days.

Key Insight: The critical discriminator here is the mathematical impossibility of network-based transfers given the bandwidth and time constraints (4 hours/day × 45 days = 180 hours at 1 Gbps, shared, ≈ 81 TB maximum theoretical throughput). Candidates often gravitate toward DataSync or Direct Connect for large migrations, but fail to calculate whether the timeline and bandwidth constraints make these approaches viable. Snowball becomes the only practical option when transfer time calculations exceed project deadlines.
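The arithmetic behind this insight is easy to verify. A quick sketch in decimal units, ignoring protocol overhead and the fact that the link is shared (so the real ceiling is even lower):

```python
import math

# Back-of-the-envelope check for Case Study 1 (decimal units, no protocol overhead).
TB_PER_HOUR_AT_1GBPS = 1e9 / 8 * 3600 / 1e12    # 1 Gbps ≈ 0.45 TB/hour

hours_available = 4 * 45                         # 4 h/day for 45 days = 180 hours
max_network_tb = hours_available * TB_PER_HOUR_AT_1GBPS
print(f"Network ceiling: {max_network_tb:.0f} TB")       # ≈ 81 TB vs 450 TB needed

days_needed = 450 / (4 * TB_PER_HOUR_AT_1GBPS)   # days at 4 h/day on the 1 Gbps link
print(f"Days needed over the wire: {days_needed:.0f}")   # ≈ 250 days

devices = math.ceil(450 / 80)    # Snowball Edge Storage Optimized, 80 TB usable each
print(f"Snowball devices: {devices}")                    # 6
```

Running this kind of feasibility check first is usually what separates the correct answer from the plausible-sounding network options.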

Case Study 2

A global financial services firm operates a hybrid architecture with a primary data center in Singapore and AWS infrastructure in the ap-southeast-1 region. The company runs a risk calculation engine on-premises that generates 800 GB of analytical output files daily, which must be transferred to AWS within 2 hours of generation for downstream processing by AWS Lambda functions and Amazon Athena queries. The existing VPN connection frequently experiences latency spikes during market hours, causing transfer delays. Security policy mandates that all financial data must traverse private network connections without exposure to the public internet, and all transfers must be encrypted using customer-managed keys. The company's compliance team requires detailed network flow logs for all data movement. The IT operations team needs a solution that automatically retries failed transfers and provides transfer acceleration without manual intervention.

Which architecture will meet these requirements while providing the most reliable and lowest-latency data transfer?

  1. Establish AWS Direct Connect with a private virtual interface to a VPC, deploy AWS DataSync agents on-premises pointing to S3 through VPC endpoints, configure DataSync tasks with customer-managed KMS keys, enable VPC Flow Logs, and use DataSync's built-in bandwidth optimization and automatic retry mechanisms
  2. Configure AWS Site-to-Site VPN with accelerated VPN endpoints, deploy AWS Transfer Family SFTP server with VPC-only access, write scripts to upload files using SFTP with customer-managed encryption, enable VPC Flow Logs and CloudWatch monitoring, and implement custom retry logic in the upload scripts
  3. Deploy AWS Storage Gateway Volume Gateway in cached mode, mount volumes on the on-premises risk calculation servers, write output directly to volumes, enable private connectivity through Direct Connect, configure SSE-KMS with customer-managed keys, and use CloudWatch Logs for monitoring
  4. Set up AWS PrivateLink endpoints for S3 in the VPC, install AWS CLI on application servers, create scripts using S3 multi-part upload with customer-managed KMS keys, route traffic through Direct Connect, enable CloudTrail and VPC Flow Logs, and implement exponential backoff retry logic in scripts

Answer & Explanation

Correct Answer: 1 - Direct Connect with DataSync agents using VPC endpoints and customer-managed KMS keys

Why this is correct: This solution addresses all constraints optimally. AWS Direct Connect provides the dedicated, private network connection required by security policy with predictable low latency. AWS DataSync is purpose-built for automated, scheduled data transfers between on-premises and AWS, offering built-in bandwidth optimization, network resilience, data integrity verification, automatic retry mechanisms, and comprehensive transfer logging. By deploying DataSync agents on-premises and configuring them to communicate with AWS DataSync service endpoints through a VPC private interface, all traffic remains on the private Direct Connect connection. DataSync natively supports SSE-KMS encryption with customer-managed keys for data at rest in S3. VPC Flow Logs capture all network flows for compliance requirements. DataSync automatically handles the 800 GB daily transfer within the 2-hour window through parallel transfer optimization, and its built-in retry logic eliminates the need for manual intervention during transient failures.
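As a sanity check on the 2-hour window, the sustained throughput DataSync must achieve over the Direct Connect link works out to under 1 Gbps (decimal units, ignoring protocol overhead):

```python
# Minimum sustained rate to move 800 GB within the 2-hour window.
gb = 800
window_s = 2 * 3600
required_gbps = gb * 8 / window_s             # GB -> gigabits, spread over 7200 s
print(f"Required: {required_gbps:.2f} Gbps")  # ≈ 0.89 Gbps sustained
```

That comfortably fits typical Direct Connect port sizes, which is why the bottleneck in this scenario is reliability and automation, not raw bandwidth.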

Why the other options are wrong:

  • Option 2: While accelerated VPN can improve performance, it still relies on internet-based connectivity, which violates the explicit requirement that financial data must not traverse the public internet. AWS Transfer Family SFTP is designed for user-based file access patterns rather than automated bulk transfers from applications, and custom scripting for 800 GB daily transfers with proper error handling, bandwidth management, and monitoring would create significant operational overhead. The lack of built-in transfer optimization means meeting the 2-hour window would be unreliable during peak periods.
  • Option 3: Volume Gateway in cached mode is designed for block storage workloads with low-latency local access to frequently accessed data, not for one-way bulk file transfers. The risk calculation engine generates output files (file/object workload), not block-level storage I/O. Volume Gateway would introduce unnecessary complexity by presenting block volumes that would need mounting, formatting, and file system management. Additionally, the asynchronous upload behavior of cached mode makes it difficult to guarantee that 800 GB will transfer to AWS within the required 2-hour window, as upload timing is managed by the gateway's internal algorithms rather than explicit scheduling.
  • Option 4: While this approach uses private connectivity and supports customer-managed KMS keys, implementing reliable automated transfers using AWS CLI scripts requires substantial custom development. The team would need to build comprehensive error handling, implement proper multi-part upload logic for large files, create scheduling mechanisms, develop monitoring and alerting, implement exponential backoff correctly, handle partial upload failures, and ensure idempotency. This creates significant operational overhead and maintenance burden. AWS CLI is not optimized for bulk transfer performance the way DataSync is, potentially making the 2-hour window difficult to achieve consistently for 800 GB daily volumes.

Key Insight: The combination of "2-hour transfer window," "automatic retry," and "private connectivity" requirements specifically points to DataSync over custom scripting solutions. Many candidates choose custom CLI-based approaches because they seem more controllable, but fail to appreciate that DataSync was engineered specifically for this use case with performance optimizations, parallel transfers, and resilience features that would take months to replicate in custom code.

Case Study 3

A media production company needs to transfer 2.5 PB of raw 8K video footage from its production studio in Los Angeles to AWS for archival and machine learning-based content analysis. The footage consists of approximately 180,000 files ranging from 5 GB to 45 GB each. The company has a 10 Gbps internet connection but uses 6 Gbps for ongoing production operations. The footage is currently stored on a high-performance NAS array that will be decommissioned once the migration is complete. The company wants to minimize data transfer costs while completing the migration within 90 days. The engineering team has identified that roughly 40% of the files will need immediate access for ML processing, while the remaining 60% can be archived with infrequent access. AWS has recently announced the availability of Snowball Edge devices in the company's region.

Which approach will complete the migration within the timeline while minimizing total costs? (Select TWO)

  1. Order 32 AWS Snowball Edge Storage Optimized devices (80 TB usable each), load data in parallel across multiple devices using the company's local network, classify files during the copy process, ship devices to AWS for import directly into S3 Intelligent-Tiering storage class
  2. Use the available 4 Gbps of internet bandwidth to transfer data using AWS DataSync with network throttling, configure DataSync to write 40% of files to S3 Standard and 60% to S3 Glacier Flexible Retrieval based on file metadata, spread the transfer over 90 days to maximize bandwidth utilization
  3. Order AWS Snowmobile service to handle the 2.5 PB transfer, coordinate the 6-week on-site data loading period, use the Snowmobile's high-speed network interfaces to transfer data directly from the NAS array, and specify S3 Intelligent-Tiering as the destination storage class
  4. Order multiple AWS Snowball Edge devices in rolling batches, copy data to devices while applying file classification tags, configure the import jobs to use S3 Lifecycle policies that immediately transition 60% of objects to S3 Glacier Deep Archive based on tags, and place the remaining 40% in S3 Standard for ML processing
  5. Establish an AWS Direct Connect connection with a dedicated 10 Gbps port, use AWS DataSync with full bandwidth allocation during nights and weekends, implement task filtering to route files to appropriate S3 storage classes during transfer, and leverage DataSync's data validation features

Answer & Explanation

Correct Answer: 1 and 4

Why these are correct: Option 1 provides the most cost-effective approach by using Snowball Edge devices, which carry a flat per-device job fee covering a 10-day on-site use period (approximately $300 per device), with additional days billed separately. With 32 devices at 80 TB usable capacity each, the company can transfer the entire 2.5 PB dataset. The ability to load multiple devices in parallel dramatically reduces calendar time compared to network transfers. Importing directly into S3 Intelligent-Tiering allows AWS to automatically optimize storage costs between access tiers without the company needing to pre-classify files, and Snowball data transfer into S3 incurs no data transfer charges. Option 4 complements this approach by using targeted storage class placement with Lifecycle policies, which is more cost-effective for the known 60/40 split. Tagging files during the Snowball loading process and using immediate Lifecycle transitions to Glacier Deep Archive for archival content minimizes storage costs from day one. The rolling batch approach allows the team to manage device logistics more effectively and begin processing the first batches of data while later batches are still being loaded and shipped.
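The device count and rough device cost follow directly from the numbers in the scenario (the ~$300/device job fee quoted above is the only pricing assumption, and it assumes each device is returned within its included use period):

```python
import math

# Case Study 3 sizing: 2.5 PB across 80 TB (usable) Snowball Edge devices.
dataset_tb = 2500
devices = math.ceil(dataset_tb / 80)
print(f"Devices needed: {devices}")          # 32

# Rough device-fee estimate under the assumptions stated above.
print(f"Device fees: ~${devices * 300:,}")   # ~$9,600
```

Even with shipping and handling added, this is a small fraction of what sustained multi-month network capacity would cost.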

Why the other options are wrong:

  • Option 2: Transferring 2.5 PB over a 4 Gbps connection would take approximately 58 days of continuous 24/7 transfer at theoretical maximum throughput, leaving minimal margin for real-world overhead, retries, or network variations within the 90-day window. While AWS does not charge for data transfer IN to S3 from the internet, the company would face significant bandwidth costs from their ISP for sustained 4 Gbps utilization over two months, and for multi-petabyte datasets those sustained network costs far exceed Snowball's fixed per-device pricing.
  • Option 3: AWS Snowmobile is designed for exabyte-scale transfers (10 PB minimum typically) and involves significantly higher costs due to specialized security, dedicated AWS personnel on-site, and complex logistics. At 2.5 PB, Snowmobile is substantially more expensive than Snowball Edge devices. Additionally, Snowmobile availability is limited and requires extensive planning and coordination with AWS, often taking 2-3 months just for scheduling and preparation, potentially exceeding the 90-day project window before data loading even begins.
  • Option 5: Direct Connect provisioning requires 2-4 weeks minimum, and ordering a dedicated 10 Gbps port incurs port hour charges (approximately $2.25/hour or $1,620/month) plus data transfer out charges from the Direct Connect location. Even with 10 Gbps dedicated bandwidth, transferring 2.5 PB would take approximately 23 days of continuous transfer, but realistically nights and weekends only would extend this significantly, risking timeline compliance. The combination of Direct Connect port charges, potential data transfer charges, and the ongoing monthly commitment makes this approach considerably more expensive than Snowball for a one-time migration of this scale.

Key Insight: The critical calculation candidates must perform is comparing the total cost of Snowball devices (fixed per-device pricing regardless of data volume, no data transfer charges) versus network-based transfers (sustained bandwidth costs, potential AWS data transfer charges, infrastructure costs like Direct Connect). For multi-petabyte one-time migrations, Snowball becomes dramatically more cost-effective than any network option. The second discriminator is recognizing that S3 Intelligent-Tiering versus immediate Lifecycle transitions to Glacier Deep Archive both serve the use case, making both approaches viable when combined with Snowball.
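The continuous-transfer figures cited in Options 2 and 5 can be reproduced in a few lines (decimal units, theoretical maximum throughput):

```python
# 24/7 transfer time for 2.5 PB at the line rates discussed in this case study.
PB_BYTES = 2.5e15
days = {gbps: PB_BYTES / (gbps * 1e9 / 8) / 86400 for gbps in (4, 10)}
for gbps, d in days.items():
    print(f"{gbps} Gbps -> {d:.0f} days of round-the-clock transfer")
# 4 Gbps -> ~58 days; 10 Gbps -> ~23 days
```

Restricting the 10 Gbps case to nights and weekends roughly triples the calendar time, which is why neither network option leaves safe margin inside 90 days.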

Case Study 4

A healthcare research institution runs genomic sequencing workloads that generate approximately 15 TB of data per day across 40 research labs in a campus setting. Each lab has dedicated file servers that store sequencing output files, and researchers need to analyze this data using Amazon EMR clusters and SageMaker notebooks in AWS. The institution has an existing 5 Gbps AWS Direct Connect connection to the us-east-1 region. Recently, researchers have complained about data transfer delays, and monitoring shows the Direct Connect connection is consistently saturated at 92-98% utilization during business hours. The institution's cloud architect has been asked to improve transfer performance without replacing the existing Direct Connect circuit. The architecture must maintain end-to-end encryption for HIPAA compliance, minimize changes to the existing 40 file servers, and provide researchers with visibility into transfer progress and estimated completion times.

What is the MOST operationally efficient solution to improve transfer performance while meeting all requirements?

  1. Deploy AWS DataSync agents on each of the 40 file servers, configure DataSync tasks with bandwidth limiting to use no more than 80% of available Direct Connect capacity, enable DataSync task execution scheduling to run during off-peak hours, and provide researchers access to DataSync CloudWatch metrics dashboards for transfer visibility
  2. Implement AWS Storage Gateway File Gateway on a centralized VM in the campus data center, configure all 40 file servers to copy files to File Gateway NFS exports using scheduled rsync jobs, enable bandwidth throttling on File Gateway to prevent Direct Connect saturation, and use CloudWatch dashboards for monitoring
  3. Deploy AWS DataSync agents as containerized services on a centralized high-performance server in the campus data center, configure the 40 file servers to copy files to a staging area using local network transfers, create DataSync tasks that transfer from the staging area to S3 using task scheduling and bandwidth optimization, and grant researchers access to DataSync console for progress monitoring
  4. Order AWS Snowball Edge devices on a recurring weekly schedule, have each lab copy their files to Snowball devices throughout the week, ship devices at the end of each week for import to S3, and provide researchers with S3 inventory reports and S3 Event Notifications for data availability tracking

Answer & Explanation

Correct Answer: 3 - Centralized DataSync agent with staging area and bandwidth optimization

Why this is correct: This approach addresses the core problem (Direct Connect saturation) while minimizing operational complexity. By consolidating transfer operations through a centralized DataSync agent, the institution gains a single control point for bandwidth management, scheduling, and monitoring rather than managing 40 distributed agents. The local campus network (typically 10 Gbps or faster between buildings) allows the 40 file servers to quickly copy files to the staging area without competing for WAN bandwidth. DataSync's built-in bandwidth optimization and task scheduling features allow intelligent use of the 5 Gbps Direct Connect connection during off-peak hours or at controlled rates during business hours. DataSync provides native TLS encryption in transit and integrates with AWS KMS for encryption at rest, satisfying HIPAA requirements. The DataSync console provides detailed per-task progress metrics, estimated completion times, and throughput statistics that researchers can access. Importantly, this requires minimal changes to the file servers: they simply copy to a local staging area using existing tools, rather than deploying and managing agents on 40 separate systems.
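The centralized task described above maps naturally onto a single DataSync task with a bandwidth cap and a schedule. A minimal sketch of the parameters one might pass to boto3's `datasync.create_task` follows; the ARNs, the cron expression, and the throttle value are placeholders chosen for illustration, not values from the scenario:

```python
# Hypothetical DataSync task: staging area -> S3, throttled and scheduled.
task_params = {
    "SourceLocationArn": "arn:aws:datasync:us-east-1:111122223333:location/loc-src",
    "DestinationLocationArn": "arn:aws:datasync:us-east-1:111122223333:location/loc-s3",
    "Name": "campus-staging-to-s3",
    "Options": {
        "VerifyMode": "ONLY_FILES_TRANSFERRED",   # integrity check on transferred data
        "BytesPerSecond": 250_000_000,            # ~2 Gbps cap, leaving DX headroom
    },
    "Schedule": {"ScheduleExpression": "cron(0 1 * * ? *)"},  # nightly off-peak run
}
# In a real deployment:
#   boto3.client("datasync").create_task(**task_params)
print(task_params["Options"]["BytesPerSecond"] * 8 / 1e9, "Gbps cap")
```

One cap on one task is the whole point: the equivalent control in Option 1 would require keeping 40 per-agent limits consistent.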

Why the other options are wrong:

  • Option 1: Deploying and managing 40 separate DataSync agents creates significant operational overhead: each agent requires installation, configuration, ongoing patching, monitoring, and troubleshooting. Configuring bandwidth limits on 40 independent agents is complex and risks misconfiguration where aggregate bandwidth consumption still saturates the Direct Connect link. Each agent would require its own activation, AWS credentials or IAM roles, and separate CloudWatch monitoring. Coordinating task schedules across 40 agents to prevent simultaneous execution is operationally complex. This violates the "operationally efficient" and "minimize changes to existing file servers" constraints since each of the 40 servers requires agent installation and ongoing maintenance.
  • Option 2: File Gateway is designed for providing ongoing, low-latency file access with cloud backing, not for batch transfer optimization. The asynchronous upload behavior of File Gateway makes it difficult to provide researchers with accurate transfer progress and completion time estimates, because files are cached locally and uploaded based on the gateway's internal algorithms. Configuring 40 file servers to reliably copy to File Gateway NFS exports requires changing backup or data management processes on all servers. File Gateway's bandwidth throttling is less sophisticated than DataSync's transfer optimization and doesn't provide task-level scheduling granularity. The caching architecture means researchers can't reliably know when files are actually available in AWS versus just cached locally.
  • Option 4: Snowball Edge devices are designed for offline bulk transfers or edge computing, not for continuous daily data flow. Managing weekly Snowball rotations across 40 labs creates massive operational overhead: device ordering, receiving, distributing to labs, coordinating returns, tracking which data is on which device, and managing import jobs. Each device cycle takes several days (shipping to AWS, import processing, data availability), meaning researchers would face 5-7 day delays before data is available for analysis, which is unacceptable for time-sensitive research. This approach also leaves the existing Direct Connect connection unused for data transfer, wasting existing infrastructure investment. The logistical complexity of 52 Snowball shipment cycles per year across 40 labs is operationally expensive.

Key Insight: The trap here is the instinct to deploy DataSync agents on each source server (Option 1) because it seems direct and simple. However, the operationally efficient solution recognizes that centralizing transfer operations through a staging area reduces management complexity from N agents to one agent, while leveraging fast local campus networking. Professional architects must evaluate not just whether a solution works technically, but whether the operational overhead of managing distributed components is justified.

Case Study 5

A logistics company operates a fleet management system with IoT sensors on 25,000 trucks across North America. Each truck generates approximately 2 MB of telemetry data per hour (GPS coordinates, engine diagnostics, fuel consumption, driver behavior metrics), resulting in roughly 1.2 TB of data daily. The data is collected at 15 regional processing centers where edge servers aggregate and compress the telemetry before forwarding to AWS for analytics. The company currently uses internet-based transfers through VPN tunnels, but has experienced data loss during network interruptions, and the IT team spends considerable time troubleshooting failed transfers and manually reprocessing lost data. The data must be transferred to Amazon S3 where it feeds real-time dashboards in Amazon QuickSight and batch analytics in Amazon Redshift. Company policy requires data to be available in AWS within 15 minutes of collection at regional centers. The CIO has mandated a solution that provides guaranteed data delivery with automatic error recovery and requires minimal ongoing operational intervention.

Which solution will meet these requirements with the highest data reliability and lowest operational burden?

  1. Deploy AWS DataSync agents on the 15 regional edge servers, configure DataSync tasks to run every 15 minutes with automatic retry and verification enabled, use DataSync's built-in task scheduling and CloudWatch integration for monitoring, and leverage DataSync's data integrity verification to prevent data loss
  2. Install Amazon Kinesis Data Firehose delivery stream agents on the edge servers, configure Firehose to batch data every 5 minutes with automatic retry, enable S3 backup for failed records, use Firehose's automatic scaling and managed service model, and implement CloudWatch alarms for delivery monitoring
  3. Implement AWS IoT Core on each edge server to publish telemetry data using MQTT protocol, configure IoT Core rules to route data directly to S3, enable dead letter queues for failed messages, use IoT Core's built-in retry mechanisms, and leverage AWS IoT Device Defender for monitoring
  4. Deploy AWS Storage Gateway File Gateway at each regional center, configure edge servers to write telemetry files to File Gateway NFS exports, enable File Gateway's automatic upload to S3 with bandwidth optimization, use CloudWatch metrics for monitoring, and rely on File Gateway's local cache for resilience during network interruptions

Answer & Explanation

Correct Answer: 2 - Amazon Kinesis Data Firehose with automatic retry and backup for failed records

Why this is correct: Amazon Kinesis Data Firehose is purpose-built for reliably delivering streaming data to AWS services with minimal operational overhead. As a fully managed service, Firehose automatically scales to handle the throughput (1.2 TB daily ≈ 14 MB/sec average, well within Firehose's capabilities), requires no server provisioning or capacity management, and handles all infrastructure maintenance. Firehose's automatic retry mechanism persists data for up to 24 hours while retrying failed deliveries, and the ability to configure S3 backup for failed records ensures zero data loss. The 5-minute batching interval comfortably meets the 15-minute availability requirement. Firehose can compress data before delivery to S3 (reducing storage costs), supports data transformation via Lambda if needed, and integrates with CloudWatch for monitoring without requiring custom instrumentation. The fully managed nature means the IT team eliminates troubleshooting of failed transfers and manual reprocessing; Firehose handles resilience automatically. Each regional center can have its own Firehose delivery stream, with simple agent installation and minimal configuration.
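The throughput claim above checks out with simple arithmetic (decimal units):

```python
# Average ingest rate implied by the fleet: 25,000 trucks * 2 MB/hour ≈ 1.2 TB/day.
daily_tb = 1.2
avg_mb_s = daily_tb * 1e6 / 86400             # TB/day -> MB/s
print(f"Fleet average: {avg_mb_s:.1f} MB/s")  # ≈ 13.9 MB/s
print(f"Per regional center: {avg_mb_s / 15:.2f} MB/s")  # ≈ 0.93 MB/s each
```

Under 1 MB/s per delivery stream is a trivially small load for a managed streaming service, which reinforces that the real problem here is delivery guarantees, not capacity.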

Why the other options are wrong:

  • Option 1: While DataSync provides excellent reliability for file-based transfers, it is optimized for larger files and batch transfer workloads rather than continuous streaming of small telemetry records. Running DataSync tasks every 15 minutes means the edge servers must continuously buffer incoming telemetry data and write it to files for DataSync to transfer. This creates operational complexity around file management, buffering, and ensuring files are closed and ready for transfer each cycle. DataSync agents require ongoing maintenance (updates, monitoring, troubleshooting) across 15 regional locations. DataSync is also designed for larger file sets and would be inefficient for the relatively small continuous data flow compared to Firehose's streaming architecture.
  • Option 3: AWS IoT Core is designed for direct device-to-cloud connectivity using MQTT or HTTPS protocols from individual IoT devices. In this scenario, telemetry is already being aggregated at regional processing centers by edge servers, not coming directly from 25,000 individual trucks. Implementing IoT Core would require significant architectural changes to bypass the existing regional aggregation infrastructure or would involve using IoT Core in an unconventional manner where edge servers act as MQTT publishers. IoT Core pricing is based on messages and connectivity time, which could be more expensive than Firehose for this aggregated data volume. Additionally, configuring IoT Core rules across 15 regional deployments and managing MQTT client implementations adds operational complexity compared to Firehose's simple delivery stream configuration.
  • Option 4: File Gateway introduces unnecessary complexity for this use case. The edge servers would need to continuously write small telemetry records to NFS exports, which is inefficient compared to streaming delivery. File Gateway's asynchronous upload behavior makes it difficult to guarantee data availability in AWS within 15 minutes, because the gateway decides when to upload cached data based on internal algorithms, not explicit timing requirements. While File Gateway provides local cache resilience during network interruptions, it doesn't offer the same level of guaranteed delivery and automatic retry sophistication as Firehose. Managing 15 File Gateway deployments (VM resources, cache sizing, monitoring, updates) creates ongoing operational burden that violates the "minimal ongoing operational intervention" requirement.

Key Insight: The discriminator is recognizing that this is a streaming data delivery problem, not a file transfer problem. Candidates often default to DataSync for any data transfer scenario, but DataSync is optimized for files and scheduled batch transfers. Firehose is engineered specifically for continuously delivering streaming data with guaranteed delivery, automatic scaling, and zero operational overhead. The "data loss during network interruptions" and "minimal ongoing operational intervention" constraints specifically favor a fully managed streaming solution over file-based batch transfer mechanisms.

Case Study 6

An automotive manufacturer operates a global supply chain with manufacturing facilities in Germany, Japan, South Korea, and Mexico. The company runs SAP ERP systems at each facility, generating daily extract files containing parts inventory, production schedules, and quality control data. These files range from 50 GB to 200 GB per facility and must be consolidated into a centralized AWS data lake in the us-east-1 region for global analytics using Amazon Redshift and Amazon QuickSight. Currently, each facility uses FTP over the public internet to transfer files to AWS, but transfers from Japan and South Korea frequently fail due to network instability, requiring manual re-transmission. The company's security team has mandated that all data transfers must use private connectivity or encrypted channels with certificate-based authentication. The facilities are planning to implement AWS Direct Connect connections to their nearest AWS regions within the next 6 months, but the consolidated analytics requirement has been given executive priority and cannot wait for Direct Connect provisioning.

What is the MOST secure interim solution that provides reliable data transfer while Direct Connect connections are being established?

  1. Deploy AWS Transfer Family SFTP servers in each facility's nearest AWS region (eu-central-1, ap-northeast-1, ap-northeast-2, us-east-2), configure certificate-based authentication, implement S3 cross-region replication to consolidate files into us-east-1, enable VPC endpoints for private connectivity within AWS, and use SFTP over the internet with TLS encryption from facilities to their regional Transfer Family endpoints
  2. Set up Site-to-Site VPN connections between each facility and the us-east-1 region, deploy AWS DataSync agents at each facility, configure DataSync to transfer files through the VPN tunnels with task-level encryption using AWS KMS, enable DataSync task verification for transfer reliability, and implement CloudWatch alarms for monitoring transfer failures
  3. Deploy AWS Storage Gateway Volume Gateway at each facility in stored volume mode, configure SAP systems to write extract files to mounted volumes, establish VPN connections to the nearest AWS regions, enable automatic snapshots to Amazon S3, implement cross-region snapshot copy to us-east-1, and use VPC endpoints for private AWS connectivity
  4. Implement AWS PrivateLink connections from each facility to Amazon S3 using VPN tunnels to the nearest AWS regions, install AWS CLI on the SAP application servers, create automated scripts using S3 multipart upload with SSE-KMS encryption and certificate-based authentication, implement exponential backoff retry logic, and use S3 cross-region replication to us-east-1

Answer & Explanation

Correct Answer: 2 - Site-to-Site VPN with DataSync agents and task-level encryption

Why this is correct: This solution best balances security, reliability, and time-to-implementation while Direct Connect is being provisioned. AWS Site-to-Site VPN provides IPsec-encrypted tunnels that satisfy the requirement for encrypted channels, can be deployed within days (vs. months for Direct Connect), and creates private connectivity between facilities and AWS. AWS DataSync provides enterprise-grade transfer capabilities specifically designed to handle network instability: its built-in retry logic, verification, and resume capability address the frequent transfer failures from Japan and South Korea. DataSync's task-level encryption with AWS KMS provides an additional layer of security beyond the VPN encryption. The automatic verification feature ensures data integrity and eliminates the manual re-transmission burden. DataSync supports certificate-based authentication through TLS, meeting the security requirement. CloudWatch integration provides operational visibility. While VPN bandwidth may be lower than the future Direct Connect bandwidth, it is sufficient for daily 50-200 GB transfers per facility (approximately 6-24 hours per facility at typical VPN throughput), and as an interim 6-month solution this is acceptable given the executive priority to start analytics immediately.
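The 6-24 hour window quoted above is easy to sanity-check. The sketch below assumes roughly 20 Mbps of sustained effective throughput for an intercontinental IPsec tunnel; that figure is an illustration chosen to reproduce the quoted range, not an AWS-published number.

```python
def transfer_hours(gigabytes: float, mbps: float) -> float:
    """Hours to move `gigabytes` at a sustained `mbps` rate (decimal units)."""
    megabits = gigabytes * 8 * 1000
    return megabits / mbps / 3600

# Daily extract sizes from the scenario, at an assumed ~20 Mbps effective VPN rate
for size_gb in (50, 200):
    print(f"{size_gb} GB at 20 Mbps ≈ {transfer_hours(size_gb, 20):.1f} h")
```

This yields roughly 5.6 hours for 50 GB and 22 hours for 200 GB, matching the 6-24 hour range cited above.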

Why the other options are wrong:

  • Option 1: While AWS Transfer Family SFTP supports certificate-based authentication and provides reliable file transfer, the approach requires SFTP transfers over the public internet from facilities to AWS. Even with encryption in transit, the security team's mandate specifies "private connectivity or encrypted channels," and best-practice interpretation of this requirement in the context of a "mandated" security policy suggests private connectivity is strongly preferred over internet-based transfers. Public internet SFTP doesn't address the underlying network instability issues from Japan and South Korea; SFTP itself doesn't provide the sophisticated retry and resume capabilities that DataSync offers. Additionally, deploying Transfer Family servers in four different regions plus implementing and managing cross-region replication adds operational complexity and cost (Transfer Family per-hour charges in 4 regions, cross-region data transfer charges).
  • Option 3: Volume Gateway in stored volume mode is designed for block storage workloads (presenting iSCSI volumes to applications), not for file extract transfers from SAP systems. SAP typically generates extract files that are written to file systems, not block I/O patterns. Using Volume Gateway would require architectural changes to how SAP generates extracts, potentially writing to mounted iSCSI volumes, which introduces application-level changes and complexity. The snapshot mechanism is designed for backup and disaster recovery, not for operational data transfer: snapshots occur on a schedule and don't provide the "transfer file when ready" operational model needed for daily extract workflows. Using snapshots for data transfer creates significant delays (snapshot creation, snapshot copy to S3, cross-region snapshot copy) and doesn't provide real-time visibility into transfer progress or easy recovery from failures.
  • Option 4: While technically viable, this approach requires substantial custom development and creates an ongoing maintenance burden. Building reliable automated scripts with proper multipart upload logic, comprehensive error handling, exponential backoff, progress tracking, and recovery from partial uploads is complex and error-prone. The team would need to implement monitoring, alerting, and logging infrastructure around custom scripts. The AWS CLI is not optimized for transfer resilience the way DataSync is; DataSync has built-in network optimization, parallel transfer, integrity verification, and automatic retry that would need to be manually coded. Implementing certificate-based authentication with custom scripts requires PKI management and secure credential handling. Managing and troubleshooting custom scripts across 4 global facilities creates operational risk compared to the managed service approach of DataSync.
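For a sense of what the "exponential backoff retry logic" in Option 4 entails if hand-rolled, here is a minimal sketch with full jitter. The `operation` callable and the retry limits are illustrative placeholders; a production version would also need multipart-upload state tracking and logging, all of which DataSync provides internally.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, cap=30.0):
    """Call `operation` until it succeeds, sleeping with capped exponential
    backoff plus full jitter between attempts; re-raise after the last try."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except OSError:  # stand-in for a transient network error
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random fraction of the capped backoff window
            delay = random.uniform(0, min(cap, base_delay * 2 ** attempt))
            time.sleep(delay)
```

Even this simplified version omits resumable uploads, integrity verification, and observability, which is precisely the custom-development burden the explanation describes.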

Key Insight: The critical requirements are "interim solution," "network instability," and "cannot wait for Direct Connect." Candidates often choose Transfer Family (Option 1) because SFTP seems like the obvious secure file transfer protocol, but miss that DataSync over VPN provides superior transfer resilience for unreliable networks and better aligns with the private connectivity security mandate. The ability to rapidly deploy Site-to-Site VPN (days vs. months) makes it the practical interim solution.

Case Study 7

A genomics research consortium consisting of 8 universities needs to share large sequencing datasets between their on-premises high-performance computing (HPC) clusters and a shared AWS environment in the us-west-2 region. Each university generates between 10-30 TB of sequencing data monthly, and researchers at any university need low-latency access to datasets from all universities. The current approach of shipping hard drives between institutions creates 4-6 week delays that impede research progress. Each university has different IT security policies: 3 universities require all data to remain within their own AWS accounts for governance reasons, while 5 universities can share data through a common AWS account. All universities have existing 10 Gbps network connections to their regional internet service providers. The consortium has a limited budget and wants to minimize both initial capital expenditure and ongoing monthly costs while enabling researchers to access data within 24 hours of generation. Data must be encrypted both in transit and at rest to comply with NIH security requirements.

Which architecture provides the most cost-effective solution while meeting all requirements?

  1. Each university deploys AWS DataSync agents on their HPC clusters with internet connectivity, creates S3 buckets in their own AWS accounts (or the shared account), uses S3 cross-account access policies to grant read access to all universities, implements S3 replication to create copies in each university's preferred account, and uses DataSync task scheduling to transfer data during off-peak hours to minimize bandwidth costs
  2. The consortium establishes a single AWS Direct Connect connection at a colocation facility in a central geographic location, each university sets up VPN connections from their campus to the colocation facility, deploys a shared Storage Gateway File Gateway at the colocation, each university mounts the File Gateway NFS exports, and uses S3 bucket policies to segregate data while providing shared access across universities
  3. Each university deploys AWS Storage Gateway File Gateway on-premises, writes sequencing data to File Gateway NFS shares, uses File Gateway's S3 integration to automatically upload data to their designated S3 buckets, implements S3 Cross-Region Replication where needed, and leverages S3 Access Points to provide cross-account access to researchers while maintaining data governance requirements
  4. Each university orders AWS Snowball Edge devices on a monthly recurring schedule, loads sequencing data throughout the month, ships devices to AWS for import into their designated S3 buckets, implements S3 bucket policies and cross-account IAM roles for shared access, and uses S3 Intelligent-Tiering to minimize storage costs for infrequently accessed datasets

Answer & Explanation

Correct Answer: 1 - DataSync agents with internet connectivity and S3 cross-account access

Why this is correct: This architecture provides the most cost-effective solution for several reasons. AWS DataSync with internet connectivity avoids the capital expense and ongoing charges of AWS Direct Connect (approximately $0.30/hour port charges plus monthly cross-connect fees). With 10 Gbps internet connections at each university, transferring 10-30 TB per month is easily achievable within the 24-hour requirement-even 30 TB can transfer in approximately 7-9 hours with realistic throughput. AWS does not charge for data transfer INTO S3 from the internet, eliminating inbound data transfer costs. DataSync provides built-in TLS encryption for data in transit, and S3 supports SSE-KMS for encryption at rest, meeting NIH requirements. The architecture respects data governance by allowing 3 universities to maintain data in their own accounts while the other 5 share a common account. S3 cross-account access policies and IAM roles enable researchers across all universities to access data without requiring replication, which would significantly increase storage costs. Only data that truly needs to be replicated (for governance isolation) is copied, minimizing both storage and cross-region data transfer costs. DataSync's task scheduling allows transfers during off-peak hours when bandwidth costs from ISPs may be lower.
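The 7-9 hour figure can be reproduced directly. The 7.5-9.5 Gbps effective rates below are assumptions about realistic utilization of a 10 Gbps link, not measured values.

```python
def transfer_hours(terabytes: float, gbps: float) -> float:
    """Hours to move `terabytes` at a sustained `gbps` rate (decimal units)."""
    return terabytes * 8 * 1000 / gbps / 3600

# Worst-case monthly volume (30 TB) at assumed effective rates on a 10 Gbps link
for eff in (7.5, 9.5):
    print(f"30 TB at {eff} Gbps ≈ {transfer_hours(30, eff):.1f} h")
```

Both cases land between roughly 7 and 9 hours, comfortably inside the 24-hour availability requirement.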

Why the other options are wrong:

  • Option 2: This architecture incurs substantial capital and ongoing costs that violate the budget constraint. Establishing a Direct Connect connection requires port hour charges ($0.30-$2.25/hour depending on port speed, or $216-$1,620/month), cross-connect fees at the colocation facility, and potentially charges for each university's VPN connection to the colocation. Deploying and maintaining infrastructure at a colocation facility adds rental costs, network equipment costs, and operational overhead. A shared File Gateway at a central location creates a single point of failure and potential bandwidth bottleneck for 8 universities. File Gateway's asynchronous upload behavior makes the 24-hour data availability requirement less predictable. The operational complexity of managing shared infrastructure used by 8 institutions with different IT policies creates governance and support challenges.
  • Option 3: While File Gateway could work technically, it introduces unnecessary costs and complexity. Each university would need to deploy and maintain File Gateway VMs, requiring compute resources, storage for caching, and ongoing monitoring and updates. File Gateway is designed for hybrid access patterns where applications need local file access with cloud backing, but in this scenario, HPC clusters are generating output files that need to move to AWS; they don't need ongoing local access after initial creation. The asynchronous nature of File Gateway uploads makes it difficult to guarantee 24-hour availability, as upload timing is controlled by internal algorithms rather than explicit transfer completion. The caching architecture means paying for local storage capacity at each university in addition to S3 storage. S3 Cross-Region Replication for data that needs copying incurs both storage and data transfer charges that exceed the cost efficiency of DataSync's one-time transfers.
  • Option 4: Snowball Edge devices are cost-effective for very large one-time migrations but inefficient for ongoing monthly data flows. Each device costs approximately $300 per use (which includes 10 days of on-site use) plus shipping costs. With 8 universities, the consortium would need to manage approximately 96 Snowball shipments per year (8 universities × 12 months), creating enormous logistical and administrative overhead. Each device cycle introduces 5-7 days of shipping and import processing time, making the 24-hour availability requirement impossible to meet; researchers would face week-long delays before data is accessible. The operational burden of coordinating monthly device orders, tracking shipments, managing import jobs, and ensuring data is properly organized across 8 institutions overwhelms the cost savings compared to network-based transfers. For datasets in the 10-30 TB/month range per institution, network transfer is more cost-effective than Snowball.

Key Insight: The trap is assuming that inter-institutional data sharing requires expensive private connectivity (Direct Connect) or complex shared infrastructure. The key insight is recognizing that AWS does not charge for inbound data transfer to S3 from the internet, making internet-based DataSync extremely cost-effective for this use case. Candidates often overlook this fundamental AWS pricing model and over-engineer with Direct Connect or Snowball, dramatically increasing costs unnecessarily. The monthly data volumes (10-30 TB) are well within internet transfer capabilities given 10 Gbps connections.
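The Direct Connect fixed-cost comparison from the Option 2 analysis works out as follows. The per-hour rates are the approximate figures quoted in that analysis (a 720-hour month is assumed to match its $216-$1,620 range), not current AWS pricing.

```python
HOURS_PER_MONTH = 720  # 30-day month, matching the monthly figures cited above

# Approximate $/port-hour figures quoted for small vs. large Direct Connect ports
for rate in (0.30, 2.25):
    print(f"${rate:.2f}/port-hour -> ${rate * HOURS_PER_MONTH:,.0f}/month")
```

These fixed monthly charges contrast with zero inbound data transfer charges for DataSync over the universities' existing internet links.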

Case Study 8

A satellite imagery company captures high-resolution Earth observation data from 12 satellites in low Earth orbit. Ground stations in Alaska, Norway, and Australia receive downlink transmissions and store imagery on local servers. Each ground station receives approximately 8 TB of raw imagery data per day. The imagery must be transferred to AWS within 6 hours of capture for processing by GPU-based machine learning pipelines running on Amazon EC2 P4d instances. The processing pipeline has specific timing requirements: imagery captured between 00:00-12:00 UTC must begin processing by 06:00 UTC, and imagery captured between 12:00-00:00 UTC must begin processing by 18:00 UTC. The ground stations have satellite-based internet connectivity with bandwidth that varies between 500 Mbps and 2 Gbps depending on weather conditions and satellite positioning. The company's cloud operations team has noticed that weather-related connectivity degradation at any single ground station creates bottlenecks that delay processing for all three stations. The CTO wants an architecture that provides consistent transfer performance regardless of individual ground station connectivity issues.

Which solution will provide the most reliable data transfer with consistent timing to meet the ML pipeline requirements?

  1. Deploy AWS DataSync agents at each ground station, configure DataSync tasks with aggressive retry settings and bandwidth optimization, implement Amazon EventBridge rules that monitor DataSync task completion and trigger EC2 processing pipeline launches, use DataSync's network resilience features to handle connectivity variations, and set up CloudWatch alarms to alert operations staff when any transfer falls behind schedule
  2. Implement a hybrid approach: deploy AWS Storage Gateway File Gateway at each ground station to provide immediate local access to captured imagery, configure File Gateway to upload to S3 during periods of good connectivity, use S3 Event Notifications to trigger AWS Lambda functions that launch the EC2 processing pipeline, and rely on File Gateway's local cache to buffer data during connectivity degradation
  3. Deploy a local processing tier at each ground station using AWS Outposts with GPU-equipped racks, process imagery locally at the ground stations, transfer only the processed results (approximately 10% of raw data volume) to AWS using DataSync over the satellite internet connections, and use this approach to minimize data transfer requirements and timing sensitivity
  4. Order AWS Snowball Edge Compute Optimized devices with GPU capabilities on a weekly rotation schedule for each ground station, configure automatic data ingestion from satellite receivers to Snowball devices, process data locally on Snowball Edge GPUs during the week, transfer processed results to S3 via satellite internet, and ship devices to AWS for bulk import of raw imagery for archival purposes

Answer & Explanation

Correct Answer: 3 - AWS Outposts with local GPU processing and reduced data transfer requirements

Why this is correct: This solution addresses the fundamental problem: unreliable satellite internet connectivity cannot consistently transfer 8 TB per ground station (24 TB total daily) within the 6-hour processing window when bandwidth varies between 500 Mbps and 2 Gbps. Even at maximum 2 Gbps, transferring 8 TB requires approximately 9 hours of continuous transfer, which exceeds the 6-hour requirement. At degraded 500 Mbps bandwidth, transfer would take 36+ hours. By deploying AWS Outposts at each ground station, the ML processing occurs locally using P4d-equivalent GPU instances, completely decoupling from satellite internet reliability. Processing imagery locally reduces the data volume requiring transfer by approximately 90% (processed results and derived products vs. raw imagery), making the satellite internet connection adequate for transferring results to AWS. This architecture ensures processing begins on schedule regardless of connectivity status. Raw imagery can be archived to AWS during periods of good connectivity or via periodic Snowball shipments without time pressure. AWS Outposts provides consistent AWS API experience, integrates with AWS services for orchestration, and provides the same security and encryption capabilities as AWS regions.

Why the other options are wrong:

  • Option 1: While DataSync provides excellent retry and resilience features, it cannot overcome the fundamental limitation of insufficient bandwidth. DataSync with aggressive retry helps recover from transient failures, but when satellite connectivity degrades to 500 Mbps for extended periods, 8 TB cannot physically transfer in 6 hours (it requires 36+ hours). Even with bandwidth optimization, DataSync cannot create bandwidth that doesn't exist. The solution relies on operations staff responding to CloudWatch alarms, which introduces manual intervention and still doesn't solve the underlying bandwidth constraint. When weather affects one ground station, that station's imagery misses its processing window, violating the specific timing requirements the CTO wants to eliminate.
  • Option 2: File Gateway's asynchronous upload model makes it impossible to guarantee the strict 6-hour processing windows. File Gateway caches data locally and uploads when connectivity permits based on internal algorithms; it doesn't provide explicit control over "this data must be in AWS by 06:00 UTC." During connectivity degradation, File Gateway may buffer data locally for many hours, causing the EC2 processing pipeline to miss its timing requirements. S3 Event Notifications trigger when objects are created in S3, so if File Gateway hasn't completed upload by the required processing time, the pipeline doesn't start on schedule. File Gateway is designed for hybrid access patterns and gradual synchronization, not for time-bound data delivery with strict SLAs.
  • Option 4: While Snowball Edge Compute Optimized devices provide GPU capabilities for local processing, managing weekly rotations across 3 ground stations creates massive operational complexity. The team must coordinate 156 device rotations per year (3 stations × 52 weeks), manage device shipping logistics from remote locations (Alaska, Norway, Australia), handle customs for international shipments, and deal with device delays due to weather or logistics issues. Each device takes 5-7 days for shipping and import processing, introducing unpredictability in data availability. Snowball Edge devices require physical interaction (loading, shipping, tracking), which creates operational burden in remote ground station environments. The architecture still relies on satellite internet for transferring processed results, which doesn't address the connectivity reliability concern for time-sensitive data. The cost of 156 Snowball Edge Compute Optimized devices annually (approximately $400 per use) plus shipping exceeds the cost of deploying Outposts for a permanent solution.

Key Insight: This question tests whether candidates recognize when network-based data transfer is fundamentally infeasible given bandwidth and timing constraints. The critical calculation is 8 TB ÷ 6 hours = minimum 3.0 Gbps sustained throughput required, but available bandwidth is 0.5-2.0 Gbps. Many candidates will choose DataSync (Option 1) because it's the most common transfer solution, without performing the mathematical analysis that proves no network-based transfer solution can meet the requirements. The insight is that when data transfer timing requirements cannot be met with available connectivity, moving compute to the data (via Outposts) becomes necessary, even though it appears more complex initially.
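The feasibility check the Key Insight describes can be written out explicitly. The arithmetic uses decimal units and assumes fully sustained throughput, which is generous to the network-transfer options.

```python
def required_gbps(terabytes: float, hours: float) -> float:
    """Sustained throughput needed to move `terabytes` within `hours`."""
    return terabytes * 8 * 1000 / (hours * 3600)

def transfer_hours(terabytes: float, gbps: float) -> float:
    """Hours to move `terabytes` at a sustained `gbps` rate."""
    return terabytes * 8 * 1000 / gbps / 3600

print(f"needed: {required_gbps(8, 6):.2f} Gbps sustained")   # vs. 0.5-2.0 Gbps available
print(f"8 TB at 2.0 Gbps: {transfer_hours(8, 2.0):.1f} h")   # best case, still > 6 h
print(f"8 TB at 0.5 Gbps: {transfer_hours(8, 0.5):.1f} h")   # weather-degraded case
```

Since even the best-case available bandwidth falls short of the ~3 Gbps required, moving the compute to the data (Outposts) is the only way to stay inside the processing window.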

Case Study 9

A financial services company operates a real-time trading platform that generates transaction logs, market data feeds, and risk calculation outputs in an on-premises data center in London. The company is implementing a disaster recovery strategy that requires replicating this data to AWS in the eu-west-2 (London) region with a Recovery Point Objective (RPO) of 5 minutes and Recovery Time Objective (RTO) of 15 minutes. The data generation rate averages 400 GB per hour during market hours (08:00-16:30 GMT) and 50 GB per hour during off-market hours. The company has a 10 Gbps AWS Direct Connect connection with a private virtual interface to their VPC. The compliance team requires that all financial data must traverse private connectivity with end-to-end encryption, and detailed audit logs must prove that data never traversed the public internet. The DR solution must be automated with no manual intervention required during a disaster event, and recovery drills must be conducted monthly without disrupting production systems. The infrastructure team has 2 engineers who can dedicate 25% of their time to managing the DR solution.

Which disaster recovery architecture will meet all requirements with the least operational complexity?

  1. Deploy AWS DataSync agents on the on-premises servers, configure DataSync tasks to run every 5 minutes transferring data to S3 through the Direct Connect private virtual interface, use S3 Lifecycle policies to transition older data to S3 Glacier, create AWS Backup plans to orchestrate DR recovery, implement CloudWatch Logs for audit trails, and use S3 Event Notifications to trigger automated recovery workflows during disaster events
  2. Implement AWS Storage Gateway Volume Gateway in cached mode, configure application servers to write all data to mounted iSCSI volumes, enable automatic snapshots every 5 minutes, use AWS Backup to manage snapshot retention and recovery, implement VPC endpoints to ensure traffic remains on Direct Connect, leverage Volume Gateway's point-in-time recovery capabilities for DR, and use CloudWatch and CloudTrail for audit logging
  3. Deploy AWS Database Migration Service (DMS) with ongoing replication, configure DMS to capture change data from on-premises databases and file systems, replicate continuously to Amazon RDS and S3 in eu-west-2 through Direct Connect, implement Multi-AZ RDS deployments for high availability, use DMS task logs and CloudTrail for compliance audit trails, and leverage DMS's automatic recovery capabilities
  4. Configure AWS Application Migration Service (MGN) agents on all application servers, enable continuous replication through Direct Connect private virtual interface, maintain low-cost staging instances in AWS that stay in sync with 5-minute lag, use MGN's automated launch templates to orchestrate full application stack recovery within 15 minutes, conduct monthly DR drills using MGN's drill mode without impacting production, and leverage MGN's built-in encryption and CloudTrail integration for compliance

Answer & Explanation

Correct Answer: 2 - AWS Storage Gateway Volume Gateway in cached mode with automated snapshots and AWS Backup

Why this is correct: Volume Gateway in cached mode is specifically designed for disaster recovery scenarios with low RPO requirements. Applications write to iSCSI volumes presented by the gateway, which stores frequently accessed data in the local cache while asynchronously uploading all data to S3-backed volumes in AWS. The 5-minute snapshot capability meets the RPO requirement, and snapshots are incremental (only changed blocks), making frequent snapshots bandwidth-efficient even during the 400 GB/hour peak market hours (approximately 33 GB per 5-minute interval at peak, which is well within 10 Gbps Direct Connect capacity). Volume Gateway traffic uses AWS PrivateLink and can be configured to route through the Direct Connect private VIF, ensuring all data remains on private connectivity. AWS Backup provides centralized, automated snapshot management and recovery orchestration, eliminating manual intervention. Recovery involves creating EBS volumes from snapshots and attaching them to EC2 instances, which can be automated through AWS Backup recovery plans and Lambda functions to meet the 15-minute RTO. Monthly DR drills can be conducted by recovering snapshots to separate test instances without affecting production gateway operations. The solution requires minimal ongoing management: after initial setup, snapshot schedules and retention are automated. CloudTrail logs all API calls, and VPC Flow Logs prove traffic traversed Direct Connect, satisfying audit requirements.
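The incremental-snapshot bandwidth claim can be checked as follows, assuming the changed blocks per interval track the 400 GB/hour peak generation rate (an upper bound, since incremental snapshots ship only deltas).

```python
def delta_gb(gb_per_hour: float, interval_min: float) -> float:
    """Data generated (upper bound on snapshot delta) per snapshot interval."""
    return gb_per_hour * interval_min / 60

def gbps_to_ship(gigabytes: float, seconds: float) -> float:
    """Sustained throughput needed to move `gigabytes` within `seconds`."""
    return gigabytes * 8 / seconds

peak = delta_gb(400, 5)  # peak-hours delta per 5-minute snapshot
print(f"peak delta per snapshot: {peak:.1f} GB")
print(f"throughput to ship it within the interval: {gbps_to_ship(peak, 300):.2f} Gbps")
```

At under 1 Gbps of the 10 Gbps Direct Connect link, the replication traffic leaves ample headroom even at peak market hours.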

Why the other options are wrong:

  • Option 1: DataSync is optimized for file transfers and scheduled tasks, not for block-level continuous replication with 5-minute RPO. Running DataSync tasks every 5 minutes during peak market hours (400 GB/hour = 33 GB per 5-minute task) creates operational challenges: each task must complete within 5 minutes to maintain schedule, task queuing could occur if a previous task hasn't finished, and file-level synchronization introduces overhead compared to block-level replication. DataSync doesn't provide the point-in-time consistency needed for recovering transactional systems-files may be in various states of completion when each 5-minute task runs. Recovery requires reconstructing application state from S3 objects, which doesn't provide the same recovery simplicity as Volume Gateway snapshots. While DataSync can create audit logs, proving that an entire disaster recovery workflow (data transfer, storage, and recovery) never touched the internet requires correlating multiple AWS service logs, increasing complexity.
  • Option 3: AWS DMS is designed for database replication, not for comprehensive disaster recovery of entire application stacks including transaction logs, market data feeds, and risk calculation outputs (which likely include files, not just database records). The question indicates multiple types of data outputs, suggesting heterogeneous applications beyond just databases. DMS requires defining specific database endpoints and schemas, and doesn't handle file system data or application binaries required for full DR. Implementing DMS for continuous file system replication is possible using custom scripts but creates significant operational complexity. RDS Multi-AZ provides high availability within AWS but doesn't address the core requirement of replicating on-premises data to AWS for DR. The solution doesn't address recovering compute infrastructure, only data; meeting a 15-minute RTO for full application stack recovery requires additional orchestration not provided by DMS.
  • Option 4: AWS Application Migration Service (MGN) is designed for server migration and disaster recovery of entire server workloads, but it's optimized for protecting compute instances and operating system state, not specifically for high-frequency data replication with 5-minute RPO. MGN maintains staging instances that consume compute costs continuously (even low-cost instances accumulate charges). For a trading platform with potentially dozens of application servers, maintaining staging instances 24/7 may be more expensive than Storage Gateway. MGN is most cost-effective when RPO requirements are measured in hours rather than minutes. The question emphasizes data replication (transaction logs, market data feeds, risk calculations) rather than server recovery, suggesting a data-centric DR approach (Volume Gateway) is more appropriate than an application-centric approach (MGN). While MGN can meet the requirements, it's operationally more complex than Volume Gateway for data-focused DR and incurs higher ongoing costs for staging infrastructure.

Key Insight: The discriminator is recognizing that "transaction logs, market data feeds, and risk calculation outputs" combined with "5-minute RPO" points to a block-level continuous replication solution rather than file-based scheduled transfers or application-level replication. Volume Gateway's ability to present block storage to applications while automatically maintaining incremental snapshots provides the sweet spot between low RPO, simplified recovery, and operational simplicity. Candidates often choose DataSync because it's familiar for data transfer, but miss that scheduled file synchronization doesn't provide the transactional consistency needed for financial system DR.

Case Study 10

A machine learning research lab at a major university has trained a large language model using their on-premises GPU cluster. The resulting model consists of 180 TB of checkpoint files, model weights, training data, and associated metadata. The research team wants to publish the model through Amazon SageMaker for public access by other researchers worldwide. The university's IT security policy prohibits direct internet connectivity from research servers due to concerns about data exfiltration of sensitive research IP. The university has a 1 Gbps connection to the internet that is heavily utilized by 40,000 students and faculty, with typical sustained available bandwidth of 200-300 Mbps. The research team has a grant deadline in 60 days to make the model publicly available. However, the university's purchasing department requires 4-6 weeks to process purchase orders and vendor contracts for new services. The research lab has a budget of $15,000 for data transfer and initial AWS setup costs. The CTO of the university IT department has expressed concern about any solution that might saturate the university's internet connection and impact educational activities.

What is the MOST feasible solution to transfer the model to AWS within the timeline and budget constraints while respecting the university's operational and security policies?

  1. Order AWS Snowball Edge Storage Optimized devices immediately using the research grant credit card to bypass purchasing department delays, load the 180 TB model data onto multiple devices using the research lab's isolated network segment, ship devices directly to AWS for import into S3, configure SageMaker to access the model from S3, and complete the entire process within the 60-day deadline while staying well under the $15,000 budget
  2. Request that the university's IT department provision a dedicated 1 Gbps circuit for the research lab, establish a new AWS Direct Connect connection using expedited provisioning, deploy AWS DataSync agents on the research servers, transfer the model over Direct Connect with dedicated bandwidth, and use the remaining time to configure SageMaker for model hosting
  3. Work with the university's IT department to create a temporary firewall exception allowing outbound HTTPS connectivity from the research servers to AWS S3 endpoints, install AWS CLI on the research servers, transfer data using S3 multipart uploads with bandwidth throttling to limit impact to 100 Mbps during nighttime hours (22:00-06:00), and calculate that the transfer can complete in approximately 60 days of nightly uploads
  4. Negotiate with the university's IT department to temporarily install AWS Storage Gateway Volume Gateway in the campus data center with appropriate firewall rules, configure the research servers to copy model data to Volume Gateway volumes, leverage Volume Gateway's bandwidth throttling to limit internet impact, use asynchronous upload to S3, and configure SageMaker once data upload completes

Answer & Explanation

Correct Answer: 1 - AWS Snowball Edge devices ordered immediately using grant funding

Why this is correct: This is the only option that simultaneously meets all the critical constraints: the 60-day timeline, the $15,000 budget, the 4-6 week purchasing delay, the prohibition on direct internet connectivity from research servers, and the need to avoid saturating university bandwidth. AWS Snowball Edge Storage Optimized devices can be ordered directly through the AWS console with a credit card, bypassing the university purchasing department's procurement delays. Each device provides 80 TB of usable capacity, so 3 devices can accommodate the 180 TB model. Snowball pricing is approximately $300 per device (including 10 days of on-site use) plus $100-150 shipping per device, totaling roughly $1,350-1,500 for the complete transfer, well within the $15,000 budget. The research lab can use its internal isolated network to transfer data from the GPU servers to the Snowball devices at high speed (10 Gbps local network) without any internet connectivity, satisfying the security policy. The devices ship directly to AWS for import, so no data traverses the university's internet connection and educational activities are unaffected. The timeline is predictable: 2-3 days for device delivery, 5-7 days for data loading (180 TB over the local 10 Gbps network), 2-3 days for shipping to AWS, and 7-10 days for AWS import processing, totaling approximately 20-25 days and leaving ample time for SageMaker configuration within the 60-day grant deadline.
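
The device-count and loading-time arithmetic above can be sanity-checked with a short script. The device capacity, link speed, and 80% sustained-throughput figure are assumptions drawn from the scenario, not AWS specifications:

```python
import math

# Rough feasibility check for the Snowball approach.
# All constants are scenario assumptions, not AWS pricing or spec quotes.
DATA_TB = 180
DEVICE_USABLE_TB = 80        # Snowball Edge Storage Optimized usable capacity
LOCAL_LINK_GBPS = 10         # research lab's isolated local network
SUSTAINED_FRACTION = 0.8     # assume ~80% of line rate is achievable

# Devices needed to hold the full dataset.
devices = math.ceil(DATA_TB / DEVICE_USABLE_TB)

# Ideal (line-rate) time to copy 180 TB over the local link.
total_gigabits = DATA_TB * 1000 * 8              # TB -> GB -> gigabits (decimal)
effective_gbps = LOCAL_LINK_GBPS * SUSTAINED_FRACTION
load_days = total_gigabits / effective_gbps / 3600 / 24

print(f"devices needed: {devices}")                       # 3
print(f"ideal data-loading time: ~{load_days:.1f} days")  # ~2.1 days
```

Note that ~2 days is a pure line-rate lower bound; the 5-7 day estimate in the explanation allows for copy verification, checksumming, and loading devices sequentially or in staggered batches.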

Why the other options are wrong:

  • Option 2: AWS Direct Connect provisioning typically requires 2-4 weeks at minimum even with expedited processing, and often longer in university environments where network changes require committee approvals and campus networking team coordination. Ordering a dedicated 1 Gbps circuit from a telecommunications provider requires 4-8 weeks for installation, service activation, and cross-connect configuration, consuming the entire 60-day window before data transfer even begins. Both Direct Connect and the dedicated circuit require purchase orders and vendor contracts, triggering the 4-6 week purchasing department delay. Direct Connect costs also exceed the budget: port-hour charges ($300-$2,250/month depending on capacity), cross-connect fees ($100-$500/month), telecommunications circuit costs ($1,000-$5,000/month depending on bandwidth and location), and potential AWS data transfer charges together can surpass the $15,000 budget even for a 2-month engagement. This option is completely infeasible given the timeline and budget constraints.
  • Option 3: Creating firewall exceptions requires working through the university's IT governance and security approval process, which typically takes 2-4 weeks in university environments involving security committee review, change control boards, and policy exception documentation. It also violates the prohibition on direct internet connectivity from research servers: the security policy exists specifically to prevent scenarios like this, and obtaining an exception undermines the institution's security posture. The arithmetic shows the approach is infeasible regardless: 180 TB = 180,000 GB = 1,440,000,000 megabits. At 100 Mbps for 8 hours per night, the nightly transfer is 100 Mbps × 8 hours × 3,600 seconds/hour = 2,880,000 megabits ≈ 360 GB. Transferring 180 TB therefore requires 180,000 GB ÷ 360 GB/night = 500 nights ≈ 16-17 months, far exceeding the 60-day deadline. Even using the full 200-300 Mbps of available bandwidth 24/7 would take 2-3 months of continuous transfer, which would saturate the university connection and violate the CTO's operational concerns.
  • Option 4: Deploying Storage Gateway in the campus data center requires procuring and installing the gateway appliance (VM or hardware), which involves the university IT department's standard procurement process (4-6 week delay), committee approvals for new infrastructure, and change control processes for data center installations. Configuring the firewall rules Storage Gateway needs to communicate with AWS requires security team approval and change control (2-4 weeks). The asynchronous nature of Volume Gateway uploads makes it impossible to predict whether 180 TB will transfer within 60 days: uploads occur based on internal algorithms and available bandwidth, not explicit schedules. Volume Gateway would also consume the university's internet bandwidth with continuous uploading, directly violating the CTO's concern about impact to educational activities. The operational complexity of coordinating a Storage Gateway deployment across the research lab and IT department within the tight timeline, while managing all the bureaucratic processes, makes this approach highly risky for meeting the grant deadline.
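
The transfer-window math in the Option 3 rebuttal can be verified directly; the throttle rate, nightly window, and data size are the scenario's figures:

```python
# Sanity check of Option 3's nightly-transfer arithmetic (scenario values).
DATA_GB = 180_000            # 180 TB expressed in GB (decimal)
THROTTLE_MBPS = 100          # bandwidth cap during the nightly window
WINDOW_HOURS = 8             # 22:00-06:00

# Megabits moved per night at the capped rate.
megabits_per_night = THROTTLE_MBPS * WINDOW_HOURS * 3600   # 2,880,000 Mb

# Convert megabits -> megabytes -> gigabytes.
gb_per_night = megabits_per_night / 8 / 1000               # 360 GB

nights = DATA_GB / gb_per_night                            # 500 nights
print(f"{gb_per_night:.0f} GB/night -> {nights:.0f} nights "
      f"(~{nights / 30:.0f} months)")                      # 360 GB/night -> 500 nights (~17 months)
```

The result, roughly 17 months of nightly uploads, confirms why a throttled online transfer cannot meet the 60-day deadline.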

Key Insight: This question tests the ability to perform pragmatic architectural decision-making when faced with multiple real-world constraints that compete with each other. The trap is choosing technically sophisticated solutions (Direct Connect, Storage Gateway) that seem more "enterprise-grade," while missing that procurement delays, budget limits, and political/operational constraints at educational institutions make these approaches infeasible. Snowball's ability to be ordered immediately with a credit card and its offline transfer model that bypasses all the networking and security governance complexity makes it the only viable solution despite seeming less elegant than network-based approaches. Professional architects must recognize when "perfect" architectures are impractical and choose solutions that actually ship within real-world constraints.
