Choosing the best answer when several options look valid

A company collects a steady stream of 10 million data records from 100,000 sources each day. These records are written to an Amazon RDS MySQL DB. A query must produce the daily average of a data source over the past 30 days. There are twice as many reads as writes. Queries to the collected data are for one source ID at a time.

How can the Solutions Architect improve the reliability and cost effectiveness of this solution?

  1. Use Amazon Aurora with MySQL in a Multi-AZ mode. Use four additional read replicas.
  2. Use Amazon DynamoDB with the source ID as the partition key and the timestamp as the sort key. Use a Time to Live (TTL) to delete data after 30 days.
  3. Use Amazon DynamoDB with the source ID as the partition key. Use a different table each day.
  4. Ingest data into Amazon Kinesis using a retention period of 30 days. Use AWS Lambda to write data records to Amazon ElastiCache for read access.

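The first thing that stands out is that options 2 and 3 (B and C) use DynamoDB, while the question states that the data is stored in MySQL. Either option would mean a very large change to the database layer: a new data model, a new query path, and a migration of the existing data. In practice a change like that is not adopted unless it has been thought through carefully, so options 2 and 3 can be ruled out fairly easily.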

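At first glance, options 1 and 4 (A and D) both appear to meet every requirement: a steady volume of writes, read capacity for twice as many reads as writes, and a query for the average over the most recent 30 days. Option 1 is nevertheless the more reasonable of the two:

* The question never says that data is kept for only 30 days. Option 1 can retain data long term, whereas in option 4 the data disappears after 30 days.
* Kinesis is aimed at real-time processing, and the question states no such requirement.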
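To put rough numbers on the requirements listed above, here is a minimal back-of-the-envelope sketch. The input figures come from the question; the function name `workload_summary` is made up for illustration, and the per-second rates assume the "steady stream" is spread evenly over the day and that "twice as many reads as writes" means twice the request rate.

```python
# Workload figures taken directly from the question.
RECORDS_PER_DAY = 10_000_000
SOURCES = 100_000
READ_TO_WRITE_RATIO = 2      # "twice as many reads as writes"
WINDOW_DAYS = 30             # the query looks back 30 days
SECONDS_PER_DAY = 24 * 60 * 60


def workload_summary():
    """Derive rough sizing numbers (hypothetical helper, not part of any AWS API).

    Assumes records arrive at a uniform rate throughout the day.
    """
    records_per_source_per_day = RECORDS_PER_DAY // SOURCES
    writes_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY
    reads_per_second = writes_per_second * READ_TO_WRITE_RATIO
    return {
        "records_per_source_per_day": records_per_source_per_day,
        "writes_per_second": writes_per_second,
        "reads_per_second": reads_per_second,
        # Total rows that must be queryable for the 30-day window.
        "records_in_window": RECORDS_PER_DAY * WINDOW_DAYS,
        # Rows touched by one query for a single source ID over 30 days.
        "records_per_query": records_per_source_per_day * WINDOW_DAYS,
    }


if __name__ == "__main__":
    for name, value in workload_summary().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions each source produces 100 records a day, the write rate is on the order of a hundred records per second, and a single query only needs the 3,000 records belonging to one source ID.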

Taken together, option 1 (A) is the best answer.

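Other members of the study group later pointed out that the documentation at the time gave the maximum retention period for data in Kinesis as 7 days (168 hours), so option 4 is not correct either. That leaves option 1 as the only choice.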
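The retention argument is simple arithmetic, sketched below using the 168-hour figure quoted above. The helper name `retention_covers_window` is hypothetical and exists only for this illustration.

```python
HOURS_PER_DAY = 24
REQUIRED_WINDOW_DAYS = 30          # the query needs 30 days of history
KINESIS_MAX_RETENTION_HOURS = 168  # 7 days, the limit cited in this post


def retention_covers_window(max_retention_hours, window_days):
    """Return True if the stream keeps data long enough for the query window."""
    return max_retention_hours >= window_days * HOURS_PER_DAY


if __name__ == "__main__":
    required_hours = REQUIRED_WINDOW_DAYS * HOURS_PER_DAY
    ok = retention_covers_window(KINESIS_MAX_RETENTION_HOURS, REQUIRED_WINDOW_DAYS)
    print(f"required: {required_hours} h, available: {KINESIS_MAX_RETENTION_HOURS} h, enough: {ok}")
```

A 30-day window needs 720 hours of history, more than four times the 168 hours cited, so the stream on its own cannot serve the query.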