DynamoDB: Summary & Notes
Storage is cheap as can be, while compute is at a premium.
DynamoDB is a fully-managed, NoSQL database provided by Amazon Web Services.
💡Key Properties of DynamoDB
Key-value or wide-column data model
- Key-value store: retrieve one record at a time.
- Wide-column store: super-charged version of a hash table where the value for each record in your hash table is a B-tree (think of it as a phonebook: “Give me all entries between Dupond and Dupré”)
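To make the phonebook analogy concrete, here is a minimal in-memory sketch in Python. It is only a mental model, not how DynamoDB is implemented: a sorted list with `bisect` stands in for the B-tree, and the names `WideColumnTable`, `put`, and `between` are hypothetical.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy model: a hash table whose values are sorted collections."""

    def __init__(self):
        # partition key -> list of (sort_key, value) pairs, kept sorted by sort_key
        self._partitions = defaultdict(list)

    def put(self, partition_key, sort_key, value):
        rows = self._partitions[partition_key]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sort_key)
        if i < len(rows) and rows[i][0] == sort_key:
            rows[i] = (sort_key, value)  # overwrite an existing entry
        else:
            rows.insert(i, (sort_key, value))

    def between(self, partition_key, low, high):
        """Return entries with low <= sort_key <= high, in sorted order."""
        rows = self._partitions.get(partition_key, [])
        keys = [k for k, _ in rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return rows[start:end]


phonebook = WideColumnTable()
for name, number in [("Dupré", "03"), ("Dupond", "01"), ("Durand", "04"), ("Dupont", "02")]:
    phonebook.put("Paris", name, number)

# "Give me all entries between Dupond and Dupré"
entries = phonebook.between("Paris", "Dupond", "Dupré")
```

The hash lookup picks one partition, and the range lookup only touches the sorted entries of that partition. This is why "fetch many" access patterns stay cheap.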
Infinite scaling with no performance degradation
- Most operations in DynamoDB have response times in single-digit milliseconds.
- If you need better than that, AWS offers DynamoDB Accelerator (DAX), which is a fully managed in-memory cache for your DynamoDB table.
- There’s no theoretical limit to how big a DynamoDB table can be.
HTTP connection model
- PostgreSQL defaults to a maximum of 100 connections. DynamoDB requests are made over HTTP rather than over persistent connections, so there is no comparable connection limit.
- You can connect to DynamoDB with a virtually unlimited number of concurrent requests, provided you have paid for the throughput.
Infrastructure-as-code friendly
- Creating a DynamoDB table and specifying the primary key and secondary indexes can be done declaratively via Terraform and CloudFormation. You can handle database users and access control by creating AWS IAM roles and policies as well.
Change data capture with DynamoDB Streams
- With DynamoDB Streams, you get a transactional log of each write/update/delete transaction in your DynamoDB table.
- You can programmatically process this log, which opens up a huge number of use cases: using DynamoDB as a queue, broadcasting event updates across microservices…
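As a rough illustration of processing that log, the sketch below walks a batch of stream records shaped like the ones a stream consumer (for example an AWS Lambda function) receives. The field names `Records`, `eventName`, `dynamodb`, `Keys`, and `NewImage` follow the DynamoDB Streams record format; the function name `summarize_changes` and the sample data are hypothetical.

```python
def summarize_changes(event):
    """Turn a batch of stream records into a simple list of changes."""
    changes = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        changes.append({
            # eventName is one of INSERT, MODIFY, REMOVE
            "action": record["eventName"],
            "keys": change["Keys"],
            # NewImage is only present for some stream view types,
            # and never for REMOVE events
            "new": change.get("NewImage"),
        })
    return changes


sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"PK": {"S": "USER#1"}},
                "NewImage": {"PK": {"S": "USER#1"}, "name": {"S": "Ada"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"PK": {"S": "USER#2"}}},
        },
    ]
}

changes = summarize_changes(sample_event)
```

A real consumer would forward each change to a queue, another service, or a search index instead of returning a list.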
Time-to-live (TTL)
- TTLs allow you to have DynamoDB automatically delete items on a per-item basis.
- You can use TTL to clean up your database rather than handling it manually via a scheduled job.
- NB: Items are generally deleted in a timely manner, but AWS only states that items will usually be deleted within 48 hours after the time indicated by the attribute. This delay could be unacceptable for the access patterns in your application. Rather than relying on the TTL for data accuracy in your application, you should confirm an item is not expired when you retrieve it.
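The read-time check recommended above could look like the following sketch. It assumes the TTL attribute is called `expires_at` (a hypothetical name) and holds an epoch timestamp in seconds.

```python
import time


def is_expired(item, ttl_attribute="expires_at", now=None):
    """Return True if the item's TTL timestamp is in the past."""
    now = time.time() if now is None else now
    expires_at = item.get(ttl_attribute)
    if expires_at is None:
        return False  # items without the TTL attribute never expire
    return float(expires_at) < now


def drop_expired(items, now=None):
    """Filter out items that are expired but not yet deleted by DynamoDB."""
    return [item for item in items if not is_expired(item, now=now)]


sessions = [
    {"PK": "SESSION#a", "expires_at": 1_700_000_000},
    {"PK": "SESSION#b", "expires_at": 1_800_000_000},
    {"PK": "SESSION#c"},
]
live = drop_expired(sessions, now=1_750_000_000)
```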
📚Data Modeling Considerations
Basic Vocabulary
- Table
- Item = row/record
- Attributes = typed data values holding information about a named element in an item
- Item collection = group of items sharing the same partition key (in either the base table or a secondary index)
Data Types
- Scalars: string, number, binary, boolean, null
- Complex: lists, maps
- Sets: string sets, number sets, binary sets
👉 DynamoDB does not offer a “datetime” data type.
- For range-based queries, store datetime values as strings in ISO 8601 format. These strings sort alphanumerically in chronological order (as long as they share the same timezone), so they work with DynamoDB sort key conditions and filter expressions.
- For TTL, use Epoch format (number type).
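A small sketch of both conventions, using only the Python standard library. The helper name `ttl_epoch` is hypothetical; all timestamps are kept in UTC so that string order and chronological order agree.

```python
from datetime import datetime, timedelta, timezone

created = [
    datetime(2023, 11, 5, 9, 30, tzinfo=timezone.utc),
    datetime(2023, 2, 14, 18, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
]

# ISO 8601 strings: sorting the strings matches sorting the datetimes
iso_strings = [d.isoformat() for d in created]
sorted_iso = sorted(iso_strings)
sorted_by_time = [d.isoformat() for d in sorted(created)]


def ttl_epoch(start, days):
    """Epoch timestamp (whole seconds) for `days` after `start`."""
    return int((start + timedelta(days=days)).timestamp())


# Value to store in the TTL attribute (number type)
expires_at = ttl_epoch(created[2], days=30)
```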
Primary keys
When creating a DynamoDB table, you must declare a primary key for your table. There are two kinds of primary keys:
- Simple primary keys, which consist of a single element called a partition key.
- It allows you to fetch only a single item at a time.
- Composite primary keys, which consist of two elements, called a partition key and a sort key (also known as the “hash” key and the “range” key, respectively).
- They enable a “fetch many” access pattern.
- It is possible to specify conditions on the sort key to narrow down your query space.
It’s the most important part of data modeling with DynamoDB: almost all data access will be driven off primary keys, so you need to choose them wisely.
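As an illustration, a composite primary key could be declared with request parameters like the ones below. The parameter names follow the `CreateTable` API; the table name `Orders` and the attribute names `PK` and `SK` are hypothetical. No AWS call is made here: the dict is what you would pass to an SDK call such as boto3's `create_table`.

```python
create_table_params = {
    "TableName": "Orders",
    # Only key attributes are declared up front; other attributes are schemaless
    "AttributeDefinitions": [
        {"AttributeName": "PK", "AttributeType": "S"},
        {"AttributeName": "SK", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
        {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity mode
}


def key_attributes_are_defined(params):
    """Check that every key in KeySchema has a matching attribute definition."""
    defined = {a["AttributeName"] for a in params["AttributeDefinitions"]}
    return all(k["AttributeName"] in defined for k in params["KeySchema"])


is_consistent = key_attributes_are_defined(create_table_params)
```

A simple primary key would keep only the `HASH` entry in `KeySchema`.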
Secondary indexes
The way you configure your primary keys may allow for one read or write access pattern but may prevent you from handling a second access pattern.
To help with this problem, DynamoDB has the concept of secondary indexes. Secondary indexes allow you to reshape your data into another format for querying, so you can add additional access patterns to your data.
There are two types of secondary indexes:
Index type | Key schema | Creation time | Consistency
---|---|---|---
Local secondary index | Must use same partition key as the base table | Must be created when table is created | Eventual consistency by default. Can choose to receive strongly-consistent reads at a cost of higher throughput usage
Global secondary index | May use any attribute from table as partition and sort keys | Can be created after the table exists | Eventual consistency only
Generally, global secondary indexes (GSI) are deemed more flexible: no need to add them at table-creation time, and possible to delete them.
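The idea of "reshaping your data" can be sketched in plain Python. This is only a model: in DynamoDB the index is maintained by the service, not by your code. The function name `build_index` and the attributes `status` and `placed_at` are hypothetical.

```python
from collections import defaultdict


def build_index(items, partition_attr, sort_attr):
    """Regroup items into item collections keyed by the index's own keys."""
    index = defaultdict(list)
    for item in items:
        # Items missing an index key attribute are left out of the index
        if partition_attr not in item or sort_attr not in item:
            continue
        index[item[partition_attr]].append(item)
    for collection in index.values():
        collection.sort(key=lambda it: it[sort_attr])
    return dict(index)


orders = [
    {"PK": "USER#1", "SK": "ORDER#100", "status": "SHIPPED", "placed_at": "2024-01-03"},
    {"PK": "USER#2", "SK": "ORDER#101", "status": "PENDING", "placed_at": "2024-01-02"},
    {"PK": "USER#1", "SK": "ORDER#102", "status": "PENDING", "placed_at": "2024-01-01"},
    {"PK": "USER#1", "SK": "PROFILE"},  # no status, so absent from the index
]

# Base table answers "orders for a user"; the index answers "orders by status"
by_status = build_index(orders, "status", "placed_at")
pending = [o["SK"] for o in by_status["PENDING"]]
```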
Overloading keys/indexes
With DynamoDB, we can include different types of entities in a single table. This can make querying more efficient and helps overcome certain limitations.
Follow this link for an intuitive example of how to overload GSI key: Amazon DynamoDB Workshop & Labs
Data Modeling recommendations
- There is no Join operation in DynamoDB:
- Rather than reassembling your data at read time with a join, you should preassemble your data in the exact shape that is needed for a read operation.
- Data modeling in DynamoDB should be driven entirely by your data access patterns.
- Use primary key prefixes to distinguish between entity types. Get comfortable with denormalization and multiple entity types per table.
- Handle additional access patterns with secondary indexes and streams.
- DynamoDB is designed for OLTP use cases. Using it for OLAP queries (analytics purposes) should be avoided.
- Table prefixes can be used to distinguish between environments (master, beta, prod…).
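The prefix conventions above might be sketched as follows. The `USER#` and `ORDER#` prefixes, the `PK`/`SK` attribute names, and all helper names are hypothetical choices, not something DynamoDB mandates.

```python
def user_key(user_id):
    """Key for the user entity itself."""
    return {"PK": f"USER#{user_id}", "SK": f"USER#{user_id}"}


def order_key(user_id, order_id):
    """Orders share the user's partition key, so they land in the same item collection."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}"}


def entity_type(item):
    """Recover the entity type from the sort key prefix."""
    return item["SK"].split("#", 1)[0]


def table_name(environment, base_name):
    """Prefix the table name with the environment it belongs to."""
    return f"{environment}-{base_name}"


items = [user_key(42), order_key(42, "A1"), order_key(42, "A2")]
types = [entity_type(i) for i in items]
prod_table = table_name("prod", "app-data")
```

Because all three items share the partition key `USER#42`, a single query on that key can return the user together with their orders, which is the preassembled shape a join would otherwise produce.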
📬DynamoDB API Basics
- Item-based actions: GetItem, PutItem, UpdateItem, DeleteItem
- Single-item actions must include the entire primary key of the item being referenced.
- Queries:
- Used to retrieve multiple items with the same partition key.
- The conditions on the sort key can provide powerful filter capabilities on your table.
- Scans:
- Reads every item in the table; results are paginated for large tables.
- Mostly useful for small tables, or for exporting data.
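To show the difference between an item-based action and a query, here are request parameters for each, written in the low-level API format where every value carries its type (`"S"` for string). The table name, key names, and key values are hypothetical, and nothing is sent to AWS.

```python
def get_item_request(table, user_id, order_id):
    """GetItem needs the full primary key: partition key and sort key."""
    return {
        "TableName": table,
        "Key": {
            "PK": {"S": f"USER#{user_id}"},
            "SK": {"S": f"ORDER#{order_id}"},
        },
    }


def orders_query_request(table, user_id):
    """Query fixes the partition key and narrows results with a sort key condition."""
    return {
        "TableName": table,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"USER#{user_id}"},
            ":prefix": {"S": "ORDER#"},
        },
    }


get_req = get_item_request("Orders", 42, "A1")
query_req = orders_query_request("Orders", 42)
```

A Scan request would carry no key condition at all, which is why it has to read the whole table.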
✂️DynamoDB Limits
Item size limits
- A single DynamoDB item is limited to 400KB of data.
- Attribute names (column names) prefix the values in the item’s structure (key=value) and are included in the size limit, so we need to be aware of long column names.
- It is common to shorten attribute names to save storage.
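A rough way to see the effect of attribute names on item size is shown below. This is a simplified estimate that only counts UTF-8 bytes of names and string values; the actual sizing rules for numbers, sets, lists, and maps are more involved. The function name and sample attributes are hypothetical.

```python
ITEM_SIZE_LIMIT_BYTES = 400 * 1024


def approx_item_size(item):
    """Approximate size in bytes: attribute names plus string values."""
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        total += len(str(value).encode("utf-8"))
    return total


verbose = {"customer_email_address": "a@b.co", "order_status": "SHIPPED"}
compact = {"em": "a@b.co", "st": "SHIPPED"}

verbose_size = approx_item_size(verbose)
compact_size = approx_item_size(compact)
```

The values are identical in both items, so the whole difference comes from the attribute names, and it is paid again on every item in the table.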
Query and Scan request size limits
- Query and Scan will read a maximum of 1MB of data from your table. Further, this 1MB limit is applied before any filter expressions are considered.
- For bigger volumes of data, you will need to paginate through the results by making follow-up requests to DynamoDB.
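The follow-up requests work by passing the `LastEvaluatedKey` of one response as the `ExclusiveStartKey` of the next request. The loop below sketches this against any client object with a `query` method (boto3's DynamoDB client has one). `FakeClient` is a hypothetical stand-in so the example runs without AWS access.

```python
def query_all_pages(client, **params):
    """Collect items from every page of a Query."""
    items = []
    while True:
        response = client.query(**params)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages
        params["ExclusiveStartKey"] = last_key


class FakeClient:
    """Stand-in that serves pre-built pages instead of calling DynamoDB."""

    def __init__(self, pages):
        self._pages = pages
        self.calls = 0

    def query(self, **params):
        self.calls += 1
        page = params.get("ExclusiveStartKey", {"page": 0})["page"]
        response = {"Items": self._pages[page]}
        if page + 1 < len(self._pages):
            response["LastEvaluatedKey"] = {"page": page + 1}
        return response


client = FakeClient([[1, 2], [3], [4, 5]])
all_items = query_all_pages(client, TableName="Orders")
```

Loading every page into memory is only reasonable for small result sets; for larger ones, processing page by page is usually the safer design.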
Partition throughput limits
- A single partition can have a maximum of 3000 Read Capacity Units and 1000 Write Capacity Units.
- Capacity units are on a per-second basis, and these limits apply to a single partition, not the table as a whole.
💸Pricing Model
DynamoDB is priced directly based on the amount of workload capacity you need. You specify the throughput you want in terms of Read Capacity Units and Write Capacity Units:
- A Read Capacity Unit gives you a single strongly-consistent read per second or two eventually-consistent reads per second, up to 4KB in size.
- A Write Capacity Unit allows you to write a single item per second, up to 1KB in size.
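The arithmetic behind these definitions can be worked through with a short sketch. It assumes 1KB means 1024 bytes and that item sizes are rounded up to the next 4KB (reads) or 1KB (writes); the function names are hypothetical.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed for a steady rate of reads of items of a given size."""
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2  # eventually-consistent reads cost half
    return math.ceil(total)


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed for a steady rate of writes of items of a given size."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second


# 100 reads per second of 6KB items: each read rounds up to 2 units
strong_rcu = read_capacity_units(6 * 1024, 100)
eventual_rcu = read_capacity_units(6 * 1024, 100, strongly_consistent=False)

# 10 writes per second of 2.5KB items: each write rounds up to 3 units
wcu = write_capacity_units(2560, 10)
```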
There are several capacity modes for DynamoDB:
- On-demand capacity mode: you pay for what you use.
- If you don’t know your access patterns or don’t want to take the time to capacity plan your workload, you can use On-Demand Pricing from DynamoDB.
- It is recommended to start with On-Demand Pricing as you develop a baseline traffic level for your application.
- Tables can be switched to on-demand mode once every 24 hours.
- Provisioned capacity mode: you specify the number of reads and writes per second that you require for your application.
- A good option if you have predictable application traffic that is consistent or ramps gradually. Use this mode to forecast capacity requirements and control costs.
- If your application exceeds your provisioned throughput capacity on a table or index, it is subject to request throttling.
- Auto scaling: you define a range (upper and lower limits) for read and write capacity units. You also define a target utilization percentage within that range.
- Reserved capacity: you pay a one-time upfront fee and commit to a minimum provisioned usage level over a period of time (1 year generally). Your reserved capacity is billed at the hourly reserved capacity rate.
- By reserving your read and write capacity units ahead of time, you realize significant cost savings on your provisioned capacity costs.
⚠️ You can change the read/write capacity settings twice in a 24-hour period.
Tips for optimizing costs with DynamoDB
Great tips right here!
Sources:
- [DOCS] Amazon DynamoDB Developer Guide
- [DOCS] DynamoDB API Reference
- [DOCS] Read/write capacity mode
- [BOOK] The DynamoDB Book, by Alex DeBrie
- [LABS] Amazon DynamoDB Labs