Denormalization in DynamoDB
Coming from SQL, denormalization sounds like a sin — duplicated data, no single source of truth. In DynamoDB it's the whole point. There are no joins, so you copy related data onto the item that needs it and read it back in one shot.
What is denormalization in DynamoDB?
Denormalization in DynamoDB means copying related data onto the item that reads it, so a single query returns everything in one shot. Because DynamoDB has no joins, you pre-join at write time instead of stitching tables together at read time. The trade-off is staleness — only duplicate values that rarely change.
- No joins means you pre-join at write time. Store the related value on the item that reads it, so a query never needs a second lookup.
- Two flavors. Embed nested data in a complex attribute on one item, or duplicate a value across many items.
- The footgun is staleness. When the source changes, every copy is wrong until you fan out the update. Only duplicate values that rarely change.
- It buys reads, not writes. You trade more (and more careful) writes for cheap, single-request reads.
Why there are no joins to fall back on
A relational JOIN reassembles normalized rows at read time. DynamoDB has no
join — a Query reads one and hands back exactly what's stored
there. Nothing stitches two tables together for you.
So the data has to already be shaped for the read. If a screen needs a post and its author's name, that name must live somewhere the post read already touches. The 2007 Amazon Dynamo paper made this trade explicit: drop relational features to get predictable, single-digit-millisecond reads at scale.
Pattern 1 — embed with a complex attribute
DynamoDB attributes can hold nested maps and lists, not just scalars. So one common form of denormalization is stuffing a child object directly inside its parent item instead of giving it its own item.
A post with its tags and a small author snapshot, all on one item:
| PK | SK | author | tags |
|---|---|---|---|
| POST#9f3 | META | {id: U#12, name: "Mara Vance"} | ["dynamodb","aws"] |
One GetItem returns the post, the tags, and the author block together. No
second read. This is great for data that's owned by the parent and bounded in
size — a handful of tags, one author snapshot.
The limit to respect: a single DynamoDB item maxes out at 400 KB, attribute names and values included (Service Quotas). Embed an unbounded list (every comment on a viral post) and you'll blow past it.
Pattern 2 — duplicate a value across items
The blog case is the textbook one. You list posts and want each row to show the author's display name — but you don't want a second read per post to fetch it.
So you write the author's name onto each post item when the post is created:
| PK | SK | authorId | authorName | title |
|---|---|---|---|---|
| POST#9f3 | META | U#12 | "Mara Vance" | "Modeling 1:N" |
| POST#a71 | META | U#12 | "Mara Vance" | "Sparse GSIs" |
| POST#b04 | META | U#88 | "Lio Tan" | "Query vs Scan" |
Now Query PK begins_with "POST#" (or a over posts) renders the whole list —
title and author — with no per-row lookup. The author name is denormalized:
the canonical copy lives on USER#12, and every post carries its own copy.
The trade is right there. You've turned an N+1 read into one read, at the cost of
holding "Mara Vance" in N+1 places.
Embed vs. duplicate — which one
| Embed (complex attribute) | Duplicate (copy across items) | |
|---|---|---|
| Shape | child nested inside parent | same value on many items |
| Best for | bounded, parent-owned data | a shared value many items display |
| Read | one GetItem | one Query |
| Update cost | rewrite the one parent item | fan out to every copy |
| Size risk | 400 KB item cap | none per item |
Reach for embed when the child only ever appears with its parent. Reach for duplicate when many independent items need to show the same shared value.
The footgun: stale copies
Here's the part that bites. Mara renames herself to "Mara V." You update
USER#12. Every post item still says "Mara Vance" until you go fix them.
So updating a duplicated value is a fan-out write, not a one-liner. You query every affected item and rewrite each one — ideally guarded so you only touch rows that still hold the old value:
UPDATE POST#9f3
SET authorName = "Mara V."
WHERE authorName = "Mara Vance"
You can compose that conditional SET against authorName in the
Expression Builder and copy the generated
UpdateExpression and ConditionExpression straight into your code.
The fan-out itself is a write per item: query the author's posts, then issue the updates. The sequence:
The cost of duplicating data: every change to the source is a query plus a write per copy.
This is why the rule is only duplicate values that rarely change. A display name, a plan tier, a category label — fine. A live counter or a frequently edited field — don't; the fan-out will eat you alive.
When normalization still wins
If a value changes often, or one item is read by genuinely unpredictable patterns, keep it normalized and accept the extra read. Denormalization is an optimization for known, read-heavy access patterns — not a default to apply everywhere. Pre-join the reads you actually run, and leave the rest alone.
To decide where these duplicated attributes live, model the access patterns first — see single-table design and, for the read side of the trade, Query vs Scan.
Download DynoTable to inspect a denormalized table, spot which copies have drifted, and run the fan-out update against your own data.