Intermediate6 min read

Denormalisation in DynamoDB

Coming from SQL, denormalisation sounds like a sin — duplicated data, no single source of truth. In DynamoDB it's the whole point. There are no joins, so you copy related data onto the item that needs it and read it back in one shot.

What is denormalisation in DynamoDB?

Denormalisation in DynamoDB means copying related data onto the item that reads it, so a single query returns everything in one shot. Because DynamoDB has no joins, you pre-join at write time instead of stitching tables together at read time. The trade-off is staleness — only duplicate values that rarely change.

  • No joins means you pre-join at write time. Store the related value on the item that reads it, so a query never needs a second lookup.
  • Two flavors. Embed nested data in a complex attribute on one item, or duplicate a value across many items.
  • The footgun is staleness. When the source changes, every copy is wrong until you fan out the update. Only duplicate values that rarely change.
  • It buys reads, not writes. You trade more (and more careful) writes for cheap, single-request reads.

Why there are no joins to fall back on

A relational JOIN reassembles normalised rows at read time. DynamoDB has no join — a Query reads one item collection and hands back exactly what's stored there. Nothing stitches two tables together for you.

So the data has to already be shaped for the read. If a screen needs a post and its author's name, that name must live somewhere the post read already touches. The 2007 Amazon Dynamo paper made this trade explicit: drop relational features to get predictable, single-digit-millisecond reads at scale.

Pattern 1 — embed with a complex attribute

DynamoDB attributes can hold nested maps and lists, not just scalars. So one common form of denormalisation is stuffing a child object directly inside its parent item instead of giving it its own item.

A post with its tags and a small author snapshot, all on one item:

PKSKauthortags
POST#9f3META{id: U#12, name: "Mara Vance"}["dynamodb","aws"]

One GetItem returns the post, the tags, and the author block together. No second read. This is great for data that's owned by the parent and bounded in size — a handful of tags, one author snapshot.

The limit to respect: a single DynamoDB item maxes out at 400 KB, attribute names and values included (Service Quotas). Embed an unbounded list (every comment on a viral post) and you'll blow past it.

Pattern 2 — duplicate a value across items

The blog case is the textbook one. You list posts and want each row to show the author's display name — but you don't want a second read per post to fetch it.

So you write the author's name onto each post item when the post is created:

PKSKauthorIdauthorNametitle
POST#9f3METAU#12"Mara Vance""Modeling 1:N"
POST#a71METAU#12"Mara Vance""Sparse GSIs"
POST#b04METAU#88"Lio Tan""Query vs Scan"

Now Query PK begins_with "POST#" (or a GSI over posts) renders the whole list — title and author — with no per-row lookup. The author name is denormalised: the canonical copy lives on USER#12, and every post carries its own copy.

The trade is right there. You've turned an N+1 read into one read, at the cost of holding "Mara Vance" in N+1 places.

Embed vs. duplicate — which one

Embed (complex attribute)Duplicate (copy across items)
Shapechild nested inside parentsame value on many items
Best forbounded, parent-owned dataa shared value many items display
Readone GetItemone Query
Update costrewrite the one parent itemfan out to every copy
Size risk400 KB item capnone per item

Reach for embed when the child only ever appears with its parent. Reach for duplicate when many independent items need to show the same shared value.

The footgun: stale copies

Here's the part that bites. Mara renames herself to "Mara V." You update USER#12. Every post item still says "Mara Vance" until you go fix them.

So updating a duplicated value is a fan-out write, not a one-liner. You query every affected item and rewrite each one — ideally guarded so you only touch rows that still hold the old value:

UPDATE POST#9f3
SET authorName = "Mara V."
WHERE authorName = "Mara Vance"

You can compose that conditional SET against authorName in the Expression Builder and copy the generated UpdateExpression and ConditionExpression straight into your code.

The fan-out itself is a write per item: query the author's posts, then issue the updates. The sequence:

"DynamoDB"App"DynamoDB"App"Update USER"Query author's posts""POST"Update each authorName"

The cost of duplicating data: every change to the source is a query plus a write per copy.

This is why the rule is only duplicate values that rarely change. A display name, a plan tier, a category label — fine. A live counter or a frequently edited field — don't; the fan-out will eat you alive.

When normalisation still wins

If a value changes often, or one item is read by genuinely unpredictable patterns, keep it normalised and accept the extra read. Denormalisation is an optimisation for known, read-heavy access patterns — not a default to apply everywhere. Pre-join the reads you actually run, and leave the rest alone.

To decide where these duplicated attributes live, model the access patterns first — see single-table design and, for the read side of the trade, Query vs Scan.

Download DynoTable to inspect a denormalised table, spot which copies have drifted, and run the fan-out update against your own data.

Updated