Why a DynamoDB Scan Is Slow and Expensive
A Scan reads every item in the table and only filters afterward. It is
the operation you reach for out of SQL muscle memory, and the one that quietly
runs up your bill while making your latency worse than the RDS box you left.
Why is my DynamoDB Scan slow and expensive?
A Scan reads every item in the table before the FilterExpression runs, so
you pay to read the whole table no matter how few rows come back, and it gets
slower as the table grows. The fix is almost always a keyed Query — model the
access pattern around a key so DynamoDB touches one partition instead of
everything.
- A Scan reads the whole table, every time. Size, not your result count, decides what you pay and how long it takes.
- The FilterExpression is a lie about cost. It runs after the read is metered, so returning 12 items can bill for reading 12 million.
- A Scan gets slower as you grow. A keyed Query stays flat: it touches one partition no matter how big the table gets.
- The fix is almost always modelling, not tuning. If you Scan to answer a routine question, you are missing a key.
What a Scan actually does
Coming from SQL, SELECT * FROM events WHERE type = 'checkout' feels free —
the engine has an index, or it doesn't, but either way you get rows back. In
DynamoDB there is no query planner deciding that for you.
A Scan walks the entire table sequentially, 1 MB at a time, and hands each
page to your FilterExpression. Whatever the filter rejects is still read,
still metered, and still on your bill. (AWS: Scanning tables)
That is the trap. The filter looks like a WHERE clause, but it changes the
result set, never the cost. A Scan consumes the same read capacity whether or
not a filter is present. (AWS: Scanning tables)
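To make the "filter after metering" point concrete, here is a toy in-memory model of that behaviour. It is only a sketch: `simulated_scan`, the `size_bytes` field, and the sample table are hypothetical stand-ins, not DynamoDB APIs.

```python
from typing import Callable, List, Optional, Tuple


def simulated_scan(
    table: List[dict],
    keep: Optional[Callable[[dict], bool]] = None,
) -> Tuple[List[dict], int]:
    """Toy model of a Scan: read every item, then apply the filter.

    Returns (matching_items, bytes_read); bytes_read is what would be metered.
    """
    bytes_read = 0
    matches = []
    for item in table:
        bytes_read += item["size_bytes"]  # counted before the filter runs
        if keep is None or keep(item):
            matches.append(item)
    return matches, bytes_read


# Hypothetical table: 1,000 items of 1 KB, one checkout per 100 events.
table = [
    {"eventType": "checkout" if i % 100 == 0 else "page_view", "size_bytes": 1024}
    for i in range(1_000)
]

all_items, cost_unfiltered = simulated_scan(table)
checkouts, cost_filtered = simulated_scan(
    table, lambda item: item["eventType"] == "checkout"
)

# The filter shrinks the result set but not the bytes read.
print(len(all_items), len(checkouts), cost_unfiltered == cost_filtered)
```

The filtered call returns far fewer items, yet both calls report the same number of bytes read, which is the quantity the bill follows.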
Count the read units
DynamoDB meters reads in read capacity units (RCUs). One RCU buys a single strongly consistent read of an item up to 4 KB; eventually consistent reads cost half that. Bigger items round up to the next 4 KB. (AWS: Read/write capacity mode)
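The rounding rule above can be written as a small helper. This is a sketch of the arithmetic only; `read_capacity_units` is a hypothetical name, not part of any AWS SDK.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one RCU covers up to 4 KB, strongly consistent


def read_capacity_units(size_bytes: int, strongly_consistent: bool = True) -> float:
    """RCUs for reading size_bytes, rounding up to the next 4 KB block."""
    blocks = math.ceil(size_bytes / RCU_BLOCK_BYTES)
    # Eventually consistent reads cost half as much per block.
    return blocks * (1.0 if strongly_consistent else 0.5)


print(read_capacity_units(1024))                             # 1 KB item, strong
print(read_capacity_units(1024, strongly_consistent=False))  # same item, eventual
print(read_capacity_units(4097))                             # just over 4 KB rounds up
```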
Take an analytics table, ProductEvents. Each row is one tracked event:
PK = "TENANT#acme"
SK = "TS#2026-06-23T14:08:55Z#evt_9f3a"
attrs: eventType, sessionId, userId, payloadBytes
Say it holds 2,000,000 events, each ~1 KB, all under one busy tenant. You want today's checkouts. The reflexive move:
Scan ProductEvents
FilterExpression: eventType = "checkout"
That filter might return 40 rows, but the Scan read all 2,000,000 items
first. At ~1 KB each, that is about 2 GB of data. A strongly consistent read
costs 1 RCU per 4 KB and an eventually consistent read 0.5 RCU per 4 KB, so
even with eventually consistent reads you metered roughly 250,000 RCUs, and
paged through ~2 GB of data, to hand back 40 items.
Now model the access pattern as a key and Query it instead:
Query ProductEvents
PK = "TENANT#acme"
AND SK begins_with "TS#2026-06-23"
This reads only the matched slice of one partition. If those 40 checkout rows plus the day's other events come to ~2 MB, you pay for ~2 MB of reads, not the ~2 GB the full table holds. Same answer, a tiny fraction of the cost, and the latency stays flat as the table grows.
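The two totals can be checked with a few lines of arithmetic. This is a rough sketch that assumes eventually consistent reads, takes 1 KB as 1,024 bytes, rounds on the total bytes read, and ignores per-page rounding; the helper name is hypothetical.

```python
KB = 1024
RCU_BLOCK_BYTES = 4 * KB  # one strongly consistent RCU covers up to 4 KB


def eventually_consistent_rcus(total_bytes: int) -> float:
    """RCUs for reading total_bytes with eventually consistent reads."""
    blocks = -(-total_bytes // RCU_BLOCK_BYTES)  # ceiling division
    return blocks * 0.5


scan_bytes = 2_000_000 * 1 * KB  # every item in ProductEvents, ~1 KB each
query_bytes = 2 * 1024 * KB      # the ~2 MB slice for one tenant and one day

scan_rcus = eventually_consistent_rcus(scan_bytes)
query_rcus = eventually_consistent_rcus(query_bytes)

print(f"Scan:  {scan_rcus:,.0f} RCUs")
print(f"Query: {query_rcus:,.0f} RCUs")
print(f"Scan costs about {scan_rcus / query_rcus:,.0f}x more")
```

Under these assumptions the Scan comes to 250,000 RCUs and the Query to 256, a gap of roughly three orders of magnitude for the same 40 rows.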
Scan vs Query, metered
| | Scan + filter | Keyed Query |
|---|---|---|
| Reads | Every item in the table | One partition, narrowed by SK |
| Billed capacity | Whole table, before the filter | Only the items in your slice |
| Our example | ~250,000 RCUs (~2 GB) | a few hundred RCUs (~2 MB) |
| Latency | Grows with table size | Flat as the table grows |
| Result count | Decides nothing about cost | Matches what you pay for |
The lesson the table encodes: on a Scan, your result count and your bill are
unrelated. On a Query, they track each other.
Decide before you Scan
Most accidental Scans come from one question: can I name the partition I
need? If yes, it is a Query. If no, the fix is a key, not a bigger filter.
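Here is the decision in flow form: if you can name the partition, Query the
base table; if you can't, but a key or index could be added to fit the
pattern, add it and Query that; only when neither applies do you Scan.
The path almost always ends at Query; you only fall through to Scan when no
key, present or addable, fits the access pattern.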
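As a sketch, the same decision can be written as a tiny function. The function and its two boolean inputs are hypothetical, a way of stating the rule rather than anything DynamoDB provides.

```python
def choose_read_operation(can_name_partition: bool, can_add_key_or_index: bool) -> str:
    """Pick a read strategy for one access pattern."""
    if can_name_partition:
        # An existing key already covers the pattern.
        return "Query"
    if can_add_key_or_index:
        # A recurring pattern the base table can't key: model it, then Query.
        return "Add a key or GSI, then Query"
    # Nothing keyed fits: you really do mean "read everything".
    return "Scan"


print(choose_read_operation(True, False))
print(choose_read_operation(False, True))
print(choose_read_operation(False, False))
```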
If the pattern is real and recurring but the base table can't key it, that is
the signal to add a Global Secondary Index so the question
becomes a Query. Modelling your keys around your access patterns up front is
the whole game — see single-table design.
Write the keyed query, not a filter
When you do need a condition beyond the key, build it deliberately rather than
dumping everything into a FilterExpression. The
DynamoDB Expression Builder generates the
KeyConditionExpression and attribute placeholders for you, so the partition
and sort key do the narrowing — before DynamoDB meters the read, not after.
KeyConditionExpression: PK = :tenant AND begins_with(SK, :day)
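For illustration, here is how that key condition might be assembled as request parameters for the low-level boto3 client. It is a sketch, not the Expression Builder's output: `build_day_query` is a hypothetical helper, the key formats follow the ProductEvents example above, and the block only builds the dictionary without calling AWS.

```python
from typing import Optional


def build_day_query(table_name: str, tenant: str, day: str,
                    event_type: Optional[str] = None) -> dict:
    """Build Query parameters for one tenant and one day of events."""
    params = {
        "TableName": table_name,
        # The key condition narrows the read before it is metered.
        "KeyConditionExpression": "PK = :tenant AND begins_with(SK, :day)",
        "ExpressionAttributeValues": {
            ":tenant": {"S": f"TENANT#{tenant}"},
            ":day": {"S": f"TS#{day}"},
        },
        # Ask DynamoDB to report what the request actually consumed.
        "ReturnConsumedCapacity": "TOTAL",
    }
    if event_type is not None:
        # Optional filter: trims the result set, not the metered read.
        params["FilterExpression"] = "eventType = :type"
        params["ExpressionAttributeValues"][":type"] = {"S": event_type}
    return params


params = build_day_query("ProductEvents", "acme", "2026-06-23", "checkout")
print(params["KeyConditionExpression"])
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").query(**params)
```

The filter still applies only after the read, but here it trims one day of one tenant rather than the whole table.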
When a Scan is actually fine
A Scan isn't forbidden — it's just the wrong default. It's the right tool when
you genuinely mean "read everything":
- One-off exports or backfills run by hand.
- Tiny config / lookup tables where the whole table is a few KB.
- Background jobs that page the full table on purpose. Split those across workers with Segment/TotalSegments (a parallel scan) instead of one long sequential crawl. (AWS: Scanning tables)
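A parallel scan hands each worker its own Segment out of TotalSegments. The sketch below only builds the per-worker request parameters; `parallel_scan_requests` is a hypothetical helper, and each worker would still page through its segment with LastEvaluatedKey and ExclusiveStartKey.

```python
from typing import List


def parallel_scan_requests(table_name: str, total_segments: int) -> List[dict]:
    """Return one Scan parameter set per worker, covering disjoint segments."""
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {
            "TableName": table_name,
            "Segment": segment,               # zero-based index of this worker
            "TotalSegments": total_segments,  # same value for every worker
        }
        for segment in range(total_segments)
    ]


requests = parallel_scan_requests("ProductEvents", 4)
for request in requests:
    print(request)
# With boto3, each worker could run: boto3.client("dynamodb").scan(**request)
```

Splitting the scan shortens the wall-clock time but not the total capacity consumed, since every item is still read once.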
And note PartiQL doesn't save you: SELECT * FROM ProductEvents WHERE eventType = 'checkout' with no key predicate compiles straight to a Scan.
It's the same footgun in SQL clothing. (See Query vs Scan
for the full breakdown.)
When you truly need cross-item analytics — a GROUP BY, a JOIN, an aggregate
DynamoDB can't express — DynoTable's SQL Workbench runs them client-side over a
bounded result set, instead of hammering the table with a full Scan.
Next steps
Estimate what either pattern costs with the capacity calculator, read Query vs Scan for the API-level contrast, and download DynoTable to run these against your own tables and watch the consumed capacity for yourself.