
Why a DynamoDB Scan Is Slow and Expensive

A Scan reads every item in the table and only filters afterward. It is the operation you reach for out of SQL muscle memory, and the one that quietly runs up your bill while making your latency worse than the RDS box you left.

Why is my DynamoDB Scan slow and expensive?

A Scan reads every item in the table before the FilterExpression runs, so you pay to read the whole table no matter how few rows come back, and it gets slower as the table grows. The fix is almost always a keyed Query — model the access pattern around a key so DynamoDB touches one partition instead of everything.

  • A Scan reads the whole table, every time. Size, not your result count, decides what you pay and how long it takes.
  • The FilterExpression is a lie about cost. It runs after the read is metered, so returning 12 items can bill for reading 12 million.
  • A Scan gets slower as you grow. A keyed Query stays flat — it touches one partition no matter how big the table gets.
  • The fix is almost always modelling, not tuning. If you Scan to answer a routine question, you are missing a key.

What a Scan actually does

Coming from SQL, SELECT * FROM events WHERE type = 'checkout' feels free — the engine has an index, or it doesn't, but either way you get rows back. In DynamoDB there is no query planner deciding that for you.

A Scan walks the entire table sequentially, 1 MB at a time, and hands each page to your FilterExpression. Whatever the filter rejects is still read, still metered, and still on your bill. (AWS: Scanning tables)

That is the trap. The filter looks like a WHERE clause, but it changes the result set, never the cost. A Scan consumes the same read capacity whether or not a filter is present. (AWS: Scanning tables)
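To make the read-then-filter order concrete, here is a toy in-memory model. It is not DynamoDB and makes no API calls; `toy_scan` and `ScanResult` are hypothetical names invented for this sketch. The two counters loosely mirror the `Count` and `ScannedCount` fields that a real Scan response reports.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    count: int          # items that passed the filter
    scanned_count: int  # items read before the filter ran
    bytes_read: int     # data that was read, and therefore metered


def toy_scan(items, item_bytes, keep=lambda item: True):
    """Read every item first, then apply the filter to what was read."""
    scanned = 0
    matched = 0
    for item in items:
        scanned += 1          # the read happens regardless of the filter
        if keep(item):
            matched += 1      # the filter only shapes the result set
    return ScanResult(matched, scanned, scanned * item_bytes)


# A scaled-down table: 200,000 events of ~1 KB, 40 of them checkouts.
events = [
    {"eventType": "checkout" if i % 5_000 == 0 else "page_view"}
    for i in range(200_000)
]

filtered = toy_scan(events, 1024, lambda e: e["eventType"] == "checkout")
unfiltered = toy_scan(events, 1024)

print(filtered.count, filtered.scanned_count)      # 40 200000
print(filtered.bytes_read == unfiltered.bytes_read)  # True
```

The filtered and unfiltered runs read exactly the same number of bytes; only the number of items handed back differs.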

Count the read units

DynamoDB meters reads in read capacity units (RCUs). One RCU buys a single strongly consistent read of an item up to 4 KB; eventually consistent reads cost half that. Bigger items round up to the next 4 KB. (AWS: Read/write capacity mode)
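The rounding rule is easy to check with a few lines of arithmetic. This is a minimal sketch of the per-item rule described above, not an AWS API; `read_units` is a hypothetical helper name.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one RCU covers up to 4 KB


def read_units(item_bytes: int, strongly_consistent: bool = True) -> float:
    """RCUs for reading one item: round up to 4 KB blocks, halve if eventually consistent."""
    if item_bytes <= 0:
        raise ValueError("item_bytes must be positive")
    blocks = math.ceil(item_bytes / RCU_BLOCK_BYTES)
    return float(blocks) if strongly_consistent else blocks / 2


print(read_units(1024))          # 1.0  (a 1 KB item still costs a full block)
print(read_units(1024, False))   # 0.5
print(read_units(5 * 1024))      # 2.0  (5 KB rounds up to 8 KB)
```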

Take an analytics table, ProductEvents. Each row is one tracked event:

PK  = "TENANT#acme"
SK  = "TS#2026-06-23T14:08:55Z#evt_9f3a"
attrs: eventType, sessionId, userId, payloadBytes

Say it holds 2,000,000 events, each ~1 KB, all under one busy tenant. You want today's checkouts. The reflexive move:

Scan ProductEvents
FilterExpression: eventType = "checkout"

That filter might return 40 rows. But the Scan read all 2,000,000 items first. At ~1 KB each, that is ~2 GB of data. Reads are metered at 1 RCU per 4 KB, or 0.5 RCU per 4 KB when eventually consistent (the Scan default), so you consumed roughly 250,000 RCUs, and paged through ~2 GB in 1 MB pages, to hand back 40 items.

Now model the access pattern as a key and Query it instead:

Query ProductEvents
PK = "TENANT#acme"
AND SK begins_with "TS#2026-06-23"

This reads only the matched slice of one partition. If those 40 checkout rows plus the day's other events come to ~2 MB, you pay for ~2 MB of reads, not ~2 GB. A FilterExpression on eventType can then trim that small slice down to the checkouts. Same answer, a tiny fraction of the cost, and the latency stays flat as the table grows.
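The two costs can be compared with back-of-the-envelope arithmetic. The sketch below assumes eventually consistent reads and treats each operation as one lump of data; real requests round up per page, so actual consumed capacity will differ slightly. `eventually_consistent_rcus` is a hypothetical helper, not part of any SDK.

```python
import math

BLOCK_BYTES = 4 * 1024  # reads are metered in 4 KB units


def eventually_consistent_rcus(bytes_read: int) -> float:
    """Approximate RCUs for reading `bytes_read` with eventual consistency."""
    return math.ceil(bytes_read / BLOCK_BYTES) * 0.5


scan_bytes = 2_000_000 * 1024    # 2,000,000 items of ~1 KB, about 2 GB
query_bytes = 2 * 1024 * 1024    # the one-day slice, about 2 MB

scan_rcus = eventually_consistent_rcus(scan_bytes)
query_rcus = eventually_consistent_rcus(query_bytes)

print(scan_rcus)               # 250000.0
print(query_rcus)              # 256.0
print(scan_rcus / query_rcus)  # 976.5625
```

Under these assumptions the Scan costs roughly a thousand times what the keyed Query does, for the same 40 rows.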

Scan vs Query, metered

                 | Scan + filter                  | Keyed Query
Reads            | Every item in the table        | One partition, narrowed by SK
Billed capacity  | Whole table, before the filter | Only the items in your slice
Our example      | ~250,000 RCUs (~2 GB)          | a few hundred RCUs (~2 MB)
Latency          | Grows with table size          | Flat as the table grows
Result count     | Decides nothing about cost     | Matches what you pay for

The lesson the table encodes: on a Scan, your result count and your bill are unrelated. On a Query, they track each other.

Decide before you Scan

Most accidental Scans come from one question: can I name the partition I need? If yes, it is a Query. If no, the fix is a key, not a bigger filter.

Here is the decision in flow form.

Need to read items
  Know the partition key?
    Yes: Query one partition
    No: Can a GSI key it?
      Yes: Add a GSI, then Query
      No: Scan (last resort)

The path almost always ends at Query; you only fall through to Scan when no key — present or addable — fits the access pattern.
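The same decision reads naturally as a small function. This is only an illustration of the flow above; `choose_read_operation` and its parameter names are made up for the sketch.

```python
def choose_read_operation(knows_partition_key: bool, gsi_can_key_it: bool) -> str:
    """Walk the decision flow: Query if a key exists or can be added, else Scan."""
    if knows_partition_key:
        return "Query one partition"
    if gsi_can_key_it:
        return "Add a GSI, then Query"
    return "Scan (last resort)"


print(choose_read_operation(True, False))   # Query one partition
print(choose_read_operation(False, True))   # Add a GSI, then Query
print(choose_read_operation(False, False))  # Scan (last resort)
```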

If the pattern is real and recurring but the base table can't key it, that is the signal to add a Global Secondary Index so the question becomes a Query. Modelling your keys around your access patterns up front is the whole game — see single-table design.

Write the keyed query, not a filter

When you do need a condition beyond the key, build it deliberately rather than dumping everything into a FilterExpression. The DynamoDB Expression Builder generates the KeyConditionExpression and attribute placeholders for you, so the partition and sort key do the narrowing — before DynamoDB meters the read, not after.

KeyConditionExpression: PK = :tenant AND begins_with(SK, :day)
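As a sketch of how that expression might be sent from code, the function below builds the parameters for a low-level Query request. It assumes the `ProductEvents` table and key layout from the example; `build_day_query` is a hypothetical helper, and the dict is shaped for boto3's `client.query(**params)`, which is not called here.

```python
from typing import Optional


def build_day_query(tenant: str, day: str, event_type: Optional[str] = None) -> dict:
    """Build Query parameters: the keys narrow the read, the filter only trims the result."""
    params = {
        "TableName": "ProductEvents",
        # Evaluated before the read is metered.
        "KeyConditionExpression": "PK = :tenant AND begins_with(SK, :day)",
        "ExpressionAttributeValues": {
            ":tenant": {"S": f"TENANT#{tenant}"},
            ":day": {"S": f"TS#{day}"},
        },
        # Ask DynamoDB to report what the request consumed.
        "ReturnConsumedCapacity": "TOTAL",
    }
    if event_type is not None:
        # Evaluated after the read, so it applies only to the one-day slice.
        params["FilterExpression"] = "eventType = :type"
        params["ExpressionAttributeValues"][":type"] = {"S": event_type}
    return params


params = build_day_query("acme", "2026-06-23", event_type="checkout")
print(params["KeyConditionExpression"])
# With boto3 installed and credentials configured, one could then run:
#   boto3.client("dynamodb").query(**params)
```

The filter is still billed for everything the key condition selects, but here that is one day of one tenant rather than the whole table.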

When a Scan is actually fine

A Scan isn't forbidden — it's just the wrong default. It's the right tool when you genuinely mean "read everything":

  • One-off exports or backfills run by hand.
  • Tiny config / lookup tables where the whole table is a few KB.
  • Background jobs that page the full table on purpose. Split those across workers with Segment / TotalSegments — a parallel scan — instead of one long sequential crawl. (AWS: Scanning tables)
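For the parallel scan case, each worker gets its own `Segment` number out of a shared `TotalSegments`. The sketch below only builds the per-worker request parameters; `parallel_scan_requests` is a hypothetical helper, and each dict is shaped for boto3's `client.scan(**request)`, which is not called here.

```python
from typing import Dict, List


def parallel_scan_requests(table_name: str, total_segments: int) -> List[Dict]:
    """One Scan request per worker; segments are numbered from 0 to total_segments - 1."""
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {
            "TableName": table_name,
            "Segment": segment,
            "TotalSegments": total_segments,
        }
        for segment in range(total_segments)
    ]


requests = parallel_scan_requests("ProductEvents", 4)
print([r["Segment"] for r in requests])  # [0, 1, 2, 3]
```

Each worker would then page through its own segment, passing the `LastEvaluatedKey` from one response as the `ExclusiveStartKey` of the next. A parallel scan finishes sooner but still reads, and bills for, the whole table.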

And note PartiQL doesn't save you: SELECT * FROM ProductEvents WHERE eventType = 'checkout' with no key predicate compiles straight to a Scan. It's the same footgun in SQL clothing. (See Query vs Scan for the full breakdown.)
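The contrast is visible in the statements themselves. The sketch below builds parameters shaped for boto3's `client.execute_statement(**request)` without calling it; `partiql_day_request` is a hypothetical helper, and the key layout is the one assumed in the example table.

```python
# No key predicate: DynamoDB has to read the whole table to answer this.
SCAN_SHAPED = "SELECT * FROM \"ProductEvents\" WHERE eventType = 'checkout'"

# Partition key equality plus a sort key prefix: this can be served like a Query.
QUERY_SHAPED = 'SELECT * FROM "ProductEvents" WHERE PK = ? AND begins_with(SK, ?)'


def partiql_day_request(tenant: str, day: str) -> dict:
    """Build ExecuteStatement parameters for the keyed statement."""
    return {
        "Statement": QUERY_SHAPED,
        "Parameters": [
            {"S": f"TENANT#{tenant}"},
            {"S": f"TS#{day}"},
        ],
    }


request = partiql_day_request("acme", "2026-06-23")
print(request["Statement"])
```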

When you truly need cross-item analytics — a GROUP BY, a JOIN, an aggregate DynamoDB can't express — DynoTable's SQL Workbench runs them client-side over a bounded result set, instead of hammering the table with a full Scan.

Next steps

Estimate what either pattern costs with the capacity calculator, read Query vs Scan for the API-level contrast, and download DynoTable to run these against your own tables and watch the consumed capacity for yourself.
