DynamoDB Filtering Strategies
"Filtering" in DynamoDB means four different things wearing the same word. Three
narrow the data before it's read and billed; one — the one named Filter —
narrows it after. Knowing which is which is most of the skill.
How does filtering work in DynamoDB?
DynamoDB has four ways to filter, and only one runs after you're billed. The partition key picks a partition, the sort key narrows a slice, and a sparse index filters by attribute presence — all three cut your read cost before metering. A FilterExpression runs after the read, so it shrinks the response but never the bill.
- Partition key is the cheapest filter: it picks the partition, so you never touch the rest of the table.
- Sort key filters within a partition with
begins_with,between,<,>— still before billing, still cheap. - Sparse index filters by absence: an item only appears in the index if it has the indexed attribute, so the index is the filtered set.
FilterExpressionis the trap: it runs after DynamoDB meters the read, so it cuts your response size but never your bill.
Set up the example
A product catalogue. One table, partition key PK, sort key SK:
PK = "DEPT#kitchen" SK = "PROD#00194"
Every product also carries price, inStock (a boolean), and clearanceAt
(a unix timestamp, present only on items marked for clearance). Items in a
department share a partition, sorted by product id.
We want four access patterns. Each one maps to a different filtering strategy —
and the wrong choice on any of them is a Scan you'll pay for forever.
Filter by partition key
"Give me every product in kitchen." The partition key answers this directly:
Query PK = "DEPT#kitchen"
DynamoDB reads exactly one partition. Nothing else in the table is touched or
billed. This is the only filter that's free in the sense that matters — it's the
difference between Query and Scan.
Coming from SQL, this feels backwards: there's no WHERE department = 'kitchen'
scanning an index, you just name the partition. If you can't name it, that's a
modelling problem, not a query problem.
Filter by sort key
"Give me kitchen products from PROD#00100 up." The sort key narrows inside
the partition, and it does so before the read is metered:
Query PK = "DEPT#kitchen" AND SK between "PROD#00100" AND "PROD#00200"
Sort-key conditions are limited on purpose: =, <, <=, >, >=,
between, and begins_with. No OR, no arbitrary predicate.
That constraint is what keeps the read targeted — DynamoDB walks a contiguous slice, not the whole partition.
The lever here is how you encode the sort key. If your pattern is "by price
band", a PROD#<id> sort key won't help — you'd bake price into the key.
That's a sort-key strategy decision, made at design time, not query time.
Filter by sparse index
"Give me everything currently on clearance." Most products aren't, so you don't want to read the catalogue to find the few that are.
A sparse index solves this by absence. A Global Secondary Index only contains an item if that item has both of the index's key attributes.
Set the GSI partition key to clearanceAt — present only on clearance items —
and the index holds nothing else.
AWS spells this out: a GSI "only contains items that have the indexed attributes," so items missing the key attribute are simply not propagated (AWS — Take advantage of sparse indexes).
Now the query reads only the clearance items, billed only for them:
Query ON ClearanceIndex GSI_PK = "CLEARANCE" (sorted by clearanceAt)
The filter happened when you wrote the data — by choosing whether to set
clearanceAt at all. The index is the filtered set. See
GSI vs LSI for which index type fits.
Filter with FilterExpression
"Give me kitchen products that are in stock." inStock isn't a key attribute,
so you reach for a FilterExpression:
Query PK = "DEPT#kitchen"
Filter inStock = true
Here's the trap. DynamoDB reads every item in the kitchen partition, meters
the capacity for all of them, and then drops the out-of-stock ones.
The official rule: a filter expression is "applied after a Query finishes, but
before the results are returned," and "does not consume any additional read
capacity units" — you already paid for the full read
(AWS — Filter expressions for Query).
So if kitchen has 10,000 products and 12 are in stock, you pay to read 10,000.
The response is small; the bill is not. FilterExpression shrinks the payload
crossing the wire, never the read.
There's a second, sharper edge: pagination is metered before filtering. A page is 1 MB of read items, not 1 MB of matches.
A filter can return an empty page with a LastEvaluatedKey set — DynamoDB read a
full megabyte, matched nothing, handed you an empty array. You keep paging, and
you paid for every empty page.
Build the expression — names, values, and the right reserved-word escaping — with
the DynamoDB Expression Builder so the
#inStock/:val placeholders are correct on the first try.
Compare the four
| When it filters | Cuts read cost? | Predicate power | Cost to set up | |
|---|---|---|---|---|
| Partition key | Before read | Yes — one partition | Equality only | Free (it's the key) |
| Sort key | Before read | Yes — a slice | Range / begins_with | Sort-key design |
| Sparse index | Before read | Yes — index-only | Presence of an attribute | Extra GSI + write cost |
| FilterExpression | After read | No | Almost any condition | None |
Read the table top to bottom: predicate power goes up, cost control goes
down. FilterExpression can express anything precisely because it runs on
already-read items — that's the same reason it can't save you money.
See it in DynoTable
When you run a Query with a filter, the gap between items read and items
returned is the whole story. DynoTable surfaces consumed capacity next to the
result count — so a filter quietly reading the whole partition is visible, not
hiding in your monthly bill.
For genuine cross-item questions a filter can't answer — "average price per
department", "products in stock joined to their reviews" — DynoTable's SQL
Workbench runs GROUP BY, JOIN, and aggregates client-side over a bounded
result set, instead of compiling to a table-wide Scan.
Pitfalls and next steps
- Don't use
FilterExpressionas your primary access path. If a pattern is common, model it into a key or a sparse index. A filter is for the last little bit of narrowing, not the bulk of it. - Watch empty pages. A filtered query can page for a long time returning
nothing. Honour
LastEvaluatedKey; don't assume an empty page means "done". - A sparse index isn't free. It costs write capacity and storage for every item that lands in it — cheap when the attribute is rare, less so when it isn't.
Estimate what a filtered read will actually cost with the capacity calculator, and try DynoTable to watch consumed capacity against returned rows on your own tables.