Intermediate7 min read

DynamoDB Filtering Strategies

"Filtering" in DynamoDB means four different things wearing the same word. Three narrow the data before it's read and billed; one — the one named Filter — narrows it after. Knowing which is which is most of the skill.

How does filtering work in DynamoDB?

DynamoDB has four ways to filter, and only one runs after you're billed. The partition key picks a partition, the sort key narrows a slice, and a sparse index filters by attribute presence — all three cut your read cost before metering. A FilterExpression runs after the read, so it shrinks the response but never the bill.

  • Partition key is the cheapest filter: it picks the partition, so you never touch the rest of the table.
  • Sort key filters within a partition with begins_with, between, <, > — still before billing, still cheap.
  • Sparse index filters by absence: an item only appears in the index if it has the indexed attribute, so the index is the filtered set.
  • FilterExpression is the trap: it runs after DynamoDB meters the read, so it cuts your response size but never your bill.

Set up the example

A product catalogue. One table, partition key PK, sort key SK:

PK = "DEPT#kitchen"   SK = "PROD#00194"

Every product also carries price, inStock (a boolean), and clearanceAt (a unix timestamp, present only on items marked for clearance). Items in a department share a partition, sorted by product id.

We want four access patterns. Each one maps to a different filtering strategy — and the wrong choice on any of them is a Scan you'll pay for forever.

Filter by partition key

"Give me every product in kitchen." The partition key answers this directly:

Query  PK = "DEPT#kitchen"

DynamoDB reads exactly one partition. Nothing else in the table is touched or billed. This is the only filter that's free in the sense that matters — it's the difference between Query and Scan.

Coming from SQL, this feels backwards: there's no WHERE department = 'kitchen' scanning an index, you just name the partition. If you can't name it, that's a modelling problem, not a query problem.

Filter by sort key

"Give me kitchen products from PROD#00100 up." The sort key narrows inside the partition, and it does so before the read is metered:

Query  PK = "DEPT#kitchen"  AND  SK between "PROD#00100" AND "PROD#00200"

Sort-key conditions are limited on purpose: =, <, <=, >, >=, between, and begins_with. No OR, no arbitrary predicate.

That constraint is what keeps the read targeted — DynamoDB walks a contiguous slice, not the whole partition.

The lever here is how you encode the sort key. If your pattern is "by price band", a PROD#<id> sort key won't help — you'd bake price into the key.

That's a sort-key strategy decision, made at design time, not query time.

Filter by sparse index

"Give me everything currently on clearance." Most products aren't, so you don't want to read the catalogue to find the few that are.

A sparse index solves this by absence. A Global Secondary Index only contains an item if that item has both of the index's key attributes.

Set the GSI partition key to clearanceAt — present only on clearance items — and the index holds nothing else.

AWS spells this out: a GSI "only contains items that have the indexed attributes," so items missing the key attribute are simply not propagated (AWS — Take advantage of sparse indexes).

YesNoBase table all productsHas clearanceAt?Replicated to ClearanceIndexNot in the indexQuery the index = clearanceitems only

Now the query reads only the clearance items, billed only for them:

Query  ON ClearanceIndex   GSI_PK = "CLEARANCE"   (sorted by clearanceAt)

The filter happened when you wrote the data — by choosing whether to set clearanceAt at all. The index is the filtered set. See GSI vs LSI for which index type fits.

Filter with FilterExpression

"Give me kitchen products that are in stock." inStock isn't a key attribute, so you reach for a FilterExpression:

Query  PK = "DEPT#kitchen"
Filter inStock = true

Here's the trap. DynamoDB reads every item in the kitchen partition, meters the capacity for all of them, and then drops the out-of-stock ones.

The official rule: a filter expression is "applied after a Query finishes, but before the results are returned," and "does not consume any additional read capacity units" — you already paid for the full read (AWS — Filter expressions for Query).

So if kitchen has 10,000 products and 12 are in stock, you pay to read 10,000. The response is small; the bill is not. FilterExpression shrinks the payload crossing the wire, never the read.

There's a second, sharper edge: pagination is metered before filtering. A page is 1 MB of read items, not 1 MB of matches.

A filter can return an empty page with a LastEvaluatedKey set — DynamoDB read a full megabyte, matched nothing, handed you an empty array. You keep paging, and you paid for every empty page.

Build the expression — names, values, and the right reserved-word escaping — with the DynamoDB Expression Builder so the #inStock/:val placeholders are correct on the first try.

Compare the four

When it filtersCuts read cost?Predicate powerCost to set up
Partition keyBefore readYes — one partitionEquality onlyFree (it's the key)
Sort keyBefore readYes — a sliceRange / begins_withSort-key design
Sparse indexBefore readYes — index-onlyPresence of an attributeExtra GSI + write cost
FilterExpressionAfter readNoAlmost any conditionNone

Read the table top to bottom: predicate power goes up, cost control goes down. FilterExpression can express anything precisely because it runs on already-read items — that's the same reason it can't save you money.

See it in DynoTable

When you run a Query with a filter, the gap between items read and items returned is the whole story. DynoTable surfaces consumed capacity next to the result count — so a filter quietly reading the whole partition is visible, not hiding in your monthly bill.

For genuine cross-item questions a filter can't answer — "average price per department", "products in stock joined to their reviews" — DynoTable's SQL Workbench runs GROUP BY, JOIN, and aggregates client-side over a bounded result set, instead of compiling to a table-wide Scan.

Pitfalls and next steps

  • Don't use FilterExpression as your primary access path. If a pattern is common, model it into a key or a sparse index. A filter is for the last little bit of narrowing, not the bulk of it.
  • Watch empty pages. A filtered query can page for a long time returning nothing. Honour LastEvaluatedKey; don't assume an empty page means "done".
  • A sparse index isn't free. It costs write capacity and storage for every item that lands in it — cheap when the attribute is rare, less so when it isn't.

Estimate what a filtered read will actually cost with the capacity calculator, and try DynoTable to watch consumed capacity against returned rows on your own tables.

Updated