DynamoDB GROUP BY: How to Aggregate Without a GROUP BY Clause
There is no GROUP BY in DynamoDB. There is no COUNT, SUM, or AVG
either — not in the native API, and not in PartiQL. DynamoDB is a key-value /
document store, not an analytics engine, so aggregation is something you
build, not something the query planner does for you.
Can you GROUP BY in DynamoDB?
No. DynamoDB has no GROUP BY, HAVING, or aggregate functions like COUNT, SUM, and AVG — not in the native API and not in PartiQL, whose SELECT accepts only WHERE and ORDER BY. You aggregate by pre-computing totals as data changes (atomic counters or Streams + Lambda rollups) or by grouping app-side after reading.
- DynamoDB's PartiQL
SELECTgrammar isSELECT … FROM … [WHERE …] [ORDER BY …]— and that is the whole list. NoGROUP BY, noHAVING, no aggregate functions, noJOIN(AWS PartiQLSELECTreference). - Because DynamoDB "doesn't natively support aggregation operations like
SUMorCOUNTacross items," AWS's own guidance is to pre-compute aggregates as the data changes and store the results as ordinary items (AWS: materialized aggregation). - The alternative — read every item then aggregate in your app — works, but you pay to read the whole table on every query.
- For one-off exploration, DynoTable's SQL Workbench runs
GROUP BY/COUNT/SUM/AVGdirectly against a live table — the SQL DynamoDB's PartiQL endpoint rejects.
Why aggregation is hard in DynamoDB
DynamoDB has no scan-time aggregation engine. Query and Scan return items;
they don't fold them. A Scan reads the whole table 1 MB at a time, and the
capacity it consumes is based on the items it reads, not the rows you keep — a
FilterExpression is applied after the scan but before results return, so it
narrows the result set without lowering the bill (AWS Scan API reference: a filter
"does not consume any additional read capacity units"; capacity is based on item
size scanned, not returned). There's no GROUP-BY hook
to hang a sum or count on in the first place.
PartiQL doesn't change this. PartiQL is a SQL-compatible dialect over the same
engine, so it inherits the same limits — it's a syntax surface, not a new
execution model. The documented SELECT grammar
simply has no GROUP BY token.
For the full gap between PartiQL and real SQL, see
PartiQL vs SQL.
So the question isn't "how do I write a GROUP BY" — it's "where does my
aggregate live, and when is it computed?" There are three answers.
Pattern 1: aggregate on write (atomic counters)
If you know the groups ahead of time — count per status, total per customer, downloads per month — keep a counter item and update it on every write.
Use an ADD update expression so the increment is atomic and concurrency-safe.
ADD works on numbers and sets, and it avoids the read-modify-write race, so
two writers incrementing the same counter never clobber each other
(AWS notes the atomic ADD "avoids read-modify-write race conditions"):
UpdateItem
Key { pk: "STATS#orders", sk: "status#shipped" }
UpdateExpression "ADD orderCount :one"
ExpressionAttributeValues { ":one": 1 }
This is your SELECT COUNT(*) … GROUP BY status — except the count is already
sitting there as an item, readable in a single-digit-millisecond GetItem. The
trade-off: you must know the grouping key at write time, and you couple the
counter update to the write path. If the app crashes after the write but
before the counter update, the two drift out of sync — which is exactly the
failure mode the next pattern decouples.
Pattern 2: DynamoDB Streams + Lambda rollups
When you don't want aggregation logic on the write path — or the write is a
plain PutItem you can't easily wrap — move it downstream. This is AWS's own
recommended pattern, materialized aggregation
(AWS: Using GSIs for materialized aggregation queries):
- The app writes the raw item (an order, a download, an event). No aggregation logic.
- DynamoDB Streams captures the write as a stream record.
- A Lambda attached to the stream reads the new item, derives the group
(status, month, category…), and
ADDs to the matching aggregate item with an atomicUpdateItem— which "avoids read-modify-write race conditions" when many invocations touch the same counter. - You query the pre-computed aggregate — often through a sparse GSI that
indexes only the rollup items, so "top 10 this month" is one
QuerywithLimit 10.
The sparse GSI trick: only the aggregate items carry the indexed attribute
(e.g. Month), so the raw event rows are excluded from the index automatically
— "a small fraction of the total items in the table," which keeps the index
cheap and the read fast.
This decouples aggregation from the write path and keeps writes simple, at the cost of eventual consistency — AWS notes "a delay of a few seconds between a download being recorded and the aggregation being updated." For dashboards, leaderboards, and trend counters that's fine.
The same retry caveat applies: a retried Lambda invocation re-runs the ADD, so
"a retry would increment the count more than once," leaving an approximate
value. For exact counts, add idempotency (e.g. a condition expression keyed on
the source item's id); otherwise the small margin is fine for analytics and
leaderboards.
Pattern 3: app-side grouping after Scan/Query
The brute-force option: read the items, group them in your code.
groups = {}
for item in paginate(table.scan()): # or query() for one partition
key = item["status"]
groups[key] = groups.get(key, 0) + 1This is correct and sometimes the right call — but be honest about the cost. A
Scan reads every item in the table, and the read capacity is the same
whether or not you filter. So app-side grouping over a full Scan means you pay
to read the entire table on every aggregation, and latency grows with the table.
AWS lists "scan and count at read time" as "only suitable for very small
datasets where latency isn't a concern"
(AWS: Why pre-compute aggregations).
Scoped down to a single partition via Query (e.g. count the orders for one
customer), app-side grouping is perfectly reasonable — you're only reading one
item collection. For the full cost gap between the two, see
Query vs Scan. To estimate what a given aggregation
scan will read before you run it, size a representative item with the
item-size calculator — read capacity
rounds up per 4 KB,
so item size drives the bill.
For genuinely ad-hoc analytical SQL over a DynamoDB table — the throwaway
"GROUP BY status, count them" you'd run once — AWS's answer is to point a
separate engine at it: the Amazon Athena DynamoDB connector lets you query
the table with real SQL (GROUP BY, aggregates, even JOINs to other sources)
via a Lambda connector
(AWS: Amazon Athena DynamoDB connector).
It scans the table behind the scenes, so it's a reporting/BI tool, not a hot path.
Which pattern do I use?
| You need… | Use |
|---|---|
| A known group total on a hot read path | Pattern 1 — atomic counter (ADD) |
| Aggregates without touching the write path | Pattern 2 — Streams + Lambda rollup |
| A count scoped to one partition | Pattern 3 — Query then group in app |
| Exact totals, no drift | Pattern 1/2 with idempotency guard |
A one-off GROUP BY while exploring | DynoTable Workbench (below) or Athena |
| Recurring BI/reporting with SQL | Athena DynamoDB connector |
Running GROUP BY directly in DynoTable's SQL Workbench
The patterns above are how you serve aggregates in production. But when you're exploring a table — "how many orders per status, right now?" — you don't want to provision a Lambda or stand up Athena. You want to type the query.
That's what DynoTable's SQL Workbench is for. It runs real SQL —
GROUP BY, COUNT, SUM, AVG, HAVING, even JOIN — directly against your
live DynamoDB tables, executing the aggregation client-side over the rows it
reads. It's the SQL that DynamoDB's PartiQL endpoint rejects:
SELECT status, COUNT(*) AS orders, SUM(total) AS revenue
FROM "Orders"
GROUP BY status
HAVING SUM(total) > 1000
ORDER BY revenue DESCThe honest framing: under the hood DynoTable reads items the way the API allows
(Query where it can, Scan where it must), materialises them, and does the
grouping in the Workbench — the same "read then aggregate" mechanics as Pattern
3, just without the loop, and within DynamoDB's access-pattern rules. It's
built for exploration and ad-hoc analysis, not for replacing a production
rollup on a hot read path. For that, pre-compute (Pattern 1 / 2).
For the JOIN side of the same wedge — DynoTable runs cross-table joins PartiQL
can't either — see DynamoDB JOIN. Comparing GUI
clients on exactly this capability? See
the DynamoDB GUI comparison.
FAQ
Does DynamoDB PartiQL support GROUP BY?
No. DynamoDB's PartiQL SELECT supports WHERE and ORDER BY only — no
GROUP BY, HAVING, aggregate functions, or JOIN. The grammar is
documented
as SELECT … FROM … [WHERE …] [ORDER BY …].
Can I do COUNT(*) over a whole DynamoDB table?
Not as an aggregate function — PartiQL has none. The API gives you
Select=COUNT on a Scan/Query, which returns a count of matched items but
still reads (and bills) every item the scan touches
(AWS Scan API reference:
capacity is based on items examined, not returned). For a frequently-read
total, keep a counter item (Pattern 1).
Can I GROUP BY the partition key?
Not in DynamoDB or PartiQL. If "per partition key" is a known access pattern,
maintain one aggregate item per key with an atomic ADD (Pattern 1), or roll it
up with Streams + Lambda (Pattern 2).
How do I do SUM or AVG per group?
SUM: keep a running total per group and ADD to it on write. AVG: store
both the sum and the count and divide at read time — there's no native average.
For a one-off exploratory AVG, run it in DynoTable's SQL Workbench or via the
Athena DynamoDB connector.
Is there a partiql group by workaround?
No PartiQL-side one. Either pre-compute the aggregate (counters/Streams) and
SELECT the rollup item, or run the GROUP BY in an engine that has one —
DynoTable's Workbench for ad-hoc, Athena for recurring reporting.
Want to run GROUP BY against your own tables without writing a Lambda?
Try DynoTable and point the SQL Workbench at a live table.