Intermediate9 min read

DynamoDB GROUP BY: How to Aggregate Without a GROUP BY Clause

There is no GROUP BY in DynamoDB. There is no COUNT, SUM, or AVG either — not in the native API, and not in PartiQL. DynamoDB is a key-value / document store, not an analytics engine, so aggregation is something you build, not something the query planner does for you.

Can you GROUP BY in DynamoDB?

No. DynamoDB has no GROUP BY, HAVING, or aggregate functions like COUNT, SUM, and AVG — not in the native API and not in PartiQL, whose SELECT accepts only WHERE and ORDER BY. You aggregate by pre-computing totals as data changes (atomic counters or + Lambda rollups) or by grouping app-side after reading.

  • DynamoDB's PartiQL SELECT grammar is SELECT … FROM … [WHERE …] [ORDER BY …] — and that is the whole list. No GROUP BY, no HAVING, no aggregate functions, no JOIN (AWS PartiQL SELECT reference).
  • Because DynamoDB "doesn't natively support aggregation operations like SUM or COUNT across items," AWS's own guidance is to pre-compute aggregates as the data changes and store the results as ordinary items (AWS: materialized aggregation).
  • The alternative — read every item then aggregate in your app — works, but you pay to read the whole table on every query.
  • For one-off exploration, DynoTable's SQL Workbench runs GROUP BY / COUNT / SUM / AVG directly against a live table — the SQL DynamoDB's PartiQL endpoint rejects.

Why aggregation is hard in DynamoDB

DynamoDB has no scan-time aggregation engine. Query and Scan return items; they don't fold them. A Scan reads the whole table 1 MB at a time, and the capacity it consumes is based on the items it reads, not the rows you keep — a FilterExpression is applied after the scan but before results return, so it narrows the result set without lowering the bill (AWS Scan API reference: a filter "does not consume any additional read capacity units"; capacity is based on item size scanned, not returned). There's no GROUP-BY hook to hang a sum or count on in the first place.

PartiQL doesn't change this. PartiQL is a SQL-compatible dialect over the same engine, so it inherits the same limits — it's a syntax surface, not a new execution model. The documented SELECT grammar simply has no GROUP BY token. For the full gap between PartiQL and real SQL, see PartiQL vs SQL.

So the question isn't "how do I write a GROUP BY" — it's "where does my aggregate live, and when is it computed?" There are three answers.

Pattern 1: aggregate on write (atomic counters)

If you know the groups ahead of time — count per status, total per customer, downloads per month — keep a counter item and update it on every write.

Use an ADD so the increment is atomic and concurrency-safe. ADD works on numbers and sets, and it avoids the read-modify-write race, so two writers incrementing the same counter never clobber each other (AWS notes the atomic ADD "avoids read-modify-write race conditions"):

UpdateItem
Key                         { pk: "STATS#orders", sk: "status#shipped" }
UpdateExpression            "ADD orderCount :one"
ExpressionAttributeValues   { ":one": 1 }

This is your SELECT COUNT(*) … GROUP BY status — except the count is already sitting there as an item, readable in a single-digit-millisecond GetItem. The trade-off: you must know the grouping key at write time, and you couple the counter update to the write path. If the app crashes after the write but before the counter update, the two drift out of sync — which is exactly the failure mode the next pattern decouples.

Pattern 2: DynamoDB Streams + Lambda rollups

When you don't want aggregation logic on the write path — or the write is a plain PutItem you can't easily wrap — move it downstream. This is AWS's own recommended pattern, materialized aggregation (AWS: Using GSIs for materialized aggregation queries):

  1. The app writes the raw item (an order, a download, an event). No aggregation logic.
  2. captures the write as a stream record.
  3. A Lambda attached to the stream reads the new item, derives the group (status, month, category…), and ADDs to the matching aggregate item with an atomic UpdateItem — which "avoids read-modify-write race conditions" when many invocations touch the same counter.
  4. You query the pre-computed aggregate — often through a that indexes only the rollup items, so "top 10 this month" is one Query with Limit 10.

The sparse GSI trick: only the aggregate items carry the indexed attribute (e.g. Month), so the raw event rows are excluded from the index automatically — "a small fraction of the total items in the table," which keeps the index cheap and the read fast.

This decouples aggregation from the write path and keeps writes simple, at the cost of eventual consistency — AWS notes "a delay of a few seconds between a download being recorded and the aggregation being updated." For dashboards, leaderboards, and trend counters that's fine.

The same retry caveat applies: a retried Lambda invocation re-runs the ADD, so "a retry would increment the count more than once," leaving an approximate value. For exact counts, add idempotency (e.g. a condition expression keyed on the source item's id); otherwise the small margin is fine for analytics and leaderboards.

Pattern 3: app-side grouping after Scan/Query

The brute-force option: read the items, group them in your code.

groups = {}
for item in paginate(table.scan()):       # or query() for one partition
    key = item["status"]
    groups[key] = groups.get(key, 0) + 1

This is correct and sometimes the right call — but be honest about the cost. A Scan reads every item in the table, and the read capacity is the same whether or not you filter. So app-side grouping over a full Scan means you pay to read the entire table on every aggregation, and latency grows with the table. AWS lists "scan and count at read time" as "only suitable for very small datasets where latency isn't a concern" (AWS: Why pre-compute aggregations).

Scoped down to a single partition via Query (e.g. count the orders for one customer), app-side grouping is perfectly reasonable — you're only reading one item collection. For the full cost gap between the two, see Query vs Scan. To estimate what a given aggregation scan will read before you run it, size a representative item with the item-size calculatorread capacity rounds up per 4 KB, so item size drives the bill.

For genuinely ad-hoc analytical SQL over a DynamoDB table — the throwaway "GROUP BY status, count them" you'd run once — AWS's answer is to point a separate engine at it: the Amazon Athena DynamoDB connector lets you query the table with real SQL (GROUP BY, aggregates, even JOINs to other sources) via a Lambda connector (AWS: Amazon Athena DynamoDB connector). It scans the table behind the scenes, so it's a reporting/BI tool, not a hot path.

Which pattern do I use?

You need…Use
A known group total on a hot read pathPattern 1 — atomic counter (ADD)
Aggregates without touching the write pathPattern 2 — Streams + Lambda rollup
A count scoped to one partitionPattern 3 — Query then group in app
Exact totals, no driftPattern 1/2 with idempotency guard
A one-off GROUP BY while exploringDynoTable Workbench (below) or Athena
Recurring BI/reporting with SQLAthena DynamoDB connector

Running GROUP BY directly in DynoTable's SQL Workbench

The patterns above are how you serve aggregates in production. But when you're exploring a table — "how many orders per status, right now?" — you don't want to provision a Lambda or stand up Athena. You want to type the query.

That's what DynoTable's SQL Workbench is for. It runs real SQL — GROUP BY, COUNT, SUM, AVG, HAVING, even JOIN — directly against your live DynamoDB tables, executing the aggregation client-side over the rows it reads. It's the SQL that DynamoDB's PartiQL endpoint rejects:

SELECT status, COUNT(*) AS orders, SUM(total) AS revenue
FROM "Orders"
GROUP BY status
HAVING SUM(total) > 1000
ORDER BY revenue DESC

The honest framing: under the hood DynoTable reads items the way the API allows (Query where it can, Scan where it must), materializes them, and does the grouping in the Workbench — the same "read then aggregate" mechanics as Pattern 3, just without the loop, and within DynamoDB's access-pattern rules. It's built for exploration and ad-hoc analysis, not for replacing a production rollup on a hot read path. For that, pre-compute (Pattern 1 / 2).

For the JOIN side of the same wedge — DynoTable runs cross-table joins PartiQL can't either — see DynamoDB JOIN. Comparing GUI clients on exactly this capability? See the DynamoDB GUI comparison.

FAQ

Does DynamoDB PartiQL support GROUP BY? No. DynamoDB's PartiQL SELECT supports WHERE and ORDER BY only — no GROUP BY, HAVING, aggregate functions, or JOIN. The grammar is documented as SELECT … FROM … [WHERE …] [ORDER BY …].

Can I do COUNT(*) over a whole DynamoDB table? Not as an aggregate function — PartiQL has none. The API gives you Select=COUNT on a Scan/Query, which returns a count of matched items but still reads (and bills) every item the scan touches (AWS Scan API reference: capacity is based on items examined, not returned). For a frequently-read total, keep a counter item (Pattern 1).

Can I GROUP BY the partition key? Not in DynamoDB or PartiQL. If "per partition key" is a known access pattern, maintain one aggregate item per key with an atomic ADD (Pattern 1), or roll it up with Streams + Lambda (Pattern 2).

How do I do SUM or AVG per group? SUM: keep a running total per group and ADD to it on write. AVG: store both the sum and the count and divide at read time — there's no native average. For a one-off exploratory AVG, run it in DynoTable's SQL Workbench or via the Athena DynamoDB connector.

Is there a partiql group by workaround? No PartiQL-side one. Either pre-compute the aggregate (counters/Streams) and SELECT the rollup item, or run the GROUP BY in an engine that has one — DynoTable's Workbench for ad-hoc, Athena for recurring reporting.


Want to run GROUP BY against your own tables without writing a Lambda? Try DynoTable and point the SQL Workbench at a live table.

Updated