TIL - Removing incorrect GROUP BY statements when using the Django ORM

Sept. 14, 2024

Not really a Today-I-Learned (TIL), since I’ve known about this technique for a while, but I find this technique useful and not widely known, so I thought about documenting it here.

Sometimes the Django ORM does not generate the SQL code we need. Let’s first set the stage with a really simple example:

1
2
3
4
5
6
7
8
9
# example/models.py
from django.db import models
from django.db.models import Count, OuterRef, Value

class Invoice(models.Model):
    pass

class TimeSheet(models.Model):
    invoice = models.ForeignKey(Invoice, on_delete=models.CASCADE)

Now, in your view, you want to render your invoices and include a count of the number of associated timesheets. Your first try might look something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# example/views.py
from django.db.models import Count, OuterRef, Value
from .models import Invoice, TimeSheet

num_timesheets = (
    TimeSheet.objects.filter(invoice_id=OuterRef("id"))
    .annotate(count=Count("id"))
    .values("count")
)

qs = Invoice.objects.annotate(
    num_timesheets=num_timesheets
)

However, when you run that code, you get incorrect results. Let’s take a look at the generated SQL code, to understand why:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT
    "example_invoice"."id",
    (
    SELECT
        COUNT(U0."id") AS "count"
    FROM
        "example_timesheet" U0
    WHERE
        U0."invoice_id" = ("example_invoice"."id")
    GROUP BY
        U0."id",
        U0."invoice_id") AS "num_timesheets"
FROM
    "example_invoice"

There is a strange “group by” statement in the generated subquery! Grouping by “id” and “invoice_id” will always lead to groups of 1, so that’s definitely not what we want. Actually, we don’t need the “group by” statement at all, the filter should suffice.

The Solution (or hack…)

It turns out, you can achieve that with a very strange hack. It’s not documented and I’ve only learned about it from StackOverflow: Link

To solve this, we need to add a grouping statement by some dummy value. Since we’re grouping on a dummy value (i.e. it has no actual grouping effect), the Django ORM will remove the “group by” statement altogether:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# example/views.py
from django.db.models import Count, OuterRef, Value
from .models import Invoice, TimeSheet

num_timesheets = (
    TimeSheet.objects.filter(invoice_id=OuterRef("id"))
    .annotate(dummy=Value(1))  # 1. dummy annotation
    .values("dummy")  # 2. grouping by dummy value
    .annotate(count=Count("id"))
    .values("count")
)

qs = Invoice.objects.annotate(
    num_timesheets=num_timesheets
)

And just like that, the ORM will generate the correct SQL code for you:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT
    "example_invoice"."id",
    (
    SELECT
        COUNT(U0."id") AS "count"
    FROM
        "example_timesheet" U0
    WHERE
        U0."invoice_id" = ("example_invoice"."id")) AS "num_timesheets"
FROM
    "example_invoice"

Conclusion

You could abstract this further into some custom queryset method, but that’s left as an exercise to the reader. 😉

When you use this technique, be sure to document what you’re doing, since is definitely not self-describing, why the grouping by a dummy value is necessary. Also, a test that checks if the aggregation works would be a good idea - after all, we’re relying on an undocumented feature here.