TIL - Removing incorrect GROUP BY statements when using the Django ORM
Not really a Today-I-Learned (TIL), since I've known about this technique for a while, but I find this technique useful and not widely known, so I thought about documenting it here.
Sometimes the Django ORM does not generate the SQL code we need. Let's first set the stage with a really simple example:
1# example/models.py
2from django.db import models
3from django.db.models import Count, OuterRef, Value
4
5class Invoice(models.Model):
6 pass
7
8class TimeSheet(models.Model):
9 invoice = models.ForeignKey(Invoice, on_delete=models.CASCADE)
Now, in your view, you want to render your invoices and include a count of the number of associated timesheets. Your first try might look something like this:
1# example/views.py
2from django.db.models import Count, OuterRef, Value
3from .models import Invoice, TimeSheet
4
5num_timesheets = (
6 TimeSheet.objects.filter(invoice_id=OuterRef("id"))
7 .annotate(count=Count("id"))
8 .values("count")
9)
10
11qs = Invoice.objects.annotate(
12 num_timesheets=num_timesheets
13)
However, when you run that code, you get incorrect results. Let's take a look at the generated SQL code, to understand why:
1SELECT
2 "example_invoice"."id",
3 (
4 SELECT
5 COUNT(U0."id") AS "count"
6 FROM
7 "example_timesheet" U0
8 WHERE
9 U0."invoice_id" = ("example_invoice"."id")
10 GROUP BY
11 U0."id",
12 U0."invoice_id") AS "num_timesheets"
13FROM
14 "example_invoice"
There is a strange "group by" statement in the generated subquery! Grouping by "id" and "invoice_id" will always lead to groups of 1, so that's definitely not what we want. Actually, we don't need the "group by" statement at all, the filter should suffice.
The Solution (or hack...)
It turns out, you can achieve that with a very strange hack. It's not documented and I've only learned about it from StackOverflow: Link
To solve this, we need to add a grouping statement by some dummy value. Since we're grouping on a dummy value (i.e. it has no actual grouping effect), the Django ORM will remove the "group by" statement altogether:
1# example/views.py
2from django.db.models import Count, OuterRef, Value
3from .models import Invoice, TimeSheet
4
5num_timesheets = (
6 TimeSheet.objects.filter(invoice_id=OuterRef("id"))
7 .annotate(dummy=Value(1)) # 1. dummy annotation
8 .values("dummy") # 2. grouping by dummy value
9 .annotate(count=Count("id"))
10 .values("count")
11)
12
13qs = Invoice.objects.annotate(
14 num_timesheets=num_timesheets
15)
And just like that, the ORM will generate the correct SQL code for you:
1SELECT
2 "example_invoice"."id",
3 (
4 SELECT
5 COUNT(U0."id") AS "count"
6 FROM
7 "example_timesheet" U0
8 WHERE
9 U0."invoice_id" = ("example_invoice"."id")) AS "num_timesheets"
10FROM
11 "example_invoice"
Conclusion
You could abstract this further into some custom queryset method, but that's left as an exercise to the reader. 😉
When you use this technique, be sure to document what you're doing, since is definitely not self-describing, why the grouping by a dummy value is necessary. Also, a test that checks if the aggregation works would be a good idea - after all, we're relying on an undocumented feature here.