TIL - Removing incorrect GROUP BY statements when using the Django ORM
Sept. 14, 2024
Not really a Today-I-Learned (TIL), since I’ve known about this technique for a while, but I find this technique useful and not widely known, so I thought about documenting it here.
Sometimes the Django ORM does not generate the SQL code we need. Let’s first set the stage with a really simple example:
1 2 3 4 5 6 7 8 9 | |
Now, in your view, you want to render your invoices and include a count of the number of associated timesheets. Your first try might look something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 | |
However, when you run that code, you get incorrect results. Let’s take a look at the generated SQL code, to understand why:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | |
There is a strange “group by” statement in the generated subquery! Grouping by “id” and “invoice_id” will always lead to groups of 1, so that’s definitely not what we want. Actually, we don’t need the “group by” statement at all, the filter should suffice.
The Solution (or hack…)
It turns out, you can achieve that with a very strange hack. It’s not documented and I’ve only learned about it from StackOverflow: Link
To solve this, we need to add a grouping statement by some dummy value. Since we’re grouping on a dummy value (i.e. it has no actual grouping effect), the Django ORM will remove the “group by” statement altogether:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |
And just like that, the ORM will generate the correct SQL code for you:
1 2 3 4 5 6 7 8 9 10 11 | |
Conclusion
You could abstract this further into some custom queryset method, but that’s left as an exercise to the reader. 😉
When you use this technique, be sure to document what you’re doing, since is definitely not self-describing, why the grouping by a dummy value is necessary. Also, a test that checks if the aggregation works would be a good idea - after all, we’re relying on an undocumented feature here.