QuerySet Filters on Many-to-many Relations
2018-05-27
The Django ORM (object-relational mapper) makes querying the database so intuitive that at some point you might forget that SQL is being used in the background.
At this year's DjangoCon Europe, Katie McLaughlin gave a talk and mentioned one thing that affects the SQL query generated by the Django ORM, depending on how you call the QuerySet or manager methods. This particularity is especially relevant when you create your QuerySets dynamically. Here it is: when you have a many-to-many relationship and you filter objects by the fields of the related model, every new filter() call that references the relation creates a new INNER JOIN clause. I won't discuss whether that's a Django bug or a feature, but these are my observations about it.
The Books and Authors Example
Let's create an app with books and authors, where each book can be written by multiple authors.
# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django.db import models
from django.utils.translation import ugettext_lazy as _
from django.utils.encoding import python_2_unicode_compatible


@python_2_unicode_compatible
class Author(models.Model):
    first_name = models.CharField(_("First name"), max_length=200)
    last_name = models.CharField(_("Last name"), max_length=200)
    author_name = models.CharField(_("Author name"), max_length=200)

    class Meta:
        verbose_name = _("Author")
        verbose_name_plural = _("Authors")
        ordering = ("author_name",)

    def __str__(self):
        return self.author_name


@python_2_unicode_compatible
class Book(models.Model):
    title = models.CharField(_("Title"), max_length=200)
    authors = models.ManyToManyField(Author, verbose_name=_("Authors"))
    publishing_date = models.DateField(_("Publishing date"), blank=True, null=True)

    class Meta:
        verbose_name = _("Book")
        verbose_name_plural = _("Books")
        ordering = ("title",)

    def __str__(self):
        return self.title
A similar app with sample data can be found in this repository.
Inefficient Filter
With the above models, you could define the following QuerySet to select books whose author is me, Aidas Bendoraitis.
queryset = Book.objects.filter(
    authors__first_name='Aidas',
).filter(
    authors__last_name='Bendoraitis',
)
We can check what SQL query it would generate with str(queryset.query) (or queryset.query.__str__()).
The output would be something like this:
SELECT `libraryapp_book`.`id`, `libraryapp_book`.`title`, `libraryapp_book`.`publishing_date`
FROM `libraryapp_book`
INNER JOIN `libraryapp_book_authors` ON ( `libraryapp_book`.`id` = `libraryapp_book_authors`.`book_id` )
INNER JOIN `libraryapp_author` ON ( `libraryapp_book_authors`.`author_id` = `libraryapp_author`.`id` )
INNER JOIN `libraryapp_book_authors` T4 ON ( `libraryapp_book`.`id` = T4.`book_id` )
INNER JOIN `libraryapp_author` T5 ON ( T4.`author_id` = T5.`id` )
WHERE (`libraryapp_author`.`first_name` = 'Aidas' AND T5.`last_name` = 'Bendoraitis')
ORDER BY `libraryapp_book`.`title` ASC;
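Did you notice that the database table libraryapp_author was attached through the libraryapp_book_authors table to the libraryapp_book table TWICE, where just ONCE would be enough? Note that the two joins are independent of each other: with chained filter() calls, the first name and the last name may be matched against different authors of the same book.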
Efficient Filter
On the other hand, if you define the query expressions in the same filter() call like this:
queryset = Book.objects.filter(
    authors__first_name='Aidas',
    authors__last_name='Bendoraitis',
)
the generated SQL query is much shorter and should (theoretically) perform faster:
SELECT `libraryapp_book`.`id`, `libraryapp_book`.`title`, `libraryapp_book`.`publishing_date`
FROM `libraryapp_book`
INNER JOIN `libraryapp_book_authors` ON ( `libraryapp_book`.`id` = `libraryapp_book_authors`.`book_id` )
INNER JOIN `libraryapp_author` ON ( `libraryapp_book_authors`.`author_id` = `libraryapp_author`.`id` )
WHERE (`libraryapp_author`.`first_name` = 'Aidas' AND `libraryapp_author`.`last_name` = 'Bendoraitis')
ORDER BY `libraryapp_book`.`title` ASC;
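To see how the two SQL statements above behave, here is a minimal sketch that replays them against an in-memory SQLite database using only the standard library. The shortened table names (book, author, book_authors) and the sample rows are made up for illustration and are not taken from the repository. It suggests that the two queries are not only different in length: the double-join version can also match a book where one author is called Aidas and a different author is called Bendoraitis.

```python
import sqlite3

# In-memory database with simplified, hypothetical table names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);

    INSERT INTO author VALUES
        (1, 'Aidas', 'Bendoraitis'),
        (2, 'Aidas', 'Smith'),
        (3, 'John', 'Bendoraitis');
    INSERT INTO book VALUES (1, 'Solo Book'), (2, 'Joint Book');
    -- Book 1 is written by Aidas Bendoraitis alone.
    INSERT INTO book_authors VALUES (1, 1);
    -- Book 2 is co-written by Aidas Smith and John Bendoraitis.
    INSERT INTO book_authors VALUES (2, 2), (2, 3);
""")

# Shape of the SQL produced by two chained filter() calls.
TWO_JOINS = """
    SELECT book.title FROM book
    INNER JOIN book_authors ON (book.id = book_authors.book_id)
    INNER JOIN author ON (book_authors.author_id = author.id)
    INNER JOIN book_authors T4 ON (book.id = T4.book_id)
    INNER JOIN author T5 ON (T4.author_id = T5.id)
    WHERE author.first_name = 'Aidas' AND T5.last_name = 'Bendoraitis'
    ORDER BY book.title
"""

# Shape of the SQL produced by a single filter() call.
ONE_JOIN = """
    SELECT book.title FROM book
    INNER JOIN book_authors ON (book.id = book_authors.book_id)
    INNER JOIN author ON (book_authors.author_id = author.id)
    WHERE author.first_name = 'Aidas' AND author.last_name = 'Bendoraitis'
    ORDER BY book.title
"""

two_join_titles = [row[0] for row in conn.execute(TWO_JOINS)]
one_join_titles = [row[0] for row in conn.execute(ONE_JOIN)]

print(two_join_titles)  # ['Joint Book', 'Solo Book']
print(one_join_titles)  # ['Solo Book']
```

So if you want both conditions to apply to the same author, putting them into one filter() call is not just the shorter query, it is also the one that expresses that intent.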
The same single-join SQL query can also be produced with Q() objects:
queryset = Book.objects.filter(
    models.Q(authors__first_name='Aidas') &
    models.Q(authors__last_name='Bendoraitis')
)
The Q() objects add a lot of flexibility to filters, allowing you to OR, AND, and negate query expressions.
Dynamic Filtering
So to get faster performance when creating QuerySets dynamically, DON'T call filter() multiple times:
queryset = Book.objects.all()
if first_name:
    queryset = queryset.filter(
        authors__first_name=first_name,
    )
if last_name:
    queryset = queryset.filter(
        authors__last_name=last_name,
    )
DO this instead:
filters = models.Q()
if first_name:
    filters &= models.Q(
        authors__first_name=first_name,
    )
if last_name:
    filters &= models.Q(
        authors__last_name=last_name,
    )
queryset = Book.objects.filter(filters)
Here the empty Q() doesn't have any impact on the generated SQL query, so you don't need the complexity of creating a list of filters and then joining all of them with the bitwise AND operator, like this:
import operator
from functools import reduce

filters = []
if first_name:
    filters.append(models.Q(
        authors__first_name=first_name,
    ))
if last_name:
    filters.append(models.Q(
        authors__last_name=last_name,
    ))
# The initial models.Q() keeps reduce() from failing when the list is empty.
queryset = Book.objects.filter(reduce(operator.and_, filters, models.Q()))
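When the conditions are plain AND-ed lookups like the ones above, another option is to collect them in a dictionary and pass them to a single filter() call as keyword arguments, which is the dynamic counterpart of the single-call filter shown earlier. The helper name build_author_lookups below is hypothetical and only serves as a sketch; the part that touches the ORM is left as a comment so that the snippet runs without a configured Django project.

```python
def build_author_lookups(first_name=None, last_name=None):
    """Collect only the non-empty lookups into one kwargs dictionary."""
    lookups = {}
    if first_name:
        lookups["authors__first_name"] = first_name
    if last_name:
        lookups["authors__last_name"] = last_name
    return lookups


lookups = build_author_lookups(first_name="Aidas", last_name="Bendoraitis")
print(lookups)

# In a Django project, all lookups would then go into ONE filter() call:
# queryset = Book.objects.filter(**lookups)
```

An empty dictionary results in filter() with no arguments, so no special case is needed when the user provides neither name. For OR conditions or negation you would still reach for Q() objects.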
Profiling
In DEBUG mode, you can check how long the previously executed SQL queries took by inspecting django.db.connection.queries:
>>> from django.db import connection
>>> connection.queries
[{'sql': 'SELECT …', 'time': '0.001'}, {'sql': 'SELECT …', 'time': '0.004'}]
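Since each entry is a dictionary with the SQL text and the time as a string, a few lines of plain Python are enough to summarize a list like the one above. The function name summarize_queries and the sample entries are hypothetical; in a real project you would pass connection.queries instead of the sample list.

```python
def summarize_queries(queries):
    """Return (count, total_seconds, slowest_entry) for a list of query dicts."""
    total = sum(float(query["time"]) for query in queries)
    slowest = max(queries, key=lambda query: float(query["time"])) if queries else None
    return len(queries), total, slowest


# Sample data in the same shape as connection.queries.
sample = [
    {"sql": "SELECT 1", "time": "0.001"},
    {"sql": "SELECT 2", "time": "0.004"},
]

count, total, slowest = summarize_queries(sample)
print(count, round(total, 3), slowest["sql"])
```

To compare two QuerySets fairly, you can clear the log between measurements with django.db.reset_queries() and evaluate each QuerySet, for example with list(queryset), before reading the timings.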
The Takeaways
- When querying many-to-many relationships, avoid chaining multiple filter() calls; make use of Q() objects instead.
- You can check the SQL query of a QuerySet with str(queryset.query).
- Check the performance of recently executed SQL queries with django.db.connection.queries.
- With small datasets, the performance difference is not so obvious. For your specific cases, you should do the benchmarks yourself.
Cover photo by Tobias Fischer.