This works but is very slow. I have an Author table, and each book has a foreign key to its author. Each book can appear in a variety of formats (html, pdf, etc.), and each format is linked to its book by its own foreign key.
As the code shows, what I'm doing now is getting all the authors, then looping through to get each book by the author, and looping through those to get all the format types of each book. If the book does not have all four types, it's appended to a list of dictionaries to be displayed in the template.
Note that checking simply the count of each format would be insufficient, as it's possible a book might have multiple versions of one type; I need to make sure each type appears at least once.
We have tens of thousands of authors, each of which can have up to about 20 books, each of which can have up to about 5-8 formats max.
    def missing_book_format_report(request):
        authors = Author.objects.all()
        data = []
        for author in authors:
            books = author.book_set.all()
            for book in books:
                book_info = {
                    "book_num": book.num,
                    "type": book.short_name,
                    "xml": False,
                    "pdf": False,
                    "html": False,
                    "gen_html": False
                }
                book_formats = book.format_set.all()
                for book_format in book_formats:
                    format_type = book_format.format_type_cd_id.lower()  # e.g. xml
                    book_info[format_type] = True
                if not all([book_info["xml"], book_info["pdf"], book_info["html"], book_info["gen_html"]]):
                    data.append(book_info)
        context = {
            'data': data,
            'books': books
        }
        return render(request, 'book-report.html', context)
1 Answer
If you used a debugger or a tool like django-debug-toolbar, you'd see that you are actually issuing a lot of queries: one to get all authors, one per author to get the books, and one per book to get the formats.
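To get a feel for the scale, here is a rough back-of-the-envelope sketch in plain Python (no Django needed). The function names are made up for illustration, and the figures are only examples picked from the ranges given in the question; the prefetched count assumes one extra query per prefetched relation level.

```python
def naive_query_count(n_authors: int, books_per_author: int) -> int:
    """Queries issued by the nested loops:
    1 for the authors, 1 per author for the books, 1 per book for the formats."""
    return 1 + n_authors + n_authors * books_per_author


def prefetched_query_count(levels: int = 2) -> int:
    """With prefetching: 1 query for the authors plus roughly 1 per prefetched level
    (books and formats would be 2 levels)."""
    return 1 + levels


# Illustrative figures only: 20,000 authors with 20 books each.
print(naive_query_count(20_000, 20))  # 420001
print(prefetched_query_count())       # 3
```

Even if each individual query is fast, the round trips alone add up at that volume.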
There is a specific tool in Django to solve this kind of problem of getting the related sets: prefetch_related() and select_related(). Replace:
    authors = Author.objects.all()  # 1 query
    for author in authors:
        books = author.book_set.all()  # 1 query per each author
with:
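    # 2 queries in total: one for the authors, one for all of their books
    # ('book_set' is the default reverse name; use your related_name if you set one)
    authors = Author.objects.prefetch_related('book_set').all()
    for author in authors:
        books = author.book_set.all()  # served from the prefetch cache, no extra query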
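As a rough illustration of the idea behind prefetching (not Django's actual implementation), the sketch below uses the standard-library sqlite3 module: fetch the parents once, fetch all their children in a single IN query, then group the children in memory. The table layout, column names and sample rows are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical miniature schema standing in for the Author and Book tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, num TEXT);
    INSERT INTO author VALUES (1, 'A'), (2, 'B');
    INSERT INTO book VALUES (10, 1, 'a-1'), (11, 1, 'a-2'), (20, 2, 'b-1');
""")


def books_by_author(conn):
    # Query 1: all authors.
    author_ids = [row[0] for row in conn.execute("SELECT id FROM author ORDER BY id")]

    # Query 2: every book of those authors in one round trip.
    placeholders = ",".join("?" * len(author_ids))
    rows = conn.execute(
        f"SELECT author_id, num FROM book "
        f"WHERE author_id IN ({placeholders}) ORDER BY id",
        author_ids,
    )

    # Group the books per author in memory instead of querying per author.
    grouped = defaultdict(list)
    for author_id, num in rows:
        grouped[author_id].append(num)
    return {author_id: grouped[author_id] for author_id in author_ids}


print(books_by_author(conn))  # {1: ['a-1', 'a-2'], 2: ['b-1']}
```

The same batching can be repeated one level down (books to formats), which is what a nested prefetch amounts to.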
See also this nice article with related examples:
For the multiple "prefetch" levels, please see:
You can also select only the columns you actually need by using values_list.
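A minimal sketch of how that could look for this report, under some assumptions: a query along the lines of Book.objects.values_list("num", "short_name", "format__format_type_cd_id") would yield one flat tuple per (book, format) pair, with None as the format for a book that has no formats at all. The lookup name "format" is a guess based on format_set in the question and may differ in the real models. The function below is plain Python, works on such tuples, and uses sets so that duplicate versions of one type do not matter; its name and the sample rows are made up for illustration.

```python
REQUIRED_FORMATS = {"xml", "pdf", "html", "gen_html"}


def incomplete_books(rows):
    """rows: iterable of (book_num, short_name, format_type) tuples,
    where format_type may be None for a book without any format."""
    seen = {}
    for book_num, short_name, format_type in rows:
        formats = seen.setdefault((book_num, short_name), set())
        if format_type is not None:
            formats.add(format_type.lower())

    report = []
    for (book_num, short_name), formats in seen.items():
        # Report the book unless every required type appears at least once.
        if not REQUIRED_FORMATS <= formats:
            info = {"book_num": book_num, "type": short_name}
            info.update({fmt: fmt in formats for fmt in sorted(REQUIRED_FORMATS)})
            report.append(info)
    return report


# Sample rows shaped like the assumed values_list() output.
rows = [
    (1, "novel", "XML"), (1, "novel", "PDF"), (1, "novel", "HTML"), (1, "novel", "GEN_HTML"),
    (2, "essay", "PDF"), (2, "essay", "PDF"),  # two versions of the same type
    (3, "poem", None),                         # no formats at all
]
print([book["book_num"] for book in incomplete_books(rows)])  # [2, 3]
```

Working on plain tuples also avoids building a model instance for every row, which may matter when Author and Book have many fields.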
Comments:

Unfortunately django-debug-toolbar is not possible with our current setup. I've tried the prefetch and select_related and it hasn't improved performance. values_list improves performance? I haven't encountered that yet. – thumbtackthief, Jun 20, 2017 at 13:20

@thumbtackthief Maybe I'm not suggesting the correct use; apologies if this is the case, it's been a while since I've used the "advanced" Django ORM. It would be really useful to actually see what queries are executed both with prefetch_related and without, and take it from there. Could you please post your current models? I'll see if I can reproduce the problem as well. Thanks. – alecxe, Jun 20, 2017 at 13:24

I'm afraid I can't post the real models for proprietary reasons, but Author has a lot of fields, so does Book, which FKs to Author, and so does Book_Format, which FKs to Author. The original query is actually not .objects.all, it's a filter. Not sure if that makes a difference with prefetch; updating the question. – thumbtackthief, Jun 20, 2017 at 13:34
prefetch_related is already implemented; I mistakenly left it out of the code.