Advanced Django ORM
     techniques
 Daniel Roseman   http://blog.roseman.org.uk
About Me
• Python user for five years
• Discovered Django four years ago
• Worked full-time with Python/Django since
  2008.
• Top Django answerer on StackOverflow!
• Occasionally blog on Django, concentrating
  on efficient use of the ORM.
Contents

• Behind the scenes: models and fields
• How model relationships work
• More efficient relationships
• Other optimising techniques
Django ORM
efficiency: a story
414 queries!
How can you stop this
 happening to you?


             http://www.flickr.com/photos/m0n0/4479450696
Behind the scenes:
models and fields


               http://www.flickr.com/photos/spacesuitcatalyst/847530840
Defining a model

• Model structure initialised via metaclass
• Called when model is first defined
• Resulting model class stored in cache to
  use when instantiated
Fields

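• Each field has a contribute_to_class()
  method, called when the model is defined
• It adds methods to the model, e.g.
  get_FOO_display() for fields with choices
• It also lets fields install descriptors for
  attribute access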
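To make the metaclass and contribute_to_class() steps concrete, here is a toy, stdlib-only sketch. It is not Django's implementation: ModelBase, Field, Model and registry are hypothetical simplifications. It assumes only what the slides describe: the metaclass runs when the class statement executes, hands each field its name and the new class, the field adds methods such as get_FOO_display(), and the finished class is stored for later use.

```python
class Field:
    """Toy field: learns its name and patches the model class it belongs to."""

    def __init__(self, choices=None):
        self.choices = dict(choices or ())
        self.name = None

    def contribute_to_class(self, cls, name):
        self.name = name
        cls._meta_fields.append(self)
        if self.choices:
            # Add get_FOO_display(), as Django does for fields with choices.
            def display(instance, field=self):
                value = getattr(instance, field.name)
                return field.choices.get(value, value)

            setattr(cls, "get_%s_display" % name, display)


# Hypothetical registry standing in for Django's model cache.
registry = {}


class ModelBase(type):
    """Toy metaclass: runs once, when a model's class statement executes."""

    def __new__(mcs, name, bases, attrs):
        # Take the Field instances out of the class body.
        fields = {k: v for k, v in attrs.items() if isinstance(v, Field)}
        for k in fields:
            del attrs[k]
        cls = super().__new__(mcs, name, bases, attrs)
        cls._meta_fields = []
        for field_name, field in fields.items():
            field.contribute_to_class(cls, field_name)
        registry[name] = cls  # stored for later lookup and instantiation
        return cls


class Model(metaclass=ModelBase):
    def __init__(self, **kwargs):
        for field in self._meta_fields:
            setattr(self, field.name, kwargs.get(field.name))


class Article(Model):
    status = Field(choices=[("d", "Draft"), ("p", "Published")])
    title = Field()


article = Article(status="p", title="Hello")
print(article.get_status_display())  # Published
```

Only fields with choices get a display method, which matches the behaviour the slide describes for get_FOO_display().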
Model metadata

• Model._meta
  • .fields
  • .get_field(fieldname)
  • .get_all_related_objects() (removed in
    Django 1.10; use .get_fields() instead)
Model instantiation

• Instance is populated from the database
  when it is loaded
• Has no further connection to the db
  until it is saved
• No identity between instances: two objects
  loaded from the same row are independent
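The lack of identity can be shown without a database. In this sketch a dict (TABLE) plays the database and Foo is a hypothetical toy model: two lookups of the same row give two independent objects, and saving one leaves the other stale.

```python
# Hypothetical in-memory table: primary key -> row.
TABLE = {1: {"id": 1, "name": "item1"}}


class Foo:
    """Toy model: each instance is a snapshot of a row, nothing more."""

    def __init__(self, row):
        self.__dict__.update(row)  # copy column values onto the instance

    @classmethod
    def get(cls, pk):
        return cls(TABLE[pk])  # every lookup builds a brand-new object

    def save(self):
        TABLE[self.id] = {"id": self.id, "name": self.name}


a = Foo.get(1)
b = Foo.get(1)
a.name = "changed"
a.save()
print(a is b)  # False: same row, two unrelated objects
print(b.name)  # item1: b still holds its old snapshot
```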
Querysets
• A model manager returns a queryset:
  foos = Foo.objects.all()

• A queryset is an ordered, list-like
  collection of instances of a single model
• No database access yet; the query runs
  when you:
  • Slice: foos[0]
  • Iterate: {% for foo in foos %}
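A toy queryset that counts its "queries" makes the laziness visible. LazyQuerySet is hypothetical and far simpler than Django's QuerySet, but it follows the same rules as far as this slide goes: nothing runs when it is created, indexing an unevaluated queryset runs its own query, and iterating fills a cache that later iterations and indexing reuse.

```python
class LazyQuerySet:
    """Toy queryset that counts how often it 'hits the database'."""

    def __init__(self, rows):
        self._rows = rows  # stands in for the database table
        self._result_cache = None
        self.queries = 0

    def _fetch_all(self):
        if self._result_cache is None:
            self.queries += 1  # the first full evaluation runs one query
            self._result_cache = list(self._rows)
        return self._result_cache

    def __iter__(self):
        return iter(self._fetch_all())

    def __getitem__(self, index):
        if self._result_cache is not None:
            return self._result_cache[index]  # already evaluated: no query
        self.queries += 1  # unevaluated: runs its own limited query
        return list(self._rows)[index]


foos = LazyQuerySet(["a", "b", "c"])
assert foos.queries == 0  # building the queryset touches nothing
first = foos[0]           # one query (Django would add LIMIT 1)
for foo in foos:          # a second query, which fills the cache
    pass
for foo in foos:          # served from the cache
    pass
print(foos.queries)  # 2
```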
Where do all those
  queries come from?
• Repeated queries
• Lack of caching
• Relational lookup
• Templates as well as views
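One way to find where queries come from is to log every statement. With DEBUG on, Django records executed SQL in django.db.connection.queries, and django.test.utils.CaptureQueriesContext wraps that for tests. The sketch below shows the same idea with the stdlib sqlite3 module and Connection.set_trace_callback; capture_queries and the article/category tables are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def capture_queries(conn):
    """Hypothetical helper: collect every SQL statement run on `conn`."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        yield statements
    finally:
        conn.set_trace_callback(None)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, slug TEXT, category_id INTEGER);
    INSERT INTO category VALUES (1, 'news');
    INSERT INTO article VALUES (1, 'a1', 1), (2, 'a2', 1), (3, 'a3', 1);
""")

with capture_queries(conn) as log:
    rows = conn.execute("SELECT slug, category_id FROM article").fetchall()
    for slug, category_id in rows:
        # One more query per article, like self.category.slug in a URL method
        conn.execute("SELECT slug FROM category WHERE id = ?", (category_id,)).fetchone()

selects = [sql for sql in log if sql.lstrip().upper().startswith("SELECT")]
print(len(selects))  # 4: one for the list, plus one per article
```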
Repeated queries
    def get_absolute_url(self):
      return "%s/%s" % (
         self.category.slug,
         self.slug
      )


    Same category, but query is
    repeated for each article
Repeated queries
• Same link on every
  page

• Dynamic, so can't
  go in urlconf

• Could be cached
  or memoized
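Because instances have no identity, memoizing on each article does not stop the repeated category lookups; caching by category id does. A sketch with functools.lru_cache, where CATEGORIES, CALLS and category_slug are hypothetical stand-ins for the Category table and its query. This cache lives for the whole process, so in a real project Django's cache framework or a per-request cache is likely a safer home, to avoid serving stale slugs.

```python
from functools import lru_cache

# Hypothetical stand-in for the category table; CALLS counts "queries".
CATEGORIES = {1: "news", 2: "sport"}
CALLS = {"n": 0}


@lru_cache(maxsize=None)
def category_slug(category_id):
    CALLS["n"] += 1  # only runs on a cache miss
    return CATEGORIES[category_id]


class Article:
    def __init__(self, slug, category_id):
        self.slug = slug
        self.category_id = category_id  # like Django's category_id column

    def get_absolute_url(self):
        # Uses the FK id, so the Category object itself is never loaded.
        return "%s/%s" % (category_slug(self.category_id), self.slug)


articles = [Article("a%d" % i, 1 + i % 2) for i in range(10)]
urls = [a.get_absolute_url() for a in articles]
print(CALLS["n"])  # 2: one lookup per distinct category, not per article
```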
Relationships




        http://www.flickr.com/photos/katietegtmeyer/124315322
Relational lookups

• Forwards:
  bar.foo.name

• Backwards:
  foo.bar_set.all()
Example models
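from django.db import models


class Foo(models.Model):
    name = models.CharField(max_length=10)


class Bar(models.Model):
    name = models.CharField(max_length=10)
    # on_delete is required from Django 2.0 onwards
    foo = models.ForeignKey(Foo, on_delete=models.CASCADE)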
Forwards relationship

>>> bar = Bar.objects.all()[0]
>>> bar.__dict__
{'id': 1, 'foo_id': 1, 'name': u'item1'}
Forwards relationship
>>> bar.foo.name
u'item1'
>>> bar.__dict__
{'_foo_cache': <Foo: Foo object>, 'id': 1,
'foo_id': 1, 'name': u'item1'}
Forwards relationships
• Relational access implemented via a
    descriptor:
    django.db.models.fields.related.
    ReverseSingleRelatedObjectDescriptor
    (ForwardManyToOneDescriptor in
    related_descriptors from Django 1.9)

•   __get__ tries to access _foo_cache

• If it doesn't exist, does the lookup and
    creates the cache
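A stripped-down, stdlib-only imitation of that descriptor, to show the _foo_cache mechanism. ForwardRelation, load_foo and FOO_TABLE are hypothetical; the real Django descriptor also handles assignment, multiple databases and missing rows. A second Bar instance with the same foo_id triggers its own lookup, which is the lack of identity described earlier.

```python
class ForwardRelation:
    """Toy descriptor mimicking bar.foo: look up by foo_id, then cache."""

    def __init__(self, name, loader):
        self.cache_name = "_%s_cache" % name  # e.g. _foo_cache
        self.id_name = "%s_id" % name         # e.g. foo_id
        self.loader = loader                  # hypothetical "query" function

    def __get__(self, instance, owner):
        if instance is None:
            return self
        try:
            return getattr(instance, self.cache_name)  # cache hit
        except AttributeError:
            value = self.loader(getattr(instance, self.id_name))
            setattr(instance, self.cache_name, value)  # create the cache
            return value


FOO_TABLE = {1: "item1"}
lookups = []


def load_foo(pk):
    lookups.append(pk)  # record each simulated query
    return FOO_TABLE[pk]


class Bar:
    foo = ForwardRelation("foo", load_foo)

    def __init__(self, foo_id):
        self.foo_id = foo_id


bar = Bar(foo_id=1)
bar.foo
bar.foo
print(bar.__dict__)  # {'foo_id': 1, '_foo_cache': 'item1'}
print(len(lookups))  # 1
```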
select_related
• Automatically follows foreign keys in the
  SQL query, using a JOIN
• Prepopulates _foo_cache
• Called with no arguments, doesn't follow
  null=True relationships; name them
  explicitly to include them
• Makes the query more expensive, so be sure
  you need it
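To see what select_related saves, here is a stdlib sqlite3 sketch mirroring the Foo/Bar tables. It counts statements with Connection.set_trace_callback, first loading each bar's foo separately and then fetching both in one JOIN. The table layout is an assumption modelled on the example models, and Django generates its own SQL; this only shows the shape of the saving. An INNER JOIN is used because foo_id is not nullable here; a nullable key would need an outer join so that bars without a foo are not dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT,
                      foo_id INTEGER NOT NULL REFERENCES foo(id));
    INSERT INTO foo VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO bar VALUES (1, 'b1', 1), (2, 'b2', 2), (3, 'b3', 1);
""")
queries = []
conn.set_trace_callback(queries.append)

# Without a join: one query for the bars, then one per bar for its foo (1 + n).
bars = conn.execute("SELECT id, name, foo_id FROM bar").fetchall()
lazy = [(name, conn.execute("SELECT name FROM foo WHERE id = ?", (foo_id,)).fetchone()[0])
        for _, name, foo_id in bars]
lazy_count = len(queries)

# select_related-style: one JOIN returns each bar together with its foo.
queries.clear()
joined = conn.execute(
    "SELECT bar.name, foo.name FROM bar INNER JOIN foo ON foo.id = bar.foo_id"
).fetchall()
joined_count = len(queries)
print(lazy_count, joined_count)  # 4 1
```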
Backwards relationships
{% for foo in my_foos %}
 {% for bar in foo.bar_set.all %}
  {{ bar.name }}
 {% endfor %}
{% endfor %}
Backwards relationships
• One query per foo
• If you iterate over foo.bar_set again, you
  generate a new set of db hits
• No equivalent of _foo_cache
• select_related does not work here
Optimising backwards
    relationships

• Get all related objects at once
• Group them by the ID of the parent object
• Then cache them in a hidden attribute, as
  select_related does
qs = Foo.objects.filter(criteria=whatever)  # placeholder filter
obj_dict = dict((obj.id, obj) for obj in qs)
for obj in obj_dict.values():
    obj._related = []  # so Foos without any Bars still have the attribute
objects = Bar.objects.filter(foo__in=qs)
relation_dict = {}
for obj in objects:
    relation_dict.setdefault(obj.foo_id, []).append(obj)
for foo_id, related in relation_dict.items():
    obj_dict[foo_id]._related = related
Optimising backwards
[{'time': '0.000', 'sql': u'SELECT
"foobar_foo"."id", "foobar_foo"."name" FROM
"foobar_foo"'},
{'time': '0.000', 'sql': u'SELECT
"foobar_bar"."id", "foobar_bar"."name",
"foobar_bar"."foo_id" FROM "foobar_bar"
WHERE "foobar_bar"."foo_id" IN (SELECT
U0."id" FROM "foobar_foo" U0)'}]
Optimising backwards

• Still potentially expensive: it can mean a
  large dependent subquery, which MySQL in
  particular handles very badly
• But now just two queries instead of n+1
• Not automatic: you need to remember to use
  the _related attribute
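The grouping step generalises into a small helper. attach_related is hypothetical, not part of Django; it assumes parents expose .id and children expose the foreign-key value (foo_id here), and it gives every parent the attribute, as an empty list when there are no children, so a template loop never hits a missing attribute. Django 1.4 and later also ship QuerySet.prefetch_related(), which does this two-query batching automatically (e.g. Foo.objects.prefetch_related('bar_set')), so the manual version mainly matters on older releases or for unusual cases.

```python
from collections import defaultdict
from types import SimpleNamespace


def attach_related(parents, children, fk_attr, cache_attr="_related"):
    """Group children by their foreign-key value and hang each group on its parent."""
    grouped = defaultdict(list)
    for child in children:
        grouped[getattr(child, fk_attr)].append(child)
    for parent in parents:
        # Every parent gets the attribute, even with no children.
        setattr(parent, cache_attr, grouped.get(parent.id, []))
    return parents


# Plain objects standing in for Foo and Bar instances.
foos = [SimpleNamespace(id=1), SimpleNamespace(id=2), SimpleNamespace(id=3)]
bars = [SimpleNamespace(name="b1", foo_id=1),
        SimpleNamespace(name="b2", foo_id=1),
        SimpleNamespace(name="b3", foo_id=2)]
attach_related(foos, bars, "foo_id")
print([[b.name for b in f._related] for f in foos])  # [['b1', 'b2'], ['b3'], []]
```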
Generic relations
• ForeignKey to ContentType, plus object_id
• Descriptor (GenericForeignKey) enables
  direct access
• Iterating through and accessing each object
  creates n+m queries (n = number of source
  objects, m = number of different content
  types)
• ContentType objects are cached automatically
• Forwards access caches the target object,
  like _foo_cache
• But select_related doesn't work
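from django.contrib.contenttypes.models import ContentType

# Assumes `queryset` is a QuerySet whose model has a generic foreign key named
# content_object. Iterating it twice reuses the queryset's result cache, so
# both loops see the same instances.
generics = {}
for item in queryset:
    generics.setdefault(item.content_type_id, set()).add(item.object_id)

content_types = ContentType.objects.in_bulk(list(generics))

relations = {}
for ct, fk_list in generics.items():
    ct_model = content_types[ct].model_class()
    relations[ct] = ct_model.objects.in_bulk(list(fk_list))

for item in queryset:
    # Prime the descriptor cache (attribute name used by older Django
    # releases); None if the target row has gone.
    setattr(item, '_content_object_cache',
            relations[item.content_type_id].get(item.object_id))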
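The same batching idea, stripped of Django: items are grouped by content type, each type is fetched with one bulk call, and the results are attached. Content types are plain strings here, and TABLES, in_bulk and prefetch_generic are hypothetical stand-ins for ContentType ids and Model.objects.in_bulk(). On Django 1.4 and later, prefetch_related('content_object') on a GenericForeignKey may do this job for you and is worth checking first.

```python
from collections import defaultdict
from types import SimpleNamespace

# Hypothetical tables, keyed by content type, each mapping pk -> object.
TABLES = {
    "photo": {1: "photo-1", 2: "photo-2"},
    "video": {7: "video-7"},
}
queries = []


def in_bulk(content_type, ids):
    """Stand-in for Model.objects.in_bulk(ids): one query per content type."""
    queries.append(content_type)
    table = TABLES[content_type]
    return {pk: table[pk] for pk in ids if pk in table}


def prefetch_generic(items):
    # 1. Collect the object ids needed for each content type.
    wanted = defaultdict(set)
    for item in items:
        wanted[item.content_type].add(item.object_id)
    # 2. One bulk query per content type: m queries instead of n.
    found = {ct: in_bulk(ct, ids) for ct, ids in wanted.items()}
    # 3. Attach each result; None where the target row no longer exists.
    for item in items:
        item.content_object = found[item.content_type].get(item.object_id)
    return items


items = [SimpleNamespace(content_type="photo", object_id=1),
         SimpleNamespace(content_type="video", object_id=7),
         SimpleNamespace(content_type="photo", object_id=2),
         SimpleNamespace(content_type="photo", object_id=99)]
prefetch_generic(items)
print([i.content_object for i in items])  # ['photo-1', 'video-7', 'photo-2', None]
print(sorted(queries))                    # ['photo', 'video']
```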
Other optimising
  techniques
Memoizing
• Cache property on first access
• Can cache within instance, if multiple
  accesses within same request
def get_expensive_items(self):
    if not hasattr(self, '_cache'):
        self._cache = self.expensive_op()
    return self._cache
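On Python 3.8+ the same memoize-on-first-access pattern is available as functools.cached_property (Django also provides django.utils.functional.cached_property). A minimal sketch, where Report and expensive_op are hypothetical. Deleting the attribute clears the cached value. As with the hasattr version, the cache lives on one instance, so it does not help other instances loaded from the same row.

```python
from functools import cached_property


class Report:
    def __init__(self):
        self.calls = 0

    def expensive_op(self):
        self.calls += 1  # count how often the real work runs
        return [1, 2, 3]

    @cached_property
    def expensive_items(self):
        # Computed on first access, then stored in the instance __dict__.
        return self.expensive_op()


r = Report()
r.expensive_items
r.expensive_items
print(r.calls)  # 1
```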
DB Indexes

• Pay attention to slow query log and
  debug toolbar output
• Add extra indexes where necessary -
  especially for multiple-column lookup
• Use EXPLAIN
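What "use EXPLAIN" looks like in practice, sketched with the stdlib sqlite3 module because it needs no server. The bar table, the bar_foo_name index and the plan helper are hypothetical, and SQLite's EXPLAIN QUERY PLAN wording differs from MySQL's EXPLAIN output and between SQLite versions. The point is the change from a full scan to an index search once a two-column index matches the lookup. (Django 2.1 and later also offer QuerySet.explain() for running EXPLAIN on a queryset.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT, foo_id INTEGER)")
query = "SELECT id FROM bar WHERE foo_id = ? AND name = ?"


def plan(sql, params):
    # The human-readable detail is the last column of each plan row.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(query, (1, "x"))  # full-table scan
conn.execute("CREATE INDEX bar_foo_name ON bar (foo_id, name)")
after = plan(query, (1, "x"))   # search using the new multi-column index
print(before)
print(after)
```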
Outsourcing

• Does all the logic need to go in the web
  app?
• Services - via eg Piston
• Message queues
• Distributed tasks, eg Celery
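A minimal in-process sketch of the message-queue idea, using the stdlib queue and threading modules: the "view" only enqueues work and returns, and a background worker does the slow part. handle_request and the squaring job are hypothetical placeholders; a real deployment would use a separate broker and worker processes (for example Celery tasks), not a thread inside the web process.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def worker():
    """Background consumer: does the slow work outside the request cycle."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut down
            jobs.task_done()
            break
        results.append(job * job)  # placeholder for e.g. sending an email
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()


def handle_request(n):
    """Hypothetical view: enqueue the work and return immediately."""
    jobs.put(n)
    return "accepted"


responses = [handle_request(n) for n in range(5)]
jobs.put(None)
jobs.join()  # wait for the queue to drain (demo only)
print(sorted(results))  # [0, 1, 4, 9, 16]
```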
Summary

• Understand where queries are coming
  from
• Optimise where necessary, within Django
  or in the database
• and...
PROFILE
Daniel Roseman

http://blog.roseman.org.uk


Editor's Notes

  • #3: (background: montage of Limmud, rosemanblog, Capital, Classic, Heart, GlassesDirect)
  • #4: Some of same ideas in Guido's Appstats talk this morning
  • #9: It's a model, in a field, geddit?
  • #10: For more, see Marty Alchin, Pro Django (Apress)
  • #11: descriptors used especially in related objects - see later
  • #12: Very useful for introspection and working out what's going on
  • #13: explain identity: multiple instances relating to same model row aren't the same object, changes made to one don't reflect the other; even saving one with new values won't be reflected in others.
  • #14: Update, Aggregates, Q, F
  • #17: Find repeated queries with my branch of the django-debug-toolbar, or SimonW's original query debug middleware
  • #21: Actually in 1.2 there's an extra _state object in __dict__, which is used for the multiple DB support (which I'm not covering here).
  • #23: Lack of model identity means that accessing the related item on one instance does not cause cache to be created on other instances that might reference the same db row
  • #26: Note: backwards cache does work on OneToOne as of 1.2
  • #38:
    +----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+
    | id | select_type        | table     | type            | possible_keys | key     | key_len | ref  | rows | Extra       |
    +----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+
    |  1 | PRIMARY            | sandy_bar | ALL             | NULL          | NULL    | NULL    | NULL |  100 | Using where |
    |  2 | DEPENDENT SUBQUERY | U0        | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using where |
    +----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+