Opened 5 months ago

Closed 4 months ago

Last modified 4 months ago

#36213 closed Cleanup/optimization (fixed)

Document that QuerySet.update can execute two separate SQL queries when using MySQL

Reported by: Babak Owned by: Babak
Component: Documentation Version: dev
Severity: Normal Keywords: mysql self-select race-condition
Cc: Simon Charette, John Speno Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Babak)

When you have a QuerySet that has filter conditions based on related tables, the QuerySet.update() function will execute two separate SQL queries (a SELECT, followed by an UPDATE).

Examples:

BlogPost.objects.filter(published=True).update(foo="bar") (Causes a single UPDATE)

BlogPost.objects.filter(published=True, author__name="Foo Bar").update(foo="bar") (Causes SELECT THEN UPDATE)

As I was told in the forum (thanks charettes), this is an undocumented MySQL-only behaviour because MySQL doesn't allow self-select updates.

This will silently cause nasty race conditions since the update is no longer running as a single SQL statement.

Currently the docs for QuerySet.update say:

Using update() also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save().

But in the case that I described, it causes the exact same type of race condition that the docs suggest that it prevents.

If the users are aware of this behaviour they can take care to avoid such filter conditions or to use alternative transaction handling mechanisms to ensure atomic behaviour.

I'd like to suggest for a warning to be added to the documentation about this.

Change History (17)

comment:1 by Babak, 5 months ago

Description: modified (diff)

comment:2 by Babak, 5 months ago

Here's a draft that I wrote. Happy to submit a PR if it's appropriate.

Last edited 5 months ago by Babak (previous) (diff)

comment:3 by Sarah Boyce, 5 months ago

Cc: Simon Charette added
Summary: Add warning to documentation: QuerySet.update can execute two separate SQL queries when using MySQLDocument that QuerySet.update can execute two separate SQL queries when using MySQL
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization
Version: dev

comment:4 by Anders Hovmöller, 5 months ago

This seems like it's the same thing: https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/

We also had a user on the unofficial django discord asking about a similar situation where their process got killed by the OOM killer when doing an update(). Also on mysql.

comment:5 by John Speno, 5 months ago

Cc: John Speno added

comment:6 by Simon Charette, 5 months ago

This seems like it's the same thing: ​https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/

Unless they are making use of on_delete=models.SET_NULL or models.SET which make use of update it's not the same thing.

comment:7 by Simon Charette, 5 months ago

In stand corrected, QuerySet.delete on MySQL also performs a materialization but at the database level (it doesn't fetch all the ids on the Python side).

I wonder if the same strategy could be used for the UPDATE.

Last edited 5 months ago by Simon Charette (previous) (diff)

comment:8 by Simon Charette, 5 months ago

I confirmed that the Python side materialization of the query can be entirely avoided by using the same strategy as #31965 (f6405c0b8ef7aff513b105c1da68407a881a3671).

While MySQL disallows UPDATE table ... WHERE table.id IN (SELECT id FROM table WHERE ...) it allows UPDATE table ... WHERE table.id IN (SELECT * FROM (SELECT id FROM table WHERE ...) subquery) as it materialize the subqueries remotely.

I also learned that MySQL UPDATE supports updating multiple table at a time which is wild so I figured I'd give a shot making use of it.

Not sure if this changes the outcome of this ticket but the first two commits of this branch seem to address the issue entirely by

  1. In cases where the subquer must be materialized favor doing so on the server side (same strategy as #31965).
  2. In cases where the UPDATE FROM syntax can be used with JOINs to avoid the materialization of a subquery (which MySQL is notably bad at) adopt a similar strategy to #23576 (7acef095d73322f45dcceb99afa1a4e50b520479). That's possible when no aggregation or related updates are used.

The last commit is complementary but demonstrate an optimization for MTI updates of the form

Child.objects.filter(child_field=0).update(parent_field=1, child_field=2)

That are currently implemented through via N + 1 queries on all backends (A first select followed by N updates)

SELECT parent_prt_id FROM child WHERE child_field = 0
UPDATE child SET child_field = 1 WHERE parent_prt_id IN :parent_prt_ids
UPDATE parent SET parent_field = 2 WHERE id IN :parent_prt_ids

But are replaced by the following on MySQL

UPDATE child
JOIN parent ON (child.parent_ptr_id = parent.id)
SET child.child_field = 1, parent.parent_field = 2
WHERE child_field = 0

Note that the optimization must be disabled if order_by is used against inherited fields as MySQL doesn't allow references to JOIN'ed table in UPDATE ORDER BY (the join clause would have to be altered to be an ordered subquery).

Last edited 5 months ago by Simon Charette (previous) (diff)

comment:9 by Andro Ranogajec, 5 months ago

Owner: set to Andro Ranogajec
Status: newassigned

comment:10 by Andro Ranogajec, 5 months ago

Has patch: set

comment:11 by Jacob Walls, 4 months ago

Patch needs improvement: set

comment:12 by Jacob Walls, 4 months ago

Owner: changed from Andro Ranogajec to Babak
Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:13 by Sarah Boyce, 4 months ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

comment:14 by Sarah Boyce, 4 months ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:15 by Sarah Boyce <42296566+sarahboyce@…>, 4 months ago

Resolution: fixed
Status: assignedclosed

In be1b776:

Fixed #36213 -- Doc'd MySQL's handling of self-select updates in QuerySet.update().

Co-authored-by: Andro Ranogajec <ranogaet@…>

comment:16 by Sarah Boyce <42296566+sarahboyce@…>, 4 months ago

In c68f3516:

[5.2.x] Fixed #36213 -- Doc'd MySQL's handling of self-select updates in QuerySet.update().

Co-authored-by: Andro Ranogajec <ranogaet@…>

Backport of be1b776ad8d6f9bccfbdf63f84b16fb81a13119e from main.

comment:17 by Sarah Boyce <42296566+sarahboyce@…>, 4 months ago

In b3b09dc6:

[5.1.x] Fixed #36213 -- Doc'd MySQL's handling of self-select updates in QuerySet.update().

Co-authored-by: Andro Ranogajec <ranogaet@…>

Backport of be1b776ad8d6f9bccfbdf63f84b16fb81a13119e from main.

Note: See TracTickets for help on using tickets.
Back to Top