Security News
Input Validation Vulnerabilities Dominate MITRE's 2024 CWE Top 25 List
MITRE's 2024 CWE Top 25 highlights critical software vulnerabilities like XSS, SQL Injection, and CSRF, reflecting shifts due to a refined ranking methodology.
django-bulk-update-or-create
Advanced tools
Everyone using Django ORM will eventually find himself doing batch update_or_create
operations: ingest files from external sources, sync with external APIs, etc.
If the number of records is big, the slowliness of QuerySet.update_or_create
will stand out: it is very practical to use but it always does one SELECT
and then one INSERT
(if select didn't return anything) or UPDATE
/.save
(if it did).
Searching online shows that this does indeed happen to quite a few people though it doesn't seem to be any good solution:
bulk_create
is really fast if you know all records are new (and you're not using multi-table inheritance)bulk_update
does some nice voodoo to update several records with the same UPDATE
statement (using a huge WHERE
condition together with CASE
), but you need to be sure they all existbulk_create
limitations ==> cannot use on models with multi-table inheritanceThis package tries to tackle this introducing bulk_update_or_create
to model QuerySet/Manager:
update_or_create
: (1 SELECT + 1 INSERT/UPDATE) * N
bulk_update_or_create
: 1 BIG_SELECT + 1 BIG_UPDATE + (lte_N) INSERT
For a batch of records:
SELECT
all from database (based on the match_field
parameter)bulk_update
for thoseINSERT
/.create
on each of the remainingThe (SOFTCORE) performance test looks promising, more than 70% less time (average):
$ make testcmd
# default - sqlite
DJANGO_SETTINGS_MODULE=settings tests/manage.py bulk_it
loop of update_or_create - all creates: 3.966486692428589
loop of update_or_create - all updates: 4.020653247833252
loop of update_or_create - half half: 3.9968857765197754
bulk_update_or_create - all creates: 2.949239730834961
bulk_update_or_create - all updates: 0.15633511543273926
bulk_update_or_create - half half: 1.4585723876953125
# mysql
DJANGO_SETTINGS_MODULE=settings_mysql tests/manage.py bulk_it
loop of update_or_create - all creates: 5.511938571929932
loop of update_or_create - all updates: 5.321666955947876
loop of update_or_create - half half: 5.391834735870361
bulk_update_or_create - all creates: 1.5671980381011963
bulk_update_or_create - all updates: 0.14612770080566406
bulk_update_or_create - half half: 0.7262606620788574
# postgres
DJANGO_SETTINGS_MODULE=settings_postgresql tests/manage.py bulk_it
loop of update_or_create - all creates: 4.3584535121917725
loop of update_or_create - all updates: 3.6183276176452637
loop of update_or_create - half half: 4.145816087722778
bulk_update_or_create - all creates: 1.044851541519165
bulk_update_or_create - all updates: 0.14954638481140137
bulk_update_or_create - half half: 0.8407495021820068
pip install django-bulk-update-or-create
Add it to your INSTALLED_APPS
list in settings.py
BulkUpdateOrCreateQuerySet
as manager of your model(s)from django.db import models
from bulk_update_or_create import BulkUpdateOrCreateQuerySet
class RandomData(models.Model):
objects = BulkUpdateOrCreateQuerySet.as_manager()
uuid = models.IntegerField(unique=True)
data = models.CharField(max_length=200, null=True, blank=True)
bulk_update_or_create
items = [
RandomData(uuid=1, data='data for 1'),
RandomData(uuid=2, data='data for 2'),
]
RandomData.objects.bulk_update_or_create(items, ['data'], match_field='uuid')
with RandomData.objects.bulk_update_or_create_context(['data'], match_field='uuid', batch_size=10) as bulkit:
for i in range(10000):
bulkit.queue(RandomData(uuid=i, data=i + 20))
bulk_update_or_create
supports yield_objects=True
so you can iterate over the created/updated objects.
bulk_update_or_create_context
provides the same information to the callback function specified as status_cb
WIP
bulk_create
for creates: assert model is not multi-table, if enabledcase_insensitive_match
should be dropped and collation detected in runtimematch_field
- probably will need to use WHERE (K1=X and K2=Y) or (K1=.. and K2 =..)
instead of IN
for those, as that SQL standard doesn't seem widely adopted yetUPSERT
alternative package once done!FAQs
bulk_update_or_create for Django model managers
We found that django-bulk-update-or-create demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
MITRE's 2024 CWE Top 25 highlights critical software vulnerabilities like XSS, SQL Injection, and CSRF, reflecting shifts due to a refined ranking methodology.
Security News
In this segment of the Risky Business podcast, Feross Aboukhadijeh and Patrick Gray discuss the challenges of tracking malware discovered in open source softare.
Research
Security News
A threat actor's playbook for exploiting the npm ecosystem was exposed on the dark web, detailing how to build a blockchain-powered botnet.