Benchmark

Introduction

This benchmark is not meant to be exhaustive, nor fair to SQL. It shows how django-cachalot behaves on an unoptimised application. On an application using only perfectly optimised SQL queries, django-cachalot may not be useful. Unfortunately, most Django apps (including Django itself) use unoptimised queries. Of course, they often lack useful indexes (even though adding one only requires about 20 characters…). But what you may not know is that the ORM currently generates totally unoptimised queries [1].

You can run the benchmarks yourself (officially supported on Linux and macOS). You will need a database called “cachalot” on both MySQL and PostgreSQL, and on PostgreSQL you will also need a role called “cachalot”. If anything is missing, running the benchmarks raises errors with specific instructions on how to fix them.

  1. Install: pip install -r requirements/benchmark.txt
  2. Run: python benchmark.py

The output will be in benchmark/TODAY’S_DATE/
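The database setup described above can be sketched as follows. This is a minimal, hedged example: it assumes a local PostgreSQL superuser named postgres and MySQL root access, which may differ on your machine.

```shell
# PostgreSQL: create the "cachalot" role and database (assumes a local
# "postgres" superuser; your authentication setup may differ)
psql -U postgres -c "CREATE ROLE cachalot LOGIN;"
psql -U postgres -c "CREATE DATABASE cachalot OWNER cachalot;"

# MySQL: create the "cachalot" database (assumes root access)
mysql -u root -e "CREATE DATABASE cachalot;"
```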

Conditions

In this benchmark, a small database is generated, and each test is executed 20 times under the following conditions:

CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
RAM: 24634516 kB
Disk: SAMSUNG MZVPW256HEGL-00000
Linux distribution: Ubuntu 18.04 bionic
Python: 3.7.0b3
Django: 2.1
cachalot: 2.1.0
sqlite: 3.22.0
PostgreSQL: 10.4
MySQL: 5.7.23
Redis: 4.0.9
memcached: 1.5.6
psycopg2: 2.7.5
mysqlclient: 1.3.13

Note that MySQL’s query cache is active during the benchmark.

Database results

  • mysql is at worst 1.1× slower and at best 4.0× faster
  • postgresql is at worst 1.1× slower and at best 9.0× faster
  • sqlite is at worst 1.2× slower and at best 4.3× faster
[Figure: db.svg — per-database benchmark results]

Cache results

  • filebased is at worst 1.2× slower and at best 5.8× faster
  • locmem is at worst 1.1× slower and at best 6.1× faster
  • memcached is at worst 1.1× slower and at best 5.0× faster
  • pylibmc is at worst 1.1× slower and at best 5.6× faster
  • redis is at worst 1.1× slower and at best 5.6× faster
[Figure: cache.svg — per-cache-backend benchmark results]

Cache detailed results

Redis

[Figure: cache_redis.svg — detailed Redis benchmark results]
[1] The ORM fetches far too much data if you don’t restrict it using .only() and .defer(). By specifying only the fields you need, you can divide the execution time of most queries by 2–3. But spelling out the wanted fields for every query is tedious and unmaintainable; an automation based on field-usage statistics would be possible and would drastically improve performance. Other performance issues occur with slicing: you can often optimise a sliced query using a subquery, e.g. YourModel.objects.filter(pk__in=YourModel.objects.filter(…)[10000:10050]).select_related(…) instead of YourModel.objects.filter(…).select_related(…)[10000:10050]. I may work on these issues one day.