Introduction

Should you use it?

Django-cachalot is the perfect speedup tool for most Django projects. It will speed up a website serving 100 000 visits per month without any problem. In fact, the more visitors you have, the faster the website becomes, because every possible SQL query on the project ends up being cached.

Django-cachalot is especially efficient in the Django administration site, since it is unfortunately badly optimised (use foreign keys in list_editable if you need to be convinced).

However, it’s not suited to projects with a high number of modifications per minute on each table, such as a social network handling more than 30 messages per minute. Django-cachalot may still give a small speedup in such cases, but it may also slow things down a bit (in the worst-case scenario, a 20% slowdown, according to the benchmark). If you have a website like that, optimising your SQL database and queries is the number one thing you have to do.
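
If only one or two tables are write-heavy, a middle ground is to exclude just those tables from caching. A minimal sketch, assuming the CACHALOT_UNCACHABLE_TABLES setting described in the Settings documentation (‘myapp_message’ is a made-up table name):

    # settings.py — skip caching for a write-heavy table instead of
    # dropping django-cachalot entirely ('myapp_message' is hypothetical).
    CACHALOT_UNCACHABLE_TABLES = frozenset(('myapp_message',))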

There is also an obvious case where you don’t need django-cachalot: when the project is already fast enough (all pages load in less than 300 ms). Like any other dependency, django-cachalot is a potential source of problems (even though it’s currently bug-free). Don’t use dependencies you can avoid; a “future you” may thank you for that.

Features

  • Caches the results of any data-reading SQL query generated by the Django ORM. These cached results are then returned instead of executing the same SQL query again, which is faster.
  • The first execution of a query is about 10% slower; subsequent executions are way faster (7× faster on average).
  • Automatically invalidates saved results, so that you never get stale results.
  • Invalidates per table, not per object: if you change an object, all the queries done on other objects of the same model are also invalidated. Unfortunately, it is technically impossible to build a reliable per-object cache. Don’t be fooled by packages pretending to have that per-object feature: they are unreliable and dangerous for your data.
  • Handles everything in the ORM. You can use the most advanced ORM features without a single issue; django-cachalot is extremely robust.
  • Easy to control thanks to Settings and a simple API, though that’s only required if you have a complex infrastructure. Most people will never use the settings or the API.
  • A few bonus features, like a signal triggered at each database change (including bulk changes) and a template tag for better template fragment caching (see the sketch below).
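
For example, here is a minimal sketch of hooking into that database-change signal, assuming the post_invalidation signal from the cachalot.signals module (check the API documentation for the exact signature):

    from django.dispatch import receiver
    from cachalot.signals import post_invalidation

    @receiver(post_invalidation)
    def on_invalidation(sender, **kwargs):
        # `sender` is the name of the invalidated database table.
        print('Invalidated table %s on database %s'
              % (sender, kwargs.get('db_alias')))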

Comparison with similar tools

This comparison was done in October 2015. It compares django-cachalot to the other popular automatic ORM caches of the time: django-cache-machine and django-cacheops.

Features

Feature                                            cachalot    cache-machine  cacheops
Easy to install                                    ✔           ✔              quite
Cache agnostic                                     ✔           ✔              ✘
Type of invalidation                               per table   per object     per table
CPU & memory performance                           optimal     bad            terrible
Reliable                                           ✔           ✘              quite
Handles QuerySet.count                             ✔           ✘              ✔
Handles empty queries                              ✔           ✘              ✔
Handles multi-table inheritance                    ✔           probably not   ✘
Handles proxy models                               ✔           ?              ?
Handles many-to-many fields                        ✔           ✘              ✔
Handles transactions                               ✔           probably not   ✘
Handles QuerySet.aggregate/annotate                ✔           probably not   ?
Handles QuerySet.bulk_create/update/delete         ✔           probably not   ?
Handles QuerySet.select_related/prefetch_related   ✔           ?              partially
Handles cursor.execute                             ✔           ?              ✘
Handles GeoDjango                                  ✔           maybe          ?
Handles django.contrib.postgres                    ✔           maybe          partially

To find out whether a package supports a feature, I searched the documentation, the issues, the tests and the code. I really tried to avoid writing “maybe”, “probably not”, etc. Unfortunately, the absence of tests for such cases, and sometimes the authors’ own confusion about these features, makes it difficult to know whether a feature is supported; cells marked “?” are those I could not determine.

Explanations

Of course, I can’t just throw out a table with such “Reliable” and “CPU & memory performance” lines without explanation. My goal is not to start another pointless open source conflict, nor to be pretentious about my work. I’m just trying to inform users here, so they can fully grasp the consequences of choosing one tool or another. I actually used django-cache-machine in production for a week and django-cacheops for a month. With both solutions I faced a lot of invalidation issues, and the bigger the cache became, the worse the performance was.

I now know the reason for these issues: in short, their invalidation systems. Read the following paragraphs for more details.

django-cache-machine

django-cache-machine uses “flush lists” to remember which SQL queries are linked to which objects. This is the approach I chose when I created a prototype of django-cachalot, except that it invalidated per table, not per object as django-cache-machine does. Unfortunately, several important issues with this approach led me to drop it.

The smaller issue is that each time you execute a new SQL query, django-cache-machine needs to fetch the “flush list” from the cache, update it and write it back to the cache. This means two cache calls in addition to the cache call that stores the SQL query results, as sketched below. It may seem tiny, but as your cache grows, the “flush lists” become huge (a list of hundreds of cache keys for each database object), leading to an ever-growing cache size and an ever-longer time to fetch those growing “flush lists”. In short: bad memory and CPU usage when reading data.
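
A toy reimplementation (not django-cache-machine’s actual code), using Django’s cache API, makes the three round-trips visible:

    from django.core.cache import cache

    def cache_query_results(obj_key, query_key, results, timeout=3600):
        cache.set(query_key, results, timeout)   # call 1: store the results
        flush_list = cache.get(obj_key) or []    # call 2: fetch the flush list
        flush_list.append(query_key)             # the list only ever grows…
        cache.set(obj_key, flush_list, timeout)  # call 3: write it back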

The second issue is linked only to per-object invalidation. When django-cache-machine invalidates an object, it also needs to invalidate the queries of the related objects, otherwise they may contain stale data. django-cache-machine invalidates foreign keys only, not many-to-many or generic foreign keys (because… I don’t know). This degrades the performance of every write to the database, since it has to fetch the related objects, fetch their “flush lists” and delete those cache keys. And of course it can’t invalidate basic queries such as counts or empty queries (probably aggregations too, but I’m not sure).

And last but not least: a critical issue. It simply proves that the django-cache-machine team doesn’t know how caches work. Caches are fast because they are stupid: when your cache is full and needs room, it randomly fetches a few keys, selects the oldest ones if possible, then deletes them. This means that a cache key with a 1-year timeout can be deleted before a cache key with a 1-minute timeout. But django-cache-machine assumes its “flush lists” will always stay in the cache longer than the saved query results, because they have the same timeout and the “flush lists” are saved a few milliseconds after the query results. Until the cache is full, this is roughly true, since no cache key is deleted. But once it is full, a “flush list” can be removed at any moment, and the cache keys it referenced can then never be invalidated until they are themselves deleted.
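
A toy illustration of that failure mode, with made-up key names and Django’s cache API:

    from django.core.cache import cache

    results = [('alice',), ('bob',)]                  # pretend query results
    cache.set('qry:abc123', results, 3600)            # saved first
    cache.set('flush:user:42', ['qry:abc123'], 3600)  # saved milliseconds later
    # Under memory pressure, the backend may evict 'flush:user:42' first.
    # 'qry:abc123' then survives with no flush list pointing at it, so a
    # later write to user 42 can never invalidate it: stale data until expiry.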

To sum up, django-cache-machine has bad memory and CPU performance and is absolutely not reliable.

django-cacheops

django-cacheops uses a debug feature of Redis, KEYS, to invalidate cache keys (which is why it only supports Redis). This command becomes linearly slower as your cache size grows. By my measurements, a single call to this command by django-cacheops slows any database save by 50 ms to 3.5 seconds, depending on your database and cache sizes. Worse, django-cacheops runs this command several times per save. Suppose you have a model with 3 many-to-many fields and you save an object with 3 related objects per many-to-many field: django-cacheops will run the Redis KEYS command at least 10 times! If you have a large cache and database, it means you can wait 30 seconds while this object is saved!

Another bad consequence of this use of the KEYS command is that Redis jumps to 100% CPU usage while the command runs, degrading performance for other users or even blocking them until the command finishes.
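
To see why, here is roughly what such an invalidation looks like with the redis-py client (the key pattern is made up; cacheops’ real key layout differs):

    import redis

    r = redis.Redis()
    # KEYS scans the entire keyspace: O(total number of keys), and Redis
    # is single-threaded, so nothing else runs until the scan finishes.
    keys = r.keys('cacheops:myapp.message:*')  # hypothetical pattern
    if keys:
        r.delete(*keys)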

More generally, the django-cacheops workflow is totally unoptimised. When an object is modified, an invalidate_obj function is called, which calls an invalidate_dict function, which calls the manage.py invalidate command with a serialized version of the object (!?), which calls an invalidate_model function that runs the Redis KEYS command to get all the cache keys for that model, then deletes them. And as I said above, it executes all of that N times, N being the number of objects related to the current object, even though several of those objects share the same model and the model therefore doesn’t need to be invalidated more than once.
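
The optimisation this implies is missing is trivial: collect the distinct models first, then invalidate each one once. A sketch, with a hypothetical invalidate_model helper standing in for cacheops’ per-model invalidation:

    def invalidate_once_per_model(obj, related_objects, invalidate_model):
        # One invalidation (one KEYS scan) per distinct model,
        # instead of one per related object.
        models = {type(obj)} | {type(rel) for rel in related_objects}
        for model in models:
            invalidate_model(model)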

To sum up, django-cacheops has terrible performance when modifying data, though it is reliable for what it handles. But you probably need features it doesn’t handle, such as transactions (used by the Django admin), multi-table inheritance, or cursor.execute (all three being used by Wagtail and django CMS)…

Number of lines of code

Django-cachalot tries to be as minimalist as possible while handling most use cases. Being minimalist is essential to creating maintainable projects, and a large test suite is essential to achieving excellent quality. The statistics below speak for themselves…

Project part  cachalot  cache-machine  cacheops
Application   743       843            1662
Tests         3023      659            1491