Introduction
Should you use it?
Django-cachalot is the perfect speedup tool for most Django projects. It will speed up a website of 100 000 visits per month without any problem. In fact, the more visitors you have, the faster the website becomes. That’s because every possible SQL query on the project ends up being cached.
Django-cachalot is especially efficient in the Django administration website since it’s unfortunately badly optimised (use foreign keys in list_editable if you need to be convinced). The operations that benefit most include select_related and prefetch_related.
However, it’s not suited for projects where there is a high number of modifications per minute on each table, like a social network with more than 50 messages per minute. Django-cachalot may still give a small speedup in such cases, but it may also slow things down a bit (in the worst case scenario, a 20% slowdown, according to the benchmark). If you have a website like that, optimising your SQL database and queries is the number one thing you have to do.
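As a rough way to reason about when caching stops paying off, the sketch below models the average cost of a read query as a function of the cache hit rate. It borrows the two averages quoted in the Features list (a first execution about 10% slower, later executions about 7× faster) and treats them as constants, which is a simplifying assumption. The function names are hypothetical and not part of django-cachalot, and the model ignores the cost that writes pay to invalidate the cache.

```python
# Back-of-the-envelope model, not a benchmark. The constants are the
# averages quoted in this documentation; real values depend on the project.
MISS_OVERHEAD = 1.10  # a cache miss costs about 10% more than the raw query
HIT_SPEEDUP = 7.0     # a cache hit is about 7x faster than the raw query


def relative_cost(hit_rate: float) -> float:
    """Average cost of a read, relative to running it uncached (1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate / HIT_SPEEDUP + (1.0 - hit_rate) * MISS_OVERHEAD


def break_even_hit_rate() -> float:
    """Hit rate at which cached and uncached reads cost the same."""
    # Solve h / HIT_SPEEDUP + (1 - h) * MISS_OVERHEAD == 1 for h.
    return (MISS_OVERHEAD - 1.0) / (MISS_OVERHEAD - 1.0 / HIT_SPEEDUP)


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.5, 0.9):
        print(f"hit rate {rate:.0%}: relative cost {relative_cost(rate):.2f}")
    print(f"break-even hit rate: {break_even_hit_rate():.1%}")
```

Under these assumptions, frequent modifications hurt because each one invalidates cached results and pushes the hit rate toward zero, where every read pays the miss overhead without ever being reused.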
There is also an obvious case where you don’t need django-cachalot: when the project is already fast enough (all pages load in less than 300 ms). Like any other dependency, django-cachalot is a potential source of problems (even though it’s currently bug-free). Don’t use dependencies you can avoid; a “future you” may thank you for that.
Features
- Saves in cache the results of any SQL query generated by the Django ORM that reads data. These saved results are then returned instead of executing the same SQL query, which is faster.
- The first execution of a query is about 10% slower; later executions are much faster (7× faster on average).
- Automatically invalidates saved results, so that you never get stale results.
- Invalidates per table, not per object: if you change an object, all the queries done on other objects of the same model are also invalidated. It is unfortunately technically impossible to make a reliable per-object cache. Don’t be fooled by packages claiming to have a per-object feature; they are unreliable and dangerous for your data.
- Handles everything in the ORM. You can use the most advanced features of the ORM without a single issue; django-cachalot is extremely robust.
- Easy control through Settings and a simple API. These are only needed if you have a complex infrastructure; most people will never use the settings or the API.
- A few bonus features, like a signal triggered at each database change (including bulk changes) and a template tag for better template fragment caching.
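To illustrate what “per table, not per object” means in practice, here is a minimal toy model in plain Python. PerTableCache and demo are hypothetical names invented for this sketch; it is not django-cachalot’s implementation, only an illustration of the invalidation strategy described in the list above.

```python
from collections import defaultdict


class PerTableCache:
    """Toy query cache that invalidates per table (illustration only)."""

    def __init__(self):
        self._results = {}
        self._keys_by_table = defaultdict(set)

    def get_or_compute(self, key, tables, compute):
        """Return the cached result for `key`, computing it on a miss."""
        if key in self._results:
            return self._results[key]
        result = compute()
        self._results[key] = result
        # Remember every table the query reads, so that a write to any
        # of them can drop this entry.
        for table in tables:
            self._keys_by_table[table].add(key)
        return result

    def invalidate(self, table):
        """Drop every cached result that read from `table`."""
        for key in self._keys_by_table.pop(table, set()):
            self._results.pop(key, None)


def demo():
    """Return how many times the underlying query really ran."""
    books = {1: "Dune", 2: "Emma"}
    executions = []
    key = "SELECT title FROM book WHERE id = 1"

    def title_of_book_1():
        executions.append(key)
        return books[1]

    cache = PerTableCache()
    cache.get_or_compute(key, ["book"], title_of_book_1)  # miss, runs the query
    cache.get_or_compute(key, ["book"], title_of_book_1)  # hit, no query

    books[2] = "Persuasion"   # a write to a different row of the same table
    cache.invalidate("book")  # still drops the cached result for row 1

    cache.get_or_compute(key, ["book"], title_of_book_1)  # miss again
    return len(executions)


if __name__ == "__main__":
    print(demo())
```

The trade-off is visible in demo: changing book 2 forces the query on book 1 to run again. That costs some cache hits, but the cache never has to decide which individual objects a query depends on.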
Comparison with similar tools
This comparison was done in December 2015. It compares django-cachalot to the other automatic ORM caches that were popular at the time: django-cache-machine & django-cacheops.
Features
Feature | cachalot | cache-machine | cacheops |
---|---|---|---|
Easy to install | ✔ | ✘ | quite |
Cache agnostic | ✔ | ✔ | ✘ |
Type of invalidation | per table | per object | per query |
CPU performance | excellent | excellent | excellent |
Memory performance | excellent | good | excellent |
Reliable | ✔ | ✘ | ✘ |
Useful for > 50 modifications per minute | ✘ | ✔ | ✔ |
Handles transactions | ✔ | ✘ | ✘ |
Handles Django admin save | ✔ | ✘ | ✘ |
Handles multi-table inheritance | ✔ | ✔ | ✘ |
Handles QuerySet.count | ✔ | ✘ | ✔ |
Handles QuerySet.aggregate/annotate | ✔ | ✔ | ✘ |
Handles QuerySet.update | ✔ | ✘ | ✘ |
Handles QuerySet.select_related | ✔ | ✔ | ✘ |
Handles QuerySet.extra | ✔ | ✘ | ✘ |
Handles QuerySet.values/values_list | ✔ | ✘ | ✔ |
Handles QuerySet.dates/datetimes | ✔ | ✘ | ✔ |
Handles subqueries | ✔ | ✔ | ✘ |
Handles querysets generating a SQL HAVING keyword | ✔ | ✔ | ✘ |
Handles cursor.execute | ✔ | ✘ | ✘ |
Handles the Django command flush | ✔ | ✘ | ✘ |
Explanations
“Handles [a feature]” means that the package correctly invalidates SQL queries using that feature. So if a package doesn’t handle a feature, you may get stale query results when using this feature.
It does not mean that it caches a query with this feature, although django-cachalot caches all queries except random queries or those run through cursor.execute.
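To make the notion of an unhandled feature concrete, here is a minimal sketch using only Python’s standard sqlite3 module. NaiveCache is a hypothetical stand-in for a cache that invalidates on writes going through its own update method but not on raw SQL; it is not the code of any of the packages compared here.

```python
import sqlite3


class NaiveCache:
    """Hypothetical cache that only notices writes made through update()."""

    def __init__(self, conn):
        self.conn = conn
        self._results = {}

    def read(self, sql):
        # Serve a cached result when one exists, otherwise query the database.
        if sql not in self._results:
            self._results[sql] = self.conn.execute(sql).fetchall()
        return self._results[sql]

    def update(self, sql, params=()):
        # "Handled" write path: run the statement, then invalidate.
        self.conn.execute(sql, params)
        self._results.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
cache = NaiveCache(conn)
query = "SELECT balance FROM account WHERE id = 1"

first = cache.read(query)  # miss: reads from the database

# Handled path: the cache is invalidated, so the next read is fresh.
cache.update("UPDATE account SET balance = ? WHERE id = 1", (80,))
after_handled = cache.read(query)

# Unhandled path: a raw statement bypasses the cache, which keeps
# serving the previous balance.
conn.execute("UPDATE account SET balance = 50 WHERE id = 1")
after_unhandled = cache.read(query)      # stale result from the cache
actual = conn.execute(query).fetchall()  # what the database really holds

print(first, after_handled, after_unhandled, actual)
```

After the raw UPDATE, the cache and the database disagree. Any code that reads the cached balance and writes it back would overwrite the newer value, which is the failure mode that a “✘” in the table stands for.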
This comparison was done by running the test suite of cachalot against cache-machine & cacheops. This test suite is indeed relevant for other packages (such as cache-machine & cacheops) since most of it is written in a cachalot-independent way.
Similarly, the performance comparison was done using our benchmark, coupled with a memory measure.
To me, cache-machine & cacheops are not reliable for these reasons:
- Neither cache-machine nor cacheops handles transactions, which is critical. Transactions are used a lot in Django internals: at least in any Django admin save, many-to-many relation modification, bulk creation or update, migrations, and session save. If an error occurs during one of these operations, good luck finding out whether stale data is returned. The best you can do in this case is manually clearing the cache.
- If you use a query that’s not handled, you may get stale data. It ends up ruining your database since it lets you save modifications to stale data, therefore overwriting the latest version that’s in the database. And you always end up using queries that are not handled since there is no list of unhandled queries in the documentation of each module.
- In the case of cache-machine, another issue is that it relies on “flush lists”, which can’t work reliably when implemented in a cache like this (see cache-machine#107).
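The transaction problem from the first point can be reproduced with a small sketch, again using only the standard sqlite3 module. The helper names and the dict used as a cache are hypothetical, and the “aware” variant shows just one general way to cope with rollbacks (buffering cache writes until commit), not the implementation of any package mentioned here.

```python
import sqlite3

QUERY = "SELECT name FROM tag ORDER BY name"


def make_db():
    # isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE tag (name TEXT)")
    conn.execute("INSERT INTO tag VALUES ('django')")
    return conn


def run_in_transaction(conn, cache, *, aware, fail):
    """Insert a row, read the table, then commit or roll back."""
    # A transaction-aware cache buffers its writes until the commit;
    # an unaware one writes straight into the shared cache.
    pending = {} if aware else cache
    conn.execute("BEGIN")
    conn.execute("INSERT INTO tag VALUES ('draft')")
    pending[QUERY] = conn.execute(QUERY).fetchall()
    if fail:
        conn.execute("ROLLBACK")  # buffered entries are simply discarded
        return
    conn.execute("COMMIT")
    if aware:
        cache.update(pending)     # publish only after a successful commit


if __name__ == "__main__":
    unaware_cache, aware_cache = {}, {}
    run_in_transaction(make_db(), unaware_cache, aware=False, fail=True)
    run_in_transaction(make_db(), aware_cache, aware=True, fail=True)
    print("unaware cache after rollback:", unaware_cache)
    print("aware cache after rollback:", aware_cache)
```

In the unaware case the cache keeps a row that was rolled back and never existed in the database, and nothing will invalidate it until the table is next modified.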
Number of lines of code
Django-cachalot tries to be as minimalist as possible, while handling most use cases. Being minimalist is essential to create maintainable projects, and having a large test suite is essential to achieve excellent quality. The statistics below speak for themselves…
Project part | cachalot | cache-machine | cacheops |
---|---|---|---|
Application | 743 | 843 | 1662 |
Tests | 3023 | 659 | 1491 |