What about sharding in Django?
13 Feb 2014
Some time ago I needed to implement sharding in Django 1.6. It was an attempt to step beyond the standard features of the framework, and I felt Django's resistance =) I'll talk a bit about this challenge and its results.
Let’s start with definitions. Wikipedia says that:
A database shard is a horizontal partition in a database. Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than being split into columns (which is what normalization and vertical partitioning do, to differing extents). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.
We wanted to split our database entities across different PostgreSQL schemas and used something like this for the id generation. The sharding model was clear, but how do you implement it in a Django application?
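To make the idea of a sharded id concrete, here is a minimal sketch of one common approach: pack a timestamp, a shard number and a per-shard sequence into a single integer, so that the shard can later be recovered from the id alone. The bit widths, the epoch and the function names `make_id` and `shard_of` are assumptions chosen for illustration, not the exact scheme we used.

```python
# Hypothetical bit layout (all three constants are assumptions).
SHARD_BITS = 13                # up to 8192 logical shards
SEQ_BITS = 10                  # per-shard sequence, 1024 values per millisecond
EPOCH_MS = 1388534400000       # custom epoch: 2014-01-01 00:00:00 UTC


def make_id(timestamp_ms, shard_id, seq):
    """Pack timestamp, shard number and sequence value into one integer."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    result = (timestamp_ms - EPOCH_MS) << (SHARD_BITS + SEQ_BITS)
    result |= shard_id << SEQ_BITS
    result |= seq % (1 << SEQ_BITS)
    return result


def shard_of(entity_id):
    """Recover the shard number that was packed into an id."""
    return (entity_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
```

Because the shard number lives inside the id, any code that knows an entity id can work out which schema to query without an extra lookup table.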
My solution to this problem was a custom database backend that contains custom SQL compilers. Maybe it was a dirty hack, but I hope it wasn't =)
To create your own custom database backend, you can copy the structure of one of the existing backends in `django.db.backends` (`postgresql_psycopg2` in our case) and override `DatabaseOperations`.
The custom SQL compilers add the corresponding schema name to the SQL query, based on the entity id.
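The core of that step is small: derive a schema name from the id and prefix the table name with it. The sketch below shows only this name rewriting in plain Python. A real compiler would subclass the classes in `django.db.models.sql.compiler`, which is not shown here. The `shard_` prefix, the bit layout and the helper names `schema_for` and `qualify` are assumptions for illustration.

```python
# Assumed id layout: the shard number sits just above a 10-bit sequence.
SEQ_BITS = 10
SHARD_BITS = 13


def schema_for(entity_id, prefix="shard_"):
    """Map an entity id to the name of the PostgreSQL schema holding it."""
    shard = (entity_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    return "%s%d" % (prefix, shard)


def qualify(table, entity_id):
    """Return the schema-qualified, quoted table name for an entity."""
    return '"%s"."%s"' % (schema_for(entity_id), table)


# Example: an id whose shard bits encode shard 42.
example_id = (42 << SEQ_BITS) | 7
qualified = qualify("blog_post", example_id)
```

With this in place, a query for `example_id` would target `"shard_42"."blog_post"` instead of the bare `blog_post` table.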
That's all! Oh, okay, that's not all =) Now you must create a custom `QuerySet` (with two overridden methods, `get` and `create`) to provide a correct sharded id for all entities.
But there is one problem: migrations. You can't migrate your sharded models correctly, and that's sad. To work around this we introduced a more complex database configuration dictionary. We used a special method that converts this complex config into the standard one with a lot of database connections, one for each shard. All connections have the `search_path` option. In `settings.py` we must take into account the type of action.
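A rough sketch of such a conversion might look like the following. The compact config format, the `SHARDS` key, the backend path `myproject.db.backends.sharded` and the helper names are all hypothetical. The sketch also assumes that "type of action" means "is this a migration command", which is checked via the command-line arguments. The `search_path` is set through the libpq `options` connection parameter.

```python
import copy

# Hypothetical compact config: one entry describing all shards.
SHARDED_CONFIG = {
    "ENGINE": "myproject.db.backends.sharded",  # hypothetical custom backend
    "NAME": "app",
    "USER": "app",
    "HOST": "localhost",
    "SHARDS": ["shard_0", "shard_1"],           # PostgreSQL schema names
}


def expand(config, per_shard):
    """Convert the compact config into a standard DATABASES dictionary.

    With per_shard=True, one extra connection is added for every schema,
    each with its own search_path.
    """
    base = dict((k, v) for k, v in config.items() if k != "SHARDS")
    databases = {"default": copy.deepcopy(base)}
    if per_shard:
        for schema in config["SHARDS"]:
            db = copy.deepcopy(base)
            db["OPTIONS"] = {"options": "-c search_path=%s" % schema}
            databases[schema] = db
    return databases


def is_migration_command(argv):
    """Guess the type of action from the management command name."""
    return len(argv) > 1 and argv[1] in ("syncdb", "migrate")


# In settings.py this would be called with sys.argv.
DATABASES = expand(SHARDED_CONFIG, is_migration_command(["manage.py", "migrate"]))
```

During normal operation only the `default` connection exists and the compilers choose the schema; the per-shard connections appear only when a migration is being run.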
Now we can manage sharded migrations with the `--database` option. For convenience you can of course write a fab script.
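As an illustration of what such a script has to do, the sketch below builds one `migrate` command per shard connection. The alias names and the helper `migrate_commands` are hypothetical; a fab task (or a plain `subprocess` loop) could then run each command in turn.

```python
def migrate_commands(aliases, app=None):
    """Build one 'manage.py migrate' argument list per database alias."""
    commands = []
    for alias in aliases:
        cmd = ["python", "manage.py", "migrate"]
        if app:
            cmd.append(app)                      # optionally limit to one app
        cmd.append("--database=%s" % alias)
        commands.append(cmd)
    return commands


# Assumed alias names, one per shard connection.
commands = migrate_commands(["shard_0", "shard_1"], app="blog")
```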
One last caveat: you must create the `SOUTH_DATABASE_ADAPTERS` variable, pointing to the original postgres adapter `south.db.postgresql_psycopg2`; South can't create a correct migration otherwise.
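Since `SOUTH_DATABASE_ADAPTERS` maps each database alias to an adapter module, with many shard connections it is convenient to generate it. A minimal sketch, assuming the hypothetical aliases `default`, `shard_0` and `shard_1`:

```python
def south_adapters(aliases):
    """Point every database alias at South's stock PostgreSQL adapter."""
    return dict((alias, "south.db.postgresql_psycopg2") for alias in aliases)


# Would live in settings.py; the alias names are assumptions.
SOUTH_DATABASE_ADAPTERS = south_adapters(["default", "shard_0", "shard_1"])
```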