I\'m looking for working code and ideas from others who have tried to build a multi-tenant Django application using database-level isolation.
Update/Solution
You could create a simple middleware of your own that determined the database name from your sub-domain or whatever and then executed a USE statement on the database cursor for each request. Looking at the django-tenants-schema code, that is essentially what it is doing. It is sub-classing psycopg2 and issuing the postgres equivalent to USE, "set search_path XXX". You could create a model to manage and create your tenants too, but then you would be re-writing much of django-tenants-schema.
There should be no performance or resource penalty in MySQL to switching the schema (db name). It is just setting a session parameter for the connection.
I've done something similar that is closest to point 1, but instead of using middleware to set a default connection Django database routers are used. This allow application logic to use a number of databases if required for each request. It's up to the application logic to choose a suitable database for every query, and this is the big downside of this approach.
With this setup, all databases are listed in settings.DATABASES
, including databases which may be shared among customers. Each model that is customer specific is placed in a Django app that has a specific app label.
eg. The following class defines a model which exists in all customer databases.
class MyModel(Model):
....
class Meta:
app_label = 'customer_records'
managed = False
A database router is placed in the settings.DATABASE_ROUTERS
chain to route database request by app_label
, something like this (not a full example):
class AppLabelRouter(object):
def get_customer_db(self, model):
# Route models belonging to 'myapp' to the 'shared_db' database, irrespective
# of customer.
if model._meta.app_label == 'myapp':
return 'shared_db'
if model._meta.app_label == 'customer_records':
customer_db = thread_local_data.current_customer_db()
if customer_db is not None:
return customer_db
raise Exception("No customer database selected")
return None
def db_for_read(self, model, **hints):
return self.get_customer_db(model, **hints)
def db_for_write(self, model, **hints):
return self.get_customer_db(model, **hints)
The special part about this router is the thread_local_data.current_customer_db()
call. Before the router is exercised, the caller/application must have set up the current customer db in thread_local_data
. A Python context manager can be used for this purpose to push/pop a current customer database.
With all of this configured, the application code then looks something like this, where UseCustomerDatabase
is a context manager to push/pop a current customer database name into thread_local_data
so that thread_local_data.current_customer_db()
will return the correct database name when the router is eventually hit:
class MyView(DetailView):
def get_object(self):
db_name = determine_customer_db_to_use(self.request)
with UseCustomerDatabase(db_name):
return MyModel.object.get(pk=1)
This is quite a complex setup already. It works, but I'll try to summarize what I see see as advantages and disadvantages:
Advantages
USE db;
statement that guesses that all databases are accessible through a single connection.Disadvantages
Suggestions
If you want flexible database access, I'd suggest to use Django's database routers. Use Middleware or a view Mixin which automatically sets up a default database to use for the connection based on request parameters. You might have to resort to thread local data to store the default database to use so that when the router is hit, it knows which database to route to. This allows Django to use its existing persistent connections to a database (which may reside on different hosts if wanted), and chooses the database to use based on routing set up in the request.
This approach also has the advantage that the database for a query can be overridden if needed by using the QuerySet using() function to select a database other than the default.
For the record, I chose to implement a variation of my first idea: issue a USE <dbname>
in an early request middleware. I also set the CACHE prefix the same way.
I'm using it on a small production site, looking up the tenant name from a Redis database based on the request host. So far, I'm quite happy with the results.
I've turned it into a (hopefully resuable) github project here: https://github.com/mik3y/django-db-multitenant