From MySQL to PostgreSQL
2017-12-08
In this article I will guide you through the steps I had to take to migrate Django projects from MySQL to PostgreSQL.
Intro
MySQL database has proven to be a good start for small and medium scale projects. It is known and used widely and has good documentation. Also there are great clients for easy management, like phpMyAdmin (web), HeidiSQL (windows), or Sequel Pro (macOS). However, in my professional life there were unfortunate moments, when databases from different projects crashed because of large queries or file system errors. Also I had database integrity errors which appeared in the MySQL databases throughout years because of different bugs at the application level.
When one thinks about scaling a project, they have to choose something more suitable. It should be something that is fast, reliable, and well supports ANSI standards of relational databases. Something that most top Django developers use. And the database of choice for most professionals happens to be PostgreSQL. PostgreSQL enables using several vendor-specific features that were not possible with MySQL, e.g. multidimensional arrays, JSON fields, key-value pair fields, special case-insensitive text fields, range fields, special indexes, full-text search, etc. For a newcomer, the best client that I know - pgAdmin (macOS, linux, windows) - might seem too complex at first, compared with MySQL clients, but as you have Django administration and handy ORM, you probably won't need to inspect the database in raw format too often.
So what does it take to migrate from MySQL to PostgreSQL? We will do that in a few steps and we will be using pgloader to help us with data migration. I learned about this tool from Louise Grandjonc, who was giving a presentation about PostgreSQL query optimization at DjangoCon Europe 2017.
One prerequisite for the migration are passing tests. You need to have functional tests to check if all pages are functioning correctly and unit tests to check at least the most critical or complex classes, methods, or functions.
1. Prepare your MySQL database
Make sure that your production MySQL database migration state is up to date:
(env)$ python manage.py migrate --settings=settings.production
Then create a local copy of your production MySQL database. We are going to use it for the migration.
2. Install pgloader
As I mentioned before, for the database migration we will use a tool called pgloader (version 3.4.1 or higher). This tool was programmed by Dimitri Fontaine and is available as an open source project on GitHub. You can compile the required version from the source. Or if you are using macOS, you can install it with Homebrew:
$ brew update
$ brew install pgloader
Note that PostgreSQL will also be installed as a dependency.
3. Create a new PostgreSQL user and database
Unlike with MySQL, creating new database users and databases with PostgreSQL usually happen in the shell rather than in the database client.
Let's create a user and database with the same name myproject
.
$ createuser --createdb --password myproject
$ createdb --username=myproject myproject
The --createdb
parameter will enable privilege to create databases. The --password
parameter will offer to enter a password. The --username
parameter will set the owner of the created database.
4. Create the schema
Link the project to this new PostgreSQL database in the settings, for example:
DATABASES = {
'postgresql': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': get_secret("DATABASE_NAME"),
'USER': get_secret("DATABASE_USER"),
'PASSWORD': get_secret("DATABASE_PASSWORD"),
},
}
DATABASES['default'] = DATABASES['postgresql']
Here the custom get_secret()
function returns sensitive information from environment variables or a text file that is not tracked under version control. Its implementation is up to you.
Run the migrations to create tables and foreign key constraints in the new PostgreSQL database:
(env)$ python manage.py migrate --settings=settings.local
5. Create the data migration script
The pgloader uses configuration files with the settings defining how to deal with migrations. Let's create the configuration file myproject.load
with the following content:
LOAD DATABASE
FROM mysql://mysql_username:mysql_password@localhost/mysql_dbname
INTO postgresql:///myproject
WITH truncate, data only, disable triggers, preserve index names, include no drop, reset sequences
ALTER SCHEMA 'mysql_dbname' RENAME TO 'public'
;
6. Run data migration
Now it's time to copy the data:
$ pgloader myproject.load
Typically you will get a bunch of warnings about type conversions. These can probably be ignored, because the script will take its best guess how to convert data when importing. If in addition you get errors about duplicated data or tables with foreign keys to missing entries, you will need to fix the issues at MySQL database and then repeat the process. In that case, clean up the MySQL database, update your local copy, recreate PostgreSQL database with dropdb
and createdb
commands, run Django migrations to create the database schema, and copy the data again.
7. Adapt the code
When the database is successfully migrated, we should run Django project tests and fix all PostgreSQL-specific problems in the project's code. The code running Django ORM will run smoothly, but very likely there will be issues with raw SQL, QuerySet's extra()
method, and type conversions.
Typically, these are the differences that you might have to keep in mind:
-
String values in PostgreSQL queries are always quoted with 'single quotes'.
-
PostgreSQL doesn't convert types when comparing values automatically as MySQL does. If you use any raw SQL, you will need to do some casting before comparison like
CAST(blog_post.id AS text) = likes_like.object_id
orblog_post.id::text = likes_like.object_id
. The latter double-colon syntax is not understood by MySQL, so if you want to support both databases, you will need to write vendor-specific cases for each database management system. -
PostgreSQL is case-sensitive for string comparisons, so in the QuerySet filters you will need to use
*__iexact
lookup instead*__exact
and*__icontains
lookup instead of*__contains
. -
When ordering, convert the column to lowercase with the
Lower()
function:from django.db import models posts = Post.objects.order_by(models.Lower('title'))
-
When using
*__in
lookup, make sure that the type of the listed elements matches the type of the model field. For example, you may have aLike
model with generic relation, i.e.content_type
andobject_id
fields that together combine a generic foreign key to any model instance. Theobject_id
field is usually of a string type, but the IDs of related models might be integers, strings, or UUIDs. If you then want to get the liked Posts which primary keys are integers, you would need to convert theobject_id
values to integers before assigning them to thepk__in
lookup in the filter:``` liked_ids = map(int, Like.objects.filter( user=request.user, content_type=ContentType.objects.get_for_model(Post) ).values("object_id", flat=True))
liked_posts = Post.objects.filter(pk__in=liked_ids) ```
8. Repeat the process for production
When you are sure that the migration process is fluent and all Django tests pass, you can take your production website down, repeat the migration process locally with the latest production data, copy the migrated local database to production server, update the production code, install new dependencies, and take the website back online.
To create a database dump you can use command:
$ pg_dump --format=c --compress=9 --file=myproject.backup myproject
To restore or create the database from dump use commands:
$ dropdb --username=pgsql myproject
$ createdb --username=myproject myproject
$ pg_restore --dbname=myproject --role=myproject --schema=public myproject.backup
I might probably miss some points and there are some ways to automate the upgrade process for production, but you got the idea.
Conclusion
PostgreSQL is more restrictive than MySQL, but it provides greater performance, more stability, and better compliance with standards. In addition, in PostgreSQL there is a bunch of features that were not available in MySQL. If you are lucky, you can switch your project from MySQL to PostgreSQL in one day.
Cover picture by Casey Allen
Also by me
Django Paddle Subscriptions app
For Django-based SaaS projects.
Django GDPR Cookie Consent app
For Django websites that use cookies.