
Automated Schema Migration in a MySQL cluster

Article by Ben Hutchison

The PSeeker Database

REA stores listing and agency data for Australia in a MySQL database named PSeeker. This large, complex database plays a central role in REA’s business:

  • About 95 tables in use
  • Largest table has 38 million rows
  • 24 tables have over 1 million rows
  • Near 100% uptime required

PSeeker in production is a loose cluster of ~10 MySQL database servers that play a variety of roles:

  • A single active, writeable master instance that runs in our primary data center.
  • Several replica slaves in the primary data center used for read-only application load.
  • A replica in our secondary data center for disaster recovery.
  • A replica used for investigation by support staff.
  • A replica that feeds into our data warehouse.

We use MySQL statement-based replication between the master and its replicas: the replicas replay essentially the same statements that were run on the master.

Typically, replicas in the same data center as the master will be running behind changes in the master database by less than a second. More distant replicas can be up to a minute behind, depending upon the rate of updates and the bandwidth between the servers.
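
Lag like this is easy to observe from the replicas themselves. A minimal monitoring sketch (not our actual tooling; the host names and credentials are assumptions) using the Sequel Ruby gem, which features again below, and MySQL's SHOW SLAVE STATUS:

require 'sequel'

# Hypothetical replica hosts; in reality these would come from cluster config.
REPLICA_HOSTS = %w[replica1 replica2 replica3]

REPLICA_HOSTS.each do |host|
  db = Sequel.connect(adapter: 'mysql2', host: host,
                      user: 'monitor', password: ENV['DB_PASSWORD'])
  # Seconds_Behind_Master is NULL when replication has stopped.
  lag = db.fetch('SHOW SLAVE STATUS').first[:Seconds_Behind_Master]
  puts "#{host}: #{lag.nil? ? 'replication stopped' : "#{lag}s behind"}"
  db.disconnect
end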

Manual Schema Management

Schema changes, such as adding columns or new tables, occur as new products are built or legacy systems are upgraded. They originate with the development teams involved in building or upgrading applications, at an average rate of 5-8 per month.

When I joined REA in 2010, changes to the schemas of the database instances were performed manually by operations staff. This was a very time-consuming process, owing to the number of replicas that had to be upgraded individually, and the consequences of an error or omission could be severe.

The Liquibase Experience

Initially, we experimented with the open source schema migration tool Liquibase. Liquibase tracks the schema changes made to each database, and ensures that they are brought up to the latest version by automatically applying a series of migrations, described via an XML DSL. The tool was successfully deployed in development and test environments, resulting in a significant productivity improvement.
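
To give a sense of the format, a Liquibase changeset looks roughly like this (an illustrative sketch, not one of our actual migrations; the id, author, and column are placeholders):

<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <!-- Placeholder id and author; the change itself is illustrative only. -->
  <changeSet id="1" author="dev">
    <addColumn tableName="Land">
      <column name="LandSize" type="int"/>
    </addColumn>
  </changeSet>
</databaseChangeLog>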

There was one consequence of automated schema management which took a while for the teams to adapt to: realizing that once a schema change has been shared in a git repo, it cannot be edited further, since some databases may already have applied the migration. Instead, if further changes are needed, a second migration must be added.
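
For example, if the changeset above had shipped with the wrong column type, the fix is a second changeset that alters the column, appended after the first (again illustrative):

<!-- Appended as a new changeSet; the already-applied changeSet id="1"
     is never edited. -->
<changeSet id="2" author="dev">
  <modifyDataType tableName="Land" columnName="LandSize"
                  newDataType="decimal(20, 8)"/>
</changeSet>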

Ultimately however, Liquibase adoption was limited to use in dev & QA environments, because it didn’t handle MySQL clusters.

Challenges of upgrading a MySQL cluster

Prior to the recent 5.6 release, MySQL schema changes locked the tables being altered, effectively taking the database offline. To maximize the uptime of the cluster, our practice was to take one server out of the cluster at a time for schema changes.

But upgrading cluster members individually can conflict with replication: if the schema changes on a master node before a replica, then the replication logs will contain data for columns that do not yet exist in the replica instance, resulting in replication errors.

[Diagram: the schema migration replication problem. A schema change applied to the master before a replica produces replication errors on the replica.]

Schemabot & Sequel

So REA developed an in-house schema migration tool, Schemabot, which understands our cluster topology and ensures that the replicas are upgraded before the master node. Developers push their database changes to a git repo, and the deployment tooling ensures that the schema upgrades are applied in all environments, including prod.
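
The ordering rule at the heart of this is simple. A hypothetical sketch of it (not Schemabot's actual code), using the Sequel migration library described below:

require 'sequel'
Sequel.extension :migration

# Hypothetical host names; in reality the topology comes from cluster config.
replicas = %w[replica1 replica2 replica3]
master   = 'master1'

# Replicas first: the replication stream must never reference columns
# that a replica does not yet have.
(replicas + [master]).each do |host|
  db = Sequel.connect(adapter: 'mysql2', host: host, database: 'pseeker',
                      user: 'schema_admin', password: ENV['DB_PASSWORD'])
  # Keep the DDL out of the binlog, so the master's ALTERs are not
  # replayed on replicas that have already been upgraded (needs SUPER).
  db.run('SET sql_log_bin = 0')
  Sequel::Migrator.run(db, 'db/migrations')  # 'db/migrations' is assumed
  db.disconnect
end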

This was rolled out to prod in April/May of 2012, so for one and a half years now we have been enjoying automatic schema management.

Schemabot migrations are written using a Ruby DSL provided by the Sequel library, which is less verbose than the XML format used by Liquibase.

Sequel.migration do
  up do
    alter_table('Land') do
      set_column_type 'LandSize', 'decimal(20, 8) unsigned'
    end
  end

  down do
    alter_table('Land') do
      set_column_type 'LandSize', 'int(11)'
    end
  end
end
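
Applying pending migrations (or rolling back, via the down blocks) goes through Sequel's migrator. A minimal sketch, assuming the migrations live under a db/migrations directory:

require 'sequel'
Sequel.extension :migration

DB = Sequel.connect('mysql2://schema_admin@localhost/pseeker')  # hypothetical URL

Sequel::Migrator.run(DB, 'db/migrations')              # apply anything pending
# Sequel::Migrator.run(DB, 'db/migrations', target: 0) # or roll everything back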

Long-running Data Migrations: An Unsolved Problem

In addition to schema changes, we periodically need long-running data migrations and transformations to implement business logic. An example is splitting listing address data from a single column into a more structured multi-column representation. These migrations:

  • Change the data in the database, but not the schema
  • Populate or modify the data stored in the database by executing scripts & code containing business logic
  • Take minutes or hours to run
  • Are performed while the database is in use

Currently, these are handled in an ad hoc manner: devs write a script and run it manually in each environment. I hope that one day we can handle them more systematically as well.
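
To illustrate the shape of such a script (the table, columns, and parsing logic here are all hypothetical), the usual approach is to walk the table in small batches with the same Sequel library, so the database stays usable and the replicas can keep up:

require 'sequel'

DB = Sequel.connect('mysql2://app@localhost/pseeker')  # hypothetical URL
BATCH_SIZE = 1_000

loop do
  # Rows not yet migrated: Street stays NULL until the backfill reaches them.
  rows = DB[:Listing].where(Street: nil).exclude(Address: nil)
                     .select(:ListingId, :Address).limit(BATCH_SIZE).all
  break if rows.empty?

  rows.each do |row|
    street, suburb = row[:Address].split(',', 2).map(&:strip)  # naive parse
    DB[:Listing].where(ListingId: row[:ListingId])
                .update(Street: street, Suburb: suburb)
  end

  sleep 0.1  # throttle so replication lag stays small
end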
