There are two main categories of database deployment tools. The first one supports deployment of software that follows Application-Driven Database Design (ADD). It means that a database is a persistence layer for a particular application. It is the application itself, its features and the need them to be persisted that drives database design. A typical reprezentant here is web application development.
On the other hand, applications that are oriented strictly on database development (and deployment) follow (as I call such a development approach) Independent Database Design (IDD). Thus, the major determinant for introducing database schema changes is the need for extending or modifying a nature of the data. This category typically includes data warehouses and high load/performance critical database processes.
It should be clear that IDD is simpler than ADD because a database developer cares just about development and deployment of one layer. But when an application developer takes care of two software layers and their mutual interconnection then he has more complex responsibility. It is natural that deployment of two dependent layers is more complex task than deployment of only one. The situation becomes even more complex when responsibilities for application layer development and database layer development are spread across different teams because of communication issues.
First I focus on IDD because it seems it is simpler than ADD, as if ADD was just extension to IDD. Later in this article I will explain that IID can be significantly different from ADD.
Independent Database Design
If a developer is responsible for development of just a database layer then his task to create deployment strategy is quite straightforward. Once the development phase within an iteration is over he is expected to define a deployment strategy for new version of database schema. The strategy usually is defined in a database deployment script. It includes DDL commands for inserting/modifying/deleting columns in tables and etc. The deployment script often defines also DML commands like inserting initial data to newly created tables. The two general questions come up when thinking about how to define database deployment process:
- How does the developer identifies what should be deployed into target environment?
- How does the developer define the deployment process of the database?
The answer to the first question is Database Comparison Tool. There is a number of them in the market. They can compare both database schema and data themselves. When the developer needs to identify what should be deployed he usually works in the following scenario:
- Compare the target database schema with current development database schema.
- Identify schema differences
- Identify which data should be deployed into target database. Database data comparison can help here.
Identifying schema and data differences is quite simple thanks to the mature database comparison tools. The more difficult part come up with the second question. How to define a deployment process? Regardless of whether or not the developer uses some database deployment software the final result is always the same – a deployment SQL script. So, in order to answer the second question related to defining deployment process, the developer has to do the following steps:
- Create a database schema deployment script. It usually contains only DDL commands.
- Create database data modification script. It usually contains only DML commands.
Of course, the tools could help here, too. However, the whole process of transforming the identified database differences into the deployment script can be very complex. There can be mutual dependencies among database objects – both schema and data dependencies.
Consider the following example: In a target database there is a table Person with columns Id, FullName. In a development database that I want to deploy to target there is a table Person with columns Id, FirstName, LastName. How should I deploy the development database into target database? Simply with this deployment script?
ALTER TABLE Person ADD COLUMN FirstName VARCHAR(50);
ALTER TABLE Person ADD COLUMN LastName VARCHAR(50);
ALTER TABLE Person DROP COLUMN FullName; -- data gets lost!
Opps! I lost all my data related to person names! Let us try better approach:
ALTER TABLE Person ADD COLUMN FirstName VARCHAR(50);
ALTER TABLE Person ADD COLUMN LastName VARCHAR(50);
UPDATE Person SET FirstName=ExtractFirstName(FullName), ExtractLastName(FullName);
ALTER TABLE Person DROP COLUMN FullName;
Here functions ExtractFirstName, ExtractLastName represent some (potentially complex) transformations defining the process how to retrieve particular parts of the name from its full represennation.
The situation gets more complex if new columns have constraints. For example, FirstName and LastName should not be null. It such a case you are not even able to add a column until you do not specify a default values for the columns being added.
The examples above demonstrates that there is not always a straightforward way how to deploy a new version of database to target environment. If there was then it would be relatively easy to automate creation od a deployment script.
Although nowadays tools can help a developer to pre-generate a deployment script, the developer still is required to manually double-check and fix it if needed. All in all, the whole process of defining a deployment script requires rather a lot of manual work.
Application-Driven Database Design
It might seem that ADD extends IDD approach however it is true only to some extent. In case a developer is creating an application which uses a database only as persistent storage then the database design is driven by application features. If the application stores a property FullName of an entity Person then a table Person with a column FullName has to be modeled in the database.
Later on, when the need for two independent properties FirstName, LastName arises then it has to be reflected in the database as well. Deployment of the application layer is simple since it is stateless – I can simply get rid of the old version of the application that supports only FullName property and replace it completely with new version of the entire application (I simplified it here since I do not count with introducing breaking changes in case other applications depend on the application being deployed). Definition of a deployment script can be simple SQL as in IDD approach.
However, unless the database design is strictly driven by the application then it is possible (at least to some extent) to define a recipe for database deployment directly in the application layer in the language the application layer is written in. Here is an example for migration in Ruby:
class AddFirstNameAndLastNameToPersons < ActiveRecord::Migration
def change
add_column :FirstName, :LastName, :string
end
end
Of course it is just a wrapper that generates good old SQL at deployment time. But with this approach of defining migrations with simple DSL you get greater level of control over deployment process because you can use full power of object-oriented language. You can programmatically combine migrations together, you can programmatically enable/disable them, you can run pre-tests validating that particular migration is even possible to be run, etc.
One of the most popular deployment tools of this kind is Liquibase. Written in Java, it uses special DSL for migrations recipes:
<changeSet id="1" author="bob">
<addColumn tableName="Person">
<column name="FirstName" type="varchar(50)"/>
</addColumn>
<addColumn tableName="Person">
<column name="LastName" type="varchar(50)"/>
</addColumn>
<sql>UPDATE Person SET FirstName=ExtractFirstName(FullName), ExtractLastName(FullName)</sql>
<dropColumn tableName="Person" columnName="FullName"/>
<rollback>
<addColumn tableName="Person">
<column name="FullName" type="varchar(100)"/>
</addColumn>
<sql>UPDATE Person SET FullName=CreateFullName(FirstName, LastName)</sql>
<dropColumn tableName="Person" columnName="LastName"/>
<dropColumn tableName="Person" columnName="FirstName"/>
</rollback>
</changeSet>
During deployment, the tool is responsible for translating migrations written in XML into SQL of target database server. Liquibase comes with an interesting feature of a rollback deployment in case a business decision to go back to previous version arises. In some cases the rollback does not have to be defined (add column, remove column) since a recipe for rollback can be inherited from the migration definition. But for a more advanced migration like in the example above the rollback recipe has to be defined manually.
Conclusion
Regardless of what approach the development team chooses for database migrations it always results in creating of a database change script. In simple case, the script is first generated by a database comparison tool, then a developer manuall reviews in and corrects conflicting changes. RoundhousE or Liquibase represent such a family of tools.
Advanced tools use custom DSL for defining database migrations. They can be written in application original language (Ruby migrations, FluentMigrator).
There are also mature commercial products that include both database comparison and deployment features in one tool. SQL Data Compare and NeXtep Open Designer are promising tools for database deployment automation.