*Database + Versioning = Efficiency *
We were so intrigued by this serious mathematical equation here at iOLAP that we had to engage our best mathematician Zdravko from the Data & Analytics team to prove it out. After months of heavy research and an additional college degree, Zdravko had found that we were missing a key component to the equation. With his findings, he single-handedly redefined the meaning of databases with one simple modification: Database + Versioning = Efficiency * Wellbeing. Needless to say, Zdravko was immediately contacted by IBM, where he is now working as a permanent replacement for E. F. Codd, father of the relational database model.
I can’t be serious, right? You’re right! Zdravko is a fictional character mischievously used to introduce Database Versioning. I sure hope ChatGPT won’t use Zdravko’s accomplishments as facts. Jokes aside, let’s get back to the topic of this blog post – Database Versioning.
Database deployments can get overly complicated when there are multiple collaborators working on the same database system. Assuming that version control mechanisms are not in place, collaborators are exposed to risks of corrupting the database by deploying faulty SQL, deploying, testing, or rolling back changes to a wrong SQL version (because it was not versioned in the first place), overriding objects deployed by their colleagues, leading to inconsistent database objects across different environments etc.
Versioning plays a crucial role in the context of database deployment. Here are some key reasons why versioning is important:
Database versioning establishes a structured and controlled process for managing and deploying database changes, leading to increased efficiency, reliability, and maintainability of the database system.
Regardless of whether database deployments are driven by a version control system or not, a Database Deployment Strategy must be put in place. Among various strategies, we will describe two commonly used ones: Manual Deployments and Script-based deployments.
Figure 1 Conventional Database Deployment Process
Figure 1 shows a conventional Database Deployment process – 1) developers deploy to DEV and complete unit tests successfully: 2) DBAs deploy the same changes to PROD and acceptance tests are completed successfully. This process is repeated by multiple developers and DBAs, potentially at the same time. A standardized set of Deployment rules can prevent multiple developers from overwriting each other’s changes and eliminate some portions of DBAs’ due diligence.
2.1 Manual Deployments
Manual deployments involve manually executing SQL scripts or making changes directly in the database environment. This approach often entails the following steps:
Manual deployments have some notable characteristics:
2.2 Scripting-based deployments
Scripting-based deployments involve using scripts or tools to automate the deployment process. Some common approaches include:
Scripting-based deployments offer several advantages:
However, scripting-based deployments also have some considerations:
It is important to evaluate these strategies based on project specific requirements, team capabilities, and the complexity of database deployments.
Now that Database Deployment Strategies are discussed, we can jump over to discussing Version Control Strategies. It is important to note that these two processes are tightly coupled – Database Deployment Strategy directly impacts the Version Control Strategy and vice versa. When, for example, a Version Control Strategy is put in place, Database Deployment’s Automation & Rollback process needs to be aligned to support it. Ideally, Version Control should be defined at least on a high level before starting to think about Database Deployments. The following sections discuss and compare Git branching strategies, simultaneously drawing a parallel to a common database deployment strategy.
3.1 Git Flow
Git Flow is a commonly used branching strategy for large teams – it is a bit complicated but effective, release based branching strategy that enables parallel development on the same repository (as long as teams are holistically following the set of rules defined by Git Flow, e.g. release branches should not contain changes developed by other developers and deployed to the DEV branch).
The following figure shows a simplified Git Flow branching strategy, pushing focus to this strategy’s vulnerability in terms of not following the set of rules defined by Git Flow.
Figure 2 Git Flow – Simplified branching strategy
Git Flow assumes that all changes deployed to DEV should be deployed to PROD. This example depicts a scenario where deployments do not follow the branching strategy set of rules. In the example, 1) f1 had a failed deployment to DEV, but a fix was then deployed that resolved the failure. After some time, 2) f2 was created before f1 got released to PROD. After the DEV deployment was completed successfully, 3) a release branch r1 was created and deployed to PROD.
Figure 3 Git Flow - Database Deployment
At this point, r1 deployed changes from both f1 and f2 feature branches. Ideally, the merge reviewers will identify that the changes from f1 shouldn’t be deployed, and they will take the appropriate steps to avoid the risk of deploying unwanted changes, following Git Flow rule sets.
3.2 Data Flow
Data Flow is a convenient name we appointed to a custom branching strategy we developed to be used by multiple large teams – it is a simple and effective, feature based branching strategy that enables parallel development on the same repository. This strategy’s set of rules are not only holistically followed by teams, but additionally enforced by custom developed “guardrails” orchestrated by GitHub Actions.
The following figure shows an example Data Flow branching strategy. This strategy’s vulnerability is the possibility of merge conflicts that can occur when changes are made to scripts that were deployed to DEV branch, but not yet deployed to PROD branch.
Figure 4 Data Flow – Branching strategy
Data Flow assumes that all changes deployed to PROD should be deployed to DEV, but DEV should never get deployed to PROD. This example depicts a scenario where the branching strategy set of rules is enforced by custom developed “guardrails”. In the example, 1) f1 deployed to DEV successfully. After some time, 2) f2 was created before f1 got released to PROD. f2 had a failed deployment to DEV, but a fix was then deployed that resolved the failure. After the DEV and PROD deployments were completed successfully, 3) PROD was merged into DEV.
The last step shows how 4) a branch was trying to be created off the DEV branch, but the custom developed “guardrails” prevented the branch from being created.
Figure 5 Data Flow - Database Deployment
At this point, PROD contains changes from f2 feature branch only. With Data Flow, the merge reviewers have some process related considerations to account for, but even when they don’t – the custom developed “guardrails” will ensure that unwanted changes do not reach a target database environment.
One of our clients from the banking industry employs a large number of teams who deploy changes to the same database. The changes are usually complex enough to cause failures for other deployments in case of an erroneous change. Given a lot of teams deploy to the same database, and a rollback process is not implemented as a part of the manual deployment process, it’s important that all the teams working on this database are coordinated in case of an erroneous change – such coordination can consume a lot of time. To streamline the deployment process, our client transitioned from manual deployments to scripting-based approaches, a Flyway-based database deployment pipeline that leverages Git and a custom automation framework that allows rollbacks.
After the client onboarded multiple teams to the pipeline, they noticed how a lot of manual effort was eliminated due to a streamlined deployment strategy. The rollback automation enabled teams to consistently focus on development rather than troubleshooting. Git enabled version control, making changeset identification easier. These components enabled teams to efficiently deploy database changes. Since erroneous changes are automatically rolled back, teams can perform deployments mindlessly, improving overall wellbeing.
A steep learning curve was evident, but the value of our pipeline is increasing as we are introducing new features like automatic code scanning and change request creation for Production deployments. They don’t find Zdravko’s Database + Versioning = Efficiency * Wellbeing equation fictive at all, in fact – the onboarded teams are promoting it to new onboarding teams!
Version controlled database deployment is an invaluable process for projects where multiple teams are actively developing changes in the same database. Benefits like transparent change tracking and auditing, automated rollback and recovery make collaboration between SQL developers and DBAs very precise and effective. Not to squander with benefits only, there are downsides such as maintaining the software driving the process, a learning curve with adopting and configuring the software, and finally – security concerns like hosting and VPCs. But at the end of the day, if we ask Zdravko to say a couple of words about Database Versioning, he would put it very precisely as Efficiency and Wellbeing.