Workflow for Writing Database Migrations for a Team

Question 1

I am wondering what some possible workflows are for writing database migrations in a team of developers. We keep running into the following problem: one person writes a migration, names it, and it gets a number (assigned automatically by the migration package, Alembic). Before this migration is merged from the feature branch into the development branch, another developer may create another migration from the same root migration. Obviously, this one gets a different number. When it comes time to merge, there is a conflict that needs to be resolved.

The problem is exacerbated by the fact that there is sometimes a delay of several days before these migrations are integrated into the development branch (we have somewhat good reasons for this workflow due to specifics of the organization). As a result, anyone who starts a new feature branch from the current HEAD of the development branch can be missing several migrations that will eventually be added but have not been yet.

I have thought of dedicating a separate branch within the gitflow just to host migration files. This would allow an easy merge of that branch to acquire all of the most recent migrations, but it would place extra load and complexity on the developers, who would need to add things to separate branches while developing locally.

What are some good workflows for resolving this? Any advice or feedback is appreciated.

Question 2

Where does the merge conflict happen? With the tools I'm used to, each migration is a separate file, and it's named after the time it was created, to the second, so there is no merge conflict unless two people happen to generate migrations in the same second.

Question 3

@bdsl Merge conflict is probably the wrong term. What I mean is that several migrations end up referring to the same migration as the parent on which they depend. Since the migration chain is linear, that is a problem. It is not a VCS issue per se, though the VCS could be part of the solution.

Question 4

Why can't more than one migration have the same parent? Why do migrations need a "parent" at all?

Question 5

Have you read through "Working with Branches" in the Alembic documentation? It specifically covers migrations with more than one parent, although this question is focused on multiple migrations with the same parent.

Question 6

@MadPhysicist Thanks, I see. Looks like this might be specific to Alembic. The tool I'm most familiar with, Doctrine Migrations, doesn't have any concept of branches in the same way.


J_H (7,645 reputation; 1 gold badge, 17 silver badges, 27 bronze badges) · Answer 1 · 2023-05-17 21:40:22Z

You describe a situation where Alice and Bob create feature branches from the same spot in main, and then each creates a new Alembic DB migration, leading to an undesirable merge conflict. It is unclear why the documented approach to merging is unsuitable, but I will just take that as a given. Perhaps the learning curve is too steep or the documentation burden is too great for your environment.

So the input assumptions are:

- Developers create new DB migrations at a "high" rate.
- Merging a feature branch to main takes a "long" time.
- Conflicts are unacceptable.

Our field is Computer Science, where you can always fix a problem by introducing one more layer of abstraction.

Here is one solution. Never create a simple migration. Instead, always create a migration that is developed in these two steps:

1. null migration
2. migration required by new app feature

In step 1, Alice creates a null feature branch where Alembic advances from hash1 to hash2 while making zero changes to the database. Now she has allocated hash1 for her own use, and leaves hash2 for other developers such as Bob. This trivial change can be merged down to main immediately, where it is visible to others. Even an organization that is heavy on ceremony should be able to devise a process for low-latency merging of null updates. Think of the Alembic hashes as a linked list, in which Alice reserves the right to later insert a migration at the site of an "empty" link.
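A null migration of this kind could look roughly like the revision file below. This is a minimal sketch: `hash1` and `hash2` are the placeholder identifiers used in the text, standing in for the revision ids that Alembic generates itself, and the docstring is invented for illustration.

```python
"""Reserve a slot for Alice's upcoming feature (null migration)."""

# Placeholder identifiers taken from the text; Alembic normally generates these.
revision = "hash2"
down_revision = "hash1"
branch_labels = None
depends_on = None


def upgrade() -> None:
    # Intentionally empty: the real schema change is filled in later.
    pass


def downgrade() -> None:
    # Intentionally empty: there is nothing to undo yet.
    pass
```

Because both functions are no-ops, applying or reverting this revision leaves the database untouched, which is what should make it cheap to review and merge.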

Alice now creates the "real" feature branch, and proceeds to author app changes plus a migration from hash1 -> hash2. Bob meanwhile does something similar, so main has hash3, and he proceeds to author a hash2 -> hash3 migration. Whoever finishes their feature and merges to main first wins the race.

The other developer will need to exercise some care to verify the final migrations do not conflict. Ideally this would have been worked out during Sprint Planning, but by hypothesis the OP suggests the organization has yet to converge on a means of deconflicting early in the process. So we should test migrations that go backward "far enough" and then bring us up-to-date.

Suppose Alice won the race. When Bob is about to merge down to main, he already knows that tests run fine with Alice's null migration. He pulls her code, verifies that her actual migration also integrates fine, and at last merges his feature branch.

Alternatively, suppose Bob won the race. When Alice is ready to merge to main, she pulls his code and verifies her feature doesn't break it.

In general there could be N racing feature branches, which implies that on each merge a test should go back at least N migrations to verify that things still work.
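The "go back N migrations, then come forward again" check can be sketched without any migration tool at all. The example below uses an in-memory SQLite database and a hypothetical three-step chain of upgrade/downgrade SQL pairs; the table names and the `round_trip` helper are illustrative assumptions. With Alembic itself, the equivalent round trip would be roughly `alembic downgrade -N` followed by `alembic upgrade head`.

```python
import sqlite3

# Hypothetical linear chain of (upgrade SQL, downgrade SQL) pairs.
MIGRATIONS = [
    ("CREATE TABLE users (id INTEGER PRIMARY KEY)", "DROP TABLE users"),
    ("CREATE TABLE orders (id INTEGER PRIMARY KEY)", "DROP TABLE orders"),
    ("CREATE INDEX ix_orders_id ON orders (id)", "DROP INDEX ix_orders_id"),
]


def schema(conn):
    """Return a sorted snapshot of the objects currently in the database."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"
    ).fetchall()


def round_trip(conn, n):
    """Apply every migration, roll back the last n, then re-apply them.

    Returns True if the schema after the round trip matches the schema
    before it.
    """
    for up, _ in MIGRATIONS:
        conn.execute(up)
    before = schema(conn)

    start = max(0, len(MIGRATIONS) - n)
    tail = MIGRATIONS[start:]
    for _, down in reversed(tail):
        conn.execute(down)
    for up, _ in tail:
        conn.execute(up)

    return before == schema(conn)
```

A test would call `round_trip(sqlite3.connect(":memory:"), n)` with `n` at least as large as the number of feature branches currently racing.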

Guildenstern (376 reputation; 1 silver badge, 8 bronze badges) · Answer 2 · 2023-05-18 21:00:21Z

Make sure that you always generate unique filenames for each database migration (apparently Alembic uses abbreviated GUIDs, so that shouldn’t be a problem) and that you know in what order they will be applied. Then, before merging into main, make sure to update your migration file so that it makes sense in the reality of the current main.

With one file per migration and unique filenames you should at least never get merge conflicts. And main getting updated with a new version under your feet should just require you to update your migration before merging.

What we do (using Flyway):

- Once a new migration lands, sync with main (rebase or merge).
- Bump the version (it’s just a version number).
  - All this affects is that we have to rebuild the project while testing the branch, because of Flyway errors. Or we have to fiddle with updating checksums and whatnot. But we never bother doing that locally, which illustrates that it’s not a problem at all.
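The version bump itself is mechanical. Here is a hedged sketch of it, assuming Flyway's default `V<version>__<description>.sql` naming and plain integer versions (Flyway also accepts dotted versions, which this sketch ignores). The `bump` helper and the file names are hypothetical.

```python
import re

# Flyway's default naming for versioned migrations, restricted here to
# integer versions for simplicity.
PATTERN = re.compile(r"^V(\d+)__(.+)\.sql$")


def bump(own_file, files_on_main):
    """Return own_file renamed to one version past the highest on main."""
    match = PATTERN.match(own_file)
    if match is None:
        raise ValueError(f"not a versioned migration: {own_file}")

    # Collect the versions already taken on main, skipping unrelated files.
    versions = [
        int(m.group(1)) for f in files_on_main if (m := PATTERN.match(f))
    ]
    next_version = max(versions, default=0) + 1
    return f"V{next_version}__{match.group(2)}.sql"
```

For example, if main already holds `V6__init.sql` and `V7__add_users.sql`, a branch-local `V7__add_orders.sql` would be renamed to `V8__add_orders.sql`.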

From the question: "I have thought of dedicating a separate branch within the gitflow just to host migration files."

I feel like I must be a bit emphatic, here: so often I see people trying to solve what are effectively branching or divergence problems using branches. Like configurations for different product lines. And it seems very misguided to me.

Keep updating migrations prior to merging into main. Don’t create a new parallel history for everyone to worry about. The test suite should fail spectacularly anyway if weird migration conflicts (database migration conflicts, not Git conflicts) land in main.
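One way a test suite could catch such a database-level conflict is to assert that the migration history has exactly one head. The sketch below works on a plain mapping from revision id to parent id. The `find_heads` helper and the revision names are hypothetical, and in an Alembic project the `alembic heads` command reports the same information.

```python
def find_heads(parents):
    """Return the revisions that no other revision names as its parent.

    `parents` maps each revision id to its parent id (None for the root).
    """
    referenced = {p for p in parents.values() if p is not None}
    return sorted(rev for rev in parents if rev not in referenced)


def test_single_head():
    # Hypothetical histories: in `forked`, b and c both descend from a.
    forked = {"a": None, "b": "a", "c": "a"}
    linear = {"a": None, "b": "a", "c": "b"}
    assert find_heads(forked) == ["b", "c"]
    assert find_heads(linear) == ["c"]
```

A check like `assert len(find_heads(parents)) == 1` run on main would fail as soon as two migrations with the same parent are merged without being reordered.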
