I am wondering what some possible workflows are for writing database migrations for a team of developers. We seem to run into a problem where one person writes a migration, names it and gives it a number (the latter is done automatically by the migration package called Alembic). Prior to this migration being submitted from a feature branch into development branch, another developer may make another migration from the same root migration. Obviously, this one gets a different number. When it comes time to merge it, there is a conflict, which needs to be resolved.
The problem is exacerbated by the fact that sometimes there is a several-day delay before these migrations are integrated into the development branch (we have somewhat good reasons for this workflow due to specifics of the organization). As a result, anyone who starts a new feature branch from the current HEAD of the development branch can be missing several migrations that will eventually be added but have not landed yet.
I have thought of dedicating a separate branch within the gitflow just to host migration files. This would allow for an easy merge of that branch to acquire all of the most recent migrations, but would place an extra load and complexity on the developers in the form of needing to add things to separate branches while developing locally.
What are some good workflows for resolving this? Any advice or feedback is appreciated.
- Where does the merge conflict happen? With the tools I'm used to, each migration is a separate file, and it's named after the time it was created, to the second, so there is no merge conflict unless two people happen to generate migrations at the same second. – bdsl, May 17, 2023 at 13:34
- @bdsl Merge conflict is probably the wrong term. What I mean is that there is a situation in which several migrations end up referring to the same migration as their parent migration on which they depend. Since the migration chain is linear, it is a problem. It is not a VCS issue per se, but it could be a part of the solution. – MadPhysicist, May 17, 2023 at 14:08
- Why can't more than one migration have the same parent? Why do migrations need a "parent" at all? – Greg Burghardt, May 17, 2023 at 15:16
- Have you read through Working with Branches in the Alembic documentation? This specifically refers to migrations with more than one parent, although this question is focused on multiple migrations with the same parent. – Greg Burghardt, May 17, 2023 at 15:19
- @MadPhysicist Thanks, I see. Looks like this might be specific to Alembic. The tool I'm most familiar with, Doctrine Migrations, doesn't have any concept of branches in the same way. – bdsl, May 17, 2023 at 17:01
2 Answers
You describe a situation where Alice and Bob create feature branches from the same spot in main, and then each creates a new Alembic DB migration, leading to an undesirable merge conflict.
It is unclear why the documented approach to merging is unsuitable, but I will just take that as a given. Perhaps the learning curve is too steep or the documentation burden is too great for your environment.
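For reference, the documented approach boils down to a merge revision: a migration file whose `down_revision` names both heads, which Alembic can generate (for example with `alembic merge heads`). A minimal sketch of what such a file looks like, assuming hypothetical revision IDs (`a1b2c3`, `d4e5f6`, `0a1b2c`) in place of the generated ones:

```python
"""merge Alice's and Bob's migrations

Sketch of an Alembic merge revision. All three revision IDs below are
hypothetical placeholders, not values taken from the question.
"""

# Revision identifiers, used by Alembic.
revision = "0a1b2c"
# A tuple means "this revision has two parents", which joins the two
# heads back into a single line of history.
down_revision = ("a1b2c3", "d4e5f6")
branch_labels = None
depends_on = None


def upgrade():
    # A pure merge point makes no schema changes.
    pass


def downgrade():
    # Nothing to undo.
    pass
```

The merge revision itself changes nothing in the database; whether the two parent migrations are actually compatible still has to be checked by whoever merges.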
So the input assumptions are:
- Developers create new DB migrations at a "high" rate.
- Merging feature branch to main takes a "long" time.
- Conflicts are unacceptable.
Our field is Computer Science, where you can always fix a problem by introducing one more layer of abstraction.
Here is one solution. Never create a simple migration. Instead, always create a migration that is developed in these two steps:
1. null migration
2. migration required by new app feature
In (1.) Alice creates a null feature branch where alembic advances from hash1 to hash2 while making zero changes to the database. Now she has allocated hash1 for her own use, and leaves hash2 for other developers such as Bob. This trivial change can be immediately merged down to main, where it is visible to others. Even an organization that is heavy on ceremony would be able to devise a process for low latency merging of null updates.
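A null migration is just a revision whose `upgrade()` and `downgrade()` do nothing. A minimal sketch of such a file, using `hash1` and `hash2` from the description above as stand-ins for the IDs Alembic would really generate:

```python
"""reserve a slot for Alice's feature (no schema changes yet)"""

# `hash1` and `hash2` are placeholders; Alembic generates the real IDs.
revision = "hash2"
down_revision = "hash1"
branch_labels = None
depends_on = None


def upgrade():
    # Intentionally empty: this revision only reserves a place in the chain.
    pass


def downgrade():
    # Nothing to undo.
    pass
```

Alice would later fill in the two function bodies on her "real" feature branch, leaving the revision identifiers untouched.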
Think of the alembic hashes as a linked list,
and Alice reserves the right to later insert
a migration at the site of an "empty" link.
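To make the linked-list picture concrete, here is a small self-contained sketch that models the chain as a mapping from each revision to its parent and reports the heads, meaning the revisions that nothing else points to. It does not use Alembic; the helper `find_heads` and the revision names (`base`, `alice`, `bob`, `alice_slot`, `bob_slot`) are hypothetical.

```python
from typing import Dict, List, Optional


def find_heads(parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the revisions that no other revision names as its parent."""
    referenced = {p for p in parents.values() if p is not None}
    return sorted(rev for rev in parents if rev not in referenced)


# Without reserved slots: Alice and Bob both branch from "base".
unreserved = {"base": None, "alice": "base", "bob": "base"}

# With reserved slots: each null migration is merged to main right away,
# so Bob's slot chains off Alice's slot instead of off "base".
reserved = {"base": None, "alice_slot": "base", "bob_slot": "alice_slot"}

print(find_heads(unreserved))  # two heads: the chain has forked
print(find_heads(reserved))    # one head: the chain is still linear
```

More than one head is exactly the situation the question describes; reserving slots early keeps the list linear no matter who finishes first.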
Alice now creates the "real" feature branch, and proceeds to author app changes plus a migration from hash1 -> hash2.
Bob meanwhile does something similar so main has hash3, and he proceeds to author a hash2 -> hash3 migration.
Either Alice or Bob will win the race, when their feature is finished and they merge to main.
The other developer will need to exercise some care to verify the final migrations do not conflict. Ideally this would have been worked out during Sprint Planning, but by hypothesis the OP suggests the organization has yet to converge on a means of deconflicting early in the process. So we should test migrations that go backward "far enough" and then bring us up-to-date.
Suppose Alice won the race. When Bob is about to merge down to main, he already knows that tests run fine with Alice's null migration. He pulls her code, verifies that her actual migration also integrates fine, and at last merges his feature branch.
Alternatively Bob won the race. When Alice is ready to merge to main, she pulls his code and verifies her feature doesn't break it.
In general there could be N racing feature branches, which implies that on each merge a test should go back at least N migrations to verify that things still work.
Make sure that you always generate unique filenames for each database migration (apparently Alembic uses abbreviated GUIDs so that shouldn’t be a problem) and that you know in what order they will be applied. Then, before merging into main, make sure to update your migration file so that it makes sense in the reality of the current main.
With one file per migration and unique filenames you should at least never get merge conflicts. And main getting updated with a new version under your feet should just require you to update your migration before merging.
What we do (using Flyway):
- Once a new migration lands: synch. with main (rebase or merge)
- Bump the version (it’s just a version number)
- All this impacts is that we have to rebuild the project while testing the branch, because of Flyway errors. Or we have to fiddle with updating checksums and whatnot. But we never bother doing that locally, which illustrates that it’s not a problem at all.
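The "bump the version" step can be scripted. The sketch below is only an illustration: it assumes Flyway's default `V<version>__<description>.sql` naming and plain integer versions (Flyway also accepts dotted versions, which this does not handle). The helpers `next_version` and `bump` and the example filenames are hypothetical.

```python
import re
from typing import Iterable

# Flyway's default naming for versioned migrations:
# V<version>__<description>.sql (integer versions assumed here).
_VERSIONED = re.compile(r"^V(\d+)__(.+)\.sql$")


def next_version(filenames_on_main: Iterable[str]) -> int:
    """Return one more than the highest version found on main."""
    versions = [
        int(m.group(1))
        for name in filenames_on_main
        if (m := _VERSIONED.match(name))
    ]
    return max(versions, default=0) + 1


def bump(filename: str, filenames_on_main: Iterable[str]) -> str:
    """Rename a branch migration so it sorts after everything on main."""
    match = _VERSIONED.match(filename)
    if match is None:
        raise ValueError(f"not a versioned migration: {filename}")
    return f"V{next_version(filenames_on_main)}__{match.group(2)}.sql"


on_main = ["V6__init.sql", "V7__add_users.sql"]
print(bump("V7__add_orders.sql", on_main))
```

Here the branch's `V7__add_orders.sql` collides with a `V7` that landed on main first, so it is renamed to the next free number.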
"I have thought of dedicating a separate branch within the gitflow just to host migration files."
I feel like I must be a bit emphatic, here: so often I see people trying to solve what are effectively branching or divergence problems using branches. Like configurations for different product lines. And it seems very misguided to me.
Keep updating migrations prior to merge into main. Don’t create a new parallel history for everyone to worry about. The test suite should fail spectacularly if weird migration conflicts (database migration conflicts, not Git conflicts) land in main, anyway.
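One way to make the test suite fail in that situation is a check that the migration directory has exactly one head. The sketch below is an assumption-laden illustration, not Alembic's own mechanism: it scans the migration files with regular expressions, expects `revision` and `down_revision` to be single-line assignments, and the helper names are hypothetical. Alembic's `alembic heads` command reports the same information when a configured environment is available.

```python
import re
from pathlib import Path
from typing import List, Set

# Match "revision = 'abc'" and also annotated forms like "revision: str = 'abc'".
_REVISION = re.compile(r"^revision\b[^=\n]*=\s*['\"](\w+)['\"]", re.MULTILINE)
_DOWN = re.compile(r"^down_revision\b[^=\n]*=\s*(.+)$", re.MULTILINE)
_ID = re.compile(r"['\"](\w+)['\"]")


def heads_in_directory(versions_dir: Path) -> List[str]:
    """Return revision IDs that no other migration file lists as a parent."""
    revisions: Set[str] = set()
    parents: Set[str] = set()
    for path in versions_dir.glob("*.py"):
        text = path.read_text()
        rev = _REVISION.search(text)
        if rev is None:
            continue  # not a migration file
        revisions.add(rev.group(1))
        down = _DOWN.search(text)
        if down:
            # Handles None (no IDs found), a single ID, or a tuple of IDs.
            parents.update(_ID.findall(down.group(1)))
    return sorted(revisions - parents)


def assert_single_head(versions_dir: Path) -> None:
    """Raise if the migration history has forked (or is empty)."""
    heads = heads_in_directory(versions_dir)
    if len(heads) != 1:
        raise AssertionError(f"expected exactly one head, found {heads}")
```

Calling `assert_single_head(Path("alembic/versions"))` from a test (the path is an assumption about the project layout) would turn two migrations with the same parent into a failing build instead of a surprise at deploy time.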