Imagine you're creating a completely new version control system, like Git, Mercurial, SVN etc., from scratch. Rather than identifying commits by SHA hashes, you identify them using an ordered sequence of numbers and letters: 1, 2, 3, 4, 5 etc. If you make a branch from a commit, say 1, you call the new commit 1a, and then further commits from that are 1b, 1c, 1d... If you branch from 1a, you get a new commit 1a1, from which follows 1a2, 1a3... If you go past the end of the alphabet, you loop around to aa, ab etc. like in Excel.
It looks like this:
1---2---3---4---5 ... 64---65
 \
  1a---1b---1c ... 1z---1aa---1ab
   \
    1a1---1a2---1a3
             \
              1a2a
So a summary of the sequence rules would be:
- The name of every commit is incremented from the name of the parent commit, either by incrementing the number, or incrementing the letter.
- Only the last number/letter is "incremented" using the rule above
- A commit name can consist of an alternating string of numbers and letters, where "numbers" can be 1 or more digits, and "letters" can be one or more letters. But you cannot have two strings of numbers, or two strings of letters, in a row.
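The three rules above can be sketched in a few lines of Python. This is only an illustration of the scheme as described, not an existing tool: the function names (`split_name`, `next_letters`, `next_commit`, `first_branch_commit`) are made up here, and lowercase letters are assumed.

```python
import re


def split_name(name: str) -> list[str]:
    """Split a commit name into alternating runs of digits and letters."""
    parts = re.findall(r"\d+|[a-z]+", name)
    if not parts or "".join(parts) != name:
        raise ValueError(f"invalid commit name: {name!r}")
    return parts


def next_letters(letters: str) -> str:
    """Excel-style increment: a -> b, z -> aa, az -> ba, zz -> aaa."""
    chars = list(letters)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "a"  # carry into the next position to the left
        i -= 1
    return "a" + "".join(chars)  # every position overflowed


def next_commit(name: str) -> str:
    """Increment only the last run (number or letters) of the name."""
    parts = split_name(name)
    last = parts[-1]
    parts[-1] = str(int(last) + 1) if last.isdigit() else next_letters(last)
    return "".join(parts)


def first_branch_commit(name: str) -> str:
    """First commit branched off `name`: append a run of the other kind."""
    last = split_name(name)[-1]
    return name + ("a" if last.isdigit() else "1")


print(next_commit("65"))            # 66
print(next_commit("1z"))            # 1aa
print(first_branch_commit("1a2"))   # 1a2a
```

Note that `first_branch_commit("1")` always returns `1a`, which is exactly why a second branch from the same commit has no free name under these rules.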
Now there are two problems I've identified with this, and I'm not sure if there's a solution.
1. Merge commits. E.g. if commit 65 and commit 1ab merge, what do I name the merge commit? Do I make an arbitrary choice and flip a coin to make the new commit either 66, or 1ac? I need to designate one of the parent commits as the "true" parent and increment that name to arrive at the merge commit name. But what logic do I use to conclude which commit is the "true" parent?
2. If you have multiple branches from the same commit, there's currently no logical way to name the next branch from that commit. If I branch from 1 I get 1a. The next commit after 1a is 1b. If I branch again from 1, I therefore can't use 1b.
Problem 1 visualised:
1---2---3---4---5... 64---65------???
 \                                /
  1a---1b---1c ... 1z---1aa---1ab/
And problem 2:
1---2---3
|\
| 1a---1b
 \
  ???
For problem 1, I originally thought of identifying the "ancestor" commit - i.e. the commit that connects all of the commits you're trying to merge, and then following the branch of that "ancestor" until you get to the parent of the merge commit. But I realised that that's fundamentally flawed, because finding the one commit that connects all of your merge commit parents, doesn't tell you which branch to follow to get back to the merge commit - they're all equally valid branches.
The main solution I've thought for branching (problem 2) would be to introduce a separate signifier for branches and commit numbers, maybe separated by a punctuation character. E.g.
1---2---3
|\
| 1a.1---1a.2---1a.3
|\
| 1b.1---1b.2---1b.3
 \
  1c.1---1c.2---1c.3
         |\
         | 1c.2a.1---1c.2a.2
          \
           1c.2b.1---1c.2b.2
But that already seems to be ludicrously complex at only a few levels of branching.
So my question is: is there any reasonable and logical approach one could take to name commits using a system like this, or would this system fail from the outset? In particular:
Could you name merge commits in a systematic way, without simply choosing which parent commit to "increment" the name of at random?
Could you name branch commits in this system without names quickly becoming long and complicated to read?
I'm prepared to accept the answer "No to both, this system is fundamentally flawed", but would like to see if there are possible solutions to this problem that I hadn't envisaged.
- The first question: why? What are you actually trying to gain here? – Flater, Jul 8, 2022 at 15:01
- That's a fair question. I'm interested in sketching out ways that a VCS could be designed in ways that don't require beginners to version control to understand concepts like hashes or SHAs - which some could argue is more complexity than a user may want from a VCS. A different level of abstraction, if you will. This was one idea that I had. I recognise that it's flawed but wanted to put it up here for criticism to see if this idea is salvageable. – Lou, Jul 8, 2022 at 15:06
- "ways that don't require beginners to version control to understand concepts like hashes or SHAs" – At least in Git, you don't need to understand concepts like hashes or SHAs. You can just treat them as opaque identifiers. You don't even know that they are SHA-1 hashes or even that they are hashes at all, or what a hash is. All you need to know is that each object in Git has a unique name that is based on its content. – Jörg W Mittag, Jul 8, 2022 at 15:42
- You could consider the branch with fewer number/alpha/number parts to be the "parent" to be merged into, so 65 would win out over 1ab and the merged branch would be 66. The idea is, "more branched" branches are always merging into "less branched" ones. – David Conrad, Jul 8, 2022 at 15:58
- @Lou: Your entire problem revolves around synchronization. In a centralized system, there is nothing to synchronize, so the problem you are presenting simply does not exist in the first place. You mention Subversion in your question. Subversion does have unique revision numbers which are not content-based / hashes, and that is easy to do precisely because Subversion is centralized. Heck, you could just use a timestamp, which is again trivial because you don't have to synchronize clocks – there is only one clock. – Jörg W Mittag, Jul 9, 2022 at 8:23
2 Answers
You don't need a different naming scheme for branches if you give up on "sequential" numbers within a branch, and instead go for globally increasing numbers.
1---2---5---7---21... 64---65------66
     \                            /
      3---4---6 ... 61---62---63/
However this and your scheme only work with a central source of truth for names, so neither will be a DVCS.
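As a rough sketch of why a single counter needs a central source of truth, the hypothetical `Repo` class below hands out globally increasing numbers. Everything in it is invented for illustration; it is not how Subversion or any real tool is implemented.

```python
class Repo:
    """Hypothetical repository that numbers commits with one global counter."""

    def __init__(self, next_rev: int = 1) -> None:
        self.next_rev = next_rev
        self.parents: dict[int, tuple[int, ...]] = {}

    def commit(self, *parents: int) -> int:
        """Record a commit with the given parents and return its number."""
        rev = self.next_rev
        self.next_rev += 1
        self.parents[rev] = parents
        return rev


central = Repo()
r1 = central.commit()            # root commit
r2 = central.commit(r1)
r3 = central.commit(r2)          # starts a side branch from 2
r4 = central.commit(r3)
r5 = central.commit(r2)          # main line continues from 2
merge = central.commit(r5, r4)   # a merge is just the next number

# Two clones that work offline start from the same counter value...
alice = Repo(next_rev=central.next_rev)
bob = Repo(next_rev=central.next_rev)
# ...so they hand out the same number to two different commits.
clash = alice.commit(merge) == bob.commit(merge)
print(merge, clash)
```

Branches and merges need no special names here, but the last two lines show the clash: without one shared counter, two offline clones both claim the same number.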
- Ah yeah, so this is essentially the same approach as SQL - auto-increment the commit name. I see what you mean about a central source of truth though, and how easily this would clash between local and remote repo. – Lou, Jul 8, 2022 at 15:58
- @Lou: This is exactly what Subversion does. Every commit increases the Revision by 1. – Jörg W Mittag, Jul 9, 2022 at 8:24
- The main frame challenge here is that this is impossible with a distributed VCS. Unless you intend to throw off-line work modes out, no scheme that requires central coordination can work. – Hans-Martin Mosner, Jul 9, 2022 at 8:51
- @Hans-MartinMosner To frame challenge the frame challenge (😛) off-line doesn't necessarily have to mean fully distributed. In theory, git is fully decentralised, but in practice, most people use a "source of truth" hosted somewhere like Github or Gitlab, and often require all actions on "core" branches to be done directly there, not offline. So a system that allowed you to "claim", say, all branches prefixed with your username, would allow you to build as many branches as you wanted offline, then upload them to the central server. Commit numbers based on branch would then also be possible. – IMSoP, Jul 9, 2022 at 13:59
Your problems mostly arise from an inconsistent set of base assumptions.
On the one hand, you have assumed that "branching" is a different operation than "committing to a branch"; but on the other, you've assumed that "merging into" is indistinguishable from "merging from".
If you take a git-like model of history as a Directed Acyclic Graph, the first assumption is incorrect. Every commit "branches off from" its parent, in the sense that any commit can be treated as a branch tip in its own right. So commit 1 would be immediately followed by commit 1a, then 1a1, and so on. If you are creating a child of commit 1 but 1a is already in use, that's commit 1b. You would quickly get commit identifiers with hundreds of 1's and a's, and as you point out, merge commits have two natural names (or more, if you allow more than two parents, as git does).
If on the other hand you take the view that branches are a core concept, with commits directly "belonging to" a branch, then the second assumption is incorrect. If commit 65 is "on a branch", and you merge commit 1ab "into that branch", the result is unambiguously commit 66, because it belongs to the same branch as commit 65. It wouldn't be commit 1ac, because that would imply you're continuing the other branch. If you wanted to indicate that it was a merge commit, you could mention its extra source, e.g. 66.1ab.
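To make the "merging into" rule concrete, here is a minimal sketch. The function `merge_name` and its `annotate` flag are invented for this example, and it assumes the target commit's name ends in a number (as 65 does); a target ending in letters is rejected rather than handled.

```python
import re


def merge_name(target: str, source: str, annotate: bool = False) -> str:
    """Name a merge of `source` INTO the branch that `target` is on.

    The merge continues the target's branch, so only the target's trailing
    number is incremented. The source name is optionally appended after a
    dot, purely as a label.
    """
    match = re.fullmatch(r"(.*?)(\d+)", target)
    if match is None:
        raise ValueError(f"target must end in a number: {target!r}")
    prefix, number = match.groups()
    name = f"{prefix}{int(number) + 1}"
    return f"{name}.{source}" if annotate else name


print(merge_name("65", "1ab"))                 # 66
print(merge_name("65", "1ab", annotate=True))  # 66.1ab
```

The choice of parent is no longer arbitrary because the caller states which branch is being merged into; swapping the arguments is a different operation with a different result.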
The problem of multiple branches from one point can be solved by always incrementing a number as you go "along" a branch, and adding a letter only when you "create" a branch: 45d6 is the 6th commit on branch 45d, which is the 4th branch from commit 45 of the original branch. A commit "continuing" that branch would be 45d7 ("branch 45d, commit 7"); but a commit "branching from" it would be 45d6a1 ("branch 45d6a, commit 1"). In this scheme, branch identifiers always end in letters, and commits in numbers. Eventually, you would end up with things like 213c42g17 and 213c42g65a25, which you can see at a glance share a common parent branch 213c42g; whereas 213c42f21 is from a different branch (213c42f) with common ancestor 213c42.
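A sketch of that scheme in Python, under some assumptions of mine: the helper names are invented, the original (root) branch is represented by an empty branch identifier, and the caller supplies the set of branch identifiers already in use.

```python
import re


def split_commit(name: str) -> tuple[str, int]:
    """Split '45d6' into (branch '45d', commit 6); the root branch is ''."""
    match = re.fullmatch(r"((?:\d+[a-z]+)*)(\d+)", name)
    if match is None:
        raise ValueError(f"not a commit identifier: {name!r}")
    return match.group(1), int(match.group(2))


def continue_branch(name: str) -> str:
    """Next commit along the same branch: bump the trailing number."""
    branch, number = split_commit(name)
    return f"{branch}{number + 1}"


def letters(index: int) -> str:
    """0 -> a, 25 -> z, 26 -> aa, ... (Excel-style column letters)."""
    out = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        out = chr(ord("a") + rem) + out
    return out


def create_branch(name: str, existing_branches: set[str]) -> str:
    """First commit on the next unused branch created from commit `name`."""
    split_commit(name)  # validate that `name` is a commit, not a branch
    i = 0
    while f"{name}{letters(i)}" in existing_branches:
        i += 1
    return f"{name}{letters(i)}1"


def common_prefix(a: str, b: str) -> str:
    """Longest shared run of whole number/letter segments."""
    shared = []
    for x, y in zip(re.findall(r"\d+|[a-z]+", a), re.findall(r"\d+|[a-z]+", b)):
        if x != y:
            break
        shared.append(x)
    return "".join(shared)


print(continue_branch("45d6"))                      # 45d7
print(create_branch("45d6", set()))                 # 45d6a1
print(create_branch("45", {"45a", "45b", "45c"}))   # 45d1
print(common_prefix("213c42g17", "213c42g65a25"))   # 213c42g
print(common_prefix("213c42g17", "213c42f21"))      # 213c42
```

Because the letter is chosen from the branches already created at that commit, a second or third branch from the same point never collides with commits further along an earlier branch.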
Another assumption that could be challenged is that commits always have the same identifier. In Mercurial, commits have a local shorthand which is incremental within a particular list. It is possible (and indeed common) to draw some subset of a git repository as though it was a linear "current branch" with "other branches" splitting from and joining to it. Pick a starting point, and you could label each commit in such a subset with a systematic identifier; you just need some other way to identify that frame of reference.
Ultimately, the format of commit identifiers has to go hand in hand with some definition of what a commit actually is, and that definition varies a lot between the many versioning systems that have been invented over the decades.
- Thanks, this contains a series of good assumption challenges. I see what you mean about viewing each commit as being a branch tip in its own right. The only thing I don't get is how your suggested naming scheme differs from mine. I'm assuming you mean that from commit 45 we have four branches, 45a1, 45b1, 45c1 and 45d1. Then as I create commits on branch 45d1 I eventually get to 45d6. If I were to branch again from 45 I'd have 45e1. But then if I were to branch from 45d1, wouldn't that also be 45e1? Or are you proposing to then add a new letter 45d1a1? – Lou, Jul 11, 2022 at 8:28
- @Lou Yes, I'm saying that if you can tell the difference between "commit to existing branch" and "create new branch", then "commit to existing branch on top of 45d1" could be 45d2, and "create new branch from 45d1" could be 45d1a1. They could be read as "branch 45d, commit 2" and "branch 45d1a, commit 1" - branches end in letters, commits in numbers. Eventually, you would end up with things like 213c42g17 and 213c42g65a25, which you can see at a glance share a common parent branch 213c42g; whereas 213c42f21 is from a different branch (213c42f) with common ancestor 213c42. – IMSoP, Jul 11, 2022 at 8:46
- Ah nice! Yes that makes sense and seems logical and consistent, thanks :). I'm seeing now that at advanced levels of branching it would outweigh the complexity of an abbreviated SHA hash, but perhaps that would encourage regular merges/rebases. Perhaps it's easier to understand than a hash ID is if you've never heard of a hash, though. – Lou, Jul 11, 2022 at 9:15