Is it logically possible to consistently name commits in a VCS using an ordered sequence?

Question 1

Imagine you're creating a completely new version control system, like Git, Mercurial, SVN etc., from scratch. Rather than identifying commits by SHA hashes, you identify them using an ordered sequence of numbers and letters: 1, 2, 3, 4, 5 etc. If you branch from commit 1, the first commit on that branch is called 1a, and further commits on it are 1b, 1c, 1d... If you branch from 1a, you get a new commit 1a1, followed by 1a2, 1a3... If you go past the end of the alphabet, you roll over to aa, ab etc., like column names in Excel.

It looks like this:

1---2---3---4---5 ... 64---65
 \
  1a---1b---1c ... 1z---1aa---1ab
   \
    1a1---1a2---1a3
            \
             1a2a

So a summary of the naming rule would be:

The name of every commit is derived from the name of its parent commit, either by incrementing the number or by incrementing the letter.
Only the last number/letter is "incremented" using the rule above.
A commit name consists of an alternating string of numbers and letters, where "numbers" are one or more digits and "letters" are one or more letters. You cannot have two strings of numbers, or two strings of letters, in a row.
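
The rule above can be sketched in Python. This is only an illustration, assuming lowercase letters and Excel-style rollover (z becomes aa); the function names `split_name`, `bump_letters`, `next_in_sequence` and `first_on_branch` are hypothetical and not part of any existing VCS.

```python
import re
from typing import List

def split_name(name: str) -> List[str]:
    """Split '1a2' into ['1', 'a', '2'] (alternating digit and letter runs)."""
    parts = re.findall(r"\d+|[a-z]+", name)
    if not parts or "".join(parts) != name:
        raise ValueError(f"not a valid commit name: {name!r}")
    return parts

def bump_letters(letters: str) -> str:
    """Excel-style increment: a -> b, z -> aa, az -> ba, zz -> aaa."""
    chars = list(letters)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "a"  # roll this position over and carry to the left
        i -= 1
    return "a" + "".join(chars)  # every position rolled over

def next_in_sequence(name: str) -> str:
    """Increment only the last number or letter run of the parent's name."""
    parts = split_name(name)
    last = parts[-1]
    parts[-1] = str(int(last) + 1) if last.isdigit() else bump_letters(last)
    return "".join(parts)

def first_on_branch(name: str) -> str:
    """First commit of a new branch: append 'a' after a number, '1' after letters."""
    last = split_name(name)[-1]
    return name + ("a" if last.isdigit() else "1")
```

Note that `first_on_branch("1")` returns `1a` every time it is called, so a second branch from the same commit collides with the first. That is the branching problem the question goes on to describe.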

Now there are two problems I've identified with this, and I'm not sure if there's a solution.

1. Merge commits. E.g. if commit 65 and commit 1ab merge, what do I name the merge commit? Do I make an arbitrary choice and flip a coin to make the new commit either 66 or 1ac? I need to designate one of the parent commits as the "true" parent and increment that name to arrive at the merge commit name. But what logic do I use to conclude which commit is the "true" parent?
2. Multiple branches from the same commit. There's currently no logical way to name the second branch from a commit. If I branch from 1 I get 1a, and the next commit after 1a is 1b. If I branch again from 1, I therefore can't use 1b, because that name already belongs to the first branch.

Problem 1 visualised:

1---2---3---4---5 ... 64---65------???
 \                                /
  1a---1b---1c ... 1z---1aa---1ab/

And problem 2:

1---2---3
|\
| 1a---1b
 \
  ???

For problem 1, I originally thought of identifying the common "ancestor" commit, i.e. the commit from which all of the commits you're trying to merge descend, and then following the branch of that "ancestor" until you get to the parent of the merge commit. But I realised that that's fundamentally flawed: finding the one commit that connects all of your merge commit's parents doesn't tell you which branch to follow to get back to the merge commit, because they're all equally valid branches.

The main solution I've thought of for branching (problem 2) would be to introduce separate signifiers for branches and commit numbers, maybe separated by a punctuation character. E.g.

1---2---3
|\
| 1a.1---1a.2---1a.3
|\
| 1b.1---1b.2---1b.3
 \
  1c.1---1c.2---1c.3
         |\
         | 1c.2a.1---1c.2a.2
          \
           1c.2b.1---1c.2b.2

But that already seems to be ludicrously complex at only a few levels of branching.

So my question is: is there any reasonable and logical approach one could take to name commits using a system like this, or would this system fail from the outset? In particular:

Could you name merge commits in a systematic way, without simply choosing which parent commit to "increment" the name of at random?
Could you name branch commits in this system without names quickly becoming long and complicated to read?

I'm prepared to accept the answer "No to both, this system is fundamentally flawed", but would like to see if there are possible solutions to this problem that I hadn't envisaged.

Question 2

The first question: why? What are you actually trying to gain here?

Question 3

That's a fair question. I'm interested in sketching out ways that a VCS could be designed in ways that don't require beginners to version control to understand concepts like hashes or SHAs - which some could argue is more complexity than a user may want from a VCS. A different level of abstraction, if you will. This was one idea that I had. I recognise that it's flawed but wanted to put it up here for criticism to see if this idea is salvageable.

Question 4

"ways that don't require beginners to version control to understand concepts like hashes or SHAs" – At least in Git, you don't need to understand concepts like hashes or SHAs. You can just treat them as opaque identifiers. You don't even need to know that they are SHA-1 hashes, that they are hashes at all, or what a hash is. All you need to know is that each object in Git has a unique name that is based on its content.

Question 5

You could consider the branch with fewer number/alpha/number parts to be the "parent" to be merged into, so 65 would win out over 1ab and the merged branch would be 66. The idea is, "more branched" branches are always merging into "less branched" ones.
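
A rough sketch of this "fewer parts wins" heuristic, assuming names are made of alternating runs of digits and lowercase letters as in the question. The names `depth` and `merge_target` are hypothetical.

```python
import re
from typing import Optional

def depth(name: str) -> int:
    """Count alternating digit/letter runs: '65' -> 1, '1ab' -> 2, '1a2a' -> 4."""
    return len(re.findall(r"\d+|[a-z]+", name))

def merge_target(parent_a: str, parent_b: str) -> Optional[str]:
    """Return the less-branched parent, or None when both are equally deep."""
    da, db = depth(parent_a), depth(parent_b)
    if da == db:
        return None  # tie: the heuristic alone cannot pick a parent
    return parent_a if da < db else parent_b
```

With these definitions, `merge_target("65", "1ab")` picks `65`, so the merge commit would be 66. Two parents with the same number of parts (for example `1c` and `2b`) produce a tie, which the heuristic as stated does not resolve.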

Question 6

@Lou: Your entire problem revolves around synchronization. In a centralized system, there is nothing to synchronize, so the problem you are presenting simply does not exist in the first place. You mention Subversion in your question. Subversion does have unique revision numbers which are not content-based / hashes, and that is easy to do precisely because Subversion is centralized. Heck, you could just use a timestamp, which is again trivial because you don't have to synchronize clocks – there is only one clock.


Answer 1 · Caleth · 2022-07-08 15:24:05Z

You don't need a different naming scheme for branches if you give up on "sequential" numbers in a branch, and instead go for globally increasing.

1---2---5---7---21 ... 64---65------66
 \                                 /
  3---4---6 ... 61---62---63------/

However this and your scheme only work with a central source of truth for names, so neither will be a DVCS.
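
As a minimal sketch of globally increasing names, assume a single central server that hands out the next number for every commit, whichever branch it is on. The class `CentralRepo` and its `commit` method are hypothetical names for illustration only.

```python
import itertools
from typing import Dict, Tuple

class CentralRepo:
    """Hypothetical central server that allocates globally increasing commit numbers."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.parents: Dict[int, Tuple[int, ...]] = {}

    def commit(self, *parents: int) -> int:
        """Record a commit with zero or more parents and return its number."""
        for p in parents:
            if p not in self.parents:
                raise KeyError(f"unknown parent commit {p}")
        number = next(self._counter)
        self.parents[number] = parents
        return number

repo = CentralRepo()
c1 = repo.commit()            # root commit
c2 = repo.commit(c1)          # continues the main line
c3 = repo.commit(c1)          # starts a branch from commit 1
c4 = repo.commit(c3)          # continues that branch
c5 = repo.commit(c2)          # back on the main line
merge = repo.commit(c5, c4)   # a merge simply takes the next free number
```

Branch structure lives only in the parent links, not in the names. Two offline clones would each need their own counter and could both allocate the same number, which is why this relies on a central source of truth.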


Ah yeah, so this is essentially the same approach as SQL: auto-increment the commit name. I see what you mean about a central source of truth though, and how easily this would clash between local and remote repos.

– Lou, Jul 8, 2022 at 15:58

@Lou You are bringing up memories of VSS and all the 3rd party tools that were developed so you could use VSS in a sort of local/remote type of architecture.

– Peter M, Jul 8, 2022 at 18:15

@Lou: This is exactly what Subversion does. Every commit increases the Revision by 1.

– Jörg W Mittag, Jul 9, 2022 at 8:24

The main frame challenge here is that this is impossible with a distributed VCS. Unless you intend to throw off-line work modes out, no scheme that requires central coordination can work.

– Hans-Martin Mosner, Jul 9, 2022 at 8:51

@Hans-MartinMosner To frame challenge the frame challenge (😛) off-line doesn't necessarily have to mean fully distributed. In theory, git is fully decentralised, but in practice, most people use a "source of truth" hosted somewhere like Github or Gitlab, and often require all actions on "core" branches to be done directly there, not offline. So a system that allowed you to "claim", say, all branches prefixed with your username, would allow you to build as many branches as you wanted offline, then upload them to the central server. Commit numbers based on branch would then also be possible.

– IMSoP, Jul 9, 2022 at 13:59

Answer 2 · IMSoP · 2022-07-08 23:49:17Z

Your problems mostly arise from an inconsistent set of base assumptions.

On the one hand, you have assumed that "branching" is a different operation than "committing to a branch"; but on the other, you've assumed that "merging into" is indistinguishable from "merging from".

If you take a git-like model of history as a Directed Acyclic Graph, the first assumption is incorrect. Every commit "branches off from" its parent, in the sense that any commit can be treated as a branch tip in its own right. So commit 1 would be immediately followed by commit 1a, then 1a1, and so on. If you are creating a child of commit 1 but 1a is already in use, that's commit 1b. You would quickly get commit identifiers with hundreds of 1's and a's, and as you point out, merge commits have two natural names (or more, if you allow more than two parents, as git does).

If, on the other hand, you take the view that branches are a core concept, with commits directly "belonging to" a branch, then the second assumption is incorrect. If commit 65 is "on a branch", and you merge commit 1ab "into that branch", the result is unambiguously commit 66, because it belongs to the same branch as commit 65. It wouldn't be commit 1ac, because that would imply you're continuing the other branch. If you wanted to indicate that it was a merge commit, you could mention its extra source, e.g. 66.1ab.
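
A small sketch of "merging into" a branch, assuming the target branch tip ends in a commit number, as 65 does here. The function name `merge_name` and its `annotate` flag are hypothetical. A target ending in letters would need an Excel-style letter increment instead, which is not shown.

```python
import re

def merge_name(target: str, source: str, annotate: bool = False) -> str:
    """Name the commit that merges `source` into the branch whose tip is `target`."""
    match = re.fullmatch(r"(.*?)(\d+)", target)
    if match is None:
        raise ValueError(f"target must end in a commit number: {target!r}")
    prefix, number = match.groups()
    name = f"{prefix}{int(number) + 1}"  # the merge continues the target branch
    # Optionally record the extra parent in the name, e.g. '66.1ab'.
    return f"{name}.{source}" if annotate else name
```

The name depends only on which branch is merged into, so swapping the two arguments gives a different (and equally unambiguous) result.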

The problem of multiple branches from one point can be solved by always incrementing a number as you go "along" a branch, and adding a letter only when you "create" a branch: 45d6 is the 6th commit on branch 45d, which is the 4th branch from commit 45 of the original branch. A commit "continuing" that branch would be 45d7 ("branch 45d, commit 7"); but a commit "branching from" it would be 45d6a1 ("branch 45d6a, commit 1"). In this scheme, branch identifiers always end in letters, and commits in numbers. Eventually, you would end up with things like 213c42g17 and 213c42g65a25, which you can see at a glance share a common parent branch 213c42g; whereas 213c42f21 is from a different branch (213c42f) with common ancestor 213c42.
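
The scheme in the paragraph above can be sketched as follows. This assumes that a branch "exists" exactly when its first commit is in the set of known names; all function names (`continue_branch`, `new_branch`, `shared_prefix` and the helpers) are hypothetical.

```python
import re
from typing import List, Set, Tuple

def _split_tail(commit: str) -> Tuple[str, int]:
    """Split '45d6' into its branch prefix '45d' and commit number 6."""
    match = re.fullmatch(r"(.*?)(\d+)", commit)
    if match is None:
        raise ValueError(f"commit names must end in a number: {commit!r}")
    return match.group(1), int(match.group(2))

def letters(n: int) -> str:
    """1 -> 'a', 26 -> 'z', 27 -> 'aa' (Excel-style column letters)."""
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("a") + rem) + out
    return out

def continue_branch(commit: str) -> str:
    """Same branch, next commit number: '45d6' -> '45d7'."""
    prefix, number = _split_tail(commit)
    return f"{prefix}{number + 1}"

def new_branch(commit: str, existing: Set[str]) -> str:
    """First commit of the next unused branch from `commit`: '45' -> '45a1'."""
    _split_tail(commit)  # validate that we are branching from a commit
    n = 1
    while f"{commit}{letters(n)}1" in existing:
        n += 1
    return f"{commit}{letters(n)}1"

def _tokens(name: str) -> List[str]:
    return re.findall(r"\d+|[a-z]+", name)

def shared_prefix(a: str, b: str) -> str:
    """Longest common run of name parts: a shared branch or a fork-point commit."""
    common = []
    for x, y in zip(_tokens(a), _tokens(b)):
        if x != y:
            break
        common.append(x)
    return "".join(common)
```

For the examples in the text, `shared_prefix("213c42g17", "213c42g65a25")` gives the branch `213c42g`, and `shared_prefix("213c42g17", "213c42f21")` gives the commit `213c42`.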

Another assumption that could be challenged is that commits always have the same identifier. In Mercurial, commits have a local shorthand which is incremental within a particular list. It is possible (and indeed common) to draw some subset of a git repository as though it was a linear "current branch" with "other branches" splitting from and joining to it. Pick a starting point, and you could label each commit in such a subset with a systematic identifier; you just need some other way to identify that frame of reference.
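
To illustrate the idea of identifiers that are only meaningful locally, here is a sketch in which each clone numbers commits in the order it first sees them, while a stable global id is kept alongside. This is not Mercurial's actual implementation; the class `LocalNumbering` and the example ids are made up for illustration.

```python
from typing import Dict, List

class LocalNumbering:
    """Maps stable global ids to per-clone sequential numbers (hypothetical)."""

    def __init__(self) -> None:
        self._order: List[str] = []
        self._number: Dict[str, int] = {}

    def add(self, global_id: str) -> int:
        """Register a commit and return its local number (idempotent)."""
        if global_id not in self._number:
            self._number[global_id] = len(self._order)
            self._order.append(global_id)
        return self._number[global_id]

    def resolve(self, local_number: int) -> str:
        """Translate a local number back to the global id."""
        return self._order[local_number]

alice, bob = LocalNumbering(), LocalNumbering()
for gid in ["c0ffee", "badc0d", "f00d42"]:
    alice.add(gid)
for gid in ["c0ffee", "f00d42", "badc0d"]:  # same commits, received in another order
    bob.add(gid)
```

Here local number 1 means `badc0d` for `alice` but `f00d42` for `bob`, so the short numbers are convenient within one clone and need the global id whenever two clones talk to each other.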

Ultimately, the format of commit identifiers has to go hand in hand with some definition of what a commit actually is, and that definition varies a lot between the many versioning systems that have been invented over the decades.

Thanks, this contains a series of good assumption challenges. I see what you mean about viewing each commit as being a branch tip in its own right. The only thing I don't get is how your suggested naming scheme differs from mine. I'm assuming you mean that from commit 45 we have four branches, 45a1, 45b1, 45c1 and 45d1. Then as I create commits on branch 45d1 I eventually get to 45d6. If I were to branch again from 45 I'd have 45e1. But then if I were to branch from 45d1, wouldn't that also be 45e1? Or are you proposing to then add a new letter, giving 45d1a1?

@Lou Yes, I'm saying that if you can tell the difference between "commit to existing branch" and "create new branch", then "commit to existing branch on top of 45d1" could be 45d2, and "create new branch from 45d1" could be 45d1a1. They could be read as "branch 45d, commit 2" and "branch 45d1a, commit 1": branches end in letters, commits in numbers. Eventually, you would end up with things like 213c42g17 and 213c42g65a25, which you can see at a glance share a common parent branch 213c42g; whereas 213c42f21 is from a different branch (213c42f) with common ancestor 213c42.

Ah nice! Yes that makes sense and seems logical and consistent, thanks :). I'm seeing now that at advanced levels of branching the names would become more complex than an abbreviated SHA hash, but perhaps that would encourage regular merges/rebases. Perhaps it's still easier to understand than a hash ID if you've never heard of a hash, though.
