13

I need to be able to convert a version string to an integer. I've decided I will follow semantic versioning so version strings will be of the type x.y.z.

Initially I thought a simple formula like int = x * 10^6 + y * 10^3 + z would be sufficient, and it most probably will be, since I've never seen software with a three-digit major/minor/patch value. In theory, however, such an algorithm can fail.

Let's suppose we have versions 2.0.0 and 1.1000.0. In this case the integers would be:

int(2.0.0) = 2 * 10^6 + 0 * 10^3 + 0 = 2000000
int(1.1000.0) = 1 * 10^6 + 1000 * 10^3 + 0 = 2000000
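
A minimal Python sketch of the same arithmetic, to make the collision concrete. The function name `naive_version_to_int` is made up for illustration, and the sketch assumes the string is exactly three dot-separated non-negative integers.

```python
def naive_version_to_int(version: str) -> int:
    """Encode 'x.y.z' as x * 10^6 + y * 10^3 + z (the formula from the question)."""
    x, y, z = (int(part) for part in version.split("."))
    return x * 10**6 + y * 10**3 + z


# Two different versions map to the same integer once a component reaches 1000.
print(naive_version_to_int("2.0.0"))     # 2000000
print(naive_version_to_int("1.1000.0"))  # 2000000
```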

Clearly the algorithm is flawed. Is there a known algorithm that I could use instead?

Again, I would probably be OK performing the calculation this way, but I'm interested to find out whether there is a bulletproof way.

asked Mar 24, 2016 at 17:43
6
  • 5
    Your main problem is that a version string is not a number. It is a string. Just because a string only contains digits doesn't make it a number. A phone number is still a string. If you need this in order to compare two versions, then compare each part individually. Commented Mar 24, 2016 at 18:19
  • @Bent I need to be able to find the next version of a given API in a database given a known version. I figured the easiest way would be to convert the version to an integer based on some rule and use that instead of complex logic in SQL, which would look horrible. I guess I could just limit version components to 1000 or, in the more extreme case, 10000 and deal with that. Commented Mar 24, 2016 at 20:28
  • 3
    I would still not use a simple integer comparison. I would create a function that compares the integer value before the first period. If those are equal, it compares the value between the first and second periods, and so on. And don't rely on there being only 2 periods, or even the same number of periods, in both version strings. Commented Mar 24, 2016 at 20:52
  • 3
    @Bent: Actually, it's more like a tuple (Nat × Nat × Nat × String). Commented Mar 25, 2016 at 1:13
  • 1
    What you want is impossible unless you limit the component numbers, and then it's trivial. The SQL is not so bad, depending on the database engine. Commented Apr 1, 2016 at 1:06
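
The component-by-component comparison suggested in the comments could look roughly like this in Python. This is only a sketch: `parse_version` and `compare_versions` are hypothetical names, components are assumed to be purely numeric, and treating a missing trailing component as 0 is an assumption rather than something the comments specify.

```python
from itertools import zip_longest


def parse_version(version: str) -> list[int]:
    """Split '1.10.3' into [1, 10, 3]; assumes purely numeric components."""
    return [int(part) for part in version.split(".")]


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 when a is lower than, equal to or higher than b."""
    # Missing trailing components count as 0, so '1.2' equals '1.2.0'.
    pairs = zip_longest(parse_version(a), parse_version(b), fillvalue=0)
    for left, right in pairs:
        if left != right:
            return -1 if left < right else 1
    return 0
```

Because each component is compared as an integer, no upper bound on the components is needed: `compare_versions("2.0.0", "1.1000.0")` returns 1.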

5 Answers

10

No, there's no bulletproof way unless you know the maximum values in advance (the maximum number of digits per component). You will also need that information to decode the number.

You're trying to represent something using positional notation, so the positions in your single integer have an inherent meaning. There is extra information within the "x.y.z" version string (the dots separating the components), which is lost when you do your conversion to a single integer.
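
As a rough Python sketch of what "knowing the maximum in advance" buys you: with an agreed bound the encoding is both order-preserving and reversible, and out-of-range input can be rejected instead of silently colliding. `MAX_COMPONENT`, `encode` and `decode` are illustrative names, and the bound of 999 is an assumption.

```python
MAX_COMPONENT = 999  # assumed upper bound for each of x, y and z
BASE = MAX_COMPONENT + 1


def encode(x: int, y: int, z: int) -> int:
    """Pack three bounded components into one integer, positional in BASE."""
    for value in (x, y, z):
        if not 0 <= value <= MAX_COMPONENT:
            raise ValueError(f"component {value} is outside 0..{MAX_COMPONENT}")
    return (x * BASE + y) * BASE + z


def decode(number: int) -> tuple[int, int, int]:
    """Recover (x, y, z); only works with the same BASE used for encoding."""
    rest, z = divmod(number, BASE)
    x, y = divmod(rest, BASE)
    return x, y, z
```

With this check in place, `encode(1, 1000, 0)` raises `ValueError` rather than producing the same integer as `encode(2, 0, 0)`.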

answered Mar 24, 2016 at 17:57
8

If the only requirement is that each version string has a unique integer identifier, you can use a function like:

int(x.y.z) = 2^x * 3^y * 5^z

This is easily reversible by finding the prime factorization of the integer, but doesn't have the same ordering as the version strings.
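
A small Python sketch of this encoding, with hypothetical function names. Decoding only has to divide out 2, 3 and 5 repeatedly, so no general factoring algorithm is needed.

```python
def prime_encode(x: int, y: int, z: int) -> int:
    """Map (x, y, z) to 2^x * 3^y * 5^z, which is unique for each triple."""
    return 2**x * 3**y * 5**z


def prime_decode(number: int) -> tuple[int, int, int]:
    """Recover the exponents of 2, 3 and 5 by repeated division."""
    if number < 1:
        raise ValueError("encoded value must be a positive integer")
    exponents = []
    for prime in (2, 3, 5):
        count = 0
        while number % prime == 0:
            number //= prime
            count += 1
        exponents.append(count)
    if number != 1:
        raise ValueError("value has a prime factor other than 2, 3 or 5")
    return exponents[0], exponents[1], exponents[2]
```

The ordering problem shows up immediately: `prime_encode(1, 0, 0)` is 2 while `prime_encode(0, 0, 1)` is 5, so version 1.0.0 maps to a smaller integer than 0.0.1.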

Edit: If the size of the integer is a concern, find the binary representation for each element of the tuple

x -> ...x3 x2 x1 x0
y -> ...y3 y2 y1 y0
z -> ...z3 z2 z1 z0

then interleave the bits

output = ... x3 y3 z3 x2 y2 z2 x1 y1 z1 x0 y0 z0

This should be more space efficient than storing a version string.
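
A Python sketch of the interleaving, assuming non-negative components and relying on Python's arbitrary-size integers. The names `interleave` and `deinterleave` are made up here. Bit i of x lands at position 3i + 2, bit i of y at 3i + 1, and bit i of z at 3i.

```python
def interleave(x: int, y: int, z: int) -> int:
    """Weave the bits of x, y and z into one integer, three bits per step."""
    out = 0
    i = 0
    while x >> i or y >> i or z >> i:
        out |= ((x >> i) & 1) << (3 * i + 2)
        out |= ((y >> i) & 1) << (3 * i + 1)
        out |= ((z >> i) & 1) << (3 * i)
        i += 1
    return out


def deinterleave(number: int) -> tuple[int, int, int]:
    """Undo interleave() by reading the integer back three bits at a time."""
    x = y = z = 0
    i = 0
    while number >> (3 * i):
        group = (number >> (3 * i)) & 0b111
        x |= ((group >> 2) & 1) << i
        y |= ((group >> 1) & 1) << i
        z |= (group & 1) << i
        i += 1
    return x, y, z
```

Like the prime encoding, this gives a unique integer but not the version ordering: `interleave(1, 0, 0)` is 4 while `interleave(0, 2, 0)` is 16.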

answered Mar 24, 2016 at 19:47
4
  • 3
    Brilliant in theory, but could be a problem if there's going to be a large number of patch releases: You'll need quite a few digits to represent version 6.0.103 Commented Mar 24, 2016 at 19:56
  • 2
    I think you wanted multiplications instead of additions. With the current formula, 1.1.0 gives the same result as 0.0.1. I agree with James that this will however quickly give huge numbers. Commented Mar 25, 2016 at 16:03
  • "This is easily reversible by finding the prime factorization of the integer" may be a bit optimistic, since no efficient algorithm is known for factoring arbitrary integers. Commented Jun 24, 2022 at 14:23
  • @allo Finding the prime factorization of an arbitrary integer is hard, but restricting the problem to factoring numbers whose prime factors are taken from the first k primes makes it much easier. Commented Jun 25, 2022 at 16:08
4

I may be late, but I had the same problem and I noticed that we can use bit masking to solve it:

Glossary

  • << is the shift left operator.
  • >> is the shift right operator.
  • | is the OR operator.
  • & is the AND operator.
  • maxBits is an integer representing how many bits can be used to represent each version number (x, y, and z); this affects the maximum output integer.

The output integers are ordered

That is, the output integer of v0.1.0:

  • is equal only to v0.1.0
  • greater than v0.0.9, v0.0.100, v0.0.1000
  • less than v0.1.1, v1.0.0.

Masking a version

Let version be a string in format vX.Y.Z where X, Y and Z are integers.

Let's generate a masked integer from the x, y, z values:

int(x, y, z) = (x << (maxBits * 2)) | (y << (maxBits * 1)) | (z << (maxBits * 0))

A good value for maxBits is 8, since it allows you to work with versions from v0.0.0 to v255.255.255, and the output of this mask fits in 24 bits, well within the 32-bit integers that most languages and environments support.

Version reverse masking

To "unmask" the integer and get back the original string, apply the reverse operation to each version component.

Let K be the masked integer:

major(K) = (K >> maxBits * 2) & ((1 << maxBits) - 1)
minor(K) = (K >> maxBits * 1) & ((1 << maxBits) - 1)
patch(K) = (K >> maxBits * 0) & ((1 << maxBits) - 1)

Remember that the masked number is laid out as three adjacent bit groups (drawn here with 4 bits per group for brevity):

  X    Y    Z
0000 0000 0000

Each reverse-masking function shifts the masked integer right until the wanted group sits in the lowest maxBits bits, then uses & to discard every bit above that group.

Dart implementation

/// [version] should be a string in the format 'vX.Y.Z', where X, Y, and Z are
/// integers representing the major, minor, and patch versions respectively.
///
/// For example, 'v1.2.3' represents version 1.2.3.
///
/// [maxBits] is the number of bits used to represent each version component
/// (major, minor, patch).
///
/// 8 bits are enough to represent `v0.0.0` to `v255.255.255`.
///
/// Returns a unique integer representing the version. Throws an
/// [ArgumentError] if a component does not fit in [maxBits] bits, because it
/// would otherwise overwrite the bits of the neighbouring component.
int maskVersion(String version, {int maxBits = 8}) {
  final List<int> versions = version
      .replaceAll(RegExp('v'), '') // Remove 'v' from 'v0.1.0'
      .split('.') // Turn '0.1.0' into ['0', '1', '0']
      .map((String e) => int.parse(e))
      .toList();
  final int limit = 1 << maxBits;
  if (versions.any((int n) => n < 0 || n >= limit)) {
    throw ArgumentError('Each component must be between 0 and ${limit - 1}');
  }
  final int major = versions[0];
  final int minor = versions[1];
  final int patch = versions[2];
  return (major << (maxBits * 2)) | (minor << maxBits) | patch;
}

/// [version] should be the integer returned by [maskVersion].
///
/// [maxBits] is the number of bits used to represent each version component
/// (major, minor, patch). It must match the value passed to [maskVersion].
///
/// 8 bits are enough to represent `v0.0.0` to `v255.255.255`.
///
/// Returns the original string representing the version.
String unmaskVersion(int version, {int maxBits = 8}) {
  final int mask = (1 << maxBits) - 1; // Lowest maxBits bits set to 1
  final int major = (version >> (maxBits * 2)) & mask;
  final int minor = (version >> maxBits) & mask;
  final int patch = version & mask;
  return 'v$major.$minor.$patch';
}

Remember to always work with the same maxBits value, in both directions (masking and unmasking).

Masking assertions (Max bits: 8):

maskVersion('v0.1.0') > maskVersion('v0.0.9')
maskVersion('v0.1.0') < maskVersion('v0.1.1')
maskVersion('v0.1.0') < maskVersion('v1.0.0')
maskVersion('v0.0.0') < maskVersion('v0.0.1')
maskVersion('v0.0.0') < maskVersion('v0.1.0')
maskVersion('v0.0.0') < maskVersion('v100.100.100')

Unmasking assertions (Max bits: 8):

unmaskVersion(255) == 'v0.0.255'
unmaskVersion(256) == 'v0.1.0'
unmaskVersion(257) == 'v0.1.1'
unmaskVersion(0) == 'v0.0.0'
LAST_OUTPUT_INTEGER = pow(2, 24) - 1
unmaskVersion(LAST_OUTPUT_INTEGER) == 'v255.255.255'

Note: since you want to support numbers up to 1000, you can use maxBits = 10, which supports versions up to v1023.1023.1023 and still fits in a 32-bit integer (it uses 30 bits).

answered Jun 19, 2022 at 23:51
4
  • Very nice. But how do you go from the output back to the input? Commented Jun 20, 2022 at 8:08
  • Great question, I researched and updated the answer, added the unmask section, thanks! Commented Jun 20, 2022 at 17:17
  • This is effectively the very same approach that the OP described in the question and judged as "Clearly the algorithm is flawed". Just base 2 instead of base 10. Commented Jun 23, 2022 at 8:06
  • The flaw is that it can't handle unbounded values; we need to set a constraint before using it. Commented Jun 23, 2022 at 16:30
2

In addition to the other answer, consider that discarding labels like "alpha" is not a good idea if you have to compare versions. Instead of using strings, consider parsing a version into a structured internal representation:

(major: 3, minor: 0, patch: 0, label: "alpha")

Then define a custom lexicographic ordering for comparing them.
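
One way this could look in Python, as a sketch only. The `Version` class and its `sort_key` method are hypothetical. The ordering assumes that a labelled build such as "alpha" comes before the unlabelled release with the same numbers, and it compares labels as plain strings, which is simpler than the full semantic versioning precedence rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    patch: int
    label: str = ""  # empty string means a final release

    def sort_key(self):
        # False sorts before True, so a labelled build (label != "") comes
        # before the final release that has the same three numbers.
        return (self.major, self.minor, self.patch, self.label == "", self.label)

    def __lt__(self, other: "Version") -> bool:
        return self.sort_key() < other.sort_key()


versions = [Version(3, 0, 0), Version(3, 0, 0, "alpha"), Version(2, 9, 9)]
print(sorted(versions))  # 2.9.9, then 3.0.0 alpha, then 3.0.0
```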

answered Mar 24, 2016 at 19:15
0

Don't.

You didn't explain in your question why you want to do this, but it is probably a bad idea. Whatever use case you have in mind, it is probably better handled a different way.

If you really want to map version numbers to integers and have them be unique and in order, you must know how many digits each part of the version string can have. Then you could do something like what you described in your question.


Additionally, in terms of order, I am not really sure what should come first:

Version 1.0 - Released in Dec 2000
Version 1.1 - Released in Dec 2001
Version 2.0 - Released in Dec 2002
Version 1.2 - Released in Dec 2014

Do you want to sort by release date or do you want to sort by major version, followed by minor version, followed by ... ?

answered Apr 5, 2016 at 22:13
1
  • The OP said the versioning follows semantic versioning, which clearly specifies how versions should be sorted. Your examples are invalid in terms of semver. One use case for such a conversion is Android apps, which use a string as information for the user and an integer internally. Converting a semver to an int would be much more reliable than having to maintain two separate numbers. Sooner or later you'll forget to update one of them. Commented Jan 12, 2021 at 21:09
