Convert version string to integer

Question 1

I need to be able to convert a version string to an integer. I've decided I will follow semantic versioning so version strings will be of the type x.y.z.

Initially I thought a simple algorithm like int = x * 10^6 + y * 10^3 + z would be sufficient, and it most probably will be, since I've never seen software with a four-digit major/minor/patch value. In theory, however, such an algorithm can fail.

Let's suppose we have versions 2.0.0 and 1.1000.0. In this case the integers would be:

int(2.0.0) = 2 * 10^6 + 0 * 10^3 + 0 = 2000000
int(1.1000.0) = 1 * 10^6 + 1000 * 10^3 + 0 = 2000000

Clearly the algorithm is flawed. Is there a known algorithm of this kind that I could use?

Again, I would probably be OK performing the calculation this way, but I'm interested to find out whether there is a bulletproof way.
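The collision described above can be reproduced with a minimal sketch. The function name naive_encode is made up for illustration; the formula is the one from the question.

```python
def naive_encode(x: int, y: int, z: int) -> int:
    """Encode x.y.z as x * 10^6 + y * 10^3 + z, as proposed in the question."""
    return x * 10**6 + y * 10**3 + z


a = naive_encode(2, 0, 0)
b = naive_encode(1, 1000, 0)
print(a, b, a == b)  # both are 2000000, so the two versions collide
```

As long as every component stays below 1000, the encoding keeps versions distinct and in order; the collision only appears once a component reaches four digits.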

Question 2

Your main problem is that a version string is not a number. It is a string. Just because a string contains only digits doesn't make it a number. A phone number is still a string. If you need this in order to compare two versions, then compare each part individually.

Question 3

@Bent I need to find the next version of a given API in a database, given a known version. I figured the easiest way would be to convert the version to an integer according to some rule and use that, instead of complex logic in SQL, which would look horrible. I guess I could just limit version components to 1000 or, in a more extreme case, 10000, and deal with that.

Question 4

I would still not use a simple integer comparison. I would create a function that compares the integer values before the first period; if those are equal, the values between the first and second periods; and so on. And don't rely on there being only two periods, or even the same number of periods, in both version strings.
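A component-wise comparison along these lines might look like the sketch below. The function name compare_versions is hypothetical, and treating missing trailing components as 0 (so "1.2" equals "1.2.0") is an assumption, not something the comment specifies.

```python
from itertools import zip_longest


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as version a is lower than, equal to or higher than b.

    Assumes purely numeric, dot-separated components. Missing trailing
    components are treated as 0, so "1.2" compares equal to "1.2.0".
    """
    parts_a = [int(p) for p in a.split(".")]
    parts_b = [int(p) for p in b.split(".")]
    # Walk both lists left to right; the first differing component decides.
    for x, y in zip_longest(parts_a, parts_b, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


print(compare_versions("2.0.0", "1.1000.0"))  # 1
```

Because the components are compared as separate integers, there is no limit on how large any one of them can be.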

Question 5

@Bent: Actually, it's more like a tuple (Nat × Nat × Nat × String).

Question 6

What you want is impossible unless you limit the component numbers, and then it's trivial. The SQL is not so bad, depending on the database engine.
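To give an idea of what the SQL could look like, here is a sketch using Python's built-in sqlite3 module. The table api_versions, its columns and the helper next_version are all invented for illustration; it assumes the three components are stored as separate integer columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_versions (major INTEGER, minor INTEGER, patch INTEGER)"
)
conn.executemany(
    "INSERT INTO api_versions VALUES (?, ?, ?)",
    [(1, 9, 0), (1, 1000, 0), (2, 0, 0), (2, 0, 1)],
)


def next_version(conn, major, minor, patch):
    """Return the smallest stored version strictly greater than the given one, or None."""
    return conn.execute(
        """
        SELECT major, minor, patch FROM api_versions
        WHERE major > :major
           OR (major = :major AND minor > :minor)
           OR (major = :major AND minor = :minor AND patch > :patch)
        ORDER BY major, minor, patch
        LIMIT 1
        """,
        {"major": major, "minor": minor, "patch": patch},
    ).fetchone()


print(next_version(conn, 1, 1000, 0))  # (2, 0, 0)
```

The ordering is done by the database on the three columns, so no component limit is needed.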

Question 7

No, there's no bulletproof way unless you know the maximum values in advance (the maximum number of digits per component). You will also need that information to decode the number.

You're trying to represent something using positional notation, so the positions in your single integer have an inherent meaning. There is extra information within the "x.y.z" version string (the dots separating the components), which is lost when you do your conversion to a single integer.
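As a sketch of what "knowing the maximum in advance" buys you, the following treats each component as one digit in base limit. The names encode, decode and limit are hypothetical; the point is only that the same bound is needed in both directions and that out-of-range input has to be rejected.

```python
from typing import Tuple


def encode(x: int, y: int, z: int, limit: int = 1000) -> int:
    """Pack x.y.z into one integer, treating each component as a digit in base `limit`."""
    for part in (x, y, z):
        if not 0 <= part < limit:
            raise ValueError(f"component {part} is outside 0..{limit - 1}")
    return (x * limit + y) * limit + z


def decode(n: int, limit: int = 1000) -> Tuple[int, int, int]:
    """Invert encode(); the same `limit` must be used in both directions."""
    rest, z = divmod(n, limit)
    x, y = divmod(rest, limit)
    return x, y, z


print(encode(1, 2, 3))   # 1002003
print(decode(1002003))   # (1, 2, 3)
```

With the bound enforced, 1.1000.0 is refused instead of silently colliding with 2.0.0.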

Question 8

If the only requirement is that each version string has a unique integer identifier, you can use a function like:

int(x.y.z) = 2^x * 3^y * 5^z

This is easily reversible by finding the prime factorization of the integer, but doesn't have the same ordering as the version strings.
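A minimal sketch of this encoding and its inverse, with hypothetical function names. Since the only possible prime factors are 2, 3 and 5, decoding is just repeated division.

```python
def prime_encode(x: int, y: int, z: int) -> int:
    """Map x.y.z to 2^x * 3^y * 5^z (unique, but not order-preserving)."""
    return 2**x * 3**y * 5**z


def prime_decode(n: int) -> tuple:
    """Recover (x, y, z) from a value produced by prime_encode (so n >= 1)."""
    exponents = []
    for p in (2, 3, 5):
        count = 0
        while n % p == 0:
            n //= p
            count += 1
        exponents.append(count)
    return tuple(exponents)


print(prime_decode(prime_encode(6, 0, 103)) == (6, 0, 103))  # True
print(prime_encode(1, 0, 0), prime_encode(0, 0, 1))          # 2 5
```

The last line shows the ordering problem: 1.0.0 maps to 2 while 0.0.1 maps to 5.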

Edit: If the size of the integer is a concern, find the binary representation for each element of the tuple:

x -> ...x3 x2 x1 x0
y -> ...y3 y2 y1 y0
z -> ...z3 z2 z1 z0

then interleave the bits

output = ... x3 y3 z3 x2 y2 z2 x1 y1 z1 x0 y0 z0

This should be more space efficient than storing a version string.
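The interleaving can be sketched as follows, assuming non-negative components. The function names are hypothetical, and bit i of x, y and z is placed at positions 3i+2, 3i+1 and 3i to match the layout above.

```python
def interleave(x: int, y: int, z: int) -> int:
    """Interleave the bits of x, y and z as ... x1 y1 z1 x0 y0 z0."""
    out = 0
    i = 0
    # Keep going while any component still has a set bit at position i or above.
    while x >> i or y >> i or z >> i:
        out |= ((x >> i) & 1) << (3 * i + 2)
        out |= ((y >> i) & 1) << (3 * i + 1)
        out |= ((z >> i) & 1) << (3 * i)
        i += 1
    return out


def deinterleave(n: int) -> tuple:
    """Invert interleave() by collecting every third bit."""
    x = y = z = 0
    i = 0
    while n >> (3 * i):
        x |= ((n >> (3 * i + 2)) & 1) << i
        y |= ((n >> (3 * i + 1)) & 1) << i
        z |= ((n >> (3 * i)) & 1) << i
        i += 1
    return x, y, z


print(deinterleave(interleave(6, 0, 103)))       # (6, 0, 103)
print(interleave(1, 0, 0), interleave(0, 0, 2))  # 4 8
```

Like the prime encoding, this is unique and reversible without a preset limit, but it does not preserve version order: 0.0.2 maps to 8, which is greater than the 4 produced by 1.0.0.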

Question 9

Brilliant in theory, but could be a problem if there's going to be a large number of patch releases: You'll need quite a few digits to represent version 6.0.103

Question 10

I think you wanted multiplications instead of additions. With the current formula, 1.1.0 gives the same result as 0.0.1. I agree with James that this will however quickly give huge numbers.

Question 11

"This is easily reversible by finding the prime factorization of the integer" may be a bit optimistic: no efficient general algorithm for integer factorization is known.

Question 12

@allo Finding the prime factorization of an arbitrary integer is hard, but here the only possible prime factors are the known primes 2, 3 and 5, so repeated division by each of them recovers the exponents.

Question 13

I may be late, but I had the same problem and I noticed that we can use bit masking to solve it:

Glossary

<< is the shift left operator.
>> is the shift right operator.
| is the OR operator.
& is the AND operator.
maxBits is an integer giving how many bits are used to represent each version number (x, y and z); it determines the maximum output integer.

The output integers are ordered

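That is, as long as every version number fits in maxBits bits, the output integer of v0.1.0:

is equal only to v0.1.0
greater than v0.0.9, v0.0.100 and v0.0.1000 (the last one needs maxBits of at least 10)
less than v0.1.1, v1.0.0.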

Masking a version

Let version be a string in format vX.Y.Z where X, Y and Z are integers.

Let's generate a masked integer from the x, y, z values:

int(x, y, z) = x << maxBits * 2 | y << maxBits * 1 | z << maxBits * 0

A good value for maxBits is 8: it allows versions from v0.0.0 to v255.255.255, and the resulting integer fits in 24 bits, which is widely supported since most languages and environments provide integers of at least 32 bits.

Version reverse masking

To "unmask" the integer and get back the original string, apply the reverse mask to recover each version number.

Let K be the masked integer:

major(K) = (K >> maxBits * 2) & ((1 << maxBits) - 1)
minor(K) = (K >> maxBits * 1) & ((1 << maxBits) - 1)
patch(K) = (K >> maxBits * 0) & ((1 << maxBits) - 1)

Remember that the masked number looks like this (shown here with 4 bits per group):

 X    Y    Z
0000 0000 0000

These reverse-masking functions say:

Given the masked integer, shift the lower bit group(s), if any, out to the right and keep only the lowest remaining group.

Dart implementation

/// [version] should be a string in the format 'vX.Y.Z', where X, Y and Z are
/// integers representing the major, minor, and patch versions respectively.
///
/// For example, 'v1.2.3' represents version 1.2.3.
///
/// [maxBits] is the number of bits used to represent each version number:
/// [vMAJOR.MINOR.PATCH].
///
/// 8 bits are enough to represent [v0.0.0] to [v255.255.255].
///
/// Returns an integer that is unique to the version, provided every version
/// number fits in [maxBits] bits; otherwise a [RangeError] is thrown.
int maskVersion(String version, {int maxBits = 8}) {
  final List<int> versions = version
      .replaceFirst('v', '') // Remove the leading 'v' from 'v0.1.0'
      .split('.') // Turn '0.1.0' into ['0', '1', '0']
      .map((String e) => int.parse(e))
      .toList();
  // Reject values that would overflow into the neighbouring bit group.
  for (final int part in versions) {
    RangeError.checkValueInInterval(part, 0, (1 << maxBits) - 1, 'version');
  }
  final int major = versions[0];
  final int minor = versions[1];
  final int patch = versions[2];
  return (major << maxBits * 2) | (minor << maxBits * 1) | (patch << maxBits * 0);
}

/// [version] should be the integer returned by [maskVersion].
///
/// [maxBits] is the number of bits used to represent each version number:
/// [vMAJOR.MINOR.PATCH]. It must match the value passed to [maskVersion].
///
/// 8 bits are enough to represent [v0.0.0] to [v255.255.255].
///
/// This function returns the original string representing the version.
String unmaskVersion(int version, {int maxBits = 8}) {
  // A mask with the lowest [maxBits] bits set, e.g. 0xFF for 8 bits.
  final int groupMask = (1 << maxBits) - 1;
  final int major = (version >> maxBits * 2) & groupMask;
  final int minor = (version >> maxBits * 1) & groupMask;
  final int patch = (version >> maxBits * 0) & groupMask;
  return 'v$major.$minor.$patch';
}

Remember to always work with the same maxBits value, in both directions (masking and unmasking).

Masking assertions (Max bits: 8):

maskVersion('v0.1.0') > maskVersion('v0.0.9')
maskVersion('v0.1.0') < maskVersion('v0.1.1')
maskVersion('v0.1.0') < maskVersion('v1.0.0')
maskVersion('v0.0.0') < maskVersion('v0.0.1')
maskVersion('v0.0.0') < maskVersion('v0.1.0')
maskVersion('v0.0.0') < maskVersion('v100.100.100')

Unmasking assertions (Max bits: 8):

unmaskVersion(255) == 'v0.0.255'
unmaskVersion(256) == 'v0.1.0'
unmaskVersion(257) == 'v0.1.1'
unmaskVersion(0) == 'v0.0.0'
LAST_OUTPUT_INTEGER = pow(2, 24) - 1
unmaskVersion(LAST_OUTPUT_INTEGER) == 'v255.255.255'

Note: since you want to support numbers up to 1000, you can use maxBits = 10, which supports versions up to v1023.1023.1023; the result needs 30 bits, so it still fits in a 32-bit integer.
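For a worked example of the arithmetic, and of what happens when a component does not fit in maxBits bits, here is a small sketch in Python rather than Dart. The names mask and unmask are hypothetical stand-ins for the masking and reverse-masking formulas above.

```python
def mask(x: int, y: int, z: int, max_bits: int = 8) -> int:
    """Pack x, y, z into one integer, max_bits bits per component."""
    return (x << (max_bits * 2)) | (y << max_bits) | z


def unmask(k: int, max_bits: int = 8) -> tuple:
    """Split a masked integer back into its three bit groups."""
    low = (1 << max_bits) - 1  # e.g. 0xFF for 8 bits
    return (k >> (max_bits * 2)) & low, (k >> max_bits) & low, k & low


print(hex(mask(1, 2, 3)))      # 0x10203
print(unmask(mask(1, 2, 3)))   # (1, 2, 3)

# With 8 bits per group, 1000 does not fit and spills into the minor group:
print(unmask(mask(0, 0, 1000)))             # (0, 3, 232)
# With 10 bits per group it round-trips:
print(unmask(mask(0, 0, 1000, 10), 10))     # (0, 0, 1000)
```

So with maxBits = 8, v0.0.1000 would be masked to the same integer as v0.3.232, which is why input outside the chosen range needs to be rejected.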

Question 14

Very nice. But how do you go from the output back to the input?

Question 15

Great question, I researched and updated the answer, added the unmask section, thanks!

Question 16

This is effectively the same approach the OP described in the question and judged with "Clearly the algorithm is flawed", just in base 2 instead of base 10.

Question 17

The flaw is that it cannot handle unbounded version numbers; we need to fix a limit before using it.

Question 18

In addition to the other answer, consider that discarding labels like "alpha", etc. is not a good idea if you have to compare versions. Instead of using strings, consider parsing a version into structured internal data:

(major: 3, minor: 0, patch: 0, label: "alpha")

And define a custom lexicographic sort for comparing them.
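One way this could look is sketched below. The Version type, the parse helper and the sort key are all hypothetical, and the label handling is deliberately simplified: it only encodes the semantic-versioning rule that a pre-release such as 3.0.0-alpha comes before 3.0.0, and compares labels as plain strings rather than implementing the full precedence rules.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    label: str = ""  # empty string means a normal release

    def sort_key(self):
        # (label == "") is False for pre-releases, and False sorts before True,
        # so 3.0.0-alpha is ordered before 3.0.0.
        return (self.major, self.minor, self.patch, self.label == "", self.label)


def parse(text: str) -> Version:
    """Parse 'X.Y.Z' or 'X.Y.Z-label' into a Version."""
    core, _, label = text.partition("-")
    major, minor, patch = (int(p) for p in core.split("."))
    return Version(major, minor, patch, label)


versions = [parse(s) for s in ["3.0.0", "3.0.0-alpha", "2.10.0", "3.0.0-beta"]]
ordered = sorted(versions, key=Version.sort_key)
print([v.label or "release" for v in ordered])
# ['release', 'alpha', 'beta', 'release']
```

The first 'release' in the output is 2.10.0, followed by the two 3.0.0 pre-releases and then 3.0.0 itself.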

Question 19

Don't.

You didn't explain in your question why you want to do this, but it is probably a bad idea. Whatever use case you have in mind, it is probably better handled a different way.

If you really want to map version numbers to integers that are unique and in order, you must know how many digits each part of the version string can have. Then you could do something like what you mentioned in your question.

Additionally, in terms of order, I am not really sure what should come first:

Version 1.0 - Released in Dec 2000
Version 1.1 - Released in Dec 2001
Version 2.0 - Released in Dec 2002
Version 1.2 - Released in Dec 2014

Do you want to sort by release date or do you want to sort by major version, followed by minor version, followed by ... ?

Question 20

The OP said the versioning follows semantic versioning, which clearly specifies how versions should be sorted. Your examples are invalid in terms of semver. One use case for such a conversion is Android apps, which use a version string as information for the user and an integer internally. Converting a semver to an int would be much more reliable than maintaining two separate numbers; sooner or later you'll forget to update one of them.

Accepted answer: Mike Harris, 2016-03-24 17:57:55Z (the answer under Question 7 above).
