I have written my own take on semantic versioning. Parsing it is not really hard, but I feel like my parsing could be more optimal, more readable and feel more like a parser. Currently, there is this unread
method that I don't see in most parser so if possible I would like to get rid of it, and the two methods readPrerelease()
and readBuild()
feel too complex.
I'm only interested in parsing so my Version
class got cleaned of equals
, hashCode
, compareTo
and toString
methods and I removed the related tests. If required I could re-add them, but to me that is superfluous in this code review request.
For this code review, I would like to:
- Make the code more readable
- Let the code go forward and avoid going backwards (remove
unread
and all thosecurrentPosition - 1
), unless necessary. - Avoid having so many booleans in the methods
readPreRelease()
andreadBuild()
. - Write general remarks about the code if any.
My code provides three classes:
- the
VersionParser
which do the actual parsing. - the
Version
class, which was dumbed down because I don't want that to be code-reviewed, it's rather easy but the goal here is for the parser. The parsing entry point is here, through thevalueOf
method. - the testing class I used to make sure my parsing is correct.
Below those classes, you can see the BNF grammar for reference.
Please note that I removed comments, as I want my code to be self-explanatory, so if it's unclear, that something that should be factored in the review.
import java.util.ArrayList;
import java.util.List;
import static java.util.Objects.requireNonNull;
final class VersionParser {
private final String source;
private int currentPosition = 0;
VersionParser(String source) {
this.source = requireNonNull(source);
}
Version parse() {
var major = readNumericIdentifier();
consume('.');
var minor = readNumericIdentifier();
consume('.');
var patch = readNumericIdentifier();
var preRelease = List.<String>of();
if (peek() == '-') {
consume('-');
preRelease = readPreReleases();
}
var build = List.<String>of();
if (peek() == '+') {
consume('+');
build = readBuilds();
}
check(isAtEnd(), "Unexpected characters in \"%s\" at %d", source, currentPosition - 1);
return new Version(major, minor, patch, preRelease, build);
}
private boolean isDigit(int c) {
return '0' <= c && c <= '9';
}
private boolean isAlpha(int c) {
return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
}
private boolean isNonDigit(int c) {
return isAlpha(c) || c == '-';
}
private boolean isAtEnd() {
return currentPosition >= source.length();
}
private int advance() {
var c = source.charAt(currentPosition);
currentPosition++;
return c;
}
private int peek() {
if (isAtEnd()) {
return -1;
}
return source.charAt(currentPosition);
}
private void unread() {
currentPosition--;
}
private void check(boolean expression, String messageFormat, Object... arguments) {
if (!expression) {
var message = String.format(messageFormat, arguments);
throw new IllegalArgumentException(message);
}
}
private void consume(char expected) {
check(!isAtEnd(), "Early end in \"%s\"", source);
var c = advance();
check(c == expected, "Expected %c, got %c in \"%s\" at position %d", expected, c, source, currentPosition - 1);
}
private int readNumericIdentifier() {
check(!isAtEnd(), "Early end in \"%s\"", source);
var start = currentPosition;
var c = advance();
check(isDigit(c), "Expected a digit, got %c in \"%s\" at position %d", c, source, currentPosition - 1);
if (c == '0') {
return 0;
}
while (!isAtEnd()) {
c = advance();
if (!isDigit(c)) {
unread();
break;
}
}
var string = source.substring(start, currentPosition);
return Integer.parseInt(string);
}
private List<String> readPreReleases() {
var preReleases = new ArrayList<String>();
preReleases.add(readPreRelease());
while (true) {
if (peek() != '.') {
return preReleases;
}
consume('.');
preReleases.add(readPreRelease());
}
}
/*
* Basically, should be a valid number (without leading 0, unless for 0) or should contain at least one letter or dash.
*/
private String readPreRelease() {
var start = currentPosition;
var isAllDigit = true;
var startsWithZero = false;
var isEmpty = true;
while (!isAtEnd()) {
var c = advance();
var isDigit = isDigit(c);
var isNonDigit = isNonDigit(c);
if (!isDigit && !isNonDigit) {
unread();
break;
}
if (isEmpty && c == '0') {
startsWithZero = true;
}
isEmpty = false;
isAllDigit &= isDigit;
}
check(!isEmpty, "Empty preRelease part in \"%s\" at %d", source, currentPosition - 1);
var length = currentPosition - start;
var doesNotStartWithZero = !isAllDigit || !startsWithZero || length == 1;
check(doesNotStartWithZero, "Numbers may not start with 0 except 0 in \"%s\" at position %d", source, start);
return source.substring(start, currentPosition);
}
private List<String> readBuilds() {
var builds = new ArrayList<String>();
builds.add(readBuild());
while (true) {
if (peek() != '.') {
return builds;
}
consume('.');
builds.add(readBuild());
}
}
private String readBuild() {
var start = currentPosition;
var isEmpty = true;
while (!isAtEnd()) {
var c = advance();
var isDigit = isDigit(c);
var isNonDigit = isNonDigit(c);
if (!isDigit && !isNonDigit) {
unread();
break;
}
isEmpty = false;
}
check(!isEmpty, "Empty build part in \"%s\" at %d", source, currentPosition - 1);
return source.substring(start, currentPosition);
}
}
The Version
class that hides the parser.
import java.util.List;
public final class Version {
public static Version valueOf(String s) {
return new VersionParser(s).parse();
}
private final int major;
private final int minor;
private final int patch;
private final List<String> preRelease;
private final List<String> build;
Version(int major, int minor, int patch, List<String> preRelease, List<String> build) {
this.major = major;
this.minor = minor;
this.patch = patch;
this.preRelease = List.copyOf(preRelease);
this.build = List.copyOf(build);
}
// getters, equals, hashCode, toString, compareTo (+ implement Comparable)
}
The test class to make sure the parsing works. Requires Junit and AssertJ.
import org.junit.jupiter.params.*;
import org.junit.jupiter.params.provider.*;
import java.util.*;
import static java.util.stream.Collectors.toList;
import static org.assertj.core.api.Assertions.*;
class VersionTest {
@ParameterizedTest
@MethodSource("provideCorrectVersions")
void testVersion_correct(String correctVersion) {
assertThat(be.imgn.common.base.Version.valueOf(correctVersion))
.isNotNull();
}
private static List<Arguments> provideCorrectVersions() {
var versions = new String[] {
"0.0.4", "1.2.3", "10.20.30", "1.1.2-prerelease+meta", "1.1.2+meta",
"1.1.2+meta-valid", "1.0.0-alpha", "1.0.0-beta", "1.0.0-alpha.beta",
"1.0.0-alpha.beta.1", "1.0.0-alpha.1", "1.0.0-alpha0.valid",
"1.0.0-alpha.0valid",
"1.0.0-alpha-a.b-c-somethinglong+build.1-aef.1-its-okay",
"1.0.0-rc.1+build.1", "2.0.0-rc.1+build.123", "1.2.3-beta",
"10.2.3-DEV-SNAPSHOT", "1.2.3-SNAPSHOT-123", "1.0.0", "2.0.0", "1.1.7",
"2.0.0+build.1848", "2.0.1-alpha.1227", "1.0.0-alpha+beta",
"1.2.3----RC-SNAPSHOT.12.9.1--.12+788",
"1.2.3----R-S.12.9.1--.12+meta", "1.2.3----RC-SNAPSHOT.12.9.1--.12",
"1.0.0+0.build.1-rc.10000aaa-kk-0.1",
"999999999.999999999.999999999", "1.0.0-0A.is.legal"
};
return Arrays.stream(versions)
.map(Arguments::of)
.collect(toList());
}
@ParameterizedTest
@MethodSource("provideIncorrectVersions")
void testVersion_incorrect(String incorrectVersion) {
assertThatThrownBy(() -> Version.valueOf(incorrectVersion))
.isInstanceOf(IllegalArgumentException.class)
.hasNoSuppressedExceptions();
}
private static List<Arguments> provideIncorrectVersions() {
var versions = new String[] {
"1", "1.2", "1.2.3-0123", "1.2.3-0123.0123", "1.1.2+.123", "1.2.3+",
"+invalid", "-invalid", "-invalid+invalid", "-invalid.01", "alpha",
"alpha.beta", "alpha.beta.1", "alpha.1", "alpha+beta", "alpha_beta",
"alpha.", "alpha..", "beta", "1.0.0-alpha_beta", "-alpha.",
"1.0.0-alpha..", "1.0.0-alpha..1", "1.0.0-alpha...1",
"1.0.0-alpha....1", "1.0.0-alpha.....1", "1.0.0-alpha......1",
"1.0.0-alpha.......1", "01.1.1", "1.01.1", "1.1.01", "1.2",
"1.2.3.DEV", "1.2-SNAPSHOT",
"1.2.31.2.3----RC-SNAPSHOT.12.09.1--..12+788", "1.2-RC-SNAPSHOT",
"-1.0.3-gamma+b7718", "+justmeta", "9.8.7+meta+meta",
"9.8.7-whatever+meta+meta",
"999999999999999999.999999999999999999.999999999999999999",
"999999999.999999999.999999999----RC-SNAPSHOT.12.09.1-------------..12"
};
return Arrays.stream(versions)
.map(Arguments::of)
.collect(toList());
}
}
The Backus–Naur Form grammar, as taken from the semver.org website.
<valid semver> ::= <version core>
| <version core> "-" <pre-release>
| <version core> "+" <build>
| <version core> "-" <pre-release> "+" <build>
<version core> ::= <major> "." <minor> "." <patch>
<major> ::= <numeric identifier>
<minor> ::= <numeric identifier>
<patch> ::= <numeric identifier>
<pre-release> ::= <dot-separated pre-release identifiers>
<dot-separated pre-release identifiers> ::= <pre-release identifier>
| <pre-release identifier> "." <dot-separated pre-release identifiers>
<build> ::= <dot-separated build identifiers>
<dot-separated build identifiers> ::= <build identifier>
| <build identifier> "." <dot-separated build identifiers>
<pre-release identifier> ::= <alphanumeric identifier>
| <numeric identifier>
<build identifier> ::= <alphanumeric identifier>
| <digits>
<alphanumeric identifier> ::= <non-digit>
| <non-digit> <identifier characters>
| <identifier characters> <non-digit>
| <identifier characters> <non-digit> <identifier characters>
<numeric identifier> ::= "0"
| <positive digit>
| <positive digit> <digits>
<identifier characters> ::= <identifier character>
| <identifier character> <identifier characters>
<identifier character> ::= <digit>
| <non-digit>
<non-digit> ::= <letter>
| "-"
<digits> ::= <digit>
| <digit> <digits>
<digit> ::= "0"
| <positive digit>
<positive digit> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<letter> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J"
| "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T"
| "U" | "V" | "W" | "X" | "Y" | "Z" | "a" | "b" | "c" | "d"
| "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n"
| "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x"
| "y" | "z"
````
1 Answer 1
On the whole, the code was easy to read, and the tests were concise (apart from the fully qualified be.imgn.common.base.Version.valueOf
call. Here's a few things to think about.
Object Lifetime
The VersionParser
class takes in a String
and then provides a parse
method which actually does the parsing work. However, this method can only ever be called once. If it's called more than once, then it fails, because the source string has already been parsed and the method assumes that it's only called on a newly constructed parser. This feels wrong. It's relying on the clients knowing too much about how the class works. A better approach might have been to make the class constructor private and have a static parse
method for the interface, which spun up the data, if required, and performed the parse. Alternately, parse
could simply reset processed back to the beginning of the source data an reprocess it, or even return a cached version...
Circular Dependency
Circular dependencies as a general rule are bad. They have a tendency to result in tightly coupled code that bites you, just as you decide you want do reuse a bit of the code somewhere else. As it stands, you've got a circular dependency. Your Version
calls VersionParser
which then creates a Version
. To me, it seems like a VersionParser
might need to know about a Version
, in order to construct it, but a Version
shouldn't really need to know about a VersionParser
.
If it's !isDigit && !isNonDigit
what is it?
One of your goals is self explanatory code. If found this line less than obvious, my instinct was non-digit is the same as not-a-digit, which is everything that isn't a digit... However that's clearly not the case. A NonDigit, appears to be an alpha, or '-'. A better name might help, however you only actually seem to use it in this check. Maybe a method isValidVersionCharacter
which evaluated digits and 'non digits', would be clearer...
Check
There's quite a lot of !
in your code. For me, this made some of the calls to check
awkward to process.
check(!isAtEnd(), "Early end in \"%s\"", source);
I'm not sure if it's that check
sounds a bit like if
, so I'm expecting it to perform the print/throw action if the condition is true, or if it's that "Check not is at end" sounds awkward. verify
would work better for me I think, because it's closer to assert, so I'm of a mindset that it's expecting the condition to be true, or it will perform the print/throw action (i.e. the opposite of the if
processing). I suspect this is very subjective, but possibly something to consider.
Since I'm thinking about check
, its first parameter is boolean expression
. This is a bit misleading. It doesn't take an expression, it takes in a boolean value that if it isn't true will result in an exception. A better name for the parameter may help with some of my previous misunderstandings.
To get a feel for possible approaches for the negatives, I went through the code and noticed several possible small refactorings, with a goal of making small improvements.
- Replaced your
while
constructs withdo..while
, the operation is always performed once. isAtEnd
seemed to result in a lot of!isAtEnd
, so I inverted it toisMoreToProcess
- I think you’re missing three invalid test cases, so I added them: "01.0.4", "0.01.4", "0.0.04"
unread
andpeek
seem to be doing the same thing in different ways. I removedunread
to make the approach more straightforward to follow- Since I used
peek
, which checks if we’re at the end as part of it, didn’t need to check if we have reached the end during loops - The isEmpty variables make your iterations busier than they need to be, so I extracted them from the loop
doesNotStartWithZero
had a lot of negatives, so I inverted it tonumberWithZeroPrefix
- I removed the construction requirement for the parser, so that
parse
takes the information it’s expected to parse. This makes callingparse
more consistent (you get the same behaviour if you call it once or 5 times)
The resulting code:
final class VersionParser {
private String source;
private int currentPosition = 0;
Version parse(String source) {
this.source = requireNonNull(source);
var major = readNumericIdentifier();
consume('.');
var minor = readNumericIdentifier();
consume('.');
var patch = readNumericIdentifier();
var preRelease = List.<String>of();
if (peek() == '-') {
consume('-');
preRelease = readPreReleases();
}
var build = List.<String>of();
if (peek() == '+') {
consume('+');
build = readBuilds();
}
check(!isMoreToProcess(), "Unexpected characters in \"%s\" at %d", source, currentPosition - 1);
return new Version(major, minor, patch, preRelease, build);
}
private boolean isDigit(int c) {
return '0' <= c && c <= '9';
}
private boolean isAlpha(int c) {
return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
}
private boolean isNonDigit(int c) {
return isAlpha(c) || c == '-';
}
private boolean isMoreToProcess() {
return !(currentPosition >= source.length());
}
private int advance() {
var c = source.charAt(currentPosition);
currentPosition++;
return c;
}
private int peek() {
if (!isMoreToProcess()) {
return -1;
}
return source.charAt(currentPosition);
}
private void check(boolean expression, String messageFormat, Object... arguments) {
if (!expression) {
var message = String.format(messageFormat, arguments);
throw new IllegalArgumentException(message);
}
}
private void consume(char expected) {
check(isMoreToProcess(), "Early end in \"%s\"", source);
var c = advance();
check(c == expected, "Expected %c, got %c in \"%s\" at position %d", expected, c, source, currentPosition - 1);
}
private boolean isDigitNext() {
return isDigit(peek());
}
private boolean isValidIdentifierCharacterNext() {
var nextCharacter = peek();
return isDigit(nextCharacter) || isNonDigit(nextCharacter);
}
private int readNumericIdentifier() {
check(isMoreToProcess(), "Early end in \"%s\"", source);
var start = currentPosition;
var c = advance();
check(isDigit(c), "Expected a digit, got %c in \"%s\" at position %d", c, source, currentPosition - 1);
if (c == '0') {
return 0;
}
while (isDigitNext()) {
advance();
}
var string = source.substring(start, currentPosition);
return Integer.parseInt(string);
}
private List<String> readPreReleases() {
var preReleases = new ArrayList<String>();
do {
preReleases.add(readPreRelease());
if (peek() != '.') {
return preReleases;
}
consume('.');
} while(true);
}
/*
* Basically, should be a valid number (without leading 0, unless for 0) or should contain at least one letter or dash.
*/
private String readPreRelease() {
var start = currentPosition;
var isAllDigit = true;
check(isValidIdentifierCharacterNext(), "Empty preRelease part in \"%s\" at %d", source, currentPosition - 1);
boolean startsWithZero = peek() == '0';
while (isValidIdentifierCharacterNext()) {
var c = advance();
isAllDigit &= isDigit(c);
}
var length = currentPosition - start;
var numberWithZeroPrefix = isAllDigit && startsWithZero && length != 1;
check(!numberWithZeroPrefix, "Numbers may not start with 0 except 0 in \"%s\" at position %d", source, start);
return source.substring(start, currentPosition);
}
private List<String> readBuilds() {
var builds = new ArrayList<String>();
do {
builds.add(readBuild());
if (peek() != '.') {
return builds;
}
consume('.');
}
while(true);
}
private String readBuild() {
var start = currentPosition;
check(isValidIdentifierCharacterNext(), "Empty build part in \"%s\" at %d", source, currentPosition - 1);
while (isValidIdentifierCharacterNext()) {
advance();
}
return source.substring(start, currentPosition);
}
}
-
\$\begingroup\$ Thank you for this analysis. You have only valid points. Regarding the circular dependency, would it help if I move the Parser as a nested class of Version? Or should they really be kept separated? The digit/non-digit duality is from the spec. I tried to keep the names as much as possible, but actually, there is a "wrapper" in the spec for that:
<identifier character>
so I'll use that one. The quantity of!
is indeed one driver for this code review request, that's some of it that made me think my code isn't really a fully readable parser. Thecheck
idea comes from Guava, but I'll fix it. \$\endgroup\$Olivier Grégoire– Olivier Grégoire2021年03月24日 15:16:39 +00:00Commented Mar 24, 2021 at 15:16 -
\$\begingroup\$ I disagree with the circular dependency assertion. In general, yes, they should be avoided, in this case, the impact is completely neglectable as
Version
does not depend on implementation details ofVersionParser
at all. Additionally, it leads to easier to use code as one can doVersion version = Version.parse(value)
instead ofVersion version = new VersionParser(value).parse()
orVersion version = VersionParser.parse(value)
. That there isVersionParser
seems like something that must not necessarily be known to the user ofVersion
. \$\endgroup\$Bobby– Bobby2021年03月24日 16:13:56 +00:00Commented Mar 24, 2021 at 16:13 -
\$\begingroup\$ @Bobby If
Version.valueOf
calledVersionParser.parse
then I think you'd have a stronger case. In its current form though,Version
both constructs and callsparse
. To me this is a dependency to be aware of. Adding aJSON
parser for constructing aVersion
would currently require changing logic in theVersion
class. As you've said, it may not be a problem in this case, however to me it's still a circle. \$\endgroup\$forsvarir– forsvarir2021年03月24日 17:29:39 +00:00Commented Mar 24, 2021 at 17:29 -
\$\begingroup\$ @OlivierGrégoire making the parser a nested class would signal your intent more strongly that the two classes are intertwined (the
Version
constructor could beprivate
). You could argue you already do this through package, rather thanpublic
visibility, or that the current dependency isn’t a problem because you have more knowledge of the domain. That’s why it’s a consideration, rather than a concrete rule. \$\endgroup\$forsvarir– forsvarir2021年03月24日 17:30:15 +00:00Commented Mar 24, 2021 at 17:30 -
1\$\begingroup\$ @OlivierGrégoire I've added some additional refactoring suggestions. \$\endgroup\$forsvarir– forsvarir2021年03月24日 21:32:59 +00:00Commented Mar 24, 2021 at 21:32