7
\$\begingroup\$

I was searching for C# comparers that can be used for natural sorting ("a9" comes before "a11"), but most solutions have a lot of allocations, quite unreadable code or are not covering all scenarios.

So I decided to make my own:

public sealed class JacobusAlphanumericComparer : IComparer<string>
{
 private readonly CultureInfo _culture;
 /// <param name="culture">The culture to use for comparing single characters, defaults to <see cref="CultureInfo.InvariantCulture"/>.</param>
 public JacobusAlphanumericComparer(CultureInfo? culture = null)
 {
 _culture = culture ?? CultureInfo.InvariantCulture;
 }
 public int Compare(string? x, string? y)
 {
 // Nulls are always seen as smaller/lower than actual strings.
 if (x is null) return y is null ? 0 : -1;
 else if (y is null) return 1;
 int index = 0;
 // While at least 1 of the strings still contain a character...
 while (index < x.Length || index < y.Length)
 {
 bool xEnded = index >= x.Length;
 bool yEnded = index >= y.Length;
 if (xEnded || yEnded)
 {
 // One of the strings has ended, the longest string will be considered the biggest
 // (the while condition already excludes equal length scenarios).
 // TODO: Not sure which one would be more correct in this scenario.
 // return x.Length > y.Length ? 1 : -1;
 return x.Length - y.Length;
 }
 // If we get here, both of the strings contain a character at the index.
 bool xIsNumber = char.IsDigit(x[index]);
 bool yIsNumber = char.IsDigit(y[index]);
 if (xIsNumber && yIsNumber)
 {
 int numberStartIndex = index;
 // Keeping looping while there are digits.
 do
 {
 index++;
 }
 while (index < x.Length && index < y.Length && char.IsDigit(x[index]) && char.IsDigit(y[index]));
 bool xNumberEnded = index >= x.Length || !char.IsDigit(x[index]);
 bool yNumberEnded = index >= y.Length || !char.IsDigit(y[index]);
 if (xNumberEnded && yNumberEnded)
 {
 // Both numbers ended at the same index.
 int numberEndIndex = index - 1;
 int numberComparison = CompareNumbers(x, y, numberStartIndex, numberEndIndex);
 if (numberComparison != 0) return numberComparison;
 // If we get here, the numbers were equal and we need to continue looking for differences.
 // We also don't need to increase the index, because the while loop already did that.
 continue;
 }
 else if (xNumberEnded)
 {
 // We immediately know y is bigger, because it contains more digits.
 return -1;
 }
 else if (yNumberEnded)
 {
 // We immediately know x is bigger, because it contains more digits.
 return 1;
 }
 }
 else
 {
 // Compare characters normally.
 int charComparison = CompareCharacters(x, y, index);
 if (charComparison != 0) return charComparison;
 }
 index++;
 }
 // Looped through the entire string without finding differences.
 return 0;
 }
 /// <param name="c">A char containing any digit 0-9.</param>
 private static int ParseDigitCharToInt(char c)
 {
 return c - '0';
 }
 private int CompareCharacters(string x, string y, int index)
 {
 // ToUpperInvariant would be slighly faster (2ns vs 6ns according to benchmarks),
 // but not worth sacrificing the flexibility of supplying your preferred culture.
 char upperCharX = char.ToUpper(x[index], _culture);
 char upperCharY = char.ToUpper(y[index], _culture);
 return upperCharX.CompareTo(upperCharY);
 }
 private static int CompareNumbers(string x, string y, int numberStartIndex, int numberEndIndex)
 {
 int xNumber;
 int yNumber;
 if (numberStartIndex == numberEndIndex)
 {
 // Optimized for single digit comparison.
 xNumber = ParseDigitCharToInt(x[numberStartIndex]);
 yNumber = ParseDigitCharToInt(y[numberStartIndex]);
 }
 else
 {
 #if NET5_0_OR_GREATER
 ReadOnlySpan<char> numbersSpanX = x.AsSpan(numberStartIndex..numberEndIndex);
 ReadOnlySpan<char> numbersSpanY = y.AsSpan(numberStartIndex..numberEndIndex);
 xNumber = int.Parse(numbersSpanX);
 yNumber = int.Parse(numbersSpanY);
 #else
 ReadOnlySpan<char> numbersSpanX = x.AsSpan(numberStartIndex, numberEndIndex - numberStartIndex);
 ReadOnlySpan<char> numbersSpanY = y.AsSpan(numberStartIndex, numberEndIndex - numberStartIndex);
 xNumber = int.Parse(numbersSpanX.ToString());
 yNumber = int.Parse(numbersSpanY.ToString());
 #endif
 }
 return xNumber.CompareTo(yNumber);
 }
}

I've created some benchmarks, and my solution seems faster than anything I've found so far (only https://github.com/drewnoakes/natural-sort is faster, but that one is not covering all scenarios).

Have I missed something? Do you see anymore potential performance or readability improvements?

asked Jan 15 at 15:22
\$\endgroup\$
4
  • 2
    \$\begingroup\$ Are you sure you need to parse consecutive digits as integers at all? Seems to me once you’ve established that both strings have an equally long run of digits, those can just be compared non- numerically character by character. \$\endgroup\$ Commented Jan 16 at 11:02
  • \$\begingroup\$ @Seb Yes you are right, that will be in the next version I will post soon. \$\endgroup\$ Commented Jan 17 at 8:20
  • \$\begingroup\$ What's the problem with "lots of allocations"? The .NET runtime is specially optimized around many small, short-lived objects. Have you found a particular scenario where it matters? \$\endgroup\$ Commented Jun 18 at 16:31
  • \$\begingroup\$ @Alejandro More allocations cause more garbage collections, which cause performance loss as the garbage collector requires resources to clean up the mess. I didn't really have a scenario where performance was critical. So this is was just for fun and curiosity. \$\endgroup\$ Commented Jun 25 at 9:00

6 Answers 6

11
\$\begingroup\$

"Allocation free"

"most solutions have a lot of allocations"
While I don't claim any knowledge of C#, the appearance of .ToString() suggests there are hidden (short lived) allocations happening in this code.
I'll gladly delete this point if I'm mistaken in my assumption.


How much to re-invent

Beginners writing in C quickly learn to use (and trust) the Standard Library function qsort(), writing, as the OP has done, built-for-purpose compare() functions.

To serve user's needs, any compare() function (with its predefined parameters) is often accompanied by a wrapper that inverts to returning value. This allows calling code (eg: qsort()) to sort in either ascending or descending sequence.

Suggestion: Add a trivial public function to your class (CompareReverse()) that provides descending ordering to a generic sorting mill...


Flying too high

There's a tacit presumption that contiguous digits will convert to a value that can be represented in an int variable, not even an unsigned int capable of twice the magnitude.

For example, a credit card 'number' requires a 'high capacity' variable to store its 'integer' value.
It's not difficult to imagine even larger strings of digits being used by users.

The presumption of small numbers may need a re-think.


Don't mix apples with oranges

Consider this comment

 // If we get here, the numbers were equal and we need to continue looking for differences.

The OP's candidate data is, in fact, a string-ised data structure.

typedef struct {
 char str[ 10 ]; // string of 1-10 characters
 char str[ 10 ]; // string of 1-10 characters
 char str[ 10 ]; // string of 1-10 characters
 ...
};

Any number of segments each of which could be any length, and alpha segments are left justified while numeric segments are left padded with zeros. (Used 10 as a maximum segment length for this example.)

The code would be much easier to follow (and to maintain! (imo)) if it were broken into two independent 'service functions' to compare: 1) two segments full of alpha, and 2) two segments full of digits. The controlling function would be responsible for dissecting each segment, and ensuring alpha vs alpha and digits vs digits. As presented, these two different requirements seem to be hazardously interwoven.


Swapping priorities

In the following, alpha means any non-digit character.

The alpha vs alpha comparison is no challenge, and the alpha vs digit is also rather easy to evaluate. The digits vs digits presents difficulties because each digit's applicable "power of 10" is unknown.

The OP's code prioritises the length of each substring of digits, then goes off to try to compare their values as integers.

Consider these scenarios:

"abc456444qrx"
"abc456444qry" // equal digits. Keep going until reaching 'x' vs 'y'
 ^--
"abc456444qrx"
"abc457444qry" // greater is TENTATIVE at 6 vs 7, confirmed at end of digits
 ^> ^!!!
"abc4569999qrx"
"abc457444qry" // greater is TENTATIVE at 6 vs 7, but conceded at 9 vs 'q'
 ^> ^!!! // assumption was wrong. Now we know!

The key idea being "which string becomes tentatively greater first while sequentially comparing paired digits?" While traversing remaining digits do not compare values, but be on the lookout for end of string or encountering an alpha character.

Make up a table listing the permutations of the conditions that could arise, and, for each entry, determine whether to return a relationship value (-1, 0, 1), or to continue scanning. This can be achieved with only a single index variable, but needs to be coded as simply as possible to begin with. Postpone combining operations to reduce lines of code. Get it working with a large number of outrageous test cases as the first priority. (Hint: comparing "HHGTTG" with "42" is decided on the very first character.)


Avoid slippery slopes

After reading the answer from @J_H, I've suddenly become aware of another scenario that could cause grief: Decimal Fractions.

From a numeric stand point, the value "foo5.8" is greater than "foo5.75". However, the latter's extra digit(s) to the right of the decimal point ARE already in their applicable power-of-ten sequence. That the user's data uses varying degrees of precision means simple "the longer sequence of digits represents the larger value" will not provide sorting into natural order.

(It's always the simplest of projects that turn out to have unforeseen complexities.)

answered Jan 16 at 1:52
\$\endgroup\$
3
  • 1
    \$\begingroup\$ @Fe203 About your paragraph "How much to re-invent": In C# we often order with linq, in this case you would call either myList.Order(new JacobusAlphanumericComparer()) or call myList.OrderDescending(new JacobusAlphanumericComparer()). But if you want to keep the same array reference by using Array.Sort(), then yes, you would have to create a reverse comparer. \$\endgroup\$ Commented Jan 16 at 12:56
  • 2
    \$\begingroup\$ In C# int and uint are 32-bit numbers with 9 - 10 digits. We also have 64-bit long and ulong with 19 - 20 digits. A decimal has 28 - 29 digits precision. \$\endgroup\$ Commented Jan 16 at 14:17
  • 1
    \$\begingroup\$ @OlivierJacot-Descombes Thank you. The point is to avoid (if possible) an arbitrary ceiling, and, otherwise, to acknowledge any recognised "restrictions" used in the code... The OP's use-case may be satisfied with "up to 6 digits, max!!" Just noting a caveat should be present as a comment... \$\endgroup\$ Commented Jan 16 at 16:41
8
\$\begingroup\$

Code the way you think about problem and solution.
Document your code to at least the point where it doesn't need to be reverse engineered to find out how to use it - does Compare() not need a description for reassurance? (I don't know about C# and "documentation inheritance". But I think digit string interpretation to deserve a description, anyway - starting with leading zeroes, sign before a digit, letter case...
While private, CompareNumbers() could document numberEndIndex to be inclusive. (or ex? guess "the NET5-#else" confuses me))

Come to think of it, handling digit sequences and case is independent:
consider parametrising letter comparison.

I would not think compare index to string length "twice in a row". (same for ?NumberEnded.)
Come to think of it, if I kept whether I swapped arguments, I don't need to check with both strings at all. But overall, you seem prioritise readability over microoptimisation. Good. Maybe drop the special case in CompareNumbers().

I see how introducing xIsNumber to yNumberEnded is self-documenting.

The condition of "the digit loop" repeats that of the enclosing if - a code smell.

The character loop should use for - caveat: digit string handling as presented makes this choice less clear.

I think it would be simpler to start special casing digit strings when a character difference has been detected:
If one is a digit and the other is not, check if both preceding characters are digits - if yes, digit at current index is greater.
Otherwise establish which digit is greater and check (remaining) lengths: you never need to convert into a number representation of worrisome range.

answered Jan 15 at 18:16
\$\endgroup\$
1
  • \$\begingroup\$ Tried to code "non-converting digit string handling": leading zeroes make that interesting - if prepending "0" doesn't change order, further processing needs separate string indices. \$\endgroup\$ Commented Jan 16 at 8:05
4
\$\begingroup\$

Have I missed something?

write down the spec

After seeing the OP and the github ReadMe, I didn't notice anything that formally describes what "less than" means in this context.

unary plus / minus

These three characters might conceivably fit into CompareNumbers(), though they are not digits.

  • 0x2b +
  • 0x2d -
  • 0x2e .

(I predict that when you do write down a spec, representing "a thousand" as 1e3 will be very very far out of scope.)

I feel that decimal behavior we get "for free" -- there's little difference between a3Z4 and a3.4, so we can probably exclude that one. But it troubles me that ASCII 0x2b < 0x2d, as with a+3 versus a-3, given that many mathematicians would assert \$-3 ~ < +3\$.

Yikes! @Fe2O3 points out that 0x2C , also matters, in Europe. Though again I appeal to there being little difference between a3Z4 and a3,4.

leading zeros

It's not obvious to me which of these is smallest.

  • agent-007
  • agent-07

That's something that would be cleared up by a written spec. (We might also throw agent+007 and agent+07 into the mix.)

unicode codepoints

You might possibly wish to limit valid input to just 7-bit ASCII. There's a vast number of ways to express the numeric concept of "two" in modern Unicode, and char.IsDigit() is familiar with many of them. Maybe throw an exception upon encountering an invalid input character?

Alternatively, you might prefer to write your own IsDigit() helper which recognizes exactly ten digits.

unwarranted assumption

 /// <param name="c">A char containing any digit 0-9.</param>
 private static int ParseDigitCharToInt(char c)
 {
 return c - '0';
 }

The "0-9" comment is great, thank you.

But the caller's guard of char.IsDigit() admits glyphs from several codepoint ranges beyond Latin, including

  • Arabic
  • Superscript
  • Mathematical Alphanumeric Symbols
  • Arabic-Indic
  • Mathematical Double-Struck
  • Roman Numerals
  • Coptic
  • Devanagari
  • Urdu
  • Persian
  • Sindhi
  • Thai
  • Tamil
  • Gurmukhi
  • Hangul
answered Jan 16 at 20:19
\$\endgroup\$
1
  • 3
    \$\begingroup\$ If we're going international, can't forget the inverse meanings of '.' and ',' between N.Am and Europe (at least)... :-) \$\endgroup\$ Commented Jan 16 at 20:31
2
\$\begingroup\$

Allocation free? You could use 1 single new allocation simply for reversing the string for checking if the reverse is equal to the first string. It appears that if the lengths of the strings aren't equal it should return -1? Here is a code sample you could perhaps build on to get the result your looking for.

public static class Test
{
 private static bool IsValid(string value)
 {
 if (string.IsNullOrEmpty(value) ||
 string.IsNullOrWhiteSpace(value)) return false;
 foreach (char c in value)
 {
 if (!char.IsDigit(c)) return false;
 }
 return true;
 }
 private static bool IsComparableForward(string x, string y)
 {
 if (!IsValid(x) || !IsValid(y)) return false;
 if (x.Length != y.Length) return false;
 int length = x.Length + y.Length >> 1;
 for (int i = 0; i < length; i++)
 {
 char a = x[i];
 char b = y[i];
 if (a != b) return false;
 }
 return true;
 }
 private static bool IsComparableBackward(string x, string y)
 {
 if (!IsValid(x) || !IsValid(y)) return false;
 if (x.Length != y.Length) return false;
 //the only time a new string allocation 
 //will be done is for reversing y
 char[] temp = y.ToCharArray();
 Array.Reverse(temp);
 string y2 = new(temp);
 return IsComparableForward(x, y2);
 }
 /// <summary>
 /// Performs a comparison between <paramref 
 /// name="x"/> and <paramref name="y"/>
 /// </summary>
 /// <param name="x"></param>
 /// <param name="y"></param>
 /// <returns>-1 if not equal, 0 if comparable 
 /// forward, or 1 if comparable backward</returns>
 public static int Compare(string x, string y)
 { 
 if (!IsValid(x) || !IsValid(y)) return -1;
 
 return IsComparableForward(x, y) ? 0 : 
 IsComparableBackward(x, y) ? 1 : -1;
 }
}
answered Jan 16 at 17:36
\$\endgroup\$
3
  • \$\begingroup\$ For one you use String.IsNullOrEmpty and the other String.IsNullOrWhiteSpace, that surely should be the same. \$\endgroup\$ Commented Jan 17 at 15:27
  • \$\begingroup\$ No, String.IsNullOrEmpty checks for null & "" and String.IsNullORWhiteSpace checks for null or a string of any size that has nothing but white space for example string s = " " 1 whitespace, " " 6 whitespaces \$\endgroup\$ Commented Jan 17 at 17:16
  • 2
    \$\begingroup\$ @CharlesHenington Tip: use 'back-ticks' to present data... " " This keeps the formatter from squishing-down adjacent whitespace... \$\endgroup\$ Commented Jan 17 at 20:44
2
\$\begingroup\$

Based on all the feedback in the other answers, I've created a new version of the comparer.

Things I have decided not to support:

  • Scientific number representations like 1e3.
  • Unary plus/minus, meaning differing between positive and negative numbers is not supported.
  • Decimal fractions.

Changes:

  • More comments/documentation to clarify how the comparer and its private methods work.
  • The implementation no longer contains allocations for .NET versions lower than .NET 5.
  • Support for numbers greater than the C# int maximum value (2147483647). This change might also boost performance as it never has to convert characters to an actual number (like int or uint).
  • Leading zeroes are now correctly ignored.
  • No longer repeating if statements/conditions in while/for loop that are also conditions of the loop itself.
  • Made character loop a for loop instead of while.
  • Performance improvement for very similar strings by only using expensive ToUpper methods if the characters are not equal.

/// <summary>
/// Compares strings naturally/alphanumerically.
/// See <see href="https://en.wikipedia.org/wiki/Natural_sort_order"/> for more details.
/// </summary>
/// <remarks>
/// Does not support decimal fractions, only full numbers are compared.<br/>
/// Does not support negative numbers, so '-' and '+' will be compared just like any other non-digit character.<br/>
/// Does not support comparing scientific number representations like 1e3.
/// </remarks>
/// <example>
/// Compare("a", "a") == 0
/// Compare("a", "b") == -1
/// Compare("b", "a") == 1
/// Compare("a", "1") == 1
/// Compare("11", "4") == 1
/// Compare("11z", "2zzz") == 1
/// Compare("11zy", "11z") == 1
/// Compare("aa8aa", "aa7aa") == 1
/// Compare("b", "A") == 1
/// Compare("011", "10") == 1
/// Compare("10", "010") == 1 (this one was an arbitrary choice, because Windows also orders its folders that way).
/// Compare("21.49", "21.5") = -1 (no support for decimal fractions on purpose, because that would cause other issues with for example comparing semantic version numbers https://semver.org/)
/// </example>
/// <returns>
/// <b>Exactly zero</b> if equal.<br/>
/// <b>Greater than zero</b> if x is greater than y.<br/>
/// <b>Less than zero</b> if x is less than y.
/// </returns>
public sealed class JacobusAlphanumericComparer : IComparer<string>
{
 private readonly CultureInfo _culture;
 /// <param name="culture">The culture to use for comparing single characters, defaults to <see cref="CultureInfo.InvariantCulture"/>.</param>
 public JacobusAlphanumericComparer(CultureInfo culture = null)
 {
 _culture = culture ?? CultureInfo.InvariantCulture;
 }
 public int Compare(string x, string y)
 {
 if (x is null) return y is null ? 0 : -1;
 else if (y is null) return 1;
 // Loop while both strings still contain more characters...
 for (int index = 0; index < x.Length && index < y.Length; index++)
 {
 bool xIsNumber = char.IsDigit(x[index]);
 bool yIsNumber = char.IsDigit(y[index]);
 if (xIsNumber && yIsNumber)
 {
 int numberStartIndex = index;
 int xNumberEndIndex = index;
 int yNumberEndIndex = index;
 bool findingXNumberEnd;
 bool findingYNumberEnd;
 // Looping to find the end index of the number in both x and y.
 do
 {
 index++;
 findingXNumberEnd = index < x.Length && char.IsDigit(x[index]);
 if (findingXNumberEnd) xNumberEndIndex = index;
 findingYNumberEnd = index < y.Length && char.IsDigit(y[index]);
 if (findingYNumberEnd) yNumberEndIndex = index;
 }
 while (findingXNumberEnd || findingYNumberEnd);
 // Creating spans that start and end exactly where the numbers are.
 int xNumberLength = xNumberEndIndex - numberStartIndex + 1;
 ReadOnlySpan<char> xNumber = x.AsSpan(numberStartIndex, xNumberLength);
 int yNumberLength = yNumberEndIndex - numberStartIndex + 1;
 ReadOnlySpan<char> yNumber = y.AsSpan(numberStartIndex, yNumberLength);
 int numberComparison = CompareNumbers(xNumber, yNumber);
 bool numbersAreDifferent = numberComparison != 0;
 if (numbersAreDifferent) return numberComparison;
 // If we get here, the numbers were equal and we need to continue looking for differences.
 // Both xNumberEndIndex and yNumberEndIndex are equal here, so we can use any one of them.
 index = xNumberEndIndex;
 continue;
 }
 else
 {
 // Compare characters normally.
 int charComparison = CompareCharacters(x[index], y[index]);
 bool charactersAreDifferent = charComparison != 0;
 if (charactersAreDifferent) return charComparison;
 }
 }
 // If we get here, either:
 // - both strings were equal. For example: Compare("a1a", "a1a") == 0.
 // - or they were equal up until one string ended. For example: Compare("a1a", "a1aa") == -1
 return x.Length - y.Length;
 }
 private int CompareCharacters(char x, char y)
 {
 if (x == y) return 0;
 // ToUpperInvariant would be slightly faster (2ns vs 6ns according to benchmarks),
 // but not worth sacrificing the flexibility of supplying your preferred culture.
 char upperX = char.ToUpper(x, _culture);
 char upperY = char.ToUpper(y, _culture);
 return upperX.CompareTo(upperY);
 }
 /// <summary>
 /// Compares two texts numerically.
 /// </summary>
 /// <remarks>
 /// Ignores leading '0' characters, so CompareNumbers("013", "12") => 1
 /// </remarks>
 /// <example>
 /// CompareNumbers("12", "13") => -1
 /// CompareNumbers("13", "12") => 1
 /// CompareNumbers("12", "12") => 0
 /// CompareNumbers("012", "12") => -1
 /// </example>
 /// <param name="x">A span of characters containing only digits (0-9).</param>
 /// <param name="y">A span of characters containing only digits (0-9).</param>
 /// <returns>
 /// 0 if equal.<br/>
 /// 1 if <paramref name="x"/> is greater than <paramref name="y"/>.<br/>
 /// -1 if <paramref name="y"/> is greater than <paramref name="x"/>.
 /// </returns>
 private static int CompareNumbers(ReadOnlySpan<char> x, ReadOnlySpan<char> y)
 {
 // Trim leading '0' of x and y number, for example: "00123" becomes "123".
 ReadOnlySpan<char> trimmedX = ExcludeLeadingZeroes(x);
 ReadOnlySpan<char> trimmedY = ExcludeLeadingZeroes(y);
 // Return early if a number only contains zeroes: CompareNumbers("1", "0000") == 1.
 bool xContainsOnlyZeroes = trimmedX.Length is 0;
 bool yContainsOnlyZeroes = trimmedY.Length is 0;
 if (xContainsOnlyZeroes) return yContainsOnlyZeroes ? 0 : -1;
 else if (yContainsOnlyZeroes) return 1;
 // After leading zeroes are excluded, a longer number is always greater: CompareNumbers("1000", "999") == 1.
 if (trimmedX.Length > trimmedY.Length) return 1;
 else if (trimmedX.Length < trimmedY.Length) return -1;
 // If we get here, we know there are no leading zeroes and the numbers have the same length.
 // Now it's just a matter of comparing each character normally.
 for (int index = 0; index < trimmedX.Length; index++)
 {
 char xDigit = trimmedX[index];
 char yDigit = trimmedY[index];
 int comparison = xDigit.CompareTo(yDigit);
 if (comparison != 0) return comparison;
 }
 // TODO: Should "010" < "10" like Windows sorts its folders and files?
 // Or is "010" > "10" more logical? I'm clueless here.
 // Comparing original length to make sure "010" < "10".
 return y.Length - x.Length;
 }
 /// <summary>
 /// Shortens the span to exclude leading '0' characters.
 /// </summary>
 /// <example>
 /// ExcludeLeadingZeroes("00100200") => "100200"
 /// </example>
 /// <param name="span">The span to exclude leading zeroes from.</param>
 /// <returns>A span without leading zeroes.</returns>
 private static ReadOnlySpan<char> ExcludeLeadingZeroes(ReadOnlySpan<char> span)
 {
 int startIndex = 0;
 for (int i = 0; i < span.Length; i++)
 {
 if (span[i] is '0') startIndex++;
 else break;
 }
 // We can just return the same span if it had no leading zeroes.
 if (startIndex is 0) return span;
 // The range operator is only supported on .NET version 5 or greater,
 // in other versions .Slice() must be used.
 #if NET5_0_OR_GREATER
 return span[startIndex..];
 #else
 return span.Slice(startIndex);
 #endif
 }
}
answered Jan 17 at 10:06
\$\endgroup\$
7
  • 3
    \$\begingroup\$ Looking forward to taking a closer look tomorrow. For the present, though, there's a now unnecessary continue; still lurking in the main loop. Cheers! \$\endgroup\$ Commented Jan 17 at 10:33
  • 2
    \$\begingroup\$ That continue is only for readability purposes. However, what is more readable to me might be less readable for others ;) \$\endgroup\$ Commented Jan 17 at 12:57
  • 3
    \$\begingroup\$ I have written such a comparer in the past and I recommend not to handle leading zeros as I had to remove it from my code again. It's mathematically correct but not what the user wants. In any GUI, if there are numbers with and without leading zeros present, then you want to have the ones with leading zeros together with the rest of the group with the same number of digits. I would sort "Foo1", "Foo2", "Foo99", "Foo001", "Foo002", "Foo099", "Foo999", "Foo1000" etc. and not "Foo1", "Foo001", "Foo2", "Foo002", "Foo099", "Foo999", "Foo1000". \$\endgroup\$ Commented Jan 17 at 15:18
  • 1
    \$\begingroup\$ Congrats. Great learning experience!! It's only by climbing the mountain that you appreciate how 'rugged' the journey can be. The initial idea sounds quite simple. It often turns out there are unrecognised hurdles and decisions to be made along the way. The joy of coding is, in part, this improvisation... :-) Cheers! (PS: When asked by supervisors, "How long will this take?", be sure to respond, "How long is a piece of string? We won't know until we have it." All too often, the spec they wrote is full of holes they can't see...) \$\endgroup\$ Commented Jan 17 at 20:31
  • 1
    \$\begingroup\$ If you want a review of the new code, you should create a follow up question. Questions are not open ended discussions. \$\endgroup\$ Commented Jan 18 at 12:51
2
\$\begingroup\$

With strings of digits, equal strings are equal numbers, strings of equal lengths compare the same as ascii or numbers. For digit strings of different lengths with at least one equal digit at the start the longer string is larger.

Perform a letter compare which ends when one or both strings end or two letters are different. If one or both strings ended then the result is the letter compare result. Same if we found two different non-digits. All preceding numbers are the same.

Otherwise we have two letters which are different where at least one is a digit and all information to get the correct result.

If there is one digit only then the string with the extra digit is larger if there is a preceding digit; if there is no preceding digit then the strings are sorted according to letter sorting. If there are two different digits then we need to count the following digits. If the number of following digits is the same then the strings are sorted in letter order, otherwise the string with more digits is the larger one.

answered Jan 18 at 13:58
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.