Custom integer parser optimized for performance

Question 1

I want an integer parser so fast it risks discovering time travel.

I'm stuck in C# for a language, but that doesn't mean I can't try to use c++. So I wrote the following:

The best I could do

public static unsafe bool FastTryParseInt(string input, out int result)
{
 fixed (char* cString = input) {
 unchecked {
 char* nextChar = cString;
 bool isNegative = false;
 if (*nextChar == CharNegative) {
 isNegative = true;
 nextChar++;
 }
 result = 0;
 while (*nextChar >= '0' && *nextChar <= '9')
 result = result * 10 + (*nextChar++ - '0');
 if (*nextChar != Char.MinValue) return false;
 long ptrLen = nextChar - cString;
 if (isNegative) {
 result = -result;
 return ptrLen < 11L || ptrLen <= 11L && result <= 0;
 }
 return ptrLen < 10L || ptrLen <= 10L && result >= 0;
 }
 }
}

Shoutout to @chux who gave me many good suggestions while reviewing our custom Double parser at Custom double parser optimized for performance .

With comments

Here it is with comments, because I'm not a monster (at least not that kind of monster):

// High performance integer parser with rudimentary flexibility.
// Supports leading negatives, but no white-space or other non-numeric characters.
public static unsafe bool FastTryParseInt(string input, out int result)
{
 fixed (char* cString = input) { // Allows pointer math
 unchecked { // Allows faster integer math by not testing for overflows
 char* nextChar = cString;
 bool isNegative = false;
 // Handle a possible negative sign at the beginning of the string.
 if (*nextChar == CharNegative)
 {
 isNegative = true;
 nextChar++;
 }
 // Now process each character of the string
 result = 0;
 while (*nextChar >= '0' && *nextChar <= '9')
 result = result * 10 + (*nextChar++ - '0');
 // If the non-numeric character encountered to end the while loop
 // wasn't the null terminator, the string is invalid.
 if (*nextChar != Char.MinValue) return false;
 // We need to check for an integer overflow based on the length of the string
 long ptrLen = nextChar - cString;
 // Result and overflow logic is different if there was a minus sign.
 if (isNegative)
 {
 result = -result;
 // Less than 11 characters (including negative) is no risk of overflow
 // Longest possible negative int is 11 chars (-2147483648)
 // If exactly 11 characters, we know our negative integer overflowed
 // if the final result is greater than zero, otherwise it's fine.
 return ptrLen < 11L || ptrLen <= 11L && result <= 0;
 }
 // Otherwise, overflow logic is the same, but opposite, and one fewer
 // characters is allowed (because there was no minus sign)
 return ptrLen < 10L || ptrLen <= 10L && result >= 0;
 }
 }
}

This is ~6x faster than a billion calls to Int.TryParse.

Native parser took 13808 ms. Custom parser took 2191 ms. Performance gain was 530%

Unsafe? Gross!

I tried getting as close as I could to the above algorithm without using unsafe, and to my surprise, it's about as good. It made me sad, because I was proud of my pointer math:

public static bool FastTryParseIntOld(string input, out int result)
{
 result = 0;
 int length = input.Length;
 if (length == 0) return false;
 bool isNegative = false;
 int currentIndex = 0;
 char nextChar = input[0];
 unchecked
 {
 if (nextChar == CharNegative)
 {
 isNegative = true;
 ++currentIndex;
 }
 while (currentIndex < length)
 {
 nextChar = input[currentIndex++];
 if (nextChar < '0' || nextChar > '9') return false;
 result = result * 10 + (nextChar - '0');
 }
 if (isNegative)
 {
 result = -result;
 return length < 11 || length <= 11 && result <= 0;
 }
 return length < 10 || length <= 10 && result >= 0;
 }
}

Native parser took 13727 ms. Custom parser took 2377 ms. Performance gain was 477%

Frequently Asked Questions

Is Double.Parse really your performance bottleneck?

Yes! We benchmarked it in release mode and everything. We're ripping through billions of strings loaded into memory from a super-fast CSV reader, and calling native Double.TryParse was around ~70% of our execution time. More than I/O even.

Then why are you using C#?

It's much nicer for the 100 other things our app has to do that aren't performance critical.

What's your benchmark code?

Here it is. I'm pretty sure it's a good measure:

const int iterations = 100000000;
string[] randomIntegers = Enumerable.Repeat(0, iterations )
 .Select(i => generateRandomInput()).ToArray();
CultureInfo cachedCulture = CultureInfo.CurrentCulture;
Stopwatch timer = Stopwatch.StartNew();
foreach (string value in randomIntegers)
 Int32.TryParse(value, nativeNumberStyles, cachedCulture, out T _);
Console.WriteLine($"Native parser took {timer.ElapsedMilliseconds} ms.");
timer.Restart();
foreach (string value in randomIntegers)
 FastTryParseInt(value, out T _);
Console.WriteLine($"Custom parser took {timer.ElapsedMilliseconds} ms.");

Compiled and run as "Release|Any CPU" on a 64 bit machine.

My Question

Can we do better? Did I miss something in unsafe mode that can make it way better than boring, safe C#?

Question 2

Not sure on performance, but this is broke: FastTryParseInt(4294967296) will return True / 0. This works for [4294967296,6442450943], [8589934592,9999999999], [-6442450944,-4294967296], and [-9999999999,-8589934592].

Question 3

Is this a programming-challange or do you really have that ammount of numbers that you have to parse?

Question 4

@t3chb0t Here's all of it: dotnetfiddle.net/yLtKLO

Question 5

Please see What should I do when someone answers my question?: Do not add an improved version of the code after receiving an answer. Including revised versions of the code makes the question confusing, especially if someone later reviews the newer code.

Question 6

@t3chb0t I forgot about that one. Going to put it in a new answer instead. (codereview.stackexchange.com/a/201151/2792)

Question 7

I normally don't write performance-sensitive code like this, but I think I managed to find a few improvements:

Writing to out parameters is slower than using local variables, so you only want to assign to result once. This makes a big difference.
Rejecting null/empty input right away might improve performance in some cases, perhaps because it enables some JIT optimizations.
Overflow always causes a mismatch between the first input digit and the resulting value, so a single check should be sufficient.

And a few bug-fixes:

"00000000001" was rejected because it has 11 digits, but it's a valid input. Leading zeroes should be ignored for length/overflow checks.
"405円" was parsed as 4, but it's an invalid input. input.Length should be used instead of checking for null-terminators.
"" was parsed as 0, but it's an invalid input.

According to my tests the following (safe) implementation is roughly 10%-15% faster. An unsafe variant should increase that to some 20%:

public static bool TryParseInt(string input, out int result)
{
 // This check sometimes improves performance - perhaps the extra guarantees enable some JIT optimizations?
 if (input == null || input.Length == 0)
 {
 result = default(int);
 return false;
 }
 var length = input.Length;
 var isNegative = input[0] == '-';
 var offset = isNegative ? 1 : 0;
 // It's faster to not operate directly on 'out' parameters:
 int value = 0;
 for (int i = offset; i < length; i++)
 {
 var c = input[i];
 if (c < '0' || c > '9')
 {
 result = default(int);
 return false;
 }
 else
 {
 value = (value * 10) + (c - '0');
 }
 }
 // Inputs with 10 digits or more might not fit in an integer, so they'll require additional checks:
 if (length - offset >= 10)
 {
 // Overflow/length checks should ignore leading zeroes:
 var meaningfulDigits = length - offset;
 for (int i = offset; i < length && input[i] == '0'; i++)
 meaningfulDigits -= 1;
 if (meaningfulDigits > 10)
 {
 // Too many digits, this certainly won't fit:
 result = default(int);
 return false;
 }
 else if (meaningfulDigits == 10)
 {
 // 10-digit numbers can be several times larger than int.MaxValue, so overflow may result in any possible value.
 // However, we only need to check the most significant digit to see if there's a mismatch.
 // Note that int.MinValue always overflows, making it the only case where overflow is allowed:
 if (!isNegative || value != int.MinValue)
 {
 // Any overflow will cause a leading digit mismatch:
 if (value / 1000000000 != (input[length - 10] - '0'))
 {
 result = default(int);
 return false;
 }
 }
 }
 }
 // -int.MinValue overflows back into int.MinValue, so that's ok:
 result = isNegative ? -value : value;
 return true;
}

Question 8

Wow, just switching to a local int for the accumulation rather than using the out result directly upped the performance from +530% to +710% vs the native integer parser. Great find!

Question 9

Like @202_accepted has mentioned in his comment you have a bug in your code which is hidden because of using unchecked in addition with your resulting condition.

I would divide this algorithm into 3 branches.

If the length of input is greater than the length of either int.MaxValue or int.MìnValue we can just return false.
If the length of input is smaller than the length of either int.MaxValue or int.MìnValue we can return false if any char is > 9 or < 0.
If the length is the same we iterate over input and compare it to a string representation of either int.MaxValue or int.MinValue and if any char, which isn't > 9 or < 0, is greater than the char to compare we return false.

Because you may use this method in a different project as well I would validate the input for null and whitespace instead of checking the Length property.
If I see a xxxTryParse method I would assume the out parameter being default(T) if the method returns false.

Implementing the mentioned points the code would loke like so

private const char CharNegative = '-';
private const int possibleMaxLength = 10;
private const int possibleMaxNegativeLength = 11;
private static string comparePositive = int.MaxValue.ToString();
private static string compareNegative =int.MinValue.ToString();
public static bool FastTryParseInt(string input, out int result)
{
 result = 0;
 if (string.IsNullOrWhiteSpace(input)) { return false; }
 int length = input.Length;
 bool isNegative = false;
 int currentIndex = 0;
 if (input[0] == CharNegative)
 {
 if (length > possibleMaxNegativeLength) { return false; }
 isNegative = true;
 ++currentIndex;
 }
 int maxLength = isNegative ? possibleMaxNegativeLength : possibleMaxLength;
 if (length > maxLength)
 {
 return false;
 }
 else if (length < maxLength)
 {
 char nextChar;
 while (currentIndex < length)
 {
 nextChar = input[currentIndex++];
 if (nextChar < '0' || nextChar > '9')
 {
 result = 0;
 return false;
 }
 result = result * 10 + (nextChar - '0');
 }
 }
 else
 {
 bool needsToBeCompared = true;
 string valueToCompare = isNegative ? compareNegative : comparePositive;
 char nextChar;
 char compareChar;
 while (currentIndex < maxLength)
 {
 nextChar = input[currentIndex];
 compareChar = valueToCompare[currentIndex];
 if (nextChar < '0' || nextChar > '9' || (needsToBeCompared && compareChar < nextChar))
 {
 result = 0;
 return false;
 }
 if (needsToBeCompared)
 {
 needsToBeCompared = compareChar == nextChar;
 }
 result = result * 10 + (nextChar - '0');
 currentIndex++;
 }
 }
 if (isNegative) { result = -result; }
 return true;
}

Timings (100000000 values, compiled AnyCpu, running on x64):

int.TryParse(): 11200 ms
FastTryParse(): 2730 ms

with generating random input like

private static Random random = new Random();
private static IEnumerable<string> GenerateRandomInput(int iterations)
{
 if (iterations == 0) yield break;
 for (int i = 0; i < iterations; i++)
 {
 if (i % 2 == 0)
 {
 yield return LongRandom((long)int.MinValue, ((long)int.MaxValue)).ToString();
 }
 else if (i % 3 == 0)
 {
 yield return LongRandom((long)int.MaxValue, ((long)int.MaxValue + 10000)).ToString();
 }
 else
 {
 yield return LongRandom(long.MinValue, long.MaxValue).ToString().Insert(3, "a"); 
 }
 }
}
private static long LongRandom(long min, long max)
{
 byte[] buf = new byte[8];
 random.NextBytes(buf);
 long longRand = BitConverter.ToInt64(buf, 0);
 return (Math.Abs(longRand % (max - min)) + min);
}

Edit
Since you don't bother about special cases like null being passed to the method and performance is your target, I have changed it like so

public static bool FastTryParseInt(string input, out int result)
{
 result = 0;
 int length = input.Length;
 if (length == 0) { return false; }
 bool isNegative = false;
 int currentIndex = 0;
 if (input[0] == CharNegative)
 {
 if (length > possibleMaxNegativeLength) { return false; }
 isNegative = true;
 ++currentIndex;
 }
 int maxLength = isNegative ? possibleMaxNegativeLength : possibleMaxLength;
 if (length > maxLength)
 {
 return false;
 }
 char nextChar;
 while (currentIndex < length)
 {
 nextChar = input[currentIndex++];
 if (nextChar < '0' || nextChar > '9')
 {
 result = 0;
 return false;
 }
 result = result * 10 + (nextChar - '0');
 if (result < 0)
 {
 result = 0;
 return false;
 }
 }
 if (isNegative) { result = -result; }
 return true;
}

If int result is overflowing it becomes negative so we just check if result < 0 to return false.

This runs in 2065 ms vs int.TryParse in 11220 ms.

Btw. you only need the unchecked keyword if you have checked "Check for arithmetic overflow/underflow" in the advanced build settings.

Question 10

string.IsNullOrWhiteSpace is latin sensitive, which could be optimized in this case, maybe? Seeing how char.IsDigit was not used in favor of nextChar < '0' || nextChar > '9', which ignores globalization. I don't know what satisfies the requirements of the parser though.

Question 11

That's an interesting concept, but in practice, it looks like the performance cost is too great. This new approach is only 310% faster than int.TryParse (down from 530%). Any 'checks' that can be saved until absolutely necessary are important to maximizing performance in the success case. Even just getting the length of the string could force an enumeration of the underlying character array that consumes half as many operations as the parsing of the string itself.

Question 12

Since you're rejecting anything with too many digits you should only need to check for overflow after the last digit. Also note that int.MinValue is a special case.

Question 13

Note that your latest edit has the same bug as the OP - a value like 4294967300 doesn't exceed the max length, but will overflow so far that it will become positive again - so your final check to see if the result is negative won't catch the overflow.

Question 14

@Alain thanks. I just shouldn't change code without checking. Fixed

Question 15

Here's the most performant version I've got so far based on suggestions by @PieterWitvoet and 202_accepted. It also corrects the overflow bug from my OP.

/// <summary>High performance integer parser with rudimentary flexibility.</summary>
/// <remarks>Supports negative numbers, but no whitespace or other non-digit characters
/// may be present.
/// Will not parse strings with more than 10 numeric characters,
/// even if there are leading zeros (such that the integer does not overflow).</remarks>
public static unsafe bool FastTryParseInt(string input, out int result)
{
 // We never expect null, but enforcing this may enable some JIT optimizations.
 if (input == null)
 {
 result = default(int);
 return false;
 }
 fixed (char* cString = input)
 {
 unchecked
 {
 char* nextChar = cString;
 bool isNegative = false;
 // Check whether the first character is numeric
 if (*nextChar < '0' || *nextChar > '9')
 {
 // Only allow a negative sign at the beginning of the string.
 if (*nextChar != CharNegative)
 {
 result = default(int);
 return false;
 }
 isNegative = true;
 // Any other non-numeric characters after this is an error.
 if (*++nextChar < '0' || *nextChar > '9')
 {
 result = default(int);
 // Special Case: Excel has been known to format zero as "-".
 // So return true here IFF this non-digit char is the end-of-string.
 return *nextChar == Char.MinValue;
 }
 }
 // Now process each character of the string
 int localValue = *nextChar++ - '0';
 while (*nextChar >= '0' && *nextChar <= '9')
 localValue = localValue * 10 + (*nextChar++ - '0');
 // If the non-numeric character encountered to end the while loop
 // wasn't the null terminator, the string is invalid.
 if (*nextChar != Char.MinValue)
 {
 result = default(int);
 return false;
 }
 // We need to check for an integer overflow based on the length of the string
 long ptrLen = nextChar - cString;
 // Result and overflow logic is different if there was a minus sign.
 if (isNegative)
 {
 result = -localValue;
 // Longest possible negative int is 11 chars (-2147483648)
 // Less than 11 characters (including negative) is no risk of overflow
 if (ptrLen < 11L) return true;
 // More than 11 characters is definitely an overflow.
 if (ptrLen > 11L) return false;
 // Exactly 11 characters needs to be checked for overflow.
 // Neat Trick: An overflow will always result in the first digit changing.
 return *(cString + 1) - '0' == localValue / 1000000000
 // Special case, parsing 2147483648 overflows to -2147483648, but this
 // value should be supported if there was a leading minus sign.
 || localValue == Int32.MinValue;
 }
 // Same logic if positive, but one fewer characters is allowed (no minus sign)
 result = localValue;
 if (ptrLen < 10L) return true;
 if (ptrLen > 10L) return false;
 return *cString - '0' == localValue / 1000000000;
 }
 }
}

Benchmarks

I was seeing a lot of variability in my benchmark runs when I looped over a billion strings at once, so to mange noise, I changed my test code to run several smaller test in succession. Here are the results for the above code:

Native parser took 1202 ms. Custom parser took 181 ms. Performance gain was 564.09% Native parser took 1211 ms. Custom parser took 177 ms. Performance gain was 584.18% Native parser took 1226 ms. Custom parser took 241 ms. Performance gain was 408.71% Native parser took 1315 ms. Custom parser took 180 ms. Performance gain was 630.56% Native parser took 1402 ms. Custom parser took 182 ms. Performance gain was 670.33% Native parser took 1212 ms. Custom parser took 181 ms. Performance gain was 569.61% Native parser took 1221 ms. Custom parser took 178 ms. Performance gain was 585.96% Native parser took 1203 ms. Custom parser took 178 ms. Performance gain was 575.84% Native parser took 1203 ms. Custom parser took 178 ms. Performance gain was 575.84% Native parser took 1220 ms. Custom parser took 179 ms. Performance gain was 581.56%

The average is about 575% faster than native parse.

Performance Improvements

Uses unsafe and fixed to treat the string as a null-terminated character array, avoiding the need to monitor or pre-compute the length of the string as we traverse it.
Initializes the local value directly using the first numeric digit, avoiding a superfluous multiplication and addition with zero in the first loop.
A null check at the beginning may enable some JIT optimizations.
Uses a local integer value for accumulation during parsing, and only assigns out result once - since manipulating out variables directly is more expensive.
Corrected the overflow check. In the majority case (less than 11/10 characters) there are no additional ops for overflow detection. Otherwise, for a number close to the limit, it takes just a couple additional ops to check whether the first digit changed due to overflow.

Omissions

Still does not account for leading zeros in overflow detection. Doing so slows down the expected case too much.
Still does not allow white-space, thousand separators, a leading positive sign, or trailing signs.

Note that in all above 'unhandled' cases, we're very careful to return false - we will never return true with an incorrect result.

Note that you could replace all instances of return false with return Int.TryParse(input, out result) if you wanted to "fall-back" to the native integer parser in these rare cases to strike a balance between performance and flexibility. In our case, something similar is done further up the chain, so I haven't included it in this code.

Question 16

This fails to handle inputs like "", "00000000001" and "405円". The overflow/length checks should ignore leading zeroes, and you may want to check the actual length of the input rather than relying on null-terminators.

Question 17

One more thing: what about inputs with a leading positive sign, like "+1"?

Question 18

string.Length is precomputed, so it's quite inexpensive, and the leading-zeroes check doesn't need to be done up-front, so it should only affect performance for 10+ digit input. Regarding the null-check, did you actually notice an improvement? For me it was very hit-and-miss.

Question 19

I noticed no performance change trends with the null check (especially given the inherent noise in the results as seen above.) But so long as there's no negative impact, it's nice to be able to handle that case rather than throw a null reference exception.

Question 20

Good to know about string.Length. I intuitively thought it might get lazily computed as needed, but it looks like it is populated at construction time.

Pieter Witvoet Pieter Witvoet 8,27916 silver badges28 bronze badges · Answer 1 · 2018-08-08 10:03:06Z

I normally don't write performance-sensitive code like this, but I think I managed to find a few improvements:

Writing to out parameters is slower than using local variables, so you only want to assign to result once. This makes a big difference.
Rejecting null/empty input right away might improve performance in some cases, perhaps because it enables some JIT optimizations.
Overflow always causes a mismatch between the first input digit and the resulting value, so a single check should be sufficient.

And a few bug-fixes:

"00000000001" was rejected because it has 11 digits, but it's a valid input. Leading zeroes should be ignored for length/overflow checks.
"405円" was parsed as 4, but it's an invalid input. input.Length should be used instead of checking for null-terminators.
"" was parsed as 0, but it's an invalid input.

According to my tests the following (safe) implementation is roughly 10%-15% faster. An unsafe variant should increase that to some 20%:

public static bool TryParseInt(string input, out int result)
{
 // This check sometimes improves performance - perhaps the extra guarantees enable some JIT optimizations?
 if (input == null || input.Length == 0)
 {
 result = default(int);
 return false;
 }
 var length = input.Length;
 var isNegative = input[0] == '-';
 var offset = isNegative ? 1 : 0;
 // It's faster to not operate directly on 'out' parameters:
 int value = 0;
 for (int i = offset; i < length; i++)
 {
 var c = input[i];
 if (c < '0' || c > '9')
 {
 result = default(int);
 return false;
 }
 else
 {
 value = (value * 10) + (c - '0');
 }
 }
 // Inputs with 10 digits or more might not fit in an integer, so they'll require additional checks:
 if (length - offset >= 10)
 {
 // Overflow/length checks should ignore leading zeroes:
 var meaningfulDigits = length - offset;
 for (int i = offset; i < length && input[i] == '0'; i++)
 meaningfulDigits -= 1;
 if (meaningfulDigits > 10)
 {
 // Too many digits, this certainly won't fit:
 result = default(int);
 return false;
 }
 else if (meaningfulDigits == 10)
 {
 // 10-digit numbers can be several times larger than int.MaxValue, so overflow may result in any possible value.
 // However, we only need to check the most significant digit to see if there's a mismatch.
 // Note that int.MinValue always overflows, making it the only case where overflow is allowed:
 if (!isNegative || value != int.MinValue)
 {
 // Any overflow will cause a leading digit mismatch:
 if (value / 1000000000 != (input[length - 10] - '0'))
 {
 result = default(int);
 return false;
 }
 }
 }
 }
 // -int.MinValue overflows back into int.MinValue, so that's ok:
 result = isNegative ? -value : value;
 return true;
}

Wow, just switching to a local int for the accumulation rather than using the out result directly upped the performance from +530% to +710% vs the native integer parser. Great find!

Heslacher Heslacher 50.9k5 gold badges83 silver badges177 bronze badges · Answer 2 · 2018-08-07 11:35:02Z

Like @202_accepted has mentioned in his comment you have a bug in your code which is hidden because of using unchecked in addition with your resulting condition.

I would divide this algorithm into 3 branches.

If the length of input is greater than the length of either int.MaxValue or int.MìnValue we can just return false.
If the length of input is smaller than the length of either int.MaxValue or int.MìnValue we can return false if any char is > 9 or < 0.
If the length is the same we iterate over input and compare it to a string representation of either int.MaxValue or int.MinValue and if any char, which isn't > 9 or < 0, is greater than the char to compare we return false.

Because you may use this method in a different project as well I would validate the input for null and whitespace instead of checking the Length property.
If I see a xxxTryParse method I would assume the out parameter being default(T) if the method returns false.

Implementing the mentioned points the code would loke like so

private const char CharNegative = '-';
private const int possibleMaxLength = 10;
private const int possibleMaxNegativeLength = 11;
private static string comparePositive = int.MaxValue.ToString();
private static string compareNegative =int.MinValue.ToString();
public static bool FastTryParseInt(string input, out int result)
{
 result = 0;
 if (string.IsNullOrWhiteSpace(input)) { return false; }
 int length = input.Length;
 bool isNegative = false;
 int currentIndex = 0;
 if (input[0] == CharNegative)
 {
 if (length > possibleMaxNegativeLength) { return false; }
 isNegative = true;
 ++currentIndex;
 }
 int maxLength = isNegative ? possibleMaxNegativeLength : possibleMaxLength;
 if (length > maxLength)
 {
 return false;
 }
 else if (length < maxLength)
 {
 char nextChar;
 while (currentIndex < length)
 {
 nextChar = input[currentIndex++];
 if (nextChar < '0' || nextChar > '9')
 {
 result = 0;
 return false;
 }
 result = result * 10 + (nextChar - '0');
 }
 }
 else
 {
 bool needsToBeCompared = true;
 string valueToCompare = isNegative ? compareNegative : comparePositive;
 char nextChar;
 char compareChar;
 while (currentIndex < maxLength)
 {
 nextChar = input[currentIndex];
 compareChar = valueToCompare[currentIndex];
 if (nextChar < '0' || nextChar > '9' || (needsToBeCompared && compareChar < nextChar))
 {
 result = 0;
 return false;
 }
 if (needsToBeCompared)
 {
 needsToBeCompared = compareChar == nextChar;
 }
 result = result * 10 + (nextChar - '0');
 currentIndex++;
 }
 }
 if (isNegative) { result = -result; }
 return true;
}

Timings (100000000 values, compiled AnyCpu, running on x64):

int.TryParse(): 11200 ms
FastTryParse(): 2730 ms

with generating random input like

private static Random random = new Random();
private static IEnumerable<string> GenerateRandomInput(int iterations)
{
 if (iterations == 0) yield break;
 for (int i = 0; i < iterations; i++)
 {
 if (i % 2 == 0)
 {
 yield return LongRandom((long)int.MinValue, ((long)int.MaxValue)).ToString();
 }
 else if (i % 3 == 0)
 {
 yield return LongRandom((long)int.MaxValue, ((long)int.MaxValue + 10000)).ToString();
 }
 else
 {
 yield return LongRandom(long.MinValue, long.MaxValue).ToString().Insert(3, "a"); 
 }
 }
}
private static long LongRandom(long min, long max)
{
 byte[] buf = new byte[8];
 random.NextBytes(buf);
 long longRand = BitConverter.ToInt64(buf, 0);
 return (Math.Abs(longRand % (max - min)) + min);
}

Edit
Since you don't bother about special cases like null being passed to the method and performance is your target, I have changed it like so

public static bool FastTryParseInt(string input, out int result)
{
 result = 0;
 int length = input.Length;
 if (length == 0) { return false; }
 bool isNegative = false;
 int currentIndex = 0;
 if (input[0] == CharNegative)
 {
 if (length > possibleMaxNegativeLength) { return false; }
 isNegative = true;
 ++currentIndex;
 }
 int maxLength = isNegative ? possibleMaxNegativeLength : possibleMaxLength;
 if (length > maxLength)
 {
 return false;
 }
 char nextChar;
 while (currentIndex < length)
 {
 nextChar = input[currentIndex++];
 if (nextChar < '0' || nextChar > '9')
 {
 result = 0;
 return false;
 }
 result = result * 10 + (nextChar - '0');
 if (result < 0)
 {
 result = 0;
 return false;
 }
 }
 if (isNegative) { result = -result; }
 return true;
}

If int result is overflowing it becomes negative so we just check if result < 0 to return false.

This runs in 2065 ms vs int.TryParse in 11220 ms.

Btw. you only need the unchecked keyword if you have checked "Check for arithmetic overflow/underflow" in the advanced build settings.

string.IsNullOrWhiteSpace is latin sensitive, which could be optimized in this case, maybe? Seeing how char.IsDigit was not used in favor of nextChar < '0' || nextChar > '9', which ignores globalization. I don't know what satisfies the requirements of the parser though.
That's an interesting concept, but in practice, it looks like the performance cost is too great. This new approach is only 310% faster than int.TryParse (down from 530%). Any 'checks' that can be saved until absolutely necessary are important to maximizing performance in the success case. Even just getting the length of the string could force an enumeration of the underlying character array that consumes half as many operations as the parsing of the string itself.
Since you're rejecting anything with too many digits you should only need to check for overflow after the last digit. Also note that int.MinValue is a special case.
Note that your latest edit has the same bug as the OP - a value like 4294967300 doesn't exceed the max length, but will overflow so far that it will become positive again - so your final check to see if the result is negative won't catch the overflow.
@Alain thanks. I just shouldn't change code without checking. Fixed

Alain Alain 4721 gold badge4 silver badges17 bronze badges · Answer 3 · 2018-08-07 15:27:45Z

Here's the most performant version I've got so far based on suggestions by @PieterWitvoet and 202_accepted. It also corrects the overflow bug from my OP.

/// <summary>High performance integer parser with rudimentary flexibility.</summary>
/// <remarks>Supports negative numbers, but no whitespace or other non-digit characters
/// may be present.
/// Will not parse strings with more than 10 numeric characters,
/// even if there are leading zeros (such that the integer does not overflow).</remarks>
public static unsafe bool FastTryParseInt(string input, out int result)
{
 // We never expect null, but enforcing this may enable some JIT optimizations.
 if (input == null)
 {
 result = default(int);
 return false;
 }
 fixed (char* cString = input)
 {
 unchecked
 {
 char* nextChar = cString;
 bool isNegative = false;
 // Check whether the first character is numeric
 if (*nextChar < '0' || *nextChar > '9')
 {
 // Only allow a negative sign at the beginning of the string.
 if (*nextChar != CharNegative)
 {
 result = default(int);
 return false;
 }
 isNegative = true;
 // Any other non-numeric characters after this is an error.
 if (*++nextChar < '0' || *nextChar > '9')
 {
 result = default(int);
 // Special Case: Excel has been known to format zero as "-".
 // So return true here IFF this non-digit char is the end-of-string.
 return *nextChar == Char.MinValue;
 }
 }
 // Now process each character of the string
 int localValue = *nextChar++ - '0';
 while (*nextChar >= '0' && *nextChar <= '9')
 localValue = localValue * 10 + (*nextChar++ - '0');
 // If the non-numeric character encountered to end the while loop
 // wasn't the null terminator, the string is invalid.
 if (*nextChar != Char.MinValue)
 {
 result = default(int);
 return false;
 }
 // We need to check for an integer overflow based on the length of the string
 long ptrLen = nextChar - cString;
 // Result and overflow logic is different if there was a minus sign.
 if (isNegative)
 {
 result = -localValue;
 // Longest possible negative int is 11 chars (-2147483648)
 // Less than 11 characters (including negative) is no risk of overflow
 if (ptrLen < 11L) return true;
 // More than 11 characters is definitely an overflow.
 if (ptrLen > 11L) return false;
 // Exactly 11 characters needs to be checked for overflow.
 // Neat Trick: An overflow will always result in the first digit changing.
 return *(cString + 1) - '0' == localValue / 1000000000
 // Special case, parsing 2147483648 overflows to -2147483648, but this
 // value should be supported if there was a leading minus sign.
 || localValue == Int32.MinValue;
 }
 // Same logic if positive, but one fewer characters is allowed (no minus sign)
 result = localValue;
 if (ptrLen < 10L) return true;
 if (ptrLen > 10L) return false;
 return *cString - '0' == localValue / 1000000000;
 }
 }
}

Benchmarks

I was seeing a lot of variability in my benchmark runs when I looped over a billion strings at once, so to mange noise, I changed my test code to run several smaller test in succession. Here are the results for the above code:

Native parser took 1202 ms. Custom parser took 181 ms. Performance gain was 564.09% Native parser took 1211 ms. Custom parser took 177 ms. Performance gain was 584.18% Native parser took 1226 ms. Custom parser took 241 ms. Performance gain was 408.71% Native parser took 1315 ms. Custom parser took 180 ms. Performance gain was 630.56% Native parser took 1402 ms. Custom parser took 182 ms. Performance gain was 670.33% Native parser took 1212 ms. Custom parser took 181 ms. Performance gain was 569.61% Native parser took 1221 ms. Custom parser took 178 ms. Performance gain was 585.96% Native parser took 1203 ms. Custom parser took 178 ms. Performance gain was 575.84% Native parser took 1203 ms. Custom parser took 178 ms. Performance gain was 575.84% Native parser took 1220 ms. Custom parser took 179 ms. Performance gain was 581.56%

The average is about 575% faster than native parse.

Performance Improvements

Uses unsafe and fixed to treat the string as a null-terminated character array, avoiding the need to monitor or pre-compute the length of the string as we traverse it.
Initializes the local value directly using the first numeric digit, avoiding a superfluous multiplication and addition with zero in the first loop.
A null check at the beginning may enable some JIT optimizations.
Uses a local integer value for accumulation during parsing, and only assigns out result once - since manipulating out variables directly is more expensive.
Corrected the overflow check. In the majority case (less than 11/10 characters) there are no additional ops for overflow detection. Otherwise, for a number close to the limit, it takes just a couple additional ops to check whether the first digit changed due to overflow.

Omissions

Still does not account for leading zeros in overflow detection. Doing so slows down the expected case too much.
Still does not allow white-space, thousand separators, a leading positive sign, or trailing signs.

Note that in all above 'unhandled' cases, we're very careful to return false - we will never return true with an incorrect result.

Note that you could replace all instances of return false with return Int.TryParse(input, out result) if you wanted to "fall-back" to the native integer parser in these rare cases to strike a balance between performance and flexibility. In our case, something similar is done further up the chain, so I haven't included it in this code.

This fails to handle inputs like "", "00000000001" and "405円". The overflow/length checks should ignore leading zeroes, and you may want to check the actual length of the input rather than relying on null-terminators.
One more thing: what about inputs with a leading positive sign, like "+1"?
string.Length is precomputed, so it's quite inexpensive, and the leading-zeroes check doesn't need to be done up-front, so it should only affect performance for 10+ digit input. Regarding the null-check, did you actually notice an improvement? For me it was very hit-and-miss.
I noticed no performance change trends with the null check (especially given the inherent noise in the results as seen above.) But so long as there's no negative impact, it's nice to be able to handle that case rather than throw a null reference exception.
Good to know about string.Length. I intuitively thought it might get lazily computed as needed, but it looks like it is populated at construction time.

Stack Exchange Network

Custom integer parser optimized for performance

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Custom integer parser optimized for performance

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions