Return to Revisions

10 of 10

Commonmark migration

edited Jun 10, 2020 at 13:24

Custom double parser optimized for performance

I'm trying to beat the native Double.TryParse for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##" format doubles.

Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)

My Implementation

This is the setup of the function (mostly for context).

private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
 CurrentCulture.NumberFormat.NumberDecimalSeparator[0];
/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)
{
 result = 0d;
 int length = input.Length;
 if (length == 0) return false;
 double sign = 1d;
 int currentIndex = 0;
 char nextChar = input[0];
 // Handle a possible negative sign at the beginning of the string.
 if (nextChar == CharNegative)
 {
 sign = -1d;
 ++currentIndex;
 }

As you can see, I try to remain culture aware and support negative numbers. This is the remainder of the method, which I think needs to be optimized for performance:

 unchecked
 {
 while (true)
 {
 // Return now if we have reached the end of the string
 if (currentIndex >= length)
 {
 result *= sign;
 return true;
 }
 nextChar = input[currentIndex++];
 // Break if the result wasn't a digit between 0 and 9
 if (nextChar < '0' || nextChar > '9') break;
 // Multiply by 10 and add the next digit.
 result = result * 10 + (nextChar - '0');
 }
 // The next character should be a decimal character, or else it's invalid.
 if (nextChar != CharDecimalSeparator) return false;
 double fractionalPart = 0d;
 int fractionLengh = length - currentIndex;
 while (currentIndex < length)
 {
 nextChar = input[currentIndex++];
 // If we encounter a non-digit now, it's an error
 if (nextChar < '0' || nextChar > '9') return false;
 fractionalPart = fractionalPart * 10 + (nextChar - '0');
 }
 // Add the fractional part to the result, apply sign, and return
 if (fractionLengh < NegPow10.Length)
 result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
 else
 result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
 }
 return true;
}

NegPow10 at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n for quick reference. Anything bigger just falls back to Math.Pow()

/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double[] NegPow10 = new double[]
{
 1d,
 0.1,
 0.01,
 ///... you get the idea
 0.0000000000000001
};

Test Cases

The following test cases are all passing:

TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);

I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.

Benchmarks

I benchmarked my implementation against native Double.TryParse to guage the performance difference.

I tested parsing an array of 10 million different strings using:

Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)

Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:

Parsers.FastTryParseDouble(value, out _)

Results

Native Double.TryParse took ~4500 ms.

Custom Parsers.FastTryParseDouble took ~950 ms.

Performance gain was ~370%

Next Steps

See any other ways I can squeeze out more performance?

Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true and a bad result.

c# performance parsing floating-point

asked Jul 31, 2018 at 0:25

Alain

lang-cs