Here is an alternative algorithm based on a recommendation by @chux.
We append a null character to the end of the string ('0円') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:
/// <summary>A buffer used store a copy of input characters so that we can
/// append a null character avoid array index boundary checks while looping.</summary>
private static readonly char[] characters = new char[256];
public static bool FastTryParseDouble(string input, out double result)
{
int length = input.Length;
if (length <= 0)
{
result = Double.NaN;
return false;
}
double sign = 1d;
int currentIndex = 0;
// Copy the input characters to a new array with a null termination character.
input.CopyTo(0, characters, 0, length);
characters[length] = '0円';
char nextChar = characters[0];
/**************** Sign (+/-) and Special Case String Representations *****************/
// Handle all cases where the string does not start with a numeric character
if (nextChar < '0' || nextChar > '9')
{
// Non-numeric 1-character strings must match one of the supported special cases.
if (length == 1)
return CheckForSpecialCaseDoubleStrings(input, out result);
// For anything more than one character, this should be a sign character.
if (nextChar == CharNegative)
sign = -1d;
// The very next character may also be the decimal separator.
else if (nextChar == CharDecimalSeparator)
{
// In this case, we treat the integer part as 0 and skip to the fractional part.
result = 0d;
goto SkipIntegerPart;
}
// Finally, unless it was a '+' sign, input must match one of a set of special cases.
else if (nextChar != CharPositive)
return CheckForSpecialCaseDoubleStrings(input, out result);
// Once the sign is consumed, advance to the next character for further parsing
nextChar = characters[unchecked(++currentIndex)];
// We must once more check whether the character is numeric before proceeding.
if (nextChar < '0' || nextChar > '9')
{
// If not numeric, at this point, the character can only be a decimal separator
// (as in "-.123" or "+.123"), or else it must be part of a special case string
// (as in "-∞"). So check for those.
if (nextChar != CharDecimalSeparator)
return CheckForSpecialCaseDoubleStrings(input, out result);
result = 0d;
goto SkipIntegerPart;
}
}
/********************************** "Integer Part" ***********************************/
// Treat all subsequent numeric characters as the "integer part" of the result.
// Since we've already checked that the next character is numeric,
// We can save 2 ops by initializing the result directly.
unchecked
{
result = nextChar - '0';
nextChar = characters[++currentIndex];
while (nextChar >= '0' && nextChar <= '9')
{
result = result * 10d + (nextChar - '0');
nextChar = characters[++currentIndex];
}
}
// This label and corresponding goto statements is a performance optimization to
// allow us to efficiently skip "integer part" parsing in cases like ".123456"
// Please don't be mad.
SkipIntegerPart:
// The expected case is that the next character is a decimal separator, however
// this section might be skipped in normal use cases (e.g. as in "1e18")
// TODO: If we broke out of the while loop above due to reaching the end of the
// string, this operation is superfluous. Is there a way to skip it?
if (nextChar == CharDecimalSeparator)
{
/******************************* "Fractional Part" *******************************/
// Track the index at the start of the fraction part.
unchecked
{
int fractionPos = ++currentIndex;
nextChar = characters[currentIndex];
// Continue shifting and adding to the result as before
// Note that we flip the OR here, because it's now more likely that
// nextChar > '9' ('e' or 'E'), leading to an early exit condition.
while (nextChar <= '9' && nextChar >= '0')
{
result = result * 10d + (nextChar - '0');
nextChar = characters[++currentIndex];
}
// Update this to store the number of digits in the "fraction part".
// We will use this to adjust down the magnitude of the double.
fractionPos = currentIndex - fractionPos;
// Use our tiny array of negative powers of 10 if possible, but fallback to
// our larger array (still fast), whose higher indices store negative powers.
// Finally, while practically unlikely, ridiculous strings (>300 characters)
// can still be supported with a final fallback to native Math.Pow
// TODO: Is it possible to combine this magnitude adjustment with any
// applicable adjustment due to scientific notation?
result *= fractionPos < NegPow10Length ?
NegPow10[fractionPos] : fractionPos < MaxDoubleExponent ?
Pow10[MaxDoubleExponent + fractionPos] : Math.Pow(10, -fractionPos);
}
}
// Apply the sign now that we've added all digits that belong to the significand
result *= sign;
// If we have consumed every character in the string, return now.
if (currentIndex >= length) return true;
// The next character encountered must be an exponent character
if (nextChar != 'e' && nextChar != 'E')
return false;
/**************************** "Scientific Notation Part" *****************************/
unchecked
{
// If we're at the end of the string (last character was 'e' or 'E'), that's an error
if (++currentIndex >= length) return false;
// Otherwise, advance the current character and begin parsing the exponent
nextChar = characters[currentIndex];
bool exponentIsNegative = false;
// The next character can only be a +/- sign, or a numeric character
if (nextChar < '0' || nextChar > '9')
{
if (nextChar == CharNegative)
exponentIsNegative = true;
else if (nextChar != CharPositive)
return false;
// Again, require there to be at least one more character in the string after the sign
if (++currentIndex >= length) return false;
nextChar = characters[currentIndex];
// And verify that this next character is numeric
if (nextChar < '0' || nextChar > '9') return false;
}
// Since we know the next character is a digit, we can initialize the exponent int
// directly and avoid 2 wasted ops (multiplying by and adding to zero).
int exponent = nextChar - '0';
nextChar = characters[++currentIndex];
// Shift and add any additional digit characters
while (nextChar >= '0' && nextChar <= '9')
{
exponent = exponent * 10 + nextChar - '0';
nextChar = characters[++currentIndex];
}
// If the final (non-digit) char was not our null character, the string was invalid.
if (nextChar != '0円') return false;
// Apply the exponent. If negative, our index jump is a little different.
if (exponentIsNegative)
result *= exponent < Pow10Length - MaxDoubleExponent ?
// Fallback to Math.Pow if the lookup array doesn't cover it.
Pow10[exponent + MaxDoubleExponent] : Math.Pow(10, -exponent);
// If positive, our array covers all possible positive exponents - ensure its valid.
else if (exponent > MaxDoubleExponent)
return false;
else
result *= Pow10[exponent];
}
// Doubles that underwent scientific notation parsing should be checked for overflow
// (Otherwise, this isn't really a risk we don't expect strings of >308 characters)
return !Double.IsInfinity(result);
}
Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:
Native parser took 21422 ms.
Custom parser took 8684 ms.
Performance gain was 146.68%
I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.
- 472
- 1
- 4
- 17