Here is an alternative algorithm based on a recommendation by @chux.
(削除) We append a null character to the end of the string ('0円') although any non-digit character will do. (削除ここまで) EDIT We can use an unsafe
method and treat the string as a null-terminated character array pointer.
Now we don't have to check for array boundaries. All of our while loops become super tight like this:
while (*nextChar >= '0' && *nextChar <= '9')
result = result * 10d + (*nextChar++ - '0');
We also no longer need to keep track of our current index or check for end of string anywhere (instead, we check for the null terminating character - but always outside of a loop). The only thing that becomes a little more complicated is the determination of the fraction length and corresponding magnitude adjustment.
Here is the full algorithm, updated to use a char*
pointer. I'll just split out these first two lines from the rest of the body to keep indentation readable on SE.
public static unsafe bool FastTryParseDouble(string input, out double result)
{
fixed (char* cInput = input)
{
double sign = 1d;
char* nextChar = cInput;
/**************** Sign (+/-) and Special Case String Representations *****************/
// Handle all cases where the string does not start with a numeric character
if (*nextChar < '0' || *nextChar > '9')
{
// Check for empty string
if (*nextChar == Char.MinValue)
{
result = Double.NaN;
return false;
}
// Non-numeric 1-character strings must match one of the supported special cases
if (*(nextChar + 1) == Char.MinValue)
return CheckForSpecialCaseDoubleStrings(input, out result);
// For anything more than one character, this should be a sign character
if (*nextChar == CharNegative)
sign = -1d;
// The very first character may also be the decimal separator.
else if (*nextChar == CharDecimalSeparator)
{
// In this case, we treat the integer part as 0 and skip to the fractional part
result = 0d;
goto SkipIntegerPart;
}
// Finally, unless it was a '+' sign, input must match one of a set of special cases
else if (*nextChar != CharPositive)
return CheckForSpecialCaseDoubleStrings(input, out result);
// Once the sign is consumed, advance to the next character for further parsing
// We must once more check whether the character is numeric before proceeding.
if (*++nextChar < '0' || *nextChar > '9')
{
// If not numeric, at this point, the character can only be a decimal separator
// (as in "-.123" or "+.123"), or else it must be part of a special case string
// (as in "-∞"). So check for those.
if (*nextChar != CharDecimalSeparator)
return CheckForSpecialCaseDoubleStrings(input, out result);
result = 0d;
goto SkipIntegerPart;
}
}
/********************************** "Integer Part" ***********************************/
// Treat all subsequent numeric characters as the "integer part" of the result.
// Since we've already checked that the next character is numeric,
// We can save 2 ops by initializing the result directly.
result = *nextChar++ - '0';
while (*nextChar >= '0' && *nextChar <= '9')
result = result * 10d + (*nextChar++ - '0');
// This label and corresponding goto statements is a performance optimization to
// allow us to efficiently skip "integer part" parsing in cases like ".123456"
// Please don't be mad.
SkipIntegerPart:
// The expected case is that the next character is a decimal separator, however
// this section might be skipped in normal use cases (e.g. as in "1e18")
// TODO: If we broke out of the while loop above due to reaching the end of the
// string, this operation is superfluous. Is there a way to skip it?
if (*nextChar == CharDecimalSeparator)
{
/******************************* "Fractional Part" *******************************/
// Track the index at the start of the fraction part.
char* fractionStart = ++nextChar;
// Continue shifting and adding to the result as before
// Note that we flip the OR here, because it's now more likely that
// nextChar > '9' ('e' or 'E'), leading to an early exit condition.
while (*nextChar <= '9' && *nextChar >= '0')
result = result * 10d + (*nextChar++ - '0');
// Adjust down the magnitude of the double.
// TODO: Is it possible to combine this magnitude adjustment with any
// applicable adjustment due to scientific notation?
unchecked
{
long fractionLen = nextChar - fractionStart;
// Use our tiny array of negative powers of 10 if possible, but fallback to
// our larger array (still fast), whose higher indices store negative powers
// Finally, while practically unlikely, ridiculous strings (>300 characters)
// can still be supported with a final fallback to native Math.Pow
result *= fractionLen < NegPow10Length ? NegPow10[fractionLen] :
fractionLen < MaxDoubleExponent ? Pow10[MaxDoubleExponent + fractionLen] :
Math.Pow(10, -fractionLen);
}
}
// Apply the sign now that we've added all digits that belong to the significand
result *= sign;
// If we have consumed every character in the string, return now.
if (*nextChar == Char.MinValue) return true;
// The next character encountered must be an exponent character
if (*nextChar != 'e' && *nextChar != 'E') return false;
/**************************** "Scientific Notation Part" *****************************/
// If we're at the end of the string (last character was 'e' or 'E'), that's an error
if (*++nextChar == Char.MinValue) return false;
// Otherwise, begin parsing the exponent
bool exponentIsNegative = false;
// The next character can only be a +/- sign, or a numeric character
if (*nextChar < '0' || *nextChar > '9')
{
if (*nextChar == CharNegative)
exponentIsNegative = true;
else if (*nextChar != CharPositive)
return false;
// Again, require at least one more character in the string after the sign
if (*++nextChar == Char.MinValue) return false;
// And verify that this next character is numeric
if (*nextChar < '0' || *nextChar > '9') return false;
}
unchecked
{
// Since we know the next character is a digit, we can initialize the exponent
// int directly and avoid 2 wasted ops (multiplying by and adding to zero)
int exponent = *nextChar++ - '0';
// Shift and add any additional digit characters
while (*nextChar <= '9' && *nextChar >= '0')
exponent = exponent * 10 + (*nextChar++ - '0');
// If we broke for anything other than the end of string, it's an error
if (*nextChar != Char.MinValue) return false;
// Apply the exponent. If negative, our index jump is a little different
if (exponentIsNegative)
result *= exponent < Pow10Length - MaxDoubleExponent ?
// Fallback to Math.Pow if the lookup array doesn't cover it.
Pow10[exponent + MaxDoubleExponent] : Math.Pow(10, -exponent);
// If positive, our array covers all possible positive exponents - ensure its valid
else if (exponent > MaxDoubleExponent)
return false;
else
result *= Pow10[exponent];
}
// Doubles that underwent scientific notation parsing should be checked for overflow
// (Otherwise, this isn't really a risk we don't expect strings of >308 characters)
return !Double.IsInfinity(result);
Native parser took 21660 ms.
Custom parser took 5623 ms.
Performance gain was 285%
Now that's more competitive!
- 472
- 1
- 4
- 17