Skip to main content
Code Review

Return to Revisions

2 of 4
Improved answer - using a char array makes this approach competitive!
Alain
  • 472
  • 1
  • 4
  • 17

Here is an alternative algorithm based on a recommendation by @chux.

(削除) We append a null character to the end of the string ('0円') although any non-digit character will do. (削除ここまで) EDIT We can use an unsafe method and treat the string as a null-terminated character array pointer.

Now we don't have to check for array boundaries. All of our while loops become super tight like this:

while (*nextChar >= '0' && *nextChar <= '9')
 result = result * 10d + (*nextChar++ - '0');

We also no longer need to keep track of our current index or check for end of string anywhere (instead, we check for the null terminating character - but always outside of a loop). The only thing that becomes a little more complicated is the determination of the fraction length and corresponding magnitude adjustment.

Here is the full algorithm, updated to use a char* pointer. I'll just split out these first two lines from the rest of the body to keep indentation readable on SE.

public static unsafe bool FastTryParseDouble(string input, out double result)
{
 fixed (char* cInput = input)
 {
double sign = 1d;
char* nextChar = cInput;
/**************** Sign (+/-) and Special Case String Representations *****************/
// Handle all cases where the string does not start with a numeric character
if (*nextChar < '0' || *nextChar > '9')
{
 // Check for empty string
 if (*nextChar == Char.MinValue)
 {
 result = Double.NaN;
 return false;
 }
 // Non-numeric 1-character strings must match one of the supported special cases
 if (*(nextChar + 1) == Char.MinValue)
 return CheckForSpecialCaseDoubleStrings(input, out result);
 // For anything more than one character, this should be a sign character
 if (*nextChar == CharNegative)
 sign = -1d;
 // The very first character may also be the decimal separator.
 else if (*nextChar == CharDecimalSeparator)
 {
 // In this case, we treat the integer part as 0 and skip to the fractional part
 result = 0d;
 goto SkipIntegerPart;
 }
 // Finally, unless it was a '+' sign, input must match one of a set of special cases
 else if (*nextChar != CharPositive)
 return CheckForSpecialCaseDoubleStrings(input, out result);
 // Once the sign is consumed, advance to the next character for further parsing
 // We must once more check whether the character is numeric before proceeding.
 if (*++nextChar < '0' || *nextChar > '9')
 {
 // If not numeric, at this point, the character can only be a decimal separator
 // (as in "-.123" or "+.123"), or else it must be part of a special case string
 // (as in "-∞"). So check for those.
 if (*nextChar != CharDecimalSeparator)
 return CheckForSpecialCaseDoubleStrings(input, out result);
 result = 0d;
 goto SkipIntegerPart;
 }
}
/********************************** "Integer Part" ***********************************/
// Treat all subsequent numeric characters as the "integer part" of the result.
// Since we've already checked that the next character is numeric,
// We can save 2 ops by initializing the result directly.
result = *nextChar++ - '0';
while (*nextChar >= '0' && *nextChar <= '9')
 result = result * 10d + (*nextChar++ - '0');
// This label and corresponding goto statements is a performance optimization to
// allow us to efficiently skip "integer part" parsing in cases like ".123456"
// Please don't be mad.
SkipIntegerPart:
// The expected case is that the next character is a decimal separator, however
// this section might be skipped in normal use cases (e.g. as in "1e18")
// TODO: If we broke out of the while loop above due to reaching the end of the
// string, this operation is superfluous. Is there a way to skip it?
if (*nextChar == CharDecimalSeparator)
{
 /******************************* "Fractional Part" *******************************/
 // Track the index at the start of the fraction part.
 char* fractionStart = ++nextChar;
 // Continue shifting and adding to the result as before
 // Note that we flip the OR here, because it's now more likely that
 // nextChar > '9' ('e' or 'E'), leading to an early exit condition.
 while (*nextChar <= '9' && *nextChar >= '0')
 result = result * 10d + (*nextChar++ - '0');
 // Adjust down the magnitude of the double.
 // TODO: Is it possible to combine this magnitude adjustment with any
 // applicable adjustment due to scientific notation?
 unchecked
 {
 long fractionLen = nextChar - fractionStart;
 // Use our tiny array of negative powers of 10 if possible, but fallback to
 // our larger array (still fast), whose higher indices store negative powers
 // Finally, while practically unlikely, ridiculous strings (>300 characters)
 // can still be supported with a final fallback to native Math.Pow
 result *= fractionLen < NegPow10Length ? NegPow10[fractionLen] :
 fractionLen < MaxDoubleExponent ? Pow10[MaxDoubleExponent + fractionLen] :
 Math.Pow(10, -fractionLen);
 }
}
// Apply the sign now that we've added all digits that belong to the significand
result *= sign;
// If we have consumed every character in the string, return now.
if (*nextChar == Char.MinValue) return true;
// The next character encountered must be an exponent character
if (*nextChar != 'e' && *nextChar != 'E') return false;
/**************************** "Scientific Notation Part" *****************************/
// If we're at the end of the string (last character was 'e' or 'E'), that's an error
if (*++nextChar == Char.MinValue) return false;
// Otherwise, begin parsing the exponent
bool exponentIsNegative = false;
// The next character can only be a +/- sign, or a numeric character
if (*nextChar < '0' || *nextChar > '9')
{
 if (*nextChar == CharNegative)
 exponentIsNegative = true;
 else if (*nextChar != CharPositive)
 return false;
 // Again, require at least one more character in the string after the sign
 if (*++nextChar == Char.MinValue) return false;
 // And verify that this next character is numeric
 if (*nextChar < '0' || *nextChar > '9') return false;
}
unchecked
{
 // Since we know the next character is a digit, we can initialize the exponent
 // int directly and avoid 2 wasted ops (multiplying by and adding to zero)
 int exponent = *nextChar++ - '0';
 // Shift and add any additional digit characters
 while (*nextChar <= '9' && *nextChar >= '0')
 exponent = exponent * 10 + (*nextChar++ - '0');
 // If we broke for anything other than the end of string, it's an error
 if (*nextChar != Char.MinValue) return false;
 // Apply the exponent. If negative, our index jump is a little different
 if (exponentIsNegative)
 result *= exponent < Pow10Length - MaxDoubleExponent ?
 // Fallback to Math.Pow if the lookup array doesn't cover it.
 Pow10[exponent + MaxDoubleExponent] : Math.Pow(10, -exponent);
 // If positive, our array covers all possible positive exponents - ensure its valid
 else if (exponent > MaxDoubleExponent)
 return false;
 else
 result *= Pow10[exponent];
}
// Doubles that underwent scientific notation parsing should be checked for overflow
// (Otherwise, this isn't really a risk we don't expect strings of >308 characters)
return !Double.IsInfinity(result);

Native parser took 21660 ms.

Custom parser took 5623 ms.

Performance gain was 285%

Now that's more competitive!

Alain
  • 472
  • 1
  • 4
  • 17
default

AltStyle によって変換されたページ (->オリジナル) /