`atof` implementation

Question 1

I have implemented the atof function. Here is my implementation

double atof(const char *str)
{
 double a; /* the a value in a*10^b */
 double decplace; /* number to divide by if decimal point is seen */
 double b; /* The b value (exponent) in a*10^b */
 int sign = 1; /* stores the sign of a */
 int bsign = 1; /* stores the sign of b */
 while (*str && isspace(*str))
 ++str;
 if (*str == '+')
 ++str;
 if (*str == '-') {
 sign = -1;
 ++str;
 }
 if ((*str == 'n' || *str == 'N') && 
 (str[1] == 'a' || str[1] == 'A') 
 && (str[2] == 'n' || str[2] == 'N'))
 return NAN * sign;
 if ((*str == 'i' || *str == 'I') && (str[1] == 'n' || str[1] == 'N') && 
 (str[2] == 'f' || str[2] == 'F'))
 return INFINITY * sign;
 for (a = 0; *str && isdigit(*str); ++str)
 a = a * 10 + (*str - '0');
 if (*str == '.')
 ++str;
 for (decplace = 1.; *str && isdigit(*str); ++str, decplace *= 10.)
 a = a * 10. + (*str - '0');
 if (*str == 'e' || *str == 'E') {
 /* if the user types a string starting from e, make the base be 1 */
 if (a == 0)
 a = 1;
 ++str;
 if (*str == '-') {
 bsign = -1;
 ++str;
 }
 if (*str == '+')
 ++str;
 for (b = 0; *str && isdigit(*str); ++str)
 b = b * 10 + (*str - '0');
 b *= bsign;
 }
 else
 b = 0;
 return (a * sign / decplace) * pow(10, b);
}

Question 2

FP overflow

A ridiculously long fractional part causes decplace to reach inf, and the perfectly valid number is converted to nan.
Testing for NAN and INF

Using strncmp would be much more clear.
DRY

The integer conversion loop is repeated at least twice (for whole part and exponent). It is a strong indication that you need to factor it out into a function. You may want to have this function to handle fractional part as well.

Best effort

The function fails to convert certain numbers which stdlib version does handle, e.g.

 char * s ="10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001E-200";

(a ridiculously long whole part compensated by E-200 exponent). I suppose that stdlib version anticipates such inputs and normalizes them prior to conversion.

Question 3

Bug

If I convert "0e+1", I will get back 1, because of these lines:

 /* if the user types a string starting from e, make the base be 1 */
 if (a == 0)
 a = 1;

That check can't just be a == 0 because the part before the 'e' may be 0 as in my example. One way to fix this is to check if the string begins with an 'e' before parsing any digits and setting a to 1 if it does.

Avoid calling pow unless necessary

Most of the time, you won't parse any e+exponent part, but currently you call pow() no matter what, which makes your function slower than it could be:

return (a * sign / decplace) * pow(10, b);

You should avoid calling pow() unless you need it. Here is one way (not necessarily the best way):

return (a * sign / decplace) * ((b == 0) ? 1 : pow(10, b));

Timing tests

Edit: one user questioned whether the pow() optimization was useful. I ran the original function on the input "55.5" 100000000 times vs the version that avoided calling pow(). The results:

Original version: 2.55 sec
Modified version: 1.10 sec

I also modified the code a different way to avoid the extra multiply against 1, like this:

if (*str == 'e' || *str == 'E') {
 // (All the same code here)
 b *= bsign;
 return (a * sign / decplace) * pow(10, b);
}
return a * sign / decplace;

This one was even faster:

Second modified version: 1.00 sec

Question 4

I'm not sure the pow optimization is useful. Also, it clutters the code.

Question 5

using a lookup table for powers of 10 may be faster than that trick

Question 6

@heinrich5991 See timing results that I added above.

Question 7

@LưuVĩnhPhúc My assumption was that the pow() case was uncommon. But yes a lookup table could speed up the case where the pow() is necessary.

Question 8

Implementing a precise atof function is surprisingly difficult since there is no limit to the number of digits code may need to read to ensure correct rounding. Consider, for example, the following two values:

12012345.50000000000000000000000000000000000000000000000000000
12012345.50000000000000000000000000000000000000000000000000001

The first should round down to 12012345.0f; the second should round up to 12012346.0f. It's not necessary for the function to perform extensive calculations on the input string; it would sufficient for it to observe whether there are any non-zero digits past a certain point, but the logic to handle all the cases is apt to be tricky.

Question 9

Glibc does it all with big integers (see.: stdlib/strtod_l.c) others use David Gay’s strtod() (now over 20 years old and still running!) which is optimized for speed and a hard read.

Question 10

Probably digits 1201234 are not necessary in this example.

Question 11

@CiaPan: The integer part needs to be odd and in a certain range. value needs to be in a certain range, and making digits different makes it easier to see how many there are. Note that 12345678.5 wouldn't work, nor would 23456789.5.

Question 12

Overflow with values near DBL_MAX. With input like "0.0001e310" return inf rather than 1.00000e+307.
```
return (a * sign / decplace) * pow(10, b);
```
Same code loses precision near DBL_MIN and total loss of precision near DBL_TRUE_MIN.
The same line of code unnecessarily loses precision. To improve #1, 2 and 3, rather than decplace being a power of 10, just use integer values and add to b. This will greatly reduce but not eliminate issues near DBL_MAX, DBL_MIN,DBL_TRUE_MIN. There will be less precision loss, but not fully eliminated. IOWs, the double generated will be closer to the best one. [Edit -->] Another simple improvement: Rather than ... * pow(10, b), use (* ... * pow(5, b)) * pow(2,b). This will allow precise calculations near the extremes of double without undo over/under flow.
Quality assessment. Unknown - but should not be. Part of writing a good atof() is to test its correctnesses. By renaming this to my_atof() and testing against the standard atof() would be a first step. Then rate how close the two functions are. This post omits any statistics, so I assume none have been done. With any math functions, such an analysis, even crude, should be attempted. Note that such test code can be much more code than this atof() and it is work.
If truly interested in improving your algorithm, refine it with by first writing a good atofloat() (using only float math) and compare that against strtod().
I do not see clear consensus about use of sign on NAN so return NAN * sign; may or not be needed. Also, that code might not provide result in -NAN. Consider using copysign(). One of those FP corner cases.
Pedantic: isspace() expects a value in the unsigned char (or EOF). With a signed char, the below is a problem with negative values.
```
// while (*str && isspace(*str))
// fix with cast or other means
while (*str && isspace((unsigned char) *str))
```

vnp vnp 58.7k4 gold badges55 silver badges144 bronze badges · Accepted Answer · 2016-02-14 22:59:01Z

FP overflow

A ridiculously long fractional part causes decplace to reach inf, and the perfectly valid number is converted to nan.
Testing for NAN and INF

Using strncmp would be much more clear.
DRY

The integer conversion loop is repeated at least twice (for whole part and exponent). It is a strong indication that you need to factor it out into a function. You may want to have this function to handle fractional part as well.

Best effort

The function fails to convert certain numbers which stdlib version does handle, e.g.

 char * s ="10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001E-200";

(a ridiculously long whole part compensated by E-200 exponent). I suppose that stdlib version anticipates such inputs and normalizes them prior to conversion.

Stack Exchange Network

`atof` implementation

4 Answers 4

Bug

Avoid calling pow unless necessary

Timing tests

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

`atof` implementation

4 Answers 4

Bug

Avoid calling pow unless necessary

Timing tests

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions