I have implemented the atof
function. Here is my implementation
double atof(const char *str)
{
double a; /* the a value in a*10^b */
double decplace; /* number to divide by if decimal point is seen */
double b; /* The b value (exponent) in a*10^b */
int sign = 1; /* stores the sign of a */
int bsign = 1; /* stores the sign of b */
while (*str && isspace(*str))
++str;
if (*str == '+')
++str;
if (*str == '-') {
sign = -1;
++str;
}
if ((*str == 'n' || *str == 'N') &&
(str[1] == 'a' || str[1] == 'A')
&& (str[2] == 'n' || str[2] == 'N'))
return NAN * sign;
if ((*str == 'i' || *str == 'I') && (str[1] == 'n' || str[1] == 'N') &&
(str[2] == 'f' || str[2] == 'F'))
return INFINITY * sign;
for (a = 0; *str && isdigit(*str); ++str)
a = a * 10 + (*str - '0');
if (*str == '.')
++str;
for (decplace = 1.; *str && isdigit(*str); ++str, decplace *= 10.)
a = a * 10. + (*str - '0');
if (*str == 'e' || *str == 'E') {
/* if the user types a string starting from e, make the base be 1 */
if (a == 0)
a = 1;
++str;
if (*str == '-') {
bsign = -1;
++str;
}
if (*str == '+')
++str;
for (b = 0; *str && isdigit(*str); ++str)
b = b * 10 + (*str - '0');
b *= bsign;
}
else
b = 0;
return (a * sign / decplace) * pow(10, b);
}
4 Answers 4
FP overflow
A ridiculously long fractional part causes
decplace
to reachinf
, and the perfectly valid number is converted tonan
.Testing for
NAN
andINF
Using
strncmp
would be much more clear.DRY
The integer conversion loop is repeated at least twice (for whole part and exponent). It is a strong indication that you need to factor it out into a function. You may want to have this function to handle fractional part as well.
Best effort
The function fails to convert certain numbers which
stdlib
version does handle, e.g.char * s ="10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001E-200";
(a ridiculously long whole part compensated by
E-200
exponent). I suppose thatstdlib
version anticipates such inputs and normalizes them prior to conversion.
Bug
If I convert "0e+1"
, I will get back 1
, because of these lines:
/* if the user types a string starting from e, make the base be 1 */ if (a == 0) a = 1;
That check can't just be a == 0
because the part before the 'e'
may be 0 as in my example. One way to fix this is to check if the string begins with an 'e'
before parsing any digits and setting a
to 1 if it does.
Avoid calling pow unless necessary
Most of the time, you won't parse any e+exponent
part, but currently you call pow()
no matter what, which makes your function slower than it could be:
return (a * sign / decplace) * pow(10, b);
You should avoid calling pow()
unless you need it. Here is one way (not necessarily the best way):
return (a * sign / decplace) * ((b == 0) ? 1 : pow(10, b));
Timing tests
Edit: one user questioned whether the pow()
optimization was useful. I ran the original function on the input "55.5"
100000000 times vs the version that avoided calling pow()
. The results:
Original version: 2.55 sec
Modified version: 1.10 sec
I also modified the code a different way to avoid the extra multiply against 1, like this:
if (*str == 'e' || *str == 'E') {
// (All the same code here)
b *= bsign;
return (a * sign / decplace) * pow(10, b);
}
return a * sign / decplace;
This one was even faster:
Second modified version: 1.00 sec
-
2\$\begingroup\$ I'm not sure the
pow
optimization is useful. Also, it clutters the code. \$\endgroup\$heinrich5991– heinrich59912016年02月15日 10:09:08 +00:00Commented Feb 15, 2016 at 10:09 -
\$\begingroup\$ using a lookup table for powers of 10 may be faster than that trick \$\endgroup\$phuclv– phuclv2016年02月15日 10:29:12 +00:00Commented Feb 15, 2016 at 10:29
-
\$\begingroup\$ @heinrich5991 See timing results that I added above. \$\endgroup\$JS1– JS12016年02月15日 18:54:46 +00:00Commented Feb 15, 2016 at 18:54
-
\$\begingroup\$ @LưuVĩnhPhúc My assumption was that the
pow()
case was uncommon. But yes a lookup table could speed up the case where thepow()
is necessary. \$\endgroup\$JS1– JS12016年02月15日 18:55:32 +00:00Commented Feb 15, 2016 at 18:55
Implementing a precise atof function is surprisingly difficult since there is no limit to the number of digits code may need to read to ensure correct rounding. Consider, for example, the following two values:
- 12012345.50000000000000000000000000000000000000000000000000000
- 12012345.50000000000000000000000000000000000000000000000000001
The first should round down to 12012345.0f; the second should round up to 12012346.0f. It's not necessary for the function to perform extensive calculations on the input string; it would sufficient for it to observe whether there are any non-zero digits past a certain point, but the logic to handle all the cases is apt to be tricky.
-
\$\begingroup\$ Glibc does it all with big integers (see.: stdlib/strtod_l.c) others use David Gay’s strtod() (now over 20 years old and still running!) which is optimized for speed and a hard read. \$\endgroup\$deamentiaemundi– deamentiaemundi2016年02月15日 04:30:32 +00:00Commented Feb 15, 2016 at 4:30
-
\$\begingroup\$ Probably digits 1201234 are not necessary in this example. \$\endgroup\$CiaPan– CiaPan2016年02月15日 13:57:21 +00:00Commented Feb 15, 2016 at 13:57
-
1\$\begingroup\$ @CiaPan: The integer part needs to be odd and in a certain range. value needs to be in a certain range, and making digits different makes it easier to see how many there are. Note that 12345678.5 wouldn't work, nor would 23456789.5. \$\endgroup\$supercat– supercat2016年02月15日 15:01:57 +00:00Commented Feb 15, 2016 at 15:01
Overflow with values near
DBL_MAX
. With input like"0.0001e310"
returninf
rather than1.00000e+307
.return (a * sign / decplace) * pow(10, b);
Same code loses precision near
DBL_MIN
and total loss of precision nearDBL_TRUE_MIN
.The same line of code unnecessarily loses precision. To improve #1, 2 and 3, rather than
decplace
being a power of 10, just use integer values and add tob
. This will greatly reduce but not eliminate issues nearDBL_MAX, DBL_MIN,DBL_TRUE_MIN
. There will be less precision loss, but not fully eliminated. IOWs, thedouble
generated will be closer to the best one. [Edit -->] Another simple improvement: Rather than... * pow(10, b)
, use(* ... * pow(5, b)) * pow(2,b)
. This will allow precise calculations near the extremes ofdouble
without undo over/under flow.Quality assessment. Unknown - but should not be. Part of writing a good
atof()
is to test its correctnesses. By renaming this tomy_atof()
and testing against the standardatof()
would be a first step. Then rate how close the two functions are. This post omits any statistics, so I assume none have been done. With any math functions, such an analysis, even crude, should be attempted. Note that such test code can be much more code than thisatof()
and it is work.
If truly interested in improving your algorithm, refine it with by first writing a goodatofloat()
(using onlyfloat
math) and compare that againststrtod()
.I do not see clear consensus about use of sign on
NAN
soreturn NAN * sign;
may or not be needed. Also, that code might not provide result in-NAN
. Consider usingcopysign()
. One of those FP corner cases.Pedantic:
isspace()
expects a value in theunsigned char
(orEOF
). With a signedchar
, the below is a problem with negative values.// while (*str && isspace(*str)) // fix with cast or other means while (*str && isspace((unsigned char) *str))