string to double

Question 1

I'm trying to write a function to replace the standard std::stod(...). I want to parse the following valid inputs:

/[+-]?\d+(?:\.\d+)?(?:e[-+]?\d+)?/g

And also the words infinity, +infinity, -infinity, undefined, +undefined and -undefined. Note, if I remember well, infinity and +infinity are the same underlying values while undefined, +undefined and -undefined are all different NaN values.

So I came up with this:

#include <iostream>
#include <cmath>
#include <limits>
#include <cstring>
using namespace std;
class invalid_number { };
static bool is_digit(char c) {
 // probably faster than `return c >= '0' && c <= '9';`?
 switch (c) {
 case '0': case '1': case '2':
 case '3': case '4': case '5':
 case '6': case '7': case '8':
 case '9': return true;
 default: return false;
 }
}
double safe_pow(double a, double b) {
 if (b < 0) return 1.0 / pow(a, abs(b));
 else return pow(a, b);
}
// parses /[+-]?\d+(?:\.\d+)?(?:e[-+]?\d+)?/g
static double s2d(char * s) {
 bool negative = (s[0] == '-');
 bool positive = (s[0] == '+');
 if (positive || negative) s++;
 // + infinity == infinity
 if (!strcmp(s, "infinity0円")) {
 if (negative) return - numeric_limits<double>::infinity();
 return numeric_limits<double>::infinity();
 }
 // + undefined != undefined != - undefined
 if (!strcmp(s, "undefined0円")) {
 if (negative) return - numeric_limits<double>::quiet_NaN();
 if (positive) return + numeric_limits<double>::quiet_NaN();
 return numeric_limits<double>::quiet_NaN();
 }
 if (!is_digit(* s)) throw invalid_number();
 int64_t x = 0, point = 0, d = 0;
 while (is_digit(* s)) {
 x = (x * 10) + (* s) - '0';
 s++; point++; d++;
 }
 if ((* s) == '.') {
 s++;
 if (!is_digit(* s)) throw invalid_number();
 while (is_digit(* s)) {
 x = x * 10 + (* s) - '0';
 s++; d++;
 }
 } else point = 0;
 if (negative) x *= -1;
 uint64_t e = 0; bool ne = false;
 if ((* s) == 'e') {
 s++;
 if ((* s) == '-') { ne = true; s++; }
 else if ((* s) == '+') s++;
 if (!is_digit(* s)) throw invalid_number();
 while (is_digit(* s)) {
 e = (e * 10) + (* s) - '0';
 s++;
 }
 }
 if (!point) return x;
 point = d - point; 
 if (ne) point += e; else point -= e;
 return x / safe_pow(10, point);
}
int main(int argc, char * argv[]) {
 cout << "test number: ";
 string number;
 getline(cin, number);
 /* test cases:
 -537e+4
 +20
 34.3e-10
 2e5
 -5e3
 -2
 -22.27e0
 */
 try {
 cout << s2d((char *)number.c_str()) << endl;
 } catch (invalid_number & inv) {
 cout << "invalid number" << endl;
 }
 return 0;
}

Question 2

Why are you putting an extra 0円 in your strings? Note that in the comparisons that doesn't do anything since the compare stops at the first 0円.

Question 3

I'd expect ".125" and "123." to be acceptable, yet it appears /[+-]?\d+(?:\.\d+)?(?:e[-+]?\d+)?/g does not allow.

Question 4

undefined is not certainly different from both +undefined and -undefined.

Question 5

@chux-ReinstateMonica, last time I checked, the IEEE 754 standard was the same for c, c++ and javascript... Also, there are several undefined type of values that indicate different things like 0 / 0 should be NaN different than inf / inf. Anyways, now the code works.

Question 6

@PavloSlavynskyy now the code works. sorry for wasting you time, now it's ready to be reviewed

Question 7

Edge cases

Even though the mathematical quotient may be in the double range, safe_pow(10, point) may be outside the double range.

An alternative scales by 5, then 2

// return x / safe_pow(10, point);
x *= pow(5, -point);
x *= pow(2, -point); // see also scalbn()
return x;

Avoid premature optimization

// probably faster than return c >= '0' && c <= '9'; is simply not supported. The switch statement could as well by 10x slower. Look to isdigit().

Nothing safer about safe_pow()

Instead:

double safe_pow(double a, double b) {
 // if (b < 0) return 1.0 / pow(a, abs(b));
 // else return pow(a, b);
 return pow(a, b);
}

No overflow protection

x = (x * 10) + (* s) - '0'; may overflow on the 19th/20th digit.

Consider using double math here instead of int64_t.

It works now - or does it?

Also test with the string equivalent of DBL_MAX, DBL_MIN, DBL_TRUE_MIN (or the C++ counterparts). See output of this

 "1.7976931348623157e+308" "179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0"
 "2.2250738585072014e-308" "0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002225073858507201383090232717332404064219215980462331830553327416887204434813918195854283159012511020564067339731035811005152434161553460108856012385377718821130777993532002330479610147442583636071921565046942503734208375250806650616658158948720491179968591639648500635908770118304874799780887753749949451580451605050915399856582470818645113537935804992115981085766051992433352114352390148795699609591288891602992641511063466313393663477586513029371762047325631781485664350872122828637642044846811407613911477062801689853244110024161447421618567166150540154285084716752901903161322778896729707373123334086988983175067838846926092773977972858659654941091369095406136467568702398678315290680984617210924625396728515625"
"-4.9406564584124654e-324" "-0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625"

Use const

For greater application.

// double s2d(char * s) {
double s2d(const char * s) {

IEEE 754

OP commented "IEEE 754 standard was the same for c, c++ and javascript." C++ does not specify adherence to IEEE 754. Better to avoid assuming that.

"-0.0"

Code can properly return -0.0 - good. Better to do the x *= -1 after the power-of-10 scaling though to always preserve the sign.

Curious code

Two different negation approaches. I'd expect more similar code.

if (negative) x *= -1;
...
if (ne) point += e; else point -= e;

Minor: Caseless e

I'd expect 'E' to work as well as 'e'.

Deeper

String to double is a difficult function to get high quality results without extended bit-width math. Conversion from int64_t to double, prior to the multiplication/division can lose many bits. pow() is prone to losing about 1-1.5 bits. I'd expected overall precision to be within 2 bits of the best answer (aside from overflow issues) - not too bad for basic code.
IEEE Std 754-2019 discusses how many significant decimal digits should be read before effectively assuming the rest are 0 (even if not) as H. H >= M + 3 where M is 17 for double, the number needed to round-trip double -> text -> double. To encode all 20 significant decimal digit takes uint67_t or wider. Thus a quality conversion obliges better than int64_t math. For now, I recommend to settle for a double x in the significant calculation and its inherent imprecision. Later, when ready for a better conversions, look to extended math.

Question 8

@Cristian Looked it up. The double significant parsing should handle at least 20 decimal digits to meet IEEE Std 754-2019 standards.

Question 9

Identifiers

Identifiers should be meaningful. What's s? What's d? What's e? Maybe you should name them str, digits_count (or at least n_digits) and exponent or something like that?

The only exceptions are i and j as common loop variables.

One line - one statement

Don't put several statements in a single line.

if( condition )
 statement1;
else
 statement2;

is much more readable.

`using namespace std;` is bad

It's OKish while you're writing a code like this - 50 lines in a single file; but C++ is intended for bigger projects, and mixing names from different namespaces can be at least misleading, at most dangerous.

Use standard exceptions

There's a std::invalid_argument exception type in <stdexcept>. You can use it or derive your own class from it. Just invalid_number is unclear.

`is_digit` is replacing default `std::isdigit`

Is this intended to be so? The same for safe_pow and std::pow - it works with negative values fine, your problem is somewhere else.

At this point I've run your code - and it still fails on the first example, -537e+4 should be -5370000, not -537.

Question 10

oops, I missed that. Good thing you found it, so I can fix it again :)

Question 11

@Cristian OK to post your own answer with that functional fix (and other improvements) - or post another review . (Tip: If posting a new review, give it a week for this one to get fully vetted)

Question 12

Don’t write using namespace std;.

You can, however, in a CPP file (not H file) or inside a function put individual using std::string; etc. (See SF.7.)

if (!strcmp(s, "undefined0円")) {

First, a lexical string literal already contains a terminating 0円. If you write "x" you get a const char unnamed_variable [2] = {'x','0円'};

Second, don't use strcmp in C++. It is confusing and difficult to use, and very inefficient compared to comparisons that know the length up front. (To be fair, sometimes I can get the compiler to optimize it when experimenting with Compiler Explorer, sometimes it will not.) Use string_view literals so you can write:

using namespace std::literals; // at the top of your CPP file
 ⋮
if (s == "undefined"sv) {

static double s2d(char * s) {

It should raise a 🚩 big 🚩red 🚩 flag 🚩 that you must cast the argument to char* any time you want to use it, including the result of string::c_str() and a lexical string literal. Where else does a "normal" string come from? If your function is incompatible with these, there's something fundimentally wrong with it.

The type should be const char* to fix this problem.

But, you are not writing in C here, but C++. You want to pass in a std::string (as seen from your test code), as well as pass in a lexical string literal (like "1.23"). To handle both of these and more efficiently, write your functions to take a std::string_view (by value).

With a string_view you can still use subscript notation, or get a pointer to the underlying characters, as well as getting most of the API of the string class.

Why is s2d defined as static? That means it will only be seen by this translation unit, which doesn't matter in a simple program that only has one CPP file, but in real code I expect this would be in a library you actually expect to call from somewhere.

Meanwhile, the helper function safe_pow is not, so you're not just making everything static.

const

Besides the missing const on your argument type, you don't use it at all anywhere in your post. You should use it a lot. For example,

 const bool negative = (s[0] == '-');
 const bool positive = (s[0] == '+');

Generally, make a variable const if you can.

As for declarations in general,

int64_t x = 0, point = 0, d = 0;
It's good to see that you didn't put this at the top of the function, but where you were ready to start using them. However, it is strongly the idiom in C++ to only make one declaration per statement, and they should be on separate lines.

Test Code

You can write your test code to not require you to type each example every time you run it! And, you can automate the testing that it got the right answer. Take a look at a testing framework such as Catch2.

Rather than prompting for input, turn your comment into useful data:

constexpr string_view testdata[]= {
 "-537e+4",
 "+20",
 "34.3e-10",
 "2e5",
 "-5e3",
 "-2",
 "-22.27e0" };
for (auto s : testdata) {
 const double result= s2d(s);
 // compare result against library's parser or other data
 cout << result << '\n'; // until it's automated
}

chux chuxchux 36.4k2 gold badges43 silver badges96 bronze badges · Accepted Answer · 2021-08-11 01:17:56Z

Edge cases

Even though the mathematical quotient may be in the double range, safe_pow(10, point) may be outside the double range.

An alternative scales by 5, then 2

// return x / safe_pow(10, point);
x *= pow(5, -point);
x *= pow(2, -point); // see also scalbn()
return x;

Avoid premature optimization

// probably faster than return c >= '0' && c <= '9'; is simply not supported. The switch statement could as well by 10x slower. Look to isdigit().

Nothing safer about safe_pow()

Instead:

double safe_pow(double a, double b) {
 // if (b < 0) return 1.0 / pow(a, abs(b));
 // else return pow(a, b);
 return pow(a, b);
}

No overflow protection

x = (x * 10) + (* s) - '0'; may overflow on the 19th/20th digit.

Consider using double math here instead of int64_t.

It works now - or does it?

Also test with the string equivalent of DBL_MAX, DBL_MIN, DBL_TRUE_MIN (or the C++ counterparts). See output of this

 "1.7976931348623157e+308" "179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0"
 "2.2250738585072014e-308" "0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002225073858507201383090232717332404064219215980462331830553327416887204434813918195854283159012511020564067339731035811005152434161553460108856012385377718821130777993532002330479610147442583636071921565046942503734208375250806650616658158948720491179968591639648500635908770118304874799780887753749949451580451605050915399856582470818645113537935804992115981085766051992433352114352390148795699609591288891602992641511063466313393663477586513029371762047325631781485664350872122828637642044846811407613911477062801689853244110024161447421618567166150540154285084716752901903161322778896729707373123334086988983175067838846926092773977972858659654941091369095406136467568702398678315290680984617210924625396728515625"
"-4.9406564584124654e-324" "-0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625"

Use const

For greater application.

// double s2d(char * s) {
double s2d(const char * s) {

IEEE 754

OP commented "IEEE 754 standard was the same for c, c++ and javascript." C++ does not specify adherence to IEEE 754. Better to avoid assuming that.

"-0.0"

Code can properly return -0.0 - good. Better to do the x *= -1 after the power-of-10 scaling though to always preserve the sign.

Curious code

Two different negation approaches. I'd expect more similar code.

if (negative) x *= -1;
...
if (ne) point += e; else point -= e;

Minor: Caseless e

I'd expect 'E' to work as well as 'e'.

Deeper

String to double is a difficult function to get high quality results without extended bit-width math. Conversion from int64_t to double, prior to the multiplication/division can lose many bits. pow() is prone to losing about 1-1.5 bits. I'd expected overall precision to be within 2 bits of the best answer (aside from overflow issues) - not too bad for basic code.
IEEE Std 754-2019 discusses how many significant decimal digits should be read before effectively assuming the rest are 0 (even if not) as H. H >= M + 3 where M is 17 for double, the number needed to round-trip double -> text -> double. To encode all 20 significant decimal digit takes uint67_t or wider. Thus a quality conversion obliges better than int64_t math. For now, I recommend to settle for a double x in the significant calculation and its inherent imprecision. Later, when ready for a better conversions, look to extended math.

@Cristian Looked it up. The double significant parsing should handle at least 20 decimal digits to meet IEEE Std 754-2019 standards.

Stack Exchange Network

string to double

3 Answers 3

Identifiers

One line - one statement

`using namespace std;` is bad

Use standard exceptions

`is_digit` is replacing default `std::isdigit`

const

Test Code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

string to double

3 Answers 3

Identifiers

One line - one statement

using namespace std; is bad

Use standard exceptions

is_digit is replacing default std::isdigit

const

Test Code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

`using namespace std;` is bad

`is_digit` is replacing default `std::isdigit`