Search code, repositories, users, issues, pull requests...


 Proposal for a reference string to number

c0f38d4


 Merge branch 'master' of https://github.com/jalvesz/stdlib

f0b80ae


 str2num: forgot to include file in CMakeLists.txt

94cc5df

Copy link

klausler commented Nov 2, 2023 •

edited

Loading

I ran an exhaustive test over the full range of 32-bit IEEE-754 numbers and observed many discrepancies in the real values read by str2float vs. the values read via internal input.

program test_real4s
 use stdlib_str2num, only: str2float
 implicit none
 character(112) :: buff
 integer(4) :: ival
 real(4) :: rval, rcheck
 equivalence(ival, rval)
 integer(8) :: j, passes, fails, tests
 passes = 0
 fails = 0
 tests = 0
 do j = 0, int(z'ffffffff',8)
 ival = j
 if (iand(ival, z'7f800000') == int(z'7f800000') .and. &
 iand(ival, z'007fffff') /= 0) then
 cycle ! skip NaNs
 end if
 tests = tests + 1
 write(buff, 1) rval
1 format(E112.105)
 read(buff, 1) rcheck
 if (rval /= rcheck) then
 print 2, ival, rval, rcheck, rval, rcheck
2 format('I/O library failure: ival ',z8,' rval ',z8,' rcheck ',z8, &
 ';',2(1x,E112.105))
 error stop
 end if
 rcheck = str2float(buff)
 if (rval == rcheck) then
 passes = passes + 1
 else
 print 3, rval, rcheck, rval, rcheck
3 format('str2float failure: rval ',z8,' rcheck ',z8, &
 /' want ',E112.105,/' got ',E112.105)
 end if
 end do
 print *, passes, ' total str2float passes out of ', tests, ' tests'
 print *, fails, ' total str2float failures out of ', tests, ' tests'
end

Copy link

Contributor Author

jalvesz commented Nov 2, 2023 •

edited

Loading

Thanks for pointing it out and the test @klausler, I saw the problem with the small values!!

So, with the current strategy, this value for instance:
0.140129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125E-44

Would be read first with an integer containing:
int_wp = 14012984
the variable resp = 9, contains the decimal places that should be applied first to get 0.14012984 and the exponent integer cotains the correct value
sige*max(0,i_exp) = -44
But, since the total exponent is computed from
expbase(nwnb-1+resp-sige*max(0,i_exp))
which can not hold 1e-53 that gives an underflow, the strategy doesn't work properly here...

Thinking how to make sure to manage those limit cases without degrading the performance

Copy link

Contributor Author

jalvesz commented Nov 2, 2023 •

edited

Loading

For the edge cases, something like this could work:

exp_aux = nwnb-1+resp-sige*max(0,i_exp)
if( exp_aux>0 .and. exp_aux<=nwnb+nfnb ) then
 v = sign*int_wp*expbase(exp_aux)
 else if(exp_aux>nwnb+nfnb) then
 v = sign*int_wp*fractional_base(exp_aux-(nwnb+nfnb))*expbase(nwnb+nfnb)
 end if

Tried on
"0.462428493227189633404830762485672323322486440819250204679832533683631057286333998490590602159500122070312E-43"

With ifort and ifx the error is below acceptable tolerance, with gfortran I get:
formatted read : 0.462428493E-43
str2real : 0.448415509E-43

Copy link

klausler commented Nov 2, 2023

For the edge cases, something like this could work:
exp_aux = nwnb-1+resp-sige*max(0,i_exp)
if( exp_aux>0 .and. exp_aux<=nwnb+nfnb ) then
 v = sign*int_wp*expbase(exp_aux)
 else if(exp_aux>nwnb+nfnb) then
 v = sign*int_wp*fractional_base(exp_aux-(nwnb+nfnb))*expbase(nwnb+nfnb)
 end if
Tried on "0.462428493227189633404830762485672323322486440819250204679832533683631057286333998490590602159500122070312E-43"

With ifort and ifx the error is below acceptable tolerance, with gfortran I get: formatted read : 0.462428493E-43 str2real : 0.448415509E-43

This test is reading the results of binary->decimal conversions that have enough decimal digits to be exact. The values read back must match the original binary values exactly; there is no "acceptable tolerance".

Copy link

Contributor Author

jalvesz commented Nov 2, 2023

Well, the problem being that even this is not exact

use iso_fortran_env
real :: x
x = 10._real32**(-38)
print *, x

gfortran : 9.99999935E-39
ifx : 9.9999994E-39

and it get worst with

x = 10._real32**(-44)

gfortran : 9.80908925E-45
ifx : 9.8090893E-45

Anyhow, fixed the problem. It was coming from the table of exponents which did not accurately defined the 10 basis. Changed for the following:

integer(kind=ikind), parameter :: nwnb = 39 !> number of whole number factors
integer(kind=ikind), parameter :: nfnb = 37 !> number of fractional number factors
integer :: e
real(wp), parameter :: whole_number_base(nwnb) = [(10._wp**(nwnb-e),e=1,nwnb)]
real(wp), parameter :: fractional_base(nfnb) = [(10._wp**(-e),e=1,nfnb)]
real(wp), parameter :: expbase(nwnb+nfnb) = [whole_number_base, fractional_base]

Now I get:

gfortran
input : "0.462428493227189633404830762485672323322486440819250204679832533683631057286333998490590602159500122070312E-43"
formatted read : 0.462428493E-43
str2real : 0.462428493E-43

ifort
input : "0.462428493227189633404830762485672323322486440819250204679832533683631057286333998490590602159500122070312E-43"
formatted read : .4624285E-43
str2real : .4624285E-43

Copy link

klausler commented Nov 2, 2023

If you're using negative powers of ten in your algorithm, it's never going to be exact.

Copy link

Contributor Author

jalvesz commented Nov 2, 2023

I know, but I haven't found a better approach to be fast and accurate. In any case you can see in the last results that both the formatted read and the proposed algorithm give the same output to the last decimal place.

Previously I was defining the exponents table with "1eN", this thread https://stackoverflow.com/questions/47337262/defining-exponent-in-fortran hinted me to using "10._wp**N" which did indeed improve the precision.

I'll update when finishing more tests.

jalvesz added 3 commits

November 2, 2023 19:11


 update interface str2num > to_num( ) using mold

001ef27


 Fix: elemental attribute lost in refactoring

1f44350


 hexadecimal def of Inf and NaN

90a8b98

Copy link

Contributor Author

jalvesz commented Nov 2, 2023

Changed the interface according to the discussions in the forum asking for something like "r = to_num( s , mold = r )" (the mold argument is there just to disambiguate the function call).

@klausler I'm rerunning your tests loosening the error check by

if (abs(rval - rcheck)<=10*epsilon(0.0) * abs(rcheck)) then

It has been running for quite a while on 16threads and so far no error messages, don't know when it will finish, I'll update fully then... a relative error to 10*epsilon(0._wp) is already quite good I would say. If there is a different method that would enable both speed and better accuracy I would gladly try to adapt it.

Copy link

klausler commented Nov 2, 2023

How much faster is this inexact algorithm than an internal READ statement with a good Fortran compiler?

Copy link

Contributor Author

jalvesz commented Nov 2, 2023 •

edited

Loading

Between 30x to 70x depending on the hardware and compiler (using -O3 -march=native)
I've done all the tests with gfortran 11 / 12 / 13 and ifort19 / ifort21 / ifx23

Copy link

klausler commented Nov 2, 2023

Using a variant of my test from above, sweeping over all the ca. 4B valid 32-bit IEEE-754 numbers, I measured that it takes 4795 total CPU seconds to just do all the internal writes, 2107 additional CPU seconds to also do all the internal reads, or 526 additional CPU seconds to also run str2num instead of using internal reads. That's a speed-up of 4x, which is not 70x but would still be great if the results were the same. (This is with f18 ("llvm flang"), which has pretty fast binary<->decimal conversions.)

Copy link

Contributor Author

jalvesz commented Nov 3, 2023 •

edited

Loading

Just saw the tests were finish, the only errors prompted were

str2float failure: rval 7F800000 rcheck 0
 want Infinity
 got 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000E+00
str2float failure: rval FF800000 rcheck 0
 want -Infinity
 got 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000E+00
 4278190080 total str2float passes out of 4278190082 tests
 0 total str2float failures out of 4278190082 tests

I did get those speed ups with str2double (now just to_num) ... I'll double check speedups for real32

One idea for coming across the precision issue would be to store the exponents in double precision. As the multiplication is done just once at the end, it should no add much penalty. I'll try that out.

So, with the exponents table in single precision:

input : "0.123456789123456789123456789123456789"
formatted read : 0.123456791
str2real : 0.123456776
difference abs : -0.149011612E-7
difference rel : -0.120699415E-4%

input : "0.589575869907087640261923735717316179460653520220192613836518827249684807090268634510721312835812568664551E-38"
formatted read : 0.589575870E-38
str2real : 0.589575730E-38
difference abs : -0.140129846E-44
difference rel : -0.237679069E-4%

exponents in double precision

input : "0.123456789123456789123456789123456789"
formatted read : 0.123456791
str2real : 0.123456784
difference abs : -0.745058060E-8
difference rel : -0.603497074E-5%

input : "0.589575869907087640261923735717316179460653520220192613836518827249684807090268634510721312835812568664551E-38"
formatted read : 0.589575870E-38
str2real : 0.589575870E-38
difference abs : 0.00000000
difference rel : 0.00000000%


 Mix real64 exponents for reading real32

ceb0b6c

Copy link

Contributor Author

jalvesz commented Nov 3, 2023

Mixing the precisions gets the error down to epsilon(0.0) for real32

I'm kind of puzzled by the NaN assignment. I did define a parameter real(wp), parameter :: rNaN = real(z'7fc00000',wp) which is assigned if "N" if found

else if( iachar(s(p:p)) == NaN ) then
 v = rNaN; return

But a simple print returns 0.00000000 instead (same for Inf) ... any clues ?


 parametrized max depth to read number

64dd1c0

Copy link

Contributor Author

jalvesz commented Nov 3, 2023 •

edited

Loading

@klausler reran your tests with the latest version and got the error down to zero. No prompts other than the Infinity and -Infinity with the condition on if (rval == rcheck) then ... have yet to understand why the parameters are not being correctly assigned. Would you mind cross-validating? With the new interface just change

use stdlib_str2num, only: to_num
...
rcheck = to_num(buff,rcheck)

Edit: found and fixed the issue with Inf and NaN


 Bugfix with Inf and NaN

3ca78a9

@jvdp1

Copy link

Member

jvdp1 commented Nov 3, 2023

Mixing the precisions gets the error down to epsilon(0.0) for real32

I'm kind of puzzled by the NaN assignment. I did define a parameter real(wp), parameter :: rNaN = real(z'7fc00000',wp) which is assigned if "N" if found
else if( iachar(s(p:p)) == NaN ) then
 v = rNaN; return
But a simple print returns 0.00000000 instead (same for Inf) ... any clues ?

If possible I suggest to use the ones provided by ieee_arithmetic as in other stdlib module (e.g. in the stats mean module )

Copy link

Contributor Author

jalvesz commented Nov 3, 2023 •

edited

Loading

@jvdp1 by any chance do you know why there are those two fails in the build? when I look at the logs, they do not seem to be related to the current PR.

@jvdp1

Copy link

Member

jvdp1 commented Nov 4, 2023

@jvdp1 by any chance do you know why there are those two fails in the build? when I look at the logs, they do not seem to be related to the current PR.

No idea. But it seems unrelated to this PR.

Copy link

Contributor Author

jalvesz commented Nov 5, 2023

@klausler after verifying that your tests runs successfully without any errors with strict equivalence, I adapted it to check for the speedups. I saw values oscillating between 6x and 13x stabilizing around 8x for the real32 reader (to_num) compared with the formatted read.
Compiler: gfortran 13.2
Options: -O3 -march=native
OS: WSL Ubuntu 20.04
CPU: Intel Xeon Gold @ 2.9GHz

Copy link

Contributor Author

jalvesz commented Nov 14, 2023

@jvdp1 regarding this:

If possible I suggest to use the ones provided by ieee_arithmetic as in other stdlib module (e.g. in the stats mean module )

I have just one remark:
Defining a parameter variable should in principle offer better optimization compared to having to call the function ieee_value(1._wp, ieee_positive_inf) which is not an intrinsic function. Thus one cannot do something like:
real(wp), parameter :: Inf = ieee_value(1._wp, ieee_positive_inf) (at least not with gfortran or ifort). One can only evaluate within the implementation. Then, I'm not sure how this would behave with all the compilers out there, and if it is an actual concern?

@jvdp1

Copy link

Member

jvdp1 commented Nov 19, 2023

I have just one remark: Defining a parameter variable should in principle offer better optimization compared to having to call the function ieee_value(1._wp, ieee_positive_inf) which is not an intrinsic function. Thus one cannot do something like: real(wp), parameter :: Inf = ieee_value(1._wp, ieee_positive_inf) (at least not with gfortran or ifort). One can only evaluate within the implementation. Then, I'm not sure how this would behave with all the compilers out there, and if it is an actual concern?

There have been many discussions about NaN and Inf, see e.g., #128 . As I remember the main issue was portability.


 use ieee_arithmetic for Inf and NaN

36ca79a

jalvesz and others added 5 commits

January 2, 2024 23:09


 enable testing for xdp

03c1693


 missing xdp kind declaration

03b4d60


 include xdp test to testsuite

44ecace


 Update doc/specs/stdlib_str2num.md

c357fa2

Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>


 Update doc/specs/stdlib_str2num.md

cbc05b2

Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>

jvdp1

jvdp1 approved these changes

Jan 7, 2024

Copy link

Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some final? comments, mainly related to FORD and the generation of the specs.

doc/specs/stdlib_str2num.md Outdated Show resolved Hide resolved

ci/fpm-deployment.sh Outdated Show resolved Hide resolved

src/stdlib_str2num.fypp Outdated Show resolved Hide resolved

src/stdlib_str2num.fypp Show resolved Hide resolved

src/stdlib_str2num.fypp Outdated Show resolved Hide resolved

@jvdp1 jvdp1 requested a review from a team

January 7, 2024 15:02

jalvesz and others added 6 commits

January 7, 2024 16:08


 Update src/stdlib_str2num.fypp

026a22c

Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>


 Update src/stdlib_str2num.fypp

acefb99

Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>


 Update src/stdlib_str2num.fypp

89f0015

Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>


 autodoc and examples for str2num improvement

fbdf870


 cmake build str2num examples

cfb95da

Minor changes related to fypp prepocessing #758


 fix fpm processing for maxrank=4

e45b884

@jvdp1 jvdp1 mentioned this pull request

Jan 7, 2024

Merged

jvdp1

jvdp1 reviewed

Jan 9, 2024

ci/fpm-deployment.sh Outdated Show resolved Hide resolved

jalvesz and others added 2 commits

January 9, 2024 20:11


 Merge branch 'master' into master

62f2d0b


 rollback prune command in fpm deployment

7022d83

Copy link

Contributor Author

jalvesz commented Jan 10, 2024

Is it possible to re-launch the build checks without a commit? I've seen a few times here that the macos-latest build fails after staying idle for hours. It doesn't seem to be related to the changes in the PR.

@jvdp1

Copy link

Member

jvdp1 commented Jan 10, 2024

Is it possible to re-launch the build checks without a commit? I've seen a few times here that the macos-latest build fails after staying idle for hours. It doesn't seem to be related to the changes in the PR.

Done! Indeed, it seems related to the CI, instead of stdlib itself

Copy link

Contributor Author

jalvesz commented Jan 11, 2024

Hi, is there something else that should be done before the PR can be merged?

@jvdp1

Copy link

Member

jvdp1 commented Jan 12, 2024

IMO this PR sounds good and ready to be merged. But it would be good to have at least one additional reviewer (maybe @klausler @milancurcic @gnikit @awvwgk ?)

gnikit

gnikit approved these changes

Jan 15, 2024

Copy link

Member

@gnikit gnikit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @jalvesz. This would be an amazing addition to stdlib.
I had a read over the added files and they look good to me.
I would be keen to merge this and make a minor stdlib release, unless others have any objections.

milancurcic

milancurcic approved these changes

Jan 15, 2024

Copy link

Member

@milancurcic milancurcic left a comment •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR, looks good. Only minor nit-picks in comments.

Not a big deal, but it would be nice if the style was consistent within the module, e.g. the whitespaces around operators, some are there, some aren't.

Another one is the use of semicolons to put multiple statements on one line. I think we should just not use those at all.

src/stdlib_str2num.fypp Outdated Show resolved Hide resolved

src/stdlib_str2num.fypp

if(present(stat)) stat = err

end function

#:endfor

Copy link

Member

@milancurcic milancurcic Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have an interface to a shallow wrapper that takes string_type as input. Better integration with the rest of stdlib.

Copy link

Contributor Author

@jalvesz jalvesz Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't used the string_type from stdlib so didn't think about that. Maybe after merging the PR we could take a look at that?

Copy link

Member

@milancurcic milancurcic Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, since it's an additional feature, can be a future PR.

src/stdlib_str2num.fypp Outdated Show resolved Hide resolved

jalvesz added 2 commits

January 15, 2024 19:36


 rename functions, remove semicolons

fa8bdad


 add some white spaces

3b50f6a

@jvdp1

Copy link

Member

jvdp1 commented Jan 16, 2024

I would be keen to merge this and make a minor stdlib release, unless others have any objections.

@gnikit I have no objections to that.

@ivan-pi

Copy link

Member

ivan-pi commented Jan 17, 2024

@jvdp1, I think you can go ahead and merge.

@jvdp1

Copy link

Member

jvdp1 commented Jan 17, 2024

thank you @jalvesz for this PR, and all participants for useful discussions and reviews. As suggested I will merge it this PR.

@jvdp1 jvdp1 merged commit 8c7461d into fortran-lang:master

Jan 17, 2024