musl - an implementation of the standard library for Linux-based systems

path: root/src/math
Age | Commit message | Author | Lines
2024-08-14 | remove incorrect comment regarding powl exceptional cases | Rich Felker | -8/+0
the comment does not match the required or actual behavior when x<0 and y is not an integer. while it could be corrected, the role of comments here is to tell about characteristics unique to the implementation, not to restate the requirements of the standard, so just removing it seems best.
2024-03-14 | math: fix fma(x,y,0) when x*y rounds to -0 | Szabolcs Nagy | -1/+1
if x!=0, y!=0, z==0 then fma(x,y,z) == x*y in all rounding modes, while adding z can ruin the sign of 0 if x*y rounds to -0.
2024-02-29 | riscv32: add fenv and math | Stefan O'Rear | -0/+180
These are identical to riscv64.
2024-02-03 | sqrtl: fix invalid use of a non-constant-expression as static initializer | Rich Felker | -2/+2
having these constants be static was unnecessary, so just remove the static. this error should have been caught by compilers, but recent versions of both gcc and clang accept these as "other forms of constant expressions" which the C standard allows.
2023-08-19 | math: fix ld80 powl(x,huge) and powl(LDBL_MAX,small) | Szabolcs Nagy | -13/+21
powl used >= LDBL_MAX as an infinity check, but LDBL_MAX is finite, so this could give wrong results, e.g. powl(LDBL_MAX, 0.5) returned inf and powl(2, LDBL_MAX) returned inf without raising overflow. huge y values (close to LDBL_MAX) could make intermediate results overflow (when computing y * log2(x) with more than long double precision), so e.g. powl(0.5, 0x1p16380L) and powl(10, 0x1p16380L) returned nan. this is fixed by handling huge y early, since such y always overflows or underflows. reported by Paul Zimmermann against exp10l (which uses powl).
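A sketch of both points in this commit. `looks_infinite_buggy`, `is_infinite` and `pow_huge_y` are hypothetical names, and `huge_threshold` is an assumed parameter because the commit does not state the cutoff musl uses. The sketch only decides the result value; it does not raise the overflow or underflow exception that a real powl must raise.

```c
#include <float.h>
#include <math.h>

/* The pitfall: LDBL_MAX is the largest *finite* value, so a
   ">= LDBL_MAX" test also fires for LDBL_MAX itself. */
int looks_infinite_buggy(long double x) { return fabsl(x) >= LDBL_MAX; }
int is_infinite(long double x)          { return isinf(x) != 0; }

/* Hypothetical "handle huge y early" step for finite x > 0, x != 1:
   once |y| is huge, |y * log2(x)| is far beyond the exponent range,
   so only the signs matter. Returns 1 and stores +inf or 0 in *out
   when it decides; returns 0 to let the normal path run. The negated
   comparison also sends NaN y to the normal path. */
int pow_huge_y(long double x, long double y, long double huge_threshold,
               long double *out)
{
    if (!(fabsl(y) >= huge_threshold) || !(x > 0) || x == 1)
        return 0;
    int grows = (x > 1) == (y > 0);   /* result magnitude heads to inf */
    *out = grows ? (long double)INFINITY : 0.0L;
    return 1;
}
```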
2023-08-19 | math: fix ld80 acoshl(x) for x < 0 | Szabolcs Nagy | -3/+7
acosh(x) is nan for x < 1, but x < 0 cases were not handled specially, and acoshl gave wrong results for some -0x1p32 < x < -2 values, e.g. acoshl(-0x1p20) returned -inf and acoshl(-0x1.4p20) returned -0x1.db365758403aa9acp+0L. this is fixed by checking the sign bit and handling negative inputs specially. reported by Paul Zimmermann.
2023-02-12 | math: fix undefined shift in logf | Szabolcs Nagy | -1/+1
A signed int shift overflowed when computing a constant mask; use a hex literal instead. This is unlikely to cause actual issues unless the code was compiled with ubsan or similar instrumentation specifically to catch this. The stripped libc.so is unchanged on x86_64. Reported by q66 on IRC.
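An illustration of this class of bug. The commit does not quote the expression, so the mask below (`0x1ff << 23`) is an assumed example, and `MASK_LITERAL`, `MASK_SHIFTED` and `clear_low_bits` are hypothetical names. In plain `0x1ff << 23` the operand is a signed int, and 0x1ff * 2^23 = 0xff800000 does not fit in a 32-bit int, so the shift is undefined behaviour; a hex literal with a `u` suffix, or an unsigned operand, is well defined.

```c
#include <stdint.h>

/* 0x1ff << 23 in int would overflow (undefined behaviour).
   Both forms below are well defined and produce 0xff800000. */
#define MASK_LITERAL 0xff800000u
#define MASK_SHIFTED ((uint32_t)0x1ff << 23)

/* Hypothetical use: keep the sign, exponent and top mantissa bit
   of a float bit pattern. */
uint32_t clear_low_bits(uint32_t ix)
{
    return ix & MASK_LITERAL;
}
```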
2021-09-23 | add SPE FPU support to powerpc-sf | Rich Felker | -4/+4
When the soft-float ABI for PowerPC was added in commit 5a92dd95c77cee81755f1a441ae0b71e3ae2bcdb, with Freescale cpus using the alternative SPE FPU as the main use case, it was noted that we could probably support hard float on them, but that doing so would involve determining some difficult ABI constraints. This commit is the completion of that work. The Power-Arch-32 ABI supplement defines the ABI profiles, and indeed ATR-SPE is built on ATR-SOFT-FLOAT. But setjmp/longjmp compatibility is problematic for the same reason it is problematic on ARM, where optional float-related parts of the register file are "call-saved if present". This requires testing __hwcap, which is now done. In keeping with the existing powerpc-sf subarch definition, which did not have fenv, the fenv macros are not defined for SPE, and the SPEFSCR control register is left in (and assumed to start in) the default mode.
2021-07-06 | math: fix fmaf not to depend on FE_TOWARDZERO | Szabolcs Nagy | -11/+10
2021-02-10 | math: fix expm1f overflow threshold | Szabolcs Nagy | -2/+1
the threshold was wrong, so expm1f overflowed to inf a bit too early. on most targets a uint32_t compare is faster than a float compare, so use that. this also fixes sinhf incorrectly returning nan for some values where the internal expm1f overflowed.
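The integer-compare idea can be sketched as follows. `asuint` and `above_threshold_bits` are hypothetical helpers, and the threshold is a parameter because the commit does not give musl's constant. The trick relies on IEEE 754 single precision: for floats with the sign bit clear, the bit patterns sort in the same order as the values.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's IEEE 754 bit pattern as an integer. */
static uint32_t asuint(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* For a non-negative threshold t, "x > t" can be tested with one
   unsigned compare of the bit patterns; +inf and NaN sort above every
   finite value. Negative x has the sign bit set, so it also compares
   above t here: a real caller must handle the sign separately. */
int above_threshold_bits(float x, float t)
{
    return asuint(x) > asuint(t);
}
```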
2021-02-10 | math: fix acoshf for negative inputs | Szabolcs Nagy | -4/+4
on some negative inputs (e.g. -0x1.1e6ae8p+5) acoshf failed to return nan. ensure that negative inputs result in nan without introducing new branches. this was tried before in commit 101e6012856918440b5d7474739c3fc22a8d3b85 (math: fix acoshf on negative values), but that fix was wrong. there are 3 formulas used:

  log1p(x-1 + sqrt((x-1)*(x-1)+2*(x-1)))
  log(2*x - 1/(x+sqrt(x*x-1)))
  log(x) + 0.693147180559945309417232121458176568

the first fails on large negative inputs (it may compute log1p(0) or log1p(inf)), the second fails on some mid-range or large negative inputs (it may compute log(large) or log(inf)), and the last fails on -0 (it returns -inf).
2020-11-29 | arm fabs and sqrt: support single-precision-only fpu variants | Jinliang Li | -2/+2
2020-08-05 | math: new software sqrtl | Szabolcs Nagy | -1/+253
same approach as in sqrt. sqrtl was broken on aarch64, riscv64 and s390x targets because of missing quad precision support and on m68k-sf because of missing ld80 sqrtl. this implementation is written for quad precision and then edited to make it work for both m68k and x86 style ld80 formats too, but it is not expected to be optimal for them. note: using fp instructions for the initial estimate when such instructions are available (e.g. double prec sqrt or rsqrt) is avoided because of fenv correctness.
2020-08-05 | math: add __math_invalidl | Szabolcs Nagy | -0/+9
for targets where long double is different from double.
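The commit does not show the helper's body. The sketch below uses a common idiom for such helpers and should be read as an assumption: `math_invalidl_sketch` is a hypothetical name. Evaluating 0/0 (or inf - inf) at run time in long double returns a NaN and raises the invalid exception in that type.

```c
#include <math.h>

/* Hypothetical stand-in for an "invalid" helper: (x - x) is 0 for
   finite x and NaN for infinite x, so the division yields NaN and
   raises FE_INVALID. Doing it in long double matters only where long
   double is a different type from double. */
long double math_invalidl_sketch(long double x)
{
    return (x - x) / (x - x);
}
```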
2020-08-05 | math: new software sqrtf | Szabolcs Nagy | -70/+70
same method as in sqrt; this was tested on all inputs against an sqrtf instruction. (the only difference found was that x86 sqrtf does not signal the x86-specific input-denormal exception on negative subnormal inputs while the software sqrtf does; this is fine, as it was designed for ieee754 exceptions only.) there is a known faster method, "Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation", that computes sqrtf directly via pipelined polynomial evaluation, which allows more parallelism, but the design does not generalize easily to higher precisions.
2020-08-05 | math: new software sqrt | Szabolcs Nagy | -173/+179
approximate 1/sqrt(x) and sqrt(x) with goldschmidt iterations. this is known to be a fast method for computing sqrt, but it is tricky to get right, so detailed comments were added. a lookup table is used for the initial estimate; this adds 256 bytes of rodata, but it can be shared between sqrt, sqrtf and sqrtl, and it saves one iteration compared to a linear estimate. this is for soft float targets, but it supports fenv by using a floating-point operation to get the final result, so the result is correctly rounded in all rounding modes. if fenv support is turned off, the nearest rounded result is computed and the inexact exception is not signaled. assumes fast 32-bit integer arithmetic and a fast 32x32 -> 64-bit multiply.
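A floating-point sketch of the Goldschmidt idea described above. Unlike the commit's soft-float code, it uses plain double arithmetic instead of fixed-point integers, takes its initial 1/sqrt(x) estimate from the well-known 0x5fe6eb50c7b537a9 bit trick instead of the 256-byte table, handles only positive normal x, and is accurate to a few ulp rather than correctly rounded. `goldschmidt_sqrt` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Coupled Goldschmidt iteration: g -> sqrt(x), h -> 1/(2*sqrt(x)).
   Assumes x is a positive normal double. */
double goldschmidt_sqrt(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5fe6eb50c7b537a9ULL - (bits >> 1);  /* y ~ 1/sqrt(x), within a few % */
    double y;
    memcpy(&y, &bits, sizeof y);

    double g = x * y;      /* estimate of sqrt(x)       */
    double h = 0.5 * y;    /* estimate of 1/(2*sqrt(x)) */
    for (int i = 0; i < 5; i++) {
        double r = 0.5 - g * h;   /* zero once g*h == 1/2 */
        g += g * r;               /* each pass roughly squares the error */
        h += h * r;
    }
    return g;
}
```

Both g and h are scaled by the same factor (1 + r) each step, so their ratio stays 2x while their product is driven to 1/2; that is why g converges to sqrt(x) without any division.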
2020-08-02 | add m68k sqrtl using native instruction | Rich Felker | -0/+15
this is actually a functional fix at present, since the C sqrtl does not support ld80 and just wraps double sqrt. once that's fixed it will just be an optimization.
2020-03-24 | math: add x86_64 remquol | Alexander Monakov | -0/+32
2020-03-24 | math: move x87-family fmod functions to C with inline asm | Alexander Monakov | -44/+38
2020-03-24 | math: move x87-family remainder functions to C with inline asm | Alexander Monakov | -50/+42
2020-03-24 | math: move x87-family rint functions to C with inline asm | Alexander Monakov | -24/+28
2020-03-24 | math: move x87-family lrint functions to C with inline asm | Alexander Monakov | -60/+64
2020-03-24 | math: move x86_64 (l)lrint(f) functions to C with inline asm | Alexander Monakov | -20/+32
2020-03-24 | math: move i386 sqrt to C with inline asm | Alexander Monakov | -21/+15
2020-03-24 | math: move i386 sqrtf to C with inline asm | Alexander Monakov | -7/+12
2020-03-24 | math: move trivial x86-family sqrt functions to C with inline asm | Alexander Monakov | -18/+28
2020-03-24 | math: move x87-family fabs functions to C with inline asm | Alexander Monakov | -24/+28
2020-03-24 | math: move x86_64 fabs, fabsf to C with inline asm | Alexander Monakov | -16/+20
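A sketch of the "C with inline asm" style these commits move to, using a hypothetical `fabsl_sketch` rather than the musl source. With GCC or Clang on x86, the `"t"` constraint places the operand in st(0), the top of the x87 register stack, so the compiler rather than hand-written asm keeps the stack balanced. The fallback to `__builtin_fabsl` on other architectures is only there so the sketch builds everywhere.

```c
/* Hypothetical example of a one-instruction x87 function in C. */
long double fabsl_sketch(long double x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("fabs" : "+t"(x));   /* operate on st(0) in place */
#else
    x = __builtin_fabsl(x);
#endif
    return x;
}
```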
2020-02-21 | math: fix sinh overflows in non-nearest rounding | Szabolcs Nagy | -8/+10
The final rounding operation should be done with the correct sign; otherwise huge results may incorrectly get rounded to or away from infinity in upward or downward rounding modes. This affected sinh and sinhf, which set the sign on the result after a potentially overflowing mul. There may be other non-nearest rounding issues, but this was a known long-standing issue with large ulp error (depending on how ulp is defined near infinity). The fix should have no effect on sinh and sinhf performance but may have a tiny effect on cosh and coshf.
2020-02-21 | math: fix __rem_pio2 in non-nearest rounding modes | Szabolcs Nagy | -3/+41
Handle the case where |y| > pi/4+tiny after reduction. This happens in directed rounding modes because the fast round-to-int code does not give the nearest integer. In such cases the reduction may not be symmetric between x and -x, so e.g. cos(x)==cos(-x) may not hold (but polynomial evaluation is not symmetric either with directed rounding, so fixing that would require more changes with bigger performance impact). The fix only adds two predictable branches in nearest rounding mode; a simple microbenchmark does not show a relevant performance regression in nearest rounding mode. The code could be improved, e.g. by reducing the medium size threshold such that two-step reduction is enough instead of three, and the single precision case could avoid the issue by doing the round to int differently, but this fix was kept minimal.
2020-02-06 | remove i386 asm for single and double precision exp-family functions | Rich Felker | -62/+3
these did not truncate excess precision in the return value. fixing them looks like considerable work, and the current C code seems to outperform them significantly anyway. long double functions are left in place because they are not subject to excess precision issues and probably better than the C code.
2020-02-06 | rename i386 exp.s to exp_ld.s | Rich Felker | -0/+1
this commit is for the sake of reviewable history.
2020-02-06 | fix excess precision in return value of i386 log-family functions | Rich Felker | -0/+20
2020-02-06 | fix excess precision in return value of i386 acos[f] and asin[f] | Rich Felker | -42/+75
analogous to commit 1c9afd69051a64cf085c6fb3674a444ff9a43857 for atan[2][f].
2020-02-06 | fix excess precision in return value of i386 atan[2][f] | Rich Felker | -2/+8
for functions implemented in C, this is a requirement of C11 (F.6); strictly speaking that text does not apply to standard library functions, but it seems to be intended to apply to them, and C2x is expected to make it a requirement. failure to drop excess precision is particularly bad for inverse trig functions, where a value with excess precision can be outside the range of the function (entire range, or range for a particular subdomain), breaking reasonable invariants a caller may expect.
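A portable illustration of the broken invariant, with explicit long double standing in for a value returned with excess precision; no x87 code is involved. It assumes only `<float.h>`; the constant and function names are hypothetical. The true pi/2 lies above the nearest double, so wherever long double is wider than double, a long double pi/2 exceeds the double constant a caller would compare against. Converting to double, as a C return, assignment or cast requires, restores the expected range.

```c
#include <float.h>

/* pi/2 rounded to double, and pi/2 carried with extra precision
   (the extra digits are lost where long double == double). */
static const double pio2_double = 1.57079632679489661923;
static const long double pio2_long = 1.57079632679489661923132169163975144L;

/* Nonzero when long double has more significand bits than double. */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* A pi/2 "result" with excess precision exceeds the double pi/2. */
int excess_precision_breaks_range(void)
{
    return pio2_long > pio2_double;
}

/* Dropping the excess precision puts the value back in range. */
int rounded_value_in_range(void)
{
    double r = (double)pio2_long;
    return r <= pio2_double;
}
```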
2020-01-27 | math/x32: correct lrintl.s for 32-bit long | Alexander Monakov | -2/+2
2019-11-05 | ppc: add configure check for older compilers erroring on 'd' constraint | rofl0r | -2/+2
2019-10-14 | mips: add single-instruction math functions | info@mobile-stream.com | -0/+64
SQRT.fmt exists on MIPS II+ (float), MIPS III+ (double). ABS.fmt exists on MIPS I+ but only cores with ABS2008 flag in FCSR implement the required behaviour.
2019-10-13 | math: fix signed int left shift ub in sqrt | Szabolcs Nagy | -4/+2
Both sqrt and sqrtf shifted the signed exponent as signed int to adjust the bit representation of the result. There are signed right shifts too in the code but those are implementation defined and are expected to compile to arithmetic shift on supported compilers and targets.
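A sketch of the well-defined way to add a possibly negative exponent adjustment to a double's bit representation. `scale_by_exponent_bits` is a hypothetical helper, not the sqrt code itself. Left-shifting a negative signed integer is undefined behaviour, so the adjustment is converted to uint64_t first; unsigned arithmetic wraps modulo 2^64, which yields exactly the intended bit pattern.

```c
#include <stdint.h>
#include <string.h>

/* Multiply x by 2^e by editing the biased exponent field.
   Assumes the result stays a normal, finite double. */
double scale_by_exponent_bits(double x, int e)
{
    uint64_t ix;
    memcpy(&ix, &x, sizeof ix);
    ix += (uint64_t)e << 52;       /* not (int64_t)e << 52, which is UB for e < 0 */
    memcpy(&x, &ix, sizeof x);
    return x;
}
```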
2019-09-27 | math: optimize lrint on 32bit targets | Szabolcs Nagy | -1/+27
lrint in (LONG_MAX, 1/DBL_EPSILON) and in (-1/DBL_EPSILON, LONG_MIN) is not trivial: rounding to int may be inexact, but the conversion to int may overflow, and then the inexact flag must not be raised. (the overflow threshold is rounding mode dependent.) this matters on 32bit targets (without single instruction lrint or rint), so the common case (when there is no overflow) is optimized by inlining the lrint logic; otherwise the old code is kept as a fallback.

lrint call timings:

  on my laptop (i486 lrint):  asm: 10ns, old c: 30ns, new c: 21ns
  on a smaller arm core:      old c: 71ns, new c: 34ns
  on a bigger arm core:       old c: 27ns, new c: 19ns
2019-08-05 | fix build regression in i386 asm for atan2, atan2f | Rich Felker | -2/+2
commit f3ed8bfe8a82af1870ddc8696ed4cc1d5aa6b441 inadvertently removed labels that were still needed.
2019-08-05 | fix x87 stack imbalance in corner cases of i386 math asm | Rich Felker | -44/+14
commit 31c5fb80b9eae86f801be4f46025bc6532a554c5 introduced underflow code paths for the i386 math asm, along with checks on the fpu status word to skip the underflow-generation instructions if the underflow flag was already raised. unfortunately, at least one such path, in log1p, returned with 2 items on the x87 stack rather than just 1 item for the return value. this is a violation of the ABI's calling convention, and could cause subsequent floating point code to produce NANs due to x87 stack overflow. if floating point results are used in flow control, this can lead to runaway wrong code execution. rather than reviewing each "underflow already raised" code path for correctness, remove them all. they're likely slower than just performing the underflow code unconditionally, and significantly more complex. all of this code should be ripped out and replaced by C source files with inline asm. doing so would preclude this kind of error by having the compiler perform all x87 stack register allocation and stack manipulation, and would produce comparable or better code. however such a change is a much larger project.
2019-06-14 | add riscv64 architecture support | Rich Felker | -0/+180
Author: Alex Suykov <alex.suykov@gmail.com>
Author: Aric Belsito <lluixhi@gmail.com>
Author: Drew DeVault <sir@cmpwn.com>
Author: Michael Clark <mjc@sifive.com>
Author: Michael Forney <mforney@mforney.org>
Author: Stefan O'Rear <sorear2@gmail.com>

This port has involved the work of many people over several years. I have tried to ensure that everyone with substantial contributions has been credited above; if any omissions are found they will be noted later in an update to the authors/contributors list in the COPYRIGHT file.

The version committed here comes from the riscv/riscv-musl repo's commit 3fe7e2c75df78eef42dcdc352a55757729f451e2, with minor changes by me for issues found during final review:

- a_ll/a_sc atomics are removed (according to the ISA spec, lr/sc are not safe to use in separate inline asm fragments)
- a_cas[_p] is fixed to be a memory barrier
- the call from the _start assembly into the C part of crt1/ldso is changed to allow for the possibility that the linker does not place them nearby each other.
- DTP_OFFSET is defined correctly so that local-dynamic TLS works
- reloc.h LDSO_ARCH logic is simplified and made explicit.
- unused, non-functional crti/n asm files are removed.
- an empty .sdata section is added to crt1 so that the __global_pointer reference is resolvable.
- indentation style errors in some asm files are fixed.
2019-04-17 | math: new pow | Szabolcs Nagy | -303/+520
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc

The underflow exception is signaled if the result is in the subnormal range even if the result is exact.

code size change: +3421 bytes.

benchmark on x86_64 (before, after, speedup):

  -Os:
  pow rthruput: 102.96 ns/call  33.38 ns/call  3.08x
  pow latency:  144.37 ns/call  54.75 ns/call  2.64x
  -O3:
  pow rthruput:  98.91 ns/call  32.79 ns/call  3.02x
  pow latency:  138.74 ns/call  53.78 ns/call  2.58x
2019-04-17 | math: new exp and exp2 | Szabolcs Nagy | -480/+434
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc

TOINT_INTRINSICS and EXP_USE_TOINT_NARROW cases are unused.

The underflow exception is signaled if the result is in the subnormal range even if the result is exact (e.g. exp2(-1023.0)).

code size change: -1672 bytes.

benchmark on x86_64 (before, after, speedup):

  -Os:
  exp rthruput:  12.73 ns/call   6.68 ns/call  1.91x
  exp latency:   45.78 ns/call  21.79 ns/call  2.1x
  exp2 rthruput:  6.35 ns/call   5.26 ns/call  1.21x
  exp2 latency:  26.00 ns/call  16.58 ns/call  1.57x
  -O3:
  exp rthruput:  12.75 ns/call   6.73 ns/call  1.89x
  exp latency:   45.91 ns/call  21.80 ns/call  2.11x
  exp2 rthruput:  6.47 ns/call   5.40 ns/call  1.2x
  exp2 latency:  26.03 ns/call  16.54 ns/call  1.57x
2019-04-17 | math: new log2 | Szabolcs Nagy | -106/+335
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc

code size change: +2458 bytes (+1524 bytes with fma).

benchmark on x86_64 (before, after, speedup):

  -Os:
  log2 rthruput: 16.08 ns/call  10.49 ns/call  1.53x
  log2 latency:  44.54 ns/call  25.55 ns/call  1.74x
  -O3:
  log2 rthruput: 15.92 ns/call  10.11 ns/call  1.58x
  log2 latency:  44.66 ns/call  26.16 ns/call  1.71x
2019-04-17 | math: new log | Szabolcs Nagy | -104/+454
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc

Assume __FP_FAST_FMA implies __builtin_fma is inlined as a single instruction.

code size change: +4588 bytes (+2540 bytes with fma).

benchmark on x86_64 (before, after, speedup):

  -Os:
  log rthruput: 12.61 ns/call   7.95 ns/call  1.59x
  log latency:  41.64 ns/call  23.38 ns/call  1.78x
  -O3:
  log rthruput: 12.51 ns/call   7.75 ns/call  1.61x
  log latency:  41.82 ns/call  23.55 ns/call  1.78x
2019-04-17 | math: new powf | Szabolcs Nagy | -240/+226
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc

The POWF_SCALE != 1.0 case only matters if TOINT_INTRINSICS is set, which is currently not supported for any target. SNaN is not supported; it would require an issignalingf implementation.

code size change: -816 bytes.

benchmark on x86_64 (before, after, speedup):

  -Os:
  powf rthruput:  95.14 ns/call  20.04 ns/call  4.75x
  powf latency:  137.00 ns/call  34.98 ns/call  3.92x
  -O3:
  powf rthruput:  92.48 ns/call  13.67 ns/call  6.77x
  powf latency:  131.11 ns/call  35.15 ns/call  3.73x
2019-04-17 | math: new exp2f and expf | Szabolcs Nagy | -179/+177
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc

In expf, TOINT_INTRINSICS is kept but unused; it would require support for __builtin_round and __builtin_lround as single instructions.

code size change: +94 bytes.

benchmark on x86_64 (before, after, speedup):

  -Os:
  expf rthruput:   9.19 ns/call   8.11 ns/call  1.13x
  expf latency:   34.19 ns/call  18.77 ns/call  1.82x
  exp2f rthruput:  5.59 ns/call   6.52 ns/call  0.86x
  exp2f latency:  17.93 ns/call  16.70 ns/call  1.07x
  -O3:
  expf rthruput:   9.12 ns/call   4.92 ns/call  1.85x
  expf latency:   34.44 ns/call  18.99 ns/call  1.81x
  exp2f rthruput:  5.58 ns/call   4.49 ns/call  1.24x
  exp2f latency:  17.95 ns/call  16.94 ns/call  1.06x
2019-04-17 | math: new log2f | Szabolcs Nagy | -58/+108
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc

code size change: +177 bytes.

benchmark on x86_64 (before, after, speedup):

  -Os:
  log2f rthruput: 11.38 ns/call   5.99 ns/call  1.9x
  log2f latency:  35.01 ns/call  22.57 ns/call  1.55x
  -O3:
  log2f rthruput: 10.82 ns/call   5.58 ns/call  1.94x
  log2f latency:  35.13 ns/call  21.04 ns/call  1.67x
generated by cgit v1.2.1 (git 2.18.0) at 2025-09-10 18:30:42 +0000
