I have seen library code that computes the square root using the Newton-Raphson method. It uses a table of 256 entries whose significance is unclear to me, since the initial guess should depend on the quantity $x$ whose square root is to be found. I would appreciate some details.
The code's comments state:
Initial guesses for g and y are determined from argument x via table lookup into the array SqrtTable[256].
SqrtTable[256] contains initial estimates for the high significand bits of sqrt and correction factor. This table is also used by the IBM arccos and arcsin functions.
The code also uses terms such as xhead, which it compares with 0x7ff00000 and 0x07000000ul.
Any light thrown on these details, or a reference, would be highly appreciated.
3 Answers
The table covers two consecutive binades, so it is indexed by the least significant bit of the stored exponent and the most significant seven bits of the stored significand of the function argument, which is an IEEE-754 binary64 number. In essence, $x \in [1,2)$ is mapped to indexes 128ドル \ldots 255$, and $x \in [2, 4)$ is mapped to indexes 0ドル \ldots 127$. Each 16-bit table entry comprises two bytes, the most significant of which represents an 8-bit approximation to $\sqrt{x}$ while the least significant represents an 8-bit approximation to $\frac{1}{2\sqrt{x}}$.
What is stored in the table are the leading bits of the significands of the two approximations, which get combined by bit manipulation with the exponent bits of an IEEE-754 binary64 number in the appropriate range and re-interpreted as an IEEE-754 binary64 number. The square root approximation winds up in $g$ and the approximation to half the reciprocal square root winds up in $y$.
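To make the bit manipulation concrete, here is a minimal, self-contained sketch of the indexing and reassembly as I read the description above; it is not the linked code. The stand-in table entry is computed on the fly from the midpoint formula used in the generation code further down rather than read from SqrtTable[256], and the exponent fields (biased 1023 for $g$, biased 1021 for $y$) are my inference from the value ranges involved.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Hypothetical sketch (not the linked code): build the table index from the
   least significant stored-exponent bit plus the top seven significand bits,
   then glue the two table bytes onto suitable exponent fields to obtain the
   starting values g ~ sqrt(x) and y ~ 1/(2*sqrt(x)).                        */
int main (void)
{
    double x = 1.7284;                 /* any argument in [1, 4) */

    uint64_t bits;
    memcpy (&bits, &x, sizeof bits);   /* reinterpret binary64 as an integer */

    /* bit 52 is the LSB of the biased exponent, bits 51..45 are the top seven
       significand bits: x in [1,2) maps to indexes 128..255, x in [2,4) maps
       to indexes 0..127                                                      */
    unsigned idx = (unsigned)(bits >> 45) & 0xff;

    /* stand-in for SqrtTable[idx]: 8-bit significand fractions of sqrt(m) and
       2/sqrt(m) at the midpoint m of the sub-interval (cf. the generation
       code further down)                                                     */
    double m = (idx < 128) ? (2.0 + idx / 64.0 + 1.0 / 128)
                           : (1.0 + (idx - 128) / 128.0 + 1.0 / 256);
    unsigned hi = (unsigned) round ((sqrt (m) - 1.0) * 256);
    unsigned lo = (unsigned) round ((2.0 / sqrt (m) - 1.0) * 256);

    /* g: significand 1.hi..., biased exponent 1023, since sqrt(x) lies in
       [1,2) for all x in [1,4)                                               */
    uint64_t gbits = (1023ull << 52) | ((uint64_t) hi << 44);
    /* y: significand 1.lo..., biased exponent 1021, since 1/(2*sqrt(x)) =
       (2/sqrt(x)) * 2^-2 and 2/sqrt(x) lies in (1,2) at the midpoints        */
    uint64_t ybits = (1021ull << 52) | ((uint64_t) lo << 44);

    double g, y;
    memcpy (&g, &gbits, sizeof g);
    memcpy (&y, &ybits, sizeof y);

    printf ("idx=%u  g=%.8f (sqrt=%.8f)  y=%.8f (1/(2*sqrt)=%.8f)\n",
            idx, g, sqrt (x), y, 0.5 / sqrt (x));
    return 0;
}

Since $\sqrt{x}\in[1,2)$ for all $x\in[1,4)$, the exponent field of $g$ is fixed and only its leading significand bits come from the table; likewise 2ドル/\sqrt{x}\in(1,2)$ at every sub-interval midpoint, so $y$ gets a fixed exponent two lower.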
The table provides approximations valid to slightly under eight bits. Three steps of a Newton-Raphson type coupled sqrt / rsqrt iteration are performed; this iteration has quadratic convergence and delivers a square root approximation with slightly under sixty-four valid bits to the final rounding step. Several different arrangements of such Newton-Raphson iterations are described in the literature and in practical use. The one used in the linked code matches that shown in Figure 2 of the following publication:
P. W. Markstein, "Computation of elementary functions on the IBM RISC System/6000 processor," IBM Journal of Research and Development, Vol. 34, No. 1, Jan. 1990, pp. 111-119.
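For concreteness, here is a sketch of one common way to write such a coupled step, assuming $g \approx \sqrt{x}$ and $y \approx \frac{1}{2\sqrt{x}}$. It illustrates the general technique only; it is not claimed to be the exact arrangement of the linked code or of Markstein's Figure 2, and in plain double arithmetic the accuracy of course saturates at double precision rather than carrying extra bits into a final rounding step.

#include <stdio.h>
#include <math.h>

/* Hypothetical coupled Newton-Raphson step, assuming g ~ sqrt(x) and
   y ~ 1/(2*sqrt(x)); a sketch of the general technique, not necessarily
   the arrangement used in the linked code.                              */
static void coupled_step (double x, double *g, double *y)
{
    double gn = *g + *y * (x - *g * *g);     /* Heron step: gn -> sqrt(x)       */
    double yn = *y * (2.0 - 2.0 * gn * *y);  /* Newton reciprocal step:
                                                yn -> 1/(2*gn) ~ 1/(2*sqrt(x))  */
    *g = gn;
    *y = yn;
}

int main (void)
{
    double x = 1.7284;        /* same example argument as in the sketch above      */
    double g = 1.31640625;    /* ~8-bit starting values as produced by that sketch */
    double y = 0.3798828125;

    /* three steps, as in the description above */
    for (int i = 0; i < 3; i++) {
        coupled_step (x, &g, &y);
        printf ("step %d: g = %.17g   g - sqrt(x) = %.3g\n", i + 1, g, g - sqrt (x));
    }
    return 0;
}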
The approximations provided by the table include both overestimates and underestimates, which suggests that they were selected to minimize the approximation error in each sub-interval. I am not able to replicate the table exactly, despite trying for some time. I am able to generate a table that only differs by ± 1 in the least significant bit of each tabulated value, using the following C code:
#include <stdio.h>
#include <math.h>

int main (void)
{
    double x;
    int idx = 0;
    /* binade [2,4): table indexes 0 ... 127, sub-interval width 1/64 */
    for (x = 2.0; x < 4.0; x += 1.0 / 64) {
        double midpoint = x + 1.0 / 128;
        double s = sqrt (midpoint);
        double r = 1.0 / s;
        int hi = (int) round ((s - 1.0) * 256);     /* leading fraction bits of sqrt   */
        int lo = (int) round ((2 * r - 1.0) * 256); /* leading fraction bits of 2/sqrt */
        int tab = hi * 256 + lo;
        printf ("%d, ", tab);
        idx++;
        if ((idx % 12) == 0) printf ("\n");
    }
    /* binade [1,2): table indexes 128 ... 255, sub-interval width 1/128 */
    for (x = 1.0; x < 2.0; x += 1.0 / 128) {
        double midpoint = x + 1.0 / 256;
        double s = sqrt (midpoint);
        double r = 1.0 / s;
        int hi = (int) round ((s - 1.0) * 256);
        int lo = (int) round ((2 * r - 1.0) * 256);
        int tab = hi * 256 + lo;
        printf ("%d, ", tab);
        idx++;
        if ((idx % 12) == 0) printf ("\n");
    }
    printf ("\n");
    return 0;
}
- Kindly elaborate a little more. The answer at math.stackexchange.com/a/2284821/424260 seems to have contained a reference to William Kahan, though it is now deleted. A reference to William Kahan or Peter Markstein would also be appreciated. – jiten, Mar 23, 2023 at 0:05
- Note that this implementation does not conform to the IEEE-754 standard, which requires that the square root be correctly rounded in all cases. This one has a high percentage of incorrectly rounded values. – gnasher729, Mar 23, 2023 at 10:05
- Njuffa, I switched to a larger font size and that fixed the problem :-( I remember an old IBM implementation that needed to handle sqrt(4-2u) manually because the result is so close to halfway between 2 and 2-u. – gnasher729, Mar 24, 2023 at 19:30
This is far from a complete answer, but a first hint. Below is a plot of the content of the lookup table, which is addressed with the first eight bits of the mantissa.
The values in the table are in fact two bytes packed together but later processed independently. The blue byte is for $g$ and the orange one for $y$. The values have been shifted to make a sign appear and absorb the discontinuity across the value 106ドル$ (?).
- The hexadecimal value 0x7ff00000 is 2146435072 in decimal, per rapidtables.com/convert/number/hex-to-decimal.html. So there are two bytes packed together. The choice of that value is unclear, as larger values are possible. The second value (0x07000000ul) is also unclear: what it means in decimal, and the reason for that choice. – jiten, Mar 22, 2023 at 9:41
- @jiten: I mean that the 16-bit values in the table are split into two bytes. The mask that you mention relates to the computation of the index. – user16034, Mar 22, 2023 at 11:07
- Jiten, use a spreadsheet and plot, for example, the error and relative error of the very first approximation for x in the interval [1 + 39/128, 1 + 40/128], based on entry 128+39 in the table. – gnasher729, Mar 22, 2023 at 22:07
There is a union of double and 64-bit int in a file that is not shown, which lets you extract the sign, biased exponent, and mantissa without the implicit leading bit.
For numbers from 1 to 2, the square root lies between 1 and 1.414; for numbers from 2 to 4, it lies between 1.414 and 2. If you look at the IEEE format, the last bit of the exponent and the leading bits of the mantissa are stored side by side, which lets you extract one table index covering the whole range.
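Since the file containing the union is not shown, here is a sketch of what such a union might look like (the names are made up), including the extraction of the 8-bit table index just described from the exponent LSB and the top seven mantissa bits.

#include <stdio.h>
#include <stdint.h>

/* Sketch of the kind of union described above; the actual file is not
   shown, so these names are made up. It type-puns a binary64 to a 64-bit
   integer and extracts sign, biased exponent, stored mantissa, and the
   8-bit table index.                                                     */
typedef union {
    double   d;
    uint64_t u;
} dbl_bits;

int main (void)
{
    dbl_bits v;
    v.d = 3.25;   /* example argument in [2,4) */

    unsigned sign = (unsigned) (v.u >> 63);
    unsigned bexp = (unsigned) (v.u >> 52) & 0x7ff;   /* 11-bit biased exponent  */
    uint64_t mant = v.u & 0x000fffffffffffffULL;      /* 52 stored mantissa bits */
    unsigned idx  = (unsigned) (v.u >> 45) & 0xff;    /* table index, 0..255     */

    printf ("sign=%u  biased exponent=0x%03x  mantissa=0x%013llx  index=%u\n",
            sign, bexp, (unsigned long long) mant, idx);
    return 0;
}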
There are checks for whether the first iteration can be done in single precision, and one check for tiny (denormalised) numbers. Once you get used to the idea that you can create floating-point numbers by gluing bits together, it is quite straightforward.
I’d check if the square root for the floating point number just below 4 is rounded correctly. The original IBM code had/needed a special case for that.
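As a concrete version of that check: the largest binary64 below 4 is 4ドル - 2^{-51}$, and its exact square root lies just barely below the midpoint between 2ドル - 2^{-52}$ and 2ドル$, so a correctly rounded sqrt must return 2ドル - 2^{-52}$ rather than 2ドル$. A small test harness, using the system sqrt as a stand-in for the implementation under discussion, might look like this:

#include <stdio.h>
#include <math.h>

/* Sketch of the suggested check: sqrt of the largest double below 4 must
   round down to 2 - 2^-52, not up to 2.                                  */
int main (void)
{
    double x        = nextafter (4.0, 0.0);   /* largest double below 4           */
    double expected = nextafter (2.0, 0.0);   /* correctly rounded sqrt(x)        */
    double got      = sqrt (x);               /* replace with the sqrt under test */

    printf ("sqrt(%.17g) = %.17g (%s)\n", x, got,
            got == expected ? "correctly rounded" : "NOT correctly rounded");
    return 0;
}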
But the key to understanding this is to look up the IEEE-754 floating-point format and figure out how the various bit manipulations translate into floating-point numbers.