I'm trying to calculate the min/max, or the lowest to highest value range of a 48 bit floating point type MIL-STD-1750A (PDF) (WIKI).
Ex: the range of a double is about 1.7E +/- 308.
I've looked around for equations, and am unsure if what I have found will work.
The first equation I found was first equation
The second was second equation
I'm not quite sure where to begin with these, if they are even correct in what I need.
Will someone impart their knowledge to me and help solve this?
2 Answers
For 32-bit floating point, the maximum value is shown in Table III:
0.9999998 x 2^127 represented in hex as: mantissa=7FFFFF, exponent=7F.
We can decompose the mantissa/exponent into a (close) decimal value as follows:
7FFFFF <base-16> = 8,388,607 <base-10>.
There are 23 bits of significance, so we divide 8,388,607 by 2^23.
8,388,607 / 2^23 = 0.99999988079071044921875 (see Table III)
As for the exponent:
7F <base-16> = 127 <base-10>
Now we multiply the mantissa by 2^127 (the exponent):
8,388,607 / 2^23 * 2^127 =
8,388,607 * 2^104 = 1.7014116317805962808001687976863 * 10^38
This is the largest 32-bit floating point value because the largest mantissa is used and the largest exponent.
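The arithmetic above can be checked with exact rational numbers. This is a minimal sketch, not code from the standard; the names `MAGNITUDE_BITS`, `fraction` and `max32` are made up for illustration, and it only covers the positive maximum discussed here.

```python
from fractions import Fraction

MAGNITUDE_BITS = 23      # mantissa bits after the sign bit (32-bit format)
mantissa = 0x7FFFFF      # largest positive mantissa, 8,388,607
exponent = 0x7F          # largest exponent, 127

# The radix point sits in front of the mantissa, so scale by 2^-23 first.
fraction = Fraction(mantissa, 2**MAGNITUDE_BITS)

# Then apply the exponent.
max32 = fraction * 2**exponent

print(float(fraction))   # just under 1.0
print(int(max32))        # the exact integer value of the maximum
```

Because `Fraction` arithmetic is exact, the result can be compared directly with 8,388,607 * 2^104 without any rounding concerns.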
The 48-bit floating point format adds 16 bits of lesser-significance mantissa but leaves the exponent the same size. Thus, the max value would be represented in hex as
mantissa=7FFFFFFFFF, exponent=7F.
Again, we can compute:
7FFFFFFFFF <base-16> = 549,755,813,887 <base-10>
The max exponent is still 127, but now we need to divide by 2^39 (23+16=39). Since 127-39=88, we just multiply by 2^88:
549,755,813,887 * 2^88 =
1.7014118346015974672186595864716 * 10^38
This is the largest 48-bit floating point value because we used the largest possible mantissa and largest possible exponent.
So, the max values are:
1.7014116317805962808001687976863 * 10^38, for 32-bit, and,
1.7014118346015974672186595864716 * 10^38, for 48-bit
The max value for 48-bit is just slightly larger than for 32-bit, which stands to reason since the 16 extra bits are added to the low-order end of the mantissa.
(To be exact the maximum number for the 48-bit format can be expressed as a binary number that consists of 39 1's followed by 88 0's.)
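To see the bit pattern and how close the two maxima are, one can inspect the integers directly. A short sketch, with variable names chosen here for illustration:

```python
max32 = (2**23 - 1) * 2**104
max48 = (2**39 - 1) * 2**88

# Binary digits of the 48-bit maximum, without the "0b" prefix.
bits48 = bin(max48)[2:]
ones, zeros = bits48.count("1"), bits48.count("0")
print(ones, zeros)

# The gap between the two maxima comes only from the 16 extra mantissa bits.
gap = max48 - max32
print(gap == 2**104 - 2**88)
```

The gap is small relative to 2^127, which is why the two decimal values agree in their first seven significant digits.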
(The smallest is just the negative of this value. The value closest to zero without being zero can also easily be computed as above: use the smallest possible positive mantissa, 000001 for 32-bit, and the smallest possible exponent, 80 in hex, or -128 in decimal.)
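The closest-to-zero recipe can be sketched the same way. This follows the parenthetical literally (a mantissa with only its lowest bit set and an exponent byte of 0x80); whether the format actually permits such an unnormalized mantissa is an assumption not checked here, and `smallest_positive` is a hypothetical helper.

```python
from fractions import Fraction

# The exponent byte 0x80, read as a signed 8-bit integer, is -128.
min_exponent = int.from_bytes(bytes([0x80]), "big", signed=True)

def smallest_positive(magnitude_bits, exponent=min_exponent):
    """Mantissa with only the least significant bit set, at the given exponent."""
    return Fraction(1, 2**magnitude_bits) * Fraction(2) ** exponent

tiny32 = smallest_positive(23)   # 2^-23 * 2^-128 = 2^-151
tiny48 = smallest_positive(39)   # 2^-39 * 2^-128 = 2^-167

print(float(tiny32), float(tiny48))
```

If only normalized mantissas are allowed, the smallest positive value would instead use the mantissa 0x400000 (0.5) with the same exponent, giving 2^-129 for both widths.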
FYI
Some floating point formats use an unrepresented hidden 1 bit in the mantissa. This allows for one extra bit of precision, as follows: the first binary digit of all numbers (except 0, or denormals, see below) is a 1, therefore we don't have to store that 1, and we gain an extra bit of precision. This particular format doesn't seem to do this.
Other floating point formats allow a denormalized mantissa, which allows representing (positive) numbers smaller than the smallest exponent would otherwise permit, by trading bits of precision for additional (negative) powers of 2. This is easy to support if the format doesn't also use the hidden 1 bit, and a bit harder if it does.
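For comparison, IEEE 754 single precision is one format that uses both a hidden bit and denormals. The sketch below uses Python's `struct` module to look at its fields; it illustrates IEEE 754 only, not MIL-STD-1750A, and `ieee_single_fields` is a name invented for this example.

```python
import struct

def ieee_single_fields(x):
    """Split an IEEE 754 single into (sign, exponent field, mantissa field)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF

# 1.0 is stored with an all-zero mantissa field: the leading 1 is implied.
one = ieee_single_fields(1.0)
print(one)

# The bit pattern 0x00000001 is the smallest denormal: no hidden bit, value 2^-149.
(denormal,) = struct.unpack(">f", struct.pack(">I", 1))
print(denormal == 2.0**-149)
```

The mantissa field of 1.0 being zero is the hidden bit at work: the stored 23 bits hold only the digits after the implied leading 1.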
8,388,607 / 2^23 is the value you'd get with mantissa=0x7FFFFF and exponent=0x00. It is not the single bit value but rather the value with a full mantissa and a neutral, or more specifically, a zero exponent.
The reason this value is not simply 8,388,607, and requires division by 2^23 (and hence is smaller than you might expect), is that the implied radix point is in front of the mantissa rather than after it. So, think +/-.11111111111111111111111 (a sign bit, followed by a radix point, followed by twenty-three 1-bits) for the mantissa and +/-1111111 (no radix point here, just an integer, in this case 127) for the exponent.
mantissa = 0x7FFFFF with exponent = 0x7F is the largest value which corresponds to 8388607 * 2 ^ 104, where the 104 comes from 127-23: again, subtracting 23 powers of two because the mantissa has the radix point at the beginning. If the radix point were at the end, then the largest value (0x7FFFFF,0x7F) would indeed be 8,388,607 * 2 ^ 127.
Among others, there are two possible ways to consider a single-bit value for the mantissa. One is mantissa=0x400000, and the other is mantissa=0x000001. Without considering the radix point or the exponent, the former is 4,194,304 and the latter is 1. With a zero exponent and considering the radix point, the former is 0.5 (decimal) and the latter is 0.00000011920928955078125. With a maximum (or minimum) exponent, we can compute max and min single-bit values.
(Note that the latter form, where the mantissa has leading zeros, would be considered denormalized in some number formats, and its normalized representation would be 0x400000 with an exponent of -22, since 0.5 * 2^-22 = 2^-23.)
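The effect of the radix point and of normalization can be sketched with a small decoder for positive values. The helper `value` is hypothetical and assumes a 23-bit magnitude with the radix point in front; the search simply finds which exponent makes the normalized mantissa 0x400000 equal to the single low bit.

```python
from fractions import Fraction

def value(mantissa, exponent, bits=23):
    """Positive mantissa read as a fraction (radix point in front) times 2^exponent."""
    return Fraction(mantissa, 2**bits) * Fraction(2) ** exponent

high_bit = value(0x400000, 0)   # only the top mantissa bit set
low_bit = value(0x000001, 0)    # only the bottom mantissa bit set
print(high_bit, float(low_bit))

# Exponent at which the normalized mantissa 0x400000 equals the low-bit value.
normalized_exponent = next(
    e for e in range(-128, 128) if value(0x400000, e) == low_bit
)
print(normalized_exponent)

# With the radix point in front, the maximum is 8,388,607 * 2^(127 - 23).
print(value(0x7FFFFF, 0x7F) == 8388607 * 2**104)
```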
I have one doubt. Doing 8,388,607 / 2^23 gives what a single bit in the mantissa can represent. So how does 8,388,607 / 2^23 * 2^127 represent the maximum value? – hariprasad, Aug 11, 2016 at 7:04
@hariprasad, I will put a postscript on the answer, as it is too hard to explain in the comment format. – Erik Eidt, Aug 11, 2016 at 16:37
Where is the minimum value? It's requested in the title! – Charlie Parker, Sep 16, 2020 at 19:57
@CharlieParker, unlike integer representations where min and max are offset, floating point formats use sign & magnitude representations, where the magnitude is composed of an exponent and a mantissa. The max number in sign magnitude is then (1) positive (sign bit is 0), with (2) the largest possible exponent and (3) the largest possible mantissa. The min number is identical except that the sign bit changes, so it is (1) negative (sign bit is 1), yet has (2) the largest possible exponent and (3) the largest possible mantissa. As I explained above, the min number is just -max, so just throw on a - sign. – Erik Eidt, Sep 17, 2020 at 22:01
You can borrow from how IEEE floating point is laid out for fast comparison: sign, exponent, mantissa. However, in that PDF I see that the mantissa and exponent are reversed.
This means that to compare you'll have to first check the sign bit; if that doesn't decide the winner, you compare the exponents, and then you compare the mantissas.
If one is positive and the other is negative then the positive is the max.
If both are positive and one exponent is larger, then it is the max (if both are negative, then it is the min).
Similarly for mantissa.
a<b?a:b and a>b?a:b
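The field-by-field comparison described in this answer can be sketched as a sort key. The sketch rests on several assumptions that are not verified against the standard: values are already split into a hypothetical `(negative, exponent, mantissa)` tuple, the mantissa is a sign-and-magnitude quantity as this thread describes, and every value is nonzero and normalized (otherwise comparing exponents first would not be valid).

```python
from fractions import Fraction

def value(negative, exponent, mantissa, bits=23):
    """Reference value, used only to check the field-wise ordering."""
    magnitude = Fraction(mantissa, 2**bits) * Fraction(2) ** exponent
    return -magnitude if negative else magnitude

def order_key(number):
    """Sign first, then exponent, then mantissa; reversed for negatives."""
    negative, exponent, mantissa = number
    if negative:
        return (0, -exponent, -mantissa)
    return (1, exponent, mantissa)

samples = [
    (False, 127, 0x7FFFFF),
    (False, 0, 0x400000),
    (True, 0, 0x400000),
    (True, 127, 0x7FFFFF),
    (False, -128, 0x400000),
]

largest = max(samples, key=order_key)
smallest = min(samples, key=order_key)
print(largest, smallest)
```

With such a key, `max(a, b, key=order_key)` plays the role of `a>b?a:b` and `min(a, b, key=order_key)` the role of `a<b?a:b`, without converting either operand to another floating point type.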