I'm confused by how a computer rounds off the last digit in a floating point representation. For example, I'm told that if x = 1.24327789 is stored in a computer with a 6-digit capacity, then its floating point representation would be x = 0.124328 × 10^1, where the last digit has clearly been rounded.
My confusion is about how the computer can have the capacity to round this last digit if it doesn't have a 7-digit capacity with which to know the 'last' digit.
I probably have a half-assed way of understanding this representation, but I really have no background in CompSci.
- For calculations it is not uncommon for the CPU/FPU to use more bits internally and convert back to the width of the original operands (rounding in the process) before yielding the result. This is what 80-bit floating point is for: to prevent 64-bit calculations from losing accuracy. – Martin Maat, Aug 29, 2016 at 22:05
3 Answers
With a few odd exceptions, a floating point number is stored in binary in the format known as IEEE 754. These are most often 32-bit (single precision) and 64-bit (double precision) representations. The 32-bit representation can store approximately 7 decimal digits, but remember that the underlying representation is binary.
The representation of 1.24327789₁₀ is actually 00111111100111110010001110111011 as a single precision IEEE 754 floating point number in binary.
This is made up of three parts:
- The sign bit (0, indicating it is positive)
- The exponent (01111111, which is 127), giving 2^(127−127) = 2^0
- The mantissa (00111110010001110111011), which has an implicit leading 1.
This gives us +2^0 × 1.00111110010001110111011₂, which then gives you your number. If you look at the first couple of bits there, 1.0011111₂, you will see that this is rather close to 1.25₁₀, or 1.01₂.
On reading binary numbers past the binary (not decimal) point: just as 1001₂ represents 1×2^3 + 0×2^2 + 0×2^1 + 1×2^0, the value 1.011₂ represents 1×2^0 + 0×2^(−1) + 1×2^(−2) + 1×2^(−3), or 1 + 1/4 + 1/8 = 1.375₁₀.
Now, that conversion I did a bit above: I grabbed it from an IEEE 754 converter, because doing it by hand is tedious (it's typically a good part of an assignment at the college level).
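If you'd like to replicate such a converter yourself, a minimal Python sketch (standard library only) can pull the bit pattern apart; the slicing below assumes the usual single precision layout of 1 sign bit, 8 exponent bits, and 23 mantissa bits:

    import struct

    # Pack the number as a big-endian IEEE 754 single precision float,
    # then reinterpret the same 4 bytes as an unsigned 32-bit integer.
    bits = struct.unpack('>I', struct.pack('>f', 1.24327789))[0]
    pattern = f'{bits:032b}'

    sign = pattern[0]        # '0'
    exponent = pattern[1:9]  # '01111111', i.e. 127, biased by 127
    mantissa = pattern[9:]   # '00111110010001110111011', implicit leading 1

    print(pattern)                 # 00111111100111110010001110111011
    print(int(exponent, 2) - 127)  # 0, so the scale factor is 2^0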
Rounding is actually a big deal. As described in Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic from '97, rounding issues abounded in the 70s.
The number 1.24327789 in binary is 1.0011111001000111011101011011010101100011110011111000100000111...₂

So, the leading 1 is assumed and the mantissa is the first 23 bits of that:

         1         2   |
12345678901234567890123v
0011111001000111011101011011010101100011110011111000100000111

And you see at the arrow that this number should be rounded up, which gives us 00111110010001110111011₂, the mantissa from above. And that's how it is represented and rounded. You should note that since this is rounded up, the stored value is slightly greater than the original, and closer to 1.243277907371521₁₀.
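You can watch this rounding happen with a short Python sketch: the struct round trip below yields the float32 value back as a Python double (which holds it exactly, since doubles are wider), and Decimal prints the stored value exactly:

    import struct
    from decimal import Decimal

    x = 1.24327789

    # Round-trip through IEEE 754 single precision.
    x32 = struct.unpack('>f', struct.pack('>f', x))[0]

    print(Decimal(x32))  # 1.24327790737152099609375
    print(x32 > x)       # True: the mantissa was rounded up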
- Rounding is also part of the reason why FP math circuitry is an order of magnitude more complex than the circuitry for integer math. Even with the massive improvements to speed and accuracy in the past 20 years, it is still ridiculously complicated to handle FP numbers in hardware. – user22815, Jun 5, 2015 at 4:11
- @Snowman guard digits and such are challenging and weren't consistent for a while. See also the Rounding Error section and Guard Digits. – user40980, Jun 5, 2015 at 14:57
If your computer uses IEEE 754 single precision floating point numbers (as most computers do), then it uses a representation where a number x is represented by a sign (+1 or −1), a mantissa, which is an integer ≥ 2^23 and ≤ 2^24 − 1, and a binary exponent b; the number represented is sign × mantissa × 2^b. (There are some details not mentioned here.)
A number between 1 and 2 has exponent b = −23, so that mantissa × 2^b lies between 1 and 2. Since the mantissa is an integer, such a number is a multiple of 2^−23, and you can calculate that 2^−23 is a bit over 0.000 000 119.
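To make this concrete, here is a minimal Python sketch of the decomposition (the bit masks assume the standard single precision layout, and this handles normal numbers only):

    import struct

    def decompose_float32(x):
        # Returns (sign, mantissa, b) with value == sign * mantissa * 2**b,
        # where 2**23 <= mantissa <= 2**24 - 1 (normal numbers only).
        bits = struct.unpack('>I', struct.pack('>f', x))[0]
        sign = -1 if bits >> 31 else 1
        mantissa = (bits & 0x7FFFFF) | 0x800000  # restore the implicit 1
        b = ((bits >> 23) & 0xFF) - 127 - 23     # unbias, then scale so the mantissa is an integer
        return sign, mantissa, b

    print(decompose_float32(1.24327789))  # (1, 10429371, -23)

Consecutive representable numbers between 1 and 2 differ by one unit in the mantissa, i.e. by exactly 2^−23.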
I think the other answers don't actually address the point of this question.
If you input the number 1.24327789, then at first it's simply a sequence of characters or keystrokes. In order to turn this into a numerical representation, a compiler or interpreter has to convert it. This program understands decimal representations and can produce a standard floating point binary representation. In fact, for its internal purposes, it can first convert the input to a higher precision binary representation than what can be stored later on, and then round that off for the final representation.
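As a rough Python sketch of that pipeline (the explicit double-then-float32 two-step here is only an illustration of "convert at higher precision, then round"; a real compiler or interpreter may do it differently):

    import struct
    from fractions import Fraction

    text = "1.24327789"

    exact = Fraction(text)   # the exact rational value of the decimal text
    as_double = float(text)  # parsed and rounded once to the nearest double
    as_single = struct.unpack('>f', struct.pack('>f', as_double))[0]

    print(Fraction(as_double) - exact)  # tiny: doubles have plenty of spare precision
    print(Fraction(as_single) - exact)  # larger: the float32 rounding error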
- I apologize if this sounds naïve, it kind of is, but why bother with rounding the binary representation? Why not simply truncate? – RWolfe, Jan 5, 2022 at 8:51
- To squeeze a bit more precision out of the process. – isarandi, Jan 5, 2022 at 23:16
- But isn't it the act of rounding that causes the error? I personally would think that storing the value accurately would be more important. – RWolfe, Jan 6, 2022 at 7:28
- Truncation causes larger error. Imagine (in decimal) that the exact value would be 3.68, but we can only store two significant digits. Truncation leads to 3.6, while rounding gives 3.7. Since 3.7 is closer to 3.68 than 3.6 is, we should use rounding. – isarandi, Jan 6, 2022 at 17:36
- I understand that, but I was specifically referring to the binary rounding, not decimal rounding, unless I'm misunderstanding your response? – RWolfe, Jan 7, 2022 at 1:16