Denote by RD(f) and RU(f) the computed approximation obtained by evaluating the function f in floating-point arithmetic with rounding downwards and rounding upwards, respectively.
Assume we know from rounding error analysis that
|RD(f) - f| < E, and
|RU(f) - f| < E.
What is the bound for the difference between RD(f) and RU(f):
|RD(f) - RU(f)| < E, or
|RD(f) - RU(f)| < 2E?
[UPD] In addition to the comments:
Consider a "toy" decimal floating point system with p = 4 (precision, the total number of digits in the significand, including one digit to the left of the radix point) and with an unbounded exponent. For this system, the unit roundoff, u, is defined as follows:
u = 1/2 * 10^{1-4} = 0.0005 for round-to-nearest mode,
u = 10^{1-4} = 0.001 for any of the directed rounding modes.
Let us suppose f = (1.324/1.567 + 1.641/1.878) needs to be computed in this system.
The exact value of f is 1.7187285282921926....
The error analysis shows that
|RD(f) - f| <= E, and
|RU(f) - f| <= E,
where E = n * u * (|1.324/1.567| + |1.641/1.878|),
and, as noted above, u = 0.001.
So,
E = 2 * 0.001 * 1.7187285282921926 ≈ 0.0034374570565843852
(this is a slightly rough estimate, because f was rounded).
Let us now calculate RD(f) and RU(f):
RD(f) = RD(0.8449266113592853 + 0.8738019169329073) = RD(0.8449 + 0.8738) = RD(1.7187) = 1.718
RU(f) = RU(0.8449266113592853 + 0.8738019169329073) = RU(0.8450 + 0.8739) = RU(1.7189) = 1.719
So,
|RD(f) - f| = 0.0007285282921926
|RU(f) - f| = 0.0002714717078074
and
|RD(f) - RU(f)| = 0.001 < 0.0034374570565843852
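These toy-system computations can be reproduced with Python's decimal module, which supports an adjustable precision and directed rounding modes (a minimal sketch; eval_f is just a name chosen here):

```python
from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING

def eval_f(rounding):
    # Evaluate f = 1.324/1.567 + 1.641/1.878 with 4 significant
    # decimal digits, applying the given rounding to every operation.
    ctx = Context(prec=4, rounding=rounding)
    q1 = ctx.divide(Decimal("1.324"), Decimal("1.567"))
    q2 = ctx.divide(Decimal("1.641"), Decimal("1.878"))
    return ctx.add(q1, q2)

rd = eval_f(ROUND_FLOOR)    # rounding downwards
ru = eval_f(ROUND_CEILING)  # rounding upwards
print(rd, ru)  # 1.718 1.719
```

Note that the final addition is rounded too, not just the two divisions.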
From this I suppose that |RD(f) - f| = E only if |RU(f) – f| = 0, and vice versa. Thus,
|RD(f) - RU(f)| <= E.
Or is something wrong in this example?
1 Answer
Let u be the difference between 1 and the next representable number greater than 1. (This is the Unit of Least Precision [ULP] of 1, the value of the least significant bit in the significand for 1 in the floating-point format.)
Consider the function f(x) = (4 − (x + 1⁄2u) − 3) / (1⁄2u). The exact mathematical value of f(1) is −1, but the computed value is 0 with both rounding down and rounding up:
- When rounding down, 1 + 1⁄2u produces 1, then 4−1 produces 3, and 3−3 produces 0.
- When rounding up, 1 + 1⁄2u produces 1+u, then 4−(1+u) is exactly 3−u but must round up to 3 because 3−u is not representable (it is between 3−2u and 3 since the ULP in [2, 4) is twice the ULP in [1, 2)), and 3−3 produces 0.
Thus, for this function on the domain x∈{1}, we have an error bound E = 1 such that |RD(f) − f| ≤ E and |RU(f) − f| ≤ E, but |RD(f) − RU(f)| ≤ 0.
In contrast, consider the function f(x) = (x + 1⁄2u − 1) / (1⁄2u). The exact mathematical value of f(1) is 1, but now the computed value with rounding down is 0 (since 1 + 1⁄2u produces 1, 1−1 produces 0, and 0/(1⁄2u) produces 0), and the computed value with rounding up is 2 (since 1 + 1⁄2u produces 1+u, (1+u)−1 produces exactly u, and u/(1⁄2u) produces 2).
So, in this case, we have the same error bound E = 1 such that |RD(f) − f| ≤ E and |RU(f) − f| ≤ E, but now the best limit on |RD(f) − RU(f)| is |RD(f) − RU(f)| ≤ 2E.
So, in general, given |RD(f) − f| ≤ E and |RU(f) − f| ≤ E, the best bound on |RD(f) − RU(f)| can fluctuate from 0 to 2E.
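Both examples can be checked by emulating directed rounding on top of exact rational arithmetic (a sketch assuming IEEE-754 double precision; round_dir, f1, and f2 are helper names introduced here, not standard functions):

```python
import sys
from fractions import Fraction
from math import inf, nextafter

U = sys.float_info.epsilon  # ULP of 1: gap between 1 and the next double

def round_dir(exact: Fraction, up: bool) -> float:
    """Round an exact rational to a double in the given direction."""
    f = float(exact)            # correctly rounded to nearest
    fr = Fraction(f)
    if up and fr < exact:
        f = nextafter(f, inf)   # smallest double >= exact
    elif not up and fr > exact:
        f = nextafter(f, -inf)  # largest double <= exact
    return f

def f1(x, up):  # (4 - (x + u/2) - 3) / (u/2), one rounding per operation
    t = round_dir(Fraction(x) + Fraction(U) / 2, up)
    t = round_dir(4 - Fraction(t), up)
    t = round_dir(Fraction(t) - 3, up)
    return round_dir(Fraction(t) / (Fraction(U) / 2), up)

def f2(x, up):  # (x + u/2 - 1) / (u/2)
    t = round_dir(Fraction(x) + Fraction(U) / 2, up)
    t = round_dir(Fraction(t) - 1, up)
    return round_dir(Fraction(t) / (Fraction(U) / 2), up)

print(f1(1, False), f1(1, True))  # 0.0 0.0 -> difference 0
print(f2(1, False), f2(1, True))  # 0.0 2.0 -> difference 2E with E = 1
```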
That answers the question in general. In a comment, you ask about f = a1/b1 + a2/b2 + ... + an/bn for positive ai and bi. Given the constraints, and if all the b values are representable, I think every rounding down error must have a negative (toward −∞) effect on the computed result, and every rounding up error must have a positive effect (toward +∞). (If any b value is not representable, its rounding will have the opposite effect on the final result, and the following analysis does not apply.) If E is the best (least) bound such that |RD(f) − f| < E and |RU(f) − f| < E, then it is not possible that |RD(f) − RU(f)| < E, and it is necessary that |RD(f) − RU(f)| < 2E.
(If you change < to ≤, then, if E is the best bound such that |RD(f) − f| ≤ E and |RU(f) − f| ≤ E, then |RD(f) − RU(f)| ≤ E is possible if and only if E is 0. Obviously this is true if E is 0, meaning the arithmetic is exact. If E is not zero, then one of the computations must have had some non-zero error, and therefore the other did too. And since the errors are necessarily monotonic as the computation progresses, the final errors must remain non-zero and of opposite signs.)
[It turns out I do not need the argument x in f(x); I could have simply used a constant function f as presented in the question. However, I wrote up the demonstration that way before I realized I did not need it.]
5 Comments
- f = a1/b1 + a2/b2 + ... + an/bn, where ai > 0 and bi > 0 for all i = 1, 2, ..., n. Then the bound will be E? How can this be proved? Maybe you can suggest books that give an error analysis for floating-point intervals?
- f = a1/b1 + a2/b2 + ... + an/bn (all ai > 0 and bi > 0) using rounding downwards and rounding upwards. In this case, E = n * u * f, where u is the unit roundoff. For directed roundings u = 2^{1-p}, where p is the precision.
- f = a1/b1 + a2/b2 + ... + an/bn for positive ai and bi: I think that if every rounding downwards gives a large negative error (close to the bound), the corresponding rounding upwards error (positive) will be small, and vice versa. So, if |RD(f) − f| is close to E, then |RU(f) − f| will be close to 0, and vice versa, if |RU(f) − f| is close to E, then |RD(f) − f| will be close to 0. Thus, |RD(f) − RU(f)| < E. Or is it wrong?
- The error bound for a/b where the quotient is in [1/10, 1) is u/10, not u, since the quotients have a lower exponent than 1 does. Additionally, it appears only the two division operations have been considered, but the addition has a rounding error too, particularly since the sum has a larger exponent (0) than the two things being added (both −1). Also, one cannot simply multiply the number of operations n by the "unit roundoff" u, since the "unit roundoff" varies with the result exponent.
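That the spacing of representable numbers, and hence the directed-rounding error, depends on the exponent can be seen in the toy decimal system using Python's decimal module (a small sketch):

```python
from decimal import Decimal, Context

ctx = Context(prec=4)  # the toy system: 4 significant decimal digits

# One ULP (the gap to the next representable number) at two magnitudes:
ulp_small = ctx.next_plus(Decimal("0.8449")) - Decimal("0.8449")
ulp_large = ctx.next_plus(Decimal("1.718")) - Decimal("1.718")
print(ulp_small, ulp_large)  # 0.0001 0.001
```

The quotients (exponent −1) sit on a grid ten times finer than the sum (exponent 0), which is why a single u cannot bound every operation.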