I am currently reading a chapter in a textbook on Processor Architecture and saw the following statement:
The less precision there is, the less space is occupied by a program variable in memory. Further, there is often a time advantage, both in ferrying the operands back and forth between the processor and memory, and for arithmetic and logic operations that need less precision. This is particularly true for floating-point arithmetic operations.
Why are less precise data types like float sometimes faster than larger, more precise types like double? Can somebody expand on this explanation and maybe give an example?
3 Answers
For intuitively the same reason why it's faster to calculate 2 + 2 by hand than it is to calculate 3685 + 2193: there's simply less data to work your way through.
- Less data might be more precisely defined as fewer bytes. – Frank Hileman, Feb 8, 2017 at 22:35
- Yep. And even on an architecture where individual single-precision operations are not faster than double-precision ones, using half as much memory can still be a great performance advantage (provided that you're OK with the reduced accuracy). – dan04, Feb 9, 2017 at 0:38
Single-precision floating-point format compared to double precision:
- uses less memory, so it can be transferred into a register faster (usually in one machine instruction; see the sketch below)
- has less accuracy, so some approximations can be used for faster calculations (at the software level this means fewer machine instructions per call; at the hardware level, fewer CPU clock cycles per instruction)
The size of double-word types (double, long) also influences higher-level language specifications; for example, Java does not guarantee that access to a variable of such a type is atomic (done in one step from the point of view of an external observer).
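A minimal sketch of the first point (not from the answer; exact timings depend on compiler, flags, and hardware): the C++ snippet below prints the sizes of float and double and times a plain sum over a large array of each. It also happens to show the accuracy cost, since adding 1.0f to a float total stops having any effect once the total reaches 2^24.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Sum an n-element array of type T and report the elapsed wall-clock time.
template <typename T>
double sum_seconds(std::size_t n) {
    std::vector<T> v(n, static_cast<T>(1.0));   // n elements, all equal to 1.0
    auto start = std::chrono::steady_clock::now();
    T total = 0;
    for (T x : v) total += x;                   // reads 4 bytes/element for float, 8 for double
    auto stop = std::chrono::steady_clock::now();
    // Print the result so the compiler cannot optimize the loop away.
    // The float total sticks at 16777216 (2^24), illustrating the accuracy cost.
    std::printf("  (sum = %g)\n", static_cast<double>(total));
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    // Typical sizes: 4 bytes for float, 8 bytes for double.
    std::printf("sizeof(float) = %zu, sizeof(double) = %zu\n",
                sizeof(float), sizeof(double));
    const std::size_t n = 50'000'000;
    std::printf("float  sum took %.3f s\n", sum_seconds<float>(n));
    std::printf("double sum took %.3f s\n", sum_seconds<double>(n));
}
```

Built with optimizations (e.g. g++ -O2), the float loop moves half as many bytes through the memory hierarchy per element, and the printed float total ends up at 16777216 rather than 50000000.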
An FPU or GPU can (sometimes) parallelize more 32-bit (float) FP operations than 64-bit (double) FP operations. That is, if it can add 2 doubles in parallel, it can add 4 floats in parallel.
For highly-optimized tight loops this can have a dramatic effect, especially on GPU where the processing units are less constrained with memory bandwidth.
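As a rough illustration of that width difference (assuming an x86 CPU with SSE2; the same idea applies to wider AVX registers and to GPUs), the sketch below adds four floats with one packed instruction but only two doubles with one packed instruction:

```cpp
#include <emmintrin.h>   // SSE/SSE2 intrinsics (x86)
#include <cstdio>

int main() {
    // One 128-bit XMM register holds 4 floats or 2 doubles.
    alignas(16) float  fa[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) float  fb[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    alignas(16) float  fr[4];
    alignas(16) double da[2] = {1.0, 2.0};
    alignas(16) double db[2] = {10.0, 20.0};
    alignas(16) double dr[2];

    __m128  vf = _mm_add_ps(_mm_load_ps(fa), _mm_load_ps(fb));  // one ADDPS: 4 float additions
    __m128d vd = _mm_add_pd(_mm_load_pd(da), _mm_load_pd(db));  // one ADDPD: 2 double additions

    _mm_store_ps(fr, vf);
    _mm_store_pd(dr, vd);

    std::printf("floats:  %g %g %g %g\n", fr[0], fr[1], fr[2], fr[3]);
    std::printf("doubles: %g %g\n", dr[0], dr[1]);
}
```

A loop vectorized this way retires twice as many single-precision additions per instruction; with 256-bit AVX registers the counts become 8 floats versus 4 doubles, which is the ratio the answer describes.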
- GPUs tend to be worse than a 1/2 ratio. Only professional-oriented GPUs (Quadro and FirePro) get anywhere close, typically around 1/3 if they're marketed towards those needing FP64. Though, Nvidia's recently announced Quadro GP100 is a rare sample that does have a 1/2 ratio. Typical consumer GPUs these days tend to have ratios like 1/32 to 1/24. – 8bittree, Feb 9, 2017 at 17:57