When do you use float and when do you use double

Question 1

Frequently, in my programming experience, I need to decide whether to use float or double for my real numbers. Sometimes I go for float, sometimes for double, but the choice feels rather subjective. If I were asked to defend my decision, I probably could not give sound reasons.

When do you use float and when do you use double? Do you always use double and go for float only when memory constraints are present? Or do you always use float unless the precision requirement forces you to use double? Are there substantial differences in the computational cost of basic arithmetic between float and double? What are the pros and cons of using float or double? And have you ever used long double?

Question 2

In many cases you want to use neither, but rather a decimal floating-point or fixed-point type. Binary floating-point types can't represent most decimal fractions exactly.

Question 3

Related to "What causes floating point rounding errors?". @CodesInChaos, my answer there suggests resources to help you make that determination; there is no one-size-fits-all solution.

Question 4

What exactly do you mean by "decimals"? If you need to represent values like 0.01 exactly (say, for money), then (binary) floating-point is not the answer. If you merely mean non-integer numbers, then floating-point is likely OK, but then "decimals" is not the best word to describe what you need.
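To make the point about exact values concrete, here is a minimal sketch in C. It assumes double is IEEE 754 binary64, and the function names are made up for illustration. It compares a sum of two prices held as doubles with the same sum held as whole cents in an integer.

```c
#include <stdbool.h>

/* Adds two prices held as binary doubles and reports whether the
   result compares equal to the expected total. */
bool double_sum_matches(double a, double b, double expected)
{
    double sum = a + b;   /* rounded to double on assignment */
    return sum == expected;
}

/* Same check with the prices held as whole cents in an integer type. */
bool cents_sum_matches(long a_cents, long b_cents, long expected_cents)
{
    return a_cents + b_cents == expected_cents;
}
```

Under that assumption, double_sum_matches(0.10, 0.20, 0.30) is false because none of the three values is exactly representable in binary, while cents_sum_matches(10, 20, 30) is true.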

Question 5

Considering (as of today) most graphics cards accept floats over doubles, graphics programming often uses single precision.

Question 6

You don't always have a choice. For example, on the Arduino platform, both double and float equate to float. You need to find an add-in library to handle real doubles.
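If you want to know whether double really differs from float on your target, the standard header float.h can tell you. This is only a sketch; the function name is hypothetical.

```c
#include <float.h>
#include <stdbool.h>

/* True when double carries more significand bits than float.
   On a target where double uses the same format as float,
   DBL_MANT_DIG equals FLT_MANT_DIG and this returns false. */
bool double_is_wider_than_float(void)
{
    return DBL_MANT_DIG > FLT_MANT_DIG;
}
```

On a typical desktop compiler this returns true (53 versus 24 significand bits). If your code depends on real double precision, a C11 _Static_assert on DBL_MANT_DIG would turn the check into a compile-time error instead.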

Question 7

The default choice for a floating-point type should be double. This is also the type that you get with floating-point literals without a suffix or (in C) standard functions that operate on floating point numbers (e.g. exp, sin, etc.).
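The claim about literals and the standard functions can be checked with C11's _Generic. In this sketch the macro and function names are mine, not part of any library. _Generic only inspects the type of its operand, so sin and sinf are never actually called here.

```c
#include <math.h>

/* Maps an expression to the name of its floating-point type (C11 _Generic). */
#define FP_TYPE_NAME(x) _Generic((x), \
    float: "float",                   \
    double: "double",                 \
    long double: "long double",       \
    default: "other")

/* An unsuffixed literal is a double; the f suffix makes it a float. */
const char *type_of_unsuffixed_literal(void) { return FP_TYPE_NAME(1.5); }
const char *type_of_f_literal(void)          { return FP_TYPE_NAME(1.5f); }

/* sin takes and returns double even for a float argument; sinf is
   the float variant. (This assumes <tgmath.h> is not included.) */
const char *type_of_sin_result(void)         { return FP_TYPE_NAME(sin(1.5f)); }
const char *type_of_sinf_result(void)        { return FP_TYPE_NAME(sinf(1.5f)); }
```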

float should only be used if you need to operate on a lot of floating-point numbers (think on the order of thousands or more) and analysis of the algorithm has shown that the reduced range and accuracy don't pose a problem.

long double can be used if you need more range or accuracy than double, and if it provides this on your target platform.
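Whether long double provides anything extra on a given platform can be read from float.h. A small sketch, with hypothetical function names:

```c
#include <float.h>
#include <stdbool.h>

/* True when long double has more significand bits than double. */
bool long_double_adds_precision(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* True when long double covers a larger exponent range than double. */
bool long_double_adds_range(void)
{
    return LDBL_MAX_EXP > DBL_MAX_EXP;
}
```

On some compilers long double uses the same format as double, in which case both functions return false and there is nothing to gain from it.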

In summary, float and long double should be reserved for use by the specialists, with double for "every-day" use.

Question 8

I would probably not consider float for a few thousand values unless there were a performance problem related to floating point caching and data transfer. There is usually a substantial cost to doing the analysis to show that float is precise enough.

Question 9

As an addendum, if you need compatibility with other systems, it can be advantageous to use the same data types.

Question 10

I'd use floats for millions of numbers, not 1000s. Also, some GPUs do better with floats, in that specialized case use floats. Else, as you say, use doubles.

Question 11

@PatriciaShanahan - 'performance problem related to..' A good example is if you are planning to use SSE2 or similar vector instructions, you can do 4 ops/vector in float (vs 2 per double) which can give a significant speed improvement (half as many ops and half as much data to read & write). This can significantly lower the threshold where using floats becomes attractive, and worth the trouble to sort out the numeric issues.

Question 12

I endorse this answer with one additional piece of advice: when operating on RGB values for display, it is acceptable to use float (and occasionally half precision), because neither the human eye, nor the display, nor the color system has that many bits of precision. This advice applies to, say, OpenGL. It does not apply to medical images, which have stricter precision requirements.
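For the common case of 8 bits per channel, you can verify that float loses nothing. The sketch below assumes 8-bit channels and a [0, 1] float range, which the comment above does not spell out; the function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts an 8-bit channel value to a float in [0, 1]. */
float channel_to_float(uint8_t v)
{
    return (float)v / 255.0f;
}

/* Converts back, rounding to the nearest 8-bit value. */
uint8_t float_to_channel(float f)
{
    return (uint8_t)(f * 255.0f + 0.5f);
}

/* True if all 256 channel values survive the round trip unchanged. */
bool rgb8_round_trip_is_lossless(void)
{
    for (int v = 0; v <= 255; ++v) {
        if (float_to_channel(channel_to_float((uint8_t)v)) != v)
            return false;
    }
    return true;
}
```

The rounding error of float is several orders of magnitude smaller than one 8-bit step, so the check passes for every value.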

Question 13

There is rarely cause to use float instead of double in code targeting modern computers. The extra precision reduces (but does not eliminate) the chance of rounding errors or other imprecision causing problems.

The main reasons I can think of to use float are:

You are storing large arrays of numbers and need to reduce your program's memory consumption.
You are targeting a system that doesn't natively support double-precision floating point. Until recently, many graphics cards only supported single precision floating points. I'm sure there are plenty of low-power and embedded processors that have limited floating point support too.
You are targeting hardware where single-precision is faster than double-precision, and your application makes heavy use of floating point arithmetic. On modern Intel CPUs I believe all floating point calculations are done in double precision, so you don't gain anything here.
You are doing low-level optimization, for example using special CPU instructions that operate on multiple numbers at a time.

So, basically, double is the way to go unless you have hardware limitations or unless analysis has shown that storing double precision numbers is contributing significantly to memory usage.

Question 14

"Modern computers" meaning Intel x86 processors. Some of the machines the Ancients used provided perfectly adequate precision with the basic float type. (The CDC 6600 used a 60-bit word, 48 bits of normalized floating-point mantissa, 12 bits of exponent. That's ALMOST what the x86 gives you for double precision.)

Question 15

@John.R.Strohm: agreed, but C compilers did not exist on CDC6600. It was Fortran IV...

Question 16

By "modern computers" I mean any processor built in the last decade or two, or really, since the IEEE floating point standard was widely implemented. I'm perfectly aware that non-x86 architectures exist and had that in mind with my answer - I mentioned GPUs and embedded processors, which are typically not x86.

Question 17

That's simply not true, though. SSE2 can manipulate 4 floats or 2 doubles in one operation, AVX can manipulate 8 floats or 4 doubles, AVX-512 can manipulate 16 floats or 8 doubles. For any kind of high performance computing, math on floats should be thought of as twice the speed of the same operations on doubles on x86.
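The lane counts quoted above follow directly from the register width and the element size. Here is a sketch of the arithmetic; the function name is hypothetical, and it assumes the usual 4-byte float and 8-byte double.

```c
#include <limits.h>
#include <stddef.h>

/* Number of elements of a given size that fit in one vector register. */
size_t lanes_per_register(size_t register_bits, size_t element_bytes)
{
    return register_bits / (CHAR_BIT * element_bytes);
}
```

With 128-bit (SSE2), 256-bit (AVX) and 512-bit (AVX-512) registers, lanes_per_register gives 4, 8 and 16 for sizeof(float), and 2, 4 and 8 for sizeof(double).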

Question 18

And it's even worse than that, since you can fit twice as many floats in processor cache as you can with doubles, and memory latency is likely to be the main bottleneck in many programs. Keeping a whole working set of floats warm in cache may be literally an order of magnitude faster than using doubles and having them spill to RAM.

Question 19

Use double for all your calculations and temp variables. Use float when you need to maintain an array of numbers, float[] (if the precision is sufficient), and you are dealing with tens of thousands of numbers or more.

Many/most math functions or operators convert/return double, and you don't want to cast the numbers back to float for any intermediate steps.

E.g., if you have an input of 100,000 numbers from a file or a stream and need to sort them, put the numbers in a float[].
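The "store as float, compute in double" idea might look like this in C. It is a sketch with hypothetical function names. The second function is there only for comparison and keeps the running sum in a float.

```c
#include <stddef.h>

/* Sums a float array using a double accumulator. */
double sum_with_double_accumulator(const float *values, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += values[i];   /* each float converts to double exactly */
    return acc;
}

/* Sums the same array using a float accumulator, for comparison.
   volatile forces every partial sum to be rounded to float, even on
   targets that would otherwise keep it in a wider register. */
float sum_with_float_accumulator(const float *values, size_t n)
{
    volatile float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc = acc + values[i];
    return acc;
}
```

Assuming IEEE 754 arithmetic, summing one million copies of 0.1f with the double accumulator stays within a tiny fraction of the exact total, while the float accumulator ends up off by far more than 1.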

Question 20

Some platforms (ARM Cortex-M2, Cortex-M4, etc.) don't support double in hardware. You can always check this in your processor's reference manual. The absence of compilation warnings or errors does not mean that the code is optimal, because double can be emulated in software. That is why you may need to stick to int or float.

If that is not the case, I would use double.

You can check the famous article by D. Goldberg ("What Every Computer Scientist Should Know About Floating-Point Arithmetic"). You should think twice before using floating-point arithmetic. There is a pretty big chance that it is not needed at all in your particular situation.

http://perso.ens-lyon.fr/jean-michel.muller/goldberg.pdf

Question 21

This question was already pretty well answered a year ago... but in any case, I'd say that whenever you would use double on a platform with double-precision FPU acceleration, you should use it on any other platform too, even if that means letting the compiler emulate it instead of taking advantage of an FPU that supports float only. (Note that FPUs aren't required on all platforms either; in fact, the Cortex-M4 architecture defines the FPU as an optional feature. Was M2 a typo?)

Question 22

The key to that logic is that, while it's true one should be wary of floating point arithmetic and its many "quirks", one should definitely not take the presence of FPU support for doubles to mean "simply use doubles instead of floats". Floats are generally faster than doubles and take less memory (FPU features vary). The volume of usage precludes this point from being premature optimization, as does the fact that doubles are clearly overkill for a lot of (maybe even most) applications. Do the elements on this page really need to have their relative positions and sizes calculated to 13 decimal places?

Question 23

When including a link to an off site page or document, please copy the relevant information, or summary, from the document into your answer. Off site links have a tendency to disappear over time.

Question 24

For real world problems the sampling threshold of your data is important when answering this question. Similarly, the noise floor is also important. If either is exceeded by your data type selection, no benefit will come from increasing precision.

Most real-world samplers are limited to 24-bit ADCs. This suggests that a 32-bit float, whose significand has 24 bits of precision, should be adequate for calculations on real-world data.

Double precision comes at the cost of 2x memory. Therefore, preferring floats over doubles could drastically cut the memory footprint and bandwidth of running applications.
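The 24-bit argument can be checked directly: any integer sample that fits in 24 bits converts to float and back without change. This is a sketch assuming float is IEEE 754 single precision; the function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an integer sample converts to float and back unchanged.
   Intended for samples well inside the int32_t range; converting a
   float that is out of range back to int32_t is undefined. */
bool sample_survives_float(int32_t sample)
{
    float f = (float)sample;
    return (int32_t)f == sample;
}
```

Signed 24-bit samples (from -8388608 to 8388607) all pass, and so does everything up to 16777216. The first integer that fails is 16777217, which needs 25 bits.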

Question 25

A very simple rule: You use double unless you, personally, can give reasons that you can defend, why you would use float.

Consequently, if you ask "should I use double or float", the answer is "use double".

Question 26

The choice between float and double depends on the accuracy required of the data. If an answer must differ only negligibly from the exact result, many decimal places are needed, which dictates the use of double. float will chop off some of the decimal places, reducing the accuracy.
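To see how many digits float actually keeps, you can push a value through float and look at what comes back. A sketch, assuming IEEE 754 formats for float and double; the function name is hypothetical.

```c
#include <float.h>

/* Rounds a value to float precision and widens it back to double. */
double through_float(double x)
{
    return (double)(float)x;
}
```

With those formats, through_float(123456789.0) returns 123456792.0: only about the first seven significant digits survive. The macros FLT_DIG and DBL_DIG from float.h give the number of decimal digits each type is guaranteed to preserve (6 and 15 respectively for the IEEE formats).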

Question 27

This answer doesn't add anything new to the question, and fails to say anything of actual use.

Question 28

Usually, I use the float type when I don't need much precision, for example for money, which is wrong, but it is what I'm used to doing.

On the other hand, I use double when I need more precision, for example for complex mathematical algorithms.

The C99 standard says this:

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.

I never really used long double, but I don't use C/C++ so much. Usually I use dynamically typed languages like Python, where you don't have to care about the types.

For further information about Double vs Float, see this question at SO.

Question 29

Using floating point for serious money calculations is probably a mistake.

Question 30

float is exactly the wrong type for money. You need to be using the highest precision possible.

Question 31

@BartvanIngenSchenau Floating point for money is usually okay; binary floating point is not. For example, .NET's Decimal is a floating point type and it's typically a good choice for money calculations.

Question 32

@ChrisF You don't need "high precision" for money, you need exact values.

Question 33

@SeanMcSomething - Fair point. However, floats are still the wrong type though and given the floating point types available in most languages you need "high precision" to get "exact values".

score 217 · Accepted Answer · 2013-02-28 10:50:07Z


answered Feb 28, 2013 at 10:50

Bart van Ingen Schenau

Comments on this answer (full text above, under Question 8 to Question 12):

Question 8: Patricia Shanahan, Feb 28, 2013 at 15:35
Question 9: zzzzBov, Feb 28, 2013 at 16:30
Question 10: user949300, Aug 19, 2014 at 16:57
Question 11: greggo, Sep 9, 2014 at 19:03
Question 12: rwong, Nov 17, 2014 at 22:00
