Convert an integer to IEEE 754 float

Question 1

The task is simple: given a 32-bit integer, convert it to its floating point value as defined by the IEEE 754 (32-bit) standard.
To put it another way, interpret the integer as the bit-pattern of an IEEE binary32 single-precision float and output the numeric value it represents.

IEEE 754 single precision

Here is a converter for your reference.

Here is how the format looks, from Wikipedia's excellent article:

The standard is similar to scientific notation.

The sign bit determines whether the output is negative or positive. If the bit is set, the number is negative; otherwise it is positive.

The exponent bits determine the exponent (base 2); their value is offset by 127. Therefore the power-of-two factor is \$2^{n-127}\$, where n is the integer representation of the exponent bits.

The mantissa defines a floating point number in the range \$[1,2)\$. It represents the number as a binary fraction: the most significant bit is worth \$\frac 1 2\$, the one to its right \$\frac 1 4\$, the next one \$\frac 1 8\$, and so on. A one is added to the value by default, implied by a non-zero exponent.

Now the final number is: $$\text{sign}\cdot 2^{\text{exponent}-127}\cdot \text{mantissa}$$
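As a rough, ungolfed sketch of the formula above, in Python. The name `decode_normal` is made up for illustration, and the code assumes a non-zero exponent field, as the description does:

```python
def decode_normal(n):
    """Decode a 32-bit pattern as binary32, assuming a non-zero exponent field."""
    sign = -1 if n >> 31 else 1        # bit 31
    exponent = (n >> 23) & 0xFF        # bits 30..23, biased by 127
    fraction = n & 0x7FFFFF            # bits 22..0
    mantissa = 1 + fraction / 2**23    # implied leading 1, value in [1, 2)
    return sign * 2.0 ** (exponent - 127) * mantissa


print(decode_normal(1056964608))  # 0.5
print(decode_normal(3205496832))  # -0.5625
```

Every intermediate value fits in a double exactly, so the result is the exact value of the binary32 number rather than an approximation of it.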

Test cases

1078523331 -> 3.1400001049041748046875
1076719780 -> 2.71000003814697265625
1036831949 -> 0.100000001490116119384765625
3264511895 -> -74.24919891357421875
1056964608 -> 0.5
3205496832 -> -0.5625
0 -> 0.0
2147483648 -> -0.0 (or 0.0)
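
Working the first test case through by hand may help. 1078523331 is 0x4048F5C3, so the sign bit is 0, the exponent bits read 128, and the mantissa bits read 4781507:

$$(+1)\cdot 2^{128-127}\cdot\left(1+\frac{4781507}{2^{23}}\right) = 2\cdot 1.57000005245208740234375 = 3.1400001049041748046875$$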

For this challenge assume that cases like NaN and inf are not going to be the inputs, and subnormals need not be handled (except for 0.0 which works like a subnormal, with the all-zero exponent implying a leading 0 bit for the all-zero mantissa.) You may output 0 for the case where the number represented is -0.

This is code-golf, so the shortest answer in bytes wins.

Question 2

Are we allowed to take input as a list of 4 bytes?

Question 3

Do we need to return all decimals in your examples? Or are 15 significant digits enough? That is, output 3.14000010490417 in the first case?

Question 4

@LuisMendo 15 digits is enough
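
For anyone checking output against the test cases, here is a small Python sketch of both forms: the full decimal expansion and the 15-significant-digit form. The helper name `bits_to_float` is just for illustration; it uses the standard `struct` module with explicit little-endian codes.

```python
from decimal import Decimal
import struct


def bits_to_float(n):
    # Reinterpret the unsigned 32-bit pattern as a binary32 value.
    return struct.unpack("<f", struct.pack("<I", n))[0]


x = bits_to_float(1078523331)
print(Decimal(x))         # exact expansion: 3.1400001049041748046875
print(format(x, ".15g"))  # 15 significant digits: 3.14000010490417
```

`Decimal(x)` converts the float exactly, which is why it reproduces every digit shown in the test cases.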

Question 5

The challenge as written maps 0 to \$2^{-127}\$, but the test case claims that 0 should map to 0. I imagine this is a special case to allow 0 to be represented in IEEE 754 floats, but this is not clear from the challenge text.

Question 6

@loopywalt: The description did not previously describe how to handle +-0.0. I edited, assuming that those test cases implied this was part of the challenge. If not, the OP should remove the +-0.0 test cases. The all-zero exponent encoding implies a leading 0 bit for the all-zero mantissa, so it's 2^(-126) * 0.mantissa instead of 2^(-127) * 1.mantissa. en.wikipedia.org/wiki/… . Since you don't need to handle other cases of all-zero exponent, you could special-case the whole bit-pattern, e.g. if(!(x<<1)) return 0; (with x as an unsigned 32-bit value).
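
A minimal Python sketch of the rule described in this comment, with the all-zero exponent treated as 2^(-126) * 0.mantissa. The function name `decode` is hypothetical, and Inf/NaN are deliberately not handled since the challenge excludes them:

```python
def decode(n):
    """Decode a binary32 bit pattern, including zeros and subnormals (not Inf/NaN)."""
    sign = -1.0 if n >> 31 else 1.0
    exponent = (n >> 23) & 0xFF
    fraction = (n & 0x7FFFFF) / 2**23
    if exponent == 0:
        # All-zero exponent: the leading bit is 0 and the scale is 2^-126.
        return sign * 2.0**-126 * fraction
    return sign * 2.0 ** (exponent - 127) * (1 + fraction)


print(decode(0))           # 0.0
print(decode(2147483648))  # -0.0
print(decode(1) == 2.0**-149)  # True: the smallest subnormal
```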

Question 7

Python, 55 bytes

Correct as per original challenge description (always add 2^23 to mantissa) but not per IEEE.

lambda i:-(i>>31or-1)*2**((i>>23)%256-129)*(i/8**7%4+4)

Attempt This Online!

Direct bit twiddling, no casting.

Python, 69 bytes

At last proper IEEE, I think (thanks @Neil).

lambda i:-(i>>31or-1)*2**((e:=i>>23&255)-126-(e>0))*(i/2**23%1+(e>0))

Attempt This Online!

Python NumPy, 47 bytes

lambda i:int32(i).view("f4")
from numpy import*

Attempt This Online!

Boring use of builtin "view" or "reinterpret" casting. Note that we can save the "u" from uint32 without issues.

Question 8

The no-casting version seems to fail for the testcases 0 and 2147483648.

Question 9

The "Proper IEEE" version actually halves subnormals.

Question 10

Yeah, this looks right, now. 2^(-126) * 0.mantissa instead of 2^(-127) * 1.mantissa. An all-zero exponent encodes the same power of 2 as the minimum normalized float, instead of (not as well) changing the mantissa interpretation, so there isn't a gap in which values can be represented. My edit to fix the question's 0 handling didn't mention that detail. :/ (Fun fact: even 80-bit x87 IEEE extended precision with an explicit leading-1 bit works this way, so the leading 1 is always redundant.)

Question 11

x86 32-bit machine code, 5 bytes

D9 44 24 04 C3

Try it online!

Following the cdecl calling convention, this takes the 32-bit integer on the stack and returns the result on the FPU register stack.

In assembly:

f: fld DWORD PTR [esp + 4]
 ret

(fld does everything that is needed. The integer is placed below the return address on the stack, hence the + 4 to get to it.)

Question 12

x64 works similarly well with "movd xmm0,edi".

Question 13

Note that what fld m32 does is convert from binary32 to the x87 80-bit format in st0. So it widens the exponent field and appends zeros to the mantissa, and also creates an explicit leading bit of the mantissa according to the exponent being non-zero. This is how legacy x86 code normally returns FP values; the caller of this function can recover the original float with an fstp m32 instruction, or get it as a double with fstp m64, or get the 80-bit value with fstp m80.

Question 14

This is fine: the question asks for any FP or numeric value that holds the value represented by the binary32. It doesn't require a type-pun to an actual float binary32 return value. Which makes sense; many languages only have double-precision floating point, if they have non-integers at all.
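
To illustrate the point about wider types in Python, whose `float` is a double: the widened value holds the binary32 value exactly, so narrowing it again recovers the original bits. The helper names below are made up for this sketch.

```python
import struct


def bits_to_float(n):
    # binary32 pattern -> Python float (a double)
    return struct.unpack("<f", struct.pack("<I", n))[0]


def float_to_bits(x):
    # Python float -> binary32 pattern (rounds if x is not representable)
    return struct.unpack("<I", struct.pack("<f", x))[0]


n = 1036831949
x = bits_to_float(n)
print(x)                      # 0.10000000149011612
print(float_to_bits(x) == n)  # True
```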

Question 15

@throx: This is a challenge where a custom calling convention completely trivializes it. e.g. take the value by reference and update it in place, just a 1-byte ret. Or take the integer arg in XMM0; that's probably more justifiable than returning an FP bit-pattern in an integer reg. Although ARM soft-float calling conventions do pass FP bit-patterns in integer regs, and the return value reg r0 is also the first arg-passing reg. godbolt.org/z/x7T6W8xzT

Question 16

@throx: Sorry, my comment was badly phrased. You're correct that x86-64 SysV or Windows x64 would both need 5-byte functions. But we can get shorter if we consider alternatives as recommended by Tips for golfing in x86/x64 machine code, like a 1-byte C3 ret. A bit hard to justify for x86, although I posted an answer with some discussion of it. But justifiable for ARM with a soft-float ABI, for a 2-byte answer, which is the first part of the answer I wrote.

Question 17

Rust, 14 bytes

f32::from_bits

Not even reached the 30 byte min limit for posts

Question 18

Don't codegolf answers have to be functions, lambdas, or programs? Not just code fragments or the name of a built-in function. I think it has to be something that would let later code do f(1234), i.e. something that lets later call-sites use a custom (short) name for this operation.

Question 19

Since you can do let f = f32::from_bits; f(5) in the same way you can do let f = |x|f32::from_bits(x) I think it's valid. There are a lot of other answers that use this technique too, like this one

Question 20

Ok, that makes sense. If we're going to allow lambdas that require the surrounding code to give it a name if they want to reuse it, and bare function names can be used the same way, it wouldn't make sense to disallow them.

Question 21

C (GCC) without reliance on undefined behaviour, 40 bytes (previously 41)

#define f(x)((union{int a;float b;})x).b

Attempt This Online!

Type-punning through pointers does normally work, but the compiler is free to do strange optimisations which can stop it working. A union makes this explicit in ways the compiler understands. It can also potentially be evaluated at compile time instead of at run time.
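
The same union idea can be sketched outside C. The following Python version uses `ctypes.Union`, whose fields share storage just like the members of the C union; the class name `IntFloat` and function name `pun` are hypothetical, and this is an illustration rather than a golfed answer.

```python
import ctypes


class IntFloat(ctypes.Union):
    # Both fields occupy the same 4 bytes.
    _fields_ = [("a", ctypes.c_uint32), ("b", ctypes.c_float)]


def pun(x):
    u = IntFloat()
    u.a = x        # store the integer bit pattern
    return u.b     # read the same bytes back as a float


print(pun(1056964608))  # 0.5
print(pun(3205496832))  # -0.5625
```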

Note that it's best practise in C/C++ to use parentheses around the input value to a function-like macro and around the result, so you don't get unwanted interactions with precedence rules if this is used in a more complex statement. This would add 4 extra bytes to the total. For the tests defined in the question, we don't need this.

(Thanks @ceilingcat for spotting an unneeded space.)

Question 22

Welcome to Code Golf!

Question 23

Fun fact: this is well-defined in ISO C99 (assuming int and float are the same width), but not in ISO C++. It is well-defined in GNU C++, and in Visual C++, going beyond what ISO C++ defines. (In ISO C++, the safe ways to type-pun are std::memcpy and C++20 std::bit_cast<float>(x)). The reason for type-punning via pointers working is that MSVC explicitly supports it, and modern GCC tries to notice idioms like that and sometimes not break them even with the default -fstrict-aliasing. Older GCC versions would happily break *(float*)&x even though it sees the &, cast, and deref.

Question 24

@PeterCordes: Does this answer with the union work with ISO C++?

Question 25

@pts: No. It was edited to add "/ C++" after I commented, despite my earlier comment pointing out it's not well-defined in ISO C++, only "C++ (GCC)" and some other specific implementations. Type punning between integer and array using `union`? / Unions and type-punning. It does only claim to be a "C / C++ (GCC)" answer, but I agree it would be much better to point out that it depends on a GNU extension for the C++ part, if it's going to talk about doing it without UB.

Question 26

@PeterCordes OK, I'll undo that edit - thanks for the feedback.

Question 27

C++ (GCC / Clang / MSVC), 39 bytes

#define f(x)__builtin_bit_cast(float,x)

Try it online!

Question 28

Welcome to Code Golf, and nice answer!

Question 29

This does work in recent GCC and clang even without specifying -std=gnu++20 or c++20, so no need to worry about extra bytes for the command-line options you used in your Godbolt link. Also, yes, apparently MSVC used the same name as GCC/clang, following GCC's naming pattern for compiler built-ins in this case. Surprising.

Question 30

Heh, I noticed it worked even without specifying that flag, but forgot to remove it from the TIO (and to count it as bytes...)

Question 31

I'm not sure if -std=c++20 needs to count as part of the answer, since C++20 is a standard language, and it just happens that current GCC needs an option to fully operate in C++20 mode because that's not yet the default. As opposed to Perl where some interpreter options add code to your program. A meta answer I found from a former mod mentioned that -m32 should count as zero bytes, since you could just as well have used GCC on a 32-bit system where that's the default. So I'm debunking the concern I raised about options in my last comment.

Question 32

For the record, the only answer on Command-line flags on front ends proposes that flags should count as different languages, and has about 5:1 up:down votes. That's fully appropriate for C++20 vs. the default. But wonky for Perl, sed, and awk especially.

Question 33

Factor, 10 bytes

bits>float

Try it online!

Question 34

Java, 21 bytes

Float::intBitsToFloat

Attempt This Online!

Builtin :P

Question 35

JavaScript (ES6), 50 bytes

-16 bytes (!) thanks to @Neil

n=>new Float32Array(new Int32Array([n]).buffer)[0]

Try it online!

Question 36

new Int32Array([n]).buffer actually works on both little and big-endian architectures.

Question 37

From @loopywalt's Python answer, save another byte: n=>(4+n/8**7%4)*2**((n<<1>>>24)-129)*(n>>31|1)

Question 38

The no-casting version seems to fail for the testcases 0 and 2147483648.

Question 39

@Neil: As long as float endianness matches integer endianness; apparently there have been some unfortunate historical architectures where that wasn't the case: Floating point Endianness? quotes wikipedia. But a newer quote from the same article says that all modern machines (using IEEE754) have matching int and FP endianness.
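
A quick way to see the endianness point in Python: as long as the integer and the float are serialized with the same byte order, the order itself drops out. The `order` parameter is an illustrative name; `"<"` and `">"` are the standard `struct` prefixes for little- and big-endian.

```python
import struct


def bits_to_float(n, order):
    # The same byte order is applied to the integer view and the float view.
    return struct.unpack(order + "f", struct.pack(order + "I", n))[0]


n = 3264511895
print(bits_to_float(n, "<") == bits_to_float(n, ">"))  # True
```

Mixing the two (packing with one order and unpacking with the other) models the historical mismatch mentioned above and generally gives a different value.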

Question 40

@alephalpha: 0 (representing 0.0) is technically a subnormal value (or works like one): the exponent is all zero, so the implicit leading bit of the mantissa is 0, not 1. Some FP bithacks that fail for Inf/NaN/subnormals also fail for 0.0; I've seen that before in Why don't GCC and Clang optimize multiplication by 2^n with a float to integer PADDD of the exponent, even with -ffast-math?

Question 41

Charcoal, 45 bytes

×ばつ+%θηηX2−ζ150∨‹θX2¦31±1

Try it online! Link is to verbose version of code. Explanation:

Nθ

Input the integer.

≔X2¦23η

Calculate 2^23 as it gets used often enough to make it worthwhile. (It was originally only used twice but it was still worthwhile then. I then golfed a byte off by introducing a third use, which also avoided the use of Incremented which is buggy on TIO, otherwise I would have had to have used ATO instead.)

≔%÷θη256ζ

Extract the exponent. This is needed because an exponent of 0 needs to be special-cased. (Normally this results in a subnormal, but fortunately the only subnormals that we need to support are 0 and -0.)

×ばつ

If the exponent is zero, output zero, otherwise output the product of...

+%θηη

... the bottom 23 bits of the input integer, with a 1 bit prepended, ...

X2−ζ150

... 2 to the power of the exponent, adjusted by 150 instead of 127 to shift the mantissa bits by 23, and...

∨‹θX2¦31±1

... the sign bit.
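
The arithmetic in this explanation can be sketched in Python as follows. This is not a translation of the Charcoal program itself, only of the steps described; the function and variable names are made up.

```python
def decode_integer_mantissa(n):
    p23 = 2**23                   # the reused constant
    e = n // p23 % 256            # exponent field
    if e == 0:
        return 0.0                # only +0 and -0 are supported here
    mantissa = n % p23 + p23      # bottom 23 bits with a 1 bit prepended
    sign = 1 if n < 2**31 else -1
    # Adjusting by 150 = 127 + 23 scales the integer mantissa down by 2^23.
    return mantissa * 2.0 ** (e - 150) * sign


print(decode_integer_mantissa(1056964608))  # 0.5
print(decode_integer_mantissa(3205496832))  # -0.5625
```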

Question 42

ARM Thumb machine code, 2 bytes

arm-none-eabi-g++ defaults to -mfloat-abi=soft, so float is passed/returned in general-purpose integer registers. (Godbolt)

ARM's standard calling convention passes the first arg in r0, which is also the return-value register. So all we need to do is return with bx lr (2 bytes).

// float f(int)
 // machine code hex // assembly
 70 47 bx lr

The same trick can work for any ISA if you can justify a custom calling convention. Normally that's fine, but on machines that use IEEE754 FP the challenge reduces to type-punning, and a calling convention that trivializes it is less interesting. e.g. for x86, you could normally justify taking an integer arg in XMM0, which is where you'd want a scalar float. (Tips for golfing in x86/x64 machine code)

x86-64 machine code, with custom calling convention, 1 byte

1-byte c3 ret for x86-64.

Another justification could be that we take the input by reference and update in-place. Like C void f(void*p){} - mutate the pointed-to int object from int to float, which is a no-op in asm.

(As a C function, that wouldn't make it well-defined to point a float* at an int, still a strict-aliasing violation. It might make it work in practice if it couldn't be inlined, forcing the compiler to keep its hands off. But this is a machine code answer. Obviously in real asm you'd never call this, it doesn't do anything.)

x86-64 machine code with AMD64 System V calling convention, 5 bytes

It's the same length as a call instruction, making it pointless not to inline, but whatever. :/

 66 0f 6e c7 movd xmm0,edi
 c3 ret

AVX vmovd xmm0,edi is the same 4-byte length.

x86 with custom 3DNow! calling convention, 4 bytes

Did anyone ever use the low element of an mm register for scalar float with 3DNow!, like how SSE/SSE2 use the bottom of an xmm register for scalar float/double? Possible, although there aren't scalar 3DNow! instructions like SSE addss, only packed float like pfadd. So you might get slowdown from subnormals in the high half. Still it's plausible.

# int arg in EDI, float return value in MM0
 0f 6e c7 movd mm0,edi
 c3 ret

Question 43

C (gcc), 23 bytes (previously 36, 34, 30)

#define f(x)*(float*)&x

Try it online!

-2 thanks to m90

-4 thanks to Digital Trauma and mousetail

-7 thanks to jdt

Unsafe code go brr

#define f(x)*(float*)&x // Macro taking an int, returning a float
#define f(x) // Boilerplate
 &x // Pointer to the input
 (float*) // Reinterpret it as a pointer to a float instead of an int
 * // Get the value at that pointer, now a float

Question 44

If you're willing to accept compilation with warnings (generally ok on CG), then x as an implicit int saves 4: float f(x){return*(float*)&x;}

Question 45

I think this is UB (strict aliasing), so the compiler is free to return any value (unless you pass -fno-strict-aliasing or something else that turns on a compiler extension).

Question 46

You should be able to omit int from the function declaration since the default type is int

Question 47

This is definitely UB in C.

Question 48

On Linux Debian arm32, this also works: f(x,y){*(int*)y=x;} But then it is questionable if this does really meet the goal. You have to call it like this: float y; f(3264511895,&y);printf("%36.32f\n",y);

Question 49

Python, 55 bytes

lambda n:unpack('f',pack('I',n))[0]
from struct import*

Attempt This Online!

Question 50

Python, 65 bytes (previously 69, 67)

lambda x:((x&8388607)/2**23+1)*2**((x>>23&255)-127)*(1-(x>>30&2))

Attempt This Online!

Question 51

This seems to fail for the testcases 0 and 2147483648.

Question 52

@alephalpha: The question didn't previously specify how 0 encoded +0.0 (like a subnormal where the all-zero exponent implies a leading zero for the mantissa); I edited the question. If zero and other subnormals weren't intended to be part of the challenge, that should be stated in the question and those test-cases removed. IMO answers that assume a non-zero exponent (i.e. answer the question as originally written) are interesting and worth keeping, even if people also want to add another longer version that does handle +-0.0.

Question 53

Ruby, 29 bytes

Same technique as solid.py's Python answer.

->n{[n].pack(?I).unpack1(?f)}

Attempt This Online!

Question 54

Go, 29 bytes

import."math"
Float32frombits

Attempt This Online!

wow builtins

Question 55

PARI/GP, 53 bytes

n->(1-n>>31*2)*if(e=n>>23%256,(1+n/2^23%1.)<<(e-127))

Attempt This Online!

Question 56

MATL, 7 bytes

7Y%10Z%

Try it at MATL online! Or verify all test cases.

Code explanation

 % Implicit input: number in 'double' data type
7Y% % Cast to 'uint32'
10Z% % Convert to 'single' without changing underlying data
 % Implicit display

Question 57

C++ (GCC), 30 bytes

[](int x){return*(float*)&x;};

This is just Seggan's C answer using C++11's lambda syntax instead of a named function.

Question 58

Nitpick: It's C++11's lambda syntax, not C++10's.

Question 59

If I understood the previous comments on that topic correctly, it's undefined behavior in C++.

Question 60

J-uby, 27 bytes

Port of my Ruby answer.

-[I]|~:pack&?I|~:unpack1&?f

Attempt This Online!

Explanation

 -[I] | # Construct an array with one element (the input), then
 ~:pack & ?I | # pack it into a binary string
 ~:unpack1 & ?f | # unpack it into a float

Question 61

C# (.NET Core 6), 30 bytes

BitConverter.Int32BitsToSingle

.NET Fiddle!

Question 62

The input needs care when it is 2**31 or larger, like the stated case 3264511895, since C# distinguishes signed int from unsigned uint.
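
To illustrate the signed-versus-unsigned point (in Python rather than C#, and with a made-up helper name `to_signed32`): an unsigned input of 2**31 or more corresponds to a negative signed 32-bit integer with the same bits, and both decode to the same float.

```python
import struct


def to_signed32(u):
    # Map an unsigned 32-bit value onto the signed int32 with the same bits.
    return u - 2**32 if u >= 2**31 else u


u = 3264511895
s = to_signed32(u)
print(s)  # -1030455401

from_unsigned = struct.unpack("<f", struct.pack("<I", u))[0]
from_signed = struct.unpack("<f", struct.pack("<i", s))[0]
print(from_unsigned == from_signed)  # True
```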

Question 63

JavaScript (Node.js), 48 bytes

f=x=>x<0?-f(x^1<<31):x>>24?f(x-2**23)*2:x/2**149

Try it online!

No cast
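
A Python sketch of how this recursion appears to work, assuming (from the `x<0` test) that the input arrives as a signed 32-bit integer. Each step peels 1 off the exponent field and doubles the result, until the exponent field is 0 or 1, where the value is simply the remaining integer times 2^-149.

```python
def f(x):
    if x < 0:
        # Signed input with the sign bit set: clear it and negate.
        return -f(x + 2**31)
    if x >> 24:
        # Exponent field is 2 or more: lower it by one, double the result.
        return f(x - 2**23) * 2
    # Exponent field 0 (subnormal) or 1 (smallest normals): plain scaling.
    return x / 2**149


print(f(1056964608))   # 0.5
print(f(-1030455401))  # same bits as 3264511895
```

The recursion depth is bounded by the exponent field (at most 254 levels), which stays under Python's default recursion limit.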

Accepted answer: the Python answer under Question 7 above, by loopy walt, posted 2022-10-10 17:39:36Z.


Answered Oct 10, 2022 at 17:39 by loopy walt; edited Oct 12, 2022 at 19:52.

Comments on this answer (Questions 8 to 10 above): alephalpha, Oct 11, 2022 at 11:16; Neil, Oct 12, 2022 at 19:03; Peter Cordes, Oct 15, 2022 at 11:03.
