I have a binary file and C++ code that reads it as follows.
int NumberOfWord;
FILE *f = fopen("../data/vec.bin", "rb");
fscanf(f, "%d", &NumberOfWord);
cout << NumberOfWord << endl;
The output is:
114042
I want to reimplement the code above in Python.
import struct
with open("../data/vec.bin", "rb") as f:
    b = f.read(8)
print(struct.unpack("d", b)[0])
But this code does not work. My output is:
8.45476330511e-53
My questions are:
1) Why does the integer take 8 bytes in C++?
I did not know that %d could mean double. The variable is an integer, and we normally read and print integers with "%d" in C++, so this seems strange.
2) How do I extract the actual number in Python?
I want to read the same number in Python that the C++ code above reads. How do I do that?
Maybe I misunderstand the struct module in Python.
2 Answers
As you have been able to read the file correctly with this C++ (or rather C) line, fscanf(f, "%d", &NumberOfWord);, I assume that your file contains a text representation of 114042. So it contains the bytes
0x31 0x31 0x34 0x30 0x34 0x32 ... or '1', '1', '4', '0', '4', '2', ...
When you open it in a text editor, you can see one single line 114042.
Now when you try to read it as binary with the i format, you use the first 4 bytes of the file ('1', '1', '4', '0') and actually get int('31313430', 16), which is 825308208 when the bytes are taken in big-endian order (on a little-endian machine the same bytes give 808726833). I could not reproduce what you get with the d format for decoding it as a double, because I cannot guess what comes in your file after the last digit.
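To make the byte arithmetic concrete, here is a small sketch that decodes the first four ASCII characters of "114042" as a 4-byte integer in both byte orders. The variable names are only illustrative, and the byte string assumes the file really starts with that text.

```python
import struct

# The first four bytes of a file that starts with the text "114042".
first_four = b"1140"

# Interpreted as a 4-byte integer, the result depends on the byte order.
big_endian = struct.unpack(">i", first_four)[0]     # bytes read as 0x31313430
little_endian = struct.unpack("<i", first_four)[0]  # bytes read as 0x30343131

print(big_endian)     # 825308208
print(little_endian)  # 808726833
```

Neither value is 114042, which is what you would expect when text digits are decoded as a binary integer.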
If the number is alone on the first line, it is easy: just read one line and convert it to an int:
with open("../data/vec.bin", "rb") as f:
    print(int(f.readline()))
If there are other characters after the last digit, you will have to first use a regex (do not forget to import re) to get the numeric value and then convert it to an int:
import re
with open("../data/vec.bin", "rb") as f:
    line = f.readline()
# bytes pattern, because the file is opened in binary mode
m = re.match(br'\s*\d+', line)
print(int(m.group(0)))
TL;DR: Do not try to read a text file as if it contained a binary representation.
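As a rough Python counterpart to fscanf(f, "%d", ...), the sketch below skips leading whitespace, reads an optional sign and the decimal digits, and ignores whatever bytes follow. The helper name read_leading_int, the 64-byte read size, and the demo file contents are all assumptions made for illustration; they are not taken from your actual vec.bin.

```python
import os
import re
import tempfile

def read_leading_int(path):
    """Return the decimal integer at the start of the file, similar to fscanf("%d")."""
    with open(path, "rb") as f:
        head = f.read(64)  # assumed to be enough for whitespace plus the digits
    m = re.match(br"\s*([+-]?\d+)", head)
    if m is None:
        raise ValueError("file does not start with a decimal integer")
    return int(m.group(1))

# Hypothetical demo file: a text number followed by arbitrary binary bytes.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"114042 \x00\x01\xfe\xff")

number_of_words = read_leading_int(demo_path)
os.remove(demo_path)
print(number_of_words)  # 114042
```

Reading a fixed-size prefix instead of a whole line avoids pulling in a large chunk of binary data when the file has no newline near the start.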
2 Comments
In C format strings, %d is short for decimal.
In Python's struct format strings, d is short for double (an 8-byte floating-point number).
If it's an integer, you should use i in the struct.unpack call.
import struct
with open("../data/vec.bin", "rb") as f:
    b = f.read(4)  # "i" expects exactly 4 bytes
print(struct.unpack("i", b)[0])
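On the question of why the integer seemed to need 8 bytes: the 8 comes from the d format code and the f.read(8) call, not from the C++ int. A minimal sketch, assuming standard little-endian sizes (the "<" prefix), shows the sizes and a round trip for a value that really is stored in binary. It only applies when the file holds a binary integer, which may not be the case for this file.

```python
import struct

# Sizes of the two format codes with standard (non-native) sizing.
int_size = struct.calcsize("<i")     # 4 bytes for a C int
double_size = struct.calcsize("<d")  # 8 bytes for a C double

# Round trip: pack an integer into 4 bytes, then unpack it again.
raw = struct.pack("<i", 114042)
value = struct.unpack("<i", raw)[0]

print(int_size)     # 4
print(double_size)  # 8
print(value)        # 114042
```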
printf("%d", 42); in C corresponds to print("%d" % 42) in Python.
fscanf(..., "%d", ) is for reading a text file?
fscanf(..., "%d", ) is reading a binary file in this case.