This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2010-06-16 23:20 by Bill.Steinmetz, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (9)
| msg107964 | Author: Bill Steinmetz (Bill.Steinmetz) | Date: 2010-06-16 23:20 |
Here's my Python version info:
Python 2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)] on win32
Here's my code, which never returns (start with a file larger than 4 GB named "hugefile.bin"):
import array
siz = (1<<32)
print "making array (%d) bytes" % siz
fin = open("hugefile.bin","rb")
a = array.array("B")
a.fromfile(fin, siz)
fin.close()
print "writing array (%d) bytes" % siz
fout = open("foo.bin","wb")
a.tofile(fout)
fout.close()
print "wrote 2^32 bytes with array.tofile"
I never get the third print statement :(
| msg107966 | Author: Bill Steinmetz (Bill.Steinmetz) | Date: 2010-06-16 23:53 |
It looks like the issue is in Microsoft's fwrite.
| msg120313 | Author: Christoph Gohlke (cgohlke) | Date: 2010-11-03 08:35 |
This seems to be related: http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/7c913001-227e-439b-bf07-54369ba07994
| msg120379 | Author: Martin Spacek (mspacek) | Date: 2010-11-04 04:44 |
NumPy is addressing this with a workaround in its ndarray, calling fwrite multiple times in reasonably sized chunks. See http://projects.scipy.org/numpy/ticket/1660
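
Below is a minimal Python 3 sketch of the chunked-write workaround described above, assuming (as the earlier messages suggest) that the hang comes from handing one multi-gigabyte buffer to a single underlying write call. The helper name `write_in_chunks` and the 1 GiB default chunk size are illustrative assumptions, not NumPy's or CPython's actual code; 1 GiB simply stays below the roughly 2**31-byte chunk limit suggested later in this thread.

```python
import array
import io


def write_in_chunks(f, data, chunk_size=1 << 30):
    """Write a bytes-like object to a binary file, at most chunk_size bytes per write() call.

    Hypothetical helper (not a CPython or NumPy API). Assumes f is a buffered
    binary file, e.g. the object returned by open(path, "wb"), whose write()
    consumes the whole buffer it is given.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Flat byte view of the data (works for bytes, bytearray and array.array);
    # slicing a memoryview does not copy the underlying memory.
    view = memoryview(data).cast("B")
    for start in range(0, len(view), chunk_size):
        f.write(view[start:start + chunk_size])
    return len(view)  # total number of bytes handed to f.write()


if __name__ == "__main__":
    # Round trip through an in-memory file; the tiny chunk_size is only for the demo.
    a = array.array("B", range(256)) * 4  # 1024 one-byte items
    out = io.BytesIO()
    assert write_in_chunks(out, a, chunk_size=100) == 1024
    assert out.getvalue() == a.tobytes()
```

In Python 3 terms, the original report's `a.tofile(fout)` would become `write_in_chunks(fout, a)`; this sketch has not been tested against the Windows builds discussed in this issue.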
| msg120385 | Author: Martin Spacek (mspacek) | Date: 2010-11-04 08:49 |
It turns out this isn't just a problem with array.array. It's a problem with Python's file.write() as well. Here's my test code:
# file.write() test:
FOURGBMINUS = 2**32 - 16
s = '0123456789012345' # 16 bytes
longs = ''.join([s for i in xrange(FOURGBMINUS//len(s))])
assert len(longs) == FOURGBMINUS
f = open('test.txt', 'w')
f.write(longs) # completes successfully
f.close()
FOURGB = 2**32
s = '0123456789012345' # 16 bytes
longs = ''.join([s for i in xrange(FOURGB//len(s))])
assert len(longs) == FOURGB
f = open('test.txt', 'w')
f.write(longs) # hangs with 100% CPU, file is 0 bytes
f.close()
SIXGB = 2**32 + 2**31
s = '0123456789012345' # 16 bytes
longs = ''.join([s for i in xrange(SIXGB//len(s))])
assert len(longs) == SIXGB
f = open('test.txt', 'w')
f.write(longs) # hangs with 100% CPU, file is 2**31 bytes
f.close()
# file.read test:
TWOGB = 2**31
TWOGBPLUS = TWOGB + 16
s = '0123456789012345' # 16 bytes
longs = ''.join([s for i in xrange(TWOGBPLUS//len(s))])
assert len(longs) == TWOGBPLUS
f = open('test.txt', 'w')
f.write(longs) # completes successfully
f.close()
f = open('test.txt', 'r')
longs = f.read() # works, but takes >30 min, memory usage keeps jumping around
f.close()
del longs
# Maybe f.read() reads 1 char at a time until it hits EOF. Try an explicit size instead:
f = open('test.txt', 'r')
for nbytes in (TWOGBPLUS, TWOGB):
    try:
        longs = f.read(nbytes)
    except OverflowError as e:
        print nbytes, e # OverflowError: long int too large to convert to int
longs = f.read(TWOGB - 1) # works, takes only seconds
f.close()
So I guess that on Windows (I've only tested on 64-bit Windows 7 with Python 2.6.6 amd64), file.write() should call fwrite multiple times in chunks no larger than about 2**31 bytes. Also, calling f.read(nbytes) with nbytes >= 2**31 raises "OverflowError: long int too large to convert to int". I don't have either of these problems on 64-bit Linux (Ubuntu 10.10) on the same machine (i7, 12GB).
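
A matching Python 3 sketch for the read side, assuming the `OverflowError` above comes from the requested size not fitting in a C `int`: request at most `chunk_size` bytes per call and fill one preallocated buffer with `readinto()`, so no intermediate copies pile up. `read_in_chunks` is a hypothetical helper name, and the 1 GiB default is an assumption chosen to stay below 2**31.

```python
import io


def read_in_chunks(f, nbytes, chunk_size=1 << 30):
    """Read up to nbytes from binary file f, requesting at most chunk_size bytes per call.

    Hypothetical helper (not a CPython API). Returns a bytearray that is shorter
    than nbytes if EOF is reached first.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buf = bytearray(nbytes)  # preallocate once instead of concatenating chunks
    pos = 0
    with memoryview(buf) as view:
        while pos < nbytes:
            n = f.readinto(view[pos:pos + chunk_size])
            if not n:  # 0 means EOF (None would mean no data on a non-blocking file)
                break
            pos += n
    del buf[pos:]  # trim unused space; allowed once the memoryview is released
    return buf


if __name__ == "__main__":
    data = bytes(range(256)) * 4
    assert read_in_chunks(io.BytesIO(data), len(data), chunk_size=100) == data
    assert read_in_chunks(io.BytesIO(b"abc"), 10, chunk_size=4) == b"abc"
```

With a file opened via `open('test.txt', 'rb')`, `read_in_chunks(f, TWOGBPLUS)` never requests 2**31 or more bytes in a single call.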
| msg120386 | Author: Martin Spacek (mspacek) | Date: 2010-11-04 08:53 |
I suppose someone should confirm whether this problem also occurs on Python > 2.6?
| msg120387 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) | Date: 2010-11-04 09:37 |
It's still an issue with 2.7, and even with 3.2a2; see issue9611.
| msg125259 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2011-01-04 00:33 |
r87722 should fix the issue, but I haven't tested the fix. See #9611 for more information.
| msg139839 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2011-07-05 09:46 |
This issue is a duplicate of #9611.

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:02 | admin | set | github: 53261 |
| 2011-07-05 09:46:58 | vstinner | set | status: open -> closed; resolution: duplicate; messages: + msg139839 |
| 2011-01-04 00:33:27 | vstinner | set | nosy: + vstinner; messages: + msg125259 |
| 2010-11-04 09:37:33 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg120387 |
| 2010-11-04 08:53:33 | mspacek | set | nosy: cgohlke, Bill.Steinmetz, mspacek; messages: + msg120386; components: + Extension Modules, Windows |
| 2010-11-04 08:49:50 | mspacek | set | nosy: cgohlke, Bill.Steinmetz, mspacek; messages: + msg120385; components: + IO, - Extension Modules; title: array.array.tofile cannot write arrays of sizes > 4GB, even compiled for amd64 -> f.write(s) for s > 2GB hangs in win64 (and win32?) |
| 2010-11-04 04:44:18 | mspacek | set | nosy: + mspacek; type: crash; messages: + msg120379 |
| 2010-11-03 08:35:03 | cgohlke | set | nosy: + cgohlke; messages: + msg120313 |
| 2010-06-16 23:53:15 | Bill.Steinmetz | set | messages: + msg107966 |
| 2010-06-16 23:20:44 | Bill.Steinmetz | create | |