homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: mmap offset should be off_t instead of ssize_t, and size calculation needs corrected
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eckhardt, jbeezley, loewis, nadeem.vawda, pitrou, rosslagerwall, saa, vstinner, wrobell
Priority: normal Keywords: patch

Created on 2008年12月17日 08:40 by saa, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
mmap_off_t.patch saa, 2008年12月17日 08:40 Noobie fix for non-Windows
mmap_off_t_2.patch saa, 2008年12月17日 09:55 Revised patch that removes fpos_t.
mmap.patch rosslagerwall, 2011年01月18日 06:30
mmap_revised.patch rosslagerwall, 2011年02月09日 14:52
Messages (23)
msg77957 - (view) Author: (saa) Date: 2008年12月17日 08:40
In trying to use the mmap offset to allow me to work with a 12GB file, I 
encountered the fact that the new offset parameter was a ssize_t, which 
is only 32 bits on my platform (OS X). The mmap offset in C is defined 
to be an off_t 
(http://opengroup.org/onlinepubs/007908775/xsh/mmap.html), which is 64 
bits on my platform. The attached patch attempts to fix the non-Windows 
code to have an off_t offset parameter and allows my code to access all 
12 GB of my file. As a warning, this is the first time I've even looked 
at the Python source, and wouldn't even consider myself a Python coder; 
I just use Python every few months. Review the patch carefully.
msg77958 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008年12月17日 09:07
I don't think fpos_t is a suitable type here. It isn't guaranteed to be
an integral type, and, on many systems, it isn't integral. So you can't
convert int to it, and you can't do arithmetic on it.
msg77959 - (view) Author: (saa) Date: 2008年12月17日 09:17
Sorry, I should have explained that the bulk of my patch was derived from 
existing code. The code that references fpos_t is basically a cut and 
paste of code from Modules/bz2module.c. I don't know if this is a valid 
usage of fpos_t, but if it isn't, you should probably file a bug on the 
existing usage in bz2module and suggest a fix. Thanks for reviewing!
msg77960 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008年12月17日 09:38
This is different from bz2, though. In bzlib, there is no API to seek to
a certain position. In mmap, the offset type in mmap specified as off_t,
so fpos_t plays no role here.
msg77961 - (view) Author: (saa) Date: 2008年12月17日 09:42
Ah, I think I'm beginning to understand. Thanks for the explanation. So, 
if I understand correctly, I should refactor the patch and just remove all 
of the Py_off_t and just use off_t because the standard says that *is* the 
type. Correct?
msg77962 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008年12月17日 09:45
> Ah, I think I'm beginning to understand. Thanks for the explanation. So, 
> if I understand correctly, I should refactor the patch and just remove all 
> of the Py_off_t and just use off_t because the standard says that *is* the 
> type. Correct?
Correct.
msg77963 - (view) Author: (saa) Date: 2008年12月17日 09:55
Thanks again, revised patch attached.
msg78055 - (view) Author: (saa) Date: 2008年12月19日 08:31
There are bigger problems here. When the code calculates the size to map 
if size is passed as zero, it simply assigns the 64-bit st_size to the 
size to mmap. On a 32-bit system, it is obvious that the amount to map 
must be less than a 32-bit value. The calculation should take into 
account at least the following two things: 1) it should be the size of the 
file minus the offset, and 2) it shouldn't ask for an unreasonable amount 
of memory. I will experiment with this and submit a new patch (and delete 
the two I've already uploaded) within a couple of days, or at a minimum, 
provide an update to this bug.
msg78076 - (view) Author: (saa) Date: 2008年12月19日 23:24
Notes on the problem.
The Python mmap documentation says: If length is 0, the maximum length 
of the map will be the current size of the file when mmap is called.
Currently, on a system where off_t is 64-bit and size_t is 32-bit, if 
the size of the file (st_size) is > 0xFFFF'FFFF, the code will in effect 
try to map (st_size & 0xFFFF'FFFF). This is suboptimal (imagine if the 
size of the file is 0x1'0000'1000 or even 0x1'0000'0000).
In addition, it seems weird that a caller would have to check the size 
of the returned mmap against the size of the file to see if the entire 
file was mapped or not.
Finally, it appears that there isn't a reliable, architecture 
independent, quick, and easy way to figure out what the maximum mmap 
size is.
With these points in mind, I am going to work on coming up with a patch 
that will change the behavior to simply return an error when a zero 
length is passed to mmap and the entire file can not be mapped. This 
will put the onus on the caller to determine an appropriate mmap size 
for "chunking" the processing of the file, but the behavior will become 
more consistent.
Of course, comments and suggestions are welcome. However, I'm going to 
have to wrap up my immediate involvement on this in the next couple of 
days (Yay, vacation!).
msg78085 - (view) Author: (saa) Date: 2008年12月20日 04:50
Ok, put a fork in me, 'cuz I'm done with this. The latest is that 
mmap.size() is defined to return the size of the file and not the size 
of the data. It tries to return it as a ssize_t, which of course, on 
systems where off_t is 64-bits and ssize_t is 32-bits, won't work for 
sizes that won't fit in 32-bits.
Without the size of the data, it is unclear how one would traverse a 
file using the offset parameter. As part of fixing mmap, I would 
suggest someone should write an example of how it should be used in 
these cases and use that as a test case.
I'm going to leave this up to someone with more knowledge of how this 
stuff *should* work in Python, but I'm going to go redo my program in C.
Thanks for listening and the future efforts.
msg126454 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011年01月18日 06:30
Attached is a fix to make offset use off_t. This means that mmap will work with offset > 2GB on 32bit systems.
It also fixes that mmap.size() returns the correct value for files > 2GB on 32bit systems.
The first issue of msg78055 was fixed in issue10916, this also fixes the second part, raising an exception if the mmap length is too large instead of mmap()ing an invalid or wrong size.
msg128210 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月09日 12:54
Thanks for doing this. Looking at the patch: Modules/_io/_iomodule.h has macros for dealing with off_t (it also defines Py_off_t for Windows), perhaps you want to use them.
Also, support.unlink() takes care of ignoring ENOENT for you.
Regardling the sizes used in the test: I think using hexadecimal literals (e.g. 0x180000000 instead of 6442450944) would make things a bit more readable :)
Part of the patch doesn't apply cleanly to latest py3k. Could you regenerate it, please?
msg128215 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011年02月09日 14:52
OK, this new patch applies cleanly, uses support.unlink and hexadecimal constants.
I left the off_t handling as is (it seems to work on *nix testing). Perhaps someone can handle the Windows side?
msg128236 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月09日 20:18
> I left the off_t handling as is (it seems to work on *nix testing).
> Perhaps someone can handle the Windows side?
Ok, after looking at the code, it appears Windows already should have proper handling of 64-bit sizes under 32-bit builds.
msg128259 - (view) Author: wrobell (wrobell) Date: 2011年02月10日 02:06
will this patch support off_t for "mmap.resize" as well? ftruncate uses off_t for the length of a file as well.
msg128261 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011年02月10日 04:38
> will this patch support off_t for "mmap.resize" as well?
> ftruncate uses off_t for the length of a file as well.
No, it doesn't because resize() resizes the amount mmapped in memory which can't be more than ssize_t anyway. However, resizing does work on a 12GiB file with a large offset, e.g. 6GiB.
msg128274 - (view) Author: wrobell (wrobell) Date: 2011年02月10日 09:43
> > will this patch support off_t for "mmap.resize" as well?
> > ftruncate uses off_t for the length of a file as well.
> No, it doesn't because resize() resizes the amount mmapped in memory
> which can't be more than ssize_t anyway. However, resizing does work on
> a 12GiB file with a large offset, e.g. 6GiB.
i am bit lost. ftruncate is used to enlarge the file, isn't it?
please consider the script:
-- rr.py ---
import mmap
f = open('test.x', 'w+b')
f.write(b'\x00' * 10)
f.flush()
fm = mmap.mmap(f.fileno(), 0)
fm.resize(5 * 1024 ** 3)
fm.close()
f.close()
------------
the script above resizes file from 10 bytes
to 5 * 1024 ** 3 bytes with mmap.resize (but
not 5 * 1024 ** 3 + 10!)
now, on 32-bit linux (with off_t):
$ python3 rr.py
Traceback (most recent call last):
 File "rr.py", line 8, in <module>
 fm.resize(5 * 1024 ** 3)
OverflowError: Python int too large to convert to C ssize_t
above script works perfectly on 64-bit version of linux
with just 4GB of RAM.
maybe i do not understand some magic here, but clearly one
can mmap.resize to a file size much beyond the amount of memory.
msg128305 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011年02月10日 16:41
32-bit computers can address up to 4GiB of memory and 64-bit computers can address much more than this. mmap() allows a file to be mapped to a location in memory - the actual amount of memory that exists doesn't matter. This is the reason why a 5GiB file can be mmaped on 64-bit but not 32-bit.
Also, python uses the ssize_t type for sequences which allows about 2GiB to be mapped on a 32-bit platform. The mmap.resize() operation as well as changing the size of the underlying file size with ftruncate(), also changes the size of the map in memory. Thus in your sample script, this resizing would need to ftruncate the file to 5GiB and then map all of it into memory which is not possible on 32-bit.
I think the following might be a possible way of doing it:
import mmap
f = open('test.x', 'w+b')
f.write(b'\x00' * 10)
f.flush()
fm = mmap.mmap(f.fileno(), 0)
fm.close()
f.truncate(5 * 1024 ** 3)
# mmap only 1GiB of the 5GiB file
fm = mmap.mmap(f.fileno(), 1024**3)
fm.close()
f.close()
msg128312 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年02月10日 17:03
Le jeudi 10 février 2011 à 16:41 +0000, Ross Lagerwall a écrit :
> Ross Lagerwall <rosslagerwall@gmail.com> added the comment:
> 
> 32-bit computers can address up to 4GiB of memory
... at least 4 GB. With PAE (Physical Address Extension), we can address
up to 2^36 bytes. It looks like mmap() do support PAE, I suppose that
you need 64 bits offset on a 32 bits system to be able to use PAE :-)
Victor
msg128313 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月10日 17:06
Le jeudi 10 février 2011 à 17:03 +0000, STINNER Victor a écrit :
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> Le jeudi 10 février 2011 à 16:41 +0000, Ross Lagerwall a écrit :
> > Ross Lagerwall <rosslagerwall@gmail.com> added the comment:
> > 
> > 32-bit computers can address up to 4GiB of memory
> 
> ... at least 4 GB. With PAE (Physical Address Extension), we can address
> up to 2^36 bytes. It looks like mmap() do support PAE, I suppose that
> you need 64 bits offset on a 32 bits system to be able to use PAE :-)
We are talking about virtual addresses, not physical addresses. PAE
doesn't magically enlarge your address registers ;)
msg128336 - (view) Author: wrobell (wrobell) Date: 2011年02月10日 20:17
Ross, you are completely right. thanks for the tips!
msg129012 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月21日 23:44
I was a bit optimistic concerning 32-bit Windows. I had to do some changes, in part because off_t is 32-bit there. The final patch is committed in r88486 (tested under 32-bit and 64-bit Linux and Windows).
msg129014 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月21日 23:59
Backported in r88487 to 3.2, and in r88488 to 2.7.
History
Date User Action Args
2022年04月11日 14:56:42adminsetgithub: 48931
2011年02月21日 23:59:56pitrousetstatus: open -> closed
versions: + Python 2.7
nosy: loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg129014

resolution: fixed
stage: patch review -> resolved
2011年02月21日 23:44:30pitrousetnosy: loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg129012
2011年02月10日 20:17:59wrobellsetnosy: loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg128336
2011年02月10日 17:06:04pitrousetnosy: loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg128313
2011年02月10日 17:03:48vstinnersetnosy: loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg128312
2011年02月10日 16:41:45rosslagerwallsetnosy: loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg128305
2011年02月10日 09:43:31wrobellsetnosy: loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg128274
2011年02月10日 04:38:01rosslagerwallsetnosy: loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg128261
2011年02月10日 02:06:36wrobellsetnosy: + wrobell
messages: + msg128259
2011年02月09日 20:18:18pitrousetnosy: loewis, pitrou, vstinner, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg128236
2011年02月09日 14:52:42rosslagerwallsetfiles: + mmap_revised.patch
nosy: loewis, pitrou, vstinner, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley
messages: + msg128215
2011年02月09日 12:54:13pitrousetnosy: + pitrou
messages: + msg128210
2011年02月09日 01:24:19jbeezleysetnosy: + jbeezley
2011年01月18日 21:38:38vstinnersetnosy: + vstinner
2011年01月18日 06:30:57rosslagerwallsetfiles: + mmap.patch
nosy: + rosslagerwall
messages: + msg126454

2011年01月06日 22:18:49nadeem.vawdasetnosy: + nadeem.vawda
2011年01月06日 16:47:12pitrousetnosy: loewis, eckhardt, saa
stage: patch review
versions: + Python 3.2, Python 3.3, - Python 2.6
2009年01月02日 13:37:08eckhardtsetnosy: + eckhardt
2008年12月20日 04:50:40saasetmessages: + msg78085
2008年12月19日 23:24:10saasetmessages: + msg78076
2008年12月19日 08:31:32saasetmessages: + msg78055
title: mmap offset should be off_t instead of ssize_t -> mmap offset should be off_t instead of ssize_t, and size calculation needs corrected
2008年12月17日 09:55:48saasetfiles: + mmap_off_t_2.patch
messages: + msg77963
2008年12月17日 09:45:51loewissetmessages: + msg77962
2008年12月17日 09:42:27saasetmessages: + msg77961
2008年12月17日 09:38:24loewissetmessages: + msg77960
2008年12月17日 09:17:40saasetmessages: + msg77959
2008年12月17日 09:07:08loewissetnosy: + loewis
messages: + msg77958
2008年12月17日 08:40:09saacreate

AltStyle によって変換されたページ (->オリジナル) /