This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008年12月17日 08:40 by saa, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| mmap_off_t.patch | saa, 2008年12月17日 08:40 | Noobie fix for non-Windows | ||
| mmap_off_t_2.patch | saa, 2008年12月17日 09:55 | Revised patch that removes fpos_t. | ||
| mmap.patch | rosslagerwall, 2011年01月18日 06:30 | |||
| mmap_revised.patch | rosslagerwall, 2011年02月09日 14:52 | |||
| Messages (23) | |||
|---|---|---|---|
| msg77957 - (view) | Author: (saa) | Date: 2008年12月17日 08:40 | |
In trying to use the mmap offset to allow me to work with a 12GB file, I encountered the fact that the new offset parameter was a ssize_t, which is only 32 bits on my platform (OS X). The mmap offset in C is defined to be an off_t (http://opengroup.org/onlinepubs/007908775/xsh/mmap.html), which is 64 bits on my platform. The attached patch attempts to fix the non-Windows code to have an off_t offset parameter and allows my code to access all 12 GB of my file. As a warning, this is the first time I've even looked at the Python source, and wouldn't even consider myself a Python coder; I just use Python every few months. Review the patch carefully. |
|||
| msg77958 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008年12月17日 09:07 | |
I don't think fpos_t is a suitable type here. It isn't guaranteed to be an integral type, and, on many systems, it isn't integral. So you can't convert int to it, and you can't do arithmetic on it. |
|||
| msg77959 - (view) | Author: (saa) | Date: 2008年12月17日 09:17 | |
Sorry, I should have explained that the bulk of my patch was derived from existing code. The code that references fpos_t is basically a cut and paste of code from Modules/bz2module.c. I don't know if this is a valid usage of fpos_t, but if it isn't, you should probably file a bug on the existing usage in bz2module and suggest a fix. Thanks for reviewing! |
|||
| msg77960 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008年12月17日 09:38 | |
This is different from bz2, though. In bzlib, there is no API to seek to a certain position. In mmap, the offset type in mmap specified as off_t, so fpos_t plays no role here. |
|||
| msg77961 - (view) | Author: (saa) | Date: 2008年12月17日 09:42 | |
Ah, I think I'm beginning to understand. Thanks for the explanation. So, if I understand correctly, I should refactor the patch and just remove all of the Py_off_t and just use off_t because the standard says that *is* the type. Correct? |
|||
| msg77962 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008年12月17日 09:45 | |
> Ah, I think I'm beginning to understand. Thanks for the explanation. So, > if I understand correctly, I should refactor the patch and just remove all > of the Py_off_t and just use off_t because the standard says that *is* the > type. Correct? Correct. |
|||
| msg77963 - (view) | Author: (saa) | Date: 2008年12月17日 09:55 | |
Thanks again, revised patch attached. |
|||
| msg78055 - (view) | Author: (saa) | Date: 2008年12月19日 08:31 | |
There are bigger problems here. When the code calculates the size to map if size is passed as zero, it simply assigns the 64-bit st_size to the size to mmap. On a 32-bit system, it is obvious that the amount to map must be less than a 32-bit value. The calculation should take into account at least the following two things: 1) it should be the size of the file minus the offset, and 2) it shouldn't ask for an unreasonable amount of memory. I will experiment with this and submit a new patch (and delete the two I've already uploaded) within a couple of days, or at a minimum, provide an update to this bug. |
|||
| msg78076 - (view) | Author: (saa) | Date: 2008年12月19日 23:24 | |
Notes on the problem. The Python mmap documentation says: If length is 0, the maximum length of the map will be the current size of the file when mmap is called. Currently, on a system where off_t is 64-bit and size_t is 32-bit, if the size of the file (st_size) is > 0xFFFF'FFFF, the code will in effect try to map (st_size & 0xFFFF'FFFF). This is suboptimal (imagine if the size of the file is 0x1'0000'1000 or even 0x1'0000'0000). In addition, it seems weird that a caller would have to check the size of the returned mmap against the size of the file to see if the entire file was mapped or not. Finally, it appears that there isn't a reliable, architecture independent, quick, and easy way to figure out what the maximum mmap size is. With these points in mind, I am going to work on coming up with a patch that will change the behavior to simply return an error when a zero length is passed to mmap and the entire file can not be mapped. This will put the onus on the caller to determine an appropriate mmap size for "chunking" the processing of the file, but the behavior will become more consistent. Of course, comments and suggestions are welcome. However, I'm going to have to wrap up my immediate involvement on this in the next couple of days (Yay, vacation!). |
|||
| msg78085 - (view) | Author: (saa) | Date: 2008年12月20日 04:50 | |
Ok, put a fork in me, 'cuz I'm done with this. The latest is that mmap.size() is defined to return the size of the file and not the size of the data. It tries to return it as a ssize_t, which of course, on systems where off_t is 64-bits and ssize_t is 32-bits, won't work for sizes that won't fit in 32-bits. Without the size of the data, it is unclear how one would traverse a file using the offset parameter. As part of fixing mmap, I would suggest someone should write an example of how it should be used in these cases and use that as a test case. I'm going to leave this up to someone with more knowledge of how this stuff *should* work in Python, but I'm going to go redo my program in C. Thanks for listening and the future efforts. |
|||
| msg126454 - (view) | Author: Ross Lagerwall (rosslagerwall) (Python committer) | Date: 2011年01月18日 06:30 | |
Attached is a fix to make offset use off_t. This means that mmap will work with offset > 2GB on 32bit systems. It also fixes that mmap.size() returns the correct value for files > 2GB on 32bit systems. The first issue of msg78055 was fixed in issue10916, this also fixes the second part, raising an exception if the mmap length is too large instead of mmap()ing an invalid or wrong size. |
|||
| msg128210 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月09日 12:54 | |
Thanks for doing this. Looking at the patch: Modules/_io/_iomodule.h has macros for dealing with off_t (it also defines Py_off_t for Windows), perhaps you want to use them. Also, support.unlink() takes care of ignoring ENOENT for you. Regardling the sizes used in the test: I think using hexadecimal literals (e.g. 0x180000000 instead of 6442450944) would make things a bit more readable :) Part of the patch doesn't apply cleanly to latest py3k. Could you regenerate it, please? |
|||
| msg128215 - (view) | Author: Ross Lagerwall (rosslagerwall) (Python committer) | Date: 2011年02月09日 14:52 | |
OK, this new patch applies cleanly, uses support.unlink and hexadecimal constants. I left the off_t handling as is (it seems to work on *nix testing). Perhaps someone can handle the Windows side? |
|||
| msg128236 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月09日 20:18 | |
> I left the off_t handling as is (it seems to work on *nix testing). > Perhaps someone can handle the Windows side? Ok, after looking at the code, it appears Windows already should have proper handling of 64-bit sizes under 32-bit builds. |
|||
| msg128259 - (view) | Author: wrobell (wrobell) | Date: 2011年02月10日 02:06 | |
will this patch support off_t for "mmap.resize" as well? ftruncate uses off_t for the length of a file as well. |
|||
| msg128261 - (view) | Author: Ross Lagerwall (rosslagerwall) (Python committer) | Date: 2011年02月10日 04:38 | |
> will this patch support off_t for "mmap.resize" as well? > ftruncate uses off_t for the length of a file as well. No, it doesn't because resize() resizes the amount mmapped in memory which can't be more than ssize_t anyway. However, resizing does work on a 12GiB file with a large offset, e.g. 6GiB. |
|||
| msg128274 - (view) | Author: wrobell (wrobell) | Date: 2011年02月10日 09:43 | |
> > will this patch support off_t for "mmap.resize" as well?
> > ftruncate uses off_t for the length of a file as well.
> No, it doesn't because resize() resizes the amount mmapped in memory
> which can't be more than ssize_t anyway. However, resizing does work on
> a 12GiB file with a large offset, e.g. 6GiB.
i am bit lost. ftruncate is used to enlarge the file, isn't it?
please consider the script:
-- rr.py ---
import mmap
f = open('test.x', 'w+b')
f.write(b'\x00' * 10)
f.flush()
fm = mmap.mmap(f.fileno(), 0)
fm.resize(5 * 1024 ** 3)
fm.close()
f.close()
------------
the script above resizes file from 10 bytes
to 5 * 1024 ** 3 bytes with mmap.resize (but
not 5 * 1024 ** 3 + 10!)
now, on 32-bit linux (with off_t):
$ python3 rr.py
Traceback (most recent call last):
File "rr.py", line 8, in <module>
fm.resize(5 * 1024 ** 3)
OverflowError: Python int too large to convert to C ssize_t
above script works perfectly on 64-bit version of linux
with just 4GB of RAM.
maybe i do not understand some magic here, but clearly one
can mmap.resize to a file size much beyond the amount of memory.
|
|||
| msg128305 - (view) | Author: Ross Lagerwall (rosslagerwall) (Python committer) | Date: 2011年02月10日 16:41 | |
32-bit computers can address up to 4GiB of memory and 64-bit computers can address much more than this. mmap() allows a file to be mapped to a location in memory - the actual amount of memory that exists doesn't matter. This is the reason why a 5GiB file can be mmaped on 64-bit but not 32-bit.
Also, python uses the ssize_t type for sequences which allows about 2GiB to be mapped on a 32-bit platform. The mmap.resize() operation as well as changing the size of the underlying file size with ftruncate(), also changes the size of the map in memory. Thus in your sample script, this resizing would need to ftruncate the file to 5GiB and then map all of it into memory which is not possible on 32-bit.
I think the following might be a possible way of doing it:
import mmap
f = open('test.x', 'w+b')
f.write(b'\x00' * 10)
f.flush()
fm = mmap.mmap(f.fileno(), 0)
fm.close()
f.truncate(5 * 1024 ** 3)
# mmap only 1GiB of the 5GiB file
fm = mmap.mmap(f.fileno(), 1024**3)
fm.close()
f.close()
|
|||
| msg128312 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011年02月10日 17:03 | |
Le jeudi 10 février 2011 à 16:41 +0000, Ross Lagerwall a écrit : > Ross Lagerwall <rosslagerwall@gmail.com> added the comment: > > 32-bit computers can address up to 4GiB of memory ... at least 4 GB. With PAE (Physical Address Extension), we can address up to 2^36 bytes. It looks like mmap() do support PAE, I suppose that you need 64 bits offset on a 32 bits system to be able to use PAE :-) Victor |
|||
| msg128313 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月10日 17:06 | |
Le jeudi 10 février 2011 à 17:03 +0000, STINNER Victor a écrit : > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > Le jeudi 10 février 2011 à 16:41 +0000, Ross Lagerwall a écrit : > > Ross Lagerwall <rosslagerwall@gmail.com> added the comment: > > > > 32-bit computers can address up to 4GiB of memory > > ... at least 4 GB. With PAE (Physical Address Extension), we can address > up to 2^36 bytes. It looks like mmap() do support PAE, I suppose that > you need 64 bits offset on a 32 bits system to be able to use PAE :-) We are talking about virtual addresses, not physical addresses. PAE doesn't magically enlarge your address registers ;) |
|||
| msg128336 - (view) | Author: wrobell (wrobell) | Date: 2011年02月10日 20:17 | |
Ross, you are completely right. thanks for the tips! |
|||
| msg129012 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月21日 23:44 | |
I was a bit optimistic concerning 32-bit Windows. I had to do some changes, in part because off_t is 32-bit there. The final patch is committed in r88486 (tested under 32-bit and 64-bit Linux and Windows). |
|||
| msg129014 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月21日 23:59 | |
Backported in r88487 to 3.2, and in r88488 to 2.7. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:42 | admin | set | github: 48931 |
| 2011年02月21日 23:59:56 | pitrou | set | status: open -> closed versions: + Python 2.7 nosy: loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg129014 resolution: fixed stage: patch review -> resolved |
| 2011年02月21日 23:44:30 | pitrou | set | nosy:
loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg129012 |
| 2011年02月10日 20:17:59 | wrobell | set | nosy:
loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg128336 |
| 2011年02月10日 17:06:04 | pitrou | set | nosy:
loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg128313 |
| 2011年02月10日 17:03:48 | vstinner | set | nosy:
loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg128312 |
| 2011年02月10日 16:41:45 | rosslagerwall | set | nosy:
loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg128305 |
| 2011年02月10日 09:43:31 | wrobell | set | nosy:
loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg128274 |
| 2011年02月10日 04:38:01 | rosslagerwall | set | nosy:
loewis, pitrou, vstinner, wrobell, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg128261 |
| 2011年02月10日 02:06:36 | wrobell | set | nosy:
+ wrobell messages: + msg128259 |
| 2011年02月09日 20:18:18 | pitrou | set | nosy:
loewis, pitrou, vstinner, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg128236 |
| 2011年02月09日 14:52:42 | rosslagerwall | set | files:
+ mmap_revised.patch nosy: loewis, pitrou, vstinner, nadeem.vawda, eckhardt, saa, rosslagerwall, jbeezley messages: + msg128215 |
| 2011年02月09日 12:54:13 | pitrou | set | nosy:
+ pitrou messages: + msg128210 |
| 2011年02月09日 01:24:19 | jbeezley | set | nosy:
+ jbeezley |
| 2011年01月18日 21:38:38 | vstinner | set | nosy:
+ vstinner |
| 2011年01月18日 06:30:57 | rosslagerwall | set | files:
+ mmap.patch nosy: + rosslagerwall messages: + msg126454 |
| 2011年01月06日 22:18:49 | nadeem.vawda | set | nosy:
+ nadeem.vawda |
| 2011年01月06日 16:47:12 | pitrou | set | nosy:
loewis, eckhardt, saa stage: patch review versions: + Python 3.2, Python 3.3, - Python 2.6 |
| 2009年01月02日 13:37:08 | eckhardt | set | nosy: + eckhardt |
| 2008年12月20日 04:50:40 | saa | set | messages: + msg78085 |
| 2008年12月19日 23:24:10 | saa | set | messages: + msg78076 |
| 2008年12月19日 08:31:32 | saa | set | messages:
+ msg78055 title: mmap offset should be off_t instead of ssize_t -> mmap offset should be off_t instead of ssize_t, and size calculation needs corrected |
| 2008年12月17日 09:55:48 | saa | set | files:
+ mmap_off_t_2.patch messages: + msg77963 |
| 2008年12月17日 09:45:51 | loewis | set | messages: + msg77962 |
| 2008年12月17日 09:42:27 | saa | set | messages: + msg77961 |
| 2008年12月17日 09:38:24 | loewis | set | messages: + msg77960 |
| 2008年12月17日 09:17:40 | saa | set | messages: + msg77959 |
| 2008年12月17日 09:07:08 | loewis | set | nosy:
+ loewis messages: + msg77958 |
| 2008年12月17日 08:40:09 | saa | create | |