This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008年04月16日 16:29 by schmir, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue2643.diff | brian.curtin, 2010年01月12日 18:27 | patch against trunk, r77420 | ||
| mmap_msync.diff | neologix, 2010年04月08日 20:21 | Patch to remove msync() call when mmap object is deallocated. | ||
| test_mmap.py | neologix, 2010年04月08日 20:21 | Test script showing msync() latency | ||
| Messages (12) | |||
|---|---|---|---|
| msg65552 - (view) | Author: Ralf Schmitt (schmir) | Date: 2008年04月16日 16:29 | |
on unix it does call msync however.
here is the relevant part from mmapmodule.c:
static void
mmap_object_dealloc(mmap_object *m_obj)
{
#ifdef MS_WINDOWS
if (m_obj->data != NULL)
UnmapViewOfFile (m_obj->data);
if (m_obj->map_handle != INVALID_HANDLE_VALUE)
CloseHandle (m_obj->map_handle);
if (m_obj->file_handle != INVALID_HANDLE_VALUE)
CloseHandle (m_obj->file_handle);
if (m_obj->tagname)
PyMem_Free(m_obj->tagname);
#endif /* MS_WINDOWS */
#ifdef UNIX
if (m_obj->fd >= 0)
(void) close(m_obj->fd);
if (m_obj->data!=NULL) {
msync(m_obj->data, m_obj->size, MS_SYNC);
munmap(m_obj->data, m_obj->size);
}
#endif /* UNIX */
Py_TYPE(m_obj)->tp_free((PyObject*)m_obj);
}
|
|||
| msg65554 - (view) | Author: Ralf Schmitt (schmir) | Date: 2008年04月16日 16:40 | |
The close method does not call msync or FlushViewOfFile. I find this a bit strange cause explicitly closing the mmap will not flush changes but relying on the garbage collector will flush changes. |
|||
| msg97643 - (view) | Author: Brian Curtin (brian.curtin) * (Python committer) | Date: 2010年01月12日 18:25 | |
Added the FlushViewOfFile calls, and an msync call to the close method. Not sure how to explicitly test this, if it's possible. Current tests pass on Windows, I'll need to check *NIX when I have the access later today. As for justification, from the UnmapViewOfFile docs[1]: "To minimize the risk of data loss in the event of a power failure or a system crash, applications should explicitly flush modified pages using the FlushViewOfFile function." [1] http://msdn.microsoft.com/en-us/library/aa366882%28VS.85%29.aspx |
|||
| msg97644 - (view) | Author: Brian Curtin (brian.curtin) * (Python committer) | Date: 2010年01月12日 18:27 | |
tab/space issue, updated the patch |
|||
| msg97664 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年01月12日 22:10 | |
A test could explicitly close a dirtied mmaped file and then open() it to check that everything was written, no? |
|||
| msg102495 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2010年04月06日 21:16 | |
I don't think that calling msync() or FlushViewOfFile() when closing the mmap object or deallocating it is a good idea. sync()ing dirtied pages to disk is very expensive, blocks the process for a long time, and the OS does a much better job at it (it can be done asynchronously, sync()ing can be grouped, re-ordered, etc). For example, it took around 7 seconds to msync() a mmap-filed of 300Mb in a quick test I've done. Furthermore, we don't do this for regular files: when a file object is closed or deallocated, we don't call fsync(), and the documentation makes it clear: os.fsync(fd) Force write of file with filedescriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function. If you’re starting with a Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk. Availability: Unix, and Windows starting in 2.2.3. The reason is the same: fsync(), like msync(), is not usually what you want, because of latencies and performance penalties it incurs. Any application requiring the data to be actually written to disk _must_ call fsync() for file objects, and call the flush() method of mmap objects (which is done just for that reason). So, for performance and consistency with files, I'd suggest to remove calls to msync() and FlushViewOfFile() from close() and dealloc(). If agreed, I can submit the patch. > A test could explicitly close a dirtied mmaped file and then open() > it to check that everything was written, no? The problem is that when you open() your file, you'll read the data from cache. You have no way to read the data directly from disk (well, there may be, but are higly non portable, like O_DIRECT file or raw IO). The only check that can be done is tracing the process and checking that msync() is indeed called. |
|||
| msg102642 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2010年04月08日 20:21 | |
Alright, the current behaviour is quite strange:
we don't call msync() when closing the object, we just unmap() it:
mmap_close_method(mmap_object *self, PyObject *unused)
{
[...]
#ifdef UNIX
if (0 <= self->fd)
(void) close(self->fd);
self->fd = -1;
if (self->data != NULL) {
munmap(self->data, self->size);
self->data = NULL;
}
#endif
[...]
}
But we set self->data to NULL to avoid calling munmap() a second time when deallocating the object:
static void
mmap_object_dealloc(mmap_object *m_obj)
{
[ ... ]
#ifdef UNIX
if (m_obj->fd >= 0)
(void) close(m_obj->fd);
if (m_obj->data!=NULL) {
msync(m_obj->data, m_obj->size, MS_SYNC);
munmap(m_obj->data, m_obj->size);
}
#endif /* UNIX */
[ ...]
}
So, if the object has been closed properly before being deallocated, msync() is _not_ called.
But, if we don't close the object, then msync() is called.
The attached test script shows the _huge_ performance impact of msync:
when only close() is called (no msync()):
$ ./python /home/cf/test_mmap.py
0.35829615593
when both flush() and close() are called (msync() called):
$ ./python /home/cf/test_mmap.py
4.95999493599
when neither is called, relying on the deallocation (msync() called):
$ ./python /home/cf/test_mmap.py
4.8811671257
And a strace leaves no doubt (called 10 times in a loop) :
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80b1000 <0.000019>
write(1, "4.12167286873\n"..., 144.12167286873
) = 14 <0.000012>
close(3) = 0 <0.000010>
munmap(0xb80b2000, 4096) = 0 <0.000023>
rt_sigaction(SIGINT, {SIG_DFL}, {0x811d630, [], 0}, 8) = 0 <0.000011>
close(5) = 0 <0.004889>
msync(0xb69f9000, 10000000, MS_SYNC) = 0 <0.584054>
munmap(0xb69f9000, 10000000) = 0 <0.000433>
See how expensive msync() is, and this is just for a 10MB file.
So the attached patch (mmap_msync.diff) removes the call to msync from mmap_object_dealloc(). Since UnmapViewOfFile() is only called inside flush() method, nothing to remove for MS Windows.
Here's the result of the same test script with the patch:
when only close() is called (no msync()):
$ ./python /home/cf/test_mmap.py
0.370584011078
when both flush() and close() are called (msync() called):
$ ./python /home/cf/test_mmap.py
4.97467517853
when neither is called, relying on the deallocation (msync() not called):
$ ./python /home/cf/test_mmap.py
0.390102148056
So we only get msync() latency when the user explicitely calls flush().
|
|||
| msg116818 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2010年09月18日 18:03 | |
Any thoughts on this? |
|||
| msg117072 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年09月21日 15:40 | |
It's a pity that flush() is defined like this. Ideally, if mmap claims to mimick ordinary file objects, flush() should be a no-op() and there should be a separate sync() method. On the other hand, your (Charles-François's) patch is already much better than the statu quo. If nobody objects, I think it should be committed to 3.2. Whether or not we should be backport it to the stable branches is a bit more delicate, since it /could/ break badly written applications... On a sidenote, the mmap object has received a *lot* less attention during the years than the other IO primitives (especially file objects in 3.x). It should probably only be used for specialized cases. |
|||
| msg117074 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年09月21日 15:44 | |
I notice that there's support in #678250 for making flush() a no-op. We should still fix tp_dealloc anyway. |
|||
| msg117077 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2010年09月21日 16:00 | |
I think Antoine misinterpreted my message. I do think that flush should continue to msync, except in cases where there really is no underlying file to sync to. It may (or may not) be unfortunate that the method is called flush, but nothing is gained by renaming it. I agree that calling msync on close/dealloc is not really necessary. The Windows version doesn't sync, either. |
|||
| msg117079 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年09月21日 16:08 | |
> I agree that calling msync on close/dealloc is not really necessary. > The Windows version doesn't sync, either. Ok, thank you. I committed the patch in r84950. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:33 | admin | set | github: 46895 |
| 2010年09月21日 16:09:20 | pitrou | set | status: open -> closed resolution: fixed |
| 2010年09月21日 16:08:54 | pitrou | set | messages: + msg117079 |
| 2010年09月21日 16:00:16 | loewis | set | nosy:
+ loewis messages: + msg117077 |
| 2010年09月21日 15:44:01 | pitrou | set | messages: + msg117074 |
| 2010年09月21日 15:40:59 | pitrou | set | versions:
- Python 2.6, Python 3.1, Python 2.7 nosy: akuchling, exarkun, pitrou, schmir, trent, brian.curtin, neologix title: mmap_object_dealloc does not call FlushViewOfFile on windows -> mmap_object_dealloc calls time-consuming msync(), although close doesn't messages: + msg117072 components: + Library (Lib), IO stage: patch review -> commit review |
| 2010年09月18日 18:03:56 | neologix | set | messages: + msg116818 |
| 2010年04月08日 20:35:29 | pitrou | set | nosy:
+ akuchling, exarkun |
| 2010年04月08日 20:21:33 | neologix | set | files: + test_mmap.py |
| 2010年04月08日 20:21:02 | neologix | set | files:
+ mmap_msync.diff messages: + msg102642 |
| 2010年04月06日 21:16:47 | neologix | set | nosy:
+ neologix messages: + msg102495 |
| 2010年01月12日 22:10:59 | pitrou | set | nosy:
+ pitrou messages: + msg97664 |
| 2010年01月12日 18:27:03 | brian.curtin | set | files:
+ issue2643.diff messages: + msg97644 |
| 2010年01月12日 18:26:27 | brian.curtin | set | files: - issue2643.diff |
| 2010年01月12日 18:25:14 | brian.curtin | set | files:
+ issue2643.diff priority: normal versions: + Python 3.1, Python 2.7, Python 3.2 keywords: + patch, needs review nosy: + brian.curtin messages: + msg97643 stage: patch review |
| 2008年04月17日 18:38:17 | trent | set | nosy: + trent |
| 2008年04月16日 16:40:11 | schmir | set | messages: + msg65554 |
| 2008年04月16日 16:30:44 | schmir | set | type: behavior versions: + Python 2.6 |
| 2008年04月16日 16:29:01 | schmir | create | |