This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: filecmp.cmp() incorrect results when previously compared file is modified within modification time resolution
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: fbm, melevittfl, nadeem.vawda, ned.deily, python-dev, rhettinger
Priority: normal Keywords: easy, patch

Created on 2013-06-06 14:07 by fbm, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
18149.patch melevittfl, 2013-06-12 13:06 Patch for issue 18149 - Adds clear_cache() method review
18149-2.patch melevittfl, 2013-06-12 23:40 Patch for issue 18149 - Adds clear_cache method with updated Docs and tests review
18149-3.patch melevittfl, 2013-06-13 08:32 Patch for issue 18149 - Updated to add version added directive to docs review
Messages (10)
msg190715 - Author: Matej Fröbe (fbm) Date: 2013-06-06 14:07
Example:
 import filecmp

 with open('file1', 'w') as f:
     f.write('a')
 with open('file2', 'w') as f:
     f.write('a')

 print(filecmp.cmp('file1', 'file2', shallow=False))  # True
 with open('file2', 'w') as f:
     f.write('b')  # same size as before, different content
 print(filecmp.cmp('file1', 'file2', shallow=False))  # True, although the contents now differ
Because of the caching, both calls to filecmp.cmp() return True on my system.
When retrieving a value from the cache, filecmp.cmp() checks the signatures of the files:
 s1 = _sig(os.stat(f1))
 s2 = _sig(os.stat(f2))
 ...
 outcome = _cache.get((f1, f2, s1, s2))
But the signatures stay the same if the file sizes and modification times (os.stat().st_mtime) haven't changed since the last call, even if the content has changed, so the stale cache entry still matches.
The cache is mentioned in the documentation, but there isn't any documented way to clear it. It also isn't nice, IMO, that one has to worry about the file system's modification time resolution when calling a simple file comparison.
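To make the signature problem concrete, here is a minimal sketch. The helper `signature` is a hypothetical stand-in for the private `_sig` function (assumed here to combine file type, size, and modification time), and `os.utime` is used only to simulate two writes landing within the same timestamp tick:

```python
import os
import stat
import tempfile


def signature(path):
    """Hypothetical stand-in for filecmp's private _sig helper."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


PINNED = 1000000000  # arbitrary timestamp chosen for this sketch

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file2")

    with open(path, "w") as f:
        f.write("a")
    os.utime(path, (PINNED, PINNED))  # pin atime and mtime
    sig_before = signature(path)

    with open(path, "w") as f:
        f.write("b")  # same size, different content
    os.utime(path, (PINNED, PINNED))  # simulate a write within the same tick
    sig_after = signature(path)

# The content changed, but nothing in the signature reflects that.
same_signature = sig_before == sig_after
```

Since the signature is part of the cache key, an unchanged signature means the earlier result is returned without reading the files again.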
msg190774 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-06-07 20:26
It seems like this would be a fairly rare situation and, as you note, dependent on the underlying file system. But it would be easy to add a new function to the module to clear its cache in cases where it is known this might be a problem. In fact, in Issue11802 a clear_cache function was proposed to solve the problem of the cache growing without bounds, but that problem was solved by the simpler solution of discarding the cache when it gets above 100 entries.
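The two approaches mentioned here can be sketched side by side. This is an illustration only, not the module's code: `_cache`, `MAX_ENTRIES`, and `remember` are hypothetical names, and the threshold of 100 is taken from the comment above.

```python
_cache = {}
MAX_ENTRIES = 100  # threshold quoted in the discussion


def remember(key, outcome):
    """Store a comparison result, discarding everything once the cache is too big."""
    if len(_cache) > MAX_ENTRIES:
        _cache.clear()
    _cache[key] = outcome


def clear_cache():
    """Explicitly drop all cached results (the function proposed in this issue)."""
    _cache.clear()


for i in range(150):
    remember(("file1", "file2", i), True)
size_after_fill = len(_cache)  # bounded by the wholesale discard

clear_cache()
size_after_clear = len(_cache)
```

The size limit bounds memory use but does nothing about stale entries, which is why an explicit clearing function is still useful.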
msg190790 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-06-08 01:56
+1 for a cache clearing function like the one in re.py
msg191026 - Author: Mark Levitt (melevittfl) * Date: 2013-06-12 13:06
I've added a "clear_cache()" method to filecmp.py. Patch attached. 
I had thought about implementing an optional parameter to only invalidate the cache of a specific file object, but figured I'd keep it simple for now.
First time submitting a patch, so apologies if I've done something the wrong way.
msg191048 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-06-12 21:03
Thanks for the patch, Mark. I've left some review comments via Rietveld (the review link next to the patch). Also, if you haven't already, please fill out the contributor form as described in the Developer's Guide (http://docs.python.org/devguide/patch.html#licensing).
msg191051 - Author: Mark Levitt (melevittfl) * Date: 2013-06-12 23:40
Ned,
Thanks for taking the time to review. I've updated the docs, added a unit test, signed the contributor form, and made the changes/corrections from your review.
Updated patch attached.
msg191060 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-06-13 06:29
Looks good to me, other than that the doc change should include a version added directive (which can be added by the committer):
 .. function:: clear_cache()
+   .. versionadded:: 3.4
+
    Clear the filecmp cache. This may be useful if a file is compared so quickly
msg191067 - Author: Mark Levitt (melevittfl) * Date: 2013-06-13 08:32
Cool. I've gone ahead and generated a new patch with the version added directive included.
msg191162 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-06-14 22:20
New changeset bfd53dcb02ff by Ned Deily in branch 'default':
Issue #18149: Add filecmp.clear_cache() to manually clear the filecmp cache.
http://hg.python.org/cpython/rev/bfd53dcb02ff 
msg191163 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-06-14 22:22
Committed for release in 3.4.0. Thanks, Mark.
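For reference, a small sketch of how the committed filecmp.clear_cache() can be used (Python 3.4 or later). The helper `write_pinned` and the pinned timestamp are assumptions made for this example; `os.utime` forces identical modification times so the stale result reproduces regardless of the file system's timestamp resolution:

```python
import filecmp
import os
import tempfile

PINNED_MTIME = 1000000000  # arbitrary timestamp, chosen only for this sketch


def write_pinned(path, text):
    """Write text, then force a fixed mtime to mimic a coarse timestamp."""
    with open(path, "w") as f:
        f.write(text)
    os.utime(path, (PINNED_MTIME, PINNED_MTIME))


with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, "file1")
    file2 = os.path.join(tmp, "file2")
    write_pinned(file1, "a")
    write_pinned(file2, "a")

    first = filecmp.cmp(file1, file2, shallow=False)  # contents are compared
    write_pinned(file2, "b")                          # same size, same mtime
    stale = filecmp.cmp(file1, file2, shallow=False)  # answered from the cache

    filecmp.clear_cache()
    fresh = filecmp.cmp(file1, file2, shallow=False)  # contents are read again

print(first, stale, fresh)
```

Under these assumptions the script should print `True True False`: the second call repeats the cached result, and only the call after clear_cache() notices the change.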
History
Date | User | Action | Args
2022-04-11 14:57:46 | admin | set | github: 62349
2013-06-14 22:22:19 | ned.deily | set | status: open -> closed; resolution: fixed; messages: + msg191163; stage: commit review -> resolved
2013-06-14 22:20:23 | python-dev | set | nosy: + python-dev; messages: + msg191162
2013-06-13 08:32:28 | melevittfl | set | files: + 18149-3.patch; messages: + msg191067
2013-06-13 06:29:12 | ned.deily | set | messages: + msg191060; stage: needs patch -> commit review
2013-06-12 23:40:21 | melevittfl | set | files: + 18149-2.patch; messages: + msg191051
2013-06-12 21:03:40 | ned.deily | set | messages: + msg191048
2013-06-12 13:06:56 | melevittfl | set | files: + 18149.patch; nosy: + melevittfl; messages: + msg191026; keywords: + patch
2013-06-08 01:56:51 | rhettinger | set | messages: + msg190790
2013-06-07 20:26:29 | ned.deily | set | title: filecmp.cmp() - cache invalidation fails when file modification times haven't changed -> filecmp.cmp() incorrect results when previously compared file is modified within modification time resolution; keywords: + easy; nosy: + rhettinger, ned.deily, nadeem.vawda; versions: + Python 3.4, - Python 2.7; messages: + msg190774; stage: needs patch
2013-06-06 14:07:35 | fbm | create |
