homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Distutils filelist selects too many files on Windows
Type: Stage: resolved
Components: Distutils, Windows Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Impossible to include file in sdist that starts with 'build' on Win32
View: 6884
Assigned To: eric.araujo Nosy List: eric.araujo, jaraco, nadeem.vawda, python-dev, tarek
Priority: normal Keywords: easy

Created on 2012年02月13日 20:57 by jaraco, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
smime.p7s jaraco, 2012年02月15日 15:34
Messages (14)
msg153302 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2012年02月13日 20:57
When using a MANIFEST.in with only "include *.txt", on Windows, distutils grabs too many files. I set DISTUTILS_DEBUG=1 and ran ./setup.py sdist on the keyring library and it included this output:
 include *.txt
 include_pattern: applying regex r'^[^/]*\.txt\Z(?ms)'
 adding CHANGES.txt
 adding CONTRIBUTORS.txt
 adding .hg\last-message.txt
 adding keyring.egg-info\dependency_links.txt
 adding keyring.egg-info\requires.txt
 adding keyring.egg-info\SOURCES.txt
 adding keyring.egg-info\top_level.txt
As you can see, this is not a recursive include, but it's matching several files not in the supplied pattern (files in .hg/ and keyring.egg-info/).
It's obvious from the regular expression that the regex doesn't properly account for os.sep.
msg153341 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012年02月14日 15:38
This code just got changed (#13193) to use '/' instead of os.path.join, as it is documented that paths in MANIFEST.in must use only '/'. Can you build an up-to-date Python 2.7 from Mercurial and check again?
Your report comes timely, as there was some doubt about what the correct fix was. Even if your bug is now gone, I’ll want to add a regression test for it.
msg153371 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012年02月14日 21:32
I've been able to reproduce this on current builds of 2.7, 3.2 and 3.3.
The problem seems to be that distutils.filelist pathnames using the OS
directory separator, but the regexps used for matching paths are written
to always assume "/"-based paths.
I've been able to get the correct behaviour in this case by making the
module build paths using "/" regardless of the OS directory separator:
 --- a/Lib/distutils/filelist.py
 +++ b/Lib/distutils/filelist.py
 @@ -55,10 +55,10 @@
 def sort(self):
 # Not a strict lexical sort!
 - sortable_files = sorted(map(os.path.split, self.files))
 + sortable_files = sorted(f.split("/") for f in self.files)
 self.files = []
 for sort_tuple in sortable_files:
 - self.files.append(os.path.join(*sort_tuple))
 + self.files.append("/".join(sort_tuple))
 # -- Other miscellaneous utility methods ---------------------------
 @@ -258,7 +258,7 @@
 for name in names:
 if dir != os.curdir: # avoid the dreaded "./" syndrome
 - fullname = os.path.join(dir, name)
 + fullname = dir + "/" + name
 else:
 fullname = name
I'm not entirely comfortable with this fix, though - it seems like it
would be better to produce paths using the OS directory separator at the
end of the process. Maybe it's possible to add a translation step after
the file list is built and de-duplicated? I don't know much about how
the rest of distutils uses this module.
(Random aside: do we support any platforms that don't support "/" as a
directory separator (even if it isn't the preferred one)? Hmm...)
msg153378 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2012年02月15日 00:04
The bug is indeed fixed in the latest 2.7 tip:
 PS C:\Users\jaraco\projects\public\keyring> C:\Users\jaraco\projects\public\cpython\PCbuild\amd64\python.exe setup.py sdist 2> NULL | findstr .hg
 (produces no output)
msg153379 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2012年02月15日 00:06
If always using a posix path separator, I recommend using posixpath.join instead of hard-coding the path separator.
msg153391 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012年02月15日 07:24
> The bug is indeed fixed in the latest 2.7 tip:
>
> PS C:\Users\jaraco\projects\public\keyring> C:\Users\jaraco\projects\public\cpython\PCbuild\amd64\python.exe setup.py sdist 2> NULL | findstr .hg
> (produces no output)
I suspect you forgot to set DISTUTILS_DEBUG before running this -
otherwise that command should at least output a line like this:
 exclude_pattern: applying regex r'(^|/|\\)(RCS|CVS|\.svn|\.hg|\.git|\.bzr|_darcs)(/|\\).*'
> If always using a posix path separator, I recommend using posixpath.join instead of hard-coding the path separator.
Interesting. I had thought that posixpath and ntpath were each only
available on their respective platforms, but looking at the source,
I see that I was mistaken.
msg153403 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2012年02月15日 15:34
It's not that I forgot to set DISTUTILS_DEBUG, I simply did not set it. If the 
bug were still present, I would have seen a line indicating that 
.hg/last-message.txt was being copied.
For completeness, here's the output with the DEBUG setting:
 PS C:\Users\jaraco\projects\public\keyring> $env:DISTUTILS_DEBUG=1
 PS C:\Users\jaraco\projects\public\keyring> 
..\cpython\PCbuild\amd64\python.exe setup.py sdist 2> NULL | findstr hg
 adding .hg\last-message.txt
 exclude_pattern: applying regex 
r'(^|/|\\)(RCS|CVS|\.svn|\.hg|\.git|\.bzr|_darcs)(/|\\).*'
 removing .hg\last-message.txt
msg153404 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012年02月15日 15:42
So it looks like that even if the exclusion of .hg removes .hg/last-message.txt, it should not have been added in the first place, as the command was include and not recursive-include.
At first glance, Nadeem’s proposed fix is not right: paths in MANIFEST.in use '/', but then filelist produces paths using os.sep, so that the MANIFEST file and other operations done by the sdist command use native paths. So even though the currently supported OSes all accept '/', I think the right thing is to use os.sep.
(About posixpath: It is always available and can be used e.g. to manipulate URI paths.)
msg153428 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012年02月15日 19:26
> At first glance, Nadeem’s proposed fix is not right: paths in MANIFEST.in use '/', but then filelist produces paths using os.sep, so that the MANIFEST file and other operations done by the sdist command use native paths. So even though the currently supported OSes all accept '/', I think the right thing is to use os.sep.
Yes, that sounds like a better solution. So the solution then is to fix
the regexps to use os.sep instead of "/" (and ensure that FileList uses
native paths throughout).
As an aside, it seems that the failing test from issue 13193 was
actually correct, and that the library itself was broken. I suppose all
of the tests will need to be changed to use native paths when this issue
is fixed.
msg153624 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012年02月18日 01:49
This regex bug was actually already known. Please follow up on the other report.
> As an aside, it seems that the failing test from issue 13193 was
> actually correct, and that the library itself was broken.
Yes. filelist operates in this way: it builds a list of all files in a tree (creating allfiles, which thus uses native path separators), then edits this list according to the MANIFEST.in commands (where the patterns must be transformed to a regex that uses native separators). So the test was right in using os.path.join, and the regex was not right to always use '/'.
msg154258 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年02月25日 15:14
New changeset 47788c90f80b by Éric Araujo in branch '2.7':
Fix long-standing bugs with MANIFEST.in parsing on Windows (#6884).
http://hg.python.org/cpython/rev/47788c90f80b 
msg154262 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年02月25日 15:32
New changeset 73aa4c9305b3 by Éric Araujo in branch '3.2':
Fix long-standing bugs with MANIFEST.in parsing on Windows (#6884).
http://hg.python.org/cpython/rev/73aa4c9305b3 
msg154457 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年02月27日 11:29
New changeset 4d6a9f287543 by Éric Araujo in branch 'default':
Fix bugs with MANIFEST.in parsing on Windows (#6884).
http://hg.python.org/distutils2/rev/4d6a9f287543 
msg156266 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年03月18日 19:40
New changeset edcdef70c44e by Éric Araujo in branch '3.2':
Fix long-standing bugs with MANIFEST.in parsing on Windows (#6884).
http://hg.python.org/cpython/rev/edcdef70c44e 
History
Date User Action Args
2022年04月11日 14:57:26adminsetgithub: 58212
2012年03月18日 19:40:09python-devsetmessages: + msg156266
2012年02月27日 11:29:46python-devsetmessages: + msg154457
2012年02月25日 15:32:25python-devsetmessages: + msg154262
2012年02月25日 15:14:01python-devsetnosy: + python-dev
messages: + msg154258
2012年02月18日 01:49:16eric.araujosetstatus: open -> closed
resolution: duplicate
messages: + msg153624

superseder: Impossible to include file in sdist that starts with 'build' on Win32
stage: test needed -> resolved
2012年02月15日 19:26:00nadeem.vawdasetmessages: + msg153428
2012年02月15日 15:42:49eric.araujosetmessages: + msg153404
2012年02月15日 15:34:59jaracosetfiles: + smime.p7s

messages: + msg153403
2012年02月15日 07:24:10nadeem.vawdasetmessages: + msg153391
2012年02月15日 00:06:10jaracosetmessages: + msg153379
2012年02月15日 00:04:26jaracosetmessages: + msg153378
2012年02月14日 21:32:31nadeem.vawdasetmessages: + msg153371
2012年02月14日 15:38:12eric.araujosetversions: + Python 3.2, Python 3.3
messages: + msg153341

assignee: tarek -> eric.araujo
keywords: + easy
stage: test needed
2012年02月13日 21:01:17nadeem.vawdasetnosy: + nadeem.vawda
2012年02月13日 20:57:51jaracosetcomponents: + Windows
2012年02月13日 20:57:07jaracocreate

AltStyle によって変換されたページ (->オリジナル) /