
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: msilib file names check too strict?
Type: enhancement
Stage: needs patch
Components: Windows
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: cdavid, loewis, markm
Priority: normal
Keywords: patch

Created on 2008-04-26 04:18 by cdavid, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name                   Uploaded                 Description
make_id_fix_and_test.patch  markm, 2011-03-26 12:24  Patch to fix msilib.make_id() and test it
Messages (8)
msg65834 - Author: Cournapeau David (cdavid) Date: 2008-04-26 04:18
Hi,
I wanted to build an MSI using the bdist_msi distutils command for one of
my packages, but it fails in the function make_id, at
line 177 in msilib/__init__.py, for a file named aixc++.py. The regex
indeed refuses any character which is not alphanumeric: is MSI itself
really that strict, or could this check be relaxed?
msg65842 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-26 12:19
Indeed, the primary keys in many tables must be Identifiers, see
http://msdn2.microsoft.com/en-us/library/aa369212(VS.85).aspx
make_id tries to synthesize an identifier from a file name, and fails
for your file names.
msg65845 - Author: Cournapeau David (cdavid) Date: 2008-04-26 15:56
Ok, thanks for the information.
It might be good to have a more informative error, though, such as one saying
which characters are allowed when checking against the regex.
msg65846 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-26 16:02
Actually, the algorithm should be fixed to generate a valid identifier
for any input.
Would you like to work on a fix?
msg65847 - Author: Cournapeau David (cdavid) Date: 2008-04-26 16:13
It's not that I don't want to work on it, but I don't know anything
about MSI, except that some Windows users of my packages request it :)
So I would need some indication of what exactly to fix.
Do I understand right that bdist_msi builds a database of the files, and
that the identifiers could be named differently than the filenames
themselves, as long as they are unique?
msg65848 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-26 16:23
> Do I understand right that bdist_msi builds a database of the files, and
> that the identifiers could be named differently than the filenames
> themselves, as long as they are unique?
Correct. As a design objective, I try to use identifiers close to the
file names, to simplify debugging of the MSI file (Microsoft itself
typically uses UUIDs instead).
In short, just make make_id generate valid identifiers. An algorithm
on top of that will make them unique in case of conflicts.
Regards,
Martin
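The two steps described above (generate a valid identifier, then make it unique on top of that) can be sketched roughly as follows. This is only an illustration, not the code that ended up in msilib: the names `sketch_make_id` and `unique_id`, the `used` set, and the `.N` suffix scheme are all assumptions made for the example. The allowed character set follows the Identifier rule from the MSDN page linked in this thread (ASCII letters, digits, underscores and periods, starting with a letter or underscore).

```python
import string

# Characters the Identifier rule allows: ASCII letters, digits, "_" and ".".
IDENTIFIER_CHARS = frozenset(string.ascii_letters + string.digits + "._")


def sketch_make_id(name):
    """Hypothetical helper: map an arbitrary file name to a valid identifier."""
    # Replace every character outside the allowed set with an underscore.
    cleaned = "".join(c if c in IDENTIFIER_CHARS else "_" for c in name)
    # An identifier must begin with a letter or an underscore.
    if not cleaned or cleaned[0] in string.digits + ".":
        cleaned = "_" + cleaned
    return cleaned


def unique_id(name, used):
    """Hypothetical helper: append a numeric suffix until the id is unused."""
    base = sketch_make_id(name)
    candidate = base
    counter = 1
    while candidate in used:
        candidate = "%s.%d" % (base, counter)
        counter += 1
    used.add(candidate)
    return candidate
```

With this sketch, `sketch_make_id("aixc++.py")` gives `"aixc__.py"`, which stays close to the file name as the design objective asks, and two different file names that collapse to the same identifier are told apart by the suffix.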
msg132232 - Author: Mark Mc Mahon (markm) * Date: 2011-03-26 12:24
How about the following patch and tests...
Per: http://msdn.microsoft.com/en-us/library/aa369212(v=vs.85).aspx
"""The Identifier data type is a text string. Identifiers may contain the
ASCII characters A-Z (a-z), digits, underscores (_), or periods (.). However, every identifier must begin with either a letter or an underscore."""
So the spec would say that colons are NOT allowed. Editing some entries in the File table of an MSI (using Orca from the MSI SDK) and running the validation confirms that.
All the following were flagged as errors:
'KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]', 'TortoisePlinkEXE]', 'Hg.Cämd'
I also did some speed testing (just in case the non-regex version might be slow):
Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit("re.sub(r'[^a-zA-Z_\.]', '_', 'somefilename.txt')", setup = "import re")
4.434621757767205
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit('"".join([c if c in identifier_chars else "_" for c in "somefilename.txt"])', setup)
3.3757537425069906
>>>
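The Identifier rule quoted above can also be written as a single pattern, which makes it easy to check candidate identifiers without Orca. This is a minimal sketch under that reading of the rule; `IDENTIFIER_RE` and `is_valid_identifier` are names invented for the example, and the check says nothing about any additional constraints MSI validation may apply.

```python
import re

# A letter or underscore first, then letters, digits, underscores or periods.
# The character classes are spelled out so that only ASCII characters match.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def is_valid_identifier(text):
    """Hypothetical helper: True if text satisfies the quoted Identifier rule."""
    return IDENTIFIER_RE.fullmatch(text) is not None


# The entries that were flagged as errors in the message above.
flagged = ['KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]',
           'TortoisePlinkEXE]', 'Hg.Cämd']
rejected = [entry for entry in flagged if not is_valid_identifier(entry)]
```

Every entry in `flagged` ends up in `rejected`, which agrees with the validation result reported above. One detail worth keeping in mind when comparing the two timed expressions: the character class in the timed `re.sub` call does not include digits, so it would also replace digits, while the `join` version keeps them. For `'somefilename.txt'` both produce the same string.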
msg132543 - Author: Mark Mc Mahon (markm) * Date: 2011-03-29 22:14
This issue has been fixed by changes made in issue7639 and issue11696.
History
Date                 User          Action  Args
2022-04-11 14:56:33  admin         set     github: 46946
2011-03-30 05:33:38  loewis        set     status: open -> closed
                                           resolution: fixed
2011-03-29 22:14:07  markm         set     messages: + msg132543
2011-03-26 12:24:18  markm         set     files: + make_id_fix_and_test.patch
                                           nosy: + markm
                                           messages: + msg132232
                                           keywords: + patch
2010-01-13 01:58:56  brian.curtin  set     priority: normal
                                           stage: needs patch
                                           versions: + Python 2.7, - Python 2.5
2008-04-26 16:23:53  loewis        set     messages: + msg65848
2008-04-26 16:13:55  cdavid        set     messages: + msg65847
2008-04-26 16:02:31  loewis        set     messages: + msg65846
2008-04-26 15:56:06  cdavid        set     messages: + msg65845
2008-04-26 12:19:34  loewis        set     nosy: + loewis
                                           messages: + msg65842
2008-04-26 04:18:42  cdavid        create
