Created on 2014-04-17 01:01 by bgailer, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | |
|---|---|
| File name | Uploaded |
| issue21279.patch | lilbludot, 2014-04-18 22:47 |
| issue21279.patch | martin.panter, 2014-12-13 03:21 |
| issue21279.patch | martin.panter, 2014-12-18 04:14 |
| issue21279.v4.patch | martin.panter, 2014-12-21 08:50 |
| issue21279.v5.patch | jjposner, 2014-12-24 02:27 |
| issue21279.v6.patch | jjposner, 2015-01-25 16:39 |
| Messages (25) | |||
|---|---|---|---|
| msg216630 | Author: bob gailer (bgailer) | Date: 2014-04-17 01:01 | |
Documentation for str.translate only mentions a dictionary for the translation table. Actually any iterable can be used, as long as its elements are integers, None or str.
Recommended wording:
str.translate(translation_table)
Return a copy of s where all characters have been "mapped" through the translation_table, which must be either a dictionary mapping Unicode ordinals (integers) to Unicode ordinals, strings or None, or an iterable. In the latter case the ord() of each character in s is used as an index into the iterable; the corresponding element of the iterable replaces the character. If ord() of the character exceeds the index range of the iterable, no substitution is made.
Example: to shift any of the first 255 characters (code points 0 to 254) to the next:
>>> 'Now is the time for all good men'.translate(range(1, 256))
'Opx!jt!uif!ujnf!gps!bmm!hppe!nfo'
COMMENT: I placed "mapped" in quotes as technically this only applies to dictionaries. Not sure what the best word is.
|
|||
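The behaviour described in the report can be checked with a short sketch. The variable names (`shift_table`, `table`) are illustrative only and do not come from the issue; the sketch assumes CPython 3, where a failed index lookup leaves the character unchanged.

```python
# Sketch: using sequences (not dictionaries) as str.translate() tables.
# All names here are illustrative and not part of the issue.

# A range indexed by ord(ch): index i holds i + 1, so every character
# with an ordinal below 255 is shifted to the next code point.
shift_table = range(1, 256)
shifted = 'Now is the time for all good men'.translate(shift_table)
print(shifted)  # Opx!jt!uif!ujnf!gps!bmm!hppe!nfo

# A list works the same way; a None entry deletes the character.
table = [None] * 128
table[ord('a')] = 'A'        # map 'a' to a string
table[ord('b')] = ord('B')   # map 'b' to an ordinal
# 'c' maps to None and is deleted; ord('€') is 8364, beyond the end of
# the list, so the IndexError leaves that character untouched.
result = 'abc\u20ac'.translate(table)
print(result)  # AB€
```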
| msg216653 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2014-04-17 04:50 | |
I suspect "iterable" is the wrong term.
>>> from collections.abc import Iterable
>>> isinstance(set(), Iterable)
True
>>> "abc".translate(set())
TypeError: 'set' object does not support indexing
>>> "abc".translate(object())
TypeError: 'object' object is not subscriptable
Maybe "indexable" or "subscriptable" would be more correct? If this behaviour is part of the API, it would be nice to document, because it would have saved me a few times from implementing the __len__() and __iter__() methods of the mapping interface in my custom lookup tables.
Here is my suggestion:
str.translate(table): Return a copy of the string where all characters have been mapped through "table", a lookup table. The lookup table must be a subscriptable object, for instance a dictionary or list, mapping Unicode ordinals (integers) to Unicode ordinals, strings or None. If a character is not in the table, the subscript operation should raise LookupError, and the character is left untouched. Characters mapped to None are deleted.
|
|||
| msg216683 | Author: Josh Rosenberg (josh.r) (Python triager) | Date: 2014-04-17 10:43 | |
For the record, I have intentionally used bytes.maketrans to make translation tables for str.translate for precisely this reason; it's much faster to look up an ordinal in a bytes object than in a dictionary. Before the recent (partial) patch for str.translate performance (#21118), this was a huge improvement if you only needed to worry about latin-1 characters (though encoding to latin-1, using bytes.translate, then decoding again was still faster). It's still faster than using a dictionary even with the patch from #21118, but the difference is not nearly as significant. |
|||
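A minimal sketch of the technique mentioned above, passing a table built by bytes.maketrans to str.translate. The sample text and variable names are assumptions made for illustration; no timing is measured here, so the sketch only shows that the two kinds of table give the same result.

```python
# Sketch: a 256-byte table from bytes.maketrans() used with str.translate().
# The sample text and names are illustrative, not taken from the issue.
byte_table = bytes.maketrans(b'abc', b'xyz')

# Indexing a bytes object yields an int, which str.translate() treats as
# a Unicode ordinal. Ordinals >= 256 raise IndexError, so those
# characters pass through unchanged.
text = 'aabbcc d\u20ac'
via_bytes = text.translate(byte_table)

# The equivalent dictionary-based table, for comparison.
dict_table = str.maketrans('abc', 'xyz')
via_dict = text.translate(dict_table)

print(via_bytes)              # xxyyzz d€
print(via_bytes == via_dict)  # True
```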
| msg216815 | Author: Kinga Farkas (lilbludot) | Date: 2014-04-18 22:47 | |
I have created a patch based on Martin Panter's suggestions. Please let me know if it is off or there should be additional changes included. |
|||
| msg216817 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2014-04-19 00:08 | |
The docstring is more accurate.
>>> str.translate.__doc__
'S.translate(table) -> str\n\nReturn a copy of the string S, where all characters have been mapped\nthrough the given translation table, which must be a mapping of\nUnicode ordinals to Unicode ordinals, strings, or None.\nUnmapped characters are left untouched. Characters mapped to None\nare deleted.'
To me, even this is a bit unclear on exceptions and 'unmapped'. Based on experiments and then reading the C source, I determined that LookupErrors mean 'unmapped' while other exceptions are passed on and terminate the translation.
"Return a copy of the string S, where all characters have been mapped through the given translation table. When subscripted by a Unicode ordinal (integer in range(1048576)), the table must return a Unicode ordinal, string, or None, or else raise a LookupError. A LookupError, which includes instances of subclasses IndexError and KeyError, indicates that the character is unmapped and should be left untouched. Characters mapped to None are deleted."
class Table:
    def __getitem__(self, key):
        if key == 99: raise LookupError()  # 'c'
        elif key == 100: return None  # 'd'
        elif key == 101: return 'xyz'  # 'e'
        else: return key+1
print('abcdef'.translate(Table()))
# bccxyzg
The current doc ends with this note: "An even more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example)."
I don't see how this is supposed to help. encodings.cp1251 uses a string of 256 chars as a lookup table.
|
|||
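The distinction drawn above between LookupError and other exceptions can be illustrated with a small sketch. `StrictTable` is a hypothetical class written for this example, and the behaviour shown is what CPython 3 does as far as the message describes it.

```python
# Sketch: LookupError means "unmapped"; any other exception propagates.
# StrictTable is a hypothetical class invented for this illustration.
class StrictTable:
    def __getitem__(self, key):
        if key == ord('x'):
            # Not a LookupError, so str.translate() lets it escape.
            raise ValueError('refusing to translate x')
        # KeyError is a subclass of LookupError: character stays as-is.
        raise KeyError(key)

untouched = 'abc'.translate(StrictTable())
print(untouched)  # abc

propagated = None
try:
    'abx'.translate(StrictTable())
except ValueError as exc:
    propagated = str(exc)
print(propagated)  # refusing to translate x

# Both common lookup failures share the LookupError base class.
print(issubclass(IndexError, LookupError), issubclass(KeyError, LookupError))
```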
| msg216818 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2014-04-19 00:10 | |
I see that we mostly added the same info. |
|||
| msg232590 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2014-12-13 03:21 | |
Updated patch with the typo fixed, the note about the "codecs" module removed (I never found it useful either), and the doc string updated with similar wording. Terry, do you think the wording in the patch is good enough, or do you think some of your proposed wording should be included? |
|||
| msg232624 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2014-12-13 23:50 | |
Many people may not know that IndexError and KeyError are subclasses of LookupError. I have not decided what to add yet, but I think we are close. |
|||
| msg232662 | Author: John Posner (jjposner) | Date: 2014-12-15 13:38 | |
Kindly ignore message #2 on the Rietveld page (sorry for the channel noise). Here's my suggested revision:
Return a copy of the string *str* in which each character has been mapped through the given translation *table*. The table must be a subscriptable object, for instance a list or dictionary; when subscripted (indexed) by a Unicode ordinal (an integer in range(1048576)), the table object can:
* return a Unicode ordinal or a string, to map the character to one or more other characters.
* return None, to delete the character from the return string.
* raise a LookupError (possibly an instance of subclass IndexError or KeyError), to map the character to itself.
|
|||
| msg232695 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2014-12-16 01:12 | |
I’m largely happy with any of these revisions. If I end up doing another patch I would omit the *str* (it is a class name, not a parameter). Also I would omit the range(2^20) claim. Unless people think it is important; why is it different to sys.maxunicode + 1 = 0x110000? |
|||
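The question about the ordinal range can be checked directly. This is only a sketch, assuming Python 3.3 or later (where sys.maxunicode is always 0x10FFFF); the variable names are illustrative.

```python
import sys

# Sketch: compare the claimed bound 2**20 with the real code point range.
# Assumes Python 3.3+, where sys.maxunicode is always 0x10FFFF.
upper_bound = sys.maxunicode + 1
print(hex(upper_bound))      # 0x110000
print(upper_bound, 2 ** 20)  # 1114112 1048576

# Code points at or above 2**20 are valid and are still looked up in the
# table, so range(1048576) understates what str.translate() may pass.
high = chr(2 ** 20)
mapped = high.translate({2 ** 20: 'X'})
print(mapped)  # X
```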
| msg232855 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2014-12-18 04:14 | |
Here is a new patch based on John’s suggestion. |
|||
| msg232933 | Author: John Posner (jjposner) | Date: 2014-12-19 14:58 | |
Regarding Martin's patch of 12-18:
stdtypes.rst -- looks good to me
unicodeobject.c -- I suggest changing this sentence:
If a character is not in the table, the subscript operation should raise LookupError, and the character is left untouched.
... to:
If the subscript operation raises a LookupError, the character is left untouched.
|
|||
| msg232994 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2014-12-21 08:50 | |
Patch v4 with John’s doc string wording. |
|||
| msg232999 | Author: John Posner (jjposner) | Date: 2014-12-21 15:58 | |
Patch of 12-21 looks good, Martin. |
|||
| msg233000 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2014-12-21 16:59 | |
The proposed wording looks superfluously verbose to me. Look also at the description in Include/unicodeobject.h:
/* Translate a string by applying a character mapping table to it and return the resulting Unicode object. The mapping table must map Unicode ordinal integers to Unicode ordinal integers or None (causing deletion of the character). Mapping tables may be dictionaries or sequences. Unmapped character ordinals (ones which cause a LookupError) are left untouched and are copied as-is. */
It is repeated (in more detail) in Doc/c-api/unicode.rst. Isn't it pretty clear?
|
|||
| msg233002 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2014-12-21 22:05 | |
Serhiy, can you point out which bits are too verbose? Perhaps you prefer it without the bullet list, like in the earlier 2014-12-13 version of the patch.
Looking at the C API, I see a couple of problems there:
* It omits mentioning that an ordinal can map to a replacement string
* It looks like the documented None behaviour applies when errors="ignore"; otherwise it invokes a codec error handler
|
|||
| msg233013 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2014-12-22 07:35 | |
> Serhiy can you point out which bits are too verbose? Perhaps you prefer it
> without the bullet list like in the earlier 2014-12-13 version of the
> patch.
I prefer it without the bullet list and without the LookupError expansion (there is a link to the LookupError definition, where IndexError and KeyError should be mentioned). Instead of the new term "subscriptable objects", use "mappings or sequences" with links to the glossary.
> Looking at the C API, I see a couple problems there:
Yes, it is slightly outdated and needs updates.
|
|||
| msg233014 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2014-12-22 08:38 | |
I agree with Serhiy: no bullet points, links to glossary (at least in doc), without repeating. |
|||
| msg233025 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2014-12-22 20:34 | |
The problem with mappings and sequences is that they both require len() and iter() implementations, but str.translate() only requires __getitem__(). Perhaps a qualifier could work, like: The table must implement the __getitem__() method of mappings and sequences. |
|||
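The point above, that str.translate() needs only __getitem__() and not the full mapping or sequence interface, can be sketched as follows. `RotTable` is a hypothetical class invented for this example; the checks assume CPython 3.

```python
from collections.abc import Mapping, Sequence

# Sketch: a lookup table that defines only __getitem__().
# RotTable is a hypothetical class, not something from the patch.
class RotTable:
    def __getitem__(self, ordinal):
        # Rotate lowercase ASCII letters by 13 places.
        if ord('a') <= ordinal <= ord('z'):
            return (ordinal - ord('a') + 13) % 26 + ord('a')
        # Anything else is reported as unmapped and left untouched.
        raise LookupError(ordinal)

table = RotTable()
# The object is neither a Mapping nor a Sequence, and it has no
# __len__() or __iter__(), yet translate() accepts it.
print(isinstance(table, Mapping), isinstance(table, Sequence))  # False False
print(hasattr(table, '__len__'), hasattr(table, '__iter__'))    # False False
encoded = 'hello, world'.translate(table)
print(encoded)  # uryyb, jbeyq

# By contrast, a set is iterable but not subscriptable, so it is rejected.
set_rejected = False
try:
    'abc'.translate(set())
except TypeError:
    set_rejected = True
print(set_rejected)  # True
```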
| msg233071 | Author: John Posner (jjposner) | Date: 2014-12-24 02:27 | |
issue21279.v5.patch tries to apply the comments in msg233013, msg233014, and msg233025 to the Doc/library/stdtypes.rst writeup. Then it applies some of the same language to the docstring in Objects/unicodeobject.c. |
|||
| msg234652 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2015-01-25 05:00 | |
I’m happy with the new wording in v5. Maybe the docstring in the C module could be reflowed though. |
|||
| msg234674 | Author: John Posner (jjposner) | Date: 2015-01-25 16:39 | |
Per Martin's suggestion, deltas from issue21279.v5.patch:
* no change to patch for Doc/library/stdtypes.rst
* doc string reflowed in patch for Objects/unicodeobject.c
|
|||
| msg245546 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2015-06-20 07:10 | |
Patch v6 looks okay, so I think it is ready to commit. |
|||
| msg248107 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2015-08-06 05:06 | |
New changeset ae53bd5decae by Zachary Ware in branch '3.4':
Issue #21279: Flesh out str.translate docs
https://hg.python.org/cpython/rev/ae53bd5decae
New changeset 064b569e38fe by Zachary Ware in branch '3.5':
Issue #21279: Merge with 3.4
https://hg.python.org/cpython/rev/064b569e38fe
New changeset 967c9a9fe724 by Zachary Ware in branch 'default':
Closes #21279: Merge with 3.5
https://hg.python.org/cpython/rev/967c9a9fe724
|
|||
| msg248108 | Author: Zachary Ware (zach.ware) (Python committer) | Date: 2015-08-06 05:09 | |
Very minor grammatical fixes, reflowed the .rst docs, and re-added the codecs module mention in a less obtrusive manner, but the patch is committed. Thank you Kinga, Martin, and John! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:02 | admin | set | github: 65478 |
| 2015-08-06 05:09:41 | zach.ware | set | nosy: + zach.ware; messages: + msg248108 |
| 2015-08-06 05:06:10 | python-dev | set | status: open -> closed; nosy: + python-dev; messages: + msg248107; resolution: fixed; stage: commit review -> resolved |
| 2015-06-20 07:10:26 | martin.panter | set | stage: patch review -> commit review; messages: + msg245546; versions: + Python 3.6 |
| 2015-01-25 19:54:08 | berker.peksag | set | nosy: + berker.peksag |
| 2015-01-25 16:39:34 | jjposner | set | files: + issue21279.v6.patch; messages: + msg234674 |
| 2015-01-25 05:00:02 | martin.panter | set | messages: + msg234652 |
| 2014-12-24 02:27:34 | jjposner | set | files: + issue21279.v5.patch; messages: + msg233071 |
| 2014-12-22 20:34:56 | martin.panter | set | messages: + msg233025 |
| 2014-12-22 08:38:25 | terry.reedy | set | messages: + msg233014 |
| 2014-12-22 07:35:40 | serhiy.storchaka | set | messages: + msg233013 |
| 2014-12-21 22:05:30 | martin.panter | set | messages: + msg233002 |
| 2014-12-21 16:59:25 | serhiy.storchaka | set | nosy: + serhiy.storchaka, ezio.melotti, georg.brandl, vstinner; messages: + msg233000; components: + Unicode |
| 2014-12-21 15:58:49 | jjposner | set | messages: + msg232999 |
| 2014-12-21 08:50:48 | martin.panter | set | files: + issue21279.v4.patch; messages: + msg232994 |
| 2014-12-19 14:58:58 | jjposner | set | messages: + msg232933 |
| 2014-12-18 04:14:07 | martin.panter | set | files: + issue21279.patch; messages: + msg232855 |
| 2014-12-16 01:12:42 | martin.panter | set | messages: + msg232695 |
| 2014-12-15 13:38:48 | jjposner | set | nosy: + jjposner; messages: + msg232662 |
| 2014-12-13 23:50:42 | terry.reedy | set | messages: + msg232624 |
| 2014-12-13 03:21:49 | martin.panter | set | files: + issue21279.patch; messages: + msg232590 |
| 2014-04-19 00:10:17 | terry.reedy | set | messages: + msg216818 |
| 2014-04-19 00:08:48 | terry.reedy | set | nosy: + terry.reedy; messages: + msg216817 |
| 2014-04-18 22:47:22 | lilbludot | set | files: + issue21279.patch; nosy: + lilbludot; messages: + msg216815; keywords: + patch |
| 2014-04-17 10:43:16 | josh.r | set | nosy: + josh.r; messages: + msg216683 |
| 2014-04-17 04:50:41 | martin.panter | set | nosy: + martin.panter; messages: + msg216653 |
| 2014-04-17 01:06:05 | rhettinger | set | keywords: + easy; stage: patch review; versions: + Python 3.4, Python 3.5, - Python 3.3 |
| 2014-04-17 01:01:22 | bgailer | create | |