Issue 1361643: textwrap.dedent() expands tabs


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/42613

classification

Type:
Stage:
Title:	textwrap.dedent() expands tabs
Components:	Library (Lib)
Versions:	Python 2.5

process

Dependencies:
Superseder:
Status:	closed
Resolution:	fixed
Assigned To:	gward
Nosy List:	bethard, georg.brandl, gward, rhettinger
Priority:	high
Keywords:

Created on 2005-11-19 19:02 by bethard, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
textwrap.diff	rhettinger, 2005-11-20 12:30	Diff for textwrap.py and test_textwrap.py

Messages (6)
msg26902	Author: Steven Bethard (bethard) * (Python committer)	Date: 2005-11-19 19:02
I'm not sure whether this is a documentation bug or a code bug, but textwrap.dedent() expands tabs (and AFAICT doesn't give the user any way of stopping this):

py> def test():
...     x = ('abcd efgh\n'
...          'ijkl mnop\n')
...     y = textwrap.dedent('''\
...         abcd efgh
...         ijkl mnop
...         ''')
...     return x, y
...
py> test()
('abcd\tefgh\nijkl\tmnop\n', 'abcd efgh\nijkl mnop\n')

Looking at the code, I can see the culprit is the first line:

lines = text.expandtabs().split('\n')

If this is the intended behavior, I think the first sentence in the documentation[1] should be replaced with:

"""
Replace all tabs in string with spaces as per str.expandtabs() and then remove any whitespace that can be uniformly removed from the left of every line in text.
"""

and (I guess this part is an RFE) textwrap.dedent() should gain an optional expandtabs= keyword argument to disable this behavior.

If it's not the intended behavior, I'd love to see that .expandtabs() call removed.

[1] http://docs.python.org/lib/module-textwrap.html
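A minimal sketch of the behavior reported above. Only the `text.expandtabs().split('\n')` line is quoted from the report; the rest of `expandtabs_dedent` (a hypothetical name) is an assumed reconstruction of a margin-stripping loop, not the actual textwrap source. It illustrates that once tabs are expanded up front, tabs inside the content are converted along with the margin.

```python
def expandtabs_dedent(text):
    """Hypothetical stand-in: expand all tabs, then strip the common margin."""
    # This first line is the one identified in the report.
    lines = text.expandtabs().split('\n')

    # Assumed reconstruction: find the smallest indent among non-blank lines.
    margin = None
    for line in lines:
        content = line.lstrip()
        if not content:
            continue  # blank lines do not affect the margin
        indent = len(line) - len(content)
        margin = indent if margin is None else min(margin, indent)

    if margin:
        lines = [line[margin:] for line in lines]
    return '\n'.join(lines)


# Four spaces of margin, with a literal tab inside the content of each line.
result = expandtabs_dedent("    abcd\tefgh\n    ijkl\tmnop\n")
print(repr(result))  # the content tabs have been replaced by spaces
```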
msg26903	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2005-11-19 20:18
FWIW, the tab expansion would be more useful if the default tabsize could be changed.
msg26904	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2005-11-20 04:52
After more thought, I think the expandtabs() is a bug since it expands content tabs as well as margin tabs:

>>> textwrap.dedent('\tABC\t\tDEF')
'ABC DEF'

This is especially problematic given that dedent() has to guess at the tab size.

If this gets fixed, I recommend using regular expressions as a way to identify common margin prefixes on non-empty lines. This will also handle mixes of spaces and tabs without altering content with embedded tabs and without making assumptions about the tab size. Also, it ought to run somewhat faster.
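To illustrate the tab-size point, here is a small sketch that uses only str.expandtabs(). The same input yields different content spacing depending on the assumed tab size (8 is the str.expandtabs() default; 4 is an arbitrary alternative picked for comparison).

```python
text = '\tABC\t\tDEF'

# Expand with two different tab sizes and drop the leading margin,
# mimicking what an expandtabs()-based dedent would hand back.
by_tabsize = {size: text.expandtabs(size).lstrip(' ') for size in (8, 4)}

for size, result in by_tabsize.items():
    # The gap between ABC and DEF depends entirely on the guessed tab size.
    print(size, repr(result))
```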
msg26905	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2005-11-20 06:04
Suggested code:

import re as _re
_emptylines_with_spaces = _re.compile('(?m)^[ \t]+$')
_prefixes_on_nonempty_lines = _re.compile('(?m)(^[ \t]*)(?:[^ \t\n]+)')

def dedent(text):
    text = _emptylines_with_spaces.sub('', text)
    prefixes = _prefixes_on_nonempty_lines.findall(text)
    margin = min(prefixes or [''])
    if margin:
        text = _re.sub('(?m)^' + margin, '', text)
    return text
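A small sketch of how the suggested code behaves when a tab-indented line is followed by a space-indented line. The function body is copied from the message above under the hypothetical name `suggested_dedent`, and the input string is an assumption chosen for illustration. Because `min()` compares the prefixes as strings, a tab sorts before a space and is picked as the margin even though the two lines share no leading whitespace.

```python
import re

_emptylines_with_spaces = re.compile('(?m)^[ \t]+$')
_prefixes_on_nonempty_lines = re.compile('(?m)(^[ \t]*)(?:[^ \t\n]+)')


def suggested_dedent(text):
    """Copy of the suggested dedent(), renamed to avoid confusion with textwrap."""
    text = _emptylines_with_spaces.sub('', text)
    prefixes = _prefixes_on_nonempty_lines.findall(text)
    margin = min(prefixes or [''])  # string comparison, not a common prefix
    if margin:
        text = re.sub('(?m)^' + margin, '', text)
    return text


# Line 1 is indented with a tab, line 2 with two spaces: no shared margin.
mixed = '\thello\n  world'
result = suggested_dedent(mixed)
print(repr(result))  # only the tab-indented line loses its indentation
```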
msg26906	Author: Georg Brandl (georg.brandl) * (Python committer)	Date: 2005-12-15 08:45
Looks good!
msg26907	Author: Greg Ward (gward) (Python committer)	Date: 2006-06-11 00:41
I agree that the docs are (pretty) clear and the code is wrong. When determining common leading whitespace, tabs and spaces should not be treated as equivalent.

Raymond's fix was close, but not quite there: considering only the length of leading whitespace still causes space/tab confusion. (This only became clear to me after I wrote several test cases.)

My fix is based on Raymond's, i.e. it uses regexes for most of the heavy lifting rather than splitting the input string on newline and looping over the lines. The bit that's different is determining what exactly is the common leading whitespace string.

Anyways, this ended up being a complete rewrite of dedent(). I also added a paragraph to the docs to clarify the distinction between tabs and spaces.

Checked in under rev 46844 (trunk only).
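A minimal sketch of the idea described above, treating the margin as the literal whitespace string shared by all non-blank lines rather than as a length. `common_margin` is a hypothetical helper built on os.path.commonprefix; it is not the code checked in under rev 46844. The textwrap.dedent() calls assume a Python version that includes the fix.

```python
import os.path
import textwrap


def common_margin(text):
    """Leading whitespace shared, character for character, by all non-blank lines."""
    indents = []
    for line in text.split('\n'):
        stripped = line.lstrip(' \t')
        if stripped:  # skip empty and whitespace-only lines
            indents.append(line[:len(line) - len(stripped)])
    # commonprefix compares character by character and returns '' for an empty list.
    return os.path.commonprefix(indents)


samples = [
    '  hello\n\thello',     # spaces vs. tab: nothing in common
    '\thello\n\t  world',   # both start with a tab: margin is that tab
    '\tABC\t\tDEF',         # tabs inside the content must survive
]
for sample in samples:
    print(repr(common_margin(sample)), repr(textwrap.dedent(sample)))
```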

History
Date	User	Action	Args
2022-04-11 14:56:14	admin	set	github: 42613
2005-11-19 19:02:09	bethard	create
