Issue 22385: Define a binary output formatting mini-language for *.hex()

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/66579

classification

Title:	Define a binary output formatting mini-language for *.hex()
Type:	enhancement	Stage:	commit review
Components:	Interpreter Core	Versions:	Python 3.9

process

Status:	closed	Resolution:	fixed
Dependencies:	9951	Superseder:
Assigned To:	Nosy List:	Arfrever, Christian H, barry, belopolsky, eric.smith, gotgenes, gregory.p.smith, mrh1997, ncoghlan, xtreak
Priority:	normal	Keywords:	patch

Created on 2014-09-10 23:55 by ncoghlan, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 13578	merged	gregory.p.smith, 2019-05-26 00:54

Messages (18)
msg226733 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2014-09-10 23:55
Inspired by the discussion in issue 9951, I believe it would be appropriate to extend the default handling of the "x" and "X" format characters to accept arbitrary bytes-like objects. The processing of these characters would be as follows:

"x": display a-f as lowercase digits
"X": display A-F as uppercase digits
"#": includes 0x prefix
".precision": chunks output, placing a space after every <precision> bytes
",": uses a comma as the separator, rather than a space

Output order would match binascii.hexlify()

Examples:

format(b"xyz", "x") -> '78797a'
format(b"xyz", "X") -> '78797A'
format(b"xyz", "#x") -> '0x78797a'
format(b"xyz", ".1x") -> '78 79 7a'
format(b"abcdwxyz", ".4x") -> '61626364 7778797a'
format(b"abcdwxyz", "#.4x") -> '0x61626364 0x7778797a'
format(b"xyz", ",.1x") -> '78,79,7a'
format(b"abcdwxyz", ",.4x") -> '61626364,7778797a'
format(b"abcdwxyz", "#,.4x") -> '0x61626364,0x7778797a'

This approach makes it easy to inspect binary data, with the ability to inject regular spaces or commas to improve readability. Those are the basic features needed to support debugging. Anything more complicated than that, and we're starting to want something more like the struct module.
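The behaviour described in this proposal can be approximated in pure Python. The following is only a sketch for experimenting with the proposed semantics, not anything that was implemented: the function name `format_hex_proposal` and the regular expression are assumptions made for illustration, and the "#" flag always emits a lowercase "0x" prefix because the proposal does not say otherwise.

```python
import re

# Hypothetical grammar for the proposal: [#][,][.precision](x|X)
_SPEC = re.compile(
    r"(?P<prefix>#)?(?P<comma>,)?(?:\.(?P<precision>\d+))?(?P<case>[xX])"
)


def format_hex_proposal(data, spec):
    """Render bytes-like data following the examples in the proposal."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unsupported format spec: {spec!r}")
    raw = bytes(data)  # accepts bytes, bytearray and memoryview
    # Without a precision, the whole value forms a single chunk.
    size = int(match["precision"]) if match["precision"] else max(len(raw), 1)
    if size < 1:
        raise ValueError("precision must be at least 1")
    chunks = [raw[i:i + size].hex() for i in range(0, len(raw), size)]
    if match["case"] == "X":
        chunks = [chunk.upper() for chunk in chunks]
    if match["prefix"]:
        chunks = ["0x" + chunk for chunk in chunks]
    separator = "," if match["comma"] else " "
    return separator.join(chunks)


print(format_hex_proposal(b"xyz", ".1x"))         # 78 79 7a
print(format_hex_proposal(b"abcdwxyz", "#,.4x"))  # 0x61626364,0x7778797a
```

Output order follows `bytes.hex()`, which matches `binascii.hexlify()` as the proposal requires.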
msg226746 - (view)	Author: Eric V. Smith (eric.smith) * (Python committer)	Date: 2014-09-11 07:23
I think this would need to be implemented by adding bytes.__format__. I can't think of a way to make it work on bytes-like objects in general.
msg226748 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2014-09-11 07:25
> ".precision": chunks output, placing a space after every <precision> bytes

I dislike this option. "%.<precision>s" already exists in Python 2 and Python 3 (and in C's printf), where it truncates the string. If you need such special output, please write your own function.
msg226749 - (view)	Author: Eric V. Smith (eric.smith) * (Python committer)	Date: 2014-09-11 07:34
I'm not particularly wild about the .precision syntax either, but I think the feature is generally useful. Adding bytes.__format__ is exactly what "special output for bytes" _is_, as far as format() is concerned.

Another option would be to invent a new format specification for bytes. There's no reason it needs to follow the same syntax as for str, int, etc., except for ease of remembering the syntax, and some code reuse. For example, although it's insane, you could do:

format(b'abcdwxyz', 'use_spaces,grouping=4,add_prefix') -> '0x61626364 0x7778797a'
msg226992 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2014-09-17 10:56
Retitled the issue and made it depend on issue 9951. I now think it's better to tackle this more like strftime and have a method that accepts a particular custom formatting mini-language (in this case, the new hex() methods proposed in issue 9951), and then also support that mini-language in the __format__() method. It would likely need a PEP to decide on the exact details of the formatting.
msg226993 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2014-09-17 11:01
python-ideas post with a sketch of a possible mini-language: https://mail.python.org/pipermail/python-ideas/2014-September/029352.html
msg242941 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2015-05-12 05:00
Reviewing the items I had flagged as dependencies of issue 22555 for personal tracking purposes, I suggest we defer further consideration of this idea to 3.6 after folks have had a chance to get some experience with the basic bytes.hex() method.
msg292663 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2017-05-01 13:34
Copying the amended proposal from that python-ideas thread into here:

Start with a leading base format character (chosen to be orthogonal to the default format characters):

"h": lowercase hex
"H": uppercase hex
"A": ASCII (using "." for unprintable & extended ASCII)

format(b"xyz", "A") -> 'xyz'
format(b"xyz", "h") -> '78797a'
format(b"xyz", "H") -> '78797A'

Followed by a separator and "chunk size":

format(b"xyz", "h 1") -> '78 79 7a'
format(b"abcdwxyz", "h 4") -> '61626364 7778797a'
format(b"xyz", "h,1") -> '78,79,7a'
format(b"abcdwxyz", "h,4") -> '61626364,7778797a'
format(b"xyz", "h:1") -> '78:79:7a'
format(b"abcdwxyz", "h:4") -> '61626364:7778797a'

In the "h" and "H" cases, allow requesting a preceding "0x" on the chunks:

format(b"xyz", "h#") -> '0x78797a'
format(b"xyz", "h# 1") -> '0x78 0x79 0x7a'
format(b"abcdwxyz", "h# 4") -> '0x61626364 0x7778797a'

In the thread, I suggested the section before the format character would use the standard string formatting rules (alignment, fill character, width, precision), but I now think that would be ambiguous and confusing, and would be better left as a post-processing step on the rendered text.
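To make the amended mini-language concrete, here is a rough sketch of a parser for it. This is an illustration only: `format_bytes_proposal`, the helper `_render`, and the regular expression are hypothetical names, and the set of accepted separators (space, comma, colon) is assumed from the examples rather than stated by the proposal.

```python
import re

# Hypothetical grammar: (h|H|A)[#][<separator><chunk size>]
_SPEC = re.compile(
    r"(?P<base>[hHA])(?P<prefix>#)?(?:(?P<sep>[ ,:])(?P<size>\d+))?"
)


def _render(chunk, base):
    """Render one chunk of bytes as hex digits or printable ASCII."""
    if base == "A":
        # Printable ASCII is kept; unprintable and extended bytes become "."
        return "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    digits = chunk.hex()
    return digits.upper() if base == "H" else digits


def format_bytes_proposal(data, spec):
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unsupported format spec: {spec!r}")
    base, prefix = match["base"], match["prefix"]
    if base == "A" and prefix:
        raise ValueError('"#" only applies to the "h" and "H" formats')
    raw = bytes(data)
    # Without a chunk size, the whole value forms a single chunk.
    size = int(match["size"]) if match["size"] else max(len(raw), 1)
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    parts = [_render(raw[i:i + size], base) for i in range(0, len(raw), size)]
    if prefix:
        parts = ["0x" + part for part in parts]
    return (match["sep"] or "").join(parts)


print(format_bytes_proposal(b"abcdwxyz", "h:4"))  # 61626364:7778797a
print(format_bytes_proposal(b"xyz", "h# 1"))      # 0x78 0x79 0x7a
```

Alignment, fill and width are deliberately left out, in line with the suggestion to treat them as a post-processing step on the rendered text.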
msg292671 - (view)	Author: Gregory P. Smith (gregory.p.smith) * (Python committer)	Date: 2017-05-01 16:27
Based on the ideas thread it isn't obvious that chunk size means "per byte". I suggest either specifying the number of base digits per delimiter, or using a name other than chunk that indicates it means bytes.

If we're going to do this, it should also be done for octal formatting (the 'o' code) for consistency. Also, per the python-ideas thread, it should be available via parameters to the .hex() method on bytes/bytearray/memoryview.

I'm inclined to leave 'A' printable-ascii formatting out. Or at least consider that it could also work on unicode str.
msg292699 - (view)	Author: Eric V. Smith (eric.smith) * (Python committer)	Date: 2017-05-01 20:07
The Unix "od" command pretty much has all of the possibilities covered. https://linuxconfig.org/od-1-manual-page Although "named characters" might be going a bit far. Float, too.
msg292710 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2017-05-02 02:10
Minimalist proposal:

    def hex(self, *, bytes_per_group=None, delimiter=" "):
        """B.hex() -> string of hex digits
        B.hex(bytes_per_group=N) -> hex digits in groups separated by *delimiter*

        Create a string of hexadecimal numbers from a bytes object::

            >>> b'\xb9\x01\xef'.hex()
            'b901ef'
            >>> b'\xb9\x01\xef'.hex(bytes_per_group=1)
            'b9 01 ef'
        """

Alternatively, the grouping could be by digit rather than by byte:

    def hex(self, *, group_digits=None, delimiter=" "):
        """B.hex() -> string of hex digits
        B.hex(group_digits=N) -> hex digits in groups separated by *delimiter*

        Create a string of hexadecimal numbers from a bytes object::

            >>> b'\xb9\x01\xef'.hex()
            'b901ef'
            >>> b'\xb9\x01\xef'.hex(group_digits=2)
            'b9 01 ef'
        """

One potential advantage of the `group_digits` approach is that it could be fairly readily adapted to the hex/oct/bin builtins (although if we did that, it would make the lack of a "dec" builtin for decimal formatting a bit weird)
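The `bytes_per_group` variant can be tried out as a free function. This is a sketch under assumptions: `hex_grouped` is a hypothetical name, and it only mirrors the keyword-only signature from the docstring above rather than any method that exists on `bytes`.

```python
def hex_grouped(data, *, bytes_per_group=None, delimiter=" "):
    """Return hex digits for data, optionally split into groups of bytes."""
    raw = bytes(data)  # accepts bytes, bytearray and memoryview
    if bytes_per_group is None:
        return raw.hex()
    if bytes_per_group < 1:
        raise ValueError("bytes_per_group must be at least 1")
    # Slice the bytes first, then convert each slice to hex digits.
    groups = [
        raw[i:i + bytes_per_group].hex()
        for i in range(0, len(raw), bytes_per_group)
    ]
    return delimiter.join(groups)


print(hex_grouped(b"\xb9\x01\xef"))                     # b901ef
print(hex_grouped(b"\xb9\x01\xef", bytes_per_group=1))  # b9 01 ef
```

Grouping by digit instead would slice the hex string rather than the bytes, so `group_digits=2` corresponds to `bytes_per_group=1`.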
msg292871 - (view)	Author: Robert (mrh1997) *	Date: 2017-05-03 09:54
Regarding the proposal for mini format languages for bytes (msg292663): Wouldn't it be more consistent if the format specifiers were identical to those of int (see https://docs.python.org/3/library/string.html#format-specification-mini-language)? I.e. "X" / "x" for hex, "o" for octal, "d" for decimal, "b" for binary, "c" for character (=default). Only 'A' needs to be added for printing only ASCII characters.

Furthermore I cannot see how the format spec in http://bugs.python.org/issue22385#msg292663 ("h#,1") is more intuitive than the one in http://bugs.python.org/issue22385#msg226733 ("#,.4x"), which looks like the existing minilang.

Why does Python need a new format mini-language if the existing one provides most of the requirements? As a developer it is already hard to memorize the details of the existing minilang. Ideally I do not need to learn a similar but different one for bytes...
msg292900 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2017-05-03 13:19
Re-using an existing minilanguage to mean something completely different wouldn't be a good idea.

Whether or not we should add any bytes-specific features for this at all is also still an open question, as one of the points raised in the latest python-ideas thread is that this may be better handled as a general purpose string splitting method that breaks the string up into fixed size units, which can then be rejoined with an arbitrary delimiter. For example:

    >>> digit_groups = b'\xb9\x01\xef'.hex().splitgroups(2)
    >>> ' '.join(digit_groups)
    'b9 01 ef'
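Note that `str` has no `splitgroups` method; it is a name floated in the discussion. The same effect can be had today with a small helper, sketched here as a hypothetical free function of the same name.

```python
def splitgroups(text, size):
    """Split text into consecutive pieces of at most size characters."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [text[i:i + size] for i in range(0, len(text), size)]


digit_groups = splitgroups(b"\xb9\x01\xef".hex(), 2)
print(" ".join(digit_groups))  # b9 01 ef
```

The last piece is shorter than `size` when the length is not an exact multiple, which is one of the details a real method would have to specify.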
msg342888 - (view)	Author: Gregory P. Smith (gregory.p.smith) * (Python committer)	Date: 2019-05-20 06:31
FYI - micropython added an optional 'sep' second argument to binascii.hexlify() that is a single character separator to insert between every two hex digits.

Given the #9951 .hex() methods we have everywhere (and corresponding .fromhex), binascii.hexlify is almost a legacy API (but micropython doesn't have those methods yet). One key difference? hexlify returns the hex value as a bytes rather than a str.

Just adding a couple of parameters to the hex() method seems fine: a separator string and a number of bytes to separate.

Yet another minilanguage would be overkill, and confusing in the face of the existing numeric formatting mini-language's ability to insert , or _ separators (with _ placed every four digits for hex, a la f'{value:_x}').
msg343527 - (view)	Author: Gregory P. Smith (gregory.p.smith) * (Python committer)	Date: 2019-05-26 00:57
Given that we have f-strings, I don't think a format mini language makes as much sense. My PR adds support for separators to the .hex() methods (and to binascii.hexlify) via a parameter. This extends beyond what MicroPython already does in its binascii implementation (a single sep parameter).
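For reference, the separator support that came out of this issue is documented for `bytes.hex()` and `binascii.hexlify()` from Python 3.8 onward, as `sep` and `bytes_per_sep` arguments. The short example below assumes Python 3.8 or later; the variable names are only for illustration.

```python
import binascii

data = b"\xb9\x01\xef"

plain = data.hex()                 # 'b901ef'
spaced = data.hex(" ")             # 'b9 01 ef'
# A positive bytes_per_sep counts groups from the right ...
from_right = data.hex("_", 2)      # 'b9_01ef'
# ... and a negative value counts them from the left.
from_left = data.hex("_", -2)      # 'b901_ef'

# binascii.hexlify() takes the same arguments but returns bytes, not str.
hexlified = binascii.hexlify(data, "-")  # b'b9-01-ef'

# The existing integer mini-language groups hex digits in fours with "_".
underscored = f"{0x12345678:_x}"   # '1234_5678'

print(spaced, from_right, from_left, hexlified, underscored)
```

`bytearray.hex()` and `memoryview.hex()` accept the same two arguments.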
msg343910 - (view)	Author: Gregory P. Smith (gregory.p.smith) * (Python committer)	Date: 2019-05-29 18:47
New changeset 0c2f9305640f7655ba0cd5f478948b2763b376b3 by Gregory P. Smith in branch 'master': bpo-22385: Support output separators in hex methods. (#13578) https://github.com/python/cpython/commit/0c2f9305640f7655ba0cd5f478948b2763b376b3
msg343966 - (view)	Author: Karthikeyan Singaravelan (xtreak) * (Python committer)	Date: 2019-05-30 10:32
This change seems to have created some compile-time warnings: https://buildbot.python.org/all/#/builders/103/builds/2544/steps/3/logs/warnings__6_

Python/pystrhex.c:18:45: warning: passing argument 1 of ‘PyObject_Size’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
Python/pystrhex.c:60:27: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘Py_ssize_t’ {aka ‘const int’} [-Wsign-compare]
Python/pystrhex.c:90:29: warning: ‘sep_char’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Python/pystrhex.c:90:29: warning: ‘sep_char’ may be used uninitialized in this function [-Wmaybe-uninitialized]
msg343987 - (view)	Author: Gregory P. Smith (gregory.p.smith) * (Python committer)	Date: 2019-05-30 17:48
Thanks, I'll take care of them.

History
Date	User	Action	Args
2022-04-11 14:58:07	admin	set	github: 66579
2019-10-21 03:21:09	gregory.p.smith	set	status: open -> closed stage: patch review -> commit review resolution: fixed versions: + Python 3.9, - Python 3.8
2019-05-30 17:48:10	gregory.p.smith	set	messages: + msg343987
2019-05-30 10:32:55	xtreak	set	nosy: + xtreak messages: + msg343966
2019-05-29 18:47:04	gregory.p.smith	set	messages: + msg343910
2019-05-26 00:57:40	gregory.p.smith	set	messages: + msg343527
2019-05-26 00:54:49	gregory.p.smith	set	keywords: + patch stage: needs patch -> patch review pull_requests: + pull_request13486
2019-05-20 06:31:33	gregory.p.smith	set	type: enhancement stage: needs patch messages: + msg342888 versions: + Python 3.8, - Python 3.7
2018-06-09 11:03:02	ncoghlan	unlink	issue22555 dependencies
2017-05-03 14:32:26	vstinner	set	nosy: - vstinner
2017-05-03 13:19:04	ncoghlan	set	messages: + msg292900
2017-05-03 09:54:27	mrh1997	set	nosy: + mrh1997 messages: + msg292871
2017-05-02 02:10:58	ncoghlan	set	messages: + msg292710
2017-05-01 20:07:05	eric.smith	set	messages: + msg292699
2017-05-01 16:27:11	gregory.p.smith	set	nosy: + gregory.p.smith messages: + msg292671 versions: + Python 3.7, - Python 3.6
2017-05-01 13:34:12	ncoghlan	set	messages: + msg292663
2017-02-02 22:23:19	Christian H	set	nosy: + Christian H
2015-06-04 05:41:01	gotgenes	set	nosy: + gotgenes
2015-05-12 05:00:00	ncoghlan	set	messages: + msg242941 versions: + Python 3.6, - Python 3.5
2014-10-05 04:09:50	ncoghlan	link	issue22555 dependencies
2014-09-23 11:44:35	barry	set	nosy: + barry
2014-09-19 23:23:14	belopolsky	set	nosy: + belopolsky
2014-09-17 11:01:06	ncoghlan	set	messages: + msg226993
2014-09-17 10:56:55	ncoghlan	set	dependencies: + introduce bytes.hex method (also for bytearray and memoryview) messages: + msg226992 title: Allow 'x' and 'X' to accept bytes-like objects in string formatting -> Define a binary output formatting mini-language for *.hex()
2014-09-11 09:44:58	Arfrever	set	nosy: + Arfrever
2014-09-11 07:34:35	eric.smith	set	messages: + msg226749
2014-09-11 07:25:21	vstinner	set	nosy: + vstinner messages: + msg226748
2014-09-11 07:23:31	eric.smith	set	messages: + msg226746
2014-09-11 07:12:17	ned.deily	set	nosy: + eric.smith
2014-09-10 23:57:42	ncoghlan	set	title: Allow 'x' and 'X' to accept bytes objects in string formatting -> Allow 'x' and 'X' to accept bytes-like objects in string formatting
2014-09-10 23:55:01	ncoghlan	create
