[Python-3000] More PEP 3101 changes incoming
Ron Adam
rrr at ronadam.com
Sun Aug 5 11:57:25 CEST 2007
Talin wrote:
> Ron Adam wrote:
>> Ron Adam wrote:
>>> An alternative I thought of this morning is to reuse the alignment
>>> symbols '^', '+', and '-' and require a minimum width if a maximum
>>> width is specified.
>> One more (or two) additions to this...
> (snipped)
> I've kind of lost track of what the proposal is at this specific point.
> I like several of the ideas you have proposed, but I think it needs to
> be slimmed down even more.
I put in a lot of implementation details, so it may seem heavier than it
really is.
> I don't have a particular syntax in mind - yet - but I can tell you what
> I would like to see in general.
> Guido used the term "mini-language" to describe the conversion specifier
> syntax. I think that's a good term, because it implies that it's not
> just a set of isolated properties, but rather a grammar where the
> arrangement and ordering of things matters.
I agree, a mini-language also implies a richness that a simple option list
doesn't have.
> Like real human languages, it has a "Huffman-coding" property, where the
> most commonly-uttered phrases are the shortest. This conciseness is
> achieved by sacrificing some degree of orthogonality (in the same way
> that a CISC machine instruction is shorter than an equivalent RISC
> instruction.) In practical terms it means that the interpretation of a
> symbol depends on what comes before it.
Sounds good.
> So in general common cases should be short, uncommon cases should be
> possible. And we don't have to allow every possible combination of
> options, just the ones that are most important.
I figured some of what I suggested would be vetoed, but included them in
case they are desirable. It's not always easy to know beforehand how the
community, or Guido, ;-) is going to respond to any suggestion.
> Another thing I want to point out is that Guido and I (in a private
> discussion) have resolved our argument about the role of __format__.
> Well, not so much *agreed* I guess, more like I capitulated.
Refer to the message in this thread where I discuss the difference between
concrete and abstract format specifiers. I think this is basically where
you and Guido are differing on these issues. I got the impression you
prefer the more abstract interpretation and Guido prefers a more
traditional interpretation. We can have both as long as they are well
defined and documented as being one or the other. It's when we try to make
one format specifier have both qualities at different times that it gets messy.
Here's how the apply_format function could look. We may not be in as much
disagreement as you think.
    def apply_format(value, format_spec):
        abstract = False
        type = format_spec[0]
        if type in 'rtgd':
            abstract = True
            if type == 'r':      # abstract repr
                value = repr(value)
            elif type == 't':    # abstract text
                value = str(value)
            elif type == 'g':    # abstract float
                value = float(value)
            else:                # 'd', abstract int
                value = int(value)
        return value.__format__(format_spec, abstract)
The above abstract types use duck typing to convert to concrete types
before calling the returned type's __format__ method. There aren't that many
abstract types needed. We only need a few to cover the most common cases.
That's it. It's up to each type's __format__ method to figure out things
from there. They can look at the original type spec passed to them and
handle special cases if need be.
If the abstract flag is False and the format_spec type doesn't match the
type of the __format__ method's class, then an exception can be raised.
This offers a wider range of strictness/leniency to string formatting.
There are cases where you may want either.
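
The strict/lenient split described above can be tried out with a small runnable sketch. It rests on assumptions: built-in types only accept a single-argument __format__, so the two-argument call is modelled by a hypothetical helper, type_format(), and the strict spec letters 's', 'i' and 'f' are invented here purely for illustration. Option parsing (width, precision) is left out.

```python
# Sketch only. Built-in types take a single-argument __format__, so the
# proposed value.__format__(format_spec, abstract) call is modelled here by
# the hypothetical helper type_format().

# Abstract spec letters from the message, mapped to their conversions.
ABSTRACT = {'r': repr, 't': str, 'g': float, 'd': int}

# Strict (concrete) spec letters: invented for this sketch.
CONCRETE = {'s': str, 'i': int, 'f': float}


def type_format(value, format_spec, abstract):
    """Stand-in for the type's own __format__(format_spec, abstract)."""
    expected = CONCRETE.get(format_spec[:1])
    if not abstract and expected is not None and not isinstance(value, expected):
        raise TypeError("spec %r does not match %s"
                        % (format_spec, type(value).__name__))
    return str(value)  # width, precision and so on are left out


def apply_format(value, format_spec):
    kind = format_spec[:1]
    abstract = kind in ABSTRACT
    if abstract:
        value = ABSTRACT[kind](value)  # duck-typed conversion
    return type_format(value, format_spec, abstract)
```

Under these assumptions, apply_format(3.7, 'd') converts first and succeeds, while apply_format('3', 'i') raises TypeError because the strict spec does not match the value's type.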
> But in any case, the deal is that int, float, and decimal all get to
> have a __format__ method which interprets the format string for those
> types.
Good, +1
> There is no longer any automatic coercion of types based on the
> format string
Ever? This seems to contradict below where you say int needs to handle
float, and float needs to handle int. Can you explain further?
> - so simply defining an __int__ method for a type is
> insufficient if you want to use the 'd' format type. Instead, if you
> want to use 'd' you can simply write the following:
>     class MyClass:
>         def __format__(self, spec):
>             return int(self).__format__(spec)
So if an item has an __int__ method, but not a __format__ method, and you
tried to print it with a 'd' format type, it would raise an exception?
From your descriptions elsewhere in this reply it sounds like it would
fall back to string output. Or am I missing something?
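
The delegation pattern quoted above can be sketched as follows. Celsius is a made-up class, and the spec strings ('d', '5d') are the ones understood by the built-in format() in a released Python, which is not necessarily the syntax being discussed in this thread.

```python
class Celsius:
    """Hypothetical int-convertible type, used only for illustration."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __int__(self):
        return int(self.degrees)

    def __format__(self, spec):
        # Convert explicitly, then let int interpret the spec.
        return int(self).__format__(spec)


temperature = Celsius(21.6)
plain = format(temperature, 'd')         # int(21.6) is 21, so this is '21'
padded = '{0:5d}'.format(temperature)    # right-aligned in a width of 5
```

The conversion is spelled out in the class itself, so nothing is coerced behind the caller's back.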
> This at least has the advantage of simplifying the problem quite a bit.
> The global 'format(value, spec)' function now just does:
> 1) check for the 'repr' override, if present return repr(val)
> 2) call val.__format__(spec) if it exists
> 3) call str(val).__format__(spec)
The repr override is the same as in the above function, except in the above
example any options after the 'r' would be interpreted by the string
__format__ method.
Since there aren't any string-specific options yet... it can just be
returned early as in #1 here, but if options are added to the string type,
that could be changed to forward the format_spec to the string __format__
method.
Number two is the same also.
Number three could be the same... Just put the __format__() in a
try/except and call str(value) on the exception.
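
A minimal sketch of those three steps with the try/except fallback, under stated assumptions: format_value is a hypothetical name, the repr override is taken to be a spec that starts with 'r' (as in apply_format above, not the '{0,r}' form), and the spec strings are those of a released Python.

```python
def format_value(val, spec):
    # 1) repr override (assumed here to be a leading 'r' in the spec)
    if spec.startswith('r'):
        return repr(val)
    # 2) let the value's own __format__ interpret the spec
    try:
        return val.__format__(spec)
    except (AttributeError, TypeError, ValueError):
        # 3) fall back to formatting the value's string form
        return str(val).__format__(spec)


class Plain:
    """Hypothetical type that defines __str__ but no __format__."""

    def __str__(self):
        return 'plain'
```

With this, format_value(3, '>4') is handled by int, while format_value(Plain(), '>7') ends up being formatted as a string.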
It sounds like we may be getting hung up on interpretation rather than a
real difference.
> Note that this also means that float.__format__ will have to handle 'd'
> and int.__format__ will handle 'f', and so on, although this can be done
> by explicit type conversion in the __format__ method. (No need for float
> to handle 'x' and the like, even though it does work with %-formatting
> today.)
This happens in my example above in the case of the 'g' and 'd' type
specifiers, but I'm not sure when it happens in your description if no
conversions are made?
>> One other feature might be to use the fill syntax form to specify an
>> overflow replacement character...
>>     '{0:10+10/#}'.format('Python') -> 'Python '
>>     '{0:10+10/#}'.format('To be, or not to be.') -> '##########'
> Yeah, as Guido pointed out in another message that's not going to fly.
This one was just a see-if-it-flies suggestion. It apparently didn't,
unless a bunch of people all of a sudden say they have actual and valid use
cases for it that make sense.
Sometimes you just have to punt and see what happens. ;-)
> A few minor points on syntax of the minilanguage:
> -- I like your idea that :xxxx and ,yyyy can occur in any order.
>
> -- I'm leaning towards the .Net conversion spec syntax convention where
> the type letter comes first: ':f10'. The idea being that the first
> letter changes the interpretation of subsequent letters.
>
> Note that in the .Net case, the numeric quantity after the letter
> represents a *precision* specifier, not a min/max field width.
I agree with these points of course.
> So for example, in .Net having a float field of minimum width 10 and a
> decimal precision of 3 digits would be ':f3,10'.
It looks ok to me, but there may be some cases where it could be ambiguous.
How would you specify leading 0's? Or would we do that in the alignment
specifier?
{0:f3,-10/0} '000123.000'
> Now, as stated above, there's no 'max field width' for any data type
> except strings. So in the case of strings, we can re-use the precision
> specifier just like C printf does: ':s10' to limit the string to 10
> characters. So 's:10,5' to indicate a max width of 10, min width of 5.
I'm sure you meant '{0:s10,5}' here.
What happens if the string is too long? Does it always cut the left side
off? Or do we use '+', '-' and '^' here too?
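
To make the question concrete, here is a rough sketch of one possible answer, where the alignment symbol also decides which side is cut. The helper name fit and the meanings assigned to the symbols ('-' left, '+' right, '^' centre) are assumptions made for illustration, not settled syntax.

```python
def fit(text, min_width, max_width, align='-'):
    """Hypothetical helper: truncate to max_width, then pad to min_width."""
    if len(text) > max_width:
        excess = len(text) - max_width
        if align == '-':        # left-aligned: keep the start
            text = text[:max_width]
        elif align == '+':      # right-aligned: keep the end
            text = text[excess:]
        else:                   # '^': trim both ends
            start = excess // 2
            text = text[start:start + max_width]
    if align == '-':
        return text.ljust(min_width)
    if align == '+':
        return text.rjust(min_width)
    return text.center(min_width)
```

For example, fit('To be, or not to be.', 5, 10, '-') keeps 'To be, or ', while the same call with '+' keeps 'not to be.'.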
> -- There's no decimal precision quantity for any data type except
> floats. So ':d10' doesn't mean anything I think, but ':d,10' is minimum
> 10 digits.
This is fine... The maximum value is optional, so this works in my examples
as well. If there aren't enough cases where specifying a maximum width is
useful I'm ok with not having it.
The reason I prefer it on the alignment side is that it applies to all cases
equally. A consistency I prefer, but maybe not one that's needed.
> -- I don't have an opinion yet on where the other stuff (sign options,
> padding, alignment) should go, except that sign should go next to the
> type letter, while the rest should go after the comma.
I think I agree here.
> -- For the 'repr' override, Guido suggests putting 'r' in the alignment
> field: '{0,r}'. How that mixes with alignment and padding is unknown,
> although frankly why anyone would want to pad and align a repr() is
> completely beyond me.
Sometimes it's handy for formatting a variable repr output in columns.
Mostly for debugging, learning exercises, or documentation purposes.
Since there is no actual Repr type, it may seem like it shouldn't be a type
specifier. But if you consider it as an indirect string type, an abstract type
that converts to string type, the idea and implementation work fine and it
can then forward its type specifier to the string's __format__ method. (or
not)
The exact behavior can be flexible.
To me there is an underlying consistency with grouping abstract/indirect
types with more concrete types rather than making an exception in the
field alignment specifier.
Moving repr to the format side sort of breaks the original clean idea of
having a field alignment specifier and separate type format specifiers.
I think if we continue to sort out the detailed behaviors of the underlying
implementation, the best overall solution will sort itself out. Good and
complete example test cases will help too.
I think we actually agree on quite a lot so far. :-)
Cheers,
Ron