[Python-3000] More PEP 3101 changes incoming

Sun Aug 5 11:57:25 CEST 2007

Talin wrote:
> Ron Adam wrote:
>
>> Ron Adam wrote:
>>
>>> An alternative I thought of this morning is to reuse the alignment
>>> symbols '^', '+', and '-' and require a minimum width if a maximum
>>> width is specified.
>>
>> One more (or two) additions to this...
>
> (snipped)
>
> I've kind of lost track of what the proposal is at this specific point.
> I like several of the ideas you have proposed, but I think it needs to
> be slimmed down even more.

I put in a lot of implementation details, so it may seem heavier than it 
really is.
> I don't have a particular syntax in mind - yet - but I can tell you what 
> I would like to see in general.
>
> Guido used the term "mini-language" to describe the conversion specifier 
> syntax. I think that's a good term, because it implies that it's not 
> just a set of isolated properties, but rather a grammar where the 
> arrangement and ordering of things matters.

I agree; a mini-language also implies a richness that a simple option list 
doesn't have.
> Like real human languages, it has a "Huffman-coding" property, where the 
> most commonly-uttered phrases are the shortest. This conciseness is 
> achieved by sacrificing some degree of orthogonality (in the same way 
> that a CISC machine instruction is shorter than an equivalent RISC 
> instruction.) In practical terms it means that the interpretation of a 
> symbol depends on what comes before it.

Sounds good.
> So in general common cases should be short, uncommon cases should be 
> possible. And we don't have to allow every possible combination of 
> options, just the ones that are most important.

I figured some of what I suggested would be vetoed, but I included it in 
case it was desirable. It's not always easy to know beforehand how the 
community, or Guido, ;-) is going to respond to any suggestion.
> Another thing I want to point out is that Guido and I (in a private 
> discussion) have resolved our argument about the role of __format__. 
> Well, not so much *agreed* I guess, more like I capitulated.

Refer to the message in this thread where I discuss the difference between 
concrete and abstract format specifiers. I think this is basically where 
you and Guido are differing on these issues. I got the impression you 
prefer the more abstract interpretation and Guido prefers a more 
traditional interpretation. We can have both as long as they are well 
defined and documented as being one or the other. It's when we try to make 
one format specifier have both qualities at different times that it gets messy.

Here's how the apply_format function could look; we may not be in as much 
disagreement as you think.

    def apply_format(value, format_spec):
        abstract = False
        type_char = format_spec[0]
        if type_char in 'rtgd':
            abstract = True
            if type_char == 'r':      # abstract repr
                value = repr(value)
            elif type_char == 't':    # abstract text
                value = str(value)
            elif type_char == 'g':    # abstract float
                value = float(value)
            elif type_char == 'd':    # abstract int
                value = int(value)
        return value.__format__(format_spec, abstract)

The above abstract types use duck typing to convert to concrete types 
before calling the returned type's __format__ method. There aren't that many 
abstract types needed. We only need a few to cover the most common cases.

That's it. It's up to each type's __format__ method to figure out things 
from there. They can look at the original type spec passed to them and 
handle special cases if need be.

If the abstract flag is False and the format_spec type doesn't match the 
type of the __format__ method's class, then an exception can be raised. 
This offers a wider range of strictness/leniency to string formatting. 
There are cases where you may want either.
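
To make the strict/lenient split concrete, here is a rough runnable sketch. 
It is only an illustration under assumptions: the built-in types take a 
single spec argument, so the hypothetical helper typed_format() stands in 
for the proposed two-argument __format__, the concrete letters 's', 'f' 
and 'x' are my own picks, and only a bare type letter is handled.

```python
# Abstract letters coerce the value first (the 'rtgd' set from above).
ABSTRACT = {'r': repr, 't': str, 'g': float, 'd': int}
# Concrete letters demand a matching type (assumed set, for illustration).
CONCRETE = {'s': str, 'f': float, 'x': int}

def typed_format(value, format_spec, abstract):
    """Hypothetical stand-in for value.__format__(format_spec, abstract)."""
    letter = format_spec[0]
    if not abstract and not isinstance(value, CONCRETE[letter]):
        # Strict mode: the type letter must match the value's own type.
        raise TypeError('%r needs %s, got %s' % (
            letter, CONCRETE[letter].__name__, type(value).__name__))
    # Strings take no type letter here; numbers reuse the bare letter.
    return format(value, '' if letter in 'rts' else letter)

def apply_format(value, format_spec):
    letter = format_spec[0]
    abstract = letter in ABSTRACT
    if abstract:
        value = ABSTRACT[letter](value)   # duck-typed conversion
    return typed_format(value, format_spec, abstract)

print(apply_format('3', 'd'))    # lenient: the string is converted to int
print(apply_format(1.5, 'f'))    # strict, and the type matches
```

With this sketch apply_format(3, 'f') raises TypeError, because 'f' is 
treated as concrete and 3 is not a float.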
> But in any case, the deal is that int, float, and decimal all get to 
> have a __format__ method which interprets the format string for those 
> types.

Good, +1
> There is no longer any automatic coercion of types based on the 
> format string

Ever? This seems to contradict below where you say int needs to handle 
float, and float needs to handle int. Can you explain further?
> - so simply defining an __int__ method for a type is
> insufficient if you want to use the 'd' format type. Instead, if you
> want to use 'd' you can simply write the following:
>
>     class MyClass:
>         def __format__(self, spec):
>             return int(self).__format__(spec)

So if an item has an __int__ method, but not a __format__ method, and you 
tried to print it with a 'd' format type, it would raise an exception?
From your descriptions elsewhere in this reply it sounds like it would 
fall back to string output. Or am I missing something?
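
For concreteness, the delegation pattern quoted above could be exercised 
roughly like this. It is a sketch only: Celsius is a made-up class, and it 
assumes a format() builtin that dispatches to __format__ and an int type 
that understands a bare 'd', as described in the quoted text.

```python
class Celsius:
    """Hypothetical type with __int__ that delegates its formatting."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __int__(self):
        return int(self.degrees)

    def __format__(self, spec):
        # Convert explicitly, then let int interpret the spec.
        return int(self).__format__(spec)

print(format(Celsius(21.7), 'd'))
```

It does not answer what happens when __format__ is missing; it only shows 
the explicit route.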
> This at least has the advantage of simplifying the problem quite a bit. 
> The global 'format(value, spec)' function now just does:
>
> 1) check for the 'repr' override, if present return repr(val)
> 2) call val.__format__(spec) if it exists
> 3) call str(val).__format__(spec)

The repr override is the same as in the above function, except in the above 
example any options after the 'r' would be interpreted by the string 
__format__ method.
Since there aren't any string-specific options yet, it can just be 
returned early as in #1 here, but if options are added to the string type, 
that could be changed to forward the format_spec to the string __format__ 
method.

Number two is the same also.

Number three could be the same: just put the __format__() call in a 
try/except and call str(value) on the exception.
It sounds like we may be getting hung up on interpretation rather than a 
real difference.
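
A minimal sketch of those three steps with the try/except variant, under 
assumptions: how the repr override is signalled is still undecided, so a 
hypothetical use_repr flag stands in for it, and the function is named 
format_value to avoid clashing with any builtin.

```python
def format_value(value, spec, use_repr=False):
    # 1) repr override: return early, no further options applied.
    if use_repr:
        return repr(value)
    # 2) let the value's own __format__ interpret the spec.
    try:
        return value.__format__(spec)
    except (AttributeError, TypeError, ValueError):
        # 3) fall back to formatting the string form instead.
        return str(value).__format__(spec)

class Plain:
    """Made-up type with __str__ but no __format__ of its own."""
    def __str__(self):
        return 'plain'

print(format_value(42, '>6'))        # handled by int.__format__
print(format_value(Plain(), '>8'))   # falls back to str formatting
```

Note that the fallback only helps when the spec also makes sense for a 
string; a numeric type letter would still fail in step 3.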
> Note that this also means that float.__format__ will have to handle 'd' 
> and int.__format__ will handle 'f', and so on, although this can be done 
> by explicit type conversion in the __format__ method. (No need for float 
> to handle 'x' and the like, even though it does work with %-formatting 
> today.)

This happens in my example above in the case of 'g' and 'd' types 
specifiers, but I'm not sure when it happens in your description if no 
conversions are made?
>> One other feature might be to use the fill syntax form to specify an
>> overflow replacement character...
>>
>>   '{0:10+10/#}'.format('Python') -> 'Python    '
>>   '{0:10+10/#}'.format('To be, or not to be.') -> '##########'
>
> Yeah, as Guido pointed out in another message that's not going to fly.

This one was just a see-if-it-flies suggestion. It apparently didn't, 
unless a bunch of people all of a sudden say they have actual and valid use 
cases for it that make sense.

Sometimes you just have to punt and see what happens. ;-)
> A few minor points on syntax of the minilanguage:
>
> -- I like your idea that :xxxx and ,yyyy can occur in any order.
>
> -- I'm leaning towards the .Net conversion spec syntax convention where
> the type letter comes first: ':f10'. The idea being that the first
> letter changes the interpretation of subsequent letters.
>
> Note that in the .Net case, the numeric quantity after the letter
> represents a *precision* specifier, not a min/max field width.

I agree with these points of course.
> So for example, in .Net having a float field of minimum width 10 and a 
> decimal precision of 3 digits would be ':f3,10'.

It looks ok to me, but there may be some cases where it could be ambiguous. 
How would you specify leading 0's? Or would we do that in the alignment 
specifier?
 {0:f3,-10/0} '000123.000'
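
To check that such a type-first spec can be parsed unambiguously, here is a 
rough sketch. Everything in it is an assumption for illustration: the 
grammar TYPE [PRECISION] [',' [ALIGN] WIDTH ['/' FILL]], the reading of '-' 
as "pad on the left" (inferred from the example above), the default 
alignment, and the name format_type_first.

```python
import re

# Assumed grammar: type letter, optional precision, then an optional
# ",[align]width[/fill]" alignment part.
_SPEC = re.compile(
    r'(?P<type>[a-z])(?P<prec>\d+)?'
    r'(?:,(?P<align>[-+^])?(?P<width>\d+)(?:/(?P<fill>.))?)?$')

# Assumption: '-' pads on the left, '+' pads on the right, '^' centers.
_ALIGN = {'-': '>', '+': '<', '^': '^', None: '>'}

def format_type_first(value, spec):
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError('cannot parse spec %r' % spec)
    kind, prec = m.group('type'), m.group('prec')
    # Step 1: type and precision produce the bare text.
    text = format(value, kind if prec is None else '.%s%s' % (prec, kind))
    if m.group('width') is None:
        return text
    # Step 2: the alignment part pads that text to the minimum width.
    fill = m.group('fill') or ' '
    return format(text, fill + _ALIGN[m.group('align')] + m.group('width'))

print(format_type_first(123, 'f3,-10/0'))
print(format_type_first(3.14159, 'f3,10'))
```

Keeping the type part and the alignment part as two separate steps is what 
lets the fill character supply the leading zeros.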
> Now, as stated above, there's no 'max field width' for any data type 
> except strings. So in the case of strings, we can re-use the precision 
> specifier just like C printf does: ':s10' to limit the string to 10 
> characters. So 's:10,5' to indicate a max width of 10, min width of 5.

I'm sure you meant '{0:s10,5}' here.

What happens if the string is too long? Does it always cut the left side 
off? Or do we use '+', '-' and '^' here too?
> -- There's no decimal precision quantity for any data type except 
> floats. So ':d10' doesn't mean anything I think, but ':d,10' is minimum 
> 10 digits.

This is fine... The maximum value is optional, so this works in my examples 
as well. If there's not enough cases where specifying a maximum width is 
useful I'm ok with not having it.
The reason I prefer it in the alignment side, is it applies to all cases 
equally. A consistency I prefer, but maybe not one that's needed.
> -- I don't have an opinion yet on where the other stuff (sign options, 
> padding, alignment) should go, except that sign should go next to the 
> type letter, while the rest should go after the comma.

I think I agree here.
> -- For the 'repr' override, Guido suggests putting 'r' in the alignment 
> field: '{0,r}'. How that mixes with alignment and padding is unknown, 
> although frankly why anyone would want to pad and align a repr() is 
> completely beyond me.

Sometimes it's handy for formatting a variable repr output in columns. 
Mostly for debugging, learning exercises, or documentation purposes.

Since there is no actual Repr type, it may seem like it shouldn't be a type 
specifier. But if you consider it as an indirect string type, an abstract 
type that converts to the string type, the idea and implementation work 
fine, and it can then forward its type specifier to the string's 
__format__ method (or not).

The exact behavior can be flexible.

To me there is an underlying consistency with grouping abstract/indirect 
types with more concrete types rather than making an exception in the 
field alignment specifier.

Moving repr to the format side sort of breaks the original clean idea of 
having a field alignment specifier and separate type format specifiers.

I think if we continue to sort out the detailed behaviors of the underlying 
implementation, the best overall solution will sort itself out. Good and 
complete example test cases will help too.

I think we actually agree on quite a lot so far. :-)
Cheers,
 Ron