[Python-Dev] The Trick (was Re: copysort patch, was Re: inline sort option)

Sat Oct 18 08:26:39 EDT 2003

Wondering about the trick of copysort not copying a singly-referenced list I 
decided to try it out in a tiny extension module, and, yes, it is just as 
trivial as one might wish (haven't dealt with optional args to sort, just 
wanting to check performance &c):
static PyObject*
copysort(PyObject* self, PyObject* args)
{
 PyObject *sequence, *listresult;
 if(!PyArg_ParseTuple(args, "O", &sequence))
 return 0;
 if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) {
 listresult = sequence;
 Py_INCREF(listresult);
 } else {
 listresult = PySequence_List(sequence);
 }
 if(listresult) {
 if(PyList_Sort(listresult) == -1) {
 Py_DECREF(listresult);
 listresult = 0;
 }
 }
 return listresult;
}
and performance on an equally trivial testcase:
x = dict.fromkeys(range(99999))
def looponsorted1(x):
 keys = x.keys()
 keys.sort()
 for k in keys: pass
def looponsorted2(x, c=copysort.copysort):
 for k in c(x.keys()): pass
turns out to be identical between the two _with_ The Trick (4.4e+04 usec with 
timeit.py -c on my box) while without The Trick copysort would slow down to
about 5.5e+04 usec.
But, this reminds me -- function filter, in bltinmodule.c, uses just about the 
same trick too (to modify in-place when possible rather than making a new 
list -- even though when it does make a new list it's an empty one, not a
copy, so the gain is less). There must be other cases of applicability which
just haven't been considered. So...
Shouldn't The Trick be embodied in PySequence_List itself...? So, the whole 
small tricky part above:
 if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) {
 listresult = sequence;
 Py_INCREF(listresult);
 } else {
 listresult = PySequence_List(sequence);
 }
would collapse to a single PySequence_List call -- *AND* potential calls from
Python code such as "x=list(somedict.keys())" might also be speeded up 
analogously...
[Such a call looks silly when shown like this, but in some cases one might not 
know, in polymorphic use, whether a method returns a new or potentially 
shared list, or other sequence, and a call to list() on the method's result 
then may be needed to ensure the right semantics in all cases].
Is there any hidden trap in The Trick that makes it unadvisable to insert it 
in PySequence_List? Can't think of any, but I'm sure y'all will let me know 
ASAP what if anything I have overlooked...;-).
One might even be tempted to reach down all the way to PyList_GetSlice, 
placing THERE The Trick in cases of all-list slicing of a singly-referenced 
list (PyList_GetSlice is what PySequence_List uses, so it would also get the 
benefit), but that might be overdoing it -- and encouraging list(xxx) instead
of xxx[:], by making the former a bit faster in some cases, would be no bad 
thing IMHO (already I'm teaching newbies to prefer using list(...) rather 
than ...[:] strictly for legibility and clarity, being able to mention 
possible future performance benefits might well reinforce the habit...;-).
Alex