Say you want to optimize a (byte) string compare intensive algorithm implemented in Python. Since a central code path contains this sequence of statements
if s < t:
# less than ...
elif t < s:
# greater than ...
else:
# equal ...
it would be great to optimize it to something like
r = bytes_compare(s, t)
if r < 0:
# less than ...
elif r > 0:
# greater than ...
else:
# equal ...
where (the hypothetical) bytes_compare()
ideally would just call the three-way comparison C function memcmp()
which is usually quite well optimized. This would reduce the number of string comparisons by half. A very feasible optimization unless the strings are ultra short.
But how to get there with Python 3?
PS:
Python 3 has removed the three way comparison global function cmp()
and the magic method __cmp__()
. And even with Python 2, the bytes
class doesn't had a __cmp__()
member.
With the ctypes
package it's straight forward to call memcmp()
but the foreign function call overhead with ctypes
is prohibitively high.
1 Answer 1
Python 3 (including 3.6) simply doesn't include any three-way comparison support for strings. Although the internal implementation of the rich comparison operator __lt__()
, __eq__()
etc. do call memcmp()
(in the C implementation of bytes
- cf. Objects/bytesobject.c
) there is no internal three-way comparison function that could be leveraged.
Thus, writing a C extension that provides a three-way comparison function by calling memcmp()
is the next best thing:
#include <Python.h>
static PyObject* cmp(PyObject* self, PyObject* args) {
PyObject *a = 0, *b = 0;
if (!PyArg_UnpackTuple(args, "cmp", 2, 2, &a, &b))
return 0;
if (!PyBytes_Check(a) || !PyBytes_Check(b)) {
PyErr_SetString(PyExc_TypeError, "only bytes() strings supported");
return 0;
}
Py_ssize_t n = PyBytes_GET_SIZE(a), m = PyBytes_GET_SIZE(b);
char *s = PyBytes_AsString(a), *t = PyBytes_AsString(b);
int r = 0;
if (n == m) {
r = memcmp(s, t, n);
} else if (n < m) {
r = memcmp(s, t, n);
if (!r)
r = -1;
} else {
r = memcmp(s, t, m);
if (!r)
r = 1;
}
return PyLong_FromLong(r);
}
static PyMethodDef bytes_util_methods[] = {
{ "cmp", cmp, METH_VARARGS, "Three way compare 2 bytes() objects." },
{0,0,0,0} };
static struct PyModuleDef bytes_util_def = {
PyModuleDef_HEAD_INIT, "bytes_util", "Three way comparison for strings.",
-1, bytes_util_methods };
PyMODINIT_FUNC PyInit_bytes_util(void) {
Py_Initialize();
return PyModule_Create(&bytes_util_def);
}
Compile with:
gcc -Wall -O3 -fPIC -shared bytes_util.c -o bytes_util.so -I/usr/include/python3.6m
Test:
>>> import bytes_util
>>> bytes_util.cmp(b'foo', b'barx')
265725
In contrast to calling memcmp
via the ctypes
package, this foreign call has the same overhead as the builtin bytes comparison operators (as they also are implemented as C extension with the standard Python version).
1 Comment
PyBytes
APIs with the more general buffer interface (PyObject_GetBuffer
with flags
PyBUF_SIMPLE
) would allow this to work seamlessly with bytearray
, mmap
, array.array('B')
, and all manner of other bytes
-like objects. You'd just declare a pair of Py_buffer
structures on the stack, populate them w/PyObject_GetBuffer
and your inputs, then use buf.buf
and buf.len
where you currently use s/t
and m/n
.
s < t
comparison implemented atm, given thats
andt
arebytes
?s
witht
followed byt
withs
). Note that this pattern isn't unusual - for example, you also have this when you merge/join 2 sorted sequences of strings. Also, Python string comparison aren't slow in general - if you just need one comparison outcome for a pairs
andt
then they are fast, as is. Thus, this really isn't about isolated microbenchmarking.Objects/bytesobject.c
and it callsmemcmp()
inbytes_compare_eq()
andbytes_richcompare()
(for<
or>
).cmp()
in python 3, the suggested way is to do:((a > b) - (b > a))
. This will give you ther
but ... it is still 2 comparisons. I have tried a python-based implementation ofbytes_compare
... very bad idea :) at least 3x slower