8

Say you want to optimize a (byte) string compare intensive algorithm implemented in Python. Since a central code path contains this sequence of statements

if s < t:
 # less than ...
elif t < s:
 # greater than ...
else:
 # equal ...

it would be great to optimize it to something like

r = bytes_compare(s, t)
if r < 0:
 # less than ...
elif r > 0:
 # greater than ...
else:
 # equal ...

where (the hypothetical) bytes_compare() ideally would just call the three-way comparison C function memcmp() which is usually quite well optimized. This would reduce the number of string comparisons by half. A very feasible optimization unless the strings are ultra short.

But how to get there with Python 3?

PS:

Python 3 has removed the three way comparison global function cmp() and the magic method __cmp__(). And even with Python 2, the bytes class doesn't had a __cmp__() member.

With the ctypes package it's straight forward to call memcmp() but the foreign function call overhead with ctypes is prohibitively high.

asked Jun 10, 2018 at 9:32
5
  • It sounds like you're doing string comparisons in a loop in Python. That will always be slow. You'd be better off engineering a proper fast solution using related tools like Cython, Pandas, etc. But doing that will require a holistic look at the surrounding code, not microbenchmarking a single string comparison. Commented Jun 10, 2018 at 10:01
  • How is s < t comparison implemented atm, given that s and t are bytes? Commented Jun 10, 2018 at 10:30
  • @JohnZwinck, well, I did some profiling with pyflame and this verified that a significant amount of time is spend in these paired comparisons (s with t followed by t with s). Note that this pattern isn't unusual - for example, you also have this when you merge/join 2 sorted sequences of strings. Also, Python string comparison aren't slow in general - if you just need one comparison outcome for a pair s and t then they are fast, as is. Thus, this really isn't about isolated microbenchmarking. Commented Jun 10, 2018 at 10:31
  • @urban, with Python 3.6.5, there is Objects/bytesobject.c and it calls memcmp() in bytes_compare_eq() and bytes_richcompare() (for < or >). Commented Jun 10, 2018 at 10:36
  • From codegolf.stackexchange.com/a/49779 where they discuss cmp() in python 3, the suggested way is to do: ((a > b) - (b > a)). This will give you the r but ... it is still 2 comparisons. I have tried a python-based implementation of bytes_compare... very bad idea :) at least 3x slower Commented Jun 10, 2018 at 11:00

1 Answer 1

5

Python 3 (including 3.6) simply doesn't include any three-way comparison support for strings. Although the internal implementation of the rich comparison operator __lt__(), __eq__() etc. do call memcmp() (in the C implementation of bytes - cf. Objects/bytesobject.c) there is no internal three-way comparison function that could be leveraged.

Thus, writing a C extension that provides a three-way comparison function by calling memcmp() is the next best thing:

#include <Python.h>
static PyObject* cmp(PyObject* self, PyObject* args) {
 PyObject *a = 0, *b = 0;
 if (!PyArg_UnpackTuple(args, "cmp", 2, 2, &a, &b))
 return 0;
 if (!PyBytes_Check(a) || !PyBytes_Check(b)) {
 PyErr_SetString(PyExc_TypeError, "only bytes() strings supported");
 return 0;
 }
 Py_ssize_t n = PyBytes_GET_SIZE(a), m = PyBytes_GET_SIZE(b);
 char *s = PyBytes_AsString(a), *t = PyBytes_AsString(b);
 int r = 0;
 if (n == m) {
 r = memcmp(s, t, n);
 } else if (n < m) {
 r = memcmp(s, t, n);
 if (!r)
 r = -1;
 } else {
 r = memcmp(s, t, m);
 if (!r)
 r = 1;
 }
 return PyLong_FromLong(r);
}
static PyMethodDef bytes_util_methods[] = {
 { "cmp", cmp, METH_VARARGS, "Three way compare 2 bytes() objects." },
 {0,0,0,0} };
static struct PyModuleDef bytes_util_def = {
 PyModuleDef_HEAD_INIT, "bytes_util", "Three way comparison for strings.",
 -1, bytes_util_methods };
PyMODINIT_FUNC PyInit_bytes_util(void) {
 Py_Initialize();
 return PyModule_Create(&bytes_util_def);
}

Compile with:

gcc -Wall -O3 -fPIC -shared bytes_util.c -o bytes_util.so -I/usr/include/python3.6m

Test:

>>> import bytes_util
>>> bytes_util.cmp(b'foo', b'barx')
265725

In contrast to calling memcmp via the ctypes package, this foreign call has the same overhead as the builtin bytes comparison operators (as they also are implemented as C extension with the standard Python version).

answered Jun 15, 2018 at 19:37

1 Comment

Note: To make this significantly more reusable with near zero overhead, replacing explicit use of the PyBytes APIs with the more general buffer interface (PyObject_GetBuffer with flags PyBUF_SIMPLE) would allow this to work seamlessly with bytearray, mmap, array.array('B'), and all manner of other bytes-like objects. You'd just declare a pair of Py_buffer structures on the stack, populate them w/PyObject_GetBuffer and your inputs, then use buf.buf and buf.len where you currently use s/t and m/n.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.