I want to cythonise the python implementation of the Sutherland-Hogman algorithm. This algorithm updates a list of vertices according to pretty simple rules (being inside or outside an edge, etc.) but the details are not important. Here is the python version it accepts lists of vertices of polygons oriented clockwise. For instance those:
sP=[(50 150), (200 50), (350 150), (350 300), (250 300), (200 250), (150 350)(100 250) (100 200)]
cP=[(100, 100), (300, 100), (300, 300), (100, 300)]
and calculate their intersection:
inter=clip(sP, cP)
Here is the code found on rosettacode and slightly modified to return an empty list if there is no intersection.
def clip(subjectPolygon, clipPolygon):
def inside(p):
return(cp2[0]-cp1[0])*(p[1]-cp1[1]) > (cp2[1]-cp1[1])*(p[0]-cp1[0])
def computeIntersection():
dc = [ cp1[0] - cp2[0], cp1[1] - cp2[1] ]
dp = [ s[0] - e[0], s[1] - e[1] ]
n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]
n2 = s[0] * e[1] - s[1] * e[0]
n3 = 1.0 / (dc[0] * dp[1] - dc[1] * dp[0])
return [(n1*dp[0] - n2*dc[0]) * n3, (n1*dp[1] - n2*dc[1]) * n3]
outputList = subjectPolygon
cp1 = clipPolygon[-1]
for clipVertex in clipPolygon:
cp2 = clipVertex
inputList = outputList
outputList = []
s = inputList[-1]
for subjectVertex in inputList:
e = subjectVertex
if inside(e):
if not inside(s):
outputList.append(computeIntersection())
outputList.append(e)
elif inside(s):
outputList.append(computeIntersection())
s = e
if len(outputList)<1:
return []
cp1 = cp2
return(outputList)
This function is excruciatingly slow for my applications so I tried to cythonize it using numpy. Here is my cython version. I had to define the two functions outside clip because I had error messages about buffer inputs.
cimport cython
import numpy as np
cimport numpy as np
def clip(np.ndarray[np.float32_t, ndim=2] subjectPolygon,np.ndarray[np.float32_t, ndim=2] clipPolygon):
outputList = list(subjectPolygon)
cdef np.ndarray[np.float32_t, ndim=1] cp1 = clipPolygon[-1,:]
cdef np.ndarray[np.float32_t, ndim=1] cp2
for i in xrange(clipPolygon.shape[0]):
cp2 = clipPolygon[i]
inputList = outputList
outputList = []
s = inputList[-1]
for subjectVertex in inputList:
e = subjectVertex
if inside(e, cp1, cp2):
if not inside(s, cp1, cp2):
outputList.append(computeIntersection(cp1, cp2, e, s))
outputList.append(e)
elif inside(s, cp1, cp2):
outputList.append(computeIntersection(cp1, cp2, e, s))
s = e
if len(outputList)<1:
return []
cp1 = cp2
return(outputList)
def computeIntersection(np.ndarray[np.float32_t, ndim=1] cp1, np.ndarray[np.float32_t, ndim=1] cp2, np.ndarray[np.float32_t, ndim=1] e, np.ndarray[np.float32_t, ndim=1] s):
cdef np.ndarray[np.float32_t, ndim=1] dc = cp1-cp2
cdef np.ndarray[np.float32_t, ndim=1] dp = s-e
cdef np.float32_t n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]
cdef np.float32_t n2 = s[0] * e[1] - s[1] * e[0]
cdef np.float32_t n3 = 1.0 / (dc[0] * dp[1] - dc[1] * dp[0])
cdef np.ndarray[np.float32_t, ndim=1] res=np.array([(n1*dp[0] - n2*dc[0]) * n3, (n1*dp[1] - n2*dc[1]) * n3], dtype=np.float32)
return res
def inside(np.ndarray[np.float32_t, ndim=1] p, np.ndarray[np.float32_t, ndim=1] cp1, np.ndarray[np.float32_t, ndim=1] cp2):
cdef bint b=(cp2[0]-cp1[0])*(p[1]-cp1[1]) > (cp2[1]-cp1[1])*(p[0]-cp1[0])
return b
When I time the two versions I gained only a factor of two in speed-up I need at least 10 times that (or 100x !). Is there something to do ? How does one deal with variable sized list with Cython ? I do not know if this could be useful but my inputs are of fixed length.
2 Answers 2
First, some things which won't give you a speedup factor of 10 but might get another 2 or 3:
- Only call
inside
once for each set of arguments. The inner loop callsinside(e)
andinside(s)
every time, and then assignss = e
. - Factor out
dc
andn1
.dc
is calculated in every call toinside
and every call tocomputeIntersection
;n1
is calculated in every call tocomputeIntersection
. Butcp1
andcp2
change very rarely.
To get more speedup, it's probably worth looking at parallelisation. Lifting inside
over inputList
is completely parallelisable; mapping (s, e, s_inside, e_inside)
to a suitable sublist of [computeIntersection(e, s), e, computeIntersection(e, s)]
is also completely parallelisable, although possibly a bit fiddly. I don't know anything about parallelisation in Python, but there seem to be options.
-
\$\begingroup\$ Thanks definitely a good answer I will benchmark your solution and come back to you asap. \$\endgroup\$jeandut– jeandut2017年06月27日 09:03:27 +00:00Commented Jun 27, 2017 at 9:03
-
\$\begingroup\$ Thinking about it I do not think we can get rid of one of the inside because e is assigned to s afterwards. \$\endgroup\$jeandut– jeandut2017年06月27日 09:06:28 +00:00Commented Jun 27, 2017 at 9:06
-
\$\begingroup\$ @jean, you're thinking about it backwards. It's precisely because
e
is assigned tos
that you can reuse the result ofinside
. \$\endgroup\$Peter Taylor– Peter Taylor2017年06月27日 09:11:32 +00:00Commented Jun 27, 2017 at 9:11 -
\$\begingroup\$ I will get bak to you tomorrow with an updated version of the code and benchmarks. \$\endgroup\$jeandut– jeandut2017年06月27日 16:30:31 +00:00Commented Jun 27, 2017 at 16:30
-
\$\begingroup\$ I finally got it super nice comments ! \$\endgroup\$jeandut– jeandut2017年06月28日 11:45:54 +00:00Commented Jun 28, 2017 at 11:45
One important thing to mention - if you're cythoning a for loop through a range, you need to cdef the type of the index.
def pointless_add_slow(j):
for i in range(120000):
i = i+j
return i
%timeit pointless_add_slow(1)
100 loops, best of 3: 3.71 ms per loop
New block in jupyter:
%%cython
def pointless_add(j):
cdef int i
for i in range(120000):
i = i+j
return i
%timeit pointless_add(1)
100 loops, best of 3: 1.98 ms per loop
-
\$\begingroup\$ Thanks but it does not change noticeably the becnhmarks. \$\endgroup\$jeandut– jeandut2017年06月28日 12:20:13 +00:00Commented Jun 28, 2017 at 12:20
Explore related questions
See similar questions with these tags.
@prune
doesn't have any experience answeringcython
questions on CR. Your SO version will get a lot more attention than this CR one. \$\endgroup\$