Cythonized Sutherland-Hodgman algorithm

Question 1

I want to cythonize the Python implementation of the Sutherland-Hodgman algorithm. This algorithm updates a list of vertices according to fairly simple rules (being inside or outside an edge, etc.), but the details are not important. Here is the Python version. It accepts lists of polygon vertices oriented clockwise, for instance:

sP=[(50, 150), (200, 50), (350, 150), (350, 300), (250, 300), (200, 250), (150, 350), (100, 250), (100, 200)]
cP=[(100, 100), (300, 100), (300, 300), (100, 300)]

and calculates their intersection:

inter=clip(sP, cP)

Here is the code, taken from Rosetta Code and slightly modified to return an empty list if there is no intersection.

def clip(subjectPolygon, clipPolygon):
    def inside(p):
        return (cp2[0]-cp1[0])*(p[1]-cp1[1]) > (cp2[1]-cp1[1])*(p[0]-cp1[0])

    def computeIntersection():
        dc = [ cp1[0] - cp2[0], cp1[1] - cp2[1] ]
        dp = [ s[0] - e[0], s[1] - e[1] ]
        n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]
        n2 = s[0] * e[1] - s[1] * e[0]
        n3 = 1.0 / (dc[0] * dp[1] - dc[1] * dp[0])
        return [(n1*dp[0] - n2*dc[0]) * n3, (n1*dp[1] - n2*dc[1]) * n3]

    outputList = subjectPolygon
    cp1 = clipPolygon[-1]

    for clipVertex in clipPolygon:
        cp2 = clipVertex
        inputList = outputList
        outputList = []
        s = inputList[-1]

        for subjectVertex in inputList:
            e = subjectVertex
            if inside(e):
                if not inside(s):
                    outputList.append(computeIntersection())
                outputList.append(e)
            elif inside(s):
                outputList.append(computeIntersection())
            s = e
        if len(outputList)<1:
            return []
        cp1 = cp2
    return(outputList)
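
The inside helper in the listing above is a sign test on a 2D cross product. As a minimal standalone sketch (the explicit cp1 and cp2 parameters are an assumption made here for illustration; the listing captures them from the enclosing scope), it can be tried on the first edge of cP:

```python
def inside(p, cp1, cp2):
    """Return True if p is strictly on the 'inside' of the directed edge cp1 -> cp2.

    The expression is the z component of (cp2 - cp1) x (p - cp1). With the y axis
    pointing up, a positive value means p is to the left of the edge; with the
    y axis pointing down (screen coordinates) it means p is to the right.
    """
    return (cp2[0] - cp1[0]) * (p[1] - cp1[1]) > (cp2[1] - cp1[1]) * (p[0] - cp1[0])


# First edge of cP from the question: (100, 100) -> (300, 100).
cp1, cp2 = (100, 100), (300, 100)

above = inside((200, 200), cp1, cp2)    # y larger than the edge: kept
below = inside((200, 50), cp1, cp2)     # y smaller than the edge: clipped
on_edge = inside((200, 100), cp1, cp2)  # exactly on the edge: strict test, so not inside
print(above, below, on_edge)
```

Points exactly on a clip edge count as outside because the comparison is strict.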

This function is excruciatingly slow for my application, so I tried to cythonize it using NumPy. Here is my Cython version. I had to define the two helper functions outside clip because I got error messages about buffer inputs.

cimport cython
import numpy as np
cimport numpy as np

def clip(np.ndarray[np.float32_t, ndim=2] subjectPolygon, np.ndarray[np.float32_t, ndim=2] clipPolygon):
    outputList = list(subjectPolygon)
    cdef np.ndarray[np.float32_t, ndim=1] cp1 = clipPolygon[-1,:]
    cdef np.ndarray[np.float32_t, ndim=1] cp2
    for i in xrange(clipPolygon.shape[0]):
        cp2 = clipPolygon[i]
        inputList = outputList
        outputList = []
        s = inputList[-1]
        for subjectVertex in inputList:
            e = subjectVertex
            if inside(e, cp1, cp2):
                if not inside(s, cp1, cp2):
                    outputList.append(computeIntersection(cp1, cp2, e, s))
                outputList.append(e)
            elif inside(s, cp1, cp2):
                outputList.append(computeIntersection(cp1, cp2, e, s))
            s = e
        if len(outputList)<1:
            return []
        cp1 = cp2
    return(outputList)

def computeIntersection(np.ndarray[np.float32_t, ndim=1] cp1, np.ndarray[np.float32_t, ndim=1] cp2, np.ndarray[np.float32_t, ndim=1] e, np.ndarray[np.float32_t, ndim=1] s):
    cdef np.ndarray[np.float32_t, ndim=1] dc = cp1-cp2
    cdef np.ndarray[np.float32_t, ndim=1] dp = s-e
    cdef np.float32_t n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]
    cdef np.float32_t n2 = s[0] * e[1] - s[1] * e[0]
    cdef np.float32_t n3 = 1.0 / (dc[0] * dp[1] - dc[1] * dp[0])
    cdef np.ndarray[np.float32_t, ndim=1] res=np.array([(n1*dp[0] - n2*dc[0]) * n3, (n1*dp[1] - n2*dc[1]) * n3], dtype=np.float32)
    return res

def inside(np.ndarray[np.float32_t, ndim=1] p, np.ndarray[np.float32_t, ndim=1] cp1, np.ndarray[np.float32_t, ndim=1] cp2):
    cdef bint b=(cp2[0]-cp1[0])*(p[1]-cp1[1]) > (cp2[1]-cp1[1])*(p[0]-cp1[0])
    return b

When I time the two versions, I gain only a factor of two in speed. I need at least 10 times that (or 100x!). Is there anything I can do? How does one deal with variable-sized lists in Cython? I do not know if this is useful, but my inputs are of fixed length.

Question 2

Reopen your SO question. @prune doesn't have any experience answering cython questions on CR. Your SO version will get a lot more attention than this CR one.

Question 3

Following @hpaulj's comment, I reopened it on SO.

Question 4

I will close the one that gets the least feedback.

Question 5

What you may and may not do after receiving answers. I've rolled back Rev 5 → 2.

Question 6

@jean If you open a new post and in the title add something like - Parallel or vectorizing methods? I can likely post you some code changes that will get your required speed. Be sure to note Python 3.5 Windows x64 so answers are relevant. I think you could get what you need in a lot less code.

Question 12

I finally got it. Super nice comments!

Question 13

One important thing to mention: if you're cythonizing a for loop over a range, you need to cdef the type of the index.

def pointless_add_slow(j):
    for i in range(120000):
        i = i+j
    return i

%timeit pointless_add_slow(1)
100 loops, best of 3: 3.71 ms per loop

New block in jupyter:

%%cython
def pointless_add(j):
    cdef int i
    for i in range(120000):
        i = i+j
    return i

%timeit pointless_add(1)
100 loops, best of 3: 1.98 ms per loop

Question 14

Thanks, but it does not noticeably change the benchmarks.

Peter Taylor · Accepted Answer · 2017-06-27 08:37:34Z

First, some things which won't give you a speedup factor of 10 but might get another 2 or 3:

Only call inside once for each set of arguments. The inner loop calls inside(e) and inside(s) every time, and then assigns s = e.
Factor out dc and n1. dc is calculated in every call to inside and every call to computeIntersection; n1 is calculated in every call to computeIntersection. But cp1 and cp2 change very rarely.
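
As a rough sketch of both points in plain Python (the name clip_reusing_inside and the flattened variables dcx, dcy, s_inside and e_inside are illustrative assumptions, not part of the original code), the inside result for e can be carried over to become the result for s, and dc and n1 can be computed once per clip edge:

```python
def clip_reusing_inside(subject_polygon, clip_polygon):
    """Sutherland-Hodgman clip with the per-edge terms hoisted out of the inner loop."""
    output = list(subject_polygon)
    cp1 = clip_polygon[-1]
    for cp2 in clip_polygon:
        if not output:
            return []
        # These depend only on the clip edge, so compute them once per edge.
        dcx = cp1[0] - cp2[0]
        dcy = cp1[1] - cp2[1]
        n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]

        input_list, output = output, []
        s = input_list[-1]
        # Same test as inside(), rewritten in terms of dc = cp1 - cp2.
        s_inside = dcy * (s[0] - cp1[0]) > dcx * (s[1] - cp1[1])
        for e in input_list:
            e_inside = dcy * (e[0] - cp1[0]) > dcx * (e[1] - cp1[1])
            if e_inside != s_inside:
                # The segment s -> e crosses the clip line: emit the crossing point.
                dpx = s[0] - e[0]
                dpy = s[1] - e[1]
                n2 = s[0] * e[1] - s[1] * e[0]
                n3 = 1.0 / (dcx * dpy - dcy * dpx)
                output.append([(n1 * dpx - n2 * dcx) * n3,
                               (n1 * dpy - n2 * dcy) * n3])
            if e_inside:
                output.append(e)
            # e becomes the next s, so its inside result is reused, not recomputed.
            s, s_inside = e, e_inside
        cp1 = cp2
    return output


result = clip_reusing_inside([(0, 0), (2, 0), (2, 2), (0, 2)],
                             [(1, 1), (3, 1), (3, 3), (1, 3)])
print(result)
```

This evaluates the inside test once per vertex per clip edge instead of up to twice, and emits vertices in the same order as the original loop.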

To get more speedup, it's probably worth looking at parallelisation. Lifting inside over inputList is completely parallelisable; mapping (s, e, s_inside, e_inside) to a suitable sublist of [computeIntersection(e, s), e, computeIntersection(e, s)] is also completely parallelisable, although possibly a bit fiddly. I don't know anything about parallelisation in Python, but there seem to be options.
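
To make the shape of that decomposition concrete, here is a hedged sketch in plain Python. It runs serially; it only restructures one clip-edge pass as two maps over independent elements, which is the form a parallel map or a vectorised array operation would need. The names clip_edge_as_maps, emit and clip_with_maps are hypothetical.

```python
def clip_edge_as_maps(input_list, cp1, cp2):
    """One clip-edge pass expressed as two maps with no loop-carried state."""
    dcx, dcy = cp1[0] - cp2[0], cp1[1] - cp2[1]
    n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]

    def inside(p):
        return dcy * (p[0] - cp1[0]) > dcx * (p[1] - cp1[1])

    def intersection(s, e):
        dpx, dpy = s[0] - e[0], s[1] - e[1]
        n2 = s[0] * e[1] - s[1] * e[0]
        n3 = 1.0 / (dcx * dpy - dcy * dpx)
        return [(n1 * dpx - n2 * dcx) * n3, (n1 * dpy - n2 * dcy) * n3]

    # Map 1: every vertex is classified independently of the others.
    flags = [inside(p) for p in input_list]

    # Pair each vertex e with its predecessor s (the last vertex precedes the first).
    prev_points = input_list[-1:] + input_list[:-1]
    prev_flags = flags[-1:] + flags[:-1]

    def emit(s, e, s_inside, e_inside):
        # Yields [], [e], [crossing] or [crossing, e] for one segment.
        piece = []
        if s_inside != e_inside:
            piece.append(intersection(s, e))
        if e_inside:
            piece.append(e)
        return piece

    # Map 2: every segment produces its piece independently of the others.
    pieces = [emit(s, e, si, ei)
              for s, e, si, ei in zip(prev_points, input_list, prev_flags, flags)]

    # Concatenate the pieces in order to recover the output vertex list.
    return [p for piece in pieces for p in piece]


def clip_with_maps(subject_polygon, clip_polygon):
    output = list(subject_polygon)
    cp1 = clip_polygon[-1]
    for cp2 in clip_polygon:
        if not output:
            return []
        output = clip_edge_as_maps(output, cp1, cp2)
        cp1 = cp2
    return output


result = clip_with_maps([(0, 0), (2, 0), (2, 2), (0, 2)],
                        [(1, 1), (3, 1), (3, 3), (1, 3)])
print(result)
```

The outer loop over clip edges stays sequential, since each pass consumes the previous pass's output; only the work within a pass is independent.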

Thanks, definitely a good answer. I will benchmark your solution and come back to you ASAP.
Thinking about it, I do not think we can get rid of one of the inside calls, because e is assigned to s afterwards.
@jean, you're thinking about it backwards. It's precisely because e is assigned to s that you can reuse the result of inside.
I will get back to you tomorrow with an updated version of the code and benchmarks.
