I recently implemented the RDP polygon approximation algorithm in Python and I'm skeptical of whether or not I implemented it correctly of with the greatest efficiency. The algorithm runs in around 0.0003 seconds for a polygon with 30 points on my computer (an Intel Core i5 with 3.8 GHz of RAM), so I'm worried about how it may run on a slower computer. Also, there seems to be a cap as to the number of points that can be removed by the algorithm, or at least there's a cap in my implementation. No matter how high I set the tolerance, the approximation always caps at about \$\frac{2N}{3}\$ points where \$N\$ is the number of points in the input polygon. Could I be doing something wrong?
NegInf = float('-inf')
def distance(v1, v2):
"""
Calculate the distance between two points.
PARAMETERS
==========
v1, v2 >> The first and second vertices respectively.
"""
dx = v2[0] - v1[0]
dy = v2[1] - v1[1]
return math.sqrt(dx*dx + dy*dy)
def perpendicular_distance(point, line_start, line_end):
"""
Calculate the perpendicular distance from a point to a line.
PARAMETERS
==========
point >> The point of which to calculate the distance from the line
(must be an (x, y) tuple).
line_start, line_end >> Start and end points defining the line respectively
(must each be an (x, y) tuple).
"""
x1, y1 = line_start
x2, y2 = line_end
vx, vy = point
if x1 == x2:
return abs(x1 - vx)
m = (y2 - y1)/(x2 - x1)
b = y1 - m*x1
return abs(m * vx - vy + b)/math.sqrt(m*m + 1)
def _rdp_approx(points, tolerance, depth):
"""
Internal Function: Recursively perform the RDP algorithm.
"""
if not points:
# In case the furthest point index discovered is equal to the length of the
# list of points, leading to points[furthest:] sending in an empty list.
return []
elif len(points) <= 2:
# BASE CASE:: No points to remove, only the start and the end points of the line.
# Return it.
return points
elif len(points) == 3:
# BASE CASE:: Our decomposition of the polygon has reached a minimum of 3 points.
# Now all that is left is to remove the point in the middle (assuming it's distance
# from the line is greater than the set tolerance).
dist = perpendicular_distance(points[1],
points[0],
points[2]
)
if dist < tolerance:
return [points[0], points[-1]]
return points
max_dist = NegInf
furthest = None
start = 0
start_point = points[start]
if depth == 1:
# In the initial approximation, we are given an entire polygon to approximate. This
# means that the start and end points are the same, thus we cannot use the perpendicular
# distance equation to calculate the distance a point is from the start since the start is
# not a line. We have to use ordinary distance formula instead.
get_distance = lambda point: distance(point, start_point)
else:
end_point = points[-1]
get_distance = lambda point: perpendicular_distance(point, start_point, end_point)
# Find the farthest point from the norm.
for i, point in enumerate(points[1:], 1):
dist = get_distance(point)
if dist > max_dist:
max_dist = dist
furthest = i
# Recursively calculate the RDP approximation of the two polygonal chains formed by
# slicing at the index of the furthest discovered point.
prev_points = _rdp_approx(points[:furthest+1], tolerance, depth+1)
next_points = _rdp_approx(points[furthest:], tolerance, depth+1)
new_points = []
for point in prev_points + next_points:
# Filter out the duplicate points whilst maintaining order.
# TODO:: There's probably some fancy slicing trick I just haven't figured out
# that can be applied to prev_points and next_points so that we don't have to
# do this, but it's not a huge bottleneck so we needn't worry about it now.
if point not in new_points:
new_points.append(point)
return new_points
def rdp_polygon_approximate(coordinates, tolerance):
"""
Use the Ramer-Douglas-Peucker algorithm to approximate the points on a polygon.
The RDP algorithm recursively cuts away parts of a polygon that stray from the
average of the edges. It is a great algorithm for maintaining the overall form
of the input polygon, however one should be careful when using this for larger
polygons as the algorithm has an average complexity of T(n) = 2T(n/2) + O(n) and
a worst case complexity of O(n^2).
PARAMETERS
==========
coordinates >> The coordinates of the polygon to approximate.
tolerance >> The amount of tolerance the algorithm will use. The tolerance
determines the minimum distance a point has to sway from the average
before it gets deleted from the polygon. Thus, setting the tolerance to
be higher should delete more points on the final polygon.
That said, due to how the algorithm works there is a limit to the number
of vertices that can be removed on a polygon. Setting the tolerance to
float('inf') or sys.maxsize will not approximate the polygon to a single
point. Usually the minimum points an approximated polygon can have if the
original polygon had N points is between 2N/3 and N/3.
FURTHER READING
===============
For further reading on the Ramer-Douglas-Peucker algorithm, see
http://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm
"""
return _rdp_approx(coordinates, tolerance, 1)
if __name__ == '__main__':
poly = [(3, 0), (4, 2), (5, 2), (5.5, 3), (5, 4), (4, 5), (5, 6),
(7, 5), (7, 3), (8, 2.5), (8, 4), (9, 5), (8, 7), (7, 8), (6, 7),
(4, 7.75), (3.5, 7.5), (3, 8), (3, 8.5), (2.5, 9), (1, 9), (0, 8),
(2, 7), (1, 7), (0, 6), (1, 4), (2, 5), (2, 2), (3, 3), (2, 1)]
print(rdp_polygon_approximate(poly, 3))
print(rdp_polygon_approximate(poly, float('inf')))
3 Answers 3
If you want to calculate the farthest point you don't need to use the square root for 'real' distance calculation. Since you only need the distance for comparison and not the actual distance it is totally sufficient to compare by x*x + y*y
instead of sqrt(x*x + y*y)
-
\$\begingroup\$ In this case, the
perpendicular_distance
also probably needs a simpler way to produce a squared distance. \$\endgroup\$Morwenn– Morwenn2014年05月15日 11:20:10 +00:00Commented May 15, 2014 at 11:20 -
\$\begingroup\$ That's a good suggestion, I'll start looking into a way to implement a squared perpendicular distance function. Thanks! \$\endgroup\$user3002473– user30024732014年05月15日 22:03:25 +00:00Commented May 15, 2014 at 22:03
This is quite an old question, yet I'd like to answer because it comes up if one searches for "Python Ramer-Douglas-Peucker" on the internet.
First, the implementation is incorrect!
The OP suspected the same when they said:
No matter how high I set the tolerance, the approximation always caps at about (2 N / 3) points where N is the number of points in the input polygon. Could I be doing something wrong?
Knowing how the algorithm works, it should be clear that the number of points in the input should not be the driving variable determining the number of output points. Instead, the tolerance should matter.
What is an easy test case? A straight line!
Let's use Hypothesis to generate some test cases. Our properties are easy: The resulting poly-line should have length 2, and the first and last point are the same as the input poly-line.
import unittest
from hypothesis import given, example, assume
from hypothesis.strategies import lists, integers, tuples, floats
from ramer_douglas_peucker.original import rdp_polygon_approximate, perpendicular_distance
import numpy as np
class CompareTestCase(unittest.TestCase):
@given(min_x=floats(min_value=-1e5, max_value=1e5, allow_nan=False,
allow_infinity=False),
max_x = floats(min_value=-1e5, max_value=1e5, allow_nan=False,
allow_infinity=False))
def test_straight_line(self, min_x, max_x):
assume(min_x < max_x)
m = 2.0
b = -3.0
poly = [(x, m*x+b) for x in np.linspace(min_x, max_x, 100)]
poly_out = rdp_polygon_approximate(poly, tolerance=0.01)
self.assertEqual(len(poly_out), 2)
self.assertEqual(poly_out, [poly[0], poly[-1]])
You should see that this fails: AssertionError: 77 != 2
You might get a slightly different length. Thus the length of the returned poly_out
is not equal to two.
Next, let's have a look at your implementation.
# Find the farthest point from the norm.
for i, point in enumerate(points[1:], 1):
dist = get_distance(point)
if dist > max_dist:
max_dist = dist
furthest = i
# Main bug here:
# You were missing this common case: All points are already within
# 'tolerance'. Just return start and end.
if max_dist < tolerance:
return [points[0], points[-1]]
This bug explains the observation that you can only simplify by (2 N / 3) points. The code only drops the middle point if the (recursive) input has length 3 (special case at the beginning) and the max distance is larger than tolerance.
Next, let's have a look at your 'TODO' comment, where you put the two recursion results together:
new_points = []
for point in prev_points + next_points:
# Filter out the duplicate points whilst maintaining order.
# TODO:: There's probably some fancy slicing trick I just haven't figured out
# that can be applied to prev_points and next_points so that we don't have to
# do this, but it's not a huge bottleneck so we needn't worry about it now.
if point not in new_points:
new_points.append(point)
return new_points
The fancy slicing trick you mentioned in your TODO is: You want to
concatenate prev_points and next_points but the point at the pivot (furthest
)
will be present twice. So just return the first list except for the last point.
return prev_points[:-1] + next_points
This also avoids a bug, where you have the same point multiple times. Like [A, b, c, A, d, e, f]
. The second A
would've been removed.
More could be said about the fact that you are using recursion (if you care about performance), the special case handling at the beginning of the implementation (not needed) -- however, this question is too old for anyone to care.
-
\$\begingroup\$ Though the question is old, I'm not yet! Thanks for doing this, I always appreciate it when old, unfulfilled questions get properly sorted out. It'll serve the next generation of python RDP implementers! \$\endgroup\$user3002473– user30024732018年01月27日 16:08:30 +00:00Commented Jan 27, 2018 at 16:08
You can simplify your distance
function with the function math.hypot(x, y)
:
dx = v2[0] - v1[0]
dy = v2[1] - v1[1]
return math.hypot(dx, dy)
math.hypot(x, y)
computes directly math.sqrt(x*x + y*y)
. Morevoer, the CPython implementation should be based on the underlying C function hypot
; therefore, this function is safer than the naive implementation since it does its best to avoid overflows and underflows at an intermediate stage of the computation.
From a design point of view, one thing I would probably do to avoid passing coordinates everywhere is to create at least two classes, Point
and Line
:
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
class Line:
def __init__(self, start, end):
self.start = start
self.end = end
Then, I would have done a "generic" distance
function that can compute the distance between a point and virtually anything. With Python 3.4 singledispatch
decorator, it would be something like this (I did not test the code, it is merely here to give you an idea):
from singledispatch import singledispatch
@singledispatch
def distance(shape, point):
pass
@distance.register(Point)
def _(arg, point):
"""distance from point ot point"""
# implementation
@distance.register(Line)
def _(arg, point):
"""distance from point to line)"""
# implementation
Unfortunately, this is only a design idea, not an optimization one, but it would at least help to write code easier to write, with an API based on actual types, and not on mere coordinates.
-
3\$\begingroup\$ I did consider using
math.hypot
, however running some tests withtimeit
gave some interesting results:distance
withmath.sqrt
: 7.09003095935727e-07s,distance
withmath.hypot
: 8.529042939559872e-07s. I'm pretty sure it has to do something with the fact thatmath.hypot
is not just a square root function in C, it uses a different implementation (as seen at en.wikipedia.org/wiki/Hypot). \$\endgroup\$user3002473– user30024732014年05月15日 08:44:37 +00:00Commented May 15, 2014 at 8:44 -
2\$\begingroup\$ @user3002473 That's actually rather interesting. It shows again that there may be an actual trade-off between safe vs fast :) \$\endgroup\$Morwenn– Morwenn2014年05月15日 09:02:52 +00:00Commented May 15, 2014 at 9:02
Explore related questions
See similar questions with these tags.
print
s at the end of your program. Which version of Python are you using? \$\endgroup\$[(0, 0), (0, 4)]
. \$\endgroup\$