I finished a program to do connected component analysis using union - find algorithm. Is it true that the complexity of the code is \$O(N logN)\,ドル where \$N\$ is the total number of pixels (512x512 by example, not 512).
Is there any way to improve the performance?
import cv2
import numpy as np
import random
class QuickUnionUF:
def __init__(self, N):
self.id = range(N)
self.sz = [0] * N
@classmethod
def fromimage(self, im):
self.id = im
self.sz = [0] * len(im)
def root(self, i):
while (i != self.id[i]):
i = self.id[i]
return i
def getresult(self):
result = [self.root(i) for i in self.id]
return result
def connected(self, p, q):
return self.root(p) == self.root(q)
def union(self, p, q):
i = self.root(p)
j = self.root(q)
if (i == j):
return
if (self.sz[i] < self.sz[j]):
self.id[i] = j
self.sz[j] += self.sz[i]
else:
self.id[j] = i
self.sz[j] += self.sz[i]
def bwlabel(im):
M, N = im.shape[:2]
qf = QuickUnionUF(M * N)
for i in range(M - 1):
for j in range(N - 1):
if (im[i][j] == im[i][j+1]):
qf.union(i * N + j, i * N + j + 1)
if (im[i + 1][j] == im[i][j]):
qf.union(i * N + j, (i + 1) * N + j)
mask = np.reshape(np.array(qf.getresult()), (M, N))
values = np.unique(mask).tolist()
random.seed()
colors = [(random.randint(0,255), random.randint(0,255), random.randint(0,255)) for k in range(len(values))]
out = np.zeros((M, N, 3))
for i in range(M):
for j in range(N):
label = values.index(mask[i][j])
out[i,j] = colors[label]
return out
im = cv2.imread("bw.jpg",cv2.IMREAD_GRAYSCALE)
out = bwlabel(im > 100)
cv2.imwrite("result.png", out)
1 Answer 1
Is it true that the complexity of the code is O(NlogN), where N is the total number of pixels (512x512 by example, not 512).
You main loop in bwlabel
is:
for i in range(M - 1):
for j in range(N - 1):
if (im[i][j] == im[i][j+1]):
qf.union(i * N + j, i * N + j + 1)
if (im[i + 1][j] == im[i][j]):
qf.union(i * N + j, (i + 1) * N + j)
So you are calling qf.union
\$O(N*M) = O(n)\$ times. There you call self.root
twice:
def root(self, i):
while (i != self.id[i]):
i = self.id[i]
return i
What's the worst case running time of this? At most \$O(n)\,ドル since self.id
has \$n\$ values and there are no cycles. However, since you are comparing ranks in union
:
if (self.sz[i] < self.sz[j]):
self.id[i] = j
self.sz[j] += self.sz[i]
else:
self.id[j] = i
self.sz[j] += self.sz[i]
You will have \$O(log\ n)\$ paths to root, so the total running time is \$O(n\ log\ n)\$.
Is there any way to improve the performance?
Yes, path compression. You descended down two paths from p
and q
to i
and j
and it took \$O(log\ n)\$. At each step you can point a node to its root, instead of just the next element. You can do this recursively like so:
def root(self, i):
if i == self.id[i]:
return i
self.id[i] = self.root(self.id[i])
return self.id[i]
Recursion adds its own cost, though, so whether it's faster in practice would require testing. An alternative would be to add a second iterative compression function:
def compress(self, i, r):
while i != r:
i, self.id[i] = self.id[i], r
Now you are walking the path twice, which again may or may not be faster than recursion. You should only need to call this for one of the paths, though.