I am working on a small project that requires finding Von Neumann neighborhoods in a matrix. Basically, whenever there is a positive value in the array, I want to get the neighborhood for that value. I used an iterative approach because I don't think there's a way to do this in less than O(n) time as you have to check every single index to make sure all of the positive values are found. My solution is this:
is_counted = {}
recursions = []
def populate_dict(grid, n):
rows = len(grid)
cols = len(grid[0])
for i in range(rows):
for j in range(cols):
if grid[i][j] > 0:
find_von_neumann(grid, i, j, n, rows, cols)
return len(is_counted.keys())
def find_von_neumann(grid, curr_i, curr_j, n, rows, cols):
#print(curr_i, curr_j)
recursions.append(1)
if n == 0:
cell = grid[curr_i][curr_j]
if cell > 0:
key = str(curr_i) + str(curr_j)
if key not in is_counted:
is_counted[key] = 1
if n >= 1:
coord_list = []
# Use Min so that if N > col/row, those values outside of the grid will not be computed.
for i in range(curr_i + min((-n), rows), curr_i + min((n + 1), rows)):
for j in range(curr_j + min((-n), cols), min(curr_j + (n + 1), cols)):
dist = abs((curr_i - i)) + abs((curr_j - j))
if n >= dist >= -n and i >= 0 and j >= 0 and i < rows and j < cols:
coord_list.append([i, j])
for coord in coord_list:
key = str(coord[0]) + str(coord[1])
if key not in is_counted:
is_counted[key] = 1
neighbors = populate_dict([
[-1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1],
[-1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1],
[-1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1],
[-1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[-1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
],
2)
print(neighbors)
I went with a hash table as my data structure but I believe that currently the worst case runtime would look very similar to O(n^2). For example if N (the distance factor) is equal to the amount of rows or columns AND every single value is positive, its going to be inefficient. Am I missing something with my solution, for example could using dynamic programming possibly help cut down on the runtime or is this already a good solution?
I'm fairly new to algorithms and runtime analysis so please correct me if I messed up on anything in my own analysis of this , thanks!
3 Answers 3
The indexing in these loops looks wrong:
if n >= 1:
coord_list = []
# Use Min so that if N > col/row, those values outside of the grid will not be computed.
for i in range(curr_i + min((-n), rows), curr_i + min((n + 1), rows)):
for j in range(curr_j + min((-n), cols), min(curr_j + (n + 1), cols)):
dist = abs((curr_i - i)) + abs((curr_j - j))
if n >= dist >= -n and i >= 0 and j >= 0 and i < rows and j < cols:
coord_list.append([i, j])
The if n >= 1:
means -n
is a negative number so min((-n), rows)
always returns -n
.
dist
is the sum of two positive numbers (abs((curr_i - i)) + abs((curr_j - j))
) so it is always positive and the test dist >= -n
is unnecessary.
The comment says using min
to keep i
and j
in the grid, but the calculations are not correct so i
and j
may be outside the grid. The later if ... i >= 0 and j >= 0 ...
is filtering out the bad index values.
I think you meant to do something like this (untested) code:
i_start = max(curr_i - n, 0)
i_end = min(curr_i + n + 1, rows)
j_start = max(curr_j - n, 0)
j_end = min(curr_j + n + 1, cols)
for i in range(i_start, i_end):
for j in range(j_start, j_end):
dist = abs((curr_i - i)) + abs((curr_j - j))
if dist <= n:
coord_list.append([i, j])
The Von Neumann neighborhood has a diamond shape. So you could make j_start
and j_end
also depend on abs(curr_i - i)
. That would check fewer cells and let you eliminate the test dist <= n
.
Big-O complexity: populate_dict
iterates over the entire grid, so it is O(rows x cols). find_von_neumann
iterates over a diamond that with an area that grows as n^2 so it is O(n^2). Because find_von_neumann
is called for every square in the grid, the over all complexity is O(rows x cols x n^2).
-
\$\begingroup\$ Good point on the min and dist comparisons being unnecessary. Also I guess I should have put this in the original post but is O(n^2) necessarily bad for these types of algorithms or is there a different to approach this then how I'm currently looking at it? It runs fast enough for anything that we'd need for at the moment but I have a feeling we'll be using this in future projects with possibly more data. Also thank you for taking the time to go through my code \$\endgroup\$Brennan McGowan– Brennan McGowan2020年11月20日 19:17:15 +00:00Commented Nov 20, 2020 at 19:17
Set vs. dict
So far as I can tell, you aren't really using the value in is_counted
, so it is better represented as a set than a dictionary. Also, it shouldn't be global - this prevents re-entrance and hampers testing; instead, make it a return value of find_von_neumann
, merged in turn into another set in populate_dict
that is returned.
Iteration
This is both very dense and re-executes a number of things that can be pre-computed:
for i in range(curr_i + min((-n), rows), curr_i + min((n + 1), rows)):
for j in range(curr_j + min((-n), cols), min(curr_j + (n + 1), cols)):
instead consider something like
i0 = curr_i + min((-n), rows)
i1 = curr_i + min((n + 1), rows)
j0 = curr_j + min((-n), cols)
j1 = min(curr_j + (n + 1), cols)
for i in range(i0, i1):
for j in range(j0, j1):
Combined inequalities
You did the right thing here:
n >= dist >= -n
though I would find it more intuitive as
-n <= dist <= n
You should do the same for i
and j
:
0 <= i < rows and 0 <= j < cols
-
2\$\begingroup\$ This was awesome, thank you taking a look at it! I didn't even think to do those pre-computations and on our largest data set it made a very real difference \$\endgroup\$Brennan McGowan– Brennan McGowan2020年11月20日 19:10:35 +00:00Commented Nov 20, 2020 at 19:10
This doesn't look right:
for coord in coord_list:
key = str(coord[0]) + str(coord[1])
if key not in is_counted:
is_counted[key] = 1
For example, both coord = [1, 11]
and coord = [11, 1]
have the same key '111'
. Unless that's really intended, you should probably insert a non-digit character in the middle to distinguish the cases. Or, much simpler, just use tuples of ints like (1, 11)
.
-
\$\begingroup\$ I didn’t even notice that, good catch! I was going to use a list as the key but forgot you can’t user mutable data structures as keys, but tuples sounds like the answer \$\endgroup\$Brennan McGowan– Brennan McGowan2020年11月20日 22:21:11 +00:00Commented Nov 20, 2020 at 22:21
Explore related questions
See similar questions with these tags.
if grid[coord[0]][coord[1]] < 1:
are the same. \$\endgroup\$