How can I optimize my Von Neumann neighborhood algorithm?

Question 1

I am working on a small project that requires finding Von Neumann neighborhoods in a matrix. Basically, whenever there is a positive value in the array, I want to get the neighborhood for that value. I used an iterative approach because I don't think there's a way to do this in less than O(n) time as you have to check every single index to make sure all of the positive values are found. My solution is this:

is_counted = {}
recursions = []
def populate_dict(grid, n):
 rows = len(grid)
 cols = len(grid[0])
 for i in range(rows):
 for j in range(cols):
 if grid[i][j] > 0:
 find_von_neumann(grid, i, j, n, rows, cols)
 return len(is_counted.keys())
def find_von_neumann(grid, curr_i, curr_j, n, rows, cols):
 #print(curr_i, curr_j)
 recursions.append(1)
 if n == 0:
 cell = grid[curr_i][curr_j]
 if cell > 0:
 key = str(curr_i) + str(curr_j)
 if key not in is_counted:
 is_counted[key] = 1
 if n >= 1:
 coord_list = []
 # Use Min so that if N > col/row, those values outside of the grid will not be computed.
 for i in range(curr_i + min((-n), rows), curr_i + min((n + 1), rows)):
 for j in range(curr_j + min((-n), cols), min(curr_j + (n + 1), cols)):
 dist = abs((curr_i - i)) + abs((curr_j - j))
 if n >= dist >= -n and i >= 0 and j >= 0 and i < rows and j < cols:
 coord_list.append([i, j])
 for coord in coord_list:
 key = str(coord[0]) + str(coord[1])
 if key not in is_counted:
 is_counted[key] = 1
neighbors = populate_dict([
 [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1],
 [-1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1],
 [-1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1],
 [-1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [-1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 ],
 2)
print(neighbors)

I went with a hash table as my data structure but I believe that currently the worst case runtime would look very similar to O(n^2). For example if N (the distance factor) is equal to the amount of rows or columns AND every single value is positive, its going to be inefficient. Am I missing something with my solution, for example could using dynamic programming possibly help cut down on the runtime or is this already a good solution?

I'm fairly new to algorithms and runtime analysis so please correct me if I messed up on anything in my own analysis of this , thanks!

Question 2

Please check that you posted the correct code. The indentation is wrong and both branches of the if grid[coord[0]][coord[1]] < 1: are the same.

Question 3

@RootTwo you're right that logic was for a different approach I tried but forgot to change, also fixed the indentation

Question 4

What is the result supposed to be? You forgot to describe the task.

Question 5

@superbrain Sorry this is my first time using code review but I just updated it again and this matches my local code exactly. The task is to find the Von Neumann neighborhoods of every positive cell in the matrix and return the total number of cells that are in all Von Neumann neighborhoods in the matrix, but it does not need to return the locations of the neighborhoods themselves. Sorry for the mix up!

Question 6

Do you actually care about these neighborhoods, or do you just want to know the number of grid cells within n steps of some positive value?

Question 7

The indexing in these loops looks wrong:

if n >= 1:
 coord_list = []
 # Use Min so that if N > col/row, those values outside of the grid will not be computed.
 for i in range(curr_i + min((-n), rows), curr_i + min((n + 1), rows)):
 for j in range(curr_j + min((-n), cols), min(curr_j + (n + 1), cols)):
 dist = abs((curr_i - i)) + abs((curr_j - j))
 if n >= dist >= -n and i >= 0 and j >= 0 and i < rows and j < cols:
 coord_list.append([i, j])

The if n >= 1: means -n is a negative number so min((-n), rows) always returns -n.

dist is the sum of two positive numbers (abs((curr_i - i)) + abs((curr_j - j))) so it is always positive and the test dist >= -n is unnecessary.

The comment says using min to keep i and j in the grid, but the calculations are not correct so i and j may be outside the grid. The later if ... i >= 0 and j >= 0 ... is filtering out the bad index values.

I think you meant to do something like this (untested) code:

 i_start = max(curr_i - n, 0)
 i_end = min(curr_i + n + 1, rows)
 j_start = max(curr_j - n, 0)
 j_end = min(curr_j + n + 1, cols)
 for i in range(i_start, i_end):
 for j in range(j_start, j_end):
 dist = abs((curr_i - i)) + abs((curr_j - j))
 if dist <= n:
 coord_list.append([i, j])

The Von Neumann neighborhood has a diamond shape. So you could make j_start and j_end also depend on abs(curr_i - i). That would check fewer cells and let you eliminate the test dist <= n.

Big-O complexity: populate_dict iterates over the entire grid, so it is O(rows x cols). find_von_neumann iterates over a diamond that with an area that grows as n^2 so it is O(n^2). Because find_von_neumann is called for every square in the grid, the over all complexity is O(rows x cols x n^2).

Question 8

Good point on the min and dist comparisons being unnecessary. Also I guess I should have put this in the original post but is O(n^2) necessarily bad for these types of algorithms or is there a different to approach this then how I'm currently looking at it? It runs fast enough for anything that we'd need for at the moment but I have a feeling we'll be using this in future projects with possibly more data. Also thank you for taking the time to go through my code

Question 9

Set vs. dict

So far as I can tell, you aren't really using the value in is_counted, so it is better represented as a set than a dictionary. Also, it shouldn't be global - this prevents re-entrance and hampers testing; instead, make it a return value of find_von_neumann, merged in turn into another set in populate_dict that is returned.

Iteration

This is both very dense and re-executes a number of things that can be pre-computed:

 for i in range(curr_i + min((-n), rows), curr_i + min((n + 1), rows)):
 for j in range(curr_j + min((-n), cols), min(curr_j + (n + 1), cols)):

instead consider something like

i0 = curr_i + min((-n), rows)
i1 = curr_i + min((n + 1), rows)
j0 = curr_j + min((-n), cols)
j1 = min(curr_j + (n + 1), cols)
for i in range(i0, i1):
 for j in range(j0, j1):

Combined inequalities

You did the right thing here:

n >= dist >= -n

though I would find it more intuitive as

-n <= dist <= n

You should do the same for i and j:

0 <= i < rows and 0 <= j < cols

Question 10

This was awesome, thank you taking a look at it! I didn't even think to do those pre-computations and on our largest data set it made a very real difference

Question 11

This doesn't look right:

 for coord in coord_list:
 key = str(coord[0]) + str(coord[1])
 if key not in is_counted:
 is_counted[key] = 1

For example, both coord = [1, 11] and coord = [11, 1] have the same key '111'. Unless that's really intended, you should probably insert a non-digit character in the middle to distinguish the cases. Or, much simpler, just use tuples of ints like (1, 11).

Question 12

I didn’t even notice that, good catch! I was going to use a list as the key but forgot you can’t user mutable data structures as keys, but tuples sounds like the answer

RootTwo RootTwo 10.6k1 gold badge14 silver badges30 bronze badges · Accepted Answer · 2020-11-20 19:10:47Z

The indexing in these loops looks wrong:

if n >= 1:
 coord_list = []
 # Use Min so that if N > col/row, those values outside of the grid will not be computed.
 for i in range(curr_i + min((-n), rows), curr_i + min((n + 1), rows)):
 for j in range(curr_j + min((-n), cols), min(curr_j + (n + 1), cols)):
 dist = abs((curr_i - i)) + abs((curr_j - j))
 if n >= dist >= -n and i >= 0 and j >= 0 and i < rows and j < cols:
 coord_list.append([i, j])

The if n >= 1: means -n is a negative number so min((-n), rows) always returns -n.

dist is the sum of two positive numbers (abs((curr_i - i)) + abs((curr_j - j))) so it is always positive and the test dist >= -n is unnecessary.

The comment says using min to keep i and j in the grid, but the calculations are not correct so i and j may be outside the grid. The later if ... i >= 0 and j >= 0 ... is filtering out the bad index values.

I think you meant to do something like this (untested) code:

 i_start = max(curr_i - n, 0)
 i_end = min(curr_i + n + 1, rows)
 j_start = max(curr_j - n, 0)
 j_end = min(curr_j + n + 1, cols)
 for i in range(i_start, i_end):
 for j in range(j_start, j_end):
 dist = abs((curr_i - i)) + abs((curr_j - j))
 if dist <= n:
 coord_list.append([i, j])

The Von Neumann neighborhood has a diamond shape. So you could make j_start and j_end also depend on abs(curr_i - i). That would check fewer cells and let you eliminate the test dist <= n.

Big-O complexity: populate_dict iterates over the entire grid, so it is O(rows x cols). find_von_neumann iterates over a diamond that with an area that grows as n^2 so it is O(n^2). Because find_von_neumann is called for every square in the grid, the over all complexity is O(rows x cols x n^2).

Good point on the min and dist comparisons being unnecessary. Also I guess I should have put this in the original post but is O(n^2) necessarily bad for these types of algorithms or is there a different to approach this then how I'm currently looking at it? It runs fast enough for anything that we'd need for at the moment but I have a feeling we'll be using this in future projects with possibly more data. Also thank you for taking the time to go through my code

Stack Exchange Network

How can I optimize my Von Neumann neighborhood algorithm?

3 Answers 3

Set vs. dict

Iteration

Combined inequalities

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How can I optimize my Von Neumann neighborhood algorithm?

3 Answers 3

Set vs. dict

Iteration

Combined inequalities

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions