I've tried to implement the Kohonen algorithm using Python and I've managed to do this, but my result is so slow. I want to know if anyone knows how I can improve it.
Objective:
- I have to read a file with X and Y position of 3000 points
- I generate N number of neutrons and arrange them uniformly over the whole chart (over the whole points)
- I check the distance between each point and each neutron, and when I find the minimal distance between one point and a neutron, I move that neutron and his neighborhoods with an alpha value to that point.
I need to mention I'm still quite a beginner using Python.
-UPDATE-
Code:
from chart import save_image, draw_centroids, draw_mesh_between_centroids, chart, draw_centroid
import math
from copy import deepcopy
neurons = []
x = []
y = []
N = 15
t = 0
_alfa = 0
neuron = dict(x=0, y=0, row=0, col=0)
def alfa(t):
return 0.7 * math.pow(math.e, -t / N)
def neighborhood(t):
return int(7 * math.pow(math.e, -t / N) + 1)
def divisors(n):
sqrt = int(math.sqrt(n))
for num in range(sqrt, 2, -1):
if n % num == 0:
return num, int(n / num)
return 0
def generate_neurons(x_range, y_range, size):
div = divisors(size)
if div == 0:
sqrt = int(math.sqrt(size))
size = sqrt * (sqrt + 1)
row, col = divisors(size)
else:
row, col = divisors(size)
step_x = x_range / (row + 1)
step_y = y_range / (col + 1)
for m in range(size):
neuron["row"] = row
neuron["col"] = col
neurons.append(deepcopy(neuron))
step_y_tmp = step_y
for i in range(row):
for j in range(col):
if j == 0:
neurons[i * col + j]["x"] = step_x
else:
neurons[i * col + j]["x"] = neurons[i * col + (j - 1)]["x"] + step_x
neurons[i * col + j]["y"] = step_y_tmp
step_y_tmp += step_y
def read_file(name):
lines = [line.rstrip('\n') for line in open('../generate_file/' + name)]
global x
global y
zone = []
for index in range(len(lines)):
x.append(int(lines[index].split()[0]))
y.append(int(lines[index].split()[1]))
zone.append(int(lines[index].split()[2]))
def run():
global t
global _alfa
distances = []
for j in range(len(x)):
for c in neurons:
dist = distance(x[j], y[j], int(c["x"]), int(c["y"]))
distances.append(dist)
minim = min(float(s) for s in distances)
index = distances.index(minim)
_alfa = alfa(t)
move_neurons(x[j], y[j], neurons[index], _alfa)
distances.clear()
t += 1
def draw_result():
chart(x, y)
draw_mesh_between_centroids(neurons)
draw_centroids(neurons, "black")
save_image()
def detect_neighborhood(neuron, neighborhood):
global neurons
neighborhoods = []
col = neurons[0]["col"]
row = neurons[0]["row"]
for i in range(row):
for j in range(col):
if neuron["x"] == neurons[i * col + j]["x"] and neuron["y"] == neurons[i * col + j]["y"]:
check = -neighborhood + 1
check_2 = neighborhood / 2
if i - neighborhood >= check and j - neighborhood >= check:
for _i in range(i - neighborhood, i + 1):
for _j in range(j - neighborhood, j + 1):
if _i >= 0 and _j >= 0:
neighborhoods.append(neurons[_i * col + _j])
if i - neighborhood >= check and j + neighborhood <= col + check_2:
for _i in range(i - neighborhood, i + 1):
for _j in range(j, (j + neighborhood) + 1):
if _i >= 0 and _j < col:
neighborhoods.append(neurons[_i * col + _j])
if i + neighborhood <= row + check_2 and j - neighborhood >= check:
for _i in range(i, (i + neighborhood) + 1):
for _j in range(j - neighborhood, j + 1):
if _i < row and _j >= 0:
neighborhoods.append(neurons[_i * col + _j])
if i + neighborhood <= row + check_2 and j + neighborhood <= col + check_2:
for _i in range(i, (i + neighborhood) + 1):
for _j in range(j, (j + neighborhood) + 1):
if _i < row and _j < col:
neighborhoods.append(neurons[_i * col + _j])
# draw_centroids(uniqe_dic_list(neighborhoods), "yellow")
# draw_centroid(neuron, "red")
return uniqe_dic_list(neighborhoods)
def distance(x1, y1, x2, y2):
return math.sqrt((x1 - x2) ** 2 + ((y1 - y2) ** 2))
def move_neurons(x, y, neuron, alfa):
global neurons
neighborhoods = detect_neighborhood(neuron, neighborhood(t))
for neuron in neurons:
for n in neighborhoods:
if neuron == n:
neuron["x"] = x + alfa * (x - neuron["x"])
neuron["y"] = y + alfa * (y - neuron["y"])
def uniqe_dic_list(seq):
new_d = []
for x in seq:
if x not in new_d:
new_d.append(x)
return new_d
generate_neurons(300, 300, 100)
read_file("input.txt")
run()
print(_alfa)
while _alfa > 0.2:
run()
print(_alfa)
draw_result()
Input file:
120 52 3
200 10 1
133 200 2
...so on...
20 200 1
-UPDATE- Output image: enter image description here
-
1\$\begingroup\$ Can you provide a small sample of the input file? (BTW you probably meant "neuron" instead of "neutron") \$\endgroup\$ChatterOne– ChatterOne2017年04月10日 06:40:08 +00:00Commented Apr 10, 2017 at 6:40
-
\$\begingroup\$ @ChatterOne I've update my question with how it looks a input file, and yes there was suppose to be "neutron", my mistake \$\endgroup\$Vildnex– Vildnex2017年04月10日 18:23:40 +00:00Commented Apr 10, 2017 at 18:23
2 Answers 2
First of all, what I think are two bugs.
I've generated a bunch of lines like this (redirecting the output to a file):
import random
for i in range(0, 3000):
print str(random.randint(0, 300)) + ' ' + str(random.randint(-200, 600)) + ' 2'
And what happens is that I get:
neighborhoods.append(neutrons[_i * col + _j])
IndexError: list index out of range
This may be due to how I generated the file, but it's still something that is not being accounted for, which stops all the calculation that was maybe running for a long time. You should detect something like this and either skip the line silently, warn the user or ask the user what to do.
Also, the return uniqe_dic_list(neighborhoods)
is indented to be part of the if
, but it really looks like it should be out of the if
and out of both the for
loops.
Then the rest:
zone
is not used. You declare it, assign it, but you never use it.- the line
row, col = divisors(size)
is in both branches of theif
, so it should be out of it.
Like:
if div == 0:
sqrt = int(math.sqrt(size))
size = sqrt * (sqrt + 1)
row, col = divisors(size)
The function
divisors
can return either one value (an int) or two values (a list) and you're not accounting for when it returns only one value. You should either check for that or always return a list.You don't need the
neutron
variable with thedeepcopy
, you can simply append it.
Like:
for m in range(size):
neutrons.append(
{'row': row, 'col': col, 'x': neutron['x'], 'y': neutron['y']})
- You're not closing the input file after you open it.
You can use the with
keyword and also combine the two loop:
def read_file(name):
global x, y
with open(name) as input_file:
for line in input_file:
line_values = line.rstrip('\n').split()
x.append(int(line_values[0]))
y.append(int(line_values[1]))
- Naming matters, having
i
,_i
,j
,_j
,check
,check_2
and so on make it hard to read the code.
-
\$\begingroup\$ Thx for your answer and I will try to fallow each of your steps to make my code better :D. But I have to mention 2 things... 1. yes, I'm not using zone because in this algorithm I dont need that but I other yes, because I'm using the same input file and 2. I've update my code, now that error shouldn't exist so you can check again if you want :D. Thx again \$\endgroup\$Vildnex– Vildnex2017年04月10日 22:26:16 +00:00Commented Apr 10, 2017 at 22:26
I'll just give you a few tips now, that you can start working on. But later, with more time, I'll give a more thorough look at your code and improve this answer.
It would be nice if could share the
chart
as well, so we can test your code without having to make changes to it.Try to not use the same name with different meanings. In your code
neighborhood
can mean both a function or a number, depending on the context you are. Inside thedetect_neighborhood
function it means the variable that was received via argument and elsewhere it means theneighborhood
function.You are using a one-dimensional list to store an information that's two-dimensional. Your code could be simpler if you were using a list of lists or a numpy matrix. This way you're having to do the
_i * col + _j
trick every time when you could be doing something likeneighborhood[i][j]
orneighborhood[i,j]
(with numpy).Depending on your access pattern, e.g. if you always are going to access individual items and not iterate over a row or column, you could even use a dictionary with the keys being a tuple
(x,y)
.For each "neutron" both
row
andcol
are always the same. You could remove these variables from each neutron and store them in another place, where they wouldn't be repeated.
I know some of the stuff I mentioned are not directly related to the performance problem you're having, but I consider them to be good programming practices and worth mentioning.
I hope I could make myself clear, but feel free to ask for clarifications and/or more reasons for each point I've put above. And, of course, you can disagree with everything I said. ;)
Explore related questions
See similar questions with these tags.