I had a code challenge for a class I'm taking that built an NN algorithm. I got it to work, but I used really basic methods to solve it. There are two 1D NumPy arrays of equal length that contain values from 0 to 2. They hold the test labels and the model's predictions (`testy` and `testy_fit` below). The output is a confusion matrix that shows which samples received the right predictions and which received the wrong ones (doesn't matter ;)).
This code is correct; I just feel I took the lazy way out by working with lists and then turning those lists into an ndarray. Does anyone have tips on making better use of NumPy here? Anything clever?
```python
import numpy as np

x = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0]
y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
testy = np.array(x)
testy_fit = np.array(y)

row_no = [0, 0, 0]
row_dh = [0, 0, 0]
row_sl = [0, 0, 0]

# Code for the first row - NO
for i in range(len(testy)):
    if testy.item(i) == 0 and testy_fit.item(i) == 0:
        row_no[0] += 1
    elif testy.item(i) == 0 and testy_fit.item(i) == 1:
        row_no[1] += 1
    elif testy.item(i) == 0 and testy_fit.item(i) == 2:
        row_no[2] += 1

# Code for the second row - DH
for i in range(len(testy)):
    if testy.item(i) == 1 and testy_fit.item(i) == 0:
        row_dh[0] += 1
    elif testy.item(i) == 1 and testy_fit.item(i) == 1:
        row_dh[1] += 1
    elif testy.item(i) == 1 and testy_fit.item(i) == 2:
        row_dh[2] += 1

# Code for the third row - SL
for i in range(len(testy)):
    if testy.item(i) == 2 and testy_fit.item(i) == 0:
        row_sl[0] += 1
    elif testy.item(i) == 2 and testy_fit.item(i) == 1:
        row_sl[1] += 1
    elif testy.item(i) == 2 and testy_fit.item(i) == 2:
        row_sl[2] += 1

confusion = np.array([row_no, row_dh, row_sl])
print(confusion)
```
The printed result is correct:

```
[[16 10  0]
 [ 2 10  0]
 [ 2  0 22]]
```
This can be implemented concisely using `numpy.add.at`:

```
In [2]: c = np.zeros((3, 3), dtype=int)

In [3]: np.add.at(c, (x, y), 1)

In [4]: c
Out[4]:
array([[16, 10,  0],
       [ 2, 10,  0],
       [ 2,  0, 22]])
```
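Why `np.add.at` rather than plain fancy indexing? `c[x, y] += 1` is buffered, so when the same `(row, col)` pair appears many times it is only incremented once. The sketch below reuses the question's `x` and `y` to contrast the two; it also shows `np.bincount` on a flattened `row * n + col` index, which is an alternative I'm adding here rather than part of this answer.

```python
import numpy as np

# Data from the question: rows of the matrix come from x, columns from y.
x = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0]
y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

xa, ya = np.asarray(x), np.asarray(y)
n = 3  # classes are 0, 1 and 2

# np.add.at is unbuffered: every (row, col) pair adds 1,
# even when the same pair occurs many times.
c = np.zeros((n, n), dtype=int)
np.add.at(c, (xa, ya), 1)

# Fancy-index "+=" is buffered: repeated index pairs are written
# only once, so every occupied cell ends up as 1.
buffered = np.zeros((n, n), dtype=int)
buffered[xa, ya] += 1

# Alternative: encode each pair as one flat index row * n + col,
# count occurrences, and reshape back to n x n.
via_bincount = np.bincount(xa * n + ya, minlength=n * n).reshape(n, n)

print(c)
print(buffered)
print(via_bincount)
```

`bincount` needs the number of classes up front (`minlength`) so that classes that never occur still get a row and column.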
Oh my! I thought there would be something better, but I didn't think it would be one line of code! Wow. So glad I asked, and thank you! – broepke, May 6, 2019 at 2:04
Rule #1 of NumPy: if you want to do something, check the docs first for a one-line solution. – Oscar Smith, May 6, 2019 at 5:39
Setting aside for now that there is a (way) better NumPy solution to this, as explained in the answer by @WarrenWeckesser, here is a short review of your actual code.
`testy.item(i)` is a very unusual way to say `testy[i]`. It is probably also slower, since it involves an attribute lookup.

Don't repeat yourself. You test, e.g., `if testy.item(i) == 0` three times, each time with a different second condition. Just nest them in an `if` block:

```python
for i in range(len(testy)):
    if testy[i] == 0:
        if testy_fit[i] == 0:
            row_no[0] += 1
        elif testy_fit[i] == 1:
            row_no[1] += 1
        elif testy_fit[i] == 2:
            row_no[2] += 1
```
Loop like a native. Don't iterate over the indices of iterables, iterate over the iterable(s)! You can also use the fact that the value encodes the position you want to increment:
```python
for test, fit in zip(testy, testy_fit):
    if test == 0 and fit in {0, 1, 2}:
        row_no[fit] += 1
```
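If you want to stay in plain Python but drop the per-row branches entirely, one option (my addition, not part of this answer) is to count the `(test, fit)` pairs with `collections.Counter` and then read the counts out row by row. A minimal sketch on small, made-up labels:

```python
from collections import Counter

# Small hypothetical example data, not the data from the question.
testy = [0, 0, 1, 2, 2, 1, 0]
testy_fit = [0, 1, 1, 2, 0, 1, 0]

# Count how often each (row label, column label) pair occurs.
pair_counts = Counter(zip(testy, testy_fit))

n = 3  # number of classes
# Absent pairs read as 0, because Counter returns 0 for missing keys.
confusion = [[pair_counts[(i, j)] for j in range(n)] for i in range(n)]
print(confusion)
```

This makes a single pass over the data, like the list-of-lists version, but keeps the counts in a dictionary keyed by pair.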
You can even use the fact that the first value encodes the list you want to use and iterate only once. Or even better, make it a list of lists right away:
```python
n = 3
confusion_matrix = [[0] * n for _ in range(n)]
for test, fit in zip(testy, testy_fit):
    confusion_matrix[test][fit] += 1
print(np.array(confusion_matrix))
```
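Why the comprehension `[[0] * n for _ in range(n)]` rather than the shorter `[[0] * n] * n`? Multiplying the outer list repeats a reference to one inner list, so incrementing a single cell changes every "row". A small demonstration:

```python
n = 3

# Each inner list is created separately, so the rows are independent.
good = [[0] * n for _ in range(n)]

# The outer "* n" repeats a reference to ONE inner list,
# so all three rows are the same object.
aliased = [[0] * n] * n

good[0][1] += 1
aliased[0][1] += 1

print(good)     # only row 0 changed
print(aliased)  # every row shows the change
```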
Don't put everything into the global scope, where it runs whenever you interact with the script at all. Put your code into functions, document them with a docstring, and execute them under an `if __name__ == "__main__":` guard, which allows you to import from this script in another script without your code running:

```python
import numpy as np


def confusion_matrix(x, y):
    """Return the confusion matrix for two vectors `x` and `y`.

    x and y must only contain integers from 0 to n - 1 and 0 to m - 1,
    respectively; the result has n rows and m columns.
    """
    n, m = np.max(x) + 1, np.max(y) + 1
    matrix = [[0] * m for _ in range(n)]
    for a, b in zip(x, y):
        matrix[a][b] += 1
    return matrix


if __name__ == "__main__":
    x = ...
    y = ...
    print(np.array(confusion_matrix(x, y)))
```
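One caveat worth illustrating: because the size is inferred from the largest value in each vector, a class that is never predicted silently disappears and the matrix comes out non-square. The sketch below adds a hypothetical `n_classes` parameter (not part of the answer) to fix the size explicitly; the data is made up.

```python
def confusion_matrix(x, y, n_classes=None):
    """Return the confusion matrix for label vectors x and y as a list of lists.

    n_classes is a hypothetical extra parameter: when given, the matrix is
    n_classes x n_classes; otherwise the size is inferred from the data.
    """
    if n_classes is None:
        # int() keeps the sizes plain Python ints even for NumPy input.
        n, m = int(max(x)) + 1, int(max(y)) + 1
    else:
        n = m = n_classes
    matrix = [[0] * m for _ in range(n)]
    for a, b in zip(x, y):
        matrix[a][b] += 1
    return matrix


# Hypothetical data in which class 2 is never predicted.
x = [0, 1, 2, 2]
y = [0, 1, 1, 0]

inferred = confusion_matrix(x, y)             # 3 rows, but only 2 columns
fixed = confusion_matrix(x, y, n_classes=3)  # always 3 x 3
print(inferred)
print(fixed)
```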
Once you have come this far, you can simply swap the implementation of this function for the faster NumPy one without changing anything else (except that it then returns a NumPy array directly instead of a list of lists).
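A NumPy-backed drop-in might look like the sketch below. It keeps the answer's function name and arguments and uses `np.add.at` from the other answer; the toy data in the `__main__` block is made up for illustration.

```python
import numpy as np


def confusion_matrix(x, y):
    """Return the confusion matrix for two label vectors x and y.

    Same contract as the list-based version: values must be non-negative
    integers, and the shape is (max(x) + 1, max(y) + 1).
    Returns a NumPy array instead of a list of lists.
    """
    x, y = np.asarray(x), np.asarray(y)
    matrix = np.zeros((x.max() + 1, y.max() + 1), dtype=int)
    # Unbuffered in-place add, so repeated (x, y) pairs are all counted.
    np.add.at(matrix, (x, y), 1)
    return matrix


if __name__ == "__main__":
    print(confusion_matrix([0, 0, 1, 2, 2, 1, 0], [0, 1, 1, 2, 0, 1, 0]))
```

Callers that previously wrapped the result in `np.array(...)` keep working, since `np.array` on an existing array just copies it.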