5
\$\begingroup\$

I had a code challenge for a class I'm taking, in which we built a NN algorithm. I got it to work, but I used really basic methods to solve it. There are two 1D NumPy arrays of equal length with values 0-2 in them. They represent two different train and test data sets. The output is a confusion matrix that shows which received the right predictions and which received the wrong ones (doesn't matter ;).

This code is correct. I just feel I took the lazy way out by working with lists and then turning those lists into an ndarray. I would love to see if people have some tips on utilizing NumPy for this. Anything clever?

import numpy as np
x = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0]
y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
testy = np.array(x)
testy_fit = np.array(y)
row_no = [0,0,0]
row_dh = [0,0,0]
row_sl = [0,0,0]
# Code for the first row - NO
for i in range(len(testy)):
    if testy.item(i) == 0 and testy_fit.item(i) == 0:
        row_no[0] += 1
    elif testy.item(i) == 0 and testy_fit.item(i) == 1:
        row_no[1] += 1
    elif testy.item(i) == 0 and testy_fit.item(i) == 2:
        row_no[2] += 1
# Code for the second row - DH
for i in range(len(testy)):
    if testy.item(i) == 1 and testy_fit.item(i) == 0:
        row_dh[0] += 1
    elif testy.item(i) == 1 and testy_fit.item(i) == 1:
        row_dh[1] += 1
    elif testy.item(i) == 1 and testy_fit.item(i) == 2:
        row_dh[2] += 1
# Code for the third row - SL
for i in range(len(testy)):
    if testy.item(i) == 2 and testy_fit.item(i) == 0:
        row_sl[0] += 1
    elif testy.item(i) == 2 and testy_fit.item(i) == 1:
        row_sl[1] += 1
    elif testy.item(i) == 2 and testy_fit.item(i) == 2:
        row_sl[2] += 1
confusion = np.array([row_no,row_dh,row_sl])
print(confusion)

The result of the print is correct, as follows:

[[16 10 0]
 [ 2 10 0]
 [ 2 0 22]]
asked May 5, 2019 at 23:36
\$\endgroup\$
1
  • 1
    \$\begingroup\$ Good thing this got an answer on SO before it was moved. Performance questions for numpy are routine on SO. \$\endgroup\$ Commented May 6, 2019 at 0:15

2 Answers

5
\$\begingroup\$

This can be implemented concisely by using numpy.add.at:

In [2]: c = np.zeros((3, 3), dtype=int) 
In [3]: np.add.at(c, (x, y), 1) 
In [4]: c 
Out[4]: 
array([[16, 10, 0],
 [ 2, 10, 0],
 [ 2, 0, 22]])
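
As a side note on why `np.add.at` is used here rather than plain fancy-index assignment: with repeated index pairs, `c[x, y] += 1` is buffered and bumps each distinct cell only once, while `np.add.at` accumulates every occurrence. A minimal sketch of the difference, using small made-up arrays (not the data from the question):

```python
import numpy as np

# Toy labels (hypothetical), chosen so that the pair (0, 1) occurs twice.
x = np.array([0, 0, 1, 2, 2])
y = np.array([1, 1, 1, 2, 0])

# Buffered fancy-index update: a repeated (row, col) pair is counted once.
buffered = np.zeros((3, 3), dtype=int)
buffered[x, y] += 1

# Unbuffered update: every occurrence of a pair is accumulated.
unbuffered = np.zeros((3, 3), dtype=int)
np.add.at(unbuffered, (x, y), 1)

print(buffered[0, 1], unbuffered[0, 1])  # prints "1 2"
```

Only the unbuffered version sums to the number of samples, which is what a confusion matrix needs.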
answered May 5, 2019 at 23:41
\$\endgroup\$
2
  • \$\begingroup\$ Oh my! I thought there would be something better but i didn't think 1 line of code! Wow. So glad I asked and thank you! \$\endgroup\$ Commented May 6, 2019 at 2:04
  • 2
    \$\begingroup\$ Rule #1 of numpy is if you want to do something, check the docs first to check for a 1 line solution. \$\endgroup\$ Commented May 6, 2019 at 5:39
3
\$\begingroup\$

For now disregarding that there is a (way) better numpy solution to this, as explained in the answer by @WarrenWeckesser, here is a short code review of your actual code.

  • testy.item(i) is a very unusual way to say testy[i]. It is probably also slower as it involves an attribute lookup.
  • Don't repeat yourself. You test e.g. if testy.item(i) == 0 three times, each time with a different second condition. Just nest them in an if block:

    for i in range(len(testy)):
        if testy[i] == 0:
            if testy_fit[i] == 0:
                row_no[0] += 1
            elif testy_fit[i] == 1:
                row_no[1] += 1
            elif testy_fit[i] == 2:
                row_no[2] += 1
    
  • Loop like a native. Don't iterate over the indices of iterables, iterate over the iterable(s)! You can also use the fact that the value encodes the position you want to increment:

    for test, fit in zip(testy, testy_fit):
        if test == 0 and fit in {0, 1, 2}:
            row_no[fit] += 1
    
  • You can even use the fact that the first value encodes the list you want to use and iterate only once. Or even better, make it a list of lists right away:

    n = 3
    confusion_matrix = [[0] * n for _ in range(n)]
    for test, fit in zip(testy, testy_fit):
        confusion_matrix[test][fit] += 1
    print(np.array(confusion_matrix))
    
  • Don't put everything into the global space, where it runs whenever you interact with the script at all. Put your code into functions, document them with a docstring, and execute them under an if __name__ == "__main__": guard, which allows you to import from this script from another script without your code running:

    def confusion_matrix(x, y):
        """Return the confusion matrix for two vectors `x` and `y`.
        x and y must only have values from 0 to n - 1 and 0 to m - 1, respectively.
        """
        n, m = np.max(x) + 1, np.max(y) + 1
        matrix = [[0] * m for _ in range(n)]
        for a, b in zip(x, y):
            matrix[a][b] += 1
        return matrix

    if __name__ == "__main__":
        x = ...
        y = ...
        print(np.array(confusion_matrix(x, y)))
    

Once you have come this far, you can just swap the implementation of this function to the faster numpy one without changing anything (except that it then directly returns a numpy.array instead of a list of lists).
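
One way that swap might look is sketched below. It keeps the same function name and signature and uses `np.add.at` for the counting. The assumption that both inputs are equal-length sequences of non-negative integers is mine, and the sample inputs are made up for illustration rather than taken from your data:

```python
import numpy as np


def confusion_matrix(x, y):
    """Return the confusion matrix for two label vectors `x` and `y`.

    Assumes both contain only non-negative integers and have equal length.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n, m = x.max() + 1, y.max() + 1
    matrix = np.zeros((n, m), dtype=int)
    # Unbuffered in-place add, so repeated (a, b) pairs are all counted.
    np.add.at(matrix, (x, y), 1)
    return matrix


if __name__ == "__main__":
    # Small hypothetical inputs, not the data from the question.
    print(confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 2]))
```

Callers do not need to change, apart from no longer having to wrap the result in `np.array(...)` before printing.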

answered May 6, 2019 at 6:58
\$\endgroup\$
