So here is the problem:
Given 2D numpy arrays 'a' and 'b' of sizes n×m and k×m respectively and one natural number 'p'. You need to find the distance (Euclidean) of the rows of the matrices 'a' and 'b'. Fill the results in the k×n matrix. Calculate the distance with the following formula $$ D(x, y) = \left( \sum_{i=1}^{m} \left| x_i - y_i \right|^p \right)^{1/p} ; \quad x, y \in \mathbb{R}^m $$ (try to prove that this is a distance). Extra points for writing without a loop.
And here is my solution:
import numpy as np

def dist_mat(a, b, p):
    result = []
    print(result)
    for vector in b:
        matrix = a - vector
        print(matrix)
        result.append(list(((matrix ** p).sum(axis=1))**(1/p)))
    return np.array(result)

a = np.array([[1, 1],
              [0, 1],
              [1, 3],
              [4, 5]])
b = np.array([[1, 1],
              [-1, 0]])
p = 2
print(dist_mat(a, b, p))
I'm not sure about using a Python list and then converting it into np.array. Is there a better way?
2 Answers
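np.array() takes any kind of nested sequence, so it isn't necessary to convert each row to a list before appending it to result. The formula also takes the absolute value of the difference, which matters when p is odd, so np.abs is applied here.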
def dist_mat(a, b, p):
    result = []
    for vector in b:
        matrix = np.abs(a - vector)
        result.append(((matrix ** p).sum(axis=1))**(1/p))
    return np.array(result)
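To see why the absolute value matters, here is a small sketch using the arrays from the question with p = 3 (an odd value chosen only for illustration). Without np.abs, the sum of cubes for the row [0, 1] against [1, 1] is negative, and NumPy's fractional power of a negative float gives nan.

```python
import numpy as np

a = np.array([[1, 1], [0, 1], [1, 3], [4, 5]])
b = np.array([[1, 1], [-1, 0]])
p = 3  # odd exponent, picked to expose the sign problem

diff = a - b[0]  # signed differences against the first row of b

# Without abs: a negative sum raised to 1/p yields nan (warning silenced here).
with np.errstate(invalid="ignore"):
    without_abs = (diff ** p).sum(axis=1) ** (1 / p)

# With abs: every term is non-negative, so the root is always defined.
with_abs = (np.abs(diff) ** p).sum(axis=1) ** (1 / p)

print(without_abs)
print(with_abs)
```

For even p the two versions agree, which is why the problem does not show up with p = 2.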
This is an old enough question that shooting for the extra points is hopefully not going to step on anyone's toes. So let me be the broadcast guy, i.e. this is a loop-free version.
It is perhaps better to first read either the numpy documentation on broadcasting, or a related answer of mine.
We start with solving the one-column case.
In this case the matrices are of size n x 1 and k x 1. We need to turn these into a matrix of size k x n.
Well, to get there by broadcasting, we need to take the transpose of one of the vectors. The problem calls for the first one to be transposed. Thus we have the matrix a.T of size 1 x n and b of size k x 1.
Then the solution is just
# shape is (k, n)
(np.abs(a.T - b) ** p) ** (1/p)
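As a quick check of the shapes, here is a minimal sketch with made-up single-column inputs (n = 3, k = 2; the values are only for illustration). Subtracting a (k, 1) array from a (1, n) array broadcasts to (k, n).

```python
import numpy as np

a = np.array([[1], [0], [4]])   # shape (n, 1) with n = 3
b = np.array([[1], [-1]])       # shape (k, 1) with k = 2
p = 2

# a.T has shape (1, n); subtracting b of shape (k, 1) broadcasts to (k, n).
result = (np.abs(a.T - b) ** p) ** (1 / p)

print(result.shape)  # (2, 3)
print(result)
```

With a single column there is nothing to sum over, so the result is just the absolute difference of every pair of entries.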
The case with multiple columns
The matrices are of size n x m and k x m. To be able to reuse the previous idea, we have to turn these into arrays of shape 1 x n x m and k x 1 x m. This can be done using np.expand_dims:
# shape is (k, n, m)
np.expand_dims(a, 0) - np.expand_dims(b, 1)
Looks good. All that remains is to take the absolute value, then the pth power, then sum up along the last dimension, and lastly take the pth root.
# shape is (k, n)
(np.abs(np.expand_dims(a, 0) - np.expand_dims(b, 1))**p).sum(axis=-1)**(1/p)
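Wrapped into a function, the expression can be compared against the loop-based version on the data from the question. This is only a sketch: the names dist_mat_broadcast and dist_mat_loop are made up here for illustration.

```python
import numpy as np

def dist_mat_broadcast(a, b, p):
    # (1, n, m) - (k, 1, m) broadcasts to (k, n, m)
    diff = np.expand_dims(a, 0) - np.expand_dims(b, 1)
    # Sum over the last axis (the m coordinates), leaving shape (k, n).
    return (np.abs(diff) ** p).sum(axis=-1) ** (1 / p)

def dist_mat_loop(a, b, p):
    # Reference version: one row of the result per row of b.
    return np.array([(np.abs(a - v) ** p).sum(axis=1) ** (1 / p) for v in b])

a = np.array([[1, 1], [0, 1], [1, 3], [4, 5]])
b = np.array([[1, 1], [-1, 0]])

# Both versions should agree for odd and even p alike.
for p in (1, 2, 3):
    assert np.allclose(dist_mat_broadcast(a, b, p), dist_mat_loop(a, b, p))

print(dist_mat_broadcast(a, b, 2).shape)  # (2, 4)
```

One thing to keep in mind is that the broadcast version materialises the full k x n x m intermediate array, so for large inputs the loop may use less memory.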
All insights are welcome here on CR - there's no need to worry about "stepping on toes"! This is certainly a good way to simplify the code, and shows the reasoning as to how it works and why it's better - exactly what we want from a good answer. Keep them coming! – Toby Speight, May 16, 2021 at 12:35
@TobySpeight right, thank you. I never seem to know when is a complete rewrite appropriate as a review. Especially when the question smells like homework. – Eman Yalpsid, May 16, 2021 at 17:04
matrix = abs(a - vector) according to the provided formula. Doesn't matter for an even value of p, but does for odd values. Might be wrong though.
scipy.spatial.distance_matrix() for testing. docs.scipy.org/doc/scipy/reference/generated/…