Finding distance between vectors of matrices

Question 1

So here is the problem:

Given 2D numpy arrays 'a' and 'b' of sizes n×m and k×m respectively and one natural number 'p'. You need to find the distance (Euclidean) between the rows of the matrices 'a' and 'b'. Fill the results in the k×n matrix. Calculate the distance with the following formula $$ D(x, y) = \left( \sum _{i=1} ^{m} \left| x_i - y_i \right|^p \right) ^{1/p} ; x,y \in R^m $$ (try to prove that this is a distance). Extra points for writing without a loop.

And here is my solution:

import numpy as np

def dist_mat(a, b, p):
    result = []
    print(result)
    for vector in b:
        matrix = a - vector
        print(matrix)
        result.append(list(((matrix ** p).sum(axis=1))**(1/p)))
    return np.array(result)

a = np.array([[1, 1],
              [0, 1],
              [1, 3],
              [4, 5]])
b = np.array([[1, 1],
              [-1, 0]])
p = 2
print(dist_mat(a, b, p))

I'm not sure about building a Python list and then converting it into an np.array. Is there a better way?

Question 2

Does this code pass all the testcases? It seems to me that you would want matrix = abs(a - vector) according to the provided formula. Doesn't matter for an even value p, but does for odd values. Might be wrong though.
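A quick way to see the point about odd p is to try rows whose difference is negative. The sketch below uses made-up inputs and two hypothetical helper names, dist_no_abs and dist_with_abs. Without the absolute value the sum of cubes is negative, and NumPy returns nan when a negative float is raised to a fractional power.

```python
import numpy as np

def dist_no_abs(a, b, p):
    # Hypothetical helper: the formula coded without the absolute value.
    return np.array([((a - v) ** p).sum(axis=1) ** (1 / p) for v in b])

def dist_with_abs(a, b, p):
    # Hypothetical helper: the same loop with the absolute value from the formula.
    return np.array([(np.abs(a - v) ** p).sum(axis=1) ** (1 / p) for v in b])

a = np.array([[0, 0]])
b = np.array([[1, 1]])

# Silence the "invalid value" warning raised by the negative base.
with np.errstate(invalid="ignore"):
    without_abs = dist_no_abs(a, b, 3)
with_abs = dist_with_abs(a, b, 3)

print(without_abs)  # nan, because the sum of cubes is -2
print(with_abs)     # the cube root of 2
```

For even p both helpers agree, which is why the example in the question (p = 2) does not expose the problem.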

Question 3

Yeah, that part was incorrect, thank you for the correction.

Question 4

I know you want your own solution from scratch, but you might have a look at scipy.spatial.distance_matrix() for testing. docs.scipy.org/doc/scipy/reference/generated/…
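A minimal sketch of such a cross-check, assuming SciPy is installed. The dist_mat here is the loop version with the absolute value included; the test values of p are arbitrary. Note the argument order: distance_matrix(x, y, p) returns one row per row of x, so b goes first to get a k×n result.

```python
import numpy as np
from scipy.spatial import distance_matrix

def dist_mat(a, b, p):
    # Loop version with the absolute value included.
    return np.array([(np.abs(a - v) ** p).sum(axis=1) ** (1 / p) for v in b])

a = np.array([[1, 1], [0, 1], [1, 3], [4, 5]])
b = np.array([[1, 1], [-1, 0]])

for p in (1, 2, 3):
    # One row per row of b, one column per row of a: shape (k, n).
    expected = distance_matrix(b, a, p=p)
    assert np.allclose(dist_mat(a, b, p), expected)
```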

RootTwo · Answer 1 · 2021-04-16 06:44:24Z

np.array() takes any kind of nested sequence. So, it isn't necessary to convert to a list before appending to result.

def dist_mat(a, b, p):
    result = []
    for vector in b:
        matrix = np.abs(a - vector)
        result.append(((matrix ** p).sum(axis=1))**(1/p))
    return np.array(result)
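If the intermediate Python list is still a concern, one alternative is to allocate the result array up front and fill it row by row. This is only a sketch: the name dist_mat_prealloc and the conversion to float are assumptions made for the example, and the loop over b is unchanged.

```python
import numpy as np

def dist_mat_prealloc(a, b, p):
    # Hypothetical variant: allocate the (k, n) result once, then fill each row.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    result = np.empty((b.shape[0], a.shape[0]))
    for i, vector in enumerate(b):
        # Distances from row i of b to every row of a.
        result[i] = (np.abs(a - vector) ** p).sum(axis=1) ** (1 / p)
    return result

a = np.array([[1, 1], [0, 1], [1, 3], [4, 5]])
b = np.array([[1, 1], [-1, 0]])
print(dist_mat_prealloc(a, b, 2))
```

With the inputs from the question, the first row of the result is 0, 1, 2, 5, since the first row of b equals the first row of a and the last difference is the 3-4-5 triangle.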

Eman Yalpsid · Answer 2 · 2021-05-16 10:29:29Z

This is an old enough question that shooting for the extra points is hopefully not going to step on anyone's toes. So let me be the broadcast guy, i.e. this is a loop-free version.

It is perhaps better to first read either the numpy documentation on broadcasting, or a related answer of mine.

We start with solving the one-column case.

In this case the matrices are of size n x 1 and k x 1. We need to turn these into a matrix of size k x n.

Well, to get there by broadcasting, we need to take the transpose of one of the vectors. The problem calls for the first one to be transposed. Thus we have the matrix a.T of size 1 x n and b of size k x 1.

Then the solution is just

 # shape is (k, n)
 (np.abs(a.T - b) ** p) ** (1/p)
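To make the shapes concrete, here is a small sketch of the one-column case with made-up values (n = 3, k = 2, p = 3). With a single column the pth power and pth root cancel, so the result is just the absolute difference.

```python
import numpy as np

a = np.array([[1], [4], [6]])   # shape (n, 1) with n = 3
b = np.array([[0], [2]])        # shape (k, 1) with k = 2
p = 3

# a.T has shape (1, n); subtracting b of shape (k, 1) broadcasts to (k, n).
d = (np.abs(a.T - b) ** p) ** (1 / p)

print(d.shape)  # (2, 3)
print(d)        # approximately [[1, 4, 6], [1, 2, 4]]
```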

The case with multiple columns

The matrices are of size n x m and k x m. To be able to reuse the previous idea, we have to turn these into arrays of shape 1 x n x m and k x 1 x m. This can be done using np.expand_dims:

 # shape is (k, n, m)
 np.expand_dims(a, 0) - np.expand_dims(b, 1)

Looks good. All that remains is to take the absolute value, raise it to the pth power, sum along the last dimension, and finally take the pth root.

 # shape is (k, n)
 (np.abs(np.expand_dims(a, 0) - np.expand_dims(b, 1))**p).sum(axis=-1)**(1/p)
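Wrapped into a function, the expression above can be checked against the loop version. The following is a sketch with hypothetical names (dist_mat_broadcast, dist_mat_loop) and random test data, not a definitive implementation.

```python
import numpy as np

def dist_mat_broadcast(a, b, p):
    # a: (n, m), b: (k, m) -> pairwise differences of shape (k, n, m).
    diff = np.expand_dims(a, 0) - np.expand_dims(b, 1)
    # Sum over the last axis (the m coordinates), then take the pth root.
    return (np.abs(diff) ** p).sum(axis=-1) ** (1 / p)

def dist_mat_loop(a, b, p):
    # Reference version: one pass per row of b.
    return np.array([(np.abs(a - v) ** p).sum(axis=1) ** (1 / p) for v in b])

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 3))  # n = 5, m = 3
b = rng.normal(size=(4, 3))  # k = 4, m = 3

for p in (1, 2, 3):
    assert dist_mat_broadcast(a, b, p).shape == (4, 5)
    assert np.allclose(dist_mat_broadcast(a, b, p), dist_mat_loop(a, b, p))
```

One trade-off to keep in mind: the broadcast version materializes a temporary array of shape (k, n, m), so for large inputs it can use far more memory than the loop, which only holds an (n, m) array at a time. Indexing with a[None, :, :] and b[:, None, :] is an equivalent way to add the extra axes.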

All insights are welcome here on CR - there's no need to worry about "stepping on toes"! This is certainly a good way to simplify the code, and shows the reasoning as to how it works and why it's better - exactly what we want from a good answer. Keep them coming!

@TobySpeight right, thank you. I never seem to know when a complete rewrite is appropriate as a review, especially when the question smells like homework.
