-
Hi. I can't seem to find a module to calculate the distance between two matrices (with positive values and equal sums of elements). It would be great to have one.
-
Hello @kiasar,
The Gromov-Wasserstein distance can be used to compare any pair of matrices, including those you've mentioned. POT includes several solvers for estimating this distance, e.g. ot.gromov.gromov_wasserstein, as well as related features such as the Gromov-Wasserstein barycenter.
I recommend you take a look at the gallery of examples on this subject, e.g. https://pythonot.github.io/auto_examples/gromov/plot_fgw.html#sphx-glr-auto-examples-gromov-plot-fgw-py
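As an illustration added to this write-up (not part of the original reply), here is a minimal sketch of comparing two square matrices with this solver; the random symmetric matrices, uniform node weights, and square loss below are assumptions chosen for the demo.

```python
import numpy as np
import ot

rng = np.random.RandomState(0)

# Two arbitrary symmetric "structure" matrices, possibly of different sizes.
C1 = rng.rand(10, 10)
C1 = (C1 + C1.T) / 2
C2 = rng.rand(12, 12)
C2 = (C2 + C2.T) / 2

# Uniform weights over the rows (nodes) of each matrix.
p = ot.unif(C1.shape[0])
q = ot.unif(C2.shape[0])

# Optimal coupling between the two structures ...
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
# ... and the corresponding scalar GW discrepancy.
gw_dist = ot.gromov.gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss')
print(T.shape, gw_dist)
```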
-
Hi @cedricvincentcuaz,
Thank you for pointing me to the Gromov-Wasserstein distance and the resources associated with it.
I initially approached this by viewing the matrix cells as (i, j) points in a 2D space and then used ot.emd() to compute the "cost" as the distance between matrices. Given your expertise in this domain, I'd love to know your opinion: would the ot.gromov.gromov_wasserstein method offer advantages or be more appropriate for this task than my approach? (My interest is to cluster the matrices.)
Thank you for your guidance.
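For context, here is a minimal sketch of what the approach described above might look like; the shared grid of cell coordinates, the squared Euclidean ground cost, and the helper name emd_between_matrices are assumptions for illustration, not the exact code that was used.

```python
import numpy as np
import ot

def emd_between_matrices(A, B):
    """Treat each cell (i, j) as a 2D point carrying that cell's mass,
    then compute the exact EMD between the two weighted point clouds.
    Assumes A and B have the same shape and strictly positive sums."""
    n, m = A.shape
    # Grid of (i, j) coordinates shared by both matrices.
    coords = np.array([(i, j) for i in range(n) for j in range(m)], dtype=float)
    # Normalise the matrices into probability distributions over the cells.
    a = A.ravel() / A.sum()
    b = B.ravel() / B.sum()
    # Ground cost: squared Euclidean distance between cell positions.
    M = ot.dist(coords, coords)
    # ot.emd2 returns the optimal transport cost (ot.emd would return the plan).
    return ot.emd2(a, b, M)

rng = np.random.RandomState(0)
A = rng.rand(8, 8)
B = rng.rand(8, 8)
print(emd_between_matrices(A, B))
```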
-
In most applications, it is relevant to see a matrix as a graph, where the entry at position (i, j) encodes the relation between nodes i and j.
If I understood correctly what you did, it comes down to flattening each matrix into a distribution over its entries and comparing these distributions with ot.emd().
Doing so, you destroy the spatial arrangement of the graphs. Imagine that you have 2 graphs with binary connections and the same number of nodes: it suffices that both graphs have K non-zero connections for your distance between them to vanish, even if their structures are completely different.
However, the Gromov-Wasserstein (GW) distance would be zero iff these graphs have the same spatial arrangement up to a permutation of the nodes within each graph. If this invariance makes sense for your matrices, GW would clearly be a better choice.
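To make the point about invariance concrete, here is a small sketch added to this write-up (not from the original reply): a 4-node path graph and a 4-node star graph both have 3 edges, so comparing only their flattened entries cannot distinguish them, while GW can; the specific graphs and uniform weights are assumptions for the demo.

```python
import numpy as np
import ot

# A 4-node path graph and a 4-node star graph: both have 3 edges.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]], dtype=float)

# Comparing only the sorted (flattened) entries cannot tell them apart.
print(np.allclose(np.sort(path.ravel()), np.sort(star.ravel())))  # True

p = ot.unif(4)

# GW distinguishes them because their spatial arrangements differ ...
print(ot.gromov.gromov_wasserstein2(path, star, p, p, loss_fun='square_loss'))  # > 0

# ... but is close to zero for a node permutation of the same graph
# (up to solver tolerance, when the solver recovers the permutation).
perm = np.array([2, 0, 3, 1])
path_perm = path[np.ix_(perm, perm)]
print(ot.gromov.gromov_wasserstein2(path, path_perm, p, p, loss_fun='square_loss'))
```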
Then, to cluster a set of several matrices, you could e.g. perform GW-based dictionary learning (https://pythonot.github.io/auto_examples/gromov/plot_gromov_wasserstein_dictionary_learning.html#sphx-glr-auto-examples-gromov-plot-gromov-wasserstein-dictionary-learning-py).
-
Firstly, thank you so much for the insights you've shared. I truly appreciate your expertise.
I wanted to touch base with you on a couple of things:
- I've been working with your plot2D_samples_mat() function and thought of an enhancement that might be beneficial. I introduced a boolean draw_arrows parameter to allow for directional arrows in the plots. I found it useful for my work, and I've opened a merge request for it. I'd love for you to take a look when you have a moment (a rough sketch of the idea follows this list).
- While going through your "Plot Fused-Gromov-Wasserstein" documentation, I noticed a potential oversight. It seems that all the pl.scatter(y, x) instances might need to be corrected to pl.scatter(x, y), as the x-coordinate comes first in pl.scatter.
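Here is a rough sketch of the proposed enhancement; draw_arrows is the flag proposed in the merge request, not an existing POT parameter, and the standalone helper below is a hypothetical illustration rather than the actual PR code.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_samples_with_arrows(xs, xt, G, thr=1e-8, draw_arrows=True):
    """Like ot.plot.plot2D_samples_mat, draw one connection per coupling
    entry above `thr`, but optionally as an arrow from source to target."""
    mx = G.max()
    for i in range(xs.shape[0]):
        for j in range(xt.shape[0]):
            if G[i, j] / mx > thr:
                if draw_arrows:
                    # Arrow from source sample i to target sample j,
                    # with opacity proportional to the transported mass.
                    plt.annotate('', xy=tuple(xt[j]), xytext=tuple(xs[i]),
                                 arrowprops=dict(arrowstyle='->',
                                                 alpha=float(G[i, j] / mx)))
                else:
                    plt.plot([xs[i, 0], xt[j, 0]], [xs[i, 1], xt[j, 1]],
                             alpha=float(G[i, j] / mx), color='k')
    plt.scatter(xs[:, 0], xs[:, 1], label='source')
    plt.scatter(xt[:, 0], xt[:, 1], label='target')
    plt.legend()
```

It would be used like the existing helper: pass source samples xs, target samples xt, and a coupling G such as the one returned by ot.emd.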
I've been looking into the "Gromov-Wasserstein" method, and it seems like it's designed for situations a bit trickier than ours. Our challenge feels a bit more straightforward.
Just to paint a clearer picture: We've got thousands of these square matrices that are all the same size and with the same grand sum. When we show them as colorful heatmaps, you can easily spot some repeating designs. Our big question is: how do we cluster these? Here's what they look like:
[image: heatmaps of the matrices]
My initial thought was to plot these patterns in a simple way and use ot.emd() to see how different they are from each other. But, as you might have guessed, it didn't quite hit the mark. Here's what that looked like:
[image: result of the ot.emd() comparison]
Given everything I've shared, do you still think the Gromov-Wasserstein method is a suitable approach?
-
Hello @kiasar,
For 1), I find these arrow plots clearly less readable than the plot on the left, but I guess it is a matter of habit and taste, so it might indeed be a relevant new feature. For 2), I believe it was simply a rendering issue, but thank you for pointing these out.
For your original problem, since all matrices have the same size, GW is mostly relevant if you want to find an optimal permutation to align the compared matrices. Otherwise, if the matrices are already aligned, i.e. for two matrices it makes sense to directly compare M_1[i, j] and M_2[i, j], you should rather compare them using distances based on element-wise operations, such as the Euclidean distance or, e.g., the Aitchison distance, which might be more suitable given your simplex constraints. Then use any kind of clustering algorithm that follows the geometry you chose.
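To illustrate this suggestion (a sketch added to this write-up, not code from the thread): the Aitchison distance between two positive matrices with equal grand sum equals the Euclidean distance between their centred log-ratio (clr) transforms, so one can embed each matrix with clr and feed the embeddings to any standard clustering algorithm. The random data, the epsilon guard, and the choice of k-means below are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def clr(M, eps=1e-12):
    """Centred log-ratio transform of a positive matrix, flattened to a vector.
    The Euclidean distance between clr vectors is the Aitchison distance."""
    x = M.ravel() + eps            # guard against exact zeros (assumption)
    logx = np.log(x)
    return logx - logx.mean()

# Assumed data: a stack of same-size positive matrices to cluster.
rng = np.random.RandomState(0)
matrices = rng.rand(100, 16, 16) + 0.1

# Embed every matrix in Aitchison geometry and cluster the embeddings.
X = np.stack([clr(M) for M in matrices])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])
```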
Best.