Commit 4563b3c

Illustration Hungarian algorithm
1 parent 01af34c commit 4563b3c

1 file changed: +39 −5 lines changed

OptimalTransportWasserteinDistance.ipynb
Lines changed: 39 additions & 5 deletions
@@ -27,6 +27,7 @@
 "\n",
 "This notebook has been inspired by the following sources:\n",
 "* [this wikipedia article](https://en.wikipedia.org/wiki/Wasserstein_metric)\n",
+"* Great notebook from Zhengping Jiang [here](https://zipjiang.github.io/2020/11/23/sinkhorn's-theorem-,-sinkhorn-algorithm-and-applications.html)\n",
 "* A very nice series of blog articles:\n",
 "    * [introduction](http://modelai.gettysburg.edu/2020/wgan/Resources/Lesson4/IntuitiveGuideOT1.htm)\n",
 "    * [Wasserstein GAN](http://modelai.gettysburg.edu/2020/wgan/Resources/Lesson4/IntuitiveGuideOT.htm)\n"
@@ -301,24 +302,33 @@
 },
 {
 "cell_type": "code",
-"execution_count": 10,
+"execution_count": 7,
 "metadata": {},
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"[1 0 2]\n",
+"[0 1 2] [1 0 2]\n",
+"[1 2 2]\n",
 "5\n"
 ]
 }
 ],
 "source": [
 "import numpy as np\n",
+"# i-th row is the i-th worker and j-th column is the j-th job,\n",
+"# i.e. cost = (worker_0, worker_1, worker_2):\n",
+"# ( worker_0 )   (4, 1, 3)\n",
+"# ( worker_1 ) = (2, 0, 5)\n",
+"# ( worker_2 )   (3, 2, 2)\n",
 "cost = np.array([[4, 1, 3], [2, 0, 5], [3, 2, 2]])\n",
 "from scipy.optimize import linear_sum_assignment\n",
-"row_ind, col_ind = linear_sum_assignment(cost)\n",
-"print(col_ind)\n",
+"row_ind, col_ind = linear_sum_assignment(cost, maximize=False)\n",
+"# result is (array([0, 1, 2]), array([1, 0, 2])), meaning\n",
+"# worker 0 -> job 1, worker 1 -> job 0, worker 2 -> job 2\n",
 "print(row_ind, col_ind)\n",
 "print(cost[row_ind, col_ind])\n",
 "print(cost[row_ind, col_ind].sum())"
 ]
 },
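The expected output in this hunk can be checked without the scipy dependency: a brute-force search over all worker-to-job permutations (a minimal pure-stdlib sketch, not part of the notebook) recovers the same optimal assignment and total cost of 5 that `linear_sum_assignment` reports.

```python
from itertools import permutations

# Same cost matrix as in the cell: row i = worker i, column j = job j
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]

def brute_force_assignment(cost):
    """Try every one-to-one assignment of workers to jobs, keep the cheapest."""
    n = len(cost)
    best_cols, best_total = None, float("inf")
    for cols in permutations(range(n)):
        total = sum(cost[i][cols[i]] for i in range(n))
        if total < best_total:
            best_cols, best_total = list(cols), total
    return best_cols, best_total

cols, total = brute_force_assignment(cost)
print(cols, total)  # [1, 0, 2] 5 -- matches col_ind and the printed sum above
```

This is O(n!) and only useful as a sanity check on small instances; the Hungarian algorithm behind `linear_sum_assignment` solves the same problem in polynomial time.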
@@ -435,6 +445,30 @@
 "out = solver(prob)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### How is the Sinkhorn algorithm related to OT?\n",
+"So how is computing a doubly stochastic matrix related to optimal transport?\n",
+"\n",
+"We will reformulate our optimal transport problem in a slightly different matrix form: this time we take N warehouses and also N customers (K=N in our original problem). The cost $c(x,y)$ is still the distance between the storage area $x$ and the address of customer $y,ドル but we store those costs in a matrix $M \\in \\mathbb{R}_+^{N\\times N}$ with entries $m_{i,j}=c(x_i,y_j)$.\n",
+"\n",
+"Now suppose our e-readers are distributed according to a distribution $r$ among the N warehouses, and we would like them to end up distributed according to another distribution $c$ over the customers.\n",
+"We can then write a non-negative matrix $A \\in \\mathbb{R}_+^{N\\times N},ドル where $a_{i,j}$ is the fraction of the total e-readers we transport from warehouse $i$ to customer $j$. Another way to view $A$ is as a joint distribution over the transportation space, with $r$ and $c$ as its marginals of starting points and destinations respectively. It is now not hard to see that the total transportation cost is the sum of the element-wise product of $A$ and $M,ドル in other words the Frobenius inner product $\\langle A, M \\rangle$. This is called the Kantorovich relaxation of OT.\n",
+"\n",
+"If $M$ is a distance matrix, that is, if its entries satisfy the three properties of a distance, then the minimal inner product is itself a distance; confirming this is not hard. Making the transport optimal means finding the minimum total transport cost, that is, minimizing the Frobenius inner product.\n",
+"\n",
+"Since we now interpret $A$ as a joint probability matrix, we can define its entropy, the entropies of its marginals, and the KL-divergence between two different transportation matrices. These take the form\n",
+"\n",
+"\\begin{align*}\n",
+"    \\text{Entropy} &= H(A) &= -\\sum_{i,j} a_{i,j} \\log(a_{i,j}) \\\\\n",
+"    \\text{Marginal source distribution entropy} &= H(r) &= -\\sum_{i} \\left( \\sum_{j} a_{i,j} \\right) \\log\\left( \\sum_{j} a_{i,j} \\right)\\\\\n",
+"    \\text{Marginal destination distribution entropy} &= H(c) &= -\\sum_{j} \\left( \\sum_{i} a_{i,j} \\right) \\log\\left( \\sum_{i} a_{i,j} \\right)\\\\\n",
+"    \\text{KL-divergence between transport plans A and B} &= D_{KL}(A \\| B) &= \\sum_{i,j} a_{i,j} \\log\\frac{a_{i,j}}{b_{i,j}}\n",
+"\\end{align*}\n"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
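The quantities defined in the markdown cell above (Frobenius inner product cost, plan entropy, marginals, KL-divergence) can be computed concretely. A minimal sketch, assuming a toy 2×2 transport plan `A` and cost matrix `M` that are not taken from the notebook:

```python
import math

# Toy transport plan A (a joint distribution over warehouse/customer pairs)
# and a cost matrix M; 2x2 purely for illustration.
A = [[0.3, 0.2],
     [0.1, 0.4]]
M = [[0.0, 1.0],
     [1.0, 0.0]]

# Total transport cost: Frobenius inner product <A, M>
cost = sum(A[i][j] * M[i][j] for i in range(2) for j in range(2))

# Entropy of the plan: H(A) = -sum_ij a_ij log a_ij
H_A = -sum(a * math.log(a) for row in A for a in row if a > 0)

# Marginals: r = row sums (sources), c = column sums (destinations)
r = [sum(row) for row in A]
c = [sum(A[i][j] for i in range(2)) for j in range(2)]

# KL-divergence between plan A and the independent plan B = r (x) c;
# it is zero exactly when A couples sources and destinations independently.
B = [[r[i] * c[j] for j in range(2)] for i in range(2)]
kl = sum(A[i][j] * math.log(A[i][j] / B[i][j])
         for i in range(2) for j in range(2) if A[i][j] > 0)

print(round(cost, 6), [round(x, 6) for x in r], [round(x, 6) for x in c])
# 0.3 [0.5, 0.5] [0.4, 0.6]
```

Note that `r` and `c` fall out of `A` as its marginals, exactly as in the Kantorovich formulation, and the non-negativity of `kl` illustrates Gibbs' inequality.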
@@ -955,7 +989,7 @@
 "This chapter is inspired by the Uni Heidelberg series of courses on Optimal Transport [part 3](https://www.youtube.com/watch?v=BfOjrQAhG4M&ab_channel=UniHeidelberg), itself inspired by [Vincent Herrmann's great blog article](https://vincentherrmann.github.io/blog/wasserstein/)\n",
 "\n",
 "### Introduction\n",
-"Sinkhorn iteration is an approximate method to solve the optinal transport problem in its discrete formulation as seen before, where we solved the OT with linear programming"
+"Sinkhorn iteration is an approximate method to solve the optimal transport problem in its discrete formulation as seen before, where we solved the OT with linear programming"
 ]
 },
 {
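To make the Sinkhorn introduction concrete, here is a minimal sketch of the iteration for entropy-regularized OT (toy 2×2 cost matrix, uniform marginals; an illustration under these assumptions, not the notebook's own implementation): alternately rescale the rows and columns of `K = exp(-M/reg)` until the plan's marginals match `r` and `c`.

```python
import math

def sinkhorn(M, r, c, reg=0.1, n_iter=200):
    """Sinkhorn iterations: alternately rescale rows and columns of
    K = exp(-M/reg) until the transport plan has marginals r and c."""
    n, m = len(M), len(M[0])
    K = [[math.exp(-M[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # u-update: rescale so row sums match the source marginal r
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # v-update: rescale so column sums match the destination marginal c
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # transport plan P_ij = u_i * K_ij * v_j
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Cheap moves on the diagonal, expensive off it, uniform marginals:
# the plan concentrates its mass on the diagonal.
M = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn(M, r=[0.5, 0.5], c=[0.5, 0.5])
print([[round(x, 3) for x in row] for row in P])
```

The result is only an approximation of the unregularized OT plan; as `reg` shrinks it approaches the linear-programming solution seen earlier, at the price of slower convergence.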
