|
27 | 27 | "\n",
|
28 | 28 | "This notebook has been inspired by the following sources:\n",
|
29 | 29 | "* [this wikipedia article](https://en.wikipedia.org/wiki/Wasserstein_metric)\n",
|
|  30 | + "* A great notebook by Zhengping Jiang [here](https://zipjiang.github.io/2020/11/23/sinkhorn's-theorem-,-sinkhorn-algorithm-and-applications.html)\n", |
30 | 31 | "* A very nice series of blog articles:\n",
|
31 | 32 | " * [introduction](http://modelai.gettysburg.edu/2020/wgan/Resources/Lesson4/IntuitiveGuideOT1.htm)\n",
|
32 | 33 | "  * [Wasserstein GAN](http://modelai.gettysburg.edu/2020/wgan/Resources/Lesson4/IntuitiveGuideOT.htm)\n"
|
|
301 | 302 | },
|
302 | 303 | {
|
303 | 304 | "cell_type": "code",
|
304 | | - "execution_count": 10, |
| 305 | + "execution_count": 7, |
305 | 306 | "metadata": {},
|
306 | 307 | "outputs": [
|
307 | 308 | {
|
308 | 309 | "name": "stdout",
|
309 | 310 | "output_type": "stream",
|
310 | 311 | "text": [
|
311 | | - "[1 0 2]\n", |
| 312 | + "[0 1 2] [1 0 2]\n", |
| 313 | + "[1 2 2]\n", |
312 | 314 | "5\n"
|
313 | 315 | ]
|
314 | 316 | }
|
315 | 317 | ],
|
316 | 318 | "source": [
|
317 | 319 | "import numpy as np\n",
|
|  320 | + "# The i-th row is the i-th worker and the j-th column is the j-th job,\n", |
|  321 | + "# ie              (job_0, job_1, job_2)\n", |
|  322 | + "# ( worker_0 )    (  4,     1,     3  )\n", |
|  323 | + "# ( worker_1 )  = (  2,     0,     5  )\n", |
|  324 | + "# ( worker_2 )    (  3,     2,     2  )\n", |
318 | 325 | "cost = np.array([[4, 1, 3], [2, 0, 5], [3, 2, 2]])\n",
|
319 | 326 | "from scipy.optimize import linear_sum_assignment\n",
|
320 | | - "row_ind, col_ind = linear_sum_assignment(cost)\n", |
321 | | - "print(col_ind)\n", |
| 327 | + "row_ind, col_ind = linear_sum_assignment(cost, maximize=False)\n", |
| 328 | + "# result is (array([0, 1, 2]), array([1, 0, 2]))\n", |
|  329 | + "# Meaning: worker 0 gets job 1, worker 1 gets job 0, worker 2 gets job 2\n", |
| 330 | + "print(row_ind, col_ind)\n", |
| 331 | + "print(cost[row_ind, col_ind])\n", |
322 | 332 | "print(cost[row_ind, col_ind].sum())"
|
323 | 333 | ]
|
324 | 334 | },
|
|
435 | 445 | "out = solver(prob)"
|
436 | 446 | ]
|
437 | 447 | },
|
| 448 | + { |
| 449 | + "cell_type": "markdown", |
| 450 | + "metadata": {}, |
| 451 | + "source": [ |
|  452 | + "### How is the Sinkhorn algorithm related to OT?\n", |
| 453 | + "So how is calculating a doubly stochastic matrix related to optimal transport?\n", |
| 454 | + "\n", |
|  455 | + "We will reformulate our optimal transport problem in a slightly different matrix form: this time, we take N warehouses and also N customers (K=N from our original problem). The cost is still the distance $c(x,y)$ between the storage area $x$ and the address of customer $y,ドル but we store these costs in a matrix $M \\in \\mathbb{R}_{+}^{N\\times N}$ whose entry is $m_{i,j}=c(x_i,y_j)$.\n", |
| 456 | + "\n", |
|  457 | + "Now suppose our e-readers are distributed according to a distribution $r$ among the N warehouses, and we would like them to end up distributed according to another distribution $c$ over the N customers.\n", |
|  458 | + "We can then write a non-negative matrix $A \\in \\mathbb{R}_{+}^{N\\times N},ドル where $a_{i,j}$ is the fraction of the total e-readers we would like to transport from warehouse $i$ to customer $j$. Another way to view this matrix is that $A$ defines a joint distribution over the transportation space, with $r$ and $c$ as its marginals of starting points and destinations respectively. It is then not hard to see that the total transportation cost is the sum of the element-wise product between $A$ and $M,ドル in other words the Frobenius inner product $\\langle A, M \\rangle$. This formulation is called the Kantorovich relaxation of OT.\n", |
| 459 | + "\n", |
|  460 | + "It can be shown that if $M$ is a distance matrix, that is, its entries conform to the three properties of a distance, then the minimal value of this inner product is itself a distance. Confirming this is not hard. Making the transport optimal means finding the coupling with the minimum total transport cost, that is, minimizing the Frobenius inner product over all admissible $A$.\n", |
| 461 | + "\n", |
|  462 | + "Let us pause and note that, since we now interpret $A$ as a joint probability matrix, we can define its entropy, the entropies of its marginals, and the KL-divergence between two different transportation matrices. These take the form\n", |
| 463 | + "\n", |
|  464 | + "\\begin{align*}\n", |
|  465 | + "    \\text{Entropy: } & H(A) = -\\sum_{i,j} a_{i,j} \\log(a_{i,j}) \\\\\n", |
|  466 | + "    \\text{Source marginal entropy: } & H(r) = -\\sum_{i} \\left( \\sum_{j} a_{i,j} \\right) \\log\\left( \\sum_{j} a_{i,j} \\right)\\\\\n", |
|  467 | + "    \\text{Destination marginal entropy: } & H(c) = -\\sum_{j} \\left( \\sum_{i} a_{i,j} \\right) \\log\\left( \\sum_{i} a_{i,j} \\right)\\\\\n", |
|  468 | + "    \\text{KL-divergence between transportation matrices } A \\text{ and } B\\text{: } & KL(A \\| B) = \\sum_{i,j} a_{i,j} \\log\\frac{a_{i,j}}{b_{i,j}}\n", |
|  469 | + "\\end{align*}\n" |
| 470 | + ] |
| 471 | + }, |
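|   | + { |
|   | +  "cell_type": "markdown", |
|   | +  "metadata": {}, |
|   | +  "source": [ |
|   | +   "To make these quantities concrete, the next cell is a small numerical sketch (an illustrative addition, using a hypothetical coupling matrix `A` and a uniform reference coupling `B`): it computes the transport cost as the Frobenius inner product $\\langle A, M \\rangle,ドル the marginals $r$ and $c,ドル their entropies, and the KL-divergence between `A` and `B`." |
|   | +  ] |
|   | + }, |
|   | + { |
|   | +  "cell_type": "code", |
|   | +  "execution_count": null, |
|   | +  "metadata": {}, |
|   | +  "outputs": [], |
|   | +  "source": [ |
|   | +   "import numpy as np\n", |
|   | +   "\n", |
|   | +   "# Hypothetical cost matrix M and coupling matrix A (entries of A sum to 1)\n", |
|   | +   "M = np.array([[4., 1., 3.], [2., 0., 5.], [3., 2., 2.]])\n", |
|   | +   "A = np.array([[0.0, 0.3, 0.0], [0.3, 0.0, 0.0], [0.0, 0.0, 0.4]])\n", |
|   | +   "B = np.full((3, 3), 1.0 / 9.0)  # uniform coupling used as a reference\n", |
|   | +   "\n", |
|   | +   "# Transport cost = Frobenius inner product <A, M>\n", |
|   | +   "print('cost <A, M> =', np.sum(A * M))\n", |
|   | +   "\n", |
|   | +   "# Marginals: r over warehouses (row sums), c over customers (column sums)\n", |
|   | +   "r = A.sum(axis=1)\n", |
|   | +   "c = A.sum(axis=0)\n", |
|   | +   "\n", |
|   | +   "def entropy(P):\n", |
|   | +   "    P = P[P > 0]  # drop zero entries, using the convention 0 log 0 = 0\n", |
|   | +   "    return -np.sum(P * np.log(P))\n", |
|   | +   "\n", |
|   | +   "def kl(P, Q):\n", |
|   | +   "    mask = P > 0\n", |
|   | +   "    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))\n", |
|   | +   "\n", |
|   | +   "print('H(A) =', entropy(A), ' H(r) =', entropy(r), ' H(c) =', entropy(c))\n", |
|   | +   "print('KL(A || B) =', kl(A, B))" |
|   | +  ] |
|   | + }, |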
438 | 472 | {
|
439 | 473 | "cell_type": "markdown",
|
440 | 474 | "metadata": {},
|
|
955 | 989 | "This chapter is inspired by the Uni Heidelberg series of courses on Optimal Transport [part 3](https://www.youtube.com/watch?v=BfOjrQAhG4M&ab_channel=UniHeidelberg), itself inspired by [Vincent Herrmann's great blog article](https://vincentherrmann.github.io/blog/wasserstein/)\n",
|
956 | 990 | "\n",
|
957 | 991 | "### Introduction\n",
|
958 | | - "Sinkhorn iteration is an approximate method to solve the optinal transport problem in its discrete formulation as seen before, where we solved the OT with linear programming" |
|  992 | + "Sinkhorn iteration is an approximate method to solve the optimal transport problem in its discrete formulation seen before, which we solved with linear programming." |
959 | 993 | ]
|
960 | 994 | },
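|   | + { |
|   | +  "cell_type": "markdown", |
|   | +  "metadata": {}, |
|   | +  "source": [ |
|   | +   "Before going into the derivation, here is a minimal sketch of the iteration (an illustrative addition, assuming an entropic regularization parameter `eps` and marginal distributions `r` and `c`): starting from the Gibbs kernel $K = e^{-M/\\varepsilon},ドル we alternately rescale its rows and columns until the coupling matches both marginals." |
|   | +  ] |
|   | + }, |
|   | + { |
|   | +  "cell_type": "code", |
|   | +  "execution_count": null, |
|   | +  "metadata": {}, |
|   | +  "outputs": [], |
|   | +  "source": [ |
|   | +   "import numpy as np\n", |
|   | +   "\n", |
|   | +   "def sinkhorn(M, r, c, eps=0.1, n_iters=200):\n", |
|   | +   "    # Entropy-regularized OT via alternating row/column rescaling (sketch)\n", |
|   | +   "    K = np.exp(-M / eps)             # Gibbs kernel\n", |
|   | +   "    u = np.ones_like(r)\n", |
|   | +   "    v = np.ones_like(c)\n", |
|   | +   "    for _ in range(n_iters):\n", |
|   | +   "        u = r / (K @ v)              # enforce the row marginals r\n", |
|   | +   "        v = c / (K.T @ u)            # enforce the column marginals c\n", |
|   | +   "    A = np.diag(u) @ K @ np.diag(v)  # approximate optimal coupling\n", |
|   | +   "    return A, np.sum(A * M)          # coupling and its transport cost\n", |
|   | +   "\n", |
|   | +   "M = np.array([[4., 1., 3.], [2., 0., 5.], [3., 2., 2.]])\n", |
|   | +   "r = np.ones(3) / 3                   # uniform source marginal\n", |
|   | +   "c = np.ones(3) / 3                   # uniform destination marginal\n", |
|   | +   "A, cost = sinkhorn(M, r, c)\n", |
|   | +   "print(np.round(A, 3))\n", |
|   | +   "print('approximate OT cost:', cost)" |
|   | +  ] |
|   | + }, |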
|
961 | 995 | {
|
|