|
372 | 372 | "\n",
|
373 | 373 | "This part is largely inspired by the Wikipedia articles on [Sinkhorn's theorem](https://en.wikipedia.org/wiki/Sinkhorn%27s_theorem) and [doubly stochastic matrices](https://en.wikipedia.org/wiki/Doubly_stochastic_matrix), as well as this [blog post](https://zipjiang.github.io/2020/11/23/sinkhorn's-theorem-,-sinkhorn-algorithm-and-applications.html) by Zipjiang and this [overview](https://djalil.chafai.net/blog/2021/08/28/sinkhorn-and-circular-law/) by Djalil Chafai.\n",
|
374 | 374 | "\n",
|
375 | | - "In Djalil Chafai, it is mentionned, that, when factorizing a non-negative square matrix $A$ as $d_1 S d_2,ドル it is well known that S is the projection of $A,ドル with respect to the Kullback-Leibler divergence, on the set of doubly stochastic matrices (Birkhoff polytope). The Sinkhorn factorization and its algorithm became fashionable in the domain of computational optimal transportation due to a relation with entropic regularization. It is also considered in the domain of quantum information theory.\n", |
376 | | - "\n", |
377 | 375 | "### (Doubly) Stochastic matrices\n",
|
378 | 376 | "A doubly stochastic matrix (also called a bistochastic matrix) is a square matrix $X = (x_{ij})$ of nonnegative real numbers, each of whose rows and columns sums to 1, i.e. $\\sum_{i}x_{ij}=\\sum_{j}x_{ij}=1$.\n",
|
379 | 377 | "Thus, a doubly stochastic matrix is both left stochastic and right stochastic.\n",
|
|
383 | 381 | "If $A$ is an $n \\times n$ matrix with strictly positive elements, then there exist diagonal matrices $D_1$ and $D_2$ with strictly positive diagonal elements such that $D_1AD_2$ is doubly stochastic. The matrices $D_1$ and $D_2$ are unique modulo multiplying the first matrix by a positive number and dividing the second one by the same number.\n",
|
384 | 382 | "\n",
|
385 | 383 | "### Sinkhorn's algorithm\n",
|
386 | | - "What is great about Sinkhorn’s theorem is that there is a efficient algorithm to calculate an approximation of the doubly stochastic matrix given a matrix $A,ドル that features linear convergence. And the algorithm is very simple: one just iteratively normalize the matrix along rows and columns." |
| 384 | + "What is great about Sinkhorn’s theorem is that there is an efficient algorithm, with linear convergence, to approximate the doubly stochastic matrix associated with a given matrix $A$. And the algorithm is very simple: one just iteratively normalizes the matrix along its rows and columns. In Djalil Chafai's post, it is mentioned that, when factorizing a non-negative square matrix $M$ as $D_1 A D_2,ドル it is well known that $A$ is the projection of $M,ドル with respect to the Kullback-Leibler divergence, onto the set of doubly stochastic matrices (the Birkhoff polytope). The Sinkhorn factorization and its algorithm became fashionable in computational optimal transport due to a relation with entropic regularization, and they are also studied in quantum information theory."
387 | 385 | ]
|
388 | 386 | },
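| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "Below is a minimal NumPy sketch of these iterations (the helper name and the fixed number of sweeps are illustrative choices, not from a library): we alternately rescale the rows and the columns until the matrix is approximately doubly stochastic."
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "import numpy as np\n",
| | + "\n",
| | + "def sinkhorn_normalize(A, n_iter=100):\n",
| | + "    \"\"\"Alternately normalize rows and columns of a strictly positive matrix.\"\"\"\n",
| | + "    A = np.asarray(A, dtype=float).copy()\n",
| | + "    for _ in range(n_iter):\n",
| | + "        A /= A.sum(axis=1, keepdims=True)  # each row sums to 1\n",
| | + "        A /= A.sum(axis=0, keepdims=True)  # each column sums to 1\n",
| | + "    return A\n",
| | + "\n",
| | + "S = sinkhorn_normalize(np.random.rand(4, 4) + 0.1)  # strictly positive input\n",
| | + "print(S.sum(axis=0), S.sum(axis=1))  # both close to vectors of ones"
| | + ]
| | + },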
|
389 | 387 | {
|
|
517 | 515 | "metadata": {},
|
518 | 516 | "source": [
|
519 | 517 | "### How is the Sinkhorn algorithm related to OT?\n",
|
520 | | - "So how is calculating a doubly stochastic matrix related to optimal transport?\n", |
521 | | - "\n", |
| 518 | + "So how is calculating a doubly stochastic matrix related to optimal transport? The link between the two problems was highlighted in a famous 2013 paper by Marco Cuturi (Cuturi, M. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. Advances in Neural Information Processing Systems, 2013, vol. 26)."
| 519 | + ] |
| 520 | + }, |
| 521 | + { |
| 522 | + "cell_type": "code", |
| 523 | + "execution_count": 34, |
| 524 | + "metadata": {}, |
| 525 | + "outputs": [], |
| 526 | + "source": [ |
| 527 | + "#IFrame(\"doc/OptimalTransportWasserteinDistance/NIPS-2013-sinkhorn-distances-lightspeed-computation-of-optimal-transport-Paper.pdf\", width=1200, height=800)" |
| 528 | + ] |
| 529 | + }, |
| 530 | + { |
| 531 | + "cell_type": "markdown", |
| 532 | + "metadata": {}, |
| 533 | + "source": [ |
| 534 | + "### Formalizing the Kantorovich relaxation of OT\n",
522 | 535 | "We will reformulate our optimal transport problem in a slightly different matrix form: this time, we take N warehouses and also N customers (K=N from our original problem). The cost $c(x,y)$ is still the distance between the storage area $x$ and the address of customer $y,ドル but we store those costs in a matrix $M \\in \\mathbb{R}_{+}^{N\\times N}$ where entry $m_{i,j}=c(x_i,y_j)$.\n",
|
523 | 536 | "\n",
|
524 | 537 | "Now suppose our e-readers are randomly distributed according to a distribution $r$ among the N warehouses, and we would like them to be distributed according to another distribution $c$ on the customers.\n",
|
525 | | - "Now we can write up a non-negative matrix $A \\in \\mathbb{R}^{N\\times K+},ドル where $a_{i,j}$ corresponds to how much portion of the total e-readers we would like to transport from warehouse i to customer j. Another way to view this matrix A is that A defines a joint distribution in the transportation space and r and c could be seen as its marginals of starting points and destinations respectively. Now it’s not hard to see that the final transportation cost would be the sum of the element-wise product between A and M, or in other words the Frobenius inner product $\\langle A, M \\rangle$ In fact, this is called the Kantorovich relaxation of OT.\n", |
| 538 | + "We can then write a non-negative matrix $A \\in \\mathbb{R}_{+}^{N\\times N},ドル where $a_{i,j}$ is the fraction of the total e-readers we would like to transport from warehouse $i$ to customer $j$. Another way to view this matrix is that $A$ defines a joint distribution over the transportation space, and $r$ and $c$ can be seen as its marginals of starting points and destinations respectively. It is then not hard to see that the final transportation cost is the sum of the element-wise products between $A$ and $M,ドル in other words the Frobenius inner product $\\langle A, M \\rangle$. This is called the Kantorovich relaxation of OT.\n",
526 | 539 | "\n",
|
527 | | - "There is the conclusion that if M is a distance matrix, that is, every element in the matrix comform to the three property of distance, the inner product is a distance itself. Confirming this should not be hard. To make a transfort optimal is to find the minimum cost of the total transport, that is, to minimize the Frobenius inner product.\n", |
| 540 | + "More formally, for two probability vectors $r$ and $c$ in the simplex $\\Sigma_d := \\{ x \\in \\mathbb{R}_{+}^{d} : x^T \\mathbb{1}^d = 1\\},ドル where $\\mathbb{1}^d$ is the $d$-dimensional vector of ones, we write $U(r, c)$ for the transport polytope of $r$ and $c,ドル namely the polyhedral set of $d \\times d$ matrices:\n",
| 541 | + "\\begin{align*}\n", |
| 542 | + "    U(r,c) := \\{A \\in \\mathbb{R}_{+}^{d\\times d} \\mid A\\mathbb{1}^d = r,\\, A^T\\mathbb{1}^d = c \\}\n",
| 543 | + "\\end{align*}\n", |
528 | 544 | "\n",
|
| 545 | + "The conclusion is that if $M$ is a distance matrix, that is, if its entries satisfy the three properties of a distance (non-negativity, symmetry and the triangle inequality), then the optimal cost defined below is itself a distance between probability vectors. Making a transport plan optimal means finding the minimum total transportation cost, that is, minimizing the Frobenius inner product.\n",
| 546 | + "To rephrase the previous assertion more formally: given a $d \\times d$ cost matrix $M,ドル the cost of mapping $r$ to $c$ using a transport matrix (or joint probability) $A$ can be quantified as $\\langle A, M \\rangle,ドル and the optimal transport problem reads\n",
| 547 | + "\\begin{align*}\n", |
| 548 | + "    d_{M}(r,c) := \\min_{A \\in U(r,c)} \\langle A, M \\rangle\n",
| 549 | + "\\end{align*}\n", |
| 550 | + "\n", |
| 551 | + "\n", |
| 552 | + "### OT and statistical concepts\n", |
529 | 553 | "I would like to stop and mention that, as we now interpret $A$ as a joint probability matrix, we can define its entropy, the entropies of its marginal probabilities, and the KL-divergence between two different transportation matrices. These take the form of\n",
|
530 | 554 | "\n",
|
531 | 555 | "\\begin{align*}\n",
|
532 | | - " \\text{Entropy} &= H(A) &= \\sum_{i,j} a_{i,j} log(a_{i,j}) \\\\\n", |
533 | | - " \\text{Marginal source distribution entropy r} &= H(r) &= \\sum_{i} \\left( \\sum_{j} a_{i,j} \\right) log\\left( \\sum_{j} a_{i,j} \\right)\\\\\n", |
534 | | - " \\text{Marginal destination distribution entropy c} &= H(c) &= \\sum_{j} \\left( \\sum_{i} a_{i,j} \\right) log\\left( \\sum_{i} a_{i,j} \\right)\\\\\n", |
535 | | - " \\text{KL-divergence between A and B transportation} &= &=\n", |
536 | | - "\\end{align*}\n" |
| 556 | + "    \\text{Entropy} &= H(A) &= -\\sum_{i,j} a_{i,j} \\log(a_{i,j}) & \\\\\n",
| 557 | + "    \\text{Marginal source distribution entropy r} &= H(r) &= -\\sum_{i} \\left( \\sum_{j} a_{i,j} \\right) \\log\\left( \\sum_{j} a_{i,j} \\right) &= -\\sum_{i} r_i \\log(r_i)\\\\\n",
| 558 | + "    \\text{Marginal destination distribution entropy c} &= H(c) &= -\\sum_{j} \\left( \\sum_{i} a_{i,j} \\right) \\log\\left( \\sum_{i} a_{i,j} \\right) &= -\\sum_{i} c_i \\log(c_i)\\\\\n",
| 559 | + "    \\text{KL-divergence between transport plans A and B} &= KL(A\\|B) &= \\sum_{i,j} a_{i,j} \\log\\left(\\frac{a_{i,j}}{b_{i,j}}\\right) &\n",
| 560 | + "\\end{align*}" |
537 | 561 | ]
|
538 | 562 | },
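| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "To make the link concrete, here is a sketch of Cuturi-style entropic-regularized Sinkhorn iterations (the 1-D cost matrix, regularization strength and iteration count below are illustrative choices): diagonal scalings of the Gibbs kernel $e^{-M/\\lambda}$ yield a transport plan in $U(r,c),ドル from which $\\langle A, M \\rangle$ and $H(A)$ can be read off directly."
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "import numpy as np\n",
| | + "\n",
| | + "def sinkhorn_ot(r, c, M, reg=0.1, n_iter=500):\n",
| | + "    \"\"\"Entropic-regularized OT plan between histograms r and c.\"\"\"\n",
| | + "    K = np.exp(-M / reg)                # Gibbs kernel\n",
| | + "    u = np.ones_like(r)\n",
| | + "    for _ in range(n_iter):\n",
| | + "        v = c / (K.T @ u)               # enforce the column marginal c\n",
| | + "        u = r / (K @ v)                 # enforce the row marginal r\n",
| | + "    return u[:, None] * K * v[None, :]  # A = diag(u) K diag(v)\n",
| | + "\n",
| | + "x = np.arange(5, dtype=float)\n",
| | + "M = np.abs(x[:, None] - x[None, :])     # |i - j|: a genuine distance matrix\n",
| | + "r = np.full(5, 0.2)                     # uniform source histogram\n",
| | + "c = np.array([0.1, 0.1, 0.2, 0.3, 0.3]) # target histogram\n",
| | + "A = sinkhorn_ot(r, c, M)\n",
| | + "print(\"cost    <A, M> =\", np.sum(A * M))\n",
| | + "print(\"entropy H(A)   =\", -np.sum(A * np.log(A + 1e-300)))"
| | + ]
| | + },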
|
539 | 563 | {
|
|
857 | 881 | "# See also: the Hungarian algorithm"
|
858 | 882 | ]
|
859 | 883 | },
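| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "Following up on that remark: for exact OT between two uniform discrete measures of the same size, the problem reduces to an assignment problem, which SciPy's `linear_sum_assignment` (a Hungarian-style solver) handles directly. The cost matrix below is a made-up example."
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "import numpy as np\n",
| | + "from scipy.optimize import linear_sum_assignment\n",
| | + "\n",
| | + "cost = np.array([[4., 1., 3.],\n",
| | + "                 [2., 0., 5.],\n",
| | + "                 [3., 2., 2.]])\n",
| | + "rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching\n",
| | + "print(list(zip(rows, cols)), \"total cost:\", cost[rows, cols].sum())"
| | + ]
| | + },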
|
| 884 | + { |
| 885 | + "cell_type": "markdown", |
| 886 | + "metadata": {}, |
| 887 | + "source": [ |
| 888 | + "## Computation of Wasserstein barycenters\n",
| 889 | + "\n" |
| 890 | + ] |
| 891 | + }, |
| 892 | + { |
| 893 | + "cell_type": "code", |
| 894 | + "execution_count": null, |
| 895 | + "metadata": {}, |
| 896 | + "outputs": [], |
| 897 | + "source": [ |
| 898 | + "#IFrame(\"doc/OptimalTransportWasserteinDistance/Fast_computation_of_Wasserstein_barycenters.pdf\", width=1200, height=800)" |
| 899 | + ] |
| 900 | + }, |
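| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "One common way to approximate Wasserstein barycenters on a fixed support is with Sinkhorn-like iterations (iterative Bregman projections). The sketch below makes illustrative choices for the support, the regularization strength and the iteration count."
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "import numpy as np\n",
| | + "\n",
| | + "def entropic_barycenter(B, M, reg=1e-2, weights=None, n_iter=1000):\n",
| | + "    \"\"\"Fixed-support entropic barycenter of the histograms in the columns of B.\"\"\"\n",
| | + "    d, S = B.shape\n",
| | + "    if weights is None:\n",
| | + "        weights = np.full(S, 1.0 / S)\n",
| | + "    K = np.exp(-M / reg)                         # Gibbs kernel\n",
| | + "    u = np.ones((d, S))\n",
| | + "    for _ in range(n_iter):\n",
| | + "        v = B / (K.T @ u)                        # scalings matching each input histogram\n",
| | + "        a = np.prod((K @ v) ** weights, axis=1)  # geometric mean = current barycenter\n",
| | + "        u = a[:, None] / (K @ v)                 # scalings matching the barycenter\n",
| | + "    return a\n",
| | + "\n",
| | + "x = np.linspace(0, 1, 50)\n",
| | + "M = (x[:, None] - x[None, :]) ** 2               # squared-distance costs on the grid\n",
| | + "b1 = np.exp(-(x - 0.25) ** 2 / 0.005); b1 /= b1.sum()  # two Gaussian bumps\n",
| | + "b2 = np.exp(-(x - 0.75) ** 2 / 0.005); b2 /= b2.sum()\n",
| | + "a = entropic_barycenter(np.stack([b1, b2], axis=1), M)\n",
| | + "print(a.sum())  # ~1: the barycenter is itself a histogram"
| | + ]
| | + },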
860 | 901 | {
|
861 | 902 | "cell_type": "markdown",
|
862 | 903 | "metadata": {},
|
|