Commit 897b37a

Working on concavity of entropy and convexity of KL divergence
1 parent 94328b8 commit 897b37a

2 files changed, +33 -2 lines changed

‎InformationTheoryOptimization.ipynb

Lines changed: 31 additions & 0 deletions
@@ -177,6 +177,37 @@
 " return p, -np.dot(p,SafeLog2(p))"
 ]
 },
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Interesting property of entropy\n",
+ "### Concavity of entropy and convexity of KL-divergence\n",
+ "The entropy is concave in the space of probability mass functions; more formally, this reads:\n",
+ "\\begin{align*}\n",
+ " H[\\lambda p_1 + (1-\\lambda) p_2] \\geq \\lambda H[p_1] + (1-\\lambda) H[p_2]\n",
+ "\\end{align*}\n",
+ "where $p_1$ and $p_2$ are probability mass functions and $\\lambda \\in [0,1]$.\n",
+ "\n",
+ "Proof: Let $X$ be a discrete random variable with possible outcomes $\\mathcal{X} := \\{x_i, i = 0,1,\\dots,N-1\\}$ and let $u(x)$ be the probability mass function of the discrete uniform distribution on $\\mathcal{X}$. Then, the entropy of an arbitrary probability mass function $p(x)$ on $\\mathcal{X}$ can be rewritten as\n",
+ "\n",
+ "\\begin{align*}\n",
+ " H(X) &= - \\sum_{i=0}^{N-1} p(x_i)\\log(p(x_i)) \\\\\n",
+ " &= - \\sum_{i=0}^{N-1} p(x_i)\\log\\left(\\frac{p(x_i)}{u(x_i)} u(x_i)\\right) \\\\\n",
+ " &= - \\sum_{i=0}^{N-1} p(x_i)\\log\\left(\\frac{p(x_i)}{u(x_i)}\\right) - \\sum_{i=0}^{N-1} p(x_i)\\log(u(x_i)) \\\\\n",
+ " &= -KL[p\\|u] - \\sum_{i=0}^{N-1} p(x_i)\\log(u(x_i)) \\\\\n",
+ " &= -KL[p\\|u] - \\log \\left(\\frac{1}{N} \\right) \\sum_{i=0}^{N-1} p(x_i) \\\\\n",
+ " &= \\log(N) - KL[p\\|u] \\\\\n",
+ " \\implies \\log(N) - H(X) &= KL[p\\|u]\n",
+ "\\end{align*}\n",
+ "\n",
+ "where $KL[p\\|u]$ is the Kullback-Leibler divergence between $p$ and the discrete uniform distribution $u$ over $\\mathcal{X},ドル a concept we will explain in more detail later on this page. \n",
+ "Note that the KL divergence is jointly convex in the pair of probability distributions $(p,q)$:\n",
+ "\\begin{align*}\n",
+ " KL[\\lambda p_1 + (1-\\lambda) p_2 \\| \\lambda q_1 + (1-\\lambda) q_2] \\leq \\lambda KL[p_1\\|q_1] + (1-\\lambda) KL[p_2\\|q_2]\n",
+ "\\end{align*}\nSince $KL[p\\|u]$ is convex in $p,ドル $H(X) = \\log(N) - KL[p\\|u]$ is the sum of a constant and a concave function of $p,ドル which proves the concavity of the entropy.\n"
+ ]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

‎OptimalTransportWasserteinDistance.ipynb

Lines changed: 2 additions & 2 deletions
@@ -559,7 +559,7 @@
 "metadata": {},
 "source": [
 "### OT and statistical concepts\n",
- "Some of the basics to understand the following statements can be found in the notebook \"InformationTheoryOptimization\"\n",
+ "Some of the basics to understand the following statements can be found in the notebook \"InformationTheoryOptimization\". This part is also partly a direct reproduction of Marco Cuturi's famous article \"Sinkhorn Distances: Lightspeed Computation of Optimal Transport\".\n",
 "\n",
 "I would like to stop and mention that, as we now interpret $P$ as a joint probability matrix, we can define its entropy, the marginal probability entropies, and the KL-divergence between two different transportation matrices. These take the form of\n",
 "\n",
@@ -585,7 +585,7 @@
 " KL(P\|rc^T) = h(r) + h(c) - h(P)\n",
 "\\end{align*}\n",
 "\n",
- "This quantity is also the mutual information $I(X\|Y)$ of two random variables $(X, Y)$ should they follow the joint probability $P$ (Cover and Thomas, 1991, §2). Hence, the set of tables P whose Kullback-Leibler divergence to rcT is constrained to lie below a certain threshold can be interpreted as the set of joint probabilities P in U (r, c) which have sufficient entropy with respect to h(r) and h(c), or small enough mutual information. For reasons that will become clear in Section 4, we call the quantity below the Sinkhorn distance of r and c:"
+ "This quantity is also the mutual information $I(X\|Y)$ of two random variables $(X, Y)$ should they follow the joint probability $P$. Hence, the set of tables $P$ whose Kullback-Leibler divergence to $rc^T$ is constrained to lie below a certain threshold can be interpreted as the set of joint probabilities $P$ in $U(r, c)$ which have sufficient entropy with respect to $h(r)$ and $h(c),ドル or small enough mutual information. For reasons that will become clear in Section 4, we call the quantity below the Sinkhorn distance of $r$ and $c$:"
 ]
 },
 {

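For reference, the identity $KL(P\|rc^T) = h(r) + h(c) - h(P)$ quoted above holds when $r$ and $c$ are the row and column marginals of $P$. A minimal numerical check (not part of the commit; the helper names are illustrative assumptions) could look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(p):
    # Shannon entropy h(.) with natural log over a flattened array summing to 1.
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    # KL(p || q) over flattened arrays; assumes q > 0 wherever p > 0.
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# A random joint probability matrix (transportation plan) and its marginals.
P = rng.random((4, 5))
P /= P.sum()
r = P.sum(axis=1)   # row marginal
c = P.sum(axis=0)   # column marginal

# KL(P || r c^T) = h(r) + h(c) - h(P), i.e. the mutual information of (X, Y) ~ P.
lhs = kl_divergence(P, np.outer(r, c))
rhs = entropy(r) + entropy(c) - entropy(P)
assert np.isclose(lhs, rhs)
```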