Commit 897b37a

Working on concavity of entropy and convexity of KL divergence
1 parent 94328b8 commit 897b37a

2 files changed, +33 -2 lines changed

‎InformationTheoryOptimization.ipynb

Lines changed: 31 additions & 0 deletions
@@ -177,6 +177,37 @@
 " return p, -np.dot(p,SafeLog2(p))"
 ]
 },
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Interesting property of entropy\n",
+ "### Concavity of entropy and convexity of KL-divergence\n",
+ "The entropy is concave in the space of probability mass functions; more formally, this reads:\n",
+ "\\begin{align*}\n",
+ " H[\\lambda p_1 + (1-\\lambda) p_2] \\geq \\lambda H[p_1] + (1-\\lambda) H[p_2]\n",
+ "\\end{align*}\n",
+ "where $p_1$ and $p_2$ are probability mass functions and $\\lambda \\in [0,1]$.\n",
+ "\n",
+ "Proof: Let $X$ be a discrete random variable with possible outcomes $\\mathcal{X} := \\{x_i, i = 0,1,\\dots,N-1\\}$ and let $u(x)$ be the probability mass function of the discrete uniform distribution on $\\mathcal{X}$. Then, the entropy of an arbitrary probability mass function $p(x)$ on $\\mathcal{X}$ can be rewritten as\n",
+ "\n",
+ "\\begin{align*}\n",
+ " H(X) &= - \\sum_{i=0}^{N-1} p(x_i)\\log(p(x_i)) \\\\\n",
+ " &= - \\sum_{i=0}^{N-1} p(x_i)\\log\\left(\\frac{p(x_i)}{u(x_i)} u(x_i)\\right) \\\\\n",
+ " &= - \\sum_{i=0}^{N-1} p(x_i)\\log\\left(\\frac{p(x_i)}{u(x_i)}\\right) - \\sum_{i=0}^{N-1} p(x_i)\\log(u(x_i)) \\\\\n",
+ " &= -KL[p\\|u] - \\sum_{i=0}^{N-1} p(x_i)\\log(u(x_i)) \\\\\n",
+ " &= -KL[p\\|u] - \\log \\left(\\frac{1}{N} \\right) \\sum_{i=0}^{N-1} p(x_i) \\\\\n",
+ " &= \\log(N) - KL[p\\|u] \\\\\n",
+ " \\implies \\log(N) - H(X) &= KL[p\\|u]\n",
+ "\\end{align*}\n",
+ "\n",
+ "where $KL[p\\|u]$ is the Kullback-Leibler divergence between $p$ and the discrete uniform distribution $u$ over $\\mathcal{X},ドル a concept we will explain in more detail later on this page. \n",
+ "Note that the KL divergence is jointly convex in the pair of probability distributions $(p,q)$:\n",
+ "\\begin{align*}\n",
+ " KL[\\lambda p_1 + (1-\\lambda) p_2 \\| \\lambda q_1 + (1-\\lambda) q_2] \\leq \\lambda KL[p_1\\|q_1] + (1-\\lambda) KL[p_2\\|q_2]\n",
+ "\\end{align*}\nSince $KL[p\\|u]$ is convex in $p,ドル $H(X) = \\log(N) - KL[p\\|u]$ is the sum of a constant and a concave function of $p,ドル which proves the concavity of the entropy.\n"
+ ]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

‎OptimalTransportWasserteinDistance.ipynb

Lines changed: 2 additions & 2 deletions
@@ -559,7 +559,7 @@
 "metadata": {},
 "source": [
 "### OT and statistical concepts\n",
- "Some of the basics to understand the following statements can be found in the notebook \"InformationTheoryOptimization\"\n",
+ "Some of the basics to understand the following statements can be found in the notebook \"InformationTheoryOptimization\". This part is also partly a direct reproduction of Marco Cuturi's famous article \"Sinkhorn Distances: Lightspeed Computation of Optimal Transport\".\n",
 "\n",
 "I would like to stop and mention that, as we now interpret $P$ as a joint probability matrix, we can define its entropy, the marginal probability entropies, and the KL-divergence between two different transportation matrices. These take the form of\n",
 "\n",
@@ -585,7 +585,7 @@
 " KL(P\|rc^T) = h(r) + h(c) - h(P)\n",
 "\\end{align*}\n",
 "\n",
- "This quantity is also the mutual information $I(X\|Y)$ of two random variables $(X, Y)$ should they follow the joint probability $P$ (Cover and Thomas, 1991, §2). Hence, the set of tables P whose Kullback-Leibler divergence to rcT is constrained to lie below a certain threshold can be interpreted as the set of joint probabilities P in U (r, c) which have sufficient entropy with respect to h(r) and h(c), or small enough mutual information. For reasons that will become clear in Section 4, we call the quantity below the Sinkhorn distance of r and c:"
+ "This quantity is also the mutual information $I(X\|Y)$ of two random variables $(X, Y)$ should they follow the joint probability $P$. Hence, the set of tables $P$ whose Kullback-Leibler divergence to $rc^T$ is constrained to lie below a certain threshold can be interpreted as the set of joint probabilities $P$ in $U(r, c)$ which have sufficient entropy with respect to $h(r)$ and $h(c),ドル or small enough mutual information. For reasons that will become clear in Section 4, we call the quantity below the Sinkhorn distance of $r$ and $c$:"
 ]
 },
 {

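For reference, the identity $KL(P\|rc^T) = h(r) + h(c) - h(P)$ quoted above holds when $r$ and $c$ are the row and column marginals of $P$. A minimal numerical check (not part of the commit; the helper names are illustrative assumptions) could look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(p):
    # Shannon entropy h(.) with natural log over a flattened array summing to 1.
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    # KL(p || q) over flattened arrays; assumes q > 0 wherever p > 0.
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# A random joint probability matrix (transportation plan) and its marginals.
P = rng.random((4, 5))
P /= P.sum()
r = P.sum(axis=1)   # row marginal
c = P.sum(axis=0)   # column marginal

# KL(P || r c^T) = h(r) + h(c) - h(P), i.e. the mutual information of (X, Y) ~ P.
lhs = kl_divergence(P, np.outer(r, c))
rhs = entropy(r) + entropy(c) - entropy(P)
assert np.isclose(lhs, rhs)
```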