|
177 | 177 | " return p, -np.dot(p,SafeLog2(p))"
|
178 | 178 | ]
|
179 | 179 | },
|
| 180 | + { |
| 181 | + "cell_type": "markdown", |
| 182 | + "metadata": {}, |
| 183 | + "source": [ |
| 184 | + "## Interesting property of entropy\n", |
| 185 | + "### Concavity of entropy and convexity of KL-divergence\n", |
| 186 | + "The entropy is concave on the space of probability mass functions. More formally:\n", |
| 187 | + "\\begin{align*}\n", |
| 188 | + " H[\\lambda p_1 + (1-\\lambda) p_2] \\geq \\lambda H[p_1] + (1-\\lambda) H[p_2]\n", |
| 189 | + "\\end{align*}\n", |
| 190 | + "where $p_1$ and $p_2$ are probability mass functions and $\\lambda \\in [0,1]$.\n", |
| 191 | + "\n", |
| 192 | + "Proof: Let $X$ be a discrete random variable with possible outcomes $\\mathcal{X} := \\{x_i : i = 0, 1, \\dots, N-1\\}$ and let $u(x)$ be the probability mass function of the discrete uniform distribution on $\\mathcal{X}$, so that $u(x_i) = 1/N$. Then the entropy of an arbitrary probability mass function $p(x)$ on $\\mathcal{X}$ can be rewritten as\n", |
| 193 | + "\n", |
| 194 | + "\\begin{align*}\n", |
| 195 | + " H(X) &= - \\sum_{i=0}^{N-1} p(x_i) \\log(p(x_i)) \\\\\n", |
| 196 | + " &= - \\sum_{i=0}^{N-1} p(x_i) \\log\\left(\\frac{p(x_i)}{u(x_i)} u(x_i)\\right) \\\\\n", |
| 197 | + " &= - \\sum_{i=0}^{N-1} p(x_i) \\log\\left(\\frac{p(x_i)}{u(x_i)}\\right) - \\sum_{i=0}^{N-1} p(x_i) \\log(u(x_i)) \\\\\n", |
| 198 | + " &= -KL[p\\|u] - \\sum_{i=0}^{N-1} p(x_i) \\log(u(x_i)) \\\\\n", |
| 199 | + " &= -KL[p\\|u] - \\log \\left(\\frac{1}{N} \\right) \\sum_{i=0}^{N-1} p(x_i) \\\\\n", |
| 200 | + " &= \\log(N) - KL[p\\|u] \\\\\n", |
| 201 | + " \\log(N) - H(X) &= KL[p\\|u]\n", |
| 202 | + "\\end{align*}\n", |
| 203 | + "\n", |
| 204 | + "where $KL[p\\|u]$ is the Kullback-Leibler divergence between $p$ and the discrete uniform distribution $u$ over $\\mathcal{X}$, a concept we will explain in more detail later on this page.\n", |
| 205 | + "Because $\\log(N)$ is a constant, the concavity of the entropy follows from the fact that the KL divergence is convex in the pair of probability distributions $(p,q)$ (take $q_1 = q_2 = u$):\n", |
| 206 | + "\\begin{align*}\n", |
| 207 | + " KL[\\lambda p_1 + (1-\\lambda) p_2 \\| \\lambda q_1 + (1-\\lambda) q_2] \\leq \\lambda KL[p_1\\|q_1] + (1-\\lambda) KL[p_2\\|q_2]\n", |
| 208 | + "\\end{align*}\n" |
| 209 | + ] |
| 210 | + }, |
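The identity and the two inequalities stated in the cell above can be spot-checked numerically. The following is only a minimal sketch, not part of the notebook: the helpers `entropy_bits` and `kl_bits` are hypothetical names introduced here (they do not reuse the notebook's `SafeLog2`), logarithms are taken in base 2, and the distributions are random Dirichlet draws with an arbitrarily chosen seed, size `N = 5`, and `lam = 0.3`. A single random draw illustrates the claims but does not prove them.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def kl_bits(p, q):
    """KL[p || q] in bits; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
N = 5                            # number of outcomes (assumption)
lam = 0.3                        # mixing weight in [0, 1] (assumption)
u = np.full(N, 1.0 / N)          # discrete uniform distribution

# Four random probability mass functions on N outcomes.
p1, p2, q1, q2 = rng.dirichlet(np.ones(N), size=4)
mix_p = lam * p1 + (1 - lam) * p2
mix_q = lam * q1 + (1 - lam) * q2

tol = 1e-12  # slack for floating-point rounding

# Identity: H(p) = log(N) - KL[p || u]
identity_gap = abs(entropy_bits(p1) - (np.log2(N) - kl_bits(p1, u)))

# Concavity of entropy.
concavity_holds = (
    entropy_bits(mix_p)
    >= lam * entropy_bits(p1) + (1 - lam) * entropy_bits(p2) - tol
)

# Joint convexity of the KL divergence.
convexity_holds = (
    kl_bits(mix_p, mix_q)
    <= lam * kl_bits(p1, q1) + (1 - lam) * kl_bits(p2, q2) + tol
)

print("identity gap:", identity_gap)
print("entropy concave on this draw:", concavity_holds)
print("KL convex on this draw:", convexity_holds)
```

Rerunning with other seeds, sizes, or values of `lam` in $[0,1]$ should leave both checks true; a failure would point to an error in the helpers rather than in the inequalities.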
180 | 211 | {
|
181 | 212 | "cell_type": "markdown",
|
182 | 213 | "metadata": {},
|