David wrote, challengingly,
If all of the members of that family of entropies he told us about are
so interesting […]
If you’re making the sceptical point I think you are, then I
probably agree. My belief is that, in some sense, Shannon entropy has prime
position among the family $(H_\alpha)$ of entropy
measures that I discussed. The others are still interesting, but not,
I think, as important.
By way of analogy, there are many interesting invariants of
topological spaces that can be defined via homology: the rank of the
3rd homology group, for instance. But in some sense it’s the Euler
characteristic that has primacy: it’s the invariant that
behaves most like cardinality.
Certainly Shannon entropy has properties not shared by the
$\alpha$-entropies for $\alpha \neq 1$. Indeed,
Rényi made exactly this point when he introduced
$H_\alpha$. In particular, he observed that while $H_\alpha$ shares
with $H_1$ the property that the entropy of a product is the sum of
the entropies, it does not share the property that the entropy
of a convex combination $\lambda p + (1 - \lambda) q$ (of distributions
on disjoint sets) can be expressed in terms of $H_\alpha(p)$, $H_\alpha(q)$
and $\lambda$.
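To make Rényi’s observation concrete, here is a small numerical sketch (my own illustration, not part of the original discussion), assuming the standard formula $H_\alpha(p) = \frac{1}{1-\alpha}\log\sum_i p_i^\alpha$, with $H_1$ the Shannon entropy $-\sum_i p_i \log p_i$. The product rule holds for every $\alpha$, while the rule for a convex combination on a disjoint union holds only at $\alpha = 1$.

```python
import numpy as np

def renyi(p, alpha):
    """Rényi entropy H_alpha(p); alpha = 1 gives the Shannon entropy."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                   # drop zero-probability outcomes
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.6, 0.4])
lam = 0.25

product = np.outer(p, q).ravel()                   # product distribution p x q
mixture = np.concatenate([lam * p, (1 - lam) * q]) # convex combination on the disjoint union

for alpha in [0.5, 1.0, 2.0]:
    product_rule = np.isclose(renyi(product, alpha),
                              renyi(p, alpha) + renyi(q, alpha))
    # Shannon-style rule: H(lam.p + (1-lam).q) = lam*H(p) + (1-lam)*H(q) + H(lam, 1-lam)
    mixture_rule = np.isclose(renyi(mixture, alpha),
                              lam * renyi(p, alpha) + (1 - lam) * renyi(q, alpha)
                              + renyi([lam, 1 - lam], alpha))
    print(f"alpha = {alpha}: product rule {product_rule}, convex-combination rule {mixture_rule}")
```

Running this prints True for the product rule at every $\alpha$, but True for the convex-combination rule only at $\alpha = 1$.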
In Part 2 (coming soon!), I’ll define $\alpha$-entropy,
$\alpha$-diversity and $\alpha$-cardinality of finite probability
spaces in which the underlying set is equipped with a metric.
Curiously, in this extended setting it’s the case
$\alpha = 2$ that seems to be best-understood. But I’m sure $\alpha = 1$ must play
a special role.
Continuing David’s sentence:
[…] why is it that so many of our best loved distributions are maximum
entropy distributions (under various constraints) for Shannon entropy?
A priori, that doesn’t exclude the possibility that they’re also
maximum entropy distributions for other entropies $H_\alpha$.
As I understand it, you’re mostly referring to distributions on the
real line, and the task is to find the distribution having maximum
entropy subject to certain constraints (e.g. ‘mean must be $\mu$ and standard
deviation must be $\sigma$’). But let’s go back to a much simpler case:
distributions on a finite set, subject to no constraints. As we saw
in Part 1, the distribution with maximum entropy is the
uniform distribution — and that’s true for $\alpha$-entropy,
no matter what $\alpha$ is. So in this case, the distribution
that maximizes the $\alpha$-entropy is the same for all $\alpha$.
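As a quick sanity check (again just an illustration, with the $H_\alpha$ helper repeated so the snippet runs on its own), one can compare the uniform distribution on a set of size $n$, whose $\alpha$-entropy is $\log n$ for every $\alpha$, against a batch of random distributions; none of them does better at any order.

```python
import numpy as np

def renyi(p, alpha):
    """Rényi entropy H_alpha(p); alpha = 1 gives the Shannon entropy."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

rng = np.random.default_rng(0)
n = 5
uniform = np.full(n, 1.0 / n)
samples = rng.dirichlet(np.ones(n), size=5000)   # random distributions on an n-element set

for alpha in [0.0, 0.5, 1.0, 2.0, 10.0]:
    best_random = max(renyi(s, alpha) for s in samples)
    print(f"alpha = {alpha}: uniform {renyi(uniform, alpha):.4f} (= log {n}),"
          f" best of 5000 random {best_random:.4f}")
```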
I’ll talk more about maximizing entropy in Part 2. There seem to be
some unsolved problems.