After implementing the group matrices I figured it would be nice to see the group matrix of a much larger group. (If you are a mathematician, the Web of Trust is a large directed graph where the vertices are called "keys" and the edges "signatures". There are four different types of signatures, just think of them as four colors of the edges. The group matrix is the adjacency matrix of this graph. You probably want to take a look at the FAQ, especially the part about MSD.) I generated a PNG image with the keys sorted in MSD order and expected little more than random noise. When I first saw the result I thought I had done something wrong, but a little bit of thinking revealed that the resulting leaf-like shape was perfectly natural, almost unavoidable. Writing my master's thesis on a mathematical departement (but not about this) has its advantages, but I did not manage to find anyone who had seen anything like it. If you know if it has been analyzed before, please send me a mail. Scroll down for some images analyzing the mail addresses of the OpenPGP keys, but let's try to dissect this leaf first.
Before we start talking about the technical details of the images, remember that what they really represent is relations between real human beings. They may look good regardless, but keeping in mind that each and every dot (well, almost) is an actual meeting between two humans gives the patterns a new dimension. If you own an OpenPGP key which was available here at the time this was written, then some of these dots were made by you.
The Web of Trust information used on this page was taken from the .wot file from 2004年12月10日.
A dot in column x
, counting from left, and row
y
, counting from the top, means that key number
x
has signed key number y
. A red
dot means a signature of level 0 or 1, a blue dot level 2
and a green level 3. Signatures which are not on the primary
User ID are represented by a darker dot. A key can not sign
itself (or, rather, must always sign itself). This is shown
by the grey diagonal line. In other words, each key has a
vertical line and a horizontal line, with the same distance
from the top left corner. The horizontal lines shows
signatures on that key, while the vertical line shows
signatures made by that key. The intersection between those
two lines is grey. To get a better feel of how these
matrices work, play with the group matrix generator.
Since the Web of Trust contained approximately 24000 keys when these images were made (December 2004), most images on this page should really be 24000x24000 pixels large. That is very large and unpractical. If you really want one of these, this is an unscaled image of the whole leaf. Beware that most programs eats something between several hundred MiB and some GiB of memory and then crashes when trying to unpack it, while most web browsers just give up and claim it contains errors. The rest of the images are downscaled when needed to more practical sizes. To avoid suspicion of patterns created by the downscaling, only integer downscalings are made. Since the images would be far too dark if the pixels were just averaged, a floating point average was first calculated, and that value was gamma corrected with a high gamma value.
If you click on the image, or on any other image on this page, you will come to a page with some more technical information on the image and a HTML hack that lets you see the name and MSD for individual keys. The links below the images takes you to similar pages with larger images. If you follow a link on the image above, you can easily check that it is sorted in MSD order by moving your mouse over the image while watching your status bar.
Not all group matrices look like leaves. The group matrices look very different depending on the order of the keys. There are a large number of ways to arrange 24000 keys in a list. According to Stirling's approximation, the number would have around 94702 digits. A large number indeed. Most key orders gives a group matrix that look like the random one to the right or above. We shall explore some more key orders that comes natural, but much more may be hidden in the other ways to order the keys.
The only difference between this random image and the one above is the order of the keys. Because of the lack of other order, one detail stands out - there are both vertical and horizontal colored stripes, but the vertical ones are clearly more visible. This is because a particular human being tends to be somewhat consistent in what signature types she uses, and while signature types on a particular key are also correlated, that correlation is much weaker. The same patterns can be seen on the other images too, but they are much harder to spot amongst the other patterns.
There are two natural ways to measure distances in this graph. Counting the number of keys is an obvious method, but since the keys are sorted by MSD we can also look at the MSD distances, at least for purely vertical or purely horizontal distances. MSD is measured in hops, so this gives us a way to speak of the distance between two lines or two columns as a number of hops. In this graph every integer MSD is marked by a yellow line and cyan lines are placed every 1/10th hop. You can also follow one of the links below the image or on the image itself to (hopefully be able to) explore the distances with your mouse pointer.
There are a number of interesting things that can be seen here.
One way to understand the 1-hop line is to pick a key, any
key, and call it Alice's key. If Alice's key has MSD
x
, and Alice has signed Bob's key, Bob's key
would have an MSD of at most x+1
. The shortest
path from any key, let's call it Carol's, to Bob's either
goes through Alice's, and is thus one hop longer than the
shortest path from Carol to Alice, or it goes another
shorter way and is, well, shorter. The mean shortest
distance is therefore at most one hop longer than Alice's.
Alice has a horizontal line that contains signatures on her key and a vertical line that contains signatures she has made. Those lines intersect at a point in the diagonal, Alice's diagonal point. If you follow that line down from the diagonal more than one hop it can not contain any more signatures. If she had signed a key further down, that key would have a higer MSD and be sorted higher up, above the 1-hop line. The fact that keys tend to cluster on and close to the 1-hop line isn't very strange either. Keys will often end up with one particular signature that lowers its MSD considerably, and will therefore get an MSD of just below one hop more than the MSD of the signing key. There will therefore be lots of dots exactly one hop below the diagonal. For similar reasons, and also by plain geometrical symmetry, every point in the 1-hop line is exactly one hop to the left of the diagonal.
The (-1)-hop line is just a mirror image of the 1-hop line.
Remember, the rows and columns of these images represent
human beings, and the dots represents meetings between human
beings. If Alice meets Bob, there is a good chance that Bob
simultaneosly meets Alice. In practice, if Alice signs Bob's
key, Bob will probably sign Alice's key too. If Alice has a
lower MSD than Bob, Alice's signature on Bob's key will be
visible as a dot a certain distance, no more than
1
hop, below her diagonal point. Bob's
signature on Alice's key will then be visible the exact same
distance to the right of her diagonal point. The same can be
said for the reverse relationship, but the signature points
are above and to the left of Bob's diagonal point since he
has a higher MSD than Alice.
I said that Bob will probably sign Alice's key, and in practice, for different reasons, signatures are sometimes one-way. The fact that there are signatures above and to the right of the (-1)-hop line is a proof of this. If all signatures were cross-signatures, the top/right side of the leaf would be an exact (but with possibly different colors) mirror copy of the left/bottom side. Since all signatures are not cross-signatures the leaf is not symmetrical, and there can therefore be signatures further away. For example near the (-2)-hop line, which seems to attract signatures in a similar way as the (-1)-hop line.
The dark crosses cannot be explained with simple theory like
this. We'll have to look deeper into the Web of Trust to
find their cause. It turns out that the two keys 0xB3B2A12C
ct magazine CERTIFICATE <pgpCA@ct.heise.de> and 0xBB1D9F6D
ct magazine CERTIFICATE <pgpCA@ct.heise.de> are
pretty high ranked and have made so many signatures on keys
that would otherwise have a quite high MSD. The keys seems
to be part of an initiative
by a German magazine to promote cryptography and
OpenPGP. Those two keys are the only ones I can find leaving
obvious visible traces in the leaf, so the initiative seems
to be a success or at least have made quite an impact. This
has lead to lots of keys with an MSD 1
hop
higher than the MSD of those two keys. The keys differ
somewhat in MSD, so we have two different lines that
distance apart. This explains the increased distance between
the cyan lines - we simply have lots of keys that have to
fit in at practically the same MSD. The reason the lines are
dark is that for someone to get an MSD of 1
plus the MSD of the signing key, that someone have probably
neither got very many many signatures nor made as many
signatures themselves. Notice the way the crosses bounces
nicely on the 1-hop and (-1)-hop lines all the way down to
the end of the leaf. Or, in other words, that the cluster of
keys around specific MSDs also gives rise to clusters
exactly one hop away, and these too does not contain very
many signatures.
The above images just show direct links between keys. It is possible to find longer paths in the unscaled image, but it isn't something one would like to do by hand. Instead, we can look at this image. Instead of showing signatures it shows all paths that are exactly two hops long. (If you are a mathematician, think of the square of an adjacency matrix with only zeroes in the diagonal.) The color is chosen by averaging the signature type of the two signatures and rounding downwards, then selecting color as above. Notice how we now also see the 2-hop line, but just as before everything below that line is black. The fact that the 1-hop and (-1)-hop lines are still visible is interesting. One might wonder if they come from the probable scenario that if Alice and Bob have cross-signed and Bob and Carol have done the same, then there is a good chance that Alice and Carol have also cross-signed. Looking at the image below proves that this is not the whole explanation. In this image only the shortest paths of length two are marked. The 1-hop and (-1)-hop lines are clearly fainter, but still clearly visible.
Since we have both a key metric and a hop metric it makes sense to also draw the images so that one hop instead of one key gets a specific distance in the image. In other words, we stretch the image until the distances between the MSD marks are equal. If we remove the grid this stretched leaf is easily seen, but its shape is quite boring. Since the integer-hop lines always intersects the MSD grid crossings they must now all be straight lines. Notice how we now clearly can see the (-3)-hop line, which was barely guessable before.
All the other images on this page are generated from the actual OpenPGP Web of Trust, but it might be interesting to see if we can reproduce the leaf shape with a random Web of Trust. It looks like we can produce something similar. This image was made from a Web of Trust, sorted in MSD order, generated from these rules:
1
. Add
1
to that number. This key will now receive
that number of signatures.As can be seen in the above images, the top left corner is quite dense with signatures. This is simply a zoom of that corner. Unfortunately, it doesn't look very interesting. The hop distance from top to bottom row is less than one hop, which places the 1-hop and (-1)-hop lines, and therefore also the dark side of the leaf, outside the picture.
Sorting them by local MSD instead of global MSD doesn't make it less boring, but if we also stretch it like we did before to make the pixel distance proportional to hop distance something happens. The two lighter squares are interesting, especially since they seem to overlap in the middle.
(A stretched image with global MSDs look somewhat similar but is more boring.)
This far we have used no information besides the structure
of the Web of Trust when sorting the keys. But the keys also
have another piece of information attached to them which
discloses some aspects of their owner's relations with each
other, completely independant of the signatures. Each key
has a primary User ID, usually in the form John Doe
<foo@bar.com>
. Sometimes they have a
parenthesis or something else after the email address, and
sometimes the address isn't enclosed in brackets. I wanted
to keep it simple, so I just located the last
>
and if it existed I removed it together
with everything after. I then stripped the whitespace from
the end of the string, reversed it, and sorted the keys in
case-insensitive alphabetical order based on those strings.
The resulting group matrix image has much to tell. There are
clearly correlations between User IDs and signatures. If you
follow one of the links below the image or on the image
itself you might be able to explore the User IDs with your
mouse. Or you might look below where I have already marked
the important squares.
Please remember that when creating these images I used no
information about domains or anything similar. The sorting
did not treat the .
or any other character in
any special way, except stripping whitespace and removing
the last >
and everything after it.
Germany is clearly an important part of the Web of Trust, both containing many keys and being dense in signatures. This is probably in large thanks to the magazine initiative found above, but I don't doubt there are more factors.
.com
and .net
are very faint, as
expected. People having those types of keys are probably
scattered all around the globe. .org
, on the
other hand, might be scattered but is very strong anyway.
.debian.org
is the only non-top domain visible,
with a size similar to Sweden's but much brighter, despite
the developers geographical separation.
Notice the very dark border around Germany? The top and left edge says that Germany has very little contact with Canada, which makes sense geographically. However, the right and bottom edge says that the social contact between Germany and Sweden is almost non-existant, despite the mere 70 km between the two countries. In fact, there are only 187 signatures between the countries, of which 19 are to or from my key.
Instead of just sorting by either MSD or User ID, we can sort by a mix of them. Unfortunately, to do this we must part somewhat from the dumb and simple sorting policy, but the result is well worth it. Here the keys are sorted by email address like before, but then the top domains are identified and within each top domain the keys are sorted by the MSD those keys has in the whole Web of Trust.
Each leaf along the diagonal has the similar features as the
whole leaf analyzed before, for the same reasons. The 1-hop
and (-1) hop lines are there, and even the (-2)-hop line is
visible on some of the leaves. Why the dark side of the
leaves is still completely dark is not very strange when you
realize that a signature within the top domain limits the
MSD difference to 1
in the larger set too.
Notice how the two dark crosses only appear in the
.de
square. Remember, the two keys that made
the crosses are German.
While I was able to predict that the squares along the diagonal would look somewhat similar to the whole leaf before producing the image, the fact that the other squares, which represents signatures between top domains, also share most of the same features is strange. The only visible difference seems to be that the diagonal doesn't need to be straight.
.org
, sorted in MSD order.
We can also look at only the keys with a specific top
domain. Here we have all .org
keys, sorted by
their global MSD value. This is actually identical to a zoom
of the .org
square of the above image.
This image is very similar to the leaf for the whole Web of Trust. But the MSD values these keys are sorted by are not produced in the same way - for the whole Web of Trust we calculated the MSDs in the same set of keys that we graphed. Now the MSDs are calculated in a much larger superset. That does not seem to make much difference.
We can instead calculate distances among just the
.org
keys. This poses some difficulties since
there are key pairs between which there simply is no path.
In this image the keys are sorted according to the mean
shortest distance from the set of keys that can reach a
particular key. The big empty space on the top is keys that
can only be reached by themselves, at no distance at all,
and therefore has an MSD of exactly 0. Since they cannot be
reached by any other key they have no signature to the
right, but they may have signed other keys so some
signatures are visible beneath them.
Since it might be odd that a non-reachable key is sorted first we can make reachability highest priority and sort by MSD only among keys that are identically reachable. If you follow the links you see that it was implemented by rewarding large penalty MSDs for each key that could not reach the key. No key can be reached by all the other, but the lone keys at the top can be reached by slightly more than the large leaf in the middle.
You don't need a very large set of keys to produce a leaf
shape. These are just the Swedish keys. Or, more accurately,
the keys whose primary User ID has an email address that
ends in .se
. Many Swedes probably have email
addresses with other top domains. Anyway, the leaf shape and
most of its features are clearly visible with only these
468 keys.
Of course I have to sort the Swedish keys by email address
too. The most visible square is .liu.se
, Link?ping University, where
I am a student. Inside that square is
.lysator.liu.se
, Lysator, The Academic
Computer Society of Link?ping, where I am a member.
Inside that square is the intersection of the two lines of
my key.
This was one of those things I just had to do - sort the keys by their local MSD from only those keys inside their own top domain which can reach them. I have only faint ideas of what this image means, but those Germans are clearly visible here too.
Some ideas: