Wotsap

Home
Search
News
FAQ
Mailing lists
Download
Dissecting the Leaf of Trust
Thanks
Links

Dissecting the Leaf of Trust

After implementing the group matrices I figured it would be nice to see the group matrix of a much larger group. (If you are a mathematician, the Web of Trust is a large directed graph where the vertices are called "keys" and the edges "signatures". There are four different types of signatures, just think of them as four colors of the edges. The group matrix is the adjacency matrix of this graph. You probably want to take a look at the FAQ, especially the part about MSD.) I generated a PNG image with the keys sorted in MSD order and expected little more than random noise. When I first saw the result I thought I had done something wrong, but a little bit of thinking revealed that the resulting leaf-like shape was perfectly natural, almost unavoidable. Writing my master's thesis on a mathematical departement (but not about this) has its advantages, but I did not manage to find anyone who had seen anything like it. If you know if it has been analyzed before, please send me a mail. Scroll down for some images analyzing the mail addresses of the OpenPGP keys, but let's try to dissect this leaf first.

About the images

The whole Leaf of Trust

600イ 800イ 1000イ 1200イ 1500イ

An image of all signatures in the Web of Trust, scaled 61 times. The keys are sorted in MSD order, with the key with best MSD being the top row and leftmost column.

Before we start talking about the technical details of the images, remember that what they really represent is relations between real human beings. They may look good regardless, but keeping in mind that each and every dot (well, almost) is an actual meeting between two humans gives the patterns a new dimension. If you own an OpenPGP key which was available here at the time this was written, then some of these dots were made by you.

The Web of Trust information used on this page was taken from the .wot file from 2004年12月10日.

A dot in column x, counting from left, and row y, counting from the top, means that key number x has signed key number y. A red dot means a signature of level 0 or 1, a blue dot level 2 and a green level 3. Signatures which are not on the primary User ID are represented by a darker dot. A key can not sign itself (or, rather, must always sign itself). This is shown by the grey diagonal line. In other words, each key has a vertical line and a horizontal line, with the same distance from the top left corner. The horizontal lines shows signatures on that key, while the vertical line shows signatures made by that key. The intersection between those two lines is grey. To get a better feel of how these matrices work, play with the group matrix generator.

Since the Web of Trust contained approximately 24000 keys when these images were made (December 2004), most images on this page should really be 24000x24000 pixels large. That is very large and unpractical. If you really want one of these, this is an unscaled image of the whole leaf. Beware that most programs eats something between several hundred MiB and some GiB of memory and then crashes when trying to unpack it, while most web browsers just give up and claim it contains errors. The rest of the images are downscaled when needed to more practical sizes. To avoid suspicion of patterns created by the downscaling, only integer downscalings are made. Since the images would be far too dark if the pixels were just averaged, a floating point average was first calculated, and that value was gamma corrected with a high gamma value.

If you click on the image, or on any other image on this page, you will come to a page with some more technical information on the image and a HTML hack that lets you see the name and MSD for individual keys. The links below the images takes you to similar pages with larger images. If you follow a link on the image above, you can easily check that it is sorted in MSD order by moving your mouse over the image while watching your status bar.

Order matters

Random order

600イ 800イ 1000イ 1200イ 1500イ

This too is an image of all the signatures in the Web of Trust, just like the one above, but the order of the keys is random.

Not all group matrices look like leaves. The group matrices look very different depending on the order of the keys. There are a large number of ways to arrange 24000 keys in a list. According to Stirling's approximation, the number would have around 94702 digits. A large number indeed. Most key orders gives a group matrix that look like the random one to the right or above. We shall explore some more key orders that comes natural, but much more may be hidden in the other ways to order the keys.

The only difference between this random image and the one above is the order of the keys. Because of the lack of other order, one detail stands out - there are both vertical and horizontal colored stripes, but the vertical ones are clearly more visible. This is because a particular human being tends to be somewhat consistent in what signature types she uses, and while signature types on a particular key are also correlated, that correlation is much weaker. The same patterns can be seen on the other images too, but they are much harder to spot amongst the other patterns.

Hop metric of the leaf

MSDs marked on the leaf

600イ 800イ 1000イ 1200イ 1500イ

The Leaf of Trust again, but MSDs are marked every hop with yellow lines and every 1/10th hop with dark cyan lines.

There are two natural ways to measure distances in this graph. Counting the number of keys is an obvious method, but since the keys are sorted by MSD we can also look at the MSD distances, at least for purely vertical or purely horizontal distances. MSD is measured in hops, so this gives us a way to speak of the distance between two lines or two columns as a number of hops. In this graph every integer MSD is marked by a yellow line and cyan lines are placed every 1/10th hop. You can also follow one of the links below the image or on the image itself to (hopefully be able to) explore the distances with your mouse pointer.

There are a number of interesting things that can be seen here.

There is a bright line that marks the left/bottom edge of the leaf. Let's call that line the 1-hop line. The 1-hop line has one clearly visible mirror image on the other side of the other side of the diagonal, and yet another but much fainter even further away. I'll refer to these as the (-1)-hop and (-2)-hop lines.
The dark side of the leaf - the area below the 1-hop line is black. It might be hard to tell from the image, but it is completely black, without a single signature.
The 1-hop, (-1)-hop and (-2)-hop lines always intersect MSD line crossings. So does the diagonal, but that can be explained simply by the fact that both rows and columns are sorted in the same order.
The two dark crosses that reflects down the leaf while getting fainter are interesting. Notice how the distances between the cyan 1/10th-hop-lines are bigger around the crosses? That increased distance is visible in all the crosses reflections down the leaf, even when the reflections become too faint to be seen themselves.

One way to understand the 1-hop line is to pick a key, any key, and call it Alice's key. If Alice's key has MSD x, and Alice has signed Bob's key, Bob's key would have an MSD of at most x+1. The shortest path from any key, let's call it Carol's, to Bob's either goes through Alice's, and is thus one hop longer than the shortest path from Carol to Alice, or it goes another shorter way and is, well, shorter. The mean shortest distance is therefore at most one hop longer than Alice's.

Alice has a horizontal line that contains signatures on her key and a vertical line that contains signatures she has made. Those lines intersect at a point in the diagonal, Alice's diagonal point. If you follow that line down from the diagonal more than one hop it can not contain any more signatures. If she had signed a key further down, that key would have a higer MSD and be sorted higher up, above the 1-hop line. The fact that keys tend to cluster on and close to the 1-hop line isn't very strange either. Keys will often end up with one particular signature that lowers its MSD considerably, and will therefore get an MSD of just below one hop more than the MSD of the signing key. There will therefore be lots of dots exactly one hop below the diagonal. For similar reasons, and also by plain geometrical symmetry, every point in the 1-hop line is exactly one hop to the left of the diagonal.

The (-1)-hop line is just a mirror image of the 1-hop line. Remember, the rows and columns of these images represent human beings, and the dots represents meetings between human beings. If Alice meets Bob, there is a good chance that Bob simultaneosly meets Alice. In practice, if Alice signs Bob's key, Bob will probably sign Alice's key too. If Alice has a lower MSD than Bob, Alice's signature on Bob's key will be visible as a dot a certain distance, no more than 1 hop, below her diagonal point. Bob's signature on Alice's key will then be visible the exact same distance to the right of her diagonal point. The same can be said for the reverse relationship, but the signature points are above and to the left of Bob's diagonal point since he has a higher MSD than Alice.

I said that Bob will probably sign Alice's key, and in practice, for different reasons, signatures are sometimes one-way. The fact that there are signatures above and to the right of the (-1)-hop line is a proof of this. If all signatures were cross-signatures, the top/right side of the leaf would be an exact (but with possibly different colors) mirror copy of the left/bottom side. Since all signatures are not cross-signatures the leaf is not symmetrical, and there can therefore be signatures further away. For example near the (-2)-hop line, which seems to attract signatures in a similar way as the (-1)-hop line.

The dark crosses cannot be explained with simple theory like this. We'll have to look deeper into the Web of Trust to find their cause. It turns out that the two keys 0xB3B2A12C ct magazine CERTIFICATE <pgpCA@ct.heise.de> and 0xBB1D9F6D ct magazine CERTIFICATE <pgpCA@ct.heise.de> are pretty high ranked and have made so many signatures on keys that would otherwise have a quite high MSD. The keys seems to be part of an initiative by a German magazine to promote cryptography and OpenPGP. Those two keys are the only ones I can find leaving obvious visible traces in the leaf, so the initiative seems to be a success or at least have made quite an impact. This has lead to lots of keys with an MSD 1 hop higher than the MSD of those two keys. The keys differ somewhat in MSD, so we have two different lines that distance apart. This explains the increased distance between the cyan lines - we simply have lots of keys that have to fit in at practically the same MSD. The reason the lines are dark is that for someone to get an MSD of 1 plus the MSD of the signing key, that someone have probably neither got very many many signatures nor made as many signatures themselves. Notice the way the crosses bounces nicely on the 1-hop and (-1)-hop lines all the way down to the end of the leaf. Or, in other words, that the cluster of keys around specific MSDs also gives rise to clusters exactly one hop away, and these too does not contain very many signatures.

Two hops

All two hop paths

600イ 800イ 1000イ 1200イ 1500イ

Instead of signatures, which can be seen as a one hop path, all two-hop paths are marked.

The above images just show direct links between keys. It is possible to find longer paths in the unscaled image, but it isn't something one would like to do by hand. Instead, we can look at this image. Instead of showing signatures it shows all paths that are exactly two hops long. (If you are a mathematician, think of the square of an adjacency matrix with only zeroes in the diagonal.) The color is chosen by averaging the signature type of the two signatures and rounding downwards, then selecting color as above. Notice how we now also see the 2-hop line, but just as before everything below that line is black. The fact that the 1-hop and (-1)-hop lines are still visible is interesting. One might wonder if they come from the probable scenario that if Alice and Bob have cross-signed and Bob and Carol have done the same, then there is a good chance that Alice and Carol have also cross-signed. Looking at the image below proves that this is not the whole explanation. In this image only the shortest paths of length two are marked. The 1-hop and (-1)-hop lines are clearly fainter, but still clearly visible.

All two hop shortest paths

600イ 800イ 1000イ 1200イ 1500イ

Just as above two hop paths are marked, but only if they are the shortest path. If a direct signature exists, the path is not visible.

Adjusting to hop metric

Stretched leaf with MSD grid

600イ 800イ 1000イ 1200イ 1500イ

The same image as above, but stretched to make the MSD grid symmetrical.

Since we have both a key metric and a hop metric it makes sense to also draw the images so that one hop instead of one key gets a specific distance in the image. In other words, we stretch the image until the distances between the MSD marks are equal. If we remove the grid this stretched leaf is easily seen, but its shape is quite boring. Since the integer-hop lines always intersects the MSD grid crossings they must now all be straight lines. Notice how we now clearly can see the (-3)-hop line, which was barely guessable before.

Stretched leaf without MSD grid

600イ 800イ 1000イ 1200イ 1500イ

The same image as above, but the MSD grid is removed.

Faking a web of trust

Random signatures

600イ 800イ 1000イ 1200イ 1500イ

This is a completely fictional Web of Trust, generated from some simple rules.

All the other images on this page are generated from the actual OpenPGP Web of Trust, but it might be interesting to see if we can reproduce the leaf shape with a random Web of Trust. It looks like we can produce something similar. This image was made from a Web of Trust, sorted in MSD order, generated from these rules:

Create 1000 keys without signatures.
For each of these keys, pick a number from an exponential distribution with mean 1. Add 1 to that number. This key will now receive that number of signatures.
For each of these signatures, either just give the key a signature, marked red in the image, from a random key, or give it a cross-signature with a random key, marked green on this key and blue on the other.

Top 1000

The 1000 keys with lowest MSD

500イ 1000イ

Adjacency matrix of the 1000 highest ranked keys, in MSD order.

As can be seen in the above images, the top left corner is quite dense with signatures. This is simply a zoom of that corner. Unfortunately, it doesn't look very interesting. The hop distance from top to bottom row is less than one hop, which places the 1-hop and (-1)-hop lines, and therefore also the dark side of the leaf, outside the picture.

Top 1000 stretched

500イ 1000イ

Top 1000 sorted by local MSD and stretched to make the invisible MSD grid symmetrical.

Sorting them by local MSD instead of global MSD doesn't make it less boring, but if we also stretch it like we did before to make the pixel distance proportional to hop distance something happens. The two lighter squares are interesting, especially since they seem to overlap in the middle.

(A stretched image with global MSDs look somewhat similar but is more boring.)

Using the user identity when sorting

Keys sorted by reverse name

600イ 800イ 1000イ 1200イ 1500イ

Keys are not sorted by MSD but by the reverse of their primary User IDs.

This far we have used no information besides the structure of the Web of Trust when sorting the keys. But the keys also have another piece of information attached to them which discloses some aspects of their owner's relations with each other, completely independant of the signatures. Each key has a primary User ID, usually in the form John Doe <foo@bar.com>. Sometimes they have a parenthesis or something else after the email address, and sometimes the address isn't enclosed in brackets. I wanted to keep it simple, so I just located the last > and if it existed I removed it together with everything after. I then stripped the whitespace from the end of the string, reversed it, and sorted the keys in case-insensitive alphabetical order based on those strings. The resulting group matrix image has much to tell. There are clearly correlations between User IDs and signatures. If you follow one of the links below the image or on the image itself you might be able to explore the User IDs with your mouse. Or you might look below where I have already marked the important squares.

Please remember that when creating these images I used no information about domains or anything similar. The sorting did not treat the . or any other character in any special way, except stripping whitespace and removing the last > and everything after it.

Large top domains marked

600イ 800イ 1000イ 1200イ 1500イ

Like above, but the visible boxes are identified as domains.

Germany is clearly an important part of the Web of Trust, both containing many keys and being dense in signatures. This is probably in large thanks to the magazine initiative found above, but I don't doubt there are more factors.

.com and .net are very faint, as expected. People having those types of keys are probably scattered all around the globe. .org, on the other hand, might be scattered but is very strong anyway. .debian.org is the only non-top domain visible, with a size similar to Sweden's but much brighter, despite the developers geographical separation.

Notice the very dark border around Germany? The top and left edge says that Germany has very little contact with Canada, which makes sense geographically. However, the right and bottom edge says that the social contact between Germany and Sweden is almost non-existant, despite the mere 70 km between the two countries. In fact, there are only 187 signatures between the countries, of which 19 are to or from my key.

Separating top domains

Sorted by both top domain and MSD

600イ 800イ 1000イ 1200イ 1500イ

Keys sorted first by top domains and then in MSD order within the domains, as suggested by Jan-ナke Larsson.

Instead of just sorting by either MSD or User ID, we can sort by a mix of them. Unfortunately, to do this we must part somewhat from the dumb and simple sorting policy, but the result is well worth it. Here the keys are sorted by email address like before, but then the top domains are identified and within each top domain the keys are sorted by the MSD those keys has in the whole Web of Trust.

Each leaf along the diagonal has the similar features as the whole leaf analyzed before, for the same reasons. The 1-hop and (-1) hop lines are there, and even the (-2)-hop line is visible on some of the leaves. Why the dark side of the leaves is still completely dark is not very strange when you realize that a signature within the top domain limits the MSD difference to 1 in the larger set too.

Notice how the two dark crosses only appear in the .de square. Remember, the two keys that made the crosses are German.

While I was able to predict that the squares along the diagonal would look somewhat similar to the whole leaf before producing the image, the fact that the other squares, which represents signatures between top domains, also share most of the same features is strange. The only visible difference seems to be that the diagonal doesn't need to be straight.

.org

Just the .org keys

610イ 760イ 1010イ 1510イ

All keys with email address ending in .org, sorted in MSD order.

We can also look at only the keys with a specific top domain. Here we have all .org keys, sorted by their global MSD value. This is actually identical to a zoom of the .org square of the above image.

This image is very similar to the leaf for the whole Web of Trust. But the MSD values these keys are sorted by are not produced in the same way - for the whole Web of Trust we calculated the MSDs in the same set of keys that we graphed. Now the MSDs are calculated in a much larger superset. That does not seem to make much difference.

.org keys with local MSDs

610イ 760イ 1010イ 1510イ

Keys sorted by the MSD it has from the set that can actually reach them.

We can instead calculate distances among just the .org keys. This poses some difficulties since there are key pairs between which there simply is no path. In this image the keys are sorted according to the mean shortest distance from the set of keys that can reach a particular key. The big empty space on the top is keys that can only be reached by themselves, at no distance at all, and therefore has an MSD of exactly 0. Since they cannot be reached by any other key they have no signature to the right, but they may have signed other keys so some signatures are visible beneath them.

.org sorted by reachability

610イ 760イ 1010イ 1510イ

These keys are first sorted by how large set of keys can reach the particular key and then in local MSD order as above inside those groups.

Since it might be odd that a non-reachable key is sorted first we can make reachability highest priority and sort by MSD only among keys that are identically reachable. If you follow the links you see that it was implemented by rewarding large penalty MSDs for each key that could not reach the key. No key can be reached by all the other, but the lone keys at the top can be reached by slightly more than the large leaf in the middle.

Sweden

Just the Swedish keys

500イ

Just the Swedish keys, sorted by global MSD.

You don't need a very large set of keys to produce a leaf shape. These are just the Swedish keys. Or, more accurately, the keys whose primary User ID has an email address that ends in .se. Many Swedes probably have email addresses with other top domains. Anyway, the leaf shape and most of its features are clearly visible with only these 468 keys.

Swedish keys sorted by name

500イ

I can see my house from here!

Of course I have to sort the Swedish keys by email address too. The most visible square is .liu.se, Link?ping University, where I am a student. Inside that square is .lysator.liu.se, Lysator, The Academic Computer Society of Link?ping, where I am a member. Inside that square is the intersection of the two lines of my key.

Top domain MSDs used incorrectly

Sorted by top domain MSD

600イ 800イ 1000イ 1200イ 1500イ

These are all keys but sorted not by their MSD in this set, but by their local MSDs in the smaller top domain sets.

This was one of those things I just had to do - sort the keys by their local MSD from only those keys inside their own top domain which can reach them. I have only faint ideas of what this image means, but those Germans are clearly visible here too.

Future work

Some ideas:

Find out if anyone else has found similar images before.
Find other natural ways to sort the keys that produces nice images.
Explore the time dimension. How do these images vary over time? It should be possible to create an mpeg movie of gradually changing graphs.
When creating these I only had access to what is available in the .wot files I use for Wotsap. There are lots of other information that I simply don't have available right now. Examples of things that might be interesting sorting by:
- Key creation dates
- Dates of signatures
- Number of User IDs and properties of their email addresses

J?rgen Cederl?f <jc+wotsap@lysator.liu.se>