Inconsistent ratings when drawing using TrueSkill

Question 1

I'm using TrueSkill to build a rating system for a tennis tournament among my friends. Games are 1v1, so I'm trying out the following:

from trueskill import Rating, quality_1vs1, rate_1vs1
alice, bob = Rating(25), Rating(25)
print('No games')
print(alice)
print(bob)
alice, bob = rate_1vs1(alice, bob)
print('First game, winner alice')
print(alice)
print(bob)
alice, bob = rate_1vs1(bob, alice)
print('Second game, winner bob')
print(alice)
print(bob)

This outputs the following:

No games
trueskill.Rating(mu=25.000, sigma=8.333)
trueskill.Rating(mu=25.000, sigma=8.333)
First game, winner alice
trueskill.Rating(mu=29.396, sigma=7.171)
trueskill.Rating(mu=20.604, sigma=7.171)
Second game, winner bob
trueskill.Rating(mu=26.643, sigma=6.040)
trueskill.Rating(mu=23.357, sigma=6.040)

I would have expected both players having the same rating after these two games, but I'll go with that; no issue. However, if I replace the second game with a draw and re-run it:

alice, bob = rate_1vs1(bob, alice, True)
print('Second game, draw')
print(alice)
print(bob)

I get the following:

First game, winner alice
trueskill.Rating(mu=29.396, sigma=7.171)
trueskill.Rating(mu=20.604, sigma=7.171)
Second game, draw
trueskill.Rating(mu=23.886, sigma=5.678)
trueskill.Rating(mu=26.114, sigma=5.678)

bob seems to have a better ranking when having drawn than when having won.

What's going on here? What am I doing wrong?

maxy · Accepted Answer · 2024-10-03 10:11:00Z

"I would have expected both players having the same rating after these two games"

The TrueSkill FAQ mentions that it "takes more recent game outcomes more into account than older game outcomes".

It looks like TrueSkill only remembers two numbers per player (mu and sigma), so if it ever wants to forget the past, it has to do some kind of exponential decay of its old knowledge.

"Bob seems to have a better ranking when having drawn than when having won. What's going on here?"

I don't know, but I think you did everything right. The answer is probably in the formula (or maybe the implementation). Note how the sigma has decreased more after the draw outcome, so the algorithm seems to think that it gained much stronger evidence from the drawn result. That the mu values move a lot more when the evidence is stronger is only logical. So the question to ask is: Why should it consider the draw to be more informative?

Thanks for your answer. What you pointed out from their FAQ confuses me even more. If the most recent game should be taken more into account, then after the second game, bob should have the higher mu, i.e. skill. Good point about the sigma decreasing more for a draw than for a victory/loss; I hadn't noticed that. Honestly, I'm very confused, and I'm starting to wonder whether this library really works as expected.

When I do only win/loss (no draws), it looks sensible to me. I think it's better to judge this algorithm by how it updates the rating difference between two players (instead of the absolute rating).
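
One thing that may be worth checking is the unpacking order in the second call. The sketch below assumes that rate_1vs1 returns the updated ratings in the same order as its arguments (first argument treated as the winner), and that the library defaults are mu=25, sigma=25/3, beta=25/6, tau=25/300 and a draw probability of 0.10. rate_1vs1_sketch is a hypothetical pure-Python stand-in that uses the closed-form two-player update; it is not the library's own code. Under those assumptions, `alice, bob = rate_1vs1(bob, alice)` would store bob's new rating in alice and alice's in bob.

```python
import math
from statistics import NormalDist

# Assumed library defaults (not taken from the question itself).
MU, SIGMA = 25.0, 25.0 / 3
BETA, TAU, DRAW_PROBABILITY = 25.0 / 6, 25.0 / 300, 0.10
N = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def rate_1vs1_sketch(first, second, drawn=False):
    """Hypothetical stand-in for rate_1vs1 on (mu, sigma) tuples.

    `first` is treated as the winner unless drawn=True. The new ratings
    are returned in ARGUMENT order: (new_first, new_second).
    """
    (mu1, s1), (mu2, s2) = first, second
    # Inflate each variance by tau^2 before the game (dynamics term).
    var1, var2 = s1 ** 2 + TAU ** 2, s2 ** 2 + TAU ** 2
    c = math.sqrt(2 * BETA ** 2 + var1 + var2)
    t = (mu1 - mu2) / c
    # Draw margin for two players, expressed in units of c.
    eps = N.inv_cdf((DRAW_PROBABILITY + 1) / 2) * math.sqrt(2) * BETA / c
    if drawn:
        denom = N.cdf(eps - t) - N.cdf(-eps - t)
        v = (N.pdf(-eps - t) - N.pdf(eps - t)) / denom
        w = v * v + ((eps - t) * N.pdf(eps - t)
                     + (eps + t) * N.pdf(eps + t)) / denom
    else:
        v = N.pdf(t - eps) / N.cdf(t - eps)
        w = v * (v + t - eps)
    new_first = (mu1 + var1 / c * v, math.sqrt(var1 * (1 - var1 / c ** 2 * w)))
    new_second = (mu2 - var2 / c * v, math.sqrt(var2 * (1 - var2 / c ** 2 * w)))
    return new_first, new_second


alice, bob = (MU, SIGMA), (MU, SIGMA)
alice, bob = rate_1vs1_sketch(alice, bob)  # game 1: alice wins

# Game 2, two scenarios. Unpack in the same order as the arguments.
bob_win, alice_loss = rate_1vs1_sketch(bob, alice)
bob_draw, alice_draw = rate_1vs1_sketch(bob, alice, drawn=True)

for label, (mu, sigma) in [("bob after winning", bob_win),
                           ("alice after losing", alice_loss),
                           ("bob after drawing", bob_draw),
                           ("alice after drawing", alice_draw)]:
    print(f"{label}: mu={mu:.3f}, sigma={sigma:.3f}")
```

If the assumptions hold, the values printed here should match the figures in the question with the two names exchanged: bob ends up higher after winning the second game than after drawing it, and the draw only pulls the two ratings closer together. With the real library, the equivalent change would be to write `bob, alice = rate_1vs1(bob, alice)` and `bob, alice = rate_1vs1(bob, alice, drawn=True)`.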
