July 2002

This describes the newly-introduced multiplicative ratings system. It might be useful to read the original Devil Points document first.

Why the change? I grew dissatisfied with the previous additive system because it failed to capture the way probabilities of independent events work. Winning seven 2-player games gave you the same rating as winning a single 8-player game. This seemed wrong: there's only a 1/128 probability of winning all seven 2-player games, versus a 1/8 probability of winning a single 8-player game, so winning the seven games is the more glorious and impressive feat. Loosely speaking, I felt that adding the results of individual games couldn't be right, since one multiplies rather than adds probabilities of independent events.

So in 2002, after a few false starts, I hit upon a method I like. It has a simple elegance analogous to the old system, which pleases me, and it more accurately captures the intuition that winning seven 2-player games is more glorious than winning a single 8-player game.

Email with several folks on the mailing list, especially JeffF and Dan, led me to the final breakthrough needed to realize my current system.

The devil points from a single game still work the same. All that's changed is how those are used to compute a combined rating for the set of games you've played. Your rating is found by the following steps:

- Compute a rating R_{i} for each game played
- Multiply all the R_{i} to get a combined rating for the set of games
- Take the log of that to normalize

The ratings R_{i} are simply P(you'd do this poorly or worse)/P(you'd do this well or better). For the non-math-geeks, P(X) means "Probability of X occurring". Probabilities range from 0 (impossible) to 1 (certain). E.g. in an n-player game with a unique winner, the probability is 1/n that you'd win. The probability is also 1/n that you'd come in dead last.

Note that for someone who won, P(they'd do this poorly or worse) is 1, i.e. there's no way they could do better than 1st place. Similarly for someone who finishes dead last, P(they'd do this well or better) is 1, i.e. there's no way they could do worse than last place.

Thus your rating from an individual game will be a positive number, and as we'll see the expected result is 1, and your combined rating will be a positive number, also with expected result 1. This gives a sort of skewed exponential curve; the old system gave a distribution that was symmetric about 0. Players who do really poorly end up very close to 0, and players who do well end up with quite large numbers. So we take the log as a final step simply to normalize and get a roughly symmetrical curve centered at 0, so that positive ratings mean you did better than expected, and negative mean worse than expected.
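The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the published ratings: the function name `combined_rating` is hypothetical, and the natural log is an assumption, since the text only says "the log". The example values use the fact that the sole winner of an n-player game gets a rating of 1 / (1/n) = n.

```python
import math
from fractions import Fraction

def combined_rating(game_ratings):
    """Multiply the per-game ratings R_i, then take the log of the product."""
    product = math.prod(game_ratings, start=Fraction(1))
    # Assumption: natural log. Any base gives the same ordering of players.
    return math.log(float(product))

# Winning seven 2-player games: each R_i is 2, so the product is 128.
seven_small_wins = combined_rating([Fraction(2)] * 7)
# Winning a single 8-player game: the only R_i is 8.
one_big_win = combined_rating([Fraction(8)])

print(seven_small_wins)  # about 4.85
print(one_big_win)       # about 2.08
```

With no games played the product is the empty product, 1, and the combined rating is log(1) = 0.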

The simplest example is a 2-player game. Say that Biff won and Eugene lost. Then

P(Biff would do at least this poorly) = 1

P(Biff would do at least this well) = 1/2

So Biff's rating is the quotient of these, i.e. 1 / (1/2) = 2.

Similarly, Eugene's rating is (1/2) / 1 = 1/2.

Note that the winner's rating is always greater than 1, and the last-place player's rating is always less than 1.

Suppose we have a 3-player game, with Biff winning, Alfred 2nd, and Eugene dead last. Then

P(Biff doing this poorly)/P(Biff doing this well) = 1 / (1/3) = 3

P(Alfred doing this poorly)/P(Alfred doing this well) = (2/3) / (2/3) = 1

P(Eugene doing this poorly)/P(Eugene doing this well) = (1/3) / 1 = 1/3

But what about ties? That was a conundrum, because trying to estimate the probability, e.g., of winning an n-player game with a 2-player joint victory gets weird and is clearly game-dependent (e.g. in some games, ties are common, and in others they are impossible). The conditional-probability-based approach I take is to basically say "given that this was the distribution of ranks, what is the probability you'd do this poorly/well?" I.e. totally punt on attempting to worry about the probability of a tie occurring and simply go with the actual distribution that occurred, whether it has ties or not.

E.g. suppose we have a 5-player game with Biff and Eugene tying for 1st, Alfred and Tim tying for 2nd, and Herbert coming in last. Then, given that there was a 2-way tie for 1st place, you have a 2/5 chance of having done that well. Thus:

P(Biff doing this poorly)/P(Biff doing this well) = 1 / (2/5) = 5/2

P(Eugene doing this poorly)/P(Eugene doing this well) = same as Biff = 5/2

P(Alfred doing this poorly)/P(Alfred doing this well) = (3/5) / (4/5) = 3/4

P(Tim doing this poorly)/P(Tim doing this well) = same as Alfred = 3/4

P(Herbert doing this poorly)/P(Herbert doing this well) = (1/5) / 1 = 1/5

If that isn't obvious, consider the following breakdown, listing the players from first place to last:

- Biff, Eugene: probability of doing this poorly = 5/5; probability of doing this well = 2/5
- Alfred, Tim: probability of doing this poorly = 3/5; probability of doing this well = 4/5
- Herbert: probability of doing this poorly = 1/5; probability of doing this well = 5/5

If you still don't get it, all I can say is "Math is hard, let's go shopping!"
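For those who would rather read code than shop, here is a minimal sketch of the rank rating computed from the actual distribution of ranks, ties included. The function name `rank_ratings` and the dictionary layout are illustrative assumptions; the arithmetic follows the definition P(this poorly or worse) / P(this well or better).

```python
from fractions import Fraction

def rank_ratings(ranks):
    """Return each player's rank rating for one game.

    `ranks` maps player name -> finishing rank, where 1 is best and
    tied players share the same rank number.
    """
    n = len(ranks)
    ratings = {}
    for player, rank in ranks.items():
        # Players who finished at this rank or below it (including this player).
        poorly_or_worse = sum(1 for r in ranks.values() if r >= rank)
        # Players who finished at this rank or above it (including this player).
        well_or_better = sum(1 for r in ranks.values() if r <= rank)
        ratings[player] = Fraction(poorly_or_worse, n) / Fraction(well_or_better, n)
    return ratings

# The 5-player example with two 2-way ties.
game = {"Biff": 1, "Eugene": 1, "Alfred": 2, "Tim": 2, "Herbert": 3}
for player, rating in rank_ratings(game).items():
    print(player, rating)
```

Run on the 5-player example, this reproduces 5/2 for Biff and Eugene, 3/4 for Alfred and Tim, and 1/5 for Herbert.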

Win ratings are handled analogously to the old system, i.e. we first convert all the non-1st-place ranks to be equally tied for 2nd. E.g. the above 5-player example becomes Biff and Eugene tied for 1st, with Alfred, Tim, and Herbert all equally tied losers since none of them won. I.e. the rank rating and the win rating will be the same for winners and different for non-winners (just as in the old system). Thus:

P(Biff doing this poorly)/P(Biff doing this well) = 1 / (2/5) = 5/2

P(Eugene doing this poorly)/P(Eugene doing this well) = same as Biff = 5/2

P(Alfred doing this poorly)/P(Alfred doing this well) = (3/5) / 1 = 3/5

P(Tim doing this poorly)/P(Tim doing this well) = same as Alfred = 3/5

P(Herbert doing this poorly)/P(Herbert doing this well) = same as Alfred = 3/5

Note that one of the 2 probabilities will always be 1 when computing a player's win rating, since everyone is either a winner or dead last when you consider all the losers to be equal losers.
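The win rating can be sketched the same way. Because every player is either a winner or an equally tied loser, one of the two probabilities is always 1, which keeps the code short. The name `win_ratings` and the input layout are assumptions made for illustration.

```python
from fractions import Fraction

def win_ratings(ranks):
    """Return each player's win rating for one game.

    `ranks` maps player name -> finishing rank (1 is best). Everyone who
    did not share the best rank is treated as an equally tied loser.
    """
    n = len(ranks)
    best = min(ranks.values())
    winners = sum(1 for r in ranks.values() if r == best)
    losers = n - winners
    ratings = {}
    for player, rank in ranks.items():
        if rank == best:
            # Winner: P(this poorly or worse) = 1, P(this well or better) = winners/n.
            ratings[player] = Fraction(1) / Fraction(winners, n)
        else:
            # Loser: P(this poorly or worse) = losers/n, P(this well or better) = 1.
            ratings[player] = Fraction(losers, n) / Fraction(1)
    return ratings

game = {"Biff": 1, "Eugene": 1, "Alfred": 2, "Tim": 2, "Herbert": 3}
for player, rating in win_ratings(game).items():
    print(player, rating)
```

For the 5-player example this gives 5/2 to Biff and Eugene and 3/5 to each of Alfred, Tim, and Herbert.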

Teams are handled the same as before, i.e. each team is considered a "player" for purposes of computing ratings, and the numeric effects are applied identically to each player within the team.
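A rough sketch of the team rule, assuming that "each team is a player" means the probabilities are taken over the number of teams rather than the number of people. The function name, the team names, and the 2-vs-2 game below are all hypothetical.

```python
from fractions import Fraction

def team_game_ratings(team_ranks, rosters):
    """Rate each team as if it were a single player, then copy the
    team's rating to every member.

    `team_ranks` maps team name -> rank (1 is best);
    `rosters` maps team name -> list of member names.
    """
    n = len(team_ranks)  # Assumption: teams, not individuals, are counted.
    ratings = {}
    for team, rank in team_ranks.items():
        poorly_or_worse = sum(1 for r in team_ranks.values() if r >= rank)
        well_or_better = sum(1 for r in team_ranks.values() if r <= rank)
        team_rating = Fraction(poorly_or_worse, n) / Fraction(well_or_better, n)
        for member in rosters[team]:
            ratings[member] = team_rating
    return ratings

# Hypothetical 2-vs-2 game: it rates like a 2-player game.
team_ranks = {"Red": 1, "Blue": 2}
rosters = {"Red": ["Biff", "Alfred"], "Blue": ["Eugene", "Tim"]}
print(team_game_ratings(team_ranks, rosters))
```

Under that assumption, both members of the winning team get 2 and both members of the losing team get 1/2.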

How about some real-world examples? The July 17 2002 ratings work nicely. (They also show the numbers before and after the normalizing log transform.) Steve won all 3 of his games, which in the old system would have given him perfect ratings of 1. Daniel won 6 of 12, which in the old system would have given him ratings less than 1. But in fact the probability of Daniel winning 6 of those 12 games was lower than the probability of Steve winning all of his mere 3 games, and so the new system gives Daniel a higher rating. Similarly, Dan lost both of his 2 games, while Whindy won 1 of her 10 games. The old system might have given her a higher win rating than Dan, but the new system notes that in effect her 1 win is countered by losing so many: the probability that Dan would lose both of only 2 games is not as low as the probability that Whindy would win only 1 out of 10 whole games.

A major difference between the old and new systems is that the old system's ratings are bounded between -1 and 1. If you won all your games, your rating would be 1. There was no way to distinguish someone who played and won a single 2-player game from someone who played and won ten 8-player games, even though the latter feat is clearly more impressive and unlikely. The new system has an open-ended scale. Someone who tends to do better than expected will get an arbitrarily high rating by playing more and more games. I don't like that aspect of it, but it's still more pleasing to me than the old system.

Also note that the old system had the nice property that all the ratings from a given game added to 1, and the expected rating was always 0. This system has no such nice simple summation/cumulative property, but the expected rating is always 1. (I suspect there's got to be some similar property; it's just messier and I've not figured it out.)
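One candidate for such a property, offered as a sketch rather than as part of the system's design: in a game with no ties, the player at rank k of n gets (n - k + 1) / k, so the ratings of all players multiply to exactly 1 and their logs sum to 0. The check below uses a hypothetical helper `tie_free_ratings`. It also shows that the property does not survive ties, using the 5-player example from earlier.

```python
import math
from fractions import Fraction

def tie_free_ratings(n):
    """Ratings for ranks 1..n in an n-player game with no ties."""
    # At rank k, n - k + 1 players did this poorly or worse
    # and k players did this well or better.
    return [Fraction(n - k + 1, n) / Fraction(k, n) for k in range(1, n + 1)]

# The product over all players is 1 for every tie-free game size tried.
for n in range(2, 9):
    assert math.prod(tie_free_ratings(n)) == 1

# With ties it fails: the 5-player example with two 2-way ties.
tied = [Fraction(5, 2), Fraction(5, 2), Fraction(3, 4), Fraction(3, 4), Fraction(1, 5)]
tied_product = math.prod(tied)
print(tied_product)  # 45/64
```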

Also, in the old system, achieving the expected result (being right in the middle) of a single game would tend to pull your rating toward the middle (lowering a strong player's rating and raising a weak player's rating) due to that system's final normalization step (dividing the sum of your individual game ratings by the total number of opponents faced). In the new system, achieving the expected result gives a game rating of exactly 1, and multiplying by 1 has literally no effect on your rating.

Some axiomatic assumptions remain the same. E.g. only final ranks in the game matter, not relative scores. And no attempt is made to modify ratings based on which opponents you played, or which games were played.

Please let me know if this document has bugs or you still don't grok it.

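2003-01-10 update: Thanks to Bradley Eng-Kohn for pointing out that the ratio of probabilities

P(you'd do this poorly or worse) / P(you'd do this well or better)

is equivalent to the ratio of

(# players who did this poorly or worse) / (# players who did this well or better)

since of course the probabilities are simply

P(you'd do this poorly or worse) = (# players who did this poorly or worse) / (# players)

P(you'd do this well or better) = (# players who did this well or better) / (# players)

so the two # players cancel out in the ratio of probabilities. Thanks Brad!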
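The cancellation can be checked numerically. This is a minimal sketch, assuming the two quantities being compared are the ratio of probabilities and the ratio of raw head counts (players who did this poorly or worse over players who did this well or better). Both function names are hypothetical.

```python
from fractions import Fraction

def rating_from_probabilities(all_ranks, rank):
    """P(this poorly or worse) / P(this well or better)."""
    n = len(all_ranks)
    p_poorly = Fraction(sum(1 for r in all_ranks if r >= rank), n)
    p_well = Fraction(sum(1 for r in all_ranks if r <= rank), n)
    return p_poorly / p_well

def rating_from_counts(all_ranks, rank):
    """(# players this poorly or worse) / (# players this well or better)."""
    poorly = sum(1 for r in all_ranks if r >= rank)
    well = sum(1 for r in all_ranks if r <= rank)
    return Fraction(poorly, well)

# Ranks from the 5-player example: two tied for 1st, two tied for 2nd, one last.
ranks = [1, 1, 2, 2, 3]
for rank in sorted(set(ranks)):
    assert rating_from_probabilities(ranks, rank) == rating_from_counts(ranks, rank)
    print(rank, rating_from_counts(ranks, rank))
```

The count form avoids dividing by the number of players twice, which makes it the simpler of the two to compute by hand.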