Devil Points

The RussCon "Devil Points" Ratings System

From the beginning we kept statistics of game results. We called them Devil Points due to the group's lingo for the leader in a game (e.g. "Nobody should trade with Ken any more, he's the Devil!"). Soon the Devil came to mean someone who did well on average, not just within a single game.

I was interested in making a ratings system after playing in a Settlers tournament Brady ran at MillenniumCon. We both agreed that the system in that tournament seemed flawed, but it wasn't obvious how to fix it. After exploring various systems, I soon hit upon a simple, elegant notion: your Devil Points from a single game would be:

(# opponents you beat) - (# opponents who beat you)

E.g. if a game of Settlers ends with the following scores:

Alf  10
Bert  8
Chuck 7
Del   5

then the Devil Points earned are:

Alf    3
Bert   1
Chuck -1
Del   -3

This has several properties:

The total points awarded from a game always sum to zero.
Winning games against a larger number of opponents gains you more Devil Points.
Coming in last against a larger number of opponents costs you more Devil Points.
Ties are handled nicely. E.g. if Chuck had actually scored 8 above, tying with Bert, then both would earn 0 Devil Points (each beat Del and was beaten by Alf).
Differences in score are irrelevant. Only the rank ordering matters. This bothers some people and pleases others. I took this approach partly inspired by tournaments for go, chess, etc., where difference in score doesn't matter, and often doesn't give a true picture of how close the game was anyway. One advantage of it is the system can be uniformly applied to any game, even games which don't have obvious scores, just winners and losers. Thus the following property:
Devil Points work with any game as long as you can say who beat whom at the end.
This is a system to rate actual historical performance, rather than estimate a player's true strength. In particular it does not take into account the ratings of the other players in the game. Trying to estimate actual player strength is a much harder problem, especially in a multiplayer context (where a very strong player might get crushed by a bunch of newbies who gang up on him).
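
The per-game rule can be sketched in a few lines of Python. This is only an illustration, not the actual RussCon ratings program; the function name devil_points and the dictionary layout are assumptions made for the example.

```python
def devil_points(scores):
    """Map each player to (# opponents beaten) - (# opponents who beat them).

    `scores` maps player name -> final score. Only the ordering matters,
    so any numbers that encode who beat whom will do.
    """
    points = {}
    for player, score in scores.items():
        beaten = sum(1 for other, s in scores.items()
                     if other != player and s < score)
        beaten_by = sum(1 for other, s in scores.items()
                        if other != player and s > score)
        points[player] = beaten - beaten_by
    return points


game = {"Alf": 10, "Bert": 8, "Chuck": 7, "Del": 5}
print(devil_points(game))  # {'Alf': 3, 'Bert': 1, 'Chuck': -1, 'Del': -3}

# The tie case from the list above: Chuck also scores 8.
tied = {"Alf": 10, "Bert": 8, "Chuck": 8, "Del": 5}
print(devil_points(tied))  # {'Alf': 3, 'Bert': 0, 'Chuck': 0, 'Del': -3}
```

For a game with no scores, just winners and losers, passing 1 for each winner and 0 for each loser gives the same kind of result, and the points still sum to zero.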

Later the system went through some discussion and evolution; interested parties may read the back issues of RussCon Report for details. The current incarnation is as follows.

We realized there are two independent concepts: how often do you win, and how highly do you tend to rank? This led to two separate statistics: Win Ratings and Rank Ratings. To illustrate the difference, consider the following scenario:

You are approaching the end of a 4 player game. You are currently in second place. If you play conservatively, you are assured of finishing in second place. There is a risky gamble you could try which has a 10% chance of giving you first place, and a 90% chance of dropping you to fourth place. Do you attempt the risky gamble?

I believe there is no "correct" answer to that question. Some players would, because their goal is to maximize their number of wins. Some players would not, because their goal is to maximize their average finishing rank. A fun thing about multiplayer gaming is that games get played with players of both personalities in the same game! And many, perhaps most players, aren't even consciously aware of which goal they have, or flip back and forth.

So we have Rank Ratings for a set of games, which is simply:
(Sum of all your Devil Points)/(Total # opponents)

And we have Win Ratings for a set of games. Modify the Devil Points from a game to pretend that all the losers tied for second place. Then your Win Rating is simply:
(Sum of all your Win Devil Points)/(Total # opponents)

Note that in a typical game that has a unique winner, the N losers will all receive -1 Win Devil Points, and the unique winner receives N Win Devil Points. The winner's Rank and Win Devil Points are always the same, but that's typically not true of the losers'. A way to look at it is that Win Ratings throw away the information about rank among the losers; losing in second place is the same as losing in last place as far as Win Rating is concerned.

Both Ratings are thus bounded between -1 and +1.

Given the data above for Alf, Bert, Chuck, and Del, their Rank Ratings are simply

Alf    1
Bert   0.3333
Chuck -0.3333
Del   -1

and their Win Ratings are

Alf    1
Bert  -0.3333
Chuck -0.3333
Del   -0.3333
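
As a rough sketch of how both ratings could be computed for a set of games (again, not the actual RussCon program), the Python below sums the points and divides by the total number of opponents faced. The names devil_points, win_devil_points and ratings are invented for this example, and it assumes each game is given as a dictionary of player scores.

```python
def devil_points(scores):
    """(# opponents beaten) - (# opponents who beat you), per player."""
    return {
        p: sum((s > t) - (s < t) for q, t in scores.items() if q != p)
        for p, s in scores.items()
    }


def win_devil_points(scores):
    """Devil Points after pretending all the losers tied for second place."""
    top = max(scores.values())
    flattened = {p: (1 if s == top else 0) for p, s in scores.items()}
    return devil_points(flattened)


def ratings(games, points_fn):
    """(Sum of points) / (Total # opponents) for each player over `games`."""
    totals, opponents = {}, {}
    for scores in games:
        pts = points_fn(scores)
        for player in scores:
            totals[player] = totals.get(player, 0) + pts[player]
            opponents[player] = opponents.get(player, 0) + len(scores) - 1
    # Players who never faced an opponent are skipped to avoid dividing by zero.
    return {p: totals[p] / opponents[p] for p in totals if opponents[p]}


games = [{"Alf": 10, "Bert": 8, "Chuck": 7, "Del": 5}]
rank = ratings(games, devil_points)
win = ratings(games, win_devil_points)
print({p: round(v, 4) for p, v in rank.items()})
# {'Alf': 1.0, 'Bert': 0.3333, 'Chuck': -0.3333, 'Del': -1.0}
print({p: round(v, 4) for p, v in win.items()})
# {'Alf': 1.0, 'Bert': -0.3333, 'Chuck': -0.3333, 'Del': -0.3333}
```

Because a player can gain or lose at most one point per opponent in each game, dividing by the total number of opponents is what keeps both ratings between -1 and +1.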

So from an evening's set of games, I can compute each player's Win and Rank Ratings, each one of which is a total ordering, but the 2 different orders typically do not agree. So how can a unique winner be picked? Again, various schemes were tried, such as adding the two ratings together or using some weighted sum, but these all presuppose some objective weighting between Win and Rank, and that the two are directly addable.

Instead, I construct a partial order from the 2 total orders in the usual mathematical fashion:
Biff >= Eugene
iff
Biff's Rank Rating >= Eugene's and Biff's Win Rating >= Eugene's

This is quite natural. It leads to a partial order (not a total order) since many players are incomparable. E.g. if Biff's Win Rating is 0.5 and his Rank Rating is -0.5, while Eugene's Win Rating is -0.5 and his Rank Rating is 0.5, then neither is >= the other.

If this isn't clear, refer to RussCon Reports which actually show a partial order. The first one is April 8, 1998.

So now I have a partial order over the players. There are one or more players who have no one else >= them. These are the Devils; they share the glory. Going a level down to the players who have only Devils >= them, we find the Vice-Devils, and so on.
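
One way to picture the levels is to peel them off the partial order one at a time, as in the Python sketch below. The names dominates and devil_levels are invented for this example, Carl is a hypothetical third player, and treating two players with identical ratings as equals who share a level is an assumption rather than something stated above.

```python
def dominates(a, b):
    """a >= b in the partial order: at least as good on both ratings.

    Each argument is a (rank_rating, win_rating) pair.
    """
    return a[0] >= b[0] and a[1] >= b[1]


def devil_levels(ratings):
    """Split players into levels: Devils first, then Vice-Devils, and so on.

    `ratings` maps player -> (rank_rating, win_rating).
    """
    remaining = dict(ratings)
    levels = []
    while remaining:
        # A player is on the current level if nobody still remaining is
        # strictly above them (>= them without them being >= in return).
        top = [
            p for p, r in remaining.items()
            if not any(q != p and dominates(s, r) and not dominates(r, s)
                       for q, s in remaining.items())
        ]
        levels.append(sorted(top))
        for p in top:
            del remaining[p]
    return levels


players = {
    "Biff": (-0.5, 0.5),    # Rank Rating -0.5, Win Rating 0.5
    "Eugene": (0.5, -0.5),  # Rank Rating 0.5, Win Rating -0.5
    "Carl": (-0.6, -0.6),   # hypothetical player, worse on both ratings
}
print(devil_levels(players))  # [['Biff', 'Eugene'], ['Carl']]
```

Biff and Eugene are incomparable, so they come out as Co-Devils, while Carl has both of them above him and lands one level down.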

In February 2000, JeffF suggested breaking ties among Co-Devils by number of games played, which seems a fine idea, since it encourages people to continue gaming instead of quitting early if they won their first 2 games!

Another interesting tidbit is games where everyone loses, or everyone wins. E.g. if the players fail to stop the monsters in Arkham Horror. These 2 cases are equivalent: complete joint victory is the same as complete joint loss, and everyone gets 0 Devil Points for the game. The game should still be counted (since it increases the Total # Opponents for the set of games played).

Another interesting case is team games (as opposed to games where alliances form during play). A team game is really a 2-player game, with each "player" run by several human players. When a team game is over, it really shouldn't matter if the teams were each 2 players or 10 players for purposes of getting Devil Points. My ratings program doesn't handle this (we play very few team games), but I believe the solution is simply to say that each player earns -1 or 1 Devil Point, having faced one opponent, regardless of how many players were on the teams.

I have to thank many people for brainstorming and philosophical/mathematical discussions of ratings systems. Roughly in order of contribution, I especially thank Dave (who needs to play more go), Ken, Brady, William, and many more....

What are the rewards for being Devil? At RussCon, the Devil from last week gets to stake out the dining table this week and suggest what games will be played there.

Addendum: sometime around October 2000, JeffF modified the program I use to support team games! Our notion is that a team works like a single player. E.g. a game like bridge would be a 2 player game, with both the winning team's players earning 1 point, and both the losing team's players earning -1 point. This nicely generalizes, e.g. games with 1 player versus a team work as a 2 player game (with all players on the team earning -1 or 1 point), or games with more than 2 teams, varying size teams, etc. Perhaps I will write up more details later, or perhaps not. :)
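
A minimal sketch of the team idea in Python follows. It is not JeffF's actual modification; it only assumes that each team is scored as if it were a single player and that every member then receives the team's points. The function name team_devil_points and the (members, score) layout are made up for the example.

```python
def team_devil_points(teams):
    """Give every member the Devil Points their team earns as one 'player'.

    `teams` is a list of (members, score) pairs; only the ordering of the
    scores matters, so 1 for a win and 0 for a loss is enough.
    """
    points = {}
    for i, (members, score) in enumerate(teams):
        beaten = sum(1 for j, (_, s) in enumerate(teams)
                     if j != i and s < score)
        beaten_by = sum(1 for j, (_, s) in enumerate(teams)
                        if j != i and s > score)
        for member in members:
            points[member] = beaten - beaten_by
    return points


bridge = [(["North", "South"], 1), (["East", "West"], 0)]
print(team_devil_points(bridge))
# {'North': 1, 'South': 1, 'East': -1, 'West': -1}

# One player against a team of three still counts as a 2 player game.
one_vs_many = [(["Solo"], 1), (["Ann", "Bob", "Cy"], 0)]
print(team_devil_points(one_vs_many))
# {'Solo': 1, 'Ann': -1, 'Bob': -1, 'Cy': -1}
```

Under the same assumption, each player's number of opponents for such a game would be the number of opposing teams, not the number of people on them.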

Addendum, July 2002: I have majorly changed the way that Devil Points are combined from games to compute a combined rating for the set of games. The new system is multiplicative and is described on a separate page!
