From the beginning we kept statistics of game results. We called them Devil Points due to the group's lingo for the leader in a game (e.g. "Nobody should trade with Ken any more, he's the Devil!"). Soon the Devil came to mean someone who did well on average, not just within a single game.
I was interested in making a ratings system after playing in a Settlers tournament Brady ran at MillenniumCon. We both agreed that the system in that tournament seemed flawed, but it wasn't obvious how to fix it. After exploring various systems, I soon hit upon a simple elegant notion: your Devil Points from a single game session would be:
(# opponents you beat) - (# opponents who beat you)
E.g. if a game of Settlers ends with the following scores:
Alf 10 Bert 8 Chuck 7 Del 5
then the Devil Points earned are:
Alf 3 Bert 1 Chuck -1 Del -3
This has several properties:
Later the system went through some discussion and evolution; interested parties may read the back issues of RussCon Report for details. The current incarnation is as follows.
We realized there are two independent concepts: how often do you win, and how highly do you tend to rank? This led to two separate statistics: Win Ratings and Rank Ratings. To illustrate the difference, consider the following scenario:
You are approaching the end of a 4 player game. You are currently in second place. If you play conservatively, you are assured of finishing in second place. There is a risky gamble you could try which has a 10% chance of giving you first place, and a 90% chance of dropping you to fourth place. Do you attempt the risky gamble?
I believe there is no "correct" answer to that question. Some players would, because their goal is to maximize their number of wins. Some players would not, because their goal is to maximize their average finishing rank. A fun thing about multiplayer gaming is that games get played with players of both personalities in the same game! And many, perhaps most players, aren't even consciously aware of which goal they have, or flip back and forth.
So we have Rank Ratings for a set of games, which is simply:
(Sum of all your Devil Points)/(Total # opponents)
And we have Win Ratings for a set of games. Modify the Devil Points from a game to pretend that all the losers tied for second place. Then your Win Rating is simply:
(Sum of all your Win Devil Points)/(Total # opponents)
Note that in a typical game with a unique winner, the N losers each receive -1 Win Devil Points, and the winner receives N Win Devil Points. The winner's Rank and Win Devil Points are always the same, but that's typically not true of the losers'. A way to look at it is that Win Ratings throw away the information about rank among the losers; losing in second place is the same as losing in last place as far as Win Rating is concerned.
Both Ratings are thus bounded between -1 and +1.
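To make the earlier end-game gamble concrete, here is a rough sketch of its expected Devil Points under each statistic. It assumes a 4 player game with no ties, and that only your own finishing place matters; the table and function names are made up for illustration.

```python
# Devil Points by finishing place in a 4 player game with no ties.
# Rank: (# opponents you beat) - (# opponents who beat you).
RANK_POINTS = {1: 3, 2: 1, 3: -1, 4: -3}
# Win: same formula, after pretending all losers tied for second place.
WIN_POINTS = {1: 3, 2: -1, 3: -1, 4: -1}


def expected(points, outcome_probs):
    """Expected Devil Points given {finishing place: probability}."""
    return sum(prob * points[place] for place, prob in outcome_probs.items())


conservative = {2: 1.0}       # assured second place
gamble = {1: 0.1, 4: 0.9}     # 10% first place, 90% fourth place

for name, table in (("Rank", RANK_POINTS), ("Win", WIN_POINTS)):
    print(name, expected(table, conservative), expected(table, gamble))
```

Under these assumptions the gamble lowers your expected Rank Devil Points from 1 to about -2.4, but raises your expected Win Devil Points from -1 to about -0.6, which is why the two kinds of player answer the question differently.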
Given the data above for Alf, Bert, Chuck, and Del, their Rank Ratings are simply
Alf 1 Bert 0.3333 Chuck -0.3333 Del -1
and their Win Ratings are
Alf 1 Bert -0.3333 Chuck -0.3333 Del -0.3333
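For anyone who wants to check these numbers, the calculation can be sketched in a few lines of Python. This is not my actual ratings program, just a minimal illustration; it assumes the higher score is better and that tied players neither beat nor are beaten by each other. The function names are invented for the example.

```python
from collections import defaultdict


def devil_points(scores):
    """(# opponents you beat) - (# opponents who beat you), per player."""
    points = {}
    for player, score in scores.items():
        others = [s for p, s in scores.items() if p != player]
        beaten = sum(1 for s in others if score > s)
        beaten_by = sum(1 for s in others if score < s)
        points[player] = beaten - beaten_by
    return points


def win_devil_points(scores):
    """Pretend all the losers tied for second place, then score as usual."""
    best = max(scores.values())
    flattened = {p: (1 if s == best else 0) for p, s in scores.items()}
    return devil_points(flattened)


def ratings(games, points_fn):
    """(Sum of Devil Points) / (Total # opponents) over a set of games."""
    total_points = defaultdict(int)
    total_opponents = defaultdict(int)
    for scores in games:
        for player, pts in points_fn(scores).items():
            total_points[player] += pts
            total_opponents[player] += len(scores) - 1
    return {p: total_points[p] / total_opponents[p] for p in total_points}


game = {"Alf": 10, "Bert": 8, "Chuck": 7, "Del": 5}
print(devil_points(game))
print(ratings([game], devil_points))       # Rank Ratings
print(ratings([game], win_devil_points))   # Win Ratings
```

A game in which every player has the same score gives everyone 0 points but still adds to each player's opponent count.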
So from an evening's set of games, I can compute each player's Win and Rank Ratings. Each rating gives a total ordering of the players, but the two orders typically do not agree. So how can a unique winner be picked? Again, various schemes were tried, such as adding the two ratings together or using some weighted sum, but these all presuppose some objective weighting between Win and Rank, and that the two are directly addable.
Instead, I construct a partial order from the 2 total orders in the usual mathematical fashion:
Biff >= Eugene
if and only if
Biff's Rank Rating >= Eugene's and Biff's Win Rating >= Eugene's
This is quite natural. It leads to a partial order (not a total order) since many players are incomparable. E.g. if Biff's Win Rating is 0.5 and his Rank Rating is -0.5, while Eugene's Win Rating is -0.5 and his Rank Rating is 0.5, then neither is >= the other.
If this isn't clear, refer to RussCon Reports which actually show a partial order. The first one is April 8, 1998.
So now I have a partial order over the players. There are one or more players who have no other player >= them. These are the Devils; they share the glory. Going a level down to the players who have only Devils >= them, we find the Vice-Devils, and so on.
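The layering can be sketched as repeatedly peeling off the players nobody else beats. This is only an illustration, not my ratings program. It assumes each player is given as a (Rank Rating, Win Rating) pair, and it treats two players with identical ratings as sharing a level, which is my guess at the sensible handling of exact ties. The player "Griff" is made up for the example.

```python
def dominates(a, b):
    """a >= b: a's Rank Rating and Win Rating are both >= b's."""
    return a[0] >= b[0] and a[1] >= b[1]


def devil_levels(ratings):
    """Return [Devils, Vice-Devils, ...] from {player: (rank, win)}."""
    remaining = dict(ratings)
    levels = []
    while remaining:
        # Players with nobody strictly above them among those remaining.
        top = [
            p for p, r in remaining.items()
            if not any(
                q != p and dominates(s, r) and not dominates(r, s)
                for q, s in remaining.items()
            )
        ]
        levels.append(sorted(top))
        for p in top:
            del remaining[p]
    return levels


example = {
    "Biff": (-0.5, 0.5),     # Rank Rating -0.5, Win Rating 0.5
    "Eugene": (0.5, -0.5),   # Rank Rating 0.5, Win Rating -0.5
    "Griff": (-0.6, -0.6),   # hypothetical third player
}
print(devil_levels(example))
```

Here Biff and Eugene are incomparable, so they are Co-Devils, and Griff, who is below both, is the Vice-Devil.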
In February 2000, JeffF suggested breaking ties among Co-Devils by number of games played, which seems a fine idea, since it encourages people to continue gaming instead of quitting early if they won their first 2 games!
Another interesting tidbit is games where everyone loses or everyone wins, e.g. if the players fail to stop the monsters in Arkham Horror. These 2 cases are equivalent: complete joint victory is the same as complete joint loss, and everyone gets 0 Devil Points for the game. The game should still be counted (since it increases the Total # Opponents for the set of games played).
Another interesting case is team games (as opposed to games where alliances form during play). A team game is really a 2-player game, with each "player" run by several human players. When a team game is over, it really shouldn't matter if the teams were each 2 players or 10 players for purposes of getting Devil Points. My ratings program doesn't handle this (we play very few team games), but I believe the solution is simply to say that each player earns -1 or 1 Devil Point, having faced one opponent, regardless of how many players were on the teams.
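A rough sketch of that idea, for a game between two teams. Again, my ratings program does not do this; the function name and the handling of a drawn game (0 points each, by analogy with joint victory or joint loss) are assumptions for illustration.

```python
def team_game_points(teams, winning_team=None):
    """Score a team game as a 2-player game.

    teams: {team name: [players]}.
    winning_team: name of the winning team, or None for a drawn game.
    Returns {player: (Devil Points, # opponents)}; every player is
    counted as having faced exactly one opponent, whatever the team size.
    """
    result = {}
    for team, players in teams.items():
        if winning_team is None:
            pts = 0            # assumption: a draw scores 0 for everyone
        elif team == winning_team:
            pts = 1
        else:
            pts = -1
        for player in players:
            result[player] = (pts, 1)
    return result


teams = {"Red": ["Alf", "Bert"], "Blue": ["Chuck", "Del"]}
print(team_game_points(teams, winning_team="Red"))
```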
I have to thank many people for brainstorming and philosophical/mathematical discussions of ratings systems. Roughly in order of contribution, I especially thank Dave (who needs to play more go), Ken, Brady, William, and many more....
What are the rewards for being Devil? At RussCon, the Devil from last week gets to stake out the dining table this week and suggest what games will be played there.
Addendum: July 2002: I have majorly changed the way that devil points are combined from games to compute a combined rating for the set of games. The new system is multiplicative and described here!