February 4, 1998 RussCon Report

Nine of us played games at the Feb 4 session. Thanks to Jeff & Kevin for the cake ("Happy Birthday. Eat me.") and the RussCon Tabernacle Choir for the, um, spirited rendition of "Happy Birthday"!

This issue:
Game Results
Medici Rules Newsflash
Titan Next Week
Throne World on Sunday
Cumulative RussCon Devil Points, A New System
Devil Points Theory and Philosophizing

GAME RESULTS:

We finally had a RoboRally game with more players (5 this time) and used the Chasm board, which proved interesting. Kevin took an early lead, but RussW caught up and won (hey, it was my birthday). RussD tragicomically used the Axelrad technique on his robot, driving off the board twice and taking 3 laps around the Maelstrom for subtle and mysterious reasons. He never reached the first flag but gained glorious victory over Jeff, who permanently suicided to solidify his own firm grip on last place after an early brain fart sent him plummeting deep into the Coliseum.

Ken won Loewenherz. Tom won his first-ever game of Medici in a 6 player game.

Die-hards RussW, James, and Brady played 3 games of Entdecker and 6 games of Montgolfiere (brought by Brady) until 4am. Montgolfiere is a cute fast race game of hot air balloons, complete with dirty tricks like grappling hooks and spraying soporific gas on your enemies. You want to ascend faster/higher than the others and reach the moon first!

However, there appears to be a serious luck factor in Montgolfiere. We used the optional rule that adds the non-human Black Baron player for additional competition. He plays his cards randomly, yet he won 2 of his 5 games. As Brady observed, it's a fun game and it comes with a good AI! Our final game resulted in a 3-way tie among the humans with the Black Baron just one point behind, which seemed an appropriate note on which to end the evening.

MEDICI RULES NEWSFLASH:

Ken (owner of Medici) suffered Eric Hyman Disease when reading the rules, missing the key sentence:

"Once a player's ship is full (they have five cards), [they] can no longer turn over cards in this round."

TITAN NEXT WEEK:

RussW, Marty, James, and Brady all have a hankering for Avalon Hill's classic monster slugathon fantasy wargame next Wednesday. Anyone else who wants in is welcome. This means several things:

(1) If you want to play Titan, you must be willing to risk staying up late on a school night! Bring a signed permission slip from your parents.

(2) We must start promptly. Get here at 7pm. If the 4 of us are ready to go and you haven't emailed me that you want to play Titan, we're starting without you.

(3) RussCon is a good time to play Titan since the early elimination problem isn't really a problem. Players knocked out early can join whatever shorter games are going on at the same time.

(4) We get the dining table!

Speaking of seating: I actually have the 6 decent dining chairs plus 2 folding chairs and an office chair, and a table could be brought downstairs if people prefer folding chairs to the floor. (We've had no more than 9 players the past couple sessions.) (If someone could easily bring folding chairs that might be useful just in case.)

THRONE WORLD ON SUNDAY:

If people are interested, I propose Throne World this coming Sunday at 1pm. (Start promptly since last time ran late into the evening!) Please email me or call xxx-xxxx to RSVP.

CUMULATIVE RUSSCON DEVIL POINTS, A NEW SYSTEM:

Well, we diluted Tom a little, but after that Medici victory he's still a devil to be reckoned with.

Here are cumulative scores under the old system (sum of devil points divided by # of games):

EricH 6/2 = 3
Tom 8/3 = 2.67
Evan 4/3 = 1.33
Doug 7/11 = 0.64
RussW 8/20 = 0.4
James 5/21 = 0.24
Baron 1/5 = 0.2
Ken 2/11 = 0.18
Kevin -2/5 = -0.4
Brady -10/17 = -0.59
RussD -6/6 = -1
Peter -2/2 = -1
Jay -1/1 = -1
Jeff -3/2 = -1.5
Marty -17/7 = -2.43

There are flaws with that system, however. E.g., winning a game could LOWER your rating if your rating was higher than the # players you beat. I realized after thinking and emailing with Dave Brandt that I need to divide by the total # opponents you faced. (I think I'd sort of realized that a few weeks ago but was too lazy; counting your games is easier than counting your opponents!)

Thus I introduce Devil Point Ratings, version 2:

(# opponents you beat - # opponents beating you) / (# opponents).

This gives a rating between -1 and 1.

EricH 6/6 = 1
Tom 8/12 = 0.67
Evan 4/13 = 0.31
Doug 7/42 = 0.17
RussW 8/64 = 0.13
James 5/73 = 0.07
Baron 1/15 = 0.07
Ken 2/45 = 0.04
Kevin -2/18 = -0.11
Brady -10/57 = -0.18
RussD -6/22 = -0.27
Peter -2/6 = -0.33
Jay -1/3 = -0.33
Jeff -3/5 = -0.6
Marty -17/27 = -0.63
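
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the version 2 ratings from the two columns above. The names `totals` and `rating_v2` are made up for this illustration; this is not an existing RussCon program.

```python
from fractions import Fraction

# (net devil points, total opponents faced), copied from the table above.
totals = {
    "EricH": (6, 6), "Tom": (8, 12), "Evan": (4, 13), "Doug": (7, 42),
    "RussW": (8, 64), "James": (5, 73), "Baron": (1, 15), "Ken": (2, 45),
    "Kevin": (-2, 18), "Brady": (-10, 57), "RussD": (-6, 22),
    "Peter": (-2, 6), "Jay": (-1, 3), "Jeff": (-3, 5), "Marty": (-17, 27),
}

def rating_v2(net_points, opponents):
    """(# opponents beaten - # opponents beating you) / (# opponents)."""
    return Fraction(net_points, opponents)

ratings = {name: rating_v2(*pair) for name, pair in totals.items()}

# Print the highest rating first, rounded to two places like the table.
for name, r in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:6s} {float(r):5.2f}")
```

Exact fractions are used so that near-ties such as James (5/73) and Baron (1/15) are ordered by their true values rather than by the rounded 0.07.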

Curiously, people's relative ranks didn't change much using the new weighted average, so at least my initial lazy reaction that it would be close enough was justified. (RussD is no longer tied with Peter and Jay...)

Baron, of course, is the Black Baron player from Montgolfiere. He flies a mean balloon. Of course, he's been playing Montgolfiere longer than James, Brady or I have.

DEVIL POINTS THEORY AND PHILOSOPHIZING:

Here are some rambling philosophical musings on game ratings and strategies, and how I came up with devil points...

Thanks to Randy Paul for pointing out the typo in the last mailing. Your devil points from a game are of course (# opponents beaten - # opponents beating you), not just (# opponents beaten).

Where did the idea of devil points come from? I began brainstorming tournament scoring ideas after the fun Settlers of Catan tournament which Brady ran at the Millennium game con a few months ago. (Which, incidentally, is where RussD & I got hooked up with Brady and James.) After fiddling around with a few ideas, I realized that I wanted a system based on relative ranking at the end of the game rather than absolute scores. Which of the two to prefer is, I think, a question with no provably correct answer. It's a matter of taste. My reasoning was that ranks are universal across all games, and I wanted a system to let us see who was really "the devil" over all our games played. If you use absolute scores from individual games, that's OK for comparing against other sessions of that same game, but how do I meaningfully relate my final score of 9 in Settlers of Catan with my final score of $42300 in Acquire? Some games have a fixed upper bound on scores and others don't. Furthermore, many games don't even HAVE an objective final score other than ranking (e.g. any of the racing games).

I liked the devil point idea because it has nice properties:

(1) Defeating more opponents should be worth more than defeating fewer opponents. We're more impressed by someone who wins an 8 player game than a 3 player game.

(2) Similarly for losing.

(3) The total points awarded to all players in a game sums to zero.

(4) It filters out game-specific info, leaving results comparable across all games.

(5) It nicely handles ties. E.g. if 2 players end with 8 points in a 3-player Settlers game, then they each get -1 devil points, while the victor gets 2 devil points. I dislike arbitrary tie-breaking for the sake of tie-breaking.
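
The properties above can be sketched in a few lines. This is only an illustration: the function name `devil_points`, the player names, and the winner's score of 10 are assumptions, not anything from an actual session.

```python
def devil_points(final_scores):
    """Map each player to (# opponents beaten) - (# opponents who beat them).

    final_scores: dict of player name -> final score, where higher is better.
    A tied opponent counts for neither side.
    """
    points = {}
    for player, score in final_scores.items():
        others = [s for name, s in final_scores.items() if name != player]
        beaten = sum(1 for s in others if s < score)
        beaten_by = sum(1 for s in others if s > score)
        points[player] = beaten - beaten_by
    return points

# The 3-player Settlers tie from property (5), with made-up names.
settlers = devil_points({"A": 10, "B": 8, "C": 8})
print(settlers)                 # {'A': 2, 'B': -1, 'C': -1}
print(sum(settlers.values()))   # 0, as property (3) says
```

For a game with no score other than finishing order (e.g. a racing game), one option is to pass the negated finishing position as the score, so that 1st place compares highest.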

This system by itself seems ideal for a tournament scoring system to me. Play a fixed number of games, sum up players' devil points, and the high total wins the tournament. In a tournament situation, ideally everyone would play games with the same number of players (e.g. we all play 3 games with 4 players and 1 game with 3 players, or some such), but in reality that's not always possible, so I feel the devil point formula gives a reasonable balancing factor for that. I.e. if I get to play more games with fewer opponents, I'm more likely to win them, but I win fewer points for doing so. I'm also more likely to come in last, but I get penalized fewer points if that happens. Note that the ability to meaningfully compare results from different games makes this a natural method for running a multi-game tournament, e.g. the "Puffing Billy" events that involve playing several different train games.

------------

But what about the problem of varying numbers of games being played?

I've been summing your total devil points and dividing by the number of games played. This is not necessarily the right thing to do, but it's easy... Dave Brandt & I have been emailing about possible alternate schemes. Issues involved:

Small sample size problems. Dave suggests having a minimum number of games (e.g. 5 or 10) to divide by, in effect treating unplayed games as draws (score of 0) for the player. He also pondered dividing by some function of n which grows less rapidly than n (e.g. sqrt(n) or log(n)). But surely that's not right. I'd once considered simply using your raw sum of devil points, since an average player will average 0 either way, but then a good or bad player will be continually drifting further away from 0. Dividing by sqrt(n) or log(n) would have the same problem.

Playing more games should only make your score more accurate. It shouldn't make your score get better and better. Otherwise a weaker player gets a better rating than a stronger player just by playing more games.
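
To make the drift concrete, here is a small sketch comparing the candidate divisors for a hypothetical steady player who nets +1 devil point in every game. The function name `scores` and the default minimum of 5 games are assumptions for illustration (5 is one of the values Dave mentioned).

```python
import math

def scores(total_points, games, min_games=5):
    """Return the same devil point total under several candidate divisors."""
    return {
        "raw sum": total_points,
        "sqrt(n)": total_points / math.sqrt(games),
        # log(1) is 0, so this divisor is undefined after a single game.
        "log(n)": total_points / math.log(games) if games > 1 else float("nan"),
        "n": total_points / games,
        "max(n, min_games)": total_points / max(games, min_games),
    }

# Hypothetical steady player: +1 devil point per game, after 2, 10, 100 games.
for games in (2, 10, 100):
    row = scores(total_points=games, games=games)
    print(games, {k: round(v, 2) for k, v in row.items()})
```

Only the two divisors that grow as fast as n keep this player's score from climbing with the number of games played. The minimum-games divisor pulls the early score toward 0 and matches the plain average once the minimum is reached.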

Dave also wonders whether winning a 3 player game (2 devil points) and coming in second in a 5 player game (also 2 devil points) should be considered equivalent. He's more impressed by the winner of the 3 person game. I think this is a philosophical conundrum. Where do you draw the line? Surely it's more impressive to come in 2nd for some large n (e.g. a 100 person game) than to win a 3 person game. What value of n this is perhaps becomes personal taste.

My main problem with the old system was that winning a game could hurt my devil point rating! If I win a game and get awarded points which are less than my current devil point average, then my devil point average goes down. E.g. Zippy currently has 20/5=4 devil points. Suppose he wins a 3 player game. Then he has 22/6=3.67 devil points.

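After some more pondering, I have come up with version 2 of the system introduced in this mailing: use a weighted average of the game results instead of weighting them all equally. Each game's result is its devil points divided by its # opponents, and the weights are of course the # opponents from the game. This works out to total devil points divided by total opponents.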

Our Most Assiduous Reader (OMAR) can verify that now winning a game cannot lower your cumulative rating (and similarly, losing a game cannot raise your cumulative rating).
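
For readers less assiduous than OMAR, here is one way to see it. Write D for your total devil points and N for your total opponents, so D <= N. Winning a game with k opponents changes the rating from D/N to (D+k)/(N+k), and the difference is k(N-D) / (N(N+k)), which is never negative. The sketch below checks this by brute force over small histories and also replays the Zippy example under the old system. The helper name `after_game` and the loop bounds are arbitrary choices for this illustration.

```python
from fractions import Fraction

# Old system: Zippy's per-game average drops after a win.
old_before = Fraction(20, 5)
old_after = Fraction(20 + 2, 5 + 1)   # wins a 3 player game: +2 points, +1 game
print(old_before, old_after)          # 4 11/3

def after_game(net, opponents, result, k):
    """Version 2 rating after adding one game with k opponents."""
    return Fraction(net + result, opponents + k)

# Any history satisfies -opponents <= net <= opponents, so try them all
# (up to a small bound) against an outright win and an outright last place.
for opponents in range(1, 40):
    for net in range(-opponents, opponents + 1):
        before = Fraction(net, opponents)
        for k in range(1, 8):
            assert after_game(net, opponents, +k, k) >= before  # win
            assert after_game(net, opponents, -k, k) <= before  # last place
print("no counterexample found")
```

A brute-force search over a bounded range is not a proof, of course; the one-line algebra above is what actually settles it.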

------------

Another issue is the mixing of different game titles. Obviously a victory in a game like 1830 denotes more skill than a victory in a game like Montgolfiere. My long term goal is to have all this computerized, and one could then request devil point ratings based on certain games. E.g. "Tell all devil ratings for Settlers of Catan" or "Tell all devil ratings for Throne World and Twilight Imperium".

Dave had suggested providing a numerical rating for each game, e.g. 1 for 1830, 0.9 for Twilight Imperium, 0.3 for Montgolfiere, but this then raises the question of how to come up with all these weight numbers.

------------

Another issue is comparing ratings of players who haven't directly competed much. A program could support requests like "Tell me Marty's and Ken's ratings based on all games which they both played in."
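
Here is a rough sketch of how such a program might answer both this kind of request and the per-title requests mentioned earlier. Everything in it is hypothetical: the log format, the function name `ratings`, its parameters, and above all the results, which are made up and are not actual RussCon games.

```python
from collections import defaultdict
from fractions import Fraction

# Made-up game log: (title, {player: devil points earned in that game}).
games = [
    ("Settlers of Catan", {"Marty": -1, "Ken": 2, "Tom": -1}),
    ("Throne World", {"Marty": 3, "Ken": -1, "RussW": 1, "James": -3}),
    ("Settlers of Catan", {"Ken": -2, "Tom": 2, "Brady": 0}),
]

def ratings(log, titles=None, must_include=()):
    """Version 2 ratings over the games that pass the optional filters.

    titles: if given, only games with one of these titles are counted.
    must_include: only games in which all of these players took part.
    """
    net = defaultdict(int)
    opponents = defaultdict(int)
    for title, result in log:
        if titles is not None and title not in titles:
            continue
        if not all(player in result for player in must_include):
            continue
        for player, points in result.items():
            net[player] += points
            opponents[player] += len(result) - 1
    return {player: Fraction(net[player], opponents[player]) for player in net}

catan_only = ratings(games, titles={"Settlers of Catan"})
head_to_head = ratings(games, must_include=("Marty", "Ken"))
print(head_to_head["Marty"], head_to_head["Ken"])   # 2/5 1/5
```

Storing per-game devil points together with the player list keeps the opponent counts recoverable, which is what the version 2 formula needs.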

In the world of chess and go and other 2 player games, it's a little easier to compute "absolute" ratings (though there are many murky problems there as well). In multiplayer games this is much trickier. Alliances mean that a stronger player can get his butt kicked ("stop Ken, he's the devil!"), whereas in 2 player games, a stronger player will normally win.

------------

Finally there is the whole deep question of "What is my goal in this game?" Typically I want to win. If it's clear that I can't win, then what do I do? The devil points system assumes your goal is to rank as high as you can. Sometimes people have other goals.

(1) You might try to maximize your objective score. (This could lead to weird results though, since sometimes maximizing my score costs me victory! E.g., imagine that I could cooperate with you and gain us both 100 points, but you're currently 1 point ahead of me so you still win. Or I could do some action which gains me 2 points and gains you nothing so I win, with a lower final score than I could have had.)

(2) You might try to hurt the leader. (If they win, you certainly can't win.)

(3) You might try to hurt an enemy, e.g. for revenge.

(4) You might try to help your buddy win.

(5) You might try to suicide and eliminate yourself from the game.

(6) You might do wacky things just to entertain yourself.

Imagine that you're in a situation where by playing conservatively you will definitely win 2nd place. There is a very risky move you could take which would make you win, but there's a high probability that it will backfire and make you come in last. Do you try the gamble? Does it depend on what the probability of backfiring is?
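
One way to put numbers on this question, assuming you care only about expected devil points and nobody ties: in an n-player game, a safe 2nd place is worth n-3 points, a win is worth n-1, and last place is worth -(n-1). The sketch below compares the two choices. The function names are made up, and maximizing expected value is just one possible attitude toward risk.

```python
from fractions import Fraction

def expected_devil_points(n_players, p_backfire):
    """Return (safe 2nd place, expected value of the win-or-last gamble)."""
    safe = (n_players - 2) - 1                  # beat n-2 players, lose to 1
    win, last = n_players - 1, -(n_players - 1)
    gamble = (1 - p_backfire) * win + p_backfire * last
    return safe, gamble

def break_even(n_players):
    """Backfire probability at which the gamble is worth exactly 2nd place."""
    return Fraction(1, n_players - 1)

for n in (3, 4, 6):
    p = break_even(n)
    print(n, p, expected_devil_points(n, p))
```

Under these assumptions the gamble pays off in expectation only when the chance of backfiring is below 1/(n-1), so the more players there are, the less risk is worth taking.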

------------

Imagine you're playing Acquire and a rich philanthropist will give each player $n at the end of the game, where n=your final money score. How will that affect your playing? You'd want to maximize your own score, regardless of whether you win or lose the game and regardless of others' scores. This is not a zero-sum game; we could all make lots of money.

Now imagine there is a fixed pot of $1000 which will be divided proportionally based on your final money score. Now increasing your own score is good, but so is decreasing others' scores. Winning the game gets you the largest share, but winning alone is not sufficient to optimize your share of the pot.

Now imagine there is a fixed pot of $1000 which will all go to the winner. You must play only to win; it's all or nothing. Winning is necessary and sufficient to optimize your share of the pot.

Now imagine that when the game is over, each person must pay $100 to each player who scored higher than they did. This is the devil points model; you want to get as high a rank as you can.
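
The four schemes above can be compared side by side with a short sketch. The function name `payoffs`, the scheme labels, and the sample Acquire totals are all invented for illustration, and splitting the pot evenly among tied winners is an assumption the text does not address.

```python
def payoffs(final_money):
    """Dollar payoff to each player under the four schemes described above.

    final_money: dict of player -> final money score (assumed not all zero).
    """
    players = list(final_money)
    total = sum(final_money.values())
    top = max(final_money.values())
    winners = [p for p in players if final_money[p] == top]
    return {
        # Philanthropist: everyone receives their own final score.
        "philanthropist": dict(final_money),
        # $1000 pot divided in proportion to final money.
        "proportional": {p: 1000 * final_money[p] / total for p in players},
        # $1000 pot to the winner (assumption: tied winners split it evenly).
        "winner_take_all": {
            p: (1000 / len(winners) if p in winners else 0) for p in players
        },
        # Pay $100 to each higher scorer, collect $100 from each lower scorer.
        "pay_higher": {
            p: 100 * (sum(final_money[q] < final_money[p] for q in players)
                      - sum(final_money[q] > final_money[p] for q in players))
            for p in players
        },
    }

result = payoffs({"A": 40000, "B": 35000, "C": 25000})
print(result["proportional"])   # {'A': 400.0, 'B': 350.0, 'C': 250.0}
print(result["pay_higher"])     # {'A': 200, 'B': 0, 'C': -200}
```

The last scheme's payoffs are exactly $100 times each player's devil points for the game.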

These differing incentives all lead to different goals and strategies. (And don't forget some people still might not care about the money!)

Obviously any rating system will be measuring how well players do with respect to some given objective, and maximizing your rank is what I think makes most sense and is most generally applicable.

------------

Thanks very much to Dave for some thought-provoking discussion on this. If anyone else has ideas, let's talk about them!