Here are a couple of emails JeffF sent me way back in September 2000 proposing an alternate ratings system. At the time, and still today, I'm not really thrilled with it, but I'm unable to clearly articulate why. JeffF was good enough to re-forward them to me for reconsideration.

Go back to the March 20 report.

Here's the original email:

I. Jeff's wacky and new win and rank factors

Rank Factors:
16.9036 JeffF (4)
16.4968 Jeffles (5)
 3.8278 Steve (4)
 2.9302 Clay (1)
 2.5198 Carly (2)
 2.2361 RussW (5)
 1.2500 RussD (2)
 1.0450 BobR (4)
 1.0000 Clayton (1)
 1.0000 JonathanC (3)
 0.7560 ChrisH (3)
 0.7246 JP (5)
 0.5291 Marty (2)
 0.2291 William (5)
 0.0498 Dan (3)
 0.0131 Matt (6)

Win Factors:
 6.5467 Jeffles (5)
 5.0554 JeffF (4)
 3.1498 RussD (2)
 2.5198 Carly (2)
 1.9688 JP (5)
 1.6904 ChrisH (3)
 1.4548 JonathanC (3)
 1.0450 BobR (4)
 0.8360 Steve (4)
 0.7661 RussW (5)
 0.6988 Clay (1)
 0.6687 Clayton (1)
 0.3637 Marty (2)
 0.2291 William (5)
 0.2090 Dan (3)
 0.1443 Matt (6)

II. How it works (sort of)

Clearly my rankings are better since I'm ranked higher in them. :)
Seriously, though, I tried to come up with a ranking scheme to
address what I thought of as deficiencies in the current scheme.  The
quick explanation is that they are multiplicative across games, and that a
score of 1 is completely average (coming in the middle of games for rank,
winning exactly your share of games for win).  The exact formula for your
rating in a game is n^((n-2i+1)/(n-1)), where n is the number of players
and i is your position from 1 to n in the game.  Ties are treated as
everyone being at the average position, i.e. in a 2-way joint win both use
i=1.5.  As before, win ratings for non-winners are computed by treating all
of them as tied.
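In case a sketch helps, here is that per-game formula in Python (the name `game_factor` is mine, not from the email):

```python
def game_factor(n, i):
    """Per-game rating factor: n players, finishing position i (1 = first).

    Ties pass in the average of the tied positions, e.g. a 2-way
    joint win uses i = 1.5.
    """
    return n ** ((n - 2 * i + 1) / (n - 1))

# A 4-player game: the winner's factor is 4, last place gets 1/4.
print(game_factor(4, 1))    # 4.0
print(game_factor(4, 4))    # 0.25

# A 2-way joint win in a 4-player game: both winners use i = 1.5.
print(game_factor(4, 1.5))  # 4 ** (2/3) ≈ 2.5198
```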

III. Where it came from

I have two main complaints about the old rankings.  One is that
consistently good performance is undervalued, i.e., win 99 games out of
100 and finish with a lower rank than someone who plays and wins 1
game.  The other is that the relative boost to ratings provided by games
with very large numbers of players seemed too high.

I mostly derived the rankings from the following assumptions (which may
seem obvious, but aren't always consistent with the old rankings):
A. Winning (even jointly) should make your win factor go up.  Not winning
	should make your win factor go down.
B. Finishing higher than the average position in a game should make your
	rank factor go up.  Finishing lower should make it go down.
C. Winning a game with a large number of people increases the win factor
	more than winning a game with fewer people.  Not winning a game
	with a large number of people is less harmful to your ranking than
	not winning a game with fewer people.  Not winning a
	game with multiple winners is worse than not winning a game with
	one winner.
D. Playing n n-player games and winning one and losing the rest (with no
	joint wins in the sequence) shouldn't affect your win
	factor.  Playing some number of n-player games and finishing at
	ranks which average to (n+1)/2 should not affect your rank factor.

Inspired by the probability discussions from earlier I decided that
winning a game with n players gives a rank factor of n.  (For example,
winning three 2-player games gives a rank equivalent to winning one
8-player game.  In the old rankings it takes eight 2-player games.)  This
means losing a game in which you're the only nonwinner needs to give you a
win factor of 1/n to cancel a win perfectly.  1/n is then natural for the
rank rating of the last place finisher in a game.  The formula above just
finds the correct (geometric) (n-1)-section points for the other ranks.
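A quick numeric check of these claims (a sketch; `game_factor` is just the section II formula):

```python
import math

def game_factor(n, i):
    # Section II formula: n players, position i (1 = first).
    return n ** ((n - 2 * i + 1) / (n - 1))

# Winning three 2-player games equals winning one 8-player game: 2^3 = 8.
print(game_factor(2, 1) ** 3, game_factor(8, 1))  # 8.0 8.0

# Last place in an n-player game gets exactly 1/n.
print(game_factor(6, 6))  # 1/6 ≈ 0.1667

# Within one game, the n factors are geometrically spaced between n and
# 1/n and multiply to 1.
print(round(math.prod(game_factor(6, i) for i in range(1, 7)), 9))  # 1.0
```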

IV. Advantages and disadvantages

This system isn't scaled by the number of games played.  I think this is a
good thing in the short term for reasons mentioned earlier.  Winning all
or almost all of several games is more glorious than winning two and
quitting.  This isn't so good in the long run.  People who win more than
their share of games can get an arbitrarily high win/rank factor by
playing an arbitrarily large number of games.  However I think for
determining devil of one night this isn't a problem.  (For an example of
the problem, here's the rank factors for the top 10 all time, minus a
couple weeks data I don't have.)
527832481731203957453040728210811303150247859404992340596461928448.0000
RussW (957)
10199565661047372224643859474813227630592.0000 JP (656)
33213474123505656585464031605753380864.0000 Bob (248)
83192608411045632158438097158144.0000 JeffF (292)
4840732703688516598757326848.0000 Daniel (175)
599796113429543052640256.0000 James (223)
5719440983627852.0000 Tim (255)
309919631712516.5625 KevinH (158)
3762408645892.6768 Jeffles (190)
3231261736160.2266 Ken (182)

Aesthetically, I like the fact that good scores can grow arbitrarily
large, but scores can never drop below zero.  Group hug, everyone. :)
Seriously though if you wanted the low end scores to spread out you could
just take logs at the end.  I have a feeling this might turn out to be
something close to what the old rankings would be without dividing by the
number of opponents.

Currently a team is handled as one player, as before, with each member
getting credit for their rank in an n-player game where n is the number of
sides in the game.  This breaks the feature that for any game the players'
scores multiply to 1, but then again the way it's handled in the old
system breaks the rule that players' scores sum to 0.

Anyway, I'd appreciate any comments, suggestions, questions, mob violence
to change to the new system, etc.

Here's part of our email debate about it:
Double > are JeffF.
Single > are Russ.

> > winning exactly your share of games for win).  The exact formula for your
> > rating in a game is n^((n-2i+1)/(n-1)), where n is the number of players
> > and i is your position from 1 to n in the game.  Ties are treated as
> > everyone being at the average position, i.e. in a 2-way joint win both use
> > i=1.5.  As before, win ratings for non-winners are computed by treating all
> > of them as tied.
>
> On an Occam's Razor sort of principle, this formula feels kind of arbitrary,
> complex, and nonobvious to me. :)

Well, I knew the formula wouldn't appeal to you at first sight.  Does it
look any better as [thanks to Dan]
(# players)^((players defeated - players defeating you)/(number of opponents))?
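The two forms are algebraically identical, since (players defeated) - (players defeating you) = (n - i) - (i - 1) = n - 2i + 1.  A quick check (a sketch; the function names are mine):

```python
def original_form(n, i):
    return n ** ((n - 2 * i + 1) / (n - 1))

def dans_form(n, i):
    defeated = n - i       # players you finished ahead of
    defeating = i - 1      # players who finished ahead of you
    return n ** ((defeated - defeating) / (n - 1))

# (n - i) - (i - 1) = n - 2i + 1, so the two forms agree exactly.
print(all(original_form(n, i) == dans_form(n, i)
          for n in range(2, 10) for i in range(1, n + 1)))  # True
```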

> > I have two main complaints about the old rankings.  One is that
> > consistently good performance is undervalued, i.e., win 99 games out of
> > 100 and finish with a lower rank than someone who plays and wins 1
> > game.  The other is that the relative boost to ratings provided by games
> > with very large numbers of players seemed too high.
> >
> > I mostly derived the rankings from the following assumptions (which may
> > seem obvious, but aren't always consistent with the old rankings):
> > A. Winning (even jointly) should make your win factor go up.  Not winning
> > 	should make your win factor go down.
>
> Not sure I agree with this.  (A unique win should and does make your rating
> go up in the existing system.)

This isn't always true.  If you've won everything, winning again doesn't
help.

> Consider the extreme case of joint victory,
> e.g. n-1 players collude to make a joint victory against one poor slob.  One
> of the winning players previously had been winning all their games solo.
> Should this joint victory really increase their rating?  It seems like it
> should lessen their glory to me.

I agree that this isn't very glorious, but I still think winning is
glorious enough to raise the ranking.  For every game you win in some
fashion you have even more proof that you're good at winning.  If you work
out the math, an n-1 player win will have a multiplier pretty close to 1,
closer for more players.
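For the curious, here is a quick check of that claim (a sketch: the n - 1 winners are tied at the average of positions 1..n-1, i.e. i = n/2):

```python
def joint_win_factor(n):
    # (n-1)-way joint win: winners tied at i = n / 2, the average of
    # positions 1..n-1, giving a multiplier of n ** (1 / (n - 1)).
    i = n / 2
    return n ** ((n - 2 * i + 1) / (n - 1))

for n in (3, 5, 10, 100):
    print(n, round(joint_win_factor(n), 4))
# 3 1.7321, 5 1.4953, 10 1.2915, 100 1.0476 -- approaching 1 as n grows
```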


> > B. Finishing higher than the average position in a game should make your
> > 	rank factor go up.  Finishing lower should make it go down.
>
> Ditto: if I always come in first, then if I come in 2nd in some game, it
> seems to me that I have performed worse than I normally do, and so it's not
> obvious to me that my rating should increase just because 2nd is better than
> average.

I think maybe the distinction is that I'm measuring how well you're
outperforming your expected placing / number of wins.  If you finish above
your average placing, that's more "proof" that your rank factor should be
above average.

> > C. Winning a game with a large number of people increases the win factor
> > 	more than winning a game with fewer people.  Not winning a game
> > 	with a large number of people is less harmful to your ranking than
> > 	not winning a game with fewer people.  Not winning a
> > 	game with multiple winners is worse than not winning a game with
> > 	one winner.
>
> Agreed.  Existing system conforms to these goals.

Not true.  Imagine winning two 2-player games and losing a 100-player game
versus winning two 2-player games and losing a 2-player game.  In case 1
your win rating is 1/101.  In case 2 it's 1/3.  Losing the game with more
people hurts your rating *more* than losing the game with fewer people.

> > D. Playing n n-player games and winning one and losing the rest (with no
> > 	joint wins in the sequence) shouldn't affect your win
> > 	factor.  Playing some number of n-player games and finishing at
> > 	ranks which average to (n+1)/2 should not affect your rank factor.
>
> Ditto.
>
> So Axioms A & B seem to be controversial, in that different people have
> different intuitions about them.  Anyone else have strong opinions?  Should
> coming in 3rd in a 6-player game automatically increase your rating
> (regardless of your current history/rating) or not?

Well, see above.  Axiom C may not be controversial, but it's not handled
by the current system.

> > Inspired by the probability discussions from earlier I decided that
> > winning a game with n players gives a rank factor of n.  (For example,
> > winning three 2-player games gives a rank equivalent to winning one
> > 8-player game.  In the old rankings it takes eight 2-player games.)  This
>
> Actually it would take 7 2-player games -- i.e. there are 7 opponents in 7
> 2-player games or 1 8-player game.  But certainly your point stands, that
> the 2 systems are different in this regard.

Does this seem to be a good or bad thing to you?

> > means losing a game in which you're the only nonwinner needs to give you a
> > win factor of 1/n to cancel a win perfectly.  1/n is then natural for the
> > rank rating of the last place finisher in a game.  The formula above just
> > finds the correct (geometric) (n-1)-section points for the other ranks.

Do you find the explanation of the formula intuitive even if you don't
like the formula itself?

> > This system isn't scaled by the number of games played.  I think this is a
> > good thing in the short term for reasons mentioned earlier.  Winning all
> > or almost all of several games is more glorious than winning two and
> > quitting.  This isn't so good in the long run.  People who win more than
> > their share of games can get an arbitrarily high win/rank factor by
> > playing an arbitrarily large number of games.  However I think for
> > determining devil of one night this isn't a problem.  (For an example of
> > the problem, here's the rank factors for the top 10 all time, minus a
> > couple weeks data I don't have.)
> > 527832481731203957453040728210811303150247859404992340596461928448.0000
> > RussW (957)
> > 10199565661047372224643859474813227630592.0000 JP (656)
> > 33213474123505656585464031605753380864.0000 Bob (248)
> > 83192608411045632158438097158144.0000 JeffF (292)
> > 4840732703688516598757326848.0000 Daniel (175)
> > 599796113429543052640256.0000 James (223)
> > 5719440983627852.0000 Tim (255)
> > 309919631712516.5625 KevinH (158)
> > 3762408645892.6768 Jeffles (190)
> > 3231261736160.2266 Ken (182)
>
> As cool as it is to see my rating be 527-bajillion, I gotta say that any
> system that ends up producing ratings with insanely large numbers like this
> seems a bit dodgy to me... :)  I'm suspicious of a system that utterly blows
> up for large datasets but is alleged to be fine for small datasets.  Makes
> me think there's something fundamentally broken in it.

I think for me that's my biggest problem with the system.  I still think
it's less flawed than the other one.

> Analogously, JeffS did some number-crunching with the existing system but
> omitting the normalization step.  The problem/advantage (depending on your
> viewpoint) is again that a slightly above average player can get an
> arbitrarily high rating the more games they play.  That's surely not a
> desirable property IMHO.  Chess & go ratings would be absurd if they worked
> like that...

Chess and go ratings factor in the strength of opponents.  They don't
really seem comparable.

> Yet I can also grok the argument that someone who wins 9 of 10 games perhaps
> deserves a better rating than someone who wins 3 of 3 games.  Proportionally
> 3 of 3 is clearly better, yet in some sense 9/10 seems more glorious.  Is
> this just emotion, or is there a real reason to think 9/10 wins is better
> than 3/3?  Would your answer change if the numbers were 900/1000 and
> 300/300?  At that point, I begin to think 300/300 is surely more glorious.
> The difference is that 3/3 feels like too small a sample to really believe
> the person played perfectly, but 300/300 is pretty damn convincing that they
> are a perfect player.  So how to reflect that in a system...

I think the rankings are mostly used to choose a "best" player for a
single night of gaming.  Thus there's not a way for someone to play more
than a few games.  If ratings go up slightly in a single night from
playing more, isn't that a good thing? If you want to scale the rankings, a
natural choice is to take the rating to the power 1/#games.  However it
then loses its scaling property across games with different numbers of
players.  A perfect player would then have a rating equal to the
"average" number of players in their games.

Another thing I've seen in other ranking systems is to seed everyone with
some suitable number of bad and/or average datapoints so that 9/10 looks
better than 3/3, but 900/1000 doesn't look better than 300/300.  What do
your overall ratings look like if everyone's rating is scaled by
#opponents+100 instead of #opponents?