### Bill James' Pythagorean Expectation and King Values

I'm not sure how far I'm going to delve into baseball, but I thought that this was worth talking about.

Long ago, baseball writer Bill James asserted that the ratio of runs scored to runs allowed was a better indicator of a team's true ability than its win-loss record. The Pythagorean winning percentage is defined as 1 / (1 + (RA/RS)^2), where RS is runs scored and RA is runs allowed. While it is far from perfect, it is clearly a better indicator of most teams' ability to win than their win-loss record, so I'll use it here.

I downloaded the 2009 MLB standings from the Internet and calculated the Pythagorean expectation for each team. Then I calculated the King Expectation for each team using very nearly the same method demonstrated here. Easy as beans. The results, the King Expectations, are revised Pythagorean Expectations, corrected for difficulty of competition. And it's stunning, or at least I think so.
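For anyone who wants to reproduce the first step, here is a minimal sketch of the Pythagorean winning percentage described above. The function name and the sample run totals are made up for illustration; they are not real 2009 figures, and the King correction itself is not shown.

```python
def pythagorean_expectation(runs_scored: float, runs_allowed: float,
                            exponent: float = 2.0) -> float:
    """Expected winning percentage: 1 / (1 + (RA/RS)^exponent)."""
    if runs_scored <= 0 or runs_allowed < 0:
        raise ValueError("runs_scored must be positive and runs_allowed non-negative")
    return 1.0 / (1.0 + (runs_allowed / runs_scored) ** exponent)


# Hypothetical season totals, for illustration only.
example = pythagorean_expectation(runs_scored=800, runs_allowed=700)
print(f"Pythagorean expectation: {example:.3f}")
```

The exponent is left as a parameter because the post uses the classic value of 2, while other values are sometimes preferred.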

These numbers reflect the expected change in winning percentage of each team due to its schedule. By way of example, the Toronto Blue Jays would be expected to win about 12 more games had they played the St. Louis Cardinals' schedule last season. That's a whack, folks. In other terms, they would have been the best team in the National League by Pythagorean Expectation, very narrowly edging out Philly. The difference between some divisions is large, but the difference between the two leagues is absurd.

A quick check of the results:

The difference in King Expectation from NL to AL is 5.4 percentage points: the AL averages 52.9% and the NL averages 47.5%. Therefore we'd expect the AL to win about 55.4% of the inter-league games. They in fact went 137W-114L in 2009, good for a 54.6% winning clip. We'll call that close enough.
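The check above is plain arithmetic. A small sketch, assuming (as the quoted figures imply) that the expected inter-league share is simply 50% plus the gap between the two league averages:

```python
# League-average King Expectations quoted in the post.
al_avg = 0.529
nl_avg = 0.475

# Assumption: expected AL share = 50% + gap between the league averages,
# which is what the 55.4% figure in the post implies.
gap = al_avg - nl_avg
expected_al_share = 0.5 + gap

# Actual 2009 inter-league record quoted in the post.
al_wins, al_losses = 137, 114
actual_al_share = al_wins / (al_wins + al_losses)

print(f"gap:      {gap:.1%}")
print(f"expected: {expected_al_share:.1%}")
print(f"actual:   {actual_al_share:.1%}")
```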

UPDATE: I keep forgetting that Milwaukee is in the NL, so I've changed the last paragraph a bit. i.e. I've edited the numbers to reflect this, and the AL is a tad better than in my original post, and the NL just a smidgen worse.

## 18 Comments:

No comments? I suppose that this is a hockey blog. Still, I thought there were more baseball fans.

At the very least I would have thought that the folks who like WAR and VORP would be yelling at me. The implications are significant. A good GM, one who is aware of which other teams are heavy into such measures, could really make hay.

I have a question and I'm sure it's been answered before, but I'm just getting into baseball stats.

How does the DH rule affect runs for and runs against? Is there a way to filter that out against the noise of the arms race in the AL East? Does that significantly alter the King Expectation if both leagues play under the same ruleset (either one)?

Michael

I've not spent a lot of time thinking about it. I tend to think of the 'Pythagorean Expectation' the same way we talk about three extra goals for an NHL team, over a season, being worth an extra win.

It's a round number, a rule of thumb. And it eliminates some of the noise that is included in Ws and Ls.

Having said that, it's a valid point and I don't doubt you're right that the NL's slightly lower runs scored level changes that exponent a bit, compared to the AL.

Does it really matter in this case? I kind of doubt it, intuitively, but I don't know. A quick test would be to use old-fashioned Winning% instead of Pythagorean expectation to determine King values, then to compare the results.

I'm not at the right computer to do that now, but it will only take moments to do when I get a chance. I'll post a link to a scatterplot of WIN% King Values vs PYTH.EXP.% King Values then.

I love this. Some questions: Didja use Pythagorean wins as the outcome? Was your input for, say, Boston <162BOS-6ATL-18BAL-8CHW...-3WSN=93>? Any idea how the methodology compares to, say, Jeff Sagarin's stuff?

Passive Voice:

Essentially, yeah. The result I'm going for is games over .500, though. Or Pythagorean games over .500.

Your equation there, if you eliminate the 162BOS bit ... it would be the Desjardins' schedule qualcomp, the MLB schedule version of his hockey stuff. And we all recognize that it is very valid stuff.

The King schedule qualcomp is the parlayed version of that.

Sagarin doesn't seem to post sked difficulties for MLB. A shame, because I think I know his general methodology and it would have been interesting to compare.

ESPN is the only source I could find for MLB strength of schedule. Some groovy things appear when you produce scatterplots of King vs ESPN and King vs Desjardins. I might post them up one day.

If anyone knows of any other MLB sources for strength of schedule, please leave a comment here.

BPro runs adjusted standings although, unfortunately, they don't seem to archive them.

I'd have to check the VORP and WAR methodology but I think that they use schedule adjusted statistics.

Tyler:

The only sked adjustment I could find was ESPN's. If there are others out there it would be good to see them.

I didn't find a sked adjustment for VORP and Smith's WAR adjustment seems to only account for the league difference, unless I'm missing something. Which is very possible.

I don't have issue with the different levels used between AL and NL. Though obviously the difference between San Diego and Minny was small for 2009, and the diff between BAL and STL was enormous. More significantly, the bones of WAR and VORP are constructed assuming all competitors are mythically average.

Baseball is tough. Even if WAR or VORP were perfect, you'd still just be at the same level as +/- in hockey. It's the essence of winning, but extracting luck from the equation is an enormous feat.

And while intuitively context would seem relatively unimportant in baseball, there is some tremendous drift in underlying abilities based on different samples, ones that don't exist using even and odd numbered games. Which we use in hockey to filter context, of course.

Brad Null expresses this very well. And I understand that Albert's graduate class was calculating the aging curve for Hank Aaron's abilities, and were seeing significant shifts in underlying abilities (p values).

That's not cricket, Tyler.

Another way of looking at it is using the Marcel predictor. Which is a simple thing, I know; that's part of its purpose. But it is a simple Bayesian equation with underlying assumptions that are flat out bonkers. And it works very nearly as well as more sophisticated measures (PECOTA, CHONE). Which tells us they are making the same errors.

OTOH, we can use Albert's K value for simple batting average (mentioned in passing in the link in my streakiness post from a week or two ago) and use that for all batters who played on the same team with the same manager ... compare that to PECOTA or CHONE, it won't be close. The forecasting systems have had thousands of hours invested in them, the K value approach 10 minutes. It's stunning.

Why?

I'm missing something simple, I suspect.

If you took the ten smartest people you know from sabermetrics and had them bet against Joe Morgan directly ... they would collectively whoop him (I'm almost certain).

If you took the ten smartest people you know from sabermetrics and had them bet against Joe Morgan using oddsmakers odds ... they would collectively get whooped by him (I'm almost certain).

Why?

The answer to that question is worth a lot of money. And I don't expect anyone who is clever enough to answer that question to be foolish enough to share his knowledge.

The same phenomenon exists with European soccer, btw.

But I'm digressing.

A kind word warms three winters; a hurtful word chills even in June.

Michael:

Sorry for the delay in getting back to you.

I reran the King Values using wins as the valuator of teams, instead of Pythagorean Expectation.

The correlation of those King Values to the ones I posted above is r=.981. I won't bother with a scatter plot.

This way gives the AL a 55.8% expected winning percentage vs. the NL, as opposed to 55.4% with Pyth. Exp.

Now the AL is a higher-spending league, and that's surely a significant factor. Also, if you're a team like BAL or TOR, it's probably easier to keep your good young players on reasonable contracts. The same kids playing for the Cards or Cubs would likely put up bigger numbers and get more handsomely rewarded at contract negotiation time. So it seems like a bit of a self-perpetuating phenomenon, this league and division disparity.

Vic, you have to slow down so the dumb ones among us can keep up with you. I didn't understand your point about MARCEL and PECOTA. You say they're making the same errors. Aren't they both just missing the same variance?

Players' results will vary widely from year to year, just based on luck and possibly injury. Obviously both MARCEL and PECOTA will make predictions that strongly resemble recent results, and if a player breaks out or collapses it'll be a surprise to both.

I also didn't understand what you were saying about the K value. I scanned the article quickly, K seems to be a measure of "streakiness". What are you saying? That players are streakier than they should be?

Sorry to ask so many seemingly basic questions, but when I read some of your stuff it's like Portuguese poetry to me: it sounds great and I know there's something beautiful underneath, but I can't understand a word.

Sorry for the confusion, Tom.

For the type of math Albert uses it is usually convenient to assume that the ability within a population (say hitters playing at DH or a corner position) is distributed in beta form. i.e.

p^(α-1) * (1-p)^(β-1)

This at least for the first try. The alternative way of expressing that equation is:

p^(Kη-1) * (1-p)^(K(1-η)-1)

where η is the population average.
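The two forms above are the same curve under α = Kη and β = K(1-η). A rough sketch of the conversion follows; the function names are made up for illustration, and the kernel is the unnormalized shape written above rather than a proper density.

```python
def to_alpha_beta(k: float, eta: float) -> tuple[float, float]:
    """Convert the (K, eta) form to the (alpha, beta) form."""
    return k * eta, k * (1.0 - eta)


def to_k_eta(alpha: float, beta: float) -> tuple[float, float]:
    """Convert the (alpha, beta) form back to (K, eta)."""
    k = alpha + beta
    return k, alpha / k


def beta_kernel(p: float, k: float, eta: float) -> float:
    """Unnormalized beta shape: p^(K*eta - 1) * (1 - p)^(K*(1 - eta) - 1)."""
    return p ** (k * eta - 1.0) * (1.0 - p) ** (k * (1.0 - eta) - 1.0)


# Hypothetical values: a population average of .300 with K = 10.
alpha, beta = to_alpha_beta(10, 0.3)
print(alpha, beta)            # roughly 3 and 7
print(to_k_eta(alpha, beta))  # back to roughly (10, 0.3)
```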

The reason it is convenient to use that form with Bayesian math is that the estimated ability of a player (his average Hits/PA, or whatever) distills down to simple arithmetic.

And using the K-format, this K constant reflects the relative amount of luck and skill in the element being considered.

Marcel actually uses the beta form, though not intentionally. And it assumes the same K for all hitting elements (BB/PA, SO/AB, etc.). (Marcel uses a K of about 300 I think, though I'd have to check.)

Then it shifts the nature of the population for years past (say 400 for a year ago, and 500 for two years ago).

Not to pick on Marcel, it's designed to be brute simple and open-source. But it gets comparable results to CHONE, PECOTA, ESPN, etc. At least by std error and abs deviation. So presumably the other forecasting systems are using the same thinking.

This compression of the population from years past .. it is an end run around reason. It's compensating for the fact that the underlying abilities for a lot of players stay mainly constant during the season (some guys have injuries, others have struggles with their swing mechanics, etc ... still the ability is very steady for the group), but somehow have shifted during the winter.

If someone did this with hockey we'd all chide them. Guys have different roles from season to season in the NHL, often on different teams, and it has a huge impact on their results. A much larger effect on the population than chance alone can explain.

Sure, we could just regress the frick out of earlier seasons' results, but that's not cricket.

Hopefully I've stated that clearly. This post is a primer on a subject that's not easy to explain. Though if there is any place on the Internet where the reasoning will be followed ... it is the Oilogosphere.

Tom,

Also, this article

http://www.philbirnbaum.com/btn2006-02.pdf

gives a better explanation of the beta form. Unfortunately he uses alpha and beta here. alpha/avg = K though, so simple arithmetic to convert.

Hi Vic,

Thanks, that's much clearer. I now understand the issue: the K in the beta distribution is used to model two things at once: the ratio of luck vs. skill, which is a characteristic of the skill distribution of the population and the structure of the game (with 100000 at-bats, skill would be 99% of results), and the change in underlying abilities from year to year. Marcel is "cheating" by increasing K to model the fact that results from 2 years ago don't correlate as well as results from last year. This is the real problem of projection systems: ability changes, injuries happen.

The same issue emerges in hockey, but there's also the larger issue of context, which you mentioned. The skill distribution in hockey is not as wide as the boxcars lead you to believe, as we all know, and most of players' variance in results is from variance in usage. Malkin wouldn't score as much if he got Marcel Goc's minutes. This isn't nearly as much of a concern in baseball.

What does all this have to do with betting on European soccer? :)

Tom,

Think of a beta curve as a graph of a binomial experiment.

In choosing which specific values of a and b to use for your prior beta curve, imagine your prior beliefs can be expressed as a "prior experiment" where you have a successes and b failures. Then K = a+b would be the prior sample size and eta = a/(a+b) would be the prior proportion of successes. (Jim Albert himself explained it to me like that, and it really set off a light bulb in my brain.)

Now note that if K is large, it essentially means your prior beliefs are "strong", so your curve will be strongly conglomerated around the center, i.e. narrow distribution. If K is small, your curve will be weakly conglomerated around the center, i.e. more spread out.

As an example, imagine our prior beliefs about some proportion (let's say "percentage of coffee drinkers in America") is 30 percent or .3. If we use a=300 and b=700 for our beta curve, the K is 1000 (and the eta is of course .3), and the curve looks like this...

http://sunnymehta.com/public/K=1000.jpeg

But if we use a=3 and b=7 for our beta curve, note that we have the same eta of .3, but our K is 10, so the curve looks like this...

http://sunnymehta.com/public/K=10.jpeg

So, bringing it back to hockey, if the K is large for our prior curve for save percentage (which it is), that's essentially saying that the population is strongly centered around the mean, and not very spread out, therefore any one particular player's results get "heavily regressed" when predicting his future performance (i.e. calculating his posterior).

If K is small, as it is for batter strikeout rate in baseball, that's saying that the population is spread out with regards to that stat, i.e. very differing abilities, so any one particular player's results don't get "regressed" nearly as much and can be taken at face value much more readily after not that many plate appearances.

Make sense?
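To make the "heavily regressed" point concrete, here is a minimal sketch of the standard beta-binomial update, in which a prior with sample size K and proportion eta is combined with observed results. The player's numbers and the population average are hypothetical, chosen only to show how a large K pulls the estimate toward the mean while a small K leaves it close to the raw rate.

```python
def posterior_mean(successes: int, trials: int, k: float, eta: float) -> float:
    """Posterior mean of a rate under a beta prior with a = K*eta, b = K*(1 - eta).

    The prior acts like K earlier trials with K*eta successes.
    """
    return (k * eta + successes) / (k + trials)


# Hypothetical player: 30 successes in 100 trials (raw rate .300),
# in a population whose average rate is .200.
raw_rate = 30 / 100
strong_prior = posterior_mean(30, 100, k=1000, eta=0.2)  # heavily regressed
weak_prior = posterior_mean(30, 100, k=10, eta=0.2)      # barely regressed

print(f"raw rate:       {raw_rate:.3f}")
print(f"K=1000 estimate: {strong_prior:.3f}")
print(f"K=10 estimate:   {weak_prior:.3f}")
```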

Sunny, thanks for the clarification. I think I understand the beta function now - I just wasn't clear on the terminology. The K and the Bayesian prior distribution are concepts that I'm comfortable with and we agree upon.

I'm still trying to understand the distinction Vic made between the sabermetric wisdom and the Joe Morgan / oddsmaker wisdom. What exactly do you believe that Morgan knows that the sabermetricians don't, and in what context would it manifest itself?

Tom,

I think what Vic is saying (and plz correct me if I'm wrong, Vic) has nothing to do with what Morgan knows, but with what the oddsmakers know about what the sabermetricians know.

In other words, sabermetricians can predict games better than a randomly guessing monkey (represented by Joe Morgan in Vic's example). However, since the bookies have such a lock on how the sabermetricians are betting, a randomly guessing monkey would do better against the oddsmakers than the sabermetricians would.

I come across this phenomenon sometimes in poker. Every now and then a situation will come up where I'm playing a very good player, but one whose tendencies I'm extremely familiar with, and he'll do worse against me in that situation than a completely clueless amateur would against me, and he'd do better against the amateur himself.

What I don't understand is a) what evidence Vic sees (aside from low vigs on baseball lines) to feel so strongly about this, and b) how it's possible. Baseball betting is not really a game of incomplete information the way poker is. I.e., if sabermetricians had a very good model for predicting games, the only way the bookies could beat them would be to create a better model. Simply knowing the sabermetricians' model wouldn't be enough for the bookies. (Right?)

I think one issue for the sabermetrics guys is that they follow the method I outlined here:

http://www.puckprospectus.com/article.php?articleid=77

Bill James published this approximation for head-to-head winning percentage about 25 years ago and it stuck.

Sunny called me out for writing silly things like that. It actually doesn't work that badly so long as you adequately regress player and team talent.
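The comment above doesn't name the approximation. The sketch below assumes it is the head-to-head formula usually credited to Bill James and known as log5, which may or may not be exactly what the linked article uses. The inputs are each team's true winning percentage against average opposition, which is why regressing them first matters.

```python
def log5(p_a: float, p_b: float) -> float:
    """Estimated probability that team A beats team B (log5 approximation).

    p_a and p_b are each team's winning percentage against average opposition.
    """
    a_wins = p_a * (1.0 - p_b)
    b_wins = p_b * (1.0 - p_a)
    if a_wins + b_wins == 0:
        raise ValueError("undefined when both teams are at 0 or both at 1")
    return a_wins / (a_wins + b_wins)


# Hypothetical talent levels: a .600 team against a .400 team.
print(f"{log5(0.6, 0.4):.3f}")
```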

