Tuesday, March 28, 2006

mudcrutch vs Sagarin

Serious hockey fans everywhere are regularly checking out mudcrutch79's oddly riveting playoff odds predictor. It's fascinating stuff; as mike w observes, it is a triumph of math over feelings. :D

Anyhow, I wrote a little script that generates the same table using Jeff Sagarin's Predictor numbers, partly because I wanted to check a few things myself, but mostly for the F of it.

Jeff Sagarin is a middle-aged guy who has made a living churning out sports odds data for ages. He is an MIT grad (mathematics) and later earned his MBA from Indiana University. He has gained a lot of notoriety in recent years for his work on college sports rankings, where history has shown his raw-numbers approach to kick the asses of the coaches' and media polls. He has since become involved in determining those rankings.

He also got a bit of press when it came out that he had been doing in-depth data analysis for Mark Cuban's Dallas Mavericks (Cuban is also an Indiana grad) for several years.

Mudcrutch is a law student, Oiler fan, internet smartass, and, I think, has taken and passed at least one math course at the university level.

Sagarin's results are in red below, mudcrutch's in blue.
I've used a simple algorithm with Sagarin's 'Predictor' numbers as the base, plus the traditional adjustments for home, road, and 0, 1 and 2 days of rest. Without the rest adjustments these numbers would be even closer.
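For anyone wanting to try this at home, here's a minimal Python sketch of the per-game step: the expected goal differential is the gap between the two Predictor numbers, nudged for home ice and rest. The post doesn't say what the "traditional numbers" are, so `HOME_ICE` and `REST_ADJ` below are made-up placeholders, and `expected_margin` is a hypothetical helper name.

```python
# Placeholder adjustments (ASSUMED values; the post doesn't give the real ones).
HOME_ICE = 0.25                         # goals added for playing at home
REST_ADJ = {0: -0.15, 1: 0.0, 2: 0.05}  # goals by days of rest (2 = two or more)


def expected_margin(home_rating, road_rating, home_rest, road_rest):
    """Expected home-minus-road goal differential for one game (hypothetical helper).

    Starts from the difference between the two Predictor-style ratings,
    then adds home ice and each side's rest adjustment.
    """
    margin = home_rating - road_rating + HOME_ICE
    margin += REST_ADJ[min(home_rest, 2)] - REST_ADJ[min(road_rest, 2)]
    return margin
```

Swap in whatever home-ice and rest values you trust; the structure stays the same.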

My point, and I actually have one: it is startling how close these two systems are. Further, it is surprising how much baseball analysis actually works with NHL hockey, assuming the right measure of common sense is added. A lot of mc's math on this, IMHO, seems to come out of nowhere, much like most of the Bill James stuff I have read ... and like James' stuff, it seems to work. Go figure.

Update to add: An Oiler win tonight, independent of other results ... gives the Oilers an 86% chance of making the playoffs by the Sagarin metric, and I'm sure a similar 8% improvement by the mudcrutch metric. If they get the 'W' tonight, even Dennis should start believin' :D


Anonymous lowetide said...

Just based on the process used I think mudcrutch's is more credible. The differences aren't huge as you presented it here, but the compelling thing mc's table boasts is a "playoff percentage" that seems logical and is solid math.

Aside from Sagarin's one big advantage (his stuff is in USA Today), I think mudcrutch buries him everywhere else.

Plus Sagarin's baseball stuff is horseshit, or at least it was when I last looked at it five years ago (or so).

3/28/2006 7:13 pm  
Blogger Vic Ferrari said...

This is mc's format. The principal difference is the use of Sagarin's valuation of team quality; the numbers are parlayed through in the same fashion afterwards. I used 10,000 iterations because I think that's what he did.

MC used a matrix built up from Poisson calcs based on these numbers, which is kinda longhand, but works. The script I wrote on his OilFans thread makes a call to the Poisson worksheet function; I left it in there so he could swap it out easily later if he chose. Obviously his script will run quicker if he just enters the Poisson equation on that line, if he wants to use that method.
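The Poisson matrix idea can be sketched the same way. This assumes each team's goals in a game are independent Poisson counts with known means (turning an expected differential into two means needs an assumed league scoring level, which isn't given here), and it cuts the matrix off at 15 goals a side. `poisson_pmf` computes the same value as Excel's `POISSON(x, mean, FALSE)` worksheet function; both function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson count; same value as Excel's POISSON(k, mean, FALSE)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_probs(home_mean, road_mean, max_goals=15):
    """Walk the score matrix of two independent Poisson goal counts.

    Returns (P(home ahead after regulation), P(level), P(road ahead)).
    Scores above max_goals are ignored, which loses a negligible sliver.
    """
    win = tie = loss = 0.0
    for h in range(max_goals + 1):
        p_home = poisson_pmf(h, home_mean)
        for r in range(max_goals + 1):
            p = p_home * poisson_pmf(r, road_mean)  # one cell of the matrix
            if h > r:
                win += p
            elif h == r:
                tie += p
            else:
                loss += p
    return win, tie, loss
```

With no ties allowed in 2005-06, the middle number is the chance a game goes to OT or the shootout, not the chance it ends level.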

It's certainly a terrific way to present it. Parlaying the results through by playoff position, 1 thru 8, is a cool idea. And ranking all the teams is a pain in the ass, especially with the division winners taking seeds 1-2-3; I assume he did that on the worksheet itself.
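The seeding headache looks something like this in code. A rough sketch, assuming a conference list of team records with hypothetical field names: each division leader (top of its division on points, then wins) takes one of the top seeds, and everyone else is ranked below them. Only the wins tiebreaker is applied, so it's a simplification of the real NHL rules.

```python
def seed_conference(teams, spots=8):
    """Return team names in seed order (hypothetical helper).

    `teams` is a list of dicts with 'name', 'division', 'points' and 'wins'.
    Each division's leader (most points, then most wins) gets one of the top
    seeds; the rest follow. Only the wins tiebreaker is used.
    """
    ranked = sorted(teams, key=lambda t: (t["points"], t["wins"]), reverse=True)
    leaders, seen = [], set()
    for team in ranked:  # the first team seen from each division is its leader
        if team["division"] not in seen:
            seen.add(team["division"])
            leaders.append(team)
    leader_names = {t["name"] for t in leaders}
    others = [t for t in ranked if t["name"] not in leader_names]
    return [t["name"] for t in leaders + others][:spots]
```

This is how a weak division winner can sit third with fewer points than the fourth seed.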

The right-hand numbers in Sagarin's sheet are pretty common numbers. If everyone played the exact same sched, then it's just goal differential per game. Simple as that. The tricky bit is handling the schedule difficulty.
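One common way to handle schedule difficulty, shown only as an illustration (it is not Sagarin's own method, whose details aren't given here), is a "simple rating system" style fixed point: a team's rating is its average goal margin plus the average rating of the opponents it has faced, recomputed until it settles. Home ice is ignored, and the game-tuple format and function name are assumptions.

```python
def schedule_adjusted_ratings(games, iterations=100):
    """Goal-based ratings adjusted for opponent strength (SRS-style sketch).

    games: list of (team_a, team_b, goals_a, goals_b) tuples.
    Each pass sets rating = average goal margin + average opponent rating,
    then recentres so the ratings average zero.
    """
    margins, opponents = {}, {}
    for a, b, goals_a, goals_b in games:
        margins.setdefault(a, []).append(goals_a - goals_b)
        margins.setdefault(b, []).append(goals_b - goals_a)
        opponents.setdefault(a, []).append(b)
        opponents.setdefault(b, []).append(a)

    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)  # start from raw goal differential per game
    for _ in range(iterations):
        ratings = {
            t: avg_margin[t] + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in margins
        }
        mean = sum(ratings.values()) / len(ratings)
        ratings = {t: r - mean for t, r in ratings.items()}
    return ratings
```

In a balanced round robin of n teams this works out to raw goal differential per game scaled by (n-1)/n, so the order matches plain goal differential; it only reshuffles teams when the schedules differ.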

Sagarin's main claim to fame is the overall rating, which is something I ignore. It is a combination of the oddsmaking numbers and his 'Elo chess' system, which presumably is a take-off of the chess rating system ... you are awarded or deducted points based on the outcome of the game and the rating of your opponent.
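For reference, the standard chess Elo update that the 'Elo chess' name points to looks like this. The 400-point scale and K = 32 are common chess defaults, assumed here; Sagarin's actual parameters aren't given, and `elo_update` is a hypothetical name.

```python
def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Standard chess Elo update for one game between A and B.

    score_a is 1 for an A win, 0.5 for a draw, 0 for a loss.
    Points gained by one side are exactly the points lost by the other.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

Beating an equal opponent is worth k/2 points; beating a much stronger one is worth more, which is the "rating of your opponent" part.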

Sagarin also binds himself to pure math. So if Smyth gets injured an hour before game time on Thursday ... his numbers stay the same, even though the Oilers' chances of winning have obviously changed. MC bound himself with the same constraints, I think, and well he should; it keeps things clean.

3/28/2006 10:53 pm  
Blogger Vic Ferrari said...

Just rereading this and I wanted to add:

I'm NOT pissing on mudcrutch here. The opposite. In my experience, common sense isn't all that fuckin' common, and this cat seems to have come pretty damn close to figuring out a lot of stuff without a maths or gaming background. Props to him.

He even knows exactly why his sked-difficulty calcs are a bit weak, though they are better than he suspects, in my humble opinion.

Clever bugger is MC. I'm surprised that he has found as large an audience as he has, and I think in a few years he'll be able to use his own verbiage to find interesting articles with a http://scholar.google.com/ search. Just a hunch, mind.

3/29/2006 7:08 pm  
Blogger mudcrutch79 said...

The sked difficulty thing is the most interesting point to me. It's got a couple of problems, I think: 1) It seems to me that sked difficulty could alter how the Poisson thing plays out, although I'm not quite sure how to express what I think happens here. 2) As you mentioned, I'm not at all sure that my numbers are a good approximation of how the difficulty changes. Is the difference between the Oilers' schedule and the Red Wings' schedule really just 14 goals? It seems low to me.

3/30/2006 9:19 am  
Blogger Vic Ferrari said...

I have a question, mc:

Ran the numbers today: 10,000 seasons using the difference between Sagarin's Predictor #'s as the expected goal differential, with home ice and rest factored in. Same as before.
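A stripped-down sketch of that 10,000-season loop, assuming independent Poisson goals for each remaining game (means worked out beforehand), a coin flip for games tied after regulation with the loser getting a point, and ranking on points then wins with any leftover ties broken at random. Division seeding is left out to keep it short, and all names are hypothetical.

```python
import math
import random
from collections import Counter


def poisson_sample(mean, rng):
    """Draw a Poisson count (Knuth's method: multiply uniforms until below e^-mean)."""
    limit, k, product = math.exp(-mean), 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return k
        k += 1


def simulate_seasons(standings, remaining, n_seasons=10_000, spots=8, seed=1):
    """Tally how often each team finishes in each playoff position.

    standings: {team: (points, wins)} before the remaining games.
    remaining: list of (home, road, home_goal_mean, road_goal_mean).
    Returns {team: Counter({position: count})} for positions 1..spots.
    """
    rng = random.Random(seed)
    finishes = {team: Counter() for team in standings}
    for _ in range(n_seasons):
        points = {t: rec[0] for t, rec in standings.items()}
        wins = {t: rec[1] for t, rec in standings.items()}
        for home, road, home_mean, road_mean in remaining:
            home_goals = poisson_sample(home_mean, rng)
            road_goals = poisson_sample(road_mean, rng)
            if home_goals == road_goals:
                # Tied after regulation: coin flip for OT/shootout, loser gets a point.
                winner, loser = (home, road) if rng.random() < 0.5 else (road, home)
                points[loser] += 1
            else:
                winner = home if home_goals > road_goals else road
            points[winner] += 2
            wins[winner] += 1
        # Rank on points, then wins; any remaining dead heat is broken at random.
        order = sorted(standings, key=lambda t: (points[t], wins[t], rng.random()), reverse=True)
        for position, team in enumerate(order[:spots], start=1):
            finishes[team][position] += 1
    return finishes
```

Dividing a team's count in each position by `n_seasons` gives the per-position percentages in tables like the one below.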

Edmonton and San Jose are the teams with the biggest differences.

So, for EDM, Sagarin vs MC:

Position:  1   2   3   4   5   6    7    8
MC:        0%  0%  0%  2%  7%  13%  21%  26%
Sagarin:   0%  0%  0%  3%  8%  14%  20%  24%

Your numbers on top. My script just uses the wins tie-breaker. If it goes to an 8th place tie with both points and W's ... I just punt both of the teams out of the playoffs. :-)
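The punt rule is easy to express. A minimal sketch, assuming final records as (points, wins) pairs; `playoff_teams` is a hypothetical name, and if several teams are level on both at the cutoff, all of them are dropped, so fewer than eight can qualify.

```python
def playoff_teams(final, spots=8):
    """Teams that make it, ranked on points then wins (hypothetical helper).

    final: {team: (points, wins)}. If teams are still level on both at the
    last playoff spot, all of them are punted, so fewer than `spots` may qualify.
    """
    ranked = sorted(final, key=lambda t: final[t], reverse=True)
    if len(ranked) <= spots:
        return ranked
    cutoff = final[ranked[spots - 1]]
    if final[ranked[spots]] == cutoff:
        # Dead heat across the cutoff: drop everyone tied at that record.
        return [t for t in ranked if final[t] > cutoff]
    return ranked[:spots]
```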

But I sum up the Sagarin results and get the Oilers at 70%. I sum up the MC results and get 68%. Strikingly similar. Sagarin has the Oilers at 94.0 average points; you have them at 93.9.
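The arithmetic behind those sums, using the rounded percentages from the EDM table: making the playoffs means finishing somewhere in positions 1-8, so the chance is just the sum of the eight columns. With the rounded figures both rows come to 69%; presumably the 70% and 68% quoted come from the unrounded simulation output.

```python
# Rounded per-position percentages for EDM from the table (top row is MC's).
mc = [0, 0, 0, 2, 7, 13, 21, 26]
sagarin = [0, 0, 0, 3, 8, 14, 20, 24]

# Making the playoffs means finishing in some position 1-8, so just add them up.
print(sum(mc), sum(sagarin))  # 69 69
```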

Yet on your front page you show the Oilers at 58%. Am I missing something?

3/30/2006 5:50 pm  
Blogger speeds said...

I don't know if this is why, but if wins are the first tiebreaker, EDM would probably lose that tiebreaker with every team left in the playoff hunt.

3/30/2006 10:32 pm  
Blogger mudcrutch79 said...

Yeah, speeds has it. It's all tiebreakers. The leftmost playoff column runs the wins tiebreaker as well.

3/31/2006 8:04 am  
