Saturday, January 16, 2010

Likelihood and the Way Humans Think

This is at the crux of sport's most heated (and entertaining) online debates. The Internet is good that way, because surely we all have the good sense to avoid arguing about these things in real life. It also separates two types of thinkers, and they will never cross the line to join forces with the folks who have the different mindset, so it is an endless repetition of similar arguments.

Likelihood, and the way we understand it. That's at the core of almost all of it. So I thought I would write a blurb on the subject, as a point of reference for the future.


Think of an imaginary hockey player who scores 16 EV goals on 103 EV shots during his rookie season. That's a 15.5% clip. How good is he at finishing his chances?

Thinker A says: He's a 15.5% shooter at evens, that is precisely what the evidence tells us. And with finish like that, he should rightly get more powerplay time and EV icetime in his sophomore year, meaning more goals.

Now Thinker A doesn't expect him to shoot EXACTLY 15.5% next year. Hell, a goalpost or two here, a big save there, a soft goal allowed by the opposing goalie here, it can make a hell of a difference ... it's a fluid thing. But 15.5% is the best guess.

A university student, in the natural or social sciences usually, will often bust into the Internet conversation with some math to support his Thinker A cohort. He will use precisely the thinking, and equation, explained here to calculate the likelihood of this rookie getting this wonderful shooting% while only possessing average shooting ability. Then he'll do the same for a player with 24.5% natural ability, and 23.5% natural ability ... and so on and so on. Plot them all out and the likelihood of the player's abilities will be expressed in the red histogram (ignore the clear one for now):
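For anyone who wants to see that calculation spelled out, here is a minimal Python sketch. The 1%-wide bins from 0 to 30%, the use of bin midpoints, and the function names are all my assumptions for illustration; the original plot may well have been built differently.

```python
from math import comb

def binomial_likelihood(goals, shots, ability):
    """Chance of exactly `goals` on `shots` for a shooter with this true ability."""
    return comb(shots, goals) * ability**goals * (1 - ability)**(shots - goals)

def likelihood_histogram(goals, shots, lo_pct=0, hi_pct=30):
    """Likelihood at the midpoint of each 1%-wide bin, rescaled to sum to 1.

    The bin range and bin width are assumptions made for illustration.
    """
    midpoints = [(b + 0.5) / 100 for b in range(lo_pct, hi_pct)]
    raw = [binomial_likelihood(goals, shots, p) for p in midpoints]
    total = sum(raw)
    return {round(p * 100, 1): v / total for p, v in zip(midpoints, raw)}

# The rookie: 16 EV goals on 103 EV shots.
red = likelihood_histogram(16, 103)
for pct in (8.5, 11.5, 15.5, 20.5):
    print(f"{pct:4.1f}% ability bin: {red[pct]:.3f}")
```

Each bin answers the question "if this were his true ability, how often would he land on exactly 16 goals?", and the rescaling at the end just makes the columns add up to 100%.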

That's the graphic representation of what Thinker A was feeling all along, so he embraces it. And really, it is intellectually honest. He's looking at the player in isolation. He'll also find ways to rationalize it: he's a brilliant shooter, he doesn't waste any, he always gets in close for his chances. Some of that may be true, but in my experience most of these arguments are developed after the fact. I doubt that Thinker A types are compulsive liars; they just embrace the anecdotal evidence that supports the way their brain works, as visualized in the chart above.

Now Thinker B enters the conversation ... this is always trouble.

He reads the discussion and can't remember the player having these special qualities. Then he looks and sees that the rookie never scored at will in either junior or college.

Then he looks at how shooting% shook out for the forwards in the entire league last season. It's a pretty wide spread of results (about half again as wide as the clear histogram in the plot above).

He also sees that a whack of guys shoot 15-16% at evens every year, but nobody has ever maintained that clip for their career (the clear histogram in the plot above is his sense of how career EV shooting% is shaking out). And that includes some guys with tremendous, proven finishing ability. It's a real oddity for someone to even manage it two years in a row.

That's his feel of the way EV shooting% works in the NHL, with some guys being clearly better than others in the long haul, but wild swings for most guys from year to year, and huge spreads from good to poor on a yearly basis ... he surmises that buddy is due for a fall, that he just had a bit of a lucky season with the goals.

So Thinker B looks at Thinker A's position (the red histogram in the top plot) and says ...

"Okay, I can understand that to an extent, but you're saying that this rookie has a 1 in 6 chance of being a 15-16% guy by ability, and a 1 in 12 chance of being an 11-12% guy ... twice as likely to be in the high group, that seems fishy to me."

"In the league, we have a pretty good idea that there are 20 times as many 11-12% forwards as there are 15-16% guys, in terms of long term ability ... shouldn't we take that into account? So instead of our rookie being twice as likely to be in that 'super' pail as in the merely 'decent' pail ... isn't he actually 10 times LESS likely to be in the 'super' pail than the merely 'decent' pail?"

This is usually the point where the chasm between the two mindsets becomes obvious and unbridgeable. Name calling generally starts about here as well.

If Thinker B progresses with this thinking and multiplies each red column by the clear column for the same bin, he'll end up with the bold clear histogram in the plot below. That's his likelihood estimation of the player's ability. And it's a hell of a lot different than Thinker A's.
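Here is a rough Python sketch of that column-by-column multiplication. The red columns come from the rookie's 16 goals on 103 shots; the clear columns are an ASSUMED league-wide ability spread (a bell shape centred on 9% with a 2-point standard deviation), which is a made-up stand-in rather than real NHL data.

```python
from math import comb, exp

SHOTS, GOALS = 103, 16
midpoints = [(b + 0.5) / 100 for b in range(0, 30)]  # 1%-wide ability bins

def normalize(weights):
    """Rescale a list of column heights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# Red histogram: likelihood of 16 goals on 103 shots at each ability level.
likelihood = normalize(
    [comb(SHOTS, GOALS) * p**GOALS * (1 - p)**(SHOTS - GOALS) for p in midpoints]
)

# Clear histogram: ASSUMED spread of true ability across the league.
# The 9% centre and 2-point spread are illustrative guesses only.
prior = normalize([exp(-0.5 * ((p - 0.09) / 0.02) ** 2) for p in midpoints])

# Thinker B: multiply matching columns, then rescale so the bins sum to 1.
posterior = normalize([l * q for l, q in zip(likelihood, prior)])

def mean(hist):
    """Average ability implied by a histogram over the bins."""
    return sum(p * w for p, w in zip(midpoints, hist))

print(f"Thinker A's best guess: {mean(likelihood):.1%}")
print(f"Thinker B's best guess: {mean(posterior):.1%}")
```

Whatever prior you plug in, B's estimate lands somewhere between the league's typical ability and the rookie's raw number, and how far it gets pulled depends on how tight you believe the league-wide spread really is.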

If you use B's histogram above, as his estimate for the rookie ... now do precisely the same for the other 250 or so forwards who registered enough shots for you to count. Add up all your histograms to build one giant one (feel free to use Lego) ... you're back to where you started with the original clear histogram in the first picture. At least you will if that clear histogram was right in the first place.

Now, let's simulate a season. Take one Lego block, at random, from the histogram for each player. Let's say you pulled a 10% Lego block for the rookie: now grab a die weighted so that a six comes up 10% of the time and roll it 103 times (the number of shots he had). Record the number of sixes you roll, mark that down as his simulated goals for that season, and figure out his shooting% as well. Now do the same for every other player ... plot out the results. Voila! The same as the spread of results for an actual NHL season.
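If you'd rather not roll a weighted die a few thousand times, a short Python sketch does the same job. Since the stacked player histograms add back up to the league-wide one, this version pulls each player's block straight from an ASSUMED league-wide ability distribution (bell curve centred on 9%, 2-point spread); the 250 forwards and 103 shots apiece are illustrative numbers too.

```python
import random
from statistics import pstdev

def simulate_season(abilities, shots, rng):
    """Roll a weighted die once per shot for every player; return each shooting%."""
    results = []
    for ability, n_shots in zip(abilities, shots):
        goals = sum(rng.random() < ability for _ in range(n_shots))
        results.append(goals / n_shots)
    return results

rng = random.Random(2010)  # fixed seed so the run is repeatable

# ASSUMED league: 250 forwards, made-up ability distribution, 103 shots each.
abilities = [min(max(rng.gauss(0.09, 0.02), 0.01), 0.30) for _ in range(250)]
shots = [103] * 250

season = simulate_season(abilities, shots, rng)

ability_spread = pstdev(abilities)
result_spread = pstdev(season)
print(f"spread of true ability:      {ability_spread:.3f}")
print(f"spread of season shooting%:  {result_spread:.3f}")
```

The thing to look at is that the spread of single-season results comes out noticeably wider than the spread of ability that produced them. The extra width is nothing but bounces.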

Simulate another season and look at the way the individual player's numbers shifted from year to year. Some guys had madass swings, some guys stayed the same. And the real league numbers show a frighteningly similar pattern. The real world numbers will show ever so slightly wider shifts from year to year. Why? Because things like wrist injuries, playing with Thornton, emotional problems ... they really do happen, and they affect a player's EV shooting%. But the effect on the league as a whole ... it is a fraction of what hockey pundits are attributing to the variance for even one team. So the overwhelming majority of this expert insight must be completely untrue. The universe demands that.

If thinker A tries the same thing with his plot (the red one, his sense of likelihood) if he applies it to every player in the league and adds them all up ... the spread is way too wide. He has estimated tonnes more terrific shooters than the NHL produced, and tonnes more terrible shooters as well. Keep parlaying it through and in relatively few seasons you'd have a population with a bunch of guys who didn't have a hope in hell of ever scoring a goal, and a smaller group who scored on almost every shot they took.
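A quick way to watch that happen is the Python sketch below. It simplifies Thinker A down to a point estimate (last season's shooting% IS the player's ability), which if anything understates how wide his red histograms are. The starting league (250 forwards, abilities centred on 9%, 103 shots each) is ASSUMED for illustration.

```python
import random
from statistics import pstdev

rng = random.Random(16)  # fixed seed so the run is repeatable
SHOTS = 103

def play_season(abilities):
    """One season of shooting% for each player, given his assumed ability."""
    return [sum(rng.random() < a for _ in range(SHOTS)) / SHOTS for a in abilities]

# ASSUMED starting league: made-up ability distribution, not NHL data.
abilities = [min(max(rng.gauss(0.09, 0.02), 0.0), 1.0) for _ in range(250)]

spreads = [pstdev(abilities)]
for _ in range(5):
    # Thinker A: whatever a player shot last season is what he is.
    abilities = play_season(abilities)
    spreads.append(pstdev(abilities))

for season_number, spread in enumerate(spreads):
    print(f"after {season_number} seasons of A-thinking: spread = {spread:.3f}")
```

Every pass layers another season of luck on top of the last one and calls it ability, so the spread only heads one direction. Anyone who hits 0% gets stuck there for good, which is the "didn't have a hope in hell" group.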

That's why A scoffs at the notion of predictive value, and will almost never wager with B.

Now obviously very few arguments on the Internet involve that kind of math, the language used is more likely a mixture of English and math, mostly the former. The format remains the same though, regardless of whether the topic is clutchness or health or save percentage or shootouts or whatever. If google had an English2Math translator ... you'd see that a lot of guys who aren't especially mathy (slipper and Tyler come to mind) are actually often hitting us with some big, Bayesian concepts. Translated to math, they would be brutal to solve, you'd need someone with much better math skills than me.

B types often bust out the phrase "regress to the mean". That's probably a poor choice of words. It creates the impression that there is an invisible force in the universe pulling everyone and everything towards mediocrity. Really we are talking about luck driving results in the short term, but the bounces being more likely to settle out with time. There is a good chance that a forward who has completely average results over three seasons ... if he has a hard shot and quick release, he will probably get better results in the future, he will probably move AWAY from the mean. The population as a whole is regressing to the mean with time, the players are bouncing around every which way at any moment.

That is it. I don't think I have ever seen a Thinker A type convert to a Thinker B type in my life. The reverse has never happened either. Not in sports talk, not anywhere. I have to think that it's just the way our heads are wired from birth or early life. The Oilogosphere is swimming in this Thinker B type of writer and commenter, whether they are mathematically inclined or not. That is an extraordinary thing, at least to my mind.

NOTE: The clear histogram in the top picture is actually an estimate of the ability of finishing in the population. You can think of each piece of it as the average of each player over a million parallel universes, it all adds up to make this plot of shooting% ability for the NHL forwards. 'Ability' is the common term for these prior distributions in sports, 'Non-Luck' is more correct to my mind.


Blogger YKOil said...

Don't know what you think of it ('it' being your own post) but I am thinking that is the best post of yours - both in logic, clarity and communicativeness, that I have ever read.

FTR, I see myself as a guy who develops his thoughts under paradigm A but looks out for the B corrections, and then adjusts to B (almost) every time the correction arrives (whether from myself or someone else).

Thanks for the post Vic, appreciated,


1/16/2010 8:30 pm  
Blogger JavaGeek said...

This theory you present above for "B" is generally considered a Bayesian approach. I've been planning on writing something on the technical end of these issues, but never got around to it. It's not always the easiest stuff to explain.

Good job, btw.

1/16/2010 8:47 pm  
Blogger R O said...

Nice post.

I have all of one University course as far as background in probability, so I don't know what a term like "Bayesian" means.

I do know that Thinker B is what I would call the "common sense" approach (at least my common sense). In day-to-day life many events are influenced by nothing but chance, surely hockey (being on a smaller scale) might be the same?

And once one brings the power of math and logic to the table, all doubt is removed.

1/17/2010 12:26 am  
Blogger Vic Ferrari said...


Thanks. By the way, your explanation of the way you think ... that's the "B" guy through and through. The idea is that we're taking what we know about a tree ... adding it to what we know about the forest at large, then going back to apply it to the tree. If we apply the same thinking to all the trees and it implies a forest that can't be ... well we've effed up, so it's back to the drawing board. But we've figured some things out from our first kick at the cat, so our next try is probably going to be better.

1/17/2010 12:41 am  
Blogger Vic Ferrari said...


Thanks. I'll go further with this stuff if enough people are into it. As you say, it's a bitch to explain. By the standards of most of the oilogosphere's writers I am incoherent. I like to think 'stream of consciousness' is a writing style, really it's just laziness.

There really is some great insight to be gained from this approach IMO. Jim Albert is a guy I follow. He's a Bayesian statistician, and some of his more advanced work is well beyond what I will ever attempt to understand. His book "Curve Ball" seems to have been almost universally misunderstood by reviewers, but it's terrific. And his published articles are all well worth reading IMO.

Uncommon common sense and madass math skills on that guy.

1/17/2010 1:32 am  
Blogger Vic Ferrari said...


I had you figured for a guy with a stats background.

In any case, to my mind it's not so much about the math skills as it is about the way of thinking. It would be cool if more people in this neighbourhood made an effort to learn the true language of their reasoning (math). But it's understandable that most folks wouldn't want to do that.

1/17/2010 1:37 am  
Blogger MikeP said...

Vic, I suspect that the reason you've never seen A <-> B is because you've seen very few people progress from "I passed high school math and might recognize a quadratic equation" to "I've taken multiple university level maths courses and did well in them." I suspect that's what it would take to bring about such a paradigm shift.

Most people will, as you say, look for reasons to justify the numbers. For example, Patrick O'Sullivan. We went from "he'll shoot from anywhere, he should score a lot" to "snakebitten" and lately to "now he's on a roll, he's got confidence." (I suspect the real answer is "he'll shoot from anywhere, so a lot of his goals are luck, which is not to say he's unskilled.") But sometimes I really do wonder just how good this player is.

1/17/2010 10:04 am  
Blogger R O said...


I do actually have a stats background, specifically when it comes to quantifying uncertainties.

It's the probability side where I come up short - stats and probability are intrinsically related of course but I've never done (for instance) a derivation of the central limit theorem, and I can simulate how a million coin flips will shake out but I couldn't give you the theoretical PDF of a million coin flips. I'm more on the applications side than anything.

1/17/2010 10:14 am  
Blogger mc79hockey said...

My biggest regret in life is probably doing a poli sci/commerce undergrad instead of getting serious about math and hardcore economics stuff. That, along with my dreams of NHL stardom, will be visited on any children I might have.

1/17/2010 1:18 pm  
Blogger spOILer said...

Wowser. Nice post. I originally thought you give the Thinker A types too much credit--that there aren't that many serious thinkers out there that are like that. Thinker A is just how the typical and very common mind works. And I think the issue is more training and socialization than wiring.

And then I remembered how dogmatic those individuals can be.

America's greatest gift to cinema is the invention of noir, which I would typify as a story in a world of capricious fate.

However noir endings are nowhere near as common as movies that attempt a noir universe--audiences really prefer a character in control of his own fate.

1/17/2010 1:19 pm  
Blogger Scott Reynolds said...

Great post Vic. I think you communicated the ideas very well and the shooting percentage stuff is of course classic. I had Andrew Cogliano, Tyler and Traktor bouncing around in my head the whole time I was reading.

1/18/2010 1:10 pm  
Blogger Kent W. said...

Good stuff.

I probably began life as Thinker A. Math has never come all that naturally to me.

Tyler: same here, although my degree did involve some stats courses. Now I'm having to catch up on a lot of this stuff in my spare time.

1/18/2010 2:24 pm  
Blogger Jonathan Willis said...

Nice post.

Speaking as someone who has mostly picked up what statistics he has on his own time, I don't think it's an education thing, at least not a formal education thing.

It's the willingness to fit theory to available facts, rather than to fit facts to a preconceived theory.

1/18/2010 3:07 pm  
Blogger Vic Ferrari said...


I disagree. Thinker B types come from all walks of life. If anything, formal math education may well convert people to type A thinkers (as spOILer suggests).

It seems to me that folks who think like "B" are fairly rare in general. So the chances of getting a "B" involved in serious hockey conversation are slim to begin with. Of those "B"s, odds are very few have a math background, because generally few people do.

1/18/2010 4:29 pm  
Blogger Vic Ferrari said...


Yeah, I agree with all of that.

"B" thinkers are typically called Bayesian Thinkers (until recently that was intended as a derisory tag, thrown out by the frequentist thinkers, the "A"s).

I don't know if there is much to be gained by narrowing the definitions for the pigeonholes.

"B" types generally see things as a series of events, and express them that way. Like a chain of events, or a sequence with branches. There is rarely, if ever, any Bayesian math in there, though.

I posted a link to a study on Harvard's hockey team a while ago, in a post at this site. That was cool stuff to my mind. If that struck a chord with you ... then you see the world a certain way, and that applies to hockey, too.

If you saw hockey as a giant, magic rainbow that washes over the blessed, leaving a light dew of magic on everyone ... well then I'd be envious, but I wouldn't be interested in talking hockey with you.

Anyways, there wasn't application of Bayesian math in the article on the Harvard team, not that I recall; if there was, it would be peripheral. The author, however, his expertise is Bayesian math.

Always seems to be the way.

1/18/2010 4:51 pm  
Blogger Vic Ferrari said...


Yeah, I do have a math background, though much of it has been forgotten since University. If nothing else though, that gives a person the confidence to go to wikipedia and learn something new when the need arises.

Typically I've never bothered to relearn the frequentist math. Occasionally an especially belligerent commenter here or elsewhere will move me to hit wiki and try to understand the math they are trying to apply. Usually I find that it was not possible that they actually understood their own argument.

You follow baseball, it's madass that way. You could set off the fire sprinklers at a sabermetrics convention, and so long as Albert, McCracken and James were out for a smoke ... no "B" thinkers would get wet.

(This by my own definition-by-example of "B" in the original post)

1/18/2010 5:05 pm  
Blogger reoddai said...

This is a very interesting article, but I'm a little confused. I have a feeling that I'm a thinker A. That's because if you ask me such a question, the Thinker A answer looks good.

My problem comes to Thinker B because thinker B is the only one of the two that considers the history of the player and that of the rest of the players in the NHL. Why wouldn't thinker A consider a player's history or league history?

Are you saying that the difference in Thinker A vs Thinker B attributes is that Thinker A takes one year of data and uses it as an average to calculate a normally distributed curve while thinker B looks at a player's history and league wide data to determine an average and then computes a seemingly normally distributed curve?

This sounds less like a Thinker A vs B problem and more like a "wrong first guess" problem.

Still an interesting read.


1/18/2010 7:10 pm  
Blogger Olivier said...

Vic: I don't think "B types" are that rare. But I certainly think the way Hockey is talked about, marketed and sold has been training the casual fan to see B-Think as the antithesis of how Hockey works.

I think the MSM are slowly warming up to this, just as the Baseball MSM warmed up to more coherent stats analysis a few years ago.

I know I had that "wait, what?" moment reading this post from Sisu Hockey (BTW: Jeff, if you read this: we miss you dude!).

Jeebus, Carbonneau was actually slotting Dandenault with Koivu & Higgins while sticking Ryder with Chipchura and Latendresse...

1/18/2010 8:53 pm  
Blogger Hawerchuk said...

Nice post, Vic. Maybe it's that I have H1N1 but I had a hell of a time figuring out how somebody came up with the red distribution.

My day job lends itself to 'type A' thinking - when we have small sample sizes, we know the distribution of important parameters and they are clustered very close to the mean. We intentionally build outliers, know how far away they are from the mean, and match them up to our predicted distribution. And we get beat up by the boss if our predictions are off by more than 10-20%.

It's hard to break from that in situations where "B" would be valuable. So I appreciate your examples of how people would approach the problem from each angle!

1/18/2010 9:50 pm  
Blogger MikeP said...

Vic, I think you're succumbing to your own Type A thinking when you make the assertion you did (formal education leads to it, rather than away from it), although I suppose that's irrelevant to your main point. As you say, type Bs come from all walks of life, but I don't believe that formal maths education (much less any other sort) automatically leads to your Type A thinking any more than I believed firing Craig MacTavish automatically made the Oilers a better team, and if you're really asserting that then I think you should stick to the numbers and leave the Psych 100 stuff at home, because you're not as good at it as your original post suggested you might be.

Olivier, I can get behind that's the way hockey is talked about. All you have to do is to watch a few hockey broadcasts to get a gutful of analysis by hindsight and magical thinking. I give Ron MacLean and Kelly Hrudey full credit for at least trying, but once you get off HNIC (yes, really) actual analysis is pretty difficult to find. And even HNIC has more Mike Milbury type analysis than you can shake a stick at. Now that I think of it, why are the most annoying people behind those desks the ex-GMs? Between Milbury and Doug MacLean, my mute button gets a lot of usage nowadays. Most of us, I suspect, grew up watching a lot of hockey and being exposed to the level of analysis that starts and ends with "what's he done for the last couple of months?" It's difficult to shake that off, especially since it's just plain easier to think that way.

I guess, in a way, you can't blame hockey people, particularly GMs and coaches. It's all well and good to talk about regressing to the mean, but the thought that one's goalie can't possibly be as bad as he's been so far this season or that the team's shooting percentage is suffering because of bad luck but must eventually rise again can't help a coach or GM to sleep at night when his job is on the line. Too many owners and fans just want the team to win now, never mind about anything else.

1/19/2010 4:19 am  
Blogger 美麗公主 said...

Every why has a wherefore.........................................

1/19/2010 6:15 am  
Blogger Tom Awad said...


For people who are willing to understand what you're saying, this is one of the most important posts on data analysis, never mind sports statistics, they will ever read in their life. My hat is off to you. Sure beats "they're finding ways to win".

1/19/2010 9:35 am  
Blogger Sunny Mehta said...

First and foremost, bro, this is an incredibly inspired post. Dense mathematical topic notwithstanding, I can feel the passion and fervor in your writing.

IMO, the separation of thinkers on a more simplistic level goes by "people who think probabilistically" and "people who don't." Generally speaking, most people fall into the trap of being "results oriented", "unable to separate process from result", "finding causality in noise and backwards narrating", etc - however you wanna phrase it. I don't think it's because they're bad people or stupid, we just aren't taught to think that way. It's my hope that in the future, probability will be taught and nurtured at a very early age.

As for your separation of thinkers A and B, yes we do see that often on message boards. I think honestly everyone is fooled by randomness to some extent, the extent just differs by person in terms of degree and area. And there's overlap in your A and B areas too, because even "probability" as we generally define it in society, is basically a frequentist concept. (f/n)

Imo it comes down to seeing the value in viewing things in the world a certain way, and making an attempt to try and do that, starting with basic conceptuality. Even more basically, just being aware of the role of luck. And then further and further, learning the language and formulas to more specifically identify and quantify luck.

On that latter note, Vic you have the background/chops for that, at least relative to most, and you obviously have the patience/desire to wanna share it (I suspect you would never have started this blog otherwise), so my hope is mainly that this doesn't die here, but rather begins. If any reader doesn't understand something someone else says, either due to the math or the writing, I would hope they'd ask and ask again until they do. Because, regardless of how important certain information is, it's useless if not communicated properly.

1/19/2010 11:56 am  
Blogger Sunny Mehta said...

Okay, onto specific questions.

Your whole setup of Thinker A is that he views the imaginary hockey player as being a 15.5% shooter. He realizes the player will not shoot exactly 15.5% next season, but as of now, 15.5% is our best guess at his expected mean. I don't understand then why you say "the likelihood of this rookie getting this wonderful shooting% WHILE ONLY POSSESSING AVERAGE SHOOTING ABILITY." Why would thinker A suddenly start to factor in how the rookie compares to the league average shooting percentage, when the point is that Thinker A does NOT look at the population, but rather views the rookie in isolation?

Also, if Thinker A views the rookie in isolation and considers 15.5% to be his mean, wouldn't his impression of his expectation simply be a bell curve with the middle squarely sitting on 15.5%, and the standard deviation (i.e. spread of the bell curve) fanning out according to the basic formula for binomial probability? (Folks at home who may not get that, anytime you have a specific sample n, in this case 103 EV shots, and a probability p that you consider to be the "true mean", there is simply a basic formula in place that tells you how often, by chance alone, you will get deviations from that.)
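For the folks at home, that basic formula is short enough to sketch in a few lines of Python. The function name is made up for illustration; the formula itself is the standard one for the spread of a binomial rate.

```python
from math import sqrt

def binomial_sd(n, p):
    """Standard deviation of an observed success rate over n trials with true rate p."""
    return sqrt(p * (1 - p) / n)

# The rookie's sample: 103 EV shots, with 15.5% taken as the "true mean".
sd = binomial_sd(103, 0.155)
print(f"one standard deviation on 103 shots: {sd:.1%}")
```

So a guy who really is a 15.5% shooter would land somewhere around 12% to 19% in most 103-shot seasons on luck alone, which is a wide window for one year of data.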

So, does the red histogram look a little skewed rather than totally bell-shaped simply because of how you drew up the classes, or is it because it is actually skewed? If it's the latter, then I think I agree with Gabe that I'm not clear on how and why you derived it.

1/19/2010 12:11 pm  
Blogger Hawerchuk said...


I was confused for a different reason. I read "calculate the likelihood of this rookie getting this wonderful shooting% while only possessing average shooting ability" and didn't see the link to the red distribution, which is the distribution of the player's true talent given that he hit 15.5% one time.

1/19/2010 12:40 pm  
Blogger Sunny Mehta said...


Yeah I'm just confused because the red histogram, as depicted, doesn't look totally bell-shaped to my eye, when in fact it should. (right?) And I also don't get why the latter half of the sentence I quoted is mentioned at that juncture.

1/19/2010 12:47 pm  
Blogger Vic Ferrari said...


Point taken on my grade 11 psychology. It was likely pure crap.

On your last paragraph, the one that starts: "I guess, in a way, you can't blame hockey people, particularly GMs and coaches. It's all well and ..."

I agree entirely, though I think on the whole that high level coaches have a much better grasp of luck in hockey than anyone else. People who track scoring chances for any length of time at all start talking that way. Guys like Dennis and Scott may not use the terms postdictive and predictive, but the thinking is there.

Still, when a coach's ass is on the line ... believing that a player was just unlucky gets harder to do. Especially if the player himself carries a casual attitude about it. Nothing worse than watching some effer be happy go lucky when your nuts are in a vice.

1/19/2010 2:02 pm  
Blogger Hawerchuk said...

Patrick Marleau reaching a career-high in ESG after only 50 games is surely due to 1) reduced pressure on him from not being the captain; 2) the fire that was lit under his ass by constant trade rumors.

1/19/2010 2:19 pm  
Blogger Vic Ferrari said...


Good point. That wasn't explained at all. I've added in a sentence or two there to hopefully make it clear.

Also, the "A" types I'm talking about have never actually done that in conversations that I've been involved with. It is, however, fairly common for someone to bust out: The chances of Cogliano getting 19 or more EV goals on 96 EV shots (or whatever it was) if he was an 8% shooter ... it's .00006, nearly 16000 to 1!

That's foolish for a number of reasons, firstly because he's a forward, and mostly for the reasons I address in the post.
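For reference, this is the sort of tail calculation that gets busted out, sketched in Python. The 19 goals, 96 shots and 8% are the roughly recalled figures from the example above, so the output is only as good as those inputs and need not match the number quoted; the function name is made up.

```python
from math import comb

def chance_of_at_least(goals, shots, ability):
    """Binomial chance of scoring `goals` or more on `shots` at a fixed true ability."""
    return sum(
        comb(shots, k) * ability**k * (1 - ability)**(shots - k)
        for k in range(goals, shots + 1)
    )

tail = chance_of_at_least(19, 96, 0.08)
print(f"chance: {tail:.6f} (roughly 1 in {1 / tail:,.0f})")
```

The math is fine as far as it goes. The trouble is that it only answers "how weird would this season be for an 8% guy?" and says nothing about how many 8% guys versus 12% guys there are in the league to begin with.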

So the "A" guy gave me a tree, and I built the forest that they implied with the red histogram.

1/19/2010 2:23 pm  
Blogger Vic Ferrari said...


Good post, I won't argue. Though I think there are a staggering number of people who can never even get as far as the red histogram. If you take their thinking and build the red histogram for them, they'll will it to be narrower and find ethereal reasons to justify it ("He'll start shooting more so his save% will come down a touch").

The only compelling way to shoot that additional argument down is with the "B" thinking above. Which, as we already established, they won't let their brains follow. So it's pointless. A's with A's, B's with B's ... and the tweeners can do whatever they want.

1/19/2010 2:39 pm  
Blogger Vic Ferrari said...


"So the "A" guy gave me a tree, and I built the forest that they implied with the red histogram."

should read:

So the "A" guy gave me a tree, and I built the forest that they implied ... that's the red histogram.

1/19/2010 2:41 pm  
Blogger Vic Ferrari said...


Wow, thanks.


It should be right skewed, Sunny. I've re-explained it in the original text. Gabe was quite right, there was a total leap over the building of the red histogram.

So you calc the chances of a 10.5% shooter getting exactly 16 goals on 103 shots ... that's about 3.2%, and that's your "10 to 11" bin.

Next you calc the chances of a 11.5% shooter getting exactly 16 goals on 103 shots ... that's 5.1%, and that's your "11 to 12" bin.

On and on for all the bins. Then when you're done, you adjust the bin sizes so that the total equals 100%. (In this case I nicked all the bins by 10% or so iirc).

If you are mad enough to want to do the pure math and build a continuous distribution, take the integral of the binomial equation, that was derived from simple ideas in the linked 'poetry' post.

So the integral of P = (103!/(16!87!))*sp^16*(1-sp)^87

Feel free to mock my "integration with Lego" method above :) . But it's near as dammit, and a helluva lot easier to follow.
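For anyone who does want the pure-math version, it turns out to be less scary than it sounds. Writing p for the shooter's true percentage (my notation, standing in for sp), and assuming every ability from 0 to 100% is treated as equally plausible going in, which is effectively what building the histogram from likelihoods alone does, the integral has a closed form:

```latex
\int_0^1 \binom{103}{16}\, p^{16} (1-p)^{87}\, dp \;=\; \frac{1}{104}

% Rescaling so the total area is 1 gives the continuous version of the histogram:
f(p) \;=\; 104 \binom{103}{16}\, p^{16} (1-p)^{87},
\qquad \text{i.e. a } \mathrm{Beta}(17,\, 88) \text{ density}

% Its peak and its average sit in slightly different places:
\text{mode} = \frac{16}{103} \approx 15.5\%,
\qquad
\text{mean} = \frac{17}{105} \approx 16.2\%
```

The mean sitting a little above the peak is the right skew showing up in the algebra.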

1/19/2010 3:00 pm  
Blogger Sunny Mehta said...

Good edit, Vic. Now I see what you were saying. The red histogram is just a graphical representation of how Thinker A's graph WOULD look if he sat there and dictated to us his "feeling" of every player's expected shooting percentage. And I totally agree with your last comment that the histogram would likely actually be narrower and more f'd up if we did the same exercise with the majority of hockey fans in the world.

FYI, the url you link to in paragraph six is incorrect.

1/19/2010 3:10 pm  
Blogger R O said...


This might be analogous to the birthday problem, which I know a lot of people have problems with.


You're in a room with a bunch of people. How many people have to be in the room for there to be a 50% chance of two of you having the same birthday?

The above, intentionally worded vaguely.

The Thinker A will guess way too many people; after all, birthdays are only 1 in 365 days! Surely the odds of someone hitting the same birthday are small.

The Thinker B will guess a smaller number (23 to be exact), after all a birthday is but 1 in 365 days, plenty of ways for some day to be the birthday of two people at once, even though the odds of it being a specific day (e.g. the thinker B's birthday) are much lower.
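The 23 is easy to check with a few lines of Python. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is made up.

```python
def smallest_group_for_shared_birthday(target=0.5, days=365):
    """Smallest group size where the chance of any shared birthday reaches `target`."""
    p_all_different = 1.0
    people = 0
    while 1 - p_all_different < target:
        people += 1
        # The newest arrival has to miss every birthday already taken.
        p_all_different *= (days - (people - 1)) / days
    return people

print(smallest_group_for_shared_birthday())  # 23
```

The trick is that it counts every possible pair in the room, not just the pairs that include you.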

1/19/2010 5:27 pm  
Blogger Vic Ferrari said...

Thanks Sunny, I fixed it.

As far as moving the conversation forward ... I'm tempted, but I dunno. We're through the looking glass now that the scoring chances have helped us sharpen the ability distributions. But why write about it here? The people who might quantifiably benefit:

A. Professional gamblers, who I specifically do not want to help (no offense).

B. Some NHL Teams.

The Oilers have never visited here as far as I know, nor (granted I check far less than 1% of visitors at either place) and I'm pretty sure this isn't the sort of thing they would embrace.

This afternoon people from two Western Conference NHL teams have visited here. For all I know, they could be security guards or secretaries. Still, who the hell am I doing this for, exactly?

By the by ... I often wondered why any team would go to I mean they have the same data and more, and their own database people, as well as the league system. I thought maybe it was because my shift charts had 'click and drag'.

Turns out there are people from some NHL teams' offices and mail servers doing Zona-style analyses of players, trying to separate the wheat from the chaff. Sweet baby Jesus. I know 99% of Oilers talk radio show callers would consider this idiotic, but thinking Oiler fans should be a little bit concerned. At least they should if these teams are working on a trade with the Oilers.

I'm of two minds on the whole thing, though I'm leaning towards a closed forum and/or avoiding explanations of the really useful stuff. I've been doing the latter for a while. For the scoring chance stuff, I think that has to be closed forum.

I dunno.

1/19/2010 6:16 pm  
Blogger Vic Ferrari said...


Yeah. Forest thinkers and tree thinkers.

Prospect fans are almost universally tree men.

I don't know if you've ever read through the prospect projections at HF, but they are wildly optimistic. I don't know any of these players from Adam, and some of the forecasts may well be accurate. But if they are all right, then every year the prospect pool is ten times more awesome than it has ever been.

They've built the forest tree by tree, and the result is mad shit. A forest that has a one in a gajillion chance of ever existing. Lowetide always tries to bring everyone back to the forest, but he's a rare bird.

The other thing you notice if you ever spend an hour checking the prospect reviews/threads ... a lot of those cats at HF like the kids more than Michael Jackson did. That site should have a clickable icon that automatically sends a red flag to Big Brothers, boy scouts, local soccer associations, etc.

1/19/2010 6:28 pm  
Blogger Sunny Mehta said...


Yeah, I see your point. Either way, this post and discussion was great.

1/19/2010 6:39 pm  
Blogger Olivier said...

Vic: re: team visiting etc...

Smart people understand others might have smart ideas from time to time and thus understand the necessity of keeping an eye on what's happening in their field. Like it or not, when we record scoring chances and then crosstab the results with zonestarts, corsi, whatnot and then do whatnot, we are on their turf.

Btw, are the draggable shift charts activated for the current season?

1/19/2010 7:22 pm  
Blogger Vic Ferrari said...


On the birthday puzzle you bring up; I think it's a pretty good example of how good writing can make a difference, and explains why a guy like Bill James is such an effective writer, but may seem inconsistent in his approach.

Going by his other work, I'd guess that he would think of it as a sequence ... as if every person who entered a room was without a birthday, and was assigned one as they entered the room by blindly drawing a numbered ping pong ball out of a bag, writing their name on it, then returning it to the bag. Then it's just a sequence of events, branching off into parallel universes.

What he would write, though, is probably something like:

"I generated a random number between 1 and 365. I did this thirty times to represent 30 people with random birthdays. Then I checked to see if there were any duplicate numbers, they would represent matching birthdays in the group.

I did this 10000 times and there were 7075 occasions when more than one person had the same birthday. So I tried using fewer people and repeated the process.

When I used 23 people with random birthdays, 5023 of 10000 trials produced cases with at least one matching birthday. Just a bit over 50%.

So the answer to R O's puzzle is 23"

He'd probably mention that it took him 15 minutes, start to finish, to figure that out. And he might say that you could do the same but factor in leap year, but he didn't plan to bother.

He would also word it better than I did just here. Still, even with my relatively poor writing skills, almost everybody is going to follow the reasoning and agree that 23 is the right answer. There just isn't any room to argue.
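
The procedure in that imagined write-up is easy to reproduce. A minimal sketch, assuming 365 equally likely birthdays; the function names and the fixed seed are choices made for illustration, and the exact counts will differ from the 7075 and 5023 quoted above depending on the random draws.

```python
import random

def has_match(people, rng, days=365):
    """Draw a random birthday for each person and report any duplicate."""
    birthdays = [rng.randint(1, days) for _ in range(people)]
    return len(set(birthdays)) < people

def match_rate(people, trials=10_000, seed=0):
    """Fraction of trials in which at least two people share a birthday."""
    rng = random.Random(seed)
    return sum(has_match(people, rng) for _ in range(trials)) / trials

for people in (30, 23):
    print(people, match_rate(people))
```

Run with 30 people the rate should come out near 70%, and with 23 people just over 50%, give or take the noise you would expect from 10000 trials.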

1/19/2010 7:58 pm  
Blogger Vic Ferrari said...


I don't know how long that's been down, but it seems to be working again for now. The server didn't want an array initialized so I skipped those few lines of code. Hopefully it still gives the right results.

0809 seems to have stopped working as well, and for the same reason. 0708 carried on unaffected with the same bit of code intact. Go figure.

I'll apply the same 'fix' to 0809 in a minute.

1/19/2010 8:16 pm  
Blogger Hawerchuk said...

Watching the Kings telecast of the Kings-Sharks game. The announcers agreed that Marleau is "on fire" due to no longer being the captain.

1/19/2010 10:53 pm  
Blogger Olivier said...

Gabriel: do you think that means the habs, having no captain yet, are about to stop sucking? I also heard the Oilers captain has an attitude problem; maybe they should demote him?

I think we are onto something! Progress people! Progress!

1/19/2010 11:05 pm  
Blogger Kent W. said...

I'm leaning towards a closed forum

That's an interesting idea.

1/21/2010 12:14 pm  
Blogger Triumph said...

Vic -

Great article - I think it was over on Tyler's blog that you pointed out that humans generally aren't inclined towards 'B' thinking because it runs counter to our experience - Sunny basically echoes that above.

However, I don't think you can alternately point out the general lack of 'B' thinkers across all fields, then be surprised that you're getting hits from NHL team front offices. As you well know, most of the top sabermetrics people in baseball either are employed by a team, or have been in the past.

1/22/2010 8:24 am  
Blogger kinger said...

into the trap of being "results oriented", "unable to separate process from result", "finding causality in noise and backwards narrating", etc - however you wanna phrase it.

It's almost as if you prefaced that paragraph with "Frequent Oilogosphere commentator Bruce often falls"

1/25/2010 10:41 pm  
Blogger Jibblescribbits said...


You really should read "The Drunkards Walk; how Randomness affects our lives"

($10 on Amazon)

It pretty much goes about explaining what you call "Type B". It's a great book, that I'm sure you would enjoy.

Good stuff.

1/25/2010 11:16 pm  
Blogger R O said...

You really should read "The Drunkards Walk; how Randomness affects our lives"

Drunkards walk. Hmm that's usually how the random walk process is described to undergrads, a prime example of how chance can look a lot like something tangible. You should read that book yourself Jib.



I don't follow baseball much but that explanation of the birthday problem would convince some disbelievers. Not all though, you know the type of person that would tell you where to go, even in the face of overwhelmingly logical logic.

About prospect followers: not just prospects, NHL rosters too. Damn if I've seen a roster constructed with goals from every player in brackets once, I've seen it a thousand times.

The best is defensive depth charts sorted by goals, that is the kind of comedy you just can't make up.

1/25/2010 11:25 pm  
Blogger Jibblescribbits said...

Drunkards walk. Hmm that's usually how the random walk process is described to undergrads, a prime example of how chance can look a lot like something tangible. You should read that book yourself Jib.

I've read it twice. I've also taken about 7 or 8 classes on statistics and math methods being as I have a master's degree in Physics and all. I don't appreciate the condescending snark.

1/27/2010 12:58 pm  
Blogger Sunny Mehta said...

After pondering this post a little more, here are a couple more comments/questions.

1) Vic, I reviewed your comment where you explain how you got the red histogram. I think I was right in presuming that you just took 15.5%, used that as your expected mean, and built a basic binomial distribution around it based on the standard formula. However, I think I was wrong in assuming it should be bell shaped because, after referring back to a textbook, I realized that technically a binomial distribution is slightly skewed when p is significantly greater or less than .5. Most of the time we dealt with this distribution in stats class we were told to just use the normal approximation of it because it's close enough. (Not sure if that's right or wrong.)

2) Once we've drawn up a hypothetical ability distribution to then test by putting through Bernoulli trials, I don't understand why you'd say we can take ANY BLOCK from each individual player's distribution and use that as our p for his weighted die in the Bernoulli trials. Wouldn't that completely shift each player's expected mean, and therefore each player's individual distribution, and therefore the whole distribution?

For example, if Henrik Sedin is assigned a 13% Sh% in our hypothetical ability distribution, ok fine, when we put him through a bunch of sims he's gonna have seasons encompassing the range of his 99 percent confidence intervals or whatever. (Say, around 7% in his shittiest seasons and 19% in his top percentile seasons.) And ditto for all other players and their own expected mean and according confidence intervals.

But if you just take ONE ARBITRARY BLOCK from Sedin's distribution, say the 8% block, and use that as his weighted die, aren't you essentially reassigning his mean to 8%? And now he's gonna have 99 percent confidence intervals from like 3% to 13%, which isn't right at all.

Do you get what I'm saying? Am I misunderstanding what you meant?

1/30/2010 3:26 am  
Blogger Vic Ferrari said...


Sabermetrics is the fear, not the aspiration. I've been reading quite a bit of that stuff in recent weeks, it's a mess. It's an institution now, you dare not criticize it, but surely a lot of people sense that the foundations are built on sand. I suppose that when the sabermetrics guys started, they kicked at the 'saw him good guys'. I dunno.

Only ever use MLB sabermetrics for wagering unless you are in full-on Costanza opposite mode. Those sub 3% holds on gamelines with a clear favourite ... that's not coincidence, the gaming houses aren't taking a loss. The opposite.

It's a perfect storm, convincing and well meaning people have created something magic that they couldn't possibly understand. Best that they stay that way.

1/30/2010 3:43 am  
Blogger Vic Ferrari said...

I think I follow, sunny. My point was that in the simulation you are randomly taking a block from each player's distribution ... in essence that is what nature is doing.

So for Sedin, a random block is a lot more likely to have come from the 13% stack than from the 7% stack or the 20% stack ... just because there were more legos in the 13% column to start with.

Makes sense, no?

The thing is, some fuckers are bound to have bad luck, and others good luck, in the right measure. And a bunch more about spot on what they deserve. And some a bit lucky or a bit unlucky. Again all in the right measure.

That's why the spread of results is always bound to be wider than the spread of ability. Not only bound to be wider, but in a predictable way. We can never say who will be lucky, but we can say how many will be lucky. As we both know, that's the art.
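
A rough sketch of that claim. Everything numeric here is an assumption picked for illustration: 2000 players, 103 shots each, and true abilities drawn from a normal curve around 9% with a 2% spread. It simulates one season per player and compares the spread of results with the spread of ability, then checks the result against the standard variance decomposition (ability variance plus average binomial noise).

```python
import random
from statistics import mean, pstdev, pvariance

rng = random.Random(1)
SHOTS = 103       # assumed shots per player, borrowed from the rookie example
PLAYERS = 2000    # hypothetical league size, large to keep the noise down

# Assumed ability distribution, clipped to stay between 1% and 25%.
abilities = [min(0.25, max(0.01, rng.gauss(0.09, 0.02))) for _ in range(PLAYERS)]

def simulate_season(ability):
    """Roll the weighted die SHOTS times and return the shooting percentage."""
    goals = sum(rng.random() < ability for _ in range(SHOTS))
    return goals / SHOTS

results = [simulate_season(a) for a in abilities]

# Predicted spread: variance of ability plus the average luck variance.
luck = mean(a * (1 - a) for a in abilities) / SHOTS
predicted = (pvariance(abilities) + luck) ** 0.5

print(f"spread of ability:   {pstdev(abilities):.4f}")
print(f"spread of results:   {pstdev(results):.4f}")
print(f"predicted from math: {predicted:.4f}")
```

The results spread should come out clearly wider than the ability spread and close to the predicted figure, which is the "predictable way" part.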

1/30/2010 3:55 am  
Blogger Sunny Mehta said...

Ok I had a hunch that's what you actually meant, but then FYI I think this paragraph is written incorrectly:

Now, let's simulate a season. If you take one lego block, at random, from the histogram for each player (let's say you pulled a 10% lego block for the rookie) now grab a die weighted to 10% and roll it 103 times (the number of shots he had). Make sure to record the number of sixes you roll, mark that down as his simulated goals for that season, and figure out his shooting% as well. Now do the same for every other player ... plot out the results. Voila! The same as the spread of results for an actual NHL season.

We should either be taking one block from each player's histogram and simply pasting that onto our new master histogram, or we should be taking each player's expected mean Sh% and rolling his die (weighted to that mean) to record his simulated goals to paste onto our master histogram. We should NOT be taking one random block from a player's histogram and using THAT number to then weight a die and simulate goals. Ya dig?

Btw, I have been thinking recently about exactly your last comment that the spread of results will always be wider than the spread of ability. Reason: It would be basically impossible for every player in the NHL to have a season where they ALL shot at their exact expected mean Sh%.

It'd be like if we had a dartboard with 20 evenly spaced squares, and we closed our eyes and randomly threw 20 darts at it. While over the course of infinity we expect each square to average one dart, it would be basically impossible in a single throwing of 20 darts to have each square contain exactly one dart.
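
To put a number on the dartboard example, assuming each dart lands in any of the 20 squares with equal probability and independently of the others: there are 20! ways to put exactly one dart in each square, out of 20^20 equally likely outcomes.

```python
from math import factorial

squares = darts = 20

# Arrangements with one dart per square, divided by all possible arrangements.
p_one_each = factorial(darts) / squares**darts
print(f"{p_one_each:.2e}")   # prints: 2.32e-08
```

That is roughly one chance in 43 million, so "basically impossible" is fair.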

1/30/2010 10:04 am  
Blogger Vic Ferrari said...

I'm not sure I follow, Sunny.

The histogram for the player is the likelihood of his abilities (and only as accurate as our reasoning/model and raw data).

So if we estimate that a player named Sven, for example, has a 25% chance of having 12-13% shooting ability based on last year's shooting ... we're also saying that there is a 75% chance that he doesn't. So if you took every player's mode ability and simulated a season ... you'd end up with a bit narrower distribution for the population.

It would be fairer to Sven, but not to the universe.

It probably wouldn't matter much at all if you had enough history on a player to narrow down your estimate of his ability ... of course that brings in aging/injury issues as well.

Another way to do it is to break Sven into 100 parts (or a thousand, whatever) ... then 15 Svens would be a 11.5% shooter, 25 Svens would be a 12.5% shooter, 20 Svens would be a 13.5% shooter, etc.

Then have each one of those Svens go through a simulated season ... do the same for every player ... you'll end up with something awfully close to the population distribution of SH% for forwards.
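
A sketch of the 100 Svens idea, with loud assumptions: only the 11.5%, 12.5% and 13.5% counts come from the comment above, the other bins are invented so the total reaches 100, and the 103 shots per season and 50 repeat seasons are picked just to keep the estimate steady. It compares sending every Sven through a season at his own ability against sending them all through at the mode.

```python
import random
from statistics import pstdev

rng = random.Random(2)
SHOTS = 103     # assumed shots per season
SEASONS = 50    # repeats per Sven, only to steady the estimate

# 100 Svens: ability -> how many Svens have it (partly hypothetical, see above).
svens = {0.095: 5, 0.105: 10, 0.115: 15, 0.125: 25,
         0.135: 20, 0.145: 13, 0.155: 8, 0.165: 4}

def season(ability):
    """One simulated season: shooting percentage on SHOTS weighted rolls."""
    return sum(rng.random() < ability for _ in range(SHOTS)) / SHOTS

mode = max(svens, key=svens.get)

spread_out = [season(a) for a, n in svens.items() for _ in range(n * SEASONS)]
mode_only = [season(mode) for _ in range(100 * SEASONS)]

print(f"every Sven at his own ability: {pstdev(spread_out):.4f}")
print(f"every Sven at the mode ({mode:.1%}): {pstdev(mode_only):.4f}")
```

The mode-only version should come out a bit narrower, which is the point made above about being fair to Sven but not to the universe.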

1/30/2010 6:36 pm  
Blogger Pete. said...

This blog is pretty excellent. I've got no understanding of or background in this stuff at all (I'm an artsy type), but it's all explained clearly, and if I read it eight times everything suddenly becomes clear.

A vs. B thinkers: there may be some of us who appear to be type A, but are in fact intellectually lazy B thinkers. You give me the A thinker's conclusions and I'll think "there's something seriously off about that", but can't be bothered to figure out why. That's problematic, not in terms of hockey (who cares really) but in terms of day to day life. Gotta fix that.

I don't intend to post here again, because I have nothing of value to add, and would like this to remain a noise-free zone. But I'll continue to read your posts, and learn something worthwhile. Good work.

1/31/2010 6:12 pm  