Monday, June 08, 2009

Real Effects and Team Shooting Percentage at Even Strength

'Real Effects' is a term I'm stealing from Bayesian mathematicians. The idea is that if an element of nature is affected by something other than randomness, it should sustain itself from one independent sample to another.

So if we look at MLB baseball players and their ability to hit home runs, as a percentage of balls that they put in play ... well the Real Effect is enormous. There is a strong association from one set of 50 random baseball games in a season to 50 other random games from the same season. Using Pearson correlation as a convenient way of measuring that relationship, it's around 0.9.

Of course randomness still has its say, but it's a starting point, and tells us that it is possible to separate the ability distribution from the luck distribution for this element of nature. There are no Real Effects for 'players who hit better on natural grass', or 'RH batter vs LH or RH pitcher' (obviously players do a lot better against opposite hand pitchers, just that there is no evidence to suggest that, in the general population, some guys do it better than others). For that matter there is no real effect for the majority of the stats recorded at Though surprisingly there is for most measures of clutchness in baseball.

In this case I'm looking at team EV shooting% for the 08/09 season, inspired by Jlikens' terrific work on the subject over the past season.

Jlikens looked at the distribution of EVshooting% for the 08/09 season, this versus what we would expect the random distribution to look like, and found that there was very little in it. The overwhelming majority of the differences between teams' EVshooting% was accounted for by luck alone. Surprising stuff, even for the cold hearted folks around this part of the internet.
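For a rough sense of what "the random distribution" means here: Jlikens' exact method isn't reproduced in this post, but under the simplest assumption (every team has the same true shooting percentage p, and each team takes n shots, both symbols being mine for illustration) the spread you would expect from luck alone is the binomial standard deviation of a proportion:

```latex
\sigma_{\text{luck}} = \sqrt{\frac{p\,(1-p)}{n}}
```

If the observed standard deviation of team EVshooting% is about the same size as this, there is very little left over to attribute to ability.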

Of course with just 30 teams in play, it's possible that this distribution occurred by chance. It's like if we were assessing the fairness of a coin flipping contest, where we knew that the coins were weighted differently. If we just looked at one 'season' of coin flipping, it is possible that by chance alone, the spread of results would appear to be the same as that of fair coins. Just because a few of the guys with good coins happened to have a spot of bad luck flipping, and the opposite for some of the guys with poor coins.

So, I thought I would check for real effects with team EVshooting% by upping the sample sizes. For the 08/09 season I randomly selected 19 home games and 19 road games. Then I just took the shooting% for when the score was tied and both goalies were in their nets, this to eliminate the goals from blowout games.

Then, of the remaining 22 home games I randomly selected 19 of them and did the same as above, grabbed the EV goals and EV shots from the tied-game state. Same for the remaining road games.

Then I checked for real effects using Pearson correlation.

Then I did the same thing 1000 times.

And while I was at it I also looked for the real effects of shot ratios and EVsave% while the score was tied.
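The procedure above can be sketched in a few lines. This is not the actual script, just a minimal illustration run on made-up data: the team "talents", the flat 10 tied-state shots per game, the 200 repetitions and all the function names are assumptions of mine. It builds one luck-only league and one league with a wide spread of true finishing ability, then averages the split-half Pearson correlation for each.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def make_season(rng, talents, games=41, shots_per_game=10):
    """Synthetic season: one (goals, shots) pair per game, home and road.
    Each shot scores with the team's true probability (an assumption)."""
    season = {}
    for team, p in enumerate(talents):
        season[team] = {
            venue: [(sum(rng.random() < p for _ in range(shots_per_game)),
                     shots_per_game) for _ in range(games)]
            for venue in ("home", "road")
        }
    return season

def split_half_r(season, rng, per_half=19):
    """Pick 19 home + 19 road games, then 19 + 19 of the games left over,
    and correlate team shooting% between the two disjoint samples."""
    first, second = [], []
    for venues in season.values():
        g1 = s1 = g2 = s2 = 0
        for games in venues.values():
            # One draw of 38 games in random order gives two disjoint sets of 19.
            picked = rng.sample(games, 2 * per_half)
            for g, s in picked[:per_half]:
                g1 += g
                s1 += s
            for g, s in picked[per_half:]:
                g2 += g
                s2 += s
        first.append(g1 / s1)
        second.append(g2 / s2)
    return pearson(first, second)

def real_effect(season, rng, reps=200):
    """Average the split-half correlation over many random splits."""
    return mean(split_half_r(season, rng) for _ in range(reps))

rng = random.Random(2009)
luck_only = make_season(rng, [0.08] * 30)                             # no ability gap
with_skill = make_season(rng, [0.03 + 0.10 * i / 29 for i in range(30)])  # wide gap
r_luck = real_effect(luck_only, rng)
r_skill = real_effect(with_skill, rng)
print(f"luck only: {r_luck:.2f}   with skill: {r_skill:.2f}")
```

With real data you would swap `make_season` for the actual per-game tied-state goals and shots; everything else stays the same.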

The script that does it is here, and it will take about a minute to run. And every time it will use different random selections of games, each of these will be listed in the second table, but 1000 samples seems to be enough that the averaged result always ends up about the same:

Real Effects:

EVshooting%: 0.00
EVsave%: 0.14
SF/SA: 0.68

Jlikens is right. Damn, who woulda thunk it?

I doubt that this is the case for the NHL in previous eras, especially the 1980s and around 2000. And clearly some players have a better ability to finish, though as MC79hockey has shown us repeatedly using similar reasoning, that's much smaller than most of us realized intuitively.

And surely this will be the same next season. In fact if we decide right now who are the five best EV goalies in the league, based on EVsave% over the past few years, and eliminate them from next year's test ... then I suspect that we'll see the real effects for goaltending EVsave% fall to almost zero as well. Again, I don't think this was the case 20 years ago, but there are an awful lot of good goalies in the NHL now, and precious few real difference makers in that position.

In conclusion:
1. Possession may not be everything, but in the big picture it's damn close to it. The Oilers need players who drive the territorial advantage. It would help if Pisani regained his old form as well, there was a time when the play just never seemed to die with the guy. And the forwards that the Oilers do have, they need to be willing to play more below the other team's goal line and finish their shifts more responsibly as well.
2. Unless you're getting an established star goalie, it doesn't make a lot of sense to spend a lot on a second tier guy, either in cap space or trade assets. Because the gap between 2nd tier and third tier appears tiny.
3. Guys coming off of poor percentage years, but with good underlying numbers, and who play a solid game ... these should come at reasonable rates. They should be players that the Oilers buy, not sell. (SEE Reasoner, Stoll and Torres from last summer).


Blogger said...

Awesome stuff.

Question: in your experience from running a lot of these correlation experiments, have you found any really high correlations anywhere? Like, along the .9 lines of HR rate? If so, where? Exactly how predictive is the corsi and zoneshift stuff you always refer to? Also, in your opinion, even if we can't measure it right now due to lack of data, where do you think we'd find super high correlations? Time On Attack or something like that?

6/08/2009 8:22 pm  
Blogger said...

Also, just so this is clear...

You took 38 random games (made up of exactly 19 random home games and 19 random away games), looked only at the games when they were tied, and compared that whole sample to a sample of 38 different tie games (19 random home + 19 random away) from the same season?

And the correlation of EV S% was zero? Would it be easy for you to run this type of thing for PP S%,SV% and SF/SA? It would probably also be close to zero?

The thing I'm thinking is this - the average sample in your study is, what, like 400 shots?

I wonder how many shots it would take for EV S% to stabilize? The problem is obv in testing it. If we use subsequent years we are sampling from different distributions (although that's somewhat true to a lesser extent even in one season). Nevertheless, what if we used two (or even three or four) subsequent seasons and did the same kind of split-reliability thing?

6/09/2009 10:00 am  
Blogger Vic Ferrari said...

Sunny, the importance of correlation is just to see if there are real effects there. This isn't a step in the linear regression process, since none of these things are likely to lend themselves to that.

It does tell us that, right now in the NHL, the difference in ability between teams to bury their shots (EVshooting% when tied) is virtually nil. I probably should have used "when the score is within one goal" just to up the number of total shots a bit. It's not going to matter though, getting rid of the blowout portion of games and the empty netters is the main thing.

So in trivia craps with 30 players we know that the ability to answer trivia is the only thing that is linked to ability. Because we designed the game and the dice are fair.

And if we assign trivia answering abilities of the 30 players based on Corsi% for 30 NHL teams in 07/08:


The correlation of trivia question win% from 6 games onto the remaining 76 is 0.81.

In the league that year the correlation of corsi% from 6 games onto the remaining 76 is 0.77.


The predictive value of trivia win% over six games, predicting overall trivia craps winnings in the remaining 76 games: 0.61.

Corsi% in six games predicting EVgoals% in remaining 76 games: 0.58.


The reason trivia craps, intended just as a starting point for modelling EV NHL hockey, worked so well is because, as we now know, EVshooting% in the current era is just noise, and EVsave% ability varies little between teams, and that .17 average association shown in this post ... we now know that to be mostly the result of skewness of the distribution, not a wider spread.

In short, it's all about the model. The correlation is only an arrow in the right direction and a measure of association for data generated by a model and its real life counterpart.


If you just go around poking shit with a pointy linear regression stick, you're going to come up with some madass conclusions. And if you separate luck from ability assuming that all talent in every subset of society is distributed in Gaussian fashion (see the link provided by a commenter on MC79hockey yesterday, to a sabermetric methodology) that's demonstrably wrong. In some cases it may work okay, in others it will be a mess.

6/10/2009 11:02 am  
Blogger Vic Ferrari said...

To further illustrate:

Over six randomly chosen games at EV:

Correlation of PDO# to EVgoal% (GF/(GF+GA)): .87
Correlation of Corsi% to EVgoal%: .20

Let's assume that all distributions are normal, shall we?

PDO# is responsible for (r^2) 76% of results:
Corsi% is responsible for 4% of results.

Spot the problem?

And if I used Corsi+ and EVshooting% for goals-for. And the opposite for goals against ... the totals would add up to very nearly 100%.

And it's not because the big picture sabermetric ideas "don't work well in hockey", it's that they don't work well in baseball either. Because the world is round. I'm not interested in the micro view stuff, the "Pitcher X is starting too many batters fastball-fastball-curve ... he's too predictable!" ... I don't doubt that stuff is very good, just that I don't follow any team in baseball closely enough to care. Plus it's not what I'm interested in with MLB.

I wrote a post a little while ago, one that hopefully someone like Matt, Rivers or Jlikens writes sequels to, it explained what a binomial probability was. The title was 'The Poetry of Logical Ideas', and to me that's what math should be. It's the language used to express reason.

Math should NOT be a big fucking hammer with Linear Regression stamped on it. And people expecting to beat the truth out of elements of life by wailing on them ferociously with a BFLR hammer ... well, it's probably not going to work very often.

6/10/2009 11:24 am  
Blogger JLikens said...

This is really interesting stuff, Vic.

I've obviously done a bit of work in this area, but nothing as definitive as this.

The fact that you've averaged the correlation over 1000 samples is critical and, as far as I'm concerned, essentially confirms that none of the team-to-team variance in EV S% when the score is tied is due to ability.

Interestingly, I was recently looking at how repeatable EV shooting stats with the score tied were from one season to the next, comparing 07-08 to 08-09.

The correlations were quite similar to those obtained by yourself.

Shot differential: 0.68
Save percentage: 0.19
Shooting percentage: 0.00

6/10/2009 9:04 pm  
Blogger said...

This is some deep shit - I'm still trying to wrap my head around all of it.

Vic, I read your article about the Binomial Formula. Here's my question - the Bernoulli trials are yes/no trials assuming a constant probability. Intuitively that doesn't seem very likely to be the case for shots on goal. I mean, come on guys, surely every shot doesn't have the same probability of going in - we can acknowledge that? No? Just 'cause we can't measure it accurately yet doesn't mean it doesn't exist.

If we run the Rangers' 133 ES goals on 2055 shots through Bernoulli trials assuming a constant .08 rate, we get a probability of .5% that they'd end up with that shooting percentage.
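For anyone who wants to check that kind of number, here is a small sketch of the calculation. I'm assuming the figure quoted is the chance of scoring 133 goals or fewer (the lower tail), which the comment doesn't spell out; the function names are mine. It works in log space so the big binomial coefficients don't overflow.

```python
from math import lgamma, log, exp

def binom_pmf(k, n, p):
    """P(exactly k successes in n Bernoulli trials), computed via logs."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def binom_cdf(k, n, p):
    """P(k or fewer successes in n trials)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Rangers: 133 ES goals on 2055 shots, against a constant 8% rate.
p_low = binom_cdf(133, 2055, 0.08)
print(f"P(133 or fewer goals on 2055 shots at 8%) = {p_low:.4f}")
```

The same two functions work for any team's goals and shots, including the tied-score totals rather than overall EV numbers.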

Perhaps I happen to have chosen an outlier. Perhaps the blowout games are influencing it (I'm dubious how much, particularly in light of the team in question).

Look, I definitely see the point that randomness plays WAY bigger of a role than people think, on both shooting percentage and therefore goal scoring - I've certainly thought that myself for a while, and I've largely bet on it for a while. And I think Vic's point that shots are closer to being Bernoulli trials than most people would assume is a great point. I don't know, like I said, I'm still trying to soak all this up.

6/11/2009 10:43 am  
Blogger said...

Eh, i think my last post missed the point. Like i said, i'm still pondering over all this.

What do y'all think about the following...

If we only think about goals as being derived from shots, we are essentially backing ourselves into a corner because we ARE making them no different than Bernoulli trials. Why? Because it's always a binary outcome. You're limited to a certain (small) standard deviation because every outcome is 0 or 1.

But what if we graphed goals by something OTHER than shots? Maybe that will get us somewhere?

Because... I don't know, maybe it's just 3 in the morning and I've lost my mind, but intuitively something feels fucking WRONG about saying goals are simply the product of shots. They're not! I know it, you know it. We just know it. They're the product of something else. Yes, by our paltry definition a "shot" is necessary for a goal, but dammit, goals come from SOMETHING ELSE.

What is that something else? Lol, I don't know. That's what we gotta figure out.

Zone time? Possession? Possession in a certain zone? Possession in a certain zone that's preceded by certain particular events? Possession in a certain zone that's preceded by certain particular events and succeeded by other certain particular events and blah blah blah - you get the point.

6/12/2009 2:26 am  
Blogger Vic Ferrari said...

Sunny, you've made some good points, and I'll address them one by one as best I can. If I'm off at all in my answers hopefully someone else chimes in to correct me.

It will take a while though, it took me this long just to find the motivation to reread my own two stream-of-consciousness comments above, and your replies to them.

* On "not all shots are created equally" ... absolutely true. The correct distribution would be multinomial, and the correct conjugate is a Dirichlet distribution. (i.e. the most likely distribution of shooting talent).

And you're into some extraordinarily complicated math. (For all my eye rolling at the overly simplistic and misleading sabermetric treatments of "luck" in my comment above, there have apparently been a couple of exhaustive papers written using this multinomial/Dirichlet procedure, though I have yet to read them).

The thing is, if you watch a few games and tally shots into "probability of going in", just by your sense of it, and not including how good the shot was ... just the quality of the scoring chance, then only counting it if it ended up being 'on goal':

So break them into ~0%, 0-5%, 5-10%, 10-15%, 15-20%, 20-25%, 25%+ ...
You will be building a multinomial distribution. Thing is, in the case of hockey shots at evens, it will end up virtually identical to the binomial distribution if you plot it out.

At least that was the case last time I checked, granted that was years ago, but hockey is still just hockey.
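One way to see why the two end up so close, as a rough sketch only: the bucket shares and scoring chances below are invented for illustration, not tallied from any games. If a team takes a fixed number of shots from each bucket, the variance of its goal total is the sum of p(1-p) over the shots, which is never larger than the plain binomial variance at the average rate.

```python
# Hypothetical mix of shot quality: (share of shots, chance of scoring).
# These numbers are made up for illustration.
buckets = [
    (0.30, 0.01), (0.30, 0.04), (0.20, 0.08), (0.10, 0.13),
    (0.05, 0.18), (0.03, 0.23), (0.02, 0.30),
]

total_share = sum(w for w, _ in buckets)
p_bar = sum(w * p for w, p in buckets)            # overall shooting%

# Per-shot variance when every shot has its own probability...
var_mixed = sum(w * p * (1 - p) for w, p in buckets)
# ...versus treating every shot as an identical Bernoulli trial.
var_binomial = p_bar * (1 - p_bar)

sd_ratio = (var_mixed / var_binomial) ** 0.5
print(f"average shooting%: {p_bar:.3f}")
print(f"SD of goals, mixed vs binomial: {sd_ratio:.3f}")
```

With a mix like this the standard deviation comes out within a few percent of the binomial one. And that is the extreme case: if the type of each shot is itself random, drawn from the same mix, every shot is just a Bernoulli trial at the average rate and the goal total is exactly binomial.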

I mean apparently they were tracking corsi (shots directed at net) in the East German League in the 60s, perhaps earlier. At some point the East German government decided to stop funding the league and it fell by the wayside, playing on outdoor rinks with mostly unpaid players, etc.

Still, I would bet that the corsi numbers related to shots, goals and scoring chances the same then and there as they do in the NHL now. More or less anyways. It's just hockey. And that certainly seems to be the case with the small sample size of the 1966 WHC's in East Germany.

6/13/2009 10:53 am  
Blogger Vic Ferrari said...


On the thing using NYR shooting percentage and binomial probability:

That's absolutely the line of thinking. Of course you have to look at all 30 teams to get a fairer picture. And of course even 30 dice rollers will have something visually quite different than a binomial spread of results over 2000 rolls each. But they'll be in range. And the distribution built up from countless trials with random 500 roll subsets will (I think).

In any case the numbers you are using are for the overall EV shooting% I think. The data you should look at is here at
that's a big table of data used for this post.

is a user friendly version of the binomial calculator.

That will change things dramatically.

JLikens' comparison of the spread of actual-vs-expected when leading/trailing shows a significant difference though. So you'll find these (1 in 2000 chance of happening by coincidence!) things several times then. Or so I would think. It doesn't take too much of a shift in the shape of the expected curve to bring that in line.

But since there are countless possibilities of team shooting ability distribution ... it's not an easy nut to crack.

6/13/2009 11:04 am  
Blogger said...


a few more thoughts...

vic, let's say we wanna know what distribution shape the binomial formula takes on, so we graph coin flips over a million trials. as i understand it, the binomial distribution essentially BECOMES a normal distribution after enough trials. i.e. - they are the same thing. ??? am i missing something, because i thought one of your hangups was that you dislike the normal distribution assumption, yet you seem to like the binomial formula?

on that note, ok, as you guys saw from the email i sent you, i've been graphing a few things, including Corsi. the shape sure as shit doesn't look like a normal distribution to me. of course, that in and of itself could be dumb luck, however, it looks exactly how we'd expect it to look like - i.e. the left side of the bell curve is cut off. that of course makes sense: if your performance starts to slip too far below average, you no longer get to play in the nhl.

so, but vic, i don't get where your binomial thing fits in. are you saying that a) shooting percentage IS actually normally distributed?, or b) you use the binomial formula to see what a normal distribution would look like, then you COMPARE to the observed distribution to see if they differ. (and your hunch is that they differ)

now, humor me - let's assume that in fact we're dealing with a non-normal distribution. at that point we can throw Standard Deviation, Pearson Correlations, Confidence Intervals, etc all out the window because they don't mean a fucking thing to us.

but now how do we measure luck? i.e. - if our distribution were normal, and league average shooting percentage was 8.5 percent, and some player was shooting well above that, we could say "ok, look how many SD's this guy is off. that's very unlikely to be luck. particularly given everyone's subjective assessment that he's awesome."

but we can't do that with a non-normal distribution! so how do we figure out what the probability is of so-and-so's abnormal S% being skill vs. luck?

any of this making sense?

6/15/2009 11:40 pm  
Blogger Vic Ferrari said...


The luck distribution is binomial, or normal, or near as dammit. Depends on the model, but there isn't much to argue about there.
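To spell out the "binomial, or normal, or near as dammit" part (standard textbook result; X, n and p are just my labels for goals, shots and the chance of scoring on each shot):

```latex
X \sim \mathrm{Binomial}(n, p), \qquad \mathbb{E}[X] = np, \qquad \mathrm{Var}(X) = np(1-p)

X \approx \mathcal{N}\bigl(np,\; np(1-p)\bigr) \quad \text{for large } n
```

So with a fixed p the luck in a goal total is binomial, and with enough shots that is nearly indistinguishable from a normal curve with the same mean and variance.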

The problem comes when people make the assumption that ability is distributed normally (or binomially). Because while it would be extremely convenient if that was the case, it's pretty damn unlikely to happen that way in the world as we know it.

Model building is good, that's my message.

7/06/2009 7:55 pm  
