Thursday, April 30, 2009

Scoring Chances: Part I of Many

For those who don't know, Dennis King tracked scoring chances for the Oilers this past season and posted his results at mc79hockey.com after the games. This was outstanding work, and many others and I will surely be using this information for weeks to come.

Scott at Gospel of Hockey compiled this data on a few occasions; I'm using his data from mid-season and the final sums in this post. This link will generate the NHL.com player stats for the games in question, automatically skipping over the handful of games that Dennis missed. As usual, change the values in the URL to look at different segments of the season.

So to pick a big apple from the bottom of the tree, I'm taking a look at the effect of players on the quality of the scoring chance on the Oiler goal. Dennis used a binary system, meaning he either marked down an opportunity as a scoring chance or not at all; there was no grading of the quality of the scoring chance. This is the way that every team tracks them, as far as I know, and I think that this is wise.

So if we presume that Dennis was fair and consistent, and the players on the ice were not having any impact at all on the quality of scoring chance against, then we can model what we expect to see, which is:

  • Over the first half of the season, using the top 20 players by ice time (Stortini being the cutoff point), the average will be 11.9 goals against per 100 scoring chances against. And we should see a dispersion of results (measured here with sample standard deviation) of about 3.1.
  • We in fact see a standard deviation of 2.9. So, check.
  • Over the second half of the season, the average will be 12.2 goals against per 100 scoring chances against. And we should see a dispersion of results (measured here with sample standard deviation) of about 2.8.
  • We in fact see a standard deviation of 3.2. Check.
  • Over the season as a whole, the average will be 12.1 goals against per 100 scoring chances against. And we should see a dispersion of results (measured here with sample standard deviation) of about 2.0.
  • We in fact see a standard deviation of 1.5. Check.
  • The players' rates of 'goals against per scoring chance against' should not repeat from the first half of the season to the second. As a convenient measure of this I used Pearson correlation.
  • We in fact see a small negative relationship between the results from the front half and the back half. So no repeatability at all. Check.
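
For anyone who wants to kick at the random model themselves, here is a minimal sketch of the idea in Python. It assumes every chance against turns into a goal with the same probability no matter who is on the ice. The per-player chance totals in `CHANCES_AGAINST` are made-up placeholders, not Dennis's counts, and the function names are mine.

```python
import math
import random
import statistics

# Hypothetical chances-against totals for 20 skaters (NOT Dennis's actual counts).
CHANCES_AGAINST = [150 + 13 * i for i in range(20)]
TEAM_RATE = 0.121  # season-long goals against per chance (12.1 per 100, from the post)


def expected_sd_per_100(p, chances):
    """Luck-only spread of 'goals per 100 chances' expected across players."""
    # Each player's rate has binomial variance p*(1-p)/n; the expected sample
    # variance across players is the average of those per-player variances.
    mean_var = sum(p * (1 - p) / n for n in chances) / len(chances)
    return 100 * math.sqrt(mean_var)


def simulate_season(p, chances, rng):
    """One random season: every chance against becomes a goal with probability p."""
    rates = []
    for n in chances:
        goals = sum(rng.random() < p for _ in range(n))
        rates.append(100 * goals / n)
    return rates


def mean_simulated_sd(p, chances, n_seasons, seed=0):
    """Average sample standard deviation of the player rates over many seasons."""
    rng = random.Random(seed)
    sds = [statistics.stdev(simulate_season(p, chances, rng)) for _ in range(n_seasons)]
    return sum(sds) / n_seasons


if __name__ == "__main__":
    print("expected SD :", round(expected_sd_per_100(TEAM_RATE, CHANCES_AGAINST), 2))
    print("simulated SD:", round(mean_simulated_sd(TEAM_RATE, CHANCES_AGAINST, 200), 2))
```

Swap in real chance totals and the observed team rate, and the two numbers give the dispersion to compare against the observed standard deviation.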

For the season as a whole the results are shown below:


By way of example, for every 100 EV scoring chances that the opposition had when Smid was on the ice, 12 goals were scored. He was actually fantastically lucky in the first half of the season with a GA rate of only 4 per 100 chances, and hit a stretch of snake eyes in the back half of the season resulting in that rate quadrupling. Get enough buggers rolling the dice and that sort of thing is bound to happen to one of them. If we divided up the games by odd and even game numbers, a similar thing would be there for someone else, though it's impossible to say who it would be, because randomness is random.

As you can see, the players are grouped tightly together. And the variation between them is small and accounted for entirely by expected chance variation.

I could have posted 9 more charts as generated by the random model, and they look identical in form, though obviously some players have better luck in some simulated seasons than others. And if I had done that, I expect that about 1 in 10 people would have been able to pick out the real chart from the random clones.

And this statistic doesn't repeat at all, because it is likely both honestly measured and it is almost entirely luck, or near enough pure luck that it would be extremely difficult to hear the tiny skill component squeaking through the noise.
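
The repeatability check can be sketched the same way. This is an illustration under assumptions, not the calculation I ran: 20 players, a hypothetical 130 chances against per half-season each, and two invented worlds, one where everyone shares the same true rate (`LUCK_ONLY`) and one with a deliberately wide spread of true rates (`WIDE_SKILL`).

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # Fails with ZeroDivisionError if either list has no variation at all.
    return sxy / math.sqrt(sxx * syy)


def half_season_rates(true_rates, chances_per_half, rng):
    """Goals against per 100 chances for each player over one simulated half."""
    return [
        100 * sum(rng.random() < p for _ in range(chances_per_half)) / chances_per_half
        for p in true_rates
    ]


def mean_split_half_r(true_rates, chances_per_half, n_sims, seed=0):
    """Average first-half vs second-half correlation over many simulated seasons."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        first = half_season_rates(true_rates, chances_per_half, rng)
        second = half_season_rates(true_rates, chances_per_half, rng)
        total += pearson_r(first, second)
    return total / n_sims


# Two hypothetical worlds: pure luck, and an exaggerated spread of true skill.
LUCK_ONLY = [0.12] * 20
WIDE_SKILL = [0.05 + 0.15 * i / 19 for i in range(20)]

if __name__ == "__main__":
    print("luck only :", round(mean_split_half_r(LUCK_ONLY, 130, 200), 2))
    print("wide skill:", round(mean_split_half_r(WIDE_SKILL, 130, 200), 2))
```

In the luck-only world the average correlation hovers around zero, and any single season can land a little positive or a little negative. A clear positive number is what a real skill spread would look like.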

Now if I had been playing defense for the Oilers last season this wouldn't be the case: the quality of the scoring chances would be through the roof, and it would skew this whole picture. But I didn't play for the Oilers; that was done by practised and trained professional hockey players.

As another check for homerism and bias, we expect the EV scoring chances per 100 shots-directed-at-net (Corsi+) to be similar both for and against. And they are:
35.6 For.
35.7 Against.

Damn.

So we should be good to go. This seems like a reasonable starting point, a check to make sure that the world is still round. Though it may well seem obvious to some and unbelievable to others.
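
A rough way to put a number on "similar" for the for/against check is a pooled two-proportion z-score. This is a sketch of my own, not something from Dennis's tool. It treats every shot directed at net as an independent trial, which is a simplification, and the totals passed in at the bottom are made up.

```python
import math


def per_100(chances, shots_directed):
    """Scoring chances per 100 shots directed at net."""
    return 100 * chances / shots_directed


def two_rate_z(chances_for, shots_for, chances_against, shots_against):
    """Pooled two-proportion z-score for the for/against chance rates."""
    p_for = chances_for / shots_for
    p_against = chances_against / shots_against
    pooled = (chances_for + chances_against) / (shots_for + shots_against)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shots_for + 1 / shots_against))
    return (p_for - p_against) / se


if __name__ == "__main__":
    # Made-up totals for illustration only, not the Oilers' actual counts.
    print(round(per_100(1000, 2800), 1), round(per_100(1100, 3100), 1))
    print(round(two_rate_z(1000, 2800, 1100, 3100), 2))
```

A z-score near zero says the scorer is counting chances at about the same rate at both ends of the rink. A large positive one would be the homerism showing.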

And a couple of random thoughts to tag onto the end of this rambling post, based on my sense of it after kicking at this stuff a bit over the past couple of days:
  • Mike Babcock was right, possession is everything.
  • It's probably fairer to use 'shots directed at net while on the ice', instead of time, as a leveling tool, especially when comparing players on different teams. This goes for any even strength statistic.

19 Comments:

Blogger Olivier said...

People from every single freaking NHL city should be doing that next year.

You guys are the Shit™.

5/01/2009 12:47 pm  
Blogger Vic Ferrari said...

Thanks, Olivier, though Dennis is the one who deserves props.

I suspect that if 29 other bloggers from other teams did the same, 28ish of them would have registered more scoring chances per corsi+ for their team than the other. And that bias in turn would be reflected in the similar values for their fave players. Does that make sense?

I guess I'm saying that possession (as measured by corsi+/corsi total) reflects the scoring chance+/scoring chance total terrifically with Dennis, suggesting he wasn't biasing towards his faves. Similarly for and against overall, he wasn't favouring his team. That shouldn't be a rare trait, but it is.

Few buggers can manage that. You need a cold heart and an uncommonly honest disposition. And few well-read bloggers possess these qualities (you'd better be damn insightful if you're that frank, knowing that frankness is widely accepted as 'being an ass').

So as much as I'd like to see it, I think I would review their data in precisely the same way as I have Dennis' above, and see bias everywhere.

Sisu has the game, Sleek too. Japers Rink as well. A few CHI fans (as unlikely as that sounds) and a few Sharks guys as well, even though there aren't many of them in total. Kent or Matt could do it perfectly for the Flames, but neither will bother. Probably several beyond that, guys I don't know about or whom I have forgotten while typing this. Still, rare birds.

Just in general, Olivier, does this post make sense, or did I rush over some things too quickly?

5/01/2009 2:01 pm  
Blogger Jonathan Willis said...

I figured that the scoring chances were a good measure, but this ices it. Thanks for running these checks, Vic.

5/01/2009 6:13 pm  
Blogger Scott said...

Great work, Vic. Did you want the actual spreadsheet to work with? I think I have an email address for you and I can pass along the Excel file if you'd like it.

5/01/2009 6:24 pm  
Blogger Vic Ferrari said...

Thanks, Jonathan.

And Scott, I thought about emailing you for that, but it turned out that it only took me a couple of minutes to transfer the data from your jpg to Excel manually.

Eventually I'd like to have all the games stored individually in database format or csv. I don't know much about db programming so it will probably be the latter. That way we could look at the relationship between even and odd numbered games, or just generally 41 random games against the remaining 41.
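
Once the games are in csv, the odd/even and random 41-versus-41 splits are only a few lines of Python. The sketch below assumes one row per player per game. The column names and the tiny `SAMPLE_CSV` are placeholders I made up, not the layout of Scott's spreadsheet.

```python
import csv
import io
import random
from collections import defaultdict

# Hypothetical layout and numbers, for illustration only.
SAMPLE_CSV = """game,player,chances_against,goals_against
1,A,5,1
1,B,4,0
2,A,6,0
2,B,3,1
3,A,4,1
3,B,5,0
4,A,7,1
4,B,2,0
"""


def load_rows(text):
    """Parse the csv text into a list of dicts with integer counts."""
    return [
        {"game": int(r["game"]), "player": r["player"],
         "ca": int(r["chances_against"]), "ga": int(r["goals_against"])}
        for r in csv.DictReader(io.StringIO(text))
    ]


def odd_even_split(game_ids):
    """Return (odd-numbered games, even-numbered games) as two sets."""
    ids = set(game_ids)
    odd = {g for g in ids if g % 2 == 1}
    return odd, ids - odd


def random_split(game_ids, rng):
    """Shuffle the games and cut them into two halves (41/41 for 82 games)."""
    ids = sorted(set(game_ids))
    rng.shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])


def rate_per_100(rows, games):
    """Goals against per 100 chances against, per player, over the given games."""
    ca, ga = defaultdict(int), defaultdict(int)
    for r in rows:
        if r["game"] in games:
            ca[r["player"]] += r["ca"]
            ga[r["player"]] += r["ga"]
    return {p: 100 * ga[p] / ca[p] for p in ca if ca[p] > 0}


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    odd, even = odd_even_split(r["game"] for r in rows)
    print(rate_per_100(rows, odd), rate_per_100(rows, even))
```

Feed the two per-player rate tables into a correlation and you have the repeatability test for any split you like.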

The main reason for doing that would be to look at the streakiness of scoring chances (and corsi as well). Because that's really interesting. And how that 'pattern of abuse', as it were, is affected by TV timeouts, goalie changes, scrums, fights, goalie equipment adjustment delays, captains arguing with referees, etc.

I think there may be method to the madness behind the bench antics of a guy like Keenan, btw. More than I ever would have imagined.

And then I want to look at the special teams effects. Rolling averages, over several games ... trying to see if the quality of chance drifts with the numbers of shots ... grinding through a few games manually, I strongly suspect that it does on PPs AND PKs, inexplicably, but not in a measurable way at evens.

So at some point I'll be coming cap in hand, looking for volunteers to help me get all the games SC data into an appropriate format.

5/02/2009 1:37 pm  
Blogger Scott said...

Vic,

I have each game summed individually in the Excel file. It doesn't have the time of each chance but for things like odd v. even games or 41 random games it would work just fine. I'll just send it over and you can delete the email if you don't want it.

5/02/2009 8:50 pm  
Blogger JLikens said...

"People from every single freaking NHL city should be doing that next year.

You guys are the Shit™."

I'd be willing to participate in such a project.

I do have a team that I cheer for, but I'd be willing to do it for any team -- I get Center Ice every year, so not having access to the games wouldn't be an issue.

5/02/2009 11:14 pm  
Blogger Vic Ferrari said...

Scott:

I got the email, that's just terrific stuff. You are a star.

Won't take much to write a script to generate corsi/goals/shots etc in the same format. I'll get around to it sooner or later. Generally I root through this stuff, and post on the net, when I'm procrastinating on something else. And then I usually jolt myself out of this with boorish behavior. So keep your eyes open for the pattern ;)

5/03/2009 7:05 am  
Blogger Vic Ferrari said...

Jlikens:

That would be great. My advice would be to do it for the team you cheer for, it will be too much of a chore otherwise.

You strike me as an unbiased guy, I'm sure your results will pass the sniff test above with the same flying colours as Dennis.

What team do you cheer for, anyways? It's never been obvious, I would have guessed the Oilers.

5/03/2009 7:09 am  
Blogger JLikens said...

I'm actually a Habs fan.

I try not to be too partial but, like most fans, my loyalties likely affect my judgment to some degree.

Anyway, I follow the Oilers a fair amount too. I encountered the Oilogosphere in the latter part of the 2005-06 season and it completely changed the way that I view the game. I'd always been a fairly analytical thinker, so the content naturally appealed to me.

5/03/2009 5:24 pm  
Blogger sunnymehta.com said...

Vic,

I apologize if I missed it, but did Dennis explicitly give any kind of detail on how he defined "scoring chance"? If so, could you either explain it to me or link me to the appropriate post?

thanks.

5/03/2009 6:43 pm  
Blogger oilswell said...

Something that's been missing for a long time, so it's great to see.

Even if you miraculously eliminate bias, Sunny points out the calibration problem. Common advice for human categorization is to provide standardized training information to calibrate the human judges. Special attention should be paid to boundary cases, and training all judges with examples should help. These could be created using online videos (but probably won't be). In addition to Vic's bias sniff test one might employ Cohen's Kappa to test the calibration of pairs of judges using games that both judges categorized.

5/03/2009 11:13 pm  
Blogger Vic Ferrari said...

Yeah, if we were running a team that would be the way to go. Probably using shots directed at net as the base number, because surely there are only a handful of scoring chances all year that don't register a shot towards the goal.

Then you could hire a few guys on the cheap to review hundreds of hours of game tape from various leagues and record the corsi and scoring chance numbers for the games. Leave some overlap in the games reviewed, and use Cohen's wholly sensible test with scoring chances as a positive result and shots-directed-at-net as the sample size. Just to make sure that all your interns were counting them the same way.
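
For reference, Cohen's kappa for that setup is easy to compute by hand. The sketch below assumes both reviewers went through the same list of shots directed at net and marked each one 1 (scoring chance) or 0 (not). The function name and the example labels are made up for illustration.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges giving 0/1 labels to the same events."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of events where both judges gave the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each judge's own rate of calling a scoring chance.
    rate_a = sum(labels_a) / n
    rate_b = sum(labels_b) / n
    p_chance = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_chance == 1.0:
        # Both judges used a single identical label throughout; kappa is undefined.
        return float("nan")
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Ten shots directed at net, as marked by two hypothetical reviewers.
    judge_1 = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
    judge_2 = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
    print(cohens_kappa(judge_1, judge_2))
```

A kappa of 1 means the two reviewers agree on every event, and 0 means they agree no more often than their individual chance-calling rates would produce by accident.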

Because over the course of a season, or at least this season, scoring chances gravitate to Corsi really well. r=.91 for the team as a whole.

And you can predict on-ice scoring chance numbers really well. Even the outliers (which are Visnovsky and Moreau) aren't outlying very far:

Vis: 52% of possession by corsi, 55% of scoring chances by Dennis.

Ethan: 45% by possession, 42% of on-ice scoring chances by Dennis.

I mean 71 is clearly a mile better than 18 by either measure, and the rest of the regulars are much closer yet. Probably very nearly as close to the same as random chance would allow, though I haven't checked.

I also strongly suspect that we could model this with the corsi number being the equivalent of ability to answer a trivia question, where winning a trivia question gives you a roll of the dice (losing it gives a roll to an opponent), and they are fair dice ... roll an 8 or more (or whatever) and you get a chance to roll the 'scoring chance dice', which you've inherited from your parents.

Or so I suspect.

5/04/2009 10:43 am  
Blogger Vic Ferrari said...

You'll have to ask Dennis. I can't imagine that anyone has a hard set of rules, though. I don't think that would be doable.

It's like porn ... in some cases it's obvious by set criteria, in other cases hard to define, but you know it when you see it.

5/04/2009 10:48 am  
Blogger Scott said...

The idea of doing a project sounds good. It may be an idea to start with one conference so that we "only" need to find fifteen willing participants and, if that doesn't work, a division so that there's "only" five. That way there will be quite a few games where there are two scorers and the criteria can be better defined over time. Plus, it will give an interesting look at the marginal cases since the times for the scoring chances will be recorded. The biggest possible snag is access to all of the games and, of course, finding five people willing to do it. If I can find a way to have access to all of the games, I'd be up for following a (non-Oiler assuming Dennis is willing to go with the Oilers again next year) team next season.

Perhaps Dennis with Edm, Kent? for Cgy, Willis? with Van, and JLikens and I can split up Col and Min if he's willing.

If we can all use the format you had laid out for Dennis I really wouldn't mind putting all of the results into separate spreadsheets and then emailing them out to all of the participants.

5/04/2009 11:22 am  
Blogger Kent W. said...

"Kent or Matt could do it perfectly for the Flames, but neither will bother."

I actually flirted with doing it this year, but it was well into the season so I didn't bother. I'll likely try to take it up next season though.

5/05/2009 3:13 pm  
Blogger JLikens said...

"
Perhaps Dennis with Edm, Kent? for Cgy, Willis? with Van, and JLikens and I can split up Col and Min if he's willing."

Sure, I'd be willing to do either.

5/05/2009 8:42 pm  
Blogger The Falconer said...

I tried doing this for Atlanta twice, but a) it requires a very high level of dedication and b) it was depressing how many scoring chances against ATL allowed.

I would be very interested in seeing a rough list of how he treated border line scoring chances. For example, do you count a point shot if the goalie was screened. How about a wide open guy on the back door who just missed the pass across the crease--is that a chance or not?

5/11/2009 11:23 am  
Blogger Dennis said...

Vic: I'm not sure if you'll be alerted that this thread has a new comment but once again I want to thank you for creating that website/tool and for compiling the numbers afterwards.

10/05/2009 10:47 pm  
