Saturday, May 15, 2010

Streaks

Streaks are interesting phenomena. They are also very difficult to pin down with any language, be it spoken or mathematical. On top of that, the human brain seems to have evolved to recognize patterns, and we can spot them even where they don't exist. Children can't look at a cloud, a stippled ceiling or the grain in wood panelling without seeing an image. Ask one if you don't believe me.

One way of analyzing streakiness is to look at rolling averages. For example, if you looked at Canucks forward Alex Burrows last season and plotted out his 20-game rolling average of shooting percentage (even strength, no empty netters), you'd get this:
We can evaluate Burrows' streakiness, or inconsistency, by counting how many black pixels we needed to draw that picture. Then we compare that to how many black pixels we'd expect to use by chance alone. In Alex's case, nearly 99 times out of 100 we'd expect to use fewer black pixels, so he had a streaky season. How much of that was down to chance and how much was down to circumstance? It's a good question. The first thing you'd need to do is look at every other forward in the league, but that's a subject for another day.
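The pixel-counting idea can be mimicked in a few lines of Python. This is a rough sketch, not the published BLACK method. It assumes that the "ink" needed to draw a rolling-average line is proportional to the total vertical distance the line travels, since the horizontal length is the same for every ordering of the same games. It also builds the chance baseline by reshuffling the same games into random orders. `rolling_mean`, `ink` and `black_rank` are hypothetical names.

```python
import random


def rolling_mean(values, window=20):
    """Rolling average over every full window (e.g. 20 games)."""
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window one game forward: add the newest game, drop the oldest.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def ink(values, window=20):
    """Stand-in for the black-pixel count: vertical travel of the rolling line.

    Assumption: every ordering has the same horizontal length, so only the
    up-and-down movement separates a streaky line from a flat one.
    """
    curve = rolling_mean(values, window)
    return sum(abs(b - a) for a, b in zip(curve, curve[1:]))


def black_rank(values, window=20, sims=10_000, seed=2010):
    """Fraction of random reorderings of the same games that need less ink.

    Near 1.0: streakier than chance. Near 0.0: more regular than chance.
    """
    rng = random.Random(seed)
    observed = ink(values, window)
    shuffled = list(values)
    fewer = 0
    for _ in range(sims):
        rng.shuffle(shuffled)
        if ink(shuffled, window) < observed:
            fewer += 1
    return fewer / sims
```

Now feed it the perfectly even dice series from the test below, `([0] * 9 + [1]) * 10`. It returns a rank of 0.0. Every 20-roll window holds exactly two sevens, so the line is flat and no reshuffle can use less ink.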

This idea is taken from a 2008 article in The Journal of Quantitative Analysis in Sports, where it was dubbed the BLACK stat.


For now I have a little test, so you can check yourself for streak bias:

If you rolled a ten-sided die 100 times, counting a seven as a success and any other number as a failure, and you happened to get 10 sevens ... what sort of pattern would you expect?

The pattern immediately below this paragraph would be too consistent to be true. We've all played enough board games to know that dice don't work that way:
0000000001000000000100000000010000000001000000000100000000010000000001000000000100000000010000000001

Copy that series of 0s and 1s into a word processor or text editor, and drag the 1s around in the series until it seems properly random to your mind.

Next, copy and paste the modified series of 0s and 1s into this URL. Simply replace the original series in the URL with your modified version. Open a browser window with that URL, and voila, you'll get a little graph like the one above for Burrows, along with a bit of BLACK stat data. The number to notice is the BLACK statistic rank, which shows how many pixels your series required compared with 10,000 randomly simulated series.

Please note that this will NOT count towards your final grade. If you're going to do it, trust your instincts. I got a shockingly small ranking with my first go, implying that I was subconsciously putting a pattern into my random series. That's not good, but it's humanity. Or so I think.
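If you want to see what chance actually produces, do it after taking the test, not before. The small Python sketch below drops ten sevens into random spots among 100 rolls and lists the gaps between them. It assumes exactly ten successes, as in the test; a real d10 would sometimes give you 7 or 13. `random_series` and `gaps` are hypothetical names.

```python
import random


def random_series(n=100, successes=10, seed=None):
    """A 0/1 string with `successes` ones placed at random positions."""
    rng = random.Random(seed)
    ones = set(rng.sample(range(n), successes))
    return "".join("1" if i in ones else "0" for i in range(n))


def gaps(series):
    """Number of 0s between each pair of consecutive 1s."""
    positions = [i for i, ch in enumerate(series) if ch == "1"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    s = random_series()
    print(s)
    print("gaps between sevens:", gaps(s))
```

In the perfectly even series above, every gap equals 9. Random series often mix back-to-back sevens with long droughts, which is exactly the clumping our instincts tend to smooth away.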

Friday, May 14, 2010

Brian Burke and AHL Advanced Stats

NHL Board of Governors Meeting

What the hell is Brian up to?

There is a company called Time On Ice Software that sells programs designed to help hockey teams compile advanced stats. Depending on how you input the data and present it, you could essentially have the same stats that are available at behindthenet.ca or timeonice.com ... but for your beer league team.

If you click on the link you'll see a list of their users by logo. As of last year, the four AHL teams that were customers were:
Manitoba Moose (VAN)
Portland Pirates (ANA)
Rockford IceHogs (CHI ... formerly ANA via Cincinnati Mighty Ducks)
Toronto Marlies (TOR)

The thing these teams all have in common: at some point, Brian Burke has been the GM of their NHL affiliate.

The Texas Stars and Hamilton Bulldogs seem to have started using this system this year as well. They are currently facing each other in the AHL Western Conference finals, so perhaps it has helped a bit.

Now, there isn't anything particularly sophisticated about the programs, and it wouldn't surprise me if some teams had developed their own software. The nice thing about staying with the same format is that teams could share data with each other and easily build an NHL.com-style database, one with a significant sample of underlying stats for most players in the AHL. And of course the cost of the software is trivial compared to the cost of inputting the data for every game. Presumably nobody goes to that trouble and expense unless they are making use of the resulting data.

Brian Burke has never struck me as a particularly cerebral guy, and after all, this investment will tell him nothing about the truculence of AHL players. The obvious thing this data yields, as has been shown repeatedly on this blog and throughout the Oilogosphere, is the context of a player's results, which gives you a better idea of his true value beyond the counting stats.

At the trade deadline the Leafs acquired Martin Skoula (who was flipped for a 5th round draft pick) and an AHL player named Luca Caputi in exchange for Alexei Ponikarovsky. It will be interesting to see how Caputi develops.

I don't know what the Leafs (or, for that matter, the Habs, Ducks, Stars, Blackhawks and Canucks) are doing here. Perhaps I'm giving them too much credit. I do know that I would be very wary of any Oilers trade with the Leafs that involved AHL players, or any guys whose track record was principally in the AHL. We've seen that movie once, with Lupul and Smid, and I don't want to see it again.

Thursday, May 13, 2010

Forest v. Trees

I thought I would take a quick look at the effect of defensemen on save percentage, relative to their teammates. This is just for 5v5 in the NHL during the past two seasons.

To do this I used Desjardins' terrific behindthenet statistics site. I arbitrarily set the cutoff at 30 games played, and grabbed the 08/09 and 09/10 data.

From there I took the on-ice save percentage and subtracted the off-ice save percentage. The off-ice save percentage applies only to the games in which the player was on the roster. If you're doing this yourself, note that the "shots" on Gabe's site are actually saved shots, so you'll need to add goals against to get total shots. Also, empty net goals are included in the data, which is unfortunate and will make offensive defensemen on bad teams look a touch worse.

By way of example, in 08/09 Chris Pronger had a 5v5 on-ice save percentage of .915. When he was in the game but not on the ice, the opponents scored at a 2.20 goals per 60 clip, and the Ducks made saves at a 26.5 saves per 60 clip. So 26.5/(26.5+2.20) = .923, which was Pronger's 5v5 off-ice save percentage in 08/09. His net score was .915 - .923 = -.008, meaning that the Ducks' goalies stopped pucks at a .008 better clip when he was on the bench at 5v5 than when he was on the ice. We'll call that his 5v5 save percentage score.
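The Pronger arithmetic is easy to script for every player. Here's a minimal Python sketch. It assumes you've already pulled three numbers per player by hand: the on-ice save%, the off-ice saved shots per 60 (the site's "shots" column) and the off-ice goals against per 60. `save_pct` and `save_pct_score` are hypothetical helper names, and the empty-net caveat above still applies.

```python
def save_pct(saves_per60, goals_per60):
    """Save percentage from per-60 rates: saves / (saves + goals)."""
    return saves_per60 / (saves_per60 + goals_per60)


def save_pct_score(on_ice_sv, off_saves_per60, off_goals_per60):
    """On-ice save% minus off-ice save%; positive means the goalies did better with him on."""
    return on_ice_sv - save_pct(off_saves_per60, off_goals_per60)


# Pronger, 5v5 in 08/09, using the numbers quoted above.
off_ice = save_pct(26.5, 2.20)
score = save_pct_score(0.915, 26.5, 2.20)
print(f"off-ice {off_ice:.3f}, score {score:+.3f}")  # prints: off-ice 0.923, score -0.008
```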

This season Pronger was +.008.

The same exercise was repeated for all players.

Real Effects:

If there is a real ability of defensemen to affect shot quality, it should repeat from season to season in the general population. So below is a scatter plot of 08/09 vs 09/10 save percentage scores for all defensemen who played at least 30 games in each season.

The guy at the top right is Mark Fistric, the player at the middle of the far right is Brett Festerling, and Jack Johnson is the player furthest to the bottom left. These aren't outliers in the true sense of the word; the universe requires that some guys have this level of good and bad fortune. In fact, if you remove them, the season to season correlation becomes negative, and the dots become far more tightly bunched than we would expect by random chance.

In fact, the bunching is already tighter than we would expect by random chance. The reason is NOT survival bias (a phenomenon that corrupts many MLB stats beyond recognition); coaches and GMs in this league seem to have this stuff figured out. You could make a case that Alzner would have made the Caps this year if he'd been luckier in 08/09, and that Hedican probably had another year or two left in him. But that's just two guys, both at the edges of their careers, so on the whole it's a non-issue.
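One way to put a number on "tighter than random chance" is to compare the observed spread of save% scores with the spread that pure binomial luck would produce. The post doesn't spell out its method, so this is an assumed approach. The null model treats every defenseman as identical, with the league-average save% applying to every shot. This Python sketch assumes you have each player's on-ice and off-ice shots against, and `chance_sd` and `spread_ratio` are hypothetical names.

```python
import math
from statistics import pstdev


def chance_sd(league_sv, shot_pairs):
    """Spread of save% scores expected if every defenseman were identical.

    shot_pairs: one (on-ice shots against, off-ice shots against) pair per player.
    Under that null hypothesis a score is the difference of two independent
    binomial proportions with the same mean, so its variance is
    p * (1 - p) * (1 / n_on + 1 / n_off). Averaging those variances across
    players and taking the square root gives the chance-only SD.
    """
    q = league_sv * (1 - league_sv)
    variances = [q * (1 / n_on + 1 / n_off) for n_on, n_off in shot_pairs]
    return math.sqrt(sum(variances) / len(variances))


def spread_ratio(scores, league_sv, shot_pairs):
    """Observed SD of the scores divided by the chance-only SD.

    Well below 1 means the dots are bunched tighter than luck alone would produce.
    (pstdev measures spread around the observed mean rather than zero,
    a small simplification.)
    """
    return pstdev(scores) / chance_sd(league_sv, shot_pairs)
```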

There is some censorship bias, though. The group is bunched a bit too closely together because good defenders tend to play more against good forwards, so their 5v5 save percentage scores suffer a smidgen. And if a guy is playing tough minutes and getting shelled in terms of save percentage score, coaches tend to give him a break from the tough gig, probably just to make sure that he doesn't start losing confidence. So the correlation of Corsi QualComp to save percentage score within the same season is only r=-.01. But this defender was playing tough minutes in the first place because he's a good player, and the coaches know that, so next year he'll be back to facing tougher comp. So the correlation of 08/09 Corsi QualComp to 09/10 5v5 save percentage score is a touch stronger, r=-.09.

In terms of old-school QualComp, which is a tad self-referential: postdictive r=-.05, predictive r=-.10.

The same phenomenon occurs from 07/08 to 08/09, though not quite as strongly.

Bottom Line:

The ability of defensemen to affect shot quality against does exist in the population, but it is so small that we will never be able to sensibly apply it to any particular player. And it creates a paradox: the defensemen who are helping the goalie's save percentage a bit (presumably because they make fewer mistakes of the spectacularly bad variety) are, as a group, seeing slightly worse save percentages behind them, because they are the guys the coaches lean on to play tougher opposition. And the guys who have talent but are guilty of the occasional egregious error ... as a group, they do a whisker better than average by 5v5 save percentage score. This is presumably because their coaches have the good sense not to play them much against Malkin, Kovalchuk and Heatley types.

Trailing Thoughts:

Even though it will not be popular with fans, I think the right guys for a team to target are defenders who have been getting some bad bounces in recent seasons. They should come cheaper in trade than their true value dictates. They are:

1. YouTube clowns. Souray was the poster child for the phenomenon. Plus he had a brutal save percentage score in his last year in Montreal, but faced a decent level of competition (Carbonneau wasn't fooled) and had decent Fenwick numbers (and therefore almost certainly decent on-ice scoring chance numbers). That save% score bounced back in a ridiculous way for his first two years as an Oiler, then plummeted this season, creating the illusion that Charlie Huddy was a supergenius. The dice have no memory, after all.

2. Guys with back-to-back seasons of poor 5v5 save percentage scores; three in a row is even better. NHL teams aren't swayed much at all by these (oddly enough, it is the implicit core focus of fandom), but no matter how square your head is ... two or three seasons in a row of bad luck and the mind searches for reasons. Just watch a craps table for a couple of hours if you don't believe me, and that's with dice. So it's bound to affect at least some GMs. The universe requires that it happen to some defenders, so I think these guys could come cheaper than their true market value. Zbynek Michalek fits the bill, and maybe Colin White, Bieksa and Jack Johnson. Niedermayer isn't going anywhere, unfortunately, but he's been rolling snake-eyes over the last couple of seasons, creating the illusion of a sharp decline in his play.

3. Scoring chance % may not be everything for defenders, but it's almost everything. And it repeats really well just as a raw number. If you apply LupulSmid to account for quality of D partner, then the season to season correlation goes through the roof, r=.85 for the Oilers D, and that's without accounting for the rest of the contextual issues (quality of competition, how often they start in their own end of the rink, etc.). So it's time to move a guy like Smid, before the bottom falls out of his save% score. Because it will eventually.

Tuesday, May 04, 2010

Blocked Shots: Luck or Skill?

Team A is on the attack in Team B's end of the ice. Team A fires a whole bunch of pucks at Team B's net. Some of these shots hit the goaltender, some go wide, and some get blocked. Does either team have control over those three outcomes? In other words, once Team A is in the offensive zone shooting pucks at Team B, are the missed and blocked shots the result of an actual skill (by either team), or are they simply the product of randomness? I set out to answer that question.


DATA

To limit scorer bias as well as playing-to-the-score bias (both of which have been well demonstrated on this blog), I looked at only "even strength on the road with the score tied" numbers. I used data from the '08-'09 season (because that's the only season I happened to have handy).

Let's establish some basic terminology. "Shots on goal" consists of Goals + Saved Shots. To differentiate between the attacking and defending team, we append a "For" and "Against." Goals For is abbreviated GF, and Goals Against is abbreviated GA. So if Team A is in Team B's zone and shoots the puck off the goaltender's pad, Team A gets an SSF (saved shot for) and Team B gets an SSA (saved shot against). Missed shots for and against are MSF and MSA, and blocked shots for and against are BSF and BSA. (To clarify the latter, if Team A shoots a puck that gets blocked, Team A gets a "BSF" and Team B gets a "BSA".) Cool?


MISSED SHOTS


Let's first look at missed shots. If teams have no offensive ability to control how many of their shots go wide, we'd expect the distribution of MSF% (i.e., [Missed Shots For] divided by [Goals For + Saved Shots For + Missed Shots For]) to look completely random (i.e., no different than if teams were flipping a fair coin and their differing results were due to pure luck, or "binomial chance variation" as statisticians call it). To test this, I ran 10,000 simulated seasons in which I gave each team the same "coin" weighted to the league average MSF% of 26.92%, and I gave each team the number of "coin flips" equal to the actual shot attempts (i.e. GF + SSF + MSF) they took in real life. So if the Oilers had 502 shot attempts in real life in '08-'09 on the road with the score tied, they got 502 flips of the coin in each simulated season.

I examined the MSF% spread (i.e., the results of the coin flips) of each simulated season by calculating the standard deviation every time. In 10,000 simulated seasons, the average sd was 2.08%, and the maximum sd was 3.25%.
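The coin-flipping test is straightforward to reproduce in Python. This sketch makes a few assumptions:

- Each team's real shot attempts are supplied as a list of integers.
- The league-average rate is the shared weighted coin.
- Spread is measured with the population standard deviation, since the post doesn't say which SD it used.

`draw_binomial`, `simulate_spreads` and `share_at_least` are hypothetical names.

```python
import random
from statistics import pstdev


def draw_binomial(rng, n, p):
    """Number of 'heads' in n flips of a coin weighted to p."""
    # Random.binomialvariate exists from Python 3.12; older versions count flips one by one.
    if hasattr(rng, "binomialvariate"):
        return rng.binomialvariate(n=n, p=p)
    return sum(rng.random() < p for _ in range(n))


def simulate_spreads(attempts, league_rate, seasons=10_000, seed=0):
    """One standard deviation of team rates per simulated season.

    attempts: each team's real shot attempts, i.e. its number of coin flips.
    league_rate: the shared weighted coin, e.g. 0.2692 for MSF%.
    """
    rng = random.Random(seed)
    sds = []
    for _ in range(seasons):
        rates = [draw_binomial(rng, n, league_rate) / n for n in attempts]
        sds.append(pstdev(rates))  # population SD: an assumption
    return sds


def share_at_least(observed_sd, simulated_sds):
    """Fraction of simulated seasons at least as spread out as the real one."""
    return sum(sd >= observed_sd for sd in simulated_sds) / len(simulated_sds)
```

For MSF%, you'd pass each team's GF + SSF + MSF and 0.2692, then compare the real spread with `share_at_least(0.0192, sds)`. A real spread wider than every simulated one gives 0.0. On Python older than 3.12, the fallback flips every coin individually, so 10,000 full-league seasons will take a while.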

The actual observed sd of MSF% in '08-'09 was 1.92%. In other words, the distribution of teams' MSF% looks no different than what we'd expect if they were all flipping the same coin. To put it technically, there is no evidence for offensive skill in Missed Shots beyond what we'd expect from binomial chance variation.

What about on the defensive side? Do teams have the ability to induce missed shots?

I ran the same simulations using MSA% and found results similar to those above. The league average MSA% in '08-'09 was 27.69% with a standard deviation of 2.05%. The average sd in the 10,000 simulated seasons was 2.01%, and the maximum sd was 3.11%.


BLOCKED SHOTS

Now let's add blocked shots into the mix. Team A shoots a puck that's blocked by Team B. Was it just randomness, or is Team A doing something wrong that persistently allows Team B to block shots?

I examined the distribution of BSF% (i.e., [Blocked Shots For] divided by [Goals For + Saved Shots For + Missed Shots For + Blocked Shots For]). Once again, if the percentage of blocked shots amongst total offensive shot attempts differs widely by team, the distribution of BSF% should look a lot different than it would if teams were just flipping coins. But it doesn't.
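The missed and blocked percentages are simple ratios of one team's counts, and the same arithmetic works for the "For" and "Against" sides. The one subtlety is that blocked shots stay out of the missed-shot denominator. This is a small Python sketch; `ShotCounts` is a hypothetical container, not anything from the original data source.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """One team's 'For' (or 'Against') counts: goals, saved, missed, blocked."""
    goals: int
    saved: int
    missed: int
    blocked: int

    @property
    def shots_on_goal(self):
        """Shots on goal = Goals + Saved Shots."""
        return self.goals + self.saved

    def missed_pct(self):
        """MS% = MS / (G + SS + MS); blocked shots are not in the denominator."""
        return self.missed / (self.goals + self.saved + self.missed)

    def blocked_pct(self):
        """BS% = BS / (G + SS + MS + BS)."""
        return self.blocked / (self.goals + self.saved + self.missed + self.blocked)
```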

League average BSF% in '08-'09 was 24.36% with a standard deviation of 1.99%. The average sd in 10,000 simulated seasons was 1.75%, and the maximum sd was 2.96%.

What about on the defensive end? Do teams have the ability to block a persistently higher number of opposing shot attempts than other teams?

I again ran 10,000 simulated seasons using the league average BSA% of 24.32%. The average sd was 1.68%, and the maximum sd was 2.74%.

The actual observed standard deviation of BSA% in '08-'09 was 2.77%.

Whoa. See the difference? The actual spread of BSA% in '08-'09 was wider than even the widest of spreads in 10,000 simulated seasons! That's completely different from what we saw with MSF%, MSA%, and BSF%.


CONCLUSION

Teams appear to have the ability to block shots, beyond what we'd expect from chance alone. While many of us hockey stat nerds have used Corsi (i.e., total shot attempts including blocks), a couple of bloggers, like Matt Fenwick and Gabe Desjardins, have often stated an intuitive preference for excluding blocked shots. Though I disagreed in the past, at this point I'm inclined to think that if we are using shot attempts as a proxy for meaningful territorial advantage, we should exclude blocked shots.
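For anyone switching over, the two measures differ only in whether blocked attempts count. Here's a minimal Python sketch using the abbreviations defined earlier (GF, SSF, MSF, BSF and their "Against" twins). The function names are hypothetical, and "Fenwick" here simply means unblocked shot attempts.

```python
def corsi(goals, saved, missed, blocked):
    """All shot attempts, including blocked shots (e.g. GF + SSF + MSF + BSF)."""
    return goals + saved + missed + blocked


def fenwick(goals, saved, missed):
    """Unblocked shot attempts: Corsi minus blocked shots (e.g. GF + SSF + MSF)."""
    return goals + saved + missed


def share(for_count, against_count):
    """Percentage of the attempts in a team's games taken by that team."""
    return 100 * for_count / (for_count + against_count)
```

Using `share(fenwick(GF, SSF, MSF), fenwick(GA, SSA, MSA))` in place of the Corsi version drops the blocked-shot term. Given the BSA% result above, that term seems to reflect a team's shot-blocking as much as its territorial edge.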