### The Importance of Quality of Competition

Hockey talk has come a long way in the past few years. It wasn't long ago that the notion of context of ice time, especially quality of competition, was considered nonsense promoted by wacky Oiler fans. Even the pure math folks on the Internet thought it was folly. MC79 would routinely visit a Flames fan message board and argue the point. There was much mocking and derision going both ways.

It's not that way now, though. This is thanks largely to hockey play-by-play man Jim Hughson. Not only does he frequently talk about the match-ups on the ice, he gets them right. And nobody has done more to promote the notion on the Internet than Gabe Desjardins, mainly through the QUALCOMP numbers at his terrific hockey statistics site.

I thought it was time to look at just how much of an impact QUALCOMP has on a player's results. To do that I'm going to split the 09/10 NHL season in half, look at the change in results for each player from one half to the next, and compare that to their change in QUALCOMP.

Desjardins' QUALCOMP is similar to what is shown in this post of a few weeks ago. The head to head EV shifts table from that post is replaced with head to head 5v5 ice time, and that scoring chance +/- list is replaced with on-ice/off-ice +/- rate. That's his measure of player value. Plus Gabe uses only one iteration. Then he subtracts the result from the original and voila ... QUALCOMP.

It would be a bitch to gather all the head to head and on-ice/off-ice data. But once you had done that, calculating QUALCOMP is just three or four lines of code in most programming languages. Simple stuff, really.
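To give a sense of how short that calculation is, here is a minimal Python sketch of the weighted-average core, assuming the head-to-head 5v5 minutes and each player's on-ice/off-ice +/- rate have already been gathered into dictionaries. The function name and data layout are hypothetical, this is not Desjardins' code, and it leaves out the final subtraction step described above.

```python
from collections import defaultdict


def weighted_opponent_rating(h2h_toi, rel_plus_minus):
    """Ice-time-weighted average of each player's opponents' ratings.

    h2h_toi: dict mapping (player, opponent) -> 5v5 minutes head to head
    rel_plus_minus: dict mapping player -> on-ice/off-ice +/- rate
    """
    weighted_sum = defaultdict(float)
    total_toi = defaultdict(float)
    for (player, opponent), minutes in h2h_toi.items():
        # Each opponent counts in proportion to the minutes spent against him
        weighted_sum[player] += minutes * rel_plus_minus[opponent]
        total_toi[player] += minutes
    return {p: weighted_sum[p] / total_toi[p] for p in total_toi if total_toi[p] > 0}
```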

The quality of competition metric I will use here is a bit different. Gabe's QUALCOMP only compares players to their teammates, and I need a global number. Also, Gabe doesn't give season and quarterly splits for his data, and I'll need both here.

For those who care: I use Fenwick numbers for the player value (our best proxy for player scoring chance +/-) and total shots at net by either team in lieu of head-to-head ice time. Zone start is also factored into the opponent's value. I also run several iterations, in this case about 30. This is because good players tend to play a lot against good players in this league. So, for example, you run the numbers for Zetterberg and realize that he played a bunch against the other team's best outchancing players. So he's even better than his Fenwick suggests, and everyone who played against him deserves larger props. You bump his value a smidgen and run the numbers again. Same for everyone else in the league. Rinse and repeat until the results stabilize.
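The iteration described above can be sketched in Python as follows. This is a hypothetical illustration, not Vic's actual script: it assumes a player's adjusted value is his raw Fenwick-based value plus a damped share of his quality of competition (the shot-weighted mean of his opponents' current adjusted values), and it leaves out the zone start adjustment. The `damping` factor is an assumption added so that each pass only bumps values a smidgen and the loop settles down.

```python
from collections import defaultdict


def iterate_qualcomp(shots_together, raw_value, iterations=30, damping=0.5):
    """Opponent-adjusted ratings by repeated re-weighting (illustrative sketch).

    shots_together: dict (player, opponent) -> shots at net by either team
                    while both were on the ice (used as weights)
    raw_value:      dict player -> raw Fenwick-based value
    damping:        hypothetical step size < 1 that keeps the iteration stable
    """
    value = dict(raw_value)
    qualcomp = {p: 0.0 for p in raw_value}
    for _ in range(iterations):
        weighted = defaultdict(float)
        weight = defaultdict(float)
        for (player, opponent), shots in shots_together.items():
            weighted[player] += shots * value[opponent]
            weight[player] += shots
        # Quality of competition: shot-weighted mean of opponents' current values
        qualcomp = {p: weighted[p] / weight[p] if weight[p] else 0.0 for p in raw_value}
        # Players who faced strong opponents get bumped up, weak schedules pull down
        value = {p: raw_value[p] + damping * qualcomp[p] for p in raw_value}
    return value, qualcomp
```

In a toy case where player A (raw +1) only ever faces player B (raw -1), A's adjusted value settles at about 0.67: his results look less impressive once the weak competition is accounted for.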

I chose guys that played regularly in both halves of the season. That's 412 players, about 14 per team. Obviously far fewer for teams like the Oilers, who were devastated by injury and illness this past season.

The results:

Comparing the first half of the season to the second half:

The relationship of change in Vic's Qualcomp to change in Fenwick ratio:

Pearson's r = .37

variance in player results change: 13,186

Just to keep our lives simple, we'll pretend that the data is distributed normally, so r² × total variance ≈ variance attributable to Vic's Qualcomp.

variance in player results change attributable to Vic's Qualcomp: 1,853
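The r-squared shortcut is a one-liner. This Python sketch (the helper names are hypothetical) also backs out the unrounded r implied by the reported numbers: with r rounded to .37 the shortcut gives about 1,805, so the 1,853 above corresponds to an r of roughly .375.

```python
from math import sqrt


def variance_explained(r, total_variance):
    """Variance attributed to a predictor via r^2 * total variance.

    Only a fair shortcut under the simplifying normality assumption above.
    """
    return r ** 2 * total_variance


def implied_r(explained, total_variance):
    """Invert the shortcut: the r that would produce a given explained variance."""
    return sqrt(explained / total_variance)


print(round(variance_explained(0.37, 13186)))  # 1805
print(round(implied_r(1853, 13186), 3))        # 0.375
```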

Now Pearson's correlation doesn't mean a whole lot unless we understand the role of chance in the game.

Some of that change in players' Fenwick% is down to chance alone. We see that in the games, and it's explicit if you've been following any of the scoring chance recorders on the web.

If we had access to a million parallel universes the element of luck (chance variation) would evaporate, and Pearson's r would wander up to about .71 and stabilize. But we don't.

We do have access to the first quarter and the last quarter of the season, though. That works out to about half the average ice time per player, so the variance due to chance should roughly double in that sample.

So we run that:

Pearson's r = .20

variance in player results change: 23,024

Subtraction yields the variance component attributable to luck in the first study, and the non-luck element:

luck component variance: 9,837

non-luck component variance: 3,349
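The subtraction can be written out explicitly. A Python sketch, under the same assumptions the post makes: the non-luck variance N is the same in both samples, and halving the ice time doubles the luck variance L, so the half-season variance is L + N and the quarter-season variance is 2L + N. Working from the rounded figures gives 9,838 and 3,348, one off from the numbers above, presumably because the originals were computed from unrounded values.

```python
def split_luck(var_half, var_quarter, explained):
    """Split the half-season variance into luck and non-luck parts.

    Assumes var_half = L + N and var_quarter = 2L + N.
    """
    luck = var_quarter - var_half  # (2L + N) - (L + N) = L
    non_luck = var_half - luck     # (L + N) - L = N
    return luck, non_luck, explained / non_luck


luck, non_luck, share = split_luck(13186, 23024, 1853)
print(luck, non_luck, f"{share:.0%}")  # 9838 3348 55%
```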

THE FINAL TALLY:

Vic's Qualcomp explains 55% of the change in players' Fenwick results that is not accounted for by luck.

It would be larger if my model were better. For starters, scorer bias and score effects in the games could be accounted for.

On a team-by-team basis, Vic's Qualcomp has a strong correlation with Desjardins' QUALCOMP, usually around r = .8. And these measures are built from completely different bricks. They are both built with reason; that's the only thing they have in common.

My methodology, crude and simple as it is, did weed out most of the injury effects, though those are obviously huge for some players as well. Linemate quality will probably account for a big chunk of the remainder, I think, though it works against quality of competition (when you get better linemates, more often than not you're also going to be playing against better opposition).

Bottom line: If you ignore quality of competition, you do so at your own peril.

Next season I'll show the change in Vic's Qualcomp over season halves. The top 100 players in Qualcomp change will, collectively, see their Fenwick results (or scoring chance results if we have them) move in the same direction as their quality of competition. Nothing can stop it. If you can find a denier who likes to wager, offer them 3 to 1 odds that most of that Top 100 will see their results go the way of their Vic's Qualcomp. Or Desjardins' QUALCOMP, for that matter.

The odds on that wager are about 400 trillion to one in your favour. It's unlosable. Offer anything longer than 3 to 1, though, and you'll just scare off the punter.
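Odds on a bet like this come from a binomial tail. The Python sketch below assumes, purely for illustration, that each of the 100 players independently moves the predicted way with probability p = 0.8; the post doesn't say what per-player probability sits behind its 400 trillion to one figure, so this doesn't reproduce that number. The bettor loses only if 50 or fewer go the predicted way.

```python
from math import comb


def prob_bet_loses(n=100, p=0.8):
    """P(at most half of n independent players move the predicted way)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n // 2 + 1))


print(prob_bet_loses(100, 0.5))  # a coin flip per player: roughly 0.54
print(prob_bet_loses(100, 0.8))  # vanishingly small
```

The independence assumption is the weak spot: players on the same team share a schedule, so their outcomes are correlated and the real tail is fatter than this.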

## 22 Comments:

I should also note that my method uses jersey numbers only. So in the second half of the season, player 721 would be a combination of Olli Jokinen as a Flame and Chris Higgins as a Flame.

Surely we'd get better results if we corrected for that, but it would be a lot of work.

And just for shits and giggles, the Oilers' Vic's Qualcomp numbers (hopefully you remember the jersey numbers):

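| Jersey # | Vic's Qualcomp |
|---|---|
| 32 | 11.6 |
| 44 | 10.1 |
| 83 | 6.8 |
| 37 | 3.9 |
| 19 | 3.5 |
| 10 | 3.0 |
| 24 | 2.9 |
| 18 | 2.9 |
| 16 | 2.8 |
| 5 | 2.7 |
| 41 | 2.5 |
| 78 | 1.9 |
| 12 | 1.6 |
| 22 | 1.2 |
| 67 | 0.8 |
| 77 | 0.7 |
| 89 | 0.5 |
| 43 | 0.5 |
| 13 | -0.6 |
| 71 | -0.8 |
| 34 | -2.4 |
| 27 | -3.2 |
| 91 | -6.2 |
| 46 | -6.5 |
| 6 | -12.0 |
| 2 | -12.3 |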

The Oilers' opponents down the stretch weren't so good in terms of outchancing. So while guys like Whitney, Pisani and Moreau saw more responsibility relatively speaking, globally it still wasn't very tough ice time. That's probably why this squad, and especially the Horc/Moreau/Pisani line and the Gilbert/Whitney pairing, looked much better down the stretch.

The poster boys for the phenomenon were CAR and COL. The former had a very rough sked over the first half, and the easiest sked in the league over the second half. Either that or they went on a run of always playing teams when their best players were injured, like Minny did a few years ago.

The opposite happened for COL. For whatever reason, the back half of the schedule doomed them to get outchanced to a higher degree, and they were no hell in that regard to begin with.

Great idea and execution, as usual.

To get your Vic's Qualcomp numbers, did you basically make a huge matrix with all 412 players?

Sunny,

It's actually a matrix of all 936 skaters in the league; you need to use everyone to get it to balance. Otherwise the iterative solution becomes unstable quickly.

Everyone in the league affects the results of everyone else in the league, to some degree, even if they were never on the ice together at the same time.

And you flatter me with the compliment on execution. I compensated for the crude and naive math by being uber-conservative throughout, however. So 55% is super safe.

Your Laplacian/Nelder-Mead solution would apply here too, Sunny. I use it for MLB; I just copied and pasted your R code. Crazy shit. It frightens me that I still don't understand the nuts and bolts of that math, and I've tried, Sunny.

If you want niche fame, Tangotiger is running a little comparison of forecasting tools. Second place would be a distant bell to that method, even if you didn't bother to correct for the corner positions. I've run the numbers for 1970 through 2009, and it's magic. Granted, I used baseball-reference.com for data, because they list fielding positions. They appear to use a player numbering system that differs from other sites, so a lot of players go randomly missing.

In that vein, I'm sure Jim is wrong about platoons and real effects. Think of it like hockey, Sunny ... there are no real effects with shooting% against and defenders either ... if NHL coaches get it right, there will appear to be no effect at all. NHL coaches actually overdo that a touch. MLB managers, collectively, are about spot on with platoons, so it disappears perfectly. It's like the bunting thing. Don't tell him, though.

Yeah, to be truthful, my first comment was made immediately after reading your post. (And your posts are always thought provoking!)

After thinking about it some more though, I did find it kind of curious that you used Tango-esque math for your model here. (i.e. pearson correlations, assuming normal distributions, subtracting variances, etc) In fairness though, you criticized your own model. :)

I assume you used this model due to ease? Culling all the data must've been a bitch in and of itself, so kudos to you for that.

Since you looked at the actual curves, in your opinion is the normal distribution assumption fair here? Also, how much do you think the model is affected by leaving out half the population in your sample?

As for your baseball comments, yeah I've actually been working up my SQL chops a little bit, so doing a projection system would be very doable with all the R stuff I learned for hockey. But I haven't had the time lately, plus I kinda wondered if I'd go through all the work and end up with similar results as all the others. Hearing you state otherwise gives me a little more motivation to maybe give it a go. We'll see.

Your last paragraph is very interesting. I'll have to think about it some more.

Awesome stuff, Vic.

The timing of your post is interesting, as I was recently looking at the predictive validity of team shot ratio from the 1st half of the season to the 2nd half at even strength (basically, the correlation between EV shot ratio in the 1st half and EV goal ratio in the 2nd half).

While the relationship was disappointingly weak (if I recall correctly, the average correlation was about 0.4 over the 5 or 6 seasons I looked at), I attributed that to the fact that so much of the variance in goal ratio over a short period of time is due to randomness. However, I wasn't able to figure out how to control for this element (that is, until I read your post, which does exactly that).

I recall reading a post of yours from early 2008 that examined the repeatability and predictive power of various EV stats at the team level. If I'm remembering right, you determined that both Fenwick and zone start differential were the best predictors of 2nd half EV outscoring in that particular season. When you conducted your analysis, did you factor out the proportion of the variance due to binomial chance variation? If not, then that might be something worth looking at in the future.

Hey Vic,

Would it be possible to do a similar analysis that looked at how QUALCOMP affects overall point totals instead? I know Corsi is the best way to measure a player's value, but my experience has been that for those who aren't familiar with advanced stats, point totals hold a ton of weight. I think it'd be nice to be able to say with some level of certainty how much benefit a player like OV or Sedin got this year due to weaker competition.

And thanks a lot for doing what you do. This is my first time commenting, but I've been following you guys here for about a month. Hockey stats are the shit.

Sunny,

Nah, the obvious weakness of this model (and Gabe's for that matter) is that defenders and forwards are given equal weight. It becomes obvious very quickly that forwards are driving the outchancing-at-evens bus ... defensemen are just riding it. There are a few exceptions, and obviously defenders generally play more ice time per game than forwards. And I doubt that this holds for special teams.

Still, at even strength, it is an obvious truth.

The next step would be to change the model so that the clock had two weights that affected the rate of time kept. Pennies on weight #1 would affect the rate of time kept at twice the rate of pennies on weight #2. Then weight #1 would be used for forwards, weight #2 for defencemen.

My script just uses jersey numbers, so it would be no small feat for me to add in the playing positions of all these guys on a game by game basis.

And who knows, from there the results may point to a better model yet, and the clock metaphor may have worn out its usefulness, I dunno.

In any case, at this point there is no sense busting out big math that few would understand anyways. The frequentist shortcut methods suffice here. And even with this fairly crude model the results are compelling.

Online sports stats fans prefer these methods anyways. I suspect that part of the reason for that is the failure of our post-secondary education system, particularly in the social sciences, and part of that is the influence of sabermetrics.

If there is one base stat that most folks are obsessed with, it's time. If there is one high level stat that sports fans are obsessed with it is Pearson's r. Though going by the way they use it in arguments ... I don't think that most of these folks know what it is.

Jlikens,

That post was just one sample, of course, but it does give a reasonable indication.

For what you are trying to do, I'd suggest you use the Trivia Craps model. There is a post from a couple of years ago on here that should give you enough information, J.

Lyle

I'm not interested in hockey pools, so that isn't something I'd pursue, personally.

I don't think that anyone has ever done this, but try the following:

Get the Qualcomp numbers for every forward last season using behindthenet.ca, maybe set the cutoff at 30 games played or so. Gabe's interactive selection tools make that easy to do.

Get the EV points/60 stats from somewhere (I don't think Gabe publishes those, but he might). Hell, even points per game would suffice.

Drop them into a spreadsheet and run a Pearson correlation.
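If you'd rather script it than use a spreadsheet, Pearson's r is short to compute by hand. A minimal Python sketch; the two lists you feed it (for example forwards' QUALCOMP and EV points/60, matched player by player) are up to you. Python 3.10 and later also ship `statistics.correlation`, which does the same job.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The 1/n factors cancel between numerator and denominator, so they are omitted
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)
```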

You should get a strong POSITIVE correlation between QUALCOMP and EVpoints/60. Higher QUALCOMP results in MORE scoring!!!!1!!11 Desjardins and the hockey nerds have it ALL WRONG!!

I don't know that this will be the result, but it's hard to imagine that it won't. Good forwards play a tonne against good forwards in this league, we all see that. So it's bound to yield that result.

And the final step: email that to someone who has a widely read NHL blog, but doesn't know much about hockey. Guys like Brownlee or Gregor at OilersNation would be good targets, but that might be a bit mean.

It would be funny, for one thing. Plus it would help to stratify the online fanbase. That's important if the conversation is going to continue to move forward.

Damn, that's actually very easy to do.

Desjardins has EV points per 60 minutes on his site.

copy, paste and correlate. Takes about two minutes total.

r=.41 btw. Strong, though I would have guessed a larger number.

So by and large, the tougher your competition the more you'll score!

Vic

Thanks for doing the leg work.

I repeated your process, but I used Corsi QoC from Gabe's site instead of QUALCOMP. I set the limits for all forwards this year at 40 games played and 5 min of ES ice time. I got r = 0.15, which is pretty weak. Not sure what to make of these numbers.

Lyle,

I'm not sure, but I think Corsi QoC is a global stat, QUALCOMP and Rel.Corsi QoC are relative to teammates.

Essentially the former would be incorporating strength of schedule.

So the relative numbers will work better in convincing casual fans that quality of competition is a meaningless measure.

As to "what does it mean?" ... nothing at all.

I suppose that folks who watch a lot of games and pay attention realize that playing with better players usually leads to playing against better players. That's just obvious.

Desjardins' QUALTEAM and QUALCOMP correlate positively (r=.41). So I suppose that tells us that there is likely genuine merit to the math that Gabe uses to calculate both. Nothing more or less.

Be very careful with r, Lyle. Depending on the role of luck in the game (generally modelled as binomial chance variation in these parts) and covariance with other factors (such as quality of teammate and quality of opposition being inextricably linked for NHL players) ... an r of .94 may be meaningless, and in another case an r of .23 may be damning evidence.

It's not a magic bullet.

I should add that, if we had a better model than "pennies on the weight" then the r would be higher.

Then if we did the same for quality of teammates we should get a high r also. In fact an r so high that the r^2s would add up to a lot more than 1. Frequentist math tells us that's not possible.

Of course it is possible, hell, wholly expected, if we understand the nature of the distributions of ability, qualcomp, chance variation and qualteam. And if we know how these interact with each other.

In this case we know that qualteam and qualcomp co-vary, so obviously the r^2s will add up to a whack more than 1.

Makes sense, no?

As I'm rambling :) ....

Pearson's r is representative of the ratio of covariance to variance. That's it.

So if distribution 'A' is height and distribution 'B' is weight ... Pearson's r is the covariance of A and B divided by the product of the standard deviations of A and B. Nothing more or less. That is the explicit definition.

If we make a few assumptions ... then r equals the standard deviation of A divided by the standard deviation of B. And r becomes super useful.
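Written out, the definition and one set of assumptions that turns it into a ratio of standard deviations look like this. The assumption used here, B = A + ε with the noise ε independent of A (the classic true-score-plus-noise setup), is one illustrative choice, not necessarily the exact set of assumptions meant above.

```latex
\[
r_{AB} = \frac{\operatorname{Cov}(A,B)}{\sigma_A \, \sigma_B}
\]
\[
B = A + \varepsilon,\quad \operatorname{Cov}(A,\varepsilon) = 0
\;\Rightarrow\;
\operatorname{Cov}(A,B) = \sigma_A^2,\qquad \sigma_B^2 = \sigma_A^2 + \sigma_\varepsilon^2
\]
\[
r_{AB} = \frac{\sigma_A^2}{\sigma_A \, \sigma_B} = \frac{\sigma_A}{\sigma_B},
\qquad r_{AB}^2 \, \sigma_B^2 = \sigma_A^2
\]
```

The last identity is the same r² × total variance shortcut used in the post, and it only holds when that additive, independent-noise structure actually holds.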

Even the best sabermetrician who posts regularly (Birnbaum) makes that assumption each and every time, without explanation. It's convenient, but it's not cricket. And it can lead to some wildass conclusions.

That's fine if you're betting against guys who are NOT using any kind of systematic reasoning at all ... you're better than them. But you're in deep trouble if you bet against someone who is smarter than you. Because being "pretty close" is the worst possible scenario if the guy you're betting against understands what you're doing wrong.

That's why the OLGC (Ontario Lottery and Gaming Corporation), in spite of continual and astounding incompetence, manages to keep taking higher margins with Pro-Line than chance alone would allow. They win above the natural hold, or what Sunny would call 'the juice'. And what a lot of cats are wrongly calling 'the vig'.

OLGC is oblivious to the fact that this is even occurring, much less why. Somewhere a consultant has made the government billions of extra dollars, just by having no respect for the market and a tremendous respect for the odds-setters. And it would seem that he is the only person who knows why.

(That cat deserves love, btw.) Beyond that, back when tied games still happened, pre-lockout, Pro-Line made a killing. Fans liked ties on the parlay a lot more than the odds-makers, and the Canadian government did very well; you're welcome.

There were some weekends where Pro-Line took a huge loss. Shit happens. Fuck. That's the game.

And in recent years, the cheating of the lines against the Canadian teams is inspired. You're going to have some bad weeks, and especially some bad weekends, when you play that game. And on each and every one of those OLGC will convene an emergency meeting and make a bad decision. Just childlike.

They're making more money than chance allows, over and over ... and still pointing an accusing finger at the poor fucker who's enabling their stupid asses. This happens every time they have a bad week.

This is all publicly available information. As for mistakes (leaving the game line open after the game has started, a bad line, etc.), if it's the former, OLGC buries it (Liverpool v Everton) and points the finger elsewhere.

Seriously, Lyle. Google OLGC. If there was ever a case for small government. Good Lord, it would be funny if it wasn't so damn sad.

Whew, I'm a bit overwhelmed Vic :)

Corsi QoC is a global stat. But competition is important for contextual purposes, no? I've thought that Corsi QoC and zone start stats help put a player's ice time in perspective. I see that the numbers above indicate that good players tend to play against good players, though there are important exceptions, like Sedin and Nicklas Backstrom (from WSH) this year.

Thanks for the warning about r. I don't know a lot about it, or what its pitfalls are, and if there's any reading you know of that's particularly insightful, it'd be awesome if you could shoot it my way.

I'm a little confused about the r^2 argument (not your fault, I just suck at math). If you're correlating two variables, and the r goes from -1 to 1, then I can't see how r^2 is ever more than one. I know I'm missing something, I just can't see what. Is it the interaction between the variables that changes the equation?

Googling OLGC right now...

I don't think that the Sedins or Backstrom are exceptions, but that aside ...

I've reread my comments on Pearson's r, and it is unclear to me what you disagree with.

Imagine this, a bunch of stats for high school shot putters in Alberta.

The correlation of participant height to shot put distance is r=.9. Would you really be comfortable saying that height of participant accounts for .9^2=81% of shot put distance? The remaining 19% is randomness, technique, other physical attributes and psychological factors.

If you're on board with that ... what if the correlation of participant weight to shot put distance is r=.8? Would you be just as comfortable saying that weight of participant accounts for .8^2=64% of shot put distance? The remaining 36% is randomness, technique, other physical attributes and psychological factors.

You can spot the problem, I'm sure.
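A quick simulation makes the shot put point concrete. In this hypothetical Python sketch, height, weight and distance all share one made-up "size" factor plus independent noise (the noise levels are arbitrary assumptions), so height and weight are strongly linked to each other. Each correlates strongly with distance, and their r² values sum to well over 1, which could not happen if each r² were a separate, non-overlapping share of distance.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def simulate_shot_putters(n=10_000, seed=1):
    rng = random.Random(seed)
    heights, weights, distances = [], [], []
    for _ in range(n):
        size = rng.gauss(0, 1)  # hypothetical shared "size" factor
        heights.append(size + rng.gauss(0, 0.3))
        weights.append(size + rng.gauss(0, 0.5))
        distances.append(size + rng.gauss(0, 0.3))
    return heights, weights, distances


heights, weights, distances = simulate_shot_putters()
r_height = pearson_r(heights, distances)  # about .92 in expectation
r_weight = pearson_r(weights, distances)  # about .86 in expectation
print(r_height ** 2 + r_weight ** 2)      # well above 1
```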

Google a baseball thread called SOLVING DIPS. It's this line of thinking run amok. Though in that case random chance accounts for enough of the variation to prevent them from adding up to well over 100%.

Where it goes for a spectacular shit is when the forms of the distributions vary.

Sunny asked a question above that was very valid, though I never bothered to answer it. The distribution of qualcomp is nothing close to normal (<1% confidence with an Anderson-Darling test), and neither is the overall result (~58%, which is very low considering how much of that, as shown, is down to chance variation, which is, by definition, normally distributed). And the K-S test to check whether both come from the same distribution fails miserably.
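For the curious, the two-sample Kolmogorov-Smirnov statistic is simple enough to sketch without a stats package: it's the largest gap between the two samples' empirical CDFs. The p-value below uses the large-sample Kolmogorov approximation, so treat it as rough for small samples. This is an illustrative Python sketch with hypothetical function names; `scipy.stats.ks_2samp` and `scipy.stats.anderson` are the library routines most people would reach for instead.

```python
from bisect import bisect_right
from math import exp, sqrt


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    gap = 0.0
    # The empirical CDFs only jump at observed values, so checking those suffices
    for x in a + b:
        gap = max(gap, abs(bisect_right(a, x) / n - bisect_right(b, x) / m))
    return gap


def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sided p-value for a K-S statistic d from samples of size n and m."""
    lam = sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        # The series converges poorly here, and the p-value is essentially 1
        return 1.0
    p = 2 * sum((-1) ** (k - 1) * exp(-2 * (k * lam) ** 2) for k in range(1, terms + 1))
    return min(max(p, 0.0), 1.0)
```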

How much does that affect things? I don't know or care. Though I'd advise everyone not to use this information for the purposes of wagering.

There's a limit to how much shit I'm going to take on this, however. The math in my post above represents the high water mark for sabermetrics math. And they sell it as rock solid and get nothing but compliments. Let's have some perspective, people.

I didn't mean to imply that I disagreed with any of your Pearson's R statements -- I'm only unsure about whether any QCOMP measure is meaningless. I thought that it was helpful to look at because it put a player's numbers into perspective.

No worries, Lyle. That commentary was aimed just generally up in the air.

When I read the sports internet (and I don't manage to read a fraction of the commentary on the Oilogosphere, much less elsewhere; I'm sure that I miss out on tonnes of great stuff), you see so many people just pulling a Pearson correlation out of the air, for any two sets of data, and presenting it as rock solid evidence of some sort. It's bananas. And there just doesn't seem to be any stopping it. This post, and the comment above, is a lameass attempt at that, though. At least it's a point of reference if anyone starts hitting you with the 'r' hammer.

Hell, the same mistake is still fairly common in published academic articles (at least in economics and the social sciences). So it will probably be a while before the magic bullet appeal of Pearson's r wears off.

Good thread. I for one have enjoyed your rambling, Vic. Imo it's important to hear intelligent counter-arguments to the current "standard" practices being used in sports statistics.

Vic -

I like reading this stuff, even though I probably understand about 30% of it, and I'm probably guilty of committing some of the errors you're referencing, although it seems fairly obvious to me, at least in the example you're using with the height and weight of shotputters; I would assume that height and weight are linked variables.

Say I wanted to get a better handle on this Bayesian versus frequentist stuff. What should I read?

mc,

Jim Albert has a book called Workshop Statistics: A Bayesian Perspective which is basically like an intro stats book written from a Bayesian point of view. It's very well written, heavily driven by real life examples (which I think is fantastic), and he gives an overview in the beginning on why he feels Bayesian methods are more practical than frequentist.

You might find the first half of the book elementary, but the last half of the book (particularly the last third) is the best, most easy-to-understand explanation of Bayesian inference I've ever read.

The book is available for free e-book download from Albert's website. Here is the link:

http://bayes.bgsu.edu/nsf_web/workshop.bayes.pdf
