On-Base Percentage
Just looking at overall team stats: add up a team's hits, walks and hit-by-pitch numbers, then divide that total by the same total surrendered by the team's pitching staff and defense. That gives the OBP ratio. This is the convention used by sabermetric types, no? The correlation to winning is enormous, and the correlation to runs ratio (runs scored divided by runs surrendered) equally so. A sample correlation of .90 or so is typical for any season or portion of one, and that is obviously huge.
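For anyone who wants to try this at home, here is a minimal Python sketch of the OBP ratio as described above, plus a plain Pearson correlation against runs ratio. The function names and the four teams' totals are invented for illustration only; they are not real season data.

```python
from math import sqrt

def obp_ratio(h, bb, hbp, h_allowed, bb_allowed, hbp_allowed):
    """Times on base for, divided by times on base against."""
    return (h + bb + hbp) / (h_allowed + bb_allowed + hbp_allowed)

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical team totals (NOT real data):
# (H, BB, HBP, H allowed, BB allowed, HBP allowed, runs for, runs against)
teams = [
    (1500, 550, 60, 1400, 500, 50, 800, 700),
    (1450, 500, 50, 1450, 500, 50, 750, 750),
    (1380, 450, 45, 1500, 560, 55, 690, 800),
    (1520, 600, 55, 1420, 480, 45, 830, 710),
]

obp_ratios = [obp_ratio(*t[:6]) for t in teams]
runs_ratios = [t[6] / t[7] for t in teams]

# One correlation across teams, for one season or one slice of a season.
print(round(pearson(obp_ratios, runs_ratios), 2))
```

With real numbers you would have one row per team for the season, or portion of the season, that you care about.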
Now baseball isn't much like trivia craps at all. The better team on the day usually wins. Whereas over 40 matches of trivia craps, I would guess about a quarter of the players will actually have won more money in matches where they did worse at the trivia. Stuff happens.
My question is: how well does it predict? If a team starts the year with a disappointing record but a strong OBP ratio, should we suspect that they've just been a touch unlucky, and that they will start scoring more runs, allowing fewer, and winning more games in the future? And the answer is yes, or at least I can't find a baseball stat that has stronger predictive value, not with a quick search of the web, anyways.
For the 2003, 2005, 2006 and 2007 seasons, the correlation between OBP ratio in one half of the season and runs ratio in the other half averages .43. As a point of reference for baseball fans, this is a touch better than the .38 predictive value of runs ratio to future wins ratio over the same interval (which is the essence of Bill James' Pythagorean Expectation).
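The half-to-half prediction can be sketched roughly like this: take each team's OBP ratio from one half of a season, line it up with the same team's runs ratio from the other half, correlate across teams, and then average the per-season correlations. The helper names and the three-team example are hypothetical, and how a season is split into halves is an assumption left to whoever runs it.

```python
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def split_half_prediction(first_half_metric, second_half_outcome):
    """Correlate a first-half metric with a second-half outcome across teams.

    Both arguments are dicts mapping team name -> value. Only teams that
    appear in both halves are used.
    """
    shared = sorted(set(first_half_metric) & set(second_half_outcome))
    xs = [first_half_metric[team] for team in shared]
    ys = [second_half_outcome[team] for team in shared]
    return pearson(xs, ys)

def average_over_seasons(per_season_correlations):
    """Plain mean of the per-season correlations."""
    return sum(per_season_correlations) / len(per_season_correlations)

# Hypothetical values (NOT real data) for a single season.
first_half_obp_ratio = {"Team A": 1.00, "Team B": 1.10, "Team C": 0.90}
second_half_runs_ratio = {"Team A": 1.05, "Team B": 1.02, "Team C": 0.88}

season_r = split_half_prediction(first_half_obp_ratio, second_half_runs_ratio)
print(round(season_r, 2))
```

Passing the same metric for both halves gives a repeatability number instead of a prediction number.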
I have the even strength face off zones for the NHL for these same seasons, and that's why I picked them. Obviously in both sports there are trades at the deadline, injuries, young players who get better with experience, differences in schedule difficulty from one half to the next, teams that are out of the playoff picture and so start playing youth in situations where they otherwise wouldn't, etc. And in hockey especially, the bounces are still going to have a big say in the results.
The chart below shows the average repeatability and predictive correlations for these four seasons. I used face off zones because to my mind they are a strong indicator of meaningful possession in hockey, and the even strength zone time ratio and Corsi ratio (shots directed at net ratio) aren't available for all of these seasons.
If you sum the first half of these four seasons, and compare it to the sum of the back half of these same four seasons (which I did accidentally at first, due to a script error) then both metrics grow stronger.
In the case of OBP, repeatability climbs to .70, prediction rises to .62.
In the case of face off zone ratio, repeatability climbs to .87, prediction rises to .57.
And in the case of face offs, the correlation to immediate results, averaged over these four seasons, is a fairly weak .35. With the four seasons combined, however, the benefit of the larger sample causes the correlation of face off zone ratio to scoring ratio to climb to .71.
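To show what combining the seasons does, here is a small Python sketch. It assumes that "summing" means adding up each team's raw counts across the four seasons before taking the ratio, which is my reading of the description above rather than something stated outright. The function names and the counts are hypothetical.

```python
def pooled_ratio(for_counts, against_counts):
    """Sum the raw counts across seasons first, then take one ratio."""
    return sum(for_counts) / sum(against_counts)

def mean_of_ratios(for_counts, against_counts):
    """Take a ratio for each season, then average those ratios."""
    ratios = [f / a for f, a in zip(for_counts, against_counts)]
    return sum(ratios) / len(ratios)

# Hypothetical first-half counts for one team over four seasons (NOT real data).
first_half_for = [410, 395, 430, 402]
first_half_against = [400, 405, 398, 410]

print(round(pooled_ratio(first_half_for, first_half_against), 3))
print(round(mean_of_ratios(first_half_for, first_half_against), 3))
```

The pooled version gives each team a single, larger sample per half, which is why luck has less room to move the numbers around.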
I know I'm painting with a big brush here, this is a bit rough. And I'm sure that baseball stats guys have done a tremendous amount of analysis of OBP and other team stats. Still, to my mind it's a comparison that has value. In large part because Pythagorean Expectation and OBP are fairly established metrics in the minds of a lot of the Oiler fans who hang out on the Oilogosphere.
Possession certainly isn't everything, but it's a lot of it. And it is the foundation for any sensible endeavour to place reasonably accurate expectations on hockey teams at even strength, and on hockey players as well.
As an aside: retrosheet.org is a terrific resource for baseball stats nuts. Everything is laid out for you, and it takes minutes to write an Excel macro that scrapes off what you need, for any range of seasons. It does not have pitch counts by inning though, or even pitch counts by game. Does anyone know where this is available? I suspect that the reason OBP is such a strong indicator of present and future success is that it is an indicator of pitch count, and of the ability of a team to get weaker pitchers (middle relievers) into the game. Though I'm certainly not sure of that.


