### Clutch, Baby!

In this post, I intend to use reason in order to make you believe in clutch hitting ability in major league baseball. Seriously.

Bill James, who is quite famous in the world of baseball, is an engaging writer and a critical thinker. Nobody has done more to get baseball fans thinking, though I doubt he lists math as one of his strengths. In 2004 he wrote a terrific article in The Baseball Research Journal which I stumbled across on Monday, I highly recommend it. Essentially he's questioning the validity of some of the methods used by most of the mathy baseball analysts of the world. If you consider his audience ... that takes some stones.

Not surprisingly, this piece generated a lot of commentary. Dr Jim Albert, who I think is the best baseball writer out there, chimed in as well. He offered fair, detailed and reasonable comments on James' article. The thrust of it:

"Although I agree with James’ general conclusions, unfortunately I think that he is unclear and sometimes wrong in some of his statements about chance variation."

There is a peculiar formality in the way that these baseball cats talk to each other on the internet, bless them.

After that the conversation drifts into a wandering discussion about whether clutch hitting exists in baseball. Prolific baseball writer Phil Birnbaum doesn't believe in clutch hitting ability at all, and he gathered a whack of clutch hitting stats to make his case. He used Late Inning Pressure Situations (LIPS) to define a clutch at-bat (AB), and he defined clutchness as the difference in batting average between LIPS ABs and all other ABs. So if a guy batted .340 in Late Inning Pressure Situations and .300 in other ABs, his clutchness would be +.040. Easy as beans. That makes sense to me, so I'm going to use Phil's data and definitions for my kick at the clutchness cat: that's the LIPS and non-LIPS ABs and hits for 16 seasons. I'll sum each player's results across those seasons, so I'll be working with one set of clutch data for each of 553 MLB hitters.
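As a quick check on the definition, here is a minimal Python sketch of the clutchness calculation. The function name and the hit and at-bat counts are made up for illustration; they are not taken from Phil's data.

```python
def clutchness(lips_hits, lips_ab, other_hits, other_ab):
    """LIPS batting average minus batting average in all other at-bats."""
    return lips_hits / lips_ab - other_hits / other_ab


# Hypothetical hitter: 34-for-100 in LIPS ABs, 300-for-1000 in everything else.
print(f"{clutchness(34, 100, 300, 1000):+.3f}")
```

A positive number means the hitter did better in LIPS at-bats, a negative number means he did worse.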

THE MODEL:

Imagine that there was an enormous balls-up at the Elias Sports Bureau. A disgruntled employee has falsified the LIPS batting averages for all players. He's kept the number of LIPS and non-LIPS ABs the same for everyone, and he's kept the total number of hits the same for everyone. But every time a customer downloads data, the hits are sprinkled over the LIPS and non-LIPS ABs completely randomly. So in 1974 Ron Cey still had 491 non-LIPS ABs, 123 LIPS ABs and 114 total hits ... but every time I access the Elias data those hits get shuffled into the LIPS and non-LIPS ABs. The first time I check I see that Ron Cey was the clutchiest of the clutchy in 1974, I check an hour later and he was a disgraceful choker in 1974. What the hell?

I keep downloading sets of this random data, the same stuff Phil compiled. Every time it's different, and by the time I realize what's going on I've downloaded and saved a whopping 1000 sets of random data. This isn't much of a stretch by the way, I'm not a quick study.
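For anyone who wants to play the disgruntled employee at home, here is a minimal sketch of one way to do the shuffle in Python. It assumes the hits are scattered uniformly at random over a player's at-bats. `shuffled_clutchness` and the seed are hypothetical, and this is not necessarily how the original script went about it.

```python
import random


def shuffled_clutchness(lips_ab, other_ab, hits, rng):
    """Scatter `hits` at random over all at-bats, then return LIPS avg minus other avg."""
    total_ab = lips_ab + other_ab
    # Pick which at-bats become hits; slots numbered below lips_ab are the LIPS ABs.
    hit_slots = rng.sample(range(total_ab), hits)
    lips_hits = sum(1 for slot in hit_slots if slot < lips_ab)
    other_hits = hits - lips_hits
    return lips_hits / lips_ab - other_hits / other_ab


rng = random.Random(1974)  # arbitrary seed so the run is repeatable
# The Ron Cey line quoted above: 123 LIPS ABs, 491 other ABs, 114 hits.
cey_downloads = [shuffled_clutchness(123, 491, 114, rng) for _ in range(1000)]
print(min(cey_downloads), max(cey_downloads))
```

Each call is one "download". The AB counts and the hit total never change, only where the hits land, which is why the same player can look clutch one time and like a choker the next.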

As I have these 1000 random seasons and also the real clutch data, I may as well make use of them. I plot out a bunch of them as histograms (that's just a bar chart, each bar covers a range of clutch averages, such as -.040 to -.035). The result is always a squiggly bell shape. Not too exciting. The actual clutch histogram is also a squiggly bell shape. To be expected, the universe is a squiggly place. It's also a little off centre, LIPS ABs probably come against better pitching a shade more often than not. It also looks like the bell lists to the right a bit, we'll call that left skewed.

It turns out that the actual data is spread out wider than the vast majority of the random seasons. Wider than 931 of them, in fact, using variance as the measuring stick.

Variance is a simple measure. If, using all the data for these 16 seasons, Ron Cey has a clutchness of +.020, and the overall league average is -.005, then he is .025 points from average. Square that (.025 x .025), then do the same for everyone. Take the average of the whole bunch.
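In code, the variance recipe and the two companion measures used in this post (average absolute distance and average absolute cubed distance from the mean) might look like the sketch below. The function names and the three-player league are hypothetical; `count_narrower` is one assumed way to arrive at a figure like "wider than 931 of them".

```python
def spread_measures(clutchness_values):
    """Average absolute, squared, and absolute-cubed distance from the mean."""
    n = len(clutchness_values)
    mean = sum(clutchness_values) / n
    distances = [abs(x - mean) for x in clutchness_values]
    return {
        "mean_abs": sum(distances) / n,
        "variance": sum(d ** 2 for d in distances) / n,
        "mean_abs_cubed": sum(d ** 3 for d in distances) / n,
    }


def count_narrower(actual_value, random_values):
    """How many of the random data sets have a smaller spread than the actual data."""
    return sum(1 for value in random_values if value < actual_value)


# Hypothetical three-player league with a mean of -.005, only to show the arithmetic.
print(spread_measures([0.020, -0.005, -0.030]))
```

Running `count_narrower` on the actual variance against the variances of the random data sets gives the ranking.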

$\sum |x - x_o| / n$ is similar to variance, we just don't square the differences. We have to make sure they are all positive numbers though.

$\sum |x - x_o|^3 / n$ is as above, except we don't square the differences, we cube them. Again, we have to make sure they are all positive numbers.

Using Jim Albert's equation from the article above. The sum of variances of luck and ability distributions equals the variance of the actual distribution.
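In symbols, that relationship and the split it implies would be something like the following. The $\sigma^2$ labels are chosen here for illustration, and reading "skill" as ability's share of the total variance is an assumption about how the percentages were derived.

```latex
\sigma^2_{\text{actual}} = \sigma^2_{\text{luck}} + \sigma^2_{\text{ability}}
\quad\Longrightarrow\quad
\text{skill share}
  = \frac{\sigma^2_{\text{ability}}}{\sigma^2_{\text{actual}}}
  = 1 - \frac{\sigma^2_{\text{luck}}}{\sigma^2_{\text{actual}}}
```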

Therefore:

Hitting clutchness, as defined by Phil Birnbaum and using his data, was 10.4% skill and 89.6% luck.

That strikes me as a naive assumption though, nature probably hasn't been kind enough to distribute clutch ability in Gaussian (Normal Distribution) fashion throughout the rosters of MLB.

We can build our own model for ability, parlay it through the luck distribution (the average of the 1000 random seasons) and see how close we come to the actual, or observed, distribution that Phil provided.

Trial 1: Assume that most hitters have no clutch or choking qualities. Apply .010 of added clutchness to 100 random hitters. Deduct .010 of clutchness from 100 random hitters. Run 1000 simulations.
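A rough sketch of how a trial like this could be wired up in Python is below. Everything in it is an assumption for illustration: the player list is a made-up stand-in for Phil's 553 hitters, the function names are hypothetical, and the ability is simply added on top of the shuffled luck result, which is about as heavy handed as it gets.

```python
import random


def shuffled_clutchness(lips_ab, other_ab, hits, rng):
    """One random reshuffle of a player's hits over his at-bats."""
    hit_slots = rng.sample(range(lips_ab + other_ab), hits)
    lips_hits = sum(1 for slot in hit_slots if slot < lips_ab)
    return lips_hits / lips_ab - (hits - lips_hits) / other_ab


def assign_ability(n_players, n_each, delta, rng):
    """+delta for n_each random hitters, -delta for n_each others, 0 for the rest."""
    chosen = rng.sample(range(n_players), 2 * n_each)
    ability = [0.0] * n_players
    for i in chosen[:n_each]:
        ability[i] = delta
    for i in chosen[n_each:]:
        ability[i] = -delta
    return ability


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulated_variances(players, ability, n_sims, rng):
    """Variance of luck-plus-ability clutchness for each simulated data set."""
    results = []
    for _ in range(n_sims):
        season = [shuffled_clutchness(lips, other, hits, rng) + a
                  for (lips, other, hits), a in zip(players, ability)]
        results.append(variance(season))
    return results


rng = random.Random(0)  # arbitrary seed
# Hypothetical stand-in for the 553 hitters: (LIPS ABs, other ABs, total hits).
players = []
for _ in range(553):
    lips_ab, other_ab = rng.randint(50, 150), rng.randint(300, 600)
    players.append((lips_ab, other_ab, round(0.275 * (lips_ab + other_ab))))

ability = assign_ability(len(players), 100, 0.010, rng)
variances = simulated_variances(players, ability, 20, rng)  # the post ran 1000
print(sum(variances) / len(variances))
```

The simulated spreads can then be compared against the spread of the actual data to see where the actual data ranks. Changing the `0.010` argument is all it takes to try a different adjustment.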

Result: It's an improvement over the assumption that no clutch ability exists, this by all three measures above, but not enough. We need to bump it up a bit more.

Trial 2: Same as trial 1 but crank it up to .020 points added or deducted to clutchness.

Result: It's an improvement over Trial 1, and we're getting close to the 50th percentile by all three measures above. But still not enough. We need to bump it up just a shade more.

Trial 3: Same as trial 2 but crank it up to .025 points added or deducted to clutchness.

Result: Now we've gone much too far. In around the 35th percentile range for the three measures I'm using.

Trial 4: Let's try .022 as an adjustment.

Result: Ah, that's the stuff. The result we create with the model matches the actual data very closely on all three measures. All measures would rank close to 500th when compared to the 1000 random seasons.

That's all that I have done. No more or less. From here out it's straightforward though, we can refine the ability distribution to give a perfect result if we try. I wouldn't bother at this point, though. Firstly because my implementation here was a bit heavy handed, clutchness should be added into the ability distribution in a different way. Secondly because the difference between the actual data and the random data could still be the product of randomness. Or, equally likely, clutch ability is larger than I'm indicating here. Randomness is the essence of the universe, after all. Best to run the same procedure on several different sets of data, methinks. We're still painting with a big brush at this point.

* I may well have made a mistake along the way, either in logic or in coding, so please do not use this information for the purpose of wagering.

* None of the 1000 sample seasons resulted in a wider spread of results by all three measurements (absolute average difference from the mean, average squared difference from the mean and average absolute cubed difference from the mean) than the actual results, though this may be due in part to the fact that overall the players averaged a -.0057 clutchness. I don't suspect that is fatal, but this offset does escape the Lutheran philosophy of the model.

## 15 Comments:

Vic said: That strikes me as a naive assumption though, nature probably hasn't been kind enough to distribute clutch ability in Gaussian (Normal Distribution) fashion throughout the rosters of MLB.

--

If we buy that Eddie Murray is more likely to hit a baseball hard at any time, and that he is likely to hit with men on base more often than Mark Belanger, then why do we need to expand 10.4%? Aren't we really discussing "number of sorties" as opposed to "performance when death is on the line?"

I'm not sure that I follow you.

The model, as crude as it is, firstly tries to separate "clutch situations" from others by using late inning at bats with the game still within reach. I like LIPS because it doesn't try to do too much. It doesn't account for different 'runners on base' scenarios or platoon effects, it's simple and honest.

Secondly, it tries to separate luck from ability. If you assume that the distribution of ability is Gaussian, then the math gets piss simple. But that's not cricket.

A guy like Andy Dolphin likes to add the variances, a guy like Phil Birnbaum likes correlations. Both are clever buggers, and both are taking shortcuts to describe the missing element. If the distribution is Gaussian and if the luck distribution is as well (if the sample isn't huge it probably won't be, btw) then they will come to precisely the same conclusion in any test. 37% luck ... or whatever.

But what if the talent is distributed in more of a bullet shape than a bell shape? Then one of those ideas will see the luck% go up a bunch, and the other will see it plummet. I haven't checked that with a model, perhaps I should before speaking, but the world is round, surely that's the way it works.

In any case, if you're cheering for a team that is making personnel decisions based on clutchness, you should save yourself years of future heartache and just walk away now.

What if you're cheering for a team that is making personnel decisions based on, well, we're not sure what they are basing them on.

Quality cheap bets are pushed aside while expensive risky gambles are brought in in their place.

Basic tenets of the good old hockey game are ignored, things like experience, the ability to get the puck going in the right direction, the ability to defend, kill penalties, win faceoffs, win puck battles.

My oh my the list is a long one.

Would walking away suffice in this case or perhaps should one step in front of a bus?

;)

Welcome back Vic. Post more. Please.

I was going through your archives a few weeks back, just some terrific stuff. As always I struggle with the math but I enjoy reading it and I get the point. It just takes me longer.

"That strikes me as a naive assumption though, nature probably hasn't been kind enough to distribute clutch ability in Gaussian (Normal Distribution) fashion throughout the rosters of MLB."

Vic, I've heard you say that sort of thing before, and I wanna ask you a question.

Why does it matter what the actual distribution of the population looks like?

You're going to be taking a bunch of random samples (hopefully of a big enough size) and finding those means, and those means will be normally distributed no matter what the population distribution looks like, right? (CLT)

Also, for Birnbaum's numbers of players 1974-1990 I get .276 avg on 970,471 non-clutch AB's and .268 avg on 159,688 clutch AB's.

If we assumed there were no clutch ability and any deviation from a .275 avg by the population was pure luck, we'd expect 99 percent of the samples of 159,688 AB's to be between .272 and .278.

.268 is out of range by like almost 7 SD's.

Am I missing something?

Sunny:

Yeah, the luck distribution is going to be near enough gaussian. The model builder makes it so.

The distribution of talent, however, is in the hands of Mother Nature. And it can be anything at all.

Sunny:

I summed the ABs and hits for each player in each of the two situations (LIPS and non-LIPS) over the time period and found the average difference to be .0057, LIPS batting average being lower. I may have made a programming error, it wouldn't be the first or last time.

Have I? Frankly I don't follow the reasoning in the rest of your comment. Just reword it and I'll probably follow.

Bah, nevermind, I see my mistake. The proportion of each player's non-clutch ABs to the total population's non-clutch ABs is not the same as the proportion of his clutch ABs to the total population's clutch ABs over that time period, so I can't just take total averages. I'd need to weight each player's coin (i.e. at-bat) differently (i.e. based on his assumed true batting average).

Ok, I also had syntax errors in my original spreadsheet, so I reran everything. Here's what I get:

non-clutch ABs: 970,471

non-clutch Hits: 267,477

non-clutch Avg: .276

clutch ABs: 170,327

clutch Hits: 46,055

clutch Avg: .270

total (i.e. clutch + non-clutch) ABs: 1,140,798

total Hits: 313,532

total Avg: .275

expected average using each player's number of clutch ABs weighted by that player's total BA: .274

So, what I'm saying is... Imagine we have a coin that we know comes up heads .274 of the time. If we ran an infinite number of 170,327 coin flip samples, we'd expect 99 percent of those samples to have between .2712 and .2768 heads.

Here we have a sample that came up with .2704 heads, which is 2.57 standard deviations away from our expected value. In other words, the probability of getting that many heads in that many flips is .0051 (i.e. half a percent, or, "pretty fucking unlikely").
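The coin-flip arithmetic can be checked with a few lines of Python, using the normal approximation to the binomial. This is only a sketch: the variable names are made up, and it starts from the expected average as rounded in the comment (.274), since the unrounded figure isn't given.

```python
import math


def binomial_sd(p, n):
    """Standard deviation of the observed rate over n independent trials."""
    return math.sqrt(p * (1 - p) / n)


p_expected = 0.274          # expected clutch average, as rounded in the comment
n_clutch_ab = 170_327
observed = 46_055 / n_clutch_ab

sd = binomial_sd(p_expected, n_clutch_ab)
low, high = p_expected - 2.576 * sd, p_expected + 2.576 * sd  # central 99%
z = (p_expected - observed) / sd
print(round(low, 4), round(high, 4), round(z, 2))
```

With the rounded .274 as input, the 99 percent range matches the .2712 to .2768 quoted above, but the observed average lands roughly 3.3 standard deviations out rather than 2.57, so the 2.57 was presumably computed from an unrounded expected average.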

Yeah, I'm not disputing that Late Inning Pressure Situations resulted in lower batting averages. This era saw the rise of the reliever in baseball. And if the starter is still in the game, it's probably because he's pitched really well and still has a low pitch count.

That won't have much of an impact on the spread of the bell curve, though. Which is what we looked at. It will just move the whole thing to the side.

It may well have some impact, just not much. I should tweak the script, just to be sure.

Pat,

Thanks for the love. If you're struggling to follow my math though, then that means I'm not explaining my reasoning well enough.

Sunny,

Thanks for pushing me to build a model that accounts for the lower batting average in LIP situations (we'll call that clutch for the purposes of this conversation).

I accounted for this in two different ways:

1. reduce the clutchness area that the hits could fall upon by 1.67%

2. build in a filter that screens all of the hits that are about to land in the clutchness area, and then randomly redirects 1.67% of them over to the nonclutch area.

Both yield the same result. It won't matter to you, the variance drops just a shade. Variance is all you need to know if you've decided that all distributions are Gaussian. (x-xo) and (x-xo)^3 change dramatically, though. Cool stuff.

Damn. That was Bill James' original point: that the Cramer Test (which is a very crude version of the Real Effects tests commonly done on this site) might be oblivious to significant items.

I don't know if you've read the links above and gone further. In a nutshell, Birnbaum started chasing James around with a big linear regression sledgehammer. James countered with "Mapping The Fog", in which he built a non-Gaussian model for clutch hitting which he hoped would fuck Phil up. He was marginally successful. Unfortunately he was probably thinking of Eddie Murray when he did it, because the model was a thin bell with thickish tails, which was counterproductive to his intent. Plus he didn't bother to incorporate any transience (streakiness) into the stat, which was one of the central thrusts of his original paper.

I thought as I was reading "Mapping The Fog" that, at the very least, the dude should have made the distribution more bullet shaped. Looks like nature already had, I'll be damned. I suspect that Zimmer is right btw, that it is transient. Hitter BABIP and Ks in LIP situations would really help me here.

The nice thing about what nature has handed us is that we will see little deterioration with smaller player samples.

Of course LIPS batting average may not match people's version of clutch. And you could spend a lifetime digging through the minutiae (platoon effects, more hard throwing fastball pitchers in LIP situations, etc.) but the fact is that a LIPS hit is a LIPS hit. And while a great many LIPS ABs wouldn't feel like clutch ABs to most, almost all clutch ABs would qualify as LIPS ABs.

Naw Vic, it's not you, it's me.

I've said that a few times.

When it comes to simple math I am very good. Can do mad sums in my head like Matt Damon in that Minnie Driver movie.

But anything even a little bit complicated and the fog comes in off the bay.

Vic,

I was re-reading through this and something else occurred to me.

Say we look at four distributions:

1) non-LIPS observed distribution

2) non-LIPS chance distribution

3) LIPS observed distribution

4) LIPS chance distribution

Comparing number 3 to number 4 and seeing a wider spread doesn't really tell us that players are clutch because I assume number 1 is also wider than number 2.

I.e. we know that batting average is a skill, so even in non-LIPS ABs the ability spread will be wider than the chance spread. What we really want to know is if the DIFFERENCE between ability and chance spreads is GREATER in LIPS ABs than in non-LIPS ABs, no?

Let me clarify something about my last question.

In your model, you weighted each player's "coin" differently by his batting average for that particular season. Would that lead to extraneous error because the samples are small enough such that a player's batting average for one season might be quite a bit off from his "true" batting average?

So what I'm saying is, would it be better to create two separate chance distributions by weighting each player's coin to the league mean LIPS batting average and league mean non-LIPS batting average? Then you'd have the four distributions I listed in my last question, and you could compare how much (if any) wider the [LIPSobserved minus LIPSchance] is than the [nonLIPSobserved minus nonLIPSchance].

Make any sense?
