Saturday, May 15, 2010

Streaks

Streaks are interesting phenomena. They are also very difficult to pin down with any language, be it spoken or mathematical. On top of that, the human brain seems to have evolved to recognize patterns, and we can spot them even where they don't exist. Children can't look at a cloud, a stipple ceiling or the grain in wood panelling without seeing an image. Ask one if you don't believe me.

One way of analyzing streakiness is to look at rolling averages. For example, if you looked at Canucks forward Alex Burrows last season, and plotted out the 20-game rolling average of his shooting percentage (even strength, no empty netters), you'd get this:
We can evaluate Burrows' streakiness, or inconsistency, by summing up how many black pixels we needed to draw that picture. Then we compare that to how many black pixels we'd expect to use by chance alone. In Alex's case, nearly 99 times out of 100 we'd expect to use fewer black pixels. So he had a streaky season. How much of that was down to chance and how much was down to circumstance? It's a good question. The first thing you'd need to do is look at every other forward in the league, but that's a subject for another day.

This idea is taken from a 2008 article in The Journal of Quantitative Analysis in Sports. The measure has been dubbed the BLACK stat.
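
For the curious, here is a rough sketch of the pixel-counting idea in Python. It assumes the amount of black ink is proportional to the gap between the rolling average and the overall average, summed over every window. That is my reading of the idea, not necessarily the exact scale the script behind the graph uses, and the names rolling_average and black_statistic are made up for illustration.

```python
def rolling_average(series, window=20):
    """Success rate over each run of `window` consecutive trials."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def black_statistic(series, window=20):
    """Assumed stand-in for the pixel count: the total absolute gap
    between the rolling average and the overall success rate."""
    overall = sum(series) / len(series)
    return sum(abs(avg - overall) for avg in rolling_average(series, window))


# Two made-up 100-trial series with 10 successes each.
even = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1] * 10   # one success every tenth trial
clustered = [1] * 10 + [0] * 90              # every success in one hot streak

print(black_statistic(even))
print(black_statistic(clustered))
```

A perfectly even series never strays from its overall average, so it needs no ink at all, while a series with all of its successes bunched together needs a lot.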


For now I have a little test, so you can check yourself for streak bias:

If you rolled a ten-sided die 100 times, counting a seven as a success and any other number as a failure, and you happened to get 10 sevens ... what sort of pattern would you expect?

The pattern immediately below this paragraph would be too consistent to be true. We've all played enough board games to know that dice don't work that way:
0000000001000000000100000000010000000001000000000100000000010000000001000000000100000000010000000001

Copy that series of 0s and 1s into a word processor or text editor, and drag the 1s around in the series until it seems properly random to your mind.

Next, copy and paste the modified series of 0s and 1s into this URL. Simply replace the original series in the URL with your modified version. Open a browser window with that URL, and voilà, you'll get a little graph like the one above for Burrows, and you'll also get a bit of BLACK stat data. The number to notice is the BLACK statistic rank, which shows how many pixels your series required compared to 10,000 randomly simulated series.
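
If you'd rather see how a rank like that can be worked out, here is a hedged sketch in Python. It assumes the ink is measured as the summed absolute gap between the 20-trial rolling average and the overall rate, and that the rank is the percentage of shuffled copies that need less ink than your series. Neither assumption is confirmed against the real script, so the numbers will not match its scale, and the function names are hypothetical.

```python
import random


def black_statistic(series, window=20):
    """Assumed ink measure: summed absolute gap between the rolling
    average and the overall success rate."""
    overall = sum(series) / len(series)
    count = sum(series[:window])              # successes in the first window
    total = abs(count / window - overall)
    for i in range(window, len(series)):
        count += series[i] - series[i - window]   # slide the window one trial
        total += abs(count / window - overall)
    return total


def black_rank(series, window=20, clones=10_000, seed=0):
    """Percentage of randomly shuffled copies that use less ink."""
    rng = random.Random(seed)                 # fixed seed so runs repeat
    observed = black_statistic(series, window)
    shuffled = list(series)
    below = 0
    for _ in range(clones):
        rng.shuffle(shuffled)
        if black_statistic(shuffled, window) < observed:
            below += 1
    return 100 * below / clones


# The too-consistent series from above: a 1 in every tenth spot.
text = "0000000001" * 10
series = [int(ch) for ch in text]
print(black_rank(series))
```

Swap your own string of 0s and 1s in for text to try it. A rank near 50 means your series looks like a typical shuffle; a rank near 0 means the 1s are spread out more evenly than chance usually manages.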

Please note that this will NOT count towards your final grade. If you're going to do it, trust your instincts. I got a shockingly small ranking with my first go, implying that I was subconsciously putting a pattern into my random series. That's not good, but it's humanity. Or so I think.

30 Comments:

Blogger JLikens said...

Good stuff.

My black statistic was 70; percentile rank was 46.1.

5/15/2010 12:20 pm  
Blogger R O said...

That was fun.

My output:


interval: 20
trials: 100
successes: 10

Player's BLACK statistic: 29

BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
• 5% clone BLACK statistic: 39
• 50% clone BLACK statistic: 72
• 95% clone BLACK statistic: 117

BLACK statistic rank (percentile of clone seasons): 0.8

5/15/2010 12:40 pm  
Blogger Vic Ferrari said...

RO

I was at the 1.7 percentile. Damn, you and me must see patterns everywhere. J at the 46th percentile is impressive, I thought everyone would be well below the 50th.

5/15/2010 1:24 pm  
Blogger R O said...

Vic: I think I might have tried to overcompensate with a few too many runs of 1's. I like to think I have a pretty good grasp of the concept that luck can produce runs, it turns out that maybe I have too good a grasp :-D

Incidentally I have been accused before of seeing luck everywhere, maybe there's a grain of truth in that.

5/15/2010 1:31 pm  
Blogger Vic Ferrari said...

No RO, the opposite. We both spread them out too much, that's why we didn't use enough black ink. The nearer the 50th percentile the nearer you are to a likely random series.

By this little test, JLikens is like a robot. You and me ... not so much.

5/15/2010 1:51 pm  
Blogger R O said...

Damn, I didn't think of that. I was trying to get away from that original 9-zeros-1-one series, too much you say?

In any case I think there was just too much deterministic thinking-of-how-to-do-it, same for you I take it? That's my takeaway from this exercise.

5/15/2010 2:03 pm  
Blogger Vic Ferrari said...

Just to add, RO, we shouldn't feel bad. I checked the site stats at timeonice.com, the full url shows up as a hit. So I could click it and link through to their results.

And it looks like, of the last 10 people to have a go, 8 are in single digits in terms of percentile rank. Then there is J and one person with a whopping 88.7 ranking.

So we shouldn't feel bad. :D

5/15/2010 2:06 pm  
Blogger Vic Ferrari said...

You weren't meant to think at all RO, just spread around the '1's until it was something that intuitively looked entirely random.

5/15/2010 2:07 pm  
Blogger R O said...

"You weren't meant to think at all RO, just spread around the '1's until it was something that intuitively looked entirely random."

I'm not even sure it was a conscious decision. Just the thoughts popped up "more runs", and "get away from even spacing".

It's good to know, though, that others do the same thing. Like you say, it might be human nature, and few have the desire to be unnatural :-D

5/15/2010 2:19 pm  
Blogger Jeff J said...

JLikens was more random than my three trials with a rng: 76.8, 41, 63.3

5/15/2010 7:15 pm  
Blogger Coach pb9617 said...

interval: 20
trials: 100
successes: 10

Player's BLACK statistic: 44

BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
• 5% clone BLACK statistic: 39
• 50% clone BLACK statistic: 72
• 95% clone BLACK statistic: 116

BLACK statistic rank (percentile of clone seasons): 8.1

5/15/2010 10:11 pm  
Blogger Scott Reynolds said...

This is a fun game! I screwed up on my first try by only including seven ones but on my second try I got this:

interval: 20
trials: 100
successes: 10

Player's BLACK statistic: 62

BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
• 5% clone BLACK statistic: 40
• 50% clone BLACK statistic: 72
• 95% clone BLACK statistic: 116

BLACK statistic rank (percentile of clone seasons): 32

5/15/2010 11:19 pm  
Blogger Matt said...

Player's BLACK statistic: 42

BLACK statistic rank (percentile of clone seasons): 6.7

Point made!

5/16/2010 10:48 am  
Blogger MikeP said...

Radiolab had a couple of segments on randomness and human perception thereof:

http://www.wnyc.org/shows/radiolab/episodes/2009/09/11/segments/133415

I think that one has some tangentially related stuff.

http://www.wnyc.org/shows/radiolab/episodes/2009/09/11

That whole episode is explicitly about randomness.

If you're interested in human nature, you owe it to yourself to listen to that podcast.

5/16/2010 10:58 am  
Blogger mc79hockey said...

First try:

Player's BLACK statistic: 98

BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
• 5% clone BLACK statistic: 40
• 50% clone BLACK statistic: 72
• 95% clone BLACK statistic: 117

BLACK statistic rank (percentile of clone seasons): 83.6

5/16/2010 12:49 pm  
Blogger Jonathan Willis said...

Ick, apparently I'm also bad at random:

Player's BLACK statistic: 57

BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
• 5% clone BLACK statistic: 39
• 50% clone BLACK statistic: 72
• 95% clone BLACK statistic: 116

BLACK statistic rank (percentile of clone seasons): 24.5

5/16/2010 4:09 pm  
Blogger Jonathan Willis said...

This comment has been removed by the author.

5/16/2010 4:11 pm  
Blogger Jonathan Willis said...

This comment has been removed by the author.

5/16/2010 4:13 pm  
Blogger Jonathan Willis said...

I managed to get it up to 50.8%

5/16/2010 4:17 pm  
Blogger Passive Voice said...

What are the odds that you'd get at least 2 consecutive ones in a random draw?

5/17/2010 7:29 pm  
Blogger Vic Ferrari said...

Passive Voice:

I just wrote a quick script, and it looks like you'd expect back to back ones about 63% of the time by chance alone. Assuming I haven't screwed up, that's higher than I would have guessed. Seem right to you?
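
That figure can also be checked exactly with a bit of counting, assuming the question means 10 ones shuffled at random among 100 slots. The number of arrangements with no two ones side by side is C(91, 10), out of C(100, 10) arrangements in total. A small Python sketch, with a made-up function name:

```python
from math import comb


def prob_adjacent_ones(trials=100, successes=10):
    """Chance that a random arrangement of `successes` ones among
    `trials` slots has at least one pair of back-to-back ones."""
    # Arrangements with no two ones adjacent: C(trials - successes + 1, successes)
    no_adjacent = comb(trials - successes + 1, successes)
    return 1 - no_adjacent / comb(trials, successes)


print(round(prob_adjacent_ones(), 3))
```

This comes out at roughly 0.63, in line with the 63% from the simulation.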

5/17/2010 10:25 pm  
Blogger Passive Voice said...

I'm definitely gonna defer to you on this one. It seems sorta high, but that reminds me of that "what're the odds that two people in this class of 30 share a birthday" question that clever middle school teachers use to dazzle 13-year-olds.

The reason I ask is because I realized after my first go round that I had been avoiding repeat numbers because, I guess, two consecutive ones didn't seem random enough or something.

5/18/2010 2:01 pm  
Blogger mediumTermReader-ThirdTimePoster said...

I got BLACK statistic rank (percentile of clone seasons): 85.4 with this string 0111101000000000000000000000000000000000000000000000000000000001100000000001000010000000000000000100

I read an article about a prof teaching a first year stats class who got half the class to flip a coin 100 times and write down the series of heads and tails. He got the other half to fake the results with what they thought random should look like. The prof was in the high 90s at guessing which was real and which was fake. Random is a really hard concept to wrap your head around.

5/18/2010 4:44 pm  
Blogger quain said...

This comment has been removed by the author.

5/20/2010 11:45 pm  
Blogger quain said...

Player's BLACK statistic: 85.1

BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
• 5% clone BLACK statistic: 40.6
• 50% clone BLACK statistic: 73.2
• 95% clone BLACK statistic: 117.3

BLACK statistic rank (percentile of clone seasons): 68.8

If my series was a player, he would have been chided for failing to show up in the stretch run.

5/20/2010 11:54 pm  
Blogger NathanaelDrumm0113家明 said...

Without friendship, what joy is there in life?

5/23/2010 10:25 pm  
Blogger 奎峰 said...

People are ruled by their imagination.

5/31/2010 12:04 pm  
Blogger Pete. said...

Player's BLACK statistic: 66

BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
• 5% clone BLACK statistic: 39
• 50% clone BLACK statistic: 72
• 95% clone BLACK statistic: 116

BLACK statistic rank (percentile of clone seasons): 39.7

My player was kinda Selivanov-like.

6/01/2010 10:59 am  
Blogger slipper said...

interval: 20
trials: 100
successes: 10

Player's BLACK statistic: 115

BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
• 5% clone BLACK statistic: 39
• 50% clone BLACK statistic: 72
• 95% clone BLACK statistic: 115

BLACK statistic rank (percentile of clone seasons): 94.5

6/03/2010 8:00 am  
Blogger slipper said...

How do I collect my t-shirt?

6/03/2010 9:51 am  
