Saturday, May 15, 2010

Streaks

Streaks are interesting phenomena. They are also very difficult to pin down with any language, be it spoken or mathematical. On top of that, the human brain seems to have evolved to recognize patterns, and we can spot them even where they don't exist. Children can't look at a cloud, a stipple ceiling or the grain in wood panelling without seeing an image. Ask one if you don't believe me.

One way of analyzing streakiness is to look at rolling averages. For example, if you took Canuck forward Alex Burrows last season and plotted his 20-game rolling average of shooting percentage (even strength, no empty netters), you'd get a graph of his shooting form over the course of the season.

We can evaluate Burrows' streakiness, or inconsistency, by counting how many black pixels we needed to draw that graph. Then we compare that with how many black pixels we'd expect to need by chance alone. In Alex's case, nearly 99 times out of 100 we'd expect to use fewer black pixels, so he had a streaky season. How much of that was down to chance and how much to circumstance? It's a good question. The first thing you'd need to do is look at every other forward in the league, but that's a subject for another day.
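
The chance comparison above is a permutation test: keep the same successes, shuffle their order many times, and see where the real season ranks. Here is a minimal Python sketch of that idea, under a few assumptions. Each trial is a 0/1 outcome (a shot that scores or doesn't), and the window is 20 trials rather than 20 games. The function names are made up for illustration. The "ink" is approximated as the total up-and-down travel of the rolling-average line, which may not match the exact pixel count used in the 2008 article or by the timeonice.com tool.

```python
import random

# Minimal sketch. ASSUMPTION: "black ink" is approximated by the total
# up-and-down travel of the rolling-average line. The 2008 article and the
# timeonice.com tool may count pixels differently. Names are hypothetical.


def rolling_counts(trials, window=20):
    """Number of successes in each full window of consecutive trials."""
    current = sum(trials[:window])
    counts = [current]
    for i in range(window, len(trials)):
        # Slide the window by one trial: add the newest, drop the oldest.
        current += trials[i] - trials[i - window]
        counts.append(current)
    return counts


def rolling_rate(trials, window=20):
    """Rolling success rate in percent (e.g. shooting percentage)."""
    return [100.0 * c / window for c in rolling_counts(trials, window)]


def ink_units(trials, window=20):
    """Total vertical travel of the rolling line, in one-success steps."""
    counts = rolling_counts(trials, window)
    return sum(abs(b - a) for a, b in zip(counts, counts[1:]))


def black_ink(trials, window=20):
    """The same travel expressed in percentage points."""
    return 100.0 * ink_units(trials, window) / window


def percentile_rank(trials, window=20, clones=10_000, seed=None):
    """Percent of shuffled 'clone seasons' that need less ink than the real one."""
    rng = random.Random(seed)
    observed = ink_units(trials, window)
    clone = list(trials)
    fewer = 0
    for _ in range(clones):
        rng.shuffle(clone)  # same successes, random order
        if ink_units(clone, window) < observed:
            fewer += 1
    return 100.0 * fewer / clones


evenly = [int(c) for c in "0000000001" * 10]  # the too-consistent series
print(black_ink(evenly), percentile_rank(evenly, seed=1))  # 0.0 0.0
```

Under this sketch, a rank near 50 is what chance alone tends to produce. The evenly spaced series from the test later in this post draws a flat line and ranks at 0, while Burrows' season, at roughly 99, sits at the opposite extreme.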

This idea is taken from a 2008 article in the Journal of Quantitative Analysis in Sports. It has been dubbed the BLACK stat.


For now I have a little test, so you can check yourself for streak bias:

If you rolled a ten-sided die 100 times, counting a seven as a success and any other number as a failure, and you happened to get 10 sevens ... what sort of pattern would you expect?

The pattern immediately below this paragraph would be too consistent to be true. We've all played enough board games to know that dice don't work that way:
0000000001000000000100000000010000000001000000000100000000010000000001000000000100000000010000000001

Copy that series of 0s and 1s into a word processor or text editor, and drag the 1s around until the series seems properly random to you. Keep it at 100 digits with exactly ten 1s.

Next, copy the modified series of 0s and 1s into this URL, replacing the original series in the URL with your version. Open that URL in a browser window and voilà: you'll get a little graph like the one above for Burrows, plus a bit of BLACK stat data. The number to notice is the BLACK statistic rank, which shows, as a percentile, how your series' pixel count compares with those of 10,000 randomly shuffled versions of the same series.
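
If you want a machine-made series to set beside your own, a short Python sketch can do the die-rolling for you. It is only an illustration; the function name and defaults are hypothetical, and it has nothing to do with the timeonice.com tool itself. It rolls a simulated ten-sided die 100 times, writes a 1 for each seven, and starts the whole season over until exactly ten sevens turn up. Its output can be pasted into the URL the same way as a hand-made series.

```python
import random

# Illustrative only; the function name and defaults are hypothetical.


def d10_series(trials=100, target=7, successes=10, seed=None):
    """Roll a simulated ten-sided die; a roll of `target` is written as 1.

    Re-rolls the whole season until exactly `successes` ones appear,
    matching the "you happened to get 10 sevens" condition of the test.
    """
    rng = random.Random(seed)
    while True:
        series = "".join(
            "1" if rng.randint(1, 10) == target else "0" for _ in range(trials)
        )
        if series.count("1") == successes:
            return series


print(d10_series())  # 100 digits, ten of them 1s
```

Throwing away seasons that don't have exactly ten sevens leaves every arrangement of ten 1s equally likely, so this is equivalent to shuffling ten 1s into 100 slots. About 13% of 100-roll seasons have exactly ten sevens, so the loop usually finishes in fewer than ten tries.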

Please note that this will NOT count towards your final grade. If you're going to do it, trust your instincts. I got a shockingly low ranking on my first go, implying that I was subconsciously building a pattern into my random series. That's not good, but it's humanity. Or so I think.

30 comments:

  1. Good stuff.

    My black statistic was 70; percentile rank was 46.1.

  2. That was fun.

    My output:


    interval: 20
    trials: 100
    successes: 10

    Player's BLACK statistic: 29

    BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
    • 5% clone BLACK statistic: 39
    • 50% clone BLACK statistic: 72
    • 95% clone BLACK statistic: 117

    BLACK statistic rank (percentile of clone seasons): 0.8

  3. RO

    I was at the 1.7 percentile. Damn, you and me must see patterns everywhere. J at the 46th percentile is impressive; I thought everyone would be well below the 50th.

  4. Vic: I think I might have tried to overcompensate with a few too many runs of 1s. I like to think I have a pretty good grasp of the concept that luck can produce runs; it turns out that maybe I have too good a grasp :-D

    Incidentally I have been accused before of seeing luck everywhere, maybe there's a grain of truth in that.

  5. No, RO, the opposite. We both spread them out too much; that's why we didn't use enough black ink. The nearer the 50th percentile, the nearer you are to a likely random series.

    By this little test, JLikens is like a robot. You and me ... not so much.

  6. Damn, I didn't think of that. I was trying to get away from that original 9-zeros-1-one series, too much you say?

    In any case I think there was just too much deterministic thinking-of-how-to-do-it, same for you I take it? That's my takeaway from this exercise.

  7. Just to add, RO, we shouldn't feel bad. I checked the site stats at timeonice.com; the full URL shows up as a hit. So I could click it and link through to their results.

    And it looks like, of the last 10 people to have a go, 8 are in single digits in terms of percentile rank. Then there is J and one person with a whopping 88.7 ranking.

    So we shouldn't feel bad. :D

  8. You weren't meant to think at all, RO, just spread around the '1's until it was something that intuitively looked entirely random.

  9. "You weren't meant to think at all, RO, just spread around the '1's until it was something that intuitively looked entirely random."

    I'm not even sure it was a conscious decision. The thoughts "more runs" and "get away from even spacing" just popped up.

    It's good to know, though, that others do the same thing. Like you say, it might be human nature, and few have the desire to be unnatural :-D

  10. JLikens was more random than my three trials with an RNG: 76.8, 41, 63.3

  11. interval: 20
    trials: 100
    successes: 10

    Player's BLACK statistic: 44

    BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
    • 5% clone BLACK statistic: 39
    • 50% clone BLACK statistic: 72
    • 95% clone BLACK statistic: 116

    BLACK statistic rank (percentile of clone seasons): 8.1

  12. This is a fun game! I screwed up on my first try by only including seven ones but on my second try I got this:

    interval: 20
    trials: 100
    successes: 10

    Player's BLACK statistic: 62

    BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
    • 5% clone BLACK statistic: 40
    • 50% clone BLACK statistic: 72
    • 95% clone BLACK statistic: 116

    BLACK statistic rank (percentile of clone seasons): 32

  13. Player's BLACK statistic: 42

    BLACK statistic rank (percentile of clone seasons): 6.7

    Point made!

  14. Radiolab had a couple of segments on randomness and human perception thereof:

    http://www.wnyc.org/shows/radiolab/episodes/2009/09/11/segments/133415

    I think that one has some tangentially related stuff.

    http://www.wnyc.org/shows/radiolab/episodes/2009/09/11

    That whole episode is explicitly about randomness.

    If you're interested in human nature, you owe it to yourself to listen to that podcast.

  15. First try:

    Player's BLACK statistic: 98

    BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
    • 5% clone BLACK statistic: 40
    • 50% clone BLACK statistic: 72
    • 95% clone BLACK statistic: 117

    BLACK statistic rank (percentile of clone seasons): 83.6

  16. Ick, apparently I'm also bad at random:

    Player's BLACK statistic: 57

    BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
    • 5% clone BLACK statistic: 39
    • 50% clone BLACK statistic: 72
    • 95% clone BLACK statistic: 116

    BLACK statistic rank (percentile of clone seasons): 24.5

  17. This comment has been removed by the author.

  18. This comment has been removed by the author.

  19. I managed to get it up to 50.8%

  20. What are the odds that you'd get at least 2 consecutive ones in a random draw?

  21. Passive Voice:

    I just wrote a quick script, and it looks like you'd expect back to back ones about 63% of the time by chance alone. Assuming I haven't screwed up, that's higher than I would have guessed. Seem right to you?

  22. I'm definitely gonna defer to you on this one. It seems sorta high, but that reminds me of that "what're the odds that two people in this class of 30 share a birthday" question that clever middle school teachers use to dazzle 13-year-olds.

    The reason I ask is because I realized after my first go round that I had been avoiding repeat numbers because, I guess, two consecutive ones didn't seem random enough or something.

  23. I got BLACK statistic rank (percentile of clone seasons): 85.4 with this string 0111101000000000000000000000000000000000000000000000000000000001100000000001000010000000000000000100

    I read an article about a prof teaching a first year stats class who got half the class to flip a coin 100 times and write down the series of heads and tails. He got the other half to fake the results with what they thought random should look like. The prof was in the high 90s at guessing which was real and which was fake. Random is a really hard concept to wrap your head around.

  24. This comment has been removed by the author.

  25. Player's BLACK statistic: 85.1

    BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
    • 5% clone BLACK statistic: 40.6
    • 50% clone BLACK statistic: 73.2
    • 95% clone BLACK statistic: 117.3

    BLACK statistic rank (percentile of clone seasons): 68.8

    If my series was a player, he would have been chided for failing to show up in the stretch run.

  26. Player's BLACK statistic: 66

    BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
    • 5% clone BLACK statistic: 39
    • 50% clone BLACK statistic: 72
    • 95% clone BLACK statistic: 116

    BLACK statistic rank (percentile of clone seasons): 39.7

    My player was kinda Selivanov-like.

  27. interval: 20
    trials: 100
    successes: 10

    Player's BLACK statistic: 115

    BLACK statistic for 10000 clone seasons, each with the data shuffled randomly
    • 5% clone BLACK statistic: 39
    • 50% clone BLACK statistic: 72
    • 95% clone BLACK statistic: 115

    BLACK statistic rank (percentile of clone seasons): 94.5

  28. How do I collect my t-shirt?

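
The back-to-back question in the comments can also be answered exactly. If ten 1s are placed at random among 100 trials, the arrangements with no two 1s adjacent number C(91, 10) (choose 10 of the 91 gaps around the 90 zeros), out of C(100, 10) arrangements in all. A short Python sketch, with made-up function names, computes that probability and checks it by simulation. The exact answer, about 0.629, agrees with the roughly 63% from the quick script mentioned above.

```python
import math
import random

# Names are hypothetical; n trials with k successes placed uniformly at random.


def p_back_to_back_exact(n=100, k=10):
    """P(at least two consecutive 1s) among random arrangements of k 1s in n slots."""
    # Arrangements with no adjacent 1s: pick k of the n - k + 1 gaps
    # around the n - k zeros.
    no_adjacent = math.comb(n - k + 1, k)
    return 1 - no_adjacent / math.comb(n, k)


def p_back_to_back_simulated(n=100, k=10, runs=100_000, seed=None):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        ones = sorted(rng.sample(range(n), k))
        if any(b - a == 1 for a, b in zip(ones, ones[1:])):
            hits += 1
    return hits / runs


print(round(p_back_to_back_exact(), 3))  # 0.629
print(p_back_to_back_simulated(seed=1))  # close to 0.629
```

Like the shared-birthday puzzle mentioned in the thread, the answer is high because there are 99 adjacent pairs of positions, and each one is another chance for two 1s to land side by side.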