How statistically accurate are Pitchfork’s ratings?

From Part-Time Music:

    Thursday, February 25th, 2010
    “About a week or so ago, there was a hearty discussion on Twitter among well-known music bloggers about the controversial 7.6 rating by Pitchfork of Toro y Moi’s excellent debut LP Causers of This. Since I am guilty of being more of a mathematician than a writer, I decided that this was a great opportunity to dive right into the numbers and do a brief statistical study of Pitchfork’s ratings over one complete year and see where exactly Chaz Bundick’s 7.6 grade stacked up in comparison to his peers. After sifting through the data most of yesterday afternoon, I have to say there are some pretty interesting finds (including some statistical anomalies) behind Pitchfork’s rating system for albums.

    Before beginning, I should briefly mention how the data was collected. Initially, I was going to write a script to go through Pitchfork’s Record Reviews, logging each numbered grade between February 24, 2009 and February 24, 2010. However, knowing that p4k has an affinity for rating reissues and compilations very favorably (an unbelievable 30 reissued albums scored higher than the highest-rated contemporary album; chalk that up to the Beatles, Neil Young, and Radiohead re-releases), I figured the only surefire way to get accurate data on non-reissued material was to look into each review, see if it fit my criteria for a new release, and jot down the score. A cumbersome process to say the least! There were several things I decided to omit when classifying an album as “original”: soundtracks, label compilations, live recordings, and of course reissues. This left a relatively large sample of 1,025 newly released, original albums to run analysis on. Is this result error free? Of course not; no doubt I tallied a handful of albums as “original” when they weren’t and vice versa. However, with the sample large enough and my propensity to err small, any stray mistakes should have a negligible effect on the results. The following is a histogram plotting the number of occurrences of each rating:

    [Figure: histogram of the number of occurrences of each rating]

    If you are a frequent follower of p4k, then most of the plot doesn’t come as a surprise. The bulk of the histogram centers around the 6.5-8.5 range, with a score of 7.0 being the most common rating (51 times). Also, because Pitchfork tends not to publish reviews of horrendously bad albums, it’s a no-brainer that the plot is significantly negatively skewed, with a long tail of low scores. Similarly, exceptionally high-scoring albums (i.e. 8.7 and above) are also relatively rare.
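The tally behind a histogram like this is easy to reproduce. The following is a minimal Python sketch, not the author's Matlab code; the `sample` list is made up for illustration and is not the post's data set, and comparing the mean with the median is only a rough heuristic for the direction of skew.

```python
from collections import Counter
from statistics import mean, median


def rating_histogram(scores):
    """Count how often each rating (rounded to one decimal) occurs."""
    return Counter(round(s, 1) for s in scores)


def most_common_rating(scores):
    """Return (rating, count) for the most frequent rating."""
    return rating_histogram(scores).most_common(1)[0]


def is_negatively_skewed(scores):
    """Rough heuristic: a long left tail pulls the mean below the median."""
    return mean(scores) < median(scores)


# Made-up ratings for illustration only (NOT the post's 1,025-album data set).
sample = [2.5, 4.8, 5.9, 6.0, 6.5, 7.0, 7.0, 7.0, 7.2, 7.6, 7.8, 8.0, 8.5]

mode_rating, mode_count = most_common_rating(sample)
left_tailed = is_negatively_skewed(sample)
```

On the real data set, `most_common_rating` should return 7.0 with a count of 51 if the tally matches the post.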

    Probably one of the most interesting results of the histogram is that whole-number ratings occur much more often than their x.9 and x.1 neighbors, enough to be considered a statistical anomaly. Notice how the peaks at 6.0, 7.0, and 8.0 are noticeably higher (almost twice as high in some instances) than 5.9, 6.9, and 7.9 respectively. My theory behind this is that when it comes to “on the fence” reviews, p4k tends to give the benefit of the doubt to the artist. Knowing that a rating whose whole number is one unit higher looks more impressive (the same effect explains why things are priced $6.99 rather than $7.00: we subconsciously think it is a lot less), they tend to bump up the score to show a more positive review. Now if it is true that individual critics are responsible for giving an album a score, rather than a collective following a loose outline of established “rules”, then this result is very interesting from both a mathematical and a sociological point of view.
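One way to put a number on this whole-number effect is to compare each x.0 count with the average count of its two neighbors. The sketch below is an illustration under assumptions: the function name `whole_number_excess` and the `sample` ratings are hypothetical, and a ratio above 1 only hints at the effect. Deciding whether it is statistically significant would need a proper test, which is not shown here.

```python
from collections import Counter


def whole_number_excess(scores):
    """For each whole-number rating n.0, return the ratio of its count to the
    average count of its neighbours (n-1).9 and n.1.

    Ratings are converted to integer tenths so that 6.9 and 7.0 compare
    exactly instead of relying on floating-point equality.
    """
    counts = Counter(round(s * 10) for s in scores)
    ratios = {}
    for tenths in sorted(counts):
        if tenths % 10 != 0:
            continue  # not a whole-number rating
        neighbour_avg = (counts.get(tenths - 1, 0) + counts.get(tenths + 1, 0)) / 2
        if neighbour_avg > 0:
            ratios[tenths / 10] = counts[tenths] / neighbour_avg
    return ratios


# Made-up ratings for illustration only (NOT the post's data set).
sample = [6.9, 7.0, 7.0, 7.0, 7.0, 7.1, 7.1, 7.9, 8.0, 8.0, 8.1]
ratios = whole_number_excess(sample)
```

A ratio near 2 for a given whole number would correspond to the "almost twice as high" peaks described above.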

    To get a better idea of the breakdown of scores and a loose determination of percentiles, I drew a box plot:

    [Figure: box plot of the ratings]

    This plot tells us a couple of things, most notably establishing a line between OK albums and great albums. One can see from the plot that the third quartile, the cutoff for the “top” 25% of ratings, occurs at the 7.6 line. What this means is that our beloved Toro y Moi album would be statistically defined as on the border of the upper tier. Confirming our natural inclination that a majority of albums are rated around the “7” mark, the box of the box plot, representing the middle 50% of scores, spans from 6.1 to 7.6. The final interesting part is that if an album scores below 3.9, it’s considered a statistical outlier (meaning Lil’ Wayne can breathe easy knowing his rock album just made the cut). Refining the results further into 10% percentile steps, the following is established:

    [Table: ratings at each 10% percentile step]
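The 3.9 outlier cutoff in the box plot discussion is consistent with the usual box plot rule, where points more than 1.5 times the interquartile range beyond the box count as outliers. The sketch below assumes that convention (the post does not state which rule was used) and plugs in the box edges quoted above, 6.1 and 7.6.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences for a box plot.

    Assumes the common convention of k = 1.5 times the interquartile
    range; the post does not say which rule the author's plot used.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Box edges quoted in the post: first quartile 6.1, third quartile 7.6.
lower, upper = tukey_fences(6.1, 7.6)
```

With these inputs the interquartile range is 1.5, so the lower fence comes out at roughly 3.85, which rounds to the 3.9 line mentioned in the post.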

    In my opinion, the above table gives bands a better way to determine the meaning of their p4k rating than the numerical score alone can provide. Take for example a hypothetical review of 7.7. Without any context, it is a rather meaningless number which invokes a wide range of opinions (C-grade, “better than most”, underwhelming, etc.). However, when you compare it to a large sample of past albums’ ratings and see that it is in the 60th percentile, meaning it is better than 60% of the albums they’ve graded, then you understand the score a lot better.
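Turning a raw score into a percentile, as described above, takes only a sorted list and a binary search. This is a minimal sketch: `percentile_rank` is a hypothetical helper, it uses the "strictly below" definition of percentile rank (one of several in use), and the `sample` ratings are invented so that 7.7 lands where the post's example puts it.

```python
from bisect import bisect_left


def percentile_rank(score, all_scores):
    """Percentage of ratings in `all_scores` that are strictly below `score`."""
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)  # index of first element >= score
    return 100.0 * below / len(ordered)


# Made-up ratings for illustration only (NOT the post's data set).
sample = [5.0, 6.1, 6.5, 7.0, 7.0, 7.3, 7.7, 7.9, 8.2, 8.6]

rank_of_7_7 = percentile_rank(7.7, sample)
```

Running the same function over the downloadable data set would show where any given score actually falls.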

    The final thing I’ll mention is a couple of points about their Best New Music selections and the seemingly arbitrary way they assign the label. With how much significance is attached to a BNM nod (record sales, exposure, tour upgrades), it was rather unsettling to notice some trends that popped up:

    All albums scoring 8.6 and higher were automatically made Best New Music.
    If you are a metal fan, you’ve gotten royally screwed over and overlooked by p4k. Only two metal albums were selected for BNM within the past year: Sunn O)))’s Monoliths & Dimensions and Isis’s Wavering Radiant (both with scores of 8.5). Adding insult to injury, out of the 15 albums that scored an 8.5, 11 made BNM. Two of the four that didn’t make the cut were metal-related records (Baroness’s Blue Record and Converge’s Axe to Fall), both reviewed on days when no other record made BNM.
    Another one of the four albums that scored 8.5 and was not stamped with a BNM was contemporary jazz musician Jon Hassell’s LP, verbosely entitled Last Night the Moon Came Dropping Its Clothes in the Street, supplying another example of a high-performing album from a more obscure genre getting the shaft. In p4k’s defense, Yacht’s superb See Mystery Lights was BNM-ed that day, which leads me to my next point…
    If you release a great record, make sure you don’t get reviewed on the same day as another great record. I don’t have an individual statistic for this, but I often saw high scoring albums (8.2-8.5) not get a BNM because another even better (or same ranking, just more hyped) album was reviewed the same day.
    If you are a hyped record or are an established act, you have a better shot of getting a Best New Music when you are on the cusp. Now this seems kind of obvious, but there were some egregious instances where this occurred. Of the 41 albums that scored an 8.1 or 8.2, five were chosen as BNM: Surfer Blood’s Astro Coast, Atlas Sound’s Logos, Cass McCombs’s Catacombs, Bill Callahan’s Sometimes I Wish We Were An Eagle, and Wavves’s S/T.
    Yeah, I have no idea what they were thinking BNM-ing that Mos Def record (the lowest-scoring BNM and, out of the 36 records that scored an 8.0, the only one to get BNM-ed).
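The counts quoted in the points above can be turned into Best New Music rates per score band, which makes the drop-off below 8.5 easier to see. The sketch uses only figures stated in the post; the dictionary layout and the `bnm_rate` helper are illustrative choices.

```python
# (albums given BNM, albums reviewed) per score band, as quoted in the post.
bnm_counts = {
    "8.5": (11, 15),
    "8.1-8.2": (5, 41),
    "8.0": (1, 36),
}


def bnm_rate(selected, total):
    """Percentage of albums in a score band that received Best New Music."""
    return 100.0 * selected / total


# Rates rounded to one decimal place for readability.
rates = {band: round(bnm_rate(*counts), 1) for band, counts in bnm_counts.items()}
```

That works out to about 73% of the 8.5s, about 12% of the 8.1s and 8.2s, and under 3% of the 8.0s, against 100% for everything at 8.6 and above.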
    This was a fun project which allowed me to brush up on some of my Matlab skillz. In the future, I would like to dive deeper and provide a more detailed analysis, but that will have to wait until I get some free time. If you have comments or would like to speculate on p4k’s ratings, or if you have any insight into how they are determined (individual vs. collective), just leave a comment. If you want a copy of my data so you can run your own analysis, I would be happy to supply it to you (EDIT :: You can download the data set here).”

I’ve heard about another music reviewing site called cokemachineglow, which I’m going to check out this week!
