Photo (CC) Alosh Bennett.

With the regularity of clockwork, stories about how digital audio consumption is degrading the quality of music are published and then re-published. Nearly a decade after the introduction of Apple's iPod, this still apparently qualifies as news. The content of the articles is so nearly identical that you could believe the bylines are a ruse, a nom de plume for the same author re-publishing the same story.

Whatever the reason for their supposed newsworthiness, the problem with these stories isn’t their claims about the variable quality of music listening. I think it’d be hard to overstate just how sub-optimal real-world listening by real-world consumers can get. The problem is that these journalists, inexperienced in the actual history of the technology they’re covering, falsely identify a technological trend.

In the process, they miss the real story of how listeners listen.

Here’s the latest offender:

In Mobile Age, Sound Quality Steps Back [The New York Times]

The story conflates everything from comparing analog to digital to dynamic compression in mastering to data compression, so it’s hard to know where to begin. But I’ll do my best to separate out the issues. (After all, you barely have to read this article, because you’ve read this story – substituting a couple of sources here, a couple of metaphors there – repeatedly for about ten years.)

Myth #1: Audio advancement hasn’t kept pace with video advancement.

Here’s the myth, from author Joseph Plambeck:

The last decade has brought an explosion in dazzling technological advances — including enhancements in surround sound, high definition television and 3-D — that have transformed the fan’s experience. There are improvements in the quality of media everywhere — except in music.

First, this idea itself is internally inconsistent, at least in part. People’s home theater setups are full of music, from the soundtracks to games to movies to video of live concerts. In fact, the quality of audio in audiovisual contexts – including music – has improved alongside the video. Consider:

Original VHS format: Poor frequency response (100 Hz – 10 kHz), mono, or stereo with hideous dynamic response. In fact, this isn’t even worth measuring – it was awful. Couple that with poor analog reception or low-quality analog cable signals, and it means the 1980s, peak of the music video, sounded like crap.

DVD: Typically AC-3 or DTS digital audio, with better-than-CD audio quality (in terms of theoretical specifications), and digital surround capability. [Clarification: technically, it’s the theoretical 24-bit, 96kHz encoding rate that would make audio on DVDs “better” than CDs. Commenters are correct, though, that the lossy audio format, combined with real-world concessions to space, could degrade real-world audio quality – though you also get more channels, which is a good thing. For a better advance from the CD, see the Blu-Ray disc. Ed.] So, the NetFlix age is better off than the Blockbuster age.

Gaming: Games increasingly use compressed but relatively high-quality audio, approaching CD quality, and in digital surround formats. With intelligent surround mixing, this also leads to better channel and spatial separation, and a more pristine listening experience. Not only that, but because gamers use auditory cues to help them perceive where they and their enemies are in space, anecdotally, many non-musician gamers I've talked to are particular about their sound experience.
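
For concreteness, here is a quick sketch of the theoretical numbers behind that "better-than-CD" claim in the format comparison above, using the Nyquist limit and the rule-of-thumb dynamic range of linear PCM (roughly 6.02 dB per bit plus 1.76 dB). It says nothing about lossy encoding or real-world mastering; it's only the ceiling each format allows.

# Back-of-the-envelope comparison of theoretical PCM specs (not real-world quality).

def nyquist_khz(sample_rate_hz: int) -> float:
    """Highest representable frequency (Nyquist limit) in kHz."""
    return sample_rate_hz / 2 / 1000

def dynamic_range_db(bit_depth: int) -> float:
    """Approximate theoretical dynamic range of linear PCM: 6.02 * bits + 1.76 dB."""
    return 6.02 * bit_depth + 1.76

formats = {
    "Audio CD (16-bit / 44.1 kHz PCM)": (16, 44_100),
    "DVD-era PCM (24-bit / 96 kHz)": (24, 96_000),
}

for name, (bits, rate) in formats.items():
    print(f"{name}: up to ~{nyquist_khz(rate):.2f} kHz, "
          f"~{dynamic_range_db(bits):.0f} dB theoretical dynamic range")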

But that’s not the argument here. Apparently, the lowest-quality audio distribution format can be compared to the highest-quality video format. That just doesn’t make sense.

Let’s turn the tables, by way of comparison. I can even write the headline:

“Video Quality Suffers in the Age of the Internet – Unlike Audio”
By Peter Kirn[‘s fake evil imaginary brother]

Kids today, with their YouTube and their over-compressed, handheld shot video. Why, I remember in the old days. I used to shoot in gorgeous film on my Bolex and edit by hand on a Steenbeck.

Audio quality today is fantastic. 10.1 surround is the norm, as is better-quality mixing. Just listen to The Lord of the Rings recording. It's spectacular. It's a whole orchestra and everything. You can go watch the movie in a THX-certified theater, and listen to nearly three full hours of music. In fact, by the time you've watched the trilogy, you will have sat and listened to a longer piece of music than a Wagner opera – and you won't have gotten out of your chair (minus that quick bathroom break).

Not like video. 320×240, really? Over-compressed video encoding and 15 fps? Stations that call themselves "HD" but exhibit noise artifacts due to over-compression – to say nothing of the less-popular, standard-definition stations squeezed into your cable signal to allow you to have 2000 stations? It's as if people aren't videophiles any more. It looks horrible.

You get the point. Nothing above is incorrect; it's just a matter of perspective, and whether you compare the best practices of one medium with the worst practices of another.

In video capture, the difference is more pronounced. Modern digital cameras now record increasingly high-quality audio, which was previously often more compressed than the video. My new Olympus E-PL1 actually records uncompressed PCM audio alongside its Motion JPEG video.

In fairness, the author here is talking about "music." If TV in HD is now the norm, there isn't an equivalent shift in the common format for distribution of musical albums (see myth #2). And that's fair – mostly. But the issue is, again, comparing different delivery formats for different delivery applications for different content. Sure, the musical album hasn't had the leap forward that, say, television has in the move from standard-definition to high-definition content. But by the same token, would you compare the 16:9 cinematic experience – which was already "high fidelity" and "high definition" in optical film before the advent of these technologies – in the same way? In fact, if you did, the advances in cinema audio have been greater than the advances in film presentation. While digital projection and 3D have very recently improved the situation, urban movie theaters getting carved into subdivided rooms actually made a lot of movies smaller, not bigger or "higher def."

Photo (CC-BY) iamaruntimeerror.

Myth #2: MP3s reduce audio fidelity in the name of mobility

This topic has been discussed to death. At the risk of giving away the ending, low-bitrate MP3s don't sound very good. Higher-bitrate MP3s do sound pretty good. (The same is true of Apple's AAC-encoded audio, which, incidentally, belongs to the same family of perceptual codecs used on those DVDs and Blu-ray discs and consumer digital video recorders.) In fact, bizarrely, the New York Times article doesn't compare any hard numbers on perception of high-bitrate MP3s and AACs versus CDs. It just takes it as a given that they aren't as good, without any actual research.

But the central thesis of the entire article – one we’ve seen before – is this:

In one way, the music business has been the victim of its own technological success: the ease of loading songs onto a computer or an iPod has meant that a generation of fans has happily traded fidelity for portability and convenience. This is the obstacle the industry faces in any effort to create higher-quality — and more expensive — ways of listening.

Instead, music is often carried from place to place, played in the background while the consumer does something else — exercising, commuting or cooking dinner.

As usual, the lay journalist struggles with the notion of data compression, saying that the process is “eliminating some of the sounds and range contained on a CD.”

In fact, by design, lossy compression does nothing of the sort. The idea behind MP3 compression is to eliminate tones that are themselves inaudible, masked in the normal perception of music. That means that, encoded correctly and with enough data, an MP3 should theoretically sound identical to a PCM-encoded CD.

There's often a difference between theory and practice. But to suggest that the aim, the goal of MP3 or AAC is to eliminate auditory, perceptible sounds in order to increase portability is simply inaccurate. Perceptual compression is designed so that, according to the appropriately-named Karlheinz Brandenburg, compression pioneer of the Fraunhofer Institute, "the basic task … is to compress the digital audio data in a way that … the reconstructed (decoded) audio sounds exactly (or as close as possible) to the original audio before compression."

You do need enough data for the compression technique to work its magic, which is why the shift from lower bitrates in MP3/AAC to higher bitrates on leading digital music stores is so important. But at a certain point, you no longer perceive anything missing, and as Duke Ellington would say, “if it sounds good, it is good.”
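
If you want to hear the bitrate point for yourself rather than take anyone's word for it, here's a minimal sketch. It assumes ffmpeg (built with the LAME MP3 encoder) is installed, and the source file name is a hypothetical stand-in for a lossless CD rip; it simply produces the same material at several bitrates for comparison.

import subprocess

# Encode one hypothetical lossless source at several MP3 bitrates for a listening test.
# Assumes ffmpeg (with libmp3lame) is on your PATH.
SOURCE = "original.wav"  # hypothetical CD-quality rip

for bitrate in ("96k", "128k", "192k", "320k"):
    out = f"test_{bitrate}.mp3"
    subprocess.run(
        ["ffmpeg", "-y", "-i", SOURCE, "-codec:a", "libmp3lame", "-b:a", bitrate, out],
        check=True,
    )
    print(f"encoded {out}")

Then listen without peeking at the filenames – or, better still, blind, as in the ABX sketch a little further down.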

If the audio compression is successful, that means a generation of fans hasn't traded fidelity at all, if the previous popular format was the audio CD. The "if it's successful" part is important, but it isn't as simple as this story (and most others) would have you believe.

Newspaper journalists continue to treat MP3s as though it's still 1999. In 1999, it wasn't uncommon for people to illegally download music from services like Napster, encoded at bitrates that were too low and often containing encoding errors, which can cause audible distortion and pops. That's not true of a track downloaded from Amazon or iTunes today. That difference is significant.

A discussion of how compressed audio compares to an audio CD actually isn't an easy one. Even simple metrics like frequency range or signal-to-noise ratio aren't directly applicable to audio that uses perceptual encoding techniques, because, by definition, the encoding adapts from frame to frame. The quality of the encoder and its settings make a big difference. Suffice to say, it's possible to create an MP3 or AAC file that isn't as satisfying as an audio CD, or to create one that – even for many trained ears – is satisfying. I won't even try to debate the merits here, because to get the answer technically correct, we'd have to do more work.

To have a technically-robust discussion, though, we'd actually define what we're talking about: comparing, say, a direct-to-digital recording with a broad dynamic and frequency spectrum, as played from a standard audio CD and as a 320-kbps MP3. That could be an interesting discussion, and you might even choose the audio CD over the MP3 in certain cases. But it probably wouldn't reach any sweeping conclusions like generations of listeners turning their backs on quality in the name of cheap thrills.
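
Here's what the bookkeeping for that kind of comparison can look like in practice – a minimal sketch of a self-administered ABX test, the standard way to check whether a difference between two versions of the same material is actually audible. The file names are hypothetical, playback is left to your own audio player, and both files are assumed to be decoded to WAV first so the hidden copy's extension doesn't give it away.

import random
import shutil

def run_abx(file_a: str, file_b: str, trials: int = 10) -> None:
    """Minimal ABX bookkeeping: copy a hidden 'X' for each trial, then score the guesses."""
    correct = 0
    for i in range(1, trials + 1):
        x_is_a = random.choice([True, False])
        shutil.copyfile(file_a if x_is_a else file_b, "trial_X.wav")
        print(f"\nTrial {i}: listen to {file_a}, {file_b}, and trial_X.wav in your player.")
        guess = input("Is X the same as A or B? [a/b]: ").strip().lower()
        if (guess == "a") == x_is_a:
            correct += 1
    print(f"\nScore: {correct}/{trials} correct.")
    print("About half right is what pure guessing looks like; "
          "consistently near 100% suggests a genuinely audible difference.")

if __name__ == "__main__":
    # Hypothetical files, both decoded to 16-bit / 44.1 kHz WAV beforehand.
    run_abx("cd_rip.wav", "mp3_320k_decoded.wav")

It's crude, but it forces the question the article never asks: can the listener actually tell?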

In the dance of logical fallacies, articles like this one never define the terms of their basic thesis – the “generation of listeners” trading convenience for quality:

  • What generation? (I’ve seen everyone from age 8 to 80 with an iPod.)
  • Compared to what? (MP3 to audio CDs? AAC to 8-tracks? What?)
  • Who’s judging the quality, and how? What’s quality?
  • When? Who? Where? … What?

But yes, I suppose it can be said that at an indeterminate time, using an indeterminate playback format (MP3 or AAC or … something) with an undetermined bitrate (maybe 128k, maybe 256k), listening through a range of variables that have gone undefined (headphones? background noise? are you using your blender while cooking in the kitchen?), an indeterminate group of people listening broadly to things that might be called "music" (whether that's a Brandenburg Concerto or Frank Zappa) from some indeterminate era, itself originally recorded through some unknown means at some undefined time, has audio quality that is not as good as some other music listened to by someone else … sometime. Or something.

I can’t really argue with that, can I?

Myth #3: The iPod is the perfect emblem of a generation that doesn’t care about music

Quick: what’s small and portable but sacrifices audio fidelity for over-compressed music with no frequency or dynamic range? It’s portable, it’s pocketable, it was a wildly-successful creation that changed how a generation consumed electronics and music alike, and it has terrible earbuds.

The iPod? No, I'm talking about the Japanese transistor radio. By contrast, it makes the iPod look pretty amazing. The transistor radio had:

  • A terrible tuner. In order to save space, cost, and power consumption, the tuner in early transistor radios was often sub-par. Say what you will about MP3s or online streams; at least you don't have to tune them out of the air. Weak signal? Weak music.
  • A crappy speaker in a crappy housing. Want an insider tip for how to make a bad speaker sound even worse? Here’s a hint: put it inside a rattling plastic housing.
  • AM radio for music delivery. The irony of calling MP3 a step backward is nothing compared to AM radio, which supported mono output and a bandwidth of only 10 kHz. Analog mono FM radio sounds better, let alone a current average digital file. Only later did transistor radios add FM support, and it was some time before stations embraced the format. (For a rough sense of what narrow-band mono does to a recording, see the sketch after this list.)
  • Terrible, mono earbuds. The iPod's weakest link is the lousy earbuds Apple ships with the device, but early transistor radios were even worse. Aside from holding one up to your ear, you could plug in an earbud – yes, one earbud, in one ear. The earbud was terrible, and mono. The signal was terrible, and mono. And you had it in only one ear.
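
The sketch below is the rough simulation mentioned above: it folds a hypothetical CD-quality file to mono and low-passes it far below CD's ~22 kHz ceiling. The 5 kHz cutoff is an arbitrary stand-in for a narrow-band receiver and a tiny speaker, not a precise model of AM broadcast, and it assumes ffmpeg is installed.

import subprocess

# Crude "pocket radio" simulation: mono downmix plus a steep treble cut.
# The cutoff is an illustrative guess, not a measurement of any real receiver.
subprocess.run(
    ["ffmpeg", "-y", "-i", "original.wav",   # hypothetical CD-quality source
     "-ac", "1",                              # downmix to mono
     "-af", "lowpass=f=5000",                 # discard everything above ~5 kHz
     "pocket_radio_sim.wav"],
    check=True,
)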

I’ve always loved listening to transistor radios. They have gotten better. But that’s the point: they’ve gotten better, not worse. And an iPod can usually beat one of these devices when it comes to sound quality.

Myth #4: Music is getting squashed by loudness wars – blame the iPod

No article on the evils of digital music would be complete without reference to the Loudness Wars:

With the rise of digital music, fans listen to fewer albums straight through. Instead, they move from one artist’s song to another’s. Pop artists and their labels, meanwhile, shudder at the prospect of having their song seem quieter than the previous song on a fan’s playlist.

So audio engineers, acting as foot soldiers in a so-called volume war, are often enlisted to increase the overall volume of a recording.

It’s a subject for another post, but I’m tired of the “loudness war” being applied to “music.” What music? What genre? Recorded by whom? When? I’ve heard exquisitely-engineered music from the past few years. I’ve heard brickwall-limited pop songs that … well, would have sounded like crap even without being poorly mastered. I’ve also heard music that used over-compression for intentional distortion in some genres (like Dub) long before anyone began worrying about loudness wars. (I’m also unconvinced by the listening habits described here. We know how many singles versus albums are purchased, but not how people listen to their existing music collections, so I’m dubious.)
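
If you do want to talk about "squashing" concretely rather than generically, it's at least measurable. One crude indicator is the crest factor – the gap between peak and average (RMS) level – which heavy brickwall limiting shrinks. A minimal sketch, assuming NumPy and the soundfile library, with a hypothetical file name:

import numpy as np
import soundfile as sf

def crest_factor_db(path: str) -> float:
    """Peak-to-RMS ratio in dB: a rough proxy for how 'squashed' a master is."""
    audio, _ = sf.read(path)  # float samples in [-1.0, 1.0]
    audio = np.asarray(audio, dtype=np.float64).ravel()
    peak = np.max(np.abs(audio))
    rms = np.sqrt(np.mean(audio ** 2))
    return 20 * np.log10(peak / rms)

# Hypothetical file. Heavily limited masters shrink this number;
# more dynamic recordings sit noticeably higher.
print(f"{crest_factor_db('some_track.wav'):.1f} dB peak-to-RMS")

It's no substitute for listening, but it grounds the argument in something other than genre-wide hand-waving.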

But that's not really the argument here. The issue is what's to blame. In fact, I believe historically the author again has it completely wrong. The technology that began to change how music was mastered, that began to cause people to move from one track to another, isn't the iPod. It's the radio. And if anything caused the homogenization of music at the top of the charts, it wasn't the introduction of digital singles. In fact, the iPod includes technology designed to level out volume automatically across a playlist. The trend attributed to the loudness wars scaled up in the 1990s, as sales of CDs and CD singles – not downloads – were on the rise.
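
That leveling feature is worth pausing on, because it undercuts the "louder wins the playlist" argument. The sketch below shows the basic idea – bring every track to a common average level – as a rough RMS-based approximation; it is not Apple's actual Sound Check or the ReplayGain algorithm, and the file names are hypothetical WAV tracks (NumPy and soundfile assumed).

import numpy as np
import soundfile as sf

TARGET_RMS_DBFS = -16.0  # arbitrary common target level, in dB relative to full scale

def rms_dbfs(path: str) -> float:
    """Average (RMS) level of a track in dBFS."""
    audio, _ = sf.read(path)
    audio = np.asarray(audio, dtype=np.float64).ravel()
    return 20 * np.log10(np.sqrt(np.mean(audio ** 2)))

playlist = ["quiet_folk_track.wav", "brickwalled_pop_track.wav"]  # hypothetical
for track in playlist:
    gain = TARGET_RMS_DBFS - rms_dbfs(track)
    print(f"{track}: apply {gain:+.1f} dB so it sits at the same average level")

If playback levels every track anyway, mastering everything to the absolute ceiling buys nothing for the playlist listener.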

Let’s face it: A&R people don’t care what a track sounds like by the time it’s found its way to your iPod playlist. You’ve already bought it. Job over. What they care about is how “loud” that track sounds when you haven’t bought it. And that means impressing the people who run the radio stations.

If you want a historical variable in that time span (and the more recent decade), look to the consolidation of broadcasting companies and radio markets.

The nonpartisan Future of Music Coalition (FMC) found that in 2005, half of listeners tuned to stations owned by only four companies, and the top ten firms had almost two-thirds of listeners. At the same time, radio listenership has declined 22 percent since its peak in 1989 in the top 155 markets.

Source: Peter DiCola, False Premises, False Promises, Future of Music Coalition (2006)

It's a topic for another article, but I just can't find a rational explanation for why the iPod would make reduced dynamic range desirable. Personal listening means the ability to set your own volume level, and data compression and poor-quality headphones mean that over-compressed music sounds worse, not better. The charges levied against the iPod might just as easily be directed at the cassette Walkman of the '80s, on which people routinely listened to mix tapes of their own creation.

At the same time, I haven’t seen anyone argue against the notion that media consolidation might be the culprit, even though radical consolidation took place over the same era that the “loudness wars” were supposedly raging. I welcome other theories here. But even if you don’t agree with me, I don’t think you can take it as a given that the iPod is specifically to blame – and I’d think you’d want some evidence, regardless.

Myth #5: Technology is the cause and determinant

There’s a bottom line to my endless rant. (I know, I know – get to it already.)

Via Twitter and Facebook this morning, while I was blowing off steam about this article, a couple of people referred to how artists "intend" their music to be heard. I've got bad news for you: your listeners don't care about your intentions. Part of the genius of great mix engineers and great mastering engineers is that they know how to shape music for the worst-case listening environment, not just the best.

It's not that some MP3s don't sound terrible, or that music isn't sometimes mastered poorly or overly compressed. It's not that the standard earbuds on the iPod aren't awful, or that the blown-out speakers in someone's car aren't poor – they are.

The variable in all of this that’s more important than the technology is the listener. Listeners are fickle and unpredictable. They don’t always concentrate on music. They don’t always care about fidelity. “They” don’t always agree – which is why some people don’t replace those default earbuds, while others blow thousands of dollars on listening equipment.

Too much of the debate over listening focuses on the technology and not the listener. The listener – and perception – is everything. And that leaves us to our final myth:

Biggest myth of all: Perception and reality are one and the same

There’s an unstated elitism in most of these discussions. I think it’s worth a separate post, so I’ll come back to this video and the ideas in it, but one key revelation is that even golden-eared pros can have their perceptions fooled by comb filtering in a room or even the placebo effect:

[Mix engineer] Anyone who records and mixes professionally has done this at least once in their career—you tweak a snare or vocal track to perfection only to discover later that the EQ was bypassed the whole time. Or you were tweaking a different track. And if you’ve been mixing and playing around with … whether you’re a professional or just a hobbyist, if you’ve been doing this for a few years and you haven’t done that, then you’re lying. Yet you were certain you heard a change! Human auditory memory and perception are extremely fragile, and expectation bias and placebo effect are much stronger than people care to admit.

There's a lot more to this panel. It winds up being a lot more interesting than the debates over MP3s and digital downloads, and gets to the heart of how we hear. I'll try to pull it apart and talk to people with more expertise than my own about it soon, but in the meantime, there are copious notes and audio downloads to go along with the video:
http://www.ethanwiner.com/aes/

Thanks to oivindi (see also SoundCloud) for the tip.

Why bother with this whole rant? I'm hopeful that, if we look beyond the simplistic explanations to the actual science, history, and magic by which we all hear music, we'll find out a lot more about what music means. The story above came from the business section, but the industry isn't a good place to look for answers. The failure of a format like SACD shows a real failure of understanding about how people listen, how they perceive quality, and even the basics of how formats and compatibility appeal to buyers. Nor has the recording industry always given you a better product for more money: they were just as happy to sell you excerpts of music as ringtones, at ridiculously inflated prices and lower fidelity.

My alternative rebuttal:

1. Audio and visual technology have advanced in lockstep, whether or not consumers have always bought the gear.
2. MP3/AAC files can sound just fine, so it’s not fair to leave audio complaints at their doorstep; what we need is better testing under optimal circumstances, not just how these formats fail.
3. The transistor radio, not the iPod, was the great backwards step in mobility; it shows just how much listeners have valued being mobile and cheap, and for how long – years before digital.
4. The real culprits in the loudness wars are media consolidation and top-of-the-charts senselessness, not mastering engineers or iPods.
5. Listeners are the variable, not the tech.
6. Human perception is always the first place to consider – even with pros.

If you want to improve “fidelity,” even for your own listening, you can’t ignore the listener. You can’t ignore perception. And you certainly can’t ignore history. But pay attention to these things, and who knows what’s possible?