The history of music and the history of music notation are closely intertwined. Now, digital languages for communicating musical ideas between devices, users, and software, and storing and reproducing those ideas, take on the role notation alone once did. Notation has always been more than just a way of telling musicians what to do. (Any composer will quickly tell you as much.) Notation is a model by which we think about music, one so ingrained that even people who can’t read music are impacted by the way scores shape musical practice.
All of this creates a special challenge. Musical notation systems traditionally evolved over centuries. Now we face the daunting question of how to build such a language overnight.
This question has been a topic I’ve visited in a couple of talks, first here in New York at in/out fest last December, then most recently for a more general audience at RSVP, a new conversation series in Hamburg, Germany hosted by the multi-disciplinary design studio Precious Forever. (See photo at top, by which we can prove that the event happened. Check out more on the event and how the Precious gang hope this will inspire new interchange of ideas in Hamburg – something perhaps to bring to your town.)
What I’ve learned in talking to people at those events is that music notation matters. It’s more relevant to broad audiences than even those audiences might instinctively think. And the most common lingua franca we have for digital music storage, MIDI, is woefully inadequate.
But perhaps most importantly: replacing MIDI’s primitive note message is far from easy. The more you try to “fix” MIDI, the more you appreciate its relative simplicity. And engineering new solutions could take re-examining assumptions Western music notation has made for centuries.
Musical notation and culture
Explaining the importance of notation to expert musicians is easy. But to convey its importance to lay people, you need look no further than the game interface developed by Harmonix for the hit titles Guitar Hero and Rock Band (and in turn descended from a similar interface paradigm used in the Japan-only Konami GuitarFreaks). These games demonstrate that, even among non-musician gamers, certain received wisdoms from Western notation endure. (In fairness, many of the designers of music games have a fair bit of musical experience, but the fact that their work is received by audiences in the way it is nonetheless speaks volumes.)
The Guitar Hero interface actually is a Western musical score, rotated 90 degrees to make it easier to see how the events on-screen are matched to game play input. (For visual effect, the “track” is also rotated away from the screen, so that events further in the future recede into the background – a bit of visual flair that helped differentiate Harmonix from flatter-looking Japanese games.)
Whatever the rotation, the assumptions of the game screen itself are rooted in notation. Pitch is displayed along lines and spaces, just as on a score. Rhythm is displayed along a metrical grid, which reads as a linear track. Not coincidentally, I believe, when Harmonix has deviated from this formula, their titles have tended to be less successful. More sophisticated interactions in titles like Amplitude and Frequency (and the iPod game Phase) were big hits among gamers, but less so among the general public, perhaps in part because they require a more abstract relationship to the music.
Games are just one example, of course. Musical scores reflect basic cultural expectations, and in turn shape the music that people in that culture produce. As with most Western languages, text flows from left to right and top to bottom. Ask people to describe pitch in any culture that uses this notational system, and they’ll use the notions of “up” and “down,” “higher” and “lower” – even though these metaphors are meaningless in terms of sound. (Indonesian culture, for instance, gets it more physically correct, by describing what we call higher pitches as “smaller” and deeper pitches as “larger,” as they are in gongs.) And music in Western cultures is also deeply rooted in a grid, in 4/4 time and equal subdivisions. It wasn’t always so: even in the West, prior to the advent of notation of these meters, metrical structures flowed more freely.
It’s little surprise, then, that some of the biggest successes in electronic musical instruments have adopted the same conventions. From the Moog sequencer to the Page R editor on the Fairlight CMI sampler to the array of buttons on Roland’s grooveboxes, rhythmic sequencers that follow the grids devised in Western music notation are often the most popular. Even if the paradigm of the interface is one degree removed from the notation, the assumptions of how rhythms are divided – and thus the kinds of patterns you produce – remain.
Nowhere is this more true than in MIDI. MIDI is itself a kind of notational system, around which nearly all interfaces in software and hardware have been based over the past two and a half decades since its introduction.
MIDI, keyboards, and piano rolls: An incomplete “standard”
The first thing to understand about MIDI is that it began life as a keyboard technology. A complete history of MIDI should wait for another day, but even as its early history is told by the MIDI Manufacturers Association, it’s a technology for connecting keyboard-based synthesizers, not a solution to the broader question of how to represent music in general.
Many of the tradeoffs in MIDI, though, were made long before the 1980s or the invention of digital technology. When the 19th-century creators of the player piano needed not only standardization but reproducibility – before the advent of recording, the power to recreate entire musical performances – they turned to the piano as a way of modeling musical events. Indeed, the first player pianos quite literally reproduced the process of playing a piano, using wooden, mechanical fingers to strike notes on the keys just as a human would, before that mechanism was replaced with the internal players familiar to us today. What these inventors found in the piano was an instrument that, in the name of accessibility, aligned pitch to a simple grid.
The piano is a beautiful instrument, but its great innovation – the grid of its black and white keys – is also its greatest shortcoming. That grid is an imperfect model even of Western musical pitches, let alone other cultural systems. The 12-tone equal-tempered tuning used on modern pianos makes tuning multiple keys easier, but only by way of compromises. Even a modern violinist or singer may differentiate between the inflection of a G flat and an F sharp, based on context, but to the piano, these pitches are the same. And tuning is only the beginning. Piano notes begin with a note being “switched” on and end with it being “switched” off – no bending or other events within that pitch as on most other instruments.
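To make that compromise concrete, here’s a quick sketch – my own illustration, with an assumed reference frequency for middle C. In equal temperament every semitone is the same ratio, so F sharp and G flat collapse onto one key; one common just-intonation reading (the 45/32 augmented fourth versus the 64/45 diminished fifth) keeps them distinct.

```python
# Illustrative sketch: the piano's equal-tempered grid erases the
# F-sharp / G-flat distinction a violinist or singer can still make.

C4 = 261.63  # assumed reference frequency for middle C, in Hz

# Equal temperament: every semitone is a factor of 2**(1/12).
f_sharp_et = C4 * 2 ** (6 / 12)  # six semitones above C...
g_flat_et = C4 * 2 ** (6 / 12)   # ...lands on the very same key
print(f_sharp_et == g_flat_et)   # True: the piano cannot tell them apart

# One just-intonation reading keeps them apart: the augmented fourth
# (45/32 above C) and the diminished fifth (64/45 above C) differ by
# roughly 20 cents -- audible to an attentive listener.
f_sharp_just = C4 * 45 / 32
g_flat_just = C4 * 64 / 45
print(round(g_flat_just - f_sharp_just, 2))  # a real gap, in Hz
```

The specific ratios are only one historical reading; the point is that any non-equal tuning distinguishes pitches the 12-key grid must merge.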
It’s little wonder, given MIDI’s origins as a protocol for communicating amongst keyboards, that the editing view most common in music software is the piano roll, labeled as such. The piano roll is the perfect paradigm for sequencing events played on a keyboard, but that doesn’t mean it’s the best language for describing all music. And the obligation of a digital protocol is actually greater than that of musical notation, because there’s no human being at the other end to fill in missing expression and context.
Consider what’s missing in MIDI:
- Pitch reference: By convention, MIDI note 60 is C4. However, musical practice internationally lacks a consistent standard for what the tuning of C4 is, and any number of variables can interfere, from independent tuning tables to the use of the pitch bend to the activation of an octave transpose key.
- Pitch meaning: MIDI note values use an arbitrary pitch range from 0 to 127, a hypothetical 128-key piano, which itself makes no sense. 4? 8? 15? 16? 23? 42? The numbers themselves don’t mean anything.
- Pitch resolution: Because of the 0-127 resolution constraints, to get notes in between the pitches, you need a series of separate messages like pitch bend, giving you two values with only an incidental relationship to one another. Since pitch range is kept in yet another message, the results are confusing and un-musical, far more complex than they need to be. (Why wouldn’t 60.5 be a half-tone higher than 60?)
- Real expression: Events between note on and note off are represented independently as control change values. But that causes problems, because it means there’s no standard way to represent something as simple as a musical glissando. On a synth, making an expression (like twisting a knob or turning a wheel) separate from a note (pressing a key) makes sense. But that doesn’t make musical sense, and it doesn’t match most non-keyboard instruments. Only aftertouch is currently available, and that again assumes a keyboard and doesn’t expose pitch relationships created by adding the data.
- Musical representations of tuning and mode: The MIDI Tuning extensions require that you dump tuning information in fairly unstructured System Exclusive binary dumps. The standard itself is in some flux, and at best, its reliance on byte messages means that it’s not something a human being can read. And it still must be aligned with 128 otherwise arbitrary values. It’ll work, but it only makes sense on keyboards, and even then, it’s not terribly musical. Looking at number 42 in your sequencer, you’d have no idea of the tuning behind it, or the position in a mode – something any rational musical notational system would make clear.
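The pitch-resolution tangle above can be sketched in a few lines. The defaults assumed here are the common conventions, not guarantees: A4 = 440 Hz at note 69, 12-tone equal temperament, and a ±2-semitone bend range (which itself lives in yet another message, typically an RPN). Saying “a quarter-tone above middle C” takes multiple coordinated values:

```python
# Sketch of how many values MIDI needs to describe one bent pitch.
# Assumptions: A4 = 440 Hz at note 69, 12-TET, +/-2 semitone bend range.

def note_to_freq(note, bend=8192, bend_range=2.0):
    """note: 0-127; bend: 14-bit value, 8192 = centered;
    bend_range: semitones, configured out-of-band (commonly via RPN)."""
    offset = (bend - 8192) / 8192 * bend_range
    return 440.0 * 2 ** ((note + offset - 69) / 12)

print(note_to_freq(69))            # 440.0 -- A4, by convention
print(round(note_to_freq(60), 2))  # 261.63 -- middle C, 12-TET assumed
# A quarter-tone above middle C: the note number alone can't say this.
# It takes a second, channel-wide bend message with only an incidental
# relationship to the note -- exactly the "60.5" the text wishes for.
print(note_to_freq(60, bend=10240))  # +0.5 semitone folded in via bend
```

Note that the bend applies to the whole channel, not the note – two overlapping notes on one channel cannot bend independently in this scheme.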
Ironically, it was this very set of constraints that early innovators on the Buchla and Moog synthesizers hoped to escape. They were fully aware that the very genius of the keyboard was restricting musical invention. Analog control voltage, the basic means of interconnecting equipment prior to digital tech, was more open ended than MIDI, which replaced it. But that’s not to say it was better. Standardization is an aid in communication, as is the ability to describe messages. The question is, how can you do both? How can you be open ended and descriptive at the same time?
How do you build a new system?
Deconstructing is easy; constructing is hard. We certainly have the ability to send more open-ended messages and higher-resolution data; that’s not a problem. (Even by the early 80s when MIDI was introduced, its tiny messages and slow transmission speeds were conservative.) We also have Open Sound Control (OSC), which has some traction and popularity, including near-viral use on mobile devices and universal support in live visual applications. It’s telling that OSC is itself not really an independent protocol in the sense that MIDI is, but is built on existing standards like TCP/IP and UDP. 2010 is, after all, not 1984.
The hold-up, I think, is simply the lack of a solid proposal for how to handle musical notes. And there are plenty of distractions. It’s tempting to throw out the simplicity of MIDI’s note on and note off schema, but it’s partly necessary: with a live input, you won’t know the duration of a pressed key until that key is released. It’s equally tempting to cling to Western musical pitches, even though those pitches themselves lack solid standardization and don’t encompass musical practices in the rest of the world. (12-tone equal temperament is a recent invention even in the Western world, and one that doesn’t encompass all of our musical practice. World tunings should best be described not by majority, but plurality, anyway – have a look at the current demographics of Planet Earth.)
One solution is simply to express musical events by frequency. That’s not a bad lowest common denominator, or a way to set the frequency of an oscillator. As a musical representation, though, it’s inadequate. It’s simply not how we think musically. The numbers are also unpleasant, because we perceive pitch roughly logarithmically. Pop quiz:
Can you do logarithms in your head? Yes or no?
Can you count?
MIDI gets it half right by using numbers, but then it’s hard to see octave equivalence, another essential concept for perceiving pitch. MIDI note 72 is probably equivalent to MIDI note 60… assuming 12 steps per octave. Or it might not be.
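A small sketch of the arithmetic (my numbers, using the usual 110/220/440/880 Hz octave of A’s): octave equivalence is a ratio, so raw frequency hides it while a logarithm makes it plain – and MIDI’s integers reveal it only under an assumption.

```python
import math

# Pitch perception is roughly logarithmic: each octave is a doubling,
# so equal perceptual steps are wildly unequal in raw Hz.
a_notes = [110.0, 220.0, 440.0, 880.0]  # successive A's
print([f2 - f1 for f1, f2 in zip(a_notes, a_notes[1:])])  # [110.0, 220.0, 440.0]
print([math.log2(f / 110.0) for f in a_notes])            # [0.0, 1.0, 2.0, 3.0]

# MIDI's numbering shows octave equivalence only under an assumption:
# note 72 is "an octave above" note 60 only if an octave is 12 steps.
steps_per_octave = 12  # true for 12-TET, not for every tuning table
print((72 - 60) / steps_per_octave)  # 1.0 octave -- by convention only
```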
If you need a common denominator that covers a variety of musical traditions, mode (or more loosely, pitch collection) and register aren’t a bad place to start. I don’t think a system needs to be terribly complex. It could simply be more descriptive than MIDI is – while learning from the things MIDI does effectively.
Consider a new kind of musical object, described over any protocol you choose. It would ideally contain:
- Mode/pitch collection: As with MIDI and the MIDI tuning tables, tuning would need to be defined independently, but it can be done in a musical, human-readable way. It then becomes possible even to define modes that have different inflections based on context, as with pitches that are slightly different in ascending and descending gestures (common in many musical systems).
- Relative degree: A notation like “1 1 2 3 5 6” can work in any musical language. You simply need to know the active mode or pitch collection.
- Register: Instead of conflating register and scale degree, you could simply define an octave register and starting frequency. This retains modal identities and octave equivalence, and makes relative transposition easy to understand. (A “transposition” message could be defined as an actual message, which is more musically meaningful.)
- Standardized inflections, connected to pitch: Pitch bends and glissandi should be relative to a specific note, because notes can have pitches that bend around their relative scale degree. (Think of a singer bending just below a note and into the actual pitch. These aren’t independent events.) A trombonist would never have invented MIDI notes. They would likely have immediately turned to the question of how to universally describe bending between notes.
- Yes, frequency: There will be times when directly referring to frequency makes sense, and that should be possible, as well.
- Relative duration: Musical notation, regardless of musical culture, uses some kind of relative indication of duration. Only machines use raw clock values. The result is that it’s possible to make musically meaningful changes in tempo and have durations respond accordingly. And whereas note on and note off events make sense on input, a musical event would not logically separate these events; there’s some notion of an event with a beginning, middle, and end. If you sing an ‘A,’ that’s one event, with a duration, not an independent beginning of the note and end of the note.
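To make the list above concrete, here’s a hypothetical sketch; every name and field is my invention for illustration, not a proposed standard. A mode maps degrees to intervals in a human-readable table, degrees stay relative, register supplies a starting frequency, and duration stays in beats, with raw frequency derived last:

```python
# Hypothetical sketch only: a musical event built from mode, relative
# degree, register, and relative duration -- frequency is derived, not primary.

# A mode defined human-readably, as cents above the register's starting
# frequency (12-TET major here purely for familiarity; any tuning,
# including unequal or context-dependent ones, could fill this table).
MAJOR = {1: 0, 2: 200, 3: 400, 4: 500, 5: 700, 6: 900, 7: 1100}

def degree_to_freq(degree, register_start_hz, mode):
    """Resolve a relative scale degree against a mode and a register."""
    return register_start_hz * 2 ** (mode[degree] / 1200)

# "1 1 2 3 5 6" works in any musical language, given the active mode:
melody = [1, 1, 2, 3, 5, 6]
C4 = 261.63  # assumed starting frequency for the chosen register
print([round(degree_to_freq(d, C4, MAJOR), 1) for d in melody])

# Relative duration: beats, not clock ticks, so one tempo change
# rescales every event meaningfully.
durations_beats = [1, 1, 1, 1, 2, 2]
tempo_bpm = 120  # change this one number; every duration follows
print([b * 60 / tempo_bpm for b in durations_beats])
```

Transposition then becomes a single change to the register’s starting frequency, and retuning a single change to the mode table – both leaving the event stream untouched.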
Far from replacing existing standards for music notation, this kind of standard could interchange more gracefully with printed notation. If you import a standard MIDI file into notation software, you get results that are typically full of errors, because the SMF lacks musical information about the events it contains. With more of that information stored, and stored in standard ways, translating to paper would become vastly more effective.
I’m sure attempts to model this in OSC have been made before, but it’s worth compiling those ideas and resurrecting the discussion.
What about input?
Ah, you say, but then, let’s go back to the keyboard. None of these events makes sense on a keyboard. You don’t know when a note is pressed how long it’ll last. You don’t know the modal degree of a particular, arbitrarily-played note.
I was stuck on the same problem, until I realized what I had been taking for granted: MIDI conflates two very separate processes. It makes input and output the same. Musical notational systems have never done that. When you look at a score, it’s a set of musical ideas, given meaning and context. If you record a series of events from an input, those events don’t immediately have meaning or context. It’s confusing the mechanical with the musical. It’s the reason MIDI is not just like a player piano – it is a digital player piano.
Separate out the issue of recording mechanical input events, and you can have a system that’s more flexible. That system should fit whatever the input is. An organ, a shakuhachi, a didgeridoo, and an electric guitar aren’t the same thing. Why would they be represented with the same set of input events? That’s pretty daft.
Look at it this way: imagine if instead of being invented by synthesizer people, Aeolian Harp players had invented MIDI. (It’s not so far-fetched: the Aeolian Harp has a millennia-long history and was once quite popular.) An Aeolian Harp sequencer would feature elaborate, high-resolution data recording for wind pressure relative to different strings. It might even measure wind direction. In fact, it’d look a lot more like meteorological data than musical data per se. It certainly wouldn’t involve integers from 0 to 127.
This should lead to a simple conclusion with profound consequences:
Physical input and musical output should not be the same thing.
One of the advantages of a protocol like OSC (or any open, networked, self-described protocol) is that it can be open-ended and descriptive, meeting our earlier challenge. For instance, using a hierarchy of meta-data attached to the message, you could describe a set of variables relevant to wind input. If you wanted to transcribe the results in musical terms, you could then use a musical notation, as above – one that used musical identity attached to the resulting frequencies, as in relative modal pitch and rhythmic duration. But the input would be a separate problem. That’s a far piece from MIDI, which is adequate neither as a complete description of the input device, nor of any kind of resulting musical system.
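That separation can be sketched in a few lines. The address patterns and field names below are illustrative inventions, not an existing OSC namespace: the physical input speaks its own (almost meteorological) language, the musical transcription speaks in musical terms, and the two are linked rather than conflated.

```python
# Hypothetical sketch: two OSC-style messages, one per concern.
# Neither address pattern is a real, standardized namespace.

# The input side, described in terms native to the instrument:
input_msg = ("/aeolian_harp/string/3/wind",
             {"pressure": 0.82, "direction_deg": 215.0})

# The musical side, transcribed separately, in musical terms
# (mode, relative degree, register, relative duration):
music_msg = ("/music/event",
             {"mode": "slendro", "degree": 2, "register": 1,
              "duration_beats": 0.5})

# The two are associated by a shared time tag, never merged into one
# keyboard-shaped message:
timestamp = 1234.567  # placeholder time tag
for address, args in (input_msg, music_msg):
    print(timestamp, address, args)
```

A sequencer could then record the wind data faithfully, the transcription musically, or both – and discard or regenerate either without losing the other.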
But wait a minute – how is there a standard? How do you standardize something that could include an Aeolian Harp, a vuvuzela, and a bagpipe? Welcome to the problem of music. Music is by its very nature resistant to standardization, because the possibilities of the physical world are so broad. This also suggests how input protocols (and output protocols) can go beyond musically-exclusive data. Again, we can turn back to MIDI as a model. MIDI was intended with specific applications in mind, with messages that referred to MIDI notes and filter cutoff. But that didn’t stop it from being warped to accommodate tasks well outside the standard, ranging from triggering videos to controlling amusement park robotic characters (literally). This suggests to me that what defines a standard protocol of this kind is not what is most strictly standardized, but what is most flexible.
The real challenge with something like OSC, then, is to come up with standardized ways of defining non-standardized events, and using some kind of reflection or remote invocation to allow devices or software that have never communicated before to handle unexpected messages intelligently. At the very least, they should give users clear, understandable options about the data they send and receive. This is a question the OSC community has been raising for some time. To me, all that remains is to make some compelling implementations and let the most effective solution evolve and win out. Recent reading on the topic (though this absolutely deserves a separate post, which I’ll get to soon):
Best Practices for Open Sound Control
Minuit : Propositions for a query system over OSC
That’s a separate problem from how to make events musically meaningful. But that to me is the central revelation, and something MIDI completely misses: these are two separate problems, not one problem. Handle input events as input. If it makes sense in a sequencer to record them as musical events (like scale degree pitches), do that. If it makes sense to record them as a series of time-stamped, physical events, do that – but with actual information relative to what was recorded, so that the wind across an Aeolian Harp is recorded in a way that makes sense for that input. And when describing musical events, describe them in musical ways.
This isn’t relevant only to music communities, either: it’s relevant to anyone recording events in time. It’s part of the reason the “sound” needs to be dropped from OSC. MIDI is as specific as it is partly because the specification has messages too small to contain information describing what the events mean. We now have standard network protocols that do that, so they can include information about other kinds of events. There’s no reason someone monitoring water levels in their herb garden and someone recording a sousaphone solo couldn’t use some of the same underlying protocols. There’s also every reason they’d record different kinds of data content.
Promising venues and a call to action
There’s really no need to try to “replace” or “fix” MIDI – if MIDI has endured for a specific application, maybe it actually is well-suited to that application. I think it’s time, instead, to think about how new systems can encompass more musical meaning from our own traditions and traditions around the world, and how we can standardize broad ranges of events instead of trying to fit everything into narrow, rigid boxes.
There is every reason to believe new things can happen now, too. Whereas hardware standardization once was a slow process, requiring the involvement of major manufacturers, we now carry programmable computers around in our pockets as “phones” and learn to write embedded code in freshman college classes using $30 Arduino boards. If you want new hardware standards, you can literally make them yourself. We have the ability to share musical notation directly in a Web browser using standard descriptions, as covered here recently. And as browsers increasingly host distributed, networked applications, communicating in standard ways – as Web APIs do naturally – is becoming imperative.
But there’s one thing that makes me especially optimistic: you. Via the Web, we have instant access to your collective knowledge and experience. That means it’s a sure thing that all of us, collectively, know more about previous research in this area, previous ideas, and what has and hasn’t worked. We also have the opportunity to communicate with each other, to make ideas evolve, at least experimentally. That doesn’t remove the need for eventual standardization, but good standards follow practice, not the other way around – something has to work in one place before it can be a shared standard. We also have mechanisms for self-standardization that didn’t exist before. Spoken languages evolve because people collectively work to share common means of communication. You might argue that this leads to a Tower of Babel, but then, I’m writing this in English and you’re reading it in the same language and (hopefully) understanding. The same is true of Mandarin, Portuguese, German, Arabic, Hindi, and so on. It’s also true of volunteer adoption on the Internet of HTML, XML, JSON, and RSS.
Music is not the result of notation or standards. It’s the other way around. Musical practice long predated any attempt to write it down. And mathematics and written language each have the ability to describe music, along with many other media.
To me, two questions remain:
1. What would an implementation of structured messages for pitch and duration look like, perhaps via OSC? What prior work has been done in this area, and what do you need?
2. How can smarter implementations of a protocol like OSC allow software and hardware to better handle unfamiliar input – as musicians, as they have done since the dawn of time, invent novel physical interfaces?
I look forward to kicking off this discussion and hearing what you think.