The most important thing to know about Stems, a new multitrack specification for audio, is that it’s simple by design. That simplicity means that it could really take off as a way of sharing music with multiple tracks, for DJing or live-remix applications.

Stems won’t solve every problem of file exchange and sharing. It’s not a multichannel spatialization format. It’s not a sophisticated project format for storing metadata. I say that, because after we covered Stems at the beginning of this week, I found my inbox flooded with every use case for every file format imaginable, and complaints that Stems didn’t solve them. Some went as far as to get into video.

I get it: you have problems in search of solutions. Just be aware, solving every use case imaginable gets complicated fast. Take the industry standard on which Stems is based – MPEG-4. Covering everything from codecs to files, video to audio, the MPEG-4 spec has 31 parts and 15 levels, each containing still more specs inside the specs. There are international trade treaties that are simpler. And to anyone saying that there are already standards for complex project interchange and sophisticated multichannel audio – you’re absolutely right. That’s why Stems isn’t trying to be any of those things.

Instead, Stems is really a format for releasing music, and it’s intended to be as simple as possible.

Following the announcement of Stems, it seems there was some misinformation about the specs of the format. Some of this was simply technically wrong – like a report that Stems uses “MP3 files” (it uses AAC-encoded audio), or doesn’t support lossless audio (it does). And a lot of people tried to read into the future of Traktor – that’s fair, but it misses part of the point of Stems, which is to try to bring other developers onboard.

Since at least some of those developers are reading CDM, alongside producers and DJs, let’s take a look.

The “Stems” file itself is an MP4 container. Careful here – people use “MP4” interchangeably to refer to file formats and encoding, and they’re not the same thing. When you see a Stems file, you’ll see a single MP4 container – think of a box that can have different stuff in it. Technically, we’re talking MPEG-4 Part 14, but what’s important about this is that any software or hardware that can read an MP4 file can read a Stems file, and play it just like a normal stereo track. That includes iTunes or a CDJ, for example.

As I wrote last week, Stems also uses ID3 for adding metadata. That means the track itself can have all the usual cover art and bpm information and so on, but each individual stem can also be titled so you know what it is. On controllers with displays, this means a DJ/performer can see what the stems are, as well.

Saying something is a “container” doesn’t say how the file is encoded or what’s in it. Let’s look at that separately:

“Stems” includes five separate stereo tracks – four stems and one stereo mixdown. Remember, the idea is to have individual parts for your track. So “Stems” specifies four parts, each one stereo. (Note: stereo, not mono – that’s four stems of two channels each.)

There is additionally a stereo mixdown – this is your normal stereo master, in other words. Let’s assume you’ve set up a track with a bass line, drums, synth lead, and vocals. You would bounce each of those to a separate stereo track, and additionally export the complete mix as you usually would.

Which you hear depends on what you’re using for playback. For software without Stems support, you’ll simply hear the stereo mixdown.

Software (and, possibly, hardware) with Stems support will mix the four parts together. The user will then have control over the level of the parts. That’s why the simplicity of four stereo tracks is necessary: DJ software can always count on Stems tracks to have the same four-part arrangement, and so can create a consistent interface. (I imagine some Stems-compatible software might also give a user a choice of whether to use the individual stems or ignore them.)
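To make that playback logic concrete, here is a minimal Python sketch of what a Stems-aware player might do: sum the four stereo parts with a per-stem level, or fall back to the stereo mixdown. The function names and the representation of audio as lists of (left, right) float pairs are my own assumptions for illustration; none of this is taken from the Stems specification or from NI's software.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, float]  # one stereo sample: (left, right)


def mix_stems(stems: Sequence[Sequence[Frame]],
              levels: Sequence[float]) -> List[Frame]:
    """Sum stereo stems frame by frame, scaling each stem by its fader level."""
    if len(stems) != len(levels):
        raise ValueError("need exactly one level per stem")
    if not stems:
        return []
    length = min(len(stem) for stem in stems)  # stop at the shortest stem
    mixed: List[Frame] = []
    for i in range(length):
        left = sum(level * stem[i][0] for stem, level in zip(stems, levels))
        right = sum(level * stem[i][1] for stem, level in zip(stems, levels))
        mixed.append((left, right))
    return mixed


def playback(mixdown: Sequence[Frame],
             stems: Sequence[Sequence[Frame]],
             levels: Optional[Sequence[float]] = None,
             stems_aware: bool = True) -> List[Frame]:
    """Hypothetical player: use the stems if supported, else the mixdown."""
    if not stems_aware or levels is None:
        return list(mixdown)  # what software without Stems support plays
    return mix_stems(stems, levels)
```

Because the number of parts is fixed, the `levels` argument can map directly onto four faders; pulling one level to zero simply drops that part from the mix.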

(CC-BY-SA) jf1234. That's Russia's DJ Artyom

You’ll master your tracks as you always did – with some added work. Since there’s a separate stereo track, you’ll master Stems tracks for stereo the way you always have. And that will almost certainly involve some processing on the master bus (or the stereo mixdown file you’ve given to a mastering engineer).

For the individual parts, however, you’ll apply dynamics processing and the like individually. Obviously, you’ll want the mix of those parts to be in the dynamic range you want, with each also sounding good on its own. There was a lot of discussion of this, but it’s not a huge task; a mastering engineer ought to be able to handle it if you can’t yourself. That’s another reason to keep this to four tracks; the process is more manageable.

The default encoding is 256k VBR AAC. The encoding for Stems is the same audio compression as iTunes Plus: 256kbps variable bit rate AAC. The idea is to get high quality sound, with optimized file sizes.

Remember, you have five stereo tracks, not just one, so file size is important. (Variable bit rate means that you get that file size as small as possible without compromising quality.)
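For a rough sense of scale, here is a back-of-the-envelope estimate in Python. It treats 256 kbps as an average rate (variable bit rate means real files will differ) and ignores container overhead and metadata; the function name and the six-minute track length are just assumptions for the example.

```python
def estimated_size_mb(duration_s: float,
                      stereo_tracks: int = 5,
                      kbps: float = 256.0) -> float:
    """Rough audio payload in megabytes (10**6 bytes) at an average bit rate."""
    bits = kbps * 1000 * duration_s * stereo_tracks
    return bits / 8 / 1_000_000


# A six-minute track: one stereo file versus four stems plus the mixdown.
single = estimated_size_mb(360, stereo_tracks=1)
stems_file = estimated_size_mb(360)
print(f"stereo only: {single:.2f} MB, Stems: {stems_file:.2f} MB")
```

By this estimate, a six-minute track comes to roughly 11.5 MB as a plain stereo file and roughly 57.6 MB with all five stereo tracks.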

Frankly, I think 256k AAC is just fine for listening and DJing, even on club systems. That’s part of why I was critical of the claims made by the new Tidal streaming service – and as many of you found, it’s very, very hard to tell any difference. Given DJs want to carry large libraries with them, this format seems optimal for the situation.

Also, because each stem is encoded separately, NI pointed out to me that the difference should be even less noticeable than it would be for just a stereo bounce (assuming mastering the stems has gone well). That makes some sense; it’ll actually be fun to play with these. (This week has left me with the overwhelming impression that we should schedule some blind listening tests. Hmm, Funktion One will have a room at Messe, I know…)

Lossless is an option. Stems allows for lossless in the spec. Now, having admitted that you probably can’t hear the difference in most listening use cases, I think this is useful for another reason – it means you could conceivably use Stems as a format for simple file interchange. In those cases, you might want the lossless format – not because you’ll be able to hear the difference, necessarily, but because you may want to retain full lossless content if a file will be fed through additional processing.

Let’s say you have a drum machine on the iPad. You could export a four-track, lossless project to work on further on your studio machine. It’s lossless, but it’s still smaller than an uncompressed PCM stream like WAV/AIFF – with exactly the same audio quality when played back. (That means it also takes up less space on your tablet.)

I’m saying this out loud as I hope some developers are listening.

Use cases:

First, let’s think about how important simplicity is in the things we already do. How many times was the easiest way to share music simply to bounce to an AAC MP4, then upload to SoundCloud or WeTransfer or Dropbox? How many times was the best way to get a remix done of a new track just sharing four stereo stems? Or mastering with nothing other than a WAV bounce?

Keeping Stems to just four stereo tracks is, I think, key. It forces the producer of the track to think through which parts are essential and which are logical to group together. That’s relevant even if you’re making Stems only for yourself. It also means you can have consistent hardware controller mappings or consistent software interfaces.

This opens up a number of interesting possible applications:

Creative DJing with stems. Yes, bad mash-ups are one use case. But the format also allows you to take apart tracks more creatively. In genres like techno, I think that will allow the use of more tracks as “building blocks” – including with hybrid live sets. In fact, I hope this stops a somewhat disturbing trend of releasing techno tracks that already sound like stems. (Ugh.) This can also couple with features like Traktor’s Remix Decks, obviously, but why not in Serato or on a CDJ, too?

Use of recognizable hooks. Another DJ friend pointed this out to me, while I was musing about subtle drum machine combinations. Outside genres like techno and more generally in DJing, this will clearly be a way to delight crowds with some recognizable bits of popular tracks – if those tracks embrace Stems, anyway.

Mobile remix apps. As I wrote before, this also opens up some new possibilities for, say, a label app that lets listeners remix tracks on their own. It could even allow some clever, standard means of messing about with track stems in games.

Unique live solutions. I like the idea of this for more left-field possibilities, too. For instance, I’ve lately been building environments in Pure Data that let me remix and “DJ” from my own tracks alongside live elements. Now, one problem has been the absence of a consistent way of exporting… well, stems. So I can absolutely see this as being a mechanism of exporting from a production environment to the live environment, and taking that on mobile. (Custom Raspberry Pi performance hardware? Sure.) Obviously, these won’t be widespread applications, but they’ll be brilliant hackday fodder, and some of us will find ways of using them.

What’s next? Well, now we wait. I’m told command-line and graphical tools for producing the formats are coming, in advance of a site going up in June.

Naturally, CDM will cover this both from the end user perspective – for producers and DJs – and the developer side, as well.

If all goes to plan, it isn't just for this. (CC-BY) magnetismus.

Update: The plot thickens; Splice jumps in…

To me, the point of Stems is to create a format for releasing … um, stems (get it?) … for consumption by DJs. That explains to me the limit of four tracks: more than four tracks is simply too many different elements to work with at a time, given you’re jumping from one stereo track to four stereo tracks. Complexity, even musically, tends to be geometric.

However, one thing that was apparent following reader reaction to this format was that there’s demand for project interchange formats, as well.

Splice co-founder Matt Aimonetti is openly critical of some aspects of the Stems format (while applauding its general direction and basis in MPEG-4 standards). Specifically, he doesn’t like the four-stem limit:

There is technically no reason to limit the stems to 4 the way NI is doing it. Our guess is that they have their own reasons (memory limitations, user experience based on their 4 track hardware). Our hope is that Traktor Pro and the existing and upcoming controllers will accept mp4 files with more stems but will let NI users access the first 4 stems. This will allow other products (software/hardware) to go beyond NI’s current limit.

I will actually defend NI on this, because I think the 4-stem limit makes some sense:

1. Musically speaking, four stereo stems may be a useful limit. Having too many stems for a project is not an advantage even for remixing; for DJing, it could really be overkill. Few tracks have more than four main elements at a single time (or, at least, four meaningful groups of elements).

2. It maps well to hardware – not just NI’s own hardware, but a variety of hardware that has faders in groups of four. (Sort of … all hardware in that sense.)

3. Storage is an issue. Remember, Stems are being pitched as a format for music stores to sell and DJs to consume – as a new format for labels and artists. They already are five times larger than a standard file at the same encoding. Double the stems to eight, and they’d be nine times larger – or you’d never know the size you were getting. And that’s before you try to load them onto the cramped SSD on your laptop or an iPad to take out and DJ.

Not to mention, if you really need to go beyond this, you could – even, as Splice says, within that MP4 container. NI didn’t invent anything new here or close off some proprietary format. If you have a better solution for another use case, you should just ship it.

And that makes the next part of the Splice article the most interesting:

There is also a lot more we can do. At Splice we work at the source code level of music, we can inject all kinds of data/metadata into a container format, from cues, markers and loops to automation, visualization, samples, midi and presets.

This gets to the other interests people have raised – actual metadata around information beyond just audio.

I think those things belong nowhere near a format intended for distribution like Stems. And distributing tracks for DJing is the core business of the store partners we saw back Stems (Beatport and so on).

I disagree strongly with Splice’s criticisms of Stems for its use case, and I think the format is so simple, just shipping it was the right approach. I also have to disagree, perhaps a bit cynically here, with the idea that involving other companies would make things better. Remember that the origins of MIDI were formats crafted, individually, by Roland and Dave Smith.

I’m not sure how much more you can add to a DJ format – you’d have to have an argument about the number four, and at some point having more people argue about the number four would be unlikely to go anywhere. (One good comment from Splice’s Matt there – could you author an eight-stem file but still have it read by the four stem-compatible player? That seems useful and harmless… not for distribution, but for other apps building on Stems. We’re chatting about this on Twitter, which means I’ll probably never leave my desk now!)
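As a thought experiment only, the "author eight, play four" idea could look something like this in Python. The function name, the stem titles, and the rule that a four-stem player takes the first four parts in file order are all assumptions on my part; nothing here comes from the Stems spec or from any shipping player.

```python
from typing import List, Sequence, Tuple


def select_playable_stems(stem_titles: Sequence[str],
                          max_stems: int = 4) -> Tuple[List[str], List[str]]:
    """Split stems into those a four-stem player loads and those it ignores."""
    titles = list(stem_titles)
    return titles[:max_stems], titles[max_stems:]


# Hypothetical eight-stem file, listed in the order the author saved the parts.
authored = ["Drums", "Bass", "Lead", "Vocals", "Pads", "FX", "Perc", "Strings"]
loaded, ignored = select_playable_stems(authored)
print("four-stem player loads:", loaded)
print("left for other apps:", ignored)
```

The catch is that the author would have to put the four parts that matter most to DJs first, which is the same editorial decision the four-stem limit forces anyway.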

But if the aim is collaboration, as is Splice’s core business, then that’s another matter. Project interchange is a tougher sphere to define, but a useful one. Splice has certainly gotten their hands dirty in that matter, as have the DAW makers with which they interoperate.

And what Stems has demonstrated to me this week is that it’s an area people are hungry to tackle. So I do agree with Splice calling for some discussion about this – even if I think it needs to be a separate discussion.

Native Instruments Stems Format: How Does it Work or, How Should it Work? [Splice Blog]

Now, good grief, before everyone thinks I’m some sort of NI fanboy, disclaimer: yes, I do wish they’d fix Traktor and its arcane preferences dialogs and make things sync properly. Ahem. Got that off my chest.

And yes, I look forward to seeing Stems in something other than just NI software, because I personally think the design will work.