Machine learning is synthesizing death metal. It might make your death metal radio DJ nervous – but it could also mean music software works with timbre and time in new ways. That news – plus some comical abuse of neural networks for writing genre-specific lyrics in genres like country – next.
Okay, first, whether this makes you urgently want to hear machine learning death metal or it drives you into a rage, either way you’ll want the death metal stream. And yes, it’s a totally live stream – you know, generative style. Tune in, bot out:
To be clear up front, the whole point of this is that you need data sets to train on. That is, machines aren’t composing music so much as creatively regurgitating existing samples based on fairly clever predictive mathematical models. In the case of the death metal example, this is SampleRNN – a recurrent neural network that works on sample material, repurposed from its original intended application of working with speech. (Check the original project, though it’s been forked for the results here.)
This is a big, big point, actually – if this sounds a lot like existing music, it’s partly because it is actually sampling that content. The particular death metal example is nice in that the creators have published an academic article. But they’re open about saying they actually intend “overfitting” – that is, little bits of samples are actually playing back. Machines aren’t learning to generate this content from scratch; they’re actually piecing together those samples in interesting ways.
That’s relevant on two levels. One, because once you understand that’s what’s happening, you’ll recognize that machines aren’t magically replacing humans. (This works well for death metal partly because to non-connoisseurs of the genre, the way angry guitar riffs and undecipherable shouting are plugged together already sounds quite random.)
But two, the fact that sample content is being re-stitched in time like this could suggest a very different kind of future sampler. Instead of playing the same 3-second audio on repeat or loop, for instance, you might pour hours or days of singing bowls into your sampler and then adjust dials that recreate those sounds in more organic ways. It might make for new instruments and production software.
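To make the "re-stitching" idea concrete, here is a deliberately crude toy. It is not SampleRNN and involves no neural network at all. It only assumes you chop source audio into short grains, copy each grain verbatim (so the timbre is the original's), and let a random walk decide the order (so the larger shape is new). The function names, the 8 kHz rate, and the `grain_len` and `max_jump` "dials" are all invented for illustration, and two sine tones stand in for real recordings.

```python
import math
import random

SAMPLE_RATE = 8000  # assumed low rate, just to keep the toy small


def tone(freq, seconds):
    """Stand-in for real source audio: a plain sine tone as a list of floats."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def slice_grains(samples, grain_len):
    """Cut the source into consecutive, non-overlapping grains of equal length."""
    last_start = len(samples) - grain_len
    return [samples[i:i + grain_len] for i in range(0, last_start + 1, grain_len)]


def restitch(grains, n_grains, rng, max_jump=3):
    """Random walk over grain indices.

    Each grain is copied untouched; only the ordering is new.
    A small max_jump stays close to the source order, a large one scrambles it.
    """
    idx = rng.randrange(len(grains))
    out = []
    for _ in range(n_grains):
        out.extend(grains[idx])
        idx = (idx + rng.randint(-max_jump, max_jump)) % len(grains)
    return out


source = tone(220, 0.5) + tone(330, 0.5)      # one second of "recording"
grains = slice_grains(source, grain_len=200)  # 25 ms grains at 8 kHz
result = restitch(grains, n_grains=20, rng=random.Random(0))
print(len(grains), "grains in,", len(result), "samples out")
```

A real instrument would need crossfades between grains and a far smarter way of choosing what comes next, which is exactly where a learned model would come in.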
Here’s what the creators say:
Thus, we want the output to overfit short timescale patterns (timbres, instruments, singers, percussion) and underfit long timescale patterns (rhythms, riffs, sections, transitions, compositions) so that it sounds like a recording of the original musicians playing new musical compositions in their style.
Sure enough, you can go check their code:
https://github.com/ZVK/sampleRNN_ICLR2017
Or read the full article:
Generating Albums with SampleRNN to Imitate Metal, Rock, and Punk Bands
The reason I’m belaboring this is simple. Big corporations like Spotify might use this sort of research to develop, well, crappy mediocre channels of background music that make vaguely coherent workout soundtracks or faux Brian Eno or something that sounded like Erik Satie got caught in an opium den and re-composed his piano repertoire in a half daze. And that would, well, sort of suck.
Alternatively, though, you could make something like a sampler or DAW more human and less conventionally predictable. You know, instead of applying a sample slice to a pad and then having the same snippet repeat every eighth note. (Guilty as charged, your honor.)
It should also be understood that, perversely, this may all be raising the value of music rather than lowering it. Given the amount of recorded music currently available, and given that it can already often be licensed or played for mere cents, the machine learning re-generation of these same genres actually requires more machine computation and more human intervention – because of the amount of human work required to even select datasets and set parameters and choose results.
DADABOTS, for their part, have made an entire channel of this stuff. The funny thing is, even when they’re training on The Beatles, what you get sounds like … well, some of the sort of experimental sound you might expect on your low-power college radio station. You know, in a good way – weird, digital drones, of exactly the sort we enjoy. I think there’s a layperson impression that these processes will magically improve. That may misunderstand the nature of the mathematics involved – on the contrary, it may be that these sorts of predictive models always produce these sorts of aesthetic results. (The same team use Markov Chains to generate track names for their Bandcamp label. Markov Chains work as well as they did a century ago; they didn’t just start working better.)
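If you haven’t met a Markov chain before, it fits in a few lines. The sketch below is a generic word-level chain, not DADABOTS’ actual code: it records which word follows which in a list of titles, then walks those links at random. The training titles are made up for the example, and `build_chain`, `generate_title`, and the `<s>` and `</s>` start and end markers are hypothetical names.

```python
import random
from collections import defaultdict

# Invented example titles; a real run would use a much longer list.
TITLES = [
    "Inhuman Shrine of Rot",
    "Shrine of the Bleeding Sun",
    "Rot Beneath the Frozen Sun",
    "The Bleeding Throne of Ash",
    "Throne Beneath the Inhuman Sky",
]


def build_chain(titles):
    """Map each word to the list of words seen directly after it."""
    chain = defaultdict(list)
    for title in titles:
        words = ["<s>"] + title.split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate_title(chain, rng, max_words=8):
    """Start at <s> and repeatedly pick a random follower until </s> or max_words."""
    word, out = "<s>", []
    while len(out) < max_words:
        word = rng.choice(chain[word])
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)


chain = build_chain(TITLES)
rng = random.Random(0)  # fixed seed so reruns give the same titles
for _ in range(3):
    print(generate_title(chain, rng))
```

Every pair of neighboring words in the output appeared somewhere in the training titles; the chain only ever recombines what it was given, which is the point being made above.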
I enjoy listening to The Beatles as though an alien civilization has had to digitally reconstruct their oeuvre from some fallout-shrouded, nuclear-singed remains of the number-one hits box set post apocalypse. (“Help! I need somebody! Help! The human race is dead!” You know, like that.)
As it moves to black metal and death metal, their Bandcamp label progresses in surreal coherence:
This album gets especially interesting, as you get weird rhythmic patterns in the samples. And there’s nothing saying this couldn’t in turn inspire new human efforts. (I once met Stewart Copeland, who talked about how surreal it was hearing human drummers learn to play the rhythms, unplugged, that he could only achieve with The Police using delay pedals.)
I’m really digging this one:
So, digital sample RNN processes mostly generate angry and angular experimental sounds – in a good way. That’s certainly true now, and could be true in the future.
What’s up in other genres?
SONGULARITY is making a pop album. They’re focusing on lyrics (and a very funny faux generated Coachella poster). In this case, though, the work is constrained to text – far easier to produce convincingly than sound. Even a Markov Chain can give you interesting or amusing results; with machine learning applied character-by-character to text, what you get is a hilarious sort of futuristic Mad Libs. (It’s also clear humans are cherry-picking the best results, so these are really humans working with the algorithms much as you might use chance operations in music or poetry.)
Whether or not this says anything about the future of machines, though, the dadaist results are actually funny parody.
And that gives us results like You Can’t Take My Door:
Barbed whiskey good and whiskey straight.
These projects work because lyrics are already slightly surreal and nonsensical. Machines chart directly into the uncanny valley instead of away from it, creating the element of surprise and exaggerated un-realness that is fundamental to why we laugh at a lot of humor in the first place.
This also produced the Morrissey-style “Bored With This Desire To Get Ripped” – thanks to the ingenious idea of training the model not just on Morrissey lyrics, but also on Amazon customer reviews of the P90X home workout DVD system. (Like I said – human genius wins, every time.)
Or there’s Dylan mixed with negative Yelp reviews from Manhattan:
And maybe in this limited sense, the machines are telling us something about how we learn. Part of the poetic flow is about drawing on all our wetware neural connections between everything we’ve heard before – as in the half-awake state of creative vibrations. That is, we follow our own predictive logic without doing the usual censoring that keeps our language rational. Thinking this way, it’s not that we would use machine learning to replace the lyricist. Rather, just as with chance operations in the past, we can use this surreal nonsense to free ourselves from the constraints that normal behavior requires.
We shouldn’t underestimate, though, human intervention in using these lyrics. The neural nets are good at stringing together short bits of words, but the normal act of composition – deciding the larger scale structure, choosing funnier bits over weaker ones, recognizing patterns – remains human.
My guess is, once the hype dies down, these particular approaches will wind up joining the pantheon of drunken walks and Markov Chains and fractals and other pseudo-random or generative algorithmic techniques. I sincerely hope that we don’t wait for that to happen, but use the hype to seize the opportunity to better educate ourselves about the math underneath (or collaborate with mathematicians), and see these more hardware-intensive processes in the context of some of these older ideas.
If you want to know why there’s so much hype and popular interest, though, the human brain may itself hold the answer. We are all of us hard-wired to delight in patterns, which means arguably there’s nothing more human than being endlessly entertained by what these algorithms produce.
But you know, I’m a marathon runner in my sorry way.