LaDiDa Demo from khush on Vimeo.
There’s no question iPhone/iPod touch development – really, clever mobile development in general – has gotten a bit overhyped lately. But that’s all the more reason to do a round-up of genuinely interesting stories – real innovation happening on the platform. So, I’m clearing out my inbox with some of the more creative tools to appear recently on Apple’s mobile gadgets. There’s no better way to kick off today’s festivities than with this unusual “reverse karaoke” creation.
Sure, people may think they’re tone-deaf. But even the layperson has extraordinary powers of musical perception. So how could you train your iPhone to perceive and respond to music? That’s the question asked by LaDiDa for iPhone, the first of a new line of “intelligent” music applications for mobile devices. A “reverse karaoke” tool, it listens to your singing and improvises an accompaniment to match, rather than having you sing along to canned backing tracks. Nothing is pre-programmed; everything is generated on the fly, on the device.
It’ll even make up a Bollywood accompaniment to your singing:
LaDiDa Bollywood Duet from khush on Vimeo.
Of course, to me, it’s interesting not only what the iPhone is able to do musically, but also what these algorithms are unable to make sound musical. Both reveal a whole lot about how we hear and conceptualize music. I think the team deserves real credit for making this fun, though, and on constrained hardware.
The app’s creator, Khush, follows in the footsteps of Smule in taking hard-core academic music research and using mobile devices as a vessel for getting that tech into the hands (literally) of the general public. (See my interview with Smule founder and ChucK originator Dr. Ge Wang.)
Parag Chordia, a professor at the Georgia Institute of Technology and the gentleman you see in the video, spoke to CDM about what’s happening behind the scenes. He tells us how this application was developed, and how the intelligent algorithms work (or at least try to work, as music analysis and auto-accompaniment remain at early stages).
First, an explanation of the app.
Khush CEO Prerna Gupta explains how it works:
1. You sing into the phone, and LaDiDa will compose music to match.
2. LaDiDa’s patent-pending technology analyzes the pitch and structure of the melody to compose a unique accompaniment for each recording. [A rough sketch of how this kind of analysis can work follows the list. –Ed.]
3. To be clear, we do not query a database of pre-recorded songs. That is, LaDiDa has been designed to work with any music.
4. After recording your song, you can hear it with different styles. LaDiDa comes with three styles — E Piano Pop, Rhythm Synth Pop and Dub Tone — each of which has been developed using high-quality instrumentation to work specifically with our algorithm.
5. We will be launching new styles every month that will be made available through in-app purchases.
6. LaDiDa also works on rap! This month we’ll be adding three new rap styles.
7. After choosing your style, you can save the song and share it on Facebook, Twitter and email.
8. LaDiDa also has a Discover page, where you can hear songs recorded by other users from all over the world.
9. Khush was founded by music technology enthusiasts from the Georgia Tech Music Intelligence Lab. You can read about us here and also find out more about the research at our lab here.
10. LaDiDa went live in the iTunes store last week and is currently priced at $0.99.
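Khush hasn’t published how that melody analysis actually works, but to make point 2 a bit more concrete, here’s a minimal sketch of one textbook approach to melody-driven harmonization: transpose the sung pitches into an estimated key, then score each diatonic triad against the notes in a bar and take the best fit. Everything below – the names, the triad table, the scoring heuristic – is my own illustration, not LaDiDa’s patent-pending algorithm.

```python
# Toy melody-to-chord matcher. An illustration of the general idea only --
# LaDiDa's actual method is unpublished.

# Diatonic triads in C major, as pitch-class sets (C=0 ... B=11).
TRIADS = {
    "C":  {0, 4, 7},
    "Dm": {2, 5, 9},
    "Em": {4, 7, 11},
    "F":  {5, 9, 0},
    "G":  {7, 11, 2},
    "Am": {9, 0, 4},
}

def best_chord(melody_midi_notes):
    """Pick the diatonic triad that covers the most melody notes.

    melody_midi_notes: MIDI note numbers sung during one bar, already
    transposed so that the estimated key is C major.
    """
    pitch_classes = [n % 12 for n in melody_midi_notes]
    def chord_tone_count(triad):
        # A real system would also weight metrical position, note
        # duration, and voice leading between successive chords.
        return sum(pc in triad for pc in pitch_classes)
    return max(TRIADS, key=lambda name: chord_tone_count(TRIADS[name]))

# A bar hovering around E, G, and C suggests a C major triad:
print(best_chord([64, 67, 64, 60]))  # -> "C"
```

Of course, the hard part is everything this sketch assumes away – pitch tracking, key and tuning estimation, larger-scale structure – which is exactly where Parag’s comments below pick up.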
Prerna, the woman you see in the video, has some Web experience to boot, including founding a popular Indian dating site. Oh, and she’s a better singer than the music researcher, but, hey, that’s why we all went into computer music, right?
In case you’re wondering how you take a research idea and make it run on the iPhone – or how the algorithm works (and might get smarter in the future) – I turned to Parag for those details:
The initial code was developed in my lab in C++. Since the core algorithms are basically mathematical, that portion was relatively easy to port. However, we spent significant time thinking about how to optimize for the iPhone, and every aspect of the app, from the interface to sound design, has been built with the iPhone in mind. For example, there are significant limits on sampler performance — samples have to be short and effects are more or less out — but we thought it was important for our styles to have a rich sound. So we put great effort into designing light styles that sound realistic.

Another significant challenge was making the analysis robust to external noise; iPhone recordings are lo-fi and corrupted with tons of background noise, which makes robust (and, again, computationally efficient) pitch detection essential.
Our approach to reverse karaoke is somewhat different than what’s been done before. A significant limitation of previous work was a lack of fine-grained key estimation, a problem that we felt was critical to successful vocal accompaniment (most people are not anywhere near a piano or an instrument with fixed tuning when singing into the app).
We also worked on trying to give some larger structure to the accompaniment, which can often sound locally reasonable but notably lacking in direction. Again, a difficult problem, particularly when people are singing snippets. Still, it is sometimes possible to detect phrases, and we have tried to incorporate that information as well.

Auto-accompaniment is an endlessly fascinating and deep problem. As we learn more about human perception and cognition of music, as well as improve our tools for machine listening, our systems will become more musical. While we still have a ways to go, we believe that, with LaDiDa, we’ve created a product that is engaging and allows regular people to express themselves creatively.
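Parag doesn’t say which pitch tracker LaDiDa uses, but if you’re curious what “robust, computationally efficient pitch detection” involves, here’s a bare-bones autocorrelation detector – the family of techniques that algorithms like YIN refine. It’s a teaching sketch only: the voicing threshold is a crude stand-in for real noise robustness, and the function names are mine.

```python
import numpy as np

def detect_pitch(frame, sr=44100, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one audio frame, in Hz.

    Plain autocorrelation: the strongest peak at a lag within the vocal
    range is taken as the period. Returns None for frames that look
    unvoiced -- a crude gate against the background noise Parag
    describes on a phone mic.
    """
    frame = frame - frame.mean()            # remove DC offset
    ac = np.correlate(frame, frame, mode="full")
    ac = ac[len(ac) // 2:]                  # keep non-negative lags only
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(ac):
        return None
    lag = lo + int(np.argmax(ac[lo:hi]))
    # Voicing check: the peak must carry a decent share of the energy.
    if ac[0] <= 0 or ac[lag] / ac[0] < 0.3:
        return None
    return sr / lag

# A clean 220 Hz sine should come out close to 220 (the small error is
# from quantizing the period to an integer lag):
t = np.arange(2048) / 44100.0
print(detect_pitch(np.sin(2 * np.pi * 220 * t)))  # ~220.5
```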
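“Fine-grained key estimation” presumably covers both the key and the tuning reference, since an unaccompanied singer is rarely pitched exactly to A440. Here’s one classic recipe, offered as a guess at the general idea rather than Khush’s method: estimate a tuning offset from how far each note sits from the nearest equal-tempered semitone, then correlate the resulting pitch-class histogram against Krumhansl-style key profiles.

```python
import numpy as np

# Krumhansl-Kessler major-key profile (probe-tone ratings for C major).
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def estimate_key(f0s_hz):
    """Guess (tuning offset in cents, major key) from sung pitches."""
    cents = 1200 * np.log2(np.asarray(f0s_hz) / 440.0)  # rel. to A440
    # Tuning offset: typical deviation from the nearest semitone.
    deviations = ((cents + 50) % 100) - 50
    offset = float(np.median(deviations))
    # Quantize with the offset removed, fold to pitch classes (A = 9).
    semitones = np.round((cents - offset) / 100).astype(int)
    pcs = (semitones + 9) % 12
    hist = np.bincount(pcs, minlength=12).astype(float)
    # Correlate the histogram with the profile rotated to each tonic.
    scores = [np.corrcoef(hist, np.roll(KK_MAJOR, k))[0, 1]
              for k in range(12)]
    return offset, NAMES[int(np.argmax(scores))] + " major"

# A C major scale sung about 20 cents flat of standard tuning:
scale = [261.6, 293.7, 329.6, 349.2, 392.0, 440.0, 493.9, 523.3]
flat = [f * 2 ** (-20 / 1200) for f in scale]
print(estimate_key(flat))  # -> (about -20.0, 'C major')
```

A real system would at least add minor-key profiles and weight notes by duration, but the tuning step is the point: without it, a singer 50 cents off A440 lands ambiguously between two keys.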
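As for detecting phrases in “snippets,” one cheap proxy – my speculation, not Khush’s method – is to treat long unvoiced gaps in the pitch track as breath points, and therefore as likely phrase boundaries:

```python
def find_phrases(f0_track, frame_dur=0.01, min_gap=0.25):
    """Split a pitch track into phrases at long unvoiced gaps.

    f0_track: per-frame F0 in Hz, with None for unvoiced frames (e.g.
    detect_pitch above, run frame by frame). Returns (start, end) times
    in seconds. Pauses longer than min_gap seconds are treated as
    phrase boundaries -- a crude stand-in for real phrase analysis.
    """
    phrases, start, gap = [], None, 0
    for i, f0 in enumerate(f0_track):
        if f0 is not None:
            if start is None:
                start = i           # first voiced frame of a phrase
            gap = 0
        elif start is not None:
            gap += 1
            if gap * frame_dur >= min_gap:
                phrases.append((start * frame_dur,
                                (i - gap + 1) * frame_dur))
                start, gap = None, 0
    if start is not None:           # close a phrase still open at the end
        phrases.append((start * frame_dur, len(f0_track) * frame_dur))
    return phrases

# 0.3 s of singing, a 0.4 s breath, then 0.3 s more -> two phrases:
print(find_phrases([220.0] * 30 + [None] * 40 + [230.0] * 30))
# -> [(0.0, 0.3), (0.7, 1.0)], give or take float rounding
```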
If all of this talk about musical perception recalls the questions of whether culture and background or neurology better explain music – as seen at the Notes & Neurons conference – that’s no coincidence. Parag himself played sarod with a fascinating ensemble at that same conference; in the video below, Bobby McFerrin sings a really beautiful solo with the group.
In fact, it’s absolutely worth contrasting the elegance and beauty of these all-human musical responses with the somewhat clumsy (sorry, Khush) iPhone responses. That’s not to say the iPhone creation is any less human – it’s a computational model programmed by humans, and it’s capable of some impressive feats made possible by their musical instincts and training. Even so, we really can hear the gap between what advanced musicians do intuitively and what we can model computationally, within the limits of the device’s ability to sense the world around it.
World Science Festival 2009: Notes & Neurons, Part 5 of 5 from World Science Festival on Vimeo.