Shimon, an adaptive, improvisational, percussion-playing robot, is getting smarter – and more famous, with appearances in places like the Stephen Colbert show. Now, humans have been known to get a big head under such circumstances. Shimon’s head has gotten “more social” – gestural intelligence helps the robot relate to fellow players and nod its head in time to the music.

I got a chance to talk more to project creator Dr. Gil Weinberg, director of the Georgia Tech Center for Music Technology. He’s also taken some of this technology and built it into the mobile app ZOOZBeat, which you can spot in one of the videos here alongside the (much more expensive, no you can’t have one) robot.

What might surprise you about the Shimon project is that it isn’t just about copying what humans can do with a robot. In fact, if anything, says its creator, it’s about human-robotic relations. “The project was always aimed at creating new and inspiring interactions between humans and robots, with the goals of creating new and exciting musical outcome that cannot be created otherwise,” says Weinberg. He emphasizes that, while the robot assimilates human listening, it has a musical style all its own.

And yes, if there’s any doubt that Georgia Tech students can kick our ass in smarts and drum chops, even the ones who aren’t advanced robots, that’s Caity in the new promo video at top, an architecture major with LEED certification and drum line experience right up to the pros (having drummed for the Atlanta Falcons). That’s their fight song at the end. In fact, the only thing Georgia Tech students can’t do, evidently, is sing. (Though, based on my undergrad alma mater, I really, really, really can’t make fun.)

Shimon on Colbert from Georgia Tech on Vimeo.

Talking Machine Music

CDM: You talk in the NPR interview about not wanting the robot to play just like a human. So, I wonder – would you think of the robot as an expression of the musical taste and instincts of its designers? In some ways, it does sound different from a human; could we think of Shimon as having its own style, or being interesting because of its non-humanness?

Weinberg: Our motto is “listen like a human, play like a machine”. The idea behind it is that in order to connect with humans, Shimon has to understand music the way we humans do. For this purpose we developed perceptual modules based on music perception research for concepts such as tension and release, stability, similarity, etc. When Shimon responds, however, we want him to be surprising and inspiring, introducing new ideas that humans are not likely to use, whether by using mathematical processes that humans cannot process in real time or just through mechanical abilities. For this purpose we developed algorithms that utilize concepts such as genetic algorithms, fractals, morphing of HMM-based improvisation, etc. So for example, Shimon can respond by morphing the styles of Monk, Coltrane and his human co-player, in a way that humans probably will never use. In that sense he has his own musical style. For each different piece, though, Shimon may have a different style or “taste” based on the algorithm we use. I assume that one could say that his “taste” is a combination of all of his styles, which are inspired by the designers’ input.
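For the curious, here is a toy sketch of what "morphing" two styles could look like in code. To be clear, this is not Shimon's actual algorithm: it swaps the HMM-based approach Weinberg mentions for plain first-order Markov chains, and the two style tables (`STYLE_A`, `STYLE_B`), the function names, and the probabilities are all made up for illustration.

```python
import random

# Hypothetical toy "styles": for each note, the probability of the next note.
# These numbers are invented; they are not derived from any real player.
STYLE_A = {
    "C": {"C": 0.1, "E": 0.6, "G": 0.3},
    "E": {"C": 0.2, "E": 0.1, "G": 0.7},
    "G": {"C": 0.7, "E": 0.2, "G": 0.1},
}
STYLE_B = {
    "C": {"C": 0.5, "E": 0.1, "G": 0.4},
    "E": {"C": 0.6, "E": 0.3, "G": 0.1},
    "G": {"C": 0.2, "E": 0.5, "G": 0.3},
}


def morph(style_a, style_b, weight):
    """Blend two transition tables: weight=0 gives style_a, weight=1 gives style_b."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return {
        state: {
            nxt: (1 - weight) * style_a[state][nxt] + weight * style_b[state][nxt]
            for nxt in style_a[state]
        }
        for state in style_a
    }


def improvise(table, start, length, rng):
    """Walk the transition table to produce a phrase of `length` notes."""
    phrase = [start]
    for _ in range(length - 1):
        options = table[phrase[-1]]
        next_note = rng.choices(list(options), weights=list(options.values()))[0]
        phrase.append(next_note)
    return phrase


if __name__ == "__main__":
    halfway = morph(STYLE_A, STYLE_B, 0.5)
    print(improvise(halfway, "C", 8, random.Random(0)))
```

Sliding `weight` from 0 to 1 over the course of a phrase would drift from one style toward the other, which is roughly the flavor of blending described above, minus everything that makes the real thing musical.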

How has the robot’s algorithm evolved since we first talked about it? What sorts of modifications have you found useful?

An important recent addition to the project is Shimon’s social head, developed with the help of my former postdoc – Guy Hoffman. In order to create the connection with humans, we explored ideas of embodiment and gestures, as instrumental aspects of expressive musical group play. Currently Shimon can detect the beat and nod his head accordingly, which helps humans get into the groove. He would look at what he finds interesting (if one player is playing something different than before, or different than the other players, Shimon would look at her rather than the other players). If Shimon plays something sophisticated, he is more likely to look at his own arms etc. We are also working on anticipation and coordination. We installed a camera in his head, and are currently working on letting Shimon use the visual input to anticipate and coordinate his playing with humans, along with the auditory information it currently processes.
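Here is a rough sketch of how those head behaviors might be expressed as rules. Everything in it is an assumption for illustration: the `novelty` measure, the `threshold` value, the player names, and the function names are hypothetical and are not taken from the Georgia Tech code.

```python
def novelty(previous, current):
    """Fraction of positions where the current phrase differs from the previous one."""
    length = max(len(previous), len(current))
    if length == 0:
        return 0.0
    same = sum(1 for a, b in zip(previous, current) if a == b)
    return 1.0 - same / length


def gaze_target(history, self_complexity=0.0, threshold=0.8):
    """Pick where to look.

    history maps each player name to (previous_phrase, current_phrase).
    self_complexity is a made-up 0..1 score for how intricate the robot's
    own current phrase is; above the threshold it watches its own arms.
    """
    if self_complexity >= threshold:
        return "own arms"
    # Otherwise look at whoever changed the most since their last phrase.
    return max(history, key=lambda player: novelty(*history[player]))


def nod_times(bpm, beats, start=0.0):
    """Times (in seconds) at which to nod, one nod per detected beat."""
    period = 60.0 / bpm
    return [start + i * period for i in range(beats)]


if __name__ == "__main__":
    history = {
        "pianist": (["C", "E", "G"], ["C", "E", "G"]),  # repeating herself
        "drummer": (["x", "-", "x"], ["-", "x", "x"]),  # trying something new
    }
    print(gaze_target(history))
    print(nod_times(120, 4))
```

The real system has to estimate the beat from audio and judge "interesting" from perceptual models, so treat this only as a way to picture the decision logic.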

Did the Colbert Bump go to Shimon’s head?

First Shimon was somewhat insulted, for himself and for the idiom of Jazz in general. Then he realized that they spelled his name correctly, so he started bragging about it. He also started to play in other styles to distance himself from the genre 🙂

But seriously, I am writing a new piece for Shimon in an African Marimba Band Play style, which will help show its versatility in genres. (And here is an old clip where he plays an Indian Raga.)

Any other work we should know about, or other research growing out of the Shimon project?

To validate the importance of visual cues in music group play (i.e., validating the importance of Shimon’s embodiment and the physicality of the robot in comparison to interacting with computer-generated music), we conducted this research: “Visual Cues: The Effect of the Visual Modality on Musical Ensemble Synchronization”

Also, check out this paper, which won the Best Cognitive Paper in ICRA 2010. [That’s the IEEE International Conference on Robotics and Automation, for those of you not in the know. And IEEE originated as the Institute of Electrical and Electronics Engineers. Whew. -Ed.]
“Gesture-Based Human-Robot Jazz Improvisation” by Guy Hoffman and Gil Weinberg [PDF link]

Shimon in Videos

Georgia Tech Center for Music Technology