Austrian composer Peter Ablinger has transformed a recording of a child speaking so that it can be played as MIDI events on a mechanically controlled piano, making the piano a kind of speech speaker. Via Matrixsynth, the readers at Hack a Day get fairly involved with how this may be working.

It seems not quite accurate to describe this as vocoding in the strictest sense; it is closer to a simple transformation to a (much) lower frequency resolution, namely the 88 keys of the piano. Ablinger, for his part, describes the events as “pixels.” It’s pretty extraordinary that, without a bandpass filter, you get something approximating the noisy sibilance of speech, but this seems to be the result of having lots of events (that is, lots of resolution in time). Edit: Listening again, the short answer to how you can hear so much of the voice through the piano seems to be that you can’t; the original is almost certainly mixed in. It’s nonetheless an interesting effect, and I’d like to hear the piano on its own.

In other words, the basic process is: 1) convert the sound spectrum of the recorded voice to a series of MIDI events, and 2) play back the translated MIDI file. You can see that the MIDI playback is accomplished with Pd (Pure Data) running on a Linux/KDE netbook, though it’s not clear what was used to do the original conversion. (The screenshot with side-by-side audio and MIDI appears as though it may be for demonstration purposes only.)
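To make step 1 concrete, here is a minimal Python sketch of the general idea: measure how much energy one short frame of audio has at each of the 88 key frequencies, and turn the strong ones into note/velocity pairs. This is not Ablinger's Pd patch. The sample rate, frame length, threshold, and function names are all assumptions made up for illustration, and a real tool would use overlapping, windowed frames.

```python
import math

SAMPLE_RATE = 8000  # Hz; hypothetical, kept low so the pure-Python loops stay fast


def key_frequency(midi_note):
    """Equal-tempered frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def key_energy(frame, freq, sample_rate=SAMPLE_RATE):
    """Amplitude of `frame` at a single frequency (a one-bin DFT)."""
    step = 2.0 * math.pi * freq / sample_rate
    re = sum(s * math.cos(step * n) for n, s in enumerate(frame))
    im = sum(s * math.sin(step * n) for n, s in enumerate(frame))
    return 2.0 * math.hypot(re, im) / len(frame)


def frame_to_notes(frame, threshold=0.05, sample_rate=SAMPLE_RATE):
    """Return (midi_note, velocity) pairs for keys louder than `threshold`."""
    notes = []
    for note in range(21, 109):  # the 88 piano keys, A0 (21) to C8 (108)
        freq = key_frequency(note)
        if freq >= sample_rate / 2:  # above Nyquist: cannot be measured
            continue
        amp = key_energy(frame, freq, sample_rate)
        if amp >= threshold:
            # Map amplitude 0..1 onto MIDI velocity 1..127.
            velocity = max(1, min(127, round(amp * 127)))
            notes.append((note, velocity))
    return notes


# Usage: a 0.1 s test tone at 440 Hz should light up A4 (MIDI note 69) most strongly.
frame = [math.sin(2.0 * math.pi * 440.0 * n / SAMPLE_RATE) for n in range(800)]
notes = frame_to_notes(frame)
loudest_note, loudest_velocity = max(notes, key=lambda pair: pair[1])
```

Because the frame is short, neighbouring keys pick up some leakage from the test tone, which is one reason the result is a coarse “pixelation” rather than a clean transcription. Repeating this for every frame gives the stream of note events for step 2.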

Correction: The work is absolutely done in custom software developed by the composer in Pd (Pure Data). It’s an ideal tool for the job, and free and open source. I wouldn’t dare try to replicate the results here, but this is fantastic inspiration for playing with sound in Pd.

One Windows tool that’s capable of the job is TS-AudioToMIDI, as observed by Hack a Day commenter spacecoyote. Whether or not that’s what’s at work here – and it may well be – that utility is itself interesting. Edit: Yeah, it’s far more likely the whole thing was done in Pd. And Pd should be up to the task.

TS-AudioToMIDI

Of course, this is to say nothing of the lovely work done on the mechanical piano. It’s a beautiful piece. Here’s hoping some government bureaucrats got the message of the declaration. Now, we just need a chorus of something really loud – say a thousand trumpets – shouting out the Universal Declaration of Human Rights.

  • Dub

    Gorgeous! Just gorgeous.

  • I would have never guessed that a piano's frequency specificity would be sufficient for this kind of thing!

    It would be cool if he wrote a piece that transitioned gradually from something more recognizably "music" to "speech".

  • I also wonder if he deliberately chose a kid with sibilant heavy speech.

  • vcd

    Interesting piece, but I'm not so sure you would be able to tell it was based on speech if the original audio were not played on top of it, or if the transcription were not being shown to read in time. The work done with the piano is pretty stunning, though.

    Leave it to Peter to turn something borderline sensational into something completely sensationalized (re: last paragraph).

  • shamburglar

    anybody know of any decent audio to midi apps for Mac?

  • Adrian Anders

    TS-AudioToMIDI dev should invest the time to make a VST plug-in version of his software. I would be interested in it then.

  • Dano

    @shamburglar

    Similar thing for Mac:
    http://widisoft.com/english/mp3-midi-products.htm

  • I use WIDI for mac. I made a "player piano" video a while back for the ohm64 using it and show how it is done here http://www.youtube.com/watch?v=KkKESe_QdKE

  • Well, he's encoding in a process seemingly similar to a vocoder's but decoding in a non-traditional way.

    If you think he's mixing the original with the piano, then he's definitely cheating. It doesn't look realtime to me, but perhaps that might be a reason why it would be somewhat excusable to hear the original.

    If it is not cheating, I think it is very impressive. A traditional vocoder adjusts each frequency band's volume continuously. A piano just has a velocity, a short attack, and a long decay that can be dampened, and that is on top of the difference between its rich, harmonic, pitched sound and bandpass decoding.

  • kobe

    one word: Melodyne.

  • Looks to me that the desktop was actually running Linux/KDE with Pd, rather than Windows. Also, the person at the computer looks an awful lot like Winfried Ritsch from IEM, where they do a lot of work with Pd. So, it seems unlikely that the Windows app in question is being used here and more likely that the whole thing is done in Pd…

  • @Ivica: You're completely right. That is indeed very clearly KDE. And there's a big honking "X" in the other window. 😉 So, yes, I agree, and I should get back to learning more Pd signal processing kung fu.

  • Stij

    Wow. I've often wondered if something like this was possible, but I've never had any idea of how to implement it. If this is legit then it's very impressive.

  • Here's a piano only video http://vimeo.com/1483630. Not the same text and with additional notes. Very amazing.

  • Stij

    Hmm…yeah, it isn't nearly as intelligible without the original voice mixed in, but you can still hear some of the sibilants.

    It also sounds extremely creepy!

  • GMM

    Wow this is amazing. And it is only a piano. Imagine when you have a whole orchestra scored and conducted to reproduce speech, and then further on, a whole orchestra running in realtime as a vocoder!

  • Here we go – here's the full explanation of how the whole thing works, including a blurry image of the Pd patch.

    http://ablinger.mur.at/docu11.html

    I must say, I love the idea of pixelation – this is something that, as a general approach, could be attached to a wide variety of work.

    Oh, and I actually prefer the more abstract rendition minus the overlaid speech. Who needs intelligibility? It's gorgeous.

  • Fishboy

    Why are so many commenters focusing on the sibilants? What makes them more interesting than other phonemes/classes of phonemes?

  • I'm not a linguist, but sibilants are essential to understandability, and they're the thing that would theoretically be hardest to hear on a piano which is least able to produce broad-band noise (versus formants/vowels). If you listen to the piano without the voice, in fact, it's what seems to be largely missing.

  • Fishboy

    So are you saying you hear vowels in the video without the actual voice layered in? http://vimeo.com/1483630 I couldn't hear a voice in that one, myself, at least not well enough to make out any words or phonemes – vowels, sibilants, or otherwise. I guess to my ear it sounded vaguely vocal. But anyway, I thought the most interesting would be vowels, especially diphthongs, since the language used is English.

  • John

    I'm not clear, how is this concept of "pixels" really any different than that of wavelets?

    As an aside, I'm not wholly convinced that they *are* mixing in the original audio on the feature video. Upon hearing the kid's plain voice, his formant seems different than what is coming from the piano audio. Is there anything other than subjective listening which would indicate that they are mixing in the original audio? The Vimeo clip Fishboy links to is difficult to compare, simply because of the vastly different acoustics, different piano AFAIK, and it doesn't seem to have the dampening that the one above does.

    Interesting work regardless of this point.

  • Dub

    Also covered by MeFi

  • piker

    so what. he got a computer. good for him.

  • Speaking orchestra? http://www.heraldscotland.com/speakings-a-new-mus
    Harvey has done excellent work for decades.

    To blow my own horn: my Amiga program RGS is a real time spectrogram paint program (from 1987 originally), which could send out spectra as MIDI information, therefore being able to make my (microtonally tuned) DX7 to emit intelligible and unintelligible speech. http://www.echo.net/~jhhl/Mp3/RGS/

  • Pianoman

    Intelliscore is another program that converts audio to MIDI. It works with the latest versions of Windows, including Vista and Windows 7. The website is: http://www.intelliscore.net/

  • Sylvaiw

    Why do you say the original voice is mixed with the piano? Where did you get this information? I can't find it.

    In my opinion, only the piano is heard, and that's the whole interest of this thing.

  • telfer cronos

    i'm sure you are right, sylvia.

  • richardmullins

    “Listening again, the short answer to how you can hear so much of the voice through the piano seems to be, you can’t; the original is almost certainly mixed in”.
    I disagree. Surely the point of the exercise is that the original is not mixed in.

  • Firekraag

    Any sound is a sum of pure frequencies in the 20 Hz to 20 kHz range. Computers have done the same sampling job since sound cards were created, but I have to admit using a live piano is kinda sexier.

  • Jim Bumgardner

    I’ve been trying to reproduce this effect in software using FFT and piano samples. Having a very hard time getting the voice to be nearly as intelligible as it is in these clips. At this point, I’m inclined to agree with Peter that the composer may be mixing in the original audio a bit (I’ve seen similar cheats in many photomosaics). Unfortunately, most of the writings on the composer’s site are about *why* he did it, rather than *how* he did it.

    • dvf

      Did you treat the piano samples as though they were sine tones, or did you apply the FFT to both the voice signal and the piano samples and then solve the system of triangular linear equations?

      • I did the former. You make a good point, the latter should do a better job of approximating the desired result. Will have to try it. Have you?

        • dvf

          I haven’t tried it. Sounds fairly computationally intensive, and I do all my work with interpreted rather than compiled languages. I do know how to do all the steps and I have coded an FFT (in PostScript, lol, but it works). Piano sample sets vary lots from one to another, so I suspect that makes a big difference. I’d go with the fairly percussive samples,

          • Lorenzo Peyrani

            It’s absolutely not mixed in. I got extraordinary results without any effort (I was lucky with the midi converter, go to this site: http://www.ofoct.com/audio…/convert-wav-or-mp3-ogg-aac-wma-to-midi.html); if you use a flute sound instead of the piano it gets even clearer. The quality of the result also probably depends on the timbre of the original voice. I used the TS Eliot recording of The Waste Land and you can even recognize Eliot’s particular accent (with just a shit midi playing!).
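For readers following dvf's suggestion in the thread above, here is a toy Python sketch of why the system can be treated as triangular: a piano note has energy only at and above its own fundamental, so the lowest bin is explained by the lowest note alone and each higher note can be solved in turn by forward substitution. Everything here is a made-up illustration, not anyone's actual code: the spectrum is reduced to one bin per semitone, and the `PARTIALS` template (fundamental, octave, and octave plus a fifth) stands in for the FFT of a real piano sample.

```python
# Hypothetical overtone template for every note:
# semitone offset above the fundamental -> relative strength.
PARTIALS = {0: 1.0, 12: 0.5, 19: 0.25}


def note_spectrum(key, n_bins):
    """Toy spectrum of one note on a grid with one bin per semitone."""
    spectrum = [0.0] * n_bins
    for offset, strength in PARTIALS.items():
        if key + offset < n_bins:
            spectrum[key + offset] = strength
    return spectrum


def mix(weights):
    """Spectrum produced by playing note k at loudness weights[k]."""
    n_bins = len(weights)
    total = [0.0] * n_bins
    for key, weight in enumerate(weights):
        for b, strength in enumerate(note_spectrum(key, n_bins)):
            total[b] += weight * strength
    return total


def solve_note_weights(target):
    """Forward substitution on the lower-triangular note/bin system."""
    weights = [0.0] * len(target)
    for b in range(len(target)):
        # Energy at bin b already explained by overtones of lower notes.
        explained = sum(
            weights[b - offset] * strength
            for offset, strength in PARTIALS.items()
            if 0 < offset <= b
        )
        weights[b] = (target[b] - explained) / PARTIALS[0]
    return weights


# Usage: three notes whose overtones overlap with one another.
true_weights = [0.0] * 24
true_weights[0], true_weights[5], true_weights[12] = 1.0, 0.5, 0.25
target = mix(true_weights)

naive = list(target)                  # "treat notes as sine tones": read bins directly
solved = solve_note_weights(target)   # account for each note's overtones
```

In this toy case the naive reading overstates note 12, because the octave partial of note 0 lands in the same bin, while the forward substitution recovers the original loudness values. On a real voice spectrum the solved values can come out negative, which a piano cannot play, so a practical version would have to clamp them or use a constrained fit instead.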
