Artificial Synesthesia

I’ve always been interested in synesthesia and steganography. The two aren’t really related, but when I was tasked with a project for an art class, I decided to make it interesting by trying to marry the two.

You can “listen” to a picture pretty easily. By simply copying and pasting the contents of a bitmap into a .wav file, you can trick a program like Audacity into “playing” an image file. Here’s what the Mona Lisa sounds like using this method:

MonaLisa.bmp as .wav
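The conversion is simple enough to sketch in a few lines of Python (a hypothetical reconstruction, not the exact method I used): take everything after the bitmap’s pixel-array offset and wrap it in a WAV header. The function name and sample rate here are arbitrary.

```python
import wave

def bmp_to_wav(bmp_path, wav_path, sample_rate=44100):
    """Interpret a bitmap's raw pixel bytes as 8-bit mono PCM audio."""
    with open(bmp_path, "rb") as f:
        data = f.read()
    # A BMP stores the offset to its pixel array as a little-endian
    # uint32 at byte 10 of the file header.
    pixel_offset = int.from_bytes(data[10:14], "little")
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)           # 8-bit samples: one byte = one sample
        w.setframerate(sample_rate)
        w.writeframes(data[pixel_offset:])
```

Audacity’s “Import Raw Data” feature accomplishes much the same thing without needing the WAV header at all.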


While interesting, I wanted something a little more musical. I wanted something that sounded nice. For a while I played with the idea of making an image that, when converted into a .wav this way, would sound interesting. I quickly discovered that the color content of the image mattered much less than its spatial frequency content. That is, while a single vertical red bar and a single green bar might sound identical (a single tone), two bars side by side would sound very different (a doubling in frequency). This is probably better illustrated by the image below:

One of my first attempts to draw something that sounded interesting. The image has been rotated 90 degrees, and so gets played from left to right.
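That doubling is easy to verify directly on the byte stream. Here is a minimal Python sketch (the function names are illustrative, and it assumes 24-bit pixels stored blue-green-red, as in a typical .bmp): the smallest repeat period of the bytes depends on bar width, not bar color.

```python
def bar_bytes(color, bar_width, n_bars):
    """Byte stream for vertical bars alternating color/white, 3 bytes per pixel."""
    white = (255, 255, 255)
    out = []
    for i in range(n_bars):
        px = color if i % 2 == 0 else white
        out.extend(px * bar_width)
    return bytes(out)

def period(seq):
    """Smallest repeat period of a byte sequence."""
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i % p] for i in range(len(seq))):
            return p
    return len(seq)

red   = (0, 0, 255)   # BGR byte order
green = (0, 255, 0)

# Same bar width -> same period -> same pitch, regardless of color
assert period(bar_bytes(red, 8, 8)) == period(bar_bytes(green, 8, 8))
# Half the bar width -> half the period -> double the frequency
assert period(bar_bytes(red, 4, 8)) == period(bar_bytes(red, 8, 8)) // 2
```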


You can clearly hear the part in the middle where the bars produce semi-uniform tones. You can also hear the difference between the black halves of the rightmost circles and the green halves. What you’re hearing is the difference in the way green and black pixels are represented. For the black section, you’re hearing a simple alternation between bytes that are mostly ‘all on’ (white) and ‘all off’ (black). For the green section, however, you’re hearing an alternation between bytes that are ‘all on’ (white) and triplets of bytes where only every third one is ‘all on’.
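In byte terms (again assuming a 24-bit bitmap, which stores each pixel as blue, green, red), the two halves look like this:

```python
# 24-bit bitmaps store each pixel as three bytes: blue, green, red
black, green, white = (0, 0, 0), (0, 255, 0), (255, 255, 255)

def stream(*pixels):
    """Flatten a run of pixels into the raw byte sequence that gets 'played'."""
    return bytes(b for px in pixels for b in px)

# Black half: the stream swings between all-off and all-on bytes
assert stream(black, white) == bytes([0, 0, 0, 255, 255, 255])
# Green half: only one byte in each green triplet is on
assert stream(green, white) == bytes([0, 255, 0, 255, 255, 255])
```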
At any rate, while it was interesting to try to reverse the process by drawing something from scratch, I wanted to get back to the idea of converting existing visual art into something that sounded musical. I needed some way of mapping the pixel data in the image to notes that would be at least somewhat intuitive. Directly mapping the audio spectrum onto the visual spectrum turns out to be tough, though, because a visual color can’t be described by a single frequency: there is a brightness component as well. For instance, you can see every color of the visible spectrum in the image below, but nowhere along that spectrum is white or dark green.

Source: Wikimedia Commons

Our eyes produce these ‘emergent’ colors both by being sensitive to intensity and by treating combinations of colors differently. To try to imitate this, I wrote a Java program using JFugue to translate the pixels in an image into a MIDI sequence of three-note piano chords. For each pixel, the three values for red, green, and blue get mapped onto corresponding sections of the piano.
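Roughly, the mapping works like this (a simplified Python sketch; the actual program used Java and JFugue, and the register boundaries below are illustrative, not the ones I used):

```python
def pixel_to_chord(r, g, b, notes_per_register=24):
    """Map an (r, g, b) pixel to a three-note chord of MIDI note numbers:
    red -> bass register, green -> middle, blue -> treble."""
    def scale(value, base):
        # Squeeze the 0-255 channel value into a two-octave register
        return base + value * notes_per_register // 256
    return (scale(r, 24),   # red   -> MIDI 24-47 (low piano)
            scale(g, 48),   # green -> MIDI 48-71 (middle)
            scale(b, 72))   # blue  -> MIDI 72-95 (high)

# A black pixel sits at the bottom of each register, white at the top
assert pixel_to_chord(0, 0, 0) == (24, 48, 72)
assert pixel_to_chord(255, 255, 255) == (47, 71, 95)
```

One nice property of this scheme is that brightness and hue both survive the translation: a brighter pixel pushes all three notes upward within their registers, while hue shifts the balance between the bass, middle, and treble notes.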

It’s not perfect, but the results were actually pretty interesting. I had to downsample the images heavily so they could be played in a timely manner (a 640×480 image at 2 pixels per second would take almost 43 hours to play). Here are some sample images and their corresponding ‘songs’:


Blue flower blueflower.midi

The rhythm you hear in the Mona Lisa MIDI is actually due to the padding bytes used in the bitmap format the image was saved in. These padding bytes occur at the end of every pixel row, at regular intervals, and always have the same value, which causes the rhythm effect you hear. It had never occurred to me that a file format might have a rhythm, but in hindsight it makes sense. The padding bytes were removed for the blue flower MIDI.
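Those padding bytes exist because each row of a bitmap’s pixel data must be a multiple of four bytes long. The amount of padding per row is easy to compute (a quick illustration with hypothetical image widths):

```python
def bmp_row_padding(width_px, bytes_per_pixel=3):
    """Zero bytes a BMP appends to each pixel row to reach a 4-byte multiple."""
    row_bytes = width_px * bytes_per_pixel
    return (4 - row_bytes % 4) % 4

assert bmp_row_padding(3) == 3   # 9-byte rows -> 3 padding bytes per row
assert bmp_row_padding(4) == 0   # 12-byte rows need no padding
```

Because the padding repeats once per row, it shows up in the audio as a steady pulse at the row rate.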

At any rate, it was a really interesting project and I hope to get more time someday to keep playing with it.