Extraction of musical structure
I think my next big project will involve automatically extracting structure from music. Mike and I had some discussions about doing this with machine learning / evolutionary algorithms, which produced some interesting ideas. For now I'm implementing some of the more traditional signal-processing techniques. There's an overview of the literature in this paper.
What I have to show so far is this:
This (ignoring the added colors) is a representation of the autocorrelation of a piece of music ("Starlight" by Muse). Each pixel of distance in either the x or y axis represents one second of time, and the darkness of the pixel at (x,y) is proportional to the difference in average intensity between those two points in time. Thus, light squares on the diagonal represent parts of the song that are homogenous with respect to energy.
The colored boxes were added by hand, and represent the musical structure (mostly, which instruments are active). So it's clear that the autocorrelation plot does express structure, although at this crude level it's probably not good enough for extracting this structure automatically. (For some songs, it would be; for example, this algorithm is very good at distinguishing "guitar" from "guitar with screaming" in "Smells Like Teen Spirit" by Nirvana.) An important idea here is that the plot can show not only where the boundaries between musical sections are, but also which sections are similar (see for example the two cyan boxes above).
The next step will be to compare power spectra obtained via FFT, rather than a one-dimensional average power. This should help distinguish sections which have similar energy but use different instruments. The paper referenced above also used global beat detection to lock the analysis frames to beats (and to measures, by assuming 4/4 time). This is fine for DDR music (J-Pop and terrible house remixes of 80's music) but maybe we should be a bit more general. On the other hand, this approach is likely to improve quality when the assumptions of constant meter and tempo are met.
On the output side, I'm thinking of using this to control the generation of flam3 animations. The effect would basically be Electric Sheep synced up with music of your choice, including smooth transitions between sheep at musical section boundaries. The sheep could be automatically chosen, or selected from the online flock in an interactive editor, which could also provide options to modify the extracted structure (associate/dissociate sections, merge sections, break a section into an integral number of equal parts, etc.) For physical installation, add a beefy compute cluster (for realtime preview), an iPod dock / USB port (so participants can provide their own music), a snazzy touchscreen interface, and a DVD burner to take home your creations.