Music Visualization with the Fourier Transform

January 8th 2018
by Miranda Hawks

I first learned about the Fourier transform in a Software Development class. The idea was to make an app that can detect copyright infringement, so we needed to be able to match snippets of songs to each other. Basically, we needed to build something similar to Shazam.

At the time, I didn't fully understand how a Fourier transform worked or how to use it. A lot of the class struggled with the project and the math involved. It wasn't until I tried using the Fourier transform in a later class, Computer Graphics, that I figured out how to use it without getting bogged down in too many technical details.

So, for this tutorial, I’m going to explain how I built a basic music visualizer that I used as a project in my Computer Graphics class. It was a pretty fun little project and I really enjoyed playing around with it.

What is a Fourier Transform?

There are a lot of fancy math equations and graphs that come up when you search for a Fourier transform, but for the purposes of this tutorial I'm going to give a basic overview.

So, we're building a music visualizer. What's the most common way that we see music represented digitally?

Usually, we see a standard waveform. What this shows is how loud the song is over time. So around the middle of the song, we see these big spikes, and then these spikes get significantly smaller. So we can see that the song gets loud, and then much quieter right after. This is showing amplitude (loudness) over time. This representation is called the time domain.

This doesn't give us a lot of information about the song, other than how loud it is at certain points. To learn more about it, we can use a Fourier transform.

Instead of representing audio in the time domain, a Fourier transform lets us represent it in the frequency domain. This means instead of showing us amplitude over time, we'll be looking at the amplitude of each frequency.

Here's a visual example:

This is showing a kick drum in both the time domain (left) and the frequency domain (right).

We can see on the left that there's an initial "boom" of sound that fades out over time. On the right, we're seeing the frequencies instead. So at the point where I stopped it (shown by the orange vertical line on the left side), we can see all of the frequencies at that point on the right.

So if we look at the frequency section, we can see there are a lot of lower-range frequencies. That makes sense, as we're looking at a kick drum.
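To make the jump from time domain to frequency domain a bit more concrete, here's a minimal sketch of a naive discrete Fourier transform in plain JavaScript. It isn't what P5JS runs internally, and the function name and test signal are made up for illustration. It just shows that a signal made of one pure tone ends up with nearly all of its amplitude in a single frequency bin.

```javascript
// Naive discrete Fourier transform: returns the magnitude of each frequency bin.
// Illustrative only; real libraries use an FFT, which gives the same answer faster.
function dftMagnitudes(samples) {
  const n = samples.length;
  const magnitudes = [];
  // For a real-valued signal, only the first n / 2 bins are unique.
  for (let k = 0; k < n / 2; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    magnitudes.push(Math.sqrt(re * re + im * im) / n);
  }
  return magnitudes;
}

// Hypothetical test signal: 64 samples containing exactly 5 cycles of a sine wave.
const sampleCount = 64;
const samples = [];
for (let t = 0; t < sampleCount; t++) {
  samples.push(Math.sin((2 * Math.PI * 5 * t) / sampleCount));
}

const magnitudes = dftMagnitudes(samples);
const loudestBin = magnitudes.indexOf(Math.max(...magnitudes));
console.log(loudestBin); // prints 5, the bin matching the 5-cycle sine wave
```

A kick drum isn't a single pure tone, so its spectrum is spread over a range of low bins rather than one spike, but the idea is the same.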

Now, how can we use this for a music visualizer?

Creating a Music Visualizer

I used P5JS (a JavaScript implementation of Processing) for this since it's pretty easy to jump into, even without a ton of programming experience. However, you can do this in any language as long as it can do the following, either natively or with libraries:

  • draw to the screen
  • read a sound file
  • perform a Fourier transform

Most libraries I've used for the Fourier transform use something called a Fast Fourier Transform (FFT), which is just what it sounds like: a faster algorithm for computing the same result. That's what I'll be using here, alongside the P5JS sound library.
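If you're curious what "faster" means here, the classic trick is to split the samples into even and odd halves, transform each half, and combine the results. Below is a rough sketch of that recursive (radix-2) approach. The function name is hypothetical, it assumes the input length is a power of 2, and it is not the implementation the P5JS sound library uses. You don't need to write this yourself to build the visualizer.

```javascript
// Recursive radix-2 FFT sketch. Input length must be a power of 2.
// Takes real-valued samples and returns an array of { re, im } complex bins.
function fftRecursive(samples) {
  const n = samples.length;
  if (n === 1) return [{ re: samples[0], im: 0 }];

  // Split into even- and odd-indexed samples and transform each half.
  const even = fftRecursive(samples.filter((_, i) => i % 2 === 0));
  const odd = fftRecursive(samples.filter((_, i) => i % 2 === 1));

  const bins = new Array(n);
  for (let k = 0; k < n / 2; k++) {
    // Rotate the odd half by the "twiddle factor" before combining.
    const angle = (-2 * Math.PI * k) / n;
    const tRe = Math.cos(angle) * odd[k].re - Math.sin(angle) * odd[k].im;
    const tIm = Math.cos(angle) * odd[k].im + Math.sin(angle) * odd[k].re;
    bins[k] = { re: even[k].re + tRe, im: even[k].im + tIm };
    bins[k + n / 2] = { re: even[k].re - tRe, im: even[k].im - tIm };
  }
  return bins;
}

// Hypothetical test signal: 8 samples containing one full cycle of a cosine wave.
const samples = [];
for (let t = 0; t < 8; t++) {
  samples.push(Math.cos((2 * Math.PI * t) / 8));
}

const magnitudes = fftRecursive(samples).map((bin) => Math.hypot(bin.re, bin.im));
// The energy shows up in bin 1 and in bin 7, which is the mirror image of bin 1.
console.log(magnitudes.map((m) => m.toFixed(2)));
```

Each level of recursion halves the problem, which is why this scales much better than checking every sample against every frequency.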

Setting up will vary depending on what language you want to use, but I set up a simple index.html file to import my libraries and my new code. I'm also running a simple Python server so I can view my visualizer.

So now, we need to import the song, create a canvas, play the song, and perform the FFT on it. This sounds like a lot but it can actually be fairly simple in something like P5JS.

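// Declare these up front so preload(), setup(), and draw() can all share them
let sound;
let fft;

function preload() {
  sound = loadSound('sample.mp3');
}

function setup() {
  createCanvas(windowWidth, windowHeight);
  // Some browsers block audio until the user clicks or presses a key on the page
  sound.play();
  fft = new p5.FFT(0.7, 1024); // smoothing, number of bins
}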

The values I'm passing to p5.FFT are the smoothing and the number of bins. Smoothing is a value from 0 to 1 that controls how "jerky" our visual will be: values closer to 1 average each frame more heavily with the previous ones, so the visual moves more smoothly. The number of bins means that we will have an array of length 1024, where each index represents a frequency band and the value at that index is the amplitude of that band. So array[0] will be the lowest frequencies, and array[1023] will be the highest. The only requirement for this parameter is that it is a power of 2 between 16 and 1024.
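If you want a rough idea of which frequency an index stands for, the bins are spread evenly from 0 Hz up to half the audio sample rate. Here's a small sketch of that mapping. The helper name is made up, and the 44100 Hz sample rate is an assumption, since the real value depends on your browser's audio setup.

```javascript
// Approximate starting frequency (in Hz) of a given FFT bin.
// Bins are spaced evenly between 0 Hz and half the sample rate.
function binToFrequency(index, binCount, sampleRate) {
  const highestFrequency = sampleRate / 2;
  return (index * highestFrequency) / binCount;
}

const ASSUMED_SAMPLE_RATE = 44100; // assumption: a common rate, yours may differ
const BIN_COUNT = 1024; // matches the second argument passed to p5.FFT

console.log(binToFrequency(0, BIN_COUNT, ASSUMED_SAMPLE_RATE)); // 0
console.log(binToFrequency(1, BIN_COUNT, ASSUMED_SAMPLE_RATE)); // about 21.5
console.log(binToFrequency(1023, BIN_COUNT, ASSUMED_SAMPLE_RATE)); // about 22028
```

With these numbers each bin covers roughly 21.5 Hz, so most of what we'd call bass lands in the first dozen or so indices.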

To get the hang of these, I would just play around with them once you get your visualization set up and see what settings you like. You can go more in depth on the settings in the official documentation, but for the sake of simplicity, I won't get into too much of it. The basic thing we need to know is that it's going to run the FFT on very small chunks of the song. This allows us to update the visual continually throughout the song.

Now, we can make the visualization. Let's write the draw function:

function draw() {
  background(20, 20, 20);
  // Returns an array of amplitude values (between 0 and 255) across the frequency spectrum.
  // Length is equal to FFT bins (1024 by default). The array indices correspond to frequencies
  const spectrum = fft.analyze();
  noStroke(); // Don't create an outline around the shapes
  fill(20, 200, 20); // Set the fill color of the shapes
  for (let i = 0; i < spectrum.length; i++) {
    ellipse(i, spectrum[i], 2, 2);
  }
}

background takes RGB integers, so I've drawn a dark grey background. Then, I get my FFT spectrum. Again, the values are the amplitudes and the indices represent the frequencies.

Next, I'm setting up how I want to represent these values. In this case, I'm drawing green circles with no outline around them. Finally, I'm looping through my FFT spectrum and drawing the circles to visualize it. So the x-coordinate of the ellipse is the frequency (the array index), and the y-coordinate is the amplitude.

Now the visualizer looks something like this:

It's a little weird though, because the circles actually move down on the page when they're at a higher amplitude. That's because the canvas's y-axis starts at 0 at the top and increases downward, which is backwards from what I would expect on a graph. The whole thing is also stuck at the top of the page. So I'll fix both by changing my y-coordinate to be (windowHeight / 2) - spectrum[i].
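Here's one way that change could look, as a sketch. The helper function is a made-up name that just pulls the coordinate math out so it's easier to see, and the draw function assumes the same fft object created in setup earlier.

```javascript
// Turn a spectrum array into points where louder amplitudes sit higher on the
// canvas, measured from a baseline halfway down the page.
function spectrumToPoints(spectrum, canvasHeight) {
  const baseline = canvasHeight / 2;
  return Array.from(spectrum, (amplitude, i) => ({ x: i, y: baseline - amplitude }));
}

// Assumes the p5.FFT object named fft from setup() in this tutorial.
function draw() {
  background(20, 20, 20);
  const points = spectrumToPoints(fft.analyze(), windowHeight);
  noStroke();
  fill(20, 200, 20);
  for (const point of points) {
    ellipse(point.x, point.y, 2, 2);
  }
}
```

Subtracting the amplitude from the baseline is what flips the direction: an amplitude of 0 stays on the middle line, and the maximum of 255 ends up 255 pixels above it.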

So there's a very basic visualization! You can play around with the library you're using to see what interesting effects you can make. I spent a while tinkering with it and came up with something like this:

P5JS lets me grab a named frequency range (such as the bass) instead of using the entire array, so here I made the middle circles expand when the bass frequencies have a higher amplitude to create a kick drum effect. The blue rectangles change color based on the amplitudes of the mid to high ranges, and the green dots work just as before, except I decided to copy and mirror them so it's a bit more symmetrical.
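As a rough sketch of the bass circle idea (not the exact code from my project), the p5.FFT object has a getEnergy method that takes a range name like 'bass' and returns a value from 0 to 255. The helper names and the 50 and 300 pixel sizes below are made up for illustration.

```javascript
// Scale an energy value (0 to 255) to a circle diameter in pixels.
function energyToDiameter(energy, minDiameter, maxDiameter) {
  return minDiameter + (energy / 255) * (maxDiameter - minDiameter);
}

// Call this from draw(), passing in the p5.FFT object created in setup().
function drawBassCircle(fft) {
  fft.analyze(); // getEnergy() reads from the most recent analyze() call
  const bass = fft.getEnergy('bass');
  const diameter = energyToDiameter(bass, 50, 300); // hypothetical size range
  noStroke();
  fill(20, 200, 20);
  ellipse(width / 2, height / 2, diameter, diameter);
}
```

The same pattern should work for the other named ranges, such as 'mid' or 'treble', if you'd rather drive a color or a rectangle height instead of a circle.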

Like I've said, I'd encourage you to play around with different settings and colors and get creative! And try it out with different songs too. I've tested it out on songs that have a male and female vocalist and it's interesting to see where those fall in the spectrum. Playing around with different genres can be a lot of fun too. You could even try to make custom visualizations for different songs.

Hopefully this simplifies the idea of the Fourier transform a bit and helps you understand how you can use it in projects, even if it looks very complicated at first glance. I wouldn't say I'm a math person by any means, but once I was able to figure out how it represents sound, it became much easier to work with.