Drawing good-looking waveforms on the web

A small exploration of creating waveform visualizations on the web, simple and raw.

After starting a small project that had an audio component, I wanted to show some representation of the audio. So how do you visualize audio? I decided to look at products that have any sort of audio visualization and found two distinct types of looks:

Well, it doesn't seem like a complex task, so what about libraries that help us achieve it? The SoundCloud one is really stylized and also a web solution, and while searching for what they are using I stumbled upon Waveform.js. At first glance it's an easy-to-use solution and even supports streaming the waveform data, but one thing doesn't seem right:

Well, it seems like SoundCloud is not using it anymore, so how do we really replicate that type of audio visualization? One interesting note here is that they wrote an article some years ago showcasing the aforementioned library. The first thing to be aware of is that our audio file contains a lot of samples. For my specific purpose I had audio between 1 hour and 24 hours long, which is far too much to use every sample for the visualization. So the next thing we have to do is decide which samples we are going to keep and which samples we want to discard, while keeping the shape of the waveform almost intact. We could use an audio-specific downsampling technique, but if we are not too concerned with accuracy, simply selecting the maximum value of each group of samples is enough.
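As a rough sketch of that idea (plain JavaScript, with made-up names; not the exact code from the gists below), reducing an array of samples down to a fixed number of values by keeping the maximum of each bucket could look like this:

// Reduce `samples` (an array of numbers) to roughly `targetLength` values,
// keeping the maximum of each bucket so the overall shape survives.
function downsampleMax(samples, targetLength) {
  const bucketSize = Math.ceil(samples.length / targetLength);
  const result = [];
  for (let i = 0; i < samples.length; i += bucketSize) {
    let max = 0;
    const end = Math.min(i + bucketSize, samples.length);
    for (let j = i; j < end; j++) {
      const value = Math.abs(samples[j]); // only the amplitude matters here
      if (value > max) max = value;
    }
    result.push(max);
  }
  return result;
}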

Getting Samples

One of the best tools for extracting the audio samples from either a video or an audio container format is ffmpeg. I know there are more capable tools specific to audio manipulation, but this will do the job for now.

ffmpeg -i <audio/video_file> -ac 1 -ar 44100 -c:a pcm_s16le -f s16le pipe:

- -ac lets us choose how many channels we want
- -ar is the sample rate of the output, which should be the same as the input file
- the other options, -c:a pcm_s16le -f s16le, define how the output will be formatted

We chose the PCM format because it is just raw samples: no headers, no extra metadata. Choosing the proper format to parse is essential; in this case I use signed 16-bit little-endian. All the other formats are listed at https://trac.ffmpeg.org/wiki/audio%20types. Once we have the proper format we just need to parse it in chunks, because there is no need to keep the whole file in memory.
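The author's full snippet is in the gist linked below; as a minimal sketch of the same idea (assuming Node.js, a local ffmpeg binary on the PATH, and a hypothetical input.mp3), spawning the process and turning the raw s16le bytes into normalized samples could look roughly like this:

const { spawn } = require('child_process');

// Spawn ffmpeg and pipe raw signed 16-bit little-endian PCM to stdout.
const ffmpeg = spawn('ffmpeg', [
  '-i', 'input.mp3',      // hypothetical input file
  '-ac', '1',             // mono
  '-ar', '44100',         // sample rate
  '-c:a', 'pcm_s16le',
  '-f', 's16le',
  'pipe:1',
]);

let leftover = Buffer.alloc(0);

ffmpeg.stdout.on('data', (chunk) => {
  // A sample is 2 bytes, so carry any odd byte over to the next chunk.
  const buffer = Buffer.concat([leftover, chunk]);
  const usable = buffer.length - (buffer.length % 2);
  leftover = buffer.subarray(usable);

  for (let i = 0; i < usable; i += 2) {
    const sample = buffer.readInt16LE(i) / 32768; // normalize to [-1, 1]
    // feed `sample` into the bucketing logic sketched earlier
  }
});

ffmpeg.on('close', () => {
  // all samples consumed; the reduced waveform is ready to be serialized
});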

https://gist.github.com/Bloodb0ne/706254db1f7b5baabf6cb86f0142f939#file-test_pcm-js

This is a sample snippet in Node.js that fetches and reduces the samples from a spawned ffmpeg process. First we estimate the number of samples we will get from a file of a certain size, based on the sample rate and the duration of the audio. Using that estimate we can fill a bucket with samples and take the maximum value of each bucket as our final output sample.

Now that we have our final result of 1800 samples, it is time to display them. Using a <canvas> element allows us to easily render and modify the data. From here on we have the simple task of drawing a weighted line for each sample, but because we are in the browser and the element can resize, we get to pick which samples we actually draw.

First off we clear the canvas and then save the context so we don't contaminate the canvas drawing environment. For each pixel in the canvas we decide which sample to draw and what gap to leave between drawn samples, using the barWidth and gapWidth variables. Each sample we draw is represented by a rectangle whose X value depends on the position of the sample and whose Y values depend on the value of the sample.

First we get our sample at index ((k / d) * r) | 0, where k is our current position, d is our canvas width and r is the number of samples in our waveform. Because the indexes are small values we can truncate them with the bitwise operator ( | 0 ) instead of Math.round(). After we get our sample value and scale it into the [0, 1] range, we can use it to determine the minY and maxY values of the rectangle.

T = (C * g) | 0;
S = ((1 - C) * v + g) | 0;

Another important detail specific to the style and aesthetic of the SoundCloud visualization is the line used as the "center" of the sample. It separates the rectangle into two halves, and its position is specified with the linePercent option; in this example it is placed at 70% of the height of the waveform. The minY value, or T, is calculated by simply multiplying the value of the sample by the size of the upper half of the waveform above the separating line. The maxY value, or S, is calculated by taking the inverse of the sample, multiplying it by the lower half below the line and adding the height of the upper half. Having calculated maxY and minY, the height of our rectangle is just the difference between the two values.

c.rect(k, T, gw, S - T);

Each gap is positioned after its sample and uses the already calculated values of the previous sample so that it aligns with the last drawn rectangle.

A = Math.max(T, w);
c.fillRect(k - gapWidth, A, gapWidth, Math.min(S, x) - A);

Finally, to create a nice effect, we fill the path we built with a gradient that has a sharp stop at the separating line. Creating a sharp stop is accomplished by adding another colorStop to the gradient at the same position. After drawing all the rectangles of the waveform we draw a single-pixel line with a rectangle to create a clear distinction between the upper and lower halves of the waveform.

c.clearRect(0, g, d, 1);
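Pulling those pieces together, here is a simplified sketch of the drawing loop with descriptive names instead of the single-letter variables above (barWidth, gapWidth and linePercent are assumed options, and the gap-filling rectangles between bars are omitted for brevity); it is not the exact code from the gist below:

// samples: array of values already scaled into the [0, 1] range
function drawWaveform(canvas, samples, options) {
  const { barWidth = 3, gapWidth = 1, linePercent = 0.7 } = options || {};
  const ctx = canvas.getContext('2d');
  const width = canvas.width;
  const height = canvas.height;
  const upperHeight = (height * linePercent) | 0; // height above the separating line
  const lowerHeight = height - upperHeight;

  ctx.clearRect(0, 0, width, height);
  ctx.save();
  ctx.beginPath();

  for (let x = 0; x < width; x += barWidth + gapWidth) {
    // Pick the sample that corresponds to this horizontal position.
    const sample = samples[((x / width) * samples.length) | 0];

    const top = (upperHeight - sample * upperHeight) | 0;    // upper edge of the bar
    const bottom = (sample * lowerHeight + upperHeight) | 0; // lower edge of the bar
    ctx.rect(x, top, barWidth, bottom - top);
  }

  // Gradient with a sharp stop at the separating line:
  // two color stops at the same offset produce a hard edge.
  const gradient = ctx.createLinearGradient(0, 0, 0, height);
  gradient.addColorStop(0, '#ff7700');
  gradient.addColorStop(linePercent, '#ff7700');
  gradient.addColorStop(linePercent, '#ffb27f');
  gradient.addColorStop(1, '#ffb27f');
  ctx.fillStyle = gradient;
  ctx.fill();

  // One-pixel line separating the upper and lower halves.
  ctx.clearRect(0, upperHeight, width, 1);
  ctx.restore();
}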

Bonus (coloring a region of the waveform)

In the example code below we have the function drawOverlay, which just draws a rectangle over our waveform, but because we use the composite operation "source-atop", the waveform acts as a mask. You can learn more about compositing operations on MDN ( https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/globalCompositeOperation ).
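The full example is in the gist linked below; as a minimal sketch of the masking trick (this drawOverlay is a hypothetical helper, not necessarily the one from the gist), the composite operation is set just before filling the highlight rectangle:

// Tint the part of the waveform between startX and endX.
// "source-atop" makes the new fill appear only where the waveform
// has already been drawn, so the bars act as a mask.
function drawOverlay(ctx, startX, endX, color) {
  ctx.save();
  ctx.globalCompositeOperation = 'source-atop';
  ctx.fillStyle = color;
  ctx.fillRect(startX, 0, endX - startX, ctx.canvas.height);
  ctx.restore();
}

// e.g. highlight the already-played portion of the track:
// drawOverlay(ctx, 0, playheadX, 'rgba(255, 80, 0, 0.9)');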

https://gist.github.com/Bloodb0ne/419c5314b691816b20ecd74554b28c3b#file-waveform-html