Random Tech Stuff: digital signal processing


Monday, February 17, 2014

Sox spectrogram log frequency axis and upper/lower frequency limits

Linear spectrogram

As a result of some of the work I did last week on rendering MP3 files to scrolling spectrum waterfall plots, I noticed that most of the interesting detail was in frequencies under 1kHz. One obvious way to see the detail at lower frequencies, yet still keep the higher frequencies, is to plot on a log frequency scale.

Unfortunately this is not currently supported in SoX, so I modified the spectrogram module to add this feature. I then noticed that plotting charts from 1Hz to the Nyquist frequency wasted a lot of screen space on the 1 - 50Hz range, which is not the most interesting part of most audio files. So I also added a switch to set the lower and upper frequencies of the chart (this actually took considerably more work than plotting on the log scale!).

Log axis spectrogram. Lower frequency details are far more visible.

The modified spectrogram.c file is available here [1]. Visit the SoX site [2], download the source code, replace src/spectrogram.c with this file and compile. This has been tested with the latest code as of 14 Feb 2014.

Summary of changes:

Implement -L, which plots the spectrogram on a log10 frequency axis. By default the axis runs from 1Hz to the Nyquist frequency.
Implement -R <low_freq>:<high_freq> : restrict spectrogram frequencies. I also updated the linear scale code to honor this switch.

Example:

sox mymusic.mp3 -n spectrogram -L -R 50:8k
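
The idea behind a log frequency axis can be sketched as a mapping between pixel rows and frequencies. The following is a rough Python illustration only, not the code in the modified spectrogram.c; the function names and the 512-row chart height are assumptions made up for this example.

```python
import math

def row_to_freq(row, rows, f_lo, f_hi):
    """Frequency (Hz) shown at a pixel row on a log10 axis; row 0 is the bottom."""
    if not (0 < f_lo < f_hi):
        raise ValueError("need 0 < f_lo < f_hi")
    frac = row / (rows - 1)            # 0.0 at the bottom row, 1.0 at the top row
    return f_lo * (f_hi / f_lo) ** frac

def freq_to_row(freq, rows, f_lo, f_hi):
    """Inverse of row_to_freq, rounded to the nearest pixel row."""
    frac = math.log10(freq / f_lo) / math.log10(f_hi / f_lo)
    return round(frac * (rows - 1))

# Hypothetical 512-row chart using the limits from the -R 50:8k example.
ROWS = 512
for f in (50, 100, 1000, 8000):
    print(f, "Hz -> row", freq_to_row(f, ROWS, 50, 8000))
```

Because the mapping is geometric, the middle of a 50:8k axis sits at the geometric mean of the two limits (about 632Hz), so more than half of the chart height is spent on frequencies below 1kHz.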

Known problems and observations:

At lower frequencies (1 - 100Hz) each spectrum bin in the q->dBfs[] array occupies several pixel rows, so the plot looks blocky. A lesser problem: at the top of the chart some frequency bins are ignored because more than one bin falls on each pixel row. I could fix the blockiness by computing a more detailed spectrum.
If lower and upper frequencies are in the same decade then there are no y axis tick labels. Hopefully will have a fix for that soon.
When compiling I'm getting this warning when using log10f() and powf(): "warning: passing argument 1 of 'log10f' as 'float' rather than 'double' due to prototype". According to the man pages for those functions they should accept float args! I see some discussion on this relating to compiler switches, but I don't want to go changing anything there.
I created two new functions to parse the -R frequency range switch to cut down on code duplication.
parse_range (const char *s, int *a, int *b)
parse_num_with_suffix (const char *s, int *a)
I'm not familiar with the sox code base, so I'm not sure if similar functions already exist (I couldn't find any). I'm also concerned I'm making a temporary change to a string in parse_range when the type is const char *. I need to brush up on my C :)
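
The parsing logic behind the two helpers can be sketched without ever modifying the input string. This is a Python illustration of the idea, not the C code in spectrogram.c: the return conventions, the error handling and the decision to accept only a "k" suffix are assumptions for this example.

```python
def parse_num_with_suffix(s):
    """Parse a frequency such as '50' or '8k' into an integer number of Hz."""
    s = s.strip()
    if s and s[-1] in ("k", "K"):
        # 'k' multiplies by 1000; round() guards against float error in e.g. '2.2k'.
        return round(float(s[:-1]) * 1000)
    return int(s)

def parse_range(s):
    """Parse '<low_freq>:<high_freq>' into a (low, high) tuple in Hz."""
    low_text, sep, high_text = s.partition(":")   # the input string is left untouched
    if not sep:
        raise ValueError(f"expected <low_freq>:<high_freq>, got {s!r}")
    low = parse_num_with_suffix(low_text)
    high = parse_num_with_suffix(high_text)
    if low >= high:
        raise ValueError("low frequency must be below high frequency")
    return low, high

print(parse_range("50:8k"))
```

The same approach should carry over to C: strchr() can locate the ':' and strtol() stops at the first character it cannot parse, so each half can be read in place and the const char * argument never needs to be written to.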

Footnotes:

[1] https://github.com/jdesbonnet/joe-desbonnet-blog/tree/master/projects/sox-log-spectrogram

[2] http://sox.sourceforge.net/

Saturday, February 15, 2014

Boosting audio volume in a video file

For some reason I'm finding that some video files played directly on my new Samsung TV (by mounting a USB drive) have a low volume and boosting the TV's volume beyond 40 (of presumably 100) yields no additional gain.

This ffmpeg command can be used to boost the volume in the file:

ffmpeg -i myvideo.mkv -vcodec copy -af "volume=12dB" myvideo-boosted.mkv

You can vary the boost by changing the volume gain parameter. I found 12dB was about right. Negative values are also allowed.
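
To get a feel for what a given dB value means, the standard conversion between decibels and a linear amplitude factor is ratio = 10^(dB/20). The Python helper names below are made up for this illustration.

```python
import math

def db_to_amplitude_ratio(db):
    """Linear amplitude factor corresponding to a gain in decibels."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    """Gain in decibels corresponding to a linear amplitude factor."""
    return 20 * math.log10(ratio)

for db in (-6, 0, 6, 12):
    print(f"{db:+d} dB -> x{db_to_amplitude_ratio(db):.3f}")
```

So 12dB multiplies the sample amplitude by roughly 4, which also means that peaks already above about a quarter of full scale may clip after the boost.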

Wednesday, February 12, 2014

Convert MP3 to a scrolling spectrum waterfall plot video

There are many utilities that display a scrolling spectrum waterfall plot [1] from a signal, but I was unable to find any open source utility that converted an audio file into a video file with a scrolling waterfall plot + the audio.

The SoX [2] sound utility can generate a static spectrum waterfall plot image from an audio file (or part of it), but it can't make a video. So I wrote a script to do this.

It's a very brute force approach. It requires lots of CPU time and lots of temporary disk space. The script depends on SoX, GNU Parallel and mencoder (or ffmpeg). GNU Parallel is optional, but gives a significant speed-up on a multi-core system.
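
As a rough idea of why a brute force approach is expensive, here is a Python sketch that builds one SoX command per video frame, each rendering a spectrogram of a sliding window of audio. This is only a guess at the general technique and not necessarily how make-spectrogram-video.sh works: the function name, the frame file naming, the 25 frames/s rate and the 10 second window are all assumptions. It only builds the command lines; it does not run them.

```python
def frame_commands(audio_file, duration_s, fps=25, window_s=10.0):
    """Yield one sox argument list per video frame (hypothetical scheme)."""
    n_frames = int(duration_s * fps)
    for i in range(n_frames):
        start_s = i / fps                       # window start for this frame
        yield [
            "sox", audio_file, "-n",
            "trim", f"{start_s:.3f}", f"{window_s:.3f}",   # cut out the window
            "spectrogram", "-o", f"frame{i:06d}.png",      # render it to an image
        ]

# A hypothetical 5 minute file at 25 frames/s needs 7500 separate images.
cmds = list(frame_commands("mymusic.mp3", 300.0))
print(len(cmds), "frames")
print(" ".join(cmds[0]))
```

Each command is independent of the others, which is the kind of workload that a tool like GNU Parallel can spread across CPU cores.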

To use, create an empty directory on a volume with plenty of disk space and run with the audio file as a parameter. Eg:

./make-spectrogram-video.sh -t "My Music File" mymusic.mp3

The output will be written to output.avi and output.mp4. Other options include setting frame rate, the speed of the scrolling, audio credit text etc. To get full help do this:

./make-spectrogram-video.sh -h

The script is available on GitHub here [2]. Here is a sample output video of Bach's Toccata and Fugue in D Minor [3] :

Updates:

17 Feb 2014: I noticed that all the interesting detail in music is squashed down at the very bottom of the spectrogram. So I updated the spectrogram module in SoX to have the option of plotting on a log axis. This isn't in the official SoX distribution. See this blog post for more details [4].

Footnotes:

[1] http://en.wikipedia.org/wiki/Waterfall_plot

[2] https://github.com/jdesbonnet/audio-to-waterfall-plot-video

[3] Music MP3 file from https://archive.org/details/ToccataAndFugueInDMinor. YouTube video at http://www.youtube.com/watch?v=utp95bprqeg

[4] http://jdesbonnet.blogspot.ie/2014/02/sox-spectrogram-log-frequency-axis-and.html

Friday, January 31, 2014

Generating a spectrogram of The Shepard Tone

Earlier today I saw a tweet from Tim O'Reilly pointing to an interesting article in The Atlantic on audio illusions. The Shepard Tone illusion sounds like a continuously increasing or decreasing tone which goes on indefinitely... which is impossible because you'd eventually end up at an inaudibly low or high pitch.

Curious as to what was going on I downloaded the YouTube video, extracted the audio and ran off a spectrogram with the open source sox audio tool. I had done this before last year, so I was familiar with the process.

1. Download the YouTube video:
youtube-dl -o shepard_tone.mp4 http://www.youtube.com/watch?v=DfJa3IC1txI

2. Extract the audio from the video

ffmpeg -i shepard_tone.mp4 -f mp3 shepard_tone.mp3

3. Convert to mono (the YouTube video has stereo audio)

sox shepard_tone.mp3 shepard_tone_mono.mp3 remix 1,2

4. Create the spectrogram

sox shepard_tone_mono.mp3 -n spectrogram -o spectrum.png

One small problem: most of the action is down at the lower frequencies. So I resampled the audio and regenerated the spectrogram so that the interesting bits were more visible.

5. Resample the audio to 4kHz

sox shepard_tone_mono.mp3 -r 4k shepard_tone_4k.wav

6. Regenerate the spectrogram

sox shepard_tone_4k.wav -n spectrogram -o spectrum.png

Tools used :

youtube-dl

https://github.com/rg3/youtube-dl

ffmpeg

http://www.ffmpeg.org/

sox

http://sox.sourceforge.net/

Saturday, August 31, 2013

Using Sox spectrogram tool to analyze audio noise

While out on a walk near Lagos (Algarve, south Portugal) I noticed a loud buzzing/crackling sound. It sounded very similar to the kind of buzz/crackle you'd hear near high voltage transmission lines, but there were no power lines in sight. So I thought I'd capture a few seconds of audio with a voice recorder app on my phone and look at the spectrum later to rule in/out an electrical origin: anything electrical would have peaks at exactly 50Hz and harmonics of 50Hz.
It turns out that sox has a neat tool to create a spectrogram from any audio file:

sox recording.wav -n spectrogram -o spectrum.png

However the frequencies I was interested in were down well under 1kHz, so I first resampled at a rate twice the highest frequency of interest:

sox recording.wav -r 2k t.wav
sox t.wav -n spectrogram -o spectrum.png
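
The reasoning behind the resampling step is the Nyquist limit: a signal sampled at rate R can only represent frequencies up to R/2, so a linear spectrogram's vertical axis runs from 0 to R/2. The small Python sketch below shows the arithmetic; the function names are made up, and the 44.1kHz original sample rate is an assumption since the recording's actual rate is not known here.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at a given sample rate."""
    return sample_rate_hz / 2

def min_sample_rate_hz(max_freq_hz):
    """Lowest sample rate that still covers frequencies up to max_freq_hz."""
    return 2 * max_freq_hz

def band_fraction(max_freq_hz, sample_rate_hz):
    """Fraction of a linear spectrogram's height taken up by 0..max_freq_hz."""
    return min(1.0, max_freq_hz / nyquist_hz(sample_rate_hz))

print(min_sample_rate_hz(1000))                  # rate needed to see up to 1kHz
print(f"{band_fraction(1000, 44100):.1%}")       # assumed 44.1kHz original
print(f"{band_fraction(1000, 2000):.1%}")        # after resampling to 2kHz
```

At the assumed 44.1kHz the band below 1kHz occupies under 5% of the chart height, whereas after resampling to 2kHz it fills the whole chart.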

So this is the result:

The buzz sound can be seen in the spectrogram as horizontal streaks at about 60Hz and 120Hz. Since these are not at 50Hz or its harmonics, an electrical origin can be ruled out. The frequency can also be seen to vary a little with time.
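
The check being made here by eye can be written down as a small test: is a measured peak close to an integer multiple of the mains frequency? The Python sketch below is illustrative only; the function names and the 2Hz tolerance are assumptions, and in practice the tolerance should reflect how accurately a frequency can be read off the spectrogram.

```python
def nearest_harmonic(freq_hz, fundamental_hz=50.0):
    """Return (n, n * fundamental) for the harmonic closest to freq_hz."""
    n = max(1, round(freq_hz / fundamental_hz))
    return n, n * fundamental_hz

def looks_like_mains(freq_hz, fundamental_hz=50.0, tolerance_hz=2.0):
    """True if freq_hz lies within tolerance_hz of a harmonic of the mains frequency."""
    _, harmonic_hz = nearest_harmonic(freq_hz, fundamental_hz)
    return abs(freq_hz - harmonic_hz) <= tolerance_hz

# Streaks read off the spectrogram at roughly 60Hz and 120Hz.
for peak in (60.0, 120.0):
    print(peak, "Hz matches 50Hz mains:", looks_like_mains(peak))
```

A real mains-related hum would also hold its frequency very steadily, so the slight drift over time is a second hint against an electrical source.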

Here is a spectrogram of another recording I took on the way down to the beach later. There were multiple sources in this recording (audio file here), but they were fainter and further away.

and another (audio file here):

You can see multiple horizontal streaks of varying frequencies.

Conclusion:

It's clear now that the noise is from an insect. What insect, I have no idea. (Update: I think it might be a cicada.)

If puzzled about the origin of a strange sound, record it and create a spectrogram... it might yield clues. Anything relating to utility power will be at the AC frequency (50Hz in most of the world, 60Hz in the US).
