The fast Fourier transform (FFT) is one of the most widely used methods of frequency spectrum analysis. This analyser conducts a succession of FFTs over the length of the audio file, and outputs data related to the magnitude of the time-varying power spectrum. This analyser also allows you to 'zoom' into a user-specified frequency band (by using the chirp z-transform instead of the FFT).
The purpose of this type of spectral analysis is generally to determine how sound power is distributed across the frequency range (from 0 Hz to the Nyquist frequency, which is equal to half the audio sampling rate, or over a narrower range if the chirp z-transform is selected). The output is the sound power per frequency component, and so the absolute numbers depend on the number of components selected by the user (the number of components is equal to half the number of samples analysed). It is also important to understand that the components have a linear distribution across the frequency range (whereas some other frequency analysis methods have a 1/f distribution across the frequency range). This means that for an FFT, the top half of the frequency range spans just one octave, and the top three quarters of the frequency range spans just two octaves. While some programs redistribute these frequency components for visualisation, this can be misleading because the sound levels in frequency components are related to the density of components on the frequency scale. The overall sound power of the spectrum is the sum of the sound power in all of the frequency components. Note that decibel values need to be converted to sound power values prior to summation, and then converted back to decibels to yield the total sound level.
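The decibel summation described above can be sketched in a few lines of Python. This is only an illustration, not the analyser's own code: the function names are hypothetical, and the levels are assumed to be power-based decibel values sharing one reference.

```python
import math

def db_to_power(level_db):
    """Convert a level in decibels to a relative power value."""
    return 10.0 ** (level_db / 10.0)

def power_to_db(power):
    """Convert a relative power value back to decibels."""
    return 10.0 * math.log10(power)

def total_level_db(component_levels_db):
    """Power-sum a list of component levels given in decibels."""
    return power_to_db(sum(db_to_power(level) for level in component_levels_db))

# Two components of 60 dB each combine to roughly 63 dB, not 120 dB.
print(round(total_level_db([60.0, 60.0]), 2))
```

Adding the decibel values directly would overstate the total, because decibels are logarithmic.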
This sets the overlap of windows in terms of percentage, milliseconds, seconds or number of samples.
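As a rough sketch of how the four overlap units relate to each other, the following Python function converts each one to a number of samples. The function names, the unit labels and the rounding rule are assumptions made for illustration; the analyser's own conversion may differ.

```python
def overlap_in_samples(value, unit, window_size, sample_rate):
    """Convert an overlap setting to a whole number of samples.

    unit is one of "percent", "ms", "s" or "samples" (hypothetical labels).
    """
    if unit == "percent":
        samples = window_size * value / 100.0
    elif unit == "ms":
        samples = sample_rate * value / 1000.0
    elif unit == "s":
        samples = sample_rate * value
    elif unit == "samples":
        samples = value
    else:
        raise ValueError("unknown overlap unit: " + str(unit))
    samples = int(round(samples))  # assumed rounding to the nearest sample
    if not 0 <= samples < window_size:
        raise ValueError("overlap must be at least 0 and less than the window size")
    return samples

def hop_size(window_size, overlap_samples):
    """Number of samples between the starts of successive windows."""
    return window_size - overlap_samples

# A 50% overlap of a 1024-sample window advances 512 samples per window.
overlap = overlap_in_samples(50, "percent", 1024, 44100)
print(overlap, hop_size(1024, overlap))
```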
This sets the size of the window in samples, which is a power of 2. Values from 2^7 (128 samples) to 2^20 (1048576 samples) are supported (corresponding to window durations of 2.9 ms to 23.8 s for a sampling rate of 44.1 kHz). Longer window durations could be achieved by downsampling the waveform prior to analysis.
A larger window size increases the frequency resolution, but reduces the time resolution. A larger window size is computationally more efficient, but requires greater memory. The number of frequency components in the magnitude spectrum output is equal to half the window length.
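The trade-off between time and frequency resolution follows from simple arithmetic, sketched below in Python. The function name is hypothetical. The component spacing is taken to be the sampling rate divided by the window size, which is the standard FFT relationship.

```python
def window_properties(window_size, sample_rate):
    """Return (duration in seconds, component spacing in Hz, number of components)."""
    duration_s = window_size / sample_rate
    spacing_hz = sample_rate / window_size
    n_components = window_size // 2
    return duration_s, spacing_hz, n_components

# Smallest, intermediate and largest supported window sizes at 44.1 kHz.
for exponent in (7, 12, 20):
    size = 2 ** exponent
    duration_s, spacing_hz, n_components = window_properties(size, 44100)
    print(size, round(duration_s, 4), round(spacing_hz, 4), n_components)
```

Doubling the window size halves the spacing between frequency components and doubles the duration of signal summarised by each spectrum.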
This selects the windowing function to be applied to the wave prior to the FFT. Windowing functions provide a ‘fade-in’ and ‘fade-out’ of the windowed waveform, which can improve the frequency selectivity of an analysis. If no windowing function is used, the analysis will be perfect only if the frequencies of the waveform are perfectly in tune with the Fourier component frequencies. Waveform frequencies that do not match the Fourier series will produce spectra with energy spread across the whole spectrum. Windowing functions allow arbitrary waveforms to be better represented in the analysis: the energy that would be spread across the spectrum is gathered into the region of the true peak. The trade-off is that the peaks become more rounded.
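The spreading of energy for frequencies that do not match the Fourier components can be illustrated with a small Python sketch. It uses a direct (slow) discrete Fourier transform rather than an FFT, applies no windowing function, and the function names and the 64-sample test length are arbitrary choices for the illustration.

```python
import cmath
import math

def power_spectrum(x):
    """Power in components 0 .. N/2 - 1, computed with a direct DFT."""
    n = len(x)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def peak_power_fraction(cycles_per_window, n=64):
    """Fraction of the spectrum's power that falls in its largest component."""
    x = [math.sin(2 * math.pi * cycles_per_window * t / n) for t in range(n)]
    p = power_spectrum(x)
    return max(p) / sum(p)

# Exactly 8 cycles per window: essentially all power lands in one component.
print(peak_power_fraction(8.0))
# 8.5 cycles per window: well under half of the power is in the largest component.
print(peak_power_fraction(8.5))
```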
The choice of the windowing function may depend on how strong a correction is desired (considering the trade-off between gathering energy to the relevant peaks and peak rounding) – for example, the Hanning windowing function produces better defined peaks, but less correction away from the peaks, than the Blackman windowing function. The supported windowing functions are Hanning, Hamming, Bartlett, Blackman and Rectangular (no windowing function). Windowing functions are implemented to yield 0 dB gain at DC.
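One way to obtain 0 dB gain at DC is to scale the window so that its mean value is 1. The Python sketch below does this for a symmetric Hann window. Both the window formula and the scaling approach are assumptions for illustration and are not taken from the analyser's source code.

```python
import math

def hann(n):
    """Symmetric Hann window of length n (assumed definition)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def normalise_dc_gain(window):
    """Scale a window so that its mean value is 1 (0 dB gain at DC)."""
    scale = len(window) / sum(window)
    return [w * scale for w in window]

def dc_gain_db(window):
    """Gain applied to a constant (DC) signal, in decibels."""
    return 20.0 * math.log10(sum(window) / len(window))

raw = hann(1024)
print(round(dc_gain_db(raw), 2))                     # about -6 dB before scaling
print(round(dc_gain_db(normalise_dc_gain(raw)), 2))  # close to 0 dB after scaling
```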
Use Chirp Z-transform
Checking this box substitutes Matlab’s chirp z-transform (czt) for the FFT. This allows an upper and lower cutoff frequency to be specified (like a zoom-FFT). In this implementation, only points on the unit circle of the z-plane are evaluated.
Example of a chirp z-transform spectrogram for the 750 - 1000 Hz frequency range (demosound.wav)
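The idea of evaluating the spectrum only between two cutoff frequencies can be sketched in Python by computing the z-transform directly at evenly spaced points on the unit circle. This direct evaluation is much slower than a chirp z-transform algorithm and is not the code used by the analyser. The function name, the point spacing and the test signal are assumptions for illustration.

```python
import cmath
import math

def zoom_spectrum(x, sample_rate, f_lo, f_hi, n_points):
    """Evaluate the z-transform of x at n_points on the unit circle.

    The points correspond to frequencies f_lo, f_lo + step, ... (f_hi excluded).
    """
    step = (f_hi - f_lo) / n_points
    freqs = [f_lo + k * step for k in range(n_points)]
    values = []
    for f in freqs:
        omega = 2 * math.pi * f / sample_rate  # angle on the unit circle
        values.append(sum(s * cmath.exp(-1j * omega * t) for t, s in enumerate(x)))
    return freqs, values

# An 875 Hz tone sampled at 8 kHz, examined between 750 Hz and 1000 Hz in 5 Hz steps.
sample_rate = 8000
tone = [math.cos(2 * math.pi * 875 * t / sample_rate) for t in range(256)]
freqs, values = zoom_spectrum(tone, sample_rate, 750.0, 1000.0, 50)
magnitudes = [abs(v) for v in values]
peak_frequency = freqs[magnitudes.index(max(magnitudes))]
print(peak_frequency)
```

A 256-point FFT of the same signal would place components 31.25 Hz apart, so the zoomed analysis samples the same spectrum on a much finer frequency grid within the chosen band.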
This is the magnitude of the spectrum (expressed in decibels) as a function of frequency and time.
Example of a spectrogram, showing three tonal components as their level varies in time and frequency. (demosound.wav)
Example of a spectrogram of a solo operatic tenor singer. The harmonic series (the horizontal periodic stripes) and the singer’s formant (the broad peak at around 3 kHz) are both clearly evident.
Average Power Spectrum
This is the power average (over time) of the spectra as a function of frequency, expressed in decibels.
Example of an average power spectrum plot (demosound.wav)
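A power average over time can be sketched in Python as follows, assuming that the spectra are available as one list of decibel levels per time window. The function name and the data layout are hypothetical.

```python
import math

def average_power_spectrum_db(spectra_db):
    """Power-average a list of spectra (one list of dB levels per time window)."""
    n_windows = len(spectra_db)
    n_components = len(spectra_db[0])
    averaged = []
    for k in range(n_components):
        # Average in the power domain, then convert back to decibels.
        mean_power = sum(10.0 ** (frame[k] / 10.0) for frame in spectra_db) / n_windows
        averaged.append(10.0 * math.log10(mean_power))
    return averaged

# Two time windows with two frequency components each.
result = average_power_spectrum_db([[60.0, 40.0], [40.0, 40.0]])
print([round(level, 2) for level in result])
```

The first component averages to about 57 dB rather than 50 dB, because the louder window dominates the power average.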
This is the power sum of the spectrum for each time window, expressed in decibels. When appropriately calibrated, this is the unweighted sound pressure level of the signal in each analysis window.
Note that this is the basis of the internal calibration of the FFT Spectrum analyser. If you analyse a pure tone, its spectral peak value will be less than the power sum value, because the power is distributed between all of the frequency bands of the spectrum. The difference between spectral peak level and power sum level will be even greater for broadband signals.
Using the chirp z-transform introduces a gain offset which is not corrected for by the program.
Standardized and non-standardized moments of the power spectrum are given as time series. The most commonly used of these is the spectral centroid, which can be an indicator of the 'brightness' or 'sharpness' of the sound.
Example of a spectral centroid time-series plot (demosound.wav)
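The spectral centroid is the power-weighted mean frequency of a spectrum, and the square root of the second central moment is often called the spectral spread. The Python sketch below computes both for a single spectrum. The function names are hypothetical, and the analyser's own definitions of the moments may differ in detail.

```python
import math

def spectral_centroid(freqs_hz, powers):
    """Power-weighted mean frequency (first moment of the power spectrum)."""
    return sum(f * p for f, p in zip(freqs_hz, powers)) / sum(powers)

def spectral_spread(freqs_hz, powers):
    """Square root of the power-weighted variance about the centroid."""
    centroid = spectral_centroid(freqs_hz, powers)
    variance = sum(((f - centroid) ** 2) * p
                   for f, p in zip(freqs_hz, powers)) / sum(powers)
    return math.sqrt(variance)

# A small symmetric spectrum centred on 200 Hz.
freqs_hz = [100.0, 200.0, 300.0]
powers = [1.0, 2.0, 1.0]
print(spectral_centroid(freqs_hz, powers), round(spectral_spread(freqs_hz, powers), 2))
```

Applying these functions to the spectrum of each analysis window in turn gives a time series. Note that the inputs are power values, not decibel levels.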
This analyser was written by the PsySound3 team.
Information on Fourier analysis of audio signals is very widely available, for example:
Oppenheim, A.V., and R.W. Schafer. Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice Hall, 1989.