Loudness (Moore et al. / PsySound2)

Overview

This analyser implements many of the features of PsySound2, using code adapted from PsySound2 (originally written in Pascal). The most important of the models implemented is Moore, Glasberg and Baer’s static loudness model (1997). Although this is a static model, it is applied to each analysis window as if it were a dynamic model. Each window is about 93 ms long (4096 samples at a 44.1 kHz sampling rate). The main purpose of this analyser is to provide analysis models similar to those of PsySound2. The following text is adapted from the PsySound2 manual.
Currently this analyser is only suitable for sound files that have a sampling rate of 44.1 kHz.
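The window length quoted above follows directly from the window size and the sampling rate. The sketch below checks that arithmetic and builds a Blackman window from its standard three-term definition. The function names are illustrative and are not taken from the PsySound3 code, and the symmetric window form is an assumption (the analyser may use a periodic variant).

```python
import math

SAMPLE_RATE_HZ = 44100   # the only sampling rate the analyser currently supports
WINDOW_SAMPLES = 4096    # analysis window size stated in the overview


def window_duration_ms(n_samples=WINDOW_SAMPLES, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one analysis window in milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz


def blackman_window(n_samples=WINDOW_SAMPLES):
    """Symmetric Blackman window (classic 0.42 / 0.5 / 0.08 coefficients).

    Assumption: symmetric form; not confirmed against the analyser's code.
    """
    denom = n_samples - 1
    return [
        0.42
        - 0.5 * math.cos(2.0 * math.pi * n / denom)
        + 0.08 * math.cos(4.0 * math.pi * n / denom)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    print(f"{window_duration_ms():.1f} ms")  # prints "92.9 ms"
```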

Differences to PsySound2

The main differences from PsySound2 are given in the following table:

PsySound2                    | PsySound3 implementation
Hanning window function      | Blackman window function
0.25 Erb resolution          | 0.1 Erb resolution
Free field or diffuse field  | Free field only (so far)
Pitch analysis               | Not implemented in this analyser

Measures Relating to Specific Loudness

Specific loudness is central to the modeling of several psychoacoustical measures, including loudness and sharpness (Zwicker and Fastl 1999).

ERB scale

The cochlea is a frequency analyser in which sound energy is distributed to tens of thousands of parallel auditory filters. Most psychoacoustical models attempt to replicate this process in an effort to measure the effect of sound as far inside the hearing system as practical. Upon entering the cochlea, sound sets the basilar membrane into vibration, with low frequencies stimulating the far (apical) end and high frequencies stimulating the near (basal) end of the membrane. Receptors corresponding to auditory filters are distributed along the basilar membrane, their position corresponding to frequency sensitivity. This distribution of the frequency spectrum can be expressed as a psychoacoustical ‘frequency’ scale. This scale is called critical-band rate or Erb number. Following Glasberg and Moore (1990), Erb number (z) is related to frequency (f) in the following manner:
$$ z = 21.4 \log_{10}(0.00437 f + 1) $$
Here frequency is in hertz, and Erb number is in ‘Erbs’. This scale is used by PsySound2 in calculating the specific loudness pattern. Some authors use ‘Cam’ (Cambridge) instead of ‘Erb’ to distinguish between the general concept of a filter’s ‘equivalent rectangular bandwidth’ and the unit (Hartmann 1998).
Auditory filters have tunings between 2 Erbs (50 Hz) and 39 Erbs (15 kHz); higher and lower frequencies are heard by off-centre frequency listening (Moore et al. 1997).
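The mapping between frequency and Erb number can be sketched in Python using the expression published by Glasberg and Moore (1990), z = 21.4 log10(0.00437 f + 1) with f in hertz. The function names are illustrative only; the analyser's own routine is not shown on this page and may differ in detail.

```python
import math


def hz_to_erb_number(freq_hz):
    """Erb number (z) for a frequency in hertz, after Glasberg and Moore (1990)."""
    return 21.4 * math.log10(0.00437 * freq_hz + 1.0)


def erb_number_to_hz(z):
    """Inverse mapping: frequency in hertz for a given Erb number."""
    return (10.0 ** (z / 21.4) - 1.0) / 0.00437


if __name__ == "__main__":
    # Check the limits of auditory filter tuning quoted in the text.
    for freq in (50.0, 15000.0):
        print(freq, "Hz ->", round(hz_to_erb_number(freq), 2), "Erbs")
```

With this expression, 50 Hz and 15 kHz map to roughly 1.8 and 39.0 Erbs, which agrees with the rounded limits of 2 and 39 Erbs given above.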
An older version of critical-band rate (Zwicker and Terhardt 1980) is related to frequency in the following way:
$$ z = 13 \arctan(0.00076 f) + 3.5 \arctan\left( (f / 7500)^2 \right) $$
Here critical-band rate is in ‘Barks’. The Bark scale ranges from 1 to 24; it assumes broader auditory filters (or critical bands). As in PsySound2, this analyser uses this scale in its sharpness calculations. PsySound2 also used it for pitch calculations, which this analyser does not implement.
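For comparison with the Erb scale, the Bark conversion can be sketched with the analytical expression of Zwicker and Terhardt (1980), z = 13 arctan(0.00076 f) + 3.5 arctan((f / 7500)^2) with f in hertz. The function name is hypothetical, and this is a minimal illustration, not the analyser's own code.

```python
import math


def hz_to_bark(freq_hz):
    """Critical-band rate in Barks, after Zwicker and Terhardt (1980)."""
    return (
        13.0 * math.atan(0.00076 * freq_hz)
        + 3.5 * math.atan((freq_hz / 7500.0) ** 2)
    )


if __name__ == "__main__":
    for freq in (100.0, 1000.0, 15500.0):
        print(freq, "Hz ->", round(hz_to_bark(freq), 2), "Barks")
```

Under this expression a 1 kHz tone lies at about 8.5 Barks, and the top of the scale (24 Barks) is reached near 15.5 kHz.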

Excitation pattern

The excitation level is calculated following Moore et al. (1997), as a function of Erb number and time. This level represents the degree of stimulation of auditory filters prior to the specific loudness transformation.

Specific Loudness Pattern

If critical-band rate or Erb number is the ‘frequency’ dimension of the specific loudness ‘spectrum’, the value dimension is specific loudness itself. Specific loudness is the loudness that a sound stimulates within each auditory filter, and is measured in sones per Erb. Refer to Moore et al. (1997) for information on the way in which specific loudness is calculated.
The specific loudness pattern can be used to find the loudest part of the sound spectrum.
This analyser calculates and outputs specific loudness at 0.1 Erb intervals. This is different to PsySound2, which calculated specific loudness at 0.25 Erb intervals, and recorded it at 1 Erb intervals in the time series output files.
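As a minimal sketch of how a pattern sampled at 0.1 Erb intervals can be searched for its loudest part, the code below builds an Erb-number grid and returns the Erb number of the peak. The grid limits of 2 and 39 Erbs are an assumption borrowed from the auditory filter tuning range quoted earlier; the analyser's actual grid limits are not stated here, and the function names are illustrative.

```python
def erb_grid(z_min=2.0, z_max=39.0, step=0.1):
    """Erb numbers from z_min to z_max inclusive, at the given spacing.

    Assumption: the 2 to 39 Erb range is illustrative, not the analyser's
    documented grid.
    """
    count = int(round((z_max - z_min) / step)) + 1
    # Rounding suppresses floating-point drift such as 15.600000000000001.
    return [round(z_min + i * step, 10) for i in range(count)]


def loudest_erb_number(z_values, specific_loudness):
    """Erb number at which the specific loudness pattern peaks."""
    peak_index = max(range(len(specific_loudness)),
                     key=specific_loudness.__getitem__)
    return z_values[peak_index]


if __name__ == "__main__":
    z_values = erb_grid()
    # Synthetic triangular pattern (sones per Erb) peaking at 15.6 Erbs.
    pattern = [max(0.0, 1.0 - abs(z - 15.6)) for z in z_values]
    print(len(z_values), loudest_erb_number(z_values, pattern))
```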

Loudness

Loudness is the subjective impression of the intensity of a sound, measured in sones. The sone unit is proportional to loudness; a doubling in sones corresponds to a doubling of loudness. Silence approaches 0 sones. A 1 kHz tone at 40 dB(SPL) presented as a frontal plane wave in a free field has a loudness of 1 sone.
Loudness is the integral of specific loudness on the Erb number (or z) scale (similar to a frequency scale). The symbol N represents loudness (in sones), and N’(z) specific loudness (in sones per Erb).
$$ N = \int N'(z) \, dz $$
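Because specific loudness is available only at discrete Erb numbers, the integral has to be approximated numerically. The sketch below uses the trapezoidal rule on evenly spaced samples; this choice of rule and the function name are assumptions for illustration, and the analyser may use a simpler summation.

```python
def total_loudness(specific_loudness, step_erb=0.1):
    """Approximate N, the integral of N'(z) over z, by the trapezoidal rule.

    specific_loudness: samples of N'(z) in sones per Erb, evenly spaced in z
    step_erb: spacing of the samples on the Erb-number axis
    Returns loudness in sones.
    """
    if len(specific_loudness) < 2:
        return 0.0
    interior = sum(specific_loudness[1:-1])
    ends = 0.5 * (specific_loudness[0] + specific_loudness[-1])
    return step_erb * (interior + ends)


if __name__ == "__main__":
    # A flat pattern of 0.5 sones per Erb across 37 Erbs (371 samples).
    flat = [0.5] * 371
    print(round(total_loudness(flat), 3))  # prints 18.5
```

A flat pattern is a convenient check because the exact answer is simply height times width: 0.5 sones per Erb over 37 Erbs gives 18.5 sones.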

Sharpness

Sound can be subjectively rated on a scale from dull to sharp, and sharpness algorithms attempt to model this. These algorithms are essentially weighted centroids of specific loudness. The unit of sharpness is the acum. One acum is defined as the sharpness of a band of noise centred on 1000 Hz, 1 critical-bandwidth wide, with a level of 60 dB(SPL). A 1000 Hz pure tone at 60 dB(SPL) will have a similar sharpness.
Zwicker and Fastl’s sharpness (S, in acum) is calculated as follows, where N is loudness, N’(z) is specific loudness, z is the critical-band rate, and g(z) is a weighting function that emphasises high frequencies (Zwicker and Fastl 1999):
$$ S = 0.11 \, \frac{\int N'(z) \, g(z) \, z \, dz}{N} $$
Aures’ sharpness formula revises Z&F’s so as to model the positive influence that loudness has on sharpness (Aures 1985a). Aures also uses a different g(z) function. Z&F’s formula is also positively influenced by loudness, although this is not immediately obvious: the effect arises from the increasing asymmetry of auditory filters at high excitation levels. Nevertheless, Aures’ formula is much more sensitive to loudness.
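The weighted-centroid form of Zwicker and Fastl’s sharpness can be sketched as below. Several details are assumptions and have not been checked against the PsySound code: the 0.11 scaling constant, the commonly quoted approximation to g(z) (flat up to 16 Barks, then 0.066 exp(0.171 z)), and the premise that specific loudness has already been resampled onto an evenly spaced Bark grid. Aures’ variant is not shown.

```python
import math


def zf_weight(z_bark):
    """Assumed approximation to g(z): 1 up to 16 Barks, then rising to ~4."""
    return 1.0 if z_bark <= 16.0 else 0.066 * math.exp(0.171 * z_bark)


def zf_sharpness(z_bark, specific_loudness, step_bark):
    """Sharpness in acum from N'(z) sampled on an even Bark grid.

    z_bark: critical-band rate of each sample, in Barks
    specific_loudness: N'(z) at each sample
    step_bark: grid spacing in Barks
    """
    loudness = sum(specific_loudness) * step_bark  # N, by simple summation
    if loudness <= 0.0:
        return 0.0  # silence: sharpness is treated as zero here
    weighted = sum(n * zf_weight(z) * z
                   for z, n in zip(z_bark, specific_loudness))
    return 0.11 * weighted * step_bark / loudness


if __name__ == "__main__":
    grid = [0.1 * i for i in range(1, 241)]  # 0.1 to 24.0 Barks
    low = [1.0 if i == 84 else 0.0 for i in range(240)]    # energy at 8.5 Barks
    high = [1.0 if i == 199 else 0.0 for i in range(240)]  # energy at 20 Barks
    print(zf_sharpness(grid, low, 0.1), zf_sharpness(grid, high, 0.1))
```

Moving the same amount of specific loudness from 8.5 to 20 Barks raises the result, which is the centroid behaviour described above. Scaling the whole pattern leaves this sketch’s output unchanged, so any loudness dependence has to come from changes in the shape of the pattern itself.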

Timbral Width

The width of the peak of the specific loudness spectrum is called the timbral width. It is calculated in a manner similar to that proposed by Malloch (1997):
EQUATION
Here, zmax is the Erb number at which the specific loudness spectrum reaches its maximum. Theoretical values range between 0 and 1; a single pure tone is likely to have a smaller width than broadband noise. Squaring reduces the values; without it, most measurements would be close to 1.

Volume

Volume refers to the ‘size’ of the sound. It is a rather old-fashioned concept: the auditory volume of pure tones was the concern of S. S. Stevens’ doctoral research (Stevens 1933). A preliminary volume model for arbitrary spectra, developed by Cabrera (1999), takes the following form:
EQUATION
V is volume in vols, N is loudness in sones, N’(z) is the specific loudness function in sones per Erb, and z is Erb number. The general idea is that volume increases with loudness and decreases with increasing centroid. Hence the denominator of the large fraction is a centroid. The exponents determine the relative influence of loudness and centroid. This approach is related to the finding that the volume of narrow noise bands is equal to their loudness divided by their density (Stevens et al. 1965).
Following Terrace and Stevens (1962), 1 vol is the volume of a 1 kHz tone presented in the free field at 40 dB(SPL).
This function is preliminary, accounting for a limited range of stimuli.

Code Authors

This analyser was written by the PsySound3 team, based on the PsySound2 code by Densil Cabrera.

Key References

The following reference list was used for PsySound2:
Aures, W. (1985a). “Berechnungsverfahren für den sensorischen Wohlklang beliebiger Schallsignale.” Acustica 59: 130-141.
Aures, W. (1985b). “Ein Berechnungsverfahren der Rauhigkeit.” Acustica 58: 268-281.
Cabrera, D. (1999). “The Size of Sound: Auditory Volume Reassessed.” Australasian Computer Music Association Conference, Wellington, New Zealand: 26-31.
Glasberg, B. R. and B. C. J. Moore (1990). “Derivation of Auditory Filter Shapes from Notched Noise Data.” Hearing Research 47: 103-137.
Hartmann, W. M. (1998). Signals, Sound and Sensation. New York, Springer-Verlag.
Hutchinson, W. and L. Knopoff (1978). “The Acoustical Component of Western Consonance.” Interface 7: 1-29.
Kuhn, G. (1979). “The Pressure Transformation from a Diffuse Field to the External Ear and to the Body and Head Surface.” Journal of the Acoustical Society of America 65: 991-1000.
Malloch, S. N. (1997). Timbre and Technology: An Analytical Partnership: The Development of an Analytical Technique and its Application to Music by Lutoslawski and Ligeti. Thesis (Ph.D.), University of Edinburgh.
Moore, B. C. J., B. R. Glasberg and T. Baer (1997). “A Model for the Prediction of Thresholds, Loudness, and Partial Loudness.” Journal of the Audio Engineering Society 45(4): 224-240.
Parncutt, R. (1989). Harmony: A Psychoacoustical Approach. Berlin, Springer-Verlag.
Sethares, W. (1993). “Local Consonance and the Relationship between Timbre and Scale.” Journal of the Acoustical Society of America 93(3): 1218-1228.
Sethares, W. (1998). Tuning, Timbre, Spectrum, Scale. London, Springer.
Stevens, S. S. (1933). The Volume and Intensity of Tones. Thesis (Ph.D.), Harvard University.
Stevens, S. S., M. Guirao and W. Slawson (1965). “Loudness, A Product of Volume times Density.” Journal of Experimental Psychology 69: 503-510.
Terhardt, E., G. Stoll and M. Seewann (1982). “Algorithm for Extraction of Pitch and Pitch Salience from Complex Tonal Signals.” Journal of the Acoustical Society of America 71(3): 679-688.
Terrace, H. S. and S. S. Stevens (1962). “The Quantification of Tonal Volume.” American Journal of Psychology 75: 596-604.
Vos, J. (1986). “Purity Ratings of Tempered Fifths and Major Thirds.” Music Perception 3(3): 221-258.
Zwicker, E. and H. Fastl (1999). Psychoacoustics: Facts and Models. Berlin, Springer-Verlag.
Zwicker, E. and E. Terhardt (1980). “Analytical Expressions for Critical Band Rate and Critical Bandwidth as a Function of Frequency.” Journal of the Acoustical Society of America 68: 1523-1525.
