Much of the material found here is adapted from the book "The Technology of Computer Music" by Max Mathews, published by M.I.T. Press in 1969. This text is considered foundational in the literature of digital audio and computer music applications. This document also contains original materials.
The world of audio recording technology has changed. The change occurred in the 1980s with the explosion of digital audio technology: MIDI, DSP chip-sets, computer integration, AES/EBU, S/PDIF, the digital audio workstation, DAT, the multi-effects digital signal processor and spatial processors, to name but a few, have taken their place within the total studio configuration. The technology of electronic and computer-mediated sound generation and manipulation has given us new tools of tremendous power and sophistication for making new sounds and sonic environments. This same technology has also created a new problem - the need to understand the psychoacoustics of musical perception.
Sounds of conventional instruments and conventional acoustic spaces are well understood - so well understood that composers of experience can proceed with their work without recourse to the instruments themselves - rather they use the sonic intuition developed over long periods of experimentation. However, where new sounds are concerned no such background experience exists. The composer, sound designer and recording engineer must understand the relation between the physical sound wave and how it is perceived by listeners. He or she must also understand how the studio processing of sounds impacts auditory perception. The study of psychoacoustics addresses these questions and therefore has become an essential study for audio practitioners.
As a field of inquiry, psychoacoustics is relatively young. Furthermore, the basic research in the field is not always driven by musical imperatives. This makes the adoption of psychoacoustical theory into musical frameworks a bit more challenging. However, the insights gleaned through psychoacoustics remain tremendously useful in audio technology contexts.
The perceived loudness of a sound depends on many factors in addition to its intensity. For example, in order for a pure tone or sinusoid at 100Hz to be heard, its sound intensity must be 1000 times greater than that of a pure tone at 3000Hz. For most of the musical range of frequency the perceived loudness increases as the 0.6 power of the sound pressure. The perceived loudness increases more slowly with sound pressure for 3000Hz tones than it does for very low frequencies, say, 100Hz; and in the uncomfortably loud range, tones of equal power are about equally loud. This means that as we turn the volume control up or down, the balance of loudness among frequency components changes slightly.
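The figures above can be turned into decibel arithmetic. The following is a minimal sketch, assuming the 0.6 power law holds exactly (the text only claims it for most of the musical range); the function names are illustrative and do not come from the book.

```python
import math

# Exponent relating perceived loudness to sound pressure, as stated in the text.
LOUDNESS_EXPONENT = 0.6


def intensity_ratio_to_db(intensity_ratio):
    """Convert a ratio of sound intensities (powers) to decibels."""
    return 10.0 * math.log10(intensity_ratio)


def pressure_ratio_to_db(pressure_ratio):
    """Convert a ratio of sound pressures to decibels."""
    return 20.0 * math.log10(pressure_ratio)


def loudness_ratio(pressure_ratio):
    """Relative perceived loudness under the 0.6 power law (idealized)."""
    return pressure_ratio ** LOUDNESS_EXPONENT


def pressure_ratio_for_loudness(target_loudness_ratio):
    """Pressure ratio needed to scale perceived loudness by a given factor."""
    return target_loudness_ratio ** (1.0 / LOUDNESS_EXPONENT)


if __name__ == "__main__":
    # The 1000-fold intensity difference between the 100Hz and 3000Hz tones.
    print("1000x intensity in dB:", intensity_ratio_to_db(1000.0))
    # Pressure increase needed to make a tone sound twice as loud.
    doubling = pressure_ratio_for_loudness(2.0)
    print("pressure ratio for doubled loudness:", doubling)
    print("the same in dB:", pressure_ratio_to_db(doubling))
```

Under this idealization the 1000-fold intensity difference corresponds to 30 dB, and a doubling of perceived loudness needs a pressure increase of roughly 10 dB.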
The implications of this for studio work are myriad, but one important caveat overshadows the others: when mixing, target the playback volumes your program material might reasonably have, and work your balance and EQ at these volumes. Try to create a mix which works well at several different volume levels. Shift the volume level between these various settings frequently to help avoid aural fatigue and to help check your settings.
A tone or a noise masks or renders us incapable of hearing a less powerful tone. A tone has a strong masking effect for tones of higher frequency and a weaker masking effect for tones of lower frequency. The frequency range of masking is greater for loud tones than for soft tones. Thus we would expect that in a musical composition (audio mix) some sounds might be masked and unheard when the volume is high, whereas they would be unmasked and heard when the volume is low.
Masking can be considered as a raising of the level at which tones become audible. Some rise in the threshold persists for 1/6 sec or longer after a loud tone, but the after-effect of a loud tone on hearing is much less than that of a bright light on seeing.
Limens, or just-noticeable differences (jnd's), of loudness and frequency have been carefully measured. They are surprisingly small. However, there is evidence that the limens are much smaller than the frequency or loudness differences that can be detected in complicated listening tasks, which are more akin to music. Very small differences in frequency (less than a half tone) and loudness can be detected in successive tones that are not too short.
It stands to reason, then, that in complex musical settings - a 24-channel mix-down, for example - only grosser changes in volume and EQ settings will have noticeable effects. The same changes made in a "solo" context may appear too great.
The pitch of a complex tone is often thought of as that of its lowest partial. However, experiments made with repetitions of various patterns of pulses and with complex tones in which the upper partials are harmonics of a frequency higher than the fundamental show that, although the fundamental dominates at higher frequencies, the repetition rate of the tone or of its higher partials dominates at lower frequencies. The pitch of a tone may be highly uncertain by one or more octaves; thus Shepard produced a circle of 12 tones, which when cyclically repeated give the impression of always rising in pitch, with no break. Tones with inharmonic partials, including gongs, bells and tones specially synthesized by computer, may produce a sensation of pitch; a tune can be played on them. But the pitch may not be the first partial; for example, the hum tone of a bell is not the pitch to which the bell is tuned.
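Shepard's circle of tones is usually built from partials spaced an octave apart under a fixed bell-shaped spectral envelope, so that the octave in which the tone "lives" is ambiguous. The sketch below computes the partial frequencies and amplitudes for each of the 12 steps. The lowest frequency, the number of octaves and the raised-cosine envelope are assumed values chosen for illustration, not necessarily those Shepard used.

```python
import math

F_MIN = 20.0      # assumed frequency of the lowest partial at step 0, in Hz
N_OCTAVES = 9     # assumed number of octave-spaced partials


def shepard_partials(step, steps_per_octave=12):
    """Return (frequency, amplitude) pairs for one tone of the circle.

    Every partial lies an octave above the previous one. The amplitude
    depends only on the position of the partial on a log-frequency axis,
    and the envelope is zero at the bottom and top edges, so a partial
    can leave at the top and re-enter at the bottom without being heard.
    """
    shift = (step % steps_per_octave) / steps_per_octave
    partials = []
    for octave in range(N_OCTAVES):
        position = octave + shift            # octaves above F_MIN
        freq = F_MIN * 2.0 ** position
        # Raised-cosine envelope over the N_OCTAVES-wide range.
        amp = 0.5 * (1.0 - math.cos(2.0 * math.pi * position / N_OCTAVES))
        partials.append((freq, amp))
    return partials


if __name__ == "__main__":
    for step in range(13):
        strongest = max(shepard_partials(step), key=lambda p: p[1])
        print(step, round(strongest[0], 1), round(strongest[1], 3))
```

Each step raises every partial by a semitone, yet step 12 is identical to step 0, which is why the sequence can be repeated indefinitely with no audible break.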
The sound quality or timbre of steady tones depends on the partials. Although partials up to the sixth (and sometimes higher) can be heard individually by careful listening, we tend rather to hear an over-all effect of the partials through the timbre of the tone. A pure tone or sinusoid is thin. A combination of octave partials is harsh or buzzy. In general, the timbre appears to be dissonant or unpleasant if two strong partials fall within a critical bandwidth, which is about 100Hz below 600Hz and about a fifth of an octave above 600Hz.
The timbre of a sound is strongly affected by resonances in the vocal tract or in musical instruments. These resonances strengthen the partials near the resonant frequencies. Three important formants or ranges of strengthened frequency are produced by the vocal tract; they give the qualities to vowel sounds which are identifiable independent of pitch.
Textbooks give harmonic analyses of sounds of various musical instruments, but if we synthesize a steady tone according to such a formula it sounds little like the actual instrument. Steady synthesized vowels do not sound like speech if their duration is long.
Temporal changes such as attack, decay, vibrato and tremolo, whether regular or irregular, have a strong effect on sound quality. A rapid attack followed by a gradual decay gives a plucked quality to any waveform. Also the rate at which various partials rise with time and the difference in the relative intensity of partials with loudness are essential to the quality of a sound. Indeed it is at least in part the difference in relative intensity of partials that enables us to tell a loud passage from a soft passage regardless of the setting of the volume control. This clue is lost in electronic music (and in audio recording) if the tones employed have a constant relative strength of partials, independent of volume.
The warmth of the piano has been shown to be due to the fact that the upper partials are not quite harmonically related to the fundamental.
Observers with normal hearing but without musical training find pairs of pure tones consonant if the frequencies are separated by more than the critical bandwidth, or if the frequencies coincide or are within a few hertz of one another (in this case beats are heard). Pairs of tones are most dissonant when they are about a quarter of a critical bandwidth apart. For frequencies above 600Hz, this is about a twentieth of an octave.
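The rule of thumb for critical bandwidth and pure-tone consonance can be written out as a small calculation. This is a rough sketch using only the round figures quoted in the text; the 3Hz limit for "within a few hertz" and the names used are assumptions made for illustration.

```python
BEAT_LIMIT_HZ = 3.0  # assumed meaning of "within a few hertz"


def critical_bandwidth(freq_hz):
    """Approximate critical bandwidth in Hz from the round figures in the text.

    Below 600Hz the band is taken as 100Hz wide; above, as a fifth of an
    octave centred (geometrically) on freq_hz. The two rules do not meet
    exactly at 600Hz because both figures are only approximate.
    """
    if freq_hz < 600.0:
        return 100.0
    return freq_hz * (2.0 ** 0.1 - 2.0 ** -0.1)


def most_dissonant_separation(freq_hz):
    """Separation (Hz) at which a pair of pure tones is most dissonant."""
    return critical_bandwidth(freq_hz) / 4.0


def classify_pair(f1, f2):
    """Classify a pair of pure tones by their frequency separation."""
    separation = abs(f2 - f1)
    bandwidth = critical_bandwidth((f1 + f2) / 2.0)
    if separation <= BEAT_LIMIT_HZ:
        return "consonant (beats)"
    if separation > bandwidth:
        return "consonant"
    return "rough"


if __name__ == "__main__":
    for pair in [(440.0, 442.0), (1000.0, 1035.0), (1000.0, 1500.0)]:
        print(pair, classify_pair(*pair))
```

Near 1000Hz the sketch gives a critical bandwidth of about 139Hz and a most-dissonant separation of about 35Hz, which agrees with the "twentieth of an octave" figure.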
Excluding bells, gongs and drums, the partials of musical instruments are nearly harmonic. When this is so, for certain ratios of the frequencies of fundamentals, the partials of two tones either coincide or are well separated. These ratios of the fundamentals are 2:1 (the octave), 3:2 (the fifth), 4:3 (the fourth), 5:4 (the major third), and 6:5 (the minor third). Normal observers find pairs of tones with these ratios of fundamentals to be more pleasant, and intervening ratios less pleasant.
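The coincidence of partials for simple ratios can be checked directly. The sketch below assumes exactly harmonic partials and looks only at the first eight harmonics of each tone (an arbitrary cut-off); it finds the coinciding partials and does not measure how well separated the remaining ones are. The 45:32 ratio is included as one example of an intervening ratio.

```python
from fractions import Fraction


def coinciding_partials(ratio, n_harmonics=8):
    """List (m, n) pairs where harmonic m of the lower tone has the same
    frequency as harmonic n of the upper tone.

    `ratio` is the upper fundamental divided by the lower one, given as an
    exact Fraction so that equality of frequencies can be tested exactly.
    """
    ratio = Fraction(ratio)
    pairs = []
    for n in range(1, n_harmonics + 1):
        m = n * ratio  # harmonic number of the lower tone at this frequency
        if m.denominator == 1 and m <= n_harmonics:
            pairs.append((int(m), n))
    return pairs


if __name__ == "__main__":
    intervals = [
        ("octave 2:1", Fraction(2, 1)),
        ("fifth 3:2", Fraction(3, 2)),
        ("fourth 4:3", Fraction(4, 3)),
        ("major third 5:4", Fraction(5, 4)),
        ("minor third 6:5", Fraction(6, 5)),
        ("intervening 45:32", Fraction(45, 32)),
    ]
    for name, ratio in intervals:
        print(name, coinciding_partials(ratio))
```

For the fifth, for instance, the third and sixth harmonics of the lower tone coincide with the second and fourth harmonics of the upper tone, while the 45:32 pair shares no partials within this range.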
Musical consonance and dissonance depend on many factors in addition to frequencies of partials. For example, unlike non-musicians, classically trained musicians describe pairs of pure tones with these simple numerical ratios of frequency as consonant and intervening ratios as dissonant. The only reasonable explanation is that trained musicians are able to recognize familiar intervals and have learned to think of these intervals as consonant.
In order for complex tones to attain a given degree of consonance, low tones must be separated by a larger fraction of an octave than high tones, and composers generally follow this principle.
If the partials of a tone are regularly arranged but not harmonic, the ratios of frequencies of the fundamental (or first partial) that lead to consonance are not the conventional ones.
When we listen to a pure tone of frequency f(1) and another tone of somewhat higher frequency f(2), we hear a combination tone of lower frequency 2f(1)-f(2), even at low sound levels. At much higher sound levels, around 100,000 times or more the power at threshold, it is possible to hear faint tones at the frequencies 2f(1), 2f(2), f(1)+f(2), f(2)-f(1) and so on. Combination tones are due to the nonlinearities in the hearing mechanism. They can contribute to dissonance and to beats.
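As a worked example of the frequencies listed above, the following sketch tabulates the combination tones for a given pair and converts the 100,000-fold power figure to decibels. The function names and the example frequencies are illustrative only.

```python
import math


def combination_tones(f1, f2):
    """Frequencies (Hz) of the combination tones named in the text.

    f1 is the lower primary tone and f2 the somewhat higher one.
    """
    return {
        "2f1-f2": 2 * f1 - f2,   # audible even at low sound levels
        "2f1": 2 * f1,           # the rest need much higher levels
        "2f2": 2 * f2,
        "f1+f2": f1 + f2,
        "f2-f1": f2 - f1,
    }


def power_ratio_to_db(power_ratio):
    """Convert a ratio of sound powers to decibels."""
    return 10.0 * math.log10(power_ratio)


if __name__ == "__main__":
    for name, freq in combination_tones(1000.0, 1200.0).items():
        print(name, freq)
    print("100,000x threshold power in dB:", power_ratio_to_db(100000.0))
```

For primaries at 1000Hz and 1200Hz the low-level combination tone falls at 800Hz, and 100,000 times the threshold power is a level of about 50 dB above threshold.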
Reverberation is important to musical quality; music played on an organ recorded in an organ loft sounds like a bad electronic organ. The reverberation for speech should be as short as possible; for music about 2 sec. is effective. Music sounds "dry" in a hall designed for speech. Reverberation is not the only effect in architectural acoustics. Our current understanding of architectural acoustics is far from satisfactory.
Many voices or instruments do not sound like one voice or one instrument. Some experiments by writers show that a choir effect (chorusing) cannot be attained by random tremolo or vibrato. It must be due to irregular changes in the overall waveform, caused by beating or head motions or by differences in attack.
We can experience a sidedness to sound by wearing headphones fed from two microphones, but the sound seems to be inside our head. We experience externalization of the sound - as coming from a particular direction - only when we allow head movements in a sound field. Although we cannot detect the direction of the source of a sinusoidal tone in a reverberant room, we can detect the direction by the onset of such a tone, and we can detect the direction of clicks and other changing sounds. The first arrival of the sound dominates later reverberant arrivals in our sensing of the direction of the source; this is called the precedence effect. We can detect the vertical angle of arrival, although no one is sure how this is done. We can also sense the distance of a source in a reverberant room; this sensation must depend on some comparison of the direct arrival and the reverberant sound.
Most memory experiments are not done with musical sounds, but many are relevant to music.
It has been found that subjects can remember a sequence of from 5 to 9 randomly chosen digits, letters, or words. On the other hand, a good bridge player can remember every card that has been played in the entire game. Our ability to deal with stimuli depends on their familiarity or "meaning" to us. This familiarity comes about through over-learning. Over-learning has been insufficiently investigated because, although it is common in life, it is very difficult to achieve in the laboratory.
The phonemes of a language are over-learned. A subject can readily distinguish the phonemes of his own tongue, but not those of another. He can distinguish dialects of his own language, but not those of a foreign tongue. He can understand his native language in a noisy place better than he can understand a foreign language even though he is expert in it.
Conventional elements and structures in music are undoubtedly over-learned. Much of our appreciation of harmony and much of our ability to remember conventional tunes (Mozart, Haydn, and some other musicians could remember compositions heard only once) must depend on over-learning, just as our ability to use and remember language does. Performance with unfamiliar material is much poorer.
Some psychological stimuli have the same pattern of similarity for all people. Color is one. The psychological distance between stimuli such as colors can be obtained by computer analysis of data expressing either the confusions that subjects make among pairs of stimuli or the numbers that they assign to the pairs to express their judgments of similarity. This kind of analysis is called multidimensional scaling. The stimuli may appear in a psychological space of one dimension (loudness does), two dimensions (color does), or three (vowels do) or more dimensions. Psychological distance is dependent on, but not proportional to, physical parameters. Thus red and violet light are of all colors the farthest apart in wavelength, and yet they look more alike - they are closer together psychologically - than the "intermediate" colors orange and blue.
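To make the idea concrete, here is a minimal sketch of one simple variant, classical (metric) multidimensional scaling, restricted to a one-dimensional space such as the one the text ascribes to loudness. It is not the analysis used in the studies referred to above, and the dissimilarity judgments in the example are invented purely for illustration. The dominant eigenvector is found by plain power iteration to keep the sketch free of external libraries.

```python
import math


def classical_mds_1d(dissim, iterations=200):
    """Place n stimuli on a line so that their distances approximate the
    symmetric matrix of judged dissimilarities `dissim`."""
    n = len(dissim)
    squared = [[dissim[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    b = [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector of b.
    v = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    # Scale the unit eigenvector so that coordinates are in distance units.
    return [x * math.sqrt(eigenvalue) for x in v]


if __name__ == "__main__":
    # Invented judgments for four stimuli (hypothetical data).
    judged = [
        [0.0, 1.0, 3.0, 6.0],
        [1.0, 0.0, 2.0, 5.0],
        [3.0, 2.0, 0.0, 3.0],
        [6.0, 5.0, 3.0, 0.0],
    ]
    print(classical_mds_1d(judged))
```

The recovered coordinates are determined only up to a shift and a reflection; what matters is that the distances between them reproduce the judged dissimilarities. Spaces of two or more dimensions need further eigenvectors, which this sketch does not compute.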
Unhappily, multidimensional scaling is just beginning to be applied in the field of music. Further results might be enlightening. For instance, "we" is nearly "you" said backwards, and yet we perceive no similarity between the sounds of the two words. Is the retrograde of a phrase psychologically similar to the phrase, or is retrograde (in the words of Tovey) for the eye only? Transpositions certainly are psychologically close, but what about augmentations and inversions? What about changes in rhythm? What about manipulations of the tone row?