Digital Audio Technology

A Primer: Part II

MI 313

Center for Audio Recording Arts (CARA)

by: Robert S. Thompson, Ph.D.

 

Reconstruction of the Analog Signal

After the sampling process and subsequent storage of samples in an array, the DAC unit reconstructs the signal. This reconstruction takes place around 50,000 times per second, that is, at a sampling rate of 50KHz (although both lower and higher rates can be found!). This means there are 6 million numbers (samples) for one minute of stereo sound.
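The sample count quoted above is simple arithmetic. The sketch below reproduces it; the function name is only illustrative.

```python
def samples_per_minute(sampling_rate_hz, channels):
    """Count the sample values produced in one minute of audio."""
    return sampling_rate_hz * 60 * channels


# 50 kHz stereo, the case discussed in the text
stereo_50k = samples_per_minute(50_000, 2)
print(stereo_50k)  # 6000000

# 44.1 kHz stereo, the compact disc rate
stereo_cd = samples_per_minute(44_100, 2)
print(stereo_cd)  # 5292000
```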

Time does pass between successive samples of course, even at these high speeds! The gaps between successive samples are very short, to be sure, but it is safe to assume that for some sounds the waveform will change "between" successive samples. Also, samples are instantaneous and sampling pulses can have a very short duration, perhaps as small as 0.00002 sec (that is, two hundred-thousandths of a second).

With all of this in mind it is possible to say that the signal at the ADC is defined at discrete times, each such time represented by one sample.

Part of the "magic" of digitized sound is that if the signal is bandlimited (that is, it contains no components above half the sampling rate), the DAC and associated lowpass filter can exactly reconstruct the original signal from the samples. This means that, under certain conditions, the missing part of the signal "between the samples" can be restored! The smoothing filter of the DAC effects this restoration process.
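One way to see this restoration numerically is ideal sinc interpolation, the mathematical idealization of a perfect lowpass filter. The sketch below is an assumption-laden illustration, not a model of any real DAC: it uses a finite run of samples, so the result is only approximately exact, and the function names are hypothetical.

```python
import math


def sinc(x):
    """Normalized sinc function, sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, rate_hz, t):
    """Estimate the signal value at time t (seconds) from its samples."""
    return sum(s * sinc(t * rate_hz - n) for n, s in enumerate(samples))


rate_hz = 50_000
freq_hz = 1_000  # well below the 25 kHz Nyquist frequency
samples = [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(2000)]

# A point in time halfway between sample 1000 and sample 1001
t = 1000.5 / rate_hz
estimate = reconstruct(samples, rate_hz, t)
exact = math.sin(2 * math.pi * freq_hz * t)
print(estimate, exact)
```

The two printed values should agree closely; the small residual comes from truncating the sum to 2000 samples.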

Phase Correction

The issue of phase correction came rushing to the fore following the introduction of the first generation of digital audio recorders and players. Many complained about the harsh sound of digital recordings, a problem that is traced to the "brickwall" anti-aliasing filters of the ADC.

The "Brickwall" Filter

The problem of aliasing and the need to provide total band-rejection at the Nyquist frequency mandated a filter with a very steep rejection curve (typically over 90dB/octave at the Nyquist frequency). These filters are called brickwall filters because of this steep cutoff slope. Generally speaking, they are analog filters.

Steep filters such as these can cause significant time delays (phase distortion) in midrange and high audio frequencies. These problems are especially acute in the ADC but the DAC also introduces a smaller frequency-dependent delay which is contributed by the smoothing filter at output.

No analog filter can be both extremely steep and phase linear around the cutoff point.

phase linear - little or no frequency-dependent delay introduced by the filter

The effect of an extremely steep filter "spills over" into the audio range. For compact disc recordings at 44.1KHz sampling rate, the Nyquist frequency is 22.05KHz, and a steep antialiasing filter can introduce phase distortion that extends well below 10KHz. This type of phase distortion lends an unnaturally harsh sound to high frequencies. This is why audio professionals were so critical of digital recording in the early days - way back in the 1980s!

There are several ways to tackle this problem. The simplest is to trade off the anti-aliasing properties (rejection at the Nyquist frequency) in favor of less phase distortion. A less steep antialiasing filter (say 40-60 dB per octave) introduces less phase distortion, but at the risk of foldover for very high frequency sounds. Another solution is to apply a time correction filter before the ADC to skew the phase relationships in the incoming signal so as to preserve the original phase relationships in the recording. However, the current, high-technology solution to phase-correct conversion is to use oversampling techniques at both the input and output stages of a system.
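To make the foldover risk concrete, the sketch below computes where a component that slips past a gentle antialiasing filter would land after sampling. It assumes an ideal sampler with no filtering at all; the function name is illustrative.

```python
def alias_frequency(freq_hz, sampling_rate_hz):
    """Frequency at which a sampled component appears, between 0 and Nyquist."""
    folded = freq_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


# A 30 kHz component sampled at 44.1 kHz folds down into the audio band
print(alias_frequency(30_000, 44_100))  # 14100

# A 20 kHz component is below the Nyquist frequency and is unchanged
print(alias_frequency(20_000, 44_100))  # 20000
```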

Quantization

Sampling at discrete time intervals constitutes one of the major differences between digital and analog signals. Another difference is quantization, or discrete amplitude resolution. The values of the sampled signal cannot take on any conceivable value. This is because digital numbers can only be represented within a certain range and with a certain accuracy, which varies with the hardware being used. The implications of this are an important factor in digital audio quality.

Quantization Noise

Samples are usually represented as integers. If the input signal has a voltage corresponding to a value between 53 and 54, for example, then the converter might round it off and assign a value of 53. In general, for each sample taken, the value of the sample usually differs slightly from the value of the original signal. This problem in digital signals is known as quantization error or quantization noise.
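A minimal sketch of this rounding step follows. It assumes a converter that rounds to the nearest integer step (a real converter might truncate instead), and the input values are made up for illustration.

```python
def quantize(value_in_steps):
    """Round an input level, expressed in converter steps, to an integer."""
    return round(value_in_steps)


inputs = [53.3, 12.8, -7.45, 0.0]
for x in inputs:
    q = quantize(x)
    error = x - q  # the quantization error for this sample
    print(x, q, error)
```

With rounding, the error for each sample never exceeds half a step in either direction.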

What does it sound like? Very much like analog tape hiss at the output. If we listen to just the errors, it sounds just like noise, albeit at a fairly low amplitude!

The quantization noise is dependent on two factors: the signal itself, and the accuracy with which the signal is represented in the digital form. We can explain the sensitivity to the signal by noting that on an analog tape recorder, the tape imposes a soft halo of noise that continues even through periods of silence on the tape (unless you use paper leader in these spots!).

But, in a digital system, there can be no quantization noise when nothing (or silence) is recorded. In other words, if the input signal is silence, then the signal is represented by a series of samples each of which is exactly zero. If, on the other hand, the input signal is a pure sinusoid, then the quantization error is not a random function but a deterministic effect. This gritty sound, called granulation noise, can be heard when very low level sinusoids decay to silence. When the input signal is complicated, the granulation becomes randomized (by errors) into white noise.

The second factor in quantization noise is the accuracy of the digital representation. In a PCM system that represents each sample value by an integer (a linear PCM system), quantization noise is directly tied to the number of bits that are used to represent the sample. This specification is the sample width or quantization level of a system. The more bits used to represent a signal the less the quantization noise.
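The link between sample width and noise can be checked with a small experiment. The sketch below quantizes a full-scale sine at several bit widths and measures the ratio of signal power to error power. The test frequency and sample count are arbitrary assumptions, and rounding stands in for the converter.

```python
import math


def quantization_snr_db(bits, n_samples=10_000):
    """Measured signal-to-quantization-noise ratio for a full-scale sine."""
    full_scale = 2 ** (bits - 1) - 1  # largest positive integer value
    cycles_per_sample = math.sqrt(2) / 100  # avoids a short repeating pattern
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        x = full_scale * math.sin(2 * math.pi * cycles_per_sample * n)
        q = round(x)  # linear PCM: each sample becomes an integer
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 12, 16):
    print(bits, quantization_snr_db(bits))
```

Each additional bit should raise the measured ratio by roughly 6 dB.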

Low-level Quantization Noise and Dither

Although a digital system exhibits no noise when there is no input signal, at very low (but nonzero) signal levels, quantization noise takes a pernicious form. A very low level signal triggers variations only in the lowest bit (of the digital word that is...). These 1-bit variations look like a square wave which is rich in odd harmonics. Consider the decay characteristics of a piano tone, which smoothly attenuates with high partials rolling off - right until the lowest level when it changes character and becomes a harsh sounding square wave! The harmonics of the square wave may even extend beyond the Nyquist frequency, causing aliasing and introducing new frequency components that were not in the original signal. These artifacts may be possible to ignore if the signal is kept at a low monitoring level, but if the signal is heard at a high level or if it is digitally remixed to a higher level, it becomes more obvious. Hence it is important that the signal be quantized as accurately as possible at the input stage. It is also clear that regardless of the eventual playback volume of a digital signal, the relative strength of the signal will be important to the success of DSP (digital signal processing) treatments.
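The square-wave effect is easy to reproduce. The sketch below assumes a truncating converter (one that always rounds down) and a sine whose amplitude is less than one quantization step; both are illustrative choices.

```python
import math


def truncate_to_step(x):
    """A truncating quantizer: always round down to the next integer step."""
    return math.floor(x)


period = 16  # samples per cycle of the faint sine
faint_sine = [0.4 * math.sin(2 * math.pi * (n + 0.5) / period)
              for n in range(2 * period)]
quantized = [truncate_to_step(x) for x in faint_sine]
print(quantized)
```

The smooth sine comes out as a two-level pattern that flips only the lowest bit, which is the square wave described above.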

To confront low-level quantization problems, some digital recording systems take what seems at first to be a strange action. They introduce a small amount of analog noise - called dither - to the signal prior to the analog to digital conversion process. This causes the ADC to make random variations around the low-level signal, which smooths out the pernicious effects of square wave harmonics. With dither, the quantization error, which is usually signal-dependent, is turned into a wide band noise that is uncorrelated with the signal. For decrescendos like the piano tone, the effect is that of a "soft landing" as the tone fades smoothly into a bed of low-level random noise. The amount of added noise is usually on the order of 3dB, but the ear can reconstruct musical tones whose amplitudes fall below that of the dither signal.
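The following sketch shows why added noise helps. It assumes uniformly distributed dither one quantization step wide and uses a long-term average as a crude stand-in for the ear's ability to hear through noise; a pseudo-random generator replaces the analog noise source.

```python
import random


def quantize(x):
    """Round to the nearest integer step."""
    return round(x)


def average_output(level, dither_amplitude, trials=20_000, seed=1):
    """Average quantizer output for a constant input with optional dither."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        noise = rng.uniform(-dither_amplitude, dither_amplitude)
        total += quantize(level + noise)
    return total / trials


level = 0.3  # a signal level smaller than one quantization step
undithered = average_output(level, 0.0)
dithered = average_output(level, 0.5)
print(undithered)  # always 0: the signal is lost entirely
print(dithered)    # close to 0.3: the level survives inside the noise
```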

Dither may not be necessary with an accurate 20-bit converter, since the low bit represents an extremely soft signal in excess of 108 dB below the loudest signal. But when converting signals from a 20-bit to a 16-bit format, for example, dithering is necessary to preserve signal fidelity.

Converter Linearity

Converters can cause a variety of distortions. One that is pertinent here is that an n-bit converter is not necessarily accurate to the full dynamic range implied by its n-bit input or output. While the resolution of an n-bit converter is one part in 2^n, a converter's linearity is the degree to which the analog and digital input and output signals match in terms of their magnitudes. That is, some converters use 2^n steps, but these steps are not linear, which causes distortion. Hence it is possible to see an 18-bit converter, for example, that is 16-bit linear. Such a converter may be better than a plain 16-bit converter, which may not be 16-bit linear.

Dynamic range of Digital Audio Systems

The specifications for digital sound equipment typically specify the accuracy or resolution of the system. This can be expressed as the number of bits that the system uses to store each sample. The number of bits per sample is important in calculating the maximum dynamic range of a digital sound system. In general, the dynamic range is the difference between the loudest and softest sounds that the system can produce and is measured in units of decibels (dB).

The Decibel Reviewed

The decibel is a unit of measurement for relationships of voltage levels, intensity, or power, particularly in audio systems. In acoustic measurements, the decibel scale indicates the ratio of one level to a reference level, according to the relation

number of decibels = 10 x log10( level / reference level )

where the reference level is usually the threshold of hearing (10^-12 watts per square meter). The logarithmic basis of decibels means that if two notes sound together, and each note is 60dB, the increase in level is just 3dB. A millionfold increase in intensity results in a 60dB boost.
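The two examples in this paragraph can be checked with the relation given above. In this sketch the intensity assigned to one note is an assumed value chosen to sit 60dB above the threshold of hearing.

```python
import math

THRESHOLD_OF_HEARING = 1e-12  # watts per square meter


def decibels(level, reference):
    """Number of decibels = 10 x log10(level / reference level)."""
    return 10 * math.log10(level / reference)


one_note = 1e-6  # an intensity 60 dB above the threshold of hearing
print(decibels(one_note, THRESHOLD_OF_HEARING))      # about 60 dB
print(decibels(2 * one_note, THRESHOLD_OF_HEARING))  # about 63 dB
print(decibels(1_000_000, 1))                        # millionfold: 60 dB
```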

Two important facts inform the dynamic range requirements for digital audio.

1. The range of human hearing extends from approximately 0dB, roughly the level at which the softest sound can be heard, to something around 125 dB, which is roughly the threshold of pain for sustained sounds.

2. A difference of somewhat less than one dB between the amplitude levels of two sounds corresponds to the smallest difference in amplitude that can be heard.

These figures vary with age, training, pitch and the individual.

In recording music, it is important to capture the widest possible dynamic range if we want to reproduce the full expressive power of the music. In a live orchestra concert, for example, the dynamic range can vary from "silence," to an instrumental solo at 60dB, to a tutti section by the full orchestra exceeding 110dB! The dynamic range of analog tape equipment is dictated by the physics of the analog recording process. It stands somewhere around 80dB for a 1KHz tone using professional reel-to-reel recorders without noise reduction devices. (Recall that noise reduction devices can increase dynamic range at the price of various distortions.)

When a recording is produced for distribution on a medium that does not have a wide dynamic range (a mass-produced analog cassette, for example) the soft passages are made a little bit louder by the transfer engineer, and the loud passages are made a bit softer. If this were not done, then the loudest passages would produce distortion in the recording, and the softest passages would be masked by hiss and other noise.

Dynamic Range of a Digital System

To calculate the maximum dynamic range of a digital system, we can use the following simple formula:

maximum dynamic range in decibels = number of bits x 6.11

The number 6.11 is a close approximation to the theoretical maximum; in practice, 6 is a more realistic figure.

Thus, if we record sound with an 8-bit system, then the upper limit on dynamic range is approximately 48dB - worse than the dynamic range of analog tape recorders. But, if we record 16 bits per sample, the dynamic range increases to a maximum of 96 dB - a significant improvement. A 20-bit converter offers a potential dynamic range of 120dB, which corresponds roughly to the range of the human ear. And since quantization noise is directly related to the number of bits, even softer passages that do not use the full dynamic range of the system should sound cleaner.
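The figures above follow from the practical 6dB-per-bit rule. The sketch below tabulates them and, for comparison, prints the amplitude ratio of 2^n steps expressed in decibels. That second column is an added assumption about where a per-bit figure comes from, not something stated in the text.

```python
import math


def dynamic_range_db(bits, db_per_bit=6.0):
    """Rule-of-thumb dynamic range of a linear PCM system."""
    return bits * db_per_bit


def step_ratio_db(bits):
    """Ratio of 2**bits amplitude steps to one step, in decibels."""
    return 20 * math.log10(2 ** bits)


for bits in (8, 16, 20):
    print(bits, dynamic_range_db(bits), step_ratio_db(bits))
```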

This discussion assumes that we are using a linear PCM scheme that stores each sample as an integer representing the value of each sample. Other conversion schemes (less standard!) encode samples in various ways such as decimal numbers, fractions, differences between successive samples and the like. These other encoding schemes usually have the goal of reducing the total number of bits that the system must store. For some applications like compact disc media that mix images with audio data (CD-ROM, CD-I, etc.), it may be necessary to compromise dynamic range by storing fewer bits in order to fit all the needed information on the disk. Another way to save space is, of course, to reduce the sampling rate.

Oversampling

So far the discussion has focussed on linear PCM converters. A linear PCM DAC transforms a sample into an analog voltage in essentially one straightforward step. In contrast to linear PCM converters, oversampling converters use more samples in the conversion stage than are actually stored in the recording medium. The theory of oversampling is an advanced topic; however, for our purposes here it is sufficient to present the basic ideas.

Oversampling is not one technique but a family of methods for increasing the accuracy of converters. We distinguish between two different types of oversampling:

1. Multiple-bit oversampling DACs - developed for use in compact disc players in the early 1980s by Philips.

2. 1-bit oversampling with sigma-delta modulation or a related method as used in more recent ADCs and DACs.

The first method converts a number of bits (for example, 16) at each tick of the sampling clock, while the second method converts just one bit at a time, but at a very high sampling frequency. The distinction between multi-bit and 1-bit sampling systems is not always clear, since some converters use a combination of these two approaches. That is, they perform multi-bit oversampling first and then turn this into a 1-bit stream that is again oversampled.

Multiple-bit Oversampling Converters

In the mid-1980s many CD player manufacturers used a DAC chip set designed by Philips that introduced the benefits of oversampling technology to home listeners. These converters take advantage of the fact that digital filters can provide a much more linear phase response than the steep brickwall analog filters used in regular DACs (ADCs based on this concept have also been made). In a CD player 44,100 16-bit samples are stored for each second per channel, but on playback they may be up-sampled four times (to 176.4 KHz) or eight times (to 352.8KHz), depending on the system. The up-sampling is accomplished by interpolating three (or seven) new 16-bit samples in between every two original samples. At the same time all of the samples are filtered by a linear phase digital filter, instead of a phase-distorting brickwall analog filter. (This digital filter is called a finite-impulse-response filter, or FIR filter - they tend to have good linearity and are very stable over the frequency band - more on this and related topics in MI 323.)
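The idea of inserting new samples between the stored ones can be sketched as follows. This example uses straight-line interpolation, which is a deliberate simplification and an assumption of this sketch: an actual oversampling player computes the new samples with a much longer linear-phase FIR filter.

```python
def upsample_linear(samples, factor):
    """Insert factor - 1 interpolated values between each pair of samples."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])  # keep the final original sample
    return out


stored = [0, 4, 8]
four_times = upsample_linear(stored, 4)
print(four_times)  # three new values between every two original samples
```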

Besides phase linearity, a main benefit of oversampling is a reduction in quantization noise - and an increase in signal to noise ratio - over the audio bandwidth. This derives from a basic principle of converters stating that the total quantization noise power corresponds to the resolution of the converter, independent of its sampling rate. This noise is, in theory, spread evenly across the entire bandwidth of the system. A higher sampling rate spreads a constant amount of quantization noise over a wider range of frequencies. Subsequent lowpass filtering eliminates the quantization noise power above the audio frequency band. As a result, a four times oversampled recording has 6dB less quantization noise (equivalent to adding another bit of resolution), and an eight-times oversampled recording has about 9dB less noise (a 12dB reduction takes sixteen-times oversampling). The final stage of this kind of system is a gently sloping analog lowpass filter that removes all components above, say, 30KHz, with insignificant audio band phase shift.
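Under the even-spreading model described in this paragraph, only 1/N of the noise power remains in the audio band after N-times oversampling and lowpass filtering. The sketch below turns that fraction into decibels; it assumes ideal white quantization noise and an ideal filter.

```python
import math


def inband_noise_reduction_db(oversampling_factor):
    """Noise reduction in the audio band when noise is spread evenly."""
    return 10 * math.log10(oversampling_factor)


for factor in (2, 4, 8, 16):
    print(factor, inband_noise_reduction_db(factor))
```

Each doubling of the sampling rate buys about 3dB, so every factor of four is worth roughly one extra bit.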

1-bit Oversampling Converters

Although the theory of 1-bit oversampling converters goes back to the 1950s, it took many years for this technology to become incorporated into digital audio systems. The 1-bit oversampling converters constitute a family of different techniques that are variously called sigma-delta, delta-sigma, noise-shaping or bitstream depending on the manufacturer. They have a common thread in that they sample one bit at a time, but at high sampling frequencies. Rather than trying to represent the entire waveform in a single sample, these converters measure the difference between successive samples.

1-bit converters take advantage of a fundamental law of information theory, which says that one can trade off sample width for sample rate and still convert at the same resolution. That is, a 1-bit converter that "oversamples" at 16 times the stored sample rate is equivalent to a 16-bit converter with no oversampling. They both process the same number of bits. The benefits of oversampling accrue when the number of bits being processed is greater than the number of input bits.

From the standpoint of a user, the rate of oversampling in a 1-bit converter can be a confusing specification, since it does not necessarily indicate how many bits are being processed or stored. One way to try to decipher an oversampling specification is to determine the total number of bits being processed, according to the relation:

(oversampling factor) X (width of converter)

For example, a "128-times oversampling" system that uses a 1-bit converter is processing 128 x 1 bits each sample period. This compares to a traditional 16-bit linear converter that handles 1 x 16 bits, or 8 times less data. In theory, the 1-bit converter should be much cleaner sounding. In practice, however, making this kind of determination is sometimes confounded by converters that use several stages of oversampling and varying internal bit widths.
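The comparison in this paragraph is a direct application of the relation above. A minimal sketch, with an illustrative function name:

```python
def bits_per_sample_period(oversampling_factor, converter_width_bits):
    """(oversampling factor) x (width of converter)."""
    return oversampling_factor * converter_width_bits


one_bit_system = bits_per_sample_period(128, 1)  # 128-times, 1-bit
linear_system = bits_per_sample_period(1, 16)    # traditional 16-bit
print(one_bit_system, linear_system, one_bit_system / linear_system)
# 128 16 8.0
```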

In any case, all the benefits of oversampling accrue to 1-bit converters, including increased resolution and phase linearity due to digital filtering (digital filters are more accurate than analog ones!). High sampling rates that are difficult to achieve with the technology of multi-bit converters are much easier to implement with 1-bit converters. Oversampling in the MHz range permits 20-bit quantization per sample.

Noise Shaping

Another technique used in 1-bit oversampling converters is noise shaping, which can take many forms. The basic idea is that the "requantization" error that occurs in the oversampling process is shifted into a high-frequency range -- out of the audio bandwidth -- by a highpass filter in a feedback loop with the input signal. This noise-shaping loop sends only the requantization error through the highpass filter, not the audio signal.
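A feedback loop of this kind can be sketched in a few lines. The example below is a first-order error-feedback loop with a 1-bit output, chosen for brevity; it is an assumption of this sketch rather than the design of any particular converter, and commercial parts typically use higher-order loops.

```python
def noise_shaped_bits(samples):
    """Requantize samples in the range -1..1 to a stream of +1/-1 values.

    The error made on each sample is subtracted from the next input, so
    the output equals the input plus the difference of successive errors
    (a simple highpass applied to the error only).
    """
    previous_error = 0.0
    bits = []
    for x in samples:
        u = x - previous_error           # input corrected by the last error
        bit = 1.0 if u >= 0 else -1.0    # the 1-bit requantizer
        previous_error = bit - u         # error made on this sample
        bits.append(bit)
    return bits


steady_input = [0.5] * 1000
stream = noise_shaped_bits(steady_input)
average = sum(stream) / len(stream)
print(stream[:12])
print(average)  # the low-frequency content of the bit stream tracks 0.5
```

Because the error is pushed toward high frequencies, a lowpass filter (here, a plain average) recovers the input level from the 1-bit stream.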

The final stage of any oversampling converter is a decimator/filter that reduces the sampling rate of the signal to that required for storage (for an ADC) or playback (for a DAC) and also lowpass filters the signal. In the case of a noise-shaping converter this decimator/filter also removes the requantization noise, resulting in dramatic improvements in the signal-to-noise ratio. With second-order noise shaping (so called because of the second-order digital highpass filter in the feedback loop), the maximum signal-to-noise ratio of a 1-bit converter is approximately equivalent to 15dB (2.5 bits) per octave of oversampling, minus a fixed 12.9 dB penalty. Thus an oversampling factor of 29 increases the signal to noise ratio of a 16-bit converter by the equivalent of 10 bits or 60dB!
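The rule of thumb in this paragraph can be evaluated directly. The sketch below takes "octaves of oversampling" to mean the base-2 logarithm of the oversampling factor and converts decibels to bits at 6dB per bit; both readings are assumptions of this example.

```python
import math


def second_order_gain_db(oversampling_factor):
    """15 dB per octave of oversampling, minus a fixed 12.9 dB penalty."""
    octaves = math.log2(oversampling_factor)
    return 15.0 * octaves - 12.9


gain_db = second_order_gain_db(29)
equivalent_bits = gain_db / 6.0
print(gain_db, equivalent_bits)  # close to 60 dB and 10 bits
```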

For the intrepid engineer there is much written about the theory of oversampling converters - be sure to have your calculator ready, it can be tough going!

Overview of Digital Audio Media

Audio samples can be stored on any digital medium: tape, disk, or integrated circuit, using any digital recording technology, for example, electromagnetic, magneto-optical, or optical. Using a given medium, data can be written in a variety of formats. A format is a kind of data structure - which outlines how the information is to be stored on the media and later interpreted when recalled from storage. For example, some manufacturers of digital audio workstations implement a proprietary format for storing samples on hard disk. For both technological and marketing reasons new media and formats appear regularly.

Some media are capable of handling more bits per second and so have the potential for higher quality recording. For example, certain digital tape recorders can encode 20-bits per sample with appropriate converters. A hard disk can handle 20-bit samples at rates in excess of 100KHz (for a limited number of channels at one time), while for semiconductor media (memory chips) the potential sample width and sampling rate are much greater.
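The load a recording places on a medium can be estimated as sample width times sampling rate times number of channels. The sketch below ignores error-correction and formatting overhead, which real media add on top; the function name is illustrative.

```python
def bits_per_second(sample_width_bits, sampling_rate_hz, channels=1):
    """Raw audio data rate, before any error-correction overhead."""
    return sample_width_bits * sampling_rate_hz * channels


# 16-bit stereo at the compact disc rate of 44.1 kHz
print(bits_per_second(16, 44_100, 2))  # 1411200

# One channel of 20-bit audio at 100 kHz
print(bits_per_second(20, 100_000))    # 2000000
```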

Another characteristic of media is life-span. Archival-quality optical disks made of etched tempered glass and plated with gold will last decades and can be played many thousands of times. Magnetic media like DAT and floppy disks are inexpensive and portable, but not nearly as robust. Some consider the life-span of CD-R to be on the order of 100 years. We will have to wait a while to see if this is true.

An outstanding advantage of digital storage media is that one can transfer the bits from one medium to another with no loss. (This assumes compatibility between machines and formats and the absence of copy-protection circuits.) One can clone a recording any number of times - from the original or from any of the copies. It also means that one can transfer a recording from an inexpensive serial medium, such as DAT, to a random-access medium such as hard disk, which is more suited to editing and processing. These transfers are accomplished using established transfer formats and associated hardware circuits such as AES/EBU and S/PDIF.

MEDIUM | ACCESS | NOTES
stationary head, magnetic tape | serial | typically used for professional multitrack (24, 32, 48 track) recording; several formats coexist; limited editing
rotary head videotape, magnetic tape (helical scanning) | serial | professional and consumer formats; consumer video cassettes are inexpensive; two machines needed for assembly-type editing; several tape formats (U-Matic, Beta, VHS, 8mm, etc.) and three incompatible international video encoding formats (NTSC, PAL and SECAM)
rotary head audiotape, magnetic tape | serial | professional Nagra-D format for four-channel location recording
Digital Audio Tape (DAT), magnetic tape | serial | small portable cassettes and recorders; compatible worldwide; some machines handle SMPTE timecode
Digital Compact Cassette (DCC), magnetic tape | serial | a digital format that can also be used in traditional analog cassette recorders; uses data compression; inferior sound quality as compared to CD format
hard disks, magnetic and optical | random | nonremovable hard disks are faster (several milliseconds access time); removable hard disks are convenient for backup and transporting of sound samples; a removable optical hard disk attached to a computer is usually not the same format as an audio CD, though they may look similar
floppy diskettes, magnetic | random | small, inexpensive and convenient, but slow access times and can store only short recordings; not reliable for long-term storage
Sony Mini Disc (MD), magnetic | random | a floppy disk format for sound that employs data compression; inferior sound quality with respect to CD format
compact disc, optical | random | small thin disc storing a maximum of 782 Mbytes for a 74 minute disc; archive quality discs last decades; can play back images as well as audio; various levels of audio quality depending on the application from speech grade (CD-ROM) to very high fidelity (20 bit format); slow access and transfer rate compared to other random-access media
semiconductor memory, electronic | random | very fast access time (less than 80 nanoseconds typically); excellent for temporary storage (for editing) but too expensive for large databases

 

 

Synthesis and Signal Processing

This is the future of the recording field. As we now understand, sampling transforms acoustical signals into binary numbers, making possible digital audio recording. For musical purposes, the applications of sampling go beyond recording, to synthesis and signal processing. Synthesis is the process of generating streams of samples by algorithmic means. Signal processing transforms streams of samples. In music we use signal processing tools to sculpt sound waves into aesthetic forms. There are many applications to creative audio design.

 

DSP Sub-category | Audio Engineering Application(s)
dynamic range (amplitude) manipulations | reshaping the amplitude profile of a sound
mixing | combining multiple tracks of audio, including crossfading
filtering and equalization | changing the frequency spectrum of a sound
time-delay effects | echo, chorus effect, flanging, phasing
convolution | simultaneous time-domain and frequency-domain transformations
spatial projection | sonic holograms, acoustic space simulation, Doppler shift, reverberation
noise reduction | phase inversion convolution, multi-band filtering based on the Fast Fourier Transform (FFT)
sample rate conversion | interpolation or decimation with or without pitch shifting
FFT applications | sound analysis, transformation and resynthesis
time compression/expansion | decoupling frequency and time information
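To make the distinction between synthesis and signal processing concrete, the sketch below generates a stream of samples algorithmically (a sine tone) and then transforms that stream with a single echo, one of the time-delay effects listed above. The function names, tone frequency, delay and gain are all illustrative assumptions.

```python
import math


def synthesize_sine(freq_hz, duration_s, rate_hz=44_100, amplitude=0.5):
    """Synthesis: generate a stream of samples by algorithmic means."""
    count = round(duration_s * rate_hz)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
            for i in range(count)]


def add_echo(samples, delay_samples, gain=0.5):
    """Signal processing: mix in one delayed, attenuated copy of the input."""
    out = list(samples) + [0.0] * delay_samples
    for i, s in enumerate(samples):
        out[i + delay_samples] += gain * s
    return out


tone = synthesize_sine(440, 0.01)            # 441 samples of a 440 Hz tone
echoed = add_echo(tone, delay_samples=100)   # echo arrives 100 samples later
print(len(tone), len(echoed))  # 441 541
```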

 

Although it is a relatively new field, digital signal processing (DSP) has blossomed into a vast theoretical science and applied art.

End of Part II: rst