Digital Audio

Digital audio converts a continuous pressure-wave signal into discrete numerical samples. The conversion quality is constrained by two core design choices: how often you sample (sampling rate) and how finely you quantize amplitude (bit depth).

🎯 Simple version: To store sound in a computer, you take very fast snapshots of the waveform. More snapshots per second and finer volume steps make a more accurate recording. Too few snapshots cause false tones (aliasing).

1) Analog-to-Digital Conversion (ADC)

The microphone produces a continuous voltage proportional to air pressure. The ADC measures that voltage at discrete times:

x[n] = x(n * Ts)

where Ts = 1/Fs, and Fs is the sampling rate in samples/second.

This amounts to evaluating a continuous function on a uniform time grid with spacing Ts.
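As a rough illustration of x[n] = x(n * Ts), the sketch below samples a pure sine tone. The function name `sample_sine` and the 1 kHz / 8 kHz values are assumptions chosen for the example, not part of the original text.

```python
import math

def sample_sine(freq_hz, fs_hz, num_samples, amplitude=1.0):
    """Return x[n] = x(n * Ts) for x(t) = amplitude * sin(2*pi*freq_hz*t)."""
    ts = 1.0 / fs_hz  # sampling period Ts = 1/Fs
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n * ts)
            for n in range(num_samples)]

# Hypothetical example: a 1 kHz tone sampled at 8 kHz gives 8 samples per cycle.
samples = sample_sine(freq_hz=1000.0, fs_hz=8000.0, num_samples=8)
```

With these values the tone reaches its positive peak at n = 2 and crosses zero again at n = 4.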

2) Sampling Rate and Nyquist

Nyquist-Shannon condition for band-limited reconstruction:

Fs >= 2 * Fmax

where Fmax is the highest frequency component you want to preserve.

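Example: human hearing extends to roughly 20 kHz, so Fs must be at least about 40 kHz.

The CD rate of 44.1 kHz covers the audible range and leaves room for a practical anti-alias filter to roll off between 20 kHz and the 22.05 kHz Nyquist limit.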
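The arithmetic behind that choice can be sketched in a few lines. The helper names and the 20 kHz hearing limit are assumptions made for illustration.

```python
def min_sampling_rate(f_max_hz):
    """Lower bound on Fs from the condition Fs >= 2 * Fmax."""
    return 2.0 * f_max_hz

def nyquist_frequency(fs_hz):
    """Highest frequency representable at sampling rate Fs."""
    return fs_hz / 2.0

AUDIBLE_LIMIT_HZ = 20000.0  # assumed nominal upper limit of human hearing

required_fs = min_sampling_rate(AUDIBLE_LIMIT_HZ)
# Gap between the audible limit and Nyquist at 44.1 kHz: the filter's transition band.
filter_margin_hz = nyquist_frequency(44100.0) - AUDIBLE_LIMIT_HZ
```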

3) Aliasing

If frequencies above Nyquist enter the converter, they fold into lower frequencies and become false components:

f_alias = |f_in - k * Fs|   (choose integer k that places result in [0, Fs/2])

Example: f_in = 26 kHz, Fs = 44.1 kHz gives an aliased component at |26 - 44.1| = 18.1 kHz.

This is why anti-alias filtering before sampling is mandatory.
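The folding rule can be written as a small function. This is a minimal sketch, assuming an ideal sampler with no filtering; `alias_frequency` is a name introduced here. It picks k as the multiple of Fs nearest to f_in, which places the result in [0, Fs/2].

```python
def alias_frequency(f_in_hz, fs_hz):
    """Return the apparent frequency of f_in after sampling at Fs."""
    k = round(f_in_hz / fs_hz)  # nearest integer multiple of Fs
    return abs(f_in_hz - k * fs_hz)

# The worked example from the text: 26 kHz sampled at 44.1 kHz.
folded = alias_frequency(26000.0, 44100.0)
```

Frequencies already below Fs/2 come back unchanged, since k is 0 for them.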

4) Bit Depth, Gain, and Clipping

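Bit depth sets amplitude resolution. For N bits, the theoretical dynamic range (full-scale sine relative to quantization noise) is approximately: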

DR ~ 6.02N + 1.76 dB

So 16-bit audio gives about 98 dB and 24-bit audio about 146 dB of theoretical dynamic range.
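The dynamic-range formula can be evaluated for any bit depth. The sketch below also reports the number of quantization levels (2^N) and the step size, assuming a full-scale range of [-1.0, 1.0]; the function names are illustrative.

```python
def dynamic_range_db(bits):
    """Theoretical dynamic range: DR ~ 6.02 * N + 1.76 dB."""
    return 6.02 * bits + 1.76

def quantization_levels(bits):
    """Number of distinct amplitude codes for N bits."""
    return 2 ** bits

def step_size(bits, full_scale=1.0):
    """Spacing between adjacent levels over [-full_scale, +full_scale]."""
    return 2.0 * full_scale / quantization_levels(bits)

for n_bits in (8, 16, 24):
    print(n_bits, round(dynamic_range_db(n_bits), 2), quantization_levels(n_bits))
```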

Gain staging keeps the signal within range. If a sample's magnitude exceeds the full-scale limit, the waveform tops are truncated:

|x[n]| > 1.0  -> clipping

Clipping introduces nonlinear distortion and high-frequency artifacts.
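Hard clipping at full scale can be sketched as follows. This assumes samples are floats normalized so that full scale is 1.0; `hard_clip` and `count_clipped` are names introduced for illustration.

```python
def hard_clip(samples, limit=1.0):
    """Truncate every sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]

def count_clipped(samples, limit=1.0):
    """Count samples whose magnitude exceeds the full-scale limit."""
    return sum(1 for s in samples if abs(s) > limit)

# Hypothetical input with two over-range samples.
signal = [0.5, 1.2, -1.7, -0.3]
clipped = hard_clip(signal)
```

Counting over-range samples before clipping is one simple way to check whether the gain is set too high.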

5) Echo and Delay

Echo is a delayed copy of a signal summed with the original:

y(t) = x(t) + a * x(t - tau)

Physics link for one reflection:

tau ~ distance / speed_of_sound

At c ~ 343 m/s, a 34.3 m round-trip path gives about 100 ms delay.

Short delays (below roughly 30 to 50 ms) blend with the direct sound as coloration; longer delays are heard as discrete echoes.
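In discrete time the echo equation becomes y[n] = x[n] + a * x[n - d], with d = tau * Fs rounded to whole samples. The sketch below is one possible implementation under that assumption; the function names are hypothetical, and the output is extended by d samples so the echo tail is not cut off.

```python
def delay_seconds(path_m, speed_of_sound=343.0):
    """tau ~ distance / speed_of_sound, with distance in metres."""
    return path_m / speed_of_sound

def add_echo(x, fs_hz, tau_s, a):
    """Return y[n] = x[n] + a * x[n - d], where d = round(tau * Fs)."""
    d = round(tau_s * fs_hz)  # delay in whole samples
    y = []
    for n in range(len(x) + d):
        direct = x[n] if n < len(x) else 0.0
        delayed = x[n - d] if n >= d else 0.0
        y.append(direct + a * delayed)
    return y

# An impulse makes the echo visible: the copy appears d samples later, scaled by a.
impulse = [1.0, 0.0, 0.0, 0.0]
y = add_echo(impulse, fs_hz=1000.0, tau_s=0.002, a=0.5)
```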

6) Room Acoustics Basics

A room is a 3D resonator with standing-wave modes (analogous to instrument-physics.md, but in 3 dimensions).

Two key ideas:

Sabine approximation:

RT60 = 0.161 * V / A

where V is room volume (m^3), A is total absorption in metric sabins (m^2), and RT60 is the time in seconds for the sound level to decay by 60 dB.

This links physical space to recorded/perceived tone and motivates digital reverb models.
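A small calculation makes the Sabine formula concrete. The sketch assumes metric units and computes A as the sum of each surface area times its absorption coefficient. The room dimensions and coefficients below are made-up illustrative values, not measurements.

```python
def total_absorption(surfaces):
    """Sum area (m^2) * absorption coefficient over all surfaces."""
    return sum(area * alpha for area, alpha in surfaces)

def rt60_sabine(volume_m3, absorption_sabins):
    """Sabine approximation: RT60 = 0.161 * V / A (metric units)."""
    if absorption_sabins <= 0:
        raise ValueError("absorption must be positive")
    return 0.161 * volume_m3 / absorption_sabins

# Hypothetical 5 m x 4 m x 3 m room (V = 60 m^3).
surfaces = [
    (20.0, 0.30),  # floor, assumed carpeted
    (20.0, 0.10),  # ceiling
    (54.0, 0.05),  # walls: 18 m perimeter * 3 m height
]
absorption = total_absorption(surfaces)
rt60 = rt60_sabine(60.0, absorption)
```

For these assumed values the result is a little under one second.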

Psychoacoustics Connection

Engineering targets are chosen relative to human perception:

So “accurate audio” is not only mathematical reconstruction; it is reconstruction sufficient for human hearing constraints.

Translation Table

PhizMusic                  | Western/Engineering           | Notes
---------------------------+-------------------------------+--------------------------------------------------
Discrete pressure samples  | PCM audio                     | Same concept
Sampling rate Fs           | Sample rate                   | Samples per second
Nyquist limit Fs/2         | Nyquist frequency             | Highest representable frequency without aliasing
Quantization levels 2^N    | Bit depth resolution          | Amplitude step count
Saturation clipping        | Digital clipping              | Full-scale overflow/truncation
Delayed replica mixing     | Echo/delay effect             | One of the simplest time-domain effects
RT60 = 0.161 * V / A       | Sabine reverberation formula  | First-order room decay model

Connections

Suggested References