Digital audio converts a continuous pressure-wave signal into discrete numerical samples. The conversion quality is constrained by two core design choices: how often you sample (sampling rate) and how finely you quantize amplitude (bit depth).
Simple version: To store sound in a computer, you take very fast snapshots of the waveform. More snapshots per second and finer volume steps make a more accurate recording. Too few snapshots cause false tones (aliasing).
The microphone produces a continuous voltage proportional to air pressure. An analog-to-digital converter (ADC) measures that voltage at discrete times:
x[n] = x(n * Ts)
where Ts = 1/Fs, and Fs is the sampling rate in samples/second.
This is sampling a continuous function on a time lattice.
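As a minimal sketch of the relation x[n] = x(n * Ts), the snippet below samples an ideal sine wave on the time lattice. The function name `sample_sine` and the 1 kHz / 8 kHz values are illustrative assumptions, not part of the original text.

```python
import math

def sample_sine(freq_hz, fs_hz, num_samples, amplitude=1.0):
    """Return x[n] = x(n * Ts) for x(t) = amplitude * sin(2*pi*freq_hz*t)."""
    ts = 1.0 / fs_hz  # sampling period Ts = 1/Fs
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n * ts)
        for n in range(num_samples)
    ]

# Hypothetical example: a 1 kHz tone sampled at 8 kHz gives 8 samples per cycle.
samples = sample_sine(freq_hz=1000.0, fs_hz=8000.0, num_samples=8)
```

With these values the third sample (n = 2) lands on the positive peak of the sine, a quarter of the way through the cycle.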
Nyquist-Shannon condition for band-limited reconstruction:
Fs >= 2 * Fmax
where Fmax is the highest frequency component you want to preserve.
Example:
Fs = 44,100 Hz
Fn = Fs/2 = 22,050 Hz
So 44.1 kHz is chosen to cover the audible range with practical anti-alias filter margins.
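The condition Fs >= 2 * Fmax can be checked with a couple of one-line helpers. This is only a sketch: the names `nyquist_frequency` and `satisfies_nyquist` are hypothetical, and the 20 kHz figure is an assumed upper bound for the audible range.

```python
def nyquist_frequency(fs_hz):
    """Nyquist frequency Fn = Fs / 2."""
    return fs_hz / 2.0

def satisfies_nyquist(fs_hz, fmax_hz):
    """True when Fs >= 2 * Fmax, the condition as stated in the text."""
    return fs_hz >= 2.0 * fmax_hz

fn_cd = nyquist_frequency(44100.0)                  # 22050.0 Hz
audible_ok = satisfies_nyquist(44100.0, 20000.0)    # assumed 20 kHz audible limit
ultrasonic_ok = satisfies_nyquist(44100.0, 26000.0) # 26 kHz exceeds Fs/2
```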
If frequencies above Nyquist enter the converter, they fold into lower frequencies and become false components:
f_alias = |f_in - k * Fs| (choose integer k that places result in [0, Fs/2])
Example: f_in = 26 kHz, Fs = 44.1 kHz gives an aliased component at |26 - 44.1| = 18.1 kHz.
This is why anti-alias filtering before sampling is mandatory.
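The folding rule above can be sketched as a small function that maps any non-negative input frequency into [0, Fs/2]. The name `alias_frequency` is an assumption made for illustration; the modulo-then-reflect approach is one way to pick the integer k from the formula.

```python
def alias_frequency(f_in_hz, fs_hz):
    """Fold a non-negative input frequency into the range [0, Fs/2]."""
    f = f_in_hz % fs_hz      # subtract whole multiples of Fs, leaving [0, Fs)
    if f > fs_hz / 2.0:
        f = fs_hz - f        # reflect the upper half back below Nyquist
    return f

# The 26 kHz example from the text, sampled at 44.1 kHz.
aliased = alias_frequency(26000.0, 44100.0)
```

Frequencies already below Fs/2 pass through unchanged, which matches the idea that only out-of-band content folds.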
Bit depth sets amplitude resolution:
N bits -> 2^N quantization levels
DR ~ 6.02N + 1.76 dB
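So, applying DR ~ 6.02N + 1.76 dB, 16-bit audio has about 98 dB of theoretical dynamic range and 24-bit audio about 146 dB.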
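To make the bit-depth arithmetic concrete, here is a minimal sketch that evaluates the two expressions above for common bit depths. The function names are hypothetical, and the dynamic-range figure is the theoretical value from the formula, not a measurement of any real converter.

```python
def quantization_levels(bits):
    """Number of amplitude steps available with N bits: 2^N."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Theoretical dynamic range from the text: DR ~ 6.02 * N + 1.76 dB."""
    return 6.02 * bits + 1.76

for bits in (8, 16, 24):
    print(f"{bits}-bit: {quantization_levels(bits)} levels, "
          f"{dynamic_range_db(bits):.2f} dB")
```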
Gain staging keeps the signal within range. If a sample's magnitude exceeds the full-scale limit, the waveform tops are truncated:
|x[n]| > 1.0 -> clipping
Clipping introduces nonlinear distortion and high-frequency artifacts.
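A minimal sketch of hard clipping, assuming samples are floats normalized so that full scale is 1.0 (as in the condition above). The helper names `hard_clip` and `count_clipped` are illustrative only.

```python
def hard_clip(samples, limit=1.0):
    """Truncate every sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]

def count_clipped(samples, limit=1.0):
    """Count samples whose magnitude exceeds full scale, i.e. |x[n]| > limit."""
    return sum(1 for s in samples if abs(s) > limit)

# Hypothetical input with two over-range samples.
raw = [0.5, 1.5, -2.0, -0.25]
clipped = hard_clip(raw)
num_over = count_clipped(raw)
```

Counting over-range samples before clipping is one simple way to detect that gain staging needs adjusting.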
Echo is a delayed copy of a signal summed with the original:
y(t) = x(t) + a * x(t - tau)
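In discrete time, the same relation becomes y[n] = x[n] + a * x[n - D], where D is the delay in samples. The sketch below is one straightforward way to apply it to a list of samples; the name `add_echo` and the choice to extend the output by D samples so the echo tail is not cut off are assumptions, not something the text prescribes.

```python
def add_echo(x, delay_samples, gain):
    """Return y[n] = x[n] + gain * x[n - delay_samples].

    delay_samples must be a non-negative integer. The output is
    delay_samples longer than the input so the delayed copy fits.
    """
    y = list(x) + [0.0] * delay_samples
    for n, sample in enumerate(x):
        y[n + delay_samples] += gain * sample
    return y

# A single impulse produces the original plus one attenuated copy.
echoed = add_echo([1.0, 0.0, 0.0], delay_samples=2, gain=0.5)
```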
Physics link for one reflection:
tau ~ distance / speed_of_sound
At c ~ 343 m/s, a 34.3 m round-trip path gives about 100 ms delay.
Short delays blend as coloration; longer delays are heard as discrete echoes.
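The delay estimate above can be turned into a sample count for a digital delay line. This is a rough sketch: the constant and function names are hypothetical, and 343 m/s is the approximate speed of sound used in the text.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value used in the text

def reflection_delay_s(path_m, c=SPEED_OF_SOUND_M_S):
    """Delay tau ~ distance / speed_of_sound, in seconds."""
    return path_m / c

def delay_in_samples(delay_s, fs_hz):
    """Nearest whole number of samples for a delay at sampling rate Fs."""
    return round(delay_s * fs_hz)

tau = reflection_delay_s(34.3)           # 34.3 m round trip, about 0.1 s
delay_n = delay_in_samples(0.1, 44100)   # delay length at 44.1 kHz
```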
A room is a 3D resonator with standing-wave modes (analogous to instrument-physics.md, but in 3 dimensions).
Two key ideas:
Sabine approximation:
RT60 = 0.161 * V / A
where V is room volume (m^3) and A is total absorption in metric sabins (m^2), i.e. the sum over all surfaces of area times absorption coefficient.
This links physical space to recorded/perceived tone and motivates digital reverb models.
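The Sabine approximation is easy to evaluate numerically. The sketch below assumes metric units and treats total absorption as the sum of surface area times absorption coefficient; the function names, the room dimensions, and the coefficients are hypothetical values chosen only for illustration.

```python
def total_absorption(surfaces):
    """Sum of area (m^2) * absorption coefficient over all surfaces."""
    return sum(area_m2 * alpha for area_m2, alpha in surfaces)

def rt60_sabine(volume_m3, absorption):
    """Sabine approximation: RT60 = 0.161 * V / A, in seconds."""
    return 0.161 * volume_m3 / absorption

# Hypothetical 5 m x 4 m x 3 m room with made-up absorption coefficients.
surfaces = [
    (20.0, 0.30),  # floor
    (20.0, 0.10),  # ceiling
    (54.0, 0.05),  # four walls combined
]
a_total = total_absorption(surfaces)
rt60 = rt60_sabine(5.0 * 4.0 * 3.0, a_total)
print(f"A = {a_total:.2f} m^2, RT60 = {rt60:.2f} s")
```

Adding absorption (a larger A) shortens RT60, which is the first-order behavior the formula is meant to capture.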
Engineering targets are chosen relative to human perception:
So "accurate audio" is not only mathematical reconstruction; it is reconstruction sufficient for human hearing constraints.
| PhizMusic | Western/Engineering | Notes |
|---|---|---|
| Discrete pressure samples | PCM audio | Same concept |
| Sampling rate Fs | Sample rate | Samples per second |
| Nyquist limit Fs/2 | Nyquist frequency | Highest representable frequency without aliasing |
| Quantization levels 2^N | Bit depth resolution | Amplitude step count |
| Saturation clipping | Digital clipping | Full-scale overflow/truncation |
| Delayed replica mixing | Echo/delay effect | One of the simplest time-domain effects |
| RT60 = 0.161 * V / A | Sabine reverberation formula | First-order room decay model |