The ear is a biological transducer that converts pressure waves in air into electrochemical neural signals. Its most remarkable component, the cochlea, is a biological frequency analyzer that physically decomposes complex sounds into their component frequencies, performing in tissue what the Fourier transform performs in mathematics.
🎯 Simple version: Your inner ear is shaped like a snail shell. Different spots along it vibrate for different pitches: high pitches at the entrance, low pitches deep inside. It's like a piano keyboard made of flesh.
Sound processing passes through three stages in the ear:
The pinna (visible ear) and ear canal collect pressure waves and channel them to the eardrum (tympanic membrane). The ear canal acts as a resonant tube that amplifies frequencies around 2-4 kHz, the frequency range most critical for speech intelligibility. This is a physical resonance, not a choice: the canal is roughly 2.5 cm long, and a quarter-wavelength resonance at that length peaks near 3.4 kHz.
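The quarter-wavelength figure can be checked directly. A minimal sketch, assuming a speed of sound of 343 m/s and treating the canal as a tube open at the pinna and closed at the eardrum:

```python
# A closed-open tube resonates when its length holds one quarter
# of the wavelength: f = v / (4 * L).
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C (assumed)
CANAL_LENGTH = 0.025    # m, the ~2.5 cm quoted above

resonant_freq = SPEED_OF_SOUND / (4 * CANAL_LENGTH)
print(f"Ear canal resonance: ~{resonant_freq:.0f} Hz")  # ~3430 Hz
```

The result lands squarely in the 2-4 kHz band where speech carries most of its intelligibility cues.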
The eardrum vibrates in response to pressure waves. Three tiny bones (the ossicles: malleus, incus, stapes) mechanically link the eardrum to the oval window of the cochlea. Their purpose: impedance matching. The cochlea is filled with fluid, which has much higher acoustic impedance than air. Without the middle ear's lever action and area-ratio amplification, approximately 99.9% of sound energy would reflect off the fluid boundary. The ossicle chain provides roughly 25-30 dB of gain, making efficient air-to-fluid energy transfer possible.
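The 99.9% figure follows from the standard plane-wave transmission formula at an impedance step. A sketch using textbook impedance values for air and water (water serving as a stand-in for cochlear fluid; the exact numbers are assumptions):

```python
import math

# Power transmission across a boundary at normal incidence:
#   T = 4 * Z1 * Z2 / (Z1 + Z2)^2
Z_AIR = 415.0      # rayl (Pa*s/m), air at ~20 C (assumed)
Z_WATER = 1.48e6   # rayl, water (assumed stand-in for cochlear fluid)

transmitted = 4 * Z_AIR * Z_WATER / (Z_AIR + Z_WATER) ** 2
reflected = 1 - transmitted
db_mismatch = -10 * math.log10(transmitted)  # gain needed to compensate

print(f"transmitted: {transmitted:.2%}")   # ~0.11%
print(f"reflected:   {reflected:.2%}")     # ~99.89%
print(f"mismatch:    {db_mismatch:.1f} dB")
```

The ~29.5 dB mismatch sits right in the 25-30 dB range of gain credited to the middle ear above.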
This is where the transformation that matters for music happens.
The cochlea is a fluid-filled, coiled tube roughly 35 mm long when unrolled. Running along its length is the basilar membrane, a ribbon of tissue that varies in width and stiffness from one end to the other:
Cochlea (unrolled):

```
Base (oval window)                                  Apex
┌──────────────────────────────────────────────────────┐
│ narrow, stiff                         wide, flexible │
│ → high frequencies                → low frequencies  │
│                                                      │
│ 20,000 Hz ──────────────────────────────────── 20 Hz │
└──────────────────────────────────────────────────────┘
                  Basilar Membrane
    Position maps to frequency (tonotopic organization)
```
When a pressure wave enters the cochlea at the oval window, it creates a traveling wave along the basilar membrane. This wave grows in amplitude as it reaches the region whose local resonant frequency matches the input frequency, then dies off sharply beyond that point.
Each frequency excites a specific location. This is tonotopic organization: frequency maps to position, just as a piano's keys map to pitch. The mapping is logarithmic: equal distances along the basilar membrane correspond to equal frequency ratios, not equal Hz differences. This logarithmic mapping is the biological basis for why we hear pitch in ratios (see sound-waves.md, "Logarithmic Perception").
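One widely used quantitative model of this place-frequency map is Greenwood's function. The sketch below uses commonly quoted constants for the human cochlea (A = 165.4, a = 2.1, k = 0.88); treat them as approximate:

```python
def greenwood_freq(x: float) -> float:
    """Approximate human place-frequency map (Greenwood's function).

    x is the fractional distance along the basilar membrane, measured
    from the apex (x = 0.0) to the base at the oval window (x = 1.0).
    """
    return 165.4 * (10 ** (2.1 * x) - 0.88)

print(f"apex (x=0.0): {greenwood_freq(0.0):8.1f} Hz")  # ~20 Hz
print(f"mid  (x=0.5): {greenwood_freq(0.5):8.1f} Hz")
print(f"base (x=1.0): {greenwood_freq(1.0):8.1f} Hz")  # ~20,700 Hz

# Logarithmic mapping in action: away from the apex, moving a fixed
# distance (about 1/7 of the membrane) multiplies frequency by ~2,
# so every octave occupies roughly the same physical length.
```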
When a complex sound arrives, one containing many frequencies simultaneously, each frequency component excites its own region of the basilar membrane. The membrane physically separates the mixture into its constituent frequencies, just as a prism separates white light into colors. This is a biological Fourier transform performed by mechanical resonance, not computation.
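The same decomposition can be done numerically with an FFT. A minimal sketch (NumPy assumed): mix two sine tones, then recover them from the spectrum:

```python
import numpy as np

fs = 8000                      # sample rate (Hz); 1 second gives 1 Hz bins
t = np.arange(fs) / fs
# A "complex sound": 220 Hz plus its third harmonic at half amplitude.
mixture = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)

# The FFT does numerically what the basilar membrane does mechanically:
# it reports how strongly each frequency is present in the mixture.
spectrum = np.abs(np.fft.rfft(mixture)) / (fs / 2)  # normalize to amplitude
freqs = np.fft.rfftfreq(fs, d=1 / fs)

components = freqs[spectrum > 0.1]
print(components)  # [220. 660.]
```

The recovered amplitudes (1.0 and 0.5) match the mixture exactly, because both tones fall on exact FFT bins.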
Sitting on the basilar membrane are approximately 3,500 inner hair cells arranged in a single row. When the membrane vibrates at a given location, the hair cells there bend, opening ion channels and generating electrical signals that propagate via the auditory nerve to the brain.
The pattern of which hair cells fire, and how rapidly, encodes which frequencies are present, how intense they are, and when they occur.
Phase locking is significant: below roughly 4 kHz, the brain receives both where on the membrane the vibration occurs (place code) and when the vibrations occur relative to the wave cycle (temporal code). This dual encoding gives pitch perception its high precision in the musically important frequency range.
The cochlea is not a passive filter. A second set of cells, roughly 12,000 outer hair cells arranged in three rows, acts as a cochlear amplifier. These cells are motile: they physically change length in response to basilar membrane vibration, pumping energy back into the traveling wave.
This active amplification:
- boosts sensitivity to quiet sounds by roughly 40-50 dB
- sharpens frequency tuning far beyond what passive membrane mechanics alone would provide
- responds nonlinearly, amplifying quiet sounds more than loud ones and thereby compressing the ear's enormous dynamic range
The nonlinearity of the cochlear amplifier has a musically important side effect: it generates combination tones. When two frequencies f₁ and f₂ (with f₂ > f₁) enter the cochlea simultaneously, the nonlinear outer hair cells produce distortion products at frequencies like 2f₁ - f₂ and f₂ - f₁. These are physically real vibrations on the basilar membrane: your ear literally creates frequencies that were not in the original sound. The most audible is the difference tone (f₂ - f₁), sometimes called a Tartini tone after the violinist who first documented it in 1714. Combination tones play a role in consonance perception (see consonance-dissonance.md).
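Combination tones are easy to reproduce numerically by pushing two tones through a memoryless nonlinearity, a crude stand-in for the outer hair cells. The power-series coefficients below are arbitrary illustrative choices, not physiological values:

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs                 # 1 second -> 1 Hz FFT bins
f1, f2 = 1000.0, 1200.0
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# Memoryless distortion: the quadratic term creates f2 - f1,
# the cubic term creates 2*f1 - f2 (among other products).
y = x + 0.1 * x**2 + 0.1 * x**3

spectrum = np.abs(np.fft.rfft(y))
# With 1 Hz bins, the bin index equals the frequency in Hz.
print(spectrum[200] > 1000)   # f2 - f1  = 200 Hz difference tone: True
print(spectrum[800] > 1000)   # 2f1 - f2 = 800 Hz distortion product: True
```

For a real musical pair, e.g. C5 (~523 Hz) and E5 (~659 Hz), the difference tone lands near 136 Hz, roughly a C#3.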
Critical bandwidth is the frequency range around a given center frequency within which two simultaneous tones interact, producing beating, roughness, or perceptual fusion, rather than being heard as two distinct pitches.
Critical bandwidth is a direct consequence of the basilar membraneβs physical properties. Each point on the membrane responds not to a single exact frequency but to a range of nearby frequencies. Two tones whose excitation patterns overlap on the membrane fall within the same critical band.
The critical bandwidth varies with frequency:
| Center frequency | Approximate critical bandwidth | As percentage |
|---|---|---|
| 100 Hz | ~90 Hz | ~90% |
| 500 Hz | ~100 Hz | ~20% |
| 1,000 Hz | ~130 Hz | ~13% |
| 4,000 Hz | ~500 Hz | ~12.5% |
At musically relevant frequencies (200 Hz to 4 kHz), critical bandwidth is roughly 10-20% of center frequency, or approximately one-third of an octave (~3-4 chromatic steps).
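Two standard analytic approximations of these bandwidths are worth knowing, and the rough values in the table above fall between them: Zwicker's classical critical-band formula and Glasberg & Moore's narrower equivalent rectangular bandwidth (ERB). A sketch with the formulas as commonly quoted (treat the constants as approximate):

```python
def bark_cb(f_hz: float) -> float:
    """Zwicker's classical critical-bandwidth approximation, in Hz."""
    return 25 + 75 * (1 + 1.4 * (f_hz / 1000) ** 2) ** 0.69

def erb(f_hz: float) -> float:
    """Glasberg & Moore's equivalent rectangular bandwidth, in Hz."""
    return 24.7 * (4.37 * f_hz / 1000 + 1)

for f in (100, 500, 1000, 4000):
    print(f"{f:>5} Hz: classical CB ~{bark_cb(f):4.0f} Hz, ERB ~{erb(f):4.0f} Hz")
```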
Critical bandwidth matters enormously for music: two tones closer than about a critical band produce beating and roughness rather than two clean pitches, which is central to consonance and dissonance perception (see consonance-dissonance.md) and explains why close intervals such as thirds sound muddy in low registers, where the interval spans less than a critical band.
```
Sound wave (air pressure)
        ↓
Outer ear: collect + resonant amplify (~2-4 kHz boost)
        ↓
Middle ear: impedance match air → fluid (~25-30 dB gain)
        ↓
Cochlea: frequency decomposition via basilar membrane
   ↓ (place coding)              ↓ (temporal coding)
Inner hair cells fire at         Hair cells phase-lock to
specific membrane locations      waveform cycles (<4 kHz)
        ↓
Auditory nerve → brain
```
The cochlea's output is not a single signal; it is an array of ~3,500 channels, each reporting activity at a different frequency. The brain receives a real-time spectrogram, decomposed by physics.
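This channel-array view can be sketched as a toy filterbank: log-spaced center frequencies (equal ratios per step, like the tonotopic map), with each channel sized to a fixed number of cycles. Everything here (channel count, spacing, window length) is an illustrative assumption, not a cochlear model:

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
sound = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 1760 * t)

# 85 channels in semitone steps from 55 Hz: equal steps = equal ratios,
# mimicking the basilar membrane's logarithmic frequency-to-place map.
centers = 55 * 2 ** np.linspace(0, 7, 85)

def channel_energy(f):
    """Correlate the sound with a ~30-cycle windowed sinusoid at f.

    A crude stand-in for one basilar-membrane region: constant-Q, so
    low channels integrate longer and high channels respond faster.
    """
    n = int(30 * fs / f)
    w = np.hanning(n)
    probe = np.exp(-2j * np.pi * f * np.arange(n) / fs)
    return abs(np.sum(sound[:n] * w * probe)) / np.sum(w)

energies = np.array([channel_energy(f) for f in centers])
# A pure tone at a channel's center yields energy ~0.5; off-channel
# leakage is far smaller, so a simple threshold finds the components.
active = centers[energies > 0.25]
print(np.round(active))  # the two input tones, ~440 Hz and ~1760 Hz
```

Only the two channels whose centers match the input tones light up: a miniature version of the ~3,500-channel spectrogram the auditory nerve carries.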
| PhizMusic | Western | Notes |
|---|---|---|
| Critical bandwidth | (none) | No standard Western theory term; used in psychoacoustics and audio engineering |
| Tonotopic mapping | (none) | No Western music theory equivalent; from auditory neuroscience |
| Combination tone / Tartini tone | Difference tone | Western tradition names it after Tartini (1714) |
| Cochlear amplifier | (none) | Biology term, not used in music theory |