A Shepard tone is an auditory illusion of endlessly ascending (or descending) pitch. Like a barber pole that appears to climb forever, a Shepard scale cycles through pitch steps while never actually getting higher or lower in any absolute sense. The illusion works because each “tone” is actually a stack of frequency components spread across many octaves, shaped by a fixed amplitude envelope that masks the recycling of components at the edges of the audible range.
🎯 Simple version: Imagine a spiral staircase where you keep climbing but never reach a higher floor. A Shepard tone does this with sound — it sounds like it’s always going up (or down), but it never actually gets higher or lower. The trick: it’s not one pitch, it’s many pitches spread across octaves, fading in at the bottom and out at the top so you never notice the recycling.
Each “tone” in a Shepard scale is not a single frequency. It is a set of frequency components spaced exactly one octave apart — for example, 62.5 Hz, 125 Hz, 250 Hz, 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz, and 8000 Hz. That is eight components, each double the frequency of the one below.
The critical ingredient is a fixed amplitude envelope: a bell curve (Gaussian) drawn over log frequency, so its width is measured in octaves, centered on a constant frequency (typically around 500 Hz). Components near the center of the envelope are loud; components far from the center are quiet. The envelope does not move. Only the individual frequency components move.
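When the scale ascends by one chromatic step, every component is multiplied by 2^(1/12) ≈ 1.0595 and rises by a semitone. Components below the envelope’s center move toward it and grow louder, components above the center move further into its quiet tail and fade, and the topmost component, already nearly silent, eventually crosses the upper boundary and reappears one octave below the lowest component.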
Because the amplitude envelope stays fixed, the overall brightness (spectral centroid) of the sound stays essentially constant. Your ear detects that individual components are moving up, but the macro-level spectral shape is frozen. The result: perpetual ascent with no destination. A sonic barber pole.
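To check the brightness claim numerically, the sketch below steps through one 12-step cycle and computes the spectral centroid (the amplitude-weighted mean frequency) of the component set at each step. It assumes the example values used elsewhere in this article (8 components starting at 62.5 Hz, f_center = 500 Hz, sigma = 2 octaves, wrapping above 8000 Hz); the helper names are hypothetical.

```python
import math

# Assumed example parameters, matching the numbers used in this article
F_CENTER = 500.0   # fixed envelope center (Hz)
SIGMA = 2.0        # envelope width (octaves)
N = 8              # number of octave-spaced components
F_UPPER = 8000.0   # components above this are divided by 2**N

def envelope(freq):
    """Fixed Gaussian weight on a log2-frequency axis."""
    d = (math.log2(freq) - math.log2(F_CENTER)) / SIGMA
    return math.exp(-d * d)

def components_at_step(step, f0=62.5):
    """Components after `step` semitone steps up from base f0, wrapped at F_UPPER."""
    base = f0 * 2 ** ((step % 12) / 12)
    freqs = [base * 2 ** k for k in range(N)]
    return [f / 2 ** N if f > F_UPPER else f for f in freqs]

def spectral_centroid(freqs):
    """Amplitude-weighted mean frequency (Hz) under the fixed envelope."""
    weights = [envelope(f) for f in freqs]
    return sum(f * w for f, w in zip(freqs, weights)) / sum(weights)

centroids = [spectral_centroid(components_at_step(s)) for s in range(12)]
for s, c in enumerate(centroids):
    print(f"step {s:2d}: centroid {c:6.1f} Hz")
print(f"max/min over one cycle: {max(centroids) / min(centroids):.3f}")
```

With these parameters the centroid stays within a narrow band (a few percent) across the whole cycle instead of drifting upward, whereas a single rising sine would double its frequency over the same 12 steps.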
A Shepard tone at base frequency f consists of N frequency components at octave intervals:
Component frequencies: f, 2f, 4f, 8f, 16f, ... , 2^(N-1) × f
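A one-line helper makes the component list concrete. This is a minimal sketch with a hypothetical function name; with f = 62.5 Hz and N = 8 it reproduces the eight-component example given earlier.

```python
def shepard_components(f, n):
    """Return the n octave-spaced components f, 2f, 4f, ..., 2**(n-1) * f (Hz)."""
    return [f * 2 ** k for k in range(n)]

print(shepard_components(62.5, 8))
# [62.5, 125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
```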
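Each component’s amplitude is set by a Gaussian envelope on a log-frequency (octave) axis, centered on a fixed reference frequency f_center (typically ~500 Hz), with a width of sigma octaves (typically ~2 octaves). In the form below, sigma is the octave distance at which the amplitude falls to 1/e ≈ 0.37; the corresponding standard deviation is sigma/√2: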
amplitude(freq) = exp( -( (log2(freq) - log2(f_center)) / sigma )^2 )
This means a component at exactly f_center has amplitude 1.0, and components further away (in octave distance) fall off as a Gaussian bell curve.
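The envelope formula translates directly into a small function. This sketch assumes the typical values quoted above (f_center = 500 Hz, sigma = 2) as defaults and uses sigma exactly as it appears in the formula; the function name is hypothetical. Printing the weights for the eight example components shows the symmetric fall-off by octave distance.

```python
import math

def amplitude(freq, f_center=500.0, sigma=2.0):
    """Envelope weight: exp(-((log2(freq) - log2(f_center)) / sigma) ** 2)."""
    d = (math.log2(freq) - math.log2(f_center)) / sigma
    return math.exp(-d ** 2)   # -d ** 2 parses as -(d ** 2)

for f in [62.5, 125, 250, 500, 1000, 2000, 4000, 8000]:
    print(f"{f:7.1f} Hz -> {amplitude(f):.3f}")
# Weights in order: 0.105, 0.368, 0.779, 1.000, 0.779, 0.368, 0.105, 0.018
```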
To ascend by one chromatic step, multiply the base frequency by 2^(1/12):
f_new = f × 2^(1/12) ≈ f × 1.05946
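All N components shift accordingly. After 12 steps, f has doubled, so every component has risen exactly one octave and now sits where its upper neighbor started. The old top component, wrapped back into range by the rule below, lands on the old base frequency f, so the set of frequencies matches the starting set; the amplitude envelope has not moved either, so the spectrum is identical to where it started, not merely octave-equivalent. One full “revolution” of the barber pole.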
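Tracking the revolution with repeated floating-point multiplication can misfire right at the boundary: a component meant to land exactly on f_upper may come out a hair above it and wrap one step early. The sketch below sidesteps that by storing each component as an integer number of semitones above f_lower. It assumes N = 8 and a wrap range of 31.25 Hz to 8000 Hz (exactly N octaves); the helper names are hypothetical.

```python
# Hypothetical sketch: each component is an integer count of semitones above
# F_LOWER, so 12 steps of 2**(1/12) are exact and the wrap test never misfires.
N = 8                    # number of components (assumed example value)
F_LOWER = 31.25          # assumed lower boundary; F_UPPER = F_LOWER * 2**N = 8000 Hz
TOP = 12 * N             # F_UPPER expressed in semitones above F_LOWER

def to_hz(p):
    """Convert a semitone position above F_LOWER to a frequency in Hz."""
    return F_LOWER * 2 ** (p / 12)

def step_up(positions):
    """Raise every component one semitone; anything past F_UPPER drops N octaves."""
    return [p + 1 - TOP if p + 1 > TOP else p + 1 for p in positions]

start = [12 * k for k in range(1, N + 1)]    # 62.5, 125, ..., 8000 Hz
positions = start
for step in range(1, 13):
    positions = step_up(positions)
    print(step, [round(to_hz(p), 1) for p in sorted(positions)])

print(sorted(positions) == sorted(start))    # True: one full revolution
```

After 12 calls to step_up the sorted positions equal the starting ones, so the frequencies, and therefore the envelope weights, are exactly where they began.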
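The frequency wrapping boundaries keep components in the audible range. Here num_octaves is N, the number of components, and the two boundaries should be about N octaves apart (for N = 8, 8000 Hz / 2^8 = 31.25 Hz) so that a wrapped component lands back inside the range. Place the boundaries where the envelope has already faded components to near silence: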
If freq > f_upper (e.g. 8000 Hz): wrap to freq / 2^(num_octaves)
If freq < f_lower (e.g. 30 Hz): wrap to freq × 2^(num_octaves)
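Putting the pieces together, the sketch below renders a short ascending Shepard scale using only Python’s standard library (math, struct, wave). It assumes the example parameters from this article (N = 8, f_center = 500 Hz, sigma = 2 octaves, a 31.25 to 8000 Hz wrap range) plus a 44.1 kHz sample rate and 0.25 s per step; wrap, amplitude, render_scale, and write_wav are hypothetical helper names, and wrap applies the two rules above literally.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100                  # assumed output rate (Hz)
F_CENTER, SIGMA, N = 500.0, 2.0, 8   # assumed example values from this article
F_LOWER = 31.25
F_UPPER = F_LOWER * 2 ** N           # 8000 Hz: the range spans exactly N octaves

def wrap(freq):
    """Apply the two wrapping rules: fold freq back into [F_LOWER, F_UPPER]."""
    if freq <= 0:
        raise ValueError("frequency must be positive")
    while freq > F_UPPER:
        freq /= 2 ** N
    while freq < F_LOWER:
        freq *= 2 ** N
    return freq

def amplitude(freq):
    """Fixed Gaussian envelope on a log2-frequency axis."""
    d = (math.log2(freq) - math.log2(F_CENTER)) / SIGMA
    return math.exp(-d * d)

def render_scale(f0=62.5, steps=12, tone_sec=0.25):
    """Float samples in [-1, 1] for `steps` ascending semitone steps."""
    n_tone = int(SAMPLE_RATE * tone_sec)
    samples = []
    for step in range(steps):
        freqs = [wrap(f0 * 2 ** (step / 12) * 2 ** k) for k in range(N)]
        amps = [amplitude(f) for f in freqs]
        norm = sum(amps)   # dividing by the weight sum keeps |sample| <= 1
        for i in range(n_tone):
            t = i / SAMPLE_RATE
            s = sum(a * math.sin(2 * math.pi * f * t) for f, a in zip(freqs, amps))
            samples.append(s / norm)
    return samples

def write_wav(path, samples, gain=0.8):
    """Write mono 16-bit PCM; `path` may be a filename or a binary file object."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"".join(struct.pack("<h", int(32767 * gain * s)) for s in samples))
```

For example, write_wav("shepard_scale.wav", render_scale(steps=36)) would write nine seconds of audio (36 steps × 0.25 s). Played on a loop, step 35 leads into step 0 by one more semitone, so the climb never resolves. Each step restarts its sines at phase zero, which can cause faint clicks at step boundaries; a short fade per step or phase-continuous oscillators would avoid them.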
The illusion exploits a conflict between two pitch cues the brain uses:
1. Local pitch movement (individual components). Each frequency component is clearly moving up (or down) by a semitone per step. The cochlea detects this unambiguously — the excitation pattern on the basilar membrane shifts.
2. Global spectral shape (overall brightness). The spectral centroid — the “center of mass” of the frequency spectrum — stays fixed because of the stationary Gaussian envelope. The brain also tracks spectral centroid as a cue for “how high” a sound is.
When these cues conflict — individual components rising but overall brightness unchanged — the brain resolves the ambiguity by prioritizing the local movement cue. You hear “going up” because the components you can hear clearly (near the center of the envelope) are all moving up. The recycling at the faint edges is below perceptual threshold.
This is closely related to harmonic template matching. The auditory system groups octave-spaced components and assigns a single pitch percept. Because all grouped components move in the same direction, the percept moves too — endlessly.
The illusion also depends on the ear’s uneven sensitivity across frequency (equal-loudness contours): the Gaussian envelope must be calibrated so that components at the fade-in and fade-out extremes are truly below the threshold of hearing at those particular frequencies.
| PhizMusic | Western | Other Systems |
|---|---|---|
| Shepard tone | Shepard tone, Shepard scale | Same term used universally |
| Frequency components (octave-spaced) | Octave-spaced partials | — |
| Gaussian amplitude envelope | Spectral bell curve, amplitude weighting | — |
| Pitch circularity | Pitch paradox, auditory barber pole | Tritone paradox (Diana Deutsch) — related |
| Chromatic step (×2^(1/12)) | Semitone, half step | 100 cents |