Why music sounds right – the hidden tones in our own speech

By Ed Yong
March 14, 2009
6 min read

Have you ever looked at a piano keyboard and wondered why the notes of an octave were divided up into seven white keys and five black ones? After all, the sounds that lie between one C and another form a continuous range of frequencies. And yet, throughout history and across different cultures, we have consistently divided them into this set of twelve semitones.

Now, Deborah Ross and colleagues from Duke University have found the answer. These musical intervals actually reflect the sounds of our own speech, and are hidden in the vowels we use. Musical scales just sound right because they match the frequency ratios that our brains are primed to detect.

When you talk, your larynx produces sound waves which resonate through your throat. The rest of your vocal tract – your lips, tongue, mouth and more – acts as a living, flexible organ pipe that shifts in shape to change the characteristics of these waves.

What eventually escapes from our mouths is a combination of sound waves at different frequencies, some louder than others. The loudest frequencies are called formants, and different vowels have different ‘formant signatures’. Our brains use these to distinguish between different vowel sounds.

The first two formants, the ones with the lowest frequencies, are the most important. The brain pays particularly close attention to these and uses them to identify vowels. If they are artificially removed from a recording, the speaker becomes impossible to understand. On the other hand, getting rid of the higher formants does no such thing.

(This spectrogram shows the different frequencies that make up three different vowels. Frequency goes up the vertical axis. The darker the image, the louder that particular frequency is. For each vowel, the first two formants (the lowest dark bands) are marked.)

Despite the wide variety of sounds in different languages, and the even greater variety in people’s voices, the formants of their vowels fall into narrow and defined ranges of frequencies. The first one always has a frequency of 200-1,000 Hz, while the second always lies between 800 and 3,000 Hz.

Ross analysed the formants of English vowels by asking 10 English speakers to read out thousands of different words and some longer monologues. Amazingly, she found that the ratio of the first two formants in English vowels tends to fall within one of the intervals of the chromatic scale.

When people say the ‘o’ sound in rod, the ratio between the first two formants corresponds to a major sixth – the interval between C and A. When they say the ‘oo’ sound in booed, the ratio matches a major third – the gap between C and E. Ross found that two in every three vowel sounds contain a hidden musical interval.
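To make the idea of a ratio "corresponding to" an interval concrete, here is a minimal sketch in Python. It is not the procedure used in the study: it simply converts a frequency ratio to equal-tempered semitones and folds it into a single octave. The function name `nearest_interval` and the formant values in the usage lines are hypothetical, chosen only for illustration.

```python
import math

# Names of the twelve chromatic intervals, indexed by semitones above the lower note.
INTERVAL_NAMES = [
    "unison/octave", "minor second", "major second", "minor third",
    "major third", "perfect fourth", "tritone", "perfect fifth",
    "minor sixth", "major sixth", "minor seventh", "major seventh",
]


def nearest_interval(f1_hz, f2_hz):
    """Return the chromatic interval closest to the ratio f2/f1.

    Assumption: equal temperament, where one semitone is a ratio of 2**(1/12).
    Ratios wider than an octave are folded back into a single octave.
    Returns (interval name, deviation in semitones from that interval).
    """
    semitones = 12 * math.log2(f2_hz / f1_hz)
    nearest = round(semitones)
    return INTERVAL_NAMES[nearest % 12], semitones - nearest


# Illustrative (made-up) formant pairs, not measurements from the study.
print(nearest_interval(300, 500))  # ratio 5:3, close to a major sixth
print(nearest_interval(400, 500))  # ratio 5:4, close to a major third
```

The second value returned shows how far the ratio sits from the nearest interval, which gives a rough sense of how good the match is.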

Her results didn’t just apply to English either. Ross repeated her experiments with people who spoke Mandarin, a vastly different language where speakers use four different tones to change the meaning of each word.

Even so, Ross still found musical intervals within the formant ratios of Mandarin vowels. The distribution of the ratios was even similar – in both languages, an octave gap was most common, while a minor sixth was fairly uncommon.

Ross believes that these hidden intervals could explain many musical curiosities. For example, the musical preferences of a certain culture could reflect the formants most commonly used in its language.

Hardly any music uses the full complement of 12 semitones, and European music usually limits itself to just 7 – the so-called ‘diatonic scale’ represented by a piano’s white keys. Music from other parts of the world tends to use the ‘pentatonic scale’ where the octave is split into just 5 tones.
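One common way to write these scales down is as semitone offsets from a starting note. The sketch below assumes the major diatonic scale and the major pentatonic scale starting on C; the article does not say which pentatonic scale is meant, so that choice, along with the names `DIATONIC_MAJOR`, `MAJOR_PENTATONIC` and `scale_notes`, is an assumption made for illustration.

```python
# The twelve chromatic notes of one octave, starting on C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the starting note (assumed: major scales on C).
DIATONIC_MAJOR = [0, 2, 4, 5, 7, 9, 11]   # 7 of the 12 semitones
MAJOR_PENTATONIC = [0, 2, 4, 7, 9]        # 5 of the 12 semitones


def scale_notes(offsets):
    """Translate semitone offsets into note names, counting up from C."""
    return [NOTE_NAMES[step % 12] for step in offsets]


print(scale_notes(DIATONIC_MAJOR))    # the piano's white keys
print(scale_notes(MAJOR_PENTATONIC))
```

Written this way, it is easy to see that this particular pentatonic scale is a subset of the diatonic one, and that both skip most of the chromatic steps.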

Ross found that 70% of the chromatic intervals in her data were included in the diatonic scale, and 80% were found in the pentatonic one. She reckons that these scales are so widely used because they reflect the most common formant combinations in our speech.

She now wants to see if the link between formants and intervals can explain why music in a major key instinctively sounds happier and more upbeat than music in a minor key.

Formants are common to the vast majority of languages and cultures, which explains why the twelve-semitone chromatic scale is so universal. Regardless of our cultural differences, it is heartening to realise that in some ways, we are all the same.

Reference: Ross, Choi and Purves. 2007. Musical intervals in speech. PNAS 104: 9852-9857.
