In this article, I talk about how the psychoacoustics of human hearing works and why it matters for music producers, sound designers, and mixing engineers.
Every mixing engineer should know how our hearing works to become more effective at creating convincing spaces. When we are mixing, we target a specific playback setting: a stereo configuration, a surround setup, or a binaural configuration. The binaural configuration benefits the most from an understanding of psychoacoustics. But what exactly is it? Let’s find out.
How Does Psychoacoustics Work?
Psychoacoustics uses studies of how organisms hear to develop models of their perception of sound. These models help people accomplish audio-related tasks. In music, you’ll find psychoacoustics being used for audio file compression, plugin development, monitoring technology, and mixing.
In this article, I will focus on the music side, of course. Hence, I have divided it into two parts: The Science and The Mix. If you aren’t familiar with psychoacoustics, read on. And if you’re already familiar with the terms and only wish to learn how to apply psychoacoustics in your mixes, skip to the second part. However, I recommend reading serially so that we’re on the same page.
In this section, I describe what psychoacoustics is, how human hearing works, and what factors affect our perception of sound. Understanding these concepts is essential to applying them in your mixes.
What Is Psychoacoustics?
Psychoacoustics is the branch of psychophysics (itself a part of psychology) that studies the brain’s perception of changes in sound and their physiological effects. The changes could include a delay between the ears or a change in frequency content. It studies how the brain localizes and associates sounds, something we all do subconsciously.
In other words, psychoacoustics studies how physics interacts with physiology to generate auditory perception. And while localization is the most exciting part, psychoacoustics also explores how the brain perceives loudness, intermodulation distortion (multiple frequencies mixing), masking, and other, more subjective responses. However, as musicians, our focus is on how psychoacoustics could be exploited to create better, more immersive mixes in music, games, and films.
How Do Humans Perceive Frequency?
Sound causes the eardrum to vibrate and transmit the motion to three bones called the malleus, incus, and stapes, in that sequence. The stapes passes the vibration on to the cochlea, where it travels along the basilar membrane. The base of the membrane, nearest the stapes, responds to higher frequencies, whereas the part farthest from the stapes responds to lower frequencies.
If you are after more detail, the stapes’ vibrations set the fluid inside the cochlea into motion as a wave that travels along the membrane. The moving fluid stimulates the hair cells on the membrane, which trigger the auditory nerve. This nerve transmits the signals to the brain, which processes the sound by determining which part of the cochlea was stimulated.
Here’s a video demonstrating this process:
What Frequencies Can Humans Hear?
Theoretically, humans can detect sound frequencies from 20 Hz to 20 kHz. Interestingly, human infants can hear frequencies slightly higher than 20 kHz. However, aging reduces high-frequency sensitivity, resulting in an upper limit closer to 15 to 17 kHz in most adults.
Next, several topics within psychoacoustics describe how our physiology affects our perception of sound. It could be about frequencies, loudness, or panning. So, in this section, I’ll describe some of these topics that I find particularly interesting to think about as a mixing engineer.
Sound localization is a biological process that gives the listener the ability to identify the location or origin of a sound. It lets one detect the direction and general distance of the sound source. The auditory system does so by using several cues like the time and the level difference between the ears.
Now, let’s have a look at the various cues that we humans use for sound localization:
The difference in the sound’s arrival time between our ears helps us detect the azimuth, or the left-right angle, of the sound’s origin. When a sound arrives at our left ear a fraction of a millisecond earlier than at the right ear (for a human-sized head, the difference stays well under one millisecond), we instinctively think the sound originated from the left side. The larger the difference, the farther to the side the sound seems.
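To put rough numbers on this arrival-time cue, here is a minimal sketch using Woodworth's spherical-head approximation, ITD = (r / c) × (θ + sin θ). The formula, the head radius, and the speed of sound are my assumptions for illustration; real heads are not spheres, so treat the results as ballpark figures.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate arrival-time difference between the ears, in seconds.

    Uses Woodworth's spherical-head formula for a distant source.
    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        itd_ms = interaural_time_difference(angle) * 1000.0
        print(f"{angle:>2} degrees -> {itd_ms:.3f} ms")
```

Under these assumptions, even a sound coming from directly beside you produces a difference of only about two-thirds of a millisecond, which shows how fine-grained this cue is.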
The loudness of the sound helps tell us the distance between the sound origin and ourselves. A louder sound means a closer origin, whereas a softer sound means the origin is far. Also, it helps when you are familiar with the loudness of the sound you are detecting. For instance, we know how loud a truck generally is. So, when we hear a truck behind us, we can judge the distance based on the loudness alone.
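As a rough illustration of how quickly level falls with distance, the sketch below applies the inverse-square law for an idealized point source in open space. That idealization is my assumption; rooms, obstacles, and large sources behave differently. The function name is just for this example.

```python
import math


def level_change_db(reference_distance_m: float, distance_m: float) -> float:
    """Level change in dB when moving from the reference distance to a new one.

    Assumes a point source in a free field (inverse-square law),
    so the sound pressure level drops by 20 * log10 of the distance ratio.
    """
    if reference_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    return -20.0 * math.log10(distance_m / reference_distance_m)


if __name__ == "__main__":
    for distance in (1, 2, 4, 8, 16):
        print(f"{distance:>2} m -> {level_change_db(1.0, distance):6.1f} dB")
```

In this simplified model, every doubling of the distance costs about 6 dB, which is a handy starting point when you automate volume to push a sound away from the listener.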
The loss of high frequencies signals distance. High frequencies, which are highly directional, are absorbed by the air more readily and don’t travel as far as low frequencies. So, when the sound origin is far from us, much of the high-frequency content has faded by the time the sound reaches us.
Reverb is created when a sound reflects off various surfaces like the walls and the ceiling of a hall. The ratio of the direct (dry) sound to the reverberated (wet) signal helps indicate the distance between the sound origin and ourselves. If we hear more of the reverb, we assume the sound is far from us, and vice versa.
The shape of our head and torso acts as a barrier to change the timbre, intensity, and spectral shape (frequency spectrum) of the sound. Furthermore, they also reflect the sound and change the frequency spectrum. These minute changes in the sound help us detect the elevation or the vertical angle (up-down, front-back) of the sound origin. However, since physiology differs from person to person, it is challenging to simulate convincing changes in the elevation compared to the azimuth.
The human body uses these cues together to determine the location and the movement of the sound origin. “Binaural audio” essentially means emulating various settings for each cue and creating stereo audio based on the results. There are many plugins that help you create binaural audio, like the free Sennheiser Ambeo Orbit. However, today, I will focus on the separate elements of binaural mixing. Let’s get started!
People have used psychoacoustics in many ways. The most straightforward application to understand is the creation of audio compression codecs like MP3 and OGG. They use concepts of psychoacoustics to remove bits of information from a raw audio file, reducing the file size while making the loss difficult for human hearing to detect. Another use is developing acoustic devices that emit intense sound to deter or disorient people. And alarm systems use psychoacoustics to ensure the sound generates the expected response from listeners. However, our interest is in using psychoacoustics to understand mixing, which brings us to the question:
Why Should I Use Psychoacoustics In My Mixes?
When you are making music or working on audio post-production, the general way to localize sound is by using panning. However, panning can only get you so far and can end up creating too much energy on one channel. Using psychoacoustics can keep the energy balanced but with distinct localization.
Similarly, mixing using psychoacoustics also creates much more immersive results. It is most useful for game audio, where players can immerse themselves in a convincing sonic world without an elaborate surround setup.
Psychoacoustics In Mixing
Now, let’s talk about the various ways psychoacoustics affects our mixing. We’ve talked a lot about localization, but there are several other concepts that help create better mixes. Let’s dive into them:
Masking is the phenomenon where a louder signal makes a weaker signal inaudible or unintelligible. It happens mainly when the louder signal’s frequencies are close to those of the softer one. For example, if you have a loud hi-hat playing, a whispered voice will not be audible. This issue is why creating frequency pockets is essential. If you want to prioritize a sound, use an EQ on the other instruments/sounds to reduce the frequencies that the prioritized sound primarily covers.
Furthermore, psychoacoustics also shows that a loud sound immediately preceding or following a weaker sound can mask it, even if the two aren’t playing together. It occurs because our ears reduce their “input levels” to protect themselves and because our physiology becomes desensitized to the weaker signal in the presence of a stronger one.
Horror film scores often use this behavior of our hearing to create strong impacts using sound. You never hear a jump-scare sound immediately after a loud noise. Instead, we reduce the volume of everything else before the jump-scare, albeit unpredictably.
In mixing, you can create a short delay (<40 ms) between the two channels of your audio to localize its azimuth (horizontal angle) without changing the pan. The sound seems to come from the side that plays first, a phenomenon known as the Haas effect (or precedence effect). However, be careful not to create phasing issues; it’s best to use the Haas effect on a short sound only. Sometimes, I also use both panning and a channel delay to make the localization apparent without going overboard in either method. Here’s a free plugin to help you achieve the Haas effect.
Another use of this effect is to widen the audio. I’d say the vocals benefit the most from the Haas effect in music. However, you can use it and even add automation to make sound effects like fire, water, ambiance, and other similar sounds enveloping and immersive if the scene demands it.
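If you want to see what the channel delay amounts to under the hood, here is a minimal sketch that turns a mono signal into a stereo pair with one side delayed. The function name, the 15 ms default, and the plain-list audio format are my own choices for illustration, not how any particular plugin works.

```python
def haas_delay(mono, sample_rate, delay_ms=15.0, delayed_channel="right"):
    """Return (left, right) sample lists where one channel is a delayed copy.

    The listener tends to hear the sound on the side that plays first,
    so delaying the right channel pulls the image to the left.
    """
    if not 0.0 <= delay_ms < 40.0:
        raise ValueError("keep the delay below 40 ms to avoid an audible echo")
    delay_samples = round(sample_rate * delay_ms / 1000.0)
    padding = [0.0] * delay_samples
    direct = list(mono) + padding   # padded at the end so both channels match in length
    delayed = padding + list(mono)  # silence first, then the same signal
    if delayed_channel == "right":
        return direct, delayed
    if delayed_channel == "left":
        return delayed, direct
    raise ValueError("delayed_channel must be 'left' or 'right'")


if __name__ == "__main__":
    left, right = haas_delay([1.0, 0.5, 0.25], sample_rate=1000, delay_ms=2.0)
    print(left)
    print(right)
```

Because both channels carry the same signal at the same level, the energy stays balanced. Summing them to mono is where the phasing issues show up, so it is worth checking the result in mono.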
I’ve already described how distance reduces high frequencies. And you can use this effect when mixing too. For example, let’s say you have a scene where a car is driving from a distance towards the camera. You could use a general car foley for the scene but add an automated high-shelf cut, reducing the effect as the car comes nearer.
In music, you can reduce the high frequencies in instruments you don’t want to prioritize using an EQ. Avoid listening to instruments by soloing them. Instead, listen to the mix as a whole, choose which instrument(s) you want to focus on, and push the rest of the mix back by reducing their high (and sometimes low) frequencies. Doing so can make your mix sound much more focused and purposeful immediately.
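To make the distance idea concrete, here is a rough sketch that maps distance to a filter cutoff and applies it. Note that I have swapped the high-shelf cut for a simple one-pole low-pass filter because it fits in a few lines; the distance-to-cutoff mapping and all of its default values are hypothetical numbers picked for illustration, not measured air absorption.

```python
import math


def cutoff_for_distance(distance_m, near_m=1.0, far_m=100.0,
                        near_hz=18000.0, far_hz=2000.0):
    """Hypothetical mapping: the farther the source, the lower the cutoff.

    Interpolates on a logarithmic scale between the near and far settings.
    """
    clamped = min(max(distance_m, near_m), far_m)
    position = math.log(clamped / near_m) / math.log(far_m / near_m)
    return near_hz * (far_hz / near_hz) ** position


def one_pole_lowpass(samples, sample_rate, cutoff_hz):
    """Gentle (6 dB per octave) low-pass filter that dulls the high end."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state = 0.0
    output = []
    for sample in samples:
        state += alpha * (sample - state)  # move part of the way toward the input
        output.append(state)
    return output


if __name__ == "__main__":
    for distance in (1, 10, 100):
        print(f"{distance:>3} m -> cutoff {cutoff_for_distance(distance):.0f} Hz")
```

For the approaching car, you would recompute the cutoff as the distance shrinks, which is the same thing an automated EQ band does in your DAW.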
Fletcher-Munson Curve
The Fletcher-Munson curves describe how the human ear responds to different frequencies at different volume levels. We generally hear frequencies around 2 to 5 kHz as louder than the rest of the spectrum. The following image describes how loud a frequency has to be for our hearing to perceive equal loudness:
I know; it’s pretty confusing. But there’s an easier way to understand this image: flip it upside down!
Now, the image shows which frequencies you hear louder and softer at the same dB SPL.
A well-known relative of this research is frequency-weighted loudness metering: the K-weighting filter behind LUFS measurement, for example, also accounts for our uneven sensitivity across frequencies. However, you can use this knowledge when mixing to ensure what you’re doing is right. The curve shows that the softer your monitoring level is, the more significant the perceived loudness differences between the frequencies are. So, if you seem to hear more of the mid-frequencies than the lows when you are monitoring at a lower level, you might want to increase your monitoring level a bit before adjusting the mix.
Similarly, the curve also explains why the “smiley-face” EQ curve, where you boost the lows and highs and scoop the mid-frequencies, sounds good, especially at lower listening levels. It flattens the Fletcher-Munson curve by compensating for our hearing limits.
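If you would like numbers instead of a picture, one option is the standard A-weighting formula, which was originally modeled on an equal-loudness curve for fairly quiet listening. To be clear, this is a stand-in I am using for illustration: it is a single fixed curve, so it does not capture how our sensitivity changes with level, and it is not the Fletcher-Munson data itself.

```python
import math


def a_weighting_db(frequency_hz: float) -> float:
    """Relative sensitivity in dB according to the standard A-weighting curve.

    0 dB sits at 1 kHz; negative values mean the frequency sounds softer
    than a 1 kHz tone at the same dB SPL.
    """
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    f2 = frequency_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(numerator / denominator) + 2.00


if __name__ == "__main__":
    for freq in (50, 100, 1000, 2500, 10000):
        print(f"{freq:>5} Hz -> {a_weighting_db(freq):+.1f} dB")
```

The output follows the same shape as the flipped image: the lows are heavily attenuated, the region around 2 to 5 kHz sits slightly above 0 dB, and the top end rolls off again.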
Every waveform (other than a sine wave) contains multiple frequencies sounding together. Often, these frequencies are whole-number multiples of the fundamental frequency, which is the first harmonic of the waveform. Because our brain is used to hearing sounds this way, you can remove the first and (to an extent) subsequent harmonics, and the waveform’s perceived pitch stays the same. This phenomenon is often called the missing fundamental.
Here’s a video demonstrating it:
Having this knowledge lets you know the extent to which you can reduce the low frequencies of an instrument. If you reduce too many harmonics, you might make the sound lose its pitch information entirely. However, as undesirable as that is in music, it can sometimes be helpful to create ambiguous sounds.
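Here is a small sketch that hints at why the pitch survives. It builds a 100 Hz tone from its harmonics, once with and once without the fundamental, and estimates the repetition rate of each waveform using autocorrelation. The autocorrelation estimator is my own simplification and only a loose stand-in for what the brain does; the function names and values are chosen for illustration.

```python
import math


def harmonic_tone(fundamental_hz, harmonics, sample_rate, duration_s):
    """Sum of equal-level sine waves at the given multiples of the fundamental."""
    count = int(sample_rate * duration_s)
    return [
        sum(math.sin(2.0 * math.pi * fundamental_hz * h * i / sample_rate)
            for h in harmonics)
        for i in range(count)
    ]


def estimate_pitch_hz(samples, sample_rate, min_hz=50.0, max_hz=500.0):
    """Estimate the repetition rate from the strongest autocorrelation peak."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # How well does the signal match itself when shifted by this lag?
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    rate = 8000
    full = harmonic_tone(100.0, [1, 2, 3, 4], rate, 0.2)
    no_fundamental = harmonic_tone(100.0, [2, 3, 4], rate, 0.2)
    print(estimate_pitch_hz(full, rate))
    print(estimate_pitch_hz(no_fundamental, rate))
```

Both versions repeat at the same rate even though the second contains no energy at 100 Hz, so the remaining harmonics still point to the original pitch. That is why a high-pass filter on a bass doesn't necessarily change the note you hear.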
Our perception of sound changes based on our physiology, acoustic environment, and situation. The study of this phenomenon has helped researchers understand how sound incites specific responses in a listener, how music affects our brain, and how we can improve auditory technology (consumer tech like game audio and clinical tech like hearing aids).
Understanding psychoacoustics can help you shape your mixes into immersive experiences. And while I have covered a number of topics in this article, there is still much more to learn.
Furthermore, even though my article primarily focuses on stereo mixes, you can also apply the same concepts in surround mixing. That said, creating a delay between two audio channels in a surround setup may not produce the same effect as in a binaural mix. However, you can use it to make the sound source seem huge but localized.
For example, in a 7.1 setup, you could make the sound arrive a little quicker at the Front Right than at the Side Right and Rear Right. It will pinpoint the sound at the exact front-right spot but still maintain a sense of ambiguity, making the origin appear gigantic.
And that brings us to the end of this article. I hope I could share a few new things and spark your interest in studying psychoacoustics. I highly recommend researching these subjects further if you work on films or games.
K. M. Joshi is a multi-award-winning composer and sound designer, specializing in film, game, and TV audio. He enjoys making cinematic music, rock, blues, and electronica.