Making Sense of Speech Understanding

Psychology students are drilled from the early days of their training to be mindful of the difficult task of operationalising abstract concepts. Consider then the task of an audiologist or an auditory scientist, such as myself, who is interested in measuring speech understanding. On the face of it, the task may not appear so difficult: we bring our participants to a laboratory where the acoustic environment is well controlled, give them a pair of expensive headphones to wear, and ask them to listen to spoken words or sentences and tell us what they hear. Quantifying their ability to understand speech is then simply a matter of counting the number of words they correctly understand in a variety of circumstances and calculating a percentage score. In this manner, we can show with good reliability that background noise, for example, impairs speech understanding (i.e. fewer recognised words) and that the noisier or more distorted the speech signal is, the worse speech understanding becomes. In fact, this is the typical test an audiologist may use when testing a new hearing aid with a client and one of the most common ways to measure speech intelligibility in a lab. Case closed, right? Not quite.
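As a minimal sketch of the word-counting score described above, the percentage could be computed along these lines. The function name `percent_words_correct` and the matching rule (case-insensitive, each response word counted at most once, word order ignored) are assumptions for illustration, not the scoring rules of any particular clinical test.

```python
def percent_words_correct(target: str, response: str) -> float:
    """Percentage of target words found in the listener's response.

    Assumed rule (hypothetical): case-insensitive whole-word matching,
    ignoring word order; each response word can match only one target word.
    """
    target_words = target.lower().split()
    if not target_words:
        raise ValueError("target sentence must contain at least one word")

    remaining = response.lower().split()
    correct = 0
    for word in target_words:
        if word in remaining:
            remaining.remove(word)  # consume the match so it is not reused
            correct += 1
    return 100.0 * correct / len(target_words)


# Made-up example: the listener mishears one of five words.
score = percent_words_correct("the dog chased the ball", "the dog chased a ball")
print(score)
```

A score like this captures only how many words were recognised, which is exactly the limitation the rest of this post is about.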

It’s reasonable to suggest that the main function of speech is communication and that our primary interest in measuring the ability to perceive speech is to understand how good communication is in various circumstances. You’ve likely already noticed that communicating becomes more difficult when you are in a noisy environment: words become harder to distinguish from the background. It is also true, however, that similar difficulties are experienced when you are listening to speech that is not in your native language or when you are trying to understand a complicated message, even when the words are easy to hear. So, what’s going on in those cases?

Speech communication involves a series of steps, the first of which is for the signal to be audible and correctly represented in the periphery of the auditory system. Noise reduces audibility (some of the sounds carried by a speaker’s voice are occluded by the noise), so speech becomes difficult to understand. Even when enough of the signal is audible, however, as is the case in our department’s cafeteria during the busy lunch hour, for example, you still need to be able to selectively focus on the speaker’s voice and distinguish it from the background din, comprehend the context and manner of the message, disentangle complex grammatical and semantic structures, store and continuously update this information in working memory, all the while eating your lunch and preparing appropriate responses to your friend’s ideas. Understanding speech is hard work and if it seems so easy and automatic most of the time, it’s because there are so many processes that carry this load.

If we return to the task we devised earlier, in which we ask our participants to repeat words they hear over headphones, we begin to realise that it doesn’t quite capture all the cognitive processes that are inherent in speech understanding. These processes become increasingly necessary, and listening increasingly effortful, when the listening circumstances become particularly tricky. If we want good measures of speech perception, we need ones that are sensitive not just to how effective it is (i.e. how many words you can hear), but also to how efficient it is (that is, how much effort it requires).

This was the aim of a recent study I worked on in collaboration with Hedderik van Rijn (Psychology), Deniz Başkent (UMCG), and Carina Pals (UMCG). Carina, who did this work as part of her PhD dissertation research, set out to validate a measure of speech intelligibility and listening effort that is simple enough to be easily implemented in an audiology clinic without the need for complex equipment or instructions. The hypothesis was that the more effortful the listening circumstances, the longer the processing time and, by extension, the slower the onset of the response. In other words, the harder a sentence is to understand, the longer the gap between hearing it and starting to repeat it. We manipulated difficulty by adding background noise, either at a soft enough level that all words were repeated correctly, or slightly louder so that intelligibility fell to 80% correct. What we found is that sentences heard without noise were the fastest to repeat and that response times increased as the level of noise increased. Crucially, this effect was observed even when the sentences were correctly recognised, so audibility was not its cause; rather, it was the additional load carried by cognitive processing. The beauty of this paradigm is that it can easily be applied and automated in many clinics, it is easy to use with young and elderly participants, and it provides a more complete measure of speech perception. Our hope is that it can be widely used when new hearing devices are fitted and tested, so that listening to speech is as effortless as possible.
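To make the logic of the measure concrete, here is a minimal sketch of how response onsets might be summarised per listening condition, keeping only trials in which the whole sentence was repeated correctly. The function name, the trial format, the condition labels, and all numbers are hypothetical and made up for illustration; they are not data or analysis code from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_onset_by_condition(trials):
    """Mean response onset (ms) per condition, using correct trials only.

    Each trial is a tuple (condition, onset_ms, all_words_correct).
    Incorrect trials are dropped so that differences in onset cannot be
    attributed to words that were simply not heard.
    """
    onsets = defaultdict(list)
    for condition, onset_ms, all_words_correct in trials:
        if all_words_correct:
            onsets[condition].append(onset_ms)
    return {condition: mean(values) for condition, values in onsets.items()}


# Made-up illustrative trials, NOT results from the study.
trials = [
    ("quiet", 600, True),
    ("quiet", 640, True),
    ("soft noise", 700, True),
    ("soft noise", 740, True),
    ("louder noise", 800, True),
    ("louder noise", 2000, False),  # excluded: sentence not fully correct
]

summary = mean_onset_by_condition(trials)
for condition, onset in summary.items():
    print(f"{condition}: {onset} ms")
```

Restricting the summary to correct trials mirrors the reasoning above: if onsets still grow with noise when every word was recognised, the slowdown reflects processing load rather than audibility.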




Pals, C., Sarampalis, A., van Rijn, H., & Başkent, D. (2015). “Validation of a simple response-time measure of listening effort.” J. Acoust. Soc. Am. 138(3), EL187-EL192.


Note: Image by Brian Smith, licensed under CC BY 2.0.

Dr. Sarampalis is a lecturer at the Psychology department of the University of Groningen. He began his career in psychoacoustics in the UK where he worked with Deb Fantini and Chris Plack, before moving to California to work on hearing devices, first with Monita Chatterjee and then with Erv Hafter. His current research interests involve understanding the contributions of cognition in complex hearing situations and the interactions of cognition and hearing impairment. For more information, you can visit his website.

Select Publications

  • Everhardt, M. K., Sarampalis, A., Coler, M., Başkent, D., & Lowie, W. (2020). Meta-Analysis on the Identification of Linguistic and Emotional Prosody in Cochlear Implant Users and Vocoder Simulations. Ear Hear, 1.

  • Pals, C., Sarampalis, A., Beynon, A., Stainsby, T., & Başkent, D. (2020). Effect of Spectral Channels on Speech Recognition, Comprehension, and Listening Effort in Cochlear-Implant Users. Trends in Hearing.

  • Everhardt, M. K., Sarampalis, A., Coler, M., Başkent, D., & Lowie, W. (2019). “Perception of L2 lexical stress in words degraded by a cochlear implant simulation.” Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS). Melbourne, Australia.

  • Pals, C., Sarampalis, A., van Dijk, M., & Başkent, D. (2018). “Effects of Additional Low-Pass–Filtered Speech on Listening Effort for Noise-Band–Vocoded Speech in Quiet and in Noise.” Ear and Hearing.

  • Başkent, D., Clarke, J., Pals, C., Benard, M.R., Bhargava, P., Saija, J., Sarampalis, A., Wagner, A., & Gaudrain, E. (2016). “Cognitive Compensation of Speech Perception With Hearing Impairment, Cochlear Implants, and Aging: How and to What Degree Can It Be Achieved?” Trends in Hearing, 20, 1-16.
