SPIN2025: The Best of British!

P70 Session 2 (Friday 10 January 2025, 09:30-11:30)
Predicting noisy speech intelligibility beyond acoustics: Including listeners’ phonetic and language abilities.

Mark Huckvale, Gaston Hilkhuysen
University College London, UK

Scholars typically distinguish acoustic, auditory, phonetic and language stages in human speech perception. Intelligibility metrics, however, traditionally account only for changes in the acoustics of the noisy speech signal, although some now include information about the auditory sensitivity of hearing-impaired listeners. Such metrics assume that cochlear impairment induces acoustic distortions, reducing the hearing abilities of people with a hearing loss. Unfortunately, such metrics predict the intelligibility of noisy speech for given listeners poorly. The current study examines whether adding factors that account for the listeners’ phonetic and language abilities can improve these predictions. The study is based on 13,000 intelligibility judgments by 31 hearing-impaired listeners generated in the second Clarity Prediction Challenge.

To assess phonetic ability, we start by computing the phone probabilities from the original and distorted speech audio. The correlation of these probabilities defines the degree of phonetic distortion; each listener’s word-correct scores are subsequently regressed on this distortion measure. The positions of these regression functions define a listener’s phonetic ability.
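The correlation step can be illustrated with a minimal sketch. It assumes that phone probabilities are available as frame-by-phone matrices for the original and the distorted audio, and that the two matrices are compared with a single Pearson correlation over all cells; the abstract does not specify either detail. The function names and the toy probabilities are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def phone_probability_correlation(original, distorted):
    """Correlate phone probabilities of original and distorted speech.

    Both arguments are lists of frames; each frame is a list of phone
    probabilities (assumed layout, not taken from the abstract).
    """
    flat_original = [p for frame in original for p in frame]
    flat_distorted = [p for frame in distorted for p in frame]
    return pearson(flat_original, flat_distorted)


# Toy data (made up): 3 frames x 3 phone classes.
original = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]
distorted = [[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]]

r_identical = phone_probability_correlation(original, original)
r_distorted = phone_probability_correlation(original, distorted)
```

In this sketch a lower correlation corresponds to a higher degree of phonetic distortion; identical inputs give a correlation of 1.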

We also computed word probabilities for the intelligibility sentences using trigram word probabilities calculated from the British National Corpus. The position of the regression function that predicts word-correct scores from these word probabilities defines a listener’s language ability.
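As a rough illustration of trigram word probabilities, the sketch below trains counts on a three-sentence toy corpus standing in for the British National Corpus. Add-one smoothing, lower-casing, whitespace tokenisation and the sentence-boundary markers are assumptions made here for illustration; the abstract does not describe how the probabilities were estimated.

```python
from collections import Counter
from math import log


def train_trigrams(sentences):
    """Count trigrams and their two-word contexts in a list of sentences."""
    trigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        for i in range(2, len(tokens)):
            contexts[(tokens[i - 2], tokens[i - 1])] += 1
            trigrams[(tokens[i - 2], tokens[i - 1], tokens[i])] += 1
    return trigrams, contexts, len(vocab)


def word_log_probs(sentence, trigrams, contexts, vocab_size):
    """Return (word, log probability) pairs using add-one smoothed trigrams."""
    tokens = ["<s>", "<s>"] + sentence.lower().split()
    result = []
    for i in range(2, len(tokens)):
        context = (tokens[i - 2], tokens[i - 1])
        numerator = trigrams[context + (tokens[i],)] + 1
        denominator = contexts[context] + vocab_size
        result.append((tokens[i], log(numerator / denominator)))
    return result


# Toy corpus (made up); a real analysis would use a large corpus.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
trigrams, contexts, vocab_size = train_trigrams(corpus)

seen = word_log_probs("the cat sat on the mat", trigrams, contexts, vocab_size)
scrambled = word_log_probs("mat the on sat cat the", trigrams, contexts, vocab_size)
seen_total = sum(lp for _, lp in seen)
scrambled_total = sum(lp for _, lp in scrambled)
```

Under this toy model the word order seen in training receives a higher total log probability than the scrambled order, which is the kind of contrast a language-ability predictor could exploit.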

A principal component analysis of auditory, phonetic and language abilities showed almost orthogonal auditory and language abilities, while phonetic abilities correlated intermediately with the other two abilities. A cross-validated regression analysis found that intelligibility predictions based on the short-time objective intelligibility measure (STOI) were improved when extended with the listeners’ phonetic and language abilities. A regression model based on STOI, acoustic and phonetic abilities generalized best to unseen test data, giving a 9.8% relative reduction in prediction error.
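For reference, a relative reduction in prediction error is conventionally computed as below. The symbols are introduced here for illustration: E_STOI denotes the test-set error of the STOI-only model and E_ext that of the extended model. The abstract does not state which error measure was used.

```latex
\text{relative reduction} = \frac{E_{\mathrm{STOI}} - E_{\mathrm{ext}}}{E_{\mathrm{STOI}}} \times 100\%
```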

The outcomes indicate that phonetic and language abilities are promising predictors when estimating speech intelligibility for specific listeners with hearing loss. This opens ways to account for intelligibility variation across speech corpora as well as differences between native and non-native listeners. Future research could focus on predictor interactions.

Last modified 2024-11-22 15:45:01