SpiN 2025 :: programme

P58, Session 2 (Friday 10 January 2025, 09:30-11:30)
Can the short-time objective intelligibility metric predict recall accuracy of spoken sentences in face-masked speech?

Cleopatra Moshona
Engineering Acoustics, Technische Universität Berlin, Germany

Background: Previous research has indicated that face-masked speech can impair listeners' memory accuracy, particularly in noisy environments. However, it is unclear whether this decline results from reduced intelligibility due to signal degradation, from increased listening effort that diverts cognitive resources away from memory encoding, or from a combination of both. Dual-task paradigms are often used to explore these interactions, though they are not feasible in every experimental design.

Methods: The present study investigated how speech style adaptations affect listeners’ working memory performance and retrospectively assessed the predictive value of the short-time objective intelligibility (STOI) metric. Eighty-two participants (48 female) listened to audio recordings of a female and a male native German speaker uttering matrix-type sentences, with and without a face mask, in casual and Lombard speech. They completed a cued recall task, memorizing the last two words of each of 100 sentences, divided into 20 blocks, while exposed to task-irrelevant multi-talker babble at a speaker-adjusted, fixed SNR of +6 dB. A mixed-effects binary logistic regression was fitted to predict recall accuracy, with speech condition, speaker, and listener sex as fixed factors and word position, serial sentence position, and block number as covariates. Random intercepts for participants and items and a by-participant random slope for serial sentence position were included. A linear mixed-effects model was used to analyze the effects of sound condition and speaker on computed STOI scores, and a point-biserial correlation evaluated the relationship between mean STOI and recall accuracy scores per sentence.
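For readers unfamiliar with the point-biserial correlation mentioned above, the following is a minimal sketch of how such a coefficient can be computed. It assumes recall is coded 0/1 per item and paired with a continuous STOI value; the function name `point_biserial` and the toy numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def point_biserial(binary, scores):
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where s is the population
    standard deviation of all scores and p, q are the group proportions.
    This is numerically identical to Pearson's r on the same data.
    """
    if len(binary) != len(scores):
        raise ValueError("inputs must have the same length")
    group1 = [s for b, s in zip(binary, scores) if b == 1]
    group0 = [s for b, s in zip(binary, scores) if b == 0]
    if not group1 or not group0:
        raise ValueError("both outcome groups must be non-empty")

    n = len(scores)
    mean_all = sum(scores) / n
    sd = sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    if sd == 0:
        raise ValueError("scores must not be constant")

    p = len(group1) / n  # proportion of items coded 1
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    return (m1 - m0) / sd * sqrt(p * (1 - p))


# Made-up illustration data: 1 = keyword recalled, 0 = not recalled.
recalled = [0, 0, 1, 1]
stoi = [0.60, 0.65, 0.70, 0.75]
print(round(point_biserial(recalled, stoi), 3))
```

A coefficient near zero would mean that STOI carries little information about recall; the toy data above are deliberately well separated and do not reflect the reported results.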

Results: The mixed binary model revealed significant effects on recall accuracy for all predictors except speaker. Post-hoc comparisons indicated 31.9% lower odds of recalling keywords in masked casual speech than in masked Lombard speech, and 27.4% lower odds than in unmasked Lombard speech. Recall odds were also 25.3% lower in masked casual speech than in unmasked casual speech. Notably, female listeners showed significantly higher recall odds than male listeners, a 48.1% increase. The linear mixed-effects model indicated significant effects of sound condition and speaker on STOI scores, with an interaction between these factors. Unmasked Lombard speech was more intelligible than masked casual speech, and the male speaker had higher scores than the female speaker. Although STOI scores followed the same overall pattern as the experimental results, the point-biserial correlation revealed only a weak positive correlation between mean STOI and recall scores per sentence, indicating that the outcome cannot be explained by intelligibility scores alone.
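The percentage changes above can be read as changes in odds, which is how logistic regression effects are usually expressed. As a rough sketch, a percentage change in odds corresponds to an odds ratio of 1 + change/100, and its effect on recall probability depends on the baseline. The helper names and the 0.70 baseline below are illustrative assumptions, not values from the study.

```python
def odds_ratio_from_percent_change(percent_change):
    """Convert a percentage change in odds into an odds ratio.

    Example: a 31.9% decrease (-31.9) gives an odds ratio of about 0.681.
    """
    return 1.0 + percent_change / 100.0


def shift_probability(baseline_p, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    if not 0.0 < baseline_p < 1.0:
        raise ValueError("baseline_p must lie strictly between 0 and 1")
    new_odds = baseline_p / (1.0 - baseline_p) * odds_ratio
    return new_odds / (1.0 + new_odds)


# Reported changes in odds, taken from the results text.
or_masked_casual_vs_masked_lombard = odds_ratio_from_percent_change(-31.9)
or_female_vs_male = odds_ratio_from_percent_change(48.1)

# Hypothetical baseline recall probability, chosen only for illustration.
baseline = 0.70
print(shift_probability(baseline, or_masked_casual_vs_masked_lombard))
print(shift_probability(baseline, or_female_vs_male))
```

Because the mapping from odds to probability is nonlinear, a 31.9% drop in odds is not a 31.9-point drop in recall probability; the size of the shift depends on where the baseline sits.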

Last modified 2025-01-07 19:42:23
