P72
Session 2 (Friday 10 January 2025, 09:30-11:30)
An automated digits-in-noise hearing test using automatic speech recognition and text-to-speech: A proof-of-concept study
Aims: This study develops a framework for AI-powered speech-in-noise (SIN) tests. Specifically, it implements a digits-in-noise (DIN) test with automatic speech recognition (ASR) and text-to-speech (TTS) and compares its performance against a benchmark test.
Methods: Three DIN test implementations were compared: (1) an established DIN test, used as the Benchmark test; (2) our implementation of the DIN test before adding TTS and ASR, called the Keyboard-based test; (3) an AI-powered test, which was our implementation of the DIN test extended with TTS for synthesising the stimuli and ASR for transcribing the verbally repeated responses (a rough sketch of one such trial is given below). Apart from stimulus generation and response capture, its underlying code was identical to that of the Keyboard-based test.
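For illustration, a single trial of such an AI-powered test might be wired together along the following lines. This is a minimal sketch, not the authors' actual implementation: `tts`, `asr`, and `present` are hypothetical callables standing in for whatever TTS engine, ASR engine, and audio presentation/recording layer are used.

```python
import random
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the babble noise so the speech-to-noise power ratio equals
    snr_db, then mix (assumes the noise is at least as long as the speech)."""
    noise = noise[: len(speech)]
    gain = np.sqrt(np.mean(speech ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return speech + gain * noise

def run_trial(tts, asr, present, babble, snr_db):
    """One digit-triplet trial (hypothetical wiring): tts(text) -> waveform
    synthesises the digits, present(waveform) -> waveform plays the stimulus
    to both ears and records the spoken response, and asr(waveform) -> text
    transcribes it. The triplet is scored correct only if all three digits match."""
    digits = "".join(str(random.randint(0, 9)) for _ in range(3))
    stimulus = mix_at_snr(tts(digits), babble, snr_db)
    response = asr(present(stimulus))
    return response.strip() == digits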
Ten hearing-impaired and 21 normal-hearing participants were recruited. The task in each test was to repeat three digits presented in babble noise to both ears in a sound-treated room. A two-down/one-up adaptive procedure (sketched below) was used to estimate the signal-to-noise ratio (SNR) corresponding to 71%-correct performance. A retest was carried out for both the Benchmark and the AI-powered test. All tests were performed in a single session lasting no more than 90 minutes. The results (mean difference and 95% limits of agreement, LoA) were compared using Bland-Altman analysis.
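A minimal sketch of the adaptive track, assuming a fixed step size and all-or-nothing triplet scoring; the step sizes and SRT averaging rule of the actual tests may differ.

```python
import numpy as np

def two_down_one_up(trial_fn, start_snr_db=0.0, step_db=2.0, n_trials=24):
    """Two-down/one-up adaptive track: the SNR is lowered after two consecutive
    correct triplets and raised after each incorrect one, so the track converges
    near the 70.7%-correct point. trial_fn(snr_db) runs one triplet trial and
    returns True/False. The SRT is taken here as the mean SNR over the final
    trials (one common convention; the Benchmark test's exact rule may differ)."""
    snr = start_snr_db
    n_correct = 0
    track = []
    for _ in range(n_trials):
        track.append(snr)
        if trial_fn(snr):
            n_correct += 1
            if n_correct == 2:  # two correct in a row: make the task harder
                snr -= step_db
                n_correct = 0
        else:  # any error: make the task easier
            snr += step_db
            n_correct = 0
    return float(np.mean(track[-10:]))
```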
Results: The mean difference and 95% LoA for test-retest reliability and for comparisons between methods were computed: the Benchmark test had a test-retest LoA of -0.4 ± 3.8 dB, and the AI-powered test had a similar test-retest LoA of -0.9 ± 3.8 dB, indicating that the AI-powered test was as reliable as the Benchmark test.
Comparing the Benchmark test with the Keyboard-based test yielded an LoA of -0.7 ± 5.9 dB, and comparing the AI-powered test with the Benchmark test yielded an LoA of 0.0 ± 4.6 dB. The higher variability between the Keyboard-based and Benchmark tests (5.9 dB) relative to the Benchmark test's test-retest variability (3.8 dB) was likely due to differences in software implementations. The LoA between the Benchmark and AI-powered tests (4.6 dB) was smaller than that between the Benchmark and Keyboard-based tests (5.9 dB), indicating that the inclusion of TTS and ASR did not introduce additional variability and even improved the LoA, which may be due to the elimination of typing errors and reduced distraction.
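For reference, the Bland-Altman quantities reported above can be computed along these lines. This is a minimal sketch under the standard definition (bias ± 1.96 times the SD of the paired differences); the published analysis may include additional corrections.

```python
import numpy as np

def bland_altman(x, y):
    """Bias (mean difference) and 95% limits of agreement between paired
    measurements x and y (here: SRTs in dB SNR from two tests or sessions)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)  # 95% LoA = bias ± 1.96 × SD of the differences
    return bias, (bias - half_width, bias + half_width)
```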
Conclusion: The developed framework works well: it adds little error relative to the test-retest reliability of the Benchmark test. The additional variance may arise, at least in part, from ASR errors. The results demonstrate proof of concept; i.e., it may be possible to use ASR and TTS in a SIN test.