P37, Session 1 (Thursday 9 January 2025, 15:25-17:30)
Sound quality in DNN-based hearing-aid algorithms
Current hearing aids typically address outer-hair-cell (OHC) damage and the associated loss of hearing sensitivity, but do not consider age- or noise-exposure-related damage to auditory-nerve fibers (i.e., cochlear synaptopathy, CS). To compensate for individual and combined CS and OHC damage patterns, closed-loop systems that include biophysical models of (impaired) auditory signal processing can generate personalized sound-processing algorithms. These closed-loop systems are most promising when the incorporated models are formulated as deep neural networks (DNNs), such that the resulting sound-processing algorithm can be obtained through backpropagation. One such method, CoNNear, is based on autoencoder models of auditory processing that comprise a modular and differentiable description of the cochlear-mechanics, inner-hair-cell, and auditory-nerve-fiber processing stages. The sound processors can be trained by minimizing biophysical auditory-processing differences between normal-hearing and hearing-impaired models, and can be embedded in AI hardware.
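For illustration, this closed-loop training objective can be sketched as follows (the notation is ours and hypothetical; the abstract only states that biophysical processing differences between the two models are minimized). With $M_{\mathrm{NH}}$ and $M_{\mathrm{HI}}$ denoting the differentiable normal-hearing and hearing-impaired auditory models and $g_\theta$ the trainable sound processor, one could minimize

\[ \mathcal{L}(\theta) = \mathbb{E}_{x}\big[\,\lVert M_{\mathrm{NH}}(x) - M_{\mathrm{HI}}(g_\theta(x)) \rVert_2^2\,\big], \]

where the gradient $\nabla_\theta \mathcal{L}$ is available through backpropagation because every stage in the loop is a DNN.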
However, these end-to-end systems come with a different kind of signal-processing artifact than traditional sound processors. For example, the transposed convolutions included in CNN-based auditory-processing modules can create tonal artifacts. These artifacts propagate within the closed-loop framework and ultimately become over-amplified and audible in the resulting hearing-aid algorithm.
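A minimal sketch of the mechanism behind such tonal artifacts (our illustration, not from the abstract): a strided transposed convolution computes even and odd output samples from different subsets of kernel taps, so even a constant input yields a periodic output pattern that is heard as a tone.

import torch
import torch.nn as nn

# A transposed 1-D convolution with stride 2, as typically used in CNN
# decoders to upsample; kernel size and stride here are illustrative.
up = nn.ConvTranspose1d(in_channels=1, out_channels=1,
                        kernel_size=4, stride=2, padding=1)

x = torch.ones(1, 1, 64)            # a constant (DC) input signal
y = up(x).detach().squeeze()        # upsampled output, length 128

# With randomly initialized weights, even and odd output samples are formed
# from different kernel taps, so the output alternates with period 2: a
# periodic (tonal) component at half the output sampling rate.
print(y[:8])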
To address this challenge, we propose dCoNNear, a dilated CNN architecture that comprises a sequence of stacked memory blocks and remains artifact-free for closed-loop audio processing. Instead of upsampling in the decoder, depthwise dilated 1-D convolutions are employed within each memory block, which avoids artifacts while modeling the long-term dependencies of neural speech processing (e.g., cochlear impulse-response durations and neural adaptation). A sketch of such a block is given after this paragraph. We then applied the dCoNNear architecture to all auditory elements inside the closed-loop system as well as to the sound processors, and evaluated both the sound quality and the compensation accuracy of the resulting algorithms. Our results show that dCoNNear not only accurately simulates all processing stages of a non-DNN-based state-of-the-art (SOTA) biophysical auditory-processing system, but does so without introducing spurious and audible artifacts in the resulting sound processors. The predicted restoration accuracy for simulated auditory-nerve population responses shows that our algorithms can be used for both OHC and CS pathologies. The trained dCoNNear audio processors process audio inputs of 3.2 ms in less than 0.3 ms, which demonstrates their real-time capability. We conclude that dCoNNear-based frameworks hold great promise for real-time, personalized hearing-loss compensation strategies with high sound quality.
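A minimal sketch of one such memory block, assuming a residual, depthwise-separable design (the class name, hyperparameters, and residual connection are our assumptions, not the published dCoNNear code):

import torch
import torch.nn as nn

class MemoryBlock(nn.Module):
    """One memory block: a depthwise dilated 1-D convolution (no striding or
    upsampling, hence no transposed-convolution artifacts) followed by a
    pointwise mixing convolution and a residual connection."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation   # "same" padding keeps length fixed
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation,
                                   groups=channels)                 # depthwise
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)
        self.act = nn.Tanh()

    def forward(self, x):
        return x + self.act(self.pointwise(self.depthwise(x)))

# Stacking blocks with exponentially growing dilations extends the receptive
# field to cover long-term dependencies (e.g. cochlear impulse-response
# durations) without any up- or downsampling in the decoder.
net = nn.Sequential(*[MemoryBlock(16, dilation=2**i) for i in range(6)])
out = net(torch.randn(1, 16, 256))   # output length equals input length

Because every layer preserves the sequence length, no upsampling stage is needed at all, which is what removes the periodic artifacts of transposed convolutions by construction.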
Acknowledgements: This work is supported by FWO Machine Hearing 2.0 and EIC Transition EarDiTech (101058278).