SPIN2025: The Best of British!

P20 Session 2 (Friday 10 January 2025, 09:30-11:30)
Convolutional neural networks improve decoding of selective attention to speech in cochlear implant users

Constantin Jehn 
Friedrich-Alexander Universität Erlangen-Nürnberg, Germany

Adrian Kossmann, Niki Katerina Vavatzanidis, Anja Hahne
Ear Research Center Dresden (ERCD), University Hospital Carl Gustav Carus, Dresden, Germany
Technische Universität Dresden, Germany

Tobias Reichenbach
Friedrich-Alexander Universität Erlangen-Nürnberg, Germany

Cochlear implants (CIs) are neural prostheses that use artificial electrical stimulation of the cochlea to restore hearing in severely hearing-impaired individuals. While modern CIs enable a majority of users to achieve good speech understanding in quiet environments, background noise and competing speech streams pose significant challenges. Auditory attention decoding (AAD) seeks to decode the attention of a listener in a multi-talker situation from electroencephalography (EEG) data. AAD may be used in the development of a neuro-steered CI, which aims to help CI users in challenging listening situations by amplifying the target speaker and attenuating background sounds. A variety of methods for AAD in normal-hearing individuals have been developed and evaluated in recent years, with deep neural networks (DNNs) proving superior to linear models in terms of decoding performance. However, although several studies have demonstrated the feasibility of AAD in CI users, the advantages of DNNs remain to be proven for this group. Here we demonstrate how the implementation of a convolutional neural network (CNN) improves the decoding of selective attention to speech in CI users. First, we collected a substantial selective attention dataset from 25 bilateral CI users (15 female, 10 male; median age 56 ± 11.1 years), where stimuli were presented in a free-field environment and EEG was measured simultaneously. Second, we implemented a CNN as well as a linear backward model for AAD. The CNN emerged as the superior method, as measured by the decoding accuracy achieved for all decision windows studied, ranging from 1 s to 60 s. In conjunction with a learnable support vector machine (SVM) for speaker classification, the CNN achieved a maximal decoding accuracy of 74% (± 11%) on the population level and thereby significantly outperformed the linear backward model.
These findings underscore the potential of DNNs with adaptable speaker classification as promising candidates for neuro-steered CIs, translating advancements made in AAD for normal-hearing individuals to benefit CI users.
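
For readers unfamiliar with the linear backward model used as the baseline, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a ridge-regularised linear decoder that reconstructs a speech envelope from time-lagged EEG and then assigns attention, per decision window, to the speaker whose envelope correlates more strongly with the reconstruction. All names (`lagged`, `fit_backward_model`, `decode_attention`, `simulate`, `run_demo`), the synthetic data, the white-noise envelope stand-ins, the number of lags, the ridge parameter and the sampling rate are assumptions made for illustration only.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-lagged copies of EEG (samples x channels) into a design matrix."""
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for k in range(n_lags):
        # Backward model: EEG at time t + k helps reconstruct the envelope at time t.
        X[: n - k, k * c:(k + 1) * c] = eeg[k:]
    return X

def fit_backward_model(eeg, envelope, n_lags=16, ridge=1.0):
    """Ridge regression from lagged EEG to the attended envelope."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

def decode_attention(eeg, env_a, env_b, w, n_lags=16):
    """Return 0 if speaker A is decoded as attended, 1 for speaker B."""
    rec = lagged(eeg, n_lags) @ w
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return 0 if r_a > r_b else 1

def simulate(rng, n_samples, attended, n_channels=8, delay=4):
    """Synthetic toy data (assumption): EEG tracks the attended envelope more strongly."""
    envs = rng.standard_normal((2, n_samples))  # zero-mean stand-ins for speech envelopes
    gains = np.linspace(0.5, 1.5, n_channels)
    mix = envs[attended] + 0.3 * envs[1 - attended]
    eeg = rng.standard_normal((n_samples, n_channels))  # sensor noise
    eeg[delay:] += np.outer(mix[:-delay], gains)  # delayed neural response
    return eeg, envs

def run_demo(seed=0, fs=64, window_s=10, n_windows=20):
    rng = np.random.default_rng(seed)
    # Train on 60 s of data in which speaker 0 is attended.
    eeg_train, envs_train = simulate(rng, 60 * fs, attended=0)
    w = fit_backward_model(eeg_train, envs_train[0])
    # Evaluate on held-out decision windows with alternating attended speaker.
    correct = 0
    for i in range(n_windows):
        attended = i % 2
        eeg, envs = simulate(rng, window_s * fs, attended)
        correct += decode_attention(eeg, envs[0], envs[1], w) == attended
    return correct / n_windows

if __name__ == "__main__":
    print(f"decoding accuracy: {run_demo():.2f}")
```

In this toy setting the decision window length (`window_s`) plays the role of the 1 s to 60 s windows mentioned above: shorter windows give noisier correlation estimates and hence lower accuracy. The CNN and SVM classifier described in the abstract would replace the linear reconstruction and the simple correlation comparison, respectively.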

Last modified 2025-01-07 19:42:23