SPIN2025: The Best of British!

P75, Session 1 (Thursday 9 January 2025, 15:25–17:30)
Interpreting the binaural speech intelligibility model as an internal beamformer

Johannes W. de Vries
Delft University of Technology, Netherlands

Steven van de Par
Carl von Ossietzky University of Oldenburg, Germany
Cluster of Excellence Hearing4all, Germany

Richard Heusdens
Netherlands Defence Academy
Delft University of Technology, Netherlands

Richard C. Hendriks
Delft University of Technology, Netherlands

Understanding speech is an integral part of daily life, and various solutions exist to help people who are hard of hearing understand speech. Because measuring intelligibility for these solutions can be expensive and time-consuming, models have been developed to predict intelligibility from the input sound signals. One such model is the binaural speech intelligibility model (BSIM) (Beutelmann & Brand, 2006, JASA 120:331), which at its core is a concatenation of the equalisation–cancellation (EC) model and the monaural speech intelligibility index (SII). Although this model has been shown to predict binaural intelligibility well, its mathematical formulation is somewhat convoluted. Consequently, its internal optimisation problem is solved numerically with an inefficient grid search. A later revision (Beutelmann et al., 2010, JASA 127:2479) reduces the computational cost, but only approximates the original optimisation. In this work, the mathematical framework of the BSIM is reformulated, making the model both more efficient and easier to apply in new contexts.
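As a rough illustration of the SII back end in this concatenation, the Python sketch below (assuming NumPy) maps per-band SNRs to a simplified index: each band's audibility is taken as (SNR + 15)/30, clipped to [0, 1], as in the ANSI S3.5 band-audibility function, and weighted by a band-importance function. Level distortion and masking terms are ignored, and the function name, band SNRs, and equal weights are hypothetical; this is not the BSIM's actual SII stage.

```python
import numpy as np


def simplified_sii(band_snr_db, band_importance):
    """Hypothetical simplified SII from per-band SNRs in dB.

    Band audibility is (SNR + 15) / 30, clipped to [0, 1], following the
    ANSI S3.5 band-audibility function; level distortion and spread of
    masking are ignored. Band-importance weights should sum to 1.
    """
    snr = np.asarray(band_snr_db, dtype=float)
    importance = np.asarray(band_importance, dtype=float)
    audibility = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)
    return float(np.sum(importance * audibility))


# Four hypothetical bands with equal importance; in the BSIM the band SNRs
# would come from the per-band EC stage.
band_snrs = [-20.0, -15.0, 0.0, 20.0]
weights = [0.25, 0.25, 0.25, 0.25]
print(simplified_sii(band_snrs, weights))  # 0.375
```

Bands at or below -15 dB SNR contribute nothing and bands at or above +15 dB contribute their full importance, so in this sketch any SNR gain from the EC stage only matters within that 30 dB window.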

The main challenge of the BSIM lies in finding the optimal EC parameters. Because the speech-related input signals are processed independently per frequency band, the model can first be transformed to the short-time Fourier transform (STFT) domain. In this domain, the attenuation and delay operations that EC applies to the input signals correspond to a complex scaling in each frequency bin: an attenuation scales the magnitude and a delay rotates the phase. Therefore, the entire EC stage can be interpreted as an ‘internal beamformer’ applied to the input signals, whose elements are determined by the attenuation and delay parameters. The output signal power of the EC stage can then be written in terms of the internal beamformer and the input cross-power matrix, which is estimated with a periodogram from the input signal realisations. The effects of the artificial processing errors in the BSIM can be incorporated by element-wise multiplying the cross-power matrix by a matrix that depends on the error variances. In this description, the model’s output SNR takes the form of a generalised Rayleigh quotient of the internal beamformer, so the optimal EC parameters can be found efficiently through a generalised eigendecomposition.
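A minimal sketch of this framework for a single STFT bin and two channels (left and right ear) follows, in Python with NumPy and SciPy. Several choices are assumptions rather than the authors' formulation: the EC output is parametrised as y = x1 - c*x2 with c = alpha*exp(-1j*omega*tau); the error matrix Gamma comes from a hypothetical model of independent log-normal gain jitter and Gaussian delay jitter per channel; Gamma is applied to both the speech and the noise cross-power matrices; and all function names and numbers are illustrative.

```python
import numpy as np
from scipy.linalg import eigh


def cross_power(frames):
    """Periodogram estimate of the cross-power matrix in one STFT bin.

    frames: complex array of shape (n_channels, n_frames).
    """
    return frames @ frames.conj().T / frames.shape[1]


def error_matrix(omega, gain_sd, delay_sd):
    """Hypothetical processing-error matrix Gamma for one bin at omega (rad/s).

    Assumes the jittered beamformer is w * eps with
    eps_i = a_i * exp(-1j * omega * d_i), ln(a_i) ~ N(0, gain_sd[i]**2),
    d_i ~ N(0, delay_sd[i]**2), all independent. Then the expected output
    power is w^H (R * Gamma) w with Gamma[i, j] = E[conj(eps_i) * eps_j].
    """
    s2 = np.asarray(gain_sd, dtype=float) ** 2
    d2 = np.asarray(delay_sd, dtype=float) ** 2
    mean_gain = np.exp(s2 / 2.0)                   # E[a_i] for log-normal a_i
    mean_phase = np.exp(-(omega ** 2) * d2 / 2.0)  # E[exp(+/-1j*omega*d_i)], real
    m = mean_gain * mean_phase
    gamma = np.outer(m, m)                         # i != j: independent channels
    np.fill_diagonal(gamma, np.exp(2.0 * s2))      # i == j: E[a_i**2]; phase cancels
    return gamma


def optimal_ec(R_s, R_n, gamma, omega):
    """Maximise w^H (R_s*Gamma) w / w^H (R_n*Gamma) w over a 2-channel w."""
    A = R_s * gamma                    # element-wise (Hadamard) products
    B = R_n * gamma
    eigvals, eigvecs = eigh(A, B)      # solves A v = lam B v, lam ascending
    w = eigvecs[:, -1] / eigvecs[0, -1]  # quotient is scale-invariant: fix w[0] = 1
    # y = w^H x = x1 - c * x2, so w[1] = -conj(c) with c = alpha*exp(-1j*omega*tau).
    c = -np.conj(w[1])
    alpha = float(np.abs(c))
    tau = float(-np.angle(c) / omega)  # principal value, ambiguous modulo 2*pi/omega
    return w, float(eigvals[-1]), alpha, tau


def output_snr(w, R_s, R_n, gamma):
    """Generalised Rayleigh quotient for an arbitrary beamformer w."""
    num = np.real(w.conj() @ (R_s * gamma) @ w)
    den = np.real(w.conj() @ (R_n * gamma) @ w)
    return float(num / den)


# Hypothetical single-bin example: random STFT frames stand in for speech and noise.
rng = np.random.default_rng(0)
omega = 2 * np.pi * 500.0                      # bin centre frequency in rad/s
X_s = rng.standard_normal((2, 256)) + 1j * rng.standard_normal((2, 256))
X_n = rng.standard_normal((2, 256)) + 1j * rng.standard_normal((2, 256))
R_s, R_n = cross_power(X_s), cross_power(X_n)
gamma = error_matrix(omega, gain_sd=[0.1, 0.1], delay_sd=[50e-6, 50e-6])
w, snr_max, alpha, tau = optimal_ec(R_s, R_n, gamma, omega)
print(f"max SNR {snr_max:.3f}, alpha {alpha:.3f}, tau {tau * 1e6:.1f} us")
```

Because the Rayleigh quotient is unchanged when w is multiplied by any nonzero complex scalar, a two-channel beamformer has exactly two free real parameters (a magnitude ratio and a phase difference), which is why the eigenvector can be mapped back onto one EC attenuation and one delay. Within a single bin, the delay is only determined modulo 2*pi/omega.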

Simulations of the original and novel implementations of the BSIM show that the internal beamformer framework yields similar intelligibility predictions while running 10 to 20 times faster. Aside from the more efficient implementation, the more compact mathematical formulation makes the model easier to use in other applications, such as binaural beamformer design or machine learning approaches.
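The difference between the two optimisation strategies can be sketched for a single bin. The Python example below (NumPy and SciPy; the interaural transfer functions, diffuse floor, and grid ranges are made up for illustration, and it does not reproduce the reported timings) evaluates the output SNR on a brute-force grid of EC gains and delays and compares the best grid value with the largest generalised eigenvalue, which bounds every grid point from above.

```python
import numpy as np
from scipy.linalg import eigh


def quadratic_form(M, c):
    """w^H M w for w = [1, -conj(c)], i.e. EC output y = x1 - c * x2 (M Hermitian)."""
    return np.real(M[0, 0]) - 2.0 * np.real(c * M[1, 0]) + np.abs(c) ** 2 * np.real(M[1, 1])


def grid_search_ec(A, B, omega, gains_db, delays):
    """Brute-force search over EC gain (dB) and delay (s) for one bin."""
    gains_db = np.asarray(gains_db, dtype=float)
    delays = np.asarray(delays, dtype=float)
    alpha = 10.0 ** (gains_db[:, None] / 20.0)
    c = alpha * np.exp(-1j * omega * delays[None, :])  # every (gain, delay) pair
    snr = quadratic_form(A, c) / quadratic_form(B, c)
    i, j = np.unravel_index(np.argmax(snr), snr.shape)
    return float(snr[i, j]), float(gains_db[i]), float(delays[j])


def eig_ec(A, B):
    """Closed-form optimum: largest generalised eigenvalue of (A, B)."""
    return float(eigh(A, B, eigvals_only=True)[-1])


# Hypothetical bin: speech and a directional noise with different interaural
# transfer functions d = [1, h], plus a small diffuse floor on both.
omega = 2 * np.pi * 500.0
d_s = np.array([1.0, 0.9 * np.exp(1j * 0.8)])
d_n = np.array([1.0, 0.7 * np.exp(-1j * 0.4)])
R_s = np.outer(d_s, d_s.conj()) + 0.01 * np.eye(2)
R_n = np.outer(d_n, d_n.conj()) + 0.01 * np.eye(2)

gains_db = np.linspace(-40.0, 40.0, 641)                  # 0.125 dB steps
delays = np.linspace(-np.pi / omega, np.pi / omega, 721)  # one full phase cycle
grid_snr, best_gain_db, best_delay = grid_search_ec(R_s, R_n, omega, gains_db, delays)
eig_snr = eig_ec(R_s, R_n)
print(f"grid {grid_snr:.4f} vs eigenvalue {eig_snr:.4f}")
```

Here the grid evaluates about 460,000 parameter pairs to approach the bound that a single 2×2 generalised eigenvalue problem gives directly; the grid result also depends on the chosen ranges and step sizes, whereas the eigendecomposition does not.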

Last modified 2025-01-07 19:42:23