P47
Session 1 (Thursday 9 January 2025, 15:25-17:30)
Will I speak louder if I see you struggling to understand?
During face-to-face interactions, the quality of a conversation can be estimated from non-verbal cues expressed behaviorally by conversational partners. Here, we investigated whether a specific subset of these non-verbal cues (i.e., facial expressions of confusion and approaching movements towards the speaker) can prompt speakers to modify acoustic-phonetic characteristics of their speech to enhance clarity. We addressed this question in the absence of acoustic feedback from the environment, without prior knowledge of the listener's status, and when no verbal feedback from the conversational partner was provided. In Experiment 1, we asked participants (N=22) to read short sentences to a listener seated in a separate room and visible through a head-mounted display. We informed participants that the listener heard their voice mixed with background noise of varying intensity, although this noise was not audible to the participants themselves. The listener was, in fact, a confederate whose non-verbal cues had been pre-recorded in silence with a 360° video camera to convey three levels of inferred listening effort (i.e., easy, medium and hard listening). We found that, as the non-verbal cues intensified, participants reported that the listener experienced greater listening effort and reduced speech comprehension. Importantly, speakers also changed their speech rate, voice intensity and fundamental frequency, which increased from easy to medium and from medium to hard listening (as inferred from the non-verbal visual cues). In Experiment 2 (N=12), we replicated the study but omitted from the cover story the information that the listener heard background noise. Even in the absence of this contextual information about the listener's experience, speakers modified the acoustic-phonetic characteristics of their speech to enhance clarity as a function of the non-verbal cues. Unlike in Experiment 1, participants appeared to have prioritized clarity over loudness, as vocal intensity was the only parameter that did not change. Our experiments show that facial expressions of confusion and approaching movements towards the speaker can effectively elicit acoustic-phonetic adaptations in the absence of acoustic feedback from the environment, without prior knowledge of the listener's status, and when no verbal feedback from the conversational partner is provided. We suggest that, during face-to-face interactions, non-verbal cues offer a significant advantage by providing the speaker with real-time feedback on the quality of the conversation. Such instant feedback is not achievable through prior information about the listener's characteristics or the environment.
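For readers who wish to compute comparable measures, the sketch below illustrates one way the reported acoustic-phonetic parameters (fundamental frequency and voice intensity) could be extracted from a recorded utterance; it is not the authors' analysis pipeline. It assumes the Praat-based parselmouth Python package, a hypothetical file name (speaker_trial_01.wav), and typical adult pitch bounds (75-500 Hz); speech rate would additionally require syllable counts or a forced alignment, which are not shown.

    import parselmouth  # Python interface to Praat (assumed analysis tool, not stated in the abstract)

    snd = parselmouth.Sound("speaker_trial_01.wav")  # hypothetical recording of one read sentence

    # Fundamental frequency (F0) contour via Praat's default pitch tracker
    pitch = snd.to_pitch(time_step=0.01, pitch_floor=75.0, pitch_ceiling=500.0)
    f0 = pitch.selected_array["frequency"]
    f0 = f0[f0 > 0]              # keep voiced frames only
    mean_f0_hz = f0.mean()

    # Voice intensity contour (dB, Praat's reference level); mean over the utterance
    intensity = snd.to_intensity(minimum_pitch=75.0)
    mean_db = intensity.values[intensity.values > 0].mean()

    # Crude timing proxy: total utterance duration in seconds (a true speech-rate
    # measure would divide syllable count by this value, omitted here)
    duration_s = snd.get_total_duration()

    print(mean_f0_hz, mean_db, duration_s)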