ANALYSIS-BY-SYNTHESIS IN AUDITORY-VISUAL SPEECH PERCEPTION: MULTI-SENSORY MOTOR INTERFACING

Virginie van Wassenhove
California Institute of Technology


In conversation, one sees the interlocutor as much as one hears them. Compelling demonstrations of auditory-visual (AV) integration in speech perception are the classic McGurk effects: in McGurk "fusion," an auditory [p] dubbed onto a face articulating [k] is perceived as a single fused percept [t], whereas in McGurk "combination," an auditory [k] dubbed onto a visual [p] is heard as combinations of [k] and [p]. The brain likely exploits the spatiotemporal co-occurrence of AV speech signals in binding them into a unified percept. AV integration thus poses challenges for neuroscience and speech science alike: how, when, where, and in what format do auditory and visual speech signals integrate? Several studies are described, suggesting that multisensory speech integration relies on a dynamic set of predictive computations involving large-scale cortical sensorimotor networks. Within an "analysis-by-synthesis" framework, it is suggested that speech perception entails a predictive brain network operating on abstract speech units.
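To make the analysis-by-synthesis idea concrete, here is a minimal sketch of the hypothesize-synthesize-compare loop it implies: each candidate speech unit "synthesizes" the auditory and visual features it predicts, and the percept is the candidate whose predictions best match both input streams. Everything in the sketch is a hypothetical illustration, not the author's model: the one-dimensional place-of-articulation feature, the candidate set, the equal weights, and the names CANDIDATES, prediction_error, and perceive are assumptions made for the example.

    # Illustrative Python sketch only; feature values and weights are made up.
    # Hypothetical 1-D "place of articulation" feature per candidate phoneme:
    # bilabial [p] = 0.0, alveolar [t] = 1.0, velar [k] = 2.0.
    CANDIDATES = {"p": 0.0, "t": 1.0, "k": 2.0}

    def prediction_error(candidate_place, audio_place, visual_place,
                         w_audio=0.5, w_visual=0.5):
        """Weighted squared mismatch between synthesized and observed features."""
        return (w_audio * (candidate_place - audio_place) ** 2
                + w_visual * (candidate_place - visual_place) ** 2)

    def perceive(audio_place, visual_place):
        """Return the candidate minimizing total audiovisual prediction error."""
        return min(CANDIDATES,
                   key=lambda c: prediction_error(CANDIDATES[c],
                                                  audio_place, visual_place))

    # McGurk "fusion": auditory [p] (place 0.0) dubbed onto visual [k]
    # (place 2.0) is best explained by the intermediate alveolar [t].
    print(perceive(audio_place=0.0, visual_place=2.0))  # prints: t

Under these toy assumptions the fused percept [t] falls out of the comparison step: neither unimodal hypothesis explains both streams, and the intermediate hypothesis minimizes the joint prediction error, which is the core intuition behind treating perception as prediction followed by error minimization.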