Electrophysiological indicators of phonetic and non-phonetic multisensory interactions during audiovisual speech perception.
Klucharev V., Möttönen R., Sams M.
We studied interactions in the neural processing of auditory and visual speech by recording event-related brain potentials (ERPs). Unisensory (auditory, A, and visual, V) and audiovisual (AV) vowels were presented to 11 subjects. AV vowels were phonetically either congruent (e.g., acoustic /a/ and visual /a/) or incongruent (e.g., acoustic /a/ and visual /y/). ERPs to AV stimuli were compared with the sum of the ERPs to A and V stimuli (A+V). Similar ERPs to AV and A+V were hypothesized to indicate independent processing of the A and V stimuli, whereas differences would suggest AV interactions. Three deflections, the first peaking at about 85 ms after the A stimulus onset, were significantly larger in the ERPs to A+V than in the ERPs to both congruent and incongruent AV stimuli. We suggest that these differences reflect AV interactions in the processing of general, non-phonetic features shared by the acoustic and visual stimuli (spatial location, coincidence in time). The first difference between the ERPs to incongruent and congruent AV vowels peaked at 155 ms after the A stimulus onset. This and two later differences are suggested to reflect interactions at the phonetic level. The early general AV interactions probably reflect modified activity in the sensory-specific cortices, whereas the later phonetic AV interactions are likely generated in the heteromodal cortices. Thus, our results suggest that sensory-specific and heteromodal brain regions participate in AV speech integration at separate latencies and are sensitive to different features of the A and V speech stimuli.
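The core analysis is an additive-model comparison: if auditory and visual inputs are processed independently, the ERP to the AV stimulus should equal the sum of the unisensory ERPs, so any reliable AV - (A+V) residual indicates multisensory interaction. The sketch below illustrates that comparison in outline only; it is not the authors' analysis code, and the array shapes, sampling rate, channel count, and time window are hypothetical assumptions introduced solely for illustration.

```python
# Minimal sketch of the additive-model (AV vs. A+V) comparison described in the
# abstract. All values and array names are illustrative, not from the study.
import numpy as np

sfreq = 500.0                                  # assumed sampling rate (Hz)
times = np.arange(-0.1, 0.5, 1.0 / sfreq)      # assumed epoch: -100 to +500 ms

# Placeholder grand-average ERPs, shape (n_channels, n_times); random here
# only so the script runs end to end.
rng = np.random.default_rng(0)
erp_a = rng.normal(size=(32, times.size))      # auditory-only ERP
erp_v = rng.normal(size=(32, times.size))      # visual-only ERP
erp_av = rng.normal(size=(32, times.size))     # audiovisual ERP

# Additive model: independence predicts AV == A + V.
erp_sum = erp_a + erp_v
interaction = erp_av - erp_sum                 # nonzero residual suggests AV interaction

# Inspect the residual around ~85 ms, where the abstract reports the first
# (non-phonetic) AV interaction effect; the 70-100 ms window is an assumption.
window = (times >= 0.07) & (times <= 0.10)
peak_amplitude = np.abs(interaction[:, window]).max()
print(f"Max |AV - (A+V)| in 70-100 ms window: {peak_amplitude:.2f} (arbitrary units)")
```

The same difference-wave logic, applied to ERPs for congruent versus incongruent AV vowels rather than to AV versus A+V, would correspond to the later, phonetic-level effects the abstract reports from about 155 ms onward.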