Lynne E. Bernstein
Title: Individual Differences in Visual Speech Perception, Visual Speech Perceptual Learning, and Visual Speech Enhancement to Audiovisual Speech Perception in Noise
This talk will address individual variation in lipreading ability, in responding to visual speech perception training, and in achieving visual enhancement of auditory speech perception in noise among adults with normal and impaired hearing. The perspective is that visual speech is represented in visual cortical areas qua speech. Individuals vary in the extent to which they have achieved visual speech representations. Training paradigms that focus attention on visual speech information and on the integration of that information for word recognition are expected to be most successful. Recent results support the prediction that orthographic feedback during training promotes top-down feedback from auditory phonological representations, impeding learning of visual speech. Sparse feedback that guides attention to sublexical speech information promotes learning, including generalization to untrained visual speech stimuli. Development of effective visual speech training is needed to address the needs of older adults who experience difficulty perceiving speech in noise.
Mathilde Fort
Title: Emergence of audiovisual associations in infancy: the role of language-specific experience
The main goal of my research is to investigate how language-specific experience influences cross-modal associations in early language acquisition. I will present two lines of research: the first is a meta-analysis exploring the emergence of cross-modal sound-symbolic matching in infancy and childhood (i.e., the "bouba-kiki" effect); the second investigates the impact of early bilingualism on infants' ability to process talking faces, using eye-tracking measures.
Mairead MacSweeney
Title: The relationship between speechreading and reading in deaf children
Speechreading (lipreading) is the ability to understand speech in the absence of sound. For most deaf people, speechreading is the primary route for accessing spoken language. Longitudinal studies have provided evidence for the importance of speechreading as a predictor of variance in reading outcomes in deaf children (Kyle & Harris, 2010, 2011). On the basis of our previous behavioural and neuroimaging research, we propose that speechreading provides deaf children with visual information about the sublexical structure of spoken English, and that this information helps deaf children to establish amodal phonological representations of speech which they can bring to the task of learning to read.
We have previously found that deaf adults, but not deaf children, outperform their hearing peers on tests of speechreading (Kyle et al., 2013; Mohammed et al., 2006). This pattern of results suggests that increased experience of understanding silent speech leads to improvements in speechreading ability, and therefore raises the possibility that speechreading can be trained.
In this talk I will present the background data to support this model. I will also present the preliminary results from a randomised controlled trial in which we tested the influence of computerised speechreading training on reading development in young deaf children.
Olivier Pascalis
Title: Multisensory Representation of Gender by Infants
Human faces provide multisensory input to infants, exposing them not only to visual information but also to the voice and language of their caregivers. This early multisensory perceptual experience shapes how we organize human faces into salient and biologically relevant social categories (e.g., gender, age, and race). The ability to reliably match synchronous faces and voices on the basis of gender information seems to emerge by the age of 6 months (Walker-Andrews et al., 1991). However, the use of synchronous multimodal signals leaves open the question of whether infants are genuinely representing gender across face and voice or are making the match based on speaker identity or idiosyncrasies in visible and audible articulatory or respiratory patterns. Additional studies have overcome this problem by dubbing other voices onto the visual stimuli. For example, Patterson and Werker (2002) showed an encoding of multisensory gender coherence at 8 months by presenting infant-directed (ID) vowels. More recently, we found that infants associated audible and visible female gender attributes only by the age of 9 months, when encoding faces singing ID nursery rhymes. One possible explanation for this difference is that the infant-directed prosodic information may have drawn infants' attention away from face gender. I will discuss these issues and present new data suggesting that early multisensory abilities are shaped by the very nature of social interactions.
Ferran Pons
Title: What's in a mouth? Audiovisual redundancy during language acquisition
Traditionally, research exploring infants' ability to process language has been done in the auditory domain. In typical social interactions, however, infants are usually exposed to audiovisual speech. Indeed, studies have found that as infants grow, they become interested in the source of audiovisual speech, namely the talker's mouth (e.g., Lewkowicz & Hansen-Tift, 2012). Attentional focus on the talker's mouth gives infants access to redundant and highly salient audiovisual speech. In this talk I will present results from studies with infants, children and adults, showing different factors that seem to modulate attention to the eyes and mouth of a talking face. In infancy, factors such as 1) bilingualism, 2) language familiarity, and 3) communication and social abilities seem to play a key role in the use of the audiovisual information available at the mouth. In adults, on the other hand, 1) language proficiency and 2) language similarity seem to be responsible for the relative attention to the talker's mouth.
Finally, I will also discuss how this redundant audiovisual information is used by children with specific language impairment (SLI) as compared to typically developing (TD) children. In this last case, the specific type of language impairment seems to be a good indicator of the use of lip information in speech processing.
Kaisa Tiippana
Title: Audiovisual speech training of school-aged children with specific language impairment (SLI)
We developed a computer-based audiovisual training programme in Finnish for 7- to 10-year-old children with specific language impairment (SLI) to improve their phonological skills. The programme consists of phonological tasks presented audiovisually to one group and auditorily to another group. The children trained for six weeks, 5 days a week, 15 minutes a day. Before and after the training, language and other cognitive skills were assessed with neuropsychological tests and behavioural tasks, e.g. the McGurk paradigm. Speech processing was also measured using EEG with the mismatch negativity (MMN) paradigm. The main result was an improvement in a nonword-repetition test (requiring phonological and memory skills) in the audiovisual group, but not in the auditory group. This suggests that audiovisual speech may be helpful in the rehabilitation of children with SLI. The findings of the other tests and the MMN paradigm will also be discussed.
Jean-Luc Schwartz
Title: Audiovisual speech binding
Speech perception involves a stage of fusion of multisensory information, which enables the listener to improve the quality of the linguistic message to be interpreted. While the audiovisual fusion process has long been conceived of as automatic and as mainly concerned with segmental decoding, it has progressively become clear that multisensory information actually bears on many other linguistic levels (e.g. prosody or lexical access) and that multisensory fusion is not automatic, but rather involves a number of ingredients related to attention, scene analysis, noise, individual or cultural/linguistic variation, age, etc. We will present a review of some of the work done in recent years in Grenoble on the question of "audiovisual binding", that is, the set of conditions and mechanisms under which a given visual speech source may indeed modify the processing of an associated auditory speech source.
The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013 Grant Agreement no. 339152, “Speech Unit(e)s”).
Salvador Soto-Faraco
Title: The Role of Attention in Audio-Visual Speech Integration
The beneficial consequences of multisensory integration for speech perception are by now well known. However, in most cases these interactions have been studied under conditions of low perceptual load and task demands. This situation is very much unlike the problem that our perceptual system faces in the complex scenarios of everyday life, where selection of information seems to be crucial. Here I will argue that some of the processes at play during audio-visual speech integration are subject to strong limitations via attentional selection, or are themselves a manifestation of an attention process. We support this argument with three types of evidence: (1) behavioural and BOLD data showing that responses to congruent AV speech are strongly modulated by the focus of attention; (2) BOLD and EEG data supporting the view that the famous McGurk illusion is mediated by the same conflict brain network that is engaged by stimulus-response incompatibility paradigms; and (3) BOLD and EEG data illustrating that the sight of a speaker's gestures serves to allocate attention to the right moments in time via phase resetting of ongoing oscillations on the listener's side.