Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
 

Spoken Language Processing Group (TLP)

Research Topics


The Spoken Language Processing group carries out research aimed at understanding human speech communication processes and at developing models for the automatic processing of speech. This research is interdisciplinary by nature, drawing upon expertise in signal processing, acoustic-phonetics, phonology, semantics, statistics and computer science. The group's research activities are validated by developing systems for the automatic processing of spoken language, such as speech recognition, language identification, multimodal characterization of speakers and their affective state, named-entity extraction and question answering, spoken dialog, multimodal indexing of audio and video documents, and machine translation of both spoken and written language.

With the aim of extracting and structuring information in audio documents, the group develops models and algorithms that combine diverse sources of information to carry out a global decoding of the signal. This decoding can be applied to identify the speaker, the language being spoken (if it is not known a priori) or the speaker's affective state, to transcribe or translate the speech, and to identify specific entities.

The group's research is structured around seven interdependent topics: Speaker characterization in a multimodal context (Topic 1); Affective and social dimensions of spoken interactions (Topic 2); Perception and automatic processing of variation in speech (Topic 3); Robust analysis of spoken language and dialog systems (Topic 4); Translation and machine learning (Topic 5); Speech recognition (Topic 6); Language resources (Topic 7).

1. Speaker characterization in a multimodal context

Speaker recognition consists of determining who spoke when, where the identity can be that of the true speaker or an identity specific to one document or to a set of documents. Different sources of information can be used to identify the speaker in multimedia documents (the speaker's voice, what is said, or what is written). The group leads the QCOMPERE consortium for the REPERE challenge.
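
As a minimal sketch of the clustering step behind determining who spoke when, the code below assigns speaker labels to speech segments by clustering per-segment speaker embeddings. The embedding extractor and segment boundaries are assumed to be given; this is a generic illustration, not the group's actual system.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def diarize(embeddings, distance_threshold=1.0):
        """Assign a speaker label to each segment by agglomerative
        clustering of per-segment speaker embeddings."""
        # L2-normalise so Euclidean distance behaves like cosine distance.
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        x = embeddings / np.maximum(norms, 1e-12)
        clustering = AgglomerativeClustering(
            n_clusters=None,            # let the threshold decide the speaker count
            distance_threshold=distance_threshold,
            linkage="average",
        )
        return clustering.fit_predict(x)

    # Toy usage: 4 segments, 2 speakers (random stand-ins for real embeddings).
    rng = np.random.default_rng(0)
    segs = np.vstack([rng.normal(0, 0.1, (2, 16)) + 1,
                      rng.normal(0, 0.1, (2, 16)) - 1])
    print(diarize(segs))   # e.g. [0 0 1 1]

Letting a distance threshold, rather than a fixed cluster count, decide the number of speakers reflects the usual setting in audio documents, where the number of speakers is not known in advance.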

2. Affective and social dimensions of spoken interactions

The detection of affective and social dimensions is applied both to human-machine interaction with robots and to the analysis of audiovisual and audio documents such as call-center data. The main research subjects in this area are the identification of emotions and social cues in human-robot interaction, emotion detection based on verbal and non-verbal cues (acoustic, visual and multimodal), dynamic user profiling (emotional and interactional dimensions) in dialog for assistive robotics, and the multimodal detection of anxiety applied to therapeutic serious games.
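
As a concrete illustration of emotion detection from acoustic cues alone, the sketch below maps utterance-level MFCC statistics to emotion labels with a standard classifier. The feature set, label inventory and file lists are illustrative assumptions, not the group's actual models (which also exploit visual and multimodal cues).

    import numpy as np
    import librosa
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def utterance_features(wav_path):
        """Mean and standard deviation of MFCCs over the whole utterance."""
        y, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    def train_emotion_classifier(train_paths, train_labels):
        """train_paths / train_labels: assumed lists of wav files and
        emotion labels (e.g. "anger", "joy", "neutral")."""
        X = np.vstack([utterance_features(p) for p in train_paths])
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
        return clf.fit(X, train_labels)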

3. Perception and automatic processing of variation in speech

The very large corpora used for training statistical models are also exploited for linguistic studies of spoken language, such as acoustic-phonetics, pronunciation variation and diachronic evolution. Automatic alignment enables studies on hundreds to thousands of hours of data, permitting the validation of hypotheses and models. This topic also covers human and machine transcription errors, studied via perception experiments.
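
As a toy illustration of how automatic alignment supports such studies, the sketch below counts the relative frequency of pronunciation variants from per-word alignment records; the tab-separated record format is a hypothetical stand-in for real aligner output.

    from collections import Counter, defaultdict

    def variant_frequencies(alignment_lines):
        """alignment_lines: iterable of 'word<TAB>pronunciation' records,
        e.g. one per aligned word token (format is illustrative)."""
        counts = defaultdict(Counter)
        for line in alignment_lines:
            word, pron = line.rstrip("\n").split("\t")
            counts[word][pron] += 1
        return {word: {pron: n / sum(c.values()) for pron, n in c.items()}
                for word, c in counts.items()}

    records = ["interest\tIH N T R AH S T",
               "interest\tIH N T AH R AH S T",
               "interest\tIH N T R AH S T"]
    print(variant_frequencies(records)["interest"])
    # two-syllable variant ~0.67, three-syllable variant ~0.33

Run over thousands of hours of aligned speech, the same counting logic yields variant distributions large enough to test phonological hypotheses.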

4. Robust analysis of spoken language and dialog systems

Robust analysis methods for spoken language are developed in the framework of open-domain information retrieval, with applications to language understanding for dialog systems, to named-entity recognition, and to interactive question-answering systems supporting both spoken and written language.
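
To make the named-entity recognition task concrete, here is a deliberately simplified sketch based on a gazetteer and a surface pattern; the systems described above are statistical and far more robust, and all names and patterns below are illustrative.

    import re

    # Illustrative gazetteer and date pattern; a real system learns
    # statistical entity models rather than using fixed lists.
    GAZETTEER = {"Paris": "LOC", "CNRS": "ORG", "LIMSI": "ORG"}
    YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

    def tag_entities(text):
        """Return (surface form, label, character offset) triples."""
        found = [(m.group(0), label, m.start())
                 for name, label in GAZETTEER.items()
                 for m in re.finditer(re.escape(name), text)]
        found += [(m.group(0), "DATE", m.start())
                  for m in YEAR_RE.finditer(text)]
        return sorted(found, key=lambda e: e[2])

    print(tag_entities("LIMSI is a CNRS laboratory near Paris (page from 2015)."))
    # -> LIMSI/ORG, CNRS/ORG, Paris/LOC and 2015/DATE, in order of appearance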

5. Translation and machine learning

Research activities on the statistical machine translation of speech and text focus on the design and development of novel language and translation models, as well as novel decoding strategies, and are closely related to the development of machine learning methodologies. Two major achievements during this period are the Wapiti open-source software for large-scale linear-chain CRFs, and the development of new architectures and training strategies for neural network language models.
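
As an illustration of the neural network language models mentioned above, the sketch below is a minimal feedforward n-gram model in PyTorch: it embeds the previous words, concatenates the embeddings, and scores the whole vocabulary for the next word. This is a generic Bengio-style sketch, not the group's actual architectures or training strategies.

    import torch
    import torch.nn as nn

    class FeedforwardLM(nn.Module):
        """Predict the next word from the previous (n-1) words."""
        def __init__(self, vocab_size, context_size=3, emb_dim=64, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.net = nn.Sequential(
                nn.Linear(context_size * emb_dim, hidden),
                nn.Tanh(),
                nn.Linear(hidden, vocab_size),
            )

        def forward(self, context_ids):      # (batch, context_size)
            e = self.embed(context_ids)      # (batch, context_size, emb_dim)
            return self.net(e.flatten(1))    # (batch, vocab_size) logits

    # Toy training step on random word ids; real training uses text corpora.
    vocab = 1000
    model = FeedforwardLM(vocab)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    ctx = torch.randint(0, vocab, (32, 3))
    tgt = torch.randint(0, vocab, (32,))
    loss = nn.functional.cross_entropy(model(ctx), tgt)
    loss.backward()
    opt.step()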

6. Speech recognition

Speech recognition is the process of transcribing the speech signal into text. Depending upon the targeted use, the transcription can be complemented with punctuation and with paralinguistic information such as hesitations, laughter or breath noises. Research on speech recognition relies on supporting research in acoustic-phonetic modeling, lexical modeling and language modeling (a problem also addressed for machine translation), which are undertaken in a multilingual context (18 languages). This topic also includes research on language recognition, that is, determining the language and/or dialect of an audio document, for both wideband and telephone-band speech.
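
As a minimal illustration of the language-recognition idea, the sketch below scores an input sequence under per-language n-gram models and picks the highest likelihood. Character trigrams on text serve here as a stand-in for the phone sequences a phonotactic system would model; the tiny training texts are toy data.

    from collections import Counter
    import math

    def trigram_model(text, alpha=1.0):
        """Add-alpha smoothed character-trigram counts for one language."""
        grams = Counter(text[i:i+3] for i in range(len(text) - 2))
        return {"grams": grams, "total": sum(grams.values()),
                "alpha": alpha, "vocab": len(grams) + 1}

    def score(model, text):
        g, t, a, v = (model[k] for k in ("grams", "total", "alpha", "vocab"))
        return sum(math.log((g[text[i:i+3]] + a) / (t + a * v))
                   for i in range(len(text) - 2))

    models = {"en": trigram_model("the quick brown fox jumps over the lazy dog"),
              "fr": trigram_model("le traitement automatique de la parole")}
    utt = "la reconnaissance de la langue"
    print(max(models, key=lambda lang: score(models[lang], utt)))  # -> 'fr'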

7. Language resources

In addition to the collection, annotation and sharing of varied corpora, this research topic addresses more general investigations on language resources, covering data, tools, evaluation and meta-resources (guidelines, methodologies, metadata, best practices), for spoken and written language as well as for multilingual, multimodal and multimedia data. These activities are mostly conducted in collaboration with national and international organizations and networks.



