Spoken Language Processing Group (TLP)
Research Topics
The Spoken Language Processing group carries out research aimed at
understanding the human speech communication processes and developing models
for use in automatic processing of speech. This research is by nature
interdisciplinary, drawing upon expertise in signal processing,
acoustic-phonetics, phonology, semantics, statistics and computer science. The
group's research activities are validated by developing systems for automatic
processing of spoken language such as speech recognition, language
identification, multimodal characterization of speakers and their affective
state, named-entity extraction and question-answering, spoken dialog,
multimodal indexing of audio and video documents, and machine translation of
both spoken and written language.
With the aim of extracting and structuring information in audio documents, the
group develops models and algorithms that use diverse sources of information
to carry out a global decoding of the signal. These can be applied to identify
the speaker, the language being spoken (if it is not known a priori), and the
affective state, as well as to transcribe or translate the speech and to
identify specific entities.
The research of the group is structured in seven interdependent topics:
Topic 1: Speaker characterization in a multimodal context;
Topic 2: Affective and social dimensions of spoken interactions;
Topic 3: Perception and automatic processing of variation in speech;
Topic 4: Robust analysis of spoken language and dialog systems;
Topic 5: Translation and machine learning;
Topic 6: Speech recognition;
Topic 7: Language resources.
Speaker recognition consists of determining who spoke when, where the identity
can be that of the true speaker or an identity specific to one document or a
set of documents. Different sources of information can be used to identify the
speaker in multimedia documents (the speaker's voice, what is said, or what is
written). The group is leading the QCOMPERE consortium for the REPERE challenge.
Affective and social dimension detection is being applied both to
human-machine interaction with robots and to the analysis of audiovisual and
audio documents such as call-center data. The main research subjects in this
area are the identification of emotion and social cues in human-robot
interaction, emotion detection based on verbal and non-verbal cues (acoustic,
visual and multimodal), dynamic user profiles (emotional and interactional
dimensions) in dialog for assistive robotics, and multimodal detection of
anxiety applied to therapeutic serious games.
The very large corpora used for training statistical models are exploited for
linguistic studies of spoken language, such as acoustic-phonetics,
pronunciation variation and diachronic evolution. Automatic alignment enables
studies on hundreds to thousands of hours of data, permitting the validation
of hypotheses and models. This topic also studies human and machine
transcription errors via perception experiments.
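The dynamic-programming principle underlying such automatic alignment can be illustrated with a minimal dynamic time warping (DTW) routine between two feature sequences. This is only a sketch of the alignment idea: real forced alignment of transcripts to audio relies on acoustic models, and the scalar sequences below stand in for actual feature vectors.

```python
# Minimal dynamic time warping (DTW) between two 1-D feature sequences.
# Illustrative only: real forced alignment uses acoustic models over
# multidimensional features, not scalar sequences.

def dtw(a, b):
    """Return the minimal cumulative alignment cost between a and b."""
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # stay on a[i-1]
                                 cost[i][j - 1],      # stay on b[j-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[len(a)][len(b)]
```

A repeated frame in one sequence, as with a lengthened vowel, is absorbed by the warping at no extra cost when the values match.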
Robust analysis methods for spoken language are developed in the framework
of open-domain information retrieval, with applications to language
understanding for dialog systems, to named-entity recognition, and to
interactive question-answering systems supporting both spoken and written
language.
Research activities on statistical machine translation of speech and text
focus on the design and development of novel language and translation models
as well as novel decoding strategies, and are closely related to the
development of machine learning methodologies. Two major achievements during
this period are the Wapiti open-source software for large-scale linear-chain
CRFs, and the development of new architectures and training strategies for
neural network language models.
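The decoding step of a linear-chain model such as a CRF can be sketched as a Viterbi search over emission and transition scores. This is a minimal illustration of the algorithm, not Wapiti's implementation; the label set and scores used in practice come from trained feature weights.

```python
# Minimal Viterbi decoding for a linear-chain model (CRF-style scores).
# Emission and transition scores are supplied directly here; in a real
# CRF they are computed from trained feature weights.

def viterbi(emissions, transitions, labels):
    """emissions: one dict {label: score} per position;
    transitions: dict {(prev_label, label): score}.
    Returns the highest-scoring label sequence."""
    # Initialize with the first position's emission scores.
    scores = {y: emissions[0][y] for y in labels}
    backptrs = []
    for em in emissions[1:]:
        new_scores, ptrs = {}, {}
        for y in labels:
            # Best previous label leading into y.
            best_prev = max(labels,
                            key=lambda p: scores[p] + transitions[(p, y)])
            new_scores[y] = (scores[best_prev]
                             + transitions[(best_prev, y)] + em[y])
            ptrs[y] = best_prev
        scores = new_scores
        backptrs.append(ptrs)
    # Backtrack from the best final label.
    last = max(labels, key=lambda y: scores[y])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return list(reversed(path))
```

The search is linear in sequence length and quadratic in the number of labels, which is what makes linear-chain decoding tractable at the scales mentioned above.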
Speech recognition is the process of transcribing the speech signal into
text. Depending upon the targeted use, the transcription can be complemented
with punctuation and with paralinguistic information such as hesitations,
laughter or breath noises. Research on speech recognition relies on supporting
research in acoustic-phonetic modeling, lexical modeling and language modeling
(a problem also addressed for machine translation), which are undertaken in a
multilingual context (18 languages). This topic also includes research on
language recognition, that is, determining the language and/or dialect of an
audio document, for both wideband and telephone-band speech.
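One family of language-recognition approaches scores n-gram statistics of recognized phone sequences under per-language models. The toy sketch below illustrates that scoring idea on symbol strings; the training samples, function names, and add-one smoothing are illustrative assumptions, and real systems run phone recognizers over audio with properly smoothed language models.

```python
# Toy phonotactic-style language identification: score a symbol
# sequence under per-language bigram counts. The data and smoothing
# below are invented for illustration; real systems decode phones
# from audio and use properly smoothed models.
from collections import Counter
import math

def bigrams(seq):
    return list(zip(seq, seq[1:]))

def train(samples):
    """samples: dict {lang: list of training strings} -> bigram counts."""
    return {lang: Counter(bg for s in texts for bg in bigrams(s))
            for lang, texts in samples.items()}

def identify(models, utterance):
    """Return the language whose bigram model assigns the utterance
    the highest add-one-smoothed log-likelihood."""
    def score(counts):
        total = sum(counts.values())
        vocab = len(counts) + 1
        return sum(math.log((counts[bg] + 1) / (total + vocab))
                   for bg in bigrams(utterance))
    return max(models, key=lambda lang: score(models[lang]))
```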
In addition to the collection, annotation and sharing of varied corpora, this
research topic addresses more general investigations on Language Resources,
covering data, tools, evaluation and meta-resources (guidelines,
methodologies, metadata, best practices), for spoken and written language, but
also for multilingual, multimodal, and multimedia data. These activities are
mostly conducted in collaboration with national and international
organizations and networks.
Last modified: Thursday, 16-July-15 03:36:40 CEST