SPEECH COMMUNICATION GROUP

______________

F. Néel

The main objective of the group is to develop speech recognizers and synthesizers that can easily be integrated in realistic conditions. This requires studying all the features that characterize spoken language (for example, non-grammatical phrases, prosody and hesitations). The aim is to relieve the user of as many constraints as possible: recognizers must therefore be independent of speaker, language, vocabulary and application, and synthesizers should offer a quality close to that of natural speech. Different approaches are investigated in parallel: self-organizing methods (Markov models), methods inspired by neurobiology (Guided Propagation), and methods based on explicit knowledge. A few illustrative examples are given below:

TOPIC 1 - Speech Analysis and Synthesis (C. d'Alessandro)

Acoustic and perceptual criteria have been used to develop a small number of rules for automatically generating prosodic contours, which may be applied to coded speech as well as to text-to-speech synthesis. Research is also being carried out on pitch modelling, on noise analysis/synthesis of the vocal source, and on the characterization of non-linguistic information in the speech signal.

TOPIC 2 - Variability (M. Eskénazi)

An extensive study of style variations in French led to the development of modification rules that apply to one or several segmental units and characterize the individual strategies speakers adopt when they speak carefully. This study is carried out in connection with oral discourse analysis, in order to automatically place markers between the main constituents of a sentence. Maxine Eskénazi is currently on sabbatical leave at Carnegie Mellon University, where she is developing a project on the same topic.

TOPIC 3 - Recognition (J-L. Gauvain, J-J. Gangolf)

The objective is to develop speaker-independent continuous speech recognition for large-vocabulary dictation (up to 65 000 words). Active participation in the test campaigns organized by ARPA has confirmed the strength of the approach, which is based on Markov models. The system developed for American English and French has been extended to British English and German within the framework of the LRE project Sqale. Other research activities include language identification and speaker verification over telephone lines, within the framework of contracts with CNET and France Telecom. A final line of research concerns the understanding of spontaneously spoken requests in information inquiry applications (train or plane timetables) (Esprit Mask, LE Mlap, Railtel).
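As a purely illustrative sketch of Markov-model decoding (not the recognizer described above), the following Python code finds the most likely hidden state sequence of a toy model with the Viterbi algorithm. The two states, the two observation symbols and all probabilities are invented for the example.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log-probability."""
    # best[t][s]: best log-probability of any path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # choose the predecessor that maximizes the path score
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # trace the best path backwards from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

def to_log(table):
    """Convert a (possibly nested) table of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Toy model (all values are assumptions made for this example).
STATES = ["a", "b"]
LOG_START = to_log({"a": 0.9, "b": 0.1})
LOG_TRANS = to_log({"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.1, "b": 0.9}})
LOG_EMIT = to_log({"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.1, "y": 0.9}})

path, log_prob = viterbi(["x", "y", "y"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path, math.exp(log_prob))
```

Working in log-probabilities avoids numerical underflow on long observation sequences, which matters far more for real acoustic models than for this three-symbol toy.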

TOPIC 4 - Oral Dialogue (F. Néel)

Collaboration with the preceding topic made it possible to validate, in the same class of applications, a functional hierarchical dialogue model represented by a recursive automaton and based on the identification of dialogue acts corresponding to different dialogue situations. An extension of the same approach has been applied to the modelling of multimodal interaction, in close collaboration with the Non-Verbal Communication Group, within the framework of a DRET contract with Sextant-Avionique.
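To make the notion of a recursive automaton over dialogue acts more concrete, here is a minimal Python sketch of a recursive transition network, in which an arc is labelled either with a dialogue act or with the name of a sub-network. The network names, states and dialogue acts (greet, request, ask_detail, and so on) are hypothetical and are not taken from the model described above.

```python
# Each network has a start state, a set of final states, and arcs.
# An arc label is either a dialogue act or the name of another network,
# in which case that network must be traversed before following the arc.
NETWORKS = {
    "DIALOGUE": {
        "start": "s0", "final": {"s3"},
        "arcs": {
            "s0": [("greet", "s1")],
            "s1": [("EXCHANGE", "s2")],
            "s2": [("EXCHANGE", "s2"), ("close", "s3")],
        },
    },
    "EXCHANGE": {
        "start": "q0", "final": {"q2"},
        "arcs": {
            "q0": [("request", "q1")],
            "q1": [("CLARIFY", "q1"), ("answer", "q2")],
        },
    },
    "CLARIFY": {
        "start": "c0", "final": {"c2"},
        "arcs": {
            "c0": [("ask_detail", "c1")],
            # a clarification may itself contain a clarification (recursion)
            "c1": [("CLARIFY", "c1"), ("give_detail", "c2")],
        },
    },
}

def ends(network, acts, pos, state=None):
    """Yield every position at which `network` can finish when started at `pos`."""
    net = NETWORKS[network]
    state = net["start"] if state is None else state
    if state in net["final"]:
        yield pos
    for label, target in net["arcs"].get(state, []):
        if label in NETWORKS:
            # traverse the sub-network, then resume in the calling network
            for mid in ends(label, acts, pos):
                yield from ends(network, acts, mid, target)
        elif pos < len(acts) and acts[pos] == label:
            yield from ends(network, acts, pos + 1, target)

def accepts(acts):
    """True if the whole sequence of dialogue acts is a valid dialogue."""
    return any(end == len(acts) for end in ends("DIALOGUE", acts, 0))

print(accepts(["greet", "request", "ask_detail", "give_detail", "answer", "close"]))
```

The sketch only recognizes sequences of acts; every sub-network here consumes at least one act before it can finish, which is what keeps the recursion finite.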

TOPIC 5 - Linguistic Models for the Spoken Language (G. Adda)

Large-vocabulary continuous speech recognition requires suitable linguistic tools: considerable effort was devoted to developing tools for the creation of large-vocabulary speech and text corpora and of lexicons (including one or several phonemic transcriptions for each entry), as well as to the creation of the corpora themselves. Studies on automatic word classification (using simulated annealing or Monte Carlo methods) are also being carried out, with the aim of improving speech recognition performance.
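The idea of classifying words by simulated annealing can be sketched as follows. This is a toy illustration under stated assumptions, not the method used in the studies above: the objective is assumed to be the class-dependent part of the log-likelihood of a class bigram model, the cooling schedule is geometric, and the corpus, function names and parameter values are invented for the example.

```python
import math
import random
from collections import Counter

def class_bigram_loglik(bigrams, word_class):
    """Class-dependent part of the log-likelihood of a class bigram model."""
    pair, left, right = Counter(), Counter(), Counter()
    for (w1, w2), n in bigrams.items():
        c1, c2 = word_class[w1], word_class[w2]
        pair[(c1, c2)] += n   # class bigram counts
        left[c1] += n         # class counts in first position
        right[c2] += n        # class counts in second position
    score = sum(n * math.log(n) for n in pair.values())
    score -= sum(n * math.log(n) for n in left.values())
    score -= sum(n * math.log(n) for n in right.values())
    return score

def anneal_classes(bigrams, num_classes, steps=2000,
                   t_start=1.0, t_end=0.01, seed=0):
    """Search for a word-to-class map with a high class bigram likelihood."""
    rng = random.Random(seed)
    words = sorted({w for pair in bigrams for w in pair})
    current = {w: rng.randrange(num_classes) for w in words}
    current_score = class_bigram_loglik(bigrams, current)
    initial_score = current_score
    best, best_score = dict(current), current_score
    for step in range(steps):
        # geometric cooling from t_start down to t_end
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        word = rng.choice(words)
        old_class = current[word]
        current[word] = rng.randrange(num_classes)
        new_score = class_bigram_loglik(bigrams, current)
        delta = new_score - current_score
        # always accept improvements; accept degradations with prob. exp(delta / T)
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current_score = new_score
            if current_score > best_score:
                best, best_score = dict(current), current_score
        else:
            current[word] = old_class  # reject the move
    return best, best_score, initial_score

# Toy corpus (invented for the example).
SENTENCES = ["the cat sleeps", "a dog sleeps", "the dog runs", "a cat runs"]
BIGRAMS = Counter()
for sentence in SENTENCES:
    tokens = sentence.split()
    BIGRAMS.update(zip(tokens, tokens[1:]))

classes, score, start_score = anneal_classes(BIGRAMS, num_classes=3)
print(classes, score, start_score)
```

Recomputing the full objective after every move keeps the sketch short; on a realistic vocabulary one would update the counts incrementally, since only the counts involving the moved word change.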

TOPIC 6 - Connectionist Systems (D. Béroule)

As part of the project to build a coincidence detection machine for Human-Machine Communication, Guided Propagation networks have been applied this year to the robust analysis of multimodal commands and natural language utterances. Other components have also been developed: extraction of multiple spectral events for speech recognition, formalization of the control unit, connection of a syntactic module to the reading model, and development of a variability-processing algorithm along two (spatial) dimensions.

Our presence at the international level has been reinforced through collaborations with several research organizations and through our active participation in European projects (Mask, Sqale, EuroCocosda, Relator, Speechdat, Railtel) and Working Groups (BRA Vox). Through the HCM, PECO and ERASMUS programmes, we are able to welcome European students and researchers to our laboratory. We participate in the Network of Excellence "Speech and Language", in which LIMSI is the principal node for France. An extension of the network towards Eastern European countries (Ukraine and Russia) became effective this year, with an INTAS contract.

Industrial contracts have been carried out with several partners (Philips, CNET and France Telecom), in some cases under the auspices of the DRET (with Sextant-Avionique).

A new programme was launched in 1994 for all francophone (French-speaking) countries: under the auspices of AUPELF-UREF, the FRANCIL network (Francophone Network on Language Engineering) encourages contacts within the French-speaking community. A Concerted Research Action (ARC) was set up to organize periodic campaigns for testing speech recognition, synthesis and dialogue systems, and to make linguistic tools (corpora, lexicons, etc.) available within the same community. Other actions (ARPs: Shared Research Actions) allow for exchanges of researchers and for collaborations between Southern and Northern countries. These actions cover the processing of both spoken and written language.

Our active participation in this last programme illustrates our involvement not only in the development of systems but also in their evaluation, through developing the necessary resources and participating in test campaigns such as those organized by ARPA. Other activities in this area include: the Relator project, aimed at defining a European association for the dissemination of linguistic resources and at creating ELRA (European Linguistic Resources Association) in early 1995; the EuroCocosda project, in which we promote European activity in this area and have produced the TED speech database (including native and non-native English speakers), in collaboration with the University of Munich; and the BREF database, recorded at LIMSI with the support of GDR-PRC Man-Machine Communication, of the Francophonie and of the European Communities, soon to be available on CD-ROM. Lastly, we participated in drafting the Eagles recommendation manual. At the national level, within the action set up by two CNRS scientific departments, SPI (Engineering Science) and SHS (Human and Social Sciences), we participate in the GRACE project, which is directed at the evaluation of morpho-syntactic analyzers.

The growth of the Speech Communication Group and the increase in the number of research areas led to a restructuring in January 1995, in order to strengthen links with the Non-Verbal Communication Group for research activities covering a broader domain than automatic speech processing.