Dpt CHM Chap FF
___________________________
J. Mariani
Speech Communication Group
Language and Cognition Group
Non-Verbal Communication Group
Human Cognition Group
The research carried out in the LIMSI "Human-Machine Communication" department covers a large part of this research domain: automatic speech processing (analysis, recognition and synthesis), natural language processing (analysis, understanding and generation), image synthesis, computer vision, gestural and multimodal communication, human factors and cognitive science. It promotes the collaboration of researchers in Engineering Sciences (Computer Science, Signal Processing and Artificial Intelligence) with those in linguistics (phoneticians, specialists in semantics, psycholinguists) and cognitive processes. LIMSI gathers a remarkable set of interdisciplinary expertise, covering the different communication modes from various points of view. The "Human-Machine Communication" Department was created in 1987. In 1994, it was structured into the following four groups: "Speech Communication", "Language and Cognition", "Non-Verbal Communication" and "Human Cognition" (created in 1992 as a special action of the Cognitive Sciences CNRS interdisciplinary program). This structure changed slightly in 1995.
Communication systems to help Humans
The general objective, within the "Communication systems to help humans" heading, is to respond to the present and future needs of society regarding the user-machine relationship. It is important to foresee the ways in which this relationship will change as a result of forthcoming technologies, especially wider interactive access to multimedia information, where intelligent access to such information is still an open problem. The research conducted within the laboratory addresses various user-machine communication modes: spoken and written language, gesture, and vision. Each mode includes the perception aspects (analysis and comprehension of texts, speech, visual scenes, and ergotic or semantic gestures), the generation aspects (concept-to-text generation, speech synthesis, image or gesture synthesis), and the cognitive aspects (knowledge representation, plan generation, reasoning, learning...) underlying the entire process. To study the needs of society and the ways of answering them, Claude Henry, a specialist in the socio-economic aspects of innovation, joined the laboratory in early 1995 to participate in the "Multimodal Human-Machine Communication Platform" project conducted within the "Intelligent Machines and Structures" program of the Engineering Sciences department at CNRS.
Interdisciplinarity and the consideration of the needs of society will therefore be the major guidelines of the research, both basic and applied, conducted at the laboratory.
A National and International presence
The laboratory is well known on the national and international scene. Limsi is a main node of the Esprit BRA Network of Excellence Elsnet (European Language and Speech Network). In this framework, several actions have been conducted aiming at better cooperation with Eastern and Central European countries (Intas contract, survey for the CEC, the Copernicus-Babel project, PECO grants). In 1994, we started the Francil Francophone network in Language Engineering, sponsored by the Aupelf-Uref, which organizes several actions aiming at the corpus-based evaluation of written and spoken language processing systems. We are also jointly responsible, with Inalf (Institut National de la Langue Française), for an action jointly sponsored by the Engineering Sciences (SPI) and Human and Social Sciences (SHS) Departments of CNRS, entitled "Cognition, Intelligent Communication and Language Engineering". Finally, we made an important contribution this year to the launch of the European Language Resources Association (ELRA).
The "Speech Communication" group has reached an excellent international level. This is demonstrated by Limsi's participation in the speech recognition evaluation tests conducted in the US by DARPA (Department of Defense). Those tests were opened in 1992 to non-US laboratories. Three European laboratories participated in this test campaign: Cambridge University Engineering Department (Cued), Philips-Aachen and Limsi. In the very first test campaign, which addressed the recognition of a 1,000-word vocabulary (Resource Management task), Limsi obtained top-level recognition performance. Since then, Limsi has participated in the annual evaluations and its system has regularly appeared among the best, now addressing, in the "Wall Street Journal" dictation task, speaker-independent continuous speech recognition with vocabularies of up to 65,000 words. A similar approach is used for spoken language understanding, speaker recognition, and language identification over the telephone. The research is supported by a variety of contracts from the CEC (Esprit Mask, LRE Sqale, Mlap Railtel) or from industry (France Telecom and CNET).
Still in the area of Spoken Communication, the research addresses hybrid speech synthesis, with results good enough to experiment with this technique in a practical application, through a contract granted by the Philips company. The research on speaking style and voice quality is conducted in the framework of three ESPRIT BRA and Human Capital and Mobility contracts (PLP, Sphere and VOX). Cooperation has been established with the Avicenne and Saint-Antoine hospitals on the application of our research to the domain of cochlear implants. Other research addressing oral dialog has led to an operational system for training student air-traffic controllers, currently being assessed at the ENAC (National School of Civil Aviation), under a contract of the CéNA (Air Traffic Study Agency), together with the Sextant-Avionique, Stéria and Vecsys companies. Those studies also resulted in a contract with DRET on the study of Multimodal Communication. Statistical physics methods (simulated annealing, Monte-Carlo) have been applied to language modelling. Finally, Guided Propagation Connectionist methods have been tested in several domains of Verbal or Non-Verbal Communication, and several theses based on this work have been defended. A Human Capital and Mobility grant has also been awarded on this topic.
The "Language and Cognition" group continues to conduct research organized around a distributed architecture (Caramel). These studies resulted this year in the proposal of a "Carnet d'Esquisses" (Sketchpad) model, which is an extension of the "BlackBoard" model, now well known in Artificial Intelligence. The research in Dialog, carried out in the framework of the Esprit PLUS project, in cooperation with the French Railway company (SNCF - Platon System), aims at modelling the plans and beliefs of users, including the study of Indirect Speech Acts. Work in Text Generation has been applied to language training and to aids for the handicapped (with the Centre de Kerpape and Thomson-CSF). Recurrent Neural Nets (recursive self-associative memories) have been applied to knowledge representation, and the research in semantics is conducted in close cooperation with linguists, using the Sowa conceptual graphs formalism. A robust Document Processing system has been developed in collaboration with the Resoudre company and with the Speech Communication group. The studies on Time representation, which led to the proposal of the "Generalized Intervals" concept, have now been extended to Spatial representation. The use of a simplified 3D modelling software package marks the start of a joint Image-Language action. This work is conducted in cooperation with the Human Cognition group (especially in the framework of a contract with the Renault company for car navigation).
The "Non-Verbal Communication" group considers communication by means other than spoken or written language. Activities in 3D modelling and image synthesis have been reinforced recently by the hiring of two CNRS researchers, and the activities in Computer Vision and gestural communication by the hiring of two Assistant Professors. A new version of our proprietary 3D modelling software package (Sculptor II) has been produced, running under UNIX/X11 and GL. The coupling of visual analysis and synthesis is a focused research effort (ROSA project). Gestural communication (using a DataGlove™) is also an important research field, with the aims of building a sensorimotor model and studying French sign language (LSF). Important progress has been made in the field of multimodal communication. A graphical object design system (LimsiDraw) using voice command, a touch screen and image synthesis has been completed. Specimen, a multimodal user-machine interface specification tool, has since been used to design a text editor for blind users (Meditor). Another project led to the implementation of a system mixing spoken and graphic interaction (Mix3D), which will be used as an experimental platform for studying the use of multimodal interaction in CAD. Finally, a contract has been granted by the DRET (French DoD), with Sextant Avionique and in cooperation with the "Speech Communication" group, on the modelling of multimodality.
The activities of the "Human Cognition" group complement the activities of the three other groups in terms of spatial cognition, text understanding mechanisms, and knowledge acquisition and representation. Many collaborations exist between this group and the others: with speech processing specialists for studying perceptive processes and variability in verbal wording, with language specialists for time and space representation and neural models, and with specialists in Non-Verbal Communication for multimodal communication. Studies have been conducted with the Renault car company in the field of car navigation analysis (with the "Language and Cognition" group). Others have been conducted for DRET (on "Cognitive Maps Elaboration" and on "Vigilance and Attention Load") and Electricité de France (EDF) (on "Aid to the decision for maintenance operations"). Cooperative actions exist, within the "Cognition Sciences" program of the Ministry of Research, with the Orsay Hospital and the Laboratoire de Physiologie de la Perception et de l'Action (LPPA, Collège de France) for the study of the cerebral activity areas in cognitive tasks, and the laboratory is one of the four partners in the National Cognisciences Research Action "Space Representation". A CNRS-NSF cooperation has been established with Northeastern University (Boston) on metaphor understanding. Finally, the group is responsible for a "Human Capital and Mobility" Network on the role of image and language in spatial cognition.
Language resources and the Evaluation paradigm
The evaluation paradigm has had a great influence on recent developments in written and spoken language processing. It consists of setting up, alongside a research project, the resources (speech and text corpora), tools, and evaluation methods needed to build systems, to iteratively improve their performance, to measure formally the progress achieved and to compare in detail different methods on the same data. This paradigm has been an important factor in the success of the ARPA Human Language Technology program. It is regrettable that no comparable test infrastructure yet exists in France, or even in Europe, although similar actions have now been started in Europe. In the domain of evaluation, we participate in the study presently conducted by the "Evaluation Study Group" set up by the CEC. We also participate in the LRE Sqale project, on the evaluation of speech recognition systems in a multilingual context. We also initiated an action on the evaluation of morpho-syntactic taggers (GRACE), in the framework of the "Cognition, Intelligent Communication and Language Engineering" SPI-SHS CNRS program, which set up a Coordinating Committee gathering about 15 specialists from various French laboratories, in order to prepare the test campaign planned for November 1995. We initiated a program within the "Actions de Recherche Concertées" (ARC) of Aupelf-Uref, which proposes to use the evaluation paradigm for assessing systems and methods for the automatic processing of the French language (both written and spoken). This four-year, 2 MEcu program received a very good response (about 100 proposals) to a first Call for Proposals, of which about 60 proposals have been retained.
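As a purely illustrative sketch of what "comparing different methods on the same data" can mean in practice, the following Python fragment scores two hypothetical speech recognizers against one shared set of reference transcriptions using a word error rate. The function names, the example sentences and the choice of word error rate as the metric are assumptions made for this illustration; they are not taken from the evaluation campaigns described above.

```python
from typing import List


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis into the reference (edit distance)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by total reference words over a test set."""
    if len(references) != len(hypotheses):
        raise ValueError("each reference needs exactly one hypothesis")
    total_errors = sum(
        word_errors(ref.split(), hyp.split())
        for ref, hyp in zip(references, hypotheses)
    )
    total_words = sum(len(ref.split()) for ref in references)
    return total_errors / total_words


if __name__ == "__main__":
    # Hypothetical shared test data and outputs of two hypothetical systems.
    references = ["the stock market rose today", "shares fell sharply"]
    system_a = ["the stock market rose today", "shares fell sharp"]
    system_b = ["the stock markets rose", "share fell sharply"]
    for name, output in (("A", system_a), ("B", system_b)):
        print(f"system {name}: WER = {word_error_rate(references, output):.3f}")
```

Because both systems are scored with the same function on the same references, their error rates are directly comparable, which is the property the paradigm relies on.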
We are also present in the field of corpora and linguistic resources, through our participation in the LRE-Relator and Eurococosda, LE Mlap Speechdat and Copernicus Babel projects, and in international working groups (Elsnet Resource Reusability Task Group, Eagles, Cocosda, ELRA Interim Board). We bring the experience we gained in the design and realisation of the very large BREF corpus, based on Le Monde newspaper texts, and in the design of a multi-accent English speech corpus (TED), to the promotion of an international action on newspaper texts (NEWS), to the design of an oral corpus of information requests for air (French ATIS) and rail (MASK, Railtel) travel, and to large telephone corpora (Ideal, Speechdat). In the domain of written texts, another internal action is related to the use of monolingual corpora for the design of bilingual dictionaries.
Multimodal Communication and Multimodal Learning
The use of different communication modes aims at enabling more natural and more efficient communication between humans and machines. However, it also raises the problems of extracting and integrating information coming from various sources, such as the co-reference problem, when the user points to an object with a gesture and accompanies this action with a verbal command ("Put that there!"). We do not believe in the surface integration of pre-existing modules. We think that it is important to study each of these communication modes as a research domain in its own right, in order to be able to link the deep-level information extracted from the various modes so as to understand the transmitted message.
It appears that humans learn communication modes together, not independently. Spoken language is acquired along with touch, vision, movement... It can be thought that in the future, it will be necessary to be able to mix stimuli coming from various modes (gesture, speech, vision...) so as to learn the model associated with each of those stimuli, in the framework of multimodal learning, based on a symbolic, statistical, or neural net approach. This approach implies the existence of multimodal databases, which are very expensive and time-consuming to obtain. We have started recording such databases, in the framework of a contract with Renault, involving (as a combined effort) the "Human Cognition" and the "Language and Cognition" groups.
A Multimodal Human-Machine Communication Platform
An internal, multi-year "Program Action" has started, conducted in the framework of the "Intelligent Machines and Structures" priority program of the CNRS SPI Department and of the SHS-SPI cooperative "Cognition, Intelligent Communication and Language Engineering" and "Production Systems" programs. The topic of this action is Multimodal Communication, and it therefore covers the full scope of the department. It includes stereovision scene analysis, 3D modelling, language analysis and knowledge (including time and space) representation, and spoken and gestural communication. It implies the building, structuring and management of multimodal databases. It is foreseen to use the evaluation paradigm in order to measure the progress of research, and to compare methods on the same basis. In addition, it will aim to answer a need expressed by researchers in their own scientific area: the need to be able to test hypotheses with a set of up-to-date and integrated experimental tools (basic research), or the need to objectively evaluate algorithms in a modular standard environment (technological research).
In the first phase, we set up a committee headed by J.S. Liénard, who chaired the "Human-Machine Communication" CNRS COST Committee which proposed the platform concept. In that phase, the communication modalities, both for input and output, were identified, together with a terminology of that domain. In the second phase, an action committee was created, headed by F. Néel, which includes representatives from each group of the HMC department. Generic applications have been proposed and a software environment has been selected. This action may extend to other laboratories in the framework of a Competence Network, and industrial companies can also participate. Coordination with national, European and international actions is sought in order to share the research effort. This should lead to major contributions on very challenging questions: How to obtain the collaboration of Engineering Sciences researchers (computer scientists and signal processing specialists) with linguists and psychologists on common projects? How to obtain the collaboration of written language and spoken language specialists? How to integrate various communication modes on a single platform? What are the links between the various modes? How can they cooperate to create meaning? How to use the evaluation paradigm to improve the national research in this area? Which products can be designed to answer the demands of society regarding user-machine communication? What will the socio-economic outcome be?
The "Human-Machine Communication" Department was restructured in 1995. The resulting new structure allows for a better balance between the groups, by focusing the spoken language processing activities on that specific topic and by reinforcing the "Non-Verbal Communication" group with activities in the field of multimodality (dialog structures and connectionist models for the perception-action relationship). The new structure came into existence in January 1995, together with a reorganization of transversal actions (Multimodal Human-Machine Communication Platform, seminars and reading meetings, language resources, servers and evaluation means).