Spoken Language Processing Group at LIMSI | Affective and social dimensions of spoken interactions

Affective and social dimensions of spoken interactions

Laurence Devillers, Clement Chastagnol, Agnes Delaborde, Mohamed El Amine Sehili, Joseph Mariani, Mariette Soury, Ioana Vasilescu, Fan Yang

The detection of affective and social dimensions is being applied both to human-machine interaction with robots and to the analysis of audiovisual and audio documents such as call center data. The main research subjects in this area are the identification of emotions and social cues in human-robot interaction, emotion detection based on verbal and non-verbal cues (acoustic, visual and multimodal), dynamic user profiles (emotional and interactional dimensions) in dialog for assistive robotics, and multimodal detection of anxiety applied to therapeutic serious games.

In order to design affective interactive systems, experimental grounding is required to study expressions of emotion and social cues during interaction. Unlike emotions, socio-cultural cues are voluntarily controlled. In human interaction, nonverbal elements such as gesture, facial expression and paralinguistic cues are valuable for a more precise understanding of the communicated message. Voice and speech play a fundamental role in social interactions, but they have been relatively neglected, at least in recent years, compared to other aspects of social exchanges such as facial expressions or gestures. There is a tendency within the area of emotion-oriented computing to use very exaggerated and unnatural emotional data portrayed by actors. It seems increasingly clear that this strategy is not effective, because the forms of expression that occur in natural interactions are fundamentally different from those that actors generate on command.

Since 2001, the work on speech carried out in this theme has been based on genuinely naturalistic material. The team was one of the first to grasp the issue, and is one of a very small number of research groups that have consistently taken on the challenge of finding, annotating and analysing databases of real-life emotional data. The team has collected and analysed emotional speech databases from financial consultations, calls for medical help and human-robot interactions. Studies were conducted on various levels of fear (stress, anxiety, fear, panic), anger (annoyance, anger), sadness (disappointment, sadness, depression) and positive feelings (relief, satisfaction, enjoyment, pride). To exploit these data, the team has developed analysis techniques that extract spectral, prosodic and affect-burst markers, as well as automatic emotion detection systems based on machine learning techniques such as Support Vector Machines (SVM). Recent comparisons show that these systems are on a par with those developed by other members of the international community.
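As a rough illustration of SVM-based emotion detection from acoustic markers, the sketch below trains a linear SVM by sub-gradient descent on the regularised hinge loss. It is not the team's system: the function names (`train_linear_svm`, `predict`), the two features (normalised pitch range and energy) and the toy data are all assumptions made for this example, and a linear model stands in for whatever SVM variant and feature set were actually used.

```python
import random

def train_linear_svm(samples, labels, epochs=200, lam=0.01, lr=0.1, seed=0):
    """Train a linear SVM by sub-gradient descent on the regularised hinge loss.

    samples: list of feature tuples; labels: +1 or -1 for each sample.
    Returns the weight vector and the bias.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 regularisation: shrink the weights slightly at every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge loss: only samples inside the margin move the boundary.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical toy data: (normalised pitch range, normalised energy).
# +1 stands for "anger", -1 for "neutral"; the values are invented.
FEATURES = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.85), (0.85, 0.7),
            (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.15, 0.15)]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_linear_svm(FEATURES, LABELS)
print(predict(weights, bias, (0.95, 0.9)), predict(weights, bias, (0.05, 0.1)))
```

In practice, real-life emotional speech would call for many more features, several emotion classes and evaluation on speakers not seen during training.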

A social robot that is sensitive to emotions should not only take momentary emotions into account, but also maintain a representation of the user's emotional and interactional profile across interactions, so that its behavioural responses have a better chance of being relevant. As a first step, we have studied the way paralinguistic cues affect human-robot interaction by linking the low-level cues computed from speech to an emotional and interactional profile of the user. Being able to predict which specific behaviour is likely to trigger pleasure in the user is an advantage. For example, a dominant person with high self-confidence will not need to be encouraged to interact, and such encouragement could even be perceived as irrelevant or boring. The system would provide a closed interaction loop, in which the robot reacts to the emotional message of the human and triggers an emotional response in the human through appropriately chosen behaviours. In many cases, voice is not the only cue available to identify emotions and social stances. In our next research steps, we propose to extract multimodal dimensions using gaze tracking (with a webcam), posture detection (with a Kinect 3D sensor) and a few physiological cues such as EEG measured with non-invasive sensors, in order to improve the performance of our systems.
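The idea of a profile that evolves across interactions can be sketched as follows. This is only an illustration under stated assumptions: the `UserProfile` class, its two dimensions, the exponential smoothing and the 0.7 threshold are hypothetical choices for the example, not the model used by the team.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    """Hypothetical running profile; every score is assumed to lie in [0, 1]."""
    dominance: float = 0.5
    self_confidence: float = 0.5
    alpha: float = 0.3  # weight given to the most recent speaker turn

    def update(self, dominance, self_confidence):
        # Exponential smoothing: recent turns count more, but a single
        # momentary emotion does not overwrite the whole profile.
        self.dominance += self.alpha * (dominance - self.dominance)
        self.self_confidence += self.alpha * (self_confidence - self.self_confidence)

def should_encourage(profile, threshold=0.7):
    """Skip encouragement for users who appear dominant and self-confident."""
    confident = (profile.dominance >= threshold
                 and profile.self_confidence >= threshold)
    return not confident

# Per-turn estimates would come from the speech analysis; these are invented.
profile = UserProfile()
for turn_dominance, turn_confidence in [(1.0, 1.0), (1.0, 1.0)]:
    profile.update(turn_dominance, turn_confidence)
    print(profile.dominance, should_encourage(profile))
```

The robot's behaviour selection would then consult the profile rather than the last detected emotion alone, which is what closes the interaction loop described above.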

The research subjects developed in this area are: speaker and emotion identification in human-robot interaction; emotion detection for analysing the quality of Client/Agent interaction in call center data; engagement in human-robot interaction; emotion detection based on acoustic, visual and physiological cues for assistive robotics; and multimodal detection of anxiety for the design of a serious game with a therapeutic purpose.

Applications and projects

The detection of affective and social dimensions can be used for human-machine communication with robots, but also for the analysis of audiovisual documents, with applications in health, security, education, entertainment and serious games. For example, in the framework of the Cap Digital FUI VoxFactory project (2009-2011), we aim to analyse the quality of Client/Agent interactions in call center data. Robotics is a relevant framework for assistive applications owing to the learning abilities and skills of robots. In the framework of the ANR Tecsan ARMEN project (2010-2013), we are involved in designing and building an assistive robot to help elderly people remain in their home environment. We also participated in the Cap Digital FUI ROMEO project (2009-2012) and now participate in OSEO PSPC ROMEO2 (2013-2017), whose main goal is to build a social humanoid robot (a big brother of the NAO robot developed by Aldebaran Robotics) that can act as a comprehensive assistant for persons suffering from loss of autonomy. We are a member of the ROMEO Social Committee, which aims to provide a societal vision on the design of the robot. The FEDER E-THERAPIES project (2012-2015) is devoted to the design of immersive serious games with a therapeutic purpose, based on verbal and non-verbal interaction and on role-playing techniques. In the framework of the ANR EMCO COMPARSE project (2012-2015), we also study, in collaboration with the CPU group, the relationships between cognition, motivation and personality for emotional adaptation and regulation, using empathic virtual simulation. The European CHIST-ERA project JOKER (2013-2016), JOKe and Empathy of a Robot/ECA: Towards social and affective relations with a robot, will emphasize the fusion of verbal and non-verbal channels for emotional and social behavior perception, interaction and generation capabilities. Our paradigm involves two types of decision in the dialogs: intuitive (mainly based upon non-verbal multimodal cues) and cognitive (based upon the fusion of semantic and contextual information with non-verbal multimodal cues).

Perspectives

Research Challenges:

Fusion between linguistic and paralinguistic channels, multimodal fusion
Deep Neural Networks for cross-corpora experiments, adaptation learning techniques
Affective dialog systems, long-term relationships
Social interaction with robots, ethics and affective robotics

In the near future, socially assistive robotics aims to address critical areas and gaps in care by automating the supervision, coaching, motivation and companionship aspects of one-to-one interactions with individuals from various large and growing populations, including the elderly, children, disabled people and individuals with social phobias, among many others. The ethical issues, including the safety, privacy and dependability of robot behaviour, are also increasingly widely discussed. The scientific and technological development of robots must therefore be accompanied by deeper ethical reflection, to ensure that their relationship with human beings is harmonious and acceptable. We are also involved in the working group on ethics for research in robotics of CERNA (Committee on the Ethics of Research in Digital Sciences and Technologies of Allistene).

Videos

Living with robots

Data collection E-Therapies (video)

Data collection ANR TECSAN ARMEN (video)

Some publications

Current Projects

FEDER E-THERAPIES (Design of immersive serious games with a therapeutic purpose, based on verbal and non-verbal interaction and on role-playing techniques)
CHIST-ERA JOKER (JOKe and Empathy of a Robot/ECA: Towards social and affective relations with a robot; emphasizes the fusion of verbal and non-verbal channels for emotional and social behavior perception, interaction and generation capabilities)
ROMEO2 (Humanoid robot assistant)
ANR EMCO COMPARSE (Cognition, motivation and personality for emotional adaptation and regulation, using empathic virtual simulation)