Field Trials of a Telephone Service for Rail Travel Information
(from LIMSI 1996 Scientific Report)
L. Devillers, S. Rosset, S. Bennacef,
J.L. Gauvain, J.J. Gangolf, S. Foukia, L. Lamel
Object
The goal of this research is to evaluate the potential of a spoken language
system for access to rail travel information via the telephone. A
particularity of telephone information services is that all
interaction with the user and all information returned by the system
must be exchanged vocally, making oral dialog management and response
generation very important aspects of the system design and usability.
Content
The RailTel spoken language system is largely based on the
spoken language system developed for the Esprit Mask project
[1,2]. The system runs on a Unix workstation with a high quality
telephone interface which can support up to 4 telephone lines. The
continuous speech recognizer has a recognition vocabulary of 1500
words, including 600 station names. The recognizer was adapted to
deal with telephone quality speech. The recognizer output is passed
to the natural language understanding component which carries out a
caseframe analysis and generates a semantic frame representation. The
dialog manager prompts the user to fill in missing information and
then generates a database query. The returned information is then
converted to a natural language response, which is played to the user.
Vocal messages are formed by concatenation of speech units which are
stored in the form of a dictionary. Mixed-initiative dialog is used,
where the user can provide any information at any point in time.
Experienced users are thus able to provide all the information needed
for database access in a single sentence, whereas less experienced
users tend to provide shorter responses, allowing the system to guide
them.
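
The frame-filling behavior described above can be illustrated with a minimal sketch. It is not the RailTel implementation: the slot names, the prompt wording, and the functions update_frame and next_action are hypothetical, chosen only to show how a dialog manager might prompt for missing items and issue a database query once the semantic frame is complete.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional, Tuple, Union


@dataclass
class TravelFrame:
    """Hypothetical semantic frame; the slot names are illustrative only."""
    departure: Optional[str] = None
    arrival: Optional[str] = None
    date: Optional[str] = None
    time: Optional[str] = None


# Hypothetical prompts, one per slot, asked in slot order.
PROMPTS = {
    "departure": "Which station are you leaving from?",
    "arrival": "Which station are you going to?",
    "date": "On what day do you want to travel?",
    "time": "At what time do you want to leave?",
}


def update_frame(frame: TravelFrame, understood: Dict[str, str]) -> None:
    """Merge whatever the user supplied in this turn (mixed initiative)."""
    for slot, value in understood.items():
        if slot in PROMPTS and value:
            setattr(frame, slot, value)


def next_action(frame: TravelFrame) -> Tuple[str, Union[str, Dict[str, Optional[str]]]]:
    """Prompt for the first empty slot, or return a query when none is empty."""
    filled = asdict(frame)
    for slot, value in filled.items():
        if value is None:
            return ("prompt", PROMPTS[slot])
    return ("query", filled)


# A less experienced user fills the frame over two turns.
frame = TravelFrame()
update_frame(frame, {"departure": "Paris", "arrival": "Lyon"})
print(next_action(frame))
update_frame(frame, {"date": "Monday", "time": "8 am"})
print(next_action(frame))
```

An experienced user who supplies all four items in one sentence would reach the query step after a single call to update_frame, which is the single-sentence case mentioned above.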
Situation
This research has been carried out in the context of the LE-MLAP
project RailTel (Railway Telephone Information Service), which
aims to assess the technical adequacy of available speech technology
for interactive telephone services. A prototype service was developed
over the summer of 1995, and demonstrated at the Eurospeech'95
conference. This system was used to collect telephone data (over 4000
queries), with which new acoustic models were constructed for the
speech recognizer. The prototype service was used to carry out a field
trial with 100 naive users. (A common field trial protocol was
designed for the project and trials were carried out by our Italian
partners and British partners with their systems.)
Subjects solved one of two types of scenario. Scenarios of type A supply the user with
an exact date and time of travel, and represent relatively simple, but
frequent, information requests. The scenarios of type B allow
more flexibility on the part of the user, as well as a range of
interpretations. The constraint that the journey requires changing
trains is intended to assess the response generation and synthesis components.
The average dialog duration is 193 secs for type A and 245 secs
for type B scenarios. For the 50 type A scenarios 76% of
calls were successfully completed, compared to 68% success for
scenarios of type B. The average number of dialog turns for the
72 successful calls was 4. 35% of the calls contained corrective
measures, with 80% of the errors due to recognition or understanding,
14% in dialog, and 6% for database access.
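
As a consistency check on the figures above, the reported success rates can be converted back into call counts. The sketch assumes that the 100 subjects were split evenly, so that there were also 50 type B scenarios; the text states the count only for type A.

```python
# Counts and rates as reported in the text.
TYPE_A_CALLS = 50
TYPE_A_SUCCESS_RATE = 0.76
TYPE_B_SUCCESS_RATE = 0.68

# Assumption: the remaining 50 of the 100 subjects solved type B scenarios.
TYPE_B_CALLS = 100 - TYPE_A_CALLS

successful_a = round(TYPE_A_SUCCESS_RATE * TYPE_A_CALLS)
successful_b = round(TYPE_B_SUCCESS_RATE * TYPE_B_CALLS)
successful_total = successful_a + successful_b

print(successful_a, successful_b, successful_total)
```

Under that assumption the two rates give 38 and 34 successful calls, which sum to the 72 successful calls quoted above.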
References
[1] L. Lamel, S. Bennacef, H. Bonneau-Maynard, S. Rosset, J.L. Gauvain,
``Recent Developments in Spoken Language Systems for Information
Retrieval,'' Proc. ESCA Workshop on Spoken Dialog Systems,
Vigso, Denmark, Spring 1995.
[2] J.L. Gauvain, S. Bennacef, L. Devillers, L. Lamel, S. Rosset, ``The
Spoken Language Component of the Mask Kiosk,'' Proc. Human
Comfort & Security Workshop, Brussels, Oct. 26, 1995.
LIMSI-CNRS
BP-133
91403 Orsay Cedex France