Speech Understanding for Spoken Language Systems: 
Portability Across Domains and Languages

Wolfgang Minker 

PhD Thesis 
December 19, 1997 
Fakultät für Informatik

Universität Karlsruhe (TH)


Abstract


This thesis investigates the problem of automatic natural language understanding for spoken language systems. The proposed parsing method is general and flexible enough to be ported easily to different applications, domains and human languages.

Spoken language systems support unconstrained human-machine communication. They combine primary component technologies (such as speech recognition, natural language understanding and dialog processing) to understand the meaning of an input utterance. Natural language generation and/or speech synthesis are required to build end-to-end systems which accomplish some given task.

Today’s state-of-the-art rule-based approaches to natural language understanding perform well in limited applications for specific languages. However, developing an understanding component by hand from specific rules is costly, since each application and language requires its own adaptation or, in the worst case, a completely new implementation. To address this cost, this work uses statistical modeling techniques in place of the commonly used hand-written rules to convert the speech recognizer output into a semantic representation. The statistical models are derived from the automatic analysis of large corpora of utterances paired with their corresponding semantic representations. Porting the semantic analyzer to a different application or language therefore only requires training the component on application- and language-specific data, rather than translating and adapting a rule-based grammar by hand.
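The idea of porting by retraining can be illustrated with a minimal sketch. It is not the stochastic case frame analyzer developed in the thesis; it is a generic bigram hidden Markov model tagger with add-one smoothing and Viterbi decoding, swapped in for illustration. The function names (train, decode), the semantic labels (null, m_from, from_city, m_to, to_city) and the two toy corpora are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import math

START = "<s>"  # hypothetical start-of-utterance marker

def train(corpus):
    """Estimate counts from utterances given as lists of (word, label) pairs."""
    emit = defaultdict(Counter)   # label -> counts of words it emits
    trans = defaultdict(Counter)  # previous label -> counts of next label
    vocab = set()
    for utterance in corpus:
        prev = START
        for word, label in utterance:
            emit[label][word] += 1
            trans[prev][label] += 1
            vocab.add(word)
            prev = label
    return {"emit": emit, "trans": trans,
            "labels": sorted(emit), "vocab": vocab}

def _logp(counter, key, n_outcomes):
    # Add-one smoothed log probability, so unseen events stay possible.
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def decode(model, words):
    """Return the most probable label sequence for words (Viterbi search)."""
    if not words:
        return []
    labels = model["labels"]
    n_labels = len(labels)
    n_words = len(model["vocab"]) + 1  # one extra slot for unseen words
    emit, trans = model["emit"], model["trans"]
    # best[i][label] = (log score of best path ending in label, previous label)
    best = [{lab: (_logp(trans[START], lab, n_labels)
                   + _logp(emit[lab], words[0], n_words), None)
             for lab in labels}]
    for i in range(1, len(words)):
        column = {}
        for lab in labels:
            e = _logp(emit[lab], words[i], n_words)
            column[lab] = max(
                (best[i - 1][p][0] + _logp(trans[p], lab, n_labels) + e, p)
                for p in labels)
        best.append(column)
    # Follow the back-pointers from the best final label.
    last = max(labels, key=lambda lab: best[-1][lab][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy, invented corpora: the same code is trained once per language.
english = [
    [("show", "null"), ("flights", "null"), ("from", "m_from"),
     ("boston", "from_city"), ("to", "m_to"), ("denver", "to_city")],
    [("flights", "null"), ("from", "m_from"), ("dallas", "from_city"),
     ("to", "m_to"), ("boston", "to_city")],
]
french = [
    [("je", "null"), ("veux", "null"), ("aller", "null"), ("de", "m_from"),
     ("paris", "from_city"), ("à", "m_to"), ("lyon", "to_city")],
    [("un", "null"), ("billet", "null"), ("de", "m_from"),
     ("lyon", "from_city"), ("à", "m_to"), ("marseille", "to_city")],
]

english_model = train(english)
french_model = train(french)
print(decode(english_model, "flights from denver to dallas".split()))
print(decode(french_model, "billet de marseille à paris".split()))
```

In this sketch, moving from the English to the French toy task changes only the training data passed to train; the decoding code is untouched, which is the property the paragraph above describes. A real system would need far larger corpora and a richer semantic representation than flat per-word labels.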

A stochastic method for natural language understanding was developed and applied to three tasks: the American ATIS (Air Travel Information Services) task, the French MASK (Multimodal-Multimedia Automated Service Kiosk) application and the English Spontaneous Speech Task (ESST). ATIS and MASK deal with information retrieval for air and train travel respectively, a domain of human-machine interaction. ESST deals with human-to-human interaction in which two people negotiate to schedule a meeting.

In ATIS, the corpora were semantically labeled by the rule-based component which was developed for the French language at the Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur (France). This same component was ported to English during the course of this thesis. For MASK, the semantic labels were obtained by integrating the stochastic component into the labeling process using bootstrapping and manual correction. For ESST, the model parameters were trained on a corpus of semantic tree-based representations which were produced by the natural language understanding component of JANUS, a spontaneous speech-to-speech translation system, in part developed at the University of Karlsruhe (Germany) and at Carnegie Mellon University (United States).

In direct comparison, the stochastic data-driven parser outperforms the rule-based method in terms of semantic accuracy and robustness. Furthermore, the semantic analyzer can be flexibly ported to new tasks, domains and languages. The strength of the method is that the same software can be used regardless of application and language; only the stochastic models are trained on the specific data sets. The human effort in component development and porting is therefore limited to data labeling, which is much simpler than designing, maintaining and extending grammar rules.
 

Ordering Information

The thesis has been published by Medien- und Verlagsgruppe Dr. Hänsel-Hohenhausen AG.
 

3 microfiches. DHS 2569. DHS microedition.

€ 50,1 (DM 98,00).
ISBN 3-8267-2569-7
 

Excerpts in PDF format

The following excerpts are available as gzip-compressed PDF files:

Table of Contents
Introduction
Stochastically-based Case Frame Analysis
Conclusion
Bibliography