Janus Recognition Toolkit

The Janus Recognition Toolkit (JRTk) is a general-purpose speech recognition toolkit developed at the Interactive Systems Labs at Carnegie Mellon University and the Karlsruhe Institute of Technology. It is useful for both research and application development and is part of the JANUS speech-to-speech translation system. Commercial and research licenses are available.

The JRTk provides a flexible Tcl/Tk script-based environment which enables researchers to build state-of-the-art speech recognizers and allows them to develop, implement, and evaluate new methods. It implements an object-oriented approach: unlike other toolkits, it is not a set of libraries and precompiled modules but a programmable shell with transparent, yet efficient objects.

Since version 5, JRTk features the IBIS decoder, a one-pass decoder based on a re-entrant single-pronunciation prefix tree that makes use of the concept of linguistic context polymorphism. It is therefore able to incorporate full linguistic knowledge at an early stage. The same engine can decode in one pass with a statistical n-gram language model as well as with context-free grammars. The decoder can also be used to rescore lattices very efficiently.
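
The pronunciation prefix tree underlying such a decoder can be pictured with a toy sketch (a hypothetical illustration, not the actual IBIS data structure): words whose pronunciations share initial phones share tree nodes, so acoustic scores for common prefixes are computed only once.

```python
# Toy pronunciation prefix tree: words sharing initial phones share
# nodes, so their acoustic scores need be computed only once.
# Illustrative sketch only; names and structure are hypothetical.

class PrefixTreeNode:
    def __init__(self):
        self.children = {}   # phone -> PrefixTreeNode
        self.words = []      # words whose pronunciation ends at this node

def build_prefix_tree(lexicon):
    """lexicon: dict mapping word -> list of phones."""
    root = PrefixTreeNode()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, PrefixTreeNode())
        node.words.append(word)
    return root

lexicon = {
    "speech": ["S", "P", "IY", "CH"],
    "speed":  ["S", "P", "IY", "D"],
    "spin":   ["S", "P", "IH", "N"],
}
root = build_prefix_tree(lexicon)

# "speech" and "speed" share the S-P-IY prefix as a single path.
shared = root.children["S"].children["P"].children["IY"]
print(sorted(shared.children))     # ['CH', 'D']
print(shared.children["D"].words)  # ['speed']
```

In a real decoder each tree node carries acoustic model states rather than bare phone labels, and the "linguistic context polymorphism" mentioned above lets the search attach different language-model contexts to copies of the same tree node.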

JRTk features state-of-the-art techniques for pre-processing, acoustic modeling, and search.

Acoustic Pre-Processing

  • Support for a variety of common audio formats
  • Flexible short-term Fourier analysis
  • Flexibly configurable Mel-frequency cepstral coefficient (MFCC) computation
  • Minimum variance distortionless response (MVDR) processing
  • LPC processing
  • Mean and variance normalization
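
Mean and variance normalization, for example, standardizes each feature dimension over the frames of an utterance. A minimal sketch (assuming one feature vector per row; not JRTk's actual implementation):

```python
# Cepstral mean and variance normalization (CMVN) sketch:
# standardize each feature dimension across the frames of an utterance.
import numpy as np

def cmvn(features, eps=1e-8):
    """features: (num_frames, num_coeffs) array, e.g. MFCCs."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)

frames = np.array([[1.0, 10.0],
                   [3.0, 30.0],
                   [5.0, 50.0]])
norm = cmvn(frames)
print(norm.mean(axis=0))  # each coefficient now has mean ~0
print(norm.std(axis=0))   # and standard deviation ~1
```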

Acoustic Modeling

  • EM Training, label training, Viterbi training
  • Incremental growing of Gaussians
  • Semi-tied covariances
  • MMIE and boosted MMIE (bMMIE) training
  • Speaker adaptive training
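
The EM training listed above re-estimates Gaussian parameters from soft state-occupancy counts. A self-contained toy version for a two-component one-dimensional mixture (an illustrative sketch only; JRTk's trainer is far more general):

```python
# One EM iteration for a two-component 1-D Gaussian mixture, the kind
# of update at the core of acoustic model training. Sketch only.
import math

def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, weights, means, variances):
    # E-step: posterior responsibility of each component for each point.
    resp = []
    for x in data:
        p = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
        s = sum(p)
        resp.append([pi / s for pi in p])
    # M-step: re-estimate weights, means, variances from responsibilities.
    n = [sum(r[k] for r in resp) for k in range(2)]
    weights = [nk / len(data) for nk in n]
    means = [sum(r[k] * x for r, x in zip(resp, data)) / n[k] for k in range(2)]
    variances = [sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / n[k]
                 for k in range(2)]
    return weights, means, variances

data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
w, m, v = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
for _ in range(10):
    w, m, v = em_step(data, w, m, v)
print(m)  # component means settle near the two clusters at 0 and 5
```

Label training and Viterbi training replace the soft responsibilities with fixed or hard (0/1) state alignments, which makes each iteration cheaper at some cost in modeling accuracy.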

Decoding

  • Single pass decoder
  • Flexible language model interface for n-gram language models and grammars
  • Lattice generation and manipulation
  • Lattice rescoring
  • Consensus decoding
  • Confusion network combination
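
Lattice rescoring can be pictured as keeping the acoustic score on each lattice edge, swapping in a score from a stronger language model, and re-running a best-path search. A toy sketch (the edge structure and scores are hypothetical, not JRTk's lattice format):

```python
# Toy word-lattice rescoring: keep acoustic log-scores, replace each
# edge's language-model score, then find the new best path.
# Edges: (from_node, to_node, word, acoustic_logprob), topologically ordered.
edges = [
    (0, 1, "this", -1.0),
    (0, 1, "miss", -1.2),
    (1, 2, "is", -0.5),
    (2, 3, "speech", -1.1),
    (2, 3, "peach", -0.9),
]

# Hypothetical new LM log-probabilities per word (a real rescorer
# would condition on word history, not single words).
new_lm = {"this": -0.2, "miss": -2.0, "is": -0.1, "speech": -0.3, "peach": -2.5}

def best_path(edges, start, end, lm_weight=1.0):
    # Dynamic programming over the topologically ordered lattice.
    best = {start: (0.0, [])}
    for u, v, word, ac in edges:
        if u not in best:
            continue
        score = best[u][0] + ac + lm_weight * new_lm[word]
        if v not in best or score > best[v][0]:
            best[v] = (score, best[u][1] + [word])
    return best[end]

score, words = best_path(edges, 0, 3)
print(words)  # best word sequence under the new LM
```

Because only the lattice is re-searched rather than the full acoustic space, this kind of rescoring is far cheaper than a second decoding pass.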
The JRTk is used for speech recognition in many ongoing projects, and has been used in many past ones.

JRTk in current projects

Lecture Translator The ISL’s Simultaneous Lecture Translation system automatically translates lectures and presents the translation results as text via the web. The system is used to translate German lectures into English and further languages, so that international students can follow them even if they are not fluent in German.

 

JRTk in past projects

EU-BRIDGE The project provided streaming technology that can convert speech from lectures, meetings, and telephone conversations into text in another language. ISL was the coordinator of the project.
EVEIL-3D In the EVEIL-3D project, a serious game was developed for use in school foreign-language teaching.
TC-STAR TC-STAR was envisioned as a long-term effort focused on advanced research in all core technologies for speech-to-speech translation (SST): speech recognition, speech translation, and speech synthesis.
C-STAR

The Consortium for Speech Translation Advanced Research (C-STAR) emerged from initially informal bilateral collaborations between research labs interested in the automatic translation of spoken language.

FAME Facilitating Agents for Multi-Cultural Exchange. The vision of the FAME project was to construct an intelligent agent to facilitate communication among people from different cultures who collaborate on solving a common problem.
View4You View4You automatically records the German news broadcast "Tagesschau" every day and allows the user to retrieve video segments of news items on different topics using spoken language input.
CHIL CHIL - Computers in the Human Interaction Loop - is an Integrated Project (IP 506909) under the European Commission's Sixth Framework Programme.
PF-STAR  
NESPOLE! The project aimed at supporting multilingual and multimodal negotiation in e-commerce and e-service by providing a robust, flexible, scalable, and portable speech-to-speech translation system.
VERBMOBIL The Verbmobil system recognizes spontaneous speech, analyzes the input and translates it into a foreign language, generates a sentence, and pronounces it.
BABEL Rapid porting of keyword search for new languages
Quaero Quaero is a program promoting research and industrial innovation on technologies for automatic analysis and classification of multimedia and multilingual documents. 
SFB 588 Humanoid Robots The goal of this interdisciplinary research project is the development of humanoid robots which resemble humans in their ways of acting in the world, of reasoning and of communicating about the world.

License Information

The JRTk is licensed by Carnegie Mellon University. Commercial as well as research licenses are available. Terms and conditions, as well as further information, can be obtained by contacting Prof. Alex Waibel at ahw∂cs.cmu.edu.


JRTk Articles
  • Hagen Soltau, Florian Metze, Christian Fügen, Alex Waibel — Automatic Speech Recognition and Understanding Workshop 2001, ASRU 2001, Trento, Italy, 25. October 2001
  • Torsten Zeppenfeld, Michael Finke, Klaus Ries, Martin Westphal, Alex Waibel — International Conference on Acoustics, Speech, and Signal Processing 1997, ICASSP 1997, Munich, Germany, 01. April 1997
  • Alon Lavie, Alex Waibel, Lori Levin, Michael Finke, Donna Gates, Marsal Gavalda, Torsten Zeppenfeld, Puming Zhan — International Conference on Acoustics, Speech, and Signal Processing 1997, ICASSP 1997, Munich, Germany, 01. April 1997
  • Alex Waibel, Michael Finke, Donna Gates, Marsal Gavalda, Thomas Kemp, Alon Lavie, Lori Levin, Uwe Meier, Laura Tomokiyo, Arthur McNair, Ivica Rogina, Kaori Shima, Tilo Sloboda, Monika Woszczyna, Torsten Zeppenfeld, Puming Zhan — IEEE International Conference on Acoustics, Speech and Signal Processing 1996, ICASSP 1996, Atlanta, USA, 01. May 1996
  • Donna Gates, Alon Lavie, Lori Levin, Alex Waibel, Marsal Gavalda, Laura Tomokiyo, Monika Woszczyna, Puming Zhan — 12th European Conference on Artificial Intelligence, ECAI 1996, Budapest, Hungary, 01. August 1996
  • Alon Lavie, Alex Waibel, Lori Levin, Donna Gates, Marsal Gavalda, Torsten Zeppenfeld, Puming Zhan, Oren Glickman — 4th International Conference on Spoken Language Processing 1996, ICSLP 1996, Philadelphia, USA, 01. October 1996
  • Puming Zhan, Klaus Ries, Marsal Gavalda, Donna Gates, Alon Lavie, Alex Waibel — 4th International Conference on Spoken Language Processing 1996, ICSLP 1996, Philadelphia, USA, 01. October 1996
  • Bernhard Suhm, Petra Geutner, Thomas Kemp, Alon Lavie, Laura Tomokiyo, Arthur McNair, Ivica Rogina, Tanja Schultz, Tilo Sloboda, Wayne Ward, Monika Woszczyna, Alex Waibel — 01. January 1995
  • Monika Woszczyna, N. Aoki-Waibel, Finn Dag Buø, Noah Coccaro, Keiko Horiguchi, Thomas Kemp, Alon Lavie, Arthur McNair, Thomas Polzin, Ivica Rogina, Carolyn Rose, Tanja Schultz, Bernhard Suhm, M. Tomita, Alex Waibel — International Conference on Acoustics, Speech, and Signal Processing 1994, ICASSP 1994, Adelaide, Australia, 01. April 1994
  • Thomas Polzin, Noah Coccaro, N. Aoki-Waibel, Monika Woszczyna, M. Tomita, J. Tsutsumi, Ivica Rogina, Carolyn Rose, Alex Waibel, Arthur McNair, Alon Lavie, A. Eisele, Tilo Sloboda, Wayne Ward — European Conference on Speech Communication and Technology 1993, Eurospeech 1993, Berlin, Germany, 26. January 1993
  • Louise Osterholtz, Joe Tebelskis, Ivica Rogina, Hiroaki Saito, Charles Augustine, Arthur McNair, Alex Waibel, Monika Woszczyna, Tilo Sloboda — IEEE International Conference on Acoustics, Speech, and Signal Processing 1992, ICASSP 1992, San Francisco, USA, 26. January 1992