Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) is the science of automatically transforming speech into written text. At our laboratory, we develop ASR systems mainly for use in speech translation systems, such as the simultaneous lecture translation system. We conduct research in all areas relevant to ASR, offer several courses in ASR, and teach how to build an ASR system with our in-house speech recognition toolkit, the Janus Recognition Toolkit (JRTk).

We conduct research in all areas relevant to ASR:
  • multilingual models
  • code-switching
  • short- and long-term learning
  • language modeling
  • new word adaptation
  • accent conversion
    • Accent Conversion (AC) attempts to make non-native speech sound as if the speaker had a native accent. Speech recognition performance can degrade when the speaker is non-native, so a good Accent Conversion model can improve the performance of ASR models on non-native speech.
  • acoustic environment / far field ASR
    • Building an efficient voice interaction system with a remote microphone makes real-world applications more scalable, since speech is the most natural means of human-machine interaction. In an ideal environment with a close-talking microphone, current automatic speech recognition (ASR) systems reach a word error rate (WER) below 5%, surpassing human performance. In a realistic environment, however, the ASR system has to deal with complex acoustic conditions such as noise, room reverberation, and cross-talk, which cause model performance to drop dramatically. Unlike machines, humans are very good at ignoring interfering signals and focusing on what they want to hear. Our goal is to research and develop a speech system that can learn to focus on the desired signal.
  • multilingual ASR
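
The word error rate (WER) mentioned in the far-field item above is commonly computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that calculation, not the scoring code of JRTk or any other toolkit. The function name word_error_rate and the sample sentences are illustrative assumptions, and the sketch splits on whitespace without any text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of six reference words is dropped by the recognizer.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER: {wer:.1%}")  # prints "WER: 16.7%"
```

Because insertions count as errors, the WER can exceed 100% when the hypothesis is much longer than the reference.
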
We offer both a class in ASR and a laboratory that teaches how to build an ASR system with our in-house speech recognition toolkit, the Janus Recognition Toolkit (JRTk).

Applications for this technology are manifold. While the original idea was to create an automatic typewriter for dictation, speech recognition software is now found in many applications that call for a natural interface:

  • Dictation software
  • Speech translation systems
  • Smart rooms
  • Human-robot communication
  • Telephone help lines
  • Machine control
  • Car navigation and entertainment systems
  • Pick-to-voice systems
  • Appliances
  • Medical systems in operating rooms