Liberated Learning: Accessibility through Speech Recognition

Technology

What is ViaScribe?

Liberated Learning technology centers on two core applications: automatically transcribing spoken language and creating web-accessible multimedia notes. Collaborating through a Joint Study partnership, Liberated Learning and leading speech recognition (SR) scientists have transformed a proof-of-concept application first developed at Saint Mary's University into a robust and rapidly evolving technology platform. The technology is called IBM ViaScribe, and the Liberated Learning team acts as its public steward.

ViaScribe Overview

ViaScribe contains a speech recognition engine capable of transcribing live or prerecorded speech. Live speech is delivered to the system via a standard or USB microphone. Typically, public speakers wear noise-canceling wireless headsets or lavalieres (lapel mics) that capture high-quality sound without impeding movement. ViaScribe can also transcribe prerecorded speech from a variety of audio and video formats, including WAV, MP3, and AVI.

During a live presentation, ViaScribe serves as a real-time text display, much like a closed captioning window, outputting text as the speech recognition engine processes it. Because natural spoken language rarely follows the rules of written grammar and punctuation, ViaScribe promotes readability by inserting a paragraph break or other marker whenever the speaker pauses to take a breath. The pause settings can be customized to the speaker's individual speech characteristics.
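The pause-based formatting described above can be illustrated with a minimal sketch. It is not ViaScribe's implementation: the `Word` record, the `segment_by_pauses` function, and the `pause_threshold` parameter are hypothetical names, and the sketch assumes the recognizer supplies start and end times for each word.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its timing (hypothetical structure)."""
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def segment_by_pauses(words, pause_threshold=1.0):
    """Group words into paragraphs, breaking at long silences.

    A new paragraph starts whenever the silence between the end of one
    word and the start of the next is at least `pause_threshold` seconds.
    The threshold is the per-speaker setting assumed by this sketch.
    """
    paragraphs = []
    current = []
    previous_end = None
    for word in words:
        gap = None if previous_end is None else word.start - previous_end
        if gap is not None and gap >= pause_threshold and current:
            paragraphs.append(" ".join(current))
            current = []
        current.append(word.text)
        previous_end = word.end
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    sample = [
        Word("good", 0.0, 0.3),
        Word("morning", 0.35, 0.8),
        Word("today", 2.0, 2.4),
        Word("we", 2.45, 2.6),
    ]
    for paragraph in segment_by_pauses(sample, pause_threshold=1.0):
        print(paragraph)
```

A speaker who pauses often mid-sentence would use a larger threshold, so that only longer silences produce a break.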

The speaker can also use interactive voice commands to navigate PowerPoint slides or other applications during a live transcription, and automatically create captioned multimedia presentations.

Multimedia Notes

After the talk, ViaScribe saves the transcript generated by speech recognition, the audio, and, optionally, screen captures and PowerPoint slides as an accessible web page or streaming media file. This lets students select the lecture information that suits their individual learning preferences. In addition to text transcripts, ViaScribe creates a series of accessible multimedia files (SMIL, XML, WAV, RT, RTF) that can easily be published to the web, creating a rich set of teaching resources.
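To show how formats such as SMIL, WAV, and RT can fit together, the sketch below builds a small SMIL document that plays an audio file in parallel with a caption text stream. It illustrates the general SMIL structure only and is not the output ViaScribe produces. The `build_smil` function, the file names, and the region dimensions are all assumptions.

```python
import xml.etree.ElementTree as ET


def build_smil(audio_src, captions_src):
    """Return a minimal SMIL document as a string.

    The <par> element tells the player to present its children at the
    same time: here an audio track and a caption text stream.
    """
    smil = ET.Element("smil")
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    # Illustrative window size; a real presentation would choose its own.
    ET.SubElement(layout, "root-layout", width="640", height="120")
    ET.SubElement(
        layout, "region", id="captions",
        left="0", top="0", width="640", height="120",
    )
    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")
    ET.SubElement(par, "audio", src=audio_src)
    ET.SubElement(par, "textstream", src=captions_src, region="captions")
    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names for a recorded lecture and its captions.
    print(build_smil("lecture01.wav", "lecture01.rt"))
```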

Challenges

Since taking its first steps in developing speech recognition as an accessibility tool in 1999, Liberated Learning has been working to overcome fundamental technology and usability issues. Accuracy, readability, user friendliness, and ease of training and editing have all required leading-edge solutions and continue to challenge the development team. Visit the Research and Development page for details on development priorities for 2007-2008.

Accuracy

Improving speech recognition accuracy remains a primary challenge. The Consortium spends considerable time studying the factors that affect word error rate (WER), including microphone quality, ambient noise, voice profile training, individual speech characteristics, and the available acoustic and language models. Key research priorities include evaluating new speech engines, improving speech models, and investigating new training techniques.
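For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The sketch below shows that standard calculation; the function name and sample sentences are illustrative, and it does not describe how the Consortium measures accuracy.

```python
def word_error_rate(reference, hypothesis):
    """Compute WER = (substitutions + deletions + insertions) / reference length.

    Both arguments are plain strings; words are split on whitespace.
    The edit distance is found with the usual dynamic-programming
    (Levenshtein) recurrence applied to words instead of characters.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)


if __name__ == "__main__":
    spoken = "the lecture begins at nine"
    recognized = "the lecture begin at nine"
    print(word_error_rate(spoken, recognized))
```

Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs many more words than were spoken.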

Editing/Post Production

Closely tied to accuracy, the editing of misrecognitions is another core challenge. Editing can be viewed along a continuum, ranging from no post-presentation intervention to extensive correction and modification of notes. ViaScribe offers an easy-to-use error correction system for subsequent editing. When recognition errors occur, an editor can replay the audio, make the necessary corrections, and update the lecture output to create the final version used as course notes. The Consortium is studying automated post-production techniques, team-based editing tools, and new interfaces to improve editing efficiency.

Training

To improve the system's overall usability, researchers developed a unique approach to voice profile training. Unlike most commercially available speech recognition systems, it does not require the speaker to read a set of predefined scripts. Instead, it uses a person's own transcribed speech to create customized voice models. A lecture is recorded behind the scenes, without any extra preparation by the speaker. A third party then edits the resulting transcript, and the corrected material is used to create a voice profile for capturing the speaker's next lecture, so accuracy improves incrementally with each use.

Multiple Languages

The Liberated Learning Consortium is working to extend the language base for IBM ViaScribe. Consortium partners in the Far East have worked closely with IBM to integrate Chinese (Mandarin) and Japanese language models. Liberated Learning technology also supports French, Italian and both UK and US English language models, with other extensions planned.

ViaScribe Extensions

IBM ViaScribe features an Application Program Interface (API) that exposes speech recognition functions to external applications. Liberated Learning Partners have used this API to build a number of new applications, including Personal Displays, Real Time Editing tools, and Multi Speaker systems. (See Projects.)

Saint Mary's University, IBM, University of the Sunshine Coast, Purdue University, Trent University, Massey University, Massachusetts Institute of Technology and Artificial Intelligence Laboratory, University of Southampton, Cambrian College, Kentucky University, Messiah College, Hiroshima University, Beijing University, Alexander Graham Bell Centre, Charles Darwin University, Australian National University, Cape Breton University, Alma Mater Studiorum Università di Bologna
