Technology
What is ViaScribe?
Liberated Learning technology centers on two core applications:
automatically transcribing spoken language and creating web-accessible
multimedia notes. Liberated Learning and the world's
top speech recognition (SR) scientists, collaborating through a Joint Study partnership,
have transformed a proof-of-concept application first developed
at Saint Mary's University into a robust and rapidly evolving
technology platform. The technology is called IBM ViaScribe,
and the Liberated Learning team acts as its public steward.
ViaScribe Overview
ViaScribe contains a speech recognition engine capable of
transcribing live or prerecorded speech. Live speech is delivered
to the system via a standard or USB microphone. Typically,
public speakers wear noise-canceling wireless headsets or
lavalier (lapel) microphones that capture high-quality sound without
impeding movement. Prerecorded speech can be transcribed
from a variety of audio and video formats, including
WAV, MP3, and AVI.
During a live presentation, ViaScribe serves as a real-time
text display, much like a closed captioning window, outputting
text as it is processed by the speech recognition engine.
Because natural speech rarely follows the rules of written
grammar and punctuation, ViaScribe promotes readability
by introducing a paragraph break or other marker whenever
the speaker pauses to take a breath. The pause length that triggers
a break can be customized according to the speaker's individual
speech characteristics.
The speaker can also use interactive voice commands to navigate
PowerPoint slides or other applications during a live transcription
and to automatically create captioned multimedia presentations.
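The pause-based formatting described above can be illustrated with a minimal sketch. It assumes the recognizer emits each word with start and end times in seconds; the `Word` tuple, the `break_on_pauses` function, and the 0.8-second default threshold are hypothetical names and values chosen for illustration, not part of ViaScribe.

```python
from typing import List, Optional, Tuple

# Assumed input shape: (text, start_seconds, end_seconds) for each recognized word.
Word = Tuple[str, float, float]


def break_on_pauses(words: List[Word], pause_threshold: float = 0.8) -> List[str]:
    """Group words into paragraphs, starting a new one after each long pause.

    pause_threshold is the silence (in seconds) that triggers a break; it is
    the value a speaker would tune to match their own speaking style.
    """
    paragraphs: List[str] = []
    current: List[str] = []
    previous_end: Optional[float] = None

    for text, start, end in words:
        # A gap at least as long as the threshold closes the current paragraph.
        if previous_end is not None and current and start - previous_end >= pause_threshold:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        previous_end = end

    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = [
    ("good", 0.0, 0.3), ("morning", 0.35, 0.8),
    ("today", 2.0, 2.4), ("we", 2.45, 2.6),
]
for paragraph in break_on_pauses(sample):
    print(paragraph)
```

A fast speaker might lower the threshold so that short breaths still produce breaks, while a deliberate speaker might raise it to avoid fragmenting the text.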
Multimedia Notes
After the talk, ViaScribe saves the transcript generated by speech
recognition, the audio, and, optionally, screen captures and PowerPoint
slides as an accessible webpage or streaming media file. Students
can then select the lecture materials that
suit their individual learning preferences. In addition to
text transcripts, ViaScribe creates a series of accessible
multimedia files (SMIL, XML, WAV, RT, RTF) that can easily
be published to the web, creating a rich set of teaching resources.
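To give a sense of what a captioned presentation in one of these formats involves, here is a minimal sketch that builds a SMIL document playing an audio file in parallel with a timed text stream. SMIL is a W3C XML format for synchronizing media; the layout values, the `build_smil` function, and the file names are assumptions made for illustration, and ViaScribe's actual output may be structured differently.

```python
import xml.etree.ElementTree as ET


def build_smil(audio_src: str, captions_src: str) -> str:
    """Return a small SMIL document that plays audio alongside a caption stream."""
    smil = ET.Element("smil")

    # The head declares a single on-screen region where captions are drawn.
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    ET.SubElement(layout, "root-layout", {"width": "640", "height": "120"})
    ET.SubElement(layout, "region", {"id": "captions", "width": "640", "height": "120"})

    # <par> plays its children in parallel, keeping the text in step with the audio.
    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")
    ET.SubElement(par, "audio", {"src": audio_src})
    ET.SubElement(par, "textstream", {"src": captions_src, "region": "captions"})

    return ET.tostring(smil, encoding="unicode")


# Hypothetical file names for a recorded lecture and its timed transcript.
print(build_smil("lecture.wav", "lecture.rt"))
```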
Challenges
Since it began developing speech recognition as an accessibility
tool in 1999, Liberated Learning
has been working to overcome fundamental technology and usability
issues. Accuracy, readability, user-friendliness, and ease
of training and editing have all required leading-edge
solutions and continue to challenge the development
team. Visit the Research and Development page for details on
development priorities for 2007-2008.
Accuracy
Improving speech recognition accuracy remains a primary challenge.
The Consortium spends considerable time studying factors that affect word error rate (WER).
Several factors drive speech recognition accuracy, including microphone quality,
ambient noise, voice profile training, individual speech characteristics, and the available
acoustic and language models. Integrated efforts such as evaluating new speech engines,
improving speech models, and investigating new training techniques are key research
priorities.
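Word error rate is conventionally computed as the number of substituted, deleted, and inserted words divided by the number of words in a reference transcript. The sketch below shows that standard calculation using a word-level edit distance; the function name and the simple lowercase, whitespace-based tokenization are assumptions for illustration, and the Consortium's own measurement procedure may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("starts" -> "start") and one insertion ("today")
# against a five-word reference.
print(word_error_rate("the lecture starts at nine", "the lecture start at nine today"))  # 0.4
```

Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs many more words than were spoken.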
Editing/Post Production
Editing misrecognized words, a task directly linked to accuracy, is another core challenge.
Editing can be viewed along a continuum, ranging from no post-presentation intervention
to extensive correction and modification of notes. ViaScribe offers an easy-to-use
error correction system for subsequent editing. When recognition errors occur,
ViaScribe allows an editor to replay the audio, make the necessary corrections, and update
the lecture output to create the final version to be used as course notes. The
Consortium is studying automated post-production techniques, team-based editing tools,
and new interfaces to improve editing efficiency.
Training
To improve the system's overall usability, researchers developed a
unique approach to voice profile training that does not require the speaker to read
a set of predefined scripts, as most commercially available speech recognition
systems do. This new process replaces traditional training by using a person's
own transcribed speech to create customized voice models. A lecture is recorded behind the scenes,
without any extra preparation by the speaker. The resulting transcript is then corrected by a third party
to create a voice profile that can be used to capture the speaker's next lecture, resulting in
incremental improvements in accuracy with each use.
Multiple Languages
The Liberated Learning Consortium is working to extend the language base for IBM ViaScribe.
Consortium partners in the Far East have worked closely with IBM to integrate Chinese (Mandarin)
and Japanese language models. Liberated Learning technology also supports French, Italian, and both
UK and US English language models, with other extensions planned.
ViaScribe Extensions
IBM ViaScribe features an Application Program Interface (API) that exposes
speech recognition functions to external applications. Liberated
Learning Partners have used this API to build a number of
new applications, including Personal Displays, Real-Time
Editing tools, and Multi-Speaker systems. (See Projects.)