Just Say "Help":
Implementing Help in VoiceXML
By Allen Beebe, WaterCove Networks
Introduction 
One of the exciting aspects of XML is the number of ways it is being used to provide new means of communicating and gathering new information. One such use is VoiceXML, an emerging W3C standard that brings voice to the Web or the Web to the phone.
You'll hear more about VoiceXML as more organizations create voice portals (for up-to-the-second information services like stock reports) or other customer-relationship tools. This article provides an overview of VoiceXML and the VoiceXML help tag. It also includes some resources so you can learn more about VoiceXML.
The entire code listing for this VoiceXML prototype is available for download.
First: Try the VoiceXML Demo 
I've prepared a simple VoiceXML application that you can call and talk to as an introduction to developing help for VoiceXML. The demo simulates a sales/inventory application that provides the caller with information for widgets. The demo provides two levels of help: overall and for each widget. The demo runs through VoiceXML services provided by TellMe Networks.
Access the demo from any telephone or cellular phone in the U.S.
- Call TellMe at 1-800-555-Tell (8355).
- During the TellMe Menu monologue, dial 1-01249.
- Listen to the demo and respond to the prompts. Say "help" to hear how a VoiceXML application can provide help at different levels.
Later, we'll look at portions of the VoiceXML file that you've called.
Defining VoiceXML 
VoiceXML is an XML-based markup language for developing audio dialogs that feature text-to-speech (TTS), digitized audio, speech recognition, and dual-tone multi-frequency (DTMF) key input. It gives voice applications the advantages of web-based development and content delivery.
VoiceXML Implementations 
VoiceXML is gaining use in voice portal applications that include:
- Customer Relationship Management (CRM) applications, including call center support systems and sales support tools that provide information such as orders, part numbers, or available inventory.
- Information service applications providing weather, traffic, or stock quotes.
- Voice-mail and hands-free cellular phone applications.
- Voice shopping applications enabling users to purchase tickets or make reservations.
- Interactive Voice Response (IVR) systems that replace older IVR systems built on vendor-specific APIs and DTMF input. (We've all placed calls that wound us through a maze of numbered menus only to give us the choice of leaving a voice-mail message or returning to the main menu.)
Relationships to Other Technologies 
VoiceXML makes extensive use of other voice technologies. For example, TTS technology has improved to provide very realistic-sounding voices. Several companies lead the way in providing commercial versions, including AT&T Natural Voices™ and SpeechWorks Speechify™. You can learn more about these by visiting the companies' web sites or by trying FreeTTS, an open source (Java-based) TTS package. FreeTTS enables you to develop your own TTS material.
While VoiceXML is still gaining acceptance, it already is facing competition from other new technologies that claim to offer other features and improvements:
- Call Control eXtensible Markup Language (CCXML) provides advanced call-handling capabilities for interaction with call centers and conferencing. It also allows calls to be transferred on to succeeding call legs; currently, that is difficult to do with VoiceXML. (See the CCXML web site listed in the References and Resources section for further information.)
- Speech Application Language Tags (SALT) extend the current mark-up languages (HTML, XHTML, and XML) to provide multi-modal access through speech, keyboard, keypad, mouse, or stylus, providing output that includes synthesized speech, other audio, video, text, and graphics. Developers add the SALT information to their HTML or XHTML code. (See the SALT web site listed in the References and Resources section for more information.)
VoiceXML Structure 
VoiceXML code uses a hierarchical parent-child structure as illustrated in Figure 1 below. Most code fits within the <form> and <field> tags. VoiceXML provides an extensive range of tags so developers can create sophisticated voice applications, even those that can interact with databases to provide real-time data updates and feedback.
Figure 1: Code Structure Diagram
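As a rough illustration of that parent-child nesting, here is a minimal sketch of a VoiceXML 2.0 document. It is not the demo's listing: the form id, the grammar file name widgets.grxml, and the prompt wording are all assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative skeleton only. The ids, the grammar file name, and the
     prompt wording are assumptions, not the demo's code. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A document holds one or more dialogs; a form is the usual kind. -->
  <form id="main_menu">
    <!-- A block runs prompts or logic without waiting for the caller. -->
    <block>
      <prompt>Welcome to the widget information line.</prompt>
    </block>
    <!-- A field collects one piece of input from the caller. -->
    <field name="widget">
      <!-- The grammar defines what the caller may say (hypothetical file). -->
      <grammar src="widgets.grxml" type="application/srgs+xml"/>
      <prompt>Which widget would you like to hear about?</prompt>
      <!-- Runs when the caller asks for help in this field. -->
      <help>
        <prompt>Say the name of a widget.</prompt>
      </help>
      <!-- Runs once the field has a recognized value. -->
      <filled>
        <prompt>You chose <value expr="widget"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Form items are visited in document order, so the block's greeting plays before the field asks its question.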
VoiceXML Help Tag 
A user may run into a problem during a VoiceXML session. If the user says "help," the system responds with information relevant to that specific portion of the session. The help can be programmed to offer information in shorter and shorter segments on repeated requests, to encourage the user to continue. The voice the user hears can come from either TTS or a recorded audio file. Invoking help may also play a brief tone or melody to signal that help has been activated.
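One way to get progressively shorter help is the count attribute on the help tag, sketched below for VoiceXML 2.0. This is a sketch under assumptions rather than the demo's code: the form id, the grammar file name widgets.grxml, and the wording of each message are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch, not the demo's listing. The form id, grammar
     file name, and prompt wording are assumptions. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- In VoiceXML 2.0 the built-in "help" command is off by default;
       this property asks the platform to listen for it. -->
  <property name="universals" value="help"/>

  <form id="main_menu">
    <field name="widget">
      <grammar src="widgets.grxml" type="application/srgs+xml"/>
      <prompt>Which widget would you like to hear about?</prompt>

      <!-- First help request in this field: the full explanation. -->
      <help count="1">
        <prompt>
          This line gives information about three widgets.
          Say widget one, widget two, or widget three.
        </prompt>
      </help>

      <!-- Second and later requests: a shorter reminder. -->
      <help count="2">
        <prompt>Say widget one, two, or three.</prompt>
      </help>
    </field>
  </form>
</vxml>
```

The interpreter picks the handler with the highest count that does not exceed the number of times help has been requested in the field, so the count="2" handler also covers the third and later requests.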
Looking Inside the Demo 
Now that you've called the demo and we've gone over some basic VoiceXML, we'll look at some code from the demonstration VoiceXML application.
Form, Field, and (Internal) Grammar
Figure 2 below shows code for the intro and main menu forms along with an inline grammar for the widget field. Other constructs (within the intro form) include a block containing the introductory TTS audio and a dual catch event tag for no match and no input.
Figure 2: Code for the intro and main menu forms
The inline grammar tag set identifies the words or phrases that the demo can understand as it converses with you. Help is identified as an option: the grammar recognizes the word "help" or the phrases "help me" or "assist me please." The other portions of the grammar identify the three widgets. More complex grammars can be placed in a separate grammar file and accessed by a tag reference within the VoiceXML file.
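The sketch below shows one standards-based way to express the same ideas in VoiceXML 2.0 with W3C XML-format grammars. It is an assumption-laden illustration, not the demo's grammar (which may use a platform-specific grammar format): the rule ids, the file name widgets.grxml, and the prompts are hypothetical, and here the extra help phrases are attached through a link tag that raises the help event.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; rule ids, file name, and prompts are
     assumptions, not the demo's code. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Phrases that raise the help event anywhere in this document. -->
  <link event="help">
    <grammar version="1.0" root="helpwords" xml:lang="en-US" mode="voice">
      <rule id="helpwords">
        <one-of>
          <item>help</item>
          <item>help me</item>
          <item>assist me please</item>
        </one-of>
      </rule>
    </grammar>
  </link>

  <form id="main_menu">
    <field name="widget">
      <prompt>Which widget would you like to hear about?</prompt>
      <!-- Inline grammar: the widget names this field accepts. -->
      <grammar version="1.0" root="widgets" xml:lang="en-US" mode="voice">
        <rule id="widgets">
          <one-of>
            <item>widget one</item>
            <item>widget two</item>
            <item>widget three</item>
          </one-of>
        </rule>
      </grammar>
      <!-- A larger grammar could live in its own file instead:
           <grammar src="widgets.grxml" type="application/srgs+xml"/> -->

      <!-- One handler for both unrecognized speech and silence. -->
      <catch event="nomatch noinput">
        <prompt>Sorry, I did not get that.</prompt>
        <reprompt/>
      </catch>
      <help>
        <prompt>Say widget one, widget two, or widget three.</prompt>
      </help>
    </field>
  </form>
</vxml>
```

Keeping the help phrases in a document-level link means every field in the file responds to them, while each field's own help handler still decides what the caller hears.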
Main Menu Help Code
Figure 3 below shows the code for the main menu help. The help includes a .wav file that provides an auditory prompt indicating a change in application activity. The TTS audio provides simple instructions on what to say in order to access information on a specific widget.
Figure 3: Main Menu Help code
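A help handler that plays a recorded cue before the spoken instructions might look like the following VoiceXML 2.0 sketch. The audio file name help_tone.wav, the grammar file name, and the wording are assumptions chosen for illustration; the demo's actual code may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; file names and wording are assumptions. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <property name="universals" value="help"/>

  <form id="main_menu">
    <field name="widget">
      <grammar src="widgets.grxml" type="application/srgs+xml"/>
      <prompt>Which widget would you like to hear about?</prompt>

      <help>
        <prompt>
          <!-- Recorded cue. The text inside the audio tag is spoken
               only if the file cannot be played. -->
          <audio src="help_tone.wav">Help.</audio>
          <!-- Plain text in a prompt is rendered by TTS. -->
          To hear about a widget, say widget one, widget two,
          or widget three.
        </prompt>
        <!-- Replay the field's own prompt after the help text. -->
        <reprompt/>
      </help>
    </field>
  </form>
</vxml>
```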
Widget Selection Code
Figure 4 below shows a code fragment from the main menu form's widget field. The code is a filled tag set, which provides a mechanism for making selections or choices within a VoiceXML application. Here we can see how each widget choice branches off to its own widget form.
Figure 4: Widget Selection Code
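The branching pattern can be sketched in VoiceXML 2.0 as follows. To keep the field values predictable, this sketch uses option tags rather than a grammar; that choice, along with the form ids and the placeholder prompts, is an assumption for illustration and not necessarily how the demo is written.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; ids, values, and prompts are assumptions. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <field name="widget">
      <prompt>Say widget one, widget two, or widget three.</prompt>
      <!-- Each option supplies a spoken phrase and the value stored
           in the field when that phrase is recognized. -->
      <option value="widget1">widget one</option>
      <option value="widget2">widget two</option>
      <option value="widget3">widget three</option>

      <!-- Runs once the field holds a value; branch to the chosen form. -->
      <filled>
        <if cond="widget == 'widget1'">
          <goto next="#widget1"/>
        <elseif cond="widget == 'widget2'"/>
          <goto next="#widget2"/>
        <else/>
          <goto next="#widget3"/>
        </if>
      </filled>
    </field>
  </form>

  <!-- Stub target forms; a real application would do more here. -->
  <form id="widget1">
    <block><prompt>This is widget one.</prompt></block>
  </form>
  <form id="widget2">
    <block><prompt>This is widget two.</prompt></block>
  </form>
  <form id="widget3">
    <block><prompt>This is widget three.</prompt></block>
  </form>
</vxml>
```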
Widget 1 Code Example
Figure 5 below shows a portion of the Widget 1 form code. This example includes the help and filled tags. The help provides a short monologue stating the widget 1 choices that are available in the filled tag.
Figure 5: Widget 1 Code Example
Each filled condition (or selection) includes a disconnect tag that ends the call session, but only because this is a demonstration. In a production application, the caller controls the call and chooses when to end it. The application can provide a prompt such as "When you have finished, say good-bye" to let the caller know that the processing activities have concluded.
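A production-style variant, in which the call ends only when the caller says good-bye, could be sketched in VoiceXML 2.0 like this. The choices "price" and "inventory", the placeholder responses, and the ids are hypothetical; the article does not list the demo's actual Widget 1 choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; choices, ids, and prompts are assumptions. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <property name="universals" value="help"/>

  <form id="widget1">
    <field name="choice">
      <prompt>
        For widget one, say price or inventory.
        When you have finished, say good-bye.
      </prompt>
      <option value="price">price</option>
      <option value="inventory">inventory</option>
      <option value="goodbye">good bye</option>

      <help>
        <prompt>Say price, inventory, or good-bye.</prompt>
      </help>

      <filled>
        <if cond="choice == 'goodbye'">
          <!-- The caller chose to end the call. -->
          <prompt>Thank you for calling.</prompt>
          <disconnect/>
        <elseif cond="choice == 'price'"/>
          <prompt>Placeholder price information for widget one.</prompt>
        <else/>
          <prompt>Placeholder inventory information for widget one.</prompt>
        </if>
        <!-- Clear the field so the form asks again instead of ending. -->
        <clear namelist="choice"/>
      </filled>
    </field>
  </form>
</vxml>
```

Clearing the field after each answer sends the interpreter back to the same question, so the caller can keep asking until they choose to hang up or say good-bye.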
VoiceXML and the Web: Using XSLT 
Static content (information that can be used without rewriting) can be directed to other output formats through the use of XSLT, as shown in Figure 6 below. This static content can be stored in XML files; a format-specific XSL file that includes the appropriate namespaces then exports it to a target output such as Wireless Application Protocol (WAP), HTML, or VoiceXML. For example, the help-related content for WAP would be a simple menu listing and help card in a WAP card deck. For HTML, help could be accessed through a button image and displayed in a small secondary window. For VoiceXML, help content is created for each appropriate layer of the VoiceXML application.
Ken Abbott's 2001 book (listed in the References and Resources section) provides an excellent discussion, example code, and software for using VoiceXML, XSL, and XSLT to provide multiple output types. The following figure provides a representation of the files. The XSD is a large XML file containing XSL content coded using the editor XML Spy from Altova.
Figure 6: Use XSLT to prepare multiple distribution paths
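To make the idea concrete, here is a minimal XSLT 1.0 sketch that turns a small XML file of help topics into VoiceXML forms. The source vocabulary (helptopics, topic, text) is hypothetical and invented for this example; it is not taken from the demo or from Abbott's book. A sibling stylesheet with different literal output could target HTML or WAP from the same source file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only. Assumed input, in a made-up vocabulary:
       <helptopics>
         <topic id="main_menu">
           <text>Say widget one, widget two, or widget three.</text>
         </topic>
       </helptopics>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2001/vxml">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Wrap all generated forms in a single VoiceXML document. -->
  <xsl:template match="/helptopics">
    <vxml version="2.0">
      <xsl:apply-templates select="topic"/>
    </vxml>
  </xsl:template>

  <!-- One form per topic; the form id is derived from the topic id. -->
  <xsl:template match="topic">
    <form id="{@id}_help">
      <block>
        <prompt><xsl:value-of select="text"/></prompt>
      </block>
    </form>
  </xsl:template>
</xsl:stylesheet>
```

Each generated form simply speaks its topic text; the rest of the application would reach a form by its id, for example with a goto tag.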
Network View
Figure 7 below shows an example of a network configuration that provides access for land-line phones, modems and wireless PDAs, and cellular phones. The user can access information via WAP, VoiceXML, or HTML. The output is generated through XSLT.
Figure 7: Network View
DB = Database
JSP = JavaServer Pages
WAP = Wireless Application Protocol
PSTN = Public Switched Telephone Network
SGSN = Serving GPRS Support Node
GGSN = Gateway GPRS Support Node
A VoiceXML file is placed on a web server, along with other XML files that can be used to create HTML or WAP content. A VoiceXML server contains the processing that "brings the voice to life." Indeed, the VoiceXML file can be on a server in New York processed by a VoiceXML server in California in response to a phone call made from Florida. The phone can be a standard telephone or a cellular phone.
Building the VoiceXML VUI: Roles for the Technical Communicator 
While programmers develop most of the structure in a VoiceXML application, technical communicators can participate in the development of the audio prompts or dialogs for applications, or in creating the documentation for hardware products such as a VoiceXML server. Tools such as Final Draft®, used for screenplay scripts, provide TTS voice playback features to enhance script development.
References and Resources 
Books
VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0
Chetan Sharma and Jeff Kunins
December 2001
ISBN: 0471418935
John Wiley & Sons, Incorporated
496 pp. with CD
The VoiceXML Handbook: Understanding and Building the Phone-Enabled Web
Bob C. Edgar
March 2001
ISBN: 1578200849
C M P Books
481 pp.
Voice Enabling Web Applications: VoiceXML and Beyond
Ken Abbott
November 2001
ISBN: 1893115739
APress L. P.
256 pp. with CD
Designing Effective Speech Interfaces
Susan Weinschenk and Dean T. Barker
February 1999
ISBN: 0471375454
John Wiley & Sons, Incorporated
406 pp.
How to Build a Speech Recognition Application: A Style Guide for Telephony Dialogues (2nd Edition)
Bruce Balentine and David P. Morgan
April 1999
ISBN: 0967127823
Enterprise Integration Group, Incorporated
393 pp.
Magazines
Speech Technology Magazine
www.speechtechmag.com
XML Journal
www.sys-con.com/xml
XML Magazine
www.xml-mag.com
Vendors
AT&T Labs Natural Voices www.naturalvoices.att.com
IBM Voice Systems www-4.ibm.com/software/speech
Nuance Developer Network (NDN) extranet.nuance.com/developer
VoiceGenie developer.voicegenie.com
TellMe www.tellme.com or studio.tellme.com
Web Sites
VoiceXML Planet www.voicexmlplanet.com
Wireless Developer Network - Voice www.wirelessdevnet.com/channels/voice
SALT www.saltforum.org
CCXML www.telera.com/CCXML.html
VoiceXML Forum www.voicexml.org/index.html
FreeTTS Project Page on SourceForge.net FreeTTS.sourceforge.net
Ken Rehor's World of VoiceXML www.rehor.com/voicexml
Voice Web Community www.voicewebservices.com/community/default.asp
Discussion List
voicexml@yahoogroups.com
STC Intercom Articles
Ask Your Phone
Naomi Grattan
July/August 2001
48(7): pp. 10-11
Beyond the Bleeding Edge: Voice Portals
Neil Perlin
February 2002
49(2): pp. 37-38
Copyright 2002, Allen Beebe
Allen Beebe has over 18 years of experience in technical writing as well as creating both print and online documents. He has a great interest in developing Help systems or other user assistance tools. His primary focus has been online documents using Microsoft Windows Help, Microsoft HTMLHelp, Sun Microsystems JavaHelp, Oracle Help for Java, and eHelp's WebHelp. He has conducted workshops on using JavaHelp that encouraged a hands-on approach to understanding the use of XML in the JavaHelp system. He recently developed a JavaHelp system and print documentation for WaterCove Networks that supports the WaterCove Networks Mobile Data Services System for cellular phone networks. Allen can be reached at allen.beebe@verizon.net.

Copyright © WinWriters. All Rights Reserved.
Joe Welinske: jw@winwriters.com