Just Say "Help":
   Implementing Help in VoiceXML

By Allen Beebe, WaterCove Networks


Introduction

One of the exciting aspects of XML is the number of ways it is being used to provide new means of communicating and gathering new information. One such use is VoiceXML, an emerging W3C standard that brings voice to the Web or the Web to the phone.

You'll hear more about VoiceXML as more organizations create voice portals (for up-to-the-second information services like stock reports) or other customer-relationship tools. This article provides an overview of VoiceXML and the VoiceXML help tag. It also includes some resources so you can learn more about VoiceXML.

The entire code listing for this VoiceXML prototype is available for download.

First: Try the VoiceXML Demo

I've prepared a simple VoiceXML application that you can call and talk to as an introduction to developing help for VoiceXML. The demo simulates a sales/inventory application that provides the caller with information for widgets. The demo provides two levels of help: overall and for each widget. The demo runs through VoiceXML services provided by TellMe Networks.

Access the demo from any telephone or cellular phone in the U.S.

  1. Call TellMe at 1-800-555-Tell (8355).
  2. During the TellMe Menu monologue, dial 1-01249.
  3. Listen to the demo and respond to the prompts. Say "help" to hear how a VoiceXML application can provide help at different levels.

Later, we'll look at portions of the VoiceXML file that you've called.

Defining VoiceXML

VoiceXML is an XML-based markup language for developing audio dialogs that use text-to-speech (TTS) synthesis, digitized audio, speech recognition, and dual-tone multi-frequency (DTMF) key input. It gives voice applications the advantages of web-based development and content delivery.
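
To make those input and output modes concrete, here is a minimal sketch of a VoiceXML 2.0 document. It is not taken from the demo: the form and field names, the option values, and the file welcome.wav are all assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="services">
    <field name="service">
      <prompt>
        <!-- Digitized audio. The enclosed text is spoken by TTS if the
             (hypothetical) file welcome.wav cannot be played. -->
        <audio src="welcome.wav">Welcome to the information line.</audio>
        <!-- Plain text inside a prompt is rendered by TTS. -->
        Say weather or press 1. Say traffic or press 2.
      </prompt>
      <!-- Each option accepts the spoken phrase or the DTMF key. -->
      <option dtmf="1" value="weather">weather</option>
      <option dtmf="2" value="traffic">traffic</option>
      <filled>
        <prompt>Getting the <value expr="service"/> report.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The option elements let a single field accept either speech recognition or key input, which is one simple way to keep DTMF available as a fallback.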

VoiceXML Implementations

VoiceXML is gaining use in voice portal applications that include:

  • Customer Relationship Management (CRM) applications, including call center support systems and sales support tools that provide information such as orders, part numbers, or available inventory.
  • Information service applications providing weather, traffic, or stock quotes.
  • Voice-mail and hands-free cellular phone applications.
  • Voice shopping applications enabling users to purchase tickets or make reservations.
  • Interactive Voice Response (IVR) systems, where VoiceXML is replacing systems that use vendor-specific APIs and rely on DTMF input. (We've all placed calls that wound us through a maze of numbered menus only to give us the choice of leaving a voice-mail message or returning to the main menu.)

Relationships to Other Technologies

VoiceXML makes extensive use of other voice technologies. For example, TTS technology has improved to the point of providing very realistic-sounding voices. Several companies lead the way with commercial products, including AT&T Natural Voices™ and SpeechWorks Speechify™. You can learn more by visiting the companies' web sites or by trying FreeTTS, an open source, Java-based TTS package that lets you develop your own TTS material.

While VoiceXML is still gaining acceptance, it already is facing competition from other new technologies that claim to offer other features and improvements:

  • Call Control eXtensible Markup Language (CCXML) provides advanced call-handling capabilities for interaction with call centers and conferencing. It also allows calls to be transferred on to succeeding call legs; currently, that is difficult to do with VoiceXML. (See the CCXML web site listed in the References and Resources section for further information.)
  • Speech Application Language Tags (SALT) extend the current mark-up languages (HTML, XHTML, and XML) to provide multi-modal access through speech, keyboard, keypad, mouse, or stylus, providing output that includes synthesized speech, other audio, video, text, and graphics. Developers add the SALT information to their HTML or XHTML code. (See the SALT web site listed in the References and Resources section for more information.)

VoiceXML Structure

VoiceXML code uses a hierarchical parent-child structure as illustrated in Figure 1 below. Most code fits within the <form> and <field> tags. VoiceXML provides an extensive range of tags so developers can create sophisticated voice applications, even those that can interact with databases to provide real-time data updates and feedback.

Figure 1: Code Structure Diagram
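
As a rough companion to the diagram, the sketch below shows the same parent-child nesting in markup. It is an illustration written against VoiceXML 2.0, not the demo's listing; the ids, field name, and prompt wording are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- vxml is the root; each form below is one dialog. -->
  <form id="intro">
    <!-- block: statements that run without waiting for the caller. -->
    <block>
      <prompt>Welcome to the widget information line.</prompt>
      <goto next="#main_menu"/>
    </block>
  </form>

  <form id="main_menu">
    <!-- field: gathers one item of caller input. -->
    <field name="widget">
      <prompt>Which widget would you like to hear about?</prompt>
      <option>widget one</option>
      <option>widget two</option>
      <option>widget three</option>
      <!-- help: runs when the caller asks for help in this field. -->
      <help>
        <prompt>Say widget one, widget two, or widget three.</prompt>
      </help>
      <!-- filled: runs once the field has a value. -->
      <filled>
        <prompt>You chose <value expr="widget"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```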

VoiceXML Help Tag

A user may encounter a problem during a VoiceXML session. If the user says "help," the application responds with help relevant to that portion of the session. The help can be programmed to offer information in shorter and shorter segments each time it is requested, to encourage the user to continue. The voice the user hears can come from either TTS or a recorded audio file. Invoking the help may also play a brief tone or melody to indicate that help has been activated.
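
One way to get the "shorter and shorter" behavior is the count attribute that VoiceXML 2.0 defines on help (and on the other event handlers): the interpreter picks the handler with the highest count that does not exceed the number of times help has been requested in that field. The sketch below assumes a hypothetical main menu; the wording and names are not from the demo.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <field name="widget">
      <prompt>Which widget would you like to hear about?</prompt>
      <option>widget one</option>
      <option>widget two</option>
      <option>widget three</option>

      <!-- First request for help: the full explanation. -->
      <help count="1">
        <prompt>
          This service describes three widgets. To hear about one of
          them, say widget one, widget two, or widget three.
        </prompt>
      </help>
      <!-- Second request: a shorter reminder. -->
      <help count="2">
        <prompt>Say widget one, widget two, or widget three.</prompt>
      </help>
      <!-- Third and later requests: the shortest form. -->
      <help count="3">
        <prompt>Widget one, two, or three?</prompt>
      </help>
    </field>
  </form>
</vxml>
```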

Looking Inside the Demo

Now that you've called the demo and we've gone over some basic VoiceXML, we'll look at some code from the demonstration VoiceXML application.

Form, Field, and (Internal) Grammar

Figure 2 below shows code for the intro and main menu forms along with an inline grammar for the widget field. Other constructs (within the intro form) include a block containing the introductory TTS audio and a dual catch event tag for no match and no input.

Figure 2: Code for the intro and main menu forms

The inline grammar tag set identifies the words and phrases that the demo can understand as it converses with you. Help is identified as one option: the grammar recognizes the word "help" and the phrases "help me" and "assist me please." The other portions of the grammar identify the three widgets. More complex grammars can be placed in a separate grammar file and referenced by a tag within the VoiceXML file.
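
For readers without the figure, here is a sketch of what such a field might look like. It uses the W3C XML grammar format (SRGS), which is an assumption: the demo may well have used a platform-specific grammar syntax instead. The field name, rule id, and prompt text are also hypothetical, and the combined handler is placed in the field here rather than in the intro form. What the application does with each recognized phrase is left to its help and filled handlers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="main_menu">
    <field name="widget">
      <prompt>Say widget one, widget two, or widget three.</prompt>

      <!-- Inline grammar: every phrase this field will accept. -->
      <grammar type="application/srgs+xml" version="1.0" root="choice">
        <rule id="choice" scope="public">
          <one-of>
            <!-- Help phrases -->
            <item>help</item>
            <item>help me</item>
            <item>assist me please</item>
            <!-- The three widgets -->
            <item>widget one</item>
            <item>widget two</item>
            <item>widget three</item>
          </one-of>
        </rule>
      </grammar>

      <!-- One handler for two events: unrecognized speech and silence. -->
      <catch event="nomatch noinput">
        <prompt>Sorry, I did not get that.</prompt>
        <reprompt/>
      </catch>
    </field>
  </form>
</vxml>
```

To move the grammar into its own file, the inline element could be replaced by a reference such as <grammar src="widgets.grxml" type="application/srgs+xml"/>, where widgets.grxml is a hypothetical file name.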

Main Menu Help Code

Figure 3 below shows the code for the main menu help. The help includes a .wav file that provides an auditory prompt indicating a change in application activity. The TTS audio provides simple instructions on what to say in order to access information on a specific widget.

Figure 3: Main Menu Help code
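
A handler along these lines could be sketched as follows. The audio file name help_tone.wav and the prompt wording are assumptions, and placing the handler on the form (so that it covers every field that lacks its own help) is one design choice among several, not necessarily the demo's.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <!-- Declared on the form, so it serves any field in the form that
         has no help handler of its own. -->
    <help>
      <prompt>
        <!-- Short tone (hypothetical file) marking the switch to help. -->
        <audio src="help_tone.wav"/>
        <!-- TTS instructions. -->
        To hear about a widget, say widget one, widget two, or widget three.
      </prompt>
    </help>

    <field name="widget">
      <prompt>Which widget would you like to hear about?</prompt>
      <option>widget one</option>
      <option>widget two</option>
      <option>widget three</option>
    </field>
  </form>
</vxml>
```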

Widget Selection Code

Figure 4 below shows a code fragment from the main menu form's widget field. The code is a filled tag set, which provides a mechanism for making selections or choices within a VoiceXML application. Here we can see how the widget choice branches off to a widget form.

Figure 4: Widget Selection Code
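
The branching can be sketched with the if, elseif, and else tags inside filled, each branch using goto to jump to another form in the same document. The option values and form ids below are assumptions, and the three target forms are reduced to one-line stubs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <field name="widget">
      <prompt>Which widget would you like to hear about?</prompt>
      <!-- The value attribute is what the field variable receives. -->
      <option value="widget1">widget one</option>
      <option value="widget2">widget two</option>
      <option value="widget3">widget three</option>

      <filled>
        <!-- Branch to the form that matches the caller's choice. -->
        <if cond="widget == 'widget1'">
          <goto next="#widget1"/>
        <elseif cond="widget == 'widget2'"/>
          <goto next="#widget2"/>
        <else/>
          <goto next="#widget3"/>
        </if>
      </filled>
    </field>
  </form>

  <!-- Stub target forms; a real application would hold a dialog here. -->
  <form id="widget1"><block><prompt>Widget one.</prompt></block></form>
  <form id="widget2"><block><prompt>Widget two.</prompt></block></form>
  <form id="widget3"><block><prompt>Widget three.</prompt></block></form>
</vxml>
```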

Widget 1 Code Example

Figure 5 below shows a portion of the Widget 1 form code. This example includes the help and filled tags. The help provides a short monologue stating the Widget 1 choices that are available in the filled tag.

Figure 5: Widget 1 Code Example

Each filled condition (or selection) includes a disconnect tag that ends the call session, but only because this is a demonstration. In a production application, the caller controls the call and chooses when to end it. The application can provide a prompt such as "When you have finished, say good-bye" to let the caller know that processing has concluded.
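
A widget form in this style might look like the sketch below. The choices (price and inventory), the field name, and the figures spoken in the prompts are placeholders invented for illustration; they are not the demo's content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="widget1">
    <field name="detail">
      <prompt>Widget one. Say price or inventory.</prompt>
      <option value="price">price</option>
      <option value="inventory">inventory</option>

      <!-- Field-level help: states the choices handled in filled. -->
      <help>
        <prompt>For widget one, you can say price or inventory.</prompt>
      </help>

      <filled>
        <if cond="detail == 'price'">
          <!-- Placeholder figure, not real data. -->
          <prompt>Widget one costs ten dollars.</prompt>
          <!-- Demonstration only: end the call after answering. -->
          <disconnect/>
        <else/>
          <!-- Placeholder figure, not real data. -->
          <prompt>There are twenty five of widget one in stock.</prompt>
          <disconnect/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

In a production version, each disconnect would more likely give way to a goto back to the main menu, leaving it to the caller to end the call.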

VoiceXML and the Web: Using XSLT

Static content (information that can be used without rewriting) can be directed to other outputs through the use of XSLT, as shown in Figure 6 below. This static content can be stored in XML files; by using a specific XSL file that includes namespaces, the content can be exported to a specific output format such as Wireless Application Protocol (WAP), HTML, or VoiceXML. For example, the help-related content for WAP would be a simple menu listing and help card in a WAP card deck. For HTML, help could be accessed through a button image and displayed in a small secondary window. For VoiceXML, help content is created for each appropriate layer of the VoiceXML application.

In his 2001 book Voice Enabling Web Applications: VoiceXML and Beyond, Ken Abbott provides an excellent discussion, example code, and software for using VoiceXML, XSL, and XSLT to produce multiple output types. The following figure provides a representation of the files. The XSD is a large XML file containing XSL content, coded using the XML Spy editor from Altova.

Figure 6: Use XSLT to prepare multiple distribution paths
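
As a small illustration of the VoiceXML path only, the XSLT 1.0 stylesheet below turns a hypothetical XML help file into a VoiceXML document with one form per help topic. The input vocabulary (helpset, topic, text) is an assumption made up for this sketch; it is not the structure used in Abbott's book or in the demo.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2001/vxml">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input shape:
       <helpset>
         <topic id="main_menu"><text>Say widget one.</text></topic>
       </helpset> -->
  <xsl:template match="/helpset">
    <vxml version="2.0">
      <xsl:apply-templates select="topic"/>
    </vxml>
  </xsl:template>

  <!-- One VoiceXML form per help topic, spoken by TTS. -->
  <xsl:template match="topic">
    <form id="{@id}">
      <block>
        <prompt><xsl:value-of select="text"/></prompt>
      </block>
    </form>
  </xsl:template>
</xsl:stylesheet>
```

A second stylesheet matching the same topic elements could emit HTML or WAP markup instead, which is what lets one set of static content feed several outputs.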

Network View

Figure 7 below shows an example of a network configuration that provides access for land-line phones, modems and wireless PDAs, and cellular phones. The user can access information via WAP, VoiceXML, or HTML. The output is generated through XSLT.

Figure 7: Network View

DB   = Database
JSP  = JavaServer Pages
WAP  = Wireless Application Protocol
PSTN = Public Switched Telephone Network
SGSN = Serving GPRS Support Node
GGSN = Gateway GPRS Support Node

A VoiceXML file is placed on a web server, along with other XML files that can be used to create HTML or WAP content. A VoiceXML server contains the processing that "brings the voice to life." Indeed, the VoiceXML file can be on a server in New York processed by a VoiceXML server in California in response to a phone call made from Florida. The phone can be a standard telephone or a cellular phone.

Building the VoiceXML VUI: Roles for the Technical Communicator

While programmers develop most of the structure in a VoiceXML application, technical communicators can participate in the development of the audio prompts or dialogs for applications, or in creating the documentation for hardware products such as a VoiceXML server. Tools such as Final Draft®, used for screenplay scripts, provide TTS voice playback features to enhance script development.

References and Resources

Books

VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0
Chetan Sharma and Jeff Kunins
December 2001
ISBN: 0471418935
John Wiley & Sons, Incorporated
496 pp. with CD

The VoiceXML Handbook: Understanding and Building the Phone-Enabled Web
Bob C. Edgar
March 2001
ISBN: 1578200849
CMP Books
481 pp.

Voice Enabling Web Applications: VoiceXML and Beyond
Ken Abbott
November 2001
ISBN: 1893115739
Apress
256 pp. with CD

Designing Effective Speech Interfaces
Susan Weinschenk and Dean T. Barker
February 1999
ISBN: 0471375454
John Wiley & Sons, Incorporated
406 pp.

How to Build a Speech Recognition Application: A Style Guide for Telephony Dialogues (2nd Edition)
Bruce Balentine and David P. Morgan
April 1999
ISBN: 0967127823
Enterprise Integration Group, Incorporated
393 pp.

Magazines

Speech Technology Magazine
www.speechtechmag.com

XML Journal
www.sys-con.com/xml

XML Magazine
www.xml-mag.com

Vendors

AT&T Labs Natural Voices: www.naturalvoices.att.com
IBM Voice Systems: www-4.ibm.com/software/speech
Nuance Developer Network (NDN): extranet.nuance.com/developer
VoiceGenie: developer.voicegenie.com
TellMe: www.tellme.com or studio.tellme.com

Web Sites

VoiceXML Planet: www.voicexmlplanet.com
Wireless Developer Network - Voice: www.wirelessdevnet.com/channels/voice
SALT: www.saltforum.org
CCXML: www.telera.com/CCXML.html
VoiceXML Forum: www.voicexml.org/index.html
FreeTTS Project Page on SourceForge.net: FreeTTS.sourceforge.net
Ken Rehor's World of VoiceXML: www.rehor.com/voicexml
Voice Web Community: www.voicewebservices.com/community/default.asp

Discussion List

voicexml@yahoogroups.com

STC Intercom Articles

Ask Your Phone
Naomi Grattan
July/August 2001
48(7): pp. 10-11

Beyond the Bleeding Edge: Voice Portals
Neil Perlin
February 2002
49(2): pp. 37-38

Copyright 2002, Allen Beebe


Allen Beebe has over 18 years of experience in technical writing as well as creating both print and online documents. He has a great interest in developing Help systems or other user assistance tools. His primary focus has been online documents using Microsoft Windows Help, Microsoft HTMLHelp, Sun Microsystems JavaHelp, Oracle Help for Java, and eHelp's WebHelp. He has conducted workshops on using JavaHelp that encouraged a hands-on approach to understanding the use of XML in the JavaHelp system. He recently developed a JavaHelp system and print documentation for WaterCove Networks that supports the WaterCove Networks Mobile Data Services System for cellular phone networks. Allen can be reached at allen.beebe@verizon.net.


Copyright © WinWriters. All Rights Reserved.
Joe Welinske: jw@winwriters.com