VoiceXML for speech-activated information retrieval

Recent developments in the fast-growing field of mobile Internet focus on presenting Web content on portable devices (mobile telephones, handheld computers and personal digital assistants, or PDAs) in much the same way it appears in a Web browser. But technologies such as Wireless Application Protocol (WAP), Wireless Markup Language (WML) and Handheld Device Markup Language (HDML) share a drawback: content is displayed on far smaller screens. Fortunately, squinting at a screen the size of a postage stamp is not the only way to access the Internet's vast information offering. The convenient alternative is voice-enabled Web access. In this installment of the newsletter we'll take a closer look at VoiceXML, the technology that makes this possible.

VoiceXML is built on eXtensible Markup Language (XML), a standard published by the W3C, the consortium responsible for developing the standards that underlie the Web. Strictly speaking, VoiceXML is an XML-based markup language that serves a single purpose: standardizing voice-activated retrieval of Internet content as far as possible. This means conventional phone systems can be used to access information and applications on the Internet via spoken selection dialogues, voice commands and interactive replies, including touch-tone (DTMF) key presses. VoiceXML is particularly well suited to two tasks:

  1. delivering the contents of an Internet site in speech form (for example, to enable access via mobile phones)
  2. facilitating development of new interactive and voice-controlled phone services based on a standard, open architecture.

VoiceXML considerably eases the development of voice-controlled applications because it builds on existing tools, Web infrastructure and servers. Instead of a Web browser, a VoiceXML Interpreter running on a telephony server gives users telephone access to content, for example information residing on Tamino, Software AG's XML server.
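
To make the idea of a spoken selection dialogue concrete, here is a minimal sketch of what such a VoiceXML document might look like, wrapped in a few lines of Python that check it is well-formed XML and list the available choices. The element names follow the W3C VoiceXML 2.0 vocabulary; the prompts, anchor names and the helper function `list_choices` are invented for illustration and are not taken from any particular product.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # namespace used by VoiceXML 2.0

# Hypothetical voice portal menu: prompts and form ids are illustrative only.
MENU_DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>Say share prices or press 1. Say sports results or press 2.</prompt>
    <choice dtmf="1" next="#prices">share prices</choice>
    <choice dtmf="2" next="#sports">sports results</choice>
    <noinput>Sorry, I did not hear you. <reprompt/></noinput>
  </menu>
  <form id="prices">
    <block>Here are today's share prices.</block>
  </form>
  <form id="sports">
    <block>Here are the latest sports results.</block>
  </form>
</vxml>
"""


def list_choices(document: str) -> list[tuple[str, str, str]]:
    """Return (DTMF key, spoken phrase, target) for every menu choice."""
    # Parsing fails with ParseError if the document is not well-formed XML.
    root = ET.fromstring(document.encode("utf-8"))
    return [
        (choice.get("dtmf"), (choice.text or "").strip(), choice.get("next"))
        for choice in root.iter(f"{{{VXML_NS}}}choice")
    ]


if __name__ == "__main__":
    for key, phrase, target in list_choices(MENU_DOCUMENT):
        print(f"key {key} or '{phrase}' -> {target}")
```

Each `<choice>` can be selected either by speaking its phrase or by pressing the touch-tone key given in `dtmf`, which is how one document covers both voice commands and key presses.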

Special features that delight developers

Voice portals are among the most popular VoiceXML applications. They are seeing ever more widespread use in customer service, in the form of automated human-machine voice dialogues (help desk, support), and in information services (share prices, sports results). But VoiceXML can do much more. It lets users access intranets via voice commands, for example, or it can be used to create a "reminder" service that alerts users to upcoming events and appointments. Notably, there is a strict division between applications that run on a standard Web server and voice dialogues provided by a telephony server. This opens up an entirely new business opportunity: "voice service providing". Part of what makes this such an attractive proposition is that developers are not compelled to buy or maintain additional hardware and software, and are instead free to direct all their attention towards developing and rolling out telephone voice services. Another area with a promising future is consumer applications, where voice recognition will eventually bring unrivalled ease of use and convenience to consumers.

The architecture of VoiceXML

Every VoiceXML solution is based on several components:

  • An application server. Generally this is the Web server on which all the applications and databases run. It can also serve as an interface to external databases or transaction servers.
  • A VoiceXML telephony server. The VoiceXML Interpreter runs on this platform. Acting as the interface to the caller, it handles all VoiceXML dialogues, spoken input and commands, ensuring intelligible communication with the most diverse VoiceXML-enabled end devices.
  • A packet network. Based on TCP/IP, it connects the various application servers and telephony servers.
  • A telephone network. This may be a public telephone network or an enterprise's private network. VoiceXML solutions can also communicate across Voice over IP (VoIP) networks. Users make their calls using standard telephones.
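
The division of labour between these components can be sketched from the application server's side. In the sketch below, a small Python WSGI application answers an HTTP request from the telephony server with a freshly generated VoiceXML document, which the VoiceXML Interpreter would then read out to the caller. This is a sketch under stated assumptions, not a description of any vendor's product: the `symbol` query parameter, the `SHARE_PRICES` table and the function names are all hypothetical, and a real application server would query a database instead of a dictionary.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical data standing in for a database on the application server.
SHARE_PRICES = {"ACME": "42.50", "GLOBEX": "17.25"}


def render_quote(symbol: str) -> str:
    """Build a VoiceXML document that speaks the price for one symbol."""
    price = SHARE_PRICES.get(symbol.upper())
    if price is None:
        message = f"No price is available for {symbol}."
    else:
        message = f"{symbol.upper()} is trading at {price}."
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        # escape() keeps characters such as & and < from breaking the XML.
        f"    <block><prompt>{escape(message)}</prompt></block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


def application(environ, start_response):
    """WSGI entry point: the telephony server fetches VoiceXML over HTTP."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    symbol = query.get("symbol", ["unknown"])[0]
    body = render_quote(symbol).encode("utf-8")
    start_response(
        "200 OK",
        [
            # Media type registered for VoiceXML documents.
            ("Content-Type", "application/voicexml+xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` serves the application on port 8000. The telephony server plays the role a Web browser would otherwise play: it requests the document over TCP/IP and renders it as speech instead of pixels.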

VoiceXML is a powerful language for developing and running voice-controlled dialogues and commands. It allows developers to bring together Internet architecture and the tools and technologies of leading vendors to create innovative new solutions. Because VoiceXML is so thoroughly standardized, these solutions can be created with native XML development tools and deployed easily alongside databases such as Software AG's Tamino. This gives developers entirely new possibilities for designing products and solutions for both corporate customers and private consumers.