Voice recognition software comes in two flavours: one for online applications and another for offline applications.

A Web application (an example of an online application) typically uses the telephone as the voice input channel. When a user calls in, a voice server handles the call, recognises the caller's speech and sends an HTTP request to the Web server. The Web server, which may access the same back-end infrastructure as the visual Web site, returns VoiceXML as the HTTP response. The voice server interprets the VoiceXML and converts its text to speech for the caller. The point is that VoiceXML changes the presentation of the information, not the information itself or the way the Web server or the back-end system generates it. VoiceXML thus provides a whole new way of accessing the same Web data and services: by voice.

A .Net smart client application (an example of an offline application) typically uses a microphone as the voice input channel. The input voice is converted to text (speech-to-text), and the text is matched against a pre-defined grammar file (nothing but an XML file). The recognition engine generates events based on the confidence level of the recognition: it fires a “reco” event if the confidence level is 1, and a “hypo” event if the confidence level is any other value.

A Speech Grammar file is a platform-independent, vendor-independent textual representation of grammars for use in speech recognition. Speech recognizers use grammars to determine what they should listen for.
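To make the online case described earlier more concrete, here is a minimal Python sketch (not taken from the original demo) of a Web server presenting the same back-end data either as HTML or as VoiceXML. The function names and the sample order are assumptions made purely for illustration; only the `vxml`, `form`, `block` and `prompt` elements come from the VoiceXML standard.

```python
import html
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def order_status_text(order):
    """Back-end information: identical for every presentation channel."""
    return "Order {id} is {status}.".format(**order)


def render_html(message):
    """Visual presentation for a Web browser."""
    return "<html><body><p>{}</p></body></html>".format(html.escape(message))


def render_vxml(message):
    """Voice presentation: a VoiceXML document whose prompt is spoken by the voice server."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = message
    return ET.tostring(vxml, encoding="unicode")


# Hypothetical sample data standing in for a back-end lookup.
sample_order = {"id": "4711", "status": "shipped"}
message = order_status_text(sample_order)
html_page = render_html(message)
vxml_page = render_vxml(message)
```

Both renderers receive the same `message`; only the markup wrapped around it differs, which is the sense in which VoiceXML changes the presentation rather than the information.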
A sample grammar file looks as follows:

    <grammar xmlns="http://www.w3.org/2001/06/grammar" xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions" mode="voice" tag-format="semantics-ms/1.0" version="1.0" xml:lang="en-US">
        <rule id="NEWORDER" scope="public">
            <item>create new order</item>
        </rule>
        <rule id="ORDERSEARCH" scope="public">
            <!-- one-of: either phrase matches this rule -->
            <one-of>
                <item>go to orders</item>
                <item>go to sales orders</item>
            </one-of>
        </rule>
        <rule id="ACTIVITYSEARCH" scope="public">
            <item>go to activities</item>
        </rule>
    </grammar>

The grammar file above contains three rules, with the rule ids NEWORDER, ORDERSEARCH and ACTIVITYSEARCH.

If the voice recognition engine recognises the phrase “create new order”, it fires a “reco” event with the rule id “NEWORDER”. If it recognises the phrase “go to orders” or “go to sales orders”, it fires a “reco” event with the rule id “ORDERSEARCH”. If it recognises the phrase “go to activities”, it fires a “reco” event with the rule id “ACTIVITYSEARCH”.

Now, let’s look at the components that are required to develop a voice-enabled .Net smart client application. You require the following:

1. Visual Studio .Net 2003 development environment
2. Microsoft SAPI
3. Your .Net smart client application

The Speech Application Programming Interface, or SAPI, is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. Microsoft has shipped SAPI either as part of a Speech SDK or as part of the Windows OS itself. Alternatively, you can download “Microsoft SAPI 5.1” from http://www.microsoft.com/downloads/details.aspx?FamilyId=5E86EC97-40A7-453F-B0EE-6583171B4530&displaylang=en

The steps you need to carry out in your .Net smart client application are:

1. Add Interop.SpeechLib (the speech library) as a reference to your VS 2003 .Net project.
2. Write code in the application launch event to create an instance of the recognition context using the speech library.
3. Write code to load the grammar file into the recognition context using the speech library API.
4. Activate the rule ids in the grammar file using the speech library API.
5. Handle the “reco” event and the “hypo” event in your code.
6. Give the “reco” event handler a case statement for each of the rule ids. The handler can be as simple as invoking a button click event.

One can also very easily make the application respond with speech (text-to-speech) when a voice command successfully completes a process. During the recent SAP TechEd 06, a demo of such a voice-enabled CRM Mobile Sales .Net laptop application was presented.
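The rule-id dispatch in step 6 can be sketched independently of the speech library. The Python snippet below is an illustrative stand-in, not SAPI or Interop.SpeechLib code: it reads a trimmed copy of the sample grammar, builds a phrase-to-rule-id table, and routes a recognised phrase to a hypothetical “reco” handler. The function names and the returned action strings are assumptions; in the real application the engine performs the matching and the handler would call into the UI.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "{http://www.w3.org/2001/06/grammar}"

# Trimmed copy of the sample grammar (SAPI-specific attributes omitted).
GRAMMAR_XML = """<grammar xmlns="http://www.w3.org/2001/06/grammar" mode="voice" version="1.0" xml:lang="en-US">
  <rule id="NEWORDER" scope="public">
    <item>create new order</item>
  </rule>
  <rule id="ORDERSEARCH" scope="public">
    <one-of>
      <item>go to orders</item>
      <item>go to sales orders</item>
    </one-of>
  </rule>
  <rule id="ACTIVITYSEARCH" scope="public">
    <item>go to activities</item>
  </rule>
</grammar>"""


def load_phrases(grammar_xml):
    """Return a dict mapping each phrase in the grammar to its rule id."""
    root = ET.fromstring(grammar_xml)
    phrases = {}
    for rule in root.iter(SRGS_NS + "rule"):
        for item in rule.iter(SRGS_NS + "item"):
            phrases[item.text.strip().lower()] = rule.get("id")
    return phrases


def on_reco(rule_id):
    """Hypothetical "reco" handler: one branch per rule id."""
    if rule_id == "NEWORDER":
        return "open new order form"
    if rule_id == "ORDERSEARCH":
        return "open order search"
    if rule_id == "ACTIVITYSEARCH":
        return "open activity search"
    return "ignored"


def recognise(text, phrases):
    """Stand-in for the engine: look up the phrase and call the handler on a match."""
    rule_id = phrases.get(text.strip().lower())
    return on_reco(rule_id) if rule_id else None


PHRASES = load_phrases(GRAMMAR_XML)
```

For example, `recognise("go to sales orders", PHRASES)` reaches the ORDERSEARCH branch of the handler, while a phrase outside the grammar yields `None` and no handler call.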

2 Comments


  1. Prabhu S
    Hi Selvaraj

    Your blog was indeed very informative. I’m caught up with a requirement where we need to perform an entire transaction in a customized SAP application via user command (voice-enabled through a microphone). Your input is requested on how to go ahead with this kind of requirement. Is it possible to achieve this with my application running in ECC?

    Kindly guide me on this. My mail ID is

    mailprabhu@gmail.com

    Thanks
    Prabhu

