Giving a Voice to SAP Conversational AI – Challenge Submission

Voice platforms like Alexa and Google Assistant make it easy to provide a custom voice experience to your clients without going deep into audio processing, because everything is part of the platform. But what if you have already invested considerable effort in building a chatbot on SAP Conversational AI? You certainly don’t want to switch to a totally new platform now.

Intro

This tutorial is part of the SAP Conversational AI Tutorial Challenge 2021. The goal is to show how you can build your own voice platform using SAP Conversational AI and the Open Source tool Botium Speech Processing.

After completing this tutorial you will have a working sample voice interface for your chatbot as a starter for your own custom implementation.

Botium Speech Processing is a unified, developer-friendly API to the best available free and Open-Source Speech-To-Text and Text-To-Speech services. Let’s combine the two, but first let’s have a quick look at the architecture.

  1. User speaks into a microphone
  2. A Speech-To-Text service transcribes the speech into text (Botium Speech Processing)
  3. A chatbot platform extracts information from the text and builds the text response (SAP Conversational AI)
  4. A Text-To-Speech service converts the response text into speech (Botium Speech Processing)
  5. User listens to the audio response
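
The five steps above can be sketched as one small function. This is only an illustration of the data flow: the names voiceRoundTrip, speechToText, askChatbot and textToSpeech are hypothetical and not part of Botium or SAP Conversational AI, and the stub services stand in for the real ones so the sketch runs without a microphone, network access, or a bot token.

```javascript
// Sketch of the voice round trip. The three services are injected as functions
// (hypothetical names), so nothing here depends on a real API.
async function voiceRoundTrip (audioIn, { speechToText, askChatbot, textToSpeech }) {
  const userText = await speechToText(audioIn) // step 2: audio -> text
  const replyTexts = await askChatbot(userText) // step 3: text -> text responses
  const replyAudio = []
  for (const replyText of replyTexts) {
    replyAudio.push(await textToSpeech(replyText)) // step 4: text -> audio
  }
  return { userText, replyTexts, replyAudio }
}

// Stub services for a dry run: "audio" is just UTF-8 text in a Buffer.
const stubs = {
  speechToText: async (audio) => audio.toString('utf8'),
  askChatbot: async (text) => [`You said: ${text}`],
  textToSpeech: async (text) => Buffer.from(text, 'utf8')
}

voiceRoundTrip(Buffer.from('hello'), stubs)
  .then(result => console.log(result.replyTexts))
```

Passing the services in as parameters keeps each step replaceable, which is the point of the architecture: the chatbot stays on SAP Conversational AI while the speech parts are handled separately.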

So let’s get to the fun part.

Prerequisites

Here is what you need to have available on your workstation:

Launch Botium Speech Processing Service

Botium Speech Processing comes with a reasonable default configuration for a voice platform

Both of them are free and Open Source, a good match for getting started with voice technologies, and without a doubt among the best free voice tools available.

Launching it can be done with a few command line calls.

$ git clone https://github.com/codeforequity-at/botium-speech-processing.git
$ cd botium-speech-processing
$ docker-compose up -d

Depending on network speed and hardware this step can take a while.

Pointing your browser to http://localhost will show the API explorer for Botium Speech Processing.

Botium Speech Processing API Explorer
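
Once the service is running, the same kind of requests shown in the code walkthrough at the end of this article could be pointed at your local instance. The following is a minimal sketch that only builds axios-style request options and sends nothing. It assumes that the local instance serves the same /api/convert, /api/stt and /api/tts routes as the hosted endpoints used in the walkthrough; the API explorer shows what your installation actually exposes. The helper name speechRequests is hypothetical.

```javascript
// Builds request options for a Botium Speech Processing instance.
// Assumption: the local Docker setup exposes the same routes as the hosted
// endpoints in the walkthrough code.
function speechRequests (baseUrl = 'http://localhost', lang = 'en') {
  const root = baseUrl.replace(/\/+$/, '') // drop trailing slashes
  return {
    // Convert wav audio to mono-channel wav
    toMonoWav: (audio) => ({
      method: 'POST',
      url: `${root}/api/convert/WAVTOMONOWAV`,
      data: audio,
      headers: { 'content-type': 'audio/wav' },
      responseType: 'arraybuffer'
    }),
    // Speech-To-Text on mono-channel wav audio
    stt: (monoWav) => ({
      method: 'POST',
      url: `${root}/api/stt/${lang}`,
      data: monoWav,
      headers: { 'content-type': 'audio/wav' },
      responseType: 'json'
    }),
    // Text-To-Speech; the default voice is the one used in the walkthrough
    tts: (text, voice = 'dfki-poppy-hsmm') => ({
      method: 'GET',
      url: `${root}/api/tts/${lang}`,
      params: { text, voice },
      responseType: 'arraybuffer'
    })
  }
}

const local = speechRequests()
console.log(local.tts('Hello world').url)
```

Each returned object can be handed to axios(options) as in the walkthrough code.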

 

Add Voice Capabilities to SAP Conversational AI

The Botium Speech Processing GitHub repository includes sample webservice code that adds Speech-To-Text and Text-To-Speech capabilities to SAP Conversational AI.

First, clone the repository (if you have not done so already) and install the prerequisites:

$ git clone https://github.com/codeforequity-at/botium-speech-processing.git
$ cd botium-speech-processing/connectors/sapcai/server
$ npm install

Now you can launch the webservice with another command line call – replace my-sap-cai-token with your bot token:

$ SAPCAI_TOKEN=my-sap-cai-token npm start

Point your browser to http://localhost:5005 to bring up a minimal text-only chat interface to check if the connection to your SAP Conversational AI bot is already working:

Simple Text Interface
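
If the text interface stays silent, it can help to look at what the webservice sends to SAP Conversational AI. The sketch below builds the same dialog request as the walkthrough code at the end of this article and filters the text messages out of a response body. The helper names and the sample response body are made up for illustration; only the URL, the Authorization header format, and the message shape are taken from the walkthrough code.

```javascript
// Builds the dialog request used by the webservice (nothing is sent here).
function buildDialogRequest (token, text, conversationId) {
  return {
    method: 'POST',
    url: 'https://api.cai.tools.sap/build/v1/dialog',
    headers: { Authorization: `Token ${token}` },
    data: {
      message: { type: 'text', content: text },
      conversation_id: conversationId
    }
  }
}

// Returns the content of all text messages in a dialog response body.
function extractTextReplies (responseBody) {
  const results = (responseBody && responseBody.results) || {}
  const messages = results.messages || []
  return messages.filter(m => m.type === 'text').map(m => m.content)
}

// The token is read from the same environment variable as in the npm start call.
const token = process.env.SAPCAI_TOKEN || 'my-sap-cai-token'
const request = buildDialogRequest(token, 'Hello', 'demo-conversation')
console.log(request.url, request.data.message.content)

// Hand-written sample body, not a recorded response.
const sampleBody = {
  results: {
    messages: [
      { type: 'text', content: 'Hi there' },
      { type: 'picture', content: 'https://example.invalid/image.png' }
    ]
  }
}
console.log(extractTextReplies(sampleBody))
```

Only text messages are kept because they are the only ones the Text-To-Speech step can turn into audio.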

A simple web-based voice interface is available in the botium-voice-interface repository. You can launch it with:

$ git clone https://github.com/codeforequity-at/botium-voice-interface.git
$ cd botium-voice-interface
$ npm install
$ npm run serve

Point your browser to http://localhost:8080 – now it is time to turn on your microphone and speakers and have a chat with your SAP Conversational AI chatbot!

Simple Voice Interface
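
Audio travels between the voice interface and the webservice as base64 data URLs inside the user_uttered and bot_uttered messages, as the walkthrough code at the end of this article shows. The following sketch illustrates that encoding with Node.js Buffers. The helper names are hypothetical, and a browser client would need a different way to produce the base64 string since Buffer is a Node.js API.

```javascript
// Wraps raw audio bytes in a data URL, the format the webservice uses for the
// bot_uttered link and expects in the user_uttered message.
function toDataUrl (audioBuffer, mimeType = 'audio/wav') {
  return `data:${mimeType};base64,${audioBuffer.toString('base64')}`
}

// Reverses toDataUrl the same way the webservice does: everything after the
// first comma is the base64 payload.
function fromDataUrl (dataUrl) {
  return Buffer.from(dataUrl.substring(dataUrl.indexOf(',') + 1), 'base64')
}

// Shape of the message the webservice reads: message plus optional session_id.
function buildUserUttered (audioBuffer, sessionId) {
  return { message: toDataUrl(audioBuffer), session_id: sessionId }
}

const fakeAudio = Buffer.from([0x52, 0x49, 0x46, 0x46]) // placeholder bytes, not a valid wav file
const msg = buildUserUttered(fakeAudio, 'demo-session')
console.log(msg.message.startsWith('data:'))
console.log(fromDataUrl(msg.message).equals(fakeAudio))
```

The data: prefix is also how the webservice tells audio input from typed text, so plain text messages must not start with it.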

Conclusion

This tutorial should help you add basic voice capabilities to your SAP Conversational AI chatbot, which you can use as a starting point for your own project to provide a voice experience to your clients.

Thanks for reading. Hopefully you enjoyed this tutorial. Feel free to ask any questions in the comments!

Appendix: Code Walkthrough

For those who are interested, here is the relevant portion of the webservice code.

  1. If audio data is received, extract it from the webservice request
  2. Convert the audio data to a canonical audio codec format, in this case mono-channel wav audio
  3. Apply Speech-To-Text to extract the spoken text out of the audio
  4. Send the text to SAP Conversational AI to get the response text
  5. Apply Text-To-Speech to generate the audio out of the text
  6. Attach the audio data to the webservice response

  // Excerpt from the webservice: axios, nanoid, SAPCAI_TOKEN and socket are defined elsewhere in the server code
  socket.on('user_uttered', async (msg) => {
    if (msg && msg.message) {
      let textInput = msg.message

      // Step 1: audio input arrives as a base64 data URL, anything else is treated as plain text
      if (msg.message.startsWith('data:')) {
        const base64Data = msg.message.substring(msg.message.indexOf(',') + 1)
        const audioData = Buffer.from(base64Data, 'base64')

        // Step 2: convert the audio to mono-channel wav
        const wavToMonoWavRequestOptions = {
          method: 'POST',
          url: 'https://speech.botiumbox.com/api/convert/WAVTOMONOWAV',
          data: audioData,
          headers: {
            'content-type': 'audio/wav'
          },
          responseType: 'arraybuffer'
        }
        const wavToMonoWavResponse = await axios(wavToMonoWavRequestOptions)

        // Step 3: apply Speech-To-Text to the converted audio
        const sttRequestOptions = {
          method: 'POST',
          url: 'https://speech.botiumbox.com/api/stt/en',
          data: wavToMonoWavResponse.data,
          headers: {
            'content-type': 'audio/wav'
          },
          responseType: 'json'
        }
        const sttResponse = await axios(sttRequestOptions)

        textInput = sttResponse.data.text
      }

      // Step 4: send the text to SAP Conversational AI
      const requestOptions = {
        method: 'POST',
        url: 'https://api.cai.tools.sap/build/v1/dialog',
        headers: {
          Authorization: `Token ${SAPCAI_TOKEN}`
        },
        data: {
          message: {
            type: 'text',
            content: textInput
          },
          conversation_id: msg.session_id || nanoid()
        }
      }
      try {
        const response = await axios(requestOptions)
        // Only text messages can be converted to speech
        for (const message of response.data.results.messages.filter(t => t.type === 'text')) {
          const botUttered = {
            text: message.content
          }

          // Step 5: apply Text-To-Speech to the response text
          const ttsRequestOptions = {
            method: 'GET',
            url: 'https://speech.botiumbox.com/api/tts/en',
            params: {
              text: message.content,
              voice: 'dfki-poppy-hsmm'
            },
            responseType: 'arraybuffer'
          }
          const ttsResponse = await axios(ttsRequestOptions)
          // Step 6: attach the audio as a data URL and send the response to the client
          botUttered.link = 'data:audio/wav;base64,' + Buffer.from(ttsResponse.data, 'binary').toString('base64')

          socket.emit('bot_uttered', botUttered)
        }
      } catch (err) {
        console.log(err.message)
      }
    }
  })
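
One detail of the handler above is worth a note: when a message carries no session_id, nanoid() produces a fresh conversation_id for every message, so the bot would presumably not see consecutive messages as one conversation. A possible variation, sketched here under that assumption and not part of the sample webservice, is to create one fallback id per socket connection. The helper name is hypothetical, and crypto.randomUUID from the Node.js standard library replaces nanoid only to keep the sketch free of dependencies.

```javascript
const { randomUUID } = require('crypto')

// Returns a function that maps a message to a conversation id. The fallback id
// is created once, so all messages resolved by the same function share it.
function createConversationIdResolver () {
  const fallbackId = randomUUID()
  return (msg) => (msg && msg.session_id) || fallbackId
}

// Intended use: create one resolver per socket connection, then call it for
// every user_uttered message on that connection.
const resolveConversationId = createConversationIdResolver()
const first = resolveConversationId({ message: 'Hello' })
const second = resolveConversationId({ message: 'How are you?' })
console.log(first === second)
console.log(resolveConversationId({ message: 'Hi', session_id: 'abc' }))
```

A session_id sent by the client still takes precedence, which matches the behaviour of the original code.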
Comments
    • Botium is actually a tool for chatbot testing. The tech demo shown in this article uses one component of the Botium stack, the Botium Speech Processing service, for Speech-To-Text and Text-To-Speech capabilities. Within Botium, this service is required for testing chatbots on voice channels (but it can be used for other things as well, as the article above shows).

      Botium supports most of the chatbot technologies out there, including basic support for testing IVR systems (Twilio account is required for this) - you can read about voice testing with Botium here: https://wiki.botiumbox.com/how-to-guides/voice-app-testing/

      SAP Conversational AI is of course supported by Botium, but there are lots of other technologies available as well - see https://wiki.botiumbox.com/technical-reference/botium-connectors/