Giving a Voice to SAP Conversational AI – Challenge Submission
Voice platforms like Alexa and Google Assistant make it easy to provide a custom voice experience to your clients without going deep into audio processing, because everything is part of the platform. But what if you have already invested considerable effort in building a chatbot on SAP Conversational AI? You certainly don’t want to switch to a totally new platform now.
Intro
This tutorial is part of the SAP Conversational AI Tutorial Challenge 2021. Its goal is to show how you can build your own voice platform using SAP Conversational AI and the Open Source tool Botium Speech Processing.
After completing this tutorial, you will have a working sample voice interface for your chatbot as a starter for your own custom implementation.
Architecture
Botium Speech Processing is a unified, developer-friendly API to the best available free and Open-Source Speech-To-Text and Text-To-Speech services. Let’s combine it with SAP Conversational AI, but first let’s take a quick look at the architecture.
- User speaks into a microphone
- A Speech-To-Text service translates the recorded speech into text (Botium Speech Processing)
- A chatbot platform extracts information out of the text and builds the text response (SAP Conversational AI)
- A Text-To-Speech service translates the response text into speech (Botium Speech Processing)
- User listens to the audio file
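The steps above amount to a small pipeline. Here is a minimal sketch in JavaScript, assuming the three services are passed in as async functions. The names voiceRoundTrip, speechToText, askChatbot and textToSpeech are hypothetical placeholders for illustration and are not part of any Botium or SAP API.

```javascript
// Hypothetical sketch: the voice round trip as one async pipeline.
// The three service functions are injected, so real HTTP clients
// can be swapped in later without touching the pipeline itself.
async function voiceRoundTrip (audioIn, { speechToText, askChatbot, textToSpeech }) {
  const userText = await speechToText(audioIn) // Speech-To-Text
  const botText = await askChatbot(userText) // chatbot platform
  const audioOut = await textToSpeech(botText) // Text-To-Speech
  return { userText, botText, audioOut }
}

// Stub services for a dry run without network access (assumptions only).
const stubs = {
  speechToText: async (audio) => audio.toString('utf8'),
  askChatbot: async (text) => `You said: ${text}`,
  textToSpeech: async (text) => Buffer.from(text, 'utf8')
}

voiceRoundTrip(Buffer.from('hello'), stubs)
  .then(result => console.log(result.botText))
```

Injecting the services keeps the sketch runnable offline. In a real setup each stub would be replaced by a call to Botium Speech Processing or SAP Conversational AI.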
Installation Steps
Now let’s get to the fun part.
Prerequisites
Here is what you need to have available on your workstation:
- Git client
- Docker and Docker-Compose
- Node.js and NPM (or Yarn)
- An SAP Conversational AI bot token
Launch Botium Speech Processing Service
Botium Speech Processing comes with a reasonable default configuration for a voice platform, covering both Speech-To-Text and Text-To-Speech.
Both default engines are free and Open Source, which makes them a good match for getting started with voice technologies, and they are among the best free voice tools available.
Launching it can be done with a few command line calls.
$ git clone https://github.com/codeforequity-at/botium-speech-processing.git
$ cd botium-speech-processing
$ docker-compose up -d
Depending on network speed and hardware this step can take a while.
Pointing your browser to http://localhost will show the API explorer for Botium Speech Processing.
Botium Speech Processing API Explorer
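To check the service from code rather than from the browser, you can build a request URL like this. This is a minimal sketch that assumes the local instance exposes the same /api/tts/{language} path and the same text and voice query parameters as the hosted endpoint used in the code walkthrough at the end of this article. The helper name buildTtsUrl is hypothetical.

```javascript
// Hypothetical helper: builds a Text-To-Speech request URL.
// Assumption: the local path layout matches the hosted service
// shown in the appendix (/api/tts/<language>?text=...&voice=...).
function buildTtsUrl (baseUrl, language, text, voice) {
  const url = new URL(`/api/tts/${encodeURIComponent(language)}`, baseUrl)
  url.searchParams.set('text', text)
  if (voice) url.searchParams.set('voice', voice)
  return url.toString()
}

console.log(buildTtsUrl('http://localhost', 'en', 'Hello world', 'dfki-poppy-hsmm'))
```

If the assumption holds, requesting the printed URL should return audio data. The API explorer remains the authoritative list of available endpoints.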
Add Voice Capabilities to SAP Conversational AI
The Botium Speech Processing GitHub repository includes sample webservice code which adds Speech-To-Text and Text-To-Speech capabilities to SAP Conversational AI.
First, clone the repository (if not already done before) and install the prerequisites:
$ git clone https://github.com/codeforequity-at/botium-speech-processing.git
$ cd botium-speech-processing/connectors/sapcai/server
$ npm install
Now you can launch the webservice with another command line call – replace my-sap-cai-token with your bot token:
$ SAPCAI_TOKEN=my-sap-cai-token npm start
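The command above passes the bot token through the SAPCAI_TOKEN environment variable. Here is a minimal sketch of how a Node.js service might read it and fail fast when it is missing. The helpers readBotToken and authHeader are hypothetical and not taken from the connector code. The "Token <value>" header form mirrors the one shown in the code walkthrough at the end of this article.

```javascript
// Hypothetical helper, not part of the connector: reads the bot token
// from an environment-like object and fails fast if it is missing.
function readBotToken (env = process.env) {
  const token = (env.SAPCAI_TOKEN || '').trim()
  if (!token) {
    throw new Error('SAPCAI_TOKEN is not set')
  }
  return token
}

// Builds the Authorization header in the "Token <value>" form.
function authHeader (token) {
  return { Authorization: `Token ${token}` }
}

console.log(authHeader(readBotToken({ SAPCAI_TOKEN: 'my-sap-cai-token' })))
```

Failing at startup makes a forgotten token visible immediately instead of surfacing later as a rejected API call.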
Point your browser to http://localhost:5005 to bring up a minimal text-only chat interface to check if the connection to your SAP Conversational AI bot is already working:
Simple Text Interface
Testing Voice Capabilities
A simple web-based voice interface is available in the botium-voice-interface repository. You can launch it with:
$ git clone https://github.com/codeforequity-at/botium-voice-interface.git
$ cd botium-voice-interface
$ npm install
$ npm run serve
Point your browser to http://localhost:8080 – now it is time to turn on your microphone and speakers and have a chat with your SAP Conversational AI chatbot!
Simple Voice Interface
Conclusion
This tutorial should help you to add basic voice capabilities to your SAP Conversational AI chatbot, which you can use to start your own project for providing a voice experience to your clients.
Thanks for reading, hopefully you enjoyed this tutorial. Feel free to ask any questions in the comments!
Appendix: Code Walkthrough
For those who are interested, here is the relevant portion of the webservice code.
- In case audio data is received, extract the audio data from the webservice request
- Convert the audio data to a canonical audio codec format, in this case mono-channel wav audio
- Apply Speech-To-Text to extract the spoken text out of the audio
- Send the text to SAP Conversational AI to get the response text
- Apply Text-To-Speech to generate the audio out of the text
- Attach the audio data to the webservice response
// Assumes axios, nanoid, SAPCAI_TOKEN and the connected socket are defined elsewhere in the webservice
socket.on('user_uttered', async (msg) => {
  if (msg && msg.message) {
    let textInput = msg.message
    // Audio input arrives as a base64 data URI; plain text passes through unchanged
    if (msg.message.startsWith('data:')) {
      // Extract the raw audio bytes from the data URI
      const base64Data = msg.message.substring(msg.message.indexOf(',') + 1)
      const audioData = Buffer.from(base64Data, 'base64')

      // Convert the audio to mono-channel wav
      const wavToMonoWavRequestOptions = {
        method: 'POST',
        url: 'https://speech.botiumbox.com/api/convert/WAVTOMONOWAV',
        data: audioData,
        headers: {
          'content-type': 'audio/wav'
        },
        responseType: 'arraybuffer'
      }
      const wavToMonoWavResponse = await axios(wavToMonoWavRequestOptions)

      // Speech-To-Text: extract the spoken text out of the audio
      const sttRequestOptions = {
        method: 'POST',
        url: 'https://speech.botiumbox.com/api/stt/en',
        data: wavToMonoWavResponse.data,
        headers: {
          'content-type': 'audio/wav'
        },
        responseType: 'json'
      }
      const sttResponse = await axios(sttRequestOptions)
      textInput = sttResponse.data.text
    }

    // Send the text to SAP Conversational AI
    const requestOptions = {
      method: 'POST',
      url: 'https://api.cai.tools.sap/build/v1/dialog',
      headers: {
        Authorization: `Token ${SAPCAI_TOKEN}`
      },
      data: {
        message: {
          type: 'text',
          content: textInput
        },
        conversation_id: msg.session_id || nanoid()
      }
    }
    try {
      const response = await axios(requestOptions)
      // Handle each text message of the bot response separately
      for (const message of response.data.results.messages.filter(t => t.type === 'text')) {
        const botUttered = {
          text: message.content
        }

        // Text-To-Speech: generate the audio out of the response text
        const ttsRequestOptions = {
          method: 'GET',
          url: 'https://speech.botiumbox.com/api/tts/en',
          params: {
            text: message.content,
            voice: 'dfki-poppy-hsmm'
          },
          responseType: 'arraybuffer'
        }
        const ttsResponse = await axios(ttsRequestOptions)

        // Attach the audio to the response as a base64 data URI
        botUttered.link = 'data:audio/wav;base64,' + Buffer.from(ttsResponse.data, 'binary').toString('base64')
        socket.emit('bot_uttered', botUttered)
      }
    } catch (err) {
      console.log(err.message)
    }
  }
})
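The handler above exchanges audio as base64 data URIs in both directions. Here is a minimal standalone sketch of that encoding, mirroring the substring and Buffer.from calls in the walkthrough. The function names dataUriToBuffer and bufferToWavDataUri are hypothetical.

```javascript
// Hypothetical helper: decodes 'data:<mime>;base64,<payload>' into raw bytes.
function dataUriToBuffer (dataUri) {
  const comma = dataUri.indexOf(',')
  if (!dataUri.startsWith('data:') || comma < 0) {
    throw new Error('Not a data URI')
  }
  return Buffer.from(dataUri.substring(comma + 1), 'base64')
}

// Hypothetical helper: wraps raw wav bytes into a data URI
// that a browser audio element can play directly.
function bufferToWavDataUri (buffer) {
  return 'data:audio/wav;base64,' + Buffer.from(buffer).toString('base64')
}

// Round trip with the four-byte 'RIFF' marker that starts a wav file.
const uri = bufferToWavDataUri(Buffer.from('RIFF'))
console.log(uri)
console.log(dataUriToBuffer(uri).toString('latin1'))
```

Data URIs keep the socket messages self-contained, at the cost of roughly one third more payload than the raw bytes because of the base64 encoding.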
Hi Florian Treml,
sounds very interesting. Is there a way to add another voice channel via Botium, for example by calling a phone number?
BR
Simon
Botium is actually a tool for chatbot testing. The tech demo shown in this article uses one component of the Botium stack, the Botium Speech Processing service, to provide Speech-To-Text and Text-To-Speech capabilities. Within Botium, this service is required for testing chatbots on voice channels (but it can be used for other things as well, as the article above shows).
Botium supports most of the chatbot technologies out there, including basic support for testing IVR systems (Twilio account is required for this) - you can read about voice testing with Botium here: https://wiki.botiumbox.com/how-to-guides/voice-app-testing/
SAP Conversational AI is of course supported by Botium, but there are lots of other technologies available as well - see https://wiki.botiumbox.com/technical-reference/botium-connectors/
This is a great tutorial, thank you very much! It has helped me build a conversational AI with voice to voice 🙂
I had to change one thing in the index.js of the sapcai connector in the speech processing repo: SAP Conversational AI has changed its API this year. When using the /dialog API, you now have to send a bearer token as well.
Replace this part in the index.js and add your tokens. You can create your bot's OAuth credentials via settings -> Tokens -> OAuth client. The client ID and client secret are used to get your bearer token via the Auth URL.
thanks for this update!