Hello UI5 Experts,

As technology advances, we need to keep evolving to match our clients' expectations. With that in mind, today I will show you a UI5 application that works on speech recognition. Let us take a look at a short demo of the application before we proceed any further.

[Demo video: speech-driven navigation and data entry in the UI5 application]

Let me provide a short transcript of what is happening in the above demo.

The home screen of the application is the Quality Management landing page, from which the user can navigate to three pages based on the user persona: quality results recorder, reviewer, and approver. Instead of navigating to these pages by selecting an option from the drop-down, we have enabled speech recognition to perform the exact same function. Once we have navigated to the recorder screen, we select the machine from the drop-down for which we would like to record the quality KPIs. Now, instead of capturing these KPI values with the keyboard, we do it via speech, as you can see in the video above. Once the results have been recorded, the reviewer and approver can provide their remarks or edit the values they deem incorrect, again using speech.

 

Under the hood:

Now that we have seen what is happening here, let us come to the how. First, to let the browser access our microphone and start listening to our voice, we have used the JavaScript Web Speech API, which is the backbone of our application. The lines of code below show how to instantiate this API when the view is initialized.

 
onInit: function () {
    // Instantiate the (Chrome-prefixed) speech recognizer once per view
    var recognition = new window.webkitSpeechRecognition();
    recognition.continuous = true;     // keep listening across pauses
    recognition.interimResults = true; // deliver partial results while speaking (used by the onresult handler later)
    recognition.lang = 'en-IN';        // language preference
    this.recognition = recognition;
}

 

Here we have created an instance of webkitSpeechRecognition and set the continuous and lang properties. The continuous property is best explained by comparing the older and newer versions of the Google chatbot: in the older version, every time you wanted to ask something after taking a pause you had to say "Hello Google" again, whereas in the newer version your conversation maintains continuity even when you pause. The lang property is your language preference.
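
One caveat worth noting: webkitSpeechRecognition is a vendor-prefixed API available only in Chromium-based browsers. A minimal sketch of a feature-detection guard for the same onInit (my own addition, not part of the original app) could look like this:

onInit: function () {
    // Pick whichever constructor the browser exposes, if any
    var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
        // MessageToast is assumed to be imported, as in the other snippets
        MessageToast.show("Speech recognition is not supported in this browser");
        return;
    }
    var recognition = new SpeechRecognition();
    recognition.continuous = true;
    recognition.interimResults = true;
    recognition.lang = 'en-IN';
    this.recognition = recognition;
}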

So we have our recognition object initialized, but it has not started listening to our microphone yet. We will do that only when the user clicks the microphone button, indicating that he/she wants to start recording. Let us look at the code for the same.
onStartRecording: function () {
    var final_transcript = '';
    var that = this;
    this.recognition.start();
    MessageToast.show("Recording started");
    this.recognition.onstart = function () {}; // fired once the recognizer is live; unused here
    this.recognition.onresult = function (event) {
        var interim_transcript = '';
        // Walk every result delivered since the last event
        for (var i = event.resultIndex; i < event.results.length; ++i) {
            if (event.results[i].isFinal) {
                // Finalized speech: append to the full transcript
                final_transcript += event.results[i][0].transcript;
            } else {
                // Partial speech: rebuilt from scratch on every event
                interim_transcript += event.results[i][0].transcript;
                console.log(interim_transcript);
            }
        }
        // Hand over each completed word/sentence and reset for the next one
        if (final_transcript !== "") {
            that.submitValue(final_transcript);
            final_transcript = "";
        }
    };
}

The recognition.start() call activates the speech recognizer and triggers the onstart event handler, which we have not used here. The onresult event handler is triggered for every new set of results it receives. To keep things simple, we can consider that it is triggered approximately once per syllable. Hopefully the screenshot below makes this a bit clearer.

[Screenshot: browser console showing interim_transcript logged five times while the word "recording" was spoken]

So when I said the word "recording", the event was triggered five times. Each time the event fires, the interim_transcript variable is reset, which is what you see logged in the console above, and whenever the recognizer marks a result as final, its transcript is appended to the variable final_transcript, which I treat as the word that was actually spoken.
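
To make the loop above easier to picture, here is a rough annotated sketch of the shape of event.results. This is an illustration based on the Web Speech API specification, not output captured from the app:

// Inside recognition.onresult = function (event) { ... }:
//
// event.resultIndex              -> index of the first result that changed in this event
// event.results                  -> a SpeechRecognitionResultList
// event.results[0].isFinal       -> false while speech is in progress, true once finalized
// event.results[0][0]            -> the best SpeechRecognitionAlternative for that result
// event.results[0][0].transcript -> the recognized text, e.g. "recording"
// event.results[0][0].confidence -> the recognizer's confidence, between 0 and 1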

Now that we have the word or sentence the user has spoken, we need to decide what to do with it. On the home page, we use a simple switch statement to route the application to a view based on the user's choice. Along with that, we also stop the recognizer instance before navigating away from this view.

 
submitValue: function (final_transcript) {
    var key = final_transcript.toLowerCase().trim();
    console.log("Key:" + key);
    // Standard UI5 idiom to fetch the component's router for navigation
    var oRouter = sap.ui.core.UIComponent.getRouterFor(this);
    switch (key) {
        case "quality results recording":
            oRouter.navTo("View2");
            this.recognition.stop();
            break;
        case "quality results review":
            oRouter.navTo("View3");
            this.recognition.stop();
            break;
        case "quality results approval":
            oRouter.navTo("View4");
            this.recognition.stop();
            break;
    }
}

 

Now, on the KPI recording/review/approval screen, the user needs to select the machine and record the values using speech. Here we have laid down some ground rules: the words "next", "back" and "focus" are used to navigate the cells of the table. So whenever final_transcript registers one of these three words, that is the cue for the application to move forward a cell, move back a cell, or set focus on a particular cell, respectively. A small snippet of the same is shown below.

 
onSubmitValue: function (sValue) {
    var key = sValue.trim().toLowerCase();
    if (key === "next") {
        // Move focus one row forward
        this.count = this.count + 1;
        this.getView().byId('table1').getRows()[this.count].getCells()[1].focus();
    } else if (key === "back") {
        // Move focus one row back
        this.count = this.count - 1;
        this.getView().byId('table1').getRows()[this.count].getCells()[1].focus();
    } else if (key === "focus") {
        // The next spoken phrase will be treated as a KPI name to jump to
        MessageToast.show("Setting focus..");
        this.setFocus = true;
    } else if (this.setFocus) {
        // "focus" was spoken previously, so this phrase names the target KPI
        this.setKPIFocus(sValue);
    } else {
        // Anything else is a KPI value: write it into the model at the current row
        var path = '/items/' + this.count + '/KPIValue';
        this.getView().getModel('oModel2').setProperty(path, sValue.trim());
    }
}
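
By the way, the snippets in this view leave the glue implicit: just as on the home page, the recorder view's controller feeds each finalized transcript from the recognizer into onSubmitValue. A minimal sketch of that wiring, assuming the same onresult pattern as in onStartRecording above:

// Sketch: in the recorder view's controller, after this.recognition.start()
var that = this; // capture the controller, as in onStartRecording
this.recognition.onresult = function (event) {
    for (var i = event.resultIndex; i < event.results.length; ++i) {
        if (event.results[i].isFinal) {
            // Every finalized phrase drives the table logic above
            that.onSubmitValue(event.results[i][0].transcript);
        }
    }
};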

 

The function below is called once the setFocus flag has been set to true; it looks up the spoken KPI name in the model and moves focus to the matching row.
setKPIFocus: function (sValue) {
    var path;
    var totalCount = Object.keys(this.getView().getModel('oModel2').getProperty('/items')).length;
    // Scan every row until the spoken text matches a KPI name
    for (var i = 0; i < totalCount; i++) {
        path = '/items/' + i + '/QualityKPI';
        if (this.getView().getModel('oModel2').getProperty(path).toLowerCase() === sValue.trim().toLowerCase()) {
            this.count = i;
            MessageToast.show("Focus set on " + sValue);
            this.setFocus = false;
            this.getView().byId('table1').getRows()[i].getCells()[1].focus();
            break;
        }
    }
}

 

And that is how we use speech recognition to navigate between pages and capture data in our form. An immediate extension of this application would be to use speech to save the data as well. The only limit is our imagination!
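
For instance, a "save" command could be added as one more branch in onSubmitValue. Below is only a sketch of that idea; the submitData helper is hypothetical and does not exist in the app shown above:

} else if (key === "save") {
    // Hypothetical voice command: persist the recorded KPI values
    MessageToast.show("Saving results...");
    this.submitData(this.getView().getModel('oModel2').getProperty('/items')); // submitData is assumed
}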

Well-written documentation on the Web Speech API can be found at the URL below:

https://developers.google.com/web/updates/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Spee...

 

Do leave your comments on how you plan to leverage this in your projects :-)

Thanks,

Archisman