Lesson 16: Speech Recognition

Lesson Description

In this lesson, you will learn how humans learn languages, how Alexa works, and how to make your own Alexa-like assistant that recognizes speech using PictoBlox.

INTRODUCTION

How Do Humans Learn a Language?

From the time we are born, we hear words and sounds around us. Even before we can speak, we start responding to some of the words we hear, like Mama, Dada, Yes, and No.

Our brain tries to find patterns to differentiate various sounds and words and categorize them. It may seem as though humans are pre-programmed to listen and understand but it is not so. We have been trained to develop this ability.

Speech recognition technology has been developed along the same lines: computers are also trained to find patterns in sounds and words.

Speech recognition is the ability of a machine to identify words and phrases in spoken language and convert them to a machine-readable format.

HOW DOES ALEXA WORK?

Alexa, Amazon’s virtual assistant AI technology, uses natural language processing to convert speech into sounds, words, and ideas. Here’s how it works:

  1. Alexa first records your speech. Then, this recording is sent to Amazon’s servers to be analysed more efficiently.
  2. Amazon breaks down the recording into individual sounds. It then consults a database containing various words’ pronunciations to find which words most closely correspond to the combination of individual sounds.
  3. It then identifies keywords to make sense of the task and carry out the corresponding function. For example, if Alexa notices words like “weather” or “temperature”, it will open the weather app.
  4. Amazon’s servers send the information back to your device. If Alexa needs to say anything back to you, it will go through the same process described above, but in reverse order.
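Steps 2 and 3 above can be pictured with a small Python sketch. This is only a toy illustration: the sound symbols, the pronunciation table, the keyword table, and the task names are all made up for this example, and a real assistant uses trained models rather than small lookup tables.

```python
# Toy illustration only: real assistants use trained models,
# not small lookup tables like these.

# Hypothetical pronunciation database: group of sounds -> word.
PRONUNCIATIONS = {
    ("w", "eh", "dh", "er"): "weather",
    ("t", "uh", "d", "ey"): "today",
    ("t", "ay", "m"): "time",
}

# Hypothetical keyword table: keyword -> task to carry out.
KEYWORD_TASKS = {
    "weather": "open_weather_app",
    "temperature": "open_weather_app",
    "time": "tell_time",
}


def sounds_to_words(sound_groups):
    """Step 2: look up each group of sounds in the pronunciation database."""
    words = []
    for group in sound_groups:
        word = PRONUNCIATIONS.get(tuple(group))
        if word is not None:  # skip groups of sounds we do not know
            words.append(word)
    return words


def words_to_task(words):
    """Step 3: return the task for the first keyword found, or None."""
    for word in words:
        if word in KEYWORD_TASKS:
            return KEYWORD_TASKS[word]
    return None


# A "recording" that has already been broken into groups of sounds.
recording = [["w", "eh", "dh", "er"], ["t", "uh", "d", "ey"]]
words = sounds_to_words(recording)
task = words_to_task(words)
print(words, task)
```

Here the word “weather” is a keyword, so the sketch picks the weather task, while “today” is recognized as a word but does not trigger anything by itself.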

ACTIVITY: MAKE YOUR OWN ALEXA

In this project, we will make our own personal assistant like Alexa. We will be making a script that will recognize our voice command and analyze it to play the Mario theme song or the Spider-Man theme song. If the command is not recognized, it will say that it didn’t understand the command.

CODING STEPS

Follow the steps below:

  1. Scan the QR Code and open the PictoBlox code.
  2. Click the Tobi sprite and go to Sprite Settings. Click on the Sound tab. You will find two audio files named Spiderman and Mario.
  3. Go back to the editing area. Click the Add extension button and add the Artificial Intelligence extension.
  4. Click the Add extension button again and add the Text to Speech extension.
  5. Add a when flag clicked block to the scripting area.
  6. Snap a recognize speech for () s in () block below the when flag clicked block. Change the time to 4 seconds. The block records audio for the specified time and analyses it in the cloud to extract the text.
  7. Now, snap an if () else block below the recognize speech for () s in () block.
  8. In the condition of the if () else block, add a () contains ()? block from the Operators palette. In the first argument, add a speech recognition result block and in the second, write “mario”. So, if the decoded text contains the word Mario, the blocks in the if arm will run.
  9. Add a speak () block from the Text to Speech palette under the if arm and write the message “Playing Mario Song!”.
  10. Next, snap a play sound () until done block below the speak () block and select Mario. This completes the if arm.
  11. Duplicate the if () else block and snap it under the else arm.
  12. Change “mario” to “spiderman” in the condition of the duplicated if () else block.
  13. Change the message in its speak () block to “Playing Spiderman Song!”.
  14. Change the sound to Spiderman.
  15. Finally, under the else arm of the duplicated block, add a speak () block and write “Sorry, I am unable to understand the command”.
  16. Click the green flag to start the script. A recognition window will open. Say the command to run the project.
  17. Save the file as Alexa.
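
For readers who want to see the same decision logic as text code, here is a minimal Python sketch of the if/else blocks built above. It is not PictoBlox code: the function name respond is made up for this example, the recognized text is passed in as an ordinary string instead of coming from the speech recognition block, and lowercasing the text is an assumption made here so that “Mario” and “mario” both match.

```python
def respond(recognized_text):
    """Return (message to speak, sound to play) for a recognized command.

    Hypothetical helper mirroring the nested if/else blocks in the script.
    The sound is None when the command is not understood.
    """
    # Assumption: ignore upper/lower case when looking for the keyword.
    text = recognized_text.lower()

    if "mario" in text:
        return ("Playing Mario Song!", "Mario")
    elif "spiderman" in text:
        # This branch plays the role of the duplicated if/else block
        # placed under the else arm of the first one.
        return ("Playing Spiderman Song!", "Spiderman")
    else:
        return ("Sorry, I am unable to understand the command", None)


# Try it with a few commands typed as text.
for command in ["Play the Mario song", "spiderman please", "What time is it?"]:
    message, sound = respond(command)
    print(command, "->", message, "| sound:", sound)
```

Placing the second check under the else arm means it is only tested when the first check fails, which is why a single if, elif, else chain expresses the same idea.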

 

 
