Whisper Deck Voice Control

I’ve gotten a few e-mails over the past few days regarding the voice control aspect of the Whisper Deck and how it works.  Here’s a brief overview of how I was able to incorporate speech as an input mechanism for augmented reality models in Flash.

The voice control system that I created is based on a client-side software package called “MacSpeech Dictate.” Voice recognition works as follows:

  1. Launch the MacSpeech Dictate recognition engine
  2. Place cursor focus in a text box at the bottom of the Whisper Deck interface.  This text box is not visible in the YouTube demo of the project.
  3. Speak into the microphone.  Recognized words are transcribed by MacSpeech Dictate and placed into the text box in the Flash movie.
  4. Flash listens for an Event.CHANGE event to fire on the text box.  When it does, it starts parsing the text that was transcribed by MacSpeech Dictate.  Here are the general steps my parsing routine goes through:
    1. Convert the entire spoken string to lowercase (“this.mytextfield.text = this.mytextfield.text.toLowerCase();”)
    2. Strip any leading spaces
    3. Split the transcribed sentence into individual array elements based on the placement of spaces.  For example, “hello world how are you” would parse out to a new array with the following elements:
      1. myarray[0] = “hello”;
      2. myarray[1] = “world”;
      3. myarray[2] = “how”;
      4. myarray[3] = “are”;
      5. myarray[4] = “you”;
    4. Check whether the last element of the array is the word “over”.  This is used as a trigger to tell Flash to process the command in its entirety.
    5. If the “over” command is present, look at the first element of the array.  Currently the Whisper Deck can recognize two commands (“search” and “compare”).  If either of these commands is present, pass the command to the appropriate AR rendering class.
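
The parsing steps above can be sketched roughly as follows. This is a TypeScript illustration of the logic only, not the original ActionScript source: the names `parseTranscript`, `VoiceCommand`, `TRIGGER`, and `KNOWN_ACTIONS` are hypothetical, and trimming trailing whitespace and treating the words between the command and “over” as arguments are assumptions rather than details stated in the post.

```typescript
// Hypothetical sketch of the parsing routine; all names are illustrative.
type VoiceCommand = { action: "search" | "compare"; args: string[] };

const TRIGGER = "over"; // trailing keyword that marks the command as complete
const KNOWN_ACTIONS = ["search", "compare"] as const;

function parseTranscript(raw: string): VoiceCommand | null {
  // Steps 1-2: lowercase, then strip leading spaces
  // (assumption: trailing whitespace is stripped as well).
  const cleaned = raw.toLowerCase().trim();
  if (cleaned === "") return null;

  // Step 3: split the sentence into words on runs of whitespace.
  const words = cleaned.split(/\s+/);

  // Step 4: do nothing until the last word is the trigger.
  if (words[words.length - 1] !== TRIGGER) return null;

  // Step 5: the first word must be a recognized command.
  const action = KNOWN_ACTIONS.find((a) => a === words[0]);
  if (action === undefined) return null;

  // Assumption: everything between the command and the trigger is its argument.
  return { action, args: words.slice(1, -1) };
}
```

In the Flash version, a routine like this would run inside the Event.CHANGE handler on the text box, so it is called each time MacSpeech Dictate adds more text. Returning `null` until “over” arrives is what leaves room to proofread the transcription before anything is dispatched.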

Originally I did not include the “over” keyword as part of the system.  Instead, I used a period of microphone inactivity as a cue to tell the program that I was done speaking.  Unfortunately this did not provide very stable results: machine transcription, even under quiet conditions, is fraught with errors, which led to a lot of inaccurate searches.  “Over” was included as a safety buffer to let me “proofread” my voice command before I asked the program to process it.  It works well for demo purposes, but it is a limitation of the system that I would need to work out if the project were to move forward.

At some point I would love to play around with a web-accessible machine transcription routine, similar to what Didier Brun has accomplished in the video below.  Unfortunately I was pressed for time on this application, and MacSpeech Dictate worked very well given the design requirements for this project.

Voice Gesture from didier.brun on Vimeo.
