Scott Hanselman

Speech Recognition in Windows Vista - I'm listening

November 22, '06 Comments [6] Posted in Reviews | Tools

In the past I've had a significant number of carpal-tunnel-like symptoms, and typing grows increasingly uncomfortable.  I doubt that it's carpal tunnel per se, but typing as fast as I do will no doubt eventually break the body down.  I've used Dragon NaturallySpeaking as an alternative to typing. In fact, most of my chapters in the ASP.NET book were dictated with it.

Of course I was excited to hear that Windows Vista would include lots of new speech recognition features, and today I finally got to try them out.  I plugged in my Logitech USB headset and ran through the tutorial.

You really have to try it to fully understand the improvements that have been made to accessibility in Windows Vista.  While this entire blog post was dictated using the built-in speech features in Vista, the dictation features, frankly, aren't that impressive.  To be clear, they work, and they work well.  But it's the interface, the user experience, that's so amazing.

You can tell that BillG is very much not kidding when he says speech is going to be the way we will interact with our computers. Notice, for example, in the image below that although I've been using voice recognition in Vista for only five minutes, the system has scanned the documents in My Documents and on my Desktop and determined that "Wii" (as in Nintendo Wii, a video game system that I recorded a podcast about yesterday) is a reasonable and valid homophone of "we."

But these are speech-specific things. What was really interesting to me is how easy it is to interact with the entire system, the shell, without touching your mouse.  This is going to be huge for people who CAN'T touch the mouse.

One of the most clever user interface experiences is the "show numbers" interface. When you're using Windows Vista voice recognition and you tell it to "show numbers," the current window gets numbered regions overlaid on its user interface elements, so that they can be easily selected just by saying a number.

For example, notice the interface of Windows Live Writer as seen below.  By default the system will click whatever I say: if I simply say "insert picture," it will click the Insert Picture user interface element just because it's on the screen.  But if there's an element like a toolbar button or something else that is difficult to express verbally, I can click it easily using show numbers.

The same feature is used when selecting words that appear multiple times within a chunk of text.  For example, if a paragraph contained the name 'Hanselman' four times and I said "Select Hanselman," each instance of the word would get a number overlaid on it, allowing me to quickly indicate the one I meant.
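
To make that concrete, here's a rough Python sketch of how this kind of numbered disambiguation could work. To be clear, this is not how Vista does it (I don't know what's under the hood); the function names number_matches, overlay, and pick are made up purely for illustration, and the "[n]" tag just stands in for the on-screen number.

```python
import re


def number_matches(text, word):
    """Return (number, start, end) for each whole-word, case-insensitive match."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [
        (n, m.start(), m.end())
        for n, m in enumerate(pattern.finditer(text), start=1)
    ]


def overlay(text, matches):
    """Render the text with a [n] tag in front of each numbered match."""
    out, pos = [], 0
    for n, start, end in matches:
        out.append(text[pos:start])          # untouched text before the match
        out.append(f"[{n}]{text[start:end]}")  # the tagged match itself
        pos = end
    out.append(text[pos:])                   # whatever follows the last match
    return "".join(out)


def pick(matches, spoken_number):
    """Return the (start, end) span for the number the user said, or None."""
    for n, start, end in matches:
        if n == spoken_number:
            return (start, end)
    return None
```

Saying "Select Hanselman" would map to number_matches(text, "Hanselman"), the overlay would be drawn from that list, and saying "two" would map to pick(matches, 2), which hands back the character span to select.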

I'm not familiar with the Windows Speech API, but it'll be interesting to see how vendors like the folks behind Dragon NaturallySpeaking are meant to integrate their speech recognition algorithms with the existing interface experience provided by Vista out of the box.

As someone who fortunately does have the use of both hands, I find speech to be the most valuable when I can have one hand on the keyboard, one hand on the mouse, and be speaking simultaneously.  It's certainly true that I can talk faster than I can type, and it's very very difficult to beat really good speech recognition software by just typing.

It's worth noting that they've removed all of the speech recognition features from Office 2007, and there are a number of people who were considerably torqued about that decision.  That said, if you're into speech recognition or you use speech recognition software in your everyday life, the improvements to speech in Vista are reason enough to upgrade your OS.

And sure, it's not perfect, but I'm using a crappy microphone in a noisy room on a slowish machine while speaking quietly so as not to wake the baby.  Not too shabby.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Wednesday, November 22, 2006 7:59:29 AM UTC
Carpal Tunnel is actually a relatively rare affliction. It's the most publicized form of a more general class of afflictions called Repetitive Stress Injury, or RSI for short. I've written about RSI before, as I've received treatment via occupational therapy and physical therapy. It's a serious ailment, and treatment and education do help.
Wednesday, November 22, 2006 8:00:03 AM UTC
I should add that RSI is much more widespread and takes on many forms, such as tendinitis, neuritis, etc.
Wednesday, November 22, 2006 11:58:59 PM UTC
This is one of the top reasons I'm excited about upgrading to Vista (no, haven't done it yet...). It's too bad that one demo went poorly; the Slashdot crowd were very quick to write off one of Vista's best features due to an audio input problem.

There's a great video demo of it here: http://www.istartedsomething.com/20060808/vista-speech-recognition-screencast/
Thursday, November 23, 2006 12:43:19 AM UTC
If you have the tendency to rest your wrist on your desk while using the mouse, you should use one of these: http://www.mousemitt.com/kb.html. I use one most of the time.

The weight of your hand will be absorbed in the cushioning instead of pressuring your tendons.

Speech recognition will not replace keyboards and mice in the workplace. Imagine the noise level when workers are 'talking' to their computers.
Abdu
Thursday, November 23, 2006 5:26:47 AM UTC
Has anyone had good success actually programming using speech recognition or pen input?
Peter
Thursday, November 23, 2006 6:11:14 AM UTC
I remain skeptical. It took me just a few tries with Dragon to forget all ambitions of ever dictating my docs. ~300M/2B = 85% of the world speaks English with an accent. Most of the docs we need are in English. Each accent is a unique culture/first language/personality matrix, and that's a huge volume to cover. An experimental thought: if I read a few lines of text into an MP3, can you run it through Vista voice rec? (Or perhaps try with somebody in the office.)

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.