Speech Recognition in Windows Vista - I'm listening
In the past I've had a significant number of carpal tunnel-like symptoms, and typing has grown increasingly uncomfortable. I doubt it's carpal tunnel per se, but typing as fast as I do will surely break the body down eventually. I've used Dragon NaturallySpeaking as an alternative to typing before. In fact, most of my chapters in the ASP.NET book were dictated with Dragon NaturallySpeaking.
Of course I was excited to hear that Windows Vista would include lots of new speech recognition features, and today I finally got to try them out. I plugged in my Logitech USB headset and ran through the tutorial.
You really have to try it to fully understand the accessibility improvements in Windows Vista. This entire blog post was dictated using the built-in speech features in Vista, but the dictation features, frankly, aren't that impressive. To be clear, they work, and they work well. But it's the interface, the user experience, that's so amazing.
You can tell that BillG is very much not kidding when he says speech is going to be the way we interact with our computers. Notice, for example, in the image below that even though I've only been using voice recognition in Vista for five minutes, the system has scanned the documents in My Documents and on my Desktop and determined that "Wii" (as in Nintendo Wii, a video game system that I recorded a podcast about yesterday) is a reasonable and valid homophone of "we."
But those are speech-specific things. What was really interesting to me is how easy it is to interact with the entire system, the shell, without touching the mouse. This is going to be huge for people who CAN'T touch the mouse.
One of the cleverest user interface experiences is "show numbers." When you're using Windows Vista voice recognition and you tell it to "show numbers," the current window gets numbered regions overlaid on its user interface elements, so they can be selected just by saying a number.
For example, notice the interface of Windows Live Writer as seen below. By default, the system will click whatever element I name: if I simply say "insert picture," it clicks the Insert Picture element just because it's on the screen. But if there's an element like a toolbar button that's difficult to express verbally, I can still click it easily using show numbers.
The same feature is used when selecting words that appear multiple times within a chunk of text. For example, if a paragraph contained the name 'Hanselman' four times and I said "Select Hanselman," each instance of the word would get a number overlaid on it, letting me quickly indicate the one I meant.
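The disambiguation step is easy to picture in code. Here's a minimal Python sketch, assuming the overlay simply labels every whole-word, case-insensitive match in reading order, starting at 1. This is a hypothetical illustration of the idea, not how Vista actually implements it, and `number_matches` and `select_by_number` are made-up names.

```python
import re


def number_matches(text: str, word: str) -> list[tuple[int, int, int]]:
    """Label each whole-word, case-insensitive occurrence of `word` in `text`.

    Returns (label, start, end) triples. Labels start at 1, like the numbers
    a user would read off the overlay and speak aloud. Assumes `word` begins
    and ends with word characters so the \\b boundaries behave as expected.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [(label, m.start(), m.end())
            for label, m in enumerate(pattern.finditer(text), start=1)]


def select_by_number(text: str, word: str, spoken: int) -> tuple[int, int]:
    """Resolve a spoken number to the (start, end) span of that occurrence."""
    for label, start, end in number_matches(text, word):
        if label == spoken:
            return start, end
    raise ValueError(f"no occurrence numbered {spoken} for {word!r}")


if __name__ == "__main__":
    text = "Hanselman wrote it, Hanselman read it, and Hanselmania followed."
    print(number_matches(text, "Hanselman"))       # [(1, 0, 9), (2, 20, 29)]
    print(select_by_number(text, "Hanselman", 2))  # (20, 29)
```

Note that "Hanselmania" isn't numbered, because it isn't a whole-word match; saying "two" selects the second standalone "Hanselman."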
I'm not familiar with the Windows Speech API, but it'll be interesting to see how vendors like the folks behind Dragon NaturallySpeaking integrate their speech recognition algorithms into the interface experience Vista provides out of the box.
As someone who fortunately does have the use of both hands, I find speech most valuable when I can have one hand on the keyboard and one hand on the mouse while speaking at the same time. It's certainly true that I can talk faster than I can type, and it's very difficult to beat really good speech recognition software by typing alone.
It's worth noting that Microsoft has removed all of the speech recognition features from Office 2007, and a number of people were considerably torqued about that decision. That said, if you're into speech recognition or use speech recognition software in your everyday life, the speech improvements in Vista are reason enough to upgrade your OS.
And sure, it's not perfect, but I'm using a crappy microphone in a noisy room on a slowish machine while speaking quietly so as not to wake the baby. Not too shabby.