All About Speech Recognition Software

Speech recognition software is now on the cusp of being a practical means of inputting data to a computer.

Read this article series to see how to make it a practical solution for you too.


Talk, not type, to your computer

The omnipresent eye of the HAL 9000 computer in 2001: A Space Odyssey introduced the world to modern speech recognition in 1968. What was science fiction then is close to science fact now.

This is the first part of a series on speech recognition software.

Reliable speech recognition has long been sought after, but has only recently become practical on normal computers.

The extraordinary computing power of a modern home computer and the evolving capabilities of speech recognition software now offer the promise, and possibly the reality, of being able to effortlessly control and communicate with and via one's computer merely by talking normally to it.

Read through this and the rest of our five-part series to understand what speech recognition is now capable of, whether it might be suitable for you and your needs, and how to best use it in your own work environment.

A Short History of Speech Recognition

Speech versus voice recognition

First, perhaps we need to define some terms. We are using the term 'speech recognition' to refer to a computer being able to listen to an ordinary speaking voice and understand the words and sentences being spoken.

Voice recognition is something different. We consider voice recognition to be the ability to hear someone speaking and identify the person whose voice is being heard. Voice recognition may not involve understanding any of the words at all; it might be limited to recognizing the voice.

This article is all about speech recognition, not voice recognition.

Slightly more than 40 years of history

Speech recognition technology has often been incorporated into science fiction, but for a long time it seemed as fanciful and impossible as death rays and faster than light interstellar travel.

Death rays are now a reality. Faster than light travel - at least at the subatomic level - is becoming a possibility, and after 40 years of hard slog, so too is speech recognition.

Of course the greatest enabler of modern speech recognition capabilities is the ever increasing computing power of a modern computer. But even limitless computing power would be useless without the appropriate programming to drive a speech recognition capability.

AT&T's Bell Labs developed the first-ever speech recognition device way back in the late 1940s and early 1950s. But this was more a proof of concept than a practical device that could be deployed in the real world. Until the late 1960s, the focus was on developing systems that would recognize 'discrete' words - that is, words spoken separately and distinctly. (A fascinating and detailed history can be found here.)

While such systems might have some limited application in some specialized fields, modern 'continuous' speech recognition capabilities first started to be developed in the early 1970s, when research into the theoretical concepts that allow for speech recognition, developed at Princeton University, was taken up by several ARPA (Advanced Research Projects Agency -- the same agency that brought us the Internet) contractors.

Some of the underlying theory

In case you wondered, the underlying theory involves using a technique known as 'Hidden Markov Modeling'. This is a way of identifying something without actually seeing the thing itself, by determining what it probably is, based on other things associated with it. For example, if you wondered what the temperature was outside, and if you saw a person walking down the street wearing only a T-shirt and shorts, you might reasonably infer that it was warm.
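For the curious, the temperature-and-clothing example can be sketched as a tiny hidden Markov model. This is only an illustration: the weather states, the clothing observations and every probability below are invented for the example, and the decoding routine (the well-known Viterbi algorithm) is a common way of working with such models rather than a description of what any particular speech product does.

```python
# Toy hidden Markov model: the hidden state is the weather,
# the observation is what a passer-by is wearing.
# All probabilities are invented for illustration.
STATES = ("warm", "cold")
START = {"warm": 0.5, "cold": 0.5}
TRANSITION = {  # chance of tomorrow's weather given today's
    "warm": {"warm": 0.8, "cold": 0.2},
    "cold": {"warm": 0.3, "cold": 0.7},
}
EMISSION = {  # chance of seeing each outfit in each kind of weather
    "warm": {"t-shirt": 0.7, "sweater": 0.25, "parka": 0.05},
    "cold": {"t-shirt": 0.05, "sweater": 0.35, "parka": 0.6},
}


def viterbi(observations):
    """Return the most probable sequence of hidden states."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (START[s] * EMISSION[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        step = {}
        for s in STATES:
            # Pick the most likely previous state leading into s.
            prob, path = max(
                (best[p][0] * TRANSITION[p][s], best[p][1]) for p in STATES
            )
            step[s] = (prob * EMISSION[s][obs], path + [s])
        best = step
    return max(best.values())[1]


print(viterbi(["t-shirt", "t-shirt", "parka"]))
# prints ['warm', 'warm', 'cold']
```

We never observe the weather directly; we only see the outfits, and the model infers the most plausible weather behind them. In speech recognition the same idea is applied with sounds as the observations and words (or parts of words) as the hidden states.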

The magic of this with speech recognition is that it enables a computer to imprecisely identify words, and then to 'fill in the gaps' based on the surrounding words, more or less the same way we do when we are listening to someone speak. The context of a word gives clues as to which word it is - particularly with words that sound the same (for example, consider the phrase 'He gave two balls to the other boy too' - with three different words to/too/two all sounding the same but, based on context, being clearly different).
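To make the to/too/two example concrete, here is a minimal sketch of choosing a spelling from context. The word-pair counts are hypothetical numbers made up for this example (a real system would learn them from a very large body of text), and the placeholder 'TU' simply stands for "a sound that could be to, too or two".

```python
# Hypothetical counts of how often one word follows another.
# "</s>" marks the end of the sentence.
PAIR_COUNTS = {
    ("gave", "two"): 8, ("two", "balls"): 9,
    ("gave", "to"): 3, ("balls", "to"): 7, ("to", "the"): 20,
    ("boy", "too"): 6, ("too", "</s>"): 5,
    ("two", "the"): 1, ("to", "</s>"): 1,
}
HOMOPHONES = {"to", "too", "two"}


def score(prev_word, word, next_word):
    """How well a candidate spelling fits between its two neighbours."""
    return (PAIR_COUNTS.get((prev_word, word), 0)
            + PAIR_COUNTS.get((word, next_word), 0))


def disambiguate(tokens):
    """Replace every 'TU' placeholder with the best-fitting spelling."""
    padded = ["<s>"] + tokens + ["</s>"]
    result = []
    for i, token in enumerate(tokens, start=1):
        if token == "TU":
            token = max(
                sorted(HOMOPHONES),
                key=lambda w: score(padded[i - 1], w, padded[i + 1]),
            )
        result.append(token)
    return result


heard = "he gave TU balls TU the other boy TU".split()
print(" ".join(disambiguate(heard)))
# prints: he gave two balls to the other boy too
```

The sound is identical in all three places; only the neighbouring words tell the spellings apart.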

This leads to the second 'magic' part of modern speech recognition. Statistically speaking, computers can accurately predict the next word in a phrase based on the words before it. Indeed, as an immediate and trivial example, if you think about the sentence immediately before this one, and imagine the last word was missing, you could probably guess that the last word would be 'it'. Studies have shown that computer statistical models are more accurate at completing phrases than we as people are when we intuitively do the same thing.
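A very simple version of this kind of prediction just counts which word most often follows each other word. The sketch below does exactly that on a three-line sample text we made up for the purpose; real products use vastly larger texts and more sophisticated models, so treat this as an illustration of the idea only.

```python
from collections import Counter, defaultdict


def train(text):
    """Count, for every word, which words follow it and how often."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the word seen most often after `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up sample text, standing in for a large training corpus.
CORPUS = (
    "computers can predict the next word based on the words before it "
    "we read the sentence and then repeat it "
    "the words before it give the context"
)

model = train(CORPUS)
print(predict_next(model, "before"))
# prints: it
```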

Early products released to the public in the mid 1990s

The various techniques for speech recognition were massively refined during the 1980s. After various experimental and high end products had been released to limited markets, 1995 saw the release of the first public speech recognition software. This software, released by Dragon, was a "discrete word" product that required the speaker to clearly enunciate each individual word separately.

Two years later, in 1997, modern speech recognition software appeared as we know it today. This new Dragon product, called "NaturallySpeaking", allowed exactly what its name implies. No longer did a speaker need to sound each word individually. Instead, they could speak in a normal conversational voice, and the computer would be able to break a steady flow of sound into individual words, even if there was no perceptible pause or break between the end of one word and the start of the next.

Since that time, the various companies offering speech recognition software have all merged, and there is one major company remaining -- Nuance Communications, which sells its product under the Dragon NaturallySpeaking name.

The product, now at version 10.1, has continued to improve over the years, and to make better use of the ever more powerful computers available. One could pointlessly debate whether earlier versions of the software were truly ready for prime time; the key issue which this article series attempts to address is whether the current version is now something that you should consider for yourself.

The Difference between Discrete and Continuous Speech

Think about how you or anyone else normally talks. You run your words together, with almost no pause between the end of one word and the start of the next. Indeed, sometimes people will use the end of one word to modify the start of the next word, either deliberately as a type of slang, or unconsciously because it makes for easier speech.

For example, the phrase 'It's a big one' might be pronounced 'It sa bigwun'. The first two words have been broken at a point so that part of the first word spills to the second word, making both words sound different, and the second two words will be pronounced as if they are a single word. Or maybe the first two words will be run together the same way as the second two words, as 'itsa'.

A discrete speech recognition system would require each word to be carefully sounded out separately. This is not the way we talk, and so makes discrete speech recognition systems less convenient.
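To get a feel for why continuous speech is the harder problem, consider what happens when the pauses disappear: the computer has to work out for itself where one word ends and the next begins. The sketch below does this for a run of letters rather than real audio, using a small made-up vocabulary in which each word has a hypothetical 'cost' (lower means more common). It is an illustration of the word-boundary problem only, not a description of how Dragon works internally.

```python
# Made-up vocabulary: word -> cost (lower = more common). Illustrative only.
COSTS = {"its": 2, "a": 1, "big": 2, "one": 2, "it": 1, "bi": 6, "gone": 3}


def segment(stream, costs):
    """Split an unbroken letter stream into the cheapest sequence of words.

    Returns a list of words, or None if no split into known words exists.
    """
    # best[i] = (total cost, words) for the cheapest split of stream[:i]
    best = [None] * (len(stream) + 1)
    best[0] = (0, [])
    for end in range(1, len(stream) + 1):
        for start in range(end):
            word = stream[start:end]
            if best[start] is not None and word in costs:
                candidate = (best[start][0] + costs[word],
                             best[start][1] + [word])
                if best[end] is None or candidate < best[end]:
                    best[end] = candidate
    return best[-1][1] if best[-1] is not None else None


print(segment("itsabigone", COSTS))
# prints ['its', 'a', 'big', 'one']
```

Note that 'its a bi gone' is also a valid way to carve up the same letters; the program prefers 'its a big one' only because the words involved are more common. A discrete system sidesteps the whole problem by insisting that the speaker supply the boundaries.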

A continuous speech recognition system will happily understand what you say. To prove my point, I will pronounce that short phrase four different ways: first sounding each word separately, second running the words together in a single utterance without pause, third as three words with the first two words broken in the wrong place, and fourth breaking the phrase into two two-word groups. Let's see how Dragon understands me. You can also see the CPU loading on the computer while Dragon is hard at work.

How to best watch the sample video

If you have a reasonably fast Internet connection, I would recommend that after you click on the play button, you then increase the resolution of the video from its default 360 setting, and possibly keep on going up past 480, perhaps all the way to either 720 or 1080. You should then increase the video size so it fills your screen, and that way with the larger video image and the higher resolution you can clearly see the text appearing on the video of my screen as I speak.

The option to change the resolution appears on the bottom line, but only after you have started playing the video. If you want the video to go fullscreen, you should click on the button next to the video resolution option button that has the four arrows pointing out to the corners.

Alternatively, click on this link to open up a regular YouTube page in a separate browser window.

Note - the second video in this two part video will be available next week.

Technical notes about this video

This test was done on a Dell E6400, with an Intel Core 2 Duo T9600 CPU at 2.8GHz and with 4GB of DDR2 memory, running Win7 32 bit with a Logitech ClearChat Pro USB headset.

Note: The sound you hear is not from the Logitech headset; it was recorded from the microphone on the camcorder. The sound that Dragon would hear from the Logitech headset would be very much better, and with less background noise.

Summary of Part 1 of this Article Series

Modern speech recognition systems are designed to work best when you speak normally, and in a continuous flow. The software, which has evolved over the last 40 or so years, is still not perfect, but it is getting impressively close.

Please read on to the second part of our series, where we talk about whether your type of work is well suited for speech recognition or not.

(And, of course, there's lots more good stuff in the subsequent parts of the series too.)