Simon/Contribute Data


To build a speech recognition system, several types of data files are required:

A phonetic dictionary, from which the system learns how words are pronounced
Transcribed audio samples, from which it learns how humans actually pronounce the phonetic elements (phones) used in the dictionary
Large corpora of written text, from which it learns which words commonly occur together (this gives the recognizer context)
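To illustrate what "learning which words occur together" can mean in practice, here is a minimal sketch that counts adjacent word pairs (bigrams) in a tiny corpus. It assumes a simple bigram-style statistic; it is not Simon's own language-model code, and the function name `bigram_counts` and the sample sentences are hypothetical.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent word pairs (bigrams) across a list of sentences.

    Illustrative only: assumes whitespace tokenization and lowercasing,
    which a real recognizer's text pipeline would likely refine.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # zip pairs each word with the word that follows it
        counts.update(zip(words, words[1:]))
    return counts


# Hypothetical toy corpus of command-like sentences
corpus = [
    "open the file",
    "close the file",
    "open the browser",
]

counts = bigram_counts(corpus)
print(counts[("the", "file")])  # 2
```

With more text, pairs such as ("the", "file") accumulate higher counts than unlikely pairs, which is the kind of context a recognizer can use to prefer plausible word sequences.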