To get Simon to recognize speech and react to it, you need what is called a speech model.
Speech models describe how your voice sounds, which words exist, how they sound, and which word combinations ("sentences" or "structures") exist.
A speech model basically consists of two parts:
- Language model: Describes all existing words and what sentences are grammatically correct
- Acoustic model: Describes how words sound
You need both these components to get Simon to recognize your voice.
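To make the language model's job more concrete, here is a minimal, hypothetical sketch of the two things it describes: a vocabulary and a set of allowed sentence structures. It assumes each word is tagged with a category, and that a sentence counts as grammatical when its sequence of categories matches one of the defined structures. The class and method names (`LanguageModel`, `add_word`, `add_structure`, `is_grammatical`) are invented for illustration and are not Simon's API or file format. The acoustic model is not sketched, because it is a statistical model trained from recordings of your voice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LanguageModel:
    # Hypothetical vocabulary: word -> category, e.g. "open" -> "COMMAND"
    vocabulary: dict[str, str] = field(default_factory=dict)
    # Hypothetical grammar: allowed sequences of categories ("structures")
    structures: set[tuple[str, ...]] = field(default_factory=set)

    def add_word(self, word: str, category: str) -> None:
        self.vocabulary[word.lower()] = category

    def add_structure(self, *categories: str) -> None:
        self.structures.add(tuple(categories))

    def is_grammatical(self, sentence: str) -> bool:
        words = sentence.lower().split()
        # Unknown words can never be recognized, so reject them outright
        if not words or any(w not in self.vocabulary for w in words):
            return False
        # Map each word to its category and look the sequence up in the grammar
        return tuple(self.vocabulary[w] for w in words) in self.structures


lm = LanguageModel()
lm.add_word("open", "COMMAND")
lm.add_word("firefox", "PROGRAM")
lm.add_structure("COMMAND", "PROGRAM")

print(lm.is_grammatical("Open Firefox"))   # True
print(lm.is_grammatical("Firefox open"))   # False: no PROGRAM COMMAND structure
print(lm.is_grammatical("close firefox"))  # False: "close" is not in the vocabulary
```

Separating words from structures means one new structure immediately covers every word in its categories, which is why adding a single program name is enough to make "open <program>" work in this sketch.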
In most cases you only need to install the appropriate scenario for your use case to set up your language model.
To create your own language model, you can use Simon to add / edit / remove words and grammar structures.
To make adding words easier, you can import a shadow dictionary.
To create your own acoustic model, you can simply read the training texts that come with your selected scenarios a couple of times.
If you are creating your own scenario, you can easily create training texts yourself. See the Simon manual for details.
However, you can also use static or adapted base models to avoid having to use the HTK or to improve the recognition rate.