5.2 VAD Operation
The best way to describe the VAD operation is by example. Figure 5-1 shows sample speech patterns and how the voice activity detector (VAD) operates on this speech.
For a description of the parameters, see the ec_setparm( ) function description in the Continuous Speech Processing API Library Reference.
Figure 5-1. Example of Voice Activity Detector (VAD) Operation
This example illustrates the following:
- The speech window consists of 10 speech blocks (the default value). You can adjust the size of the speech window as needed for play and non-play situations.
DXCH_SPEECHPLAYWINDOW = 10
- On SpringWare boards, each speech block in the speech window consists of 96 samples; each block is 12 milliseconds long at 8 kHz PCM. This value is fixed and cannot be modified.
On DM3 boards, each speech block in the speech window consists of 80 samples; each block is 10 milliseconds in length at 8 kHz PCM. This value is fixed and cannot be modified.
- The speech threshold is -40 dBm and the speech trigger is 9 speech blocks.
DXCH_SPEECHPLAYTHRESH = -40 dBm
DXCH_SPEECHPLAYTRIGG = 9 (speech blocks)
- The VAD examines each speech block to determine whether its speech energy meets or exceeds the speech threshold of -40 dBm. If the speech energy is greater than or equal to -40 dBm, that speech block is assigned the value 1. If it is less than -40 dBm, the speech block is assigned the value 0.
- In this example, 9 of the 10 speech blocks in the speech window register speech energy at or above the speech threshold (indicated by the value 1). Because this count reaches the speech trigger of 9 blocks, barge-in occurs.
- When barge-in occurs, the VAD sends the voice data on to the host application or speech recognition engine for further voice analysis.
Copyright 2001, Intel Corporation