

1.2.3 Pre-Speech Buffer

The VAD usually does not detect an utterance at the instant it begins; instead, the energy of the utterance builds until it is high enough to trigger the VAD. For example, the spoken name "Steve" begins with a low-energy hiss (the "s" sound).

If the VAD monitors the incoming speech signal and sends the signal to the application only after it detects the beginning of an utterance, the low-energy start of the utterance will most likely be lost. However, the ASR engine needs the complete utterance to process the signal correctly and fulfill the caller's request.

To avoid this problem, the CSP software maintains a pre-speech buffer: a recording of the echo-cancelled incoming speech signal captured before the VAD triggers. The contents of the pre-speech buffer are sent to the application along with all subsequent speech data. Pre-speech buffers are an integral part of VAD. See Figure 6-1, Data Flow from Application to Firmware, and Figure 6-2, Data Flow from Application to Firmware (DM3 Boards), for an illustration of the pre-speech buffer.
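The internal implementation of the CSP pre-speech buffer is not described here. As a hypothetical illustration of the idea, the sketch below keeps a per-channel circular (ring) buffer of the most recent samples. It assumes the figures given in this section (8000 samples per second, 8-bit samples stored one per byte, 250 ms of history). The type `prespeech_buf` and the functions `psb_init`, `psb_push`, and `psb_drain` are invented for this example and are not part of any Dialogic API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed figures from this section: 8000 samples/s, 8-bit samples, 250 ms. */
#define SAMPLE_RATE_HZ   8000
#define BYTES_PER_SAMPLE 1
#define PRESPEECH_MS     250
#define PRESPEECH_BYTES  (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * PRESPEECH_MS / 1000)

/* Hypothetical per-channel pre-speech buffer (not a Dialogic API type). */
typedef struct {
    uint8_t data[PRESPEECH_BYTES];
    size_t  head;   /* index where the next sample will be written */
    size_t  count;  /* number of valid samples, at most PRESPEECH_BYTES */
} prespeech_buf;

static void psb_init(prespeech_buf *b)
{
    b->head = 0;
    b->count = 0;
}

/* Called for each echo-cancelled sample while the VAD has not triggered.
 * Once the buffer is full, the oldest sample is overwritten, so the buffer
 * always holds the most recent PRESPEECH_MS of signal. */
static void psb_push(prespeech_buf *b, uint8_t sample)
{
    b->data[b->head] = sample;
    b->head = (b->head + 1) % PRESPEECH_BYTES;
    if (b->count < PRESPEECH_BYTES)
        b->count++;
}

/* Called when the VAD triggers: copies the buffered samples, oldest first,
 * into out (which must hold PRESPEECH_BYTES bytes) so they can be sent to
 * the application ahead of the live speech. Returns the number of bytes
 * copied and empties the buffer. */
static size_t psb_drain(prespeech_buf *b, uint8_t *out)
{
    assert(b->count <= PRESPEECH_BYTES);
    size_t start = (b->head + PRESPEECH_BYTES - b->count) % PRESPEECH_BYTES;
    for (size_t i = 0; i < b->count; i++)
        out[i] = b->data[(start + i) % PRESPEECH_BYTES];
    size_t n = b->count;
    psb_init(b);
    return n;
}
```

In this sketch, a channel would call `psb_push` for every sample until the VAD fires, then forward the bytes returned by `psb_drain` and stream subsequent samples directly. A ring buffer suits this job because it keeps a fixed memory footprint per channel and discards old signal without shifting data.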

There is one pre-speech buffer per voice channel. With a sampling rate of 8000 samples per second and a sample size of 8 bits per sample, a pre-speech buffer holds 250 milliseconds of speech.
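Because 8 bits is one byte, the capacity of a pre-speech buffer in bytes follows directly from these figures:

```latex
8000\,\frac{\text{samples}}{\text{s}} \times 1\,\frac{\text{byte}}{\text{sample}} \times 0.250\,\text{s} = 2000\,\text{bytes}
```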




Copyright 2001, Intel Corporation
All rights reserved
This page generated December, 2001