4.1. [encoder] Parameters
The encoder parameters are used to perform an encoding process on a media stream. Automatic Gain Control (AGC) and Silence Compressed Record (SCR) are two algorithms used as part of this encoding process.
The AGC is an algorithm for normalizing an input signal to a target record level. The target record level should be chosen to be the optimum level for an encoder and, at the same time, produce a suitable playback level for a listener.
The AGC algorithm is controlled by three parameters: PrmAGCk, PrmAGCmax_gain, and PrmAGClow_threshold. PrmAGCk is the target output level. PrmAGCmax_gain is a limit on the maximum possible gain. The ratio PrmAGCk/PrmAGCmax_gain gives the AGC High Threshold value. Inputs above this threshold produce output at the PrmAGCk level; inputs below it produce output that decreases linearly with the input level. PrmAGClow_threshold, on the other hand, is an upper limit for the noise level estimate. That is, a signal with a level above PrmAGClow_threshold is declared speech, regardless of whether it actually is speech. Below the threshold, the AGC algorithm itself tries to discriminate between voiced and unvoiced signals.
Figure 6 is a graphical representation of the AGC gain relative to input average.
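The relationship between the AGC High Threshold and the output level can be illustrated with a minimal sketch. It is not the board firmware: it assumes linear (not dB) levels, models only the idealized steady-state behavior described above, and ignores PrmAGClow_threshold and noise estimation. The function names are hypothetical.

```python
def agc_high_threshold(k: float, max_gain: float) -> float:
    """AGC High Threshold = PrmAGCk / PrmAGCmax_gain (linear units assumed)."""
    return k / max_gain


def agc_output_level(input_level: float, k: float, max_gain: float) -> float:
    """Idealized steady-state output level for a given average input level."""
    if input_level >= agc_high_threshold(k, max_gain):
        # At or above the high threshold: gain is reduced so output sits at the target.
        return k
    # Below the high threshold: gain is capped at max_gain,
    # so the output falls linearly with the input.
    return input_level * max_gain


if __name__ == "__main__":
    # Illustrative values only; they are not defaults from the CONFIG file.
    for level in (50.0, 125.0, 250.0):
        print(level, agc_output_level(level, k=1000.0, max_gain=8.0))
```

With these illustrative values the high threshold is 125, so any input at or above 125 is held at the target level and any input below it is simply multiplied by the maximum gain.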
The SCR algorithm operates on 1 millisecond blocks of speech and uses a twofold approach to determine whether a sample is speech or silence. Two Probability of Speech values are calculated using a Zero Crossing algorithm and an Energy Detection algorithm. These values are combined to calculate a Combined Probability of Speech.
The Zero Crossing algorithm counts the number of times a sample block crosses a zero line, thus establishing a rough "average frequency" for the sample. If the count for the sample falls within a predetermined range, the sample is considered speech.
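As an illustration of the zero-crossing idea, the sketch below counts sign changes in a block of samples and checks the count against a range. The range bounds (min_count, max_count) are hypothetical placeholders, since the predetermined range is not stated here, and treating a zero-valued sample as non-negative is an assumption.

```python
def count_zero_crossings(block):
    """Count sign changes between consecutive samples in a block."""
    crossings = 0
    for prev, cur in zip(block, block[1:]):
        # A crossing occurs when one sample is negative and the next is not
        # (zero is treated as non-negative; this is an assumption).
        if (prev < 0) != (cur < 0):
            crossings += 1
    return crossings


def zero_crossing_is_speech(block, min_count, max_count):
    """Return True if the crossing count lies within the given range.

    min_count and max_count are hypothetical stand-ins for the
    predetermined range used by the real algorithm.
    """
    return min_count <= count_zero_crossings(block) <= max_count
```

A high count suggests a high "average frequency" (for example, hiss), and a very low count suggests a slowly varying or constant signal; only counts in between are treated as speech in this sketch.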
The Energy Detection algorithm lets the user specify a background noise threshold range at the component level through the SCR_LO_THR and SCR_HI_THR parameters. Signals above the high threshold are declared speech and signals below the low threshold are declared silence.
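The threshold comparison can be sketched as follows. How the signal level is measured, its units, and what happens between the two thresholds are not specified here, so the sketch takes the level as an input and reports the in-between case as "undetermined"; both choices are assumptions.

```python
def energy_decision(level, scr_lo_thr, scr_hi_thr):
    """Classify a signal level against the background noise threshold range.

    level is assumed to be expressed in the same units as the thresholds.
    """
    if level > scr_hi_thr:
        return "speech"        # above the high threshold
    if level < scr_lo_thr:
        return "silence"       # below the low threshold
    # Between the thresholds the outcome is not described in this section;
    # this label is a placeholder, not documented behavior.
    return "undetermined"
```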
SCR declares speech or silence for the current 1 millisecond sample based on the following:
- previous 1 millisecond sample declaration (speech or silence)
- Combined Probability of Speech in relation to the Speech Probability Threshold (SCR_PR_SP)
- Combined Probability of Speech in relation to the Silence Probability Threshold (SCR_PR_SIL)
- Trailing Silence (SCR_T) relative to Silence Duration
If Combined Probability of Speech > Speech Probability Threshold
   then Declare Speech
   else Declare Silence

If Combined Probability of Speech > Silence Probability Threshold
   then Declare Speech
   else If Silence Duration < Trailing Silence
      then Declare Speech
      else Declare Silence

The encoder section of the CONFIG file includes the following parameters:
- PrmAGCk (AGC K Constant)
- PrmAGClow_threshold (AGC Noise Level Lower Threshold)
- PrmAGCmax_gain (AGC Maximum Gain)
- SCR_T (SCR Trailing Silence)
- SCR_PR_SP (SCR Speech Probability Threshold)
- SCR_PR_SIL (SCR Silence Probability Threshold)
- SCR_LO_THR (SCR Low Background Noise Threshold)
- SCR_HI_THR (SCR High Background Noise Threshold)
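The SCR speech/silence declaration rules given earlier in this section can be sketched in code using the SCR_PR_SP, SCR_PR_SIL, and SCR_T parameters. The sketch assumes that the first rule applies when the previous block was declared silence and the second when it was declared speech; that pairing is an interpretation and is not stated explicitly in this section. The function name, argument names, and units are hypothetical.

```python
def scr_declare_speech(prev_is_speech, combined_prob, silence_duration,
                       scr_pr_sp, scr_pr_sil, scr_t):
    """Return True to declare speech, False to declare silence.

    prev_is_speech   -- declaration for the previous block
    combined_prob    -- Combined Probability of Speech for the current block
    silence_duration -- how long the signal has looked like silence
                        (same units as scr_t; units are an assumption)
    """
    if not prev_is_speech:
        # Assumed pairing: coming from silence, speech must exceed
        # the Speech Probability Threshold.
        return combined_prob > scr_pr_sp

    # Assumed pairing: coming from speech, stay in speech while the
    # probability exceeds the Silence Probability Threshold.
    if combined_prob > scr_pr_sil:
        return True

    # Otherwise keep declaring speech until the trailing silence expires.
    return silence_duration < scr_t
```

Under this reading, the two probability thresholds act as a hysteresis band, and SCR_T keeps short pauses between words from being cut out of the recording.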
Copyright 2003, Intel Corporation