CA2016462A1 - Hybrid switched multi-pulse/stochastic speech coding technique - Google Patents
Hybrid switched multi-pulse/stochastic speech coding technique
- Publication number
- CA2016462A1 CA002016462A CA2016462A
- Authority
- CA
- Canada
- Prior art keywords
- pulse
- excitation
- unvoiced
- input signal
- coding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0003—Backward prediction of gain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
Abstract
RD-19,333 HYBRID SWITCHED MULTI-PULSE/STOCHASTIC
SPEECH CODING TECHNIQUE
ABSTRACT OF THE DISCLOSURE
The unvoiced speech performance problem in low-rate multi-pulse coders is solved by employing a multi-pulse architecture that is simple in implementation but has an output quality comparable to code excited linear predictive (CELP) coding. A hybrid architecture is provided in which a stochastic excitation model that is used during unvoiced speech is also capable of modeling voiced speech by use of random codebook excitation. A modified method for calculating the gain during stochastic excitation is also provided.
Description
RD-19,333 - HYBRID SWITCHED MULTI-PULSE/STOCHASTIC
SPEECH CODING TECHNIQUE
CROSS-REFERENCE TO RELATED APPLICATION
This application is related in subject matter to Richard L. Zinser application Serial No. 07/       , filed concurrently herewith for "A Method for Improving the Speech Quality in Multi-Pulse Excited Linear Predictive Coding" (Docket RD-19,291) and assigned to the instant assignee. The disclosure of that application is incorporated herein by reference.
DESCRIPTION
Field of the Invention
The present invention generally relates to digital voice transmission systems and, more particularly, to a simple method of combining stochastic excitation and pulse excitation for a low-rate multi-pulse speech coder.
Description of the Prior Art
Code excited linear prediction (CELP) and multi-pulse linear predictive coding (MPLPC) are two of the most promising techniques for low rate speech coding. While CELP holds the most promise for high quality, its computational requirements can be too great for some systems. MPLPC can be implemented with much less complexity, but it is generally considered to provide lower quality than CELP.
Multi-pulse coding is believed to have been first described by B.S. Atal and J.R. Remde in "A New Model of LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates", Proc. of 1982 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, May 1982, pp. 614-617, which is incorporated herein by reference. It was described to improve on the rather synthetic quality of the speech produced by the standard U.S. Department of Defense LPC-10 vocoder. The basic method is to employ the linear predictive coding (LPC) speech synthesis filter of the standard vocoder, but to use multiple pulses per pitch period for exciting the filter, instead of the single pulse used in the Department of Defense standard system. The basic multi-pulse technique is illustrated in Figure 1.
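The basic idea can be sketched in a few lines of code. This is an illustrative toy, not the patent's implementation: the predictor coefficients, frame length, pulse positions, and amplitudes below are made-up values chosen only to show an all-pole LPC synthesis filter driven by a few pulses per pitch period.

```python
# Toy multi-pulse LPC synthesis: an all-pole filter 1/A(z) excited by a
# sparse pulse train (several pulses per "pitch period") instead of the
# single pulse per period used by the LPC-10 vocoder.
import numpy as np

def lpc_synthesis(excitation, a):
    """All-pole filter: s[n] = e[n] - sum_k a[k] * s[n-k]."""
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * s[n - k]
        s[n] = acc
    return s

# Assumed toy 2nd-order predictor (stable poles of radius 0.9) and three
# pulses inside each 40-sample "pitch period" of an 80-sample frame.
a = [-1.5, 0.81]
exc = np.zeros(80)
exc[[0, 13, 27, 40, 53, 67]] = [1.0, 0.4, -0.3, 0.9, 0.35, -0.25]
speech = lpc_synthesis(exc, a)
```

In a real coder the pulse positions and amplitudes are not fixed in advance but chosen in a closed loop to minimize the weighted error, as described below for Figure 1.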
At low transmission rates (e.g., 4800 bits/second), multi-pulse speech coders do not reproduce unvoiced speech correctly. They exhibit two perceptually annoying flaws: 1) amplitude of the unvoiced sounds is too low, making sibilant sounds difficult to understand, and 2) unvoiced sounds that are reproduced with sufficient amplitude tend to be buzzy, due to the pulsed nature of the excitation.
To see how these problems arise, the cause of the second of these two flaws is first considered. In a multi-pulse coder, as the transmission rate is lowered, fewer pulses can be coded per unit time. This makes the "excitation coverage" sparse; i.e., the second trace ("Exc Signal") in Figure 2 contains few pulses. During voiced speech, as shown in Figure 2, this sparseness does not become a significant problem unless the transmission rate is so low that a single pulse per pitch period cannot be transmitted. As seen in Figure 2, the coverage is about three pulses per pitch period. At 4800 bits/second, there is usually enough rate available so that several pulses can be used per pitch period (at least for male speakers), so that coding of voiced speech may readily be accomplished. However, for unvoiced speech, the impulse response of the LPC synthesis filter is much shorter than for voiced speech, and consequently, a sparse pulse excitation signal will produce a "splotchy", semi-periodic output that is buzzy sounding.
A simple way to improve unvoiced excitation would be to add a random noise generator and a voiced/unvoiced decision algorithm, as in the standard LPC-10 algorithm. This would correct for the lack of excitation during unvoiced periods and remove the buzzy artifacts. Unfortunately, by adding the voiced/unvoiced decision and noise generator, the waveform-preserving properties of multi-pulse coding would be compromised and its intrinsic robustness would be reduced. In addition, errors introduced into the voiced/unvoiced decision during operation in noisy environments would significantly degrade the speech quality.
As an alternative, one could employ simultaneous pulse excitation and random codebook excitation similar to CELP. Such a system is described by T.V. Sreenivas in "Modeling LPC-Residue by Components for Good Quality Speech Coding", Proc. of 1988 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, April 1988, pp. 171-174, which is incorporated herein by reference. By simultaneously obtaining the pulse amplitudes and searching for the codeword index and gain, a robust system that would give good performance during both voiced and unvoiced speech could be provided. While this technique appears to be feasible at first look, it can become overly complex in implementation.
If an analysis-by-synthesis codebook technique is desired for the multi-pulse positions and/or amplitudes, then the two codebooks must be searched together; i.e., if each codebook has N entries, then N² combinations must be run through the synthesis filter and compared to the input signal. ("Codebook" as used herein refers to a collection of vectors filled with random Gaussian noise samples, and each codebook contains information as to the number of vectors therein and the lengths of the vectors.) With typical codebook sizes of 128 vector entries, the system becomes too complex for implementation, being equivalent in size to (128)², or 16,384, vector entries.
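The arithmetic behind this complexity argument can be checked directly; a joint search multiplies the two codebook sizes, whereas the switched scheme of this invention runs at most one codebook per frame:

```python
# Back-of-envelope check of the search cost, counting synthesis-filter
# runs per frame for 128-entry codebooks.
N = 128
joint_search = N * N      # every pulse codeword paired with every noise codeword
switched_search = N       # at most one codebook is searched per frame
assert joint_search == 16384
```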
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a solution to the unvoiced speech performance problem in low-rate multi-pulse coders.
It is another object of this invention to provide a multi-pulse code architecture that is very simple in implementation yet has an output quality comparable to CELP.
Briefly, according to the invention, a hybrid switched multi-pulse coder architecture is provided in which a stochastic excitation model is used during unvoiced speech and which is also capable of modeling voiced speech. The coder architecture comprises means for analyzing an input speech signal to determine if the signal is voiced or unvoiced, means for generating multi-pulse excitation for coding the input signal, means for generating a random codebook excitation for coding the input signal, and means responsive to the means for analyzing an input signal for selecting either the multi-pulse excitation or the random codebook excitation. A method of combining stochastic excitation and pulse excitation in a multi-pulse voice coder is also provided and comprises the steps of analyzing an input speech signal to determine if the input signal is voiced or unvoiced; if the input signal is voiced, it is coded by use of multi-pulse excitation, while if the input signal is unvoiced, it is coded by use of a random codebook excitation. A modified method for calculating the gain during stochastic excitation is also provided.
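The top-level switching logic can be sketched as follows. The voicing test shown (normalized lag-1 autocorrelation against an assumed threshold) is a stand-in; the patent does not commit here to a particular voiced/unvoiced detector, and the two coder callables are placeholders for the pulse and codebook excitation paths.

```python
# Sketch of the hybrid switched architecture: per frame, a voicing
# decision routes the input either to multi-pulse excitation (voiced)
# or to random-codebook excitation (unvoiced).
import numpy as np

def is_voiced(frame, threshold=0.3):
    """Crude voicing decision: voiced speech has strong lag-1 correlation."""
    r0 = float(np.dot(frame, frame))
    if r0 == 0.0:
        return False
    r1 = float(np.dot(frame[:-1], frame[1:]))
    return r1 / r0 > threshold

def encode_frame(frame, multipulse_coder, stochastic_coder):
    """Switch between pulse and codebook excitation, per the invention."""
    if is_voiced(frame):
        return ("voiced", multipulse_coder(frame))
    return ("unvoiced", stochastic_coder(frame))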
BRIEF DESCRIPTION OF THE DRAWINGS
The features of the invention believed to be novel are set forth with particularity in the appended claims. The invention itself, however, both as to organization and method of operation, together with further objects and advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:
Figure 1 is a block diagram showing the conventional implementation of the basic multi-pulse technique of coding an input signal;
Figure 2 is a graph showing respectively the input signal, the excitation signal and the output signal in the conventional system shown in Figure 1;
Figure 3 is a block diagram of the hybrid switched multi-pulse/stochastic coder according to the invention; and
Figure 4 is a graph showing respectively the input signal, the output signal of a standard multi-pulse coder, and the output signal of the improved multi-pulse coder according to the invention.
DETAILED DESCRIPTION OF A PREFERRED
EMBODIMENT OF THE INVENTION
In employing the basic multi-pulse technique using the conventional system shown in Figure 1, the input signal at A (shown in Figure 2) is first analyzed in a linear predictive coding (LPC) analysis circuit 10 to produce a set of linear prediction filter coefficients. These coefficients, when used in an all-pole LPC synthesis filter 11, produce a filter transfer function that closely resembles the gross spectral shape of the input signal. A feedback loop formed by a pulse generator 12, synthesis filter 11, weighting filters 13, and an error minimizer 14, generates a pulsed excitation at point B that, when fed into filter 11, produces an output waveform at point C that closely resembles the input waveform at point A. This is accomplished by selecting pulse positions and amplitudes to minimize the perceptually weighted difference between the candidate output sequence and the input sequence. Trace B in Figure 2 depicts the pulse excitation for filter 11, and trace C shows the output signal of the system. The resemblance of signals at input A and output C should be noted. Perceptual weighting is provided by the weighting filters 13. The transfer function of these filters is derived from the LPC filter coefficients. A more complete understanding of the basic multi-pulse technique may be gained from the aforementioned Atal et al. paper.
Since searching two codebooks simultaneously in order to obtain improvement in unvoiced excitation over that provided by multi-pulse speech coders is prohibitively complex, there are two possible choices that are more feasible; i.e., single mode excitation or a voiced/unvoiced decision. The latter approach is adopted by this invention, through use of multi-pulse excitation for voiced periods and random codebook excitation for unvoiced periods. If a pitch predictor is used in conjunction with random codebook excitation, then the random excitation is capable of modeling voiced or unvoiced speech (albeit with somewhat less quality during voiced periods). By use of this technique, the previously-mentioned reduction in robustness associated with the voiced/unvoiced decision is no longer a critical matter for natural sounding speech, and the waveform-preserving properties of multi-pulse coding are retained. An improvement in quality over single mode excitation is thereby obtained without the expected aforementioned drawbacks.
Listening tests for the voiced/unvoiced decision system described in the preceding paragraph revealed one remaining problem. While the buzziness in unvoiced sections of the speech was substantially eliminated, amplitude of the unvoiced sounds was too low. This problem can be traced to the codeword gain computation method for CELP coders. The minimum MSE (mean squared error) gain is calculated by normalizing the cross-correlation between the filtered excitation and the input signal, i.e.,

    g = Σ(i=0..N-1) y(i)x(i) / Σ(i=0..N-1) y²(i)          (1)

where g is the gain, x(i) is the (weighted) input signal, y(i) is the synthesis-filtered (and weighted) excitation signal, and N is the frame length, i.e., length of a contiguous time sequence of analog-to-digital samplings of a speech sample.
While Equation (1) provides the minimum error result, it also produces a level of output signal that is substantially lower than the level of input signal when a high degree of cross-correlation between output signal and input signal cannot be attained. The correlation mismatch occurs most often during unvoiced speech. Unvoiced speech is problematical because the pitch predictor provides a much smaller coding gain than in voiced speech and thus the codebook must provide most of the excitation pulses. For a small codebook system (128 vector entries or less), there are insufficient codebook entries for a good match.
If the unvoiced gain is instead calculated by a RMS (root-mean-square) matching method, i.e.,

    g = sqrt[ Σ(i=0..N-1) x²(i) / Σ(i=0..N-1) y²(i) ]          (2)
then the output signal level will more closely match the input signal level, but the overall signal-to-noise ratio (SNR) will be lower. I have employed the estimator of Equation (2) for unvoiced frames and found that the output amplitude during unvoiced speech sounded much closer to that of the original speech. In an informal comparison, listeners preferred speech synthesized with the unvoiced gain of Equation (2) compared to that of Equation (1).
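The difference between the two gain rules can be verified numerically. In this sketch the input frame and the filtered codeword are synthetic, weakly correlated noise signals standing in for an unvoiced frame and its best codebook match; the functions simply transcribe Equations (1) and (2).

```python
# Comparing the minimum-MSE gain of Equation (1) with the RMS-matching
# gain of Equation (2) on a poorly correlated pair, as occurs in
# unvoiced frames with a small codebook.
import numpy as np

def gain_mse(x, y):
    """Equation (1): g = sum y(i)x(i) / sum y^2(i)."""
    return float(np.dot(y, x) / np.dot(y, y))

def gain_rms(x, y):
    """Equation (2): g = sqrt(sum x^2(i) / sum y^2(i))."""
    return float(np.sqrt(np.dot(x, x) / np.dot(y, y)))

rms = lambda s: float(np.sqrt(np.mean(s ** 2)))

rng = np.random.default_rng(0)
x = rng.standard_normal(256)   # weighted input frame (synthetic)
y = rng.standard_normal(256)   # filtered codeword, weakly correlated with x

g1, g2 = gain_mse(x, y), gain_rms(x, y)
# g2 * y has exactly the RMS level of x; g1 * y falls well below it.
```

This matches the behavior described above: Equation (2) sacrifices some SNR but restores the amplitude envelope of unvoiced sounds.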
Figure 3 is a block diagram of a multi-pulse coder utilizing the improvements according to the invention. As in the system illustrated in Figure 1, the input sequence is first passed to an LPC analyzer 20 to produce a set of linear predictive filter coefficients. In addition, the preferred embodiment of this invention contains a pitch prediction system that is fully described in my copending application S.N.        (Docket RD-19,291). For the purpose of pitch prediction, the pitch lag is also calculated directly from the input data by a pitch detector 21. To find the pulse information, the impulse response is generated in a weighted impulse response circuit 22. The output signal of this response circuit is cross-correlated with error weighted input buffer data from an error weighting filter 35 in a cross-correlator 23. (LPC analyzer 20 provides error weighting filter 35 with the linear predictive filter coefficients so as to allow cross-correlator circuit 23 to minimize error.) An iterative peak search is performed by the cross-correlator on the resulting cross-correlation, producing the pulse positions. The preferred method for computing the pulse amplitudes can be found in my above-mentioned copending patent application. After all the pulse positions and amplitudes are computed, they are passed to a pulse excitation generator 25, which generates impulsive excitation similar to that shown in trace B of Figure 2; that is, correlator 23 produces the pulse positions, and pulse excitation generator 25 generates the drive pulses.
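The iterative peak search performed by cross-correlator 23 can be sketched as follows. This is an illustrative greedy search, not the patent's exact procedure: each iteration cross-correlates the residual with the weighted impulse response h, places a pulse at the peak, and subtracts that pulse's contribution. The per-pulse amplitude rule used here is the simple single-pulse optimum; the referenced copending application describes the preferred amplitude computation.

```python
# Greedy multi-pulse position/amplitude search by iterative peak picking
# on the cross-correlation of the residual with the impulse response h.
import numpy as np

def multipulse_search(target, h, n_pulses):
    residual = np.asarray(target, dtype=float).copy()
    energy = float(np.dot(h, h))
    positions, amplitudes = [], []
    for _ in range(n_pulses):
        corr = np.zeros(len(residual))
        for p in range(len(residual)):
            m = min(len(h), len(residual) - p)  # truncate h at the frame edge
            corr[p] = np.dot(residual[p:p + m], h[:m])
        p = int(np.argmax(np.abs(corr)))        # best pulse position
        g = corr[p] / energy                    # per-pulse optimal amplitude
        positions.append(p)
        amplitudes.append(g)
        m = min(len(h), len(residual) - p)
        residual[p:p + m] -= g * h[:m]          # remove this pulse's contribution
    return positions, amplitudes
```

On a target that really is a sum of shifted, scaled copies of h, the search recovers the positions and amplitudes exactly.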
Based on the input data, a voiced/unvoiced decision circuit 24 selects either pulse excitation or noise codebook excitation. If a voiced determination is made by voiced/unvoiced decision circuit 24, pulse excitation is used and an electronic switch 30 is closed to its Voiced position.
The pulse excitation from generator 25 is then passed through switch 30 to the output stages.
If, alternatively, an unvoiced determination is made by decision circuit 24, then noise codebook excitation is employed. A Gaussian noise codebook 26 is exhaustively searched by first passing each codeword through a weighted LPC synthesis filter 27 (which provides weighting in accordance with the linear predictive coefficients from LPC analyzer 20), and then selecting the codeword that produces the output sequence that most closely resembles the perceptually weighted input sequence. This task is performed by a noise codebook selector 28. Selector 28 also calculates optimal gain for the chosen codeword in accordance with the linear predictive coefficients from LPC analyzer 20. The gain-scaled codeword is then generated at the codebook output port 29 and passed through switch 30 (which is in the Unvoiced position) to the output stages.
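The exhaustive search carried out by codebook 26, filter 27, and selector 28 can be sketched like this. The function names and the caller-supplied `synth` filter are assumptions for illustration; the selection score is the standard CELP correlation-squared-over-energy criterion, and the gain applied to the winner is the RMS-matching rule of Equation (2) as described above for unvoiced frames.

```python
# Exhaustive Gaussian-codebook search: filter every codeword, keep the
# best perceptual match, and scale it with the Equation (2) RMS gain.
import numpy as np

def search_codebook(codebook, synth, target):
    best_idx, best_score = -1, -np.inf
    for idx, codeword in enumerate(codebook):
        y = synth(codeword)
        # standard CELP selection criterion: correlation^2 / energy
        score = float(np.dot(y, target)) ** 2 / float(np.dot(y, y))
        if score > best_score:
            best_idx, best_score = idx, score
    y = synth(codebook[best_idx])
    gain = float(np.sqrt(np.dot(target, target) / np.dot(y, y)))  # Eq. (2)
    return best_idx, gain
```

With the Table 1 sizes (128 codewords of 64 samples) this loop runs 128 filter passes per unvoiced pitch frame, versus the 16,384 of a joint search.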
The output stages make up a pitch prediction synthesis subsystem comprising a summing circuit 31, an excitation buffer 33 and pitch synthesis filter 34, and an LPC synthesis filter 32. A full description of the pitch prediction subsystem can be found in the above-mentioned copending application. Additionally, LPC synthesis filter 32 is essentially identical to filter 11 shown in Figure 1.
A multi-pulse algorithm was implemented with the stochastic excitation and gain estimator described above and as illustrated in Figure 3. Table 1 gives the pertinent operating parameters of the two coders.

Table 1 - Analysis Parameters of Tested Coders

    Sampling Rate               8 kHz
    LPC Frame Size              256 samples
    Pitch Frame Size            64 samples
    # Pitch Frames/LPC Frame    4 frames
    # Pulses/Pitch Frame        2 pulses

    Stochastic Excitation in Improved Coder
    Pitch Frame Size            same as above
    Stochastic Codebook Size    128 entries x 64 samples

The coders described in Table 1 can be implemented with a rate of approximately 4800 bits/second.
To evaluate performance of the improved system, a segment of male speech was encoded using a standard multi-pulse coder and also using the improved version according to the invention. While it is difficult to measure quality of speech without a comprehensive listening test, some idea of the quality improvement can be had by examining the time domain traces (equivalent to oscilloscope representations) of the speech signal during unvoiced speech. Figure 4 illustrates those traces. Segment (A) is from the original speech and displays 512 samples, or 64 milliseconds, of the fricative phoneme /s/ (from the end of the word "cross"). Segment (B) illustrates the output signal of the standard multi-pulse coder. Segment (C) illustrates the output signal of the improved coder. It will be noted that segment (B) is significantly lower in amplitude than the original speech and has a pseudo-periodic quality that is manifested in buzziness in the output. Segment (C) has the correct amplitude envelope and spectral characteristics, and exhibits none of the buzziness inherent in segment (B). During informal listening tests, all listeners surveyed preferred the results obtained by the improved system, which are shown in segment (C), over the results obtained by the standard system, which are shown in segment (B).
While only certain preferred features of the invention have been illustrated and described herein, many modifications and changes will occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.
RD-13,133 - HYBRID SWITCHED MULTI-PULSE/STOCHASTIC
S~EECH CODING TECHNIQUE
CROSS-REF~RENCE TO RELATED APPLICATION
s This application is related ln subject matter to Richard L. Zinser application Serial No. 07/ , filed concurrently herewith for "A Method for Improving the Speech Quality in Multi-~ulse Excited Linear Predictive Coding:
(Docket RD-19,291) and assigned to the instant assignee. The disclosure of that application is incorporated herein by reference.
DESCRIPTION
Field of the Invention The present invention generally relates to digital voice transmission systems and, more particularly, to a simple method of combining stochastic excitation and pulse excitation for a low-rate multi-pulse speech coder.
Description of the Prior Art Code exclted linear prediction (CELP) and mul~i-pulse linear predictive coding ~MPLPC) are two of the most promising techniques for low rate speech coding. While C-LP
holds the most promise fox hi~h ~uality, its computational requirements can be too great for some systems. MPLPC can be implemented with much less complexity, but it is generally considered to provide lower ~uality than CELP.
Multi-pulse coding is believed ~o have been first described by B.S. Atal and J.R. Remde in "A New Model of L?C
~$4~2 ~D-lg,33~
Excitation for Producing Natural Sounding Speech at Low 3it Rates", P~oc,_of 1982 IEEE Int. Con~. on Acç~stics S~eech and Si~n~ ~jLla~ May 1982, pp. 614-617, which is incorporated herein by reference. It was described to S improve on the rather synthetic quality of the speech produced by the standard U.S. Department of Defense LPC-10 vocoder. The basic method is to employ the linear predictive coding (LPC~ speech synthesis filter of the standard vocoder, but to use multiple pulses per pitch period for exciting the filter~ lnstead of the single pulse used in the Department of Defense standard system. The basic multi-pulse technique is illustrated in Figure 1.
At low transmission rates (e.g., 4800 bits/second), multi-pulse speech coders do not reproduce unvoiced speech correctly. They exhibit two perceptually annoying flaws: 1) amplitude of the unvoiced sounds is too low, making sibilant sounds difficult to understand, and 2) unvoiced sounds that are reproduced with sufficient amplitude tend to be buzzy, due to the pulsed nature of the excitation.
To see how these problems arise, the cause of the second of these two flaws is first considered. In a multi-pulse coder, as the transmission rate is lowered, fewer pulses can be coded per unit time. This makes the "excitation coverage"
spaxse; i.e., the second trace ("Exc Signal") in Figure 2 contains few pulses. During voiced speech, as shown in Figure 2, this sparseness does no~ become a significant problem unless the transmission rate is so low that a sin~le pulse per pitch period canno~ be transmitted. As seen in Figure 2, the coverage is about three pulses per pitch period. At 4800 bits/second, there is usually enough rate available so that several pulses can be used per pitch period (at least for male speakers), so that coding of voiced speech may readily be accomplished. However, for unvoiced speech, the impulse response of the LPC synthesis ~ilter is much shorter than for voiced speech, and consequently, a sparse 2 ~
RD-19,333 pulse excitation signal will produce a "splotchy~', se~i-periodic output that is buzzy sounding.
A simple way to improve unvoiced excitation would be to add a random noise generator and a voiced/unvoiced decision algorithm, as in the standard LPC-10 algorithm. This would correct for the lack of excitation during unvoiced periods and remove the buzzy artifac~s. Unfortunately, by adding the voiced/unvoiced decision and noise generator, the waveform-preserving properties of multi-pulse coding would be compromised and its intrinsic robustness would be reduced.
In addition, errors introduced into the ~oiced/unvoiced decision during operation in noisy environments would significantly degrade the speech quality.
As an alternative, one could employ simultaneous pulse excitation and random codebook excita~ion similar to CELP.
Such a system is described by T.V. Sreenivas in "Modeling LPC-Residue by Components for Good Quality Speech Coding", ~h~
~ ianal_ ProcQssing, April 1988, pp. 171-174, which is incorporated herein by reference. ~y simultaneously obtaining the pulse amplitudes and searching for the codeword index and gain, a robust system that would give good performance during hoth voiced and unvoiced speec~ could be provided. While this technique appears to be feasible at first look, it can become overly complex in implementation.
If an analysis-by-synthesis codebook technique is desired for the multi-pulse positions and/or amplitudes, then the two codebooks must be se rched together; i.e., if each codebook has N entries, then N2 combinations must be run through the synthesis filter and compared to the input signal.
("Codebook" as used herein refers to a collection of vectors filled with random Gaussian noise samples, and each codebook contains information as to the number of vectors therein and the lengths of the ~ectors.) With typical codebook sizes of 1~8 vector entries, the system becomes too complex for h 3 2 RG-19,3 3 implementation of an equivalent size of (128)2 or i6,384 vector entries.
SUMMARY OF THE INVENT ION
It is therefore an object of the present invention to provide a solution to the unvoiced speech performance problem in low-rate multi-pulse coders.
It is another object of this invention to provide a multi-pulse code architecture that is very simple in implementation yet has an output quality comparable to CELP.
Briefly, according to the invention, a hybrid switched multi-pulse coder architecture is provided in which a stochastic excitation model is used during unvoiced speech and which is also capable of modeling voiced speech. Th~
coder architecture comprises means for analyzing an input speech signal to determine if the signal is voiced or unvoiced, means for generating multi-pulse excita~ion for coding the input signal, means for generating a random codebook excitation for coding the input signal, and means responsive to the means for analyzing an input signal for selecting either the multi-pulse excitation or the random codebook exci~ation. A method of combining stochastic excitation and pulse excitation in an multi-pulse voice coder is also provided and comprises the steps of analyzing an input speech signal to de~ermine if the input signal is voiced or unvoiced - if the input signal is voiced, it is coded by use Gf multi-pùlse excita~ion while if the input signal i5 unvoiced, it is coded by use of a random code~ook excitation. A modified method for calculating ~he gain during stochastic excitation is also provided.
R~-19,333 BRIEF DESCRIPTION OF THE DRAWINGS
The features of the invention believed to be no~el are set forth with particularity in the appended claims. The invention itself, however, both as to organization and method of operation, together with further objects and advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:
Figure 1 is a block diagram showing the conventional implementation of the basic multi-pulse technique of coding an input signal;
Figure 2 is a graph showing respectively the input signal, the excitation signal and the output signal in the conventional sys~em shown in Figure 1;
Figure 3 is a block diagram of the hybrid switched multi-pulse/stochastic coder according to the inventioni and Figure 4 is a graph showing respectively the input signal, the output signal of a standard multi-pulse coder, and the output signal of the improved multi-pulse coder according to the invention.
~ET~ILED DESCRIPTION OF A P~EFERRED
EMBODIMENT OF T~E INYENTION
In ~mploying the basic multi-pulse technique using the conventional system shown in Figure 1, the input signal at A
(shown in Figure 2) is first analyzed in a linear predictive coding (LPC) analysis circuit 10 to produce a set of linear prediction filter coefficients. These coefficients, when used in an all-pole LPC synthesis filter 11, produce a filter transfer function ~hat closely re~embles the gross spectral shap~ of the input signal. A feedback loop formed by a pulse generator 12, synthesis filter 11, weighting filters 13, and J~
P~D-19,~33 an error minimizer 14, generates a pulsed excitation at point B that, when fed into filter 11, produces an output waveform at point C that closely resembles the input waveform at point A. This is accomplished by selecting pulse positions and amplitudes to minimize the perceptuaLly weighted difference between the candidate output sequence and the input sequence. Trace B in Figure 2 depicts the pulse excitation for filter ll, and trace C shows the output signal of the system. The resemblance of signals at input A and output C should be noted. Perceptual weighting is provided by the weighting filters 13. The transfer function of these filters is derived from the LPC filter coefficients. A more complete understanding of the baslc multi-pulse technique may be gained from the aforementioned Atal et al. paper.
Since searching two codebooks simultaneously in order to obtain improvement in unvoiced excitation over that provided by multi-pulse speech coders is prohibitively complex, there are two possible choices that are more feasible; i.e., single mode excitation or a voiced/unvoiced decision. The latter approach is adopted by this invention, through use of multi-pulse excitation for voiced periods and random codebook excitation for unvoiced periods. If a pitch predictor is used in conjunction with random codebook excitation, then the random excitation is capable of modeling voiced or unvoiced speech (albeit with somewhat less quality during voiced periods). By use of this technique, the previously-mentioned reduction in robustness associated with the voiced/unvoiced decision is no longer a critical matter for natural sounding speech and the waveform-preserving properties of multi-pulse coding are retained. An improvement in quality over single mode excitation is thereby obtained without the expected aforementioned drawbacks.
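The switched-excitation idea can be sketched as below. The text does not specify how the voiced/unvoiced decision is made, so the classifier here uses a common frame-energy and zero-crossing-rate heuristic purely as a stand-in; only the switching structure mirrors the description above, and the thresholds are illustrative assumptions.

```python
import math

def is_voiced(frame, energy_thresh=0.01, zcr_thresh=0.3):
    """Heuristic stand-in for a voiced/unvoiced classifier (assumed, not
    from the patent): voiced speech tends to have high energy and a low
    zero-crossing rate."""
    N = len(frame)
    energy = sum(s * s for s in frame) / N
    zcr = sum(1 for i in range(1, N) if frame[i - 1] * frame[i] < 0) / N
    return energy > energy_thresh and zcr < zcr_thresh

def select_excitation(frame, pulse_exc, codebook_exc):
    # Multi-pulse excitation for voiced frames, random-codebook
    # excitation for unvoiced frames.
    return pulse_exc if is_voiced(frame) else codebook_exc
```

A low-frequency sinusoid (voiced-like) selects the pulse branch; a low-level rapidly alternating frame (unvoiced-like) selects the codebook branch.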
Listening tests for the voiced/unvoiced decision system described in the preceding paragraph revealed one remaining problem. While the buzziness in unvoiced sections of the speech was substantially eliminated, amplitude of the unvoiced sounds was too low. This problem can be traced to the codeword gain computation method for CELP coders. The minimum MSE (mean squared error) gain is calculated by normalizing the cross-correlation between the filtered excitation and the input signal, i.e.,

    g = Σ_{i=0}^{N-1} x(i)y(i) / Σ_{i=0}^{N-1} y²(i)    (1)

where g is the gain, x(i) is the (weighted) input signal, y(i) is the synthesis-filtered (and weighted) excitation signal, and N is the frame length, i.e., length of a contiguous time sequence of analog-to-digital samplings of a speech sample.
While Equation (1) provides the minimum error result, it also produces a level of output signal that is substantially lower than the level of input signal when a high degree of cross-correlation between output signal and input signal cannot be attained. The correlation mismatch occurs most often during unvoiced speech. Unvoiced speech is problematical because the pitch predictor provides a much smaller coding gain than in voiced speech and thus the codebook must provide most of the excitation pulses. For a small codebook system (128 vector entries or less), there are insufficient codebook entries for a good match.
If the unvoiced gain is instead calculated by a RMS (root-mean-square) matching method, i.e.,

    g = √( Σ_{i=0}^{N-1} x²(i) / Σ_{i=0}^{N-1} y²(i) )    (2)

then the output signal level will more closely match the input signal level, but the overall signal-to-noise ratio (SNR) will be lower. I have employed the estimator of Equation (2) for unvoiced frames and found that the output amplitude during unvoiced speech sounded much closer to that of the original speech. In an informal comparison, listeners preferred speech synthesized with the unvoiced gain of Equation (2) compared to that of Equation (1).
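Equations (1) and (2) can be compared directly in code. In this sketch, x is the (weighted) input frame and y the synthesis-filtered excitation, as defined above; it shows how the minimum-MSE gain collapses when x and y are poorly correlated, while the RMS-matching gain preserves the output level regardless of correlation.

```python
def mse_gain(x, y):
    """Equation (1): minimum mean-squared-error gain."""
    return sum(a * b for a, b in zip(x, y)) / sum(b * b for b in y)

def rms_gain(x, y):
    """Equation (2): RMS-matching gain (equates output and input energy)."""
    return (sum(a * a for a in x) / sum(b * b for b in y)) ** 0.5
```

For orthogonal (fully uncorrelated) x and y, Equation (1) yields zero gain and hence zero output amplitude, while Equation (2) still matches the input level; for perfectly correlated signals both estimators agree.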
Figure 3 is a block diagram of a multi-pulse coder utilizing the improvements according to the invention. As in the system illustrated in Figure 1, the input sequence is first passed to an LPC analyzer 20 to produce a set of linear predictive filter coefficients. In addition, the preferred embodiment of this invention contains a pitch prediction system that is fully described in my copending application S.N. (docket RD-19,291). For the purpose of pitch prediction, the pitch lag is also calculated directly from the input data by a pitch detector 21. To find the pulse information, the impulse response is generated in a weighted impulse response circuit 22. The output signal of this response circuit is cross-correlated with error weighted input buffer data from an error weighting filter 35 in a cross-correlator 23. (LPC analyzer 20 provides error weighting filter 35 with the linear predictive filter coefficients so as to allow cross-correlator circuit 23 to minimize error.) An iterative peak search is performed by the cross-correlator on the resulting cross-correlation, producing the pulse positions. The preferred method for computing the pulse amplitudes can be found in my above-mentioned copending patent application. After all the pulse positions and amplitudes are computed, they are passed to a pulse excitation generator 25, which generates impulsive excitation similar to that shown in trace B of Figure 2; that is, correlator 23 produces the pulse positions, and pulse excitation generator 25 generates the drive pulses.
Based on the input data, a voiced/unvoiced decision circuit 24 selects either pulse excitation or noise codebook excitation. If a voiced determination is made by voiced/unvoiced decision circuit 24, pulse excitation is used and an electronic switch 30 is closed to its Voiced position.
The pulse excitation from generator 25 is then passed through switch 30 to the output stages.
If, alternatively, an unvoiced determination is made by decision circuit 24, then noise codebook excitation is employed. A Gaussian noise codebook 26 is exhaustively searched by first passing each codeword through a weighted LPC synthesis filter 27 (which provides weighting in accordance with the linear predictive coefficients from LPC analyzer 20), and then selecting the codeword that produces the output sequence that most closely resembles the perceptually weighted input sequence. This task is performed by a noise codebook selector 28. Selector 28 also calculates optimal gain for the chosen codeword in accordance with the linear predictive coefficients from LPC analyzer 20. The gain-scaled codeword is then generated at the codebook output port 29 and passed through switch 30 (which is in the Unvoiced position) to the output stages.
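The exhaustive search performed by selector 28 can be sketched as below. The weighted LPC synthesis filter 27 is abstracted as a caller-supplied function, and the per-codeword optimal gain follows Equation (1); this is an illustrative reconstruction of the search, not the literal circuit.

```python
def search_codebook(codebook, target, synth):
    """Return (index, gain) of the codeword that, after synthesis
    filtering and optimal gain scaling, minimizes the squared error
    against the perceptually weighted target."""
    best_idx, best_gain, best_err = 0, 0.0, float("inf")
    for idx, codeword in enumerate(codebook):
        y = synth(codeword)                  # weighted LPC synthesis filter
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(a * b for a, b in zip(y, target))
        gain = corr / energy                 # optimal gain, per Equation (1)
        err = sum(t * t for t in target) - gain * corr   # resulting error
        if err < best_err:
            best_idx, best_gain, best_err = idx, gain, err
    return best_idx, best_gain
```

With an identity synthesis filter and a target that is an exact scaled copy of one codeword, the search returns that codeword's index and the scale factor as gain.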
The output stages make up a pitch prediction synthesis subsystem comprising a summing circuit 31, an excitation buffer 33 and pitch synthesis filter 34, and an LPC synthesis filter 32. A full description of the pitch prediction subsystem can be found in the above-mentioned copending application. Additionally, LPC synthesis filter 32 is essentially identical to filter 11 shown in Figure 1.
A multi-pulse algorithm was implemented with the stochastic excitation and gain estimator described above and as illustrated in Figure 3. Table 1 gives the pertinent operating parameters of the two coders.
Table 1. Analysis Parameters of Tested Coders

Parameter | Value
---|---
Sampling Rate | 8 kHz
LPC Frame Size | 256 samples
Pitch Frame Size | 64 samples
# Pitch Frames/LPC Frame | 4 frames
# Pulses/Pitch Frame | 2 pulses
Pitch Frame Size (stochastic excitation, improved coder) | same as above
Stochastic Codebook Size (improved coder) | 128 entries × 64 samples

The coders described in Table 1 can be implemented with a rate of approximately 4800 bits/second.
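Some quick arithmetic is implied by the Table 1 parameters. The text gives no explicit bit allocation, so the figures below are only the frame timings and index sizes that follow from the stated numbers:

```python
fs = 8000                  # sampling rate, Hz (Table 1)
lpc_frame = 256            # samples per LPC frame
pitch_frame = 64           # samples per pitch frame

pitch_frames_per_lpc = lpc_frame // pitch_frame   # pitch frames per LPC frame
lpc_frame_ms = 1000.0 * lpc_frame / fs            # LPC analysis frame duration
bits_per_lpc_frame = 4800.0 * lpc_frame / fs      # bit budget at 4800 bit/s
codebook_index_bits = (128).bit_length() - 1      # bits to address 128 entries
```

These reproduce the table's 4 pitch frames per LPC frame, a 32 ms analysis frame, a 153.6-bit budget per LPC frame at 4800 bit/s, and a 7-bit codebook index.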
To evaluate performance of the improved system, a segment of male speech was encoded using a standard multi-pulse coder and also using the improved version according to the invention. While it is difficult to measure quality of speech without a comprehensive listening test, some idea of the quality improvement can be had by examining the time domain traces (equivalent to oscilloscope representations) of the speech signal during unvoiced speech. Figure 4 illustrates those traces. Segment (A) is from the original speech and displays 512 samples, or 64 milliseconds, of the fricative phoneme /s/ (from the end of the word "cross").
Segment (B) illustrates the output signal of the standard multi-pulse coder. Segment (C) illustrates the output signal of the improved coder. It will be noted that segment (B) is significantly lower in amplitude than the original speech and has a pseudo-periodic quality that is manifested in buzziness in the output. Segment (C) has the correct amplitude envelope and spectral characteristics, and exhibits none of the buzziness inherent in segment (B). During informal listening tests, all listeners surveyed preferred the results obtained by the improved system, which are shown in segment (C), over the results obtained by the standard system, which are shown in segment (B).
While only certain preferred features of the invention have been illustrated and described herein, many modifications and changes will occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.
Claims (7)
1. A method of combining stochastic excitation and pulse excitation in a multi-pulse voice coder, comprising the steps of:
analyzing an input speech signal to determine if the input signal is voiced or unvoiced; and selecting a form of excitation for coding the input signal depending upon the type of input signal, said excitation being multi-pulse excitation if the input signal is voiced and being random codebook excitation coding if the input signal is unvoiced.
2. The method recited in claim 1 wherein said multi-pulse excitation used for coding a voiced input signal comprises the steps of:
determining a pitch predictor tap gain as a normalized cross-correlation of an input sequence and pitch buffer samples by extending the pitch buffer through copying previous samples over a distance of P samples;
modifying a pitch synthesis filter so that a pitch predictor output sequence is a series computed for each interval P; and simultaneously solving for pulse amplitudes and pitch tap gain, thereby minimizing estimator bias in the multi-pulse excitation.
3. The method recited in claim 2 wherein said random codebook excitation used for coding an unvoiced input signal comprises the steps of:
searching a Gaussian noise codebook by passing codewords through a weighted linear predictive coding synthesis filter;
selecting a codeword that produces an output sequence that most closely resembles a perceptually weighted input sequence; and gain scaling the selected codeword.
4. A hybrid switched multi-pulse coder comprising:
means for analyzing an input speech signal to determine if the input signal is voiced or unvoiced;
means for generating multi-pulse excitation for coding an input voiced signal;
means for generating a random codebook excitation for coding an input unvoiced signal;
output means; and switching means responsive to said means for analyzing an input signal and for selectively coupling to said output means either said multi-pulse excitation or said random codebook excitation in accordance with whether said input signal is voiced or unvoiced.
5. The hybrid switched multi-pulse coder recited in claim 4 wherein said means for generating multi-pulse excitation comprises:
a linear predictive coefficient analyzer;
weighted impulse response means for weighting the output signal of said linear predictive coefficient analyzer;
means responsive to said weighted impulse response means for producing pulse position data; and pulse excitation generator means for generating drive pulses positioned in accordance with said pulse position data.
6. The hybrid switched multi-pulse coder recited in claim 5 wherein said means for generating a random codebook excitation comprises:
a Gaussian noise codebook;
a weighted linear predictive coding synthesis filter;
means coupling said Gaussian noise codebook to said weighted linear predictive coding synthesis filter so as to enable searching of said Gaussian noise codebook by passing codewords through said weighted linear predictive coding synthesis filter;
selector means coupled to said weighted linear predictive coding synthesis filter for selecting a codeword that produces an output sequence that most closely resembles a perceptually weighted input sequence; and means coupled to said selector means for gain scaling the selected codeword.
7. The invention as defined in any of the preceding claims including any further features of novelty disclosed.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US353,855 | 1989-05-18 | ||
US07/353,855 US5060269A (en) | 1989-05-18 | 1989-05-18 | Hybrid switched multi-pulse/stochastic speech coding technique |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2016462A1 true CA2016462A1 (en) | 1990-11-18 |
Family
ID=23390867
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002016462A Abandoned CA2016462A1 (en) | 1989-05-18 | 1990-05-10 | Hybrid switched multi-pulse/stochastic speech coding technique |
Country Status (2)
Country | Link |
---|---|
US (1) | US5060269A (en) |
CA (1) | CA2016462A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0556354A1 (en) * | 1991-09-05 | 1993-08-25 | Motorola, Inc. | Error protection for multimode speech coders |
Families Citing this family (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5434948A (en) * | 1989-06-15 | 1995-07-18 | British Telecommunications Public Limited Company | Polyphonic coding |
DE9006717U1 (en) * | 1990-06-15 | 1991-10-10 | Philips Patentverwaltung Gmbh, 2000 Hamburg, De | |
US5138661A (en) * | 1990-11-13 | 1992-08-11 | General Electric Company | Linear predictive codeword excited speech synthesizer |
US5293449A (en) * | 1990-11-23 | 1994-03-08 | Comsat Corporation | Analysis-by-synthesis 2,4 kbps linear predictive speech codec |
US5537509A (en) * | 1990-12-06 | 1996-07-16 | Hughes Electronics | Comfort noise generation for digital communication systems |
JPH04264597A (en) * | 1991-02-20 | 1992-09-21 | Fujitsu Ltd | Voice encoding device and voice decoding device |
EP1239456A1 (en) * | 1991-06-11 | 2002-09-11 | QUALCOMM Incorporated | Variable rate vocoder |
US5457783A (en) * | 1992-08-07 | 1995-10-10 | Pacific Communication Sciences, Inc. | Adaptive speech coder having code excited linear prediction |
US5717824A (en) * | 1992-08-07 | 1998-02-10 | Pacific Communication Sciences, Inc. | Adaptive speech coder having code excited linear predictor with multiple codebook searches |
DE4231918C1 (en) * | 1992-09-24 | 1993-12-02 | Ant Nachrichtentech | Procedure for coding speech signals |
CA2108623A1 (en) * | 1992-11-02 | 1994-05-03 | Yi-Sheng Wang | Adaptive pitch pulse enhancer and method for use in a codebook excited linear prediction (celp) search loop |
US5479559A (en) * | 1993-05-28 | 1995-12-26 | Motorola, Inc. | Excitation synchronous time encoding vocoder and method |
US5659659A (en) * | 1993-07-26 | 1997-08-19 | Alaris, Inc. | Speech compressor using trellis encoding and linear prediction |
WO1995010760A2 (en) * | 1993-10-08 | 1995-04-20 | Comsat Corporation | Improved low bit rate vocoders and methods of operation therefor |
US5673364A (en) * | 1993-12-01 | 1997-09-30 | The Dsp Group Ltd. | System and method for compression and decompression of audio signals |
JPH07160299A (en) * | 1993-12-06 | 1995-06-23 | Hitachi Denshi Ltd | Sound signal band compander and band compression transmission system and reproducing system for sound signal |
US5568588A (en) * | 1994-04-29 | 1996-10-22 | Audiocodes Ltd. | Multi-pulse analysis speech processing System and method |
US5854998A (en) * | 1994-04-29 | 1998-12-29 | Audiocodes Ltd. | Speech processing system quantizer of single-gain pulse excitation in speech coder |
US5602961A (en) * | 1994-05-31 | 1997-02-11 | Alaris, Inc. | Method and apparatus for speech compression using multi-mode code excited linear predictive coding |
TW271524B (en) * | 1994-08-05 | 1996-03-01 | Qualcomm Inc | |
US5742734A (en) * | 1994-08-10 | 1998-04-21 | Qualcomm Incorporated | Encoding rate selection in a variable rate vocoder |
JP2720800B2 (en) * | 1994-12-16 | 1998-03-04 | 日本電気株式会社 | Noise insertion method and apparatus |
FR2729244B1 (en) * | 1995-01-06 | 1997-03-28 | Matra Communication | SYNTHESIS ANALYSIS SPEECH CODING METHOD |
FR2729247A1 (en) * | 1995-01-06 | 1996-07-12 | Matra Communication | SYNTHETIC ANALYSIS-SPEECH CODING METHOD |
FR2729246A1 (en) * | 1995-01-06 | 1996-07-12 | Matra Communication | SYNTHETIC ANALYSIS-SPEECH CODING METHOD |
DE69516522T2 (en) * | 1995-11-09 | 2001-03-08 | Nokia Mobile Phones Ltd | Method for synthesizing a speech signal block in a CELP encoder |
US5797121A (en) * | 1995-12-26 | 1998-08-18 | Motorola, Inc. | Method and apparatus for implementing vector quantization of speech parameters |
US5708757A (en) * | 1996-04-22 | 1998-01-13 | France Telecom | Method of determining parameters of a pitch synthesis filter in a speech coder, and speech coder implementing such method |
US5751901A (en) * | 1996-07-31 | 1998-05-12 | Qualcomm Incorporated | Method for searching an excitation codebook in a code excited linear prediction (CELP) coder |
JP4040126B2 (en) * | 1996-09-20 | 2008-01-30 | ソニー株式会社 | Speech decoding method and apparatus |
CN1169117C (en) | 1996-11-07 | 2004-09-29 | 松下电器产业株式会社 | Acoustic vector generator, and acoustic encoding and decoding apparatus |
US5832443A (en) * | 1997-02-25 | 1998-11-03 | Alaris, Inc. | Method and apparatus for adaptive audio compression and decompression |
JP3063668B2 (en) * | 1997-04-04 | 2000-07-12 | 日本電気株式会社 | Voice encoding device and decoding device |
CN1124590C (en) * | 1997-09-10 | 2003-10-15 | 三星电子株式会社 | Method for improving performance of voice coder |
WO1999065017A1 (en) * | 1998-06-09 | 1999-12-16 | Matsushita Electric Industrial Co., Ltd. | Speech coding apparatus and speech decoding apparatus |
US7072832B1 (en) * | 1998-08-24 | 2006-07-04 | Mindspeed Technologies, Inc. | System for speech encoding having an adaptive encoding arrangement |
US6192335B1 (en) * | 1998-09-01 | 2001-02-20 | Telefonaktieboiaget Lm Ericsson (Publ) | Adaptive combining of multi-mode coding for voiced speech and noise-like signals |
KR100309873B1 (en) * | 1998-12-29 | 2001-12-17 | 강상훈 | A method for encoding by unvoice detection in the CELP Vocoder |
US6954727B1 (en) | 1999-05-28 | 2005-10-11 | Koninklijke Philips Electronics N.V. | Reducing artifact generation in a vocoder |
US7167828B2 (en) * | 2000-01-11 | 2007-01-23 | Matsushita Electric Industrial Co., Ltd. | Multimode speech coding apparatus and decoding apparatus |
US7171355B1 (en) * | 2000-10-25 | 2007-01-30 | Broadcom Corporation | Method and apparatus for one-stage and two-stage noise feedback coding of speech and audio signals |
US7110942B2 (en) * | 2001-08-14 | 2006-09-19 | Broadcom Corporation | Efficient excitation quantization in a noise feedback coding system using correlation techniques |
US6751587B2 (en) | 2002-01-04 | 2004-06-15 | Broadcom Corporation | Efficient excitation quantization in noise feedback coding with general noise shaping |
US7206740B2 (en) * | 2002-01-04 | 2007-04-17 | Broadcom Corporation | Efficient excitation quantization in noise feedback coding with general noise shaping |
US8352248B2 (en) * | 2003-01-03 | 2013-01-08 | Marvell International Ltd. | Speech compression method and apparatus |
US8473286B2 (en) * | 2004-02-26 | 2013-06-25 | Broadcom Corporation | Noise feedback coding system and method for providing generalized noise shaping within a simple filter structure |
TWI327230B (en) * | 2007-04-03 | 2010-07-11 | Ind Tech Res Inst | Sound source localization system and sound soure localization method |
GB0822537D0 (en) | 2008-12-10 | 2009-01-14 | Skype Ltd | Regeneration of wideband speech |
GB2466201B (en) * | 2008-12-10 | 2012-07-11 | Skype Ltd | Regeneration of wideband speech |
US9947340B2 (en) * | 2008-12-10 | 2018-04-17 | Skype | Regeneration of wideband speech |
GB2480108B (en) * | 2010-05-07 | 2012-08-29 | Toshiba Res Europ Ltd | A speech processing method an apparatus |
US20120143611A1 (en) * | 2010-12-07 | 2012-06-07 | Microsoft Corporation | Trajectory Tiling Approach for Text-to-Speech |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3872250A (en) * | 1973-02-28 | 1975-03-18 | David C Coulter | Method and system for speech compression |
IT1144551B (en) * | 1981-02-24 | 1986-10-29 | Cselt Centro Studi Lab Telecom | NUMERICAL DEVICE FOR DISCRIMINATION OF VOICE SIGNALS AND NUMBERED DATA SIGNALS |
US4817155A (en) * | 1983-05-05 | 1989-03-28 | Briar Herman P | Method and apparatus for speech analysis |
US4890328A (en) * | 1985-08-28 | 1989-12-26 | American Telephone And Telegraph Company | Voice synthesis utilizing multi-level filter excitation |
US4776014A (en) * | 1986-09-02 | 1988-10-04 | General Electric Company | Method for pitch-aligned high-frequency regeneration in RELP vocoders |
JPH087597B2 (en) * | 1988-03-28 | 1996-01-29 | 日本電気株式会社 | Speech coder |
-
1989
- 1989-05-18 US US07/353,855 patent/US5060269A/en not_active Expired - Lifetime
-
1990
- 1990-05-10 CA CA002016462A patent/CA2016462A1/en not_active Abandoned
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0556354A1 (en) * | 1991-09-05 | 1993-08-25 | Motorola, Inc. | Error protection for multimode speech coders |
EP1130576A1 (en) * | 1991-09-05 | 2001-09-05 | Motorola, Inc. | Error protection for multimode speech encoders |
EP0556354B1 (en) * | 1991-09-05 | 2001-10-31 | Motorola, Inc. | Error protection for multimode speech coders |
Also Published As
Publication number | Publication date |
---|---|
US5060269A (en) | 1991-10-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5060269A (en) | Hybrid switched multi-pulse/stochastic speech coding technique | |
US5127053A (en) | Low-complexity method for improving the performance of autocorrelation-based pitch detectors | |
US5138661A (en) | Linear predictive codeword excited speech synthesizer | |
Campbell Jr et al. | The DoD 4.8 kbps standard (proposed federal standard 1016) | |
US7496505B2 (en) | Variable rate speech coding | |
US8401843B2 (en) | Method and device for coding transition frames in speech signals | |
US4980916A (en) | Method for improving speech quality in code excited linear predictive speech coding | |
CA2636552C (en) | A method for speech coding, method for speech decoding and their apparatuses | |
EP0747882A2 (en) | Pitch delay modification during frame erasures | |
EP0747883A2 (en) | Voiced/unvoiced classification of speech for use in speech decoding during frame erasures | |
EP1145228A1 (en) | Periodic speech coding | |
WO1995028824A2 (en) | Method of encoding a signal containing speech | |
KR100275429B1 (en) | Speech codec | |
Kleijn et al. | A 5.85 kbits CELP algorithm for cellular applications | |
EP0578436B1 (en) | Selective application of speech coding techniques | |
EP0849724A2 (en) | High quality speech coder and coding method | |
US5105464A (en) | Means for improving the speech quality in multi-pulse excited linear predictive coding | |
Granzow et al. | High-quality digital speech at 4 kb/s | |
Tzeng | Analysis-by-synthesis linear predictive speech coding at 2.4 kbit/s | |
Ramprashad | Embedded coding using a mixed speech and audio coding paradigm | |
Akamine et al. | CELP coding with an adaptive density pulse excitation model | |
Zinser et al. | 4800 and 7200 bit/sec hybrid codebook multipulse coding | |
EP0119033B1 (en) | Speech encoder | |
KR960011132B1 (en) | Pitch detection method of celp vocoder | |
JPH05273999A (en) | Voice encoding method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
FZDE | Discontinued |