US3624302A - Speech analysis and synthesis by the use of the linear prediction of a speech wave - Google Patents
- Publication number
- US3624302A; US872051A
- Authority
- US
- United States
- Prior art keywords
- signals
- speech signal
- speech
- developing
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
Definitions
- the speech output of the vocal tract at any instant of time can be assumed to be a weighted sum of its past values and the input to the vocal tract at that instant of time.
- a speech wave is represented by the output of a linear filter which simulates an acoustic tube and which is excited by a combination of a quasi-periodic pulse train and white noise.
- This invention relates to the artificial production of speech or similar complex waves from control signals, and particularly to the derivation of control signals from an original speech wave that can be accommodated by storage or transmission facilities with limited channel capacity.
- the principal object of the invention is to reduce, as far as possible, the channel capacity, or bit rate in the case of a digital channel, required for the storage or transmission of speech control signals without, however, a sacrifice of intelligibility or the introduction of an objectionable unnatural quality into the reconstructed speech.
- Speech parameter signals are continuously developed at a transmitter station using the constraint that the applied speech wave at any instant of time is a weighted sum of its past values, that is to say, speech parameter signals are developed which specify linearly predictable characteristics of an applied speech signal.
- speech parameter signals are developed which specify linearly predictable characteristics of an applied speech signal.
- a suitable functional model of speech production is established and it is assumed that a close approximation to a speech wave can be produced at its output.
- the model includes a discrete, linear, time-varying filter which is excited by a suitable combination of a quasi-periodic pulse train (voiced excitation) and white noise (unvoiced excitation).
- the output of the linear filter at any sampling instant is a linear combination of past output samples and the input.
- the nth speech sample, s_n, may be expressed as:
- a_1, a_2, ..., a_p; b_1, b_2, ..., b_q are parameters which specify the filter at any time
- x_n is the nth input sample.
- the samples x_n represent a train of quasi-periodic pulses
- x_n represents the output of a white noise generator.
- the values of p and q are determined by the bandwidth of the input speech signal and the length of the vocal tract.
- a 10th to 12th order linear predictor represents the speech signal band-limited to 5 kHz. with sufficient accuracy.
- a higher order predictor may be necessary for certain cases (some male speakers saying nasalized consonants).
- the parameter q is assumed to be equal to 10 in the analysis and zero in the synthesis.
- Durations of individual pitch periods are determined by calculating the pitch-synchronous autocorrelation function of the third power of the input speech wave and selecting the delay for which the autocorrelation function is maximum.
- a speech signal may be synthesized by a network continually adjusted by parameter signals derived in this fashion.
- FIG. 1 is a block schematic diagram of a speech transmission system, including an analyzer and a synthesizer which illustrates the principles of the invention
- FIG. 2 is a block schematic diagram of a prediction parameter computer suitable for use in the analyzer of the speech transmission system illustrated in FIG. 1;
- FIG. 3 is a block schematic diagram of a network for developing parameters, representing the relative amplitudes of voiced and unvoiced signal components, suitable for use in the analyzer of the system illustrated in FIG. 1;
- FIG. 4 is a block schematic diagram of a time-varying filter which may be used at the synthesizer of a transmission system embodying the principles of the invention.
- A complete limited channel capacity speech transmission system which illustrates the principles of the invention is shown in FIG. 1.
- Speech signals which may originate, for example, in transducer 10, are passed through low pass filter 11 which has a cutoff frequency in the neighborhood of 5 kHz. and which exhibits a 3-db. cutoff frequency in the neighborhood of 4 kHz.
- the resultant signal is then sampled at a frequency of approximately 10 kHz. in sampler 12.
- Clock 13 is employed to energize the sampler and other units in the system.
- Speech samples, s_n, thus derived are supplied to prediction parameter computer 14, to pitch pulse position computer 15, and to parameter computer 16.
- Parameters a uniquely specify the frequencies and bandwidths of speech formants in the input signal below about 5 kHz.
- Parameter signals a are developed from linearly predictable characteristics of the applied speech signals delivered by sampler 12.
- Pitch pulse position computer 15 determines the location of the glottal pulses in the applied speech wave; the difference between the positions of successive glottal pulses specifies the duration of the pitch period. Any suitable pulse position analyzer may be employed to derive pitch period signals, N. For example, a suitable arrangement is described in Automatic Speaker Recognition Based on Pitch Contours by B. S. Atal, Polytechnic Institute of Brooklyn, June, 1968, pages 33-43.
- Speech samples from unit 12 are also supplied to computer 16, which determines two amplitude parameters g. These parameters characterize the amplitudes of voiced and unvoiced signal excitation, i.e., one parameter specifies the amplitude of voiced (or buzz) excitation signals, and the other specifies the amplitude of unvoiced (or hiss) excitation signals.
- Parameter signals a, N, and g thus derived uniquely determine formant frequencies and bandwidths of a speech signal, its spectrum, and the relative amplitudes of voiced and unvoiced components necessary for the synthesis of artificial speech. Since these parameters require considerably less channel capacity than the corresponding analog signal representation, they may be economically stored for future use, or transmitted to a distant station. All parameter signals may be combined for transmission, for example, by multiplexing or the like, in transmission coder 17. At a receiver station, these signals are recovered and delivered individually through the action of transmission decoder 18. Transmission coders and decoders of any desired construction and form may be employed. Obviously, storage of the parameter signals may take place at any point in the indicated transmission arrangement; the transfer of parameter signals from one storage location to another, for example, may be considered to be a form of transmission.
- voiced excitation is generated, for example, in pulse generator 19 of any desired construction, under control of pitch pulse parameter signal N.
- the amplitude of the voiced excitation signal is controlled continuously by the voiced amplitude parameter signal g acting upon modulator 20.
- generator 19 produces a pulse of unit amplitude at the beginning of every pitch period.
- unvoiced excitation is produced in noise generator 22.
- Generator 22 typically produces a sequence of random numbers uniformly distributed between +1 and -1 at sampling instants. Noise signals are controlled in amplitude by the unvoiced amplitude parameter signal g acting on modulator 23.
- pulse generator 19 and noise generator 22 are added together with selected past signal values, available at the output of time-varying filter 24, in a combining network 21.
- the combined signal produced at the output of network 21 is thereupon delivered by way of low pass filter 26 to reproducer 27, for example, a loud speaker.
- Low pass filter 26 preferably has a cutoff frequency of about 5 kHz., its exact frequency range being commensurate with the range of filter 11 at the analyzer.
- the combined signal is also delivered to the input of transversal filter 25, forming a part of filter network 24.
- Time-varying filter 24 serves to regenerate speech from the applied excitation and parameter signals a.
- Such a filter arrangement resembles the resonant filter system of the human vocal tract and typically exhibits certain natural resonances which may be tuned in accordance with formant parameter signals a.
- Resonant vocoder apparatus of this general form is well known in the art; a typical example is described in J. L. Kelly, Jr., U.S. Pat. No. 3,328,525, issued June 27, 1967.
- a transversal filter arrangement specifically adapted for use in the apparatus illustrated in FIG. 1 is described below with reference to FIG. 4.
- FIG. 2 illustrates a prediction parameter computer 14 suitable for developing formant parameter signals a in accordance with the invention.
- For every pitch period of the applied speech wave, an array of signal values s_n from sampler 12 (FIG. 1) is transferred into storage unit 140 to replace the previous array of signal samples contained in the storage unit.
- Storage unit 140 thus stores an array of signal values u, beginning with u_{-9}, where N represents the duration of the current pitch period in samples. Every pitch period, a portion of the stored values u is replaced by other stored values, and incoming samples are placed in the vacated storage locations. Thus, signals u are consecutively stored as they are received in storage unit 140.
- index j varies from i to 10
- index j varies from 1 to 10
- the resultant array, h_1, h_2, ..., h_10, designated H, is delivered every pitch period to computer 144.
- Prediction parameters a_1, a_2, ..., a_10 uniquely determine the frequencies and bandwidths of all speech formants below 5 kHz. If desired, the bandwidths and frequencies of formants may be determined from values of a for use in the control of other synthesis apparatus. In accordance with the invention, this determination is made by supplying parameter values a from computer 144, by way of switch 147, to polynomial root computer 148. This unit determines the complex roots of a polynomial with real coefficients, i.e., the roots of a polynomial f(z), defined as:
- a polynomial root locator suitable for making the necessary evaluation is described in Mathematical Methods for Digital Computers, edited by Ralston and Wilf, John Wiley & Sons, Inc., 1967, in the section by E. R. Bareiss, at page 185.
- the output of the polynomial root locator 148 is 10 complex numbers (two sets of 10 real numbers) z_1, z_2, ..., z_10 which are then supplied to the arithmetic unit 149 which computes the numbers p_1, p_2, ..., p_10 in accordance with equation (6) below.
- the complex numbers p_k can be separated into their real and imaginary parts, b_k and f_k, respectively, as follows:
- Logical unit 150 orders the numbers p_k such that the first number has the lowest positive imaginary part, the second number the second lowest positive imaginary part, and so on. Consequently, the numbers f_k and b_k represent the frequencies and the bandwidths of the various formants of the speech signal for the pitch period under consideration. These representations may be used in any desired fashion, e.g., for controlling a formant synthesis.
- Speech samples from sampling unit 12 are also supplied to parameter computer 16 which determines two parameter values g. These parameters denote the relative amplitudes of voiced and unvoiced signal components in the applied speech signal.
- Storage unit 162 contains an array of signal values w. Every pitch period, a portion of the stored values w is replaced by other stored values. New signal values are computed in arithmetical unit 163 according to equation (10) and stored consecutively in storage locations w, where n varies from 0 to N.
- Arithmetic unit 166 computes an array of signal values, designated y, and stores them in storage unit 167. Storage unit 167 is equipped with 10 additional storage locations which have the number 0 stored in them permanently.
- the array y is computed in arithmetic unit 166 according to the following relation:
- n varies from 0 to N
- the array of numbers r designates the output of a white noise generator 170. Similar to storage unit 167, storage unit 169 also has 10 additional storage locations which have the number 0 stored in them permanently.
- the arrays w, y, and v, and the numbers E and R are transferred periodically, under the influence of pitch synchronized clock pulses from pulser 171, to arithmetic unit 172 which comprises six arithmetic units designated d_1, d_2, ..., d_6, which operate in parallel. These units of system 172 compute the numbers d_1, ..., d_6 in accordance with equations (13) through (18) set forth below.
- the index n is summed from 0 to N in each of the equations.
- Arithmetic and storage units operative in a fashion similar to that described above are described in greater detail in the aforementioned copending application, Ser. No. 753,408.
- time-varying filter 24 (FIG. 1) includes a transversal filter network 25 composed of 10 unit delay elements 240, supplying applied signals to 10 adjustable gain amplifiers 241. Signals developed at the junctions of the several delay units thus represent past sample values of signals supplied from combiner 21 to filter 26 in the synthesizer of FIG. 1.
- the gains of the individual amplifiers 241 are adjusted by parameter values a to form a collection of weighted past sample values.
- the resultant signals are additively combined in adder network 242 and supplied to one input of combiner unit 21.
- the combined output of combiner network 21, which includes voiced and unvoiced excitation, and the combination of weighted past sample values constitute a replica of the applied speech signal. It is supplied by way of filter 26 to loud speaker 27.
- an analog speech signal may be efficiently transmitted in the form of an array of numbers, viz., N; the two amplitude parameters g; and a_1, ..., a_10
- N an array of numbers
- g the necessary information concerning the speech wave in any given pitch period
- a sufficient for reconstructing the speech wave.
- a saving of approximately 10 to 1 in transmission capacity may be achieved when using these parameters rather than the analog signal itself.
- a 10-kilobit signal used for representing the parameters has been found to yield excellent quality synthesized speech.
- a 5-kilobit signal still permits very acceptable speech to be produced; this in contrast to the usual requirement of a O -kilobit signal for direct coding of a speech wave.
- Speech analysis apparatus which comprises:
- said first set of signals which specify linearly predictable characteristics comprises a plurality of limited channel capacity parameter signals derived from past and current values of said applied speech signal for adjusting a resonant filter system, arranged to produce a replica of said applied speech signal when excited by voiced and unvoiced excitation signals.
- Speech signal analysis apparatus as defined in claim 3, in combination with,
- said first set of signals is developed by minimizing the mean-squared error between the actual values of samples of said applied speech signal and predicted values thereof based on a selected number of past sample values.
- Speech signal apparatus which comprises:
- means for developing a second set of signals representative of the duration of individual pitch periods of said applied speech signal, means for developing a third set of signals representative of the energy of a speech signal in each of said pitch periods and of the voicing character of speech signals within said pitch periods, and
- said means at said receiver station for developing signals representative of predicted values of said speech signal comprises,
- transversal filter supplied with a combination of adjusted pitch period pulses, adjusted noise signals, and signals selectively representative of past values of said applied signal.
- Synthesis apparatus for developing artificial speech from signals representative of the pitch period, voicing character, and selected predictable characteristics of an applied speech signal, which comprises:
- means for generating white noise signals, means responsive to received signals representative of the voicing character of said applied speech signal for individually adjusting the levels of said pitch period pulses and said white noise signals, and
- Synthesis apparatus as defined in claim 8, wherein said means for developing signals representative of predicted values of said speech signal comprises a transversal filter supplied with said combined replica signal and adjusted by said predictable characteristic signals.
- said predicted value signals are selected to represent a linear combination of preceding values of said replica of said applied speech signal.
Abstract
A short-time spectral analysis of a nonstationary signal, such as a speech signal, does not ordinarily yield control signal information sufficient for subsequent synthesis. However, more reliable control signals for a speech synthesizer can be obtained by making use of natural constraints, applicable to a speech wave, in the analysis procedure. For frequencies below 5 kHz., the human vocal tract can be modeled as an acoustic tube in which only plane waves propagate. Thus, for vowels and vowellike sounds, the speech output of the vocal tract at any instant of time can be assumed to be a weighted sum of its past values and the input to the vocal tract at that instant of time. In the described invention, a speech wave is represented by the output of a linear filter which simulates an acoustic tube and which is excited by a combination of a quasi-periodic pulse train and white noise. The parameters of this filter are derived from the speech wave such that the mean-squared error between the synthetic speech samples at the output of the filter and the input speech samples is minimum.
Description
United States Patent
Murray Hill, N.J.
SPEECH ANALYSIS AND SYNTHESIS BY THE USE OF THE LINEAR PREDICTION OF A SPEECH WAVE 10 Claims, 4 Drawing Figs.
Primary Examiner: Kathleen H. Claffy. Assistant Examiner: Jon Bradford Leaheey. Attorneys: R. J. Guenther and William L. Keefauver.
References cited, UNITED STATES PATENTS: 2,817,707, 12/1957, 179/1 SA; 3,020,344, 2/1962, Prestigiacomo, 179/1 SA; 3,158,685, 11/1964, Gerstman, 179/1 SA; 3,328,525, 6/1967, Kelly, 179/1 SA.
[FIG. 1 block labels: prediction parameter computer, pitch pulse position computer, parameter computer, clock, pulse generator, noise generator, combiner, time-varying filter, transversal filter]
SPEECH ANALYSIS AND SYNTHESIS BY THE USE OF THE LINEAR PREDICTION OF A SPEECH WAVE
BACKGROUND OF THE INVENTION
This invention relates to the artificial production of speech or similar complex waves from control signals, and particularly to the derivation of control signals from an original speech wave that can be accommodated by storage or transmission facilities with limited channel capacity.
The principal object of the invention is to reduce, as far as possible, the channel capacity, or bit rate in the case of a digital channel, required for the storage or transmission of speech control signals without, however, a sacrifice of intelligibility or the introduction of an objectionable unnatural quality into the reconstructed speech.
1. Field of the Invention
Conventional speech communication systems, for example, commercial telephone systems, typically convey human speech by transmitting an electrical facsimile of the acoustic waveform produced by a human speaker. Because of the redundancy of human speech, however, facsimile transmission is a relatively inefficient way to transmit this information. Consequently, a number of arrangements for reducing the channel capacity required for the transmission of speech information have been proposed. One of the best known of these arrangements is the so-called vocoder. More recently, techniques for removing inherent signal redundancy in the speech wave through the use of a linear predictor have been utilized.
2. Description of the Prior Art
Production of good quality synthetic speech is a necessary corollary to limited channel capacity transmission systems of whatever sort. However, the speech obtained from priorly known synthesizers generally lacks naturalness and exhibits an undesirable quality, even when the synthesizer control signals are derived from the original speech at closely spaced intervals. There are a number of reasons for the poor quality of such synthetic speech. Consider, for example, the case of a formant synthesizer, this being a part of another typical system for the narrow band transmission of speech. Most formant analyzers attempt to isolate peaks due to various formants in the speech spectra. This is a difficult task, even for low-pitched male voices, since formants do not always show up as distinct peaks in the spectra, and the spectral peaks do not always result from the formants. Such methods usually break down completely for female speech. Further, satisfactory operation of a formant synthesizer often depends upon the correct ordering of the various formants. This, too, is difficult to achieve.
SUMMARY OF THE INVENTION
To avoid many of these problems, a different approach to speech analysis and synthesis is followed in the present invention. Speech parameter signals are continuously developed at a transmitter station using the constraint that the applied speech wave at any instant of time is a weighted sum of its past values, that is to say, speech parameter signals are developed which specify linearly predictable characteristics of an applied speech signal. To derive parameter control signals for the production of realistic synthesized speech, a suitable functional model of speech production is established and it is assumed that a close approximation to a speech wave can be produced at its output. Typically, the model includes a discrete, linear, time-varying filter which is excited by a suitable combination of a quasi-periodic pulse train (voiced excitation) and white noise (unvoiced excitation). The output of the linear filter at any sampling instant is a linear combination of past output samples and the input. In this analysis, the nth speech sample, s_n, may be expressed as:
where a_1, a_2, ..., a_p; b_1, b_2, ..., b_q are parameters which specify the filter at any time, and x_n is the nth input sample. For completely voiced sounds, the samples x_n represent a train of quasi-periodic pulses, whereas for completely unvoiced sounds, x_n represents the output of a white noise generator. For this model of speech production, it can be shown that in any pitch period the speech samples after the first q samples may be expressed as a linear combination of the preceding p samples. The optimum linear combination, a_1, a_2, ..., a_p, is obtained by minimizing the mean-squared error between the actual values of the speech samples and their predicted values based on the past p samples. The values of p and q are determined by the bandwidth of the input speech signal and the length of the vocal tract. A 10th to 12th order linear predictor represents the speech signal band-limited to 5 kHz. with sufficient accuracy. A higher order predictor may be necessary for certain cases (some male speakers saying nasalized consonants). The parameter q is assumed to be equal to 10 in the analysis and zero in the synthesis.
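The minimization described above can be written out as a short derivation. This is a sketch under the assumption that the predicted value of s_n is the weighted sum of the preceding p samples; the symbol E for the total squared error and the unspecified summation range over one pitch period are introduced here for illustration and are not taken from the patent text.

```latex
\begin{aligned}
E &= \sum_{n} \Bigl( s_n - \sum_{k=1}^{p} a_k\, s_{n-k} \Bigr)^{2}, \\
\frac{\partial E}{\partial a_i} = 0
&\;\Longrightarrow\;
\sum_{k=1}^{p} a_k \sum_{n} s_{n-k}\, s_{n-i} \;=\; \sum_{n} s_n\, s_{n-i},
\qquad i = 1, \ldots, p .
\end{aligned}
```

Setting each partial derivative to zero gives p linear equations in the p unknown weights, so the optimum predictor is found by solving a linear system built from sums of products of past samples.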
Durations of individual pitch periods are determined by calculating the pitch-synchronous autocorrelation function of the third power of the input speech wave and selecting the delay for which the autocorrelation function is maximum.
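The delay-selection step can be illustrated with a minimal Python sketch. It assumes a plain short-time autocorrelation over a fixed block rather than the pitch-synchronous form named above, and the function name, the lag search range, and the toy waveform are all hypothetical.

```python
import math

def pitch_period_from_cubed_autocorr(samples, min_lag, max_lag):
    """Return the lag (in samples) that maximizes the autocorrelation
    of the cubed waveform, searched over min_lag..max_lag inclusive."""
    cubed = [s ** 3 for s in samples]
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag over the overlapping span.
        value = sum(cubed[n] * cubed[n - lag] for n in range(lag, len(cubed)))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag

# Toy input (an assumption): a decaying oscillation restarted every 80 samples,
# i.e. a 125 Hz pitch at the 10 kHz sampling rate mentioned in the text.
period = 80
signal = [math.exp(-(n % period) / 10.0) * math.sin(2 * math.pi * (n % period) / 20.0)
          for n in range(400)]
estimated = pitch_period_from_cubed_autocorr(signal, 25, 150)
```

Cubing the wave emphasizes the large excursions that follow each glottal pulse while preserving their sign, which is one plausible reason the correlation peak at the true period stands out.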
A speech signal may be synthesized by a network continually adjusted by parameter signals derived in this fashion.
This invention will be more fully understood from the following detailed description taken together with the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block schematic diagram of a speech transmission system, including an analyzer and a synthesizer which illustrates the principles of the invention;
FIG. 2 is a block schematic diagram of a prediction parameter computer suitable for use in the analyzer of the speech transmission system illustrated in FIG. 1;
FIG. 3 is a block schematic diagram of a network for developing parameters, representing the relative amplitudes of voiced and unvoiced signal components, suitable for use in the analyzer of the system illustrated in FIG. 1; and
FIG. 4 is a block schematic diagram of a time-varying filter which may be used at the synthesizer of a transmission system embodying the principles of the invention.
DETAILED DESCRIPTION
A complete limited channel capacity speech transmission system which illustrates the principles of the invention is shown in FIG. 1. Speech signals, which may originate, for example, in transducer 10, are passed through low pass filter 11 which has a cutoff frequency in the neighborhood of 5 kHz. and which exhibits a 3-db. cutoff frequency in the neighborhood of 4 kHz. The resultant signal is then sampled at a frequency of approximately 10 kHz. in sampler 12. Clock 13 is employed to energize the sampler and other units in the system. Speech samples, s_n, thus derived are supplied to prediction parameter computer 14, to pitch pulse position computer 15, and to parameter computer 16.
Pitch pulse position computer 15 determines the location of the glottal pulses in the applied speech wave; the difference between the positions of successive glottal pulses specifies the duration of the pitch period. Any suitable pulse position analyzer may be employed to derive pitch period signals, N. For example, a suitable arrangement is described in Automatic Speaker Recognition Based on Pitch Contours by B. S. Atal, Polytechnic Institute of Brooklyn, June, 1968, pages 33-43.
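As a small illustration of how pulse positions translate into pitch period signals N, the following Python sketch takes differences of successive positions. The function name and the example positions are hypothetical.

```python
def pitch_periods(pulse_positions):
    """Durations N (in samples) between successive glottal pulse positions."""
    return [later - earlier
            for earlier, later in zip(pulse_positions, pulse_positions[1:])]

# Hypothetical sample indices at which glottal pulses were located.
positions = [12, 95, 177, 260]
periods = pitch_periods(positions)
```

At a sampling rate of about 10 kHz, a period of 83 samples corresponds to a fundamental frequency of roughly 120 Hz.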
Speech samples from unit 12 are also supplied to computer 16, which determines two amplitude parameters g. These parameters characterize the amplitudes of voiced and unvoiced signal excitation, i.e., one parameter specifies the amplitude of voiced (or buzz) excitation signals, and the other specifies the amplitude of unvoiced (or hiss) excitation signals.
Parameter signals a, N, and g thus derived uniquely determine formant frequencies and bandwidths of a speech signal, its spectrum, and the relative amplitudes of voiced and unvoiced components necessary for the synthesis of artificial speech. Since these parameters require considerably less channel capacity than the corresponding analog signal representation, they may be economically stored for future use, or transmitted to a distant station. All parameter signals may be combined for transmission, for example, by multiplexing or the like, in transmission coder 17. At a receiver station, these signals are recovered and delivered individually through the action of transmission decoder 18. Transmission coders and decoders of any desired construction and form may be employed. Obviously, storage of the parameter signals may take place at any point in the indicated transmission arrangement; the transfer of parameter signals from one storage location to another, for example, may be considered to be a form of transmission.
At the synthesizer, voiced excitation is generated, for example, in pulse generator 19 of any desired construction, under control of pitch pulse parameter signal N. The amplitude of the voiced excitation signal is controlled continuously by the voiced amplitude parameter signal g acting upon modulator 20. Typically, generator 19 produces a pulse of unit amplitude at the beginning of every pitch period. Similarly, unvoiced excitation is produced in noise generator 22. Generator 22 typically produces a sequence of random numbers uniformly distributed between +1 and -1 at sampling instants. Noise signals are controlled in amplitude by the unvoiced amplitude parameter signal g acting on modulator 23.
The outputs of pulse generator 19 and noise generator 22, as scaled by controlled amplifiers 20 and 23, are added together with selected past signal values, available at the output of time-varying filter 24, in a combining network 21. The combined signal produced at the output of network 21 is thereupon delivered by way of low pass filter 26 to reproducer 27, for example, a loud speaker. Low pass filter 26 preferably has a cutoff frequency of about 5 kHz., its exact frequency range being commensurate with the range of filter 11 at the analyzer.
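The excitation path just described (unit pulses at the start of each pitch period, uniform noise at every sampling instant, each scaled by its own amplitude parameter) might be sketched in Python as follows. The names `g_voiced` and `g_unvoiced`, the `seed` argument, and the per-period loop are assumptions made for illustration.

```python
import random

def excitation(pitch_periods, g_voiced, g_unvoiced, seed=0):
    """Build one excitation sample per sampling instant.

    A unit pulse marks the first sample of each pitch period and is scaled
    by g_voiced; uniform noise in [-1, 1] is scaled by g_unvoiced.
    """
    rng = random.Random(seed)
    out = []
    for period in pitch_periods:
        for n in range(period):
            pulse = 1.0 if n == 0 else 0.0      # role of pulse generator 19
            noise = rng.uniform(-1.0, 1.0)      # role of noise generator 22
            out.append(g_voiced * pulse + g_unvoiced * noise)
    return out

# Purely voiced example: two pitch periods of 80 and 82 samples.
voiced = excitation([80, 82], g_voiced=1.0, g_unvoiced=0.0)
```

In this sketch the two scaled sources are simply summed; the past-sample feedback that the combining network also receives is left to the filter stage.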
The combined signal is also delivered to the input of transversal filter 25, forming a part of filter network 24. Time-varying filter 24 serves to regenerate speech from the applied excitation and parameter signals a. Such a filter arrangement resembles the resonant filter system of the human vocal tract and typically exhibits certain natural resonances which may be tuned in accordance with formant parameter signals a. Resonant vocoder apparatus of this general form is well known in the art; a typical example is described in J. L. Kelly, Jr., U.S. Pat. No. 3,328,525, issued June 27, 1967. A transversal filter arrangement specifically adapted for use in the apparatus illustrated in FIG. 1 is described below with reference to FIG. 4.
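The feedback loop formed by the combining network and the transversal filter can be sketched as a small stateful class in Python. This is an illustrative model only: the class name, the use of a Python list as the delay line, and the fixed coefficients are assumptions, and a real synthesizer would update the coefficients every pitch period.

```python
class TransversalSynthesizer:
    """Feedback loop: combiner output -> delay line -> weighted sum -> combiner."""

    def __init__(self, coefficients):
        self.a = list(coefficients)          # gains of the adjustable amplifiers
        self.delay = [0.0] * len(self.a)     # past combiner outputs, newest first

    def step(self, excitation_sample):
        # Weighted sum of past output samples (the adder network).
        predicted = sum(a_k * s for a_k, s in zip(self.a, self.delay))
        output = excitation_sample + predicted   # the combining network
        # Shift the delay line by one sample.
        self.delay = [output] + self.delay[:-1]
        return output

# Ten delay elements; only the first gain is nonzero in this toy setting.
synth = TransversalSynthesizer([0.5] + [0.0] * 9)
response = [synth.step(x) for x in [1.0, 0.0, 0.0, 0.0]]
```

With a single nonzero gain of 0.5, a unit pulse produces a response that halves at every sample, which shows the recursive (all-pole) character of the loop.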
FIG. 2 illustrates a prediction parameter computer 14 suitable for developing formant parameter signals a in accordance with the invention. For every pitch period of the applied speech wave, an array of signal values s_n from sampler 12 (FIG. 1) is transferred into storage unit 140 to replace the previous array of signal samples contained in the storage unit. Storage unit 140 thus stores an array of signal values u, beginning with u_{-9}, where N represents the duration of the current pitch period in samples. Every pitch period, a portion of the stored values u is replaced by other stored values, and incoming samples are placed in the vacated storage locations. Thus, signals u are consecutively stored as they are received in storage unit 140. Every pitch period, under the influence of timing signals from pulser 141, synchronized by signals N from pitch pulse position computer 15 to indicate the positions of glottal pulses, an array of signal values is read out of storage unit 140 and transferred to arithmetic unit (AU) 142. This unit comprises a plurality of arithmetic units 143a, ..., 143n, designated individually by the values f_ij which they compute, which operate in parallel. In a typical example of practice, n=55, i.e., 55 arithmetic units are employed. Each individual unit serves to compute one value of f according to the following equation:
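$$f_{ij} = \sum_{n} u_{n-i}\,u_{n-j}, \qquad i = 1, \ldots, 10; \quad j = i, \ldots, 10$$

$$h_j = \sum_{n} u_n\,u_{n-j}, \qquad j = 1, \ldots, 10$$

where each sum runs over the samples n of the current pitch period. The values f and h are processed by computer 144 to yield values of a. Although a special purpose computer may be programmed for this evaluation, one suitable arrangement is described in copending patent application Ser. No. 753,408, filed Aug. 19, 1968.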
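As an illustration only, the computation of the values f and h and their reduction to parameters a might be sketched as follows. The step that turns f and h into a is not spelled out in the text above; this sketch assumes the usual least-squares normal equations, sum over i of f_ij a_i = h_j, for a predictor of the form u_n ≈ sum over k of a_k u_{n-k}. The function name `prediction_parameters` and its arguments are hypothetical.

```python
import numpy as np

def prediction_parameters(u, start, N, order=10):
    """Estimate predictor coefficients a_1..a_order for one pitch period.

    u     : 1-D sequence of speech samples
    start : index in u of the first sample of the pitch period
    N     : number of samples in the pitch period
    Assumption (not from the source): a is obtained by solving the
    normal equations F a = h built from the products defined above.
    """
    u = np.asarray(u, dtype=float)
    if start < order or start + N > len(u):
        raise ValueError("need `order` past samples and N samples in range")
    n = np.arange(start, start + N)
    # lagged[k] holds u[n-k] for every n in the period, k = 0..order
    lagged = np.stack([u[n - k] for k in range(order + 1)])
    F = lagged[1:] @ lagged[1:].T   # F[i-1, j-1] = sum_n u[n-i] * u[n-j]
    h = lagged[1:] @ lagged[0]      # h[j-1]      = sum_n u[n]   * u[n-j]
    # Least-squares solve tolerates a (nearly) singular F.
    a, *_ = np.linalg.lstsq(F, h, rcond=None)
    return a
```

Because F is symmetric, only the entries with j ≥ i are distinct, which for order 10 gives the 55 values mentioned above; the sketch computes the full matrix for simplicity.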
Prediction parameters a_1, a_2, ..., a_10 uniquely determine the frequencies and bandwidths of all speech formants below 5 kHz. If desired, the bandwidths and frequencies of formants may be determined from values of a for use in the control of other synthesis apparatus. In accordance with the invention, this determination is made by supplying parameter values a from computer 144, by way of switch 147, to polynomial root computer 148. This unit determines the complex roots of a polynomial with real coefficients, i.e., the roots of a polynomial f(z), defined as:
$$f(z) = z^{10} + a_1 z^{9} + \cdots + a_9 z + a_{10} \qquad (5)$$

A polynomial root locator suitable for making the necessary evaluation is described in Mathematical Methods for Digital Computers, edited by Ralston and Wilf, John Wiley & Sons, Inc., 1967, in the section by E. R. Bareiss, at page 185. The output of the polynomial root locator 148 is 10 complex numbers (two sets of 10 real numbers) z_1, z_2, ..., z_10, which are then supplied to the arithmetic unit 149, which computes the numbers p_1, p_2, ..., p_10 in accordance with equation (6) below.

$$p_k = \frac{1}{2\pi}\,\frac{1}{T}\,\ln z_k \qquad (6)$$

Arithmetic unit 149 is thus a device which takes the complex logarithm of the numbers z_k and multiplies them by the number 1/(2πT), where T = 0.0001 sec., the interval of sampling unit 12 (FIG. 1). The complex numbers p_k can be separated into their real and imaginary parts, b_k and f_k, respectively, as follows:

$$p_k = b_k + j f_k \qquad (7)$$

where index k varies from 1 to 10. Logical unit 150 orders the numbers p_k such that the first number has the lowest positive imaginary part, the second number the second lowest positive imaginary part, and so on. Consequently, the numbers f_k and b_k represent the frequencies and the bandwidths of the various formants of the speech signal for the pitch period under consideration. These representations may be used in any desired fashion, e.g., for controlling a formant synthesis.
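The chain of operations attributed to units 148, 149, and 150 can be sketched in a few lines. This is a hedged illustration rather than the patented apparatus: it takes the polynomial of equation (5) to be monic with coefficients a_1, ..., a_n in descending powers of z, and the function name `formants_from_predictor` is hypothetical. If the coefficients are instead defined by a predictor of the form s_n = sum over k of a_k s_{n-k}, they would need to be negated before being passed in.

```python
import numpy as np

def formants_from_predictor(a, T=0.0001):
    """Return (f, b): imaginary and real parts of p_k, ordered by rising f.

    a : coefficients a_1..a_n of the monic polynomial
        z^n + a_1 z^(n-1) + ... + a_n  (assumed form of equation (5))
    T : sampling interval in seconds (0.0001 s in the text)
    """
    a = np.asarray(a, dtype=float)
    # Root locator 148: complex roots of the polynomial.
    z = np.roots(np.concatenate(([1.0], a)))
    # Arithmetic unit 149: complex logarithm scaled by 1/(2*pi*T), eq. (6).
    p = np.log(z) / (2.0 * np.pi * T)
    b, f = p.real, p.imag               # eq. (7): p_k = b_k + j f_k
    # Logical unit 150: keep positive imaginary parts, lowest first.
    keep = f > 0
    order = np.argsort(f[keep])
    return f[keep][order], b[keep][order]
```

Each complex-conjugate root pair contributes one entry, so a tenth-order polynomial with five conjugate pairs yields five (f, b) pairs.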
Speech samples from sampling unit 12 (FIG. 1) are also supplied to parameter computer 16, which determines parameter values g_1 and g_2. These parameters denote the relative amplitudes of voiced and unvoiced signal components in the applied speech signal. The operation of computer 16 is illustrated in FIG. 3. Every pitch period, an array of signal values s_n is transferred into storage unit 161 to replace the previous array of signal samples already in storage. Storage unit 161 thus stores an array of signal values u_n, where N denotes the duration of the current pitch period in samples and m represents the largest pitch period as measured in samples. A value of m=200 has been found to be sufficient in most cases. Every pitch period, previously stored values are replaced by more recent ones, and incoming samples are placed in the vacated storage locations. Arithmetic units 164 and 165 operate on array u to evaluate the values of parameters E and R in accordance with equations (8) and (9) as follows:
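[Equations (8) and (9), and the text that follows them, are illegible in the source.]

$$v_n = \sum_{k=1}^{10} a_k\,v_{n-k} + r_n, \qquad n = 0, \ldots, N$$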
The array of numbers r_n designates the output of a white noise generator 170. Similar to storage unit 167, storage unit 169 also has 10 additional storage locations, designated v_{-1}, ..., v_{-10}, which have the number 0 stored in them permanently.
The arrays w, y, and v, and the numbers E and R are transferred periodically, under the influence of pitch synchronized clock pulses from pulser 171, to arithmetic unit 172, which comprises six arithmetic units designated d_1, d_2, ..., d_6, which operate in parallel. These units of system 172 compute the numbers d_1, ..., d_6 in accordance with equations (13) through (18) set forth below. The index n is summed from 0 to N in each of the equations.
The array of numbers d_1, ..., d_6 computed in the manner indicated above is delivered to arithmetic unit 173, which computes parameters g_1 and g_2 in accordance with the following relations:
[Relations (19) and (20), and those that follow them, are illegible in the source.]

Each of the operations indicated above is carried out sequentially every pitch period under the influence of clock signals (developed by pulser 171) synchronized with the positions of the glottal pulses from pitch pulse position computer 15 (FIG. 1), as defined by signal N.
Arithmetic and storage units operative in a fashion similar to that described above are described in greater detail in the aforementioned copending application, Ser. No. 753,408.
Ordinarily, there are five resonances below the frequency of 5 kHz in the human vocal tract. As discussed above, these resonances may be simulated by a transversal filter arrangement employing n discrete delay elements. When n=10, the system can simulate n/2 resonances, i.e., the five resonances of the vocal tract. The synthesizer of this invention thus employs a discrete linear time-varying filter excited by a suitable combination of quasi-periodic pulses and white noise. A transversal filter arrangement is satisfactory for developing a linear combination of past output samples and the current input sample. Actual locations of resonances are determined in the transversal filter arrangement by the parameters a. Details of this form of resonance simulation are described in the above-mentioned Kelly U.S. Pat. No. 3,328,525. Transversal filter arrangements for use in speech synthesizers also have been described abundantly in the art. One suitable form is shown by way of a rudimentary block diagram in FIG. 4.
In the arrangement of FIG. 4, time-varying filter 24 (FIG. 1) includes a transversal filter network 25 composed of 10 unit delay elements 240_1, ..., 240_10, supplying applied signals to 10 adjustable gain amplifiers 241_1, ..., 241_10. Signals developed at the junctions of the several delay units thus represent past sample values of signals supplied from combiner 21 to filter 26 in the synthesizer of FIG. 1. The gains of the individual amplifiers 241 are adjusted by parameter values a to form a collection of weighted past sample values. The resultant signals are additively combined in adder network 242 and supplied to one input of combiner unit 21. As discussed above, the combined output of combiner network 21, which includes voiced and unvoiced excitation, and the combination of weighted past sample values constitute a replica of the applied speech signal. It is supplied by way of filter 26 to loudspeaker 27.
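The feedback loop formed by combiner 21 and transversal filter 25 can be illustrated with a short numerical sketch. It is a simplification under stated assumptions, not the circuit of FIG. 4: one pulse is placed at the start of each pitch period, the noise is Gaussian, and the names `synthesize_period`, `pulse_gain`, and `noise_gain` are hypothetical stand-ins for the level adjustments applied to the voiced and unvoiced excitation.

```python
import numpy as np

def synthesize_period(a, N, pulse_gain, noise_gain, past=None, rng=None):
    """Synthesize one pitch period of N samples.

    Each output sample is the excitation plus a weighted sum of the
    previous len(a) output samples: s_n = sum_k a_k * s_{n-k} + e_n.
    Returns the samples and the delay-line contents for the next period.
    """
    a = np.asarray(a, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    # Excitation: scaled white noise plus one scaled pulse at n = 0
    # (the single-pulse placement is an assumption of this sketch).
    e = noise_gain * rng.standard_normal(N)
    e[0] += pulse_gain
    # Delay line: past[k-1] holds s_{n-k}; starts empty unless supplied.
    past = np.zeros(len(a)) if past is None else np.array(past, dtype=float)
    out = np.empty(N)
    for n in range(N):
        s = float(a @ past) + e[n]               # adder 242 plus combiner 21
        out[n] = s
        past = np.concatenate(([s], past[:-1]))  # shift the delay line
    return out, past
```

Passing the returned delay-line contents into the next call lets the parameters a, N, and the two gains change from one pitch period to the next while the filter state carries over.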
Thus, in accordance with the invention an analog speech signal may be efficiently transmitted in the form of an array of numbers, viz., N; g_1, g_2; and a_1, ..., a_10. These parameters represent the necessary information concerning the speech wave in any given pitch period and are sufficient for reconstructing the speech wave. A saving of approximately 10 to 1 in transmission capacity may be achieved when using these parameters rather than the analog signal itself.
For example, a 10-kilobit signal used for representing the parameters has been found to yield excellent quality synthesized speech. A 5-kilobit signal still permits very acceptable speech to be produced; this is in contrast to the usual requirement of a O -kilobit signal for direct coding of a speech wave.
Various other arrangements and modifications of the described arrangements will occur to those skilled in the art.
What is claimed is:
1. Speech analysis apparatus, which comprises:
means for developing a first set of signals which specify linearly predictable characteristics of an applied speech signal,
means for developing a second set of signals representative of the duration of individual pitch periods of said applied speech signal,
means for developing a third set of signals representative of the energy of a speech signal and of the voicing character of speech signals within each of said pitch periods, and
means for utilizing all of said developed signals together as a representation of said applied speech signal.
2. Speech signal analysis apparatus as defined in claim 1, wherein,
said first set of signals which specify linearly predictable characteristics comprises a plurality of limited channel capacity parameter signals derived from past and current values of said applied speech signal for adjusting a resonant filter system, arranged to produce a replica of said applied speech signal when excited by voiced and unvoiced excitation signals.
3. Speech signal analysis apparatus, as defined in claim 1, wherein,
said first set of signals comprises a sequence of signals a = a_1, ..., a_n, for each pitch period of said applied signals, which uniquely determine the frequencies and bandwidths of formants of said applied signal below approximately 5 kHz.
4. Speech signal analysis apparatus as defined in claim 3, in combination with,
means supplied with said sequence of signals a for developing signals representative of the frequencies and bandwidths of formants of said applied speech signal during selected pitch periods.
5. Speech signal analysis apparatus as defined in claim 1, wherein,
said first set of signals is developed by minimizing the mean-squared error between the actual values of samples of said applied speech signal and predicted values thereof based on a selected number of past sample values.
6. Speech signal apparatus, which comprises:
at a transmitter station;
means for developing a first set of signals which specify linearly predictable characteristics of an applied speech signal,
means for developing a second set of signals representative of the duration of individual pitch periods of said applied speech signal,
means for developing a third set of signals representative of the energy of a speech signal in each of said pitch periods and of the voicing character of speech signals within said pitch periods, and
means for combining all of said developed signals for transmission to a receiver station; and
at said receiver station;
means responsive to received signals of said first set for developing signals representative of predicted values of a speech signal,
means responsive to received signals of said second set for developing a sequence of pitch period pulses,
means for generating white noise signals,
means responsive to received signals of said third set for individually adjusting the levels of said pitch period pulses and said white noise signals, and
means for combining said adjusted pitch period pulses, said adjusted white noise signals, and said predicted value signals to form speech signal which is a replica of said applied speech signal.
7. Speech signal apparatus as defined in claim 6, wherein,
said means at said receiver station for developing signals representative of predicted values of said speech signal comprises,
a transversal filter supplied with a combination of adjusted pitch period pulses, adjusted noise signals, and signals selectively representative of past values of said applied signal.
8. Synthesis apparatus for developing artificial speech from signals representative of the pitch period, voicing character, and selected predictable characteristics of an applied speech signal, which comprises:
means responsive to received signals representative of selected predictable characteristics of an applied speech signal for developing signals representative of selected predicted values of said speech signal,
means responsive to received signals representative of the pitch period of said applied speech signal for developing a sequence of pitch period pulses,
means for generating white noise signals,
means responsive to received signals representative of the voicing character of said applied speech signal for individually adjusting the levels of said pitch period pulses and said white noise signals, and
means for combining said adjusted pitch period pulses, said adjusted white noise signals, and said predicted value signals to form speech signal which is a replica of said applied speech signal.
9. Synthesis apparatus as defined in claim 8, wherein said means for developing signals representative of predicted values of said speech signal comprises a transversal filter supplied with said combined replica signal and adjusted by said predictable characteristic signals.
10. Synthesis apparatus as defined in claim 8, wherein,
said predicted value signals are selected to represent a linear combination of preceding values of said replica of said applied speech signal.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US87205169A | 1969-10-29 | 1969-10-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
US3624302A (en) | 1971-11-30 |
Family
ID=25358732
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US872051A Expired - Lifetime US3624302A (en) | 1969-10-29 | 1969-10-29 | Speech analysis and synthesis by the use of the linear prediction of a speech wave |
Country Status (1)
Country | Link |
---|---|
US (1) | US3624302A (en) |
Cited By (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3715512A (en) * | 1971-12-20 | 1973-02-06 | Bell Telephone Labor Inc | Adaptive predictive speech signal coding system |
US3825685A (en) * | 1971-06-10 | 1974-07-23 | Int Standard Corp | Helium environment vocoder |
US3836717A (en) * | 1971-03-01 | 1974-09-17 | Scitronix Corp | Speech synthesizer responsive to a digital command input |
US3909533A (en) * | 1974-07-22 | 1975-09-30 | Gretag Ag | Method and apparatus for the analysis and synthesis of speech signals |
US3916105A (en) * | 1972-12-04 | 1975-10-28 | Ibm | Pitch peak detection using linear prediction |
DE2435654A1 (en) * | 1974-07-24 | 1976-02-05 | Gretag Ag | Apparatus for speech analysis and synthesis - applies predictor method with reduced requirement of computer storage |
US3975587A (en) * | 1974-09-13 | 1976-08-17 | International Telephone And Telegraph Corporation | Digital vocoder |
US3979557A (en) * | 1974-07-03 | 1976-09-07 | International Telephone And Telegraph Corporation | Speech processor system for pitch period extraction using prediction filters |
US4022974A (en) * | 1976-06-03 | 1977-05-10 | Bell Telephone Laboratories, Incorporated | Adaptive linear prediction speech synthesizer |
US4038495A (en) * | 1975-11-14 | 1977-07-26 | Rockwell International Corporation | Speech analyzer/synthesizer using recursive filters |
US4045616A (en) * | 1975-05-23 | 1977-08-30 | Time Data Corporation | Vocoder system |
US4052563A (en) * | 1974-10-16 | 1977-10-04 | Nippon Telegraph And Telephone Public Corporation | Multiplex speech transmission system with speech analysis-synthesis |
US4058676A (en) * | 1975-07-07 | 1977-11-15 | International Communication Sciences | Speech analysis and synthesis system |
US4087632A (en) * | 1976-11-26 | 1978-05-02 | Bell Telephone Laboratories, Incorporated | Speech recognition system |
DE3037276A1 (en) * | 1979-10-03 | 1981-04-09 | Nippon Telegraph & Telephone Public Corp., Tokyo | TONSYNTHESIZER |
US4335275A (en) * | 1978-04-28 | 1982-06-15 | Texas Instruments Incorporated | Synchronous method and apparatus for speech synthesis circuit |
DE3244476A1 (en) * | 1981-12-01 | 1983-07-14 | Western Electric Co., Inc., 10038 New York, N.Y. | DIGITAL VOICE PROCESSOR |
US4633499A (en) * | 1981-10-09 | 1986-12-30 | Sharp Kabushiki Kaisha | Speech recognition system |
US4667340A (en) * | 1983-04-13 | 1987-05-19 | Texas Instruments Incorporated | Voice messaging system with pitch-congruent baseband coding |
US4701954A (en) * | 1984-03-16 | 1987-10-20 | American Telephone And Telegraph Company, At&T Bell Laboratories | Multipulse LPC speech processing arrangement |
US4709390A (en) * | 1984-05-04 | 1987-11-24 | American Telephone And Telegraph Company, At&T Bell Laboratories | Speech message code modifying arrangement |
US4710959A (en) * | 1982-04-29 | 1987-12-01 | Massachusetts Institute Of Technology | Voice encoder and synthesizer |
USRE32580E (en) * | 1981-12-01 | 1988-01-19 | American Telephone And Telegraph Company, At&T Bell Laboratories | Digital speech coder |
US4764963A (en) * | 1983-04-12 | 1988-08-16 | American Telephone And Telegraph Company, At&T Bell Laboratories | Speech pattern compression arrangement utilizing speech event identification |
US4827517A (en) * | 1985-12-26 | 1989-05-02 | American Telephone And Telegraph Company, At&T Bell Laboratories | Digital speech processor using arbitrary excitation coding |
US4847906A (en) * | 1986-03-28 | 1989-07-11 | American Telephone And Telegraph Company, At&T Bell Laboratories | Linear predictive speech coding arrangement |
US4866415A (en) * | 1983-12-28 | 1989-09-12 | Kabushiki Kaisha Toshiba | Tone signal generating system for use in communication apparatus |
US4890328A (en) * | 1985-08-28 | 1989-12-26 | American Telephone And Telegraph Company | Voice synthesis utilizing multi-level filter excitation |
US4913539A (en) * | 1988-04-04 | 1990-04-03 | New York Institute Of Technology | Apparatus and method for lip-synching animation |
US4975955A (en) * | 1984-05-14 | 1990-12-04 | Nec Corporation | Pattern matching vocoder using LSP parameters |
US5048088A (en) * | 1988-03-28 | 1991-09-10 | Nec Corporation | Linear predictive speech analysis-synthesis apparatus |
USRE34247E (en) * | 1985-12-26 | 1993-05-11 | At&T Bell Laboratories | Digital speech processor using arbitrary excitation coding |
US5233659A (en) * | 1991-01-14 | 1993-08-03 | Telefonaktiebolaget L M Ericsson | Method of quantizing line spectral frequencies when calculating filter parameters in a speech coder |
US5377301A (en) * | 1986-03-28 | 1994-12-27 | At&T Corp. | Technique for modifying reference vector quantized speech feature signals |
US5450449A (en) * | 1994-03-14 | 1995-09-12 | At&T Ipm Corp. | Linear prediction coefficient generation during frame erasure or packet loss |
US5471527A (en) * | 1993-12-02 | 1995-11-28 | Dsc Communications Corporation | Voice enhancement system and method |
EP0749111A2 (en) | 1995-06-14 | 1996-12-18 | AT&T IPM Corp. | Codebook searching techniques for speech processing |
US5704003A (en) * | 1995-09-19 | 1997-12-30 | Lucent Technologies Inc. | RCELP coder |
US5839098A (en) * | 1996-12-19 | 1998-11-17 | Lucent Technologies Inc. | Speech coder methods and systems |
WO1999010719A1 (en) * | 1997-08-29 | 1999-03-04 | The Regents Of The University Of California | Method and apparatus for hybrid coding of speech at 4kbps |
US5884253A (en) * | 1992-04-09 | 1999-03-16 | Lucent Technologies, Inc. | Prototype waveform speech coding with interpolation of pitch, pitch-period waveforms, and synthesis filter |
US5937376A (en) * | 1995-04-12 | 1999-08-10 | Telefonaktiebolaget Lm Ericsson | Method of coding an excitation pulse parameter sequence |
US6003000A (en) * | 1997-04-29 | 1999-12-14 | Meta-C Corporation | Method and system for speech processing with greatly reduced harmonic and intermodulation distortion |
US6081777A (en) * | 1998-09-21 | 2000-06-27 | Lockheed Martin Corporation | Enhancement of speech signals transmitted over a vocoder channel |
US6091773A (en) * | 1997-11-12 | 2000-07-18 | Sydorenko; Mark R. | Data compression method and apparatus |
US20030033136A1 (en) * | 2001-05-23 | 2003-02-13 | Samsung Electronics Co., Ltd. | Excitation codebook search method in a speech coding system |
US6708154B2 (en) * | 1999-09-03 | 2004-03-16 | Microsoft Corporation | Method and apparatus for using formant models in resonance control for speech systems |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2817707A (en) * | 1954-05-07 | 1957-12-24 | Bell Telephone Labor Inc | Synthesis of complex waves |
US3020344A (en) * | 1960-12-27 | 1962-02-06 | Bell Telephone Labor Inc | Apparatus for deriving pitch information from a speech wave |
US3158685A (en) * | 1961-05-04 | 1964-11-24 | Bell Telephone Labor Inc | Synthesis of speech from code signals |
US3328525A (en) * | 1963-12-30 | 1967-06-27 | Bell Telephone Labor Inc | Speech synthesizer |
Cited By (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3836717A (en) * | 1971-03-01 | 1974-09-17 | Scitronix Corp | Speech synthesizer responsive to a digital command input |
US3825685A (en) * | 1971-06-10 | 1974-07-23 | Int Standard Corp | Helium environment vocoder |
US3715512A (en) * | 1971-12-20 | 1973-02-06 | Bell Telephone Labor Inc | Adaptive predictive speech signal coding system |
US3916105A (en) * | 1972-12-04 | 1975-10-28 | Ibm | Pitch peak detection using linear prediction |
US3979557A (en) * | 1974-07-03 | 1976-09-07 | International Telephone And Telegraph Corporation | Speech processor system for pitch period extraction using prediction filters |
US3909533A (en) * | 1974-07-22 | 1975-09-30 | Gretag Ag | Method and apparatus for the analysis and synthesis of speech signals |
DE2435654A1 (en) * | 1974-07-24 | 1976-02-05 | Gretag Ag | Apparatus for speech analysis and synthesis - applies predictor method with reduced requirement of computer storage |
US3975587A (en) * | 1974-09-13 | 1976-08-17 | International Telephone And Telegraph Corporation | Digital vocoder |
US4052563A (en) * | 1974-10-16 | 1977-10-04 | Nippon Telegraph And Telephone Public Corporation | Multiplex speech transmission system with speech analysis-synthesis |
US4045616A (en) * | 1975-05-23 | 1977-08-30 | Time Data Corporation | Vocoder system |
US4058676A (en) * | 1975-07-07 | 1977-11-15 | International Communication Sciences | Speech analysis and synthesis system |
US4038495A (en) * | 1975-11-14 | 1977-07-26 | Rockwell International Corporation | Speech analyzer/synthesizer using recursive filters |
US4022974A (en) * | 1976-06-03 | 1977-05-10 | Bell Telephone Laboratories, Incorporated | Adaptive linear prediction speech synthesizer |
US4087632A (en) * | 1976-11-26 | 1978-05-02 | Bell Telephone Laboratories, Incorporated | Speech recognition system |
US4335275A (en) * | 1978-04-28 | 1982-06-15 | Texas Instruments Incorporated | Synchronous method and apparatus for speech synthesis circuit |
DE3037276A1 (en) * | 1979-10-03 | 1981-04-09 | Nippon Telegraph & Telephone Public Corp., Tokyo | TONSYNTHESIZER |
US4633499A (en) * | 1981-10-09 | 1986-12-30 | Sharp Kabushiki Kaisha | Speech recognition system |
DE3244476A1 (en) * | 1981-12-01 | 1983-07-14 | Western Electric Co., Inc., 10038 New York, N.Y. | DIGITAL VOICE PROCESSOR |
US4472832A (en) * | 1981-12-01 | 1984-09-18 | At&T Bell Laboratories | Digital speech coder |
USRE32580E (en) * | 1981-12-01 | 1988-01-19 | American Telephone And Telegraph Company, At&T Bell Laboratories | Digital speech coder |
US4710959A (en) * | 1982-04-29 | 1987-12-01 | Massachusetts Institute Of Technology | Voice encoder and synthesizer |
US4764963A (en) * | 1983-04-12 | 1988-08-16 | American Telephone And Telegraph Company, At&T Bell Laboratories | Speech pattern compression arrangement utilizing speech event identification |
US4667340A (en) * | 1983-04-13 | 1987-05-19 | Texas Instruments Incorporated | Voice messaging system with pitch-congruent baseband coding |
US4866415A (en) * | 1983-12-28 | 1989-09-12 | Kabushiki Kaisha Toshiba | Tone signal generating system for use in communication apparatus |
US4701954A (en) * | 1984-03-16 | 1987-10-20 | American Telephone And Telegraph Company, At&T Bell Laboratories | Multipulse LPC speech processing arrangement |
US4709390A (en) * | 1984-05-04 | 1987-11-24 | American Telephone And Telegraph Company, At&T Bell Laboratories | Speech message code modifying arrangement |
US4975955A (en) * | 1984-05-14 | 1990-12-04 | Nec Corporation | Pattern matching vocoder using LSP parameters |
US4890328A (en) * | 1985-08-28 | 1989-12-26 | American Telephone And Telegraph Company | Voice synthesis utilizing multi-level filter excitation |
US4827517A (en) * | 1985-12-26 | 1989-05-02 | American Telephone And Telegraph Company, At&T Bell Laboratories | Digital speech processor using arbitrary excitation coding |
USRE34247E (en) * | 1985-12-26 | 1993-05-11 | At&T Bell Laboratories | Digital speech processor using arbitrary excitation coding |
US5377301A (en) * | 1986-03-28 | 1994-12-27 | At&T Corp. | Technique for modifying reference vector quantized speech feature signals |
US4847906A (en) * | 1986-03-28 | 1989-07-11 | American Telephone And Telegraph Company, At&T Bell Laboratories | Linear predictive speech coding arrangement |
US5048088A (en) * | 1988-03-28 | 1991-09-10 | Nec Corporation | Linear predictive speech analysis-synthesis apparatus |
US4913539A (en) * | 1988-04-04 | 1990-04-03 | New York Institute Of Technology | Apparatus and method for lip-synching animation |
US5233659A (en) * | 1991-01-14 | 1993-08-03 | Telefonaktiebolaget L M Ericsson | Method of quantizing line spectral frequencies when calculating filter parameters in a speech coder |
US5884253A (en) * | 1992-04-09 | 1999-03-16 | Lucent Technologies, Inc. | Prototype waveform speech coding with interpolation of pitch, pitch-period waveforms, and synthesis filter |
US5471527A (en) * | 1993-12-02 | 1995-11-28 | Dsc Communications Corporation | Voice enhancement system and method |
US5450449A (en) * | 1994-03-14 | 1995-09-12 | At&T Ipm Corp. | Linear prediction coefficient generation during frame erasure or packet loss |
US5937376A (en) * | 1995-04-12 | 1999-08-10 | Telefonaktiebolaget Lm Ericsson | Method of coding an excitation pulse parameter sequence |
US6064956A (en) * | 1995-04-12 | 2000-05-16 | Telefonaktiebolaget Lm Ericsson | Method to determine the excitation pulse positions within a speech frame |
EP0749111A2 (en) | 1995-06-14 | 1996-12-18 | AT&T IPM Corp. | Codebook searching techniques for speech processing |
US5822724A (en) * | 1995-06-14 | 1998-10-13 | Nahumi; Dror | Optimized pulse location in codebook searching techniques for speech processing |
US5704003A (en) * | 1995-09-19 | 1997-12-30 | Lucent Technologies Inc. | RCELP coder |
US5839098A (en) * | 1996-12-19 | 1998-11-17 | Lucent Technologies Inc. | Speech coder methods and systems |
USRE43099E1 (en) | 1996-12-19 | 2012-01-10 | Alcatel Lucent | Speech coder methods and systems |
US6003000A (en) * | 1997-04-29 | 1999-12-14 | Meta-C Corporation | Method and system for speech processing with greatly reduced harmonic and intermodulation distortion |
WO1999010719A1 (en) * | 1997-08-29 | 1999-03-04 | The Regents Of The University Of California | Method and apparatus for hybrid coding of speech at 4kbps |
US6233550B1 (en) | 1997-08-29 | 2001-05-15 | The Regents Of The University Of California | Method and apparatus for hybrid coding of speech at 4kbps |
US6475245B2 (en) | 1997-08-29 | 2002-11-05 | The Regents Of The University Of California | Method and apparatus for hybrid coding of speech at 4KBPS having phase alignment between mode-switched frames |
US6091773A (en) * | 1997-11-12 | 2000-07-18 | Sydorenko; Mark R. | Data compression method and apparatus |
US6081777A (en) * | 1998-09-21 | 2000-06-27 | Lockheed Martin Corporation | Enhancement of speech signals transmitted over a vocoder channel |
US6708154B2 (en) * | 1999-09-03 | 2004-03-16 | Microsoft Corporation | Method and apparatus for using formant models in resonance control for speech systems |
US20030033136A1 (en) * | 2001-05-23 | 2003-02-13 | Samsung Electronics Co., Ltd. | Excitation codebook search method in a speech coding system |
US20070043560A1 (en) * | 2001-05-23 | 2007-02-22 | Samsung Electronics Co., Ltd. | Excitation codebook search method in a speech coding system |
US7206739B2 (en) | 2001-05-23 | 2007-04-17 | Samsung Electronics Co., Ltd. | Excitation codebook search method in a speech coding system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US3624302A (en) | Speech analysis and synthesis by the use of the linear prediction of a speech wave | |
US4220819A (en) | Residual excited predictive speech coding system | |
US5457783A (en) | Adaptive speech coder having code excited linear prediction | |
Schroeder | Vocoders: Analysis and synthesis of speech | |
CA1222568A (en) | Multipulse LPC speech processing arrangement | |
Schroeder et al. | Code-excited linear prediction (CELP): High-quality speech at very low bit rates | |
US4472832A (en) | Digital speech coder | |
US4827517A (en) | Digital speech processor using arbitrary excitation coding | |
USRE32580E (en) | Digital speech coder | |
US4945565A (en) | Low bit-rate pattern encoding and decoding with a reduced number of excitation pulses | |
US3715512A (en) | Adaptive predictive speech signal coding system | |
AU6672094A (en) | Use of an auditory model to improve quality or lower the bit rate of speech synthesis systems | |
Robinson | Speech analysis | |
Campanella | A survey of speech bandwidth compression techniques | |
Yegnanarayana et al. | Voice simulation: Factors affecting quality and naturalness | |
CA1336841C (en) | Multi-pulse type coding system | |
USRE34247E (en) | Digital speech processor using arbitrary excitation coding | |
Kelly | Speech and vocoders | |
EP0119033B1 (en) | Speech encoder | |
Kitawaki et al. | Artificial voice signal for objective quality evaluation of speech coding systems | |
KR950013373B1 (en) | Speech message suppling device and speech message reviving method | |
JPH09506182A (en) | Adaptive speech coder with code-driven linear prediction | |
GB2266213A (en) | Digital signal coding | |
Flanagan | Time Modification and Variable Coding for Packet Transmission of Speech: Early Studies [DSP History] | |
Flanagan et al. | Systems for Analysis-Synthesis Telephony |