USH2172H1 - Pitch-synchronous speech processing - Google Patents

Pitch-synchronous speech processing Download PDF

Info

Publication number
USH2172H1
Authority
US
United States
Prior art keywords
speech
pitch
periods
acoustic data
data stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/186,605
Inventor
David H. Staelin
Carlos R. Cabrera-Mercader
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
US Air Force
Original Assignee
US Air Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by US Air Force filed Critical US Air Force
Priority to US10/186,605 priority Critical patent/USH2172H1/en
Assigned to GOVERNMENT OF THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF THE AIR FORCE reassignment GOVERNMENT OF THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF THE AIR FORCE ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CABRERA-MERCADER, CARLOS R., STAELIN, DAVID H.
Application granted granted Critical
Publication of USH2172H1 publication Critical patent/USH2172H1/en
Abandoned legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/097 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using prototype waveform decomposition or prototype waveform interpolative [PWI] coders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals


Abstract

The pitch-synchronous speech processing invention involves two main steps: 1) divide the speech into pitch periods, or into pseudo pitch periods for unvoiced speech, where the breaks occur, for example, at the first zero-crossing preceding each glottal pulse for voiced speech and at any arbitrary point for unvoiced speech, and 2) compute the log-magnitude of the Discrete Fourier Transform (DFT) of each pitch-period waveform, and interpolate each log-magnitude spectrum to a common regular grid which can accommodate the spectrum of a waveform having the longest pitch period anticipated.

Description

STATEMENT OF GOVERNMENT INTEREST
The invention described herein may be manufactured and used by or for the Government for governmental purposes without the payment of any royalty thereon.
BACKGROUND OF THE INVENTION
The present invention relates generally to synthetic speech systems and more specifically to a pitch synchronous method of transforming speech into vectors for speech processing.
Signal processing for speech, speaker, or language recognition, or for other speech applications, generally consists of a pre-processing step that reduces the speech to a series of vectors, one per time interval, where that interval is typically chosen to lie between five and twenty msec, and successive intervals may overlap. The most commonly used vector representation is the mel cepstrum, which is the Discrete Fourier Transform (DFT) of the logarithm of the non-uniformly low-pass filtered sampled magnitude of the spectrum of that speech segment. The non-uniform filtering and sampling provide roughly constant Q for each channel. A typical output vector might have twenty-eight scalar elements.
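For concreteness, a minimal sketch of such a conventional front end is given below (Python/NumPy; the mel filterbank, the use of a DCT for the final transform, and all names are illustrative assumptions for this sketch, not specifics of this disclosure):

    import numpy as np
    from scipy.fft import dct

    def mel_cepstrum(frame, fs, n_filters=28):
        # Magnitude spectrum of one 5-20 msec speech frame.
        spec = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
        # Mel-spaced triangular filters: non-uniform low-pass filtering and
        # sampling of the magnitude spectrum, roughly constant Q per channel.
        mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
        imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
        edges = imel(np.linspace(0.0, mel(fs / 2.0), n_filters + 2))
        energies = np.empty(n_filters)
        for i in range(n_filters):
            lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
            rise = np.maximum(0.0, (freqs - lo) / (mid - lo))
            fall = np.maximum(0.0, (hi - freqs) / (hi - mid))
            energies[i] = np.minimum(rise, fall) @ spec + 1e-12  # avoid log(0)
        # Cepstrum: transform of the log of the filtered, sampled spectrum;
        # the result has twenty-eight scalar elements, as in the text above.
        return dct(np.log(energies), norm='ortho')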
The task of processing speech into preprocessing vectors is alleviated, to some extent, by the systems disclosed in the following U.S. Patents, the disclosures of which are incorporated herein by reference:
    • U.S. Pat. No. 5,008,941 issued to Sejnoha
    • U.S. Pat. No. 5,148,489 issued to Erell et al
    • U.S. Pat. No. 5,377,301 issued to Rosenberg et al
    • U.S. Pat. No. 5,469,529 issued to Bimbot et al
    • U.S. Pat. No. 5,598,505 issued to Austin et al
    • U.S. Pat. No. 5,727,124 issued to Lee et al
    • U.S. Pat. No. 5,745,872 issued to Sonmez et al
    • U.S. Pat. No. 5,768,474 issued to Neti
    • U.S. Pat. No. 5,924,065 issued to Eberman
    • U.S. Pat. No. 6,059,062 issued to Stadin
The Stadin patent is of interest in that it discloses a powered roller-skate system using speech recognition sensors and synthesized speech data processing.
The most pertinent reference is the Eberman patent, which shows a computerized speech processing system in which speech signals are stored in a vector codebook and processed to produce corrected vectors.
Generally, speech processing includes the following steps. In a first step, digitized speech signals are partitioned into time-aligned portions (frames) where acoustic features can generally be represented by linear predictive coefficient (LPC) “feature” vectors. In a second step, the vectors can be cleaned up using environmental acoustic data. That is, processes are applied to the vectors representing dirty speech signals so that a substantial amount of the noise and distortion is removed. The cleaned-up vectors, using statistical comparison methods, more closely resemble similar speech produced in a clean environment. Then in a third step, the cleaned feature vectors can be presented to a speech processing engine which determines how the speech is going to be used. Typically, the processing relies on the use of statistical models or neural networks to analyze and identify speech signal patterns.
In an alternative approach, the feature vectors remain dirty. Instead, the pre-stored statistical models or networks which will be used to process the speech are modified to resemble the characteristics of the feature vectors of dirty speech. In this way a mismatch between clean and dirty speech, or between their representative feature vectors, can be reduced.
By applying the compensation to the processes (or speech processing engines) themselves, instead of to the data, i.e., the feature vectors, the speech analysis can be configured to solve a generalized maximum likelihood problem where the maximization is over both the speech signals and the environmental parameters.
The present invention is an alternate method and means for performing this first step of transforming speech into a standard series of vectors where each vector represents the sampled magnitude of the spectrum of one pitch period for voiced speech or one pseudo pitch period for unvoiced speech. The subsequent speech processing steps can then be performed with these new vectors as inputs.
SUMMARY OF THE INVENTION
The present invention is an alternate method and means for performing the first step of transforming speech into a standard series of vectors where each vector represents the sampled magnitude of the spectrum of one pitch period for voiced speech or one pseudo pitch period for unvoiced speech. The subsequent speech processing steps can then be performed with these new vectors as inputs, provided these subsequent steps are adapted to the new vectors with suitable training protocols and data.
The invention involves two main steps:
    • 1. divide the speech into pitch periods, or into pseudo pitch periods for unvoiced speech, where the breaks occur, for example, at the first zero-crossing preceding each glottal pulse for voiced speech and at any arbitrary point for unvoiced speech, and
    • 2. compute the log-magnitude of the Discrete Fourier Transform (DFT) of each pitch-period waveform, and interpolate each log-magnitude spectrum to a common regular grid which can accommodate the spectrum of a waveform having the longest pitch period anticipated.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an illustration of a complete speech preprocessing system of the present invention;
FIG. 2 is a diagram of the pitch estimation component; and
FIG. 3 is a diagram of the output of the pitch period segmentor of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention is a speech processing system and process for transforming speech into a standard series of vectors where each vector represents the sampled magnitude of the spectrum of one pitch period for voiced speech or one pseudo pitch period for unvoiced speech. The subsequent speech processing steps can then be performed with these new vectors as inputs, provided these subsequent steps are adapted to the new vectors with suitable training protocols and data. A block diagram of the proposed process is illustrated in FIG. 1.
The process of FIG. 1 has two main steps:
    • 1. divide the speech into pitch periods, or into pseudo pitch periods for unvoiced speech, where the breaks occur, for example, at the first zero-crossing preceding each glottal pulse for voiced speech and at any arbitrary point for unvoiced speech, and
    • 2. compute the log-magnitude of the Discrete Fourier Transform (DFT) of each pitch-period waveform, and interpolate each log-magnitude spectrum to a common regular grid which can accommodate the spectrum of a waveform having the longest pitch period anticipated.
The process of FIG. 1 begins as acoustic data is processed for silence detection 100 to determine which parts of the data stream contain speech and which contain silence. The speech sequence is converted into a stream of windows of LW speech samples each. The length LW should be comparable to the duration of a syllable. A given window is said to contain speech if its average power exceeds a suitably chosen threshold POW_TH, and is otherwise classified as silence; for example, POW_TH may equal the noise variance per sample.
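A minimal sketch of this silence-detection step (Python/NumPy; the function name and sample handling are illustrative assumptions):

    import numpy as np

    def flag_speech_windows(samples, lw, pow_th):
        # Partition the acoustic data into windows of LW samples and flag
        # each window as speech (True) or silence (False) by comparing its
        # average power against the threshold POW_TH.
        flags = []
        for k in range(len(samples) // lw):
            window = np.asarray(samples[k * lw:(k + 1) * lw], dtype=float)
            flags.append(float(np.mean(window ** 2)) > pow_th)
        return flags

With the Table 1 values (F_s = 48000 samples/sec, LW = [16·F_s/1000] = 768, POW_TH = 1000), each window spans 16 msec of audio.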
Once the portion of the data stream containing speech is flagged, the pitch estimator 200 can process the flagged data stream.
The pitch estimation component is illustrated in FIG. 2. The input data used to estimate the pitch are the stream of classified speech/silence windows and the minimum and maximum anticipated pitch periods, P_MIN and P_MAX respectively. A register of length K = ⌈2·P_MAX/LW⌉·LW is sequentially filled with samples from a contiguous sequence of windows containing speech until the capacity of the buffer is reached or a silence window is found on the input stream. Then the following operations are performed on the retrieved speech segment (a simplified sketch in code follows the list):
    • 1. The N-point DFT of the speech segment is computed with N = 2^(⌈log₂K⌉+1), and the square-magnitude of each transform coefficient is computed to yield a power spectrum.
    • 2. The frequencies at which the power spectrum has local maxima are determined.
    • 3. A locally normalized spectral envelope is computed by dividing the value of the power spectrum at each peak by the geometric mean of the two adjacent peaks. For the first and last peaks the power spectrum is normalized by the value of the single adjacent peak.
    • 4. If there are no frequencies at which the normalized spectral envelope is greater than ten, the speech segment is declared to be unvoiced; otherwise it is declared to be voiced.
    • 5. For unvoiced speech segments the pitch is set to the default pitch P_DEF.
    • 6. For voiced speech segments a primary pitch estimate is extracted from the normalized spectral envelope using the following heuristic. If there are fewer than five normalized spectral peaks which exceed a threshold of ten, then the lowest frequency in that set of spectral values yields the primary pitch estimate. Alternatively, if there are five or more normalized spectral peaks greater than ten, one first finds the maximum normalized spectral peak from the set of frequencies which are lower than the lowest frequency satisfying the threshold condition. If such a maximum exists, is greater than five, and occurs at a frequency which is within twenty percent of half the lowest frequency at which the normalized spectrum is greater than ten, then the lower of the two frequencies gives the primary pitch estimate; otherwise the higher of the two frequencies is used as the primary pitch estimate.
    • 7. If the current and previous speech segments are not separated by silence and both were declared voiced, a secondary pitch estimate for the current segment is computed. First the mean and standard deviation of the ensemble of pitch period lengths of the previous speech segment are computed. If the standard deviation is less than ten percent of the mean and the mean is less than P_MAX, then the mean pitch period length for the previous segment is used as the secondary pitch estimate for the current speech segment.
    • 8. The final pitch estimate p_est for voiced speech segments is obtained as follows. If only the primary pitch estimate is available, it is used as the final estimate. When the secondary pitch estimate is also available the ratio of the primary estimate to the secondary estimate determines which of the two estimates is used as the final estimate. If the ratio is less than 1.3 and greater than 0.7, the primary estimate is used; otherwise the secondary estimate is used.
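For concreteness, the following Python/NumPy sketch condenses steps 1 through 6 above; the subharmonic check in step 6, the secondary estimate of step 7, and the estimate selection of step 8 are omitted, and all names are illustrative rather than from this disclosure:

    import numpy as np

    def estimate_pitch(segment, fs, p_def, env_th=10.0):
        # Step 1: N-point DFT with N = 2^(ceil(log2(K)) + 1), then the
        # square-magnitude of each coefficient (the power spectrum).
        n = 2 ** (int(np.ceil(np.log2(len(segment)))) + 1)
        power = np.abs(np.fft.rfft(segment, n)) ** 2
        # Step 2: bins at which the power spectrum has local maxima.
        peaks = np.where((power[1:-1] > power[:-2]) &
                         (power[1:-1] > power[2:]))[0] + 1
        if len(peaks) < 2:
            return p_def, False  # degenerate spectrum: treat as unvoiced
        # Step 3: locally normalized spectral envelope (each peak divided by
        # the geometric mean of its neighbors; end peaks by one neighbor).
        vals = power[peaks]
        env = np.empty(len(peaks))
        env[1:-1] = vals[1:-1] / np.sqrt(vals[:-2] * vals[2:])
        env[0], env[-1] = vals[0] / vals[1], vals[-1] / vals[-2]
        strong = np.where(env > env_th)[0]
        # Steps 4-5: no peak above the threshold of ten means the segment
        # is unvoiced and the pitch is set to the default P_DEF.
        if len(strong) == 0:
            return p_def, False
        # Step 6 (first branch only): the lowest-frequency strong peak
        # gives the primary pitch estimate.
        f0_hz = peaks[strong[0]] * fs / n
        return fs / f0_hz, True  # pitch period in samples, voiced flag

The function returns a pitch-period estimate in samples together with a voiced/unvoiced flag; with the Table 1 values, P_DEF = [6·F_s/1000] = 288 samples.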
The speech segments are segmented further into pitch periods as follows.
    • 1. If the current speech segment is starting and the current and previous speech segments are separated by silence, find the maximum peak of the speech waveform in the time interval of duration P_MAX starting at the beginning of the current speech segment. Otherwise, find the maximum peak within the time interval starting 0.7*p_est time units ahead of the last located peak and ending 1.3*p_est time units ahead of the last located peak. Let s_max and t_max be the value and the time index of the located maximum, respectively.
    • 2. Find the minimum value of the speech waveform in the time interval of duration p_est/2 ending at t_max. Let s_min be the value of the located minimum.
    • 3. Position the time cursor at t_max.
    • 4. Move back along the time axis until a peak is found which lies above a line of slope 0.5*(s_max−s_min)/p_est passing through the current peak and is contained in the time interval of length 0.3*p_est ending at t_max.
    • 5. Repeat step 4 until no further peak satisfying the specified conditions is found. Let t_p be the time index of the last located peak.
    • 6. If the current speech segment is declared as unvoiced, the start of the current pseudo pitch period is the minimum of t_p and the start of the previous pitch period (pseudo pitch period) plus P_MAX if there is a preceding pitch period (pseudo pitch period), or the maximum of t_p and the start of the current speech segment if the current pseudo pitch period is the first one in the current speech segment and the current and previous speech segments are separated by silence.
TABLE 1
Parameter values used to generate the example discussed below. The symbol [*] denotes rounding to the nearest integer. The sampling rate was F_s = 48000 samples/sec.
Parameter   Value
POW_TH      1000
LW          [16 * F_s/1000]
P_MIN       [1.4 * F_s/1000]
P_MAX       [25 * F_s/1000]
P_DEF       [6 * F_s/1000]
    • 7. If the current speech segment is declared as voiced, the following rules are used to determine the start of the current pitch period.
      • (a) If the current and previous speech segments are separated by silence and the current pitch period is the first one in the current speech segment, the start of the current pitch period is the maximum of the zero-crossing preceding t_p and the start of the current speech segment. If there is no zero-crossing, the start of the current pitch period is the start of the current speech segment.
      • (b) If the current and previous speech segments are adjacent in time and there is a zero-crossing between t_p and the start of the previous pitch period, the start of the current pitch period is the minimum of the zero-crossing immediately preceding t_p and the start of the previous pitch period plus P_MAX. If there is no zero-crossing between t_p and the start of the previous pitch period, the start of the current pitch period is the start of the previous pitch period plus p_est.
This procedure is repeated until the end of the current speech segment is reached. FIG. 3 shows the segmentation into pitch periods and pseudo pitch periods of a speech segment 100 msec long, where the breaks are indicated by asterisks.
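The following deliberately condensed Python/NumPy sketch captures the core of this segmentation loop for a voiced segment: a waveform peak is located roughly once per estimated pitch period, and the break is placed at the zero-crossing immediately preceding it. The slope-based back-tracking of steps 4 and 5 and the unvoiced and boundary rules of steps 6 and 7 are simplified away, and all names are assumptions:

    import numpy as np

    def voiced_breaks(seg, p_est, p_max):
        seg = np.asarray(seg, dtype=float)
        breaks, last_peak = [], None
        while True:
            if last_peak is None:  # first peak: search the first P_MAX samples
                lo, hi = 0, min(len(seg), p_max)
            else:                  # search 0.7*p_est to 1.3*p_est ahead
                lo = last_peak + int(0.7 * p_est)
                hi = min(len(seg), last_peak + int(1.3 * p_est))
            if hi - lo < 2:
                break
            t_max = lo + int(np.argmax(seg[lo:hi]))  # located peak
            # Break at the zero-crossing immediately preceding the peak.
            signs = np.signbit(seg[:t_max + 1])
            zc = np.where(signs[:-1] != signs[1:])[0]
            breaks.append(int(zc[-1]) + 1 if len(zc) else 0)
            last_peak = t_max
        return sorted(set(breaks))  # unique break positions, in order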
For each pitch period or pseudo pitch period, the N-point DFT is computed with N equal to the length of the period in question, and the log-magnitude of each transform coefficient is computed. Finally, each log-magnitude spectrum is linearly interpolated to a common regular grid with frequency resolution 1/P_MAX.
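A sketch of this final step, using the break positions produced above (Python/NumPy; expressing P_MAX in samples so that the common grid has spacing 1/P_MAX cycles/sample up to the Nyquist frequency is an assumption of this sketch):

    import numpy as np

    def period_vectors(seg, breaks, p_max):
        # For each (pseudo) pitch period: N-point DFT with N equal to the
        # period length, log-magnitude of each coefficient, then linear
        # interpolation onto a common grid with frequency resolution 1/P_MAX.
        grid = np.arange(0.0, 0.5 + 0.5 / p_max, 1.0 / p_max)
        vectors = []
        for start, end in zip(breaks[:-1], breaks[1:]):
            period = np.asarray(seg[start:end], dtype=float)
            logmag = np.log(np.abs(np.fft.rfft(period)) + 1e-12)
            freqs = np.arange(len(logmag)) / len(period)  # bin frequencies
            vectors.append(np.interp(grid, freqs, logmag))
        return np.array(vectors)  # one spectral vector per pitch period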
One example of the invention illustrates the pitch-synchronous spectral representation of the sentence “The little blankets lay around on the floor.” as delivered by a female speaker. The speech was sampled at a rate of F_s=48000 samples/sec with 16-bit resolution. The values of the parameters used to generate this example are listed in Table 1.
While the invention has been described in its presently preferred embodiment it is understood that the words which have been used are words of description rather than words of limitation and that changes within the purview of the appended claims may be made without departing from the scope and spirit of the invention in its broader aspects.

Claims (7)

1. A pitch-synchronous speech processing method for converting an acoustic data stream that contains periods of speech and periods of silence into a series of vectors that constitute a vector representation of the speech, the process comprising the steps of:
dividing the speech into pitch periods, or into pseudo pitch periods for unvoiced speech, where breaks occur, for example, at a first zero-crossing preceding each glottal pulse for voiced speech and at any arbitrary point for unvoiced speech; and
computing the log-magnitude of the Discrete Fourier Transform (DFT) of each pitch-period waveform, and interpolating each log-magnitude spectrum to a common regular grid which can accommodate a spectrum of a waveform having the longest pitch period anticipated.
2. A method as defined in claim 1, wherein said dividing step further comprises:
a silence detection substep in which periods of speech in the acoustic data stream are flagged with a speech identifier flag, and wherein the periods of silence in the acoustic data stream are flagged with a silence identifier flag.
3. A method as defined in claim 2, wherein said dividing step further comprises a pitch estimation substep in which samples of the acoustic data stream are taken and used to estimate pitch in the periods of speech identified with a speech identifier flag, and not in the periods of silence identified by a silence identifier flag, the pitch estimation substep outputting thereby a set of pitch estimates.
4. A method as defined in claim 3, wherein said dividing step further comprises a pitch period segmentor substep in which the acoustic data stream, pitch estimates, speech identifier flags and silence identifier flags are used to compute measurements of pitch period lengths and pitch period waveforms in the acoustic data stream.
5. A method as defined in claim 4, wherein said computing step further comprises:
a Fourier transform substep which produces output signals by performing Fourier transforms on the pitch period waveforms and outputting said Fourier transforms and pitch period lengths.
6. A method as defined in claim 5 wherein said computing step further comprises:
a log-magnitude computing substep which operates on the output signals of the Fourier transform substep to output thereby the log-magnitude spectra of the acoustic data stream.
7. A method as defined in claim 6 wherein said computing step further comprises an interpolator substep which produces an output by interpolating the log-magnitude spectra of the acoustic data stream with the pitch period lengths of the acoustic data stream, the output signals of the interpolator substep being the series of vectors of the acoustic data stream, defined as a set of interpolated log-magnitude spectra values.
US10/186,605 2002-07-02 2002-07-02 Pitch-synchronous speech processing Abandoned USH2172H1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/186,605 USH2172H1 (en) 2002-07-02 2002-07-02 Pitch-synchronous speech processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/186,605 USH2172H1 (en) 2002-07-02 2002-07-02 Pitch-synchronous speech processing

Publications (1)

Publication Number Publication Date
USH2172H1 2006-09-05

Family

ID=36939673

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/186,605 Abandoned USH2172H1 (en) 2002-07-02 2002-07-02 Pitch-synchronous speech processing

Country Status (1)

Country Link
US (1) USH2172H1 (en)


Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4885790A (en) * 1985-03-18 1989-12-05 Massachusetts Institute Of Technology Processing of acoustic waveforms
US5377301A (en) 1986-03-28 1994-12-27 At&T Corp. Technique for modifying reference vector quantized speech feature signals
US5023910A (en) * 1988-04-08 1991-06-11 At&T Bell Laboratories Vector quantization in a harmonic speech coding arrangement
US5008941A (en) 1989-03-31 1991-04-16 Kurzweil Applied Intelligence, Inc. Method and apparatus for automatically updating estimates of undesirable components of the speech signal in a speech recognition system
US5148489A (en) 1990-02-28 1992-09-15 Sri International Method for spectral estimation to improve noise robustness for speech recognition
US5469529A (en) 1992-09-24 1995-11-21 France Telecom Establissement Autonome De Droit Public Process for measuring the resemblance between sound samples and apparatus for performing this process
US5548680A (en) * 1993-06-10 1996-08-20 Sip-Societa Italiana Per L'esercizio Delle Telecomunicazioni P.A. Method and device for speech signal pitch period estimation and classification in digital speech coders
US6463406B1 (en) * 1994-03-25 2002-10-08 Texas Instruments Incorporated Fractional pitch method
US5727124A (en) 1994-06-21 1998-03-10 Lucent Technologies, Inc. Method of and apparatus for signal recognition that compensates for mismatching
US5832437A (en) * 1994-08-23 1998-11-03 Sony Corporation Continuous and discontinuous sine wave synthesis of speech signals from harmonic data of different pitch periods
US5598505A (en) * 1994-09-30 1997-01-28 Apple Computer, Inc. Cepstral correction vector quantizer for speech recognition
US6059062A (en) 1995-05-31 2000-05-09 Empower Corporation Powered roller skates
US5933808A (en) * 1995-11-07 1999-08-03 The United States Of America As Represented By The Secretary Of The Navy Method and apparatus for generating modified speech from pitch-synchronous segmented speech waveforms
US5768474A (en) 1995-12-29 1998-06-16 International Business Machines Corporation Method and system for noise-robust speech processing with cochlea filters in an auditory model
US5745872A (en) 1996-05-07 1998-04-28 Texas Instruments Incorporated Method and system for compensating speech signals using vector quantization codebook adaptation
US5924065A (en) * 1997-06-16 1999-07-13 Digital Equipment Corporation Environmently compensated speech processing
US6029133A (en) * 1997-09-15 2000-02-22 Tritech Microelectronics, Ltd. Pitch synchronized sinusoidal synthesizer
US6885986B1 (en) * 1998-05-11 2005-04-26 Koninklijke Philips Electronics N.V. Refinement of pitch detection
US6418408B1 (en) * 1999-04-05 2002-07-09 Hughes Electronics Corporation Frequency domain interpolative speech codec system
US6678655B2 (en) * 1999-10-01 2004-01-13 International Business Machines Corporation Method and system for low bit rate speech coding with speech recognition features and pitch providing reconstruction of the spectral envelope
US6871176B2 (en) * 2001-07-26 2005-03-22 Freescale Semiconductor, Inc. Phase excited linear prediction encoder

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130144612A1 (en) * 2009-12-30 2013-06-06 Synvo Gmbh Pitch Period Segmentation of Speech Signals
US9196263B2 (en) * 2009-12-30 2015-11-24 Synvo Gmbh Pitch period segmentation of speech signals
US20140200889A1 (en) * 2012-12-03 2014-07-17 Chengjun Julian Chen System and Method for Speech Recognition Using Pitch-Synchronous Spectral Parameters
US8942977B2 (en) * 2012-12-03 2015-01-27 Chengjun Julian Chen System and method for speech recognition using pitch-synchronous spectral parameters
US9135923B1 (en) * 2014-03-17 2015-09-15 Chengjun Julian Chen Pitch synchronous speech coding based on timbre vectors


Legal Events

Date Code Title Description
AS Assignment

Owner name: GOVERNMENT OF THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF THE AIR FORCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:STAELIN, DAVID H.;CABRERA-MERCADER, CARLOS R.;REEL/FRAME:013297/0546;SIGNING DATES FROM 20020521 TO 20020613

STCF Information on status: patent grant

Free format text: PATENTED CASE