US3916105A - Pitch peak detection using linear prediction - Google Patents


Info

Publication number
US3916105A
Authority
US
United States
Legal status
Expired - Lifetime
Application number
US446847A
Inventor
William R Mccray
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Filing date
1974-02-28
Publication date
1975-10-28

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The application of linear prediction techniques to speech analysis is well covered by the papers referred to below. This case describes a technique to determine the presence or absence of voicing in a digitized speech signal and to locate the glottal impulse positions in that signal when voicing is present.

Description

United States Patent [19] McCray
[11] 3,916,105
[45] Oct. 28, 1975
[54] PITCH PEAK DETECTION USING LINEAR PREDICTION
[75] Inventor: William R. McCray, Lexington, Ky.
[73] Assignee: International Business Machines Corporation, Armonk, N.Y.
[22] Filed: Feb. 28, 1974
[21] Appl. No.: 446,847
Related U.S. Application Data
[63] Continuation-in-part of Ser. No. 312,063, Dec. 4, 1972, abandoned.
[52] U.S. Cl.: 179/1 SD
[51] Int. Cl.: G01L 1/04
[58] Field of Search: 179/1 SA, 1 SD, 1 SC
[56] References Cited
UNITED STATES PATENTS
3,624,302 11/1971 Atal 179/1 SA
3,631,520 12/1971 Atal 179/1 SA
Primary Examiner: Kathleen H. Claffy
Assistant Examiner: E. S. Kemeny
Attorney, Agent, or Firm: D. Kendell Cooper
20 Claims, 4 Drawing Figures
[Drawings, Sheets 1-3. FIG. 1: prediction interval analyzer and prediction weight generator. FIG. 2: block diagram of the voiced/unvoiced analysis system. FIG. 3: flow chart, blocks (A) through (S). FIG. 4: detailed system. FIG. 3 legend: PL = period of lowest acceptable pitch; Pki = error value at the i-th peak of the error signal; Li = location of Pki; p = length of previous pitch period; b, c = constants.]
PITCH PEAK DETECTION USING LINEAR PREDICTION
This is a continuation-in-part of application Ser. No. 312,063, filed Dec. 4, 1972, now abandoned.
REFERENCES OF INTEREST
B. S. Atal and M. R. Schroeder, "Adaptive Predictive Coding of Speech Signals," Bell System Technical Journal, Vol. 49, pp. 1973-1986 (1970).
B. S. Atal, "Characterization of Speech Signals by Linear Prediction of the Speech Wave," Proc. IEEE Symposium on Feature Extraction and Selection in Pattern Recognition, Argonne, Ill. (Oct. 1970), pp. 202-209.
B. S. Atal and Suzanne L. Hanauer, "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave," Journal of the Acoustical Society of America, Vol. 50, No. 2 (Part 2), pp. 637-655 (1971).
B. S. Atal, "Predictive Coding of Speech Signals," U.S. Pat. No. 3,631,520.
B. S. Atal, "Speech Analysis and Synthesis by the Use of the Linear Prediction of a Speech Wave," U.S. Pat. No. 3,624,302.
BACKGROUND OF THE INVENTION AND PRIOR ART
SUMMARY OF THE INVENTION
A region of consistently-voiced speech is characterized by pitch periods of approximately equal length. Such a region may therefore be discovered by locating a pattern of regularly spaced, large prediction errors, and within such a region it is only necessary to compare the length of the next pitch period with the length of the previous pitch period to determine whether consistent voicing has ceased or continues.
OBJECTS
Accordingly, the prime object of the present invention is to provide a speech analysis system based on linear prediction and having improved efficiency.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of the preferred embodiment of the invention as illustrated in the accompanying drawings.
IN THE DRAWINGS
FIG. 1 shows the prediction weight generator, previously taught by Atal, together with a prediction interval analyzer that is significant in practicing the present invention.
FIG. 2 is a block diagram of a system incorporating the speech analysis techniques of the present invention.
FIG. 3 is a flow chart related to the system of FIG. 2.
FIG. 4 is a detailed representation of the system.
DETAILED DESCRIPTION
FIG. 1 is a simplified diagram of a system incorporating the inventive techniques taught herein. Sampled speech is available on line 1 for input to a prediction interval analyzer 2 and a prediction weight generator 3. Data and control are symbolized by line 4, with analyzed speech output on line 5.
Block 2 of FIG. 1 is expanded upon in FIG. 2. As indicated above, consistently-voiced speech is characterized by pitch periods of approximately equal length. In accordance with the present technique, the length of a succeeding pitch period is compared with the length of the previous pitch period to determine whether consistent voicing has ceased or continues.
For consistency, the blocks in the flow chart of FIG. 3 are designated with letters in parentheses, (A) through (S), and where possible the corresponding letters are incorporated in the blocks of FIG. 2. Thus, the hardware represented by block 7 of FIG. 2 corresponds to decision block (A) of FIG. 3.
Other blocks shown in FIG. 2 include an error signal generator 8, a speech storage register 9, a next pitch detector 10, a prediction weight and error signal generator 11, a pitch pattern detector 12, an error storage register 13, and a voiced-unvoiced memory 14. The various blocks in the flow chart of FIG. 3 are designated 20-37.
Considering FIG. 2, the status of the voiced-unvoiced memory 14 is first checked by block 7 to determine the character of the previous speech segment. If the segment is voiced, an error signal is generated by generator circuit 8, using an input at terminal 15 from generator 3 that indicates the weights calculated for the previous pitch period. The error signal is generated for bp seconds and stored in error storage register 13, which serves as an input by line 16 to block 10. Block 10, related to blocks 22 and 23 of FIG. 3, determines two maxima and decides, as discussed in connection with block 23 of FIG. 3, whether voicing has ended or a change in pitch has occurred. If neither has occurred, a set of predictor weights is calculated and an output record written, as determined by a control signal on line 17.
If the end of voicing or a change in pitch has occurred, the routine proceeds by control on line 18 to block 11. If block 7 instead finds that the previous segment was unvoiced, a control signal on line 19 passes directly to block 11 for processing of the speech segment.
In either case, further determinations are made by pitch pattern detector block 12, corresponding to blocks 26-37 of FIG. 3. These concern primarily the detection of a pitch pattern and the setting of memory 14 to a voiced or unvoiced state. In each of the resulting output situations, line 6 passes to the weight generator the interval of speech over which weights are to be calculated and written as an output record.
FLOW CHART OF FIG. 3
As indicated, FIG. 3 is a flow chart for carrying out the present invention. Block 20 decides whether the previous segment was voiced or unvoiced. If it was voiced, the routine proceeds to block 21; if not, it proceeds to block 25.
BLOCK 21
Using the predictor weights calculated for the previous pitch period and beginning at the end of that period, the speech waveform is predicted and the error signal generated for bp seconds, where p is the previous pitch period length and b determines the partial period to be examined beyond the next expected pitch period ending. b must lie between 1.0 and 2.0, so that the expected time of the next error signal peak is included in the interval but the second succeeding peak is excluded.
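Block 21 may be sketched in Python as follows. This is a minimal sketch, not the patented circuit: it assumes the speech is a list of samples, that p is measured in samples, and the hypothetical convention that `weights[k]` multiplies the sample k + 1 steps back. The names `error_window_length` and `prediction_error` are illustrative only. The error is taken as actual minus predicted, as in the claims (the FIG. 4 description subtracts in the opposite order).

```python
def error_window_length(p_samples: int, b: float) -> int:
    """Number of samples spanned by bp, the interval examined after a voiced period."""
    if not 1.0 < b < 2.0:
        raise ValueError("b must lie between 1.0 and 2.0")
    return int(round(b * p_samples))


def prediction_error(speech: list[float], weights: list[float],
                     start: int, length: int) -> list[float]:
    """Predict `length` samples beginning at index `start` (the end of the
    previous pitch period) and return actual minus predicted for each one.

    Hypothetical convention: weights[k] multiplies speech[n - 1 - k].
    """
    order = len(weights)
    if start < order:
        raise ValueError("need len(weights) samples of history before start")
    errors = []
    for n in range(start, min(start + length, len(speech))):
        predicted = sum(w * speech[n - 1 - k] for k, w in enumerate(weights))
        errors.append(speech[n] - predicted)
    return errors
```

For example, with a previous period of 80 samples and b = 1.5, `error_window_length(80, 1.5)` gives 120 samples, which reaches past the expected ending near sample 80 but stops short of the second ending near sample 160.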
BLOCK 22
The peaks (local maxima) of the error signal are scanned out to bp seconds and two maxima are obtained: the maximum peak Pk1 within a small region around p seconds, and the maximum peak Pk2 outside this small region.
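A sketch of Block 22 under the same assumptions: the error list begins at the end of the previous pitch period, so the next ending is expected near index p. The patent does not give the width of the "small region", so `half_width` is a hypothetical parameter. Peaks are taken as strict local maxima and returned as (index, value) pairs; the function names are illustrative.

```python
from typing import Optional

Peak = tuple[int, float]  # (sample index, error value)


def local_maxima(signal: list[float]) -> list[int]:
    """Indices whose value is strictly greater than both neighbours."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]


def split_peaks(error: list[float], p_samples: int,
                half_width: int) -> tuple[Optional[Peak], Optional[Peak]]:
    """Return (Pk1, Pk2): the largest peak within half_width samples of
    p_samples, and the largest peak outside that region (None if absent)."""
    pk1: Optional[Peak] = None
    pk2: Optional[Peak] = None
    for i in local_maxima(error):
        if abs(i - p_samples) <= half_width:
            if pk1 is None or error[i] > pk1[1]:
                pk1 = (i, error[i])
        elif pk2 is None or error[i] > pk2[1]:
            pk2 = (i, error[i])
    return pk1, pk2
```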
BLOCK 23
If Pk1 does not exceed Pk2 by a significant amount (Pk1 less than c Pk2, where c is a constant greater than 1.0), either voicing has ended or a significant change in pitch has occurred. In either case the region of consistent voicing has ended and the procedure must be abandoned; block 25 is executed next.
Otherwise, the location of Pk1 is taken as the end of the pitch period. A set of predictor weights is calculated over the period between the two pitch period endings and an output record written, block 24. The process is then repeated from block 20.
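The decision of Block 23 and the choice of the new period ending in Block 24 reduce to a few comparisons. In this sketch Pk1 and Pk2 are (index, value) pairs or None, with indices counted from the end of the previous period; the function name is hypothetical. Peak values are assumed positive, since for a negative Pk2 the product c Pk2 would fall below Pk2 and the test would lose its meaning.

```python
from typing import Optional


def next_pitch_period(pk1: Optional[tuple[int, float]],
                      pk2: Optional[tuple[int, float]],
                      c: float) -> Optional[int]:
    """Return the length (in samples) of the new pitch period, or None when
    the region of consistent voicing has ended."""
    if c <= 1.0:
        raise ValueError("c must be greater than 1.0")
    if pk1 is None:
        return None  # no peak near the expected ending
    if pk2 is not None and pk1[1] < c * pk2[1]:
        return None  # Pk1 < c*Pk2: voicing ended or the pitch changed
    return pk1[0]    # location of Pk1 marks the end of the pitch period
```

Equality, Pk1 = c Pk2, counts as continued voicing, matching the "equals or exceeds" wording of the claims.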
A consistently-voiced region of speech contains a significant number of pitch periods (more than three). This fact is used to discover the beginning of such a region: the error signal is scanned for a sufficient time in an attempt to discover four error peaks with nearly constant spacing between them. The following steps are taken. In this discussion, PL and PH are the periods of the lowest and highest pitches of interest.
BLOCK 25
Predictor weights are calculated, the speech waveform is predicted, and the error signal is generated over 4PL seconds.
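Block 25 needs a fresh set of predictor weights, which the patent obtains by Atal's published method. As a stand-in, the sketch below uses the textbook autocorrelation method with the Levinson-Durbin recursion. This is a swapped-in technique and may differ in detail from Atal's formulation. It keeps the hypothetical convention used above, `weights[k]` multiplying the sample k + 1 steps back, and `lpc_weights` is an illustrative name.

```python
def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """r[lag] = sum over n of x[n] * x[n + lag], for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_weights(frame: list[float], order: int) -> list[float]:
    """Predictor weights a[0..order-1] minimising the autocorrelation-method
    prediction error, where the prediction is sum_k a[k] * x[n - 1 - k]."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order          # silent frame: predict zeros
    a: list[float] = []
    err = r[0]                        # prediction error energy so far
    for i in range(1, order + 1):
        # reflection coefficient for stage i of the recursion
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
        if err <= 0.0:                # perfectly predictable: stop early
            return a + [0.0] * (order - i)
    return a
```

In Block 25 the frame would span 4PL seconds of speech. The patent does not state the predictor order or the sampling rate, so both are left to the caller.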
BLOCK 26
The peaks of the error signal are scanned beginning PH seconds into the region of the waveform being analyzed. The first four peaks encountered are collected.
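Block 26 can be sketched as a scan that starts at the sample index corresponding to PH and keeps the first four strict local maxima. Indices are counted in samples from the start of the analyzed region, and `first_peaks` is a hypothetical name. A shorter list is returned if the error runs out first; the patent does not describe that case for Block 26, so the caller must decide how to treat it.

```python
def first_peaks(error: list[float], start: int,
                count: int = 4) -> list[tuple[int, float]]:
    """Return up to `count` (index, value) local maxima of `error`, in time
    order, beginning the scan at index `start` (PH in samples)."""
    peaks = []
    for i in range(max(start, 1), len(error) - 1):
        if error[i] > error[i - 1] and error[i] > error[i + 1]:
            peaks.append((i, error[i]))
            if len(peaks) == count:
                break
    return peaks
```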
BLOCKS 30 AND 33
If the first collected peak is found beyond PL, consistent voicing has not been found in this region of speech. Memory 14 is set to an unvoiced state at block 31, a set of predictor weights is calculated over a period equal to PL/2, and an output record is written at block 28. The process is then repeated from block 20.
Otherwise, the four collected peaks are analyzed at block 33 to determine whether a pitch pattern exists. If the periods between adjacent peaks are approximately equal and none is less than PH, such a pattern has been found. The collected peaks are assumed to be pitch period endings, and a region of consistent voicing has been found beginning at the first peak. Memory 14 is set to a voiced state at block 34, a set of predictor weights is calculated up to the location of the first peak, and an output record is written at block 37. Block 20 is executed next.
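The tests of Blocks 30 and 33 are sketched below, assuming four ascending peak times in samples. With PL = 160 and PH = 32 samples (50 Hz and 250 Hz at an assumed 8 kHz rate), peaks at 20, 100, 181 and 260 pass. The patent says only "approximately equal", so `tol`, a bound on the spread of the gaps relative to their mean, is a hypothetical parameter. Following the FIG. 4 description, each gap must also be at most PL; FIG. 4 treats a gap beyond PL as a reason to stop searching, whereas this sketch simply reports that no pattern was found.

```python
from enum import Enum


class PatternResult(Enum):
    UNVOICED = "unvoiced"      # first peak beyond PL (Block 30)
    VOICED = "voiced"          # pitch pattern found (Blocks 33, 34)
    NO_PATTERN = "no_pattern"  # discard the smallest peak and keep looking (Block 36)


def classify_peaks(times: list[int], p_low: int, p_high: int,
                   tol: float = 0.15) -> PatternResult:
    """Apply the Block 30/33 tests to four ascending peak times (in samples).

    p_low is PL and p_high is PH, both in samples.  `tol` is a hypothetical
    bound on (longest gap - shortest gap) relative to the mean gap.
    """
    if len(times) != 4:
        raise ValueError("expected four peak times")
    if times[0] > p_low:
        return PatternResult.UNVOICED
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if any(g < p_high or g > p_low for g in gaps):
        return PatternResult.NO_PATTERN
    mean_gap = sum(gaps) / len(gaps)
    if max(gaps) - min(gaps) > tol * mean_gap:
        return PatternResult.NO_PATTERN
    return PatternResult.VOICED
```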
BLOCK 36
If a pitch pattern is not found, the smallest of the four collected peaks is discarded.
BLOCK 35
The error signal is scanned from the location of the most recently found peak to find the next error peak.
BLOCK 32
If the end of the predicted speech (4PL) is reached before the next error peak, this region of speech does not contain consistent voicing. Blocks 31 and 28 are executed next.
BLOCK 29
If a peak is found before the end of the predicted speech, it is compared with the value of the peak discarded in block 36. If the new peak is not larger, it is rejected and block 35 is repeated.
BLOCK 27
If the new peak is larger than the one discarded in block 36, it is taken as the new fourth peak. The pitch pattern recognition process is then repeated from block 30.
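Blocks 26 through 36 together form a search loop, which can be sketched as one function under stated assumptions. `error` holds the 4PL seconds of prediction error from Block 25 as a list of samples, and the spacing test of Block 33 is supplied by the caller as `is_pattern`. As with Rejected Peak Value Storage 100 in FIG. 4, a candidate peak must exceed a floor that starts at zero and is raised to the value of each discarded peak. The function name and signature are hypothetical.

```python
from typing import Callable, Optional


def search_for_voicing(error: list[float], p_low: int, p_high: int,
                       is_pattern: Callable[[list[int]], bool]) -> Optional[list[int]]:
    """Return four ascending peak times if consistent voicing is found,
    or None if this region of speech is treated as unvoiced."""

    def next_peak(start: int, floor: float) -> Optional[int]:
        # Blocks 35 and 29: next local maximum at or after `start` above `floor`.
        for i in range(max(start, 1), len(error) - 1):
            if error[i - 1] < error[i] > error[i + 1] and error[i] > floor:
                return i
        return None  # Block 32: end of predicted speech reached first

    peaks: list[int] = []
    start = p_high      # Block 26: begin scanning PH into the region
    floor = 0.0         # rejected-peak value, initially zero
    while True:
        while len(peaks) < 4:
            i = next_peak(start, floor)
            if i is None:
                return None          # Blocks 32, 31: unvoiced
            peaks.append(i)          # Block 26 (initial) or Block 27 (replacement)
            start = i + 1
        if peaks[0] > p_low:
            return None              # Block 30: first peak beyond PL
        if is_pattern(peaks):
            return peaks             # Blocks 33, 34: voicing begins at peaks[0]
        smallest = min(peaks, key=lambda t: error[t])
        floor = error[smallest]      # Block 36: discard the smallest peak
        peaks.remove(smallest)       # remaining peaks stay in time order
```

In use, `is_pattern` could wrap the spacing test of Block 33. A spurious small peak between two pitch endings is then discarded, and the search continues until a fourth, regularly spaced peak larger than the rejected one appears.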
DETAILED SYSTEM, FIG. 4
A detailed implementation of the system is illustrated in FIG. 4.
Control Network 50 controls the timing and sequence of operation of the other portions of the system. Its outputs to the other blocks are represented by cable 51 to avoid undue complication of the diagram.
Sampled speech is input on line 52 and held in Speech Storage 65 until the system has finished processing it.
Voiced-Unvoiced Storage 53 indicates whether the previously analyzed segment of speech was voiced or unvoiced. To begin a cycle of operation, Control Network 50 reads the status of the previous segment from block 53 over line 54.
If the previous segment was voiced, Control Network 50 obtains the length p of that segment from Segment Length Storage 56 on line 57. This length is multiplied by a constant factor b (between 1.0 and 2.0) to give the length bp of speech to be evaluated in deciding whether voicing continues.
The prediction weights of the previous segment are moved from storage block 60 to Adaptive Predictor 61 on line 62. This predictor, as described by Atal, uses these weights and the speech samples from Speech storage 65 on line 66 to predict subsequent speech samples.
Predictor 61 will operate on an interval of speech of length bp producing predicted speech samples which are conducted to Subtraction Network 68 by line 69. Here the original speech samples from line 66 are subtracted from the predicted samples to produce the prediction error samples on line 70.
Line 70 carries the error samples to two Peak Pickers 72 and 73. Picker 72 is controlled to scan the error only within a small interval before and after p, while Picker 73 scans the error samples throughout bp (the time of the speech predicted by block 61) except when Picker 72 is on. Thus, Picker 72 selects the largest error sample within a small interval around p, and Picker 73 selects the largest sample outside this interval.
Consistent voicing is assumed to continue if the error peak found by Picker 72 is significantly greater than the one found by Picker 73. To determine this, the output of Picker 73 is transferred to Multiplication Network 76 by line 77 where it is multiplied by a constant greater than one (1). The result of this multiplication is presented to Comparison Network 80 on line 81 where it is compared to the peak found by Picker 72 on line 82.
Control Network 50 determines via line 84 the results of the comparison in block 80. If the output of Picker 72 is greater, then the exact location of the error peak in the speech interval is stored in Segment Length storage 56 via line 86. Prediction Weight Generator 88 (as described by Atal) uses the length of the speech segment from Segment Length storage 56 on line 89 and the speech samples from Speech storage 65 on line 90 to analyze the speech segment and transfers the results via line 91 to Storage block 60. The output gate 93 is opened to allow the contents of storage blocks 53, 56 and 60 to be outputted on line 95.
If, however, the comparison shows that the output of Multiplication Network 76 is greater, Control Network 50 sets Voiced-Unvoiced Storage 53 to unvoiced. Subsequent operation is then identical to that which would have occurred had the previous segment of speech been unvoiced when the control cycle was initiated.
In the unvoiced case, Prediction Weight Generator 88 is controlled to calculate a set of weights over a portion of speech representing a time of at least 4PL, where PL is the pitch period of the lowest pitch frequency of interest. The calculated weights are stored in block 60 and used in Adaptive Predictor 61 to predict speech. Predictor 61 and Subtraction Network 68 then operate to produce the prediction error signal.
Rejected Peak Value Storage 100 is set initially to zero. The error signal enters a three-stage shift register 101, which presents the three most recent error values to Comparison Network 103 via lines 104, 105 and 106, while storage block 100 presents its current value via line 107. Each stage of register 101 can store enough bits to represent the full value of the error signal. When Comparison Network 103 detects that the value on line 105 is greater than the other three, a local maximum of the error signal has been found. This maximum value is transferred to Peak Value Storage 110 via line 111, while the time location of the maximum is transferred to Peak Time Storage 112 by line 113.
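The peak detection just described works on a stream of error samples: register 101 holds the three most recent values, and Comparison Network 103 reports a maximum when the middle value (line 105) exceeds its two neighbours and the rejected-peak value from block 100. A software analogue might look like the sketch below; the class and method names are hypothetical, and the reported time is that of the middle stage, one sample behind the newest input.

```python
from collections import deque
from typing import Optional


class StreamingPeakDetector:
    """Software analogue of shift register 101 and Comparison Network 103."""

    def __init__(self) -> None:
        self.register: deque[float] = deque(maxlen=3)  # three most recent error values
        self.rejected_value = 0.0   # plays the role of Rejected Peak Value Storage 100
        self.time = -1              # index of the most recent sample

    def push(self, sample: float) -> Optional[tuple[int, float]]:
        """Shift in one error sample; return (time, value) of a newly found
        local maximum, or None."""
        self.time += 1
        self.register.append(sample)
        if len(self.register) < 3:
            return None
        older, middle, newer = self.register
        if middle > older and middle > newer and middle > self.rejected_value:
            return self.time - 1, middle
        return None
```

Raising `rejected_value` after a peak is discarded reproduces the later rule that a newly selected maximum must exceed the rejected peak.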
When four such maxima have been placed in storage blocks 110 and 112, the stored times are gated onto lines 115, 116, 117 and 118. Comparison Network 120 compares the time of the first peak with PL. If the time on line 115 does not exceed PL, Control Network 50 is signalled on line 121 to continue the checks. Subtractor Networks 123, 124 and 125 produce the intervals between adjacent peaks, which are presented to Comparison Network 127 by lines 128, 129 and 130. Network 127 checks these three values for approximate equality and checks that each is less than PL and greater than PH, the pitch period of the highest pitch frequency of interest.
If all the requirements are met, Control Network 50 is signalled on line 131 that voicing has been found beginning at the time on line 115. The time on line 115 is placed in Segment Length Storage 56 (connection not shown), and Generator 88 calculates a set of weights over this segment of unvoiced speech. The weights are stored in Storage 60 and Output Gate 93 is opened to output a record. Subsequently, Voiced-Unvoiced Storage 53 is changed to voiced, the pitch interval on line 128 is placed in Segment Length Storage 56, and a new cycle is initiated.
If, however, one or more of the requirements for voicing are absent, the peak values are gated to Comparison Network 132 via lines 135, 136, 137 and 138. Network 132 determines the least of the four, transfers this value by line 140 to Rejected Peak Value Storage 100, and signals Control Network 50 by line 141 which value is the least. Network 50 causes the least peak and its time to be removed from storage units 110 and 112, and the remaining peaks and times to be moved in storage to maintain chronological order and to leave position four vacant for a new error maximum. Shift Register 101 and Comparison Network 103 operate to locate error signal maxima as before, but now, since Storage block 100 has a non-zero value stored in it, the maximum selected by Network 103 must exceed the value of the rejected peak in block 100. A selected maximum and its time are gated into blocks 110 and 112 and the aforementioned tests are performed.
This process continues until voicing is found or until one or more limits are reached. These limits are:
1. the location of the first peak in storage (110, 112), on line 115, exceeds PL;
2. one or more of the peak intervals on lines 128, 129 and 130 exceeds PL; or
3. Adaptive Predictor 61 has predicted a portion of speech of 4PL.
When one of these limits is reached, the process is discontinued: Control Network 50 places a fixed length (10 milliseconds) in Segment Length Storage 56, causes Prediction Weight Generator 88 to generate weights over this period, which are stored in block 60, and opens Output Gate 93 to output a record. A new cycle of operation is then initiated.
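The three stopping limits and the fixed 10 millisecond record can be expressed compactly. This is a sketch with hypothetical function names; times are in samples, and the sampling rate used to convert 10 milliseconds into samples is an assumption, since the patent gives that length only in time.

```python
def search_limit_reached(peak_times: list[int], predicted_samples: int,
                         p_low: int) -> bool:
    """True when any of the three FIG. 4 limits has been reached.

    peak_times: stored peak times, ascending, in samples;
    predicted_samples: how much speech Adaptive Predictor 61 has predicted;
    p_low: PL in samples.
    """
    if peak_times and peak_times[0] > p_low:
        return True                                   # limit 1: first peak beyond PL
    gaps = [b - a for a, b in zip(peak_times, peak_times[1:])]
    if any(g > p_low for g in gaps):
        return True                                   # limit 2: an interval beyond PL
    return predicted_samples >= 4 * p_low             # limit 3: 4PL predicted


def fallback_segment_samples(sample_rate_hz: int, length_ms: int = 10) -> int:
    """Fixed record length used when the search is discontinued."""
    return sample_rate_hz * length_ms // 1000
```

At an assumed 8 kHz sampling rate the 10 millisecond record is 80 samples.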
While the invention has been particularly shown and described with respect to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention.

Claims

1. A method for determining the presence or absence of consistent voicing in speech signals characterized by voice intervals of substantially equally spaced voice pitch periods and unvoiced intervals of irregular unequally spaced unvoiced periods, comprising:
(1) predicting speech values based on a weighted sum of a number of preceding samples of said speech signals;
(2) generating an error signal having error peaks for a predetermined selected time interval PL seconds, where PL is the period of the lowest acceptable pitch, said error signal representing the difference between actual speech samples and the corresponding predicted values; and
(3) analyzing error peaks of said error signal to detect a pitch pattern comprising a predetermined minimum number of substantially equally spaced pitch periods indicative of consistent voicing.
2. The method of claim 1, further comprising:
(4) when consistent voicing is detected, providing an output representation of the related interval.
3. The method of claim 1, further comprising:
(4) when an unvoiced interval is detected, providing an output representation of said unvoiced interval.
4. The method of claim 1 wherein said predetermined time interval is four (4) PL and said minimum number of peaks is four, designated Pk1-Pk4.
5. The method of claim 1, further comprising:
(5) determining the continuation of consistently voiced speech by comparing the length of a next occurring pitch period in a voiced interval with the length of a previous pitch period.
6. The method of claim 5, further comprising:
(6) storing an indication of the occurrence of a voiced interval;
(7) analyzing prediction weights for a preceding speech interval in relation to a current speech interval to develop an error signal prediction for bp seconds, where b is a constant representative of a partial pitch period to be examined beyond the next expected pitch period ending and where p is the length of the previous pitch period;
(8) detecting occurrence of the next pitch period by extracting two local maxima Pk1 and Pk2, respectively representative of maximum peaks within and outside of a small region around p seconds; and
(9) determining the status of voicing by comparing Pk1 with (c Pk2), where c is a constant greater than 1.0.
7. The method of claim 6, further comprising:
(10) providing a signal indicative of the continuation of consistent voicing when Pk1 equals or exceeds c Pk2.
8. The method of claim 7, further comprising:
(11) outputting the current voiced speech interval.
9. The method of claim 6, further comprising:
(10) providing a signal indicative of the discontinuance of consistent voicing when Pk1 does not equal or exceed c Pk2.
10. The method of claim 9, further comprising:
(11) proceeding with steps (1)-(3) to detect the next voiced interval.
11. The method of claim 1, further comprising the following steps between steps (2) and (3):
(2a) determining if the first peak of said predetermined minimum number is prior to PL, where PL is the lowest pitch of interest;
(2b) if not prior, storing an indication that the speech signal interval is unvoiced and is not consistent voicing; and
(2c) if prior, proceeding with step (3).
12. The method of claim 11, further comprising the following steps after step (3):
(3a) determining and discarding the smallest peak from among said predetermined minimum number of peaks;
(3b) scanning said error signal from the most recently formed peak to the next error peak;
(3c) if the end of the predicted PL seconds occurs prior to the next error peak, outputting a record for PL/2 seconds;
(3d) if the next error peak is formed prior to the end of the predicted PL seconds, comparing its value to the value of the peak discarded in step (3a);
(3e) if the next error peak is larger than the discarded peak, establishing it as the new last peak of said minimum number and repeating steps (2a)-(2c); and
(3f) if the next error peak is smaller than the discarded peak, repeating steps (3b)-(3d).
13. Apparatus for determining the presence or absence of consistent voicing in speech signals characterized by voiced intervals of substantially equally spaced voice pitch periods and unvoiced intervals of irregular unequally spaced unvoiced periods, comprising:
(1) means for predicting speech values based on a weighted sum of a number of preceding samples of said speech signals;
(2) means for generating an error signal having error peaks for a predetermined selected time interval PL seconds, where PL is the period of the lowest acceptable pitch, said error signal representing the difference between actual speech samples and the corresponding predicted values; and
(3) means for analyzing error peaks of said error signal to detect a pitch pattern comprising a predetermined minimum number of substantially equally spaced pitch periods indicative of consistent voicing.
14. The apparatus of claim 13, further comprising:
(4) means operable when consistent voicing is detected for providing an output representation of the related voiced interval.
15. The apparatus of claim 13, further comprising:
(4) means operable when an unvoiced interval is detected for providing an output representation of said unvoiced interval.
16. The apparatus of claim 13, further comprising:
(5) means for determining the continuation of consistently voiced speech by comparing the length of a next occurring pitch period in a voiced interval with the length of a previous pitch period.
17. The apparatus of claim 16, further comprising:
(6) means for storing an indication of the occurrence of a voiced interval;
(7) means for analyzing prediction weights for a preceding speech interval in relation to a current speech interval to develop an error signal prediction for bp seconds, where b is a constant representative of a partial pitch period to be examined beyond the next expected pitch period ending and where p is the length of the previous pitch period;
(8) means for detecting occurrence of the next pitch period by extracting two local maxima Pk1 and Pk2, respectively representative of maximum peaks within and outside of a small region around p seconds; and
(9) means for determining the status of voicing by comparing Pk1 with (c Pk2), where c is a constant greater than 1.0.
18. The apparatus of claim 17, further comprising:
(10) means for providing a signal indicative of the continuation of consistent voicing when Pk1 equals or exceeds c Pk2.
19. The apparatus of claim 18, further comprising:
(11) gating means for providing prediction weights, voiced/unvoiced status, and interval lengths of speech intervals following calculations.
20. The apparatus of claim 18, further comprising:
(10) means for providing a signal indicative of the discontinuance of consistent voicing when Pk1 does not equal or exceed c Pk2.
10. The method of claim 9 further comprising:
11. proceeding with steps (1) - (3) to detect the next voiced interval.
11. outputting the current voiced speech interval.
11. The method of claim 1, further comprising the following steps between steps (2) and (3): 2a. determining if the first peak of said predetermined minimum number is prior to PL where PL is the lowest pitch of interest; and 2b. if not prior, storing an indication that the speech signal interval is unvoiced and is not consistent voicing; and 2c. if prior, proceeding with step (3).
11. gating means for providing prediction weights, voiced/unvoiced status, and interval lengths of speech intervals following calculations.
12. The method of claim 11, further comprising the following steps after step (3): 3a. determining and discarding the smallest peak Pks from among said predetermined minimum number of peaks; 3b. scanning said error signal from the most recently formed peak to the next error peak Pkn; 3c. if the end of the predicted PL seconds occurs prior to the next error peak Pkn, outputting a record for PL/2 seconds; 3d. if the next error peak Pkn is formed prior to PL seconds, comparing its value to the value of the peak Pks discarded in step (3a); 3e. if Pkn is larger than Pks, establishing Pkn as the new last peak of said minimum number and repeating steps 2a-2c; and 3f. if Pkn is smaller than Pks, repeating steps 3b-3d.
13. Apparatus for determining the presence or absence of consistent voicing in speech signals characterized by voiced intervals of substantially equally spaced voice pitch periods and unvoiced intervals of irregular unequally spaced unvoiced periods, comprising:
14. The apparatus of claim 13, further comprising:
15. The apparatus of claim 13, further comprising:
16. The apparatus of claim 13, further comprising:
17. The apparatus of claim 16, further comprising:
18. The apparatus of claim 17, further comprising:
19. The apparatus of claim 18, further comprising:
20. The apparatus of claim 18, further comprising:
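The claims define the detector only in functional terms. The Python/NumPy sketch below shows one way the claimed steps might be realized; it is not the patent's own implementation. Assumptions (none of these come from the patent): the error signal is the residual of autocorrelation-method LPC computed with the Levinson-Durbin recursion; error peaks are local maxima of |e| above half the window maximum; "substantially equally spaced" means the largest peak spacing is within 15% of the smallest; b = 1.5, c = 1.5 and the ±8-sample region around p are illustrative values; and all function names, including the test-signal generator `synth_voiced`, are hypothetical. The discard-and-rescan loop of claim 12 is not implemented.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation LPC (Levinson-Durbin): s[n] ~ sum_k a[k] * s[n-1-k]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[i] holds the i-th weight; a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]


def prediction_error(signal, a):
    """e[n] = s[n] - sum_k a[k] s[n-1-k]; the first len(a) samples stay 0."""
    s = np.asarray(signal, dtype=float)
    p = len(a)
    e = np.zeros_like(s)
    for n in range(p, len(s)):
        e[n] = s[n] - np.dot(a, s[n - p:n][::-1])
    return e


def find_error_peaks(e, threshold_frac=0.5, min_sep=20):
    """Time-ordered local maxima of |e| above threshold_frac * max|e|;
    peaks closer than min_sep samples are merged, keeping the larger."""
    mag = np.abs(np.asarray(e, dtype=float))
    if mag.size < 3 or mag.max() == 0.0:
        return []
    thresh = threshold_frac * mag.max()
    peaks = []
    for n in range(1, len(mag) - 1):
        if mag[n] >= thresh and mag[n] >= mag[n - 1] and mag[n] > mag[n + 1]:
            if peaks and n - peaks[-1] < min_sep:
                if mag[n] > mag[peaks[-1]]:
                    peaks[-1] = n
            else:
                peaks.append(n)
    return peaks


def detect_voicing_onset(e, pl, n_peaks=4, tol=0.15):
    """Claims 1, 4 and step 2a of claim 11: scan n_peaks * PL samples of error.
    Returns (voiced, mean pitch period in samples or None)."""
    peaks = find_error_peaks(e[: n_peaks * pl])[:n_peaks]
    if len(peaks) < n_peaks or peaks[0] >= pl:  # first peak must come before PL
        return False, None
    spacing = np.diff(peaks)
    if spacing.max() > (1.0 + tol) * spacing.min():  # not substantially equal
        return False, None
    return True, float(spacing.mean())


def voicing_continues(e, last_peak, p, b=1.5, delta=8, c=1.5):
    """Claims 7-10: examine b*p samples after the last pitch peak.
    Pk1 = max |e| within +/- delta of the expected next peak, Pk2 = max elsewhere."""
    start = last_peak + 1
    mag = np.abs(np.asarray(e[start:start + int(b * p)], dtype=float))
    lo, hi = max(p - 1 - delta, 0), min(p + delta, len(mag))  # expected peak at offset p-1
    if hi <= lo:
        return False, None
    region = mag[lo:hi]
    outside = np.concatenate([mag[:lo], mag[hi:]])
    pk1 = region.max()
    pk2 = outside.max() if outside.size else 0.0
    if pk1 > 0.0 and pk1 >= c * pk2:  # pk1 > 0 guards against silence (our addition)
        return True, start + lo + int(region.argmax())
    return False, None


def synth_voiced(n=2000, period=80, first=20, radius=0.9, theta=np.pi / 4):
    """Hypothetical test signal: impulse train through one two-pole resonator."""
    x = np.zeros(n)
    x[first::period] = 1.0
    a1, a2 = 2.0 * radius * np.cos(theta), -radius * radius
    s = np.zeros(n)
    for i in range(n):
        s[i] = x[i] + a1 * (s[i - 1] if i >= 1 else 0.0) + a2 * (s[i - 2] if i >= 2 else 0.0)
    return s


if __name__ == "__main__":
    pl = 8000 // 50  # PL = 160 samples: assumed 50 Hz lowest pitch at 8 kHz
    s = synth_voiced()
    e = prediction_error(s, lpc_coefficients(s, order=10))
    print(detect_voicing_onset(e, pl))                # (True, 80.0)
    print(voicing_continues(e, last_peak=260, p=80))  # (True, 340)
```

On the synthetic 100 Hz signal (80-sample period at an assumed 8 kHz rate), the residual is essentially the impulse train, so the first four error peaks are equally spaced and the continuation test finds the next pitch peak exactly one period after the last one.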

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
US446847A | US3916105A | 1972-12-04 | 1974-02-28 | Pitch peak detection using linear prediction

Applications Claiming Priority (2)

Application Number | Publication | Priority Date | Filing Date | Title
US31206372A | - | 1972-12-04 | 1972-12-04 | -
US446847A | US3916105A | 1972-12-04 | 1974-02-28 | Pitch peak detection using linear prediction

Publications (1)

Publication Number | Publication Date
US3916105A | 1975-10-28

Family

ID=26978210

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US446847A | Expired - Lifetime | US3916105A | 1972-12-04 | 1974-02-28 | Pitch peak detection using linear prediction

Country Status (1)

Country | Publication
US (1) | US3916105A

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2351467A1 (en) * 1976-05-15 1977-12-09 Licentia Gmbh PROCESS FOR DETERMINING THE FUNDAMENTAL PERIOD OF A VOICE SIGNAL USING THE DIFFERENTIAL SIGNAL DELIVERED BY PREDICTIVE VOCODERS.
WO1984001049A1 (en) * 1982-08-26 1984-03-15 Western Electric Co Lpc word recognizer utilizing energy features
WO1987001500A1 (en) * 1985-08-28 1987-03-12 American Telephone & Telegraph Company Voice synthesis utilizing multi-level filter excitation
WO1987001498A1 (en) * 1985-08-28 1987-03-12 American Telephone & Telegraph Company A parallel processing pitch detector
US4710959A (en) * 1982-04-29 1987-12-01 Massachusetts Institute Of Technology Voice encoder and synthesizer
WO1988000754A1 (en) * 1986-07-21 1988-01-28 Ncr Corporation Method and system for compressing speech signal data
EP0280827A1 (en) * 1987-03-05 1988-09-07 International Business Machines Corporation Pitch detection process and speech coder using said process
US4783807A (en) * 1984-08-27 1988-11-08 John Marley System and method for sound recognition with feature selection synchronized to voice pitch
US4912764A (en) * 1985-08-28 1990-03-27 American Telephone And Telegraph Company, At&T Bell Laboratories Digital speech coder with different excitation types
US5471527A (en) 1993-12-02 1995-11-28 Dsc Communications Corporation Voice enhancement system and method
US20030088401A1 (en) * 2001-10-26 2003-05-08 Terez Dmitry Edward Methods and apparatus for pitch determination
US20100268532A1 (en) * 2007-11-27 2010-10-21 Takayuki Arakawa System, method and program for voice detection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3624302A (en) * 1969-10-29 1971-11-30 Bell Telephone Labor Inc Speech analysis and synthesis by the use of the linear prediction of a speech wave
US3631520A (en) * 1968-08-19 1971-12-28 Bell Telephone Labor Inc Predictive coding of speech signals

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2351467A1 (en) * 1976-05-15 1977-12-09 Licentia Gmbh PROCESS FOR DETERMINING THE FUNDAMENTAL PERIOD OF A VOICE SIGNAL USING THE DIFFERENTIAL SIGNAL DELIVERED BY PREDICTIVE VOCODERS.
US4710959A (en) * 1982-04-29 1987-12-01 Massachusetts Institute Of Technology Voice encoder and synthesizer
WO1984001049A1 (en) * 1982-08-26 1984-03-15 Western Electric Co Lpc word recognizer utilizing energy features
US4519094A (en) * 1982-08-26 1985-05-21 At&T Bell Laboratories LPC Word recognizer utilizing energy features
US4783807A (en) * 1984-08-27 1988-11-08 John Marley System and method for sound recognition with feature selection synchronized to voice pitch
US4879748A (en) * 1985-08-28 1989-11-07 American Telephone And Telegraph Company Parallel processing pitch detector
WO1987001500A1 (en) * 1985-08-28 1987-03-12 American Telephone & Telegraph Company Voice synthesis utilizing multi-level filter excitation
US4912764A (en) * 1985-08-28 1990-03-27 American Telephone And Telegraph Company, At&T Bell Laboratories Digital speech coder with different excitation types
WO1987001498A1 (en) * 1985-08-28 1987-03-12 American Telephone & Telegraph Company A parallel processing pitch detector
US4890328A (en) * 1985-08-28 1989-12-26 American Telephone And Telegraph Company Voice synthesis utilizing multi-level filter excitation
US4802221A (en) * 1986-07-21 1989-01-31 Ncr Corporation Digital system and method for compressing speech signals for storage and transmission
WO1988000754A1 (en) * 1986-07-21 1988-01-28 Ncr Corporation Method and system for compressing speech signal data
EP0280827A1 (en) * 1987-03-05 1988-09-07 International Business Machines Corporation Pitch detection process and speech coder using said process
US4924508A (en) * 1987-03-05 1990-05-08 International Business Machines Pitch detection for use in a predictive speech coder
US5471527A (en) 1993-12-02 1995-11-28 Dsc Communications Corporation Voice enhancement system and method
US20030088401A1 (en) * 2001-10-26 2003-05-08 Terez Dmitry Edward Methods and apparatus for pitch determination
WO2003038805A1 (en) * 2001-10-26 2003-05-08 Dmitry Edward Terez Methods and apparatus for pitch determination
WO2003038806A1 (en) * 2001-10-26 2003-05-08 Dmitry Edward Terez Methods and apparatus for pitch determination
US7124075B2 (en) 2001-10-26 2006-10-17 Dmitry Edward Terez Methods and apparatus for pitch determination
US20100268532A1 (en) * 2007-11-27 2010-10-21 Takayuki Arakawa System, method and program for voice detection
US8694308B2 (en) * 2007-11-27 2014-04-08 Nec Corporation System, method and program for voice detection

Similar Documents

Publication Title
US4058676A (en) Speech analysis and synthesis system
US3916105A (en) Pitch peak detection using linear prediction
US4712243A (en) Speech recognition apparatus
US4400828A (en) Word recognizer
US4559602A (en) Signal processing and synthesizing method and apparatus
US6349277B1 (en) Method and system for analyzing voices
KR100880480B1 (en) Method and system for real-time music/speech discrimination in digital audio signals
KR970001165B1 (en) Recognizer and its operating method of speaker training
KR0143076B1 (en) Coding method and apparatus
US7340375B1 (en) Method and apparatus for noise floor estimation
Gold Computer program for pitch extraction
US5774836A (en) System and method for performing pitch estimation and error checking on low estimated pitch values in a correlation based pitch estimator
US3995116A (en) Emphasis controlled speech synthesizer
GB2109205A (en) Apparatus for detecting the duration of voice
US20090144058A1 (en) Restoration of high-order Mel Frequency Cepstral Coefficients
EP0459363B1 (en) Voice signal coding system
US4827519A (en) Voice recognition system using voice power patterns
US4388491A (en) Speech pitch period extraction apparatus
US20070011001A1 (en) Apparatus for predicting the spectral information of voice signals and a method therefor
CN113053354B (en) Method and equipment for improving voice synthesis effect
US20010044714A1 (en) Method of estimating the pitch of a speech signal using an average distance between peaks, use of the method, and a device adapted therefor
US4845753A (en) Pitch detecting device
Sluyter et al. A novel method for pitch extraction from speech and a hardware model applicable to vocoder systems
US6954726B2 (en) Method and device for estimating the pitch of a speech signal using a binary signal
US3127477A (en) Automatic formant locator