US2908761A - Voice pitch determination - Google Patents


Info

Publication number
US2908761A
Authority
US
United States
Prior art keywords
signal
speech
wave
energy
delay
Prior art date
1954-10-20
Legal status
Expired - Lifetime
Application number
US463467A
Inventor
Raisbeck Gordon
Current Assignee
AT&T Corp
Original Assignee
Bell Telephone Laboratories Inc
Priority date
1954-10-20
Filing date
1954-10-20
Publication date
1959-10-13
Application filed by Bell Telephone Laboratories Inc
Priority to US463467A
Priority to GB28214/57A (GB796677A)
Priority to GB28633/55A (GB796676A)
Application granted
Publication of US2908761A
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals

Definitions

  • This invention relates to the transmission of speech over narrow band media by vocoder techniques. Its principal object is to improve the accuracy and naturalness of the reproduced speech.
  • an input speech wave is analyzed to determine its fundamental frequency or pitch and the distribution of amplitudes among a number of frequency sub-bands into which the speech-frequency range is divided.
  • The result of this analysis is translated into a number of control currents, each of which, apart from the first, represents the speech energy in one sub-band, while the first control current represents its fundamental frequency or pitch.
  • These control currents are transmitted to a synthesizer and are there utilized to build up, from sources of energy in the synthesizer, an artificial speech wave having the characteristic pitch and amplitude-frequency distribution of the original impressed speech.
  • the synthesizer apparatus includes a source of periodic energy, commonly known as a buzz source, for reproducing voiced sounds, and a source of aperiodic energy, commonly known as a hiss source, for reproducing unvoiced sounds; the outputs of these sources are applied, in the alternative, to the several members of a bank of filters, each of which embraces one of the sub-bands into which the voice frequency range is divided.
  • the sub-band control currents derived by the analyzing apparatus control the gain or attenuation in these several sub-band paths.
  • the choice between the buzz and the hiss source is controlled by one characteristic of the pitch control current while the fundamental frequency of the buzz source is controlled by another characteristic of the pitch control current.
  • the pitch control current tunes the buzz source to the fundamental frequency of the original voice, turns the buzz source on for a voiced sound (indicated by a strong pitch control current), and turns the buzz source off and the hiss source on for an unvoiced sound (indicated by a weak pitch control current).
  • the present invention approaches the problem of pitch determination in the time domain instead of the frequency domain.
  • the invention turns these considerations to account by dividing the voice wave into two paths, delaying the energy in one with respect to that in the other by a controllable amount, comparing the delayed wave with the undelayed wave, varying the amount of delay until a best match is obtained, and noting the corresponding amount of delay, with the recognition that this amount of delay is, identically, the fundamental period.
  • a period control current may then be derived which would serve as well as a pitch control current, the period and the pitch being reciprocals of each other. It is preferred, however, in order that presently known buzz source frequency control apparatus may be employed without change, to derive in the first instance a control current which is reciprocally related to the observed fundamental period of the voice; i.e., one that is directly proportional to the voice pitch.
  • This indirectly derived pitch control current is indistinguishable from the pitch control current of the prior art except in its greater reliability, and may be employed in the presently known fashion.
  • the first cause has to do with the fact that the length of the period changes from period to period in the course of inflection. Inflection rates in voiced speech sounds are so slow, compared with their fundamental frequencies, that this cause is responsible for only a negligible departure from exactness of match, provided only that the apparatus is capable of holding and tracking the fundamental period through its changes.
  • Fig. 1 is a block schematic diagram showing a vocoder transmission system embodying the invention;
  • Fig. 2 is a block schematic diagram showing apparatus for deriving a speech period control current and an energy distribution control current;
  • Fig. 3 is a schematic circuit diagram showing a divider for use in the combination of Fig. 2;
  • Fig. 4 is a set of curves which are referred to in the explanation of the invention.
  • speech currents, which may originate in a telephone instrument 1, are first passed through a unit which acts to hold them to an approximately constant level suitable for the operations of the remainder of the apparatus. It may be a voice-operated gain-adjusting device 2, now commonly known as a vogad.
  • the output of the vogad is applied in parallel to the apparatus shown in the upper part of the figure, which derives the novel period control current and the novel energy distribution current, and to conventional spectrum analyzer apparatus shown in a broken-line box 3 in the lower part of the figure.
  • These energy paths, like all others subsequently to be described, are shown by single lines in the drawing merely in order to avoid complexity. It will be obvious to those skilled in the art at what points wire pairs or other complete circuits may be required in practice.
  • Fig. 2 shows its details.
  • it comprises an autocorrelator 4 having a single input point and a number of output points 6-1 to 6-20, a maximum value selector 7 having a number of input points 6-2 to 6-20 and two output points 8, 9, and a divider 10 having two input points 9, 11 and a single output point 12.
  • the number of output points 6 of the autocorrelator 4 exceeds by one the number of input points of the maximum value selector 7, and each of these, other than the first, is connected to one such input point.
  • the first output point 6-1 of the autocorrelator 4 is connected directly to one input point 11 of the divider 10, while the other input point of the divider 10 is furnished by one of the two output points 9 of the maximum value selector 7.
  • the remaining output point 8 of the maximum value selector 7 carries a signal which may be utilized at the receiver station without further change.
  • the autocorrelator 4 measures the autocorrelation ψ(τ) of the input signal for various discretely different amounts of delay τ, including zero, and delivers on each of its output points 6 a signal which is proportional to the autocorrelation for one such delay value.
  • the maximum value selector 7 picks out the greatest of these, ψ(τm), and supplies it as one input to the divider 10. At the same time, the maximum value selector 7 acts to tag or identify the delay value τm for which this greatest autocorrelation was obtained and to deliver an additional identifying signal for use without further change at the synthesizer station.
  • the identifying signal is made inversely proportional to the identified delay and therefore directly proportional to its reciprocal. Because the delay for which the autocorrelation is greatest is equal to the fundamental period of the speech, its reciprocal, 1/τm, is equal to the pitch frequency.
  • the conventional spectrum analyzer 3 shown in the broken box in the lower part of Fig. 1 comprises a bank of band-pass filters 15 connected in parallel, each of which is proportioned to pass a preassigned sub-band of the voice frequency band of interest, while together they pass the entire band. For the sake of illustration, ten such filters are indicated, the first two and the last being shown. Each such filter 15 is followed by a detector 16, which in turn is followed by a low-pass filter 17. The control current output of each of these several low-pass filters 17 is thus a measure of the voice energy in the sub-band to which that low-pass filter is connected.
  • These spectrum control currents are transmitted by any desired means, indicated by conductors 18, 19, to a receiver station which comprises conventional synthesizing apparatus shown in the broken line box 20 at the lower right-hand part of the figure.
  • This synthesizer comprises a number of filters 21 having their output terminals connected in parallel to a reproducer 22. These several filters are proportioned to exhibit transmission characteristics like those of the several analyzing filters 15.
  • Shaping networks 23 precede the several filters 21.
  • Each shaping network 23 is supplied, by way of a conductor 24, with locally generated energy from a hiss source 25 and a buzz source 26, while its transmission is modulated by the control current derived at the analyzer station by a corresponding one of the filters 15, 17 and transmitted over the intervening channels 18, 19.
  • this spectrum reconstruction apparatus is conventional.
  • the autocorrelator 4 itself is a variant of one shown in Bennett et al. Patent 2,676,206. It comprises a delay device 30 such as an electromagnetic transmission line. The latter may comprise, for example, forty similar sections of series inductance and shunt capacitance having twenty evenly spaced taps 31; i.e., one located at every other section. The line may be terminated in well-known fashion for no reflection by a resistive load 32. In terms of propagation time along the transmission line, the spacing between each tap 31 and the next one may be 500 microseconds, the total delay for all 40 sections being thus 10,000 microseconds, or 10 milliseconds.
  • each of the output taps 31 of the delay line 30 is connected, by way of a buffer amplifier 33, to one input point of a multiplier 34, while the undelayed wave is supplied from the input terminal 5 of the line to the remaining input points of all of these multipliers 34 in parallel.
  • the output of each such integrator 35 is thus the autocorrelation ψ(τ) of the signal for a particular delay τ.
  • the outputs of the several integrators 35 differ from each other in magnitude, some one being greater than the others. This greatest signal is a measure of the best match between the undelayed signal and one of the delayed signals, and the location along the transmission line 30 of the tap 31 on which it appears is a measure of the corresponding delay τm.
  • a maximum value selector 7 may be a variant of one shown in Davis et al. Patent 2,646,465, and may comprise a plurality of amplifiers, e.g., vacuum tubes 40, of which all the cathodes are connected together and by way of a load resistor 41 to ground, while their anodes are connected by way of individual relay windings 42 to the positive terminal of a source 43 whose negative terminal is grounded.
  • the several input terminals of the selector 7 are provided by the control grids of these amplifiers 40, which must of course be adjusted in their steady potentials by appropriate bias circuits not shown.
  • Each relay 42 is provided with two pairs of contacts 44, 45, shown in their open positions. In each case the inner contact pair 44 acts, when pulled up, to apply the input signal directly to a first common bus 46, while the second contact pair 45 acts, when pulled up, to apply the potential of an individual battery 47 to a second common bus 48.
  • the potential of each of the several batteries 47 differs from that of every other battery 47, and these potentials are selected in accordance with a pattern to be described subsequently.
  • n is the number of different values of delay τ which the autocorrelator 4 provides
  • only one of the tubes 40 conducts significantly, and one and only one relay contact pair 44, 45 is closed, thus placing on the first common bus 46 the magnitude of the autocorrelation ψ(τ) for a particular delay and, on the second common bus 48, the potential of a particular battery 47 which is uniquely associated with that delay.
  • the autocorrelation ψ(τ) and the corresponding delay τ thus selected are the maximum autocorrelation, ψ(τm), and the corresponding delay τm.
  • the potentials of the several batteries 47 are selected in inverse proportion to the several values of delay τ with which they are associated. This is indicated on the output lead of the second common bus 48 by a legend showing that the signal thereon is the reciprocal of that delay τm for which the autocorrelation is a maximum. From what has been stated above, this is evidently proportional to the fundamental frequency or pitch of the speech.
  • this signal may be made substantially continuous.
  • this output signal naturally contains somewhat abrupt transitions.
  • the filtered signal from the first common bus 46 is applied to one of the two input points 9 of a divider 10, while the autocorrelation ψ(0) of the speech wave for zero delay is applied to the other input point 11 of this divider.
  • the divider 10 itself may be any unit which forms on its output lead 12 the quotient of a signal applied to its upper input lead 9 divided by another signal applied to its lower input lead 11.
  • Fig. 3, which carries out a logarithmic division operation by electronic means.
  • (1) It takes the logarithm of the signal applied to its upper input lead;
  • P-N junction rectifier diodes are turned to account in the construction of the divider circuit of Fig. 3, which comprises two vacuum tube triodes 51, 52, the anode of each being connected to the positive terminal E++ of an operating potential source 53 and the cathode of each being connected by way of a resistor R to its negative terminal E−.
  • the resistors R are of sufficient magnitude, as compared with the internal resistances of the tubes 51, 52, to provide substantial cathode follower action.
  • One input signal V1 which is to serve as the dividend appears at a first input terminal 54 and is applied by way of a resistor of magnitude R to the control grid of the left-hand tube 51.
  • a second input signal V2 which is to serve as the divisor appears at another input terminal and is applied by way of another resistor, also of magnitude R, to the control grid of the right-hand tube 52.
  • Each of these control grids is returned, by way of a P-N junction rectifier diode 56, 57 to a suitable point of intermediate potential of the source 53, here shown as ground.
  • cathode current flows in each of the tubes 51, 52 in an amount sufficient to hold its control grid, in the absence of an input signal, at cutoff, while the input rectifiers 56, 57 are operated in their forward directions, so that their resistances are small compared with the resistors R to which they are connected.
  • a third P-N junction rectifier diode 58 interconnects the two cathodes, and an output signal is derived across a resistor R in series with the anode of the right-hand tube 52; i.e., between the right-hand tube anode and ground.
  • the input diode current is in each case, therefore, proportional to the input signal voltage; i.e., $i_1 = V_1/R$ and $i_2 = V_2/R$.
  • From Equation 2, the grid voltages applied to the two tubes 51, 52 are given by $V_a = \frac{kT}{q}\log\frac{V_1}{R\,i_0}$ (5) and $V_b = \frac{kT}{q}\log\frac{V_2}{R\,i_0'}$ (6),
  • where $i_0$ and $i_0'$ are the currents which flow in the first diode 56 and in the second diode 57, respectively, when large reverse voltages are applied.
  • the cathode potentials $V_c$ and $V_d$ are in turn given by Equations 7 and 8. From elementary circuit considerations, the voltage drop across the third diode is equal to $V_c - V_d$. From Equation 1, the current through the third diode 58 is an exponential function of the voltage drop across it; i.e., $i_3 = i_0'' \exp\left[\frac{q}{kT}(V_c - V_d)\right]$ (9),
  • where $i_0''$ is the current which flows in the third diode 58 in the absence of an applied voltage.
  • when Equation 8 is subtracted from Equation 7, as required by the exponent in Equation 9, and the logarithm of both sides of Equation 9 is taken, there results $\log\frac{i_3}{i_0''} = \log\frac{i_0'}{i_0} + \log\frac{V_1}{V_2}$ (10); that is to say, the current through the third diode 58 is proportional to the quotient of the dividend signal by the divisor signal.
  • In Equation 12, the first and second terms are constants and may be eliminated by standard well-known means, while the third term is directly proportional to the desired quotient.
  • curve A shows a representative correlation curve, normalized to unity value at zero delay, for a fully periodic signal of a certain average amount of complexity.
  • This fully periodic signal may be termed f1(t), and its autocorrelation, curve A, may be termed φ1(τ).
  • Curve A shows four maxima of equal heights located at delays of 0, τ, 2τ, and 3τ.
  • the time function represented thus extends over at least three full periods, each one being exactly like the last.
  • the fact that the curve A has the value unity for zero delay expresses the truism that the time function f1(t) is exactly like itself in the absence of delay.
  • the fact that maxima occur in the autocorrelation curve A at intervals of τ reflects the fact that each full period of the time function f1(t) resembles its predecessor, and that the delayed signal finds its best match with the undelayed signal for discrete values of the delay equal to τ, 2τ, 3τ, and so on.
  • the fact that these successive maxima rise to the same height as the maximum at zero delay reflects the fact that the successive periods of the time function f1(t) are exactly alike, so that for these values of delay the match is perfect.
  • Curve B of Fig. 4 shows the normalized autocorrelation φ2(τ) of an aperiodic signal f2(t), such as noise. It, too, has the value unity for zero delay, which reflects the truism that even a noise signal is an exact replica of itself in the absence of delay. … the first part being a periodic function f1(t) and the second being an aperiodic function f2(t).
  • the pitch control signal 1/τm is applied to the buzz source 26 to adjust its oscillation frequency in well-known fashion, as shown, for example, in Reisz Patent 2,522,539.
  • The energy ratio signal, namely, the divider output signal as it appears on the divider output terminal 12, is likewise transmitted to the receiver station, where it is employed to control the relative amounts of buzz and hiss furnished to the spectrum reconstructor in the following fashion.
  • The output of the buzz source 26 is delivered by way of a variable gain amplifier 61 to a combination point 62.
  • This amplifier 61 may be of a well known construction such as to furnish a gain proportional to the magnitude of a gain control signal applied to its control terminal 63.
  • the incoming energy distribution signal, termed g(t) for short, is passed through a rooter 64 and applied to this control terminal 63.
  • the signal g(t) is also passed through a phase inverter 65, which converts it to −g(t), and then through a steady potential source 66 of one volt connected in series.
  • This in turn is passed through a rooter 68 and applied to the gain control terminal 69 of a second variable gain amplifier 70 which is connected in tandem with the hiss source 25.
  • the gain control signal has the form $\sqrt{1 - g(t)}$.
  • the energy which appears at the combination point 62 comprises an additive mixture of periodic energy from the buzz source 26 and aperiodic energy from the hiss source 25, in proportions as called for by the energy distribution of the voice and as controlled by the output of the divider 10.
  • the periodic energy of the buzz source 26 is tuned to the required pitch frequency, as stated above, by the first of the two outputs of the maximum value selector 7.
  • This additive combination is now fed to the shaping networks 23 in parallel, where the slowly varying spectrum control currents operate to control the magnitudes of the several frequency sub-bands which collectively constitute the synthesized signal.
  • Because of the nature of the operation of the divider 10, its output is unchanged when its two inputs are increased or reduced in the same ratio. To this extent, it supplies an output which is a measure of the relative magnitudes of its two inputs and is otherwise independent of such magnitudes. In other words, it carries out a normalizing function in addition to its assigned dividing function. Provided such normalization as between the autocorrelation signal for zero delay and the maximum autocorrelation signal be otherwise obtained, the energy distribution signal g(t), or a near equivalent thereof, can readily be derived by a subtraction process.
  • a speech analyzer station having means for deriving from a speech sound a pitch control signal, an energy distribution signal which is representative, for each sound, of its proportional content of periodic energy, and a plurality of spectrum control signals, and a reproducer station having a buzz source, a hiss source and a spectrum synthesizer, means for transmitting all of said control signals to said reproducer station, means for tuning the buzz source under control of the pitch control signal, means controlled by said energy distribution signal for mixing the outputs of said sources in proportions determined by the magnitude of said energy distribution signal, means for applying said mixed outputs to said spectrum synthesizer, and means for controlling said spectrum synthesizer under control of said spectrum control signals.
  • a system for deriving control signals to control the artificial production of speech, means for analyzing a speech sound, means for deriving from said analysis a pitch control signal representative of the fundamental frequency of said speech sound, means for deriving from said analysis a plurality of spectrum control signals, each representative of the speech energy falling within one of a plurality of frequency sub-bands which collectively embrace the frequency band of said speech sound, means for deriving from said analysis a first measure of the entire energy of said speech sound, means for deriving from said analysis a second measure of the energy of the periodic components of said speech sound, and means for deriving from said two measures an additional signal representative of the distribution of the energy of said speech sound as between its periodic components and its aperiodic components.
  • Apparatus as defined in claim 1 wherein said means for deriving said pitch control signal comprises an autocorrelator and a maximum value selector.
  • said means for deriving said pitch control signal comprises means for determining the autocorrelation of said speech sound for various delays, and means for selecting from among said various delays that one for which said autocorrelation as determined is a maximum.
  • a system for deriving control signals to control the artificial production of speech, means for analyzing a speech wave, means for deriving from said analysis a pitch control signal representative of the fundamental frequency of said speech wave, means for deriving from said analysis a plurality of spectrum control signals representative of the speech energy falling within a plurality of frequency sub-bands which collectively embrace the frequency band of said speech wave, means for deriving from said analysis a first measure of the entire energy of said speech wave, means for deriving from said analysis a second measure of the energy of the periodic components of said speech wave, and means for deriving from said two measures an additional signal representative of the distribution of the energy of said speech wave as between its periodic components and its aperiodic components, said pitch control signal deriving means comprising means for delaying said wave by each of a plurality of different time lags distributed over a range extending substantially from zero to the longest period of said speech wave, means for individually
  • Apparatus for deriving a desired control signal indicative of the fundamental frequency of a complex signal wave which comprises means for delaying said wave by each of a plurality of different time lags distributed over a range extending substantially from zero to the longest period of said complex signal wave, means for individually comparing each delayed wave singly with the original undelayed wave, means for identifying that one of said different time lags for which the delayed wave most nearly resembles the undelayed wave as determined by said comparison, means for deriving an auxiliary signal proportional to the duration of said identified time lag, and means for reciprocating said auxiliary signal to provide said desired control signal.
  • a tunable source of periodic energy, a source of aperiodic energy
  • means for tuning said periodic energy source under control of a pitch control signal, means controlled by an energy distribution signal which is representative, for each sound to be reproduced, of its proportional content of periodic energy for mixing the outputs of said two sources in proportions determined by the magnitude of said energy distribution signal, a plurality of filters having passbands contiguously located on the frequency scale and together embracing the frequency band of said speech sound, means for applying said mixed outputs to all of said filters, means for variably attenuating the energy path of each of said filters under control of a spectrum control signal, and means for reproducing the outputs of all of said filters as an artificial sound.
  • Apparatus for selecting from among a plurality of signals that one which has the greatest value which comprises a plurality of discharge devices each having an anode, a cathode and a control electrode, a first connection extending from all of said anodes to one terminal of an operating potential source, a second connection extending from all of said cathodes to one terminal of a common impedance element, a third connection extending from the other terminal of said common impedance element to the other terminal of said operating potential source, a like plurality of relays each having a winding connected in the anode circuit of one of said discharge devices, each of said relays being provided with two contact pairs, means for applying said input signals individually to the control electrodes of said discharge devices, a first common bus, individual energy paths extending from said first common bus through the first contact pair of each relay to the control electrode of the device in whose anode circuit said relay is connected, a second common bus, a like plurality of auxiliary potential sources, of successively greater potentials, associated respectively with
  • Apparatus having a first input point, a second input point and an output point for delivering at said output point a signal proportional to the quotient of a dividend signal by a divisor signal of which the dividend signal is applied to said first input point while the divisor signal is applied to said second input point which comprises a first element having an exponential current-voltage characteristic connected to the first input point, a second element having a similar exponential current-voltage characteristic connected to the second input point, means for applying said dividend and divisor signals as currents to said first and second input points, respectively, thereby to produce a first voltage which is proportional to the logarithm of said first input signal and a second voltage which is proportional to the logarithm of said second input signal, means for subtracting said second voltage from said first voltage, and a third element having a similar exponential current-voltage characteristic for deriving from the difference of said first and second voltages a current which is exponentially related to said voltage difference and hence directly related to the quotient of said dividend signal by said
  • apparatus for continuously determining the continuously varying fundamental period of said speech wave which comprises means for delaying said wave by each of a plurality of different time lags distributed over a range extending substantially from zero to the longest period of said speech wave, means for individually comparing each delayed wave singly with the original undelayed wave, means for identifying that one of said different time lags for which the delayed wave most closely resembles the undelayed wave as determined by said comparison, means for rejecting all others of said time lags, means for continually altering the time lag identified to preserve said closest resemblance as said fundamental period changes, and means for developing a signal continuously representative of said varying identified time lag.
  • Apparatus as defined in claim 11 wherein said comparing means comprises means for determining the autocorrelation of said speech wave for various delays.
  • Apparatus as defined in claim 12 wherein said identifying means comprises means for selecting that value of the time lag for which said autocorrelation as determined is a maximum.
  • said comparing means comprises means for multiplying the de[…]st speech wave period, means for comparing each of said replicas with the undelayed wave, means for identifying the replica which most closely matches the original wave as determined by said comparison, means for rejecting all others of said replicas, whereby the delay characterizing the replica thus identified at each moment is equal to the length of said fundamental period at that moment, means for continually altering said identification to preserve said closest match as said fundamental period changes, and means for developing a signal continuously representative of the delay characterizing the replica momentarily identified.
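The maximum value selector 7 and its batteries 47 can be modeled as a winner-take-all lookup. The model shows why the pitch signal on bus 48 needs no arithmetic division: each battery potential is fixed in advance at a value inversely proportional to its tap's delay. This is a minimal sketch, not the relay circuit itself. The helper name `max_value_selector`, the integer-microsecond delay keys, and the sample autocorrelation values are all hypothetical.

```python
def max_value_selector(psi_by_delay, battery):
    """Winner-take-all model of selector 7 (hypothetical helper).

    psi_by_delay: {delay_us: autocorrelation} for the nonzero-delay taps.
    battery: {delay_us: potential}, one fixed battery per tap.
    Returns (bus_46, bus_48): the largest autocorrelation and the battery
    potential associated with the tap on which it appeared.
    """
    tau_m = max(psi_by_delay, key=psi_by_delay.get)  # the one relay that pulls up
    return psi_by_delay[tau_m], battery[tau_m]


TAP_SPACING_US = 500
delays_us = [k * TAP_SPACING_US for k in range(1, 21)]  # 500 us ... 10 000 us
# Potentials in inverse proportion to delay, scaled so bus 48 reads in hertz.
battery = {d: 1e6 / d for d in delays_us}
# Made-up autocorrelations with the best match on the 5 ms tap.
psi = {d: (1.0 if d == 5000 else 0.2) for d in delays_us}
print(max_value_selector(psi, battery))  # (1.0, 200.0)
```

Here `max` stands in for the arrangement in which only one tube 40 conducts significantly and closes one relay.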
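The logarithmic divider of Fig. 3 can be checked numerically with idealized diode models. This sketch rests on several assumptions. The input diodes follow the Shockley law i = i_s(exp(qV/kT) − 1), the third diode follows the pure exponential form, and the input currents are V/R. The cathode followers are also taken to be matched, so that their offsets cancel. The function names, the resistor value, and the saturation current are hypothetical, not values taken from the patent.

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, in volts (assumed)


def diode_voltage(i, i_s):
    """Forward voltage of an ideal diode carrying current i (inverted Shockley law)."""
    return V_T * math.log1p(i / i_s)


def log_domain_divide(v1, v2, r=1e5, i_s=1e-12):
    """Return the third-diode current divided by its zero-voltage current.

    v1 is the dividend and v2 the divisor. All three diodes are assumed
    matched (same i_s), so the result approximates v1 / v2 when v / r >> i_s.
    """
    va = diode_voltage(v1 / r, i_s)  # grid of tube 51: logarithm of the dividend
    vb = diode_voltage(v2 / r, i_s)  # grid of tube 52: logarithm of the divisor
    return math.exp((va - vb) / V_T)  # third diode: exponential of the difference


print(round(log_domain_divide(2.0, 1.0), 4))  # 2.0
print(round(log_domain_divide(4.0, 2.0), 4))  # 2.0 (scaling both inputs cancels)
```

The second call illustrates the normalizing property attributed to the divider 10: scaling both inputs by the same ratio leaves the output unchanged. In this idealized model the thermal voltage also cancels, because it appears in both the logarithm and the exponential.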
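The divider's role can also be illustrated numerically. For a periodic signal plus uncorrelated noise, the ratio ψ(τm)/ψ(0) approaches the fraction of the total power that is periodic, which is the quantity the energy distribution signal g(t) is meant to represent. The sketch makes three assumptions: the period τm is already known, the noise is Gaussian with a fixed seed, and the autocorrelation is a plain windowed average. The names `autocorr` and `periodic_fraction` are hypothetical.

```python
import math
import random


def autocorr(x, d, start, window):
    """Windowed average of x[n] * x[n - d], starting at index `start`."""
    return sum(x[n] * x[n - d] for n in range(start, start + window)) / window


def periodic_fraction(x, period, window):
    """psi(tau_m) / psi(0), with tau_m taken to be the known period (assumption)."""
    return autocorr(x, period, period, window) / autocorr(x, 0, period, window)


rng = random.Random(1)  # fixed seed so the run is repeatable
period, n_total, window = 60, 3000, 2400
# Periodic part with unit mean power (amplitude sqrt(2)) and unit-variance noise.
buzz_like = [math.sqrt(2) * math.sin(2 * math.pi * n / period) for n in range(n_total)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
mixed = [p + w for p, w in zip(buzz_like, noise)]

print(round(periodic_fraction(buzz_like, period, window), 3))  # periodic: close to 1
print(round(periodic_fraction(noise, period, window), 2))      # noise: close to 0
print(round(periodic_fraction(mixed, period, window), 2))      # equal powers: near 0.5
```

The first two cases correspond to the fully periodic signal of curve A and the aperiodic signal of curve B. The third case is roughly that of a signal composed of a periodic part f1(t) and an aperiodic part f2(t).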
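The receiver's mixing arrangement can be sketched in a few lines. The gains follow the description: √g(t) for the buzz path through rooter 64, and √(1 − g(t)) for the hiss path through inverter 65, the one-volt source 66, and rooter 68. One interpretation of the square roots is that they make the buzz and hiss powers proportional to g and 1 − g; this reading is an interpretation, not a statement from the patent. The function name and sample values are hypothetical, and g is assumed constant over each block of samples.

```python
import math


def mix_excitation(buzz, hiss, g):
    """Combine buzz and hiss samples as at combination point 62.

    g is the energy distribution signal, assumed to lie in [0, 1] and held
    constant over the block of samples passed in.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie in [0, 1]")
    buzz_gain = math.sqrt(g)          # amplifier 61, driven through rooter 64
    hiss_gain = math.sqrt(1.0 - g)    # amplifier 70, driven by sqrt(1 - g)
    return [buzz_gain * b + hiss_gain * h for b, h in zip(buzz, hiss)]


buzz = [1.0, -1.0, 1.0, -1.0]   # toy periodic excitation with unit power
hiss = [0.0, 0.0, 0.0, 0.0]     # hiss silenced so the buzz gain is visible
print(mix_excitation(buzz, hiss, 0.25))  # [0.5, -0.5, 0.5, -0.5]
```

With g = 0.25 and the hiss silenced, the output power is 0.25, a quarter of the unit-power buzz, because power scales with the square of the amplitude gain.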

Abstract

796,677. Vocoder systems. WESTERN ELECTRIC CO., Inc. Oct. 7, 1955 [Oct. 20, 1954], No. 28214/57. Divided out of 796,676. Drawings to Specification. Class 40 (4). [Also in Group XIV (c)] The fundamental period of a complex waveform (e.g. speech) is obtained by applying the signal to a tapped delay line and comparing the undelayed wave with the waves obtained from the tappings and determining the delay for which the output waveform most nearly resembles that of the undelayed wave. The description is identical with part of Specification 796,676. Specification 466,327 also is referred to.

Description

Oct. 13, 1959 G. RAISBECK 2,908,761 VOICE PITCH DETERMINATION Filed Oct. 20, 1954 3 Sheets [drawing sheets 1 to 3: Figs. 1 to 4]
Patented Oct. 13, 1959
2,908,761 VOICE PITCH DETERMINATION Gordon Raisbeck, Bernards Township, Somerset County, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York. Application October 20, 1954, Serial No. 463,467. 16 Claims. (Cl. 179-15.55)
This invention relates to the transmission of speech over narrow band media by vocoder techniques. Its principal object is to improve the accuracy and naturalness of the reproduced speech.
In the vocoder transmission system of Dudley Patent 2,151,091, an input speech wave is analyzed to determine its fundamental frequency or pitch and the distribution of amplitudes among a number of frequency sub-bands into which the speech-frequency range is divided. The result of this analysis is translated into a number of control currents, each of which, over and above the first, represents the speech energy in one sub-band, while the first control current represents its fundamental frequency or pitch. These control currents are transmitted to a synthesizer and are there utilized to build up, from sources of energy in the synthesizer, an artificial speech wave having the characteristic pitch and amplitude-frequency distribution of the original impressed speech. More particularly, the synthesizer apparatus includes a source of periodic energy, commonly known as a buzz source, for reproducing voiced sounds, and a source of aperiodic energy, commonly known as a hiss source, for reproducing unvoiced sounds, and the outputs of these sources are applied, in the alternative, to the several members of a bank of filters, each of which embraces one of the sub-bands into which the voice frequency range is divided.
The sub-band control currents derived by the analyzing apparatus control the gain or attenuation in these several sub-band paths. The choice between the buzz and the hiss source is controlled by one characteristic of the pitch control current while the fundamental frequency of the buzz source is controlled by another characteristic of the pitch control current. In other words, the pitch control current tunes the buzz source to the fundamental frequency of the original voice, turns the buzz source on for a voiced sound, indicated by a strong pitch control current, and turns the buzz source off and the hiss source on for an unvoiced sound, indicated by a weak pitch control current.
With such apparatus every sound of normal speech is, in the synthesizer apparatus, classed either as a voiced sound or as an unvoiced sound. While such classification is adequate in some cases there are many other cases in which it is inadequate. For example, the sound of the letter Z in the normal English or American pronunciation of the word azure partakes equally of the nature of a voiced sound and of an unvoiced sound, so that any classification of such a sound either as voiced or as unvoiced but not both is arbitrary and artificial. Heretofore this artificial classification of all sounds into one or the other of two mutually exclusive categories has been an unfortunate necessity imposed by the lack of any measure of the relative amounts of the original speech energy of the two kinds, periodic or aperiodic.
Another defect of present vocoder systems is that the derivation at the analyzer station of a reliable pitch control current has always presented a difficult problem to the engineer. Many voices are so rich in harmonic components that the energy of the fundamental component is small in comparison with the harmonic energy, and is therefore difficult to segregate. Under some conditions the energy at the fundamental frequency disappears entirely and resort must be had to indirect measures, such as the intermodulation of adjacent harmonic components, to derive a difference frequency. Aside from the apparatus complexities entailed, such a difference frequency is a true measure of the voice pitch only in the case of a steady sound, while variations of frequency and of phase of the intermodulated components in the course of inflection cause such instantaneous frequency to be wholly inadequate.
The present invention approaches the problem of pitch determination in the time domain instead of the frequency domain; i.e., it seizes hold of the fundamental period of the voice instead of its fundamental frequency, and tracks it; i.e., continues to hold it, as it changes. It is characteristic of a periodic wave, no matter how complex, that after a certain time interval, known as the period, its form is a repetition of what has gone before. In the case of an exactly periodic wave the repetition is exact.
In the case of a nearly periodic wave (and syllabic rates in speech are so slow compared with voice frequencies of interest that every voiced speech wave is periodic or nearly periodic) the repetition is inexact and approximate, but nevertheless easily recognizable. Such repetition or near repetition of the waveform in successive periods holds good quite aside from the existence of physical energy at the fundamental frequency.
The invention turns these considerations to account by dividing the voice wave into two paths, delaying the energy in one with respect to that in the other by a controllable amount, comparing the delayed wave with the undelayed wave, varying the amount of delay until a best match is obtained, and noting the corresponding amount of delay, with the recognition that this amount of delay is, identically, the fundamental period. A period control current may then be derived which would serve as well as a pitch control current, the period and the pitch being reciprocals of each other. It is preferred, however, in order that presently known buzz source frequency control apparatus may be employed without change, to derive in the first instance a control current which is reciprocally related to the observed fundamental period of the voice; i.e., it is directly proportional to the voice pitch. This indirectly derived pitch control current is indistinguishable from the pitch control current of the prior art except in respect to its greater reliability, and may be employed in the presently known fashion.
There are in general two physical causes for inexactness of the match between the waveform of any fundamental period and that of the prior period. The first cause has to do with the fact that the length of the period changes from period to period in the course of inflection. Inflection rates in voiced speech sounds are so slow, compared with their fundamental frequencies, that this cause is responsible for only a negligible departure from exactness of match, provided only that the apparatus is capable of holding and tracking the fundamental period through its changes.
The second cause has to do with the fact that no [...]tion, therefore, and in addition to determining the delay for which the best match is obtained, there is also determined the degree to which this best match fails of perfection. From this determination a novel control current is derived. It is transmitted to the synthesizer station where it operates to control the relative amounts of energy supplied, respectively, by the buzz source and the hiss source to the filter bank of the spectrum synthesizer apparatus. Thus the switching operation as between these two sources, which characterizes the prior art, is completely dispensed with. The buzz source and the hiss source remain connected to the filter bank at all times and feed the filters of the bank with their respective energies in the proper amounts under control of this novel control current, and therefore in accordance with the distribution of energy in the original speech as between its periodic component and its aperiodic component.
It is well known that a quantitative measure of the degree to which any wave matches any other wave is found in the cross-correlation function of these two waves. When one of the waves in question is a delayed replica of the other, the function in question is known as the autocorrelation function. The present system employs autocorrelation techniques for the determination of the fundamental period.
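The delay-and-compare principle described above can be illustrated with a short discrete-time sketch. It is not the patent's circuit: it assumes a sampled signal at a hypothetical 8 kHz rate, measures the match by the average product of the wave and its delayed copy (an autocorrelation estimate), and searches a hypothetical lag range `min_lag` to `max_lag` that excludes zero delay, much as the selector of Fig. 2 ignores the zero-delay output. The second test wave deliberately contains no energy at its fundamental frequency.

```python
import math


def autocorrelation(x, lag):
    """Mean of x[n] * x[n - lag] over the samples where both are defined."""
    n = len(x)
    return sum(x[i] * x[i - lag] for i in range(lag, n)) / (n - lag)


def estimate_period(x, min_lag, max_lag):
    """Lag (in samples) at which the delayed wave best matches the
    undelayed wave, judged by the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))


def harmonic_wave(harmonics, period, n):
    """Sum of cosine harmonics k (amplitude 1/k, arbitrary phase k) of a
    fundamental whose period is `period` samples."""
    return [
        sum(math.cos(2 * math.pi * k * i / period + k) / k for k in harmonics)
        for i in range(n)
    ]


if __name__ == "__main__":
    fs = 8000                                              # assumed sampling rate, Hz
    full = harmonic_wave(range(1, 6), 64, 2048)            # 125 Hz voice, harmonics 1-5
    no_fundamental = harmonic_wave(range(2, 7), 64, 2048)  # harmonics 2-6 only
    for name, wave in (("full", full), ("no fundamental", no_fundamental)):
        lag = estimate_period(wave, 20, 120)
        print(name, lag, fs / lag)                         # expected: 64 125.0
```

Both records should report a lag of 64 samples (125 Hz), even though the second has no component at 125 Hz, in line with the remark that the repetition of the waveform holds good without physical energy at the fundamental.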
The invention will be fully apprehended from the following detailed description of a preferred illustrative embodiment thereof taken in connection with the appended drawings in which:
Fig. 1 is a block schematic diagram showing a vocoder transmission system embodying the invention;
Fig. 2 is a block schematic diagram showing apparatus for deriving a speech period control current and an energy distribution control current;
Fig. 3 is a schematic circuit diagram showing a divider for use in the combination of Fig. 2; and
Fig. 4 is a set of curves which are referred to in the explanation of the invention.
Referring now to the drawings and in particular to Fig. 1, speech currents, which may originate in a telephone instrument 1, are first passed through a unit which acts to hold them to an approximately constant level suitable for the operations of the remainder of the apparatus. It may be a voice-operated gain adjusting device 2, now commonly known as a vogad. The output of the vogad is applied in parallel to apparatus shown in the upper part of the figure, which derives the novel period control current and the novel energy distribution current, and to conventional spectrum analyzer apparatus shown in a broken line box 3 in the lower part of the figure. These energy paths, like all others subsequently to be described, are shown by single lines in the drawing merely in order to avoid complexity. It will be obvious to those skilled in the art at what points wire pairs or other complete circuits may be required in practice.
The novel apparatus shown in the upper left-hand part of the figure is described below in connection with Fig. 2, which shows its details. For the present it suffices to note that it comprises an autocorrelator 4 having a single input point and a number of output points 6-1 to 6-20, a maximum value selector 7 having a number of input points 6-2 to 6-20 and two output points 8, 9, and a divider 10 having two input points 9, 11 and a single output point 12. The number of output points 6 of the autocorrelator 4 exceeds by one the number of input points of the maximum value selector 7, and each of these, other than the first, is connected to one such input point. The first output point 6-1 of the autocorrelator 4 is connected directly to one input point 11 of the divider 10, while the other input point of the divider 10 is furnished by one of the two output points 9 of the maximum value selector 7. The remaining output point 8 of the maximum value selector 7 carries a signal which may be utilized at the receiver station without further change.
The autocorrelator 4 measures the autocorrelation ψ(τ) of the input signal for various discretely different amounts of delay τ, including zero, and delivers on each of its output points 6 a signal which is proportional to the autocorrelation for one such delay value. The maximum value selector 7 picks out the greatest of these, ψ(τm), and supplies it as one input to the divider 10. At the same time the maximum value selector 7 acts to tag or identify the delay value τm for which this greatest autocorrelation was obtained and to deliver an additional identifying signal for use without further change at the synthesizer station. As a matter of convenience the identifying signal is made inversely proportional to the identified delay and therefore directly proportional to its reciprocal. Because the delay for which the autocorrelation is greatest is equal to the fundamental period of the speech, its reciprocal, 1/τm, is equal to the pitch frequency.
The conventional spectrum analyzer 3 shown in the broken box in the lower part of Fig. 1 comprises a bank of band-pass filters 15 connected in parallel, each of which is proportioned to pass a preassigned sub-band of the voice frequency band of interest, while together they pass the entire band. For the sake of illustration, ten such filters are indicated, the first two and the last being shown. Each such filter 15 is followed by a detector 16 which in turn is followed by a low-pass filter 17. The control current output of each of these several low-pass filters 17 is thus a measure of the voice energy in that sub-band to which such low-pass filter is connected.
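One analyzer channel (band-pass filter 15, detector 16, low-pass filter 17) can be sketched in discrete time as follows. The filter designs are assumptions, not taken from the patent: an RBJ "Audio EQ Cookbook" band-pass biquad stands in for each filter 15, a full-wave rectifier for each detector 16, and a one-pole low-pass for each filter 17. The center frequencies, Q, smoothing cutoff and 8 kHz sampling rate are hypothetical parameters chosen only for illustration.

```python
import math


def bandpass(x, fs, f0, q):
    """RBJ cookbook band-pass biquad (0 dB peak gain) centered on f0."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0            # b1 is zero for this design
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def lowpass(x, fs, fc):
    """One-pole low-pass, standing in for a low-pass filter 17."""
    a = math.exp(-2 * math.pi * fc / fs)
    state, out = 0.0, []
    for xn in x:
        state = (1 - a) * xn + a * state
        out.append(state)
    return out


def channel_envelopes(x, fs, centers, q=4.0, smooth_hz=25.0):
    """Filter 15 -> detector 16 (full-wave rectifier) -> filter 17, per band."""
    return [lowpass([abs(v) for v in bandpass(x, fs, fc, q)], fs, smooth_hz)
            for fc in centers]


if __name__ == "__main__":
    fs = 8000                                   # assumed sampling rate, Hz
    tone = [math.sin(2 * math.pi * 500 * n / fs) for n in range(fs)]
    bands = [250, 500, 1000, 2000]              # hypothetical center frequencies
    for fc, env in zip(bands, channel_envelopes(tone, fs, bands)):
        print(fc, round(env[-1], 3))            # the 500 Hz channel dominates
```

Each output list is a slowly varying control signal of the kind carried by conductors 18, 19; a real analyzer would use ten contiguous bands covering the voice range.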
These spectrum control currents are transmitted by any desired means, indicated by conductors 18, 19, to a receiver station which comprises conventional synthesizing apparatus shown in the broken line box 20 at the lower right-hand part of the figure. This synthesizer comprises a number of filters 21 having their output terminals connected in parallel to a reproducer 22. These several filters are proportioned to exhibit transmission characteristics like those of the several analyzing filters 15. Shaping networks 23 precede the several filters 21. Each shaping network 23 is supplied, by way of a conductor 24, with locally generated energy from a hiss source 25 and a buzz source 26, while its transmission is modulated by the control current derived at the analyzer station by a corresponding one of the filters 15, 17 and transmitted over the intervening channels 18, 19. Aside from the fact that the locally generated energy supplied to the several shaping networks 23 is continuously a mixture of the outputs of a buzz source 26 and a hiss source 25 in varying proportions, this spectrum reconstruction apparatus is conventional.
Turning now to Fig. 2, the autocorrelator 4 itself is a variant of one shown in Bennett et al. Patent 2,676,206. It comprises a delay device 30 such as an electromagnetic transmission line. The latter may comprise, for example, forty similar sections of series inductance and shunt capacitance having twenty evenly spaced taps 31; i.e., one located at every other section. The line may be terminated in well known fashion for no reflection by a resistive load 32. In terms of propagation time along the transmission line, the spacing between each tap 31 and the next one may be 500 microseconds, the total delay for all 40 sections being thus 10,000 microseconds or 10 milliseconds.
It is well known that the autocorrelation of a wave for any particular value of delay τ is obtainable by multiplying the delayed wave by the undelayed wave and integrating the product. To this end each of the output taps 31 of the delay line 30 is connected, by way of a buffer amplifier 33, to one input point of a multiplier 34, while the undelayed wave is supplied from the input terminal 5 of the line to the remaining input points of all of these multipliers 34 in parallel. The products of the delayed signal by the undelayed signal thus formed by each of these multipliers for each particular value of delay τ, as represented by the locations of the several taps 31 along the transmission line 30, are now averaged in the time domain by an integrator 35. The output of each such integrator is thus the autocorrelation ψ(τ) of the signal for a particular delay τ. In general, the outputs of the several integrators 35 differ from each other in magnitude, some one being greater than the others. This greatest signal is a measure of the best match between the undelayed signal and one of the delayed signals, and the location along the transmission line 30 of the tap 31 on which it appears is a measure of the corresponding delay τm.
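A discrete-time sketch of the tapped-line autocorrelator of Fig. 2 follows. It keeps the figures given in the text (20 outputs, taps 500 microseconds apart, the first output at zero delay) but replaces the transmission line by a sample buffer at an assumed 8 kHz rate, and models each integrator 35 as a leaky (RC-style) averager with a hypothetical 20 ms time constant; the function name and parameters are illustrative only.

```python
import math
from collections import deque

FS = 8000               # assumed sampling rate, Hz
TAP_SPACING_S = 500e-6  # tap spacing given in the text
N_TAPS = 20             # outputs 6-1 .. 6-20; output 6-1 is zero delay


def tapped_autocorrelator(x, fs=FS, spacing_s=TAP_SPACING_S, n_taps=N_TAPS, tc_s=0.02):
    """Return psi(k * spacing) for k = 0 .. n_taps - 1 after processing x.

    Each tap output is multiplied by the undelayed input (multipliers 34)
    and averaged by a leaky integrator (stand-in for integrators 35)."""
    step = round(spacing_s * fs)                 # samples between adjacent taps
    length = step * (n_taps - 1) + 1
    line = deque([0.0] * length, maxlen=length)  # stand-in for delay line 30
    a = math.exp(-1.0 / (tc_s * fs))             # integrator decay per sample
    psi = [0.0] * n_taps
    for sample in x:
        line.appendleft(sample)                  # line[d] = input delayed by d samples
        for k in range(n_taps):
            product = sample * line[k * step]
            psi[k] = a * psi[k] + (1.0 - a) * product
    return psi


if __name__ == "__main__":
    period = 40  # 5 ms at 8 kHz (a 200 Hz voice), i.e. ten tap spacings
    x = [sum(math.sin(2 * math.pi * h * n / period) / h for h in range(1, 5))
         for n in range(4000)]
    psi = tapped_autocorrelator(x)
    best = max(range(1, N_TAPS), key=lambda k: psi[k])
    print("best tap:", best, "delay:", best * TAP_SPACING_S * 1000, "ms")
```

With these figures the taps cover delays from 0 to 9.5 ms, so the fundamental periods that can be recognized run from 0.5 ms to 9.5 ms.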
These several autocorrelation outputs ψ(τ1), ψ(τ2), ψ(τ3), and so forth, other than the first one ψ(τ0), are connected to the several input terminals of a maximum value selector 7. The latter may be a variant of one shown in Davis et al. Patent 2,646,465, and may comprise a plurality of amplifiers, e.g., vacuum tubes 40, of which all the cathodes are connected together and by way of a load resistor 41 to ground, while their anodes are connected by way of individual relay windings 42 to the positive terminal of a source 43 whose negative terminal is grounded. The several input terminals of the selector 7 are provided by the control grids of these amplifiers 40, which must of course be adjusted in their steady potentials by appropriate bias circuits not shown. Each relay 42 is provided with two pairs of contacts 44, 45, shown in their open positions. In each case the inner contact pair 44 acts, when pulled up, to apply the input signal directly to a first common bus 46, while the second contact pair 45 acts, when pulled up, to apply the potential of an individual battery 47 to a second common bus 48. The potential of each of the several batteries 47 differs from that of every other battery 47, and these potentials are selected in accordance with a pattern to be described subsequently.
In operation, let it be assumed that the match of the undelayed signal with the signal delayed by transmission over the line to the third tap 31-2 is more perfect than the match of the undelayed signal with the delayed signal as it appears at any other tap 31. In this case the signal applied to the grid of the second tube 40-2 of the selector 7 exceeds in magnitude the signal applied to the grid of any other tube 40 of the array. While conduction may well start in others of these tubes, the greater anode current of the second tube, flowing to ground by way of the cathode resistor 41, establishes a voltage drop across this resistor sufficient to raise the potentials of the cathodes of all of these tubes 40 to levels such that the conduction of all tubes other than the second is negligible. Since the second tube 40-2 is the only one which conducts in a significant amount, the contacts of the second relay 42-2 are pulled up and no others. By the closing of the inner contacts 44-2 a signal proportional to the autocorrelation ψ(τ2) for the second delay value τ2 is applied directly to the first common bus 46. By the closing of the outer contacts 45-2 the potential of the second battery 47-2 is applied directly to the second common bus 48.
Thus, under each of n different possible conditions, where n is the number of different values of delay τ which the autocorrelator 4 provides, only one of the tubes 40 conducts significantly, and one and only one relay contact pair 44, 45 is closed, thus placing on the first common bus 46 the magnitude of the autocorrelation ψ(τ) for a particular delay and, on the second common bus 48, the potential of a particular battery 47 which is uniquely associated with that delay. Furthermore the autocorrelation ψ(τ) and the corresponding delay τ thus selected are the maximum autocorrelation, ψ(τm), and the corresponding delay τm.
In order that conventional receiver apparatus may be employed with a minimum of change, the potentials of the several batteries 47 are selected in inverse proportion to the several values of delay τ with which they are associated. This is indicated on the output lead of the second common bus 48 by a legend showing that the signal thereon is the reciprocal of that delay τm for which the autocorrelation is a maximum. From what has been stated above this is evidently proportional to the fundamental frequency or pitch of the speech.
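The behavior of the maximum value selector 7 and its batteries 47 can be summarized in a few lines. This sketch assumes the autocorrelation outputs arrive as a list indexed like the taps (index 0 is zero delay and, as in Fig. 2, bypasses the selector) and represents each battery potential directly by the reciprocal of its tap delay, so that the second returned value is the pitch in hertz; the function name is hypothetical.

```python
def max_value_selector(psi, tap_delays_s):
    """Winner-take-all over taps 1 .. n-1 (tap 0, zero delay, is excluded).

    Returns (psi_max, reciprocal_delay): the value placed on bus 46 and the
    'battery' potential placed on bus 48, here taken as 1 / tau in hertz."""
    if len(psi) != len(tap_delays_s) or len(psi) < 2:
        raise ValueError("need one delay per tap and at least one nonzero tap")
    winner = max(range(1, len(psi)), key=lambda k: psi[k])
    return psi[winner], 1.0 / tap_delays_s[winner]


if __name__ == "__main__":
    delays = [k * 500e-6 for k in range(20)]   # 0, 0.5 ms, ..., 9.5 ms
    psi = [1.0] + [0.2] * 19                   # zero-delay value is always largest
    psi[10] = 0.9                              # best match at a 5 ms delay
    value, pitch_hz = max_value_selector(psi, delays)
    print(value, round(pitch_hz, 6))           # 0.9 200.0
```

Excluding the zero-delay output matters: ψ(τ0) is always the largest value, so including it would make the selector report a meaningless zero period.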
By employing a very large number of delay line taps 31 and a correspondingly large number of recognizing circuits in the maximum value selector 7 the variations of this signal may be made substantially continuous. With a smaller number, e.g., twenty of such circuits as contemplated, this output signal naturally contains somewhat abrupt transitions. These, however, may readily be eliminated by the interposition of a low-pass filter 49.
Similarly the changes, as operation proceeds, from recognition of one autocorrelation signal as the greatest to recognition of another one as the greatest are reflected in abrupt transitions in the magnitude of the signal on the first common bus 46. As before, these transitions may in principle be reduced as far as desired by the employment of a large number of delay line taps 31 in the autocorrelator 4 and a correspondingly large number of recognizing circuits in the selector. In practice, however, it is sufficient to interpose a low-pass filter 50.
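The stepwise output and its smoothing by a low-pass filter such as filter 49 can be illustrated numerically. Under assumed values (50 selector updates per second, a pitch period gliding from 5 ms to 6 ms, taps every 0.5 ms as in Fig. 2, and a one-pole low-pass with a hypothetical 3 Hz cutoff standing in for filter 49), the period is recognized only at the nearest tap, so 1/τm jumps in steps of roughly 15 to 18 Hz, which the filter spreads into a gradual change.

```python
import math

TAP_S = 500e-6  # tap spacing given in the text


def quantized_pitch(period_s):
    """Signal on bus 48 when the period is recognized at the nearest tap."""
    tap = max(1, round(period_s / TAP_S))
    return 1.0 / (tap * TAP_S)


def smooth(x, rate_hz, cutoff_hz):
    """One-pole low-pass standing in for filter 49 (or 50)."""
    a = math.exp(-2 * math.pi * cutoff_hz / rate_hz)
    state, out = x[0], []
    for v in x:
        state = (1 - a) * v + a * state
        out.append(state)
    return out


def largest_jump(x):
    """Largest change between consecutive samples."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))


if __name__ == "__main__":
    rate = 50                                   # assumed selector updates per second
    periods = [5e-3 + 1e-3 * i / (rate - 1) for i in range(rate)]  # 5 ms -> 6 ms glide
    raw = [quantized_pitch(p) for p in periods]
    filtered = smooth(raw, rate, cutoff_hz=3.0)
    print(round(largest_jump(raw), 2), round(largest_jump(filtered), 2))
```

The cutoff is a compromise: a lower cutoff hides the tap steps better but also slows the tracking of genuine inflection.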
The filtered signal from the first common bus 46 is applied to one of the two input points 9 of a divider 10, while the autocorrelation ψ(τ0) of the speech wave for zero delay is applied to the other input point 11 of this divider. The divider 10 itself may be any unit which forms on its output lead 12 the quotient of a signal applied to its upper input lead 9 divided by another signal applied to its lower input lead 11. Various constructions are possible, a suitable one being shown in Fig. 3, which carries out a logarithmic division operation by electronic means.
That is to say, it carries out the following operations:
(1) It takes the logarithm of the signal applied to its upper input lead;
(2) It takes the logarithm of the signal applied to its lower input lead;
(3) It subtracts the second result from the first result; and, finally,
(4) It takes the antilogarithm of the difference.
The logarithmic and exponential operations are performed by rectifier diodes, in which the current i and the applied voltage V are related by
$$ i = i_0 \, e^{qV/kT} \qquad (1) $$
where V is the applied voltage,
e is the base of Naperian logarithms,
T is the absolute temperature,
k is Boltzmann's constant,
q is the electronic charge,
i0 is a constant; i.e., the current which flows in the absence of any applied voltage V.
From this it follows that the voltage is proportional to the logarithm of the current; i.e.,
$$ V = \frac{kT}{q} \ln \frac{i}{i_0} \qquad (2) $$
These relations, known for some years to hold exactly in principle, have more recently been found to hold to a very good approximation in practice, especially with rectifiers of the P-N junction variety. See, for example, a note by Goucher, Pearson, Sparks, Teal and Shockley, published in the Physical Review for 1951, volume 81, No. 4, page 637.
These properties of P-N junction rectifier diodes are turned to account in the construction of the divider circuit of Fig. 3, which comprises two vacuum tube triodes 51, 52, the anode of each being connected to the positive terminal E++ of an operating potential source 53 and the cathode of each being connected by way of a cathode resistor to its negative terminal E−. These cathode resistors are of sufficient magnitude, as compared with the internal resistances of the tubes 51, 52, to provide substantial cathode follower action. One input signal V1, which is to serve as the dividend, appears at a first input terminal 54 and is applied by way of a resistor of magnitude R to the control grid of the left-hand tube 51. A second input signal V2, which is to serve as the divisor, appears at another input terminal 55 and is applied by way of another resistor, also of magnitude R, to the control grid of the right-hand tube 52. Each of these control grids is returned, by way of a P-N junction rectifier diode 56, 57, to a suitable point of intermediate potential of the source 53, here shown as ground. With these connections cathode current flows to each of the tubes 51, 52 in an amount sufficient to hold its control grid, in the absence of an input signal, at cutoff, while the input rectifiers 56, 57 are operated in their forward directions, so that their resistances are small compared with the resistors R to which they are connected. A third P-N junction rectifier diode 58 interconnects the two cathodes, and an output signal is derived across a load resistor in series with the anode of the right-hand tube 52; i.e., between the right-hand tube anode and ground.
The input diode current is in each case, therefore, proportional to the input signal voltage; i.e.,
$$ i_1 = \frac{V_1}{R} \qquad (3) $$
$$ i_2 = \frac{V_2}{R} \qquad (4) $$
Hence, from Equation 2, the grid voltages applied to the two tubes 51, 52 are given by
$$ V_3 = \frac{kT}{q} \ln \frac{V_1}{R\, i_0} \qquad (5) $$
and
$$ V_4 = \frac{kT}{q} \ln \frac{V_2}{R\, i_0'} \qquad (6) $$
where i0 and i0' are the currents which flow in the first diode 56 and in the second diode 57, respectively, when large reverse voltages are applied.
By virtue of the cathode follower action the cathode potentials V5, V6 follow the grid potentials; to a first approximation they are in turn given by
$$ V_5 = V_3 \qquad (7) $$
$$ V_6 = V_4 \qquad (8) $$
From elementary circuit considerations, the voltage drop across the third diode is equal to V5 − V6. From Equation 1 the current through the third diode 58 is an exponential function of the voltage drop across it; i.e.,
$$ i_3 = i_0'' \, e^{q(V_5 - V_6)/kT} \qquad (9) $$
where i0'' is the current which flows in the third diode 58 in the absence of an applied voltage. But when Equation 8 is subtracted from Equation 7, as required by the exponent in Equation 9, and Equations 5 and 6 are substituted, there results
$$ i_3 = i_0'' \, \frac{i_0'}{i_0} \, \frac{V_1}{V_2} \qquad (10) $$
that is to say, the current through the third diode 58 is proportional to the quotient of the dividend signal by the divisor signal.
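The chain from Equation 1 to Equation 10 can be checked numerically. This sketch uses the ideal diode law of Equation 1 at an assumed temperature of 300 K, treats the cathode followers as ideal (Equations 7 and 8), and uses hypothetical values for R and the diode constants i0, i0' and i0''; it shows that the third-diode current depends only on the ratio V1/V2 and on the diode constants.

```python
import math

K_B = 1.380649e-23        # Boltzmann's constant, J/K
Q_E = 1.602176634e-19     # electronic charge, C


def diode_voltage(i, i0, temp_k=300.0):
    """Equation 2: V = (kT/q) ln(i / i0)."""
    return K_B * temp_k / Q_E * math.log(i / i0)


def diode_current(v, i0, temp_k=300.0):
    """Equation 1: i = i0 exp(qV / kT)."""
    return i0 * math.exp(Q_E * v / (K_B * temp_k))


def log_divider(v1, v2, r=1e5, i0=1e-12, i0p=1e-12, i0pp=1e-12):
    """Third-diode current of the Fig. 3 divider for dividend V1 and divisor V2.

    r, i0, i0p (i0') and i0pp (i0'') are hypothetical component values."""
    v3 = diode_voltage(v1 / r, i0)       # grid of tube 51, Equations 3 and 5
    v4 = diode_voltage(v2 / r, i0p)      # grid of tube 52, Equations 4 and 6
    v5, v6 = v3, v4                      # ideal cathode followers, Equations 7 and 8
    return diode_current(v5 - v6, i0pp)  # Equation 9, which reduces to Equation 10


if __name__ == "__main__":
    for v1, v2 in ((0.6, 1.2), (6.0, 12.0), (1.5, 0.5)):
        print(v1, v2, log_divider(v1, v2) / 1e-12)  # equals V1 / V2 for these constants
```

Scaling both inputs by ten leaves the output unchanged, which is the normalizing property of the divider noted later in the description.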
It remains to extract an output proportional to this third diode current without upsetting the operation of the circuit. This may be done by the inclusion of the load resistor in series with the anode of the right-hand tube 52. Its magnitude is preferably substantially smaller than that of the other resistors discussed heretofore. With this arrangement the voltage V8 is evidently equal to the positive potential of the operating source 53, reduced by the voltage drop across this load resistor; i.e.,
From elementary circuit considerations, taking the magnitudes of currents and voltages as indicated on the drawing into account, this in turn is equal to In Equation 12 the first and second terms are constants and may be eliminated by standard well known means, while the third term is directly proportional to the desired quotient.
Hence, when the maximum autocorrelation signal ψ(τm) is applied to the left-hand input terminal 54 of Fig. 3 (the upper or dividend terminal of the divider 10 of Figs. 1 and 2) and the correlation signal ψ(τ0) for zero delay is applied to the right-hand input terminal 55 of Fig. 3 (the lower or divisor terminal of the divider 10 of Figs. 1 and 2), the required quotient appears at the output terminal 59 of the divider (the terminal 12 of Figs. 1 and 2), namely the ratio of the maximum correlation ψ(τm) to the zero delay correlation ψ(τ0).
The significance of these relations will readily be understood from a consideration of Fig. 4, wherein the curve A shows a representative correlation curve, normalized to unity value at zero delay, for a fully periodic signal of a certain average amount of complexity. This fully periodic signal may be termed f1(t) and its autocorrelation, curve A, may be termed φ1(τ). Curve A shows four maxima of equal heights located at delays of 0, τ, 2τ, and 3τ. The time function represented thus extends over at least three full periods, each one being exactly like the last. The fact that the curve A has the value unity for zero delay expresses the truism that the time function f1(t) is exactly like itself in the absence of delay. The fact that maxima occur in the autocorrelation curve A at intervals of τ reflects the fact that each full period of the time function f1(t) resembles its predecessor and that the delayed signal finds its best match with the undelayed signal for discrete values of the delay equal to τ, 2τ, 3τ and so on. The fact that these successive maxima rise to the same height as the maximum at zero delay reflects the fact that the successive periods of the time function f1(t) are exactly alike, so that for these values of delay the match is perfect.
Curve B of Fig. 4 shows the normalized autocorrelation φ2(τ) of an aperiodic signal f2(t), such as noise. It, too, has the value unity for zero delay, which reflects the truism that even a noise signal is an exact replica of itself in the absence of delay.
While it is not exactly true that the autocorrelation φ3(τ) of the sum of two signals is equal to the sum of the autocorrelations of the signals individually, it is sufficiently nearly true to permit a graphic representation such as that of curve C of Fig. 4, which shows the autocorrelation φ3(τ) of the time function or sum signal f3(t), the first part being a periodic function f1(t) and the second being an aperiodic function f2(t).
The mathematical formulation of the unnormalized autocorrelation of a time function f1(t) is well known and is stated in Bennett et al. Patent 2,676,206 and in Lee et al. Patent 2,643,819, as well as elsewhere. For the purposes of these patents and in other circumstances in which comparison of autocorrelations is not needed, normalization is unnecessary. In the present situation, on the other hand, normalization of the autocorrelation to the value unity at zero delay is of assistance in the actualization of further operations of the apparatus.
The normalized autocorrelation of a time function f(t) is given by the expression
$$ \phi(\tau) = \frac{\lim_{T\to\infty} \frac{1}{T}\int_0^T f(t)\, f(t-\tau)\, dt}{\lim_{T\to\infty} \frac{1}{T}\int_0^T f^2(t)\, dt} \qquad (13) $$
It is evidently independent of the magnitude of f(t), being dependent only on its form.
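Equation 13 can be evaluated on sampled data to reproduce the qualitative shapes of curves A and B of Fig. 4. This sketch is an assumption-laden stand-in for the infinite-time limits: it uses a circular (wrap-around) average over a record holding a whole number of periods, so an exactly periodic sequence gives unity at multiples of its period, while a seeded Gaussian noise record (a hypothetical stand-in for f2(t)) gives values near zero away from zero delay.

```python
import math
import random


def normalized_autocorrelation(x, lag):
    """Circular estimate of Equation 13: mean of x[n] x[n - lag] over mean of x[n]^2."""
    n = len(x)
    num = sum(x[i] * x[(i - lag) % n] for i in range(n))
    den = sum(v * v for v in x)
    return num / den


if __name__ == "__main__":
    period, n = 50, 1000                          # record holds 20 whole periods
    f1 = [math.sin(2 * math.pi * i / period) + 0.5 * math.sin(6 * math.pi * i / period + 1)
          for i in range(n)]                      # curve A: exactly periodic
    rng = random.Random(1)
    f2 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # curve B: aperiodic noise
    for lag in (0, 10, 50, 100, 150):
        print(lag, round(normalized_autocorrelation(f1, lag), 3),
              round(normalized_autocorrelation(f2, lag), 3))
```

Because the numerator and denominator scale together, multiplying either record by a constant leaves every printed value unchanged, which is the independence from magnitude stated above.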
When for the general time function f(t) there is substituted in Equation 13 the specific time function of interest, namely
$$ f_3(t) = f_1(t) + f_2(t) \qquad (14) $$
there results
$$ \phi_3(\tau) = \frac{\|f_1\|^2 \phi_1(\tau) + 2\|f_1\|\,\|f_2\|\,\phi_{12}(\tau) + \|f_2\|^2 \phi_2(\tau)}{\|f_1 + f_2\|^2} \qquad (15) $$
where ‖ ‖ indicates the rms value and φ12 is the crosscorrelation of f1 and f2.
To a good approximation,
$$ \|f_1 + f_2\|^2 = \|f_1\|^2 + \|f_2\|^2 \qquad (16) $$
for the reason that f1 is purely periodic and f2 is pure noise. Also, under these circumstances φ12(τ) is small. Hence, to a good approximation,
$$ \phi_3(\tau) = \frac{\|f_1\|^2}{\|f_1\|^2 + \|f_2\|^2}\,\phi_1(\tau) + \frac{\|f_2\|^2}{\|f_1\|^2 + \|f_2\|^2}\,\phi_2(\tau) \qquad (17) $$
From the foregoing it is plain that, taking the undelayed autocorrelation as having the value unity, the successive maxima of curve C (at which φ1(τ) is unity while φ2(τ) is nearly zero) rise to, or very close to, a height above the axis which is proportional to the ratio of the periodic energy in the speech to its entire energy. Looked at in another way, these maxima fall below the value unity by a distance which is proportional to the ratio of the aperiodic energy of the speech to its entire energy. Hence a measure of the amplitudes of these maxima, as compared with the amplitude of the autocorrelation curve for zero delay, constitutes a measure of the relative proportions of buzz energy and hiss energy, respectively, required for proper and realistic synthesis of artificial speech. This justifies the transmission of the output of the divider 10, derived as aforesaid, to the receiver station without further change.
Returning now to Fig. 1, the pitch control signal 1/τm is applied to the buzz source 26 to adjust its oscillation frequency in well known fashion and as shown, for example, in Reisz Patent 2,522,539. The energy ratio signal, namely, the divider output signal as it appears on the divider output terminal 12, is likewise transmitted to the receiver station where it is employed to control the relative amounts of buzz and hiss furnished to the spectrum reconstructor in the following fashion.
The output of the buzz source 26 is delivered by way of a variable gain amplifier 61 to a combination point 62. This amplifier 61 may be of a well known construction such as to furnish a gain proportional to the magnitude of a gain control signal applied to its control terminal 63. The incoming energy distribution signal, termed g(t) for short, is passed through a rooter 64 and applied to this control terminal 63. Hence the amplitude of the buzz component applied to the combination point 62 is proportional to the square root of g, and so to the amplitudes of the periodic components of the speech, while its energy is proportional to g.
The signal g(t) is also passed through a phase inverter 65, which converts it to -g(t), and then through a steady potential source 66 of one volt connected in series. Thus the signal on the lead 67 beyond the source is equal to 1 - g(t). This in turn is passed through a rooter 68 and applied to the gain control terminal 69 of a second variable gain amplifier 70, which is connected in tandem with the hiss source 25. At this point the gain control signal has the form $\sqrt{1-g(t)}$. Thus the energy which appears at the combination point 62 comprises an additive mixture of periodic energy from the buzz source 26 and aperiodic energy from the hiss source 25. The proportions are those called for by the energy distribution of the voice, as controlled by the output of the divider 10. The periodic energy of the buzz source 26 is tuned to the required pitch frequency, as stated above, by the first of the two outputs of the maximum value selector 7. This additive combination is now fed to the shaping networks 23 in parallel. There the slowly varying spectrum control currents control the magnitudes of the several frequency sub-bands which collectively constitute the synthesized signal.
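The receiver mixing just described can be sketched in discrete time under these assumptions:

- Both sources have unit mean square.
- `buzz_source`, `hiss_source` and `mix_excitation` are hypothetical names.
- The impulse train is only one possible buzz waveform.
- The 8 ms selected delay is an illustrative value.

Because the amplifier gains are √g and √(1 − g), the buzz and hiss powers at the combination point come out as g and 1 − g.

```python
import numpy as np

def buzz_source(f0, fs, n):
    """Hypothetical buzz source: unit impulses every fs/f0 samples, scaled to unit mean square."""
    period = int(round(fs / f0))
    pulses = np.zeros(n)
    pulses[::period] = 1.0
    return pulses / np.sqrt(np.mean(pulses ** 2))

def hiss_source(n, rng):
    """Hypothetical hiss source: white Gaussian noise scaled to unit mean square."""
    noise = rng.normal(size=n)
    return noise / np.sqrt(np.mean(noise ** 2))

def mix_excitation(g, buzz, hiss):
    """Gain sqrt(g) on the buzz (amplifier 61) and sqrt(1 - g) on the hiss (amplifier 70)."""
    g = float(np.clip(g, 0.0, 1.0))
    buzz_part = np.sqrt(g) * buzz
    hiss_part = np.sqrt(1.0 - g) * hiss
    return buzz_part + hiss_part, buzz_part, hiss_part   # sum = combination point 62

fs, n = 8000, 8000          # assumed sampling rate (Hz) and length (samples)
tau_m = 0.008               # assumed selected delay of 8 ms
rng = np.random.default_rng(1)
buzz = buzz_source(1.0 / tau_m, fs, n)   # tuned by the pitch control signal 1/tau_m (125 Hz)
hiss = hiss_source(n, rng)
excitation, buzz_part, hiss_part = mix_excitation(0.7, buzz, hiss)
print(np.mean(buzz_part ** 2), np.mean(hiss_part ** 2))   # approximately 0.7 and 0.3
```

Square-rooting g before it reaches the amplifiers is what makes the power split linear in g. Applying g directly as a gain would make the power split quadratic in g.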
Because of the nature of the operation of the divider 10, its output is unchanged when its two inputs are increased or reduced in the same ratio. To this extent it supplies an output which is a measure of the relative magnitudes of its two inputs and is otherwise independent of such magnitudes. In other words, it carries out a normalizing function in addition to its assigned dividing function. Provided such normalization as between the autocorrelation signal for zero delay and the maximum autocorrelation signal is otherwise obtained, the energy distribution signal g(t), or a near equivalent thereof, can readily be derived by a subtraction process.
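Claim 10 recites a divider built from elements with exponential current-voltage characteristics. The sketch below assumes divider 10 is of that kind; the passage above does not say so. Its other assumptions are:

- Each element follows the idealized law I = I_S·exp(V/V_T), ignoring the -1 term of a real diode.
- I_S, V_T and the function names are illustrative, not from the source.

Feeding the same pair of autocorrelation values at two levels shows the normalizing property: the quotient does not change.

```python
import math

I_S = 1e-12   # assumed saturation current of each exponential element, amperes
V_T = 0.025   # assumed thermal voltage, volts

def element_voltage(current):
    """Exponential element driven by a current: I = I_S*exp(V/V_T), so V = V_T*ln(I/I_S)."""
    return V_T * math.log(current / I_S)

def element_current(voltage):
    """Third element: turns the voltage difference back into an exponentially related current."""
    return I_S * math.exp(voltage / V_T)

def log_antilog_divide(dividend, divisor):
    """Subtract the two logarithmic voltages, then exponentiate to obtain the quotient."""
    difference = element_voltage(dividend) - element_voltage(divisor)
    return element_current(difference) / I_S   # remove the device scale factor I_S

# The same sound at two levels: maximum autocorrelation over zero-delay value (as currents).
loud = log_antilog_divide(1.4e-6, 2.0e-6)
quiet = log_antilog_divide(0.14e-6, 0.2e-6)
print(loud, quiet)   # both approximately 0.7
```

The scale factor cancels in the subtraction of logarithms, which is why the circuit normalizes as well as divides.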
What is claimed is:
1. In a system for artificial production of speech, the combination which comprises a speech analyzer station having means for deriving from a speech sound a pitch control signal, an energy distribution signal which is representative, for each sound, of its proportional content of periodic energy, and a plurality of spectrum control signals, and a reproducer station having a buzz source, a hiss source and a spectrum synthesizer, means for transmitting all of said control signals to said reproducer station, means for tuning the buzz source under control of the pitch control signal, means controlled by said energy distribution signal for mixing the outputs of said sources in proportions determined by the magnitude of said energy distribution signal, means for applying said mixed outputs to said spectrum synthesizer, and means for controlling said spectrum synthesizer under control of said spectrum control signals.
2. In a system for deriving control signals to control the artificial production of speech, means for analyzing a speech sound, means for deriving from said analysis a pitch control signal representative of the fundamental frequency of said speech sound, means for deriving from said analysis a plurality of spectrum control signals, each representative of the speech energy falling within one of a plurality of frequency sub-bands which collectively embrace the frequency band of said speech sound, means for deriving from said analysis a first measure of the entire energy of said speech sound, means for deriving from said analysis a second measure of the energy of the periodic components of said speech sound, and means for deriving from said two measures an additional signal representative of the distribution of the energy of said speech sound as between its periodic components and its aperiodic components.
3. Apparatus as defined in claim 1 wherein said means for deriving said pitch control signal comprises an autocorrelator and a maximum value selector.
4. Apparatus as defined in claim 1 wherein said means for deriving said pitch control signal comprises means for determining the autocorrelation of said speech sound for various delays, and means for selecting from among said various delays that one for which said autocorrelation as determined is a maximum.
5. In combination with apparatus as defined in claim 4, means for deriving a signal which is substantially inversely proportional to said selected delay.
6. In a system for deriving control signals to control the artificial production of speech, means for analyzing a speech wave, means for deriving from said analysis a pitch control signal representative of the fundamental frequency of said speech wave, means for deriving from said analysis a plurality of spectrum control signals representative of the speech energy falling within a plurality of frequency sub-bands which collectively embrace the frequency band of said speech wave, means for deriving from said analysis a first measure of the entire energy of said speech wave, means for deriving from said analysis a second measure of the energy of the periodic components of said speech wave, and means for deriving from said two measures an additional signal representative of the distribution of the energy of said speech wave as between its periodic components and its aperiodic components, said pitch control signal deriving means comprising means for delaying said wave by each of a plurality of different time lags distributed over a range extending substantially from zero to the longest period of said speech wave, means for individually comparing each delayed wave singly with the original undelayed wave, means for identifying that one of said different time lags for which the delayed wave most nearly resembles the undelayed wave as determined by said comparison, means for deriving an auxiliary signal proportional to the duration of said identified time lag, and means for reciprocating said auxiliary signal to provide said pitch control signal.
7. Apparatus for deriving a desired control signal indicative of the fundamental frequency of a complex signal wave which comprises means for delaying said wave by each of a plurality of different time lags distributed over a range extending substantially from zero to the longest period of said complex signal wave, means for individually comparing each delayed wave singly with the original undelayed wave, means for identifying that one of said different time lags for which the delayed wave most nearly resembles the undelayed wave as determined by said comparison, means for deriving an auxiliary signal proportional to the duration of said identified time lag, and means for reciprocating said auxiliary signal to provide said desired control signal.
8. In a system for artificial production of speech sounds, a tunable source of periodic energy, a source of aperiodic energy, means for tuning said periodic energy source under control of a pitch control signal, means controlled by an energy distribution signal which is representative, for each sound to be reproduced, of its proportional content of periodic energy for mixing the outputs of said two sources in proportions determined by the magnitude of said energy distribution signal, a plurality of filters having passbands contiguously located on the frequency scale and together embracing the frequency band of said speech sound, means for applying said mixed outputs to all of said filters, means for variably attenuating the energy path of each of said filters under control of a spectrum control signal, and means for reproducing the outputs of all of said filters as an artificial sound.
9. Apparatus for selecting from among a plurality of signals that one which has the greatest value which comprises a plurality of discharge devices each having an anode, a cathode and a control electrode, a first connection extending from all of said anodes to one terminal of an operating potential source, a second connection extending from all of said cathodes to one terminal of a common impedance element, a third connection extending from the other terminal of said common impedance element to the other terminal of said operating potential source, a like plurality of relays each having a winding connected in the anode circuit of one of said discharge devices, each of said relays being provided with two contact pairs, means for applying said input signals individually to the control electrodes of said discharge devices, a first common bus, individual energy paths extending from said first common bus through the first contact pair of each relay to the control electrode of the device in whose anode circuit said relay is connected, a second common bus, a like plurality of auxiliary potential sources, of successively greater potentials, associated respectively with the several relays, and an energy path extending from said second common bus through the second contact pair of each relay to its associated auxiliary source.
10. Apparatus having a first input point, a second input point and an output point for delivering at said output point a signal proportional to the quotient of a dividend signal by a divisor signal, of which the dividend signal is applied to said first input point while the divisor signal is applied to said second input point, which comprises a first element having an exponential current-voltage characteristic connected to the first input point, a second element having a similar exponential current-voltage characteristic connected to the second input point, means for applying said dividend and divisor signals as currents to said first and second input points, respectively, thereby to produce a first voltage which is proportional to the logarithm of said first input signal and a second voltage which is proportional to the logarithm of said second input signal, means for subtracting said second voltage from said first voltage, and a third element having a similar exponential current-voltage characteristic for deriving from the difference of said first and second voltages a current which is exponentially related to said voltage difference and hence directly related to the quotient of said dividend signal by said divisor signal.
11. In combination with a source of a speech wave, apparatus for continuously determining the continuously varying fundamental period of said speech wave which comprises means for delaying said wave by each of a plurality of different time lags distributed over a range extending substantially from zero to the longest period of said speech wave, means for individually comparing each delayed wave singly with the original undelayed wave, means for identifying that one of said different time lags for which the delayed wave most closely resembles the undelayed wave as determined by said comparison, means for rejecting all others of said time lags, means for continually altering the time lag identified to preserve said closest resemblance as said fundamental period changes, and means for developing a signal continuously representative of said varying identified time lag.
12. Apparatus as defined in claim 11 wherein said comparing means comprises means for determining the autocorrelation of said speech wave for various delays.
13. Apparatus as defined in claim 12 wherein said identifying means comprises means for selecting that value of the time lag for which said autocorrelation as determined is a maximum.
14. Apparatus as defined in claim 11 wherein said comparing means comprises means for multiplying the de[…]est speech wave period, means for comparing each of said replicas with the undelayed wave, means for identifying the replica which most closely matches the original wave as determined by said comparison, means for rejecting all others of said replicas, whereby the delay characterizing the replica thus identified at each moment is equal to the length of said fundamental period at that moment, means for continually altering said identification to preserve said closest match as said fundamental period changes, and means for developing a signal continuously representative of the delay characterizing the replica momentarily identified.
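Claim 2's analyzer outputs can be sketched per frame under these assumptions:

- An FFT partition into contiguous bands stands in for the analyzer's band-pass filters.
- The second measure is taken as the mean lagged product at the pitch lag, and the first measure as the mean square.
- The pitch lag is supplied as given.
- `spectrum_control_signals`, `energy_distribution`, the band edges and the test tone are all illustrative.

For a pure tone, nearly all band energy falls in one band and the energy distribution signal is close to 1.

```python
import numpy as np

def spectrum_control_signals(frame, fs, edges):
    """Energy within each contiguous sub-band [edges[i], edges[i+1]) Hz (FFT stand-in for filters)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

def energy_distribution(frame, pitch_lag):
    """Second measure (periodic energy) over first measure (entire energy)."""
    n = len(frame)
    entire = np.dot(frame, frame) / n
    periodic = np.dot(frame[: n - pitch_lag], frame[pitch_lag:]) / (n - pitch_lag)
    return periodic / entire

fs = 8000                                                  # assumed sampling rate, Hz
edges = np.concatenate(([0.0], np.arange(250.0, 4500.0, 500.0)))   # 9 bands up to 4250 Hz
frame = np.sin(2 * np.pi * 1000 * np.arange(800) / fs)     # 1000 Hz tone, period 8 samples
bands = spectrum_control_signals(frame, fs, edges)
g = energy_distribution(frame, pitch_lag=8)
print(int(np.argmax(bands)), round(float(g), 6))           # band 2 (750 to 1250 Hz), g near 1
```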
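The delay-compare-select-reciprocate chain of claim 7 (and claims 12 and 13) can be sketched on sampled data. This is a minimal version, assuming the comparison is an autocorrelation (mean product of delayed and undelayed wave).

The search starts at a hypothetical `min_lag` rather than zero. Very short delays trivially resemble the undelayed wave, and the practical cut-off is an assumption, not taken from the source. The function name and the test signal are also illustrative.

```python
import numpy as np

def fundamental_frequency(x, fs, min_lag, max_lag):
    """Delay, compare, pick the best lag, then reciprocate the lag to get a frequency."""
    n = len(x)
    lags = np.arange(min_lag, max_lag + 1)
    # Resemblance of each delayed copy to the undelayed wave: mean product (autocorrelation).
    scores = np.array([np.dot(x[: n - lag], x[lag:]) / (n - lag) for lag in lags])
    best_lag = int(lags[np.argmax(scores)])   # auxiliary signal: duration of the identified lag
    return fs / best_lag                      # reciprocating gives the pitch control signal

fs = 8000                                     # assumed sampling rate, Hz
k = np.arange(2000)
x = np.sin(2 * np.pi * 100 * k / fs) + 0.5 * np.sin(2 * np.pi * 200 * k / fs)
print(fundamental_frequency(x, fs, min_lag=20, max_lag=120))   # 100.0
```

Keeping `max_lag` below twice the expected period avoids a tie between the first period and its multiple.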
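Claim 8's filter bank can be sketched as a partition of the excitation's spectrum into contiguous bands. Each band is attenuated by its own spectrum control signal, and the bands are summed.

Ideal FFT-domain bands stand in for the analog filters, which is an assumption. `synthesize` and the band edges are illustrative. With every gain at 1 the complete partition should return the excitation unchanged.

```python
import numpy as np

def synthesize(excitation, fs, edges, band_gains):
    """Filter the excitation into contiguous bands, attenuate each by its gain, and sum."""
    n = len(excitation)
    spectrum = np.fft.rfft(excitation)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    output = np.zeros(n)
    for lo, hi, gain in zip(edges[:-1], edges[1:], band_gains):
        in_band = (freqs >= lo) & (freqs < hi)              # ideal band-pass "filter"
        band = np.where(in_band, spectrum, 0.0)
        output += gain * np.fft.irfft(band, n=n)            # variably attenuated band output
    return output

fs = 8000                                                   # assumed sampling rate, Hz
edges = np.concatenate(([0.0], np.arange(250.0, 4500.0, 500.0)))   # 9 contiguous bands
rng = np.random.default_rng(2)
excitation = rng.normal(size=1024)                          # stand-in for the mixed buzz/hiss
flat = synthesize(excitation, fs, edges, np.ones(len(edges) - 1))
low_only = synthesize(excitation, fs, edges, [1.0] + [0.0] * (len(edges) - 2))
print(np.allclose(flat, excitation))                        # True
```

A real vocoder updates the gains slowly over time. This sketch holds them fixed for one block.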
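Claim 9's maximum value selector can be modelled behaviourally; this is not a circuit simulation. The model assumes the shared cathode impedance lets only the device with the greatest control-electrode signal conduct enough to pull in its relay.

That relay then connects the first bus to the winning input and the second bus to its auxiliary potential. The example pairs each input with a delay and chooses potentials proportional to those delays. This is a hypothetical choice, but it suggests how the second bus could carry a signal proportional to the identified time lag.

```python
def max_value_selector(inputs, aux_potentials):
    """Behavioural model: the relay of the most strongly driven device closes both contact pairs."""
    if len(inputs) != len(aux_potentials):
        raise ValueError("need one auxiliary potential per input")
    winner = max(range(len(inputs)), key=lambda i: inputs[i])   # ties resolve to the first
    first_bus = inputs[winner]            # first contact pair: the greatest input itself
    second_bus = aux_potentials[winner]   # second contact pair: that relay's auxiliary potential
    return first_bus, second_bus

# Hypothetical inputs: autocorrelation values at delays of 2, 4, 6, 8 and 10 ms, with
# successively greater auxiliary potentials (volts) chosen proportional to those delays.
autocorr = [0.31, 0.12, 0.44, 0.83, 0.52]
potentials = [2.0, 4.0, 6.0, 8.0, 10.0]
print(max_value_selector(autocorr, potentials))   # (0.83, 8.0)
```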
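Claim 11's continuous tracking can be approximated frame by frame. This is one possible reading, not necessarily the patent's circuit.

The first frame searches the full lag range. Later frames search only a few samples around the previously identified lag, so the identification follows a drifting period while other lags are rejected.

The following are illustrative:

- the neighbourhood width `follow`;
- the helper names;
- the impulse-train test signal.

```python
import numpy as np

def best_lag(frame, lags):
    """Lag whose delayed copy most closely resembles the frame (largest mean product)."""
    n = len(frame)
    scores = [np.dot(frame[: n - lag], frame[lag:]) / (n - lag) for lag in lags]
    return int(lags[int(np.argmax(scores))])

def track_period(x, frame_len, hop, min_lag, max_lag, follow=3):
    """Frame-by-frame period track that follows the previously identified lag."""
    lag = None
    track = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        if lag is None:
            lags = np.arange(min_lag, max_lag + 1)             # initial search, full range
        else:
            lags = np.arange(max(min_lag, lag - follow),      # search only near the last lag
                             min(max_lag, lag + follow) + 1)
        lag = best_lag(frame, lags)
        track.append(lag)
    return track

def pulse_train(period, n):
    """Hypothetical test signal: unit impulses every `period` samples."""
    x = np.zeros(n)
    x[::period] = 1.0
    return x

# Period drifts 80 -> 83 samples; each 1920-sample segment holds three 640-sample frames.
x = np.concatenate([pulse_train(p, 1920) for p in (80, 81, 82, 83)])
track = track_period(x, frame_len=640, hop=640, min_lag=60, max_lag=100)
print(track)   # [80, 80, 80, 81, 81, 81, 82, 82, 82, 83, 83, 83]
```

Restricting the search to a neighbourhood also avoids jumping to a multiple of the period, at the cost of losing track if the pitch changes faster than `follow` samples per frame.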
References Cited in the file of this patent
UNITED STATES PATENTS
2,098,956 Dudley Nov. 16, 1937
2,243,526 Dudley May 27, 1941
2,401,405 Bedford June 4, 1946
2,508,620 Peterson May 23, 1950
2,580,421 Guanella Jan. 1, 1952
2,705,742 Miller Apr. 5, 1955
US463467A 1954-10-20 1954-10-20 Voice pitch determination Expired - Lifetime US2908761A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US463467A US2908761A (en) 1954-10-20 1954-10-20 Voice pitch determination
GB28214/57A GB796677A (en) 1954-10-20 1955-10-07 Improvements in or relating to circuits for the analysis of speech currents
GB28633/55A GB796676A (en) 1954-10-20 1955-10-07 Improvements in or relating to circuits for the analysis and synthesis of speech currents

Publications (1)

Publication Number Publication Date
US2908761A true US2908761A (en) 1959-10-13

Country Status (2)

Country Link
US (1) US2908761A (en)
GB (2) GB796676A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3740476A (en) * 1971-07-09 1973-06-19 Bell Telephone Labor Inc Speech signal pitch detector using prediction error data
GB2139052A (en) * 1983-04-20 1984-10-31 Philips Electronic Associated Apparatus for distinguishing between speech and certain other signals

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2098956A (en) * 1935-10-30 1937-11-16 Bell Telephone Labor Inc Signaling system
US2243526A (en) * 1940-03-16 1941-05-27 Bell Telephone Labor Inc Production of artificial speech
US2401405A (en) * 1944-05-20 1946-06-04 Rca Corp Method of and means for synchronizing wave generators
US2508620A (en) * 1944-11-09 1950-05-23 Rca Corp Multiplex pulse communication system
US2580421A (en) * 1944-12-23 1952-01-01 Radio Patents Corp Cross-talk compensation in pulse multiplex system
US2705742A (en) * 1951-09-15 1955-04-05 Bell Telephone Labor Inc High speed continuous spectrum analysis

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3036268A (en) * 1958-01-10 1962-05-22 Caldwell P Smith Detection of relative distribution patterns
US3078345A (en) * 1958-07-31 1963-02-19 Melpar Inc Speech compression systems
US3094692A (en) * 1958-10-01 1963-06-18 Electro Mechanical Res Inc Statistical telemetering
US3090837A (en) * 1959-04-29 1963-05-21 Ibm Speech bandwidth compression system
US3083338A (en) * 1959-11-10 1963-03-26 Crosby Lab Inc Speech communication system
US3102236A (en) * 1960-05-05 1963-08-27 Collins Radio Co Squelch circuit controlled by demodulated voice signal
US3069507A (en) * 1960-08-09 1962-12-18 Bell Telephone Labor Inc Autocorrelation vocoder
US3109070A (en) * 1960-08-09 1963-10-29 Bell Telephone Labor Inc Pitch synchronous autocorrelation vocoder
US3036775A (en) * 1960-08-11 1962-05-29 Ibm Function generators
US3102928A (en) * 1960-12-23 1963-09-03 Bell Telephone Labor Inc Vocoder excitation generator
US3269921A (en) * 1961-07-03 1966-08-30 Phillips Petroleum Co Computing and controlling the enthalpy of a process stream
US3211832A (en) * 1961-08-28 1965-10-12 Rca Corp Processing apparatus utilizing simulated neurons
US3359409A (en) * 1963-04-09 1967-12-19 Hugh L Dryden Correlation function apparatus
US3405237A (en) * 1965-06-01 1968-10-08 Bell Telephone Labor Inc Apparatus for determining the periodicity and aperiodicity of a complex wave
US3321582A (en) * 1965-12-09 1967-05-23 Bell Telephone Labor Inc Wave analyzer
US3947638A (en) * 1975-02-18 1976-03-30 The United States Of America As Represented By The Secretary Of The Army Pitch analyzer using log-tapped delay line
US4004096A (en) * 1975-02-18 1977-01-18 The United States Of America As Represented By The Secretary Of The Army Process for extracting pitch information
US4015088A (en) * 1975-10-31 1977-03-29 Bell Telephone Laboratories, Incorporated Real-time speech analyzer
US4355204A (en) * 1979-11-09 1982-10-19 U.S. Philips Corporation Speech synthesizing arrangement having at least two distortion circuits
US4589131A (en) * 1981-09-24 1986-05-13 Gretag Aktiengesellschaft Voiced/unvoiced decision using sequential decisions
US4591673A (en) * 1982-05-10 1986-05-27 Lee Lin Shan Frequency or time domain speech scrambling technique and system which does not require any frame synchronization
US4791671A (en) * 1984-02-22 1988-12-13 U.S. Philips Corporation System for analyzing human speech
US4945567A (en) * 1984-03-06 1990-07-31 Nec Corporation Method and apparatus for speech-band signal coding
US5471527A (en) * 1993-12-02 1995-11-28 Dsc Communications Corporation Voice enhancement system and method
US20030088401A1 (en) * 2001-10-26 2003-05-08 Terez Dmitry Edward Methods and apparatus for pitch determination
US7124075B2 (en) 2001-10-26 2006-10-17 Dmitry Edward Terez Methods and apparatus for pitch determination

Also Published As

Publication number Publication date
GB796676A (en) 1958-06-18
GB796677A (en) 1958-06-18
