
FIG. 1-2 [block diagram; legible block labels: compute adaptive codebook target; select adaptive codebook filter (filter flag index); compute adaptive codebook contribution; find best innovation (code index); quantize gain (gain vector index); compute excitation; update filter memories; in the 23.85 kb/s mode, compute highband gain (highband gain index)]

U.S. Patent    Apr. 17, 2012    US 8,160,872 B2


[Figure: layered CELP synthesis. In the core layer and in each of enhancement layers 2 through N, a pitch lag and a code selecting an entry of that layer's codebook (innovations) form an excitation, which drives LP synthesis to produce synthetic speech.]


405

410 MAXIMIZE CORRELATION BETWEEN CURRENT SUB-FRAME AND PAST LP RESIDUAL TO GENERATE PITCH LAG ESTIMATE

415 USE PITCH LAG ESTIMATE TO PERFORM CLOSED-LOOP SEARCH FOR PITCH LAG

420 APPLY PITCH LAG SELECTED VIA CLOSED-LOOP SEARCH TO ADAPTIVE CODEBOOK

425 UPDATE QUANTIZATION TARGET BY SUBTRACTING SCALED ADAPTIVE CODEBOOK ENTRY CORRESPONDING TO THE SELECTED PITCH LAG

430 SEARCH FIXED CODEBOOK BASED ON THE UPDATED QUANTIZATION TARGET

435 PERFORM A JOINT CLOSED-LOOP GAIN QUANTIZATION

440 UPDATE PAST QUANTIZED LP EXCITATION BUFFER

445

FIG. 4
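By way of illustration only, step 410 of FIG. 4 may be sketched in Python as follows. The function name, its parameters, and the normalization of the correlation by the energy of the delayed segment are assumptions made for this sketch; the figure itself states only that a correlation between the current sub-frame and the past LP residual is maximized. Only integer lags are considered.

```python
import math


def open_loop_pitch_lag(residual, start, length, min_lag, max_lag):
    """Estimate a pitch lag for the sub-frame residual[start:start+length].

    The lag in [min_lag, max_lag] that maximizes the normalized correlation
    between the current sub-frame and the residual delayed by that lag is
    returned. The caller must ensure start >= max_lag so that every delayed
    sample exists. Ties resolve to the shortest lag.
    """
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        corr = sum(residual[start + n] * residual[start + n - lag]
                   for n in range(length))
        energy = sum(residual[start + n - lag] ** 2 for n in range(length))
        # Normalization (an assumption of this sketch) keeps high-energy
        # regions of the past residual from dominating the search.
        score = corr / math.sqrt(energy) if energy > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

The estimate returned here would then seed the closed-loop search of step 415, which is typically restricted to lags near the estimate.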


505

510 USE SECOND ADAPTIVE CODEBOOK POPULATED WITH PAST QUANTIZED LP EXCITATION TO SELECT PITCH LAG ESTIMATE

515 PERFORM CLOSED-LOOP PITCH-LAG SEARCH WITH THE PITCH LAG ESTIMATE

520 APPLY PITCH LAG SELECTED VIA THE CLOSED-LOOP SEARCH TO FIRST ADAPTIVE CODEBOOK

525 UPDATE QUANTIZATION TARGET BY SUBTRACTING SCALED FIRST ADAPTIVE CODEBOOK ENTRY CORRESPONDING TO THE SELECTED PITCH LAG

530 SEARCH FIXED CODEBOOK BASED ON THE UPDATED QUANTIZATION TARGET

535 PERFORM A JOINT CLOSED-LOOP GAIN QUANTIZATION

540 UPDATE PAST QUANTIZED LP EXCITATION BUFFER

545 PERFORM JOINT EVALUATION OF ADAPTIVE AND FIXED-CODEBOOK OPTIMAL GAINS

550 UPDATE ADDITIONAL SIGNAL BUFFER WITH THE CORRESPONDING CODEBOOK CONTRIBUTIONS SCALED BY THE OPTIMAL GAINS

555

FIG. 5
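By way of illustration only, steps 545 and 550 of FIG. 5 may be sketched in Python as follows. The figure names the steps but not the criterion, so this sketch assumes that the optimal (unquantized) gains are the joint least-squares solution for a target vector and two codebook contributions. The function names and arguments are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def joint_optimal_gains(target, y, z):
    """Return (g_p, g_c) minimizing sum((target - g_p*y - g_c*z)**2).

    y and z stand for the adaptive- and fixed-codebook contributions
    (assumed here to be in the same domain as the target). The 2x2 normal
    equations are solved by Cramer's rule.
    """
    yy, zz, yz = dot(y, y), dot(z, z), dot(y, z)
    xy, xz = dot(target, y), dot(target, z)
    det = yy * zz - yz * yz
    if abs(det) < 1e-12:
        raise ValueError("contributions are (nearly) collinear")
    g_p = (xy * zz - xz * yz) / det
    g_c = (xz * yy - xy * yz) / det
    return g_p, g_c


def update_signal_buffer(buffer, v, c, g_p, g_c):
    """Append the contributions scaled by the optimal gains (step 550)."""
    buffer.extend(g_p * vn + g_c * cn for vn, cn in zip(v, c))
    return buffer
```

Under this assumption, the additional buffer of step 550 holds an excitation built with unquantized gains, while the buffer of step 540 holds the excitation built with the quantized gains of step 535.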


METHOD AND APPARATUS FOR LAYERED CODE-EXCITED LINEAR PREDICTION SPEECH UTILIZING LINEAR PREDICTION EXCITATION CORRESPONDING TO OPTIMAL GAINS

CROSS-REFERENCE TO PROVISIONAL APPLICATION

This application claims the benefit of U.S. Provisional Application Ser. No. 60/910,343, filed by Stachurski on Apr. 5, 2007, entitled “CELP System and Method,” commonly assigned with the invention and incorporated herein by reference. Co-pending U.S. patent application Ser. Nos. 11/279,932, filed by Stachurski on Apr. 17, 2006, entitled “Layered CELP System and Method,” and [TI-64406], filed by Stachurski on even date herewith, entitled “Layered Code-Excited Linear Prediction Speech Encoder and Decoder Having Plural Codebook Contributions in Enhancement Layers Thereof and Methods of Layered CELP Encoding and Decoding,” both commonly assigned with the invention and incorporated herein by reference, disclose related subject matter.

TECHNICAL FIELD OF THE INVENTION

The invention is directed, in general, to electronic devices and digital signal processing and, more specifically, to a layered code-excited linear prediction (CELP) speech encoder and decoder having plural codebook contributions in enhancement layers thereof and methods of layered CELP encoding and decoding that employ the contributions.

BACKGROUND OF THE INVENTION

The performance of digital speech systems using low bit rates has become increasingly important with current and foreseeable digital communications. Both dedicated channel and packetized voice-over-internet protocol (VoIP) transmission benefit from compression of speech signals. The widely used linear prediction (LP) digital speech coding method (see, e.g., Schroeder, et al., “Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates,” in Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, (Tampa), pp. 937-940, March 1985) models the vocal tract as a time-varying filter and a time-varying excitation of the filter to mimic human speech. Linear prediction analysis determines linear prediction (LP) coefficients a(j), j=1, 2, . . . , M, for an input frame of digital speech samples {s(n)} by setting:

$$r(n) = s(n) - \sum_{M \ge j \ge 1} a(j)\, s(n-j) \qquad (1)$$

and minimizing $\sum_{\text{frame}} r(n)^2$. Typically, M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples s(n) is typically taken to be 8 kHz (the same as the public switched telephone network, or PSTN, sampling for digital transmission and which corresponds to a voiceband of about 0.3-3.4 kHz); and the number of samples {s(n)} in a frame is often 80 or 160 (10 or 20 ms frames). Various windowing operations may be applied to the samples of the input speech frame. The name “linear prediction” arises from the interpretation of the residual $r(n) = s(n) - \sum_{M \ge j \ge 1} a(j)\, s(n-j)$ as the error in predicting s(n) by a linear combination of preceding speech samples $\sum_{M \ge j \ge 1} a(j)\, s(n-j)$; that is, a linear autoregression. Thus minimizing $\sum_{\text{frame}} r(n)^2$ yields the {a(j)} which furnish the best linear prediction. The coefficients {a(j)} may be converted to line spectral frequencies (LSFs) or immittance spectrum pairs (ISPs) for vector quantization plus transmission and/or storage.
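By way of illustration only, the minimization described above may be sketched in Python using the autocorrelation method with the Levinson-Durbin recursion. The choice of that method, the function names, and the treatment of samples outside the frame as zero are assumptions of this sketch; the text above specifies only the quantity to be minimized. Windowing and conversion to LSFs or ISPs are omitted.

```python
def autocorrelation(s, max_lag):
    """R[k] = sum over the frame of s[n] * s[n-k], for k = 0..max_lag."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s)))
            for k in range(max_lag + 1)]


def levinson_durbin(R, M):
    """Return [a(1), ..., a(M)] in the sign convention of Equation (1).

    Solves sum_j a(j) R[|i-j|] = R[i], i = 1..M, which are the normal
    equations for minimizing the sum of r(n)^2 under the autocorrelation
    method.
    """
    a = [0.0] * (M + 1)  # a[0] is unused
    err = R[0]           # prediction error energy so far
    for i in range(1, M + 1):
        if err <= 0.0:
            break        # silent frame: leave remaining coefficients at zero
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(s, a):
    """r(n) = s(n) - sum_{j=1..M} a(j) s(n-j); samples before the frame are 0."""
    M = len(a)
    return [s[n] - sum(a[j - 1] * s[n - j]
                       for j in range(1, M + 1) if n - j >= 0)
            for n in range(len(s))]
```

For a 160-sample frame and M of about 10, a call such as `levinson_durbin(autocorrelation(s, 10), 10)` yields the coefficients, and `lp_residual` then gives the residual whose energy the analysis minimizes.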

The {r(n)} form the LP residual for the frame, and ideally the LP residual would be the excitation for the synthesis filter 1/A(z) where A(z) is the transfer function of Equation (1); that is, Equation (1) is a convolution that z-transforms to a multiplication: R(z) = A(z)S(z), so S(z) = R(z)/A(z). Of course, the LP residual is not available at the decoder; thus the task of the encoder is to represent the LP residual so that the decoder can generate an excitation for the LP synthesis filter. That is, from the encoded parameters the decoder generates a filter estimate, Â(z), plus an estimate of the residual to use as an excitation, E(z); and thereby estimates the speech frame by Ŝ(z) = E(z)/Â(z). Physiologically, for voiced frames the excitation roughly has the form of a series of pulses at the pitch frequency, and for unvoiced frames the excitation roughly has the form of white noise.
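By way of illustration only, the relationship S(z) = R(z)/A(z) may be sketched in Python as a pair of time-domain filters. The function names and the zero initial filter state are assumptions of this sketch. A real decoder would drive the synthesis filter with a quantized excitation and quantized coefficients rather than with the exact residual.

```python
def lp_analysis_filter(s, a):
    """A(z): r(n) = s(n) - sum_{j=1..M} a(j) s(n-j), zero initial state."""
    return [s[n] - sum(a[j - 1] * s[n - j]
                       for j in range(1, len(a) + 1) if n - j >= 0)
            for n in range(len(s))]


def lp_synthesis_filter(excitation, a):
    """1/A(z): s(n) = e(n) + sum_{j=1..M} a(j) s(n-j), zero initial state."""
    s = []
    for n, e in enumerate(excitation):
        acc = e
        for j in range(1, len(a) + 1):
            if n - j >= 0:
                acc += a[j - 1] * s[n - j]  # feedback from earlier output
        s.append(acc)
    return s
```

Passing the output of `lp_analysis_filter` through `lp_synthesis_filter` with the same coefficients recovers the input up to rounding, which is why an excitation close to the residual yields synthesized speech close to the original.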

For compression the LP approach basically quantizes various parameters and only transmits/stores updates or codebook entries for these quantized parameters: filter coefficients, pitch lag, residual waveform, and gains. A receiver regenerates the speech with the same perceptual characteristics as the input speech. Periodic updating of the quantized items requires fewer bits than direct representation of the speech signal, so a reasonable LP encoder can operate at bit rates as low as 2-3 kb/s (kilobits per second).

For example, the Adaptive Multirate Wideband (AMR-WB) encoding standard with available bit rates ranging from 6.6 kb/s up to 23.85 kb/s uses LP analysis with codebook excitation (CELP) to compress speech. An adaptive-codebook contribution provides periodicity in the excitation and is the product of a gain, g_p, multiplied by v(n), the excitation of the prior frame translated by the pitch lag of the current frame and interpolated to fit the current frame. The algebraic codebook contribution approximates the difference between the actual residual and the adaptive codebook contribution with a multiple-pulse vector (also known as an innovation sequence), c(n), multiplied by a gain, g_c. The number of pulses depends on the bit rate. That is, the excitation is u(n) = g_p v(n) + g_c c(n) where v(n) comes from the prior (decoded) frame, and g_p, g_c, and c(n) come from the transmitted parameters for the current frame. The speech synthesized from the excitation is then postfiltered to mask noise. Postfiltering essentially involves three successive filters: a short-term filter, a long-term filter, and a tilt compensation filter. The short-term filter emphasizes formants; the long-term filter emphasizes periodicity; and the tilt compensation filter compensates for the spectral tilt typical of the short-term filter. See, e.g., Bessette, et al., The Adaptive Multirate Wideband Speech Codec (AMR-WB), 10 IEEE Trans. Speech and Audio Processing 620 (2002).
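By way of illustration only, the excitation u(n) = g_p v(n) + g_c c(n) may be sketched in Python as follows. The sketch assumes an integer pitch lag and therefore omits the interpolation mentioned above; the periodic extension used when the lag is shorter than the vector length, the representation of the innovation as (position, sign) pairs, and all function names are likewise assumptions.

```python
def adaptive_codebook_vector(past_excitation, lag, length):
    """v(n): the past excitation delayed by an integer pitch lag.

    When lag < length, samples generated earlier in this same vector are
    reused, which extends the past excitation periodically.
    """
    buf = list(past_excitation)
    start = len(buf)
    for n in range(length):
        buf.append(buf[start + n - lag])
    return buf[start:]


def algebraic_codevector(length, pulses):
    """c(n): a sparse innovation built from (position, sign) pairs."""
    c = [0.0] * length
    for position, sign in pulses:
        c[position] += sign
    return c


def total_excitation(v, c, g_p, g_c):
    """u(n) = g_p * v(n) + g_c * c(n)."""
    return [g_p * vn + g_c * cn for vn, cn in zip(v, c)]
```

In a decoder, the resulting u(n) would be appended to the past-excitation buffer so that it is available to the adaptive codebook when the next vector is formed.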

A layered (embedded) CELP speech encoder, such as the MPEG-4 audio CELP, provides bit rate scalability with an output bitstream consisting of a core (or base) layer (an adaptive codebook together with a fixed codebook 0) plus N enhancement layers (fixed codebooks 1 through N). For a general discussion on fixed (or algebraic) codebooks, see, e.g., Adoul, et al., “Fast CELP Coding Based on Algebraic Codes,” in Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, (Dallas), pp. 1957-1960, April 1987.

A layered encoder uses only the core layer at the lowest bit rate to give acceptable quality and provides progressively enhanced quality by adding progressively more enhancement layers to the core layer. A layer’s fixed codebook entry is found by minimizing the error between the input speech and the so-far cumulative synthesized speech. Layering is useful for some Voice-over-Internet-Protocol (VoIP) applications
