WO2001003118A1

WO2001003118A1 - Audio coding and decoding by interpolation

Info

Publication number: WO2001003118A1
Application number: PCT/FR2000/001906
Authority: WO
Inventors: François CAPMAN; Carlo Murgia
Original assignee: Matra Nortel Communications
Priority date: 1999-07-05
Filing date: 2000-07-04
Publication date: 2001-01-11
Also published as: ATE277403T1; FR2796191A1; EP1192619B1; FR2796191B1; DE60014085D1; AU6292000A; EP1192619A1

Abstract

The invention concerns a method in which a decoder synthesizes a set of successive frames of N samples of an audio signal from coding data included in a digital stream received from an encoder and comprising, for only a subset of the frames, data representing spectral amplitudes of the audio signal. For each frame of said subset, the decoder determines, on the basis of the coding data, cepstral coefficients (cx_q[n]) representing at least some of said spectral amplitudes. For the frames not forming part of said subset, it interpolates said cepstral coefficients and generates, by means of the interpolated cepstral coefficients (cx[n-1/2]), a spectral estimate of the audio signal, which it transforms into the time domain to obtain the synthesized frame.

Description

AUDIO ENCODING AND DECODING BY INTERPOLATION

The present invention relates to the field of coding of audio signals. It applies in particular, but not exclusively, to speech coding, in narrow band or in wide band, in various ranges of coding bit rates.

The design of an audio codec mainly aims to provide a good compromise between the bit rate of the stream transmitted by the coder and the quality of the audio signal which the decoder is capable of reconstructing from this stream.

With this in mind, families of coders have notably been developed based on an analysis of the audio signal in the spectral domain: the coder estimates a fundamental frequency of the signal, representing its pitch, and the spectral analysis consists in determining parameters representing the harmonic structure of the signal at frequencies which are integer multiples of this fundamental frequency. The non-harmonic, or unvoiced, component can also be modeled in the spectral domain. The parameters transmitted to the decoder typically represent the modulus of the spectrum of the voiced and unvoiced components. Added to this is information representing either voiced/unvoiced decisions relating to different portions of the spectrum, or the probability of voicing of the signal, allowing the decoder to determine in which portions of the spectrum it must use the voiced component or the unvoiced component.

These coder families include MBE (“Multi-Band Excitation”) coders and STC (“Sinusoidal Transform Coder”) coders. By way of reference, mention may be made of US patents 4,856,068, 4,885,790, 4,937,873, 5,054,072, 5,081,681, 5,195,166, 5,216,747, 5,226,084, 5,226,108, 5,247,579, 5,473,727, 5,517,511, 5,630,011, 5,630,012, 5,649,050, 5,651,093, 5,664,051, 5,664,052, 5,684,926, 5,701,390, 5,715,365, 5,749,065, 5,752,222, 5,765,127, 5,774,837 and 5,890,108.

An object of the present invention is, in a coding scheme with analysis in the spectral domain, to improve the modeling of the phases of the signal spectrum by the decoder.

The invention thus proposes a method for decoding an input digital stream representing an encoded audio signal, in which a set of successive frames of N samples of the audio signal is synthesized from coding data included in the input digital stream, and in which the coding data comprise, for only a subset of the frames, data representative of spectral amplitudes associated with frequencies of the spectrum of the audio signal. According to the invention, cepstral coefficients representative of at least some of said spectral amplitudes are determined for each of the frames of said subset, on the basis of the coding data. For the frames not forming part of said subset, said cepstral coefficients are interpolated, and a spectral estimate of the audio signal is generated using the interpolated cepstral coefficients and then transformed into the time domain to obtain the synthesized frame.

The method advantageously applies when the coding data directly comprise quantization data for the cepstral coefficients. But if the spectrum modeling uses other quantized parameters in the stream from which the cepstral coefficients can be recovered by a known transformation, for example LSP ("Line Spectrum Parameters"), the decoder can use this transformation in order to interpolate the parameters in the cepstral domain and thus benefit from the advantages of the invention.

In an advantageous embodiment of the method, the interpolated cepstral coefficients are corrected, for the frames not forming part of said subset, on the basis of interpolation error quantization data included in the coding data. As a variant, the cepstral coefficients can be interpolated by a filter determined on the basis of quantization data included in the coding data.

Another aspect of the present invention relates to a method of coding an audio signal, in which a spectrum of the audio signal is determined by a transform of a frame of the audio signal into the frequency domain, and data representative of spectral amplitudes associated with at least some of the frequencies of the spectrum are included in a digital output stream. The spectrum of the audio signal is determined for a set of successive frames of N samples of the audio signal, and cepstral coefficients representative of at least some of said spectral amplitudes are determined for each of the frames of said set. Said data representative of the spectral amplitudes are included in the digital output stream for only a subset of the frames. For the frames not forming part of said subset, the digital output stream includes either quantization data of an interpolation error of said cepstral coefficients, or data representing an optimal interpolating filter determined for said cepstral coefficients.
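By way of illustration only, the decoder-side interpolation and its optional correction can be sketched as follows. The sketch assumes plain linear interpolation between the nearest transmitted frames and assumes that every missing frame lies between two transmitted ones; the function name, the `errors` dictionary and the data layout are hypothetical and are not taken from the description.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def interpolate_cepstra(
    frames: Sequence[Optional[Vector]],
    errors: Optional[Dict[int, Vector]] = None,
) -> List[Vector]:
    """Fill in missing cepstral vectors by linear interpolation.

    frames[n] is the dequantized cepstral vector of frame n, or None when
    the stream carries no spectral amplitude data for that frame.
    errors optionally maps a frame index to a dequantized interpolation
    error that is added to the interpolated vector (hypothetical layout).
    """
    errors = errors or {}
    known = [n for n, v in enumerate(frames) if v is not None]
    out: List[Vector] = []
    for n, vec in enumerate(frames):
        if vec is not None:
            out.append(list(vec))
            continue
        # Nearest transmitted frames before and after frame n.
        prev = max(k for k in known if k < n)
        nxt = min(k for k in known if k > n)
        w = (n - prev) / (nxt - prev)
        interp = [(1.0 - w) * a + w * b
                  for a, b in zip(frames[prev], frames[nxt])]
        corr = errors.get(n, [0.0] * len(interp))
        out.append([x + e for x, e in zip(interp, corr)])
    return out
```

With one frame out of two transmitted, each missing vector is simply the midpoint of its two neighbours, optionally shifted by the transmitted error.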

The invention also provides an audio coder and decoder comprising means for implementing the above methods.

Other features and advantages of the present invention will appear in the following description of nonlimiting exemplary embodiments, with reference to the appended drawings, in which:

- Figure 1 is a block diagram of an audio encoder according to the invention;

- Figures 2 and 3 are diagrams illustrating the formation of audio signal frames in the encoder of Figure 1;

- Figures 4 and 5 are graphs showing an example of the audio signal spectrum and illustrating the extraction of the upper and lower envelopes of this spectrum;

- Figure 6 is a block diagram of an example of quantization means usable in the encoder of Figure 1;

- Figure 7 is a block diagram of means used to extract parameters relating to the phase of the non-harmonic component in a variant of the encoder of Figure 1;

- Figure 8 is a block diagram of an audio decoder corresponding to the encoder of Figure 1;

- Figure 9 is a flow diagram of an example of a procedure for smoothing spectral coefficients and extracting minimum phases, implemented in the decoder of Figure 8;

- Figure 10 is a block diagram of modules for the spectral analysis and mixing of harmonic and non-harmonic components of the audio signal;

- Figures 11 to 13 are graphs showing examples of non-linear functions usable in the analysis module of Figure 10;

- Figures 14 and 15 are diagrams illustrating one way of proceeding with the temporal synthesis of the signal frames in the decoder of Figure 8;

- Figures 16 and 17 are graphs showing windowing functions usable in the synthesis of the frames according to Figures 14 and 15;

- Figures 18 and 19 are block diagrams of interpolation means usable in an alternative embodiment of the coder and the decoder;

- Figure 20 is a block diagram of interpolation means usable in another alternative embodiment of the encoder; and

- Figures 21 and 22 are diagrams illustrating another way of proceeding with the temporal synthesis of the signal frames in the decoder of Figure 8, using an interpolation of parameters.

The coder and the decoder described below are digital circuits which can, as is usual in the field of audio signal processing, be produced by programming a digital signal processor (DSP) or an application-specific integrated circuit (ASIC).

The audio coder represented in FIG. 1 processes an input audio signal x which, in the nonlimiting example considered below, is a speech signal. The signal x is available in digital form, for example at a sampling frequency F_e of 8 kHz. It is for example delivered by an analog-to-digital converter processing the amplified output signal of a microphone. The input signal x can also be formed from another version, analog or digital, coded or not, of the speech signal.

The encoder comprises a module 1 which forms successive audio signal frames for the different processing operations carried out, and an output multiplexer 6 which delivers an output stream Φ containing, for each frame, sets of quantization parameters from which a decoder will be able to synthesize a decoded version of the audio signal.

The structure of the frames is illustrated by FIGS. 2 and 3. Each frame 2 is composed of a number N of consecutive samples of the audio signal x. Successive frames have mutual time offsets corresponding to M samples, so that their overlap is L = N - M samples of the signal. In the example considered, where N = 256, M = 160 and L = 96, the duration of the frames 2 is N/F_e = 32 ms, and a frame is formed every M/F_e = 20 ms.
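The frame structure described above can be illustrated by the following minimal sketch, which uses the example values N = 256, M = 160 and F_e = 8 kHz. The function name `split_into_frames` and the choice of dropping an incomplete final frame are assumptions made for the illustration.

```python
N = 256      # samples per frame
M = 160      # time offset between successive frames, in samples
F_E = 8000   # sampling frequency in Hz


def split_into_frames(x, n=N, m=M):
    """Return the overlapping frames of n samples taken every m samples.

    A trailing portion of x too short to fill a frame is ignored
    (an assumption of this sketch).
    """
    return [x[s:s + n] for s in range(0, len(x) - n + 1, m)]


overlap = N - M              # L = N - M samples shared by neighbouring frames
frame_ms = 1000 * N / F_E    # frame duration in milliseconds
hop_ms = 1000 * M / F_E      # frame period in milliseconds
frames = split_into_frames(list(range(1000)))
```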

Conventionally, the module 1 multiplies the samples of each frame 2 by a windowing function f_A, preferably chosen for its good spectral properties. The samples x(i) of the frame being numbered from i = 0 to i = N-1, the analysis window f_A(i) can thus be a Hamming window, of expression:

$$f_A(i) = 0.54 + 0.46 \cos\left(2\pi\,\frac{i - (N-1)/2}{N}\right) \qquad (1)$$

or a Hanning window, of expression:

[...] (2)

or a Kaiser window, of expression:

[...] (3)

where α is a coefficient for example equal to 6, and I₀(.) denotes the Bessel function of index 0.

The coder in FIG. 1 analyzes the audio signal in the spectral domain. It includes a module 3 which calculates the fast Fourier transform (FFT) of each signal frame. The signal frame is formatted before being submitted to the FFT module 3: the module 1 appends N = 256 zero samples to it in order to obtain the maximum resolution of the Fourier transform, and it also performs a circular permutation of the 2N = 512 samples in order to compensate for the phase effects resulting from the analysis window. This modification of the frame is illustrated in FIG. 3. The frame on which the 2N = 512 point fast Fourier transform is calculated begins with the last N/2 = 128 weighted samples of the frame, followed by the N = 256 zero samples, and ends with the first N/2 = 128 weighted samples of the frame.
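The formatting of the frame before the transform can be sketched as follows. The sketch assumes a Hamming analysis window centered on the middle of the frame; the function names are hypothetical, and the Fourier transform itself is left out.

```python
import math

N = 256


def hamming(n=N):
    """Hamming window centered on the middle of an n-sample frame."""
    return [0.54 + 0.46 * math.cos(2 * math.pi * (i - (n - 1) / 2) / n)
            for i in range(n)]


def format_frame(frame, window):
    """Weight the frame, then lay it out over 2N points as in FIG. 3."""
    n = len(frame)
    weighted = [x * w for x, w in zip(frame, window)]
    half = n // 2
    # Last N/2 weighted samples, then N zeros, then first N/2 weighted samples.
    return weighted[half:] + [0.0] * n + weighted[:half]


buffer_2n = format_frame([1.0] * N, hamming())
```

The circular permutation places the center of the window at the origin of the 2N-point buffer, which is what limits the phase contribution of the window.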

The FFT module 3 obtains the spectrum of the signal for each frame, whose modulus and phase are respectively denoted |X| and φ_x, or |X(i)| and φ_x(i) for the frequency indexes i = 0 to i = 2N-1 (thanks to the symmetry of the Fourier transform and of the frames, it is sufficient to consider the values for 0 ≤ i < N).

A fundamental frequency detector 4 estimates, for each signal frame, a value of the fundamental frequency F₀. The detector 4 can apply any known method of analysis of the speech signal of the frame to estimate the fundamental frequency F₀, for example a method based on the autocorrelation function or on the AMDF function, possibly preceded by a whitening module using linear prediction. The estimation can also be performed in the spectral domain or in the cepstral domain. Another possibility is to evaluate the time intervals between consecutive breaks in the speech signal attributable to closures of the speaker's glottis occurring during the frame. Well-known methods which can be used to detect such micro-breaks are described in the following articles: M. Basseville et al., “Sequential detection of abrupt changes in spectral characteristics of digital signals” (IEEE Trans. on Information Theory, 1983, Vol. IT-29, No. 5, pages 708-723); R. Andre-Obrecht, "A new statistical approach for the automatic segmentation of continuous speech signals" (IEEE Trans. on Acous., Speech and Sig. Proc., Vol. 36, No. 1, January 1988); and C. Murgia et al., “An algorithm for the estimation of glottal closure instants using the sequential detection of abrupt changes in speech signals” (Signal Processing VII, 1994, pages 1685-1688).

The estimated fundamental frequency F₀ is quantized, for example by scalar quantization, by a module 5, which supplies the output multiplexer 6 with a quantization index iF of the fundamental frequency for each frame of the signal.

The encoder uses cepstral parametric models to represent an upper envelope and a lower envelope of the spectrum of the audio signal. The first step of the cepstral transformation consists in applying to the modulus of the signal spectrum a spectral compression function, which can be a logarithmic or a root function. The module 8 of the coder thus applies, for each value X(i) of the signal spectrum (0 ≤ i < N), the following transformation:

$$LX(i) = \log\left(|X(i)|\right) \qquad (4)$$

in the case of logarithmic compression, or

$$LX(i) = |X(i)|^{\gamma} \qquad (5)$$

in the case of root compression, γ being an exponent between 0 and 1.
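As an illustration, the two compression functions and their inverses (used later for spectral decompression) can be sketched as follows. The use of the natural logarithm and the small floor protecting against a zero modulus are assumptions of the sketch, not requirements of the description.

```python
import math


def compress(modulus, gamma=None, floor=1e-12):
    """Compress spectral moduli: logarithm if gamma is None, else power gamma."""
    if gamma is None:
        # The floor avoids log(0); its value is an arbitrary assumption.
        return [math.log(max(m, floor)) for m in modulus]
    return [m ** gamma for m in modulus]


def decompress(lx, gamma=None):
    """Inverse of compress: exponential, or power 1/gamma."""
    if gamma is None:
        return [math.exp(v) for v in lx]
    return [v ** (1.0 / gamma) for v in lx]
```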

The compressed spectrum LX of the audio signal is processed by a module 9 which extracts spectral amplitudes associated with the harmonics of the signal corresponding to the multiples of the estimated fundamental frequency F0. These amplitudes are then interpolated by a module 10 in order to obtain a compressed upper envelope denoted LX_sup.

It should be noted that the spectral compression could be carried out in an equivalent manner after the determination of the amplitudes associated with the harmonics. It could also be done after interpolation, which would only change the form of the interpolation functions.

The maxima extraction module 9 takes account of the possible variation of the fundamental frequency over the analysis frame, of the errors that the detector 4 can make, and of the inaccuracies linked to the discrete nature of the frequency sampling. For this purpose, the search for the amplitudes of the spectral peaks does not simply consist in taking the values LX(i) corresponding to the indices i such that i.F_e/2N is the frequency closest to a harmonic of frequency k.F₀ (k ≥ 1). The spectral amplitude retained for a harmonic of order k is a local maximum of the modulus of the spectrum in the vicinity of the frequency k.F₀ (this amplitude is obtained directly in compressed form when the spectral compression 8 is carried out before the extraction of the maxima 9).

Figures 4 and 5 show an example of the shape of the compressed spectrum LX, where it can be seen that the maximum amplitudes of the harmonic peaks do not necessarily coincide with the amplitudes corresponding to the integer multiples of the estimated fundamental frequency F₀. The sides of the peaks being quite steep, a small positioning error of the fundamental frequency F₀, amplified by the harmonic index k, can strongly distort the estimated upper envelope of the spectrum and cause poor modeling of the formant structure of the signal. For example, taking the spectral amplitude directly at the frequency 3.F₀ in the case of FIGS. 4 and 5 would produce a significant error in the extraction of the upper envelope in the vicinity of the harmonic of order k = 3, whereas this is an energetically important zone in the example drawn. By interpolating from the true maximum, this kind of error in estimating the upper envelope is avoided.

In the example shown in FIG. 4, the interpolation is carried out between points whose abscissa is the frequency corresponding to the maximum of the amplitude of a spectral peak, and whose ordinate is this maximum, before or after compression.

The interpolation performed to calculate the upper envelope LX_sup is a simple linear interpolation. Of course, another form of interpolation could be used (for example polynomial or spline).

In the preferred variant shown in FIG. 5, the interpolation is carried out between points whose abscissa is a frequency k.F₀ that is a multiple of the fundamental frequency (in fact the closest frequency in the discrete spectrum) and whose ordinate is the maximum amplitude, before or after compression, of the spectrum in the vicinity of this multiple frequency. By comparing Figures 4 and 5, it can be seen that the extraction mode according to Figure 5, which repositions the peaks on the harmonic frequencies, leads to better precision on the amplitude of the peaks that the decoder will attribute to the frequencies that are multiples of the fundamental frequency. There may be a slight frequency shift of the position of these peaks, which is not perceptually very important and is not avoided in the case of Figure 4 either. In the case of Figure 4, the anchor points for the interpolation coincide with the vertices of the harmonic peaks. In the case of FIG. 5, these anchor points are constrained to lie precisely at the frequencies that are multiples of the fundamental frequency, their amplitudes corresponding to those of the peaks.

The maximum amplitude search interval associated with a harmonic of rank k is centered on the index i of the FFT frequency closest to k.F₀, i.e.

$$i = \left\lfloor 2N\,\frac{k F_0}{F_e} + \frac{1}{2} \right\rfloor$$

where ⌊a⌋ denotes the integer equal to or immediately below the number a. The width of this search interval depends on the sampling frequency F_e, on the size 2N of the FFT and on the range of possible variation of the fundamental frequency. This width is typically of the order of ten frequencies with the example values considered above. It can be made adjustable as a function of the value F₀ of the fundamental frequency and of the rank k of the harmonic.

In order to improve the resolution at low frequencies, and therefore to represent the amplitudes of the harmonics in this zone more faithfully, a non-linear distortion of the frequency scale is applied to the compressed upper envelope by a module 12 before the module 13 performs the inverse fast Fourier transform (IFFT) providing the cepstral coefficients cx_sup.
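The extraction of the harmonic maxima and the linear interpolation of the upper envelope, in the variant of FIG. 5 where the anchor points sit on the harmonic frequencies, can be sketched as follows. The half-width of 5 bins (an interval of about ten frequencies), the holding of the first and last anchor values at the edges of the spectrum, and the function names are assumptions of this sketch.

```python
import math


def harmonic_peaks(lx, f0, fe, half_width=5):
    """Return (bin, amplitude) anchors, one per harmonic k >= 1.

    lx holds the compressed modulus for bins 0..N-1 of a 2N-point transform.
    Each anchor sits on the bin closest to k*f0 and carries the local
    maximum of lx found within half_width bins of that center.
    """
    two_n = 2 * len(lx)
    anchors = []
    k = 1
    while True:
        center = math.floor(two_n * k * f0 / fe + 0.5)
        if center >= len(lx):
            break
        lo = max(0, center - half_width)
        hi = min(len(lx) - 1, center + half_width)
        anchors.append((center, max(lx[lo:hi + 1])))
        k += 1
    return anchors


def linear_envelope(anchors, size):
    """Piecewise-linear envelope through the anchors, held flat at both ends."""
    env = []
    for i in range(size):
        if i <= anchors[0][0]:
            env.append(anchors[0][1])
        elif i >= anchors[-1][0]:
            env.append(anchors[-1][1])
        else:
            for (i0, a0), (i1, a1) in zip(anchors, anchors[1:]):
                if i0 <= i <= i1:
                    env.append(a0 + (a1 - a0) * (i - i0) / (i1 - i0))
                    break
    return env
```

In this sketch a peak found two bins away from the harmonic center is still credited to the harmonic frequency itself, which is the repositioning described for FIG. 5.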

The non-linear distortion makes it possible to minimize the modeling error more effectively. It is for example carried out according to a Mel or Bark type frequency scale. This distortion may possibly depend on the estimated fundamental frequency F₀. Figure 1 illustrates the case of the Mel scale. The relationship between the frequencies F of the linear spectrum, expressed in hertz, and the frequencies F' of the Mel scale is as follows:

$$F' = \frac{1000}{\log_{10}(2)} \times \log_{10}\left(1 + \frac{F}{1000}\right) \qquad (6)$$
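For illustration, the Mel relationship, read as F' = (1000 / log10(2)) × log10(1 + F / 1000), and its inverse can be written as the following short sketch. The function names are hypothetical.

```python
import math

MEL_FACTOR = 1000.0 / math.log10(2.0)


def hz_to_mel(f_hz):
    """Map a frequency in hertz onto the Mel scale of the relation above."""
    return MEL_FACTOR * math.log10(1.0 + f_hz / 1000.0)


def mel_to_hz(f_mel):
    """Inverse mapping, from the Mel scale back to hertz."""
    return 1000.0 * (10.0 ** (f_mel / MEL_FACTOR) - 1.0)
```

With this scaling, 1000 Hz maps to 1000 Mel, frequencies below 1000 Hz are stretched and frequencies above are compressed, which is what gives the finer resolution at low frequencies.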

In order to limit the transmission bit rate, the cepstral coefficients cx_sup are truncated. The IFFT module 13 needs to compute only a cepstral vector of NCS cepstral coefficients, of orders 0 to NCS-1. As an example, NCS can be equal to 16.
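A minimal sketch of the truncated cepstral transformation is given below. It assumes that the compressed envelope has already been resampled on a uniform grid of the distorted frequency scale, that it is supplied for bins 0 to N of a 2N-point grid, and that a direct cosine sum is acceptable in place of a fast transform; these choices and the function name are assumptions made for the illustration.

```python
import math


def truncated_cepstrum(env_half, ncs=16):
    """Return the first ncs cepstral coefficients of a compressed envelope.

    env_half holds the envelope at bins 0..N; it is extended by even
    symmetry to 2N points before the inverse transform, so that the
    resulting coefficients are real.
    """
    full = list(env_half) + list(env_half[-2:0:-1])   # 2N points
    size = len(full)
    return [
        sum(v * math.cos(2.0 * math.pi * k * q / size)
            for k, v in enumerate(full)) / size
        for q in range(ncs)
    ]
```

Under this convention the envelope is recovered as c(0) + 2 Σ c(q) cos(πkq/N), so a flat envelope yields only c(0), and keeping few coefficients retains only the smooth part of the envelope.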

A post-filtering in the cepstral domain, called post-liftering, is applied by a module 15 to the compressed upper envelope LX_sup. This post-liftering corresponds to a manipulation of the cepstral coefficients cx_sup delivered by the IFFT module 13, which corresponds approximately to a post-filtering of the harmonic part of the signal by a transfer function having the classical form:

$$H(z) = \left(1 - \mu z^{-1}\right) \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (7)$$

where A(z) is the transfer function of a linear prediction filter of the audio signal, γ₁ and γ₂ are coefficients between 0 and 1, and μ is a pre-emphasis coefficient, possibly zero. The relation between the post-liftered coefficient of order i, denoted c_p(i), and the corresponding cepstral coefficient c(i) = cx_sup(i) delivered by the module 13 is then:

$$c_p(0) = c(0), \qquad c_p(i) = \left(1 + \gamma_2^{\,i} - \gamma_1^{\,i}\right) c(i) - \frac{\mu^{i}}{i} \quad \text{for } i > 0 \qquad (8)$$

The optional pre-emphasis coefficient μ can be controlled by imposing the constraint of preserving the value of the cepstral coefficient cx_sup(1), which relates to the slope. Indeed, the value c(1) = cx_sup(1) of a white noise filtered by the pre-emphasis filter corresponds to the pre-emphasis coefficient. The latter can thus be chosen as follows:

$$\mu = (\gamma_2 - \gamma_1)\, c(1)$$
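The post-liftering can be sketched as follows, reading the relations as c_p(0) = c(0) and c_p(i) = (1 + γ₂^i − γ₁^i)·c(i) − μ^i/i for i > 0. The choice μ = (γ₂ − γ₁)·c(1) used here is inferred from the stated constraint c_p(1) = c(1) applied to the relation for i = 1; it and the function names are assumptions of the sketch.

```python
def post_lifter(c, gamma1, gamma2, mu=0.0):
    """Apply the post-liftering relations to the cepstral vector c."""
    out = [c[0]]                       # order 0 is left unchanged
    for i in range(1, len(c)):
        out.append((1.0 + gamma2 ** i - gamma1 ** i) * c[i] - mu ** i / i)
    return out


def slope_preserving_mu(c, gamma1, gamma2):
    """Pre-emphasis coefficient that leaves c(1) unchanged (inferred choice)."""
    return (gamma2 - gamma1) * c[1]
```

For instance, with c(1) = 0.5, γ₁ = 0.5 and γ₂ = 0.8, this gives μ = 0.15 and a post-liftered coefficient of order 1 equal to 1.3 × 0.5 − 0.15 = 0.5, that is, unchanged.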

After the post-lifter 15, a normalization module 16 further modifies the cepstral coefficients by imposing the constraint of exactly modeling one point of the initial spectrum, which is preferably the most energetic point among the spectral maxima extracted by the module 9. In practice, this normalization only modifies the value of the coefficient c_p(0).

The normalization module 16 operates as follows: it recalculates a value of the synthesized spectrum at the frequency of the maximum indicated by the module 9, by Fourier transform of the truncated and post-liftered cepstral coefficients, taking into account the non-linear distortion of the frequency axis; it determines a normalization gain g_N as the logarithmic difference between the value of the maximum provided by the module 9 and this recalculated value; and it adds the gain g_N to the post-liftered cepstral coefficient c_p(0).

This normalization can be seen as part of the post-liftering.
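A minimal sketch of this normalization is given below. It assumes logarithmic compression, assumes that the compressed spectrum is rebuilt from the cepstral coefficients as c(0) + 2 Σ c(i) cos(iθ), and takes θ as the angular position (between 0 and π) of the maximum on the distorted frequency axis. The function names and this synthesis convention are assumptions.

```python
import math


def log_spectrum_at(c, theta):
    """Compressed spectrum value rebuilt from cepstral coefficients at theta."""
    return c[0] + 2.0 * sum(c[i] * math.cos(i * theta)
                            for i in range(1, len(c)))


def normalize(c_p, theta_max, lx_max):
    """Shift c_p(0) so that the rebuilt spectrum equals lx_max at theta_max.

    Returns the normalized coefficients and the gain g_N that was added.
    """
    g_n = lx_max - log_spectrum_at(c_p, theta_max)
    return [c_p[0] + g_n] + list(c_p[1:]), g_n
```

Because only the coefficient of order 0 changes, the whole compressed envelope is shifted by the same amount g_N and its shape is preserved.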

The post-liftered and normalized cepstral coefficients are quantized by a module 18, which transmits the corresponding quantization indexes icxs to the output multiplexer 6 of the coder.

The module 18 can perform vector quantization of cepstral vectors formed from the post-liftered and normalized coefficients, denoted here cx[n] for the signal frame of rank n. For example, the cepstral vector cx[n] of NCS = 16 cepstral coefficients cx[n,0], cx[n,1], ..., cx[n,NCS-1] is split into four cepstral sub-vectors each containing four coefficients of consecutive orders. The cepstral vector cx[n] can be processed by the means shown in Figure 6, which form part of the quantization module 18. These means implement, for each component cx[n,i], a predictor of the form:

$$cx_p[n,i] = \left(1 - \alpha(i)\right) rcx[n,i] + \alpha(i)\, rcx[n-1,i] \qquad (9)$$

where rcx[n] denotes a prediction residual vector for the frame of rank n, whose components are respectively denoted rcx[n,0], rcx[n,1], ..., rcx[n,NCS-1], and α(i) denotes a prediction coefficient chosen to be representative of an assumed inter-frame correlation. After quantization of the residuals, this residual vector is defined by:

$$rcx[n,i] = \frac{cx[n,i] - \alpha(i)\, rcx\_q[n-1,i]}{2 - \alpha(i)} \qquad (10)$$

where rcx_q[n-1] designates the quantized residual vector for the frame of rank n-1, whose components are respectively denoted rcx_q[n-1,0], rcx_q[n-1,1], ..., rcx_q[n-1,NCS-1].

The numerator of the relation (10) is obtained by a subtractor 20, the components of whose output vector are divided by the quantities 2-α(i) at 21. For the purposes of quantization, the residual vector rcx[n] is subdivided into four sub-vectors, corresponding to the subdivision into four cepstral sub-vectors. On the basis of a dictionary obtained by prior training, the unit 22 performs the vector quantization of each sub-vector of the residual vector rcx[n]. This quantization can consist, for each sub-vector srcx[n], in selecting in the dictionary the quantized sub-vector srcx_q[n] which minimizes the quadratic error ‖srcx[n] - srcx_q[n]‖².

The set icxs of the quantization indices icx, corresponding to the addresses in the dictionary or dictionaries of the quantized residual sub-vectors srcx_q[n], is supplied to the output multiplexer 6. The unit 22 also delivers the values of the quantized residual sub-vectors, which form the vector rcx_q[n]. This vector is delayed by one frame at 23, and its components are multiplied by the coefficients α(i) at 24 to supply the vector applied to the negative input of the subtractor 20. This latter vector is also supplied to an adder 25, the other input of which receives a vector formed by the components of the quantized residual rcx_q[n] respectively multiplied by the quantities 1-α(i) at 26. The adder 25 thus delivers the quantized cepstral vector cx_q[n] that the decoder will recover.
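The dictionary search performed for each residual sub-vector can be illustrated by the following sketch of an exhaustive nearest-neighbour search under the quadratic error criterion. The dictionaries shown in the usage below are toy values, and the function names and data layout are hypothetical; the prediction loop of Figure 6 is deliberately left out.

```python
def nearest_codeword(sub, codebook):
    """Return (index, codeword) of the entry minimizing the quadratic error."""
    best = min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(sub, codebook[j])),
    )
    return best, codebook[best]


def quantize_residual(rcx, codebooks, sub_size=4):
    """Quantize each sub-vector of rcx with its own dictionary.

    Returns the list of quantization indices and the quantized vector
    formed by concatenating the selected codewords.
    """
    indices, rcx_q = [], []
    for s, book in enumerate(codebooks):
        sub = rcx[s * sub_size:(s + 1) * sub_size]
        j, word = nearest_codeword(sub, book)
        indices.append(j)
        rcx_q.extend(word)
    return indices, rcx_q
```

For the 16-coefficient example, `codebooks` would hold four dictionaries of 4-component codewords, and the four returned indices would form the set of quantization indices for the frame.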

The prediction coefficient α(i) can be optimized separately for each of the cepstral coefficients. The quantization dictionaries can also be optimized separately for each of the four cepstral sub-vectors. Furthermore, it is possible, in a manner known per se, to normalize the cepstral vectors on the basis of the variance of the cepstra before applying the prediction/quantization scheme.

It should be noted that the above scheme for quantizing the cepstral coefficients may be applied for only some of the frames. For example, it is possible to provide a second quantization mode, together with a process that selects whichever of the two modes minimizes a least-squares criterion with respect to the cepstral coefficients to be quantized, and to transmit with the quantization indexes of the frame a bit indicating which of the two modes has been selected.

The quantized cepstral coefficients cx_sup_q = cx_q[n] supplied by the adder 25 are sent to a module 28 which recalculates the spectral amplitudes associated with one or more of the harmonics of the fundamental frequency F₀ (FIG. 1). These spectral amplitudes are for example calculated in compressed form, by applying the Fourier transform to the quantized cepstral coefficients while taking into account the non-linear distortion of the frequency scale used in the cepstral transformation. The amplitudes thus recalculated are supplied to an adaptation module 29 which compares them with the maximum amplitudes determined by the extraction module 9.

The adaptation module 29 controls the post-lifter 15 so as to minimize a modulus deviation between the spectrum of the audio signal and the corresponding modulus values calculated at 28. This modulus deviation can be expressed as a sum of absolute values of amplitude differences, compressed or not, corresponding to one or more of the harmonic frequencies. This sum can be weighted according to the spectral amplitudes associated with these frequencies.

Optimally, the modulus deviation used in the adaptation of the post-liftering would take into account all the harmonics of the spectrum. However, in order to reduce the complexity of the optimization, the module 28 can resynthesize the spectral amplitudes only for one or more frequencies that are multiples of the fundamental frequency F₀, selected on the basis of the absolute value of the modulus of the spectrum. The adaptation module 29 can for example consider the three most intense spectral peaks in the calculation of the modulus deviation to be minimized.

In another embodiment, the adaptation module 29 estimates a spectral masking curve of the audio signal by means of a psychoacoustic model, and the frequencies taken into account in the calculation of the modulus deviation to be minimized are selected on the basis of the size of the modulus of the spectrum relative to the masking curve (for example, the three frequencies at which the modulus of the spectrum exceeds the masking curve the most can be taken). Various conventional methods can be used to calculate the masking curve from the audio signal. For example, the one developed by J.D. Johnston can be used ("Transform Coding of Audio Signals Using Perceptual Noise Criteria", IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988).

To carry out the adaptation of the post-liftering, the module 29 can use a filter identification model. A simpler method consists in predefining a collection of sets of post-liftering parameters, that is to say a collection of pairs (γ₁, γ₂) in the case of a post-liftering according to the relations (8), in carrying out the operations of the modules 15, 16, 18 and 28 for each of these sets of parameters, and in retaining the set of parameters which leads to the minimum modulus deviation between the signal spectrum and the recalculated values. The quantization indexes provided by the module 18 are then those which relate to the best set of parameters.

By a process analogous to the extraction of the coefficients cx_sup representing the compressed upper envelope LX_sup of the signal spectrum, the coder determines coefficients cx_inf representing a compressed lower envelope LX_inf. A module 30 extracts from the compressed spectrum LX spectral amplitudes associated with frequencies located in regions of the spectrum lying between the frequencies that are multiples of the estimated fundamental frequency F₀.

In the example illustrated by FIGS. 4 and 5, each amplitude associated with a frequency located in an intermediate zone between two successive harmonics k.F₀ and (k+1).F₀ simply corresponds to the modulus of the spectrum at the frequency (k+1/2).F₀ located in the middle of the interval separating the two harmonics. In another embodiment, this amplitude could be an average of the modulus of the spectrum over a small range surrounding this frequency (k+1/2).F₀. A module 31 performs an interpolation, for example linear, of the spectral amplitudes associated with the frequencies located in the intermediate zones to obtain the compressed lower envelope LX_inf.
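The selection of the anchor points of the lower envelope can be sketched as follows. The sketch assumes that the intermediate zones start between the harmonics of ranks 1 and 2, and that the bin closest to (k+1/2).F₀ is obtained by rounding; `half_width` = 0 takes the single spectral value, while a positive value averages over a small surrounding range. The function name and these choices are assumptions.

```python
import math


def lower_envelope_anchors(lx, f0, fe, half_width=0):
    """Return (bin, amplitude) anchors taken midway between harmonics.

    lx holds the compressed modulus for bins 0..N-1 of a 2N-point transform.
    """
    two_n = 2 * len(lx)
    anchors = []
    k = 1
    while True:
        center = math.floor(two_n * (k + 0.5) * f0 / fe + 0.5)
        if center >= len(lx):
            break
        lo = max(0, center - half_width)
        hi = min(len(lx) - 1, center + half_width)
        window = lx[lo:hi + 1]
        anchors.append((center, sum(window) / len(window)))
        k += 1
    return anchors
```

The anchors can then be joined by linear interpolation to form the compressed lower envelope, in the same way as for the upper envelope.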

The cepstral transformation applied to this compressed lower envelope LX_inf is carried out according to a frequency scale resulting from a non-linear distortion applied by a module 32. The IFFT module 33 calculates a cepstral vector of NCI cepstral coefficients cx_inf, of orders 0 to NCI-1, representing the lower envelope. NCI is a number which can be significantly smaller than NCS, for example NCI = 4.

The non-linear transformation of the frequency scale for the cepstral transformation of the lower envelope can be performed towards a finer scale at high frequencies than at low frequencies, which advantageously makes it possible to model the unvoiced components of the signal at high frequencies. However, to ensure a uniformity of representation between the upper envelope and the lower envelope, it may be preferable to adopt in module 32 the same scale as in module 12 (Mel in the example considered).

The cepstral coefficients cx_inf representing the compressed lower envelope are quantized by a module 34, which can operate in the same way as the module 18 for quantizing the cepstral coefficients representing the compressed upper envelope. In the case considered, where only NCI = 4 cepstral coefficients are used for the lower envelope, the vector thus formed is subjected to a vector quantization of the prediction residual, carried out by means identical to those represented in FIG. 6 but without subdivision into sub-vectors. The quantization index icx = icxi determined by the vector quantizer 22 for each frame in respect of the coefficients cx_inf is supplied to the output multiplexer 6 of the coder.

The coder shown in FIG. 1 does not include any particular device for coding the phases of the spectrum at the harmonics of the audio signal. On the other hand, it includes means 36-40 for coding temporal information linked to the phase of the non-harmonic component represented by the lower envelope.

A spectral decompression module 36 and an IFFT module 37 form a temporal estimate of the frame of the non-harmonic component. The module 36 applies a decompression function that is the inverse of the compression function applied by the module 8 (that is to say an exponential function or a power 1/γ function) to the compressed lower envelope LX_inf produced by the interpolation module 31. This provides the modulus of the estimated frame of the non-harmonic component, the phase of which is taken to be equal to the phase φ_x of the spectrum of the signal X over the frame. The inverse Fourier transform performed by the module 37 provides the estimated frame of the non-harmonic component.

The module 38 subdivides this estimated frame of the non-harmonic component into several time segments. The frame delivered by the module 37 consisting of 2N = 512 weighted samples as illustrated in FIG. 3, the module 38 considers only the N / 2 = 128 first samples and the N / 2 = 128 last samples, and subdivides them for example into eight segments of 32 consecutive samples each representing 4 ms of signal.

For each segment, the module 38 calculates the energy equal to the sum of the squares of the samples, and forms a vector E1 formed by eight positive real components equal to the eight calculated energies. The largest of these eight energies, denoted EM, is also determined to be supplied, with the vector E1, to a normalization module 39. The latter divides each component of the vector E1 by EM, so that the normalized vector Emix is formed of eight components between 0 and 1. It is this normalized vector Emix, or weighting vector, which is subject to quantization by module 40. This can perform vector quantization with a dictionary determined during a prior learning. The quantization index iEm is supplied by the module 40 to the output multiplexer 6 of the coder.
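The segment-energy computation of modules 38 and 39 can be sketched as follows. This is a minimal illustration only: the function name, the order in which the first and last 128 samples are concatenated, and the handling of an all-zero frame are assumptions not given in the text.

```python
import numpy as np

def energy_weighting_vector(frame, n_keep=128, seg_len=32):
    """Return (Emix, EM) for a frame of 2N = 512 samples (sketch of modules 38-39)."""
    frame = np.asarray(frame, dtype=float)
    # Keep only the first and last n_keep samples (concatenation order is assumed).
    kept = np.concatenate([frame[:n_keep], frame[-n_keep:]])
    # Eight segments of 32 samples each; energy = sum of squared samples.
    e1 = np.sum(kept.reshape(-1, seg_len) ** 2, axis=1)
    em = float(e1.max())
    # Normalize by the largest energy; an all-zero frame is mapped to zeros (assumption).
    emix = e1 / em if em > 0.0 else np.zeros_like(e1)
    return emix, em

rng = np.random.default_rng(0)
emix, em = energy_weighting_vector(rng.standard_normal(512))
```

The largest component of Emix is always 1, so only the relative distribution of energy over the eight 4 ms segments is passed on to the quantizer 40.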

FIG. 7 shows an alternative embodiment of the means used by the coder of FIG. 1 to determine the energy weighting vector Emix of the frame of the non-harmonic component. The spectral decompression and TFRI modules 36, 37 operate like those which have the same references in FIG. 1. A selection module 42 is added to determine the value of the modulus of the spectrum subjected to the inverse Fourier transform 37. On the basis of the estimated fundamental frequency F₀, the module 42 identifies harmonic regions and non-harmonic regions of the spectrum of the audio signal. For example, a frequency is considered to belong to a harmonic region if it lies in a frequency interval centered on a harmonic kF₀ and whose width corresponds to the width of a synthesized spectral line, and to a non-harmonic region otherwise. In non-harmonic regions, the complex signal subjected to the TFRI 37 is equal to the value of the spectrum, that is to say that its modulus and its phase correspond to the values |X| and φ_X provided by the TFR module 3. In harmonic regions, this complex signal has the same phase φ_X as the spectrum and a modulus given by the lower envelope after spectral decompression 36. This procedure according to FIG. 7 provides more precise modeling of non-harmonic regions.

The decoder shown in FIG. 8 includes an input demultiplexer 45 which extracts from the bit stream Φ, coming from a coder according to FIG. 1, the quantization indexes iF, icxs, icxi, iEm of the fundamental frequency F₀, of the cepstral coefficients representing the compressed upper envelope, of the cepstral coefficients representing the compressed lower envelope, and of the weighting vector Emix, and distributes them respectively to modules 46, 47, 48 and 49. These modules 46-49 include quantization dictionaries similar to those of modules 5, 18, 34 and 40 of FIG. 1, in order to restore the values of the quantized parameters. The modules 47 and 48 have dictionaries to form the quantized prediction residues rcx_q[n], and they deduce therefrom the quantized cepstral vectors cx_q[n] with elements identical to the elements 23-26 of FIG. 6. These quantized cepstral vectors cx_q[n] provide the cepstral coefficients cx_sup_q and cx_inf_q processed by the decoder.

A module 51 calculates the fast Fourier transform of the cepstral coefficients cx_sup for each signal frame. The frequency scale of the resulting compressed spectrum is modified non-linearly by a module 52 applying the non-linear transformation reciprocal to that of module 12 in FIG. 1, which provides the estimate LX_sup of the compressed upper envelope. A spectral decompression of LX_sup, performed by a module 53, provides the upper envelope X_sup comprising the estimated values of the modulus of the spectrum at the frequencies that are multiples of the fundamental frequency F₀.

The module 54 synthesizes the spectral estimate X_v of the harmonic component of the audio signal as a sum of spectral lines centered on the frequencies that are multiples of the fundamental frequency F₀ and whose amplitudes (in modulus) are those given by the upper envelope X_sup.

Although the digital input stream Φ does not contain specific information on the phase of the signal spectrum at the harmonics of the fundamental frequency, the decoder in FIG. 8 is capable of extracting information on this phase from cepstral coefficients cx_sup_q representing the compressed upper envelope. This phase information is used to assign a phase φ (k) to each of the spectral lines determined by the module 54 in the estimation of the harmonic component of the signal.

As a first approximation, the speech signal can be considered to be minimum-phase. Moreover, it is known that the minimum phase information can easily be deduced from a cepstral model. This minimum phase information is therefore calculated for each harmonic frequency. The minimum phase assumption means that the energy of the synthesized signal is localized at the start of each period of the fundamental frequency F₀. To be closer to a real speech signal, a little dispersion is introduced by means of a specific post-liftering of the cepstra during the synthesis of the phase. With this post-liftering, carried out by the module 55 in FIG. 8, it is possible to accentuate the formant resonances of the envelope and therefore to control the dispersion of the phases. This post-liftering is for example of the form (8).

To limit phase discontinuities, it is preferable to smooth the post-liftered cepstral coefficients, which is done by module 56. Module 57 deduces from the post-liftered and smoothed cepstral coefficients the minimum phase assigned to each spectral line representing a harmonic peak of the spectrum.

The operations performed by the modules 56, 57 for smoothing and extracting the minimum phase are illustrated by the flowchart in FIG. 9. The module 56 examines the variations of the cepstral coefficients in order to apply a lesser smoothing in the presence of sudden variations than in the presence of slow variations. For this, it performs the smoothing of the cepstral coefficients by means of a forgetting factor λ_c chosen as a function of a comparison between a threshold d_th and a distance d between two successive sets of post-liftered cepstral coefficients. The threshold d_th is itself adapted as a function of the variations of the cepstral coefficients. The first step 60 consists in calculating the distance d between the two successive vectors relating to the frames n-1 and n. These vectors, denoted here cxp[n-1] and cxp[n], correspond for each frame to the set of NCS post-liftered cepstral coefficients representing the compressed upper envelope. The distance used can in particular be the Euclidean distance between the two vectors, or a quadratic distance.

Two smoothings are first carried out, respectively by means of forgetting factors λ_min and λ_max, to determine a minimum distance d_min and a maximum distance d_max. The threshold d_th is then determined in step 70 as being located between the minimum and maximum distances d_min, d_max: d_th = β.d_max + (1-β).d_min, the coefficient β being for example equal to 0.5.

In the example shown, the forgetting factors λ_min and λ_max are themselves selected from two distinct values, respectively λ_min1, λ_min2 and λ_max1, λ_max2, between 0 and 1, the values λ_min1, λ_max1 each being substantially closer to 0 than the values λ_min2, λ_max2. If d > d_min (test 61), the forgetting factor λ_min is equal to λ_min1 (step 62); otherwise it is taken equal to λ_min2 (step 63). In step 64, the minimum distance d_min is taken equal to λ_min.d_min + (1-λ_min).d. If d > d_max (test 65), the forgetting factor λ_max is equal to λ_max1 (step 66); otherwise it is taken equal to λ_max2 (step 67). In step 68, the maximum distance d_max is taken equal to λ_max.d_max + (1-λ_max).d.

If the distance d between the two consecutive cepstral vectors is greater than the threshold d_th (test 71), a value λ_c1 relatively close to 0 is adopted for the forgetting factor λ_c (step 72). In this case, the corresponding signal is considered to be of the non-stationary type, so that there is no need to keep a long memory of the previous cepstral coefficients. If d < d_th, a value λ_c2 less close to 0 is adopted for the forgetting factor λ_c in step 73, in order to smooth the cepstral coefficients more strongly. Smoothing is performed in step 74, where the vector cxl[n] of smoothed coefficients for the current frame n is determined by:

$$ \mathrm{cxl}[n] = \lambda_c \, \mathrm{cxl}[n-1] + (1 - \lambda_c) \, \mathrm{cxp}[n] \tag{11} $$
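Steps 60 to 74 of the flowchart can be sketched as a small stateful object. This is an illustrative sketch rather than the implementation of module 56: the class name, the zero initial values of d_min and d_max, the pass-through of the first frame and the Euclidean distance are all assumptions, and the forgetting factors are left as required arguments because the text gives no numerical values for them.

```python
import numpy as np

class CepstralSmoother:
    """Adaptive smoothing of post-liftered cepstral vectors (sketch of steps 60-74)."""

    def __init__(self, lam_min1, lam_min2, lam_max1, lam_max2,
                 lam_c1, lam_c2, beta=0.5):
        self.lam_min = (lam_min1, lam_min2)
        self.lam_max = (lam_max1, lam_max2)
        self.lam_c = (lam_c1, lam_c2)
        self.beta = beta
        self.d_min = 0.0      # assumed initial state
        self.d_max = 0.0      # assumed initial state
        self.cxp_prev = None
        self.cxl = None

    def update(self, cxp):
        cxp = np.asarray(cxp, dtype=float)
        if self.cxp_prev is None:                  # assumed start-up: no smoothing
            self.cxp_prev, self.cxl = cxp, cxp.copy()
            return self.cxl
        d = float(np.linalg.norm(cxp - self.cxp_prev))                  # step 60
        lam = self.lam_min[0] if d > self.d_min else self.lam_min[1]    # tests 61-63
        self.d_min = lam * self.d_min + (1.0 - lam) * d                 # step 64
        lam = self.lam_max[0] if d > self.d_max else self.lam_max[1]    # tests 65-67
        self.d_max = lam * self.d_max + (1.0 - lam) * d                 # step 68
        d_th = self.beta * self.d_max + (1.0 - self.beta) * self.d_min  # step 70
        lam_c = self.lam_c[0] if d > d_th else self.lam_c[1]            # tests 71-73
        self.cxl = lam_c * self.cxl + (1.0 - lam_c) * cxp               # step 74, (11)
        self.cxp_prev = cxp
        return self.cxl

# Arbitrary illustrative factors, not values from the text.
smoother = CepstralSmoother(0.2, 0.9, 0.1, 0.95, 0.1, 0.8)
smoother.update([0.0, 0.0])
after_jump = smoother.update([3.0, 4.0])    # sudden change: light smoothing
after_hold = smoother.update([3.0, 4.0])    # no change: stronger smoothing
```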

The module 57 then calculates the minimum phases φ (k) associated with the harmonics kF ₀ . In known manner, the minimum phase for a harmonic of order k is given by:

$$ \varphi(k) = -2 \sum_{m=1}^{NCS-1} \mathrm{cxl}[n,m] \, \sin\left( 2\pi m k F_0 / F_e \right) \tag{12} $$

where cxl[n, m] denotes the smoothed cepstral coefficient of order m for the frame n.

In step 75, the harmonic index k is initialized to 1. To initialize the calculation of the minimum phase assigned to the harmonic k, the phase φ (k) and the cepstral index m are initialized respectively at 0 and 1 in step 76. In step 77, the module 57 adds to phase φ (k) the quantity -2.cxl [n, m] .sin (2πmk.F ₀ / F _e ). The cepstral index m is incremented in step 78 and compared to NCS in step 79. Steps 77 and 78 are repeated as long as m <NCS. When m = NCS, the calculation of the minimum phase is completed for the harmonic k, and the index k is incremented in step 80. The calculation of minimum phases 76-79 is repeated for the following harmonic as long as kF ₀ <F _e / 2 (test 81). In the exemplary embodiment according to FIG. 8, the module 54 takes account of a constant phase over the width of each spectral line, equal to the minimum phase φ (k) supplied for the corresponding harmonic k by the module 57.
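The double loop of steps 75 to 81 amounts to evaluating relation (12) for every harmonic below F_e/2. A vectorized sketch is given below; the function name and the argument layout (one vector of NCS smoothed coefficients, order 0 first) are assumptions made for the illustration.

```python
import numpy as np

def minimum_phases(cxl, f0, fe):
    """Minimum phase phi(k) of relation (12) for all harmonics with k*f0 < fe/2."""
    cxl = np.asarray(cxl, dtype=float)
    m = np.arange(1, len(cxl))              # cepstral orders 1 .. NCS-1
    k = np.arange(1, int(fe / (2.0 * f0)) + 2)
    k = k[k * f0 < fe / 2.0]                # test 81: keep k*f0 < fe/2
    # One row per harmonic k, one column per cepstral order m.
    sines = np.sin(2.0 * np.pi * np.outer(k, m) * f0 / fe)
    return -2.0 * sines @ cxl[1:]

phases = minimum_phases([0.5, 0.25], f0=1000.0, fe=8000.0)
```

The coefficient of order 0 does not contribute to the phase, which matches the initialization of the cepstral index m to 1 in step 76.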

The estimate X_v of the harmonic component is synthesized by summing spectral lines positioned at the harmonic frequencies of the fundamental frequency F₀. During this synthesis, the spectral lines can be positioned on the frequency axis with a resolution finer than the resolution of the Fourier transform. For that, a reference spectral line is precalculated once and for all at the higher resolution. This calculation can consist of a Fourier transform of the analysis window f_A with a transform size of 16384 points, providing a resolution of about 0.5 Hz per point. The synthesis of each harmonic line is then carried out by the module 54 by positioning the high-resolution reference line on the frequency axis, and by sub-sampling this reference spectral line to come down to the resolution of 15.625 Hz of the 512-point Fourier transform. This makes it possible to position the spectral line precisely.
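The positioning of a line on the coarse grid by sub-sampling a high-resolution reference can be sketched as follows. The sketch assumes a sampling frequency of 8 kHz (consistent with 256 samples lasting 32 ms) and a Hamming analysis window; it ignores the circular permutation of FIG. 3, the negative-frequency image and any truncation of the line width, and the function name is hypothetical.

```python
import numpy as np

FE = 8000.0      # sampling frequency in Hz (assumed)
N = 256          # frame length
NFFT = 512       # coarse transform size, 15.625 Hz per point
NREF = 16384     # reference transform size, about 0.49 Hz per point

window = np.hamming(N)                   # analysis window f_A (assumed Hamming)
ref_line = np.fft.fft(window, NREF)      # reference line, computed once

def harmonic_line(freq_hz, amplitude=1.0):
    """Spectral line centred on freq_hz, sampled on the 512-point grid."""
    step = NREF // NFFT                          # 32 reference points per coarse bin
    centre = int(round(freq_hz * NREF / FE))     # nearest high-resolution index
    idx = (np.arange(NFFT) * step - centre) % NREF
    return amplitude * ref_line[idx]

line = harmonic_line(10 * FE / NFFT)     # a harmonic falling exactly on bin 10
```

When the harmonic falls exactly on a coarse bin, the result coincides with the 512-point transform of the window shifted to that bin; for other frequencies the line is placed to within half a reference point.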

For the determination of the lower envelope, the TFR module 85 of the decoder of FIG. 8 receives the NCI quantized cepstral coefficients cx_inf_q of orders 0 to NCI - 1, and it advantageously supplements them with the NCS - NCI cepstral coefficients cx_sup_q of orders NCI to NCS - 1 representing the upper envelope. Indeed, it can be estimated as a first approximation that the rapid variations of the compressed lower envelope are well reproduced by those of the compressed upper envelope. In another embodiment, the TFR module 85 could consider only the NCI cepstral parameters cx_inf_q.

The module 86 converts the frequency scale reciprocally to the conversion performed by the module 32 of the coder, in order to restore the estimate LX_inf of the compressed lower envelope, which is subjected to the spectral decompression module 87. At the output of the module 87, the decoder has a lower envelope X_inf comprising the values of the modulus of the spectrum in the valleys located between the harmonic peaks.

This envelope X_inf will modulate the spectrum of a noise frame whose phase is processed as a function of the quantized weighting vector Emix extracted by the module 49. A generator 88 delivers a normalized noise frame whose 4 ms segments are weighted in a module 89 in accordance with the normalized components of the vector Emix supplied by the module 49 for the current frame. This noise is a high-pass filtered white noise, to take account of the low level which the unvoiced component in principle has at low frequencies. From the energy-weighted noise, the module 90 forms frames of 2N = 512 samples by applying the analysis window f_A, inserting 256 zero samples and applying the circular permutation for phase compensation, in accordance with what has been explained with reference to FIG. 3. The Fourier transform of the resulting frame is calculated by the TFR module 91.

The spectral estimate X_uv of the non-harmonic component is determined by the spectral synthesis module 92, which performs a frequency-by-frequency weighting. This weighting consists in multiplying each complex spectral value supplied by the TFR module 91 by the value of the lower envelope X_inf obtained for the same frequency by the spectral decompression module 87.

The spectral estimates X_v, X_uv of the harmonic (voiced in the case of a speech signal) and non-harmonic (or unvoiced) components are combined by a mixing module 95 controlled by a module 96 for analyzing the degree of harmonicity (or voicing) of the signal.

The organization of these modules 95, 96 is illustrated in FIG. 10. The analysis module 96 comprises a unit 97 for estimating a frequency-dependent degree of voicing W, from which four frequency-dependent gains are calculated, namely two gains g_v, g_uv controlling the relative importance of the harmonic and non-harmonic components in the synthesized signal, and two gains g_v_φ, g_uv_φ used to add noise to the phase of the harmonic component.

The degree of voicing W(i) is a continuously variable value between 0 and 1 determined for each frequency index i (0 < i < N) as a function of the upper envelope X_sup(i) and the lower envelope X_inf(i) obtained for this frequency i by the decompression modules 53, 87. The degree of voicing W(i) is estimated by the unit 97 for each frequency index i corresponding to a harmonic of the fundamental frequency F₀, namely the index i associated with kF₀ for k = 1, 2, ..., by an increasing function of the ratio between the upper envelope X_sup and the lower envelope X_inf at this frequency, for example according to the formula:

The threshold Vth (F ₀ ) corresponds to the average dynamics calculated on a synthetic spectrum purely voiced at the fundamental frequency. It is advantageously chosen depending on the fundamental frequency F ₀ .

The degree of voicing W (i) for a frequency other than the harmonic frequencies is obtained simply as being equal to that estimated for the nearest harmonic.

The gain g_v(i), which depends on the frequency, is obtained by applying a non-linear function to the degree of voicing W(i) (block 98). This non-linear function has for example the form represented in FIG. 11:

$$ g_v(i) = \begin{cases} 0 & \text{if } 0 \le W(i) \le W1 \\ \dfrac{W(i) - W1}{W2 - W1} & \text{if } W1 < W(i) < W2 \\ 1 & \text{if } W2 \le W(i) \le 1 \end{cases} \tag{14} $$

the thresholds W1, W2 being such that 0 < W1 < W2 < 1. The gain g_uv can be calculated in a similar way to the gain g_v (the sum of the two gains g_v, g_uv being constant, for example equal to 1), or simply deduced from the latter by the relation g_uv(i) = 1 - g_v(i), as shown diagrammatically by the subtractor 99 in FIG. 10.

It is useful to be able to add noise to the phase of the harmonic component of the signal at a given frequency if the analysis of the degree of voicing shows that the signal is rather of non-harmonic type at this frequency. For this, the phase φ'_v of the mixed harmonic component is the result of a linear combination of the phases φ_v, φ_uv of the harmonic and non-harmonic components X_v, X_uv synthesized by the modules 54, 92. The gains g_v_φ, g_uv_φ respectively applied to these phases are calculated from the degree of voicing W and also weighted as a function of the frequency index i, since noising the phase is only really useful beyond a certain frequency.
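The amplitude gains of relation (14) reduce to a clipped linear ramp of the degree of voicing. A minimal sketch follows; the function names are hypothetical and the threshold values used in the example are arbitrary, since the text gives none.

```python
import numpy as np

def voiced_gain(w, w1, w2):
    """Gain g_v of relation (14): 0 below W1, 1 above W2, linear in between."""
    w = np.asarray(w, dtype=float)
    return np.clip((w - w1) / (w2 - w1), 0.0, 1.0)

def unvoiced_gain(w, w1, w2):
    """Gain g_uv deduced by the subtractor 99: g_uv = 1 - g_v."""
    return 1.0 - voiced_gain(w, w1, w2)

w = np.array([0.0, 0.3, 0.5, 0.7, 1.0])
g_v = voiced_gain(w, w1=0.3, w2=0.7)       # thresholds chosen for illustration
g_uv = unvoiced_gain(w, w1=0.3, w2=0.7)
```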

A first gain g_v1_φ is calculated by applying a non-linear function to the degree of voicing W(i), as shown diagrammatically by block 100 in FIG. 10. This non-linear function can have the form represented in FIG. 12:

$$ g_{v1\_\varphi}(i) = \begin{cases} G1 & \text{if } 0 \le W(i) \le W3 \\ G1 + (1 - G1) \, \dfrac{W(i) - W3}{W4 - W3} & \text{if } W3 < W(i) < W4 \\ 1 & \text{if } W4 \le W(i) \le 1 \end{cases} \tag{15} $$

the thresholds W3 and W4 being such that 0 < W3 < W4 < 1, and the minimum gain G1 being between 0 and 1.

A multiplier 101 multiplies, for each frequency index i, the gain g_v1_φ by another gain g_v2_φ depending only on the frequency index i, to form the gain g_v_φ(i). The gain g_v2_φ(i) depends non-linearly on the frequency index i, for example as shown in FIG. 13:

$$ g_{v2\_\varphi}(i) = \begin{cases} 1 & \text{if } 0 \le i \le i1 \\ 1 - (1 - G2) \, \dfrac{i - i1}{i2 - i1} & \text{if } i1 < i < i2 \\ G2 & \text{if } i2 \le i < N \end{cases} \tag{16} $$

the indices i1 and i2 being such that 0 < i1 < i2 < N, and the minimum gain G2 being between 0 and 1. The gain g_uv_φ(i) can be calculated simply as being equal to 1 - g_v_φ(i) = 1 - g_v1_φ(i).g_v2_φ(i) (subtractor 102 of FIG. 10).

The complex spectrum Y of the synthesized signal is produced by the mixing module 95, which implements the following mixing relation, for 0 < i < N:

$$ Y(i) = g_v(i) \, |X_v(i)| \, \exp[\, j \varphi'_v(i) \,] + g_{uv}(i) \, X_{uv}(i) \tag{17} $$

with

$$ \varphi'_v(i) = g_{v\_\varphi}(i) \, \varphi_v(i) + g_{uv\_\varphi}(i) \, \varphi_{uv}(i) \tag{18} $$

where φ_v(i) denotes the argument of the complex number X_v(i) supplied by the module 54 for the frequency of index i (block 104 of FIG. 10), and φ_uv(i) denotes the argument of the complex number X_uv(i) supplied by the module 92 (block 105 of FIG. 10). This combination is carried out by the multipliers 106-110 and the adders 111-112 shown in FIG. 10.
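The mixing of relations (17) and (18), together with the gain laws of FIGS. 11 to 13, can be sketched in a few lines. The helper `ramp`, the function name `mix_spectra` and all threshold values in the example are assumptions made for illustration; the piecewise-linear shapes are those described for the gains g_v, g_v1_φ and g_v2_φ.

```python
import numpy as np

def ramp(x, x_lo, x_hi, y_lo, y_hi):
    """Piecewise-affine map: y_lo up to x_lo, y_hi from x_hi, linear in between."""
    t = np.clip((np.asarray(x, dtype=float) - x_lo) / (x_hi - x_lo), 0.0, 1.0)
    return y_lo + (y_hi - y_lo) * t

def mix_spectra(x_v, x_uv, w, w1, w2, w3, w4, g1, i1, i2, g2):
    """Mixed spectrum Y(i) from the harmonic and non-harmonic estimates."""
    x_v = np.asarray(x_v, dtype=complex)
    x_uv = np.asarray(x_uv, dtype=complex)
    i = np.arange(len(x_v))
    g_v = ramp(w, w1, w2, 0.0, 1.0)                  # amplitude gain of the voiced part
    g_uv = 1.0 - g_v                                 # subtractor 99
    g_v_phi = ramp(w, w3, w4, g1, 1.0) * ramp(i, i1, i2, 1.0, g2)   # multiplier 101
    g_uv_phi = 1.0 - g_v_phi                         # subtractor 102
    phi = g_v_phi * np.angle(x_v) + g_uv_phi * np.angle(x_uv)       # relation (18)
    return g_v * np.abs(x_v) * np.exp(1j * phi) + g_uv * x_uv       # relation (17)

x_v = np.array([1 + 1j, 2j, -1.0, 3.0])
x_uv = np.array([0.5, -0.5j, 0.2 + 0.1j, -0.3])
params = dict(w1=0.2, w2=0.8, w3=0.3, w4=0.9, g1=0.4, i1=10, i2=20, g2=0.5)
y_voiced = mix_spectra(x_v, x_uv, np.ones(4), **params)
y_unvoiced = mix_spectra(x_v, x_uv, np.zeros(4), **params)
```

With a degree of voicing of 1 at low frequency indices the mixed spectrum reduces to X_v, and with a degree of voicing of 0 it reduces to X_uv, which is the intended limiting behaviour.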

The mixed spectrum Y (i) for 0 <i <2N (with Y (2N-1-i) = Y (i)) is then transformed in the time domain by the TFRI 115 module (Figure 8). Only the first N / 2 = 128 and the last N / 2 = 128 samples of the frame of 2N = 512 samples produced by the module 115 are retained, and the inverse circular permutation of that illustrated in FIG. 3 is applied to obtain the synthesized frame of N = 256 samples weighted by the analysis window f _A. The frames successively obtained in this way are finally processed by the time synthesis module 116 which forms the decoded audio signal x.

The temporal synthesis module 116 performs an overlap sum of modified frames with respect to those successively evaluated at the output of module 115. The modification can be seen in two stages illustrated respectively in FIGS. 14 and 15.

The first step (FIG. 14) consists in multiplying each frame 2' delivered by the TFRI module 115 by a window 1/f_A, the inverse of the analysis window f_A used by the module 1 of the coder. The samples of the resulting frame 2" are therefore weighted uniformly.

The second step (FIG. 15) consists in multiplying the samples of this frame 2" by a synthesis window f_s satisfying the following properties:

$$ f_s(N-L+i) + f_s(i) = A \quad \text{for } 0 \le i < L \tag{19} $$

$$ f_s(i) = A \quad \text{for } L \le i < N-L \tag{20} $$

where A denotes an arbitrary positive constant, for example A = 1. The synthesis window f_s(i) gradually increases from 0 to A for i going from 0 to L. It is for example a raised half-sinusoid:

$$ f_s(i) = \frac{A}{2} \left( 1 - \cos\left[ (i + 1/2) \, \pi / L \right] \right) \quad \text{for } 0 \le i < L \tag{21} $$

After reweighting each frame 2" by the synthesis window f_s, the module 116 positions the successive frames with their time offsets of M = 160 samples and their time overlaps of L = 96 samples, then it performs the sum of the frames thus positioned in time. Due to the properties (19) and (20) of the synthesis window f_s, each sample of the decoded audio signal x thus obtained is assigned a uniform overall weight, equal to A. This overall weight comes from the contribution of a single frame if the sample has in this frame a rank i such that L ≤ i < N - L, and comprises the summed contributions of two successive frames if 0 ≤ i < L or N - L ≤ i < N.
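The uniform overall weight can be checked numerically. The sketch below builds a synthesis window from relations (19) to (21), with the falling edge taken as the mirror image of the rising edge (which is what relation (19) implies for this raised half-sinusoid), and overlap-adds frames that are assumed to have already been divided by the analysis window. Function names are hypothetical.

```python
import numpy as np

def synthesis_window(n=256, l=96, a=1.0):
    """Synthesis window f_s: raised half-sinusoid edges of length L, flat top A."""
    i = np.arange(l)
    rise = 0.5 * a * (1.0 - np.cos((i + 0.5) * np.pi / l))   # relation (21)
    fs = np.full(n, a)                                        # relation (20)
    fs[:l] = rise
    fs[n - l:] = rise[::-1]     # falling edge, so that relation (19) holds
    return fs

def overlap_add(frames, hop):
    """Sum frames positioned with a time offset of `hop` samples."""
    frames = np.asarray(frames, dtype=float)
    count, n = frames.shape
    out = np.zeros(hop * (count - 1) + n)
    for j in range(count):
        out[j * hop:j * hop + n] += frames[j]
    return out

fs = synthesis_window()
# Four uniformly weighted frames (all ones) reweighted by f_s, offset M = 160.
weight = overlap_add(np.tile(fs, (4, 1)), hop=160)
```

Apart from the first and last L samples of the whole signal, which are covered by a single window edge, every output sample receives the weight A.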

It is thus possible to carry out the time synthesis in a simple manner even if, as in the case considered, the overlap L between two successive frames is smaller than half the size N of these frames.

The two steps outlined above for modifying signal frames can be merged into one step. It is enough to precompute a compound window f_c(i) = f_s(i) / f_A(i), and to simply multiply the frames 2' of N = 256 samples delivered by the module 115 by the compound window f_c before performing the overlapping summation.

FIG. 16 shows the appearance of the compound window f _c in the case where the analysis window f _A is a Hamming window and the synthesis window f _s has the form given by the relations (19) to (21) .

Other forms of the synthesis window f_s satisfying the relations (19) and (20) can be used. In the variant of FIG. 17, it is a piecewise affine function defined by:

$$ f_s(i) = A \, i / L \quad \text{for } 0 \le i < L \tag{22} $$

In order to improve the coding quality of the audio signal, the coder in FIG. 1 can increase the rate of formation and analysis of the frames, in order to transmit more quantization parameters to the decoder. In the frame structure shown in FIG. 2, a frame of N = 256 samples (32 ms) is formed every 20 ms. These frames of 256 samples could be formed at a higher rate, for example every 10 ms, two successive frames then having an offset of M/2 = 80 samples and an overlap of 176 samples.

Under these conditions, it is possible to transmit the complete sets of quantization parameters iF, icxs, icxi, iEm for only a subset of the frames, and to transmit for the other frames parameters making it possible to carry out an adequate interpolation at the decoder. In the example envisaged above, the subset for which complete sets of parameters are transmitted can be constituted by the frames of integer rank n, whose period is M/F_e = 20 ms, and the frames for which an interpolation is performed can be those of half-integer rank n + 1/2, which are offset by 10 ms relative to the frames of the subset.

In the embodiment illustrated in FIG. 18, the notations cx_q[n-1] and cx_q[n] denote quantized cepstral vectors determined, for two successive frames of integer rank, by the quantization module 18 and/or by the quantization module 34. These vectors comprise for example four consecutive cepstral coefficients each. They could also include more cepstral coefficients. A module 120 performs an interpolation of these two cepstral vectors cx_q[n-1] and cx_q[n], in order to estimate an intermediate value cx_i[n-1/2]. The interpolation performed by the module 120 can be a simple arithmetic mean of the vectors cx_q[n-1] and cx_q[n]. As a variant, the module 120 could apply a more sophisticated interpolation formula, for example polynomial, also relying on the cepstral vectors obtained for frames prior to the frame n-1. Moreover, if more than one interpolated frame is interposed between two consecutive frames of integer rank, the interpolation takes account of the relative position of each interpolated frame.

Using the means described above, the coder also calculates the cepstral coefficients cx[n-1/2] relating to the frame of half-integer rank. In the case of the upper envelope, these cepstral coefficients are those provided by the TFRI module 13 after post-liftering 15 (for example with the same post-liftering coefficients as for the previous frame n-1) and normalization 16. In the case of the lower envelope, the cepstral coefficients cx[n-1/2] are those delivered by the TFRI module 33.

A subtractor 121 forms the difference ecx[n-1/2] between the cepstral coefficients cx[n-1/2] calculated for the frame of half-integer rank and the coefficients cx_i[n-1/2] estimated by interpolation. This difference is supplied to a quantization module 122, which sends quantization indexes icx[n-1/2] to the output multiplexer 6 of the coder. The module 122 operates for example by vector quantization of the interpolation errors ecx[n-1/2] successively determined for the half-integer rank frames.

This quantization of the interpolation error can be carried out by the coder for each of the NCS + NCI cepstral coefficients used by the decoder, or only for some of them, typically those of the lowest orders.

The corresponding means of the decoder are illustrated in FIG. 19.

The decoder essentially functions like the one described with reference to FIG. 8 to determine the signal frames of integer rank. An interpolation module 124 identical to the module 120 of the coder estimates the intermediate coefficients cx_i[n-1/2] from the quantized coefficients cx_q[n-1] and cx_q[n] supplied by the module 47 and/or the module 48 from the indexes icxs, icxi extracted from the stream Φ. A parameter extraction module 125 receives the quantization index icx[n-1/2] from the input demultiplexer 45 of the decoder, and deduces therefrom the quantized interpolation error ecx_q[n-1/2] from the same quantization dictionary as that used by the module 122 of the coder. An adder 126 sums the cepstral vectors cx_i[n-1/2] and ecx_q[n-1/2] in order to provide the cepstral coefficients cx[n-1/2] which will be used by the decoder (modules 51-57, 95, 96, 115 and/or modules 85-87, 92, 95, 96, 115) to form the interpolated frame of rank n-1/2.

If only some of the cepstral coefficients have been the subject of an interpolation error quantification, the others are determined by the decoder by a simple interpolation, without correction.
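The coder side (modules 120 to 122) and the decoder side (modules 124 to 126) of this scheme can be sketched together. The arithmetic-mean interpolation is the simple case mentioned in the text; the nearest-neighbour search, the function names and the tiny three-entry dictionary are assumptions made for illustration, whereas a real dictionary would come from prior training.

```python
import numpy as np

def interpolate(cx_q_prev, cx_q_next):
    """Arithmetic mean of two quantized cepstral vectors (modules 120 / 124)."""
    return 0.5 * (np.asarray(cx_q_prev, dtype=float) + np.asarray(cx_q_next, dtype=float))

def encode_half_frame(cx_half, cx_q_prev, cx_q_next, codebook):
    """Index of the dictionary entry closest to the interpolation error."""
    ecx = np.asarray(cx_half, dtype=float) - interpolate(cx_q_prev, cx_q_next)  # subtractor 121
    dist = np.sum((np.asarray(codebook, dtype=float) - ecx) ** 2, axis=1)
    return int(np.argmin(dist))                                                 # module 122

def decode_half_frame(index, cx_q_prev, cx_q_next, codebook):
    """Interpolated vector corrected by the quantized error (modules 124-126)."""
    ecx_q = np.asarray(codebook, dtype=float)[index]                            # module 125
    return interpolate(cx_q_prev, cx_q_next) + ecx_q                            # adder 126

codebook = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]    # hypothetical dictionary
cx_q_prev, cx_q_next = [1.0, 0.0], [3.0, 2.0]
index = encode_half_frame([2.09, 0.92], cx_q_prev, cx_q_next, codebook)
cx_half_decoded = decode_half_frame(index, cx_q_prev, cx_q_next, codebook)
```

Because both sides interpolate from the quantized vectors cx_q rather than from the unquantized ones, the coder and the decoder form the same prediction and only the residual needs to be transmitted.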

The decoder can also interpolate the other parameters F ₀ , Emix used to synthesize the signal frames. The fundamental frequency F ₀ can be interpolated linearly, either in the time domain, or (preferably) directly in the frequency domain. For the possible interpolation of the energy weighting vector Emix, the interpolation should be carried out after denormalization and of course taking account of the time offsets between frames.

It should be noted that it is particularly advantageous, to interpolate the representation of the spectral envelopes, to perform this interpolation in the cepstral domain. Contrary to an interpolation carried out on other parameters, such as the LSP coefficients (“Line Spectrum Pairs”), the linear interpolation of the cepstral coefficients corresponds to the linear interpolation of the compressed spectral amplitudes.
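The linearity can be made explicit with a short derivation. It assumes, as a simplification not written in this form in the text, that the compressed envelope LX is the Fourier transform of a real, even cepstrum truncated to NC coefficients, and it leaves aside the frequency-scale conversion; the symbols c_a, c_b, c_i and ρ are introduced here only for the illustration.

$$ LX(f) = c[0] + 2 \sum_{m=1}^{NC-1} c[m] \, \cos\left( 2\pi m f / F_e \right) $$

$$ c_i[m] = \rho \, c_a[m] + (1-\rho) \, c_b[m] \;\; \text{for all } m \quad \Longrightarrow \quad LX_i(f) = \rho \, LX_a(f) + (1-\rho) \, LX_b(f) \;\; \text{for all } f $$

Since LX is a linear function of the cepstral coefficients, any weighted average of two cepstral vectors yields the same weighted average of the two compressed envelopes at every frequency. No such property holds for LSP coefficients, which are related to the envelope non-linearly.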

In the variant represented in FIG. 20, the coder uses the cepstral vectors cx_q[n], cx_q[n-1], ..., cx_q[n-r] and cx[n-1/2] calculated for the last past frames (r ≥ 1) to identify an optimal interpolator filter which, when fed with the quantized cepstral vectors cx_q[n-r], ..., cx_q[n] relating to the frames of integer rank, delivers an interpolated cepstral vector cx_i[n-1/2] at a minimum distance from the vector cx[n-1/2] calculated for the last frame of half-integer rank.

In the example shown in FIG. 20, this interpolator filter 128 is present in the coder, and a subtractor 129 subtracts its output cx_i[n-1/2] from the calculated cepstral vector cx[n-1/2]. A minimization module 130 determines the set of parameters {P} of the interpolator filter 128 for which the interpolation error ecx[n-1/2] delivered by the subtractor 129 has a minimum norm. This set of parameters {P} is sent to a quantization module 131, which provides a corresponding quantization index iP to the output multiplexer 6 of the coder.

Depending on the bit rate allocated in the stream Φ to the quantization indexes of the parameters {P} defining the optimal interpolator filter 128, it is possible to adopt a more or less fine quantization of these parameters, or a more or less elaborate form of the interpolator filter, or else to provide several separately quantized interpolator filters for different vectors of cepstral coefficients.

In a simple embodiment, the interpolator filter 128 is linear, with r = 1:

$$ \mathrm{cx\_i}[n-1/2] = \rho \, \mathrm{cx\_q}[n-1] + (1-\rho) \, \mathrm{cx\_q}[n] \tag{23} $$

and the set of parameters {P} is limited to the coefficient ρ between 0 and 1.
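For the linear filter (23), the minimization performed by module 130 has a closed form if the norm is taken to be Euclidean, which is an assumption here since the text does not specify the norm. Writing the error as (cx - cx_q[n]) - ρ.(cx_q[n-1] - cx_q[n]), the least-squares coefficient is the projection below, clipped to the interval [0, 1]. The function name is hypothetical.

```python
import numpy as np

def optimal_rho(cx_half, cx_q_prev, cx_q_next):
    """Least-squares coefficient rho of relation (23), restricted to [0, 1]."""
    a = np.asarray(cx_q_prev, dtype=float)    # cx_q[n-1]
    b = np.asarray(cx_q_next, dtype=float)    # cx_q[n]
    c = np.asarray(cx_half, dtype=float)      # cx[n-1/2]
    diff = a - b
    denom = float(diff @ diff)
    if denom == 0.0:
        return 0.5    # identical neighbours: every rho gives the same vector
    rho = float((c - b) @ diff) / denom
    # The squared error is a convex parabola in rho, so clipping gives the
    # constrained optimum on [0, 1].
    return min(1.0, max(0.0, rho))

rho_inside = optimal_rho([2.5, 1.5], [1.0, 0.0], [3.0, 2.0])
rho_clipped = optimal_rho([5.0, 4.0], [1.0, 0.0], [3.0, 2.0])
```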

From the quantization indexes iP of the parameters {P} obtained in the bit stream Φ, the decoder reconstructs the interpolator filter 128 (except for quantization errors), and processes the cepstral vectors cx_q[n-r], ..., cx_q[n] in order to estimate the cepstral coefficients cx[n-1/2] used to synthesize the half-integer rank frames.

In general, the decoder can use a simple interpolation method (without transmission of parameters from the coder for the half-integer rank frames), an interpolation method taking account of a quantized interpolation error (according to FIGS. 18 and 19), or an interpolation method with an optimal interpolator filter (according to FIG. 20) to evaluate the half-integer rank frames, in addition to the integer rank frames evaluated directly as explained with reference to FIGS. 8 to 13. The time synthesis module 116 can then combine all of these evaluated frames to form the synthesized signal x in the manner explained below with reference to FIGS. 14, 21 and 22.

As in the temporal synthesis method previously described, the module 116 performs an overlap sum of modified frames with respect to those successively evaluated at the output of the module 115, and this modification can be seen in two stages, the first of which is identical to that previously described with reference to FIG. 14 (dividing the samples of the frame 2' by the analysis window f_A).

The second step (FIG. 21) consists in multiplying the samples of the renormalized frame 2" by a synthesis window f_s satisfying the following properties:

$$ f_s(i) = 0 \quad \text{for } 0 \le i < N/2 - M/p \text{ and } N/2 + M/p \le i < N \tag{24} $$

$$ f_s(i) + f_s(i + M/p) = A \quad \text{for } N/2 - M/p \le i < N/2 \tag{25} $$

where A denotes an arbitrary positive constant, for example A = 1, and p is the integer such that the time offset between the successive frames (calculated directly and interpolated) is M/p samples, i.e. p = 2 in the example described. The synthesis window f_s(i) increases progressively for i going from N/2 - M/p to N/2. It is for example a raised sinusoid over the interval N/2 - M/p ≤ i < N/2 + M/p. In particular, the synthesis window f_s can be, over this interval, a Hamming window (as shown in FIG. 21) or a Hanning window.

FIG. 21 shows the successive frames 2" repositioned in time by the module 116. The hatching indicates the portions eliminated from the frames (synthesis window at 0). It can be seen that, by performing the overlapping sum of the samples of the successive frames, the property (25) ensures a homogeneous weighting of the samples of the synthesized signal.
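Properties (24) and (25) can be checked numerically for the example values N = 256, M = 160 and p = 2. The sketch below uses a raised sinusoid (Hanning-type) over the central 2M/p samples, sampled at half-integer positions; this particular sampling and the function names are assumptions, and a Hamming-type window built the same way would give a constant other than 1.

```python
import numpy as np

def interp_synthesis_window(n=256, m=160, p=2, a=1.0):
    """Synthesis window supported on [N/2 - M/p, N/2 + M/p), zero elsewhere."""
    hop = m // p                          # offset between successive frames
    start = n // 2 - hop
    j = np.arange(2 * hop)
    fs = np.zeros(n)                      # relation (24) outside the support
    fs[start:start + 2 * hop] = 0.5 * a * (1.0 - np.cos(2.0 * np.pi * (j + 0.5) / (2 * hop)))
    return fs

def total_weight(fs, hop, n_frames):
    """Overall weight obtained by overlap-adding n_frames copies of the window."""
    out = np.zeros(hop * (n_frames - 1) + len(fs))
    for k in range(n_frames):
        out[k * hop:k * hop + len(fs)] += fs
    return out

fs_interp = interp_synthesis_window()
weight_interp = total_weight(fs_interp, hop=80, n_frames=6)
```

Only the central 160 samples of each 256-sample frame contribute to the output, which is why the frames must be available every M/p = 80 samples for the weighting to be uniform.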

As in the synthesis method illustrated by FIGS. 14 and 15, the weighting procedure of the frames obtained by inverse Fourier transform of the spectra Y can be carried out in a single step, with a compound window f_c(i) = f_s(i) / f_A(i). FIG. 22 shows the shape of the compound window f_c in the case where the windows f_A and f_s are of the Hamming type.

Like the temporal synthesis method illustrated by FIGS. 14 to 17, the one illustrated by FIGS. 14, 21 and 22 makes it possible to handle an overlap L between two analysis frames (for which the analysis is carried out completely) smaller than half the size N of these frames. In general, this latter method is applicable when the successive analysis frames have mutual time offsets M of more than N/2 samples (possibly even more than N samples if a very low bit rate is required), the interpolation leading to a set of frames whose mutual time offsets are less than N/2 samples.

The interpolated frames can be the subject of a reduced transmission of coding parameters, as described above, but this is not compulsory. This embodiment makes it possible to maintain a relatively large interval M between two analysis frames, and therefore to limit the required transmission rate, while limiting the discontinuities likely to appear due to the size of this interval relative to the time scales typical of the variations of the parameters of the audio signal, in particular the cepstral coefficients and the fundamental frequency.

Claims

1. Method for decoding an input digital stream (Φ) representing an encoded audio signal, in which a set of successive frames of N samples of the audio signal is synthesized from coding data included in the input digital stream, in which the coding data comprise, for only a subset of the frames, data representative of spectral amplitudes associated with frequencies of the spectrum of the audio signal, characterized in that, for each of the frames of said subset, cepstral coefficients representative of at least some of said spectral amplitudes are determined on the basis of the coding data, and in that, for the frames not forming part of said subset, said cepstral coefficients are interpolated, and the interpolated cepstral coefficients are used to generate a spectral estimate (Y) of the audio signal, which is transformed to the time domain to obtain the synthesized frame.

2. Method according to claim 1, in which the coding data comprise quantization data of said cepstral coefficients.

3. Method according to claim 1 or 2, in which a fundamental frequency (F₀) of the audio signal is determined from quantization data included in the input digital stream (Φ), in which, for each frame of said subset, an upper spectral envelope (X_sup) of the audio signal, corresponding to spectral amplitudes associated with frequencies that are multiples of the fundamental frequency, is determined from the coding data, and in which, for each frame not forming part of said subset, the upper spectral envelope of the audio signal is determined from the interpolated cepstral coefficients.

4. Method according to any one of claims 1 to 3, in which a fundamental frequency (F₀) of the audio signal is determined from quantization data included in the input digital stream (Φ), in which, for each frame of said subset, a lower spectral envelope (X_inf) of the audio signal, corresponding to spectral amplitudes associated with frequencies located in regions of the spectrum intermediate between the frequencies that are multiples of the fundamental frequency, is determined from the coding data, and in which, for each frame not forming part of said subset, the lower spectral envelope of the audio signal is determined from the interpolated cepstral coefficients.

5. Method according to any one of claims 1 to 4, in which the successive frames of said set are overlapping and composed of N samples of the audio signal weighted by an analysis window (f_A), in which the successive frames of said subset have mutual time shifts of M samples, the number M being greater than N/2, while the successive frames of said set have mutual time shifts of M/p samples, p being an integer greater than 1, in which each synthesized frame of the set is modified by applying to it a processing corresponding to a division by said analysis window (f_A) and to a multiplication by a synthesis window (f_s), and the decoded audio signal (x) is formed as an overlapping sum of the modified frames, and in which, the samples of a frame having ranks i numbered from 0 to N-1, the synthesis window f_s(i) has a support limited to the ranks i going from N/2 - M/p to N/2 + M/p and satisfies f_s(i) + f_s(i + M/p) = A for N/2 - M/p < i < N/2, A being a positive constant.

6. Method according to any one of claims 1 to 5, in which, for the frames not forming part of said subset, the interpolated cepstral coefficients (cx_i[n-1/2]) are corrected on the basis of data (icx[n-1/2]) quantizing an interpolation error (ecx_q[n-1/2]), said data being included in the coding data.

7. Method according to any one of claims 1 to 5, in which, for the frames not forming part of said subset, the cepstral coefficients (cx_q[n]) are interpolated by a filter (128) determined on the basis of interpolator filter quantization data (iP) included in the coding data.

8. Audio decoder, comprising means for executing a method according to any one of claims 1 to 7.

9. Method for coding an audio signal (x), in which a spectrum of the audio signal is determined by a frequency-domain transform of a frame of the audio signal, and data representative of spectral amplitudes associated with at least some of the frequencies of the spectrum are included in a digital output stream (Φ), in which the spectrum of the audio signal is determined for a set of successive frames of N samples of the audio signal, and in which cepstral coefficients representative of at least some of said spectral amplitudes are determined for each of the frames of said set, characterized in that said data representative of spectral amplitudes are included in the digital output stream for only a subset of the frames, and in that, for the frames not forming part of said subset, data (icx[n-1/2]) quantizing an interpolation error (ecx[n-1/2]) of said cepstral coefficients are included in the digital output stream (Φ).

10. Method for coding an audio signal (x), in which a spectrum of the audio signal is determined by a frequency-domain transform of a frame of the audio signal, and data representative of spectral amplitudes associated with at least some of the frequencies of the spectrum are included in a digital output stream (Φ), in which the spectrum of the audio signal is determined for a set of successive frames of N samples of the audio signal, and in which cepstral coefficients representative of at least some of said spectral amplitudes are determined for each of the frames of said set, characterized in that said data representative of spectral amplitudes are included in the digital output stream for only a subset of the frames, and in that, for the frames not forming part of said subset, an optimal interpolator filter (128) is determined for said cepstral coefficients, and data (iP) representing said optimal interpolator filter are included in the digital output stream (Φ).

11. Method according to claim 9 or 10, in which said data representative of the spectral amplitudes comprise quantization data of the cepstral coefficients.

12. Method according to any one of claims 9 to 11, in which a fundamental frequency (F₀) of the audio signal is estimated, and in which the interpolated cepstral coefficients comprise cepstral coefficients calculated by transforming into the cepstral domain a compressed upper envelope (LX_sup) of the spectrum of the audio signal.

13. Method according to claim 12, in which the compressed upper envelope (LX_sup) is determined by interpolation of spectral amplitudes associated with frequencies that are multiples of the fundamental frequency (F₀), with application of a spectral compression function.

14. Method according to any one of claims 9 to 13, in which a fundamental frequency (F₀) of the audio signal is estimated, and in which the interpolated cepstral coefficients comprise cepstral coefficients calculated by transforming into the cepstral domain a compressed lower envelope (LX_inf) of the spectrum of the audio signal.

15. Method according to claim 14, in which the compressed lower envelope (LX_inf) is determined by interpolation of spectral amplitudes associated with frequencies located in regions of the spectrum intermediate between the frequencies that are multiples of the fundamental frequency (F₀), with application of a spectral compression function.

16. Method according to any one of claims 9 to 15, in which the successive frames of said subset have mutual time offsets of more than N / 2 samples, and the successive frames of said set have mutual time offsets of less than N / 2 samples.

17. Audio coder, comprising means for executing a method according to any one of claims 9 to 16.
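
By way of non-limiting illustration of the interpolation-error correction recited in claims 6 and 9, the exchange between coder and decoder might be sketched in Python as follows. The uniform scalar quantizer, its step size, the midpoint interpolation rule, the function names and the numerical values are all assumptions made for the example; the quantizer actually used is not specified here.

```python
def interpolate(c_prev_q, c_next_q):
    # Midpoint interpolation (an assumption) of the quantized cepstral
    # coefficients of the two neighbouring frames of the subset.
    return [0.5 * (a + b) for a, b in zip(c_prev_q, c_next_q)]

def encode_interpolation_error(c_actual, c_prev_q, c_next_q, step=0.05):
    # Coder side: compare the cepstrum actually computed for the intermediate
    # frame with its interpolation and quantize the error to integer indices
    # (hypothetical uniform scalar quantizer standing in for icx[n-1/2]).
    c_interp = interpolate(c_prev_q, c_next_q)
    return [round((a - b) / step) for a, b in zip(c_actual, c_interp)]

def decode_intermediate_frame(indices, c_prev_q, c_next_q, step=0.05):
    # Decoder side: repeat the same interpolation, then add the decoded error.
    c_interp = interpolate(c_prev_q, c_next_q)
    return [b + k * step for b, k in zip(c_interp, indices)]

# Hypothetical values for frames n-1, n and the intermediate frame n-1/2.
c_prev_q = [1.0, 0.4, -0.2]
c_next_q = [1.2, 0.0, 0.1]
c_actual = [1.13, 0.27, -0.02]

indices = encode_interpolation_error(c_actual, c_prev_q, c_next_q)
c_decoded = decode_intermediate_frame(indices, c_prev_q, c_next_q)
```

In this sketch only the integer indices are transmitted for the intermediate frame, and each decoded coefficient differs from the coder's value by at most half a quantization step. Both sides must interpolate from the same quantized neighbouring coefficients for the correction to line up.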