EP1388846A2 - Method and device for wideband speech coding able to independently control short-term and long-term distortions - Google Patents

Publication number
EP1388846A2
EP1388846A2 EP03291749A EP03291749A EP1388846A2 EP 1388846 A2 EP1388846 A2 EP 1388846A2 EP 03291749 A EP03291749 A EP 03291749A EP 03291749 A EP03291749 A EP 03291749A EP 1388846 A2 EP1388846 A2 EP 1388846A2
Authority
EP
European Patent Office
Prior art keywords
weighting filter
term
filter
long
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP03291749A
Other languages
German (de)
French (fr)
Other versions
EP1388846A3 (en)
Inventor
Michael Ansorge
Giuseppina Biundo Lotito
Benito Carnero
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
STMicroelectronics NV
Original Assignee
STMicroelectronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from EP02015919A (EP1383113A1)
Application filed by STMicroelectronics NV
Priority to EP03291749A (EP1388846A3)
Publication of EP1388846A2
Publication of EP1388846A3
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Definitions

  • The invention relates to wideband speech encoding/decoding, in particular but not exclusively for mobile telephony.
  • In wideband, the bandwidth of the speech signal is between 50 and 7000 Hz.
  • Successive speech sequences sampled at a predetermined sampling frequency are processed in a CELP-type coding device using code-excited linear prediction (for example ACELP: "algebraic-code-excited linear prediction"), well known to those skilled in the art and described in particular in ITU-T Recommendation G.729, version 3/96, entitled "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction".
  • The prediction coder CD, of the CELP type, is based on the code-excited linear predictive coding model.
  • The coder operates on speech superframes equivalent, for example, to 20 ms of signal and each comprising 320 samples.
  • The linear prediction parameters, i.e. the coefficients of the linear prediction filter, also called the short-term synthesis filter 1/A(z), are extracted for each speech superframe.
  • Each superframe is subdivided into 5 ms frames comprising 80 samples.
  • At every frame, the speech signal is analyzed to extract the parameters of the CELP prediction model (that is to say, in particular, a long-term digital excitation word v_i extracted from an adaptive codebook DLT, also called the "adaptive long-term dictionary", an associated long-term gain Ga, a short-term excitation word c_j extracted from a fixed codebook DCT, also called the "short-term dictionary", and an associated short-term gain Gc).
  • On reception, these parameters are used in a decoder to recover the excitation parameters and the predictive filter parameters. The speech is then reconstructed by filtering this excitation stream through a short-term synthesis filter.
  • The short-term dictionary DCT is based on a fixed structure, for example of the stochastic type, or of the algebraic type using an interleaved permutation model of Dirac pulses.
  • In the case of an algebraic structure, the codebook contains innovation excitations, also called algebraic or short-term excitations, and each vector contains a certain number of non-zero pulses, for example four, each of which can have amplitude +1 or -1 at predetermined positions.
  • The processing means of the coder CD functionally comprise first extraction means MEXT1 intended to extract the long-term excitation word, and second extraction means MEXT2 intended to extract the short-term excitation word. These means are implemented, for example, in software within a processor.
  • These extraction means comprise a predictive filter FP having a transfer function equal to 1/A(z), as well as a perceptual weighting filter FPP having a transfer function W(z).
  • The perceptual weighting filter is applied to the signal to model the perception of the ear.
  • The extraction means comprise means MECM intended to minimize a mean square error.
  • The linear prediction synthesis filter FP models the spectral envelope of the signal. Linear predictive analysis is performed every superframe to determine the linear predictive filter coefficients. These are converted to line spectrum pairs (LSP) and quantized by two-stage predictive vector quantization.
  • Each 20 ms speech superframe is divided into four 5 ms frames, each containing 80 samples.
  • The quantized LSP parameters are transmitted to the decoder once per superframe, whereas the long-term and short-term parameters are transmitted at every frame.
  • The quantized and unquantized coefficients of the linear prediction filter are used for the most recent frame of a superframe, while the other three frames of the same superframe use an interpolation of these coefficients.
  • The open-loop pitch delay is estimated, for example, every two frames on the basis of the perceptually weighted speech signal. The following operations are then repeated at every frame:
  • The long-term target signal X_LT is computed by filtering the sampled speech signal s(n) through the perceptual weighting filter FPP.
  • The impulse response of the weighted synthesis filter is computed.
  • A closed-loop pitch analysis using mean-square-error minimization is then carried out to determine the long-term excitation word v_i and the associated gain Ga, by means of the target signal and the impulse response, by searching around the value of the open-loop pitch delay.
  • The long-term target signal is then updated by subtracting the filtered contribution y of the adaptive codebook DLT, and this new short-term target signal X_ST is used when searching the fixed codebook DCT in order to determine the short-term excitation word c_j and the associated gain Gc.
  • The quality of a CELP algorithm depends strongly on the richness of the short-term excitation dictionary DCT, for example an algebraic excitation dictionary. While the effectiveness of such an algorithm is unquestionable for narrowband signals (300-3400 Hz), problems arise for wideband signals.
  • The object of the invention is to control the short-term and long-term distortions independently.
  • The invention therefore provides a wideband speech encoding method in which the speech is sampled so as to obtain successive speech frames each comprising a predetermined number of samples, and, at each speech frame, parameters of a code-excited linear prediction model are determined, these parameters comprising a long-term digital excitation word extracted from an adaptive codebook and a short-term excitation word extracted from an associated fixed codebook.
  • The long-term excitation word is extracted using a first perceptual weighting filter comprising a first formant weighting filter.
  • The denominator of the transfer function of the first formant weighting filter is equal to the numerator of the second formant weighting filter.
  • The use of two different formant weighting filters makes it possible to control the short-term and long-term distortions independently.
  • The short-term weighting filter is cascaded with the long-term weighting filter.
  • Tying the denominator of the long-term weighting filter to the numerator of the short-term weighting filter makes it possible to control these two filters separately and also allows a clear simplification when the two filters are cascaded.
  • The first extraction means comprise a first perceptual weighting filter comprising a first formant weighting filter, the second extraction means comprise the first perceptual weighting filter and a second perceptual weighting filter comprising a second formant weighting filter, and the denominator of the transfer function of the first formant weighting filter is equal to the numerator of the second formant weighting filter.
  • The invention also relates to a terminal of a wireless communication system, for example a cellular mobile telephone, incorporating a device as defined above.
  • The perceptual weighting filter FPP exploits the masking properties of the human ear with respect to the spectral envelope of the speech signal, whose shape depends on the resonances of the vocal tract. This filter gives more weight to the error appearing in the spectral valleys than to the error at the formant peaks.
  • $W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}$, in which 1/A(z) is the transfer function of the predictive filter FP and $\gamma_1$ and $\gamma_2$ are the perceptual weighting coefficients, both coefficients being positive or zero and less than or equal to 1, with $\gamma_2$ less than or equal to $\gamma_1$.
  • The perceptual weighting filter consists of a formant weighting filter and a filter weighting the slope of the spectral envelope of the signal (tilt).
  • Such an embodiment according to the invention is illustrated in Figure 2, in which, compared to Figure 1, the single filter FPP has been replaced by a first formant weighting filter FPP1 for the long-term search, cascaded with a second formant weighting filter FPP2 for the short-term search.
  • The filters appearing in the long-term search loop must also appear in the short-term search loop.
  • The transfer function $W_1(z)$ of the formant weighting filter FPP1 is given by formula (II) below.
  • $W_1(z) = \frac{A(z/\gamma_{11})}{A(z/\gamma_{12})}$ (II), while the transfer function $W_2(z)$ of the formant weighting filter FPP2 is given by formula (III) below.
  • $W_2(z) = \frac{A(z/\gamma_{21})}{A(z/\gamma_{22})}$ (III)
  • The coefficient $\gamma_{12}$ is equal to the coefficient $\gamma_{21}$. This allows a clear simplification when cascading these two filters.
  • The filter equivalent to the cascade of these two filters has a transfer function given by formula (IV): $W_1(z)\,W_2(z) = \frac{A(z/\gamma_{11})}{A(z/\gamma_{22})}$ (IV)
  • The synthesis filter FP (having the transfer function 1/A(z)) followed by the long-term weighting filter FPP1 and the weighting filter FPP2 is then equivalent to the filter whose transfer function is given by formula (V): $\frac{1}{A(z/\gamma_{22})}$ (V). This form holds when $\gamma_{11} = 1$, since $A(z/\gamma_{11})$ then cancels the 1/A(z) of the synthesis filter.
  • The invention applies advantageously to mobile telephony, and in particular to all remote terminals belonging to a wireless communication system.
  • Such a terminal, for example a mobile telephone TP such as that illustrated in Figure 3, conventionally comprises an antenna connected via a duplexer DUP to a reception chain CHR and a transmission chain CHT.
  • A baseband processor BB is connected to the reception chain CHR and to the transmission chain CHT via analog-to-digital converters ADC and digital-to-analog converters DAC, respectively.
  • The processor BB performs baseband processing, including channel decoding DCN followed by source decoding DCS.
  • For transmission, the processor performs source coding CCS followed by channel coding CCN.
  • When the mobile telephone incorporates an encoder according to the invention, the encoder is incorporated within the source coding means CCS, while the decoder is incorporated within the source decoding means DCS.

Abstract

Method for wideband speech encoding in which the speech of a speaker is sampled to obtain successive frames containing a predetermined number of samples, whereby a first weighting filter (FPP1) is used to extract a long-term excitation word, and a second weighting filter (FPP2), cascaded with the first filter, is used to extract a short-term excitation word. The denominator of the transfer function of the first filter is equal to the numerator of the transfer function of the second filter. An independent claim is made for a wideband speech encoding device with first and second extraction means (MEXT1, MEXT2) comprising the first and second weighting filters.

Description

The invention relates to wideband speech encoding/decoding, in particular but not exclusively for mobile telephony.

In wideband, the bandwidth of the speech signal lies between 50 and 7000 Hz.

Successive speech sequences sampled at a predetermined sampling frequency, for example 16 kHz, are processed in a CELP-type coding device using code-excited linear prediction (for example ACELP: "algebraic-code-excited linear prediction"), well known to those skilled in the art and described in particular in ITU-T Recommendation G.729, version 3/96, entitled "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction".

The main characteristics and functions of such a coder are now briefly recalled with reference to Figure 1; for further details, the person skilled in the art may refer to the above-mentioned Recommendation G.729.

The prediction coder CD, of the CELP type, is based on the code-excited linear predictive coding model. The coder operates on speech superframes equivalent, for example, to 20 ms of signal and each comprising 320 samples. The linear prediction parameters, that is, the coefficients of the linear prediction filter, also called the short-term synthesis filter 1/A(z), are extracted for each speech superframe. Each superframe, on the other hand, is subdivided into 5 ms frames comprising 80 samples. At every frame, the speech signal is analyzed to extract the parameters of the CELP prediction model (that is to say, in particular, a long-term digital excitation word v_i extracted from an adaptive codebook DLT, also called the "adaptive long-term dictionary", an associated long-term gain Ga, a short-term excitation word c_j extracted from a fixed codebook DCT, also called the "short-term dictionary", and an associated short-term gain Gc).
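The frame arithmetic above can be checked with a minimal Python sketch. It assumes the 16 kHz example sampling frequency quoted earlier; the helper names `samples_per` and `split_superframe` are hypothetical and chosen only for this illustration.

```python
SAMPLE_RATE_HZ = 16_000   # example sampling frequency quoted in the text
SUPERFRAME_MS = 20        # one superframe covers 20 ms of signal
FRAME_MS = 5              # one frame covers 5 ms of signal


def samples_per(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples in a window of the given duration."""
    return sample_rate_hz * duration_ms // 1000


def split_superframe(superframe):
    """Split one superframe into consecutive 5 ms frames."""
    n = samples_per(FRAME_MS)
    return [superframe[i:i + n] for i in range(0, len(superframe), n)]


superframe = list(range(samples_per(SUPERFRAME_MS)))  # dummy sample values
frames = split_superframe(superframe)
print(len(superframe), len(frames), len(frames[0]))  # 320 4 80
```

At 16 kHz, a 20 ms superframe therefore holds 320 samples and splits into four 80-sample frames, which matches the figures given in the description.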

These parameters are then coded and transmitted.

On reception, these parameters are used in a decoder to recover the excitation parameters and the predictive filter parameters. The speech is then reconstructed by filtering this excitation stream through a short-term synthesis filter.

Whereas the adaptive dictionary DLT contains digital words representing pitch delays of past excitations, the short-term dictionary DCT is based on a fixed structure, for example of the stochastic type, or of the algebraic type using an interleaved permutation model of Dirac pulses. In the case of an algebraic structure, the codebook contains innovation excitations, also called algebraic or short-term excitations, and each vector contains a certain number of non-zero pulses, for example four, each of which can have amplitude +1 or -1 at predetermined positions.
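As an illustration of such an algebraic excitation vector, the following Python sketch builds an 80-sample vector with four pulses of amplitude +1 or -1. The track layout (four interleaved tracks, one pulse per track, track k using positions k, k + 4, k + 8, ...) is an assumption made for this example; the text only states that the positions are predetermined. The function names are hypothetical.

```python
FRAME_LEN = 80   # samples per 5 ms frame, as in the text
NUM_TRACKS = 4   # assumption: one pulse per interleaved track


def track_positions(track, frame_len=FRAME_LEN, num_tracks=NUM_TRACKS):
    """Allowed pulse positions of one track: track, track + 4, track + 8, ..."""
    return list(range(track, frame_len, num_tracks))


def algebraic_codevector(slots, signs, frame_len=FRAME_LEN):
    """Build a sparse vector with one +1/-1 pulse per track.

    slots[k] indexes into the allowed positions of track k;
    signs[k] is the pulse amplitude, +1 or -1.
    """
    vector = [0] * frame_len
    for track, (slot, sign) in enumerate(zip(slots, signs)):
        if sign not in (1, -1):
            raise ValueError("pulse amplitude must be +1 or -1")
        vector[track_positions(track)[slot]] = sign
    return vector


c = algebraic_codevector(slots=[0, 3, 7, 19], signs=[1, -1, 1, -1])
print([(n, v) for n, v in enumerate(c) if v != 0])
# [(0, 1), (13, -1), (30, 1), (79, -1)]
```

Because each pulse is described by a track slot and a sign, a codeword of this kind can be indexed compactly without storing the codebook explicitly.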

The processing means of the coder CD functionally comprise first extraction means MEXT1 intended to extract the long-term excitation word, and second extraction means MEXT2 intended to extract the short-term excitation word. These means are implemented, for example, in software within a processor.

These extraction means comprise a predictive filter FP having a transfer function equal to 1/A(z), as well as a perceptual weighting filter FPP having a transfer function W(z). The perceptual weighting filter is applied to the signal to model the perception of the ear.

Furthermore, the extraction means comprise means MECM intended to minimize a mean square error.

The linear prediction synthesis filter FP models the spectral envelope of the signal. Linear predictive analysis is performed every superframe to determine the linear predictive filter coefficients. These are converted to line spectrum pairs (LSP) and quantized by two-stage predictive vector quantization.

Each 20 ms speech superframe is divided into four 5 ms frames, each containing 80 samples. The quantized LSP parameters are transmitted to the decoder once per superframe, whereas the long-term and short-term parameters are transmitted at every frame.

The quantized and unquantized coefficients of the linear prediction filter are used for the most recent frame of a superframe, while the other three frames of the same superframe use an interpolation of these coefficients. The open-loop pitch delay is estimated, for example, every two frames on the basis of the perceptually weighted speech signal. The following operations are then repeated at every frame:

The long-term target signal X_LT is computed by filtering the sampled speech signal s(n) through the perceptual weighting filter FPP.

The zero-input response of the weighted synthesis filter FP, FPP is then subtracted from the weighted speech signal so as to obtain a new long-term target signal.

The impulse response of the weighted synthesis filter is computed.
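The impulse response mentioned above can be sketched in Python as follows. The sketch assumes the form W(z) = A(z/γ1)/A(z/γ2) that this document gives for the perceptual weighting filter, and the convention A(z) = 1 + a_1 z^-1 + ... with the leading coefficient stored first. The function names, the toy first-order A(z), and the γ values are hypothetical, not taken from the patent.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [coef * gamma ** k for k, coef in enumerate(a)]


def iir_filter(b, a, x):
    """Direct-form filter y = (B(z)/A(z)) x, assuming a[0] == 1 and zero state."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


def weighted_synthesis_impulse_response(a, gamma1, gamma2, length):
    """Impulse response of W(z)/A(z) with W(z) = A(z/gamma1)/A(z/gamma2)."""
    impulse = [1.0] + [0.0] * (length - 1)
    # First apply the weighting filter W(z), then the synthesis filter 1/A(z).
    weighted = iir_filter(bandwidth_expand(a, gamma1),
                          bandwidth_expand(a, gamma2), impulse)
    return iir_filter([1.0], a, weighted)


a = [1.0, -0.9]   # toy first-order A(z) = 1 - 0.9 z^-1 (hypothetical values)
h = weighted_synthesis_impulse_response(a, gamma1=1.0, gamma2=0.5, length=4)
print(h)
```

With γ1 = 1 the numerator A(z/γ1) cancels the synthesis filter, so in this toy case the response is that of 1/A(z/γ2), i.e. successive powers of 0.45.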

A closed-loop pitch analysis using mean-square-error minimization is then carried out to determine the long-term excitation word v_i and the associated gain Ga, by means of the target signal and the impulse response, by searching around the value of the open-loop pitch delay.
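A minimal Python sketch of such a closed-loop search is given below. It is a simplified illustration rather than the coder's actual procedure: it only tries integer lags at least as long as the frame (no periodic extension, no fractional lags), and the function names, the search span, and the toy data are hypothetical.

```python
def convolve_truncated(v, h):
    """Zero-state filtering of v by impulse response h, truncated to len(v)."""
    return [sum(v[k] * h[n - k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(v))]


def dot(u, w):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, w))


def closed_loop_pitch_search(target, past_excitation, h, open_loop_lag, span=3):
    """Return (lag, gain, error) minimizing ||target - gain * y_lag||^2."""
    n = len(target)
    best = None
    for lag in range(max(n, open_loop_lag - span), open_loop_lag + span + 1):
        if lag > len(past_excitation):
            break
        start = len(past_excitation) - lag
        v = past_excitation[start:start + n]   # candidate adaptive-codebook word
        y = convolve_truncated(v, h)           # its filtered contribution
        energy = dot(y, y)
        if energy == 0:
            continue
        gain = dot(target, y) / energy         # optimal gain for this lag
        error = dot(target, target) - gain * dot(target, y)
        if best is None or error < best[2]:
            best = (lag, gain, error)
    return best


past = [-6, -4, -2, 0, 2, 4, 6, -5, -3, -1, 1, 3, 5, -6, -4, -2, 0, 2, 4, 6]
h = [1.0, 0.5, 0.25, 0.125]
true_word = past[len(past) - 9:len(past) - 9 + 4]   # word located at lag 9
target = [0.8 * s for s in convolve_truncated(true_word, h)]
lag, gain, error = closed_loop_pitch_search(target, past, h, open_loop_lag=8)
print(lag, round(gain, 6))  # 9 0.8
```

The search is restricted to a few lags around the open-loop estimate, which is what keeps the closed-loop stage affordable.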

The long-term target signal is then updated by subtracting the filtered contribution y of the adaptive codebook DLT, and this new short-term target signal X_ST is used when searching the fixed codebook DCT in order to determine the short-term excitation word c_j and the associated gain Gc. Here again, this closed-loop search is performed by minimizing the mean square error.
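The target update and the fixed-codebook search can be sketched as follows. This Python example assumes that the adaptive contribution is scaled by the gain Ga before subtraction and that the codewords have already been filtered by the weighted synthesis filter; an exhaustive search over a tiny toy codebook stands in for the real search. All names and values are hypothetical.

```python
def dot(u, w):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, w))


def short_term_target(x_lt, y, gain_a):
    """X_ST = X_LT - Ga * y: remove the adaptive-codebook contribution."""
    return [x - gain_a * s for x, s in zip(x_lt, y)]


def fixed_codebook_search(x_st, filtered_codewords):
    """Return (index j, gain Gc) minimizing ||x_st - Gc * z_j||^2."""
    best_j, best_gain, best_error = None, 0.0, None
    for j, z in enumerate(filtered_codewords):
        energy = dot(z, z)
        if energy == 0:
            continue
        gain = dot(x_st, z) / energy                     # optimal gain for z_j
        error = dot(x_st, x_st) - gain * dot(x_st, z)    # residual energy
        if best_error is None or error < best_error:
            best_j, best_gain, best_error = j, gain, error
    return best_j, best_gain


x_lt = [1.0, 2.0, 0.0, -1.0]          # toy long-term target
y = [1.0, 1.0, 1.0, 1.0]              # toy filtered adaptive contribution
x_st = short_term_target(x_lt, y, gain_a=0.5)
codewords = [[1, 0, 0, 0], [1, 3, -1, -3], [0, 0, 1, 1]]  # already filtered
j, gc = fixed_codebook_search(x_st, codewords)
print(j, gc)  # 1 0.5
```

Searching the fixed codebook against X_ST rather than X_LT means the second stage only has to model what the adaptive codebook could not.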

Finally, the adaptive long-term dictionary DLT and the memories of the filters FP and FPP are updated by means of the long-term and short-term excitation words thus determined.

The quality of a CELP algorithm depends strongly on the richness of the short-term excitation dictionary DCT, for example an algebraic excitation dictionary. While the effectiveness of such an algorithm is unquestionable for narrowband signals (300-3400 Hz), problems arise for wideband signals.

The object of the invention is to control the short-term and long-term distortions independently.

The invention therefore provides a wideband speech encoding method in which the speech is sampled so as to obtain successive speech frames each comprising a predetermined number of samples, and, at each speech frame, parameters of a code-excited linear prediction model are determined, these parameters comprising a long-term digital excitation word extracted from an adaptive codebook and a short-term excitation word extracted from an associated fixed codebook.

According to a general characteristic of the invention, the long-term excitation word is extracted using a first perceptual weighting filter comprising a first formant weighting filter, and the short-term excitation word is extracted using the first perceptual weighting filter cascaded with a second perceptual weighting filter comprising a second formant weighting filter. The denominator of the transfer function of the first formant weighting filter is equal to the numerator of the transfer function of the second formant weighting filter.

Thus, according to the invention, the use of two different formant weighting filters makes it possible to control the short-term and long-term distortions independently. The short-term weighting filter is cascaded with the long-term weighting filter. In addition, tying the denominator of the long-term weighting filter to the numerator of the short-term weighting filter makes it possible to control these two filters separately and also allows a clear simplification when the two filters are cascaded.
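The simplification can be written out from the transfer functions that this document gives for the two formant weighting filters, formulas (II) and (III), together with the condition γ12 = γ21. The last line is a sketch under an extra assumption: the document's formula (V), 1/A(z/γ22), is recovered only for γ11 = 1, a value that the extracted text does not state explicitly.

```latex
\begin{align*}
W_1(z) &= \frac{A(z/\gamma_{11})}{A(z/\gamma_{12})}, \qquad
W_2(z) = \frac{A(z/\gamma_{21})}{A(z/\gamma_{22})}, \qquad \gamma_{12} = \gamma_{21} \\
W_1(z)\,W_2(z) &= \frac{A(z/\gamma_{11})}{A(z/\gamma_{12})}\cdot\frac{A(z/\gamma_{12})}{A(z/\gamma_{22})}
 = \frac{A(z/\gamma_{11})}{A(z/\gamma_{22})} \\
\frac{1}{A(z)}\,W_1(z)\,W_2(z) &= \frac{A(z/\gamma_{11})}{A(z)\,A(z/\gamma_{22})}
 \;\xrightarrow{\ \gamma_{11}=1\ }\; \frac{1}{A(z/\gamma_{22})}
\end{align*}
```

The shared factor A(z/γ12) cancels in the cascade, so the short-term search loop is governed by γ11 and γ22 while the long-term loop keeps its own pair γ11, γ12.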

The subject of the invention is also a wideband speech encoding device comprising

  • sampling means able to sample the speech so as to obtain successive speech frames each comprising a predetermined number of samples,
  • processing means able, at each speech frame, to determine parameters of a code-excited linear prediction model, these processing means comprising first extraction means able to extract a long-term digital excitation word from an adaptive codebook, and second extraction means able to extract a short-term excitation word from a fixed codebook.

Selon une caractéristique générale de l'invention, les premiers moyens d'extraction comprennent un premier filtre de pondération perceptuelle comportant un premier filtre de pondération formantique, par le fait que les deuxièmes moyens d'extraction comprennent le premier filtre de pondération perceptuelle et un deuxième filtre de pondération perceptuelle comportant un deuxième filtre de pondération formantique, et le dénominateur de la fonction de transfert du premier filtre de pondération formantique est égal au numérateur du deuxième filtre de pondération formantique. According to a general characteristic of the invention, the first extraction means include a first filter perceptual weighting including a first weighting filter formantic, by the fact that the second means of extraction include the first perceptual weighting filter and a second perceptual weighting filter including a second formantic weighting filter, and the denominator of the function of transfer of the first formantic weighting filter is equal to numerator of the second formantic weighting filter.

The invention also relates to a terminal of a wireless communication system, for example a cellular mobile telephone, incorporating a device as defined above.

Other advantages and characteristics of the invention will become apparent on examining the detailed description of embodiments and implementations, which are in no way limiting, and the appended drawings, in which:
  • Figure 1, already described, schematically illustrates a speech encoding device according to the prior art;
  • Figure 2 schematically illustrates an embodiment of an encoding device according to the invention; and
  • Figure 3 schematically illustrates the internal architecture of a cellular mobile telephone incorporating a coding device according to the invention.

The perceptual weighting filter FPP exploits the masking properties of the human ear with respect to the spectral envelope of the speech signal, whose shape depends on the resonances of the vocal tract. This filter gives more weight to the error appearing in the spectral valleys than to the error at the formant peaks.

In the prior art illustrated in Figure 1, the same perceptual weighting filter FPP is used for both the short-term search and the long-term search. The transfer function W(z) of this filter FPP is given by formula (I) below:

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad \text{(I)}$$

in which 1/A(z) is the transfer function of the predictive filter FP, and γ1 and γ2 are the perceptual weighting coefficients, both coefficients being greater than or equal to 0 and less than or equal to 1, with the coefficient γ2 less than or equal to the coefficient γ1.
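As an illustration of formula (I), the following is a minimal Python sketch, not the implementation described in the patent. It assumes that A(z) is stored as a coefficient list a[0], a[1], ... multiplying z^0, z^-1, ..., so that A(z/γ) has coefficients a[k]·γ^k. The function names, the second-order predictor and the values γ1 = 0.9 and γ2 = 0.6 are hypothetical and chosen only to satisfy γ2 ≤ γ1.

```python
def bandwidth_expand(a, gamma):
    """Return the coefficients of A(z/gamma), i.e. a[k] * gamma**k."""
    return [coeff * gamma ** k for k, coeff in enumerate(a)]


def iir_filter(b, a, x):
    """Direct-form filter with numerator b and denominator a (a[0] != 0)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def perceptual_weighting(a, gamma1, gamma2, x):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to the signal x."""
    return iir_filter(bandwidth_expand(a, gamma1), bandwidth_expand(a, gamma2), x)


# Hypothetical second-order predictor A(z) = 1 - 1.2 z^-1 + 0.5 z^-2
A = [1.0, -1.2, 0.5]
impulse = [1.0] + [0.0] * 7

# Impulse response of W(z) for the assumed values gamma1 = 0.9, gamma2 = 0.6
weighted = perceptual_weighting(A, 0.9, 0.6, impulse)
```

Choosing γ1 = γ2 makes the numerator and denominator identical, so the filter reduces to the identity; this is a convenient sanity check for such a sketch.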

In general, the perceptual weighting filter consists of a formant weighting filter and a filter weighting the slope (tilt) of the spectral envelope of the signal.

In the present case, it is assumed that the perceptual weighting filter consists solely of the formant weighting filter, whose transfer function is given by formula (I) above.

However, the spectral nature of the long-term contribution differs from that of the short-term contribution. It is therefore advantageous to use two different formant weighting filters, which makes it possible to control the short-term and long-term distortions independently.

Such an embodiment according to the invention is illustrated in Figure 2, in which, compared with Figure 1, the single filter FPP has been replaced by a first formant weighting filter FPP1 for the long-term search, cascaded with a second formant weighting filter FPP2 for the short-term search.

Since the short-term weighting filter FPP2 is cascaded with the long-term weighting filter, the filters appearing in the long-term search loop must also appear in the short-term search loop.

The transfer function W1(z) of the formant weighting filter FPP1 is given by formula (II) below:

$$W_1(z) = \frac{A(z/\gamma_{11})}{A(z/\gamma_{12})} \qquad \text{(II)}$$

while the transfer function W2(z) of the formant weighting filter FPP2 is given by formula (III) below:

$$W_2(z) = \frac{A(z/\gamma_{21})}{A(z/\gamma_{22})} \qquad \text{(III)}$$

Furthermore, the coefficient γ12 is equal to the coefficient γ21. This allows a clear simplification when these two filters are cascaded. The filter equivalent to the cascade of these two filters thus has a transfer function given by formula (IV) below:

$$\frac{A(z/\gamma_{11})}{A(z/\gamma_{22})} \qquad \text{(IV)}$$

Furthermore, if the value 1 is used for the coefficient γ11, then the synthesis filter FP (having the transfer function 1/A(z)) followed by the long-term weighting filter FPP1 and the weighting filter FPP2 is equivalent to the filter whose transfer function is given by formula (V) below:

$$\frac{1}{A(z/\gamma_{22})} \qquad \text{(V)}$$

This further considerably reduces the complexity of the excitation extraction algorithm.
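The simplification leading to formulas (IV) and (V) can be written out step by step. The following derivation is a sketch that uses only the definitions of W1(z) and W2(z) in formulas (II) and (III), together with the two conditions stated in the text, γ12 = γ21 and γ11 = 1.

```latex
\begin{aligned}
\frac{1}{A(z)}\, W_1(z)\, W_2(z)
  &= \frac{1}{A(z)} \cdot \frac{A(z/\gamma_{11})}{A(z/\gamma_{12})}
     \cdot \frac{A(z/\gamma_{21})}{A(z/\gamma_{22})} \\
  &= \frac{1}{A(z)} \cdot \frac{A(z/\gamma_{11})}{A(z/\gamma_{22})}
     && \text{since } \gamma_{12} = \gamma_{21} \\
  &= \frac{1}{A(z/\gamma_{22})}
     && \text{since } \gamma_{11} = 1 \text{ gives } A(z/\gamma_{11}) = A(z)
\end{aligned}
```

The second line contains the cascade of formula (IV), and the third line is formula (V): three filters in series collapse into a single all-pole filter.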

By way of indication, the values 1, 0.1 and 0.9 may for example be used for the coefficients γ11, γ21 = γ12 and γ22, respectively.
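The equivalence between the full chain and a single filter can be checked numerically with these example values. The Python sketch below is only an illustration under stated assumptions: the second-order predictor A(z) is hypothetical, the helper names are not taken from the patent, and A(z/γ) is assumed to have coefficients a[k]·γ^k. It filters an impulse through 1/A(z), then W1(z), then W2(z), and compares the result with a single pass through 1/A(z/γ22).

```python
def bandwidth_expand(a, gamma):
    """Return the coefficients of A(z/gamma), i.e. a[k] * gamma**k."""
    return [coeff * gamma ** k for k, coeff in enumerate(a)]


def iir_filter(b, a, x):
    """Direct-form filter with numerator b and denominator a (a[0] != 0)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


# Hypothetical stable predictor A(z) = 1 - 1.2 z^-1 + 0.5 z^-2
A = [1.0, -1.2, 0.5]

# Example coefficient values quoted in the text
g11, g12, g21, g22 = 1.0, 0.1, 0.1, 0.9

impulse = [1.0] + [0.0] * 15

# Full chain: synthesis filter 1/A(z), then W1(z), then W2(z)
s = iir_filter([1.0], A, impulse)
s = iir_filter(bandwidth_expand(A, g11), bandwidth_expand(A, g12), s)
cascade = iir_filter(bandwidth_expand(A, g21), bandwidth_expand(A, g22), s)

# Single equivalent filter 1/A(z/g22), as in formula (V)
single = iir_filter([1.0], bandwidth_expand(A, g22), impulse)

# Largest sample-wise difference between the two impulse responses
max_diff = max(abs(u - v) for u, v in zip(cascade, single))
```

In this sketch the two impulse responses should agree up to floating-point rounding, which illustrates why the single filter can replace three filtering passes inside the search loop.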

The invention applies advantageously to mobile telephony, and in particular to any remote terminal belonging to a wireless communication system.

Such a terminal, for example a mobile telephone TP such as the one illustrated in Figure 3, conventionally comprises an antenna connected via a duplexer DUP to a reception chain CHR and to a transmission chain CHT. A baseband processor BB is connected to the reception chain CHR and to the transmission chain CHT via analog-to-digital converters CAN and digital-to-analog converters CNA, respectively.

Conventionally, the processor BB performs baseband processing, in particular channel decoding DCN followed by source decoding DCS.

For transmission, the processor performs source coding CCS followed by channel coding CCN.

When the mobile telephone incorporates an encoder according to the invention, the encoder is incorporated within the source coding means CCS, while the decoder is incorporated within the source decoding means DCS.

Claims (4)

1. Wideband speech encoding method, in which the speech is sampled so as to obtain successive speech frames each comprising a predetermined number of samples, and for each speech frame parameters of a code-excited linear prediction model are determined, these parameters comprising a long-term excitation digital word extracted from an adaptive codebook and a short-term excitation word extracted from a fixed codebook, characterized in that the long-term excitation word is extracted using a first perceptual weighting filter comprising a first formant weighting filter (FPP1), in that the short-term excitation word is extracted using the first perceptual weighting filter (FPP1) cascaded with a second perceptual weighting filter comprising a second formant weighting filter (FPP2), and in that the denominator of the transfer function of the first formant weighting filter is equal to the numerator of the second formant weighting filter.

2. Wideband speech encoding device, comprising sampling means able to sample the speech so as to obtain successive speech frames each comprising a predetermined number of samples, and processing means able, for each speech frame, to determine parameters of a code-excited linear prediction model, these processing means comprising first extraction means able to extract a long-term excitation digital word from an adaptive codebook and second extraction means able to extract a short-term excitation word from a fixed codebook, characterized in that the first extraction means (MEXT1) comprise a first perceptual weighting filter comprising a first formant weighting filter (FPP1), in that the second extraction means (MEXT2) comprise the first perceptual weighting filter (FPP1) cascaded with a second perceptual weighting filter comprising a second formant weighting filter (FPP2), and in that the denominator of the transfer function of the first formant weighting filter is equal to the numerator of the second formant weighting filter.

3. Terminal of a wireless communication system, characterized in that it incorporates a device according to claim 2.

4. Terminal according to claim 3, characterized in that it forms a cellular mobile telephone.
EP03291749A 2002-07-17 2003-07-15 Method and device for wideband speech coding able to independently control short-term and long-term distortions Withdrawn EP1388846A3 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP03291749A EP1388846A3 (en) 2002-07-17 2003-07-15 Method and device for wideband speech coding able to independently control short-term and long-term distortions

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP02015919 2002-07-17
EP02015919A EP1383113A1 (en) 2002-07-17 2002-07-17 Method and device for wide band speech coding capable of controlling independently short term and long term distortions
EP03291749A EP1388846A3 (en) 2002-07-17 2003-07-15 Method and device for wideband speech coding able to independently control short-term and long-term distortions

Publications (2)

Publication Number Publication Date
EP1388846A2 (en) 2004-02-11
EP1388846A3 (en) 2008-08-20

Family

ID=30445142

Family Applications (1)

Application Number Title Priority Date Filing Date
EP03291749A Withdrawn EP1388846A3 (en) 2002-07-17 2003-07-15 Method and device for wideband speech coding able to independently control short-term and long-term distortions

Country Status (1)

Country Link
EP (1) EP1388846A3 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5926785A (en) * 1996-08-16 1999-07-20 Kabushiki Kaisha Toshiba Speech encoding method and apparatus including a codebook storing a plurality of code vectors for encoding a speech signal
US6173257B1 (en) * 1998-08-24 2001-01-09 Conexant Systems, Inc Completed fixed codebook for speech encoder

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN J-H ET AL: "Improving the performance of the 16 kb/s LD-CELP speech coder" DIGITAL SIGNAL PROCESSING 2, ESTIMATION, VLSI. SAN FRANCISCO, MAR. 23 - 26, 1992, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), NEW YORK, IEEE, US, vol. 5 CONF. 17, 23 March 1992 (1992-03-23), pages 69-72, XP010058714 ISBN: 0-7803-0532-9 *

Also Published As

Publication number Publication date
EP1388846A3 (en) 2008-08-20

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 19/12 20060101AFI20080716BHEP

AKX Designation fees paid
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20090203

REG Reference to a national code

Ref country code: DE

Ref legal event code: 8566