EP0710947A1

EP0710947A1 - Method and apparatus for noise suppression in a speech signal and corresponding system with echo cancellation

Info

Publication number: EP0710947A1
Application number: EP95402385A
Authority: EP
Inventors: Ivan Bourmeyster; Frédéric Lejay
Original assignee: Alcatel Mobile Communication France SA; Alcatel Mobile Phones SA
Current assignee: Alcatel Lucent SAS
Priority date: 1994-10-28
Filing date: 1995-10-25
Publication date: 1996-05-08
Anticipated expiration: 2015-10-25
Also published as: CA2161575A1; EP0710947B1; FI955086A; JPH08213936A; US5680393A; AU698081B2; JP4567655B2; FR2726392A1; DE69529328T2; JP2007129736A; FI955086A0; AU3444295A; NZ280224A; DE69529328D1; ATE230890T1; FR2726392B1

Abstract

The method involves applying noisy input speech signals (S(t)) to a sampler (1a). The signal is divided, with part of the signal passed to a noisy signal processor (100). The processor extracts noisy energy signals (10) and passes them to an S/N ratio estimator (11). The signal is then passed to a gain calculation circuit (12) before input to a circuit determining the filter function coefficient. The output C(nT) from the filter function circuit is applied to a FIR filter (14) in the second channel.

Description

The present invention relates generally to a method and a device for suppressing a noise signal in a speech signal, typically for hands-free radiotelephony applications. It also relates to a system implementing such a device in combination with an echo canceller.

In a noisy environment, the electrical signal resulting from the acousto-electric conversion of a speech signal is mixed with a noise signal. When the level of the noise signal in this environment is high, for example in the passenger compartment of a vehicle, processing is needed to suppress the noise signal in the electrical speech signal. The prior art typically distinguishes two types of noise suppression processing: spectral subtraction processing and so-called filter bank processing.

Filter bank processing, as described in US patent US-A-4 628 529, consists of a step of separating the input signal into a plurality of time-domain signals, each representative of a respective predetermined frequency band; a step of estimating a signal-to-noise ratio for each of these time-domain signals; a step of weighting these time-domain signals by respective coefficients, each of which is a function of the signal-to-noise ratio of the time-domain signal concerned; and a step of adding these weighted time-domain signals into a resulting signal consisting of a speech signal in which the noise signal is suppressed. Typically, each signal-to-noise ratio is estimated from the variation of the power of the time-domain signal concerned in its respective frequency band. Such filter bank processing requires considerable computing resources because all of the above separation, estimation, weighting and addition steps are carried out in the time domain. In practice, since the computing resources of a radiotelephone are limited, in terms of millions of instructions per second (MIPS), by the capacity of the digital signal processor (DSP), it is proposed to restrict the noise suppression processing to coarse frequency bands, which reduces the fineness, or precision, of this processing.

Spectral subtraction processing, for its part, operates in the frequency domain, typically by means of the Fast Fourier Transform (FFT). Its main drawback is that it induces a non-linear distortion in the processed speech signal, resulting from the loss of the phase information of this signal. Spectral subtraction produces this distortion because it applies squared-modulus functions, which discard the phase information, to the samples resulting from the Fast Fourier Transform of the noisy speech signal to be processed, thus making the processing non-linear. In addition, this lack of linearity prevents spectral subtraction from being used effectively in combination with echo cancellation processing, as proposed by the invention, because the operation of the echo cancellation device is disturbed by this loss of phase information.
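The loss of phase information described above can be illustrated with a minimal sketch. It is not taken from the patent: the naive `dft` helper, the 8-sample `frame` and the 3-sample circular delay are assumptions chosen only for the demonstration. Two different sample sequences yield the same squared-modulus spectrum, so the original waveform cannot be recovered from that spectrum alone.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(N^2)), sufficient for a small demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def power_spectrum(x):
    """Squared modulus of each DFT component; the phase is discarded here."""
    return [abs(c) ** 2 for c in dft(x)]

# Hypothetical 8-sample frame and the same frame circularly delayed by 3 samples.
frame = [0.0, 1.0, 0.5, -0.25, 0.0, 0.0, 0.0, 0.0]
shifted = frame[-3:] + frame[:-3]

p1 = power_spectrum(frame)
p2 = power_spectrum(shifted)

# A circular delay only changes the phase of each component, so the two
# squared-modulus spectra coincide although the waveforms differ.
same_power = all(abs(a - b) < 1e-9 for a, b in zip(p1, p2))
```

Here `same_power` is true while `frame` and `shifted` are different sequences, which is the ambiguity that any processing based only on squared moduli has to live with.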

A first objective of the present invention is to provide a method of suppressing noise in a speech signal which has the advantage of considerably reducing the computing power required, in terms of number of instructions per second, compared with filter bank processing.

A second objective of the invention is to provide a method which does not induce a non-linear distortion in the speech signal to be processed, in contrast to spectral subtraction processing.

Another objective of the invention is to provide a system comprising a noise suppression device implementing the steps of the method, in combination with an echo cancellation device.

To this end, a method of suppressing a noise signal in a sampled noisy speech signal is characterized according to the invention by the steps of:

digital frequency processing of said noisy speech signal, to produce time-domain filter coefficients, and
digital time processing of said noisy speech signal as a function of said filter coefficients, into a speech signal in which said noise signal is substantially suppressed.

The method defines, for a given processing cycle, the digital frequency processing steps of:

extraction of a plurality of frequency energy components from said noisy speech signal,
estimation, for each of the extracted frequency energy components, of a ratio between an energy level of the noisy speech signal and an energy level of the noise signal,
determination of a respective gain for said each of the extracted frequency energy components, as a function of said estimated ratio between the energy level of the noisy speech signal and the energy level of the noise signal for said each of the frequency components, and
synthesis of said filter coefficients as a function of said gains.
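The estimation, gain determination and synthesis steps listed above can be sketched as follows. This is only an illustration under stated assumptions: the passage does not give the gain law, so the Wiener-like rule `1 - 1/ratio` with a lower `floor` is a stand-in, and the function names, the naive inverse DFT and the mirroring of the gains are all hypothetical choices rather than the patented implementation.

```python
import cmath

def estimate_ratio(noisy_energy, noise_energy):
    """Per-component ratio between noisy-speech energy and noise energy."""
    return [e / max(n, 1e-12) for e, n in zip(noisy_energy, noise_energy)]

def gain_from_ratio(ratio, floor=0.1):
    """ASSUMPTION: Wiener-like gain; the source does not specify the gain law."""
    return [max(floor, 1.0 - 1.0 / r) if r > 0 else floor for r in ratio]

def synthesize_coefficients(gains):
    """Turn real gains for bins 0..N/2 into N real time-domain coefficients.

    The gains are mirrored onto bins N/2+1..N-1 so that the inverse DFT is real.
    """
    half = len(gains) - 1
    full = gains + gains[half - 1:0:-1]
    n = len(full)
    coeffs = [sum(full[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
              for t in range(n)]
    return [c.real for c in coeffs]

# Hypothetical energies for two components: one dominated by speech, one by noise.
ratio = estimate_ratio([4.0, 1.0], [1.0, 1.0])
gains = gain_from_ratio(ratio)

# With unit gain everywhere, the synthesized filter is a unit impulse (pass-through).
identity = synthesize_coefficients([1.0] * 5)
```

The coefficients produced this way are zero-phase; a practical filter would typically be made causal by a circular shift and possibly windowed, which this sketch leaves out.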

Preferably, the step of extracting frequency energy components comprises the sub-steps of:

production of K groups, each comprising a plurality of frequency components, respectively for K interleaved blocks of the noisy speech signal, K being an integer, and
calculation of an energy average of K frequency components of the same rank, taken respectively from the K groups, as a respective one of the extracted frequency energy components.

Typically, the calculation step is preceded, for each of the K groups of frequency components, by a step of selecting a part of the frequency components having respective predetermined ranks in said each of the groups, said selected part being symmetrical with respect to its complement among the plurality of frequency components produced. Furthermore, the production and synthesis steps are implemented by means of a Fast Fourier Transform and an Inverse Fourier Transform, respectively.

A device for implementing the method comprises, for each of successive processing cycles:

means for extracting a plurality of frequency energy components from said noisy speech signal,
means for estimating, for each of the extracted frequency energy components, a ratio between an energy level of the noisy speech signal and an energy level of the noise signal,
means for determining a respective gain for said each of the extracted frequency energy components, as a function of said estimated ratio between the energy level of the noisy speech signal and the energy level of the noise signal for said each of the frequency components,
means for synthesizing said filter coefficients as a function of said gains, and
means for time-domain filtering of said noisy speech signal as a function of said filter coefficients, into a speech signal in which said noise signal is substantially suppressed.
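The time-domain filtering means can be pictured as a direct-form finite impulse response filter. The sketch below is illustrative only; the function name `fir_filter` and the choice of treating samples before the start of the signal as zero are assumptions, not details given in the text.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k].

    Samples before the start of the input are taken as zero (an assumption).
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

# A single unit coefficient leaves the signal unchanged.
passthrough = fir_filter([1.0, 2.0, 3.0], [1.0])

# Feeding a unit impulse returns the coefficients themselves (impulse response).
impulse_response = fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.25])
```

Because the filter is linear and works sample by sample on the noisy signal, the phase of the signal is handled by the filter itself rather than discarded, which is the property the method relies on.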

The invention also provides two variants of a combined echo cancellation and noise suppression system.

According to a first variant, the system comprises:

a noise suppression device for suppressing a noise signal in a speech signal to be transmitted, into a denoised speech signal,
an echo canceller comprising first means for producing an estimated echo signal as a function of a given speech signal and of a difference signal, and second means for subtracting said estimated echo signal from said denoised speech signal, into said difference signal.

It is characterized in that said noise suppression device is in the form of:

digital frequency processing means for processing said speech signal to be transmitted, to produce time-domain filter coefficients, and
first digital time processing means for processing said speech signal to be transmitted as a function of said filter coefficients, into said denoised speech signal in which said noise signal is substantially suppressed, and in that said system further comprises:
second digital time processing means, strictly similar to said first time processing means, for processing a speech signal received from a remote terminal as a function of said filter coefficients, into said given speech signal.

According to a second variant, the system comprises:

an echo canceller comprising first means for producing an estimated echo signal as a function of a speech signal received from a remote terminal and of a difference signal, and second means for subtracting said estimated echo signal from a speech signal to be transmitted, into said difference signal.

It is characterized in that it further comprises:

a noise suppression device for suppressing a noise signal in the difference signal, into a denoised speech signal, said noise suppression device being in the form of:
digital frequency processing means for processing said speech signal to be transmitted, in order to produce time-domain filter coefficients, and
digital time processing means for processing said difference signal as a function of said filter coefficients, into a denoised speech signal in which said noise signal is substantially suppressed.
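The echo canceller of the second variant produces an estimated echo from the received signal and the difference signal, and subtracts it from the signal to be transmitted. The passage does not say how the estimate is adapted; the sketch below assumes a normalized LMS (NLMS) adaptive filter purely for illustration, and the names `nlms_echo_canceller`, `taps`, `mu` and the synthetic echo path are hypothetical.

```python
import random

def nlms_echo_canceller(received, to_transmit, taps=4, mu=0.5, eps=1e-9):
    """ASSUMPTION: NLMS adaptation; the source does not name the algorithm.

    Returns the difference signal (signal to transmit minus estimated echo).
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent received samples, newest first
    difference = []
    for r, s in zip(received, to_transmit):
        history = [r] + history[:-1]
        estimated_echo = sum(w * h for w, h in zip(weights, history))
        d = s - estimated_echo      # second means: subtraction
        norm = sum(h * h for h in history) + eps
        # First means: update the echo estimate from the received signal
        # and the difference signal.
        weights = [w + mu * d * h / norm for w, h in zip(weights, history)]
        difference.append(d)
    return difference

# Synthetic test case: the microphone picks up only an echo of the received
# signal, delayed by two samples and attenuated by one half.
random.seed(0)
received = [random.uniform(-1.0, 1.0) for _ in range(600)]
echo = [0.0, 0.0] + [0.5 * x for x in received[:-2]]

difference = nlms_echo_canceller(received, echo)
residual = sum(d * d for d in difference[-100:])
```

In the second variant, the difference signal returned here would then be passed through the time-domain filter, with coefficients derived from the signal to be transmitted.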

Other characteristics and advantages of the present invention will appear more clearly on reading the following description, with reference to the corresponding appended drawings, in which:

Figure 1 shows, in the form of a block diagram, a device according to the invention for suppressing noise in a speech signal;
Figure 2 shows schematically the processing steps implemented in a circuit of the device of Figure 1;
Figure 3 shows, in the form of a block diagram, a first embodiment according to the invention of a system implementing the device of Figure 1 in combination with an echo canceller; and
Figure 4 shows, in the form of a block diagram, a second embodiment according to the invention of a system implementing the device of Figure 1 in combination with an echo canceller.

With reference to Figure 1, a device 1 according to the invention for suppressing a noise signal in a speech signal comprises a sampling circuit 1a, a frequency processing unit 100 and a time processing circuit 14. The frequency processing unit 100 comprises, in cascade, an energy component extraction circuit 10, a signal-to-noise ratio estimation circuit 11, a gain calculation circuit 12 and a filter coefficient synthesis circuit 13. The time processing circuit 14 is a time-domain filter of the finite impulse response (FIR) type.

Le circuit d'échantillonnage la échantillonne à une fréquence F=1/T un signal analogique de parole bruité s(t) constitué d'un signal de bruit additionné à un signal de parole. Le signal de parole bruité échantillonné s(nT) résultant de cet échantillonnage est appliqué, d'une part, à une entrée du circuit d'extraction de composantes énergétiques 10 dans l'unité de traitement fréquentiel 100, et, d'autre part, à une entrée du filtre temporel 14. La figure 2 schématise les traitements effectués dans le circuit 10 qui reçoit le signal de parole bruité s(nT). Ce signal de parole bruité échantillonné s(nT) se présente sous la forme de trames successives d'échantillons, quatre d'entr'elles T(n-2), T(n-1), T(n) et T(n+1) étant représentées à une première ligne de la figure 2. Une trame T(n) est formée, selon la réalisation décrite, par M = 128 échantillons, notés e(n)_m, l'indice m variant entre O et 127. Pour chaque trame de rang n, T(n), associée à un cycle de traitement donné du procédé selon l'invention, sont produits K=3 blocs d'échantillons B(1), B(2) et B(3), K étant un nombre entier. Ces K=3 blocs d'échantillons sont formés selon la réalisation décrite à partir, d'une part, de cette trame de rang n, T(n), et, d'autre part, des 2 trames T(n-2) et T(n-1), de rangs respectifs (n-2) et (n-1). Les K=3 blocs d'échantillons B(1) à B(3) sont entrelacés et comprennent chacun 2.M= 256 échantillons successifs dans les trames T(n-2) à T(n), à compter de K=3 premiers échantillons respectifs de rangs 0 et M/2=64 dans la trame T(n-2) et de rang O dans la trame T(n-1). Les groupes respectifs de 2.M échantillons formant les blocs B(1), B(2) et B(3) sont notés b(1)_i, b(2)_i; et b(3)_i, i variant de 0 à (2.M - 1)=255. A ces groupes respectifs d'échantillons b(1)_i, b(2)_i, b(3)_i, 0 ≤i ≤255, sont appliquées respectivement trois transformées de Fourier rapides identiques selon les étapes notées 100a, 100b, et 100c. 
Ces étapes de transformée de Fourier rapide peuvent éventuellement être précédées par une opération de fenétrage temporel. Par ces transformées de Fourier rapides, à chacun des K=3 groupes d'échantillons b(1)_i, b(2)_i et b(3)_i, est associé l'un respectif de K=3 groupes de composantes fréquentielles, notés E(1)_i, E(2)_i et E(3)_i, l'indice i variant de 0 à 255. L'étape notée 101 dans cette figure 2 vise à simplifier la mise en oeuvre des traitements ultérieurs en sélectionnant seulement une partie des composantes fréquentielles dans chaque groupe E(1)_i à E(3)_i, 0 ≤i ≤255. Cette étape se fonde sur la propriété suivante. La Transformée de Fourier Rapide d'un signal réel présente une pseudo-symétrie. En effet, en sachant que les échantillons formant le signal de parole sont réels, chaque groupe de composantes fréquentielles E(k)_i, k= 1, 2 ou 3, peut s'écrire sous la forme: E(k)i={E(k)0,E(k)1,.,E(k)127,E(k)128,E(k)129=E(k)127,.,E(k)255=E(k)1} L'étape de traitement 101 sélectionne, dans chaque groupe E(k=1)_i ,E(k=2)_i et E(k=3)_i, 0 ≤i ≤255, une partie des composantes fréquentielles le constituant, à savoir les composantes fréquentielles E(k)_O à E(k)₁₂₈ de rang 0 à 128, qui forment un groupe de composantes fréquentielles sélectionnées. Ces 129 premières composantes fréquentielles sélectionnées sont suffisantes pour décrire chaque groupe E(k)_i, 0 ≤i ≤255, de façon complète puisque les autres composantes fréquentielles dans le groupe, à savoir les 127 dernières composantes E(k)₁₂₉ à E(k)_255, se déduisent par symétrie. Les composantes fréquentielles E(k)_O à E(k)₁₂₈ qui sont sélectionnées dans chaque groupe présentent en effet un caractère de symétrie par rapport au complémentaire E(k)₁₂₉ à E(k)₂₅₅ de ces composantes sélectionnées parmi toutes les composantes fréquentielles dans le groupe qui sont produites initialement. En sortie du traitement 101, sont ainsi produites les composantes fréquentielles E(k)_O à E(k)₁₂₈ pour chaque groupe. 
Selon l'étape 102, les 129 composantes fréquentielles sélectionnées dans chaque groupe sont décimées par 2, en vue de ne retenir qu'une composante sur deux parmi chaque groupe de composantes sélectionné. Cette décimation par deux 102 vise à écarter sélectivement une composante sur deux, relative à une fréquence donnée, en vue d'inhiber l'effet d'interaction que produit sur cette composante chacune des deux composantes fréquentielles situées à deux fréquences respectives de part et d'autre de ladite fréquence donnée. En pratique, sont retenues les 65 composantes fréquentielles E(k)_i, avec i=1, 3, 5,.., 127,128, sachant que la composante fréquentielle E(k)₀ ne présente aucun intérêt à être retenue puisqu'il s'agit d'une composante continue. A des fins de simplification de notation, ces composantes fréquentielles E(k)_i, avec i=1, 3, 5,..,127,128 sont notées E(k)_j, avec 0 ≤j ≤64. En résultat des étapes 101 et 102, est ainsi produite pour chaque groupe initial de composantes E(1)_i ,E(2)_i et E(3)_i, 0 ≤i ≤255, un groupe de composantes sélectionnées et décimées. The sampling circuit samples it at a frequency F = 1 / T an analog noisy speech signal s (t) consisting of a noise signal added to a speech signal. The sampled noisy speech signal s (nT) resulting from this sampling is applied, on the one hand, to an input of the energy component extraction circuit 10 in the frequency processing unit 100, and, on the other hand, at an input of the time filter 14. FIG. 2 diagrams the processing carried out in circuit 10 which receives the noisy speech signal s (nT). This sampled noisy speech signal s (nT) is in the form of successive frames of samples, four of them T (n-2), T (n-1), T (n) and T (n +1) being represented in a first line of FIG. 2. 
A frame T (n) is formed, according to the embodiment described, by M = 128 samples, denoted e (n) _m , the index m varying between O and 127 For each frame of rank n, T (n), associated with a given processing cycle of the method according to the invention, are produced K = 3 blocks of samples B (1), B (2) and B (3) , K being an integer. These K = 3 blocks of samples are formed according to the embodiment described from, on the one hand, this frame of rank n, T (n), and, on the other hand, the 2 frames T (n-2) and T (n-1), of respective ranks (n-2) and (n-1). The K = 3 blocks of samples B (1) to B (3) are interleaved and each comprise 2.M = 256 successive samples in the frames T (n-2) to T (n), from K = 3 first respective samples of ranks 0 and M / 2 = 64 in the frame T (n-2) and of rank O in the frame T (n-1). The respective groups of 2.M samples forming the blocks B (1), B (2) and B (3) are denoted b (1) _i , b (2) _i ; and b (3) _i , i varying from 0 to (2.M - 1) = 255. To these respective groups of samples b (1) _i , b (2) _i , b (3) _i , 0 ≤i ≤255 , three identical fast Fourier transforms are applied respectively according to the steps noted 100a, 100b, and 100c . These fast Fourier transform steps can possibly be preceded by a time window operation. By these fast Fourier transforms, to each of K = 3 groups of samples b (1) _i , b (2) _i and b (3) _i , is associated the respective one of K = 3 groups of frequency components, denoted E (1) _i , E (2) _i and E (3) _i , the index i varying from 0 to 255. The step denoted 101 in this figure 2 aims to simplify the implementation of the subsequent treatments by selecting only part of the frequency components in each group E (1) _i to E (3) _i , 0 ≤i ≤255. This step is based on the following property. The Fast Fourier Transform of a real signal has pseudo-symmetry. 
Indeed, knowing that the samples forming the speech signal are real, each group of frequency components E (k) _i , k = 1, 2 or 3, can be written in the form: E (k) i = {E (k) 0 , E (k) 1 ,., E (k) 127 , E (k) 128 , E (k) 129 = E (k) 127 ,., E (k) 255 = E (k) 1 } The processing step 101 selects, in each group E (k = 1) _i , E (k = 2) _i and E (k = 3) _i , 0 ≤i ≤255 , a part of the frequency components constituting it, with know the frequency components E (k) _O to E (k) ₁₂₈ of rank 0 to 128, which form a group of selected frequency components. These first 129 selected frequency components are sufficient to describe each group E (k) _i , 0 ≤i ≤255 , in a complete way since the other frequency components in the group, namely the last 127 components E (k) ₁₂₉ to E ( k) _255, are deduced by symmetry. The frequency components E (k) _O to E (k) ₁₂₈ which are selected in each group indeed present a character of symmetry compared to the complementary E (k) ₁₂₉ to E (k) ₂₅₅ of these components selected among all the components frequencies in the group that are produced initially. At the end of the processing 101, the frequency components E (k) _O to E (k) ₁₂₈ are thus produced for each group. According to step 102, the 129 frequency components selected in each group are decimated by 2, in order to retain only one component out of two from each group of components selected. This decimation by two 102 aims at selectively removing one component out of two, relating to a given frequency, with a view to inhibiting the interaction effect produced on this component each of the two frequency components located at two respective frequencies on the one hand and d 'other of said given frequency. In practice, the 65 frequency components E (k) _i are retained, with i = 1, 3, 5, .., 127.128, knowing that the frequency component E (k) ₀ has no interest in being retained since it is a continuous component. 
To simplify the notation, these frequency components E(k)_i, with i = 1, 3, 5, ..., 127, 128, are denoted E(k)_j, with 0 ≤ j ≤ 64. Steps 101 and 102 thus produce a group of selected and decimated components for each initial group of components E(1)_i, E(2)_i and E(3)_i, 0 ≤ i ≤ 255.
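
The block formation and the selection and decimation of steps 101 and 102 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a naive DFT stands in for the FFT of steps 100a to 100c so that the sketch has no dependencies, and no time window is applied.

```python
import cmath

M = 128  # samples per frame in the described embodiment


def make_blocks(t_nm2, t_nm1, t_n):
    """Form the K = 3 interleaved blocks B(1), B(2), B(3) of 2*M samples."""
    stream = list(t_nm2) + list(t_nm1) + list(t_n)
    # Start ranks: 0 and M/2 in T(n-2), then 0 in T(n-1) (stream offset M).
    starts = (0, M // 2, M)
    return [stream[s:s + 2 * M] for s in starts]


def dft(block):
    """Naive DFT, standing in for the FFT of steps 100a-100c (assumption)."""
    n = len(block)
    twiddle = [cmath.exp(-2j * cmath.pi * q / n) for q in range(n)]
    return [sum(x * twiddle[(k * i) % n] for i, x in enumerate(block))
            for k in range(n)]


def select_and_decimate(spectrum):
    """Steps 101 and 102: keep ranks 1, 3, ..., 127 and 128 (65 values)."""
    kept_ranks = list(range(1, M, 2)) + [M]
    return [spectrum[i] for i in kept_ranks]
```

With three frames numbered consecutively from 0 to 383, the three blocks start at samples 0, 64 and 128, and each spectrum of 256 components is reduced to 65 components indexed j = 0 to 64.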

In step 103, the energy average of each triplet of K = 3 frequency components of the same rank j in the K = 3 groups of selected and decimated frequency components E(1)_j, E(2)_j and E(3)_j is calculated, with the index j varying from 0 to 64, to produce 65 averaged energy components Em_j, j varying from 0 to 64. This calculation comprises squaring the modulus of each frequency component of the same rank j in the K = 3 groups of selected and decimated components, giving K = 3 energy components, and then averaging these K = 3 energy components.
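
The averaging of step 103 amounts to a mean of squared moduli per rank j. A minimal sketch, with a hypothetical function name and with each group given as a list of complex components:

```python
def average_energy(groups):
    """Step 103: mean over the K groups of |E(k)_j|^2 for each rank j."""
    k = len(groups)
    n = len(groups[0])
    return [sum(abs(g[j]) ** 2 for g in groups) / k for j in range(n)]
```

Applied to the K = 3 groups of 65 selected and decimated components, this returns the 65 averaged energy components Em_j.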

Thus, for a cycle relating to a frame T(n) of processing of the noisy speech signal s(nT), the device 10 extracts 65 energy components Em_j, each representative of an energy, or power, of the noisy speech signal s(nT) for the frequency, or frequency band, considered. It should be noted that steps 100, 101 and 102 described with reference to Figure 2, although they improve the implementation of the method according to the invention, can be reduced to a single step consisting of applying a single Fourier transform (FFT) to the M = 128 samples of the frame T(n) retained for the processing cycle considered. In addition, the selection step 101 may or may not be carried out, directly on the frequency components resulting from the FFT processing.

Returning to Figure 1, the 65 energy components Em_j, 0 ≤ j ≤ 64, are applied to an input of the signal-to-noise ratio estimation circuit 11. For each of these 65 extracted energy components Em_j, circuit 11 estimates a signal-to-noise ratio SNR_j between the noisy speech signal s(nT) and a noise signal included in that noisy speech signal, for the energy component Em_j considered. This signal-to-noise ratio is given by:

$$\mathrm{SNR}_j^{\,n} = \frac{Em_j^{\,n}}{B_j^{\,n}}$$

in which the superscript n identifies the processing cycle relating to the frame T(n) of rank n, and B_j denotes a noise energy component in the energy component Em_j of rank j.

In practice, such an estimation of the signal-to-noise ratio is based on the calculation of the noise energy component estimated in each given energy component. It uses, for example, the ratio between the extracted energy component Em_j^n and the noise energy component B_j^(n-1) calculated during a processing cycle preceding the cycle considered, in which the noise signal is suppressed in frame T(n). The higher this ratio, the more it indicates the presence of a speech signal in the frequency energy component Em_j^n considered, in which case the noise component B_j^(n-1), calculated for the energy component Em_j^(n-1) of rank (n-1), is retained as the noise component B_j^n. The lower this ratio, the more it indicates that the energy component consists only of a noise signal, in which case the noise component B_j^n is recalculated accordingly. Circuit 11 assigns a signal-to-noise ratio SNR_j, 0 ≤ j ≤ 64, to each extracted energy component Em_j, 0 ≤ j ≤ 64, according to an estimation algorithm based on this principle.

As a function of these 65 signal-to-noise ratios SNR_j, circuit 12 calculates for each of them a gain G_j, taking for example a value substantially between 0 and 1, which is directly linked to the signal-to-noise ratio SNR_j for the corresponding frequency component. For a given frequency energy component Em_j, the higher the ratio SNR_j of the noisy speech signal s(nT) to the noise signal, the lower the gain G_j, and the lower the ratio SNR_j of the noisy speech signal to the noise signal, the higher the gain G_j. The noise signal component is thus attenuated for each frequency energy component Em_j. These gains G_j are such that weighting the energy components Em_j by the respective gains would give a discrete spectrum of weighted frequency energy components representative of the noisy speech signal s(nT) in which the noise signal is substantially suppressed.
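
The noise tracking principle used by circuit 11 can be sketched as follows. The text does not specify the estimation algorithm, so the decision threshold, the smoothing factor and the function name are all assumptions made for illustration; the sketch covers only the noise update and the resulting ratio SNR_j, not the gain calculation of circuit 12.

```python
def track_noise(em, b_prev, threshold=2.0, alpha=0.9):
    """Update B_j and compute SNR_j = Em_j / B_j for one processing cycle.

    threshold and alpha are hypothetical tuning values, not from the text.
    """
    b_new, snr = [], []
    for e, b in zip(em, b_prev):
        ratio = e / b if b > 0 else float("inf")
        if ratio > threshold:
            b_next = b  # speech likely present: hold the previous estimate
        else:
            # mostly noise: move the estimate towards the current energy
            b_next = alpha * b + (1.0 - alpha) * e
        b_new.append(b_next)
        snr.append(e / b_next if b_next > 0 else float("inf"))
    return b_new, snr
```

The exponential smoothing used in the low-ratio branch is one common choice; any rule that holds the estimate during speech and adapts it during noise would follow the same principle.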

An output of circuit 12 producing the gains G_j is applied to an input of the filter coefficient synthesis circuit 13. Circuit 13 comprises a first circuit (not shown) that duplicates the 65 calculated gains G_j, in accordance with the equation given in (1). This circuit receives 65 gains, denoted G_0, G_1, ..., G_64, and produces 128 gains, which can be written as a group of gains G_j, with j between 0 and 127, as follows:

$$G_j = \{G_0,\ G_1,\ \ldots,\ G_{63},\ G_{64},\ G_{65} = G_{63},\ \ldots,\ G_{127} = G_1\}$$
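
The duplication of the gains is a simple mirroring around rank 64. A minimal sketch, with a hypothetical function name:

```python
def mirror_gains(g):
    """Extend 65 gains G_0..G_64 to 128 gains with G_(128-j) = G_j."""
    if len(g) != 65:
        raise ValueError("expected 65 gains")
    # g[63:0:-1] walks ranks 63 down to 1, giving G_65 = G_63 ... G_127 = G_1.
    return list(g) + list(g[63:0:-1])
```

The result has 65 + 63 = 128 entries, which matches the number of filter coefficients synthesized from the gains.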

A second circuit (not shown) in the synthesis circuit 13, in the form of an inverse discrete Fourier transform DFT⁻¹, synthesizes 128 coefficients C(nT) of the filter 14 by inverse Fourier transform of the 128 gains G_j. These 128 coefficients C(nT) are applied to a first, control input of the filter 14, typically an FIR filter. A second input of the filter 14 receives the noisy speech signal s(nT). The filter 14 convolves the coefficients C(nT) with the 128 samples of the frame T(n) to give a denoised frame of 128 samples forming part of the denoised speech signal s*(nT). The method implemented by the device described above is naturally "adaptive", in the sense that the coefficients C(nT) applied to the control input of the FIR filter 14 are modified for each frame T(n) of given rank n, as a function of the processing operations 10, 11, 12 and 13 applied to the samples forming the speech signal to be processed.
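
The synthesis of the coefficients and the time-domain filtering can be sketched as follows. The function names are hypothetical, a naive inverse DFT is used to keep the sketch dependency-free, and the handling of samples preceding the frame (zero or supplied history) is an assumption, since the text does not describe how frame boundaries are treated.

```python
import cmath


def synthesize_coefficients(gains):
    """Inverse DFT of the mirrored gains; their symmetry makes the result real."""
    n = len(gains)
    twiddle = [cmath.exp(2j * cmath.pi * q / n) for q in range(n)]
    return [sum(g * twiddle[(k * t) % n] for k, g in enumerate(gains)).real / n
            for t in range(n)]


def fir_filter(coeffs, frame, history=None):
    """y[t] = sum_k C[k] * x[t - k]; earlier samples come from history (or 0)."""
    taps = len(coeffs)
    past = [0.0] * (taps - 1) + list(history or [])
    past = past[len(past) - (taps - 1):]  # keep exactly taps - 1 past samples
    x = past + list(frame)
    offset = len(past)
    return [sum(c * x[offset + t - k] for k, c in enumerate(coeffs))
            for t in range(len(frame))]
```

As a sanity check, 128 gains all equal to 1 give a single unit coefficient followed by zeros, so the filter leaves the frame unchanged.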

To summarize the above, the main characteristic of the noise suppression method according to the invention is to use, on the one hand, digital frequency processing 100 of the noisy speech signal to produce temporal filter coefficients C(nT) and, on the other hand, digital time processing 14 of the noisy speech signal s(nT) as a function of the filter coefficients C(nT), to produce a speech signal s*(nT) in which the noise signal is substantially suppressed.

Referring to Figure 3, a combined noise suppression and echo cancellation system according to a first variant of the invention, included in a terminal, typically a hands-free radiotelephone, comprises a microphone 2, a loudspeaker 4, a noise suppression device 1 according to the invention as described above, a time processing circuit 14' and an echo canceller 3. The noise suppression device 1 is identical to the device shown in Figure 1 and mainly comprises a frequency processing unit 100 and a time processing circuit 14. The echo canceller is formed of a subtractor 30 and a circuit 31 that produces an estimated echo signal. The microphone 2 receives a speech signal to be transmitted [s(t)+e(t)] formed of a noisy acoustic speech signal s(t) added to an echo signal e(t). This echo signal results from the acoustic coupling between the loudspeaker 4 and the microphone 2. The noise suppression device 1 processes, as described above, the speech signal to be transmitted into a denoised transmitted speech signal [s*(nT)+e*(nT)], which is applied to a first input of the subtractor 30, a second input of which receives the output of circuit 31. A received speech signal r(t) from a remote terminal is applied, on the one hand, to an input of the loudspeaker and, on the other hand, to an input of circuit 31 through the time processing circuit 14', which is preceded by a sampler 14a'.

According to an important characteristic of the invention, the time processing circuit 14' is at every instant strictly similar to the time processing circuit 14 in the noise suppression device 1 (Figure 1). This characteristic is based on the fact that the estimated echo of the received signal r(t) produced by circuit 31 is to be subtracted, in subtractor 30, from the echo signal e*(nT) processed by noise suppression in circuit 1, and not from the original echo signal e(nT). Circuit 14' is therefore a pure and simple duplication of the time processing circuit 14 in device 1, as indicated by the double-headed dashed arrow in Figure 3. The time processing circuit 14' is therefore associated at every instant with the same 128 filter coefficients C(nT) as circuit 14 in device 1. It processes the received speech signal r(t) into a denoised received speech signal r*(nT). This processing results from the convolution, in cycles of 128, of the coefficients C(nT) with the samples r(nT) of the received signal r(t). Circuit 31 produces an estimate ê*(nT) of the denoised echo signal e*(nT) from the denoised received speech signal r*(nT) and from echo cancellation coefficients w(nT). The output of subtractor 30 therefore produces a difference signal [s*(nT)+e*(nT)-ê*(nT)] in which the echo signal is substantially suppressed. The echo cancellation coefficients w(nT) are obtained from this difference signal.
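
The echo cancellation loop of this first variant can be sketched per sample as follows. The text states only that the coefficients w(nT) are obtained from the difference signal; the normalized LMS update used here, the step size, the two-tap echo path and the function names are assumptions chosen for illustration.

```python
import random


def echo_cancel_step(w, r_star_recent, mic_denoised, mu=0.5, eps=1e-8):
    """Circuit 31 and subtractor 30 for one sample, with an assumed NLMS update."""
    e_hat = sum(wi * ri for wi, ri in zip(w, r_star_recent))  # estimated echo
    diff = mic_denoised - e_hat                               # subtractor 30
    norm = sum(ri * ri for ri in r_star_recent) + eps
    w_new = [wi + mu * diff * ri / norm for wi, ri in zip(w, r_star_recent)]
    return diff, w_new


def demo(steps=500):
    """Identify a hypothetical two-tap echo path from an echo-only signal."""
    rng = random.Random(0)
    echo_path = [0.5, -0.2]  # hypothetical acoustic coupling, denoised domain
    w = [0.0, 0.0]
    recent = [0.0, 0.0]      # most recent r*(nT) samples, newest first
    for _ in range(steps):
        recent = [rng.gauss(0.0, 1.0)] + recent[:-1]
        mic = sum(h * r for h, r in zip(echo_path, recent))  # e*(nT) only
        _, w = echo_cancel_step(w, recent, mic)
    return w
```

In this sketch both inputs of the canceller are assumed to have already passed through the same filter coefficients C(nT), which is the reason the text gives for duplicating circuit 14 as circuit 14'.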

Referring to Figure 4, a combined noise suppression and echo cancellation system according to a second variant of the invention comprises a microphone 2, a loudspeaker 4, an echo canceller 3, a frequency processing unit 100, a time processing circuit 14 and a sampler 5. The unit 100 and the circuit 14 are identical to those described with reference to Figure 1. The echo canceller 3 is formed of a subtractor 30 and a circuit 31 that produces an estimated echo signal ê(nT). The microphone 2 receives a transmitted speech signal [s(t)+e(t)] formed of a noisy acoustic speech signal s(t) added to an echo signal e(t). This echo signal results from the acoustic coupling between the loudspeaker 4 and the microphone 2. The transmitted speech signal [s(t)+e(t)] is sampled in the sampler 5 to give the signal [s(nT)+e(nT)]. This sampled signal is applied, on the one hand, to an input of the unit 100 and, on the other hand, to an input of the circuit 14 through the subtractor 30. A speech signal r(t) received from a remote terminal is applied, on the one hand, to an input of circuit 31 and, on the other hand, to an input of the loudspeaker 4. Circuit 31, receiving the signal r(t), produces in response an estimated echo signal ê(nT), which is applied to a first input of the subtractor 30, a second input of which receives the transmitted speech signal [s(nT)+e(nT)]. The output of the subtractor 30 produces a difference signal [s(nT)+e(nT)-ê(nT)], which is applied to circuit 14.

In this variant, on the one hand, the frequency processing implemented in unit 100 operates on the transmitted speech signal [s(nT)+e(nT)] and, on the other hand, the time processing in circuit 14, using the coefficients C(nT) produced by unit 100, operates on the difference signal, that is, the transmitted speech signal after echo cancellation, [s(nT)+e(nT)-ê(nT)]. This variant avoids "duplicating" circuit 14 in the branch that includes circuit 31, as represented by the dashed arrow in Figure 3 for the previous variant.

Claims

Method for suppressing a noise signal in a noisy speech signal (s(nT)) which is sampled, characterized by the steps of:

digital frequency processing (100) of said noisy speech signal, to produce temporal filter coefficients (C(nT)), and

digital time processing (14) of said noisy speech signal (s(nT)) as a function of said filter coefficients (C(nT)), into a speech signal (s*(nT)) in which said noise signal is substantially suppressed.

Method for suppressing a noise signal in a noisy speech signal according to claim 1, characterized, for a given processing cycle, by the digital frequency processing steps of:

extraction (10) of a plurality of frequency energy components (Em_j) from said noisy speech signal (s(nT)),

estimation (11), for each of the extracted frequency energy components, of a ratio (SNR_j) between an energy level of the noisy speech signal (s(nT)) and an energy level of the noise signal,

determination (12) of a respective gain (G_j) for said each of the extracted frequency energy components (Em_j), as a function of said ratio (SNR_j) estimated between the energy level of the noisy speech signal (s(nT)) and the energy level of the noise signal for said each of the selected frequency components, and

synthesis (13) of said filter coefficients (C(nT)) as a function of said gains (G_j).

Method according to any one of Claims 2 to 3, characterized in that said step of extracting frequency energy components comprises the substeps of:

production (100a, 100b, 100c) of K groups each comprising a plurality of frequency components (E(1)_i, E(2)_i, E(3)_i), respectively for K interleaved blocks (B(1), B(2), B(3)) of the noisy speech signal (s(nT)), K being an integer, and

calculation (103) of an energy average of K frequency components of the same rank (j) respectively in the K groups, to give a respective one of the extracted frequency energy components.

Method according to Claim 3, characterized in that said calculation step (103) is preceded, for each of the K groups of frequency components, by a step (101) of selecting a part of the frequency components having respective predetermined ranks in said each of the groups (E(1)_i, E(2)_i, E(3)_i), said selected part having a character of symmetry with respect to the complement of this part among the plurality of frequency components extracted.

Method according to any one of Claims 2 to 4, characterized in that said steps of production (100a, 100b, 100c) and synthesis (13) are implemented respectively by means of a Fast Fourier Transform and an Inverse Fourier Transform.

Device (1) for suppressing a noise signal in a noisy speech signal (s(nT)) which is sampled, characterized in that it comprises, for each of successive processing cycles:

means (10) for extracting a plurality of frequency energy components (Em_j) from said noisy speech signal (s(nT)),

means (11) for estimating, for each of the extracted frequency energy components, a ratio (SNR_j) between an energy level of the noisy speech signal (s(nT)) and an energy level of the noise signal,

means (12) for determining a respective gain (G_j) for said each of the extracted frequency energy components (Em_j), as a function of said ratio (SNR_j) estimated between the energy level of the noisy speech signal (s(nT)) and the energy level of the noise signal for said each of the selected frequency components,

means (13) for synthesizing said filter coefficients (C(nT)) as a function of said gains (G_j), and

means (14) for temporal filtering of said noisy speech signal (s(nT)) as a function of said filter coefficients (C(nT)), into a speech signal (s*(nT)) in which said noise signal is substantially suppressed.

Combined echo cancellation (3) and noise suppression (1) system, comprising:

a noise suppression device (1) for suppressing a noise signal in a speech signal to be transmitted (s(nT)+e(nT)), to give a denoised speech signal,

an echo canceller (3) comprising first means (31) for producing an estimated echo signal (ê*(t)) as a function of a given speech signal (r*(nT)) and a difference signal (s*(nT)+e*(nT)-ê*(nT)), and second means (30) for subtracting said estimated echo signal (ê*(t)) from said denoised speech signal (s*(nT)+e*(nT)), to give said difference signal (s*(nT)+e*(nT)-ê*(nT)),
characterized in that said noise suppression device is in the form of:

digital frequency processing means (100) for processing said speech signal to be transmitted, to produce temporal filter coefficients (C(nT)), and

first digital time processing means (14) for processing said speech signal to be transmitted (s(nT)) as a function of said filter coefficients (C(nT)), into said denoised speech signal (s*(nT)+e*(nT)) in which said noise signal is substantially suppressed, and in that said system further comprises:

second digital time processing means (14'), strictly similar to said first time processing means (14), for processing a speech signal received from a remote terminal (r(t)) as a function of said filter coefficients (C(nT)), into said given speech signal (r*(nT)).

Combined echo cancellation (3) and noise suppression (1) system for a speech signal to be transmitted (s(nT)+e(nT)), comprising:

an echo canceller (3) comprising first means (31) for producing an estimated echo signal (ê(t)) as a function of a speech signal (r(t)) received from a remote terminal and a difference signal (s(nT)+e(nT)-ê(nT)), and second means (30) for subtracting said estimated echo signal (ê(t)) from a speech signal to be transmitted (s(nT)+e(nT)), to give said difference signal (s(nT)+e(nT)-ê(nT)),
characterized in that it comprises:

a noise suppression device (1) for suppressing a noise signal in the difference signal (s(nT)+e(nT)-ê(nT)), to give a denoised speech signal (s*(nT)+e*(nT)-ê*(nT)), said noise suppression device being in the form of:

digital frequency processing means (100) for processing said speech signal to be transmitted (s(nT)+e(nT)), in order to produce temporal filter coefficients (C(nT)), and

digital time processing means (14) for processing said difference signal (s(nT)+e(nT)-ê(nT)) as a function of said filter coefficients (C(nT)), into a denoised speech signal (s*(nT)+e*(nT)-ê*(nT)) in which said noise signal is substantially suppressed.