CN102612712B - Bandwidth extension of low band audio signal - Google Patents

Bandwidth extension of low band audio signal

Info

Publication number
CN102612712B
Authority
CN
China
Prior art keywords
audio signal
frequency band
low band
high frequency
feature
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201080052278.3A
Other languages
Chinese (zh)
Other versions
CN102612712A
Inventor
Volodya Grancharov (沃洛佳·格兰恰诺夫)
Stefan Bruhn (斯特凡·布鲁恩)
Harald Pobloth (哈拉尔德·波布洛斯)
Sigurdur Sverrisson (西格德尔·斯维里森)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Application filed by Telefonaktiebolaget LM Ericsson AB
Publication of CN102612712A
Application granted
Publication of CN102612712B
Current legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G10L21/0388 Details of processing therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephone Function (AREA)

Abstract

Estimating a high band extension of a low band audio signal involves the following steps: extracting (S1) a set of features of the low band audio signal; mapping (S2) the extracted features to at least one high band parameter using generalized additive modeling; frequency shifting (S3) a copy of the low band audio signal into the high band; and controlling (S4) the envelope of the frequency-shifted copy of the low band audio signal by the at least one high band parameter.

Description

Bandwidth extension of low band audio signal
Technical field
The present invention relates to audio coding and, more specifically, to bandwidth extension of a low band audio signal.
Background
The present invention relates to bandwidth extension (BWE) of audio signals. In speech and audio coding, BWE schemes are increasingly used to improve the perceived quality at a given bit rate. The main idea behind BWE is that part of the audio signal is not transmitted; instead, it is reconstructed (estimated) at the decoder from the signal components that are received.
Thus, in a BWE scheme, part of the signal spectrum is reconstructed at the decoder. The reconstruction uses specific characteristics of the part of the signal spectrum that is actually transmitted with conventional coding methods. Typically, the high band (HB) of the signal is reconstructed from specific low band (LB) audio signal characteristics.
The dependency between LB features and HB signal characteristics is usually modeled with Gaussian mixture models (GMM) or hidden Markov models (HMM) (see, e.g., [1-2]). The HB characteristics that are most often predicted relate to the spectral envelope and/or the temporal envelope.
There are two main types of BWE schemes:
● In the first type, the HB signal characteristics are predicted entirely from specific LB features.
These BWE solutions introduce artifacts into the reconstructed HB, which in some cases results in lower quality than the bandwidth-limited signal. Complex mappings (e.g., based on GMM or HMM) are prone to degrade on unknown data.
A general rule of thumb is: the more complex the mapping (the larger the number of trained parameters), the higher the likelihood of artifacts for data types that are not present in the training set. It is very difficult to find a mapping whose complexity gives the optimal balance between overall prediction accuracy and a small number of outliers (data that clearly deviate from the training data and cannot be modeled well).
● In the second type (an example is described in [3]), the HB signal is reconstructed from a combination of LB features and a small amount of transmitted HB information. BWE schemes that use transmitted HB information tend to perform better (at the cost of an increased bit budget), but they provide no general method for combining transmitted parameters with predicted ones. Typically, one set of HB parameters is transmitted and another set is predicted, which means that the transmitted information cannot compensate for errors in the predicted parameters.
Summary of the invention
The object of the invention is to achieve an improved BWE scheme.
This object is achieved in accordance with the appended claims.
According to a first aspect, the present invention relates to a method of estimating a high band extension of a low band audio signal. The method includes the following steps. A set of features of the low band audio signal is extracted. The extracted features are mapped to at least one high band parameter using generalized additive modeling. A copy of the low band audio signal is frequency shifted into the high band. The envelope of the frequency-shifted copy of the low band audio signal is controlled by the at least one high band parameter.
According to a second aspect, the present invention relates to a device for estimating a high band extension of a low band audio signal. A feature extraction block is configured to extract a set of features of the low band audio signal. A mapping block includes the following units: a generalized additive model mapper configured to map the extracted features to at least one high band parameter using generalized additive modeling; a frequency shifter configured to frequency shift a copy of the low band audio signal into the high band; and an envelope controller configured to control the envelope of the frequency-shifted copy by the at least one high band parameter.
According to a third aspect, the present invention relates to a speech decoder including the device according to the second aspect.
According to a fourth aspect, the present invention relates to a network node including the speech decoder according to the third aspect.
An advantage of the proposed BWE scheme is that it provides a good balance between complex mapping schemes (good average performance, but many outliers) and more constrained mapping schemes (lower average performance, but more robust).
Brief description of the drawings
The invention, together with further objects and advantages thereof, may best be understood by referring to the following description taken together with the accompanying drawings, in which:
Fig. 1 is a block diagram of an embodiment of a coding/decoding arrangement including a speech decoder in accordance with the present invention;
Figs. 2A-C are diagrams illustrating the principles of generalized additive modeling;
Fig. 3 is a block diagram of an embodiment of a device for generating an HB extension in accordance with the present invention;
Fig. 4 is a diagram illustrating an example of a high band parameter obtained by generalized additive modeling in accordance with an embodiment of the present invention;
Fig. 5 is a diagram illustrating the definition of extracted features suitable for another embodiment of the present invention;
Fig. 6 is a block diagram of an embodiment of a device for generating an HB extension, suitable for the features illustrated in Fig. 5, in accordance with the present invention;
Fig. 7 is a diagram illustrating an example of high band parameters obtained by generalized additive modeling based on the features illustrated in Fig. 5, in accordance with an embodiment of the present invention;
Fig. 8 is a block diagram of another embodiment of a coding/decoding arrangement including a speech decoder in accordance with the present invention;
Fig. 9 is a block diagram of a further embodiment of a coding/decoding arrangement including a speech decoder in accordance with the present invention;
Fig. 10 is a block diagram of another embodiment of a device for generating an HB extension in accordance with the present invention;
Fig. 11 is a block diagram of a further embodiment of a device for generating an HB extension in accordance with the present invention;
Fig. 12 is a block diagram of an embodiment of a network node including an embodiment of a speech decoder in accordance with the present invention;
Fig. 13 is a block diagram of an embodiment of a speech decoder in accordance with the present invention; and
Fig. 14 is a flowchart of an embodiment of the method in accordance with the present invention.
Detailed description
In the drawings, elements having the same or similar functions are given the same reference designations.
The following description explains how a set of LB features can be used to estimate the HB part of the signal by means of a mapping. It also explains how transmitted HB information can be used to control the mapping.
Fig. 1 is a block diagram of an embodiment of a coding/decoding arrangement including a speech decoder in accordance with the present invention. A speech encoder 1 receives a source audio signal s (typically frame by frame), which is forwarded to an analysis filter bank 10 that splits the audio signal into a low band part $s_{LB}$ and a high band part $s_{HB}$. In this embodiment, the HB part is discarded (which means that the analysis filter bank may consist of only a low-pass filter). The LB part $s_{LB}$ of the audio signal is encoded in an LB encoder 12 (typically a code-excited linear prediction (CELP) encoder, e.g., an algebraic code-excited linear prediction (ACELP) encoder), and the code is transmitted to a speech decoder 2. An example of ACELP coding/decoding can be found in [4]. The code received by speech decoder 2 is decoded in an LB decoder 14 (typically a CELP decoder, e.g., an ACELP decoder), which provides a low band audio signal $\hat{s}_{LB}$ corresponding to $s_{LB}$. This low band audio signal $\hat{s}_{LB}$ is forwarded to a feature extraction block 16, which extracts a set of features $F_{LB}$ of the signal $\hat{s}_{LB}$ (described below). The extracted features $F_{LB}$ are forwarded to a mapping block 18, which maps them to at least one high band parameter using generalized additive modeling (described below). The HB parameter is used to control the envelope of a copy of the LB audio signal $\hat{s}_{LB}$ that has been frequency shifted into the high band, and the envelope-controlled copy provides an estimate $\hat{s}_{HB}$ of the discarded HB part $s_{HB}$. The signals $\hat{s}_{LB}$ and $\hat{s}_{HB}$ are forwarded to a synthesis filter bank 20, which reconstructs an estimate $\hat{s}$ of the original source audio signal. Feature extraction block 16 and mapping block 18 together form a device 30 for generating an HB extension (described further below).
The exemplary LB audio signal features introduced below (called local features) are used to predict specific HB signal characteristics. All of the exemplified features, or a subset of them, may be used. All of these local features are computed frame by frame; the dynamic features also include information from the previous frame. In the following, n is the frame index, l is the sample index, L is the number of samples per frame, and s(n, l) is a speech sample.
The first two exemplary features relate to the spectral tilt and the tilt dynamics. They measure the frequency distribution of the energy:
$$\Psi_1(n) = \frac{\sum_{l=1}^{L} s(n,l)\,s(n,l-1)}{\sum_{l=1}^{L} s^2(n,l)} \qquad (1)$$
$$\Psi_2(n) = \frac{\left|\Psi_1(n) - \Psi_1(n-1)\right|}{\Psi_1(n) + \Psi_1(n-1)} \qquad (2)$$
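The two tilt features can be computed per frame as in the following NumPy sketch. It assumes each frame is a 1-D array of L samples and that the last sample of the previous frame is passed in as s(n, 0); the function names are illustrative and not taken from the patent.

```python
import numpy as np

def spectral_tilt(frame, prev_sample=0.0):
    """Psi_1(n), eq. (1): lag-1 autocorrelation divided by the frame energy.

    prev_sample plays the role of s(n, 0), the last sample of the previous
    frame (an assumption; the text does not say how l - 1 = 0 is handled).
    """
    x = np.asarray(frame, dtype=float)
    shifted = np.concatenate(([prev_sample], x[:-1]))  # s(n, l - 1), l = 1..L
    energy = np.dot(x, x)
    return float(np.dot(x, shifted) / energy) if energy > 0 else 0.0

def relative_change(curr, prev):
    """Dynamics term |a - b| / (a + b), the form shared by eqs. (2), (4), (6)."""
    denom = curr + prev
    return abs(curr - prev) / denom if denom != 0 else 0.0

# Psi_2(n) = relative_change(Psi_1(n), Psi_1(n - 1))
```

A slowly varying frame gives a tilt close to +1 (energy concentrated at low frequencies), while a sign-alternating frame gives a tilt close to -1.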
The next two exemplary features measure the pitch (the fundamental frequency of the voice) and the pitch dynamics. The search for the optimal lag is restricted by $\tau_{MIN}$ and $\tau_{MAX}$ to a meaningful pitch range, e.g., 50-400 Hz:
$$\Psi_3(n) = \arg\max_{\tau_{MIN} < \tau < \tau_{MAX}} \frac{\sum_{l=1}^{L} s(n,l)\,s(n,l+\tau)}{\sqrt{\sum_{l=1}^{L} s^2(n,l)\,\sum_{l=1}^{L} s^2(n,l+\tau)}} \qquad (3)$$
$$\Psi_4(n) = \frac{\left|\Psi_3(n) - \Psi_3(n-1)\right|}{\Psi_3(n) + \Psi_3(n-1)} \qquad (4)$$
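The pitch feature is a direct search over candidate lags for the maximum normalized correlation. This sketch assumes an 8 kHz sampling rate, converts the 50-400 Hz range to lags via τ = fs/f, and expects the buffer to hold at least τ_MAX look-ahead samples after the frame. It returns a lag in samples (the pitch in Hz would be fs/τ). The names and the sampling rate are assumptions for illustration.

```python
import numpy as np

def pitch_lag(buf, frame_len, fs=8000.0, f_min=50.0, f_max=400.0):
    """Psi_3(n): lag in (tau_min, tau_max) maximizing the normalized correlation.

    buf holds the current frame (first frame_len samples) followed by
    look-ahead samples; fs = 8 kHz is an illustrative assumption.
    """
    buf = np.asarray(buf, dtype=float)
    tau_min = int(fs / f_max)  # shortest period (highest pitch)
    tau_max = int(fs / f_min)  # longest period (lowest pitch)
    x = buf[:frame_len]
    e_x = np.dot(x, x)
    best_tau, best_corr = tau_min + 1, -np.inf
    for tau in range(tau_min + 1, tau_max):  # strict inequalities on tau
        y = buf[tau:tau + frame_len]
        denom = np.sqrt(e_x * np.dot(y, y))
        corr = np.dot(x, y) / denom if denom > 0 else 0.0
        if corr > best_corr:
            best_tau, best_corr = tau, corr
    return best_tau
```

For a 100 Hz tone at 8 kHz the period is exactly 80 samples, and that is the only multiple of the period inside the search range, so the search returns 80.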
The fifth and sixth exemplary features reflect the balance between tonal and noise-like components in the signal. Here, $\sigma^2_{ACB}(n)$ and $\sigma^2_{FCB}(n)$ are the energies of the adaptive and fixed codebook contributions in CELP coding (e.g., ACELP coding), and $\sigma^2_e(n)$ is the energy of the excitation signal:
$$\Psi_5(n) = \frac{\sigma^2_{ACB}(n) - \sigma^2_{FCB}(n)}{\sigma^2_e(n)} \qquad (5)$$
$$\Psi_6(n) = \frac{\left|\Psi_5(n) - \Psi_5(n-1)\right|}{\Psi_5(n) + \Psi_5(n-1)} \qquad (6)$$
The last frame-by-frame local feature in this example set captures the energy dynamics. Here, $\sigma^2_s(n)$ is the energy of the speech frame:
$$\Psi_7(n) = \frac{\left|\log_{10}\sigma^2_s(n) - \log_{10}\sigma^2_s(n-1)\right|}{\log_{10}\sigma^2_s(n) + \log_{10}\sigma^2_s(n-1)} \qquad (7)$$
All local features used in the mapping are scaled as follows before mapping:
$$\tilde{\Psi}(n) = \frac{\Psi(n) - \Psi_{MIN}}{\Psi_{MAX} - \Psi_{MIN}} \qquad (8)$$
where $\Psi_{MIN}$ and $\Psi_{MAX}$ are predetermined constants corresponding to the minimum and maximum values of the given feature. This gives the set of extracted features $F_{LB} = \{\tilde{\Psi}_m(n)\}_{m=1}^{7}$.
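The remaining features and the scaling step are simple scalar operations, sketched below. The codebook and excitation energies are assumed to be supplied by the CELP decoder, frame energies are assumed to exceed 1 (e.g., on a 16-bit sample scale) so that log10 stays positive, and the scaling constants are placeholders; none of these names come from the patent.

```python
import math
import numpy as np

def tonal_noise_balance(e_acb, e_fcb, e_exc):
    """Psi_5(n), eq. (5): (adaptive - fixed codebook energy) / excitation energy."""
    return (e_acb - e_fcb) / e_exc if e_exc > 0 else 0.0

def energy_dynamics(e_curr, e_prev):
    """Psi_7(n), eq. (7): relative change of the log10 frame energy.

    Assumes both energies are > 0 (and in practice > 1, so logs are positive).
    """
    lc, lp = math.log10(e_curr), math.log10(e_prev)
    denom = lc + lp
    return abs(lc - lp) / denom if denom != 0 else 0.0

def scale_features(psi, psi_min, psi_max):
    """Eq. (8): min-max scaling with predetermined per-feature constants."""
    psi = np.asarray(psi, dtype=float)
    lo = np.asarray(psi_min, dtype=float)
    hi = np.asarray(psi_max, dtype=float)
    return (psi - lo) / (hi - lo)
```

Features that fall outside the predetermined range map outside [0, 1]; whether to clip them is a design choice the text does not address.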
According to the present invention, the HB extension is estimated from the local features by generalized additive modeling. This concept is briefly described with reference to Figs. 2A-C. Further details on generalized additive models can be found in, e.g., [5].
In statistics, regression models are often used to estimate the behavior of a parameter. A simple model is the linear model:
$$\hat{Y} = \omega_0 + \sum_{m=1}^{M} \omega_m X_m \qquad (9)$$
where $\hat{Y}$ is an estimate of a variable Y that depends on the (random) variables $X_1, \ldots, X_M$. The case M = 2 is illustrated in Fig. 2A. In this case, $\hat{Y}$ is a flat surface (a plane).
A characteristic property of the linear model is that each term depends linearly on only one variable. A generalization is to replace (at least one of) these linear functions with nonlinear functions, each still depending on only one variable. This leads to the additive model:
$$\hat{Y} = \omega_0 + \sum_{m=1}^{M} f_m(X_m) \qquad (10)$$
This additive model is illustrated for M = 2 in Fig. 2B. In this case, the surface representing $\hat{Y}$ is curved. The functions $f_m(X_m)$ are typically sigmoid functions (generally S-shaped functions), as illustrated in Fig. 2B. Examples of sigmoid functions are the logistic function, the Gompertz curve, the ogee curve, and the hyperbolic tangent. By varying the parameters that define a sigmoid function, its S-shape can vary continuously from an approximately linear shape between its minimum and maximum values to an approximate step function between the same minimum and maximum values.
A further generalization is obtained with the generalized additive model:
$$g(\hat{Y}) = \omega_0 + \sum_{m=1}^{M} f_m(X_m) \qquad (11)$$
where g(·) is called the link function. This is illustrated in Fig. 2C, where the surface $\hat{Y}$ has been further modified (applying the inverse $g^{-1}(\cdot)$ to both sides of equation (11) gives $\hat{Y} = g^{-1}\left(\omega_0 + \sum_{m=1}^{M} f_m(X_m)\right)$, where $g^{-1}(\cdot)$ is typically also a sigmoid function). In the special case where the link function g(·) is the identity function, equation (11) reduces to equation (10). Since both cases are important, for the purposes of the present invention "generalized additive modeling" also includes the case of an identity link function. However, as noted above, at least one $f_m(X_m)$ is nonlinear, which makes the model nonlinear (the surface $\hat{Y}$ is curved).
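To make the progression from equation (9) to equation (11) concrete, the sketch below evaluates the three model forms for one observation x = (X_1, ..., X_M). The logistic shape function and the use of a sigmoid as the inverse link are illustrative assumptions; the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_shape(w1, w2, w3):
    """One possible S-shaped f_m: x -> w1 / (1 + exp(-w2 * x + w3))."""
    return lambda x: w1 / (1.0 + np.exp(-w2 * x + w3))

def linear_model(x, w0, w):
    """Eq. (9): each term is linear in a single variable."""
    return w0 + float(np.dot(w, x))

def additive_model(x, w0, shape_fns):
    """Eq. (10): each term f_m depends on one variable but may be nonlinear."""
    return w0 + sum(f(xm) for f, xm in zip(shape_fns, x))

def gam(x, w0, shape_fns, inv_link=lambda z: z):
    """Eq. (11) solved for Y-hat: g^{-1}(w0 + sum_m f_m(X_m))."""
    return inv_link(additive_model(x, w0, shape_fns))
```

With the default identity inverse link, `gam` returns exactly what `additive_model` returns, mirroring the reduction of (11) to (10); with linear shape functions, the additive model in turn equals the linear model.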
In an embodiment of the present invention, the 7 (normalized) features $\tilde{\Psi}_m(n)$ obtained from equations (1) to (8) are used to estimate the ratio Y(n) between the HB energy and the LB energy in a compressed (perceptually motivated) domain. This ratio may correspond to a specific part of the temporal or spectral envelope, or to an overall gain, as described further below. One example is:
$$Y(n) = \left(\frac{E_{HB}(n)}{E_{LB}(n)}\right)^{\beta} \qquad (12)$$
where β may, for example, be chosen as β = 0.2. Another example is:
$$Y(n) = \log_{10}\left(\frac{E_{HB}(n)}{E_{LB}(n)}\right) \qquad (13)$$
In equations (12) and (13), the parameter β and the $\log_{10}$ function, respectively, transform the energy ratio into the compressed, "perceptually motivated" domain. This transformation accounts for the approximately logarithmic sensitivity of the human ear.
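Both compressions of the energy ratio are one-liners. The inverse helper below is hypothetical (the text does not describe how the decoder undoes the compression) and is included only to show that (12) is invertible for a fixed β.

```python
import math

def perceptual_ratio_power(e_hb, e_lb, beta=0.2):
    """Eq. (12): Y(n) = (E_HB / E_LB) ** beta."""
    return (e_hb / e_lb) ** beta

def perceptual_ratio_log(e_hb, e_lb):
    """Eq. (13): Y(n) = log10(E_HB / E_LB)."""
    return math.log10(e_hb / e_lb)

def energy_ratio_from_power(y, beta=0.2):
    """Hypothetical inverse of eq. (12), not described in the text."""
    return y ** (1.0 / beta)
```

Both forms compress large ratios: an energy ratio of 1000 becomes about 3.98 with β = 0.2 and 3 with log10, which keeps the regression target in a range where equal errors are roughly equally audible.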
Since the energy $E_{HB}(n)$ is not available at the decoder, the ratio Y(n) is predicted, or estimated. This is done by modeling the estimate $\hat{Y}(n)$ of Y(n) with a generalized additive model based on the extracted LB features $\tilde{\Psi}_m(n)$. An example is:
$$\hat{Y}(n) = \omega_0 + \sum_{m=1}^{M} \frac{w_{1m}}{1 + e^{-w_{2m}\tilde{\Psi}_m(n) + w_{3m}}} \qquad (14)$$
where M = 7 for the given extracted local features (fewer features are also feasible). Comparing with equation (11), the features $\tilde{\Psi}_m(n)$ correspond to the variables $X_1, \ldots, X_M$, and the functions $f_m$ correspond to the terms of the sum, which are sigmoid functions defined by the model parameters $\{w_{1m}, w_{2m}, w_{3m}\}$, with an identity link function. The generalized additive model parameters $\omega_0$ and $\omega$ are stored in the decoder and are obtained by training on a database of speech frames. The training process finds suitable parameters $\omega_0$ and $\omega$ by minimizing the error between the ratio $\hat{Y}(n)$ estimated by equation (14) and the actual ratio Y(n) given by equation (12) (or (13)). A suitable method (especially for the sigmoid parameters) is the Levenberg-Marquardt method, described, e.g., in [6].
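The following NumPy sketch evaluates equation (14) for a batch of frames and fits its parameters with a minimal Levenberg-Marquardt loop (identity damping, analytic Jacobian). It is an illustrative stand-in for the training procedure, not the patent's implementation; the parameter packing θ = (ω0, w_1·, w_2·, w_3·), the damping schedule and the iteration count are assumptions.

```python
import numpy as np

def unpack(theta, M):
    """Split theta into w0, w1 (M,), w2 (M,), w3 (M,)."""
    return theta[0], theta[1:1 + M], theta[1 + M:1 + 2 * M], theta[1 + 2 * M:]

def _sig_terms(theta, psi):
    M = psi.shape[1]
    w0, w1, w2, w3 = unpack(theta, M)
    z = np.clip(-w2 * psi + w3, -50.0, 50.0)  # avoid overflow in exp
    return w0, w1, 1.0 / (1.0 + np.exp(z))     # s[n, m] = sigmoid term

def gam_predict(theta, psi):
    """Eq. (14): psi has shape (frames, M); returns Y-hat for each frame."""
    w0, w1, s = _sig_terms(theta, psi)
    return w0 + s @ w1

def gam_jacobian(theta, psi):
    """d Y-hat / d theta, columns ordered as in theta."""
    _, w1, s = _sig_terms(theta, psi)
    ds = s * (1.0 - s)                          # derivative of the logistic
    ones = np.ones((psi.shape[0], 1))
    return np.hstack([ones, s, w1 * ds * psi, -w1 * ds])

def fit_lm(psi, y, theta, iters=200, lam=1e-3):
    """Minimal Levenberg-Marquardt loop minimizing sum (Y - Y-hat)^2."""
    cost = np.sum((y - gam_predict(theta, psi)) ** 2)
    for _ in range(iters):
        r = y - gam_predict(theta, psi)
        J = gam_jacobian(theta, psi)
        A = J.T @ J + lam * np.eye(len(theta))
        cand = theta + np.linalg.solve(A, J.T @ r)
        cand_cost = np.sum((y - gam_predict(cand, psi)) ** 2)
        if cand_cost < cost:   # accept: behave more like Gauss-Newton
            theta, cost, lam = cand, cand_cost, max(lam * 0.3, 1e-9)
        else:                  # reject: behave more like gradient descent
            lam = min(lam * 10.0, 1e12)
    return theta
```

A library solver (for example SciPy's `least_squares` with `method="lm"`) would normally replace the hand-written loop; it is spelled out here only to show where the sigmoid parameters enter the Jacobian.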
Fig. 3 is a block diagram of an embodiment of a device 30 for generating an HB extension in accordance with the present invention. Device 30 includes feature extraction block 16, which is configured to extract a set of features $F_{LB}$ of the low band audio signal $\hat{s}_{LB}$. Mapping block 18, connected to feature extraction block 16, includes a generalized additive model mapper 32 configured to map the extracted features to a high band parameter $\hat{Y}$ using generalized additive modeling. In the illustrated embodiment, mapping block 18 includes a frequency shifter 34 configured to frequency shift a copy of the low band audio signal $\hat{s}_{LB}$ into the high band. Mapping block 18 also includes an envelope controller 36 configured to control the envelope of the frequency-shifted copy by the high band parameter $\hat{Y}$.
Fig. 4 is a diagram illustrating an example of a high band parameter obtained by generalized additive modeling in accordance with an embodiment of the present invention. It shows how the estimated ratio (gain) $\hat{Y}$ is used to control the envelope (in this case in the frequency domain) of the frequency-shifted copy of the LB signal. The dashed line represents the unmodified gain (1.0) of the LB signal. Thus, in this embodiment, the HB extension is obtained by applying the single estimated gain $\hat{Y}$ to the frequency-shifted copy of the LB signal.
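One way to realize the frequency shifter and a single-gain envelope controller is in the DFT domain, as sketched below. The patent does not specify how the shift is implemented; translating rfft bins upward by `cut_bin` is an assumption, as are the function and parameter names.

```python
import numpy as np

def shifted_hb_copy(lb_frame, cut_bin, gain):
    """Frequency-shift a copy of the LB signal into the HB and apply one gain.

    Assumptions: lb_frame is sampled at the full output rate, the HB starts at
    rfft bin cut_bin, and the shift is a pure translation by cut_bin bins.
    """
    X = np.fft.rfft(lb_frame)
    Y = np.zeros_like(X)
    n_hb = len(X) - cut_bin      # number of HB bins to fill
    Y[cut_bin:] = X[:n_hb]       # LB bin k lands on bin k + cut_bin
    return np.fft.irfft(gain * Y, n=len(lb_frame))
```

For example, with a 320-sample frame and `cut_bin=80`, a tone in bin 20 reappears in bin 100 scaled by `gain`; adding the result to the LB frame gives the extended signal.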
Fig. 5 is a diagram illustrating the definition of extracted features suitable for another embodiment of the present invention. This embodiment extracts only 2 LB signal features, $F_1$ and $F_2$.
In the embodiment illustrated in Fig. 5, feature $F_1$ is defined as:
$$F_1 = \frac{E_{10.0-11.6}}{E_{8.0-11.6}} \qquad (15)$$
where
$E_{10.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 10.0-11.6 kHz,
$E_{8.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 8.0-11.6 kHz.
Furthermore, in the embodiment illustrated in Fig. 5, feature $F_2$ is defined as:
$$F_2 = \frac{E_{8.0-11.6}}{E_{0.0-11.6}} \qquad (16)$$
where
$E_{8.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 8.0-11.6 kHz,
$E_{0.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 0.0-11.6 kHz.
Features $F_1$ and $F_2$ represent the spectral tilt and are similar to feature $\Psi_1$ above, but they are determined in the frequency domain rather than in the time domain. It is also feasible to determine $F_1$ and $F_2$ over other frequency intervals of the LB signal. The key point in this embodiment, however, is that $F_1$ and $F_2$ describe energy ratios between different parts of the spectrum of the low band audio signal.
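The two energy-ratio features can be estimated from a DFT of the LB frame. This sketch assumes a 32 kHz sampling rate (so that the LB can reach 11.6 kHz) and sums squared rfft magnitudes per band; the helper names are hypothetical.

```python
import numpy as np

def band_energy(power, freqs, lo_hz, hi_hz):
    """Sum of spectral power in [lo_hz, hi_hz)."""
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return float(np.sum(power[mask]))

def tilt_features(lb_frame, fs=32000.0):
    """F1 and F2 of eqs. (15) and (16) from one LB frame."""
    power = np.abs(np.fft.rfft(lb_frame)) ** 2
    freqs = np.fft.rfftfreq(len(lb_frame), d=1.0 / fs)
    e_10_116 = band_energy(power, freqs, 10000.0, 11600.0)
    e_8_116 = band_energy(power, freqs, 8000.0, 11600.0)
    e_0_116 = band_energy(power, freqs, 0.0, 11600.0)
    f1 = e_10_116 / e_8_116 if e_8_116 > 0 else 0.0  # eq. (15)
    f2 = e_8_116 / e_0_116 if e_0_116 > 0 else 0.0   # eq. (16)
    return f1, f2
```

Three equal-amplitude tones at 1.0, 9.0 and 10.5 kHz give F1 = 1/2 (one of the two tones above 8 kHz lies above 10 kHz) and F2 = 2/3 (two of the three tones lie above 8 kHz).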
Using the extracted features $F_1, F_2$, mapper 32 can now map them to HB parameters $\hat{E}_k$ with the following generalized additive model:
$$\hat{E}_k = w_{0k} + \sum_{m=1}^{2} \frac{w_{1mk}}{1 + \exp\left(-w_{2mk} F_m + w_{3mk}\right)} \qquad (17)$$
where
$\hat{E}_k$, k = 1, ..., K, are high band parameters defining gains that control the envelope of K predetermined frequency bands of the frequency-shifted copy of the low band audio signal,
$\{w_{0k}, w_{1mk}, w_{2mk}, w_{3mk}\}$ are sets of mapping coefficients defining the sigmoid functions for each high band parameter $\hat{E}_k$,
$F_m$, m = 1, 2, are features of the low band audio signal describing energy ratios between different parts of the low band audio signal spectrum.
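Equation (17) vectorizes naturally over the K bands. In this sketch the coefficients are stored as an array of shape (K,) for w_0k and arrays of shape (2, K) for w_1mk, w_2mk and w_3mk; this layout and the function name are assumptions.

```python
import numpy as np

def band_gains(F, w0, w1, w2, w3):
    """Eq. (17): F has shape (2,), w0 shape (K,), w1/w2/w3 shape (2, K)."""
    F = np.asarray(F, dtype=float)[:, None]      # (2, 1) broadcasts over K
    terms = w1 / (1.0 + np.exp(-w2 * F + w3))    # (2, K) sigmoid terms
    return w0 + terms.sum(axis=0)                # one gain per HB band
```

Setting all w_2mk and w_3mk to zero makes every sigmoid term equal to w_1mk / 2, which is a convenient sanity check on the broadcasting.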
Fig. 6 is a block diagram of an embodiment of a device for generating an HB extension, suitable for the features illustrated in Fig. 5, in accordance with the present invention. This embodiment includes units similar to those of the embodiment of Fig. 3, but here they are configured to map features $F_1, F_2$ to K gains $\hat{E}_k$ rather than to a single gain $\hat{Y}$.
Fig. 7 is a diagram illustrating an example of high band parameters obtained by generalized additive modeling based on the features illustrated in Fig. 5, in accordance with an embodiment of the present invention. In this example there are K = 4 gains $\hat{E}_1, \ldots, \hat{E}_4$. These four gains control the envelope of 4 predetermined frequency bands of the frequency-shifted copy of the low band audio signal. Thus, in this example the HB envelope is controlled by 4 parameters $\hat{E}_1, \ldots, \hat{E}_4$ instead of a single parameter $\hat{Y}$ as in the example of Fig. 4. Fewer or more parameters are also feasible.
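Controlling K band envelopes amounts to scaling contiguous groups of HB bins. This sketch assumes the HB bins of an rfft spectrum are split into K nearly equal bands; the actual band edges are not specified in the text, and the function name is hypothetical.

```python
import numpy as np

def apply_band_gains(hb_spectrum, cut_bin, gains):
    """Scale K contiguous HB sub-bands of an rfft spectrum by per-band gains.

    Assumption: HB bins [cut_bin, len) are split into len(gains) near-equal
    bands. Bins below cut_bin are left unchanged; the input is not modified.
    """
    out = np.array(hb_spectrum, copy=True)
    edges = np.linspace(cut_bin, len(out), len(gains) + 1).astype(int)
    for g, lo, hi in zip(gains, edges[:-1], edges[1:]):
        out[lo:hi] *= g  # envelope control for one predetermined band
    return out
```

Applied to the spectrum of a frequency-shifted LB copy, followed by an inverse rfft, this yields the HB estimate with a four-step envelope like the one in the K = 4 example.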
Fig. 8 is a block diagram of another embodiment of a coding/decoding arrangement including a decoder in accordance with the present invention. This embodiment differs from the embodiment of Fig. 1 in that the HB signal $s_{HB}$ is not discarded. Instead, it is forwarded to an HB information block 22, which classifies the HB signal and transmits an N-bit classification index to speech decoder 2. If transmission of HB information is allowed (as in Fig. 8), the mapping is performed piecewise, using the clusters given by the transmitted index, where the number of classes depends on the number of available bits. As described below, the classification index is used by mapping block 18.
Fig. 9 is a block diagram of a further embodiment of a coding/decoding arrangement including a decoder in accordance with the present invention. This embodiment is similar to the embodiment of Fig. 8, but uses the HB signal $s_{HB}$ together with the LB signal $s_{LB}$ to form the classification index. In this example N = 1 bit, but more than 2 classes are possible by including more bits.
Fig. 10 is a block diagram of another embodiment of a device for generating an HB extension in accordance with the present invention. This embodiment differs from the embodiment of Fig. 3 in that it includes a mapping coefficient selector 38 configured to select a set of mapping coefficients $\omega_C$ based on a received signal class index C. In this embodiment, the high band parameter $\hat{Y}$ is predicted from the set of low band features $F_{LB}$ and the prestored mapping coefficients $\omega_C$. The class index C selects a set of mapping coefficients that has been determined by an offline training process to fit the data in the corresponding cluster. This can be viewed as a smooth transition from a pure HB prediction state (no classification) to a pure HB quantization state (classification). The latter follows from the fact that, as the number of clusters increases, the mapping tends to predict the mean of the cluster.
Fig. 11 is a block diagram of a further embodiment of a device for generating an HB extension in accordance with the present invention. This embodiment is similar to the embodiment of Fig. 10, but is based on the features $F_1, F_2$ described with reference to Fig. 5. Furthermore, in this embodiment the signal class C is given by the following classification (see also the upper part of Fig. 5):
[Equation: classification rule defining C from $E^{s}_{8.0-11.6}$ and $E^{s}_{11.6-16.0}$; image not available]
where
$E^{s}_{8.0-11.6}$ is an estimate of the energy of the source audio signal in the frequency band 8.0-11.6 kHz,
$E^{s}_{11.6-16.0}$ is an estimate of the energy of the source audio signal in the frequency band 11.6-16.0 kHz.
In this example, C classifies the sound into "speech" (class 1) and "non-speech" (class 2). These labels are only approximate; they give an intuitive picture of what this example classification represents.
Based on this classification, mapping block 18 may be configured to perform the mapping (generalized additive model mapper 32) according to:
$$\hat{E}_k^C = w_{0k}^C + \sum_{m=1}^{2} \frac{w_{1mk}^C}{1 + \exp\left(-w_{2mk}^C F_m + w_{3mk}^C\right)}$$
where
$\hat{E}_k^C$, k = 1, ..., K, are high band parameters defining gains, associated with the signal class C, that control the envelope of K predetermined frequency bands of the frequency-shifted copy of the low band audio signal, the signal class C classifying the source audio signal represented by the low band audio signal $\hat{s}_{LB}$,
$\{w_{0k}^C, w_{1mk}^C, w_{2mk}^C, w_{3mk}^C\}$ are, for signal class C, sets of mapping coefficients defining the sigmoid functions for each high band parameter $\hat{E}_k^C$,
$F_m$, m = 1, 2, are features of the low band audio signal describing energy ratios between different parts of the low band audio signal spectrum.
As an example, K = 4, and $F_1, F_2$ may be defined by (15) and (16).
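The class-dependent mapping adds only a table lookup in front of the same sigmoid sum, which is the role of the mapping coefficient selector. The coefficient values below are placeholders, not trained values, and the dictionary layout and names are assumptions.

```python
import numpy as np

K = 4  # number of HB bands, as in the K = 4 example

# Hypothetical coefficient sets (w0, w1, w2, w3) per class C; placeholders only.
COEFF_TABLE = {
    1: (np.zeros(K), np.ones((2, K)), np.zeros((2, K)), np.zeros((2, K))),       # "speech"
    2: (np.full(K, 0.2), np.zeros((2, K)), np.zeros((2, K)), np.zeros((2, K))),  # "non-speech"
}

def class_gains(F, C, table=COEFF_TABLE):
    """Select the coefficient set for class C, then evaluate the sigmoid sum."""
    w0, w1, w2, w3 = table[C]                  # mapping coefficient selector
    F = np.asarray(F, dtype=float)[:, None]    # (2, 1) broadcasts over K bands
    return w0 + (w1 / (1.0 + np.exp(-w2 * F + w3))).sum(axis=0)
```

With N transmitted bits the table would hold up to 2^N entries, each trained offline on the frames of its own cluster.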
An advantage of the embodiments of Figs. 8-11 is that they enable "fine-tuning" of the mapping from extracted features to the type of sound being coded.
Fig. 12 is a block diagram of an embodiment of a network node including an embodiment of a speech decoder 2 in accordance with the present invention. This embodiment illustrates a wireless terminal, but other network nodes are also feasible. For example, if voice over IP (Internet Protocol) is used in the network, the node may comprise a computer.
In the network node of Fig. 12, an antenna receives an encoded speech signal. A demodulator and channel decoder 50 converts this signal into low band speech parameters (and optionally the signal class C, as indicated by the dashed signal line labeled C), which are forwarded to speech decoder 2 to generate the speech signal $\hat{s}$, as described with reference to the embodiments above.
The steps, functions, procedures and/or blocks described herein may be implemented in hardware using any conventional technology, such as discrete circuit or integrated circuit technology, including both general-purpose and application-specific circuitry.
Alternatively, at least some of the steps, functions, procedures and/or blocks described herein may be implemented in software executed by a suitable processing device, such as a microprocessor, a digital signal processor (DSP) and/or any suitable programmable logic device, such as a field programmable gate array (FPGA) device.
It should also be understood that it may be possible to reuse the general processing capabilities of the network nodes. This may, for example, be done by reprogramming the existing software or by adding new software components.
As an implementation example, Fig. 13 is a block diagram illustrating an example embodiment of a speech decoder 2 in accordance with the present invention. This embodiment is based on a processor 100, e.g. a microprocessor, which executes a software component 110 for estimating the low band speech signal $\hat{s}_{LB}$, a software component 120 for estimating the high band speech signal $\hat{s}_{HB}$, and a software component 130 for generating the speech signal $\hat{s}$ from $\hat{s}_{LB}$ and $\hat{s}_{HB}$. The software is stored in memory 150. Processor 100 communicates with the memory over a system bus. The low band speech parameters (and optionally the signal class C) are received by an input/output (I/O) controller 160 controlling an I/O bus, to which processor 100 and memory 150 are connected. In this embodiment, the parameters received by I/O controller 160 are stored in memory 150, where they are processed by the software components. Software component 110 may implement the functionality of block 14 in the embodiments described above. Software component 120 may implement the functionality of block 30 in the embodiments described above. Software component 130 may implement the functionality of block 20 in the embodiments described above. The speech signal obtained from software component 130 is output from memory 150 by I/O controller 160 over the I/O bus.
In the embodiment of Figure 13, the speech parameters are received by the I/O controller 160, and other tasks, such as demodulation and channel decoding in a wireless terminal, are assumed to be handled elsewhere in the receiving network node. However, an alternative is to let other software components in the memory 150 also handle all or part of the digital signal processing for extracting the speech parameters from the received signal. In such an embodiment, the speech parameters may be retrieved directly from the memory 150.
If the receiving network node is a computer receiving voice over IP packets, the IP packets are typically forwarded to the I/O controller 160, and the speech parameters are extracted by another software component in the memory 150.
Some or all of the software components described above may be carried on a computer-readable medium, for example a CD, DVD or hard disk, and loaded into memory for execution by the processor.
Figure 14 is a flow chart illustrating an embodiment of the method in accordance with the present invention. Step S1 extracts a set of features of the low band audio signal. Step S2 maps the extracted features to at least one high frequency band parameter $\hat{E}_k$ using a generalized additive model. Step S3 frequency shifts a copy of the low band audio signal into the high frequency band. Step S4 controls the envelope of the frequency-shifted copy of the low band audio signal by the high frequency band parameter(s).
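Steps S3 and S4 can be illustrated for a single frame. This is a minimal sketch under stated assumptions, not the patented implementation: the text does not say how the frequency shift is realized, so the sketch assumes a bin-wise copy in the DFT domain, treats the high frequency band parameters as linear amplitude gains applied to K equal-width sub-bands of the shifted copy, and uses hypothetical bin ranges. The names `shift_and_shape`, `src_start`, `src_stop` and `dst_start` are illustrative only.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT (standard library only; adequate for short illustrative frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); keeps the real part, since the spectra built here are Hermitian."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def shift_and_shape(lb_frame, gains, src_start, src_stop, dst_start):
    """Hypothetical sketch of steps S3 and S4 for one frame.

    S3 (assumed DFT-domain shift): copy bins [src_start, src_stop) of the low band
    frame so that they start at bin dst_start.
    S4 (assumed linear gains): scale the copy with one gain per sub-band, using
    K = len(gains) equal-width sub-bands. Returns the high band frame (time domain).
    """
    n = len(lb_frame)
    width = src_stop - src_start
    if dst_start <= 0 or dst_start + width > n // 2:
        raise ValueError("shifted copy must stay strictly between DC and Nyquist")
    spectrum = dft(lb_frame)
    hb = [0j] * n
    k_bands = len(gains)
    for i in range(width):
        band = i * k_bands // width               # sub-band that bin i of the copy falls into
        value = gains[band] * spectrum[src_start + i]
        hb[dst_start + i] = value
        hb[n - dst_start - i] = value.conjugate()  # mirror bin so the output stays real
    return idft(hb)


# Usage: a tone in DFT bin 3 of a 32-sample frame is moved to bin 11 and, since it
# falls into the first of K = 2 sub-bands, scaled by the first gain (0.5).
frame = [math.cos(2 * math.pi * 3 * t / 32) for t in range(32)]
hb = shift_and_shape(frame, gains=[0.5, 2.0], src_start=2, src_stop=6, dst_start=10)
```

A practical decoder would instead process overlapping windowed frames with a fast transform; the naive DFT is used here only to keep the sketch self-contained.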
It will be understood by those skilled in the art that various modifications and changes may be made to the present invention without departing from its scope, which is defined by the appended claims.
Abbreviations
ACELP Algebraic Code Excited Linear Prediction
BWE Bandwidth Extension
CELP Code Excited Linear Prediction
DSP Digital Signal Processor
FPGA Field Programmable Gate Array
GMM Gaussian Mixture Model
HB High Band
HMM Hidden Markov Model
IP Internet Protocol
LB Low Band
References
[1] M. Nilsson and W. B. Kleijn, "Avoiding over-estimation in bandwidth extension of telephony speech", Proc. IEEE Int. Conf. Acoust. Speech Sign. Process., 2001.
[2] P. Jax and P. Vary, "Wideband extension of telephone speech using a hidden Markov model", IEEE Workshop on Speech Coding, 2000.
[3] ITU-T Rec. G.729.1, "G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729", 2006.
[4] 3GPP TS 26.190, "Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions", 2008.
[5] Pakize Taylan, Gerhard-Wilhelm Weber and Amir Beck, "New Approaches to Regression by Generalized Additive Models and Continuous Optimization for Modern Applications in Finance, Science and Technology", http://www3.iam.metu.edu.tr/iam/images/1/10/Preprint56.pdf
[6] W. Press, S. Teukolsky, W. Vetterling and B. Flannery, Numerical Recipes in C++: The Art of Scientific Computing, 2nd edition, reprinted 2003.

Claims (19)

1. A method of estimating a high frequency band extension of a low band audio signal, including the step of extracting (S1) a set of features of the low band audio signal, the method being characterized by:
mapping (S2) the extracted features, using a generalized additive model, to at least one high frequency band parameter $\hat{E}_k$;
frequency shifting (S3) a copy of the low band audio signal into the high frequency band;
controlling (S4) the envelope of the frequency-shifted copy of the low band audio signal by the at least one high frequency band parameter.
2. The method according to claim 1, wherein the mapping is based on a sum of sigmoid functions of the extracted features.
3. The method according to claim 2, wherein the mapping is given by:
$$\hat{E}_k = w_{0k} + \sum_{m=1}^{2} \frac{w_{1mk}}{1 + \exp\left(-w_{2mk} F_m + w_{3mk}\right)}$$
where
$\hat{E}_k$ are high frequency band parameters defining gains that control the envelope of K predetermined frequency bands of the frequency-shifted copy of the low band audio signal,
$\{w_{0k}, w_{1mk}, w_{2mk}, w_{3mk}\}$ is the set of mapping coefficients defining the sigmoid functions for each high frequency band parameter $\hat{E}_k$, and
$F_m$, $m = 1, 2$, are features of the low band audio signal describing energy ratios between different parts of the low band audio signal spectrum.
4. The method according to claim 2, wherein the mapping is given by:
$$\hat{E}_k^C = w_{0k}^C + \sum_{m=1}^{2} \frac{w_{1mk}^C}{1 + \exp\left(-w_{2mk}^C F_m + w_{3mk}^C\right)}$$
where
$\hat{E}_k^C$ are high frequency band parameters defining gains that are associated with a signal class C and control the envelope of K predetermined frequency bands of the frequency-shifted copy of the low band audio signal, the signal class C classifying the source audio signal represented by the low band audio signal,
$\{w_{0k}^C, w_{1mk}^C, w_{2mk}^C, w_{3mk}^C\}$ is the set of mapping coefficients defining the sigmoid functions for each high frequency band parameter $\hat{E}_k^C$ in signal class C, and
$F_m$, $m = 1, 2$, are features of the low band audio signal describing energy ratios between different parts of the low band audio signal spectrum.
5. The method according to claim 3 or 4, wherein the feature $F_1$ is given by:
$$F_1 = \frac{E_{10.0-11.6}}{E_{8.0-11.6}}$$
where
$E_{10.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 10.0-11.6 kHz, and
$E_{8.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 8.0-11.6 kHz.
6. The method according to claim 3 or 4, wherein the feature $F_2$ is given by:
$$F_2 = \frac{E_{8.0-11.6}}{E_{0.0-11.6}}$$
where
$E_{8.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 8.0-11.6 kHz, and
$E_{0.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 0.0-11.6 kHz.
7. The method according to claim 3 or 4, wherein K = 4.
8. The method according to claim 4, including the step of selecting the set of mapping coefficients $\{w_{0k}^C, w_{1mk}^C, w_{2mk}^C, w_{3mk}^C\}$ corresponding to the signal class C, where C is given by:
[equation image not available]
where the energy terms in this equation are an estimate of the energy of the source audio signal in the frequency band 8.0-11.6 kHz and an estimate of the energy of the source audio signal in the frequency band 11.6-16.0 kHz.
9. A device (30) for estimating a high frequency band extension of a low band audio signal, including a feature extraction block (16) configured to extract a set of features of the low band audio signal, the device being characterized by:
a mapping block (18) including:
a generalized additive model mapper (32) configured to map the extracted features, using a generalized additive model, to at least one high frequency band parameter $\hat{E}_k$;
a frequency shifter (34) configured to frequency shift a copy of the low band audio signal into the high frequency band;
an envelope controller (36) configured to control the envelope of the frequency-shifted copy by the at least one high frequency band parameter.
10. The device according to claim 9, wherein the generalized additive model mapper (32) is configured to base the mapping on a sum of sigmoid functions of the extracted features.
11. The device according to claim 10, wherein the generalized additive model mapper (32) is configured to perform the mapping according to:
$$\hat{E}_k = w_{0k} + \sum_{m=1}^{2} \frac{w_{1mk}}{1 + \exp\left(-w_{2mk} F_m + w_{3mk}\right)}$$
where
$\hat{E}_k$ are high frequency band parameters defining gains that control the envelope of K predetermined frequency bands of the frequency-shifted copy of the low band audio signal,
$\{w_{0k}, w_{1mk}, w_{2mk}, w_{3mk}\}$ is the set of mapping coefficients defining the sigmoid functions for each high frequency band parameter $\hat{E}_k$, and
$F_m$, $m = 1, 2$, are features of the low band audio signal describing energy ratios between different parts of the low band audio signal spectrum.
12. The device according to claim 10, wherein the generalized additive model mapper (32) is configured to perform the mapping according to:
$$\hat{E}_k^C = w_{0k}^C + \sum_{m=1}^{2} \frac{w_{1mk}^C}{1 + \exp\left(-w_{2mk}^C F_m + w_{3mk}^C\right)}$$
where
$\hat{E}_k^C$ are high frequency band parameters defining gains that are associated with a signal class C and control the envelope of K predetermined frequency bands of the frequency-shifted copy of the low band audio signal, the signal class C classifying the source audio signal represented by the low band audio signal,
$\{w_{0k}^C, w_{1mk}^C, w_{2mk}^C, w_{3mk}^C\}$ is the set of mapping coefficients defining the sigmoid functions for each high frequency band parameter $\hat{E}_k^C$ in signal class C, and
$F_m$, $m = 1, 2$, are features of the low band audio signal describing energy ratios between different parts of the low band audio signal spectrum.
13. The device according to claim 11 or 12, wherein the feature extraction block (16) is configured to extract the feature $F_1$ according to:
$$F_1 = \frac{E_{10.0-11.6}}{E_{8.0-11.6}}$$
where
$E_{10.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 10.0-11.6 kHz, and
$E_{8.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 8.0-11.6 kHz.
14. The device according to claim 11 or 12, wherein the feature extraction block (16) is configured to extract the feature $F_2$ according to:
$$F_2 = \frac{E_{8.0-11.6}}{E_{0.0-11.6}}$$
where
$E_{8.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 8.0-11.6 kHz, and
$E_{0.0-11.6}$ is an estimate of the energy of the low band audio signal in the frequency band 0.0-11.6 kHz.
15. The device according to claim 11 or 12, wherein the generalized additive model mapper (32) is configured to map the extracted features to K = 4 high frequency band parameters $\hat{E}_k$.
16. The device according to claim 12, including a mapping coefficient set selector (38) configured to select the set of mapping coefficients $\{w_{0k}^C, w_{1mk}^C, w_{2mk}^C, w_{3mk}^C\}$ corresponding to the signal class C, where C is given by:
[equation image not available]
where the energy terms in this equation are an estimate of the energy of the source audio signal in the frequency band 8.0-11.6 kHz and an estimate of the energy of the source audio signal in the frequency band 11.6-16.0 kHz.
17. A speech decoder including a device (30) according to any one of claims 9 to 16.
18. A network node including a speech decoder according to claim 17.
19. The network node according to claim 18, wherein the network node is a wireless terminal.
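Claims 3 to 6 (and their device counterparts, claims 11 to 14) specify the two features and the sum-of-sigmoids generalized additive mapping completely enough to sketch them. The Python below is a minimal, standard-library-only sketch under explicit assumptions: a low band frame sampled at a hypothetical 32 kHz, band energies taken as sums of squared DFT magnitudes over half-open bands, an `eps` guard against silent frames, and a `coeff_sets` dictionary keyed by signal class, where the key `None` holds the class-independent coefficients of claim 3. All function names and the placeholder coefficients are illustrative; trained mapping coefficients are not given in the claims, and the class decision of claims 8 and 16 is not reproduced because its equation is unavailable.

```python
import cmath
import math


def band_energy(frame, fs, f_lo, f_hi):
    """Sum of |X[k]|^2 over DFT bins k whose center frequency lies in [f_lo, f_hi) Hz.

    Assumption: half-open bands and a naive O(N^2) DFT (standard library only).
    """
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):  # non-negative frequencies only
        if f_lo <= k * fs / n < f_hi:
            x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            energy += abs(x_k) ** 2
    return energy


def extract_features(frame, fs=32000.0, eps=1e-12):
    """Claims 5 and 6: F1 = E(10.0-11.6 kHz) / E(8.0-11.6 kHz), F2 = E(8.0-11.6 kHz) / E(0.0-11.6 kHz)."""
    e_10_116 = band_energy(frame, fs, 10000.0, 11600.0)
    e_8_116 = band_energy(frame, fs, 8000.0, 11600.0)
    e_0_116 = band_energy(frame, fs, 0.0, 11600.0)
    # eps avoids division by zero on silent frames (an assumption, not part of the claims)
    return [e_10_116 / (e_8_116 + eps), e_8_116 / (e_0_116 + eps)]


def gam_map(features, coeffs):
    """Claim 3: E_k = w0[k] + sum_m w1[m][k] / (1 + exp(-w2[m][k] * F_m + w3[m][k]))."""
    w0, w1, w2, w3 = coeffs
    gains = []
    for k in range(len(w0)):
        e_k = w0[k]
        for m, f_m in enumerate(features):
            e_k += w1[m][k] / (1.0 + math.exp(-w2[m][k] * f_m + w3[m][k]))
        gains.append(e_k)
    return gains


def estimate_hb_parameters(frame, coeff_sets, signal_class=None, fs=32000.0):
    """Claim 4: use the coefficient set of class C (hypothetical key None = claim 3 set)."""
    return gam_map(extract_features(frame, fs), coeff_sets[signal_class])


# Hypothetical placeholder coefficients for K = 4 bands and M = 2 features.
# With w2 = w3 = 0 every sigmoid term equals w1 / 2, so each E_k = w0[k] + 0.5 + 0.5.
K, M = 4, 2
placeholder = ([0.0] * K,
               [[1.0] * K for _ in range(M)],
               [[0.0] * K for _ in range(M)],
               [[0.0] * K for _ in range(M)])
frame = [math.cos(2 * math.pi * 21 * t / 64) for t in range(64)]  # 10.5 kHz tone at fs = 32 kHz
print(extract_features(frame))                              # both ratios close to 1.0
print(estimate_hb_parameters(frame, {None: placeholder}))   # [1.0, 1.0, 1.0, 1.0]
```

Because the 10.5 kHz test tone lies inside all three measurement bands, both energy ratios come out close to one; a tone below 8 kHz would drive both features towards zero.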
CN201080052278.3A 2009-11-19 2010-09-14 Bandwidth extension of low band audio signal Expired - Fee Related CN102612712B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US26259309P 2009-11-19 2009-11-19
US61/262,593 2009-11-19
PCT/SE2010/050984 WO2011062538A1 (en) 2009-11-19 2010-09-14 Bandwidth extension of a low band audio signal

Publications (2)

Publication Number Publication Date
CN102612712A CN102612712A (en) 2012-07-25
CN102612712B true CN102612712B (en) 2014-03-12

Family

ID=44059836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201080052278.3A Expired - Fee Related CN102612712B (en) 2009-11-19 2010-09-14 Bandwidth extension of low band audio signal

Country Status (7)

Country Link
US (1) US8929568B2 (en)
EP (1) EP2502231B1 (en)
JP (1) JP5619177B2 (en)
CN (1) CN102612712B (en)
BR (1) BR112012012119A2 (en)
RU (1) RU2568278C2 (en)
WO (1) WO2011062538A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8447617B2 (en) * 2009-12-21 2013-05-21 Mindspeed Technologies, Inc. Method and system for speech bandwidth extension
RU2725416C1 (en) 2012-03-29 2020-07-02 Telefonaktiebolaget LM Ericsson (publ) Broadband of harmonic audio signal
CN105551497B (en) 2013-01-15 2019-03-19 Huawei Technologies Co., Ltd. Coding method, decoding method, encoding apparatus and decoding apparatus
RU2625945C2 (en) * 2013-01-29 2017-07-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for generating signal with improved spectrum using limited energy operation
EP3054446B1 (en) * 2013-01-29 2023-08-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder, audio decoder, method for providing an encoded audio information, method for providing a decoded audio information, computer program and encoded representation using a signal-adaptive bandwidth extension
CN108172239B (en) * 2013-09-26 2021-01-12 Huawei Technologies Co., Ltd. Method and device for expanding frequency band
FR3017484A1 (en) * 2014-02-07 2015-08-14 Orange ENHANCED FREQUENCY BAND EXTENSION IN AUDIO FREQUENCY SIGNAL DECODER
JP2016038435A (en) * 2014-08-06 2016-03-22 Sony Corporation Encoding device and method, decoding device and method, and program
US10847170B2 (en) * 2015-06-18 2020-11-24 Qualcomm Incorporated Device and method for generating a high-band signal from non-linearly processed sub-ranges
US9837094B2 (en) * 2015-08-18 2017-12-05 Qualcomm Incorporated Signal re-use during bandwidth transition period
JP2022523564A (en) 2019-03-04 2022-04-25 IoCurrents, Inc. Data compression and communication using machine learning

Citations (3)

Publication number Priority date Publication date Assignee Title
EP1300833A2 (en) * 2001-10-04 2003-04-09 AT&T Corp. A method of bandwidth extension for narrow-band speech
US20060277039A1 (en) * 2005-04-22 2006-12-07 Vos Koen B Systems, methods, and apparatus for gain factor smoothing
US20070067163A1 (en) * 2005-09-02 2007-03-22 Nortel Networks Limited Method and apparatus for extending the bandwidth of a speech signal

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
DE69619284T3 (en) * 1995-03-13 2006-04-27 Matsushita Electric Industrial Co., Ltd., Kadoma Device for expanding the voice bandwidth
SE9700772D0 (en) * 1997-03-03 1997-03-03 Ericsson Telefon Ab L M A high resolution post processing method for a speech decoder
SE512719C2 (en) 1997-06-10 2000-05-02 Lars Gustaf Liljeryd A method and apparatus for reducing data flow based on harmonic bandwidth expansion
US20040002856A1 (en) * 2002-03-08 2004-01-01 Udaya Bhaskar Multi-rate frequency domain interpolative speech CODEC system
JP3861770B2 (en) * 2002-08-21 2006-12-20 Sony Corporation Signal encoding apparatus and method, signal decoding apparatus and method, program, and recording medium
WO2005078707A1 (en) * 2004-02-16 2005-08-25 Koninklijke Philips Electronics N.V. A transcoder and method of transcoding therefore
DE602004020765D1 (en) * 2004-09-17 2009-06-04 Harman Becker Automotive Sys Bandwidth extension of band-limited tone signals
RU2376657C2 (en) * 2005-04-01 2009-12-20 Qualcomm Incorporated Systems, methods and apparatus for highband time warping
KR20070037945A (en) * 2005-10-04 2007-04-09 Samsung Electronics Co., Ltd. Audio encoding/decoding method and apparatus
US7835904B2 (en) * 2006-03-03 2010-11-16 Microsoft Corp. Perceptual, scalable audio compression
US8688441B2 (en) * 2007-11-29 2014-04-01 Motorola Mobility Llc Method and apparatus to facilitate provision and use of an energy value to determine a spectral envelope shape for out-of-signal bandwidth content
TWI591625B (en) * 2009-05-27 2017-07-11 Dolby International AB Systems and methods for generating a high frequency component of a signal from a low frequency component of the signal, a set-top box, a computer program product and storage medium thereof

Non-Patent Citations (3)

Title
New Approaches to Regression by Generalized Additive Models and Continuous Optimization for Modern Applications in Finance, Science and Technology; Pakize Taylan et al.; THE ART OF SCIENTIFIC COMPUTING; 2003-12-31; Sections 1.3 and 2 *
Pakize Taylan et al., New Approaches to Regression by Generalized Additive Models and Continuous Optimization for Modern Applications in Finance, Science and Technology, THE ART OF SCIENTIFIC COMPUTING, 2003, Figures 2 and 3.

Also Published As

Publication number Publication date
EP2502231A4 (en) 2013-07-10
BR112012012119A2 (en) 2021-01-05
WO2011062538A9 (en) 2011-06-30
RU2012125251A (en) 2013-12-27
CN102612712A (en) 2012-07-25
JP5619177B2 (en) 2014-11-05
WO2011062538A1 (en) 2011-05-26
RU2568278C2 (en) 2015-11-20
US20120230515A1 (en) 2012-09-13
EP2502231B1 (en) 2014-06-04
US8929568B2 (en) 2015-01-06
EP2502231A1 (en) 2012-09-26
JP2013511743A (en) 2013-04-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140312