WO2006059288A1 - Parametric audio coding comprising balanced quantization scheme - Google Patents

Parametric audio coding comprising balanced quantization scheme

Info

Publication number
WO2006059288A1
WO2006059288A1 PCT/IB2005/053977 IB2005053977W WO2006059288A1 WO 2006059288 A1 WO2006059288 A1 WO 2006059288A1 IB 2005053977 W IB2005053977 W IB 2005053977W WO 2006059288 A1 WO2006059288 A1 WO 2006059288A1
Authority
WO
WIPO (PCT)
Prior art keywords
quantization
sinusoidal components
sinusoidal
audio signal
quantization scheme
Prior art date
Application number
PCT/IB2005/053977
Other languages
French (fr)
Inventor
Valery S. Kot
Renat Vafin
Willem B. Kleijn
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2006059288A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/093 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using sinusoidal excitation models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components

Definitions

  • Parametric audio coding comprising balanced quantization scheme
  • the invention relates to audio signal coding. Especially, the invention relates to audio coding based on parametric coding and adapted to efficient high quality audio coding. More specifically, the invention relates to sinusoidal encoding scheme taking into account a balance between number of sinusoidal components and quantization.
  • Sinusoidal modeling is a well-known method of audio coding.
  • An input signal to be coded is divided into a number of relatively short time frames (typically in the range of 5 to 50 ms), with the sinusoidal modeling technique being applied to each frame.
  • Sinusoidal modeling of each frame involves finding a set of sinusoidal components parameterized by, for example, amplitude, frequency and phase to represent the portion of the input signal contained in that frame.
  • Sinusoidal modeling may involve picking spectral peaks in the input signal.
  • PAMP: Psychoacoustic Matching Pursuit
  • Encoding of the sinusoidal parameters in the bit stream with original floating-point precision would lead to a very high bit rate and, in fact, is not necessary.
  • Quantization step: The distance between neighboring representation levels is called the "quantization step". Quantization steps can differ considerably: the larger the step size, the higher the distortion and the lower the bit rate. The specific choice of quantization scales and quantization steps forms the quantization scheme.
  • JND: just-noticeable difference
  • HRQ: high-rate quantization
  • each individual component may cost a variable amount of bits in the bit stream.
  • given the total target bit rate, one can decide to encode fewer sinusoidal components with finer quantization, or vice versa, more sinusoids with coarser quantization, or to combine different quantization schemes.
  • the simplest and most common approach is to fix the quantization scheme for the complete audio excerpt and vary only the number of components in each time frame. Exact values of quantization steps in that case are chosen from some limited pre-defined set, depending on the target bit rate and integral properties of the complete excerpt under consideration.
  • the number of components and the target bit rate for the given short time frame is fixed, and optimal quantizers are then defined for that set of sinusoids.
  • bit allocation algorithms are used to distribute the total bit rate between transform coefficients.
  • the optimal bit allocation and quantizers are found such that a distortion measure, such as mean-squared error (MSE) or a weighted MSE, is minimized.
  • MSE: mean-squared error
  • the bit rate allocation is found under the non-negativity constraint on the bit rates assigned to the individual transform coefficients.
  • Methods based on a search through possible bit rate allocations or based on HRQ are used.
  • the main problem with the "fixed quantizers" approach is its rigid structure. This method uses the same quantization steps for all sinusoids, while in some cases it might be beneficial to spend more bits (that is use finer quantization) for more perceptually relevant components, compensating it with coarser quantization of less relevant ones.
  • HRQ can provide optimal quantizers for the given bit rate and given set of sinusoids. But again, it might be beneficial to spend this bit budget by encoding less sinusoidal components with finer quantization, or vice versa, more sinusoids with coarser quantization, as the choice of the optimal set of sinusoids is not known.
  • the invention provides an audio encoder adapted to encode an audio signal, the audio encoder comprising a sinusoidal type encoder adapted to generate, for each frame of the audio signal, a set of sinusoidal components, and optimizing means adapted to optimize a predetermined encoding efficiency criterion by selecting, for each frame of the audio signal, a number of sinusoidal components from the set of sinusoidal components, and a quantization scheme for quantization of the selected sinusoidal components, and generate an encoded audio signal comprising the selected set of sinusoidal components quantized according to the selected quantization scheme.
  • for each frame of the audio signal, given a total target bit rate for that frame, it is possible to select fewer sinusoids with finer quantization, more sinusoids with coarser quantization, or a combination of different quantization schemes.
  • an encoder is capable of finding an optimal balance between number of sinusoids and quantization scheme based on an evaluation of encoding efficiency, e.g. in terms of a perceptual distortion measure.
  • the optimizing means may be adapted to find the number of sinusoids and quantization scheme resulting in the least possible distortion for a given target bit rate, such as a maximum bit rate for the given frame.
  • the sinusoidal type encoder may be adapted to generate, for each frame, a fixed number of sinusoids that, preferably, is at least enough to ensure that enough sinusoids are generated in order to represent the audio signal with an adequately low amount of distortion.
  • the sinusoidal type encoder may comprise means to evaluate each sinusoidal component by some perceptually relevant measure, such as one based on a representation of a human auditory masking curve, and rank the sinusoidal components according to perceptual relevance.
  • the sinusoidal encoder may thus be adapted to stop extracting sinusoids as a predetermined stop criterion, e.g. a perceptually relevant stop criterion, has been met.
  • the optimizing means may be adapted to select separate quantization schemes for each of the selected sinusoidal components.
  • a drawback is the amount of data necessary in the output bit stream, namely quantization scheme information for each of the sinusoidal components contained in a frame.
  • the optimizing means may be adapted to select the quantization scheme from a predetermined set of quantization schemes.
  • the optimizing means has a number of preselected quantization schemes that can be successively run through and for each scheme encoding efficiency is evaluated, and the most optimal one is chosen.
  • the process may be stopped when a scheme has been evaluated and found to comply with a predetermined encoding efficiency stop criterion.
  • the quantization scheme may comprise quantization parameters for quantization of frequency, amplitude, and phase of the sinusoidal components. Quantization of each of frequency, amplitude and phase may be adjustable independent of each other, or they may be locked together in predefined sets of quantization parameters with finer and coarser quantization.
  • the predetermined encoding efficiency criterion comprises a perceptual distortion measure.
  • 'distortion' is understood a difference between the audio signal itself and the encoded audio signal, generated by encoding the audio signal according to the encoding template.
  • 'perceptual distortion measure' is understood a measure of distortion relevant with respect to what is perceived by the human auditory system, i.e. a measure of distortion that reflects a perceived sound quality.
  • the perceptual distortion measure is based on a perceptual model, such as a representation of the human auditory system.
  • the optimizing means may be adapted to optimize, for each frame of the audio signal, a distortion measure for a predetermined bit rate used to represent the encoded audio signal.
  • the optimizing means is able to generate an encoding efficiency measure in terms of a bit rate versus distortion. More preferably, a perceptual distortion for a given bit rate is used as encoding efficiency measure.
  • the optimizing means may be adapted to select the sinusoidal components in order of their perceptual relevance.
  • the sinusoidal encoder may be adapted to rank the sinusoidal components in order of their perceptual relevance, such as by using a perceptual model. This will help the optimizing means in selecting a proper number of the sinusoidal components, since starting from the perceptually most relevant sinusoidal component will tend to make the optimizing procedure converge faster.
  • a stop criterion e.g. based on a predetermined maximum allowable perceptual distortion or a maximum allowable bit rate may be chosen.
  • the invention provides a method of encoding an audio signal comprising, for each frame of the audio signal, the steps of (1) generating a set of sinusoidal components in response to the audio signal,
  • the invention provides a device comprising an audio encoder according to the first aspect.
  • the device may be an audio device, however other devices may profit from the advantages of the audio encoder according to the invention.
  • the invention provides a computer readable program code adapted to encode an audio signal according to the method of the second aspect.
  • the computer readable program code according to the fourth aspect may comprise software algorithms adapted for a signal processor, personal computers etc. and it may be present on a carriable medium such as a disk or memory card or memory stick, or it may be present in a ROM chip or in other way stored in a device.
  • FIG. 1 showing a block diagram illustrating the principles of a preferred encoder embodiment.
  • Fig. 1 illustrates a block diagram of a preferred encoder according to the invention.
  • An audio signal IN is applied to a parametric encoder, i.e. a sinusoidal encoder SE.
  • For each frame or time segment of the audio signal IN, the sinusoidal encoder SE generates in response a set of sinusoidal components SC that are applied to optimizing means OM.
  • the optimizing means OM selects a number of sinusoidal components SS from the set of sinusoidal components SC.
  • the optimizing means OM selects a quantization scheme QS comprising quantization parameters for quantization of the selected sinusoidal components SS.
  • After the optimizing means OM has evaluated encoding efficiency for different numbers of sinusoidal components SS and different quantization schemes QS and found an optimal balance, the selected sinusoidal components SS and the selected quantization scheme QS are provided to a bit stream generator BG that generates an encoded output signal OUT. It is to be understood, though, that the bit stream generator BG is not essential to the inventive idea.
  • the optimizing means OM optimizes an encoding efficiency criterion based on a perceptual model PM, e.g. so as to evaluate encoding efficiency by using a perceptual distortion measure.
  • the optimizing means OM may iteratively optimize a perceptual distortion measure for a given target bit rate.
  • an input signal to be coded is divided into a number of frames, with the sinusoidal modeling technique being applied to each frame.
  • Frames may be windowed with, for example, a Hanning window, or some special window to avoid pre-echo effects, or any kind of window.
  • sinusoidal modeling of X involves finding a set of sinusoidal signals parameterized by, for example, amplitude, frequency and phase to represent the portion of the input signal contained in that frame.
  • a method which can determine the perceptual relevance of extracted components. Examples of such methods include PAMP, as described in published patent application WO 0237476.
  • N non-quantized sinusoids $G_N = \{S_1, S_2, \ldots, S_N\}$ are ranked in the order of their relevance, most important components first.
  • R denotes a target bit rate, which is the number of bits that can be spent in that frame.
  • N subsets $G_K = \{S_1, S_2, \ldots, S_K\}$, $1 \le K \le N$, are then defined. For each $G_K$, given the target bit rate R, the best quantization is searched for. This can be done with high-rate quantization, which results in the set of quantized sinusoids $G_K'$ and quantization scheme $Q_K$. Synthesis of $(G_K', Q_K)$ provides the synthetic signal $X_K'$. To determine the difference between X and $X_K'$ a perceptual distortion measure
  • $D_K'$ can be used which describes how audible the modeling error is.
  • a preferred example of such a perceptual distortion measure is found in S. van de Par, A. Kohlrausch, G. Charestan, and R. Heusdens. "A new psychoacoustical masking model for audio coding applications". In IEEE Int. Conf. Acoust., Speech and Signal Process., pages 1805-1808, Orlando, USA, 2002. The optimal solution then becomes $(G_m', Q_m)$,
  • $G_m'$ is the optimal components set
  • $Q_m$ is the optimal quantization scheme for the given target bit rate R.
  • any sinusoidal extraction method can be used.
  • sinusoids $G_N = \{S_1, S_2, \ldots, S_N\}$ are not ranked in the order of their perceptual relevance. Then a complete search through all possible combinations of sinusoids (which is $2^N - 1$, instead of N in the preferred embodiment) is performed.
  • any quantization method can be used instead of, or in addition to, high rate quantization. Then for some K a search among some possible quantization schemes $Q_K^L$ is performed. Each $(G_K, Q_K^L)$ results in the bit rate $R_K^L$ and the perceptual distortion measure $D_K^L$. The optimal solution is then defined by: $\arg\min (D_K^L)$
  • the invention is applied to an audio encoder comprising a plurality of sub-encoders, for example sinusoidal, waveform, and noise, either in parallel or cascaded.
  • a distortion $D_K'$ at the exit of the complete encoder.
  • the encoding principles according to the invention may be applied within a large range of applications, such as solid state audio devices, audio players/recorders, mobile communication devices, IP-telephony, multimedia streaming of audio such as on the internet etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An audio encoder comprising a sinusoidal type encoder SE that generates, for each frame of an audio input signal IN, a set of sinusoidal components SC. For each frame, an optimum balance between number of sinusoidal components selected SS from the set of sinusoidal components together with a quantization scheme QS is found in terms of a predetermined encoding efficiency criterion. An encoded audio signal OUT comprising the selected set of sinusoidal components SS quantized according to the selected quantization scheme QS is then generated. Thus, the encoder is adapted to provide a high efficiency since it takes into account number of sinusoids to use for representing the input signal together with the quantization scheme. The encoder may be adapted to assign different quantization schemes for each one of the sinusoidal components. Preferably, the encoder is adapted to optimize encoding efficiency according to a perceptually relevant efficiency criterion, such as including a perceptual distortion measure PM. The quantization scheme may be assigned by high-rate quantization method or selected from a predefined set of quantization schemes.

Description

Parametric audio coding comprising balanced quantization scheme
The invention relates to audio signal coding. In particular, the invention relates to audio coding based on parametric coding and adapted to efficient high quality audio coding. More specifically, the invention relates to a sinusoidal encoding scheme that takes into account a balance between the number of sinusoidal components and quantization.
Sinusoidal modeling is a well-known method of audio coding. An input signal to be coded is divided into a number of relatively short time frames (typically in the range of 5 to 50 ms), with the sinusoidal modeling technique being applied to each frame. Sinusoidal modeling of each frame involves finding a set of sinusoidal components parameterized by, for example, amplitude, frequency and phase to represent the portion of the input signal contained in that frame. Sinusoidal modeling may involve picking spectral peaks in the input signal. Alternatively, one can use more advanced analysis-by-synthesis techniques like Psychoacoustic Matching Pursuit (PAMP). Encoding of the sinusoidal parameters in the bit stream with original floating-point precision would lead to a very high bit rate and, in fact, is not necessary. Instead, all parameters are quantized, with all values within a certain interval being mapped to one single representation level. This operation can be performed on a linear (uniform) or any other scale. The distance between neighboring representation levels is called the "quantization step". Quantization steps can differ considerably: the larger the step size, the higher the distortion and the lower the bit rate. The specific choice of quantization scales and quantization steps forms the quantization scheme.
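The relation between step size, distortion and bit cost described above can be illustrated with a minimal Python sketch. The function names, the parameter ranges and the fixed-length bit count are assumptions made for illustration only; the text does not prescribe any particular quantizer.

```python
import math

def quantize_uniform(value, step):
    """Map a value to the nearest representation level on a uniform scale.

    Returns (index, level); all values within step/2 of a level share its index.
    """
    index = round(value / step)
    return index, index * step

def quantize_log(value, step_ratio):
    """Quantize a positive value on a logarithmic scale.

    Representation levels are integer powers of step_ratio (an assumed layout).
    """
    index = round(math.log(value) / math.log(step_ratio))
    return index, step_ratio ** index

def bits_for_range(lo, hi, step):
    """Bits needed for a fixed-length index covering [lo, hi] with the given step."""
    levels = math.floor((hi - lo) / step) + 1
    return math.ceil(math.log2(levels))

# A coarse step costs fewer bits but gives a larger error than a fine step.
for step in (0.5, 0.1):
    _, level = quantize_uniform(0.74, step)
    print(step, bits_for_range(0.0, 1.0, step), abs(0.74 - level))
```

With the larger step the value 0.74 is represented less accurately but the index needs fewer bits, which is the trade-off a quantization scheme has to settle.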
In the most common approach frequencies and amplitudes are quantized on a logarithmic-like scale, while phases are quantized on the uniform scale. Quantization steps are set to be equal to the so-called "just-noticeable difference" (JND) or some JND derivative. The more advanced high-rate quantization (HRQ) approach finds the optimal quantizers that minimize a distortion measure for a given set of sinusoids and a given target bit rate.
Depending on the chosen quantization scheme, each individual component may cost a variable number of bits in the bit stream. Given the total target bit rate, one can decide to encode fewer sinusoidal components with finer quantization, or vice versa, more sinusoids with coarser quantization, or to combine different quantization schemes. The simplest and most common approach is to fix the quantization scheme for the complete audio excerpt and vary only the number of components in each time frame. Exact values of quantization steps in that case are chosen from some limited pre-defined set, depending on the target bit rate and integral properties of the complete excerpt under consideration. In the HRQ approach the number of components and the target bit rate for the given short time frame are fixed, and optimal quantizers are then defined for that set of sinusoids.
In transform coding, bit allocation algorithms are used to distribute the total bit rate between transform coefficients. The optimal bit allocation and quantizers are found such that a distortion measure, such as mean-squared error (MSE) or a weighted MSE, is minimized. The bit rate allocation is found under the non-negativity constraint on the bit rates assigned to the individual transform coefficients. Methods based on a search through possible bit rate allocations or based on HRQ are used.
The main problem with the "fixed quantizers" approach is its rigid structure. This method uses the same quantization steps for all sinusoids, while in some cases it might be beneficial to spend more bits (that is, use finer quantization) on more perceptually relevant components, compensating for this with coarser quantization of less relevant ones. HRQ can provide optimal quantizers for the given bit rate and given set of sinusoids. But again, it might be beneficial to spend this bit budget by encoding fewer sinusoidal components with finer quantization, or vice versa, more sinusoids with coarser quantization, as the optimal set of sinusoids is not known in advance.
Thus, all known encoders are unable to provide an encoded signal with an optimal balance between number of sinusoids and quantization scheme, which results in non-optimal performance of the complete encoder.
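As a hedged illustration of bit allocation under a non-negativity constraint, the following Python sketch applies the classical high-resolution rule (average rate plus half the log-ratio of a coefficient's variance to the geometric mean) and removes coefficients whose allocation would be negative. This textbook rule and the name `allocate_bits` are assumptions chosen for illustration; they are not taken from the text above.

```python
import math

def allocate_bits(variances, total_bits):
    """Distribute total_bits over coefficients so that no allocation is negative.

    Uses the classical high-resolution rule on the active set and iteratively
    drops coefficients that would receive a negative number of bits.
    """
    active = list(range(len(variances)))
    bits = [0.0] * len(variances)
    while active:
        mean_log = sum(math.log2(variances[i]) for i in active) / len(active)
        average = total_bits / len(active)
        trial = {i: average + 0.5 * (math.log2(variances[i]) - mean_log)
                 for i in active}
        negative = [i for i in active if trial[i] < 0]
        if not negative:
            for i in active:
                bits[i] = trial[i]
            break
        # Coefficients with negative trial rates get zero bits; re-solve for the rest.
        active = [i for i in active if i not in negative]
    return bits

print(allocate_bits([16.0, 4.0, 1.0], 3.0))
print(allocate_bits([16.0, 4.0, 1.0], 1.5))
```

In the second call the weakest coefficient is dropped to zero bits and the budget is shared by the remaining two, which mirrors the idea of coding fewer components more finely.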
It may be seen as an object of the present invention to provide a sinusoidal based encoder and encoding method capable of providing a high sound quality at a low bit rate.
According to a first aspect the invention provides an audio encoder adapted to encode an audio signal, the audio encoder comprising a sinusoidal type encoder adapted to generate, for each frame of the audio signal, a set of sinusoidal components, and optimizing means adapted to optimize a predetermined encoding efficiency criterion by selecting, for each frame of the audio signal, a number of sinusoidal components from the set of sinusoidal components, and a quantization scheme for quantization of the selected sinusoidal components, and generate an encoded audio signal comprising the selected set of sinusoidal components quantized according to the selected quantization scheme.
Thus, for each frame of the audio signal, given a total target bit rate for that frame, it is possible to select fewer sinusoids with finer quantization, more sinusoids with coarser quantization, or a combination of different quantization schemes. Hereby an encoder according to the first aspect is capable of finding an optimal balance between number of sinusoids and quantization scheme based on an evaluation of encoding efficiency, e.g. in terms of a perceptual distortion measure.
Different encoding efficiency criteria may be preferred. For example, the optimizing means may be adapted to find the number of sinusoids and quantization scheme resulting in the least possible distortion for a given target bit rate, such as a maximum bit rate for the given frame. The sinusoidal type encoder may be adapted to generate, for each frame, a fixed number of sinusoids that, preferably, is at least large enough to represent the audio signal with an adequately low amount of distortion. For this purpose the sinusoidal type encoder may comprise means to evaluate each sinusoidal component by some perceptually relevant measure, such as one based on a representation of a human auditory masking curve, and rank the sinusoidal components according to perceptual relevance. The sinusoidal encoder may thus be adapted to stop extracting sinusoids once a predetermined stop criterion, e.g. a perceptually relevant stop criterion, has been met.
The optimizing means may be adapted to select separate quantization schemes for each of the selected sinusoidal components. In this embodiment it is possible to individually assign quantization schemes specially adapted to each sinusoidal component. This may require a more comprehensive optimizing procedure but provides an even better possibility of balancing quantization scheme and number of sinusoidal components in order to obtain an optimal rate-distortion efficiency. A drawback is the amount of data necessary in the output bit stream, namely quantization scheme information for each of the sinusoidal components contained in a frame.
The optimizing means may be adapted to select the quantization scheme from a predetermined set of quantization schemes. Thus, the optimizing means has a number of preselected quantization schemes that can be run through successively; encoding efficiency is evaluated for each scheme, and the best one is chosen. Alternatively, the process may be stopped when a scheme has been evaluated and found to comply with a predetermined encoding efficiency stop criterion.
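The two selection strategies in this paragraph (exhaustive evaluation of a predetermined set, or stopping at the first scheme that meets a stop criterion) can be sketched in Python as follows. The names `pick_scheme`, `evaluate` and `stop_threshold` are hypothetical, and lower values of the efficiency measure are assumed to be better, as with a distortion measure.

```python
def pick_scheme(schemes, evaluate, stop_threshold=None):
    """Run through preselected schemes and return (scheme, score).

    evaluate(scheme) is assumed to return a distortion-like score (lower is
    better). If stop_threshold is given, the first scheme whose score does not
    exceed it is returned immediately instead of searching the whole set.
    """
    best = None
    for scheme in schemes:
        score = evaluate(scheme)
        if stop_threshold is not None and score <= stop_threshold:
            return scheme, score
        if best is None or score < best[1]:
            best = (scheme, score)
    return best

# Hypothetical scores for three predefined schemes at some fixed bit budget.
scores = {"coarse": 0.9, "medium": 0.4, "fine": 0.6}
print(pick_scheme(["coarse", "medium", "fine"], scores.get))
print(pick_scheme(["coarse", "medium", "fine"], scores.get, stop_threshold=1.0))
```

The early-stop variant trades optimality for fewer evaluations, which matters when each evaluation involves synthesizing the frame and running a perceptual model.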
The quantization scheme may comprise quantization parameters for quantization of frequency, amplitude, and phase of the sinusoidal components. Quantization of each of frequency, amplitude and phase may be adjustable independent of each other, or they may be locked together in predefined sets of quantization parameters with finer and coarser quantization.
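One possible way to represent such a scheme in code is sketched below in Python. The class name, the units, the step values and the use of uniform steps for all three parameters are illustrative assumptions, not values from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuantizationScheme:
    """Step sizes for the three sinusoidal parameters (all values hypothetical)."""
    freq_step: float   # Hz
    amp_step: float    # dB, i.e. a logarithmic amplitude scale
    phase_step: float  # radians

# Parameters "locked together" in predefined sets with finer and coarser quantization.
PRESETS = {
    "fine": QuantizationScheme(freq_step=1.0, amp_step=0.5, phase_step=0.1),
    "coarse": QuantizationScheme(freq_step=8.0, amp_step=2.0, phase_step=0.4),
}

def quantize_sinusoid(freq, amp_db, phase, scheme):
    """Quantize one sinusoid's parameters to the nearest level of the scheme."""
    return (
        round(freq / scheme.freq_step) * scheme.freq_step,
        round(amp_db / scheme.amp_step) * scheme.amp_step,
        round(phase / scheme.phase_step) * scheme.phase_step,
    )

print(quantize_sinusoid(441.3, -12.3, 1.1, PRESETS["coarse"]))
# Independently adjustable parameters: any combination of steps can be built.
custom = QuantizationScheme(freq_step=1.0, amp_step=2.0, phase_step=0.4)
print(quantize_sinusoid(441.3, -12.3, 1.1, custom))
```

Signalling a preset name costs fewer bits than signalling three independent step sizes, which is one reason locked sets may be attractive.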
Preferably, the predetermined encoding efficiency criterion comprises a perceptual distortion measure. By 'distortion' is understood a difference between the audio signal itself and the encoded audio signal, generated by encoding the audio signal according to the encoding template. By 'perceptual distortion measure' is understood a measure of distortion relevant with respect to what is perceived by the human auditory system, i.e. a measure of distortion that reflects a perceived sound quality. Preferably, the perceptual distortion measure is based on a perceptual model, such as a representation of the human auditory system. The optimizing means may be adapted to optimize, for each frame of the audio signal, a distortion measure for a predetermined bit rate used to represent the encoded audio signal. In this way the optimizing means is able to generate an encoding efficiency measure in terms of a bit rate versus distortion. More preferably, a perceptual distortion for a given bit rate is used as encoding efficiency measure. The optimizing means may be adapted to select the sinusoidal components in order of their perceptual relevance. The sinusoidal encoder may be adapted to rank the sinusoidal components in order of their perceptual relevance, such as by using a perceptual model. This will help the optimizing means in selecting a proper number of the sinusoidal components, since starting from the perceptually most relevant sinusoidal component will tend to make the optimizing procedure converge faster. A stop criterion e.g. based on a predetermined maximum allowable perceptual distortion or a maximum allowable bit rate may be chosen.
According to a second aspect the invention provides a method of encoding an audio signal comprising, for each frame of the audio signal, the steps of
(1) generating a set of sinusoidal components in response to the audio signal,
(2) selecting a number of sinusoidal components from the set of sinusoidal components,
(3) selecting a quantization scheme for quantization of the selected sinusoidal components,
(4) repeating steps (2) and (3) until a predetermined encoding efficiency criterion is fulfilled, and
(5) generating an encoded audio signal comprising the selected set of sinusoidal components quantized according to the selected quantization scheme.
The same explanations as for the first aspect apply to the second aspect. The same embodiments and/or variants as mentioned for the first aspect also apply to the second aspect.
In a third aspect the invention provides a device comprising an audio encoder according to the first aspect. The device may be an audio device, however other devices may profit from the advantages of the audio encoder according to the invention.
In a fourth aspect the invention provides a computer readable program code adapted to encode an audio signal according to the method of the second aspect.
The computer readable program code according to the fourth aspect may comprise software algorithms adapted for a signal processor, personal computers etc. and it may be present on a carriable medium such as a disk or memory card or memory stick, or it may be present in a ROM chip or in other way stored in a device.
In the following, the invention is described in more detail with reference to the accompanying Fig. 1, which shows a block diagram illustrating the principles of a preferred encoder embodiment.
While the invention is susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Fig. 1 illustrates a block diagram of a preferred encoder according to the invention. An audio signal IN is applied to a parametric encoder, i.e. a sinusoidal encoder SE. For each frame or time segment of the audio signal IN, the sinusoidal encoder SE generates in response a set of sinusoidal components SC that are applied to optimizing means OM. The optimizing means OM selects a number of sinusoidal components SS from the set of sinusoidal components SC. In addition the optimizing means OM selects a quantization scheme QS comprising quantization parameters for quantization of the selected sinusoidal components SS. After the optimizing means OM has evaluated encoding efficiency for different numbers of sinusoidal components SS and different quantization schemes QS and found an optimal balance, the selected sinusoidal components SS and the selected quantization scheme QS are provided to a bit stream generator BG that generates an encoded output signal OUT. It is to be understood, though, that the bit stream generator BG is not essential to the inventive idea.
Preferably, the optimizing means OM optimizes an encoding efficiency criterion based on a perceptual model PM, e.g. so as to evaluate encoding efficiency by using a perceptual distortion measure. Hereby, the optimizing means OM may iteratively optimize a perceptual distortion measure for a given target bit rate.
In the following, preferred embodiments are described in more details.
In a first embodiment, an input signal to be coded is divided into a number of frames, with the sinusoidal modeling technique being applied to each frame. Frames may be windowed with, for example, a Hanning window, a special window that avoids pre-echo effects, or any other kind of window. If the signal within a frame is denoted X, sinusoidal modeling of X involves finding a set of sinusoidal signals, parameterized by, for example, amplitude, frequency and phase, that represents the portion of the input signal contained in that frame. For the extraction of the sinusoids, a method that can determine the perceptual relevance of the extracted components is preferred. One example of such a method is PAMP, as described in published patent application WO 0237476.
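The framing and windowing step can be illustrated with a minimal Python sketch. The frame length, the hop size and the function names `hann_window` and `split_into_frames` are assumptions made for this illustration only; the text does not prescribe any particular values or window.

```python
import math


def hann_window(length):
    """Return a symmetric Hann (Hanning) window of the given length."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def split_into_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames and window each one.

    frame_len and hop are illustrative parameters, not values from the text.
    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


# Example: a 1 kHz tone sampled at 8 kHz, 32 samples long.
signal = [math.sin(2.0 * math.pi * 1000.0 * n / 8000.0) for n in range(32)]
frames = split_into_frames(signal, frame_len=16, hop=8)
```

Each element of `frames` plays the role of the signal X on which the sinusoidal extraction would operate.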
After sinusoidal extraction, the resulting set of N non-quantized sinusoids G_N = {S_1, S_2, ..., S_N} is ranked in order of relevance, most important components first. R denotes the target bit rate, i.e. the number of bits that can be spent in that frame. N subsets G_K = {S_1, S_2, ..., S_K}, 1 ≤ K ≤ N, are then defined. For each G_K, given the target bit rate R, the best quantization is searched for. This can be done with high rate quantization, which results in the set of quantized sinusoids G_K' and the quantization scheme Q_K. Synthesis of (G_K', Q_K) provides the synthetic signal X_K'. To determine the difference between X and X_K', a perceptual distortion measure D'_K can be used, which describes how audible the modeling error is. A preferred example of such a perceptual distortion measure is found in S. van de Par, A. Kohlrausch, G. Charestan, and R. Heusdens, "A new psychoacoustical masking model for audio coding applications", in IEEE Int. Conf. Acoust., Speech and Signal Process., pages 1805-1808, Orlando, USA, 2002. The optimal solution then becomes (G_m', Q_m), with
$$ m = \arg\min_{1 \le K \le N} D'_K \qquad (1) $$
where m is the optimal number of components, G_m' is the optimal component set and Q_m is the optimal quantization scheme for the given target bit rate R.
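The search of equation (1) can be sketched in Python as follows. This is a hedged illustration, not the method of the text: the uniform quantizer with an equal bit split is an assumed stand-in for high rate quantization, plain squared error is an assumed stand-in for the perceptual distortion measure, and all function names and parameter ranges are hypothetical.

```python
import math


def synthesize(sinusoids, n_samples):
    """Sum of sinusoids; each is (amplitude, frequency in cycles/sample, phase)."""
    return [sum(a * math.cos(2.0 * math.pi * f * n + p) for a, f, p in sinusoids)
            for n in range(n_samples)]


def quantize_uniform(value, lo, hi, bits):
    """Uniform quantizer on [lo, hi] with 2**bits cells, midpoint reconstruction."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    index = min(levels - 1, max(0, int((value - lo) / step)))
    return lo + (index + 0.5) * step


def quantize_set(sinusoids, budget_bits):
    """Assumed scheme Q_K: share the frame budget equally over all parameters."""
    bits = budget_bits // (3 * len(sinusoids))
    if bits < 1:
        return None  # budget too small for this many components
    quantized = [(quantize_uniform(a, 0.0, 1.0, bits),
                  quantize_uniform(f, 0.0, 0.5, bits),
                  quantize_uniform(p, -math.pi, math.pi, bits))
                 for a, f, p in sinusoids]
    return quantized, bits


def distortion(x, y):
    """Plain squared error, standing in for a perceptual distortion measure."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


def best_component_count(frame, ranked, budget_bits):
    """Return (m, G_m', bits per parameter, D'_m) minimising D'_K over 1 <= K <= N."""
    best = None
    for k in range(1, len(ranked) + 1):
        result = quantize_set(ranked[:k], budget_bits)
        if result is None:
            continue
        quantized, bits = result
        d = distortion(frame, synthesize(quantized, len(frame)))
        if best is None or d < best[3]:
            best = (k, quantized, bits, d)
    return best


# Example: three sinusoids, already ranked by relevance, and a 36-bit frame budget.
ranked = [(0.8, 0.05, 0.3), (0.3, 0.12, -1.0), (0.05, 0.31, 2.0)]
frame = synthesize(ranked, 64)
best = best_component_count(frame, ranked, budget_bits=36)
```

The loop makes the trade-off explicit: a larger K models more of the signal, but leaves fewer bits per parameter, so the quantization error of every component grows.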
In a second embodiment, any sinusoidal extraction method can be used. In that case the sinusoids G_N = {S_1, S_2, ..., S_N} are not ranked in order of their perceptual relevance. A complete search through all possible combinations of sinusoids is then performed, which amounts to 2^N − 1 combinations instead of the N subsets of the preferred embodiment.
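A complete search over all non-empty subsets can be sketched as below. The cost function is a toy assumption chosen only to make the example runnable: it adds the energy of the dropped components to a fixed per-component penalty that loosely stands in for the bits each extra component consumes. The names `all_nonempty_subsets`, `exhaustive_search` and `toy_cost` are hypothetical.

```python
from itertools import combinations


def all_nonempty_subsets(components):
    """Yield every non-empty subset of the components as a tuple (2**N - 1 in total)."""
    for size in range(1, len(components) + 1):
        yield from combinations(components, size)


def exhaustive_search(components, cost):
    """Return the subset with the lowest cost; cost is a caller-supplied function."""
    return min(all_nonempty_subsets(components), key=cost)


# Toy data: unranked amplitudes and an assumed penalty per selected component.
amplitudes = [0.9, 0.1, 0.5, 0.05]
PENALTY = 0.05


def toy_cost(subset):
    """Energy of the dropped components plus a penalty per kept component."""
    dropped = sum(a * a for i, a in enumerate(amplitudes) if i not in subset)
    return dropped + PENALTY * len(subset)


best = exhaustive_search(range(len(amplitudes)), toy_cost)
```

With these toy numbers the search keeps the components at indices 0 and 2. The number of candidates doubles with every additional sinusoid, which is why a ranking by perceptual relevance is attractive when it is available.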
In a third embodiment, traditional fixed quantizers can be used instead of, or in addition to, high rate quantization. For some values of K, a search among some fixed quantization schemes Q_K^L is then performed. Only a limited number of allowed quantization schemes can then be chosen, but as the amount of side information required for fixed quantizers is very low, this may still result in the optimal solution. Each (G_K, Q_K^L) results in a bit rate R_K^L and a perceptual distortion measure D_K^L. The optimal solution is then defined by
$$ \arg\min_{L,K} D_K^L $$
possibly subject to R_K^L ≤ R.
In a fourth embodiment, any quantization method can be used instead of, or in addition to, high rate quantization. For some values of K, a search among some possible quantization schemes Q_K^L is then performed. Each (G_K, Q_K^L) results in a bit rate R_K^L and a perceptual distortion measure D_K^L. The optimal solution is then defined by
$$ \arg\min_{L,K} D_K^L $$
possibly subject to R_K^L ≤ R.
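The selection over pairs (K, L) used in the third and fourth embodiments can be sketched as a constrained minimum. The rates and distortions in the table below are made-up illustrative numbers, and `select_scheme` is a hypothetical helper; how each R_K^L and D_K^L is actually computed is left open here.

```python
def select_scheme(candidates, target_rate=None):
    """Pick the (K, L) pair with the lowest distortion.

    candidates maps (K, L) -> (rate in bits, distortion).
    If target_rate is given, only pairs with rate <= target_rate are considered.
    Returns None when no candidate satisfies the rate constraint.
    """
    feasible = {key: rd for key, rd in candidates.items()
                if target_rate is None or rd[0] <= target_rate}
    if not feasible:
        return None
    return min(feasible, key=lambda key: feasible[key][1])


# Illustrative values only: K components, quantization scheme index L.
candidates = {
    (2, 1): (40, 0.90),
    (2, 2): (64, 0.70),
    (4, 1): (80, 0.45),
    (4, 2): (128, 0.20),
    (6, 1): (120, 0.30),
}

constrained = select_scheme(candidates, target_rate=100)
unconstrained = select_scheme(candidates)
```

With a budget of 100 bits the sketch selects (4, 1); without the constraint it selects (4, 2), which has the lowest distortion overall but exceeds the budget.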
In a fifth embodiment, the invention is applied to an audio encoder comprising a plurality of sub-encoders, for example sinusoidal, waveform, and noise sub-encoders, arranged either in parallel or in cascade. In this case, for all aforementioned embodiments, it may be beneficial to calculate the distortion D'_K at the output of the complete encoder.
As will be understood, the encoding principles according to the invention may be applied within a large range of applications, such as solid state audio devices, audio players/recorders, mobile communication devices, IP telephony, and multimedia streaming of audio, e.g. over the internet.
In the claims, reference signs to the Figures are included for reasons of clarity only. These references to exemplary embodiments in the Figures should not in any way be construed as limiting the scope of the claims.

Claims

1. An audio encoder adapted to encode an audio signal (IN), the audio encoder comprising
- a sinusoidal type encoder (SE) adapted to generate, for each frame of the audio signal (IN), a set of sinusoidal components (SC), and
- optimizing means (OM) adapted to optimize a predetermined encoding efficiency criterion by selecting, for each frame of the audio signal (IN), a number of sinusoidal components from the set of sinusoidal components (SC), and a quantization scheme (QS) for quantization of the selected sinusoidal components (SS), and generate an encoded audio signal (OUT) comprising the selected set of sinusoidal components (SS) quantized according to the selected quantization scheme (QS).
2. An audio encoder according to claim 1, wherein the optimizing means (OM) is adapted to select separate quantization schemes (QS) for each of the selected sinusoidal components (SS).
3. An audio encoder according to claim 1, wherein the optimizing means (OM) is adapted to select the quantization scheme (QS) from a predetermined set of quantization schemes.
4. An audio encoder according to claim 1, wherein the quantization scheme (QS) comprises quantization parameters for quantization of frequency, amplitude, and phase of the sinusoidal components (SS).
5. An audio encoder according to claim 1, wherein the predetermined encoding efficiency criterion comprises a perceptual distortion measure.
6. An audio encoder according to claim 1, wherein the optimizing means (OM) is adapted to optimize, for each frame of the audio signal, a distortion measure for a predetermined bit rate used to represent the encoded audio signal (OUT).
7. An audio encoder according to claim 1, wherein the optimizing means (OM) is adapted to select the sinusoidal components (SC) in order of their perceptual relevance.
8. A method of encoding an audio signal (IN) comprising, for each frame of the audio signal, the steps of
(1) generating a set of sinusoidal components (SC) in response to the audio signal (IN),
(2) selecting a number of sinusoidal components (SS) from the set of sinusoidal components (SC),
(3) selecting a quantization scheme (QS) for quantization of the selected sinusoidal components (SS),
(4) repeating steps (2) and (3) until a predetermined encoding efficiency criterion is fulfilled, and
(5) generating an encoded audio signal (OUT) comprising the selected set of sinusoidal components (SS) quantized according to the selected quantization scheme (QS).
9. A method according to claim 8, wherein step (3) comprises selecting a separate quantization scheme (QS) for each of the selected sinusoidal components (SS).
10. A device comprising an audio encoder according to claim 1.
11. Computer readable program code adapted to encode an audio signal (IN) according to the method of claim 8.
PCT/IB2005/053977 2004-12-03 2005-11-30 Parametric audio coding comprising balanced quantization scheme WO2006059288A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP04106281.1 2004-12-03
EP04106281 2004-12-03

Publications (1)

Publication Number Publication Date
WO2006059288A1 true WO2006059288A1 (en) 2006-06-08

Family

ID=36087545

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2005/053977 WO2006059288A1 (en) 2004-12-03 2005-11-30 Parametric audio coding comprising balanced quantization scheme

Country Status (1)

Country Link
WO (1) WO2006059288A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103811011A (en) * 2012-11-02 2014-05-21 富士通株式会社 Audio sine wave detection method and device
CN104347082A (en) * 2013-07-24 2015-02-11 富士通株式会社 Tone frame detection method, tone frame detection apparatus, audio encoding method and audio encoding apparatus

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0713295A1 (en) * 1994-04-01 1996-05-22 Sony Corporation Method and device for encoding information, method and device for decoding information, information transmitting method, and information recording medium
US6298322B1 (en) * 1999-05-06 2001-10-02 Eric Lindemann Encoding and synthesis of tonal audio signals using dominant sinusoids and a vector-quantized residual tonal signal

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HEUSDENS R ET AL: "Jointly Optimal Time Segmentation, Component Selection and Quantization for Sinusoidal Coding of Audio and Speech", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2005. PROCEEDINGS. (ICASSP '05). IEEE INTERNATIONAL CONFERENCE ON PHILADELPHIA, PENNSYLVANIA, USA MARCH 18-23, 2005, PISCATAWAY, NJ, USA,IEEE, 18 March 2005 (2005-03-18), pages 193 - 196, XP010792362, ISBN: 0-7803-8874-7 *
LEVINE S N ET AL: "Improvements to the switched parametric and transform audio coder", APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 1999 IEEE WORKSHOP ON NEW PALTZ, NY, USA 17-20 OCT. 1999, PISCATAWAY, NJ, USA,IEEE, US, 17 October 1999 (1999-10-17), pages 43 - 46, XP010365091, ISBN: 0-7803-5612-8 *
P.E.L. KORTEN, J. JENSEN AND R. HEUSDENS: "HIGH RATE SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS", PROC. 12TH EUROPEAN SIGNAL PROCESSING CONFERENCE, September 2004 (2004-09-01), pages 1805 - 1808, XP002375730 *
VAFIN R ET AL: "Towards optimal quantization in multistage audio coding", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2004. PROCEEDINGS. (ICASSP '04). IEEE INTERNATIONAL CONFERENCE ON MONTREAL, QUEBEC, CANADA 17-21 MAY 2004, PISCATAWAY, NJ, USA,IEEE, vol. 4, 17 May 2004 (2004-05-17), pages 205 - 208, XP010718441, ISBN: 0-7803-8484-9 *

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KN KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05826691

Country of ref document: EP

Kind code of ref document: A1