WO2006056100A1 - Coding/decoding method and device utilizing intra-channel signal redundancy - Google Patents

Publication number
WO2006056100A1
Authority
WO
WIPO (PCT)
Prior art keywords
integer
transform
channel
coefficients
klt
Prior art date
Application number
PCT/CN2004/001349
Other languages
French (fr)
Chinese (zh)
Inventor
Xingde Pan
Lei Wang
Original Assignee
Beijing E-World Technology Co., Ltd
Priority date
Filing date
Publication date
Application filed by Beijing E-World Technology Co., Ltd
Priority to PCT/CN2004/001349 (WO2006056100A1)
Priority to CN200480044452.4A (CN101065796A)
Publication of WO2006056100A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • The present invention relates to the field of audio codec technology, and in particular to a method and apparatus for encoding and decoding using inter-channel redundancy.

Background art

  • To obtain a high-fidelity digital audio signal, the digital audio signal must be audio-encoded or audio-compressed for storage and transmission.
  • The purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, that is, with almost no perceptible difference between the original input audio signal and the audio signal output after encoding.
  • In the early 1980s, the advent of the CD demonstrated the many advantages of representing audio signals digitally, such as high fidelity, large dynamic range, and robustness.
  • However, these advantages come at the expense of high data rates.
  • For example, digitizing a CD-quality stereo signal requires a sampling rate of 44.1 kHz with each sample uniformly quantized to 16 bits, so the uncompressed data rate reaches 1.41 Mb/s. Such a high data rate makes the transmission and storage of data very inconvenient, especially in multimedia and wireless transmission applications, where bandwidth and cost are limiting factors.
  • New network and wireless multimedia digital audio systems are therefore required to reduce the data rate without compromising the quality of the audio.
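The uncompressed rate quoted for CD audio follows directly from the stated parameters. The short check below is only an arithmetic illustration; the variable names are chosen here, and the channel count of two is assumed from the word "stereo".

```python
# CD-quality PCM parameters as stated in the text
sample_rate_hz = 44_100   # 44.1 kHz sampling rate
bits_per_sample = 16      # uniform 16-bit quantization
channels = 2              # assumed: stereo means left + right

# Uncompressed data rate in bits per second
bit_rate = sample_rate_hz * bits_per_sample * channels
print(f"{bit_rate} bit/s = {bit_rate / 1e6:.2f} Mb/s")
```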
  • FIG. 1 shows a block diagram of an MPEG-2 AAC encoder. The encoder includes a gain controller 101, a modified discrete cosine transform (MDCT) module 102, a time domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum/difference stereo module 106, a bit allocation and quantization coding module 107, and a bitstream multiplexing module 108. The bit allocation and quantization coding module 107 further includes a compression ratio/distortion processing controller, a scale factor module, a non-uniform quantizer, and an entropy coding module.
  • After passing through the gain controller 101, the audio signal enters the modified discrete cosine transform module 102, which performs a time-frequency transform adapted to the signal. The spectral coefficients output by the modified discrete cosine transform module 102 are then processed by the time domain noise shaping module 103. Time domain noise shaping performs linear prediction analysis on the spectral coefficients in the frequency domain and then uses that analysis to control the shape of the quantization noise in the time domain, thereby controlling pre-echo.
  • The intensity/coupling module 104 performs stereo encoding of signal intensity. For signals in the high frequency band (above 2 kHz), the perceived direction of a sound depends on the change in signal intensity (the signal envelope) and not on the waveform of the signal; that is, a constant-envelope signal has no effect on the perceived direction. This property, together with the correlation between channels, can be used to combine several channels into one common channel for encoding.
  • The second-order backward adaptive predictor 105 is used to eliminate redundancy in steady-state signals and improve coding efficiency.
  • The sum/difference stereo (M/S) module 106 operates on channel pairs. A channel pair is two channels such as the left and right channels of a two-channel signal, or the left and right surround channels of a multi-channel signal.
  • The M/S module 106 exploits the correlation between the two channels of a channel pair to reduce the bit rate and improve coding efficiency.
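A minimal sketch of sum/difference (M/S) processing on one channel pair is given here for illustration. The function names and the scaling by one half are assumptions made for this example, not details taken from the text. It shows that, for strongly correlated channels, the difference signal is mostly zero and therefore cheap to encode.

```python
def ms_encode(left, right):
    """Map a left/right pair to mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Two highly correlated channels: only one sample differs
left = [0.50, 0.25, -0.75, 1.00]
right = [0.50, 0.25, -0.50, 1.00]
mid, side = ms_encode(left, right)
print(side)  # the side signal is zero wherever the channels agree
```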
  • The bit allocation and quantization coding module 107 is implemented as a nested loop process in which the non-uniform quantizer performs lossy coding and the entropy coding module performs lossless coding, which removes redundancy and reduces correlation.
  • The nested loops consist of an inner loop and an outer loop. The inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, and the outer loop uses the ratio of quantization noise to the masking threshold to estimate the coding quality of the signal.
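The inner loop can be pictured with the following simplified sketch. It assumes a uniform quantizer and a crude bit-count estimate in place of the non-uniform quantizer and entropy coder of MPEG-2 AAC; the names `quantize`, `bits_needed`, and `inner_loop` and the growth factor are hypothetical.

```python
def quantize(coeffs, step):
    """Uniform quantization (a simplification; AAC uses a non-uniform quantizer)."""
    return [round(c / step) for c in coeffs]


def bits_needed(quantized):
    """Crude cost model: one sign bit plus the magnitude bit length per value."""
    return sum(1 + abs(v).bit_length() for v in quantized)


def inner_loop(coeffs, bit_budget, step=1.0, growth=1.25):
    """Increase the step size until the quantized band fits the bit budget."""
    if bit_budget < len(coeffs):
        raise ValueError("budget is below the minimum cost of this model")
    while True:
        quantized = quantize(coeffs, step)
        if bits_needed(quantized) <= bit_budget:
            return step, quantized
        step *= growth  # coarser quantization, fewer bits, more noise


step, quantized = inner_loop([10.0, -7.5, 3.2, 0.4], bit_budget=12)
```

An outer loop would then compare the resulting quantization noise with the masking threshold and adjust per-band scale factors; that part is omitted here.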
  • Finally, the encoded signal is output as an encoded audio stream through the bitstream multiplexing module 108.
  • When the sampling frequency is scalable, the input signal first passes through a four-band polyphase quadrature filter (PQF) bank to generate four equal-bandwidth bands, and each band uses the MDCT to generate 256 spectral coefficients, for a total of 1024.
  • A gain controller 101 is used in each frequency band.
  • The high frequency PQF bands can be ignored to obtain a low sampling rate signal.
  • FIG. 2 shows a block diagram of the corresponding MPEG-2 AAC decoder.
  • The decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time domain noise shaping module 208, an inverse modified discrete cosine transform (IMDCT) module 209, and a gain control module 210.
  • The encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain the corresponding data stream and control stream.
  • The inverse quantizer 203 is a set of non-uniform quantizers implemented by a companding function, which converts integer quantized values into the reconstructed spectrum. Because the scale factor module in the encoder takes the difference between the current scale factor and the previous scale factor and then Huffman-codes that difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the difference values and then recovers the true scale factors.
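The differential treatment of scale factors can be sketched as follows. The Huffman stage is left out, and the way the first value is transmitted is an assumption of this example (here it is simply passed alongside the differences); the function names are hypothetical.

```python
from itertools import accumulate


def diff_encode(scale_factors):
    """Encoder side: keep the first value and the successive differences."""
    first = scale_factors[0]
    deltas = [b - a for a, b in zip(scale_factors, scale_factors[1:])]
    return first, deltas


def diff_decode(first, deltas):
    """Decoder side: a running sum of the differences restores the scale factors."""
    return list(accumulate(deltas, initial=first))


first, deltas = diff_encode([60, 62, 61, 61, 58])
restored = diff_decode(first, deltas)
```

Because neighbouring scale factors tend to be close, the differences cluster around zero, which is what makes them suitable for a short Huffman code.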
  • The M/S module 205 converts the sum/difference channels into left and right channels under the control of side information.
  • Predictive decoding is performed by the prediction module 206 in the decoder.
  • The intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and passes the result to the time domain noise shaping module 208 for time domain noise shaping decoding. Finally, the inverse modified discrete cosine transform module 209 performs the frequency-to-time conversion.
  • When the sampling frequency is scalable, the high frequency PQF bands can be ignored by the gain control module 210 to obtain a low sampling rate signal.
  • Similar to MPEG AAC, the Dolby AC-3 encoder also uses inter-channel intensity combining to improve multi-channel signal coding efficiency.
  • The international patent application with international application number PCT/IB02/01595 proposes that, when encoding an audio signal of more than one channel, an integer discrete cosine transform (INT DCT) be applied to the quantized coefficients of the plurality of channels to remove inter-channel redundancy.
  • This method addresses shortcomings of current multi-channel coding methods, but does not solve the problem of two-channel stereo coding efficiency.
  • The integer discrete cosine transform employed in that patent application is not an optimal solution for removing redundancy between quantized coefficients (considering the time variation of the source). At the same time, the method inevitably increases the computational complexity of encoding and decoding.

Summary of the invention

  • The object of the present invention is to overcome the shortcomings of the prior art by providing a method and apparatus for encoding and decoding using inter-channel redundancy, so as to solve the low efficiency and poor quality of stereo coding and decoding in existing stereo and multi-channel audio codecs.
  • The present invention provides a method for encoding using inter-channel redundancy, including the following steps:
  • Step 1: Transform the linear PCM (Pulse Code Modulation) signal into the frequency domain, and calculate the masking threshold of each scale factor band;
  • Step 2: Quantize the frequency domain coefficients of each scale factor band according to its masking threshold to obtain the integer coefficients of each channel;
  • Step 3: Organize the integer coefficients according to the principle of maximizing the coding gain to obtain channel pairs/groups for specific time-frequency regions;
  • Step 4: Perform matrix transformation on the quantized integer coefficients of the channel pairs/groups, and output the transformed channel pair/group integer coefficients through entropy coding and code stream multiplexing.
  • In Step 4, the matrix transformation of the quantized integer coefficients uses an optimal transform mode: among a determined number of integer transforms, KLT transforms, and approximate KLT transforms, the transform with the largest coding gain is selected for encoding the quantized integer coefficients of the determined region.
  • The present invention also provides an apparatus for encoding using inter-channel redundancy, comprising a psychoacoustic module, a modified discrete cosine transform module, a quantizer, an entropy coding and code stream multiplexing module, and a matrix transformation module. The matrix transformation module is configured to organize the integer coefficients of each channel output from the quantizer according to the principle of maximizing the coding gain so as to obtain channel pairs/groups for specific time-frequency regions, to perform matrix transformation on the quantized integer coefficients of the channel pairs/groups, and to output the transformed channel pair/group integer coefficients to the entropy coding and code stream multiplexing module. The psychoacoustic module is configured to calculate the masking curve of the current frame signal according to the auditory characteristics of the human ear and to calculate the masking threshold of each specific time-frequency region from the masking curve, for guiding quantization of the current frame signal. The modified discrete cosine transform module is configured to transform the linear PCM (Pulse Code Modulation) signal into the frequency domain.
  • The present invention further provides a method for decoding using inter-channel redundancy, comprising the following steps:
  • Step 1: Perform inverse matrix transformation on the integer coefficients obtained by code stream demultiplexing and entropy decoding to obtain integer quantized coefficients;
  • Step 2: Perform inverse quantization on the integer quantized coefficients to recover the frequency domain coefficients;
  • Step 3: Perform an inverse modified discrete cosine transform on the frequency domain coefficients to obtain a linear PCM signal.
  • The inverse matrix transformation in Step 1 uses the optimal transform mode, which is indicated in the side information and is one of a certain number of integer transform modes, KLT transform modes, and KLT approximate transform modes.
  • The present invention also provides an apparatus for decoding using inter-channel redundancy, which includes a code stream demultiplexing and entropy decoding module, an inverse quantizer, an inverse modified discrete cosine transform module, and an inverse matrix transform module, where:
  • the inverse matrix transform module is configured to perform inverse matrix transformation on the integer coefficients output from the code stream demultiplexing and entropy decoding module to obtain integer quantized coefficients;
  • the code stream demultiplexing and entropy decoding module is configured to demultiplex and entropy decode the input compressed bit stream to obtain integer coefficients;
  • the inverse quantizer is configured to inverse quantize the integer quantized coefficients output from the inverse matrix transform module to recover the frequency domain coefficients;
  • the inverse modified discrete cosine transform module is configured to perform an inverse modified discrete cosine transform on the frequency domain coefficients output from the inverse quantizer to obtain a linear PCM signal.
  • The invention uses an optimal transform method in encoding and decoding, so that lossless redundancy removal can be performed on the quantized multi-channel coefficients. It can be used for lossless two-channel and multi-channel coding (Lossless Stereo and Multichannel Audio Coding).
  • For lossy coding, the invention is applied to the spectral coefficients (including transform coefficients and filtered sub-band signals) obtained after transformation (such as MDCT, QMF sub-band filtering, and wavelet transform), frequency domain processing (such as predictive coding, noise shaping, and sum/difference stereo coding), and quantization.
  • The present invention can also be used to remove statistical redundancy between channel signals (such as time domain PCM samples, sub-band samples, and frequency domain coefficients).
  • Stereo coding efficiency and quality are thereby improved for any stereo or multi-channel audio codec.
  • FIG. 1 is a schematic block diagram of an MPEG-2 AAC encoder in the prior art.
  • FIG. 2 is a schematic block diagram of an MPEG-2 AAC decoder in the prior art.
  • FIG. 3 is a schematic block diagram of an encoder of the present invention.
  • FIG. 4 is a schematic block diagram of a decoder of the present invention.

Detailed description

  • A method of encoding using inter-channel redundancy includes the following steps:
  • Step 1: Transform the linear PCM signal into the frequency domain, and calculate the masking threshold of each scale factor band;
  • Step 2: Quantize the frequency domain coefficients of each scale factor band according to its masking threshold to obtain the integer coefficients of each channel;
  • Step 3: Organize the integer coefficients according to the principle of maximizing the coding gain to obtain channel pairs/groups for specific time-frequency regions;
  • Step 4: Perform matrix transformation on the quantized integer coefficients of the channel pairs/groups, and output the transformed channel pair/group integer coefficients through entropy coding and code stream multiplexing.
  • Whether the coding is lossy or lossless, the channel coefficients processed by the present invention are all integers and are treated in much the same way. (For convenience of description, the time domain samples, sub-band samples, and frequency domain coefficients processed are collectively referred to below as "coefficients".) Therefore, the following description does not distinguish between "lossy coding" and "lossless coding".
  • A block diagram of a device for encoding using inter-channel redundancy is shown in FIG. 3.
  • The linear PCM signals are input to a modified discrete cosine transform module 301 and a psychoacoustic model 305, respectively, and the modified discrete cosine transform module 301 converts the PCM signals into the frequency domain.
  • The window function and block length of the modified discrete cosine transform can be switched according to signal characteristics to ensure sufficient time-frequency resolution and to remove intra-channel time domain redundancy effectively.
  • The psychoacoustic model 305 calculates a masking curve of the current frame signal according to the auditory characteristics of the human ear. The masking threshold of each specific time-frequency region can be calculated from the masking curve and is used to guide the quantization of the current frame signal.
  • The frequency domain coefficients obtained by the modified discrete cosine transform module 301 are sent to the quantizer 302.
  • The quantizer is composed of a set of sub-quantizers. Each sub-quantizer quantizes the frequency domain coefficients of a specific time-frequency region, usually referred to as a scale factor band, according to the masking threshold of that region.
  • The quantizer has a bit allocation mechanism that controls the number of bits each sub-quantizer can use, so that the number of bits spent on quantizing the frequency domain coefficients of the current frame does not exceed the allowed limit while quantization distortion is minimized.
  • The bit allocation strategy can be any commonly used strategy, such as the rate control method of MPEG AAC.
  • The quantizer can be a scalar quantizer or a vector quantizer, such as the scalar quantizer of MPEG AAC or the vector quantizer of MPEG TwinVQ.
  • The integer coefficients obtained by quantization are sent to the matrix transformation module 303.
  • The matrix transformation module 303 organizes the quantized integer coefficients of the respective channels according to the principle of maximizing the coding gain to obtain channel pairs/groups for specific time-frequency regions. The channel pairs/groups may differ between time-frequency regions (time segments for time domain samples, frequency segments for frequency domain coefficients, and time-frequency regions for sub-band samples).
  • Typically, the correlation between the left channel (L) and the right channel (R) is high, as is the correlation between the left surround channel (LS) and the right surround channel (RS), so the L/R pair and the LS/RS pair are often obtained.
  • The channel pair organization information needs to be encoded as control information.
  • When channels are organized into groups, the following channel groups often appear: left channel/right channel/center channel; left front channel/right front channel/left center channel/right center channel/center channel; left surround/right surround/back surround; and so on.
  • The so-called optimal transform means that, from a determined number of integer transforms, KLT transforms, and transforms used to approximate the KLT, the one with the maximum coding gain is selected.
  • The LIFTING algorithm is used so that the transform maps integer coefficients to integer coefficients.
  • The so-called maximum coding gain means that the fewest bits are used when encoding a specific signal at a specific quality.
  • An integer transform refers to a transform whose matrix $A$ has only integer coefficients and for which there exists an inverse matrix $A^{-1}$ (each coefficient of which is also an integer) such that $A A^{-1} = I$, where $I$ is the identity matrix.
  • Let $L$ and $R$ denote the integer coefficients of the two channels of a channel pair ($L$ and $R$ here stand for any channels that may appear in the encoding and should not be interpreted as merely the "left channel" and "right channel"). $L$ and $R$ are the quantized integer coefficients, and $L'$ and $R'$ are the integer coefficients obtained after the integer transform. For each channel pair, the integer transform is applied to the channel pair integer coefficients within a certain resolution scale (for example, a so-called "scale factor band"), $[L', R']^T = A\,[L, R]^T$, such that the number of bits used for encoding $L'$ and $R'$ is less than the number of bits used for encoding $L$ and $R$.
  • The KLT (Karhunen-Loève transform) is a signal-adaptive transform whose matrix has as its row vectors the eigenvectors of the covariance matrix of the multi-channel coefficients. Since the KLT transformation matrix is an orthogonal matrix, it can be decomposed into GIVENS rotation matrices and approximated by the LIFTING algorithm, so that an integer result can be obtained.
  • The covariance matrix of the signal is calculated from the time domain signal.
  • The calculation of the covariance matrix and of the orthogonal matrix Q is covered in signal processing and linear algebra textbooks, such as "Digital Signal Processing: Theory, Algorithm and Implementation", edited by Hu Guangshu, Tsinghua University Press, 1997.
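For a single channel pair, the covariance matrix is 2×2 and its eigenvectors can be described by one rotation angle. The sketch below uses the standard closed-form principal-axis angle; this is a textbook result and is not claimed to be identical to equations (4) and (5) of this document, which are not reproduced in the text. The function names and sample data are hypothetical.

```python
import math


def covariance_2ch(first, second):
    """Return (c11, c22, c12) of the 2x2 covariance matrix of two channels."""
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    c11 = sum((a - m1) ** 2 for a in first) / n
    c22 = sum((b - m2) ** 2 for b in second) / n
    c12 = sum((a - m1) * (b - m2) for a, b in zip(first, second)) / n
    return c11, c22, c12


def klt_angle(c11, c22, c12):
    """Rotation angle of the principal axis: tan(2*theta) = 2*c12 / (c11 - c22)."""
    return 0.5 * math.atan2(2.0 * c12, c11 - c22)


def klt_rotate(first, second, theta):
    """Apply the 2x2 KLT whose rows are the eigenvectors of the covariance matrix."""
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * a + s * b for a, b in zip(first, second)]
    y2 = [-s * a + c * b for a, b in zip(first, second)]
    return y1, y2


left = [4.0, -2.0, 6.0, -8.0, 2.0, -2.0]
right = [3.0, -1.0, 5.0, -7.0, 2.0, -2.0]
c_ll, c_rr, c_lr = covariance_2ch(left, right)
theta = klt_angle(c_ll, c_rr, c_lr)
y1, y2 = klt_rotate(left, right, theta)
v1, v2, v12 = covariance_2ch(y1, y2)  # v12 is close to zero after the KLT
```

After the rotation the two outputs are decorrelated and most of the energy sits in `y1`, which is the property the encoder exploits.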
  • To obtain integer results, the KLT needs to be approximated using the so-called LIFTING algorithm.
  • For the LIFTING algorithm, see related documents such as "Factoring Wavelet Transforms into Lifting Steps" (I. Daubechies, W. Sweldens, Tech. Rep., Bell Laboratories, Lucent Technologies, 1996).
  • For a channel pair, the orthogonal matrix Q is a GIVENS rotation matrix, so it can be decomposed into a sequence of lifting steps. According to the LIFTING algorithm, the coefficients can be rounded after each step without affecting the complete reversibility of the system.
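One common way to realize a GIVENS rotation with lifting steps is the three-shear factorization sketched below. The exact factorization used in this document is not recoverable from the text, so this should be read as an assumed, widely used variant: a rotation by θ equals a shear by p = (cos θ − 1)/sin θ, a shear by sin θ, and another shear by p. The function names are hypothetical.

```python
import math


def lifting_rotate(x, y, theta):
    """Integer-to-integer approximation of a rotation by theta (three lifting steps)."""
    s = math.sin(theta)
    if s == 0.0:
        raise ValueError("sin(theta) must be non-zero for this factorization")
    p = (math.cos(theta) - 1.0) / s
    x = x + round(p * y)   # first shear, rounded to an integer
    y = y + round(s * x)   # second shear
    x = x + round(p * y)   # third shear
    return x, y


def lifting_rotate_inverse(x, y, theta):
    """Undo the three lifting steps in reverse order with the same rounding."""
    s = math.sin(theta)
    if s == 0.0:
        raise ValueError("sin(theta) must be non-zero for this factorization")
    p = (math.cos(theta) - 1.0) / s
    x = x - round(p * y)
    y = y - round(s * x)
    x = x - round(p * y)
    return x, y


u, v = lifting_rotate(7, 5, 0.6)
restored = lifting_rotate_inverse(u, v, 0.6)  # equals the original pair (7, 5)
```

Each step changes only one coordinate by a rounded function of the other, so the inverse can subtract exactly the same rounded value. That is why rounding does not break reversibility.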
  • For a channel group, the KLT transformation matrix and the LIFTING algorithm are handled similarly to the channel pair case.
  • An approximate transformation of the KLT refers to a transformation used to approximate the KLT under certain premises (such as the statistical properties of the source and the computational complexity). Although the KLT is the optimal transform in the mean-square-error sense, its computational cost and side information are large. Therefore, other transforms can be used to approximate the KLT in order to reduce the computation and/or the side information, such as the DFT (discrete Fourier transform), DCT (discrete cosine transform), and DST (discrete sine transform).
  • The so-called optimal transformation means that, among a determined number of integer transformations, KLT transformations (LIFTING implementation), and KLT approximate transformations (LIFTING implementation), the transformation with the largest coding gain is selected for encoding the determined region.
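Selecting the transform with the largest coding gain amounts to picking the candidate that needs the fewest bits. The sketch below assumes an empirical zeroth-order entropy as a stand-in for the real entropy coder and uses only two candidate transforms; the code numbers 0 and 1 and all function names are hypothetical.

```python
import math
from collections import Counter


def estimated_bits(values):
    """Empirical zeroth-order entropy of the sequence, in bits (an assumed cost model)."""
    n = len(values)
    return -sum(c * math.log2(c / n) for c in Counter(values).values())


def identity(first, second):
    """Leave the channel pair unchanged."""
    return list(first), list(second)


def difference(first, second):
    """Keep the first channel; replace the second with first minus second."""
    return list(first), [a - b for a, b in zip(first, second)]


CANDIDATES = {0: identity, 1: difference}  # hypothetical code numbers


def choose_transform(first, second):
    """Return the code number of the candidate with the lowest estimated bit cost."""
    costs = {}
    for code, transform in CANDIDATES.items():
        t1, t2 = transform(first, second)
        costs[code] = estimated_bits(t1) + estimated_bits(t2)
    return min(costs, key=costs.get)


correlated = choose_transform([3, -1, 4, 0, -2, 5, 1, -3], [3, -1, 4, 0, -2, 5, 1, -3])
unrelated = choose_transform([1, 0, 1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0])
```

In a real encoder the decision would be made per scale factor band and would also account for the side information each mode requires.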
  • The matrix transformation module includes a determined number of integer transform units, KLT transform units, and KLT approximate transform units. Correspondingly, the matrix transformation mode is selected from a certain number of integer transform modes, KLT transform modes, and KLT approximate transform modes (such as DFT, DCT, DST, etc.). For example, M integer transform modes can be selected, each assigned a code number, where M is not less than
  • Each channel pair can be handled as follows to reduce the number of bits required for encoding.
  • Three transform modes are used, each with its own code number: two integer transform modes and one KLT transform mode, where the value of θ is as shown in equations (4) and (5).
  • When the first integer transform mode is adopted, the quantized integer coefficients of the channel pair are left unprocessed. When the second integer transform mode is adopted, the quantized integer coefficients of the first channel of the channel pair are unchanged, and the integer coefficients of the second channel obtained by the transform are the quantized integer coefficients of the original first channel minus the quantized integer coefficients of the second channel. When the KLT transform mode is adopted, the KLT is used to remove the redundancy between the channel coefficients, and in addition to the code of the transform mode, the value of θ must also be encoded.
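The integer transform that keeps the first channel and replaces the second by the difference can be written as a matrix. The block below is an illustrative derivation; the symbol $A_{\mathrm{diff}}$ is introduced here for convenience and is not the document's notation.

```latex
A_{\mathrm{diff}} =
\begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix},
\qquad
\begin{pmatrix} L' \\ R' \end{pmatrix}
= A_{\mathrm{diff}} \begin{pmatrix} L \\ R \end{pmatrix}
= \begin{pmatrix} L \\ L - R \end{pmatrix},
\qquad
A_{\mathrm{diff}} A_{\mathrm{diff}} =
\begin{pmatrix} 1 & 0 \\ 1 - 1 & 1 \end{pmatrix} = I .
```

The matrix is its own inverse, so it satisfies the definition of an integer transform. For example, $(L, R) = (7, 5)$ maps to $(L', R') = (7, 2)$, and applying the same matrix again gives $(7, 7 - 2) = (7, 5)$.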
  • A decision switch 306 may be used to select the optimal transformation mode from among the determined number of integer transform units, KLT transform units, and KLT approximate transform units in the matrix transformation module, and the code number of the selected optimal transformation mode is encoded as side information.
  • The matrix transformation type may be selected per scale factor band, and the code number of the selected matrix transformation is encoded.
  • the transformation mode A is adopted, that is, the channel internal coefficient is not changed.
  • O and O the integer transformation method is used. In other cases, the transformation method 4 is used.
  • The code of the selected transform mode is written into the compressed bit stream as side information so that the decoder can decode correctly.
  • The integer coefficients are sent to the entropy coding and code stream multiplexing module 304.
  • The statistical redundancy of the integer coefficients can be removed by efficient entropy coding. The entropy coding result is then multiplexed with the other control information into the compressed bit stream and output for transmission.
  • The entropy coding may use a coding method such as Huffman coding, run-length coding, or arithmetic coding.
  • The present invention also discloses a decoding method and apparatus using inter-channel redundancy. As shown in FIG. 4, the apparatus includes a code stream demultiplexing and entropy decoding module, an inverse matrix transform module, an inverse quantizer, and an inverse modified discrete cosine transform module. The method comprises the following steps:
  • Step 1: The compressed bit stream is demultiplexed and entropy decoded by the code stream demultiplexing and entropy decoding module to obtain the integer coefficients and the side information that determines which inverse matrix transform method is used;
  • Step 2: The integer coefficients are inverse matrix transformed by the inverse matrix transform module to obtain the integer quantized coefficients;
  • Step 3: The integer quantized coefficients obtained by the inverse matrix transform are inverse quantized by the inverse quantizer to recover the frequency domain coefficients;
  • Step 4: The frequency domain coefficients are subjected to an inverse modified discrete cosine transform by the inverse modified discrete cosine transform module to obtain a linear PCM signal.
  • Which of the above transform modes is used for the inverse matrix transformation in Step 2 is determined by the transform mode code in the side information obtained in Step 1.
  • When an integer transform mode is indicated, the inverse integer transform is used to restore the integer quantized coefficients. When the KLT transform mode is indicated, the following steps are used:
  • Step 1a: Obtain the covariance matrix or the corresponding parameters (such as θ in equation (4)) from the code stream;
  • Step 1b: Calculate the KLT transformation matrix from the covariance matrix or the corresponding parameters;
  • Step 1c: For the KLT transformation matrix, use the LIFTING algorithm to restore the integer quantized coefficients of the channel pair.
  • The code stream demultiplexing and entropy decoding module 401 obtains the integer coefficients and the side information that determines which inverse matrix transform method is used, and the integer coefficients are sent to the inverse matrix transform module 402.
  • If the three matrix transformation modes of equation (6) are used in encoding, the corresponding inverse matrix transformations are used in decoding.
  • The inverse matrix transform module 402 selects, based on the side information obtained from module 401, which inverse matrix transform method to use to recover the integer quantized coefficients produced at encoding time.
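The selection of the inverse transform from the side information can be sketched as a simple table lookup. The code numbers, the per-band tuple layout, and the function names below are assumptions for illustration; only the two integer modes are shown, and a KLT mode would add a lifting-based inverse together with its transmitted angle.

```python
def inverse_identity(c1, c2):
    """Inverse of the mode that leaves the channel pair unchanged."""
    return list(c1), list(c2)


def inverse_difference(c1, diff):
    """Inverse of the mode that stores first minus second in the second channel."""
    return list(c1), [a - d for a, d in zip(c1, diff)]


# Hypothetical mapping from the transmitted mode code to its inverse transform
INVERSE_BY_CODE = {0: inverse_identity, 1: inverse_difference}


def inverse_matrix_transform(bands):
    """bands: list of (mode_code, channel1, channel2), one entry per scale factor band."""
    return [INVERSE_BY_CODE[code](c1, c2) for code, c1, c2 in bands]


decoded = inverse_matrix_transform([
    (0, [4, 0, -1], [2, 2, 0]),   # band coded without a transform
    (1, [7, 3, -2], [2, 0, -1]),  # band coded with the difference transform
])
```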
  • The integer quantized coefficients obtained by the inverse matrix transform are sent to the inverse quantization module 403 for inverse quantization.
  • The recovered frequency domain coefficients are fed to the inverse modified discrete cosine transform module 404 to obtain a linear PCM audio signal.
  • The inverse matrix transform module includes an integer transform unit, a KLT transform unit, and a KLT approximate transform unit. The matrix transform code in the side information selects which inverse matrix transform method is applied to the integer coefficients output by the code stream demultiplexing and entropy decoding module, and the resulting integer quantized coefficients are output to the inverse quantizer.

Abstract

The present invention discloses a coding/decoding method and device utilizing inter-channel signal redundancy. The coding method comprises the following steps: transforming the linear PCM signals into the frequency domain by a modified discrete cosine transform (MDCT), and calculating the masking thresholds of the scale factor bands with the psychoacoustic module; quantizing the frequency domain coefficients of each band with the quantizer, based on the masking thresholds of the scale factor bands, to obtain the integer coefficients of each channel; applying a matrix transform to the integer coefficients of each channel with the matrix transformation module; and outputting the transformed integer coefficients of the channel pairs through the entropy encoder and the code stream multiplexer. The present invention also provides the device corresponding to the coding method, as well as the decoding method and device corresponding to the coding method and device. For lossy coding, the invention improves the coding efficiency of audio signals; for lossless coding, it removes the statistical redundancy between channel signals so as to compress the signals. The invention can improve the coding/decoding efficiency and quality of any stereo or multi-channel audio coder/decoder.

Description

利用声道间冗余进行编 /解码的方法及装置 技术领域  Method and device for encoding/decoding using inter-channel redundancy
本发明涉及音频编解码技术领域, 具体地说, 涉及一种利用声道间冗余 进行编 I解码的方法及装置。 背景技术  The present invention relates to the field of audio codec technology, and in particular to a method and apparatus for encoding and decoding using inter-channel redundancy. Background technique
为得到高保真的数字音频信号, 需对数字音频信号进行音频编码或音频 压缩以便于存储和传输。 对音频信号进行编码的目的是用尽可能少的比特数 实现音频信号的透明表示, 例如原始输入的音频信号与经编码后输出的音频 信号之间几乎没有差别。  In order to obtain a high-fidelity digital audio signal, the digital audio signal is audio-encoded or audio-compressed for storage and transmission. The purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, such as little difference between the originally input audio signal and the encoded output audio signal.
在二十世纪八十年代初, CD的出现体现了用数字表示音频信号的诸多优 点, 例如高保真度、 大动态范围和强鲁棒性。 然而, 这些优点都是以很高的 数据速率为代价的。 例如 CD质量的立体声信号的数字化所要求的采样率为 44. 1kHz, 且每个采样值需用 16 比特进行均匀量化, 这样, 没有经过压缩的 数据速率就达到了 1. 41Mb/s, 如此高的数据速率给数据的传输和存储带来极 大的不便, 特别是在多媒体应用和无线传输应用的场合下, 更是受到带宽和 成本的限制。 为了保持高质量的音频信号, 因此要求新的网络和无线多媒体 数字音频系统必须降低数据的速率, 且同时不损害音频的质量。 针对上述问 题, 目前已提出了多种既能得到很高压缩比又能产生高保真的音频信号的音 频压缩技术, 典型的有国际标准化组织 IS0/ IEC的 MPEG- 1/-2/- 4技术、 杜比 公司的 AC-2/AC-3技术、 索尼公司的 ATRAC/MiniDi sc/SDDS技术以及朗讯科 技的 PAC/EPAC/MPAC技术等。 下面选择 MPEG- 2 A AC技术、 杜比公司的 AC - 3 技术进行具体的说明。 In the early 1980s, the advent of CDs represented the many advantages of digitally representing audio signals, such as high fidelity, large dynamic range, and robustness. However, these advantages are at the expense of high data rates. For example, the digitization of a CD-quality stereo signal requires a sampling rate of 44.1 kHz, and each sample value is uniformly quantized with 16 bits, so that the uncompressed data rate reaches 1.41 Mb/s, so high. The data rate brings great inconvenience to the transmission and storage of data, especially in the case of multimedia applications and wireless transmission applications, and is limited by bandwidth and cost. In order to maintain high quality audio signals, new network and wireless multimedia digital audio systems are required to reduce the rate of data without compromising the quality of the audio. In response to the above problems, a variety of audio compression techniques have been proposed which can obtain high compression ratio and high fidelity audio signals, and are typically MPEG-1/-2/-4 technology of the International Organization for Standardization ISO/IEC. , Dolby's AC-2/AC-3 technology, Sony's ATRAC/MiniDi sc/SDDS technology, and Lucent's PAC/EPAC/MPAC technology. Select MPEG-2 A AC technology, Dolby AC-3 The technology is specifically described.
Figure 1 shows a block diagram of an MPEG-2 AAC encoder. The encoder comprises a gain controller 101, a modified discrete cosine transform (MDCT) module 102, a temporal noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum/difference stereo module 106, a bit allocation and quantization coding module 107, and a bitstream multiplexing module 108. The bit allocation and quantization coding module 107 further comprises a compression-ratio/distortion processing controller, a scale factor module, a non-uniform quantizer and an entropy coding module.
After passing through the gain controller 101, the audio signal enters the MDCT module 102, which performs a time-frequency transform adapted to the signal. The spectral coefficients output by the MDCT module 102 are then processed by the temporal noise shaping module 103. Temporal noise shaping performs linear prediction analysis on the spectral coefficients in the frequency domain and uses the result to control the shape of the quantization noise in the time domain, thereby controlling pre-echo.
The intensity/coupling module 104 performs stereo coding based on signal intensity. For high-frequency signals (above 2 kHz), the perceived direction of a sound depends on changes in signal intensity (the signal envelope) rather than on the waveform; a constant-envelope signal has no effect on the sense of direction. Exploiting this property together with the correlation between channels, several channels can be combined into one common channel for coding.
The second-order backward adaptive predictor 105 removes redundancy from stationary signals and improves coding efficiency. The sum/difference stereo (M/S) module 106 operates on channel pairs, where a channel pair is two channels such as the left and right channels, or the left and right surround channels, of a two-channel or multichannel signal. The M/S module 106 exploits the correlation between the two channels of a pair to reduce the bit rate and improve coding efficiency.
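The sum/difference operation can be illustrated with a short sketch. The text does not state the formulas, so the commonly used definitions M = (L + R)/2 and S = (L - R)/2 are assumed here, and the function names are hypothetical:

```python
def ms_encode(left, right):
    """Assumed M/S definition: mid = (L + R) / 2, side = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse of ms_encode: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Two strongly correlated channels: the side channel carries little energy.
left = [1.0, 0.5, -0.25]
right = [0.75, 0.5, -0.5]
mid, side = ms_encode(left, right)
```

When the two channels are similar, the side channel is close to zero and needs few bits, which is the source of the bit-rate saving described above.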
The bit allocation and quantization coding module 107 is implemented as a nested loop, in which the non-uniform quantizer performs lossy coding and the entropy coding module performs lossless coding, thereby removing redundancy and reducing correlation. The nested loop consists of an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, and the outer loop estimates the coding quality of the signal from the ratio of quantization noise to masking threshold. Finally, the coded signal passes through the bitstream multiplexing module 108 to form the coded audio stream.
When the sampling rate is scalable, the input signal is first passed through a four-band polyphase filter bank (PQF), which produces four frequency bands of equal bandwidth. An MDCT is applied to each band to produce 256 spectral coefficients, 1024 in total. The gain controller 101 is used in each band. In the decoder, the high-frequency PQF bands can be discarded to obtain a lower-sampling-rate signal.
Figure 2 shows a block diagram of the corresponding MPEG-2 AAC decoder. The decoder comprises a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a temporal noise shaping module 208, an inverse modified discrete cosine transform (IMDCT) module 209 and a gain control module 210. The coded audio stream is demultiplexed by the bitstream demultiplexing module 201 into the corresponding data stream and control stream. After decoding by the lossless decoding module 202, the integer representation of the scale factors and the quantized values of the signal spectrum are obtained. The inverse quantizer 203 is a bank of non-uniform quantizers implemented with a companding function and converts the integer quantized values into the reconstructed spectrum. Because the scale factor module in the encoder takes the difference between the current scale factor and the previous one and Huffman-codes that difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the difference values and then restores the actual scale factors. The M/S module 205 converts the sum/difference channels into left and right channels under the control of side information. Because the encoder uses the second-order backward adaptive predictor 105 to remove redundancy from stationary signals and improve coding efficiency, the decoder performs predictive decoding in the prediction module 206. The intensity/coupling module 207 performs intensity/coupling decoding under the control of side information and passes its output to the temporal noise shaping module 208 for temporal noise shaping decoding. Finally, the IMDCT module 209 performs the frequency-to-time transform. When the sampling frequency is scalable, the high-frequency PQF bands can be discarded via the gain control module 210 to obtain a lower-sampling-rate signal.
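The differential treatment of scale factors described above can be sketched as follows. The Huffman stage is omitted, and the function names and the `start` reference value are assumptions made for illustration, not part of the AAC specification:

```python
def scalefactor_differences(scalefactors, start=0):
    """Encoder side: difference between each scale factor and the previous one.
    `start` is a hypothetical reference value for the first scale factor."""
    diffs = []
    previous = start
    for sf in scalefactors:
        diffs.append(sf - previous)
        previous = sf
    return diffs

def restore_scalefactors(diffs, start=0):
    """Decoder side: accumulate the (already Huffman-decoded) differences."""
    scalefactors = []
    previous = start
    for d in diffs:
        previous += d
        scalefactors.append(previous)
    return scalefactors

sf = [60, 62, 61, 61, 58]
diffs = scalefactor_differences(sf)
```

Neighbouring scale factors tend to be close, so the differences are small numbers that an entropy coder can represent compactly.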
Like MPEG AAC, the Dolby AC-3 encoder also uses inter-channel intensity coupling to improve the coding efficiency of multichannel signals.
However, existing stereo coding techniques, including sum/difference stereo and intensity-coupled stereo, have certain shortcomings. In sum/difference stereo coding, for example, the encoder quantizes the sum and difference channel signals separately, so the noise in the left and right (L/R) channel signals obtained at the decoder is the superposition of the quantization noise of the sum and difference channels, which degrades quality. In intensity-coupled coding, low quantization precision or insufficient resolution seriously affects the subjective quality of the decoded audio signal.
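The noise superposition in sum/difference coding can be made explicit with a short worked derivation. It assumes the common definitions $M = (L+R)/2$ and $S = (L-R)/2$, which the text does not spell out, and writes $e_M$ and $e_S$ for the quantization errors of the sum and difference channels (symbols introduced here for illustration):

$$\hat{M} = M + e_M, \qquad \hat{S} = S + e_S$$

$$\hat{L} = \hat{M} + \hat{S} = L + (e_M + e_S), \qquad \hat{R} = \hat{M} - \hat{S} = R + (e_M - e_S)$$

If $e_M$ and $e_S$ are uncorrelated with variances $\sigma_M^2$ and $\sigma_S^2$, each reconstructed channel carries noise of variance $\sigma_M^2 + \sigma_S^2$, so both channels inherit the noise of both quantizers.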
In the doctoral thesis "High Fidelity Multichannel Audio Compression", Dai Yang proposed removing inter-channel redundancy with the Karhunen-Loeve transform (KLT) after filtering and before quantization. Because the KLT is the optimal transform under the minimum mean square error criterion, in this sense it removes inter-channel redundancy to the greatest possible extent. However, the method introduces a problem that is difficult to solve with existing techniques: how to use existing psychoacoustic models to quantize the decorrelated channel coefficients effectively. Unless this problem is solved, the method has no practical value.
To address the above problem, international patent application PCT/IB02/01595 (filed 8 May 2002) proposed that, when coding an audio signal with more than one channel, an integer discrete cosine transform (INT DCT) be applied to the quantized coefficients of the channels to remove inter-channel redundancy. That method was proposed to address the shortcomings of existing multichannel coding methods, but it does not solve the problem of two-channel stereo coding efficiency. Moreover, given that the source is time-varying, the integer discrete cosine transform it uses is not the optimal way to remove inter-channel redundancy from quantized coefficients. The method also inevitably increases the computational complexity of encoding and decoding.

Summary of the invention
The object of the present invention is to address the shortcomings of the prior art by providing a method and apparatus for encoding and decoding using inter-channel redundancy, so as to solve the problems of low stereo coding efficiency and poor quality in prior-art stereo and multichannel audio codecs.
To achieve the above object, the present invention provides a method of encoding using inter-channel redundancy, comprising the following steps:
Step 1: transform the linear PCM (Pulse Code Modulation) signal into the frequency domain, and calculate the masking threshold of each scale factor band;
Step 2: quantize the frequency-domain coefficients of each scale factor band according to its masking threshold to obtain the integer coefficients of each channel;
Step 3: organize the integer coefficients according to the principle of maximum coding gain to obtain the channel pairs/groups for specific time-frequency regions;
Step 4: apply a matrix transform to the quantized integer coefficients of the channel pairs/groups, then entropy-code the transformed integer coefficients and multiplex them into the output bitstream.
In step 4, the matrix transform applied to the quantized integer coefficients of the channel pairs/groups uses the optimal transform: from a determined number of integer transforms, the KLT, and approximations of the KLT, the transform with the largest coding gain is selected and used to code the quantized integer coefficients of the given region.
The present invention further provides an apparatus for encoding using inter-channel redundancy, comprising a psychoacoustic module, a modified discrete cosine transform module, a quantizer, an entropy coding and bitstream multiplexing module, and a matrix transform module. The matrix transform module organizes the integer coefficients of each channel output by the quantizer according to the principle of maximum coding gain to obtain the channel pairs/groups for specific time-frequency regions, applies a matrix transform to the quantized integer coefficients of the channel pairs/groups, and outputs the transformed integer coefficients to the entropy coding and bitstream multiplexing module. The psychoacoustic module calculates the masking curve of the current frame from the characteristics of human hearing and, from that curve, the masking thresholds of specific time-frequency regions, which guide the quantization of the current frame. The modified discrete cosine transform module transforms the linear PCM (Pulse Code Modulation) signal into the frequency domain. The quantizer quantizes the frequency-domain coefficients output by the modified discrete cosine transform module, region by region, according to the masking threshold of each time-frequency region.
The present invention further provides a method of decoding using inter-channel redundancy, comprising the following steps:
Step 1: apply an inverse matrix transform to the integer coefficients obtained by bitstream demultiplexing and entropy decoding, to obtain the integer quantized coefficients;
Step 2: inverse-quantize the integer quantized coefficients to recover the frequency-domain coefficients;
Step 3: apply the inverse modified discrete cosine transform to the frequency-domain coefficients to obtain the linear PCM signal.
In step 1, the inverse matrix transform uses the optimal transform: from a determined number of integer transforms, the KLT, and approximations of the KLT, the inverse matrix transform identified by the matrix transform code in the side information is used to recover the integer quantized coefficients produced at the encoder.
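Step 1 of the decoding method amounts to a lookup from the transform code carried in the side information to the matching inverse transform. A minimal sketch for the two integer transforms used as examples later in this description (leave the pair unchanged; keep the first channel and replace the second by first minus second) might look as follows. The numeric codes 1 and 2 and all function names are hypothetical:

```python
def inverse_identity(a, b):
    """Inverse of the 'no processing' transform."""
    return a, b

def inverse_difference(a, d):
    """Forward transform was (L, R) -> (L, L - R), so R = L - d."""
    return a, a - d

# Hypothetical mapping from side-information code to inverse transform.
INVERSE_BY_CODE = {1: inverse_identity, 2: inverse_difference}

def inverse_matrix_transform(code, ch0, ch1):
    """Apply the inverse transform selected by `code` to every coefficient pair."""
    inverse = INVERSE_BY_CODE[code]
    pairs = [inverse(a, b) for a, b in zip(ch0, ch1)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

restored = inverse_matrix_transform(2, [5, -3, 7], [1, -1, 7])
```

Because the inverse is pure integer arithmetic, the decoder recovers exactly the integer quantized coefficients that entered the encoder's matrix transform.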
The present invention further provides an apparatus for decoding using inter-channel redundancy, characterized in that it comprises a bitstream demultiplexing and entropy decoding module, an inverse quantizer, an inverse modified discrete cosine transform module and an inverse matrix transform module. The bitstream demultiplexing and entropy decoding module demultiplexes and entropy-decodes the input compressed bitstream to obtain integer coefficients. The inverse matrix transform module applies an inverse matrix transform to the integer coefficients output by the bitstream demultiplexing and entropy decoding module to obtain the integer quantized coefficients. The inverse quantizer inverse-quantizes the integer quantized coefficients output by the inverse matrix transform module to recover the frequency-domain coefficients. The inverse modified discrete cosine transform module applies the inverse modified discrete cosine transform to the frequency-domain coefficients output by the inverse quantizer to obtain the linear PCM signal.
Because the present invention uses the optimal transform in encoding and decoding, it can be applied both to lossless redundancy removal on quantized multichannel coefficients and to lossless stereo and multichannel audio coding. In lossy coding, it further improves audio coding efficiency for spectral coefficients (including transform coefficients and subband signals obtained by filtering) that have undergone a transform (such as the MDCT, QMF subband filtering or a wavelet transform), frequency-domain processing (such as predictive coding, noise shaping or sum/difference stereo coding) and quantization. In lossless coding, the invention can likewise be used to remove statistical redundancy between channel signals (such as time-domain PCM samples, subband samples and frequency-domain coefficients) and thereby compress the signal. For any stereo or multichannel audio codec, it improves stereo coding efficiency and quality.

Brief description of the drawings
Figure 1 is a block diagram of a prior-art MPEG-2 AAC encoder;
Figure 2 is a block diagram of a prior-art MPEG-2 AAC decoder;
Figure 3 is a block diagram of the encoder of the present invention;
Figure 4 is a block diagram of the decoder of the present invention.

Detailed description
The invention is described in detail below with reference to the drawings and specific embodiments.
A method of encoding using inter-channel redundancy comprises the following steps:
Step 1: transform the linear PCM signal into the frequency domain, and calculate the masking threshold of each scale factor band;
Step 2: quantize the frequency-domain coefficients of each scale factor band according to its masking threshold to obtain the integer coefficients of each channel;
Step 3: organize the integer coefficients according to the principle of maximum coding gain to obtain the channel pairs/groups for specific time-frequency regions;
Step 4: apply a matrix transform to the quantized integer coefficients of the channel pairs/groups, then entropy-code the transformed integer coefficients and multiplex them into the output bitstream.
Whether the coding is lossy or lossless, the channel coefficients processed by the present invention are all integers and are processed in essentially the same way. (The coefficients may be in the time domain, the frequency domain or subbands; for convenience, the time-domain samples, subband samples and frequency-domain coefficients to be processed are referred to collectively below as "coefficients".) The following description therefore no longer distinguishes between "lossy coding" and "lossless coding".
Specifically, the method is described in detail in conjunction with the apparatus. Figure 3 shows a block diagram of the apparatus for encoding using inter-channel redundancy. The linear PCM signal is fed to both the modified discrete cosine transform (MDCT) module 301 and the psychoacoustic model 305. The MDCT module 301 transforms the PCM signal into the frequency domain; as in MPEG AAC, the MDCT window function and block length can be switched according to the signal characteristics to ensure sufficient time-frequency resolution and to remove intra-channel time-domain redundancy effectively. The psychoacoustic model 305 calculates the masking curve of the current frame from the characteristics of human hearing; from the masking curve, the masking thresholds of specific time-frequency regions can be calculated and used to guide the quantization of the current frame.
The frequency-domain coefficients produced by the MDCT module 301 are fed to the quantizer 302. The quantizer consists of a set of sub-quantizers, each of which quantizes the frequency-domain coefficients of a specific time-frequency region according to the masking threshold of that region; such a region is usually called a scale factor band. A bit allocation mechanism in the quantizer controls the number of bits available to each sub-quantizer, so that the number of bits spent quantizing the frequency-domain coefficients of the current frame does not exceed the permitted bit budget and the quantization distortion is minimized. Any commonly used bit allocation strategy may be adopted, such as the rate control method of MPEG AAC. The quantizer may be a scalar quantizer or a vector quantizer, such as the nonlinear scalar quantizer of MPEG AAC or the vector quantizer of MPEG TwinVQ.
After quantization, the integer coefficients are sent to the matrix transform module 303. The matrix transform module 303 organizes the quantized integer coefficients of each channel according to the principle of maximum coding gain to obtain the channel pairs/groups for specific time-frequency regions. The channel pairs/groups may differ between time-frequency regions (a time segment for time-domain samples, a frequency segment for frequency-domain coefficients, and a time-frequency region for subband samples). When the encoder selects channel pairs, the L/R pair and the LS/RS pair are typically obtained, because the left channel (L) and right channel (R) are highly correlated, as are the left surround channel (LS) and right surround channel (RS). When several ways of organizing channel pairs are used, the channel-pair organization information must be coded as control information. When channels are organized into groups, the following groups occur frequently: left/right/center; left front/right front/left center/right center/center; left surround/right surround/back surround; and so on.
For the quantized integer coefficients in a channel pair/group, the "optimal transform" is used to remove inter-channel redundancy.
The optimal transform is the one, chosen from a determined number of integer transforms, the KLT, and any transform used to approximate the KLT, that gives the largest coding gain. When the KLT or an approximation of the KLT is selected for coding, the LIFTING algorithm is used to realize a transform from integer coefficients to integer coefficients.
Maximum coding gain means that, at a given quality, the fewest bits are used to code a given signal.
An integer transform is a transform whose matrix $A$ has only integer entries and for which an inverse matrix $A^{-1}$ exists whose entries are also all integers, such that $A A^{-1} = I$, where $I$ is the identity matrix. For example, when channel pairs are used, let $L$ and $R$ denote the integer coefficients of the two channels of a pair (here $L$ and $R$ stand for any channels that may occur in coding and should not be read as only the "left channel" and "right channel"). $L$ and $R$ are the quantized integer coefficients, and $L'$ and $R'$ are the integer coefficients obtained after the integer transform. For each channel pair, within a certain resolution scale (for example, the so-called "scale factor band"), the following integer transform is applied to the integer coefficients of the pair:

$$\begin{bmatrix} L' \\ R' \end{bmatrix} = A \begin{bmatrix} L \\ R \end{bmatrix} \qquad (1)$$

such that the number of bits needed to code $L'$ and $R'$ is smaller than the number of bits needed to code $L$ and $R$.
When channel groups are used, the method is analogous to that for channel pairs.
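The requirement that both the transform matrix and its inverse have integer entries is met exactly when the determinant of the integer matrix is +1 or -1. A minimal 2x2 sketch of this check follows; the helper names are hypothetical, and the example matrix is the difference transform described later in this description (first channel unchanged, second channel replaced by first minus second):

```python
def det2(a):
    """Determinant of a 2x2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def integer_inverse2(a):
    """Integer inverse of a 2x2 integer matrix; exists only if det is +1 or -1."""
    d = det2(a)
    if d not in (1, -1):
        raise ValueError("no integer inverse: determinant must be +1 or -1")
    # Adjugate divided by d; since d is +1 or -1, dividing equals multiplying.
    return [[a[1][1] * d, -a[0][1] * d],
            [-a[1][0] * d, a[0][0] * d]]

def apply2(a, x, y):
    """Multiply the 2x2 matrix `a` by the column vector (x, y)."""
    return a[0][0] * x + a[0][1] * y, a[1][0] * x + a[1][1] * y

A = [[1, 0], [1, -1]]          # (L, R) -> (L, L - R)
A_INV = integer_inverse2(A)    # this particular matrix is its own inverse

l_t, r_t = apply2(A, 7, 5)     # forward transform of one coefficient pair
```

Applying `A_INV` to the transformed pair returns the original integers with no rounding, which is what makes such a transform usable for lossless redundancy removal.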
The KLT is a signal-adaptive matrix whose row vectors are the eigenvectors of the covariance matrix of the multichannel coefficients. Because the KLT matrix is orthogonal, it can be decomposed into Givens matrices and computed approximately with the LIFTING algorithm to obtain integer results.
When the KLT is used, the encoder calculates the covariance matrix Φ of the signal from the time-domain signal and then calculates the orthogonal matrix Q from Φ. Methods for calculating the covariance matrix Φ and the orthogonal matrix Q are described in textbooks on signal processing and linear algebra, for example Hu Guangshu, "Digital Signal Processing: Theory, Algorithms and Implementation", Tsinghua University Press, 1997.
To achieve a lossless transform from integer coefficients to integer coefficients, the KLT must be implemented approximately with the so-called LIFTING algorithm. For the LIFTING algorithm, see the literature, for example I. Daubechies and W. Sweldens, "Factoring Wavelet Transforms into Lifting Steps", Tech. Rep., Bell Laboratories, Lucent Technologies, 1996.
Here, only the channel pair is used as an example to illustrate the calculation of the KLT matrix and its LIFTING algorithm.
As defined above, assume that the analysis region contains the coefficients

$$L(n),\ R(n), \qquad 0 \le n \le N \qquad (2)$$

where $L$ and $R$ are the quantized integer coefficients and $N$ is the size of the analysis region. Their covariance matrix is

$$\Phi_x = \begin{bmatrix} C_{LL} & C_{LR} \\ C_{LR} & C_{RR} \end{bmatrix} \qquad (3)$$

where $C_{LL}$, $C_{LR}$ and $C_{RR}$ are the covariance coefficients.

[Defining expressions for the covariance coefficients: image in the original, not recovered]

The corresponding KLT orthogonal matrix $Q$ is given by equation (4).

[Equation (4), expression for $Q$: image in the original, not recovered]
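For a channel pair, the covariance coefficients and the KLT rotation can be sketched as follows. Because the original formulas are not legible here, the sketch rests on stated assumptions: the coefficients are treated as zero-mean, the averages are normalized by the number of samples, the rotation angle is taken from the standard closed form for a symmetric 2x2 matrix, and $Q$ is written with the sign convention [[cos θ, sin θ], [-sin θ, cos θ]]. All function names are hypothetical:

```python
import math

def covariance_coefficients(left, right):
    """Return (C_LL, C_LR, C_RR) for two non-empty, equal-length sequences.
    Assumes zero-mean data (no mean is subtracted)."""
    n = len(left)
    c_ll = sum(l * l for l in left) / n
    c_lr = sum(l * r for l, r in zip(left, right)) / n
    c_rr = sum(r * r for r in right) / n
    return c_ll, c_lr, c_rr

def klt_angle(c_ll, c_lr, c_rr):
    """Angle of the rotation that diagonalizes [[C_LL, C_LR], [C_LR, C_RR]]."""
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)

def rotate(left, right, theta):
    """Apply Q = [[cos, sin], [-sin, cos]] (assumed convention) to every pair."""
    c, s = math.cos(theta), math.sin(theta)
    out0 = [c * l + s * r for l, r in zip(left, right)]
    out1 = [-s * l + c * r for l, r in zip(left, right)]
    return out0, out1

left = [4, -2, 6, -8, 2]
right = [3, -1, 5, -7, 1]
theta = klt_angle(*covariance_coefficients(left, right))
y0, y1 = rotate(left, right, theta)
```

After the rotation the two outputs are uncorrelated and the first output carries the larger variance, which is the decorrelation property the KLT is chosen for. This floating-point version is not yet reversible on integers.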
The orthogonal matrix $Q$ is exactly a Givens rotation matrix and can therefore be decomposed into the following form:

[Equation, decomposition of $Q$: image in the original, not recovered]

According to the LIFTING algorithm, the coefficients can be rounded after each transform step without affecting the perfect reversibility of the system. When channel groups are coded, the KLT matrix and the LIFTING algorithm are analogous to those for channel pairs.
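The reversibility claim can be illustrated with the standard factorization of a plane rotation into three shear (lifting) steps, each followed by rounding. This is a minimal sketch and not necessarily the exact decomposition used in the original equation: it assumes the rotation convention [[cos θ, sin θ], [-sin θ, cos θ]], an angle strictly between -π and π, and hypothetical function names:

```python
import math

def lifting_rotate(x, y, theta):
    """Integer-to-integer approximation of
    [x'; y'] = [[cos t, sin t], [-sin t, cos t]] [x; y] using three lifting steps."""
    p = math.tan(theta / 2.0)   # shear coefficient for steps 1 and 3
    u = -math.sin(theta)        # shear coefficient for step 2
    x = x + round(p * y)        # step 1, rounded
    y = y + round(u * x)        # step 2, rounded
    x = x + round(p * y)        # step 3, rounded
    return x, y

def lifting_unrotate(x, y, theta):
    """Exact inverse: undo the three steps in reverse order."""
    p = math.tan(theta / 2.0)
    u = -math.sin(theta)
    x = x - round(p * y)
    y = y - round(u * x)
    x = x - round(p * y)
    return x, y

# Two identical channels rotated by 45 degrees: all energy moves to the first output.
print(lifting_rotate(100, 100, math.pi / 4))  # (141, 0)
```

Each step changes only one of the two values by a rounded function of the other, so subtracting the same rounded quantity restores it exactly; the rounding error affects how closely the result matches the true rotation, but never the invertibility.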
An approximation of the KLT is a transform used to approximate the KLT under certain conditions (such as the statistical properties of the source, or computational complexity). The KLT is the optimal transform in the mean-square-error sense, but its computational cost and side information are large. Other transforms can therefore be used to approximate the KLT in order to reduce the computation and/or the side information, for example the DFT (discrete Fourier transform), the DCT (discrete cosine transform) and the DST (discrete sine transform).
When an approximation of the KLT is used, the LIFTING algorithm is likewise needed to guarantee a lossless integer-to-integer transform; the computation is the same as for the LIFTING implementation of the KLT.
The optimal transform thus means selecting, from a determined number of integer transforms, the KLT (implemented with LIFTING) and approximations of the KLT (implemented with LIFTING), the transform with the largest coding gain and using it to code the given region.
In a specific encoding apparatus, the matrix transform module contains a determined number of integer transform units, a KLT transform unit, and approximate KLT transform units. Accordingly, the matrix transform mode is selected from a determined number of integer transform modes, the KLT transform mode, and the approximate KLT transform modes (such as DFT, DCT, DST). For example, M integer transform modes can be chosen, with codes $A_1, A_2, \ldots, A_M$, where M is an integer not less than 1; the code of the KLT transform is $A_{M+1}$; and the codes of the approximate KLT transform modes (such as DFT, DCT, DST) are $A_{M+2}, \ldots, A_N$, where N is an integer greater than 2. Each transform mode $A_i$ ($1 \le i \le N$) has a corresponding coding gain. A decision switch module lets the encoder adaptively select the transform mode with the largest coding gain, removing as much of the inter-channel redundancy of the encoded signal as possible. The code of the selected transform mode and any other necessary information are written into the compressed bit stream as side information so that the decoder can decode correctly.
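The adaptive selection described above can be sketched as follows. This is an illustration under stated assumptions: the specification does not define the coding gain in this passage, so `estimated_bits` is a hypothetical cost proxy (fewer estimated bits standing in for larger coding gain), and only two integer transform modes, an identity and a difference, are included as candidates.

```python
import math

def estimated_bits(coeffs):
    """Hypothetical cost proxy: magnitude bits plus one sign bit per nonzero coefficient."""
    return sum(math.log2(1 + abs(c)) + (1 if c else 0) for c in coeffs)

def identity(left, right):
    """Candidate mode 1: leave the channel pair unchanged."""
    return list(left), list(right)

def difference(left, right):
    """Candidate mode 2: keep the first channel, replace the second by first minus second."""
    return list(left), [l - r for l, r in zip(left, right)]

# Mode codes are illustrative; a real encoder would also register KLT-type modes here.
CANDIDATES = {1: identity, 2: difference}

def select_transform(left, right, candidates=CANDIDATES):
    """Return (code, transformed_pair) for the candidate with the lowest estimated cost."""
    best_code, best_out, best_cost = None, None, float("inf")
    for code, transform in sorted(candidates.items()):
        out_l, out_r = transform(left, right)
        cost = estimated_bits(out_l) + estimated_bits(out_r)
        if cost < best_cost:   # strict comparison: ties keep the lower code
            best_code, best_out, best_cost = code, (out_l, out_r), cost
    return best_code, best_out

if __name__ == "__main__":
    print(select_transform([10, -8, 6, 4], [9, -8, 7, 4]))
```

The returned code is what would be written as side information; the transformed pair goes on to entropy coding.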
Each channel pair can be processed as follows to reduce the number of bits required for encoding.
For example, three transform modes can be chosen, with codes $A_1$, $A_2$, and $A_3$, where $A_1$ and $A_2$ are two integer transform modes and $A_3$ is the KLT transform mode:

$$A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} 1 & 0 \\ 1 & -1 \end{bmatrix}, \qquad A_3 = Q(\theta) \tag{6}$$

where $Q(\theta)$ is the KLT orthogonal transform matrix determined by $\theta$.
Here $\theta$ takes the value given by equations (4) and (5). With transform $A_1$, the quantized integer coefficients of the channel pair are left unprocessed. With transform $A_2$, the quantized integer coefficients of the first channel are unchanged, and the transformed integer coefficients of the second channel are the quantized integer coefficients of the original first channel minus those of the original second channel. With transform $A_3$, the KLT removes the redundancy between the channel coefficients; in this case the value of $\theta$ must be encoded in addition to the code of the transform mode.
The decision switch 306 for the transform matrix selects the optimal transform mode among the determined number of integer transform units, the KLT transform unit, and the approximate KLT transform units in the matrix transform module, and encodes the code of the selected mode as side information.
Because the bandwidth available for side information is limited, when control information such as the channel-pair organization and the matrix transform index is encoded, the matrix transform type can be selected per scale factor band, and the index of the selected matrix transform is then encoded. When the first pair of conditions holds [the inequalities are missing from the source text], transform mode $A_1$ is used, that is, the coefficients within the channel pair are not transformed. When the second pair of conditions holds [inequalities likewise missing], the integer transform mode $A_2$ is used. In all other cases, transform mode $A_3$ is used. Which of $A_1$, $A_2$, or $A_3$ was selected is written into the compressed bit stream as side information so that the decoder can decode correctly.
After the transform, the integer coefficients are sent to the entropy coding and code stream multiplexing module 304. There, effective entropy coding removes as much of the statistical redundancy of the integer coefficients as possible; the entropy coding result is then multiplexed with the other control information into a compressed bit stream and output to a transmission channel or storage medium. The entropy coding may use methods such as Huffman coding, run-length coding, and arithmetic coding.
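Of the entropy coding methods named above, run-length coding is the simplest to illustrate. The sketch below is only one possible scheme, chosen for illustration and not taken from this specification: it represents the integer coefficients as (zero-run, value) pairs, and the function names and the use of `None` to mark trailing zeros are assumptions.

```python
def rle_encode(coeffs):
    """Encode integers as (number_of_preceding_zeros, nonzero_value) pairs.

    A trailing run of zeros is stored as (run_length, None).
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append((run, None))
    return pairs

def rle_decode(pairs):
    """Rebuild the coefficient list from (zero_run, value) pairs."""
    coeffs = []
    for run, value in pairs:
        coeffs.extend([0] * run)
        if value is not None:
            coeffs.append(value)
    return coeffs

if __name__ == "__main__":
    band = [5, 0, 0, -1, 0, 0, 0]
    print(rle_encode(band))
```

Such a representation pays off when the matrix transform has driven many second-channel coefficients to zero, which is the situation the inter-channel transform aims to create.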
The present invention also discloses a decoding method and apparatus using inter-channel redundancy. As shown in FIG. 4, the apparatus includes a code stream demultiplexing and entropy decoding module, an inverse matrix transform module, an inverse quantizer, and an inverse modified discrete cosine transform module. The method includes the following steps:
Step 1: the compressed bit stream is demultiplexed and entropy decoded by the code stream demultiplexing and entropy decoding module, yielding the integer coefficients and the side information that indicates which inverse matrix transform mode to use;
Step 2: the integer coefficients are inverse matrix transformed by the inverse matrix transform module, yielding the inverse-transformed integer quantized coefficients;
Step 3: the inverse-transformed integer quantized coefficients are inverse quantized by the inverse quantizer to recover the frequency-domain coefficients;
Step 4: the frequency-domain coefficients are processed by the inverse modified discrete cosine transform module to obtain the linear PCM signal.
In step 2, the inverse matrix transform mode to apply is determined by the transform mode code in the side information obtained in step 1.
When the matrix transform module performs the integer transform of equation (1), the integer quantized coefficients can be recovered with the following inverse integer transform, in which the vector of recovered coefficients equals the matrix B applied to the vector of decoded coefficients:

[Equation (7): inverse integer transform with matrix B, where BA = I]

Here the inputs are the integer coefficients obtained by demultiplexing and entropy decoding, and the outputs are the integer coefficients recovered by the integer transform.
When the KLT transform mode is used, the following steps are included:
Step 1a: obtain the covariance matrix or the corresponding parameter (such as $\theta$ in equation (4)) from the code stream;
Step 1b: compute the KLT transform matrix from the covariance matrix or the corresponding parameter;
Step 1c: apply the LIFTING algorithm to the KLT transform matrix to recover the integer quantized coefficients of the channel pair.
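Equations (4) and (5), which define the KLT matrix, are not reproduced in this part of the text. For a two-channel pair, one standard way to carry out the computation of step 1b is to take the principal-axis angle of the 2x2 covariance matrix, as sketched below. The closed form used in `klt_angle`, the zero-mean assumption, the matrix sign convention, and all function names are assumptions for illustration and may differ from the equations of this specification.

```python
import math

def covariance_2x2(left, right):
    """Second-order moments of a channel pair, assuming zero-mean coefficients."""
    n = len(left)
    c_ll = sum(l * l for l in left) / n
    c_lr = sum(l * r for l, r in zip(left, right)) / n
    c_rr = sum(r * r for r in right) / n
    return c_ll, c_lr, c_rr

def klt_angle(c_ll, c_lr, c_rr):
    """Rotation angle that diagonalizes [[c_ll, c_lr], [c_lr, c_rr]]."""
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)

def klt_matrix(theta):
    """Orthogonal 2x2 matrix; applied to (left, right) it decorrelates the pair."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

if __name__ == "__main__":
    left, right = [4, -2, 6, -8], [3, -1, 5, -7]
    theta = klt_angle(*covariance_2x2(left, right))
    print(theta, klt_matrix(theta))
```

Transmitting the single angle instead of the full covariance matrix keeps the side information small; the decoder rebuilds the matrix from the angle and then inverts it with integer lifting steps so that the reconstruction stays exact.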
When an approximate transform mode of the KLT is used, the LIFTING algorithm is applied to that approximate transform to compute its integer approximation and recover the integer quantized coefficients of the channel pair.
At the decoding end, after the compressed bit stream has been demultiplexed and entropy decoded in 401, the integer coefficients and the side information indicating which inverse matrix transform mode to use are obtained, and the integer coefficients are sent to the inverse matrix transform module 402. In this embodiment, when the three matrix transform modes of equation (6) are used for the matrix transform, the corresponding inverse matrix transforms are

$$A_1^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad A_2^{-1} = \begin{bmatrix} 1 & 0 \\ 1 & -1 \end{bmatrix}, \qquad A_3^{-1} = A_3^{T} \tag{8}$$

where $A_3$ is the orthogonal KLT transform matrix, so that its inverse is its transpose.
The inverse matrix transform module 402 uses the side information obtained from 401 to select the inverse matrix transform mode that recovers the integer quantized coefficients as they were at encoding time.
The integer quantized coefficients obtained by the inverse matrix transform are sent to the inverse quantization module 403 for inverse quantization. The recovered frequency-domain coefficients are sent to the inverse modified discrete cosine transform 404 to obtain the linear PCM audio signal.
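The selection performed by module 402 amounts to a lookup from the transform mode code to an inverse transform. The sketch below assumes mode codes 1 and 2 for the identity and difference modes described earlier; the codes, the table `INVERSES`, and the function names are hypothetical, and the KLT mode is omitted because it additionally needs the transmitted parameter.

```python
def inverse_identity(y1, y2):
    """Inverse of the identity mode: nothing to undo."""
    return y1, y2

def inverse_difference(y1, y2):
    """Inverse of the difference mode: since y2 = x1 - x2, recover x2 = y1 - y2."""
    return y1, y1 - y2

# Hypothetical mapping from side-information code to inverse transform.
INVERSES = {1: inverse_identity, 2: inverse_difference}

def inverse_band(code, band1, band2):
    """Apply the inverse transform selected by `code` to every coefficient pair of a band."""
    inverse = INVERSES[code]
    pairs = [inverse(a, b) for a, b in zip(band1, band2)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

if __name__ == "__main__":
    left, right = [10, -8, 6, 4], [9, -8, 7, 4]
    coded_right = [l - r for l, r in zip(left, right)]   # encoder-side difference mode
    print(inverse_band(2, left, coded_right))
```

The difference mode is its own inverse, so the decoder applies the same operation the encoder did; only integer additions and subtractions are involved and no rounding occurs.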
The inverse matrix transform module includes an integer transform unit, a KLT transform unit, and approximate KLT transform units. The matrix transform code in the side information selects which inverse matrix transform mode is applied to the integer coefficients output by the code stream demultiplexing and entropy decoding module, and the transformed integer quantized coefficients are output to the inverse quantizer.

... the technical solutions described. Therefore, although this specification has described the present invention in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the invention may still be modified or equivalently replaced; all technical solutions and improvements that do not depart from the spirit and scope of the invention shall fall within the scope of the claims of the present invention.

Claims

1. A method of encoding using inter-channel redundancy, characterized by comprising the following steps:
Step 1: transforming a linear PCM signal into the frequency domain and calculating the masking threshold of each scale factor band;
Step 2: quantizing the frequency-domain coefficients of the region according to the masking threshold of the scale factor band to obtain the integer coefficients of each channel;
Step 3: organizing the integer coefficients according to the principle of maximum coding gain to obtain the channel pairs/groups of a specific time-frequency region;
Step 4: applying a matrix transform to the quantized integer coefficients of the channel pairs/groups, and outputting the transformed integer coefficients of the channel pairs/groups through entropy coding and code stream multiplexing.
2. The method of encoding using inter-channel redundancy according to claim 1, characterized in that, in step 4, the matrix transform applied to the quantized integer coefficients of the channel pairs/groups is the transform with the largest coding gain selected from a determined number of integer transforms, the KLT transform, and approximate transforms of the KLT, and is used to encode the quantized integer coefficients of the determined region.
3. The method of encoding using inter-channel redundancy according to claim 2, characterized in that the integer transform of the quantized integer coefficients of the channel pairs/groups is:

[Equation: the vector of transformed integer coefficients equals the matrix A applied to the vector of quantized integer coefficients]

where the inputs are the quantized integer coefficients and the outputs are the integer coefficients obtained by the integer transform; the elements of the 2x2 matrix A are all integers and satisfy [...], where

$$I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

is the identity matrix.
4. The method of encoding using inter-channel redundancy according to claim 2, characterized in that the approximate transform of the KLT is an FFT, DCT, or DST, and the transform is carried out with the LIFTING algorithm.
5. The method of encoding using inter-channel redundancy according to claim 2, characterized in that the KLT orthogonal transform matrix Q of the KLT transform is:

[Equation: definition of the KLT orthogonal matrix Q]

where the covariance matrix is denoted $\Phi_x$:

[Equation: definition of $\Phi_x$; its entries are sums, normalized by $1/N$ and taken over $n$ starting from 0, formed from the channel coefficients $L(n)$ and $R(n)$, $0 \le n \le N$]
6. The method of encoding using inter-channel redundancy according to claim 2, 3, 4, or 5, characterized in that, when a transform is selected from the determined number of integer transforms, the KLT transform, and the approximate transforms of the KLT, the code of the selected transform is encoded as side information.
7. The method of encoding using inter-channel redundancy according to claim 2 or 5, characterized in that, when the KLT transform is selected, the covariance matrix or the corresponding parameter is encoded as side information.
8. An apparatus for encoding using inter-channel redundancy, comprising a psychoacoustic module, a modified discrete cosine transform module, a quantizer, and an entropy coding and code stream multiplexing module, characterized by further comprising a matrix transform module, wherein:
the matrix transform module is configured to organize the integer coefficients of each channel output by the quantizer according to the principle of maximum coding gain to obtain the channel pairs/groups of a scale factor band, to apply a matrix transform to the channel pairs/groups, and to output the transformed integer coefficients of the channel pairs/groups to the entropy coding and code stream multiplexing module;
the psychoacoustic module is configured to calculate the masking curve of the current frame signal according to the characteristics of human hearing and to calculate the masking threshold of each scale factor band from the masking curve, the masking threshold of the scale factor band being used to guide the quantization of the current frame signal;
the modified discrete cosine transform module is configured to transform a linear PCM signal into the frequency domain;
the quantizer is configured to quantize the frequency-domain coefficients output by the modified discrete cosine transform module, region by region, according to the masking threshold of the specific time-frequency region.
9. The apparatus for encoding using inter-channel redundancy according to claim 8, characterized in that the matrix transform module comprises an integer transform unit, a KLT transform unit, and an approximate KLT transform unit, each of which is configured to apply a matrix transform to the channel pair and to output the transformed integer coefficients of the channel pair to the entropy coding and code stream multiplexing module.
10. The apparatus for encoding using inter-channel redundancy according to claim 9, characterized in that the apparatus further comprises a decision switch module configured to select the optimal transform mode among the integer transform unit, the KLT transform unit, and the approximate KLT transform unit in the matrix transform module, and to encode the control information.
11. A method of decoding using inter-channel redundancy, characterized by comprising the following steps:
Step 1: applying an inverse matrix transform to the integer coefficients obtained by code stream demultiplexing and entropy decoding to obtain integer quantized coefficients;
Step 2: inverse quantizing the integer quantized coefficients to recover the frequency-domain coefficients;
Step 3: applying an inverse modified discrete cosine transform to the frequency-domain coefficients to obtain a linear PCM signal.
12. The method of decoding using inter-channel redundancy according to claim 11, characterized in that the inverse matrix transform mode in step 1 is the mode, among a determined number of integer transform modes, the KLT transform mode, and approximate transform modes of the KLT, that is identified by the transform mode code in the side information obtained by code stream demultiplexing and entropy decoding, and is used to recover the integer quantized coefficients as they were at encoding time.
13. The method of decoding using inter-channel redundancy according to claim 12, characterized in that, when the inverse matrix transform uses an integer transform mode, the integer quantized coefficients of the channel pairs/groups before the transform are recovered directly by an integer transform.
14. The method of decoding using inter-channel redundancy according to claim 12, characterized in that, when the inverse matrix transform uses the KLT transform mode, the following steps are included:
Step 1a: obtaining the covariance matrix or its corresponding parameter from the code stream;
Step 1b: computing the KLT transform matrix from the covariance matrix or the corresponding parameter;
Step 1c: applying the LIFTING algorithm to the KLT transform matrix to recover the integer quantized coefficients of the channel pair.
15. The method of decoding using inter-channel redundancy according to claim 12, characterized in that, when the inverse matrix transform uses an approximate transform mode of the KLT, the LIFTING algorithm is applied to that approximate transform to recover the integer quantized coefficients of the channel pairs/groups.
16. An apparatus for decoding using inter-channel redundancy, comprising a code stream demultiplexing and entropy decoding module, an inverse quantizer, and an inverse modified discrete cosine transform module, characterized by further comprising an inverse matrix transform module, wherein:
the inverse matrix transform module is configured to apply an inverse matrix transform to the integer coefficients output by the code stream demultiplexing and entropy decoding module to obtain integer quantized coefficients;
the code stream demultiplexing and entropy decoding module is configured to demultiplex and entropy decode the input compressed bit stream to obtain the integer coefficients;
the inverse quantizer is configured to inverse quantize the integer quantized coefficients output by the inverse matrix transform module to recover the frequency-domain coefficients;
the inverse modified discrete cosine transform module is configured to apply an inverse modified discrete cosine transform to the frequency-domain coefficients output by the inverse quantizer to obtain a linear PCM signal.
17. The apparatus for decoding using inter-channel redundancy according to claim 16, characterized in that the inverse matrix transform module comprises an integer transform unit, a KLT transform unit, and an approximate KLT transform unit, wherein the matrix transform code in the side information obtained from the code stream demultiplexing and entropy decoding module determines which of the integer transform unit, the KLT transform unit, or the approximate KLT transform unit applies the inverse matrix transform to the integer coefficients output by the code stream demultiplexing and entropy decoding module, and the transformed integer quantized coefficients are output to the inverse quantizer.
PCT/CN2004/001349 2004-11-24 2004-11-24 Coding/decoding method and device utilizing intra-channel signal redundancy WO2006056100A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2004/001349 WO2006056100A1 (en) 2004-11-24 2004-11-24 Coding/decoding method and device utilizing intra-channel signal redundancy
CN200480044452.4A CN101065796A (en) 2004-11-24 2004-11-24 Method and apparatus for coding/decoding using inter-channel redundance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2004/001349 WO2006056100A1 (en) 2004-11-24 2004-11-24 Coding/decoding method and device utilizing intra-channel signal redundancy

Publications (1)

Publication Number Publication Date
WO2006056100A1 true WO2006056100A1 (en) 2006-06-01

Family

ID=36497722

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2004/001349 WO2006056100A1 (en) 2004-11-24 2004-11-24 Coding/decoding method and device utilizing intra-channel signal redundancy

Country Status (2)

Country Link
CN (1) CN101065796A (en)
WO (1) WO2006056100A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010102537A1 (en) * 2009-03-12 2010-09-16 华为终端有限公司 Method and apparatus for reducing redundancy of multiple description coding and decoding
CN104616657A (en) * 2015-01-13 2015-05-13 中国电子科技集团公司第三十二研究所 Advanced audio coding system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8576927B2 (en) * 2008-10-10 2013-11-05 Nippon Telegraph And Telephone Corporation Encoding method, encoding device, decoding method, decoding device, program, and recording medium
US9148672B2 (en) * 2013-05-08 2015-09-29 Mediatek Inc. Method and apparatus for residue transform
WO2019235891A1 (en) * 2018-06-08 2019-12-12 주식회사 케이티 Method and apparatus for processing video signal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6345125B2 (en) * 1998-02-25 2002-02-05 Lucent Technologies Inc. Multiple description transform coding using optimal transforms of arbitrary dimension
US20030014136A1 (en) * 2001-05-11 2003-01-16 Nokia Corporation Method and system for inter-channel signal redundancy removal in perceptual audio coding
CN1461112A (en) * 2003-07-04 2003-12-10 北京阜国数字技术有限公司 Quantized voice-frequency coding method based on minimized global noise masking ratio criterion and entropy coding

Also Published As

Publication number Publication date
CN101065796A (en) 2007-10-31

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 200480044452.4

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 04797379

Country of ref document: EP

Kind code of ref document: A1