WO2006056100A1

WO2006056100A1 - Coding/decoding method and device utilizing intra-channel signal redundancy

Info

Publication number: WO2006056100A1
Application number: PCT/CN2004/001349
Authority: WO
Inventors: Xingde Pan; Lei Wang
Original assignee: Beijing E-World Technology Co., Ltd
Priority date: 2004-11-24
Filing date: 2004-11-24
Publication date: 2006-06-01
Also published as: CN101065796A

Abstract

The present invention discloses a coding/decoding method and device utilizing intra-channel signal redundancy. The coding method comprises the following steps: transforming the linear PCM signals into the frequency domain by modified discrete cosine transform (MDCT), and calculating the masking thresholds of the scale factor bands with a psychoacoustic module; quantizing the frequency domain coefficients of each region with a quantizer according to the masking thresholds of the scale factor bands to obtain the integer coefficients of each channel; applying a matrix transform to the integer coefficients of each channel with a matrix transformation module; and outputting the transformed integer coefficients of the channel pairs through an entropy encoder and a code stream multiplexer. The present invention also provides the device corresponding to the coding method, and the decoding method and device corresponding to the coding method and device. In lossy coding, the invention improves the coding efficiency of audio signals; in lossless coding, it removes the statistical redundancy of intra-channel signals in order to compress the signals. The invention can improve the coding/decoding efficiency and quality of any stereo or multichannel audio coder/decoder.

Description

Method and device for encoding/decoding using inter-channel redundancy

The present invention relates to the field of audio codec technology, and in particular to a method and apparatus for encoding and decoding using inter-channel redundancy.

Background

To store and transmit high-fidelity digital audio, the digital audio signal is audio-encoded, that is, compressed. The purpose of encoding an audio signal is to achieve a transparent representation of it with as few bits as possible, meaning that there is little perceptible difference between the originally input audio signal and the decoded output audio signal.

In the early 1980s, the advent of the CD demonstrated the many advantages of representing audio signals digitally, such as high fidelity, large dynamic range, and robustness. However, these advantages come at the expense of high data rates. For example, digitizing a CD-quality stereo signal requires a sampling rate of 44.1 kHz with each sample uniformly quantized to 16 bits, so the uncompressed data rate reaches 1.41 Mb/s. Such a high data rate is a great inconvenience for the transmission and storage of data, especially in multimedia and wireless transmission applications, where bandwidth and cost are limiting. To maintain high-quality audio, new network and wireless multimedia digital audio systems must reduce the data rate without compromising audio quality. In response, a variety of audio compression techniques have been proposed that achieve high compression ratios while preserving high fidelity; typical examples are the MPEG-1/-2/-4 technologies of the International Organization for Standardization (ISO/IEC), Dolby's AC-2/AC-3 technology, Sony's ATRAC/MiniDisc/SDDS technology, and Lucent's PAC/EPAC/MPAC technology. MPEG-2 AAC and Dolby AC-3 are described specifically below.
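The quoted CD data rate can be checked with a few lines of arithmetic. This is only a sketch; the function name `pcm_bit_rate` is illustrative, and two channels are assumed for stereo.

```python
# Uncompressed PCM bit rate = sampling rate x bits per sample x channels.
SAMPLE_RATE_HZ = 44_100   # CD sampling rate
BITS_PER_SAMPLE = 16      # uniform quantization depth
CHANNELS = 2              # stereo (assumed)


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"{cd_rate} b/s = {cd_rate / 1e6:.2f} Mb/s")
```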

Figure 1 shows a block diagram of an MPEG-2 AAC encoder, which includes a gain controller 101, a modified discrete cosine transform (MDCT) module 102, a time domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum/difference stereo (M/S) module 106, a bit allocation and quantization encoding module 107, and a bitstream multiplexing module 108. The bit allocation and quantization encoding module 107 further includes a compression ratio/distortion processing controller, a scale factor module, a non-uniform quantizer, and an entropy coding module.

After the audio signal passes through the gain controller 101, it enters the modified discrete cosine transform module 102, which performs a time-frequency transform adapted to the signal. The spectral coefficients output by the modified discrete cosine transform module 102 are then processed by the time domain noise shaping module 103. The time domain noise shaping technique performs linear prediction analysis on the spectral coefficients in the frequency domain and uses that analysis to control the shape of the quantization noise in the time domain, thereby controlling pre-echo.

The intensity/coupling module 104 performs intensity stereo coding. For high frequency bands (above 2 kHz), the directional perception of hearing depends on changes in signal intensity (the signal envelope) and not on the signal waveform; that is, a constant-envelope signal has no effect on directional perception. This property, together with the correlation between channels, can therefore be used to combine several channels into one common channel for encoding.

The second-order backward adaptive predictor 105 is used to eliminate redundancy in stationary signals and improve coding efficiency. The sum/difference stereo (M/S) module 106 operates on channel pairs, where a channel pair refers to two channels such as the left and right channels of a two-channel signal, or the left and right surround channels of a multi-channel signal. The M/S module 106 exploits the correlation between the two channels of the pair to reduce the bit rate and improve coding efficiency.

The bit allocation and quantization coding module 107 is implemented as a nested loop process in which the non-uniform quantizer performs lossy coding and the entropy coding module performs lossless coding, which removes redundancy and reduces correlation. The nested loops consist of an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, and the outer loop uses the ratio of quantization noise to the masking threshold to estimate the coding quality of the signal. The final encoded signal is output as an encoded audio stream through the bitstream multiplexing module 108.

In the sample-rate-scalable case, the input signal is first split by a four-band polyphase quadrature filter bank (PQF) into four bands of equal bandwidth, and an MDCT is applied to each band to produce 256 spectral coefficients, for a total of 1024. A gain controller 101 is used in each band. In the decoder, the high frequency PQF bands can be ignored to obtain a lower sampling rate signal.

Figure 2 shows a block diagram of the corresponding MPEG-2 AAC decoder. The decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time domain noise shaping module 208, an inverse modified discrete cosine transform (IMDCT) module 209, and a gain control module 210. The encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain the corresponding data stream and control stream. After these signals are decoded by the lossless decoding module 202, an integer representation of the scale factors and the quantized values of the signal spectrum are obtained. The inverse quantizer 203 is a set of non-uniform quantizers implemented by a companding function, which converts the integer quantized values into the reconstructed spectrum. Since the scale factor module in the encoder takes the difference between the current scale factor and the previous one and then Huffman-codes the difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the difference values and then recovers the true scale factors. The M/S module 205 converts the sum/difference channels into left and right channels under the control of side information. Since the second-order backward adaptive predictor 105 is used in the encoder to eliminate redundancy in stationary signals and improve coding efficiency, prediction decoding is performed by the prediction module 206 in the decoder. The intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and passes the result to the time domain noise shaping module 208 for time domain noise shaping decoding; finally, the inverse modified discrete cosine transform module 209 performs the frequency-to-time conversion.
In the sample-rate-scalable case, the gain control module 210 can ignore the high frequency PQF bands to obtain a lower sampling rate signal.
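The differential treatment of scale factors described above can be sketched as follows. This is a minimal illustration, not the AAC bitstream syntax: the Huffman stage is omitted, and the function names and the `start` reference value are assumptions introduced here.

```python
def scale_factor_deltas(scale_factors, start=0):
    """Encoder side: replace each scale factor by its difference from the previous one."""
    deltas = []
    previous = start  # hypothetical reference value for the first band
    for sf in scale_factors:
        deltas.append(sf - previous)
        previous = sf
    return deltas


def restore_scale_factors(deltas, start=0):
    """Decoder side: accumulate the decoded differences to recover the true scale factors."""
    restored = []
    previous = start
    for delta in deltas:
        previous += delta
        restored.append(previous)
    return restored


original = [60, 62, 61, 61, 58]
deltas = scale_factor_deltas(original)     # small values cluster near zero
recovered = restore_scale_factors(deltas)
print(deltas, recovered)
```

Because neighbouring scale factors tend to be close, the differences concentrate around zero, which is what makes a subsequent entropy code effective.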

Similar to MPEG AAC, the Dolby AC-3 encoder also uses inter-channel intensity coupling to improve multi-channel signal encoding efficiency.

However, existing stereo coding techniques, including the sum/difference stereo technique and the intensity coupled stereo technique, have certain drawbacks. For example, in sum/difference stereo coding, the encoder quantizes the sum and difference channel signals separately, and the noise in the left and right (L/R) channel signals obtained at the decoder is a superposition of the sum and difference channel quantization noise, which degrades quality. In intensity-coupled coding, if the quantization accuracy is low or the resolution is insufficient, the subjective quality of the decoded audio signal is seriously affected.

In the doctoral thesis "High Fidelity Multichannel Audio Compression", Dai Yang proposed removing inter-channel redundancy with a KLT (Karhunen-Loeve Transform) applied after filtering and before quantization. Since the KLT is the best transform under the minimum mean square error criterion, in this sense it can minimize the redundancy between channels. However, this method introduces a problem that is difficult to solve in the prior art: how to use existing psychoacoustic model technology to quantize the channel coefficients effectively after redundancy removal. If this problem cannot be solved, the method has no practical significance.

In response to the above problem, the international patent application PCT/IB02/01595 (filed May 8, 2002) proposes that, when encoding an audio signal of more than one channel, an integer discrete cosine transform (INT DCT) be applied to the quantized coefficients of the channels to remove inter-channel redundancy. This method addresses the shortcomings of current multi-channel coding methods, but does not solve the problem of two-channel stereo coding efficiency. Moreover, the integer discrete cosine transform used in that application is not an optimal solution for removing redundancy between quantized coefficients (considering the time variation of the source). At the same time, the method inevitably increases the computational complexity of encoding and decoding.

Summary of the invention

The object of the present invention is to overcome the shortcomings of the prior art by providing a method and apparatus for encoding and decoding using inter-channel redundancy, so as to solve the low efficiency and poor quality of stereo coding and decoding in existing stereo and multi-channel audio codecs.

In order to achieve the above object, the present invention provides a method for encoding using inter-channel redundancy, including the following steps:

Step 1. Transform a linear PCM (Pulse Code Modulation) signal into the frequency domain, and calculate a masking threshold of the scale factor band;

Step 2. The frequency domain coefficients of the region are quantized according to the masking threshold of the scale factor band, and the integer coefficients of each channel are obtained;

Step 3: Organizing the integer coefficients according to the principle of maximizing the coding gain, and obtaining channel pairs/groups of specific regions of time-frequency;

Step 4: Perform matrix transformation on the quantized integer coefficients of the channel, and perform the entropy coding and code stream multiplexing output on the transformed channel pair/group integer coefficients.

In step 4, the matrix transformation of the quantized integer coefficients of the channel pair/group adopts an optimal transform mode: among a determined number of integer transforms, KLT transforms, and KLT approximate transforms, the transform with the largest coding gain is selected to encode the quantized integer coefficients of the determined region.

The present invention also provides an apparatus for encoding using inter-channel redundancy, comprising a psychoacoustic module, a modified discrete cosine transform module, a quantizer, an entropy coding and code stream multiplexing module, and a matrix transformation module. The matrix transformation module is configured to organize the integer coefficients of each channel output from the quantizer according to the principle of maximizing the coding gain, obtain channel pairs/groups for specific time-frequency regions, perform matrix transformation on the quantized integer coefficients of the channel pairs/groups, and output the transformed channel pair/group integer coefficients to the entropy coding and code stream multiplexing module. The psychoacoustic module is configured to calculate a masking curve of the current frame signal according to the auditory characteristics of the human ear, and to calculate from the masking curve a masking threshold for each specific time-frequency region, which guides the quantization of the current frame signal. The modified discrete cosine transform module transforms the linear PCM (Pulse Code Modulation) signal into the frequency domain. The quantizer is configured to quantize the frequency domain coefficients of each region output by the modified discrete cosine transform module according to the masking threshold of the specific time-frequency region.

The present invention further provides a method for decoding using inter-channel redundancy, comprising the following steps:

Step 1. Perform inverse matrix transformation on the integer coefficients obtained by code stream demultiplexing and entropy decoding to obtain the integer quantized coefficients;

Step 2. Perform inverse quantization processing on the integer quantized coefficients to recover the frequency domain coefficients;

Step 3. Perform the inverse modified discrete cosine transform on the frequency domain coefficients to obtain a linear PCM signal.

The inverse matrix transformation in step 1 adopts the optimal transform mode, that is, the inverse matrix transform, among a certain number of integer transform modes, KLT transform modes, and KLT approximate transform modes, that is identified by the matrix transform code in the side information and that recovers the integer quantized coefficients as they were at encoding.

The present invention also provides an apparatus for decoding using inter-channel redundancy, which includes a code stream demultiplexing and entropy decoding module, an inverse quantizer, an inverse modified discrete cosine transform module, and an inverse matrix transform module. The inverse matrix transform module is configured to perform inverse matrix transformation on the integer coefficients output from the code stream demultiplexing and entropy decoding module to obtain the integer quantized coefficients. The code stream demultiplexing and entropy decoding module is configured to demultiplex and entropy decode the input compressed bit stream to obtain the integer coefficients. The inverse quantizer is configured to inverse quantize the integer quantized coefficients output from the inverse matrix transform module to recover the frequency domain coefficients. The inverse modified discrete cosine transform module applies the inverse modified discrete cosine transform to the frequency domain coefficients output from the inverse quantizer to obtain a linear PCM signal.

The invention adopts an optimal transform method in encoding and decoding; that is, it performs lossless redundancy removal on the quantized multi-channel coefficients, and it can also be used for lossless two-channel and multi-channel encoding (Lossless Stereo and Multichannel Audio Coding). In lossy coding, the invention further improves the coding efficiency of the audio signal by operating on the spectral coefficients (including transform coefficients and filtered sub-band signals) obtained after transformation (such as the MDCT, QMF subband filtering, and the wavelet transform), frequency domain processing (such as predictive coding, noise shaping, and sum/difference stereo coding), and quantization. In lossless coding, the invention can also be used to remove the statistical redundancy between channel signals (such as time domain PCM samples, sub-band samples, and frequency domain coefficients) for the purpose of signal compression. The stereo coding and decoding efficiency and quality are thereby improved for any stereo or multi-channel audio codec.

Figure 1 is a schematic block diagram of an MPEG-2 AAC encoder in the prior art;

Figure 2 is a schematic block diagram of an MPEG-2 AAC decoder in the prior art;

Figure 3 is a schematic block diagram of an encoder of the present invention;

Figure 4 is a schematic block diagram of a decoder of the present invention.

Detailed description

The invention will be described in detail below with reference to the drawings and specific embodiments.

A method of encoding using inter-channel redundancy includes the following steps:

Step 1. Transform the linear PCM signal into the frequency domain, and calculate the masking threshold of the scale factor band;

Step 2. Quantize the frequency domain coefficients of the region according to the masking threshold of the scale factor band, and obtain the integer coefficients of each channel;

Step 3. Organize the integer coefficients according to the principle of maximizing the coding gain, and obtain the channel pairs/groups of specific time-frequency regions;

Step 4: Perform matrix transformation on the channel pair/group quantized integer coefficients, and perform the entropy coding and code stream multiplexing output on the transformed channel pair/group integer coefficients.

Whether in lossy coding or lossless coding, the channel coefficients processed by the present invention are all integers and are treated in much the same way. (For convenience of description, the time domain samples, sub-band samples, and frequency domain coefficients processed below are collectively referred to as "coefficients".) Therefore, the following description does not distinguish between "lossy coding" and "lossless coding".

Specifically, the above method in combination with the device will be described in detail. A block diagram of a device for encoding with inter-channel redundancy is shown in FIG. 3. The linear PCM signals are input to a modified discrete cosine transform module 301 and a psychoacoustic model 305, respectively, and the modified discrete cosine transform module 301 converts the PCM signals into the frequency domain. As in MPEG AAC, the modified discrete cosine transform window function and block length can be switched according to signal characteristics to ensure sufficient time-frequency resolution and effectively remove intra-channel time domain redundancy. The psychoacoustic model 305 is used to calculate a masking curve of the current frame signal according to the auditory characteristics of the human ear, and the masking threshold of the specific time-frequency region can be calculated according to the masking curve for guiding the quantization of the current frame signal.

The frequency domain coefficients obtained by the modified discrete cosine transform module 301 are sent to the quantizer 302. The quantizer is composed of a set of sub-quantizers, each of which quantizes the frequency domain coefficients of one region according to the masking threshold of that specific time-frequency region; such a region is usually referred to as a scale factor band. The quantizer has a bit allocation mechanism that controls the number of bits each sub-quantizer can use, so that the number of bits spent on quantizing the frequency domain coefficients of the current frame does not exceed the allowed limit while quantization distortion is minimized. The bit allocation strategy can be any common strategy, such as the rate control method of MPEG AAC. The quantizer can be a scalar quantizer or a vector quantizer, such as the scalar quantizer of MPEG AAC or the vector quantizer of MPEG TwinVQ.

After quantization, the integer coefficients are sent to the matrix transformation module 303. The matrix transformation module 303 organizes the quantized integer coefficients of the respective channels according to the principle of maximizing the coding gain, obtaining channel pairs/groups for specific time-frequency regions. The channel pairs/groups of different time-frequency regions (time segments for time domain samples, frequency segments for frequency domain coefficients, and time-frequency regions for sub-band samples) may differ. When the encoder selects channel pairs, typically the correlation between the left channel (L) and the right channel (R) is high, as is the correlation between the left surround channel (LS) and the right surround channel (RS), so the L/R pair and the LS/RS pair are often obtained. When multiple channel-pair organization modes are used, the channel-pair organization information must be encoded as control information. When organizing by channel group, the following groups often appear: left/right/center channels; left front/right front/left center/right center/center channels; left surround/right surround/back surround channels; and so on.
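One simple way to arrive at pairs such as L/R and LS/RS is to pair channels greedily by the magnitude of their correlation. The sketch below is an assumption made for illustration: the text only requires that the organization maximize the coding gain, and correlation is used here as a cheap proxy. The function names and the sample data are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length coefficient sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def greedy_pairs(channels):
    """Pair channels in descending order of |correlation|; each channel is used at most once."""
    names = sorted(channels)
    candidates = sorted(
        ((abs(correlation(channels[a], channels[b])), a, b)
         for i, a in enumerate(names) for b in names[i + 1:]),
        reverse=True,
    )
    used, pairs = set(), []
    for _, a, b in candidates:
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    return pairs


# Quantized integer coefficients of one time-frequency region (made-up values).
region = {
    "L": [4, -3, 7, 0, -5, 2],
    "R": [5, -3, 6, 1, -5, 2],
    "LS": [1, -1, 1, -1, 1, -1],
    "RS": [1, -1, 2, -1, 1, -1],
}
pairs = greedy_pairs(region)
print(pairs)
```

Running the selection per time-frequency region allows the pairing to differ between regions, as the text permits; the chosen organization would then be signalled as control information.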

An "optimal transform" is applied to the quantized integer coefficients within each channel pair/group to remove inter-channel redundancy.

The so-called optimal transform is the one, selected from a determined number of integer transforms, KLT transforms, and any transforms used to approximate the KLT transform, whose coding gain is the largest. When the KLT transform or an approximation of the KLT transform is selected for encoding, the LIFTING algorithm is used so that integer coefficients are transformed into integer coefficients.

The so-called maximum coding gain means that the number of bits used is the least when encoding a specific signal at a specific quality.
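Under this definition, candidate representations can be compared by an estimate of the bits each would cost. The sketch below uses the empirical zeroth-order entropy as that estimate; this is an assumption made for illustration, since the text does not prescribe how the bit count is obtained, and the function name is hypothetical.

```python
import math
from collections import Counter


def estimated_bits(coeffs):
    """Estimate the bits needed to entropy-code a block from its empirical symbol distribution."""
    n = len(coeffs)
    if n == 0:
        return 0.0
    counts = Counter(coeffs)
    # Sum over symbols of count * (-log2 of relative frequency).
    return -sum(c * math.log2(c / n) for c in counts.values())


left = [3, 5, 7, 9]
right = [3, 5, 7, 9]
difference = [a - b for a, b in zip(left, right)]  # all zeros for identical channels

cost_separate = estimated_bits(left) + estimated_bits(right)
cost_with_difference = estimated_bits(left) + estimated_bits(difference)
print(cost_separate, cost_with_difference)
```

At equal quality (the coefficients are transformed losslessly), the representation with the smaller estimate has the larger coding gain.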

The integer transform refers to a transform in which each coefficient of the transform matrix A is an integer, and for which there exists an inverse matrix B (each coefficient of which is also an integer) such that BA = I, where I is the identity matrix. For example, when a channel pair is used, let L and R denote the quantized integer coefficients of the two channels of the pair (L and R here stand for any channels that may appear in the encoding and should not be read as merely the "left channel" and "right channel"), and let L' and R' denote the integer coefficients produced by the integer transform. For each channel pair, the integer transform is applied to the channel-pair integer coefficients within a given resolution scale (for example, the so-called "scale factor band"):

$$\begin{bmatrix} L' \\ R' \end{bmatrix} = A \begin{bmatrix} L \\ R \end{bmatrix} \qquad (1)$$

such that the number of bits used to encode L' and R' is less than the number of bits used to encode L and R.
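For a 2x2 integer matrix, an integer inverse exists exactly when the determinant is +1 or -1, which gives a quick validity check for a candidate integer transform. The sketch below is illustrative only; the helper names and the example matrix are assumptions, not matrices prescribed by the text.

```python
def integer_inverse_2x2(a):
    """Return the integer inverse of a 2x2 integer matrix, or raise if none exists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det not in (1, -1):
        raise ValueError("no integer inverse: determinant must be +1 or -1")
    # inverse = adjugate / det, and 1/det == det when det is +1 or -1
    return ((det * s, -det * q), (-det * r, det * p))


def matmul_2x2(x, y):
    """Multiply two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def apply_2x2(m, first, second):
    """Apply a 2x2 matrix to a pair of coefficients."""
    return (m[0][0] * first + m[0][1] * second,
            m[1][0] * first + m[1][1] * second)


A = ((1, 0), (1, -1))          # example: keep first channel, replace second by (first - second)
B = integer_inverse_2x2(A)

transformed = apply_2x2(A, 12, 11)      # forward transform of one coefficient pair
restored = apply_2x2(B, *transformed)   # inverse transform recovers the pair exactly
print(B, transformed, restored)
```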

When a channel group is used, the method and channel pairing are similar.

The so-called KLT transform refers to a signal-adaptive matrix whose row vectors are the eigenvectors of the multi-channel coefficient covariance matrix. Since the KLT transformation matrix is an orthogonal matrix, it can be decomposed into Givens rotation matrices and approximated by the LIFTING algorithm, so that an integer result is obtained.

When the KLT transform is used, the encoder calculates the covariance matrix Φ of the signal from the time domain signal and then calculates the orthogonal matrix Q from Φ. The methods for calculating the covariance matrix Φ and the orthogonal matrix Q are covered in signal processing and linear algebra books, such as "Digital Signal Processing: Theory, Algorithm and Implementation", edited by Hu Guangshu, Tsinghua University Press, 1997.
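For a channel pair, the covariance matrix is 2x2 and its eigenvectors can be described by a single rotation angle. The sketch below uses the textbook principal-axis formula tan(2θ) = 2·C_LR / (C_LL − C_RR); the function names, the mean removal, and the sign convention of the rotation matrix are assumptions made here, not details taken from the source.

```python
import math


def covariance_2ch(first, second):
    """Return (C_LL, C_LR, C_RR) for two equal-length coefficient sequences."""
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    c_ll = sum((a - m1) ** 2 for a in first) / n
    c_rr = sum((b - m2) ** 2 for b in second) / n
    c_lr = sum((a - m1) * (b - m2) for a, b in zip(first, second)) / n
    return c_ll, c_lr, c_rr


def klt_angle(c_ll, c_lr, c_rr):
    """Rotation angle that diagonalizes [[C_LL, C_LR], [C_LR, C_RR]]; lies in (-pi/2, pi/2]."""
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)


def klt_matrix(theta):
    """Orthogonal matrix whose rows are the eigenvectors (assumed convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, s), (-s, c))


c_ll, c_lr, c_rr = covariance_2ch([1, 2, 3, 4], [1, 2, 3, 4])
theta = klt_angle(c_ll, c_lr, c_rr)   # identical channels give a 45 degree rotation
print(theta, klt_matrix(theta))
```

Only the angle needs to be conveyed to a decoder in this two-channel case, since the matrix can be rebuilt from it.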

To achieve a lossless transformation from integer coefficients to integer coefficients, the KLT transform must be approximated using the so-called LIFTING algorithm. For the LIFTING algorithm, see related documents such as "Factoring Wavelet Transforms into Lifting Steps" (I. Daubechies, W. Sweldens, Tech. Rep., Bell Laboratories, Lucent Technologies, 1996).

Here, the calculation of the KLT transformation matrix and its LIFTING algorithm are illustrated by taking only the channel pair as an example.

As before, assume the analysis area contains

$$L(n),\ R(n), \qquad 0 \le n \le N \qquad (2)$$

where L and R are the quantized integer coefficients and N is the size of the analysis area.

The covariance matrix is

$$\Phi = \begin{bmatrix} C_{LL} & C_{LR} \\ C_{LR} & C_{RR} \end{bmatrix} \qquad (3)$$

where $C_{LL}$, $C_{RR}$, and $C_{LR}$ are the covariance coefficients.

The corresponding KLT orthogonal transformation matrix Q:

(4)

The orthogonal matrix Q happens to be a Givens rotation matrix, so it can be decomposed into the following form:

(5)

According to the LIFTING algorithm, after each transformation, the coefficients can be rounded and do not affect the complete reversibility of the system. When using channel group coding, the KLT transformation matrix and the LIFTING algorithm are similar to the channel pair method.
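The reversibility claimed here can be illustrated with the standard three-step lifting factorization of a plane rotation. The sketch below is a textbook construction under stated assumptions: the rotation is taken as [[cos θ, sin θ], [−sin θ, cos θ]], the lifting multipliers are tan(θ/2) and −sin θ, and rounding is round-half-up; none of these specifics are taken from the source, and the function names are hypothetical.

```python
import math


def _round(x):
    """Round half up; encoder and decoder must use the same rounding rule."""
    return math.floor(x + 0.5)


def lifting_rotate(a, b, theta):
    """Integer approximation of [[cos, sin], [-sin, cos]] applied to (a, b) via three lifting steps."""
    p = math.tan(theta / 2.0)
    u = -math.sin(theta)
    a = a + _round(p * b)   # shear 1
    b = b + _round(u * a)   # shear 2
    a = a + _round(p * b)   # shear 3
    return a, b


def lifting_unrotate(a, b, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    p = math.tan(theta / 2.0)
    u = -math.sin(theta)
    a = a - _round(p * b)
    b = b - _round(u * a)
    a = a - _round(p * b)
    return a, b


rotated = lifting_rotate(10, 10, math.pi / 4)        # two equal coefficients
recovered = lifting_unrotate(*rotated, math.pi / 4)  # integers come back exactly
print(rotated, recovered)
```

Each step changes only one coefficient by a rounded function of the other, so subtracting the same rounded value undoes it exactly regardless of the rounding error. Keeping |θ| at or below π/2 keeps the multiplier tan(θ/2) bounded by 1 in magnitude.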

An approximate transform of the KLT transform is a transform used to approximate the KLT under certain premises (such as the statistical properties of the source and the computational complexity). Although the KLT is the optimal transform in the mean square error sense, its computational cost and side information are large. Other transforms can therefore be used to approximate the KLT in order to reduce the computation and/or the side information, such as the DFT (discrete Fourier transform), DCT (discrete cosine transform), and DST (discrete sine transform).

In the approximate transformation of KLT transformation, in order to guarantee the lossless transformation from integer to integer, the LIFTING algorithm is also needed to transform, and the calculation process is the same as KLT's LIFTING algorithm.

The so-called optimal transformation means that in a certain number of integer transformations, KLT transformations (LIFTING implementation) and KLT approximate transformations (LIFTING implementation), the transformation with the largest coding gain is selected for encoding the determined region.

In the specific coding apparatus, the matrix transformation module includes a determined number of integer transform units, KLT transform units, and KLT approximate transform units, and the matrix transformation mode is selected from a certain number of integer transform modes, KLT transform modes, and KLT approximate transform modes (such as DFT, DCT, DST, etc.). For example, M integer transform modes may be selected, with codes $A_1, A_2, \ldots, A_M$, where M is an integer not less than 1; let the code of the KLT transform be $A_{M+1}$, and the codes of the KLT approximate transforms (such as DFT, DCT, DST, etc.) be $A_{M+2}, \ldots, A_N$, where N is an integer greater than 2. Given the coding gain of each transform mode $A_i$ (1 ≤ i ≤ N), a decision switch module lets the encoder adaptively select the transform mode with the largest coding gain, eliminating the inter-channel redundancy of the encoded signal to the greatest extent. The code of the selected transform mode and other necessary information are written as side information into the compressed bit stream to control the decoder for accurate decoding.

For each channel pair, it can be handled as follows to reduce the number of bits required for encoding.

For example, three transform modes may be chosen, with codes $A_1$, $A_2$, and $A_3$, where $A_1$ and $A_2$ are two integer transform modes and $A_3$ is the KLT transform mode:

$$A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} 1 & 0 \\ 1 & -1 \end{bmatrix}, \qquad A_3 = Q \qquad (6)$$

where the value of θ in Q is as given in equations (4) and (5). When transform $A_1$ is used, the quantized integer coefficients of the channel pair are left unprocessed. When transform $A_2$ is used, the quantized integer coefficients of the first channel of the pair are unchanged, and the integer coefficients of the second channel after the transform are the quantized integer coefficients of the original first channel minus those of the second channel. When transform $A_3$ is used, the KLT transform removes the redundancy between the channel coefficients, and in addition to the code of the transform mode, the value of θ must also be encoded.
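The choice between the two integer modes described above (coefficients left unchanged, or second channel replaced by first minus second) can be sketched as a comparison of estimated bit costs. This is a simplified illustration: the KLT mode is left out for brevity, the bit estimate is an empirical entropy rather than an actual entropy coder, and the numeric mode codes 1 and 2 and all function names are assumptions.

```python
import math
from collections import Counter


def estimated_bits(coeffs):
    """Empirical-entropy estimate of the bits needed to code a block of integers."""
    n = len(coeffs)
    if n == 0:
        return 0.0
    return -sum(c * math.log2(c / n) for c in Counter(coeffs).values())


def mode_unchanged(first, second):
    """Leave both channels as they are."""
    return list(first), list(second)


def mode_difference(first, second):
    """Keep the first channel; replace the second by (first - second)."""
    return list(first), [a - b for a, b in zip(first, second)]


MODES = {1: mode_unchanged, 2: mode_difference}  # hypothetical mode codes


def select_mode(first, second):
    """Return (mode_code, transformed_first, transformed_second) with the lowest estimated cost."""
    best = None
    for code, transform in MODES.items():
        t1, t2 = transform(first, second)
        bits = estimated_bits(t1) + estimated_bits(t2)
        if best is None or bits < best[0]:
            best = (bits, code, t1, t2)
    return best[1], best[2], best[3]


code_correlated, _, second_out = select_mode([3, 5, 7, 9], [3, 5, 7, 9])
code_independent, _, _ = select_mode([1, 0, 1, 0], [0, 0, 0, 0])
print(code_correlated, second_out, code_independent)
```

The selected code is what would be written to the side information so that a decoder can apply the matching inverse.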

The decision switch 306 for the transformation matrix selects the optimal transform mode from among the determined number of integer transform units, KLT transform units, and KLT approximate transform units in the matrix transformation module, and the code of the selected optimal transform mode is encoded as side information.

Considering the limit on the bandwidth occupied by side information, when control information such as the channel-pair organization mode and the matrix transform code is encoded, the matrix transform type may be selected per scale factor band, and the code of the selected matrix transform is encoded. Under a first condition, transform mode $A_1$ is adopted, that is, the channel coefficients are left unchanged; under a second condition, the integer transform mode $A_2$ is used; in all other cases, transform mode $A_3$ is used. The selected transform mode $A_1$, $A_2$, or $A_3$ is written as side information into the compressed bit stream to control the decoder for accurate decoding.

After transformation, the integer coefficients are sent to the entropy coding and code stream multiplexing module 304. In the entropy coding and code stream multiplexing module 304, the statistical redundancy of the integer coefficients can be removed by the effective entropy coding, and then the entropy coding result is multiplexed with the other control information into the compressed bit stream, and output to the transmission. Channel or storage medium. Here, the entropy coding may employ an encoding method such as Huffman coding, run length coding, and arithmetic coding.

The present invention also discloses a decoding method and apparatus using inter-channel redundancy. As shown in FIG. 4, the apparatus includes a code stream demultiplexing and entropy decoding module, an inverse matrix transform module, an inverse quantizer, and an inverse modified discrete cosine transform module, and the method comprises the following steps:

Step 1. The compressed bit stream is demultiplexed and entropy decoded by the code stream demultiplexing and entropy decoding module to obtain the integer coefficients and the side information used to determine which inverse matrix transform is to be applied;

Step 2. The integer coefficients are inverse matrix transformed by the inverse matrix transform module to obtain the integer quantized coefficients;

Step 3: The integer quantization coefficient transformed by the inverse matrix is inverse quantized by the inverse quantizer to recover the frequency domain coefficient;

Step 4: The frequency domain coefficient is subjected to inverse modified discrete cosine transform by an inverse modified discrete cosine transform module to obtain a linear PCM signal.

Here, the inverse matrix transformation applied in step 2 is the one of the above transformation methods indicated by the transformation mode code in the side information obtained in step 1.
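The selection by mode code can be sketched as a table lookup performed per band. In this minimal sketch the mode code values, the table `INVERSE_BY_MODE`, and the two matrices in it are hypothetical placeholders chosen for illustration; they are not the matrices of this embodiment.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Tuple[Tuple[int, int], Tuple[int, int]]

# Hypothetical mode codes and inverse matrices, for illustration only.
INVERSE_BY_MODE: Dict[int, Matrix] = {
    0: ((1, 0), (0, 1)),   # coefficients were left unchanged by the encoder
    1: ((1, -1), (0, 1)),  # inverse of the integer matrix ((1, 1), (0, 1))
}


def inverse_transform_band(
    mode: int, ch0: Sequence[int], ch1: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Apply the inverse matrix selected by the mode code to one band."""
    if mode not in INVERSE_BY_MODE:
        raise ValueError(f"unknown transform mode code: {mode}")
    (b00, b01), (b10, b11) = INVERSE_BY_MODE[mode]
    out0 = [b00 * x + b01 * y for x, y in zip(ch0, ch1)]
    out1 = [b10 * x + b11 * y for x, y in zip(ch0, ch1)]
    return out0, out1
```

Rejecting an unknown mode code, rather than silently falling back to the identity, keeps a corrupted side information field from producing plausible but wrong coefficients.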

When the matrix transform module performs integer transform using equation (1), the integer transform may be used to restore the integer quantized coefficients.

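$$\begin{bmatrix} \hat{L} \\ \hat{R} \end{bmatrix} = B \begin{bmatrix} L' \\ R' \end{bmatrix}, \qquad BA = I \tag{7}$$

Where: $L'$ and $R'$ are the integer coefficients obtained by demultiplexing and entropy decoding; $\hat{L}$ and $\hat{R}$ are the integer coefficients recovered by the integer transformation.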
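The condition BA = I can be checked numerically. The sketch below uses an illustrative integer matrix A with determinant 1, which is an assumption and not the matrix of equation (1); with a determinant of +1 or -1 the inverse B is also integer-valued, so the decoder recovers the quantized coefficients exactly. The helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def matmul2(p: Matrix, q: Matrix) -> Matrix:
    """Product of two 2x2 integer matrices."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def integer_inverse(a: Matrix) -> Matrix:
    """Inverse of a 2x2 integer matrix whose determinant is +1 or -1."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det not in (1, -1):
        raise ValueError("determinant must be +1 or -1 for an integer inverse")
    # Adjugate divided by det; dividing by +/-1 equals multiplying by det.
    return [
        [det * a[1][1], -det * a[0][1]],
        [-det * a[1][0], det * a[0][0]],
    ]


def apply(m: Matrix, x: int, y: int) -> Tuple[int, int]:
    """Multiply the 2x2 matrix m by the column vector (x, y)."""
    return m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y


A = [[1, 1], [1, 2]]    # illustrative encoder matrix, det = 1
B = integer_inverse(A)  # decoder matrix satisfying BA = I
```

Applying `A` at the encoder and `B` at the decoder returns the original pair of coefficients with no rounding involved, since every intermediate value is an integer.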

When using the KLT transformation method, the following steps are included:

Step 1a: Obtain a covariance matrix or corresponding parameters from the code stream (such as those of equation (4));

Step 1b: Calculate the KLT transformation matrix according to the covariance matrix or corresponding parameters;

Step 1c: For the KLT transformation matrix, use the LIFTING algorithm to restore the channel pair integer quantized coefficients.

When an approximate transformation method of the KLT is used, the LIFTING algorithm is applied to that approximate transformation to recover the channel pair integer quantized coefficients.
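For a single channel pair, the KLT reduces to a plane rotation, and a rotation can be factored into three lifting steps that map integers to integers and are exactly invertible. The sketch below illustrates that idea under stated assumptions: the closed-form angle, the rounding rule, and the function names are illustrative choices, not the factorization specified in this document.

```python
import math
from typing import Sequence, Tuple


def klt_angle(left: Sequence[int], right: Sequence[int]) -> float:
    """Rotation angle of the 2x2 KLT, from the channel covariance terms."""
    n = len(left)
    c_ll = sum(l * l for l in left) / n
    c_rr = sum(r * r for r in right) / n
    c_lr = sum(l * r for l, r in zip(left, right)) / n
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)


def _rnd(v: float) -> int:
    """Deterministic rounding shared by encoder and decoder."""
    return math.floor(v + 0.5)


def lifting_forward(x: int, y: int, theta: float) -> Tuple[int, int]:
    """Integer approximation of a rotation by -theta, as three lifting steps."""
    p, u = math.tan(theta / 2.0), -math.sin(theta)
    x += _rnd(p * y)
    y += _rnd(u * x)
    x += _rnd(p * y)
    return x, y


def lifting_inverse(x: int, y: int, theta: float) -> Tuple[int, int]:
    """Undo lifting_forward exactly: same steps, reverse order, opposite sign."""
    p, u = math.tan(theta / 2.0), -math.sin(theta)
    x -= _rnd(p * y)
    y -= _rnd(u * x)
    x -= _rnd(p * y)
    return x, y
```

Each lifting step changes only one of the two values by a rounded function of the other, so the decoder can subtract the same rounded quantity and recover the input exactly, provided it derives the same angle from the covariance parameters carried as side information.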

At the decoding end, the compressed bit stream is demultiplexed and entropy decoded in module 401 to obtain the integer coefficients and the side information that determines which inverse matrix transform method is used, and the integer coefficients are sent to the inverse matrix transform module 402. In this embodiment, when the encoder uses the three matrix transformation modes of equation (6), the corresponding inverse matrix transforms are

( 8 )

Based on the side information obtained from module 401, the inverse matrix transform module 402 selects the inverse matrix transform method that recovers the integer quantized coefficients as they were at the time of encoding.

The integer quantized coefficients obtained by the inverse matrix transform are sent to the inverse quantization module 403 for inverse quantization processing. The recovered frequency domain coefficients are fed to the inverse modified discrete cosine transform module 404 to obtain a linear PCM audio signal.

The inverse matrix transform module includes an integer transform unit, a KLT transform unit, and an approximate transform unit of the KLT. The matrix transform code in the side information selects which of these units performs the inverse matrix transform on the integer coefficients output by the code stream demultiplexing and entropy decoding module, and the transformed integer quantized coefficients are output to the inverse quantizer.

The present invention has been described in detail with reference to the various embodiments above, but those skilled in the art will understand that the invention may be modified or equivalently substituted; technical solutions and improvements that do not depart from the spirit and scope of the present invention are intended to be included in the scope of the claims of the present invention.

Claims

1. A method for encoding by using inter-channel redundancy, comprising the steps of:

Step 1: transforming a linear PCM signal into the frequency domain, and calculating a masking threshold of the scale factor band;

Step 2: quantizing the frequency domain coefficients of the region according to the masking threshold of the scale factor band, to obtain the integer coefficients of each channel;

Step 3: organizing the integer coefficients according to the principle of maximizing the coding gain, to obtain channel pairs/groups of specific regions in the time-frequency domain;

Step 4: performing matrix transformation on the quantized integer coefficients of the channel pairs/groups, and outputting the transformed channel pair/group integer coefficients after entropy coding and code stream multiplexing.

2. The method for encoding using inter-channel redundancy according to claim 1, wherein in step 4, the matrix transformation of the quantized integer coefficients of the channel pair/group is performed as follows: from a determined number of integer transforms, KLT transforms, and KLT approximate transforms, the transform having the largest coding gain is selected and used to encode the quantized integer coefficients of the determined region.

3. The method for encoding by using inter-channel redundancy according to claim 2, wherein the process of performing the integer transform on the quantized integer coefficients of the channel pair/group is:

$$\begin{bmatrix} L' \\ R' \end{bmatrix} = A \begin{bmatrix} L \\ R \end{bmatrix}$$

where $L$ and $R$ are the quantized integer coefficients; $L'$ and $R'$ are the integer coefficients obtained by the integer transformation; and the elements of $A$ are integers such that $BA = I$, where $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ is the unit matrix.

4. The method of encoding using inter-channel redundancy according to claim 2, wherein the approximate transform of the KLT is an FFT, a DCT, or a DST, and the transform is performed by using the LIFTING algorithm.

5. The method of encoding using inter-channel redundancy according to claim 2, wherein the orthogonal matrix Q of said KLT transform is:

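Wherein the covariance matrix is expressed as:

$$\Phi_x = \begin{bmatrix} C_{LL} & C_{LR} \\ C_{RL} & C_{RR} \end{bmatrix}$$

where

$$C_{XY} = \frac{1}{N} \sum_{n=0}^{N-1} X(n)\,Y(n), \qquad X, Y \in \{L(n), R(n)\}, \quad 0 \le n \le N-1$$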

6. A method of encoding using inter-channel redundancy according to claim 2 or 3 or 4 or 5, wherein, when a transform is selected from a determined number of integer transforms, KLT transforms, and KLT approximate transforms, the code of the selected transform is encoded as side information.

7. A method of encoding using inter-channel redundancy according to claim 2 or 5, wherein when the KLT transform is selected, a covariance matrix or a corresponding parameter is encoded as side information.

8. A device for encoding with inter-channel redundancy, comprising a psychoacoustic module, a modified discrete cosine transform module, a quantizer, and an entropy coding and code stream multiplexing module, and further comprising a matrix transformation module, wherein

The matrix transformation module is configured to organize the integer coefficients of each channel output from the quantizer according to the principle of maximizing the coding gain, obtain the channel pairs/groups of the scale factor band, perform matrix transformation on the channel pairs/groups, and output the transformed channel pair/group integer coefficients to the entropy coding and code stream multiplexing module;

The psychoacoustic module is configured to calculate a masking curve of the current frame signal according to the auditory characteristics of the human ear, and to calculate a masking threshold of the scale factor band according to the masking curve, the masking threshold of the scale factor band being used to guide the quantization of the current frame signal;

The modified discrete cosine transform module is configured to transform a linear PCM signal into the frequency domain; and the quantizer is configured to quantize the frequency domain coefficients of a specific time-frequency region, output from the modified discrete cosine transform module, according to the masking threshold of that region.

9. The apparatus for encoding using inter-channel redundancy according to claim 8, wherein the matrix transformation module comprises an integer transform unit, a KLT transform unit, and an approximate transform unit of a KLT, and wherein the integer transform unit, the KLT transform unit, and the approximate transform unit of the KLT are each configured to perform matrix transformation on the channel pair and to output the transformed channel pair integer coefficients to the entropy coding and code stream multiplexing module.

10. The apparatus for encoding using inter-channel redundancy according to claim 9, wherein the apparatus further comprises a judgment switch module configured to select the optimal transformation mode from among the integer transform unit, the KLT transform unit, and the approximate transform unit of the KLT in the matrix transformation module, and to encode the control information.

11. A method for decoding by using inter-channel redundancy, comprising the following steps: Step 1: performing inverse matrix transformation on the integer coefficients obtained by code stream demultiplexing and entropy decoding to obtain integer quantized coefficients;

12. A method of decoding using inter-channel redundancy according to claim 11, wherein the inverse matrix transformation method in step 1 is the inverse matrix transform method, among a determined number of integer transformation modes, KLT transformation modes, and KLT approximate transformation modes, that is determined by the transformation mode code in the side information obtained by code stream demultiplexing and entropy decoding and that recovers the integer quantized coefficients as at the time of encoding.

13. The method for decoding by inter-channel redundancy according to claim 12, wherein when the inverse matrix transform uses an integer transform method, the integer quantized coefficients of the channel pair/group before the transform are directly restored by the integer transform.

14. The method for decoding by using inter-channel redundancy according to claim 12, wherein when the inverse matrix transformation adopts the KLT transformation mode, the following steps are included:

Step 1a: obtaining a covariance matrix or its corresponding parameters from the code stream;

Step 1b: calculating a KLT transformation matrix according to the covariance matrix or corresponding parameters;

Step 1c: using the LIFTING algorithm on the KLT transformation matrix to recover the channel pair integer quantized coefficients.

15. The method for decoding by using inter-channel redundancy according to claim 12, wherein when the inverse matrix transformation adopts an approximate transformation mode of the KLT, the LIFTING algorithm is applied to that approximate transformation mode to recover the channel pair/group integer quantized coefficients.

16. An apparatus for decoding using inter-channel redundancy, comprising a code stream demultiplexing and entropy decoding module, an inverse quantizer, and an inverse modified discrete cosine transform module, and further comprising an inverse matrix transform module, wherein

The inverse matrix transform module is configured to perform inverse matrix transform on the integer coefficients output from the code stream demultiplexing and entropy decoding module to obtain integer quantized coefficients;

The code stream demultiplexing and entropy decoding module is configured to demultiplex and entropy decode the input compressed bit stream to obtain the integer coefficients;

The inverse quantizer is configured to perform inverse quantization processing on the integer quantized coefficients output from the inverse matrix transform module to recover frequency domain coefficients;

The inverse modified discrete cosine transform module is configured to perform inverse modified discrete cosine transform on the frequency domain coefficients output from the inverse quantizer to obtain a linear PCM signal.

17. The apparatus for decoding using inter-channel redundancy according to claim 16, wherein the inverse matrix transform module comprises an integer transform unit, a KLT transform unit, and an approximate transform unit of a KLT, and wherein the matrix transform code in the side information obtained by the code stream demultiplexing and entropy decoding module determines whether the integer transform unit, the KLT transform unit, or the approximate transform unit of the KLT performs the inverse matrix transform on the integer coefficients output from the code stream demultiplexing and entropy decoding module, the transformed integer quantized coefficients being output to the inverse quantizer.