WO2007078254A2

WO2007078254A2 - Personalized decoding of multi-channel surround sound

Info

Publication number: WO2007078254A2
Application number: PCT/SE2007/000006
Authority: WO
Inventors: Anisse Taleb; Erlendur Karlsson
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2006-01-05
Filing date: 2007-01-05
Publication date: 2007-07-12
Also published as: BRPI0706285A2; CN101433099A; RU2008132156A; EP1969901A2; WO2007078254A3

Abstract

A parametric multi-channel surround audio bitstream is received in a multi-channel decoder (13). The received spatial parameters are in combining unit (37) transformed into a new set of spatial parameters that are used in order to obtain a decoding of the multi-channel surround sound that is not a simple equivalent of the original input multi-channel surround signal but e.g. may be personalized by making the transformation based on a representation of user head related filters obtained from a unit (43). Such personalized spatial parameters may also be obtained by combining the received spatial parameters and a representation of the user head related filters with a set of additional rendering parameters that for example are interactively determined by the user and thus are time dependent.

Description

PERSONALIZED DECODING OF MULTI-CHANNEL SURROUND SOUND RELATED APPLICATION

This application claims priority and benefit from U.S. provisional patent application No. 60/743,096, filed January 5, 2006, the entire teachings of which are incorporated herein by reference.

TECHNICAL FIELD

The present invention is related to decoding a multi-channel surround audio bitstream. BACKGROUND

In film theaters around the world, multi-channel surround audio systems have since long placed film audiences in the center of the audio spaces of the film scenes that are being played before them and are giving them a realistic and convincing feeling of "being there". This audio technology has moved into the homes of ordinary people as home surround sound theatre systems and is now providing them with the sense of "being there" in their own living rooms.

The next field where this audio technology will be used includes mobile wireless units or terminals, in particular small units such as cellular telephones and PDAs. There the immersive nature of the surround sound is even more important because of the small sizes of the displays.

Moving this technology to mobile units is, however, not a trivial matter. The main obstacles include that:

1. The available bit-rate is in many cases low in wireless mobile channels. 2. The processing power of mobile terminals is often limited.

3. Small mobile terminals generally have only two micro speakers and earplugs or headphones.

This means, in particular for mobile terminals such as cellular telephones, that a surround sound solution for a mobile terminal has to use a much lower bit rate than the 384 kbits/s used in the Dolby Digital 5.1 system. Due to the limited processing power, the decoders of the mobile terminals must be computationally optimized and due to the speaker configuration of the mobile terminal, the surround sound must be delivered through the earplugs or headphones.

A standard way of delivering multi-channel surround sound through headphones or earplugs is to perform a 3D audio or binaural rendering of each of the speaker signals.

In general, in 3D audio rendering a model of the audio scene is used and each incoming monophonic signal is filtered through a set of filters that model the transformations created by the human head, torso and ears. These filters are called head related filters (HRFs) having head related transfer functions (HRTFs) and if appropriately designed, they give a good 3D audio scene perception.

The diagram of Fig. 1 illustrates a method of complete 3D audio rendering of an audio signal according to the Dolby Digital 5.1 system. The six multi-channel signals according to the Dolby Digital 5.1 system are: - surround right (SR), - right (R),

- center (C),

- low frequency (LFE), 5 - left (L)

- surround left (SL).

In the example illustrated in Fig. 1 the center and low frequency signals are combined into one signal. Then, five different filters Hf , Hf ,H^c , Hf and H_c ^r are needed in order to implement this method of head related filtering. The SR signal is input to filters Hf and Hf. , the o R signal is input to filters Hf and Hf. , the C and LFE signals are jointly input to filter H^c , the L signal is input to filters Hf. and Hf and the SL signal is input to filters Hf. and Hf . The signals output from the filters Hf , Hf , H^c , Hf. and Hf, are summed in a right summing element IR to give a signal intended to be provided to the right headphone, not shown. The signals output from the filters Hf. , Hf, , H^c , Hf and Hf are summed in a left summing 5 element IL to give a signal intended to be provided to the left headphone, not shown.

The quality in terms of 3D perception of such rendering depends on how closely the HRFs model or represent the listener's own head related filtering when she/he is listening. Hence, it may be advantageous if the HRFs can be adapted and personalized for each listener if a good or very good quality is desired. This adaptation and personalization step may include modeling, measure- 0 ment and in general a user dependent tuning in order to refine the quality of the perceived 3D audio scene.

Current state-of-the-art standardized multi-channel audio codecs require a high amount of bandwidth or a high bit-rate in order to reach an acceptable quality, and thus they prohibit the use of such codecs for services such as wireless mobile streaming. 5 For instance, even if the Dolby Digital 5.1 system (AC-3 codec) has very low complexity when compared to an AAC multi-channel codec, it requires a much higher bit-rate for similar quality. Both codecs, the AAC multi-channel codec and the AC-3 codec, remain until today unusable in the wireless mobile domain because of the high demands the high demands that they make on computational complexity and bitrate. 0 New parametric multi-channel codecs based on the principles of binaural cue coding have been developed. The recently standardized parametric stereo tool is a good example of the low complexity/high quality parametric technique for encoding stereophonic sound. The extension of parametric stereo to multi-channel coding is currently under standardization in MPEG under the name Spatial Audio coding, and is also known as MPEG-surround. The principles of parametric multi-channel coding can be explained and understood from the block diagram of Fig. 2 that illustrates a general case. A parametric surround encoder 3, also called a multi-channel parametric surround encoder, receives a multichannel, composite audio signal comprising the individual signals X₁ (ή) to x_N (ή) , where N is the number of input channels. For a Dolby Digital 5.1 surround system N = 6 as stated above. The encoder 3 then forms in down-mixing unit 5 a composite down-mixed signal comprising the individual down-mixed signals Z₁(H) to z_M(ή) . The number M of down-mixed channels (M < N) is dependent upon the required or allowable maximum bit-rate, the required quality and the availability of an M-channel audio encoder 7. One key aspect of the encoding process is that the down-mixed composite signal, typically a stereo signal but it could also be a mono signal, is derived from the multichannel input signal, and it is this down-mixed composite signal that is compressed in the audio encoder 7 for transmission over the wireless channel 9 rather than the original multi-channel signal. The parametric encoder 3 and in particular down-mixing unit 5 thereof may be capable of performing a down-mixing process, such that it creates a more or less true equivalent of the multi-channel signal, in the mono or stereo down-mixing. The parametric surround encoder also comprises a spatial parameter estimation unit 9 that from the input signals X₁(Ji) to x_N(n) computes the cues or spatial parameters that in some way can be said to describe the down- mixing process or the assumptions made therein. The compressed audio signal which is output from the M-channel audio encoder and also is the main signal is together with the spatial parameters that constitute side information transmitted over an interface 11 such as a wireless interface to the receiving side that in the case considered here typically is a mobile terminal.

Alternatively, the down-mixing could be supplied by some external unit, such as from a unit employing Artistic Downmix.

On the receiving side, a complementary parametric surround decoder 13 includes an audio decoder 15 and should be constructed to be capable of creating the best possible multi-channel decoding based on knowledge of the down-mixing algorithm used on the transmitting side and the encoded spatial parameters or cues that are received in parallel to the compressed multichannel signal. The audio decoder 15 produces signals Z₁(H) to z_M(ή) that should be as similar as possible to the signals Z₁(Ii) to z_M(n) on the transmitting side. These are together with the spatial parameters input to a spatial synthesis unit 17 that produces output signals X₁Qi) to x_N(ή) that should be as similar as possible to the original input signals X₁(U) to x_N(ή) on the transmitting side. The output signals X₁ (ή) to x_N (ή) can be input to a binaural rendering system such as that shown in Fig. 1.

It is obvious, that depending on the bandwidth of the transmitting channel over the interface 11 that generally is relatively low there will be a loss of information and hence the signals Z₁(H) to z_M (n) and X₁ (ή) to x_N (ή) on the receiving side cannot be the same as their counterparts on the transmitting side. Even though they are not quite true equivalents of their counter parts, they may be sufficiently good equivalents. In general, such a surround encoding process is independent of the compression algorithm used for the transmitted channels used in the units audio encoder 7 and audio decoder 15 in Fig.

2. The encoding process can use any of a number of high-performance compression algorithms such as AMR-WB+, MPEG-I Layer III, MPEG-4 AAC or MPEG-4 High Efficiency AAC, and it could even use PCM. In general, the above operations are done in the transformed signal domain, such as Fourier transform or MDCT. This is especially beneficial if the spatial parameter estimation and synthesis in the units 9 and 17 use the same type of transform as that used in the audio encoder 7, also called core codec.

Fig. 3 is a detailed block diagram of an efficient parametric audio encoder. The N-channel discrete time input signal, denoted in vector form as x_N(n) , is first transformed to the frequency domain in a transform unit 21, and in general to a transform domain that gives a signal x^ (k, m) .

The index k is the index of the transform coefficients, or sub-bands if a frequency domain transform is chosen. The index m represents the decimated time domain index that is also related to the input signal possibly through overlapped frames. The signal is thereafter down-mixed in a down-mixing unit 5 to generate the M-channel downmix signal z_M (k,m) , where M < N. A sequence of spatial model parameter vectors ^■p_N(k, m) is estimated in an estimation unit 9. This can be either done in an open-loop or closed loop fashion.

Spatial parameters consist of psycho-acoustical cues that are representative of the surround sound sensation. For instance, in the MPEG surround encoder, these parameters consist of inter- channel differences in level, phase and coherence equivalent to the ILD, ITD and IC cues to capture the spatial image of a multi-channel audio signal relative to a transmitted down-mixed signal z_M(k,m) (or if in closed loop, the decoded signal x_M(k,m)). The cues p_N(k,m) can be encoded in a very compact form such as in a spatial parameter quantization unit 23 producing the signal p_N(k,7n) followed by a spatial parameter encoder 25. The M-channel audio encoder 7 produces the main bitstream which in a multiplexer 27 is multiplexed with the spatial side information produced by the parameter encoder. From the multiplexer the multiplexed signal is transmitted to demultiplexer 29 on the receiving side in which the side information and the main bitstream are recovered as seen in the block diagram of Fig. 4. On the receiving side the main bitstream is decoded to synthesize a high quality multichannel representation using the received spatial parameters. The main bitstream is first decoded in an M-channel audio decoder 31 from which the decoded signals x_M(k,m) are input to the spatial synthesis unit 17. The spatial side information holding the spatial parameters is extracted by the demultiplexer 29 and provided to a spatial parameter decoder 33 that produces the decoded parameters p_N(k,m) and transmits them to the synthesis unit 17. The spatial synthesis unit produces the signal x_N(k,m) , that is provided to the signal F/T transform unit 35 transforming into the time domain to produce the signal \_N(n) , i.e. the multichannel decoded signal. A 3D audio rendering of a multi-channel surround sound can be delivered to a mobile terminal user by using an efficient parametric surround decoder to first obtain the multiple surround sound channels, using for instance the multi-channel decoder described above with reference to Fig. 4. Thereupon, the system illustrated in Fig. 1 is used to synthesize a binaural 3D-audio rendered multichannel signal. This operation is shown in the schematic of Fig. 5. Work has also been done in which spatial or 3D audio filtering has been performed in the subband domain. In CA. Lanciani, and R. W. Schafer, "Application of Head-related Transfer Functions to MPEG Audio Signals", Proc. 31st Symposium on System Theory, March 21-23, 1999, Auburn, AL, U.S.A., it is disclosed how an MPEG coded mono signal could be spatialized by performing the HR filtering operation in the subband domain, m A.B. Touimi, M. Emerit and J.M. Pernaux, "Efficient Method for Multiple Compressed Audio Streams Spatialization," Proc. 3rd International Conference on Mobile and Ubiquitous Multimedia, pp. 229-235, October 27-29, 2004, College Park, Maryland, U.S.A., it is disclosed how a number of individually MPEG coded mono signals can be spatialized by doing the HR filtering operations in the subband domain. The solution is based on a special implementation of the HR filters, in which all HR filters are modeled as a linear combination of a few predefined basis filters.

Applications of 3D audio rendering are multiple and include gamming, mobile TV shows, using standards such as 3GPP MBMS or DVB-H, listening to music concerts, watching movies and in general multimedia services, which contain a multi-channel audio component.

The methods described above of rendering multi-channel surround sound, although attractive since they allow a whole new set of services to be provided to wireless mobile units, have many drawbacks:

First of all, the computational demands of such rendering are prohibitive since both decoding and 3D rendering have to be performed in parallel and in real time. The complexity of a parametric multi-channel decoder even if low when compared to a full waveform multi-channel decoder is still quite high and at least higher than that of a simple stereo decoder. The synthesis stage of spatial decoding has a complexity that is at least proportional to the number of encoded channels. Additionally, the filtering operations of 3D rendering are also proportional to the number of channels.

The second disadvantage consists of the temporary memory that is needed in order to store the intermediate decoded channels. They are in fact buffered since they are needed in the second stage of 3D rendering.

Finally, the possible post-processing steps that usually are part of speech and audio codecs may affect the quality of such 3D audio rendering. These post-processing are beneficial for listening in a loudspeaker environment. However, they may introduce severe nonlinear phase distortion that is unequally distributed over the multiple channels and that may impact the 3D audio rendering quality.

SUMMARY

It is an object of the invention to provide an efficient and versatile method of decoding a parametric multi-channel surround audio bitstream. It is another object of the invention to provide a mobile terminal in which a parametric multi-channel surround audio bitstream can be efficiently decoded to produce a signal or signals suitable for being provided to listening equipment in or connected to the mobile terminal.

In a method of decoding a parametric multi-channel surround audio bitstream concepts such as decoding multi-channel surround sound and in particular binaural decoding multi-channel surround sound are used.

In such a method the spatial parameters received by a parametric multi-channel decoder may be transformed into a new set of spatial parameters that are used in order to obtain a different decoding of multi-channel surround sound.

The transformed parameters may also be personalized spatial parameters and can then be obtained by combining both the received spatial parameters and a representation of user head related filters.

The personalized spatial parameters may also be obtained by combining the received spatial parameters and a representation of the user head related filters and a set of additional rendering parameters determined by the user. A subset of the set of additional rendering parameters may be interactive parameters that are set in response to user choices that may be changed during the listening process.

The set of additional rendering parameters may be time dependent parameters.

The method as described herein may allow a simple and efficient way to render surround sound, which is encoded by parametric encoders on mobile devices. The major advantage consists of a reduced complexity and an increased interactivity when listening through headphones using a mobile device. Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention maybe realized and obtained by means of the methods, processes, instrumentalities and combinations particularly pointed out in the appended claims.

BRIEF DESCRIP TION OF THE DRAWINGS

While the novel features of the invention are set forth with particularly in the appended claims, a complete understanding of the invention, both as to organization and content, and of the above and other features thereof may be gained from and the invention will be better appreciated from a consideration of the following detailed description of non-limiting embodiments presented hereinbelow with reference to the accompanying drawings, in which:

- Fig. 1 is a block diagram illustrating a possible 3D audio or binaural rendering of a 5.1 audio signal,

- Fig. 2 is a high-level description of the principles of a parametric multi-channel coding and decoding system,

- Fig. 3 is a detailed description of the parametric multi-channel audio encoder

- Fig. 4 is a detailed description of the parametric multi-channel audio decoder,

- Fig. 5 is 3D-audio rendering of decoded multi-channel signal (Prior- Art),

- Fig. 6 is a personalized binaural decoding of multi-channel surround sound, - Fig. 7 is a generalized diagram of the spatial audio processing in the MPEG-surround decoder,

- Fig. 8 is an embodiment of the invention for personalized binaural decoding,

- Fig. 9 is a schematic illustrating combining parameters, and

- Fig. 10 is a diagram illustrating results of listening test.

DETAILED DESCRIPTION The block diagram of Fig. 6 illustrates the main steps in a method of decoding a parametric multi-channel surround audio bitstream as performed in a parametric sound decoder 13. In the demultiplexer 29 the main bitstream and the spatial side information are recovered. The main bitstream is first decoded in an M-channel audio decoder 31 from which the decoded signals i_M(k,m) are input to the personalized spatial synthesis unit 17'. The spatial side information holding the spatial parameters is from the demultiplexer 29 provided to a spatial parameter decoder 33 that produces the decoded parameters p_N(k,m) . The decoded spatial parameters are input to a parameter combining unit 37 that may also receive other parameter information, in particular personalized parameters and HRF information. The combining unit produces new parameters that in particular may be personalized spatial parameters and are input to the synthesis unit 17'. The spatial synthesis unit produces the signal x₂(k,m) that is provided to the signal F/T transform unit 35 transforming back into the time domain. The time domain signal is provided to e.g. the ear-phones 39 of a mobile terminal 41 in which the parametric surround decoder is running. The additional information and parameters received by the combining unit 37 can be obtained from a parameter unit 43 that e.g. may be constructed to receive user input interactively during a listening session such as from depressing some suitable key of the mobile terminal or unit 41.

The method as embodied in an MPEG surround multi-channel decoder, see Text of ISO/IEC 14496-3:200X/PDAM 4, MPEG Surround, N7530, October 2005, Nice, France, will now be described. However, it is obvious that the method can equally well be used in other contexts.

The processing in the MPEG surround decoder can be defined by two matrix multiplications as illustrated in the diagram of Fig. 7, the multiplications shown as including matrix units Ml and M2, also called the predecorrelator matrix unit and the mix matrix unit, respectively, to which the respective signals are input. The first matrix multiplication forms the input signals to decorrelation units or decorrelators D₁, D₂, ..., and the second matrix multiplication forms the output signals based on the down-mix input and the output from the decorrelators. The above operations are done for each hybrid subband, indexed by the hybrid subband index L

In the following, the index n is used for a number of a time slot, k is used to index a hybrid subband, and / is used to index the parameter set. The processing of the input channels to form the output channels can then be described as:

where M"¹* is a two-dimensional matrix mapping a certain number of input channels to a certain number of channels going into the decorrelators, and is defined for every time-slot n and every hybrid subband Jc, and Mf is a two-dimensional matrix mapping a certain number of pre- processed channels to a certain number of output channels and is defined for every time-slot n and every hybrid subband k. The matrix M!,^α' comes in two versions depending on whether time- domain temporal shaping (TP) or temporal envelope shaping (TES) of the decorrelated signal is used, the two versions denoted Ml'^k _wa and MIJ'^ .

The input vector xⁿ'^k to the first matrix unit Ml corresponds to the decoded signals n k z_M(k,m) of Fig. 6 obtained from the audio decoder 31. The vector W ' that is input to the mix matrix unit M2 is a combination of the output d_ls d₂, ... from the decorrelators D₁, D₂, ..., the output from first matrix multiplication, i.e. from the predecorrelator matrix unit M₁, and residual signals res_ls res₂, ..., and is defined for every time-slot n and every hybrid subband k. The output vector y"'^k has components I_f, l_s, r_f, r_s, cf and lfe that basically correspond to the signals L, SL, R, SR, C and LFE as described above. The components must be transformed to the time domains and in some way be rendered to be provided to used earphones, i.e. they cannot be directly used.

A method for 3D audio rendering and in particular personalized decoding uses a decoder that includes a "Reconstruct from Model" block that takes extra input such as a representation of the personal 3D audio filters in the hybrid filter-bank domain and uses it to transform derivatives of the model parameters to other model parameters that allows generating the two binaural signals directly in the transform domain, so that only the binaural 2-channel signal has to be transformed into the discrete time domain, compare the transform unit 35 in Fig. 6.

An embodiment for personalized binaural decoding based on the MPEG surround is illustrated in the diagram of Fig. 8. A third matrix M^ , symbolically shown as the parameter modification matrix M3, is in this example a linear mapping from 6 channels to two channels, which are used as input to the user headphones 39 through the transform unit 35. The matrix multiplication can be written as

Additional binaural post-processing may also be done and is outside the scope of the method as described herein. This may include further post-processing of the left and right channels.

By linearity (associative law) it is clear that the matrices M_j'* and M₃'* can be combined together to form a new set of parameters stored in a new mix matrix M^'* = M^⁴M_j'* ^: . This combining operation is illustrated in Fig. 9, where the multiplication unit corresponding to the new matrix is shown as the mix matrix unit M4 and the multiplication of the two matrices is made in a multiplying unit 45.

The new mix matrix M^ has parameters that depend both on the bit-stream parameters and the user predefined head related filters HRFs and as well on other dynamic rendering parameters if desired.

For the case of head related filters only, the matrix M?'* can be written as rn,k HH)C) H_c ^B(k) Hf (Jc) Hf (Jc) H^c(k) H^c (k)

MΓ = Hf (k) Hf Qc) H_c ^r(k) H_c ^B(k) H^c(k) H^c (k\

the matrix elements being the five different filters which are used to implement the head related filtering and as above are denoted Hf , H^ , H^c , Hf and H_C ^F . In this case the filters are represented in the hybrid domain. Such an operation to represent filters from the time domain to the frequency or transform domain are well known in the signal processing literature. Here the filters that form the matrix M₃'* are functions of the hybrid subband index k and are similar to those illustrated in Fig. 1.

It should be noted that for this simple case the matrix M₃ ¹'* is independent of the time slot index n. Head related filters might also be changed dynamically if the user wants another virtual loudspeaker configuration to be experienced through the headphones 39.

Ln another embodiment, the user may want to interactively change his spatial position. By this it is meant that the user may want to experience how it is to be close to the concert scene if for instance a live concert is played, or farther away. This could be easily implemented by adding delay lines to the parameter modification matrix M₃ ¹'* . The user action may be dynamic and in that case, the matrix M"'* is dependent on the time slot index n.

In yet another embodiment the user may want to experience different spatial sensations. In this case, reverberation and other sound effects can be efficiently introduced in the matrix M₃ ¹'* .

The dynamic nature of the matrix M₃'* related to the user interactivity could benefit from interpolation between two user actions. Methods of parameter interpolation are well known and are not be described herein.

As already stated, the parameter modification matrix M₃'^fc can contain additional rendering parameters that are interactive and are changed in response to user input.

The particular embodiment of the invention described above has been implemented and tested as part of the MPEG standardization effort for a binaural extension of the MPEG surround decoder. The test results from several listening tests performed by independent groups are shown in the diagram of Fig. 10. There it clearly seen that the perceived quality of the binaural rendering from the particular embodiment of the invention is for most test signals better than that obtained from the standard 3D audio post processing method as shown in Figure 5. Although the embodiments described herein refer to decoding for binaural headphone listening, it is obvious to one skilled in the art that they can be applied also for loudspeaker listening and other spatial configurations without departing from the basic idea of parameter mapping and combination.

While specific embodiments of the invention have been illustrated and described herein, it is realized that numerous other embodiments may be envisaged and that numerous additional advantages, modifications and changes will readily occur to those skilled in the art without departing from the spirit and scope of the invention. Therefore, the invention in its broader aspects is not limited to the specific details, representative devices and illustrated examples shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents. It is therefore to be understood that the appended claims are intended to cover all such modifications and changes as fall within a true spirit and scope of the invention. Numerous other embodiments may be envisaged without departing from the spirit and scope of the invention.

Claims

1. A method of decoding a parametric multi-channel surround audio bitstream received by a parametric multi-channel decoder including the steps of:

- demultiplexing said bitstream to form a main bitstream and spatial side information, - decoding the spatial side information to form a first set of spatial parameters,

- modifying the first set of spatial parameters to form a second set of spatial parameters,

- synthesizing from said main bitstream, based on or using the second set of spatial parameters, a surround audio signal to be provided to listening equipment.

2. A method according to claim 1, characterized in that in the step of modifying, the second set of spatial parameters are obtained by combining the first set of spatial parameters and a representation of user head related filters so that the new parameters are personalized and also the surround audio signal is personalized.

3. A method according to claim 2, characterized in that in the step of combining, the received spatial parameters and a representation of user head related filters are also combined with additional rendering parameters determined by the user.

4. A method according to claim 3, characterized in that the additional rendering parameters are interactive parameters set in response to user choices.

5. A method according to claim 3, characterized in that the additional rendering parameters are time dependent.

6. A method of transmitting digital data representing sound to a mobile unit, the digital data including a first number (N) of first channels, each first channel in particular representing sound having a special characteristic, such as sound received from a particular direction and being in a particular frequency band, the method comprising the steps:

- analyzing said digital data to determine parameters characteristic of the sound, the parameters in particular being determined to represent a spatial relationship between the sounds which are represented by the digital data in each of the first channels,

- down-mixing digital data of the first channels with one another to produce digital data in a second number (M) of second channels, the second number being smaller than the first number (M< N), - transmitting wirelessly the digital data in the second channels and the parameters to a mobile unit,

- receiving in the mobile unit the digital data in the second channels and the parameters,

- transforming the received digital data in the second channels, based on the received parameters, to produce transformed digital data suited to be rendered to sound emitters of the mobile unit, and - rendering the transformed digital data to the sound emitters of the mobile unit, characterized by the additional step of modifying, before the step of transforming, the received parameters to form new parameters that are used in the transforming step.

7. A parametric surround decoder for decoding a parametric multi-channel surround audio bitstream, the bitstream including spatial parameters indicating the character of sound represented in the channels of the bitstream received by the decoder, characterized by a modifying unit for modifying the spatial parameters to form new spatial parameters used in synthesizing so that a different decoding of the original multi-channel surround sound is obtained.

8. A parametric surround decoder according to claim 7, characterized in that the modifying unit is arranged to use, in modifying the spatial parameters, a representation of user head related filters so that the new parameters are personalized and also a resulting surround audio signal is personalized.

9. A parametric surround decoder according to claim 8, characterized in that the modifying unit is arranged to use, in modifying the spatial parameters, also additional rendering parameters determined by the user.

10. A parametric surround decoder according to claim 7, characterized in that the modifying unit is arranged to modify the spatial parameters in a time dependent way.

11. A mobile terminal including a parametric surround decoder for decoding a parametric multi-channel surround audio bitstream received by the mobile unit, the bitstream including spatial parameters indicating the character of sound represented in channels of the received bitstream decoder, characterized in that the parametric surround decoder includes a modifying unit for modifying the spatial parameters to form new spatial parameters used in synthesizing so that a different decoding of the original multi-channel surround sound is obtained.

12. A mobile terminal according to claim 11, characterized in that the modifying unit is arranged to use, in modifying the spatial parameters, a representation of user head related filters so that the new parameters are personalized and also a resulting surround audio signal is personalized.

13. A mobile terminal according to claim 12, characterized in that the modifying unit is arranged to use, in modifying the spatial parameters, also additional rendering parameters determined by the user or input from the user, such as by depressing one or more keys of the mobile unit.

14. A mobile terminal according to claim 12, characterized in that the modifying unit is arranged to modify the spatial parameters interactively in accordance with input from the user.

15. A mobile terminal according to claim 7, characterized in that the modifying unit is arranged to modify the spatial parameters in a time dependent way.

16. A method of decoding a parametric multi-channel surround audio bitstream including a first number (N) of audio channels, said bitstream received by a parametric multi-channel decoder, the method including the steps of: - demultiplexing said bitstream to form a main bitstream and spatial side information,

- decoding the main bitstream to form separate bitstreams for said plurality of audio channels,

- decoding the spatial side information to form a first set of spatial parameters,

- synthesizing from said separate bitstreams, based on or using the first set of spatial parameters, surround audio signals in a second number (M) of audio channels suited to be provided to listening equipment, wherein the second number (M) is smaller than the first number (N).

17. A method according to claim 16, characterized in that the second number (M) is equal to 2.

18. A method according to claim 16, characterized in that the first number (N) is equal to 5 or 6.