US20140010372A1 - Method for generating and consuming 3-d audio scene with extended spatiality of sound source - Google Patents


Info

Publication number
US20140010372A1
Authority
US
United States
Prior art keywords
sound source
sound
recited
audio scene
dimensional space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/925,013
Inventor
Jeong-Il Seo
Dae-Young Jang
Kyeong-Ok Kang
Jin-woong Kim
Chie-Teuk Ahn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electronics and Telecommunications Research Institute ETRI
Original Assignee
Electronics and Telecommunications Research Institute ETRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020030071345A (patent KR100626661B1)
Application filed by Electronics and Telecommunications Research Institute ETRI
Priority to US13/925,013
Publication of US20140010372A1
Abandoned (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/13 Application of wave-field synthesis in stereophonic audio systems

Definitions

  • the present invention relates to a method for generating and consuming a three-dimensional audio scene having a sound source whose spatiality is extended; and, more particularly, to a method for generating and consuming a three-dimensional audio scene in which the spatiality of a sound source is extended.
  • a content providing server encodes contents in a predetermined encoding method and transmits the encoded contents to content consuming terminals that consume the contents.
  • the content consuming terminals decode the contents in a predetermined decoding method and output the transmitted contents.
  • the content providing server includes an encoding unit for encoding the contents and a transmission unit for transmitting the encoded contents.
  • the content consuming terminals include a reception unit for receiving the transmitted encoded contents, a decoding unit for decoding the encoded contents, and an output unit for outputting the decoded contents to users.
  • MPEG-4 is a technical standard for data compression and restoration technology defined by the MPEG to transmit moving pictures at a low transmission rate.
  • in MPEG-4, an object of an arbitrary shape can be encoded, and the content consuming terminals consume a scene composed of a plurality of objects. Therefore, MPEG-4 defines Audio Binary Format for Scene (AudioBIFS), a scene description language for designating how a sound object is expressed and what its characteristics are.
  • Audio BIFS Audio Binary Format for Scene
  • an AudioFX node and a Directive Sound node are used to express spatiality of a three-dimensional audio scene.
  • the modeling of a sound source usually depends on a point source. A point source can be described and embodied in a three-dimensional sound space easily.
  • a sound of waves dashing against the coastline stretched in a straight line can be recognized as a linear sound source instead of a point sound source.
  • the size and shape of the sound source should be expressed. Otherwise, the realism of a sound object in the three-dimensional audio scene would be seriously impaired.
  • the spatiality of a sound source should be described so that a three-dimensional audio scene can be endowed with a sound source of more than one dimension.
  • an object of the present invention to provide a method for generating and consuming a three-dimensional audio scene having a sound source whose spatiality is extended by adding sound source characteristics information having information on extending the spatiality of the sound source to the three-dimensional audio scene description information.
  • a method for generating a three-dimensional audio scene with a sound source whose spatiality is extended including the steps of a) generating a sound object; and b) generating three-dimensional audio scene description information including sound source characteristics information for the sound object, wherein the sound source characteristics information includes spatiality extension information of the sound source which is information of the size and shape of the sound source expressed in a three-dimensional space.
  • a method for consuming a three dimensional audio scene with a sound source whose spatiality is extended including the steps of: a) receiving a sound object and three-dimensional audio scene description information including sound source characteristics information for the sound object; and b) outputting the sound object based on the three-dimensional audio scene description information, wherein the sound source characteristics information includes spatiality extension information which is information on the size and shape of a sound source expressed in a three-dimensional space.
  • FIG. 1A illustrates a point sound source
  • FIG. 1B illustrates sound sources configured along a line
  • FIG. 1C illustrates sound sources configured on a planar surface
  • FIG. 1D illustrates sound sources configured to have volume in space
  • FIG. 2 is a diagram describing a method for expressing spatial sound source by grouping successive point sound sources
  • FIG. 3 shows an example where spatiality extension information is added to a “DirectiveSound” node of AudioBIFS in accordance with the present invention
  • FIG. 4 is a diagram illustrating how a sound source is extended in accordance with the present invention.
  • FIG. 5A illustrates a distribution of point sound sources configured linearly
  • FIG. 5B illustrates a distribution of point sound sources configured on a planar surface
  • FIG. 5C illustrates a distribution of point sound sources configured to have volume in space.
  • block diagrams of the present invention should be understood to show a conceptual viewpoint of an exemplary circuit that embodies the principles of the present invention.
  • all the flowcharts, state transition diagrams, pseudocode and the like can be expressed substantially in a computer-readable medium, and whether or not a computer or a processor is described explicitly, they should be understood to express various processes operated by a computer or a processor.
  • Functions of various devices illustrated in the drawings including a functional block expressed as a processor or a similar concept can be provided not only by using hardware dedicated to the functions, but also by using hardware capable of running proper software for the functions.
  • when a function is provided by a processor, the function may be provided by a single dedicated processor, a single shared processor, or a plurality of individual processors, some of which can be shared.
  • processor should not be understood to refer exclusively to a piece of hardware capable of running software, but should be understood to implicitly include digital signal processor (DSP) hardware, and ROM, RAM and non-volatile memory for storing software.
  • DSP digital signal processor
  • ROM read-only memory
  • RAM random-access memory
  • an element expressed as a means for performing a function described in the detailed description is intended to include all methods for performing the function, including all forms of software, such as combinations of circuits for performing the intended function, firmware/microcode and the like. To perform the intended function, the element cooperates with a proper circuit for executing the software.
  • the present invention defined by the claims includes diverse means for performing particular functions, and the means are connected with each other in the manner required by the claims. Therefore, any means that can provide the function should be understood to be an equivalent to what is understood from the present specification.
  • FIGS. 1A-1D illustrate various shapes of sound sources.
  • the sound source(s) can be a point sound source, linearly-configured sound sources, planar-surface configured sound sources, and sound sources configured to have volume in space.
  • FIG. 1A illustrates a point sound source.
  • FIG. 1B illustrates sound sources configured along a line.
  • FIG. 1C illustrates sound sources configured on a planar surface.
  • FIG. 1D illustrates sound sources configured to have volume in space. Since a sound source can have an arbitrary shape and size, describing it is very complicated. However, if the shape of the sound source to be modeled is constrained, the sound source can be described more simply.
  • point sound sources are distributed uniformly in the dimension of a virtual sound source in order to model sound sources of various shapes and sizes.
  • the sound sources of various shapes and sizes can be expressed as continuous arrays of point sound sources.
  • the location of each point sound source in a virtual object can be calculated using a vector location of a sound source which is defined in a three-dimensional scene.
  • when a spatial sound source is modeled with a plurality of point sound sources, the spatial sound source should be described using a node defined in AudioBIFS.
  • AudioBIFS which will be referred to as an AudioBIFS node
  • any effect can be included in the three-dimensional scene. Therefore, an effect corresponding to the spatial sound source can be programmed through the AudioBIFS node and inserted to the three-dimensional scene.
  • the point sound sources distributed in a limited dimension of an object are grouped using AudioBIFS, and the spatial location and direction of the sound sources can be changed by changing the sound source group.
  • the characteristics of the point sound sources are described using a plurality of “DirectiveSound” nodes.
  • the locations of the point sound sources are calculated to be distributed on the surface of the object uniformly.
  • the point sound sources are spaced at a distance that can eliminate spatial aliasing, as disclosed by A. J. Berkhout, D. de Vries, and P. Vogel, “Acoustic control by wave field synthesis,” J. Acoust. Soc. Am., Vol. 93, No. 5, pp. 2764-2778, May 1993.
  • the spatial sound source can be vectorized by using a group node and grouping the point sound sources.
  • FIG. 2 is a diagram describing a method for expressing spatial sound source by grouping successive point sound sources.
  • a virtual successive linear sound source is modeled by using three point sound sources which are distributed uniformly along the axis of the linear sound source.
  • the locations of the point sound sources are determined to be (x0−dx, y0−dy, z0−dz), (x0, y0, z0), and (x0+dx, y0+dy, z0+dz) according to the concept of the virtual sound source.
  • dx, dy and dz can be calculated from the vector between a listener and the location of the sound source and from the angle between the direction vectors of the sound source; the vector and the angle are defined in an angle field and a direction field.
  • FIG. 2 describes a spatial sound source by using a plurality of point sound sources. AudioBIFS appears able to support the description of such a scene. However, this method requires too many unnecessary sound object definitions, because many objects must be defined to model one single object.
  • MPEG-4 Moving Picture Experts Group 4
  • FIG. 3 shows an example where spatiality extension information is added to a “DirectiveSound” node of AudioBIFS in accordance with the present invention.
  • a new rendering design corresponding to a value of a “SourceDimensions” field is applied to the “DirectiveSound” node.
  • the “SourceDimensions” field also includes shape information of the sound source. If the value of the “SourceDimensions” field is “0,0,0”, the sound source is a single point, and no additional technology for extending the sound source is applied to the “DirectiveSound” node. If the value of the “SourceDimensions” field is a value other than “0,0,0”, the dimension of the sound source is extended virtually.
  • the location and direction of the sound source are defined in a location field and a direction field, respectively, in the “DirectiveSound” node.
  • the dimension of the sound source is extended perpendicular to the vector defined in the direction field, based on the value of the “SourceDimensions” field.
  • the “location” field defines the geometrical center of the extended sound source, whereas the “SourceDimensions” field defines the three-dimensional size of the sound source.
  • the size of the spatially extended sound source is determined according to the values of Δx, Δy, and Δz.
  • FIG. 4 is a diagram illustrating how a sound source is extended in accordance with the present invention.
  • the value of the “SourceDimensions” field is (0, Δy, Δz), Δy and Δz being not zero (Δy≠0, Δz≠0). This indicates a surface sound source having an area of Δy×Δz.
  • the illustrated sound source is extended in a direction perpendicular to the vector defined in the “direction” field based on the values of the “SourceDimensions” field, i.e., (0, Δy, Δz), thereby forming a surface sound source.
  • the point sound sources are located on the surfaces of the extended sound source.
  • the locations of the point sound sources are calculated to be distributed on the surfaces of the extended sound source uniformly.
  • FIGS. 5A to 5C are diagrams depicting the distributions of point sound sources based on the shapes of various sound sources in accordance with the present invention. More specifically, FIG. 5A illustrates a distribution of point sound sources configured linearly, FIG. 5B illustrates a distribution of point sound sources configured on a planar surface, and FIG. 5C illustrates a distribution of point sound sources configured to have volume in space.
  • the dimension and distance of a sound source are free variables. So, the size of the sound source that can be recognized by a user can be formed freely.
  • multi-track audio signals that are recorded by using an array of microphones can be expressed by extending point sound sources linearly as shown in FIG. 5A .
  • the value of the “SourceDimensions” field is (0, 0, Δz).
  • FIGS. 5B and 5C show a surface sound source expressed through the spread of the point sound source and a spatial sound source having a volume.
  • the value of the “SourceDimensions” field is (0, Δy, Δz) and, in case of FIG. 5C, the value of the “SourceDimensions” field is (Δx, Δy, Δz).
  • the number of the point sound sources determines the density of the point sound sources in the extended sound source.
  • an “AudioSource” node is defined in a “source” field
  • the value of a “numChan” field may indicate the number of used point sound sources.
  • the directivity defined in “angle,” “directivity” and “frequency” fields of the “DirectiveSound” node can be applied to all point sound sources included in the extended sound source uniformly.
  • the apparatus and method of the present invention can produce more effective three-dimensional sounds by extending the spatiality of sound sources of contents.

Abstract

A method of generating and consuming 3D audio scene with extended spatiality of sound source describes the shape and size attributes of the sound source. The method includes the steps of: generating audio object; and generating 3D audio scene description information including attributes of the sound source of the audio object.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of application Ser. No. 11/796,808, filed Apr. 30, 2007, which is a division of application Ser. No. 10/531,632, filed on Oct. 31, 2005, which is a National Stage application of International Patent Application No. PCT/KR2003/002149 filed Oct. 15, 2003 and claims the benefit of Korean Patent Application Nos. 10-2002-0062962, filed Oct. 15, 2002 and 10-2003-0071345, filed Oct. 14, 2003, the entirety of each of which is incorporated herein by reference.
  • DESCRIPTION
  • 1. Technical Field
  • The present invention relates to a method for generating and consuming a three-dimensional audio scene having a sound source whose spatiality is extended; and, more particularly, to a method for generating and consuming a three-dimensional audio scene in which the spatiality of a sound source is extended.
  • 2. Background Art
  • Generally, a content providing server encodes contents in a predetermined encoding method and transmits the encoded contents to content consuming terminals that consume the contents. The content consuming terminals decode the contents in a predetermined decoding method and output the transmitted contents.
  • Accordingly, the content providing server includes an encoding unit for encoding the contents and a transmission unit for transmitting the encoded contents. On the other hand, the content consuming terminals include a reception unit for receiving the transmitted encoded contents, a decoding unit for decoding the encoded contents, and an output unit for outputting the decoded contents to users.
  • Many encoding/decoding methods of audio/video signals are known so far. Among them, an encoding/decoding method based on Moving Picture Experts Group 4 (MPEG-4) is widely used these days. MPEG-4 is a technical standard for data compression and restoration technology defined by the MPEG to transmit moving pictures at a low transmission rate.
  • According to MPEG-4, an object of an arbitrary shape can be encoded, and the content consuming terminals consume a scene composed of a plurality of objects. Therefore, MPEG-4 defines Audio Binary Format for Scene (AudioBIFS), a scene description language for designating how a sound object is expressed and what its characteristics are.
  • Meanwhile, along with the development in video, users want to consume content with more lifelike sound and video quality. In the MPEG-4 AudioBIFS, an AudioFX node and a DirectiveSound node are used to express the spatiality of a three-dimensional audio scene. In these nodes, the modeling of a sound source usually depends on a point source. A point source can be described and embodied in a three-dimensional sound space easily.
  • Actual sound sources, however, tend to have a dimension of more than two rather than being points in the literal sense. More importantly, the shape of a sound source can be recognized by human beings, as disclosed by J. Blauert, “Spatial Hearing,” The MIT Press, Cambridge, Mass., 1996.
  • For example, the sound of waves dashing against a coastline stretched in a straight line can be recognized as a linear sound source instead of a point sound source. To improve the realism of the three-dimensional audio scene by using the AudioBIFS, the size and shape of the sound source should be expressed. Otherwise, the realism of a sound object in the three-dimensional audio scene would be seriously impaired.
  • That is, the spatiality of a sound source should be described so that a three-dimensional audio scene can be endowed with a sound source of more than one dimension.
  • DISCLOSURE OF INVENTION
  • It is, therefore, an object of the present invention to provide a method for generating and consuming a three-dimensional audio scene having a sound source whose spatiality is extended by adding sound source characteristics information having information on extending the spatiality of the sound source to the three-dimensional audio scene description information.
  • The other objects and advantages of the present invention can be easily recognized by those of ordinary skill in the art from the drawings, detailed description and claims of the present specification.
  • In accordance with one aspect of the present invention, there is provided a method for generating a three-dimensional audio scene with a sound source whose spatiality is extended, including the steps of a) generating a sound object; and b) generating three-dimensional audio scene description information including sound source characteristics information for the sound object, wherein the sound source characteristics information includes spatiality extension information of the sound source which is information of the size and shape of the sound source expressed in a three-dimensional space.
  • In accordance with another aspect of the present invention, there is provided a method for consuming a three-dimensional audio scene with a sound source whose spatiality is extended, including the steps of: a) receiving a sound object and three-dimensional audio scene description information including sound source characteristics information for the sound object; and b) outputting the sound object based on the three-dimensional audio scene description information, wherein the sound source characteristics information includes spatiality extension information which is information on the size and shape of a sound source expressed in a three-dimensional space.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The above and other objects and features of the present invention will become apparent from the following description of the preferred embodiments given in conjunction with the accompanying drawings, in which:
  • FIG. 1A illustrates a point sound source;
  • FIG. 1B illustrates sound sources configured along a line;
  • FIG. 1C illustrates sound sources configured on a planar surface;
  • FIG. 1D illustrates sound sources configured to have volume in space;
  • FIG. 2 is a diagram describing a method for expressing spatial sound source by grouping successive point sound sources;
  • FIG. 3 shows an example where spatiality extension information is added to a “DirectiveSound” node of AudioBIFS in accordance with the present invention;
  • FIG. 4 is a diagram illustrating how a sound source is extended in accordance with the present invention;
  • FIG. 5A illustrates a distribution of point sound sources configured linearly;
  • FIG. 5B illustrates a distribution of point sound sources configured on a planar surface; and
  • FIG. 5C illustrates a distribution of point sound sources configured to have volume in space.
  • BEST MODE FOR CARRYING OUT THE INVENTION
  • Other objects and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter.
  • The following description exemplifies only the principles of the present invention. Even if they are not described or illustrated clearly in the present specification, one of ordinary skill in the art can embody the principles of the present invention and invent various apparatuses within the concept and scope of the present invention.
  • The conditional terms and embodiments presented in the present specification are intended only to make the concept of the present invention understood, and the invention is not limited to the embodiments and conditions mentioned in the specification.
  • In addition, all the detailed description on the principles, viewpoints and embodiments and particular embodiments of the present invention should be understood to include structural and functional equivalents to them. The equivalents include not only currently known equivalents but also those to be developed in future, that is, all devices invented to perform the same function, regardless of their structures.
  • For example, block diagrams of the present invention should be understood to show a conceptual viewpoint of an exemplary circuit that embodies the principles of the present invention. Similarly, all the flowcharts, state transition diagrams, pseudocode and the like can be expressed substantially in a computer-readable medium, and whether or not a computer or a processor is described explicitly, they should be understood to express various processes operated by a computer or a processor.
  • Functions of various devices illustrated in the drawings including a functional block expressed as a processor or a similar concept can be provided not only by using hardware dedicated to the functions, but also by using hardware capable of running proper software for the functions. When a function is provided by a processor, the function may be provided by a single dedicated processor, single shared processor, or a plurality of individual processors, part of which can be shared.
  • The apparent use of a term such as ‘processor’ or ‘control’, or of a similar concept, should not be understood to refer exclusively to a piece of hardware capable of running software, but should be understood to implicitly include digital signal processor (DSP) hardware, and ROM, RAM and non-volatile memory for storing software. Other known and commonly used hardware may be included therein, too.
  • In the claims of the present specification, an element expressed as a means for performing a function described in the detailed description is intended to include all methods for performing the function, including all forms of software, such as combinations of circuits for performing the intended function, firmware/microcode and the like. To perform the intended function, the element cooperates with a proper circuit for executing the software. The present invention defined by the claims includes diverse means for performing particular functions, and the means are connected with each other in the manner required by the claims. Therefore, any means that can provide the function should be understood to be an equivalent to what is understood from the present specification.
  • The same reference numeral is given to the same element, although the element appears in different drawings. In addition, if further detailed description of the related prior art is determined to obscure the point of the present invention, the description is omitted. Hereafter, preferred embodiments of the present invention will be described in detail.
  • FIGS. 1A-1D illustrate various shapes of sound sources. The sound source(s) can be a point sound source, linearly-configured sound sources, planar-surface configured sound sources, or sound sources configured to have volume in space. FIG. 1A illustrates a point sound source. FIG. 1B illustrates sound sources configured along a line. FIG. 1C illustrates sound sources configured on a planar surface. FIG. 1D illustrates sound sources configured to have volume in space. Since a sound source can have an arbitrary shape and size, describing it is very complicated. However, if the shape of the sound source to be modeled is constrained, the sound source can be described more simply.
  • In the present invention, it is assumed that point sound sources are distributed uniformly in the dimension of a virtual sound source in order to model sound sources of various shapes and sizes. As a result, the sound sources of various shapes and sizes can be expressed as continuous arrays of point sound sources. Here, the location of each point sound source in a virtual object can be calculated using a vector location of a sound source which is defined in a three-dimensional scene.
  • When a spatial sound source is modeled with a plurality of point sound sources, the spatial sound source should be described using a node defined in AudioBIFS. When the node defined in AudioBIFS, which will be referred to as an AudioBIFS node, is used, any effect can be included in the three-dimensional scene. Therefore, an effect corresponding to the spatial sound source can be programmed through the AudioBIFS node and inserted into the three-dimensional scene.
  • However, this requires a very complicated digital signal processing (DSP) algorithm, and controlling the dimension of the spatial sound source is very troublesome.
  • Also, the point sound sources distributed in a limited dimension of an object are grouped using AudioBIFS, and the spatial location and direction of the sound sources can be changed by changing the sound source group. First of all, the characteristics of the point sound sources are described using a plurality of “DirectiveSound” nodes. The locations of the point sound sources are calculated to be distributed on the surface of the object uniformly.
  • Subsequently, the point sound sources are spaced at a distance that can eliminate spatial aliasing, as disclosed by A. J. Berkhout, D. de Vries, and P. Vogel, “Acoustic control by wave field synthesis,” J. Acoust. Soc. Am., Vol. 93, No. 5, pp. 2764-2778, May 1993. The spatial sound source can be vectorized by using a group node and grouping the point sound sources.
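  • As a rough illustration of the spacing constraint, the sketch below applies the common half-wavelength rule of thumb (spacing no larger than c / (2·f_max)). This rule, the speed-of-sound value, and the names `max_spacing` and `sources_needed` are assumptions made for illustration; the specification only refers to the cited paper and does not state a formula.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def max_spacing(f_max_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Largest spacing (m) equal to half a wavelength at f_max_hz."""
    if f_max_hz <= 0:
        raise ValueError("f_max_hz must be positive")
    return c / (2.0 * f_max_hz)


def sources_needed(length_m: float, f_max_hz: float,
                   c: float = SPEED_OF_SOUND) -> int:
    """Number of equally spaced point sources spanning length_m, ends included."""
    if length_m < 0:
        raise ValueError("length_m must be non-negative")
    intervals = math.ceil(length_m / max_spacing(f_max_hz, c))
    return intervals + 1


if __name__ == "__main__":
    print(max_spacing(1000.0))          # spacing bound for content up to 1 kHz
    print(sources_needed(1.0, 1000.0))  # sources for a 1 m line source
```

  • Under this assumed rule, a higher upper frequency calls for a denser array, which is one reason the number of "DirectiveSound" nodes grows quickly when each point source is declared separately.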
  • FIG. 2 is a diagram describing a method for expressing spatial sound source by grouping successive point sound sources. In the drawing, a virtual successive linear sound source is modeled by using three point sound sources which are distributed uniformly along the axis of the linear sound source.
  • The locations of the point sound sources are determined to be (x0−dx, y0−dy, z0−dz), (x0, y0, z0), and (x0+dx, y0+dy, z0+dz) according to the concept of the virtual sound source. Here, dx, dy and dz can be calculated from the vector between a listener and the location of the sound source and from the angle between the direction vectors of the sound source; the vector and the angle are defined in an angle field and a direction field.
  • FIG. 2 describes a spatial sound source by using a plurality of point sound sources. AudioBIFS appears able to support the description of such a scene. However, this method requires too many unnecessary sound object definitions, because many objects must be defined to model one single object.
  • Given that the genuine aim of the hybrid description in Moving Picture Experts Group 4 (MPEG-4) is a more object-oriented representation, it is desirable to combine the point sound sources used to model one spatial sound source and to reproduce them as one single object.
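  • The placement of the three point sound sources can be sketched as follows. The function name `line_source_positions` and the `count` parameter are hypothetical, and the step (dx, dy, dz) is taken as a given input rather than derived from the angle and direction fields, since the specification does not give that derivation in closed form.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def line_source_positions(center: Vec3, step: Vec3, count: int = 3) -> List[Vec3]:
    """Place `count` point sources symmetrically about `center`.

    With count=3 this yields (x0-dx, y0-dy, z0-dz), (x0, y0, z0) and
    (x0+dx, y0+dy, z0+dz), where step = (dx, dy, dz).
    """
    if count < 1 or count % 2 == 0:
        raise ValueError("count must be a positive odd number")
    half = count // 2
    return [
        tuple(c + k * s for c, s in zip(center, step))
        for k in range(-half, half + 1)
    ]


if __name__ == "__main__":
    for p in line_source_positions((1.0, 2.0, 3.0), (0.5, 0.0, 0.0)):
        print(p)
```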
  • In accordance with the present invention, a new field is added to a “DirectiveSound” node of the AudioBIFS to describe the shape and size attributes of a sound source. FIG. 3 shows an example where spatiality extension information is added to a “DirectiveSound” node of AudioBIFS in accordance with the present invention.
  • Referring to FIG. 3, a new rendering design corresponding to a value of a “SourceDimensions” field is applied to the “DirectiveSound” node. The “SourceDimensions” field also includes shape information of the sound source. If the value of the “SourceDimensions” field is “0,0,0”, the sound source is a single point, and no additional technology for extending the sound source is applied to the “DirectiveSound” node. If the value of the “SourceDimensions” field is a value other than “0,0,0”, the dimension of the sound source is extended virtually.
  • The location and direction of the sound source are defined in a location field and a direction field, respectively, in the “DirectiveSound” node. The dimension of the sound source is extended perpendicular to the vector defined in the direction field, based on the value of the “SourceDimensions” field.
  • The “location” field defines the geometrical center of the extended sound source, whereas the “SourceDimensions” field defines the three-dimensional size of the sound source. In short, the size of the sound source extended spatially is determined according to the values of Δx, Δy, and Δz.
  • FIG. 4 is a diagram illustrating how a sound source is extended in accordance with the present invention. As illustrated in the drawing, the value of the “SourceDimensions” field is (0, Δy, Δz), Δy and Δz being not zero (Δy≠0, Δz≠0). This indicates a surface sound source having an area of Δy×Δz.
  • The illustrated sound source is extended in a direction perpendicular to the vector defined in the “direction” field based on the values of the “SourceDimensions” field, i.e., (0, Δy, Δz), thereby forming a surface sound source. As shown above, when the dimension and location of a sound source are defined, the point sound sources are located on the surfaces of the extended sound source. In the present invention, the locations of the point sound sources are calculated so that they are distributed uniformly on the surfaces of the extended sound source.
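One possible way to place point sound sources uniformly on such a surface is sketched below. Several details are assumptions made for illustration and are not specified in the text: the function name `surface_points`, the choice of the two in-plane axes, the grid counts `ny` and `nz`, and the convention that the outermost points lie on the edges of the Δy×Δz rectangle centered on the location.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("direction must be a non-zero vector")
    return tuple(c / norm for c in v)


def _centered_offsets(extent, count):
    """Return `count` evenly spaced offsets spanning [-extent/2, +extent/2]."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [0.0]
    step = extent / (count - 1)
    return [-extent / 2 + i * step for i in range(count)]


def surface_points(location, direction, dy, dz, ny, nz):
    """Grid of ny*nz points on a dy-by-dz rectangle perpendicular to `direction`."""
    d = _normalize(direction)
    # Pick a helper axis that is not parallel to d (assumed convention).
    helper = (0.0, 0.0, 1.0) if abs(d[2]) < 0.9 else (0.0, 1.0, 0.0)
    u = _normalize(_cross(helper, d))  # first in-plane axis
    v = _cross(d, u)                   # second in-plane axis, already unit length
    points = []
    for a in _centered_offsets(dy, ny):
        for b in _centered_offsets(dz, nz):
            points.append(tuple(location[i] + a * u[i] + b * v[i]
                                for i in range(3)))
    return points


if __name__ == "__main__":
    for p in surface_points((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 2.0, 4.0, 2, 2):
        print(p)
```

With the direction along the x axis, the in-plane axes reduce to the y and z axes, so the grid corresponds to a “SourceDimensions” value of (0, Δy, Δz) as in FIG. 4.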
  • FIGS. 5A to 5C are diagrams depicting the distributions of point sound sources based on the shapes of various sound sources in accordance with the present invention. More specifically, FIG. 5A illustrates a distribution of point sound sources configured linearly, FIG. 5B illustrates a distribution of point sound sources configured on a planar surface, and FIG. 5C illustrates a distribution of point sound sources configured to have volume in space. The dimension and distance of a sound source are free variables, so the size of the sound source perceived by a user can be set freely.
  • For example, multi-track audio signals that are recorded by using an array of microphones can be expressed by extending point sound sources linearly as shown in FIG. 5A. In this case, the value of the “SourceDimensions” field is (0, 0, Δz).
  • Also, different sound signals can be expressed as an extension of a point sound source to generate a spread sound source. FIGS. 5B and 5C show a surface sound source expressed through the spread of the point sound source and a spatial sound source having a volume. In the case of FIG. 5B, the value of the “SourceDimensions” field is (0, Δy, Δz) and, in the case of FIG. 5C, the value of the “SourceDimensions” field is (Δx, Δy, Δz).
  • Once the dimension of a spatial sound source is defined as described above, the number of the point sound sources (i.e., the number of input audio channels) determines the density of the point sound sources in the extended sound source.
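The relationship between the “SourceDimensions” value and the resulting shape can be summarized in a short sketch. The function name `classify_source` is hypothetical, and treating any single non-zero component as a line (or any two as a surface) is a generalization assumed here; the text itself only gives the examples (0, 0, 0), (0, 0, Δz), (0, Δy, Δz) and (Δx, Δy, Δz).

```python
def classify_source(source_dimensions):
    """Map a SourceDimensions triple to the shape it describes."""
    if len(source_dimensions) != 3:
        raise ValueError("SourceDimensions must have exactly three components")
    # Count how many of the three extents are non-zero.
    nonzero = sum(1 for extent in source_dimensions if extent != 0)
    return ("point", "line", "surface", "volume")[nonzero]


if __name__ == "__main__":
    print(classify_source((0, 0, 0)))        # point sound source, no extension
    print(classify_source((0, 0, 1.5)))      # linear source, as in FIG. 5A
    print(classify_source((0, 2.0, 1.5)))    # surface source, as in FIG. 5B
    print(classify_source((1.0, 2.0, 1.5)))  # volume source, as in FIG. 5C
```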
  • If an “AudioSource” node is defined in a “source” field, the value of a “numChan” field may indicate the number of used point sound sources. The directivity defined in “angle,” “directivity” and “frequency” fields of the “DirectiveSound” node can be applied to all point sound sources included in the extended sound source uniformly.
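To make the notion of density concrete, the sketch below divides the number of point sound sources by the size of the extended source. This is only one plausible reading: the text does not define density numerically, and the function name `source_density` and the use of length, area or volume as the denominator are assumptions made for illustration.

```python
def source_density(num_chan, source_dimensions):
    """Point sources per unit length, area or volume of the extended source.

    num_chan: number of point sound sources (e.g. the "numChan" value).
    source_dimensions: the SourceDimensions triple.
    """
    extents = [abs(e) for e in source_dimensions if e != 0]
    if not extents:
        raise ValueError("a point source has no extent, so density is undefined")
    size = 1.0
    for extent in extents:
        size *= extent  # length, area or volume, depending on non-zero extents
    return num_chan / size


if __name__ == "__main__":
    # Eight channels spread over a 2 x 4 surface source.
    print(source_density(8, (0.0, 2.0, 4.0)))
```

For a fixed “SourceDimensions” value, increasing the channel count raises the density, which is the dependency stated in the text.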
  • The apparatus and method of the present invention can produce more effective three-dimensional sounds by extending the spatiality of sound sources of contents.
  • While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.

Claims (15)

What is claimed is:
1. A method for processing a three-dimensional audio scene, comprising:
generating, by a computer, a sound object related to at least one of sound source;
generating, audio scene information including sound source characteristics information for the sound object;
transmitting the sound object and the audio scene information,
wherein the sound source characteristics information is related to extension pattern of sound source in the three-dimensional space.
2. The method as recited in claim 1, wherein the sound source characteristics information comprise at least one of a shape, a size or a dimension of the sound source expressed in a three-dimensional space.
3. The method as recited in claim 2, wherein the size is expressed as x0−Δx, y0−Δy, z0−Δz; x0, y0, z0; and x0+Δx, y0+Δy, z0+Δz.
4. The method as recited in claim 3, wherein Δx, Δy, and Δz are calculated based on a vector between a listener and the location of the sound source, or angle between a direction vector of the sound source.
5. The method as recited in claim 1, wherein the three-dimensional space is related to output space of the sound source.
6. A apparatus for processing a three-dimensional audio scene, comprising:
generating unit generates a sound object related to at least one of sound source, and audio scene information including sound source characteristics information for the sound object; and
transmitting unit transmits the sound object and the audio scene information,
wherein the sound source characteristics information is related to extension pattern of sound source in the three-dimensional space.
7. The apparatus as recited in claim 6, wherein the sound source characteristics information comprise at least one of a shape, a size or a dimension of the sound source expressed in a three-dimensional space.
8. The apparatus as recited in claim 7, wherein the size is expressed as x0−Δx, y0−Δy, z0−Δz; x0, y0, z0; and x0+Δx, y0+Δy, z0+Δz.
9. The apparatus as recited in claim 8, wherein Δx, Δy, and Δz are calculated based on a vector between a listener and the location of the sound source, or angle between a direction vector of the sound source.
10. The apparatus as recited in claim 6, wherein the three-dimensional space is related to output space of the sound source.
11. A computer program embodied on a non-transitory computer readable medium, the computer program being configured to control a processor, wherein the computer program comprising:
a sound object related to at least one of sound source; and
audio scene information including sound source characteristics information for the sound object
wherein the sound source characteristics information is related to extension pattern of sound source in the three-dimensional space.
12. The computer program as recited in claim 11, wherein the sound source characteristics information comprise at least one of a shape, a size or a dimension of the sound source expressed in a three-dimensional space.
13. The computer program as recited in claim 12, wherein the size is expressed as x0−Δx, y0−Δy, z0−Δz; x0, y0, z0; and x0+Δx, y0+Δy, z0+Δz.
14. The computer program as recited in claim 13, wherein Δx, Δy, and Δz are calculated based on a vector between a listener and the location of the sound source, or angle between a direction vector of the sound source.
15. The computer program as recited in claim 11, wherein the three-dimensional space is related to output space of the sound source.
US13/925,013 2002-10-15 2013-06-24 Method for generating and consuming 3-d audio scene with extended spatiality of sound source Abandoned US20140010372A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/925,013 US20140010372A1 (en) 2002-10-15 2013-06-24 Method for generating and consuming 3-d audio scene with extended spatiality of sound source

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
KR20020062962 2002-10-15
KR10-2002-0062962 2002-10-15
KR1020030071345A KR100626661B1 (en) 2002-10-15 2003-10-14 Method of Processing 3D Audio Scene with Extended Spatiality of Sound Source
KR10-2003-0071345 2003-10-14
PCT/KR2003/002149 WO2004036955A1 (en) 2002-10-15 2003-10-15 Method for generating and consuming 3d audio scene with extended spatiality of sound source
US10/531,632 US20060120534A1 (en) 2002-10-15 2003-10-15 Method for generating and consuming 3d audio scene with extended spatiality of sound source
US11/796,808 US8494666B2 (en) 2002-10-15 2007-04-30 Method for generating and consuming 3-D audio scene with extended spatiality of sound source
US13/925,013 US20140010372A1 (en) 2002-10-15 2013-06-24 Method for generating and consuming 3-d audio scene with extended spatiality of sound source

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/796,808 Continuation US8494666B2 (en) 2002-10-15 2007-04-30 Method for generating and consuming 3-D audio scene with extended spatiality of sound source

Publications (1)

Publication Number Publication Date
US20140010372A1 (en) 2014-01-09

Family

ID=36574228

Family Applications (3)

Application Number Title Priority Date Filing Date
US10/531,632 Abandoned US20060120534A1 (en) 2002-10-15 2003-10-15 Method for generating and consuming 3d audio scene with extended spatiality of sound source
US11/796,808 Active US8494666B2 (en) 2002-10-15 2007-04-30 Method for generating and consuming 3-D audio scene with extended spatiality of sound source
US13/925,013 Abandoned US20140010372A1 (en) 2002-10-15 2013-06-24 Method for generating and consuming 3-d audio scene with extended spatiality of sound source

Country Status (5)

Country Link
US (3) US20060120534A1 (en)
EP (1) EP1552724A4 (en)
JP (1) JP4578243B2 (en)
AU (1) AU2003269551A1 (en)
WO (1) WO2004036955A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9148740B2 (en) 2010-05-04 2015-09-29 Samsung Electronics Co., Ltd. Method and apparatus for reproducing stereophonic sound
WO2022219100A1 (en) * 2021-04-14 2022-10-20 Telefonaktiebolaget Lm Ericsson (Publ) Spatially-bounded audio elements with derived interior representation
US11564050B2 (en) * 2019-12-09 2023-01-24 Samsung Electronics Co., Ltd. Audio output apparatus and method of controlling thereof

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080024434A1 (en) * 2004-03-30 2008-01-31 Fumio Isozaki Sound Information Output Device, Sound Information Output Method, and Sound Information Output Program
EP1691348A1 (en) 2005-02-14 2006-08-16 Ecole Polytechnique Federale De Lausanne Parametric joint-coding of audio sources
DE102005008343A1 (en) 2005-02-23 2006-09-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for providing data in a multi-renderer system
DE102005008342A1 (en) 2005-02-23 2006-08-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio-data files storage device especially for driving a wave-field synthesis rendering device, uses control device for controlling audio data files written on storage device
DE102005008333A1 (en) * 2005-02-23 2006-08-31 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Control device for wave field synthesis rendering device, has audio object manipulation device to vary start/end point of audio object within time period, depending on extent of utilization situation of wave field synthesis system
DE102005008366A1 (en) * 2005-02-23 2006-08-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device for driving wave-field synthesis rendering device with audio objects, has unit for supplying scene description defining time sequence of audio objects
DE102005008369A1 (en) 2005-02-23 2006-09-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for simulating a wave field synthesis system
WO2006126843A2 (en) 2005-05-26 2006-11-30 Lg Electronics Inc. Method and apparatus for decoding audio signal
JP4988717B2 (en) 2005-05-26 2012-08-01 エルジー エレクトロニクス インコーポレイティド Audio signal decoding method and apparatus
KR100857107B1 (en) 2005-09-14 2008-09-05 엘지전자 주식회사 Method and apparatus for decoding an audio signal
WO2007083958A1 (en) * 2006-01-19 2007-07-26 Lg Electronics Inc. Method and apparatus for decoding a signal
EP1974343A4 (en) * 2006-01-19 2011-05-04 Lg Electronics Inc Method and apparatus for decoding a signal
JP4814344B2 (en) 2006-01-19 2011-11-16 エルジー エレクトロニクス インコーポレイティド Media signal processing method and apparatus
CN102693727B (en) 2006-02-03 2015-06-10 韩国电子通信研究院 Method for control of randering multiobject or multichannel audio signal using spatial cue
KR20080093419A (en) 2006-02-07 2008-10-21 엘지전자 주식회사 Apparatus and method for encoding/decoding signal
BRPI0706488A2 (en) 2006-02-23 2011-03-29 Lg Electronics Inc method and apparatus for processing audio signal
TWI340600B (en) 2006-03-30 2011-04-11 Lg Electronics Inc Method for processing an audio signal, method of encoding an audio signal and apparatus thereof
US20080235006A1 (en) 2006-08-18 2008-09-25 Lg Electronics, Inc. Method and Apparatus for Decoding an Audio Signal
MX2009002795A (en) * 2006-09-18 2009-04-01 Koninkl Philips Electronics Nv Encoding and decoding of audio objects.
US7962756B2 (en) * 2006-10-31 2011-06-14 At&T Intellectual Property Ii, L.P. Method and apparatus for providing automatic generation of webpages
US8265941B2 (en) * 2006-12-07 2012-09-11 Lg Electronics Inc. Method and an apparatus for decoding an audio signal
KR100868475B1 (en) 2007-02-16 2008-11-12 한국전자통신연구원 Method for creating, editing, and reproducing multi-object audio contents files for object-based audio service, and method for creating audio presets
KR100934928B1 (en) * 2008-03-20 2010-01-06 박승민 Display Apparatus having sound effect of three dimensional coordinates corresponding to the object location in a scene
RU2556390C2 (en) 2010-12-03 2015-07-10 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Apparatus and method for geometry-based spatial audio coding
KR101901908B1 (en) * 2011-07-29 2018-11-05 삼성전자주식회사 Method for processing audio signal and apparatus for processing audio signal thereof
US10176644B2 (en) 2015-06-07 2019-01-08 Apple Inc. Automatic rendering of 3D sound
EP3378241B1 (en) * 2015-11-20 2020-05-13 Dolby International AB Improved rendering of immersive audio content
JP6786834B2 (en) 2016-03-23 2020-11-18 ヤマハ株式会社 Sound processing equipment, programs and sound processing methods
BR112021011170A2 (en) * 2018-12-19 2021-08-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. Apparatus and method for reproducing a spatially extended sound source or apparatus and method for generating a bit stream from a spatially extended sound source
US11341952B2 (en) 2019-08-06 2022-05-24 Insoundz, Ltd. System and method for generating audio featuring spatial representations of sound sources
US20230017323A1 (en) * 2019-12-12 2023-01-19 Liquid Oxigen (Lox) B.V. Generating an audio signal associated with a virtual sound source
NL2024434B1 (en) * 2019-12-12 2021-09-01 Liquid Oxigen Lox B V Generating an audio signal associated with a virtual sound source
BR112022013974A2 (en) * 2020-01-14 2022-11-29 Fraunhofer Ges Forschung APPARATUS AND METHOD FOR REPRODUCING A SPATIALLY EXTENDED SOUND SOURCE OR APPARATUS AND METHOD FOR GENERATING A DESCRIPTION FOR A SPATIALLY EXTENDED SOUND SOURCE USING ANCHORING INFORMATION
CN112839165B (en) * 2020-11-27 2022-07-29 深圳市捷视飞通科技股份有限公司 Method and device for realizing face tracking camera shooting, computer equipment and storage medium
KR20220150592A (en) 2021-05-04 2022-11-11 한국전자통신연구원 Method and apparatus for rendering a volume sound source
WO2023061965A2 (en) * 2021-10-11 2023-04-20 Telefonaktiebolaget Lm Ericsson (Publ) Configuring virtual loudspeakers

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5923788A (en) * 1995-03-06 1999-07-13 Kabushiki Kaisha Toshiba Image processing apparatus
US20030095668A1 (en) * 2001-11-20 2003-05-22 Hewlett-Packard Company Audio user interface with multiple audio sub-fields
US7027600B1 (en) * 1999-03-16 2006-04-11 Kabushiki Kaisha Sega Audio signal processing device
US20060282874A1 (en) * 1998-12-08 2006-12-14 Canon Kabushiki Kaisha Receiving apparatus and method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19721487A1 (en) * 1997-05-23 1998-11-26 Thomson Brandt Gmbh Method and device for concealing errors in multi-channel sound signals
US6330486B1 (en) 1997-07-16 2001-12-11 Silicon Graphics, Inc. Acoustic perspective in a virtual three-dimensional environment
AU761202B2 (en) 1997-09-22 2003-05-29 Sony Corporation Generation of a bit stream containing binary image/audio data that is multiplexed with a code defining an object in ascii format
US6016473A (en) 1998-04-07 2000-01-18 Dolby; Ray M. Low bit-rate spatial coding method and system
JP2000092754A (en) 1998-09-14 2000-03-31 Toshiba Corp Power circuit for electrical equipment
KR100339559B1 (en) * 1999-09-22 2002-06-03 구자홍 Heat exchanger and its manufacturing mathod for air conditioner
US20010037386A1 (en) * 2000-03-06 2001-11-01 Susumu Takatsuka Communication system, entertainment apparatus, recording medium, and program
US6985620B2 (en) * 2000-03-07 2006-01-10 Sarnoff Corporation Method of pose estimation and model refinement for video representation of a three dimensional scene
JP2001251698A (en) 2000-03-07 2001-09-14 Canon Inc Sound processing system, its control method and storage medium
JP2002218599A (en) 2001-01-16 2002-08-02 Sony Corp Sound signal processing unit, sound signal processing method
US7113610B1 (en) * 2002-09-10 2006-09-26 Microsoft Corporation Virtual sound source positioning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9148740B2 (en) 2010-05-04 2015-09-29 Samsung Electronics Co., Ltd. Method and apparatus for reproducing stereophonic sound
US9749767B2 (en) 2010-05-04 2017-08-29 Samsung Electronics Co., Ltd. Method and apparatus for reproducing stereophonic sound
US11564050B2 (en) * 2019-12-09 2023-01-24 Samsung Electronics Co., Ltd. Audio output apparatus and method of controlling thereof
WO2022219100A1 (en) * 2021-04-14 2022-10-20 Telefonaktiebolaget Lm Ericsson (Publ) Spatially-bounded audio elements with derived interior representation

Also Published As

Publication number Publication date
EP1552724A4 (en) 2010-10-20
US20070203598A1 (en) 2007-08-30
US8494666B2 (en) 2013-07-23
EP1552724A1 (en) 2005-07-13
JP2006503491A (en) 2006-01-26
AU2003269551A1 (en) 2004-05-04
JP4578243B2 (en) 2010-11-10
US20060120534A1 (en) 2006-06-08
WO2004036955A1 (en) 2004-04-29

Similar Documents

Publication Publication Date Title
US8494666B2 (en) Method for generating and consuming 3-D audio scene with extended spatiality of sound source
KR101004836B1 (en) Method for coding and decoding the wideness of a sound source in an audio scene
JP4499165B2 (en) Method for generating and consuming a three-dimensional sound scene having a sound source with enhanced spatiality
EP2954702B1 (en) Mapping virtual speakers to physical speakers
US10659904B2 (en) Method and device for processing binaural audio signal
US11930351B2 (en) Spatially-bounded audio elements with interior and exterior representations
WO2020144062A1 (en) Efficient spatially-heterogeneous audio elements for virtual reality
CN114067810A (en) Audio signal rendering method and device
US20230007427A1 (en) Audio scene change signaling
CN114630145A (en) Multimedia data synthesis method, equipment and storage medium
KR20220028021A (en) Methods, apparatus and systems for representation, encoding and decoding of discrete directional data
US20230133555A1 (en) Method and Apparatus for Audio Transition Between Acoustic Environments
KR20190060464A (en) Audio signal processing method and apparatus
CN116472725A (en) Intelligent hybrid rendering for augmented reality/virtual reality audio
KR20210120063A (en) Audio signal processing method and apparatus

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION