US20050283361A1 - Audio signal processing method, audio signal processing apparatus, audio signal processing system and computer program product

Info

Publication number
US20050283361A1
Authority
US
United States
Prior art keywords
spectral component
audio signal
template
predetermined
extracted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/020,030
Inventor
Kazuyoshi Yoshii
Hiroshi Okuno
Masataka Goto
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kyoto University
National Institute of Advanced Industrial Science and Technology AIST
Original Assignee
Kyoto University
National Institute of Advanced Industrial Science and Technology AIST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kyoto University, National Institute of Advanced Industrial Science and Technology AIST
Assigned to NATIONAL INSTITUTE OF ADVANCED INDUSTRIAL SCIENCE AND TECHNOLOGY, KYOTO UNIVERSITY reassignment NATIONAL INSTITUTE OF ADVANCED INDUSTRIAL SCIENCE AND TECHNOLOGY ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GOTO, MASATAKA, OKUNO, HIROSHI, YOSHII, KAZUYOSHI

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain

Definitions

  • the present invention relates to an audio signal processing method, an audio signal processing apparatus, and an audio signal processing system for increasing or decreasing a predetermined non-harmonic structured spectral component contained in an audio signal, as well as to a computer program product for causing a computer to increase or decrease a predetermined non-harmonic structured spectral component contained in an audio signal.
  • Graphic equalizers are widely used as means for adjusting an audio signal such as music outputted from a speaker.
  • an audio signal reproduced from a CD (compact disk) or the like can be frequency-analyzed, and then the spectra of specific frequency ranges can be increased and decreased.
  • the spectrum of a low frequency range may be increased.
  • a plurality of musical instruments are used in a musical performance, and hence a plurality of instrumental sounds are contained in the audio signal.
  • a plurality of instrumental sounds having a spectrum in the specific frequency range should be increased or decreased similarly.
  • the bass drum sound is increased, and so are other instrumental sounds such as a bass guitar sound that have a spectrum in the low frequency range of the target of increase.
  • a graphic equalizer increases and decreases the spectra of specific frequency ranges of an audio signal, and hence all the instrumental sounds that have a spectrum in the targeted frequency range are increased or decreased together.
  • An object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for extracting a predetermined non-harmonic structured spectral component contained in an audio signal and then increasing or decreasing the spectral component, so as to allow the predetermined spectral component contained in the audio signal to be independently increased or decreased without an influence on the other spectral components.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for calculating the spectrum of an audio signal by frequency analysis so as to allow a non-harmonic structured sound such as a drum sound to be extracted from the audio signal on the basis of the spectrum distribution.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for adapting a spectral component of a template in such a manner that the difference between an extracted spectral component and the spectral component of the template goes below or at a predetermined value, so as to improve the accuracy in the extraction of a non-harmonic structured sound such as a drum sound.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for selecting a predetermined number of extracted spectral components in ascending order of difference between the spectral component and a spectral component of a template and then updating the spectral component of the template into the median of the predetermined number of selected spectral components so as to permit the acquisition of a template in which the spectra of spectral components not having a non-harmonic structure are suppressed.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for quantizing an extracted spectral component and a spectral component of a template in the initial adaptation for the spectral component of the template, so as to suppress the erroneous calculation of a large difference value even though the two components are alike.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for increasing or decreasing an extracted predetermined spectral component in response to a received amount of increase or decrease so as to allow the power of the extracted predetermined spectral component to be adjusted independently of the power of the audio signal.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for causing the process of extracting a predetermined non-harmonic structured spectral component and the process of increasing or decreasing the spectral component to be performed in different apparatuses from each other, so as to allow the load to be distributed efficiently.
  • An audio signal processing method is characterized by comprising steps of extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and increasing or decreasing the extracted predetermined spectral component.
  • An audio signal processing method is based on the first invention, and characterized by further comprising a step of calculating a spectrum of the audio signal by frequency analysis, wherein, in the step of extracting the predetermined non-harmonic structured spectral component, a spectrum is extracted that corresponds to the predetermined non-harmonic structured spectral component.
  • An audio signal processing method is based on the first invention, and characterized in that the step of extracting the predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in advance, and the method further comprises a step of adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • An audio signal processing method is an audio signal processing method for extracting, with reference to a spectral component of a template stored in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, and is characterized by comprising a step of adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • An audio signal processing method is based on the third or fourth invention, and is characterized in that the adapting step further comprises steps of calculating a difference between each extracted spectral component and the spectral component of the template in case that a plurality of spectral components have been extracted; selecting a predetermined number of spectral components in ascending order of the calculated difference; and updating the spectral component of the template into a median of the predetermined number of selected spectral components.
  • An audio signal processing method is based on the fifth invention, and characterized by further comprising a step of quantizing the extracted spectral components and the spectral component of the template in an initial adaptation for the spectral component of the template, wherein, in the step of calculating a difference, a difference is calculated between each extracted spectral component and the spectral component of the template which have been quantized.
  • An audio signal processing method is based on the first or fourth invention, and characterized by further comprising a step of receiving an amount of increase or decrease for the predetermined spectral component, wherein, in the increasing or decreasing step, the extracted predetermined spectral component is increased or decreased in response to the received amount of increase or decrease.
  • An audio signal processing method is characterized by comprising steps of extracting a predetermined non-harmonic structured spectral component contained in an audio signal; outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal, the predetermined spectral component, and the audio signal; receiving the outputted onset time information, the predetermined spectral component, and the audio signal; and increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the received onset time information.
  • An audio signal processing apparatus is characterized by comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and increasing and decreasing means for increasing or decreasing the predetermined spectral component extracted by the extracting means.
  • An audio signal processing apparatus is based on the ninth invention, and characterized by further comprising calculating means for calculating a spectrum of the audio signal by frequency analysis, wherein the extracting means extracts a spectrum corresponding to the predetermined non-harmonic structured spectral component.
  • An audio signal processing apparatus is based on the tenth invention, and characterized in that the extraction of a predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in a storage unit in advance, and the apparatus further comprises adapting means for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • An audio signal processing apparatus is an audio signal processing apparatus for extracting, with reference to a spectral component of a template stored in a storage unit in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, and characterized by comprising adapting means for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • An audio signal processing apparatus is based on the eleventh or twelfth invention, and characterized in that the adapting means further comprises: subtracting means for calculating a difference between each extracted spectral component and the spectral component of the template in case that a plurality of spectral components have been extracted; selecting means for selecting a predetermined number of spectral components in ascending order of the difference calculated by the subtracting means; and updating means for updating the spectral component of the template into a median of the predetermined number of spectral components selected by the selecting means.
  • An audio signal processing apparatus is based on the thirteenth invention, and characterized by further comprising quantizing means for quantizing the extracted spectral components and the spectral component of the template in an initial adaptation for the spectral component of the template, wherein the subtracting means calculates a difference between each extracted spectral component and the spectral component of the template which have been quantized by the quantizing means.
  • An audio signal processing apparatus is based on the ninth or twelfth invention, and characterized by further comprising receiving means for receiving an amount of increase or decrease for the predetermined spectral component, wherein the increasing and decreasing means increases or decreases the extracted predetermined spectral component in response to the amount of increase or decrease received by the receiving means.
  • An audio signal processing system is characterized by including: a first audio signal processing apparatus comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting means for outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal by the extracting means, the predetermined spectral component, and the audio signal; and a second audio signal processing apparatus comprising: receiving means for receiving the onset time information, the predetermined spectral component, and the audio signal outputted from the first audio signal processing apparatus; and increasing and decreasing means for increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the onset time information received by the receiving means.
  • An audio signal processing apparatus is characterized by comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting means for outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal by the extracting means, the predetermined spectral component, and the audio signal.
  • An audio signal processing apparatus is characterized by comprising: receiving means for receiving onset time information of the extraction of a predetermined non-harmonic structured spectral component from an audio signal, the predetermined spectral component, and the audio signal; and increasing and decreasing means for increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the onset time information received by the receiving means.
  • a computer program product is a computer program product for causing a computer to process an audio signal
  • the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and characterized in that the computer readable program code means comprises instructions for: extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and increasing or decreasing the extracted predetermined spectral component.
  • a computer program product is based on the nineteenth invention, and characterized in that the computer readable program code means further comprises an instruction for calculating a spectrum of the audio signal by frequency analysis, and the extracting instruction causes the computer to extract a spectrum corresponding to the predetermined non-harmonic structured spectral component.
  • a computer program product is based on the twentieth invention, and characterized in that the instruction for extracting a predetermined non-harmonic structured spectral component is executed with reference to a spectral component of a template stored in advance, and the computer readable program code means further comprises an instruction for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • a computer program product is a computer program product for causing a computer to extract, with reference to a spectral component of a template stored in a memory in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, and characterized in that the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and characterized in that the computer readable program code means comprises an instruction for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • a computer program product is based on the twenty-first or twenty-second invention, and characterized in that, in the adapting instruction, the computer readable program code means further comprises instructions for: calculating a difference between each extracted spectral component and the spectral component of the template in case that a plurality of spectral components have been extracted; selecting a predetermined number of spectral components in ascending order of the calculated difference; and updating the spectral component of the template into a median of the predetermined number of selected spectral components.
  • a computer program product is based on the twenty-third invention, and characterized in that the computer readable program code means further comprises an instruction for quantizing the extracted spectral components and the spectral component of the template in an initial adaptation for the spectral component of the template; and the instruction for calculating a difference causes the computer to calculate a difference between each extracted spectral component and the spectral component of the template which have been quantized.
  • a computer program product is based on the nineteenth or twenty-second invention, and characterized in that the computer readable program code means further comprises an instruction for receiving an amount of increase or decrease for the predetermined spectral component; and the increasing or decreasing instruction causes the computer to increase or decrease the extracted predetermined spectral component in response to the received amount of increase or decrease.
  • a computer program product is a computer program product for causing a computer to process an audio signal
  • the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and characterized in that the computer readable program code means comprises instructions for: extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal, the predetermined spectral component, and the audio signal.
  • a computer program product is a computer program product for causing a computer to process an audio signal
  • the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium
  • the computer readable program code means comprises instructions for: receiving onset time information of the extraction of a predetermined non-harmonic structured spectral component from an audio signal, the predetermined spectral component, and the audio signal; and increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the received onset time information.
  • a predetermined non-harmonic structured spectral component contained in an audio signal is extracted.
  • An example of the non-harmonic structured tone is a sound of a percussion instrument such as a drum.
  • the extracted predetermined spectral component is increased or decreased. For example, when the extracted spectral component of a drum is increased, the drum sound is emphasized. On the contrary, when the extracted spectral component of a drum is decreased, the drum sound is cancelled.
  • a predetermined spectral component contained in an audio signal is solely extracted and can be independently increased or decreased without an influence on the other spectral components.
  • the spectrum of an audio signal is calculated by frequency analysis.
  • the sound of a percussion instrument such as a drum is of non-harmonic structure, and has slight or no harmonic structure.
  • the sounds of other types of musical instruments have a harmonic structure.
  • the non-harmonic structured sound of a percussion instrument such as a drum can be discriminated from the harmonic structured sounds of other types of musical instruments. That is, the non-harmonic structured sound of a percussion instrument such as a drum can be extracted from the audio signal on the basis of the spectrum distribution.
  • the extraction of a predetermined non-harmonic structured spectral component is performed on the basis of a spectral component of a template stored in advance.
  • a template of a drum sound is stored in a storage unit in advance. Nevertheless, it is extremely rare that the drum sound contained in an audio signal agrees completely with the drum sound of the template stored in advance. These sounds usually differ from each other more or less.
  • the spectral component of the template is adapted in such a manner that the difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • the difference between each extracted spectral component and a spectral component of a template is calculated. Then, a predetermined number of spectral components are selected in ascending order of the calculated difference. The spectral component of the template is then updated into the median of the predetermined number of selected spectral components, so that the template is adapted.
  • the spectral structure of a non-harmonic structured spectral component usually appears in the same position of the selected spectral components. In contrast, the spectral structure of a harmonic structured spectral component seldom appears in the same position of the selected spectral components.
  • the spectral structure of the non-harmonic structured spectral component is expected to be retained, whereas harmonic structured musical instrumental sounds other than the sound of a percussion instrument such as a drum are seldom retained.
  • the spectra of spectral components not having a non-harmonic structure are suppressed.
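The selection and median update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the Euclidean distance, the function names `distance` and `update_template`, and the toy four-bin spectra are all invented for the example.

```python
import math
from statistics import median


def distance(a, b):
    """Euclidean distance between two equal-length spectra (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_template(template, segments, n_select):
    """Select the n_select segments closest to the template, then
    replace the template by the per-bin median of the selected segments."""
    closest = sorted(segments, key=lambda seg: distance(seg, template))[:n_select]
    return [median(bin_values) for bin_values in zip(*closest)]


# Hypothetical data: the drum in this piece has power in bins 0 and 2.
template = [0.8, 0.0, 0.6, 0.0]
segments = [
    [1.0, 0.0, 1.2, 0.0],  # drum alone
    [1.0, 0.9, 1.2, 0.0],  # drum plus another sound in bin 1
    [1.0, 0.0, 1.2, 0.8],  # drum plus another sound in bin 3
    [0.0, 0.9, 0.0, 0.9],  # no drum, farthest from the template
]
adapted = update_template(template, segments, n_select=3)
print(adapted)  # [1.0, 0.0, 1.2, 0.0]
```

In the toy data the drum bins hold the same value in every selected segment, while the extra peaks in bins 1 and 3 each occur in only one of the three, so the per-bin median keeps the former and drops the latter.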
  • extracted spectral components and a spectral component of a template are quantized in the initial adaptation for the spectral component of the template, and then the difference is calculated between each extracted spectral component and the spectral component of the template which have been quantized.
  • in template adaptation, since it is extremely rare that a drum sound, for example, contained in an audio signal agrees completely with a template drum sound, a large difference could be erroneously calculated even though the two sounds are alike.
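The effect of quantizing before taking the difference can be illustrated with a small sketch. The text above does not say how the quantization is carried out, so the block below assumes one possible reading: averaging adjacent frequency bins into a coarser grid. The names `coarsen` and `l1_difference`, the block size, and the spectra are hypothetical.

```python
def coarsen(spectrum, block):
    """Average each run of `block` adjacent bins into one coarse bin
    (an assumed form of quantization, chosen only for illustration)."""
    chunks = [spectrum[i:i + block] for i in range(0, len(spectrum), block)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def l1_difference(a, b):
    """Sum of absolute per-bin differences between two spectra."""
    return sum(abs(x - y) for x, y in zip(a, b))


# The segment has the same two peaks as the template, each shifted by one bin.
template = [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0]
segment = [4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0]

raw = l1_difference(template, segment)
coarse = l1_difference(coarsen(template, 2), coarsen(segment, 2))
print(raw, coarse)  # 12.0 0.0
```

At full resolution the two spectra look very different because no peak lines up, whereas on the coarser grid they coincide. That is the kind of spurious large difference the initial quantization is meant to suppress.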
  • an amount of increase or decrease for a predetermined spectral component is received, and then the extracted predetermined spectral component is increased or decreased in response to the received amount of increase or decrease.
  • an increase and decrease knob similar to a volume control knob for the power of the audio signal may be used for inputting the amount of increase or decrease.
  • a user adjusts the increase and decrease knob so as to vary the power of the extracted predetermined spectral component independently of the power of the audio signal.
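A minimal sketch of such a knob, assuming that power spectra add bin by bin and that the knob value is a linear gain (1.0 leaves the sound unchanged, 0.0 removes the extracted component). The function name `adjust_component` and the numbers are hypothetical.

```python
def adjust_component(mixture, component, gain):
    """Scale only the extracted component inside the mixture spectrum.

    mixture and component are per-bin power values for one frame.
    The additive power model and the clamp at zero are assumptions.
    """
    return [max(m + (gain - 1.0) * c, 0.0) for m, c in zip(mixture, component)]


mixture = [5.0, 3.0, 2.0]  # all instruments together
drum = [4.0, 0.0, 1.0]     # extracted drum component

print(adjust_component(mixture, drum, 2.0))  # [9.0, 3.0, 3.0]
print(adjust_component(mixture, drum, 0.0))  # [1.0, 3.0, 1.0]
```

Bin 1 contains no drum power, so it is left untouched at every gain setting, unlike a graphic equalizer band that would scale everything in the band.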
  • a predetermined non-harmonic structured spectral component contained in an audio signal is extracted. Then, the onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal, the predetermined spectral component, and the audio signal are outputted. These outputs may be recorded in a recording medium or transmitted through a communication network.
  • the onset time information, the predetermined spectral component, and the audio signal which have been outputted are received. Then, the received spectral component contained in the received audio signal is increased or decreased on the basis of the received onset time information.
  • Various types of information described here may be received in the form of a recording medium or through a communication network.
  • the extraction of a predetermined non-harmonic structured spectral component is a task of heavy load, and hence is desired to be carried out by a high performance computer or the like.
  • the increasing or decreasing of a predetermined spectral component is a task of light load, and hence may be carried out by a general audio device or the like.
  • the load is efficiently distributed so that even an audio device of low performance can increase or decrease the predetermined non-harmonic structured spectral component.
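The division of work between the two apparatuses can be sketched as a producer that serializes its three outputs and a consumer that only applies a gain. Everything below is an illustrative assumption: the JSON encoding, the `ExtractionResult` fields, and the simplification that the component occupies a single frame at each onset.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExtractionResult:
    onset_frames: list  # frame indices at which the component was detected
    component: list     # per-bin power of the extracted component
    signal: list        # power spectrum of the audio signal, one list per frame


def export_result(result):
    """First apparatus: package the three outputs for storage or transmission."""
    return json.dumps(asdict(result))


def apply_gain(payload, gain):
    """Second apparatus: scale the component only at the received onset frames."""
    data = ExtractionResult(**json.loads(payload))
    onsets = set(data.onset_frames)
    adjusted = []
    for t, frame in enumerate(data.signal):
        if t in onsets:
            frame = [max(p + (gain - 1.0) * c, 0.0)
                     for p, c in zip(frame, data.component)]
        adjusted.append(frame)
    return adjusted


result = ExtractionResult(
    onset_frames=[1],
    component=[2.0, 0.0],
    signal=[[1.0, 1.0], [3.0, 1.0], [1.0, 1.0]],
)
print(apply_gain(export_result(result), 0.0))  # [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
```

The receiving side performs only a lookup and a few multiplications per onset, which is why it could plausibly run on a low-performance audio device while the extraction runs elsewhere.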
  • a predetermined spectral component contained in an audio signal can be independently increased or decreased without an influence on the other spectral components.
  • a non-harmonic structured sound such as a drum sound can be extracted from an audio signal on the basis of the spectrum distribution.
  • the accuracy is improved in the extraction of a non-harmonic structured sound such as a drum sound.
  • the invention allows various non-harmonic structured sounds such as various drum sounds to be extracted on the basis of a single template.
  • a template is obtained in which the spectra of spectral components not having a non-harmonic structure are suppressed.
  • the power of an extracted predetermined spectral component can be adjusted independently of the power of the audio signal.
  • the process of extracting a predetermined non-harmonic structured spectral component and the process of increasing or decreasing the spectral component are carried out by different apparatuses from each other.
  • the load is efficiently distributed so that even a general audio device or the like can increase or decrease a predetermined non-harmonic structured spectral component.
  • FIG. 1 is a block diagram showing an exemplary configuration of a computer (audio signal processing apparatus) according to the invention;
  • FIG. 2 is a graph showing an example of a low pass filter function F(f);
  • FIG. 3A, FIG. 3B and FIG. 3C are graphs each showing an example of the distance between a template T_g and a spectrum segment P_i;
  • FIG. 4A and FIG. 4B are diagrams each showing an example of determination whether a spectrum is contained or not;
  • FIG. 5A, FIG. 5B and FIG. 5C are schematic diagrams each illustrating a time series (frame series) of graphs showing an example of increasing or decreasing a drum sound at onset time;
  • FIG. 6 is a flow chart showing an exemplary procedure of increasing or decreasing a drum sound by means of template adaptation;
  • FIG. 7 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of template adaptation shown in FIG. 6;
  • FIG. 8 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of template matching shown in FIG. 6;
  • FIG. 9 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of spectrum segment adjustment shown in FIG. 8;
  • FIG. 10 is a block diagram showing an exemplary configuration of an audio signal processing apparatus according to the invention embodied as an audio device.
  • FIG. 1 is a block diagram showing an exemplary configuration of a computer (audio signal processing apparatus) according to the invention.
  • the computer 10 comprises: a CPU (central processing unit) 11; a RAM (random access memory) 12 such as a DRAM; an HDD (hard disk drive) 13; an external storage unit 14 such as a flexible disk drive or a CD-ROM drive; and a communication unit 17 for performing communications with a communication network 20 such as a LAN (local area network) or the Internet.
  • the computer 10 further comprises: an input unit 15 provided with a keyboard and a mouse; and a display unit 16 provided with a CRT display, a liquid crystal display, or the like.
  • the CPU 11 controls the system components 12 through 17 described above.
  • the CPU 11 causes the RAM 12 to store programs and data received through the input unit 15 or the communication unit 17, programs and data read out from a recording medium by the HDD 13 or the external storage unit 14, and the like. Further, the CPU 11 performs various processing such as the execution of the programs stored in the RAM 12 and arithmetic operations on the stored data, and causes the RAM 12 to store the results of the various processing as well as temporary data used in the various processing.
  • the data such as operation results temporarily stored in the RAM 12 is transferred to the HDD 13 and outputted through the display unit 16 or the communication unit 17 under the control of the CPU 11.
  • the HDD 13 stores an audio signal (sound data) received from the outside by the computer 10 .
  • the computer 10 extracts a non-harmonic structured sound (spectral component), such as the sound of a drum or another percussion instrument, contained in the audio signal, and then increases or decreases the extracted sound. The amount of increase or decrease of the extracted sound is received through the input unit (receiving means) 15 .
  • the non-harmonic structured sound is a sound having almost no harmonic structure. However, the sound may contain a very weak harmonic structure negligible in comparison with general musical instrumental sounds having a harmonic structure.
  • the CPU 11 serves as means (calculating means) for calculating the power spectrum P(t, f) of an audio signal at a frame t and frequency f.
  • the audio signal is sampled at 44.1 kHz.
  • an STFT (Short Time Fourier Transformation) is applied with a Hanning window having a window width of 4096 points (a frequency resolution of 10.8 Hz) and a window shift length of 441 points (a time resolution of 10 ms).
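The quoted resolutions follow directly from the sampling rate, the window width, and the window shift. The following is a minimal sketch of that arithmetic and of the power spectrum of a single Hanning-windowed frame. The function names are hypothetical, and the naive O(n²) DFT is used only to keep the example dependency-free; a practical implementation would use an FFT.

```python
import cmath
import math

SAMPLE_RATE = 44_100  # Hz, from the text
WINDOW = 4096         # window width in points, from the text
HOP = 441             # window shift in points, from the text


def resolutions(sample_rate=SAMPLE_RATE, window=WINDOW, hop=HOP):
    """Return (frequency resolution in Hz, time resolution in ms)."""
    return sample_rate / window, 1000.0 * hop / sample_rate


def hann(n):
    """Symmetric Hanning window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_power_spectrum(frame):
    """Power spectrum of one windowed frame, bins 0 .. n/2 - 1 (naive DFT)."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    spectrum = []
    for f in range(n // 2):
        acc = sum(
            windowed[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum
```

With the default parameters, `resolutions()` gives roughly 10.77 Hz and 10 ms, and a 4096-point frame yields 2048 frequency bins per frame.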
  • the CPU 11 serves also as means for detecting an onset time candidate o i of a drum.
  • the onset time candidate o i of the drum is detected, for example, as a time (frame) where the power spectrum rises steeply.
  • the CPU 11 calculates the differential Q(t, f) at frame t.
  • the CPU 11 multiplies Q(t, f) by a low pass filter function F(f) based on the typical frequency characteristics of a drum, and calculates a sum S(t) in the frequency direction according to the following equation.
  • FIG. 2 is a graph showing an example of the low pass filter function F(f).
  • the horizontal axis indicates frequency f, while the vertical axis indicates F(f).
  • the low pass filter function F(f) is stored in the HDD 13 in advance.
  • the CPU 11 finds each time at which the sum S(t) in the frequency direction reaches a maximum (peak), and then determines that time to be an onset time candidate o i . Before the detection of the maxima, the CPU 11 preferably performs 11-frame smoothing on S(t) by the method of Savitzky and Golay.
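The onset candidate detection described above can be sketched as follows. Several details are assumptions made for illustration, not taken from the text: Q(t, f) is taken to be the positive frame-to-frame rise of the power spectrum, the 11-frame Savitzky-Golay smoothing uses the standard quadratic/cubic coefficients, edges are padded by repetition, and all function names are hypothetical.

```python
# 11-point quadratic/cubic Savitzky-Golay smoothing coefficients.
# The polynomial order is an assumption; the text only says "11-frame".
SG11 = [-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36]
SG11_NORM = 429


def onset_strength(power, lowpass):
    """S(t) = sum over f of F(f) * Q(t, f).

    Q(t, f) is assumed to be max(P(t, f) - P(t-1, f), 0).
    """
    strength = [0.0]
    for t in range(1, len(power)):
        total = 0.0
        for f, weight in enumerate(lowpass):
            rise = power[t][f] - power[t - 1][f]
            if rise > 0.0:
                total += weight * rise
        strength.append(total)
    return strength


def smooth_sg11(values):
    """11-frame Savitzky-Golay smoothing with edge values repeated."""
    half = len(SG11) // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [
        sum(c * padded[i + k] for k, c in enumerate(SG11)) / SG11_NORM
        for i in range(len(values))
    ]


def onset_candidates(strength):
    """Frames where S(t) is positive and has a local maximum."""
    return [
        t
        for t in range(1, len(strength) - 1)
        if strength[t] > 0.0
        and strength[t] > strength[t - 1]
        and strength[t] >= strength[t + 1]
    ]
```

A typical call chain under these assumptions would be `onset_candidates(smooth_sg11(onset_strength(power, lowpass)))`, where `power` is a list of per-frame power spectra and `lowpass` holds F(f) for each bin.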
  • the HDD (storage unit) 13 stores a seed template T s created on the basis of a single tone signal of a drum.
  • the seed template T s is a power spectrum having a predetermined time length and acquired by STFT starting at an onset time.
  • the seed template T s is in the form of a matrix whose rows correspond to time and whose columns correspond to frequency.
  • Each component is specified as a seed template T s (t, f) (where 1 ≤ t ≤ 15 and 1 ≤ f ≤ 2048).
  • the CPU 11 serves as means (adapting means) for adapting the seed template T s to an audio signal of the target of analysis.
  • the CPU 11 updates the seed template T s as described later, and repeats the update of the template after that.
  • the spectrum segment P i is a matrix having the same size as the template T g .
  • the extraction of the spectrum segment is carried out as described above. Nevertheless, the time resolution of 10 ms is not sufficient for the template to be adapted accurately.
  • a correction process is preferably performed on the onset time candidate o i .
  • the CPU 11 serves as means for correcting the onset time candidate o i (ms) into o i ′ (ms), and then extracts a spectrum segment P i for the corrected onset time candidate o i ′ (ms).
  • the CPU 11 adopts as the spectrum segment P i the power spectrum extracted from those starting at time o i ′ (ms).
  • the CPU 11 then acquires an offset value J maximizing the correlation value Corr(j), and determines the P i j with the obtained offset value J to be P i .
  • the CPU 11 further calculates a template T g ′ and a spectrum segment P i ′ which are generated by multiplying the template T g and the spectrum segment P i respectively by the low pass filter function F(f) according to the following equations.
  • the CPU 11 serves as means (selecting means) for selecting a predetermined number M of spectrum segments that are similar to the template T g in the course of adaptation.
  • the predetermined number M has a constant ratio (0.1 in the present embodiment) to the total number of spectrum segments (detected onset time candidates).
  • the CPU 11 serves also as subtracting means. That is, the CPU 11 calculates the distance (difference) D i between the template T g and the spectrum segment P i , and then selects a predetermined number M of spectrum segments in ascending order of the calculated distance.
  • the distance D i may be calculated according to the following equation.
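The selection of the M closest spectrum segments can be sketched as below. The Euclidean form of the distance is an assumption, since the equation itself is not reproduced in this text, and the multiplication by the low pass filter function F(f) is omitted for brevity. The function names are hypothetical.

```python
import math


def distance(template, segment):
    """Euclidean distance between two equally sized time-frequency matrices.

    The Euclidean form is an assumption made for this sketch.
    """
    return math.sqrt(
        sum(
            (a - b) ** 2
            for row_t, row_p in zip(template, segment)
            for a, b in zip(row_t, row_p)
        )
    )


def select_closest(template, segments, ratio=0.1):
    """Indices of the M segments nearest the template, in ascending distance.

    M is a constant ratio (0.1 in the embodiment) of the number of segments,
    with a floor of one segment added here as a safeguard.
    """
    m = max(1, round(ratio * len(segments)))
    order = sorted(
        range(len(segments)), key=lambda i: distance(template, segments[i])
    )
    return order[:m]
```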
  • FIG. 3A , FIG. 3B and FIG. 3C are graphs each showing an example of the distance between a template T g and a spectrum segment P i .
  • the horizontal axis indicates frequency f, while the vertical axis indicates power P.
  • a solid line indicates the spectrum segment P i , while a broken line indicates the template T g .
  • In FIG. 3A , owing to merely a small difference in the power peak position, a notably large distance is erroneously calculated between the two spectra.
  • the seed template T 0 (T s ) and the spectrum segment P i are quantized with lower time and frequency resolutions in the initial adaptation as shown in FIG. 3B and FIG. 3C . Then, the distance D i is calculated. In an example, the time resolution after quantization is made to be 2 frames (20 ms), and the frequency resolution is made to be 5 bins (54 Hz).
  • the CPU 11 serves also as quantizing means. That is, the CPU 11 quantizes the seed template T 0 and the spectrum segment P i , and then calculates quantized spectra T 0 ′′(t′′, f′′) and P i ′′(t′′, f′′) according to the following equations, respectively.
  • the CPU 11 calculates the distance D i between the seed template T 0 (T s ) and the spectrum segment P i according to the following equation.
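The effect of the coarser resolution can be illustrated with a small sketch. It assumes that quantization means averaging the power over blocks of 2 frames by 5 bins; the actual equations are not reproduced in this text, so the averaging rule and the function name are assumptions.

```python
def quantize(matrix, t_block=2, f_block=5):
    """Coarsen a time-frequency matrix by averaging t_block x f_block cells.

    Averaging is an assumption; partial blocks at the edges are averaged
    over the elements they actually contain.
    """
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for t0 in range(0, rows, t_block):
        out_row = []
        for f0 in range(0, cols, f_block):
            cell = [
                matrix[t][f]
                for t in range(t0, min(t0 + t_block, rows))
                for f in range(f0, min(f0 + f_block, cols))
            ]
            out_row.append(sum(cell) / len(cell))
        out.append(out_row)
    return out
```

Two spectra whose peaks differ by a single bin differ strongly element by element, but map to the same quantized matrix when both peaks fall in the same block, which is the situation the initial adaptation is meant to tolerate.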
  • the seed template T 0 can be adapted to a drum sound in an audio signal containing plural types of instrumental sounds.
  • the drum sound of the template approaches the drum sound contained in the audio signal so that the template adaptation is achieved.
  • the amount of change in the template becomes smaller so that the adaptation converges.
  • the CPU 11 serves as means for comparing the present template T g with a new template T g+1 , and thereby determining that the adaptation has converged when the difference between the two spectra is at or below a predetermined value. At that time, the CPU 11 adopts the new template T g+1 as an adapted template T A .
  • the CPU 11 serves also as means (extracting means) for performing template matching based on the adapted template T A and thereby determining whether the drum is generating a sound at an onset time candidate o i or not.
  • the CPU 11 multiplies the adapted template T A by the low pass filter function F(f) described above, and thereby calculates according to the following equation a weight function ω that indicates the magnitude of characteristics on the spectrum at each frame t of the adapted template T A and at each frequency f.
  • $\omega(t, f) = F(f) \cdot T_A(t, f)$
  • the power of each spectrum segment is preferably adjusted such that the power matches with that of the template.
  • $\eta_i(t, f_{t,k}) = P_i(t, f_{t,k}) - T_A(t, f_{t,k})$
  • the CPU 11 selects the value of η i (t, f t,k ) at the first quartile point (the point at 25% of the sample set sorted in ascending order), and thereby adopts this value as the power difference δ i (t) at frame t.
  • the CPU 11 determines that T A is not contained in the spectrum segment P i .
  • the CPU 11 calculates the final power difference η i (the adjustment value for the spectrum segment: η i ) according to the following equation.
  • $\eta_i = \dfrac{\sum_{\{t \mid \delta_i(t) > \Psi\}} \delta_i(t)\, \omega(t, f_{t,K_i(t)})}{\sum_{\{t \mid \delta_i(t) > \Psi\}} \omega(t, f_{t,K_i(t)})}$
  • the CPU 11 serves also as means for calculating the distance between the adapted template T A and the adjusted spectrum segment P i ′. At the calculation of the distance, the CPU 11 determines whether the spectrum of the adapted template T A is contained in the spectrum of the spectrum segment P i ′.
  • FIG. 4A and FIG. 4B are graphs each showing an example of determination whether a spectrum is contained or not.
  • the horizontal axis indicates frequency f, while the vertical axis indicates power P.
  • a solid line indicates a spectrum segment P i ′, while a broken line indicates an adapted template T A . For example, in case that a spectrum segment P i ′(t, f) is larger than the adapted template T A (t, f) all over the frequency range as shown in FIG.
  • the spectrum segment P i ′ (t, f) contains not only the spectral component of a drum sound but also the spectral components of other musical instruments, and that the adapted template T A (t, f) is contained in the spectrum segment P i ′ (t, f)
  • the adapted template T A (t, f) is not contained in the spectrum segment P i ′(t, f).
  • the CPU 11 calculates a local distance measure γ i (t, f) between the adapted template T A and the spectrum segment P i ′ at frame t and frequency f according to the following equation.
  • $\gamma_i(t, f) = \begin{cases} 0 & (\text{if } P_i'(t, f) - T_A(t, f) \geq \Psi) \\ 1 & (\text{otherwise}) \end{cases}$
  • Here, Ψ is a negative constant (a non-zero negative number).
  • the CPU 11 integrates the distance measure γ i over the time-frequency domain, and thereby acquires the overall distance Γ i .
  • the CPU 11 performs a weighting operation of multiplying the distance measure by the weight function ω according to the following equation.
  • the CPU 11 serves also as means for determining whether the target drum has generated a sound in the spectrum segment P i ′(t, f) portion or not. More specifically, in case that Γ i < θ is satisfied, the CPU 11 determines that the target drum has generated a sound, and then decides the onset time candidate o i as the onset time.
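The weighted distance and the final decision can be sketched as follows. The weight is the product of F(f) and the adapted template, the local measure is 0 where the adjusted segment does not fall below the template by more than a negative constant and 1 elsewhere, and the weighted sum is compared with a threshold. The parameter names `psi` (the negative constant) and `threshold`, and the function names, are hypothetical.

```python
def weight_function(lowpass, template):
    """Weight at (t, f): F(f) multiplied by the adapted template value."""
    return [[fw * v for fw, v in zip(lowpass, row)] for row in template]


def overall_distance(segment_adj, template, weight, psi):
    """Weighted sum of the local distance measure over time and frequency.

    The local measure is 0 where segment - template >= psi (psi < 0),
    and 1 otherwise.
    """
    total = 0.0
    for row_p, row_t, row_w in zip(segment_adj, template, weight):
        for p, t, w in zip(row_p, row_t, row_w):
            local = 0.0 if p - t >= psi else 1.0
            total += w * local
    return total


def drum_sounded(segment_adj, template, lowpass, psi, threshold):
    """True when the overall distance is below the threshold."""
    weight = weight_function(lowpass, template)
    return overall_distance(segment_adj, template, weight, psi) < threshold
```

Because the local measure only penalizes bins where the segment is clearly weaker than the template, extra power contributed by other instruments does not increase the distance.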
  • FIG. 5A , FIG. 5B and FIG. 5C are schematic diagrams each illustrating a time series (frame series) of graphs showing an example of increasing or decreasing a drum sound at onset time.
  • the horizontal axis indicates frequency f, while the vertical axis indicates power P.
  • Symbol t indicates time (frame).
  • the CPU 11 multiplies a spectrum P x corresponding to the adapted template T A by r (0 ≤ r ≤ 1) (the broken line in FIG. 5B indicates P x without the multiplication by r, while the solid line indicates P x multiplied by r).
  • the CPU 11 then subtracts r·P x from the spectrum P of the audio signal shown in FIG. 5A , and thereby calculates an audio signal P′ shown in FIG. 5C where the drum sound is decreased. In case that the drum sound is to be increased, the CPU 11 adds r·P x to the spectrum P of the audio signal.
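The increase or decrease step amounts to adding or subtracting a scaled copy of the drum spectrum. A minimal sketch follows; the function name is hypothetical, and clamping the result at zero is an added safeguard that the text does not mention.

```python
def rescale_drum(spectrum, drum, r, increase=False):
    """Return P + r * P_x (increase) or P - r * P_x (decrease).

    Clamping negative results to zero is an assumption of this sketch.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    sign = 1.0 if increase else -1.0
    return [
        [max(0.0, p + sign * r * px) for p, px in zip(row_p, row_x)]
        for row_p, row_x in zip(spectrum, drum)
    ]
```

Here `spectrum` holds the frames of the audio signal around a decided onset time and `drum` holds the corresponding frames of the spectrum P x.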
  • the CPU 11 calculates various numerical data.
  • the numerical data calculated by the CPU 11 is stored in the RAM 12 or the HDD 13 . Further, when the CPU 11 is to calculate other numerical data on the basis of already calculated numerical data, the CPU 11 reads necessary numerical data from the RAM 12 before the new calculation.
  • a computer program stored in a recording medium 19 such as a CD-ROM is read by the external storage unit 14 and then temporarily stored in the HDD 13 or the RAM 12 . After that, the computer program is executed by the CPU 11 .
  • This approach allows the CPU 11 to serve as various system components described above.
  • a computer program may be received via the communication unit 17 from another apparatus connected to the communication network 20 , and then temporarily stored in the HDD 13 or the RAM 12 . After that, the computer program may be executed by the CPU 11 .
  • FIG. 6 is a flow chart showing an exemplary procedure of increasing or decreasing a drum sound by means of template adaptation. The procedure shown in the flow chart of FIG. 6 is carried out when the CPU 11 executes a computer program stored in the HDD 13 or the RAM 12 .
  • the computer 10 reads an audio signal (sound data), for example, from a recording medium 19 in the external storage unit 14 , and then stores the data into the HDD 13 .
  • the computer 10 may store into the HDD 13 sound data (an audio signal, hereafter) that are inputted through a sound card (not shown) and then converted into an audio signal.
  • the computer 10 further reads a drum sound template (seed template T s ), for example, from a recording medium 19 in the external storage unit 14 , and then stores the data into the HDD 13 .
  • the CPU 11 first performs frequency analysis on the audio signal so as to calculate the power spectrum P, and then stores into the HDD 13 the data of the calculated power spectrum P.
  • the CPU 11 detects an onset time candidate o i (S 10 ) on the basis of a power spectrum P extracted and stored in the HDD 13 .
  • the CPU 11 stores the detected onset time candidate o i into the HDD 13 .
  • the CPU 11 extracts (calculates) a spectrum segment P i (S 12 ), and then stores the data of the extracted spectrum segment P i into the HDD 13 .
  • the CPU 11 performs template adaptation (S 14 ), and thereby updates the template T g (the seed template T s in the beginning) stored in the HDD 13 .
  • the CPU 11 performs template matching by using the adapted template T A , and then decides the onset time (extracts a drum sound) (S 16 ).
  • the CPU 11 stores the decided onset time into the HDD 13 .
  • the CPU 11 increases or decreases the power spectrum in the vicinity of the decided onset time (S 18 ), and thereby creates an audio signal used as an output.
  • the CPU 11 stores this audio signal into the HDD 13 .
  • the increase or decrease of the power spectrum is performed in response to the amount of increase or decrease received through the input unit 15 .
  • the audio signal (sound data) used as an output may be outputted and recorded into a recording medium 19 in the external storage unit 14 . Alternatively, the audio signal used as an output may be outputted through a sound card not shown.
  • FIG. 7 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of template adaptation (S 14 ) shown in FIG. 6 .
  • the CPU 11 first calculates the distance D i between the spectrum segment P i and the template T g (S 20 ), and then stores the calculated distance D i into the HDD 13 . In the initial process, the distance D i is calculated after quantization.
  • the CPU 11 selects spectrum segments P i having smaller calculated distances D i (S 22 ), and then performs template update using the median of the selected spectrum segments (S 24 ). Then, the CPU 11 compares the amount of change between the not-yet-updated template and the updated template (S 26 ).
  • When the amount of change is at or below a predetermined value, the CPU 11 terminates the template adaptation process.
  • Otherwise, the CPU 11 repeats the processes of S 20 , S 22 and S 24 described above until the amount of change between the templates before and after the update is at or below the predetermined value.
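The loop of S 20 through S 26 can be sketched as below. This is a simplified illustration under stated assumptions: the distance is Euclidean, the quantization used in the initial pass is omitted, the tolerance `tol` and the iteration cap `max_iter` are hypothetical parameters, and the function names are not from the source.

```python
import math
import statistics


def _distance(a, b):
    """Euclidean distance between two equally sized matrices (assumed form)."""
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def update_template(selected):
    """Element-wise median of the selected spectrum segments (S 24)."""
    rows, cols = len(selected[0]), len(selected[0][0])
    return [
        [statistics.median(seg[t][f] for seg in selected) for f in range(cols)]
        for t in range(rows)
    ]


def adapt_template(seed, segments, ratio=0.1, tol=1e-6, max_iter=50):
    """Repeat distance calculation, selection and median update until the
    template changes by no more than tol."""
    template = seed
    m = max(1, round(ratio * len(segments)))
    for _ in range(max_iter):
        # S 20 and S 22: rank segments by distance and keep the closest m.
        ranked = sorted(segments, key=lambda seg: _distance(template, seg))
        # S 24: replace the template with the median of the kept segments.
        updated = update_template(ranked[:m])
        # S 26: measure how far the template moved.
        change = _distance(template, updated)
        template = updated
        if change <= tol:
            break
    return template
```

Using the median rather than the mean means that power contributed by other instruments, which differs from segment to segment, tends to be suppressed in the updated template.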
  • FIG. 8 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of template matching (S 16 ) shown in FIG. 6 .
  • the CPU 11 first adjusts the spectrum segment P i so as to match with the template (S 30 ).
  • the CPU 11 stores the adjusted spectrum segment P i ′ into the HDD 13 .
  • the CPU 11 calculates the amount (adjustment value η i ) of change between the spectrum segments P i and P i ′ before and after the power adjustment, and then stores the value into the RAM 12 .
  • the CPU 11 compares the value with a threshold Θ stored in the HDD 13 in advance (S 32 ).
  • In case that the adjustment value η i is greater than or equal to the threshold Θ (S 32 : YES), the CPU 11 terminates the template matching process. In case that the adjustment value η i is smaller than the threshold Θ (S 32 : NO), the CPU 11 calculates the distance Γ i between the template and the adjusted spectrum segment P i ′ (S 34 ), and then stores the calculated distance Γ i into the HDD 13 . The CPU 11 then compares the calculated distance Γ i with a threshold θ stored in the HDD 13 in advance (S 36 ). In case that the distance Γ i is greater than or equal to the threshold θ (S 36 : YES), the CPU 11 terminates the template matching process. In case that the distance Γ i is smaller than the threshold θ (S 36 : NO), the CPU 11 decides the onset time candidate o i as the onset time (S 38 ), and then stores the decided onset time into the HDD 13 .
  • FIG. 9 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of spectrum segment adjustment (S 30 ) shown in FIG. 8 .
  • the CPU 11 first calculates the power difference η i between the template T A and the spectrum segment P i at the characteristic frequency at each time (frame) (S 40 ), and then stores the value into the RAM 12 or the HDD 13 .
  • the CPU 11 calculates the power difference δ i at each time (S 42 ), and then stores the value into the RAM 12 or the HDD 13 .
  • the CPU 11 compares the power difference δ i at each time with a threshold Ψ stored in the HDD 13 in advance, and thereby counts the number of frames where the power difference δ i is greater than or equal to the threshold Ψ.
  • the CPU 11 stores the count into the RAM 12 or the HDD 13 .
  • the CPU 11 compares the number of frames where the power difference δ i is greater than or equal to the threshold Ψ with a threshold R stored in the HDD 13 in advance (S 44 ). In case that the number of frames where the power difference δ i is greater than or equal to the threshold Ψ is smaller than or equal to the threshold R (S 44 : YES), the CPU 11 terminates the process of adjusting the spectrum segment P i .
  • In case that the number of frames where the power difference δ i is greater than or equal to the threshold Ψ is greater than the threshold R (S 44 : NO), the CPU 11 integrates the power difference δ i at each time, and thereby acquires the power difference (adjustment value η i ) (S 46 ). The CPU 11 stores the value into the HDD 13 . The CPU 11 then compares the power difference η i calculated in step S 46 with a threshold Θ stored in the HDD 13 in advance (S 48 ). In case that the power difference η i is smaller than or equal to the threshold Θ (S 48 : YES), the CPU 11 terminates the process of adjusting the spectrum segment P i .
  • the CPU 11 subtracts the power difference η i from the spectrum segment P i (S 50 ), and then stores the result as a spectrum segment P i ′ into the HDD 13 .
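The adjustment subroutine of FIG. 9 can be sketched as follows. Several points are assumptions for illustration only: the characteristic frequencies of each frame are taken to be the `top_k` bins with the largest weights, the first quartile is taken as the element at index `len // 4` of the sorted differences, and the parameter names `psi`, `min_frames` (standing in for the threshold R) and `theta` are hypothetical.

```python
def adjust_segment(segment, template, weight, psi, min_frames, theta, top_k=3):
    """Return (adjustment value, adjusted segment), or None when the
    template is judged not to be contained in the segment."""
    deltas, weights = [], []
    for row_p, row_t, row_w in zip(segment, template, weight):
        # Characteristic bins: the top_k largest weights in this frame (assumed).
        bins = sorted(range(len(row_w)), key=lambda f: row_w[f], reverse=True)
        bins = bins[:top_k]
        # S 40: power differences at the characteristic bins, ascending.
        diffs = sorted((row_p[f] - row_t[f], f) for f in bins)
        # S 42: first-quartile difference and the bin it came from.
        delta, f_q = diffs[len(diffs) // 4]
        deltas.append(delta)
        weights.append(row_w[f_q])
    # S 44: too few frames at or above psi, so the template is not contained.
    if sum(d >= psi for d in deltas) <= min_frames:
        return None
    # S 46: weighted mean of the per-frame differences that exceed psi.
    kept = [(d, w) for d, w in zip(deltas, weights) if d > psi]
    den = sum(w for _, w in kept)
    if den == 0.0:
        return None
    eta = sum(d * w for d, w in kept) / den
    # S 48: adjustment value at or below theta, so the process stops.
    if eta <= theta:
        return None
    # S 50: subtract the adjustment value from every element.
    adjusted = [[p - eta for p in row] for row in segment]
    return eta, adjusted
```

Taking the first quartile rather than the mean of the differences makes the per-frame estimate less sensitive to bins where other instruments add power on top of the drum sound.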
  • the audio signal processing apparatus according to the invention is embodied in the form of a software process using a computer.
  • the invention is applicable also to various types of apparatuses for outputting an audio signal such as a recording device, an electronic musical instrument, an audio device, a portable audio device, and a portable telephone or the like.
  • FIG. 10 is a block diagram showing an exemplary configuration of an audio signal processing apparatus according to the invention embodied as an audio device.
  • the audio device 30 comprises: an operation unit 35 for receiving various operations such as a reproduction operation; a display unit 36 provided with a liquid crystal display panel or the like for displaying the operation status such as “in reproduction”; a reproducing unit 34 for reading data from a recording medium (not shown) such as an MD (Mini Disc), a disc of another type, and flash memory, and thereby reproducing an audio signal; an output unit 37 for outputting to a headphone or a speaker the audio signal reproduced by the reproducing unit 34 ; a control unit (CPU) 31 for controlling various system components such as the operation unit 35 , the display unit 36 , the reproducing unit 34 , and the output unit 37 ; a RAM 32 connected to the control unit 31 ; and a flash memory 33 serving as a storage unit.
  • the control unit 31 controls various system components such as the reproducing unit 34 and the output unit 37 in response to an operation
  • the control unit 31 serves as means for extracting a predetermined non-harmonic structured spectral component such as a drum sound contained in an audio signal as well as means for increasing or decreasing the extracted predetermined spectral component.
  • the control unit 31 serves also as means for calculating the spectrum of an audio signal by frequency analysis, and thereby extracts a spectrum corresponding to the predetermined non-harmonic structured spectral component.
  • the extraction of the predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in the flash memory (storage unit) 33 in advance.
  • the control unit 31 serves as means for adapting the spectral component of the template in such a manner that the difference between the extracted spectral component and the spectral component of the template stored in the flash memory 33 becomes less than or equal to a predetermined value. More specifically, in case that a plurality of spectral components have been extracted, the control unit 31 serves as: means for calculating the difference between each extracted spectral component and the spectral component of the template; means for selecting a predetermined number of spectral components in ascending order of the calculated difference; and means for updating the spectral component of the template into the median of the predetermined number of selected spectral components. As such, the control unit 31 adapts the spectral component of the template.
  • the control unit 31 serves also as means for quantizing each extracted spectral component and the spectral component of the template in the initial adaptation for the spectral component of the template, and thereby calculates the difference between each extracted spectral component and the spectral component of the template that have been quantized.
  • the operation unit 35 serves as means for receiving the amount of increase or decrease of the predetermined spectral component, so that the control unit 31 increases or decreases the extracted predetermined spectral component in response to the amount of increase or decrease received through the operation unit 35 .
  • the operation unit 35 comprises a volume control knob for bass drum.
  • the audio device 30 shown in FIG. 10 extracts and increases or decreases a predetermined non-harmonic structured spectral component such as a drum sound according to the invention.
  • the control unit 31 , the RAM 32 , the flash memory 33 , the reproducing unit 34 , the operation unit 35 , the display unit 36 , and the output unit 37 in the audio device 30 operate respectively in a similar manner to the CPU 11 , the RAM 12 , the HDD 13 , the external storage unit 14 , the input unit 15 , the display unit 16 , and the sound card (not shown) in the computer 10 of FIG. 1 , and thereby extract and increase or decrease a drum sound or the like.
  • the control unit (CPU) 31 extracts and increases or decreases the drum sound or the like.
  • dedicated hardware (a dedicated LSI) for extracting and increasing or decreasing the drum sound or the like may be provided so that the dedicated LSI, instead of the control unit 31 , may extract and increase or decrease the predetermined non-harmonic structured spectral component such as a drum sound.
  • the audio device 30 may be provided with a communication port for performing communications with the outside.
  • the reproducing unit 34 may be constructed in a manner capable of recording in addition to reproducing.
  • the invention is applicable also to arbitrary audio devices.
  • the invention may be applied in its audio signal processing unit.
  • the invention is applicable to the audio signal processing units of various devices for processing an audio signal.
  • a non-harmonic structured sound such as a drum sound is extracted and increased or decreased.
  • the invention is not limited to the drum sound.
  • a non-harmonic structured sound generated by another percussion instrument such as cymbals may be extracted and increased or decreased.
  • a non-harmonic structured sound generated by another type of sound source may be extracted and increased or decreased.
  • a bass drum sound or a snare drum sound among various types of drum sounds may be extracted and increased or decreased.
  • An audio signal processed according to the invention may contain a voice signal.
  • a predetermined non-harmonic structured spectral component may be extracted from an audio signal of music containing a vocal, and then the extracted spectral component may be increased or decreased.
  • a predetermined non-harmonic structured spectral component may be extracted from an audio signal containing a voice of the target of speech recognition, and then the extracted spectral component may be increased or decreased.
  • a predetermined non-harmonic structured spectral component contained in voice data can be extracted and decreased.
  • Such a non-harmonic structured spectral component contained in voice data is a noise component in many cases. Thus, the noise component can be cancelled by extracting and decreasing it. This improves the accuracy in the speech recognition.
  • the above-mentioned embodiment has been described in the case that once the onset time is decided, the power spectrum is immediately increased or decreased in the vicinity of the onset time (S 16 and S 18 in FIG. 6 ).
  • the deciding of the onset time may be processed separately from the increase or decrease of the power spectrum in the vicinity of the onset time.
  • the audio signal (sound data), the onset time (onset position data), and the adapted template may be transmitted through a recording medium or a network to another computer. Then, this other computer or an audio device may increase or decrease the power spectrum in the vicinity of the onset time.
  • the communication unit (outputting means) 17 of the computer (first audio signal processing apparatus) shown in FIG. 1 may transmit the audio signal, the onset time, and the adapted template. Further, the external storage unit (outputting means) 14 may output such data and record it into a recording medium. Furthermore, the reproducing unit (receiving means) 34 of the audio device (second audio signal processing apparatus) shown in FIG. 10 may read the audio signal, the onset time, and the adapted template, while the control unit 31 or the like may increase or decrease the power spectrum of the audio signal corresponding to the adapted template at the onset time. Similarly, the communication unit (receiving means) 17 of the computer (second audio signal processing apparatus) shown in FIG.
  • the first input signal may receive the audio signal, the onset time, and the adapted template.
  • the external storage unit (receiving means) 14 may read the audio signal, the onset time, and the adapted template, while the CPU 11 may increase or decrease the power spectrum of the audio signal corresponding to the adapted template at the onset time.
  • the template adaptation may be separately performed in another audio signal processing apparatus such as a computer.

Abstract

An apparatus and method for extracting a predetermined non-harmonic structured spectral component contained in an audio signal. Then, the extracted predetermined spectral component is increased or decreased. In this process, the spectrum of the audio signal is calculated by frequency analysis, so that a spectrum component corresponding to the predetermined non-harmonic structured spectral component is extracted and then increased or decreased. The extraction of the predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in advance. In this process, the spectral component of the template is adapted in such a manner that the difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value. This allows the audio-signal contained predetermined non-harmonic structured spectral component to be independently increased or decreased without an influence on other spectral components.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This Nonprovisional application claims priority under 35 U.S.C. §119(a) on patent Application No. 2004-181881 filed in Japan on Jun. 18, 2004, the entire contents of which are hereby incorporated by reference.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to an audio signal processing method, an audio signal processing apparatus, and an audio signal processing system for increasing or decreasing a predetermined non-harmonic structured spectral component contained in an audio signal, as well as to a computer program product for causing a computer to increase or decrease a predetermined non-harmonic structured spectral component contained in an audio signal.
  • 2. Description of Related Art
  • Graphic equalizers are widely used as means for adjusting an audio signal such as music outputted from a speaker (e.g., Japanese Patent Application Laid-Open No. 5-175773 (1993)). When a graphic equalizer is used, an audio signal reproduced from a CD (compact disk) or the like can be frequency-analyzed, and then the spectra of specific frequency ranges can be increased and decreased. Thus, when a bass drum sound contained in an audio signal outputted from a speaker is to be emphasized, the spectrum of a low frequency range may be increased.
  • Nevertheless, in many cases, a plurality of musical instruments are used in a musical performance, and hence a plurality of instrumental sounds are contained in the audio signal. Thus, when the spectrum of a specific frequency range of the audio signal is increased or decreased, all the instrumental sounds having a spectrum in that frequency range are increased or decreased together. For example, when the spectrum of a low frequency range is increased for the purpose of emphasizing a bass drum, the bass drum sound is increased, and so are other instrumental sounds, such as a bass guitar sound, that have a spectrum in the low frequency range being increased.
  • As such, a graphic equalizer increases and decreases the spectra of specific frequency ranges of an audio signal, and hence all the instrumental sounds that have a spectrum in the targeted frequency range are similarly increased and decreased. This has caused a problem that a specific instrumental sound cannot be increased or decreased alone without an influence on the other instrumental sounds; for example, a bass drum sound cannot be increased or decreased alone without an influence on a bass guitar sound.
  • BRIEF SUMMARY OF THE INVENTION
  • The present invention has been devised in consideration of such a situation. An object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for extracting a predetermined non-harmonic structured spectral component contained in an audio signal and then increasing or decreasing the spectral component, so as to allow the predetermined spectral component contained in the audio signal to be independently increased or decreased without an influence on the other spectral components.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for calculating the spectrum of an audio signal by frequency analysis so as to allow a non-harmonic structured sound such as a drum sound to be extracted from the audio signal on the basis of the spectrum distribution.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for adapting a spectral component of a template in such a manner that the difference between an extracted spectral component and the spectral component of the template goes below or at a predetermined value, so as to improve the accuracy in the extraction of a non-harmonic structured sound such as a drum sound.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for selecting a predetermined number of extracted spectral components in ascending order of difference between the spectral component and a spectral component of a template and then updating the spectral component of the template into the median of the predetermined number of selected spectral components so as to permit the acquisition of a template in which the spectra of spectral components not having a non-harmonic structure are suppressed.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for quantizing an extracted spectral component and a spectral component of a template in the initial adaptation for the spectral component of the template, so as to prevent a large difference value from being erroneously calculated even though the two components are alike.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for increasing or decreasing an extracted predetermined spectral component in response to a received amount of increase or decrease so as to allow the power of the extracted predetermined spectral component to be adjusted independently of the power of the audio signal.
  • Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for causing the process of extracting a predetermined non-harmonic structured spectral component and the process of increasing or decreasing the spectral component to be performed in different apparatuses from each other, so as to allow the load to be distributed efficiently.
  • An audio signal processing method according to the first invention is characterized by comprising steps of extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and increasing or decreasing the extracted predetermined spectral component.
  • An audio signal processing method according to the second invention is based on the first invention, and characterized by further comprising a step of calculating a spectrum of the audio signal by frequency analysis, wherein, in the step of extracting the predetermined non-harmonic structured spectral component, a spectrum is extracted that corresponds to the predetermined non-harmonic structured spectral component.
  • An audio signal processing method according to the third invention is based on the first invention, and characterized in that the step of extracting the predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in advance, and the method further comprises a step of adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • An audio signal processing method according to the fourth invention is an audio signal processing method for extracting, with reference to a spectral component of a template stored in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, and is characterized by comprising a step of adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • An audio signal processing method according to the fifth invention is based on the third or fourth invention, and is characterized in that the adapting step further comprises steps of calculating a difference between each extracted spectral component and the spectral component of the template in case that a plurality of spectral components have been extracted; selecting a predetermined number of spectral components in ascending order of the calculated difference; and updating the spectral component of the template into a median of the predetermined number of selected spectral components.
  • An audio signal processing method according to the sixth invention is based on the fifth invention, and characterized by further comprising a step of quantizing the extracted spectral components and the spectral component of the template in an initial adaptation for the spectral component of the template, wherein, in the step of calculating a difference, a difference is calculated between each extracted spectral component and the spectral component of the template which have been quantized.
  • An audio signal processing method according to the seventh invention is based on the first or fourth invention, and characterized by further comprising a step of receiving an amount of increase or decrease for the predetermined spectral component, wherein, in the increasing or decreasing step, the extracted predetermined spectral component is increased or decreased in response to the received amount of increase or decrease.
  • An audio signal processing method according to the eighth invention is characterized by comprising steps of extracting a predetermined non-harmonic structured spectral component contained in an audio signal; outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal, the predetermined spectral component, and the audio signal; receiving the outputted onset time information, the predetermined spectral component, and the audio signal; and increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the received onset time information.
  • An audio signal processing apparatus according to the ninth invention is characterized by comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and increasing and decreasing means for increasing or decreasing the predetermined spectral component extracted by the extracting means.
  • An audio signal processing apparatus according to the tenth invention is based on the ninth invention, and characterized by further comprising calculating means for calculating a spectrum of the audio signal by frequency analysis, wherein the extracting means extracts a spectrum corresponding to the predetermined non-harmonic structured spectral component.
  • An audio signal processing apparatus according to the eleventh invention is based on the tenth invention, and characterized in that the extraction of a predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in a storage unit in advance, and the apparatus further comprises adapting means for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • An audio signal processing apparatus according to the twelfth invention is an audio signal processing apparatus for extracting, with reference to a spectral component of a template stored in a storage unit in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, and characterized by comprising adapting means for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • An audio signal processing apparatus according to the thirteenth invention is based on the eleventh or twelfth invention, and characterized in that the adapting means further comprises: subtracting means for calculating a difference between each extracted spectral component and the spectral component of the template in case that a plurality of spectral components have been extracted; selecting means for selecting a predetermined number of spectral components in ascending order of the difference calculated by the subtracting means; and updating means for updating the spectral component of the template into a median of the predetermined number of spectral components selected by the selecting means.
  • An audio signal processing apparatus according to the fourteenth invention is based on the thirteenth invention, and characterized by further comprising quantizing means for quantizing the extracted spectral components and the spectral component of the template in an initial adaptation for the spectral component of the template, wherein the subtracting means calculates a difference between each extracted spectral component and the spectral component of the template which have been quantized by the quantizing means.
  • An audio signal processing apparatus according to the fifteenth invention is based on the ninth or twelfth invention, and characterized by further comprising receiving means for receiving an amount of increase or decrease for the predetermined spectral component, wherein the increasing and decreasing means increases or decreases the extracted predetermined spectral component in response to the amount of increase or decrease received by the receiving means.
  • An audio signal processing system according to the sixteenth invention is characterized by including: a first audio signal processing apparatus comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting means for outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal by the extracting means, the predetermined spectral component, and the audio signal; and a second audio signal processing apparatus comprising: receiving means for receiving the onset time information, the predetermined spectral component, and the audio signal outputted from the first audio signal processing apparatus; and increasing and decreasing means for increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the onset time information received by the receiving means.
  • An audio signal processing apparatus according to the seventeenth invention is characterized by comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting means for outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal by the extracting means, the predetermined spectral component, and the audio signal.
  • An audio signal processing apparatus according to the eighteenth invention is characterized by comprising: receiving means for receiving onset time information of the extraction of a predetermined non-harmonic structured spectral component from an audio signal, the predetermined spectral component, and the audio signal; and increasing and decreasing means for increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the onset time information received by the receiving means.
  • A computer program product according to the nineteenth invention is a computer program product for causing a computer to process an audio signal, wherein the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and characterized in that the computer readable program code means comprises instructions for: extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and increasing or decreasing the extracted predetermined spectral component.
  • A computer program product according to the twentieth invention is based on the nineteenth invention, and characterized in that the computer readable program code means further comprises an instruction for calculating a spectrum of the audio signal by frequency analysis, and the extracting instruction causes the computer to extract a spectrum corresponding to the predetermined non-harmonic structured spectral component.
  • A computer program product according to the twenty-first invention is based on the twentieth invention, and characterized in that the instruction for extracting a predetermined non-harmonic structured spectral component is executed with reference to a spectral component of a template stored in advance, and the computer readable program code means further comprises an instruction for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • A computer program product according to the twenty-second invention is a computer program product for causing a computer to extract, with reference to a spectral component of a template stored in a memory in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, and characterized in that the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and characterized in that the computer readable program code means comprises an instruction for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
  • A computer program product according to the twenty-third invention is based on the twenty-first or twenty-second invention, and characterized in that, in the adapting instruction, the computer readable program code means further comprises instructions for: calculating a difference between each extracted spectral component and the spectral component of the template in case that a plurality of spectral components have been extracted; selecting a predetermined number of spectral components in ascending order of the calculated difference; and updating the spectral component of the template into a median of the predetermined number of selected spectral components.
  • A computer program product according to the twenty-fourth invention is based on the twenty-third invention, and characterized in that the computer readable program code means further comprises an instruction for quantizing the extracted spectral components and the spectral component of the template in an initial adaptation for the spectral component of the template; and the instruction for calculating a difference causes the computer to calculate a difference between each extracted spectral component and the spectral component of the template which have been quantized.
  • A computer program product according to the twenty-fifth invention is based on the nineteenth or twenty-second invention, and characterized in that the computer readable program code means further comprises an instruction for receiving an amount of increase or decrease for the predetermined spectral component; and the increasing or decreasing instruction causes the computer to increase or decrease the extracted predetermined spectral component in response to the received amount of increase or decrease.
  • A computer program product according to the twenty-sixth invention is a computer program product for causing a computer to process an audio signal, wherein the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and characterized in that the computer readable program code means comprises instructions for: extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal, the predetermined spectral component, and the audio signal.
  • A computer program product according to the twenty-seventh invention is a computer program product for causing a computer to process an audio signal, wherein the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and the computer readable program code means comprises instructions for: receiving onset time information of the extraction of a predetermined non-harmonic structured spectral component from an audio signal, the predetermined spectral component, and the audio signal; and increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the received onset time information.
  • In the first, ninth and nineteenth inventions, a predetermined non-harmonic structured spectral component contained in an audio signal is extracted. An example of the non-harmonic structured tone is a sound of a percussion instrument such as a drum. Then, in the audio signal, the extracted predetermined spectral component is increased or decreased. For example, when the extracted spectral component of a drum is increased, the drum sound is emphasized. Conversely, when the extracted spectral component of a drum is decreased, the drum sound is cancelled. As such, a predetermined spectral component contained in an audio signal is extracted by itself and can be independently increased or decreased without an influence on the other spectral components.
  • In the second, tenth and twentieth inventions, the spectrum of an audio signal is calculated by frequency analysis. The sound of a percussion instrument such as a drum is of non-harmonic structure, and has slight or no harmonic structure. The sounds of other types of musical instruments have a harmonic structure. Thus, on the basis of the spectrum distribution, the non-harmonic structured sound of a percussion instrument such as a drum can be discriminated from the harmonic structured sounds of other types of musical instruments. That is, the non-harmonic structured sound of a percussion instrument such as a drum can be extracted from the audio signal on the basis of the spectrum distribution.
  • In the third, fourth, eleventh, twelfth, twenty-first and twenty-second inventions, the extraction of a predetermined non-harmonic structured spectral component is performed on the basis of a spectral component of a template stored in advance. For example, when a drum sound is to be extracted, a template of a drum sound is stored in a storage unit in advance. Nevertheless, it is extremely rare that the drum sound contained in an audio signal agrees completely with the drum sound of the template stored in advance. These sounds usually differ from each other more or less. Thus, the spectral component of the template is adapted in such a manner that the difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value. This ensures that the drum sound contained in the audio signal agrees approximately with the drum sound of the template stored in advance. This improves the accuracy in the extraction of the drum sound, and hence permits accurate increase or decrease of the extracted drum sound. Further, this approach allows various drum sounds to be extracted on the basis of a single template.
  • In the fifth, thirteenth and twenty-third inventions, in case that a plurality of spectral components have been extracted, the difference between each extracted spectral component and a spectral component of a template is calculated. Then, a predetermined number of spectral components are selected in ascending order of the calculated difference. The spectral component of the template is then updated into the median of the predetermined number of selected spectral components, so that the template is adapted. The spectral structure of a non-harmonic structured spectral component usually appears in the same position of the selected spectral components. In contrast, the spectral structure of a harmonic structured spectral component seldom appears in the same position of the selected spectral components. Thus, when the median is used, the spectral structure of the non-harmonic structured spectral component is expected to be retained, whereas harmonic structured musical instrumental sounds other than the sound of a percussion instrument such as a drum are seldom retained. As a result, the spectra of spectral components not having a non-harmonic structure are suppressed.
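  • The selection and median update described above can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the function names, the sum-of-squared-differences distance, and the toy data are assumptions introduced here. Each spectral component is represented as a small matrix indexed by time and frequency.

```python
from statistics import median

def distance(a, b):
    """Hypothetical distance: sum of squared differences over all (t, f) cells."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def update_template(template, segments, keep):
    """Select the `keep` segments closest to the template (ascending order of
    difference), then replace each template cell by the median of the selected
    segments at that cell."""
    ranked = sorted(segments, key=lambda seg: distance(seg, template))
    selected = ranked[:keep]
    rows, cols = len(template), len(template[0])
    return [[median(seg[t][f] for seg in selected) for f in range(cols)]
            for t in range(rows)]

# Toy data: one time frame, three frequency bins (illustrative values only).
template = [[1.0, 0.0, 0.0]]
segments = [
    [[1.0, 0.9, 0.0]],   # drum plus another sound in bin 1
    [[1.1, 0.0, 0.0]],   # drum only
    [[0.9, 0.0, 0.8]],   # drum plus another sound in bin 2
    [[5.0, 5.0, 5.0]],   # unrelated segment, far from the template
]
new_template = update_template(template, segments, keep=3)
```

  • In this toy case the drum energy in bin 0 appears in every selected segment and survives the median, whereas the energy in bins 1 and 2 appears in only one selected segment each and is removed.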
  • In the sixth, fourteenth and twenty-fourth inventions, extracted spectral components and a spectral component of a template are quantized in the initial adaptation for the spectral component of the template, and then the difference is calculated between each extracted spectral component and the spectral component of the template which have been quantized. Without template adaptation, since it is extremely rare that a drum sound, for example, contained in an audio signal agrees completely with a template drum sound, a large difference could be erroneously calculated even though the two sounds are alike. In contrast, when the extracted spectral components and the spectral component of the template are quantized, and a representative value such as the median is used in the difference calculation, the erroneous calculation of a large difference between two sounds that are alike is suppressed.
  • In the seventh, fifteenth and twenty-fifth inventions, an amount of increase or decrease for a predetermined spectral component is received, and then the extracted predetermined spectral component is increased or decreased in response to the received amount of increase or decrease. For example, an increase and decrease knob similar to a volume control knob for the power of the audio signal may be used for inputting the amount of increase or decrease. A user adjusts the increase and decrease knob so as to vary the power of the extracted predetermined spectral component independently of the power of the audio signal.
  • In the eighth, sixteenth, seventeenth, eighteenth, twenty-sixth and twenty-seventh inventions, in a first audio signal processing apparatus, a predetermined non-harmonic structured spectral component contained in an audio signal is extracted. Then, outputted are onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal, the predetermined spectral component, and the audio signal. These outputs may be recorded in a recording medium or transmitted through a communication network. In a second audio signal processing apparatus, the onset time information, the predetermined spectral component, and the audio signal which have been outputted are received. Then, the received spectral component contained in the received audio signal is increased or decreased on the basis of the received onset time information. Various types of information described here may be received in the form of a recording medium or through a communication network. The extraction of a predetermined non-harmonic structured spectral component is a task of heavy load, and hence is desired to be carried out by a high performance computer or the like. In contrast, the increasing or decreasing of a predetermined spectral component is a task of light load, and hence may be carried out by a general audio device or the like. As such, according to the invention, the load is efficiently distributed so that even an audio device of low performance can increase or decrease the predetermined non-harmonic structured spectral component.
  • According to the first, ninth and nineteenth inventions, a predetermined spectral component contained in an audio signal can be independently increased or decreased without an influence on the other spectral components.
  • According to the second, tenth and twentieth inventions, a non-harmonic structured sound such as a drum sound can be extracted from an audio signal on the basis of the spectrum distribution.
  • According to the third, fourth, eleventh, twelfth, twenty-first and twenty-second inventions, the accuracy is improved in the extraction of a non-harmonic structured sound such as a drum sound. This permits accurate increase or decrease of the extracted drum sound. Further, the invention allows various non-harmonic structured sounds such as various drum sounds to be extracted on the basis of a single template.
  • According to the fifth, thirteenth and twenty-third inventions, a template is obtained in which the spectra of spectral components not having a non-harmonic structure are suppressed.
  • According to the sixth, fourteenth and twenty-fourth inventions, the erroneous calculation of a large difference is suppressed when an extracted spectral component and a spectral component of a template are alike.
  • According to the seventh, fifteenth and twenty-fifth inventions, the power of an extracted predetermined spectral component can be adjusted independently of the power of the audio signal.
  • According to the eighth, sixteenth, seventeenth, eighteenth, twenty-sixth and twenty-seventh inventions, the process of extracting a predetermined non-harmonic structured spectral component and the process of increasing or decreasing the spectral component are carried out by different apparatuses from each other. Thus, the load is efficiently distributed so that even a general audio device or the like can increase or decrease a predetermined non-harmonic structured spectral component.
  • The above and further objects and features of the invention will more fully be apparent from the following detailed description with accompanying drawings.
  • BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
  • FIG. 1 is a block diagram showing an exemplary configuration of a computer (audio signal processing apparatus) according to the invention;
  • FIG. 2 is a graph showing an example of a low pass filter function F(f);
  • FIG. 3A, FIG. 3B and FIG. 3C are graphs each showing an example of the distance between a template Tg and a spectrum segment Pi;
  • FIG. 4A and FIG. 4B are diagrams each showing an example of determination whether a spectrum is contained or not;
  • FIG. 5A, FIG. 5B and FIG. 5C are schematic diagrams each illustrating a time series (frame series) of graphs showing an example of increasing or decreasing a drum sound at onset time;
  • FIG. 6 is a flow chart showing an exemplary procedure of increasing or decreasing a drum sound by means of template adaptation;
  • FIG. 7 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of template adaptation shown in FIG. 6;
  • FIG. 8 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of template matching shown in FIG. 6;
  • FIG. 9 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of spectrum segment adjustment shown in FIG. 8; and
  • FIG. 10 is a block diagram showing an exemplary configuration of an audio signal processing apparatus according to the invention embodied as an audio device.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • The invention is described below in detail with reference to drawings showing its embodiments.
  • FIG. 1 is a block diagram showing an exemplary configuration of a computer (audio signal processing apparatus) according to the invention. The computer 10 comprises: a CPU (central processing unit) 11; a RAM (random access memory) 12 such as a DRAM; an HDD (hard disk drive) 13; an external storage unit 14 such as a flexible disk drive or a CD-ROM drive; and a communication unit 17 for performing communications with a communication network 20 such as a LAN (local area network) or the Internet. The computer 10 further comprises: an input unit 15 provided with a keyboard and a mouse; and a display unit 16 provided with a CRT display, a liquid crystal display, or the like.
  • The CPU 11 controls the system components 12 through 17 described above. The CPU 11 causes the RAM 12 to store programs and data received through the input unit 15 or the communication unit 17, programs and data read out from a recording medium by the HDD 13 or the external storage unit 14 and the like. Further, the CPU 11 performs various processing such as the execution of the programs stored in the RAM 12 and arithmetic operations on the stored data, and causes the RAM 12 to store the results of the various processing as well as temporary data used in the various processing. The data such as operation results temporarily stored in the RAM 12 is transferred to the HDD 13 and outputted through the display unit 16 or the communication unit 17 under the control of the CPU 11.
  • The HDD 13 stores an audio signal (sound data) received from the outside by the computer 10. The computer 10 extracts a non-harmonic structured sound (spectral component) contained in the audio signal, such as the sound of a drum or other percussion instrument, and then increases or decreases the extracted sound. The amount of increase or decrease of the extracted sound is received through the input unit (receiving means) 15. The non-harmonic structured sound is a sound having almost no harmonic structure. However, the sound may contain a very weak harmonic structure that is negligible in comparison with general musical instrumental sounds having a harmonic structure.
  • The CPU 11 serves as means (calculating means) for calculating the power spectrum P(t, f) of an audio signal at a frame t and frequency f. In an example, the audio signal is sampled at 44.1 kHz. Then, an STFT (Short-Time Fourier Transform) is calculated using a Hanning window having a window width of 4096 points (a frequency resolution of 10.8 Hz) and a window shift length of 441 points (a time resolution of 10 ms), so that the power spectrum P(t, f) is obtained.
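  • As a hedged illustration of the frequency analysis above, the following sketch computes a power spectrum with a Hanning window and checks the stated resolutions. The function names, the use of a naive DFT, the periodic form of the window, and the bin indexing from 0 are assumptions made for this example; a practical implementation would use an FFT, since the naive DFT is far too slow at the 4096-point width.

```python
import cmath
import math

def hann(n):
    """Periodic Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def power_spectrum(signal, window=4096, shift=441):
    """Return P[t][f] for f = 0 .. window//2 - 1, using a naive DFT per frame."""
    w = hann(window)
    half = window // 2
    spectrum = []
    for start in range(0, len(signal) - window + 1, shift):
        frame = [signal[start + k] * w[k] for k in range(window)]
        row = []
        for f in range(half):
            acc = sum(frame[k] * cmath.exp(-2j * math.pi * f * k / window)
                      for k in range(window))
            row.append(abs(acc) ** 2)
        spectrum.append(row)
    return spectrum

# Resolutions implied by the parameters given in the text.
SAMPLE_RATE = 44100
freq_resolution_hz = SAMPLE_RATE / 4096        # about 10.8 Hz
time_resolution_ms = 1000 * 441 / SAMPLE_RATE  # 10 ms

# Small demonstration with a 16-point window: a tone centred on bin 3.
tone = [math.sin(2.0 * math.pi * 3 * n / 16) for n in range(32)]
P = power_spectrum(tone, window=16, shift=4)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in P]
```

  • With the small demonstration parameters, every frame has its strongest bin at index 3, which is the bin on which the test tone is centred.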
  • The CPU 11 serves also as means for detecting an onset time candidate oi of a drum. The onset time candidate oi of the drum is detected, for example, as a time (frame) where the power spectrum rises steeply. If the differential Q(t, f)=∂P(t, f)/∂t of P(t, f) with respect to time (frame) satisfies Q(t, f)>0 in all of three successive frames in the time direction (t=a−1, a, a+1), the CPU 11 calculates the differential Q(a, f) at frame a. Otherwise, that is, in case that Q(t, f)>0 is not satisfied in the three successive frames, the CPU 11 sets Q(a, f)=0. Then, at each frame t, the CPU 11 multiplies Q(t, f) by a low pass filter function F(f) based on the typical frequency characteristics of a drum, and calculates a sum S(t) in the frequency direction according to the following equation.
    $$S(t) = \sum_{f=1}^{2048} F(f)\, Q(t, f)$$
  • FIG. 2 is a graph showing an example of the low pass filter function F(f). The horizontal axis indicates frequency f, while the vertical axis indicates F(f). The low pass filter function F(f) is stored in the HDD 13 in advance. The CPU 11 finds each time at which the sum S(t) in the frequency direction reaches a maximum, and determines that time to be an onset time candidate oi. Before the detection of the maximum, the CPU 11 preferably performs 11-frame smoothing on S(t) by a method according to Savitzky and Golay.
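  • The onset candidate detection described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of the embodiment: the backward difference used in place of ∂P/∂t, the treatment of a "maximum" as a local peak of S(t), the function name, and the toy data are all introduced here. The Savitzky-Golay smoothing step is omitted for brevity.

```python
def onset_candidates(P, F):
    """P[t][f] is a power spectrum; F[f] is the low pass weighting.
    Returns (S, onsets), where S[t] is the weighted sum of rising
    differentials and onsets lists the frames where S peaks."""
    n_frames, n_bins = len(P), len(P[0])

    def d(t, f):
        # Backward difference as a discrete stand-in for dP/dt (assumption).
        return P[t][f] - P[t - 1][f] if t >= 1 else 0.0

    S = [0.0] * n_frames
    for a in range(1, n_frames - 1):
        total = 0.0
        for f in range(n_bins):
            # Keep Q(a, f) only if the power rises at frames a-1, a and a+1.
            rising = d(a - 1, f) > 0 and d(a, f) > 0 and d(a + 1, f) > 0
            q = d(a, f) if rising else 0.0
            total += F[f] * q
        S[a] = total

    # Local peaks of S(t) are taken as onset candidates (assumption).
    onsets = [t for t in range(1, n_frames - 1)
              if S[t] > 0 and S[t] > S[t - 1] and S[t] >= S[t + 1]]
    return S, onsets

# Toy data: bin 0 rises steeply around frames 2 to 5, bin 1 stays constant.
bin0 = [0.0, 0.0, 1.0, 3.0, 6.0, 7.0, 7.0, 7.0]
P = [[v, 1.0] for v in bin0]
F = [1.0, 0.5]
S, onsets = onset_candidates(P, F)
```

  • In the toy data the single candidate falls on frame 4, where the rise in bin 0 is steepest; the constant bin contributes nothing because its differential is never positive.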
  • The HDD (storage unit) 13 stores a seed template Ts created on the basis of a single tone signal of a drum. The seed template Ts is a power spectrum having a predetermined time length and acquired by STFT starting at an onset time. The seed template Ts is in the form of a matrix the row of which corresponds to time and the column of which corresponds to frequency. Each component is specified as a seed template Ts(t, f) (where 1≦t≦15 and 1≦f≦2048).
  • The CPU 11 serves as means (adapting means) for adapting the seed template Ts to an audio signal of the target of analysis. The CPU 11 updates the seed template Ts as described later, and repeats the update of the template after that. The template having undergone the g-th update is expressed by Tg. Since the seed template Ts is the initially inputted (g=0) template, T0=Ts. The CPU 11 serves as means (calculating means) for extracting a spectrum segment Pi (i=1, . . . , N, where N is the total number of detected onset time candidates) which is a power spectrum having a predetermined time length and starting at an onset time candidate oi (ms) detected from the audio signal of the target of analysis. The spectrum segment Pi is a matrix having the same size as the template Tg.
  • The extraction of the spectrum segment is carried out as described above. Nevertheless, the time resolution of 10 ms is not sufficient for the template to be adapted accurately. Thus, a correction process is preferably performed on the onset time candidate oi. In an example, the CPU 11 serves as means for correcting the onset time candidate oi (ms) into oi′ (ms), and then extracts a spectrum segment Pi for the corrected onset time candidate oi′ (ms). For example, in case that a spectrum segment starting at oi′=oi−5 ms or oi+5 ms has better quality than the one starting at oi (ms), the CPU 11 adopts as the spectrum segment Pi the power spectrum starting at time oi′ (ms).
  • In an example, the CPU 11 extracts a spectrum segment Pi,j starting at time oi+j (ms) (where j=−5 ms, 0 ms and 5 ms). Then, the CPU 11 calculates the correlation value Corr(j) between the template Tg and the spectrum segment Pi,j, each multiplied by the low pass filter function F(f), according to the following equation.
    $$\mathrm{Corr}(j) = \sum_{t=1}^{15} \sum_{f=1}^{2048} F(f)\, T_g(t, f) \cdot F(f)\, P_{i,j}(t, f)$$
  • The CPU 11 then acquires the offset value J that maximizes the correlation value Corr(j), and determines the spectrum segment Pi,J obtained with that offset value to be Pi.
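The onset time correction described above may be sketched as follows. The sketch assumes that the candidate spectrum segments for the offsets of −5 ms, 0 ms and 5 ms have already been extracted and are supplied as a dictionary keyed by the offset; this data layout and the function names are hypothetical.

```python
def weighted_correlation(T, P, F):
    """Return the sum over t and f of F(f)*T(t, f) * F(f)*P(t, f)."""
    return sum(F[f] * T[t][f] * F[f] * P[t][f]
               for t in range(len(T)) for f in range(len(F)))

def best_offset(T, segments_by_offset, F):
    """Return (J, segment) for the offset J with the largest correlation.

    segments_by_offset maps an offset in ms (e.g. -5, 0, 5) to the
    spectrum segment extracted at the onset candidate plus that offset.
    """
    J = max(segments_by_offset,
            key=lambda j: weighted_correlation(T, segments_by_offset[j], F))
    return J, segments_by_offset[J]
```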
  • The CPU 11 further calculates a template Tg′ and a spectrum segment Pi′ which are generated by multiplying the template Tg and the spectrum segment Pi respectively by the low pass filter function F(f) according to the following equations.
    $$T_g'(t, f) = F(f)\, T_g(t, f)$$
    $$P_i'(t, f) = F(f)\, P_i(t, f)$$
  • The CPU 11 serves as means (selecting means) for selecting a predetermined number M of spectrum segments that are similar to the template Tg in the course of adaptation. The predetermined number M has a constant ratio (0.1 in the present embodiment) to the total number of spectrum segments (detected onset time candidates). The CPU 11 serves also as subtracting means. That is, the CPU 11 calculates the distance (difference) Di between the template Tg and the spectrum segment Pi, and then selects a predetermined number M of spectrum segments in ascending order of the calculated distance. The distance Di may be calculated according to the following equation.
    $$D_i = \sqrt{\sum_{t=1}^{15} \sum_{f=1}^{2048} \left( T_g(t, f) - P_i(t, f) \right)^2}$$
  • In case that the distance Di is calculated according to the above equation, a large distance is calculated even when the power peak position in the template Tg differs only slightly from that in the spectrum segment Pi. As a result, the distance may not be calculated accurately. FIG. 3A, FIG. 3B and FIG. 3C are graphs each showing an example of the distance between a template Tg and a spectrum segment Pi. The horizontal axis indicates frequency f, while the vertical axis indicates power P. A solid line indicates the spectrum segment Pi, while a broken line indicates the template Tg. As shown in FIG. 3A, owing to merely a small difference in the power peak position, a notably large distance is erroneously calculated between the two spectra.
  • In order to avoid this situation, in the invention, the seed template T0 (Ts) and the spectrum segment Pi are quantized with lower time and frequency resolutions in the initial adaptation as shown in FIG. 3B and FIG. 3C. Then, the distance Di is calculated. In an example, the time resolution after quantization is made to be 2 frames (20 ms), and the frequency resolution is made to be 5 bins (54 Hz). The CPU 11 serves also as quantizing means. That is, the CPU 11 quantizes the seed template T0 and the spectrum segment Pi, and then calculates quantized spectra T0″(t″, f″) and Pi″(t″, f″) according to the following equations, respectively.
    $$T_0''(t'', f'') = \sum_{t=2t''-1}^{2t''} \sum_{f=5f''-4}^{5f''} T_0(t, f)$$
    $$P_i''(t'', f'') = \sum_{t=2t''-1}^{2t''} \sum_{f=5f''-4}^{5f''} P_i(t, f)$$
  • The CPU 11 then calculates the distance Di between the seed template T0 (Ts) and the spectrum segment Pi according to the following equation.
    $$D_i = \sqrt{\sum_{t''=1}^{15/2} \sum_{f''=1}^{2048/5} \left( T_0''(t'', f'') - P_i''(t'', f'') \right)^2}$$
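The effect of the quantization on the distance may be illustrated by the following minimal sketch. It assumes a Euclidean (root of summed squares) distance, and it assumes that frames and bins left over after the division into blocks of 2 frames and 5 bins are simply dropped, since the description does not state how the non-integer limits 15/2 and 2048/5 are handled. The function names are hypothetical.

```python
import math

def quantize(X, t_step=2, f_step=5):
    """Sum X over blocks of t_step frames by f_step bins.

    Leftover frames and bins that do not fill a block are dropped
    (an assumption; the description does not specify this).
    """
    n_t = len(X) // t_step
    n_f = len(X[0]) // f_step
    return [[sum(X[t][f]
                 for t in range(tq * t_step, (tq + 1) * t_step)
                 for f in range(fq * f_step, (fq + 1) * f_step))
             for fq in range(n_f)]
            for tq in range(n_t)]

def distance(A, B):
    """Euclidean distance between two equally sized matrices (assumed form)."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))
```

In this sketch, a template and a segment whose peaks sit in adjacent bins of the same 5-bin block are far apart before quantization and coincide after it, which corresponds to the situation shown in FIG. 3A to FIG. 3C.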
  • The CPU 11 serves also as updating means for updating the template Tg into a new template Tg+1 on the basis of the predetermined number M of selected spectrum segments Ps (s=1, . . . , M). It is probable that the spectral structure of a drum sound appears in the same position in each spectrum segment Ps. In contrast, the sound spectral components of musical instruments other than the drum seldom appear in the same position in each spectrum segment Ps. Thus, the CPU 11 determines as a new template Tg+1 the median of the selected spectrum segments Ps as follows.
    $$T_{g+1}(t, f) = \operatorname{median}_{s}\, P_s(t, f)$$
  • When the median is used as described here, the spectral structure of the drum sound is expected to be retained. In contrast, instrumental sounds other than the drum sound are seldom retained. Thus, the sound spectral components of musical instruments other than the drum are expected to be suppressed. As such, the seed template T0 can be adapted to a drum sound in an audio signal containing plural types of instrumental sounds.
  • When the determination of a new template Tg+1 is repeated, the drum sound of the template approaches the drum sound contained in the audio signal, so that the template adaptation is achieved. In the course of the repetition, the amount of change in the template becomes smaller, so that the adaptation converges. The CPU 11 serves as means for comparing the present template Tg with a new template Tg+1, and thereby determining that the adaptation has converged in case that the difference between the two spectra is at or below a predetermined value. At that time, the CPU 11 adopts the new template Tg+1 as an adapted template TA.
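The repeated selection, median update and convergence check described above may be sketched as follows. This is a simplified illustration under stated assumptions: the quantization used in the initial adaptation is omitted, the distance is taken to be Euclidean, and the convergence tolerance `tol` and the iteration cap `max_iter` are hypothetical parameters that the description does not specify.

```python
import math
from statistics import median

def distance(A, B):
    """Euclidean distance between two equally sized matrices (assumed form)."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))

def adapt_template(seed, segments, ratio=0.1, tol=1e-6, max_iter=100):
    """Adapt the seed template to the segments by repeated median updates.

    ratio    : fraction of segments selected per update (0.1 in the text)
    tol      : hypothetical convergence threshold on the template change
    max_iter : hypothetical safety cap on the number of updates
    """
    template = seed
    m = max(1, round(ratio * len(segments)))
    for _ in range(max_iter):
        # Select the M segments closest to the current template.
        closest = sorted(segments, key=lambda P: distance(template, P))[:m]
        # The new template is the element-wise median of the selection.
        updated = [[median(P[t][f] for P in closest)
                    for f in range(len(seed[0]))]
                   for t in range(len(seed))]
        change = distance(template, updated)
        template = updated
        if change <= tol:
            break
    return template
```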
  • The CPU 11 serves also as means (extracting means) for performing template matching based on the adapted template TA and thereby determining whether the drum is generating a sound at an onset time candidate oi or not. The CPU 11 multiplies the adapted template TA by the low pass filter function F(f) described above, and thereby calculates according to the following equation a weight function ω that indicates the magnitude of characteristics on the spectrum at each frame t of the adapted template TA and at each frequency f.
    $$\omega(t, f) = F(f)\, T_A(t, f)$$
  • In case that the power of each spectrum segment differs from that of the template, the determination of whether the template is contained in the spectrum segment may not be performed appropriately. Thus, for the purpose of ensuring appropriate template matching, the power of each spectrum segment is preferably adjusted such that the power matches that of the template. The CPU 11 selects the frequency ft,k (k=1, . . . , 15) of a characteristic point having the k-th largest value of ω(t, ft,k) at frame t in the template TA, and then calculates the power difference ηi(t, ft,k) according to the following equation.
    $$\eta_i(t, f_{t,k}) = P_i(t, f_{t,k}) - T_A(t, f_{t,k})$$
  • Then, the CPU 11 selects the value of ηi(t, ft,k) at the first quartile point (the point at 25% of the sample set sorted in ascending order), and thereby adopts this value as the power difference δi(t) at frame t. In case that the number of frames that do not satisfy δi(t)≧Ψ (Ψ is a negative constant) exceeds a predetermined threshold value R, the CPU 11 determines that TA is not contained in the spectrum segment Pi.
  • The CPU 11 calculates the final power difference Δi (the adjustment value for the spectrum segment: −Δi) according to the following equation.
    $$\Delta_i = \frac{\sum_{\{t \mid \delta_i(t) > \Psi\}} \delta_i(t)\, \omega(t, f_{t,K_i(t)})}{\sum_{\{t \mid \delta_i(t) > \Psi\}} \omega(t, f_{t,K_i(t)})}$$
  • In case that Δi≦Θ (Θ is a constant) is satisfied, the CPU 11 determines that the adapted template TA is not contained in the spectrum segment Pi. In case that Δi≦Θ is not satisfied, the CPU 11 determines that the adapted template TA is contained in the spectrum segment Pi, and then calculates an adjusted spectrum segment Pi′ according to the following equation.
    $$P_i'(t, f) = P_i(t, f) - \Delta_i$$
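The power adjustment described above may be sketched as follows. Two points are assumptions of this sketch and are not defined in the description: the first quartile point is taken as the element at index (n−1)//4 of the ascending sort, and the index Ki(t) is taken to denote the characteristic point whose power difference was selected as δi(t), so that its weight ω is used in the weighted average. The function name `adjust_power` and its return convention are hypothetical.

```python
def adjust_power(P, TA, F, psi, R, theta, n_points=15):
    """Return the adjusted segment P', or None if TA is judged not contained.

    psi, R and theta correspond to the constants Ψ, R and Θ in the text.
    """
    deltas, weights = [], []
    for t in range(len(TA)):
        omega = [F[f] * TA[t][f] for f in range(len(F))]
        # Characteristic points: the n_points bins with the largest weight.
        top = sorted(range(len(F)), key=lambda f: omega[f],
                     reverse=True)[:n_points]
        # (eta, omega) pairs at those points, sorted by eta ascending.
        pairs = sorted((P[t][f] - TA[t][f], omega[f]) for f in top)
        # First quartile point (index choice is an assumption).
        eta_q, omega_q = pairs[(len(pairs) - 1) // 4]
        deltas.append(eta_q)
        weights.append(omega_q)
    # Too many frames fail delta >= psi: the template is not contained.
    if sum(1 for d in deltas if d < psi) > R:
        return None
    kept = [(d, w) for d, w in zip(deltas, weights) if d > psi]
    total_w = sum(w for _, w in kept)
    if total_w == 0:
        return None
    big_delta = sum(d * w for d, w in kept) / total_w
    if big_delta <= theta:
        return None
    return [[p - big_delta for p in row] for row in P]
```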
  • The CPU 11 serves also as means for calculating the distance between the adapted template TA and the adjusted spectrum segment Pi′. At the calculation of the distance, the CPU 11 determines whether the spectrum of the adapted template TA is contained in the spectrum of the spectrum segment Pi′. FIG. 4A and FIG. 4B are graphs each showing an example of determination whether a spectrum is contained or not. The horizontal axis indicates frequency f, while the vertical axis indicates power P. A solid line indicates a spectrum segment Pi′, while a broken line indicates an adapted template TA. For example, in case that a spectrum segment Pi′(t, f) is larger than the adapted template TA(t, f) all over the frequency range as shown in FIG. 4A, it is determined that the spectrum segment Pi′(t, f) contains not only the spectral component of a drum sound but also the spectral components of other musical instruments, and that the adapted template TA(t, f) is contained in the spectrum segment Pi′(t, f). In the other cases as shown in FIG. 4B, it is determined that the adapted template TA(t, f) is not contained in the spectrum segment Pi′(t, f). The CPU 11 calculates a local distance measure γi(t, f) between the adapted template TA and the spectrum segment Pi′ at frame t and frequency f according to the following equation.
    $$\gamma_i(t, f) = \begin{cases} 0 & (\text{if } P_i'(t, f) - T_A(t, f) \geq \Psi) \\ 1 & (\text{otherwise}) \end{cases}$$
  • Here, Ψ is a negative constant. When a non-zero negative number is used as Ψ, a small variation in the spectral component can be absorbed. The CPU 11 integrates the distance measure γi over the time-frequency domain, and thereby acquires the overall distance Γi. At that time, the CPU 11 performs a weighting operation of multiplying the distance measure by the weight function ω according to the following equation.
    $$\Gamma_i = \sum_{t=1}^{15} \sum_{f=1}^{2048} \omega(t, f)\, \gamma_i(t, f)$$
  • The CPU 11 serves also as means for determining whether the target drum has generated a sound in the spectrum segment Pi′(t, f) portion or not. More specifically, in case that Γi<θ is satisfied, the CPU 11 determines that the target drum has generated a sound, and then decides the onset time candidate oi as the onset time.
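The weighted distance and the resulting decision may be sketched as follows. The sketch assumes that the local distance measure is 0 where the adjusted segment is not lower than the template by more than |Ψ| and 1 elsewhere, which is the reading suggested by the description of FIG. 4A and FIG. 4B. The function names are hypothetical.

```python
def matching_distance(P_adj, TA, F, psi):
    """Return the overall distance: the sum of omega(t, f) * gamma(t, f)."""
    total = 0.0
    for t in range(len(TA)):
        for f in range(len(F)):
            # gamma is 0 where the template fits under the segment
            # (within the tolerance psi, a negative constant), else 1.
            gamma = 0 if P_adj[t][f] - TA[t][f] >= psi else 1
            total += F[f] * TA[t][f] * gamma
    return total

def drum_present(P_adj, TA, F, psi, theta):
    """Return True if the overall distance is below the threshold theta."""
    return matching_distance(P_adj, TA, F, psi) < theta
```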
  • The CPU 11 serves also as increasing and decreasing means for increasing or decreasing a drum sound at onset time. FIG. 5A, FIG. 5B and FIG. 5C are schematic diagrams each illustrating a time series (frame series) of graphs showing an example of increasing or decreasing a drum sound at onset time. The horizontal axis indicates frequency f, while the vertical axis indicates power P. Symbol t indicates time (frame). As shown in FIG. 5B, the CPU 11 multiplies a spectrum Px corresponding to the adapted template TA by r (0≦r≦1) (the broken line in FIG. 5B indicates Px without the multiplication by r, while the solid line indicates Px multiplied by r). The CPU 11 then subtracts r·Px from the spectrum P of the audio signal shown in FIG. 5A, and thereby calculates an audio signal P′ shown in FIG. 5C where the drum sound is decreased. In case that the drum sound is to be increased, the CPU 11 adds r·Px to the spectrum P of the audio signal.
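The increase or decrease of the drum sound at an onset time may be sketched as follows. The sketch assumes that the spectrum Px is given as a matrix of the same shape as the adapted template and that it is aligned with the power spectrum P at the onset frame; the function name and the `increase` flag are hypothetical.

```python
def rescale_drum(P, Px, onset_frame, r, increase=False):
    """Return a copy of P with r * Px added or subtracted at the onset.

    P           : power spectrum of the audio signal, P[t][f]
    Px          : spectrum corresponding to the adapted template
    onset_frame : frame index of the decided onset time
    r           : amount of increase or decrease, 0 <= r <= 1
    """
    out = [row[:] for row in P]
    sign = 1.0 if increase else -1.0
    for t, row in enumerate(Px):
        if onset_frame + t >= len(out):
            break  # the template extends past the end of the signal
        for f, value in enumerate(row):
            out[onset_frame + t][f] += sign * r * value
    return out
```

A value of r received through the input unit 15 would be passed directly as `r`, with `increase` selecting between the addition and the subtraction of r·Px.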
  • As described above, the CPU 11 calculates various numerical data. The numerical data calculated by the CPU 11 is stored in the RAM 12 or the HDD 13. Further, when the CPU 11 is to calculate other numerical data on the basis of already calculated numerical data, the CPU 11 reads necessary numerical data from the RAM 12 before the new calculation.
  • A computer program stored in a recording medium 19 such as a CD-ROM is read by the external storage unit 14 and then temporarily stored in the HDD 13 or the RAM 12. After that, the computer program is executed by the CPU 11. This approach allows the CPU 11 to serve as various system components described above. Alternatively, a computer program may be received via the communication unit 17 from another apparatus connected to the communication network 20, and then temporarily stored in the HDD 13 or the RAM 12. After that, the computer program may be executed by the CPU 11.
  • Described below is a practical procedure of increasing or decreasing a drum sound by using a computer (audio signal processing apparatus) according to the invention. FIG. 6 is a flow chart showing an exemplary procedure of increasing or decreasing a drum sound by means of template adaptation. The procedure shown in the flow chart of FIG. 6 is carried out when the CPU 11 executes a computer program stored in the HDD 13 or the RAM 12.
  • The computer 10 reads an audio signal (sound data), for example, from a recording medium 19 in the external storage unit 14, and then stores the data into the HDD 13. Alternatively, the computer 10 may store into the HDD 13 sound data (an audio signal, hereafter) that are inputted through a sound card (not shown) and then converted into an audio signal. The computer 10 further reads a drum sound template (seed template Ts), for example, from a recording medium 19 in the external storage unit 14, and then stores the data into the HDD 13.
  • The CPU 11 first performs frequency analysis on the audio signal so as to calculate the power spectrum P, and then stores into the HDD 13 the data of the calculated power spectrum P. The CPU 11 then detects an onset time candidate oi (S10) on the basis of the power spectrum P calculated and stored in the HDD 13. The CPU 11 stores the detected onset time candidate oi into the HDD 13. On the basis of the onset time candidate oi, the CPU 11 extracts (calculates) a spectrum segment Pi (S12), and then stores the data of the extracted spectrum segment Pi into the HDD 13. After that, the CPU 11 performs template adaptation (S14), and thereby updates the template Tg (the seed template Ts in the beginning) stored in the HDD 13. As a result, the template converges into an adapted template TA.
  • After that, the CPU 11 performs template matching by using the adapted template TA, and then decides the onset time (extracts a drum sound) (S16). The CPU 11 stores the decided onset time into the HDD 13. Using the adapted template TA, the CPU 11 increases or decreases the power spectrum in the vicinity of the decided onset time (S18), and thereby creates an audio signal used as an output. The CPU 11 stores this audio signal into the HDD 13. The increase or decrease of the power spectrum is performed in response to the amount of increase or decrease received through the input unit 15. The audio signal (sound data) used as an output may be outputted and recorded into a recording medium 19 in the external storage unit 14. Alternatively, the audio signal used as an output may be outputted through a sound card not shown.
  • FIG. 7 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of template adaptation (S14) shown in FIG. 6. The CPU 11 first calculates the distance Di between the spectrum segment Pi and the template Tg (S20), and then stores the calculated distance Di into the HDD 13. In the initial process, the distance Di is calculated after quantization. The CPU 11 then selects spectrum segments Ps having smaller calculated distances Di (S22), and then performs template update using the median of the selected spectrum segments (S24). Then, the CPU 11 evaluates the amount of change between the not-yet-updated template and the updated template (S26). In case that the amount of change between the templates before and after the update is at or below a predetermined value, that is, in case that the adaptation has converged (S26: YES), the CPU 11 terminates the template adaptation process. In contrast, in case that the amount of change between the templates before and after the update is not yet at or below the predetermined value, that is, in case that the adaptation has not yet converged (S26: NO), the CPU 11 repeats the processes of S20, S22 and S24 described above until the amount of change between the templates before and after the update is at or below the predetermined value.
  • FIG. 8 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of template matching (S16) shown in FIG. 6. The CPU 11 first adjusts the spectrum segment Pi so as to match with the template (S30). The CPU 11 then stores the adjusted spectrum segment Pi′ into the HDD 13. Then, the CPU 11 calculates the amount (adjustment value Δi) of change between the spectrum segments Pi and Pi′ before and after the power adjustment, and then stores the value into the RAM 12. The CPU 11 then compares the value with a threshold Θ stored in the HDD 13 in advance (S32). In case that the adjustment value Δi is greater than or equal to the threshold Θ (S32: YES), the CPU 11 terminates the template matching process. In case that the adjustment value Δi is smaller than the threshold Θ (S32: NO), the CPU 11 calculates the distance Γi between the template and the adjusted spectrum segment Pi′ (S34), and then stores the calculated distance Γi into the HDD 13. The CPU 11 then compares the calculated distance Γi with a threshold θ stored in the HDD 13 in advance (S36). In case that the distance Γi is greater than or equal to the threshold θ (S36: YES), the CPU 11 terminates the template matching process. In case that the distance Γi is smaller than the threshold θ (S36: NO), the CPU 11 decides the onset time candidate oi as the onset time (S38), and then stores the decided onset time into the HDD 13.
  • FIG. 9 is a flow chart showing, in the form of a subroutine, an exemplary detail of the procedure of spectrum segment adjustment (S30) shown in FIG. 8. The CPU 11 first calculates the power difference ηi between the template TA and the spectrum segment Pi at the characteristic frequency at each time (frame) (S40), and then stores the value into the RAM 12 or the HDD 13. On the basis of the calculated power difference ηi at the characteristic frequency, the CPU 11 calculates the power difference δi at each time (S42), and then stores the value into the RAM 12 or the HDD 13. The CPU 11 then compares the power difference δi at each time with a threshold Ψ stored in the HDD 13 in advance, and thereby counts the number of frames where the power difference δi is greater than or equal to the threshold Ψ. The CPU 11 stores the count into the RAM 12 or the HDD 13. The CPU 11 then compares the number of frames where the power difference δi is greater than or equal to the threshold Ψ with a threshold R stored in the HDD 13 in advance (S44). In case that the number of frames where the power difference δi is greater than or equal to the threshold Ψ is smaller than or equal to the threshold R (S44: YES), the CPU 11 terminates the process of adjusting the spectrum segment Pi. In case that the number of frames where the power difference δi is greater than or equal to the threshold Ψ is greater than the threshold R (S44: NO), the CPU 11 integrates the power difference δi at each time, and thereby acquires the power difference (adjustment value Δi) (S46). The CPU 11 stores the value into the HDD 13. The CPU 11 then compares the power difference Δi calculated in step S46 with a threshold Θ stored in the HDD 13 in advance (S48). In case that the power difference Δi is smaller than or equal to the threshold Θ (S48: YES), the CPU 11 terminates the process of adjusting the spectrum segment Pi. In case that the power difference Δi is greater than the threshold Θ (S48: NO), the CPU 11 subtracts the power difference Δi from the spectrum segment Pi (S50), and then stores the result as a spectrum segment Pi′ into the HDD 13.
  • The above-mentioned embodiment has been described in the case that the audio signal processing apparatus according to the invention is embodied in the form of a software process using a computer. However, the invention is applicable also to various types of apparatuses for outputting an audio signal such as a recording device, an electronic musical instrument, an audio device, a portable audio device, and a portable telephone or the like.
  • FIG. 10 is a block diagram showing an exemplary configuration of an audio signal processing apparatus according to the invention embodied as an audio device. The audio device 30 comprises: an operation unit 35 for receiving various operations such as a reproduction operation; a display unit 36 provided with a liquid crystal display panel or the like for displaying the operation status such as “in reproduction”; a reproducing unit 34 for reading data from a recording medium (not shown) such as an MD (Mini Disc), a disc of another type, and flash memory, and thereby reproducing an audio signal; an output unit 37 for outputting to a headphone or a speaker the audio signal reproduced by the reproducing unit 34; a control unit (CPU) 31 for controlling various system components such as the operation unit 35, the display unit 36, the reproducing unit 34, and the output unit 37; a RAM 32 connected to the control unit 31; and a flash memory 33 serving as a storage unit. The control unit 31 controls various system components such as the reproducing unit 34 and the output unit 37 in response to an operation received through the operation unit 35, and thereby causes an audio signal to be outputted through the output unit 37.
  • The control unit 31 serves as means for extracting a predetermined non-harmonic structured spectral component such as a drum sound contained in an audio signal as well as means for increasing or decreasing the extracted predetermined spectral component. The control unit 31 serves also as means for calculating the spectrum of an audio signal by frequency analysis, and thereby extracts a spectrum corresponding to the predetermined non-harmonic structured spectral component. The extraction of the predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in the flash memory (storage unit) 33 in advance. The control unit 31 serves as means for adapting the spectral component of the template in such a manner that the difference between the extracted spectral component and the spectral component of the template stored in the flash memory 33 goes below or at a predetermined value. More specifically, in case that a plurality of spectral components have been extracted, the control unit 31 serves as: means for calculating the difference between each extracted spectral component and the spectral component of the template; means for selecting a predetermined number of spectral components in ascending order of the calculated difference; and means for updating the spectral component of the template into the median of the predetermined number of selected spectral components. As such, the control unit 31 adapts the spectral component of the template.
  • The control unit 31 serves also as means for quantizing each extracted spectral component and the spectral component of the template in the initial adaptation for the spectral component of the template, and thereby calculates the difference between each extracted spectral component and the spectral component of the template that have been quantized. The operation unit 35 serves as means for receiving the amount of increase or decrease of the predetermined spectral component, so that the control unit 31 increases or decreases the extracted predetermined spectral component in response to the amount of increase or decrease received through the operation unit 35. In an example, in addition to a volume control knob for the overall power of the audio signal, the operation unit 35 comprises a volume control knob for bass drum.
  • Similarly to the computer shown in FIG. 1, the audio device 30 shown in FIG. 10 extracts and increases or decreases a predetermined non-harmonic structured spectral component such as a drum sound according to the invention. The control unit 31, the RAM 32, the flash memory 33, the reproducing unit 34, the operation unit 35, the display unit 36, and the output unit 37 in the audio device 30 operate respectively in a similar manner to the CPU 11, the RAM 12, the HDD 13, the external storage unit 14, the input unit 15, the display unit 16, and the sound card (not shown) in the computer 10 of FIG. 1, and thereby extract and increase or decrease a drum sound or the like.
  • In the configuration shown in FIG. 10, the control unit (CPU) 31 extracts and increases or decreases the drum sound or the like. However, a dedicated hardware (dedicated LSI) for extracting and increasing or decreasing the drum sound or the like may be provided so that the dedicated LSI, instead of the control unit 31, may extract and increase or decrease the predetermined non-harmonic structured spectral component such as a drum sound. Further, the audio device 30 may be provided with a communication port for performing communications with the outside. Furthermore, the reproducing unit 34 may be constructed in a manner capable of recording in addition to reproducing. As such, the invention is applicable also to arbitrary audio devices. In the case of a portable telephone, the invention may be applied in its audio signal processing unit. As such, the invention is applicable to the audio signal processing units of various devices for processing an audio signal.
  • The above-mentioned embodiment has been described in the case that a non-harmonic structured sound such as a drum sound is extracted and increased or decreased. However, the invention is not limited to the drum sound. A non-harmonic structured sound generated by another percussion instrument such as cymbals may be extracted and increased or decreased. Further, a non-harmonic structured sound generated by another type of sound source may be extracted and increased or decreased. Further, a bass drum sound or a snare drum sound among various types of drum sounds may be extracted and increased or decreased.
  • An audio signal processed according to the invention may contain a voice signal. For example, a predetermined non-harmonic structured spectral component may be extracted from an audio signal of music containing a vocal, and then the extracted spectral component may be increased or decreased. Further, a predetermined non-harmonic structured spectral component may be extracted from an audio signal containing a voice of the target of speech recognition, and then the extracted spectral component may be increased or decreased. Accordingly, in speech recognition, a predetermined non-harmonic structured spectral component contained in voice data can be extracted and decreased. Such a non-harmonic structured spectral component contained in voice data is a noise component in many cases. Thus, the noise component can be cancelled by extracting and decreasing it. This improves the accuracy in the speech recognition.
  • Further, the above-mentioned embodiment has been described in the case that once the onset time is decided, the power spectrum is immediately increased or decreased in the vicinity of the onset time (S16 and S18 in FIG. 6). However, the deciding of the onset time may be processed separately from the increase or decrease of the power spectrum in the vicinity of the onset time. In an example, after the onset time of a drum in an audio signal is decided, the audio signal (sound data), the onset time (onset position data), and the adapted template may be transmitted through a recording medium or a network to another computer. Then, the other computer or an audio device may increase or decrease the power spectrum in the vicinity of the onset time. More specifically, the communication unit (outputting means) 17 of the computer (first audio signal processing apparatus) shown in FIG. 1 may transmit the audio signal, the onset time, and the adapted template. Further, the external storage unit (outputting means) 14 may output such data and record it into a recording medium. Furthermore, the reproducing unit (receiving means) 34 of the audio device (second audio signal processing apparatus) shown in FIG. 10 may read the audio signal, the onset time, and the adapted template, while the control unit 31 or the like may increase or decrease the power spectrum of the audio signal corresponding to the adapted template at the onset time. Similarly, the communication unit (receiving means) 17 of the computer (second audio signal processing apparatus) shown in FIG. 1 may receive the audio signal, the onset time, and the adapted template. Further, the external storage unit (receiving means) 14 may read the audio signal, the onset time, and the adapted template, while the CPU 11 may increase or decrease the power spectrum of the audio signal corresponding to the adapted template at the onset time. Furthermore, the template adaptation may be separately performed in another audio signal processing apparatus such as a computer.
  • As this invention may be embodied in several forms without departing from the spirit of essential characteristics thereof, the present embodiment is therefore illustrative and not restrictive, since the scope of the invention is defined by the appended claims rather than by the description preceding them, and all changes that fall within metes and bounds of the claims, or equivalence of such metes and bounds thereof are therefore intended to be embraced by the claims.

Claims (49)

1. An audio signal processing method comprising steps of
extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and
increasing or decreasing said extracted predetermined spectral component.
2. The audio signal processing method as set forth in claim 1, further comprising a step of calculating a spectrum of said audio signal by frequency analysis,
wherein, in said step of extracting the predetermined non-harmonic structured spectral component, a spectrum is extracted that corresponds to said predetermined non-harmonic structured spectral component.
3. The audio signal processing method as set forth in claim 2, wherein
said step of extracting the predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in advance, and
said method further comprises a step of adapting said spectral component of said template in such a manner that a difference between said extracted spectral component and said spectral component of said template goes below or at a predetermined value.
4. The audio signal processing method as set forth in claim 3, wherein said adapting step further comprises steps of
calculating a difference between each extracted spectral component and said spectral component of said template in case that a plurality of spectral components have been extracted;
selecting a predetermined number of spectral components in ascending order of said calculated difference; and
updating said spectral component of said template into a median of said predetermined number of selected spectral components.
5. The audio signal processing method as set forth in claim 4, further comprising a step of quantizing said extracted spectral components and said spectral component of said template in an initial adaptation for said spectral component of said template,
wherein, in said step of calculating a difference, a difference is calculated between each extracted spectral component and said spectral component of said template which have been quantized.
6. The audio signal processing method as set forth in claim 1, further comprising a step of receiving an amount of increase or decrease for said predetermined spectral component,
wherein, in said increasing or decreasing step, said extracted predetermined spectral component is increased or decreased in response to said received amount of increase or decrease.
7. An audio signal processing method for extracting, with reference to a spectral component of a template stored in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, comprising a step of adapting said spectral component of said template in such a manner that a difference between said extracted spectral component and said spectral component of said template goes below or at a predetermined value.
8. The audio signal processing method as set forth in claim 7, wherein said adapting step further comprises steps of:
calculating a difference between each extracted spectral component and said spectral component of said template in case that a plurality of spectral components have been extracted;
selecting a predetermined number of spectral components in ascending order of said calculated difference; and
updating said spectral component of said template into a median of said predetermined number of selected spectral components.
9. The audio signal processing method as set forth in claim 8, further comprising a step of quantizing said extracted spectral component and said spectral component of said template in an initial adaptation for said spectral component of said template,
wherein, in said step of calculating a difference, a difference is calculated between each extracted spectral component and said spectral component of said template which have been quantized.
10. The audio signal processing method as set forth in claim 7, further comprising a step of receiving an amount of increase or decrease for said predetermined spectral component,
wherein, in said increasing or decreasing step, said extracted predetermined spectral component is increased or decreased in response to said received amount of increase or decrease.
11. An audio signal processing method comprising steps of:
extracting a predetermined non-harmonic structured spectral component contained in an audio signal;
outputting onset time information of the extraction of said predetermined non-harmonic structured spectral component from said audio signal, said predetermined spectral component, and said audio signal;
receiving said outputted onset time information, said predetermined spectral component, and said audio signal; and
increasing or decreasing said received spectral component contained in said received audio signal, on the basis of said received onset time information.
12. An audio signal processing apparatus comprising:
extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and
increasing and decreasing means for increasing or decreasing said predetermined spectral component extracted by said extracting means.
13. The audio signal processing apparatus as set forth in claim 12, further comprising calculating means for calculating a spectrum of said audio signal by frequency analysis,
wherein said extracting means extracts a spectrum corresponding to said predetermined non-harmonic structured spectral component.
14. The audio signal processing apparatus as set forth in claim 13, wherein
said extraction of a predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in a storage unit in advance, and
said apparatus further comprises adapting means for adapting said spectral component of said template in such a manner that a difference between said extracted spectral component and said spectral component of said template goes below or at a predetermined value.
15. The audio signal processing apparatus as set forth in claim 14, wherein said adapting means further comprises:
subtracting means for calculating a difference between each extracted spectral component and said spectral component of said template in case that a plurality of spectral components have been extracted;
selecting means for selecting a predetermined number of spectral components in ascending order of the difference calculated by said subtracting means; and
updating means for updating said spectral component of said template into a median of said predetermined number of spectral components selected by said selecting means.
16. The audio signal processing apparatus as set forth in claim 15, further comprising quantizing means for quantizing said extracted spectral components and said spectral component of said template in an initial adaptation for said spectral component of said template,
wherein said subtracting means calculates a difference between each extracted spectral component and said spectral component of said template which have been quantized by said quantizing means.
17. The audio signal processing apparatus as set forth in claim 12, further comprising receiving means for receiving an amount of increase or decrease for said predetermined spectral component,
wherein said increasing and decreasing means increases or decreases said extracted predetermined spectral component in response to said amount of increase or decrease received by said receiving means.
18. An audio signal processing apparatus for extracting, with reference to a spectral component of a template stored in a storage unit in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, comprising adapting means for adapting said spectral component of said template in such a manner that a difference between said extracted spectral component and said spectral component of said template goes below or at a predetermined value.
19. The audio signal processing apparatus as set forth in claim 18, wherein said adapting means further comprises:
subtracting means for calculating a difference between each extracted spectral component and said spectral component of said template in case that a plurality of spectral components have been extracted;
selecting means for selecting a predetermined number of spectral components in ascending order of the difference calculated by said subtracting means; and
updating means for updating said spectral component of said template into a median of said predetermined number of spectral components selected by said selecting means.
20. The audio signal processing apparatus as set forth in claim 19, further comprising quantizing means for quantizing said extracted spectral components and said spectral component of said template in an initial adaptation for said spectral component of said template,
wherein said subtracting means calculates a difference between each extracted spectral component and said spectral component of said template which have been quantized by said quantizing means.
21. The audio signal processing apparatus as set forth in claim 18, further comprising receiving means for receiving an amount of increase or decrease for said predetermined spectral component,
wherein said increasing and decreasing means increases or decreases said extracted predetermined spectral component in response to said amount of increase or decrease received by said receiving means.
22. An audio signal processing system including:
a first audio signal processing apparatus comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting means for outputting onset time information of the extraction of said predetermined non-harmonic structured spectral component from said audio signal by said extracting means, said predetermined spectral component, and said audio signal; and
a second audio signal processing apparatus comprising: receiving means for receiving said onset time information, said predetermined spectral component, and said audio signal outputted from said first audio signal processing apparatus; and increasing and decreasing means for increasing or decreasing said received spectral component contained in said received audio signal, on the basis of said onset time information received by said receiving means.
23. An audio signal processing apparatus comprising:
extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and
outputting means for outputting onset time information of the extraction of said predetermined non-harmonic structured spectral component from said audio signal by said extracting means, said predetermined spectral component, and said audio signal.
24. An audio signal processing apparatus comprising:
receiving means for receiving onset time information of the extraction of a predetermined non-harmonic structured spectral component from an audio signal, said predetermined spectral component, and said audio signal; and
increasing and decreasing means for increasing or decreasing said received spectral component contained in said received audio signal, on the basis of said onset time information received by said receiving means.
25. An audio signal processing apparatus comprising a processor being capable of performing following operations of:
extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and
increasing or decreasing said extracted predetermined spectral component.
26. The audio signal processing apparatus as set forth in claim 25, wherein
said processor is further capable of performing a following operation of calculating a spectrum of said audio signal by frequency analysis; and
in said operation of extracting a predetermined non-harmonic structured spectral component, a spectrum is extracted that corresponds to said predetermined non-harmonic structured spectral component.
27. The audio signal processing apparatus as set forth in claim 26, further comprising a storage unit for storing a spectral component of a template in advance, wherein
said operation of extracting a predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in said storage unit in advance, and
said processor is further capable of performing a following operation of adapting said spectral component of said template in such a manner that a difference between said extracted spectral component and said spectral component of said template goes below or at a predetermined value.
28. The audio signal processing apparatus as set forth in claim 27, wherein, in said adapting operation, said processor is further capable of performing following operations of:
calculating a difference between each extracted spectral component and said spectral component of said template in case that a plurality of spectral components have been extracted;
selecting a predetermined number of spectral components in ascending order of said calculated difference; and
updating said spectral component of said template into a median of said predetermined number of selected spectral components.
29. The audio signal processing apparatus as set forth in claim 28, wherein
said processor is further capable of performing a following operation of quantizing said extracted spectral components and said spectral component of said template in an initial adaptation for said spectral component of said template, and
in said operation of calculating a difference, a difference is calculated between each extracted spectral component and said spectral component of said template which have been quantized.
30. The audio signal processing apparatus as set forth in claim 25, further comprising a receiving unit for receiving an amount of increase or decrease for said predetermined spectral component,
wherein said processor increases or decreases said extracted predetermined spectral component in response to said received amount of increase or decrease.
31. An audio signal processing apparatus comprising: a storage unit for storing a spectral component of a template in advance; and a processor for extracting, with reference to a spectral component of a template stored in said storage unit in advance, a predetermined non-harmonic structured spectral component contained in an audio signal;
wherein said processor is further capable of performing a following operation of adapting said spectral component of said template in such a manner that a difference between said extracted spectral component and said spectral component of said template goes below or at a predetermined value.
32. The audio signal processing apparatus as set forth in claim 31, wherein, in said adapting operation, said processor is further capable of performing following operations of:
calculating a difference between each extracted spectral component and said spectral component of said template in case that a plurality of spectral components have been extracted;
selecting a predetermined number of spectral components in ascending order of said calculated difference; and
updating said spectral component of said template into a median of said predetermined number of selected spectral components.
33. The audio signal processing apparatus as set forth in claim 32, wherein
said processor is further capable of performing a following operation of quantizing said extracted spectral components and said spectral component of said template in an initial adaptation for said spectral component of said template, and
in said operation of calculating a difference, a difference is calculated between each extracted spectral component and said spectral component of said template which have been quantized.
34. The audio signal processing apparatus as set forth in claim 31, further comprising a receiving unit for receiving an amount of increase or decrease for said predetermined spectral component,
wherein said processor increases or decreases said extracted predetermined spectral component in response to said received amount of increase or decrease.
35. An audio signal processing system including:
a first audio signal processing apparatus comprising a processor being capable of performing following operations of extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting onset time information of the extraction of said predetermined non-harmonic structured spectral component from said audio signal, said predetermined spectral component, and said audio signal; and
a second audio signal processing apparatus comprising a processor being capable of performing following operations of receiving said onset time information, said predetermined spectral component, and said audio signal outputted from said first audio signal processing apparatus; and increasing or decreasing said received spectral component contained in said received audio signal, on the basis of said received onset time information.
36. An audio signal processing apparatus comprising a processor being capable of performing following operations of:
extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and
outputting onset time information of the extraction of said predetermined non-harmonic structured spectral component from said audio signal, said predetermined spectral component, and said audio signal.
37. An audio signal processing apparatus comprising a processor being capable of performing following operations of:
receiving onset time information of the extraction of a predetermined non-harmonic structured spectral component from an audio signal, said predetermined spectral component, and said audio signal; and
increasing or decreasing said received spectral component contained in said received audio signal, on the basis of said received onset time information.
38. A computer program product for causing a computer to process an audio signal, wherein said computer program product comprises:
a computer readable storage medium having computer readable program code means embodied in said medium, said computer readable program code means comprising instructions for:
extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and
increasing or decreasing said extracted predetermined spectral component.
39. The computer program product as set forth in claim 38, wherein
said computer readable program code means further comprises an instruction for calculating a spectrum of said audio signal by frequency analysis, and
said extracting instruction causes said computer to extract a spectrum corresponding to said predetermined non-harmonic structured spectral component.
40. The computer program product as set forth in claim 39, wherein
said instruction for extracting a predetermined non-harmonic structured spectral component is executed with reference to a spectral component of a template stored in advance, and
said computer readable program code means further comprises an instruction for adapting said spectral component of said template in such a manner that a difference between said extracted spectral component and said spectral component of said template goes below or at a predetermined value.
41. The computer program product as set forth in claim 40, wherein, in said adapting instruction, said computer readable program code means further comprises instructions for:
calculating a difference between each extracted spectral component and said spectral component of said template in case that a plurality of spectral components have been extracted;
selecting a predetermined number of spectral components in ascending order of said calculated difference; and
updating said spectral component of said template into a median of said predetermined number of selected spectral components.
42. The computer program product as set forth in claim 41, wherein
said computer readable program code means further comprises an instruction for quantizing said extracted spectral components and said spectral component of said template in an initial adaptation for said spectral component of said template; and
said instruction for calculating a difference causes said computer to calculate a difference between each extracted spectral component and said spectral component of said template which have been quantized.
43. The computer program product as set forth in claim 38, wherein
said computer readable program code means further comprises an instruction for receiving an amount of increase or decrease for said predetermined spectral component; and
said increasing or decreasing instruction causes said computer to increase or decrease said extracted predetermined spectral component in response to said received amount of increase or decrease.
44. A computer program product for causing a computer to extract, with reference to a spectral component of a template stored in a memory in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, wherein said computer program product comprises:
a computer readable storage medium having computer readable program code means embodied in said medium, said computer readable program code means comprising an instruction for adapting said spectral component of said template in such a manner that a difference between said extracted spectral component and said spectral component of said template goes below or at a predetermined value.
45. The computer program product as set forth in claim 44, wherein, in said adapting instruction, said computer readable program code means further comprises instructions for:
calculating a difference between each extracted spectral component and said spectral component of said template in case that a plurality of spectral components have been extracted;
selecting a predetermined number of spectral components in ascending order of said calculated difference; and
updating said spectral component of said template into a median of said predetermined number of selected spectral components.
46. The computer program product as set forth in claim 45, wherein
said computer readable program code means further comprises an instruction for quantizing said extracted spectral components and said spectral component of said template in an initial adaptation for said spectral component of said template; and
said instruction for calculating a difference causes said computer to calculate a difference between each extracted spectral component and said spectral component of said template which have been quantized.
47. The computer program product as set forth in claim 44, wherein
said computer readable program code means further comprises an instruction for receiving an amount of increase or decrease for said predetermined spectral component; and
said increasing or decreasing instruction causes said computer to increase or decrease said extracted predetermined spectral component in response to said received amount of increase or decrease.
48. A computer program product for causing a computer to process an audio signal, wherein said computer program product comprises:
a computer readable storage medium having computer readable program code means embodied in said medium, said computer readable program code means comprising instructions for:
extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and
outputting onset time information of the extraction of said predetermined non-harmonic structured spectral component from said audio signal, said predetermined spectral component, and said audio signal.
49. A computer program product for causing a computer to process an audio signal, wherein said computer program product comprises:
a computer readable storage medium having computer readable program code means embodied in said medium, said computer readable program code means comprising instructions for:
receiving onset time information of the extraction of a predetermined non-harmonic structured spectral component from an audio signal, said predetermined spectral component, and said audio signal; and
increasing or decreasing said received spectral component contained in said received audio signal, on the basis of said received onset time information.
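As a non-limiting sketch of the increasing or decreasing operation recited in the claims above, the following Python fragment rescales a template-shaped spectral component at each onset time by a received amount. The function names `db_to_gain` and `rescale_component`, the frames-by-bins power spectrogram layout, the expression of the amount in decibels, and the rule that the component is capped at the power of the mixture are all assumptions introduced for illustration; the claims do not prescribe them.

```python
from typing import List

Spectrogram = List[List[float]]  # assumed layout: frames x frequency bins, power values


def db_to_gain(amount_db: float) -> float:
    """Convert an amount in decibels to a linear power gain (assumed unit)."""
    return 10.0 ** (amount_db / 10.0)


def rescale_component(mixture: Spectrogram, template: Spectrogram,
                      onset_frames: List[int], amount_db: float) -> Spectrogram:
    """Increase or decrease the template-shaped component at each onset frame."""
    gain = db_to_gain(amount_db)
    out = [list(frame) for frame in mixture]  # copy, leaving the input untouched
    for onset in onset_frames:
        for dt, tmpl_frame in enumerate(template):
            t = onset + dt
            if t < 0 or t >= len(out):
                continue  # the template runs past the edges of the signal
            for f, tmpl_power in enumerate(tmpl_frame):
                # Assumption: the component cannot carry more power than the mixture.
                component = min(tmpl_power, mixture[t][f])
                # Scale only the component; never let the power become negative.
                out[t][f] = max(0.0, out[t][f] + (gain - 1.0) * component)
    return out


# Hypothetical values: a three-frame mixture and a two-frame template.
mixture = [[4.0, 1.0], [3.0, 1.0], [1.0, 1.0]]
template = [[2.0, 0.0], [1.0, 0.0]]
attenuated = rescale_component(mixture, template, onset_frames=[0], amount_db=-10.0)
```

In this sketch the onset frames and the template are plain arguments, so they may equally be produced in the same apparatus or received from a separate one, as in the system claims. An amount of 0 dB leaves the mixture unchanged, a negative amount attenuates the component, and a positive amount emphasizes it.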
US11/020,030 2004-06-18 2004-12-21 Audio signal processing method, audio signal processing apparatus, audio signal processing system and computer program product Abandoned US20050283361A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2004-181881 2004-06-18
JP2004181881A JP4318119B2 (en) 2004-06-18 2004-06-18 Acoustic signal processing method, acoustic signal processing apparatus, acoustic signal processing system, and computer program

Publications (1)

Publication Number Publication Date
US20050283361A1 (en) 2005-12-22

Family

ID=35481746

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/020,030 Abandoned US20050283361A1 (en) 2004-06-18 2004-12-21 Audio signal processing method, audio signal processing apparatus, audio signal processing system and computer program product

Country Status (2)

Country Link
US (1) US20050283361A1 (en)
JP (1) JP4318119B2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2148321A1 (en) * 2007-04-13 2010-01-27 Kyoto University Sound source separation system, sound source separation method, and computer program for sound source separation
FR2980620A1 (en) * 2011-09-23 2013-03-29 France Telecom Method for processing decoded audio frequency signal, e.g. coded voice signal including music, involves performing spectral attenuation of residue, and combining residue and attenuated signal from spectrum of tonal components
US8541676B1 (en) * 2010-03-06 2013-09-24 Alexander Waldman Method for extracting individual instrumental parts from an audio recording and optionally outputting sheet music
US8831762B2 (en) 2009-02-17 2014-09-09 Kyoto University Music audio signal generating system
CN111382302A (en) * 2018-12-28 2020-07-07 中国科学院声学研究所 Audio sample retrieval method based on variable speed template
CN113496706A (en) * 2020-03-19 2021-10-12 北京字节跳动网络技术有限公司 Audio processing method and device, electronic equipment and storage medium
RU2801156C2 (en) * 2013-04-05 2023-08-02 Долби Лабораторис Лайсэнзин Корпорейшн Companding system and method for reducing quantization noise using improved spectral expansion

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
JP5273402B2 (en) * 2010-05-11 2013-08-28 ブラザー工業株式会社 Karaoke equipment
CN102248142B (en) * 2011-06-30 2013-05-08 攀钢集团有限公司 Method for producing medium and low carbon aluminum killed steel

Citations (27)

Publication number Priority date Publication date Assignee Title
US5025471A (en) * 1989-08-04 1991-06-18 Scott Instruments Corporation Method and apparatus for extracting information-bearing portions of a signal for recognizing varying instances of similar patterns
US5640490A (en) * 1994-11-14 1997-06-17 Fonix Corporation User independent, real-time speech recognition system and method
US5757937A (en) * 1996-01-31 1998-05-26 Nippon Telegraph And Telephone Corporation Acoustic noise suppressor
US5854600A (en) * 1991-05-29 1998-12-29 Pacific Microsonics, Inc. Hidden side code channels
US5924066A (en) * 1997-09-26 1999-07-13 U S West, Inc. System and method for classifying a speech signal
US6073096A (en) * 1998-02-04 2000-06-06 International Business Machines Corporation Speaker adaptation system and method based on class-specific pre-clustering training speakers
US6127617A (en) * 1997-09-25 2000-10-03 Yamaha Corporation Effector differently controlling harmonics and noises to improve sound field effect
US6138093A (en) * 1997-03-03 2000-10-24 Telefonaktiebolaget Lm Ericsson High resolution post processing method for a speech decoder
US6233550B1 (en) * 1997-08-29 2001-05-15 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4kbps
US6263312B1 (en) * 1997-10-03 2001-07-17 Alaris, Inc. Audio compression and decompression employing subband decomposition of residual signal and distortion reduction
US6424942B1 (en) * 1998-10-26 2002-07-23 Telefonaktiebolaget Lm Ericsson (Publ) Methods and arrangements in a telecommunications system
US6426456B1 (en) * 2001-10-26 2002-07-30 Motorola, Inc. Method and apparatus for generating percussive sounds in embedded devices
US20020120445A1 (en) * 2000-11-03 2002-08-29 Renat Vafin Coding signals
US6453282B1 (en) * 1997-08-22 2002-09-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method and device for detecting a transient in a discrete-time audiosignal
US6576827B2 (en) * 2001-03-23 2003-06-10 Yamaha Corporation Music sound synthesis with waveform caching by prediction
US6584442B1 (en) * 1999-03-25 2003-06-24 Yamaha Corporation Method and apparatus for compressing and generating waveform
US6629049B2 (en) * 1996-03-05 2003-09-30 Hirata Wave Analysis, Inc. Method for non-harmonic analysis of waveforms for synthesis, interpolation and extrapolation
US6675144B1 (en) * 1997-05-15 2004-01-06 Hewlett-Packard Development Company, L.P. Audio coding systems and methods
US6757654B1 (en) * 2000-05-11 2004-06-29 Telefonaktiebolaget Lm Ericsson Forward error correction in speech coding
US6910011B1 (en) * 1999-08-16 2005-06-21 Haman Becker Automotive Systems - Wavemakers, Inc. Noisy acoustic signal enhancement
US6925435B1 (en) * 2000-11-27 2005-08-02 Mindspeed Technologies, Inc. Method and apparatus for improved noise reduction in a speech encoder
US6925434B2 (en) * 2000-03-15 2005-08-02 Koninklijke Philips Electronics N.V. Audio coding
US6993481B2 (en) * 2000-12-04 2006-01-31 Global Ip Sound Ab Detection of speech activity using feature model adaptation
US7117145B1 (en) * 2000-10-19 2006-10-03 Lear Corporation Adaptive filter for speech enhancement in a noisy environment
US7146003B2 (en) * 2000-09-30 2006-12-05 Zarlink Semiconductor Inc. Noise level calculator for echo canceller
US7337107B2 (en) * 2000-10-02 2008-02-26 The Regents Of The University Of California Perceptual harmonic cepstral coefficients as the front-end for speech recognition
US7353169B1 (en) * 2003-06-24 2008-04-01 Creative Technology Ltd. Transient detection and modification in audio signals

Patent Citations (29)

Publication number Priority date Publication date Assignee Title
US5025471A (en) * 1989-08-04 1991-06-18 Scott Instruments Corporation Method and apparatus for extracting information-bearing portions of a signal for recognizing varying instances of similar patterns
US5854600A (en) * 1991-05-29 1998-12-29 Pacific Microsonics, Inc. Hidden side code channels
US5640490A (en) * 1994-11-14 1997-06-17 Fonix Corporation User independent, real-time speech recognition system and method
US5757937A (en) * 1996-01-31 1998-05-26 Nippon Telegraph And Telephone Corporation Acoustic noise suppressor
US6629049B2 (en) * 1996-03-05 2003-09-30 Hirata Wave Analysis, Inc. Method for non-harmonic analysis of waveforms for synthesis, interpolation and extrapolation
US6138093A (en) * 1997-03-03 2000-10-24 Telefonaktiebolaget Lm Ericsson High resolution post processing method for a speech decoder
US6675144B1 (en) * 1997-05-15 2004-01-06 Hewlett-Packard Development Company, L.P. Audio coding systems and methods
US6453282B1 (en) * 1997-08-22 2002-09-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method and device for detecting a transient in a discrete-time audiosignal
US6475245B2 (en) * 1997-08-29 2002-11-05 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4KBPS having phase alignment between mode-switched frames
US6233550B1 (en) * 1997-08-29 2001-05-15 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4kbps
US6127617A (en) * 1997-09-25 2000-10-03 Yamaha Corporation Effector differently controlling harmonics and noises to improve sound field effect
US5924066A (en) * 1997-09-26 1999-07-13 U S West, Inc. System and method for classifying a speech signal
US6263312B1 (en) * 1997-10-03 2001-07-17 Alaris, Inc. Audio compression and decompression employing subband decomposition of residual signal and distortion reduction
US6073096A (en) * 1998-02-04 2000-06-06 International Business Machines Corporation Speaker adaptation system and method based on class-specific pre-clustering training speakers
US6424942B1 (en) * 1998-10-26 2002-07-23 Telefonaktiebolaget Lm Ericsson (Publ) Methods and arrangements in a telecommunications system
US6584442B1 (en) * 1999-03-25 2003-06-24 Yamaha Corporation Method and apparatus for compressing and generating waveform
US7231347B2 (en) * 1999-08-16 2007-06-12 Qnx Software Systems (Wavemakers), Inc. Acoustic signal enhancement system
US6910011B1 (en) * 1999-08-16 2005-06-21 Haman Becker Automotive Systems - Wavemakers, Inc. Noisy acoustic signal enhancement
US6925434B2 (en) * 2000-03-15 2005-08-02 Koninklijke Philips Electronics N.V. Audio coding
US6757654B1 (en) * 2000-05-11 2004-06-29 Telefonaktiebolaget Lm Ericsson Forward error correction in speech coding
US7146003B2 (en) * 2000-09-30 2006-12-05 Zarlink Semiconductor Inc. Noise level calculator for echo canceller
US7337107B2 (en) * 2000-10-02 2008-02-26 The Regents Of The University Of California Perceptual harmonic cepstral coefficients as the front-end for speech recognition
US7117145B1 (en) * 2000-10-19 2006-10-03 Lear Corporation Adaptive filter for speech enhancement in a noisy environment
US20020120445A1 (en) * 2000-11-03 2002-08-29 Renat Vafin Coding signals
US6925435B1 (en) * 2000-11-27 2005-08-02 Mindspeed Technologies, Inc. Method and apparatus for improved noise reduction in a speech encoder
US6993481B2 (en) * 2000-12-04 2006-01-31 Global Ip Sound Ab Detection of speech activity using feature model adaptation
US6576827B2 (en) * 2001-03-23 2003-06-10 Yamaha Corporation Music sound synthesis with waveform caching by prediction
US6426456B1 (en) * 2001-10-26 2002-07-30 Motorola, Inc. Method and apparatus for generating percussive sounds in embedded devices
US7353169B1 (en) * 2003-06-24 2008-04-01 Creative Technology Ltd. Transient detection and modification in audio signals

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2148321A1 (en) * 2007-04-13 2010-01-27 Kyoto University Sound source separation system, sound source separation method, and computer program for sound source separation
US20100131086A1 (en) * 2007-04-13 2010-05-27 Kyoto University Sound source separation system, sound source separation method, and computer program for sound source separation
US8239052B2 (en) 2007-04-13 2012-08-07 National Institute Of Advanced Industrial Science And Technology Sound source separation system, sound source separation method, and computer program for sound source separation
EP2148321A4 (en) * 2007-04-13 2014-06-11 Nat Inst Of Advanced Ind Scien Sound source separation system, sound source separation method, and computer program for sound source separation
US8831762B2 (en) 2009-02-17 2014-09-09 Kyoto University Music audio signal generating system
US8541676B1 (en) * 2010-03-06 2013-09-24 Alexander Waldman Method for extracting individual instrumental parts from an audio recording and optionally outputting sheet music
FR2980620A1 (en) * 2011-09-23 2013-03-29 France Telecom Method for processing decoded audio frequency signal, e.g. coded voice signal including music, involves performing spectral attenuation of residue, and combining residue and attenuated signal from spectrum of tonal components
RU2801156C2 (en) * 2013-04-05 2023-08-02 Dolby Laboratories Licensing Corporation Companding system and method for reducing quantization noise using improved spectral expansion
CN111382302A (en) * 2018-12-28 2020-07-07 中国科学院声学研究所 Audio sample retrieval method based on variable speed template
CN113496706A (en) * 2020-03-19 2021-10-12 北京字节跳动网络技术有限公司 Audio processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
JP2006005807A (en) 2006-01-05
JP4318119B2 (en) 2009-08-19

Similar Documents

Publication Publication Date Title
KR101101384B1 (en) Parameterized temporal feature analysis
US8805697B2 (en) Decomposition of music signals using basis functions with time-evolution information
US8831762B2 (en) Music audio signal generating system
US8036884B2 (en) Identification of the presence of speech in digital audio data
US9094078B2 (en) Method and apparatus for removing noise from input signal in noisy environment
JP4572218B2 (en) Music segment detection method, music segment detection device, music segment detection program, and recording medium
US8489404B2 (en) Method for detecting audio signal transient and time-scale modification based on same
EP1995723A1 (en) Neuroevolution training system
CN102668374A (en) Adaptive dynamic range enhancement of audio recordings
US10249315B2 (en) Method and apparatus for detecting correctness of pitch period
US7930173B2 (en) Signal processing method, signal processing apparatus and recording medium
KR101026632B1 (en) Method and apparatus for formant tracking using a residual model
US20240062738A1 (en) Methods and Apparatus for Harmonic Source Enhancement
JP4497911B2 (en) Signal detection apparatus and method, and program
US20050283361A1 (en) Audio signal processing method, audio signal processing apparatus, audio signal processing system and computer program product
US8532986B2 (en) Speech signal evaluation apparatus, storage medium storing speech signal evaluation program, and speech signal evaluation method
JP2021517267A (en) Methods and devices for extracting tone color attributes that do not depend on pitch from media signals
JPH0675562A (en) Automatic musical note picking-up device
KR20060029663A (en) Music summarization apparatus and method using multi-level vector quantization
US20070270987A1 (en) Signal processing method, signal processing apparatus and recording medium
JP6234134B2 (en) Speech synthesizer
US20060047506A1 (en) Greedy algorithm for identifying values for vocal tract resonance vectors
US20230419929A1 (en) Signal processing system, signal processing method, and program
US10629177B2 (en) Sound signal processing method and sound signal processing device
JP4843711B2 (en) Music type discrimination device, music type discrimination method, and music type discrimination program

Legal Events

Date Code Title Description
AS Assignment

Owner name: NATIONAL INSTITUTE OF ADVANCED INDUSTRIAL SCIENCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YOSHII, KAZUYOSHI;OKUNO, HIROSHI;GOTO, MASATAKA;REEL/FRAME:015968/0859;SIGNING DATES FROM 20050301 TO 20050303

Owner name: KYOTO UNIVERSITY, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YOSHII, KAZUYOSHI;OKUNO, HIROSHI;GOTO, MASATAKA;REEL/FRAME:015968/0859;SIGNING DATES FROM 20050301 TO 20050303

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION