WO2002005433A1 - A method, a device and a system for compressing a musical and voice signal - Google Patents

A method, a device and a system for compressing a musical and voice signal

Info

Publication number
WO2002005433A1
Authority
WO
WIPO (PCT)
Prior art keywords
musical
voice signal
signal
compressing
notes
Prior art date
Application number
PCT/SG2001/000144
Other languages
French (fr)
Inventor
Kai Kong Ng
Original Assignee
Cyberinc Pte Ltd
Priority date
Filing date
Publication date
Application filed by Cyberinc Pte Ltd filed Critical Cyberinc Pte Ltd
Priority to AU2001284619A priority Critical patent/AU2001284619A1/en
Publication of WO2002005433A1 publication Critical patent/WO2002005433A1/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00: Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/04: Speech or audio signal analysis-synthesis techniques for redundancy reduction using predictive techniques
    • G10L 19/16: Vocoder architecture
    • G10L 19/18: Vocoders using multiple modes
    • G10L 19/0018: Speech coding using phonetic or linguistical decoding of the source; reconstruction using text-to-speech synthesis

Abstract

A method for compressing a musical and voice signal, comprising a musical signal and a voice signal, comprises the following steps: determining musical notes parameters of the musical signal, compressing the voice signal independently from the musical signal, and storing the musical notes parameters together with the compressed voice signal, thereby generating a synthesized musical and voice signal.

Description

A method, a device and a system for compressing a musical and voice signal
BACKGROUND OF THE INVENTION
The invention relates to a method, a device and a system for compressing a musical and voice signal.
With today's advances in digital communication technology, the transmission of data across the Internet and mobile networks has made information available to the user almost immediately, even over decentralized communication networks such as the Internet.
This technology has also shaped the way people entertain themselves.
In the audio entertainment field, a user nowadays usually expects audio enjoyment on an on-demand basis. The Internet has served as a very useful highway to transport and distribute musical and voice signals to the user anywhere and anytime.
The Internet phenomenon is only in its infancy, yet it has already experienced enormous growth. Even so, the increasing number of users and new applications entering the Internet need a bandwidth that goes far beyond what communication networks currently provide.
Compression technology, which reduces the bandwidth required for the transmission of data in general and of musical and voice signals in particular, is therefore a topic of focus. For the user, data compression means a shorter download time and a need for less storage space, saving both money and time.
MP3 (Moving Picture Experts Group (MPEG) Audio Layer 3) has therefore been widely adopted by the industry as the de facto standard for the transmission of audio data across the Internet.
MP3 provides a compression ratio of about 10 times over uncompressed data for CD-quality audio (44.1 kHz * 16 bit per sample). Transmitting a three-minute uncompressed song as a musical and voice signal across the Internet over a 56 kbps modem would take (44.1 kHz * 16 bit * 2 channels * 3 min * 60 sec) / 56 kbps = 4536 sec = 75.6 min, which is more than an hour.
MP3 would reduce that to only 7.56 minutes, which is an amazing feat.
However, transmitting an album of 10 MP3 songs as musical and voice signals would again take more than an hour. Therefore, a compression ratio of more than 10 times would be desirable if Internet music is to become a reality.
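These figures are easy to verify; the short Python sketch below reproduces the arithmetic above (the 56 kbps modem rate and the roughly tenfold MP3 ratio are the values quoted in the text):

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2
SONG_SECONDS = 3 * 60
MODEM_BPS = 56_000

song_bits = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS * SONG_SECONDS
uncompressed_s = song_bits / MODEM_BPS          # 4536 s
mp3_s = uncompressed_s / 10                     # assuming the ~10x MP3 ratio

print(f"uncompressed: {uncompressed_s / 60:.1f} min")  # 75.6 min
print(f"MP3:          {mp3_s / 60:.2f} min")           # 7.56 min
```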
As shown in Fig. 1, professional music is usually recorded in a studio within a soundproof room.
The sound from the musical instruments, also referred to as musical signals 101, and the vocals, i.e. speech signals, also referred to as voice signals 102, are recorded on separate tracks.
If the data (comprising the analog musical signals 101 and the voice signals 102) is to be compressed using a digital method, the analog signals are first converted to digital form through an analog-to-digital conversion device, i.e. the analog musical signals 101 are converted into digitized musical signals 103 and the analog voice signals 102 are converted into digitized voice signals 104.
The separate signals are then mixed down (through a mixer) onto a master track (the audio signal), symbolized by block 105 in Fig. 1; this master track forms the compression source for most compression methods (including MP3).
MP3 audio compression belongs to a class of data compression schemes called perceptual coding.
This is based on the subband/transform coding technique. Perceptual coding analyses the frequency and amplitude content of the input signal and compares it to a model of human auditory perception.
Information that is audible is coded and everything that is inaudible can be discarded.
The advantage of subband/transform coding is that it works in the frequency domain. The uncorrelated nature of the spectral components makes it possible to quantize the spectral components in different frequency bands with a different number of bits, provided that the resulting quantization noise remains imperceptible.
This advantage is further exploited by MP3 using the "masking" phenomenon of the human auditory system. The MP3 encoder analyses the frequency and amplitude content of the input audio signal and compares it to a psychoacoustic model of the human auditory system.
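The following is a heavily simplified Python sketch of the subband bit-allocation idea described above; it is not the MP3 psychoacoustic model. The flat masking threshold and the 6 dB-per-bit rule are illustrative assumptions only:

```python
import numpy as np

def allocate_bits(frame, n_bands=32, max_bits=16):
    """Toy subband bit allocation: give each band quantizer bits in
    proportion to how far its energy exceeds a masking threshold."""
    spectrum = np.abs(np.fft.rfft(frame))
    bands = np.array_split(spectrum, n_bands)
    energy_db = np.array([20 * np.log10(b.mean() + 1e-12) for b in bands])
    threshold_db = -40.0  # placeholder flat threshold; real models are frequency-dependent
    headroom_db = np.clip(energy_db - threshold_db, 0.0, None)
    # roughly 6 dB of signal-to-mask ratio per quantizer bit
    return np.clip(np.round(headroom_db / 6.0), 0, max_bits).astype(int)

frame = np.sin(2 * np.pi * 440.0 * np.arange(1024) / 44_100)  # 440 Hz test tone
print(allocate_bits(frame))  # most bits go to the band containing the tone
```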
Alternative forms of audio compression include ADPCM (Adaptive Differential Pulse Code Modulation), wavelet compression, etc. After the audio signal is compressed (step 106), the compressed data are stored in a storage device (step 107), e.g. a hard disk, a CD-ROM or a semiconductor device like a Flash memory or a Read-Only Memory (ROM). The data could also be stored on a server computer, from which it would be transmitted over a transmission line (such as the Internet) to a user on demand and stored within the user's storage device in the user's client computer.
When the user wishes to listen to a piece of audio, the compressed audio data is decompressed (step 108) and output to a digital-to-analog device (step 109), with the analog signal driving a loudspeaker to produce music for listening pleasure.
SUMMARY OF THE INVENTION
An object of the invention is to compress a musical and voice signal with an improved compression ratio.
The object is achieved with a method, a device and a system for compressing a musical and voice signal with the features according to the independent claims.
In a method for compressing a musical and voice signal, which musical and voice signal comprises a musical signal and a voice signal, the sound from the musical instruments, also referred to as the musical signal, and the vocals, i.e. the speech signal, also referred to as the voice signal, are recorded on separate tracks.
The analog musical signal is then converted into a digitized musical signal and the analog voice signal is converted into a digitized voice signal. For the digitized musical signal, notes parameters of the musical signal are determined. In this context, notes parameters are e.g.
• the fundamental frequency of the notes of the musical signal, and/or
• the amplitude of the musical signal, and/or
• the type of instrument or instruments, which are involved in generating the musical signal.
The fundamental frequency in this context is the frequency with which the notes of the reconstructed signal will later be played.
For the digitized voice signal, a compressed digitized voice signal is generated, using e.g. a speech recognition algorithm or a Linear Predictive Coding (LPC) algorithm.
The determination of the notes parameters and the compression of the digitized voice signal are executed independently from each other.
In a last step, the musical notes parameters are stored together with the compressed voice signal in a memory, so that it is possible to restore and decompress the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal.
The invention provides a much higher compression rate than known compression algorithms. The compression rate is further improved when using a speech recognition algorithm, e.g. one based on Hidden Markov Models, for compressing the voice signal.
The stored musical notes parameters and compressed voice signal may be transmitted from a server computer over a communication network, e.g. via the Internet, to a client computer, where they are restored and decompressed, thereby generating the synthesized musical and voice signal, which is presented to a user of the client computer.
Alternatively, the compressed data may be stored in a storage device (step 107), e.g. a hard disk, a CD-ROM or a semiconductor device like a Flash memory or a ROM (Read-Only Memory), and restored and decompressed from that respective storage device.
When using the speech recognition algorithm for compressing the speech signal, the restoring of the compressed voice signal may comprise the step of text-to-phoneme conversion of the compressed voice signal into a speech synthesis signal, which is used for generating the synthesized musical and voice signal.
Furthermore, a device for compressing a musical and voice signal comprises a processing unit for executing the above-mentioned steps.
Thus, the device includes e.g.
• a musical notes determination unit for determining musical notes parameters of the musical signal,
• a voice signal compression unit for compressing the voice signal independently from the musical signal, and
• a memory for storing the musical notes parameters together with the compressed voice signal, so that it is possible to restore and decompress the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal, the memory being connected to the musical notes determination unit and the voice signal compression unit.
Furthermore, a system for compressing and decompressing a musical and voice signal comprises a processing unit for executing the above-mentioned steps. Thus, the system includes e.g.
• a musical notes determination unit for determining musical notes parameters of the musical signal,
• a voice signal compression unit for compressing the voice signal independently from the musical signal,
• a memory for storing the musical notes parameters together with the compressed voice signal, so that it is possible to restore and decompress the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal, the memory being connected to the musical notes determination unit and the voice signal compression unit, and
• a musical and voice signal synthesizing unit for restoring and decompressing the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal.
The invention may be implemented using a special electronic circuit, i.e. in hardware, or using computer programs, i.e. in software.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram showing an example of a method for compressing a musical and voice signal;
Figure 2 is a block diagram showing a model of human speech production;
Figure 3 is a block diagram showing an LPC voice coding unit, also referred to as a vocoder; and
Figure 4 is a block diagram showing a system and a method for compressing a musical and voice signal according to a preferred embodiment of the invention.
DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION
Preferred embodiments of the invention and modifications thereof will now be described with reference to the accompanying drawings.
According to the embodiments of the invention, an improved compression ratio is achieved by synthesizing an audio signal, i.e. a musical and voice signal, instead of modeling it.
In order to properly synthesize the audio signal, the complete model of the instrument and vocal cord is required.
Music synthesis has long been available, e.g. in the form of equipment (music synthesizers) that can synthesize musical instruments. Such a synthesizer is typically provided with a standard keyboard input and produces musical output from musical notes.
Such a synthesizer uses e.g. a wavetable method, recording all the notes from a musical instrument and storing them in a semiconductor storage (ROM). Given the instrument, the note and the velocity (the information about how hard and how fast the key of the keyboard is pressed), the particular musical note can be played.
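A toy illustration of wavetable playback follows, assuming the "ROM" is a lookup table of pre-recorded note waveforms and that velocity simply scales amplitude; the instrument data here is a synthetic stand-in:

```python
import numpy as np

SR = 44_100
rom = {  # stand-in for the semiconductor storage (ROM) of recorded notes
    "piano_A4": np.sin(2 * np.pi * 440.0 * np.arange(SR) / SR),
}

def play_note(instrument: str, note: str, velocity: int) -> np.ndarray:
    """Look up the stored waveform for (instrument, note) and scale it
    by the key velocity (0..127)."""
    return rom[f"{instrument}_{note}"] * (velocity / 127.0)

sample = play_note("piano", "A4", velocity=100)
```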
Although popular in the recording of audio signals, music synthesis has, according to the state of the art, never been used as a compression methodology. Furthermore, voice can be synthesized using a text-to-speech generation method.
Furthermore, it is known to extract vocal parameters to mimic a person's voice. Since humans are quite perceptive to a singer's voice, a compression method that models the general vocal cords is sufficient and will be described instead.
The vocal compression according to an embodiment of the invention uses a method called Linear Predictive Coding (LPC).
According to LPC, the way human speech is generated is modeled.
Speech is produced by the cooperation of the lungs, the glottis (with the vocal cords) and the articulation tract (mouth and nose cavity).
For the production of voiced sounds, the lungs press air through the epiglottis and the vocal cords vibrate; they interrupt the air stream and produce a quasi-periodic pressure wave.
In the case of unvoiced sounds, the excitation of the vocal tract is more noise-like.
A model 200 illustrates the human speech production, as shown in Fig.2.
The lungs are modeled by a DC source 201, the vocal cords by an impulse generator 202 and the articulation tract by a linear filter system 203. A noise generator 204 produces the unvoiced excitation. Speech sounds 205 consist of both voiced and unvoiced signals mixed together.
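A minimal Python sketch of this source-filter model (assuming NumPy/SciPy; the filter coefficients are arbitrary stand-ins, not measured vocal-tract values):

```python
import numpy as np
from scipy.signal import lfilter

SR = 8_000                       # 8 kHz telephone-band sampling rate
N = SR // 2                      # half a second of "speech"

# Voiced excitation: quasi-periodic impulse train (impulse generator 202)
pitch_hz = 120
voiced = np.zeros(N)
voiced[:: SR // pitch_hz] = 1.0

# Unvoiced excitation: white noise (noise generator 204)
unvoiced = 0.1 * np.random.randn(N)

# Articulation tract: an all-pole filter (linear filter system 203)
a = [1.0, -1.3, 0.8]             # stable, but not a real vocal-tract measurement
speech = lfilter([1.0], a, voiced + unvoiced)   # mixed speech sounds 205
```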
A great advantage of an LPC coder is its manipulation facilities and its close analogy to human speech production. By manipulating the parameters of the LPC vocoder, it is for example possible to transform a male voice into a female or a child's voice. An LPC vocoder can be used as the engine for text-to-speech synthesis, which will be described later in detail.
Fig.3 shows a block diagram of an LPC vocoder 300.
The first step is to perform an LPC and speech analysis on the digital voice data, i.e. an LPC analysis (block 301) and a pitch analysis (block 302) .
Both the determined LPC coefficients 303 and the determined pitch values 304 are then stored in the parameter memory (block 305). These parameters are then used to control the synthesis part of the LPC vocoder 300.
In other words, the stored parameters are fed into a pitch generator 306, which generates reconstructed pitch values 307, and into a digital filter 308. Furthermore, noise signals 310 are generated by a noise generator 309. The reconstructed pitch values 307 and the noise signals 310 are amplified (block 311) and the amplified signals 312 are fed into the digital filter 308, thereby generating a reconstructed voice signal 313.
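The analysis/synthesis split of Fig. 3 can be sketched as follows, using the autocorrelation method with the Levinson-Durbin recursion; this is a simplified illustration that omits the pitch detection, voiced/unvoiced decision and quantization of a real vocoder:

```python
import numpy as np
from scipy.signal import lfilter

def lpc(frame, order=10):
    """LPC coefficients by the autocorrelation method (Levinson-Durbin)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][: order + 1]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

frame = np.random.randn(240)            # stand-in for one 30 ms voice frame at 8 kHz
a = lpc(frame)
residual = lfilter(a, [1.0], frame)     # analysis filter A(z): blocks 301/302
resynth = lfilter([1.0], a, residual)   # synthesis filter 1/A(z): digital filter 308
assert np.allclose(resynth, frame)      # analysis followed by synthesis round-trips
```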
In this context, it should be mentioned that LPC compression can only be used for human speech, i.e. for compressing a voice signal; it is not suitable for compressing a musical signal. The compression ratio achieved by LPC is, however, much higher than that of any audio compression scheme (MP3, ADPCM or wavelet) so far.
Fig.4 shows a system for compressing and decompressing a musical and voice signal according to a preferred embodiment of the invention. The system 400 comprises a server computer 401 and a plurality of client computers 402, one of them being shown in Fig.4.
The respective steps which are executed during the method are symbolized as blocks in the server computer 401 and the client computer 402, respectively.
The server computer 401 and the client computer 402 are connected to each other via the Internet 403 as a communication network.
As shown in Fig.4, an analog musical signal 404 and an analog voice signal 405 are recorded on separate tracks using a microphone (not shown) .
The analog musical signal 404 and the analog voice signal 405 are converted into a digital musical signal 406 and a digital voice signal 407 using an analog-to-digital conversion device.
The digital signal from the musical instrument, i.e. the digital musical signal 406, is fed into a frequency analyzer, which determines the fundamental frequency of the notes played. The amplitude and the type of instrument are also recorded.
In order to determine these parameters, the digital musical signal 406 is transformed from the time domain to the frequency domain. The fundamental frequency is selected and its amplitude is noted, i.e. stored. The fundamental frequency is the frequency with which the notes will be played. The frequency and the amplitude are recorded as described in the General MIDI standard: the frequency is stored as the note, and the amplitude is stored as the velocity. According to this embodiment, the determined values are normalized to fit in the required predetermined range.
Together, the fundamental frequency of the notes played, the amplitude and the type of instrument form the musical notes parameters (block 408) .
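A toy version of this analysis step (block 408) might look as follows; the single-peak picking and the 0..127 normalization are simplifying assumptions, since the text does not prescribe a particular algorithm:

```python
import numpy as np

def note_parameters(frame, sr=44_100):
    """Toy version of block 408: take the strongest spectral peak as the
    fundamental, store it as a MIDI note number and its level as velocity."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    peak = spectrum[1:].argmax() + 1                 # skip the DC bin
    f0, amp = freqs[peak], spectrum[peak]
    midi_note = int(round(69 + 12 * np.log2(f0 / 440.0)))
    # normalize: a full-scale sine under a Hann window peaks near len(frame)/4
    velocity = int(np.clip(4.0 * amp / len(frame) * 127, 0, 127))
    return midi_note, velocity

frame = np.sin(2 * np.pi * 440.0 * np.arange(4096) / 44_100)
print(note_parameters(frame))   # an A4 test tone maps to MIDI note 69
```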
The digital voice signal 407 is fed into an LPC vocoder 409. The LPC vocoder 409 determines the LPC coefficients as described above, thereby generating a compressed voice signal 411.
A speech recognition algorithm can alternatively be used in place of LPC; in that case, Hidden Markov Models may be used.
The musical notes parameters 410 and the compressed voice signal 411 are multiplexed and stored in a storage device of the server computer 401 (block 412), or alternatively on any other storage medium such as a CD-ROM.
The term "multiplexed" is to be understood in the sense that a rather small portion of the musical notes parameters 410 and a rather small portion of the compressed voice signal 411 are loaded into a small memory space sufficient to store those two portions, each of which forms a sub-portion of the whole musical notes parameters 410 and of the whole compressed voice signal 411, respectively.
With this optional feature, it is possible to reduce the required memory space in the client computer, which is especially advantageous if the client computer is a cheap and rather low-end device such as a mobile phone or a PDA having an audio player, with which it is possible to reconstruct and play the reconstructed audio signal. Another advantage of the storing of a small portion of the musical notes parameters 410 and a small portion of the compressed voice signal 411 together is that in this case it is not necessary to transmit the entire musical notes parameters 410 and the entire compressed voice signal 411 before beginning to reconstruct and play the audio signal, i.e. the song. This is particularly advantageous when using a rather slow communication network such as the Internet 403 using a slow telephone modem line between the server computer 401 and the client computer 402.
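A minimal sketch of such chunk-wise multiplexing follows; the byte-level framing (a stream id plus a length prefix) is an assumption, since the text does not specify a format:

```python
import struct

def multiplex(notes_params: bytes, voice_data: bytes, chunk: int = 512) -> bytes:
    """Interleave small portions of the two streams (block 412); each portion is
    prefixed with a one-byte stream id and a two-byte length so the client can
    demultiplex and start playback before the whole song has arrived."""
    out = bytearray()
    for i in range(0, max(len(notes_params), len(voice_data)), chunk):
        for stream_id, data in ((0, notes_params), (1, voice_data)):
            part = data[i:i + chunk]
            out += struct.pack(">BH", stream_id, len(part)) + part
    return bytes(out)

stream = multiplex(b"notes parameters 410", b"compressed voice signal 411")
```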
The data 413 is transmitted across the Internet 403 on an on- demand basis.
The received data 413 is then stored within the client computer 402 (block 414) .
When the user of the client computer 402 wishes to listen to a piece of music, the compressed data 413 is extracted and decompressed e.g. in real-time.
In other words, the stored musical notes parameters 410 are extracted (block 415) and a decompressed digital musical signal 416 is generated using the wavetable method of a typical keyboard synthesizer.
Furthermore, the stored compressed voice signal 411 is decompressed (block 417) and a decompressed digital voice signal 418 is generated.
When using the LPC, the decompressed digital voice signal 418 is generated in the way described with reference to the LPC vocoder of Fig.3.
In general, text-to-speech conversion is used for the synthesis of the digital voice signal 418. This means that a stored dictionary of text and corresponding phonemes is used, each phoneme having a corresponding voice. The intonation and stress of the voice are adjusted based on the particular context of the reconstructed digital voice signal 418. The intonation and the stress may be provided by the melody of the digital musical signal, using the note pitch and its amplitude.
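A hypothetical illustration of deriving prosody from the melody: each word is expanded via a phoneme dictionary, and each phoneme inherits its pitch and stress from the concurrent melody note (the dictionary entry and the mapping are illustrative assumptions):

```python
PHONEME_DICT = {"hello": ["HH", "AH", "L", "OW"]}   # stand-in dictionary entry

def phonemes_with_prosody(word, midi_note, velocity):
    """Attach pitch (from the melody note) and stress (from its velocity)
    to each phoneme of the word."""
    f0 = 440.0 * 2 ** ((midi_note - 69) / 12)       # MIDI note number -> Hz
    return [(ph, f0, velocity / 127.0) for ph in PHONEME_DICT[word]]

print(phonemes_with_prosody("hello", midi_note=69, velocity=100))
```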
Using the speech recognition algorithm and the corresponding text-to-speech conversion will usually not reproduce the sound of the original singer. However, the speech recognition algorithm provides the higher compression ratio.
The "raw" musical signals and voice signals are combined either by digital or by analog means.
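For digital combination, the mixing reduces to summing the two sample streams; a minimal sketch, assuming float PCM in the range [-1, 1]:

```python
import numpy as np

def mix_digital(music: np.ndarray, voice: np.ndarray) -> np.ndarray:
    """Digital counterpart of the analog mixer 423: sum the two reconstructed
    signals sample by sample and clip to full scale."""
    n = max(len(music), len(voice))
    out = np.zeros(n)
    out[: len(music)] += music
    out[: len(voice)] += voice
    return np.clip(out, -1.0, 1.0)
```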
For analog combination, a digital-to-analog conversion process converts the digital signals to analog signals. In other words, the decompressed digital musical signal 416 is converted into a decompressed analog musical signal 420 (block 419). The reconstructed digital voice signal 418 is converted into a reconstructed analog voice signal 422 (block 421).
The analog musical signal 420 and the analog voice signal 422 are then combined through a summing operational amplifier, i.e. a mixer 423, thereby generating a reconstructed analog musical and voice signal 424.
The analog musical and voice signal 424 is output to a power amplifier 425, and the amplified analog musical and voice signal 426 thereby generated is used to drive a speaker in order to produce the audio signal 427 output to the user of the client computer 402.

Claims

What is claimed is:
1. A method for compressing a musical and voice signal, comprising a musical signal and a voice signal, which method comprises the following steps:
• determining musical notes parameters of the musical signal,
• compressing the voice signal independently from the musical signal, and
• storing the musical notes parameters together with the compressed voice signal, so that it is possible to restore and decompress the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal.
2. The method according to claim 1, wherein
• the musical signal is a digitized musical signal, and
• the voice signal is a digitized voice signal.
3. The method according to claim 1 or 2, wherein
• the fundamental frequency of the notes of the musical signal, and/or
• the amplitude of the musical signal, and/or
• the type of instrument or instruments, which are involved in generating the musical signal,
are determined as the musical notes parameters.
4. The method according to any one of the claims 1 to 3, wherein the compressing of the digital voice signal is performed using a linear prediction algorithm.
5. The method according to any one of the claims 1 to 3, wherein the compressing of the digital voice signal is performed using a speech recognition algorithm.
6. The method according to claim 5, wherein the speech recognition algorithm is an algorithm using Hidden Markov Models.
7. The method according to any one of the preceding claims, wherein the stored musical notes parameters and compressed voice signal are restored and decompressed, thereby generating the synthesized musical and voice signal.
8. The method according to claims 5 and 7, wherein the restoring of the compressed voice signal comprises the step of text-to-phoneme conversion of the compressed voice signal into a speech synthesis signal, which is used for generating the synthesized musical and voice signal.
9. A device for compressing a musical and voice signal, comprising a musical signal and a voice signal, the device comprising:
• a musical notes determination unit for determining musical notes parameters of the musical signal,
• a voice signal compression unit for compressing the voice signal independently from the musical signal, and
• a memory for storing the musical notes parameters together with the compressed voice signal, so that it is possible to restore and decompress the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal, the memory being connected to the musical notes determination unit and the voice signal compression unit.
10. A system for compressing and decompressing a musical and voice signal, comprising a musical signal and a voice signal, the system comprising:
• a musical notes determination unit for determining musical notes parameters of the musical signal,
• a voice signal compression unit for compressing the voice signal independently from the musical signal,
• a memory for storing the musical notes parameters together with the compressed voice signal, so that it is possible to restore and decompress the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal, the memory being connected to the musical notes determination unit and the voice signal compression unit, and
• a musical and voice signal synthesizing unit for restoring and decompressing the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal.
11. A computer readable medium, having a program recorded thereon, where the program makes the computer execute a procedure comprising the following steps for compressing a musical and voice signal, comprising a musical signal and a voice signal:
• determining musical notes parameters of the musical signal,
• compressing the voice signal independently from the musical signal, and
• storing the musical notes parameters together with the compressed voice signal, so that it is possible to restore and decompress the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal.
12. A computer program element which makes the computer execute a procedure comprising the following steps for compressing a musical and voice signal, comprising a musical signal and a voice signal:
• determining musical notes parameters of the musical signal,
• compressing the voice signal independently from the musical signal, and
• storing the musical notes parameters together with the compressed voice signal, so that it is possible to restore and decompress the musical notes parameters and the compressed voice signal, thereby generating a synthesized musical and voice signal.
PCT/SG2001/000144 2000-07-10 2001-07-10 A method, a device and a system for compressing a musical and voice signal WO2002005433A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001284619A AU2001284619A1 (en) 2000-07-10 2001-07-10 A method, a device and a system for compressing a musical and voice signal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG200003823A SG98418A1 (en) 2000-07-10 2000-07-10 A method, a device and a system for compressing a musical and voice signal
SG0003823-2 2000-07-10

Publications (1)

Publication Number Publication Date
WO2002005433A1 true WO2002005433A1 (en) 2002-01-17

Family

ID=20430621

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2001/000144 WO2002005433A1 (en) 2000-07-10 2001-07-10 A method, a device and a system for compressing a musical and voice signal

Country Status (3)

Country Link
AU (1) AU2001284619A1 (en)
SG (1) SG98418A1 (en)
WO (1) WO2002005433A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2372417A (en) * 2000-10-30 2002-08-21 Nec Corp Method and system for delivering music
WO2009110738A3 (en) * 2008-03-03 2009-10-29 엘지전자(주) Method and apparatus for processing audio signal
WO2009110751A3 (en) * 2008-03-04 2009-10-29 Lg Electronics Inc. Method and apparatus for processing an audio signal

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4510840A (en) * 1982-12-30 1985-04-16 Victor Company Of Japan, Limited Musical note display device
EP0480760A2 (en) * 1990-10-12 1992-04-15 Pioneer Electronic Corporation Apparatus for reproducing musical accompaniment information
EP0501483B1 (en) * 1991-02-27 1996-05-15 Ricos Co., Ltd. Backing chorus mixing device and karaoke system incorporating said device
US5518408A (en) * 1993-04-06 1996-05-21 Yamaha Corporation Karaoke apparatus sounding instrumental accompaniment and back chorus
US5541359A (en) * 1993-02-26 1996-07-30 Samsung Electronics Co., Ltd. Audio signal record format applicable to memory chips and the reproducing method and apparatus therefor
US5705762A (en) * 1994-12-08 1998-01-06 Samsung Electronics Co., Ltd. Data format and apparatus for song accompaniment which allows a user to select a section of a song for playback
US5890125A (en) * 1997-07-16 1999-03-30 Dolby Laboratories Licensing Corporation Method and apparatus for encoding and decoding multiple audio channels at low bit rates using adaptive selection of encoding method
US5915237A (en) * 1996-12-13 1999-06-22 Intel Corporation Representing speech using MIDI
US5974387A (en) * 1996-06-19 1999-10-26 Yamaha Corporation Audio recompression from higher rates for karaoke, video games, and other applications
US6077084A (en) * 1997-04-01 2000-06-20 Daiichi Kosho, Co., Ltd. Karaoke system and contents storage medium therefor
US6081780A (en) * 1998-04-28 2000-06-27 International Business Machines Corporation TTS and prosody based authoring system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US529A (en) * 1837-12-20 Secret safety-lock

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4510840A (en) * 1982-12-30 1985-04-16 Victor Company Of Japan, Limited Musical note display device
EP0480760A2 (en) * 1990-10-12 1992-04-15 Pioneer Electronic Corporation Apparatus for reproducing musical accompaniment information
EP0501483B1 (en) * 1991-02-27 1996-05-15 Ricos Co., Ltd. Backing chorus mixing device and karaoke system incorporating said device
US5541359A (en) * 1993-02-26 1996-07-30 Samsung Electronics Co., Ltd. Audio signal record format applicable to memory chips and the reproducing method and apparatus therefor
US5518408A (en) * 1993-04-06 1996-05-21 Yamaha Corporation Karaoke apparatus sounding instrumental accompaniment and back chorus
US5705762A (en) * 1994-12-08 1998-01-06 Samsung Electronics Co., Ltd. Data format and apparatus for song accompaniment which allows a user to select a section of a song for playback
US5974387A (en) * 1996-06-19 1999-10-26 Yamaha Corporation Audio recompression from higher rates for karaoke, video games, and other applications
US5915237A (en) * 1996-12-13 1999-06-22 Intel Corporation Representing speech using MIDI
US6077084A (en) * 1997-04-01 2000-06-20 Daiichi Kosho, Co., Ltd. Karaoke system and contents storage medium therefor
US5890125A (en) * 1997-07-16 1999-03-30 Dolby Laboratories Licensing Corporation Method and apparatus for encoding and decoding multiple audio channels at low bit rates using adaptive selection of encoding method
US6081780A (en) * 1998-04-28 2000-06-27 International Business Machines Corporation TTS and prosody based authoring system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2372417A (en) * 2000-10-30 2002-08-21 Nec Corp Method and system for delivering music
GB2372417B (en) * 2000-10-30 2003-05-14 Nec Corp Method and system for delivering music
US6815601B2 (en) 2000-10-30 2004-11-09 Nec Corporation Method and system for delivering music
WO2009110738A3 (en) * 2008-03-03 2009-10-29 엘지전자(주) Method and apparatus for processing audio signal
US7991621B2 (en) 2008-03-03 2011-08-02 Lg Electronics Inc. Method and an apparatus for processing a signal
AU2009220321B2 (en) * 2008-03-03 2011-09-22 Intellectual Discovery Co., Ltd. Method and apparatus for processing audio signal
WO2009110751A3 (en) * 2008-03-04 2009-10-29 Lg Electronics Inc. Method and apparatus for processing an audio signal
AU2009220341B2 (en) * 2008-03-04 2011-09-22 Lg Electronics Inc. Method and apparatus for processing an audio signal
US8135585B2 (en) 2008-03-04 2012-03-13 Lg Electronics Inc. Method and an apparatus for processing a signal

Also Published As

Publication number Publication date
AU2001284619A1 (en) 2002-01-21
SG98418A1 (en) 2003-09-19

Similar Documents

Publication Publication Date Title
JP4132109B2 (en) Speech signal reproduction method and device, speech decoding method and device, and speech synthesis method and device
WO2003010752A1 (en) Speech bandwidth extension apparatus and speech bandwidth extension method
JPH10260692A (en) Method and system for recognition synthesis encoding and decoding of speech
CN101578659A (en) Voice tone converting device and voice tone converting method
JP2971796B2 (en) Low bit rate audio encoder and decoder
JPH07271396A (en) Voice encoding method and voice sound source device
US5828993A (en) Apparatus and method of coding and decoding vocal sound data based on phoneme
JP2002341896A (en) Digital audio compression circuit and expansion circuit
JP4420562B2 (en) System and method for improving the quality of encoded speech in which background noise coexists
JP2003108197A (en) Audio signal decoding device and audio signal encoding device
US7136811B2 (en) Low bandwidth speech communication using default and personal phoneme tables
US7356373B2 (en) Method and device for enhancing ring tones in mobile terminals
CN107547984A (en) A kind of audio-frequency inputting method and audio output system based on intelligent terminal
WO2002005433A1 (en) A method, a device and a system for compressing a musical and voice signal
CN100461262C (en) Terminal device, guide voice reproducing method and storage medium
CN100538820C (en) A kind of method and device that voice data is handled
JPH11338500A (en) Formant shift compensating sound synthesizer, and operation thereof
US6477496B1 (en) Signal synthesis by decoding subband scale factors from one audio signal and subband samples from different one
JP2002132271A (en) Music delivery system and music delivery method
JP2003099094A (en) Voice processing device
JP4826580B2 (en) Audio signal reproduction method and apparatus
JP2010160289A (en) Midi (r) karaoke system which automatically corrects interval
JP2658068B2 (en) Voice processor
KR20010008954A (en) Encoder and decoder for music file
JP2001034299A (en) Sound synthesis device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP