CN101366080B - Method and system for updating the state of a decoder - Google Patents

Method and system for updating the state of a decoder

Info

Publication number
CN101366080B
CN101366080B (application CN2007800020499A)
Authority
CN
China
Prior art keywords
signal
frame
audio signal
extrapolation
time lag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2007800020499A
Other languages
Chinese (zh)
Other versions
CN101366080A (en)
Inventor
Robert W. Zopf
Jes Thyssen
Juin-Hwey Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Avago Technologies International Sales Pte Ltd
Original Assignee
Zyray Wireless Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zyray Wireless Inc
Priority claimed from PCT/US2007/076009 (WO2008022200A2)
Publication of CN101366080A
Application granted
Publication of CN101366080B

Abstract

A technique is described for updating the state of a decoder. In accordance with the technique, an output audio signal associated with a lost frame in a series of frames is synthesized, and the decoder state is aligned with the synthesized output audio signal at a frame boundary. An extrapolated signal is generated from the synthesized output audio signal. A time lag is calculated between the extrapolated signal and a decoded audio signal associated with a first received frame after the lost frame in the series of frames, wherein the time lag represents a phase difference between the extrapolated signal and the decoded audio signal. The decoder state is then reset based on the time lag.

Description

Method and system for updating the state of a decoder
Technical field
The present invention relates to systems and methods for concealing the quality-degrading effects of packet loss in a speech or audio coder.
Background Art
In digital transmission of voice or audio signals over packet networks, the coded voice/audio signal is typically divided into frames and then packaged into packets, where each packet may contain one or more frames of coded voice/audio data. The packets are then transmitted over the packet network. Sometimes certain packets are lost, and sometimes certain packets arrive too late to be useful and are therefore deemed lost. Such packet loss causes significant degradation of audio quality unless special techniques are used to conceal its effects.
Packet loss concealment (PLC) methods based on extrapolation of the full-band audio signal exist for block-independent coders and full-band predictive coders. Such PLC methods include the techniques disclosed in U.S. Patent Application No. 11/234,291 to Chen, entitled "Packet Loss Concealment for Block-Independent Speech Codecs," and U.S. Patent Application No. 10/183,608 to Chen, entitled "Method and System for Frame Erasure Concealment for Predictive Speech Coding Based on Extrapolation of Speech Waveform." However, the techniques described in these applications cannot be applied directly to a sub-band predictive coder, such as the ITU-T Recommendation G.722 wideband speech coder, because there are sub-band-specific structural issues that these techniques do not address. Furthermore, for each sub-band, the G.722 coder uses an adaptive differential pulse code modulation (ADPCM) predictive coder that employs sample-by-sample backward adaptation of the quantizer step size and predictor coefficients based on a gradient method, which poses particular challenges that existing PLC techniques do not address. What is needed, therefore, is a suitable PLC method designed specifically for sub-band predictive coders such as G.722.
Summary of the invention
The present invention conceals the quality-degrading effects of packet loss in a sub-band predictive coder. In particular, the present invention addresses certain sub-band-specific structural issues that arise when audio waveform extrapolation techniques are applied to a sub-band predictive coder, and it further addresses PLC challenges specific to backward-adaptive ADPCM coders in general and to the G.722 sub-band ADPCM coder in particular.
Specifically, described herein is a method for updating the state of a decoder that decodes a series of frames representing an encoded audio signal. In accordance with the method, an output audio signal associated with a lost frame in the series of frames is synthesized. The decoder state is set so that it is aligned with the synthesized output audio signal at a frame boundary. An extrapolated signal is generated based on the synthesized output audio signal. A time lag is calculated between the extrapolated signal and a decoded audio signal associated with the first received frame after the lost frame in the series of frames, wherein the time lag represents a phase difference between the extrapolated signal and the decoded audio signal. The decoder state is then reset based on the time lag.
A system is also described herein. The system includes a decoder, an audio signal synthesizer, and decoder state update logic. The decoder is configured to decode received frames in a series of frames representing an encoded audio signal. The audio signal synthesizer is configured to synthesize an output audio signal associated with a lost frame in the series of frames. The decoder state update logic is configured to set the decoder state so that it is aligned with the synthesized output audio signal at a frame boundary after generation of the synthesized output audio signal, to generate an extrapolated signal based on the synthesized output audio signal, to calculate a time lag between the extrapolated signal and a decoded audio signal associated with the first received frame after the lost frame in the series of frames, and to reset the decoder state based on the time lag. The time lag represents a phase difference between the extrapolated signal and the decoded audio signal.
A computer program product is also described. The computer program product includes a computer-readable medium having computer program logic recorded thereon for causing a processor to update the state of a decoder that decodes a series of frames representing an encoded audio signal. The computer program logic includes first, second, third, fourth and fifth modules. The first module causes the processor to synthesize an output audio signal associated with a lost frame in the series of frames. The second module causes the processor to set the decoder state so that it is aligned with the synthesized output audio signal at a frame boundary after generation of the synthesized output audio signal. The third module causes the processor to generate an extrapolated signal based on the synthesized output audio signal. The fourth module causes the processor to calculate a time lag between the extrapolated signal and a decoded audio signal associated with the first received frame after the lost frame in the series of frames. The fifth module causes the processor to reset the decoder state based on the time lag, wherein the time lag represents a phase difference between the extrapolated signal and the decoded audio signal.
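In outline, the claimed update procedure can be sketched as follows. All helper names here are hypothetical, and the toy synthesis and extrapolation steps merely stand in for the full-band waveform extrapolation described later; the point of the sketch is the shape of step 4, in which the time lag is estimated as the correlation-maximizing shift between the extrapolated signal and the decoded signal.

```python
import math

def synthesize_lost_frame(history, frame_len):
    # Step 1 (toy stand-in): reuse the tail of the decoded history as
    # the synthesized output for the lost frame.
    return history[-frame_len:]

def extrapolate(signal, n):
    # Step 3 (toy stand-in): periodic extension of the synthesized signal.
    return [signal[i % len(signal)] for i in range(n)]

def best_lag(extrap, decoded, max_lag):
    # Step 4: the time lag is the shift of the decoded signal that
    # maximizes its cross-correlation with the extrapolated signal,
    # i.e. an estimate of the phase difference between the two.
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, xi in enumerate(extrap):
            j = i + lag
            if 0 <= j < len(decoded):
                score += xi * decoded[j]
        if score > best_score:
            best, best_score = lag, score
    return best
```

Step 5 (resetting the decoder state based on the lag) is codec-specific and is therefore not sketched here.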
Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the invention, are described in detail below with reference to the accompanying drawings. It is noted that the invention is not limited to the specific embodiments described herein. Such embodiments are presented for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art based on the teachings contained herein.
Description of the Drawings
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate one or more embodiments of the present invention and, together with the description, further serve to explain the purpose, advantages and principles of the invention and to enable persons skilled in the relevant art to make and use the invention.
Fig. 1 is a schematic diagram of the encoder structure of a conventional ITU-T G.722 sub-band predictive coder;
Fig. 2 is a schematic diagram of the decoder structure of a conventional ITU-T G.722 sub-band predictive coder;
Fig. 3 is a block diagram of a decoder/PLC system in accordance with an embodiment of the present invention;
Fig. 4 is a flowchart of a method for processing frames in a decoder/PLC system to produce an output speech signal in accordance with an embodiment of the present invention;
Fig. 5 is a timeline showing the different types of frames that may be processed by a decoder/PLC system in accordance with an embodiment of the present invention;
Fig. 6 is a timeline diagram showing the amplitudes of an original speech signal and an extrapolated speech signal;
Fig. 7 is a flowchart of a method for calculating a time lag between a decoded speech signal and an extrapolated speech signal in accordance with an embodiment of the present invention;
Fig. 8 is a flowchart of a two-stage method for calculating a time lag between a decoded speech signal and an extrapolated speech signal in accordance with an embodiment of the present invention;
Fig. 9 is a diagram showing the manner in which an extrapolated speech signal may be shifted with respect to a decoded speech signal during a time lag calculation in accordance with an embodiment of the present invention;
Fig. 10A is a timeline diagram showing a decoded speech signal that leads an extrapolated speech signal, and the associated effect on a re-encoding operation, in accordance with an embodiment of the present invention;
Fig. 10B is a timeline diagram showing a decoded speech signal that lags an extrapolated speech signal, and the associated effect on a re-encoding operation, in accordance with an embodiment of the present invention;
Fig. 10C is a timeline diagram showing an extrapolated speech signal and a decoded speech signal that are in phase at a frame boundary, and the associated effect on a re-encoding operation, in accordance with an embodiment of the present invention;
Fig. 11 is a flowchart of a method for performing re-phasing of the internal states of sub-band ADPCM decoders after a packet loss in accordance with an embodiment of the present invention;
Fig. 12A is a diagram showing the application of time-warping to a decoded speech signal that leads an extrapolated speech signal in accordance with an embodiment of the present invention;
Figs. 12B and 12C are diagrams showing the application of time-warping to a decoded speech signal that lags an extrapolated speech signal in accordance with an embodiment of the present invention;
Fig. 13 is a flowchart of a method for performing time-warping to shrink a signal along the time axis in accordance with an embodiment of the present invention;
Fig. 14 is a flowchart of a method for performing time-warping to stretch a signal along the time axis in accordance with an embodiment of the present invention;
Fig. 15 is a block diagram of logic used in a decoder/PLC system to process received frames beyond a predetermined number of received frames after a packet loss in accordance with an embodiment of the present invention;
Fig. 16 is a block diagram of logic used in a decoder/PLC system to perform waveform extrapolation to generate an output speech signal associated with a lost frame in accordance with an embodiment of the present invention;
Fig. 17 is a block diagram of logic used to update the sub-band ADPCM decoder states of a decoder/PLC system in accordance with an embodiment of the present invention;
Fig. 18 is a block diagram of logic used in a decoder/PLC system to perform re-phasing and time-warping in accordance with an embodiment of the present invention;
Fig. 19 is a block diagram of logic used in a decoder/PLC system to perform constrained and controlled decoding of good frames received after a packet loss in accordance with an embodiment of the present invention;
Fig. 20 is a block diagram of a simplified low-band ADPCM encoder used to update the internal state of a low-band ADPCM decoder during packet loss in accordance with an embodiment of the present invention;
Fig. 21 is a block diagram of a simplified high-band ADPCM encoder used to update the internal state of a high-band ADPCM decoder during packet loss in accordance with an embodiment of the present invention;
Figs. 22A, 22B and 22C are timeline diagrams showing the application of time-warping to a decoded speech signal in accordance with an embodiment of the present invention;
Fig. 23 is a block diagram of an alternative decoder/PLC system in accordance with an embodiment of the present invention;
Fig. 24 is a block diagram of a computer system in which an embodiment of the present invention may be implemented.
The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings. In the drawings, the leftmost digit(s) of a reference numeral identify the drawing in which the corresponding component first appears.
Detailed Description
A. Introduction
Exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings. Other embodiments are possible, and modifications may be made to the exemplary embodiments within the spirit and scope of the invention. Therefore, the following detailed description is not meant to limit the invention. Rather, the scope of the invention is defined by the appended claims.
It will be apparent to persons skilled in the relevant art that, as described below, the present invention may be implemented in hardware, software, a combination of hardware and software, and/or the entities illustrated in the figures. Any actual software code with specialized control hardware used to implement the present invention is not limiting of the present invention. Thus, the operational behavior of the present invention is described with the understanding that modifications and variations of the embodiments are possible given the level of detail presented herein.
It should be understood that, although the detailed description of the invention set forth herein refers to the processing of speech signals, the invention may also be used in connection with the processing of other types of audio signals. Therefore, the terms "speech" and "speech signal" are used herein purely for convenience of description and are not limiting. Persons skilled in the relevant art will appreciate that such terms can be replaced with the more general terms "audio" and "audio signal." Furthermore, although speech and audio signals are described herein as being divided into frames, persons skilled in the relevant art will appreciate that such signals may instead be divided into other discrete segments, including but not limited to sub-frames. Thus, the operations described herein as being performed on frames also encompass like operations performed on other segments of a speech or audio signal, such as sub-frames.
Additionally, although the following description discusses the loss of frames of an audio signal transmitted over a packet network (referred to as packet loss), the present invention is not limited to packet loss concealment (PLC). For example, in wireless networks, frames of an audio signal may also be lost or erased due to channel impairments. This condition is termed "frame erasure." When this condition occurs, to avoid substantial degradation of the output speech quality, the decoder in the wireless system needs to perform "frame erasure concealment" (FEC) in an attempt to conceal the quality degradation caused by the lost frames. For a PLC or FEC algorithm, packet loss and frame erasure pose the same problem: certain transmitted frames are not available for decoding, so the PLC or FEC algorithm needs to generate a waveform to fill up the waveform gap corresponding to the lost frames, thereby concealing the quality degradation those losses would otherwise cause. Because the terms FEC and PLC generally refer to the same kind of technique, they can be used interchangeably. Thus, for the sake of convenience, the term "packet loss concealment" or PLC is used herein to refer to both.
B. Review of Sub-Band Predictive Coding
To facilitate a better understanding of the various embodiments of the present invention described in later sections, the basic principles of sub-band predictive coding are reviewed here. Generally speaking, a sub-band predictive coder splits an input speech signal into N sub-bands, where N ≥ 2. Without loss of generality, the two-band predictive coding system of the ITU-T G.722 coder is described here as an example. Persons skilled in the relevant art can readily generalize this description to other N-band sub-band predictive coders.
Fig. 1 shows a simplified encoder structure 100 of the G.722 sub-band predictive coder. Encoder structure 100 includes a quadrature mirror filter (QMF) analysis filter bank 110, a low-band adaptive differential pulse code modulation (ADPCM) encoder 120, a high-band ADPCM encoder 130, and a bit-stream multiplexer 140. The QMF analysis filter bank 110 splits the input speech signal into a low-band speech signal and a high-band speech signal. The low-band ADPCM encoder 120 encodes the low-band speech signal into a low-band bit stream. The high-band ADPCM encoder 130 encodes the high-band speech signal into a high-band bit stream. The bit-stream multiplexer 140 multiplexes the low-band bit stream and the high-band bit stream into a single output bit stream. In the packet transmission applications discussed herein, this output bit stream is packaged into packets, which are then transmitted to a sub-band predictive decoder 200, shown in Fig. 2.
As shown in Fig. 2, the decoder 200 includes a bit-stream de-multiplexer 210, a low-band ADPCM decoder 220, a high-band ADPCM decoder 230, and a QMF synthesis filter bank 240. The bit-stream de-multiplexer 210 separates the input bit stream into the low-band bit stream and the high-band bit stream. The low-band ADPCM decoder 220 decodes the low-band bit stream into a decoded low-band speech signal. The high-band ADPCM decoder 230 decodes the high-band bit stream into a decoded high-band speech signal. The QMF synthesis filter bank 240 then combines the decoded low-band speech signal and the decoded high-band speech signal into a full-band output speech signal.
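To make the two-band structure concrete, here is a toy analysis/synthesis pair. It uses a 2-tap Haar-style filter in place of G.722's much longer QMF filters, so it illustrates only the structure (split into two half-rate sub-band signals, then recombine), not the actual G.722 filter design.

```python
def qmf_analysis(x):
    # Toy 2-tap (Haar-style) analysis bank: split x into half-rate
    # low-band (sum) and high-band (difference) signals.
    low = [(x[2 * k] + x[2 * k + 1]) / 2 for k in range(len(x) // 2)]
    high = [(x[2 * k] - x[2 * k + 1]) / 2 for k in range(len(x) // 2)]
    return low, high

def qmf_synthesis(low, high):
    # Matching synthesis bank: perfect reconstruction of the input,
    # since x[2k] = low[k] + high[k] and x[2k+1] = low[k] - high[k].
    x = []
    for l, h in zip(low, high):
        x.append(l + h)
        x.append(l - h)
    return x
```

The round trip `qmf_synthesis(*qmf_analysis(x))` reproduces the input, which mirrors the role of blocks 110 and 240 above.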
Further details concerning the structure and operation of encoder 100 and decoder 200 may be found in ITU-T Recommendation G.722, which is incorporated by reference herein in its entirety.
C. Packet Loss Concealment for a Sub-Band Predictive Coder Based on Full-Band Speech Waveform Extrapolation
A high-quality PLC system and method in accordance with an embodiment of the present invention will now be described. An overview of the system and method is provided in this section, while further details concerning a specific implementation are described below in Section D. The example system and method are used with an ITU-T Recommendation G.722 speech coder. However, persons skilled in the relevant art will appreciate that many of the concepts described with reference to this particular embodiment may be used for performing PLC in other types of sub-band predictive speech coders, as well as in other types of speech and audio coders.
As described in more detail herein, this embodiment performs PLC in the 16 kHz output domain of the G.722 speech decoder. Periodic waveform extrapolation is used to fill in the waveform associated with lost frames of a speech signal, wherein, depending on the signal characteristics preceding the frame loss, the extrapolated waveform is mixed with filtered noise. To update the states of the sub-band ADPCM decoders, the extrapolated 16 kHz signal is passed through a QMF analysis filter bank to generate sub-band signals, which are then processed by simplified sub-band ADPCM encoders. To provide a smooth transition from the extrapolated waveform associated with the lost frames to the normally-decoded waveform associated with the good frames received after the packet loss, additional processing is performed after each packet loss. Among other things, the states of the sub-band ADPCM decoders are phase-aligned with the first good frame received after the packet loss, and the normally-decoded waveform associated with the first good frame is time-warped in order to align it with the extrapolated waveform before the two are overlap-added, thereby achieving a smooth transition. In the case of a long packet loss, the system and method gradually mute the output signal.
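As a rough illustration of the time-warping idea mentioned above, the sketch below shrinks a signal along the time axis by deleting samples at a splice point and crossfading across the splice. This is an assumed simplification: the function name, the single-splice strategy, and the linear crossfade are illustrative only and are not taken from the implementation in Section D.

```python
def time_warp_shrink(x, n_drop, merge_len, cut=None):
    # Shrink x by n_drop samples: splice out n_drop samples around
    # position `cut` and crossfade merge_len samples across the splice
    # so the waveform stays continuous.
    if cut is None:
        cut = len(x) // 2
    b = x[cut - merge_len + n_drop:]       # right-hand segment, shifted left
    out = list(x[:cut - merge_len])        # untouched left-hand segment
    for i in range(merge_len):
        w = (i + 1) / (merge_len + 1)      # fade from left into right segment
        out.append((1 - w) * x[cut - merge_len + i] + w * b[i])
    out.extend(b[merge_len:])
    return out
```

Stretching (the opposite warp) would insert samples at the splice point instead of deleting them, again with a crossfade across the joint.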
Fig. 3 is a high-level block diagram of a G.722 speech decoder 300 that implements this PLC functionality. Although the decoder/PLC system 300 described herein includes a G.722 decoder, persons skilled in the relevant art will appreciate that many of the concepts described herein may be applied generally to any N-band sub-band predictive coding system. Similarly, the predictive coder for each sub-band need not be the ADPCM coder shown in Fig. 3; it may be any general predictive coder, and it may be either forward-adaptive or backward-adaptive.
As shown in Fig. 3, decoder/PLC system 300 includes a bit-stream de-multiplexer 310, a low-band ADPCM decoder 320, a high-band ADPCM decoder 330, a switch 336, a QMF synthesis filter bank 340, a full-band speech signal synthesizer 350, a sub-band ADPCM decoder state update module 360, and a decoding constraint and control module 370.
As used herein, the terms "lost frame" and "bad frame" refer to a frame of the speech signal that is not received at decoder/PLC system 300, or that is deemed unsuitable for normal decoding operations. A "received frame" or "good frame" is a frame of the speech signal that is normally received at decoder/PLC system 300. The "current frame" is the frame currently being processed by decoder/PLC system 300 to produce an output speech signal, while a "previous frame" is a frame that was already processed by decoder/PLC system 300 to produce an output speech signal. The terms "current frame" and "previous frame" may refer both to received frames and to lost frames for which PLC operations are being performed.
The manner in which decoder/PLC system 300 operates will now be described with reference to flowchart 400 of Fig. 4. As shown in Fig. 4, the method of flowchart 400 begins at step 402, in which decoder/PLC system 300 determines the frame type of the current frame. Decoder/PLC system 300 distinguishes between six different types of frames, denoted Type 1 through Type 6. Fig. 5 provides a timeline 500 illustrating the different frame types. A Type 1 frame is any received frame beyond the eighth received frame after a packet loss. A Type 2 frame is either of the first two lost frames associated with a packet loss. A Type 3 frame is any of the third through sixth lost frames associated with a packet loss. A Type 4 frame is any lost frame beyond the sixth lost frame associated with a packet loss. A Type 5 frame is the received frame immediately following a packet loss. Finally, a Type 6 frame is any of the second through eighth received frames after a packet loss. Persons skilled in the relevant art will readily appreciate that other schemes for classifying frame types may be used in accordance with alternative embodiments of the present invention. For example, in systems with different frame sizes, the number of frames in each frame type may differ from the above. Likewise, for different codecs (i.e., non-G.722 codecs), the number of frames in each frame type may be different.
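Transcribed directly, the six-way classification above amounts to a small lookup function. The sketch below is illustrative only: the two frame counters are assumed bookkeeping, and the thresholds are those of the G.722-based scheme described here, not of any alternative frame-size scheme.

```python
def classify_frame(lost, n_lost, n_received_since_loss):
    """Return the frame type (1..6) per the scheme described above.

    lost: True if the current frame is lost.
    n_lost: 1-based index of this frame within the current loss run
        (only meaningful when lost is True).
    n_received_since_loss: 1-based index of this frame within the run
        of good frames following the most recent packet loss.
    """
    if lost:
        if n_lost <= 2:
            return 2          # first or second lost frame
        if n_lost <= 6:
            return 3          # third through sixth lost frame
        return 4              # beyond the sixth lost frame
    if n_received_since_loss == 1:
        return 5              # first good frame after the loss
    if n_received_since_loss <= 8:
        return 6              # second through eighth good frame
    return 1                  # steady state: normal decoding
```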
The manner in which decoder/PLC system 300 processes the current frame to produce the output speech signal is determined by the frame type of the current frame. This is reflected in Fig. 4 by a series of decision steps 404, 406, 408 and 410. Specifically, if it is determined in step 402 that the current frame is a Type 1 frame, then a first sequence of processing steps is performed to produce the output speech signal, as shown at decision step 404. If it is determined in step 402 that the current frame is a Type 2, Type 3 or Type 4 frame, then a second sequence of processing steps is performed to produce the output speech signal, as shown at decision step 406. If it is determined in step 402 that the current frame is a Type 5 frame, then a third sequence of processing steps is performed to produce the output speech signal, as shown at decision step 408. Finally, if it is determined in step 402 that the current frame is a Type 6 frame, then a fourth sequence of processing steps is performed to produce the output speech signal, as shown at decision step 410. The processing steps associated with each of the different frame types are described below.
After the processing steps of the applicable sequence have been performed, it is determined at decision step 430 whether there are additional frames to process. If there are additional frames to process, processing returns to step 402. If, however, there are no additional frames to process, processing ends, as shown at step 432.
1. Processing of Type 1 Frames
As shown at step 412 of flowchart 400, if the current frame is a Type 1 frame, decoder/PLC system 300 performs normal G.722 decoding of the current frame. Accordingly, blocks 310, 320, 330 and 340 of decoder/PLC system 300 perform exactly the same functions as their respective counterpart blocks 210, 220, 230 and 240 of the conventional G.722 decoder 200. Specifically, the bit-stream de-multiplexer 310 separates the input bit stream into a low-band bit stream and a high-band bit stream. The low-band ADPCM decoder 320 decodes the low-band bit stream into a decoded low-band speech signal. The high-band ADPCM decoder 330 decodes the high-band bit stream into a decoded high-band speech signal. The QMF synthesis filter bank 340 then recombines the decoded low-band speech signal and the decoded high-band speech signal into a full-band speech signal. During the processing of Type 1 frames, switch 336 is connected to the upper position labeled "Type 1," so that the output signal of QMF synthesis filter bank 340 serves as the final output speech signal of decoder/PLC system 300 for Type 1 frames.
After step 412 has been completed, decoder/PLC system 300 updates various state memories and performs some processing to facilitate the PLC operations that may be carried out for a subsequent lost frame, as shown at step 414. The state memories include PLC-related low-band ADPCM decoder state memory, PLC-related high-band ADPCM decoder state memory, and state memory related to the full-band PLC. As part of this step, the output signal of QMF synthesis filter bank 340 is stored in an internal signal buffer of full-band speech signal synthesizer 350, in preparation for possible speech waveform extrapolation during the processing of a subsequent lost frame. The sub-band ADPCM decoder state update module 360 and the decoding constraint and control module 370 are inactive during the processing of Type 1 frames. Further details concerning the processing of Type 1 frames are provided below in connection with the specific implementation of decoder/PLC system 300 described in Section D.
2. Processing of Type 2, Type 3 and Type 4 Frames
During the processing of Type 2, Type 3 and Type 4 frames, the input bit stream associated with the lost frame is unavailable. Accordingly, blocks 310, 320, 330 and 340 cannot perform their usual functions and are inactive. Instead, switch 336 is connected to the lower position labeled "Types 2-6," and full-band speech signal synthesizer 350 becomes active to synthesize the output speech signal of decoder/PLC system 300. Full-band speech signal synthesizer 350 synthesizes the output speech signal of decoder/PLC system 300 by extrapolating the previously-stored output speech signal associated with the last few received frames preceding the packet loss. This is reflected at step 416 of flowchart 400.
After full-band speech signal synthesizer 350 completes the waveform synthesis task, sub-band ADPCM decoder state update module 360 properly updates the internal states of low-band ADPCM decoder 320 and high-band ADPCM decoder 330, in preparation for a possible good frame in the next frame period, as shown at step 418. The manner in which steps 416 and 418 are performed will now be described in more detail.
a. Waveform Extrapolation
Many techniques exist in the prior art for performing the waveform extrapolation function of step 416. The technique used by the implementation of decoder/PLC system 300 described below in Section D is a variation of the technique described in U.S. Patent Application No. 11/234,291 to Chen, filed September 26, 2005, entitled "Packet Loss Concealment for Block-Independent Speech Codecs." A high-level description of this technique is provided here, with further details set forth in Section D.
To perform the waveform extrapolation function, full-band speech signal synthesizer 350 analyzes the output speech signal stored from QMF synthesis filter bank 340 during the processing of received frames in order to extract a pitch period, short-term predictor coefficients and long-term predictor coefficients. These parameters are then stored for subsequent use.
Full-band speech signal synthesizer 350 extracts the pitch period by performing a two-stage search. In the first stage, a low-resolution pitch period (or "coarse pitch") is identified by searching a decimated version of the input speech signal or a filtered version thereof. In the second stage, the coarse pitch is refined to normal resolution by searching the neighborhood of the coarse pitch using the undecimated signal. Such two-stage search methods require significantly lower computational complexity than a single-stage full search on the undecimated signal. Before the speech signal or its filtered version is decimated, it normally needs to pass through an anti-aliasing low-pass filter. To reduce complexity, a common prior-art approach is to use a low-order infinite impulse response (IIR) filter, such as an elliptic filter. However, the poles of a good low-order IIR filter are typically very close to the unit circle, so that when performing the filtering operations corresponding to the all-pole section of the filter in 16-bit fixed-point arithmetic, double-precision arithmetic operations are needed.
In contrast with the prior art, full-band speech signal synthesizer 350 uses a finite impulse response (FIR) filter as the anti-aliasing low-pass filter. By using an FIR filter in this manner, only single-precision 16-bit fixed-point operations are needed, and the FIR filter can operate at the lower sampling rate of the decimated signal. This approach can therefore significantly reduce the computational complexity of the anti-aliasing low-pass filter. For example, in the implementation of decoder/PLC system 300 described in Section D, the undecimated signal has a sampling rate of 16 kHz, whereas the decimated signal used for pitch extraction has a sampling rate of only 2 kHz. Under the prior-art approach, a 4th-order elliptic filter could be used. The all-pole section of the elliptic filter requires double-precision fixed-point arithmetic and must operate at the 16 kHz sampling rate. Consequently, even though the all-zero section can operate at the 2 kHz sampling rate, the entire 4th-order elliptic filter and the decimation operation require 0.66 WMOPS (weighted millions of operations per second) of computational complexity. In contrast, even if a relatively high-order 60th-order FIR filter is used in place of the 4th-order elliptic filter, because the 60th-order FIR filter operates at the very low 2 kHz sampling rate, the entire 60th-order FIR filter and the decimation operation require only 0.18 WMOPS of complexity, a 73% reduction relative to the 4th-order elliptic filter.
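To illustrate why an FIR anti-aliasing filter can run at the decimated rate, the sketch below computes filter outputs only at the decimated sample positions. The coefficients and lengths are illustrative only (a short moving-average filter and an 8:1 decimation, e.g. 16 kHz to 2 kHz), not the actual 60th-order filter of the described implementation.

```python
def fir_decimate(x, h, m):
    """Anti-aliasing FIR low-pass filtering combined with decimation
    by factor m.  Because the filter is FIR, outputs are computed only
    at the decimated positions, so the filter effectively operates at
    the low sampling rate (the complexity saving described above).
    x: input samples, h: FIR coefficients, m: decimation factor."""
    y = []
    for n in range(0, len(x), m):      # one output per m input samples
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y

# Illustrative 5-tap moving-average low-pass (NOT the 60th-order
# filter of the described implementation) and 8:1 decimation.
x = [1.0] * 32                         # constant (DC) input
y = fir_decimate(x, [0.2] * 5, 8)
```

For a DC input, once the filter delay line is filled each output equals the coefficient sum (1.0 here), and only one output is produced per eight inputs, which is where the complexity saving comes from.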
At the start of the first lost frame of a packet loss, full-band speech signal synthesizer 350 uses the cascaded long-term and short-term synthesis filters to generate the signal, referred to as the "ringing signal," obtained when the input to the cascaded synthesis filters is set to zero. Full-band speech signal synthesizer 350 then analyzes certain signal parameters of the stored output speech signal (such as the pitch prediction gain and the normalized autocorrelation) to determine its degree of "voicing." If the previous output speech signal is highly voiced, it is extrapolated in a periodic manner to generate a replacement waveform for the current bad frame. The periodic waveform extrapolation is performed using a refined version of the pitch period extracted from the most recently received frame. If the previous output speech signal is unvoiced or noise-like, scaled random noise is passed through the short-term synthesis filter to generate a replacement signal for the current bad frame. If the degree of voicing lies between the two extremes, the two components are mixed in proportion to the degree of voicing. The resulting extrapolated signal is then overlap-added with the ringing signal to ensure that there is no waveform discontinuity at the beginning of the first bad frame of the packet loss. In addition, the waveform extrapolation is extended beyond the end of the current bad frame by at least a period equal to the overlap-add period, so that the extra samples of the extrapolated signal can serve as the ringing signal for the overlap-add at the beginning of the next frame if the next frame is also a bad frame.
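The overlap-add of the ringing signal with the extrapolated waveform can be sketched as a short cross-fade. The linear (triangular) fade window below is an assumption for illustration; the actual window shape used by the described implementation is not specified here.

```python
def overlap_add_ringing(ringing, extrapolated):
    """Cross-fade from the ringing signal into the extrapolated
    waveform over the overlap-add period, so that no waveform
    discontinuity occurs at the start of the first bad frame.
    Uses a linear fade (an assumed window shape)."""
    n = len(ringing)
    out = list(extrapolated)
    for i in range(n):
        w = (i + 1) / (n + 1)          # ramps from 0 toward 1 across the overlap
        out[i] = (1.0 - w) * ringing[i] + w * extrapolated[i]
    return out

# Tiny demonstration: ringing decays out, extrapolation fades in.
mixed = overlap_add_ringing([1.0, 1.0, 1.0], [0.0] * 5)
```

At the start of the overlap the output is dominated by the ringing signal and at the end by the extrapolated signal, which is what guarantees continuity at the frame boundary.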
For a bad frame that is not the first bad frame of a packet loss (that is, a Type 3 or Type 4 frame), the operation of full-band speech signal synthesizer 350 is essentially the same as that described in the preceding paragraph, except that full-band speech signal synthesizer 350 does not need to compute a ringing signal. Instead, it can use the extra samples of the extrapolated signal computed during the previous frame that extend beyond the end of the previous frame as the ringing signal for the overlap-add operation, thereby ensuring that there is no waveform discontinuity at the beginning of the frame.
For long packet losses, full-band speech signal synthesizer 350 gradually attenuates the output speech signal of decoder/PLC system 300. For example, in the implementation of the decoder/PLC system described in Section D, the output speech signal generated during a packet loss is attenuated, or "ramped down," toward zero in a linear fashion starting at 20 ms into the loss and reaching zero at 60 ms. This function is performed because the uncertainty regarding the shape and form of the "real" waveform increases with time. In practice, many PLC schemes start to produce buzzy output when the extrapolated segment extends much beyond roughly 60 ms.
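The linear ramp-down described above can be expressed as a simple gain schedule applied to the synthesized output as a function of time since the start of the loss. The 20 ms and 60 ms endpoints are those of the Section-D implementation; the sketch is illustrative.

```python
def plc_gain(t_ms):
    """Attenuation gain applied to the synthesized PLC output versus
    time (in ms) since the start of the packet loss: unity up to
    20 ms, a linear ramp down to zero at 60 ms, and zero afterwards."""
    if t_ms <= 20.0:
        return 1.0
    if t_ms >= 60.0:
        return 0.0
    return (60.0 - t_ms) / 40.0        # linear fade over 20..60 ms
```

A background-noise variant, as in the alternate embodiment discussed below, would ramp toward a tracked ambient-noise level instead of zero.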
In an alternate embodiment of the present invention directed to PLC in background noise, the level of the background (ambient) noise is tracked, and during long frame erasures the output is attenuated to that level rather than to zero. This eliminates the discontinuity effect produced in background noise when the PLC system silences the output during a packet loss.
A further alternate embodiment of the present invention addresses the aforementioned problem of PLC in background noise by performing a comfort noise generation (CNG) function. When this embodiment of the invention begins to attenuate the output speech signal of decoder/PLC system 300 during a long packet loss, it also begins to mix in comfort noise produced by the CNG function. By mixing in, and eventually substituting, comfort noise as the output speech signal of decoder/PLC system 300 is attenuated and finally silenced, the discontinuity effect described above is eliminated and a faithful reproduction of the ambient environment of the signal is provided. This approach has proven widely acceptable in other applications. For example, in a subband acoustic echo canceller (SBAEC), or more generally in an acoustic echo canceller (AEC), when residual echo is detected the signal is attenuated and replaced with comfort noise. This is commonly referred to as non-linear processing (NLP). The premise of this embodiment of the invention is that PLC presents a very similar scenario. As with AEC, this approach should provide a much more pleasant experience when applied to PLC, one that is far less objectionable than the discontinuity effect.
b. Updating the Internal States of the Low-Band and High-Band ADPCM Decoders
After full-band speech signal synthesizer 350 has completed the waveform synthesis task performed in step 416, subband ADPCM decoder state update module 360 appropriately updates the internal states of low-band ADPCM decoder 320 and high-band ADPCM decoder 330 in step 418, in preparation for a possible good frame in the next frame. There are many ways to perform the internal state update of low-band ADPCM decoder 320 and high-band ADPCM decoder 330. Because the G.722 encoder of FIG. 1 and the G.722 decoder of FIG. 2 have internal states of the same kind, one straightforward way to update the internal states of decoders 320 and 330 is to feed the output signal of full-band speech signal synthesizer 350 through the standard G.722 encoder shown in FIG. 1, starting from the internal state left at the last sample of the previous frame. Then, after the extrapolated speech signal for the current bad frame has been encoded, the internal state left at the last sample of the current bad frame is used to update the internal states of low-band ADPCM decoder 320 and high-band ADPCM decoder 330.
The foregoing approach, however, carries the complexity of two subband encoders. To save complexity, the implementation of decoder/PLC system 300 described below in Section D performs an approximation of the above method. For the high-band ADPCM encoder, it is recognized that the high-band adaptive quantization step size Δ_H(n) is not needed when processing the first received frame after a packet loss. Instead, the step size is reset to the pre-loss moving average (as described elsewhere in this application). Accordingly, the adaptive prediction in the high-band ADPCM encoder is updated using the unquantized difference signal (or prediction error signal) e_H(n), and the quantization of e_H(n) is avoided entirely.
For the low-band ADPCM encoder, the scheme is slightly different. Because of the importance of maintaining the pitch modulation of the low-band adaptive quantization step size Δ_L(n), the implementation of decoder/PLC system 300 described below in Section D effectively updates this parameter during lost frames. The standard G.722 low-band ADPCM encoder applies a 6-bit quantization to the difference signal (or prediction error signal) e_L(n). However, according to the G.722 standard, only a subset of eight of the magnitude quantization indexes is used to update the low-band adaptive quantization step size Δ_L(n). By using the unquantized difference signal e_L(n), rather than the quantized difference signal, for the adaptive prediction update in the low-band ADPCM encoder, while still updating the low-band adaptive quantization step size Δ_L(n) in the same manner, the embodiment described in Section D can use a far less complex quantization of the difference signal.
Those skilled in the art will readily appreciate that, wherever this application refers to the high-band adaptive quantization step size Δ_H(n), the high-band adaptive quantization step size may equivalently be replaced by the high-band log scale factor ∇_H(n). Similarly, wherever this application refers to the low-band adaptive quantization step size Δ_L(n), the low-band adaptive quantization step size may equivalently be replaced by the low-band log scale factor ∇_L(n).
Another difference between the low-band and high-band ADPCM encoders used in the Section-D embodiment and the standard G.722 subband ADPCM encoders is an adaptive reset of the encoders based on signal properties and the duration of the packet loss. This functionality is described next.
As noted above, for long packet losses full-band speech signal synthesizer 350 attenuates the output speech waveform to zero after a predetermined time. In the implementation of decoder/PLC system 300 described below in Section D, the output signal from full-band speech signal synthesizer 350 is passed through a G.722 QMF analysis filter bank to obtain the subband signals used to update the internal states of low-band ADPCM decoder 320 and high-band ADPCM decoder 330 during lost frames. Consequently, once the output signal from full-band speech signal synthesizer 350 has been attenuated to zero, the subband signals used to update the subband ADPCM decoder internal states also become zero. A sustained zero input can cause the adaptive predictors in the decoders to diverge from those in the encoders, because a sustained zero input can artificially drive the predictor sections to keep adapting in the same direction. This is pronounced in a conventional high-band ADPCM decoder, where it commonly produces high-frequency chirping when good frames are processed after a long packet loss. In a conventional low-band ADPCM decoder, this problem occasionally causes unnatural energy increases because the predictor develops too high a filter gain.
Based on the foregoing, the implementation of decoder/PLC system 300 described below in Section D resets the ADPCM subband decoders once the PLC output waveform has been attenuated to zero. This approach almost entirely eliminates high-frequency chirping after long frame erasures. The uncertainty of the synthesized waveform produced by full-band speech signal synthesizer 350 grows as the packet loss lengthens, and experience shows that beyond some point continuing to use this method to update subband ADPCM decoders 320 and 330 provides no audible benefit.
However, even with subband ADPCM decoders 320 and 330 being reset once the output of full-band speech signal synthesizer 350 has been fully attenuated, some problems remained, in the form of infrequent chirping (from high-band ADPCM decoder 330) and infrequent, unnatural energy growth (from low-band ADPCM decoder 320). The implementation described in Section D addresses these problems by making the reset of each subband ADPCM decoder adaptive. A reset still takes place when the waveform has been fully attenuated, but one or both of subband ADPCM decoders 320 and 330 may also be reset earlier.
As will be described in Section D, the decision to reset early is based on monitoring, during bad frames (that is, while subband ADPCM decoders 320 and 330 are being updated from the output signal of the full-band speech signal synthesizer), certain properties of the signals that control the adaptation of the pole sections of the adaptive predictors of subband ADPCM decoders 320 and 330. For low-band ADPCM decoder 320, the partially reconstructed signal p_Lt(n) drives the adaptation of the all-pole filter section, and the partially reconstructed signal p_H(n) drives the adaptation of the all-pole filter section of high-band ADPCM decoder 330. In essence, each of these signals is monitored to determine whether it is largely constant, or predominantly positive or negative, during a 10 ms lost-frame interval. It should be noted that, in the implementation described in Section D, the adaptive reset is restricted to times later than 30 ms into the packet loss.
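One way to realize the monitoring described above is to test whether the adaptation-driving signal kept an (almost) constant sign over a lost-frame interval, the symptom of runaway sign-based adaptation that motivates an early reset. The sketch below is illustrative; the 0.99 threshold is an assumed value, not taken from the described implementation.

```python
def predominantly_one_sign(p, threshold=0.99):
    """Return True when the partially reconstructed signal p (e.g.
    p_Lt(n) or p_H(n) over one 10 ms lost-frame interval) is almost
    entirely positive or almost entirely negative, in which case the
    sign-based predictor adaptation keeps pushing in one direction
    and an early decoder reset may be warranted."""
    if not p:
        return False
    pos = sum(1 for v in p if v > 0)
    neg = sum(1 for v in p if v < 0)
    return max(pos, neg) >= threshold * len(p)

alternating = predominantly_one_sign([1.0, -1.0] * 80)   # healthy signal
constant = predominantly_one_sign([0.5] * 160)           # drift symptom
```

An actual implementation would combine such a per-interval flag with the 30 ms restriction mentioned above before resetting either decoder.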
3. Processing of Type 5 and Type 6 Frames
When Type 5 and Type 6 frames are processed, the input bit stream associated with the current frame is again available, so modules 310, 320, 330, and 340 are once more active. However, the decoding operations performed by low-band ADPCM decoder 320 and high-band ADPCM decoder 330 are constrained and controlled by decoding constraint and control module 370 to reduce artifacts and distortion at the transition from lost frames to received frames, thereby improving the performance of decoder/PLC system 300 after a packet loss. For Type 5 frames this is reflected in step 420 of flowchart 400, and for Type 6 frames in step 426.
For Type 5 frames, additional modifications are made to the output speech signal to ensure a smooth transition between the synthesized signal generated by full-band speech signal synthesizer 350 and the output signal generated by QMF synthesis filter bank 340. Thus, the output signal of QMF synthesis filter bank 340 is not used directly as the output speech signal of decoder/PLC system 300. Instead, full-band speech signal synthesizer 350 modifies the output of QMF synthesis filter bank 340, and the modified version is used as the output speech signal of decoder/PLC system 300. Accordingly, when Type 5 or Type 6 frames are processed, switch 336 remains connected to the lower position labeled "Types 2-6," so as to receive the output speech signal from full-band speech signal synthesizer 350.
In this regard, if a misalignment exists between the synthesized signal produced by full-band speech signal synthesizer 350 and the output signal produced by QMF synthesis filter bank 340, the operations performed by full-band speech signal synthesizer 350 include time warping and re-phasing. The performance of these operations is shown in step 422 of flowchart 400 and will be described in more detail below.
Also for Type 5 frames, the output speech signal produced by full-band speech signal synthesizer 350 is overlap-added with the ringing signal from the last lost frame processed. This is done to ensure a smooth transition from the synthesized waveform associated with the previous frame to the output waveform associated with the current Type 5 frame. The performance of this step is shown in step 424 of flowchart 400.
After the output speech signal has been generated for a Type 5 or Type 6 frame, decoder/PLC system 300 updates various state memories and performs some processing to facilitate the PLC operations performed for any subsequent lost frames, in a manner similar to step 414, as shown in step 428.
a. Constraint and Control of Subband ADPCM Decoding
As noted above, decoding constraint and control module 370 constrains and controls the decoding operations performed by low-band ADPCM decoder 320 and high-band ADPCM decoder 330 during the processing of Type 5 and Type 6 frames, so as to improve the performance of decoder/PLC system 300 after a packet loss. The various constraints and controls applied by decoding constraint and control module 370 are now described. More details about these constraints and controls will be provided below in Section D with reference to a particular implementation of decoder/PLC system 300.
i. Setting of the Adaptive Quantization Step Size for the High-Band ADPCM Decoder
For Type 5 frames, decoding constraint and control module 370 sets the adaptive quantization step size Δ_H(n) used by high-band ADPCM decoder 330 to a moving average of its values over the good frames received before the packet loss. By reducing audible energy drops caused by packet loss in segments of background noise, this improves the performance of decoder/PLC system 300 in background noise.
ii. Setting of the Adaptive Quantization Step Size for the Low-Band ADPCM Decoder
For Type 5 frames, decoding constraint and control module 370 implements an adaptive strategy for setting the adaptive quantization step size Δ_L(n) of low-band ADPCM decoder 320. In alternative embodiments, this approach may also be applied to high-band ADPCM decoder 330. As explained in the preceding subsection, for high-band ADPCM decoder 330 it is beneficial to the performance of decoder/PLC system 300 in background noise to set the adaptive quantization step size Δ_H(n) on the first good frame to a moving average of its values before the packet loss. However, applying the same approach to low-band ADPCM decoder 320 occasionally produces large, unnatural energy increases on voiced speech. This is because Δ_L(n) is modulated by the pitch period in voiced speech, and therefore setting Δ_L(n) to the pre-loss moving average can cause a very large, abnormal increase in Δ_L(n) on the first good frame after the packet loss.
Consequently, in cases where Δ_L(n) is modulated by the pitch period, it is preferable to use the Δ_L(n) obtained from subband ADPCM decoder state update module 360, rather than the pre-loss moving average of Δ_L(n). Recall that subband ADPCM decoder state update module 360 updates low-band ADPCM decoder 320 with a low-band signal obtained by passing the output signal of full-band speech signal synthesizer 350 through a G.722 QMF analysis filter bank. If full-band speech signal synthesizer 350 performs its task well, as is likely for voiced speech, then the signal used to update low-band ADPCM decoder 320 will probably match the signal used at the encoder closely, and hence the parameter Δ_L(n) will also likely be very close to the step size at the encoder. For voiced speech, this approach is therefore preferable to setting Δ_L(n) to the pre-loss moving average of Δ_L(n).
In view of the foregoing, decoding constraint and control module 370 applies an adaptive strategy to set the Δ_L(n) used for the first good frame after a packet loss. If the speech signal before the packet loss was fairly stationary, for example stationary background noise, Δ_L(n) is set to the pre-loss moving average of Δ_L(n). If, however, the speech signal before the packet loss exhibited variation in Δ_L(n), as would be expected for voiced speech, then Δ_L(n) is set to the value obtained by the low-band ADPCM decoder update based on the output of full-band speech signal synthesizer 350. For intermediate cases, Δ_L(n) is set to a linear weighting of these two values based on the amount of variation in Δ_L(n) before the packet loss.
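The linear weighting between the two candidate step-size values can be sketched as below. The mapping from a pre-loss variability measure to a blend weight is an assumed illustration; the exact rule used by the described implementation is given in Section D.

```python
def delta_l_first_good_frame(avg_pre_loss, tracked, variability):
    """Blend the two candidates for the low-band step size on the
    first good frame after a loss:
      - avg_pre_loss: pre-loss moving average (right for stationary
        signals such as background noise),
      - tracked: value maintained by the decoder-state update module
        (right for voiced, pitch-modulated speech).
    'variability' in [0, 1] measures how much the step size varied
    before the loss (0 = stationary, 1 = strongly varying); the
    mapping to a weight is an assumption for illustration."""
    w = min(max(variability, 0.0), 1.0)
    return (1.0 - w) * avg_pre_loss + w * tracked
```

Fully stationary input reduces to the moving average, fully voiced input reduces to the tracked value, and intermediate inputs interpolate linearly.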
iii. Adaptive Low-Pass Filtering of the Adaptive Quantization Step Size for the High-Band ADPCM Decoder
During the processing of the first few good frames after a packet loss (Type 5 and Type 6 frames), to reduce the risk of local fluctuations producing excessive high-frequency content (due to the temporary loss of synchronization between the G.722 encoder and the G.722 decoder), decoding constraint and control module 370 can effectively control the adaptive quantization step size Δ_H(n) of the high-band ADPCM decoder. Such fluctuations would otherwise produce an oscillatory effect that is, in practice, a form of chirping. Accordingly, during the first few good frames an adaptive low-pass filter is applied to the high-band quantization step size Δ_H(n). The degree of filtering is reduced in a quadratic fashion over the adaptation period to provide a smooth transition. If the speech signal before the packet loss was highly stationary, the duration is longer (80 ms in the implementation of decoder/PLC system 300 described below in Section D). If the speech signal before the packet loss was less stationary, the duration is somewhat shorter (40 ms in the implementation of decoder/PLC system 300 described below in Section D), and for unstable segments no low-pass filtering is applied.
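As a hedged sketch of this idea, the smoother below applies a one-pole low-pass filter to the step-size sequence, with a coefficient that decays quadratically to zero over the adaptation period so that the filtering fades out smoothly. Both the one-pole form and the initial coefficient of 0.9 are assumptions for illustration; the actual filter of the described implementation may differ.

```python
def smooth_delta_h(raw_steps, period):
    """Adaptive low-pass filtering of the high-band quantizer step
    size during the first good frames after a loss.  The smoothing
    coefficient alpha decays quadratically from 0.9 to 0 over
    'period' samples, giving a smooth transition back to the
    unfiltered step size (assumed illustrative form)."""
    out = []
    prev = raw_steps[0]
    for n, x in enumerate(raw_steps):
        frac = min(n / period, 1.0)
        alpha = 0.9 * (1.0 - frac) ** 2    # quadratic decay of smoothing
        prev = alpha * prev + (1.0 - alpha) * x
        out.append(prev)
    return out

# A step change in the raw step size is smoothed early on and passed
# through unchanged once the adaptation period has elapsed.
steps = smooth_delta_h([0.0] * 5 + [1.0] * 10, 5)
```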
iv. Adaptive Safety Margin on the All-Pole Filter Section During the First Few Good Frames
Because of the unavoidable deviation between the G.722 decoder and encoder during and after a packet loss, decoding constraint and control module 370 enforces certain constraints on the adaptive predictor of low-band ADPCM decoder 320 during the first few good frames after a packet loss (Type 5 and Type 6 frames). According to the G.722 standard, the encoder and decoder by default enforce a minimum "safety" margin of 1/16 on the pole section of the subband predictors. However, it has been found that the all-pole section of the two-pole, six-zero predictive filter of the low-band ADPCM decoder frequently causes abnormal energy increases after a packet loss. This is often perceived as a waveform "pop." Evidently, the packet loss results in a lower safety margin, corresponding to an all-pole filter section with higher gain, which produces a waveform of very high energy.
By adaptively enforcing stricter constraints on the all-pole filter section of the adaptive predictor of low-band ADPCM decoder 320, decoding constraint and control module 370 greatly reduces such abnormal energy increases after a packet loss. During the first few good frames after a packet loss, an increased minimum safety margin is enforced. The increased minimum safety margin is gradually reduced to the standard G.722 minimum safety margin. In addition, a moving average of the safety margin before the packet loss is monitored, and the increased minimum safety margin during the first few good frames after the packet loss is controlled so as not to exceed this moving average.
v. DC Removal on Internal Signals of the High-Band ADPCM Decoder
During the first few good frames after a packet loss (Type 5 and Type 6 frames), it has been observed that the G.722 decoder often produces a highly objectionable, audible high-frequency chirping distortion. This distortion originates in the high-band ADPCM decoder, whose prediction becomes biased because the packet loss causes it to lose synchronization with the high-band ADPCM encoder. The loss of synchronization that causes the chirping distortion manifests itself in the signals controlling the adaptation of the pole predictor, namely the partially reconstructed signal p_H(n) and the reconstructed high-band signal r_H(n), maintaining a constant sign over long periods. This causes the pole section of the predictor to drift, because the adaptation is sign-based and therefore keeps updating in the same direction.
To avoid this problem, decoding constraint and control module 370 adds DC removal to these signals during the first few good frames after a packet loss, replacing the signals p_H(n) and r_H(n) with high-pass filtered versions p_H,HP(n) and r_H,HP(n), respectively. This has been found to eliminate the chirping entirely. The DC removal is implemented as a subtraction of the respective running means of p_H(n) and r_H(n). These running means are updated continually during both good frames and bad frames. In the implementation of decoder/PLC system 300 described below in Section D, this replacement takes place during the first 40 ms after a packet loss.
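The running-mean subtraction can be sketched as follows. An exponential running mean is one common way to realize it; the smoothing constant beta is an assumed value, not taken from the described implementation.

```python
def remove_dc(x, beta=0.99):
    """DC removal by subtracting a running (exponential) mean, as
    applied to p_H(n) and r_H(n) to stop sign-based predictor drift.
    In the described implementation the running means are updated
    during both good and bad frames; beta is an assumed constant."""
    mean = 0.0
    out = []
    for v in x:
        mean = beta * mean + (1.0 - beta) * v   # track the DC component
        out.append(v - mean)                    # high-pass filtered sample
    return out

# A constant (pure DC) input is driven toward zero, which is exactly
# the constant-sign condition that caused the predictor drift.
dc_free = remove_dc([1.0] * 1000)
```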
b. Re-Phasing and Time Warping
As noted above, in step 422 of flowchart 400, if a misalignment exists between the synthesized speech signal generated by full-band speech signal synthesizer 350 during the packet loss and the speech signal generated by QMF synthesis filter bank 340 during the first received frame after the packet loss, full-band speech signal synthesizer 350 performs techniques referred to herein as "re-phasing" and "time warping."
As described above, when lost frames are processed, if the decoded speech signal associated with the frames received before the packet loss was nearly periodic, as is the case for voiced speech, full-band speech signal synthesizer 350 extrapolates the speech waveform based on the pitch period. As also described above, this waveform extrapolation is continued beyond the end of the lost frames to obtain extra samples for overlap-add with the speech signal associated with the next frame, thereby ensuring a smooth transition and avoiding any discontinuity. However, the actual pitch period of the decoded speech signal generally does not follow the pitch track used during the waveform extrapolation of the lost frames. As a result, the extrapolated speech signal will generally not align perfectly with the decoded speech signal associated with the first good frame.
This is illustrated in FIG. 6, which provides a timeline 600 showing the amplitude of a decoded speech signal 602 before the packet loss and during the first received frame after the packet loss (for convenience, the decoded speech signal during the lost frames is also shown, although it will be appreciated that decoder/PLC system 300 cannot decode this portion of the original signal), together with the amplitude of an extrapolated speech signal 604 generated during the lost frames and during the first received frame after the packet loss. As shown in FIG. 6, the two signals are out of phase in the first received frame.
This out-of-phase condition causes two problems in decoder/PLC system 300. First, as can be seen in FIG. 6, in the first received frame after the packet loss, decoded speech signal 602 and extrapolated speech signal 604 are out of phase in the overlap-add region and partially cancel each other, resulting in an audible artifact. Second, the state memories associated with subband ADPCM decoders 320 and 330 exhibit a degree of pitch modulation and are therefore sensitive to the phase of the speech signal. This problem is especially pronounced when the speech signal is near a pitch pulse, that is, near a portion of the speech signal where the signal level rises and falls rapidly. Because of the phase sensitivity of subband ADPCM decoders 320 and 330, and because extrapolated speech signal 604 is used to update the state memories of these decoders during the packet loss (as described above), a phase difference between extrapolated speech signal 604 and decoded speech signal 602 will produce significant artifacts in the received frames after the packet loss, owing to the mismatch between the internal states of the subband ADPCM encoders and decoders.
As will be described in more detail below, time warping is used to address the first problem, the destructive interference in the overlap-add region. Specifically, time warping is used to stretch or shrink the time axis of the decoded speech signal associated with the first received frame after the packet loss, so as to align it with the extrapolated speech signal used to conceal the last lost frame. Although time warping is described here with reference to a subband predictive coder with memory, it is a general technique that can be applied to other coders, including but not limited to coders with and without memory, predictive and non-predictive coders, and subband and full-band coders.
As will likewise be described in more detail below, re-phasing is used to address the second problem, the internal state mismatch between the subband ADPCM encoders and decoders caused by the misalignment between the last lost frame and the first received frame after the packet loss. Re-phasing is the process of setting the internal states of subband ADPCM decoders 320 and 330 to the states corresponding to the point in time at which the extrapolated speech waveform is in phase with the last input signal sample immediately before the first received frame after the packet loss. Although re-phasing is described below in the context of a backward-adaptive system, it can also be used to perform PLC for forward-adaptive predictive coders, or for any coder with memory.
i. Time Lag Calculation
Both the re-phasing and time-warping techniques require calculation of the number of samples by which the extrapolated speech signal and the decoded speech signal associated with the first received frame after the packet loss are misaligned. This misalignment is termed the "lag" and is labeled as such in FIG. 6. It can be thought of as the number of samples by which the decoded speech signal lags the extrapolated speech signal. In the case of FIG. 6, the lag is negative.
One general method for performing the time lag calculation is shown in flowchart 700 of FIG. 7, although other methods may also be used. One particular manner of performing this method is described below in Section D.
As shown in FIG. 7, the method of flowchart 700 begins at step 702, in which the speech waveform generated by full-band speech signal synthesizer 350 during the last lost frame is extrapolated into the first received frame after the packet loss.
In step 704, the time lag is calculated. At a conceptual level, the time lag is calculated by maximizing the correlation between the extrapolated speech signal and the decoded speech signal associated with the first received frame after the packet loss. As shown in FIG. 9, the extrapolated speech signal (denoted 904) is shifted in a range from -MAXOS to +MAXOS relative to the decoded speech signal associated with the first received frame (denoted 902), where MAXOS denotes the maximum offset, and the shift that maximizes the correlation is used as the time lag. This may be accomplished by searching for the peak of the normalized cross-correlation function R(k) between the signals over a time lag range of ±MAXOS around zero:
R(k) = \frac{\sum_{i=0}^{LSW-1} es(i-k)\, x(i)}{\sqrt{\sum_{i=0}^{LSW-1} es^2(i-k) \sum_{i=0}^{LSW-1} x^2(i)}}, \qquad k = -MAXOS, \ldots, MAXOS \quad (1)
where es is the extrapolated speech signal, x is the decoded speech signal associated with the first received frame after the packet loss, MAXOS is the maximum allowed offset, LSW is the lag search window length, and i = 0 denotes the first sample in the lag search window. The time lag that maximizes this function corresponds to the relative time offset between the two waveforms.
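The lag maximization described above can be sketched directly. The sinusoidal test signal, window length, and offsets below are illustrative only; the indexing convention assumes the caller supplies MAXOS extra extrapolated samples on each side of the lag search window, so that es[maxos + i - k] corresponds to es(i - k) in equation (1).

```python
import math

def time_lag(es, x, maxos):
    """Time lag per the normalized cross-correlation maximization:
    search k = -MAXOS..+MAXOS for the peak of R(k) between the
    extrapolated signal es and the decoded first-frame signal x.
    es must contain len(x) + 2*maxos samples."""
    lsw = len(x)                       # lag search window length
    best_k, best_r = 0, -2.0
    for k in range(-maxos, maxos + 1):
        num = sum(es[maxos + i - k] * x[i] for i in range(lsw))
        den = math.sqrt(sum(es[maxos + i - k] ** 2 for i in range(lsw)) *
                        sum(v * v for v in x))
        r = num / den if den > 0.0 else 0.0
        if r > best_r:
            best_k, best_r = k, r
    return best_k

# Synthetic example: the decoded frame is the same sinusoid delayed
# by 3 samples, so the decoded signal lags and the lag is positive.
sig = [math.sin(0.3 * n) for n in range(100)]
lsw, maxos = 40, 5
x = sig[17:17 + lsw]                   # decoded frame, delayed by 3
es = sig[15:15 + lsw + 2 * maxos]      # extrapolation with +/-MAXOS margin
lag = time_lag(es, x, maxos)
```

The positive result matches the sign convention described below: a positive lag means the decoded speech signal is delayed relative to the extrapolated speech signal.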
In one embodiment, the number of samples over which the correlation is computed (referred to as the lag search window) is determined adaptively based on the pitch period. For example, in the embodiment described below in Section D, the window size in samples (at a 16 kHz sampling rate) used for the coarse lag search is computed from the pitch period ppfe by an equation employing the floor function, where the floor function ⌊x⌋ of a real number x returns the largest integer less than or equal to x.
If the time lag calculated in step 704 is zero, this indicates that the extrapolated speech signal and the decoded speech signal associated with the first received frame are in phase; a positive value indicates that the decoded speech signal associated with the first received frame lags (is delayed relative to) the extrapolated speech signal, and a negative value indicates that the decoded speech signal associated with the first received frame leads the extrapolated speech signal. If the time lag equals zero, re-phasing and time warping need not be performed. In the example implementation presented below in Section D, the time lag is also set to zero if the last received frame before the packet loss is unvoiced (as indicated by the degree of voicing calculated for that frame, as described above in connection with the processing of Type 2, Type 3, and Type 4 frames), or if the first received frame after the packet loss is unvoiced.
To minimize the complexity of the correlation computation, the lag search can be performed using a multi-stage process. Flowchart 800 of Fig. 8 illustrates such a method, in which a coarse time lag search is first performed in step 802 using a downsampled representation of the signals, and a refined time lag search is then performed in step 804 using a higher-sampling-rate representation of the signals. For example, the signals may be downsampled to 4 kHz for the coarse time lag search, and the refined time lag search may be performed on the 8 kHz signals. To further reduce complexity, the downsampling can be carried out by simply sub-sampling the signals and ignoring any aliasing effects.
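The two-stage structure of flowchart 800 can be sketched as follows, under stated assumptions: the coarse stage correlates simple 2:1 subsampled copies of both signals (aliasing ignored, as the text permits), and the refinement stage searches a narrow full-rate range around the coarse estimate. Function names, the ±2-sample refinement range, and the test signals are illustrative, not taken from the reference code.

```python
import math

def xcorr_lag(es, x, lsw, k_range, origin, step=1):
    """Normalized cross-correlation search over the given k candidates;
    step=2 correlates every other sample (2:1 subsampling)."""
    best_k, best_r = 0, -2.0
    for k in k_range:
        num = sum(es[origin + i * step - k] * x[i * step] for i in range(lsw))
        e1 = sum(es[origin + i * step - k] ** 2 for i in range(lsw))
        e2 = sum(x[i * step] ** 2 for i in range(lsw))
        r = num / math.sqrt(e1 * e2) if e1 > 0 and e2 > 0 else 0.0
        if r > best_r:
            best_k, best_r = k, r
    return best_k

def two_stage_lag(es, x, lsw, maxos, origin):
    # Stage 1: coarse search on the subsampled signals, stepping k by 2.
    coarse = xcorr_lag(es, x, lsw // 2, range(-maxos, maxos + 1, 2),
                       origin, step=2)
    # Stage 2: refine at the full rate over a small range around the coarse lag.
    lo, hi = max(-maxos, coarse - 2), min(maxos, coarse + 2)
    return xcorr_lag(es, x, lsw, range(lo, hi + 1), origin, step=1)

es = [math.sin(0.3 * i) for i in range(300)]
x = [math.sin(0.3 * (i - 6)) for i in range(120, 180)]
lag = two_stage_lag(es, x, lsw=40, maxos=10, origin=120)
```

The coarse stage evaluates roughly half the candidates on half the samples, which is where the complexity saving comes from.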
One issue is what signal within the first received frame should be correlated against the extrapolated speech signal. A "brute force" approach would be to fully decode the first received frame to obtain the decoded speech signal and then compute the correlation at 16 kHz. To decode the first received frame, the internal states of sub-band ADPCM decoders 320 and 330 obtained by re-encoding the extrapolated speech signal up to the frame boundary (as described above) could be used. However, because the re-phasing algorithm described below will provide a better set of states for sub-band ADPCM decoders 320 and 330, the G.722 decoding would then have to be run again. Since this approach performs the complete decoding operation twice, it is very wasteful in terms of computational complexity. To address this problem, embodiments of the present invention implement a lower-complexity method.
According to the method for lower complexity, the G.722 bit stream that in first received frame, receives only by partial decoding of h to obtain low strap quantized difference signal d Lt(n).Normally G.722 in the decode procedure, the bit that receives from bit stream demultiplexer 310 converts differential signal d to by subband adpcm decoder 320 and 330 Lt(n) and d H(n), these two signals carry out convergent-divergent by self-adaptation scale factor backward, and obtain the subband voice signal by self-adaptation zero limit (pole-zero) fallout predictor backward, and these signals are synthesized by QMF composite filter group 340 then and produce the output voice signal.In each sampling in this processing procedure, the coefficient (coefficient) of the adaptive predictor in subband ADPCM decoding d device 320 and 330 will be upgraded.This renewal has solved the pith of decoder complexity.Owing to only need to be used for the signal that time lag is calculated, so in the lower complexity method, the two poles of the earth, 60 predictive filter coefficients still remain unchanged (they are not updated based on sampling one by one).In addition, because lag behind by the fundamental tone decision, and the fundamental tone basic frequency of people's voice is less than 4kHz, so only can obtain low strap approximate signal r L(n).More details about the method will provide in following D joint.
In the embodiment described in Section D below, the fixed filter coefficients of the two-pole, six-zero predictive filters are those obtained by re-encoding the extrapolated waveform during the packet loss up to the end of the last lost frame. In an alternative implementation, the fixed filter coefficients may be those in use at the end of the last received frame before the packet loss. In yet another alternative implementation, one or the other of these coefficient sets may be selected in an adaptive manner according to characteristics of the speech signal or other criteria.
ii, Re-phasing
During the re-phasing process, the internal states of sub-band ADPCM decoders 320 and 330 are adjusted to take into account the time lag between the extrapolated speech waveform and the decoded speech waveform associated with the first received frame after the packet loss. As described above, before the first received frame is processed, the internal states of sub-band ADPCM decoders 320 and 330 are estimated by re-encoding the output speech signal synthesized by full-band speech signal synthesizer 350 during the preceding lost frames. The internal states of these decoders exhibit a certain pitch modulation. Consequently, if the pitch period used during the waveform extrapolation associated with the last lost frame exactly followed the pitch track of the decoded speech signal, then stopping the re-encoding at the boundary between the last lost frame and the first received frame would leave the states of sub-band ADPCM decoders 320 and 330 in phase with the original signal. In general, however, as noted above, the pitch used for extrapolation does not match the pitch track of the decoded speech signal, and at the start of the first received frame after the packet loss the extrapolated speech signal and the decoded speech signal are misaligned.
To overcome this problem, re-phasing uses the time lag to control where the re-encoding process is stopped. In the example of Fig. 6, the time lag between extrapolated speech signal 604 and decoded speech signal 602 is negative. Suppose this time lag is denoted lag. Then, as can be seen, if the extrapolated speech signal is re-encoded to -lag samples beyond the frame boundary, the re-encoding will stop at a phase of extrapolated speech signal 604 that is consistent with the phase of decoded speech signal 602 at the frame boundary. The resulting state memories of sub-band ADPCM decoders 320 and 330 will then be in phase with the received data in the first good frame, thereby providing a better decoded signal. Accordingly, the number of sub-band samples to re-encode is given by:
N = FS - lag    (3)
where FS is the frame size, and all parameters are expressed in units of the sub-band sampling rate (8 kHz).
Figures 10A, 10B, and 10C depict the three re-phasing scenarios, respectively. On timeline 1000 of Fig. 10A, decoded speech signal 1002 leads extrapolated speech signal 1004, so re-encoding extends -lag samples beyond the frame boundary. On timeline 1010 of Fig. 10B, decoded speech signal 1012 lags extrapolated speech signal 1014, so re-encoding stops lag samples before the frame boundary. On timeline 1020 of Fig. 10C, extrapolated speech signal 1024 is in phase with decoded speech signal 1022 at the frame boundary (even though the pitch tracks during the lost frames differ), and re-encoding stops at the frame boundary. Note that, for convenience, Figs. 10A, 10B, and 10C all show the decoded speech signal during the lost frames, although decoder 300 cannot, of course, decode this portion of the original signal.
If the internal states of sub-band ADPCM decoders 320 and 330 were not re-phased, the re-encoding used to update these internal states could be carried out entirely during the processing of the lost frames. However, because the lag is not known until the first received frame after the packet loss arrives, the re-encoding cannot be completed entirely during the lost frames. A straightforward way to address this is to store the entire extrapolated waveform used to replace the last lost frame and then perform the re-encoding during the first received frame. However, this requires memory for FS + MAXOS samples, and the full complexity of the re-encoding also falls within the first received frame.
Figure 11 is a flowchart 1100 of a method of performing the re-encoding that redistributes a large portion of the computation into the preceding lost frame. Since MAXOS << FS, this is a reasonable and practical approach from the standpoint of computational load balancing.
As shown in Fig. 11, the method of flowchart 1100 begins at step 1102, in which re-encoding is performed in the lost frame up to the frame boundary, and the internal states of sub-band ADPCM decoders 320 and 330 at the frame boundary are stored. In addition, the intermediate internal states after re-encoding FS - MAXOS samples are also stored, as shown in step 1104. In step 1106, the extrapolated waveform generated for re-encoding samples FS - MAXOS + 1 through FS + MAXOS is saved in memory. In step 1108, during the first received frame after the packet loss, the internal states stored at the frame boundary are used as the initial states for the partial sub-band decoding (used to determine lag, as described above). Then, in decision step 1110, it is determined whether lag is positive or negative. If lag is positive, the internal states stored at sample FS - MAXOS are restored and MAXOS - lag samples are re-encoded, as shown in step 1112. Conversely, if lag is negative, the internal states at the frame boundary are used and an additional |lag| samples are re-encoded. With this method, at most MAXOS samples are re-encoded in the first received frame.
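The branch in steps 1110-1112 can be summarized in a few lines of Python. This is an illustrative sketch, not the reference code: `samples_to_reencode` stands in for the bookkeeping around the stored decoder states, and the value MAXOS = 14 is an assumption (consistent with the ±28-sample search range at 16 kHz, halved at the 8 kHz sub-band rate).

```python
FS = 80       # frame size at the 8 kHz sub-band rate (10 ms frame)
MAXOS = 14    # assumed maximum offset at the sub-band rate (illustrative)

def samples_to_reencode(lag):
    """Return (restart_point, count): the sample index at which re-encoding
    resumes and how many extrapolated samples are re-encoded, so that
    re-encoding always ends at N = FS - lag (equation (3))."""
    if lag > 0:
        # Stop `lag` samples before the boundary: resume from the stored
        # FS - MAXOS state and re-encode MAXOS - lag samples.
        return FS - MAXOS, MAXOS - lag
    # lag <= 0: stop |lag| samples past the boundary: resume from the
    # boundary state and re-encode the additional samples.
    return FS, -lag
```

Note that for every lag in [-MAXOS, MAXOS] the restart point plus the count equals FS - lag, and the count never exceeds MAXOS, which is the load-balancing property claimed for flowchart 1100.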
Persons skilled in the relevant art(s) will appreciate that storing more G.722 states along the way during the re-encoding process in the lost frame can reduce the amount of re-encoding required in the first good frame. In the extreme case, the G.722 states could be stored for every sample between FRAMESIZE - MAXOS and FRAMESIZE + MAXOS, in which case no re-encoding would need to be performed in the first received frame at all.
Compare the method for process flow diagram 1100, in a kind of alternative method that needs more recompile in first received frame, recompile is carried out at FS-MAXOS sampling during lost frames.Subband adpcm decoder 320 and 330 internal state and 2*MAXOS sampling of residue are stored in the storer so that use in first received frame.In first received frame, calculate hysteresis, and the sampling of appropriate amount is begun to carry out recompile from the G.722 state of storage based on this hysteresis.This method need be stored the sampling of 2*MAXOS reconstruct, the recompile of a copy of state and the 2*MAXOS at the most in first received frame sampling G.722.The shortcoming of this alternative method is to store subband adpcm decoder 320 on the frame boundaries that is used for decoding of above-mentioned lower complexity and time lag calculating and 330 internal state.
Ideally, the lag should correspond to the phase offset at the frame boundary between the extrapolated speech signal and the decoded speech signal associated with the first received frame. According to one embodiment of the present invention, a coarse lag estimate is first computed using a relatively long lag search window whose center is not aligned with the frame boundary. For example, the lag search window may be 1.5 times the pitch period. The lag search range (i.e., the number of samples by which the extrapolated speech signal is offset) is also relatively wide with respect to the original speech signal (e.g., ±28 samples). To improve the alignment, a lag search refinement is then performed. As part of the lag search refinement, the search window is moved to begin at the first sample of the first received frame. This can be accomplished by offsetting the extrapolated speech signal by the coarse lag estimate. The lag search window used in the lag search refinement can be relatively small, and the lag search range can also be relatively small (e.g., ±4 samples). The search method may be identical to the method described in Section 3.b.i above.
The concept of re-phasing has been presented above in the context of the backward-adaptive predictive G.722 coder. The concept can easily be extended to other backward-adaptive predictive coders, such as G.726. However, the use of re-phasing is not limited to backward-adaptive predictive coders. On the contrary, most memory-based coders exhibit phase dependence in their state memories and can therefore benefit from re-phasing.
iii, Time warping
As used herein, the term time warping refers to the process of stretching or shrinking a signal along the time axis. As discussed elsewhere herein, in order to maintain a continuous signal, embodiments of the present invention merge the extrapolated speech signal used to replace the lost frames with the decoded speech signal associated with the first received frame after the packet loss, in order to avoid a discontinuity. This is done by performing an overlap-add between the two signals. However, if the signals are out of phase with each other, waveform cancellation may occur and produce an audible artifact. Consider, for example, the overlap-add region in Fig. 6: performing an overlap-add in this region would result in significant waveform cancellation between the negative portion of decoded speech signal 602 and extrapolated speech signal 604.
According to embodiments of the present invention, time warping is applied to the decoded speech signal associated with the first received frame after the packet loss so as to bring the decoded speech signal into phase alignment with the extrapolated speech signal at some point within the first received frame. The amount of time warping is controlled by the value of the time lag. Thus, in one embodiment, if the time lag is positive, the decoded speech signal associated with the first received frame is stretched, and the overlap-add region can be positioned at the very beginning of the first received frame. If, however, the time lag is negative, the decoded speech signal is shrunk, and the overlap-add region is therefore positioned |lag| samples into the first received frame.
In the case of G.722, the first few samples at the beginning of the first received frame after the packet loss are not reliable, because the internal states of sub-band ADPCM decoders 320 and 330 are incorrect at the start of that frame. Therefore, in an embodiment of the present invention, when time warping is applied to the decoded speech signal associated with the first received frame, the first MIN_UNSTBL samples of the first received frame are excluded from the overlap-add region. For example, in the embodiment described in Section D below, MIN_UNSTBL is set to 16, i.e., the first 1 ms of a 10 ms, 160-sample frame. In this region, the extrapolated speech signal can be used as the output speech signal of decoder/PLC system 300. This embodiment effectively accounts for the re-convergence time of the decoded speech signal in the first received frame.
Figures 12A, 12B, and 12C illustrate several examples of this concept. In the example of Fig. 12A, timeline 1200 shows the decoded speech signal leading the extrapolated signal in the first received frame. The decoded speech signal is therefore shrunk by -lag samples through time warping (the time lag, lag, is negative). The result after applying the time warping is shown on timeline 1210. As shown on timeline 1210, the signals are in phase at or near the center of the overlap-add region. In this case, the overlap-add region is centered at MIN_UNSTBL - lag + OLA/2, where OLA is the number of samples in the overlap-add region. In the example of Fig. 12B, timeline 1220 shows the decoded speech signal lagging the extrapolated signal in the first received frame. Alignment is therefore achieved by stretching the decoded speech signal by lag samples through time warping. The result of applying the time warping is shown on timeline 1230. In this case, MIN_UNSTBL > lag, and there is still an unstable region in the first received frame. In the example of Fig. 12C, timeline 1240 again shows the decoded signal lagging the extrapolated signal, and the decoded speech signal is stretched by time warping to give the result shown on timeline 1250. In this case, however, as shown on timeline 1250, MIN_UNSTBL ≤ lag, so the overlap-add region can begin at the very first sample of the first received frame.
" homophase point " between decodeing speech signal and the extrapolation signal need be in the centre of overlap-add region, and overlap-add region is arranged on the place that begins near first received frame as far as possible.This has reduced the time that the synthetic speech signal of last lost frames association must be extrapolated to first received frame.In one embodiment of the invention, this is to estimate to finish by the time lag of carrying out two stages.In the phase one, calculate thick hysteresis estimated value by long relatively hysteresis search window, the center of window can be not consistent with the center of overlap-add region.For example, the hysteresis search window can be 1.5 times of pitch period.Hysteresis hunting zone (promptly being offset the number of samples of the extrapolation voice signal) broad that also compares (sampling promptly ± 28) with respect to primary speech signal.In order to improve degree of registration, so carry out the hysteresis search refinement.As the part of hysteresis search refinement, the hysteresis search window is provided with concentricity with the expectation stack of estimating according to thick hysteresis to obtain.This can finish by the extrapolation voice signal of being estimated to setover by thick hysteresis.The size of the hysteresis search window in the hysteresis search refinement can less (for example size of overlap-add region), and the hysteresis hunting zone also can less (sampling promptly ± 4).Searching method can be identical with the method in the above-mentioned 3.b.i joint.
Many techniques exist for performing time warping; one such technique involves piecewise single-sample shifts and overlap-adds. Flowchart 1300 of Fig. 13 describes a method of shrinking a signal using this technique. In this method, a sample is periodically dropped, as shown in step 1302. From the point of the dropped sample onward, the original signal and the signal shifted left (as a result of the drop) are overlap-added, as shown in step 1304. Flowchart 1400 of Fig. 14 describes a method of stretching a signal using this technique. In this method, a sample is periodically repeated, as shown in step 1402. From the point of the repeated sample onward, the original signal and the signal shifted right (as a result of the repeat) are overlap-added, as shown in step 1404. The length of the overlap-add window for these operations depends on the periodicity of the sample drops or repeats. To avoid excessive signal smoothing, a maximum overlap-add period (e.g., 8 samples) can be defined. The period at which samples are dropped or repeated depends on various factors, such as the frame size, the number of samples to drop or repeat, and whether dropping or repeating is being performed.
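A simplified sketch of the piecewise single-sample shift-and-overlap technique of flowcharts 1300 and 1400 follows. Per warp step, one sample is dropped (shrink) or repeated (stretch), and the spliced signal is cross-faded with the original over a short window. The triangular cross-fade, the fixed 8-sample overlap, and the fixed warp period are assumptions for illustration; the text only requires a bounded overlap-add period and a periodic drop/repeat.

```python
def warp_one_sample(sig, point, direction, ola=8):
    """Drop (direction=-1) or repeat (direction=+1) the sample at `point`,
    cross-fading over `ola` samples to hide the splice."""
    if direction < 0:
        shifted = sig[:point] + sig[point + 1:]     # drop one sample
    else:
        shifted = sig[:point + 1] + sig[point:]     # repeat one sample
    out = list(shifted)
    n = min(ola, len(sig) - point, len(shifted) - point)
    for i in range(n):
        w = (i + 1) / (ola + 1)                     # fade from old to new
        out[point + i] = (1.0 - w) * sig[point + i] + w * shifted[point + i]
    return out

def time_warp(sig, n_samples, direction, period):
    """Apply n_samples single-sample warps, one every `period` samples."""
    out = list(sig)
    for k in range(n_samples):
        out = warp_one_sample(out, (k + 1) * period, direction)
    return out

sig = [float(i % 7) for i in range(100)]
shrunk = time_warp(sig, 3, -1, period=20)       # 3 samples shorter
stretched = time_warp(sig, 3, +1, period=20)    # 3 samples longer
```

Each warp step changes the length by exactly one sample while leaving the signal before the splice point untouched, which is what keeps the distortion per step small.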
The amount of time warping can be limited. For example, in the G.722 system described in Section D below, the amount of time warping is limited to ±1.75 ms (or 28 samples of a 10 ms, 160-sample frame) per 10 ms frame. Warping beyond this range might eliminate the destructive interference described above, but would generally introduce other audible distortions. Thus, in such an embodiment, no time warping is performed when the time lag exceeds this range.
The system described in Section D below is designed to guarantee zero sample delay after the first received frame following a packet loss. For this reason, that system does not apply time warping to decoded speech beyond the first received frame. This in turn limits the amount of time warping that can be applied without incurring the audible distortion described in the preceding paragraph. However, persons skilled in the relevant art(s) will appreciate that, in a system that can tolerate some sample delay (after the first received frame following a packet loss), time warping can be applied to decoded speech extending beyond the first good frame, so that larger time lags can be accommodated without audible distortion. Of course, in such a system, if a frame is lost after the first received frame, the time warping can only be applied to the decoded speech signal associated with that first good frame. Such alternative embodiments are also within the scope and spirit of the present invention.
In an alternative embodiment of the present invention, both the decoded speech signal and the extrapolated speech signal can be time warped. This approach can provide better performance, for several reasons.
For example, if the time lag is -20, then according to the method described above the decoded speech signal is shrunk by 20 samples. In other words, 20 samples of extrapolated speech signal must be generated for use in the first received frame. This number can be reduced by also shrinking the extrapolated speech signal. For example, the extrapolated speech signal can be shrunk by 4 samples, leaving 16 samples to be absorbed by the decoded speech signal. This reduces the number of extrapolated samples that must be used in the first received frame, and also reduces the amount of warping that must be applied to the decoded speech signal. As noted above, in the embodiment of Section D the time warping must be limited to 28 samples. The reduction in the amount of time warping required to align the signals means that less distortion is introduced by the time warping process, and it increases the number of cases that can be improved.
Time warping both the decoded speech signal and the extrapolated speech signal should also yield a better waveform match in the overlap-add region. This can be explained as follows. Suppose, as in the previous example, that the lag is -20 samples; in other words, the decoded speech signal leads the extrapolated signal by 20 samples. The most likely cause of this situation is that the pitch period used for the extrapolation was larger than the true pitch. By also shrinking the extrapolated speech signal, the effective pitch of that signal in the overlap-add region becomes smaller and closer to the true pitch period. Likewise, because the decoded signal is shrunk by fewer samples, its effective pitch period remains larger than in the case where only the decoded signal is shrunk. As a result, the two waveforms in the overlap-add region have more closely matched pitch periods, and the waveforms match better.
If the lag is positive, the decoded speech signal is stretched. In this case, although stretching the extrapolated signal as well would increase the number of extrapolated samples used in the first received frame, it is less clear whether an improvement is obtained. However, if the packet loss is long and the two waveforms are significantly out of phase, this approach can provide improved performance. For example, if the lag is 30 samples, then in the method described above no warping is performed, because 30 exceeds the 28-sample limit, and a warp of 30 samples would likely introduce distortion of its own. However, if these 30 samples are distributed between the two signals, for example by stretching the extrapolated speech signal by 10 samples and the decoded speech signal by 20 samples, the signals can be aligned without applying excessive time warping to either one.
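One way to express this alternative embodiment is as a small policy function that splits the required warp between the two signals. The split ratios below are assumptions chosen so that the function reproduces the two worked examples in the text (lag of -20 → 4 + 16 samples; lag of 30 → 10 + 20 samples); the reference implementation may use a different rule.

```python
MAX_WARP = 28  # per-signal warp limit (samples), as in the Section D example

def split_warp(lag):
    """Return (extrapolated_warp, decoded_warp) whose sum is |lag|,
    or (0, 0) when even a split cannot cover the lag without exceeding
    the per-signal limit."""
    need = abs(lag)
    if need <= MAX_WARP:
        ext = need // 5            # put a small share on the extrapolation
    elif need - need // 3 <= MAX_WARP:
        ext = need // 3            # spill a third onto the extrapolation
    else:
        return 0, 0                # too far out of phase: skip warping
    return ext, need - ext
```

The point of the policy is that lags just beyond the 28-sample limit, which the single-signal method must give up on, become tractable once shared between the two signals.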
D, Details of an example implementation in a G.722 decoder
This section provides specific details pertaining to a particular implementation of the present invention in an ITU-T G.722 speech decoder. This example implementation operates on an intrinsic frame size of 10 milliseconds (ms) and can operate on packets or frames of any size that is a multiple of 10 ms. Longer input frames are treated as super frames, for which the PLC logic is invoked the appropriate number of times at its intrinsic 10 ms frame size. This incurs no additional delay compared with conventional G.722 decoding using the same frame size. These implementation details are provided by way of example only and are not intended to limit the present invention.
The embodiment described in this section meets the same complexity requirements as the PLC algorithm described in G.722 Appendix IV, but provides significantly better speech quality than the PLC algorithm described in that appendix. Because of its high quality, the embodiment described in this section is suitable for general applications of G.722 in which frame erasure or packet loss may occur. Such applications include, for example, Voice over Internet Protocol (VoIP), Voice over Wireless Fidelity (WiFi), and Digital Enhanced Cordless Telecommunications (DECT). The embodiment described in this section is easy to accommodate, except in applications that leave practically no complexity headroom after performing the basic G.722 decoding without PLC.
1, Abbreviations and conventions
Table 1 lists some of the abbreviations used in this section.
Abbreviation Description
ADPCM Adaptive differential PCM
ANSI American National Standards Institute
dB Decibel
DECT Digital Enhanced Cordless Telecommunications
DC Direct current
FIR Finite impulse response
Hz Hertz
LPC Linear predictive coding
OLA Overlap-add
PCM Pulse code modulation
PLC Packet loss concealment
PWE Periodic waveform extrapolation
STL2005 Software Tool Library 2005
QMF Quadrature mirror filter
VoIP Voice over Internet Protocol
WB Wideband
WiFi Wireless Fidelity
Table 1: Abbreviations
This description also uses certain conventions, some of which are explained here. The PLC algorithm operates at an intrinsic frame size of 10 ms, and the algorithm is therefore described in terms of 10 ms frames only. For larger packets (multiples of 10 ms), the received packet is decoded in 10 ms segments. Discrete time indices of signals at the 16 kHz sampling rate are generally denoted by "j" or "i". Discrete time indices of signals at the 8 kHz sampling rate are generally denoted by "n". Low-band signals (0-4 kHz) are identified by a subscript "L", and high-band signals (4-8 kHz) by a subscript "H". Where possible, this description reuses the notation of ITU-T Recommendation G.722.
The most frequently used symbols and their descriptions are listed in Table 2 below.
Table 2: Frequently used symbols and their descriptions
2, General description of the PLC algorithm
As described above with reference to Fig. 5, the frames processed by decoder/PLC system 300 are of six types: type 1, type 2, type 3, type 4, type 5, and type 6. A type 1 frame is any received frame beyond the eighth received frame after a packet loss. A type 2 frame is either of the first two lost frames associated with a packet loss. A type 3 frame is any of the third through sixth lost frames associated with a packet loss. A type 4 frame is any lost frame beyond the sixth frame associated with a packet loss. A type 5 frame is the received frame immediately following a packet loss. Finally, a type 6 frame is any of the second through eighth received frames following a packet loss. The PLC algorithm described in this section operates on a constant frame size of 10 ms in duration.
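The six-way classification above can be sketched as a small decision function. The counters used here (`lost_run`, `frames_since_loss`) are illustrative bookkeeping, not the reference implementation's state variables.

```python
def frame_type(received, lost_run, frames_since_loss):
    """Classify a frame per the scheme of Fig. 5.
    received: was this frame received?
    lost_run: consecutive lost frames so far, including this one (lost frames).
    frames_since_loss: 1 for the first received frame after a loss, 2 for
    the second, ... (received frames)."""
    if not received:
        if lost_run <= 2:
            return 2      # first or second lost frame
        if lost_run <= 6:
            return 3      # third through sixth lost frame
        return 4          # any lost frame beyond the sixth
    if frames_since_loss == 1:
        return 5          # first received frame after the loss
    if frames_since_loss <= 8:
        return 6          # second through eighth received frame
    return 1              # steady-state received frame
```

The type drives which processing path (standard decoding, WB PCM PLC, or the constrained decoders) is invoked for the frame.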
Type 1 frames are decoded in accordance with the G.722 standard, with the addition of the maintenance of certain state memories to facilitate the PLC and associated processing. Figure 15 is a block diagram 1500 of logic that performs these operations in accordance with an embodiment of the present invention. Specifically, as shown in Fig. 15, when a type 1 frame is being processed, the low-band ADPCM encoder indices I_L(n) are received from a bit de-multiplexer (not shown in Fig. 15) and decoded by low-band ADPCM decoder 1510 to produce a sub-band speech signal. Similarly, the high-band ADPCM encoder indices I_H(n) are received from the bit de-multiplexer and decoded by high-band ADPCM decoder 1520 to produce a sub-band speech signal. QMF synthesis filter bank 1530 combines the low-band and high-band speech signals to produce the decoded output signal x_out(j). These operations are consistent with standard G.722 decoding.
In addition to these standard G.722 decoding operations, when a type 1 frame is being processed, logic block 1540 updates PLC-related low-band ADPCM state memories, logic block 1550 updates PLC-related high-band ADPCM state memories, and logic block 1560 updates state memories related to the WB PCM PLC. These state memory updates serve to facilitate the PLC processing associated with the other frame types.
For frames of type 2, type 3, and type 4, wideband (WB) PCM PLC is performed in the 16 kHz output speech domain. A block diagram 1600 of logic used to perform the WB PCM PLC is provided in Fig. 16. The previous output speech x_out(j) of the G.722 decoder is buffered and passed to the WB PCM PLC logic. The WB PCM PLC algorithm is based on periodic waveform extrapolation (PWE), and pitch estimation is an important component of the WB PCM PLC logic. First, a coarse pitch is estimated based on a signal downsampled to 2 kHz in a weighted speech domain. Subsequently, this estimate is refined at full resolution using the original 16 kHz sampling. The output x_PLC(i) of the WB PCM PLC logic is a linear combination of the periodically extrapolated waveform and noise shaped by the PLC. For extended frame erasures, the output waveform x_PLC(i) is gradually attenuated. The attenuation begins 20 ms after the frame loss and is complete 60 ms after the frame loss.
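The attenuation schedule quoted above can be sketched as a gain curve applied to the extrapolated output. The linear ramp shape is an assumption; the text specifies only the two endpoints (full strength until 20 ms into the loss, fully attenuated by 60 ms).

```python
def plc_gain(t_ms):
    """Attenuation gain applied to the extrapolated output x_PLC, as a
    function of time (ms) since the start of the frame loss. Assumed to
    ramp linearly from 1.0 at 20 ms to 0.0 at 60 ms."""
    if t_ms <= 20.0:
        return 1.0
    if t_ms >= 60.0:
        return 0.0
    return (60.0 - t_ms) / 40.0
```

A ramp of this kind trades a short stretch of plausible extrapolated speech for silence before the repeated pitch cycles would start to sound buzzy.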
As shown in block diagram 1700 of Fig. 17, for frames of type 2, type 3, and type 4, the output x_PLC(i) of the WB PCM PLC logic is passed through G.722 QMF analysis filter bank 1702 to obtain the corresponding sub-band signals, which are subsequently passed to modified low-band ADPCM encoder 1704 and modified high-band ADPCM encoder 1706, respectively, to update the states and memories of the decoder. Only partial, simplified sub-band ADPCM encoding is used for this update.
The processing performed by the logic shown in Figs. 16 and 17 takes place during lost frames. Modified low-band ADPCM encoder 1704 and modified high-band ADPCM encoder 1706 are both simplified to reduce complexity; they are described in more detail elsewhere in this application. One feature present in encoders 1704 and 1706 (which does not exist in conventional G.722 sub-band ADPCM encoders) is an adaptive reset of the encoders based on signal properties and the duration of the packet loss.
The most complex processing associated with the PLC algorithm is that for type 5 frames, a type 5 frame being the first received frame immediately following a packet loss. It is during this frame that the transition from the extrapolated waveform to the normally decoded waveform takes place. The techniques used in processing a type 5 frame include re-phasing and time warping, which are described in more detail herein. Additionally, when a type 5 frame is being processed, the QMF synthesis filter bank in the decoder is updated in a manner described in more detail herein. Other functions associated with the processing of a type 5 frame include setting the low-band and high-band log scale factors at the beginning of the first received frame after the packet loss.
Frames of both type 5 and type 6 are decoded using the modified and constrained sub-band ADPCM decoders described herein. Figure 19 depicts a block diagram 1900 of logic used to process frames of type 5 and type 6. As shown in Fig. 19, logic 1970 applies constraints and controls to sub-band ADPCM decoders 1910 and 1920 when frames of type 5 and/or type 6 are being processed. The constraints and controls on the sub-band ADPCM decoders are applied during the first 80 ms after a packet loss. Some of them do not extend beyond 40 ms, while other constraints and controls are adaptive in duration or degree. The constraint and control mechanisms are described in more detail elsewhere in this application. As shown in Fig. 19, logic blocks 1940, 1950, and 1960 are used to update the state memories after a type 5 or type 6 frame has been processed.
Under the condition of error-free channel, the PLC algorithm of describing in this section with G.722 be bit (bit-exact) accurately.In addition, under error condition, this algorithm behind packet loss beyond the 8th frame with G.722 be identical, if there is not the bit mistake, should be able to obtain to the G.722 convergence of error-free output.
The PLC algorithm of describing in this section is supported the frame sign of the multiple of any 10ms of being.For bag size, only need the PLC algorithm to be called repeatedly at interval with 10ms at each bag greater than 10ms.Therefore, in the further part of this section, will the PLC algorithm be described according to the constant frame size of 10ms.
3. Waveform Extrapolation of the G.722 Output
For lost frames corresponding to a packet loss (frames of type 2, type 3 and type 4), the WB PCM PLC logic depicted in Figure 16 extrapolates the G.722 output waveform associated with the previous frame to produce a replacement waveform for the current frame. This extrapolated wideband signal waveform x_PLC(i) is then used as the output waveform of the G.722 PLC logic when processing frames of type 2, type 3 and type 4. For convenience in describing the various blocks of Figure 16, once the WB PCM PLC logic has calculated the signal x_PLC(i) for a lost frame, x_PLC(i) is written into a buffer that stores x_out(j), where x_out(j) is the final output of the complete G.722 decoder/PLC system. Each processing block of Figure 16 is now described in more detail.
a. Eighth-Order LPC Analysis
Block 1604 performs an eighth-order LPC analysis near the end of the frame processing loop, after the x_out(j) signal associated with the current frame has been calculated and stored in the buffer. This eighth-order LPC analysis is an autocorrelation LPC analysis with a 10 ms asymmetric analysis window applied to the x_out(j) signal associated with the current frame. The asymmetric window is defined as follows:
w(j) = (1/2)·[1 − cos((j+1)π/121)],  for j = 0, 1, 2, ..., 119
w(j) = cos((j−120)π/80),  for j = 120, 121, ..., 159    (4)
Let x_out(0), x_out(1), ..., x_out(159) denote the G.722 decoder/PLC system output wideband signal samples associated with the current frame. The windowing operation is performed as follows:
x_w(j) = x_out(j)·w(j),  j = 0, 1, 2, ..., 159.    (5)
Next, the autocorrelation coefficients are calculated as follows:
r(i) = Σ_{j=i}^{159} x_w(j)·x_w(j−i),  i = 0, 1, 2, ..., 8.    (6)
Spectral smoothing and white-noise correction operations are then applied to the autocorrelation coefficients, as follows:
r̂(0) = 1.0001 × r(0)
r̂(i) = r(i)·e^{−(2πiσ/f_s)²/2},  i = 1, 2, ..., 8,    (7)
where f_s = 16000 is the sampling rate of the input signal and σ = 40.
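The windowing, autocorrelation, and smoothing steps of equations (4) through (7) can be sketched as follows in floating-point Python (the function names and the use of floats rather than the codec's fixed-point arithmetic are illustrative assumptions):

```python
import math

def asymmetric_window():
    # Asymmetric 10 ms analysis window of equation (4): raised-cosine
    # ramp-up over the first 120 samples, quarter-cosine ramp-down over
    # the last 40 samples.
    w = []
    for j in range(160):
        if j < 120:
            w.append(0.5 * (1.0 - math.cos((j + 1) * math.pi / 121.0)))
        else:
            w.append(math.cos((j - 120) * math.pi / 80.0))
    return w

def smoothed_autocorrelation(x_out, order=8, fs=16000.0, sigma=40.0):
    # Window the frame (eq. 5), compute the autocorrelation (eq. 6),
    # then apply white-noise correction and Gaussian spectral smoothing
    # (eq. 7).
    w = asymmetric_window()
    xw = [x_out[j] * w[j] for j in range(160)]
    r = [sum(xw[j] * xw[j - i] for j in range(i, 160)) for i in range(order + 1)]
    r_hat = [1.0001 * r[0]]
    for i in range(1, order + 1):
        r_hat.append(r[i] * math.exp(-0.5 * (2.0 * math.pi * i * sigma / fs) ** 2))
    return r_hat
```

The white-noise correction (the 1.0001 factor on r(0)) guarantees r̂(0) dominates the other lags, which keeps the subsequent Levinson-Durbin recursion numerically well behaved.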
Next, a Levinson-Durbin recursion is used to convert the autocorrelation coefficients r̂(i) into the LPC predictor coefficients â_i, i = 0, 1, ..., 8. If the Levinson-Durbin recursion exits prematurely before completion (for example, because the prediction residual energy E(i) is less than zero), the short-term predictor coefficients of the last frame are used in the current frame. To handle this exception, the initial values of the â_i array are set to the values obtained for the last frame: â_0 = 1, and â_i equal to the â_i of the last frame for i = 1, 2, ..., 8. The Levinson-Durbin recursion is specified as follows:
1. If r̂(0) ≤ 0, use the â_i array of the last frame and exit the Levinson-Durbin recursion.
2. E(0) = r̂(0)
3. k_1 = −r̂(1)/r̂(0)
4. â_1^(1) = k_1
5. E(1) = (1 − k_1²)·E(0)
6. If E(1) ≤ 0, use the â_i array of the last frame and exit the Levinson-Durbin recursion.
7. For i = 2, 3, 4, ..., 8, do the following:
a. k_i = −[r̂(i) + Σ_{j=1}^{i−1} â_j^(i−1)·r̂(i−j)] / E(i−1)
b. â_i^(i) = k_i
c. â_j^(i) = â_j^(i−1) + k_i·â_{i−j}^(i−1),  for j = 1, 2, ..., i−1
d. E(i) = (1 − k_i²)·E(i−1)
e. If E(i) ≤ 0, use the â_i array of the last frame and exit the Levinson-Durbin recursion.
If the recursion exits prematurely, the â_i array of the last processed frame is used. If the recursion completes smoothly (the normal case), the LPC predictor coefficients are taken as:
â_0 = 1    (8)
and
â_i = â_i^(8),  for i = 1, 2, ..., 8.    (9)
By applying bandwidth expansion to the coefficients obtained above, the final set of LPC predictor coefficients is obtained as follows:
a_i = (0.96852)^i·â_i,  for i = 0, 1, ..., 8.    (10)
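The recursion with its previous-frame fallback, followed by the bandwidth expansion of equation (10), can be sketched as follows (a floating-point illustration; the fixed-point details of the actual codec are omitted):

```python
def levinson_durbin_with_fallback(r_hat, a_prev):
    # Levinson-Durbin recursion of subsection (a).  Whenever the
    # recursion must exit early (non-positive r_hat[0] or non-positive
    # prediction residual energy), the coefficient array of the last
    # frame is reused, as the text specifies.
    order = len(r_hat) - 1
    if r_hat[0] <= 0.0:
        return list(a_prev)
    a = [0.0] * (order + 1)
    e = r_hat[0]                      # E(0)
    k = -r_hat[1] / r_hat[0]          # k_1
    a[1] = k
    e *= 1.0 - k * k                  # E(1)
    if e <= 0.0:
        return list(a_prev)
    for i in range(2, order + 1):
        acc = r_hat[i] + sum(a[j] * r_hat[i - j] for j in range(1, i))
        k = -acc / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        e *= 1.0 - k * k
        if e <= 0.0:
            return list(a_prev)
    a[0] = 1.0
    return a

def bandwidth_expand(a_hat, gamma=0.96852):
    # Equation (10): a_i = (0.96852)**i * a_hat_i.
    return [(gamma ** i) * c for i, c in enumerate(a_hat)]
```

For an autocorrelation sequence of a first-order process (r(i) = 0.9^i), the recursion should recover a single non-zero predictor tap of about −0.9, with the remaining taps near zero.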
b. Calculation of the Short-Term Prediction Residual Signal
Block 1602 of Figure 16 (labeled "A(z)") represents a short-term linear prediction error filter whose coefficients are the a_i, i = 0, 1, ..., 8, calculated above. Block 1602 operates after the eighth-order LPC analysis has been performed. It calculates the short-term prediction residual signal d(j) as follows:
d(j) = x_out(j) + Σ_{i=1}^{8} a_i·x_out(j−i),  for j = 0, 1, 2, ..., 159.    (11)
By convention, the time index of the current frame continues from the time index of the last processed frame. In other words, if the time index range 0, 1, 2, ..., 159 represents the current frame, then the time index range −160, −159, ..., −1 represents the last processed frame. Accordingly, when the index (j−i) in the equation above is negative, it points to a signal sample near the end of the last processed frame.
c. Calculation of the Scale Factor
Block 1606 of Figure 16 calculates the average magnitude of the short-term prediction residual signal associated with the current frame. This operation is performed immediately after block 1602 calculates the short-term prediction residual signal d(j) in the manner described above. The average magnitude avm is calculated as follows:
avm = (1/160)·Σ_{j=0}^{159} |d(j)|.    (12)
If the next frame to be processed is a lost frame (in other words, a frame corresponding to a packet loss), this average magnitude may be used as a scale factor for a white Gaussian noise sequence (if the current frame is unvoiced).
d. Calculation of the Weighted Speech Signal
Block 1608 of Figure 16 (labeled "1/A(z/γ)") represents a weighted short-term synthesis filter. Block 1608 operates after the short-term prediction residual signal d(j) of the current frame has been calculated in the manner described above (see block 1602). The coefficients a′_i, i = 1, 2, ..., 8, of this weighted short-term synthesis filter are calculated as follows, with γ_1 = 0.75:
a′_i = γ_1^i·a_i,  i = 1, 2, ..., 8.    (13)
The short-term prediction residual signal d(j) is passed through this weighted synthesis filter. The corresponding output weighted speech signal xw(j) is calculated as follows:
xw(j) = d(j) − Σ_{i=1}^{8} a′_i·xw(j−i),  j = 0, 1, 2, ..., 159.    (14)
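Equations (11) through (14) can be illustrated together in a short floating-point sketch (the helper name and the assumption of zero weighted-synthesis filter memory are illustrative only; the codec carries the filter memory across frames):

```python
def short_term_analysis(x_out_ext, a, gamma1=0.75):
    # x_out_ext holds 8 samples of the previous frame followed by the
    # 160 samples of the current frame (indices 8..167 below).
    # Residual (eq. 11), average magnitude (eq. 12), and weighted
    # synthesis filtering (eqs. 13-14).
    d = [x_out_ext[8 + j] + sum(a[i] * x_out_ext[8 + j - i] for i in range(1, 9))
         for j in range(160)]
    avm = sum(abs(v) for v in d) / 160.0
    ap = [(gamma1 ** i) * a[i] for i in range(9)]     # eq. (13)
    xw = [0.0] * 160                                  # zero filter memory assumed
    for j in range(160):
        xw[j] = d[j] - sum(ap[i] * (xw[j - i] if j - i >= 0 else 0.0)
                           for i in range(1, 9))
    return d, avm, xw
```

With a trivial predictor (a_i = 0 for i ≥ 1), the residual equals the input and the weighted synthesis filter is a pass-through, which gives a quick sanity check of the index bookkeeping.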
e. Eight-to-One Decimation
Block 1616 of Figure 16 passes the weighted speech signal output by block 1608 through a 60th-order minimum-phase finite impulse response (FIR) filter, and then performs 8:1 decimation to down-sample the resulting 16 kHz low-pass-filtered weighted speech signal to the 2 kHz down-sampled weighted speech signal xwd(n). This decimation is performed after the weighted speech signal has been calculated. To reduce complexity, the FIR low-pass filtering operation is carried out only when a new sample of xwd(n) is needed. The down-sampled weighted speech signal xwd(n) is thus calculated as follows:
xwd(n) = Σ_{i=0}^{59} b_i·xw(8n + 7 − i),  n = 0, 1, 2, ..., 19,    (15)
where b_i, i = 0, 1, 2, ..., 59, are the coefficients of the 60th-order FIR low-pass filter given in Table 3.
Lag i | b_i (Q15) | Lag i | b_i (Q15) | Lag i | b_i (Q15)
0 | 1209 | 20 | -618 | 40 | 313
1 | 728 | 21 | -941 | 41 | 143
2 | 1120 | 22 | -1168 | 42 | -6
3 | 1460 | 23 | -1289 | 43 | -126
4 | 1845 | 24 | -1298 | 44 | -211
5 | 2202 | 25 | -1199 | 45 | -259
6 | 2533 | 26 | -995 | 46 | -273
7 | 2809 | 27 | -701 | 47 | -254
8 | 3030 | 28 | -348 | 48 | -210
9 | 3169 | 29 | 20 | 49 | -152
10 | 3207 | 30 | 165 | 50 | -89
11 | 3124 | 31 | 365 | 51 | -30
12 | 2927 | 32 | 607 | 52 | 21
13 | 2631 | 33 | 782 | 53 | 58
14 | 2257 | 34 | 885 | 54 | 81
15 | 1814 | 35 | 916 | 55 | 89
16 | 1317 | 36 | 881 | 56 | 84
17 | 789 | 37 | 790 | 57 | 66
18 | 267 | 38 | 654 | 58 | 41
19 | -211 | 39 | 490 | 59 | 17
Table 3: Coefficients of the 60th-order FIR filter
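The decimation of equation (15) can be sketched as follows (a floating-point illustration; the taps of Table 3 would each be divided by 32768 to convert from Q15, and the sketch assumes zero filter memory for samples before the frame, whereas the codec retains the previous frame's weighted speech):

```python
def decimate_8to1(xw, b):
    # Equation (15): xwd(n) = sum_i b_i * xw(8n + 7 - i), n = 0..19.
    # b is the low-pass FIR filter (Table 3 gives the 60 Q15 taps).
    n_out = 20
    taps = len(b)
    out = []
    for n in range(n_out):
        acc = 0.0
        for i in range(taps):
            k = 8 * n + 7 - i
            acc += b[i] * (xw[k] if k >= 0 else 0.0)  # past samples assumed zero
        out.append(acc)
    return out
```

With a single-tap filter b = [1.0], each output sample is simply xw(8n + 7), which makes the 8:1 index stride easy to verify.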
f. Coarse Pitch Period Extraction
To reduce computational complexity, the WB PCM PLC logic performs pitch extraction in two stages: first, a coarse pitch period is determined at the time resolution of the 2 kHz down-sampled signal; then, pitch period refinement is performed at the time resolution of the 16 kHz undecimated signal. This pitch extraction is performed only after the down-sampled weighted speech signal xwd(n) has been calculated. This subsection describes the first-stage coarse pitch period extraction algorithm performed by block 1620 of Figure 16. The algorithm is based on maximizing a normalized cross-correlation, with some additional decision logic.
A pitch analysis window of 15 ms is used for the coarse pitch period extraction. The end of the pitch analysis window is aligned with the end of the current frame. At a sampling rate of 2 kHz, 15 ms corresponds to 30 samples. Without loss of generality, let the index range n = 0 to n = 29 correspond to the pitch analysis window for xwd(n). The coarse pitch period extraction algorithm begins by calculating the following values:
c(k) = Σ_{n=0}^{29} xwd(n)·xwd(n−k),    (16)
E(k) = Σ_{n=0}^{29} [xwd(n−k)]²,    (17)
and
c2(k) = c(k)².    (18)
The calculations above are performed for all integers in the range k = MINPPD − 1 to k = MAXPPD + 1, where MINPPD = 5 and MAXPPD = 33 are the minimum and maximum pitch periods in the down-sampled domain, respectively. The coarse pitch period extraction algorithm then searches through the range k = MINPPD, MINPPD + 1, MINPPD + 2, ..., MAXPPD to find all local peaks of the array {c2(k)/E(k)} for which c(k) > 0. (A value is defined as a local peak if both of its adjacent values are smaller than it.) Let N_p denote the number of such positive local peaks. Let k_p(j), j = 1, 2, ..., N_p, be the indices where c2(k_p(j))/E(k_p(j)) is a local peak and c(k_p(j)) > 0, with k_p(1) < k_p(2) < ... < k_p(N_p). For convenience, c2(k)/E(k) will be referred to as the "normalized correlation square."
If N_p = 0, that is, if the function c2(k)/E(k) has no positive local peak, the algorithm searches for the negative local peak with the largest magnitude of |c2(k)/E(k)|. If such a largest negative local peak is found, the corresponding index k is used as the output coarse pitch period cpp, and the processing of block 1620 is terminated. If the normalized correlation square function c2(k)/E(k) has neither a positive local peak nor a negative local peak, the output coarse pitch period is set to cpp = MINPPD and the processing of block 1620 is terminated. If N_p = 1, the output coarse pitch period is set to cpp = k_p(1) and the processing of block 1620 is terminated.
If there are at least two local peaks (N_p ≥ 2), this block uses Algorithms A, B, C and D (described below), in that order, to determine the output coarse pitch period cpp. Variables calculated in the earlier of these four algorithms carry their values over to the later algorithms.
Algorithm A below is used to identify the largest quadratically interpolated peak around the local peaks of the normalized correlation square c2(k_p)/E(k_p). Quadratic interpolation is performed on c(k_p), while linear interpolation is performed on E(k_p). This interpolation is carried out at the time resolution of the 16 kHz undecimated speech signal. In the algorithm below, D denotes the decimation factor used when decimating xw(n) to xwd(n); thus, D = 8 here.
Algorithm A: Find the largest quadratically interpolated peak around c2(k_p)/E(k_p):
A. Set c2max = −1, Emax = 1, jmax = 0.
B. For j = 1, 2, ..., N_p, do the following 12 steps:
1. Set a = 0.5·[c(k_p(j)+1) + c(k_p(j)−1)] − c(k_p(j))
2. Set b = 0.5·[c(k_p(j)+1) − c(k_p(j)−1)]
3. Set ji = 0
4. Set ei = E(k_p(j))
5. Set c2m = c2(k_p(j))
6. Set Em = E(k_p(j))
7. If c2(k_p(j)+1)·E(k_p(j)−1) > c2(k_p(j)−1)·E(k_p(j)+1), do the remainder of step 7:
a. Δ = [E(k_p(j)+1) − ei]/D
b. For k = 1, 2, ..., D/2, do the following part of step 7:
i. ci = a·(k/D)² + b·(k/D) + c(k_p(j))
ii. ei ← ei + Δ
iii. If (ci)²·Em > (c2m)·ei, do the following three lines:
a. ji = k
b. c2m = (ci)²
c. Em = ei
8. If c2(k_p(j)+1)·E(k_p(j)−1) ≤ c2(k_p(j)−1)·E(k_p(j)+1), do the remainder of step 8:
a. Δ = [E(k_p(j)−1) − ei]/D
b. For k = −1, −2, ..., −D/2, do the following part of step 8:
i. ci = a·(k/D)² + b·(k/D) + c(k_p(j))
ii. ei ← ei + Δ
iii. If (ci)²·Em > (c2m)·ei, do the following three lines:
a. ji = k
b. c2m = (ci)²
c. Em = ei
9. Set lag(j) = k_p(j) + ji/D
10. Set c2i(j) = c2m
11. Set Ei(j) = Em
12. If c2m × Emax > c2max × Em, do the following three lines:
a. jmax = j
b. c2max = c2m
c. Emax = Em
The symbol ← indicates that the parameter on the left side is updated with the value on the right side.
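The interpolation loop of steps B.1 through B.8 of Algorithm A, applied to a single local peak, can be sketched as follows (an illustrative floating-point helper, not the full algorithm with its global peak bookkeeping):

```python
def interpolate_peak(c, E, kp, D=8):
    # Quadratic interpolation of c(k) and linear interpolation of E(k)
    # around the local peak index kp, per steps B.1-B.8 of Algorithm A.
    # Returns (fractional lag, c2m, Em) maximizing c(k)^2 / E(k).
    a = 0.5 * (c[kp + 1] + c[kp - 1]) - c[kp]
    b = 0.5 * (c[kp + 1] - c[kp - 1])
    ji, c2m, Em = 0, c[kp] ** 2, E[kp]
    # Pick the side on which the interpolated ratio can increase (step 7 or 8).
    if c[kp + 1] ** 2 * E[kp - 1] > c[kp - 1] ** 2 * E[kp + 1]:
        ks, delta = range(1, D // 2 + 1), (E[kp + 1] - E[kp]) / D
    else:
        ks, delta = range(-1, -D // 2 - 1, -1), (E[kp - 1] - E[kp]) / D
    ei = E[kp]
    for k in ks:
        ci = a * (k / D) ** 2 + b * (k / D) + c[kp]
        ei += delta
        if ci * ci * Em > c2m * ei:
            ji, c2m, Em = k, ci * ci, ei
    return kp + ji / D, c2m, Em
```

For a symmetric peak the fractional offset stays at zero, while a plateau rising to the right pulls the interpolated lag toward a half-sample offset.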
To avoid picking a coarse pitch period that is around an integer multiple of the true coarse pitch period, each time lag corresponding to a local peak of c2(k_p)/E(k_p) is examined to determine whether it is close enough to the output coarse pitch period of the last processed frame (denoted cpplast; for the very first frame, cpplast is initialized to 12). A time lag is considered close enough if it is within 25% of cpplast. For all time lags within 25% of cpplast, the corresponding quadratically interpolated peak values of the normalized correlation square c2(k_p)/E(k_p) are compared, and the interpolated time lag corresponding to the largest normalized correlation square is selected for further processing. Algorithm B below performs this task, using the interpolated arrays c2i(j) and Ei(j) calculated in Algorithm A above.
Algorithm B: Among the time lags close to the output coarse pitch period of the last frame, find the one that maximizes the interpolated c2(k_p)/E(k_p):
A. Set index im = −1
B. Set c2m = −1
C. Set Em = 1
D. For j = 1, 2, ..., N_p, do the following:
1. If |k_p(j) − cpplast| ≤ 0.25 × cpplast, do the following:
a. If c2i(j)·Em > c2m·Ei(j), do the following three lines:
i. im = j
ii. c2m = c2i(j)
iii. Em = Ei(j)
Note that if no time lag k_p(j) is within 25% of cpplast, the value of the index im remains −1 after Algorithm B is executed. If one or more time lags are within 25% of cpplast, im identifies the one among them with the largest normalized correlation square.
Next, Algorithm C determines whether an alternative time lag in the first half of the pitch range should be selected as the output coarse pitch period. The algorithm searches all interpolated time lags lag(j) that are less than 16, and checks whether each of them has a sufficiently large local peak of the normalized correlation square near every one of its integer multiples (up to and including 32). If one or more time lags satisfy this condition, the smallest qualifying time lag is selected as the output coarse pitch period.
Again, the variables calculated in Algorithms A and B above carry their final values into Algorithm C below. The parameter MPDTH is 0.06, and the threshold array MPTH(k) is given by MPTH(2) = 0.7, MPTH(3) = 0.55, MPTH(4) = 0.48, MPTH(5) = 0.37, and MPTH(k) = 0.30 for k > 5.
Algorithm C: Check whether an alternative time lag in the first half of the coarse pitch period range should be selected as the output coarse pitch period:
A. For j = 1, 2, 3, ..., N_p in that order, while lag(j) < 16, do the following:
1. If j ≠ im, set threshold = 0.73; otherwise, set threshold = 0.4.
2. If c2i(j)·Emax ≤ threshold·c2max·Ei(j), disqualify this j, skip step 3 for this j, increment j by 1 and go back to step 1.
3. If c2i(j)·Emax > threshold·c2max·Ei(j), do the following:
a. For k = 2, 3, 4, ..., while k·lag(j) < 32, do the following:
i. s = k·lag(j)
ii. a = (1 − MPDTH)·s
iii. b = (1 + MPDTH)·s
iv. Go through m = j+1, j+2, j+3, ..., N_p in that order, and see whether any of the time lags lag(m) is between a and b. If none of them is, disqualify this j, stop step 3, increment j by 1 and go back to step 1. If there is at least one such m satisfying a < lag(m) < b and c2i(m)·Emax > MPTH(k)·c2max·Ei(m), then a sufficiently large peak of the normalized correlation square is considered found in the neighborhood of the k-th integer multiple of lag(j); in this case, stop step 3.a.iv, increment k by 1 and go back to step 3.a.i.
b. If step 3.a completes without being stopped prematurely, that is, if there is a sufficiently large interpolated peak of the normalized correlation square within ±100×MPDTH% of every integer multiple of lag(j) that is less than 32, then stop this algorithm, skip Algorithm D, and take cpp = lag(j) as the final output coarse pitch period.
If Algorithm C above completes without selecting a qualified output coarse pitch period cpp, Algorithm D examines the largest local peak of the normalized correlation square around the coarse pitch period of the last frame (found in Algorithm B above) and makes the final decision on the output coarse pitch period cpp. Again, the variables calculated in Algorithms A and B pass their final values to Algorithm D below. The parameters are SMDTH = 0.095 and LPTH1 = 0.78.
Algorithm D: Final decision on the output coarse pitch period:
A. If im = −1, that is, if there is no sufficiently large local peak of the normalized correlation square around the coarse pitch period of the last frame, then use the cpp calculated at the end of Algorithm A as the final output coarse pitch period, and exit this algorithm.
B. If im = jmax, that is, if the largest local peak of the normalized correlation square around the coarse pitch period of the last frame is also the global maximum over all interpolated peaks of the normalized correlation square in this frame, then use the cpp calculated at the end of Algorithm A as the final output coarse pitch period, and exit this algorithm.
C. If im < jmax, do the following:
1. If c2m × Emax > 0.43 × c2max × Em, do the remaining part of step C:
a. If lag(im) > MAXPPD/2, set the output cpp = lag(im) and exit this algorithm.
b. Otherwise, for k = 2, 3, 4, 5, do the following:
i. s = lag(jmax)/k
ii. a = (1 − SMDTH)·s
iii. b = (1 + SMDTH)·s
iv. If lag(im) > a and lag(im) < b, set the output cpp = lag(im) and exit this algorithm.
D. If im > jmax, do the following:
1. If c2m × Emax > LPTH1 × c2max × Em, set the output cpp = lag(im) and exit this algorithm.
E. If execution of the algorithm reaches this point, none of the steps above has selected a final output coarse pitch period. In that case, the cpp calculated at the end of Algorithm A is simply accepted as the final output coarse pitch period.
g. Pitch Period Refinement
Block 1622 of Figure 16 performs the second-stage processing of the pitch period extraction algorithm by searching the neighborhood of the coarse pitch period at the full 16 kHz time resolution, using the decoded G.722 output speech signal. This block first converts the coarse pitch period cpp to the undecimated signal domain by multiplying it by the decimation factor D, where D = 8. The pitch refinement analysis window size WSZ is chosen as the smaller of cpp·D samples and 160 samples (corresponding to 10 ms): WSZ = min(cpp·D, 160).
Next, the lower bound of the search range is calculated as lb = max(MINPP, cpp·D − 4), where MINPP = 40 samples is the minimum pitch period. The upper bound of the search range is calculated as ub = min(MAXPP, cpp·D + 4), where MAXPP = 265 samples is the maximum pitch period.
Block 1622 maintains a buffer of the G.722 decoded speech signal x_out(j) containing a total of XQOFF = MAXPP + 1 + FRSZ samples at 16 kHz, where FRSZ = 160 is the frame size. The last FRSZ samples of this buffer contain the G.722 decoded speech signal of the current frame. The first MAXPP + 1 samples are the G.722 decoder/PLC system output signal of the frames processed immediately before the current frame. The last sample of the analysis window is aligned with the last sample of the current frame. Let the index range j = 0 to j = WSZ − 1 correspond to the analysis window (that is, the last WSZ samples in the x_out(j) buffer), and let negative indices denote samples before the analysis window. The following correlation and energy terms are calculated in the undecimated signal domain for time lags k in the search range [lb, ub]:
c̃(k) = Σ_{j=0}^{WSZ−1} x_out(j)·x_out(j−k),    (19)
Ẽ(k) = Σ_{j=0}^{WSZ−1} x_out(j−k)².    (20)
The time lag k ∈ [lb, ub] that maximizes the ratio c̃²(k)/Ẽ(k) is then chosen as the final refined pitch period for frame erasure, ppfe. That is,
ppfe = arg max_{k∈[lb,ub]} [c̃²(k)/Ẽ(k)].    (21)
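The refinement search of equations (19) through (21) can be sketched as follows (an illustrative helper; the buffer layout with MAXPP + 1 history samples followed by the current frame follows the description above):

```python
def refine_pitch(x_out, cpp, D=8, MINPP=40, MAXPP=265, FRSZ=160):
    # Second-stage pitch refinement (eqs. 19-21): search +/-4 samples
    # around cpp*D at full 16 kHz resolution, maximizing c~(k)^2 / E~(k).
    # x_out holds past samples followed by the current frame; the
    # analysis window is the last WSZ samples of the buffer.
    WSZ = min(cpp * D, FRSZ)
    lb = max(MINPP, cpp * D - 4)
    ub = min(MAXPP, cpp * D + 4)
    off = len(x_out) - WSZ
    best, ppfe = -1.0, lb
    for k in range(lb, ub + 1):
        c = sum(x_out[off + j] * x_out[off + j - k] for j in range(WSZ))
        E = sum(x_out[off + j - k] ** 2 for j in range(WSZ))
        if E > 0 and c * c / E > best:
            best, ppfe = c * c / E, k
    return ppfe
```

For a signal that is exactly periodic with period 50, a coarse estimate cpp = 6 (48 samples at 16 kHz) should be refined to the true 50-sample period, since only there does c̃(k) equal Ẽ(k) exactly.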
Next, block 1622 also calculates two more pitch-related scale factors. The first, called ptfe (pitch tap for frame erasure), is the scale factor used for the periodic waveform extrapolation. It is calculated as the ratio of the average magnitude of the x_out(j) signal in the analysis window to the average magnitude of the portion of the x_out(j) signal ppfe samples earlier, with the same sign as the correlation between these two signal portions:
ptfe = sign(c̃(ppfe))·[Σ_{j=0}^{WSZ−1} |x_out(j)|] / [Σ_{j=0}^{WSZ−1} |x_out(j−ppfe)|].    (22)
In the degenerate case where Σ_{j=0}^{WSZ−1} |x_out(j−ppfe)| = 0, ptfe is set to 0. After this calculation of ptfe is complete, the value of ptfe is bounded to the range [−1, 1].
The second pitch-related scale factor, called ppt (pitch prediction tap), is used to calculate the long-term ringing signal (discussed later), and is calculated as ppt = 0.75 × ptfe.
h. Calculation of the Mixing Ratio
Block 1618 of Figure 16 calculates a figure of merit that determines the mixing ratio between the periodically extrapolated waveform and the filtered noise waveform during lost frames. This calculation is performed only during the first lost frame of each packet loss. The figure of merit is a weighted sum of three signal features: the log gain, the first normalized autocorrelation, and the pitch prediction gain, each calculated as follows.
Using the same indexing convention for x_out(j) as in the previous subsection, the energy of the x_out(j) signal in the pitch refinement analysis window is
sige = Σ_{j=0}^{WSZ−1} x_out²(j),    (23)
and the base-2 log gain lg is calculated as follows:
lg = log₂(sige) if sige > 0, and lg = 0 otherwise.    (24)
If Ẽ(ppfe) ≠ 0, the pitch prediction residual energy is calculated as
rese = sige − c̃²(ppfe)/Ẽ(ppfe),    (25)
and the pitch prediction gain pg is calculated as follows:
pg = log₂(sige/rese).    (26)
If Ẽ(ppfe) = 0, pg is set to 0. Likewise, if sige = 0, pg is set to 0.
The first normalized autocorrelation ρ₁ is calculated as follows:
ρ₁ = [Σ_{j=0}^{WSZ−2} x_out(j)·x_out(j+1)] / sige.    (27)
After these three signal features are obtained, the figure of merit is calculated as follows:
merit = lg + pg + 12ρ₁.    (28)
The merit calculated above determines the two scale factors Gp and Gr, which in turn determine the mixing ratio between the periodically extrapolated waveform and the filtered noise waveform. Two thresholds are used for merit: the merit high threshold MHI and the merit low threshold MLO, set to MHI = 28 and MLO = 20. The scale factor Gr of the random (filtered noise) component is calculated as follows:
Gr = (MHI − merit)/(MHI − MLO),    (29)
and the scale factor Gp of the periodic component is calculated as follows:
Gp = 1 − Gr.    (30)
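The mapping from merit to the two scale factors (equations (29) and (30)) can be sketched as follows; the clamping to [0, 1] mirrors the special cases merit ≥ MHI (pure periodic output) and merit ≤ MLO (pure noise output) handled in the mixing subsection below:

```python
def mixing_factors(merit, MHI=28.0, MLO=20.0):
    # Equations (29)-(30): map the figure of merit to the random (Gr)
    # and periodic (Gp) scale factors, clamped to [0, 1].
    Gr = (MHI - merit) / (MHI - MLO)
    Gr = min(1.0, max(0.0, Gr))   # merit >= MHI -> Gr = 0, merit <= MLO -> Gr = 1
    return 1.0 - Gr, Gr
```

A merit value halfway between the thresholds yields an equal blend of the periodic and random components.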
i. Periodic Waveform Extrapolation
Block 1624 of Figure 16 periodically extrapolates the previous output speech waveform during lost frames (if merit > MLO). The manner in which block 1624 performs this function is now described.
For the first lost frame of each packet loss, the average pitch period increment per frame is calculated. A pitch period history buffer pph(m), m = 1, 2, ..., 5, holds the pitch periods ppfe of the previous five frames. The average pitch period increment is obtained as follows. Starting with the most recent of those frames, the pitch period increment from its preceding frame to that frame is calculated (a negative value represents a pitch period decrement). If the pitch period increment is zero, the algorithm checks the pitch period increment of the next earlier frame. This process continues until the first frame with a non-zero pitch period increment is found, or until the fourth previous frame has been examined. If all five previous frames have identical pitch periods, the average pitch period increment is set to zero. Otherwise, if the first non-zero pitch period increment is found at the m-th previous frame, and if the magnitude of that pitch period increment is less than 5% of the pitch period of that frame, then the average pitch period increment ppinc is calculated as the pitch period increment of that frame divided by m, and the resulting value is bounded to the range [−1, 2].
From the second consecutive lost frame of a packet loss onward, the average pitch period increment is added to the pitch period ppfe, and the result is rounded to the nearest integer and then bounded to the range [MINPP, MAXPP].
If the current frame is the first lost frame of a packet loss, a so-called "ringing signal" is calculated for use in an overlap-add operation that ensures a smooth waveform transition at the beginning of the frame. The overlap-add length between the ringing signal and the periodically extrapolated waveform is the first 20 samples of the first lost frame. Let the index range j = 0, 1, 2, ..., 19 correspond to these 20 samples, which form the overlap-add period, and let negative indices correspond to the previous frame. The long-term ringing signal is obtained as a scaled version of the short-term prediction residual signal one pitch period earlier than the overlap-add period:
ltring(j) = x_out(j − ppfe) + Σ_{i=1}^{8} a_i·x_out(j − ppfe − i),  j = 0, 1, 2, ..., 19.    (31)
After these 20 samples of ltring(j) are calculated, they are further scaled by the factor ppt calculated by block 1622:
ltring(j) ← ppt·ltring(j),  j = 0, 1, 2, ..., 19.    (32)
With the filtering memory ring(j), j = −8, −7, ..., −1, initialized to the last eight samples of the x_out(j) signal of the last frame, the final ringing signal is obtained as follows:
ring(j) = ltring(j) − Σ_{i=1}^{8} a_i·ring(j − i),  j = 0, 1, 2, ..., 19.    (33)
Let the index range j = 0, 1, 2, ..., 159 correspond to the current first lost frame, and let the index range j = 160, 161, 162, ..., 209 correspond to the first 50 samples of the next frame. Furthermore, let wi(j) and wo(j), j = 0, 1, ..., 19, be the triangular fade-in and fade-out windows, respectively, so that wi(j) + wo(j) = 1. The periodic waveform extrapolation is then performed in the following two steps:
Step 1:
x_out(j) = wi(j)·ptfe·x_out(j − ppfe) + wo(j)·ring(j),  j = 0, 1, 2, ..., 19.    (34)
Step 2:
x_out(j) = ptfe·x_out(j − ppfe),  j = 20, 21, 22, ..., 209.    (35)
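The two-step extrapolation of equations (34) and (35) can be sketched as follows (an illustrative helper; the exact triangular fade-in window shape is an assumption, and ring must hold the 20 ringing samples of equation (33)):

```python
def periodic_extrapolate(history, ring, ptfe, ppfe, wi=None):
    # Two-step extrapolation of eqs. (34)-(35).  `history` holds past
    # output samples; 210 new samples (current frame plus 50 lookahead
    # samples) are appended by copying one pitch period back, scaled by
    # ptfe.  The first 20 samples are overlap-added with the ringing
    # signal using complementary fade-in/fade-out windows.
    if wi is None:
        wi = [(j + 1) / 21.0 for j in range(20)]   # assumed triangular fade-in
    x = list(history)
    n0 = len(history)
    for j in range(210):
        v = ptfe * x[n0 + j - ppfe]
        if j < 20:
            v = wi[j] * v + (1.0 - wi[j]) * ring[j]   # eq. (34)
        x.append(v)                                    # eq. (35)
    return x[n0:]
```

When the ringing signal already matches the periodic continuation and ptfe = 1, the output is an exact periodic extension of the history.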
j. Normalized Noise Generator
If merit < MHI, block 1610 of Figure 16 generates a white Gaussian random noise sequence with unity average magnitude. To reduce computational complexity, the white Gaussian noise is pre-calculated and stored in a table. To avoid using a very long table while also avoiding repetition of the same noise pattern due to a table that is too short, a special indexing scheme is used. In this scheme, the white Gaussian noise table wn(j) has 127 entries, and the scaled output of the noise generator block is
wgn(j) = avm × wn(mod(cfecount × j, 127)),  j = 0, 1, 2, ..., 209,    (36)
where cfecount is a frame counter, with cfecount = k for the k-th consecutive lost frame of the current packet loss, and mod(m, 127) = m − 127·⌊m/127⌋ is the modulo operation.
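The table-indexing scheme of equation (36) can be sketched as follows (wn would hold the 127 precomputed Gaussian table entries):

```python
def scaled_noise(avm, cfecount, wn, n=210):
    # Equation (36): index a short 127-entry noise table with stride
    # cfecount so that consecutive erased frames do not repeat the same
    # noise pattern; scale by the average residual magnitude avm.
    return [avm * wn[(cfecount * j) % 127] for j in range(n)]
```

Because 127 is prime, any stride cfecount that is not a multiple of 127 visits the table entries in a full-cycle permutation.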
k. Filtering of the Noise Sequence
Block 1614 of Figure 16 represents a short-term synthesis filter. If merit < MHI, block 1614 filters the scaled white Gaussian noise to give it the same spectral envelope as that of the x_out(j) signal in the last frame. The resulting filtered noise fn(j) is obtained as follows:
fn(j) = wgn(j) − Σ_{i=1}^{8} a_i·fn(j − i),  j = 0, 1, 2, ..., 209.    (37)
l. Mixing of Periodic and Random Components
If merit > MHI, only the periodically extrapolated waveform x_out(j) calculated by block 1624 is used as the output of the WB PCM PLC logic. If merit < MLO, only the filtered noise signal fn(j) produced by block 1614 is used as the output of the WB PCM PLC logic. If MLO ≤ merit ≤ MHI, the two components are mixed:
x_out(j) ← Gp·x_out(j) + Gr·fn(j),  j = 0, 1, 2, ..., 209.    (38)
The first 40 extra samples of the extrapolated x_out(j) signal, j = 160, 161, 162, ..., 199, become the ringing signal ring(j), j = 0, 1, 2, ..., 39, of the next frame. If the next frame is also a lost frame, only the first 20 samples of this ringing signal are used in the overlap-add. If the next frame is a received frame, all 40 samples of this ringing signal are used in the overlap-add.
m. Conditional Ramp-Down
If a packet loss lasts 20 ms or less, the x_out(j) signal produced by mixing the periodic and random components is used directly as the WB PCM PLC output signal. If a packet loss lasts longer than 60 ms, the WB PCM PLC output signal is completely muted. If a packet loss lasts longer than 20 ms but no more than 60 ms, the x_out(j) signal produced by mixing the periodic and random components is linearly ramped down (attenuated toward zero in a linear fashion). As specified in the algorithm below, this conditional ramp-down is performed during lost frames when cfecount > 2. The array gawd() is given in Q15 format as {−52, −69, −104, −207}. Again, the index range j = 0, 1, 2, ..., 159 corresponds to the current frame of x_out(j). The conditional ramp-down algorithm is as follows:
A. If cfecount ≤ 6, do the following nine lines:
1. delta = gawd(cfecount − 3)
2. gaw = 1
3. For j = 0, 1, 2, ..., 159, do the following two lines:
a. x_out(j) = gaw·x_out(j)
b. gaw = gaw + delta
4. If cfecount < 6, do the following three lines:
a. For j = 160, 161, 162, ..., 209, do the following two lines:
i. x_out(j) = gaw·x_out(j)
ii. gaw = gaw + delta
B. Otherwise (if cfecount > 6), set x_out(j) = 0 for j = 0, 1, 2, ..., 209.
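The conditional ramp-down algorithm above can be sketched as follows (a floating-point illustration; the Q15 decrements are divided by 32768, and the negative sign of the first gawd entry is an assumption consistent with a ramp-down):

```python
def conditional_ramp_down(x_out, cfecount, gawd=(-52, -69, -104, -207)):
    # Section (m): linearly attenuate frames 3..6 of an erasure and
    # mute from frame 7 on.  gawd holds the per-sample gain decrements
    # in Q15; x_out has 160 frame samples plus 50 lookahead samples.
    x = list(x_out)
    if cfecount <= 2:
        return x                      # 20 ms or less: no attenuation
    if cfecount > 6:
        return [0.0] * len(x)         # beyond 60 ms: fully muted
    delta = gawd[cfecount - 3] / 32768.0
    gaw = 1.0
    for j in range(160):
        x[j] *= gaw
        gaw += delta
    if cfecount < 6:                  # extend the ramp over the lookahead
        for j in range(160, len(x)):
            x[j] *= gaw
            gaw += delta
    return x
```

Note that the gain starts at exactly 1 at the frame boundary, so the attenuation is continuous with the preceding unattenuated frame.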
n. Overlap-add in the first received frame

For a Type 5 frame, the output x_out(j) from the G.722 decoder is overlap-added by module 1624 with the ringing signal ring(j) of the last lost frame (computed in the manner described above):

x_out(j) = w_i(j)·x_out(j) + w_o(j)·ring(j), j = 0, ..., L_OLA-1, (39)
where

L_OLA = 8 if Gp = 0, and 40 otherwise. (40)
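A minimal sketch of this overlap-add, assuming simple complementary triangular windows for w_i(j) and w_o(j) (the exact window shape is not restated in this passage):

```python
def ola_first_received_frame(x_dec, ring, gp_is_zero):
    """Blend decoder output with the ringing signal over L_OLA samples
    (eqs. 39-40). Window shape is an assumption: linear ramps."""
    L = 8 if gp_is_zero else 40
    out = list(x_dec)
    for j in range(L):
        wi = (j + 1) / (L + 1)   # ramp-up weight on the decoded signal
        wo = 1.0 - wi            # complementary ramp-down on the ringing
        out[j] = wi * x_dec[j] + wo * ring[j]
    return out
```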
4. Re-encoding of the PLC output

To update the memories and parameters of the G.722 ADPCM decoders during lost frames (frames of Type 2, Type 3 and Type 4), the PLC output is in essence passed through a G.722 encoder. Figure 17 is a block diagram 1700 of the logic used to perform this re-encoding. As shown in Figure 17, the PLC output x_out(j) is passed through a QMF analysis filterbank 1702 to produce a low-band subband signal x_L(n) and a high-band subband signal x_H(n). The low-band subband signal x_L(n) is encoded by a low-band ADPCM encoder 1704, and the high-band subband signal x_H(n) is encoded by a high-band ADPCM encoder 1706. To reduce complexity, the ADPCM subband encoders 1704 and 1706 are simplified in comparison with conventional ADPCM subband encoders. These operations are now described in more detail.

a. Passing the PLC output through the QMF analysis filterbank

The memory of the QMF analysis filterbank 1702 is initialized so as to produce subband signals that are continuous with the decoded subband signals. The first 22 samples of the WB PCM PLC output constitute the filter memory, and the subband signals are computed according to:
x_L(n) = Σ_{i=0}^{11} h_{2i}·x_PLC(23+2n-2i) + Σ_{i=0}^{11} h_{2i+1}·x_PLC(22+2n-2i), and (41)

x_H(n) = Σ_{i=0}^{11} h_{2i}·x_PLC(23+2n-2i) - Σ_{i=0}^{11} h_{2i+1}·x_PLC(22+2n-2i), (42)

where x_PLC(0) corresponds to the first sample of the 16 kHz WB PCM PLC output of the current frame, and x_L(n=0) and x_H(n=0) correspond to the first samples of the 8 kHz low-band and high-band subband signals, respectively, of the current frame. Apart from the offset of 22 extra samples, this filtering is identical to the transmit QMF of the G.722 encoder, except that the WB PCM PLC output (rather than the encoder input) is passed to the filterbank. Furthermore, to produce the subband signals for an entire frame (80 samples ~ 10 ms), the WB PCM PLC output must extend 22 samples beyond the current frame, i.e. 182 samples ~ 11.375 ms must be produced. The subband signals x_L(n), n = 0, 1, ..., 79, and x_H(n), n = 0, 1, ..., 79, are produced according to equations 41 and 42, respectively.
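The subband split of equations (41)-(42) can be sketched as follows; the 24-tap prototype h is left as a parameter, since the standardized G.722 QMF coefficients are not reproduced in this passage:

```python
def qmf_analysis(x_plc, h):
    """Split the 16 kHz PLC output into 8 kHz subbands per eqs. (41)-(42).

    x_plc must hold 182 samples with x_plc[0] the first sample of the
    current frame, per the text above; h is the 24-tap QMF prototype.
    """
    x_lo, x_hi = [], []
    for n in range(80):
        even = sum(h[2 * i] * x_plc[23 + 2 * n - 2 * i] for i in range(12))
        odd = sum(h[2 * i + 1] * x_plc[22 + 2 * n - 2 * i] for i in range(12))
        x_lo.append(even + odd)   # eq. (41)
        x_hi.append(even - odd)   # eq. (42)
    return x_lo, x_hi
```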
b. Re-encoding of the low-band signal

The low-band signal x_L(n) is encoded using a simplified low-band ADPCM encoder. A block diagram 2000 of the simplified low-band ADPCM encoder is shown in Figure 20. As can be seen in Figure 20, the inverse quantizer of the standard low-band ADPCM encoder has been eliminated, and the quantized prediction error has been replaced with the unquantized prediction error. Furthermore, since the update of the adaptive quantizer is based only on an 8-member subset of the 64-member set represented by the 6-bit low-band encoder index I_L(n), the prediction error is quantized only into that 8-member set. This yields an update of the adaptive quantizer identical to the standard one while simplifying the quantization. Table 4 lists the decision levels, output codes and multipliers of the simplified 8-level quantizer, based on the absolute value of e_L(n).

m_L    Lower threshold    Upper threshold    I_L    Multiplier W_L
1 0.00000 0.14103 3c -0.02930
2 0.14103 0.45482 38 -0.01465
3 0.45482 0.82335 34 0.02832
4 0.82335 1.26989 30 0.08398
5 1.26989 1.83683 2c 0.16309
6 1.83683 2.61482 28 0.26270
7 2.61482 3.86796 24 0.58496
8 3.86796 20 1.48535
Table 4: Decision levels, output codes and multipliers of the simplified 8-level quantizer
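A table-lookup sketch of the simplified 8-level quantizer (hypothetical helper; the floating-point thresholds are those listed in Table 4):

```python
import bisect

# Table 4: decision thresholds on |e_L|, 6-bit codes I_L, multipliers W_L
THRESHOLDS = [0.14103, 0.45482, 0.82335, 1.26989, 1.83683, 2.61482, 3.86796]
CODES = [0x3C, 0x38, 0x34, 0x30, 0x2C, 0x28, 0x24, 0x20]
MULTIPLIERS = [-0.02930, -0.01465, 0.02832, 0.08398,
               0.16309, 0.26270, 0.58496, 1.48535]

def quantize_simplified(e_abs):
    """Classify the scaled prediction-error magnitude |e_L| into one of
    the 8 levels of Table 4, returning (I_L, W_L)."""
    m = bisect.bisect_right(THRESHOLDS, e_abs)  # lower threshold inclusive
    return CODES[m], MULTIPLIERS[m]
```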
The entities of Figure 20 are computed in the same way as their equivalents in the G.722 low-band ADPCM encoder:
s_Lz(n) = Σ_{i=1}^{6} b_{L,i}(n-1)·e_L(n-i), (43)

s_Lp(n) = Σ_{i=1}^{2} a_{L,i}(n-1)·x_L(n-i), (44)

s_L(n) = s_Lp(n) + s_Lz(n), (45)

e_L(n) = x_L(n) - s_L(n), and (46)

p_Lt(n) = s_Lz(n) + e_L(n). (47)
The adaptive quantizer is updated exactly as specified for the G.722 encoder. The adaptation of the zero and pole sections also takes place as in the G.722 encoder, as described in clauses 3.6.3 and 3.6.4 of the G.722 standard.

The low-band ADPCM decoder 1910 is automatically reset after 60 ms of frame loss, but it may be adaptively reset as early as 30 ms into the frame loss. During the re-encoding of the low-band signal, properties of the partially reconstructed signal p_Lt(n) are monitored to control the adaptive reset of the low-band ADPCM decoder 1910. The signal p_Lt(n) is monitored over the entire loss, so the sign counter is set to zero at the first lost frame and updated as:
sgn[p_Lt(n)] = sgn[p_Lt(n-1)] + 1 if p_Lt(n) > 0,
               sgn[p_Lt(n-1)]     if p_Lt(n) = 0,
               sgn[p_Lt(n-1)] - 1 if p_Lt(n) < 0. (48)
For lost frames, the constancy of the monitored signal p_Lt(n) is checked on a per-frame basis; the constancy counter cnst[·] is therefore set to zero at the beginning of each lost frame. It is updated as:
cnst[p_Lt(n)] = cnst[p_Lt(n-1)] + 1 if p_Lt(n) = p_Lt(n-1),
                cnst[p_Lt(n-1)]     if p_Lt(n) ≠ p_Lt(n-1). (49)
The subband decoders are reset at the end of lost frames 3 through 5 if the following condition is met:

|sgn[p_Lt(n)] / N_lost| > 36 or cnst[p_Lt(n)] > 40, (50)

where N_lost is the number of lost frames, i.e. 3, 4 or 5.
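The per-sample counter updates of equations (48)-(49) and the reset test of equation (50) can be sketched as:

```python
def update_counters(sgn, cnst, p, p_prev):
    """One sample update of the sign counter (eq. 48) and the constancy
    counter (eq. 49) on the partially reconstructed signal p."""
    if p > 0:
        sgn += 1
    elif p < 0:
        sgn -= 1
    if p == p_prev:
        cnst += 1
    return sgn, cnst

def should_reset(sgn, cnst, n_lost):
    """Adaptive reset test of eq. (50), checked at the end of lost
    frames 3 through 5 (n_lost = 3, 4 or 5)."""
    return abs(sgn / n_lost) > 36 or cnst > 40
```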
c. Re-encoding of the high-band signal

The high-band signal x_H(n) is encoded using a simplified high-band ADPCM encoder. A block diagram 2100 of the simplified high-band ADPCM encoder is shown in Figure 21. As can be seen in Figure 21, the adaptive quantizer of the standard high-band ADPCM encoder has been eliminated: since the algorithm overwrites the log scale factor on the first received frame with a moving average from before the packet loss, there is no need for the log scale factor of the high-band re-encoding. The quantized prediction error of the high-band ADPCM encoder 2100 has likewise been replaced with the unquantized prediction error.

The entities of Figure 21 are computed in the same way as their equivalents in the G.722 high-band ADPCM encoder:
s_Hz(n) = Σ_{i=1}^{6} b_{H,i}(n-1)·e_H(n-i), (51)

s_Hp(n) = Σ_{i=1}^{2} a_{H,i}(n-1)·x_H(n-i), (52)

s_H(n) = s_Hp(n) + s_Hz(n), (53)

e_H(n) = x_H(n) - s_H(n), and (54)

p_H(n) = s_Hz(n) + e_H(n). (55)
The adaptation of the zero and pole sections takes place as in the G.722 encoder, as described in clauses 3.6.3 and 3.6.4 of the G.722 standard.

Similarly to the low-band re-encoding, the high-band ADPCM decoder 1920 is automatically reset after 60 ms of frame loss, but it may be adaptively reset as early as 30 ms into the frame loss. During the re-encoding of the high-band signal, properties of the partially reconstructed signal p_H(n) are monitored to control the adaptive reset of the high-band ADPCM decoder 1920. The signal p_H(n) is monitored over the entire loss, so the sign counter is set to zero at the first lost frame and updated as:
sgn[p_H(n)] = sgn[p_H(n-1)] + 1 if p_H(n) > 0,
              sgn[p_H(n-1)]     if p_H(n) = 0,
              sgn[p_H(n-1)] - 1 if p_H(n) < 0. (56)
For lost frames, the constancy of the monitored signal p_H(n) is checked on a per-frame basis; the constancy counter cnst[·] is therefore set to zero at the beginning of each lost frame. It is updated as:
cnst[p_H(n)] = cnst[p_H(n-1)] + 1 if p_H(n) = p_H(n-1),
               cnst[p_H(n-1)]     if p_H(n) ≠ p_H(n-1). (57)
The subband decoders are reset at the end of lost frames 3 through 5 if the following condition is met:

|sgn[p_H(n)] / N_lost| > 36 or cnst[p_H(n)] > 40. (58)
5. Monitoring of signal characteristics and their use in the PLC

The following describes the functions of the constraint and control logic 1970 of Figure 19, which serve to reduce artifacts and distortion at the transition from lost frames to received frames, thereby improving the performance of decoder/PLC system 300 after packet loss.

a. Low-band log scale factor
During received frames, characteristics of the low-band log scale factor ∇_L are updated, and on the first received frame after frame loss these characteristics are used to adaptively set the state of the adaptive quantizer's scale factor. In this way a measure of the stationarity of the low-band log scale factor is obtained and used to determine a proper reset of the state.

i. Stationarity of the low-band log scale factor
During received frames, the stationarity of the low-band log scale factor ∇_L is calculated and updated. It is based on a first-order moving average ∇_{L,m1} of ∇_L with constant leakage:

∇_{L,m1}(n) = 7/8·∇_{L,m1}(n-1) + 1/8·∇_L(n). (59)
The tracking of the first-order moving average is measured as:

∇_{L,trck}(n) = 127/128·∇_{L,trck}(n-1) + 1/128·|∇_{L,m1}(n) - ∇_{L,m1}(n-1)|. (60)
A second-order moving average ∇_{L,m2} with adaptive leakage is calculated according to equation 61:

∇_{L,m2}(n) = 7/8·∇_{L,m2}(n-1) + 1/8·∇_{L,m1}(n)   if ∇_{L,trck}(n) < 3277,
              3/4·∇_{L,m2}(n-1) + 1/4·∇_{L,m1}(n)   if 3277 ≤ ∇_{L,trck}(n) < 6554,
              1/2·∇_{L,m2}(n-1) + 1/2·∇_{L,m1}(n)   if 6554 ≤ ∇_{L,trck}(n) < 9830,
              ∇_{L,m1}(n)                           if 9830 ≤ ∇_{L,trck}(n). (61)
The stationarity of the low-band log scale factor is measured as a degree of change according to:

∇_{L,chng}(n) = 127/128·∇_{L,chng}(n-1) + 1/128·256·|∇_{L,m2}(n) - ∇_{L,m2}(n-1)|. (62)
During lost frames, no updates take place; in other words:

∇_{L,m1}(n) = ∇_{L,m1}(n-1), ∇_{L,trck}(n) = ∇_{L,trck}(n-1), ∇_{L,m2}(n) = ∇_{L,m2}(n-1), ∇_{L,chng}(n) = ∇_{L,chng}(n-1). (63)
ii. Reset of the log scale factor of the low-band adaptive quantizer

At the first received frame after frame loss, the low-band log scale factor is adaptively reset (overwritten) according to its stationarity before the frame loss:

∇_L(n-1) ← ∇_{L,m2}(n-1)  if ∇_{L,chng}(n-1) < 6554,
           [∇_L(n-1)·(∇_{L,chng}(n-1) - 6554) + ∇_{L,m2}(n-1)·(9830 - ∇_{L,chng}(n-1))]/3276  if 6554 ≤ ∇_{L,chng}(n-1) ≤ 9830,
           ∇_L(n-1)  if 9830 < ∇_{L,chng}(n-1). (64)
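The statistics updates of equations (59)-(62) and the adaptive reset of equation (64) can be sketched in floating point (the spec operates on Q-format fixed-point values; the helper names are assumptions):

```python
def update_lowband_stats(nabla, st):
    """Received-frame update of the ∇_L statistics (eqs. 59-62).
    st holds m1, trck, m2, chng."""
    st = dict(st)
    m1_prev = st['m1']
    st['m1'] = 7/8 * st['m1'] + 1/8 * nabla                              # eq. (59)
    st['trck'] = 127/128 * st['trck'] + 1/128 * abs(st['m1'] - m1_prev)  # eq. (60)
    t = st['trck']
    leak = 7/8 if t < 3277 else 3/4 if t < 6554 else 1/2 if t < 9830 else 0.0
    m2_prev = st['m2']
    st['m2'] = leak * st['m2'] + (1 - leak) * st['m1']                   # eq. (61)
    st['chng'] = 127/128 * st['chng'] + 2 * abs(st['m2'] - m2_prev)      # eq. (62): 1/128*256 = 2
    return st

def reset_lowband_scale_factor(nabla_prev, st):
    """Adaptive reset of ∇_L on the first received frame (eq. 64)."""
    chng, m2 = st['chng'], st['m2']
    if chng < 6554:
        return m2
    if chng <= 9830:
        return (nabla_prev * (chng - 6554) + m2 * (9830 - chng)) / 3276
    return nabla_prev
```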
b. High-band log scale factor

During received frames, characteristics of the high-band log scale factor ∇_H are updated, and on the first received frame after frame loss these characteristics are used to set the state of the adaptive quantizer's scale factor. In addition, the characteristics adaptively control the re-convergence of the high-band log scale factor after frame loss.

i. Moving average and stationarity of the high-band log scale factor
The tracking ∇_{H,trck}(n) of the high-band log scale factor ∇_H is calculated according to equation (65).
Based on the tracking, a moving average with adaptive leakage is calculated as follows:

∇_{H,m}(n) = 255/256·∇_{H,m}(n-1) + 1/256·∇_H(n)   if |∇_{H,trck}(n)| < 1638,
             127/128·∇_{H,m}(n-1) + 1/128·∇_H(n)   if 1638 ≤ |∇_{H,trck}(n)| < 3277,
             63/64·∇_{H,m}(n-1) + 1/64·∇_H(n)      if 3277 ≤ |∇_{H,trck}(n)| < 4915,
             31/32·∇_{H,m}(n-1) + 1/32·∇_H(n)      if 4915 ≤ |∇_{H,trck}(n)|. (66)
This moving average is used to reset the high-band log scale factor on the first received frame, as described in a later subsection.

The degree of stationarity of the high-band log scale factor is calculated from the moving average as follows:

∇_{H,chng}(n) = 127/128·∇_{H,chng}(n-1) + 1/128·256·|∇_{H,m}(n) - ∇_{H,m}(n-1)|. (67)
This stationarity measure is used to control the re-convergence of ∇_H after frame loss, as described in a later subsection.
During lost frames, no updates take place; in other words:

∇_{H,trck}(n) = ∇_{H,trck}(n-1), ∇_{H,m}(n) = ∇_{H,m}(n-1), ∇_{H,chng}(n) = ∇_{H,chng}(n-1). (68)
ii. Reset of the log scale factor of the high-band adaptive quantizer

On the first received frame, the high-band log scale factor is reset to its moving average over the received frames before the packet loss:

∇_H(n-1) ← ∇_{H,m}(n-1). (69)
iii. Convergence of the log scale factor of the high-band adaptive quantizer

The re-convergence of the high-band log scale factor after frame loss is controlled by the measure of stationarity ∇_{H,chng} before the frame loss. For stationary cases, an adaptive low-pass filter is applied to ∇_H after the packet loss. The low-pass filter is applied over 0 ms, 40 ms or 80 ms, with the degree of low-pass filtering decreasing gradually over that duration. The duration in samples, N_{LP,∇H}, is determined as follows:

N_{LP,∇H} = 640 if ∇_{H,chng} < 819,
            320 if 819 ≤ ∇_{H,chng} < 1311,
            0   if ∇_{H,chng} ≥ 1311. (70)
The low-pass filtering is given by:

∇_{H,LP}(n) = α_LP(n)·∇_{H,LP}(n-1) + (1 - α_LP(n))·∇_H(n), (71)

where the coefficient is given by:

α_LP(n) = 1 - ((n+1)/(N_{LP,∇H}+1))², n = 0, 1, ..., N_{LP,∇H}-1. (72)
Hence the degree of low-pass filtering decreases sample by sample with time n. During these N_{LP,∇H} samples, the low-pass filtered log scale factor simply replaces the regular log scale factor.
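The re-convergence of equations (70)-(72) can be sketched as follows (floating point; hypothetical helper name):

```python
def reconverge_scale_factor(nabla_seq, chng, nabla_lp_init):
    """Apply the adaptive low-pass filter of eqs. (71)-(72) to the first
    N_{LP,∇H} samples of the post-loss ∇_H sequence, with N chosen per
    eq. (70) from the pre-loss stationarity measure chng."""
    if chng < 819:
        n_lp = 640
    elif chng < 1311:
        n_lp = 320
    else:
        n_lp = 0
    out, prev = [], nabla_lp_init
    for n, nabla in enumerate(nabla_seq):
        if n < n_lp:
            a = 1.0 - ((n + 1) / (n_lp + 1)) ** 2   # eq. (72)
            prev = a * prev + (1 - a) * nabla        # eq. (71)
            out.append(prev)
        else:
            prev = nabla                             # filtering has ended
            out.append(nabla)
    return out
```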
c. Low-band pole section

During received frames, an entity referred to as the (pole-section) stability margin is updated for the subband ADPCM decoders, to be used to constrain the pole sections after frame loss.

i. Stability margin of the low-band pole section
The stability margin of the low-band pole section is defined as

β_L(n) = 1 - |a_{L,1}(n)| - a_{L,2}(n), (73)

where a_{L,1}(n) and a_{L,2}(n) are the two pole coefficients. During received frames, a moving average of the stability margin is updated according to:

β_{L,MA}(n) = 15/16·β_{L,MA}(n-1) + 1/16·β_L(n). (74)
During lost frames, the moving average is not updated:

β_{L,MA}(n) = β_{L,MA}(n-1). (75)
ii. Constraint on the low-band pole section

In conventional G.722 low-band (and high-band) ADPCM encoding and decoding, a minimum stability margin of β_{L,min} = 1/16 is maintained. During the first 40 ms after frame loss, an increased minimum stability margin is maintained for the low-band ADPCM decoder; it is a function of both the time since the frame loss and the moving average of the stability margin.
For the first three 10 ms frames, the minimum stability margin

β_{L,min} = min{3/16, β_{L,MA}(n-1)} (76)

is set at the frame boundary and enforced over the entire frame. On the frame boundary entering the fourth 10 ms frame, the minimum stability margin

β_{L,min} = min{2/16, 1/16 + β_{L,MA}(n-1)/2} (77)

is enforced, and for all other frames the conventional minimum stability margin of β_{L,min} = 1/16 is enforced.
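A sketch of the stability-margin bookkeeping of equations (73), (76) and (77); note that the 1/16 constant inside equation (77) is assumed, since the source text is ambiguous at that point:

```python
def min_stability_margin(frames_since_loss, beta_ma):
    """β_L,min schedule of eqs. (76)-(77); frames are 10 ms each."""
    if frames_since_loss < 3:
        return min(3/16, beta_ma)              # eq. (76), first three frames
    if frames_since_loss == 3:
        return min(2/16, 1/16 + beta_ma / 2)   # eq. (77), fourth frame
    return 1/16                                # conventional G.722 floor

def stability_margin(a1, a2):
    """β_L(n) = 1 - |a_{L,1}(n)| - a_{L,2}(n) (eq. 73)."""
    return 1 - abs(a1) - a2
```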
d. High-band partially reconstructed signal and high-band reconstructed signal

During all frames, lost frames and received frames alike, high-pass filtered versions of the high-band partially reconstructed signal p_H(n) and the high-band reconstructed signal r_H(n) are saved and maintained:

p_{H,HP}(n) = 0.97·[p_H(n) - p_H(n-1) + p_{H,HP}(n-1)], and (78)

r_{H,HP}(n) = 0.97·[r_H(n) - r_H(n-1) + r_{H,HP}(n-1)]. (79)

This corresponds to a 3 dB cutoff at about 40 Hz; its main purpose is to remove DC.
During the first 40 ms after frame loss, the regular partially reconstructed signal and the regular reconstructed signal are replaced by their respective high-pass filtered versions, for the high-band pole-section adaptation and the high-band reconstructed output, respectively.
6. Time lag computation

The re-phasing and time warping techniques discussed herein require the number of samples by which the frame loss concealment waveform x_PLC(j) and the signal in the first received frame are misaligned.
a. Low-complexity estimate of the low subband reconstructed signal

The signal in the first received frame that is used for computing the time lag is obtained by filtering the low subband truncated difference signal d_Lt(n) with the pole-zero filter coefficients (a_{Lpwe,i}(159), b_{Lpwe,i}(159)) and the other required state information obtained from STATE_159:

r_Le(n) = Σ_{i=1}^{2} a_{Lpwe,i}(159)·r_Le(n-i) + Σ_{i=1}^{6} b_{Lpwe,i}(159)·d_Lt(n-i) + d_Lt(n),
n = 0, 1, ..., 79. (80)

This function is performed by module 1820 of Figure 18.
b. Determination of the need for re-phasing and time warping

If the last received frame is unvoiced, as indicated by the figure of merit, the time lag T_L is set to zero:

If merit ≤ MLO, T_L = 0. (81)

Likewise, if the first received frame is unvoiced, as indicated by the normalized first autocorrelation coefficient

r(1) = [Σ_{n=0}^{78} r_Le(n)·r_Le(n+1)] / [Σ_{n=0}^{78} r_Le(n)·r_Le(n)], (82)

the time lag is set to zero:

If r(1) < 0.125, T_L = 0. (83)

Otherwise, the time lag is calculated as explained in the following subsections. The calculation of the time lag is performed by module 1850 of Figure 18.
c. Computation of the time lag

The computation of the time lag comprises the following steps: (1) generation of the extrapolated signal; (2) coarse time lag search; and (3) refined time lag search. These are described in the following subsections.

i. Generation of the extrapolated signal

The time lag represents the misalignment between x_PLC(j) and r_Le(n). To compute it, x_PLC(j) is extended into the first received frame, and a normalized cross-correlation function is maximized. This subsection describes how x_PLC(j) is extrapolated and details the length of the required signal. It is assumed that x_PLC(j) has been copied into the x_out(j) buffer; since this is a Type 5 frame (the first received frame), the correspondence is:

x_out(j-160) = x_PLC(j), j = 0, 1, ..., 159. (84)
The range over which the correlation is searched, Δ_TL, is given by equation (85), where Δ_TLMAX = 28 and ppfe is the pitch period used in the periodic waveform extrapolation that produced x_PLC(j).
Figure S2007800020499D00602
Specifying hysteresis search window LSW under the 8kHz sampling rate is usefulness very, as follows:
Figure S2007800020499D00603
As given above, the total length of the extrapolated signal required from x_PLC(j) is:

L = 2·(LSW + Δ_TL). (88)
The starting position of the extrapolated signal relative to the first sample in the received frame is:

D = 12 - Δ_TL. (89)
The extrapolated signal es(j) is constructed as follows:

If D < 0:
    es(j) = x_out(D+j), j = 0, 1, ..., -D-1
    If (L+D ≤ ppfe):
        es(j) = x_out(-ppfe+D+j), j = -D, -D+1, ..., L-1
    Otherwise:
        es(j) = x_out(-ppfe+D+j), j = -D, -D+1, ..., ppfe-D-1
        es(j) = es(j-ppfe), j = ppfe-D, ppfe-D+1, ..., L-1
Otherwise, with ovs computed as specified:
    If (ovs ≥ L):
        es(j) = x_out(-ovs+j), j = 0, 1, ..., L-1
    Otherwise:
        If (ovs > 0):
            es(j) = x_out(-ovs+j), j = 0, 1, ..., ovs-1
        If (L-ovs ≤ ppfe):
            es(j) = x_out(-ovs-ppfe+j), j = ovs, ovs+1, ..., L-1
        Otherwise:
            es(j) = x_out(-ovs-ppfe+j), j = ovs, ovs+1, ..., ovs+ppfe-1
            es(j) = es(j-ppfe), j = ovs+ppfe, ovs+ppfe+1, ..., L-1
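The unrolled branches above amount to stepping back by whole pitch periods until an available past sample is reached. A compact sketch of that idea (it reproduces the D < 0 case shown; the ovs case is assumed to be the analogous periodic extension):

```python
def extrapolate_es(x_past, ppfe, d, total):
    """Periodic extension of the concealment waveform.

    x_past(k) returns x_out(k) for k < 0 (past samples); es(j) covers
    positions d .. d+total-1 relative to the first sample of the
    received frame, with pitch period ppfe.
    """
    es = []
    for j in range(total):
        k = d + j
        while k >= 0:      # step back whole pitch periods until we land
            k -= ppfe      # on a sample that is already available
        es.append(x_past(k))
    return es
```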
ii. Coarse time lag search

A rough estimate of the time lag, T_LSUB, is first calculated by searching for the peak of the subsampled normalized cross-correlation function R_SUB(k):

R_SUB(k) = [Σ_{i=0}^{LSW/2-1} es(4i-k+Δ_TL)·r_Le(2i)] / sqrt{[Σ_{i=0}^{LSW/2-1} es²(4i-k+Δ_TL)]·[Σ_{i=0}^{LSW/2-1} r_Le²(2i)]},
k = -Δ_TL, -Δ_TL+4, -Δ_TL+8, ..., Δ_TL. (90)
To avoid going out of bounds during the refined search, T_LSUB is adjusted as follows:

If (T_LSUB > Δ_TLMAX - 4), T_LSUB = Δ_TLMAX - 4. (91)
If (T_LSUB < -Δ_TLMAX + 4), T_LSUB = -Δ_TLMAX + 4. (92)
iii. Refined time lag search

The time lag T_L is then obtained by a refined search for the peak of R(k):

R(k) = [Σ_{i=0}^{LSW-1} es(2i-k+Δ_TL)·r_Le(i)] / sqrt{[Σ_{i=0}^{LSW-1} es²(2i-k+Δ_TL)]·[Σ_{i=0}^{LSW-1} r_Le²(i)]},
k = T_LSUB-4, T_LSUB-2, ..., T_LSUB+4. (93)
Finally, the following conditions are checked:

If Σ_{i=0}^{LSW-1} r_Le²(i) = 0, (94)

or Σ_{i=0}^{LSW-1} es(2i-T_L+Δ_TL)·r_Le(i) ≤ 0.25·Σ_{i=0}^{LSW-1} r_Le²(i), (95)

or (T_L > Δ_TLMAX-2) || (T_L < -Δ_TLMAX+2), (96)

then T_L = 0.
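The coarse-then-refined peak search of equations (90) and (93), with the clamping of equations (91)-(92), can be sketched as follows (floating point; the square-root normalization and step sizes follow the text above):

```python
import math

def _ncc(es, rle, k, dtl, step):
    """Normalized cross-correlation at lag k; step=2 gives the coarse,
    subsampled form of eq. (90), step=1 the refined form of eq. (93)."""
    if -k + dtl < 0:        # guard: lag would index before the buffer
        return -1.0
    idx = range(len(rle) // step)
    num = sum(es[2 * step * i - k + dtl] * rle[step * i] for i in idx)
    d1 = sum(es[2 * step * i - k + dtl] ** 2 for i in idx)
    d2 = sum(rle[step * i] ** 2 for i in idx)
    return num / math.sqrt(d1 * d2) if d1 > 0 and d2 > 0 else 0.0

def time_lag(es, rle, dtl, dtlmax=28):
    """Coarse search on a grid of 4, clamp (eqs. 91-92), then refine
    in steps of 2 around the coarse estimate."""
    t_sub = max(range(-dtl, dtl + 1, 4), key=lambda k: _ncc(es, rle, k, dtl, 2))
    t_sub = min(max(t_sub, -dtlmax + 4), dtlmax - 4)
    return max(range(t_sub - 4, t_sub + 5, 2), key=lambda k: _ncc(es, rle, k, dtl, 1))
```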
7. Re-phasing

Re-phasing is the process of setting the internal states to a point in time at which the frame loss concealment waveform x_PLC(j) is in phase with the last input signal sample before the first received frame. Re-phasing can be divided into the following steps: (1) store intermediate G.722 states during the re-encoding of lost frames; (2) adjust the re-encoding according to the time lag; and (3) update the QMF synthesis filter memory. The following subsections describe these steps in more detail. Re-phasing is performed by module 1810 of Figure 18.
a. Storage of intermediate G.722 states during re-encoding

As described elsewhere in this application, the reconstructed signal x_PLC(j) is re-encoded during lost frames to update the G.722 decoder state memory. Let STATE_j denote the G.722 state and PLC state after re-encoding the j-th sample of x_PLC(j). Besides the G.722 state at the frame boundary (which is maintained as usual, i.e. STATE_159), STATE_{159-Δ_TLMAX} is also stored. To facilitate re-phasing, the subband signals

x_L(n), x_H(n), n = 69-Δ_TLMAX/2, ..., 79+Δ_TLMAX/2

are also stored.
b. Adjustment of the re-encoding according to the time lag

Depending on the sign of the time lag, the re-encoding is adjusted as follows:

If Δ_TL > 0:
1. Restore the G.722 state and the PLC state to STATE_{159-Δ_TLMAX}.
2. Re-encode x_L(n), x_H(n), n = 80-Δ_TLMAX/2, ..., 79-Δ_TL/2, in the manner described above.

If Δ_TL < 0:
1. Restore the G.722 state and the PLC state to STATE_159.
2. Re-encode x_L(n), x_H(n), n = 80, ..., 79+|Δ_TL/2|, in the manner described above.

Note that to support the re-encoding of x_L(n) and x_H(n) up to n = 79+|Δ_TL/2|, up to Δ_TLMAX+182 samples of x_PLC(j) are needed.
c. Update of the QMF synthesis filter memory

On the first received frame, the QMF synthesis filter memory must be calculated, since the QMF synthesis filterbank is inactive during lost frames (the PLC output is generated directly in the 16 kHz output speech domain). In time, this memory would normally correspond to the last samples of the last lost frame; however, the re-phasing needs to be taken into account. According to G.722, the QMF synthesis filter memory is given by:

x_d(i) = r_L(n-i) - r_H(n-i), i = 1, 2, ..., 11, and (97)

x_s(i) = r_L(n-i) + r_H(n-i), i = 1, 2, ..., 11. (98)
The first two output samples of the first received frame are calculated as:

x_out(j) = 2·Σ_{i=0}^{11} h_{2i}·x_d(i), (99)

x_out(j+1) = 2·Σ_{i=0}^{11} h_{2i+1}·x_s(i). (100)
The filter memory (i.e. x_d(i) and x_s(i), i = 1, 2, ..., 11) is calculated from the last 11 samples of the re-phased input to the simplified subband ADPCM encoders when re-encoding x_L(n) and x_H(n), n = 69-Δ_TL/2, 69-Δ_TL/2+1, ..., 79-Δ_TL/2 (i.e. up to the last sample of the re-phasing point):

x_d(i) = x_L(80-Δ_TL/2-i) - x_H(80-Δ_TL/2-i), i = 1, 2, ..., 11, and (101)

x_s(i) = x_L(80-Δ_TL/2-i) + x_H(80-Δ_TL/2-i), i = 1, 2, ..., 11, (102)

where x_L(n) and x_H(n) are stored in the state memory during lost frames.
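The memory rebuild of equations (101)-(102) is a simple indexed difference/sum; a sketch (assuming an even time lag, as the /2 in the indices implies):

```python
def qmf_synthesis_memory(x_lo, x_hi, t_l):
    """Rebuild x_d(i), x_s(i), i = 1..11 (eqs. 101-102) from the stored
    re-encoded subband signals; t_l is the (even) time lag."""
    base = 80 - t_l // 2
    xd = [x_lo[base - i] - x_hi[base - i] for i in range(1, 12)]
    xs = [x_lo[base - i] + x_hi[base - i] for i in range(1, 12)]
    return xd, xs
```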
8. Time warping

Time warping is the process of stretching or shrinking a signal along the time axis. The following describes how x_out(j) is time-warped to improve its alignment with the periodic waveform extrapolation signal x_PLC(j). The algorithm is executed only when T_L ≠ 0. Time warping is performed by module 1860 of Figure 18.
a. Time lag refinement

The time lag T_L used for time warping is refined by taking the maximum of the cross-correlation within the overlap-add window. The starting position of the overlap-add window in the first received frame, estimated from T_L, is:

SP_OLA = max(0, MIN_UNSTBL - T_L), (103)

where MIN_UNSTBL = 16.
The starting position of the extrapolated signal relative to SP_OLA is:

D_ref = SP_OLA - T_L - RSR, (104)

where RSR = 4 is the refinement search range.

The required length of the extrapolated signal is:

L_ref = OLALG + RSR. (105)
The extrapolated signal es_tw(j) is obtained using the same procedure as described in subsection D.6.c.i, except that LSW = OLALG, L = L_ref and D = D_ref.
The refinement lag T_ref is obtained by searching for the peak of:

R(k) = [Σ_{i=0}^{OLALG-1} es_tw(i-k+RSR)·x_out(i+SP_OLA)] / sqrt{[Σ_{i=0}^{OLALG-1} es_tw²(i-k+RSR)]·[Σ_{i=0}^{OLALG-1} x_out²(i+SP_OLA)]},
k = -RSR, -RSR+1, ..., RSR. (106)

The final time lag used for time warping is then obtained as:

T_Lwarp = T_L + T_ref. (107)
b. Computation of the time-warped x_out(j) signal

The signal x_out(j) is time-warped by T_Lwarp samples to form the signal x_warp(j), which is subsequently overlap-added with the waveform extrapolation signal es_ola(j). The timelines 2200, 2220 and 2240 of Figures 22A, 22B and 22C illustrate the three cases that arise depending on the value of T_Lwarp. In Figure 22A, T_Lwarp < 0, and x_out(j) undergoes shrinking, or compression. The first MIN_UNSTBL samples of x_out(j) are not used in the warping process that creates x_warp(j), and xstart = MIN_UNSTBL. In Figure 22B, 0 ≤ T_Lwarp ≤ MIN_UNSTBL, and x_out(j) is stretched by T_Lwarp samples. Again, the first MIN_UNSTBL samples of x_out(j) are not used, and xstart = MIN_UNSTBL. In Figure 22C, T_Lwarp ≥ MIN_UNSTBL, and x_out(j) is again stretched by T_Lwarp samples. However, since an extra T_Lwarp samples are created in the warping process, the first T_Lwarp samples of x_out(j) are not needed in this case; hence xstart = T_Lwarp.
In each case, the number of samples per sample add/drop is:

spad = (160 - xstart) / |T_Lwarp|. (108)

The warping is realized via piecewise single-sample shifts and triangular overlap-adds, starting from x_out[xstart]. To shrink the signal, a sample is periodically dropped; from the point of each drop, the original signal is overlap-added with the signal shifted to the left (due to the drop). To stretch the signal, a sample is periodically repeated; from the point of each repetition, the original signal is overlap-added with the signal shifted to the right (due to the repetition). The length of the overlap-add window, L_olawarp (note: this differs from the OLA region shown in Figures 22A, 22B and 22C), depends on the periodicity of the sample add/drop and is given as follows:
If T_Lwarp < 0, L_olawarp = (160 - xstart - |T_Lwarp|) / |T_Lwarp|;

otherwise, L_olawarp is given by the corresponding expression for stretching (equation 109). In either case it is then limited to

L_olawarp = min(8, L_olawarp).
The length of the warped input signal x_warp is:

L_xwarp = min(160, 160 - MIN_UNSTBL + T_Lwarp). (110)
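The three xstart cases of Figures 22A-22C, plus equations (108) and (110), can be collected into one small helper (hypothetical name; T_Lwarp ≠ 0 per the text):

```python
def warp_bookkeeping(t_lwarp, min_unstbl=16):
    """xstart for the three cases of Figures 22A-22C, the samples per
    add/drop (eq. 108) and the warped-signal length (eq. 110)."""
    # Figures 22A and 22B: skip the first MIN_UNSTBL samples;
    # Figure 22C (t_lwarp >= MIN_UNSTBL): skip the first T_Lwarp samples.
    xstart = min_unstbl if t_lwarp < min_unstbl else t_lwarp
    spad = (160 - xstart) / abs(t_lwarp)                 # eq. (108)
    l_xwarp = min(160, 160 - min_unstbl + t_lwarp)       # eq. (110)
    return xstart, spad, l_xwarp
```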
c. Computation of the waveform extrapolation signal

As shown in Figures 22A, 22B and 22C, the warped signal x_warp is overlap-added with the extrapolation signal es_ola(j) in the first received frame. The extrapolation signal es_ola(j) can be generated directly in the x_out(j) signal buffer by the following two-step procedure:
Step 1:

es_ola(j) = x_out(j) = ptfe·x_out(j-ppfe), j = 0, 1, ..., 160-L_xwarp+39. (111)

Step 2:

x_out(j) = x_out(j)·w_i(j) + ring(j)·w_o(j), j = 0, 1, ..., 39, (112)

where w_i(j) and w_o(j) are triangular ramp-up and ramp-down overlap-add windows of length 40, and ring(j) is the ringing signal computed in the manner described elsewhere in this application.
d. Overlap-add of the time-warped signal and the waveform extrapolation signal

The extrapolation signal computed in the preceding subsection is overlap-added with the warped signal x_warp(j) as follows:

x_out(160-L_xwarp+j) = x_out(160-L_xwarp+j)·w_o(j) + x_warp(j)·w_i(j), j = 0, 1, ..., 39. (113)
The remainder of x_warp(j) is then simply copied into the signal buffer:

x_out(160-L_xwarp+j) = x_warp(j), j = 40, 41, ..., L_xwarp-1. (114)
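The final overlap-add of equations (113)-(114) can be sketched as follows (40-sample triangular windows assumed for w_i and w_o):

```python
def ola_warped_signal(x_out, x_warp, l_xwarp):
    """Overlap-add of eq. (113) over the first 40 samples of the warped
    signal, followed by the straight copy of eq. (114)."""
    out = list(x_out)
    base = 160 - l_xwarp
    for j in range(40):
        wi = (j + 1) / 41.0   # assumed triangular ramp-up window
        out[base + j] = out[base + j] * (1.0 - wi) + x_warp[j] * wi
    for j in range(40, l_xwarp):
        out[base + j] = x_warp[j]
    return out
```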
E. Packet loss concealment for a subband predictive coder based on subband speech waveform extrapolation

The decoder/PLC system 2300 shown in Figure 23 is an alternative embodiment of the present invention. Most of the techniques developed above for decoder/PLC system 300 can also be used in this second exemplary embodiment. The key difference between decoder/PLC system 2300 and decoder/PLC system 300 is that the speech waveform extrapolation is performed in the subband speech signal domain rather than in the full-band speech signal domain.
As shown in Figure 23, decoder/PLC system 2300 includes a bit-stream de-multiplexer 2310, a low-band ADPCM decoder 2320, a low-band speech signal synthesizer 2322, a switch 2326, a high-band ADPCM decoder 2330, a high-band speech signal synthesizer 2332, a switch 2336, and a QMF synthesis filterbank 2340. The bit-stream de-multiplexer 2310 is essentially identical to the bit-stream de-multiplexer 210 of Figure 2, and the QMF synthesis filterbank 2340 is essentially identical to the QMF synthesis filterbank 240 of Figure 2.

Like decoder/PLC system 300 of Figure 3, decoder/PLC system 2300 processes frames in a manner that depends on the frame type, using the same frame types as in Figure 5 above.
When processing a Type 1 frame, decoder/PLC system 2300 performs standard G.722 decoding. In this mode of operation, blocks 2310, 2320, 2330 and 2340 of decoder/PLC system 2300 perform exactly the same functions as the corresponding blocks 210, 220, 230 and 240 of the conventional G.722 decoder 200, respectively. Specifically, the bit-stream de-multiplexer 2310 separates the input bit stream into a low-band bit stream and a high-band bit stream. The low-band ADPCM decoder 2320 decodes the low-band bit stream into a decoded low-band speech signal, and switch 2326 is connected to the upper position marked "Type 1", connecting the decoded low-band speech signal to the QMF synthesis filterbank 2340. The high-band ADPCM decoder 2330 decodes the high-band bit stream into a decoded high-band speech signal, and switch 2336 is likewise connected to the upper position marked "Type 1", connecting the decoded high-band speech signal to the QMF synthesis filterbank 2340. The QMF synthesis filterbank 2340 then recombines the decoded low-band and high-band speech signals into the full-band output speech signal.

Hence, when processing Type 1 frames, decoder/PLC system 2300 is equivalent to the decoder 200 of Figure 2, except that the decoded low-band speech signal is stored in the low-band speech signal synthesizer 2322 for possible use in a subsequent lost frame, and likewise the decoded high-band speech signal is stored in the high-band speech signal synthesizer 2332 for possible use in a subsequent lost frame. Other state updates and processing in anticipation of performing PLC operations may also be carried out.
When processing frames of Type 2, Type 3 and Type 4 (lost frames), the decoded speech signal of each subband, stored in association with previous frames, is extrapolated to fill the waveform gap associated with the current lost frame. This waveform extrapolation is performed by the low-band speech signal synthesizer 2322 and the high-band speech signal synthesizer 2332. Many prior techniques exist for performing the waveform extrapolation function of blocks 2322 and 2332. For example, one may use the techniques described in U.S. Patent Application No. 11/234,291 to Chen, filed September 26, 2005, entitled "Packet Loss Concealment for Block-Independent Speech Codecs", or modified versions of those techniques, such as the techniques described above in connection with decoder/PLC system 300 of Figure 3.
When processing a Type 2, Type 3 or Type 4 frame, switches 2326 and 2336 are at the lower position labeled "Types 2-6." They thereby connect the synthesized low-band audio signal and the synthesized high-band audio signal to QMF synthesis filter bank 2340, which re-combines them into a synthesized output speech signal for the current lost frame.
As with decoder/PLC system 300, the first few received frames immediately following a bad frame (Type 5 and Type 6 frames) require special handling to minimize the speech quality degradation caused by mismatched G.722 states and to ensure a smooth transition from the extrapolated speech signal waveform in the last lost frame to the decoded speech signal waveform in the first few good frames immediately following the last bad frame. Thus, when processing these frames, switches 2326 and 2336 remain in the lower position labeled "Types 2-6," so that the decoded low-band speech signal from low-band ADPCM decoder 2320 can be modified by low-band speech signal synthesizer 2322 before being provided to QMF synthesis filter bank 2340, and the decoded high-band speech signal from high-band ADPCM decoder 2330 can be modified by high-band speech signal synthesizer 2332 before being provided to QMF synthesis filter bank 2340.
Persons skilled in the relevant art(s) will appreciate that most of the techniques described above in sub-sections C and D for the special handling of the first few frames after a packet loss can also readily be applied in this exemplary embodiment to the first few frames after a packet loss. For example, decoding constraint and control logic (not shown in FIG. 23) may also be included in decoder/PLC system 2300 and used to constrain and control the decoding operations performed by low-band ADPCM decoder 2320 and high-band ADPCM decoder 2330 when processing Type 5 and Type 6 frames, in a manner similar to that described above with reference to decoder/PLC system 300. Likewise, each of the sub-band speech signal synthesizers 2322 and 2332 can be used to perform re-phasing and time-warping techniques, such as those described above with reference to decoder/PLC system 300. Since a full description of these techniques is provided in previous sections, the description of their use in the context of decoder/PLC system 2300 need not be repeated here.
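The time lag used by the re-phasing technique is found by maximizing a normalized cross-correlation between the extrapolated signal and the decoded signal of the first good frame. The sketch below shows a single-stage search; the function name and calling convention are assumptions, and the coarse-then-refined two-stage search described in the claims (optionally on down-sampled signals) would call such a routine twice with different lag ranges and windows.

```python
import math

# Sketch: find the lag that maximizes the normalized cross-correlation
# between the extrapolated signal and the first decoded frame after a
# loss.  `extrap` must extend max_lag samples beyond the window on both
# sides so every shifted segment stays in bounds.
def find_time_lag(extrap, decoded, max_lag, window):
    ref = decoded[:window]
    ref_energy = sum(s * s for s in ref)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        seg = extrap[max_lag + lag : max_lag + lag + window]
        denom = math.sqrt(sum(s * s for s in seg) * ref_energy)
        corr = sum(a * b for a, b in zip(seg, ref)) / denom if denom > 0 else 0.0
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return best_lag
```

The returned lag is the phase difference in samples; the decoder state is then re-set by re-encoding the synthesized signal up to the frame boundary plus or minus that many samples.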
Compared with decoder/PLC system 300, the main advantage of decoder/PLC system 2300 is its lower complexity. Because the speech signal is extrapolated in the sub-band domain, there is no need to employ a QMF analysis filter bank to split a full-band extrapolated speech signal into sub-band speech signals, as is done in the first exemplary embodiment. However, extrapolating the speech signal in the full-band domain also has its advantages, as will now be explained.
There are some potential problems associated with the way system 2300 of FIG. 23 extrapolates the high-band speech signal. First, if it does not perform periodic waveform extrapolation of the high-band speech signal, then the output speech signal will not preserve the periodicity of the high-band speech signal that may be present in some highly periodic voiced signals. On the other hand, if it does perform periodic waveform extrapolation of the high-band speech signal, another problem remains even if it uses the same pitch period used for extrapolating the low-band speech signal (which reduces computation and ensures that the two sub-band speech signals are extrapolated with the same pitch period). When the high-band speech signal is periodically extrapolated, the extrapolated high-band speech signal will be periodic and will have a harmonic structure in its spectrum; that is, the frequencies of the spectral peaks in the spectrum of the high-band speech signal will be related by integer multiples. However, once synthesis filter bank 2340 re-combines the high-band speech signal with the low-band speech signal, the spectrum of the high-band speech signal will be "translated," or shifted, to higher frequencies, possibly with spectral mirroring as well, depending on the QMF synthesis filter bank used. Thus, after such mirroring and frequency shifting, there is no guarantee that the spectral peaks in the high-band portion of the full-band output speech signal will still lie at frequencies that are integer multiples of the fundamental frequency of the low-band speech signal. This can potentially degrade the output audio quality for highly periodic voiced signals. In contrast, system 300 of FIG. 3 does not have this problem. Because system 300 performs audio signal extrapolation in the full-band domain, the frequencies of the harmonic peaks in the high band are guaranteed to be integer multiples of the fundamental frequency.
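The spectral imaging described above can be observed directly in a small DFT experiment: zero-insertion upsampling by 2 (the first step of QMF synthesis) replicates a sub-band tone at mirrored image frequencies, so a harmonically placed high-band peak need not land on an integer multiple of the low-band fundamental after synthesis. The frame size and tone bin below are illustrative choices, not parameters from the G.722 system.

```python
import math, cmath

# DFT magnitude (naive O(N^2) version, adequate for a 64-point demo).
def dft_mag(x):
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

# A sub-band tone exactly on bin 5 of a 32-sample frame...
sub = [math.cos(2 * math.pi * 5 * t / 32) for t in range(32)]
# ...zero-stuffed to twice the rate, as in QMF synthesis.
up = [0.0] * 64
up[::2] = sub
# The upsampled spectrum contains the tone AND its mirrored images.
peaks = sorted(k for k, m in enumerate(dft_mag(up)) if m > 10.0)
# peaks -> [5, 27, 37, 59]: bins 27 and 37 are mirror images about bin 32
```

Bins 27 and 37 are not, in general, integer multiples of a low-band fundamental, which is exactly the harmonic-structure concern raised above.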
In summary, decoder/PLC system 300 has the advantage that, for voiced signals, full-band speech signal extrapolation preserves the harmonic structure of the spectral peaks across the entire voice band. Decoder/PLC system 2300, on the other hand, has the advantage of lower complexity, but it may not preserve this harmonic structure in the higher sub-band.
F. Hardware and Software Implementations
The following description of a general-purpose computer system is provided for the sake of completeness. The present invention can be implemented in hardware, or as a combination of software and hardware. Consequently, the invention may be implemented in the environment of a computer system or other processing system. An example of such a computer system 2400 is shown in FIG. 24. In the present invention, all of the decoding and PLC operations described above in Sections C, D and E, for example, can execute on one or more distinct computer systems 2400 to implement the various methods of the present invention.
Computer system 2400 includes one or more processors, such as processor 2404. Processor 2404 can be a special-purpose or a general-purpose digital signal processor. Processor 2404 is connected to a communication infrastructure 2402 (for example, a bus or network). Various software implementations are described in terms of this exemplary computer system. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computer systems and/or computer architectures.
Computer system 2400 also includes a main memory 2406, preferably random access memory (RAM), and may also include a secondary memory 2420. Secondary memory 2420 may include, for example, a hard disk drive 2422 and/or a removable storage drive 2424, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, or the like. Removable storage drive 2424 reads from and/or writes to a removable storage unit 2428 in a well-known manner. Removable storage unit 2428 represents a floppy disk, magnetic tape, optical disk, or the like, which is read by and written to by removable storage drive 2424. As will be appreciated, removable storage unit 2428 includes a computer-usable storage medium having stored therein computer software and/or data.
In alternative implementations, secondary memory 2420 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 2400. Such means may include, for example, a removable storage unit 2430 and an interface 2426. Examples of such means include a program cartridge and cartridge interface (such as those found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units 2430 and interfaces 2426 which allow software and data to be transferred from removable storage unit 2430 to computer system 2400.
Computer system 2400 may also include a communications interface 2440. Communications interface 2440 allows software and data to be transferred between computer system 2400 and external devices. Examples of communications interface 2440 include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, and the like. Software and data transferred via communications interface 2440 are in the form of signals which may be electronic, electromagnetic, optical, or other signals capable of being received by communications interface 2440. These signals are provided to communications interface 2440 via a communications path 2442. Communications path 2442 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and other communications channels.
As used herein, the terms "computer program medium" and "computer usable medium" refer generally to media such as removable storage unit 2428, a hard disk installed in hard disk drive 2422, and signals received by communications interface 2440. These computer program products are means for providing software to computer system 2400.
Computer programs (also called computer control logic) are stored in main memory 2406 and/or secondary memory 2420. Computer programs may also be received via communications interface 2440. Such computer programs, when executed, enable computer system 2400 to implement the present invention as discussed herein. In particular, the computer programs, when executed, enable processor 2404 to implement the processes of the present invention, such as any of the methods described herein. Accordingly, such computer programs represent controllers of computer system 2400. Where the invention is implemented using software, the software may be stored in a computer program product and loaded into computer system 2400 using removable storage drive 2424, interface 2426, or communications interface 2440.
In another embodiment, features of the invention are implemented primarily in hardware using, for example, hardware components such as application-specific integrated circuits (ASICs) and gate arrays. Implementation of a hardware state machine to perform the functions described herein will also be apparent to persons skilled in the relevant art(s).
G. Conclusion
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the invention. Thus, the scope of protection of the present invention should not be limited to any of the above-described exemplary embodiments, but should be defined in accordance with the claims and their equivalents.

Claims (9)

1. A method for updating a state of a decoder used to decode a series of frames representing an encoded audio signal, characterized in that the method comprises:
synthesizing an output audio signal associated with a lost frame in the series of frames;
setting the decoder state to align with the synthesized output audio signal at a frame boundary;
generating an extrapolated signal based on the synthesized output audio signal;
calculating a time lag between the extrapolated signal and a decoded audio signal associated with a first received frame after the lost frame in the series of frames, wherein the time lag represents a phase difference between the extrapolated signal and the decoded audio signal; and
re-setting the decoder state based on the time lag.
2. The method according to claim 1, characterized in that setting the decoder state to align with the synthesized output audio signal at a frame boundary comprises re-encoding a series of samples representing the synthesized output audio signal up to the frame boundary; and
re-setting the decoder state based on the time lag comprises re-encoding a series of samples representing the synthesized output audio signal up to the frame boundary plus or minus a number of samples associated with the time lag.
3. The method according to claim 1, characterized in that calculating the time lag between the extrapolated signal and the decoded audio signal comprises maximizing a correlation between the extrapolated signal and the decoded audio signal.
4. The method according to claim 1, characterized in that calculating the time lag between the extrapolated signal and the decoded audio signal comprises:
searching for a first peak of a normalized cross-correlation function between the extrapolated signal and the decoded audio signal using a first lag search range and a first lag search window to determine a coarse time lag, wherein the first lag search range specifies a range over which a starting point of the extrapolated signal is shifted during the search, and the first lag search window specifies a number of samples over which the normalized cross-correlation function is calculated; and
searching for a second peak of the normalized cross-correlation function between the extrapolated signal and the decoded audio signal using a second lag search range and a second lag search window to determine a refined time lag, wherein the second lag search range is smaller than the first lag search range.
5. The method according to claim 4, characterized in that searching for the first peak of the normalized cross-correlation function between the extrapolated signal and the decoded audio signal comprises searching for a peak of a normalized cross-correlation function between down-sampled versions of the extrapolated signal and the decoded audio signal.
6. The method according to claim 4, characterized in that the second lag search window is smaller than the first lag search window.
7. A system for updating a state of a decoder used to decode a series of frames representing an encoded audio signal, characterized by comprising:
a decoder for decoding received frames in the series of frames representing the encoded audio signal;
an audio signal synthesizer for synthesizing an output audio signal associated with a lost frame in the series of frames; and
decoder state update logic for setting the decoder state to align with the synthesized output audio signal at a frame boundary after the synthesized output audio signal has been generated, generating an extrapolated signal based on the synthesized output audio signal, calculating a time lag between the extrapolated signal and a decoded audio signal associated with a first received frame after the lost frame in the series of frames, and re-setting the decoder state based on the time lag, wherein the time lag represents a phase difference between the extrapolated signal and the decoded audio signal.
8. The system according to claim 7, characterized in that the decoder state update logic sets the decoder state to align with the synthesized output audio signal at a frame boundary by re-encoding a series of samples representing the synthesized output audio signal up to the frame boundary; and
the decoder state update logic re-sets the decoder state based on the time lag by re-encoding a series of samples representing the synthesized output audio signal up to the frame boundary plus or minus a number of samples associated with the time lag.
9. The system according to claim 7, characterized in that the decoder state update logic calculates the time lag between the extrapolated signal and the decoded audio signal by maximizing a correlation between the extrapolated signal and the decoded audio signal.
CN2007800020499A 2006-08-15 2007-08-15 Method and system for updating state of demoder Active CN101366080B (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US83762706P 2006-08-15 2006-08-15
US60/837,627 2006-08-15
US84805106P 2006-09-29 2006-09-29
US84804906P 2006-09-29 2006-09-29
US60/848,049 2006-09-29
US60/848,051 2006-09-29
US85346106P 2006-10-23 2006-10-23
US60/853,461 2006-10-23
PCT/US2007/076009 WO2008022200A2 (en) 2006-08-15 2007-08-15 Re-phasing of decoder states after packet loss

Publications (2)

Publication Number Publication Date
CN101366080A CN101366080A (en) 2009-02-11
CN101366080B true CN101366080B (en) 2011-10-19

Family

ID=40332816

Family Applications (5)

Application Number Title Priority Date Filing Date
CN2007800020499A Active CN101366080B (en) 2006-08-15 2007-08-15 Method and system for updating state of demoder
CN200780001854XA Expired - Fee Related CN101366079B (en) 2006-08-15 2007-08-15 Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform
CN2007800014996A Expired - Fee Related CN101361112B (en) 2006-08-15 2007-08-15 Re-phasing of decoder states after packet loss
CN2007800015096A Expired - Fee Related CN101361113B (en) 2006-08-15 2007-08-15 Constrained and controlled decoding after packet loss
CN2007800031830A Expired - Fee Related CN101375330B (en) 2006-08-15 2007-08-15 Re-phasing of decoder states after packet loss

Family Applications After (4)

Application Number Title Priority Date Filing Date
CN200780001854XA Expired - Fee Related CN101366079B (en) 2006-08-15 2007-08-15 Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform
CN2007800014996A Expired - Fee Related CN101361112B (en) 2006-08-15 2007-08-15 Re-phasing of decoder states after packet loss
CN2007800015096A Expired - Fee Related CN101361113B (en) 2006-08-15 2007-08-15 Constrained and controlled decoding after packet loss
CN2007800031830A Expired - Fee Related CN101375330B (en) 2006-08-15 2007-08-15 Re-phasing of decoder states after packet loss

Country Status (1)

Country Link
CN (5) CN101366080B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5456914B2 (en) * 2010-03-10 2014-04-02 フラウンホーファーゲゼルシャフト ツール フォルデルング デル アンゲヴァンテン フォルシユング エー.フアー. Audio signal decoder, audio signal encoder, method, and computer program using sampling rate dependent time warp contour coding
WO2014042439A1 (en) 2012-09-13 2014-03-20 LG Electronics Inc. Frame loss recovering method, and audio decoding method and device using same
KR102037691B1 (en) 2013-02-05 2019-10-29 Telefonaktiebolaget LM Ericsson (publ) Audio frame loss concealment
BR112015031178B1 (en) 2013-06-21 2022-03-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V Apparatus and method for generating an adaptive spectral shape of comfort noise
CN108364657B (en) * 2013-07-16 2020-10-30 超清编解码有限公司 Method and decoder for processing lost frame
EP2922055A1 (en) 2014-03-19 2015-09-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and corresponding computer program for generating an error concealment signal using individual replacement LPC representations for individual codebook information
EP2922056A1 (en) 2014-03-19 2015-09-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and corresponding computer program for generating an error concealment signal using power compensation
EP2922054A1 (en) * 2014-03-19 2015-09-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and corresponding computer program for generating an error concealment signal using an adaptive noise estimation
NO2780522T3 (en) 2014-05-15 2018-06-09
JP6700507B6 (en) * 2014-06-10 2020-07-22 エムキューエー リミテッド Digital encapsulation of audio signals
CN104021792B (en) * 2014-06-10 2016-10-26 中国电子科技集团公司第三十研究所 A kind of voice bag-losing hide method and system thereof
FR3024582A1 (en) * 2014-07-29 2016-02-05 Orange MANAGING FRAME LOSS IN A FD / LPD TRANSITION CONTEXT
EP2988300A1 (en) * 2014-08-18 2016-02-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Switching of sampling rates at audio processing devices
EP3023983B1 (en) * 2014-11-21 2017-10-18 AKG Acoustics GmbH Method of packet loss concealment in ADPCM codec and ADPCM decoder with PLC circuit
CN106898356B (en) * 2017-03-14 2020-04-14 建荣半导体(深圳)有限公司 Packet loss hiding method and device suitable for Bluetooth voice call and Bluetooth voice processing chip
CN107749299B (en) * 2017-09-28 2021-07-09 瑞芯微电子股份有限公司 Multi-audio output method and device
CN110310621A (en) * 2019-05-16 2019-10-08 平安科技(深圳)有限公司 Sing synthetic method, device, equipment and computer readable storage medium
CN110970038B (en) * 2019-11-27 2023-04-18 云知声智能科技股份有限公司 Voice decoding method and device
CN113035205B (en) * 2020-12-28 2022-06-07 阿里巴巴(中国)有限公司 Audio packet loss compensation processing method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1288916A2 (en) * 2001-08-17 2003-03-05 Broadcom Corporation Method and system for frame erasure concealment for predictive speech coding based on extrapolation of speech waveform
US20060045138A1 (en) * 2004-08-30 2006-03-02 Black Peter J Method and apparatus for an adaptive de-jitter buffer
US7047190B1 (en) * 1999-04-19 2006-05-16 At&Tcorp. Method and apparatus for performing packet loss or frame erasure concealment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2774827B1 (en) * 1998-02-06 2000-04-14 France Telecom METHOD FOR DECODING A BIT STREAM REPRESENTATIVE OF AN AUDIO SIGNAL
CA2388439A1 (en) * 2002-05-31 2003-11-30 Voiceage Corporation A method and device for efficient frame erasure concealment in linear predictive based speech codecs
SG124307A1 (en) * 2005-01-20 2006-08-30 St Microelectronics Asia Method and system for lost packet concealment in high quality audio streaming applications


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHIBANI M ET AL. Resynchronization of the Adaptive Codebook in a Constrained CELP Codec After a Frame Erasure. Acoustics, Speech and Signal Processing, 2006. 1-13. *
SERIZAWA M ET AL. A packet loss concealment method using pitch waveform repetition and internal state update on the decoded speech for the sub-band ADPCM wideband speech codec. Speech Coding, 2002, IEEE Workshop Proceedings. 68-70. *

Also Published As

Publication number Publication date
CN101361113B (en) 2011-11-30
CN101361112B (en) 2012-02-15
CN101366079A (en) 2009-02-11
CN101375330B (en) 2012-02-08
CN101361112A (en) 2009-02-04
CN101375330A (en) 2009-02-25
CN101361113A (en) 2009-02-04
CN101366080A (en) 2009-02-11
CN101366079B (en) 2012-02-15

Similar Documents

Publication Publication Date Title
CN101366080B (en) Method and system for updating state of demoder
CN1957398B (en) Methods and devices for low-frequency emphasis during audio compression based on acelp/tcx
US8041562B2 (en) Constrained and controlled decoding after packet loss
CN101981615B (en) Concealment of transmission error in a digital signal in a hierarchical decoding structure
JP2012141649A (en) Sub-band voice codec with multi-stage codebooks and redundant coding technique field
KR20100085994A (en) Scalable speech and audio encoding using combinatorial encoding of mdct spectrum
CN103384900A (en) Low-delay sound-encoding alternating between predictive encoding and transform encoding
CN1751338B (en) Method and apparatus for speech coding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1129488

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1129488

Country of ref document: HK

TR01 Transfer of patent right

Effective date of registration: 20180503

Address after: Singapore, Singapore

Patentee after: Avago Technologies Fiber IP Singapore Pte. Ltd.

Address before: California

Patentee before: Zyray Wireless Inc.

TR01 Transfer of patent right

Effective date of registration: 20190827

Address after: Singapore, Singapore

Patentee after: Avago Technologies International Sales Pte. Limited

Address before: Singapore, Singapore

Patentee before: Avago Technologies Fiber IP Singapore Pte. Ltd.