WO2000067417A1 - Robust coding for the transmission of audio or video signals - Google Patents
- Publication number
- WO2000067417A1 (PCT/GB2000/001649)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bits
- group
- digital codes
- packet
- significance
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/0078—Avoidance of errors by organising the transmitted data in a format specifically designed to deal with errors, e.g. location
- H04L1/0083—Formatting with frames or packets; Protocol or part of protocol for error control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L2001/0098—Unequal error protection
Definitions
- the present invention relates to the transfer of digital data.
- the Internet and other digital media are being used more and more for the transmission of data.
- the transmitted data falls into two categories. These categories comprise data that must be received, allowing for error correction, exactly as it was transmitted and data that, on reception, need only correspond with the transmitted data within certain tolerances, i.e. loss-tolerant signals.
- the first category includes document files, financial information and program code, for example Java applets.
- the second category comprises data that is primarily intended to be rendered perceptible to the human senses, for example photographic images, music and speech.
- the present invention is concerned with the transmission of data in the second category.
- a method of transmitting a loss-tolerant signal comprising selecting a bit from each digital code in a group of digital codes representing a time-varying signal at a plurality of instants, the selected bits all having the same significance, and transmitting the selected bits together.
- an apparatus for transmitting a loss-tolerant signal comprising selection means for selecting a bit from each digital code in a group of digital codes representing a time-varying signal at a plurality of instants, the selected bits all having the same significance, and means for transmitting the selected bits together.
- the selected bits have the significance of the most significant 1 bit in the group of digital codes.
- a further bit may be selected from each digital code, the selected further bits all having the same, lower significance, and transmitting the selected further bits together.
- the transmitted signal increases in fidelity as the number of times bits are selected increases.
- the selected further bits are transmitted after said selected bits. However, this is not essential but the selections should be made from an unbroken series of bit significances.
- the digital codes each comprise a sign bit and a plurality of magnitude bits, with the sign bits being accorded a significance equivalent to that of the most significant magnitude bits.
- the step of selecting bits having the same significance may be repeated for different significance levels, said significance levels being selected in dependence on the bandwidth of a channel through which the bits are to be transmitted and being in an unbroken sequence.
- a single file containing, for example a piece of music can be used to provide the piece of music to a remote listener with different degrees of fidelity simply by reading and transmitting the appropriate sub-set of bits from the file.
- a first subset of the bits, having significances in a first upper range, may be selected and transmitted in a first packet with a second subset of the bits, having significances in a second lower range, being selected and transmitted in a second packet, the packets including the same destination address.
- the first packets can be sent with, for example, approximately 25% of the data required for full fidelity to provide a preview to a user who can then request the second packets.
- the data from the second packets is added to the data from the first packets at the receiver to increase the fidelity of the signal presented to the user.
- the second packets may also include the data sent in the first packets.
- the first and second packets are preferably distinguished by respective quality of service codes in accordance with a quality of service routing protocol of a router in a path to the destination identified by the destination address. Consequently, the quality of service routing protocol is more likely to discard less significant data so that gaps in the received signal are less likely to occur, instead there will be short-duration reductions in fidelity which a listener, for example in the case of audio signals, may not even notice.
- the bits are transmitted in packets and each packet comprises bits selected from a respective first group of digital codes and bits selected from a respective second group of digital codes, the second group representing an earlier part of said time-varying signal than the first group. More preferably, a greater portion of the packet is given over to the bits selected from the first group than to the bits selected from the second group.
- the packets may comprise three or more sections generally of decreasing size and containing successively older data.
- the transmitted bits are compressed.
- a method according to the present invention may comprise re-ordering the bits of said digital codes so as to group the bits thereof by significance before selecting bits for transmission.
- a method according to the present invention includes transforming time domain samples into frequency domain coefficients, wherein said digital codes comprise frequency domain coefficients.
- the term "frequency domain coefficients" includes quasi-frequency domain coefficients such as those produced by wavelet packet transforms as well as the coefficients produced by modified discrete cosine transformations (MDCT) and the like. Wavelet packet transforms produce "scale" coefficients. However, these coefficients are dependent on the spectral content of the transformed signal.
- each coefficient produced by the wavelet packet transform should be the result of the same number of filtering and decimation steps, such as a symmetrically branching tree arrangement of filter and decimator functional units.
- Each node having the same depth will have the same number of branches but the number of branches may differ between nodes at different depths.
- the bits to be transmitted may be compressed during a re-ordering process.
- the re-ordering comprises arranging the coefficients in a representation of a two-dimensional matrix having a separate column for each time slot and separate row for each frequency subband. More preferably, the rows are ordered by frequency subband. However, the rows may be ordered in whatever manner produces the best compression, which will be dependent on the spectral content of the original signal. Still more preferably, the re-ordering comprises for each significance level of the coefficients in each column in row order, replacing runs of zeros terminating at an edge row with a termination marker, e.g.
- the re-ordering comprises for each significance level of the coefficients in each column in row order, replacing runs of zeros through coefficients, having a most significant 1 bit in a yet unhandled significance level, and terminating before an edge row with a run-length code.
- the run-length code may comprise a prefix defining a range and a suffix defining a position within the range defined by the prefix.
- bits in significance levels above that containing the most significant 1 bit among the coefficients are discarded during re-ordering.
- a method of receiving a loss-tolerant signal comprising receiving bits of a first group of digital codes, re-ordering the bits to produce a second group of digital codes, each member of which comprises at least the most significant bit or bits of a corresponding member of said first group of digital codes, and generating a time-varying signal using said second group of codes.
- an apparatus for receiving a loss-tolerant signal comprising: receiving means for receiving bits of a first group of digital codes, the more significant bits of said codes preceding the less significant bits; means for re-ordering the bits to produce a second group of digital codes, each member of which comprises at least the most significant bit or bits of a corresponding member of said first group of digital codes; and means for generating a time-varying signal using said second group of codes.
- the receiving method and apparatus should be adapted to meet the requirements of the various signal forms produced in accordance with the transmission aspect of the present invention.
- the codes of the second group are padded with zeros in positions for which bits were not received, such that the digital codes of the second group have the same number of bits as the digital codes of the first group.
- the method according to the present invention comprises receiving a first subset of the bits of the digital codes of the first group, having significances in a first upper range, in a first packet and receiving a second subset of the bits of the digital codes of the first group, having significances in a second lower range, in a second packet, and appending the bits of the second subset to those of the first subset before re-ordering to produce the second group of digital codes.
- a method comprises receiving a plurality of packets, each packet comprises a first section followed by a second section, the first section comprising re-ordered bits from one group of digital codes and the second section comprising re-ordered bits from another group of digital codes, wherein the bits in the second section represent an earlier part of said time-varying signal than those in the first section.
- such a method comprises the steps of:- receiving a first packet; re-ordering the bits in the first section for reproducing said time-varying signal; receiving a second packet; determining that an intervening packet has been lost; re-ordering the bits of the second section of the second packet for reproducing said time-varying signal; and thereafter re-ordering the bits of the first section of the second packet for reproducing said time-varying signal.
- the re-ordered bits are preferably represented in the form of a two-dimensional matrix having a column for each time slot and a row for each frequency subband represented by the coefficients, and re-ordering comprises, for each significance level of the coefficients of each time slot, replacing a predetermined termination marker, if present, with a run of zeros terminating at an edge row.
- re-ordering preferably comprises for each significance level of each time slot, determining the presence of a run-length code and, if a run-length code is present, replacing it with a run of zeros having a length determined by the run-length code. More preferably, the run-length code comprises a prefix defining a range and a suffix defining a position within the range defined by the prefix.
- Padding of untransmitted more significant bit positions of the first group digital codes may be required.
- a significance code may need to be obtained from the received signal.
- the present invention may be embodied in an audio playback device, which may be portable, including memory means for storing bits received by the receiving means, wherein the re-ordering means is configured to re-order bits stored in the memory for generating a playback audio signal.
- a method of operating a network routing node for routing a signal packet in which more significant bits precede less significant bits comprising determining the bandwidth available in a path away from the node and, if the bandwidth is below a threshold value, truncating the data in said packet in dependence on the determined bandwidth.
- the method may be applied in a network having a TDMA wireless link to a terminal apparatus, wherein the bandwidth varies with the number of slots available in each frame for transmissions to the terminal apparatus.
- Transmitters and receivers according to the present invention may take many forms.
- a voice over IP (VOIP) terminal can be made by combining the transmitting and receiving parts in the same apparatus.
- Figure 1 shows a transmitter and a receiver connected by a network
- Figure 2 illustrates a transmitter according to the present invention
- Figure 3 illustrates the process of interleaving the bits of a group of digital codes
- Figure 4 illustrates a receiver according to the present invention
- Figure 5 illustrates a wireless link to a terminal apparatus
- Figure 6 is a flowchart illustrating the operation of a network node
- Figure 7 is a flowchart illustrating the operation of a receiver according to the present invention.
- Figure 8 illustrates a transmitter according to the present invention
- Figure 9 is a flowchart illustrating the operation of a transmitter according to the present invention.
- Figure 10 illustrates the format of a data part of a packet according to the present invention
- Figure 11 is a flowchart illustrating the operation of a receiver according to the present invention.
- Figure 12 illustrates a receiver according to the present invention
- Figure 13 illustrates a wavelet packet transform operation
- Figure 14 illustrates the output of a wavelet packet transform operation
- Figure 15 is a flowchart illustrating the operation of a transmitter according to the present invention
- Figure 16 is a flowchart illustrating the operation of a receiver according to the present invention
- Figure 17 illustrates a receiver according to the present invention
- Figure 18 is a flowchart illustrating the operation of a receiver according to the present invention
- Figure 19 is a flowchart illustrating the operation of a transmitter according to the present invention
- Figure 20 is a simplified view of the data for one time-slot illustrating the compression effected by the method illustrated in Figure 21;
- Figure 21 is a flowchart illustrating the operation of a receiver according to the present invention.
- Figure 22 shows a portable audio playback device according to the present invention.
- a transmission system comprises a transmitter 1, a receiver 2 and a transmission medium 3.
- the transmission medium 3 may be characterised as having a bandwidth (B) determined by the minimum bandwidth of the signal path, e.g. bottleneck 4, and a notional switch 5 that directs signal portions into oblivion 6.
- the switch 5 may operate stochastically or according to a pattern, e.g. EMC from vehicle ignition systems.
- the transmitter 1 comprises a computer provided with a source of audio signals 7, e.g. a magnetic tape recording or a microphone, an analogue-to-digital converter 8 and a hardware network interface (not shown).
- the computer also supports an interleaver process 9, a transmission process 10, a UDP socket 11 and a request processing process or processes (not shown).
- the analogue-to-digital converter 8 digitises the output of the audio signal source 7 (Figure 3(a)) as signed 4-bit numbers (Figure 3(b)). (Four-bit numbers are used here in the interests of clarity and it will be appreciated that a larger number of bits, e.g. 10 or 16, would be desirable in many circumstances.)
- the interleaver process 9 reads the samples from the analogue-to-digital converter 8.
- the interleaver process 9 processes the samples in groups of 32.
- the bits of the samples in a complete group are interleaved and then added to a file 12, created for the current audio signal, or supplied to the transmission process 10.
- a characteristic of interleaved sample groups is that their tails can be lopped without making the remaining data unusable. Referring to Figure 3(d), it can be seen that the form of the original waveform is largely retained even though the last 32 bits of a sample group have been lost. This characteristic is even more marked in the case of samples comprising larger numbers of bits.
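- the interleaving and tail-lopping described above can be sketched as follows. This is only an illustration under stated assumptions: samples are held in sign-magnitude form with four magnitude bits (so that 32 samples give 160 bits), the sign bits are placed ahead of the most significant magnitude bits, and the names `interleave` and `deinterleave` are hypothetical.

```python
def interleave(samples, magnitude_bits=4):
    """Group the bits of sign-magnitude samples by significance (bit planes)."""
    planes = [[1 if s < 0 else 0 for s in samples]]      # sign plane first (assumed order)
    for level in range(magnitude_bits - 1, -1, -1):      # most significant plane down to least
        planes.append([(abs(s) >> level) & 1 for s in samples])
    return [bit for plane in planes for bit in plane]


def deinterleave(bits, count, magnitude_bits=4):
    """Rebuild samples, treating bits that never arrived as zero."""
    expected = count * (magnitude_bits + 1)
    bits = list(bits) + [0] * (expected - len(bits))     # zero-pad a lopped tail
    signs = bits[:count]
    samples = []
    for i in range(count):
        magnitude = 0
        for plane in range(magnitude_bits):
            magnitude = (magnitude << 1) | bits[count * (plane + 1) + i]
        samples.append(-magnitude if signs[i] else magnitude)
    return samples


group = [((i * 7) % 31) - 15 for i in range(32)]   # 32 made-up samples in -15..15
stream = interleave(group)                          # 32 * 5 = 160 bits
lopped = deinterleave(stream[:-32], len(group))     # last 32 bits (least significant plane) lost
```

- with the final 32 bits removed, every reconstructed sample in this sketch differs from the original by at most one quantisation step, which is the property the description relies on.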
- the transmission process 10 sends interleaved sample data from the interleaver process 9 or a file 12 to a UDP socket for transmission to a requesting receiver 2. To achieve this the transmission process 10 is also provided with a channel class and a destination address. The operation of the transmission process 10 will be described in more detail below.
- the receiver 2 comprises a computer provided with a digital-to-analogue converter 20, a loudspeaker 21 and a hardware network interface (not shown).
- the computer also supports a UDP socket 16, a reception process 17, a de-interleaver process 18 and a buffering process 19.
- UDP datagrams from the transmitter 1 are received by the UDP socket 16 and their data contents passed to the de-interleaver process 18 by the reception process 17.
- the de-interleaver process 18 de-interleaves the received data and passes the de-interleaved sample groups to the buffer process 19.
- the buffer process 19 ensures that, subject to the datagrams arriving at a rate greater than some system-dependent threshold rate, the samples are presented to the digital-to-analogue converter 20 at a constant rate and the output of the digital-to-analogue converter 20, and hence the output of the loudspeaker, faithfully reproduces the frequency components of the original signal.
- the ability of the system to tolerate the truncation of interleaved sample groups means that the contents of datagrams can be tailored according to the bandwidth (B) of the signal path between the transmitter 1 and the receiver 2.
- a user of the receiver 2 may wish to hear a piece of music stored in a file 12 at the transmitter 1.
- the user therefore instructs a process (not shown) running at the receiver 2 to send a request to the transmitter 1.
- This request includes the identity of the file 12 and a channel class.
- the channel class may be determined on the basis of the type of connection between the receiver 2 and the intervening network, e.g. dial-up, ISDN, WAP, and/or a locally determined actually achieved bitrate.
- the request is received by a receiver process (not shown) which starts the transmission process 10 with the file and channel class as parameters.
- the transmission process 10 starts to read groups of interleaved samples from the file 12.
- if the channel belongs to a first, highest bandwidth class, the transmission process 10 includes all the bits of a respective interleaved sample group in each transmitted UDP datagram. However, if the channel belongs to a second, lower bandwidth class, the transmission process 10 omits the final 32 bits of each interleaved sample group when generating the UDP datagram data parts. Similarly, if the channel belongs to a third, worse class, the final 64 bits of each interleaved sample group are omitted. Thus, it can be seen that a single data file can produce useable signals at different bit-rates very simply with little, if any, processing overhead.
- when each datagram is received, the reception process 17 sends its data part to the de-interleaver process 18. If the number of data bits in the datagram is less than 160, the reception process 17 pads the end of the received data with zeros to make it up to 160 bits in total.
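- a minimal sketch of the class-dependent truncation at the transmitter and the zero padding at the receiver is given below. The numeric class labels 1 to 3 and the function names are assumptions made for illustration; only the 160-bit group size and the 32-bit and 64-bit omissions come from the description.

```python
GROUP_BITS = 160                      # 32 samples x (sign + 4 magnitude bits)
OMITTED_TAIL = {1: 0, 2: 32, 3: 64}   # hypothetical class label -> bits dropped from the tail


def datagram_payload(interleaved_group, channel_class):
    """Transmitter side: keep only the leading, more significant bits."""
    keep = GROUP_BITS - OMITTED_TAIL[channel_class]
    return interleaved_group[:keep]


def pad_received(payload):
    """Receiver side: restore the block to 160 bits with trailing zeros."""
    return payload + [0] * (GROUP_BITS - len(payload))


group = [i % 2 for i in range(GROUP_BITS)]        # placeholder bit pattern
sent = datagram_payload(group, channel_class=3)   # 96 bits on the wire
restored = pad_received(sent)                     # back to 160 bits for de-interleaving
```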
- a mobile telephone network includes a serving GPRS support node (SGSN) 25 and a mobile station 26.
- GPRS (General Packet Radio Service) allocates time slots in a GSM frame for data transmission dynamically on the basis of traffic levels.
- a GPRS transmission will be allocated eight time slots per frame, if there is little traffic in the cell containing the source/destination mobile station 26 but will be allocated successively fewer time slots per frame as traffic in the cell increases.
- the SGSN 25 is largely conventional save that it is configured, e.g. by programming a component computer, to identify UDP datagrams of the type produced by the transmitter 1 shown in Figure 2. To achieve this, the seventh bits of the "type of service" portions of the UDP headers are set to one by the UDP socket 11 and the SGSN 25 looks for these. Referring to Figure 6, when the SGSN 25 detects a UDP datagram of the subject type (step s1), it determines the number of slots per frame allocated to data communication with the destination mobile station 26 (step s2). If the number of slots per frame is less than 2 (step s3), the SGSN 25 removes a predetermined portion of the tail of each UDP datagram (step s4) before sending it on to the mobile station 26.
- one or more further thresholds may be used to trigger truncation of packets by differing amounts.
- the threshold or thresholds triggering truncation of the UDP packets will depend on the packet sizes and the data capacity of each time slot.
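- the slot-dependent truncation at the routing node might be sketched as below. The policy table is entirely hypothetical: the description gives only the single threshold of 2 slots per frame and states that further thresholds and the amounts removed depend on packet sizes and slot capacity.

```python
# Hypothetical policy: (truncate when slots per frame fall below this, bits to keep),
# listed with the tightest limit first.
POLICY = [(1, 64), (2, 96)]


def truncate_for_slots(payload, slots_per_frame, policy=POLICY):
    """Lop the tail of a significance-ordered payload when few slots are allocated."""
    for limit, keep in policy:
        if slots_per_frame < limit:
            return payload[:keep]
    return payload                      # enough bandwidth: forward unchanged


payload = [1] * 160
forwarded_full = truncate_for_slots(payload, slots_per_frame=8)
forwarded_short = truncate_for_slots(payload, slots_per_frame=1)
```

- because the more significant bits lead each packet, the node can shorten a packet without parsing or re-encoding its contents.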
- a "quality of service" routing protocol has been proposed for TCP/IP transmissions.
- an overloaded router in a TCP/IP network will discard new incoming packets simply because it has run out of buffer space.
- a "quality of service" approach marks each datagram with a service quality code and routers use this code to determine which packets to discard when they become overloaded. For instance, a router may store a table containing the service quality code for all of the datagrams in its input buffer. When the buffer is full and a new datagram arrives, the router reads its service quality code and then searches the table for the most recent datagram having a lower service quality code. If a datagram having a lower service quality code is located, it is discarded and the new datagram is stored in the router's input buffer. However, if no datagrams with a lower service quality code are found, the newly incoming datagram is discarded.
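- the discard rule just described can be sketched as follows. The class name `Router`, the use of larger integers for higher service quality, and the string payloads are assumptions for illustration only.

```python
class Router:
    """Toy input buffer that discards by service quality code when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []                      # (quality_code, datagram), oldest first

    def receive(self, quality_code, datagram):
        """Store the datagram; return whatever was discarded, or None."""
        if len(self.buffer) < self.capacity:
            self.buffer.append((quality_code, datagram))
            return None
        # Buffer full: search from the newest entry for a lower quality code.
        for index in range(len(self.buffer) - 1, -1, -1):
            if self.buffer[index][0] < quality_code:
                discarded = self.buffer.pop(index)[1]
                self.buffer.append((quality_code, datagram))
                return discarded
        return datagram                       # nothing less important: drop the newcomer


router = Router(capacity=2)
router.receive(3, "sign and MSB bits")
router.receive(1, "least significant bits")
dropped = router.receive(2, "middle bits")    # evicts the lowest-quality entry
```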
- a modified form of the transmitter 1 makes use of a "quality of service" routing protocol to advantage.
- the operation of the transmitter 1 is substantially as described in the case of the first preferred embodiment except for the operation of the transmission process 10.
- the transmission process 10 treats each interleaved sample group as four sub-groups.
- the first sub-group comprises the sign bits and the most significant bits and the second, third and fourth sub-groups comprise respectively the second, third and fourth most significant bits.
- the bits of the first sub-group are sent to the UDP socket 11 for transmission in a first datagram marked with a high service quality code.
- the second, third and fourth sub-groups are sent to the UDP socket separately for transmission in respective UDP datagrams with successively lower service quality codes.
- if the datagrams are routed through one or more overloaded routers, the datagrams with the most important information, i.e. the sign bits and the most significant magnitude bits, are routed in preference to the packets containing bits of lesser significance.
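- splitting an interleaved sample group into the four sub-groups might look like the sketch below. It assumes the 160-bit layout of the first embodiment (32 sign bits, then four planes of 32 magnitude bits) and a hypothetical code scale in which 4 is the highest service quality.

```python
def split_sub_groups(interleaved_group, samples=32):
    """Return [(quality_code, bits), ...], most significant sub-group first."""
    first = interleaved_group[:2 * samples]            # sign bits + most significant bits
    rest = [interleaved_group[start:start + samples]
            for start in range(2 * samples, len(interleaved_group), samples)]
    sub_groups = [first] + rest
    top = len(sub_groups)                              # hypothetical code scale: 4 is best
    return [(top - i, bits) for i, bits in enumerate(sub_groups)]


group = [i % 2 for i in range(160)]                    # placeholder bit pattern
datagrams = split_sub_groups(group)
```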
- the receiver 2 is modified for the present embodiment so that the reception process 17 will attempt to reconstruct the original interleaved sample groups and then pass the reconstructions to the de-interleaver process 18.
- step s11 UDP datagram
- step s12 the contents, if any, of a 160-bit buffer are output to the de-interleaver process 18 (step s13) and the buffer is set to contain all zeros (step s14).
- step s15 the data from the datagram is stored in the first 64 bits of the buffer. If the next received datagram relates to a preceding sample group (step s16), it is discarded (step s17); otherwise the data from the new packet is stored in the appropriate place in the buffer (step s18), i.e.
- the performance of this embodiment can be improved by adding a buffer for temporarily storing datagrams before they are passed to the reception process 17.
- the buffer provides time for out of sequence datagrams to be received and then fed to the reception process 17 in their correct position.
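- reconstruction of an interleaved sample group in the 160-bit buffer can be sketched as below. How a datagram identifies the sub-group it carries is not stated in the text above, so the sketch assumes each datagram is tagged with a hypothetical code from 4 (sign and most significant bits) down to 1.

```python
OFFSETS = {4: 0, 3: 64, 2: 96, 1: 128}   # assumed tag -> bit position in the buffer


def reassemble(datagrams, size=160):
    """Place each received sub-group in a zeroed buffer; missing ones stay zero."""
    buffer = [0] * size
    for code, bits in datagrams:
        start = OFFSETS[code]
        buffer[start:start + len(bits)] = bits
    return buffer


# The sub-group tagged 2 was discarded somewhere along the path.
received = [(4, [1] * 64), (3, [1] * 32), (1, [1] * 32)]
block = reassemble(received)
```

- a lost sub-group therefore shows up as a plane of zeros, i.e. a brief reduction in fidelity, and not as a gap in the signal.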
- the scalability of the signal format can be employed in a modification to the present embodiment.
- the quality of service code is omitted or the same for all packets.
- a user of the receiver 2 can request a preview of an audio file from the transmitter 1.
- the transmitter 1 responds by sending the most significant data, e.g. 25% of the whole, in a first set of datagrams.
- the user can then play the received file and decide whether to download the full quality version. If the user decides to download the full quality version, the omitted data is sent in a second stream of datagrams.
- a transmitter 1 comprises a memory 31, such as a hard disk drive, in which a plurality of audio data files 32, 33, 34 are stored.
- Each of the audio data files 32, 33, 34 comprises groups of interleaved samples, such as those produced by the interleaver process 9 shown in Figure 2.
- the data in the files 32, 33, 34 can be selectively read by a transmission process 35 in response to a request therefor from a receiver 2 which includes the file's name or some other identifier.
- the transmission process 35 combines data from four sequential sample groups to produce one datagram and sends one datagram for every sample group. It will be appreciated that the first three datagrams will be padded with runs of zeros because there will be insufficient sample groups to provide real data.
- the transmission process 35 reads all 160 bits of the nth sample group (step s21). Then the transmission process 35 reads and appends the first 96 bits from the (n-1)th sample group (step s22), followed by the first 64 bits of each of the (n-2)th and (n-3)th sample groups (steps s23 and s24). The result is a datagram data part as shown in Figure 10. This data is then sent to a UDP socket 36 for transmission to the receiver 2 (step s25).
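- steps s21 to s24 can be sketched as follows, using the section sizes given above (160, 96, 64 and 64 bits). The function name and the use of all-zero groups for the padding at the start of a file are assumptions.

```python
SECTION_BITS = (160, 96, 64, 64)   # leading bits taken from groups n, n-1, n-2, n-3


def build_datagram(groups, n):
    """Concatenate the leading bits of group n and its three predecessors."""
    payload = []
    for age, width in enumerate(SECTION_BITS):
        index = n - age
        source = groups[index] if index >= 0 else [0] * 160   # zero padding near the start
        payload.extend(source[:width])
    return payload


groups = [[(k + i) % 2 for i in range(160)] for k in range(5)]   # placeholder groups
payload = build_datagram(groups, 4)                               # 384 bits in total
```

- each datagram thus carries progressively coarser copies of older groups, so a receiver that misses a datagram can still recover a lower-fidelity version of its contents from a later one.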
- the receiver 2 is structurally similar to that shown in Figure 4. However, it differs in the operation of the reception process 17.
- when the reception process 17 receives (step s31) a datagram conveying the content of a file 32, 33, 34, it determines whether the datagram has a sequence position before the last received datagram (step s32). If this is the case, the newly received datagram is discarded (step s33).
- the reception process 17 outputs bits 320 to 383 of the data part of the datagram and 96 zeros to the de-interleaver process 18 (step s35).
- the reception process 17 outputs bits 256 to 319 of the data part of the datagram and 96 zeros to the de-interleaver process 18 (step s37). Then if the datagram has a sequence position indicating that one datagram has been sent to oblivion 6 (step s38), the reception process 17 outputs bits 160 to 255 of the data part of the datagram and 64 zeros to the de-interleaver process 18 (step s39).
- the reception process 17 outputs bits 0 to 159 of the datagram's data part to the de-interleaver process 18 (step s40).
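- the recovery of lost sample groups from the older sections of a datagram might be sketched as below. The sketch assumes the 384-bit layout produced at steps s21 to s24 and assumes that the checks are cumulative, so that when k datagrams are missing the k older sections are output, oldest first, before the current group. The function name is hypothetical.

```python
SECTIONS = [(320, 384), (256, 320), (160, 256), (0, 160)]   # groups n-3, n-2, n-1, n


def recover_blocks(payload, lost):
    """Return the 160-bit blocks to de-interleave, oldest first.

    lost: number of datagrams missing immediately before this one.
    """
    blocks = []
    for age, (start, stop) in zip((3, 2, 1, 0), SECTIONS):
        if lost >= age:
            section = payload[start:stop]
            blocks.append(section + [0] * (160 - len(section)))   # pad with zeros
    return blocks


payload = [1] * 384
in_order = recover_blocks(payload, lost=0)     # just the current group
after_gap = recover_blocks(payload, lost=2)    # groups n-2, n-1 and n
```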
- when the de-interleaver process 18 receives a block of 160 bits, it carries out the inverse of the process performed by the interleaver process 9 in Figure 2 to reconstruct the original sample groups, as well as possible on the basis of the received datagrams, and sends the reconstructed samples to the loudspeaker 21 via the buffer process 19 and the digital-to-analogue converter 20.
- the performance of this embodiment can be improved by adding a buffer for temporarily storing datagrams before they are passed to the reception process 17.
- the buffer provides time for out of sequence datagrams to be received and then fed to the reception process 17 in their correct position.
- error correction coding and/or compression techniques may be employed.
- the present invention has been illustrated in the first to fourth embodiments in the context of systems which transmit time domain samples of a time-varying signal.
- such signals may be sent as coefficients produced by a time to frequency domain transformation, e.g. a wavelet packet transform or a discrete cosine transform such as the Modified Discrete Cosine Transform (MDCT).
- the transmitter 1 comprises a computer provided with a source of audio signals 107, e.g. a magnetic tape recording or a microphone, an analogue-to-digital converter 108 and a hardware network interface (not shown).
- the computer also supports a wavelet packet transformer process 109, an encoder process 110, a transmission process 111, a UDP socket 112 and a request processing process or processes (not shown).
- the analogue-to-digital converter 108 digitises the output of the audio signal source 107 to produce signed 16-bit data.
- the wavelet packet transformer process 109 reads the samples from the analogue-to-digital converter 108 and outputs frequency domain coefficients which are encoded by the encoder process 110.
- the encoded coefficients are stored in a file 113 and then transmitted by the transmission process 111 and the UDP socket 112.
- the wavelet packet transformer process 109 implements a seven layer (only three shown) digital filter tree structure.
- Each layer of the tree comprises a high-pass filter 121 and a low-pass filter 122 for each branch and a down-sample by 2 decimator 123 for decimating the output of each filter.
- the tree functions as an array of slightly overlapping bandpass filters with each filter in the last layer outputting samples for signal components in a substantially discrete frequency band.
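- the symmetric filter-and-decimate tree can be illustrated with the sketch below. The actual filters 121 and 122 are not specified here, so the sketch substitutes the two-tap Haar averaging and differencing pair purely as a stand-in; the function names are hypothetical.

```python
def haar_split(signal):
    """One filter-and-decimate stage: returns (low band, high band)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]      # low-pass, then down-sample by 2
    high = [(a - b) / 2 for a, b in pairs]     # high-pass, then down-sample by 2
    return low, high


def wavelet_packet(signal, depth):
    """Split every band at every layer, giving 2**depth sub-bands."""
    bands = [list(signal)]
    for _ in range(depth):
        bands = [half for band in bands for half in haar_split(band)]
    return bands


bands = wavelet_packet([4] * 8, depth=3)       # constant input, three layers
```

- every output coefficient passes through the same number of filtering and decimation steps, which is the property required of the transform in this description. For the constant input used here, all of the energy ends up in the first band.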
- the result of the filtering is a set of samples that can be illustrated as a matrix having sixteen samples along a time axis and 64 samples along a frequency axis. An 8 by 8 section of such a matrix is shown in Figure.
- the data in the "matrix" is taken by the encoder process 110 as its input.
- the purpose of the encoder process 110 is to produce an output in which information about significant bits precedes information about bits of lesser significance. This could be achieved by a modified interleaving process as shown in Figure 15.
- the encoder process scans the matrix to determine the largest value (step s101) and stores the integer part (n) of the base 2 logarithm of this value as a 5-bit code (step s102).
- the encoder process appends the sign bit for each sample in the matrix to the stored value n (step s103).
- the encoder process appends the value of the bit with the current significance to the previously stored values (step s104).
- the stored data is sent to the file 113 (step s107).
- This stored data constitutes one sample group and a plurality of such groups will represent a piece of music, for example, and be stored in a file together.
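- the encoding steps listed above might be sketched as follows. The loop over significance levels is not spelled out in the steps that survive here, so the sketch assumes integer coefficients and assumes that the bit planes are emitted from level n down to level 0; the function name is hypothetical.

```python
def encode_matrix(coefficients):
    """Emit a 5-bit significance code, the sign bits, then bit planes from the top."""
    largest = max(abs(c) for c in coefficients)
    n = largest.bit_length() - 1 if largest else 0       # integer part of log2(largest)
    out = [(n >> i) & 1 for i in range(4, -1, -1)]       # n stored as a 5-bit code
    out += [1 if c < 0 else 0 for c in coefficients]     # sign bit of every sample
    for level in range(n, -1, -1):                       # most significant plane first
        out += [(abs(c) >> level) & 1 for c in coefficients]
    return out


encoded = encode_matrix([5, -3, 0, 1])   # a tiny 4-coefficient "matrix"
```

- significance levels above that of the largest coefficient are never written, so quiet passages produce shorter sample groups.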
- the data in the file 113 can be selectively read by a transmission process 111 in response to a request therefor from a receiver 2 which includes the file's name or some other identifier.
- the transmission process 111 combines data from four sequential sample groups to produce one datagram and sends one datagram for every sample group. It will be appreciated that the first three datagrams will be padded with runs of zeros because there will be insufficient sample groups to provide real data.
- the transmission process 111 reads and stores all of the bits of the nth sample group (step s122). The transmission process 111 determines how many bits this is by multiplying the value (m) represented by the first five bits plus one (to account for the sign bits) by 1024 and adding 5 (step s121).
- the transmission process 111 calculates the integer part of ((m/2) + 1) x 1024 + 5 for the (n-1)th sample group (step s123) and reads and appends that number of bits of the (n-1)th sample group (step s124), headed by the (n-1)th sample group's first five bits.
- the transmission process 111 calculates the integer part of ((m/4) + 1) x 1024 + 5 for the (n-2)th sample group (step s125) and reads and appends that number of bits of the (n-2)th sample group (step s126), headed by the (n-2)th sample group's first five bits.
- the transmission process 111 calculates the integer part of ((m/4) + 1) x 1024 + 5 for the (n-3)th sample group (step s127) and reads and appends that number of bits of the (n-3)th sample group (step s128), headed by the (n-3)th sample group's first five bits.
- The result is a string of bits similar to the datagram data part shown in Figure 10 but with the potential for variation in the lengths of the subsections.
- This data is then compressed (step s129) using a convenient conventional technique and sent to the UDP socket 112 for transmission to the receiver 2 (step s130).
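The section-length arithmetic of steps s121 to s128 can be sketched as below. The names `section_length`, `build_datagram` and `DIVISORS` are hypothetical, sample groups are represented as strings of "0" and "1" characters, and the padding used when fewer than four groups exist (an all-zero section with m = 0) is an assumption, since the text only says that runs of zeros are used. The compression step is omitted.

```python
DIVISORS = (1, 2, 4, 4)  # groups n, n-1, n-2 and n-3, as given in the text

def section_length(m, divisor, samples_per_group=1024):
    """Integer part of ((m / divisor) + 1) * samples_per_group + 5."""
    return (m * samples_per_group) // divisor + samples_per_group + 5

def build_datagram(groups, n, samples_per_group=1024):
    """Concatenate the full nth group with shortened copies of the three
    preceding groups. Each group is a bit string starting with its 5-bit m."""
    parts = []
    for age, divisor in enumerate(DIVISORS):
        index = n - age
        if index < 0:
            # Assumption: pad with a section whose header m is zero.
            parts.append("0" * (samples_per_group + 5))
            continue
        group = groups[index]
        m = int(group[:5], 2)
        parts.append(group[:section_length(m, divisor, samples_per_group)])
    return "".join(parts)

# Toy usage with 4 samples per group and m = 3 in every group.
toy_groups = ["00011" + "1" * 16 for _ in range(5)]
datagram = build_datagram(toy_groups, 4, samples_per_group=4)
```

Because each shortened section begins at the start of its group, it automatically carries that group's five header bits, followed by the sign bits and then as many of the most significant bit planes as fit.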
- the receiver 2, in this example, comprises a UDP socket 116, a reception process 117, a decoder 118, an inverse wavelet packet transformer 119, a digital-to-analogue converter 120 and a loudspeaker 121.
- the data in received datagrams is passed by the UDP socket 116 to the reception process 117.
- the reception process 117 de-compresses the data and compensates for lost or out of sequence datagrams.
- when the reception process 117 receives (step s130) a datagram conveying the content of the file 113 (Figure 15), it determines whether the datagram has a sequence position before the last received datagram (step s131). If this is the case, the newly received datagram is discarded (step s132). Otherwise, it decompresses the data in the datagram (step s133).
- the reception process 117 determines the boundaries between data from different sample groups within the data.
- the reception process reads the first 5 bits of the data, giving value m, and calculates (m + 1) x 1024 + 5 to get the zero-based position a_1 of the first bit of the second section of the data (step s134).
- the reception process then reads five bits starting from bit a_1, giving value m_1, and calculates ((m_1/2) + 1) x 1024 + 5 to get the position a_2 of the first bit of the third section (step s135), and reads five bits from bit a_2 inclusive, giving value m_2, and calculates ((m_2/2) + 1) x 1024 + 5 to get the position a_3 of the first bit of the fourth section (step s136).
- If the datagram has a sequence position indicating that three or more datagrams have been sent to oblivion 6 by switch 5 (Figure 1) (step s137), the reception process 117 outputs bits a_3 to the end of the data to the decoder process 118 (step s138). Otherwise, if the datagram has a sequence position indicating that two datagrams have been sent to oblivion 6 (step s139), the reception process 117 outputs bits a_2 to a_3 - 1 of the data part of the datagram to the decoder process 118 (step s140).
- the reception process 117 outputs bits a_1 to a_2 - 1 of the data part of the datagram to the decoder process 118 (step s142).
- the reception process 117 outputs bits 0 to a_1 - 1 of the datagram's data part to the decoder process 118 (step s143).
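The boundary search and section selection performed by the reception process can be sketched as follows. Several points are assumptions rather than statements from the source: the function names are hypothetical, the per-section divisors are passed as a parameter and default to the values given for the transmission process, each boundary is obtained by adding the section length to the previous boundary, and the number of lost datagrams is assumed to have been worked out already from the sequence positions.

```python
def section_bounds(data, divisors=(1, 2, 4, 4), samples_per_group=1024):
    """Return (start, end) bit positions of each section in a datagram's data.
    Each section starts with a 5-bit header m that fixes its length."""
    bounds = []
    start = 0
    for divisor in divisors:
        m = int(data[start:start + 5], 2)
        length = (m * samples_per_group) // divisor + samples_per_group + 5
        bounds.append((start, start + length))
        start += length
    return bounds

def select_section(data, lost, **kwargs):
    """Pick the section to pass to the decoder: section 0 when nothing was
    lost, older sections as more preceding datagrams are missing."""
    bounds = section_bounds(data, **kwargs)
    start, end = bounds[min(lost, len(bounds) - 1)]
    return data[start:end]

# Toy datagram with 4 samples per group; headers m = 3, 2, 4 and 0.
s1 = "00011" + "1" * 16
s2 = "00010" + "0" * 8
s3 = "00100" + "1" * 8
s4 = "00000" + "0" * 4
toy = s1 + s2 + s3 + s4
```

With this toy input, `section_bounds(toy, samples_per_group=4)` locates the four sections without any explicit length fields, because each header value determines where the next section begins.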
- the data compression function is performed by the encoder process 110 rather than the transmission process 111 and the transformer process 109 is a MDCT transformer process giving an output matrix of coefficients comprising two columns in the time direction and 512 rows in the frequency direction.
- Sig_coeffs contains pointers to coefficients which have been found to be "significant" and is initially empty.
- the Pending list contains pointers to coefficients found to be significant in the present iteration.
- Ts_ptr comprises a respective pointer to the next coefficient to be checked for significance in each time slot.
- the encoder process 110 determines and outputs resolution definitions, one per Bark scale band, which communicate the minimum resolutions required to ensure that quantisation noise is imperceptible (step s201). The encoder process 110 then finds the largest coefficient (step s202). If c_max is 0 (step s203), which indicates a silent frame, the process outputs 00000 (step s206) and terminates. However, if c_max is not 0, the process determines the code for the most significant bit position (N) which has a 1 in the binary representation of this number (step s204) using: where c_max is the largest coefficient value. The coefficients are floating point numbers.
- If N is less than 1 (step s205), the process moves to step s206. If however N is not less than 1 (step s205), five bits representing the new value of N are output (step s207) and the samples are then processed on a time slot by time slot basis (steps s216 and s217). First it is determined whether there are any unmasked newly significant coefficients beyond the current Ts_ptr position in the current time slot (step s208). A newly significant coefficient is masked if its most significant 1 bit is below the resolution level set for the Bark band containing it. Such masked bits are treated at this point as zeros.
- a 1 is output (step s209) and the length (R) of the run of insignificant coefficients from the Ts_ptr position to the significant coefficient is determined and run-length encoded (step s210).
- the run-length code (step s211) and the sign bit of the significant coefficient are then output (step s212), and a pointer to the coefficient is added to the Pending list (step s213).
- a zero is output (step s214) and the Ts_ptr pointer for the present time slot is changed to point to the first extant insignificant coefficient (step s215). It is then determined whether the last time slot has been processed (step s216). If not, the process moves on to the next time slot (step s217).
- the Sig_coeffs list is looped through (step s220). While the Sig_coeffs list is being looped through, the Nth bit of the coefficient pointed to by the next list member is output (step s219), if the resolution data indicates that it has a material effect on the perception of the signal by a listener (step s218). After looping through the Sig_coeffs list has been completed, the contents of the Pending list are transferred to the Sig_coeffs list (step s221) and N is decremented by 1 (step s222).
- depending on the outcome of step s223, flow returns to step s208; otherwise the process terminates.
- the processing is finished when there is no more space available in the output datagram or datagram section, where packet loss recovery is being employed.
- the size of the datagram data part is determined by the required bit rate, which may be established in the design of the system or dynamically during operation to reflect different fidelity requirements.
- bits representing a particular significance level for a particular time slot terminate with a zero after the sign bit of the highest frequency significant bit. This removes long runs of zeros that would otherwise occur with audio signals, which tend to contain relatively little energy at higher frequencies. However, long runs of zeros can frequently occur before a significant coefficient or between significant coefficients.
- these "preceding" and "interposed" zero runs are encoded using a run-length code.
- bits which do not have a role in creating the listener's perception of the transmitted signal are not transmitted. Instead, bits defining the necessary resolutions for masked components in the bands of the Bark scale are transmitted.
- the result of the encoding process 110 is a file or bitstream comprising a header and a plurality of data blocks.
- the header comprises the resolution definition data and the 5-bit significance code (N).
- the data blocks comprise the meaningful data from respective significance levels and within each block the data concerning coefficients becoming significant at the associated level precede data refining the values of coefficients that became significant at higher levels. It should also be noted that the number of Bark bands, i.e. 24, is much lower than the number of coefficients, i.e. 512, for each time slot.
- each run-length code comprises a prefix and a suffix.
- the prefix defines a range of values and the suffix the position within the range.
- the prefixes and suffix lengths are as follows:-
- the value represented by a code value is Prefix followed by Suffix value - R m ⁇ .
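A prefix-and-suffix run-length code of the general kind described can be sketched as follows. The actual table of prefixes and suffix lengths is not reproduced in this text, so the table below is entirely hypothetical, as are the names `TABLE`, `encode_run` and `decode_run`. The sketch also assumes that the decoded value is the smallest run of the selected range plus the suffix value.

```python
# Hypothetical table: (prefix, suffix length in bits, smallest run in the range).
# The ranges covered are 0-1, 2-5, 6-21 and 22-277.
TABLE = [("0", 1, 0), ("10", 2, 2), ("110", 4, 6), ("111", 8, 22)]

def encode_run(run):
    """Return the prefix for the range containing `run`, followed by the
    position of `run` within that range as a fixed-width suffix."""
    for prefix, nbits, r_min in TABLE:
        if run < r_min + (1 << nbits):
            return prefix + format(run - r_min, f"0{nbits}b")
    raise ValueError("run too long for this table")

def decode_run(bits, pos=0):
    """Decode one run-length code starting at `pos`.
    Returns (run, position of the first bit after the code)."""
    for prefix, nbits, r_min in TABLE:
        if bits.startswith(prefix, pos):
            start = pos + len(prefix)
            return r_min + int(bits[start:start + nbits], 2), start + nbits
    raise ValueError("no matching prefix")
```

Short runs get short codes and long runs get longer ones, so the common case of a few interposed zeros costs only a couple of bits while runs of several hundred can still be represented.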
- the data decompression function is again performed by the decoder process 118 rather than the reception process 117.
- the decoder process 118 reconstructs the original coefficients.
- Sig_coeffs contains pointers to coefficients which have been found to be "significant" and is initially empty.
- the Pending list contains pointers to coefficients found to be significant in the present iteration.
- Ts_ptr comprises pointers to the next coefficient to be processed in each time slot.
- the resolution definitions are obtained from the first 96 bits of the datagram data (step s301) and the value N is obtained from the next five bits of the data part of the datagram (step s302). If N is zero (step s303), the coefficients are output to the transformer process 119 (step s304). If N is not zero (step s303) and there are more than 2 unprocessed datagram data bits (step s305), N is used to set a threshold at 2^(N-9) (step s306) and the incoming data is then processed in respect of the coefficients in time slot order (steps s308 and s321).
- following step s306, it is determined whether there are any unprocessed datagram data bits left (step s307). If the result at step s307 is yes, the processing of the next time slot is performed, otherwise the process moves to step s304.
- processing starts from the coefficient pointed to by the relevant Ts_ptr member (step s309) and the datagram data is tested (step s310) to determine whether the next bit is a 1 and that there are more than two bits left. If the answer at step s310 is no, the process moves on to the next time slot after resetting the Ts_ptr member for the current time slot (step s320). However, if the answer at step s310 is yes, it is determined whether the next bit is 0 (step s312). If so, it is determined whether there are more than 2 bits left in the data part of the datagram (step s313).
- step s313 the run-length code prefix defaults initially to the prefix for the lowest range
- If the answer at step s311 is no, it is determined, on the basis of the run-length code prefix, whether there are sufficient bits left in the datagram data to complete the run-length code (step s314). If there are not, the process returns to step s310; otherwise the number of coefficients indicated by the run-length code are skipped (step s315), the next bit of the datagram data is read as the sign of the next coefficient (step s316) and the magnitude of the coefficient is set to the value of the threshold (step s317). The current coefficient is then added to the Pending list (step s318).
- the members of Sig_coeffs are processed. For each member of Sig_coeffs (steps s322 and s326), the next bit of the datagram data is added as the Nth bit to the current coefficient (step s324), if the resolution definition data does not indicate that the value of the current coefficient is irrelevant (step s323).
- N is decremented (step s326) and, if all the data in the received datagram has not been processed (step s327), the members of the Pending list are transferred to the Sig_coeffs list (step s328) and the process returns to step s305. If, however, all the data in the received datagram has been processed at step s327, the process moves to step s304 to output the coefficients.
- a feature of the above-described decoder process 118 is that the input bitstream may be truncated at any point within the blocks of coefficient data without causing a failure.
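The tolerance to truncation can be demonstrated with a much simplified model of the encoder and decoder. This is a sketch under stated assumptions, not the described processes: coefficients are integers in a single time slot, there is no Bark-band masking and no resolution header, runs are written as fixed 4-bit fields instead of the prefix and suffix code, a run counts every position skipped, and `encode` and `decode` are hypothetical names.

```python
def encode(coeffs, run_bits=4):
    """Per significance level: newly significant coefficients (1, run, sign),
    a terminating 0, then one refinement bit per previously significant one."""
    n = max(abs(c) for c in coeffs).bit_length()
    out = [format(n, "05b")]           # all-zero input gives the code 00000
    sig, seen = [], set()
    for plane in range(n - 1, -1, -1):
        pending, ptr = [], 0
        while True:
            nxt = next((i for i in range(ptr, len(coeffs))
                        if i not in seen and abs(coeffs[i]) >> plane), None)
            if nxt is None:
                out.append("0")
                break
            if nxt - ptr >= 1 << run_bits:
                raise ValueError("run does not fit in run_bits")
            sign = "1" if coeffs[nxt] < 0 else "0"
            out.append("1" + format(nxt - ptr, f"0{run_bits}b") + sign)
            pending.append(nxt)
            seen.add(nxt)
            ptr = nxt + 1
        out.extend(str((abs(coeffs[i]) >> plane) & 1) for i in sig)
        sig.extend(pending)
    return "".join(out)

def decode(bits, count, run_bits=4):
    """Rebuild coefficients from any prefix of an encoded bit string."""
    mags, signs = [0] * count, [1] * count
    pos = 0

    def take(k):
        nonlocal pos
        if pos + k > len(bits):
            raise EOFError           # ran out of data: stop decoding cleanly
        chunk = bits[pos:pos + k]
        pos += k
        return chunk

    try:
        n = int(take(5), 2)
        sig = []
        for plane in range(n - 1, -1, -1):
            pending, ptr = [], 0
            while take(1) == "1":
                field = take(run_bits + 1)
                i = ptr + int(field[:run_bits], 2)
                mags[i] = 1 << plane  # magnitude starts at the threshold
                signs[i] = -1 if field[-1] == "1" else 1
                pending.append(i)
                ptr = i + 1
            for i in sig:
                if take(1) == "1":
                    mags[i] |= 1 << plane
            sig.extend(pending)
    except EOFError:
        pass
    return [s * m for s, m in zip(signs, mags)]

coeffs = [0, 9, -3, 0, 0, 5, 0, -1]
stream = encode(coeffs)
```

Decoding `stream[:k]` for any k returns a usable approximation: every coefficient has the correct sign and a magnitude no larger than the true one, and the approximation becomes exact once the whole stream is available.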
- the decoder does not actually need to be aware of the number of significance levels represented in the received signal.
- the encoder can be operated to produce the number of bits required to fill the main or only data part in the present datagram and therefore data for as many significance levels as possible is transmitted, maximising the fidelity within the current bandwidth constraints.
- the "old" data parts can be formed by truncating the data from a previous frame at a particular bit position rather than at a particular boundary between data for different significance levels.
- a portable audio playback device 200 comprises a control circuit 201 in the form of microcomputer circuitry, a serial communication interface 202, a large flash ROM memory 203, a keypad 204 for controlling the operation of the device, an audio module 205 including a digital-to-analogue converter and a variable-gain amplifier, and a jack socket 206 for connecting the device to an earpiece.
- the control circuit 201 includes an embedded version of the Linux operating system which includes an ftp daemon. By connecting the device to a personal computer (not shown), a user can transfer files or selected parts thereof to the memory 203 using the ftp protocol.
- the transferred files are preferably in a format of the types produced by the encoder processes in the embodiments described above. Consequently, the user can trade fidelity, i.e. number of significance levels, against duration. Thus, the user may choose between a few high fidelity recordings or many lower fidelity recordings.
- the control circuit 201 is also provided with a program for reading data from the memory 203 and decoding and, if necessary, transforming it.
- the resultant time domain digital data is then sent to the audio module 205 for output as an analogue signal via the jack socket 206.
- the present invention has been described solely in the context of single channel signals, such as monophonic audio.
- the present invention can be applied to multichannel signals, e.g. stereophonic audio.
- a multichannel signal may be sent with each channel being carried by a separate stream of packets, the packets for each channel being interleaved with each other.
- each packet may contain data from all of the channels with the data grouped according to significance so that the most significant data from all of the channels is grouped together and the next most significant data from all of the channels is grouped together and so on.
- stereophonic signals may be sent as sum and difference signals. This also maintains compatibility with monophonic receiving or reproducing apparatus.
- the left and right channel signals are added together to produce the sum signal, and subtracted one from the other to produce the difference signal. When there is correlation between the channels, the difference signal has a much smaller amplitude than the sum.
- the sum and difference signal generation may be carried out either on the raw time-domain signals before any time-frequency transformation or on the transformed frequency domain versions of the signals.
- the latter is more efficient and it is preferred because the correlation is generally greater in the frequency domain.
- the sum and difference signals are then encoded, or encoded and compressed, by one of the methods above.
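The sum and difference representation can be illustrated with a few lines of code. This is a minimal sketch with hypothetical function names; it works equally on time-domain samples or on transformed coefficients, since it only adds and subtracts corresponding values.

```python
def to_sum_diff(left, right):
    """Return (sum, difference) signals for two equally long channels."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff

def from_sum_diff(total, diff):
    """Recover the left and right channels from sum and difference."""
    left = [(s + d) / 2 for s, d in zip(total, diff)]
    right = [(s - d) / 2 for s, d in zip(total, diff)]
    return left, right

# Two strongly correlated channels give a small difference signal.
left = [10, 12, -4]
right = [9, 12, -5]
total, diff = to_sum_diff(left, right)
```

A monophonic receiver can use the sum signal alone, while a stereophonic receiver combines both to recover the two channels.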
- the total bitrate available may be divided up between the sum and difference signals in a fixed way, so that each receives a fixed proportion of the total bitrate, regardless of the signal characteristics.
- the proportion allocated to the sum will always be higher, but the best compromise would need to be determined by experiment.
- the encoding/compression for each frame is carried out simultaneously on the sum and difference signals.
- Encoding stops when the sum of the bits used for the two parts is equal to the number dictated by the current bitrate. However, the initial threshold is the same for both, and will normally be dictated by the sum signal, since it will in general be greater. The result of this process is that sum and difference will be specified with equal precision but that the number of bits used for the two will vary according to how similar or different the two channels are.
- each packet could contain sum and difference significance blocks interleaved or the sum and difference signals could be transmitted in separate packets. In both cases the packet loss recovery system described above can be employed.
- the sum packets could be transmitted with a high service quality code and the difference packets with a low service quality code. Then at times of high network traffic, a mono signal only would be available, but at times of low traffic, the full stereo signal would be provided.
- An alternative approach takes advantage of two phenomena related to the perception of the stereophonic 'image'. At low frequencies, channels may be amalgamated without affecting the stereo image and, at high frequencies, the perception of the stereo image depends more on the temporal envelope of the signal than on the fine structure.
- the subbands for the n channels may be replaced by a single set of subbands equal to the average values of the coefficients in those subbands.
- a certain frequency typically 2-3 kHz
- the separate subband coefficients are used to generate the initial significance information.
- the subbands are averaged as above, and these averaged subbands are used to generate joint refinement information for all channels. In this way, the higher-frequency subbands are conveyed with different envelopes, but with the same fine structure.
- the encoding or encoding and compression is then carried out using: (a) joint low-frequency subbands, (b) separate mid-frequency subbands, (c) separate high-frequency significance information (i.e. most significant bits) and (d) joint high-frequency refinement information (i.e. bits other than most significant).
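The amalgamation of low-frequency subbands across channels can be sketched as below. This is an illustration only: the function name, the representation of each channel as one coefficient per subband, the list of band centre frequencies and the `cutoff_hz` parameter are all assumptions, and the joint treatment of high-frequency refinement information is not modelled.

```python
def split_joint_separate(channels, band_freqs_hz, cutoff_hz):
    """Average the coefficients of all channels in bands below `cutoff_hz`
    and keep per-channel coefficients for the remaining bands.
    Returns (joint, separate), both keyed by band index."""
    joint, separate = {}, {}
    for band, freq in enumerate(band_freqs_hz):
        values = [channel[band] for channel in channels]
        if freq < cutoff_hz:
            joint[band] = sum(values) / len(values)
        else:
            separate[band] = values
    return joint, separate

# Two channels, three subbands; the cutoff value is a hypothetical choice.
joint, separate = split_joint_separate(
    [[1.0, 2.0, 3.0], [3.0, 2.0, -1.0]],
    band_freqs_hz=[100, 500, 4000],
    cutoff_hz=1000,
)
```

Only one set of coefficients then needs to be coded for the joint bands, whatever the number of channels.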
- time-frequency domain transformation may be adapted to the nature of the input signal and the transmission path.
- the optimum division of the packet will depend on the packet loss characteristics of the channel through which packets are to be sent.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00925502A EP1177651A1 (en) | 1999-05-01 | 2000-04-28 | Robust coding for the transmission of audio or video signals |
AU44223/00A AU4422300A (en) | 1999-05-01 | 2000-04-28 | Robust coding for the transmission of audio or video signals |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9910002A GB9910002D0 (en) | 1999-05-01 | 1999-05-01 | Audio signal encoders and decoders |
GB9910002.6 | 1999-05-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000067417A1 (en) | 2000-11-09 |
Family
ID=10852571
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2000/001649 WO2000067417A1 (en) | 1999-05-01 | 2000-04-28 | Robust coding for the transmission of audio or video signals |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1177651A1 (en) |
AU (1) | AU4422300A (en) |
GB (1) | GB9910002D0 (en) |
WO (1) | WO2000067417A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002052240A1 (en) * | 2000-12-22 | 2002-07-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and a communication apparatus in a communication system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2139458A (en) * | 1983-04-18 | 1984-11-07 | British Broadcasting Corp | Error correction in data transmission or processing |
WO1993015502A1 (en) * | 1992-01-28 | 1993-08-05 | Qualcomm Incorporated | Method and system for the arrangement of vocoder data for the masking of transmission channel induced errors |
US5255343A (en) * | 1992-06-26 | 1993-10-19 | Northern Telecom Limited | Method for detecting and masking bad frames in coded speech signals |
US5487061A (en) * | 1994-06-27 | 1996-01-23 | Loral Fairchild Corporation | System and method for providing multiple loss and service priorities |
EP0869622A2 (en) * | 1997-04-02 | 1998-10-07 | Samsung Electronics Co., Ltd. | Scalable audio coding/decoding method and apparatus |
-
1999
- 1999-05-01 GB GB9910002A patent/GB9910002D0/en not_active Ceased
-
2000
- 2000-04-28 AU AU44223/00A patent/AU4422300A/en not_active Abandoned
- 2000-04-28 WO PCT/GB2000/001649 patent/WO2000067417A1/en not_active Application Discontinuation
- 2000-04-28 EP EP00925502A patent/EP1177651A1/en not_active Withdrawn
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002052240A1 (en) * | 2000-12-22 | 2002-07-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and a communication apparatus in a communication system |
US7444281B2 (en) | 2000-12-22 | 2008-10-28 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and communication apparatus generation packets after sample rate conversion of speech stream |
Also Published As
Publication number | Publication date |
---|---|
AU4422300A (en) | 2000-11-17 |
EP1177651A1 (en) | 2002-02-06 |
GB9910002D0 (en) | 1999-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0936772B1 (en) | Unequal error protection for perceptual audio coders | |
US7069208B2 (en) | System and method for concealment of data loss in digital audio transmission | |
JP4004707B2 (en) | Techniques for multirate coding of signals containing information | |
US6122338A (en) | Audio encoding transmission system | |
EP1946517B1 (en) | Audio data packet format and decoding method thereof and method for correcting mobile communication terminal codec setup error and mobile communication terminal performing same | |
KR101699548B1 (en) | Encoder, decoder and method for encoding and decoding | |
JP2000324183A (en) | Communication device and method | |
US20020078241A1 (en) | Method of accelerating media transfer | |
KR20090001370A (en) | Method of setting configuration of codec and codec using the same | |
US20040024592A1 (en) | Audio data processing apparatus and audio data distributing apparatus | |
EP0919988B1 (en) | Speech playback speed change using wavelet coding | |
US20050240414A1 (en) | Data processing system, data processing method, data processing device, and data processing program | |
US8023585B2 (en) | Apparatus and method for transmitting or receiving data | |
JPH11511308A (en) | Digital transmission system | |
JP3379610B2 (en) | Encoding and decoding apparatus and method using channel masking characteristic for bit allocation | |
CN1398055A (en) | Wireless audio transmission system and method | |
KR100706968B1 (en) | Audio data packet generation apparatus and decoding method thereof | |
EP1177651A1 (en) | Robust coding for the transmission of audio or video signals | |
JP4077037B2 (en) | Method and apparatus for mapping between cellular bitstream and wired waveform | |
CN1157853C (en) | Transmitting device for transmitting a digital information signal alternately in encoded form and non-encoded form | |
US20010056343A1 (en) | Sound signal encoding apparatus and method | |
CN100339903C (en) | Method and apparatus for transmitting audio and non-audio information with error correction | |
CN115691521A (en) | Audio signal coding and decoding method and device | |
CN115691514A (en) | Coding and decoding method and device for multi-channel signal | |
Becker et al. | Influence of the BER on the Intelligibility of the Received DAB Signal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2000925502 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 09959612 Country of ref document: US |
|
WWP | Wipo information: published in national office |
Ref document number: 2000925502 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2000925502 Country of ref document: EP |