WO2003090028A2

WO2003090028A2 - Wavelet transform system, method and computer program product

Info

Publication number: WO2003090028A2
Application number: PCT/US2003/012078
Authority: WO
Inventors: Krasimir Kolarov; William C. Lynch; Steven E. Saunders; Thomas A. Darbonne
Original assignee: Droplet Technology, Inc.
Priority date: 2002-04-19
Filing date: 2003-04-17
Publication date: 2003-10-30
Also published as: WO2003090028A3; AU2003230986A1; CN1663257A; EP1500268A2; JP2005523615A; CN101902648A; JP2010141922A; AU2003230986A8

Abstract

A system, method and computer program product are provided for compressing data (Fig. 2). Initially, an interpolation formula (302) is received. Such interpolation formula is utilized for compressing data. In use, it is determined (304) whether at least one data value is required by the interpolation formula, where the required data value is unavailable. If such is the case, an extrapolation operation (306) is performed to generate the required unavailable data value.

Description

WAVELET TRANSFORM SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT

FIELD OF THE INVENTION

The present invention relates to data compression, and more particularly to compressing data utilizing wavelets.

BACKGROUND OF THE INVENTION

Video "codecs" (compressor/decompressor) are used to reduce the data rate required for data communication streams by balancing between image quality, processor requirements (i.e. cost/power consumption), and compression ratio (i.e. resulting data rate). The currently available compression approaches offer a different range of trade-offs, and spawn a plurality of codec profiles, where each profile is optimized to meet the needs of a particular application.

Prior Art Figure 1 shows an example 100 of trade-offs among the various compression algorithms currently available. As shown, such compression algorithms include wavelet-based codecs 102, and DCT-based codecs 104 that include the various MPEG video distribution profiles.

2D and 3D wavelets are current alternatives to the DCT-based codec algorithms. Wavelets have been highly regarded due to their pleasing image quality and flexible compression ratios, prompting the JPEG committee to adopt a wavelet algorithm for its JPEG2000 still image standard. Unfortunately, most wavelet implementations use very complex algorithms, requiring a great deal of processing power, relative to DCT alternatives, hi addition, wavelets present unique challenges for temporal compression, making 3D wavelets particularly difficult.

For these reasons, wavelets have never offered a cost-competitive advantage over high volume industry standard codecs like MPEG, and have therefore only been adopted for niche applications. There is thus a need for a commercially viable implementation of 3D wavelets that is optimized for low power and low cost focusing on three major market segments.

For example, small video cameras are becoming more widespread, and the advantages of handling their signals digitally are obvious. For instance, the fastest- growing segment of the cellular phone market in some countries is for phones with image and video-clip capability. Most digital still cameras have a video-clip feature. In the mobile wireless handset market, transmission of these still pictures and short video clips demand even more capacity from the device battery. Existing video coding standards and digital signal processors put even more strain on the battery.

Another new application is the Personal Video Recorders (PVR) that allow a viewer to pause live TV and time-shift programming. These devices use digital hard disk storage to record the video, and require video compression of analog video from a cable. In order to offer such features as picture-in-picture and watch-while-record, these units require multiple video compression encoders.

Another growing application area is the Digital Video Recorders (DVR) for surveillance and security video. Again, compression encoding is required for each channel of input video to be stored. In order to take advantage of convenient, flexible digital network transmission architectures, the video must be digitized at the camera. Even with the older multiplexing recorder architecture, multiple channel compression encoders are required. Of course, there are a vast number of other markets which would benefit from a commercially viable implementation of 3D wavelets that is optimized for low power and low cost.

Experience teaches one that images, considered as a function on the 2- dimensional square, are well modeled as a polynomial smooth at most points with some relatively isolated point and line (edge) singularities. A video clip is similarly modeled with a 3 -dimensional domain. For most images and video, the RMS (Root Mean Square) residual from a linear polynomial model is in the neighborhood of 5%, and in the neighborhood of 2% for a quadratic polynomial model.

Commonly used schemes for approximating such functions (images and video) comprise the following steps:

1) Reversibly transforming the function so that the transformed coefficients can be divided into "subbands"

2) Quantizing (i.e., reducing the precision of) all but the "low-pass" subband

3) Applying the reverse transformation to the quantized coefficients, thus reconstructing an approximation to the original function.

A good scheme uses a transformation that projects the low degree polynomial content of the function into the unquantized "low pass" subband. Such a scheme will, ideally, also produce zeros or very small values in the other subbands.

Subsequent quantization of the non-low-pass subbands will thus not significantly change the transform of a function well-modeled by a sufficiently low degree polynomial, and the approximation of the reconstruction to the original function will be very good.

Implementation realities make it highly desirable for a value in the transformed function to depend only on values in a small neighborhood of some point in the original function domain. This is one of the purposes of the 8x8 blocks in the JPEG and MPEG standards. In these specifications, domain neighborhoods either coincide or do not intersect, partitioning the image domain into a cover of disjoint neighborhoods, each with a distinct border. The approximation resulting from quantization tends to be poor at these borders (the well known "Gibbs effect" in discrete Fourier transforms) resulting in noticeable "blocking" artifacts in the reconstructed, approximated image.

Wavelet transformations have attracted a good deal of attention as a transform class that has the small domain neighborhood property, although with overlapping neighborhoods. Some wavelet transforms do a better job, compared to the DCTs of JPEG/MPEG, of projecting the function primarily into the low-pass subbands. Moreover, some wavelet transforms (not necessarily the same ones) are significantly less compute intensive. However, the domain neighborhood overlap imposes significant implementation problems in the area of data handling, memory utilization, and memory bandwidth. It is still useful to "block" the domain, bringing back their boundaries and the issue of approximation near those boundaries.

Transforms at domain boundaries present a problem in that the domain neighborhood, centered at a boundary point, does not lie within the domain block to which the boundary point belongs. The conventional approach to this problem, as embodied in the various JPEG and MPEG standards, is to reflect symmetrically the domain values in a block across the boundary to create "virtual" values and a virtual function on the required neighborhood.

Unless this virtual function is typically a constant on the neighborhood, it will have a cusp or crease at the boundary resulting from a discontinuous first derivative. This discontinuity is not well modeled by a low degree polynomial and hence is reflected in large non-low-pass subband coefficients that remain large after quantization. The larger quantization error results in an increased approximation error at the boundary. One of the transforms specified in the JPEG 2000 standard 1) is the reversible 5-3 transform shown in Equations #1.1 and 1.2.

Equations #1.1 and 1.2

y ^2« + ^2n+2

-'2/1+1 2π+l eq ll

__2n„_-i + Y₂„₊ι +

Y ^; χ₂„ + eq 1.2

As these equations are integer to integer maps and are easily backsolved for the Ys, this transform is reversible and the reverse produces the input Ys exactly bit- for-bit. See Equations #2.1 and 2.2.

Equations #2.1 and 2.2

X. 2n-\ ~ ^■i2n+l ^τ ^

2« Y - ^l ¹2n eq 2.1

It is clear from these equations that: Y_2n+l is an estimate of the negative of half of the 2nd derivative at (2n+l); and F₂„₊₁ is approximately zero if the function is well approximated by a 1st degree polynomial at (2n+l).

The purpose of the constant addition within the^Oor brackets (jj) is to remove any DC bias from the estimates. Uncorrected biases in wavelets easily lead to oscillatory errors in the reconstructed data, exhibited as fixed pattern noise such as horizontal or vertical bars. There are several possibilities for bias estimation and correction, of which one has been selected in the JPEG2000 standard. If the right boundary of the image is at point 2N - 1 , Equation #1.1 cannot be calculated as the required value X_2N is not available. The JPEG 2000 standard requires that this case be addressed by extending the function by positive symmetry so that one uses X_2N = X_2N_₂ - Making this substitution into Equation #1.1 renders Equation #1.1 ext.

Equation #1 Text

X 2.N- -2 ⁺ % N-

7, 2N- •1 ^{— Λ}-2N-X = eq l.l ext

2 ⁼ ^ N-\ ~ L 2-V-2J ⁼ -Λ-2N-1 ~ ^2W-2

This produces a Y_w_ which is an estimate of the first derivative as opposed to an estimate of half of the negative of the second derivative as in the interior points. Further, it is clear that an estimate of the second derivative can be obtained only by use of three distinct points rather than just two. One needs to confine the two points required in the lifting term to X s with an even index, as these are the only ones available for the reverse step. The closest candidate index is 2N - 4.

As can be seen especially in Equations #1.2 and 2.1, the JPEG-2000 formulation of the 5-3 wavelet filters involves addition of a constant 1 or 2 inside the calculation, as well as other limitations. When implemented for maximum speed and efficiency of computation, these additions and other limitations can require a significant fraction of the total computational load, and cause a significant slowdown in performance.

DISCLOSURE OF THE INVENTION

A system, method and computer program product are provided for compressing data. Initially, an interpolation formula is received. Such interpolation formula is utilized for compressing data. In use, it is determined whether at least one data value is required by the inteφolation formula, where the required data value is unavailable. If such is the case, an extrapolation operation is performed to generate the required unavailable data value.

In one embodiment, the interpolation formula maybe a component of a wavelet filter. As another option, the wavelet filter may be selectively replaced with a polyphase filter.

In another embodiment, a plurality of the data values may be segmented into a plurality of spans. Thus, an amount of computation involving the interpolation formula may be reduced by only utilizing data values within one of the spans.

In still another embodiment, the data values may be quantized. In such embodiment, an amount of computation associated with entropy coding may be reduced by reducing a quantity of the data values. The quantity of the data values may be reduced during a quantization operation involving the data values.

In still yet another embodiment, an amount of computation associated with reconstructing the data values into a predetermined data range may be reduced. Such computation may be reduced by performing only one single clip operation.

In one embodiment, the wavelet filter includes an interpolation formula including:

In one embodiment, the wavelet filter includes an inteφolation formula including:

7 +7

(Z₂„ +l/2) = (7_2fl +l/2)- ^J2n-1 ^{τ J}2n+l

{X_2N+l +l/2) _^ Y_2N+1 +(X_2N +l/2)

Another system and method are provided for compressing data. Initially, data is received in a single device. Such data is encoded utilizing the single device to generate first compressed data in a first format. Moreover, the first compressed data is transcoded utilizing the single device to generate second compressed data in a second format.

In one embodiment, the encoding may occur in real-time. Moreover, the transcoding may occur off-line.

In another embodiment, the first compressed data may be transcoded to generate the second compressed data in the second format such that the second compressed data is adapted to match a capacity of a communication network coupled to the single device. As an option, the encoding may be carried out utilizing a first encoder. Moreover, the transcoding may be carried out utilizing a decoder and a second encoder.

Still yet, the first format may include a wavelet-based format. Further, the second format may include a DCT-based format. In one particular embodiment, the second format may include an MPEG format.

Another system and method are provided for compressing data utilizing multiple encoders on a single integrated circuit. Initially, data is received in a single integrated circuit. The data is then encoded utilizing a plurality of encoders incoφorated on the single integrated circuit.

In one embodiment, data may be encoded utilizing multiple channels on the single integrated circuit. Moreover, the data may be encoded into a wavelet-based format.

Another single module system and method are provided for compressing data. In use, photons are received utilizing a single module. Thereafter, compressed data representative of the photons is outputted utilizing the single module.

As an option, the compressed data may be encoded into a wavelet-based format. Moreover, the transform operations associated with the encoding may be carried out in analog. The single module may further include an imager.

BRIEF DESCRIPTION OF THE DRAWINGS

Prior Art Figure 1 shows an example of trade-offs among the various compression algorithms currently available.

Figure 2 illustrates a framework for compressing/decompressing data, in accordance with one embodiment.

Figure 3 illustrates a method for compressing/decompressing data, in accordance with one embodiment.

Figure 4 shows a data structure on which the method of Figure 3 is carried out.

Figure 5 illustrates a method for compressing/decompressing data, in accordance with one embodiment.

Figure 6 illustrates a system for compressing data, in accordance with one embodiment.

Figure 7 illustrates a system for compressing data utilizing multiple encoders on a single integrated circuit.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Figure 2 illustrates a framework 200 for compressing/decompressing data, in accordance with one embodiment. Included in this framework 200 are a coder portion 201 and a decoder portion 203, which together form a "codec." The coder portion 201 includes a transform module 202, a quantizer 204, and an entropy encoder 206 for compressing data for storage in a file 208. To carry out decompression of such file 208, the decoder portion 203 includes a reverse transform module 214, a de-quantizer 212, and an entropy decoder 210 for decompressing data for use (i.e. viewing in the case of video data, etc).

In use, the transform module 202 carries out a reversible transform, often linear, of a plurality of pixels (in the case of video data) for the puφose of de- correlation. Next, the quantizer 204 effects the quantization of the transform values, after which the entropy encoder 206 is responsible for entropy coding of the quantized transform coefficients.

Figure 3 illustrates a method 300 for compressing/decompressing data, in accordance with one embodiment. In one embodiment, the present method 300 may be carried out in the context of the transform module 202 of Figure 2 and the manner in which it carries out a reversible transform. It should be noted, however, that the method 300 may be implemented in any desired context.

In operation 302, an inteφolation formula is received (i.e. identified, retrieved from memory, etc.) for compressing data. In the context of the present description, the data may refer to any data capable of being compressed. Moreover, the inteφolation formula may include any formula employing inteφolation (i.e. a wavelet filter, etc.). hi operation 304, it is determined whether at least one data value is required by the inteφolation formula, where the required data value is unavailable. Such data value may include any subset of the aforementioned data. By being unavailable, the required data value may be non-existent, out of range, etc.

Thereafter, an extrapolation operation is performed to generate the required unavailable data value. See operation 306. The extrapolation formula may include any formula employing extrapolation. By this scheme, the compression of the data is enhanced.

Figure 4 shows a data structure 400 on which the method 300 is carried out. As shown, during the transformation, a "best fit" 401 may be achieved by an inteφolation formula 403 involving a plurality of data values 402. Note operation 302 of the method 300 of Figure 3. If it is determined that one of the data values 402 is unavailable (see 404), an extrapolation formula may be used to generate such unavailable data value. More optional details regarding one exemplary implementation of the foregoing technique will be set forth in greater detail during reference to Figure 5.

Figure 5 illustrates a method 500 for compressing/decompressing data, in accordance with one embodiment. As an option, the present method 500 may be carried out in the context of the transform module 202 of Figure 2 and the manner in which it carries out a reversible transform. It should be noted, however, that the method 500 may be implemented in any desired context.

The method 500 provides a technique for generating edge filters for a wavelet filter pair. Initally, in operation 502, a wavelet scheme is analyzed to determine local derivatives that a wavelet filter approximates. Next, in operation 504, a polynomial order is chosen to use for extrapolation based on characteristics of the wavelet filter and a numbers of available samples. Next, extrapolation formulas are derived for each wavelet filter using the chosen polynomial order. See operation 506. Still yet, in operation 508, specific edge wavelet cases are derived utlizing the extrapolation formulas with the available samples in each case.

See Appendix A for an optional method of using Vandermonde type matrices to solve for the coefficients. Moreover, additional optional information regarding exemplary extrapolation formulas and related information will now be set forth in greater detail.

To approximate Y_2N__λ from the left, one may fit a quadratic polynomial from the left. Approximating the negative of half the 2nd derivative at 2N-1 using the available values yields Equation # 1.1.R. See Appendix A for one possible determination of this extrapolating quadratic.

Equation # 1.1.R

3X

7 2JV' -2 ~ ^2.V-4 ⁺ ^

2Λ- -1 X 2.N-1 eq l.l.R

Equation # 1.1.R may be used in place of Equation #1.1 (see background section) when point one is right-most. The apparent multiply by 3 can be accomplished with a shift and add. The division by 3 is trickier. For this case where the right-most index is 2N - 1 , there is no problem calculating Y_2N__₂ by means of

Equation #1.2 (again, see background section). In the case where the index of the right-most point is even (say 2N ), there is no problem with Equation #1.1, but

Equation #1.2 involves missing values. Here the object is to subtact an estimate of 7 from the even X using just the previously calculated odd indexed 7 s, Y_x and 7₃ in the case in point. This required estimate at index 2N can be obtained by linear extrapolation, as noted above. The appropriate formula is given by Equation #1.2.R. Equation #1.2.R

Y — V 4- 9

^JJ 2N-1 ¹2N-3 ^τ ^ ₂N ~ -Λ- 2.N ⁺ eq 1.2.R

A corresponding situation applies at the left boundary. Similar edge filters apply with the required extrapolations from the right (interior) rather than from the left. In this case, the appropriate filters are represented by Equations #1.1.L and I.2.L.

Equations #1.1.L and 1.2.L

3X^X₃ +1

Y = -- X„ eq 1.1.-L

⁰ 3 y

37, -7₃ + 2 γ₀ = χ₀ + eq ll.L

The reverse transform fiters can be obtained for these extrapolating boundary filters as for the original ones, namely by back substitution. The inverse transform boundary filters may be used in place of the standard filters in exactly the same circumstances as the forward boundary filters are used. Such filters are represented by Equations #2.1.Rinv, 2.2.Rinv, 2.1.L.inv, and 2.2.L.inv.

Equations #2.1.Rinv. 2.2.Rinv. 2.1.L.inv. 2.2.L.inv

3X 2W-2 ^' -^2Λf-4 ⁺ 1

^2N-l ^— ^2N-\ ^"*^" eq 2.1.R inv

3X_t -X₃ + l

X₀ = -37₀ + eq 2Λ.L inv

Thus, one embodiment may utilize a reformulation of the 5-3 filters that avoids the addition steps of the prior art while preserving the visual properties of the filter. See for example, Equations #3.1, 3.1 R, 3.2, 3.2L.

Equations #3.1. 3.1R. 3.2. 3.2L

Y +7 (7₂„ +l/2) = (X₂„ +l/2) + ¹2n-l ^τ -^t2n+l eq 3.2

(7₀ +l/2) = (X₀ +l/2) + eq 3.22.

In such formulation, certain coefficients are computed with an offset or bias of V₂, in order to avoid the additions mentioned above. It is to be noted that, although there appear to be many additions of Y₂ in this formulation, these additions need not actually occur in the computation. In Equations #3.1 and 3. IR, it can be seen that the effects of the additions of Y₂ cancel out, so they need not be applied to the input data. Instead, the terms in parentheses (7₀ + 1/2) and the like may be understood as names for the quantities actually calculated and stored as coefficients, passed to the following level of the wavelet transform pyramid.

Just as in the forward case, the JPEG-2000 inverse filters can be reformulated in the following Equations #4.2, 4.2L, 4.1, 4.1R. Equations #4.2.4.2L. 4.1. 4.1R

(X_2π +l/2) = (7_2n +l/2)- Y2„-l + Y2 2n+l eq 4.2

(X₀ +l/2) = (7₀ +l/2)- n 4.2Z,

(X_2H +l/2) + (X₂„₊₂ +l/2)

(X_2n+1 +l/2) = 7_2n+I + eq 4.1 2

(X_2W+1 + l/2) = 7_2W+1 + (X_2W +l/2) eq 4.1R

As can be seen here, the values taken as input to the inverse computation are the same terms produced by the forward computation in Equations #3.1 ~ 3.2L and the corrections by ^lΛ need never be calculated explicitly.

In this way, the total number of arithmetic operations performed during the computation of the wavelet transform is reduced.

OPTIONAL FEATURES

Additional optional features and techniques that may be used in the context of the systems and methods of Figures 2-5 will now be set forth. It should be noted such optional features are set forth strictly for illustrative puφoses and should not be construed as limiting in any manner. Moreover, such features may be implemented independent from the foregoing systems and methods of Figures 2-5.

General Optional Features

In use, a transform module (i.e. see, for example, transform module 202 of Figure 2, etc.) may utilize a wavelet pyramid that acts as a filter bank separating the image into sub-bands each covering approximately one octave (i.e. factor of 2). At each octave, there may be three sub-bands corresponding to horizontal, vertical, and checkerboard features. In one embodiment, the pyramids may be typically three to five levels deep, covering the same number of octaves. If the original image is at all smooth, the magnitude of the wavelet coefficients decreases rapidly. Images may have a Holder coefficient of 2/3 meaning roughly that the image has 2/3 of a derivative. If the wavelet coefficients are arranged in descending order of absolute value, those absolute values may be seen to decrease as N^"s where N is the position in the sequence and s is the smoothness of the image.

After forming the wavelet pyramid, the wavelet coefficients may be scaled (quantized) by a quantizer (i.e. see, for example, quantizer 204 of Figure 2, etc.) to render a result that is consistent with the viewing conditions and the human visual contrast sensitivity curve (CSF). By accounting for the characteristics of the human visual system (HVS), the number of bits used to code the chroma subbands may be reduced significantly.

To provide a fast algorithm implementable in minimal silicon area demands, the use of a traditional arithmetic coder maybe avoided. For example, multiplies may be avoided as they are very expensive in silicon area, as set forth earlier. Moreover, such algorithm may have a very good "fast path" for the individual elements of the runs.

The codec may employ a group of pictures (GOP) of two interlaced video frames, edge filters for the boundaries, intermediate field image compression and block compression structure. Specific features of the implementation for a small single chip may be as follows in Table 1.

Table 1

■ One implementation may use short wavelet bases (the 2-6 wavelet) , which are particularly appropriate for those that focus on natural scene images quantized to match the HVS. Implementation can be accomplished with adds and shifts. A Mallat pyramid may be used resulting from five filter applications in the horizontal direction and three applications in the vertical direction for each field. This produces filters with dyadic coefficients, two coefficients in the low pass filter and two, four, or six coefficients in the wavelet filters (resulting in twelve wavelet sub- bands) . One may use modified edge filters near block and image boundaries so as to utilize actual image values. The resulting video pyramids may have substantial runs of zeros and also substantial runs of non-zeros. Encoding can therefore be done efficiently by table look-up.

■ Another solution may use motion image compression via 3D wavelet pyramid in place of the motion compensation search used in MPEG-like methods. One may apply transform compression in the temporal direction to a GOP of four fields . A two level temporal Mallat pyramid may be used as a tensor product with the spatial pyramid. The linear edge filters may be used at the fine level and the modified Haar filters at the coarse level, resulting in four temporal subbands. Each of these temporal subbands is compressed. ■ Processing can be decoupled into the processing of blocks of 8 scan lines of 32 pixels each. This helps reduce the RAM requirements to the point that the RAM can be placed in the ASIC itself. This reduces the chip count and simplifies the satisfaction of RAM bandwidth requirements.

Compression processing may be performed stripe by stripe (two passes per stripe). A "stripe" is 8 pixels high and the full width of the picture. ■ Still another embodiment can use quantization of the wavelet coefficients to achieve further improvement in compression. Quantization denominators are powers of two, enabling implementation by shifts. Quantization may refer to a process of assigning scaling factors to each subband, multiplying each coefficient in a sub band by the corresponding scaling factor, and fixing the scaled coefficient to an integer.

Combined Filters

As another option, the wavelet filters maybe selectively replaced with polyphase filters. In one embodiment, such replacement may occur in a transform module (i.e. see transform module 202 and/orreverse transform module 214 of Figure 2) ofa data compression/decompression system. Ofcourse, such feature may be implemented independent ofthe various other features described herein. More exemplary information regarding the present optional feature will now be set forth. In the present embodiment, the use of conventional [i.e. Finite Impulse Response (FIR)] information-discarding or smoothing filters may be combined with wavelet information-preserving filters in the design of a video compression codec. FIR filters may be distinguished from wavelet filters in that the conventional FIR filters are used singly, whereas wavelet filters always come as a complementary pair. Also, the FIR filters in a wavelet transform are not necessarily related to each other as a polyphase filter bank.

Video compression may be performed in a three-step process; sometimes other steps are added, but the three main phases are these: transformation, quantization, and entry coding, as set forth earlier. These operations, as usually practiced, typically only discard information during quantization. In fact, a lossless compression method may result if this operation is omitted. However, lossless compression is limited to much smaller compression ratios than lossy compression which takes advantage of the human visual system and discards mformation which makes no visual difference, or negligible visual difference, in the decoded result.

One class of visual information that can sometimes be lost with acceptable results is fine detail. While most transform processes used in video compression are capable of discarding fine detail mformation through the quantization step, they may do so less efficiently or with less visual fidelity than a direct low-pass filter implementation.

One way to implement a smoothing filter is by use of a FIR structure. An alternative way to implement a smoothing filter is by use of an Infinite Impulse Response (HR) structure.

When the size of an image or a data sequence is to be changed, a Polyphase Filter Bank (PFB) composed of related FIR filters may be employed. Such method processes the image by removing some detail and producing a correspondingly smaller image for further processing.

A polyphase filter bank may include a set of FIR filters that share the same bandwidth, or frequency selectivity properties, but generate pixels inteφolated at different locations on or between the original samples.

For example, a polyphase filter bank can be used to reduce an image (i.e. a frame of video) to 2/3 of its original width. It does this by computing inteφolated pixels midway between each of the original pixels, computing smoothed pixels at the original locations, then keeping only every third pixel of the resulting stream.

With this method, it may be possible to omit computing the pixels that will not be kept, resulting in a more efficient method of reducing the image size. This process generalizes easily to other rational fractional size changes. In this way, a polyphase filter bank can smoothly remove a small amount of fine detail, scaling an image by a factor less than one (1). The factor can be greater than ^lΛ.

The present embodiment combines the benefits of smooth detail removal with the picture quality of wavelet transform coding by using a polyphase filter as the first stage of a wavelet based image compression process. By using this combination, one may add the advantage of smooth, high-quality, artifact-free removal of the finest details and the bits that would be required to represent them, from using a polyphase filter bank, to the well-known advantages of fast efficient computation and high visual quality from using a wavelet transform as the basis for image and video compression.

In a first embodiment of the present method, one may first apply a polyphase filter bank to the image in one direction, typically the horizontal direction, and then apply a wavelet transform to the image before quantizing and entropy coding in the conventional way. In a second embodiment of the present method, one may apply a polyphase filter in a particular direction before the first wavelet operation in that direction, but possibly after wavelet operations in other directions.

In still another embodiment, one may apply a polyphase filter in each of several directions, before the first wavelet operation for that direction but possibly after wavelet steps in other directions.

There are several advantages to the present method of applying a lossy filtering step before at least some wavelet or DCT transform stages. For instance, filters such as FIR or polyphase designs, not being constrained to function in a wavelet fashion, may be designed for higher quality and smaller artifacts. Wavelet filters may be designed in pairs that split the information into two parts without discarding information.

Applying a lossy filter before transform operations, rather than after, means that the transform computation may operate on less data and hence take less time to compute and take less intermediate storage while computing. Because the transform is typically an expensive part of the compression process, this reduction results in a significantly improved speed and efficiency for the overall compression process.

Sparse Wavelet Transform Using Piles

As yet another option, an amount of computation associated with entropy coding may be reduced by reducing a quantity of the data values. In one embodiment, such reduction may occur in a quantizer (i.e. see quantizer 204 of Figure 2) of a data compression/decompression system. Of course, such feature may be implemented independent of the various other features described herein. More exemplary information regarding the present optional feature will now be set forth. In the present embodiment, piles may be used as an operation in the decoding operation and are thus ready for use in computing the succeeding steps. See Appendix B for more information regarding piles.

It is well known in the field of scientific computing to provide what are called sparse representations for matrix data. Ordinary matrices are represented as the complete array of numbers that are the matrix elements; this is called a "dense" representation. Some program packages store, convert, and operate on "sparse matrices" in which the zero entries are not represented explicitly one-by-one, but are implicitly represented. One such "sparse" representation is zero-run coding, in which zeros are represented by the count of zeros occurring together. This count can itself be zero (when two nonzero values are adjacent), one (for an isolated zero value), or larger.

However, if video data are not matrices, one does not typically apply matrix operations (i.e. multiplication, inverse, eigenvalue decomposition, etc) to them. The principles underlying sparse matrix computation can be extracted and translated to video transforms.

Briefly, a pile consists of an array of pairs; each pair gives the address (or offset) in the ordinary data of a nonzero item together with the value of that item. The addresses or offsets are in sorted order, so that one can traverse the entire data set from one end to the other by traversing the pile and operating on the nonzero elements in it, taking account of their place in the full data set.

Piles are specifically designed to be efficiently implementable on computers that process data in parallel using identical operations on several data items at once (i.e. SIMD processors), and on computers that make conditional transfers of control relatively expensive. These processors are in common use to handle video and audio, and are sometimes called "media processors". When one needs to perform some operation on two data sets, both of which are sparse, there arises a consideration not present when the data are represented densely. That is, "when do the data items coincide with each other?"

In operating on two data sets represented as piles, the basic operation for identifying coincident data items is called "match and merge". As one traverses the two piles, he or she may have at every operation after the beginning an address from each pile and an address for which an output value has been just produced. In order to find the next address for which one can produce a value, the minimum of the two addresses presented by the input piles may be found. If both piles agree on this address, there is a data item available from each and one can operate on the two values to produce the desired result. Then, one may step to the next item on both piles.

If the next addresses in the two piles differ, a nonzero value is present in one pile (one data set) but a zero value in the other data set (implicitly represented by the pile); one may operate on the one value and zero, producing a value. Alternatively, if the operation one is performing produces zero when one input is zero, no value is produced. In either case, one may step to the next item only on the pile with the minimum address.

The result values are placed in an output location, either a dense array (by writing explicit zeros whenever the address is advanced by more than one) or in an output pile.

As mentioned earlier, a wavelet transform comprises the repeated application of wavelet filter pairs to a set of data, either in one dimension or in more than one. For video compression, one may use a 2-D wavelet transform (horizontal and vertical) or a 3-D wavelet transform (horizontal, vertical, and temporal). The intent of the transform stage in a video compressor is to gather the energy or information of the source picture into as compact a form as possible by taking advantage of local similarities and patterns in the picture or sequence. No compressor can possibly compress all possible inputs; one may design compressors to work well on "typical" inputs and ignore their failure to compress "random" or "pathological" inputs.

When the transform works well, and the picture information is well gathered into a few of the transform coefficients, the remaining coefficients have many zeros among them.

It is also a stage of video compressors to quantize the results, as set forth earlier, hi this stage, computed values that are near zero are represented by zeros. It is sometimes desirable to quantize computed coefficients during the computation of the wavelet transform, rather than or in addition to quantizing the final transformed result.

So one can get many zeros in some of the wavelet coefficient data, and this can happen while there is a need to do more computing on the data.

Additionally, when one is decoding a compressed image or video to display it, he or she may work from the entropy-coded significant coefficients back toward a completely filled-in image for display. The typical output of the first decoding step, entropy-code decoding, is a set of significant coefficients with large numbers of non- significant coefficients, considered by default to be zero, among them.

When this happens, it is valuable to convert the dense data with many zeros into a sparse representation; one may do this by piling the data as disclosed earlier. The pile representations resemble run-of-zero representations but usually store an address or offset, rather than the run length (a difference of addresses). This allows faster processing both to create the pile and to expand the pile into a dense representation later.

In the case of decoding, the data are not in a dense form and it is more natural to construct the piles directly in the entropy decoder.

The processing of wavelet transforms presents several cases that are susceptible to piling treatment. Note Table 2 below:

Table 2

• decompressing, both bands piled

• decompressing, one band piled

• decompressing, input piled and output dense • compressing, input dense and output piled

One example will be considered: decoding a compressed frame of video, where the encoding process has resulted in very many of the coefficients being quantized to zero. The first stages of decompression undo the entropy-encoding or bit-encoding of the nonzero coefficients, giving the values and the location of each value in the frame. This is just the information represented in a pile, and it is very convenient to use a pile to store it, rather than expanding it immediately to a dense representation by explicitly filling in all the intervening zero values.

At this stage, one has the coefficients ready to be operated upon by the inverse wavelet transform. The final result of the inverse transform is the decompressed image ready to be displayed; it is only rarely sparse.

The first stage of the inverse wavelet transform (like each stage) is a filter computation that takes data from two areas or "bands" of the coefficient data and combines them into an intermediate band, which will be used in further stages of the same process. At this first stage, the data for both bands is sparse and represented in piles. One may produce the output of this stage in a pile, too, so that he or she need not fill in zeros. The computation below in Table 3 operates on "band" piles P_\ and Pz, produces its result in new pile R, and performs the filter computation step W(p, q) on pairs of coefficients from the two bands.

Table 3

while not both EOF (P₁) , EOF (P₂) {

I_l=0; I₂=0; guard (P_x. index ≤ P₂. index, Pile_Read (P_{l r} I_λ) ) ; guar (P_x. index ≥ P₂. index, Pile_Read (P₂, 1₂) ) ; Condi tional_Append (R, true, W(I_{l r} I₂) ) ; } ;

Destroy_Pile (P _/ Destroy _F 'He (P₂) ;

It should be noted that the foregoing computation can still be unrolled for parallel operation, as shown in Appendix B.

The time taken to compute the wavelet transform may be reduced by using a sparse representation, piles, for intermediate results that have many zero values. Such method improves the performance and computational efficiency of wavelet- based image compression and video compression products.

Transform Range Limit

As yet another option, an amount of computation associated with reconstructing the data values into a predetermined data range may be reduced. Such computation may be reduced by performing only one single clip operation. In one embodiment, such reduction may occur in a de-quantizer module (i.e. see de- quantizer 212 of Figure 2) of a data compression/decompression system. Of course, such feature may be implemented independent of the various other features described herein. More exemplary information regarding the present optional feature will now be set forth. hi digital image compression and digital video compression methods, images (or frames) are represented as arrays of numbers, each number representing the brightness of an area or the quantity of a particular color (such as red) in an area. These areas are called pixels, and the numbers are called samples or component values.

Image compression or video compression is done using a wide range of different methods. As mentioned earlier, many of these methods involve, as a step, the computation of a transform: through a sequence of arithmetic operations, the array of samples representing an image is transformed into a different array of numbers, called coefficients, that contain the image information but do not individually correspond to brightness or color of small areas. Although the transform contains the same image information, this information is distributed among the numbers in a way that is advantageous for the further operation of the compression method.

When an image or frame, compressed by such a method, is to be replayed, the compressed data must be decompressed. Usually this involves, as a step, computing an inverse transform that takes an array of coefficients and produces an array of samples.

The samples of an image or frame are commonly represented by integers of small size, typically 8 binary bits. Such an 8 bit number can represent only 256 distinct values, and in these applications the values are typically considered to be the range of integers [0, 255] from zero through 255 inclusive.

Many standards and operating conditions impose more restricted ranges than this. For example, the pixel component (Y, U, V) sample values in CCIR-601 (ITU- R BT.601-4) digital video are specified to lie within smaller ranges than [0, 255]. In particular, the luma Y component valid range in the lit part of the screen is specified to lie within [16, 235], and the chroma U, V range is specified to line within [16, 240]. Values outside these ranges can have meanings other than brightness, such as indicating sync events.

Image and video compression methods can be divided into two categories, lossless and lossy. Lossless compression methods operate in such a way as to produce the exact same values from decompression as were presented for compression. For these methods there is no issue of range, since the output occupies the same range of numbers as the input.

Lossy compression, however, produces a decompressed output that is only expected to approximate the original input, not to match it bit for bit. By taking advantage of this freedom to alter the image slightly, lossy methods can produce much greater compression ratios.

In the decompression part of a lossy compression method, the samples computed are not guaranteed to be the same as the corresponding original sample and thus are not guaranteed to occupy the same range of values. In order to meet the range conditions of the image standards, therefore, there must be included a step of limiting or clipping the computed values to the specified range.

The straightforward way to perform this clipping step is as follows: For every computed sample s, test whether s > max and if so set s = max; test whether s < min and if so set s = min.

Another way to perform this step uses the MAX and MIN operators found in some computing platforms; again, on may apply two operations to each sample. Both the ways shown, and many others, are more computationally expensive than a simple arithmetic operation such as an add or subtract.

Since this process may be performed separately for every sample value (every pixel) in the image or frame, it is a significant part of the computation in a decompression method. Note that for nearly all of the computed samples, which normally lie well within the required range, both tests will fail and thus both tests must be computed.

Transform computations as described above commonly have the following property: one of the resulting coefficients represents the overall brightness level the entire frame, or a significant part of the frame (a block, in MPEG terminology). This coefficient is called a "DC coefficient." Because of the way the transform is computed, altering the DC coefficient alters the value of all the samples in its frame or block in the same way, proportionally to the alteration made. So it is possible, for example, to increase the value of every sample in a block by the same amount, by adding a properly chosen constant to the DC coefficient for the block just before computing the inverse transform.

The computing engines upon which compression methods are executed commonly have arithmetic instructions with the saturation property: when a result is calculated, if it would exceed the representation range of its container (for 8-bit quantities, [0, 255]), the result is clipped to lie within that range. For example, if a saturating subtract instruction is given the values 4 and 9, the result instead of (4 - 9 =) -5 would be clipped, and the result 0 would be returned. Likewise a saturating add instruction would deliver for 250 + 10 the result 255.

A lower-cost way to clip the pixel component values will now be described that results from decoding to appropriate limits, in many compression methods. The present embodiment performs one of the two clips with saturating arithmetic, by carrying a bias on the part values, leaving only one of the MAX/MIN ops. To see an example in more detail, when the range required is [llim, ulim] = [16, 240], see Table 4.

Table 4 1 . Apply a bias to the DC coefficient in each block which will result , after all the transform filters , in each part being offset by negative 16 ( -Him in general) . Cost : one arithmetic operation per block or image .

2 . Be sure that the final arithmetic step of the inverse transform saturates (clips) at zero . Cost : free on most compute engines .

3 . Apply the MAX operation, partitioned, with 224 (ulim Him in general ) .

Cost : one MAX operation per sample .

4 . Remove bias using ADD 16 (Him in general) . This cannot overflow because of the MAX just before ; it does not need to be done with saturating arithmetic . Cost : one ADD per sample .

As is now apparent, the computational cost of the necessary range limiting can be reduced from two MAX/MIN operations per sample to one ADD per block, one MAX per sample, and one simple ADD per sample.

On some computing engines, for example the EQUATOR MAP-CA processor, the savings of using the present method can be much more substantial than is readily apparent from the foregoing. On these engines, several samples can be combined in a word and operated upon simultaneously. However, these partitioned operations are limited to certain parts of the processor, and in compression applications can be the limiting resource for performance. On such an engine, the fact that the ADD in Step 4 above cannot overflow is highly significant. Step 4 need not use the special partitioned ADD, but can use an ordinary ADD to operate on several samples at once as if partitioned. This ordinary operation uses a part of the processor that is not as highly loaded and can be overlapped, or executed at the same time as other necessary partitioned operations, resulting in a significant saving of time in computing the inverse transform. Figure 6 illustrates a system 600 for compressing data, in accordance with one embodiment. As an option, the system 600 may be implemented in the context of the foregoing concepts set forth hereinabove. Of course, however, the system 600 may be implemented in any desired context.

Included is an encoder 602 embodied on a single device 604 for encoding data to generate first compressed data in a first format. Moreover, a transcoder 606 is embodied on the same single device 604 as the encoder 602 for transcoding the first compressed data to generate second compressed data in a second format.

In use, data is received in the single device 604. Such data is encoded utilizing the single device 604 to generate first compressed data in a first format. Moreover, the first compressed data is transcoded utilizing the single device 604 to generate second compressed data in a second format.

In one embodiment, the encoding may occur in real-time. Moreover, the transcoding may occur off-line. In another embodiment, the first compressed data may be transcoded to generate the second compressed data in the second format, such that the second compressed data is adapted to match a capacity of a communication network coupled to the single device 604.

As an option, the encoding may be carried out utilizing a first encoder. Moreover, the transcoding may be carried out utilizing a decoder and a second encoder, as shown in Figure 6.

Still yet, the first format may include a wavelet-based format. Further, the second format may include a DCT-based format. In one particular embodiment, the second format may include an MPEG format. More exemplary information regarding additional optional features will now be set forth. As set forth earlier, there are several modes of communication using images and video sequences. In addition to direct real-time viewing, one can capture the image(s) or video sequence and transmit it at a later time, either immediately following capture or delayed until a more advantageous time.

In addition, the receiving of a video sequence can be done either in a realtime mode in which the video seen but not stored, like watching TV, or in another mode where the sequence is stored for later viewing.

These various options combine into three scenarios of use, in addition to other combinations. The three scenarios are:

1. Videophone or picturephone operation as described above, where both the transmitter and receiver operate in real time. This requires all compression, coding, and decompression to be done in real time at the speed of video capture, and it requires the transmission channel to carry the full rate of the compressed video.

2. Streaming operation, in which the video is captured and stored at the source or in the network and viewed at the receiver in real time. This requires realtime decoding, but allows time for processing the sequence before transmission. This mode requires the transmission channel, at least from the network to the receiver, to carry the full rate of the compressed video. In addition, for most transmission channels, the receiver must buffer some amount of the sequence to maintain smooth playback in the presence of variance in the transmission rate.

3. Messaging or File-transfer mode, in which the video is captured and stored at the source, transferred in non-real-time to the receiver, and stored at the receiver for later playback. This mode allows for operation on transmission channels that cannot carry the full rate of real-time video, and allows for the recipient to replay, pause, and otherwise control the experience.

Images or video that have been captured and compressed in one format may be converted into another compression format. This operation is called transcoding. It is done, in the worst case, by decompressing the input format into a full picture or video, then compressing in the desired output format. For many pairs of formats there may be less-expensive methods than this worst-case method available.

In many networks, such as the international cell phone network, different users may prefer or require different formats for images or video. This may be the case even if all users adhere, for example, to the MPEG-4 standard, because that standard offers many options of profiles, size, rate, and other parameters. For this reason and others it will sometimes be desirable for the sender and recipient devices to negotiate which format is to be used in a particular transmission. In the simplest case each device provides a list of formats that it can handle, and both choose one that is mutually acceptable from the intersection of the lists. There are more complex forms of this negotiation, but the general effect is the same: the sender only knows what format to transmit after the start of the connection.

When transcoding is required as part of a connection, it can be performed either in the originating device or in some intermediate location. Some networks may offer transcoding services as part of the operation of the network, in order to provide for mutual communication among devices with disparate local capabilities. This will help to keep the complexity, and hence the cost, of the mobile units low.

Because of the disparities mentioned above between video data rates and transmission channel rates, it can be advantageous to operate in a new mode as follows. The device captures video, compresses it in real time using a low- complexity compression method such as that to be described hereinafter, and stores the compressed video sequence. Then at a later time the device can tianscode the video sequence into a format that is acceptable to the recipient or to the network. This allows for low power operation, long battery life, and simpler circuitry in the device, along with complete compatibility with network format standards.

An optional advantage of this operating style is flexibility: the choice of realtime compression does not limit the range of receivers with which the device can communicate directly. The transmission format can be negotiated at the time of the transfer call, as described above. The device can support a broader range of formats this way, because it need not have an extensively optimized real-time implementation of every one.

Another optional advantage of the operating style above is that the transcoding need not operate at the speed of video capture, but can be matched to the speed of the transmission network which is often much lower. The lower speed transcoding operation, in turn, can be done in circuitry that is smaller and consumes less power than a standard real-time compressor would take. Thus the overall power consumption, battery life, complexity, and cost of the device is reduced.

Yet another optional advantage of this style of operation is the possibility of postponing transmission of images and video from times when the cost is high, such as daytime telephone rates, to times when the cost is lower (or, in current cell phone pricing schemes, even free) such as night rates.

The transmission may have lower cost at another time because of other factors than time. For example, a cell phone may incur lower charges when it returns to its home territory than when it is "roaming".

Deferred transmission as described does not necessarily require the user of the device to take any deferred action. The transmission can be scheduled automatically by the device, based on information it has about rates and schedules. Thus the user's convenience is preserved. Of course, some messages have higher perceived urgency than others; users can easily specify whether and how long to defer transmission.

When images and video are transferred in non-real time, it is possible that the user of the device will want to make a call while the transfer is in progress, or that an incoming call will arrive, or that the connection will be broken for some other reason. It is well known in the computer networking field to provide information that allows an interrupted transfer to resume, without having to retransmit parts of the information that were already successfully transferred.

Such interruptible transfers will allow both for deliberate interruption such as placing a call and for unexpected interruption such as a dropped connection.

It is not necessary for the receiving device to have the capacity to store an entire video sequence. A transcoding source device can send to a streaming-mode receiver, including a receiver that is much simpler and much less capable than the sender. This allows for easy adoption of advanced transcoding devices into an existing network of devices.

Standard image and video formats provide error detection, error correction, and burst-error control methods. By transcoding into these standard formats, the device can take full advantage of standard error resilience features while using a low-complexity, low-power capture compression method.

The idea of capturing a signal of interest using low-complexity real-time processing, then transcoding it later into a format better suited to transmission, storage, or further processing, can be applied to signals other than images and video, and to uses other than wireless transmission, and to devices other than mobile personal conveniences. For example, military intelligence sensing, infrared remote sensing, sonar, telescope spectra, radio telescope signals, SETI channels, biochemical measurements, seismic signals, and many others can profit from this basic scheme.

Figure 7 illustrates a system 700 for compressing data utilizing multiple encoders 702 on a single integrated circuit 704 (i.e. ASIC). As an option, the system 700 may be implemented in the context of the foregoing concepts set forth hereinabove. Of course, however, the system 700 may be implemented in any desired context.

As shown, a first encoder is embodied on the single integrated circuit 704 for encoding a first set of data. Moreover, a second encoder is embodied on the same single integrated circuit 704 as the first encoder for encoding a second set of data. Of course, more encoders may be embodied on the single integrated circuit 704 for similar puφoses.

In use, data is received in the single integrated circuit. The data is then encoded utilizing a plurality of encoders incoφorated on the single integrated circuit.

Many applications for video compression would be better served by an ASIC containing multiple coding or decoding stages. An example is the category of Personal Video Recorders (PVR) or Digital Video Recorders (DVR), such as the products of TiVo and Replay TV, wherein the processes of compression and decompression must be performed simultaneously. Another example is video surveillance recorders, wherein many video signals from cameras must be multiplexed, compressed, and recorded together. Putting several compression circuits on a single ASIC, or a combination of compression and decompression circuits on a single ASIC, offers both direct and indirect advantages. Direct advantages include reduced package count, reduced pin count, reduced power consumption, and reduced circuit board area. All of these contribute to reduced product cost.

Indirect advantages include the possibility to incoφorate video selection and multiplexing circuitry on the same chip, further reducing the pin count and board area.

There are now video compression methods, for example the algorithms described during reference to Figures 2-5 developed by Droplet Technology, Inc.(R), that require far less circuitry to implement than the conventional and standard compression methods. Due to their superior design, multiple instances of these advanced compression methods may now be integrated onto a single ASIC or other integrated circuit.

Another single module system and method for compressing data are further provided. In use, photons are received utilizing a single module. Thereafter, compressed data representative of the photons is outputted utilizing the single module.

The present embodiment may be implemented for building imager arrays - CMOS or CCD cameras or other devices - to facilitate the whole process of capturing and delivering compressed digital video. Directly digitized images and video take lots of bits; it is common to compress images and video for storage, transmission, and other uses. Several basic methods of compression are known, and very many specific variants of these. A general method can be characterized by a three-stage process: transform, quantize, and entropy-code.

The intent of the transform stage in a video compressor is to gather the energy or information of the source picture into as compact a form as possible by taking advantage of local similarities and patterns in the picture or sequence. The present embodiment works well on "typical" inputs and ignores their failure to compress "random" or "pathological" inputs.

Many image compression and video compression methods, such as JPEG [1], MPEG-2 [2] and MPEG-4 [4], use the discrete cosine transform (DCT) as the transform stage.

Some newer image compression and video compression methods, such as JPEG-2000 [3] and MPEG-4 textures [4], use various wavelet transforms as the transform stage.

A wavelet transform comprises the repeated application of wavelet filter pairs to a set of data, either in one dimension or in more than one. For image compression, one may use a 2-D wavelet transform (horizontal and vertical); for video one may use a 3-D wavelet transform (horizontal, vertical, and temporal).

A wavelet filter pair processes an image (or a section of an image) to produce two images, each typically half the size of the input, that can generally be regarded as "low-pass" or average or blurred, for one, and "high-pass" or detail or edges, for the other. The full information in the input picture is retained, and the original can be reconstructed exactly (in many cases), from the transformed image pair. A wavelet filter pair generally processes an image in one dimension, either horizontally, vertically, or temporally (across a time sequence of frames). The full wavelet transform is composed of a series of these steps applied in several dimensions successively. In general, not all of the results of earlier steps are subjected to later steps; the high-pass image is sometimes kept without further filtering.

A camera has at its heart an imager device: something that responds to and records varying intensities and colors of light for later display and other uses. Common imager devices for digital still cameras and video cameras today are CCDs and CMOS arrays. Both accumulate an electric charge in response to light at each pixel; they differ in the way they transfer and read out the amount of charge.

CMOS ("complimentary metal-oxide semiconductor") imagers are the newer technology, and can be made less expensively than CCDs. A key advantage of CMOS imagers is that the processing of the imager chip resembles the processing of digital logic chips rather closely. This makes it easier to include control and other functions on the same chip. Both kinds of chip, however, are necessarily built from analog circuits at the lowest level to measure the analog charge or voltage or current that represents the amount of light seen.

CMOS imagers are very similar in structure to DRAMs ("dynamic random- access memories"), and transfer the charge that represents the light seen in a pixel to the edge of the array along a grid of metal traces that cross the array. This readout method is standard practice for memory chips and is well developed in the industry.

CCD imagers, while an older technology, are well developed and offer lower noise and better sensitivity. CCDs ("charge-coupled devices") transfer the charge that represents the light seen in a pixel to the edge of the array by passing it from cell to cell in bucket-brigade fashion. A CMOS imager or CCD imager differs from a digital memory device in that the charge transferred to the edge of the array represents not just a "0" or "1" bit value, but a range of brightness values. Thus an analog-to-digital conversion is required. Preceding this conversion, the signal is amplified; it is often subjected to other processing to cancel out errors and variability in the chip fabrication and operation. A common processing step is "correlated double sampling", in which a dark sample is taken and stored as a measure of the leakage current for this part of the circuit, and subtracted from the image sample to reduce noise patterns.

The analog processing is done in a differential amplifier, a circuit that responds primarily to the difference between its inputs rather than to the absolute size of either one.

At some point in the processing chain between light capture and stored digital images, the signal must be converted from analog (charge, voltage, or current) representation to digital representation.

Because one can choose to do the analog-to-digital sooner or later in the chain, he or she has the option of doing some stages of the overall processing either in analog or in digital form.

The wavelet filter pair that is a step of a wavelet consists, in some implementations, of a very simple set of additions and subtractions of adjacent and nearby pixel values. For instance, useful filter pair, called the "Haar Wavelet", is just the sum and difference as follows in Equations #1.1H, and 1.2H.

Equations #1.1H. and 1.2H

L_n = X_2n +X_2n» eq lΛH

H_n = X_2n -X_2n+. eq \.2H This generates one sample of the "High" transformed image and one sample of the "Low" transformed image from the same two samples of the input image "X".

Other wavelet filters are possible and are used; some are very complex, but some are as simple as doing a few Haar steps, summing them together, and scaling them by constant amounts.

For instance, one of the transforms specified in the JPEG 2000 standard [1] is the reversible 5-3 transform set forth earlier in Equations #1.1, and 1.2.

As one can see, the entire wavelet filter pair takes 5 add/subtract operations and two scaling operations; in the continuous analog domain the floor operations disappear.

It turns out that summing analog values together is easily and naturally accomplished by differential amplifiers (for either addition or subtraction), and that scaling by a constant amount is the easiest operation of all for an analog signal, requiring only a resistor or two.

In contrast, summing values in the digital domain requires an adder logic circuit for each bit plus a carry chain; scaling by some special constant amounts is easy but general scaling is not cheap in digital logic.

Because CMOS and CCD imagers are presently built using differential amplifiers to amplify and subtract noise from the pixel samples on the chip, it is fairly easy to do some simple processing steps on the chip before digital-to-analog conversion. Doing these steps adds some analog circuitry to the chip, but it can be a small amount of circuitry.

It turns out in some implementations of the wavelet transform, including those one prefers, that the first step computed is the most expensive. This is because each of the first several steps reduces the amount of image to be processed by later stages; one does not necessarily further process the "high pass" image output by each filter stage. Thus implementing the first step or first few steps in analog, before doing analog-to-digital conversion, can reduce the digital processing significantly, since only the "low pass" image must be digitally processed. The benefit can be taken either by reducing the amount of digital circuitry, thus reducing the chip area it occupies, or by running the digital circuitry slower, thus reducing its power consumption and heat generation.

The transform stage of image or video compression can be done using a

DCT; this process transforms an image into a spectrum, whose successive samples represent the content of a range of spatial frequencies in the image. Some implementations of DCT use Haar steps, and these could benefit from being done in analog as well.

Usually in wavelet transforms, one can compute a horizontal filter pair as the first step. This seems convenient for the analog filtering as well. One can do two horizontal steps before doing the first vertical filter step, and this would also be convenient in analog.

Vertical filter steps require the simultaneous presence of vertically adjacent pixels, hi the conventional image scanning raster order, such pixels appear widely separated in time (a line time apart). However, in chip imagers such as CMOS imagers, it is reasonable to consider rearranging the scan order so that several lines appear together, and then it is feasible to do a vertical filter step in analog as well, either before or after the first horizontal filter step.

Imager chips that capture color images typically place a color filter in front of each pixel, restricting it to one of red, green, or blue response. These filters are arranged in a pattern so that all three colors are sampled adjacently everywhere in the image. Digital video standards, however, prefer an arrangement of components other than RGB. The most widely used is YUV, or YCt,C_r, in which the Y component represents black-and-white brightness or "luma" and the U and V components represent color differences between blue or red and luma. The reason for this representation is that human visual response allows lower resolution in the C components, thus allowing smaller digital representations of images. The YUV representation is convenient for compression as well. Color imager chips sometimes provide circuitry to do the operation of transforming RGB pixel values into YUV values, either analog (before conversion) or digital (after conversion).

One can combine color conversion with wavelet filter steps in any of several ways. For instance, the analog color conversion can precede the first analog wavelet filter step; in this case the wavelet filters work on full-bandwidth Y component and on half-bandwidth U and V components. Alternatively, the wavelet filters can be applied to the R, G, and B components from the imager array first, followed by color conversion to YUV; in this case the filters work on three full-bandwidth component signals.

hi another arrangement, one can omit the conventional color conversion step altogether and provide RGB components to the wavelet transform. There are versions of the wavelet transform that accomplish conversion to YUV as part of their operation. In this arrangement, the analog circuitry that does the color conversion is replaced by the analog circuitry that does the first wavelet steps, for no net increase in analog circuitry, reduced digital circuitry, and a very clean interface with the digital wavelet compression processing.

It has thus been shown how to make a compressed digital video capture subsystem more efficient by incoφorating analog computation of the initial wavelet filter step or steps. This can be done for monochrome imagers, and can be combined in several ways with the color conversion stage of color digital imagers. This method improves the performance and computational efficiency of wavelet-based image compression and video compression products.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

APPENDIX A

One may have three data values, [X_2Λf_₁ X_2N-₂ X _N-₄ ] > ^an& ^need three coefficients for the quadratic:

The negative of half the 2ⁿ derivative may be — 2 ₂ so interest may only be in α₂. In that case, it is more simple to find the quadratic:

since

Three linear equations with a Vandermonde type coefficient matrix may be solved.

(-1)° (-2)° (-4)°

[α₀ α_λ α₂] (-1)¹ (-2)¹ (-4)¹

-Λ-2N-2 - - 2N-i (-1)² (-2)² (-4)²

16 12 2 α₀ fl_j α₂\ — \ Λ._2Η__X JL _2N_₂ Λ _2W_₄J -12 -15 -3

2 3 1

Half of the negative of the 2^nd derivative is:

APPENDIX B

INTRODUCTION TO PILES

Parallel processors are difficult to program for high throughput when the required algorithms have narrow data widths, serial data dependencies, or frequent control statements (e.g., "if, "for", "while" statements). This embodiment overcomes all three of these problems, singly or in combination. Entropy coding applications are an important class of applications with all three types of problems.

PARALLEL PROCESSING

There are three types of parallelism that may be used to advantage in processors.

1) The first type is supported by multiple functional units and allows processing to proceed simultaneously in each functional unit. Super-sealer processor architectures and VLIW (Very Long Instruction Word) processor architectures allow instructions to be issued to each of several functional units on the same cycle. Generally the latency, or time for completion, varies from one type of functional unit to another. The most simple functions (e.g. bitwise AND) usually complete in a single cycle while a floating add function my take 3 or more cycles.

2) The second type of parallel processing is supported by pipelining of individual functional units. For example, a floating add may take 3 cycles to complete and be implemented in three sequential sub-functions requiring 1 cycle each. By placing pipelining registers between the sub-functions, a second floating add may be initiated into the first sub-function on the same cycle that the previous floating add is initiated into the second sub-function. By this means a floating add can be initiated and completed every cycle even though any individual floating add requires 3 cycles to complete.

3) The third type of parallel processing available is that of devoting different field- partitions of a word to different instances of the same calculation. For example, a 32 bit word on a 32 bit processor may be divided into 4 field-partitions of 8 bits each. If the data items are small enough to fit in 8 bits it is may possible to process all 4 values with the same single instruction.

It may be possible in each single cycle to process a number of data items equal to the product of the number of field-partitions times the number of functional unit initiations. LOOP UNROLLING

There is a conventional and general approach to programming multiple and/or pipelined functional units: find many instances of the same computation and perform together corresponding operations from each instance. The instances can be generated by the technique of loop unrolling or by some other source of identical computation.

While loop unrolling is a generally applicable technique, a specific example is helpful in learning the benefits. Consider, for example, the program A) for i = 0:1:255, { S(z) }; where the body S(ϊ) is some sequence of operations { Sι(z^'); S₂(z); S₃(z); S₄(z^'); Ss(z^'); } dependent on i and where the computation S(z^') is completely independent of the computation S(j),j≠ i. One may not assume that the operations Sι(z^'); S₂(z); S₃(z); S₄(z); S₅(z); are independent of each other; to the contrary one may assume that dependencies from one operation to the next prohibit reordering.

One may also assume that these same dependencies require that the next operation not begin until the previous one is complete. If each (pipelined) operation required two cycles to complete (even though the pipelined execution unit may produce a new result each cycle) the sequence of five operations will require 10 cycles for completion. In addition the loop branch will typically require an additional 3 cycles per loop unless the programming tools can overlap S₄(z); S₅(t); with the branch delay. Program A) will require 256/4*10=640 cycles to complete if the branch delay is overlapped and 256/4*13=832 cycles to complete if the branch delay is not overlapped.

The program B) for n = 0:4:255, { S(n); S(.z+1); S(«+2); S(«+3);}; is completely equivalent to program A). The loop has been "unrolled" four times. This reduces the number of expensive control flow changes by a factor of four. More important, it provides the opportunity for reordering the constituent operations of each of the four S(z^'). So programs A) and B) are equivalent to program C) for n = 0:4:255, { Sι(n); S₂(«); S₃(Λ); S₄(«); S₅(n); Sι(n+1); S₂(n+1); S₃(«+i); S₄(n+1); S₅(n+I); Sι(n+2); S₂(«+2); S₃(n+2); S₄(«+2); S₅(n+2); Sι(n+3); S₂(/.+3); S₃(»+ ); S₄(n+3); S₅(n+3);

};

With the set of assumptions about dependencies and independencies above one may create the equivalent program D) for n = 0:4:255, { Sι(«); Sι(n+1); S_(n+2); Sι(»+3); S₂(n); S₂(«+7); S₂(n+2); S₂(n+3); S₃(«); S₃(n+1); S₃(»+2); S₃(n+3); S₄(.z); S₄(n+7); S₄(«+2); S₄(«+3); S₅(n); $₅(n+l); S₅(n+2); S₅(n+3);

};

On the first cycle S_\(n); S (n+1); can be issued and Sι(n+2); Sι(«+3); can be issued on the 2^nd cycle. At the beginning of the third cycle S i); S.(n+1); will be completed (two cycles have gone by) so that S₂(ra); S₂(n+7); can be issued. So it goes - the next two operations can be issued on each subsequent cycle so that this whole body can be executed in the same 10 cycles. Program D) operates in less than a quarter of time of program A).

Most parallel processors necessarily have conditional branch instruction which require several cycles of delay between the instruction itself and the point at which the branch actually takes place. During this delay period other instructions can be executed. The branch can cost as little as one instruction issue opportunity as long as the branch condition is known sufficiently early and the compiler or other programming tools support the execution of instructions during the delay. This technique can be applied to even program A) as the branch condition (z— 255) is known at the top of the loop.

Excessive unrolling is counter productive. First, once all of the issue opportunities are utilized (as in program D)) there is no further speedup with additional unrolling. Second, each of the unrolled loop turns, in general, requires additional registers to hold the state for that particular turn. The number of registers required is linearly proportional to the number of turns unrolled. If the total number of registers required exceeds the number available, some of the registers must be spilled to cache and then restored on the next loop turn. The instructions required to be issued to support the spill and reload lengthen the program time, ultimately when the loop unrolls give no speedup. Bottom line - there is an optimum number of times to unroll such loops.

UNROLLING LOOPS CONTAINING EXCEPTION PROCESSING

Now consider program A') for i = 0:1:255, { S(z); if C(z) then T(I(z)) }; where C(z) is some rarely true (say, 1 in 64) exception condition dependent on S(z^'); only, and T(I(z^')) is some lengthy exception processing of, say, 1024 operations. I(z^') is the information computed by S(z^') that is required for the exception processing. For example puφoses let's say that T(I(z^')) adds, on the average, 16 operations to each loop turn in program A), an amount which exceeds the 4 operations in the main body of the loop. Such rare but lengthy exception processing is a common programming problem. How does one handle this without losing the benefits of umolling?

GUARDED INSTRUCTIONS

One approach is through the use of guarded instructions, a facility available on many processors. A guarded instruction specifies a Boolean value as an additional operand with the meaning that the instruction always occupies the expected functional unit, but the retention of the result is suppressed if the guard is false.

In implementing an if-then-else the guard is taken to be the if condition. The instructions of the then clause are guarded by the if condition and the instructions of the else clause are guarded by the negative of the if condition. In any case, both clauses are executed. Only the instances with the guard true are updated by the results of the then clause - only the instances with the guard false are updated by the results of the else clause. All instances execute the instructions of both clauses, enduring this penalty rather than the pipeline delay penalty required by a conditional change in the control flow.

The guarded approach suffers a large penalty if, as in program A'), the guards are preponderantly true and the else clause is large. In that case all instances pay the large else clause penalty even though only a few are affected by it. If one has an operation S to be guarded by a condition C one may program that as gu rd(C, S);

FIRST UNROLLING

Program A') can be unrolled to program D') for n = 0:4:255, { Sι(n); S^n+1); Sι(/.+2); Sι(«+3);

S₂(»); S₂(n+1); S₂(n+2); S₂(«+3);

S₃(»); S₃(«+i); S₃(n+2); S₃(«+3);

S₄(/z); S₄(n+1); S₄(ra+2); S₄(z.+3);

S₅(n); S₅(n+1); S₅(n+2); S₅(n+3); if C(«) then T(I(«)); if C(rc+ ) then T(I(R+E)); if C(n+2) then T(h +2)); if C(n+3) then T(I(«+3));

};

Given the above example parameters no T(I(;z)) will be executed in 77% of the loop turns, one T(I(?z)) will be executed in 21% of the loop turns, and more than one T(I(?z)) in only 2% of the loop turns. Clearly there is little to be gained by interleaving the operations of T(I(n)), T(I(n+l)), T(I(7z+2)) and T(I(«+3)). PILE PROCESSING

A novel alternative is Pz7e Processing. A pile is a sequential memory object typically stored in RAM. Piles are intended to be written sequentially and to be subsequently read sequentially from the beginning. A number of methods are defined on Pz7e objects.

For piles and their methods to be practical in parallel processing environments their implementations are required to be a few instructions of inline (no return branch to a subroutine) code. It is also required that this inline code contain no branch instructions. Such method implementations will be described below. It is the possibility of such implementations that make piles novel and valuable.

1) A pile is created by the CreateJPile(P) method. This allocates storage and initializes the internal state variables.

2) The primary method for writing to a pile is Conditional_Append(pile, condition, record). This method appends the record to the pile pile if and only if condition is true.

3) When a pile has been completely written, it is prepared for reading by the Rewind_Pile(P) method. This adjusts the internal variables so that reading will begin with the first record written.

4) The method EOF(P) produces a boolean value indicating whether or not all of the records of the pile have been read.

5) The method Pile_Read(P, record) reads the next sequential record from the pile P.

6) The method Destroy _Pile(P) destroys the pile P by deallocating all of its state variables.

USING PILES TO SPLIT OFF CONDITIONAL PROCESSING

One can now transform program D') into program E') by means of a pile P. Create _Pile (P); for n = 0:4:255, { Sι(n); S (n+1); Sι(n+2); Sι(w+3); S₂(«); S₂(«+7); S₂(«+2); S₂(n+3); S₃(»); S₃(»+7); S₃(«+2); S₃(«+3); S₄(»); S₄(«+7); S₄(/ +2); S₄(«+3); S₅(»); S₅(n+7); S₅(n+2); S₅(n+3); Conditional _Append(P, C(n), I(n)); Conditional _Append(P , C(n+1), ϊ(n+l)) Conditional _Append(F ', C(n+2), I(n+2)) Conditional _Append(P, C(ra+3), l(n+3))

};

Rewind(P); while not EOF(P) { Pile_Read(P, I);

T(D; };

Destroy JPile (P);

Program E') operates by saving the required information I for the exception computation T on the pile P. Just I records corresponding to the exception condition C(n) are written so that the number (e.g., 16) of I records in P is much less than the number of loop turns (e.g., 256) in the original program A). Afterwards a separate while loop reads through the pile P performing all of the exception computations T. Since P contains records I only for the cases where C(AZ) was true, only those cases are processed.

The second loop is a little trickier than the first loop because the number of turns of the second loop, while 16 on the average in this example, is indeterminate. Therefore a while loop rather than a for loop is required, terminating when the EOF method indicates that all records have been read from the pile.

As asserted above and described below, the Conditional _Append method invocations can be implemented inline and without branches. This means that the first loop is still unrolled in an effective manner, with few unproductive issue opportunities.

UNROLLINGTHE SECONDLOOP

The second loop in program E') is not unrolled and is still inefficient. However, one can transform program E') into program F') by means of four piles Pi, P₂, P₃, P₄. The result is that F') has both loops unrolled with the attendant efficiency improvements.

Create_Pile (P\); Create_Pile (P_%); Create _Pile (P₃); CreateJPile (PA); for n = 0:4:255, { Sι(ra); Sι(zz+7) Sι(«+2); Sι(«+3); S₂(n); S₂(.z+7); S₂(n+2); S₂(n+3) S₃(»); S₃(n+1); S₃(«+2); S₃(»+3) S₄(«); S₄(ra+7); S₄(n+2); S₄(n+3) S₅(n); S₅(n+1); S₅(ιt+2); S₅(n+3) Conditional _Append(P\, C( ), l( )) Conditional_Append(P₂, C(n+I), ϊ(n+l)); Conditional_Append(P₃, C(n+2), l(n+2)); Conditional_Append(P , C(n+3), l(n+3));

};

Rewind(P\); Rewind (P₂); Rewind (P₃); Rewind (P₄); while not aUEOF(Pi) { Pile_Reαd(P_h l.);Pile_Reαd(P₂, J; Pile_Reαd(P₃, l₃);PileJR.eαd(P₄, I₄J; uardO- t EOEfPi S);^); uard(not EOEfPi , S);T(I₂); guard(notEOEfP₃j, S);T(I₃); guard(not EOF(P₄), S);T(l4);

};

Destroy pPϊle (P_\); Destroy JPile (P^; Destroy _Pile (P₃); Destroy _Pile (P₄);

Program F') is program Ε') with the second loop unrolled. The unrolling is accomplished by dividing the single pile of program Ε') into four piles, each of which can be processed independently of the other. Each turn of the second loop in program F') processes one record from each of these four piles. Since each record is processed independently the operations of each T can be interleaved with the operations of the 3 other Ts.

The control of the while loop must be modified to loop until all of the piles have been processed. And the Ts in the while loop body must be guarded since, in general, all of the piles will not be completed on the same loop turn. There will be some inefficiency whenever the number of records in two piles differ greatly from each other, but the probabilities (law of large numbers) are that the piles will contain similar numbers of records.

Of course this piling technique can be applied recursively. If T itself contains a lengthy conditional clause T', one can split T' out of the second loop with some additional piles and unroll the third loop. Many practical applications have several such nested exception clauses.

IMPLEMENTING PILE PROCESSING

The implementations of the pile object and its methods must be kept simple in order to meet the implementation criteria stated above: a) The method implementations, except for CreαteJPile and Destroy _ Pϊle, must be but a few instructions of inline code b) The implementation should contain no branch instructions. At its heart, a pile consists of an allocated linear array in RAM and a pointer, index, whose current value is the location of the next record to read or write. The written size of the array, sz, is a pointer whose value is the maximum value of index during the writing of the pile. The EOF method can be implemented as the inline conditional (sz ≤ index). The pointer base has a value which points to the first location to write in the pile. It is set by the Create JPile method.

The Conditional_Append method copies the record to the pile array beginning at the value of index. Then index is incremented by a computed quantity that is either 0 or the size of the record (szjrecord). Since the parameter condition has a value of 7 for true and 0 for false, index can be computed without a branch as: index = index + condition* szjrecord;

Of course many variations of this computation exist, many of them not involving multiplying given special values of the variables. It may also be computed using a guard as: ga&rd(condition, index = index + sz_record);

Notice that the record is copied to the pile without regard to condition. If the condition is false this record will be overwritten by the very next record; if the condition is true the very next record will be written following the current record. This next record may or may not be itself overwritten by the record after that. As a result it is generally optimal to write as little as possible to the pile even if that means recomputing some (redundant) data when the record is read and processed.

The Rewind method is implemented simply by sz = index; index = base;. This operation records the amount of data written for the EOF method and then resets index to the beginning.

The Pile_Read method copies the next portion of the pile(of length sz_record) to I and increments index. index = index + sz_record;

Destroy JPile deallocates the storage for the Pile.

All of these methods (except Create _Pile and Destroy JPile) can be implemented in a few inline instructions and without branches.

Pile Processing thus enables the unrolling of loops and the consequent improved performance in the presence of branches. This technique particularly enables the parallel execution of lengthy exception clauses. The cost for this is the requirement to write and reread a modest amount of data to/from RAM.

Claims

CLAIMSWhat is claimed is:

1. A method for compressing data, comprising: receiving an inteφolation formula; determining whether at least one data value is required by the inteφolation formula, where the required data value is unavailable; and performing an extrapolation operation to generate the required unavailable data value; wherein the inteφolation formula is utilized for compressing data.

2. The method as recited in claim 1, wherein the inteφolation formula is a component of a wavelet filter.

3. The method as recited in claim 1, and further comprising segmenting a plurality of the data values into a plurality of spans.

4. The method as recited in claim 3, and further comprising reducing an amount of computation involving the inteφolation formula by only utilizing data values within one of the spans.

5. The method as recited in claim 2, and further comprising selectively replacing the wavelet filter with a polyphase filter.

6. The method as recited in claim 1, and further comprising quantizing the data values.

7. The method as recited in claim 6, and further comprising reducing an amount of computation associated with entropy coding by reducing a quantity of the data values.

8. The method as recited in claim 7, wherein the quantity of the data values is reduced during a quantization operation involving the data values.

9. The method as recited in claim 7, wherein the quantity of the data values is reduced using piles.

10. The method as recited in claim 1, and further comprising reducing an amount of computation associated with reconstructing a plurality of the data values into a predetermined data range.

11. The method as recited in claim 10, wherein the computation is reduced by performing only one single clip operation.

12. The method as recited in claim 2, wherein the wavelet filter includes an inteφolation formula including:

13. The method as recited in claim 2, wherein the wavelet filter includes an inteφolation formula including:

Y_2N+1 = {X_2N+_ +l/2)-(X_2N +l/2)

14. The method as recited in claim 2, wherein the wavelet filter includes an inteφolation formula including:

Y +Y

( , +l/2) = ( _2n +l/2) + -^t2n-l ^{~ J}2»÷l

15. The method as recited in claim 2, wherein the wavelet filter includes an inteφolation formula including:

16. The method as recited in claim 2, wherein the wavelet filter includes an inteφolation formula including:

Y +l/2)- ¹1 2nn--\l + ^ ' Y

(X₂„ +l/2) = (7₂„ ^x -*22»+l

17. The method as recited in claim 2, wherein the wavelet filter includes an inteφolation formula including:

18. The method as recited in claim 2, wherein the wavelet filter includes an inteφolation formula including:

19. The method as recited in claim 2, wherein the wavelet filter includes an inteφolation formula including:

{X_2N÷. + l/2) = Y_2N+1 +(X_2N +l/2)

20. A computer program product for compressing data, comprising: computer code for receiving an inteφolation formula; computer code for determining whether at least one data value is required by the inteφolation formula, where the required data value is unavailable; and computer code for performing an extrapolation operation to generate the required unavailable data value; wherein the inteφolation formula is utilized for compressing data.

21. A system for compressing data, comprising: logic for: analyzing a wavelet scheme to determine local derivatives that a wavelet filter approximates; choosing a polynomial order to use for extrapolation based on characteristics of the wavelet filter and a numbers of available samples; deriving extrapolation formulas for each wavelet filter using the chosen polynomial order; and deriving specific edge wavelet cases utlizing the extrapolation formulas with the available samples in each case.

22. A method for compressing data, comprising: receiving data in a single device; encoding the data utilizing the single device to generate first compressed data in a first format; and transcoding the first compressed data utilizing the single device to generate second compressed data in a second format.

23. The method as recited in claim 22, wherein the encoding occurs in real-time.

24. The method as recited in claim 22, wherem the transcoding occurs off-line.

25. The method as recited in claim 22, wherein the first compressed data is transcoded to generate the second compressed data in the second format such that the second compressed data is adapted to match a capacity of a communication network coupled to the single device.

26. The method as recited in claim 22, wherein the encoding is carried out utilizing a first encoder.

27. The method as recited in claim 26, wherein the transcoding is carried out utilizing a decoder and a second encoder.

28. The method as recited in claim 22, wherein the first format includes a wavelet-based format.

29. The method as recited in claim 22, wherein the second format includes a DCT-based format.

30. The method as recited in claim 29, wherein the second format includes an MPEG format.

31. A single device for compressing data, comprising: an encoder embodied on a single device for encoding data to generate first compressed data in a first format; and a transcoder embodied on the same single device as the encoder for transcoding the first compressed data to generate second compressed data in a second format.

32. The single device as recited in claim 31, wherein the encoding occurs in realtime.

33. The single device as recited in claim 31 , wherein the transcoding occurs offline.

34. The single device as recited in claim 31 , wherem the first compressed data is transcoded to generate the second compressed data in the second format such that the second compressed data is adapted to match a capacity of a communication network coupled to the single device.

35. The single device as recited in claim 31, wherein the encoding is carried out utilizing a first encoder.

36. The single device as recited in claim 35, wherein the transcoding is carried out utilizing a decoder and a second encoder.

37. The single device as recited in claim 31, wherein the first format includes a wavelet-based format.

38. The single device as recited in claim 31 , wherein the second format includes a DCT-based format.

39. The single device as recited in claim 38, wherein the second format includes an MPEG format.

40. A method for compressing data utilizing multiple encoders on a single integrated circuit, comprising: receiving data in a single integrated circuit; encoding the data utilizing a plurality of encoders incoφorated on the single integrated circuit.

41. The method as recited in claim 40, wherem the data is encoded utilizing multiple channels on the single integrated circuit.

42. The method as recited in claim 40, wherein the data is encoded into a wavelet-based format.

43. A single integrated circuit for compressing data, comprising: a first encoder embodied on the single integrated circuit for encoding a first set of data; and a second encoder embodied on the same single integrated circuit as the first encoder for encoding a second set of data.

44. The single integrated circuit as recited in claim 43, wherein the data is encoded utilizing multiple channels on the single integrated circuit.

45. The single integrated circuit as recited in claim 43, wherein the data is encoded into a wavelet-based format.

46. A method for compressing data utilizing a single module, comprising: receiving photons utilizing a single module; and outputting compressed data representative of the photons utilizing the single module.

47. The single module as recited in claim 46, wherein the compressed data is encoded into a wavelet-based format.

48. The single module as recited in claim 47, wherein transform operations associated with the encoding are carried out in analog.

49. The single module as recited in claim 46, wherein the single module includes an imager.