US20060235883A1 - Multimedia system for mobile client platforms - Google Patents

Multimedia system for mobile client platforms

Info

Publication number
US20060235883A1
Authority
US
United States
Prior art keywords
multimedia
audio
objects
multimedia object
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/107,952
Inventor
Mark Krebs
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US11/107,952 priority Critical patent/US20060235883A1/en
Publication of US20060235883A1 publication Critical patent/US20060235883A1/en
Priority to US15/016,821 priority patent/US10171873B2/en
Priority to US16/181,285 priority patent/US10771849B2/en
Abandoned legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/462Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
    • H04N21/4621Controlling the complexity of the content stream or additional data, e.g. lowering the resolution or bit-rate of the video stream for a mobile client with a small screen
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/42Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code using table look-up for the coding or decoding process, e.g. using read-only memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41Indexing; Data structures therefor; Storage structures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/40Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/61Network physical structure; Signal processing
    • H04N21/6156Network physical structure; Signal processing specially adapted to the upstream path of the transmission network
    • H04N21/6181Network physical structure; Signal processing specially adapted to the upstream path of the transmission network involving transmission via a mobile phone network

Definitions

  • the invention relates to issues of the wireless Internet, specifically to methods of multimedia transmission and playback for mobile clients.
  • the current methods of distributing multimedia data over the wireless Internet to mobile clients are constrained by existing wireless bandwidth, and the real-time decoding, processing and displaying of multimedia content with limited hardware capabilities. These hardware limitations include slow CPUs, high memory latencies, slow drawing capabilities and the absence of YUV to RGB conversion in the hardware.
  • through embedded media players, several cell phone handsets and handheld computers can play either streamed video or audio.
  • Popular digital video encoding standards for some handsets are H263 and MPEG4.
  • the audio codecs, MP3, AMR and AAC, are also typically supported on some mobile handhelds.
  • Newer video codecs like H264 could be used for video transmission to cell phones, but would require client systems with fast memory access for their motion compensation methods.
  • Embedded streaming media players rely on firmware integration to take advantage of the multitasking capabilities of cell phone handsets. At the time of this writing, most cell phones cannot support multimedia playback because they are only capable of supporting one or a few concurrent processing threads.
  • video is also limited to very low frame rates, and the bandwidth available for streaming in North America is low, varying from 2-3 kbytes/second to ISDN speeds of 64 kbits/second.
  • European countries and Japan currently offer 3G network connection speeds, varying from 64-300 kbits/second, and offer more technologically advanced cell phones with embedded media players that can achieve higher video frame rates. For limited per-usage periods, EV-DO (Evolution Data Optimized) networks can also provide these higher speeds over local CDMA networks.
  • EV-DO Evolution Data Optimized
  • J2ME device independent Java
  • CLDC Connected Limited Device Configuration
  • CDC Connected Device Configuration
  • MIDP Mobile Information Device Profile
  • Java players for cell phones like the Oplayo MVQ player exist, but implementations of true, platform independent, MPEG4 Java decoders that will play video on cell phones are not known.
  • More efficient methods, such as U.S. Pat. No. 5,699,121, do not rely on DCT motion compensation, and propose pattern matching to identify regions in the motion residual signal that have not been accurately reproduced and to correct them using a pattern library. But again, this approach does not use MPEG4 video encoding.
  • Bit streaming is the standard method of transmitting audio or video to cell phones over wireless networks. Streamed bits are buffered and then decoded, or entire video files are downloaded or proportionately cached, or, as in progressive HTTP streaming, downloaded to a point where complete, continuous playback is deemed possible.
  • bit streaming of audio/video content is usually done over a non-reliable transport like UDP and requires a lot of error correction and duplication of content (extra stream correction data). Advances in these transmission methods do propose more sophisticated means of reserved bandwidth, such as AROMA.
  • Alternatives to streaming methods have been proposed for the transmission of video as objects through pre-fetched lists (Waese et al., U.S. Pat. No. 6,286,031).
  • Streaming also requires client processing for significant error correction in video decoding and adaptive encoding for varying channel bitrates. In cellular networks, it also requires the use of the MMS multimedia protocol.
  • the object of the current invention is to solve one or more of the drawbacks in existing methods discussed above, or to provide other improvements to the art.
  • the invention relates to wireless Internet multimedia transmission and wireless clients.
  • the invention provides a method of efficient multimedia object creation.
  • the invention deployment addresses the limitations of large-scale multimedia transmission on cellular networks to wireless clients.
  • the invention relates to methods of decoding of video, sufficiently optimized to be played on a limited wireless client.
  • the invention relates to methods of decoding of audio, sufficiently optimized to be played on a limited wireless client.
  • the limited mobile handset multimedia object player, for both MPEG4 video decoding and AAC audio decoding, is implemented as a device-hardware-independent java (J2ME) applet.
  • the invention pertains to efficiently transmittable multimedia object creation.
  • a server-based transcoder, coupled with a multimedia object creator, inputs a standard analog signal or alternative digital signal like MPEG2, and converts this signal into true MPEG4/AAC multimedia objects.
  • as multimedia objects, they can then be dynamically uploaded to multiple live content hosting web servers, which, through proximate mobile network proxy servers, make live content accessible to mobile clients as consecutive multimedia objects.
  • the multimedia object creator produces discrete multimedia objects from video and audio segments of a continuous stream. If the stream is MPEG4, multimedia objects can also be segments of multiple component video and audio streams. In the case of multiple MPEG4 component streams, per object segmentation and decoding can enable the composition of a single scene from several temporally-independent multimedia objects. This provides the possibility of decoding only a limited number of multimedia objects, and not all objects, to provide an object-based scalability.
  • Multimedia objects are discrete, and also have distinctive Internet addresses, and hence, the mobile host will have the opportunity to interact with any given media sequence on a per object basis.
  • a window of multimedia objects is made available on the host server. This window would be comprised of a number of recently created multimedia objects.
  • a larger multimedia object window can be dynamically created on the host server.
  • each object is numerically ordered.
  • the transport mechanism for the multimedia objects is assumed to be HTTP for the purposes of illustration, however, other protocols which access content through file and directory structures could be used.
  • FTP, IMAP4 and NNTP all have the capability to serve files in a directory structure.
  • the number of multimedia objects that can be buffered in memory is based on the size of the first multimedia object and the amount of free memory available; a minimal heuristic is sketched below.
  • the processing of further multimedia objects in the sequence can be optional and dependent on whether the implementation allows the modification of object parameters between multimedia objects (such as the size of the visual frame or the sample rate of the audio stream).
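  • as a minimal illustration of that sizing rule (not the patent's code; the halving safety margin and names are assumptions):

```java
final class BufferSizing {
    /** How many objects fit in memory, given the first object's size in bytes. */
    static int maxBufferedObjects(long firstObjectBytes) {
        long usable = Runtime.getRuntime().freeMemory() / 2; // headroom for decoder state
        return (int) Math.max(1L, usable / firstObjectBytes);
    }
}
```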
  • the buffering and playback of multimedia objects in a synchronized fashion is critical to fluid playback.
  • the HTTP 1.1 protocol and some implementations of HTTP 1.0 allow the use of a persistent connection over TCP to perform multiple requests.
  • some HTTP 1.1 implementations allow the use of pipelined connections allowing the HTTP client to perform many requests in rapid succession decreasing the latency between the request and reception of each multimedia object. When possible, the invention can take advantage of this capability.
  • the invention pertains to multimedia object deployment to large numbers of widely distributed wireless Internet clients.
  • Media content converted to multimedia objects must be available to many users and the distribution system must be sufficiently robust to allow peaks in demand and have sufficient geographic proximity that network congestion and latency are reduced.
  • the transcoding of input media formats and the creation of multimedia objects is done in real-time and immediately deployed to every content server of the distributed system.
  • These content servers may be at the same location, or they may be geographically placed to support local mobile clients and take advantage of alternative mobile network proxy servers and proxy server object caching.
  • the distribution of multimedia objects to alternative content servers can take place on the wired Internet.
  • the invention provides novel optimizations for digital video decoding. Some of these optimizations can then be used by an expert assessment process, whereby the decoder maintains state information: a list of short-cuts related to perceived frame rate, sorted from those that decrease output quality the least to those that decrease output quality the most but have the greatest impact on decoding speed.
  • the client player-decoder dynamically adjusts how many short-cuts must be taken. These short-cuts are specifically designed to drastically reduce the number of computations necessary at certain critical steps in the video decoding process at the cost of video output quality. This allows the video decoder to scale in complexity based on the processing power of the device being used. It also allows users to experience multimedia playback despite the limitations of the device they may be using.
  • the invention pertains to decoding audio on a limited mobile device.
  • Mobile devices present many challenges for audio playback.
  • Typical mobile processors have integer math only, little or no on-CPU cache, and a limited audio output interface.
  • the present invention takes several approaches to getting maximum audio quality out of these very limited devices, which are applicable to other audio codecs such as AC3, AMR and WMA v9, as well as AAC LC. These include a novel use of Huffman codebooks, a highly optimized IMDCT process, and innovative windowing optimizations.
  • the proposed invention also solves this gapping problem by intelligent placement of the gaps. A frame of low total energy is selected, and the playback is controlled so that the gap will occur during that frame. The low-energy frame may be dropped so that synchronization is not lost.
  • the invention pertains to the implementation of a mobile handset MPEG4 video and AAC audio player that is hardware-independent and operating system independent, and can simply be downloaded prior to media playback on mobile clients that do not have embedded media players.
  • Hardware and operating system independence are characteristics of Java applets, but Java cannot take advantage of hardware capabilities in processing the huge number of calculations and variables required for either MPEG4 decoding or AAC decoding on a limited-processing mobile handset.
  • the required optimizations for Java itself, to permit the playback of AAC and MPEG4 on current mobile client hardware are a source of technological innovation and advance.
  • FIG. 1 is a general diagram for a distributed network system for multimedia-on-demand, utilizing a centralized content server, indexing host, multimedia object creator and transcoder for live broadcast applications or to transcode and create multimedia objects from archived multimedia files, and distributed content servers involving high capacity cellular network proxy servers and mobile clients running downloaded java applets or embedded or downloaded non-java multimedia object players; and
  • FIG. 2 is a flow diagram illustrating a multimedia object identification method by the multimedia object creator of FIG. 1 for mobile clients by the host content server of FIG. 1 ;
  • FIG. 3 illustrates a multimedia object windowing sequence for a live transmission of multimedia objects created by the multimedia object creator of FIG. 1 ;
  • FIG. 3 a illustrates multimedia object creation for single-stream multimedia, audio-only, and multi-stream MPEG4 composite layers by the multimedia object creator of FIG. 1 ;
  • FIG. 4 is a flow diagram illustrating the steps of multimedia object processing by the multimedia object players of FIG. 1 ;
  • FIG. 5 is a diagram illustrating the architecture and processing interaction for the large scale distribution of live and archived multimedia content in the distributed network being managed by the indexing host of FIG. 1 , involving remote transcoding/multimedia object creating servers and a central indexing host server; and
  • FIG. 6 is a general diagram illustrating standard MPEG4 Simple Profile decoding steps which are followed in general for video decoding by the multimedia object players of FIG. 1 ;
  • FIG. 7 is a flow diagram illustrating an optimized Huffman codebook method for digital video decoding used by the multimedia object players of FIG. 1 ;
  • FIG. 8 is a flow diagram illustrating a method of using a texture buffer to process P-frames for digital video decoding.
  • FIG. 9 is a flow diagram showing a method of video decoding performing faster motion compensation without bilinear interpolation when less quality but faster processing is required that is used by the multimedia object players of FIG. 1 ;
  • FIG. 10 is a flow diagram illustrating an optimized digital video decoding method for optimizations in pixel processing and dequantization used by the multimedia object players of FIG. 1 ;
  • FIG. 11 is a flow diagram illustrating a novel use of Chen's algorithm used by the multimedia object players of FIG. 1 ;
  • FIG. 12 is a flow diagram showing a novel handling of YUV to RGB conversion used by the multimedia object players of FIG. 1 ;
  • FIG. 13 is a flow diagram illustrating decoding short cuts for effective video decoding on variable limited mobile client hardware used by the multimedia object players of FIG. 1 ;
  • FIG. 14 is a general diagram illustrating basic steps of the AAC digital audio decoding and other similar audio codec decoding, which are followed in general by the multimedia object players of FIG. 1 ;
  • FIG. 15 is a flow diagram illustrating an optimized Huffman codebook method for digital audio decoding used by the multimedia object players of FIG. 1 ;
  • FIG. 16 is a flow diagram illustrating an optimized digital audio decoding method for optimizations in the IMDCT step used by the multimedia object players of FIG. 1 ;
  • FIG. 17 illustrates simplified input short-cut processes specific to AAC Low Complexity (LC) audio decoding profile used by the multimedia object players of FIG. 1 ;
  • FIG. 18 shows audio decoding using an alternative bit-operation based Taylor computation method used by the multimedia object players of FIG. 1 ;
  • FIG. 19 illustrates further IMDCT short window processing for digital audio decoding for the method used by the multimedia object players of FIG. 1 ;
  • FIG. 20 illustrates low energy gap timing in audio playback for the method of audio decoding used by the multimedia object players of FIG. 1 .
  • FIG. 1 illustrates a centralized content server system 1 , utilizing a transcoder 2 and a multimedia object creator 3 to create multimedia objects from a live broadcast 4 or to transcode and create multimedia objects from archived multimedia files 5 .
  • the central server includes an indexing host system 6 to deploy created multimedia objects to relevant content servers 7 through the wired Internet and to verify all geographically dispersed wireless clients 8 .
  • the system includes the potential use of proxy cellular network http servers 9 , which can cache large numbers of small multimedia objects to support large numbers of concurrent wireless clients 8 running multimedia object java applets 10 or embedded or downloaded non-java multimedia players 11 .
  • FIG. 2 is a flow diagram illustrating the process of multimedia object identification by the multimedia object creator 3 . This process encodes a Supplied Identification into each multimedia object to identify the transport protocol, source host, path and number of objects of a particular multimedia stream.
  • the host directory name of the multimedia objects is formatted to contain the number of video objects located within the directory.
  • A delimiting character is placed between the end of the directory name and the number indicating the multimedia object count. This allows the use of directory names terminating in numbers while indicating an unambiguous multimedia object count, e.g. StarWars_1.mp4, StarWars_2.mp4, etc.
  • Multimedia objects within the directory are named similarly to the directory name. However, instead of the multimedia object count following the delimiting character, a number indicating the multimedia object's position within the sequence of multimedia objects is specified. The following is an example:
  • the first multimedia object in the sequence to be played could have the index 0 of a counting series.
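  • to make the convention concrete, the following hypothetical Java parser (class and field names invented; '_' assumed as the delimiting character) recovers the base name, object count and sequence index from a path such as StarWars_4/StarWars_2.mp4:

```java
final class ObjectId {
    final String baseName;  // e.g. "StarWars"
    final int objectCount;  // total objects, encoded in the directory name
    final int index;        // this object's position in the sequence

    ObjectId(String baseName, int objectCount, int index) {
        this.baseName = baseName;
        this.objectCount = objectCount;
        this.index = index;
    }

    /** Parses e.g. "StarWars_4/StarWars_2.mp4" -> base "StarWars", count 4, index 2. */
    static ObjectId parse(String path) {
        int slash = path.lastIndexOf('/');
        String dir = path.substring(0, slash);    // "StarWars_4"
        String file = path.substring(slash + 1);  // "StarWars_2.mp4"
        String base = dir.substring(0, dir.lastIndexOf('_'));
        int count = Integer.parseInt(dir.substring(dir.lastIndexOf('_') + 1));
        int idx = Integer.parseInt(
                file.substring(file.lastIndexOf('_') + 1, file.lastIndexOf('.')));
        return new ObjectId(base, count, idx);
    }
}
```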
  • a window of multimedia objects is made available on all content servers 1 and 7 . This window would be comprised of a number of recently created multimedia objects transcoded from the live stream.
  • the window of multimedia objects allows clients to begin reception of a multimedia object sequence at an earlier point than the most recently created multimedia object. This mechanism provides an extra degree of forgiveness in high-latency situations, where there may be a delay between the client 8 discovering the most recent multimedia object and the actual request.
  • the window of multimedia objects would shift as more multimedia objects are transmitted from the live source.
  • the multimedia object sequences would begin at 0 and be numbered sequentially.
  • the window size hence permits the removal of earlier objects.
  • a live stream may be comprised of a window of four objects.
  • upon transmission of a fifth video object, the first multimedia object would be deleted, resulting in the following sequence illustrated in FIG. 3 .
  • the wireless client 8 can have the capability to search forward in the multimedia object sequence among the multimedia video objects in the window. This provides additional transmission continuity in cases where it is not possible to maintain sufficient bandwidth for all multimedia objects in the live sequence.
  • a larger multimedia object window can be used.
  • the mobile client 8 may also store more than two multimedia objects in the internal buffer.
  • wireless networks over which the limited devices operate often have a very high latency. This is especially evident when TCP's 3-way handshake must be performed for every connection that is made. It is therefore ideal to use an application protocol that is able to minimize the latency between each request for a multimedia object.
  • HTTP (HyperText Transfer Protocol), FTP, IMAP4 and NNTP all have the capability to serve files in a directory structure.
  • HTTP 1.1 protocol and some implementations of HTTP 1.0 allow the use of a persistent connection over TCP to perform multiple requests.
  • HTTP 1.1 implementations allow the use of pipelined connections, allowing the HTTP client to perform many requests in rapid succession, decreasing the latency between the request and reception of each multimedia object. An illustrative pipelined fetch is sketched below.
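  • the following is an illustrative sketch, not the players' code, of raw HTTP/1.1 pipelining over a single TCP connection; the host name and object paths are made up:

```java
import java.io.*;
import java.net.Socket;

public class PipelinedFetch {
    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket("content.example.com", 80)) {
            Writer out = new OutputStreamWriter(s.getOutputStream(), "US-ASCII");
            // Send several requests back-to-back before reading any response.
            for (int i = 0; i < 4; i++) {
                out.write("GET /StarWars_4/StarWars_" + i + ".mp4 HTTP/1.1\r\n"
                        + "Host: content.example.com\r\n"
                        + "Connection: keep-alive\r\n\r\n");
            }
            out.flush();
            // Responses arrive in request order; a real client would parse each
            // Content-Length header and read exactly that many body bytes.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), "US-ASCII"));
            System.out.println(in.readLine()); // status line of the first response
        }
    }
}
```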
  • the transcoder 2 and multimedia object creator 3 create multimedia objects of an optimal digital encoding, such as MPEG4/AAC, from analog multimedia or an alternative codec stream 12 , such as MPEG1, MPEG2, MOV, AVI, WMV, ASF, and higher encoded MPEG4.
  • the input stream is transcoded into MPEG4 and AAC and then it is split according to a specified interval, such as 10 seconds, into multimedia objects.
  • the video component of the stream is scanned after the specified interval for the next I-frame 13 , where the split is made. Since typically there are no predicted frames in digitized audio 15 , a conditional split is made to correspond to the video segmentation 14 .
  • multiple video and/or audio composite layers can also be split into multimedia objects at I-frames 13 .
  • the transcoded audio source can be analog, or digital codecs such as AMR, MP3, RealAudio or higher-rate encoded AAC, and is then split into specified intervals 13 a.
  • FIG. 4 is a flow diagram illustrating client side processing of multimedia objects.
  • Multimedia object player 10 or 11 processing is initiated by the receipt of the first multimedia object from a content server 1 or 7 .
  • the first multimedia object's Identification is parsed and the total number of multimedia objects stored within the Identification's <path> is determined or, in the case of live transmission applications, the number of multimedia objects in the window.
  • heap memory allocations for the multimedia objects and meta-data can then be determined. These allocations are created of sufficient size that multimedia objects that follow can overwrite older multimedia objects in the same memory allocation without overflowing.
  • This state information provides a mechanism with which the reception and playback of multimedia objects can be synchronized.
  • the multimedia object contains information required to properly configure the audio and video decoders, and this information is passed to the respective decoder.
  • the object player may choose to either delay playback until the multimedia object buffers in memory have filled or may begin playback immediately while requesting the next multimedia object concurrently. This decision can be based on the speed at which the multimedia objects are retrieved versus the playback time of each multimedia object, the latency of requests for multimedia objects or the number of multimedia objects that can be stored in memory at once.
  • the decoder can perform both audio and video decoding in a single thread.
  • the state information described also provides a mechanism which can be used to skip backwards and forwards through the multimedia object sequence. By changing the state information and restarting the retrieval of multimedia objects, the playback of the objects can be repositioned to any multimedia object in the sequence.
  • in FIG. 5 , a large scale “live content” application is illustrated.
  • a central server indexing host 17 manages all of the available content and the content servers 7 through which the content is made available.
  • Remote transcoding and multimedia object creating servers 18 that provide continuously updated content must register this content with the indexing host 17 .
  • the transcoding servers 18 must also keep the central indexing server 17 updated with the latest multimedia object sequence Indices, to allow distributed wireless clients 8 to begin playback of any live content with minimal delay.
  • the URLs of any content servers supporting a particular broadcast would be pre-registered in a table on the indexing server 17 .
  • Content servers 7 accept and store live content being transmitted from transcoding servers 18 . They can also store non-live archive multimedia content, but in a live content type application, they need only cache the most current window of multimedia objects.
  • Content servers 7 are distributed in such a fashion that allows wireless clients 8 a fast and low latency host connection. Content servers 7 could all be connected in a LAN, but for large scale operations, they could have any distribution on the wired Internet.
  • the wireless client 8 can receive the content directly 19 from a content server 7 or indirectly 20 through a cellular network proxy server 9 .
  • the central indexing host 17 accepts requests from clients 8 for multimedia content 21 .
  • the indexing host 17 must reply with the most suitable content server 7 for the client 8 . This can either be done in a round-robin fashion or other factors can be included such as the location of the client 8 relative to available content servers 7 . Other information such as the server load and network congestion of each content server 7 can be taken into account.
  • the central indexing host 17 also authenticates 22 clients 8 as they request available content and specific pieces of content.
  • the authentication process is designed in such a way that the content servers 7 do not need to maintain a list of authorized clients 8 and the content available to them. Instead the indexing host 17 authenticates the client 8 and provides the client 8 with an encrypted string that is eventually decrypted by the content server 7 .
  • This string is the encrypted form of the concatenation of the content name or description, the current UTC date-time, and an interval of time for which the client 8 is authorized to access the multimedia content.
  • the string is designed to allow the client 8 to access and playback multimedia objects received from a designated content server 7 .
  • the indexing host 17 may also provide the client 8 with other information about the multimedia content, along with the encryption string, such as a description of the source, copyrights, and subtitle data-sources.
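  • a minimal sketch of issuing such a token, assuming a shared AES key between the indexing host and content servers; the field layout and cipher choice are assumptions, not taken from the patent:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class AuthToken {
    /** Encrypts "contentName|utcMillis|validSeconds" for the content server. */
    public static byte[] issue(String contentName, long validSeconds,
                               byte[] sharedKey) throws Exception {
        // sharedKey must be a valid AES key length (16, 24 or 32 bytes)
        String plain = contentName + "|" + System.currentTimeMillis()
                     + "|" + validSeconds;
        Cipher c = Cipher.getInstance("AES");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(sharedKey, "AES"));
        return c.doFinal(plain.getBytes("UTF-8"));
    }
}
```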
  • FIG. 6 illustrates the MPEG4 video decoding process as outlined by the MPEG-4 Committee for recovering video object plane (VOP) data from the coded bit stream. These steps of video decoding are followed in general by the video decoding process of multimedia players 10 and 11 .
  • the decoding process is composed of three major sections: shape, motion, and texture decoding.
  • Coded Bit Streams The video stream is parsed and demultiplexed to obtain shape, motion, and texture bit streams. Each stream has a decoding process needed in order to reconstruct the VOPs.
  • Shape Decoding Binary shape decoding is based on a block-based representation.
  • the primary coding methods are block-based context-based binary arithmetic decoding and block-based motion compensation.
  • Variable Length Decoding Shape information, motion vectors, and the quantized DCT coefficients are encoded using variable length codes. Differential DC coefficients in intra macroblocks are encoded as variable length codes. The final DC value is the sum of the differential DC value and the predicted value. The AC coefficients and non-intra block DC coefficients use a different variable length code.
  • Inverse Scan Coefficients are scanned during the encoding process for two reasons: to allocate more bits to high-energy DCT coefficients during quantization and to turn the two-dimensional array (8x8) into a one-dimensional array.
  • the reverse process (i.e. inverse scan) is performed at the decoding side to ensure proper dequantization and to restore the two-dimensional information.
  • Inverse AC and DC Prediction The prediction process is only carried out for intra macro blocks. Previous intra macro blocks are used for forward prediction in order to produce subsequent macro blocks. This optimization process is used to predict both DC and AC coefficients.
  • Inverse Quantization The two-dimensional array of coefficients produced by the inverse scan is inverse quantized to produce the reconstructed DCT coefficients.
  • the process is trivial; it is basically a multiplication by the quantizer step size.
  • a variable quantizer step size can be produced by using a weighted matrix or a scale factor in order to variably allocate bits during the encoding/decoding process.
  • IDCT Inverse DCT
  • Motion compensation is another technique used to achieve high compression.
  • the algorithm used by MPEG-4 is block-based motion compensation to reduce the temporal redundancy between VOPs.
  • Motion compensation in this case is twofold: it is used to predict the current VOP from the previous VOP, and to interpolate prediction from past and future VOPs in order to predict bi-directional VOPs.
  • Motion vectors must be decoded to predict movement of shapes and macroblocks from one VOP to the next. Motion vectors are defined for 8 ⁇ 8 or 16 ⁇ 16 regions of a VOP.
  • VOP Video Object Planes
  • Video streams must begin with a frame that makes no temporal reference to any earlier frame, known as an Intra-Frame (I-Frame).
  • I-Frame Intra-Frame
  • a second type of VOP that allows temporal reference to the previous frame in the stream is known as a Predicted Frame (P-Frame).
  • P-Frames Predicted Frames
  • Macroblocks within P-Frames may contain motion vectors to enable motion correction from the previous frame. These macroblocks often contain pixel residue information which includes corrections to the predicted pixels.
  • Motion compensation must occur for many of the macroblocks within P-Frames and is a critical component of any video decoding mechanism.
  • Motion vectors can be compressed using Huffman codes. These are binary Variable Length Codes (VLC) which represent values occurring with high probability with shorter binary length than values which occur with less probability.
  • VLC binary Variable Length Codes
  • the rapid decoding of VLCs is critical to any decoding application on constrained devices.
  • the video decoding process operating on the multimedia object players 10 and 11 decodes these VLCs in a novel use of Huffman codebooks.
  • the theoretical Huffman codebook process reads bits from the packet bitstream until the accumulated bits match a codeword in the codebook. This process can be thought of as logically walking the Huffman decode tree by reading one bit at a time from the bitstream, and using the bit as a decision Boolean to take the 0 branch (left side) or the 1 branch (right side). Walking this binary tree finishes when the decoding process hits a leaf in the decision tree: the result is the entry number corresponding to that leaf. Reading past the end of a packet propagates the 'end-of-stream' condition to a decoder.
  • the novel approach taken to decode VLCs by the video decoding process operating on the multimedia object players 10 and 11 is illustrated in FIG. 7 , and can be precisely described as follows:
  • the maximum length of a code in Table B-7 is 9.
  • the roof of the logarithm (base 2) of N is found to be 4.
  • the value 4 is then used to identify the array in which N is used as an index to locate the appropriate decoded value. N can also be shifted to remove irrelevant bits, allowing the lookup array to be smaller. A hedged sketch of this lookup follows.
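  • a sketch of that array lookup is shown below; the real tables come from MPEG-4 Table B-7, and the entry packing and per-class shift amounts are invented for illustration:

```java
final class VideoVlc {
    static final int MAX_CODE_LEN = 9;  // longest code in Table B-7
    private final int[][] lookup;       // one small array per MSB position
    private final int[] shift;          // irrelevant low bits to drop per class

    VideoVlc(int[][] lookup, int[] shift) {
        this.lookup = lookup;
        this.shift = shift;
    }

    /** window = the next MAX_CODE_LEN bits of the stream, MSB first. */
    int decode(int window) {
        // The text's "roof of log2(N)" corresponds here to the position of
        // the most significant set bit, which selects the lookup array.
        int msb = 31 - Integer.numberOfLeadingZeros(window | 1);
        int entry = lookup[msb][window >>> shift[msb]]; // drop irrelevant bits
        return entry; // assumed packing: (decodedValue << 4) | bitsToConsume
    }
}
```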
  • FIG. 8 is a flow diagram, describing video decoding process in the multimedia object players 10 and 11 , which illustrates that a texture buffer large enough to contain 4 luminance and 2 chrominance blocks (the dimensions of a macroblock exemplified in the MPEG4 specification) is used to store the predicted pixels from a reference frame.
  • This texture buffer is much smaller than the original video frame and decreases the amount of reading from and writing to non-consecutive bytes within the reference and output video frames. All pixel residues are applied to the texture buffer which is then copied to the output frame.
  • This method of processing P-frames is optimal in situations where the main processing unit has sufficient cache to store the texture information of the entire Macroblock. In cases where the limited device has very little or no on-die cache, it may be preferable to avoid using a macroblock texture buffer. Also, macroblocks with motion vector information contain pixel residue values that are often distributed in a much smaller range of values than the pixels of a texture. In cases where the device is unable to decode the video stream in real-time, a faster but less accurate IDCT algorithm can be used to process these residue values. Furthermore, to minimize the effect of the less accurate IDCT algorithm, this step is taken first on chrominance pixel residues, but can also occur for luminance pixel residues as required.
  • the motion vector information associated with a macroblock often references a point between pixels on the reference VOP. This requires that decoders perform bilinear interpolation between pixels. This is a time consuming process requiring the sampling of four source pixels, four additions and a single divide operation for every output pixel.
  • the video decoding process of the multimedia object players 10 and 11 shown in the flow diagram of FIG. 9 , uses faster motion compensation without bilinear interpolation when less quality but faster processing is required.
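  • for contrast, the two motion-compensation paths can be sketched generically as follows (half-pel case; the array layout is an assumption, not the players' code):

```java
final class MotionComp {
    /** Half-pel sample: four pixel reads, four additions, one divide (shift). */
    static int bilinear(byte[] ref, int stride, int x, int y) {
        int a = ref[y * stride + x] & 0xFF;
        int b = ref[y * stride + x + 1] & 0xFF;
        int c = ref[(y + 1) * stride + x] & 0xFF;
        int d = ref[(y + 1) * stride + x + 1] & 0xFF;
        return (a + b + c + d + 2) >> 2;
    }

    /** Fast path when quality is reduced: skip interpolation entirely. */
    static int nearest(byte[] ref, int stride, int x, int y) {
        return ref[y * stride + x] & 0xFF;
    }
}
```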
  • Digital video codecs define Luminance and Chrominance values within a given subrange of values, MPEG4 uses [0, 255].
  • FIG. 10 is a flow diagram illustrating novel optimization for the dequantization step of digital video decoding in the multimedia object players 10 and 11 .
  • the novel optimization requires a reduction in pixel accuracy but allows values outside the range [0, 255] to be represented in a byte field without an overflow.
  • the range [-128, 383] is sufficient to store nearly all potential resulting Luminance and Chrominance pixel values.
  • values in the range [-128, 383] may be represented in [0, 255] with a decrease in accuracy of 50%, as sketched below.
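  • one plausible reading of this mapping is to halve and bias the value so the range fits an unsigned byte; the exact mapping used by the players is not specified:

```java
final class PixelRange {
    static int pack(int v)   { return (v + 128) >> 1; } // [-128, 383] -> [0, 255]
    static int unpack(int b) { return (b << 1) - 128; } // back, within about 1 unit
}
```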
  • FIG. 11 is a flow diagram illustrating a novel use of Chen's algorithm in the multimedia object players 10 and 11 .
  • Chen's algorithm can be implemented at varying precision, based on the energy or distribution of input DC and AC coefficients. This can result in reduced video output quality, but the effect is mitigated by giving a higher-quality preference to luminance blocks.
  • Reduced color definition is often not as noticeable on constrained devices, and allows the chrominance blocks to be decoded with less precision.
  • the IDCT process can be further optimized by recording which rows of the input matrix to the IDCT are populated with values. This same mechanism can be used to ignore certain input values of insufficient energy to make a very noticeable impact on the output image and further decrease processing time.
  • FIG. 12 is a flow diagram showing video decoding of the YUV to RGB step in the multimedia object players 10 and 11 , as follows:
  • when scaling up, a minimum amount of reading from the source Luminance and Chrominance planes is desired. This is accomplished by iterating through pixels in the source plane. A fixed number of Luminance and Chrominance values in a column are read and the resulting RGB values computed for each pixel position. The pixel values are then copied in a looping fashion first by column, then by row to the output plane. This provides a way to read a single input value which may result in many output values in the output plane when scaling up.
  • when scaling down, a minimum amount of reading from the Luminance and Chrominance planes is likewise desired. This is accomplished by iterating through pixel positions in the output plane and calculating the source pixel in the input plane. This provides a way to read a single input value for every output value and minimizes the number of input-plane reads that are necessary.
  • the YUV to RGB conversion step is such a time-consuming one that methods of improving the speed of computation at the expense of output quality have been implemented. Improvements in speed can be obtained by sampling only a subset of the chrominance pixels, avoiding pixel clipping or calculating the Red and Blue values for only a subset of output pixels. All of these methods are used together to provide several quality levels in the YUV to RGB step; a generic fixed-point conversion is sketched below.
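  • a generic fixed-point YUV to RGB conversion of the kind this step performs (standard ITU-R BT.601 integer coefficients; illustrative, not the players' exact arithmetic):

```java
final class YuvToRgb {
    static int toRgb565(int y, int u, int v) {
        int c = y - 16, d = u - 128, e = v - 128;
        int r = (298 * c + 409 * e + 128) >> 8;
        int g = (298 * c - 100 * d - 208 * e + 128) >> 8;
        int b = (298 * c + 516 * d + 128) >> 8;
        // clipping below is one of the steps a low-quality mode may skip
        r = r < 0 ? 0 : (r > 255 ? 255 : r);
        g = g < 0 ? 0 : (g > 255 ? 255 : g);
        b = b < 0 ? 0 : (b > 255 ? 255 : b);
        return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3); // RGB565 pixel
    }
}
```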
  • FIG. 13 is a flow diagram summarizing the short-cut optimization processing by the video decoding process used in the multimedia objects players 10 and 11 .
  • State information is maintained about the quality levels with which the current video stream is processed.
  • short-cuts in the decoding process must be made to allow the device to maintain synchronicity between the audio and video playback.
  • These short-cuts are specifically designed to drastically reduce the number of computations necessary at certain critical steps in the video decoding process at the cost of video output quality. This mechanism allows video decoding to scale in complexity based on the processing power of the device being used.
  • a final option is to avoid the processing and displaying of some or all P-Frames. This is only an option in video streams where I-Frames occur at regular intervals. Given the wide variety of processing capabilities in limited devices, this implementation strongly suggests the creation of multimedia objects from video streams with transcoder 2 specifying very regular I-Frames so that devices of very limited processing power are able to provide the client 8 with occasional frame changes.
  • the state information is composed of a series of integers corresponding to various steps in the decoding process and defining the quality at which the decoder should perform several steps.
  • the implemented system in the multimedia players 10 and 11 consists of six of these integers:
  • nVideoQuality In addition to the set of integers defining the actual quality at various steps, a single integer representing the current quality level of the overall decoding is used (named nVideoQuality in this instance). Each step quality has a very limited number of possibilities (HIGH, MEDIUM, LOW, etc.); however, nVideoQuality can take on many values. At each value of nVideoQuality, a ruleset defines the quality of each of the above step qualities. At the highest value of nVideoQuality, all step qualities are set to maximum. As nVideoQuality is decreased, the step qualities are incrementally reduced according to the ruleset.
  • Some states of quality levels are less preferable than others. For example, it is not preferable to render many frames at the lowest quality setting of nLumaIDCTQuality; it is instead more preferable to drop frames if there is insufficient processing capability to perform nLumaIDCTQuality at a higher quality.
  • the ruleset is designed to take these possibilities into account. A hypothetical ruleset table is sketched below.
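  • a hypothetical ruleset in the spirit of this description; names and values are invented, and only two ideas come from the text: one nVideoQuality knob drives all step qualities, and frame dropping is preferred over running the luma IDCT at its lowest setting:

```java
final class QualityRules {
    static final int LOW = 0, MEDIUM = 1, HIGH = 2;
    // columns: lumaIDCT, chromaIDCT, interp, yuv2rgb, dequant, frameKeep
    static final int[][] RULESET = {
        { MEDIUM, LOW,    LOW,    LOW,    LOW,    LOW    }, // nVideoQuality 0: drop frames
        { MEDIUM, LOW,    LOW,    LOW,    LOW,    MEDIUM }, // 1  rather than run the luma
        { MEDIUM, LOW,    LOW,    MEDIUM, MEDIUM, HIGH   }, // 2  IDCT at its lowest level
        { HIGH,   LOW,    MEDIUM, MEDIUM, MEDIUM, HIGH   }, // 3
        { HIGH,   MEDIUM, HIGH,   MEDIUM, HIGH,   HIGH   }, // 4
        { HIGH,   HIGH,   HIGH,   HIGH,   HIGH,   HIGH   }, // 5 (maximum)
    };
}
```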
  • FIG. 14 illustrates the general steps of audio decoding followed by the audio decoding process of the multimedia object players 10 and 11 .
  • the first step in AAC audio decoding is to establish frame alignment. This involves finding the AAC sync word and confirming that the AAC frame does not contain any errors, if error checking is enabled in the frame. Once the frame sync is found, the bitstream is de-multiplexed or unpacked. This includes unpacking of the Huffman-decoded and quantized scale factors, the M/S synthesis side information, the intensity stereo side information, the TNS coefficients, the filter bank side information and the gain control words.
  • each coefficient must be inverse quantized by a 4/3 power nonlinearity and then scaled by the quantizer step size.
  • VLCs variable length fields
  • Bits are read off the stream into an integer N.
  • the number of bits read is equivalent to the maximum number of bits in the longest codeword in the codebook.
  • the first binary 0 is then located starting from the highest bit.
  • the left-based index of this first 0 is then used to mask out all the previous 1s, and N is shifted and used as an array index.
  • taking an example codeword from the AAC standard's 2nd Codebook, the ZeroPosition of the integer is found to be 4.
  • the ZeroPosition is then used to mask off the 1 bits previous to it, yielding the integer "010X". This can then be used as an index to an array or be shifted to remove the irrelevant bits, allowing the lookup array to be smaller. A sketch of this decode step follows.
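  • the decode step can be sketched as follows; the per-class table layout is an assumption:

```java
final class AudioVlc {
    /**
     * n       the next maxBits bits of the stream, MSB first
     * tables  one lookup array per count of leading 1 bits
     */
    static int decode(int n, int maxBits, int[][] tables) {
        int zeroPos = maxBits - 1;                   // scan from the MSB...
        while (zeroPos >= 0 && ((n >> zeroPos) & 1) == 1) zeroPos--;
        int leadingOnes = maxBits - 1 - zeroPos;     // ...for the first 0 bit
        int masked = n & ((1 << (zeroPos + 1)) - 1); // mask off the leading 1s
        return tables[leadingOnes][masked];          // small per-class array
    }
}
```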
  • the next standard audio decoding step conditionally dematrixes two channels into a stereo pair.
  • the samples may already represent the left and right signals, in which case no computation is necessary. Otherwise the pair must be de-matrixed via one add and one subtract per sample pair in order to retrieve the proper channel coefficients.
  • Intensity stereo identifies regions in a channel pair that are similar, except for their position.
  • Left-channel intensity regions must have inverse quantization and scaling applied.
  • Right-channel intensity stereo regions use the left-channel inverse quantized and scaled coefficients, which must be re-scaled by the intensity position factors.
  • the next standard step, temporal noise shaping (TNS) has a variable load, depending on the number of spectral coefficients that are filtered.
  • IMDCT Inverse Modified Discrete Cosine Transform
  • FIG. 16 illustrates Intermediate 23 and Final 24 optimizations for the digital audio IMDCT step used by the audio decoding process in the multimedia object players 10 and 11 .
  • the audio decoder of the multimedia object players 10 and 11 combines the use of a specific Inverse Fast-Fourier Transform with Pre- and Post-processing steps.
  • This method produces a simplified IMDCT algorithm with O(n*log(n)) runtime.
  • This method can also incorporate the use of various IFFT algorithms based on the sparseness of input.
  • the IMDCT algorithm accepts an input array X of spectral coefficients in the frequency domain and outputs an array of amplitude values in the time domain twice the size of the input.
  • the implementation of the AAC Low Complexity codec requires that the IMDCT algorithm accept input array lengths of 128 or 1024 Real values and results in an output of 256 or 2048 Real values.
  • N refers to the size of the output (256 or 2048)
  • Im(X) returns the imaginary component of some variable X
  • Re(X) returns the real component.
  • IFFT Inverse Fast Fourier
  • the transformation can be calculated in a fixed point manner.
  • the input is scaled by multiplying the input values by a scale factor, and then the correct output is found by multiplying by the reciprocal of the scale factor. Therefore a scaling operation is applied before and after the IFFT.
  • a scale factor which is a power of two is chosen so that the scaling and re-scaling operations can be accomplished by bit shift operations. Bit shifts are among the fastest operations for CPUs. Helpers in this style are sketched below.
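  • fixed-point helpers in this style might look as follows, with Q = 12 an assumed scale exponent:

```java
final class Fixed {
    static final int Q = 12;                        // scale factor 2^12
    static int scale(int x)   { return x << Q; }    // pre-IFFT scaling
    static int unscale(int x) { return x >> Q; }    // post-IFFT re-scaling
    static int mul(int a, int b) {                  // fixed-point multiply
        return (int) (((long) a * b) >> Q);
    }
}
```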
  • Re-order, pre-scale and twiddle The method loops over the input data, and each datum is complex-multiplied by the twiddle factor, and is then re-scaled by doing a bit shift operation. However, the twiddle factor is already bit-shifted so it can be treated as a fixed-point number, so the scaling operation's bit shift is partially performed by the twiddle factor itself.
  • the relevant twiddle factors are stored in an array table. Once the complex multiplication and scaling are done, the resulting values are stored in the re-ordered location in the IFFT input array.
  • Re-scale, re-order, post-twiddle, window and overlap Combining these four operations into one step replaces four array accesses with one, and some of the multiplications are also combined into single bit shifts.
  • This method loops over the IFFT output array, and performs four operations in each iteration of the loop: the post-twiddle and rescale are combined, because the post-twiddle uses a twiddle factor table which is already bit-shifted. Windowing is combined in this step also, with window values coming from either a table or a fast integer sine calculator. Finally, values are overlapped and stored in the correct location in the output array.
  • FIG. 17 illustrates simplified input shortcut processes that are specific to AAC Low Complexity (LC) profile which are used in the audio decoding process of multimedia players 10 and 11 .
  • LC AAC Low Complexity
  • the Mid/Side, Intensity and Temporal Noise Shaping steps, marked with cross hatches above, are optional.
  • audio decoding can further combine other steps in a novel way. These steps are marked in grey in FIG. 17 . If these other steps are combined, there are no dependencies within a frame until we reach the IFFT step within IMDCT itself. Therefore operations between noiseless decoding and the pre-IFFT operations within IMDCT itself are combined, minimizing memory access.
  • IMDCT has four different window shapes which are common in other digital audio codecs: long only, long start, long stop, and eight short. Of these four window sequences, only one (long only) has non-zero data in the entire output synthesis window. In the case of AAC, however, the output synthesis window always has 2048 output values. The non-zero byte ranges per window shape are: LONG ONLY, 0-2047; LONG START, 0-1599; LONG STOP, 448-2047; EIGHT SHORT, 448-1600.
  • the calculations can be short-cut, avoiding the post-twiddle, windowing, re-ordering, scaling and overlapping steps entirely.
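  • a sketch using the non-zero ranges from the table above to confine synthesis work to each window shape's live region (constant names invented):

```java
final class WindowRanges {
    static final int LONG_ONLY = 0, LONG_START = 1, LONG_STOP = 2, EIGHT_SHORT = 3;
    static final int[][] NON_ZERO = {
        { 0, 2047 },    // LONG ONLY: whole synthesis window is live
        { 0, 1599 },    // LONG START
        { 448, 2047 },  // LONG STOP
        { 448, 1600 },  // EIGHT SHORT
    };

    /** Post-twiddle, window and overlap only the live region. */
    static void synthesize(int shape, int[] out /* 2048 values */) {
        int from = NON_ZERO[shape][0], to = NON_ZERO[shape][1];
        for (int i = from; i <= to; i++) {
            // per-sample post-twiddle / window / overlap work goes here
        }
        // samples outside [from, to] are known zeros and are never touched
    }
}
```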
  • IMDCT permits two different window types: Kaiser-Bessel Derived (KBD) windows and Sine windows.
  • KBD uses a complicated formula which cannot be computed in real-time, and is always used as a table.
  • Sine windows are also used from tables in most implementations.
  • FIG. 18 shows the audio decoder of the multimedia object players 10 and 11 , using a bit-operation based Taylor computation, as follows:
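  • the text does not reproduce the routine itself, so the following is only a plausible fixed-point Taylor evaluation of sin(x), using integer multiplies, divides and shifts, in the spirit of the fast integer sine calculator mentioned earlier:

```java
final class FastSine {
    static final int Q = 12;                     // Q12 fixed point (assumed)
    /** sin(x) for x in Q12 radians, |x| <= pi/2; three Taylor terms. */
    static int sinFixed(int x) {
        long x2 = ((long) x * x) >> Q;           // x^2 in Q12
        long t = x;                              // term: x
        long r = t;
        t = -(t * x2) / (6L << Q);               // term: -x^3/3!
        r += t;
        t = -(t * x2) / (20L << Q);              // term: +x^5/5!
        r += t;
        return (int) r;
    }
}
```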
  • FIG. 19 illustrates further IMDCT short window processing for even greater efficiency by the audio decoding process of multimedia players 10 and 11 .
  • the input of 1024 values is divided into eight short windows of 128 values, and IMDCT, windowing and overlapping is performed on each of these short windows.
  • Each window of 128 values results in a synthesis output window of 256 values. These are then overlapped, resulting in non-zero values in the range of 448 to 1600.
  • the approach taken is to do every one of the IMDCT operations in sequence, rather than in parallel, storing the IMDCT results directly into the regions of the output array which will be zeroed. The output values are then windowed and overlapped. After all the eight short windows are completed, the regions of the synthesis output window which are always zero can be disregarded, due to the window shape shortcut method described above.
  • FIG. 20 illustrates an interleaved detection process in the audio decoding of received multimedia objects 25 by the multimedia object players 10 and 11 .
  • FIG. 20 illustrates the placement of gaps 26 at frames of low total energy 27 as they are detected during audio decoding by the multimedia object players 10 and 11 . Playback is then controlled so that the gap will occur during that frame, which may be dropped, so that synchronization with video is not lost. A selection sketch follows.
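  • the selection of the low-energy frame can be sketched as follows (PCM frame layout assumed):

```java
final class GapPlacer {
    /** Returns the index of the quietest frame, scored by summed |amplitude|. */
    static int quietestFrame(short[][] pcmFrames) {
        int best = 0;
        long bestEnergy = Long.MAX_VALUE;
        for (int f = 0; f < pcmFrames.length; f++) {
            long e = 0;
            for (int i = 0; i < pcmFrames[f].length; i++)
                e += Math.abs(pcmFrames[f][i]);
            if (e < bestEnergy) { bestEnergy = e; best = f; }
        }
        return best; // the inter-object gap (and optional drop) lands on this frame
    }
}
```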
  • the multimedia object player 10 is a downloadable Java (J2ME) applet, and the described audio and video decoder optimizations and strategies, FIG. 7-13 and FIG. 15-20 , as applied to standard MPEG4 and AAC decoding, make it possible for the multimedia object player 10 to play back live music and video at acceptable frame rates (5-15 fps) on limited cell phone handsets.
  • Java cannot take advantage of hardware capabilities in processing the huge number of calculations and variables required for either MPEG4 decoding or AAC decoding.
  • the required optimizations for multimedia player 10 to permit the playback of AAC and MPEG4 on current mobile client hardware, are a source of technological innovation and advance.

Abstract

A method for multimedia playback and transmission to wireless clients is described. A host webserver transcodes a live digital or analog audio-visual or audio broadcast signal and splits the input stream into small multimedia objects of an efficient compression such as MPEG4/AAC, and then immediately deploys the objects to distributed content servers for a geographically dispersed population of wireless clients. A java applet object player, downloaded to wireless clients at the beginning of the multimedia on-demand session, interprets and decodes the multimedia objects as they are received, using multiple levels of optimization. The applet uses novel video and audio decoding optimizations which can be generically applied to many digital video and audio codecs, and specifically decodes Simple Profile MPEG4 video and Low Complexity AAC audio.

Description

    FIELD OF INVENTION
  • The invention relates to issues of the wireless Internet, specifically to methods of multimedia transmission and playback for mobile clients.
  • BACKGROUND OF THE INVENTION
  • The current methods of distributing multimedia data over the wireless Internet to mobile clients are constrained by existing wireless bandwidth, and the real-time decoding, processing and displaying of multimedia content with limited hardware capabilities. These hardware limitations include slow CPUs, high memory latencies, slow drawing capabilities and the absence of YUV to RGB conversion in the hardware.
  • Video and audio playback exist on certain cell phone handsets, but this technology is embedded and takes advantage of low-level hardware processing to enable the performance required for media playback. Through embedded media players, several cell phone handsets and handheld computers can play either streamed video or audio. Popular digital video encoding standards for some handsets are H263 and MPEG4. The audio codecs MP3, AMR and AAC are also typically supported on some mobile handhelds. Newer video codecs like H264 could be used for video transmission to cell phones, but would require client systems with fast memory access for their motion compensation methods.
  • Embedded streaming media players rely on firmware integration to take advantage of the multitasking capabilities of cell phone handsets. At the time of this writing, most cell phones cannot support multimedia playback because they are only capable of supporting one or a few concurrent processing threads. On handsets that have embedded media players, video is also limited to very low frame rates, and the bandwidth available for streaming in North America is low, varying from 2-3 kbytes/second to ISDN speeds of 64 kbits/second. European countries and Japan currently offer 3G network connection speeds, varying from 64-300 kbits/second, and offer more technologically advanced cell phones with embedded media players that can achieve higher video frame rates. For limited per-usage periods, EV-DO (Evolution Data Optimized) networks can also provide these higher speeds over local CDMA networks.
  • Decoders for complex video codecs which support highly scalable MPEG4 video, and more complex, CD-quality music audio codecs like AAC, require multiple parallel processes and fast processing. Mathematical algorithms, as in Fan and Madisetti, designed for the high number of floating point samples of higher-end MPEG4 and AAC, which require approximately 36,000 floating point calculations/second, are intended to run on specialized chips. Even at lower and very low bitrates, where MPEG4 is more efficient than its predecessors, MPEG4 software players depend on PC multitasking or hardware APIs for efficient processing to draw video frames.
  • Currently, device independent Java (J2ME) offers two standard configurations on mobile clients. The Connected Limited Device Configuration (CLDC) is prevalent in the J2ME world, and powers cellular phones, pagers, PDAs, and other handheld devices. A variant of CLDC, Connected Device Configuration (CDC) targets more powerful devices, such as home appliances, set-top boxes, and Internet TVs. The second configuration, Mobile Information Device Profile (MIDP), runs on top of the CLDC, and several profiles run on top of CDC.
  • Java players for cell phones like the Oplayo MVQ player exist, but implementations of true, platform independent, MPEG4 Java decoders that will play video on cell phones are not known. More efficient methods, such as U.S. Pat. No. 5,699,121, do not rely on DCT motion compensation, and propose pattern matching to identify regions in the motion residual signal that have not been accurately reproduced and to correct them using a pattern library. But again, this approach does not use MPEG4 video encoding.
  • Similarly, although Java decoders exist that play MP3 ringtones on cell phones, no Java players are known that will play AAC. In fact, many of the newer IDCT algorithms are targeted more towards customized logic chips that only do IDCT (composed of many simple pipelined instructions as opposed to a few more complex ones).
  • Bit streaming is the standard method of transmitting audio or video to cell phones over wireless networks. Streamed bits are buffered and then decoded; entire video files are downloaded or proportionately cached; or, as in progressive HTTP streaming, files are downloaded to a point where complete, continuous playback is deemed possible. In the case of wireless networks, bit streaming of audio/video content is usually done over a non-reliable transport like UDP and requires a great deal of error correction and duplication of content (extra stream correction data). Advances in these transmission methods do propose more sophisticated means of reserved bandwidth, such as AROMA. Alternatives to streaming methods have been proposed for the transmission of video as objects through pre-fetched lists (Waese et al., U.S. Pat. No. 6,286,031), which are similar to downloading pre-fetched lists of SMIL objects, and instant or scheduled notification file downloading (Stumm, U.S. Pat. No. 5,768,528). However, these do not address specific continuity and deployment issues for wireless multimedia transmission and concurrent playback on limited-tasking cell phone handsets.
  • Streaming also requires client processing for significant error correction in video decoding and adaptive encoding for varying channel bitrates; in cellular networks, it also requires the use of the cellular MMS multimedia protocol.
  • The object of the current invention is to solve one or more of the drawbacks in existing methods discussed above, or to provide other improvements to the art.
  • SUMMARY OF THE INVENTION
  • The invention relates to wireless Internet multimedia transmission and wireless clients. In a first aspect, the invention provides a method of efficient multimedia object creation. In a second aspect, the invention addresses the limitations of large-scale multimedia deployment over cellular networks to wireless clients. In a third aspect, the invention relates to methods of decoding video, sufficiently optimized to be played on a limited wireless client. In a fourth aspect, the invention relates to methods of decoding audio, sufficiently optimized to be played on a limited wireless client. In a fifth aspect, the limited mobile handset multimedia object player, for both MPEG4 video decoding and AAC audio decoding, is implemented as a device-hardware-independent Java (J2ME) applet.
  • In its first aspect, the invention pertains to efficiently transmittable multimedia object creation. A server-based transcoder, coupled with a multimedia object creator, inputs a standard analog signal or alternative digital signal like MPEG2, and converts this signal into true MPEG4/AAC multimedia objects. As multimedia objects they can then be dynamically uploaded to multiple live content hosting web servers, which, through proximate mobile network proxy servers, make live content accessible to mobile clients as consecutive multimedia objects.
  • The multimedia object creator produces discrete multimedia objects from video and audio segments of a continuous stream. If the stream is MPEG4, multimedia objects can also be segments of multiple component video and audio streams. In the case of multiple MPEG4 component streams, per object segmentation and decoding can enable the composition of a single scene from several temporally-independent multimedia objects. This provides the possibility of decoding only a limited number of multimedia objects, and not all objects, to provide an object-based scalability.
  • Multimedia objects are discrete, and also have distinctive Internet addresses; hence, the mobile client will have the opportunity to interact with any given media sequence on a per object basis. In cases where the multimedia object sequence is being transcoded from a live stream, a window of multimedia objects is made available on the host server. This window would be comprised of a number of recently created multimedia objects. To minimize delays that will occur to maintain synchronicity between the client and server, a larger multimedia object window can be dynamically created on the host server.
  • In cases where the media object sequence has been previously transcoded and resides on the host as a non-live source, each object is numerically ordered. The transport mechanism for the multimedia objects is assumed to be HTTP for the purposes of illustration, however, other protocols which access content through file and directory structures could be used. For example FTP, IMAP4 and NNTP all have the capability to serve files in a directory structure.
  • On the client side, the number of multimedia objects that can be buffered in memory is based on the size of the first multimedia object and the amount of free memory available. The processing of further multimedia objects in the sequence can be optional and dependent on whether the implementation allows the modification of object parameters between multimedia objects (such as the size of the visual frame or the sample rate of the audio stream). The buffering and playback of multimedia objects in a synchronized fashion is critical to fluid playback.
  • The wireless networks over which the limited devices operate often have a very high latency. The HTTP 1.1 protocol and some implementations of HTTP 1.0 allow the use of a persistent connection over TCP to perform multiple requests. Furthermore, some HTTP 1.1 implementations allow the use of pipelined connections allowing the HTTP client to perform many requests in rapid succession decreasing the latency between the request and reception of each multimedia object. When possible, the invention can take advantage of this capability.
  • In its second aspect, the invention pertains to multimedia object deployment to large numbers of widely distributed wireless Internet clients. Media content converted to multimedia objects must be available to many users, and the distribution system must be sufficiently robust to accommodate peaks in demand and have sufficient geographic proximity to clients that network congestion and latency are reduced.
  • In this second aspect, and in the case of live content transcoding from a live audio/video stream, the transcoding of input media formats and the creation of multimedia objects is done in real-time and immediately deployed to every content server of the distributed system. These content servers may be at the same location, or they may be geographically placed to support local mobile clients and take advantage of alternative mobile network proxy servers and proxy server object caching. The distribution of multimedia objects to alternative content servers can take place on the wired Internet.
  • In a third aspect, the invention provides novel optimizations for digital video decoding. Some of these optimizations can then be used by an expert assessment process, whereby the decoder maintains a state information list of short-cuts related to perceived frame rate, sorted starting with those that will decrease output quality the least, to those that will decrease output quality the most but have the most impact on decoding speed. The client player-decoder dynamically adjusts how many short-cuts must be taken. These short-cuts are specifically designed to drastically reduce the number of computations necessary at certain critical steps in the video decoding process at the cost of video output quality. This allows the video decoder to scale in complexity based on the processing power of the device being used. It also allows users to experience multimedia playback despite the limitations of the device they may be using.
  • In a fourth aspect, the invention pertains to decoding audio on a limited mobile device. Mobile devices present many challenges for audio playback. Typical mobile processors have integer math only, little or no on-CPU cache, and a limited audio output interface. The present invention takes several approaches to getting maximum audio quality out of these very limited devices, which are applicable to other audio codecs such as AC3, AMR and WMA v9, as well as AAC LC. These include a novel use of Huffman codebooks, a highly optimized IMDCT process, and innovative windowing optimizations.
  • One of the serious limitations of mobile devices is their inability to play continuous sound. There is no way to play a long sound without gaps which occur when switching from one block of sound to the next block of sound. The proposed invention also solves this gapping problem by intelligent placement of the gaps. A frame of low total energy is selected, and the playback is controlled so that the gap will occur during that frame. The low-energy frame may be dropped so that synchronization is not lost.
  • In a fifth aspect, the invention pertains to the implementation of a mobile handset MPEG4 video and AAC audio player that is hardware-independent and operating system independent, and can simply be downloaded prior to media playback on mobile clients that do not have embedded media players. Hardware and operating system independence are characteristics of Java applets, but Java cannot take advantage of hardware capabilities in processing the huge number of calculations and variables required for either MPEG4 decoding or AAC decoding on a limited-processing mobile handset. Hence, the required optimizations for Java itself, to permit the playback of AAC and MPEG4 on current mobile client hardware, are a source of technological innovation and advance.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For a better understanding of the present invention and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the accompanying drawings which show the preferred embodiment of the present invention and in which:
  • FIG. 1 is a general diagram for a distributed network system for multimedia-on-demand, utilizing a centralized content server, indexing host, multimedia object creator and transcoder for live broadcast applications or to transcode and create multimedia objects from archived multimedia files, and distributed content servers involving high capacity cellular network proxy servers and mobile clients running downloaded java applets or embedded or downloaded non-java multimedia object players; and
  • FIG. 2 is a flow diagram illustrating a multimedia object identification method by the multimedia object creator of FIG. 1 for mobile clients by the host content server of FIG. 1; and
  • FIG. 3 illustrates a multimedia object windowing sequence for a live transmission of multimedia objects created by the multimedia object creator of FIG. 1; and
  • FIG. 3 a illustrates multimedia object creation for single-stream multimedia, audio-only input and multi-stream MPEG4 composite layers by the multimedia object creator of FIG. 1; and
  • FIG. 4 is a flow diagram illustrating the steps of multimedia object processing by the multimedia object players of FIG. 1; and
  • FIG. 5 is a diagram illustrating the architecture and processing interaction for the large scale distribution of live and archived multimedia content in the distributed network being managed by the indexing host of FIG. 1, involving remote transcoding/multimedia object creating servers and a central indexing host server; and
  • FIG. 6 is a general diagram illustrating standard MPEG4 Simple Profile decoding steps which are followed in general for video decoding by the multimedia object players of FIG. 1; and
  • FIG. 7 is a flow diagram illustrating an optimized Huffman codebook method for digital video decoding method used by the multimedia object players of FIG. 1; and
  • FIG. 8 is a flow diagram illustrating a method of using a texture buffer to process P-frames for digital video decoding; and
  • FIG. 9 is a flow diagram showing a method of video decoding that performs faster motion compensation without bilinear interpolation when less quality but faster processing is required, as used by the multimedia object players of FIG. 1; and
  • FIG. 10 is a flow diagram illustrating an optimized digital video decoding method for optimizations in pixel processing and dequantization used by the multimedia object players of FIG. 1; and
  • FIG. 11 is a flow diagram illustrating a novel use of Chen's algorithm used by the multimedia object players of FIG. 1; and
  • FIG. 12 is a flow diagram showing a novel handling of YUV to RGB conversion used by the multimedia object players of FIG. 1; and
  • FIG. 13 is a flow diagram illustrating decoding short cuts for effective video decoding on variable limited mobile client hardware used by the multimedia object players of FIG. 1; and
  • FIG. 14 is a general diagram illustrating basic steps of the AAC digital audio decoding and other similar audio codec decoding, which are followed in general by the multimedia object players of FIG. 1; and
  • FIG. 15 is a flow diagram illustrating an optimized Huffman codebook method for digital audio decoding used by the multimedia object players of FIG. 1; and
  • FIG. 16 is a flow diagram illustrating an optimized digital audio decoding method for optimizations in the IMDCT step used by the multimedia object players of FIG. 1; and
  • FIG. 17 illustrates simplified input short-cut processes specific to the AAC Low Complexity (LC) audio decoding profile used by the multimedia object players of FIG. 1; and
  • FIG. 18 shows audio decoding using an alternative bit-operation based Taylor computation method used by the multimedia object players of FIG. 1; and
  • FIG. 19 illustrates further IMDCT short window processing for digital audio decoding for the method used by the multimedia object players of FIG. 1; and
  • FIG. 20 illustrates low energy gap timing in audio playback for the method of audio decoding used by the multimedia object players of FIG. 1.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
  • FIG. 1 illustrates a centralized content server system 1, utilizing a transcoder 2 and a multimedia object creator 3 to create multimedia objects from a live broadcast 4 or to transcode and create multimedia objects from archived multimedia files 5. The central server includes an indexing host system 6 to deploy created multimedia objects to relevant content servers 7 through the wired Internet and to verify all geographically dispersed wireless clients 8. The system includes the potential use of proxy cellular network http servers 9, which can cache large numbers of small multimedia objects to support large numbers of concurrent wireless clients 8 running multimedia object java applets 10 or embedded or downloaded non-java multimedia players 11.
  • FIG. 2 is a flow diagram illustrating the process of multimedia object identification by the multimedia object creator 3. This process assigns a Supplied Identification to each multimedia object to identify the transport protocol, source host, path and number of objects of a particular multimedia stream.
  • The host directory name of the multimedia objects is formatted to contain the number of video objects located within the directory. A delimiting character is placed between the end of the directory name and the number indicating the multimedia object count. This allows the use of directory names terminating in numbers while indicating an unambiguous multimedia object count, e.g. StarWars1.mp4, StarWars2.mp4, etc.
  • Multimedia objects within the directory are named similarly to the directory name. However, instead of the multimedia object count following the delimiting character, a number indicating the multimedia object's position within the sequence of multimedia objects is specified. The following is an example:
  • Supplied Identification for Multimedia Objects:
    <transport>://<host>/<path>/<MOSName><Delim><MOCount>
  • Computed Identification for each multimedia object based on the Supplied Identification:
    <SuppliedID>/<MOSName>.<MOSeqNum><Delim><MOType>
      <transport>: protocol used to transmit multimedia objects
      <host>: host content servers 1 or 7 serving the multimedia objects directly to mobile clients 8
      <path>: path to the multimedia object directory
      <MOSName>: name of the multimedia object sequence (perhaps the name of a broadcast)
      <MOCount>: number of multimedia objects (integer greater than zero)
      <MOSeqNum>: the multimedia object's sequence number (integer greater than zero, less than or equal to MOCount)
      <Delim>: the delimiting character
      <MOType>: encoding type, e.g. mp4
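  • By way of illustration only, the following Java sketch computes each multimedia object's URL from the Supplied Identification fields defined above; the method and parameter names are hypothetical:
    // Build the Computed Identification for object seqNum of a sequence.
    // suppliedId is the directory <transport>://<host>/<path>/<MOSName><Delim><MOCount>.
    String objectUrl(String transport, String host, String path, String mosName,
                     char delim, int moCount, int seqNum, String moType) {
        String suppliedId = transport + "://" + host + "/" + path + "/"
                          + mosName + delim + moCount;
        return suppliedId + "/" + mosName + "." + seqNum + delim + moType;
    }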
  • When multimedia has been transcoded from a non-live source, the first multimedia object in the sequence to be played could have the index 0 of a counting series. In cases where the multimedia object sequence is being transcoded by the multimedia object creator 3 from a live stream, a window of multimedia objects is made available on all content servers 1 and 7. This window would be comprised of a number of recently created multimedia objects transcoded from the live stream.
  • The window of multimedia objects allows clients to begin reception of a multimedia object sequence at an earlier point than the most recently created multimedia object. This mechanism provides an extra degree of forgiveness in high-latency situations, where there may be a delay between the client 8 discovering the most recent multimedia object and the actual request.
  • The window of multimedia objects would shift as more multimedia objects are transmitted from the live source. The multimedia object sequences would begin at 0 and be numbered sequentially. The window size hence permits the removal of earlier objects.
  • For example, a live stream may be comprised of a window of four objects. Upon transmission of a fifth video object, the first multimedia object would be deleted, resulting in the sequence illustrated in FIG. 3.
  • The wireless client 8 can have the capability to search forward in the multimedia object sequence among the multimedia video objects in the window. This provides additional transmission continuity in cases where it is not possible to maintain sufficient bandwidth for all multimedia objects in the live sequence.
  • To reduce delays that will occur to maintain synchronicity between the client 8 and server 1 or 7, a larger multimedia object window can be used. Likewise, the mobile client 8 may also store more than two multimedia objects in the internal buffer. Moreover, wireless networks over which the limited devices operate often have a very high latency. This is especially evident when TCP's 3-way handshake must be performed for every connection that is made. It is therefore ideal to use an application protocol that is able to minimize the latency between each request for a multimedia object.
  • The transport mechanism for multimedia objects is assumed to be HTTP for the purposes of system FIG. 1, however, other protocols which access content through file and directory structures could be used. For example FTP, IMAP4 and NNTP all have the capability to serve files in a directory structure. HTTP 1.1 protocol and some implementations of HTTP 1.0 allow the use of a persistent connection over TCP to perform multiple requests. Furthermore, some HTTP 1.1 implementations allow the use of pipelined connections, allowing the HTTP client to perform many requests in rapid succession decreasing the latency between the request and reception of each multimedia object.
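  • As a hedged illustration of this transport, the following MIDP sketch retrieves one multimedia object by its computed URL over HTTP; where the device's HTTP implementation supports HTTP 1.1 keep-alive, consecutive calls can reuse the underlying TCP connection and avoid a 3-way handshake per object:
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import javax.microedition.io.Connector;
    import javax.microedition.io.HttpConnection;

    public class ObjectFetcher {
        // Retrieve the complete multimedia object addressed by url.
        public static byte[] fetchObject(String url) throws java.io.IOException {
            HttpConnection c = (HttpConnection) Connector.open(url);
            InputStream in = null;
            try {
                c.setRequestMethod(HttpConnection.GET);
                in = c.openInputStream();
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[2048];
                int n;
                while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
                return out.toByteArray();
            } finally {
                if (in != null) in.close();
                c.close();
            }
        }
    }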
  • In FIG. 3 a, the transcoder 2 and multimedia object creator 3 create multimedia objects of an optimal digital encoding, such as MPEG4/AAC, from analog multimedia or an alternative codec stream 12, such as MPEG1, MPEG2, MOV, AVI, WMV, ASF, and higher encoded MPEG4. First, the input stream is transcoded into MPEG4 and AAC, and then it is split according to a specified interval, such as 10 seconds, into multimedia objects. The video component of the stream is scanned after the specified interval for the next I-frame 13, where the split is made. Since typically there are no predicted frames in digitized audio 15, a conditional split is made to correspond to the video segmentation 14. In the case of multiple MPEG4 component streams 16, multiple video and/or audio composite layers can also be split into multimedia objects at I-frames 13. In the case of just audio signal input 12 a, the input audio can be analog, or digital codecs AMR, MP3, RealAudio or higher rate encoded AAC; it is transcoded and then split at specified intervals 13 a.
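  • A simplified sketch of this segmentation, with assumed frame-source helpers, follows; the object creator copies frames until the specified interval has elapsed and then splits at the next I-frame, so that every multimedia object begins with a frame having no temporal dependencies:
    interface FrameSource { boolean hasNext(); Frame peek(); Frame next(); }
    class Frame { boolean isIFrame; long timestampMs; byte[] data; }

    // Collect the frames of one multimedia object from the transcoded stream.
    java.util.ArrayList<Frame> nextObject(FrameSource src, long intervalMs, long startMs) {
        java.util.ArrayList<Frame> frames = new java.util.ArrayList<Frame>();
        while (src.hasNext()) {
            Frame f = src.peek();
            // split at the first I-frame after the target interval has elapsed
            if (f.isIFrame && f.timestampMs - startMs >= intervalMs) break;
            frames.add(src.next());
        }
        return frames;  // the next object will begin at the I-frame left in src
    }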
  • FIG. 4 is a flow diagram illustrating client side processing of multimedia objects. Multimedia object player 10 or 11 processing is initiated by the receipt of the first multimedia object from a content server 1 or 7. The first multimedia object's Identification is parsed and the total number of multimedia objects stored within the Identification's <path> is determined, or, in the case of live transmission applications, the number of multimedia objects in the window. Hence, heap memory allocations for the multimedia objects and meta-data can then be determined. These allocations are created of sufficient size that multimedia objects that follow can overwrite older multimedia objects in the same memory allocation without overflowing.
  • Depending on the amount of heap memory available on the device, several sets of memory allocations can be made to store multiple multimedia objects at once. This constitutes a multimedia object buffer and allows the decoder to be playing one multimedia object while the next has not yet fully completed reception. A device must have enough memory to allow two multimedia objects to be stored in the heap at once, otherwise the behavior is undefined (the decoder may refuse to play the multimedia object sequence, or play it with extended pauses between each multimedia object). Important state information is stored in several variables. This information includes the Integer values:
      • nMObjectPlaying—current Multimedia Object index playing
      • nMObjectRecving—current Multimedia Object index being received
    and the Boolean value(s):
      • bWaitForBuffer—indicates to the Playback component that it should wait until buffering of further multimedia objects is complete
  • This state information provides a mechanism with which the reception and playback of multimedia objects can be synchronized. The multimedia object contains information required to properly configure the audio and video decoders, and this information is passed to the respective decoder.
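  • A minimal sketch of this state information, using the variable names above and a hypothetical guard method, is:
    class PlayerState {
        volatile int nMObjectPlaying;     // index of the multimedia object being played
        volatile int nMObjectRecving;     // index of the multimedia object being received
        volatile boolean bWaitForBuffer;  // playback must wait for further buffering

        // Playback may advance only when a later object has been fully received.
        boolean canPlayNext() {
            return !bWaitForBuffer && nMObjectRecving > nMObjectPlaying;
        }
    }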
  • After configuring the audio and/or video decoding components of the multimedia object player, the object player may choose to either delay playback until the multimedia object buffers in memory have filled or may begin playback immediately while requesting the next multimedia object concurrently. This decision can be based on the speed at which the multimedia objects are retrieved versus the playback time of each multimedia object, the latency of requests for multimedia objects or the number of multimedia objects that can be stored in memory at once.
  • Following the reception and parsing of the first multimedia object, its audio and/or video content is parsed and played back. For the audio and/or video content of the first and every subsequent multimedia object, the approach taken is to decode sufficient audio frames that their total duration is as long as the display time of the associated video frame plus the processing time of the next audio frame. By interleaving the processing between several audio frames and a single video frame, the decoder can perform both audio and video decoding in a single thread.
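  • The following single-threaded sketch illustrates this interleaving; the decoder interfaces are assumptions used only for illustration:
    interface AudioDecoder { long nextFrameCostMs(); long decodeAndQueueFrame(); }
    interface VideoDecoder { boolean hasFrames(); void decodeAndDisplayFrame(); }

    // Decode enough audio to cover one video frame's display time plus the
    // processing time of the next audio frame, then decode one video frame.
    void decodeInterleaved(VideoDecoder video, AudioDecoder audio, long frameMs) {
        while (video.hasFrames()) {
            long queuedMs = 0;
            while (queuedMs < frameMs + audio.nextFrameCostMs()) {
                queuedMs += audio.decodeAndQueueFrame();  // returns queued duration in ms
            }
            video.decodeAndDisplayFrame();
        }
    }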
  • The retrieval and playback of multimedia objects continues until the last multimedia object in the sequence has been completely retrieved and its playback has finished.
  • The state information described also provides a mechanism which can be used to skip backwards and forwards through the multimedia object sequence. By changing the state information and restarting the retrieval of multimedia objects, the playback of the objects can be repositioned to any multimedia object in the sequence.
  • In FIG. 5, a large scale “live content” application is illustrated. A central server indexing host 17 manages all of the available content and the content servers 7 through which the content is made available. Remote transcoding and multimedia object creating servers 18 that provide continuously updated content must register this content with the indexing host 17. The transcoding servers 18 must also keep the central indexing server 17 updated with the latest multimedia object sequence indices, to allow distributed wireless clients 8 to begin playback of any live content with minimal delay.
  • The URLs of any content servers supporting a particular broadcast would be pre-registered in a table on the indexing server 17.
  • Content servers 7 accept and store live content being transmitted from transcoding servers 18. They can also store non-live archive multimedia content, but in a live content type application, they need only cache the most current window of multimedia objects.
  • Content servers 7 are distributed in such a fashion that allows wireless clients 8 a fast and low latency host connection. Content servers 7 could all be connected in a LAN, but for large scale operations, they could have any distribution on the wired Internet. The wireless client 8 can receive the content directly 19 from a content server 7 or indirectly 20 through a cellular network proxy server 9.
  • The central indexing host 17 accepts requests from clients 8 for multimedia content 21. The indexing host 17 must reply with the most suitable content server 7 for the client 8. This can either be done in a round-robin fashion or other factors can be included such as the location of the client 8 relative to available content servers 7. Other information such as the server load and network congestion of each content server 7 can be taken into account.
  • The central indexing host 17 also authenticates 22 clients 8 as they request available content and specific pieces of content. The authentication process is designed in such a way that the content servers 7 do not need to maintain a list of authorized clients 8 and the content available to them. Instead the indexing host 17 authenticates the client 8 and provides the client 8 with an encrypted string that is eventually decrypted by the content server 7. This string is the encrypted form of the concatenation of the content name or description, the current UTC date-time, and an interval of time for which the client 8 is authorized to access the multimedia content. The string is designed to allow the client 8 to access and playback multimedia objects received from a designated content server 7.
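  • By way of illustration, the indexing host's authorization string could be formed and encrypted as in the following server-side sketch; the cipher choice, field separator and key management are assumptions rather than particulars of the invention:
    import javax.crypto.Cipher;
    import javax.crypto.spec.SecretKeySpec;

    public class AuthToken {
        // Concatenate the content name, the current UTC date-time and the
        // authorized interval, then encrypt with a key shared with the
        // content server 7, which later decrypts and validates the string.
        public static byte[] issue(String contentName, long utcMillis,
                                   long intervalMillis, byte[] sharedKey)
                throws Exception {
            String plain = contentName + "|" + utcMillis + "|" + intervalMillis;
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(sharedKey, "AES"));
            return cipher.doFinal(plain.getBytes("UTF-8"));
        }
    }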
  • The indexing host 17 may also provide the client 8 with other information about the multimedia content, along with the encryption string, such as a description of the source, copyrights, and subtitle data-sources.
  • FIG. 6 illustrates the MPEG4 video decoding process as outlined by the MPEG-4 Committee for recovering video object plane (VOP) data from the coded bit stream. These steps of video decoding are followed in general by the video decoding process of multimedia players 10 and 11. The decoding process is composed of three major sections: shape, motion, and texture decoding.
  • Coded Bit Streams: The video stream is parsed and demultiplexed to obtain shape, motion, and texture bit streams. Each stream has a decoding process needed in order to reconstruct the VOPs.
  • Shape Decoding: Binary shape decoding is based on a block-based representation. The primary coding methods are block-based context-based binary arithmetic decoding and block-based motion compensation.
  • Variable Length Decoding: Shape information, motion vectors, and the quantized DCT coefficients are encoded using variable length codes. Differential DC coefficients in intra macroblocks are encoded as variable length codes. The final DC value is the sum of the differential DC value and the predicted value. The AC coefficients and non-intra block DC coefficients use a different variable length code.
  • Inverse Scan: Coefficients are scanned during the encoding process for two reasons: to allocate more bits to high energy DCT coefficients during quantization and to turn the two dimensional array (8×8) into a one dimensional array. The reverse process (i.e. inverse scan) is used on the decoding side to ensure proper dequantization and to restore the two dimensional information. There are three types of scans used: Alternate-Horizontal scan, Alternate-Vertical scan, and the Zigzag scan. The type of scan used during the decoding process will depend on the type of coefficients being decoded.
  • Inverse AC and DC Prediction: The prediction process is only carried out for intra macro blocks. Previous intra macro blocks are used for forward prediction in order to produce subsequent macro blocks. This optimization process is used to predict both DC and AC coefficients.
  • Inverse Quantization: The two-dimensional array of coefficients produced by the inverse scan is inverse quantized to produce the reconstructed DCT coefficients. The process is trivial; it is basically a multiplication by the quantizer step size. A variable quantizer step size can be produced by using a weighted matrix or a scale factor in order to variably allocate bits during the encoding/decoding process.
  • Inverse DCT (IDCT): An inverse DCT is applied in order to recover the VOP from the frequency domain (i.e. DCT coefficients) into the spatial domain (i.e. pixel values). Note that in the texture decoding process, the luminance and chrominance components of the VOP (i.e. Y, Cb, Cr components) are quantized at different rates in order to reach a higher compression rate (which is the powerful aspect of the DCT transform when used in compression).
  • Motion Decoding and Compensation: Motion compensation is another technique used to achieve high compression. The algorithm used by MPEG-4 is block-based motion compensation to reduce the temporal redundancy between VOPs. Motion compensation in this case is twofold: it is used to predict the current VOP from the previous VOP, and to interpolate prediction from past and future VOPs in order to predict bi-directional VOPs. Motion vectors must be decoded to predict movement of shapes and macroblocks from one VOP to the next. Motion vectors are defined for 8×8 or 16×16 regions of a VOP.
  • As exemplified by the MPEG4 Simple Profile decoding diagram of FIG. 6, but common to other video codecs, two types of Video Object Planes (VOPs) are handled in digital video decoding. Video streams must begin with a frame that makes no temporal reference to any earlier frames, known as an Intra-Frame (I-Frame). A second type of VOP, which allows temporal reference to the previous frame in the stream, is known as a Predicted Frame (P-Frame). Macroblocks within P-Frames may contain motion vectors to enable motion correction from the previous frame. These macroblocks often contain pixel residue information which includes corrections to the predicted pixels.
  • Motion compensation must occur for many of the macroblocks within P-Frames and is a critical component of any video decoding mechanism. Motion vectors can be compressed using Huffman codes. These are binary Variable Length Codes (VLCs) which represent values occurring with high probability with shorter binary length than values which occur with less probability. The rapid decoding of VLCs is critical to any decoding application on constrained devices. The video decoding process operating on the multimedia object players 10 and 11 decodes these VLCs in a novel use of Huffman codebooks.
  • The theoretical Huffman codebook process reads bits from the packet bitstream until the accumulated bits match a codeword in the codebook. This process can be thought of as logically walking the Huffman decode tree by reading one bit at a time from the bitstream, and using the bit as a decision Boolean to take the 0 branch (left side) or the 1 branch (right side). Walking this binary tree finishes when the decoding process hits a leaf in the decision tree—the result is the entry number corresponding to that leaf. Reading past the end of a packet propagates the ‘end-of-stream’ condition to a decoder.
  • The novel approach taken to decode VLCs by the video decoding process operating on the multimedia object players 10 and 11, is illustrated in FIG. 7, and can be precisely described as follows:
      • Bits are read off the stream into an integer buffer (N). The number of bits read is equivalent to the length of the longest code in the VLC codebook. The roof of logarithm (base 2) of N is taken. Based on the result, N is shifted and used as an index into an array containing the true value indicated in the codebook and the true length of the code. The number of bits indicated as the true length is then removed from the video stream and processing continues. An example is provided:
        • Table B-7 of the MPEG4 Standard (Conf. [1]) contains the Code/Value(s) pair:
        • Code: 0000 0101, Values: MBType: 2, CBPC: 0b11
  • The maximum length of a code in Table B-7 is 9. The above code would be read off the bit stream as (N:=) 0000 0101X (where X is a ‘Do Not Care’ bit). The roof of the logarithm (base 2) of N is found to be 4. This value is then used to identify the array in which N is used as an index to locate the appropriate decoded value. N can also be shifted to remove irrelevant bits, allowing the lookup array to be smaller.
  • This novel approach provides a very low time complexity and due to the nature of Huffman codes, a great majority of codes can be decoded with the first few tables providing a greater cache hit ratio.
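  • A minimal sketch of this table-driven decode, with an assumed bit-reader helper and table layout, is given below; Integer.numberOfLeadingZeros stands in for any fast highest-set-bit scan:
    interface BitReader { int peekBits(int n); void skipBits(int n); }

    // tables[g] holds packed (value << 8 | trueLength) entries for codes whose
    // leading 1 bit falls in group g; shift[g] drops the trailing "Do Not
    // Care" bits so the remainder indexes a small lookup array.
    int decodeVlc(BitReader br, int[][] tables, int[] shift, int maxLen) {
        int n = br.peekBits(maxLen);                  // longest-code bits, unconsumed
        int g = 32 - Integer.numberOfLeadingZeros(n); // the roof of log2(N)
        int entry = tables[g][n >>> shift[g]];        // single array lookup
        br.skipBits(entry & 0xFF);                    // consume the true code length
        return entry >>> 8;                           // decoded value
    }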
  • Following the reading and processing of motion vectors off the video stream, motion correction must take place. Due to the high latency of memory often used in constrained devices, random memory access and non-contiguous memory access must be minimized. FIG. 8 is a flow diagram, describing the video decoding process in the multimedia object players 10 and 11, which illustrates that a texture buffer large enough to contain 4 luminance and 2 chrominance blocks (the dimensions of a macroblock exemplified in the MPEG4 specification) is used to store the predicted pixels from a reference frame. This texture buffer is much smaller than the original video frame and decreases the amount of reading from and writing to non-consecutive bytes within the reference and output video frames. All pixel residues are applied to the texture buffer, which is then copied to the output frame. This method of processing P-frames is optimal in situations where the main processing unit has sufficient cache to store the texture information of the entire macroblock. In cases where the limited device has very little or no on-die cache, it may be preferable to avoid using a macroblock texture buffer. Also, macroblocks with motion vector information contain pixel residue values that are often distributed in a much smaller range of values than the pixels of a texture. In cases where the device is unable to decode the video stream in real-time, a faster but less accurate IDCT algorithm can be used to process these residue values. Furthermore, to minimize the effect of the less accurate IDCT algorithm, this step is taken first on chrominance pixel residues, but can also occur for luminance pixel residues as required.
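  • A hedged sketch of this texture-buffer motion compensation, shown for a single 8×8 block with simplified plane and residue handling, follows:
    // Gather predicted pixels into a small contiguous texture buffer, apply
    // the pixel residues there, and copy the result to the output frame in a
    // single pass, minimizing non-consecutive reads and writes.
    void motionCompensateBlock(byte[] refPlane, int stride, int srcX, int srcY,
                               int[] residue, byte[] outPlane) {
        byte[] texture = new byte[8 * 8];
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++)
                texture[y * 8 + x] = refPlane[(srcY + y) * stride + (srcX + x)];
        for (int i = 0; i < 64; i++)
            texture[i] = (byte) ((texture[i] & 0xFF) + residue[i]);
        for (int y = 0; y < 8; y++)
            System.arraycopy(texture, y * 8, outPlane, (srcY + y) * stride + srcX, 8);
    }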
  • The motion vector information associated with a macroblock often references a point between pixels on the reference VOP. This requires that decoders perform bilinear interpolation between pixels, a time-consuming process requiring the sampling of four source pixels, four additions and a single divide operation for every output pixel. In addition to various arithmetic optimizations, the video decoding process of the multimedia object players 10 and 11, shown in the flow diagram of FIG. 9, uses faster motion compensation without bilinear interpolation when less quality but faster processing is required. Digital video codecs define Luminance and Chrominance values within a given subrange of values; MPEG4 uses [0, 255]. This allows the decoding software to store the Luminance and Chrominance pixels within a single byte of data with the correct precision. However, during the decoding process, values outside the [0, 255] range are often generated during motion compensation and in the inverse DCT steps. Attempting to store values outside this range results in single-byte overflows, causing graphical errors in the final video output. Clipping these values, and modifications to the dequantization process, can be very time consuming and can result in decreased output correctness.
  • FIG. 10 is a flow diagram illustrating a novel optimization for the dequantization step of digital video decoding in the multimedia object players 10 and 11. The novel optimization requires a reduction in pixel accuracy but allows values outside the range [0, 255] to be represented in a byte field without an overflow. Through analysis of various video samples, it has been found that the range [−128, 383] is sufficient to store nearly all potential resulting Luminance and Chrominance pixel values. By taking the original pixel value, adding 128 to it and dividing the result by two, values in the range [−128, 383] may be represented in the range [0, 255] with a decrease in accuracy of 50%.
  • This decrease in luminance and chrominance accuracy is not a factor on many limited devices, as the RGB color resolution is often in the 4-bit to 18-bit range. As an example, an input pixel (nInputPixel) in the range [−128, 383] is converted into the alternate format for storing in a byte field (nbOutputPixel): byte nbOutputPixel=(nInputPixel+128)/2.
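  • The conversion and its inverse can be expressed as the following pair of helper functions, shown for illustration only:
    // Map a pixel value in [-128, 383] into a single byte (1 bit of precision lost).
    byte packPixel(int nInputPixel) { return (byte) ((nInputPixel + 128) / 2); }
    // Recover the (reduced-precision) pixel value from its byte representation.
    int unpackPixel(byte b) { return ((b & 0xFF) * 2) - 128; }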
  • One of the most processing intensive steps of the decoding process occurs during the IDCT step. The use of an effective integer-based algorithm is an absolute requirement when the decoding occurs on constrained devices. Chen's IDCT algorithm is optimized, but the processing time consumed by the standard Chen implementation is too great for real-time decoding on limited devices. Hence, FIG. 11 is a flow diagram illustrating a novel use of Chen's algorithm in the multimedia object players 10 and 11. Here, several different simplified versions of Chen's algorithm can be implemented, based on the energy or distribution of input DC and AC coefficients. This can result in reduced video output quality, but the effect is mitigated by giving a higher-quality preference to luminance blocks. Reduced color definition is often not as noticeable on constrained devices, and allows the chrominance blocks to be decoded with less precision. The IDCT process can be further optimized by recording which rows of the input matrix to the IDCT are populated with values. This same mechanism can be used to ignore certain input values of insufficient energy to make a very noticeable impact on the output image and further decrease processing time.
  • In a limited device, the memory access required in the YUV to RGB conversion process can be sufficiently long to consume more time than any other step in the video decoding process. The video decoding process in the multimedia object players 10 and 11 uses a further step of scaling to reduce this processing, as the display size is often not the exact size of the video output. The YUV to RGB conversion and scaling steps can be combined into a single step to decrease memory access and increase the speed of video output. Several YUV to RGB functions are available, providing decoding times of varying speeds and quality as well as scaling ranges. FIG. 12 is a flow diagram showing video decoding of the YUV to RGB step in the multimedia object players 10 and 11, as follows (an illustrative sketch of the combined scale-down conversion is given after the enumerated items):
  • 1) Separate YUV to RGB and scaling functions for cases where scaling up is required and where scaling down is required. Distinct optimizations are available for each method and added speed can be attained by separating the functionality between several different functions.
  • 2) When scaling up is required, a minimum amount of reading from the source Luminance and Chrominance planes is desired. This is accomplished by iterating through pixels in the source plane. A fixed number of Luminance and Chrominance values in a column are read and the resulting RGB values computed for each pixel position. The pixel values are then copied in a looping fashion first by column, then by row to the output plane. This provides a way to read a single input value which may result in many output values in the output plane when scaling up.
  • 3) Similarly, when scaling down is required, a minimum amount of reading from the source Luminance and Chrominance planes is desired. This is accomplished by iterating through pixel positions in the output plane and calculating the source pixel in the input plane. This provides a way to read a single input value for every output value and minimizes the number of input-plane reads that are necessary.
  • 4) The YUV to RGB conversion step is such a time consuming one that methods of improving the speed of computation at the expense of output quality have been implemented. Improvements in speed can be obtained by sampling only a subset of the chrominance pixels, avoiding pixel clipping or calculating the Red and Blue values for only a subset of output pixels. All of these methods are used together to provide several quality levels in the YUV to RGB step.
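  • The following sketch illustrates the combined scale-down conversion of item 3; the fixed-point conversion constants follow the common ITU-R BT.601 form, and the 4:2:0 plane layout is an assumption:
    // One output pixel per iteration: each output position is mapped back to a
    // single source pixel, so every input read yields exactly one output value.
    void yuvToRgbScaleDown(byte[] y, byte[] u, byte[] v, int srcW, int srcH,
                           int[] rgb, int dstW, int dstH) {
        for (int oy = 0; oy < dstH; oy++) {
            int sy = oy * srcH / dstH;
            for (int ox = 0; ox < dstW; ox++) {
                int sx = ox * srcW / dstW;
                int Y = y[sy * srcW + sx] & 0xFF;
                int U = (u[(sy / 2) * (srcW / 2) + sx / 2] & 0xFF) - 128;
                int V = (v[(sy / 2) * (srcW / 2) + sx / 2] & 0xFF) - 128;
                int C = Y - 16;
                int r = clip((298 * C + 409 * V + 128) >> 8);
                int g = clip((298 * C - 100 * U - 208 * V + 128) >> 8);
                int b = clip((298 * C + 516 * U + 128) >> 8);
                rgb[oy * dstW + ox] = (r << 16) | (g << 8) | b;
            }
        }
    }
    int clip(int x) { return x < 0 ? 0 : (x > 255 ? 255 : x); }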
  • Hence, FIG. 13 is a flow diagram summarizing the short-cut optimization processing by the video decoding process used in the multimedia object players 10 and 11. State information is maintained about the quality levels with which the current video stream is processed. On very limited devices, short-cuts in the decoding process must be made to allow the device to maintain synchronicity between the audio and video playback. These short-cuts are specifically designed to drastically reduce the number of computations necessary at certain critical steps in the video decoding process at the cost of video output quality. This mechanism allows video decoding to scale in complexity based on the processing power of the device being used.
  • It has been found that three quality levels, tested at each critical step, appear to yield the best results. The highest quality is consistent with the video codec specification and displays a correct image. A medium quality level indicates that certain time consuming short-cuts are made with some impact on image quality. A low quality level indicates that drastic reductions in display quality are made to improve processing time—the output video can be unrecognizable at times, and as a result this level is used only in drastic cases of a sudden drop in processor availability.
  • A final option is to avoid the processing and displaying of some or all P-Frames. This is only an option in video streams where I-Frames occur at regular intervals. Given the wide variety of processing capabilities in limited devices, this implementation strongly suggests the creation of multimedia objects from video streams with transcoder 2 specifying very regular I-Frames, so that devices of very limited processing power are able to provide the client 8 with occasional frame changes.
  • The state information is composed of a series of integers corresponding to various steps in the decoding process, which define the quality at which the decoder should perform several steps. The implemented system in the multimedia players 10 and 11 consists of six of these integers:
      • nYUVtoRGBQuality—Quality of the YUV to RGB conversion process
      • nLumaIDCTQuality—Quality of the Inverse DCT function for Luminance blocks
      • nChromaIDCTQuality—Quality of the Inverse DCT function for Chrominance blocks
      • nLumaMCQuality—Quality of motion compensation for Luminance blocks
      • nChromaMCQuality—Quality of motion compensation for Chrominance blocks
      • nFrameRateQuality—Defines the allowance to drop frames (from a single P-Frame occurring before an I-Frame up to dropping all P-Frames)
  • In addition to the set of integers defining the actual quality at various steps, a single integer representing the current quality level of the overall decoding is used (named nVideoQuality in this instance). Each step quality has a very limited number of possibilities (HIGH, MEDIUM, LOW, etc.); however, nVideoQuality can take on many values. At each value of nVideoQuality, a ruleset defines the quality of each of the above step qualities. At the highest value of nVideoQuality, all step qualities are set to maximum. As nVideoQuality is decreased, the step qualities are incrementally reduced according to the ruleset.
  • Some states of quality levels are less preferable than others. For example, it is not preferable to render many frames at the lowest quality setting of nLumaIDCTQuality—it is instead more preferable to drop frames if there is insufficient processing capability to perform nLumaIDCTQuality at a higher quality. The ruleset is designed to take these possibilities into account.
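  • A ruleset of this kind might be tabulated as in the following sketch; the particular values are illustrative assumptions, chosen to be consistent with the preference for dropping frames before degrading the luminance IDCT:
    class QualityRules {
        static final int LOW = 0, MEDIUM = 1, HIGH = 2;
        // Columns: nYUVtoRGBQuality, nLumaIDCTQuality, nChromaIDCTQuality,
        //          nLumaMCQuality, nChromaMCQuality, nFrameRateQuality
        static final int[][] RULESET = {
            { LOW,    LOW,    LOW,    LOW,    LOW,    LOW    },  // nVideoQuality = 0
            { MEDIUM, MEDIUM, LOW,    MEDIUM, LOW,    LOW    },
            { MEDIUM, HIGH,   MEDIUM, MEDIUM, MEDIUM, MEDIUM },  // frames are dropped
            { HIGH,   HIGH,   MEDIUM, HIGH,   MEDIUM, MEDIUM },  // before luma IDCT
            { HIGH,   HIGH,   HIGH,   HIGH,   HIGH,   HIGH   },  // is degraded
        };
        static int[] stepQualities(int nVideoQuality) { return RULESET[nVideoQuality]; }
    }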
  • FIG. 14 illustrates the general steps of audio decoding followed by the audio decoding process of the multimedia object players 10 and 11.
  • The first step in AAC audio decoding (bit-stream de-multiplexing), which is common to other digital codecs, is to establish frame alignment. This involves finding the AAC sync word and confirming that the AAC frame does not contain any errors, if error checking is enabled in the frame. Once the frame sync is found, the bitstream is de-multiplexed or unpacked. This includes unpacking of the Huffman decoded and quantized scale factors, the M/S synthesis side information, the intensity stereo side information, the TNS coefficients, the filter bank side information and the gain control words.
  • Next the quantized spectral coefficients are Huffman decoded. Each coefficient must be inverse quantized by a 4/3 power nonlinearity and then scaled by the quantizer step size.
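  • For illustration, the inverse quantization and scaling can be written as follows; SF_OFFSET = 100 is the usual AAC scale-factor offset, and a production decoder would replace Math.pow with fixed-point tables:
    // coefficient = sign(q) * |q|^(4/3) * 2^(0.25 * (scaleFactor - 100))
    double inverseQuantize(int q, int scaleFactor) {
        double mag = Math.pow(Math.abs(q), 4.0 / 3.0);
        double gain = Math.pow(2.0, 0.25 * (scaleFactor - 100));
        return (q < 0 ? -mag : mag) * gain;
    }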
  • The Huffman codebooks used to decode digital audio in the multimedia object players 10 and 11 of FIG. 1 are very different from those used for digital video, but they are very similar to Huffman codebooks used in other digital audio codecs. A novel simplification of variable length codes (VLCs), used in audio decoding by the multimedia object players 10 and 11, is illustrated in FIG. 15; it allows the decoding of a single VLC value with a single array lookup. The novel approach taken is as follows:
  • Bits are read off the stream into an integer N. The number of bits read is equivalent to the maximum number of bits in the longest codeword in the codebook. The first binary 0 is then located starting from the highest bit. The left-based index of this first 0 is then used to mask out all the preceding 1s, and N is shifted and used as an array index.
  • For example, the AAC standard's 2nd Codebook contains the Code/Value pair:
  • Code: 11110110, Value: 77
  • The maximum length of a code in the 2nd table is 9, so when read from the BitStream the above code would appear as:
    11110110X (Where X is a “Do Not Care” bit)
  • The ZeroPosition of the above integer is found to be 4. The ZeroPosition is then used to mask off the 1 bits previous to it, yielding the integer “0110X”. This can then be used as an index to an array, or be shifted to remove the irrelevant bits, allowing the lookup array to be smaller.
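  • A minimal sketch of this single-lookup decode follows; the bit-reader helper and the per-zero-position table layout are assumptions made for illustration:
    interface BitReader { int peekBits(int n); void skipBits(int n); }

    int decodeAudioVlc(BitReader br, int[][] lookup, int maxLen) {
        int n = br.peekBits(maxLen);
        // left-based index of the first 0 bit, i.e. the count of leading 1s
        int zeroPos = Integer.numberOfLeadingZeros(~(n << (32 - maxLen)));
        int index = n & ((1 << (maxLen - zeroPos)) - 1); // mask off the leading 1s
        int entry = lookup[zeroPos][index];              // packed (value << 8 | length)
        br.skipBits(entry & 0xFF);
        return entry >>> 8;
    }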
  • The next standard audio decoding step, M/S synthesis, conditionally dematrixes two channels into a stereo pair. The samples may already represent the left and right signals, in which case no computation is necessary. Otherwise the pair must be de-matrixed via one add and one subtract per sample pair in order to retrieve the proper channel coefficients.
  • Intensity stereo identifies regions in a channel pair that are similar, except for their position. Left-channel intensity regions must have inverse quantization and scaling applied. Right-channel intensity stereo regions use the left-channel inverse quantized and scaled coefficients, which must be re-scaled by the intensity position factors. Hence the net complexity of intensity stereo is a savings of one inverse quantization per intensity stereo coded coefficient. The next standard step, temporal noise shaping (TNS), has a variable load, depending on the number of spectral coefficients that are filtered.
  • Finally, the Inverse Modified Discrete Cosine Transform (IMDCT) transforms the spectral coefficients into time-domain samples. For fixed-point implementations it is required that any round-off noise is less than 1/2 LSB after the transform result is rounded to linear 16-bit values. Fixed-point realizations using 24 bit words are sufficient.
  • FIG. 16 illustrates Intermediate 23 and Final 24 optimizations for the digital audio IMDCT step used by the audio decoding process in the multimedia object players 10 and 11. The audio decoder of the multimedia object players 10 and 11 combines the use of a specific Inverse Fast-Fourier Transform with pre- and post-processing steps.
  • This method produces a simplified IMDCT algorithm with O(n*log(n)) runtime. This method can also incorporate the use of various IFFT algorithms based on the sparseness of input.
  • The following steps describe the implementation:
  • [0] The IMDCT algorithm accepts an input array X of spectral coefficients in the frequency domain and outputs an array of amplitude values in the time domain twice the size of the input. The implementation of the AAC Low Complexity codec requires that the IMDCT algorithm accept input array lengths of 128 or 1024 Real values and results in an output of 256 or 2048 Real values. In the following steps, N refers to the size of the output (256 or 2048), Im(X) returns the imaginary component of some variable X and Re(X) returns the real component.
  • [1] The (N/2) input spectral coefficients are converted into complex numbers and stored into an array C of size (N/4). There are many approaches to this step, however, the approach taken in the described implementation pairs coefficients with one coefficient becoming the real component and one becoming the imaginary component of a complex number. The following pseudo code describes this step:
    for (n=0; n<N/4; n++) {
      Re(C[n]) = X[N/2 − 1 − 2*n];
      Im(C[n]) = X[2*n];
    }
  • [2] This result is then multiplied with scaled complex numbers on the unit circle yielding an array of size N/4. This step is described with the following pseudo code:
    for (n=0; n<N/4; n++) {
      Re(Z) = SQRT(2/N) * cos(2*Pi*(n + 1/8) / N);
      Im(Z) = SQRT(2/N) * sin(2*Pi*(n + 1/8) / N);
      tmp = Re(C[n]);                       // preserve Re(C[n]) before it is overwritten
      Re(C[n]) = tmp * Re(Z) − Im(C[n]) * Im(Z);
      Im(C[n]) = tmp * Im(Z) + Im(C[n]) * Re(Z);
    }
  • [3] The resulting array of complex numbers is then passed into an Inverse Fast Fourier Transform (IFFT) algorithm. A fixed-point IFFT algorithm is used to allow processing of the IMDCT on devices which lack floating point capabilities. Most mobile devices do not allow floating point computations, and on those that do, it is usually too slow.
  • Due to the properties of the inverse Fourier transformation, the transformation can be calculated in a fixed-point manner. In a fixed-point transformation, the input is scaled by multiplying the input values by a scale factor, and then the correct output is found by multiplying by the reciprocal of the scale factor. Therefore a scaling operation is applied before and after the IFFT. A scale factor which is a power of two is chosen so that the scaling and re-scaling operations can be accomplished by bit shift operations. Bit shifts are among the fastest operations for CPUs.
  • [4] Following the Inverse FFT step, elements from the complex array C must again be multiplied by complex numbers as in step [2].
  • [5] The values from the resulting complex array C are then stored into an array of Real numbers x of size N. The following pseudo code demonstrates the process:
    for (l=0; l<N/8; l+=2) {
      x[2*l ] = Im(C[N/8+l]);
      x[2*l+1] = Re(−C[N/8−1−l]);
      x[2*l+2] = Im(C[N/8+1+l]);
      x[2*l+3] = Re(−C[N/8−2−l]);
      x[2*l+N/4 ] = Re(C[l]);
      x[2*l+N/4+1] = Im(−C[N/4−1−l]);
      x[2*l+N/4+2] = Re(C[l+1]);
      x[2*l+N/4+3] = Im(−C[N/4−2−l]);
      x[2*l+N/2 ] = Re(C[N/8+l]);
      x[2*l+N/2+1] = Im(−C[N/8−1−l]);
      x[2*l+N/2+2] = Re(C[N/8+1+l]);
      x[2*l+N/2+3] = Im(−C[N/8−2−l]);
      x[2*l+N/2+N/4 ] = Im(−C[l]);
      x[2*l+N/2+N/4+1] = Re(C[N/4−1−l]);
      x[2*l+N/2+N/4+2] = Im(−C[l+1]);
      x[2*l+N/2+N/4+3] = Re(C[N/4−2−l]);
    }
  • As can be seen in FIG. 16, several steps in the IMDCT process can be combined. The goal of combining steps is to reduce the number of memory accesses needed to decode a frame of audio. The flow on the right shows the steps as they occur in the decoder.
  • In summary then, the novel optimization of the IMDCT step in audio decoding shown by FIG. 16 pertains to combining steps on the Final 24 optimization side (an illustrative sketch of the combined first pass follows the enumerated steps):
  • 1. Re-order, pre-scale and twiddle: The method loops over the input data, and each datum is complex-multiplied by the twiddle factor, and is then re-scaled by doing a bit shift operation. However, the twiddle factor is already bit-shifted so it can be treated as a fixed-point number, so the scaling operation's bit shift is partially performed by the twiddle factor itself. The relevant twiddle factors are stored in an array table. Once the complex multiplication and scaling are done, the resulting values are stored in the re-ordered location in the IFFT input array.
  • 2. Perform the fixed-point integer inverse Fourier transform. This transformation is the same as the transformation in the pre-combined flow.
  • 3. Re-scale, re-order, post-twiddle, window and overlap: Combining these four operations into one step replaces four array accesses with one, and some of the multiplications are also combined into single bit shifts. This method loops over the IFFT output array, and performs four operations in each iteration of the loop: the post-twiddle and rescale are combined, because the post-twiddle uses a twiddle factor table which is already bit-shifted. Windowing is combined in this step also, with window values coming from either a table or a fast integer sine calculator. Finally, values are overlapped and stored in the correct location in the output array.
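  • A hedged sketch of the combined first pass (re-order, pre-scale and twiddle) follows; the pre-shifted Q15 twiddle tables and the re-ordering map are assumptions made for illustration:
    // One pass over the spectral input: pair the coefficients, complex-multiply
    // by the pre-shifted fixed-point twiddle factor, and store the result
    // directly in the IFFT's input order, minimizing memory accesses.
    void preTwiddle(int[] spec, int n, int[] twRe, int[] twIm,
                    int[] reorder, int[] fftRe, int[] fftIm) {
        for (int k = 0; k < n / 4; k++) {
            int re = spec[n / 2 - 1 - 2 * k];
            int im = spec[2 * k];
            int j = reorder[k];
            fftRe[j] = (re * twRe[k] - im * twIm[k]) >> 15;  // Q15 fixed point
            fftIm[j] = (re * twIm[k] + im * twRe[k]) >> 15;
        }
    }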
  • FIG. 17 illustrates simplified input shortcut processes that are specific to the AAC Low Complexity (LC) profile, which are used in the audio decoding process of multimedia players 10 and 11. Note that the Mid/Side, Intensity and Temporal Noise Shaping steps, marked with cross hatches, are optional. In cases where these three features are not present, audio decoding can further combine other steps in a novel way. These steps are marked in grey in FIG. 17. If these other steps are combined, there are no dependencies within a frame until the IFFT step within the IMDCT itself is reached. Therefore operations between noiseless decoding and the pre-IFFT operations within the IMDCT itself are combined, minimizing memory access.
  • IMDCT has four different window shapes which are common in other digital audio codecs: long only, long start, long stop, and eight short. Of these four window sequences, only one (long only) has non-zero data in the entire output synthesis window. In the case of AAC, however, the output synthesis window always has 2048 output values.
    Window shape     Non-zero byte range
    LONG ONLY          0-2047
    LONG START         0-1599
    LONG STOP        448-2047
    EIGHT SHORT      448-1600
  • For some window shapes, the calculations can be short-cut, avoiding the post-twiddle, windowing, re-ordering, scaling and overlapping steps entirely.
  • IMDCT permits two different window types: Kaiser-Bessel Derived (KBD) windows and Sine windows. KBD uses a complicated formula which cannot be computed in real-time, and is always used as a table. Sine windows are also used from tables in most implementations.
  • However, on a mobile device, which generally has a very small on-CPU memory cache, frequent accesses to a sine window value table will cause cache misses and degraded performance.
  • As an alternative to using a sine lookup table to compute windowing, FIG. 18 shows the audio decoder of the multimedia object players 10 and 11 using a bit-operation based Taylor computation, as follows:
      • 1. Use trigonometric identities to express the sine calculation in terms of a sine in the range of 0 to π/2. Call the resulting angle X.
      • 2. Calculate X*X. Call this value S.
      • 3. Calculate the result as X*(256−S*(43−(S<<1))).
      • 4. The result produces a window value in the range of 0 to 255, allowing fast windowing without the use of lookup tables.
      • 5. The bit shift operations in Step 3 can be further combined with other fixed-point multiplication steps.
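  • A literal Java transcription of steps 1 to 5 might look as follows; the fixed-point scale of the input angle and the placement of the final normalization shifts are assumptions here, since step 5 folds the remaining bit shifts into later fixed-point multiplications.

    // Hedged sketch of the table-free sine window (steps 1-5 above).
    final class FastSine {
        // x is the reduced angle X from step 1, already mapped into 0..π/2
        // in the decoder's fixed-point scale (an assumption of this sketch).
        static int window(int x) {
            int s = x * x;                          // step 2: S = X*X
            return x * (256 - s * (43 - (s << 1))); // steps 3-4: 0..255 window
        }
    }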
  • FIG. 19 illustrates further IMDCT short window processing for even greater efficiency in the audio decoding process of multimedia players 10 and 11. In a sequence of eight short windows, the input of 1024 values is divided into eight short windows of 128 values, and IMDCT, windowing and overlapping are performed on each of these short windows. Each window of 128 values results in a synthesis output window of 256 values. These are then overlapped, resulting in non-zero values in the range of 448 to 1600.
  • The approach taken is to perform each of the eight IMDCT operations in sequence, rather than in parallel, storing the IMDCT results directly into the regions of the output array that would otherwise be zeroed. The output values are then windowed and overlapped. After all eight short windows are completed, the regions of the synthesis output window that are always zero can be disregarded, due to the window shape shortcut method described above. A sketch follows.
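  • Read as a sketch under stated assumptions: eight 256-value synthesis windows, advanced in 128-sample overlap steps, land in indices 448 through 1600 of the 2048-value output; the per-window IMDCT call, the window table and the fixed-point shift below are illustrative stand-ins.

    // Hedged sketch of sequential short-window processing (FIG. 19).
    final class ShortWindows {
        static final int SHIFT = 8; // fixed-point windowing shift (assumption)

        static void run(int[] input, int[] output, int[] window) {
            for (int w = 0; w < 8; w++) {
                // Hypothetical 128-in / 256-out per-window IMDCT.
                int[] synth = imdct128(input, w * 128);
                int base = 448 + w * 128; // consecutive 128-sample overlap steps
                for (int i = 0; i < 256; i++) {
                    output[base + i] += (synth[i] * window[i]) >> SHIFT;
                }
            }
            // Everything outside indices 448..1600 remains zero, so the
            // window-shape shortcut above can skip it.
        }

        static int[] imdct128(int[] in, int offset) {
            return new int[256]; // stub standing in for the real transform
        }
    }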
  • Finally, FIG. 20 illustrates an interleaved detection process in the audio decoding of received multimedia objects 25 by the multimedia object players 10 and 11: gaps 26 are placed at frames of low total energy 27 as they are detected during audio decoding. Playback is then controlled so that the gap occurs during that frame, which may be dropped, so that synchronization with video is not lost. A sketch of the detection test follows.
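  • A minimal sketch of the low-energy test, with the threshold and frame layout as assumptions:

    // Hedged sketch: flag frames of low total energy 27 as gap 26 candidates.
    final class GapTiming {
        static boolean isLowEnergyFrame(int[] samples, long threshold) {
            long energy = 0;
            for (int s : samples) {
                energy += (long) s * s; // accumulate total frame energy
            }
            return energy < threshold;  // a frame below this may be dropped
        }
    }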
  • In FIG. 1 the multimedia object player 10 is a downloadable Java (J2ME) applet, and the described audio and video decoder optimizations and strategies, FIG. 7-13 and FIG. 15-20, as applied to standard MPEG4 and AAC decoding, make it possible for the multimedia object player 10 to play back live music and video, at acceptable frame rates (5-15 fps), on limited cell phone handsets. Java cannot take advantage of hardware capabilities in processing the huge number of calculations and variables required for either MPEG4 or AAC decoding. Hence, the optimizations required for multimedia player 10 to permit the playback of AAC and MPEG4 on current mobile client hardware are a source of technological innovation and advance.
  • The foregoing, along with the drawings, is intended to illustrate the preferred embodiment of the invention. Those skilled in the art will be able to devise numerous arrangements which, although not explicitly shown or described herein, embody the principles of the invention and are within its spirit and scope as defined by the following claims.

Claims (20)

1. A method of transmitting multimedia to wireless clients, wherein the multimedia transmission method depends on:
the creation of multimedia objects from existing multimedia files or dynamically from live multimedia streams;
a direct request and transmission of just the multimedia objects created from existing multimedia files, or of multimedia objects dynamically created from live multimedia streams, by wireless client-based multimedia object players;
and, a continuous playback of the received multimedia objects by wireless client-based multimedia players that are specifically designed to play continuous sequences of the multimedia objects.
2. The method of claim 1, running on a distributed network system for multimedia-on-demand, utilizing a centralized content server, an indexing host, a multimedia object creator and transcoder for live broadcast applications or to transcode and create multimedia objects from archived multimedia files, and distributed content servers involving high capacity cellular network proxy servers and mobile clients running downloaded Java applets or embedded or downloaded non-Java multimedia object players.
3. The method of claim 1, wherein the transmission of said multimedia objects is by protocols such as:
HTTP, FTP, IMAP4 and NNTP, which have the capability to serve files in a directory structure;
and where HTTP 1.1 is used, allowing pipelined requests over persistent TCP connections, multimedia object players can request many multimedia objects in rapid succession.
4. The method of claim 1, wherein the wireless client-based multimedia players are downloadable Java applet multimedia object players, non-Java multimedia object players, or embedded multimedia object players.
5. A method of creating multimedia objects, where, in the case of a live multimedia stream, the input multimedia stream is first transcoded into an optimal audiovisual format such as MPEG4/AAC and at an optimal encoding rate reflecting available cellular network bandwidth, then dynamically converted into multimedia objects by splitting the encoded stream into specified intervals, and then immediately transmitted to distributed content servers, which transmit the recently created multimedia objects to wireless clients; alternatively, in the case of converting an archived multimedia file, the input multimedia stream is first transcoded into an optimal audiovisual format such as MPEG4/AAC and at an optimal encoding rate reflecting available cellular network bandwidth, and then converted into multimedia objects by splitting the encoded stream into specified intervals.
6. The method of claim 5, wherein the dynamically created multimedia objects are maintained by the content servers serving them, as a window of multimedia objects, during transmission to wireless clients.
7. The method of claim 5, wherein the input multimedia stream is scanned after specified intervals for the next I-frame, and the multimedia segment is split at that next I-frame to create another multimedia object.
8. The method of claim 5, wherein the input multimedia stream can be in analog audiovisual format or a variety of digital audiovisual formats, including MPEG4, MPEG1, MPEG2, MOV, AVI, WMV, ASF, and higher encoded MPEG4, or in audio-only formats, including analog audio, mp3, AMR, Windows Media Audio, RealAudio and higher encoded AAC.
9. The method of claim 6, wherein a window of multimedia objects for live transmission is created, comprising a small series of multimedia objects, which can be incremented and decremented as newly created objects are introduced to the window or transmitted to wireless clients.
10. The method of claim 5, wherein multimedia objects are identified when they are created by the multimedia object creator with an Internet address that includes such information as:
the transport protocol;
the host URL, which may vary if many content servers are involved, as in a live broadcast application, of the transmission server or content server directly serving the wireless client;
the name of the multimedia object sequence or broadcast;
the number of multimedia objects in the sequence;
and, the multimedia object's sequence number.
11. The method of claim 5, whereby multimedia objects are split from multiple MPEG4 composite layer streams by scanning at time intervals and splitting at the next I-frames.
12. The method of claim 5, whereby audio media objects are split from a single audio stream by splitting at set time intervals.
13. A method of wireless client-side processing of multimedia objects by a multimedia object player, wherein:
the identification of the multimedia object is parsed, and the total number of multimedia objects within the identification path is determined, or, for live applications, the number of multimedia objects in the window is determined;
heap memory allocations for said multimedia objects and meta-data are determined;
to create a buffer on the wireless client for more than one multimedia object;
to identify multimedia object playing, multimedia object receiving and multimedia object wait-for states for the multimedia object sequence, and hence to use these states as a mechanism to synchronize the reception and playback of multimedia objects;
to pass this information on to the audio and/or video decoding components of the multimedia player to properly configure them to uniquely process the sequence of multimedia objects.
14. The method of claim 13, whereby, following configuration of audio and/or video decoding components for a specific sequence of multimedia objects, the multimedia object player can delay playback until the multimedia object buffers in the wireless client memory have filled, or can begin playback immediately while requesting the next multimedia object;
and whereby the multimedia object player's decision can be based on the speed at which the multimedia objects are retrieved versus the playback time of each multimedia object, the latency of requests for multimedia objects, or the number of multimedia objects that can be stored in wireless client memory at once.
15. The method of claim 13, wherein, following the parsing of the first multimedia object, the audio and video contents of the first and each subsequent multimedia object in the sequence are decoded and played back; whereby sufficient audio frames are decoded that their total display time is as long as the associated video frame plus the processing time of the next audio frame; and whereby, by interleaving the processing of several audio frames with a single video frame, the multimedia object player can perform audio and video decoding in a single thread.
16. The method of claim 13, whereby state information also provides a mechanism that can be used to skip backwards and forwards through a multimedia object sequence, wherein changing the state information and restarting retrieval of multimedia objects repositions playback from any multimedia object in the sequence; and, where the transmission is a live transmission, state information can reposition playback from any multimedia object within the current window.
17. A method for processing the large-scale distribution of multimedia content in a distributed network managed by an indexing host server, wherein: the indexing host registers all URLs of content servers supporting particular live multimedia object transmissions and archived sequences of multimedia objects;
remote transcoding/multimedia object creating servers provide registered updates of multimedia object sequence indices to the indexing host;
remote transcoding/multimedia object creating servers also register the sequence indices of the most recent windows of live content multimedia objects with the indexing host; wherein content servers accept and store the most current window of live content multimedia objects or the most recent non-live archives of multimedia object sequences;
content servers transmit their multimedia directly to wireless clients, or indirectly through cellular network proxy servers;
and whereby, the indexing host verifies the wireless client; the indexing host accepts requests from wireless clients for multimedia content;
the indexing host determines the most suitable content server for the wireless client;
and, the indexing host provides the wireless client with a decryption string for the requested multimedia content.
18. A method of optimized video decoding in decoding Variable Length Codes (VLCs) in Huffman codebooks which are used to compress Motion Vectors for motion compensation occurring in many macroblocks within P-frames, whereby, bits are read off the main video stream into an integer buffer (N);
the number of bits read is equal to the length of the longest code in the VLC codebook;
the roof of logarithm (base 2) of N is taken;
based on the result, N is shifted and used as an index into an array containing the true value indicated in the codebook and the true length of the code;
the number of bits indicated as the true length is then removed from the video stream and processing continues;
said optimized video decoding using a texture buffer large enough to hold 4 luminance and 2 chrominance blocks (the dimensions of a macroblock as exemplified in the MPEG4 specification) to store predicted pixels from a reference frame;
said texture buffer decreases the amount of reading from and writing to non-consecutive bytes within the reference and output video frames;
all pixel residues are applied to the texture buffer, which is then copied to the output frame; a faster but less accurate IDCT algorithm is used within the process, to process these residue values, if the wireless handset cannot decode the video stream in real time;
furthermore, the effect of the less accurate IDCT algorithm is minimized by using this process first on the chrominance pixel residues;
said optimized video decoding processing faster motion compensation without bilinear interpolation when less quality but faster processing is required;
said optimized digital video decoding performing optimizations in pixel processing and dequantization, whereby 128 is added to the original luminance and chrominance values and the result is divided by 2;
values in the [−128,383] range are then represented in the [0,255] range, decreasing luminance and chrominance accuracy without significantly affecting RGB color resolution in the 4-bit to 18-bit range;
said optimized video decoding processing by optimizing Chen's algorithm, whereby, different simplified versions of Chen's algorithm are used, based on the energy input or distribution of input DC and AC coefficients, whereby, the energy or distribution of DC and AC coefficients is first assessed;
a simplified Chen's algorithm is selected for IDCT processing;
a higher quality preference is given to luminance blocks;
and, the process is further optimized by recording which rows of the input matrix to the IDCT are populated with values;
said optimized video decoding handling YUV to RGB conversion, whereby YUV and RGB scaling functions are separated; when scaling up, pixels are read from the source plane and copied to the output plane;
when scaling down, iteration is performed through pixel positions in the output plane and source pixels are calculated in the input plane; and, sampling is performed on only a subset of chrominance pixels, avoiding pixel clipping or calculating the Red and Blue values for only a subset of output pixels;
said optimized video decoding processing by using short-cuts to permit video decoding to scale in complexity, based on the processing power of the wireless client, whereby, three quality levels are used with high being consistent with a correct image in the digital codec specification;
medium corresponds to some reduction in image quality to reduce processing time;
and low being a drastic reduction in image quality to improve processing time;
wherein a final option is to avoid the processing and display of P-frames when I-frames occur at regular intervals;
said optimized video decoding processing by using short-cuts to permit video decoding to scale in complexity, based on the processing power of the wireless client, where state information defines the quality at which decoding should be performed at several steps of the decoding process;
said state information consisting of six integer step values defining: Quality of the YUV to RGB conversion process;
Quality of the Inverse DCT for luminance blocks;
Quality of the Inverse DCT function for chrominance blocks;
Quality of Motion Compensation for luminance blocks;
Quality of Motion Compensation for chrominance blocks;
and, allowance to drop frames (from a single P-frame occurring before an I-Frame up to dropping all P-Frames);
said state information further including a single integer representing the quality level of the overall encoding, wherein, at each value of overall quality, a ruleset defines quality for each of the step qualities;
and, at the highest overall quality, all step qualities are set to maximum;
and, as overall quality is decreased, step qualities are incrementally reduced according to the ruleset.
19. A method of optimized audio decoding pertaining to a simplification of variable length codes (VLCs) in Huffman codebooks,
wherein, bits are read off the audio stream into an integer N;
the number of bits read is equivalent to the maximum number of bits in the longest codeword in the codebook;
the first binary 0 is then located starting from the highest bit;
the left-based index of this first 0 is then used to remove all the preceding 1s; and, N is shifted and used as an array index;
said optimized audio decoding for optimizations in the IMDCT step, whereby the Inverse Fast Fourier Transform is combined with pre- and post-processing steps to produce a simplified IMDCT algorithm with O(n log(n)) runtime, which can incorporate various IFFT algorithms based on the sparseness of input, and which specifically involves the following combination of steps in a final optimization:
a) Re-order, pre-scale and twiddle, whereby, the method loops over the input data, and each datum is complex-multiplied by the twiddle factor, and is then re-scaled by doing a bit shift operation; and, the twiddle factor is already bit-shifted so it can be treated as a fixed-point number, so the scaling operation's bit shift is partially performed by the twiddle factor itself; and the relevant twiddle factors are stored in an array table; and finally, once the complex multiplication and scaling are done, the resulting values are stored in the re-ordered location in the IFFT input array;
b) Perform the fixed-point integer inverse Fourier transform;
c) Re-scale, re-order, post-twiddle, window and overlap, whereby combining these four operations into one step replaces four array accesses with one, and some multiplications are also combined into single bit shifts; and hence, the method loops over the IFFT output array, and performs four operations in each iteration of the loop: the post-twiddle and rescale are combined; the post-twiddle uses a twiddle factor table which is already bit-shifted;
and, windowing is combined in this step also, with window values coming from either a table or a fast integer sine calculator;
and finally, values are overlapped and stored in the correct location in the output array;
said optimized audio decoding performing simplified input processing specific to the AAC Low Complexity (LC) audio decoding profile, wherein the Mid/Side, Intensity and Temporal Noise Shaping steps are optional;
in cases where these three features are not present, there are no dependencies within a frame until the IFFT step within IMDCT itself; and, operations between noiseless decoding and the pre-IFFT operations within IMDCT itself are combined, minimizing memory access;
said optimized audio decoding using an alternative bit-operation-based Taylor computation, wherein trigonometric identities are used to express the sine calculation in terms of a sine in the range of 0 to PI/2, resulting in angle X;
X is multiplied by X, resulting in S;
perform a bit-shift operation by calculating X*(256−S*(43−(S<<1)));
the result producing a window value in the range of 0 to 255, allowing fast windowing without the use of lookup tables;
and, combining the bit-shift operation with other fixed-point multiplication steps;
said optimized audio decoding using IMDCT short window processing for digital audio decoding, wherein 1024 IMDCT input values are divided into a sequence of 8 short windows;
IMDCT, window and overlap functions are performed on each short window;
each window of 128 values results in a synthesis output window of 256 values;
these output windows are then overlapped, resulting in non-zero values in the range of 448 to 1600.
20. A method of low energy gap timing in audio playback, wherein, an interleaved process in audio decoding detects frames of low energy;
audio playback is controlled so a gap will occur during the detected frames, which may be dropped so that synchronization with video is not lost.
US11/107,952 2005-04-18 2005-04-18 Multimedia system for mobile client platforms Abandoned US20060235883A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US11/107,952 US20060235883A1 (en) 2005-04-18 2005-04-18 Multimedia system for mobile client platforms
US15/016,821 US10171873B2 (en) 2005-04-18 2016-02-05 Multimedia system for mobile client platforms
US16/181,285 US10771849B2 (en) 2005-04-18 2018-11-05 Multimedia system for mobile client platforms

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/016,821 Continuation US10171873B2 (en) 2005-04-18 2016-02-05 Multimedia system for mobile client platforms

Publications (1)

Publication Number Publication Date
US20060235883A1 true US20060235883A1 (en) 2006-10-19

Family

ID=37109796

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/107,952 Abandoned US20060235883A1 (en) 2005-04-18 2005-04-18 Multimedia system for mobile client platforms
US15/016,821 Active 2025-06-25 US10171873B2 (en) 2005-04-18 2016-02-05 Multimedia system for mobile client platforms
US16/181,285 Active 2025-04-26 US10771849B2 (en) 2005-04-18 2018-11-05 Multimedia system for mobile client platforms


Country Status (1)

Country Link
US (3) US20060235883A1 (en)


Also Published As

Publication number Publication date
US10171873B2 (en) 2019-01-01
US20190075361A1 (en) 2019-03-07
US10771849B2 (en) 2020-09-08
US20160198226A1 (en) 2016-07-07


Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION