WO2005117431A1 - Method for synchronising video and audio data

Method for synchronising video and audio data

Info

Publication number
WO2005117431A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
stream
audio
frame
time
Prior art date
Application number
PCT/AU2005/000747
Other languages
French (fr)
Inventor
Martin Samuel Lipka
Original Assignee
Vividas Technologies Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2004902811A external-priority patent/AU2004902811A0/en
Application filed by Vividas Technologies Pty Ltd filed Critical Vividas Technologies Pty Ltd
Publication of WO2005117431A1 publication Critical patent/WO2005117431A1/en

Classifications

    • H04N21/2368 Multiplexing of audio and video streams
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on the same device
    • H04N21/4143 Specialised client platforms embedded in a Personal Computer [PC]
    • H04N21/4305 Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H04N21/4341 Demultiplexing of audio and video streams
    • H04N21/426 Internal components of the client; Characteristics thereof
    • H04N5/04 Synchronising (details of television systems)
    • H04N5/602 Receiver circuitry for the sound signals, for digital sound signals
    • H04N9/8042 Transformation of the colour television signal for recording involving pulse code modulation of the picture signal components, with data reduction


Abstract

The present invention involves a method and computer software product for playing a multimedia digital data stream comprising audio data and video data, the latter displayed to a user in a sequence of frames, in order to provide synchronisation between the streams. The method comprises the steps of calculating the audio time in accordance with the time elapsed since the start of the audio data stream, determining at a certain point in time the offset of the video stream from the audio stream, adjusting, if an offset is detected, the frame delivery rate by a prescribed amount, and repeating the above steps at successive points in time, to constantly adjust the frame delivery rate by no more than a maximum amount for each successive frame, so as to constantly trim the video stream display to enhance synchronisation with the audio stream.

Description

Method for synchronising video and audio data
Field of the invention
The present invention concerns a method for synchronising audio and video media streams, in particular to provide to a user the experience of seamless audio playback and smoothly synchronised video playback.
In broad terms, the invention relates to the field of data processing, for processing a stream of data including audio and video data comprised in a sequence of frames. A new architecture and method of operation is described below.
Background of the invention
In order to preserve synchronisation between audio and video data, it is necessary to adjust the transfer rate of the stream of data, so that a specified video presentation time is synchronised with a reference time, such as the correct moment in time of the associated audio stream. The data stream is organised in frames of data fed through a processing device, and a processing unit within the processing device is provided with means for determining the synchronisation.
The MPEG standard, from the Moving Picture Experts Group (MPEG), is a well-established standard for audio and video compression and decompression algorithms, for use in the digital transmission and receipt of audio and video broadcasts. It provides for the efficient compression of data according to an established psychoacoustic model, to enable real-time transmission, decompression and broadcast of high-quality sound and video images. Other standards have also been established for the encoding and decoding of audio and video data transmitted in digital format, such as data for digital television systems.
Compression standards are based on psycho-acoustics of human perception. Generally, video and audio need to match to an accuracy of not much worse than 1/20 of a second in order to be acceptable for the viewer. Accuracy worse than 1/10 of a second is usually noticeable by the viewer, and accuracy of worse than 1/5 of a second is almost always noticeable.
Maintaining synchronisation between video and audio data is a straightforward matter if the streams are integrated and played using a single video/audio source. This is not the case for digital video, as the audio data and the video data are separated and independently decoded, processed, and played. Furthermore, computer users may need to view digital video while performing some other task or function within the computer, such as sending or receiving information over a computer network. This is quite possible in a multitasking computing environment, but can introduce significant multimedia synchronisation problems between the audio and the video data.
The use of compression techniques such as MPEG requires the multimedia data to be decoded before it can be played, which is often a very computer-intensive task, particularly with respect to the video data. In addition, competing processes may steal away processing cycles of the central processor, which dynamically affects the apparent processing power of the machine. This has the result that the ability to read, decode, process, and play the multimedia data will vary during processing, which can affect the ability to synchronously present the multimedia data to the user. The prior art has developed a number of ways to tackle this problem. One simple solution is to alter the speed of the audio data to match that of the video data. However, audio hardware does not generally support simple alterations in the audio rate, and in any case varying the audio rate produces a result generally unpleasant to the viewer, such as wavering alterations in pitch, deterioration in speech, etc. For this reason, the audio data is generally taken as providing the standard of player time, and the video is made to keep pace with it.
A further approach is simply to increase the performance level of the hardware, to ensure that the intensive computing requirements are met, and synchronisation of the audio and video can therefore be maintained. However, in applications of multimedia streaming to client browsers, the system has no control over the processing power (or over the simultaneous competing needs) of individual machines. It is therefore important that the synchronisation processes are as performance-tolerant as possible.
Other solutions of the prior art have included use of inferior decoding methods, and the dropping of frames of video data to maintain synchronisation with the audio data. However, in terms of viewer experience, these techniques are very much compromises. Using an inferior decoding method typically results in a blurred or blocky image, whilst merely dropping frames produces a result that is typically jerky in appearance.
United States Patent No. 6,310,652 to Li et al. discusses a synchronisation method in which a 'presentation time' of data frames is continuously compared with a 'reference time' calculated by the playing device. Subsequent frames, or portions thereof, are then either dropped or repeated depending on whether the presentation time is earlier or later than the calculated reference time. This solution is less than ideal, in that it not only requires specialised hardware to calculate the reference time, but also involves dropping or repeating both audio and video frames, resulting in an unsatisfactory user experience.
United States Patent No. 6,272,776 to Griffiths discusses playing the video data ahead of the corresponding audio data in order to maintain synchronisation. The 'initial due time' of the video data is first determined, which is typically the time-stamped initial start time for the video and audio data indicating when the video and audio data should be played. An 'offset time' is then applied to the video due time, which adjusts when the video data should be played relative to the corresponding audio data and produces an adjusted video due time earlier than the initial video due time. The particular value of the offset - and hence the amount of time by which the video data is played ahead of the audio data - may be varied depending on how early or late a frame of video data is relative to the corresponding audio data. Variations in the offset may also be made to account for an increase in available processing power, which allows a smaller offset to be applied. The method is said to be advantageous in that it allows video to be played ahead of the audio in order to 'build in a margin' for any future late frames while degrading the video as little as possible.
However, irrespective of whether an offset between the video and audio data is applied, the method attempts to jump to the exact point of synchronisation between the audio and video data upon each detection of an early or late video frame. Typically, this results in a blurred or jerky image, in a similar manner to when video frames are dropped or paused, in order to achieve synchronisation.
It is also important that sufficient processor time is devoted to the audio decode and play process to avoid intrusive and undesirable breaks (pops and silences) in the sound stream.
There therefore remains a need for a system for maintaining or improving the synchronisation between audio and video data which degrades the presented video as little as possible, which avoids breaks in the audio, which minimises the need for dropped video frames, and which adapts to the apparent processing power of the system without producing jerky video.
Summary of the invention
In accordance with the invention, there is provided a method for playing a multimedia digital data stream comprising audio data and video data, the latter displayed to a user in a sequence of frames, in order to provide synchronisation between the streams, comprising the steps of: calculating the audio time in accordance with the time elapsed since the start of the audio data stream; determining at a certain point in time the offset of the video stream from the audio stream; adjusting, if an offset is detected, the frame delivery rate by a prescribed amount; repeating the above steps at successive points in time, to constantly adjust the frame delivery rate by no more than a prescribed amount for each successive frame, so as to constantly trim the video stream display to enhance synchronisation with the audio stream. In summary, then, the audio is synchronised to the system clock, and that synchronisation produces an offset. This offset is relative to the point in time when the media started playing. It should be noted that there is considerable jitter in the audio synchronisation offset, dependent on the hardware, the load on the computer at the time, and many other factors. Video frames are displayed at a certain, adjustable rate, that rate being trimmed (adjusted by a small amount) in accordance with the apparent time difference between the audio and video. This might be termed 'loose synchronisation', as opposed to so-called 'dead beat' control: instead of being brought abruptly into accurate synchronisation, the video frame delivery is slowly slewed in accordance with the detected time difference between the audio and video times. The result is smooth video delivery, with synchronisation achieved in a manner largely undetectable to the viewer.
According to a second aspect of the present invention there is provided a computer software product for playing a multimedia digital data stream comprising audio data and video data, the latter displayed to a user in a sequence of frames, in order to provide synchronisation between the streams, comprising computer program code, which when executed: calculates the audio time in accordance with the time elapsed since the start of the audio data stream; determines at a certain point in time the offset of the video stream from the audio stream; adjusts, if an offset is detected, the frame delivery rate by a prescribed amount; and repeats the above steps at successive points in time, to constantly adjust the frame delivery rate by no more than a maximum amount for each successive frame, so as to constantly trim the video stream display to enhance synchronisation with the audio stream.
Brief Description of the Drawing
A preferred embodiment of the present invention will now be described by reference to the accompanying drawing (Figure 1), a diagrammatic illustration of the method of the present invention.
Detailed Description of the Drawing
The present invention may be practised on any suitable computing device, with the necessary hardware and software resources for decoding and playing digital audio and video data streams. Such devices include personal computers (PCs), hand-held devices, multiprocessor systems, mobile telephone handsets, DVD players and terrestrial, satellite or cable digital television set-top boxes. The data to be played may be provided as streamed data, or may be stored for playback in any suitable form.
The audio playback is synchronised to the system clock of the particular device, and this is the only variable that is considered an absolute reference for the purposes of the technique of the invention. The system clock measures time in milliseconds. When the audio stream is started, the system clock time is recorded. A calculation is then performed to determine how much audio time has elapsed.
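A minimal sketch of that calculation follows, assuming a monotonic millisecond clock stands in for the device's system clock; the class and method names are illustrative and not taken from the patent.

```python
import time

class AudioClock:
    """Illustrative sketch: the system clock reading (here a monotonic
    clock in milliseconds) is recorded when the audio stream starts, and
    the elapsed audio time is derived from it."""

    def __init__(self):
        self.start_ms = None  # system clock reading at audio stream start

    def start(self):
        # Record the system clock time when the audio stream is started.
        self.start_ms = time.monotonic_ns() // 1_000_000

    def elapsed_ms(self):
        # How much audio time has elapsed since the stream started.
        if self.start_ms is None:
            raise RuntimeError("audio stream not started")
        return time.monotonic_ns() // 1_000_000 - self.start_ms
```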
Some media playback devices, such as those implemented on the Mac OS, provide this information directly in the form of an actual time value. Other devices, however, such as those utilising the DirectX Application Programming Interfaces, only provide the position of a playback pointer in an audio buffer, rather than an actual elapsed-time value. Moreover, because 'ring buffers' are often used, it is necessary to keep track of the number of buffers of data that have been used, along with the sample rate of the media, in order to calculate how much audio time has elapsed.
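Where only a playback pointer is exposed, the elapsed audio time reduces to sample counting, as sketched below; the parameter names are assumptions of this example and do not correspond to a real DirectX signature.

```python
def audio_time_elapsed_s(buffers_consumed, buffer_size_samples,
                         play_cursor_samples, sample_rate_hz):
    """Elapsed audio time from ring-buffer bookkeeping: full buffers
    already consumed, plus the play cursor's position within the current
    buffer, converted to seconds via the media sample rate."""
    samples_played = buffers_consumed * buffer_size_samples + play_cursor_samples
    return samples_played / sample_rate_hz

# For example, 3 consumed 4096-sample buffers and a cursor at sample 1024
# of 44.1 kHz audio: (3 * 4096 + 1024) / 44100 ~= 0.302 s of audio played.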
The audio time is considered as the 'lead', and the video attempts to loosely synchronise to this time. A timer event exists which prompts the video to display a frame. This prompting is oversampled, and is often ignored. In the current embodiment, it is set to 100 prompts per second, and therefore 3 in 4 prompts are ignored when displaying 25 frames-per-second media. This setting is arbitrary; the trade-off is the amount of CPU overhead used against smoothness of playback. The video time is calculated from the same base offset as the audio time: the actual video time is the frame number of the last displayed frame multiplied by the interval between frames (the reciprocal of the frame rate).
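The sketch below shows one way such an oversampled prompt loop could look, using the stated 100 prompts per second and 25 frames per second; `display_frame` is a hypothetical stand-in for the real renderer.

```python
import time

PROMPT_RATE_HZ = 100   # prompt frequency stated in the description
FRAME_RATE_FPS = 25    # media frame rate stated in the description

def display_frame(n):
    print(f"frame {n} displayed")  # placeholder for the actual renderer

def run(duration_s=0.5):
    frame_period = 1.0 / FRAME_RATE_FPS
    start = time.monotonic()
    next_due = start
    frames = 0
    # At 100 prompts/s and 25 fps, roughly 3 in 4 prompts are ignored.
    while time.monotonic() - start < duration_s:
        if time.monotonic() >= next_due:
            display_frame(frames)
            next_due += frame_period  # schedule by cadence to avoid drift
            frames += 1
        time.sleep(1.0 / PROMPT_RATE_HZ)  # wait for the next prompt

if __name__ == "__main__":
    run()
```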
When the period between frames has elapsed (this period is the reciprocal of the frames-per-second rate, e.g. 1/25 s) and the next video prompt occurs, the next frame is displayed. As will be recognised, this can introduce up to 10 milliseconds of jitter into the process. However, video refresh is already synchronised to the vertical blank (i.e. the refresh rate of the monitor), which is usually between around 43 and 120 Hz (nominally 72 Hz) and which therefore already gives a jitter of nominally 14 milliseconds, imperceptible to the human eye. The period between frames is adjusted, or 'trimmed', in accordance with the audio time, by a prescribed maximum amount. In the current embodiment and under normal conditions, this trimming is limited to a maximum of 2 milliseconds per frame. If the audio time appears to be 'in front' of the video time, then the period between frames is reduced, and vice versa. Oversampling the timer event that prompts the display of a video frame allows the system to account for the trimming, and in particular for the reduction of the time period between display of successive video frames.
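Expressed as code, the trim is a clamped adjustment of the inter-frame period. A sketch under the stated 2 ms limit follows; the sign convention (a positive offset meaning the audio is in front) is an assumption of this example.

```python
MAX_TRIM_S = 0.002  # at most 2 ms of trim per frame, per the description

def trimmed_period(nominal_period_s, audio_time_s, video_time_s):
    """Shrink the period between frames when the audio time is 'in front'
    of the video time, and stretch it when the audio is behind, but never
    by more than MAX_TRIM_S per frame."""
    offset = audio_time_s - video_time_s  # positive: audio ahead of video
    trim = max(-MAX_TRIM_S, min(MAX_TRIM_S, offset))
    return nominal_period_s - trim

# Audio 50 ms ahead of video at 25 fps: the next period is shortened by
# only 2 ms, so synchronisation is restored gradually over many frames.
assert abs(trimmed_period(1 / 25, 10.050, 10.000) - 0.038) < 1e-9
```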
In one embodiment, the determination of whether the audio time is 'in front' of or 'behind' the video time occurs at a frequency of around once per second. This is sufficiently frequent to afford a smooth and constant effective synchronisation between the audio and video streams.
If the audio and the video become excessively out of synchronism (in accordance with prescribed criteria; currently 200ms is considered excessively out of synchronism), the following considerations come into play. If the audio is excessively ahead of the video, one or more entire frames are omitted ('dropped'), to enable the video to catch up with the audio. As many are discarded as is required to catch up. If the video time is well ahead of the audio time, then the video is stalled until the audio catches up.
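A sketch of this coarse correction, using the 200 ms criterion from the text; the drop-count arithmetic is an assumption of how 'as many as required' might be computed.

```python
RESYNC_THRESHOLD_S = 0.200  # 200 ms counts as excessively out of synchronism

def hard_resync_action(audio_time_s, video_time_s, frame_period_s):
    """Coarse correction once loose trimming is no longer enough: drop
    whole frames when the audio is far ahead, stall the video when the
    video is far ahead of the audio."""
    offset = audio_time_s - video_time_s
    if offset > RESYNC_THRESHOLD_S:
        frames_to_drop = int(offset / frame_period_s)  # as many as needed
        return ("drop", frames_to_drop)
    if offset < -RESYNC_THRESHOLD_S:
        return ("stall", None)  # hold video until the audio catches up
    return ("trim", None)       # within bounds: normal loose trimming

# e.g. audio 300 ms ahead of 25 fps video -> ("drop", 7)
```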
The accompanying Figure 1 diagrammatically illustrates the method of the invention. The horizontal axis represents time t elapsed from commencement of the multimedia data playback, as measured by the system clock. The upper trace shows the audio data stream, the audio played time APT representing the synchronisation point being aimed for. The lower trace shows the video data stream, and the latest frame to be played LFP is shown in the figure as trailing the synchronisation point objective. The next frame is therefore scheduled for display at the 'apparent' time of 1/fps later, but with a +2 ms deviation, to trim it towards synchronisation with the audio data stream. If the video data stream is ahead of the audio data stream, then the next frame is scheduled for display 1/fps later, but with a -2 ms deviation. If the latest frame to be played occurred before a prescribed time interval ta before the synchronisation point (LFP < -ta), then one or more frames are omitted. If the latest frame to be played is timed to display after a prescribed time interval ta from the synchronisation point (LFP > ta), then the video is held for the audio to catch up.
Features of the invention:
1. Smooth video playback, without jerky adjustment to audio.
2. Smooth handling of jitter in the audio play information.
3. Synchronisation of the audio and video.
4. Handling of adverse situations with massive differences in stream position.
The word 'comprising' and forms of that word, as used in this description and in the claims, do not limit the invention claimed to exclude any variants or additions. Modifications and improvements to the invention will be readily apparent to those skilled in the art. Such modifications and improvements are intended to be within the scope of this invention.

Claims

1. A method for playing a multimedia digital data stream comprising audio data and video data, the latter displayed to a user in a sequence of frames, in order to provide synchronisation between the streams, comprising the steps of: calculating the audio time in accordance with the time elapsed since the start of the audio data stream; determining at a certain point in time the offset of the video stream from the audio stream; adjusting, if an offset is detected, the frame delivery rate by a prescribed amount; and repeating the above steps at successive points in time, to constantly adjust the frame delivery rate by no more than a maximum amount for each successive frame, so as to constantly trim the video stream display to enhance synchronisation with the audio stream.
2. A method according to claim 1 wherein the prescribed amount by which the frame delivery rate is adjusted is related to the magnitude of the offset of the video stream.
3. A method according to claim 1 wherein the step of adjusting the frame delivery rate comprises adjusting the interval between display of successive frames.
4. A method according to claim 1 including the further step of periodically prompting for the display of video frames at a higher rate than the intended frame display rate of the multimedia digital stream, and ignoring prompts for frame display occurring in the interval between display of successive frames, so as to account for the trimming of the frame delivery rate.
5. A method according to claim 4 wherein the video frame display prompt rate is around 100 times per second for a multimedia digital stream with an intended video frame display rate of approximately 25 frames per second.
6. A method according to any preceding claim, wherein the step of determining the offset of the video stream from the audio stream is carried out around once per second.
7. A method according to any preceding claim, including the further step of either pausing or dropping frames from the video stream in the event that the offset between the video stream and audio stream exceeds a predetermined maximum value, so as to restore synchronisation between the video and audio streams.
8. A method according to any preceding claim, wherein the predetermined maximum amount is approximately 2 milliseconds per frame.
9. A method according to any preceding claim, wherein the audio stream is synchronised with the system clock of the device upon which the multimedia digital data stream is played.
10. A computer software product for playing a multimedia digital data stream comprising audio data and video data, the latter displayed to a user in a sequence of frames, in order to provide synchronisation between the streams, comprising computer program code, which when executed: calculates the audio time in accordance with the time elapsed since the start of the audio data stream; determines at a certain point in time the offset of the video stream from the audio stream; adjusts, if an offset is detected, the frame delivery rate by a prescribed amount; and repeats the above steps at successive points in time, to constantly adjust the frame delivery rate by no more than a maximum amount for each successive frame, so as to constantly trim the video stream display to enhance synchronisation with the audio stream.
11. A computer software product according to claim 10 wherein the prescribed amount by which the frame delivery rate is adjusted is related to the magnitude of the offset of the video stream.
12. A computer software product according to claim 10 wherein adjusting the frame delivery rate comprises adjusting the interval between display of successive frames.
13. A computer software product according to claim 10, further including computer code, which when executed periodically prompts for the display of video frames at a higher rate than the intended frame display rate of the multimedia digital stream, and ignores prompts for frame display occurring in the interval between display of successive frames, so as to account for the trimming of the frame delivery rate.
14. A computer software product according to claim 13 wherein the video frame display prompt rate is around 100 times per second for a multimedia digital stream with an intended video frame display rate of approximately 25 frames per second.
15. A computer software product according to claim 10, wherein the step of determining the offset of the video stream from the audio stream is carried out around once per second.
16. A computer software product according to claim 10, further including computer program code, which when executed, either pauses or drops frames from the video stream in the event that the offset between the video stream and audio stream exceeds a predetermined maximum value, so as to restore synchronisation between the video and audio streams.
17. A computer software product according to claim 10 wherein the predetermined maximum amount is approximately 2 milliseconds per frame.
18. A computer software product according to claim 10 wherein the audio stream is synchronised with the system clock of the device upon which the multimedia digital data stream is played.
PCT/AU2005/000747 2004-05-26 2005-05-26 Method for synchronising video and audio data WO2005117431A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2004902811 2004-05-26
AU2004902811A AU2004902811A0 (en) 2004-05-26 Method for synchronising video and audio data

Publications (1)

Publication Number Publication Date
WO2005117431A1 true WO2005117431A1 (en) 2005-12-08

Family

ID=35451272

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2005/000747 WO2005117431A1 (en) 2004-05-26 2005-05-26 Method for synchronising video and audio data

Country Status (1)

Country Link
WO (1) WO2005117431A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5617502A (en) * 1996-03-22 1997-04-01 Cirrus Logic, Inc. System and method synchronizing audio and video digital data signals during playback
US6452974B1 (en) * 1998-01-02 2002-09-17 Intel Corporation Synchronization of related audio and video streams
US6337883B1 (en) * 1998-06-10 2002-01-08 Nec Corporation Method and apparatus for synchronously reproducing audio data and video data
US20030058224A1 (en) * 2001-09-18 2003-03-27 Chikara Ushimaru Moving image playback apparatus, moving image playback method, and audio playback apparatus

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101119461B (en) * 2006-08-02 2010-05-12 广达电脑股份有限公司 System and method for maintaining video frame and audio frame synchronous broadcasting
US8126309B2 (en) * 2007-02-19 2012-02-28 Kabushiki Kaisha Toshiba Video playback apparatus and method
EP2076052A3 (en) * 2007-12-28 2009-07-29 Intel Corporation Synchronizing audio and video frames
US9571901B2 (en) 2007-12-28 2017-02-14 Intel Corporation Synchronizing audio and video frames
CN112637488A (en) * 2020-12-17 2021-04-09 深圳市普汇智联科技有限公司 Edge fusion method and device for audio and video synchronous playing system
CN112637488B (en) * 2020-12-17 2022-02-22 深圳市普汇智联科技有限公司 Edge fusion method and device for audio and video synchronous playing system
CN112714353A (en) * 2020-12-28 2021-04-27 杭州电子科技大学 Distributed synchronization method for multimedia stream
CN112714353B (en) * 2020-12-28 2022-08-30 杭州电子科技大学 Distributed synchronization method for multimedia stream


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC - FORM EPO 1205A DATED 30-03-2007

122 Ep: pct application non-entry in european phase

Ref document number: 05742122

Country of ref document: EP

Kind code of ref document: A1