US20040201784A9 - System and method for locating program boundaries and commercial boundaries using audio categories - Google Patents

System and method for locating program boundaries and commercial boundaries using audio categories

Info

Publication number
US20040201784A9
Authority
US
United States
Prior art keywords
audio
category
change
rate
detector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/746,077
Other versions
US6819863B2 (en
US20020080286A1 (en
Inventor
Serhan Dagtas
Nevenka Dimitrova
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Philips Electronics North America Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/006,657 (US6363380B1)
Application filed by Philips Electronics North America Corp
Assigned to PHILIPS ELECTRONICS NORTH AMERICA CORPORATION; assignment of assignors interest (see document for details). Assignors: DAGTAS, SERHAN, DIMITROVA, NEVENKA
Priority to US09/746,077 (US6819863B2)
Priority to PCT/IB2001/002432 (WO2002052440A1)
Priority to JP2002553671A (JP2004517518A)
Priority to EP01272141A (EP1417593A1)
Priority to CN01808461A (CN1426563A)
Publication of US20020080286A1
Assigned to KONINKLIJKE PHILIPS ELECTRONICS N.V.; assignment of assignors interest (see document for details). Assignors: PHILIP ELECTRONICS NORTH AMERICA CORPORATION
Publication of US20040201784A9
Publication of US6819863B2
Application granted
Adjusted expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/22 Means responsive to presence or absence of recorded information signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/812 Monomedia components thereof involving advertisement data
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00 Record carriers by type
    • G11B2220/20 Disc-shaped record carriers
    • G11B2220/21 Disc-shaped record carriers characterised in that the disc is of read-only, rewritable, or recordable type
    • G11B2220/215 Recordable discs
    • G11B2220/216 Rewritable discs
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00 Record carriers by type
    • G11B2220/20 Disc-shaped record carriers
    • G11B2220/25 Disc-shaped record carriers characterised in that the disc is based on a specific recording technology
    • G11B2220/2508 Magnetic discs
    • G11B2220/2516 Hard disks
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00 Record carriers by type
    • G11B2220/20 Disc-shaped record carriers
    • G11B2220/25 Disc-shaped record carriers characterised in that the disc is based on a specific recording technology
    • G11B2220/2537 Optical discs
    • G11B2220/2545 CDs
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00 Record carriers by type
    • G11B2220/20 Disc-shaped record carriers
    • G11B2220/25 Disc-shaped record carriers characterised in that the disc is based on a specific recording technology
    • G11B2220/2537 Optical discs
    • G11B2220/2562 DVDs [digital versatile discs]; Digital video discs; MMCDs; HDCDs
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00 Record carriers by type
    • G11B2220/40 Combinations of multiple record carriers
    • G11B2220/45 Hierarchical combination of record carriers, e.g. HDD for fast access, optical discs for long term storage or tapes for backup
    • G11B2220/455 Hierarchical combination of record carriers, e.g. HDD for fast access, optical discs for long term storage or tapes for backup said record carriers being in one device and being used as primary and secondary/backup media, e.g. HDD-DVD combo device, or as source and target media, e.g. PC and portable player
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00 Record carriers by type
    • G11B2220/90 Tape-like record carriers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S358/00 Facsimile and static presentation processing
    • Y10S358/908 Pause control, i.e. "commercial killers"

Definitions

  • the present invention is related to the inventions disclosed in U.S. Pat. No. 6,100,941 issued Aug. 8, 2000, entitled “APPARATUS AND METHOD FOR LOCATING A COMMERCIAL DISPOSED WITHIN A VIDEO DATA STREAM” and in U.S. Pat. application Ser. No. 09/006,657 filed Jan. 13, 1998, entitled “MULTIMEDIA COMPUTER SYSTEM WITH STORY SEGMENTATION CAPABILITY AND OPERATING PROGRAM THEREFOR INCLUDING FINITE AUTOMATON VIDEO PARSER.”
  • This patent and this patent application are commonly assigned to the assignee of the present invention.
  • the disclosures of this patent and patent application are hereby incorporated herein by reference for all purposes as if fully set forth herein.
  • the present invention is directed, in general, to a system and method for locating the boundaries of segments of a video program within a video data stream and, more specifically, to a system and method for locating boundaries of video programs and boundaries of commercial messages by using audio categories such as speech, music, silence, and noise.
  • VCR video cassette recorder
  • VTR video tape recorder
  • a video cassette recorder records video programs on magnetic cassette tapes.
  • video recorders that use computer magnetic hard disks rather than magnetic cassette tapes to store video programs have appeared in the market.
  • the ReplayTV™ recorder and the TiVo™ recorder digitally record television programs on hard disk drives using, for example, an MPEG video compression standard.
  • some video recorders may record on a readable/writable, digital versatile disk (DVD) rather than a magnetic disk.
  • DVD digital versatile disk
  • Video recorders are typically used in conjunction with a video display device such as a television.
  • a video recorder may be used to record a video program at the same time that the video program is being displayed on the video display device.
  • a common example is the use of a video cassette recorder (VCR) to record television programs while the television programs are simultaneously displayed on a television screen.
  • Video recorders rely on high level Electronic Program Guide (EPG) information in order to determine the start times and the end times of television programs for recording purposes.
  • EPG Electronic Program Guide
  • the EPG information may often be inaccurate, especially for live television broadcasts.
  • broadcasters are not motivated to insert any metadata information about the boundaries of commercial messages (“commercials”) in video programs.
  • a black frame is a black video frame that is usually found immediately before and after a commercial.
  • Other methods for detecting the boundaries of a commercial include using cut rate change, super histograms, digitized codes with time information, etc.
  • Another prior art method for detecting the boundaries of a program or a commercial involves inserting a special code or signal in the video signal to designate the beginning and the end of the program or commercial. Special circuitry is needed to detect and identify the special code or signal.
  • program identification information uniquely identifies the beginning and the end of the program. This information can also be used to detect the boundaries of programs.
  • Computerized personal multimedia retrieval systems exist for identifying and recording segments of a video program (usually from a television broadcast) that contain topics that a user desires to record. The desired segments are usually identified based upon keywords input by the user.
  • a computer system operates in the background to monitor the content of information from a source such as the Internet. The content selection is guided by the keywords provided by the user. When a match is found between the keywords and the content of the monitored information, the information is stored for later replay and viewing by the user.
  • the downloaded information may include links to audio signals and to video clips that can also be downloaded by the user.
  • a computerized personal multimedia retrieval system that allows users to select and retrieve portions of television programs for later playback usually meets three primary requirements.
  • a system and method is usually available for parsing an incoming video signal into its visual, audio, and textual components.
  • a system and method is usually available for analyzing the content of the audio and/or textual components of the broadcast signal with respect to user input criteria and segmenting the components based upon content.
  • a system and method is usually available for integrating and storing program segments that match the user's requirements for later replay by the user.
  • U.S. Pat. application Ser. No. 09/006,657 describes a system and method that provides a set of models for recognizing a sequence of symbols, a matching model that identifies desired selection criteria, and a methodology for selecting and retrieving one or more video story segments or sequences based upon the selection criteria.
  • the audio classifier controller categorizes portions of audio signals into audio categories such as speech with background music, speech with background noise, speech with background speech, etc.
  • the audio classifier controller also categorizes sequential portions of audio speech signals in speaker categories when the identity of a speaker can be determined. Each speaker category contains audio speech signals of one individual speaker. Speakers who cannot be identified are categorized in an “unknown speaker” category.
  • the audio classifier controller of the present invention also comprises a category change detector that detects when a first portion of the audio signal categorized in a first category ceases and when a second portion of the audio signal categorized in a second category begins. That is, the category change detector determines when a category of the audio signal changes. In this manner the audio classifier controller of the present invention continually determines the type of each audio category.
  • the category change detector also determines when a first portion of the audio signal categorized in a first speaker category ceases and when a second portion of the audio signal categorized in a second speaker category begins. That is, the category change detector determines when a speaker category of the audio signal changes.
  • the audio classifier controller of the present invention also comprises a category change rate detector that determines the rate at which the audio categories are changing (the “category change rate”).
  • the category change rate detector compares the category change rate to a threshold value.
  • the threshold value can either be a preselected value or can be determined dynamically in response to changing operating conditions. If the category change rate is greater than the threshold value, the existence of a commercial segment may be inferred, which in turn implies the existence of a boundary.
  • FIG. 1 illustrates an exemplary video recorder and a television set, according to an advantageous embodiment of the present invention.
  • FIG. 2 illustrates a block diagram of the exemplary video recorder, according to an advantageous embodiment of the present invention.
  • FIG. 3 illustrates a block diagram of an exemplary audio classifier controller, according to an advantageous embodiment of the present invention.
  • FIG. 4 illustrates a flow chart depicting the operation of an exemplary audio classifier controller, according to an advantageous embodiment of the present invention.
  • FIGS. 1 through 4 discussed below, and the various embodiments used to describe the principles of the present invention in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the invention. Those skilled in the art will understand that the principles of the present invention may be implemented in any suitably arranged audio classification system.
  • FIG. 1 illustrates exemplary video recorder 150 and television set 105 according to one embodiment of the present invention.
  • Video recorder 150 receives incoming television signals from an external source, such as a cable television service provider (Cable Co.), a local antenna, a satellite, the Internet, or a digital versatile disk (DVD) or a Video Home System (VHS) tape player.
  • Video recorder 150 transmits television signals from a selected channel to television set 105 .
  • a channel may be selected manually by the viewer or may be selected automatically by a recording device previously programmed by the viewer. Alternatively, a channel and a video program may be selected automatically by a recording device based upon information from a program profile in the viewer's personal viewing history.
  • Video recorder 150 comprises infrared (IR) sensor 160 that receives commands (such as Channel Up, Channel Down, Volume Up, Volume Down, Record, Play, Fast Forward (FF), Reverse, and the like) from remote control device 125 operated by the viewer.
  • Television set 105 is a conventional television comprising screen 110 , infrared (IR) sensor 115 , and one or more manual controls 120 (indicated by a dotted line).
  • IR sensor 115 also receives commands (such as Volume Up, Volume Down, Power On, Power Off) from remote control device 125 operated by the viewer.
  • video recorder 150 is not limited to receiving a particular type of incoming television signal from a particular type of source.
  • the external source may be a cable service provider, a conventional RF broadcast antenna, a satellite dish, an Internet connection, or another local storage device, such as a DVD player or a VHS tape player.
  • the incoming signal may be a digital signal, an analog signal, Internet protocol (IP) packets, or signals in other types of format.
  • IP Internet protocol
  • video recorder 150 receives (from a cable service provider) incoming analog television signals. Nonetheless, those skilled in the art will understand that the principles of the present invention may readily be adapted for use with digital television signals, wireless broadcast television signals, local storage systems, an incoming stream of IP packets containing MPEG data, and the like.
  • FIG. 2 illustrates exemplary video recorder 150 in greater detail according to one embodiment of the present invention.
  • Video recorder 150 comprises IR sensor 160 , video processor 210 , MPEG-2 encoder 220 , hard disk drive 230 , MPEG-2 decoder/NTSC encoder 240 , and controller 250 .
  • Video recorder 150 further comprises audio classifier controller 270 and memory 280 .
  • Controller 250 directs the overall operation of video recorder 150 , including View mode, Record mode, Play mode, Fast Forward (FF) mode, Reverse mode, among others.
  • FF Fast Forward
  • controller 250 causes the incoming television signal from the cable service provider to be demodulated and processed by video processor 210 and transmitted to television set 105 , without storing video signals in (or retrieving video signals from) hard disk drive 230 .
  • Video processor 210 contains radio frequency (RF) front-end circuitry for receiving incoming television signals from the cable service provider, tuning to a user-selected channel, and converting the selected RF signal to a baseband television signal (e.g., super video signal) suitable for of the MPEG-1, MPEG-2, and MPEG-4 standards, or with one or more other types of standards.
  • RF radio frequency
  • hard disk drive 230 is defined to include any mass storage device that is both readable and writable, including, but not limited to, conventional magnetic disk drives and optical disk drives for read/write digital versatile disks (DVD-RW standard and DVD+RW standard), re-writable CD-ROMs, VCR tapes and the like.
  • hard disk drive 230 need not be fixed in the conventional sense that it is permanently embedded in video recorder 150 . Rather, hard disk drive 230 includes any mass storage device that is dedicated to video recorder 150 for the purpose of storing recorded video programs.
  • hard disk drive 230 may include an attached peripheral drive or removable disk drives (whether embedded or attached), such as a juke box device (not shown) that holds several read/write DVDs or re-writable CD-ROMs. As illustrated schematically in FIG. 2, removable disk drives of this type are capable of receiving and reading re-writable CD-ROM disk 235 .
  • hard disk drive 230 may include external mass storage devices that video recorder 150 may access and control via a drive or removable disk drives (whether embedded or attached) that reads read/write DVDs or re-writable CD-ROMs. As illustrated schematically in FIG. 2, removable disk drives of this type are capable of receiving and reading re-writable CD-ROM disk 285 .
  • audio classifier controller 270 extracts an audio signal and separates the extracted audio signal into discrete audio categories, including speech, music, noise, and silence. Audio classifier controller 270 sends the extracted voice signals to speaker identifier 330 (shown in FIG. 3). Speaker identifier 330 analyzes the voice signals to identify the person who is speaking. Audio classifier controller 270 inserts time stamps into the extracted and categorized audio data.
  • Audio classifier controller 270 executes software instructions to identify and classify audio portions of a video program segment using audio categories. Audio classification may be achieved with multidimensional feature based methods that are known in the art. These methods typically use Linear Predictive Coding (LPC) derived cepstral coefficients and their regression coefficients, energy level, average energy, Zero Crossing Rate (ZCR), etc.
  • LPC Linear Predictive Coding
  • ZCR Zero Crossing Rate
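  • As an illustration only, two of the simpler features named above (Zero Crossing Rate and average energy) can be computed per frame roughly as follows. This is a minimal sketch, not the classifier of the invention: the function names, the non-overlapping framing, and the treatment of a zero-valued sample as non-negative are all assumptions made for the example.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def average_energy(frame):
    """Mean squared amplitude of the samples in the frame."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def frame_features(samples, frame_len):
    """Split samples into non-overlapping frames (an assumption) and
    return one (zcr, energy) pair per complete frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        features.append((zero_crossing_rate(frame), average_energy(frame)))
    return features
```

  • A feature vector of this kind would typically be one input among several (for example, alongside LPC-derived cepstral coefficients) to whatever multidimensional classifier is actually used.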
  • Audio classifier controller 270 comprises classification algorithm 305 for classifying audio signals from audio signal source 300 , four data buffers, 310 through 325 , for recording information for four different types of audio categories, speaker identifier 330 containing a speech database of speaker identification data, category change detector 335 , category change rate detector 340 , and boundary detector 345 .
  • Audio classifier controller 270 receives audio signal segments directly from audio signal source 300 and classifies the audio signal segments with classification algorithm 305 .
  • Classification algorithm 305 classifies the audio signals into individual types of audio categories, such as silence, music, noise, speech and any combination of these audio categories. These four types of audio categories are illustrated in FIG. 3. These types are not the only types of audio categories that may be used. It is clear that other types of audio categories may also be identified and classified (e.g., laughter).
  • Classification algorithm 305 records information for the audio category of “silence” in data buffer 310 , records information for the audio category of “music” in data buffer 315 , records information for the audio category of “noise” in data buffer 320 , and records information for the audio category of “speech” in data buffer 325 . Classification algorithm 305 also inserts time stamps into the categorized audio signals.
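  • The per-category recording described above can be pictured as routing time-stamped segments into one buffer per category. The sketch below is hypothetical: the tuple layout `(start_s, end_s, category)`, the function name, and the use of in-memory lists for data buffers 310 through 325 are assumptions for illustration.

```python
# The four categories illustrated in FIG. 3; other categories may be passed in.
DEFAULT_CATEGORIES = ("silence", "music", "noise", "speech")


def route_to_buffers(labelled_segments, categories=DEFAULT_CATEGORIES):
    """Place each (start_s, end_s, category) segment in the buffer for its category.

    Returns a dict mapping category name to a list of (start_s, end_s) time stamps.
    """
    buffers = {category: [] for category in categories}
    for start_s, end_s, category in labelled_segments:
        if category not in buffers:
            raise ValueError(f"no buffer for category {category!r}")
        buffers[category].append((start_s, end_s))
    return buffers
```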
  • Speaker identifier 330 contains a speech database of voice identification information for persons whose voices have been previously identified, classified, and recorded.
  • Classification algorithm 305 is capable of accessing the speech database within speaker identifier 330 .
  • classification algorithm 305 classifies an audio signal as a “speech” audio signal
  • classification algorithm 305 accesses speaker identifier 330 to identify the speaker. If the speaker can be identified, the identity of the speaker is added to the data concerning the “speech” audio category.
  • Classification algorithm 305 is capable of classifying “speech” audio signals from more than one speaker. A first “speech” audio signal may be identified as originating from a first speaker and a second “speech” audio signal may be identified as originating from a second speaker.
  • “Speech” audio signals from unidentified speakers are classified in an “unknown speaker” category. Whenever a “speech” audio signal from an unknown speaker is identified, that unknown speaker is added to the speech database and identified as “unknown speaker number 1.” When a “speech” audio signal from a second unknown speaker is identified, that second unknown speaker is added to the speech database and identified as “unknown speaker number 2.” Each time an unknown speaker is detected, the unknown speaker's “speech” audio signal is compared to the “speech” audio signals of each of the unknown speakers in the speech database to see if the unknown speaker is one that has already been added to the speech database.
  • Classification algorithm 305 can use this information to determine the number of unknown speakers who speak within a given period of time. The existence of a relatively large number of unknown speakers within a short period of time can indicate the presence of a commercial within the video data stream.
  • Classification algorithm 305 also updates the speech database in speaker identifier 330 to add voice identification information for new persons who appear in the program portions of the video data stream. These persons may be new actors and actresses, new musicians, newly elected politicians, etc. It is not necessary to update the speech database with voice identification information for new persons who appear in commercials. Therefore, classification algorithm 305 records the number of times that new unknown persons appear and whether they appear in commercials or in the program portions of the video data stream. Classification algorithm 305 then deletes all information relating to new unknown persons who appear in commercials (unless they also happen to appear in the program portion of the video data stream).
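  • The unknown-speaker bookkeeping described above might be sketched as follows. The class name, the representation of a voice as a fixed-length feature vector, and the Euclidean-distance match with a fixed threshold are assumptions; the source does not specify how two voices are compared.

```python
import math


class UnknownSpeakerTracker:
    """Hypothetical speech-database stand-in that numbers unknown speakers."""

    def __init__(self, match_threshold):
        self.match_threshold = match_threshold
        self.known = []      # list of (label, voiceprint)
        self.sightings = []  # list of (time_s, label)

    def observe(self, time_s, voiceprint):
        """Match the voiceprint against stored unknown speakers, or add a new one."""
        for label, reference in self.known:
            if math.dist(reference, voiceprint) <= self.match_threshold:
                break
        else:
            # No stored speaker was close enough: enroll a new unknown speaker.
            label = f"unknown speaker number {len(self.known) + 1}"
            self.known.append((label, tuple(voiceprint)))
        self.sightings.append((time_s, label))
        return label

    def distinct_speakers_between(self, start_s, end_s):
        """Number of different unknown speakers heard in [start_s, end_s)."""
        return len({label for t, label in self.sightings if start_s <= t < end_s})
```

  • A relatively large value returned by `distinct_speakers_between` for a short interval would correspond to the commercial indication described above.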
  • classification algorithm 305 sends the classification information to category change detector 335 .
  • Category change detector 335 uses time stamp information to detect when a first portion of the audio signal that has been categorized in a first category ceases and when a second portion of the audio signal categorized in a second category begins.
  • Category change detector 335 determines when a category of the audio signal changes and determines the identity of the two categories involved. Specifically, category change detector 335 is capable of determining that an audio signal has changed from a speech signal to a music signal, or that an audio signal has changed from a silent signal to a speech signal, and so on.
  • Category change detector 335 also detects when a first portion of the audio signal that has been categorized in a first subcategory ceases and when a second portion of the audio signal categorized in a second subcategory begins. For example, category change detector 335 is capable of determining that an audio signal has changed from a first subcategory of speech with background music to a second subcategory of speech with background noise.
  • Category change detector 335 also determines when a first portion of the audio signal categorized in a first speaker category ceases and when a second portion of the audio signal categorized in a second speaker category begins. Category change detector 335 determines when a speaker category of the audio signal changes. Category change detector 335 is capable of determining that an audio signal has changed from a first speaker to a second speaker, or from a second speaker to a third speaker, and so on.
  • Category change detector 335 sends this information to category change rate detector 340 .
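  • A minimal sketch of the category change detection described above, assuming the classified audio arrives as a time-ordered list of `(start_s, category)` pairs (a hypothetical layout). The same function applies unchanged to subcategory labels or speaker labels.

```python
def detect_category_changes(segments):
    """Return (time_s, previous_category, new_category) for every change.

    segments: list of (start_s, category) tuples sorted by start time.
    Consecutive segments with the same category produce no change event.
    """
    changes = []
    previous = None
    for start_s, category in segments:
        if previous is not None and category != previous:
            changes.append((start_s, previous, category))
        previous = category
    return changes
```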
  • Category change rate detector 340 detects the rate at which the various categories are changing.
  • Category change rate detector 340 uses time stamp information to calculate how many times each particular category is changing within a unit time (e.g., one minute).
  • Category change rate detector 340 determines the rate of change for each of the categories.
  • Category change rate detector 340 uses the rate of change for each of the categories to determine an overall change rate.
  • the overall change rate takes into account 1) the change rate of each category, 2) the audio cut rate (i.e., the rate at which all of the categories are changing), 3) the total length of time of each category, and 4) the ratio of the change rate of each category to the total length of time of the category within a given period of time.
  • Category change rate detector 340 then sends the information described above to boundary detector 345 .
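  • The per-category change rate and the audio cut rate described above could be computed from time-stamped change events along these lines. This is a sketch under assumptions: the event layout `(time_s, from_category, to_category)` is hypothetical, a change is attributed to the category being entered, and the way the ingredients are combined into a single overall change rate is left out because the source does not specify it.

```python
from collections import Counter


def change_rates(changes, window_start_s, window_end_s, unit_s=60.0):
    """Return (per_category_rate, cut_rate), both in changes per unit time.

    changes: list of (time_s, from_category, to_category) events.
    unit_s: length of the unit time in seconds (one minute by default).
    """
    duration_s = window_end_s - window_start_s
    if duration_s <= 0:
        raise ValueError("window must have positive length")
    units = duration_s / unit_s
    in_window = [c for c in changes if window_start_s <= c[0] < window_end_s]
    # Assumption: a change counts toward the category that begins at that time.
    counts = Counter(to_category for _, _, to_category in in_window)
    per_category_rate = {cat: n / units for cat, n in counts.items()}
    cut_rate = len(in_window) / units  # all category changes together
    return per_category_rate, cut_rate
```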
  • Boundary detector 345 uses the information (including the overall change rate) to locate the boundaries of video programs and commercials. It is known that commercials often contain diverse and rapidly changing audio categories. Commercials usually have a larger number of speaker changes (within a given time) than do other types of video segments. If boundary detector 345 receives change rate information that shows that the rate of change of speakers is above a preselected threshold value, then boundary detector 345 may infer that a commercial is in progress. An appropriate threshold value may be obtained empirically by measuring the rate of change of speakers for a large number of commercials.
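  • One hedged way to picture the empirically obtained threshold described above: measure the speaker change rate for many known commercials, pick a low percentile of those measurements as the threshold, and flag a commercial when the observed rate exceeds it. The percentile rule and both function names are assumptions; the source only says the threshold may be obtained empirically.

```python
def empirical_threshold(commercial_rates, percent=10):
    """Return the rate at roughly the given percentile of measured commercials.

    commercial_rates: speaker change rates measured on known commercials.
    percent: integer percentile (an assumed tuning choice, not from the source).
    """
    if not commercial_rates:
        raise ValueError("at least one measurement is required")
    ordered = sorted(commercial_rates)
    index = (len(ordered) - 1) * percent // 100
    return ordered[index]


def commercial_suspected(speaker_change_rate, threshold):
    """True when the speaker change rate is above the threshold."""
    return speaker_change_rate > threshold
```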
  • Boundary detector 345 may assign a “weighting factor” to each change in each category.
  • The weighting factor may be a number that represents the relative importance assigned to the category change in assessing the likelihood of locating a boundary at the point where the particular change in category occurs. For example, if it is determined that a change from “silence” to “music” is more likely to be associated with an initial boundary, then the numerical factor that represents that particular category change may be multiplied by a “weighting factor” to increase the relative impact of that particular category change in determining the likelihood of the existence of an initial boundary.
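  • One possible way to apply such weighting factors is sketched below. The table `CHANGE_WEIGHTS`, its single entry, the default weight of 1.0, and the summation into one score are assumptions chosen for illustration; the text does not fix any particular values or combination rule.

```python
DEFAULT_WEIGHT = 1.0
# Hypothetical weights keyed by (previous category, new category).
CHANGE_WEIGHTS = {("silence", "music"): 2.0}

def weighted_change_score(changes, weights=CHANGE_WEIGHTS):
    """changes: list of (time_stamp_s, previous_category, new_category).
    Each change contributes its weighting factor to the total score."""
    return sum(weights.get((prev, new), DEFAULT_WEIGHT) for _, prev, new in changes)

changes = [(3.0, "silence", "music"), (9.0, "music", "speech")]
score = weighted_change_score(changes)
```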
  • Each category (e.g., speech, music) has a mean vector that represents the centroid of that category.
  • The distance between any two of those mean vectors is also a measure of the significance of a category change.
  • The distance between the mean vectors can therefore be used to quantify the importance of a category change.
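  • A minimal sketch of this idea follows. It assumes that each category is represented by a set of numeric feature vectors and that Euclidean distance is used; neither the feature values nor the distance metric is specified in the text, and the names `mean_vector` and `change_significance` are hypothetical.

```python
import math

def mean_vector(feature_vectors):
    """Centroid of a list of equal-length feature vectors."""
    count = len(feature_vectors)
    return [sum(column) / count for column in zip(*feature_vectors)]

def change_significance(features_a, features_b):
    # Euclidean distance between the two category centroids (assumed metric).
    return math.dist(mean_vector(features_a), mean_vector(features_b))

speech_features = [[0.0, 0.0], [2.0, 0.0]]  # made-up feature vectors
music_features = [[4.0, 4.0], [4.0, 4.0]]
significance = change_significance(speech_features, music_features)
```

  • A larger distance would then mark a change between two acoustically dissimilar categories as more significant than a change between two similar ones.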
  • Boundary detector 345 uses the audio categories (such as speech, silence, music and noise), and the audio subcategories (such as speech with background noise, music with background noise), and the speaker categories (such as identified speakers and unknown speakers). To determine the boundary of a commercial segment, boundary detector 345 selects the size of a time window. For example, for a commercial the size of the time window can be selected to be twenty (20) seconds.
  • Boundary detector 345 performs a sliding window high-level feature extraction and classification process to extract the following high-level features: 1) the rate of change of each category (i.e., how many times each category appears during the time window), 2) the length of each category within the time window (n-values for n categories), 3) the rate change of audio cuts (any category change) computed with the corresponding weighting factors, and 4) the average audio cut distance.
  • A classifier (not shown) within boundary detector 345 (e.g., a nearest neighbor classifier) determines whether the audio segment within the time window is or is not a commercial segment. If the classifier is a probabilistic classifier (e.g., a Bayesian classifier), then the classifier determines a probability that the audio segment within the time window is or is not a commercial segment.
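  • The sliding window feature extraction and nearest neighbor classification described above might be sketched as follows. Several points are assumptions made for this illustration: the segment layout, the unweighted cut count (the text applies weighting factors), the reading of “average audio cut distance” as the mean time between successive cuts, the five-second step, and the two labelled example vectors.

```python
import math

CATEGORIES = ("silence", "music", "noise", "speech")

def window_features(segments, start, end):
    """segments: time-ordered list of (seg_start_s, seg_end_s, category)."""
    counts = dict.fromkeys(CATEGORIES, 0)
    lengths = dict.fromkeys(CATEGORIES, 0.0)
    cut_times, previous = [], None
    for seg_start, seg_end, category in segments:
        overlap = min(seg_end, end) - max(seg_start, start)
        if overlap <= 0:
            continue  # segment lies outside the window
        counts[category] += 1          # 1) appearances of each category
        lengths[category] += overlap   # 2) length of each category
        if previous is not None and category != previous:
            cut_times.append(max(seg_start, start))
        previous = category
    gaps = [b - a for a, b in zip(cut_times, cut_times[1:])]
    avg_cut_distance = sum(gaps) / len(gaps) if gaps else end - start
    return ([counts[c] for c in CATEGORIES] + [lengths[c] for c in CATEGORIES]
            + [len(cut_times), avg_cut_distance])  # 3) cut count, 4) mean gap

def nearest_neighbor(features, labelled_examples):
    """labelled_examples: list of (feature_vector, label)."""
    return min(labelled_examples, key=lambda ex: math.dist(features, ex[0]))[1]

def classify_stream(segments, labelled_examples, window=20.0, step=5.0):
    t, stop, results = segments[0][0], segments[-1][1], []
    while t + window <= stop:
        feats = window_features(segments, t, t + window)
        results.append((t, nearest_neighbor(feats, labelled_examples)))
        t += step
    return results

segments = [(0, 4, "speech"), (4, 6, "music"), (6, 12, "speech"),
            (12, 20, "silence"), (20, 30, "speech")]
examples = [([1, 1, 0, 2, 8, 2, 0, 10, 3, 4], "commercial"),
            ([0, 0, 0, 1, 0, 0, 0, 20, 0, 20], "program")]
features = window_features(segments, 0, 20)
labels = classify_stream(segments, examples)
```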
  • To determine the boundary of a program segment, boundary detector 345 selects the size of a time window. For example, for a program segment the size of the time window can be selected to be five (5) minutes.
  • Boundary detector 345 then performs a sliding window high-level feature extraction and classification process to extract the following high-level features: 1) the rate of change of each category (i.e., how many times each category appears during the time window), 2) the length of each category within the time window (adjusted by the weighting factor), 3) the rate change of audio cuts (any category change), and 4) the average audio cut distance.
  • A probabilistic classifier (not shown) within boundary detector 345 (e.g., a Bayesian classifier) determines the probability that the audio segment within the time window belongs to a particular class.
  • The audio segment may belong to a dialog, or to a news story, or to a music video, or to a crowd scene with shouting, etc.
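  • As a hedged illustration of a probabilistic classifier of this kind, the sketch below uses a Gaussian naive Bayes model. This is one common Bayesian classifier swapped in for illustration; the text does not say which Bayesian model is used. The two-element feature vectors, the class labels, and the small variance floor are assumptions.

```python
import math

def fit(examples):
    """examples: list of (feature_vector, label).
    Returns {label: (prior, means, variances)}."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps the variance strictly positive.
        variances = [sum((x - m) ** 2 for x in col) / n + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(examples), means, variances)
    return model

def posterior(model, features):
    """Probability of each class given the window's feature vector."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(features, means, variances))
        log_scores[label] = math.log(prior) + log_likelihood
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    scaled = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(scaled.values())
    return {k: v / total for k, v in scaled.items()}

examples = [([10.0, 2.0], "commercial"), ([12.0, 3.0], "commercial"),
            ([1.0, 20.0], "program"), ([2.0, 18.0], "program")]
model = fit(examples)
probabilities = posterior(model, [11.0, 2.5])
```

  • The same structure extends to more than two classes (e.g., dialog, news story, music video) by supplying labelled examples for each class.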
  • The output values from the sliding window are subjected to an analysis for a global minimum among the different segments and to an overall analysis (e.g., for the last one hour of time).
  • Heuristics concerning program boundaries include 1) a musical audio logo is usually present at the start of a news program, 2) there is usually a commercial close to the end of every program, 3) credits at the end of a movie are usually shown with music in the background, and 4) the identity of the speaker (or speakers) almost always changes between programs.
  • FIG. 4 illustrates flow chart 400 depicting the operation of audio classifier controller 270 , according to an advantageous embodiment of the present invention.
  • Flow chart 400 depicts one advantageous method of operation of the present invention in audio classifier controller 270 in video recorder 150 .
  • Audio classifier controller 270 receives an audio signal from an audio signal source 300 (step 410 ).
  • Audio classifier controller 270 classifies the audio signal into audio categories (and subcategories) using classification algorithm 305 (step 420 ).
  • Classification algorithm 305 identifies individual speakers in each segment in the “speech” audio category using information from speaker identifier 330 (step 430 ).
  • Category change detector 335 determines when each audio category (or subcategory) changes (step 440 ).
  • Category change rate detector 340 determines the rate of change of audio categories (or subcategories) (step 450 ).
  • Boundary detector 345 uses the rate of change information of audio categories (or subcategories) for multifeature classification to locate boundaries of video programs and commercials (step 460 ).

Abstract

For use in a video signal processor, there is disclosed a system and method for locating program boundaries and commercial boundaries using audio categories. The system comprises an audio classifier controller that obtains information concerning the audio categories of the segments of an audio signal. Audio categories include such categories as silence, music, noise and speech. The audio classifier controller determines the rates of change of the audio categories. The audio classifier controller then compares each rate of change of the audio categories with a threshold value to locate the boundaries of the programs and commercials. The audio classifier controller is also capable of classifying at least one feature of an audio category change rate using a multifeature classifier to locate the boundaries of the programs and commercials.

Description

    CROSS-REFERENCE TO RELATED PATENT AND APPLICATION
  • The present invention is related to the inventions disclosed in U.S. Pat. No. 6,100,941 issued Aug. 8, 2000, entitled “APPARATUS AND METHOD FOR LOCATING A COMMERCIAL DISPOSED WITHIN A VIDEO DATA STREAM” and in U.S. Pat. application Ser. No. 09/006,657 filed Jan. 13, 1998, entitled “MULTIMEDIA COMPUTER SYSTEM WITH STORY SEGMENTATION CAPABILITY AND OPERATING PROGRAM THEREFOR INCLUDING FINITE AUTOMATON VIDEO PARSER.” This patent and this patent application are commonly assigned to the assignee of the present invention. The disclosures of this patent and patent application are hereby incorporated herein by reference for all purposes as if fully set forth herein. [0001]
  • TECHNICAL FIELD OF THE INVENTION
  • The present invention is directed, in general, to a system and method for locating the boundaries of segments of a video program within a video data stream and, more specifically, to a system and method for locating boundaries of video programs and boundaries of commercial messages by using audio categories such as speech, music, silence, and noise. [0002]
  • BACKGROUND OF THE INVENTION
  • A wide variety of video recorders are available in the marketplace. Most people own, or are familiar with, a video cassette recorder (VCR), also referred to as a video tape recorder (VTR). A video cassette recorder records video programs on magnetic cassette tapes. More recently, video recorders that use computer magnetic hard disks rather than magnetic cassette tapes to store video programs have appeared in the market. For example, the ReplayTV™ recorder and the TiVO™ recorder digitally record television programs on hard disk drives using, for example, an MPEG video compression standard. Additionally, some video recorders may record on a readable/writable, digital versatile disk (DVD) rather than a magnetic disk. [0003]
  • Video recorders are typically used in conjunction with a video display device such as a television. A video recorder may be used to record a video program at the same time that the video program is being displayed on the video display device. A common example is the use of a video cassette recorder (VCR) to record television programs while the television programs are simultaneously displayed on a television screen. [0004]
  • Video recorders rely on high level Electronic Program Guide (EPG) information in order to determine the start times and the end times of television programs for recording purposes. Unfortunately, the EPG information may often be inaccurate, especially for live television broadcasts. There is a need in the art for an improved system and method for locating the boundaries of video programs. However, broadcasters are not motivated to insert any metadata information about the boundaries of commercial messages (“commercials”) in video programs. [0005]
  • Various methods exist to detect the start times and the end times of segments of video programs. These methods are typically used to detect commercials so that the commercials may be automatically skipped over when a video program is being recorded in a video recorder. Several well known methods involve the detection of a “black frame.” A black frame is a black video frame that is usually found immediately before and after a commercial. Other methods for detecting the boundaries of a commercial include using cut rate change, super histograms, digitized codes with time information, etc. [0006]
  • Another prior art method for detecting the boundaries of a program or a commercial involves inserting a special code or signal in the video signal to designate the beginning and the end of the program or commercial. Special circuitry is needed to detect and identify the special code or signal. [0007]
  • In addition, there are presently existing television standards that insert program identification information in the video signal. The program identification information uniquely identifies the beginning and the end of the program. This information can also be used to detect the boundaries of programs. [0008]
  • These prior art methods all involve the insertion and detection of special codes, special signals, or special program identification information within a video data stream. There is a need in the art for an improved system and method for locating the boundaries of video programs and commercials within a video data stream without using special codes, special signals, or special program identification information. [0009]
  • There is also a need for an improved system and method for automatically locating the boundaries of video programs and the boundaries of commercials in computerized personal multimedia retrieval systems. Computerized personal multimedia retrieval systems exist for identifying and recording segments of a video program (usually from a television broadcast) that contain topics that a user desires to record. The desired segments are usually identified based upon keywords input by the user. In a typical application, a computer system operates in the background to monitor the content of information from a source such as the Internet. The content selection is guided by the keywords provided by the user. When a match is found between the keywords and the content of the monitored information, the information is stored for later replay and viewing by the user. The downloaded information may include links to audio signals and to video clips that can also be downloaded by the user. [0010]
  • A computerized personal multimedia retrieval system that allows users to select and retrieve portions of television programs for later playback usually meets three primary requirements. First, a system and method is usually available for parsing an incoming video signal into its visual, audio, and textual components. Second, a system and method is usually available for analyzing the content of the audio and/or textual components of the broadcast signal with respect to user input criteria and segmenting the components based upon content. Third, a system and method is usually available for integrating and storing program segments that match the user's requirements for later replay by the user. In addition, users prefer to record and play back only program segments and not commercials. [0011]
  • A system that meets these requirements is described in U.S. Pat. application Ser. No. 09/006,657 filed Jan. 13, 1998 by Dimitrova (a co-inventor of the present invention) entitled “MULTIMEDIA COMPUTER SYSTEM WITH STORY SEGMENTATION CAPABILITY AND OPERATING PROGRAM THEREFOR INCLUDING FINITE AUTOMATON VIDEO PARSER.” U.S. Pat. application Ser. No. 09/006,657 is hereby incorporated herein by reference within this document for all purposes as if fully set forth herein. [0012]
  • U.S. Pat. application Ser. No. 09/006,657 describes a system and method that provides a set of models for recognizing a sequence of symbols, a matching model that identifies desired selection criteria, and a methodology for selecting and retrieving one or more video story segments or sequences based upon the selection criteria. [0013]
  • A significant improvement in the operation of video signal processors, such as video recorders and computerized personal multimedia retrieval systems, can be obtained if the locations of the boundaries of the video programs and commercials are known. There is therefore a need in the art for an improved system and method for locating the boundaries of video programs and the boundaries of commercials within a video data stream. [...] portions of audio signals into audio categories such as speech with background music, speech with background noise, speech with background speech, etc. The audio classifier controller also categorizes sequential portions of audio speech signals in speaker categories when the identity of a speaker can be determined. Each speaker category contains audio speech signals of one individual speaker. Speakers who cannot be identified are categorized in an “unknown speaker” category. [0014]
  • The audio classifier controller of the present invention also comprises a category change detector that detects when a first portion of the audio signal categorized in a first category ceases and when a second portion of the audio signal categorized in a second category begins. That is, the category change detector determines when a category of the audio signal changes. In this manner the audio classifier controller of the present invention continually determines the type of each audio category. [0015]
  • The category change detector also determines when a first portion of the audio signal categorized in a first speaker category ceases and when a second portion of the audio signal categorized in a second speaker category begins. That is, the category change detector determines when a speaker category of the audio signal changes. [0016]
  • The audio classifier controller of the present invention also comprises a category change rate detector that determines the rate at which the audio categories are changing (the “category change rate”). The category change rate detector compares the category change rate to a threshold value. The threshold value can either be a preselected value or can be determined dynamically in response to changing operating conditions. If the category change rate is greater than the threshold value, the existence of a commercial segment may be inferred, which in turn implies the existence of a boundary. [0017]
  • It is an object of the present invention to provide an improved system and method for identifying boundaries using classification of audio signals to obtain at least one audio category for each segment of an audio signal. [0018]
  • It is also an object of the present invention to provide an improved system and method for identifying boundaries using classification of audio signals into audio categories such as silence, music, noise and speech. [0019]
  • It is also an object of the present invention to provide an improved system and method for identifying boundaries using classification of audio signals into audio subcategories such as speech with background music, speech with background noise, music [...] invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form. [0020]
  • Before undertaking the DETAILED DESCRIPTION, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or,” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation, such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, those of ordinary skill in the art should understand that in many, if not most instances, such definitions apply to prior, as well as future uses of such defined words and phrases. [0021]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, wherein like numbers designate like objects, and in which: [0022]
  • FIG. 1 illustrates an exemplary video recorder and a television set, according to an advantageous embodiment of the present invention; [0023]
  • FIG. 2 illustrates a block diagram of the exemplary video recorder, according to an advantageous embodiment of the present invention; [0024]
  • FIG. 3 illustrates a block diagram of an exemplary audio classifier controller, according to an advantageous embodiment of the present invention; and [0025]
  • FIG. 4 illustrates a flow chart depicting the operation of an exemplary audio classifier controller, according to an advantageous embodiment of the present invention. [0026]
  • DETAILED DESCRIPTION OF THE INVENTION
  • FIGS. 1 through 4, discussed below, and the various embodiments used to describe the principles of the present invention in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the invention. Those skilled in the art will understand that the principles of the present invention may be implemented in any suitably arranged audio classification system. [0027]
  • FIG. 1 illustrates exemplary video recorder 150 and television set 105 according to one embodiment of the present invention. Video recorder 150 receives incoming television signals from an external source, such as a cable television service provider (Cable Co.), a local antenna, a satellite, the Internet, or a digital versatile disk (DVD) or a Video Home System (VHS) tape player. Video recorder 150 transmits television signals from a selected channel to television set 105. A channel may be selected manually by the viewer or may be selected automatically by a recording device previously programmed by the viewer. Alternatively, a channel and a video program may be selected automatically by a recording device based upon information from a program profile in the viewer's personal viewing history. [0028]
  • Video recorder 150 comprises infrared (IR) sensor 160 that receives commands (such as Channel Up, Channel Down, Volume Up, Volume Down, Record, Play, Fast Forward (FF), Reverse, and the like) from remote control device 125 operated by the viewer. Television set 105 is a conventional television comprising screen 110, infrared (IR) sensor 115, and one or more manual controls 120 (indicated by a dotted line). IR sensor 115 also receives commands (such as Volume Up, Volume Down, Power On, Power Off) from remote control device 125 operated by the viewer. [0029]
  • It should be noted that video recorder 150 is not limited to receiving a particular type of incoming television signal from a particular type of source. As noted above, the external source may be a cable service provider, a conventional RF broadcast antenna, a satellite dish, an Internet connection, or another local storage device, such as a DVD player or a VHS tape player. The incoming signal may be a digital signal, an analog signal, Internet protocol (IP) packets, or signals in other types of format. [0030]
  • For the purposes of simplicity and clarity in explaining the principles of the present invention, the descriptions that follow shall generally be directed to an embodiment in which video recorder 150 receives (from a cable service provider) incoming analog television signals. Nonetheless, those skilled in the art will understand that the principles of the present invention may readily be adapted for use with digital television signals, wireless broadcast television signals, local storage systems, an incoming stream of IP packets containing MPEG data, and the like. [0031]
  • FIG. 2 illustrates exemplary video recorder 150 in greater detail according to one embodiment of the present invention. Video recorder 150 comprises IR sensor 160, video processor 210, MPEG-2 encoder 220, hard disk drive 230, MPEG-2 decoder/NTSC encoder 240, and controller 250. Video recorder 150 further comprises audio classifier controller 270 and memory 280. Controller 250 directs the overall operation of video recorder 150, including View mode, Record mode, Play mode, Fast Forward (FF) mode, Reverse mode, among others. [0032]
  • In View mode, controller 250 causes the incoming television signal from the cable service provider to be demodulated and processed by video processor 210 and transmitted to television set 105, without storing video signals in (or retrieving video signals from) hard disk drive 230. Video processor 210 contains radio frequency (RF) front-end circuitry for receiving incoming television signals from the cable service provider, tuning to a user-selected channel, and converting the selected RF signal to a baseband television signal (e.g., super video signal) suitable for [...] of the MPEG-1, MPEG-2, and MPEG-4 standards, or with one or more other types of standards. [0033]
  • For the purposes of this application and the claims that follow, hard disk drive 230 is defined to include any mass storage device that is both readable and writable, including, but not limited to, conventional magnetic disk drives and optical disk drives for read/write digital versatile disks (DVD-RW standard and DVD+RW standard), re-writable CD-ROMs, VCR tapes and the like. In fact, hard disk drive 230 need not be fixed in the conventional sense that it is permanently embedded in video recorder 150. Rather, hard disk drive 230 includes any mass storage device that is dedicated to video recorder 150 for the purpose of storing recorded video programs. Thus, hard disk drive 230 may include an attached peripheral drive or removable disk drives (whether embedded or attached), such as a juke box device (not shown) that holds several read/write DVDs or re-writable CD-ROMs. As illustrated schematically in FIG. 2, removable disk drives of this type are capable of receiving and reading re-writable CD-ROM disk 235. [0034]
  • Furthermore, in an advantageous embodiment of the present invention, hard disk drive 230 may include external mass storage devices that video recorder 150 may access and control via a drive or removable disk drives (whether embedded or attached) that reads read/write DVDs or re-writable CD-ROMs. As illustrated schematically in FIG. 2, removable disk drives of this type are capable of receiving and reading re-writable CD-ROM disk 285. [0035]
  • As the video program is recorded on hard disk drive 230 (or, alternatively, after the video program has been recorded on hard disk drive 230), audio classifier controller 270 extracts an audio signal and separates the extracted audio signal into discrete audio categories, including speech, music, noise, and silence. Audio classifier controller 270 sends the extracted voice signals to speaker identifier 330 (shown in FIG. 3). Speaker identifier 330 analyzes the voice signals to identify the person who is speaking. Audio classifier controller 270 inserts time stamps into the extracted and categorized audio data. [0036]
  • A block diagram of audio classifier controller 270 is shown in detail in FIG. 3. Audio classifier controller 270 executes software instructions to identify and classify audio portions of a video program segment using audio categories. Audio classification may be achieved with multidimensional feature based methods that are known in the art. These methods typically use Linear Predictive Coding (LPC) derived cepstral coefficients and their regression coefficients, energy level, average energy, Zero Crossing Rate (ZCR), etc. For further information refer to a paper entitled “Classification of General Audio Data for Content-Based Retrieval” by Dongge Li, Ishwar K. Sethi, Nevenka Dimitrova and Tom McGee, Technical Report, Oakland University, Rochester, Mich., TR-CSE-IIE-00-11, 2000. [0037]
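  • Two of the simpler features named above, average energy and Zero Crossing Rate, can be computed per audio frame as sketched below. The frame representation (a list of signed sample values) and the function names are assumptions for this sketch; ZCR is taken here as the fraction of adjacent sample pairs whose signs differ, which is one common definition.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs >= 2 samples)."""
    signs = [sample >= 0 for sample in frame]
    crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return crossings / (len(frame) - 1)

def average_energy(frame):
    """Mean of the squared sample values in the frame."""
    return sum(sample * sample for sample in frame) / len(frame)

alternating = [1.0, -1.0, 1.0, -1.0]  # made-up frame with a sign change at every step
steady = [0.5, 0.5, 0.5, 0.5]         # made-up frame with no sign changes
```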
  • The source of audio signals for audio classifier controller 270 is identified in FIG. 3 with the reference numeral 300. Audio classifier controller 270 comprises classification algorithm 305 for classifying audio signals from audio signal source 300, four data buffers, 310 through 325, for recording information for four different types of audio categories, speaker identifier 330 containing a speech database of speaker identification data, category change detector 335, category change rate detector 340, and boundary detector 345. [0038]
  • Audio classifier controller 270 receives audio signal segments directly from audio signal source 300 and classifies the audio signal segments with classification algorithm 305. Classification algorithm 305 classifies the audio signals into individual types of audio categories, such as silence, music, noise, speech and any combination of these audio categories. These four types of audio categories are illustrated in FIG. 3. These types are not the only types of audio categories that may be used. It is clear that other types of audio categories may also be identified and classified (e.g., laughter). [0039]
  • Classification algorithm 305 records information for the audio category of “silence” in data buffer 310, records information for the audio category of “music” in data buffer 315, records information for the audio category of “noise” in data buffer 320, and records information for the audio category of “speech” in data buffer 325. Classification algorithm 305 also inserts time stamps into the categorized audio signals. [0040]
  • Speaker identifier 330 contains a speech database of voice identification information for persons whose voices have been previously identified, classified, and recorded. Classification algorithm 305 is capable of accessing the speech database within speaker identifier 330. When classification algorithm 305 classifies an audio signal as a “speech” audio signal, classification algorithm 305 accesses speaker identifier 330 to identify the speaker. If the speaker can be identified, the identity of the speaker is added to the data concerning the “speech” audio category. Classification algorithm 305 is capable of classifying “speech” audio signals from more than one speaker. A first “speech” audio signal may be identified as originating from a first speaker and a second “speech” audio signal may be identified as originating from a second speaker. [0041]
  • “Speech” audio signals from unidentified speakers are classified in an “unknown speaker” category. When a “speech” audio signal from a first unknown speaker is identified, that unknown speaker is added to the speech database and identified as “unknown speaker number 1.” When a “speech” audio signal from a second unknown speaker is identified, that second unknown speaker is added to the speech database and identified as “unknown speaker number 2.” Each time an unknown speaker is detected, the unknown speaker's “speech” audio signal is compared to the “speech” audio signals of each of the unknown speakers in the speech database to see if the unknown speaker is one that has already been added to the speech database. [0042]
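  • The bookkeeping for unknown speakers described above could be sketched as follows. The class name `SpeechDatabase`, the representation of a voice as a short numeric “voiceprint” vector, and the Euclidean distance test with a fixed `match_distance` are all assumptions for illustration; the text does not say how two speech signals are compared.

```python
import math

class SpeechDatabase:
    """Toy stand-in for the unknown-speaker part of the speech database."""

    def __init__(self, match_distance=1.0):
        self.match_distance = match_distance  # hypothetical matching tolerance
        self.unknown = []                     # list of (label, voiceprint)

    def label_unknown(self, voiceprint):
        # Compare against every unknown speaker already in the database.
        for label, stored in self.unknown:
            if math.dist(voiceprint, stored) <= self.match_distance:
                return label
        # No match: register a new numbered unknown speaker.
        label = f"unknown speaker number {len(self.unknown) + 1}"
        self.unknown.append((label, list(voiceprint)))
        return label

db = SpeechDatabase()
first = db.label_unknown([0.0, 0.0])
second = db.label_unknown([5.0, 5.0])
repeat = db.label_unknown([0.1, 0.0])  # close to the first voiceprint
```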
  • Classification algorithm 305 can use this information to determine the number of unknown speakers who speak within a given period of time. The existence of a relatively large number of unknown speakers within a short period of time can indicate the presence of a commercial within the video data stream. [0043]
  • Classification algorithm 305 also updates the speech database in speaker identifier 330 to add voice identification information for new persons who appear in the program portions of the video data stream. These persons may be new actors and actresses, new musicians, newly elected politicians, etc. It is not necessary to update the speech database with voice identification information for new persons who appear in commercials. Therefore, classification algorithm 305 records the number of times that new unknown persons appear and whether they appear in commercials or in the program portions of the video data stream. Classification algorithm 305 then deletes all information relating to new unknown persons who appear in commercials (unless they also happen to appear in the program portion of the video data stream). [0044]
  • After the individual audio signal segments have been categorized in the proper audio categories, classification algorithm 305 sends the classification information to category change detector 335. Category change detector 335 uses time stamp information to detect when a first portion of the audio signal that has been categorized in a first category ceases and when a second portion of the audio signal categorized in a second category begins. Category change detector 335 determines when a category of the audio signal changes and determines the identity of the two categories involved. Specifically, category change detector 335 is capable of determining that an audio signal has changed from a speech signal to a music signal, or that an audio signal has changed from a silent signal to a speech signal, and so on. [0045]
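  • A minimal sketch of this change detection step is given below. It assumes that the classification information arrives as a time-ordered list of `(time_stamp, category)` pairs; that layout and the function name `detect_changes` are hypothetical.

```python
def detect_changes(stamped_categories):
    """stamped_categories: time-ordered list of (time_stamp_s, category).
    Returns (time_stamp_s, previous_category, new_category) for each change."""
    changes = []
    for (_, previous), (time_stamp, current) in zip(stamped_categories,
                                                    stamped_categories[1:]):
        if current != previous:
            changes.append((time_stamp, previous, current))
    return changes

stream = [(0, "silence"), (2, "speech"), (9, "speech"), (15, "music")]
changes = detect_changes(stream)
```

  • Each reported change carries both the time stamp and the identity of the two categories involved, which is the information passed on to the rate detector.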
  • Category change detector 335 also detects when a first portion of the audio signal that has been categorized in a first subcategory ceases and when a second portion of the audio signal categorized in a second subcategory begins. For example, category change detector 335 is capable of determining that an audio signal has changed from a first subcategory of speech with background music to a second subcategory of speech with background noise. [0046]
  • Category change detector 335 also determines when a first portion of the audio signal categorized in a first speaker category ceases and when a second portion of the audio signal categorized in a second speaker category begins. Category change detector 335 determines when a speaker category of the audio signal changes. Category change detector 335 is capable of determining that an audio signal has changed from a first speaker to a second speaker, or from a second speaker to a third speaker, and so on. [0047]
  • [0048] Category change detector 335 sends this information to category change rate detector 340. Category change rate detector 340 detects the rate at which the various categories are changing. Category change rate detector 340 uses time stamp information to calculate how many times each particular category is changing within a unit time (e.g., one minute).
  • [0049] Category change rate detector 340 determines the rate of change for each of the categories. Category change rate detector 340 uses the rate of change for each of the categories to determine an overall change rate. The overall change rate takes into account 1) the change rate of each category, 2) the audio cut rate (i.e., the rate at which all of the categories are changing), 3) the total length of time of each category, and 4) the ratio of the change rate of each category to the total length of time of the category within a given period of time. Category change rate detector 340 then sends the information described above to boundary detector 345.
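The four quantities that feed the overall change rate can be computed as in the following sketch. The specification does not state how they are combined into a single figure, so the sketch returns them separately. The function name, the tuple layout of `segments`, and the convention that a category "changes" each time the audio switches into it are assumptions.

```python
from collections import Counter

def change_rate_features(segments, window_start, window_end):
    """Compute per-category change rates for one period of time.

    segments: list of (start, end, category) tuples sorted by start time,
    with times in seconds. Rates are expressed per minute.
    """
    minutes = (window_end - window_start) / 60.0
    entries = Counter()    # times the audio switched into each category
    duration = Counter()   # seconds spent in each category
    cuts = 0               # audio cuts: any category change at all
    prev = None
    for start, end, cat in segments:
        lo, hi = max(start, window_start), min(end, window_end)
        if hi <= lo:
            continue       # segment lies outside the period
        duration[cat] += hi - lo
        if prev is not None and cat != prev:
            entries[cat] += 1
            cuts += 1
        prev = cat
    rate = {c: entries[c] / minutes for c in duration}
    return {
        "rate_per_min": rate,                                        # item 1
        "cut_rate_per_min": cuts / minutes,                          # item 2
        "duration_s": dict(duration),                                # item 3
        "rate_to_duration": {c: rate[c] / duration[c] for c in duration},  # item 4
    }
```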
  • [0050] Boundary detector 345 uses the information (including the overall change rate) to locate the boundaries of video programs and commercials. It is known that commercials often contain diverse and rapidly changing audio categories. Commercials usually have a larger number of speaker changes (within a given time) than do other types of video segments. If boundary detector 345 receives change rate information that shows that the rate of change of speakers is above a preselected threshold value, then boundary detector 345 may infer that a commercial is in progress. An appropriate threshold value may be obtained empirically by measuring the rate of change of speakers for a large number of commercials.
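The threshold test on the speaker change rate might look like the sketch below. The specification says only that the threshold may be obtained empirically from a large number of commercials; the particular rule used here (mean minus `k` sample standard deviations) and both function names are assumptions chosen for illustration.

```python
import statistics

def empirical_threshold(commercial_rates, k=1.0):
    """Derive a threshold from speaker change rates measured on known commercials.

    Hypothetical rule: mean minus k sample standard deviations, so that most
    of the measured commercials fall above the threshold.
    """
    return statistics.mean(commercial_rates) - k * statistics.stdev(commercial_rates)

def commercial_suspected(speaker_changes_per_min, threshold):
    """Infer that a commercial may be in progress when the rate exceeds the threshold."""
    return speaker_changes_per_min > threshold
```

In practice the measured rates would come from a labeled collection of commercials, and `k` would be tuned against the false-alarm rate on program material.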
  • [0051] Boundary detector 345 may assign a “weighting factor” to each change in each category. The weighting factor may be a number that represents the relative importance assigned to the category change in assessing the likelihood of locating a boundary at the point where the particular change in category occurs. For example, if it is determined that a change from “silence” to “music” is more likely to be associated with an initial boundary, then the numerical factor that represents that particular category change may be multiplied by a “weighting factor” to increase the relative impact of that particular category change in determining the likelihood of the existence of an initial boundary.
  • [0052] In addition to the method described above, the “weighting factors” can be automatically computed directly from the category change features. In the multidimensional feature space used to describe audio classifier controller 270, each category (e.g., speech, music) has a mean vector that represents the centroid of that category. The distance between any two of those mean vectors is also a measure of the significance of a change between the corresponding categories. The distance between the mean vectors can therefore be used to quantify the importance of a category change.
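Computing weighting factors from the distances between category centroids can be sketched as follows. The Euclidean metric, the normalization by the largest distance, and the names `centroid` and `change_weights` are assumptions; the specification states only that the distance between mean vectors quantifies the importance of a category change.

```python
import math
from itertools import combinations

def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def change_weights(samples_by_category):
    """Map each unordered pair of categories to a weight in [0, 1].

    samples_by_category: {category: [feature_vector, ...]}.
    The weight is the Euclidean distance between the two category centroids,
    divided by the largest such distance (an assumed normalization).
    """
    means = {c: centroid(v) for c, v in samples_by_category.items()}
    dist = {frozenset((a, b)): math.dist(means[a], means[b])
            for a, b in combinations(means, 2)}
    largest = max(dist.values())
    if largest == 0:
        return {pair: 0.0 for pair in dist}
    return {pair: d / largest for pair, d in dist.items()}
```

Under this scheme a change between two acoustically distant categories receives a larger weight than a change between two similar ones.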
  • [0053] Boundary detector 345 uses the audio categories (such as speech, silence, music and noise), the audio subcategories (such as speech with background noise, music with background noise), and the speaker categories (such as identified speakers and unknown speakers). To determine the boundary of a commercial segment, boundary detector 345 selects the size of a time window. For example, for a commercial the size of the time window can be selected to be twenty (20) seconds. Boundary detector 345 performs a sliding window high-level feature extraction and classification process to extract the following high-level features: 1) the rate of change of each category (i.e., how many times each category appears during the time window), 2) the length of each category within the time window (n-values for n categories), 3) the rate of change of audio cuts (any category change) computed with the corresponding weighting factors, and 4) the average audio cut distance. These four features are sent to a classifier (not shown) within boundary detector 345 (e.g., a nearest neighbor classifier) that determines whether the audio segment within the time window is or is not a commercial segment. If the classifier is a probabilistic classifier (e.g., a Bayesian classifier), then the classifier determines a probability that the audio segment within the time window is or is not a commercial segment.
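The sliding window feature extraction and nearest neighbor classification for commercial segments can be sketched as below. Several details are assumptions rather than statements from the specification: the function names, the fixed `CATEGORIES` tuple, the reading of "average audio cut distance" as the mean time between consecutive cuts, and the five second step between windows. A practical classifier would also scale the features before measuring distances.

```python
import math

CATEGORIES = ("speech", "music", "silence", "noise")

def window_features(segments, t0, width=20.0, weights=None):
    """Extract the four high-level features for the window [t0, t0 + width).

    segments: (start, end, category) tuples sorted by start time, in seconds.
    weights: optional {(before, after): factor}; unlisted changes weigh 1.0.
    """
    weights = weights or {}
    t1 = t0 + width
    count = dict.fromkeys(CATEGORIES, 0)     # feature 1: appearances per category
    length = dict.fromkeys(CATEGORIES, 0.0)  # feature 2: seconds per category
    cut_times, weighted_cuts, prev = [], 0.0, None
    for start, end, cat in segments:
        lo, hi = max(start, t0), min(end, t1)
        if hi <= lo:
            continue                         # segment lies outside the window
        length[cat] += hi - lo
        if cat != prev:
            count[cat] += 1
            if prev is not None:             # an audio cut inside the window
                cut_times.append(lo)
                weighted_cuts += weights.get((prev, cat), 1.0)
        prev = cat
    gaps = [b - a for a, b in zip(cut_times, cut_times[1:])]
    avg_gap = sum(gaps) / len(gaps) if gaps else width   # feature 4
    return ([count[c] for c in CATEGORIES] + [length[c] for c in CATEGORIES]
            + [weighted_cuts / width, avg_gap])          # feature 3, then 4

def nearest_neighbor(features, labeled_examples):
    """Return the label of the closest (feature_vector, label) example."""
    return min(labeled_examples, key=lambda ex: math.dist(features, ex[0]))[1]

def slide(segments, total, width=20.0, step=5.0, **kw):
    """Yield (window_start, features) for each window that fits within total seconds."""
    t = 0.0
    while t + width <= total:
        yield t, window_features(segments, t, width, **kw)
        t += step
```

A busy window with several short segments yields high counts and a short average cut distance, while a window containing a single long speech segment yields the opposite, which is what allows the classifier to separate the two cases.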
  • [0054] To determine the boundary of a program segment, boundary detector 345 selects the size of a time window. For example, for a program segment the size of the time window can be selected to be five (5) minutes.
  • [0055] Boundary detector 345 then performs a sliding window high-level feature extraction and classification process to extract the following high-level features: 1) the rate of change of each category (i.e., how many times each category appears during the time window), 2) the length of each category within the time window (adjusted by the weighting factor), 3) the rate change of audio cuts (any category change), and 4) the average audio cut distance. These four features are sent to a probabilistic classifier (not shown) within boundary detector 345 (e.g., a Bayesian classifier) that determines the probability that the audio segment within the time window belongs to a particular class. For example, the audio segment may belong to a dialog, or to a news story, or to a music video, or to a crowd scene with shouting, etc. The output values from the sliding window are subjected to an analysis for a global minimum among the different segments and to an overall analysis (e.g., for the last one hour of time).
  • [0056] The result is then analyzed with the help of heuristics concerning program boundaries. Examples of heuristics concerning program boundaries include 1) a musical audio logo is usually present at the start of a news program, 2) there is usually a commercial close to the end of every program, 3) credits at the end of a movie are usually shown with music in the background, and 4) the identity of the speaker (or speakers) almost always changes between programs.
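The probabilistic classification of program segment windows described above can be illustrated with a Gaussian naive Bayes sketch. The specification names a Bayesian classifier only as an example and gives no model details, so the Gaussian likelihood, the variance floor, the function names, and the two-element feature vectors used in the usage example are all assumptions.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(examples, var_floor=1e-3):
    """Fit per-class priors, feature means, and feature variances.

    examples: list of (feature_vector, label) pairs.
    var_floor keeps variances away from zero for small training sets.
    """
    by_label = defaultdict(list)
    for x, y in examples:
        by_label[y].append(x)
    model = {}
    for y, rows in by_label.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / n, var_floor)
                     for col, m in zip(zip(*rows), means)]
        model[y] = (n / len(examples), means, variances)
    return model

def posterior(model, x):
    """Return {label: probability} for feature vector x, normalized to sum to 1."""
    log_p = {}
    for y, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_p[y] = lp
    top = max(log_p.values())                 # subtract the maximum for stability
    unnorm = {y: math.exp(lp - top) for y, lp in log_p.items()}
    total = sum(unnorm.values())
    return {y: p / total for y, p in unnorm.items()}
```

For example, with hypothetical training vectors of the form [audio cuts per minute, minutes of speech], a window resembling the "dialog" examples receives most of the probability mass for that class.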
  • [0057] FIG. 4 illustrates flow chart 400 depicting the operation of audio classifier controller 270, according to an advantageous embodiment of the present invention. Flow chart 400 depicts one advantageous method of operation of the present invention in audio classifier controller 270 in video recorder 150. Audio classifier controller 270 receives an audio signal from an audio signal source 300 (step 410). Audio classifier controller 270 classifies the audio signal into audio categories (and subcategories) using classification algorithm 305 (step 420). Classification algorithm 305 identifies individual speakers in each segment in the “speech” audio category using information from speaker identifier 330 (step 430). Category change detector 335 then determines when each audio category (or subcategory) changes (step 440). Category change rate detector 340 then determines the rate of change of audio categories (or subcategories) (step 450). Boundary detector 345 then uses the rate of change information of audio categories (or subcategories) for multifeature classification to locate boundaries of video programs and commercials (step 460).
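Steps 440 through 460 of flow chart 400 can be tied together in a simplified sketch. It assumes that steps 410 through 430 have already produced time-stamped frames labeled with a category and, for speech, a speaker. It also replaces the multifeature classifier of step 460 with a single threshold on the change rate, purely to keep the example short. The function name, frame layout, window size, and threshold value are hypothetical.

```python
def locate_boundaries(labeled_frames, window=20.0, threshold=6.0):
    """Return times at which the audio switches between calm and busy windows.

    labeled_frames: list of (time_s, category, speaker_or_None) tuples in
    time order, standing in for the output of steps 410-430.
    threshold: changes per minute above which a window is treated as a
    commercial (a simplification of the classifier in step 460).
    """
    if not labeled_frames:
        return []
    # Step 440: a change occurs wherever the category or speaker differs
    # from the preceding frame.
    change_times = [cur[0] for prev, cur in zip(labeled_frames, labeled_frames[1:])
                    if prev[1:] != cur[1:]]
    boundaries, previous_state = [], None
    t, end = float(labeled_frames[0][0]), labeled_frames[-1][0]
    while t < end:
        # Step 450: change rate within the current window, per minute.
        changes = sum(t <= c < t + window for c in change_times)
        rate = changes * 60.0 / window
        # Step 460: a boundary is placed where the window state flips.
        state = rate > threshold
        if previous_state is not None and state != previous_state:
            boundaries.append(t)
        previous_state = state
        t += window
    return boundaries
```

With forty seconds of a single speaker followed by twenty seconds that alternate between music and a second speaker every two seconds, the sketch reports one boundary at the forty second mark.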

Claims (24)

What is claimed is:
1. For use in a video signal processor, a system for locating boundaries of video programs and commercials comprising:
an audio classifier controller capable of receiving at least one audio category of at least one segment of an audio signal, and capable of determining at least one rate of change of said at least one audio category, and capable of locating at least one of said boundaries by comparing said at least one rate of change of said at least one audio category with a threshold value.
2. The system as claimed in claim 1 wherein said audio classifier controller comprises a classification algorithm that is capable of classifying audio signals to obtain at least one audio category for each segment of said audio signal.
3. The system as claimed in claim 2 wherein said classification algorithm is capable of classifying audio signals into audio categories of silence, music, noise and speech.
4. The system as claimed in claim 3 wherein said audio classifier controller comprises a speaker identifier comprising a speech database that contains voice identification information of persons whose voices have been identified, and wherein said classification algorithm is capable of accessing said speech database of said speaker identifier and classifying speech audio signals of persons whose voices are in said speech database of said speaker identifier as audio categories.
5. The system as claimed in claim 4 wherein said speaker identifier comprises an unknown speaker database that contains voice information of persons whose voices have not been identified,
wherein said classification algorithm is capable of accessing said unknown speaker database and determining the number of unknown speakers who speak within a given period of time, and
wherein said classification algorithm is capable of updating said speech database in said speaker identifier to add voice identification information for newly identified speakers.
6. The system as claimed in claim 1 wherein said audio classifier controller comprises a category change detector capable of receiving audio categories of segments of said audio signal, and capable of determining when an audio category of said audio signal changes, and capable of determining the identities of said audio categories before and after said change of audio category.
7. The system as claimed in claim 6 wherein said category change detector is capable of detecting audio subcategories of segments of said audio signal, and is capable of determining when an audio subcategory of said audio signal changes, and is capable of determining the identities of said audio subcategories before and after said change of audio subcategory.
8. The system as claimed in claim 6 wherein said audio classifier controller comprises a category change rate detector capable of receiving information from said category change detector concerning audio category changes, and capable of calculating the rates at which said audio category changes occur.
9. The system as claimed in claim 8 wherein said category change rate detector is capable of determining an overall change rate using information from the change rate of each category, the audio cut rate, the total length of time of each category, and the ratio of the change rate of each category to the total length of time of the category within a given period of time.
10. The system as claimed in claim 8 wherein said audio classifier controller comprises a boundary detector capable of receiving information from said category change rate detector concerning audio category rate changes, and capable of classifying at least one feature concerning at least one audio category rate change using a multifeature classifier to locate at least one boundary of a video program segment.
11. The system as claimed in claim 10 wherein said boundary detector is capable of assigning a weighting factor to each change in each category, said weighting factor comprising a number that represents the relative importance assigned to the category change in assessing the likelihood of locating a boundary at a point where a particular change in category occurs.
12. The system as claimed in claim 10 wherein said boundary detector is capable of receiving information from said category change rate detector concerning an overall change rate determined by using information from the change rate of each category, the audio cut rate, the total length of time of each category, and the ratio of the change rate of each category to the total length of time of the category within a given period of time, said boundary detector capable of classifying at least one feature concerning at least one overall change rate using a multifeature classifier to locate at least one boundary of a video program segment.
13. A video signal processor capable of locating boundaries of video programs and commercials comprising:
an audio classifier controller capable of receiving at least one audio category of at least one segment of an audio signal, and capable of determining at least one rate of change of said at least one audio category, and capable of locating at least one of said boundaries by comparing said at least one rate of change of said at least one audio category with a threshold value.
14. The video signal processor as claimed in claim 13 wherein said video signal processor comprises one of:
a television receiver, a video recorder, a device for receiving streaming video data signals, and a computerized personal multimedia retrieval system.
15. An audio signal processor capable of locating boundaries of audio programs and commercials comprising:
an audio classifier controller capable of receiving at least one audio category of at least one segment of an audio signal, and capable of determining at least one rate of change of said at least one audio category, and capable of locating at least one of said boundaries by comparing said at least one rate of change of said at least one audio category with a threshold value.
16. The audio signal processor as claimed in claim 15 wherein said audio signal processor comprises one of:
a radio receiver, an audio recorder, a device for receiving a source of streaming audio data signals, and a computerized personal audio multimedia retrieval system.
20. The method as claimed in claim 19, further comprising the steps of:
accessing a speech database in a speaker identifier within said audio classifier controller that contains voice identification information of persons who have been identified; and
classifying speech audio signals of persons whose voices are in said speech database as audio categories.
21. The method as claimed in claim 20, further comprising the steps of:
accessing an unknown speaker database in said speaker identifier that contains voice information of persons who have not been identified;
determining the number of unknown speakers who speak within a given period of time; and
updating said speech database in said speaker identifier to add voice identification information for newly identified speakers.
22. The method as claimed in claim 17, further comprising the steps of:
receiving audio categories of said audio signal in a category change detector of said audio classifier controller;
determining in said category change detector when an audio category of said audio signal changes; and
determining in said category change detector the identities of said audio categories before and after said change of audio category.
23. The method as claimed in claim 22, further comprising the steps of:
receiving audio subcategories of said audio signal in a category change detector of said audio classifier controller;
determining in said category change detector when an audio subcategory of said audio signal changes; and
determining in said category change detector the identities of said audio subcategories before and after said change of audio subcategory.
24. The method as claimed in claim 22, further comprising the steps of:
receiving in a category change rate detector information from said category change detector concerning audio category changes; and
calculating the rates at which said audio category changes occur.
25. The method as claimed in claim 24, further comprising the steps of:
determining in said category change rate detector an overall change rate using information from the change rate of each category, the audio cut rate, the total length of time of each category, and the ratio of the change rate of each category to the total length of time of the category within a given period of time; and
classifying at least one feature concerning at least one overall change rate using a multifeature classifier to locate at least one boundary of a video segment.
26. The method as claimed in claim 24, further comprising the steps of:
receiving information in a boundary detector of said audio classifier controller from said category change rate detector concerning audio category rate changes; and
classifying at least one feature concerning at least one audio category rate change using a multifeature classifier to locate at least one boundary of a video program segment containing said audio signal.
27. The method as claimed in claim 26, further comprising the step of:
assigning a weighting factor to each change in each category,
wherein said weighting factor comprises a number that represents the relative importance assigned to the category change in assessing the likelihood of locating a boundary at a point where a particular change in category occurs.
US09/746,077 1998-01-13 2000-12-22 System and method for locating program boundaries and commercial boundaries using audio categories Expired - Fee Related US6819863B2 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US09/746,077 US6819863B2 (en) 1998-01-13 2000-12-22 System and method for locating program boundaries and commercial boundaries using audio categories
CN01808461A CN1426563A (en) 2000-12-22 2001-12-10 System and method for locating boundaries between video programs and commercial using audio categories
EP01272141A EP1417593A1 (en) 2000-12-22 2001-12-10 System and method for locating boundaries between video programs and commercial using audio categories
JP2002553671A JP2004517518A (en) 2000-12-22 2001-12-10 System and method for locating program boundaries and commercial boundaries using audio categories
PCT/IB2001/002432 WO2002052440A1 (en) 2000-12-22 2001-12-10 System and method for locating boundaries between video programs and commercial using audio categories

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/006,657 US6363380B1 (en) 1998-01-13 1998-01-13 Multimedia computer system with story segmentation capability and operating program therefor including finite automation video parser
US09/746,077 US6819863B2 (en) 1998-01-13 2000-12-22 System and method for locating program boundaries and commercial boundaries using audio categories

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/006,657 Continuation-In-Part US6363380B1 (en) 1998-01-13 1998-01-13 Multimedia computer system with story segmentation capability and operating program therefor including finite automation video parser

Publications (3)

Publication Number Publication Date
US20020080286A1 US20020080286A1 (en) 2002-06-27
US20040201784A9 US20040201784A9 (en) 2004-10-14
US6819863B2 US6819863B2 (en) 2004-11-16

Family

ID=24999385

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/746,077 Expired - Fee Related US6819863B2 (en) 1998-01-13 2000-12-22 System and method for locating program boundaries and commercial boundaries using audio categories

Country Status (5)

Country Link
US (1) US6819863B2 (en)
EP (1) EP1417593A1 (en)
JP (1) JP2004517518A (en)
CN (1) CN1426563A (en)
WO (1) WO2002052440A1 (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020103963A1 (en) * 2001-01-30 2002-08-01 Pioneer Corporation Information recording and reproducing apparatus, method of appending title information, and program recording medium having recorded title information appending procedure program
US20030033321A1 (en) * 2001-07-20 2003-02-13 Audible Magic, Inc. Method and apparatus for identifying new media content
US20040163106A1 (en) * 2003-02-01 2004-08-19 Audible Magic, Inc. Method and apparatus to identify a work received by a processing system
US20050102135A1 (en) * 2003-11-12 2005-05-12 Silke Goronzy Apparatus and method for automatic extraction of important events in audio signals
US20050154681A1 (en) * 2001-04-05 2005-07-14 Audible Magic Corporation Copyright detection and protection system and method
US20050232580A1 (en) * 2004-03-11 2005-10-20 Interdigital Technology Corporation Control of device operation within an area
US20060034177A1 (en) * 2004-07-28 2006-02-16 Audible Magic Corporation System for distributing decoy content in a peer to peer network
US20060172063A1 (en) * 2004-12-06 2006-08-03 Interdigital Technology Corporation Method and apparatus for detecting portable electronic device functionality
US20070077028A1 (en) * 1999-08-09 2007-04-05 British Sky Broadcasting Limited Receivers for television signals
US20070082607A1 (en) * 2005-10-11 2007-04-12 Lg Electronics Inc. Digital broadcast system and method for a mobile terminal
US20070099602A1 (en) * 2005-10-28 2007-05-03 Microsoft Corporation Multi-modal device capable of automated actions
US20070250777A1 (en) * 2006-04-25 2007-10-25 Cyberlink Corp. Systems and methods for classifying sports video
US20080235267A1 (en) * 2005-09-29 2008-09-25 Koninklijke Philips Electronics, N.V. Method and Apparatus For Automatically Generating a Playlist By Segmental Feature Comparison
US20100281499A1 (en) * 1999-11-18 2010-11-04 Harville Michael L Iterative, maximally probable, batch-mode commercial detection for audiovisual content
US7917645B2 (en) 2000-02-17 2011-03-29 Audible Magic Corporation Method and apparatus for identifying media content presented on a media playing device
US8006314B2 (en) 2007-07-27 2011-08-23 Audible Magic Corporation System for identifying content of digital data
US8082150B2 (en) 2001-07-10 2011-12-20 Audible Magic Corporation Method and apparatus for identifying an unknown work
US8086445B2 (en) 2000-11-03 2011-12-27 Audible Magic Corporation Method and apparatus for creating a unique audio signature
US8199651B1 (en) 2009-03-16 2012-06-12 Audible Magic Corporation Method and system for modifying communication flows at a port level
US8972481B2 (en) 2001-07-20 2015-03-03 Audible Magic, Inc. Playlist generation method and apparatus
US20150082338A1 (en) * 2000-03-28 2015-03-19 Compass Innovations, LLC Audiovisual Content Presentation Dependent On Metadata
US9081778B2 (en) 2012-09-25 2015-07-14 Audible Magic Corporation Using digital fingerprints to associate data with a work
US9160837B2 (en) 2011-06-29 2015-10-13 Gracenote, Inc. Interactive streaming content apparatus, systems and methods

Families Citing this family (68)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7877774B1 (en) * 1999-04-19 2011-01-25 At&T Intellectual Property Ii, L.P. Browsing and retrieval of full broadcast-quality video
US9171545B2 (en) * 1999-04-19 2015-10-27 At&T Intellectual Property Ii, L.P. Browsing and retrieval of full broadcast-quality video
DE19929166A1 (en) * 1999-06-25 2001-03-22 Tektronix Inc Procedure for learning protocol rules from observed communication processes
US20020141730A1 (en) * 2001-03-27 2002-10-03 Koninklijke Philips Electronics N.V. TV recorder with inoperative settop box functions
JP4546682B2 (en) * 2001-06-26 2010-09-15 パイオニア株式会社 Video information summarizing apparatus, video information summarizing method, and video information summarizing processing program
JP4615166B2 (en) * 2001-07-17 2011-01-19 パイオニア株式会社 Video information summarizing apparatus, video information summarizing method, and video information summarizing program
DE10148351B4 (en) * 2001-09-29 2007-06-21 Grundig Multimedia B.V. Method and device for selecting a sound algorithm
US20030108334A1 (en) * 2001-12-06 2003-06-12 Koninklijke Philips Elecronics N.V. Adaptive environment system and method of providing an adaptive environment
US7006976B2 (en) * 2002-01-29 2006-02-28 Pace Micro Technology, Llp Apparatus and method for inserting data effects into a digital data stream
US7336890B2 (en) * 2003-02-19 2008-02-26 Microsoft Corporation Automatic detection and segmentation of music videos in an audio/video stream
US7738704B2 (en) * 2003-03-07 2010-06-15 Technology, Patents And Licensing, Inc. Detecting known video entities utilizing fingerprints
US7694318B2 (en) * 2003-03-07 2010-04-06 Technology, Patents & Licensing, Inc. Video detection and insertion
US7809154B2 (en) 2003-03-07 2010-10-05 Technology, Patents & Licensing, Inc. Video entity recognition in compressed digital video streams
US7599554B2 (en) * 2003-04-14 2009-10-06 Koninklijke Philips Electronics N.V. Method and apparatus for summarizing a music video using content analysis
US7130623B2 (en) * 2003-04-17 2006-10-31 Nokia Corporation Remote broadcast recording
CN1836287B (en) * 2003-08-18 2012-03-21 皇家飞利浦电子股份有限公司 Video abstracting
US7786987B2 (en) * 2003-09-25 2010-08-31 The Nielsen Company (Us), Llc Methods and apparatus to detect an operating state of a display based on visible light
US9027043B2 (en) * 2003-09-25 2015-05-05 The Nielsen Company (Us), Llc Methods and apparatus to detect an operating state of a display
JP4143017B2 (en) * 2003-10-30 2008-09-03 株式会社東芝 Recording apparatus and recording method
US7179980B2 (en) * 2003-12-12 2007-02-20 Nokia Corporation Automatic extraction of musical portions of an audio stream
US20050138655A1 (en) * 2003-12-22 2005-06-23 Randy Zimler Methods, systems and storage medium for managing digital rights of segmented content
US20050177618A1 (en) * 2003-12-22 2005-08-11 Randy Zimler Methods, systems and storage medium for managing bandwidth of segmented content
CA2899107C (en) 2003-12-30 2017-12-05 The Nielsen Company (Us), Llc Methods and apparatus to distinguish a signal originating from a local device from a broadcast signal
TW200537941A (en) * 2004-01-26 2005-11-16 Koninkl Philips Electronics Nv Replay of media stream from a prior change location
US7280737B2 (en) * 2004-02-23 2007-10-09 Warner Bros. Entertainment Inc. Method and apparatus for discouraging commercial skipping
US7409407B2 (en) * 2004-05-07 2008-08-05 Mitsubishi Electric Research Laboratories, Inc. Multimedia event detection and summarization
JP4387408B2 (en) * 2004-06-18 2009-12-16 パナソニック株式会社 AV content processing apparatus, AV content processing method, AV content processing program, and integrated circuit used for AV content processing apparatus
CN102523063A (en) 2004-08-09 2012-06-27 尼尔森(美国)有限公司 Methods and apparatus to monitor audio/visual content from various sources
JP2006058874A (en) * 2004-08-20 2006-03-02 Mitsubishi Electric Research Laboratories Inc Method to detect event in multimedia
JP2008527940A (en) * 2005-01-19 2008-07-24 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Apparatus and method for analyzing a content stream containing content items
US20060195859A1 (en) * 2005-02-25 2006-08-31 Richard Konig Detecting known video entities taking into account regions of disinterest
US7617188B2 (en) * 2005-03-24 2009-11-10 The Mitre Corporation System and method for audio hot spotting
US7690011B2 (en) 2005-05-02 2010-03-30 Technology, Patents & Licensing, Inc. Video stream modification to defeat detection
WO2007022250A2 (en) 2005-08-16 2007-02-22 Nielsen Media Research, Inc. Display device on/off detection methods and apparatus
WO2007036888A2 (en) * 2005-09-29 2007-04-05 Koninklijke Philips Electronics N.V. A method and apparatus for segmenting a content item
CN101322123A (en) * 2005-11-30 2008-12-10 皇家飞利浦电子股份有限公司 Method and system for updating user profiles
JP4698453B2 (en) * 2006-02-28 2011-06-08 三洋電機株式会社 Commercial detection device, video playback device
JP4759745B2 (en) * 2006-06-21 2011-08-31 国立大学法人北海道大学 Video classification device, video classification method, video classification program, and computer-readable recording medium
US8107541B2 (en) * 2006-11-07 2012-01-31 Mitsubishi Electric Research Laboratories, Inc. Method and system for video segmentation
JP4919879B2 (en) 2007-06-07 2012-04-18 ソニー株式会社 Information processing apparatus and method, and program
US8515257B2 (en) * 2007-10-17 2013-08-20 International Business Machines Corporation Automatic announcer voice attenuation in a presentation of a televised sporting event
WO2009090705A1 (en) * 2008-01-16 2009-07-23 Panasonic Corporation Recording/reproduction device
CN101534352A (en) * 2008-03-10 2009-09-16 华为技术有限公司 Line status detecting method, device and predictive outbound system
JP4656202B2 (en) * 2008-07-22 2011-03-23 ソニー株式会社 Information processing apparatus and method, program, and recording medium
US8180712B2 (en) 2008-09-30 2012-05-15 The Nielsen Company (Us), Llc Methods and apparatus for determining whether a media presentation device is in an on state or an off state
US8793717B2 (en) * 2008-10-31 2014-07-29 The Nielsen Company (Us), Llc Probabilistic methods and apparatus to determine the state of a media device
US20100169908A1 (en) * 2008-12-30 2010-07-01 Nielsen Christen V Methods and apparatus to enforce a power off state of an audience measurement device during shipping
US8375404B2 (en) * 2008-12-30 2013-02-12 The Nielsen Company (Us), Llc Methods and apparatus to enforce a power off state of an audience measurement device during shipping
US8156517B2 (en) 2008-12-30 2012-04-10 The Nielsen Company (U.S.), Llc Methods and apparatus to enforce a power off state of an audience measurement device during shipping
WO2010151785A1 (en) 2009-06-25 2010-12-29 Visible World Inc. Time compressing video content
DE112009005215T8 (en) 2009-08-04 2013-01-03 Nokia Corp. Method and apparatus for audio signal classification
US8532863B2 (en) * 2009-09-28 2013-09-10 Sri International Audio based robot control and navigation
EP2840801B1 (en) * 2010-02-26 2017-09-20 Comcast Cable Communications, LLC Video stream segmentation and classification to skip advertisements.
US10116902B2 (en) * 2010-02-26 2018-10-30 Comcast Cable Communications, Llc Program segmentation of linear transmission
CN102956230B (en) * 2011-08-19 2017-03-01 杜比实验室特许公司 The method and apparatus that song detection is carried out to audio signal
EP2758956B1 (en) 2011-09-23 2021-03-10 Digimarc Corporation Context-based smartphone sensor logic
JP2015506158A (en) 2011-12-19 2015-02-26 ザ ニールセン カンパニー (ユーエス) エルエルシー Method and apparatus for crediting a media presentation device
KR20130071873A (en) * 2011-12-21 2013-07-01 삼성전자주식회사 Content playing apparatus and control method thereof
US9692535B2 (en) 2012-02-20 2017-06-27 The Nielsen Company (Us), Llc Methods and apparatus for automatic TV on/off detection
EP2830062B1 (en) 2012-03-21 2019-11-20 Samsung Electronics Co., Ltd. Method and apparatus for high-frequency encoding/decoding for bandwidth extension
CN104380222B (en) * 2012-03-28 2018-03-27 泰瑞·克劳福德 Sector type is provided and browses the method and system for having recorded dialogue
US10133538B2 (en) * 2015-03-27 2018-11-20 Sri International Semi-supervised speaker diarization
US9924224B2 (en) 2015-04-03 2018-03-20 The Nielsen Company (Us), Llc Methods and apparatus to determine a state of a media presentation device
US11228817B2 (en) 2016-03-01 2022-01-18 Comcast Cable Communications, Llc Crowd-sourced program boundaries
US10945030B2 (en) 2018-03-30 2021-03-09 Alphonso Inc. Detection of potential commercial by detection and analysis of transitions in video content
US11245958B2 (en) * 2018-11-16 2022-02-08 Roku, Inc. Detection of mute and compensation therefor during media replacement event
US11024291B2 (en) 2018-11-21 2021-06-01 Sri International Real-time class recognition for an audio stream
US11763242B2 (en) * 2021-12-09 2023-09-19 Z21 Labs, Inc. Automatic evaluation of recorded interactions

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100206804B1 (en) 1996-08-29 1999-07-01 Koo Ja-hong The automatic selection recording method of highlight part
JPH10174039A (en) 1996-12-06 1998-06-26 Matsushita Electric Ind Co Ltd Program recording device
JPH10224724A (en) 1997-02-04 1998-08-21 Sony Corp Television signal recorder, its method, television signal reproducing device and its method, and television signal recording and reproducing device and recording medium
US6236395B1 (en) * 1999-02-01 2001-05-22 Sharp Laboratories Of America, Inc. Audiovisual information management system
US6469749B1 (en) 1999-10-13 2002-10-22 Koninklijke Philips Electronics N.V. Automatic signature-based spotting, learning and extracting of commercials and other video content

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070077028A1 (en) * 1999-08-09 2007-04-05 British Sky Broadcasting Limited Receivers for television signals
US20130014157A1 (en) * 1999-11-18 2013-01-10 Interval Licensing Llc Iterative, maximally probable, batch-mode commercial detection for audiovisual content
US8630536B2 (en) * 1999-11-18 2014-01-14 Interval Licensing Llc Iterative, maximally probable, batch-mode commercial detection for audiovisual content
US8724967B2 (en) * 1999-11-18 2014-05-13 Interval Licensing Llc Iterative, maximally probable, batch-mode commercial detection for audiovisual content
US20100281499A1 (en) * 1999-11-18 2010-11-04 Harville Michael L Iterative, maximally probable, batch-mode commercial detection for audiovisual content
US10194187B2 (en) 2000-02-17 2019-01-29 Audible Magic Corporation Method and apparatus for identifying media content presented on a media playing device
US7917645B2 (en) 2000-02-17 2011-03-29 Audible Magic Corporation Method and apparatus for identifying media content presented on a media playing device
US9049468B2 (en) 2000-02-17 2015-06-02 Audible Magic Corporation Method and apparatus for identifying media content presented on a media playing device
US10313714B2 (en) * 2000-03-28 2019-06-04 Tivo Solutions Inc. Audiovisual content presentation dependent on metadata
US20150082338A1 (en) * 2000-03-28 2015-03-19 Compass Innovations, LLC Audiovisual Content Presentation Dependent On Metadata
US8086445B2 (en) 2000-11-03 2011-12-27 Audible Magic Corporation Method and apparatus for creating a unique audio signature
US20020103963A1 (en) * 2001-01-30 2002-08-01 Pioneer Corporation Information recording and reproducing apparatus, method of appending title information, and program recording medium having recorded title information appending procedure program
US20050154680A1 (en) * 2001-04-05 2005-07-14 Audible Magic Corporation Copyright detection and protection system and method
US20050154681A1 (en) * 2001-04-05 2005-07-14 Audible Magic Corporation Copyright detection and protection system and method
US7707088B2 (en) 2001-04-05 2010-04-27 Audible Magic Corporation Copyright detection and protection system and method
US7711652B2 (en) 2001-04-05 2010-05-04 Audible Magic Corporation Copyright detection and protection system and method
US9589141B2 (en) 2001-04-05 2017-03-07 Audible Magic Corporation Copyright detection and protection system and method
US7797249B2 (en) 2001-04-05 2010-09-14 Audible Magic Corporation Copyright detection and protection system and method
US8775317B2 (en) 2001-04-05 2014-07-08 Audible Magic Corporation Copyright detection and protection system and method
US8484691B2 (en) 2001-04-05 2013-07-09 Audible Magic Corporation Copyright detection and protection system and method
US8645279B2 (en) 2001-04-05 2014-02-04 Audible Magic Corporation Copyright detection and protection system and method
US8082150B2 (en) 2001-07-10 2011-12-20 Audible Magic Corporation Method and apparatus for identifying an unknown work
US7877438B2 (en) * 2001-07-20 2011-01-25 Audible Magic Corporation Method and apparatus for identifying new media content
US8972481B2 (en) 2001-07-20 2015-03-03 Audible Magic, Inc. Playlist generation method and apparatus
US10025841B2 (en) 2001-07-20 2018-07-17 Audible Magic, Inc. Play list generation method and apparatus
US20030033321A1 (en) * 2001-07-20 2003-02-13 Audible Magic, Inc. Method and apparatus for identifying new media content
US20040163106A1 (en) * 2003-02-01 2004-08-19 Audible Magic, Inc. Method and apparatus to identify a work received by a processing system
US8332326B2 (en) 2003-02-01 2012-12-11 Audible Magic Corporation Method and apparatus to identify a work received by a processing system
US20050102135A1 (en) * 2003-11-12 2005-05-12 Silke Goronzy Apparatus and method for automatic extraction of important events in audio signals
US8635065B2 (en) * 2003-11-12 2014-01-21 Sony Deutschland Gmbh Apparatus and method for automatic extraction of important events in audio signals
US20050232580A1 (en) * 2004-03-11 2005-10-20 Interdigital Technology Corporation Control of device operation within an area
US8130746B2 (en) 2004-07-28 2012-03-06 Audible Magic Corporation System for distributing decoy content in a peer to peer network
US20060034177A1 (en) * 2004-07-28 2006-02-16 Audible Magic Corporation System for distributing decoy content in a peer to peer network
US7948375B2 (en) 2004-12-06 2011-05-24 Interdigital Technology Corporation Method and apparatus for detecting portable electronic device functionality
US20060172063A1 (en) * 2004-12-06 2006-08-03 Interdigital Technology Corporation Method and apparatus for detecting portable electronic device functionality
US20080235267A1 (en) * 2005-09-29 2008-09-25 Koninklijke Philips Electronics, N.V. Method and Apparatus For Automatically Generating a Playlist By Segmental Feature Comparison
US7826793B2 (en) * 2005-10-11 2010-11-02 Lg Electronics Inc. Digital broadcast system and method for a mobile terminal
US20070082607A1 (en) * 2005-10-11 2007-04-12 Lg Electronics Inc. Digital broadcast system and method for a mobile terminal
US20070099602A1 (en) * 2005-10-28 2007-05-03 Microsoft Corporation Multi-modal device capable of automated actions
US7778632B2 (en) * 2005-10-28 2010-08-17 Microsoft Corporation Multi-modal device capable of automated actions
US8682654B2 (en) * 2006-04-25 2014-03-25 Cyberlink Corp. Systems and methods for classifying sports video
US20070250777A1 (en) * 2006-04-25 2007-10-25 Cyberlink Corp. Systems and methods for classifying sports video
US8006314B2 (en) 2007-07-27 2011-08-23 Audible Magic Corporation System for identifying content of digital data
US8112818B2 (en) 2007-07-27 2012-02-07 Audible Magic Corporation System for identifying content of digital data
US9268921B2 (en) 2007-07-27 2016-02-23 Audible Magic Corporation System for identifying content of digital data
US8732858B2 (en) 2007-07-27 2014-05-20 Audible Magic Corporation System for identifying content of digital data
US10181015B2 (en) 2007-07-27 2019-01-15 Audible Magic Corporation System for identifying content of digital data
US9785757B2 (en) 2007-07-27 2017-10-10 Audible Magic Corporation System for identifying content of digital data
US8199651B1 (en) 2009-03-16 2012-06-12 Audible Magic Corporation Method and system for modifying communication flows at a port level
US20160019876A1 (en) * 2011-06-29 2016-01-21 Gracenote, Inc. Machine-control of a device based on machine-detected transitions
US10134373B2 (en) * 2011-06-29 2018-11-20 Gracenote, Inc. Machine-control of a device based on machine-detected transitions
US9160837B2 (en) 2011-06-29 2015-10-13 Gracenote, Inc. Interactive streaming content apparatus, systems and methods
US10783863B2 (en) 2011-06-29 2020-09-22 Gracenote, Inc. Machine-control of a device based on machine-detected transitions
US11417302B2 (en) 2011-06-29 2022-08-16 Gracenote, Inc. Machine-control of a device based on machine-detected transitions
US9608824B2 (en) 2012-09-25 2017-03-28 Audible Magic Corporation Using digital fingerprints to associate data with a work
US9081778B2 (en) 2012-09-25 2015-07-14 Audible Magic Corporation Using digital fingerprints to associate data with a work
US10698952B2 (en) 2012-09-25 2020-06-30 Audible Magic Corporation Using digital fingerprints to associate data with a work

Also Published As

Publication number Publication date
WO2002052440A1 (en) 2002-07-04
US6819863B2 (en) 2004-11-16
US20020080286A1 (en) 2002-06-27
CN1426563A (en) 2003-06-25
EP1417593A1 (en) 2004-05-12
JP2004517518A (en) 2004-06-10

Similar Documents

Publication Publication Date Title
US6819863B2 (en) System and method for locating program boundaries and commercial boundaries using audio categories
US6973256B1 (en) System and method for detecting highlights in a video program using audio properties
US6993245B1 (en) Iterative, maximally probable, batch-mode commercial detection for audiovisual content
US7599554B2 (en) Method and apparatus for summarizing a music video using content analysis
US6998527B2 (en) System and method for indexing and summarizing music videos
US7046911B2 (en) System and method for reduced playback of recorded video based on video segment priority
US7362950B2 (en) Method and apparatus for controlling reproduction of video contents
KR100903160B1 (en) Method and apparatus for signal processing
JP2005173569A (en) Apparatus and method for classifying audio signal
US20020083473A1 (en) System and method for accessing a multimedia summary of a video program
US8103149B2 (en) Playback system, apparatus, and method, information processing apparatus and method, and program therefor
JP2007522722A (en) Play a media stream from the pre-change position
KR20040101245A (en) Use of transcript information to find key audio/video segments
US20060224616A1 (en) Information processing device and method thereof
KR20060136413A (en) Replay of media stream from a prior change location

Legal Events

Date Code Title Description
AS Assignment

Owner name: PHILIPS ELECTRONICS NORTH AMERICA CORPORATION, NEW

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DAGTAS, SERHAN;DIMITROVA, NEVENKA;REEL/FRAME:011407/0728

Effective date: 20000929

AS Assignment

Owner name: KONINKLIJKE PHILIPS ELECTRONICS N.V., NETHERLANDS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PHILIPS ELECTRONICS NORTH AMERICA CORPORATION;REEL/FRAME:015725/0691

Effective date: 20040823

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20081116