WO2005086081A1 - Detecting known video entities - Google Patents

Detecting known video entities

Info

Publication number
WO2005086081A1
Authority
WO
WIPO (PCT)
Prior art keywords
video stream
video
advertisement
channel
fingerprints
Prior art date
Application number
PCT/GB2005/000772
Other languages
French (fr)
Inventor
Richard König
Charles Eldering
Rainer Lienhart
Christine Lienhart
Douglas J. Ryder
Original Assignee
Half Minute Media Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Half Minute Media Ltd filed Critical Half Minute Media Ltd
Priority to DE602005019273T priority Critical patent/DE602005019273D1/en
Priority to AT05717850T priority patent/ATE457501T1/en
Priority to EP05717850A priority patent/EP1730668B1/en
Publication of WO2005086081A1 publication Critical patent/WO2005086081A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/785Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using colour or luminescence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635Overlay text, e.g. embedded captions in a TV program
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/214Specialised server platform, e.g. server located in an airplane, hotel, hospital
    • H04N21/2143Specialised server platform, e.g. server located in an airplane, hotel, hospital located in a single building, e.g. hotel, hospital or museum
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/254Management at additional data server, e.g. shopping server, rights management server
    • H04N21/2543Billing, e.g. for subscription services
    • H04N21/2547Third Party Billing, e.g. billing of advertiser
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866Management of end-user data
    • H04N21/25891Management of end-user data being end-user preferences
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2668Creating a channel for a dedicated end-user group, e.g. insertion of targeted commercials based on end-user profiles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/414Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41415Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance involving a public display, viewable by several users in a public space outside their home, e.g. movie theatre, information kiosk
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4331Caching operations, e.g. of an advertisement for later insertion during playback
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/458Scheduling content for creating a personalised stream, e.g. by combining a locally stored advertisement with an incoming stream; Updating operations, e.g. for OS modules ; time-related management operations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/812Monomedia components thereof involving advertisement data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835Generation of protective data, e.g. certificates
    • H04N21/8352Generation of protective data, e.g. certificates involving content or source identification data, e.g. Unique Material Identifier [UMID]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • H04N21/8405Generation or processing of descriptive data, e.g. content descriptors represented by keywords
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region
    • H04N5/147Scene change detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/08Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division
    • H04N7/087Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division with signal insertion during the vertical blanking interval only
    • H04N7/088Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division with signal insertion during the vertical blanking interval only the inserted signal being digital
    • H04N7/0887Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division with signal insertion during the vertical blanking interval only the inserted signal being digital for the transmission of programme or channel identifying signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/16Analogue secrecy systems; Analogue subscription systems
    • H04N7/162Authorising the user terminal, e.g. by paying; Registering the use of a subscription channel, e.g. billing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/08Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division
    • H04N7/087Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division with signal insertion during the vertical blanking interval only
    • H04N7/088Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division with signal insertion during the vertical blanking interval only the inserted signal being digital

Definitions

  • Advertisements are commonplace in most broadcast video, including video received from satellite transmissions, cable television networks, over-the-air broadcasts, digital subscriber line (DSL) systems, and fiber optic networks. Advertising plays an important role in the economics of entertainment programming in that advertisements are used to subsidize or pay for the development of the content. As an example, broadcast of sports such as football games, soccer games, basketball games and baseball games is paid for by advertisers. Even though subscribers may pay for access to that sports programming, such as through satellite or cable network subscriptions, the advertisements appearing during the breaks in the sport are sold by the network producing the transmission of the event, and subsidize the costs of the programming. Advertisements included in the programming may not be applicable to the individuals watching the programming.
  • Hotels sometimes have internal channels containing advertising directed at the guests, but this tends to be an "infomercial" channel that does not have significant viewership.
  • The entertainment programming video streams may be purchased on a subscription basis from a satellite or cable operator, or may simply be taken from over-the-air broadcasts.
  • In some cases the hotel operator offers Video on Demand (VoD) services, allowing consumers to choose a movie or other program for their particular viewing. These movies are presented on a fee basis, and although there are typically some types of advertising before the movie, viewers are not subjected to advertising during the movie.
  • Hospitals also provide video programming to the patients, who may pay for the programming based on a daily fee, or in some instances on a pay-per-view basis.
  • The advertising in the programming is not specifically directed at the patients, but is simply the advertising put into the programming by the content provider.
  • Residential viewers are also presented advertisements in the vast majority of programming they view. These advertisements may or may not be the appropriate advertisements for that viewer or family.
  • Detection of the advertisements may require access to signals indicating the start and end of an advertisement. In the absence of these signals, another means is required for detecting the start and end of an advertisement or advertisement break. There is a need for a system and method that allows for the insertion of advertisements in video streams.
  • One method includes calculating features about an incoming video stream. These features may include color histograms, color coherence vectors (CCVs), and evenly or randomly highly subsampled representations of the original video (all known as fingerprints).
  • The fingerprints of the incoming video stream are compared to a database of fingerprints for known advertisements, video sequences known to precede commercial breaks (ad intros), and/or sequences known to follow commercial breaks (ad outros).
  • If a match is found between the incoming video stream and a known advertisement or ad intro, the incoming video stream is associated with the known advertisement and/or ad intro and a targeted advertisement may be substituted.
  • The fingerprint of the incoming video stream (the calculated fingerprint) may be compared to a plurality of fingerprints for known entities (e.g., ads, intros, outros) within the database (the known fingerprints).
  • the comparison may be done based on small segments of a video stream at a time.
  • A determination is made as to whether the comparison between the calculated fingerprint and the known fingerprints within the database exceeds some threshold level of dissimilarity. If the comparison exceeds the threshold for certain known fingerprints within the database, the comparison of the calculated fingerprint to those known fingerprints stops for the time being. For those known fingerprints for which the comparison was below the threshold level of dissimilarity, the comparison continues. At each step of the comparison, those known fingerprints exceeding the threshold level of dissimilarity are dropped.
  • The process continues until one of the known fingerprints has a comparison that exceeds a threshold level of similarity (indicating a match), or until the comparisons for all of the known fingerprints within the database exceed the dissimilarity threshold, at which point the video stream is not associated with any of the known fingerprints.
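The pruning search described above can be sketched in a few lines. This is an illustrative reconstruction, not code from the patent: the per-frame feature vectors, the L1 distance measure, and the threshold values are all assumptions.

```python
def l1_distance(a, b):
    """Sum of absolute differences (L1-Norm) between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def match_stream(stream_features, known_fingerprints,
                 dissim_threshold=50.0, match_frames=35):
    """Compare per-frame features of the incoming stream against every known
    fingerprint, dropping candidates whose running dissimilarity exceeds the
    threshold; a candidate still alive after match_frames frames is a match.

    stream_features: iterable of per-frame feature vectors.
    known_fingerprints: dict mapping entity name -> list of per-frame vectors.
    """
    running = {name: 0.0 for name in known_fingerprints}
    for i, feat in enumerate(stream_features):
        for name in list(running):
            fp = known_fingerprints[name]
            if i >= len(fp):
                del running[name]          # stream outlasted this fingerprint
                continue
            running[name] += l1_distance(feat, fp[i])
            if running[name] > dissim_threshold:
                del running[name]          # too dissimilar: stop comparing it
        if not running:
            return None                    # no known entity matches
        if i + 1 >= match_frames:
            return min(running, key=running.get)   # most similar survivor
    return None
```

Keeping a running dissimilarity total per candidate and dropping a candidate as soon as its total crosses the threshold is what keeps the database search cheap enough for real-time use.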
  • Certain portions of the fingerprints may be excluded. For example, a channel banner may be excluded from the calculation of the dissimilarity so as not to skew the results of comparisons to the database of known fingerprints.
  • A channel change may be detected when the fingerprint of the incoming video stream matches a fingerprint in the database associated with a channel change (e.g., blank frames after the channel banner is removed).
  • Fingerprints may be calculated for certain portions of the video stream, such as the portions of the video stream that contain channel identification data. These portions of the video may be analyzed to determine the channel associated with the video stream: fingerprints for the portions are calculated and compared to fingerprints for known channel identification data.
  • the channel may also be detected by comparing detected advertisement breaks with known advertisement breaks for specific channels.
  • The channel may also be detected by comparing fingerprints for the incoming stream with fingerprints for known channels stored in a database. Determining the channel may affect the ads that are inserted and is useful in reporting the programs into which targeted advertising has been inserted. Determining the channel also allows remote manual triggering of ad insertion, while detection of a channel change event is used as a trigger to end ad insertion. Furthermore, if a network frequently overlays or covers a portion of frames in the video stream, that portion of each frame can be excluded during the calculation of the dissimilarity so as not to skew the results of comparisons to the database of known fingerprints.
  • Certain portions may be identified and excluded. According to one embodiment, certain portions may be excluded from the database of known fingerprints as well as from the calculated fingerprint, as in the sketch below.
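A minimal sketch of such region exclusion, assuming frames arrive as HxWx3 numpy RGB arrays; the `exclude_box` coordinates and bin count are hypothetical, and the same mask would be applied when building the database of known fingerprints:

```python
import numpy as np

def masked_histogram(frame, exclude_box=None, bins=8):
    """Color histogram of a frame with a rectangular region (e.g., where a
    channel banner or network overlay appears) excluded from the calculation."""
    frame = frame.astype(np.int64)
    mask = np.ones(frame.shape[:2], dtype=bool)
    if exclude_box is not None:
        y0, y1, x0, x1 = exclude_box
        mask[y0:y1, x0:x1] = False       # ignore the overlay region entirely
    pixels = frame[mask]                  # (N, 3) RGB values of kept pixels
    # quantize each channel to `bins` levels and build a joint histogram
    q = pixels * bins // 256
    idx = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3)
    return hist / max(len(pixels), 1)     # normalize so the excluded area does not skew totals
```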
  • When targeted advertisements are being inserted, the system continues to generate fingerprints for the incoming video stream and to compare them to known fingerprints stored in the database, in order to look for outros or programming that would indicate the end of the commercial break in the incoming video stream.
  • Channel changes or electronic program guide (EPG) activations may also be detected. The detection of the end of a commercial break may cause the system to instantly return to the incoming video stream.
  • Alternatively, the advertisement currently being inserted may be completed before returning to the incoming video stream.
  • Time parameters may be set that automatically return to the video stream even if an end of the commercial break is not detected. After a certain time the system may present a pre-outro (e.g., a still image) that is displayed until the end of the commercial break is detected; a sketch of this fail-safe follows. Calculating fingerprints for the incoming video stream and comparing them to a database of fingerprints of known entities can also be used to detect certain programs and/or scenes and to record, bookmark or stop recording those programs and/or scenes.
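The fail-safe return logic might look like the following sketch. The `player` object and the `detect_break_end()` helper are hypothetical stand-ins for the fingerprint-matching machinery described above:

```python
import time

def manage_ad_break(player, detect_break_end, max_ad_secs=120,
                    pre_outro="pre_outro.png"):
    """Play inserted ads until the incoming stream shows the break has ended.

    detect_break_end() is a hypothetical callable that inspects fingerprints
    of the incoming stream and returns True on an outro/programming match or
    on a channel change / EPG activation.
    """
    start = time.monotonic()
    showing_still = False
    while not detect_break_end():
        if not showing_still and time.monotonic() - start > max_ad_secs:
            player.show_still(pre_outro)   # fail-safe: hold a pre-outro still
            showing_still = True
        time.sleep(0.04)                   # poll roughly once per frame
    player.return_to_stream()              # break ended: resume incoming video
```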
  • FIG. 1 illustrates an exemplary content delivery system, according to one embodiment
  • FIG. 2 illustrates an exemplary configuration for local detection of advertisements within a video programming stream, according to one embodiment
  • FIG. 3 illustrates an exemplary pixel grid for a video frame and an associated color histogram, according to one embodiment
  • FIG. 4 illustrates an exemplary comparison of two color histograms, according to one embodiment
  • FIG. 5 illustrates an exemplary pixel grid for a video frame and associated color histogram and CCVs, according to one embodiment
  • FIG. 6 illustrates an exemplary comparison of color histograms and CCVs for two images, according to one embodiment
  • FIG. 6A illustrates edge pixels for two exemplary consecutive images, according to one embodiment
  • FIG. 6B illustrates macroblocks for two exemplary consecutive images, according to one embodiment
  • FIG. 7 illustrates an exemplary pixel grid for a video frame with a plurality of regions sampled, according to one embodiment
  • FIG. 8 illustrates two exemplary pixel grids having a plurality of regions for sampling and coherent and incoherent pixels identified, according to one embodiment
  • FIG. 9 illustrates exemplary comparisons of the pixel grids of FIG. 8 based on color histograms for the entire frame, CCVs for the entire frame and average color for the plurality of regions, according to one embodiment
  • FIG. 10 illustrates an exemplary flow-chart of the advertisement matching process, according to one embodiment
  • FIG. 11 illustrates an exemplary flow-chart of an initial dissimilarity determination process, according to one embodiment
  • FIG. 12 illustrates an exemplary initial comparison of calculated features for an incoming stream versus initial portions of fingerprints for a plurality of known advertisements, according to one embodiment
  • FIG. 13 illustrates an exemplary initial comparison of calculated features for an incoming stream versus an expanded initial portion of a fingerprint for a known advertisement, according to one embodiment
  • FIG. 14 illustrates an exemplary expanding window comparison of the features of the incoming video stream and the features of the fingerprints of known advertisements, according to one embodiment
  • FIG. 15 illustrates an exemplary pixel grid divided into sections, according to one embodiment
  • FIG. 16 illustrates an exemplary comparison of two whole images and corresponding sections of the two images, according to one embodiment
  • FIG. 17 illustrates an exemplary comparison of pixel grids by sections, according to one embodiment
  • FIG. 18 illustrates several exemplary images with different overlays, according to one embodiment
  • FIG. 19A illustrates an exemplary impact on pixel grids of an overlay being placed on corresponding image, according to one embodiment
  • FIG. 19B illustrates an exemplary pixel grid with a region of interest excluded, according to one embodiment
  • FIG. 20 illustrates an exemplary image to be fingerprinted that is divided into four sections and has a portion to be excluded from fingerprinting, according to one embodiment.
  • FIG. 21 illustrates an exemplary image to be fingerprinted that is divided into a plurality of regions that are evenly distributed across the image and has a portion to be excluded from fingerprinting, according to one embodiment
  • FIG. 22 illustrates exemplary channel change images, according to one embodiment
  • FIG. 23 illustrates an image with expected locations of a channel banner and channel identification information within the channel banner identified, according to one embodiment.
  • FIG. 1 illustrates an exemplary content delivery system 100.
  • the system 100 includes a broadcast facility 110 and receiving/presentation locations.
  • the broadcast facility 110 transmits content to the receiving/presentation facilities and the receiving/presentation facilities receive the content and present the content to subscribers.
  • the broadcast facility 110 may be a satellite transmission facility, a head-end, a central office or other distribution center.
  • the broadcast facility 110 may transmit the content to the receiving/presentation locations via satellite 170 or via a network 180.
  • the network 180 may be the Internet, a cable television network (e.g., hybrid fiber cable, coaxial), a switched digital video network (e.g., digital subscriber line, or fiber optic network), broadcast television network, other wired or wireless network, public network, private network, or some combination thereof.
  • the receiving/presentation facilities may include residence 120, pubs, bars and or restaurants 130, hotels and/or motels 140, business 150, and/or other establishments 160.
  • The content delivery system 100 may also include a digital video recorder (DVR). The methods and system described herein can be applied to DVRs, both with respect to content being recorded as well as content being played back.
  • The content delivery network 100 may deliver many different types of content. However, for ease of understanding, the remainder of this disclosure will concentrate on programming, and specifically video programming. Many programming channels include advertisements with the programming. The advertisements may be provided before and/or after the programming, may be provided in breaks during the programming, or may be provided within the programming (e.g., product placements, bugs, banner ads).
  • Of particular interest are advertisement opportunities that are provided between programming, whether between programs (e.g., after one program and before another) or during programming (e.g., advertisement breaks in programming, during time outs in sporting events).
  • The advertisements may subsidize the cost of the programming and may provide additional sources of revenue for the broadcaster (e.g., satellite service provider, cable service provider).
  • It is also possible to detect particular scenes of interest or to generically detect scene changes.
  • A segment of video, a particular image, or a scene change between images which is of interest can be considered to be a video entity.
  • The library of video segments, images, scene changes between images, or fingerprints of those images can be considered to be comprised of known video entities.
  • substituting advertisements may be beneficial and/or desired.
  • Substitution of advertisements can be performed locally (e.g., residence 120, pub 130, hotel 140) or may be performed somewhere in the video distribution system 100 (e.g., head end, nodes) and then delivered to a specific location (e.g., pub 130), a specific geographic region (e.g., neighborhood), subscribers having specific traits (e.g., demographics) or some combination thereof.
  • The remaining disclosure will focus on local substitution for the substitution and delivery of targeted advertisements within the system 100.
  • Substituting advertisements requires that advertisements be detected within the programming. The advertisements may be detected using information that is embedded in the program stream to define where the advertisements are.
  • For analog programming, cue tones may be embedded in the programming to mark the advertisement boundaries.
  • For digital programming, digital cue messages may be embedded in the programming to identify the advertisement boundaries.
  • a targeted advertisement or targeted advertisements may be substituted in place of a default advertisement, default advertisements, or an entire advertisement block.
  • The local detection of cue tones (or cue tone messages) and substitution of targeted advertisements may be performed by local system equipment, including a set top box (STB) or DVR.
  • not all programming streams include cue tones or cue tone messages.
  • cue tones may not be transmitted to the STB or DVR since the broadcaster may desire to suppress them to prevent automated ad detection (and potential deletion).
  • Techniques for detecting advertisements without the use of cue tones or cue messages include manual detection (e.g., individuals detecting the start of advertisements) and automatic detection. Regardless of which technique is used, the detection can be performed at various locations (e.g., pubs 130, hotels 140). Alternatively, the detection can be performed external to the locations, where the external detection points may be part of the system (e.g., node, head end) or may be external to the system. The external detection points would inform the locations (e.g., pubs 130, hotels 140) of the detection of an advertisement or advertisement block. The communications from the external detection point to the locations could be via the network 180. For ease of understanding this disclosure, we will focus on local detection. FIG. 2 illustrates an exemplary configuration for local detection of advertisements within a video programming stream.
  • The incoming video stream is received by a network interface device (NID) 200.
  • The type of network interface device will be dependent on how the incoming video stream is being delivered to the location. For example, if the content is being delivered via satellite (e.g., 170 of FIG. 1), the NID 200 will be a satellite dish (illustrated as such) for receiving the incoming video stream.
  • The incoming video stream is provided to an STB 210 (a tuner) that tunes to a desired channel, and possibly decodes the channel if it is encrypted or compressed. It should be noted that the STB 210 may also be capable of recording programming, as is the case with a DVR or video cassette recorder (VCR).
  • the STB 210 forwards the desired channel (video stream) to a splitter 220 that provides the video stream to a detection/replacement device 230 and a selector (e.g., A/B switch) 240.
  • the detection/replacement device 230 detects and replaces advertisements by creating a presentation stream consisting of programming with targeted advertisements.
  • The selector 240 can select which signal (video stream or presentation stream) to output to an output device 250 (e.g., a television).
  • the selector 240 may be controlled manually by an operator, may be controlled by a signal/message (e.g., ad break beginning message, ad break ending message) that was generated and transmitted from an upstream detection location, and/or may be controlled by the detection/replacement device 230.
  • the splitter 220 and the selector 240 may be used as a bypass circuit in case of an operations issue or problem in the detection/replacement device 230.
  • the default mode for the selector 240 may be to pass-through the incoming video stream.
  • Manually switching the selector 240 to the detection/replacement device 230 may cause the detection/replacement device 230 to provide advertisements (e.g., targeted advertisements) to be displayed to the subscriber (viewer, user). That is, in this mode the detection/replacement device 230 need not detect advertisements and insert them in the program stream to create a presentation stream. Accordingly, the manual switching of the selector 240 may be equivalent to switching a channel from a program content channel to an advertisement channel.
  • this embodiment would have no copyright issues associated therewith as no recording, analyzing, or manipulation of the program stream would be required.
  • Although the splitter 220, the detection/replacement device 230, and the selector 240 are all illustrated as separate components, they are not limited thereby. Rather, all the components could be part of a single component (e.g., the splitter 220 and the selector 240 contained inside the detection/replacement device 230; or the splitter 220, the detection/replacement device 230, and the selector 240 all part of the STB 210).
  • Automatic techniques for detecting advertisements may include detecting aspects (features) of the video stream that indicate an advertisement is about to be displayed or is being displayed (feature based detection).
  • Advertisements are often played at a higher volume than programming, so a sudden volume increase (without commands from a user) may indicate an advertisement.
  • Many times several dark monochrome (black) frames of video are presented prior to the start of an advertisement so the detection of these types of frames may indicate an advertisement.
  • the above noted techniques may be used individually or in combination with one another. These techniques may be utilized along with temporal measurements, since commercial breaks often begin within a certain known time range. However, these techniques may miss advertisements if the volume increases or if the display of black frames is missing or does not meet a detection threshold. Moreover, these techniques may result in false positives (detection of an advertisement when one is not present) as the programming may include volume increases or sequences of black frames.
  • Scene/shot breaks are more common during advertisements since action/scene changes stimulate interest in the advertisement. Additionally, there are typically more action and scene changes during an advertisement block. Accordingly, another possible automatic feature based technique for detecting advertisements is the detection of scene/shot breaks (or frequent scene/shot breaks) in the video programming.
  • Scene breaks may be detected by comparing consecutive frames of video. Comparing the actual images of consecutive frames may require significant processing.
  • Instead, scene/shot breaks may be detected by computing characteristics for consecutive frames of video and comparing these characteristics. The computed characteristics may include, for example, a color histogram or a color coherence vector (CCV).
  • a color histogram is an analysis of the number of pixels of various colors within an image or frame.
  • Prior to calculating a color histogram, the frame may be scaled to a particular size (e.g., number of pixels), the colors may be reduced to the most significant bits for each color of the red, green, blue (RGB) spectrum, and the image may be smoothed by filtering.
  • FIG. 3 illustrates an exemplary pixel grid 300 for a video frame and an associated color histogram 310.
  • The pixel grid 300 is 4x4 (16 pixels) and each grid square is identified by a six digit number, with each two digit portion representing a specific color (R, G, B). Below each two digit portion is the color identifier for that color. For example, the upper right grid square has 000000 as its six digit number, which equates to R0, G0 and B0.
  • The color histogram 310 is the count of each color in the overall pixel grid. For example, there are 9 R0s in FIG. 3.
  • FIG. 4 illustrates an exemplary comparison of two color histograms 400, 410. The comparison entails computing the difference/distance between the two. The distance may be computed, for example, by summing the absolute differences.
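A short sketch of this kind of histogram construction and comparison, assuming pixels are (R, G, B) tuples in the 0-255 range; the quantization depth and function names are illustrative:

```python
def histogram(frame_pixels, bits_per_channel=2):
    """Count quantized colors in an iterable of (r, g, b) pixels (0-255 each)."""
    shift = 8 - bits_per_channel
    counts = {}
    for r, g, b in frame_pixels:
        key = (r >> shift, g >> shift, b >> shift)  # keep most significant bits
        counts[key] = counts.get(key, 0) + 1
    return counts

def hist_distance(h1, h2, norm=1):
    """L1-Norm (norm=1) or L2-Norm (norm=2) distance between two histograms."""
    total = 0
    for k in set(h1) | set(h2):
        d = abs(h1.get(k, 0) - h2.get(k, 0))
        total += d if norm == 1 else d * d
    return total
```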
  • CCVs divide the colors from the color histogram into coherent and incoherent ones based on how the colors are grouped together.
  • Coherent colors are colors that are grouped together in more than a threshold number of connected pixels; incoherent colors are colors that are either not grouped together or are grouped together in fewer than a threshold number of pixels. For example, if 8 is the threshold and there are only 7 red pixels grouped (connected) together, then those 7 red pixels are considered incoherent.
  • FIG. 5 illustrates an exemplary pixel grid 500 for a video frame and associated color histogram 510 and CCVs 520, 530.
  • The pixel grid 500 has the same number associated with each of the colors (RGB) so that a single number represents all colors, and the pixel grid 500 is limited to 16 pixels.
  • There are some colors that are grouped together (having at least one other pixel of the same color at a connected pixel, one of the 8 touching pixels) and some colors that stand alone. For example, two color 1s, four color 2s, and four (two sets of 2) color 3s are grouped (connected), while three color 0s, one color 1, and two color 3s are not grouped (connected).
  • the color histogram 510 indicates the number of each color.
  • a first CCV 520 illustrates the number of coherent and incoherent colors assuming that the threshold grouping for being considered coherent is 2 (that is a grouping of two pixels of the same color means the pixels are coherent for that color).
  • a second CCV 530 illustrates the number of coherent and incoherent colors assuming that the threshold grouping was 3.
  • The colors impacted by the change in threshold are color 1 (which went from 2 coherent and 1 incoherent to 0 coherent and 3 incoherent) and color 3 (which went from 4 coherent and 2 incoherent to 0 coherent and 6 incoherent).
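A CCV of this kind can be computed with a simple connected-component pass. The sketch below is an assumed implementation (8-connectivity flood fill over a grid of quantized colors), not the patent's own code:

```python
def ccv(grid, tau=2):
    """Compute a color coherence vector for a 2-D grid of quantized colors.

    Returns {color: [coherent_count, incoherent_count]}: pixels in an
    8-connected same-color component of size >= tau are coherent.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    result = {}
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color = grid[r][c]
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:                      # flood fill one component
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and grid[ny][nx] == color):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            entry = result.setdefault(color, [0, 0])
            entry[0 if size >= tau else 1] += size
    return result
```

Applied to the FIG. 5 grid, `ccv(grid, tau=2)` and `ccv(grid, tau=3)` would yield the two CCVs 520 and 530 described above.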
  • the threshold used for detecting scene changes or other parameters may be adjusted accordingly.
  • FIG. 6 illustrates an exemplary comparison of color histograms 600, 610 and CCVs 620, 630 for two images.
  • The differences may be calculated, for example, by summing the absolute differences (L1-Norm) or by summing the squares of the differences (L2-Norm).
  • For simplicity, assume the image contains only 9 pixels and that each pixel has the same bit identifier for each of the colors in the RGB spectrum.
  • The color histograms 600, 610 are identical, so the difference (ΔCH) is 0 (the calculation is illustrated for summing the absolute differences).
  • The difference (ΔCCV) between the two CCVs 620, 630 is 8 (based on the sum of absolute differences method).
  • Another possible feature based automatic advertisement detection technique includes detecting action (e.g., fast moving objects, hard cuts, zooms, changing colors) as an advertisement may have more action in a short time than the programming.
  • action can be determined using edge change ratios (ECR).
  • ECR detects structural changes in a scene, such as entering, exiting and moving objects. The changes are detected by comparing the edge pixels of consecutive images (frames), n and n-1. Edge pixels are the pixels that form the exterior of distinct objects within a scene (e.g., a person, a house). A determination is made as to the total number of edge pixels for the two consecutive images, σ_n and σ_{n-1}, the number of edge pixels exiting the first frame, X^out_{n-1}, and the number of edge pixels entering the second frame, X^in_n.
  • FIG. 6A illustrates two exemplary consecutive images, n-1 and n. Edge pixels for each of the images are shaded. The total number of edge pixels for image n-1, σ_{n-1}, is 43, while the total number of edge pixels for image n, σ_n, is 32. The pixels circled in image n-1 are not part of image n (they exited image n-1).
  • The number of edge pixels exiting image n-1, X^out_{n-1}, is 22.
  • The pixels circled in image n were not part of image n-1 (they entered image n).
  • The number of edge pixels entering image n, X^in_n, is 13.
  • The ECR is the greater of the two ratios X^out_{n-1} / σ_{n-1} (22/43 ≈ 0.51) and X^in_n / σ_n (13/32 ≈ 0.41). Accordingly, the ECR value is approximately 0.51.
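The ECR computation can be sketched as follows. Edge maps are assumed to be boolean numpy arrays (True = edge pixel); a production implementation would typically derive them with an edge detector and dilate them before comparison, which is omitted here:

```python
import numpy as np

def ecr(edges_prev, edges_curr):
    """Edge change ratio between two consecutive boolean edge maps."""
    sigma_prev = edges_prev.sum()            # total edge pixels in frame n-1
    sigma_curr = edges_curr.sum()            # total edge pixels in frame n
    x_out = np.logical_and(edges_prev, ~edges_curr).sum()  # edges that exited
    x_in = np.logical_and(edges_curr, ~edges_prev).sum()   # edges that entered
    if sigma_prev == 0 or sigma_curr == 0:
        return 0.0
    return max(x_out / sigma_prev, x_in / sigma_curr)

# With the FIG. 6A numbers (43 and 32 edge pixels, 22 exiting, 13 entering),
# this yields max(22/43, 13/32) ≈ 0.51.
```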
  • action can be determined using a motion vector length (MVL).
  • The MVL divides images (frames) into macroblocks (e.g., 16x16 pixels). A determination is then made as to where each macroblock is in the next image (e.g., the distance between the macroblock's positions in consecutive images). The search may be limited to a certain number of pixels (e.g., 20) in each direction. If the location of the macroblock cannot be determined, a predefined maximum distance may be used (e.g., 20 pixels in each direction).
  • The motion vector length for each macroblock can be calculated as the square root of the sum of the squares of the differences between the x and y coordinates: √((x1 − x2)² + (y1 − y2)²).
  • FIG. 6B illustrates two exemplary consecutive images, n and n-1.
  • the images are divided into a plurality of macroblocks (as illustrated each macroblock is 4 (2x2) pixels).
  • Four specific macroblocks are identified with shading and are labeled 1-4 in the first image n-1.
  • a maximum search area is defined around the 4 specific macroblocks as a dotted line (as illustrated the search areas is one macroblock in each direction).
  • The four macroblocks are identified with shading on the second image n. Comparing the specified macroblocks between images reveals that the first and second macroblocks moved within the defined search area, the third macroblock did not move, and the fourth macroblock moved out of the search area.
  • the length vector for the macroblocks is 1.41 for MB1, 2.83 for MB2, 0 for MB3, and 4.24 for MB4.
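A block-matching sketch of the MVL measure, with illustrative block size, search radius, and SAD threshold; frames are assumed to be 2-D grayscale numpy arrays:

```python
import numpy as np

def motion_vector_lengths(prev, curr, block=16, radius=20, max_sad=None):
    """For each macroblock of prev, search a bounded window of curr for the
    best match (minimum sum of absolute differences, SAD) and return the
    motion vector length sqrt(dx^2 + dy^2). A block whose best SAD exceeds
    max_sad is treated as not found and gets the predefined maximum length."""
    prev = prev.astype(np.int64)
    curr = curr.astype(np.int64)
    h, w = prev.shape
    lengths = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            ref = prev[by:by + block, bx:bx + block]
            best_sad, best_vec = None, (radius, radius)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        sad = np.abs(curr[y:y + block, x:x + block] - ref).sum()
                        if best_sad is None or sad < best_sad:
                            best_sad, best_vec = sad, (dx, dy)
            if max_sad is not None and best_sad is not None and best_sad > max_sad:
                best_vec = (radius, radius)   # not found: predefined maximum
            lengths.append(float(np.hypot(*best_vec)))
    return lengths   # e.g., averaged to yield a per-frame action score
```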
  • Several of these action detection techniques (e.g., ECR, MVL) may be used in conjunction with one another to produce a result with a higher degree of confidence, which may reduce the number of false positives and detect the advertisements faster.
  • Because the feature based techniques rely solely on recognition of features that may be present more often in advertisements than in programming, there can probably never be a complete level of confidence that an advertisement has been detected.
  • commercial break intros are utilized to indicate to the viewers that the subsequent material being presented is not programming but rather sponsored advertising. These commercial break intros vary in nature but may include certain logos, characters, or other specific video and audio messages to indicate that the subsequent material is not programming but rather advertising.
  • the return to programming may in some instances also be preceded by a commercial break outro which is a short video segment that indicates the return to programming.
  • The intros and the outros may be the same, with an identical programming segment being used for both the intro and the outro. Detecting the potential presence of the commercial break intros or outros may indicate that an advertisement (or advertisement block) is about to begin or end, respectively. If the intros and/or outros were always the same, detection could be done by detecting the existence of specific video or audio, specific logos or characters in the video stream, or specific features of the video stream (e.g., CCVs). However, the intros and/or outros need not be the same.
  • the intros/outros may vary based on at least some subset of day, time, channel (network), program, and advertisement (or advertisement break). Intros may be several frames of video easily recognized by the viewer, but may also be icons, graphics, text, or other representations that do not cover the entire screen or which are only shown for very brief periods of time.
  • broadcasters are also selling sponsorship of certain programming which means that a sponsor's short message appears on either side (beginning or end) of each ad break during that programming. These sponsorship messages can also be used as latent cue tones indicating the start and end of ad breaks.
  • The detection of the intros, outros, and/or sponsorship messages may be based on comparing the incoming video stream to a plurality of known intros, outros, and/or sponsorship messages. This would require that each of a plurality of known intros, outros, and/or sponsorship messages be stored and that the incoming video stream be compared to each. This may require a large amount of storage and significant processing, including the use of non-real-time processing. Such storage and processing may not be feasible or practical, especially for real time detection systems. Moreover, storing the known advertisements for comparing to the video programming could potentially be considered a copyright violation.
  • the detection of the intros, outros, and/or sponsorship messages may be based on detecting messages, logos or characters within the video stream and comparing them to a plurality of known messages, logos or characters from known intros, outros, and/or sponsorship messages.
  • the incoming video may be processed to find these messages, logos or characters.
  • the known messages, logos or characters would need to be stored in advance along with an association to an intro or outro.
  • The comparison of the detected messages, logos or characters to the known messages, logos or characters may require significant processing, including the use of non-real-time processing.
  • storing the known messages, logos or characters for comparison to messages, logos or characters from the incoming video stream could potentially be considered a copyright violation.
  • the detection of the intros, outros, and/or sponsorship messages may be based on detecting messages within the video stream and determining the meaning of the words (e.g., detecting text in the video stream and analyzing the text to determine if it means an advertisement is about to start).
  • the detection may be based on calculating features (statistical parameters) about the incoming video stream.
  • The features calculated may include, for example, color histograms or CCVs as discussed above.
  • The features may be calculated for an entire video frame, as discussed above, for a number of frames, or for evenly/randomly highly subsampled representations of the video frame.
  • For example, the video frame could be sampled at a number (e.g., 64) of random locations or regions in the video frame, and parameters (such as average color) may be computed for each of these regions.
  • the subsampling can also be performed in the temporal domain.
  • The collection of features, including CCVs for a plurality of images/frames and color histograms for a plurality of regions, may be referred to as a fingerprint.
  • FIG. 7 illustrates an exemplary pixel grid 700 for a video frame.
  • A plurality of regions 710, 720, 730, 740, 750, 760, 770, 780, 785, 790, 795 of the pixel grid 700 are sampled and an average color for each of the regions is calculated.
  • the region 710 has an average color of 1.5
  • the region 790 has an average color of 0.5
  • the region 795 has an average color of 2.5.
  • One advantage of the sampling of regions of a frame instead of an entire frame is that the entire frame would not need to be copied in order to calculate the features (if copying was even needed to calculate the features). Rather, certain regions of the image may be copied in order to calculate the features for those regions. As the regions of the frame would provide only a partial image and could not be used to recreate the image, there would be less potential copyright issues.
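A minimal sketch of this region-sampling fingerprint, assuming HxWx3 numpy frames; the number, size, and placement of the regions are illustrative (a fixed seed keeps the sampled locations identical across frames, as the technique requires), and only small fragments of the image are ever copied:

```python
import numpy as np

def region_fingerprint(frame, num_regions=64, size=8, seed=42):
    """frame: HxWx3 array. Returns per-region average colors as a flat vector."""
    h, w = frame.shape[:2]
    rng = np.random.default_rng(seed)    # same seed => same regions every frame
    features = []
    for _ in range(num_regions):
        y = rng.integers(0, h - size)
        x = rng.integers(0, w - size)
        patch = frame[y:y + size, x:x + size]
        features.append(patch.reshape(-1, 3).mean(axis=0))  # average R, G, B
    return np.concatenate(features)
```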
  • FIG. 8 illustrates two exemplary pixel grids 800 and 810.
  • Each of the pixel grids is 11x11 (121 pixels) and is limited to a single bit (0 or 1) for each of the colors.
  • The top view of each pixel grid 800, 810 has a plurality of regions identified, 815-850 and 855-890 respectively.
  • The lower view of each pixel grid 800, 810 has the coherent and incoherent pixels identified, where the threshold level is greater than 5.
  • FIG. 9 illustrates exemplary comparisons of the pixel grids 800, 810 of FIG. 8.
  • Color histograms 900, 910 are for the entire frame 800, 810 respectively and the difference in the color histograms 920 is 0.
  • CCVs 930, 940 are for the entire frames 800, 810 respectively and the difference in the CCVs 950 is 0.
  • Average colors 960, 970 capture the average colors for the various identified regions in frames 800, 810.
  • The difference in the average color of the regions 980 is 3.5 (using the sum of absolute values).
  • FIGs. 7-9 focused on determining the average color for each of the regions, but the techniques illustrated therein are not limited to average color determinations. For example, a color histogram or CCV could be generated for each of the regions. For CCVs to provide useful benefits, the regions would have to be large enough, or all of the colors would be incoherent.
  • The calculated features/fingerprints (e.g., CCVs, evenly/randomly highly subsampled representations) of the incoming video stream are compared against stored fingerprints for known intros and outros.
  • The fingerprints for the known intros and outros could be calculated and stored in advance.
  • The comparison of calculated features of the incoming video stream (statistical parameterized representations) to the stored fingerprints for known intros/outros will be discussed in more detail later.
  • Another method for detecting the presentation of an advertisement is automatic detection of the advertisement. Automatic detection techniques may include recognizing that the incoming video stream is a known advertisement.
  • Recognition techniques may include comparing the incoming video stream to known video advertisements. This would require that each of a plurality of known video advertisements be stored in order to do the comparison. This would require a relatively large amount of storage and would likely require significant processing, including non-real-time processing. Such storage and processing may not be feasible or practical, especially for real time detection systems. Moreover, storing the known advertisements for comparison to the video programming could potentially be considered a copyright violation. Accordingly, a more practical automatic advertisement recognition technique may be to calculate features (statistical parameters) about the incoming video stream and to compare the calculated features to a database of the same features (previously calculated) for known advertisements.
  • the features may include color histograms, CCVs, and/or evenly/randomly highly subsampled representations of the video stream as discussed above, or may include other features such as text and object recognition, logo or other graphic overlay recognition, and unique spatial frequencies or patterns of spatial frequencies (e.g., salient points).
  • the features may be calculated for images (e.g., frames) or portions of images (e.g., portions of frames).
  • the features may be calculated for each image (e.g., all frames) or for certain images (e.g., every I-frame in an MPEG stream).
  • the combination of features for different images (or portions of images) makes up a fingerprint.
  • the fingerprint may include unique temporal characteristics instead of, or in addition to, the unique spatial characteristics of a single image.
  • the features/fingerprints for the known advertisements or other segments of programming may have been pre-calculated and stored at the detection point.
  • the fingerprints may be calculated for the entire advertisement so that the known advertisement fingerprint includes calculated features for the entire advertisement (e.g., every frame for an entire 30-second advertisement).
  • the fingerprints may be calculated for only a portion of the known advertisements (e.g., 5 seconds).
  • FIG. 10 illustrates an exemplary flowchart of the advertisement matching process.
  • the video stream is received 1000.
  • the received video stream may be analog or digital video.
  • the processing may be done on either analog or digital video but is computationally easier with digital video (accordingly digital video may be preferred). Therefore, the video stream may be digitized 1010 if it is received as analog video.
  • Features are calculated for the video stream 1020.
  • the features may include CCVs, color histograms, other statistical parameters, or a combination thereof.
  • the features can be calculated for images or for portions of images.
  • the calculated features/fingerprints are compared to corresponding fingerprints (e.g., CCVs are compared to CCVs) for known advertisements 1030.
  • the comparison is made to the pre-stored fingerprints of a plurality of known advertisements (fingerprints of known advertisements stored in a database).
  • the comparison 1030 may be made to the entire fingerprint for the known advertisements, or may be made after comparing to some portion of the fingerprints (e.g., 1 second which is approximately 25 frames, or 35 frames which is approximately 1.4 seconds) that is large enough to make a determination regarding similarity. A determination is made as to whether the comparison was to entire fingerprints (or some large enough portion) 1040. If the entire fingerprint (or a large enough portion) was not compared (1040 No), additional video stream will be received and have features calculated and compared to the fingerprint (1000-1030).
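The loop of FIG. 10 could be organized roughly as follows; the per-frame feature vectors, threshold values, and helper names are all illustrative assumptions rather than the disclosed implementation:

```python
def frame_distance(a, b):
    """Sum of absolute differences between two per-frame feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def match_advertisement(stream, known_fingerprints, min_frames=35,
                        per_frame_threshold=5.0):
    """Rough sketch of the FIG. 10 flow: accumulate per-frame distances
    against each known-advertisement fingerprint, and declare a match
    once a large-enough portion has been compared.

    stream             -- iterable of per-frame feature vectors (already
                          calculated, e.g., flattened CCVs)
    known_fingerprints -- {ad_id: [per-frame feature vector, ...]}
    """
    totals = {ad_id: 0.0 for ad_id in known_fingerprints}
    for i, features in enumerate(stream):
        for ad_id, fp in known_fingerprints.items():
            if i < len(fp):
                totals[ad_id] += frame_distance(features, fp[i])
        if i + 1 >= min_frames:                # large enough portion (1040 Yes)
            best = min(totals, key=totals.get)
            if totals[best] / (i + 1) < per_frame_threshold:
                return best                    # associate stream with this ad (1070)
    return None
```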
  • the incoming video stream is associated with the known advertisement (the incoming video stream is assumed to be the advertisement) 1070.
  • Targeted advertisements may be substituted in place of all advertisements within an advertisement block.
  • the targeted advertisements may be inserted in order or may be inserted based on any number of parameters including day, time, program, last time ads were inserted, and the default advertisement.
  • a particular advertisement may be next in the queue to be inserted as long as the incoming video stream is not tuned to a particular program (e.g., a Nike® ad may be next in the queue but may be restricted from being substituted in football games because Adidas is a sponsor of the football league).
  • the targeted advertisements may only be inserted in place of certain default advertisements.
  • the determination of which default ads should be substituted with targeted ads may be based on the same or similar parameters as noted above with respect to the order of targeted ad insertion. For example, beer ads may not be substituted in a bar, especially if the bar sells that brand of beer.
  • the process described above with respect to FIG. 10 is focused on detecting advertisements within the incoming video stream.
  • the process is not limited to advertisements.
  • the same or similar process could be used to compare calculated features for the incoming video stream to a database of fingerprints for known intros (if intros are used in the video delivery system) or known sponsorships (if sponsorships are used). If a match is detected, that would indicate that an intro is being displayed and that an advertisement break is about to begin. Ad substitution could begin once the intro is detected.
  • targeted advertisements may be inserted for an entire advertisement block (e.g., until an outro is detected).
  • the targeted advertisements may be inserted in order or may be inserted based on any number of parameters including day, time, program, and last time ads were inserted.
  • the targeted advertisements may only be inserted in place of certain default advertisements. To limit insertion of targeted advertisements to specific default advertisements would require the detection of specific advertisements.
  • the intro or sponsorship may provide some insight as to what ads may be played in the advertisement block. For example, the intro detected may be associated with (often played prior to) an advertisement break in a soccer game and the first ad played may normally be a beer advertisement.
  • This information could be used to limit the comparison of the incoming video stream to ad fingerprints for known beer advertisements as stored in an indexed ad database, or could be used to assist in the determination of which advertisement to substitute. For example, a restaurant that did not serve alcohol may want to replace the beer advertisement with an advertisement for a non-alcoholic beverage.
  • the level of similarity is based on the substitutions, deletions and insertions of features necessary to align the features of the incoming video stream with a fingerprint (the minimal distance between the two).
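One conventional way to realize an alignment by substitutions, deletions and insertions is a Levenshtein-style dynamic program over the two feature sequences. This is a sketch of that idea, with the substitution cost left as a parameter; the actual cost functions used by the system are not specified here:

```python
def alignment_distance(query, fingerprint, sub_cost, indel_cost=1.0):
    """Minimal cost to align two feature sequences, allowing
    substitutions, deletions and insertions (edit-distance DP).

    sub_cost(a, b) -- cost of aligning feature a with feature b
    """
    n, m = len(query), len(fingerprint)
    # dp[i][j] = minimal distance aligning query[:i] with fingerprint[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost(query[i - 1], fingerprint[j - 1]),
                dp[i - 1][j] + indel_cost,   # deletion from the query
                dp[i][j - 1] + indel_cost,   # insertion into the query
            )
    return dp[n][m]
```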
  • the feature-based techniques described above may be used to detect the start of a potential advertisement (or advertisement block), and the calculating of features 1020 and comparing to known fingerprints 1030 may only be performed once a possible advertisement break has been detected. It should be noted that some methods of detecting the possibility of an advertisement break in the video stream, such as an increase in scene changes (where scene changes may be detected by comparing successive CCVs), are themselves calculating features of the video stream 1020, so the advertisement detection process may begin with the comparison 1030.
  • the calculating of features 1020 and comparing to known fingerprints 1030 may be limited to predicted advertisement break times (e.g., between :10 and :20 after every hour).
  • the generation 1020 and the comparison 1030 may be based on the channel to which it is tuned. For example, a broadcast channel may have scheduled advertisement blocks so that the generation 1020 and the comparison 1030 may be limited to specific times.
  • a live event such as a sporting event may not have fixed advertisement blocks so time limiting may not be an option.
  • channels are changed at random times, so time blocks would have to be channel specific.
  • the calculated fingerprint for the incoming video stream may be continually compared to fingerprints for known intros stored in a database (known intro fingerprints). After an intro is detected, indicating that an advertisement (or advertisement block) is about to begin, the comparison of the calculated fingerprint for the incoming video stream to fingerprints for known advertisements stored in a database (known advertisement fingerprints) begins.
  • a comparison of the calculated fingerprints of the incoming video stream to the known advertisement fingerprints stored in a database will be performed whether the comparison is continual or only after some event (e.g., detection of an intro, a certain time). Comparing the calculated fingerprint of the incoming video stream to entire fingerprints (or portions thereof) for all the known advertisement fingerprints 1030 may not be an efficient use of resources. The calculated fingerprint may have little or no similarity with a percentage of the known advertisement fingerprints, and this difference may be obvious early in the comparison process.
  • an initial window (e.g., several frames, several regions of a frame) of the calculated fingerprint of the incoming video stream may be compared to an initial window of all of the known advertisement fingerprints (e.g., several frames, several regions). Only the known advertisement fingerprints that have less than some defined level of dissimilarity (e.g., less than a certain distance between them) proceed for further comparison.
  • the initial window may be, for example, a certain period (e.g., 1 second), a certain number of images (e.g., first 5 I-frames), or a certain number of regions of a frame (e.g., 16 of 64 regions of a frame).
  • FIG. 11 illustrates an exemplary flowchart of an initial dissimilarity determination process.
  • the video stream is received 1100 and may be digitized 1110 (e.g., if it is received as analog video).
  • Features are calculated for the video stream (e.g., digital video stream) 1120.
  • the features (fingerprint) may include CCVs, color histograms, other statistical parameters, or a combination thereof.
  • the features can be calculated for images or for portions of images.
  • the calculated features are compared to the fingerprints for known advertisements 1130 (known advertisement fingerprints). A determination is made as to whether the comparison has been completed for an initial period (window) 1140. If the initial window comparison is not complete (1140 No), the process returns to 1100-1130. If the initial window comparison is complete (1140 Yes), then a determination is made as to whether the level of dissimilarity (distance) between the calculated fingerprint and the known advertisement fingerprints exceeds a threshold 1150. If the dissimilarity is below the threshold, the process proceeds to FIG. 10 (1000) for those fingerprints.
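A sketch of this initial-window filtering, assuming per-frame feature vectors and a sum-of-absolute-differences distance; the window size and the threshold are illustrative:

```python
def prune_candidates(query_window, fingerprints, max_distance):
    """Keep only the known fingerprints whose initial window is not
    already too dissimilar from the incoming stream's features.

    query_window -- features for the initial window (e.g., first frames)
    fingerprints -- {ad_id: [per-frame feature vector, ...]}
    """
    survivors = {}
    for ad_id, fp in fingerprints.items():
        window = fp[:len(query_window)]
        distance = sum(
            sum(abs(x - y) for x, y in zip(q, f))
            for q, f in zip(query_window, window)
        )
        if distance <= max_distance:   # 1150 No: proceed to full comparison
            survivors[ad_id] = fp
    return survivors
```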
  • FIG. 12 illustrates an exemplary initial comparison of the calculated fingerprint for an incoming stream versus initial portions of fingerprints for a plurality of known advertisements stored in a database (known advertisement fingerprints).
  • for ease of illustration, assume that each color is limited to a single digit (two colors), that each color has the same digit so that a single number can represent all colors, and that the pixel grid is 25 pixels.
  • the calculated fingerprint includes a CCV for each image (e.g., frame, I-frame).
  • the incoming video stream has a CCV calculated for the first three frames.
  • the CCVs for the first three frames of the incoming stream are compared to the associated portion (CCVs of the first three frames) of each of the known advertisement fingerprints.
  • the comparison includes summing the dissimilarity (e.g., calculated distance) between corresponding frames (e.g., distance(frame 1) + distance(frame 2) + distance(frame 3)).
  • the distance between the CCVs for each of the frames can be calculated in various manners, including the sum of the absolute differences and the sum of the squared differences as described above.
  • the sum of the absolute differences is utilized in FIG. 12.
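For reference, the two distance measures mentioned, applied to plain feature vectors (a sketch; real features would be CCVs or histograms flattened to vectors):

```python
def sad(a, b):
    """Sum of absolute differences, as used in FIG. 12."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssd(a, b):
    """Sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# E.g., two per-frame feature vectors:
print(sad([1, 5, 2], [3, 1, 2]))  # 6
print(ssd([1, 5, 2], [3, 1, 2]))  # 20
```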
  • the difference between the incoming video stream and a first fingerprint (FP1) is 52, while the difference between the incoming video stream and the Nth fingerprint (FPN) is 8.
  • the comparison for FP1 would not proceed further (e.g., 1160) since the level of dissimilarity exceeds the predefined level (e.g., 1150 Yes).
  • the comparison for FPN would continue (e.g., proceed to 1000) since the level of dissimilarity did not exceed the predefined level (e.g., 1150 No).
  • the incoming video stream may have dropped the first few frames of the advertisement, or the calculated features (e.g., CCVs) may not cover the beginning of the advertisement (e.g., first few frames) because, for example, the possibility of an advertisement being presented was not detected early enough.
  • FIG. 13 illustrates an exemplary initial comparison of calculated features for an incoming stream versus an expanded initial portion of known advertisement fingerprints.
  • the CCVs calculated for the first three frames of the incoming video stream are compared by a sliding window to the first five frames of a stored fingerprint. That is, frames 1-3 of the calculated features of the incoming video stream are compared against frames 1-3 of the fingerprint, frames 2-4 of the fingerprint, and frames 3-5 of the fingerprint. By doing this it is possible to reduce or eliminate the differences that may have been caused by one or more frames being dropped from the incoming video stream. In the example of FIG. 13, the first two frames of the incoming stream were dropped. Accordingly, the calculated features of the incoming video stream matched frames 3-5 of the fingerprint best.
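The sliding-window tolerance for dropped frames might look like this sketch; the slack of two frames matches the FIG. 13 example but is otherwise an assumption:

```python
def best_offset(query, fingerprint, slack=2):
    """Compare query frames against each alignment of the fingerprint
    within a window extended by `slack` frames; return (offset, distance)
    of the best match (offset 2 means the stream dropped two frames).
    """
    best = (0, float("inf"))
    for offset in range(slack + 1):
        window = fingerprint[offset:offset + len(query)]
        d = sum(
            sum(abs(x - y) for x, y in zip(q, f))
            for q, f in zip(query, window)
        )
        if d < best[1]:
            best = (offset, d)
    return best
```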
  • the comparison continues.
  • the comparison may continue from the portion of the fingerprint where the best match was found in the initial comparison. In the exemplary comparison of FIG. 13, the comparison should continue between frame 6 (the next frame outside of the initial window) of the fingerprint and frame 4 of the incoming stream. It should be noted that if the comparison resulted in the best match for frames 1-3 of the fingerprint, then the comparison may continue starting at frame 4 (the next frame within the initial window) of the fingerprint.
  • the window of comparison may continually be increased for the known advertisement fingerprints that do not meet or exceed the dissimilarity threshold, until one of the known advertisement fingerprints possibly meets or exceeds the similarity threshold.
  • the window may be extended 5 frames for each known advertisement fingerprint that does not exceed the dissimilarity threshold.
  • the dissimilarity threshold may be measured in distance (e.g., total distance, average distance/frame). Comparison is stopped if the incoming video fingerprint and the known advertisement fingerprint differ by more than a chosen dissimilarity threshold. A determination of a match would be based on a similarity threshold.
  • a determination of the similarity threshold being met or exceeded may be delayed until some predefined number of frames (e.g., 20) have been compared to ensure a false match is not detected (small number of frames being similar).
  • the similarity threshold may be measured in distance. For example, if the distance between the features for the incoming video stream and the fingerprint differs by less than 5 per frame after at least 20 frames are compared, it is considered a match.
  • FIG. 14 illustrates an exemplary expanding window comparison of the features of the incoming video stream and the features of the fingerprints of known advertisements. For the initial window W1, the incoming video stream is compared to each of five known advertisement fingerprints (FP1-FP5).
  • the comparison of FP2 is aborted because it exceeded the dissimilarity threshold.
  • the comparison of the remaining known advertisement fingerprints continues for the next window W2 (e.g., next five frames, total of 10 frames).
  • the comparison of FP1 is aborted because it exceeded the dissimilarity threshold.
  • the comparison of the remaining known advertisement fingerprints continues for the next window W3 (e.g., next five frames, total of 15 frames).
  • in W3 the comparison of FP3 is aborted.
  • the comparison of the remaining known advertisement fingerprints continues for the next window W4 (e.g., next five frames, total of 20 frames).
  • after W4 a determination can be made about the level of similarity.
  • the window comparison may have been a comparison of the temporal alignment of the frames, a summation of the differences between the individual frames, a summation of the differences of individual regions of the frames, or some combination thereof.
  • the window is not limited to a certain number of frames as illustrated and may instead be based on regions of a frame (e.g., 16 of the 32 regions into which the frame is divided). If the window covered less than a frame, certain fingerprints may be excluded from further comparisons after comparing less than a frame.
  • the level of dissimilarity may have to be set high for comparisons of less than a frame so as not to exclude comparisons that are temporarily high due to, for example, misalignment of the fingerprints.
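Putting the pieces together, the FIG. 14 expanding-window comparison could be sketched as follows; the step size, the thresholds, and the per-frame averaging are assumptions chosen to match the examples above:

```python
def expanding_window_match(stream_features, fingerprints, step=5,
                           drop_threshold=10.0, match_threshold=5.0,
                           min_frames=20):
    """FIG. 14-style comparison: grow the window `step` frames at a
    time, abort candidates whose average per-frame distance exceeds
    the dissimilarity threshold, and declare a match only after at
    least `min_frames` frames have been compared.
    """
    candidates = dict(fingerprints)
    totals = {ad_id: 0.0 for ad_id in candidates}
    compared = 0
    while candidates and compared < len(stream_features):
        end = min(compared + step, len(stream_features))
        for ad_id in list(candidates):
            fp = candidates[ad_id]
            for i in range(compared, min(end, len(fp))):
                totals[ad_id] += sum(
                    abs(x - y) for x, y in zip(stream_features[i], fp[i]))
            if totals[ad_id] / end > drop_threshold:
                del candidates[ad_id]          # exceeded dissimilarity threshold
        compared = end
        if compared >= min_frames:
            for ad_id in candidates:
                if totals[ad_id] / compared < match_threshold:
                    return ad_id               # similarity threshold met
    return None
```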
  • the calculated features for the incoming video stream are not stored. Rather, they are calculated, compared, and then discarded. No video is copied, or if the video is copied it is only for a short time (temporarily) while the features are calculated.
  • the features calculated for images cannot be used to reconstruct the video, and the calculated features are not copied, or if the features are copied it is only for a short time (temporarily) while the comparison to the known advertisement fingerprints is being performed.
  • the features may be calculated for an image (e.g., frame) or for a portion or portions of an image.
  • Calculating features for a portion may entail sampling certain regions of an image as discussed above with respect to FIGs. 7-9.
  • Calculating features for a portion of an image may entail dividing the image into sections, selecting a specific portion of the image or excluding a specific portion of the image. Selecting specific portions may be done to focus on specific areas of the incoming video stream (e.g., network logo, channel identification, program identification). The focus on specific areas will be discussed in more detail later.
  • FIG. 15 illustrates an exemplary pixel grid 1500 divided into sections 1510, 1520, 1530, 1540 as indicated by the dotted line.
  • the pixel grid 1500 consists of 36 pixels (a 6x6 grid) and a single digit for each color with each pixel having the same number associated with each color.
  • the pixel grid 1500 is divided into 4 separate 3x3 grids 1510-1540.
  • a full image CCV 1550 is generated for the entire grid 1500, and partial image CCVs 1560, 1570, 1580, 1590 are generated for the associated sections 1510-1540.
  • a summation of the section CCVs 1595 would not result in the CCV 1550, as pixels may have been coherent because they were grouped over section borders, which would not be indicated in the summation CCV 1595. It should be noted that the summation CCV 1595 is simply for comparing to the CCV 1550 and would not be used in a comparison to fingerprints. When calculating CCVs for sections, the coherence threshold may be lowered.
  • the coherence threshold for the overall grid was four and may have been three for the sections. It should be noted that if it were lowered to 2, the color 1 pixels in the lower right corner of section pixel grid 1520 would be considered coherent and the CCV would change accordingly.
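A sketch of computing a feature per quadrant, in the spirit of FIG. 15; `feature_fn` would be something like the `ccv` sketch shown earlier, possibly with a lowered coherence threshold:

```python
def quadrant_features(grid, feature_fn):
    """Divide a 2D grid into four equal quadrants (cf. FIG. 15) and
    compute a feature for each. Note the quadrant CCVs do not sum to
    the whole-grid CCV, because components spanning a section border
    are cut at the border.
    """
    rows, cols = len(grid), len(grid[0])
    h, w = rows // 2, cols // 2
    quadrants = [
        [row[:w] for row in grid[:h]],   # e.g., section 1510
        [row[w:] for row in grid[:h]],   # section 1520
        [row[:w] for row in grid[h:]],   # section 1530
        [row[w:] for row in grid[h:]],   # section 1540
    ]
    return [feature_fn(q) for q in quadrants]

# E.g., with a lowered per-section coherence threshold of 3:
# features = quadrant_features(grid, lambda g: ccv(g, threshold=3))
```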
  • the comparison of the features associated with the incoming video stream to the features associated with known advertisements may be done based on sections. The comparison may be based on a single section, though comparing a single section by itself may have less granularity than comparing an entire image.
  • FIG. 16 illustrates an exemplary comparison of two images 1600, 1620 based on the whole images 1600, 1620 and sections of the images 1640, 1660 (e.g., upper left quarter of image).
  • CCVs 1610, 1630 are calculated for the images 1600, 1620 and reveal that the difference (distance) between them is 16 (based on sum of absolute values).
  • CCVs 1650, 1670 are calculated for the sections 1640, 1660 and reveal that there is no difference. The first sections 1640, 1660 of the images were the same while the other sections were different, thus comparing only the features 1650, 1670 may erroneously result in the mismatch not being filtered out.
  • the dissimilarity threshold will have to be set at an appropriate level to account for this possible effect, or several comparisons will have to be made before a comparison can be terminated due to a mismatch (exceeding the dissimilarity threshold). Alternatively, the comparisons of the sections may be done at the same time.
  • FIG. 17 illustrates an exemplary comparison of a pixel grid 1700 (divided into sections 1710, 1720, 1730, 1740) to the pixel grid 1500 (divided into sections 1510-1540) of FIG. 15.
  • FIG. 15 depicted the image being divided into four quadrants of equal size, but is not limited thereto.
  • the image could be divided in numerous ways without departing from the scope (e.g., row slices, column slices, sections of unequal size and/or shape).
  • the image need not be divided in a manner in which the whole image is covered.
  • the image could be divided into a plurality of random regions as discussed above with respect to FIGs. 7-9.
  • the sections of an image that are analyzed and compared are only a portion of the entire image and could not be used to recreate the image so that there could clearly be no copyright issues. That is, certain portions of the image are not captured for calculating features or for comparing to associated portions of the known advertisement finge ⁇ rints that are stored in a database.
  • FIGs. 11-14 discussed comparing calculated features for the incoming video stream to windows (small portions) of the fingerprints at a time so that likely mismatches need not be continually compared.
  • the same basic process can be used with segments. If the features for each of the segments for an image are calculated and compared together (e.g., FIG. 17) the process may be identical except for the fact that separate features for an image are being compared instead of a single feature.
  • the process may compare the features for that subset of the incoming video stream to the features for that subset of the advertisement fingerprints. For the fingerprints that do not exceed the threshold level of dissimilarity (e.g., 1150 No of FIG. 11), the comparison window may be expanded to the additional segments of the image and fingerprints, or may be extended to the same section of additional images.
  • a fingerprint for an incoming video stream may be based on an image (or portion of an image) and consist of features calculated for different regions (q1, q2 ... qn) of the image.
  • the fingerprints for known advertisements may be based on images and consist of features calculated for different regions (s1, s2 ... sm) of the images.
  • the number of regions in an image for a stored fingerprint may be greater than the integer n (the number of regions in an image of the incoming video stream) if the fingerprint of the incoming video stream is not for a complete image. For example, regions may not be defined near the boundaries of an incoming video stream due to the differences associated with presentation of images on different TVs and/or STBs.
  • features may not encode any spatial distribution. For instance, areas which are visible in the top half of the incoming video stream and are used for the calculation of the query fingerprint might match an area in a subject fingerprint that is not part of the query fingerprint. This would result in a false match.
  • entire images of neither the incoming video stream nor the known advertisements (ad intros, sponsorship messages, etc.) are stored; rather, only portions of the images are captured so that the features can be calculated.
  • the features calculated for the portions of the images of the incoming video stream are not stored; they are calculated, compared to features for known advertisements, and then discarded.
  • if the video stream is an analog stream and it is desired to calculate the features and compare to fingerprints digitally, then the video stream is converted to digital only as necessary. That is, if the comparisons to fingerprints are done on an image-by-image basis, the conversion to digital will be done image by image. If the video stream is not having features generated (e.g., CCVs) or being compared to at least one fingerprint, then the digital conversion will not be performed.
  • it may be, for example, that the features for the incoming video stream do not match any fingerprints so no comparison is being done, or that the incoming video stream was equated with an advertisement and the comparison is temporarily terminated while the ad is being displayed or a targeted ad is being substituted. If no features are being generated or compared, then there is no need for the digital conversion. Limiting the amount of conversion from analog to digital for the incoming video stream means that there is less manipulation and less temporary storage (if any is required) of the analog stream while it is being converted. According to one embodiment, when calculating the features for the incoming video stream certain sections may be either avoided or focused on. Portions of an image that are excluded may be defined as regions of disinterest, while regions that are focused on may be defined as regions of interest.
  • Regions of disinterest and/or interest may include overlays, bugs, and banners.
  • the overlays, bugs and banners may include at least some subset of a channel and/or network logo, clock, sports scoreboard, timer, program information, EPG screen, promotions, weather reports, special news bulletins, closed caption data, and interactive TV buttons.
  • if a bug (e.g., a network logo) is overlaid on the incoming video stream, the calculated features (e.g., CCVs) will reflect the overlay as well as the underlying images. Accordingly, the overlay may be a region of disinterest that should be excluded from calculations and comparisons.
  • FIG. 18 illustrates several exemplary images with different overlays.
  • the upper two images are taken from the same video stream.
  • the first image has a channel logo overlay in the upper left corner and a promotion overlay in the upper right corner while the second image has no channel overlay and has a different promotion overlay.
  • the lower two images are taken from the same video stream.
  • the first image has a station overlay in the upper right corner and an interactive button in the lower right corner, while the second image has a different channel logo in the upper right and no interactive button. Comparing fingerprints for the first set of images or the second set of images may result in a non-match due to the different overlays.
  • FIG. 19A illustrates an exemplary impact on pixel grids of an overlay being placed on a corresponding image.
  • Pixel grid 1900A is for the image without an overlay and pixel grid 1910A is for the image with an overlay.
  • FIG. 19A illustrates an embodiment where the calculated finge ⁇ rint for the incoming video stream and the known advertisement finge ⁇ rints stored in a local database were calculated for entire frames.
  • the regions of disinterest are detected in the video stream and are excluded from the calculation of the fingerprint (e.g., CCVs) for the incoming video stream.
  • the detection of regions of disinterest in the video stream will be discussed in more detail later. Excluding the region from the fingerprint will affect the comparison of the calculated fingerprint to the known advertisement fingerprints that may not have the region excluded.
  • FIG. 19B illustrates an exemplary pixel grid 1900B with the region of interest 1910B (e.g., 1920A of FIG. 19A) excluded.
  • the excluded region of interest 1910B is not used in calculating the features (e.g., CCV) of the pixel grid 1900B.
  • a CCV 1920B will only identify 94 pixels. Comparing the CCV 1920B, having the region of interest excluded, to the CCV 1930A for the pixel grid of the image without an overlay 1900A results in a difference 1930B of 6 (using the sum of absolute values). By removing the region of interest from the difference (distance) calculation, the distance between the image with no overlay 1900A and the image with the overlay removed 1900B was half of the difference between the image with no overlay 1900A and the image with the overlay 1910A.
  • the regions of disinterest (RODs) may be detected by searching for certain characteristics in the video stream.
  • the search for the characteristics may be limited to locations where overlays, bugs and banners are normally placed (e.g., a banner scrolling along the bottom of the image).
  • the detection of the RODs may include comparing the image (or portions of it) to stored regions of disinterest. For example, network overlays may be stored, and the incoming video stream may be compared to the stored overlay to determine if an overlay is part of the video stream. Comparing actual images may require extensive memory for storing the known regions as well as extensive processing to compare the incoming video stream to the stored regions. According to one embodiment, a ROD may be detected by comparing a plurality of successive images.
  • the known RODs may have features calculated (e.g., CCVs) and these features may be stored as ROD fingerprints.
  • the video stream features may be compared to the ROD fingerprints. As the ROD is likely small with respect to the image, the features for the incoming video stream may have to be limited to specific portions (portions where the ROD is likely to be).
  • bugs may normally be placed in a lower right hand corner, so the features will be generated for a lower right portion of the incoming video and compared to the ROD fingerprints (at least the ROD fingerprints associated with bugs) to determine if an overlay is present.
  • Banners may be placed on the lower 10% of the image, so that features would be generated for the bottom 10% of an incoming video stream and compared to the ROD fingerprints (at least the ROD fingerprints for banners).
  • the detection of RODs may require that separate fingerprints be generated for the incoming video stream and compared to distinct fingerprints for RODs.
  • the features calculated for the possible RODs of the incoming video stream may not match stored ROD fingerprints, because the RODs in the incoming video stream are overlaid on top of the video stream: the calculated features will include the video stream as well as the overlay, whereas the known fingerprint may have been generated for the overlay alone or for the overlay over a different video stream. Accordingly, it may not be practical to determine RODs in an incoming video stream.
  • the generation of the fingerprints for known advertisements as well as for the incoming video stream may exclude portions of an image that are known to possibly contain RODs (e.g., overlays, banners). For example, as previously discussed with respect to FIG. 19B, a possible ROD 1910B may be excluded from the calculation of the fingerprint for the entire frame. This would be the case for both the calculated fingerprint of the incoming video stream and the known advertisement fingerprints stored in the database. Accordingly, the possible ROD would be excluded from comparisons of the calculated fingerprint and the known advertisement fingerprints.
  • the excluded region may be identified in numerous manners. For example, the ROD may be specifically defined (e.g., exclude pixels 117-128).
  • the portion of the image that should be included in fingerprinting may be defined (e.g., include pixels 1-116 and 129-150).
  • the image may be broken up into a plurality of blocks (e.g., 16x16 pixel grids) and those blocks that are included or excluded may be defined (e.g., include blocks 1-7 and 9-12, exclude block 8).
  • a bit vector may be used to identify the pixels and/or blocks that should be included or excluded from the fingerprint calculation (e.g., 0101100 may indicate that blocks 2, 4 and 5 should be included and blocks 1, 3, 6 and 7 are excluded).
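The bit-vector selection could be realized as simply as the following sketch, using the 0101100 example above; the block contents are placeholders:

```python
def included_blocks(blocks, mask):
    """Select the blocks to fingerprint according to a bit vector.

    blocks -- list of pixel blocks (e.g., 16x16 grids), in block order
    mask   -- string of '0'/'1'; '1' means the block is included
    """
    return [b for b, bit in zip(blocks, mask) if bit == "1"]

# "0101100": blocks 2, 4 and 5 included; 1, 3, 6 and 7 excluded.
blocks = [f"block{i}" for i in range(1, 8)]
print(included_blocks(blocks, "0101100"))  # ['block2', 'block4', 'block5']
```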
  • the RODs may also be excluded from sections and/or regions if the fingerprints are generated for portions of an image as opposed to an entire image, as illustrated in FIG. 19B.
  • FIG. 20 illustrates an exemplary image 2000 to be fingerprinted that is divided into four sections 2010-2040.
  • the image 2000 may be from an incoming video stream or a known advertisement, intro, outro, or channel identifier.
  • the sections 2010-2040 do not make up the entire image. That is, if each of these sections is grabbed in order to create the fingerprint for the sections, there are clearly no copyright issues associated therewith, as the entire image is not captured and the image could not be regenerated from the portions thereof.
  • Each of the sections 2010-2040 is approximately 25% of the image 2000; however, the section 2040 has a portion 2050 excluded therefrom, as the portion 2050 may be associated with where an overlay is normally placed.
  • FIG. 21 illustrates an exemplary image 2100 to be fingerprinted that is divided into a plurality of regions 2110 that are evenly distributed across the image 2100.
  • the image 2100 may be from an incoming video stream or a known advertisement; note that the regions 2110 do not make up the entire image.
  • a section 2120 of the image may be associated with where a banner is normally placed, so this portion of the image would be excluded.
  • Certain regions 2130 fall within the section 2120, so they may be excluded from the fingerprint, or those regions 2130 may be shrunk so as not to fall within the section 2120.
  • Ad substitution may be based on the particular channel that is being displayed. That is, a particular targeted advertisement may not be able to be displayed on a certain channel (e.g., an alcohol advertisement may not be able to be displayed on a religious programming channel).
  • if the local ad insertion unit is to respond properly to channel-specific cue tones that are centrally generated and distributed to each local site, the local unit has to know what channel is being passed through it.
  • An advertisement detection unit may not have access to data (e.g., specific frequency, metadata) indicating the identity of the channel that is being displayed. Accordingly, the unit will need to detect the specific channel.
  • Fingerprints may be defined for channel identification information that may be transmitted within the video stream (e.g., channel logos, channel banners, channel messages), and these fingerprints may be stored for comparison. When the incoming video stream is received, an attempt to identify the portion of the video stream containing the channel identification information may be made.
  • channel overlays may normally be placed in a specific location on the video stream, so that portion of the video stream may be extracted and have features (e.g., CCVs) generated therefor. These features will be compared to stored fingerprints for channel logos.
  • one problem may be the fact that the features calculated for the region of interest for the video stream may include the actual video stream as well as the overlay. Additionally, the logos may not be placed in the same place on the video stream at all times so that defining an exact portion of the video stream to calculate features for may be difficult.
  • channel changes may be detected and the channel information may be detected during the channel change.
  • the detection of a channel change may be detected by comparing features of successive images of the incoming video stream and detecting a sudden and abrupt change in features.
  • a change in channel often results in the display of several monochrome (e.g., blank, black, blue) frames while the new channel is decoded.
  • the display of these monochrome frames may be detected in order to determine that a channel change is occurring.
  • the display of these monochrome frames may be detected by calculating a fingerprint for the incoming video stream and comparing it to fingerprints for known channel change events (e.g., monochrome images displayed between channel changes).
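A crude sketch of flagging the monochrome frames of a channel change, assuming NumPy frames; the variance threshold is an assumption, and the channel banner region can be excluded as a region of disinterest (as discussed next):

```python
import numpy as np

def is_monochrome(frame, max_std=4.0, exclude=None):
    """Heuristically detect a (near-)monochrome frame.

    frame   -- H x W x 3 array
    exclude -- optional (top, left, height, width) region of disinterest,
               e.g., the channel-change banner along the bottom
    """
    pixels = frame.astype(float)
    if exclude is not None:
        top, left, h, w = exclude
        mask = np.ones(frame.shape[:2], dtype=bool)
        mask[top:top + h, left:left + w] = False
        pixels = pixels[mask]
    else:
        pixels = pixels.reshape(-1, 3)
    # A monochrome frame has almost no per-channel variation.
    return bool((pixels.std(axis=0) < max_std).all())
```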
  • channel numbers may be overlaid on a portion of the video stream.
  • FIG. 22 illustrates exemplary channel change images. As illustrated, the image during a channel change is a monochrome frame with the exception of the channel change banner 2210 along the bottom of the image.
  • the channel banner may be identified as a region of disinterest to be excluded from comparisons of the features generated for the incoming video stream and the stored fingerprints.
  • once the channel change has been detected (whether based on comparing fingerprints or some other method), a determination can be made as to what channel the system is tuned to.
  • the determination may be based on analyzing channel numbers overlaid on the image or the channel banner.
  • the analysis may include comparing to stored channel numbers and/or channel banners.
  • the actual comparison of images or portions of images requires large amounts of storage and processing and may not be possible to perform in real time.
  • features/fingerprints may be calculated for the incoming video stream and compared to fingerprints for known channel identification data.
  • calculating and comparing fingerprints for overlays may be difficult due to the background image. Accordingly, the calculation and comparison of fingerprints for channel numbers will focus on the channel banners.
  • the channel banner may have more data than just the channel name or number. For example, it may include time, day, and program details (e.g., title, duration, actors, rating).
  • the channel identification data is likely contained in the same location of the channel banner, so that only that portion of the channel banner will be of interest and only that portion will be analyzed. Referring back to FIG. 22, the channel identification data 2220 is in the upper left hand corner of the channel banner. Accordingly, this area may be defined as a region of interest. Fingerprints for the relevant portion of the channel banners for each channel will be generated and stored in a database.
  • the channel identification fingerprints may be stored in the same database as the known advertisement (intro, outro, sponsorship message) fingerprints or may be stored in a separate database. If stored in the same database, the channel ident fingerprints are likely segregated so that the incoming video stream is only compared to these fingerprints when a channel change has been detected.
  • FIG. 23 illustrates an image 2300 with expected locations of a channel banner 2310 and channel identification information 2320 within the channel banner 2310 identified.
  • the channel identification information 2320 may not be in the exact location expected due to parameters (e.g., scaling, translation) associated with the specific TV and/or STB (or DVR) used to receive and view the programming. For example, it is possible that the channel identification information 2320 could be located within a specific region 2330 that is greatly expanded from the expected location 2320.
  • scaling and translation factors must be determined for the incoming video stream. According to one embodiment, these factors can be determined by comparing the location of the channel banner for the incoming video stream to the reference channel banner 2310. Initially, a determination will be made as to where the inner boundary between the monochrome background and the channel banner is. Once the inner boundary is determined, the width and height of the channel banner can be determined.
  • the scale factor can be determined by comparing the actual dimensions to the expected dimensions.
  • the scale factor in the x direction is the actual width of the channel banner divided by the reference width.
  • the scale factor in the y direction is the actual height of the channel banner divided by the reference height.
  • the translation factor can be determined by comparing a certain point of the incoming stream to the same reference point (e.g., the top left corner of the inner boundary between the monochrome background and the channel banner).
  • the reference channel banner is scaled and translated during the start-up procedure to the actual size and position.
  • the translation and scaling parameters are stored so that they can be used to scale and translate the incoming stream, allowing an accurate comparison to the reference material (e.g., fingerprints).
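The scale and translation computation might be sketched as follows; the box representation, the way translation composes with scale, and the example coordinates are all assumptions:

```python
def scaling_and_translation(actual_box, reference_box):
    """Derive scale and translation factors from the detected channel
    banner versus the reference banner. Boxes are (left, top, width,
    height) in pixels.
    """
    ax, ay, aw, ah = actual_box
    rx, ry, rw, rh = reference_box
    scale_x = aw / rw          # actual banner width / reference width
    scale_y = ah / rh          # actual banner height / reference height
    # Translation: offset of a common reference point (top left corner
    # of the inner boundary), after applying the scale.
    translate_x = ax - rx * scale_x
    translate_y = ay - ry * scale_y
    return scale_x, scale_y, translate_x, translate_y

# Hypothetical: banner expected at (40, 400, 560, 60), found at (50, 420, 532, 57).
print(scaling_and_translation((50, 420, 532, 57), (40, 400, 560, 60)))
```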
  • the scaling and translation factors have been discussed with respect to the channel banner and channel identification information but are in no way limited thereto. Rather, these factors can be used to ensure an appropriate comparison of fingerprints of the incoming video stream to known fingerprints (e.g., ads, ad intros, ad outros, channel idents, sponsorships). These factors can also be used to ensure that regions of disinterest or regions of interest are adequately identified. Alternatively, rather than creating a fingerprint for the channel identifier region of interest, the region of interest can be analyzed by a text recognition system that may recognize the text associated with the channel identification data in order to determine the associated channel.
  • Some networks may send messages ('channel idents') identifying the network (or channel) that is being displayed to reinforce network (channel) branding. According to one embodiment, these messages are detected and analyzed to determine the channel. The analysis may compare the message to stored messages for known networks (channels). Alternatively, the analysis may calculate features for the message and compare them to stored features for known network (channel) messages/idents. The features may be generated for the entire video stream (entire image) or may be generated for a portion containing the branding message. Alternatively, the analysis may include using text recognition to determine what the message says and identifying the channel based on that. When advertisement breaks are detected and/or when advertisements are substituted, that information can be fed back to a central location for tracking and billing.
  • the central location may compare the detected breaks against actual advertisement breaks in video streams and associate the video stream being displayed at the location with a channel based on matching advertisement breaks.
  • the central location may transmit the associated channel identification back to the local detection device.
  • the central location may track when ad breaks are detected for a plurality of users and group the users according to detected ad breaks.
  • the central location could then compare the average of the detected ad breaks for the group to the actual ad breaks for a plurality of program streams.
  • the groups may then be associated with a channel based on matching advertisement breaks.
  • the central location may transmit the associated channel identification back to the local detection devices of the group members.
  • the local detection devices may transmit features associated with the presently viewed video stream (e.g., fingerprints) to the central location.
  • the central location may compare the features to features for the plurality of program streams that are being transmitted.
  • the presently viewed presentation stream will be associated with the channel that the features correspond to.
  • the features may be transmitted to the central location at certain intervals (e.g., 30 seconds of features every 15 minutes).
  • the central location may transmit that channel association back to the local ad detection equipment.
  • the local detection device may send data related to when the advertisement break is detected and what fingerprint was used to detect the advertisement break (e.g., fingerprint identification).
  • the fingerprint used to detect an advertisement break may be at least some subset of an ad intro fingerprint, channel ident fingerprint, sponsorship message fingerprint, ad fingerprint, and ad outro fingerprint.
  • Using both time and fingerprint identification could provide a more accurate grouping and accordingly a more accurate channel identification.
  • subscribers associated with the same group may be forced to the channel associated with the group.
  • targeted advertisements may be inserted locally. The number of targeted advertisements slated to be inserted during an advertisement break may be based on the predicted duration of the advertisement break.
  • targeted advertisements may be selected for a majority of the advertisement break but not all of it. The remaining time may be filled with a still image or animation (pre-outro) that can be cut off at any time if it is desirable to return to the program without losing impact.
  • a maximum break duration is identified.
  • the maximum break duration is the maximum amount of time that the incoming video stream will be preempted. After this period of time is up, insertion of advertisements will end and display will return to the incoming video stream.
  • a pre-outro time is identified. A pre-outro is a still or animation that is presented until the maximum break duration is reached or an outro is detected, whichever is sooner.
  • the maximum break duration may be defined as 1:45 and the pre-outro may be defined as :15. Accordingly, three 30-second advertisements may be displayed during the first 1:30 of the ad break, and then the pre-outro may be displayed for the remaining :15 or until an outro is detected, whichever is sooner.
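The timing arithmetic of this example is simple enough to sketch directly; all values are parameters in practice:

```python
def plan_break(max_break=105, pre_outro=15, ad_length=30):
    """How many whole ads fit before the pre-outro must start?
    With max_break=1:45 (105 s) and pre_outro=:15, three 30-second
    ads fit, leaving no slack before the pre-outro."""
    insertable = max_break - pre_outro
    num_ads = insertable // ad_length
    return num_ads, insertable - num_ads * ad_length

print(plan_break())  # (3, 0): three ads, then the :15 pre-outro
```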
  • the maximum break duration and pre-outro time are defined so as to attempt to prevent targeted advertisements from being presented during programming. If an outro is detected while advertisements are still being inserted (e.g., before the pre-outro begins), a return to the incoming video stream may be initiated. As previously discussed, sponsorship messages may be utilized along with or in place of outros prior to the return of programming.
  • Detection of a sponsorship message will also cause the return to the incoming video stream.
  • Detection of programming may also cause the return to programming.
  • a minimum time between detection of a video entity (e.g., ad, ad intro) that starts advertisement insertion and the ability to detect a video entity (e.g., ad outro, programming) that causes ad insertion to end can be defined (minimum break duration).
  • the minimum break duration may be beneficial where intros and outros are the same.
  • the minimum break duration may be associated with a shortest advertisement period (e.g., 30 seconds).
  • a minimum duration between breaks may be defined.
  • the minimum duration between breaks may be beneficial where intros and outros are the same. The duration would come into play when the maximum break duration was reached and the display of the incoming video stream was reestablished before detection of the outro. If the outro were then detected while the incoming video stream was being displayed, it might be treated as an intro and another insertion might be started.
  • the minimum duration between breaks may also be useful where video entities similar to known intros and/or outros are used during programming but are not followed by ad breaks.
  • Such a condition may occur during replays of specific events during a sporting event, or possibly during the beginning or ending of a program, when titles and/or credits are being displayed.
  • the titles at the beginning of a program may contain sub-sequences or images that are similar to known intros and/or outros.
  • the detection of programming can be used to suppress any detection for a predefined time frame (minimum duration after program start). The minimum duration after program start ensures that once the start of a program is detected, sub-sequences or images that are similar to known intros and/or outros will not interrupt programming.
  • the detection of the beginning of programming may end the insertion of targeted advertisements or the pre-outro if the beginning of programming is identified before the maximum break duration has expired or an outro is identified.
  • the advertisement may be completed and then a return to programming may be initiated.
  • the beginning of programming may be detected by comparing a calculated fingerprint of the incoming video stream with previously generated fingerprints for the programming.
  • the fingerprints for programming may be for the scenes that are displayed during the theme song, or a particular image that is displayed once programming is about to resume (e.g., an image with the name of the program).
  • the fingerprints of programming and scenes within programming will be defined in more detail below.
  • the detection of a channel change or an electronic program guide (EPG) activation may cause the insertion of advertisements to cease and the new program or EPG to be displayed.
  • fingerprints are generated for special bulletins that may preempt advertising in the incoming video stream and correspondingly would want to preempt insertion of targeted advertising.
  • Special bulletins may begin with a standard image such as the station name and logo and the words "special bulletin" or a similar slogan. Fingerprints would be generated for each known special bulletin (one or more for each network) and stored locally. If the calculated fingerprint for an incoming video stream matched a special bulletin while a targeted advertisement or the pre-outro was being displayed, a return to the incoming video stream would be initiated.
  • the specification has concentrated on local detection of advertisements or advertisement intros and local insertion of targeted advertisements. However, the specification is not limited thereto.
  • certain programs may be detected locally.
  • the local detection of programs may enable the automatic recording of the program on a digital recording device such as a DVR.
  • specific scenes or scene changes may be detected. Based on the detection of scenes a program being recorded can be bookmarked for future viewing ease.
  • To detect a particular program, fingerprints may be established for a plurality of programs (e.g., video that plays weekly during the theme song, a program title displayed in the video stream) and calculated features for the incoming video stream may be compared to these fingerprints. When a match is detected, the incoming video stream is associated with that program. Once the association is made, a determination can be made as to whether this is a program of interest to the user.
  • a recording device may be turned on to record the program.
  • the use of fingerprints to detect the programs and ensure they are recorded without any user interaction is an alternative to using the electronic or interactive program guide to schedule recordings.
  • the recorded programs could be archived and indexed based on any number of parameters (e.g., program, genre, actor, channel, network).
  • Scene changes can be detected as described above through the matching of fingerprints. If scene changes are detected during the recording of a program, the changes in scene can be bookmarked for ease of viewing at a later time.
  • fingerprints could be generated for the incoming video stream and compared against scene fingerprints. When a match is found, the scene title could be used to bookmark the scene being recorded.
  • the subscriber may be able to initiate bookmarking.
  • the subscriber-generated bookmarking could be related to programs and/or scenes, or could be related to anything the subscriber desires (e.g., a line from a show, a goal scored in a soccer game). For example, while viewing a program being recorded the subscriber could inform the system (e.g., by pressing a button) that they wish to have that portion of the video bookmarked.
  • the system will save the calculated features (fingerprint) for a predefined number of frames (e.g., 25) or for a predefined time (e.g., 1 second) when the subscriber indicates a desire to bookmark.
  • the subscriber may have the option to provide an identification for the fingerprint that they bookmarked so that they can easily return to this portion.
  • a subscriber may desire to fingerprint an entire portion of a video stream so that they can easily return to this portion or identify the portion for further processing (e.g., copying to a DVD if allowed and appropriate).
  • a subscriber could instruct the system to save the fingerprint for the entire overtime (e.g., hold the button for the entire time to inform the system to maintain the fingerprint generated).
  • the fingerprint bookmarks and the associated programs, scenes or portions of video could be archived and indexed.
  • the fingerprints and associated video could be indexed based on any number of parameters (e.g., program, genre, actor, channel, network, user identification).
  • the bookmarks could be used as chapters so that the subscriber could easily find the sections of the programming they are interested in.
  • the fingerprint bookmarks could be indexed with other bookmarks. If an advertisement (or advertisement break) is detected during the recording of a program, the recording of the program stream may be temporarily halted. After a certain time frame (e.g., a typical advertisement block time, 2 minutes) or upon detection of an outro or programming, the recording will begin again.
  • the fingerprints stored locally may be updated as new fingerprints are generated for any combination of ads, ad intros, channel banners, program overlays, programs, and scenes.
  • the updates may be downloaded automatically at certain times (e.g., every night between 1 and 2 am), may require a user to download fingerprints from a certain location (e.g., a website), or may use any other means of updating. Automated distribution of fingerprints can also be utilized to ensure that viewers' local fingerprint libraries are up-to-date.
  • the local detection system may track the features it generates for the incoming streams, and if there is no match to a stored fingerprint the system may determine that it is a new fingerprint and may store the fingerprint.
  • if the system detects that an advertisement break has started and generates a fingerprint for the ad (e.g., a new Pepsi® ad), and the features generated for the new ad are not already stored, the calculated features may be stored for the new ad.
  • equipment can be placed in commercial establishments such as bars, hotels, and hospitals, and will allow for the recognition of known video entities (e.g., advertisements, advertisement intros, advertisement outros, sponsorship messages, programs, scenes, channel changes, EPG activations, and special bulletins) and appropriate subsequent processing.
  • a unit having the capabilities described herein is placed in a bar and is connected to an appropriate video source (e.g., the output of a receiving unit such as an STB or DVR), as well as having a connection to a data network such as the internet.
  • the unit is continually updated with fingerprints that correspond to video entities that are to be substituted, which in one case are advertisements.
  • the unit processes the incoming video and can detect the channel that is being displayed on the television using the techniques described herein.
  • the unit continually monitors the incoming video signal and, based on processing of multiple frames, full frames, sub-frames or partial images, determines a match to a known advertisement or intro.
  • the unit can access an appropriate advertisement and substitute the original advertisement with another advertisement.
  • the unit can also record that a particular advertisement was displayed on a particular channel and the time at which it was aired.
  • regions of interest in the video programming are marked and regions outside of the regions of interest are excluded from processing.
  • the marking of the regions of interest is also used to focus processing on the areas that can provide information that is useful in determining to which channel the unit is tuned.
  • the region of interest for detection of video segments is the region that is excluded for channel detection and vice versa.
  • the area that provides graphics, icons or text indicating the channel is examined for channel recognition but excluded for video segment recognition.
  • the personal/digital video recorder stores incoming video for future playback (also known as time-shifted video).
  • the functionality described herein, or portions thereof, is included in the personal/digital video recorder and allows for the recognition of video segments on the incoming video, on stored video, or on video being played back.
  • the stored fingerprints represent advertisements, while in another application the stored fingerprints represent intros to programs.
  • the personal/digital video recorder can perform advertisement recognition and substitution, or can automatically recognize segments that indicate that a program should be recorded.
  • the user designates one or more fingerprints as the basis for recording (e.g.
  • the known video entities can be established such that they are useful in classifying the video, determining content, or establishing bookmarks for future reference. It is noted that any and/or all of the embodiments, configurations, and/or variations of the present invention described above can be mixed and matched and used in any combination with one another. Moreover, any description of a component or embodiment herein also includes hardware, software, and configurations which already exist in the prior art and may be necessary to the operation of such component(s) or embodiment(s).
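The library-update behavior described in the list above (storing an ad's features when nothing stored matches them) can be illustrated with a minimal sketch. This is not the patented implementation; the flat feature vectors, the L1 distance, and the threshold value are assumptions chosen for illustration.

```python
# Sketch of the local library update from the list above: if a detected ad's
# features match no stored fingerprint, keep them as a new fingerprint.
# Feature vectors and the threshold are illustrative assumptions.
import numpy as np

def update_library(ad_features, library, threshold=25.0):
    """Return the matching stored fingerprint, or store the features as new."""
    for fp in library:
        if np.abs(np.asarray(ad_features) - np.asarray(fp)).sum() <= threshold:
            return fp                                # close enough: already known
    library.append(np.asarray(ad_features))          # no match: new fingerprint
    return ad_features
```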

Abstract

In general, in one aspect, the disclosure describes a method for detecting a known video entity within a video stream. The method includes receiving a video stream and continually creating statistical parameterized representations for windows of the video stream. The statistical parameterized representation windows are continually compared to windows of a plurality of fingerprints. Each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity. A known video entity in the video stream is detected when a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream. The disclosure also describes a method of determining a channel associated with a video stream. A method for specifying regions of interest for video event detection is also described. Furthermore, a method for ending advertisement insertion in a video stream is described.

Description

Detecting known video entities
Advertisements are commonplace in most broadcast video, including video received from satellite transmissions, cable television networks, over-the-air broadcasts, digital subscriber line (DSL) systems, and fiber optic networks. Advertising plays an important role in the economics of entertainment programming in that advertisements are used to subsidize or pay for the development of the content. As an example, broadcast of sports such as football games, soccer games, basketball games and baseball games is paid for by advertisers. Even though subscribers may pay for access to that sports programming, such as through satellite or cable network subscriptions, the advertisements appearing during the breaks in the sport are sold by the network producing the transmission of the event, and subsidize the costs of the programming.
Advertisements included in the programming may not be applicable to individuals watching the programming. For example, in the United Kingdom, sports events are frequently viewed in public locations such as pubs and bars. Pubs, generally speaking, purchase a subscription from a satellite provider for reception of sports events. This subscription allows for the presentation of the sports event in the pub to the patrons. The advertising to those patrons may or may not be appropriate depending on the location of the pub, the makeup of the clientele, the local environment, or other factors. The advertising may even promote products and services which compete with those stocked or offered by the owner of the pub.
Another environment in which advertising is presented to consumers through a commercial establishment is in hotels. In hotels, consumers frequently watch television in their rooms and are subjected to the de facto advertisements placed in the video stream. Hotels sometimes have internal channels containing advertising directed at the guests, but this tends to be an "infomercial" channel that does not have significant viewership. As is the case for pubs, the entertainment programming video streams may be purchased on a subscription basis from a satellite or cable operator, or may simply be taken from over-the-air broadcasts. In some cases, the hotel operator offers Video on Demand (VoD) services, allowing consumers to choose a movie or other program for their particular viewing. These movies are presented on a fee basis, and although there are typically some types of advertising before the movie, viewers are not subjected to advertising during the movie.
Hospitals also provide video programming to the patients, who may pay for the programming based on a daily fee, or in some instances on a pay-per-view basis. The advertising in the programming is not specifically directed at the patients, but is simply the advertising put into the programming by the content provider. Residential viewers are also presented advertisements in the vast majority of programming they view. These advertisements may or may not be the appropriate advertisements for that viewer or family.
In all of the aforementioned embodiments, it is necessary to know when an advertisement is being presented in order to substitute an advertisement that may be more applicable. Detection of the advertisements may require access to signals indicating the start and end of an advertisement. In the absence of these signals, another means is required for detecting the start and end of an advertisement or advertisement break. 
There is a need for a system and method that allows for the insertion of advertisements in video streams. There is also a need for a system which allows advertisements to be better targeted to audiences and for the ability for operators of commercial premises to cross-market services and products to the audience. Additionally, there is a need for a system which enables the operators of commercial premises to eliminate and substitute advertising of competitors' products and services included in broadcasts shown to guests on their premises. In the absence of cue tones, such as broadcaster-supplied cue tones, indicating the boundaries of advertisement breaks, another means of detecting the display of an advertisement is required. One method includes calculating features about an incoming video stream. These features may include color histograms, color coherence vectors (CCVs), and evenly or randomly highly subsampled representations of the original video (all known as fingerprints). The fingerprints of the incoming video stream are compared to a database of fingerprints for known advertisements, video sequences known to precede commercial breaks (ad intros), and/or sequences known to follow commercial breaks (ad outros). When a match is found between the incoming video stream and a known advertisement or ad intro, the incoming video stream is associated with the known advertisement and/or ad intro, and a targeted advertisement may be substituted. The fingerprint of the incoming video stream (calculated fingerprint) may be compared to a plurality of fingerprints for known entities (e.g., ads, intros, outros) within the database (known fingerprints). The comparison may be done based on small segments of a video stream at a time. A determination is made as to whether the calculated fingerprint and the known fingerprints within the database exceed some threshold level of dissimilarity. If the comparison exceeds the threshold for certain known fingerprints within the database, the comparison of the calculated fingerprint to those known fingerprints stops for the time being. For those known fingerprints for which the comparison was below the threshold level of dissimilarity, the comparison continues. At each step of the comparison, comparisons to those known fingerprints exceeding the threshold level of dissimilarity cease. The process continues until one of the known fingerprints has a comparison that exceeds a threshold level of similarity (indicating a match), or the comparisons to all of the known fingerprints within the database exceed the dissimilarity threshold, at which point the video stream is not associated with any of the known fingerprints. When comparing the fingerprint for the incoming video stream to the database of known fingerprints, certain portions of the fingerprints may be excluded. For example, a channel banner may be excluded from the calculation of the dissimilarity so as not to skew the results of comparisons to the database of known fingerprints. A channel change may be detected when the fingerprint of the incoming video stream matches a fingerprint in the database associated with a channel change (e.g., blank frames after the channel banner is removed). After a channel change is detected, certain portions of the video stream may have fingerprints calculated therefor. These portions of the video stream may include the portions of the video stream that have channel identification data. These portions of the video may be analyzed to determine the channel associated with the video stream. 
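Both uses of region masking just described (excluding an overlay region, such as a channel banner, from the dissimilarity calculation, and conversely examining only the banner region to identify the channel) can be sketched briefly. This is a minimal illustration under stated assumptions: the frame size, the banner coordinates, the per-pixel L1 distance, and the function names are ours, not the patent's.

```python
# Sketch: excluding a channel-banner region from a dissimilarity calculation,
# and, conversely, extracting only that region for channel identification.
# Frame size (480x640) and banner coordinates are assumptions.
import numpy as np

BANNER_ROWS = slice(400, 480)      # assumed banner position (bottom of frame)
BANNER_COLS = slice(0, 640)

def masked_l1(frame_a: np.ndarray, frame_b: np.ndarray) -> int:
    """Sum of absolute pixel differences over everything except the banner."""
    mask = np.ones(frame_a.shape[:2], dtype=bool)
    mask[BANNER_ROWS, BANNER_COLS] = False        # banner pixels do not count
    diff = frame_a[mask].astype(int) - frame_b[mask].astype(int)
    return int(np.abs(diff).sum())

def banner_region(frame: np.ndarray) -> np.ndarray:
    """The banner portion of a frame, e.g., for channel identification."""
    return frame[BANNER_ROWS, BANNER_COLS]
```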
Fingerprints for the portions may be calculated and compared to fingerprints for known channel identification data. The channel may also be detected by comparing detected advertisement breaks with known advertisement breaks for specific channels, or by comparing fingerprints for the incoming stream with fingerprints for known channels stored in a database. Determining the channel may affect the ads that are inserted and is useful in reporting the programs into which targeted advertising has been inserted. Determining the channel also allows remote manual triggering of ad insertion, while detection of a channel change event is used as a trigger to end ad insertion. Furthermore, if a network frequently overlays or covers a portion of frames in the video stream, that portion of each frame can be excluded during the calculation of the dissimilarity so as not to skew the results of comparisons to the database of known fingerprints. Alternatively, when the fingerprints are generated, certain portions may be identified and excluded. According to one embodiment, certain portions may be excluded from the database of known fingerprints as well as from the calculated fingerprint. When targeted advertisements are being inserted, the system continues to generate fingerprints for the incoming video stream and to compare them to known fingerprints stored in the database in order to look for outros or programming that would indicate the end of the commercial break in the incoming video stream. In addition, channel changes or EPG activations may be detected. The detection of the end of a commercial break may cause the system to instantly return to the incoming video stream. Alternatively, the advertisement currently being inserted may be completed before the incoming video stream is returned to. Additionally, time parameters may be set that automatically return to the video stream even if an end of the commercial break is not detected. After a certain time the system may present a pre-outro (e.g., a still image) that is displayed until the end of the commercial break is detected. Calculating fingerprints for the incoming video stream and comparing them to a database of fingerprints of known entities can also be used to detect certain programs and/or scenes and to record, bookmark or stop recording the programs and/or scenes.
Preferred embodiments of the present invention will now be described by way of example only and with reference to the accompanying drawings, in which: FIG. 1 illustrates an exemplary content delivery system, according to one embodiment; FIG. 2 illustrates an exemplary configuration for local detection of advertisements within a video programming stream, according to one embodiment; FIG. 3 illustrates an exemplary pixel grid for a video frame and an associated color histogram, according to one embodiment; FIG. 4 illustrates an exemplary comparison of two color histograms, according to one embodiment; FIG. 5 illustrates an exemplary pixel grid for a video frame and associated color histogram and CCVs, according to one embodiment; FIG. 6 illustrates an exemplary comparison of color histograms and CCVs for two images, according to one embodiment; FIG. 6A illustrates edge pixels for two exemplary consecutive images, according to one embodiment; FIG. 6B illustrates macroblocks for two exemplary consecutive images, according to one embodiment; FIG. 7 illustrates an exemplary pixel grid for a video frame with a plurality of regions sampled, according to one embodiment;
FIG. 8 illustrates two exemplary pixel grids having a plurality of regions for sampling and coherent and incoherent pixels identified, according to one embodiment; FIG. 9 illustrates exemplary comparisons of the pixel grids of FIG. 8 based on color histograms for the entire frame, CCVs for the entire frame and average color for the plurality of regions, according to one embodiment; FIG. 10 illustrates an exemplary flowchart of the advertisement matching process, according to one embodiment; FIG. 11 illustrates an exemplary flowchart of an initial dissimilarity determination process, according to one embodiment; FIG. 12 illustrates an exemplary initial comparison of calculated features for an incoming stream versus initial portions of fingerprints for a plurality of known advertisements, according to one embodiment; FIG. 13 illustrates an exemplary initial comparison of calculated features for an incoming stream versus an expanded initial portion of a fingerprint for a known advertisement, according to one embodiment; FIG. 14 illustrates an exemplary expanding window comparison of the features of the incoming video stream and the features of the fingerprints of known advertisements, according to one embodiment; FIG. 15 illustrates an exemplary pixel grid divided into sections, according to one embodiment; FIG. 16 illustrates an exemplary comparison of two whole images and corresponding sections of the two images, according to one embodiment; FIG. 17 illustrates an exemplary comparison of pixel grids by sections, according to one embodiment; FIG. 18 illustrates several exemplary images with different overlays, according to one embodiment; FIG. 19A illustrates an exemplary impact on pixel grids of an overlay being placed on the corresponding image, according to one embodiment; FIG. 19B illustrates an exemplary pixel grid with a region of interest excluded, according to one embodiment; FIG. 20 illustrates an exemplary image to be fingerprinted that is divided into four sections and has a portion to be excluded from fingerprinting, according to one embodiment; FIG. 21 illustrates an exemplary image to be fingerprinted that is divided into a plurality of regions that are evenly distributed across the image and has a portion to be excluded from fingerprinting, according to one embodiment; FIG. 22 illustrates exemplary channel change images, according to one embodiment; and FIG. 23 illustrates an image with expected locations of a channel banner and channel identification information within the channel banner identified, according to one embodiment.
In describing various embodiments illustrated in the drawings, specific terminology will be used for the sake of clarity. However, the embodiments are not intended to be limited to the specific terms so selected, and it is to be understood that each specific term includes all technical equivalents which operate in a similar manner to accomplish a similar purpose.
FIG. 1 illustrates an exemplary content delivery system 100. The system 100 includes a broadcast facility 110 and receiving/presentation locations. The broadcast facility 110 transmits content to the receiving/presentation facilities, and the receiving/presentation facilities receive the content and present it to subscribers. The broadcast facility 110 may be a satellite transmission facility, a head-end, a central office or other distribution center. The broadcast facility 110 may transmit the content to the receiving/presentation locations via satellite 170 or via a network 180. 
The network 180 may be the Internet, a cable television network (e.g., hybrid fiber cable, coaxial), a switched digital video network (e.g., digital subscriber line or fiber optic network), a broadcast television network, another wired or wireless network, a public network, a private network, or some combination thereof. The receiving/presentation facilities may include residences 120, pubs, bars and/or restaurants 130, hotels and/or motels 140, businesses 150, and/or other establishments 160. In addition, the content delivery system 100 may also include a Digital
Video Recorder (DVR) that allows the user (residential or commercial establishment) to record and play back the programming. The methods and systems described herein can be applied to DVRs both with respect to content being recorded as well as content being played back. The content delivery network 100 may deliver many different types of content. However, for ease of understanding, the remainder of this disclosure will concentrate on programming, and specifically video programming. Many programming channels include advertisements with the programming. The advertisements may be provided before and/or after the programming, may be provided in breaks during the programming, or may be provided within the programming (e.g., product placements, bugs, banner ads). For ease of understanding, the remainder of the disclosure will focus on advertisement opportunities that are provided between programming, whether between programs (e.g., after one program and before another) or during programming (e.g., advertisement breaks in programming, during time-outs in sporting events). The advertisements may subsidize the cost of the programming and may provide additional sources of revenue for the broadcaster (e.g., satellite service provider, cable service provider). In addition to being able to recognize advertisements, it is also possible to detect particular scenes of interest or to generically detect scene changes. A segment of video, a particular image, or a scene change between images which is of interest can be considered to be a video entity. The library of video segments, images, scene changes between images, or fingerprints of those images can be considered to be comprised of known video entities. As the advertisements provided in the programming may not be appropriate to the audience watching the programming at the particular location, substituting advertisements may be beneficial and/or desired. Substitution of advertisements can be performed locally (e.g., residence 120, pub 130, hotel 140) or may be performed somewhere in the video distribution system 100 (e.g., head end, nodes) and then delivered to a specific location (e.g., pub 130), a specific geographic region (e.g., neighborhood), subscribers having specific traits (e.g., demographics) or some combination thereof. For ease of understanding, the remaining disclosure will focus on local substitution rather than the substitution and delivery of targeted advertisements from within the system 100. Substituting advertisements requires that advertisements be detected within the programming. The advertisements may be detected using information that is embedded in the program stream to define where the advertisements are. For analog programming, cue tones may be embedded in the programming to mark the advertisement boundaries. For digital programming, digital cue messages may be embedded in the programming to identify the advertisement boundaries. Once the cue tones or cue messages are detected, a targeted advertisement or targeted advertisements may be substituted in place of a default advertisement, default advertisements, or an entire advertisement block. The local detection of cue tones (or cue messages) and substitution of targeted advertisements may be performed by local system equipment including a set top box (STB) or DVR. However, not all programming streams include cue tones or cue messages. 
Moreover, cue tones may not be transmitted to the STB or DVR since the broadcaster may desire to suppress them to prevent automated ad detection (and potential deletion). Techniques for detecting advertisements without the use of cue tones or cue messages include manual detection (e.g., individuals detecting the start of advertisements) and automatic detection. Regardless of what technique is used, the detection can be performed at various locations (e.g., pubs 130, hotels 140). Alternatively, the detection can be performed external to the locations, where the external detection points may be part of the system (e.g., node, head end) or may be external to the system. The external detection points would inform the locations (e.g., pubs 130, hotels 140) of the detection of an advertisement or advertisement block. The communications from the external detection point to the locations could be via the network 180. For ease of understanding this disclosure, we will focus on local detection. FIG. 2 illustrates an exemplary configuration for manual local detection of advertisements within a video programming stream. The incoming video stream is received by a network interface device (NID) 200. The type of network interface device will be dependent on how the incoming video stream is being delivered to the location. For example, if the content is being delivered via satellite (e.g., 170 of FIG. 1), the NID 200 will be a satellite dish (illustrated as such) for receiving the incoming video stream. The incoming video stream is provided to an STB 210 (a tuner) that tunes to a desired channel, and possibly decodes the channel if encrypted or compressed. It should be noted that the STB 210 may also be capable of recording programming, as is the case with a DVR or video cassette recorder (VCR). The STB 210 forwards the desired channel (video stream) to a splitter 220 that provides the video stream to a detection/replacement device 230 and a selector (e.g., A/B switch) 240. The detection/replacement device 230 detects and replaces advertisements by creating a presentation stream consisting of programming with targeted advertisements. The selector 240 can select which signal (video stream or presentation stream) to output to an output device 250 (e.g., television). The selector 240 may be controlled manually by an operator, may be controlled by a signal/message (e.g., ad break beginning message, ad break ending message) that was generated and transmitted from an upstream detection location, and/or may be controlled by the detection/replacement device 230. The splitter 220 and the selector 240 may be used as a bypass circuit in case of an operations issue or problem in the detection/replacement device 230. The default mode for the selector 240 may be to pass through the incoming video stream. According to one embodiment, manually switching the selector 240 to the detection/replacement device 230 may cause the detection/replacement device 230 to provide advertisements (e.g., targeted advertisements) to be displayed to the subscriber (viewer, user). That is, the detection/replacement device 230 may not detect and insert the advertisements in the program stream to create a presentation stream. Accordingly, the manual switching of the selector 240 may be the equivalent of switching a channel from a program content channel to an advertisement channel. Accordingly, this embodiment would have no copyright issues associated therewith, as no recording, analyzing, or manipulation of the program stream would be required. 
While the splitter 220, the detection/replacement device 230, and the selector 240 are all illustrated as separate components, they are not limited thereby. Rather, all the components could be part of a single component (e.g., the splitter 220 and the selector 240 contained inside the detection/replacement device 230; the splitter 220, the detection/replacement device 230, and the selector 240 could be part of the STB 210). Automatic techniques for detecting advertisements (or advertisement blocks) may include detecting aspects (features) of the video stream that indicate an advertisement is about to be displayed or is being displayed (feature-based detection). For example, advertisements are often played at a higher volume than programming, so a sudden volume increase (without commands from a user) may indicate an advertisement. Many times several dark monochrome (black) frames of video are presented prior to the start of an advertisement, so the detection of these types of frames may indicate an advertisement. The above-noted techniques may be used individually or in combination with one another. These techniques may be utilized along with temporal measurements, since commercial breaks often begin within a certain known time range. However, these techniques may miss advertisements if the volume increase or the display of black frames is missing or does not meet a detection threshold. Moreover, these techniques may result in false positives (detection of an advertisement when one is not present), as the programming may include volume increases or sequences of black frames. Frequent scene/shot breaks are more common during an advertisement, since action/scene changes stimulate interest in the advertisement. Additionally, there are typically more action and scene changes during an advertisement block. Accordingly, another possible automatic feature-based technique for detecting advertisements is the detection of scene/shot breaks (or frequent scene/shot breaks) in the video programming. Scene breaks may be detected by comparing consecutive frames of video. Comparing the actual images of consecutive frames may require significant processing. Alternatively, scene/shot breaks may be detected by computing characteristics for consecutive frames of video and comparing these characteristics. The computed characteristics may include, for example, a color histogram or a color coherence vector (CCV). The detection of scene/shot breaks may result in many false positives (detection of scene changes in programming as opposed to actual advertisements). A color histogram is an analysis of the number of pixels of various colors within an image or frame. Prior to calculating a color histogram, the frame may be scaled to a particular size (e.g., number of pixels), the colors may be reduced to the most significant bits for each color of the red, green, blue (RGB) spectrum, and the image may be smoothed by filtering. As an example, if the RGB spectrum is reduced to the 2 most significant bits for each color (4 versions of each color), there will be a total of 6 bits for the RGB color spectrum, or 64 total color combinations (2^6). FIG. 3 illustrates an exemplary pixel grid 300 for a video frame and an associated color histogram 310. As illustrated, the pixel grid 300 is 4x4 (16 pixels) and each pixel is identified by a six-digit number, with each two-digit portion representing a specific color (RGB). Below the digits is the color identifier for each color. 
For example, an upper right grid has 000000 as its six-digit number, which equates to R0, G0 and B0. As discussed, the color histogram 310 is the count of each color in the overall pixel grid. For example, there are 9 R0's in FIG. 3. FIG. 4 illustrates an exemplary comparison of two color histograms 400, 410. The comparison entails computing the difference/distance between the two. The distance may be computed, for example, by summing the absolute differences
(L1-Norm) 420 or by summing the squares of the differences (L2-Norm) 430. For simplicity and ease of understanding, we assume that the image contains only 9 pixels and that each pixel has the same bit identifier for each of the colors in the RGB spectrum, so that a single number represents all colors. The difference between the color histograms 400, 410 is 6 using the absolute difference method 420 and 10 using the squared difference method 430. Depending on the method utilized to compare the color histograms, the threshold used to detect scene changes or other parameters may be adjusted accordingly. A color histogram tracks the total number of colors in a frame. Thus, it is possible that, when comparing two frames that are completely different but utilize similar colors throughout, a false match will occur. CCVs divide the colors from the color histogram into coherent and incoherent ones based on how the colors are grouped together. Coherent colors are colors that are grouped together in more than a threshold number of connected pixels, and incoherent colors are colors that are either not grouped together or are grouped together in less than a threshold number of pixels. For example, if 8 is the threshold and there are only 7 red pixels grouped (connected together), then these 7 red pixels are considered incoherent. FIG. 5 illustrates an exemplary pixel grid 500 for a video frame and associated color histogram 510 and CCVs 520, 530. For ease of understanding, we assume that all of the colors in the pixel grid have the same number associated with each of the colors (RGB), so that a single number represents all colors, and that the pixel grid 500 is limited to 16 pixels. Within the grid 500 there are some colors that are grouped together (having at least one other pixel of the same color at a connected pixel, i.e., one of the 8 touching pixels) and some colors that are by themselves. For example, two color 1s, four color 2s, and four (two sets of 2) color 3s are grouped (connected), while three color 0s, one color 1, and two color 3s are not grouped (connected). The color histogram 510 indicates the number of each color. A first CCV 520 illustrates the number of coherent and incoherent colors assuming that the threshold grouping for being considered coherent is 2 (that is, a grouping of two pixels of the same color means the pixels are coherent for that color). A second CCV 530 illustrates the number of coherent and incoherent colors assuming that the threshold grouping was 3. The colors impacted by the change in threshold are color 0 (which went from 2 coherent and 1 incoherent to 0 coherent and 3 incoherent) and color 3 (which went from 4 coherent and 2 incoherent to 0 coherent and 6 incoherent). Depending on the method utilized to compare the CCVs, the threshold used for detecting scene changes or other parameters may be adjusted accordingly. FIG. 6 illustrates an exemplary comparison of color histograms 600, 610 and CCVs 620, 630 for two images. In order to compare, the differences (distances) between the color histograms and the CCVs can be calculated. The differences may be calculated, for example, by summing the absolute differences (L1-Norm) or by summing the squares of the differences (L2-Norm). For simplicity and ease of understanding, assume that the image contains only 9 pixels and that each pixel has the same bit identifier for each of the colors in the RGB spectrum. As illustrated, the color histograms 600, 610 are identical, so the difference (ΔCH) is 0 (the calculation is illustrated for summing the absolute differences). 
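The quantities used in these comparisons, histograms, CCVs, and their L1/L2 distances, can be sketched in a few lines; the FIG. 6 comparison continues below. This is an illustrative sketch only: frames are assumed to be already quantized to 64 colors as described for FIG. 3, scipy's connected-component labeling stands in for whatever grouping method an implementation would use, and the coherence threshold tau defaults to the example value of 8.

```python
# Sketch of the color histogram, CCV, and distance computations described
# above; assumes frames already quantized to values 0..63 (2 bits/channel).
import numpy as np
from scipy import ndimage

def color_histogram(q: np.ndarray, n_colors: int = 64) -> np.ndarray:
    """Count of pixels of each quantized color."""
    return np.bincount(q.ravel(), minlength=n_colors)

def ccv(q: np.ndarray, n_colors: int = 64, tau: int = 8):
    """Split each color's count into coherent pixels (connected groups of at
    least tau pixels, 8-connectivity) and incoherent pixels."""
    coherent = np.zeros(n_colors, dtype=int)
    incoherent = np.zeros(n_colors, dtype=int)
    eight = np.ones((3, 3), dtype=bool)        # a pixel plus its 8 neighbors
    for color in range(n_colors):
        labels, n = ndimage.label(q == color, structure=eight)
        sizes = np.bincount(labels.ravel())    # sizes[0] is the background
        for group in range(1, n + 1):
            if sizes[group] >= tau:
                coherent[color] += sizes[group]
            else:
                incoherent[color] += sizes[group]
    return coherent, incoherent

def l1_distance(a, b) -> int:
    """Sum of absolute differences (L1-Norm)."""
    return int(np.abs(np.asarray(a) - np.asarray(b)).sum())

def l2_distance(a, b) -> int:
    """Sum of squared differences (L2-Norm)."""
    return int(((np.asarray(a) - np.asarray(b)) ** 2).sum())
```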
The difference (ΔCCV) between the two CCVs 620, 630 is 8 (based on the sum of the absolute differences method). Another possible feature-based automatic advertisement detection technique includes detecting action (e.g., fast moving objects, hard cuts, zooms, changing colors), as an advertisement may have more action in a short time than the programming. According to one embodiment, action can be determined using edge change ratios (ECR). ECR detects structural changes in a scene, such as entering, exiting and moving objects. The changes are detected by comparing the edge pixels of consecutive images (frames), n and n-1. Edge pixels are the pixels that form the exterior of distinct objects within a scene (e.g., a person, a house). A determination is made as to the total number of edge pixels for the two consecutive images, σ_n and σ_n-1; the number of edge pixels exiting the first image, X_n-1^out; and the number of edge pixels entering the second image, X_n^in. The ECR is the maximum of (1) the ratio of exiting edge pixels to total edge pixels for the first image (X_n-1^out / σ_n-1), or (2) the ratio of entering edge pixels to total edge pixels for the second image (X_n^in / σ_n); that is, ECR = max(X_n-1^out / σ_n-1, X_n^in / σ_n). FIG. 6A illustrates two exemplary consecutive images, n and n-1. Edge pixels for each of the images are shaded. The total number of edge pixels for image n-1, σ_n-1, is 43, while the total number of edge pixels for image n, σ_n, is 32. The pixels circled in image n-1 are not part of image n (they exited image n-1). Accordingly, the number of edge pixels exiting image n-1, X_n-1^out, is 22. The pixels circled in image n were not part of image n-1 (they entered image n). Accordingly, the number of edge pixels entering image n, X_n^in, is 13. The ECR is the greater of the two ratios, X_n-1^out / σ_n-1 (22/43 ≈ 0.512) and X_n^in / σ_n (13/32 ≈ 0.406). Accordingly, the ECR value is 0.512. According to one embodiment, action can be determined using a motion vector length (MVL). The MVL divides images (frames) into macroblocks (e.g., 16x16 pixels). A determination is then made as to where each macroblock is in the next image (e.g., the distance between the macroblock's positions in consecutive images). The determination may be limited to a certain number of pixels (e.g., 20) in each direction. If the location of the macroblock cannot be determined, then a predefined maximum distance may be used (e.g., 20 pixels in each direction). The length vector for each macroblock can be calculated as the square root of the sum of the squares of the differences between the x and y coordinates: √((x1 - x2)² + (y1 - y2)²).
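Before turning to the FIG. 6B example below, the two action measures just defined lend themselves to a short sketch. This is a minimal illustration under stated assumptions: edge maps are supplied as boolean arrays (edge extraction itself is not shown and both images are assumed to contain edges), exiting/entering pixels are taken as set differences of the two maps, and the function names are ours, not the patent's.

```python
# Minimal sketch of the ECR and MVL action measures defined above.
import numpy as np

def ecr(edges_prev: np.ndarray, edges_curr: np.ndarray) -> float:
    """Edge change ratio between boolean edge maps of images n-1 and n."""
    exiting = int((edges_prev & ~edges_curr).sum())   # X_n-1^out: edges that left
    entering = int((edges_curr & ~edges_prev).sum())  # X_n^in: edges that appeared
    return max(exiting / edges_prev.sum(),            # X_n-1^out / sigma_n-1
               entering / edges_curr.sum())           # X_n^in / sigma_n

def mvl(x1: int, y1: int, x2: int, y2: int) -> float:
    """Length of a macroblock's motion vector between consecutive images."""
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
```

With the coordinates used in the FIG. 6B discussion that follows, mvl(1, 1, 2, 2) returns approximately 1.41 and mvl(9, 7, 11, 9) approximately 2.83, matching the example values below.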
FIG. 6B illustrates two exemplary consecutive images, n and n-1. The images are divided into a plurality of macroblocks (as illustrated, each macroblock is 4 (2x2) pixels). Four specific macroblocks are identified with shading and are labeled 1-4 in the first image, n-1. A maximum search area is defined around the 4 specific macroblocks as a dotted line (as illustrated, the search area is one macroblock in each direction). The four macroblocks are identified with shading on the second image, n. Comparing the specified macroblocks between images reveals that the first and second macroblocks moved within the defined search area, the third macroblock did not move, and the fourth macroblock moved out of the search area. If the upper left hand pixel is used as the coordinates for the macroblock, it can be seen that MB1 moved from 1,1 to 2,2; MB2 moved from 9,7 to 11,9; MB3 did not move from 5,15; and MB4 moved from 13,13 to outside of the range. Since MB4 could not be found within the search window, a maximum distance of 3 pixels in each direction is defined. Accordingly, the length vector for the macroblocks is 1.41 for MB1, 2.83 for MB2, 0 for MB3, and 4.24 for MB4. As with the other feature-based automatic advertisement detection techniques, the action detection techniques (e.g., ECR, MVL) do not always provide a high level of confidence that an advertisement has been detected and may also lead to false positives. According to one embodiment, several of these techniques may be used in conjunction with one another to produce a result with a higher degree of confidence, which may reduce the number of false positives and detect the advertisements faster. However, as the feature-based techniques are based solely on recognition of features that may be present more often in advertisements than programming, there can probably never be a complete level of confidence that an advertisement has been detected. In addition, it may take a long time (several advertisements) to recognize that these features are present. In some countries, commercial break intros are utilized to indicate to the viewers that the subsequent material being presented is not programming but rather sponsored advertising. These commercial break intros vary in nature but may include certain logos, characters, or other specific video and audio messages to indicate that the subsequent material is not programming but rather advertising. The return to programming may in some instances also be preceded by a commercial break outro, which is a short video segment that indicates the return to programming. In some cases the intros and the outros may be the same, with an identical programming segment being used for both the intro and the outro. Detecting the potential presence of the commercial break intros or outros may indicate that an advertisement (or advertisement block) is about to begin or end, respectively. If the intros and/or outros were always the same, detection could be done by detecting the existence of specific video or audio, specific logos or characters in the video stream, or specific features about the video stream (e.g., CCVs). However, the intros and/or outros need not be the same. The intros/outros may vary based on at least some subset of day, time, channel (network), program, and advertisement (or advertisement break). 
Intros may be several frames of video easily recognized by the viewer, but may also be icons, graphics, text, or other representations that do not cover the entire screen or which are only shown for very brief periods of time. Increasingly, broadcasters are also selling sponsorship of certain programming, which means that a sponsor's short message appears on either side (beginning or end) of each ad break during that programming. These sponsorship messages can also be used as latent cue tones indicating the start and end of ad breaks. The detection of the intros, outros, and/or sponsorship messages may be based on comparing the incoming video stream to a plurality of known intros, outros, and/or sponsorship messages. This would require that each of a plurality of known intros, outros, and/or sponsorship messages be stored and that the incoming video stream be compared to each. This may require a large amount of storage and may require significant processing as well, including the use of non-real-time processing. Such storage and processing may not be feasible or practical, especially for real-time detection systems. Moreover, storing the known advertisements for comparing to the video programming could potentially be considered a copyright violation. The detection of the intros, outros, and/or sponsorship messages may be based on detecting messages, logos or characters within the video stream and comparing them to a plurality of known messages, logos or characters from known intros, outros, and/or sponsorship messages. The incoming video may be processed to find these messages, logos or characters. The known messages, logos or characters would need to be stored in advance, along with an association to an intro or outro. The comparison of the detected messages, logos or characters to the known messages, logos or characters may require significant processing, including the use of non-real-time processing. Moreover, storing the known messages, logos or characters for comparison to messages, logos or characters from the incoming video stream could potentially be considered a copyright violation. The detection of the intros, outros, and/or sponsorship messages may be based on detecting messages within the video stream and determining the meaning of the words (e.g., detecting text in the video stream and analyzing the text to determine whether it means an advertisement is about to start). Alternatively, the detection may be based on calculating features (statistical parameters) about the incoming video stream. The features calculated may include, for example, color histograms or CCVs as discussed above. The features may be calculated for an entire video frame, as discussed above, for a number of frames, or for evenly/randomly highly subsampled representations of the video frame. For example, the video frame could be sampled at a number (e.g., 64) of random locations or regions in the video frame, and parameters (such as average color) may be computed for each of these regions. The subsampling can also be performed in the temporal domain. The collection of features, including CCVs for a plurality of images/frames or color histograms for a plurality of regions, may be referred to as a fingerprint. FIG. 7 illustrates an exemplary pixel grid 700 for a video frame. 
For ease of understanding, we limit the pixel grid to 12x12 (144 pixels), limit the color variations for each color (RGB) to the two most significant bits (4 color variations), and have each pixel have the same number associated with each of the colors (RGB) so that a single number represents all colors. A plurality of regions 710, 720, 730, 740, 750, 760, 770, 780, 785, 790, 795 of the pixel grid 700 are sampled, and an average color for each of the regions 710, 720, 730, 740, 750, 760, 770, 780, 785, 790, 795 is calculated. For example, the region 710 has an average color of 1.5, the region 790 has an average color of 0.5, and the region 795 has an average color of 2.5. One advantage of sampling regions of a frame instead of an entire frame is that the entire frame would not need to be copied in order to calculate the features (if copying was even needed to calculate the features). Rather, certain regions of the image may be copied in order to calculate the features for those regions. As the regions of the frame would provide only a partial image and could not be used to recreate the image, there would be fewer potential copyright issues. As will be discussed in more detail later, the generation of fingerprints for known entities (e.g., advertisements, intros) that are stored in a database for comparison could be done for regions as well, and therefore creates fewer potential copyright issues.
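A region-sampled fingerprint of this kind can be sketched as follows. The frame size, region count, region size, and fixed seed are illustrative assumptions; the point is that only small regions are ever examined, so the frame is neither stored nor reconstructable from the fingerprint.

```python
# Sketch of a region-sampled fingerprint: the average color of 64 small,
# pseudo-randomly placed regions of a frame. Frame size (480x640), region
# size (8x8), and the fixed seed are assumptions for illustration.
import numpy as np

RNG = np.random.default_rng(seed=0)      # fixed seed: same regions every run
ROWS = RNG.integers(0, 472, size=64)     # top-left corners of the regions
COLS = RNG.integers(0, 632, size=64)
SIZE = 8

def region_fingerprint(frame: np.ndarray) -> np.ndarray:
    """Average color of each sampled region of one frame."""
    return np.array([frame[r:r + SIZE, c:c + SIZE].mean()
                     for r, c in zip(ROWS, COLS)])
```

A fingerprint for a segment would then be the sequence of such per-frame vectors.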
FIG. 8 illustrates two exemplary pixel grids 800 and 810. Each of the pixel grids is 11x11 (121 pixels) and is limited to a single bit (0 or 1) for each of the colors. The top view of each pixel grid 800, 810 has a plurality of regions identified, 815-850 and 855-890 respectively. The lower view of each pixel grid 800, 810 has the coherent and incoherent pixels identified, where the threshold level is greater than 5. FIG. 9 illustrates exemplary comparisons of the pixel grids 800, 810 of FIG. 8. Color histograms 900, 910 are for the entire frames 800, 810 respectively, and the difference in the color histograms 920 is 0. CCVs 930, 940 are for the entire frames 800, 810 respectively, and the difference in the CCVs 950 is 0. Average colors 960, 970 capture the average colors for the various identified regions in frames 800, 810. The difference in the average color of the regions 980 is 3.5 (using the sum of absolute values). FIGs. 7-9 focused on determining the average color for each of the regions, but the techniques illustrated therein are not limited to average color determinations. For example, a color histogram or CCV could be generated for each of these regions. For CCVs to provide useful benefits, the regions would have to be big enough, or all of the colors will be incoherent. All of the colors will be coherent if the coherence threshold is made too low. The calculated features/fingerprints (e.g., CCVs, evenly/randomly highly subsampled representations) are compared to corresponding features/fingerprints for known intros and/or outros. The fingerprints for the known intros and outros could be calculated and stored in advance. The comparison of calculated features of the incoming video stream (statistical parameterized representations) to the stored fingerprints for known intros/outros will be discussed in more detail later. Another method for detecting the presentation of an advertisement is automatic detection of the advertisement. Automatic detection techniques may include recognizing that the incoming video stream is a known advertisement. Recognition techniques may include comparing the incoming video stream to known video advertisements. This would require that each of a plurality of known video advertisements be stored in order to do the comparison, which would require a relatively large amount of storage and would likely require significant processing, including non-real-time processing. Such storage and processing may not be feasible or practical, especially for real-time detection systems. Moreover, storing the known advertisements for comparison to the video programming could potentially be considered a copyright violation. Accordingly, a more practical automatic advertisement recognition technique may be to calculate features (statistical parameters) about the incoming video stream and to compare the calculated features to a database of the same features (previously calculated) for known advertisements. The features may include color histograms, CCVs, and/or evenly/randomly highly subsampled representations of the video stream, as discussed above, or may include other features such as text and object recognition, logo or other graphic overlay recognition, and unique spatial frequencies or patterns of spatial frequencies (e.g., salient points). The features may be calculated for images (e.g., frames) or portions of images (e.g., portions of frames). The features may be calculated for each image (e.g., all frames) or for certain images (e.g., every I-frame in an MPEG stream). The combination of features for different images (or portions of images) makes up a fingerprint. The fingerprint (features created from multiple frames or frame portions) may include unique temporal characteristics instead of, or in addition to, the unique spatial characteristics of a single image. The features/fingerprints for the known advertisements or other segments of programming (also referred to as known video entities) may have been pre-calculated and stored at the detection point. 
For the known advertisements, the fingerprints may be calculated for the entire advertisement, so that the known advertisement fingerprint includes calculated features for the entire advertisement (e.g., every frame of an entire 30-second advertisement). Alternatively, the fingerprints may be calculated for only a portion of the known advertisements (e.g., 5 seconds). The portion should be large enough so that effective matching to the calculated fingerprint for the incoming video stream is possible. For example, an effective match may require comparison of at least a certain number of images/frames (e.g., 10), as the false negatives may be high if less comparison is performed. FIG. 10 illustrates an exemplary flowchart of the advertisement matching process. Initially, the video stream is received 1000. The received video stream may be analog or digital video. The processing may be done on either analog or digital video, but is computationally easier on digital video (accordingly, digital video may be preferred). Therefore, the video stream may be digitized 1010 if it is received as analog video. Features (statistical parameters) are calculated for the video stream 1020. The features may include CCVs, color histograms, other statistical parameters, or a combination thereof. As mentioned above, the features can be calculated for images or for portions of images. The calculated features/fingerprints are compared to corresponding fingerprints (e.g., CCVs are compared to CCVs) for known advertisements 1030. According to one embodiment, the comparison is made to the pre-stored fingerprints of a plurality of known advertisements (fingerprints of known advertisements stored in a database). The comparison 1030 may be made to the entire fingerprint for the known advertisements, or may be made after comparing to some portion of the fingerprints (e.g., 1 second, which is approximately 25 frames, or 35 frames, which is approximately 1.4 seconds) that is large enough to make a determination regarding similarity. A determination is made as to whether the comparison was to entire fingerprints (or some large enough portion) 1040. If the entire fingerprint (or large enough portion) was not compared (1040 No), additional video stream will be received and have features calculated and compared to the fingerprint (1000-1030). If the entire fingerprint (or large enough portion) was compared (1040 Yes), then a determination is made as to whether the features of the incoming video stream meet a threshold level of similarity with any of the fingerprints 1050. If the features for the incoming video stream do not meet a threshold level of similarity with one of the known advertisement fingerprints (1050 No), then the incoming video stream is not associated with a known advertisement 1060. If the features for the incoming video stream meet a threshold level of similarity with one of the known advertisement fingerprints (1050 Yes), then the incoming video stream is associated with the known advertisement (the incoming video stream is assumed to be the advertisement) 1070. Once it is determined that the incoming video stream is an advertisement, ad substitution may occur. Targeted advertisements may be substituted in place of all advertisements within an advertisement block. The targeted advertisements may be inserted in order or may be inserted based on any number of parameters including day, time, program, last time ads were inserted, and default advertisement
(the advertisement it is replacing). For example, a particular advertisement may be next in the queue to be inserted as long as the incoming video stream is not tuned to a particular program (e.g., a Nike® ad may be next in the queue but may be restricted from being substituted in football games because Adidas is a sponsor of the football league). Alternatively, the targeted advertisements may only be inserted in place of certain default advertisements. The determination of which default ads should be substituted with targeted ads may be based on the same or similar parameters as noted above with respect to the order of targeted ad insertion. For example, beer ads may not be substituted in a bar, especially if the bar sells that brand of beer. Conversely, if a default ad for a competitor hotel is detected in the incoming video stream at a hotel, the default ad should be replaced with a targeted ad. The process described above with respect to FIG. 10 is focused on detecting advertisements within the incoming video stream. However, the process is not limited to advertisements. For example, the same or a similar process could be used to compare calculated features for the incoming video stream to a database of fingerprints for known intros (if intros are used in the video delivery system) or known sponsorships (if sponsorships are used). A detected match would indicate that an intro is being displayed and that an advertisement break is about to begin. Ad substitution could begin once the intro is detected. According to one embodiment, targeted advertisements may be inserted for an entire advertisement block (e.g., until an outro is detected). The targeted advertisements may be inserted in order or may be inserted based on any number of parameters including day, time, program, and last time ads were inserted. Alternatively, the targeted advertisements may only be inserted in place of certain default advertisements. To limit insertion of targeted advertisements to specific default advertisements would require the detection of specific advertisements. The intro or sponsorship may provide some insight as to what ads may be played in the advertisement block. For example, the intro detected may be associated with (often played prior to) an advertisement break in a soccer game, and the first ad played may normally be a beer advertisement. This information could be used to limit the comparison of the incoming video stream to ad fingerprints for known beer advertisements as stored in an indexed ad database, or could be used to assist in the determination of which advertisement to substitute. For example, a restaurant that did not serve alcohol may want to replace the beer advertisement with an advertisement for a non-alcoholic beverage. The level of similarity is based on the substitutions, deletions and insertions of features necessary to align the features of the incoming video stream with a fingerprint (the minimal distance between the two). It is regarded as a match between the fingerprint sequences for the incoming video stream and a known advertisement if the minimal distance between them does not exceed a distance threshold and the difference in length of the fingerprints does not exceed a length difference threshold. Approximate substring matching may allow detection of commercials that have been slightly shortened or lengthened, or whose color characteristics have been affected by different modes or quality of transmission. 
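The alignment just described, based on substitutions, deletions, and insertions, behaves like an edit distance over sequences of frame features. The sketch below is one way to realize it; the dynamic-programming formulation, the insertion/deletion penalty, and the use of an L1 frame distance are our assumptions, not the patent's prescribed method.

```python
# Sketch: minimal-cost alignment of two fingerprint sequences, where a
# substitution costs the L1 distance between frame features and an
# insertion/deletion costs a fixed penalty (an assumed value).
import numpy as np

def align_cost(stream_fp, known_fp, indel_cost: float = 10.0) -> float:
    """Edit-distance-style alignment cost between two sequences of per-frame
    feature arrays (dynamic programming over substitutions and indels)."""
    n, m = len(stream_fp), len(known_fp)
    d = np.zeros((n + 1, m + 1))
    d[:, 0] = np.arange(n + 1) * indel_cost
    d[0, :] = np.arange(m + 1) * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = float(np.abs(stream_fp[i - 1] - known_fp[j - 1]).sum())
            d[i, j] = min(d[i - 1, j - 1] + sub,     # substitute (frame distance)
                          d[i - 1, j] + indel_cost,  # extra frame in the stream
                          d[i, j - 1] + indel_cost)  # frame dropped from stream
    return float(d[n, m])
```

A match would then be declared when the returned cost stays below the distance threshold and the difference in sequence lengths stays below the length-difference threshold.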
Advertisements only make up a portion of an incoming video stream, so continually calculating features for the incoming video stream 1020 and comparing the features to known advertisement fingerprints 1030 may not be efficient. According to one embodiment, the feature-based techniques described above (e.g., volume increases, increased scene changes, monochrome images) may be used to detect the start of a potential advertisement (or advertisement block), and the calculating of features 1020 and comparing to known fingerprints 1030 may only be performed once a possible advertisement break has been detected. It should be noted that some methods of detecting the possibility of an advertisement break in the video stream, such as an increase in scene changes, where scene changes may be detected by comparing successive CCVs, may in fact be calculating features of the video stream 1020, so the advertisement detection process may begin with the comparison 1030. According to one embodiment, the calculating of features 1020 and comparing to known fingerprints 1030 may be limited to predicted advertisement break times (e.g., between :10 and :20 after every hour). The generation 1020 and the comparison 1030 may be based on the channel to which it is tuned. For example, a broadcast channel may have scheduled advertisement blocks, so that the generation 1020 and the comparison 1030 may be limited to specific times. However, a live event such as a sporting event may not have fixed advertisement blocks, so time limiting may not be an option. Moreover, channels are changed at random times, so time blocks would have to be channel specific. According to an embodiment in which intros are used, the calculated fingerprint for the incoming video stream may be continually compared to fingerprints for known intros stored in a database (known intro fingerprints). After an intro is detected, indicating that an advertisement (or advertisement block) is about to begin, the comparison of the calculated fingerprint for the incoming video stream to fingerprints for known advertisements stored in a database (known advertisement fingerprints) begins. If actual advertisement detection is desired, a comparison of the calculated fingerprints of the incoming video stream to the known advertisement fingerprints stored in a database will be performed, whether the comparison is continual or only after some event (e.g., detection of an intro, a certain time). Comparing the calculated fingerprint of the incoming video stream to entire fingerprints (or portions thereof) for all the known advertisement fingerprints 1030 may not be an efficient use of resources. The calculated fingerprint may have little or no similarity with a percentage of the known advertisement fingerprints, and this difference may be obvious early in the comparison process. Accordingly, continuing to compare the calculated fingerprint to these known advertisement fingerprints is a waste of resources. According to one embodiment, an initial window (e.g., several frames, several regions of a frame) of the calculated fingerprint of the incoming video stream may be compared to an initial window of all of the known advertisement fingerprints (e.g., several frames, several regions). Only the known advertisement fingerprints that have less than some defined level of dissimilarity (e.g., less than a certain distance between them) proceed for further comparison. 
The initial window may be, for example, a certain period (e.g., 1 second), a certain number of images (e.g., the first 5 I-frames), or a certain number of regions of a frame (e.g., 16 of 64 regions of a frame). FIG. 11 illustrates an exemplary flowchart of an initial dissimilarity determination process. The video stream is received 1100 and may be digitized 1110 (e.g., if it is received as analog video). Features (statistical parameters) are calculated for the video stream (e.g., digital video stream) 1120. The features (fingerprint) may include CCVs, color histograms, other statistical parameters, or a combination thereof. The features can be calculated for images or for portions of images. The calculated features (fingerprint) are compared to the fingerprints for known advertisements 1130 (known advertisement fingerprints). A determination is made as to whether the comparison has been completed for an initial period (window) 1140. If the initial window comparison is not complete (1140 No), the process returns to 1100-1130. If the initial window comparison is complete (1140 Yes), then a determination is made as to whether the level of dissimilarity (distance) between the calculated fingerprint and each known advertisement fingerprint exceeds a threshold 1150. If the dissimilarity is below the threshold, the process proceeds to FIG. 10 (1000) for those fingerprints. For the known advertisement fingerprints for which the threshold is exceeded (1150 Yes), the comparing is aborted. FIG. 12 illustrates an exemplary initial comparison of the calculated fingerprint for an incoming stream versus initial portions of fingerprints for a plurality of known advertisements stored in a database (known advertisement fingerprints). For ease of understanding, we will assume that each color is limited to a single digit (two colors), that each color has the same digit so that a single number can represent all colors, and that the pixel grid is 25 pixels. The calculated fingerprint includes a CCV for each image (e.g., frame, I-frame). The incoming video stream has a CCV calculated for each of the first three frames. The CCVs for the first three frames of the incoming stream are compared to the associated portion (CCVs of the first three frames) of each of the known advertisement fingerprints. The comparison includes summing the dissimilarity (e.g., calculated distance) between corresponding frames (e.g., distance Frame 1 + distance Frame 2 + distance Frame 3). The distance between the CCVs for each of the frames can be calculated in various manners, including the sum of the absolute differences and the sum of the squared differences, as described above. The sum of the absolute differences is utilized in FIG. 12. The difference between the incoming video stream and a first fingerprint (FP1) is 52, while the difference between the incoming video stream and the Nth fingerprint (FPN) is 8. If the predefined level of dissimilarity (distance) was 25, then the comparison for FP1 would not proceed further (e.g., 1160), since the level of dissimilarity exceeds the predefined level (e.g., 1150 Yes). The comparison for FPN would continue (e.g., proceed to 1000), since the level of dissimilarity did not exceed the predefined level (e.g., 1150 No).
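This initial-window test of FIGs. 11-12 can be sketched directly. The three-frame window and the threshold of 25 come from the example above; representing each stored fingerprint as a plain list of per-frame CCV arrays is our assumption.

```python
# Sketch of the FIG. 11/12 initial-window test: sum per-frame L1 distances
# over the first few frames and keep only sufficiently similar fingerprints.
import numpy as np

def surviving_fingerprints(stream_ccvs, known_fps, window=3, threshold=25):
    """Return the known fingerprints (lists of per-frame CCV arrays) whose
    first `window` frames are within `threshold` of the incoming stream."""
    survivors = []
    for fp in known_fps:
        dist = sum(int(np.abs(s - k).sum())       # per-frame L1 distance
                   for s, k in zip(stream_ccvs[:window], fp[:window]))
        if dist <= threshold:                     # e.g., FPN at 8 survives,
            survivors.append(fp)                  # while FP1 at 52 is aborted
    return survivors
```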
It is possible that the incoming video stream may have dropped the first few frames of the advertisement, or that the features (e.g., CCVs) are not calculated for the beginning of the advertisement (e.g., the first few frames) because, for example, the possibility of an advertisement being presented was not detected early enough. In this case, if the calculated features for the first three frames are compared to the associated portion (calculated features of the first three frames) of each of the known advertisement fingerprints, the level of dissimilarity may be increased erroneously since the frames do not correspond. One way to handle this is to extend the length of the fingerprint window in order to attempt to line the frames up.

FIG. 13 illustrates an exemplary initial comparison of calculated features for an incoming stream versus an expanded initial portion of known advertisement fingerprints. For ease of understanding, one can make the same assumptions as with regard to FIG. 12. The CCVs calculated for the first three frames of the incoming video stream are compared by a sliding window to the first five frames of a stored fingerprint. That is, frames 1-3 of the calculated features of the incoming video stream are compared against frames 1-3 of the fingerprint, frames 2-4 of the fingerprint, and frames 3-5 of the fingerprint. By doing this it is possible to reduce or eliminate the differences that may have been caused by one or more frames being dropped from the incoming video stream. In the example of FIG. 13, the first two frames of the incoming stream were dropped. Accordingly, the calculated features of the incoming video stream matched best against frames 3-5 of the fingerprint. If the comparison between the calculated features of the incoming stream and the fingerprint shows less dissimilarity than the threshold, the comparison continues. The comparison may continue from the portion of the fingerprint where the best match was found in the initial comparison. In the exemplary comparison of FIG. 13, the comparison should continue between frame 6 (the next frame outside the initial window) of the fingerprint and frame 4 of the incoming stream. It should be noted that if the comparison had resulted in the best match for frames 1-3 of the fingerprint, then the comparison may continue starting at frame 4 (the next frame within the initial window) of the fingerprint.

To increase efficiency by limiting the number of comparisons being performed, the window of comparison may be continually increased for the known advertisement fingerprints that do not meet or exceed the dissimilarity threshold, until one of the known advertisement fingerprints meets or exceeds the similarity threshold. For example, the window may be extended 5 frames for each known advertisement fingerprint that does not exceed the dissimilarity threshold. The dissimilarity threshold may be measured in distance (e.g., total distance, average distance per frame). Comparison is stopped if the incoming video fingerprint and the known advertisement fingerprint differ by more than a chosen dissimilarity threshold. A determination of a match would be based on a similarity threshold. A determination of the similarity threshold being met or exceeded may be delayed until some predefined number of frames (e.g., 20) have been compared, to ensure that a false match is not detected (a small number of frames being similar). Like the dissimilarity threshold, the similarity threshold may be measured in distance. For example, if the features for the incoming video stream and the fingerprint differ by less than 5 per frame after at least 20 frames are compared, it is considered a match.
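The sliding-window alignment of FIG. 13 could be sketched as below, again assuming per-frame feature vectors; the helper name best_alignment and the window sizes are illustrative.

```python
def best_alignment(stream_feats, fingerprint_feats, stream_window=3, fp_window=5):
    """Slide the first `stream_window` incoming frames across the first
    `fp_window` fingerprint frames; return (best_offset, best_distance).

    Compensates for frames dropped from the incoming stream: in the
    FIG. 13 example the best offset is 2 (fingerprint frames 3-5)."""
    def frame_distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    best_offset, best_dist = None, float("inf")
    for offset in range(fp_window - stream_window + 1):
        d = sum(frame_distance(stream_feats[i], fingerprint_feats[offset + i])
                for i in range(stream_window))
        if d < best_dist:
            best_offset, best_dist = offset, d
    return best_offset, best_dist

# If the best match is at offset 2 (fingerprint frames 3-5), the ongoing
# comparison would resume at fingerprint frame 6 against stream frame 4.
```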
FIG. 14 illustrates an exemplary expanding-window comparison of the features of the incoming video stream against the features of the fingerprints of known advertisements. For the initial window W1, the incoming video stream is compared to each of five known advertisement fingerprints (FP1 - FP5). After W1, the comparison of FP2 is aborted because it exceeded the dissimilarity threshold. The comparison of the remaining known advertisement fingerprints continues for the next window W2 (e.g., next five frames, total of 10 frames). After W2, the comparison of FP1 is aborted because it exceeded the dissimilarity threshold. The comparison of the remaining known advertisement fingerprints continues for the next window W3 (e.g., next five frames, total of 15 frames). After W3, the comparison of FP3 is aborted. The comparison of the remaining known advertisement fingerprints continues for the next window W4 (e.g., next five frames, total of 20 frames). After W4, a determination can be made about the level of similarity. As illustrated, it was determined that FP5 meets the similarity threshold. If neither of the remaining known advertisement fingerprints (FP4 or FP5) had met the similarity threshold, the comparison would continue for the known advertisement fingerprints that did not exceed the dissimilarity threshold; those that met or exceeded the dissimilarity threshold would not continue with the comparisons. If more than one known advertisement fingerprint meets the similarity threshold, the comparison may continue until one of the known advertisement fingerprints falls outside of the similarity window, or the most similar known advertisement fingerprint is chosen.

The windows of comparison in FIG. 14 (e.g., 5 frames) may have been a comparison of the temporal alignment of the frames, a summation of the differences between the individual frames, a summation of the differences of individual regions of the frames, or some combination thereof. It should also be noted that the window is not limited to a certain number of frames as illustrated and may be based on regions of a frame (e.g., 16 of the 32 regions the frame is divided into). If the window were for less than a frame, certain fingerprints may be excluded from further comparisons after comparing less than a frame. It should be noted that the dissimilarity threshold may have to be high for comparisons of less than a frame, so as not to exclude comparisons that are temporarily high due to, for example, misalignment of the fingerprints.
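A minimal sketch of the expanding-window procedure of FIG. 14 is given below; the per-frame thresholds, the 5-frame window and the 20-frame minimum mirror the examples above, while the function name and tie-breaking details are assumptions.

```python
def expanding_window_match(stream, candidates, window=5, max_dist_per_frame=10,
                           match_dist_per_frame=5, min_frames=20):
    """Compare the incoming stream's per-frame features to each candidate
    fingerprint window by window, aborting candidates that exceed the
    dissimilarity threshold and declaring a match only after at least
    `min_frames` frames stay under the similarity threshold."""
    def frame_distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    totals = {ad_id: 0 for ad_id in candidates}
    frames = 0
    while totals and frames < len(stream):
        next_end = min(frames + window, len(stream))
        for ad_id in list(totals):
            fp = candidates[ad_id]
            end = min(next_end, len(fp))
            totals[ad_id] += sum(frame_distance(stream[i], fp[i])
                                 for i in range(frames, end))
        frames = next_end
        # Abort candidates exceeding the dissimilarity threshold (FP2 after
        # W1, FP1 after W2, FP3 after W3 in the FIG. 14 example).
        totals = {k: v for k, v in totals.items()
                  if v <= max_dist_per_frame * frames}
        if frames >= min_frames:
            matches = {k: v for k, v in totals.items()
                       if v <= match_dist_per_frame * frames}
            if len(matches) == 1:
                return next(iter(matches))
            if matches:  # several similar: keep comparing to disambiguate
                totals = matches
    return min(totals, key=totals.get) if totals else None
```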
According to one embodiment, the calculated features for the incoming video stream are not stored. Rather, they are calculated, compared and then discarded. No video is copied, or if video is copied it is only for a short time (temporarily) while the features are calculated. The features calculated for images cannot be used to reconstruct the video, and the calculated features are not copied, or if the features are copied it is only for a short time (temporarily) while the comparison to the known advertisement fingerprints is being performed.

As previously noted, the features may be calculated for an image (e.g., frame) or for a portion or portions of an image. Calculating features for a portion may entail sampling certain regions of an image, as discussed above with respect to FIGs. 7-9. Calculating features for a portion of an image may entail dividing the image into sections, selecting a specific portion of the image, or excluding a specific portion of the image. Selecting specific portions may be done to focus on specific areas of the incoming video stream (e.g., network logo, channel identification, program identification). The focus on specific areas will be discussed in more detail later. Excluding specific portions may be done to avoid overlays (e.g., network logo) or banners (e.g., scrolling news, weather or sport updates) that may be placed on the incoming video stream and that could potentially affect the matching of the calculated features of the video stream to fingerprints, because the known advertisements might not have had these overlays and/or banners when the original library fingerprints were generated.

FIG. 15 illustrates an exemplary pixel grid 1500 divided into sections 1510, 1520, 1530, 1540 as indicated by the dotted lines. The pixel grid 1500 consists of 36 pixels (a 6x6 grid), with a single digit for each color and each pixel having the same number associated with each color. The pixel grid 1500 is divided into 4 separate 3x3 grids 1510-1540. A full-image CCV 1550 is generated for the entire grid 1500, and partial-image CCVs 1560, 1570, 1580, 1590 are generated for the associated sections 1510-1540. A summation of the section CCVs 1595 would not equal the CCV 1550, because pixels may have been coherent only because they were grouped across section borders, which would not be reflected in the summation CCV 1595. It should be noted that the summation CCV 1595 is simply for comparison to the CCV 1550 and would not be used in a comparison to fingerprints. When calculating CCVs for sections, the coherence threshold may be lowered. For example, the coherence threshold for the overall grid was four and may have been three for the sections. It should be noted that if the threshold were lowered to 2, the color 1 pixels in the lower right corner of section pixel grid 1520 would be considered coherent and the CCV would change accordingly to reflect this fact.
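For illustration, a simplified CCV for a small integer-valued pixel grid and its four quadrants might be computed as follows; 4-connectivity, the helper names and the grid representation are assumptions.

```python
from collections import deque

def ccv(grid, coherence_threshold):
    """Return {color: (coherent, incoherent)} pixel counts for a 2-D list.

    Pixels of the same color are grouped into 4-connected components; a
    pixel is coherent if its component size meets the threshold."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    counts = {}
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            color, component = grid[y][x], []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:  # flood-fill one same-color component
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and grid[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            coh, inc = counts.get(color, (0, 0))
            if len(component) >= coherence_threshold:
                coh += len(component)
            else:
                inc += len(component)
            counts[color] = (coh, inc)
    return counts

def quadrant_ccvs(grid, coherence_threshold):
    """CCVs for the four equal quadrants, as in FIG. 15; the per-section
    threshold may be lower than the full-image one, as noted above."""
    h2, w2 = len(grid) // 2, len(grid[0]) // 2
    quads = [[row[:w2] for row in grid[:h2]], [row[w2:] for row in grid[:h2]],
             [row[:w2] for row in grid[h2:]], [row[w2:] for row in grid[h2:]]]
    return [ccv(q, coherence_threshold) for q in quads]
```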
If the image is divided into sections, the comparison of the features associated with the incoming video stream to the features associated with known advertisements may be done on a per-section basis. The comparison may be based on a single section. Comparing a single section by itself may have less granularity than comparing an entire image. FIG. 16 illustrates an exemplary comparison of two images 1600, 1620 based on the whole images 1600, 1620 and on sections of the images 1640, 1660 (e.g., the upper left quarter of each image). Features (CCVs) 1610, 1630 are calculated for the images 1600, 1620 and reveal that the difference (distance) between them is 16 (based on the sum of absolute values). Features (CCVs) 1650, 1670 are calculated for the sections 1640, 1660 and reveal that there is no difference. The first sections 1640, 1660 of the images were the same while the other sections were different, so comparing only the features 1650, 1670 may erroneously result in a fingerprint not being filtered (not exceeding the dissimilarity threshold) or in a match (exceeding the similarity threshold). A match based on this false positive would not be likely, as in a preferred embodiment a match would be based on more than a single comparison of calculated features for a section of an image in an incoming video stream to portions of known advertisement fingerprints. Rather, the false positive would likely be filtered out as the comparison was extended to further sections. In the example of FIG. 16, when the comparison is extended to other sections of the image or to sections of additional images, the appropriate weeding out should occur. It should be noted that comparing only a single section may provide the opposite result (being filtered, or not matching) if the section being compared was the only section that was different and all the other sections were the same. The dissimilarity threshold will have to be set at an appropriate level to account for this possible effect, or several comparisons will have to be made before a comparison can be terminated due to a mismatch (exceeding the dissimilarity threshold). Alternatively, the comparison of the sections may be done at the same time (e.g., features of sections 1-4 of the incoming video stream to features of sections 1-4 of the known advertisements).
As discussed above, comparing features of sections may require thresholds (e.g., the coherence threshold) to be adjusted. Comparing each of the sections individually may result in finer granularity than comparing the whole image. FIG. 17 illustrates an exemplary comparison of a pixel grid 1700 (divided into sections 1710, 1720, 1730, 1740) to the pixel grid 1500 (divided into sections 1510, 1520, 1530, 1540) of FIG. 15.
By simply inspecting the pixel grids 1500 and 1700 it can be seen that the color distribution is different. However, comparing a CCV 1750 of the pixel grid 1700 to the CCV 1550 of the pixel grid 1500 results in a difference (distance) of only 4. By contrast, comparing the CCVs 1760-1790 for sections 1710-1740 to the CCVs 1560-1590 for sections 1510-1540 results in differences of 12, 12, 12 and 4 respectively, for a total difference of 40. It should be noted that FIGs. 15-17 depicted the image being divided into four quadrants of equal size, but the invention is not limited thereto. Rather, the image could be divided in numerous ways without departing from the scope (e.g., row slices, column slices, sections of unequal size and/or shape). The image need not be divided in a manner in which the whole image is covered. For example, the image could be divided into a plurality of random regions as discussed above with respect to FIGs. 7-9. In fact, in one embodiment the sections of an image that are analyzed and compared are only a portion of the entire image and could not be used to recreate the image, so that there could clearly be no copyright issues. That is, certain portions of the image are not captured for calculating features or for comparing to associated portions of the known advertisement fingerprints that are stored in a database. The known advertisement fingerprints would likewise not be calculated for entire images but would be calculated for the same or similar portions of the images.

FIGs. 11-14 discussed comparing calculated features for the incoming video stream to windows (small portions) of the fingerprints at a time, so that likely mismatches need not be continually compared. The same basic process can be used with sections. If the features for each of the sections of an image are calculated and compared together (e.g., FIG. 17), the process may be identical except for the fact that separate features for an image are being compared instead of a single feature. If the features for a subset of all the sections are generated and compared, then the process may compare the features for that subset of the incoming video stream to the features for that subset of the advertisement fingerprints. For the fingerprints that do not exceed the threshold level of dissimilarity (e.g., 1150 No of FIG. 11), the comparison window may be expanded to additional sections of the image and fingerprints, or may be extended to the same section of additional images.

When determining whether there is a match between the incoming video stream and a fingerprint for a known ad (e.g., 1050 of FIG. 10), the comparison is preferably not based on a single section/region, as this may result in erroneous conclusions (as depicted in FIG. 16). Rather, it is preferable if the determination of a match is made after sufficient comparisons of sections/regions (e.g., a plurality of sections of an image, a plurality of images). For example, a fingerprint for an incoming video stream (query fingerprint q) may be based on an image (or portion of an image) and consist of features calculated for different regions (q1, q2 ... qn) of the image. The fingerprints for known advertisements (subject fingerprints s) may be based on images and consist of features calculated for different regions (s1, s2 ... sm) of the images. The integer m (the number of regions in an image for a stored fingerprint) may be greater than the integer n (the number of regions in an image of the incoming video stream) if the fingerprint of the incoming video stream is not for a complete image. For example, regions may not be defined near the boundaries of an incoming video stream due to the differences associated with the presentation of images on different TVs and/or STBs. A comparison of the fingerprints (similarity measure) would be the sum, for i = 1 to n, of the minimum distance between qi and any sj, where i is the particular query region. Some distance measures may not really be affected by calculating a fingerprint (q) based on less than the whole image. However, such a measure might accidentally match the wrong areas, since the features may not encode any spatial distribution. For instance, areas which are visible in the top half of the incoming video stream and are used for the calculation of the query fingerprint might match an area in a subject fingerprint that is not part of the query fingerprint. This would result in a false match.
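The partial-fingerprint similarity measure just described — the sum over query regions of the minimum distance to any subject region — could be sketched as follows; the per-region distance is an assumed placeholder.

```python
def region_distance(qr, sr):
    """Distance between two per-region feature vectors (illustrative)."""
    return sum(abs(a - b) for a, b in zip(qr, sr))

def partial_fingerprint_similarity(query_regions, subject_regions):
    """Sum, over each query region q1..qn, of the minimum distance to any
    subject region s1..sm (m may exceed n). Note that because the minimum
    is taken over all subject regions, spatial layout is ignored, which
    is what allows the false matches described above."""
    return sum(min(region_distance(q, s) for s in subject_regions)
               for q in query_regions)
```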
As previously noted, entire images of neither the incoming video stream nor the known advertisements (ad intros, sponsorship messages, etc.) are stored; rather, portions of the images are captured so that the features can be calculated. Moreover, the features calculated for the portions of the images of the incoming video stream are not stored; they are calculated, compared to features for known advertisements and then discarded.

According to one embodiment, if the video stream is an analog stream and it is desired to calculate the features and compare them to fingerprints digitally, then the video stream is converted to digital only as necessary. That is, if the comparisons to fingerprints are done on an image-by-image basis, the conversion to digital will be done image by image. If the video stream is not having features generated (e.g., CCVs) or being compared to at least one fingerprint, then the digital conversion will not be performed. That is the case, for example, if the features for the incoming video stream do not match any fingerprints so no comparison is being done, or if the incoming video stream was equated with an advertisement and the comparison is temporarily terminated while the ad is being displayed or a targeted ad is being substituted. If no features are being generated or compared, then there is no need for the digital conversion. Limiting the amount of conversion from analog to digital for the incoming video stream means that there is less manipulation and less temporary storage (if any is required) of the analog stream while it is being converted.

According to one embodiment, when calculating the features for the incoming video stream, certain sections may be either avoided or focused on. Portions of an image that are excluded may be defined as regions of disinterest, while regions that are focused on may be defined as regions of interest.
Regions of disinterest and/or interest may include overlays, bugs, and banners. The overlays, bugs and banners may include at least some subset of channel and/or network logo, clock, sports scoreboard, timer, program information, EPG screen, promotions, weather reports, special news bulletins, closed-caption data, and interactive TV buttons. If a bug (e.g., a network logo) is placed on top of a video stream (including advertisements within the stream), the calculated features (e.g., CCVs) may be incomparable to fingerprints of the same video sequence (ads or intros) that were generated without the overlays. Accordingly, the overlay may be a region of disinterest that should be excluded from calculations and comparisons.

FIG. 18 illustrates several exemplary images with different overlays. The upper two images are taken from the same video stream. The first image has a channel logo overlay in the upper left corner and a promotion overlay in the upper right corner, while the second image has no channel overlay and has a different promotion overlay. The lower two images are taken from the same video stream. The first image has a station overlay in the upper right corner and an interactive button in the lower right corner, while the second image has a different channel logo in the upper right and no interactive button. Comparing fingerprints for the first set of images or the second set of images may result in a non-match due to the different overlays.

FIG. 19A illustrates an exemplary impact on pixel grids of an overlay being placed on a corresponding image.
Pixel grid 1900A is for an image and pixel grid 1910A is for the same image with an overlay. For ease of explanation and understanding, the pixel grids are limited to 10x10 (100 pixels) and each pixel has a single bit defining each of the RGB colors. The overlay was placed in the lower right corner of the image, and accordingly a lower right corner 1920A of the pixel grid 1910A was affected. Comparing the features (e.g., CCVs) 1930A, 1940A of the pixel grids 1900A, 1910A respectively indicates that the difference (distance) 1950A is 12 (using the sum of absolute values). FIG. 19A illustrates an embodiment where the calculated fingerprint for the incoming video stream and the known advertisement fingerprints stored in a local database were calculated for entire frames.

According to one embodiment, the regions of disinterest (e.g., overlays, bugs or banners) are detected in the video stream and are excluded from the calculation of the fingerprint (e.g., CCVs) for the incoming video stream. The detection of regions of disinterest in the video stream will be discussed in more detail later. Excluding the region from the fingerprint will affect the comparison of the calculated fingerprint to the known advertisement fingerprints that may not have the region excluded. FIG. 19B illustrates an exemplary pixel grid 1900B with the region of disinterest 1910B (e.g., 1920A of FIG. 19A) excluded. The excluded region of disinterest 1910B is not used in calculating the features (e.g., CCV) of the pixel grid 1900B. As 6 pixels are in the excluded region of disinterest 1910B, a CCV 1920B will only identify 94 pixels. Comparing the CCV 1920B, which has the region of disinterest excluded, to the CCV 1930A for the pixel grid of the image without an overlay 1900A results in a difference 1930B of 6 (using the sum of absolute values). By removing the region of disinterest from the difference (distance) calculation, the distance between the image with no overlay 1900A and the image with the overlay removed 1900B was half of the difference between the image with no overlay 1900A and the image with the overlay 1910A.

The regions of disinterest (RODs) may be detected by searching for certain characteristics in the video stream. The search for the characteristics may be limited to locations where overlays, bugs and banners are normally placed (e.g., a banner scrolling along the bottom of the image). The detection of the RODs may include comparing the image (or portions of it) to stored regions of disinterest. For example, network overlays may be stored, and the incoming video stream may be compared to the stored overlay to determine if the overlay is part of the video stream. Comparing actual images may require extensive memory for storing the known regions of disinterest as well as extensive processing to compare the incoming video stream to the stored regions. According to one embodiment, a ROD may be detected by comparing a plurality of successive images. If a group of pixels is determined not to have changed for a predetermined number of frames, scene changes or hard cuts, then it may be a logo or some other type of overlay (e.g., logo, banner). Accordingly, the ROD may be excluded from comparisons. According to one embodiment, the known RODs may have features calculated (e.g., CCVs) and these features may be stored as ROD fingerprints. Features (e.g., CCVs) may be generated for the incoming video stream, and the video stream features may be compared to the ROD fingerprints.
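A sketch of excluding a ROD from the feature calculation (as in FIG. 19B) follows; for brevity a plain color histogram stands in for the CCV, and the rectangle representation is an assumption. Both the stream features and the stored fingerprints would need the same exclusion for distances to remain comparable.

```python
def histogram_excluding_rod(grid, rod):
    """Color histogram of a 2-D pixel grid, skipping pixels inside `rod`.

    rod -- (top, left, bottom, right), exclusive of bottom/right; for the
    FIG. 19B example this removes the 6 overlay pixels so only 94 of the
    100 pixels contribute to the features.
    """
    top, left, bottom, right = rod
    counts = {}
    for y, row in enumerate(grid):
        for x, color in enumerate(row):
            if top <= y < bottom and left <= x < right:
                continue  # inside the overlay region: excluded
            counts[color] = counts.get(color, 0) + 1
    return counts
```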
As the ROD is likely small with respect to the image, the features for the incoming video stream may have to be limited to specific portions (portions where the ROD is likely to be). For example, bugs may normally be placed in the lower right-hand corner, so features would be generated for a lower right portion of the incoming video and compared to the ROD fingerprints (at least the ROD fingerprints associated with bugs) to determine if an overlay is present. Banners may be placed on the lower 10% of the image, so features would be generated for the bottom 10% of the incoming video stream and compared to the ROD fingerprints (at least the ROD fingerprints for banners). The detection of RODs may require that separate fingerprints be generated for the incoming video stream and compared to distinct fingerprints for RODs. Moreover, the features calculated for the possible RODs in the incoming video stream may not match stored ROD fingerprints, because the RODs in the incoming video stream may be overlaid on top of the video stream: the features calculated will include the video stream as well as the overlay, whereas the known fingerprint may have been generated for simply the overlay, or for the overlay over a different video stream. Accordingly, it may not be practical to determine RODs in an incoming video stream.

According to one embodiment, the generation of the fingerprints for known advertisements as well as for the incoming video stream may exclude portions of an image that are known to possibly contain RODs (e.g., overlays, banners). For example, as previously discussed with respect to FIG. 19B, a possible ROD 1910B may be excluded from the calculation of the fingerprint for the entire frame. This would be the case both for the calculated fingerprint of the incoming video stream and for the known advertisement fingerprints stored in the database. Accordingly, the possible ROD would be excluded from comparisons of the calculated fingerprint and the known advertisement fingerprints.

The excluded region may be identified in numerous manners. For example, the ROD may be specifically defined (e.g., exclude pixels 117-128). The portion of the image that should be included in fingerprinting may be defined (e.g., include pixels 1-116 and 129-150). The image may be broken up into a plurality of blocks (e.g., 16x16-pixel grids) and the blocks that are included or excluded may be defined (e.g., include regions 1-7 and 9-12, exclude region 8). A bit vector may be used to identify the pixels and/or blocks that should be included or excluded from the fingerprint calculation (e.g., 0101100 may indicate that blocks 2, 4 and 5 should be included and blocks 1, 3, 6 and 7 excluded). The RODs may also be excluded from sections and/or regions if the fingerprints are generated for portions of an image as opposed to an entire image, as illustrated in FIG. 19B.
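The bit-vector block mask mentioned above might be decoded as follows; the example vector is the one from the text, and the 0-based/1-based indexing convention is an assumption.

```python
def included_blocks(mask_bits):
    """Translate a bit vector (string of '0'/'1') into the indices of the
    blocks that take part in the fingerprint calculation."""
    return [i for i, bit in enumerate(mask_bits) if bit == "1"]

# "0101100": blocks 2, 4 and 5 included when counting blocks from 1
# (indices [1, 3, 4] when counting from 0); blocks 1, 3, 6, 7 excluded.
print(included_blocks("0101100"))  # -> [1, 3, 4]
```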
FIG. 20 illustrates an exemplary image 2000 to be fingerprinted that is divided into four sections 2010-2040. The image 2000 may be from an incoming video stream or a known advertisement, intro, outro, or channel identifier. It should be noted that the sections 2010-2040 do not make up the entire image. That is, if each of these sections is captured in order to create the fingerprint for the sections, there are clearly no copyright issues associated therewith, as the entire image is not captured and the image could not be regenerated from the portions thereof. Each of the sections 2010-2040 is approximately 25% of the image 2000; however, the section 2040 has a portion 2050 excluded from it, as the portion 2050 may be associated with where an overlay is normally placed. FIG. 21 illustrates an exemplary image 2100 to be fingerprinted that is divided into a plurality of regions 2110 that are evenly distributed across the image 2100. Again it should be noted that the image 2100 may be from an incoming video stream or a known advertisement, and that the regions 2110 do not make up the entire image. A section 2120 of the image may be associated with where a banner is normally placed, so this portion of the image would be excluded. Certain regions 2130 fall within the section 2120, so they may be excluded from the fingerprint, or those regions 2130 may be shrunk so as not to fall within the section 2120.

Ad substitution may be based on the particular channel that is being displayed. That is, a particular targeted advertisement may not be allowed to be displayed on a certain channel (e.g., an alcohol advertisement may not be displayed on a religious programming channel). In addition, if a local ad insertion unit is to respond properly to channel-specific cue tones that are centrally generated and distributed to each local site, the local unit has to know what channel is being passed through it. An advertisement detection unit may not have access to data (e.g., specific frequency, metadata) indicating the identity of the channel that is being displayed. Accordingly, the unit will need to detect the specific channel. Fingerprints may be defined for channel identification information that may be transmitted within the video stream (e.g., channel logos, channel banners, channel messages), and these fingerprints may be stored for comparison. When the incoming video stream is received, an attempt may be made to identify the portion of the video stream containing the channel identification information. For example, channel overlays may normally be placed in a specific location on the video stream, so that portion of the video stream may be extracted and have features (e.g., CCVs) generated for it. These features will be compared to stored fingerprints for channel logos. As previously noted, one problem may be the fact that the features calculated for the region of interest of the video stream may include the actual video stream as well as the overlay. Additionally, the logos may not always be placed in the same place on the video stream, so that defining an exact portion of the video stream to calculate features for may be difficult.

According to one embodiment, channel changes may be detected, and the channel information may be detected during the channel change. A channel change may be detected by comparing features of successive images of the incoming video stream and detecting a sudden and abrupt change in features. In digital programming, a change in channel often results in the display of several monochrome (e.g., blank, black, blue) frames while the new channel is decoded. The display of these monochrome frames may be detected in order to determine that a channel change is occurring. The display of these monochrome frames may be detected by calculating a fingerprint for the incoming video stream and comparing it to fingerprints for known channel change events (e.g., monochrome images displayed between channel changes).
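One way to sketch the monochrome-frame test for channel changes, with the dominant-color fraction and run length as assumed parameters (a banner area could first be excluded with the ROD masking above):

```python
def is_monochrome(grid, fraction=0.95):
    """True when a single color accounts for nearly all pixels of the frame."""
    counts, total = {}, 0
    for row in grid:
        for color in row:
            counts[color] = counts.get(color, 0) + 1
            total += 1
    return max(counts.values()) >= fraction * total

def channel_change_detected(recent_frames, min_run=3):
    """Heuristic: several consecutive monochrome frames suggest that a
    digital channel change is in progress."""
    return len(recent_frames) >= min_run and \
        all(is_monochrome(f) for f in recent_frames[-min_run:])
```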
When channels are changed, the channel numbers may be overlaid on a portion of the video stream. Alternatively, a channel banner identifying various aspects of the channel being changed to may be displayed. The channel numbers and/or channel banner may normally be displayed in the same location. As discussed above with respect to RODs, the locations on the images that may be associated with a channel overlay or channel banner may be excluded from the fingerprint calculation. Accordingly, the fingerprints for either the incoming video stream or the channel change fingerprint(s) stored in the database would likely be for simply a monochrome image. FIG. 22 illustrates exemplary channel change images. As illustrated, the image during a channel change is a monochrome frame with the exception of the channel change banner 2210 along the bottom of the image. Accordingly, the channel banner may be identified as a region of disinterest to be excluded from comparisons of the features generated for the incoming video stream and the stored fingerprints.

After the channel change has been detected (whether based on comparing fingerprints or some other method), a determination can be made as to what channel the system is tuned to. The determination may be based on analyzing channel numbers overlaid on the image or the channel banner. The analysis may include comparing to stored channel numbers and/or channel banners. As addressed above, the actual comparison of images or portions of images requires large amounts of storage and processing and may not be possible to perform in real time. Alternatively, features/fingerprints may be calculated for the incoming video stream and compared to fingerprints for known channel identification data. As addressed above, calculating and comparing fingerprints for overlays may be difficult due to the background image. Accordingly, the calculation and comparison of fingerprints for channel numbers will focus on the channel banners. It should be noted that the channel banner may contain more data than just the channel name or number. For example, it may include time, day, and program details (e.g., title, duration, actors, rating). The channel identification data is likely contained in the same location of the channel banner, so only that portion of the channel banner will be of interest and only that portion will be analyzed. Referring back to FIG. 22, the channel identification data 2220 is in the upper left-hand corner of the channel banner. Accordingly, this area may be defined as a region of interest. Fingerprints for the relevant portion of the channel banners for each channel will be generated and stored in a database. The channel identification fingerprints may be stored in the same database as the known advertisement (intro, outro, sponsorship message) fingerprints, or may be stored in a separate database. If stored in the same database, the channel ident fingerprints are likely segregated so that the incoming video stream is only compared to these fingerprints when a channel change has been detected.

It should be noted that different televisions and/or different set-top boxes may display an incoming video stream in slightly different fashions. This includes the channel change banners 2210, and the channel number 2220 within the channel change banner, being in different locations or being scaled differently. When looking at an entire image or multiple regions of an image, this difference may be negligible in the comparison.
However, when generating channel identification fingerprints for an incoming video stream and comparing the calculated channel identification fingerprints to known channel identification fingerprints, the difference in display may be significant. FIG. 23 illustrates an image 2300 with the expected locations of a channel banner 2310, and of channel identification information 2320 within the channel banner 2310, identified. The channel identification information 2320 may not be in the exact location expected due to parameters (e.g., scaling, translation) associated with the specific TV and/or STB (or DVR) used to receive and view the programming. For example, it is possible that the channel identification information 2320 could be located anywhere within a specific region 2330 that is greatly expanded from the expected location 2320. In order to account for the possible differences, scaling and translation factors must be determined for the incoming video stream. According to one embodiment, these factors can be determined by comparing the location of the channel banner in the incoming video stream to the reference channel banner 2310. Initially, a determination is made as to where the inner boundary between the monochrome background and the channel banner lies. Once the inner boundary is determined, the width and height of the channel banner can be determined. The scale factors can be determined by comparing the actual dimensions to the expected dimensions: the scale factor in the x direction is the actual width of the channel banner divided by the reference width, and the scale factor in the y direction is the actual height of the channel banner divided by the reference height. The translation factor can be determined by comparing a certain point of the incoming stream to the same reference point (e.g., the top left corner of the inner boundary between the monochrome background and the channel banner). According to one embodiment, the reference channel banner is scaled and translated during the start-up procedure to the actual size and position. The translation and scaling parameters are stored so that they can be used to scale and translate the incoming stream, allowing an accurate comparison to the reference material (e.g., fingerprints). The scaling and translation factors have been discussed with respect to the channel banner and channel identification information but are in no way limited thereto. Rather, these factors can be used to ensure an appropriate comparison of fingerprints of the incoming video stream to known fingerprints (e.g., ads, ad intros, ad outros, channel idents, sponsorships). These factors can also be used to ensure that regions of disinterest or regions of interest are adequately identified.
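The scaling and translation factors just described might be computed as follows, with banner boxes given as (left, top, width, height); all names are illustrative.

```python
def scale_and_translation(actual_box, reference_box):
    """Derive (scale_x, scale_y, translate_x, translate_y) from the
    detected channel-banner boundary versus the reference banner."""
    ax, ay, aw, ah = actual_box
    rx, ry, rw, rh = reference_box
    scale_x = aw / rw            # actual width / reference width
    scale_y = ah / rh            # actual height / reference height
    # Translation chosen so the reference top-left corner maps onto the
    # detected top-left corner of the banner's inner boundary.
    translate_x = ax - rx * scale_x
    translate_y = ay - ry * scale_y
    return scale_x, scale_y, translate_x, translate_y

def map_point(x, y, factors):
    """Map a reference-frame point into the incoming stream's frame,
    e.g. to locate a region of interest or disinterest accurately."""
    sx, sy, tx, ty = factors
    return x * sx + tx, y * sy + ty
```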
Alternatively, rather than creating a fingerprint for the channel identifier region of interest, the region of interest can be analyzed by a text recognition system that may recognize the text associated with the channel identification data in order to determine the associated channel. Some networks may send messages ('channel idents') identifying the network (or channel) that is being displayed, to reinforce network (channel) branding. According to one embodiment, these messages are detected and analyzed to determine the channel. The analysis may compare the message to stored messages for known networks (channels). Alternatively, the analysis may calculate features for the message and compare them to stored features for known network (channel) messages/idents. The features may be generated for the entire video stream (entire image) or may be generated for a portion containing the branding message. Alternatively, the analysis may include using text recognition to determine what the message says and identifying the channel based on that.

When advertisement breaks are detected and/or when advertisements are substituted, that information can be fed back to a central location for tracking and billing. The central location may compare the detected breaks against actual advertisement breaks in video streams and associate the video stream being displayed at the location with a channel based on matching advertisement breaks. The central location may transmit the associated channel identification back to the local detection device. The central location may track when ad breaks are detected for a plurality of users and group the users according to detected ad breaks. The central location could then take the average of the detected ad breaks for each group and compare it to actual ad breaks for a plurality of program streams. The groups may then be associated with a channel based on matching advertisement breaks. The central location may transmit the associated channel identification back to the local detection devices of the group members. The local detection devices may transmit features associated with the presently viewed video stream (e.g., fingerprints) to the central location. The central location may compare the features to features for the plurality of program streams that are being transmitted. The presently viewed program stream will then be associated with the channel whose features correspond. The features may be transmitted to the central location at certain intervals (e.g., 30 seconds of features every 15 minutes). The central location may transmit the channel association back to the local ad detection equipment. According to one embodiment, the local detection device may send data related to when the advertisement break was detected and what fingerprint was used to detect the advertisement break (e.g., a fingerprint identification). As previously discussed, the fingerprint used to detect an advertisement break may be at least some subset of an ad intro fingerprint, channel ident fingerprint, sponsorship message fingerprint, ad fingerprint, and ad outro fingerprint. Using both the time and the fingerprint identification could provide a more accurate grouping and accordingly a more accurate channel identification. According to one embodiment, subscribers associated with the same group may be forced to the channel associated with the group.

As previously mentioned, once an advertisement or an advertisement intro is detected in the incoming program stream, targeted advertisements may be inserted locally. The number of targeted advertisements slated to be inserted during an advertisement break may be based on the predicted duration of the advertisement break. For example, if the typical advertisement break is two minutes, it is feasible that four 30-second targeted advertisements may be inserted. However, if it took several seconds to detect the advertisement (or advertisement break), or if the advertisement break is shortened for any reason, the targeted advertisements may continue displaying over the resumed programming. Alternatively, an outro may be detected and a targeted advertisement may be cut off in the middle in order to return to the programming.
According to one embodiment, targeted advertisements are selected to fill a majority of the advertisement break but not all of it. The remaining time may be used by a still image or animation (pre-outro) that can be cut off at any time, allowing a return to the program without losing impact. For example, if targeted ads were presented for 1:45 of a presumed 2:00 advertisement break, the remaining 15 seconds could be filled with a still image (e.g., a still image supporting the establishment, or a message indicating "don't forget to tip your bartender").

According to one embodiment, a maximum break duration is identified. The maximum break duration is the maximum amount of time that the incoming video stream will be preempted. After this period of time is up, insertion of advertisements ends and the incoming video stream is restored. In addition, a pre-outro time is identified. A pre-outro is a still or animation that is presented until the maximum break duration is reached or an outro is detected, whichever is sooner. For example, the maximum break duration may be defined as 1:45 and the pre-outro may be defined as :15. Accordingly, three 30-second advertisements may be displayed during the first 1:30 of the ad break, and then the pre-outro may be displayed for the remaining :15 or until an outro is detected, whichever is sooner. The maximum break duration and pre-outro time are defined so as to attempt to prevent targeted advertisements from being presented over programming. If an outro is detected while advertisements are still being inserted (e.g., before the pre-outro begins), a return to the incoming video stream may be initiated. As previously discussed, sponsorship messages may be utilized along with or in place of outros prior to the return of programming. Detection of a sponsorship message will also cause the return to the incoming video stream. Detection of programming may also cause the return to programming.

According to one embodiment, a minimum time between the detection of a video entity (e.g., ad, ad intro) that starts advertisement insertion and the ability to detect a video entity (e.g., ad outro, programming) that causes ad insertion to end can be defined (minimum break duration). The minimum break duration may be beneficial where intros and outros are the same. The minimum break duration may be associated with the shortest advertisement period (e.g., 30 seconds). The minimum break duration would prevent the system from detecting an intro twice in a relatively short time frame, assuming the second detection was an outro, and accordingly ending the insertion of an advertisement almost instantly.

According to one embodiment, a minimum duration between breaks (insertions) may be defined. The minimum duration between breaks may likewise be beneficial where intros and outros are the same. The duration would come into play when the maximum break duration was reached and the display of the incoming video stream was reestablished before detection of the outro. If the outro were then detected while the incoming video stream was being displayed, it might be mistaken for an intro and trigger the start of another insertion. The minimum duration between breaks may also be useful where video entities similar to known intros and/or outros are used during programming but are not followed by ad breaks. Such a condition may occur during replays of specific events during a sporting event, or possibly during the beginning or ending of a program when titles and/or credits are being displayed.
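The 1:45/:15 example above might be planned as in the sketch below; durations are in seconds and all values are illustrative.

```python
def plan_break(max_break=105, pre_outro=15, ad_length=30):
    """Fill most of the capped break with whole targeted ads and reserve
    the tail for a pre-outro still that can be cut off at any time.

    Returns (number_of_ads, pre_outro_start_offset)."""
    ad_time = max_break - pre_outro      # e.g. 90s of a 105s cap
    n_ads = ad_time // ad_length         # three 30-second spots
    return n_ads, n_ads * ad_length

ads, pre_outro_at = plan_break()
print(f"insert {ads} ads, start pre-outro at {pre_outro_at}s, "
      "return to programming on outro detection or at the cap")
```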
According to one embodiment, the titles at the beginning of a program may contain sub-sequences or images that are similar to known intros and/or outros. In order to prevent the detection of these sub-sequences or images from initiating an ad break, the detection of programming can be used to suppress any detection for a predefined time frame (minimum duration after program start). The minimum duration after program start ensures that, once the start of a program is detected, sub-sequences or images that are similar to known intros and/or outros will not interrupt programming.

According to one embodiment, the detection of the beginning of programming (either the actual beginning of the program or the return of programming after an advertisement break) may end the insertion of targeted advertisements or the pre-outro, if the beginning of programming is identified before the maximum break duration has expired or an outro is identified. Alternatively, if an outro, sponsorship message or programming is detected while an advertisement is being inserted, the advertisement may be completed and then a return to programming may be initiated. The beginning of programming may be detected by comparing a calculated fingerprint of the incoming video stream with previously generated fingerprints for the programming. The fingerprints for programming may be for the scenes that are displayed during the theme song, or for a particular image that is displayed once programming is about to resume (e.g., an image with the name of the program). The fingerprints of programming and of scenes within programming are discussed in more detail below. According to one embodiment, once it is determined that programming is again being presented on the incoming video stream, the generation and comparison of fingerprints may be halted temporarily, as it is unlikely that an advertisement break will be presented within a short time frame.

According to one embodiment, the detection of a channel change or an electronic program guide (EPG) activation may cause the insertion of advertisements to cease and the new program or EPG to be displayed. According to one embodiment, fingerprints are generated for special bulletins that may preempt advertising in the incoming video stream and correspondingly should preempt the insertion of targeted advertising. Special bulletins may begin with a standard image such as the station name and logo and the words "special bulletin" or a similar slogan. Fingerprints would be generated for each known special bulletin (one or more for each network) and stored locally. If the calculated fingerprint for the incoming video stream matches a special bulletin fingerprint while a targeted advertisement or the pre-outro is being displayed, a return to the incoming video stream would be initiated.

The specification has concentrated on local detection of advertisements or advertisement intros and local insertion of targeted advertisements. However, the specification is not limited thereto. For example, certain programs may be detected locally. The local detection of programs may enable the automatic recording of the program on a digital recording device such as a DVR. Likewise, specific scenes or scene changes may be detected. Based on the detection of scenes, a program being recorded can be bookmarked for future viewing ease.
To detect a particular program, fingerprints may be established for a plurality of programs (e.g., video that plays weekly during the theme song, a program title displayed in the video stream), and calculated features for the incoming video stream may be compared to these fingerprints. When a match is detected, the incoming video stream is associated with that program. Once the association is made, a determination can be made as to whether this is a program of interest to the user. If the detected program is a program of interest, a recording device may be turned on to record the program. The use of fingerprints to detect programs and ensure they are recorded without any user interaction is an alternative to using the electronic or interactive program guide to schedule recordings. The recorded programs could be archived and indexed based on any number of parameters (e.g., program, genre, actor, channel, network).

Scene changes can be detected, as described above, through the matching of fingerprints. If scene changes are detected during the recording of a program, the changes in scene can be bookmarked for ease of viewing at a later time. If specific scenes have already been identified and fingerprints stored for those scenes, fingerprints could be generated for the incoming video stream and compared against the scene fingerprints. When a match is found, the scene title could bookmark the scene being recorded.

According to one embodiment, the subscriber may be able to initiate bookmarking. The subscriber-generated bookmarking could be related to programs and/or scenes, or could be related to anything the subscriber desires (e.g., a line from a show, a goal scored in a soccer game). For example, while viewing a program being recorded, the subscriber could inform the system (e.g., by pressing a button) that they wish to have that portion of the video bookmarked. According to one embodiment, the system will save the calculated features (fingerprint) for a predefined number of frames (e.g., 25) or for a predefined time (e.g., 1 second) when the subscriber indicates a desire to bookmark. The subscriber may have the option to provide an identification for the fingerprint that they bookmarked so that they can easily return to this portion.

According to one embodiment, a subscriber may desire to fingerprint an entire portion of a video stream so that they can easily return to this portion or identify the portion for further processing (e.g., copying to a DVD if allowed and appropriate). For example, if a subscriber were watching a sports program that went into overtime and wanted to flag the overtime period, they could instruct the system to save the fingerprint for the entire overtime (e.g., by holding the button for the entire time to inform the system to retain the fingerprints being generated). The subscriber may have the option to provide an identification for the fingerprint that they bookmarked so that they can easily return to this portion. The fingerprint bookmarks and the associated programs, scenes or portions of video could be archived and indexed. The fingerprints and associated video could be indexed based on any number of parameters (e.g., program, genre, actor, channel, network, user identification). The bookmarks could be used as chapters so that the subscriber could easily find the sections of the programming they are interested in. The fingerprint bookmarks could be indexed with other bookmarks. If an advertisement (or advertisement break) is detected during the recording of a program, the recording of the program stream may be temporarily halted.
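A sketch of the subscriber bookmarking described above, retaining the most recently calculated features under an optional label; the structure and the 25-frame default are assumptions.

```python
class BookmarkStore:
    """Stores short runs of already-calculated frame features as bookmarks."""

    def __init__(self, frames_per_bookmark=25):
        self.frames_per_bookmark = frames_per_bookmark
        self.bookmarks = []  # list of (label, [per-frame features])

    def bookmark(self, recent_features, label=None):
        """Save the last N calculated frame features when the subscriber
        presses the bookmark button; an optional label aids later lookup."""
        clip = list(recent_features)[-self.frames_per_bookmark:]
        self.bookmarks.append((label or f"bookmark-{len(self.bookmarks)}", clip))

    def find(self, stream_features, distance, threshold):
        """Return labels whose stored features match a span of the stream,
        using any of the distance measures sketched earlier."""
        return [label for label, clip in self.bookmarks
                if distance(clip, stream_features) <= threshold]
```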
After a certain time frame (e.g., a typical advertisement block time of 2 minutes), or upon detection of an outro or of programming, the recording will begin again.

The fingerprints stored locally may be updated as new fingerprints are generated for any combination of ads, ad intros, channel banners, program overlays, programs, and scenes. The updates may be downloaded automatically at certain times (e.g., every night between 1 and 2 am), may require a user to download fingerprints from a certain location (e.g., a website), or may use any other means of updating. Automated distribution of fingerprints can also be utilized to ensure that viewers' local fingerprint libraries are up to date. According to one embodiment, the local detection system may track the features it generates for the incoming streams, and if there is no match to a stored fingerprint the system may determine that it is a new fingerprint and may store it. For example, if the system detects that an advertisement break has started and generates a fingerprint for the ad (e.g., a new Pepsi® ad), and the features generated for the new ad are not already stored, the calculated features may be stored for the new ad.

As an example of the industrial applicability of the method, system, and apparatus described herein, equipment can be placed in commercial establishments such as bars, hotels, and hospitals, and will allow for the recognition of known video entities (e.g., advertisements, advertisement intros, advertisement outros, sponsorship messages, programs, scenes, channel changes, EPG activations, and special bulletins) and appropriate subsequent processing. In one embodiment, a unit having the capabilities described herein is placed in a bar and is connected to an appropriate video source, as well as having a connection to a data network such as the internet. The output of a receiving unit (e.g., STB, DVR) is routed to the unit and subsequently to a television or other display. In this application the unit is continually updated with fingerprints that correspond to video entities that are to be substituted, which in one case are advertisements. The unit processes the incoming video and can detect the channel that is being displayed on the television using the techniques described herein. The unit continually monitors the incoming video signal and, based on processing of multiple frames, full frames, sub-frames or partial images, determines a match to a known advertisement or intro. Based on which channel is being displayed on the television, the unit can access an appropriate advertisement and substitute the original advertisement with another advertisement. The unit can also record that a particular advertisement was displayed on a particular channel and the time at which it was aired.

In order to ensure that video segments (and in particular intros and advertisements) are detected reliably, regions of interest in the video programming are marked and regions outside of the regions of interest are excluded from processing. The marking of the regions of interest is also used to focus processing on the areas that can provide information useful in determining to which channel the unit is tuned. In one instance, the region of interest for detection of video segments is the region that is excluded for channel detection, and vice versa. In this instance the area that provides graphics, icons or text indicating the channel is examined for channel recognition but excluded for video segment recognition.
Another application is the use of the method, system and apparatus in a personal/digital video recorder. In this instance, the personal/digital video recorder stores incoming video for future playback (also known as time-shifted video). The functionality described herein, or portions thereof, is included in the personal/digital video recorder and allows for the recognition of video segments in the incoming video, in stored video, or in video being played back. In one application the stored fingerprints represent advertisements, while in another application the stored fingerprints represent intros to programs. As such, the personal/digital video recorder can perform advertisement recognition and substitution, or can automatically recognize segments that indicate that a program should be recorded. In one embodiment the user designates one or more fingerprints as the basis for recording (e.g., known intros to sitcoms, sports events, talk shows). Each time one of those video entities is recognized by the system, the corresponding programming is recorded. The recognition of known video entities can also be used to create bookmarks in stored video such as that stored on a personal/digital video recorder. In this instance the user is presented with bookmarks that allow identification of particular segments of a program and allow the user to rapidly access those segments for playback.

Yet another application of the method, system and apparatus described herein is incorporation into servers that search for and access video across a network such as the internet. Using the fingerprinting methodology described herein, it is possible to compare video segments in stored video with fingerprints representing known video entities. The known video entities can be established such that they are useful in classifying the video, determining content, or establishing bookmarks for future reference.

It is noted that any and/or all of the above embodiments, configurations, and/or variations of the present invention described above can be mixed and matched and used in any combination with one another. Moreover, any description of a component or embodiment herein also includes hardware, software, and configurations which already exist in the prior art and may be necessary to the operation of such component(s) or embodiment(s). All embodiments of the present invention can be realized on a number of hardware and software platforms, including microprocessor systems programmed in languages including (but not limited to) C, C++, Perl, HTML, Pascal, and Java, although the scope of the invention is not limited by the choice of a particular hardware platform, programming language or tool.

The many features and advantages of the invention are apparent from the detailed specification. Thus, the appended claims are intended to cover all such features and advantages of the invention that fall within the scope of the invention. Furthermore, since numerous modifications and variations will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation illustrated and described. Accordingly, appropriate modifications and equivalents may be included within the scope.

Claims

1. A method for detecting a known video entity within a video stream, the method comprising: receiving a video stream; continually creating statistical parameterized representations for windows of the video stream; continually comparing the statistical parameterized representation windows to windows of a plurality of fingerprints, wherein each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity; and detecting a known video entity in the video stream when a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream.
2. A method as claimed in claim 1, wherein said receiving includes receiving a non-digital video stream and further comprising digitizing the video stream.
3. A method as claimed in claim 1 or 2, wherein said creating includes creating statistical parameterized representations for less than an entire image.
4. A method as claimed in claim 1, 2 or 3, wherein the statistical parameterized representations are color coherence vectors.
5. A method as claimed in claim 1, 2 or 3, wherein the statistical parameterized representations are color histograms.
6. A method as claimed in claim 1, 2 or 3, wherein the statistical parameterized representations are evenly or randomly highly subsampled representations of an image.
7. A method as claimed in any preceding claim, wherein the known video entities are advertisements.
8. A method as claimed in any of claims 1 to 7, wherein the known video entities are at least some subset of advertisement intros, advertisement outros, channel idents, and sponsorships.
9. A method as claimed in any preceding claim, wherein said detecting includes detecting an advertisement opportunity in the video stream when said comparing indicates that a fingerprint associated with an advertisement or an advertisement intro has at least a threshold level of similarity with the statistical parameterized representations of the video stream.
10. A method as claimed in claim 9, further comprising inserting a targeted advertisement subsequent to said detecting an advertisement opportunity in the video stream.
11. A method as claimed in any of claims 1 to 8, wherein said detecting includes detecting an advertisement opportunity in the video stream when a manual switch to an alternative video source is received.
12. A method as claimed in claim 11, wherein the alternative video source is targeted advertisements.
13. A method as claimed in any preceding claim, wherein said comparing only proceeds to a next window for a subset of the plurality of fingerprints that do not meet or exceed a maximum level of dissimilarity.
14. A method as claimed in any preceding claim, further comprising screening out fingerprints having more than a maximum level of dissimilarity with the statistical parameterized representation window.
15. A method as claimed in any preceding claim, wherein said detecting is subsequent to comparing at least a defined number of windows of the particular fingerprint.
16. A system for detecting a known video entity within a video stream, the system comprising: a receiver to receive a video stream; memory for storing a plurality of fingerprints, wherein each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity; and a processor to continually create statistical parameterized representations for windows of the video stream; continually compare the statistical parameterized representation windows to windows of the plurality of fingerprints, and detect a known video entity in the video stream when a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream.
17. A system as claimed in claim 16, wherein the windows of the video stream and the windows of the plurality of fingerprints are portions of an image.
18. A system as claimed in claim 16 or 17, wherein the statistical parameterized representations include at least some subset of color coherence vectors, color histograms, and evenly or randomly highly subsampled representations of an image.
19. A system as claimed in any of claims 16 to 18, wherein the known video segments include at least some subset of advertisements, advertisement intros, sponsorship messages, advertisement outros, and channel idents.
20. A system as claimed in any of claims 16 to 19, wherein said processor detects an advertisement opportunity in the video stream when the comparing indicates that a fingerprint associated with an advertisement or an advertisement intro has at least a threshold level of similarity with the statistical parameterized representations of the video stream.
21. A system as claimed in any of claims 16 to 20, further comprising a video inserter to insert targeted advertisements once the incoming video stream is associated with a known advertisement.
22. A system as claimed in any of claims 16 to 19, wherein said processor detects an advertisement opportunity in the video stream when a manual switch to an alternative video source is received.
23. A computer program embodied on a computer readable medium for detecting a known video entity within a video stream, when enabled by a computer readable instruction the computer program: continually creates statistical parameterized representations for windows of a received video stream; continually compares the statistical parameterized representation windows to windows of a plurality of fingerprints, wherein each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity; and detects a known video entity in the video stream when a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream.
24. A computer program as claimed in claim 23, wherein the windows of the video stream and the windows of the plurality of fingerprints are for less than an entire image.
25. A computer program as claimed in claim 23, wherein the statistical parameterized representations are at least some subset of color coherence vectors, color histograms or highly subsampled representations of an image.
26. A computer program as claimed in claim 23, wherein the known video segments are at least some subset of advertisements, advertisement intros, sponsorship messages, advertisement outros, and channel idents.
27. A method for detecting a known video entity within a video stream substantially as hereinbefore described with reference to the accompanying drawings.
28. A system for detecting a known video entity within a video stream substantially as hereinbefore described with reference to the accompanying drawings.
29. A computer program embodied on a computer readable medium for detecting a known video entity within a video stream substantially as hereinbefore described with reference to the accompanying drawings.
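For illustration only, and not as any part of the claims: the window-by-window comparison recited in claims 1, 13, 14 and 15 might be sketched as below, with a normalized color histogram (claim 5) standing in for any of the claimed statistical parameterized representations. The thresholds, the window count, and the assumption that the stream is aligned with the start of each fingerprint are illustrative simplifications.

import numpy as np

def histogram(window_pixels, bins=64):
    # One simple statistical parameterized representation: a normalized
    # color histogram over the pixel values of a window.
    hist, _ = np.histogram(window_pixels, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def detect(stream_windows, fingerprints, sim_threshold=0.9,
           diff_threshold=0.5, min_windows=5):
    # fingerprints: dict mapping entity name -> list of per-window histograms.
    candidates = {name: 0 for name in fingerprints}  # matched-window counts
    for i, rep in enumerate(histogram(w) for w in stream_windows):
        for name in list(candidates):
            fp = fingerprints[name]
            if i >= len(fp):
                continue
            similarity = 1.0 - 0.5 * np.abs(rep - fp[i]).sum()  # 1.0 = identical
            if similarity < diff_threshold:
                del candidates[name]    # too dissimilar: stop tracking (claims 13/14)
            elif similarity >= sim_threshold:
                candidates[name] += 1
                if candidates[name] >= min_windows:
                    return name         # detected after a defined number of windows (claim 15)
    return None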
30. A method of determining a channel associated with a video stream, the method comprising: receiving a video stream; identifying a portion of the video stream containing information relevant to channel identification; and analyzing the identified portion of the video stream to determine a channel associated with the video stream.
31. A method as claimed in claim 30, wherein said receiving includes receiving a non-digital video stream and further comprising digitizing the video stream.
32. A method as claimed in claim 30 or 31, wherein said analyzing includes comparing the identified portion of the video stream to stored channel identification information.
33. A method as claimed in claim 30 or 31, wherein said analyzing includes: generating a statistical parameterized representation of the identified portion of the video stream; and comparing the statistical parameterized representation of the identified portion of the video stream to at least one fingerprint, wherein each of the at least one fingerprint includes associated statistical parameterized representations of known channel identification information.
34. A method as claimed in claim 33, wherein the statistical parameterized representation is a color coherence vector.
35. A method as claimed in claim 33, wherein the statistical parameterized representation is a color histogram.
36. A method as claimed in claim 33, wherein the statistical parameterized representation is an evenly or randomly subsampled representation of an image of the video stream.
37. A method as claimed in any of claims 30 to 36, wherein the identified portion is a channel logo.
38. A method as claimed in any of claims 30 to 36, wherein the identified portion is a channel banner.
39. A method as claimed in any of claims 30 to 36, wherein the identified portion is a short clip within the video stream that reinforces channel branding.
40. A method as claimed in any of claims 30 to 39, wherein said identifying includes analyzing the video stream for indications that a channel change began or is about to begin.
41. A method as claimed in claim 40, wherein the analyzing includes detecting several mostly monochrome images in a row.
42. A method as claimed in any of claims 30 to 41, further comprising determining differences between expected location and dimensions of the video stream and the actual location and dimensions of the video stream, wherein said identifying and said analyzing take into account the differences.
43. A method as claimed in claim 42, wherein the differences are determined during system start up.
44. A system for determining a channel associated with a video stream, the system comprising: a receiver to receive a video stream; and a processor to identify a portion of the video stream relevant to channel identification; create a fingerprint of the portion of the video stream; and compare the fingerprint of the portion of the video stream against a library of stored fingerprints associated with channel identifications to determine a channel associated with the video stream.
45. A system as claimed in claim 44, wherein said processor creates a fingerprint that includes at least some subset of color coherence vectors, color histograms, and evenly or randomly subsampled representations for the portion of the video stream.
46. A system as claimed in claim 44 or 45, wherein said processor identifies a portion of the video stream relevant to channel identification that is at least some subset of a channel logo, a channel banner, or a short clip within the video stream that reinforces channel branding.
47. A system as claimed in any of claims 44 to 46, wherein said processor identifies a channel change.
48. A system as claimed in any of claims 44 to 47, wherein said processor determines differences between expected location and dimensions of the video stream and the actual location and dimensions of the video stream and uses the differences when identifying a portion.
49. A computer program embodied on a computer readable medium for determining a channel associated with a video stream, when enabled by a computer readable instruction the computer program: receiving a video stream; identifying a portion of the video stream relevant to channel identification; creating a fingerprint of the portion of the video stream; comparing the fingerprint of the portion of the video stream against a library of stored fingerprints associated with channel identifications to determine a channel associated with the video stream.
50. A computer program as claimed in claim 49, wherein said creating includes generating at least some subset of color coherence vectors, color histograms, and evenly or randomly subsampled representation for the portion of the video stream.
51. A computer program as claimed in claim 49 or 50, wherein said identifying includes identifying at least some subset of a channel logo, a channel banner, or a short clip within the video stream that reinforces channel branding.
52. A computer program as claimed in any of claims 49 to 51, wherein said identifying includes identifying a channel change.
53. A computer program as claimed in any of claims 49 to 52, wherein said computer program determines differences between expected location and dimensions of the video stream and the actual location and dimensions of the video stream and uses the differences when identifying a portion.
54. A method of determining a channel associated with a video stream, the method comprising: receiving a video stream; calculating features for the incoming video stream; comparing the calculated features to features for known channels; and associating the video stream with a channel having the same features.
55. A method as claimed in claim 54, wherein said calculating includes calculating at least some subset of color coherence vectors, color histograms, and evenly or randomly subsampled representations for the portion of the video stream.
56. A method as claimed in claim 54 or 55, wherein said calculating includes calculating shot detection, shot duration and shot boundary.
57. A method as claimed in any of claims 54 to 56, further comprising tracking detection of advertisement breaks within the video stream; and comparing timing of the advertisement breaks in the video stream to timing of advertisement breaks in known channels.
58. A method as claimed in any of claims 54 to 57, wherein said associating includes associating the video stream with a channel having the same features and comparable advertisement break timing.
59. A method of determining a channel associated with a video stream, the method comprising: receiving a video stream; tracking detection of advertisement breaks within the video stream; comparing timing of the advertisement breaks in the video stream to timing of advertisement breaks in known channels; and associating the video stream with a channel having advertisement breaks at similar times.
60. A method as claimed in claim 59, further comprising calculating a fingerprint for the incoming video stream; comparing the incoming video stream fingerprint to a database of fingerprints for known video entities; and identifying the incoming video stream with a known video entity when the known video entity fingerprint and the incoming video stream fingerprint meet a threshold level of similarity.
61. A method as claimed in claim 60, wherein said associating includes associating the video stream with a channel based on known video entity fingerprint identification and comparable advertisement break times.
62. A method of determining a channel associated with a video stream substantially as hereinbefore described with reference to the accompanying drawings.
63. A system for determining a channel associated with a video stream substantially as hereinbefore described with reference to the accompanying drawings.
64. A computer program embodied on a computer readable medium for determining a channel associated with a video stream substantially as hereinbefore described with reference to the accompanying drawings.
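Again purely as an illustration: the advertisement-break-timing association of claims 59 to 61 admits a very compact sketch. The tolerance, the time base (seconds from a common reference), and the scoring rule are assumptions made for the example, not details taken from the specification.

def closest_channel(observed_breaks, schedules, tolerance=20.0):
    # schedules: dict mapping channel name -> list of known break times.
    def score(schedule):
        # count observed breaks that fall near a known break on that channel
        return sum(any(abs(b - s) <= tolerance for s in schedule)
                   for b in observed_breaks)
    best = max(schedules, key=lambda channel: score(schedules[channel]))
    return best if score(schedules[best]) > 0 else None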
65. A method for detecting a known video entity within a video stream, the method comprising: receiving a video stream; identifying a region of disinterest in the video stream, wherein the region of disinterest is a portion of images within the video stream; creating statistical parameterized representations of the video stream; comparing the statistical parameterized representation of the video stream to a plurality of fingerprints, wherein each of the plurality of fingerprints includes a plurality of associated statistical parameterized representations of a known video entity, and wherein said comparing does not include the region of disinterest; and detecting a known video entity in the video stream when a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream.
66. A method as claimed in claim 65, wherein said comparing is done based on a sliding window that only proceeds to a next window for a subset of the plurality of fingerprints that do not meet or exceed a maximum level of dissimilarity for a current window.
67. A method as claimed in claim 66, wherein the sliding window is for less than an entire image.
68. A method as claimed in claim 65, 66 or 67, wherein the statistical parameterized representations are color coherence vectors.
69. A method as claimed in claim 65, 66 or 67, wherein the statistical parameterized representations are color histograms.
70. A method as claimed in claim 65, 66 or 67, wherein the statistical parameterized representations are an evenly or randomly highly subsampled representation of an image.
71. A method as claimed in any of claims 65 to 70, wherein the known video entities are advertisements.
72. A method as claimed in any of claims 65 to 70, wherein the known video entities include at least some subset of advertisement intros, advertisement outros, channel idents, and sponsorship messages.
73. A method as claimed in any of claims 65 to 72, wherein the region of disinterest is excluded from said creating statistical parameterized representations of the video stream.
74. A method as claimed in any of claims 65 to 73, wherein the region of disinterest is an overlay.
75. A method as claimed in any of claims 65 to 73, wherein the region of disinterest is a banner.
76. A method as claimed in any of claims 65 to 73, wherein the region of disinterest is a channel display.
77. A method as claimed in any of claims 65 to 73, wherein the region of disinterest includes at least some subset of channel logo, network logo, clock, scoreboard, timer, program information, EPG screen, promotions, weather reports, special news bulletins, closed caption data, and interactive TV buttons.
78. A method as claimed in any of claims 65 to 77, wherein the plurality of fingerprints do not include the region of disinterest.
79. A method as claimed in any of claims 65 to 78, wherein images within the incoming video stream are segregated into a plurality of regions and a set of regions associated with the region of disinterest are excluded from said comparing.
80. A method as claimed in any of claims 65 to 79, further comprising filtering out fingerprints having more than a maximum level of dissimilarity with the statistical parameterized representation window.
81. A system for detecting a known video entity within a video stream, the system comprising: a receiver to receive a video stream; and a processor to identify a region of disinterest in the video stream, wherein the region of disinterest is a portion of at least an image within the video stream; create statistical parameterized representations of the video stream; compare the statistical parameterized representation of the video stream to a plurality of fingerprints, wherein each of the plurality of fingerprints includes a plurality of associated statistical parameterized representations of a known video entity, and wherein said comparing does not include the region of disinterest; and detect a known video entity in the video stream when said comparing indicates that a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream after comparing at least a defined number of windows of the particular fingerprint.
82. A system as claimed in claim 81, wherein said processor compares based on a sliding window that only proceeds to a next window for a subset of the plurality of fingerprints that do not meet or exceed a maximum level of dissimilarity for a current window.
83. A system as claimed in claim 81 or 82, wherein said processor creates statistical parameterized representations that are at least some subset of color coherence vectors, color histograms, and evenly or randomly highly subsampled representations of an image.
84. A system as claimed in any of claims 81 to 83, wherein the known video segments are at least some subset of advertisements, advertisement intros, advertisement outros, channel idents, sponsorship messages, channel changes, programs and scenes.
85. A system as claimed in any of claims 81 to 84, wherein said processor excludes the region of disinterest when creating the statistical parameterized representations of the video stream.
86. A system as claimed in any of claims 81 to 85, wherein the region of disinterest is at least some subset of an overlay, a banner, and a channel display.
87. A computer program embodied on a computer readable medium for detecting an advertisement opportunity within a video stream, when enabled by a computer readable instruction the computer program: receiving a video stream; identifying a region of disinterest in the video stream, wherein the region of disinterest is a portion of images within the video stream; creating statistical parameterized representations for windows of the video stream; comparing the statistical parameterized representation windows to windows of a plurality of fingerprints, wherein each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity, and wherein said comparing does not include the region of disinterest; and detecting a known video entity in the video stream when a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream.
88. A computer program as claimed in claim 87, further comprising filtering out fingerprints having more than a maximum level of dissimilarity with the statistical parameterized representation window.
89. A computer program as claimed in claim 87 or 88, wherein the statistical parameterized representations are at least some subset of color coherence vectors, color histograms or highly subsampled representations of an image.
90. A computer program as claimed in any of claims 87 to 89, wherein the known video segments are at least some subset of advertisements, advertisement intros, advertisement outros, channel idents, sponsorship messages, channel changes, programs and scenes.
91. A computer program as claimed in any of claims 87 to 90, wherein the region of disinterest is at least some subset of an overlay, a banner, and a channel display.
92. A method for detecting a known video entity within a video stream substantially as hereinbefore described with reference to the accompanying drawings.
93. A system for detecting a known video entity within a video stream substantially as hereinbefore described with reference to the accompanying drawings.
94. A computer program embodied on a computer readable medium for detecting an advertisement opportunity within a video stream substantially as hereinbefore described with reference to the accompanying drawings.
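One possible reading of the region-of-disinterest exclusion in claims 65 and 73, expressed as an illustrative sketch; the (x, y, w, h) rectangle format and the pluggable representation function are assumptions for the example.

import numpy as np

def masked_representation(frame, disinterest_boxes, represent):
    # frame: H x W x 3 pixel array; represent: any representation function
    # (color histogram, color coherence vector, subsampling, ...).
    mask = np.ones(frame.shape[:2], dtype=bool)
    for x, y, w, h in disinterest_boxes:
        mask[y:y + h, x:x + w] = False  # drop overlay/banner/channel-display pixels
    return represent(frame[mask])       # only pixels outside the region of disinterest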
95. A method for ending advertisement insertion in a video stream, the method comprising: receiving a video stream; continually creating statistical parameterized representations for windows of the video stream; continually comparing the statistical parameterized representation windows to windows of a plurality of fingerprints, wherein each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity; inserting advertisements into the video stream when a fingerprint for a known video entity indicative of a commercial break has at least a threshold level of similarity with the video stream; and ending said inserting when an end of the advertisement break is determined.
96. A method as claimed in claim 95, wherein said ending includes detecting a fingerprint for a known video entity indicative of an end of advertisement break having at least a threshold level of similarity with the video stream.
97. A method as claimed in claim 96, wherein the known video entity indicative of an end of advertisement break is an advertisement outro.
98. A method as claimed in claim 96, wherein the known video entity indicative of an end of advertisement break is a program or program title.
99. A method as claimed in claim 96, wherein the known video entity indicative of an end of advertisement break is a channel change.
100. A method as claimed in claim 96, wherein the known video entity indicative of an end of advertisement break is an EPG activation.
101. A method as claimed in claim 96, wherein the known video entity indicative of an end of advertisement break is a sponsorship message.
102. A method as claimed in any of claims 95 to 101, wherein said ending includes returning to the video stream after completion of current advertisement being inserted.
103. A method as claimed in any of claims 96 to 101, wherein said ending includes immediately returning to the video stream after detecting a fingerprint for a known video entity indicative of an end of advertisement break.
104. A method as claimed in claim 95, wherein said ending includes returning to the video stream after a predetermined time frame.
105. A method as claimed in any of claims 95 to 104, wherein said ending includes playing a pre-outro after a predetermined time frame.
106. A method as claimed in claim 95, wherein said ending includes receiving a manually initiated trigger signal to end said inserting.
107. A method as claimed in any of claims 95 to 105, wherein said ending includes waiting at least a predetermined amount of time prior to attempting to determine an end of advertisement break.
108. A method as claimed in any of claims 95 to 107, further comprising waiting a predetermined amount of time between said ending and said inserting.
109. A method as claimed in any of claims 95 to 108, further comprising suppressing the continually comparing for a predetermined amount of time after detection of a certain video entity.
110. A method as claimed in claim 109, wherein the certain video entity includes at least some subset of a program, a program title, a beginning of advertisement break, and an end of advertisement break.
111. A method as claimed in any of claims 95 to 110, wherein the statistical parameterized representations include at least some subset of color coherence vectors, color histograms, evenly or randomly highly subsampled representations of an image.
112. A method as claimed in any of claims 95 to 111, wherein the known video entities include at least some subset of advertisements, advertisement intros, advertisement outros, sponsorship messages, channel changes, programs, EPG activations, channel idents and program titles.
113. A method as claimed in any of claims 95 to 112, wherein said comparing only proceeds to a next window for a subset of the plurality of fingerprints that do not meet or exceed a maximum level of dissimilarity.
114. A method as claimed in any of claims 95 to 113, further comprising screening out fingerprints having more than a maximum level of dissimilarity with the statistical parameterized representation window.
115. A system for ending advertisement insertion in a video stream, the system comprising: a receiver to receive a video stream; memory for storing a plurality of fingerprints, wherein each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity; and a processor to continually create statistical parameterized representations for windows of the video stream; continually compare the statistical parameterized representation windows to windows of the plurality of fingerprints, insert advertisements into the video stream when a fingerprint for a known video entity indicative of a commercial break has at least a threshold level of similarity with the video stream; and end the inserting when an end of the advertisement break is determined.
116. A system as claimed in claim 115, wherein said processor ends the inserting after detecting a fingerprint for a known video entity indicative of an end of advertisement break having at least a threshold level of similarity with the video stream.
117. A system as claimed in claim 115, wherein the known video entity indicative of an end of advertisement break includes at least some subset of an advertisement outro, a program, a channel change, an EPG activation, a channel ident, a program title and a sponsorship message.
118. A system as claimed in claim 115, wherein said processor ends the inserting immediately after detection of an end of advertisement break, after completion of current advertisement being inserted when an end of advertisement break is detected, upon receiving a manually initiated trigger signal to end said inserting, or after a predetermined time frame.
119. A system as claimed in claim 115, wherein said processor further performs at least some subset of: waiting at least a predetermined amount of time prior to attempting to determine an end of advertisement break after inserting begins; waiting a predetermined amount of time between said ending and said inserting; and suppressing the continually comparing for a predetermined amount of time after detection of a certain video entity.
120. A computer program embodied on a computer readable medium for ending advertisement insertion in a video stream, when enabled by a computer readable instruction the computer program: continually creates statistical parameterized representations for windows of a received video stream; continually compares the statistical parameterized representation windows to windows of a plurality of fingerprints, wherein each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity; inserts advertisements into the video stream when a fingerprint for a known video entity indicative of a commercial break has at least a threshold level of similarity with the video stream; and ends the inserting when an end of the advertisement break is determined.
121. A computer program as claimed in claim 120, wherein said computer program ends the inserting after detecting a fingerprint for a known video entity indicative of an end of advertisement break having at least a threshold level of similarity with the video stream.
122. A computer program as claimed in claim 120, wherein the known video entity indicative of an end of advertisement break includes at least some subset of an advertisement outro, a program, a channel change, an EPG activation, a program title, a channel ident, and a sponsorship message.
123. A computer program as claimed in claim 120, wherein said computer program ends the inserting immediately after detection of an end of advertisement break, after completion of current advertisement being inserted when an end of advertisement break is detected, upon receiving a manually initiated trigger signal to end said inserting, or after a predetermined time frame.
124. A computer program as claimed in claim 120, wherein said computer program further performs at least some subset of: waiting at least a predetermined amount of time prior to attempting to determine an end of advertisement break after inserting begins; waiting a predetermined amount of time between said ending and said inserting; and suppressing the continually comparing for a predetermined amount of time after detection of a certain video entity.
125. A method for ending advertisement insertion in a video stream substantially as hereinbefore described with reference to the accompanying drawings.
126. A system for ending advertisement insertion in a video stream substantially as hereinbefore described with reference to the accompanying drawings.
127. A computer program embodied on a computer readable medium, for ending advertisement insertion in a video stream substantially as hereinbefore described with reference to the accompanying drawings.
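By way of a non-authoritative sketch, the insertion-ending behaviour of claims 95 to 108 can be pictured as a simple control loop. The entity labels, timings, and callback interface below are invented for the example; detect_entity() stands in for the fingerprint comparison described earlier.

import time

END_OF_BREAK = {"outro", "program", "channel_change", "epg_activation", "sponsorship"}

def end_insertion(detect_entity, resume_stream, max_break=120.0, settle=5.0):
    # Called once substitute advertisements have started playing.
    start = time.monotonic()
    time.sleep(settle)              # wait before looking for the end of the break
    while time.monotonic() - start < max_break:
        if detect_entity() in END_OF_BREAK:
            resume_stream()         # end of advertisement break detected
            return
        time.sleep(0.5)             # inserted advertisements continue meanwhile
    resume_stream()                 # fall back to the predetermined time frame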
128. A method for detecting and acting on a known video entity within a video stream, the method comprising: receiving a video stream; continually creating statistical parameterized representations for windows of the video stream; continually comparing the statistical parameterized representation windows to windows of a plurality of fingerprints associated with known video entities, wherein each of the plurality of fingerprints includes statistical parameterized representations of the known video entities; and detecting a known video entity in the video stream when a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream.
129. A method as claimed in claim 128, wherein the known video entities include program scenes.
130. A method as claimed in claim 129, further comprising bookmarking the video stream when a known program scene is detected in the video stream.
131. A method as claimed in claim 128, 129 or 130, further comprising archiving and indexing the video stream.
132. A method as claimed in claim 130, further comprising using the bookmarks as chapters indicating different scenes.
133. A method as claimed in claim 128, wherein the known video entities include at least some subset of programs and program titles.
134. A method as claimed in claim 133, further comprising recording the video stream when a known program is detected and the known program is a program of interest.
135. A method as claimed in claim 134, further comprising archiving and indexing the programs of interest.
136. A method as claimed in claim 128, wherein the known video entities or segments include at least some subset of advertisements, advertisement intros, channel idents and sponsorship messages.
137. A method as claimed in claim 136, wherein said detecting includes detecting an advertisement break when a fingerprint associated with at least some subset of an advertisement, an advertisement intro, a channel ident and a sponsorship message has at least a threshold level of similarity with the video stream.
138. A method as claimed in claim 137, further comprising discontinuing recording of the video stream when the advertisement break is detected in the video stream and the video stream is currently being recorded.
139. A method as claimed in claim 138, wherein the known video entities or segments include at least some subset of advertisement outros, channel idents, sponsorship messages, programming and programming titles.
140. A method as claimed in claim 139, wherein said detecting includes detecting a return to programming from the advertisement break when a fingerprint associated with at least some subset of an advertisement outro, a channel ident, a sponsorship message, programming and a program title has at least a threshold level of similarity with the video stream.
141. A method as claimed in claim 140, further comprising restarting recording of the video stream when the return to programming is detected in the video stream.
142. A method as claimed in claim 140, further comprising restarting the recording of the video stream after a predefined amount of time has passed if the return to programming has not been detected.
143. A method as claimed in any of claims 128 to 142, wherein said creating includes creating statistical parameterized representations for less than an entire image.
144. A method as claimed in any of claims 128 to 143, wherein the statistical parameterized representations include at least some subset of color coherence vectors, color histograms, and evenly or randomly highly subsampled representations of an image.
145. A method as claimed in any of claims 128 to 144, wherein said comparing only proceeds to a next window for a subset of the plurality of fingerprints that do not meet or exceed a maximum level of dissimilarity.
146. A system for detecting and acting on a known video entity within a video stream, the system comprising: a receiver to receive a video stream; memory for storing a plurality of fingerprints, wherein each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity; and a processor to continually create statistical parameterized representations for windows of the video stream; continually compare the statistical parameterized representation windows to windows of the plurality of fingerprints, and detect a known video entity in the video stream when a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream.
147. A system as claimed in claim 146, wherein the known video entities include at least some subset of programs, program titles, scenes, advertisements, advertisement intros, advertisement outros, channel idents and sponsorship messages.
148. A system as claimed in claim 146 or 147, wherein said processor further bookmarks the video stream when a known program scene is detected in the video stream.
149. A system as claimed in claim 146 or 147, wherein said processor further records the video stream when a known program is detected and the known program is a program of interest.
150. A system as claimed in any of claims 146 to 149, wherein said processor further discontinues recording of the video stream when an advertisement break is detected in the video stream and the video stream is currently being recorded.
151. A system as claimed in claim 150, wherein said processor further restarts recording of the video stream when a return to programming is detected in the video stream.
152. A system as claimed in claim 150, wherein said processor further restarts the recording of the video stream after a predefined amount of time has passed if a return to programming has not been detected.
153. A system as claimed in any of claims 146 to 152, wherein said processor further archives and indexes recorded programs and bookmarks.
154. A system as claimed in any of claims 146 to 153, wherein the statistical parameterized representations include at least some subset of color coherence vectors, color histograms, and evenly or randomly highly subsampled representations of an image.
155. A computer program embodied on a computer readable medium for detecting a known scene within a video stream, when enabled by a computer readable instruction the computer program: continually creates statistical parameterized representations for windows of a received video stream; continually compares the statistical parameterized representation windows to windows of a plurality of fingerprints, wherein each of the plurality of fingerprints includes associated statistical parameterized representations of a known video entity; and detects a known video entity in the video stream when a particular fingerprint of the plurality of fingerprints has at least a threshold level of similarity with the video stream.
156. A computer program as claimed in claim 155, wherein said computer program further bookmarks the video stream when a known program scene is detected in the video stream.
157. A computer program as claimed in claim 155, wherein said computer program further records the video stream when a known program is detected and the known program is a program of interest.
158. A computer program as claimed in any of claims 155 to 157, wherein said computer program further discontinues recording of the video stream when an advertisement break is detected in the video stream and the video stream is currently being recorded.
159. A computer program as claimed in claim 158, wherein said computer program restarts recording of the video stream when a return to programming is detected in the video stream.
160. A method for detecting and acting on a known video entity within a video stream substantially as hereinbefore described with reference to the accompanying drawings.
161. A system for detecting and acting on a known video entity within a video stream substantially as hereinbefore described with reference to the accompanying drawings.
162. A computer program embodied on a computer readable medium for detecting a known scene within a video stream substantially as hereinbefore described with reference to the accompanying drawings.
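Finally, and again only as an illustrative sketch (the event labels and the recorder interface are assumptions), the record-control and bookmarking behaviour of claims 128 to 142 could be outlined as follows.

def control_recording(events, recorder):
    # events: iterable of (entity_type, timestamp) pairs from the detector.
    bookmarks = []
    for entity, t in events:
        if entity == "ad_break" and recorder.is_recording():
            recorder.pause()        # discontinue recording during the break
        elif entity == "return_to_program":
            recorder.resume()       # restart recording on return to programming
        elif entity == "scene":
            bookmarks.append(t)     # chapter marker for later rapid access
    return bookmarks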
PCT/GB2005/000772 2004-03-01 2005-03-01 Detecting known video entities WO2005086081A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
DE602005019273T DE602005019273D1 (en) 2004-03-01 2005-03-01 DETECTION OF KNOWN PICTURES IN VIDEO SEQUENCES
AT05717850T ATE457501T1 (en) 2004-03-01 2005-03-01 DETECTION OF KNOWN IMAGES IN VIDEO SEQUENCES
EP05717850A EP1730668B1 (en) 2004-03-01 2005-03-01 Detecting known images in video streams

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/790,468 US7694318B2 (en) 2003-03-07 2004-03-01 Video detection and insertion
US10/790,468 2004-03-01

Publications (1)

Publication Number Publication Date
WO2005086081A1 (en)

Family

ID=34435909

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2005/000772 WO2005086081A1 (en) 2004-03-01 2005-03-01 Detecting known video entities

Country Status (7)

Country Link
US (2) US7694318B2 (en)
EP (1) EP1730668B1 (en)
AT (1) ATE457501T1 (en)
DE (1) DE602005019273D1 (en)
GB (3) GB2411786B (en)
WO (1) WO2005086081A1 (en)
ZA (1) ZA200608155B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8207989B2 (en) 2008-12-12 2012-06-26 Microsoft Corporation Multi-video synthesis
US8654255B2 (en) 2007-09-20 2014-02-18 Microsoft Corporation Advertisement insertion points detection for online video advertising
US9554093B2 (en) 2006-02-27 2017-01-24 Microsoft Technology Licensing, Llc Automatically inserting advertisements into source video content playback streams
US9639531B2 (en) 2008-04-09 2017-05-02 The Nielsen Company (Us), Llc Methods and apparatus to play and control playing of media in a web page
US10943252B2 (en) 2013-03-15 2021-03-09 The Nielsen Company (Us), Llc Methods and apparatus to identify a type of media presented by a media player

Families Citing this family (210)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7809154B2 (en) 2003-03-07 2010-10-05 Technology, Patents & Licensing, Inc. Video entity recognition in compressed digital video streams
US20050149968A1 (en) * 2003-03-07 2005-07-07 Richard Konig Ending advertisement insertion
US20050177847A1 (en) * 2003-03-07 2005-08-11 Richard Konig Determining channel associated with video stream
US7694318B2 (en) * 2003-03-07 2010-04-06 Technology, Patents & Licensing, Inc. Video detection and insertion
US7738704B2 (en) * 2003-03-07 2010-06-15 Technology, Patents And Licensing, Inc. Detecting known video entities utilizing fingerprints
US20040237102A1 (en) * 2003-03-07 2004-11-25 Richard Konig Advertisement substitution
US20040226035A1 (en) * 2003-05-05 2004-11-11 Hauser David L. Method and apparatus for detecting media content
US8020000B2 (en) * 2003-07-11 2011-09-13 Gracenote, Inc. Method and device for generating and detecting a fingerprint functioning as a trigger marker in a multimedia signal
TW200527110A (en) * 2003-10-20 2005-08-16 Johnson Res And Dev Co Inc Portable multimedia projection system
US10387920B2 (en) 2003-12-23 2019-08-20 Roku, Inc. System and method for offering and billing advertisement opportunities
US9865017B2 (en) 2003-12-23 2018-01-09 Opentv, Inc. System and method for providing interactive advertisement
US10032192B2 (en) * 2003-12-23 2018-07-24 Roku, Inc. Automatic localization of advertisements
US20060195860A1 (en) * 2005-02-25 2006-08-31 Eldering Charles A Acting on known video entities detected utilizing fingerprinting
US20060242667A1 (en) * 2005-04-22 2006-10-26 Petersen Erin L Ad monitoring and indication
US7690011B2 (en) * 2005-05-02 2010-03-30 Technology, Patents & Licensing, Inc. Video stream modification to defeat detection
US9286388B2 (en) 2005-08-04 2016-03-15 Time Warner Cable Enterprises Llc Method and apparatus for context-specific content delivery
US8326775B2 (en) 2005-10-26 2012-12-04 Cortica Ltd. Signature generation for multimedia deep-content-classification by a large-scale matching system and method thereof
US8818916B2 (en) 2005-10-26 2014-08-26 Cortica, Ltd. System and method for linking multimedia data elements to web pages
US10607355B2 (en) 2005-10-26 2020-03-31 Cortica, Ltd. Method and system for determining the dimensions of an object shown in a multimedia content item
US9256668B2 (en) 2005-10-26 2016-02-09 Cortica, Ltd. System and method of detecting common patterns within unstructured data elements retrieved from big data sources
US9396435B2 (en) 2005-10-26 2016-07-19 Cortica, Ltd. System and method for identification of deviations from periodic behavior patterns in multimedia content
US10191976B2 (en) 2005-10-26 2019-01-29 Cortica, Ltd. System and method of detecting common patterns within unstructured data elements retrieved from big data sources
US10949773B2 (en) 2005-10-26 2021-03-16 Cortica, Ltd. System and methods thereof for recommending tags for multimedia content elements based on context
US10635640B2 (en) 2005-10-26 2020-04-28 Cortica, Ltd. System and method for enriching a concept database
US8312031B2 (en) 2005-10-26 2012-11-13 Cortica Ltd. System and method for generation of complex signatures for multimedia data content
US11019161B2 (en) 2005-10-26 2021-05-25 Cortica, Ltd. System and method for profiling users interest based on multimedia content analysis
US10360253B2 (en) 2005-10-26 2019-07-23 Cortica, Ltd. Systems and methods for generation of searchable structures respective of multimedia data content
US9489431B2 (en) 2005-10-26 2016-11-08 Cortica, Ltd. System and method for distributed search-by-content
US11604847B2 (en) 2005-10-26 2023-03-14 Cortica Ltd. System and method for overlaying content on a multimedia content element based on user interest
US9191626B2 (en) 2005-10-26 2015-11-17 Cortica, Ltd. System and methods thereof for visual analysis of an image on a web-page and matching an advertisement thereto
US10698939B2 (en) 2005-10-26 2020-06-30 Cortica Ltd System and method for customizing images
US10380623B2 (en) 2005-10-26 2019-08-13 Cortica, Ltd. System and method for generating an advertisement effectiveness performance score
US9286623B2 (en) 2005-10-26 2016-03-15 Cortica, Ltd. Method for determining an area within a multimedia content element over which an advertisement can be displayed
US10193990B2 (en) 2005-10-26 2019-01-29 Cortica Ltd. System and method for creating user profiles based on multimedia content
US9477658B2 (en) 2005-10-26 2016-10-25 Cortica, Ltd. Systems and method for speech to speech translation using cores of a natural liquid architecture system
US10372746B2 (en) 2005-10-26 2019-08-06 Cortica, Ltd. System and method for searching applications using multimedia content elements
US9639532B2 (en) 2005-10-26 2017-05-02 Cortica, Ltd. Context-based analysis of multimedia content items using signatures of multimedia elements and matching concepts
US9466068B2 (en) 2005-10-26 2016-10-11 Cortica, Ltd. System and method for determining a pupillary response to a multimedia data element
US9767143B2 (en) 2005-10-26 2017-09-19 Cortica, Ltd. System and method for caching of concept structures
US11403336B2 (en) 2005-10-26 2022-08-02 Cortica Ltd. System and method for removing contextually identical multimedia content elements
US10380164B2 (en) 2005-10-26 2019-08-13 Cortica, Ltd. System and method for using on-image gestures and multimedia content elements as search queries
US10691642B2 (en) 2005-10-26 2020-06-23 Cortica Ltd System and method for enriching a concept database with homogenous concepts
US10742340B2 (en) 2005-10-26 2020-08-11 Cortica Ltd. System and method for identifying the context of multimedia content elements displayed in a web-page and providing contextual filters respective thereto
US11003706B2 (en) 2005-10-26 2021-05-11 Cortica Ltd System and methods for determining access permissions on personalized clusters of multimedia content elements
US10380267B2 (en) 2005-10-26 2019-08-13 Cortica, Ltd. System and method for tagging multimedia content elements
US10535192B2 (en) 2005-10-26 2020-01-14 Cortica Ltd. System and method for generating a customized augmented reality environment to a user
US10585934B2 (en) 2005-10-26 2020-03-10 Cortica Ltd. Method and system for populating a concept database with respect to user identifiers
US10776585B2 (en) 2005-10-26 2020-09-15 Cortica, Ltd. System and method for recognizing characters in multimedia content
US9558449B2 (en) 2005-10-26 2017-01-31 Cortica, Ltd. System and method for identifying a target area in a multimedia content element
US9529984B2 (en) 2005-10-26 2016-12-27 Cortica, Ltd. System and method for verification of user identification based on multimedia content elements
US10621988B2 (en) 2005-10-26 2020-04-14 Cortica Ltd System and method for speech to text translation using cores of a natural liquid architecture system
US9031999B2 (en) 2005-10-26 2015-05-12 Cortica, Ltd. System and methods for generation of a concept based database
US9384196B2 (en) 2005-10-26 2016-07-05 Cortica, Ltd. Signature generation for multimedia deep-content-classification by a large-scale matching system and method thereof
US9953032B2 (en) 2005-10-26 2018-04-24 Cortica, Ltd. System and method for characterization of multimedia content signals using cores of a natural liquid architecture system
US11361014B2 (en) 2005-10-26 2022-06-14 Cortica Ltd. System and method for completing a user profile
US11216498B2 (en) 2005-10-26 2022-01-04 Cortica, Ltd. System and method for generating signatures to three-dimensional multimedia data elements
US11032017B2 (en) 2005-10-26 2021-06-08 Cortica, Ltd. System and method for identifying the context of multimedia content elements
US9330189B2 (en) 2005-10-26 2016-05-03 Cortica, Ltd. System and method for capturing a multimedia content item by a mobile device and matching sequentially relevant content to the multimedia content item
US8266185B2 (en) 2005-10-26 2012-09-11 Cortica Ltd. System and methods thereof for generation of searchable structures respective of multimedia data content
US9646005B2 (en) 2005-10-26 2017-05-09 Cortica, Ltd. System and method for creating a database of multimedia content elements assigned to users
US11386139B2 (en) 2005-10-26 2022-07-12 Cortica Ltd. System and method for generating analytics for entities depicted in multimedia content
US10614626B2 (en) 2005-10-26 2020-04-07 Cortica Ltd. System and method for providing augmented reality challenges
US10180942B2 (en) 2005-10-26 2019-01-15 Cortica Ltd. System and method for generation of concept structures based on sub-concepts
US10848590B2 (en) 2005-10-26 2020-11-24 Cortica Ltd System and method for determining a contextual insight and providing recommendations based thereon
US9372940B2 (en) 2005-10-26 2016-06-21 Cortica, Ltd. Apparatus and method for determining user attention using a deep-content-classification (DCC) system
US11620327B2 (en) 2005-10-26 2023-04-04 Cortica Ltd System and method for determining a contextual insight and generating an interface with recommendations based thereon
US10387914B2 (en) 2005-10-26 2019-08-20 Cortica, Ltd. Method for identification of multimedia content elements and adding advertising content respective thereof
US9218606B2 (en) 2005-10-26 2015-12-22 Cortica, Ltd. System and method for brand monitoring and trend analysis based on deep-content-classification
US9235557B2 (en) * 2005-10-26 2016-01-12 Cortica, Ltd. System and method thereof for dynamically associating a link to an information resource with a multimedia content displayed in a web-page
US9269088B2 (en) * 2005-11-23 2016-02-23 Cable Television Laboratories, Inc. Method and system of advertising
US20070124762A1 (en) * 2005-11-30 2007-05-31 Microsoft Corporation Selective advertisement display for multimedia content
US20070136758A1 (en) * 2005-12-14 2007-06-14 Nokia Corporation System, method, mobile terminal and computer program product for defining and detecting an interactive component in a video data stream
CN101352029A (en) * 2005-12-15 2009-01-21 模拟装置公司 Randomly sub-sampled partition voting(RSVP) algorithm for scene change detection
JP2009521174A (en) * 2005-12-21 2009-05-28 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Video encoding and decoding
US8774602B1 (en) * 2006-02-10 2014-07-08 Tp Lab, Inc. Method to record a media file
US9386327B2 (en) 2006-05-24 2016-07-05 Time Warner Cable Enterprises Llc Secondary content insertion apparatus and methods
US8199160B2 (en) 2006-06-02 2012-06-12 Advanced Us Technology Group, Inc. Method and apparatus for monitoring a user's activities
US7873982B2 (en) * 2006-06-22 2011-01-18 Tivo Inc. Method and apparatus for creating and viewing customized multimedia segments
US7661121B2 (en) * 2006-06-22 2010-02-09 Tivo, Inc. In-band data recognition and synchronization system
EP2057843A1 (en) * 2006-08-31 2009-05-13 International Business Machines Corporation Personalized advertising in mobile television
US8966389B2 (en) 2006-09-22 2015-02-24 Limelight Networks, Inc. Visual interface for identifying positions of interest within a sequentially ordered information encoding
US8396878B2 (en) 2006-09-22 2013-03-12 Limelight Networks, Inc. Methods and systems for generating automated tags for video files
US8214374B1 (en) 2011-09-26 2012-07-03 Limelight Networks, Inc. Methods and systems for abridging video files
US9015172B2 (en) 2006-09-22 2015-04-21 Limelight Networks, Inc. Method and subsystem for searching media content within a content-search service system
JP2008099001A (en) * 2006-10-12 2008-04-24 Funai Electric Co Ltd Image recorder and recording method
US10733326B2 (en) 2006-10-26 2020-08-04 Cortica Ltd. System and method for identification of inappropriate multimedia content
GB2444094A (en) * 2006-11-22 2008-05-28 Half Minute Media Ltd Identifying repeating video sections by comparing video fingerprints from detected candidate video sequences
US8493510B2 (en) * 2006-12-12 2013-07-23 Time Warner Inc. Method and apparatus for concealing portions of a video screen
US20090037949A1 (en) * 2007-02-22 2009-02-05 Birch James R Integrated and synchronized cross platform delivery system
US20100318428A1 (en) * 2007-03-02 2010-12-16 Birch James R Dynamic prioritization of advertisements and content delivery system
US20100318429A1 (en) * 2007-03-02 2010-12-16 Birch James R Relative usage and location optimization system
US20100324992A1 (en) * 2007-03-02 2010-12-23 Birch James R Dynamically reactive response and specific sequencing of targeted advertising and content delivery system
US7912217B2 (en) * 2007-03-20 2011-03-22 Cisco Technology, Inc. Customized advertisement splicing in encrypted entertainment sources
US10356366B2 (en) * 2007-05-31 2019-07-16 Sony Interactive Entertainment America Llc System and method for taking control of a system during a commercial break
US9165301B2 (en) * 2007-06-06 2015-10-20 Core Audience, Inc. Network devices for replacing an advertisement with another advertisement
US8526784B2 (en) 2007-07-27 2013-09-03 Cisco Technology, Inc. Digital video recorder collaboration and similar media segment determination
US8214273B2 (en) * 2007-09-25 2012-07-03 Goldspot Media Apparatus and methods for enabling targeted insertion of advertisements using metadata as in-content descriptors
EP2053530A1 (en) * 2007-10-05 2009-04-29 Research In Motion Limited Method and system for multifaceted scanning
US7979906B2 (en) * 2007-10-05 2011-07-12 Research In Motion Limited Method and system for multifaceted scanning
US9239958B2 (en) 2007-11-09 2016-01-19 The Nielsen Company (Us), Llc Methods and apparatus to measure brand exposure in media streams
GB2454582B (en) * 2007-11-09 2012-07-04 Nielsen Co Us Llc Methods and apparatus to measure brand exposure in media streams and to specify regions of interest in associated video frames
US8136140B2 (en) 2007-11-20 2012-03-13 Dish Network L.L.C. Methods and apparatus for generating metadata utilized to filter content from a video stream using text data
US8165450B2 (en) 2007-11-19 2012-04-24 Echostar Technologies L.L.C. Methods and apparatus for filtering content in a video stream using text data
US8165451B2 (en) 2007-11-20 2012-04-24 Echostar Technologies L.L.C. Methods and apparatus for displaying information regarding interstitials of a video stream
JP2009152810A (en) 2007-12-19 2009-07-09 Yahoo Japan Corp Recording and replay apparatus and operation method therefor
US7886070B2 (en) * 2008-01-15 2011-02-08 International Business Corporation Source updating for streaming based servers
US20090193455A1 (en) * 2008-01-29 2009-07-30 Samsung Electronics Co., Ltd. Information storage medium and method for providing additional contents based on trigger, and digital broadcast reception apparatus
EP2265007A4 (en) 2008-01-29 2011-08-24 Samsung Electronics Ltd Content recording control method for peers, and a device therefor
US8973028B2 (en) * 2008-01-29 2015-03-03 Samsung Electronics Co., Ltd. Information storage medium storing metadata and method of providing additional contents, and digital broadcast reception apparatus
CN101933039B (en) 2008-01-29 2015-07-08 Samsung Electronics Co., Ltd. Method for providing a content-sharing service, and a device therefor
US8606085B2 (en) 2008-03-20 2013-12-10 Dish Network L.L.C. Method and apparatus for replacement of audio data in recorded audio/video stream
AU2009234358A1 (en) * 2008-04-10 2009-10-15 Gvbb Holdings S.A.R.L. Method and apparatus for content replacement in live production
US8156520B2 (en) 2008-05-30 2012-04-10 EchoStar Technologies, L.L.C. Methods and apparatus for presenting substitute content in an audio/video stream using text data
US20090320063A1 (en) * 2008-06-23 2009-12-24 Microsoft Corporation Local advertisement insertion detection
US20100014825A1 (en) * 2008-07-18 2010-01-21 Porto Technology, Llc Use of a secondary device to overlay disassociated media elements onto video content
CN102132574B (en) * 2008-08-22 2014-04-02 Dolby Laboratories Licensing Corporation Content identification and quality monitoring
US8004576B2 (en) 2008-10-31 2011-08-23 Digimarc Corporation Histogram methods and systems for object recognition
US8407735B2 (en) 2008-12-24 2013-03-26 Echostar Technologies L.L.C. Methods and apparatus for identifying segments of content in a presentation stream using signature data
US8510771B2 (en) 2008-12-24 2013-08-13 Echostar Technologies L.L.C. Methods and apparatus for filtering content from a presentation stream using signature data
US8588579B2 (en) 2008-12-24 2013-11-19 Echostar Technologies L.L.C. Methods and apparatus for filtering and inserting content into a presentation stream using signature data
US9215423B2 (en) 2009-03-30 2015-12-15 Time Warner Cable Enterprises Llc Recommendation engine apparatus and methods
US8918806B2 (en) * 2009-04-01 2014-12-23 Disney Enterprises, Inc. Packaged media playback with remotely obtained supplemental content
US9015741B2 (en) 2009-04-17 2015-04-21 Gracenote, Inc. Method and system for remotely controlling consumer electronic devices
JP5133454B2 (en) * 2009-04-28 2013-01-30 Sharp Corporation Display device, display method, and program for executing the same
WO2010141691A1 (en) 2009-06-03 2010-12-09 Visible World, Inc. Targeting television advertisements based on automatic optimization of demographic information
US8437617B2 (en) 2009-06-17 2013-05-07 Echostar Technologies L.L.C. Method and apparatus for modifying the presentation of content
US8813124B2 (en) 2009-07-15 2014-08-19 Time Warner Cable Enterprises Llc Methods and apparatus for targeted secondary content insertion
GB2473911A (en) 2009-09-10 2011-03-30 Miniweb Technologies Ltd Content item receiver with advertisement replacement facility
US8984626B2 (en) 2009-09-14 2015-03-17 Tivo Inc. Multifunction multimedia device
US8369686B2 (en) * 2009-09-30 2013-02-05 Microsoft Corporation Intelligent overlay for video advertising
US20110137976A1 (en) * 2009-12-04 2011-06-09 Bob Poniatowski Multifunction Multimedia Device
US8682145B2 (en) 2009-12-04 2014-03-25 Tivo Inc. Recording system based on multimedia content fingerprints
US8798442B2 (en) * 2009-12-12 2014-08-05 At&T Intellectual Property I, Lp System, method and computer program product for updating advertising data for recorded video data
US8934758B2 (en) 2010-02-09 2015-01-13 Echostar Global B.V. Methods and apparatus for presenting supplemental content in association with recorded content
US8433142B2 (en) 2010-04-05 2013-04-30 The Nielsen Company (Us), Llc Methods and apparatus to detect differences between images
US20110264530A1 (en) * 2010-04-23 2011-10-27 Bryan Santangelo Apparatus and methods for dynamic secondary content and data insertion and delivery
WO2011146275A2 (en) * 2010-05-19 2011-11-24 Google Inc. Managing lifecycles of television gadgets and applications
US9288550B2 (en) * 2010-08-09 2016-03-15 Surewaves Mediatech Private Limited Method and system for integrated media planning and automated advertisement distribution and insertion
US8826319B2 (en) * 2010-08-09 2014-09-02 Surewaves Mediatech Private Limited Method and system for tracking of advertisements
KR20120060134A (en) * 2010-08-16 2012-06-11 Samsung Electronics Co., Ltd. Method and apparatus for reproducing advertisement
US8863165B2 (en) 2010-11-01 2014-10-14 Gracenote, Inc. Method and system for presenting additional content at a media system
US8656422B2 (en) * 2011-01-25 2014-02-18 Motorola Mobility Llc Method and apparatus for managing targeted advertisements for a linear television service
NL2006291C2 (en) * 2011-02-24 2012-08-27 Civolution B.V. Broadcasting an information signal having special content for triggering an appropriate action in user device.
EP2700238B1 (en) 2011-04-19 2018-09-19 Nagravision S.A. Ethernet decoder device and method to access protected content
US9264760B1 (en) * 2011-09-30 2016-02-16 Tribune Broadcasting Company, Llc Systems and methods for electronically tagging a video component in a video package
US8966525B2 (en) * 2011-11-08 2015-02-24 Verizon Patent And Licensing Inc. Contextual information between television and user device
US9270718B2 (en) * 2011-11-25 2016-02-23 Harry E Emerson, III Internet streaming and the presentation of dynamic content
US9578378B2 (en) * 2012-01-05 2017-02-21 Lg Electronics Inc. Video display apparatus and operating method thereof
EP2690593A1 (en) 2012-07-24 2014-01-29 Nagravision S.A. Method for marking and transmitting a content and method for detecting an identifier of said content
GB2505535B (en) * 2012-09-03 2015-06-10 Nds Ltd Method and apparatus for selection of advertisements to fill a commercial break of an unknown duration
EP2712203A1 (en) 2012-09-25 2014-03-26 Nagravision S.A. Method and system for enhancing redistributed audio / video content
US8805721B2 (en) * 2012-09-27 2014-08-12 Canoe Ventures Instantiation of asset insertion processing on multiple computing devices for directing insertion of assets into content on demand
US9398340B2 (en) 2012-09-27 2016-07-19 Canoe Ventures, Llc Asset qualification for content on demand insertion
US9386349B2 (en) 2012-09-27 2016-07-05 Canoe Ventures, Llc Asset conflict resolution for content on demand asset insertion
US9883208B2 (en) 2012-09-27 2018-01-30 Canoe Ventures Llc Data synchronization for content on demand asset insertion decisions
US9872075B2 (en) 2012-09-27 2018-01-16 Canoe Ventures Asset scoring and ranking for content on demand insertion
US20140123161A1 (en) * 2012-10-24 2014-05-01 Bart P.E. van Coppenolle Video presentation interface with enhanced navigation features
US9161090B2 (en) 2012-12-27 2015-10-13 EchoStar Technologies, L.L.C. Fast channel change from electronic programming guide
US9794642B2 (en) * 2013-01-07 2017-10-17 Gracenote, Inc. Inserting advertisements into video content
US8978060B2 (en) 2013-03-15 2015-03-10 Google Inc. Systems, methods, and media for presenting advertisements
US9161074B2 (en) 2013-04-30 2015-10-13 Ensequence, Inc. Methods and systems for distributing interactive content
US9817911B2 (en) * 2013-05-10 2017-11-14 Excalibur Ip, Llc Method and system for displaying content relating to a subject matter of a displayed media program
US10318579B2 (en) 2013-09-06 2019-06-11 Gracenote, Inc. Inserting information into playing content
US9368158B2 (en) * 2013-09-26 2016-06-14 Thomson Licensing Method and apparatus for re-inserting a commercial during playback of a recorded program
GB2519375A (en) * 2013-10-21 2015-04-22 Mastercard International Inc Method and apparatus for interaction via television system
EP2876890A1 (en) * 2013-11-21 2015-05-27 Thomson Licensing Method and apparatus for frame accurate synchronization of video streams
EP4040795A1 (en) 2014-02-14 2022-08-10 Pluto Inc. Methods and systems for generating and providing program guides and content
US10091263B2 (en) * 2014-05-21 2018-10-02 Audible Magic Corporation Media stream cue point creation with automated content recognition
US9854306B2 (en) * 2014-07-28 2017-12-26 Echostar Technologies L.L.C. Methods and systems for content navigation among programs presenting advertising content
US9565456B2 (en) * 2014-09-29 2017-02-07 Spotify Ab System and method for commercial detection in digital media environments
US10917693B2 (en) * 2014-10-10 2021-02-09 Nicholas-Alexander, LLC Systems and methods for utilizing tones
US10909566B2 (en) * 2014-10-10 2021-02-02 Nicholas-Alexander, LLC Systems and methods for utilizing tones
KR20160085076A (en) * 2015-01-07 2016-07-15 Samsung Electronics Co., Ltd. Method for determining broadcasting server for providing contents and electronic device for implementing the same
US9756378B2 (en) 2015-01-07 2017-09-05 Echostar Technologies L.L.C. Single file PVR per service ID
US9743123B2 (en) * 2015-05-22 2017-08-22 Disney Enterprises, Inc. Multi-channel video playback system with variable time delay
US9854326B1 (en) 2015-09-09 2017-12-26 Sorenson Media, Inc. Creating and fulfilling dynamic advertisement replacement inventory
US10366404B2 (en) 2015-09-10 2019-07-30 The Nielsen Company (Us), Llc Methods and apparatus to group advertisements by advertisement campaign
US9635413B2 (en) 2015-09-23 2017-04-25 Echostar Technologies L.L.C. Advance decryption key acquisition for streaming media content
US9924214B2 (en) * 2015-09-23 2018-03-20 Viacom International Inc. Device, system, and method for scheduled avail tone validation
US10136183B2 (en) * 2015-12-16 2018-11-20 Gracenote, Inc. Dynamic video overlays
US9930406B2 (en) 2016-02-29 2018-03-27 Gracenote, Inc. Media channel identification with video multi-match detection and disambiguation based on audio fingerprint
US9924222B2 (en) 2016-02-29 2018-03-20 Gracenote, Inc. Media channel identification with multi-match detection and disambiguation based on location
US10063918B2 (en) 2016-02-29 2018-08-28 Gracenote, Inc. Media channel identification with multi-match detection and disambiguation based on single-match
US10586023B2 (en) 2016-04-21 2020-03-10 Time Warner Cable Enterprises Llc Methods and apparatus for secondary content management and fraud prevention
US11087358B2 (en) * 2016-06-24 2021-08-10 The Nielsen Company (Us), Llc Methods and apparatus for wireless communication with an audience measurement device
US10327037B2 (en) * 2016-07-05 2019-06-18 Pluto Inc. Methods and systems for generating and providing program guides and content
US9621929B1 (en) * 2016-07-22 2017-04-11 Samuel Chenillo Method of video content selection and display
US20180027269A1 (en) * 2016-07-22 2018-01-25 Samuel Chenillo Method of Video Content Selection and Display
EP3306948A1 (en) * 2016-10-07 2018-04-11 HURRA Communications GmbH Method and system for displaying the content of a video or audio broadcast signal to a user and method and system for storing timestamps in a database
US10136185B2 (en) 2016-10-25 2018-11-20 Alphonso Inc. System and method for detecting unknown TV commercials from a live TV stream
US10108718B2 (en) 2016-11-02 2018-10-23 Alphonso Inc. System and method for detecting repeating content, including commercials, in a video data stream
US10225603B2 (en) * 2017-03-13 2019-03-05 Wipro Limited Methods and systems for rendering multimedia content on a user device
US10123058B1 (en) 2017-05-08 2018-11-06 DISH Technologies L.L.C. Systems and methods for facilitating seamless flow content splicing
US11115717B2 (en) 2017-10-13 2021-09-07 Dish Network L.L.C. Content receiver control based on intra-content metrics and viewing pattern detection
US10771831B2 (en) 2017-12-14 2020-09-08 At&T Intellectual Property I, L.P. System and method for preemptive advertisement caching to optimize network traffic
US11166054B2 (en) 2018-04-06 2021-11-02 The Nielsen Company (Us), Llc Methods and apparatus for identification of local commercial insertion opportunities
US11917240B2 (en) * 2018-08-14 2024-02-27 Inscape Data, Inc. Dynamic content serving using automated content recognition (ACR) and digital media watermarks
CN113302926A (en) * 2018-09-04 2021-08-24 Pandoodle Corporation Method and system for dynamic analysis, modification and distribution of digital images and videos
US10750212B2 (en) * 2018-09-17 2020-08-18 Mobitv, Inc. Dynamic digital object placement in video stream
US10880600B2 (en) * 2018-12-27 2020-12-29 The Nielsen Company (Us), Llc Methods and apparatus to monitor digital media
US11403849B2 (en) 2019-09-25 2022-08-02 Charter Communications Operating, Llc Methods and apparatus for characterization of digital content
US11082730B2 (en) 2019-09-30 2021-08-03 The Nielsen Company (Us), Llc Methods and apparatus for affiliate interrupt detection
US11363321B2 (en) * 2019-10-31 2022-06-14 Roku, Inc. Content-modification system with delay buffer feature
US11190854B2 (en) * 2019-10-31 2021-11-30 Roku, Inc. Content-modification system with client-side advertisement caching
US11856273B2 (en) * 2019-12-20 2023-12-26 Dish Network L.L.C. Method and system for digital program insertion in satellite communications
US11172269B2 (en) 2020-03-04 2021-11-09 Dish Network L.L.C. Automated commercial content shifting in a video streaming system
US11343565B2 (en) 2020-04-08 2022-05-24 Roku, Inc. Content-modification system with feature for detecting and responding to a content modification by a tuner device
US20220264171A1 (en) * 2021-02-12 2022-08-18 Roku, Inc. Use of In-Band Data to Facilitate Ad Harvesting for Dynamic Ad Replacement
US11457270B1 (en) * 2021-05-20 2022-09-27 At&T Intellectual Property I, L.P. Content processing workflow that alters objectionable content segments to enhance customer viewing experience
US20230064341A1 (en) * 2021-08-24 2023-03-02 Dish Network L.L.C. Methods and systems for detecting interruptions while streaming media content

Family Cites Families (149)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3919479A (en) * 1972-09-21 1975-11-11 First National Bank Of Boston Broadcast signal identification system
US4888638A (en) * 1988-10-11 1989-12-19 A. C. Nielsen Company System for substituting television programs transmitted via telephone lines
US4974085A (en) * 1989-05-02 1990-11-27 Bases Burke Institute, Inc. Television signal substitution
US5155591A (en) * 1989-10-23 1992-10-13 General Instrument Corporation Method and apparatus for providing demographically targeted television commercials
US5029014A (en) * 1989-10-26 1991-07-02 James E. Lindstrom Ad insertion system and method for broadcasting spot messages out of recorded sequence
US5446919A (en) * 1990-02-20 1995-08-29 Wilkins; Jeff K. Communication system and method with demographically or psychographically defined audiences
US5319455A (en) * 1990-09-28 1994-06-07 Ictv Inc. System for distributing customized commercials to television viewers
US5715018A (en) * 1992-04-10 1998-02-03 Avid Technology, Inc. Digital advertisement insertion system
US5436653A (en) * 1992-04-30 1995-07-25 The Arbitron Company Method and system for recognition of broadcast segments
US5600364A (en) * 1992-12-09 1997-02-04 Discovery Communications, Inc. Network controller for cable television delivery systems
US6463585B1 (en) * 1992-12-09 2002-10-08 Discovery Communications, Inc. Targeted advertisement using television delivery systems
US5389964A (en) * 1992-12-30 1995-02-14 Information Resources, Inc. Broadcast channel substitution method and apparatus
US5987210A (en) * 1993-01-08 1999-11-16 Srt, Inc. Method and apparatus for eliminating television commercial messages
US5668917A (en) * 1994-07-05 1997-09-16 Lewine; Donald A. Apparatus and method for detection of unwanted broadcast information
US5574572A (en) * 1994-09-07 1996-11-12 Harris Corporation Video scaling method and device
US5515098A (en) * 1994-09-08 1996-05-07 Carles; John B. System and method for selectively distributing commercial messages over a communications network
US6122016A (en) * 1994-11-14 2000-09-19 U.S. Philips Corporation Video signal processing
US5774170A (en) * 1994-12-13 1998-06-30 Hite; Kenneth C. System and method for delivering targeted advertisements to consumers
US5748263A (en) * 1995-03-07 1998-05-05 Ball; Bradley E. System for automatically producing infrared control signals
US5600366A (en) * 1995-03-22 1997-02-04 Npb Partners, Ltd. Methods and apparatus for digital advertisement insertion in video programming
JP3625344B2 (en) * 1996-11-05 2005-03-02 Video Research Ltd. Viewing channel detector
US7055166B1 (en) * 1996-10-03 2006-05-30 Gotuit Media Corp. Apparatus and methods for broadcast monitoring
US20030093790A1 (en) * 2000-03-28 2003-05-15 Logan James D. Audio and video program recording, editing and playback systems using metadata
US20020120925A1 (en) * 2000-03-28 2002-08-29 Logan James D. Audio and video program recording, editing and playback systems using metadata
US5892536A (en) * 1996-10-03 1999-04-06 Personal Audio Systems and methods for computer enhanced broadcast monitoring
US6771316B1 (en) * 1996-11-01 2004-08-03 Jerry Iggulden Method and apparatus for selectively altering a televised video signal in real-time
US5999689A (en) * 1996-11-01 1999-12-07 Iggulden; Jerry Method and apparatus for controlling a videotape recorder in real-time to automatically identify and selectively skip segments of a television broadcast signal during recording of the television signal
US7269330B1 (en) 1996-11-01 2007-09-11 Televentions, Llc Method and apparatus for controlling a video recorder/player to selectively alter a video signal
US6002443A (en) * 1996-11-01 1999-12-14 Iggulden; Jerry Method and apparatus for automatically identifying and selectively altering segments of a television broadcast signal in real-time
WO1998028906A2 (en) 1996-12-20 1998-07-02 Princeton Video Image, Inc. Set top device for targeted electronic insertion of indicia into video
CA2196930C (en) * 1997-02-06 2005-06-21 Nael Hirzalla Video sequence recognition
US5978381A (en) 1997-06-06 1999-11-02 Webtv Networks, Inc. Transmitting high bandwidth network content on a low bandwidth communications channel during off peak hours
EP1025517A1 (en) * 1997-10-27 2000-08-09 Massachusetts Institute Of Technology Image search and retrieval system
US6078896A (en) 1997-11-05 2000-06-20 Marconi Commerce Systems Inc. Video identification for forecourt advertising
US5973723A (en) 1997-12-12 1999-10-26 Deluca; Michael Joseph Selective commercial detector and eliminator apparatus and method
US6819863B2 (en) 1998-01-13 2004-11-16 Koninklijke Philips Electronics N.V. System and method for locating program boundaries and commercial boundaries using audio categories
US6487721B1 (en) * 1998-01-30 2002-11-26 General Instrument Corporation Apparatus and method for digital advertisement insertion in a bitstream
US6698020B1 (en) 1998-06-15 2004-02-24 Webtv Networks, Inc. Techniques for intelligent video ad insertion
US6100941A (en) * 1998-07-28 2000-08-08 U.S. Philips Corporation Apparatus and method for locating a commercial disposed within a video data stream
US6820277B1 (en) 1999-04-20 2004-11-16 Expanse Networks, Inc. Advertising management system for digital video streams
US6560578B2 (en) * 1999-03-12 2003-05-06 Expanse Networks, Inc. Advertisement selection system supporting discretionary target market characteristics
US20020123928A1 (en) * 2001-01-11 2002-09-05 Eldering Charles A. Targeting ads to subscribers based on privacy-protected subscriber profiles
US7068724B1 (en) 1999-10-20 2006-06-27 Prime Research Alliance E., Inc. Method and apparatus for inserting digital media advertisements into statistical multiplexed streams
US20020083445A1 (en) * 2000-08-31 2002-06-27 Flickinger Gregory C. Delivering targeted advertisements to the set-top-box
US20010049620A1 (en) * 2000-02-29 2001-12-06 Blasko John P. Privacy-protected targeting system
US20020072966A1 (en) 2000-08-31 2002-06-13 Eldering Charles A. System for providing targeted advertisements using advertiser-specific target groups
US7039932B2 (en) * 2000-08-31 2006-05-02 Prime Research Alliance E., Inc. Queue-based head-end advertisement scheduling method and apparatus
US20020087973A1 (en) * 2000-12-28 2002-07-04 Hamilton Jeffrey S. Inserting local signals during MPEG channel changes
US7228555B2 (en) * 2000-08-31 2007-06-05 Prime Research Alliance E., Inc. System and method for delivering targeted advertisements using multiple presentation streams
US8290351B2 (en) * 2001-04-03 2012-10-16 Prime Research Alliance E., Inc. Alternative advertising in prerecorded media
US7328448B2 (en) * 2000-08-31 2008-02-05 Prime Research Alliance E, Inc. Advertisement distribution system for distributing targeted advertisements in television systems
US7653923B2 (en) * 2000-02-18 2010-01-26 Prime Research Alliance E, Inc. Scheduling and presenting IPG ads in conjunction with programming ads in a television environment
US20020144263A1 (en) * 2000-08-31 2002-10-03 Eldering Charles A. Grouping of advertisements on an advertising channel in a targeted advertisement system
US20020083439A1 (en) * 2000-08-31 2002-06-27 Eldering Charles A. System for rescheduling and inserting advertisements
US7185353B2 (en) * 2000-08-31 2007-02-27 Prime Research Alliance E., Inc. System and method for delivering statistically scheduled advertisements
US20020083441A1 (en) * 2000-08-31 2002-06-27 Flickinger Gregory C. Advertisement filtering and storage for targeted advertisement systems
US6704930B1 (en) * 1999-04-20 2004-03-09 Expanse Networks, Inc. Advertisement insertion techniques for digital video streams
US6436653B1 (en) * 1998-12-15 2002-08-20 Exiqon A/S Method for introduction of reporter groups into bacterial lipopolysaccharide-derived carbohydrates and the subsequent coupling of such derivatives onto solid surfaces
US6490370B1 (en) * 1999-01-28 2002-12-03 Koninklijke Philips Electronics N.V. System and method for describing multimedia content
US6028950A (en) * 1999-02-10 2000-02-22 The National Registry, Inc. Fingerprint controlled set-top box
US7051351B2 (en) 1999-03-08 2006-05-23 Microsoft Corporation System and method of inserting advertisements into an information retrieval system display
US6646655B1 (en) * 1999-03-09 2003-11-11 Webex Communications, Inc. Extracting a time-sequence of slides from video
WO2000069163A2 (en) 1999-05-10 2000-11-16 Expanse Networks, Inc. Advertisement subgroups for digital streams
WO2000070869A1 (en) * 1999-05-18 2000-11-23 Contentwise Ltd. Monitoring system
US6536037B1 (en) * 1999-05-27 2003-03-18 Accenture Llp Identification of redundancies and omissions among components of a web based architecture
US7116377B2 (en) * 1999-09-27 2006-10-03 General Instrument Corporation Graphics subsystem bypass method and apparatus
US6469749B1 (en) * 1999-10-13 2002-10-22 Koninklijke Philips Electronics N.V. Automatic signature-based spotting, learning and extracting of commercials and other video content
US7272295B1 (en) 1999-11-10 2007-09-18 Thomson Licensing Commercial skip and chapter delineation feature on recordable media
US6606744B1 (en) * 1999-11-22 2003-08-12 Accenture, Llp Providing collaborative installation management in a network-based supply chain environment
US7110454B1 (en) 1999-12-21 2006-09-19 Siemens Corporate Research, Inc. Integrated method for scene change detection
US6425127B1 (en) * 2000-01-13 2002-07-23 International Business Machines Corporation Method and system for controlling visual access by a user to broadcast video segments
US6577346B1 (en) 2000-01-24 2003-06-10 Webtv Networks, Inc. Recognizing a pattern in a video segment to identify the video segment
AU2001229644A1 (en) * 2000-01-27 2001-08-07 Suzanne M. Berberet System and method for providing broadcast programming, a virtual VCR, and a video scrapbook to programming subscribers
US6675174B1 (en) * 2000-02-02 2004-01-06 International Business Machines Corp. System and method for measuring similarity between a set of known temporal media segments and a one or more temporal media streams
US7631338B2 (en) * 2000-02-02 2009-12-08 Wink Communications, Inc. Interactive content delivery methods and apparatus
US6593976B1 (en) * 2000-02-14 2003-07-15 Koninklijke Philips Electronics N.V. Automatic return to input source when user-selected content reappears in input source
US20050283796A1 (en) 2000-02-18 2005-12-22 Prime Research Alliance E., Inc. Method and system for addressable and program independent advertising during recorded programs
US6912571B1 (en) 2000-02-22 2005-06-28 Frank David Serena Method of replacing content
US8572639B2 (en) * 2000-03-23 2013-10-29 The Directv Group, Inc. Broadcast advertisement adapting method and apparatus
GB2361128A (en) 2000-04-05 2001-10-10 Sony Uk Ltd Video and/or audio processing apparatus
GB2361127A (en) 2000-04-05 2001-10-10 Sony Uk Ltd Audio/video reproduction via a communications network
US20040148625A1 (en) * 2000-04-20 2004-07-29 Eldering Charles A Advertisement management system for digital video streams
DE10028623A1 (en) * 2000-06-09 2001-12-20 Clemente Spehr Method and device for suppressing undesirable transmission blocks of advertisements in transmission media by alternating film mini-blocks and advertisement mini-blocks
GB0015065D0 (en) 2000-06-21 2000-08-09 Macnamee Gerard System and method of personalised interactive TV advertising over broadcast television system
WO2002009328A1 (en) * 2000-07-21 2002-01-31 Koninklijke Philips Electronics N.V. Multimedia monitoring by combining watermarking and characteristic signature of signal
US20040128317A1 (en) * 2000-07-24 2004-07-01 Sanghoon Sull Methods and apparatuses for viewing, browsing, navigating and bookmarking videos and displaying images
KR20040041082A (en) * 2000-07-24 2004-05-13 Vivcom, Inc. System and method for indexing, searching, identifying, and editing portions of electronic multimedia files
GB0022370D0 (en) * 2000-09-13 2000-10-25 Pace Micro Tech Plc Improvements to television system
US7185044B2 (en) 2000-11-06 2007-02-27 The Weather Channel Weather information delivery systems and methods providing planning functionality and navigational tools
US7062084B2 (en) 2000-12-01 2006-06-13 Sharp Laboratories Of America, Inc. Method for image description using color and local spatial information
US20020067730A1 (en) 2000-12-05 2002-06-06 Starguide Digital Networks, Inc. Method and apparatus for IP multicast content distribution system having national and regional demographically targeted advertisement insertion
US6965683B2 (en) * 2000-12-21 2005-11-15 Digimarc Corporation Routing networks for use with watermark systems
US20020126224A1 (en) * 2000-12-28 2002-09-12 Rainer Lienhart System for detection of transition and special effects in video
US20020124077A1 (en) * 2001-02-20 2002-09-05 Hill Clarke Randolph Advertising and audience authentication with server-side measurement and client-side verification
DE10156514A1 (en) 2001-11-16 2003-05-28 Grundig Ag TV
US20020129362A1 (en) * 2001-03-08 2002-09-12 Chang Matthew S. Multiple commercial option in the same time slot
US20020178445A1 (en) * 2001-04-03 2002-11-28 Charles Eldering Subscriber selected advertisement display and scheduling
US20020184047A1 (en) * 2001-04-03 2002-12-05 Plotnick Michael A. Universal ad queue
US20020178447A1 (en) * 2001-04-03 2002-11-28 Plotnick Michael A. Behavioral targeted advertising
US20020186957A1 (en) * 2001-04-27 2002-12-12 Henry Yuen Personal video recorder with high-capacity archive
US6892193B2 (en) 2001-05-10 2005-05-10 International Business Machines Corporation Method and apparatus for inducing classifiers for multimedia based on unified representation of features reflecting disparate modalities
EP1421792B1 (en) 2001-06-08 2011-11-09 Gotuit Media Inc. Audio and video program recording, editing and playback systems using metadata
US7146632B2 (en) 2001-06-08 2006-12-05 Digeo, Inc. Interactive information aggregator for an interactive television system
US7266832B2 (en) * 2001-06-14 2007-09-04 Digeo, Inc. Advertisement swapping using an aggregator for an interactive television system
US20030001977A1 (en) * 2001-06-28 2003-01-02 Xiaoling Wang Apparatus and a method for preventing automated detection of television commercials
US20030023972A1 (en) 2001-07-26 2003-01-30 Koninklijke Philips Electronics N.V. Method for charging advertisers based on adaptive commercial switching between TV channels
US7089575B2 (en) 2001-09-04 2006-08-08 Koninklijke Philips Electronics N.V. Method of using transcript information to identify and learn commercial portions of a program
US20030122966A1 (en) * 2001-12-06 2003-07-03 Digeo, Inc. System and method for meta data distribution to customize media content playback
US7849476B2 (en) * 2001-12-13 2010-12-07 Thomson Licensing System and method for automatic switching to interactive application during television program breaks
US20110178877A1 (en) 2001-12-14 2011-07-21 Swix Scott R Advertising and content management systems and methods
US7064796B2 (en) * 2001-12-21 2006-06-20 Eloda Inc. Method and system for re-identifying broadcast segments using statistical profiles
US7170566B2 (en) * 2001-12-21 2007-01-30 Koninklijke Philips Electronics N.V. Family histogram based techniques for detection of commercials and other video content
US20030123841A1 (en) * 2001-12-27 2003-07-03 Sylvie Jeannin Commercial detection in audio-visual content based on scene change distances on separator boundaries
WO2003062960A2 (en) * 2002-01-22 2003-07-31 Digimarc Corporation Digital watermarking and fingerprinting including synchronization, layering, version control, and compressed embedding
US20030149975A1 (en) * 2002-02-05 2003-08-07 Charles Eldering Targeted advertising in on demand programming
US7467398B2 (en) * 2002-03-21 2008-12-16 International Business Machines Corporation Apparatus and method of searching for desired television content
US20030192045A1 (en) 2002-04-04 2003-10-09 International Business Machines Corporation Apparatus and method for blocking television commercials and displaying alternative programming
US7461392B2 (en) * 2002-07-01 2008-12-02 Microsoft Corporation System and method for identifying and segmenting repeating media objects embedded in a stream
US7333864B1 (en) 2002-06-01 2008-02-19 Microsoft Corporation System and method for automatic segmentation and identification of repeating objects from an audio stream
US20030227475A1 (en) * 2002-06-06 2003-12-11 International Business Machines Corporation Apparatus and method for blocking television commercials and delivering micro-programming content
FI20021089A (en) 2002-06-07 2003-12-08 Mitron Oy Method and system for identifying a channel for a viewer survey and measuring device of a channel
WO2004004322A1 (en) 2002-07-01 2004-01-08 Koninklijke Philips Electronics N.V. System for processing video signals
CN100426861C (en) 2002-07-01 2008-10-15 Microsoft Corporation A system and method for providing user control over repeating objects embedded in a stream
US6983481B2 (en) * 2002-07-25 2006-01-03 International Business Machines Corporation Apparatus and method for blocking television commercials with a content interrogation program
US8176508B2 (en) * 2002-08-02 2012-05-08 Time Warner Cable Method and apparatus to provide verification of data using a fingerprint
US6987883B2 (en) 2002-12-31 2006-01-17 Objectvideo, Inc. Video scene background maintenance using statistical pixel modeling
US8332326B2 (en) * 2003-02-01 2012-12-11 Audible Magic Corporation Method and apparatus to identify a work received by a processing system
US20050177847A1 (en) * 2003-03-07 2005-08-11 Richard Konig Determining channel associated with video stream
US7738704B2 (en) 2003-03-07 2010-06-15 Technology, Patents And Licensing, Inc. Detecting known video entities utilizing fingerprints
US7694318B2 (en) * 2003-03-07 2010-04-06 Technology, Patents & Licensing, Inc. Video detection and insertion
US20040237102A1 (en) 2003-03-07 2004-11-25 Richard Konig Advertisement substitution
US20050149968A1 (en) * 2003-03-07 2005-07-07 Richard Konig Ending advertisement insertion
EP1611745A1 (en) 2003-03-28 2006-01-04 Koninklijke Philips Electronics N.V. Data block detect by fingerprint
US20040226035A1 (en) * 2003-05-05 2004-11-11 Hauser David L. Method and apparatus for detecting media content
US7298962B2 (en) * 2003-05-12 2007-11-20 Macrovision Corporation Method and apparatus for reducing and restoring the effectiveness of a commercial skip system
US20040260682A1 (en) * 2003-06-19 2004-12-23 Microsoft Corporation System and method for identifying content and managing information corresponding to objects in a signal
US7327885B2 (en) 2003-06-30 2008-02-05 Mitsubishi Electric Research Laboratories, Inc. Method for detecting short term unusual events in videos
US20050044561A1 (en) * 2003-08-20 2005-02-24 Gotuit Audio, Inc. Methods and apparatus for identifying program segments by detecting duplicate signal patterns
US7818257B2 (en) 2004-07-16 2010-10-19 Deluxe Laboratories, Inc. Program encoding and counterfeit tracking system and method
US7788696B2 (en) 2003-10-15 2010-08-31 Microsoft Corporation Inferring information about media stream objects
WO2005041109A2 (en) 2003-10-17 2005-05-06 Nielsen Media Research, Inc. Methods and apparatus for identifying audio/video content using temporal signal characteristics
EP1687978A1 (en) * 2003-11-17 2006-08-09 Koninklijke Philips Electronics N.V. Commercial insertion into video streams based on surrounding program content
US7853968B2 (en) * 2003-12-02 2010-12-14 Lsi Corporation Commercial detection suppressor with inactive video modification
US20060195860A1 (en) 2005-02-25 2006-08-31 Eldering Charles A Acting on known video entities detected utilizing fingerprinting
US20060195859A1 (en) 2005-02-25 2006-08-31 Richard Konig Detecting known video entities taking into account regions of disinterest
US20060242667A1 (en) 2005-04-22 2006-10-26 Petersen Erin L Ad monitoring and indication
US7690011B2 (en) 2005-05-02 2010-03-30 Technology, Patents & Licensing, Inc. Video stream modification to defeat detection
US20060271947A1 (en) 2005-05-23 2006-11-30 Lienhart Rainer W Creating fingerprints
US8311214B2 (en) 2006-04-24 2012-11-13 Motorola Mobility Llc Method for elliptic curve public key cryptographic validation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020010919A1 (en) * 1998-05-12 2002-01-24 Nielsen Media Research, Inc. Audience measurement system for digital television
WO2000036775A1 (en) * 1998-12-15 2000-06-22 Logan James D Apparatus and methods for broadcast monitoring and for providing individual programming
WO2001033848A1 (en) * 1999-11-01 2001-05-10 Koninklijke Philips Electronics N.V. Method and apparatus for swapping the video contents of undesired commercial breaks or other video sequences

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ALBIOL A ET AL: "Detection of tv commercials", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2004. PROCEEDINGS. (ICASSP '04). IEEE INTERNATIONAL CONFERENCE ON MONTREAL, QUEBEC, CANADA 17-21 MAY 2004, PISCATAWAY, NJ, USA,IEEE, vol. 3, 17 May 2004 (2004-05-17), pages 541 - 544, XP010718246, ISBN: 0-7803-8484-9 *
DIMITROVA N ET AL: "Applications of video-content analysis and retrieval", IEEE MULTIMEDIA, IEEE COMPUTER SOCIETY, US, vol. 9, no. 3, July 2002 (2002-07-01), pages 42 - 55, XP002287758, ISSN: 1070-986X *
DIMITROVA N ET AL: "ON SELECTIVE VIDEO CONTENT ANALYSIS AND FILTERING", PROCEEDINGS OF THE SPIE, SPIE, BELLINGHAM, VA, US, vol. 3972, 26 January 2000 (2000-01-26), pages 359 - 368, XP009002896, ISSN: 0277-786X *
DRUCKER S M ET AL: "SMARTSKIP: CONSUMER LEVEL BROWSING AND SKIPPING OF DIGITAL VIDEO CONTENT", CHI 2002 CONFERENCE PROCEEDINGS. CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS. MINNEAPOLIS, MN, APRIL 20 - 25, 2002, CHI CONFERENCE PROCEEDINGS. HUMAN FACTORS IN COMPUTING SYSTEMS, NEW YORK, NY : ACM, US, 20 April 2002 (2002-04-20), pages 219 - 226, XP001099414, ISBN: 1-58113-453-3 *
LIENHART R ET AL: "On the detection and recognition of television commercials", MULTIMEDIA COMPUTING AND SYSTEMS '97. PROCEEDINGS., IEEE INTERNATIONAL CONFERENCE ON OTTAWA, ONT., CANADA 3-6 JUNE 1997, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 3 June 1997 (1997-06-03), pages 509 - 516, XP010239226, ISBN: 0-8186-7819-4 *
MARIA JOSÉ CHAMORRO FULLÀ: "Detección de anuncios en secuencias de televisión" [Detection of advertisements in television sequences], DEPARTAMENTO DE COMUNICACIONES, UNIVERSIDAD POLITECNICA DE VALENCIA, 13 October 2003 (2003-10-13), XP002330270 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9554093B2 (en) 2006-02-27 2017-01-24 Microsoft Technology Licensing, Llc Automatically inserting advertisements into source video content playback streams
US9788080B2 (en) 2006-02-27 2017-10-10 Microsoft Technology Licensing, Llc Automatically inserting advertisements into source video content playback streams
US8654255B2 (en) 2007-09-20 2014-02-18 Microsoft Corporation Advertisement insertion points detection for online video advertising
US9639531B2 (en) 2008-04-09 2017-05-02 The Nielsen Company (Us), Llc Methods and apparatus to play and control playing of media in a web page
US8207989B2 (en) 2008-12-12 2012-06-26 Microsoft Corporation Multi-video synthesis
US10943252B2 (en) 2013-03-15 2021-03-09 The Nielsen Company (Us), Llc Methods and apparatus to identify a type of media presented by a media player
US11361340B2 (en) 2013-03-15 2022-06-14 The Nielsen Company (Us), Llc Methods and apparatus to identify a type of media presented by a media player
US11734710B2 (en) 2013-03-15 2023-08-22 The Nielsen Company (Us), Llc Methods and apparatus to identify a type of media presented by a media player

Also Published As

Publication number Publication date
US20100153993A1 (en) 2010-06-17
GB0504196D0 (en) 2005-04-06
US7694318B2 (en) 2010-04-06
GB2411787A (en) 2005-09-07
EP1730668B1 (en) 2010-02-10
GB2411786A (en) 2005-09-07
GB2411786B (en) 2009-10-07
US20040189873A1 (en) 2004-09-30
GB0504217D0 (en) 2005-04-06
EP1730668A1 (en) 2006-12-13
GB0504214D0 (en) 2005-04-06
DE602005019273D1 (en) 2010-03-25
ZA200608155B (en) 2008-11-26
ATE457501T1 (en) 2010-02-15
GB2411788B (en) 2010-01-27
US7930714B2 (en) 2011-04-19
GB2411788A (en) 2005-09-07

Similar Documents

Publication Publication Date Title
EP1730668B1 (en) Detecting known images in video streams
US7738704B2 (en) Detecting known video entities utilizing fingerprints
US20060195859A1 (en) Detecting known video entities taking into account regions of disinterest
US20050177847A1 (en) Determining channel associated with video stream
US20050149968A1 (en) Ending advertisement insertion
US20060195860A1 (en) Acting on known video entities detected utilizing fingerprinting
US8365216B2 (en) Video stream modification to defeat detection
US9147112B2 (en) Advertisement detection
US8327407B2 (en) Determination of receiving live versus time-shifted media content at a communication device
US20040237102A1 (en) Advertisement substitution
US20060107301A1 (en) Video recorder unit and method of operation therefor
JP2018530273A (en) Common media segment detection
GB2423881A (en) Detecting known video entities taking into account regions of disinterest
GB2423882A (en) Acting on known video entities detected utilizing fingerprinting
GB2425431A (en) Video entity recognition in compressed digital video streams
US20180027269A1 (en) Method of Video Content Selection and Display
WO2008037942A1 (en) Video stream modification to defeat detection

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: The EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW WIPO information: withdrawn in national office

Country of ref document: DE

WWE WIPO information: entry into national phase

Ref document number: 2005717850

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 2005717850

Country of ref document: EP