WO2015038121A1 - Video segmentation by audio selection - Google Patents

Video segmentation by audio selection

Info

Publication number
WO2015038121A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
content
playback
scenes
request
Prior art date
Application number
PCT/US2013/059343
Other languages
French (fr)
Inventor
Samo KONYAR
Original Assignee
Thomson Licensing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing
Priority to PCT/US2013/059343
Publication of WO2015038121A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/239 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N21/2393 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests involving handling client requests
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47211 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting pay-per-view content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages

Definitions

  • the present invention relates to video content playback and, in particular, to video content playback based upon audio selections.
  • video content has been broken (segmented) into scenes either due to editing decisions or how a script is written.
  • scenes are typically used for trick play functions (jump to the next scene) or chapters that are accessed on a DVD/Blu-Ray disc.
  • the present invention segments video content based upon the audio (music) associations of the video instead of as in the prior art based on scenes associated with the video.
  • the present invention provides a framework for breaking up a video for a media asset such as a television show or movie or other content (including multimedia content) into segments that correlate to the audio/music selections instead of scenes as in the prior art.
  • a method and apparatus including receiving a user request, determining if the user request is a playback request, displaying to a user, playback options, if the user request is the playback request, receiving the playback options selected by the user, determining if the playback options selected by the user are by audio selection and forwarding scenes that match the audio selection of the user for playback.
  • Fig. 1 is a flowchart of an exemplary method for segmenting a video into different segments that correlate to the audio associated with such a video.
  • Fig. 2 presents the results of the application of the described method of the present invention.
  • Figs. 3A and 3B together form a flowchart of an exemplary embodiment of the method of the present invention.
  • Fig. 4 is a block diagram of an exemplary embodiment in accordance with the principles of the present invention.
  • the present invention provides a mechanism for breaking up (segmenting) a video (such as a movie or a television show or any other form of content) into audio selections, which can be music selections such as songs or background music.
  • the video segment can then be mapped to such audio segments (called M) instead of using the scenes by which a video is typically segmented.
  • Video scenes are typically the way a video is constructed either due to editing decisions, how a script is written, commercial break requirements, and the like.
  • Fig. 1 is a flowchart of an exemplary method for segmenting a video into different segments that correlate to the audio associated with such a video.
  • the invention begins with a video asset that is composed of a number of scenes. Each scene can then be associated with a number (0 to X) of music selections which can be songs associated with different music artists.
  • a media asset containing audio and video is ingested into a server or other type of storage device.
  • a time reference is assigned to the audio and the video of the media asset.
  • such a time reference can be timestamps although other mechanisms can be used for this step.
  • an audio recognition mechanism is applied to recognize the different audio segments in the media asset.
  • Such audio recognition techniques can include the use of a program such as Shazam, which identifies songs/music selections based on an audio fingerprint of the audio selection, or AudioID, developed by the Fraunhofer Institute, as well as the detection of volume changes in the audio, such as periods of silence that exceed a threshold time value, and the like.
  • Other audio recognition mechanisms can be used and/or combined with Shazam and/or AudioID.
  • the video is segmented by way of the audio selections that were identified in the previous step. That is, instead of having the video scenes control how a video is organized, it will be the audio selections which are used to segment a video. Hence, the beginning and the end of an audio selection will have a video segment corresponding in kind.
  • the time basis for the audio and the video segment would then match up using the time reference (e.g., time stamps).
  • the audio segment can be referred by a song title, artist, copyright, and the like if such information can be determined by the characteristics of the audio segment, whereby such information can be assigned as metadata to such audio selections.
  • Fig. 2 presents the results of the application of the described method of the present invention where a video that is 30 minutes in length is divided according to Music Selections M1, M2, M3, M4, M5, and M6.
  • Fig. 2 displays the corresponding scenes S1, S2, S3, and S4.
  • Table 1 presents an example of audio selections and the related metadata of such selections.
  • a user can specify that a video be played back where the audio segments that correspond to a certain artist (M83, Prince) are used to select the corresponding video segments that are played back (for example, M1 for M83, and M2 and M3 for Prince).
  • a user can also be provided with an option to buy the corresponding audio segments in a form such as an MP3 from a service such as M-Go, Amazon, and the like if a video is presented with a corresponding list of audio selections as presented in Table 1.
  • a first aspect of the invention lets a user playback all of the scenes that match a particular artist or song title. Hence, if Prince is selected as the relevant artist in the present example both the 1st scene and the 2nd scene will be played back (from the beginning of such scenes or where such songs begin in accordance with a user preference).
  • the playback of such scenes can be done in sequential order or in a randomized order.
  • a second aspect of the invention will let a user playback "all" of the scenes that pertain to a particular artist or song title where all of such scenes are available to a user.
  • Such scenes can be from different movies, television shows, and the like.
  • One feature of this aspect of the present invention is that the scenes that are played back can have the SAME song performed by a different artist.
  • a cover version of "BLACKBIRD" could be played back with a corresponding scene instead of having the "4th scene" where the Beatles performed the version of "BLACKBIRD". This is a substitute song feature.
  • a third aspect will let a user buy MP3 versions of songs from a service provider such as AMAZON, M-GO, and the like.
  • alternative versions of songs can be offered as well as "Blackbird" from the Beatles and respective cover versions of the song.
  • the invention could be performed by a content server or content distribution system.
  • the segmented content could be stored on a user's device including but not limited to televisions, set top boxes, computers, laptops, dual mode smart phones, iPods, iPads, iPhones, and tablets.
  • the third aspect of the present invention involving purchasing certain audio would be coordinated with the user's content provider in terms of billing (accounting).
  • Other secondary aspects of the third aspect of the present invention could include offering users "deals" if they purchased, for example, "Blackbird" by all artists that recorded that song, or if the user purchased an entire Beatles album on which the song "Blackbird" was included, or if they purchased other movies or movie segments that included the song "Blackbird".
  • Figs. 3A and 3B together form a flowchart of an exemplary embodiment of the method of the present invention.
  • a user request is received.
  • a test is performed to determine if the user request is a playback request.
  • a playback request may be by movie, TV show, other content, by scene, or by audio selection. If the user request is a playback request, then at 315, the playback options are displayed to the user.
  • the user's playback options are received.
  • a test is performed to determine if the playback option selected is by audio selection (rather than by scene as in the prior art).
  • If the playback option selected is not by audio selection, the movie, TV show or scene selected by the user is forwarded to the user for playback at 330. If the playback option selected is by audio selection then at 335, a test is performed to determine if all scenes in the current content that match the user's audio selection are to be played back. If all scenes in the current content that match the user's audio selection are to be played back then a test is performed at 340 to determine if the playback is to start at the beginning of each scene of the current content that matches the user's audio selection.
  • If the playback is to start at the beginning of each scene of the current content that matches the user's audio selection then the scenes are forwarded (in the order that the scenes occur or shuffled into some random order) to the user for playback at 345. If the playback is not to start at the beginning of each scene of the current content that matches the user's audio selection then at 350 the scenes are forwarded (in the order that the scenes occur or shuffled into some random order), starting at the point where the audio selection begins, to the user for playback. If all scenes in the current content that match the user's audio selection are not to be played back then at 355 a test is performed to determine if all available scenes that match the user's audio selection are to be played back.
  • If all available scenes that match the user's audio selection are not to be played back then at 380, the available scenes that match the user's audio selection are displayed. At 385, the user's scene selections from the available scenes are received. Steps/operations 340, 345 and 350 will not be described again since they are the same as described above.
  • If the user request was not a playback request, then it is assumed to be a purchase request for audio selections and at 365 a request for audio selections and account information to make the purchase is displayed.
  • the user's account information is received and validated.
  • the audio selections purchased by the user are transmitted to the user.
  • Fig. 4 is a block diagram of an exemplary embodiment in accordance with the principles of the present invention.
  • the present invention is practiced at a content provider's equipment/system.
  • the user interface module 405 is in bi-directional communication with the user.
  • User interface module 405 is also in bi-directional communication with a user request module 410, which parses the user's request to determine if the user's request is a purchase request or a playback request. If the user's request is a purchase request then the user request module 410, which is in bi-directional communication with the purchase request module 415, forwards the user's purchase request to the purchase request module 415.
  • the purchase request module 415 is in bi-directional communication with the account information module 420.
  • the account information module 420 requests and receives the user's account information via the user request module 410 and the user interface module 405 and validates the user's account information. Once the user's account information is validated the account information module 420 returns confirmation to the purchase request module 415.
  • the purchase request module 415 requests and retrieves the user's audio selection information via the user request module 410 and the user interface module 405.
  • the user's audio selection information is forwarded to the retrieve audio selection module 425 with which the purchase request module is in bi-directional communication.
  • the retrieve audio selection module 425 retrieves the user's audio selection from the content storage and retrieval module 445 with which the retrieve audio selection module 425 is in bi-directional communication.
  • If the user's request is a playback request then the user request module 410 forwards the request to the playback request module 430 with which the user request module is in bi-directional communication.
  • the playback request module 430 determines if the playback is by scene (or entire content) or by audio selection. If the playback is by scene (or entire content) then the playback request module 430 forwards the user's selection to the retrieve content by scene or entire content module 440 with which the playback request module 430 is in bi-directional communication.
  • the retrieve content by scene or entire content module 440 retrieves the user's content selection from the content storage and retrieval module 445 with which the retrieve content by scene or entire content module 440 is in bi-directional communication.
  • If the playback is by audio selection then the playback request module 430 forwards the user's selection to the retrieve content by audio selection module 435 with which the playback request module 430 is in bi-directional communication.
  • the user's selection information includes whether to playback scenes starting at the beginning of each scene or starting at the point that the audio selection begins.
  • the user's selection information also includes whether the user wants to playback scenes that match the user's audio selection in the current content or all available content.
  • the retrieve content by audio selection module 435 retrieves the user's content selection that matches the user's audio selection from the content storage and retrieval module 445 with which the retrieve content by audio selection module 435 is in bi-directional communication. All content retrieved from the content storage and retrieval module is transmitted (forwarded) to the user on the reverse path from which the request was received via the bi-directional communication paths.
  • the means for receiving a user request is the user interface module.
  • the means for determining if said user request is a playback request is the user request module.
  • the means for displaying to a user, playback options, if said user request is said playback request is the user interface module.
  • the means for receiving said playback options selected by said user is the user interface module.
  • the means for determining if said playback options selected by said user are by audio selection is the playback request module.
  • the means for forwarding scenes that match said audio selection of said user for playback is via retrieve content by audio selection module (having retrieved the selected content from the content storage and retrieval module).
  • the retrieve content by audio selection module forwards the retrieved content back to the playback request module which forwards the retrieved content to the user request module which forwards the retrieved content to the user interface module, which forwards the retrieved content to the user for playback.
  • the means for determining if said user has selected playing back of all scenes in content currently being viewed by said user is the playback request module.
  • the means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences is the playback request module.
  • the means for forwarding content selected by said user for playback, if said playback options selected by said user are not by audio selection is via retrieve content by scene or entire content module (having retrieved the selected content from the content storage and retrieval module).
  • the retrieve content by scene or entire content module forwards the retrieved content back to the playback request module which forwards the retrieved content to the user request module which forwards the retrieved content to the user interface module, which forwards the retrieved content to the user for playback.
  • the means for determining if said user has selected playing back all scenes in all available content, if said user has not selected playing back scenes that match the user's audio selection from said current content is the playback request module.
  • the means for displaying a list of all scenes of all available content that match said audio selection of said user is the user interface module.
  • the means for receiving said user's scene selection from said list of all scenes of all available content that match said audio selection of said user is the user interface module.
  • the means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences is the playback request module.
  • the means for receiving said audio selections for purchase and account information from said user is the user interface module.
  • the means for validating said user's account information is the account information module.
  • the means for transmitting said purchased selections to said user is via the retrieve audio selection module (having retrieved the selected content from the content storage and retrieval module).
  • the retrieve audio selection module forwards the retrieved content back to the purchase request module which forwards the retrieved content to the user request module which forwards the retrieved content to the user interface module, which forwards the retrieved content to the user for playback.
  • the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof.
  • Special purpose processors may include application specific integrated circuits (ASICs), reduced instruction set computers (RISCs) and/or field programmable gate arrays (FPGAs).
  • the present invention is implemented as a combination of hardware and software.
  • the software is preferably implemented as an application program tangibly embodied on a program storage device.
  • the application program may be uploaded to, and executed by, a machine comprising any suitable architecture.
  • the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s).
  • the computer platform also includes an operating system and microinstruction code.
  • the various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system.
  • various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
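
The request-handling flow of Figs. 3A and 3B, as described in the list above, can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Request` fields and the `plan` function are hypothetical names, the user's choices are assumed to be collected up front rather than interactively, and the branch in which all available scenes are played back is not spelled out in this text, so the sketch assumes it proceeds directly to the start-point test. Only step numbers that appear in the text are quoted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical container for a user's request and playback choices."""
    kind: str                          # "playback"; anything else is treated as a purchase
    by_audio: bool = False             # playback by audio selection rather than by scene
    current_content_only: bool = True  # outcome of the test at 335
    all_available: bool = False        # outcome of the test at 355
    from_scene_start: bool = True      # outcome of the test at 340


def plan(request: Request) -> List[str]:
    """Return the ordered actions the flowchart would take for this request."""
    if request.kind != "playback":
        # A non-playback request is assumed to be a purchase request.
        return [
            "display purchase form (365)",
            "receive and validate account information",
            "transmit purchased audio selections",
        ]

    steps = ["display playback options (315)"]
    if not request.by_audio:
        return steps + ["forward selected movie, TV show, or scene (330)"]

    if not request.current_content_only and not request.all_available:
        # The user picks from a list of matching scenes across available content.
        steps += ["display matching scenes (380)", "receive scene selections (385)"]
    # Assumption: every audio-selection branch ends with the same start-point test.
    if request.from_scene_start:
        steps.append("forward scenes from scene start (345)")
    else:
        steps.append("forward scenes from audio start (350)")
    return steps
```

Returning a list of action labels instead of performing the actions keeps the branching logic easy to inspect; a real system would dispatch to the modules of Fig. 4 at each step.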

Abstract

A method and apparatus are described including receiving a user request, determining if the user request is a playback request, displaying to a user, playback options, if the user request is the playback request, receiving the playback options selected by the user, determining if the playback options selected by the user are by audio selection and forwarding scenes that match the audio selection of the user for playback.

Description

VIDEO SEGMENTATION BY AUDIO SELECTION
FIELD OF THE INVENTION
The present invention relates to video content playback and, in particular, to video content playback based upon audio selections.
BACKGROUND OF THE INVENTION
Conventionally, video content has been broken (segmented) into scenes either due to editing decisions or how a script is written. Such scenes are typically used for trick play functions (jump to the next scene) or chapters that are accessed on a DVD/Blu-Ray disc.
SUMMARY OF THE INVENTION
The present invention segments video content based upon the audio (music) associations of the video instead of as in the prior art based on scenes associated with the video. The present invention provides a framework for breaking up a video for a media asset such as a television show or movie or other content (including multimedia content) into segments that correlate to the audio/music selections instead of scenes as in the prior art.
A method and apparatus are described including receiving a user request, determining if the user request is a playback request, displaying to a user, playback options, if the user request is the playback request, receiving the playback options selected by the user, determining if the playback options selected by the user are by audio selection and forwarding scenes that match the audio selection of the user for playback.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. The drawings include the following figures briefly described below:
Fig. 1 is a flowchart of an exemplary method for segmenting a video into different segments that correlate to the audio associated with such a video.
Fig. 2 presents the results of the application of the described method of the present invention.
Figs. 3A and 3B together form a flowchart of an exemplary embodiment of the method of the present invention.
Fig. 4 is a block diagram of an exemplary embodiment in accordance with the principles of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides a mechanism for breaking up (segmenting) a video (such as a movie or a television show or any other form of content) into audio selections, which can be music selections such as songs or background music. The video segment can then be mapped to such audio segments (called M) instead of using the scenes by which a video is typically segmented. Video scenes are typically the way a video is constructed either due to editing decisions, how a script is written, commercial break requirements, and the like.
Fig. 1 is a flowchart of an exemplary method for segmenting a video into different segments that correlate to the audio associated with such a video. The invention begins with a video asset that is composed of a number of scenes. Each scene can then be associated with a number (0 to X) of music selections which can be songs associated with different music artists. At 105, a media asset containing audio and video is ingested into a server or other type of storage device. At 110, a time reference is assigned to the audio and the video of the media asset. Preferably, such a time reference can be timestamps although other mechanisms can be used for this step. At 115, an audio recognition mechanism is applied to recognize the different audio segments in the media asset. Such audio recognition techniques can include the use of a program such as Shazam, which identifies songs/music selections based on an audio fingerprint of the audio selection, or AudioID, developed by the Fraunhofer Institute, as well as the detection of volume changes in the audio, such as periods of silence that exceed a threshold time value, and the like. Other audio recognition mechanisms can be used and/or combined with Shazam and/or AudioID. At 120, the video is segmented by way of the audio selections that were identified in the previous step. That is, instead of having the video scenes control how a video is organized, it will be the audio selections which are used to segment a video. Hence, the beginning and the end of an audio selection will have a video segment corresponding in kind. The time basis for the audio and the video segment would then match up using the time reference (e.g., time stamps).
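
As a minimal sketch of steps 115 and 120, the following Python uses only the silence-threshold technique mentioned above; fingerprint services such as Shazam or AudioID are not modelled. The names `Segment`, `split_on_silence`, and `video_segments_for`, and the idea of one loudness value per fixed time step, are assumptions made for illustration and do not come from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Segment:
    """A labelled span on the shared time reference, in seconds."""
    label: str
    start: float
    end: float


def split_on_silence(levels: Sequence[float], step: float,
                     silence_level: float, min_silence: float) -> List[Segment]:
    """Split audio into selections separated by sufficiently long silence.

    levels        -- one loudness value per `step` seconds (hypothetical input)
    silence_level -- values at or below this count as silence
    min_silence   -- silence must last at least this long to end a selection
    """
    segments: List[Segment] = []
    start: Optional[int] = None  # index where the current selection began
    last_loud = 0                # index of the most recent non-silent sample
    for i, level in enumerate(levels):
        if level > silence_level:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and (i - last_loud) * step >= min_silence:
            # Silence has exceeded the threshold: close the selection.
            segments.append(Segment(f"M{len(segments) + 1}",
                                    start * step, (last_loud + 1) * step))
            start = None
    if start is not None:
        # The audio ended while a selection was still open.
        segments.append(Segment(f"M{len(segments) + 1}",
                                start * step, (last_loud + 1) * step))
    return segments


def video_segments_for(audio: List[Segment]) -> List[Segment]:
    """Give each audio selection a video segment with the same time bounds."""
    return [Segment(f"V-{a.label}", a.start, a.end) for a in audio]
```

Because audio and video share one time reference, the video segmentation reduces to copying the bounds of each audio selection; short pauses below `min_silence` do not split a selection.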
In an alternative embodiment, if the audio segment is identified using an audio recognition technique, the audio segment can be referred by a song title, artist, copyright, and the like if such information can be determined by the characteristics of the audio segment, whereby such information can be assigned as metadata to such audio selections.
Fig. 2 presents the results of the application of the described method of the present invention where a video that is 30 minutes in length is divided according to Music Selections M1, M2, M3, M4, M5, and M6. In addition, Fig. 2 displays the corresponding scenes S1, S2, S3, and S4.
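
The relationship between music selections and scenes in a layout like Fig. 2 can be illustrated with a simple interval-overlap check. Since the figure is not reproduced here, every boundary in `MUSIC` and `SCENES` below is invented for illustration (in minutes, within a 30-minute video); only the labels M1 to M6 and S1 to S4 come from the text, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[float, float]  # (start_minute, end_minute)

# Hypothetical boundaries; the real values would come from Fig. 2.
SCENES: Dict[str, Span] = {
    "S1": (0, 8), "S2": (8, 15), "S3": (15, 22), "S4": (22, 30),
}
MUSIC: Dict[str, Span] = {
    "M1": (0, 4), "M2": (5, 8), "M3": (9, 14),
    "M4": (14, 18), "M5": (19, 22), "M6": (24, 30),
}


def scenes_overlapping(music: Dict[str, Span],
                       scenes: Dict[str, Span]) -> Dict[str, List[str]]:
    """Map each music selection to the scenes whose time span it overlaps."""
    result: Dict[str, List[str]] = {}
    for m_label, (m_start, m_end) in music.items():
        result[m_label] = [
            s_label
            for s_label, (s_start, s_end) in scenes.items()
            # Two half-open spans overlap when each starts before the other ends.
            if m_start < s_end and s_start < m_end
        ]
    return result
```

With these invented boundaries, a selection such as M4 straddles two scenes, which shows why segments defined by audio need not coincide with scene boundaries.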
Table 1 presents an example of audio selections and the related metadata of such selections. Using the information listed below, a user can specify that a video be played back where the audio segments that correspond to a certain artist (M83, Prince) are used to select the corresponding video segments that are played back (Ml, for example for M83 and M2 and M3 for Prince). A user can also be provided with an option to buy the corresponding audio segments in a form such as an MP3 from a service such as M-Go, Amazon, and the like if a video is presented with a corresponding list of audio selections as presented in Table 1. Other examples are possible in accordance with the described principles.
[Table 1: audio selections and related metadata (image not reproduced)]
A first aspect of the invention lets a user play back all of the scenes that match a particular artist or song title. Hence, if Prince is selected as the relevant artist in the present example, both the 1st scene and the 2nd scene will be played back (from the beginning of such scenes or from where such songs begin, in accordance with a user preference). In addition, the playback of such scenes can be done in sequential order or in a randomized order.
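The first aspect can be sketched as a small selection routine. This is a minimal illustration under stated assumptions: the `Selection` record, the `playback_plan` function, and every time value in `CATALOG` are hypothetical, since the contents of Table 1 are not reproduced in this text. Only the artist-to-selection pairing (M1 for M83, M2 and M3 for Prince, Prince appearing in the 1st and 2nd scenes) follows the description.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Selection:
    sel_id: str         # music selection label, e.g. "M2"
    artist: str
    scene: str          # scene in which the selection is heard
    start: float        # minute at which the song begins (invented value)
    scene_start: float  # minute at which the scene begins (invented value)


# Hypothetical catalog; all times are made up for illustration.
CATALOG = [
    Selection("M1", "M83", "S1", 0.0, 0.0),
    Selection("M2", "Prince", "S1", 4.0, 0.0),
    Selection("M3", "Prince", "S2", 9.0, 8.0),
]


def playback_plan(selections: List[Selection], artist: str,
                  from_scene_start: bool = True, shuffle: bool = False,
                  rng=None) -> List[Tuple[str, float]]:
    """Return (scene, start minute) pairs for every scene matching the artist."""
    matches = sorted((s for s in selections if s.artist == artist),
                     key=lambda s: s.start)
    plan: List[Tuple[str, float]] = []
    seen = set()
    for s in matches:
        if from_scene_start:
            if s.scene in seen:
                continue  # play each matching scene only once
            seen.add(s.scene)
            plan.append((s.scene, s.scene_start))
        else:
            plan.append((s.scene, s.start))
    if shuffle:
        (rng or random).shuffle(plan)  # randomized playback order
    return plan
```

With this catalog, selecting Prince yields the 1st and 2nd scenes, starting either at the scene boundaries or at the points where the songs begin, depending on `from_scene_start`.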
A second aspect of the invention will let a user play back "all" of the scenes that pertain to a particular artist or song title where all of such scenes are available to a user. Such scenes can be from different movies, television shows, and the like. One feature of this aspect of the present invention is that the scenes that are played back can have the SAME song performed by a different artist. Hence, a cover version of "BLACKBIRD" could be played back with a corresponding scene instead of having the "4th scene" where the Beatles performed the version of "BLACKBIRD". This is a substitute song feature.
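One possible way to support the second aspect is an index keyed by song title across all available content, so that the same song by different artists resolves to the same entry. The sketch below is an assumption, not the disclosed implementation; `index_by_title`, `scenes_for_song`, and the library rows (including the titles "Movie A" and "Show B" and the artist "Cover Artist") are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, str, str, str]  # (content title, scene, song title, artist)
Hit = Tuple[str, str, str]       # (content title, scene, artist)


def index_by_title(library: Iterable[Row]) -> Dict[str, List[Hit]]:
    """Group scenes from all available content by case-insensitive song title."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for content, scene, song, artist in library:
        index[song.casefold()].append((content, scene, artist))
    return index


def scenes_for_song(index: Dict[str, List[Hit]], song: str,
                    exclude_artist: Optional[str] = None) -> List[Hit]:
    """Return matching scenes; exclude_artist selects substitute (cover) versions."""
    hits = index.get(song.casefold(), [])
    return [h for h in hits if exclude_artist is None or h[2] != exclude_artist]


# Hypothetical library spanning two pieces of content.
LIBRARY = [
    ("Movie A", "S4", "Blackbird", "The Beatles"),
    ("Show B", "S2", "BLACKBIRD", "Cover Artist"),
]
```

Calling `scenes_for_song(index_by_title(LIBRARY), "blackbird", exclude_artist="The Beatles")` returns only the cover-version scene, which corresponds to the substitute song feature.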
A third aspect will let a user buy MP3 versions of songs from a service provider such as AMAZON, M-GO, and the like. With this embodiment, alternative versions of songs can be offered as well, such as "Blackbird" from the Beatles together with respective cover versions of the song.
The invention could be performed by a content server or content distribution system. In an alternative embodiment, the segmented content could be stored on a user's device including but not limited to televisions, set top boxes, computers, laptops, dual mode smart phones, iPods, iPads, iPhones, and tablets. As processing power increases and processing devices get smaller (and less costly), much of the above described segmentation could be performed by an end user's device. The third aspect of the present invention involving purchasing certain audio would be coordinated with the user's content provider in terms of billing (accounting). Other secondary aspects of the third aspect of the present invention could include offering users "deals" if they purchased, for example, "Blackbird" by all artists that recorded that song, or perhaps if the user purchased an entire Beatles album on which the song "Blackbird" was included, or if they purchased other movies or movie segments that included the song "Blackbird".
Figs. 3A and 3B together form a flowchart of an exemplary embodiment of the method of the present invention. At 305, a user request is received. At 310, a test is performed to determine if the user request is a playback request. A playback request may be by movie, TV show, other content, by scene, or by audio selection. If the user request is a playback request, then at 315, the playback options are displayed to the user. At 320, the user's playback options are received. At 325, a test is performed to determine if the playback option selected is by audio selection (rather than by scene as in the prior art). If the playback option selected is not by audio selection then the movie, TV show or scene selected by the user is forwarded to the user for playback at 330. If the playback option selected is by audio selection then at 335, a test is performed to determine if all scenes in the current content that match the user's audio selection are to be played back. If all scenes in the current content that match the user's audio selection are to be played back then a test is performed at 340 to determine if the playback is to start at the beginning of each scene of the current content that matches the user's audio selection. If the playback is to start at the beginning of each scene of the current content that matches the user's audio selection then at 345 the scenes are forwarded to the user for playback (in the order that the scenes occur or shuffled (in some random order)). If the playback is not to start at the beginning of each scene of the current content that matches the user's audio selection then at 350 the scenes are forwarded to the user for playback (in the order that the scenes occur or shuffled (in some random order)) starting at the point where the audio selection begins.
If all scenes in the current content that match the user's audio selection are not to be played back then at 355 a test is performed to determine if all available scenes that match the user's audio selection are to be played back. If all available scenes that match the user's audio selection are to be played back then at 360, all available scenes that match the user's audio selection are forwarded to the user for playback. It should be understood that the user may play back the received content immediately or store the received content for playback at a later time more convenient for the user.
If all available scenes that match the user's audio selection are not to be played back then at 380, the available scenes that match the user's audio selection are displayed. At 385, the user's scene selections from the available scenes are received. Steps/operations 340, 345 and 350 will not be described again since they are the same as described above.
If the user request was not a playback request, then it is assumed to be a purchase request for audio selections and at 365 a request for audio selections and account information to make the purchase is displayed. At 370, the user's account information is received and validated. At 375, the audio selections purchased by the user are transmitted to the user.
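The decision logic of Figs. 3A and 3B can be summarized as a routing function that returns the sequence of flowchart step numbers visited for a given request. This is a sketch under assumptions: the `Request` and `Scope` types and the `route` function are hypothetical names, and the sketch assumes that the steps reused after 385 are 340 followed by 345 or 350.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Scope(Enum):
    CURRENT_ALL = "all matching scenes in the current content"
    AVAILABLE_ALL = "all matching scenes in all available content"
    USER_PICKED = "scenes the user picks from the displayed list"


@dataclass
class Request:
    is_playback: bool
    by_audio: bool = False
    scope: Scope = Scope.CURRENT_ALL
    from_scene_start: bool = True


def route(req: Request) -> List[int]:
    """Return the flowchart steps visited, starting at step 305."""
    steps = [305, 310]
    if not req.is_playback:
        # Not a playback request: treated as a purchase request.
        return steps + [365, 370, 375]
    steps += [315, 320, 325]
    if not req.by_audio:
        return steps + [330]  # forward the movie, TV show, or scene as selected
    steps.append(335)
    if req.scope is Scope.CURRENT_ALL:
        steps.append(340)
        return steps + [345 if req.from_scene_start else 350]
    steps.append(355)
    if req.scope is Scope.AVAILABLE_ALL:
        return steps + [360]
    # Display the available scenes, receive the user's picks, then reuse 340.
    steps += [380, 385, 340]
    return steps + [345 if req.from_scene_start else 350]
```

For example, `route(Request(is_playback=False))` follows the purchase branch through steps 365, 370, and 375.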
Fig. 4 is a block diagram of an exemplary embodiment in accordance with the principles of the present invention. The present invention is practiced at a content provider's equipment/system. The user interface module 405 is in bi-directional communication with the user. User interface module 405 is also in bi-directional communication with a user request module 410, which parses the user's request to determine if the user's request is a purchase request or a playback request. If the user's request is a purchase request then the user request module 410, which is in bi-directional communication with the purchase request module 415, forwards the user's purchase request to the purchase request module 415. The purchase request module 415 is in bi-directional communication with the account information module 420. The account information module 420 requests and receives the user's account information via the user request module 410 and the user interface module 405 and validates the user's account information. Once the user's account information is validated the account information module 420 returns confirmation to the purchase request module 415. The purchase request module 415 then requests and retrieves the user's audio selection information via the user request module 410 and the user interface module 405. The user's audio selection information is forwarded to the retrieve audio selection module 425 with which the purchase request module 415 is in bi-directional communication. The retrieve audio selection module 425 retrieves the user's audio selection from the content storage and retrieval module 445 with which the retrieve audio selection module 425 is in bi-directional communication.
If the user request is determined by the user request module 410 to be a playback request then the user request module 410 forwards the request to the playback request module 430 with which the user request module 410 is in bi-directional communication. The playback request module 430 determines if the playback is by scene (or entire content) or by audio selection. If the playback is by scene (or entire content) then the playback request module 430 forwards the user's selection to the retrieve content by scene or entire content module 440 with which the playback request module 430 is in bi-directional communication. The retrieve content by scene or entire content module 440 retrieves the user's content selection from the content storage and retrieval module 445 with which the retrieve content by scene or entire content module 440 is in bi-directional communication. If the user's request is determined by the playback request module 430 to be a request to play back by audio selection then the playback request module 430 forwards the user's selection to the retrieve content by audio selection module 435 with which the playback request module 430 is in bi-directional communication. The user's selection information includes whether to play back scenes starting at the beginning of each scene or starting at the point that the audio selection begins. The user's selection information also includes whether the user wants to play back scenes that match the user's audio selection in the current content or all available content. The retrieve content by audio selection module 435 retrieves the user's content selection that matches the user's audio selection from the content storage and retrieval module 445 with which the retrieve content by audio selection module 435 is in bi-directional communication.
All content retrieved from the content storage and retrieval module is transmitted (forwarded) to the user on the reverse path from which the request was received via the bi-directional communication paths.
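The reverse-path behavior can be illustrated by listing the module reference numerals a request traverses and then mirroring that list for the response. The `PATHS` table and the `round_trip` function are hypothetical constructs for this sketch; the purchase path omits the account information module 420, which is consulted as a side branch by the purchase request module 415 rather than lying on the content path.

```python
from typing import Dict, List

# Forward path of module reference numerals from Fig. 4 for each request kind.
PATHS: Dict[str, List[int]] = {
    "purchase": [405, 410, 415, 425, 445],
    "playback_scene": [405, 410, 430, 440, 445],
    "playback_audio": [405, 410, 430, 435, 445],
}


def round_trip(kind: str) -> List[int]:
    """Return the modules visited on the way to storage (445) and back to the user."""
    forward = PATHS[kind]
    # The content returns along the same bi-directional links in reverse order.
    return forward + forward[-2::-1]
```

A playback-by-audio-selection request therefore passes through modules 405, 410, 430, and 435 to reach module 445, and the retrieved scenes return through 435, 430, 410, and 405.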
The means for receiving a user request is the user interface module. The means for determining if said user request is a playback request is the user request module. The means for displaying to a user, playback options, if said user request is said playback request is the user interface module. The means for receiving said playback options selected by said user is the user interface module. The means for determining if said playback options selected by said user are by audio selection is the playback request module. The means for forwarding scenes that match said audio selection of said user for playback is via retrieve content by audio selection module (having retrieved the selected content from the content storage and retrieval module). The retrieve content by audio selection module forwards the retrieved content back to the playback request module which forwards the retrieved content to the user request module which forwards the retrieved content to the user interface module, which forwards the retrieved content to the user for playback.
The means for determining if said user has selected playing back of all scenes in content currently being viewed by said user is the playback request module. The means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences is the playback request module. The means for forwarding content selected by said user for playback, if said playback options selected by said user are not by audio selection is via retrieve content by scene or entire content module (having retrieved the selected content from the content storage and retrieval module). The retrieve content by scene or entire content module forwards the retrieved content back to the playback request module which forwards the retrieved content to the user request module which forwards the retrieved content to the user interface module, which forwards the retrieved content to the user for playback.
The means for determining if said user has selected playing back all scenes in all available content, if said user has not selected playing back scenes that match the user's audio selection from said current content is the playback request module. The means for displaying a list of all scenes of all available content that match said audio selection of said user is the user interface module. The means for receiving said user's scene selection from said list of all scenes of all available content that match said audio selection of said user is the user interface module. The means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences is the playback request module.
The means for receiving said audio selections for purchase and account information from said user is the user interface module. The means for validating said user's account information is the account information module. The means for transmitting said purchased selections to said user is via the retrieve audio selection module (having retrieved the selected content from the content storage and retrieval module). The retrieve audio selection module forwards the retrieved content back to the purchase request module which forwards the retrieved content to the user request module which forwards the retrieved content to the user interface module, which forwards the retrieved content to the user for playback.
It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. Special purpose processors may include application specific integrated circuits (ASICs), reduced instruction set computers (RISCs) and/or field programmable gate arrays (FPGAs). Preferably, the present invention is implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

Claims

CLAIMS:
1. A method, said method comprising: receiving a user request; determining if said user request is a playback request; displaying to a user, playback options, if said user request is said playback request; receiving said playback options selected by said user; determining if said playback options selected by said user are by audio selection; and forwarding scenes that match said audio selection of said user for playback.
2. The method according to claim 1, further comprising: determining if said user has selected playing back of all scenes in content currently being viewed by said user; and determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences.
3. The method according to claim 1, further comprising forwarding content selected by said user for playback, if said playback options selected by said user are not by audio selection.
4. The method according to claim 2, further comprising determining if said user has selected playing back all scenes in all available content, if said user has not selected playing back scenes that match the user's audio selection from said current content.
5. The method according to claim 4, further comprising: displaying a list of all scenes of all available content that match said audio selection of said user; receiving said user's scene selection from said list of all scenes of all available content that match said audio selection of said user; and determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences.
6. The method according to claim 1, further comprising: receiving said audio selections for purchase and account information from said user; validating said user's account information; and transmitting said purchased selections to said user.
7. A system, comprising: means for receiving a user request; means for determining if said user request is a playback request; means for displaying to a user, playback options, if said user request is said playback request; means for receiving said playback options selected by said user; means for determining if said playback options selected by said user are by audio selection; and means for forwarding scenes that match said audio selection of said user for playback.
8. The system according to claim 7, further comprising: means for determining if said user has selected playing back of all scenes in content currently being viewed by said user; and means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences.
9. The system according to claim 7, further comprising means for forwarding content selected by said user for playback, if said playback options selected by said user are not by audio selection.
10. The system according to claim 8, further comprising means for determining if said user has selected playing back all scenes in all available content, if said user has not selected playing back scenes that match the user's audio selection from said current content.
11. The system according to claim 10, further comprising: means for displaying a list of all scenes of all available content that match said audio selection of said user; means for receiving said user's scene selection from said list of all scenes of all available content that match said audio selection of said user; and means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences.
12. The system according to claim 7, further comprising: means for receiving said audio selections for purchase and account information from said user; means for validating said user's account information; and means for transmitting said purchased selections to said user.
PCT/US2013/059343 2013-09-12 2013-09-12 Video segmentation by audio selection WO2015038121A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2013/059343 WO2015038121A1 (en) 2013-09-12 2013-09-12 Video segmentation by audio selection

Publications (1)

Publication Number Publication Date
WO2015038121A1 (en) 2015-03-19

Family

ID=49226576

Country Status (1)

Country Link
WO (1) WO2015038121A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10111872A (en) * 1996-10-08 1998-04-28 Nippon Telegr & Teleph Corp <Ntt> Device and method for distributing moving image
US6600874B1 (en) * 1997-03-19 2003-07-29 Hitachi, Ltd. Method and device for detecting starting and ending points of sound segment in video
WO2005093752A1 (en) * 2004-03-23 2005-10-06 British Telecommunications Public Limited Company Method and system for detecting audio and video scene changes
EP1708101A1 (en) * 2004-01-14 2006-10-04 Mitsubishi Denki Kabushiki Kaisha Summarizing reproduction device and summarizing reproduction method
US20080304807A1 (en) * 2007-06-08 2008-12-11 Gary Johnson Assembling Video Content
EP2608206A2 (en) * 2011-12-21 2013-06-26 Samsung Electronics Co., Ltd. Content playing apparatus and control method thereof

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
SARACENO C ET AL: "Identification of story units in audio-visual sequences by joint audio and video processing", IMAGE PROCESSING, 1998. ICIP 98. PROCEEDINGS. 1998 INTERNATIONAL CONFERENCE ON CHICAGO, IL, USA 4-7 OCT. 1998, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, vol. 1, 4 October 1998 (1998-10-04), pages 363 - 367, XP010308744, ISBN: 978-0-8186-8821-8, DOI: 10.1109/ICIP.1998.723500 *
SMITH M A ET AL: "VIDEO SKIMMING AND CHARACTERIZATION THROUGH THE COMBINATION OF IMAGE AND LANGUAGE UNDERSTANDING TECHNIQUES", PROCEEDINGS OF THE 1997 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. SAN JUAN, PUERTO RICO, JUNE 17 - 19, 1997; [PROCEEDINGS OF THE IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION], LOS ALAM, vol. CONF. 16, 17 June 1997 (1997-06-17), pages 775 - 781, XP000776576, ISBN: 978-0-7803-4236-1 *
SUNDARAM H ET AL: "A UTILITY FRAMEWORK FOR THE AUTOMATIC GENERATION OF AUDIO-VISUAL SKIMS", PROCEEDINGS ACM MULTIMEDIA 2002. 10TH. INTERNATIONAL CONFERENCE ON MULTIMEDIA. JUAN-LES-PINS, FRANCE, DEC. 1 - 6, 2002; [ACM INTERNATIONAL MULTIMEDIA CONFERENCE], ACM, NEW YORK, NY, vol. CONF. 10, 1 December 2002 (2002-12-01), pages 189 - 198, XP001174988, ISBN: 978-1-58113-620-3, DOI: 10.1145/641007.641042 *
SUNDARAM H ET AL: "Video scene segmentation using video and audio features", MULTIMEDIA AND EXPO, 2000. ICME 2000. 2000 IEEE INTERNATIONAL CONFEREN CE ON NEW YORK, NY, USA 30 JULY-2 AUG. 2000, PISCATAWAY, NJ, USA,IEEE, US, vol. 2, 30 July 2000 (2000-07-30), pages 1145 - 1148, XP010513212, ISBN: 978-0-7803-6536-0, DOI: 10.1109/ICME.2000.871563 *
YAO WANG ET AL: "Multimedia Content Analysis - Using Both Audio and Visual Clues", IEEE SIGNAL PROCESSING MAGAZINE, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 17, no. 6, 1 November 2000 (2000-11-01), pages 12 - 36, XP011089877, ISSN: 1053-5888, DOI: 10.1109/79.888862 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13765906

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13765906

Country of ref document: EP

Kind code of ref document: A1