WO2015038121A1

WO2015038121A1 - Video segmentation by audio selection

Info

Publication number: WO2015038121A1
Application number: PCT/US2013/059343
Authority: WO
Inventors: Samo KONYAR
Original assignee: Thomson Licensing
Priority date: 2013-09-12
Filing date: 2013-09-12
Publication date: 2015-03-19

Abstract

A method and apparatus are described including receiving a user request, determining if the user request is a playback request, displaying to a user, playback options, if the user request is the playback request, receiving the playback options selected by the user, determining if the playback options selected by the user are by audio selection and forwarding scenes that match the audio selection of the user for playback.

Description

VIDEO SEGMENTATION BY AUDIO SELECTION

FIELD OF THE INVENTION

The present invention relates to video content playback and, in particular, to video content playback based upon audio selections.

BACKGROUND OF THE INVENTION

Conventionally, video content has been broken (segmented) into scenes either due to editing decisions or how a script is written. Such scenes are typically used for trick play functions (jump to the next scene) or chapters that are accessed on a DVD/Blu-Ray disc.

SUMMARY OF THE INVENTION

The present invention segments video content based upon the audio (music) associations of the video rather than, as in the prior art, based upon scenes associated with the video. The present invention provides a framework for breaking up a video for a media asset such as a television show or movie or other content (including multimedia content) into segments that correlate to the audio/music selections instead of scenes as in the prior art.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. The drawings include the following figures briefly described below:

Fig. 1 is a flowchart of an exemplary method for segmenting a video into different segments that correlate to the audio associated with such a video.

Fig. 2 presents the results of the application of the described method of the present invention.

Figs. 3A and 3B together form a flowchart of an exemplary embodiment of the method of the present invention.

Fig. 4 is a block diagram of an exemplary embodiment in accordance with the principles of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention provides a mechanism for breaking up (segmenting) a video (such as a movie or a television show or any other form of content) into audio selections, which can be music selections such as songs or background music. The video segments can then be mapped to such audio segments (called M) instead of to the scenes into which a video is typically segmented. Video scenes are typically the way a video is constructed, whether due to editing decisions, how a script is written, commercial break requirements, and the like.

Fig. 1 is a flowchart of an exemplary method for segmenting a video into different segments that correlate to the audio associated with such a video. The invention begins with a video asset that is composed of a number of scenes. Each scene can then be associated with a number (0 to X) of music selections which can be songs associated with different music artists. At 105, a media asset containing audio and video is ingested into a server or other type of storage device. At 110, a time reference is assigned to the audio and the video of the media asset. Preferably, such a time reference can be timestamps although other mechanisms can be used for this step. At 115, an audio recognition mechanism is applied to recognize the different audio segments in the media asset. Such audio recognition techniques can include the use of a program such as Shazam, which identifies songs/music selections based on an audio fingerprint of the audio selection, or AudioID developed by the Fraunhofer Institute, as well as the detection of volume changes in the audio, such as periods of silence that exceed a threshold time value, and the like. Other audio recognition mechanisms can be used and/or combined with Shazam and/or AudioID. At 120, the video is segmented by way of the audio selections that were identified in the previous step. That is, instead of having the video scenes control how a video is organized, it will be the audio selections which are used to segment a video. Hence, the beginning and the end of an audio selection will have a video segment corresponding in kind. The time basis for the audio and the video segment would then match up using the time reference (e.g., time stamps).
In an alternative embodiment, if the audio segment is identified using an audio recognition technique, the audio segment can be referred to by a song title, artist, copyright, and the like if such information can be determined from the characteristics of the audio segment, whereby such information can be assigned as metadata to such audio selections.
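
Steps 110 through 120 can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `AudioSelection`, `VideoSegment`, `split_on_silence` and `segment_video_by_audio` are hypothetical, and the simple volume-threshold splitter merely stands in for the recognition step at 115 (it does not reproduce how Shazam or AudioID work). All times are assumed to be seconds on the shared time reference.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AudioSelection:
    label: str                 # e.g. "M1" (hypothetical label)
    start: float               # seconds on the shared time reference
    end: float
    metadata: dict = field(default_factory=dict)  # title, artist, ... if known


@dataclass
class VideoSegment:
    label: str
    start: float
    end: float
    metadata: dict


def split_on_silence(levels, frame_seconds, silence_level, min_silence_seconds):
    """Stand-in for step 115: split per-frame volume levels into (start, end)
    intervals wherever silence lasts at least min_silence_seconds."""
    min_frames = math.ceil(min_silence_seconds / frame_seconds)
    intervals = []
    seg_start = None   # frame index where the current audio segment began
    silent_run = 0     # number of consecutive silent frames seen so far
    for i, level in enumerate(levels):
        if level > silence_level:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        else:
            silent_run += 1
            if seg_start is not None and silent_run >= min_frames:
                # The segment ended at the frame where this silence began.
                end_frame = i - silent_run + 1
                intervals.append((seg_start * frame_seconds,
                                  end_frame * frame_seconds))
                seg_start = None
    if seg_start is not None:
        end_frame = len(levels) - silent_run
        intervals.append((seg_start * frame_seconds, end_frame * frame_seconds))
    return intervals


def segment_video_by_audio(selections, video_duration):
    """Step 120: emit one video segment per audio selection, sharing the
    selection's start and end on the common time reference."""
    segments = []
    for sel in sorted(selections, key=lambda s: s.start):
        start = max(0.0, sel.start)
        end = min(video_duration, sel.end)
        if end > start:
            segments.append(VideoSegment(sel.label, start, end, dict(sel.metadata)))
    return segments
```

In this sketch the metadata dictionary carries whatever the recognition step could determine (song title, artist, and the like), so that later playback selection can be driven by it.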

Fig. 2 presents the results of the application of the described method of the present invention where a video that is 30 minutes in length is divided according to Music Selections M1, M2, M3, M4, M5, and M6. In addition, Fig. 2 displays the corresponding scenes S1, S2, S3, and S4.

Table 1 presents an example of audio selections and the related metadata of such selections. Using the information listed below, a user can specify that a video be played back where the audio segments that correspond to a certain artist (M83, Prince) are used to select the corresponding video segments that are played back (M1, for example, for M83, and M2 and M3 for Prince). A user can also be provided with an option to buy the corresponding audio segments in a form such as an MP3 from a service such as M-Go, Amazon, and the like if a video is presented with a corresponding list of audio selections as presented in Table 1. Other examples are possible in accordance with the described principles.

A first aspect of the invention lets a user play back all of the scenes that match a particular artist or song title. Hence, if Prince is selected as the relevant artist in the present example, both the 1st scene and the 2nd scene will be played back (from the beginning of such scenes or from where such songs begin, in accordance with a user preference). In addition, the playback of such scenes can be done in sequential order or in a randomized order.
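
The first aspect can be sketched as follows, purely as an illustration under stated assumptions. The record type `SongInScene`, the function `build_playback_plan` and all field names are hypothetical, and times are assumed to be seconds. The sketch selects the scenes that contain a song by the chosen artist (or with the chosen title), starts each scene either at its beginning or at the point where the first matching song begins, and optionally shuffles the order.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SongInScene:
    scene: str          # hypothetical scene label, e.g. "S1"
    scene_start: float  # seconds
    scene_end: float
    song_start: float   # where the song begins within the video
    artist: str
    title: str


def build_playback_plan(entries, artist=None, title=None,
                        from_scene_start=True, shuffle=False, rng=None):
    """Return a list of (scene, start, end) tuples to play back."""
    matches = [e for e in entries
               if (artist is None or e.artist == artist)
               and (title is None or e.title == title)]
    # Keep one entry per scene: the earliest matching song in that scene.
    first_by_scene = {}
    for e in sorted(matches, key=lambda e: e.song_start):
        first_by_scene.setdefault(e.scene, e)
    plan = [(e.scene,
             e.scene_start if from_scene_start else e.song_start,
             e.scene_end)
            for e in first_by_scene.values()]
    plan.sort(key=lambda item: item[1])      # sequential order by default
    if shuffle:
        (rng or random).shuffle(plan)        # randomized order on request
    return plan
```

Passing a seeded `random.Random` instance as `rng` makes the shuffled order reproducible, which may be convenient when testing.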

A second aspect of the invention will let a user play back "all" of the scenes that pertain to a particular artist or song title where all of such scenes are available to a user. Such scenes can be from different movies, television shows, and the like. One feature of this aspect of the present invention is that the scenes that are played back can have the SAME song performed by a different artist. Hence, a cover version of "BLACKBIRD" could be played back with a corresponding scene instead of having the "4th scene" where the Beatles performed the version of "BLACKBIRD". This is a substitute song feature.
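
A minimal sketch of the second aspect, under stated assumptions, is given below. `CatalogEntry`, `normalize` and `scenes_for_song` are hypothetical names, and the catalog is assumed to be a flat list of (asset, scene, title, artist) records spanning all content available to the user. Songs are matched on a normalized title so that versions by different artists are found together; the optional `exclude_artist` argument illustrates the substitute song feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    asset: str    # movie, television show, and the like
    scene: str
    title: str    # song title from the audio selection metadata
    artist: str


def normalize(title):
    """Compare titles case-insensitively and ignore extra whitespace."""
    return " ".join(title.casefold().split())


def scenes_for_song(catalog, title, exclude_artist=None):
    """Return every catalog entry, across all assets, whose song title
    matches; optionally skip one artist so that cover versions are used."""
    wanted = normalize(title)
    return [e for e in catalog
            if normalize(e.title) == wanted
            and (exclude_artist is None or e.artist != exclude_artist)]
```

Matching on title alone is a simplification: two unrelated songs that share a title would be treated as the same song, so a practical system would likely need a stronger identifier.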

A third aspect will let a user buy MP3 versions of songs from a service provider such as AMAZON, M-GO, and the like. With this embodiment, alternative versions of songs can be offered as well, such as "Blackbird" by the Beatles together with respective cover versions of the song.

The invention could be performed by a content server or content distribution system. In an alternative embodiment, the segmented content could be stored on a user's device including but not limited to televisions, set top boxes, computers, laptops, dual mode smart phones, iPods, iPads, iPhones, and tablets. As processing power increases and processing devices get smaller (and less costly) much of the above described segmentation could be performed by an end user's device. The third aspect of the present invention involving purchasing certain audio would be coordinated with the user's content provider in terms of billing (accounting). Other secondary aspects of the third aspect of the present invention could include offering users "deals" if they purchased, for example, "Blackbird" by all artists that recorded that song or perhaps if the user purchased an entire Beatles album on which the song "Blackbird" was included or if they purchased other movies or movie segments that included the song "Blackbird".

Figs. 3A and 3B together form a flowchart of an exemplary embodiment of the method of the present invention. At 305, a user request is received. At 310, a test is performed to determine if the user request is a playback request. A playback request may be by movie, TV show, other content, by scene, or by audio selection. If the user request is a playback request, then at 315, the playback options are displayed to the user. At 320, the user's playback options are received. At 325, a test is performed to determine if the playback option selected is by audio selection (rather than by scene as in the prior art). If the playback option selected is not by audio selection then the movie, TV show or scene selected by the user is forwarded to the user for playback at 330. If the playback option selected is by audio selection then at 335, a test is performed to determine if all scenes in the current content that match the user's audio selection are to be played back. If all scenes in the current content that match the user's audio selection are to be played back then a test is performed at 340 to determine if the playback is to start at the beginning of each scene of the current content that matches the user's audio selection. If the playback is to start at the beginning of each scene of the current content that matches the user's audio selection then forward the scenes (in the order that the scenes occur or shuffled (in some random order)) to the user for playback at 345. If the playback is not to start at the beginning of each scene of the current content that matches the user's audio selection then at 350 forward the scenes (in the order that the scenes occur or shuffled (in some random order)) starting at the point where the audio selection begins to the user for playback.
If all scenes in the current content that match the user's audio selection are not to be played back then at 355 a test is performed to determine if all available scenes that match the user's audio selection are to be played back. If all available scenes that match the user's audio selection are to be played back then at 360, forward all available scenes that match the user's audio selection to the user for playback. It should be understood that the user may playback the received content immediately or store the received content for playback at a later time more convenient for the user.

If all available scenes that match the user's audio selection are not to be played back then at 380, the available scenes that match the user's audio selection are displayed. At 385, the user's scene selections from the available scenes are received. Steps/operations 340, 345 and 350 will not be described again since they are the same as described above.

If the user request was not a playback request, then it is assumed to be a purchase request for audio selections and at 365 a request for audio selections and account information to make the purchase is displayed. At 370, the user's account information is received and validated. At 375, the audio selections purchased by the user are transmitted to the user.
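
The decision flow of Figs. 3A and 3B can be summarized in a short Python sketch that returns the reference numerals visited for a given request. This is one reading of the flowchart as described in the text, not a definitive implementation: `Scope`, `Request` and `flow_steps` are hypothetical names, and the sketch assumes that after the user's scene selections are received at 385 the flow proceeds through the start-point test at 340 to either 345 or 350.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    CURRENT_CONTENT = "all matching scenes in the current content"
    ALL_AVAILABLE = "all matching scenes in all available content"
    USER_PICKS = "scenes picked by the user from the displayed list"


@dataclass
class Request:
    is_playback: bool
    by_audio_selection: bool = False
    scope: Scope = Scope.CURRENT_CONTENT
    from_scene_start: bool = True


def _start_point(req):
    # 340: start at the beginning of each scene (345) or where the
    # audio selection begins (350)?
    return [340, 345] if req.from_scene_start else [340, 350]


def flow_steps(req):
    """Return the flowchart steps visited for this request, in order."""
    steps = [305, 310]                     # receive request, playback test
    if not req.is_playback:
        return steps + [365, 370, 375]     # purchase path
    steps += [315, 320, 325]               # show and receive playback options
    if not req.by_audio_selection:
        return steps + [330]               # forward movie, TV show or scene
    steps.append(335)                      # all matching scenes in current content?
    if req.scope is Scope.CURRENT_CONTENT:
        return steps + _start_point(req)
    steps.append(355)                      # all available matching scenes?
    if req.scope is Scope.ALL_AVAILABLE:
        return steps + [360]
    steps += [380, 385]                    # display list, receive scene picks
    return steps + _start_point(req)
```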

Fig. 4 is a block diagram of an exemplary embodiment in accordance with the principles of the present invention. The present invention is practiced at a content provider's equipment/system. The user interface module 405 is in bi-directional communication with the user. User interface module 405 is also in bi-directional communication with a user request module 410, which parses the user's request to determine if the user's request is a purchase request or a playback request. If the user's request is a purchase request then the user request module 410, which is in bi-directional communication with the purchase request module 415, forwards the user's purchase request to the purchase request module 415. The purchase request module 415 is in bi-directional communication with the account information module 420. The account information module 420 requests and receives the user's account information via the user request module 410 and the user interface module 405 and validates the user's account information. Once the user's account information is validated the account information module 420 returns confirmation to the purchase request module 415. The purchase request module 415 then requests and retrieves the user's audio selection information via the user request module 410 and the user interface module 405. The user's audio selection information is forwarded to the retrieve audio selection module 425 with which the purchase request module 415 is in bi-directional communication. The retrieve audio selection module 425 retrieves the user's audio selection from the content storage and retrieval module 445 with which the retrieve audio selection module 425 is in bi-directional communication.

If the user request is determined by the user request module 410 to be a playback request then the user request module 410 forwards the request to the playback request module 430 with which the user request module 410 is in bi-directional communication. The playback request module 430 determines if the playback is by scene (or entire content) or by audio selection. If the playback is by scene (or entire content) then the playback request module 430 forwards the user's selection to the retrieve content by scene or entire content module 440 with which the playback request module 430 is in bi-directional communication. The retrieve content by scene or entire content module 440 retrieves the user's content selection from the content storage and retrieval module 445 with which the retrieve content by scene or entire content module 440 is in bi-directional communication. If the user's request is determined by the playback request module 430 to be a request to playback by audio selection then the playback request module 430 forwards the user's selection to the retrieve content by audio selection module 435 with which the playback request module 430 is in bi-directional communication. The user's selection information includes whether to playback scenes starting at the beginning of each scene or starting at the point that the audio selection begins. The user's selection information also includes whether the user wants to playback scenes that match the user's audio selection in the current content or all available content. The retrieve content by audio selection module 435 retrieves the user's content selection that matches the user's audio selection from the content storage and retrieval module 445 with which the retrieve content by audio selection module 435 is in bi-directional communication.
All content retrieved from the content storage and retrieval module is transmitted (forwarded) to the user on the reverse path from which the request was received via the bi-directional communication paths.
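
The playback path through the modules of Fig. 4 can be pictured with the following Python sketch. It is only an illustration: the class names, the dictionary-based request format and the `StoredScene` record are assumptions made for the example, and the purchase path (modules 415, 420 and 425) is deliberately omitted. Each `handle` or `retrieve` call returns its result to its caller, which models content travelling back along the reverse of the request path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredScene:
    asset: str       # movie, TV show or other content
    scene: str
    artists: tuple   # artists of the audio selections in this scene


class ContentStorage:                       # module 445
    def __init__(self, scenes):
        self.scenes = list(scenes)


class RetrieveBySceneModule:                # module 440
    def __init__(self, storage):
        self.storage = storage

    def retrieve(self, asset, scene=None):
        # scene=None means the entire content of the asset.
        return [s for s in self.storage.scenes
                if s.asset == asset and (scene is None or s.scene == scene)]


class RetrieveByAudioModule:                # module 435
    def __init__(self, storage):
        self.storage = storage

    def retrieve(self, artist, asset=None):
        # asset=None means all available content rather than current content.
        return [s for s in self.storage.scenes
                if artist in s.artists and (asset is None or s.asset == asset)]


class PlaybackRequestModule:                # module 430
    def __init__(self, by_scene, by_audio):
        self.by_scene = by_scene
        self.by_audio = by_audio

    def handle(self, request):
        if request.get("artist") is not None:      # playback by audio selection
            current_only = request.get("current_only", True)
            asset = request["asset"] if current_only else None
            return self.by_audio.retrieve(request["artist"], asset)
        return self.by_scene.retrieve(request["asset"], request.get("scene"))


class UserRequestModule:                    # module 410
    def __init__(self, playback):
        self.playback = playback

    def handle(self, request):
        if request["kind"] == "playback":
            return self.playback.handle(request)
        raise NotImplementedError("purchase path is not part of this sketch")
```

A user interface module (405) would sit in front of `UserRequestModule.handle` and pass the returned scenes on to the user.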

The means for receiving a user request is the user interface module. The means for determining if said user request is a playback request is the user request module. The means for displaying to a user, playback options, if said user request is said playback request is the user interface module. The means for receiving said playback options selected by said user is the user interface module. The means for determining if said playback options selected by said user are by audio selection is the playback request module. The means for forwarding scenes that match said audio selection of said user for playback is via retrieve content by audio selection module (having retrieved the selected content from the content storage and retrieval module). The retrieve content by audio selection module forwards the retrieved content back to the playback request module which forwards the retrieved content to the user request module which forwards the retrieved content to the user interface module, which forwards the retrieved content to the user for playback.

The means for determining if said user has selected playing back of all scenes in content currently being viewed by said user is the playback request module. The means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences is the playback request module. The means for forwarding content selected by said user for playback, if said playback options selected by said user are not by audio selection is via retrieve content by scene or entire content module (having retrieved the selected content from the content storage and retrieval module). The retrieve content by scene or entire content module forwards the retrieved content back to the playback request module which forwards the retrieved content to the user request module which forwards the retrieved content to the user interface module, which forwards the retrieved content to the user for playback.

The means for determining if said user has selected playing back all scenes in all available content, if said user has not selected playing back scenes that match the user's audio selection from said current content is the playback request module. The means for displaying a list of all scenes of all available content that match said audio selection of said user is the user interface module. The means for receiving said user's scene selection from said list of all scenes of all available content that match said audio selection of said user is the user interface module. The means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences is the playback request module.

The means for receiving said audio selections for purchase and account information from said user is the user interface module. The means for validating said user's account information is the account information module. The means for transmitting said purchased selections to said user is via the retrieve audio selection module (having retrieved the selected content from the content storage and retrieval module). The retrieve audio selection module forwards the retrieved content back to the purchase request module which forwards the retrieved content to the user request module which forwards the retrieved content to the user interface module, which forwards the retrieved content to the user for playback.

It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. Special purpose processors may include application specific integrated circuits (ASICs), reduced instruction set computers (RISCs) and/or field programmable gate arrays (FPGAs). Preferably, the present invention is implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.

It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

Claims

CLAIMS:

1. A method, said method comprising: receiving a user request; determining if said user request is a playback request; displaying to a user, playback options, if said user request is said playback request; receiving said playback options selected by said user; determining if said playback options selected by said user are by audio selection; and forwarding scenes that match said audio selection of said user for playback.

2. The method according to claim 1, further comprising: determining if said user has selected playing back of all scenes in content currently being viewed by said user; and determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences.

3. The method according to claim 1, further comprising forwarding content selected by said user for playback, if said playback options selected by said user are not by audio selection.

4. The method according to claim 2, further comprising determining if said user has selected playing back all scenes in all available content, if said user has not selected playing back scenes that match the user's audio selection from said current content.

5. The method according to claim 4, further comprising: displaying a list of all scenes of all available content that match said audio selection of said user; receiving said user's scene selection from said list of all scenes of all available content that match said audio selection of said user; and determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences.

6. The method according to claim 1, further comprising: receiving said audio selections for purchase and account information from said user; validating said user's account information; and transmitting said purchased selections to said user.

7. A system, comprising: means for receiving a user request; means for determining if said user request is a playback request; means for displaying to a user, playback options, if said user request is said playback request; means for receiving said playback options selected by said user; means for determining if said playback options selected by said user are by audio selection; and means for forwarding scenes that match said audio selection of said user for playback.

8. The system according to claim 7, further comprising: means for determining if said user has selected playing back of all scenes in content currently being viewed by said user; and means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences.

9. The system according to claim 7, further comprising means for forwarding content selected by said user for playback, if said playback options selected by said user are not by audio selection.

10. The system according to claim 8, further comprising means for determining if said user has selected playing back all scenes in all available content, if said user has not selected playing back scenes that match the user's audio selection from said current content.

11. The system according to claim 10, further comprising: means for displaying a list of all scenes of all available content that match said audio selection of said user; means for receiving said user's scene selection from said list of all scenes of all available content that match said audio selection of said user; and means for determining if said user has selected playing back content starting at a beginning of each scene or said user has selected playing back content where said audio selection selected by said user commences.

12. The system according to claim 7, further comprising: means for receiving said audio selections for purchase and account information from said user; means for validating said user's account information; and means for transmitting said purchased selections to said user.