WO2015131369A1 - Constructing a visual representation of a video - Google Patents

Constructing a visual representation of a video

Info

Publication number
WO2015131369A1
WO2015131369A1 (PCT/CN2014/072987)
Authority
WO
WIPO (PCT)
Prior art keywords
video
visual representation
dimensional visual
program code
frame
Prior art date
Application number
PCT/CN2014/072987
Other languages
French (fr)
Inventor
Kongqiao Wang
Yingfei Liu
Hao Wang
Original Assignee
Nokia Technologies Oy
Nokia (China) Investment Co., Ltd.
Application filed by Nokia Technologies Oy and Nokia (China) Investment Co., Ltd.
Priority to PCT/CN2014/072987
Publication of WO2015131369A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G06T11/60: Editing figures and text; Combining figures or text
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T3/16
    • G06T2210/00: Indexing scheme for image generation or computer graphics
    • G06T2210/12: Bounding box
    • G06T2219/00: Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20: Indexing scheme for editing of 3D models
    • G06T2219/2004: Aligning objects, relative positioning of parts
    • G06T2219/2016: Rotation, translation, scaling

Definitions

  • An example embodiment of the present invention relates generally to the construction of a visual representation of a video and, more particularly, to the construction of a visual representation of a video that includes one or more segmented objects from frames of the video.
  • One or more frames of the video may be identified. These frames may be identified as being representative of the video, as being interesting or noteworthy, or for any other purpose.
  • a method, apparatus and computer program product may be provided in accordance with an example embodiment in order to construct a visual representation of a video including one or more segmented objects from the frames of the video.
  • the visual representation of the video that is constructed in accordance with an example embodiment permits the segmented objects from the frames of the video to be presented in a manner that is representative of the video and that provides context for the segmented objects.
  • A method, in one embodiment, includes identifying an object that appears in a plurality of frames of a video.
  • the method of this embodiment also includes segmenting the object from one or more frames of the video to create one or more segmented objects.
  • the method of this example embodiment also includes constructing a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object.
  • the three-dimensional visual representation also includes the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
  • the method of an example embodiment may construct the three-dimensional visual representation by constructing a box having at least three sides.
  • the representation of the at least one frame of the video that includes the object defines an end of the box.
  • the box of an example embodiment also includes first and second sides extending from the end of the box.
  • The method may construct the three-dimensional visual representation by overlaying the one or more segmented objects at least partially on at least one of the first and second sides of the box.
  • the first and second sides have a color that is at least partially defined by the representation of the at least one frame of the video that includes the object.
  • the method of an example embodiment may also include identifying a plurality of key frames of the video.
  • the method may identify the object by identifying the object in the plurality of key frames.
  • the method of an example embodiment may also include receiving an input indicating that the three-dimensional visual representation should be zoomed in or zoomed out.
  • In response to an input indicating that the three-dimensional visual representation should be zoomed in, the method may also include supplementing the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
  • An apparatus, in another embodiment, includes at least one processor and at least one memory including computer program code, with the at least one memory and the computer program code configured to, with the processor, cause the apparatus to at least identify an object that appears in a plurality of frames of a video.
  • the at least one memory and the computer program code are also configured to, with the processor, cause the apparatus of this embodiment to segment the object from one or more frames of the video to create one or more segmented objects.
  • the at least one memory and the computer program code are also configured to, with the processor, cause the apparatus of this embodiment to construct a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object.
  • the three-dimensional visual representation includes the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
  • the at least one memory and the computer program code may be configured to, with the processor, cause the apparatus of an example embodiment to construct a three-dimensional visual representation by constructing a box having at least three sides.
  • the representation of the at least one frame of the video that includes the object defines an end of the box.
  • the box may also include first and second sides extending from the end of the box.
  • the at least one memory and the computer program code may be configured to, with the processor, cause the apparatus to construct the three-dimensional visual representation by overlaying the one or more segmented objects at least partially on at least one of the first and second sides of the box.
  • the first and second sides may have a color that is at least partially defined by the representation of the at least one frame of the video that includes the object.
  • the at least one memory and the computer program code may be further configured to, with the processor, cause the apparatus of an example embodiment to identify a plurality of key frames of the video.
  • the at least one memory and the computer program code may be configured to, with the processor, cause the apparatus to identify the object by identifying the object in a plurality of key frames.
  • the at least one memory and the computer program code may be further configured to, with the processor, cause the apparatus of an example embodiment to receive an input indicating that the three-dimensional visual representation should be zoomed in or zoomed out.
  • the at least one memory and the computer program code may be further configured to, with the processor, cause the apparatus to supplement the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
  • A computer program product, in a further embodiment, includes at least one non-transitory computer-readable storage medium having computer-executable program code portions stored therein, with the computer-executable program code portions including program code instructions configured, upon execution, to identify an object that appears in a plurality of frames of a video.
  • the computer-executable program code portions of this embodiment also include program code instructions configured, upon execution, to segment the object from one or more frames of the video to create one or more segmented objects.
  • the computer-executable program code portions of this example embodiment also include program code instructions that are configured, upon execution, to construct a three- dimensional visual representation of the video including a representation of at least one frame of the video that includes the object.
  • the three-dimensional visual representation includes one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
  • the program code instructions configured to construct a three-dimensional visual representation may include program code instructions configured to construct a box having at least three sides.
  • the representation of the at least one frame of the video that includes the object defines an end of the box.
  • the box may also include first and second sides extending from the end of the box.
  • the program code instructions configured to construct the three-dimensional visual representation may include program code instructions configured to overlay the one or more segmented objects at least partially on at least one of the first and second sides of the box.
  • the first and second sides may have a color that is at least partially defined by the representation of the at least one frame of the video that includes the object.
  • the computer-executable program code portions of an example embodiment may also include program code instructions configured to identify a plurality of key frames of the video.
  • the program code instructions that are configured to identify the object in this embodiment may include program code instructions configured to identify the object in the plurality of key frames.
  • the computer-executable program code portions of an example embodiment may also include program code instructions configured to receive an input indicating that the three-dimensional visual representation should be zoomed in or zoomed out. In response to the input indicating that the three-dimensional visual representation should be zoomed in, the computer-executable program code portions may also include program code instructions configured to supplement the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
  • An apparatus, in yet another embodiment, includes means for identifying an object that appears in a plurality of frames of a video.
  • the apparatus of this example embodiment also includes means for segmenting the object from one or more frames of the video to create one or more segmented objects.
  • the apparatus further includes means for constructing a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object.
  • The three-dimensional visual representation includes the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
  • Figure 1 is a block diagram of an apparatus that may be specifically configured in accordance with an example embodiment of the present invention
  • Figure 2 is a flowchart illustrating operations performed, such as by the apparatus of Figure 1, in accordance with an example embodiment of the present invention
  • Figure 3 is a three-dimensional visual representation of a video as presented upon a display of a mobile terminal in accordance with an example embodiment of the present invention
  • Figure 4 is a flowchart illustrating operations performed with respect to the construction of a three-dimensional visual representation of a video in accordance with an example embodiment of the present invention
  • Figure 5 is a flowchart illustrating operations performed in conjunction with zooming of a three-dimensional representation of a video in accordance with an example embodiment of the present invention
  • Figures 6A, 6B and 6C are three-dimensional visual representations of a video that are presented in sequence upon a display of a mobile terminal and that illustrate supplementation of the three-dimensional visual representation with one or more additional segmented objects in response to an input indicating that the three-dimensional visual representation should be zoomed in, in accordance with an example embodiment of the present invention
  • Figure 7 is a three-dimensional visual representation presented by a display of a mobile terminal that illustrates the selection of a segmented object in accordance with an example embodiment of the present invention
  • Figure 8 illustrates the frame of the video that includes the segmented object that was selected in Figure 7 and that further illustrates the removal of some background that was initially included with the segmented object in accordance with an example embodiment of the present invention
  • Figure 9 illustrates the frame of the video that includes the segmented object that was selected in Figure 7 and that further illustrates the removal of additional background that was initially included with the segmented object in accordance with an example embodiment of the present invention
  • Figure 10 illustrates the frame of the video that includes the segmented object that was selected in Figure 7 following the removal of background imagery that was initially included with the segmented object in accordance with an example embodiment of the present invention.
  • the term 'circuitry' refers to (a) hardware-only circuit implementations (for example, implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present.
  • This definition of 'circuitry' applies to all uses of this term herein, including in any claims.
  • the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
  • the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
  • a method, apparatus and computer program product are provided in accordance with an example embodiment in order to construct a three-dimensional visual representation of a video.
  • the method, apparatus and computer program product may construct a three-dimensional visual representation of a video that includes both a representation of at least one frame of the video as well as one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
  • The resulting three-dimensional visual representation of the video may provide a useful and intuitive summary of the video that may be quickly reviewed and that may be stored and/or transmitted in a more efficient manner relative to the video itself.
  • the method, apparatus and computer program product of an example embodiment may construct the three-dimensional visual representation of the video in an efficient manner without requiring extensive input or involvement by a user.
  • The apparatus may be embodied by or associated with a variety of computing devices including, for example, a variety of video recording and/or playback devices. In this regard, the apparatus may be embodied by or otherwise associated with a mobile terminal, such as a personal digital assistant (PDA), mobile telephone, smartphone, companion device (for example, a smart watch), pager, mobile television, gaming device, laptop computer, camera, tablet computer, touch surface, video recorder, audio/video player, radio, electronic book, positioning device (for example, a global positioning system (GPS) device), any combination of the aforementioned, or other types of voice and text communications systems.
  • the apparatus of the embodiment of Figure 1 may include or otherwise be in communication with a processor 12, a memory device 14, a user interface 16 and optionally a communication interface 18 and/or a camera 20.
  • the processor (and/or co-processors or any other processing circuitry assisting or otherwise associated with the processor) may be in communication with the memory device via a bus for passing information among components of the apparatus.
  • the memory device may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories.
  • the memory device may be an electronic storage device (for example, a computer readable storage medium) comprising gates configured to store data (for example, bits) that may be retrievable by a machine (for example, a computing device like the processor).
  • the memory device may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present invention.
  • the memory device could be configured to buffer input data for processing by the processor.
  • the memory device could be configured to store instructions for execution by the processor.
  • the apparatus 10 may be embodied by a computing device, such as, for example, a variety of video recording and/or playback devices.
  • the apparatus may be embodied as a chip or chip set.
  • the apparatus may comprise one or more physical packages (for example, chips) including materials, components and/or wires on a structural assembly (for example, a circuit board).
  • the structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon.
  • The apparatus may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single "system on a chip."
  • a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
  • the processor 12 may be embodied in a number of different ways.
  • the processor may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the processor may include one or more processing cores configured to perform independently.
  • a multi-core processor may enable multiprocessing within a single physical package.
  • the processor may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
  • the processor 12 may be configured to execute instructions stored in the memory device 14 or otherwise accessible to the processor.
  • Alternatively or additionally, the processor may be configured to execute hard-coded functionality.
  • the processor may represent an entity (for example, physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly.
  • the processor when the processor is embodied as an ASIC, FPGA or the like, the processor may be specifically configured hardware for conducting the operations described herein.
  • the processor when the processor is embodied as an executor of software instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed.
  • the processor may be a processor of a specific device (for example, the client device 10 and/or a network entity) configured to employ an embodiment of the present invention by further configuration of the processor by instructions for performing the algorithms and/or operations described herein.
  • the processor may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor.
  • the apparatus 10 of the illustrated embodiment also includes a user interface 16.
  • the user interface such as a display, may be in communication with the processor 12 to provide output to the user and, in some embodiments, to receive an indication of a user input, such as in an instance in which the user interface includes a touch screen display.
  • the user interface may also include a keyboard, a mouse, a joystick, touch areas, soft keys, one or more microphones, a plurality of speakers, or other input/output mechanisms.
  • the processor may comprise user interface circuitry configured to control at least some functions of one or more user interface elements such as a display and, in some embodiments, a plurality of speakers, a ringer, one or more microphones and/or the like.
  • the processor and/or user interface circuitry comprising the processor may be configured to control one or more functions of one or more user interface elements through computer program instructions (for example, software and/or firmware) stored on a memory accessible to the processor (for example, memory device 14, and/or the like).
  • The apparatus 10 of the illustrated embodiment may also optionally include a communication interface 18 that may be any means, such as a device or circuitry embodied in either hardware or a combination of hardware and software, that is configured to receive data from and/or transmit data to a communications device in communication with the apparatus.
  • the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network.
  • the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s).
  • the communication interface may alternatively or also support wired communication.
  • the communication interface may include a communication modem and/or other hardware and/or software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
  • The apparatus 10 of some embodiments may include or be associated with a camera 20 that is in communication with the processor 12.
  • the camera may be any means for capturing a video for storage, display or transmission including, for example, an imaging sensor.
  • the camera may include a digital camera including an imaging sensor capable of forming a digital video file from a plurality of captured images.
  • the camera may include all hardware, such as a lens, an imaging sensor and/or other optical device(s), and software necessary for creating a digital video file from a plurality of captured images.
  • the camera may include only the hardware needed to view a video, while the memory stores instructions for execution by the processor in the form of software necessary to create a digital video file from a plurality of captured images.
  • the camera may further include a processing element such as a co-processor which assists the processor in processing image data and an encoder and/or decoder for compressing and/or decompressing image data.
  • the encoder and/or decoder may encode and/or decode according to a predefined format, such as a JPEG standard format.
  • the video that is captured may be stored for future viewings and/or manipulations in the memory of the apparatus and/or in a memory external to the apparatus, such as the memory of a video playback system.
  • Referring now to Figure 2, the operations performed, such as by the apparatus 10 of Figure 1, in order to construct a three-dimensional visual representation of a video in accordance with an example embodiment are illustrated.
  • the video comprised of a plurality of frames ordered in a time sequence may have been captured by the camera 20.
  • the video may be stored by memory 14 or be received by the communication interface 18 from a third party.
  • the apparatus may include means, such as the processor 12 or the like, for identifying an object that appears in a plurality of frames of the video.
  • the object may be identified in various manners.
  • the object may be identified based upon input by a user who, for example, may touch or otherwise select the object within a frame of the video.
  • the object may be identified by the processor in accordance with other criteria.
  • the processor may be configured to identify the object based upon a determination across a plurality of frames that the object has moved relative to the background.
  • The processor may be configured to identify the object based upon a relative position of the object within the frames of the video, such as by identifying the object based upon the location of the object in a central portion of all, a majority, or at least a predetermined percentage of the frames of the video.
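  • The position-based identification heuristic described above can be sketched in a few lines. This is an illustrative sketch, not code from the application: the function names, the `(x1, y1, x2, y2)` bounding-box representation, and the `margin` and `min_fraction` thresholds are assumptions chosen for the example.

```python
def in_central_region(box, frame_w, frame_h, margin=0.25):
    # True if the box's centre falls inside the central region of the
    # frame, i.e. the frame minus a border of `margin` of its width and
    # height on each side.
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    return (margin * frame_w <= cx <= (1 - margin) * frame_w
            and margin * frame_h <= cy <= (1 - margin) * frame_h)

def identify_by_position(boxes, frame_w, frame_h, min_fraction=0.5):
    # Treat the candidate as "the object" if its centre lies in the
    # central portion of at least `min_fraction` of the frames in which
    # it appears.
    hits = sum(in_central_region(b, frame_w, frame_h) for b in boxes)
    return hits >= min_fraction * len(boxes)
```

With `min_fraction=1.0` the check corresponds to "all frames", and lower values correspond to "a majority or a predetermined percentage", as described above.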
  • the object is identified by the processor 12 based upon the appearance of the object in a plurality of frames of the video.
  • the number of frames of the video in which the object is identified may be defined in various manners.
  • the processor may identify the object in a predefined number of frames of the video.
  • the user may provide input indicative of the number of frames of the video in which the object should be identified.
  • the processor of an example embodiment may be configured to identify the object in a number of frames of the video that depends upon the length of the video and, as such, the total number of frames in the video.
  • the number of frames of the video in which the object is identified may be proportional to, such as a predetermined percentage of, the total number of frames of the video.
  • the processor of an example embodiment may be configured to identify the object in frames of the video that are spaced apart throughout the length or at least throughout a portion of the video.
  • Although the frames of the video in which the object is identified may be continuous, the frames of the video in which the object is identified may, in an example embodiment, be spaced apart in the time sequence.
  • the frames of the video in which the object is identified may be spaced evenly throughout the video or may be spaced in an uneven manner, such as in an instance in which the object that is identified moves more in one segment of the video than in another segment, such that the object is identified in a greater percentage of the frames during the period of time in which the object is moving to a greater degree than in other portions of the video.
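  • The simplest of the frame-selection strategies described above, a count proportional to the video length with even spacing, can be sketched as follows. This is an illustrative sketch rather than the application's method; the function name and the `fraction` and `minimum` defaults are assumptions.

```python
def select_frame_indices(total_frames, fraction=0.05, minimum=4):
    # Choose how many frames to use: a predetermined percentage of the
    # total, but never fewer than `minimum` (and never more than exist).
    count = max(minimum, int(total_frames * fraction))
    count = min(count, total_frames)
    # Space the chosen indices evenly across the video's time sequence.
    step = total_frames / count
    return [int(i * step) for i in range(count)]
```

An uneven spacing, as described above for segments with more object motion, could be obtained by weighting the step size by a per-segment motion measure instead of keeping it constant.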
  • the apparatus 10 may include means, such as the processor 12 or the like, for identifying a plurality of key frames of the video as shown in block 30 of Figure 2.
  • The processor of this example embodiment may be configured to identify the key frames in various manners. In one embodiment, however, the processor is configured to identify the key frames as those frames that exhibit the most change or the greatest difference from a prior frame. In this regard, the processor of an example embodiment may identify the key frames as those frames of the video that include the object and in which the object has changed or moved to the greatest degree relative to the prior frame of the video. In this embodiment, once the key frames of the video have been identified, the processor may be configured to identify the object that appears in the key frames of the video.
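  • The "greatest difference from a prior frame" criterion above can be sketched with a crude difference metric. This is an assumption-laden illustration: frames are modelled as flat lists of greyscale intensities, and the sum of absolute pixel differences stands in for whatever change measure an implementation would actually use.

```python
def frame_difference(a, b):
    # Sum of absolute pixel differences between two greyscale frames,
    # each represented as a flat list of intensities.
    return sum(abs(x - y) for x, y in zip(a, b))

def identify_key_frames(frames, count):
    # Score each frame by how much it differs from its predecessor,
    # then keep the `count` highest-scoring frames in time order.
    diffs = [(frame_difference(frames[i - 1], frames[i]), i)
             for i in range(1, len(frames))]
    diffs.sort(reverse=True)
    return sorted(i for _, i in diffs[:count])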
  • the apparatus 10 may also include means, such as the processor 12 or the like, for segmenting the object from one or more frames of the video to create one or more segmented objects.
  • the processor may be configured to segment the object that has been identified from the one or more frames of the video, such as the one or more key frames of the video, by separating the object from the background of the frame, thereby isolating the visual representation of the object relative to the remainder of the frame.
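  • The figure/ground separation described above can be illustrated with a toy sketch that thresholds each pixel against a background frame, marking background pixels as transparent. This is a deliberately simplified stand-in; a practical implementation would use a proper segmentation or matting algorithm, and the `None`-as-transparent convention and `threshold` value are assumptions of this example.

```python
def segment_object(frame, background, threshold=10):
    # Keep pixels that differ from the background frame by more than
    # `threshold`; mark everything else as transparent (None), thereby
    # isolating the object from the remainder of the frame.
    return [pix if abs(pix - bg) > threshold else None
            for pix, bg in zip(frame, background)]
```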
  • the apparatus 10 may also include means, such as the processor 12, the user interface 16 or the like, for constructing a three-dimensional visual representation of the video. See block 36 of Figure 2.
  • By way of example, and as shown in Figure 3, the three-dimensional visual representation of the video may be presented upon the display of a mobile terminal 40, e.g., a smartphone.
  • the three-dimensional visual representation includes a representation 44 of at least one frame of the video that includes the object.
  • the three-dimensional visual representation also includes the one or more segmented objects 46 positioned in a time sequence relative to the at least one frame of the video.
  • the object that has been identified in Figure 3 is a skateboarder, including both the person and the skateboard.
  • the three-dimensional visual representation of the video that is constructed in this example embodiment includes a representation 44 of at least one frame of the video as shown on the right hand side of the three-dimensional visual representation as well as a plurality, e.g., four, segmented objects 46.
  • The segmented objects are from different frames of the video and are positioned in time sequence relative to one another and relative to the frame of the video for which a representation 44 is provided.
  • the time sequence in which the segmented objects are positioned in the three-dimensional visual representation follows the same order in which the frames appeared within the video.
• the time sequence may be represented in various manners by the three-dimensional visual representation of the video.
  • the three-dimensional visual representation of the example embodiment of Figure 3 is such that the time sequence extends from an earlier time on the left to a later time on the right with a segmented object extracted from an earlier frame of the video being positioned to the left of a segmented object from a later frame.
• the three-dimensional visual representation may be constructed by the processor 12 such that the segmented objects 46 are not only positioned in a time sequence, such as along a horizontal axis, but are also generally aligned, such as along a vertical axis, with one another.
• while the segmented objects are spaced from one another along the horizontal axis so as to indicate the time sequence of the frames of the video from which the segmented objects were identified, the segmented objects are positioned vertically in general alignment with one another as well as in general alignment with the object in the representation 44 of the at least one frame of the video.
  • the three-dimensional visual representation of the video illustrates changes in or movement by the segmented object over the course of time and, as such, provides a succinct summary of the video in a manner that is understandable by and intuitive to a viewer.
  • the identification of the object and the segmentation of the object from one or more frames of the video by the processor permit the three-dimensional visual representation of the video to be constructed in an efficient manner.
  • the apparatus may include means, such as the processor 12, for constructing a box having at least three sides. See block 50 of Figure 4.
  • a representation 44 of the at least one frame of the video that includes the object defines an end of the box, such as the right hand side of the box in the embodiment of Figure 3.
  • the representation of the at least one frame of the video that includes the object that defines the end of the box may also be presented in perspective as shown in Figure 3.
  • the box of this example embodiment may also include first and second sides extending from the end of the box.
  • the first and second sides may include the back side and the bottom side of the box.
• the box that is constructed to provide the three-dimensional visual representation of the video may have other orientations and, as such, the first and second sides may comprise different sides of the box in other embodiments.
  • the apparatus 10 of an example embodiment may include means, such as the processor 12, the user interface 16 or the like, for causing the first and second sides that extend from the end of the box to have a color that is at least partially defined by the representation 44 of the at least one frame of the video that includes the object.
  • reference to color may include both gray levels as well as other colors.
• the colors of the first and second sides are at least partially defined by the color of the pixels along the edge of the representation of the at least one frame of the video.
  • the colors of the first and second sides of the box may be the same as or similar to the colors of the pixels along the edge of the representation of the at least one frame of the video.
  • the colors of the pixels along the edge of the representation of the at least one frame of the video may extend in horizontal stripes such that the colors of the first and second sides appear to be a continuation of the colors that appear along the adjacent edges, that is, the left side edge and the bottom edge, of the representation of the frame of the video that defines the end of the box.
  • the first and second sides of the box of this example embodiment appear related to and, to some degree, a continuation of the background of the representation of the at least one frame of the video that defines the end of the box.
  • the colors of the first and second sides of the box of this example embodiment are relatively continuous from the left side of the box to the right side of the box so as not to distract from the segmented object.
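The coloring of the box sides from the edge pixels of the end frame may be illustrated by a small sketch: each pixel along an edge of the frame is extended into a horizontal stripe, so that the side appears as a continuation of the frame's background. Frame pixels are assumed here to be RGB tuples, and the function name is illustrative:

```python
def side_from_edge(frame_rgb, depth):
    # Extend each pixel of the frame's left edge into a stripe `depth`
    # pixels deep, producing a side that continues the frame's colors
    # and does not distract from the overlaid segmented objects.
    return [[row[0]] * depth for row in frame_rgb]
```

The bottom side could be generated analogously from the bottom edge row of the frame.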
  • the apparatus 10 of this example embodiment may also include means, such as the processor 12, the user interface 16 or the like, for overlaying the one or more segmented objects 46 at least partially on at least one of the first and second sides of the box.
• the position of the segmented object along the vertical axis, as defined by the position of the object in the frames of the video and the representation 44 of the frame of the video that defines the end of the box, is such that the segmented object overlays a portion of both the first and second sides of the box.
  • the segmented objects may be spaced apart along the length of the first and second sides in the same time sequence in which the frames from which segmented objects were identified appeared in the video.
  • the segmented objects may be spaced evenly across the width of the three-dimensional visual representation or the segmented objects may be overlaid with different spacing therebetween, such as in instances in which the frames from which the segmented objects were identified were not evenly spaced throughout the video.
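The spacing described above may be sketched as mapping each source-frame index onto the width of the box side, so that objects extracted from unevenly spaced frames are placed with correspondingly uneven spacing; this is an illustrative helper, not the disclosed implementation:

```python
def overlay_positions(frame_indices, total_frames, side_width):
    # Place each segmented object at a horizontal offset proportional to
    # where its source frame fell within the video's duration.
    return [round(i / (total_frames - 1) * side_width) for i in frame_indices]
```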
  • the user may interact with the three-dimensional visual representation of the video in order to edit or otherwise modify the three-dimensional visual representation.
• the user may zoom in or zoom out relative to the three-dimensional visual representation in order to modify the three-dimensional visual representation to add or subtract segmented objects 46, respectively, so as to create fast motion and slow motion effects, respectively.
• the apparatus 10 may include means, such as the processor 12, the user interface 16 or the like, for receiving an input, such as from a user, indicating that the three-dimensional visual representation should be zoomed in or zoomed out.
• the apparatus may be configured to receive various types of inputs indicative of zooming in or zooming out.
• the apparatus, such as the processor, of an example embodiment may receive a pinching gesture, such as a pinching out gesture for zooming in and a pinching in gesture for zooming out.
• the apparatus 10 may include means, such as the processor 12 or the like, for determining the type of zooming to be provided based upon the input as shown in block 62 of Figure 5.
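Determining the type of zooming from a pinching gesture can be reduced to comparing the distance between the two touch points at the start and end of the gesture; the sketch below assumes touch points supplied as (x, y) pairs and is illustrative only:

```python
from math import hypot

def classify_pinch(start_points, end_points):
    # Fingers moving apart (pinching out) request zoom in; fingers moving
    # together (pinching in) request zoom out.
    def distance(points):
        (x1, y1), (x2, y2) = points
        return hypot(x2 - x1, y2 - y1)
    return "zoom_in" if distance(end_points) > distance(start_points) else "zoom_out"
```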
  • Figure 6A depicts a pinching out gesture in which the user places two fingers at an initial location 70 and then moves one or both fingers outwardly relative to the other finger to a subsequent position 74.
• the apparatus, such as the processor, the user interface 16 or the like, may be configured to cause the three-dimensional visual representation of the video to be spread in the same outward direction that the user's fingers were moved.
  • the resulting three-dimensional visual representation includes additional space between the segmented objects 46.
• the apparatus 10 may include means, such as the processor 12, the user interface 16 or the like, for supplementing the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
  • the processor may be configured to determine that the additional space between segmented objects is larger than a segmented object itself.
  • the processor of this example embodiment may be configured to identify and segment the object from another frame of the video and to insert the additionally segmented object in the three-dimensional visual representation, thereby resulting in a fast motion effect as shown in Figure 6C.
  • each segmented object is identified and extracted from a respective frame of the video.
  • the processor in conjunction with the supplementation of the three-dimensional visual representation, may be configured to identify additional segmented objects from frames that appear in the video between the frames from which the adjacent segmented objects were extracted.
  • the segmented object that is second to the left may have been added to the three-dimensional visual representation as a result of the zooming in resulting from the pinching out gesture of Figures 6A and 6B.
• This additional segmented object may have been extracted by the processor from a frame of the video that is sequentially after the frame of the video from which the leftmost segmented object was extracted and prior to the frame of the video from which the segmented object that is third from the left was extracted, thereby effectively interpolating between the segmented objects.
• the frames from which the additional segmented objects are extracted in order to supplement the three-dimensional visual representation may be key frames of the video that have been identified as described above or may be frames of the video that are selected in other manners, such as frames of the video that are selected at a predetermined equal spacing throughout the length of the video.
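The interpolation of additional segmented objects between adjacent ones may be sketched, under the simplifying assumption that objects are identified by the indices of their source frames, as inserting the midpoint frame between each adjacent pair; the function name and midpoint rule are assumptions for illustration:

```python
def supplement(frame_indices):
    # Between each pair of adjacent source frames, add the frame midway
    # between them, doubling the temporal density of segmented objects.
    result = []
    for a, b in zip(frame_indices, frame_indices[1:]):
        result.extend([a, (a + b) // 2])
    result.append(frame_indices[-1])
    return result
```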
  • the three-dimensional visual representation may be zoomed in, in response to a pinching out gesture.
  • the input may indicate that the three-dimensional visual representation should be zoomed out, such as in response to a pinching in gesture.
  • the apparatus 10 may include means, such as the processor 12, the user interface 16 or the like, for reducing the number of segmented objects included in the three-dimensional visual representation so as to produce a slow motion effect in response to the pinching in gesture. See block 66 of Figure 5.
  • the processor of this example embodiment may determine the segmented objects 46 to be removed from the three-dimensional visual representation in various manners including by removing a number of segmented objects sufficient to prevent overlap of the remaining segmented objects while still providing a representation of the manner in which the segmented object changes over the course of the video.
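One plausible removal rule, consistent with preventing overlap of the remaining objects, is a greedy pass that keeps an object only if it lies at least one object-width away from the last object kept; this rule is an assumption for illustration, not the disclosed criterion:

```python
def reduce_objects(positions, object_width):
    # Greedily keep objects left to right, dropping any that would overlap
    # the most recently kept one.
    kept = [positions[0]]
    for p in positions[1:]:
        if p - kept[-1] >= object_width:
            kept.append(p)
    return kept
```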
• the apparatus 10, such as the processor 12, may be configured to segment the object from a frame of the video by removing the object from the frame after having separated the object from the background of the frame.
  • the segmented object 46 may be included within the three-dimensional visual representation of the video without the distraction or noise associated with the background of the respective frame.
  • the processor may initially segment the object in such a manner that the segmented object not only includes the object itself, but also a portion of the background.
  • the second and third segmented objects from the left include not only the object, that is, the skateboarder, but also a portion of the background of the respective frames from which the segmented objects were extracted.
  • the segmented object that is second to the left includes a portion of the background of the original frame beneath the left arm of the skateboarder and between the legs of the skateboarder, while the segmented object that is third from left includes a portion of the original background between the legs of the skateboarder.
  • the apparatus may include means, such as the processor, the user interface 16 or the like, for responding to user input in order to remove at least some of the background from the original frame and to further define the segmented object so as to include only the object and none of the background or at least a smaller amount of the background from the original frame.
• a segmented object 46 may initially be selected, such as in response to a user touching a segmented object as shown at 76 in Figure 7.
• the apparatus 10, such as the processor 12, the user interface 16 or the like, may be configured to present the original frame from which the segmented object was extracted, along with the segmented object, as shown in Figure 8.
  • the background of the frame that is not included in the segmented object is lighter or more faded than the segmented object.
  • the portions of the background that are included in the segmented object are shown more brightly, as is the object itself.
• the apparatus, such as the processor, the user interface or the like, is responsive to user input, such as placement and movement of the user's finger, upon that portion of the segmented object that constitutes the background of the original frame.
• the portion of the segmented object contacted by the user is removed from the segmented object, thereby permitting the user to rub away an unwanted portion of the segmented object, such as the portion of the background of the original frame that is initially included within the segmented object.
  • the segmented image may be considered by the processor to be comprised of superpixels.
  • a superpixel is a group of spatially connected pixels that have a similar color and/or texture.
  • the processor of this embodiment may respond more efficiently to the user input by removing each superpixel that is contacted by the user during the editing operation, as opposed to operating on a per pixel basis.
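Per-superpixel erasing may be sketched as follows, assuming a precomputed superpixel label map (e.g., from a SLIC-style algorithm) aligned with the segment mask; the function names are illustrative:

```python
def build_superpixels(labels):
    # Group pixel coordinates by superpixel label.
    groups = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            groups.setdefault(label, []).append((x, y))
    return groups

def erase_superpixel(mask, labels, touch_x, touch_y):
    # Clear from the segment mask every pixel belonging to the superpixel
    # under the user's touch, so a single rub removes a whole coherent
    # patch rather than a single pixel.
    target = labels[touch_y][touch_x]
    for x, y in build_superpixels(labels)[target]:
        mask[y][x] = 0
    return mask
```

Operating on superpixels rather than individual pixels is what lets the processor respond to a touch with a single group removal.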
  • the user may provide input to remove the portion of the background of the original frame that is under the arm of the skateboarder, thereby removing that portion of the background from the segmented object.
• the user may provide comparable input to remove the portion of the background of the original frame that is between the skateboarder's legs that was also included as part of the original segmented object.
  • the resulting segmented object following removal of the background of the original frame is shown in Figure 10 and may, in turn, be incorporated in the three-dimensional visual representation, such as shown in Figure 3.
  • the apparatus 10 may also be configured to respond to the user input associated with one of the segmented objects that causes a portion of the background of the original frame to be removed from the segmented object to similarly modify all other segmented objects, or at least all other segmented objects identified by the user, in a comparable manner.
• the removal of a portion of the background of the original frame from between the skateboarder's legs in the segmented object that is second to the left may also cause the corresponding portion of the background of the frame from which the segmented object that is third from the left was extracted to similarly be automatically removed without further user input, thereby increasing the efficiency with which the three-dimensional visual representation may be edited.
  • the method, apparatus 10 and computer program product of an example embodiment may construct a visual representation of a video including one or more segmented objects from the frames of the video.
  • the visual representation of the video that is constructed in accordance with an example embodiment permits the segmented objects from the frames of the video to be presented in a manner that is representative of the video and that provides context for the segmented objects, such as in terms of the time sequence of the segmented object relative to one another and to a representation of at least one frame of the video.
• the visual representation of the video may provide an alternative representation of the video that may be reviewed more quickly and may be stored and transferred more efficiently than the video itself.
  • the method, apparatus and computer program product of an example embodiment may construct the visual representation of the video including one or more segmented objects from frames of the video in an efficient manner by limiting the user involvement that is required in comparison to other techniques in which a user undergoes a time consuming process to identify one or more particular frames of a video that are then presented in a manner that is representative of the video.
  • Figures 2, 4 and 5 illustrate flowcharts of an apparatus 10, method and computer program product according to example embodiments of the invention. It will be understood that each block of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by various means, such as hardware, firmware, processor, circuitry, and/or other communication devices associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device 14 of an apparatus employing an embodiment of the present invention and executed by a processor 12 of the apparatus.
  • any such computer program instructions may be loaded onto a computer or other programmable apparatus (for example, hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the flowchart blocks.
  • These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture the execution of which implements the function specified in the flowchart blocks.
• the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart blocks.
• blocks of the flowcharts support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
  • certain ones of the operations above may be modified or further amplified.
  • additional optional operations may be included, such as indicated by the dashed lines in Figure 2. Modifications, additions, or amplifications to the operations above may be performed in any order and in any combination.

Abstract

A method, apparatus and computer program product are provided in order to construct a visual representation of a video including one or more segmented objects from the frames of the video. In the context of a method, an object that appears in a plurality of frames of a video is identified. The method also includes segmenting the object from one or more frames of the video to create one or more segmented objects. The method further includes constructing a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object. The three-dimensional visual representation also includes the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.

Description

CONSTRUCTING A VISUAL REPRESENTATION OF A VIDEO
TECHNOLOGICAL FIELD
[0001] An example embodiment of the present invention relates generally to the construction of a visual representation of a video and, more particularly, to the construction of a visual representation of a video that includes one or more segmented objects from frames of the video.
BACKGROUND
[0002] Videos consisting of a plurality of time-sequenced frames are frequently captured, stored, transferred and reviewed. Since a video includes a plurality of frames captured over time, it may correspondingly take some time to view a video. Additionally, the size of at least some video files may be quite large, thereby placing increased demands upon memory for storage of the video files and upon the communication resources in order to transfer the video files, such as to a third party.
[0003] In some instances, one or more frames of the video may be identified. These frames may be identified as being representative of the video, as being interesting or noteworthy or for any other purpose.
BRIEF SUMMARY
[0004] A method, apparatus and computer program product may be provided in accordance with an example embodiment in order to construct a visual representation of a video including one or more segmented objects from the frames of the video. The visual representation of the video that is constructed in accordance with an example embodiment permits the segmented objects from the frames of the video to be presented in a manner that is representative of the video and that provides context for the segmented objects.
[0005] In one embodiment, a method is provided that includes identifying an object that appears in a plurality of frames of a video. The method of this embodiment also includes segmenting the object from one or more frames of the video to create one or more segmented objects. The method of this example embodiment also includes constructing a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object. The three-dimensional visual representation also includes the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
[0006] The method of an example embodiment may construct the three-dimensional visual representation by constructing a box having at least three sides. In this embodiment, the representation of the at least one frame of the video that includes the object defines an end of the box. The box of an example embodiment also includes first and second sides extending from the end of the box. In this regard, the method may construct a three-dimensional visual
representation by overlaying the one or more segmented objects at least partially on at least one of the first and second sides of the box. In one embodiment, the first and second sides have a color that is at least partially defined by the representation of the at least one frame of the video that includes the object.
[0007] The method of an example embodiment may also include identifying a plurality of key frames of the video. In this embodiment, the method may identify the object by identifying the object in the plurality of key frames. The method of an example embodiment may also include receiving an input indicating that the three-dimensional visual representation should be zoomed in or zoomed out. In response to the input indicating that the three-dimensional visual representation should be zoomed in, the method may also include supplementing the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
[0008] In another embodiment, an apparatus is provided that includes at least one processor and at least one memory including computer program code with the at least one memory and the computer program code configured to, with the processor, cause the apparatus to at least identify an object that appears in the plurality of frames of a video. The at least one memory and the computer program code are also configured to, with the processor, cause the apparatus of this embodiment to segment the object from one or more frames of the video to create one or more segmented objects. The at least one memory and the computer program code are also configured to, with the processor, cause the apparatus of this embodiment to construct a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object. The three-dimensional visual representation includes the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
[0009] The at least one memory and the computer program code may be configured to, with the processor, cause the apparatus of an example embodiment to construct a three-dimensional visual representation by constructing a box having at least three sides. In this embodiment, the representation of the at least one frame of the video that includes the object defines an end of the box. The box may also include first and second sides extending from the end of the box. In this embodiment, the at least one memory and the computer program code may be configured to, with the processor, cause the apparatus to construct the three-dimensional visual representation by overlaying the one or more segmented objects at least partially on at least one of the first and second sides of the box. The first and second sides may have a color that is at least partially defined by the representation of the at least one frame of the video that includes the object.
[0010] The at least one memory and the computer program code may be further configured to, with the processor, cause the apparatus of an example embodiment to identify a plurality of key frames of the video. In this embodiment, the at least one memory and the computer program code may be configured to, with the processor, cause the apparatus to identify the object by identifying the object in a plurality of key frames. The at least one memory and the computer program code may be further configured to, with the processor, cause the apparatus of an example embodiment to receive an input indicating that the three-dimensional visual representation should be zoomed in or zoomed out. In response to the input indicating that the three-dimensional visual representation should be zoomed in, the at least one memory and the computer program code may be further configured to, with the processor, cause the apparatus to supplement the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
[0011] In a further embodiment, a computer program product is provided that includes at least one non-transitory computer-readable storage medium having computer-executable program code portions stored therein with the computer-executable program code portions including program code instructions configured, upon execution, to identify an object that appears in the plurality of frames of the video. The computer-executable program code portions of this embodiment also include program code instructions configured, upon execution, to segment the object from one or more frames of the video to create one or more segmented objects. The computer-executable program code portions of this example embodiment also include program code instructions that are configured, upon execution, to construct a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object. The three-dimensional visual representation includes one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
[0012] The program code instructions configured to construct a three-dimensional visual representation may include program code instructions configured to construct a box having at least three sides. In this embodiment, the representation of the at least one frame of the video that includes the object defines an end of the box. The box may also include first and second sides extending from the end of the box. In this embodiment, the program code instructions configured to construct the three-dimensional visual representation may include program code instructions configured to overlay the one or more segmented objects at least partially on at least one of the first and second sides of the box. In an example embodiment, the first and second sides may have a color that is at least partially defined by the representation of the at least one frame of the video that includes the object.
[0013] The computer-executable program code portions of an example embodiment may also include program code instructions configured to identify a plurality of key frames of the video. The program code instructions that are configured to identify the object in this embodiment may include program code instructions configured to identify the object in the plurality of key frames. The computer-executable program code portions of an example embodiment may also include program code instructions configured to receive an input indicating that the three-dimensional visual representation should be zoomed in or zoomed out. In response to the input indicating that the three-dimensional visual representation should be zoomed in, the computer-executable program code portions may also include program code instructions configured to supplement the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
[0014] In yet another embodiment, an apparatus is provided that includes means for identifying an object that appears in a plurality of frames of the video. The apparatus of this example embodiment also includes means for segmenting the object from one or more frames of the video to create one or more segmented objects. In this example embodiment, the apparatus further includes means for constructing a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object. The three-dimensional visual representation includes the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] Having thus described example embodiments of the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
[0016] Figure 1 is a block diagram of an apparatus that may be specifically configured in accordance with an example embodiment of the present invention;
[0017] Figure 2 is a flowchart illustrating operations performed, such as by the apparatus of Figure 1, in accordance with an example embodiment of the present invention;
[0018] Figure 3 is a three-dimensional visual representation of a video as presented upon a display of a mobile terminal in accordance with an example embodiment of the present invention;
[0019] Figure 4 is a flowchart illustrating operations performed with respect to the construction of a three-dimensional visual representation of a video in accordance with an example embodiment of the present invention;
[0020] Figure 5 is a flowchart illustrating operations performed in conjunction with zooming of a three-dimensional representation of a video in accordance with an example embodiment of the present invention;
[0021] Figures 6A, 6B and 6C are three-dimensional visual representations of a video that are presented in sequence upon a display of a mobile terminal and that illustrate supplementation of the three-dimensional visual representation with one or more additional segmented objects in response to an input indicating that the three-dimensional visual representation should be zoomed in, in accordance with an example embodiment of the present invention;
[0022] Figure 7 is a three-dimensional visual representation presented by a display of a mobile terminal that illustrates the selection of a segmented object in accordance with an example embodiment of the present invention;
[0023] Figure 8 illustrates the frame of the video that includes the segmented object that was selected in Figure 7 and that further illustrates the removal of some background that was initially included with the segmented object in accordance with an example embodiment of the present invention;
[0024] Figure 9 illustrates the frame of the video that includes the segmented object that was selected in Figure 7 and that further illustrates the removal of additional background that was initially included with the segmented object in accordance with an example embodiment of the present invention; and
[0025] Figure 10 illustrates the frame of the video that includes the segmented object that was selected in Figure 7 following the removal of background imagery that was initially included with the segmented object in accordance with an example embodiment of the present invention.
DETAILED DESCRIPTION
[0026] Some embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms "data," "content," "information," and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with
embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
[0027] Additionally, as used herein, the term 'circuitry' refers to (a) hardware-only circuit implementations (for example, implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of 'circuitry' applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
[0028] As defined herein, a "computer-readable storage medium," which refers to a non- transitory physical storage medium (for example, volatile or non-volatile memory device), can be differentiated from a "computer-readable transmission medium," which refers to an
electromagnetic signal.
[0029] A method, apparatus and computer program product are provided in accordance with an example embodiment in order to construct a three-dimensional visual representation of a video. In this regard, the method, apparatus and computer program product may construct a three-dimensional visual representation of a video that includes both a representation of at least one frame of the video as well as one or more segmented objects positioned in a time sequence relative to the at least one frame of the video. As such, the three-dimensional visual
representation of the video may provide a useful and intuitive summary of the video that may be quickly reviewed and that may be stored and/or transmitted in a more efficient manner relative to the video itself. Moreover, the method, apparatus and computer program product of an example embodiment may construct the three-dimensional visual representation of the video in an efficient manner without requiring extensive input or involvement by a user.
[0030] One example of an apparatus 10 that may be specifically configured in order to construct a three-dimensional visual representation of a video in accordance with an example embodiment is depicted in Figure 1. The apparatus may be embodied by or associated with a variety of computing devices including, for example, a variety of video recording and/or playback devices. In this regard, the apparatus may be embodied by or otherwise associated with a mobile terminal, such as a personal digital assistant (PDA), mobile telephone, smartphone, companion device, for example, a smart watch, pager, mobile television, gaming device, laptop computer, camera, tablet computer, touch surface, video recorder, audio/video player, radio, electronic book, positioning device (for example, global positioning system (GPS) device), or any combination of the aforementioned, and other types of voice and text communications systems. Alternatively, the apparatus may be embodied by or associated with a fixed computing device, such as a computer workstation, a personal computer, a server or the like.
[0031] Regardless of the manner in which the apparatus 10 is embodied, the apparatus of the embodiment of Figure 1 may include or otherwise be in communication with a processor 12, a memory device 14, a user interface 16 and optionally a communication interface 18 and/or a camera 20. In some embodiments, the processor (and/or co-processors or any other processing circuitry assisting or otherwise associated with the processor) may be in communication with the memory device via a bus for passing information among components of the apparatus. The memory device may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories. In other words, for example, the memory device may be an electronic storage device (for example, a computer readable storage medium) comprising gates configured to store data (for example, bits) that may be retrievable by a machine (for example, a computing device like the processor). The memory device may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present invention. For example, the memory device could be configured to buffer input data for processing by the processor.
Additionally or alternatively, the memory device could be configured to store instructions for execution by the processor.
[0032] As noted above, the apparatus 10 may be embodied by a computing device, such as, for example, a variety of video recording and/or playback devices. However, in some embodiments, the apparatus may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (for example, chips) including materials, components and/or wires on a structural assembly (for example, a circuit board). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single "system on a chip." As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
[0033] The processor 12 may be embodied in a number of different ways. For example, the processor may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. As such, in some embodiments, the processor may include one or more processing cores configured to perform independently. A multi-core processor may enable multiprocessing within a single physical package. Additionally or alternatively, the processor may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
[0034] In an example embodiment, the processor 12 may be configured to execute instructions stored in the memory device 14 or otherwise accessible to the processor.
Alternatively or additionally, the processor may be configured to execute hard coded
functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor may represent an entity (for example, physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor is embodied as an ASIC, FPGA or the like, the processor may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processor is embodied as an executor of software instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor may be a processor of a specific device (for example, the client device 10 and/or a network entity) configured to employ an embodiment of the present invention by further configuration of the processor by instructions for performing the algorithms and/or operations described herein. The processor may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor.
[0035] The apparatus 10 of the illustrated embodiment also includes a user interface 16. The user interface, such as a display, may be in communication with the processor 12 to provide output to the user and, in some embodiments, to receive an indication of a user input, such as in an instance in which the user interface includes a touch screen display. In some embodiments, the user interface may also include a keyboard, a mouse, a joystick, touch areas, soft keys, one or more microphones, a plurality of speakers, or other input/output mechanisms. In an example embodiment, the processor may comprise user interface circuitry configured to control at least some functions of one or more user interface elements such as a display and, in some embodiments, a plurality of speakers, a ringer, one or more microphones and/or the like. The processor and/or user interface circuitry comprising the processor may be configured to control one or more functions of one or more user interface elements through computer program instructions (for example, software and/or firmware) stored on a memory accessible to the processor (for example, memory device 14, and/or the like).
[0036] The apparatus 10 of the illustrated embodiment may also optionally include a communication interface 18 that may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive data from and/or transmit data to a communications device in communication with the apparatus. In this regard, the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s). In some environments, the communication interface may alternatively or also support wired communication. As such, for example, the communication interface may include a communication modem and/or other hardware and/or software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
[0037] Although the video may be stored in memory 14 or received from another device via the communication interface 18, the apparatus 10 of some embodiments may include or be associated with a camera 20 that is in communication with the processor 12. The camera may be any means for capturing a video for storage, display or transmission including, for example, an imaging sensor. For example, the camera may include a digital camera including an imaging sensor capable of forming a digital video file from a plurality of captured images. As such, the camera may include all hardware, such as a lens, an imaging sensor and/or other optical device(s), and software necessary for creating a digital video file from a plurality of captured images. Alternatively, the camera may include only the hardware needed to view a video, while the memory stores instructions for execution by the processor in the form of software necessary to create a digital video file from a plurality of captured images. In an example embodiment, the camera may further include a processing element such as a co-processor which assists the processor in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to a predefined format, such as a JPEG standard format. The video that is captured may be stored for future viewing and/or manipulation in the memory of the apparatus and/or in a memory external to the apparatus, such as the memory of a video playback system.
[0038] Referring now to Figure 2, the operations performed, such as by the apparatus 10 of Figure 1, in order to construct a three-dimensional visual representation of a video in accordance with an example embodiment are illustrated. The video comprised of a plurality of frames ordered in a time sequence may have been captured by the camera 20. Alternatively, the video may be stored by memory 14 or be received by the communication interface 18 from a third party. Regardless of the origin of the video, as shown in block 32 of Figure 2, the apparatus may include means, such as the processor 12 or the like, for identifying an object that appears in a plurality of frames of the video. The object may be identified in various manners. For example, the object may be identified based upon input by a user who, for example, may touch or otherwise select the object within a frame of the video. Alternatively, the object may be identified by the processor in accordance with other criteria. For example, the processor may be configured to identify the object based upon a determination across a plurality of frames that the object has moved relative to the background. Still further, the processor may be configured to identify the object based upon a relative position of the object within the frames of the video, such as by identifying the object based upon the location of the object in a central portion of all, a majority, or at least a predetermined percentage of the frames of the video.
[0039] The object is identified by the processor 12 based upon the appearance of the object in a plurality of frames of the video. The number of frames of the video in which the object is identified may be defined in various manners. For example, the processor may identify the object in a predefined number of frames of the video. Alternatively, the user may provide input indicative of the number of frames of the video in which the object should be identified. Still further, the processor of an example embodiment may be configured to identify the object in a number of frames of the video that depends upon the length of the video and, as such, the total number of frames in the video. For example, the number of frames of the video in which the object is identified may be proportional to, such as a predetermined percentage of, the total number of frames of the video. Regardless of the number of frames of the video in which the object is identified, the processor of an example embodiment may be configured to identify the object in frames of the video that are spaced apart throughout the length or at least throughout a portion of the video. Thus, although the frames of the video in which the object is identified may be consecutive, the frames of the video in which the object is identified may, in an example embodiment, be spaced apart in the time sequence. In this regard, the frames of the video in which the object is identified may be spaced evenly throughout the video or may be spaced in an uneven manner, such as in an instance in which the object that is identified moves more in one segment of the video than in another segment, such that the object is identified in a greater percentage of the frames during the period of time in which the object is moving to a greater degree than in other portions of the video.
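By way of a non-limiting illustration only, and not as part of the claimed subject matter, the frame-selection policy described above may be sketched as follows. The proportion value and the motion-weighting heuristic are assumptions introduced purely for illustration:

```python
def select_frames(total_frames, proportion=0.05, motion=None):
    """Choose indices of frames in which to identify the object.

    The number of frames is proportional to the video length; if a
    per-frame motion score is supplied, sampling is biased toward
    segments in which the object moves to a greater degree.
    """
    count = max(2, int(total_frames * proportion))
    if motion is None:
        # Spaced evenly throughout the time sequence.
        step = (total_frames - 1) / (count - 1)
        return [round(i * step) for i in range(count)]
    # Weight sampling by cumulative motion so that high-motion
    # segments contribute proportionally more frames.
    total = sum(motion)
    bucket = total / count
    target, indices, acc = bucket / 2, [], 0.0
    for idx, m in enumerate(motion):
        acc += m
        if acc >= target and len(indices) < count:
            indices.append(idx)
            target += bucket
    return indices
```

For a 100-frame video with a 5% proportion, the even policy yields five indices spread from the first frame to the last; supplying a motion profile concentrates the indices where the motion scores are largest.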
[0040] In one example embodiment, the apparatus 10 may include means, such as the processor 12 or the like, for identifying a plurality of key frames of the video as shown in block 30 of Figure 2. The processor of this example embodiment may be configured to identify the key frames in various manners. In one embodiment, however, the processor is configured to identify the key frames as those frames that exhibit the most change or the greatest difference from a prior frame. In this regard, the processor of an example embodiment may identify the key frames as those frames of the video that include the object and in which the object has changed or moved to the greatest degree relative to the prior frame of the video. In this embodiment, once the key frames of the video have been identified, the processor may be configured to identify the object that appears in the key frames of the video.
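A non-limiting sketch of the key-frame selection above follows; the use of mean absolute per-pixel difference as the change metric is an illustrative assumption, as the disclosure does not fix a particular measure of change:

```python
def pick_key_frames(frames, count):
    """Identify key frames as those differing most from the prior frame.

    `frames` is a list of equal-length pixel sequences; the change
    score is the mean absolute per-pixel difference from the previous
    frame (one possible measure of "greatest change").
    """
    scores = []
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))
        scores.append((diff / len(frames[i]), i))
    # Keep the `count` frames exhibiting the greatest change, restored
    # to their original time order.
    top = sorted(scores, reverse=True)[:count]
    return sorted(i for _, i in top)
```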
[0041] As shown in block 34 of Figure 2, the apparatus 10 may also include means, such as the processor 12 or the like, for segmenting the object from one or more frames of the video to create one or more segmented objects. In this regard, the processor may be configured to segment the object that has been identified from the one or more frames of the video, such as the one or more key frames of the video, by separating the object from the background of the frame, thereby isolating the visual representation of the object relative to the remainder of the frame.
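The segmentation of block 34 may be illustrated, again without limitation, by the following naive background-difference sketch. Real segmentation would be considerably more involved; the reference background and threshold are assumptions introduced only to show the isolation of the object from the remainder of the frame:

```python
def segment_object(frame, background, threshold=30):
    """Separate the object from the background of a frame.

    Any pixel differing from a reference background image by more
    than `threshold` is kept as part of the segmented object;
    every other pixel is made transparent (None), isolating the
    visual representation of the object.
    """
    return [
        [pix if abs(pix - bg) > threshold else None
         for pix, bg in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]
```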
[0042] The apparatus 10 may also include means, such as the processor 12, the user interface 16 or the like, for constructing a three-dimensional visual representation of the video. See block 36 of Figure 2. As shown in Figure 3, in which a mobile terminal 40, e.g., a smartphone, that embodies the apparatus includes a touch screen display 42, the three-dimensional visual representation of the video may be presented upon the display. The three-dimensional visual representation includes a representation 44 of at least one frame of the video that includes the object. The three-dimensional visual representation also includes the one or more segmented objects 46 positioned in a time sequence relative to the at least one frame of the video.
[0043] By way of example, the object that has been identified in Figure 3 is a skateboarder, including both the person and the skateboard. The three-dimensional visual representation of the video that is constructed in this example embodiment includes a representation 44 of at least one frame of the video, as shown on the right hand side of the three-dimensional visual representation, as well as a plurality, e.g., four, of segmented objects 46. The segmented objects are from different frames of the video and are positioned in time sequence relative to one another and relative to the frame of the video for which a representation is included. In this regard, the time sequence in which the segmented objects are positioned in the three-dimensional visual representation follows the same order in which the frames appeared within the video.
Although the time sequence may be represented in various manners by the three-dimensional visual representation of the video, the three-dimensional visual representation of the example embodiment of Figure 3 is such that the time sequence extends from an earlier time on the left to a later time on the right with a segmented object extracted from an earlier frame of the video being positioned to the left of a segmented object from a later frame.
[0044] Additionally, the three-dimensional visual representation may be constructed by the processor 12 such that the segmented objects 46 are not only positioned in a time sequence, such as along a horizontal axis, but are also generally aligned, such as along a vertical axis, with one another. In this regard, while the segmented objects are spaced from one another along the horizontal axis so as to indicate the time sequence of the frames of the video from which the segmented objects were identified, the segmented objects are positioned vertically in general alignment with one another as well as in general alignment with the object in the representation 44 of the at least one frame of the video. As shown in Figure 3, the three-dimensional visual representation of the video illustrates changes in or movement by the segmented object over the course of time and, as such, provides a succinct summary of the video in a manner that is understandable by and intuitive to a viewer. Moreover, the identification of the object and the segmentation of the object from one or more frames of the video by the processor permit the three-dimensional visual representation of the video to be constructed in an efficient manner.
[0045] Although the three-dimensional visual representation may be constructed in various manners, the operations performed, such as by the apparatus 10 of Figure 1, in order to construct the three-dimensional visual representation of the video in accordance with one example embodiment are shown in Figure 4. In this regard, the apparatus may include means, such as the processor 12, for constructing a box having at least three sides. See block 50 of Figure 4. In this embodiment, a representation 44 of the at least one frame of the video that includes the object defines an end of the box, such as the right hand side of the box in the embodiment of Figure 3. In order to construct the box and to permit the box representative of the three-dimensional visual representation of the video to be viewed in perspective, the representation of the at least one frame of the video that includes the object that defines the end of the box may also be presented in perspective as shown in Figure 3. The box of this example embodiment may also include first and second sides extending from the end of the box. In the example of Figure 3, the first and second sides may include the back side and the bottom side of the box. However, the box that is constructed to provide the three-dimensional visual representation of the video may have other orientations and, as such, the first and second sides may comprise different sides of the box in other embodiments.
[0046] As shown in block 52 of Figure 4, the apparatus 10 of an example embodiment may include means, such as the processor 12, the user interface 16 or the like, for causing the first and second sides that extend from the end of the box to have a color that is at least partially defined by the representation 44 of the at least one frame of the video that includes the object. In this regard, reference to color may include both gray levels as well as other colors. In one example embodiment, the colors of the first and second sides are at least partially defined by the color of the pixels along the edge of the representation of the at least one frame of the video. In this regard, the colors of the first and second sides of the box may be the same as or similar to the colors of the pixels along the edge of the representation of the at least one frame of the video. With regard to the orientation of the box depicted in Figure 3, the colors of the pixels along the edge of the representation of the at least one frame of the video may extend in horizontal stripes such that the colors of the first and second sides appear to be a continuation of the colors that appear along the adjacent edges, that is, the left side edge and the bottom edge, of the representation of the frame of the video that defines the end of the box. Thus, the first and second sides of the box of this example embodiment appear related to and, to some degree, a continuation of the background of the representation of the at least one frame of the video that defines the end of the box. Moreover, the colors of the first and second sides of the box of this example embodiment are relatively continuous from the left side of the box to the right side of the box so as not to distract from the segmented object.
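A non-limiting sketch of the side-coloring of block 52 follows, treating each frame as a grid of pixel values. The simple stripe-extrusion scheme, in which each side continues the edge pixels of the end frame, is one possible reading of the description above:

```python
def build_box_sides(end_frame, depth):
    """Color the two sides of the box from the end frame's edge pixels.

    The back side continues each row's left-edge pixel, and the
    bottom side continues the bottom row of pixels, for `depth`
    steps, so both sides read as a continuation of the colors along
    the adjacent edges of the frame that defines the end of the box.
    """
    back = [[row[0]] * depth for row in end_frame]        # left edge -> horizontal stripes
    bottom = [list(end_frame[-1]) for _ in range(depth)]  # bottom edge, repeated
    return back, bottom
```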
[0047] As shown in block 54 of Figure 4, the apparatus 10 of this example embodiment may also include means, such as the processor 12, the user interface 16 or the like, for overlaying the one or more segmented objects 46 at least partially on at least one of the first and second sides of the box. For example, in the embodiment of Figure 3, the position of the segmented object along the vertical axis, as defined by the position of the object in the frames of the video and the representation 44 of the frame of the video that defines the end of the box, is such that the segmented object overlays a portion of both the first and second sides of the box. The segmented objects may be spaced apart along the length of the first and second sides in the same time sequence in which the frames from which segmented objects were identified appeared in the video. The segmented objects may be spaced evenly across the width of the three-dimensional visual representation or the segmented objects may be overlaid with different spacing therebetween, such as in instances in which the frames from which the segmented objects were identified were not evenly spaced throughout the video.
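The overlay spacing of block 54 may be illustrated, again without limitation, by mapping each segmented object's source-frame index to an offset along the box side. The linear mapping is an illustrative assumption; it simply preserves the time sequence, so unevenly spaced source frames yield unevenly spaced segmented objects:

```python
def object_positions(frame_indices, total_frames, side_length):
    """Map each segmented object's source-frame index to a
    horizontal offset along the box side, preserving the time
    sequence in which the frames appeared in the video."""
    return [round(idx / (total_frames - 1) * side_length)
            for idx in sorted(frame_indices)]
```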
[0048] The user may interact with the three-dimensional visual representation of the video in order to edit or otherwise modify the three-dimensional visual representation. For example, the user may zoom in or zoom out relative to the three-dimensional visual representation in order to modify the three-dimensional visual representation to add or subtract segmented objects 46, respectively, so as to create fast motion and slow motion effects, respectively. In this example embodiment and as depicted in Figure 5, the apparatus 10 may include means, such as the processor 12, the user interface 16 or the like, for receiving an input, such as from a user, indicating that the three-dimensional visual representation should be zoomed in or zoomed out. Although the apparatus may be configured to receive various types of inputs indicative of zooming in or zooming out, the apparatus, such as the processor, of an example embodiment may receive a pinching gesture, such as a pinching out gesture for zooming in and a pinching in gesture for zooming out.
[0049] In this embodiment, the apparatus 10 may include means, such as the processor 12 or the like, for determining the type of zooming to be provided based upon the input as shown in block 62 of Figure 5. By way of example, Figure 6A depicts a pinching out gesture in which the user places two fingers at an initial location 70 and then moves one or both fingers outwardly relative to the other finger to a subsequent position 74. In response to the pinching out gesture, the apparatus, such as the processor, the user interface 16 or the like, may be configured to cause the three-dimensional visual representation of the video to be spread in the same outward direction that the user's fingers were moved. By spreading the three-dimensional visual representation, such as in a horizontal direction in the embodiment of Figure 6B, the resulting three-dimensional visual representation includes additional space between the segmented objects 46.
[0050] In this example embodiment in which the user input, such as the pinching out gesture, is indicative of an intent to zoom in, the apparatus 10 may include means, such as the processor 12, the user interface 16 or the like, for supplementing the three-dimensional visual
representation with one or more additional segmented objects 46 positioned in the time sequence relative to other ones of the segmented objects. See block 64 of Figure 5. In the example embodiment of Figure 6B, the processor may be configured to determine that the additional space between segmented objects is larger than a segmented object itself. As such, the processor of this example embodiment may be configured to identify and segment the object from another frame of the video and to insert the additional segmented object in the three-dimensional visual representation, thereby resulting in a fast motion effect as shown in Figure 6C. As described above, each segmented object is identified and extracted from a respective frame of the video. As such, the processor, in conjunction with the supplementation of the three-dimensional visual representation, may be configured to identify additional segmented objects from frames that appear in the video between the frames from which the adjacent segmented objects were extracted. With reference to Figure 6C, for example, the segmented object that is second from the left may have been added to the three-dimensional visual representation as a result of the zooming in resulting from the pinching out gesture of Figures 6A and 6B. This additional segmented object may have been extracted by the processor from a frame of the video that is sequentially after the frame of the video from which the leftmost segmented object was extracted and prior to the frame of the video from which the segmented object that is third from the left was extracted, thereby effectively interpolating between the segmented objects. Thus, the resulting three-dimensional visual representation remains in time sequence following supplementation.
The frames from which the additional segmented objects are extracted in order to supplement the three-dimensional visual representation may be key frames of the video that have been identified as described above or may be frames of the video that are selected in other manners, such as frames of the video that are selected at a predetermined equal spacing throughout the length of the video.
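The supplementation above may be sketched, without limitation, as the selection of an intermediate source frame between each pair of adjacent segmented objects whose gap has grown too large; the midpoint choice and the gap threshold are illustrative assumptions:

```python
def frames_to_insert(frame_indices, gap_threshold):
    """When zooming in spreads adjacent segmented objects apart,
    choose an intermediate source frame between each adjacent pair,
    keeping the supplemented result in time sequence."""
    extra = []
    for a, b in zip(frame_indices, frame_indices[1:]):
        if b - a > gap_threshold:
            # A frame sequentially after `a` and prior to `b`,
            # effectively interpolating between the segmented objects.
            extra.append((a + b) // 2)
    return sorted(frame_indices + extra)
```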
[0051] As described above in conjunction with Figures 6A-6C, the three-dimensional visual representation may be zoomed in, in response to a pinching out gesture. In contrast, the input may indicate that the three-dimensional visual representation should be zoomed out, such as in response to a pinching in gesture. In this instance, the apparatus 10 may include means, such as the processor 12, the user interface 16 or the like, for reducing the number of segmented objects included in the three-dimensional visual representation so as to produce a slow motion effect in response to the pinching in gesture. See block 66 of Figure 5. The processor of this example embodiment may determine the segmented objects 46 to be removed from the three-dimensional visual representation in various manners including by removing a number of segmented objects sufficient to prevent overlap of the remaining segmented objects while still providing a representation of the manner in which the segmented object changes over the course of the video.
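One non-limiting way to remove just enough segmented objects to prevent overlap of the survivors is a greedy pass over their positions; the minimum-gap criterion is an illustrative assumption:

```python
def thin_objects(positions, min_gap):
    """Drop segmented objects so that the remaining ones do not
    overlap: keep an object only if it sits at least `min_gap`
    from the most recently kept object."""
    kept = []
    for p in positions:
        if not kept or p - kept[-1] >= min_gap:
            kept.append(p)
    return kept
```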
[0052] As described above and as shown in the example of Figure 3, the apparatus 10, such as the processor 12, may be configured to segment the object from a frame of the video by removing the object from the frame after having separated the object from the background of the frame. As such, the segmented object 46 may be included within the three-dimensional visual representation of the video without the distraction or noise associated with the background of the respective frame. In some instances, the processor may initially segment the object in such a manner that the segmented object not only includes the object itself, but also a portion of the background. As shown in Figure 7, for example, the second and third segmented objects from the left include not only the object, that is, the skateboarder, but also a portion of the background of the respective frames from which the segmented objects were extracted. In this regard, the segmented object that is second from the left includes a portion of the background of the original frame beneath the left arm of the skateboarder and between the legs of the skateboarder, while the segmented object that is third from the left includes a portion of the original background between the legs of the skateboarder. In this embodiment, the apparatus may include means, such as the processor, the user interface 16 or the like, for responding to user input in order to remove at least some of the background from the original frame and to further define the segmented object so as to include only the object and none of the background or at least a smaller amount of the background from the original frame.
[0053] In an example embodiment, a segmented object 46 may initially be selected, such as in response to a user touching a segmented object as shown at 76 in Figure 7.
In response to the selection of a segmented object, the apparatus 10, such as the processor 12, the user interface 16 or the like, may be configured to present the original frame from which the segmented object was extracted, along with the segmented object, as shown in Figure 8. In this example embodiment, the background of the frame that is not included in the segmented object is lighter or more faded than the segmented object. However, the portions of the background that are included in the segmented object are shown more brightly, as is the object itself. In this example embodiment, the apparatus, such as the processor, the user interface or the like, is responsive to user input, such as placement and movement of the user's finger, upon that portion of the segmented object that constitutes the background of the original frame. In response to the user input, the portion of the segmented object contacted by the user is removed from the segmented object, thereby permitting the user to rub away an unwanted portion of the segmented object, such as the portion of the background of the original frame that is initially included within the segmented object. In an example embodiment, the segmented image may be considered by the processor to be comprised of superpixels. A superpixel is a group of spatially connected pixels that have a similar color and/or texture. As such, the processor of this embodiment may respond more efficiently to the user input by removing each superpixel that is contacted by the user during the editing operation, as opposed to operating on a per-pixel basis.
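The superpixel-based "rub away" edit described above can be sketched as follows. This is an illustrative sketch only, not the patent's implementation; it assumes a precomputed superpixel label map (for example, from a SLIC-style oversegmentation), and the data layout and function name are assumptions:

```python
# Hedged sketch of per-superpixel "rub away" editing: touching any
# pixel removes the entire superpixel containing it from the object
# mask, rather than operating pixel by pixel.
import numpy as np

def rub_away(object_mask, superpixel_labels, touch_point):
    """Remove from object_mask the whole superpixel containing the
    touched pixel."""
    row, col = touch_point
    label = superpixel_labels[row, col]
    edited = object_mask.copy()
    edited[superpixel_labels == label] = False
    return edited

# 4x4 toy example: two superpixels (labels 0 and 1); touching a pixel
# in superpixel 1 clears every pixel with that label from the mask.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]])
mask = np.ones((4, 4), dtype=bool)
edited = rub_away(mask, labels, (0, 2))
print(edited.sum())  # 8 pixels remain (superpixel 0 only)
```

Because a single touch clears a whole group of similar, spatially connected pixels, the edit is both faster to compute and easier for the user than erasing individual pixels.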
[0054] As shown in Figure 8, the user may provide input to remove the portion of the background of the original frame that is under the arm of the skateboarder, thereby removing that portion of the background from the segmented object. As shown in Figure 9, the user may provide comparable input to remove the portion of the background of the original frame that is between the skateboarder's legs that was also included as part of the original segmented object. The resulting segmented object following removal of the background of the original frame is shown in Figure 10 and may, in turn, be incorporated in the three-dimensional visual representation, such as shown in Figure 3.
[0055] While the user may separately modify each segmented object 46 that includes a portion of the background of the original frame, the apparatus 10, such as the processor 12, of an example embodiment may also be configured to respond to the user input associated with one of the segmented objects that causes a portion of the background of the original frame to be removed from the segmented object to similarly modify all other segmented objects, or at least all other segmented objects identified by the user, in a comparable manner. Thus, the removal of a portion of the background of the original frame from between the skateboarder's legs in the segmented object that is second from the left may also cause the portion of the background of the original frame from which the segmented object that is third from the left was extracted to similarly be automatically removed without further user input, thereby increasing the efficiency with which the three-dimensional visual representation may be edited.
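One way such an edit might be propagated to the other segmented objects — offered purely as an assumption, not as the patent's algorithm — is to learn the colors of the pixels the user rubbed away in one object and then drop similarly colored pixels from every other object's mask:

```python
# Illustrative sketch: propagate a manual background-removal edit by
# color similarity.  removed_colors holds RGB colors the user rubbed
# away in one segmented object; pixels of other objects whose color is
# within `tol` of any removed color are cleared from their masks.
import numpy as np

def propagate_removal(removed_colors, other_image, other_mask, tol=10.0):
    """Remove from other_mask any pixel whose color is within `tol`
    (Euclidean RGB distance) of a color the user rubbed away."""
    h, w, _ = other_image.shape
    pixels = other_image.reshape(-1, 3).astype(float)
    # distance from each pixel to its nearest removed color
    dists = np.min(
        np.linalg.norm(pixels[:, None, :] - removed_colors[None, :, :], axis=2),
        axis=1,
    ).reshape(h, w)
    return other_mask & (dists > tol)

grass = np.array([0.0, 200.0, 0.0])    # background color rubbed away
skater = np.array([200.0, 0.0, 0.0])   # object color to keep
img = np.zeros((2, 2, 3))
img[0] = grass    # top row: background that leaked into the mask
img[1] = skater   # bottom row: the object itself
mask = np.ones((2, 2), dtype=bool)
edited = propagate_removal(np.array([grass]), img, mask)
print(edited)  # top row cleared, bottom row kept
```

A color model is only one plausible mechanism; spatial correspondence between frames could serve the same purpose. The point of the sketch is that one user edit supplies enough information to modify the remaining segmented objects automatically.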
[0056] As described above, the method, apparatus 10 and computer program product of an example embodiment may construct a visual representation of a video including one or more segmented objects from the frames of the video. The visual representation of the video that is constructed in accordance with an example embodiment permits the segmented objects from the frames of the video to be presented in a manner that is representative of the video and that provides context for the segmented objects, such as in terms of the time sequence of the segmented objects relative to one another and to a representation of at least one frame of the video. By providing context for the segmented objects, issues may be avoided relating to a plurality of individual frames from a video appearing disjointed or failing to convey the same context or message as the original video that may otherwise arise from other techniques that rely upon a user to identify one or more frames of a video. Thus, the visual representation of the video that is constructed in accordance with an example embodiment may provide an alternative representation of the video that may be reviewed more quickly and may be stored and
transmitted in a more efficient manner. Further, the method, apparatus and computer program product of an example embodiment may construct the visual representation of the video including one or more segmented objects from frames of the video in an efficient manner by limiting the user involvement that is required in comparison to other techniques in which a user undergoes a time-consuming process to identify one or more particular frames of a video that are then presented in a manner that is representative of the video.
[0057] As described above, Figures 2, 4 and 5 illustrate flowcharts of an apparatus 10, method and computer program product according to example embodiments of the invention. It will be understood that each block of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by various means, such as hardware, firmware, processor, circuitry, and/or other communication devices associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device 14 of an apparatus employing an embodiment of the present invention and executed by a processor 12 of the apparatus. As will be appreciated, any such computer program instructions may be loaded onto a computer or other programmable apparatus (for example, hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the flowchart blocks. These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture, the execution of which implements the function specified in the flowchart blocks. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart blocks.
[0058] Accordingly, blocks of the flowcharts support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
[0059] In some embodiments, certain ones of the operations above may be modified or further amplified. Furthermore, in some embodiments, additional optional operations may be included, such as indicated by the dashed lines in Figure 2. Modifications, additions, or amplifications to the operations above may be performed in any order and in any combination.
[0060] Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

WHAT IS CLAIMED IS:
1. A method comprising:
identifying an object that appears in a plurality of frames of a video;
segmenting the object from one or more frames of the video to create one or more segmented objects; and
constructing a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object, wherein the three-dimensional visual representation comprises the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
2. A method according to Claim 1 wherein constructing the three-dimensional visual representation further comprises constructing a box having at least three sides, wherein the representation of the at least one frame of the video that includes the object defines an end of the box.
3. A method according to Claim 2 wherein the box comprises first and second sides extending from the end of the box, and wherein constructing the three-dimensional visual representation comprises overlaying the one or more segmented objects at least partially on at least one of the first and second sides of the box.
4. A method according to Claim 3 wherein the first and second sides have a color that is at least partially defined by the representation of the at least one frame of the video that includes the object.
5. A method according to any one of Claims 1 to 4 further comprising identifying a plurality of key frames of the video, wherein identifying the object comprises identifying the object in the plurality of key frames.
6. A method according to any one of Claims 1 to 5 further comprising receiving an input indicating that the three-dimensional visual representation should be zoomed in or zoomed out.
7. A method according to Claim 6 wherein, in response to the input indicating that the three-dimensional visual representation should be zoomed in, the method further comprises supplementing the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
8. An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the processor, cause the apparatus to at least:
identify an object that appears in a plurality of frames of a video;
segment the object from one or more frames of the video to create one or more segmented objects; and
construct a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object, wherein the three-dimensional visual representation comprises the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
9. An apparatus according to Claim 8 wherein the at least one memory and the computer program code are configured to, with the processor, cause the apparatus to construct the three-dimensional visual representation by constructing a box having at least three sides, wherein the representation of the at least one frame of the video that includes the object defines an end of the box.
10. An apparatus according to Claim 9 wherein the box comprises first and second sides extending from the end of the box, and wherein the at least one memory and the computer program code are configured to, with the processor, cause the apparatus to construct the three-dimensional visual representation by overlaying the one or more segmented objects at least partially on at least one of the first and second sides of the box.
11. An apparatus according to Claim 10 wherein the first and second sides have a color that is at least partially defined by the representation of the at least one frame of the video that includes the object.
12. An apparatus according to any one of Claims 8 to 11 wherein the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to identify a plurality of key frames of the video, wherein the at least one memory and the computer program code are configured to, with the processor, cause the apparatus to identify the object by identifying the object in the plurality of key frames.
13. An apparatus according to any one of Claims 8 to 12 wherein the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to receive an input indicating that the three-dimensional visual representation should be zoomed in or zoomed out.
14. An apparatus according to Claim 13 wherein, in response to the input indicating that the three-dimensional visual representation should be zoomed in, the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to supplement the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
15. A computer program product comprising at least one non-transitory computer-readable storage medium having computer-executable program code portions stored therein, the computer-executable program code portions comprising program code instructions configured, upon execution, to:
identify an object that appears in a plurality of frames of a video;
segment the object from one or more frames of the video to create one or more segmented objects; and construct a three-dimensional visual representation of the video including a representation of at least one frame of the video that includes the object, wherein the three-dimensional visual representation comprises the one or more segmented objects positioned in a time sequence relative to the at least one frame of the video.
16. A computer program product according to Claim 15 wherein the program code instructions configured to construct the three-dimensional visual representation comprise program code instructions configured to construct a box having at least three sides, wherein the representation of the at least one frame of the video that includes the object defines an end of the box.
17. A computer program product according to Claim 16 wherein the box comprises first and second sides extending from the end of the box, and wherein the program code instructions configured to construct the three-dimensional visual representation comprise program code instructions configured to overlay the one or more segmented objects at least partially on at least one of the first and second sides of the box.
18. A computer program product according to Claim 17 wherein the first and second sides have a color that is at least partially defined by the representation of the at least one frame of the video that includes the object.
19. A computer program product according to any one of Claims 15 to 18 wherein the computer-executable program code portions further comprise program code instructions configured to identify a plurality of key frames of the video, wherein the program code instructions configured to identify the object comprise program code instructions configured to identify the object in the plurality of key frames.
20. A computer program product according to any one of Claims 15 to 19 wherein the computer-executable program code portions further comprise program code instructions configured to receive an input indicating that the three-dimensional visual representation should be zoomed in or zoomed out, wherein, in response to the input indicating that the three-dimensional visual representation should be zoomed in, the computer-executable program code portions further comprise program code instructions configured to supplement the three-dimensional visual representation with one or more additional segmented objects positioned in the time sequence relative to other ones of the segmented objects.
PCT/CN2014/072987 2014-03-06 2014-03-06 Constructing a visual representation of a video WO2015131369A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/072987 WO2015131369A1 (en) 2014-03-06 2014-03-06 Constructing a visual representation of a video

Publications (1)

Publication Number Publication Date
WO2015131369A1 (en) 2015-09-11

Family

ID=54054374

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/072987 WO2015131369A1 (en) 2014-03-06 2014-03-06 Constructing a visual representation of a video

Country Status (1)

Country Link
WO (1) WO2015131369A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109274987A (en) * 2018-08-30 2019-01-25 Wuhan Douyu Network Technology Co., Ltd. Video collection sorting method, server, and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6084979A (en) * 1996-06-20 2000-07-04 Carnegie Mellon University Method for creating virtual reality
US6950123B2 (en) * 2002-03-22 2005-09-27 Intel Corporation Method for simultaneous visual tracking of multiple bodies in a closed structured environment
CN101458434A (en) * 2009-01-08 2009-06-17 Zhejiang University System for precision measuring and predicting table tennis track and system operation method
CN101729920A (en) * 2009-11-23 2010-06-09 Nanjing University Method for displaying stereoscopic video with free visual angles
CN102724530A (en) * 2012-05-29 2012-10-10 Tsinghua University Three-dimensional method for plane videos based on feedback control

Legal Events

Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14884597; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in the European phase (Ref document number: 14884597; Country of ref document: EP; Kind code of ref document: A1)