US8358837B2 - Apparatus and methods for detecting adult videos - Google Patents

Info

Publication number
US8358837B2
Authority
US
United States
Prior art keywords
adult
video
indicator
unknown
key frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US12/113,835
Other versions
US20090274364A1 (en)
Inventor
Subodh Shakya
Ruofei Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Verizon Patent and Licensing Inc
Original Assignee
Yahoo Inc until 2017
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yahoo Inc until 2017
Priority to US12/113,835
Assigned to YAHOO! INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SHAKYA, SUBODH, ZHANG, RUOFEI
Publication of US20090274364A1
Application granted
Publication of US8358837B2
Assigned to YAHOO HOLDINGS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: YAHOO! INC.
Assigned to OATH INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: YAHOO HOLDINGS, INC.
Assigned to VERIZON MEDIA INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: OATH INC.
Assigned to VERIZON PATENT AND LICENSING INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: VERIZON MEDIA INC.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Definitions

  • the present invention is related to analyzing video content. It especially pertains to analyzing video content to determine whether such video is pornographic.
  • a user may wish to view one or more videos that have an unknown content.
  • a user may search for videos related to a keyword.
  • Some of the videos that are found based on the keyword may be pornographic in nature, and the user may not wish to inadvertently view such pornographic or adult videos. Additionally, adult video content may be inappropriate for people under 18 years of age and it would be beneficial to screen adult video content from reaching users who are minors.
  • a learning system is operable to generate one or more models for adult video detection.
  • the model is generated based on a large set of known videos that have been defined as adult or non-adult.
  • Adult detection is then based on this adult detection model.
  • This adult detection model may be applied to selected key frames of an unknown video.
  • these key frames can be selected from the frames of the unknown video.
  • Each key frame may generally correspond to a frame that contains key portions that are likely relevant for detecting pornographic or adult aspects of the unknown video.
  • key frames may include moving objects, skin, people, etc.
  • a video is not divided into key frames and all frames are analyzed by a learning system to generate a model, as well as by an adult detection system based on such model.
  • a method for detecting pornographic or adult videos is disclosed. For an unknown video having a plurality of frames, a plurality of key frames selected from the frames of the unknown video is defined. Each key frame corresponds to a frame that contains features that are likely relevant for detecting pornographic or adult aspects of the unknown video.
  • the key frames are analyzed using an adult detection model that was generated by a learning process based on a training set of images and their associated adult indicators that each specifies whether the associated known image is an adult or non-adult image, whereby the analysis results in an adult indicator that specifies whether the unknown video is an adult video, a non-adult video, or a suspected adult video.
  • defining the key frames comprises (i) determining one or more portions of each frame that are significantly different from corresponding portions of a plurality of adjacent frames and (ii) defining the key frames based on the significantly different one or more portions of each frame.
  • analyzing the key frames comprises (i) analyzing one or more of the significantly different portions of each key frame with the adult detection model to thereby determine an adult indicator for such one or more of the significantly different portions of such each key frame being adult or non-adult, and (ii) determining the adult indicator of the unknown video based on the adult indicators for the key frames.
  • an adult indicator is determined for each significantly different portion of each key frame that is determined to include a moving object.
  • the learning process is executed based on one or more key frame features extracted from each known image and the each known image's associated adult indicator so as to generate the adult detection model that is to be used for the unknown video.
  • a plurality of key frame features is extracted from the key frames of the unknown video. The analyzing of the key frames of the unknown video is based on the extracted key frame features for such unknown video, and a same type of features are used for analysis of the key frames of the unknown video and by the learning process.
  • the key frames and associated adult indicators are included in the training set of known images, and the learning process is executed based on each known image, including the key frames, and each known image's adult indicator, including the key frames' adult indicators, so as to generate a new adult detection model to be used for adult detection of new unknown videos.
  • one or more adult indicators of the known images, which include the key frames of the new known video, are manually corrected prior to executing the learning process on such known images.
  • the invention pertains to an apparatus having at least a processor and a memory.
  • the processor and/or memory are configured to perform one or more of the above described operations.
  • the invention pertains to at least one computer readable storage medium having computer program instructions stored thereon that are arranged to perform one or more of the above described operations.
  • FIG. 1 is a diagrammatic representation of an adult detection system for unknown videos in accordance with one embodiment of the present invention.
  • FIG. 2A is a flowchart illustrating processes for adult video detection in accordance with one implementation of the present invention.
  • FIG. 2B includes two screen shots from an example search application in which a user may select to filter adult videos from their search results in accordance with a specific implementation.
  • FIG. 3 illustrates example processes for implementation of the learning system and the adult key frame detection system of FIG. 1 in accordance with one embodiment of the present invention.
  • FIG. 4 is a diagrammatic representation of applying key frames detection to an unknown video in accordance with one embodiment of the present invention.
  • FIG. 5 is a diagrammatic representation of a plurality of key frame adult indicators in accordance with a specific implementation.
  • FIG. 6 is a simplified diagram of a network environment in which specific embodiments of the present invention may be implemented.
  • FIG. 7 illustrates an example computer system in which specific embodiments of the present invention may be implemented.
  • pornographic or adult videos are detected from a set of unknown videos, such as the results obtained by a search service.
  • An adult video may have content that would be deemed by a particular community or societal construct to be suitable only for adults, e.g., over 17 or 18, to view. That is, the definition of an “adult” or “pornographic” video is subjective and depends on the specific requirements or social norms of a group of people, culture, government, or company. Additionally, some societies or communities may have different age thresholds for which it is deemed suitable for viewing or not viewing adult videos.
  • adult detection is based on an adult detection model that is generated from a learning process that analyzes a large set of known videos that have been defined as adult or non-adult.
  • This adult detection model may be applied to selected key frames of an unknown video.
  • these key frames can be selected from the frames of the unknown video.
  • Each key frame may generally correspond to a frame that contains key portions that are likely relevant for detecting pornographic or adult aspects of the unknown video.
  • key frames may include moving objects, skin, people, etc.
  • a video is not divided into key frames and all frames are analyzed by a learning system to generate a model, as well as by an adult detection system based on such model.
  • Such adult detection may have any number of uses. For example, detected adult videos may be filtered from search results that are presented to certain users, e.g., who select filtering or are minors.
  • Although an adult detection technique will now be described with respect to a search application, the adult detection techniques of the present invention can, of course, be applied to a diverse number and/or type of applications that could utilize an adult detection process. Examples of other applications include techniques for selecting or displaying advertisements over a computer, mobile phone, or TV network, recommending content to users, or selecting content to be delivered to the user, etc.
  • inventive method embodiments are applicable in any application that provides video content.
  • FIG. 1 is a diagrammatic representation of an adult detection system 100 for unknown videos in accordance with one embodiment of the present invention.
  • the term “unknown” video is not meant to imply that the unknown video cannot include a tag indicating whether it is an adult video.
  • the adult detection techniques described herein can be implemented independently of the video's self-labeling as to adult content. Accordingly, these adult detection techniques do not need to rely on the tagging or ratings of each video, which may be untrustworthy or incorrect. For example, adult labels or tags may be applied to videos based on inconsistent standards or policy that may be more or less stringent than desired by the users of such adult detection system.
  • the adult detection system 100 may include a learning system 108 for generating an adult detection model, an adult detection module 106 for adult detection (e.g., for a particular key frame) based on such model, a key frame extraction module 104 for extracting key frames from an unknown video, and an adult categorization module 114 for categorizing the unknown video based on the adult detection output for the key frames of such unknown video.
  • Key frame extraction module 104 may receive an unknown video, e.g., one that has not yet been analyzed by adult detection module 106.
  • the key frame extraction module generally defines a set of key frames for the unknown video that can be usefully analyzed by adult detection module 106.
  • the adult detection module 106 receives each key frame and outputs an adult indicator for each key frame to adult categorization module 114 .
  • the adult indicator for a particular image indicates whether one or more portions of such image are adult or non-adult, and may also indicate a confidence value for such adult or non-adult indication.
  • the adult indicator may either be determined based on an adult detection model from learning system 108 or be retrieved from known videos and key frames database 110.
  • the adult categorization system 114 receives the key frames and their adult indicators for an unknown video and then determines whether the video is an adult video, a non-adult video, or a suspected adult video based on the received key frame adult indicators.
  • the newly known video and its associated adult indicator may be retained in database 110 .
  • the adult categorization system 114 may also reassess the key frames and modify their associated adult indicators based on the video's overall adult indicator, as explained further herein.
  • the adult categorization system 114 may also retain these newly known key frame adult indicators, e.g., in database 110 .
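  • By way of illustration only, the categorization performed by module 114 might be sketched as follows. The `AdultIndicator` class, the function name, and all three threshold values are hypothetical; the description does not specify how the key frame indicators are combined, so a simple ratio of confidently adult key frames is assumed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdultIndicator:
    is_adult: bool      # adult vs. non-adult label for one key frame
    confidence: float   # confidence in that label, in [0, 1]


def categorize_video(indicators: List[AdultIndicator],
                     adult_ratio: float = 0.5,       # hypothetical threshold
                     suspect_ratio: float = 0.2,     # hypothetical threshold
                     min_confidence: float = 0.6) -> str:
    """Return 'adult', 'non-adult', or 'suspected adult' for one video."""
    if not indicators:
        # Assumption: a video with no key frames is treated as non-adult.
        return "non-adult"
    # Count key frames labelled adult with at least the minimum confidence.
    confident_adult = sum(
        1 for ind in indicators
        if ind.is_adult and ind.confidence >= min_confidence
    )
    ratio = confident_adult / len(indicators)
    if ratio >= adult_ratio:
        return "adult"
    if ratio >= suspect_ratio:
        return "suspected adult"
    return "non-adult"
```

  • In this sketch, a video with one confidently adult key frame out of four falls between the two assumed thresholds and would be reported as a suspected adult video, which could then be routed to manual review.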
  • the learning system 108 may be configured to receive information regarding a large training set of known videos and images and then generate an adult detection model based on this training set that is output to adult detection module 106 .
  • the training set of images may be obtained from any suitable storage device or devices, such as from a known videos and key frames database 110 .
  • the known video and key frames database 110 may include identifying information for a plurality of known images (or the images themselves) and an adult indicator associated with each image that specifies whether the image is an adult or non-adult image, as well as identifying information for each known video.
  • This training set may initially be provided by manually classifying a large set of images as adult or non-adult. In one embodiment, 6000 or more images (3000 adult and 3000 non-adult) are initially, manually classified as adult or non-adult so as to achieve a reasonable level of accuracy for the adult detection model.
  • the training set of images may also include images that have been analyzed by the adult detection system 106 based on a previously generated model. For instance, a new model may be generated once a month or every week. That is, a feedback mechanism may be provided so that a new adult detection model is generated periodically based on newly analyzed key frames.
  • the system 100 may also include a manual adult indication and/or correction module 112 .
  • This manual module 112 may include mechanisms to allow a user to manually provide or correct an adult indicator for any number of images or key frames, e.g., of known videos and key frames database 110. In other words, the manual module may allow a user to provide the initial training set and/or to correct adult indicators that are determined by the adult detection system 106.
  • the manual module may include a user interface for viewing images and inputting an adult indicator value (e.g., adult or non-adult) by any suitable input mechanisms, such as a pull-down menu with selectable adult and non-adult options, selectable adult and non-adult buttons, or a text input box into which a user can enter a string indicating “adult” or “non-adult” by way of examples.
  • FIG. 2A is a flowchart illustrating processes for adult video detection in accordance with one implementation of the present invention.
  • unknown video 102 may be received into the key frame extraction module 104 .
  • An unknown video may originate from any suitable source. Although only described with respect to a single unknown video, the following operations may be performed for each unknown video in a set of unknown videos.
  • the unknown video is one of the search results that were obtained for a particular user video search, and adult detection may be performed on each of the search results that have not been previously analyzed for adult content.
  • the adult detection system may be configured on or accessible by a search server.
  • the search server may take any suitable form for performing searches for videos.
  • Embodiments of the present invention may be employed with respect to any search application, and example search applications include Yahoo! Search, Google, Microsoft MSN and Live Search, Ask Jeeves, etc.
  • the search application may be implemented on any number of servers.
  • FIG. 2B includes two screen shots from an example search application 250 , e.g., from Yahoo! of Sunnyvale, Calif.
  • the search application of a search server may present a web page 252 having an input feature in the form of input box 154 to the client so the client can enter one or more search term(s).
  • A user may type any number of search terms into the search input feature. Selectable options for choosing different types of searches, such as video or images, may also be present next to the input feature. As shown, a user may select a video option 156 for searching videos.
  • When a search for videos based on one or more search terms is initiated in a query to a search server, the search server then locates a plurality of videos that relate to the search terms. These videos can be found on any number of web servers and usually enter the search server via a crawling and indexing pipeline possibly performed by a different set of computers (not shown). The plurality of located videos may then be analyzed by a rule based or decision tree system to determine a “goodness” or relevance ranking. For instance, the videos are ranked in order from most relevant to least relevant based on a plurality of feature values of the videos, the user who initiated the search with a search request, etc.
  • adult video detection may be implemented so as to filter out adult videos from the search results.
  • the adult detection may be selected by the user, e.g., via a selectable search option or via a user profile that was previously set up by the user.
  • the adult detection may also be automatically performed based on the user's age, e.g., when the user is younger than 18 or 17 years old.
  • a user may select an “Advanced Video Search” option 258 to be applied to the current video search, or modify their user preferences 260 for all video searches performed by the user.
  • the user preferences are only applied when the user is logged in during performance of a search.
  • Other mechanisms may be utilized to detect the user's preference, besides a login, so as to apply adult video detection for such user.
  • Screen shot 262 includes option 264 a for “Filtering out adult Web, video, and image search results”, option 264 b for “Filtering out adult video and image search results only”, and option 264 c for “Do not filter results”.
  • the user preferences may also be applied more generally to the computer on which the preferences are being set. As shown, the user may select option 266 so as to “Lock safe search setting to filter out adult web, video, and image search results” for anyone signed in to the computer who is under 18 or when searches are performed without logging into the computer.
  • the ranked and filtered lists of documents/objects can then be presented to the user in a search results list that is ordered based on ranking.
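  • As a minimal sketch of the filtering step, assuming each ranked result already carries a video-level adult indicator, the ranked list could be filtered as follows. The function name, the tuple layout, and the choice to also drop suspected adult videos are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def filter_search_results(ranked: List[Tuple[str, str]],
                          filter_selected: bool,
                          user_age: Optional[int] = None,
                          adult_age: int = 18,
                          drop_suspected: bool = True) -> List[str]:
    """Filter a ranked list of (video_id, adult_label) pairs.

    adult_label is 'adult', 'non-adult', or 'suspected adult'.
    Ranking order is preserved in the returned list of video ids.
    """
    # Filter when the user asked for it, or automatically for minors.
    must_filter = filter_selected or (user_age is not None and user_age < adult_age)
    if not must_filter:
        return [video_id for video_id, _ in ranked]
    blocked = {"adult", "suspected adult"} if drop_suspected else {"adult"}
    return [video_id for video_id, label in ranked if label not in blocked]
```

  • Whether suspected adult videos are hidden or shown is a policy choice that the description leaves open; the `drop_suspected` flag merely makes that choice explicit.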
  • the ranking and/or adult detection processes may be performed by the search server that has received the search query or by another server, such as a specially configured ranking server (not shown).
  • each particular frame is analyzed to determine portions of the particular frame that are significantly different from corresponding portions of a specified number of adjacent frames. These portions may be defined for further analysis while background portions are excluded from such analysis. This process generally serves to filter out large portions of the background or noise from each frame while retaining the moving portions of each frame.
  • each frame is compared to a predefined number of adjacent frames to detect difference portions of each frame that differ from the corresponding adjacent frame portions. Any suitable number of adjacent frames, such as 96 adjacent frames, may be utilized.
  • any suitable compression technique for removing pixels that are common between a majority of a predefined set of adjacent frames.
  • any suitable video compression approach such as an MPEG (Moving Picture Experts Group) technique
  • a modified version of a video compression approach may be used so as to define or detect motion (e.g., moving objects) out of a background and also identify separately each moving object.
  • a simple motion detection approach would be to compare the current frame with the previous frame (which is what is widely used in video compression techniques).
  • the background or the starting frame
  • the reference or background frame actually changes in the direction of the subsequent frames. That is, changes may be tracked, and these tracked changes may be relative to multiple previous frames, not just the beginning frame of a video.
  • an original frame, Fo can first be defined, as well as a next frame, Fn, and a previous frame, Fp, with respect to the current, original frame, Fo.
  • the first step may include finding where the previous frame, Fp, differs from the current (original) frame, Fo.
  • a differencing filter may be applied between the gray scale images obtained from Fp and Fo using a predefined threshold, such as 15%.
  • the result from this difference filter may be an image with white pixels at specific areas for which the current (original) frame is different from the previous (background) frame by an amount that is equal to or above the predefined threshold, e.g., 15%.
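  • The differencing filter described above can be sketched as follows. This is an illustration under stated assumptions: frames are plain nested lists of 0-255 gray levels, and the 15% threshold is interpreted as 15% of the full gray range, which the description does not state explicitly. The function name is hypothetical.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 gray levels


def difference_mask(prev: Gray, cur: Gray, threshold: float = 0.15) -> Gray:
    """Return a mask with white (255) pixels where cur differs from prev.

    A pixel is marked white when the absolute gray-level difference is
    equal to or above threshold * 255 (assumed reading of "15%").
    """
    limit = threshold * 255
    return [
        [255 if abs(c - p) >= limit else 0 for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(prev, cur)
    ]
```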
  • a predefined subset of frames e.g., frames F1-F96.
  • This comparison can now be made with color information. If the difference in pixels between any two of these frames in the predefined subset (e.g., F1-F96) exceeds a predetermined amount, e.g., 35%, a new previous/background frame (Fp) may be used for the original frame, Fo, and the above described modified process is then repeated using the new previous frame, Fp.
  • the background of the new, current, previous frame, Fp may then be subtracted from the current, original frame, Fo, to obtain the significantly different portions of such current frame, Fo.
  • This modified process can be repeated for each frame of the video being defined as the current frame, Fo, as well as new previous frames for such new current frame.
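  • One possible reading of the reference frame selection is sketched below. It is only an interpretation: the sketch walks back from the current frame through at most 96 earlier frames and keeps the farthest one whose fraction of changed pixels stays at or below 35%. The function names are hypothetical, and single-channel frames are used for brevity even though the description allows color information in this comparison.

```python
from typing import List, Sequence

Frame = List[List[int]]  # single-channel frame, rows of pixel values


def changed_fraction(a: Frame, b: Frame, pixel_tol: int = 0) -> float:
    """Fraction of pixel positions whose values differ by more than pixel_tol."""
    total = 0
    changed = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) > pixel_tol:
                changed += 1
    return changed / total if total else 0.0


def pick_reference_index(frames: Sequence[Frame], current: int,
                         window: int = 96, max_change: float = 0.35) -> int:
    """Index of the earlier frame to use as Fp for frames[current]."""
    ref = max(current - 1, 0)  # default: the literal previous frame
    for idx in range(current - 1, max(current - window, 0) - 1, -1):
        if changed_fraction(frames[idx], frames[current]) > max_change:
            break  # too much has changed; stop looking further back
        ref = idx
    return ref
```

  • Under this reading, a slowly changing video yields a reference frame far behind the current frame, while a fast moving video yields one only a few frames behind.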
  • This modified compression process has several features. Since most videos have grainy images that may be interpreted as motion, an erosion technique may be applied before the differencing operations so as to prevent random motion bits from manifesting. Additionally, the previous frame may not actually be the literal previous frame. The previous frame may actually be closely behind the current frame or may be up to 96 frames behind the current frame, depending on the difference in the number of pixels that have been found to have changed. Sometimes the previous frame may just be 3 or 4 frames behind (for example, for a fast moving video). Whenever multiple moving objects are detected (identified by multiple closed boundaries that represent separate areas within the white (differenced) image), sudden disappearance of such objects would tend to cause the background/previous frame reference to be reset (to a different previous frame).
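  • The erosion step can be illustrated with a gray-scale erosion, i.e., a minimum filter over each pixel's neighborhood. The 3x3 neighborhood and the function name are assumptions; the description does not name a structuring element.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 gray levels


def erode(gray: Gray) -> Gray:
    """3x3 gray-scale erosion: each pixel becomes the minimum of its neighborhood.

    Neighborhoods are clipped at the image border. Isolated bright specks
    (grain) are removed, while larger bright regions only shrink slightly.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            row.append(min(
                gray[yy][xx]
                for yy in range(max(y - 1, 0), min(y + 2, height))
                for xx in range(max(x - 1, 0), min(x + 2, width))
            ))
        out.append(row)
    return out
```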
  • video 402 includes a plurality of frames 404 (e.g., frames 06 through 18 are shown).
  • the background is substantially filtered out of the frames to produce significantly different portions for frames 406.
  • a portion of the background may be retained around each significantly different portion (or moving object) to provide context to the further analysis procedures.
  • Key frames may then be identified or defined based on the significantly different portions of the video's frames in operation 204 .
  • a full speed, 29 frames per second, video may be reduced to a collection of key frames that represent the whole video and include images that are significantly different from each other.
  • a set of key frames can be selected from the frames and their significantly different portions based on content differential.
  • Content differential factors may include a quantification or qualification of any suitable characteristics.
  • content differential factors may include a quantification or qualification of one or more of the following image characteristics: motion and spatial activity, likeliness that the image contains people, skin-color detection, and/or face detection.
  • the significantly different portions of each frame, as shown by 406, are reduced to key frames 408.
  • frames 09 , 12 , 15 , and 18 of video 402 are selected as key frames 408 .
  • the video is initially divided into shots. One or more shots are then selected. One or more key frames are then selected from each selected shot. Shot detection may be based on detecting discontinuities in motion activity and changes in pixel value histogram distribution. Shot and key frame selection may be based on measures of motion activity, spatial activity, skin-color detection, and face detection. Motion activity may be measured by frame difference, and spatial activity may be determined by the entropy of pixel values distribution. Skin-color and face detection may be based on a learning system, such as described in (i) M. J. Jones et al., “Statistical Color Models with Applications to Skin Detection”, TR 98-11, CRL, Compaq Computer Corp., December 1998 and (ii) H. A.
  • key frame detection may simply be based on measurable features, rather than object detection. For instance, key frame detection may occur without face detection.
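  • Two of the measurable features mentioned above, motion activity measured by frame difference and spatial activity measured by the entropy of the pixel value distribution, can be sketched as follows. The function names are hypothetical, and the use of the mean absolute difference as the frame difference measure is an assumption.

```python
import math
from collections import Counter
from typing import List

Gray = List[List[int]]  # rows of 0-255 gray levels


def spatial_activity(gray: Gray) -> float:
    """Shannon entropy, in bits, of the gray-level distribution of one frame."""
    counts = Counter(p for row in gray for p in row)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def motion_activity(prev: Gray, cur: Gray) -> float:
    """Mean absolute gray-level difference between two frames."""
    diffs = [abs(c - p)
             for prev_row, cur_row in zip(prev, cur)
             for p, c in zip(prev_row, cur_row)]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

  • A flat frame has zero entropy, whereas a frame whose pixels are spread evenly over many gray levels has high entropy, so the first measure rewards visually busy frames.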
  • A key frame detection technique is further described in Frederic Dufaux, “Key frame selection to represent a video”, IEEE Proceedings 2000 International Conference on Image Processing, Vol. II of III: 275-278, Sep. 10-13, 2000, which document is incorporated herein by reference.
  • a video may be first divided into shots.
  • a shot may be defined as a set of frames that are captured from a same perspective. Shot detection may rely on a measure of frame-to-frame change. Several suitable techniques of shot detection are further described in B. L. Yeo et al., “Rapid Scene Analysis on Compressed Video”, IEEE Trans. On CSVT, 5 (6): 533-544, 1995, which document is incorporated herein by reference.
  • a key frame is then selected for each shot. For example, the first frame of each shot may be selected. If significant changes (e.g., color or motion) occur in a particular shot, multiple key frames may be selected for such shot, e.g., by using a clustering technique.
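  • A minimal sketch of shot detection followed by first-frame key frame selection is given below. The cut threshold, the function names, and the use of mean absolute difference as the measure of frame-to-frame change are assumptions; the clustering variant for shots with significant internal changes is not shown.

```python
from typing import List, Sequence, Tuple

Gray = List[List[int]]  # rows of 0-255 gray levels


def mean_abs_diff(a: Gray, b: Gray) -> float:
    """Mean absolute gray-level difference between two frames."""
    diffs = [abs(pa - pb)
             for row_a, row_b in zip(a, b)
             for pa, pb in zip(row_a, row_b)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def detect_shots(frames: Sequence[Gray],
                 cut_threshold: float = 40.0) -> List[Tuple[int, int]]:
    """Split frames into shots; each shot is a (start, end) pair, end exclusive."""
    if not frames:
        return []
    shots = []
    start = 0
    for i in range(1, len(frames)):
        # A large frame-to-frame change is taken as a shot boundary.
        if mean_abs_diff(frames[i - 1], frames[i]) >= cut_threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots


def first_frame_key_frames(shots: List[Tuple[int, int]]) -> List[int]:
    """Select the first frame index of each shot as its key frame."""
    return [start for start, _ in shots]
```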
  • each key frame may then be analyzed to determine an adult indicator for each key frame.
  • a first key frame is then obtained in operation 206 . It is then determined whether an adult indicator is already associated with the current key frame in operation 208 .
  • the current key frame may have already been processed in another video during the same search, during a previous search, or manually classified as part of the initial training set of videos or as a corrected key frame.
  • an adult indicator, e.g., an indication as to whether the key frame is an adult or non-adult key frame and a confidence value for such indication, may already be associated with the current key frame, e.g., in database 110. If the current key frame is already associated with an adult indicator, this adult indicator is then obtained in operation 210.
  • the current key frame is sent to the adult detection module 106 , which outputs an adult indicator for the current key frame.
  • the adult indicator for the current key frame may be retained in operation 212 .
  • a unique identifier for the current key frame and its associated adult indicator are retained in database 110 .
  • a unique identifier may take any suitable form, such as a unique name or reference that is associated with each frame. It may then be determined whether there are more key frames in operation 214 . That is, it is determined whether all of the key frames for the unknown video have been processed.
  • the next key frame is obtained in operation 206 and operations 208 through 212 are repeated for such next key frame.
  • the key frame adult indicators for the unknown video are sent to the adult categorization module 114 , which outputs an adult indicator for the unknown video based on such key frame adult indicators.
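  • The loop over key frames in operations 206 through 214 can be sketched as follows, assuming database 110 is modeled as an in-memory dictionary keyed by a unique key frame identifier and the adult detection module 106 is passed in as a callable. Both modeling choices, and all names, are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Indicator = Tuple[bool, float]  # (is_adult, confidence)


def indicators_for_video(key_frames: Iterable[Tuple[str, object]],
                         known: Dict[str, Indicator],
                         detect: Callable[[object], Indicator]) -> List[Indicator]:
    """Collect an adult indicator for every (frame_id, frame) pair."""
    results = []
    for frame_id, frame in key_frames:       # operations 206 and 214
        if frame_id in known:                # operation 208
            indicator = known[frame_id]      # operation 210
        else:
            indicator = detect(frame)        # adult detection module 106
            known[frame_id] = indicator      # operation 212
        results.append(indicator)
    return results
```

  • Because newly computed indicators are retained, a key frame that reappears in a later video or search is looked up rather than analyzed again.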
  • FIG. 3 illustrates example processes for implementation of the learning system 108 and the adult key frame detection system 106 of FIG. 1 in accordance with one embodiment of the present invention.
  • an adult detection model is provided by the learning system 108 to the adult key frame detection module 106 .
  • the learning system may generate an adult detection model utilizing any suitable learning process.
  • the learning system generally may receive information regarding known videos and key frames and their associated adult indicators from database 110 . For instance, an index of unique video and key frame identifiers associated with adult indicators and references to the actual videos and key frames may be stored in database 110 . The key frames that are associated with the index may be retrieved and analyzed by the learning system 108 .
  • one or more key frame features may then be extracted from the known key frames in operation 302 .
  • Any suitable key frame features may be extracted from each key frame.
  • spatial and/or color distribution features and texture features are extracted.
  • audio as well as visual characteristics may also be extracted.
  • Some techniques that may be used in key feature extraction may include but are not limited to: 1) generating a histogram that counts and graphs the total number of pixels at each grayscale level (e.g., a histogram may be used to detect underexposure or saturation in an image/video), 2) generating a line profile that plots the variations of intensity along a line (e.g., line profiles are sometime helpful in determining the boundaries between objects in an image/video), 3) performing intensity measurements to measure grayscale statistics in an image/video or a region of an image/video, such as but not limited to minimum intensity value, maximum intensity value, mean intensity value, standard deviation of the intensity value, 4) using look-up tables to convert grayscale values in the source image/video into other grayscale values in a transformed image/video, 5) using spatial filters to remove noise, smooth, sharpen or otherwise transform an image/video, such as but not limited to Gaussian filters for smoothing images/video, Laplacian filters for
  • Other image processing techniques may include 11) using edge detection algorithms, 12) using gauging of dimensional characteristics of objects, 13) using image correlation to determine how close an image/video is to an expected image/video (e.g., comparing a newly captured image/video to a recorded image/video that has already been analyzed for object identification), 14) using pattern matching to locate regions of a grayscale image/video and determine how close the grayscale image/video matches a predetermined template (e.g., pattern matching may be configured to find template matches regardless of poor lighting, blur, noise, shifting of the template or rotation of the template; for graphical components on a captured image/video, the size, shape, location, etc. that correspond to specific objects in an image/video may be predetermined, which allows a template to be constructed for particular object sets), 15) using optical character recognition algorithms and methods, 16) using color matching to quantify which color, how much of each color and/or ratio of colors exist in a region of an image/video and compare the values generated during color matching to expected values to determine whether the image/video includes known reference object colors, and 17) using color pattern matching to locate known reference patterns in a color image/video.
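As an illustration only, a few of the listed techniques (the grayscale histogram, the line profile, and the intensity measurements) can be sketched in plain Python. The function names, the tiny example frame, and the choice of which values to place in the feature vector are assumptions made for this sketch and are not specified by the text.

```python
from math import sqrt

def grayscale_histogram(image, levels=256):
    """Count the pixels at each grayscale level (technique 1)."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist

def line_profile(image, row_index):
    """Return the intensity values along one horizontal line (technique 2)."""
    return list(image[row_index])

def intensity_stats(image):
    """Minimum, maximum, mean, and population standard deviation (technique 3)."""
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((v - mean) ** 2 for v in pixels) / len(pixels)
    return {"min": min(pixels), "max": max(pixels), "mean": mean, "std": sqrt(variance)}

# Hypothetical 3x4 grayscale key frame with values in 0..255.
frame = [
    [0, 0, 128, 255],
    [0, 64, 128, 255],
    [0, 64, 128, 255],
]
hist = grayscale_histogram(frame)
stats = intensity_stats(frame)
# One possible (assumed) feature vector built from these measurements.
features = [hist[0], hist[255], stats["mean"], stats["std"]]
```

Whatever features are chosen, the same extraction would be applied to known and unknown key frames so that the learned model and the detection step see the same kind of input.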
  • a learning algorithm may then be executed on the extracted key frame features in operation 352 .
  • the learning algorithm outputs an adult detection model to the adult key frame detection system 106 .
  • Any suitable learning system may be utilized.
  • a suitable open source learning algorithm, known as the Support Vector Machine, is available through Kernel-Machines.org. Embodiments of the Support Vector Machine are further described in (i) the publication by Ron Meir, “Support Vector Machines—an Introduction”, Dept. of Electr. Eng. Technion, Israel, June 2002, (ii) U.S. Pat. No. 7,356,187, issued 8 Apr. 2008 by Shanahan et al., and (iii) U.S. Pat. No. 6,816,847, issued 9 Nov. 2004 by Toyama, which document and patents are incorporated herein by reference in their entirety.
  • Support Vector Machines may build classifiers by identifying a hyperplane that partitions two classes of adult and non-adult videos or images in a multi-dimensional feature space into two disjoint subsets with a maximum margin, e.g., between the hyperplane and each class.
  • the margin is defined by the distance of the hyperplane to the nearest adult and non-adult cases for each class.
  • Different SVM-based training methods include maximizing the margin as an optimization problem.
  • a linear SVM (e.g., non-linear SVMs are also contemplated) can be represented, for example, in the following two equivalent forms: using a weight vector representation; or using a support vector representation.
  • the weight vector representation mathematically can represent an SVM (the separating hyperplane) as a pair of parameters <W, b>, where W denotes a weight vector and b represents a threshold or bias term.
  • the weight vector W can include a list of tuples of the form <f_i, w_i>, where f_i denotes a feature and w_i denotes the weight associated with feature f_i. This corresponds to a vector space representation of the weight vector W.
  • the weight value w_i associated with each feature f_i and the threshold value b may be learned from examples using standard SVM learning algorithms.
  • This weight vector representation is also known as the primal representation.
  • the support vector representation of an SVM model, also known as the dual representation, mathematically represents an SVM (the separating hyperplane) as a pair of parameters <SV, b>, where SV denotes a list of example tuples, known as support vectors, and b represents a threshold.
  • the support vector list can include tuples of the form <SV_i, α_i>, where SV_i denotes an example video with known classification and α_i denotes the weight associated with example SV_i.
  • the Euclidean (perpendicular) distance from the hyperplane to the support vectors is known as the margin of the support vector machine.
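The two representations can be compared in a short sketch for the linear case. The code below assumes the common convention that the score is W·x + b, that each support vector carries an explicit label y of +1 (adult) or -1 (non-adult), and that the kernel is a plain dot product; these conventions and all names are assumptions of the sketch rather than details given in the text.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_score(weights, b, x):
    """Weight vector form <W, b>: score = W . x + b."""
    return dot(weights, x) + b

def dual_score(support_vectors, b, x):
    """Support vector form <SV, b> with a linear kernel.

    Each entry is (sv, alpha, y): example, weight, and label (+1 or -1).
    """
    return sum(alpha * y * dot(sv, x) for sv, alpha, y in support_vectors) + b

def weights_from_support_vectors(support_vectors):
    """Recover W = sum_i alpha_i * y_i * SV_i for the linear case."""
    dim = len(support_vectors[0][0])
    w = [0.0] * dim
    for sv, alpha, y in support_vectors:
        for j in range(dim):
            w[j] += alpha * y * sv[j]
    return w

# Hypothetical two-feature model with one support vector per class.
support_vectors = [([2.0, 2.0], 0.25, +1), ([0.0, 0.0], 0.25, -1)]
b = -1.0
w = weights_from_support_vectors(support_vectors)
x = [3.0, 1.0]
same = abs(primal_score(w, b, x) - dual_score(support_vectors, b, x)) < 1e-12
```

In this toy model both support vectors score exactly +1 and -1, which is the usual normalization under which the margin equals 1/||W||.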
  • the parameters of the support vector machine model may be determined using a learning algorithm in conjunction with a training data set that characterizes the information need, i.e., a list of videos or key frames that have been labeled as adult or non-adult.
  • learning a linear SVM model may include determining the position and orientation of the hyperplane that separates the adult examples and non-adult examples that are used during learning.
  • the parameters of the weight vector representation or the support vector representation may also be determined. Learning a support vector machine can be viewed both as a constraint satisfaction and optimization algorithm, where the first objective is to determine a hyperplane that classifies each labeled training example correctly, and where the second objective is to determine the hyperplane that is furthest from the training data, so that an adult detection model is determined.
  • the model that is output from learning system 108 may be used for each unknown video and its unknown key frames.
  • an unknown key frame 301 is received by the adult key frame detection system 106 .
  • One or more key frame features may then be extracted from such unknown key frame, e.g., as described above for the learning system, in operation 302 .
  • the adult detection model may then be executed to obtain an adult indicator for the current key frame in operation 304 .
  • the key frame adult indicator may then be output from the adult key frame detection system 106 .
  • Classifying a key frame using an SVM model reduces to determining which side of the hyperplane the example falls on. If the example falls on the adult side of the hyperplane then the example is assigned an adult label; otherwise it is assigned a non-adult label.
  • This form of learned SVM is known as a hard SVM.
  • Other types of SVM exist which relax the first objective. For example, not requiring all training examples to be classified correctly by the SVM leads to a type known as soft SVMs. In this case the SVM learning algorithm trades off accuracy of the model on the training examples against the margin of the model.
  • Other types of SVMs and SVM learning algorithms also exist and may be utilized by techniques of the present invention.
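A minimal sketch of classification by hyperplane side, together with the standard soft-margin objective, may help make the trade-off concrete. The positive-score-means-adult convention, the labels +1/-1, the constant `c`, and the example values are assumptions of this sketch; the text does not prescribe a particular formulation.

```python
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign a label by the side of the hyperplane the example falls on."""
    return "adult" if score(w, b, x) > 0 else "non-adult"

def soft_margin_objective(w, b, examples, c=1.0):
    """0.5 * ||w||^2 + c * (sum of hinge losses).

    examples is a list of (x, y) with y = +1 for adult and -1 for non-adult.
    A larger c penalizes training errors more; a smaller c favors a wider margin.
    """
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - y * score(w, b, x)) for x, y in examples)
    return regularizer + c * hinge

w, b = [0.5, 0.5], -1.0
examples = [
    ([2.0, 2.0], +1),   # on the margin, no loss
    ([0.0, 0.0], -1),   # on the margin, no loss
    ([1.5, 1.0], -1),   # non-adult example on the adult side, incurs loss
]
label = classify(w, b, [3.0, 1.0])
objective = soft_margin_objective(w, b, examples)
```

A hard SVM corresponds to requiring every hinge term to be zero; the soft variant tolerates the third example at a cost controlled by `c`.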
  • FIG. 5 is a diagrammatic representation of a plurality of key frame adult indicators in accordance with a specific implementation. As shown, portion 502 a of key frame 09 has an adult indicator that specifies “non-adult” and a 97.23% confidence level, and portion 502 b of key frame 12 has an adult indicator that specifies “non-adult” and a 99.21% confidence level. Key frames 15 and 18 each have two portions that each have a representative adult indicator.
  • Key frame 15 has a portion 504 a with an adult indicator of “adult” at a 91.28% confidence level and a portion 502 c with an adult indicator of “non-adult” at a 96.22% confidence level.
  • Key frame 19 has a portion 504 b with an adult indicator of “adult” at a 63.06% confidence level and a portion 502 d with an adult indicator of “non-adult” at a 98.33% confidence level.
  • an average confidence value is determined for all of the key frames for both adult and non-adult portions.
  • the confidence level for the video being non-adult may be determined by (97.23+99.21+96.22+98.33)/4, which equals 97.75%.
  • the adult confidence level may be determined by (0+0+91.28+63.06)/4, which equals 38.59%.
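The averaging described for FIG. 5 can be reproduced with a short sketch. The dictionary layout and function name are assumptions; the confidence values are the ones quoted above, and a key frame with no adult portion contributes 0 to the adult average, as in the arithmetic given.

```python
def aggregate_confidences(key_frames):
    """Average adult and non-adult confidences over all key frames.

    A key frame lacking a portion of a given class contributes 0 for that class.
    """
    n = len(key_frames)
    adult = sum(kf.get("adult", 0.0) for kf in key_frames) / n
    non_adult = sum(kf.get("non_adult", 0.0) for kf in key_frames) / n
    return adult, non_adult

# Confidence levels (in percent) for the four key frames of FIG. 5.
fig5 = [
    {"non_adult": 97.23},
    {"non_adult": 99.21},
    {"adult": 91.28, "non_adult": 96.22},
    {"adult": 63.06, "non_adult": 98.33},
]
adult, non_adult = aggregate_confidences(fig5)
```

Here `non_adult` is 390.99/4 and `adult` is 154.34/4, which round to the 97.75% and 38.59% figures stated above.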
  • the final determination may be based on different thresholds for adult and non-adult confidence levels. For instance, when the aggregate (total) non-adult confidence level exceeds 97%, the unknown video is deemed to be safe (non-adult), provided that the aggregate adult confidence level is below 50%.
  • when the adult confidence is above 70% and the non-adult confidence is below 61%, the unknown video may be deemed adult. Additionally, the unknown video may be deemed a suspected adult video when the adult confidence level is above 70%, while the non-adult confidence level is above 61.11%.
  • Other thresholds that may be used involve non-deterministic scenarios such as an unknown video having too low aggregate confidence scores (for example, less than 70% adult and less than 61% non-adult). Likewise if an unknown video has very high scores (contention) between adult as well as non-adult cut-offs (e.g., 80% adult and 99% non-adult), the unknown video can be deemed as suspect safe.
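The threshold rules can be sketched as a small decision function. The cut-offs (97%, 50%, 70%, 61%, 61.11%) are the example values from the text; the rule ordering, the function name, and the "undetermined" fall-through label are assumptions. The contention case (e.g., 80% adult and 99% non-adult), which the text treats as suspect safe, is not modeled separately here and would be reported as "suspected adult" by the third rule.

```python
def categorize_video(adult_conf, non_adult_conf):
    """Map aggregate confidences (in percent) to a video-level adult indicator."""
    if non_adult_conf > 97.0 and adult_conf < 50.0:
        return "non-adult"
    if adult_conf > 70.0 and non_adult_conf < 61.0:
        return "adult"
    if adult_conf > 70.0 and non_adult_conf > 61.11:
        return "suspected adult"
    # Low or otherwise inconclusive scores: left for further handling.
    return "undetermined"

# Aggregate values from the FIG. 5 example.
fig5_result = categorize_video(38.59, 97.75)
```

With the FIG. 5 aggregates, the first rule applies and the video is deemed non-adult.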
  • the key frame adult indicators for such now known video can be reassessed. For example, if the video is determined to be adult, all key frames with an adult indicator can have their confidence levels increased. As an example, a Video Va containing key frames K1, K2, K3, and K4 was deemed suspect adult. At a later point, when another Video Vb containing key frames K3, K4, K5, and K6 is deemed to be “adult classified,” the classification causes the result of Va to be reassessed: if any of the key frames (e.g., K3 and K4) were earlier contributing non-deterministically by way of the mechanics described above, the aggregate scores may now be recalculated based on the new information. Since Video Vb is adult, non-deterministic key frames in any video that are shared with Vb (in Va, for example, K3 and K4) can also be deemed adult.
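The Va/Vb example can be sketched as follows. The data layout, the status strings, and the function name are hypothetical; the sketch only marks shared non-deterministic key frames as adult and reports which other videos would need their aggregate scores recalculated.

```python
def reassess(videos, frame_status, newly_adult_video):
    """Mark non-deterministic key frames of a newly adult video as adult.

    videos: video id -> list of key frame ids
    frame_status: key frame id -> "adult", "non-adult", or "non-deterministic"
    Returns the ids of other videos that share an updated key frame.
    """
    updated = {
        frame_id
        for frame_id in videos[newly_adult_video]
        if frame_status[frame_id] == "non-deterministic"
    }
    for frame_id in updated:
        frame_status[frame_id] = "adult"
    return {
        video_id
        for video_id, frames in videos.items()
        if video_id != newly_adult_video and updated.intersection(frames)
    }

videos = {"Va": ["K1", "K2", "K3", "K4"], "Vb": ["K3", "K4", "K5", "K6"]}
frame_status = {
    "K1": "non-adult", "K2": "non-adult",
    "K3": "non-deterministic", "K4": "non-deterministic",
    "K5": "adult", "K6": "adult",
}
to_recalculate = reassess(videos, frame_status, "Vb")
```

After the call, K3 and K4 are marked adult and Va is flagged for recalculation of its aggregate scores.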
  • the new known video and key frames with their associated adult indicators may be retained, e.g., in database 110 .
  • the database includes a list of video entries that each include a reference or title and a unique video identity, which can be quickly searched for the video's location and/or identity.
  • the database may also include another list of unique video identifiers and their associated one or more key words for such video, a server identity, a video type, the number of key frames, a video confidence value, an adult indicator field (e.g. set to 1 for an adult video and 0 for non-adult or possibly suspected adult), and a suspected adult indicator field (e.g.
  • the database may also include a list of key frames for the multiple videos, where each key frame entry includes a video identifier, key frame identifier or number, key frame file name or reference, type, fingerprint, adult indicator (e.g., adult or non-adult), and a confidence level value.
  • the fingerprint takes the form of a unique identifier for the key frame and helps in locating, searching and comparing key frames quickly.
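One possible shape for a key frame entry and its fingerprint lookup is sketched below. The field names follow the list above, but the use of a SHA-256 digest as the fingerprint, the in-memory dictionary standing in for database 110, and the example values are assumptions; an exact-content hash of this kind would only match byte-identical key frames.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class KeyFrameEntry:
    video_id: str
    key_frame_number: int
    file_name: str
    frame_type: str
    fingerprint: str
    adult_indicator: str   # "adult" or "non-adult"
    confidence: float      # confidence level in percent

def fingerprint(frame_bytes: bytes) -> str:
    """Assumed fingerprint: SHA-256 hex digest of the raw key frame bytes."""
    return hashlib.sha256(frame_bytes).hexdigest()

# In-memory stand-in for the key frame list of database 110.
index = {}

def store(entry: KeyFrameEntry) -> None:
    index[entry.fingerprint] = entry

def lookup(frame_bytes: bytes):
    """Return the stored entry for identical frame content, or None."""
    return index.get(fingerprint(frame_bytes))

entry = KeyFrameEntry("v1", 9, "v1_kf09.jpg", "jpeg",
                      fingerprint(b"frame-bytes"), "non-adult", 97.23)
store(entry)
```

Keying the index by fingerprint lets a previously analyzed key frame be found without comparing image content directly.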
  • Embodiments of the present invention may be employed to perform adult detection techniques in any of a wide variety of computing contexts.
  • implementations are contemplated in which the relevant population of users interact with a diverse network environment via any type of computer (e.g., desktop, laptop, tablet, etc.) 602 , media computing platforms 603 (e.g., cable and satellite set top boxes and digital video recorders), handheld computing devices (e.g., PDAs) 604 , cell phones 606 , or any other type of computing or communication platform.
  • video information may be obtained using a wide variety of techniques. For example, adult detection selection based on a user's interaction with a local application, web site or web-based application or service may be accomplished using any of a variety of well known mechanisms for recording and determining a user's behavior. However, it should be understood that such methods are merely exemplary and that preference information and video information may be collected in many other ways.
  • this information may be analyzed and used to generate adult indicators according to the invention in some centralized manner.
  • This is represented in FIG. 6 by server 608 and data store 610 that, as will be understood, may correspond to multiple distributed devices and data stores.
  • the invention may also be practiced in a wide variety of network environments (represented by network 612 ) including, for example, TCP/IP-based networks, telecommunications networks, wireless networks, etc.
  • the computer program instructions with which embodiments of the invention are implemented may be stored in any type of computer-readable media, and may be executed according to a variety of computing models including a client/server model, a peer-to-peer model, on a stand-alone computing device, or according to a distributed computing model in which various of the functionalities described herein may be effected or employed at different locations.
  • FIG. 7 illustrates a typical computer system that, when appropriately configured or designed, can serve as an adult detection system and/or search application, etc.
  • the computer system 700 includes any number of processors 702 (also referred to as central processing units, or CPUs) that are coupled to storage devices including primary storage 706 (typically a random access memory, or RAM) and primary storage 704 (typically a read only memory, or ROM).
  • processors 702 may be of various types including microcontrollers and microprocessors such as programmable devices (e.g., CPLDs and FPGAs) and unprogrammable devices such as gate array ASICs or general-purpose microprocessors.
  • primary storage 704 acts to transfer data and instructions uni-directionally to the CPU and primary storage 706 is used typically to transfer data and instructions in a bi-directional manner. Both of these primary storage devices may include any suitable computer-readable media such as those described herein.
  • a mass storage device 708 is also coupled bi-directionally to CPU 702 and provides additional data storage capacity and may include any of the computer-readable media described above. Mass storage device 708 may be used to store programs, data and the like and is typically a secondary storage medium such as a hard disk. It will be appreciated that the information retained within the mass storage device 708 , may, in appropriate cases, be incorporated in standard fashion as part of primary storage 706 as virtual memory.
  • a specific mass storage device such as a CD-ROM 714 may also pass data uni-directionally to the CPU.
  • CPU 702 is also coupled to an interface 710 that connects to one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers.
  • CPU 702 optionally may be coupled to an external device such as a database or a computer or telecommunications network using an external connection as shown generally at 712 . With such a connection, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the method steps described herein.
  • the system may employ one or more memories or memory modules configured to store data, program instructions for the general-purpose processing operations and/or the inventive techniques described herein.
  • the program instructions may control the operation of an operating system and/or one or more applications, for example.
  • the memory or memories may also be configured to store user preferences and profile information, video and key frame information, adult detection models, adult indicators for key frames and videos, etc.
  • the invention also relates to machine-readable media that include program instructions, state information, etc. for performing various operations described herein.
  • machine-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM).
  • program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.

Abstract

Disclosed are apparatus and methods for detecting whether a video is adult or non-adult. In certain embodiments, a learning system is operable to generate one or more models for adult video detection. The model is generated based on a large set of known videos that have been defined as adult or non-adult. Adult detection is then based on this adult detection model. This adult detection model may be applied to selected key frames of an unknown video. In certain implementations, these key frames can be selected from the frames of the unknown video. Each key frame may generally correspond to a frame that contains key portions that are likely relevant for detecting pornographic or adult aspects of the unknown video. By way of examples, key frames may include moving objects, skin, people, etc. In alternative embodiments, a video is not divided into key frames and all frames are analyzed by a learning system to generate a model, as well as by an adult detection system based on such model.

Description

BACKGROUND OF THE INVENTION
The present invention is related to analyzing video content. It especially pertains to analyzing video content to determine whether such video is pornographic.
In multimedia applications, a user may wish to view one or more videos that have an unknown content. In a search application example, a user may search for videos related to a keyword. Some of the videos that are found based on the keyword may be pornographic in nature, and the user may not wish to inadvertently view such pornographic or adult videos. Additionally, adult video content may be inappropriate for people under 18 years of age and it would be beneficial to screen adult video content from reaching users who are minors.
Accordingly, it would be beneficial to provide mechanisms for detecting whether a video is an adult video or is suspected of being an adult video.
SUMMARY OF THE INVENTION
Accordingly, apparatus and methods for detecting whether a video is adult or non-adult are provided. In certain embodiments, a learning system is operable to generate one or more models for adult video detection. The model is generated based on a large set of known videos that have been defined as adult or non-adult. Adult detection is then based on this adult detection model. This adult detection model may be applied to selected key frames of an unknown video. In certain implementations, these key frames can be selected from the frames of the unknown video. Each key frame may generally correspond to a frame that contains key portions that are likely relevant for detecting pornographic or adult aspects of the unknown video. By way of examples, key frames may include moving objects, skin, people, etc. In alternative embodiments, a video is not divided into key frames and all frames are analyzed by a learning system to generate a model, as well as by an adult detection system based on such model.
In one embodiment, a method for detecting pornographic or adult videos is disclosed. For an unknown video having a plurality of frames, a plurality of key frames selected from the frames of the unknown video is defined. Each key frame corresponds to a frame that contains features that are likely relevant for detecting pornographic or adult aspects of the unknown video. The key frames are analyzed using an adult detection model that was generated by a learning process based on a training set of images and their associated adult indicators that each specifies whether the associated known image is an adult or non-adult image, whereby the analysis results in an adult indicator that specifies whether the unknown video is an adult video, a non-adult video, or a suspected adult video.
In a specific implementation, defining the key frames comprises (i) determining one or more portions of each frame that are significantly different from corresponding portions of a plurality of adjacent frames and (ii) defining the key frames based on the significantly different one or more portions of each frame. In a further aspect, analyzing the key frames comprises (i) analyzing one or more of the significantly different portions of each key frame with the adult detection model to thereby determine an adult indicator for such one or more of the significantly different portions of such each key frame being adult or non-adult, and (ii) determining the adult indicator of the unknown video based on the adult indicators for the key frames. In yet a further aspect, an adult indicator is determined for each significantly different portion of each key frame that is determined to include a moving object.
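A simplified sketch of defining key frames from significantly different portions is given below. It compares each frame only with the immediately preceding frame, using the mean intensity of fixed-size tiles, whereas the text refers to a plurality of adjacent frames; the tile size, the threshold, and all names are assumptions of the sketch.

```python
def block_means(frame, block):
    """Mean intensity of each block x block tile, keyed by (tile_row, tile_col)."""
    means = {}
    for r in range(0, len(frame), block):
        for c in range(0, len(frame[0]), block):
            tile = [frame[i][j]
                    for i in range(r, min(r + block, len(frame)))
                    for j in range(c, min(c + block, len(frame[0])))]
            means[(r // block, c // block)] = sum(tile) / len(tile)
    return means

def changed_portions(prev_frame, frame, block=2, threshold=30.0):
    """Tiles whose mean intensity differs from the previous frame by more than threshold."""
    before = block_means(prev_frame, block)
    after = block_means(frame, block)
    return [key for key in after if abs(after[key] - before[key]) > threshold]

def select_key_frames(frames, block=2, threshold=30.0):
    """Return (frame_index, changed_tiles) for each frame with at least one changed tile."""
    selected = []
    for i in range(1, len(frames)):
        portions = changed_portions(frames[i - 1], frames[i], block, threshold)
        if portions:
            selected.append((i, portions))
    return selected

# Hypothetical 4x4 grayscale frames: two static frames, then a change in the top-left tile.
static = [[10] * 4 for _ in range(4)]
moving = [row[:] for row in static]
for i in range(2):
    for j in range(2):
        moving[i][j] = 200
frames = [static, static, moving]
key_frames = select_key_frames(frames)
```

Only the changed tiles of each selected key frame would then be passed to the adult detection model, in line with the further aspect described above.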
In another implementation, prior to analyzing the key frames of the unknown video, the learning process is executed based on one or more key frame features extracted from each known image and each known image's associated adult indicator so as to generate the adult detection model that is to be used for the unknown video. In a further aspect, a plurality of key frame features is extracted from the key frames of the unknown video. The analyzing of the key frames of the unknown video is based on the extracted key frame features for such unknown video, and the same type of features is used for analysis of the key frames of the unknown video and by the learning process. In another example, after analyzing the key frames of the unknown video so that the unknown video is defined as a new known video, the key frames and associated adult indicators are included in the training set of known images, and the learning process is executed based on each known image, including the key frames, and each known image's adult indicator, including the key frames' adult indicators, so as to generate a new adult detection model to be used for adult detection of new unknown videos. In one embodiment, one or more adult indicators of the known images, which include the key frames of the new known video, are manually corrected prior to executing the learning process on such known images.
In another embodiment, the invention pertains to an apparatus having at least a processor and a memory. The processor and/or memory are configured to perform one or more of the above described operations. In another embodiment, the invention pertains to at least one computer readable storage medium having computer program instructions stored thereon that are arranged to perform one or more of the above described operations.
These and other features of the present invention will be presented in more detail in the following specification of the invention and the accompanying figures which illustrate by way of example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagrammatic representation of an adult detection system for unknown videos in accordance with one embodiment of the present invention.
FIG. 2A is a flowchart illustrating processes for adult video detection in accordance with one implementation of the present invention.
FIG. 2B includes two screen shots from an example search application in which a user may select to filter adult videos from their search results in accordance with a specific implementation.
FIG. 3 illustrates example processes for implementation of the learning system and the adult key frame detection system of FIG. 1 in accordance with one embodiment of the present invention.
FIG. 4 is a diagrammatic representation of applying key frames detection to an unknown video in accordance with one embodiment of the present invention.
FIG. 5 is a diagrammatic representation of a plurality of key frame adult indicators in accordance with a specific implementation.
FIG. 6 is a simplified diagram of a network environment in which specific embodiments of the present invention may be implemented.
FIG. 7 illustrates an example computer system in which specific embodiments of the present invention may be implemented.
DETAILED DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Reference will now be made in detail to a specific embodiment of the invention. An example of this embodiment is illustrated in the accompanying drawings. While the invention will be described in conjunction with this specific embodiment, it will be understood that it is not intended to limit the invention to one embodiment. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
In certain embodiments, pornographic or adult videos are detected from a set of unknown videos, such as the results obtained by a search service. An adult video may have content that would be deemed by a particular community or societal construct to be suitable only for adults, e.g., over 17 or 18, to view. That is, the definition of an “adult” or “pornographic” video is subjective and depends on the specific requirements or social norms of a group of people, culture, government, or company. Additionally, some societies or communities may have different age thresholds for which it is deemed suitable for viewing or not viewing adult videos.
In certain embodiments, adult detection is based on an adult detection model that is generated from a learning process that analyzes a large set of known videos that have been defined as adult or non-adult. This adult detection model may be applied to selected key frames of an unknown video. In certain implementations, these key frames can be selected from the frames of the unknown video. Each key frame may generally correspond to a frame that contains key portions that are likely relevant for detecting pornographic or adult aspects of the unknown video. By way of examples, key frames may include moving objects, skin, people, etc. In alternative embodiments, a video is not divided into key frames and all frames are analyzed by a learning system to generate a model, as well as by an adult detection system based on such model.
Such adult detection may have any number of uses. For example, detected adult videos may be filtered from search results that are presented to certain users, e.g., who select filtering or are minors. Although several example embodiments of an adult detection technique will now be described with respect to a search application, of course, the adult detection techniques of the present invention can be applied to a diverse number and/or type of applications that could utilize an adult detection process. Examples of other applications include techniques for selecting or displaying advertisements over a computer, mobile phone, or TV network, recommending content to users, or selecting content to be delivered to the user, etc. In general, the inventive method embodiments are applicable in any application that provides video content.
FIG. 1 is a diagrammatic representation of an adult detection system 100 for unknown videos in accordance with one embodiment of the present invention. The term “unknown” video is not meant to imply that the unknown video cannot include a tag indicating whether it is an adult video. Said in another way, the adult detection techniques described herein can be implemented independently of the video's self-labeling as to adult content. Accordingly, these adult detection techniques do not need to rely on the tagging or ratings of each video, which may be untrustworthy or incorrect. For example, adult labels or tags may be applied to videos based on inconsistent standards or policy that may be more or less stringent than desired by the users of such adult detection system.
As shown, the adult detection system 100 may include a learning system 108 for generating an adult detection model, an adult detection module 106 for adult detection (e.g., for a particular key frame) based on such model, a key frame extraction module 104 for extracting key frames from an unknown video, and an adult categorization module 114 for categorizing the unknown video based on the adult detection output for the key frames of such unknown video.
Key frame extraction module 104 may receive an unknown video, e.g., one that has not yet been analyzed by adult detection module 106. The key frame extraction module generally defines a set of key frames for the unknown video that can be usefully analyzed by adult detection module 106. In this implementation, the adult detection module 106 receives each key frame and outputs an adult indicator for each key frame to adult categorization module 114. The adult indicator for a particular image indicates whether one or more portions of such image are adult or non-adult, and may also indicate a confidence value for such adult or non-adult indication. The adult indicator may either be determined based on an adult detection model from learning system 108 or be retrieved from known videos and key frames database 110.
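The choice between retrieving a stored indicator and running the model can be sketched as follows. The dictionary standing in for database 110, the fingerprint keys, and the stub model are hypothetical; any callable that returns a label and a confidence value could take the model's place.

```python
def detect_key_frames(key_frames, model, known_indicators):
    """Return one adult indicator per key frame.

    key_frames: list of (fingerprint, features) pairs
    model: callable taking features and returning (label, confidence)
    known_indicators: fingerprint -> (label, confidence), standing in for database 110
    """
    indicators = []
    for fp, features in key_frames:
        if fp in known_indicators:
            # Previously analyzed key frame: reuse the stored indicator.
            indicators.append(known_indicators[fp])
        else:
            # Unknown key frame: apply the adult detection model.
            indicators.append(model(features))
    return indicators

def stub_model(features):
    """Placeholder for a learned model; the rule here is arbitrary."""
    return ("adult", 90.0) if sum(features) > 1 else ("non-adult", 95.0)

known = {"fp1": ("non-adult", 99.0)}
frames = [("fp1", [5, 5]), ("fp2", [5, 5]), ("fp3", [0, 0])]
indicators = detect_key_frames(frames, stub_model, known)
```

The resulting list of per-key-frame indicators is what the adult categorization module would then aggregate into a video-level indicator.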
The adult categorization system 114 receives the key frames and their adult indicators for an unknown video and then determines whether the video is an adult video, a non-adult video, or a suspected adult video based on the received key frame adult indicators. The newly known video and its associated adult indicator may be retained in database 110. The adult categorization system 114 may also reassess the key frames and modify their associated adult indicators based on the video's overall adult indicator, as explained further herein. The adult categorization system 114 may also retain these newly known key frame adult indicators, e.g., in database 110.
The learning system 108 may be configured to receive information regarding a large training set of known videos and images and then generate an adult detection model based on this training set that is output to adult detection module 106. The training set of images may be obtained from any suitable storage device or devices, such as from a known videos and key frames database 110. The known video and key frames database 110 may include identifying information for a plurality of known images (or the images themselves) and an adult indicator associated with each image that specifies whether the image is an adult or non-adult image, as well as identifying information for each known video. This training set may initially be provided by manually classifying a large set of images as adult or non-adult. In one embodiment, 6000 or more images (3000 adult and 3000 non-adult) are initially, manually classified as adult or non-adult so as to achieve a reasonable level of accuracy for the adult detection model.
The training set of images may also include images that have been analyzed by the adult detection system 106 based on a previously generated model. For instance, a new model may be generated once a month or every week. That is, a feedback mechanism may be provided so that a new adult detection model is generated periodically based on newly analyzed key frames. The system 100 may also include a manual adult indication and/or correction module 112. This manual module 112 may include mechanisms to allow a user to manually provide or correct an adult indicator for any number of images or key frames, e.g., of known videos and key frames database 110. In other words, the manual module may allow a user to provide the initial training set and/or to correct adult indicators that are determined by the adult detection system 106. For example, the manual module may include a user interface for viewing images and inputting an adult indicator value (e.g., adult or non-adult) by any suitable input mechanisms, such as a pull-down menu with selectable adult and non-adult options, selectable adult and non-adult buttons, or a text input box into which a user can enter a string indicating “adult” or “non-adult” by way of examples.
FIG. 2A is a flowchart illustrating processes for adult video detection in accordance with one implementation of the present invention. Initially, unknown video 102 may be received into the key frame extraction module 104. An unknown video may originate from any suitable source. Although only described with respect to a single unknown video, the following operations may be performed for each unknown video in a set of unknown videos. In one example, the unknown video is one of the search results that were obtained for a particular user video search, and adult detection may be performed on each of the search results that have not been previously analyzed for adult content.
In one search application, the adult detection system may be configured on or accessible by a search server. The search server may take any suitable form for performing searches for videos. Embodiments of the present invention may be employed with respect to any search application, and example search applications include Yahoo! Search, Google, Microsoft MSN and Live Search, Ask Jeeves, etc. The search application may be implemented on any number of servers.
FIG. 2B includes two screen shots from an example search application 250, e.g., from Yahoo! of Sunnyvale, Calif. In this example, the search application of a search server may present a web page 252 having an input feature in the form of input box 154 to the client so the client can enter one or more search term(s). In a typical implementation, a user may type any number of search terms into the search input feature. Selectable options for choosing different types of searches, such as video or images, may also be present next to the input feature. As shown, a user may select a video option 156 for searching videos.
When a search for videos based on one or more search terms is initiated in a query to a search server, the search server then locates a plurality of videos that relate to the search terms. These videos can be found on any number of web servers and usually enter the search server via a crawling and indexing pipeline possibly performed by a different set of computers (not shown). The plurality of located videos may then be analyzed by a rule based or decision tree system to determine a “goodness” or relevance ranking. For instance, the videos are ranked in order from most relevant to least relevant based on a plurality of feature values of the videos, the user who initiated the search with a search request, etc.
At this point, adult video detection may be implemented so as to filter out adult videos from the search results. The adult detection may be selected by the user, e.g., via a selectable search option or via a user profile that was previously set up by the user. The adult detection may also be automatically performed based on the user's age, e.g., when the user is younger than 18 or 17 years old. In FIG. 2B, a user may select an “Advanced Video Search” option 258 to be applied to the current video search, or modify their user preferences 260 for all video searches performed by the user. In this example, the user preferences are only applied when the user is logged in during performance of a search. Other mechanisms may be utilized to detect the user's preference, besides a login, so as to apply adult video detection for such user.
Screen shot 262 includes option 264 a for “Filtering out adult Web, video, and image search results”, option 264 b for “Filtering out adult video and image search results only”, and option 264 c for “Do not filter results”. The user preferences may also be applied more generally to the computer on which the preferences are being set. As shown, the user may select option 266 so as to “Lock safe search setting to filter out adult web, video, and image search results” for anyone signed in to the computer who is under 18 or when searches are performed without logging into the computer.
Once the videos are ranked and filtered, the ranked and filtered lists of documents/objects can then be presented to the user in a search results list that is ordered based on ranking. The ranking and/or adult detection processes may be performed by the search server that has received the search query or by another server, such as a specially configured ranking server (not shown).
Referring back to the key frame extraction process, significantly different portions of each frame of the unknown video 102 may be determined in operation 202. That is, each particular frame is analyzed to determine the portions of that frame that are significantly different from corresponding portions of a specified number of adjacent frames. These portions may be defined for further analysis while background portions are excluded from such analysis. This process generally serves to filter out large portions of the background or noise from each frame while retaining the moving portions of each frame. In one implementation, each frame is compared to a predefined number of adjacent frames to detect the portions of each frame that differ from the corresponding adjacent frame portions. Any suitable number of adjacent frames, such as 96 adjacent frames, may be utilized.
Significantly different portions may be found for each frame using any suitable compression technique for removing pixels that are common between a majority of a predefined set of adjacent frames. For example, any suitable video compression approach, such as an MPEG (Moving Picture Experts Group) technique, may be used. In a specific implementation, a modified version of a video compression approach may be used so as to define or detect motion (e.g., moving objects) out of a background and also identify separately each moving object. A simple motion detection approach would be to compare the current frame with the previous frame (which is what is widely used in video compression techniques). However, unlike a video compression technique, the background (or the starting frame) is not constant, e.g., does not rely on a single beginning frame as a reference. In contrast, the reference or background frame actually changes in the direction of the subsequent frames. That is, changes may be tracked, and these tracked changes may be relative to multiple previous frames, not just the beginning frame of a video.
In one example, an original frame, Fo, can first be defined, as well as a next frame, Fn, and a previous frame, Fp, with respect to the current, original frame, Fo. The first step may include finding where the previous frame, Fp, differs from the current (original) frame, Fo. For this purpose, a differencing filter may be applied between the gray scale images obtained from Fp and Fo using a predefined threshold, such as 15%. The result from this difference filter may be an image with white pixels at specific areas for which the current (original) frame is different from the previous (background) frame by an amount that is equal to or above the predefined threshold, e.g., 15%. These specific areas can then be used to count the number of pixels that have actually changed between each pair of frames within a predefined subset of frames, e.g., frames F1-F96. This comparison can now be made with color information. If the difference in pixels between any two of these frames in the predefined subset (e.g., F1-F96) exceeds a predetermined amount, e.g., 35%, a new previous/background frame (Fp) may be used for the original frame, Fo, and the above described modified process is then repeated using the new previous frame, Fp. When the difference in pixels between each of the pairs of frames in the predefined set, e.g., frames F1-F96, is less than 35%, the background of the new, current, previous frame, Fp, may then be subtracted from the current, original frame, Fo, to obtain the significantly different portions of such current frame, Fo. This modified process can be repeated for each frame of the video being defined as the current frame, Fo, as well as new previous frames for such new current frame.
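The differencing step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`changed_mask`, `changed_fraction`, `pick_reference`, `moving_portions`) are hypothetical, the 15% threshold is interpreted as a fraction of the full 0-255 grayscale range, and the 35% reset rule is simplified to walking backwards from the current frame until an earlier frame differs from it in more than 35% of its pixels.

```python
import numpy as np

def changed_mask(prev_gray, curr_gray, threshold=0.15):
    """True where the grayscale difference is at least `threshold` of the 0-255 range."""
    diff = np.abs(curr_gray.astype(np.float64) - prev_gray.astype(np.float64)) / 255.0
    return diff >= threshold

def changed_fraction(prev_gray, curr_gray, threshold=0.15):
    """Fraction of pixels that changed between two grayscale frames."""
    return float(changed_mask(prev_gray, curr_gray, threshold).mean())

def pick_reference(frames, current_idx, max_lookback=96, reset_fraction=0.35):
    """Assumed rule: use the oldest frame (at most `max_lookback` frames back)
    that is reachable without passing a frame differing from the current
    frame in more than `reset_fraction` of its pixels."""
    ref = max(current_idx - 1, 0)
    oldest = max(current_idx - max_lookback, 0)
    for idx in range(current_idx - 1, oldest - 1, -1):
        if changed_fraction(frames[idx], frames[current_idx]) > reset_fraction:
            break
        ref = idx
    return ref

def moving_portions(reference_gray, current_gray, threshold=0.15):
    """Keep the changed pixels of the current frame and zero out the background."""
    mask = changed_mask(reference_gray, current_gray, threshold)
    return np.where(mask, current_gray, 0)
```

For a fast-moving video the loop in `pick_reference` stops after a few frames, which matches the observation that the reference frame may be only a few frames behind the current frame.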
This modified compression process has several features. Since most videos have grainy images that may be interpreted as motion, an erosion technique may be applied before the differencing operations so as to prevent random motion bits from manifesting. Additionally, the previous frame may not actually be the literal previous frame. The previous frame may actually be closely behind the current frame or may be up to 96 frames behind the current frame, depending on the difference in the number of pixels that have been found to have changed. Sometimes the previous frame may just be 3 or 4 frames behind (for example, for a fast-moving video). Whenever multiple moving objects are detected (identified by multiple closed boundaries that represent separate areas within the white (differenced) image), sudden disappearance of such objects would tend to cause the background/previous frame reference to be reset (to a different previous frame).
An example application of difference detection is illustrated in FIG. 4. As shown, video 402 includes a plurality of frames 404 (e.g., frames 06 through 18 are shown). When difference detection is applied, the background is substantially filtered out of the frames to produce significantly different portions for frames 406. A portion of the background may be retained around each significantly different portion (or moving object) to provide context to the further analysis procedures.
Key frames may then be identified or defined based on the significantly different portions of the video's frames in operation 204. By way of example, a full speed, 29 frames per second, video may be reduced to a collection of key frames that represent the whole video and include images that are significantly different from each other. For instance, a set of key frames can be selected from the frames and their significantly different portions based on content differential. Content differential factors may include a quantification or qualification of any suitable characteristics. In one implementation, content differential factors may include a quantification or qualification of one or more of the following image characteristics: motion and spatial activity, likeliness that the image contains people, skin-color detection, and/or face detection. In the example of FIG. 4, the significantly different portions of each frame, as shown by 406, are reduced to key frames 408. For instance, frames 09, 12, 15, and 18 of video 402 are selected as key frames 408.
In a specific implementation of key frame detection, the video is initially divided into shots. One or more shots are then selected. One or more key frames are then selected from each selected shot. Shot detection may be based on detecting discontinuities in motion activity and changes in pixel value histogram distribution. Shot and key frame selection may be based on measures of motion activity, spatial activity, skin-color detection, and face detection. Motion activity may be measured by frame difference, and spatial activity may be determined by the entropy of pixel values distribution. Skin-color and face detection may be based on a learning system, such as described in (i) M. J. Jones et al., “Statistical Color Models with Applications to Skin Detection”, TR 98-11, CRL, Compaq Computer Corp., December 1998 and (ii) H. A. Rowley et al., “Neural Network-Based Face Detection”, IEEE Trans. On PAMI, 20 (1): 23-38, 1998, which documents are incorporated herein by reference. Alternatively, key frame detection may simply be based on measurable features, rather than object detection. For instance, key frame detection may occur without face detection. One key frame detection technique is further described in Frederic Dufaux, “Key frame selection to represent a video”, IEEE Proceedings 2000 International Conference on Image Processing, Vol. II of III: 275-278, Sep. 10-13, 2000, which document is incorporated herein by reference.
In other embodiments, a video may be first divided into shots. A shot may be defined as a set of frames that are captured from a same perspective. Shot detection may rely on a measure of frame-to-frame change. Several suitable techniques of shot detection are further described in B. L. Yeo et al., “Rapid Scene Analysis on Compressed Video”, IEEE Trans. On CSVT, 5 (6): 533-544, 1995, which document is incorporated herein by reference. A key frame is then selected for each shot. For example, the first frame of each shot may be selected. If significant changes (e.g., in color or motion) occur in a particular shot, multiple key frames may be selected for such shot, e.g., by using a clustering technique. Clustering techniques are described further in Y. Zhuang et al., “Adaptive Key Frame Extraction Using Unsupervised Clustering”, Proc. Of. Int. Conf. on Image Proc., Chicago, October 1998, which document is incorporated herein by reference.
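The two measurable quantities named above, motion activity as a frame difference and spatial activity as the entropy of the pixel value distribution, can be sketched as follows. The function names are hypothetical, and `pick_key_frame` uses an assumed selection rule (highest spatial activity within a shot) purely for illustration; the text does not prescribe how the measures are combined.

```python
import numpy as np

def motion_activity(prev_gray, curr_gray):
    """Mean absolute grayscale difference between two consecutive frames."""
    diff = curr_gray.astype(np.int16) - prev_gray.astype(np.int16)
    return float(np.abs(diff).mean())

def spatial_activity(gray):
    """Shannon entropy (in bits) of the 8-bit pixel value distribution."""
    counts = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

def pick_key_frame(shot_frames):
    """Assumed rule: index of the frame with the highest spatial activity."""
    scores = [spatial_activity(frame) for frame in shot_frames]
    return int(np.argmax(scores))
```

A flat frame has zero entropy, while a frame split evenly between two gray levels has an entropy of one bit, so the second would be preferred under this assumed rule.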
Once a set of key frames is detected for the unknown video, each key frame may then be analyzed to determine an adult indicator for each key frame. Referring back to FIG. 2A, a first key frame is then obtained in operation 206. It is then determined whether an adult indicator is already associated with the current key frame in operation 208. For instance, the current key frame may have already been processed in another video during the same search, during a previous search, or manually classified as part of the initial training set of videos or as a corrected key frame. In any of these cases, an adult indicator, e.g., an indication as to whether the key frame is an adult or non-adult key frame and a confidence value for such indication, may already be associated with the current key frame, e.g., in database 110. If the current key frame is already associated with an adult indicator, this adult indicator is then obtained in operation 210.
If an adult indicator is not already associated with the current key frame, the current key frame is sent to the adult detection module 106, which outputs an adult indicator for the current key frame. Whether the adult indicator for the current key frame is obtained from a database or determined by the adult detection module 106, the adult indicator for the current key frame may be retained in operation 212. For instance, a unique identifier for the current key frame and its associated adult indicator are retained in database 110. A unique identifier may take any suitable form, such as a unique name or reference that is associated with each frame. It may then be determined whether there are more key frames in operation 214. That is, it is determined whether all of the key frames for the unknown video have been processed. If there are more key frames, the next key frame is obtained in operation 206 and operations 208 through 212 are repeated for such next key frame. When there are no more key frames, the key frame adult indicators for the unknown video are sent to the adult categorization module 114, which outputs an adult indicator for the unknown video based on such key frame adult indicators.
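The per-key-frame loop of operations 206 through 214 can be sketched as a lookup with a fallback to the detector. In this sketch the dictionary `known` stands in for database 110 and the callable `detect` stands in for the adult detection module 106; both names, and the tuple layout of an indicator, are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Indicator = Tuple[str, float]  # (label, confidence), e.g. ("adult", 0.91)

def indicators_for_video(
    key_frame_ids: Iterable[str],
    known: Dict[str, Indicator],
    detect: Callable[[str], Indicator],
) -> List[Indicator]:
    """Return one adult indicator per key frame, reusing stored results."""
    results = []
    for frame_id in key_frame_ids:
        if frame_id not in known:
            # No stored indicator: run the detector and retain its output.
            known[frame_id] = detect(frame_id)
        results.append(known[frame_id])
    return results
```

Because the result is retained under the key frame's unique identifier, a key frame shared by several videos is only sent to the detector once.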
FIG. 3 illustrates example processes for implementation of the learning system 108 and the adult key frame detection system 106 of FIG. 1 in accordance with one embodiment of the present invention. Before the adult key frame detection system 106 can analyze a key frame to determine an adult indicator, an adult detection model is provided by the learning system 108 to the adult key frame detection module 106.
The learning system may generate an adult detection model utilizing any suitable learning process. The learning system generally may receive information regarding known videos and key frames and their associated adult indicators from database 110. For instance, an index of unique video and key frame identifiers associated with adult indicators and references to the actual videos and key frames may be stored in database 110. The key frames that are associated with the index may be retrieved and analyzed by the learning system 108.
In the illustrated example, one or more key frame features may then be extracted from the known key frames in operation 302. Any suitable key frame features may be extracted from each key frame. In a specific implementation, spatial and/or color distribution features and texture features are extracted. In a further embodiment, audio as well as visual characteristics may also be extracted.
Some techniques that may be used in key feature extraction (or key frame extraction or in any of the frame or video analysis techniques described herein) may include but are not limited to: 1) generating a histogram that counts and graphs the total number of pixels at each grayscale level (e.g., a histogram may be used to detect underexposure or saturation in an image/video), 2) generating a line profile that plots the variations of intensity along a line (e.g., line profiles are sometimes helpful in determining the boundaries between objects in an image/video), 3) performing intensity measurements to measure grayscale statistics in an image/video or a region of an image/video, such as but not limited to minimum intensity value, maximum intensity value, mean intensity value, standard deviation of the intensity value, 4) using look-up tables to convert grayscale values in the source image/video into other grayscale values in a transformed image/video, 5) using spatial filters to remove noise, smooth, sharpen or otherwise transform an image/video, such as but not limited to Gaussian filters for smoothing images/video, Laplacian filters for highlighting image/video detail, Median and nth order filters for noise removal and Prewitt, Roberts and Sobel filters for edge detection, 6) using grayscale morphology to filter or smooth the pixel intensities of an image/video, to alter the shape of regions by expanding bright areas at the expense of dark areas, remove or enhance isolated features, smooth gradually varying patterns and increase the contrast in boundary areas, 7) using frequency domain processing to remove unwanted frequency information, such as noise, 8) using blob (binary large object) analysis with regard to touching pixels having the same logic state (e.g., blob analysis may be used to find statistical information such as the size of blobs or the number, location and presence of blob regions to locate particular objects in an image/video), 9) using thresholding to select ranges of pixel values in grayscale and color images/video that separate objects under consideration from the background, or 10) using binary morphological operations to extract and/or alter the structures of particles (e.g., blobs) in a binary image/video including primary binary morphology and advanced binary morphology.
Other image processing techniques may include 11) using edge detection algorithms, 12) using gauging of dimensional characteristics of objects, 13) using image correlation to determine how close an image/video is to an expected image/video (e.g., comparing a newly captured image/video to a recorded image/video that has already been analyzed for object identification), 14) using pattern matching to locate regions of a grayscale image/video and determine how close the grayscale image/video matches a predetermined template (e.g., pattern matching may be configured to find template matches regardless of poor lighting, blur, noise, shifting of the template or rotation of the template. For graphical components on a captured image/video, the size, shape, location, etc. that correspond to specific objects in an image/video may be predetermined which allows a template to be constructed for particular object sets), 15) using optical character recognition algorithms and methods, 16) using color matching to quantify which color, how much of each color and/or ratio of colors exist in a region of an image/video and compare the values generated during color matching to expected values to determine whether the image/video includes known reference object colors, and 17) using color pattern matching to locate known reference patterns in a color image/video.
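As a small illustration of items 1) and 3) in the list above, a grayscale histogram and basic intensity statistics can be combined into a single feature vector. The function name `key_frame_features`, the choice of 16 bins, and the normalization to the range 0 to 1 are assumptions of this sketch; the text leaves the feature set open.

```python
import numpy as np

def key_frame_features(gray, bins=16):
    """Concatenate a normalized grayscale histogram with intensity statistics.

    `gray` is assumed to be a 2-D array of 8-bit grayscale values.
    """
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    hist = hist / hist.sum()  # fraction of pixels per grayscale bin
    stats = np.array(
        [gray.min(), gray.max(), gray.mean(), gray.std()], dtype=np.float64
    )
    return np.concatenate([hist, stats / 255.0])
```

A vector of this kind, computed for every known key frame together with its adult indicator, is the sort of input a learning algorithm could be trained on.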
Referring back to FIG. 3, after features are extracted from each known key frame, a learning algorithm may then be executed on the extracted key frame features in operation 352. The learning algorithm outputs an adult detection model to the adult key frame detection system 106.
Any suitable learning system may be utilized. For example, a suitable open source learning algorithm, which is known as the Support Vector Machine, is available through Kernel-Machines.org. Embodiments of the Support Vector Machine are further described in (i) the publication by Ron Meir, “Support Vector Machines—an Introduction”, Dept. of Electr. Eng. Technion, Israel, June 2002, (ii) U.S. Pat. No. 7,356,187, issued 8 Apr. 2008 to Shanahan et al., and (iii) U.S. Pat. No. 6,816,847, issued 9 Nov. 2004 to Toyama, which document and patents are incorporated herein by reference in their entirety.
For example, Support Vector Machines may build classifiers by identifying a hyperplane that partitions two classes of adult and non-adult videos or images in a multi-dimensional feature space into two disjoint subsets with a maximum margin, e.g., between the hyperplane and each class. In the linear form of SVM that is employed in one embodiment, the margin is defined by the distance of the hyperplane to the nearest adult and non-adult cases for each class. Different SVM-based training methods include maximizing the margin as an optimization problem.
Mathematically, a linear SVM (non-linear SVMs are also contemplated) can be represented, for example, in the following two equivalent forms: using a weight vector representation; or using a support vector representation. The weight vector representation mathematically can represent an SVM (the separating hyperplane) as a pair of parameters <W, b>, where W denotes a weight vector and b represents a threshold or bias term. The weight vector W can include a list of tuples of the form <fi, wi>, where fi denotes a feature and wi denotes the weight associated with feature fi. This corresponds to a vector space representation of the weight vector W. Here, the weight value wi associated with each feature fi and the threshold value b may be learned from examples using standard SVM learning algorithms. This weight vector representation is also known as the primal representation. The support vector representation of an SVM model, also known as the dual representation, mathematically represents an SVM (the separating hyperplane) as a pair of parameters <SV, b>, where SV denotes a list of example tuples, known as support vectors, and b represents a threshold. The support vector list can include tuples of the form <SVi, αi>, where SVi denotes an example video with known classification and αi denotes the weight associated with example SVi. The Euclidean (perpendicular) distance from the hyperplane to the support vectors is known as the margin of the support vector machine.
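The equivalence of the two representations for a linear SVM can be illustrated numerically. The sketch below assumes the standard formulation in which each support vector carries a class label of +1 or -1 in addition to its weight αi, and it arbitrarily treats a positive score as the adult side of the hyperplane; the function names are hypothetical.

```python
import numpy as np

def decide_primal(x, w, b):
    """Score from the weight vector representation <W, b>."""
    return float(np.dot(w, x) + b)

def decide_dual(x, support_vectors, alphas, labels, b):
    """Score from the support vector representation <SV, b> (linear kernel)."""
    return float(np.sum(alphas * labels * (support_vectors @ x)) + b)

def weights_from_dual(support_vectors, alphas, labels):
    """Recover W as the weighted, signed sum of the support vectors."""
    return (alphas * labels) @ support_vectors

def margin(w):
    """Distance from the hyperplane to the support vectors (canonical form)."""
    return float(1.0 / np.linalg.norm(w))

def classify(x, w, b):
    """Assumed sign convention: positive scores fall on the adult side."""
    return "adult" if decide_primal(x, w, b) > 0 else "non-adult"
```

With support vectors (1, 1) and (-1, -1), labels +1 and -1, and both weights equal to 0.25, the recovered weight vector is (0.5, 0.5), and both representations give the same score for any input.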
The parameters of the support vector machine model may be determined using a learning algorithm in conjunction with a training data set that characterizes the information need, i.e., a list of videos or key frames that have been labeled as adult or non-adult. Abstractly, learning a linear SVM model may include determining the position and orientation of the hyperplane that separates the adult examples and non-adult examples that are used during learning. The parameters of the weight vector representation or the support vector representation may also be determined. Learning a support vector machine can be viewed both as a constraint satisfaction and optimization algorithm, where the first objective is to determine a hyperplane that classifies each labeled training example correctly, and where the second objective is to determine the hyperplane that is furthest from the training data, so that an adult detection model is determined.
Referring back to FIG. 3, the model that is output from learning system 108 may be used for each unknown video and its unknown key frames. In the illustrated example, an unknown key frame 301 is received by the adult key frame detection system 106. One or more key frame features may then be extracted from such unknown key frame, e.g., as described above for the learning system, in operation 302. The adult detection model may then be executed to obtain an adult indicator for the current key frame in operation 304. The key frame adult indicator may then be output from the adult key frame detection system 106.
Classifying a key frame using an SVM model reduces to determining on which side of the hyperplane the example falls. If the example falls on the adult side of the hyperplane then the example is assigned an adult label; otherwise it is assigned a non-adult label. This form of learned SVM is known as a hard SVM. Other types of SVM exist which relax the first objective. For example, not requiring all training examples to be classified correctly by the SVM leads to a type known as soft SVMs. In this case the SVM learning algorithm trades accuracy of the model against the margin of the model. Other types of SVMs and SVM learning algorithms also exist and may be utilized by techniques of the present invention.
Once each key frame of an unknown video has been assigned at least one adult indicator, the adult categorization module may then determine an adult indicator for the entire unknown video based on the key frames' adult indicators. In one embodiment, each significantly different portion of each key frame that is determined to be a moving object is assigned an adult indicator. FIG. 5 is a diagrammatic representation of a plurality of key frame adult indicators in accordance with a specific implementation. As shown, portion 502 a of key frame 09 has an adult indicator that specifies “non-adult” and a 97.23% confidence level, and portion 502 b of key frame 12 has an adult indicator that specifies “non-adult” and a 99.21% confidence level. Key frames 15 and 18 each have two portions that each have a representative adult indicator. Key frame 15 has a portion 504 a with an adult indicator of “adult” at a 91.28% confidence level and a portion 502 c with an adult indicator of “non-adult” at a 96.22% confidence level. Key frame 18 has a portion 504 b with an adult indicator of “adult” at a 63.06% confidence level and a portion 502 d with an adult indicator of “non-adult” at a 98.33% confidence level.
Any suitable technique may be used to determine an unknown video's adult indicator based on the key frame adult indicators. In one implementation, an average confidence value is determined for all of the key frames for both adult and non-adult portions. For instance, the confidence level for the video being non-adult may be determined by (97.23+99.21+96.22+98.33)/4, which equals 97.75%. Likewise, the adult confidence level may be determined by (0+0+91.28+63.06)/4, which equals 38.59%. The final determination may be based on different thresholds for adult and non-adult confidence levels. For instance, when the aggregate (total) non-adult confidence level exceeds 97%, the unknown video is deemed to be safe (non-adult), provided that the aggregate adult confidence level is below 50%. In other examples, when the adult confidence is above 70% and the non-adult confidence is below 61%, the unknown video may be deemed adult. Additionally, the unknown video may be deemed a suspected adult video when the adult confidence level is above 70%, while the non-adult confidence level is above 61.11%. Other thresholds that may be used involve non-deterministic scenarios such as an unknown video having too low aggregate confidence scores (for example, less than 70% adult and less than 61% non-adult). Likewise if an unknown video has very high scores (contention) between adult as well as non-adult cut-offs (e.g., 80% adult and 99% non-adult), the unknown video can be deemed as suspect safe.
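The averaging and the first three threshold rules described above can be sketched as follows. The data layout (one dictionary of confidence percentages per key frame, with a missing label counted as 0) and the function names are assumptions. The sketch returns "undetermined" for every combination not covered by those three rules, since the remaining scenarios are described only by example.

```python
def aggregate(key_frames):
    """Average adult and non-adult confidence (in percent) over all key frames."""
    n = len(key_frames)
    adult = sum(kf.get("adult", 0.0) for kf in key_frames) / n
    non_adult = sum(kf.get("non-adult", 0.0) for kf in key_frames) / n
    return adult, non_adult

def categorize(adult, non_adult):
    """Apply the example thresholds in the order they are stated in the text."""
    if non_adult > 97.0 and adult < 50.0:
        return "non-adult"
    if adult > 70.0 and non_adult < 61.0:
        return "adult"
    if adult > 70.0 and non_adult > 61.11:
        return "suspected adult"
    return "undetermined"  # assumption: left for further or manual review

# Confidence levels of the four key frames of FIG. 5.
fig5 = [
    {"non-adult": 97.23},
    {"non-adult": 99.21},
    {"adult": 91.28, "non-adult": 96.22},
    {"adult": 63.06, "non-adult": 98.33},
]
adult_conf, non_adult_conf = aggregate(fig5)
label = categorize(adult_conf, non_adult_conf)
```

For the FIG. 5 values the averages are approximately 38.59% adult and 97.75% non-adult, so the first rule applies and the video is deemed non-adult.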
Once an unknown video's adult indicator is determined, the key frame adult indicators for such now known video can be reassessed. For example, if the video is determined to be adult, all key frames with an adult indicator can have their confidence levels increased. As an example, a Video Va containing key frames K1, K2, K3, and K4 was deemed suspect adult. At a later point when another Video Vb containing key frames K3, K4, K5, and K6 is deemed to be “adult classified,” the classification causes the result of Va to be reassessed to the extent that if any of the key frames (e.g., K3 and K4) were contributing non-deterministically earlier by way of the mechanics described above, the aggregate scores may now be recalculated based on the new information. Since Video Vb is adult, non-deterministic key frames that other videos have in common with Vb (in Va, for example, K3 and K4) can also be deemed adult.
Referring back to FIG. 3, when a new known video and its key frames adult indicator determination has been completed, the new known video and key frames with their associated adult indicators may be retained, e.g., in database 110. In one implementation, the database includes a list of a plurality of video entries that each includes a reference or title and a unique video identity, which can be quickly searched for the video's location and/or identity. The database may also include another list of unique video identifiers and their associated one or more key words for such video, a server identity, a video type, the number of key frames, a video confidence value, an adult indicator field (e.g. set to 1 for an adult video and 0 for non-adult or possibly suspected adult), and a suspected adult indicator field (e.g. set to 1 for suspected adult and set to 0 for non-adult video). The database may also include a list of key frames for the multiple videos, where each key frame entry includes a video identifier, key frame identifier or number, key frame file name or reference, type, fingerprint, adult indicator (e.g., adult or non-adult), and a confidence level value. The fingerprint takes the form of a unique identifier for the key frame and helps in locating, searching and comparing key frames quickly.
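The key frame list described above can be sketched as a record type plus a fingerprint index. The field names are paraphrased from the text, and computing the fingerprint as a SHA-256 digest of the raw pixel bytes is purely an assumption of this sketch; the text only requires that the fingerprint uniquely identify a key frame.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict

@dataclass
class KeyFrameEntry:
    video_id: str
    key_frame_number: int
    file_ref: str
    frame_type: str
    fingerprint: str
    adult: bool
    confidence: float

def fingerprint(pixels: bytes) -> str:
    """Hypothetical fingerprint: hex digest of the key frame's raw bytes."""
    return hashlib.sha256(pixels).hexdigest()

def index_by_fingerprint(entries) -> Dict[str, KeyFrameEntry]:
    """Build a lookup so a key frame seen in another video is found quickly."""
    return {entry.fingerprint: entry for entry in entries}
```

A lookup by fingerprint is what lets a key frame that was already classified in one video be recognized when it reappears in another.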
Embodiments of the present invention may be employed to perform adult detection techniques in any of a wide variety of computing contexts. For example, as illustrated in FIG. 6, implementations are contemplated in which the relevant population of users interact with a diverse network environment via any type of computer (e.g., desktop, laptop, tablet, etc.) 602, media computing platforms 603 (e.g., cable and satellite set top boxes and digital video recorders), handheld computing devices (e.g., PDAs) 604, cell phones 606, or any other type of computing or communication platform.
And according to various embodiments, video information, as well as user preferences, may be obtained using a wide variety of techniques. For example, adult detection selection based on a user's interaction with a local application, web site or web-based application or service may be accomplished using any of a variety of well known mechanisms for recording and determining a user's behavior. However, it should be understood that such methods are merely exemplary and that preference information and video information may be collected in many other ways.
Once video information has been obtained, this information may be analyzed and used to generate adult indicators according to the invention in some centralized manner. This is represented in FIG. 6 by server 608 and data store 610 that, as will be understood, may correspond to multiple distributed devices and data stores. The invention may also be practiced in a wide variety of network environments (represented by network 612) including, for example, TCP/IP-based networks, telecommunications networks, wireless networks, etc. In addition, the computer program instructions with which embodiments of the invention are implemented may be stored in any type of computer-readable media, and may be executed according to a variety of computing models including a client/server model, a peer-to-peer model, on a stand-alone computing device, or according to a distributed computing model in which various of the functionalities described herein may be effected or employed at different locations.
FIG. 7 illustrates a typical computer system that, when appropriately configured or designed, can serve as an adult detection system and/or search application, etc. The computer system 700 includes any number of processors 702 (also referred to as central processing units, or CPUs) that are coupled to storage devices including primary storage 706 (typically a random access memory, or RAM) and primary storage 704 (typically a read only memory, or ROM). CPU 702 may be of various types including microcontrollers and microprocessors such as programmable devices (e.g., CPLDs and FPGAs) and unprogrammable devices such as gate array ASICs or general-purpose microprocessors. As is well known in the art, primary storage 704 acts to transfer data and instructions uni-directionally to the CPU and primary storage 706 is used typically to transfer data and instructions in a bi-directional manner. Both of these primary storage devices may include any suitable computer-readable media such as those described herein. A mass storage device 708 is also coupled bi-directionally to CPU 702 and provides additional data storage capacity and may include any of the computer-readable media described above. Mass storage device 708 may be used to store programs, data and the like and is typically a secondary storage medium such as a hard disk. It will be appreciated that the information retained within the mass storage device 708 may, in appropriate cases, be incorporated in standard fashion as part of primary storage 706 as virtual memory. A specific mass storage device such as a CD-ROM 714 may also pass data uni-directionally to the CPU.
CPU 702 is also coupled to an interface 710 that connects to one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers. Finally, CPU 702 optionally may be coupled to an external device such as a database or a computer or telecommunications network using an external connection as shown generally at 712. With such a connection, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the method steps described herein.
Regardless of the system's configuration, it may employ one or more memories or memory modules configured to store data, program instructions for the general-purpose processing operations and/or the inventive techniques described herein. The program instructions may control the operation of an operating system and/or one or more applications, for example. The memory or memories may also be configured to store user preferences and profile information, video and key frame information, adult detection models, adult indicators for key frames and videos, etc.
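As one illustration of the kinds of records such memories might hold, the following minimal Python sketch models a three-state adult indicator together with per-key-frame and per-video records. All names (`AdultIndicator`, `KeyFrameRecord`, `VideoRecord`) and the field layout are hypothetical and are not taken from the specification; they only show one possible arrangement.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class AdultIndicator(Enum):
    """Three-state indicator; the member names and values are assumptions."""
    NON_ADULT = "non_adult"
    SUSPECTED_ADULT = "suspected_adult"
    ADULT = "adult"


@dataclass
class KeyFrameRecord:
    """Hypothetical record for one key frame and its indicator."""
    frame_index: int
    indicator: AdultIndicator


@dataclass
class VideoRecord:
    """Hypothetical record for one video; indicator is None until analyzed."""
    video_id: str
    key_frames: List[KeyFrameRecord] = field(default_factory=list)
    indicator: Optional[AdultIndicator] = None
```

Leaving `indicator` as `None` until analysis completes is a design choice of this sketch: it keeps an unanalyzed video distinct from one that has been determined to be non-adult.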
Because such information and program instructions may be employed to implement the systems/methods described herein, the present invention relates to machine-readable media that include program instructions, state information, etc. for performing various operations described herein. Examples of machine-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Therefore, the present embodiments are to be considered as illustrative and not restrictive and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

Claims (30)

1. A method for detecting pornographic or adult videos, comprising:
for an unknown video having a plurality of frames, defining a plurality of key frames selected from the frames of the unknown video, wherein each key frame corresponds to a frame that contains features that are likely relevant for detecting pornographic or adult aspects of the unknown video; and
analyzing the key frames using an adult detection model that was generated by a learning process based on a training set of images and their associated adult indicators that each specifies whether the associated known image is an adult or non-adult image, whereby the analysis results in an adult indicator that specifies whether the unknown video is an adult video, a non-adult video, or a suspected adult video.
2. The method as recited in claim 1, wherein defining the key frames comprises:
determining one or more portions of each frame that are significantly different from corresponding portions of a plurality of adjacent frames; and
defining the key frames based on the significantly different one or more portions of each frame.
3. The method as recited in claim 2, wherein analyzing the key frames comprises:
analyzing one or more of the significantly different portions of each key frame with the adult detection model to thereby determine an adult indicator for such one or more of the significantly different portions of such each key frame being adult or non-adult; and
determining the adult indicator of the unknown video based on the adult indicators for the key frames.
4. The method as recited in claim 3, wherein an adult indicator is determined for each significantly different portion of each key frame that is determined to include a moving object.
5. The method as recited in claim 1, further comprising:
prior to analyzing the key frames of the unknown video, executing the learning process based on one or more key frame features extracted from each known image and the each known image's associated adult indicator so as to generate the adult detection model that is to be used for the unknown video.
6. The method as recited in claim 5, further comprising extracting a plurality of key frame features from the key frames of the unknown video, wherein the analyzing of the key frames of the unknown video is based on the extracted key frame features for such unknown video, and wherein a same type of features are used for analysis of the key frames of the unknown video and by the learning process.
7. The method as recited in claim 5, further comprising:
after analyzing the key frames of the unknown video so that the unknown video is defined as a new known video, including the key frames and associated adult indicators in the training set of known images; and
executing the learning process based on each known image, including the key frames, and each known image's adult indicator, including the key frames' adult indicators, so as to generate a new adult detection model to be used for adult detection of new unknown videos.
8. The method as recited in claim 7, further comprising manually correcting one or more adult indicators of the known images, which include the key frames of the new known video, prior to executing the learning process on such known images.
9. The method as recited in claim 1, wherein the adult indicator specifies that the unknown video is an adult video when an adult indicator field is set, and wherein the adult indicator specifies that the unknown video is a suspected adult video when a suspected adult indicator field is set.
10. The method as recited in claim 1, wherein the analysis results in a confidence level corresponding to the adult indicator, wherein the confidence level corresponding to the adult indicator comprises a value within a first range of values when the unknown video is an adult video, wherein the confidence level corresponding to the adult indicator is within a second range of values when the unknown video is a suspected adult video, and wherein the confidence level corresponding to the adult indicator is within a third range of values when the unknown video is a non-adult video.
11. An apparatus comprising at least a processor and a memory, wherein the processor and/or memory are configured to perform operations, comprising:
for an unknown video having a plurality of frames, defining a plurality of key frames selected from the frames of the unknown video, wherein each key frame corresponds to a frame that contains features that are likely relevant for detecting pornographic or adult aspects of the unknown video; and
analyzing the key frames using an adult detection model that was generated by a learning process based on a training set of images and their associated adult indicators that each specifies whether the associated known image is an adult or non-adult image, whereby the analysis results in an adult indicator that specifies whether the unknown video is an adult video, a non-adult video, or a suspected adult video.
12. The apparatus as recited in claim 11, wherein defining the key frames comprises:
determining one or more portions of each frame that are significantly different from corresponding portions of a plurality of adjacent frames; and
defining the key frames based on the significantly different one or more portions of each frame.
13. The apparatus as recited in claim 12, wherein analyzing the key frames comprises:
analyzing one or more of the significantly different portions of each key frame with the adult detection model to thereby determine an adult indicator for such one or more of the significantly different portions of such each key frame being adult or non-adult; and
determining the adult indicator of the unknown video based on the adult indicators for the key frames.
14. The apparatus as recited in claim 13, wherein an adult indicator is determined for each significantly different portion of each key frame that is determined to include a moving object.
15. The apparatus as recited in claim 11, wherein the processor and/or memory are further configured for performing operations, further comprising:
prior to analyzing the key frames of the unknown video, executing the learning process based on one or more key frame features extracted from each known image and the each known image's associated adult indicator so as to generate the adult detection model that is to be used for the unknown video.
16. The apparatus as recited in claim 15, wherein the processor and/or memory are further configured for performing operations, further comprising:
extracting a plurality of key frame features from the key frames of the unknown video, wherein the analyzing of the key frames of the unknown video is based on the extracted key frame features for such unknown video, and wherein a same type of features are used for analysis of the key frames of the unknown video and by the learning process.
17. The apparatus as recited in claim 15, wherein the processor and/or memory are further configured for performing operations, further comprising:
after analyzing the key frames of the unknown video so that the unknown video is defined as a new known video, including the key frames and associated adult indicators in the training set of known images; and
executing the learning process based on each known image, including the key frames, and each known image's adult indicator, including the key frames' adult indicators, so as to generate a new adult detection model to be used for adult detection of new unknown videos.
18. The apparatus as recited in claim 17, wherein the processor and/or memory are further configured for supporting manually correcting one or more adult indicators of the known images, which include the key frames of the new known video, prior to executing the learning process on such known images.
19. The apparatus as recited in claim 11, wherein the adult indicator specifies that the unknown video is an adult video when the adult indicator is in a first state, that the unknown video is a non-adult video when the adult indicator is in a second state, or that the unknown video is a suspected adult video when the adult indicator is in a third state.
20. At least one non-transitory computer readable storage medium having computer program instructions stored thereon that are arranged to perform operations, comprising:
for an unknown video having a plurality of frames, defining a plurality of key frames selected from the frames of the unknown video, wherein each key frame corresponds to a frame that contains features that are likely relevant for detecting pornographic or adult aspects of the unknown video; and
analyzing the key frames using an adult detection model that was generated by a learning process based on a training set of images and their associated adult indicators that each specifies whether the associated known image is an adult or non-adult image, whereby the analysis results in an adult indicator that specifies whether the unknown video is an adult video, a non-adult video, or a suspected adult video.
21. The at least one non-transitory computer readable storage medium as recited in claim 20, wherein defining the key frames comprises:
determining one or more portions of each frame that are significantly different from corresponding portions of a plurality of adjacent frames; and
defining the key frames based on the significantly different one or more portions of each frame.
22. The at least one non-transitory computer readable storage medium as recited in claim 21, wherein analyzing the key frames comprises:
analyzing one or more of the significantly different portions of each key frame with the adult detection model to thereby determine an adult indicator for such one or more of the significantly different portions of such each key frame being adult or non-adult; and
determining the adult indicator of the unknown video based on the adult indicators for the key frames.
23. The at least one non-transitory computer readable storage medium as recited in claim 22, wherein an adult indicator is determined for each significantly different portion of each key frame that is determined to include a moving object.
24. The at least one non-transitory computer readable storage medium as recited in claim 20, wherein the computer program instructions are further arranged to perform operations, further comprising:
prior to analyzing the key frames of the unknown video, executing the learning process based on one or more key frame features extracted from each known image and the each known image's associated adult indicator so as to generate the adult detection model that is to be used for the unknown video.
25. The at least one computer readable storage medium as recited in claim 24, wherein the computer program instructions are further arranged to perform operations, further comprising:
extracting a plurality of key frame features from the key frames of the unknown video, wherein the analyzing of the key frames of the unknown video is based on the extracted key frame features for such unknown video, and wherein a same type of features are used for analysis of the key frames of the unknown video and by the learning process.
26. The at least one non-transitory computer readable storage medium as recited in claim 24, wherein the computer program instructions are further arranged to perform operations, further comprising:
after analyzing the key frames of the unknown video so that the unknown video is defined as a new known video, including the key frames and associated adult indicators in the training set of known images; and
executing the learning process based on each known image, including the key frames, and each known image's adult indicator, including the key frames' adult indicators, so as to generate a new adult detection model to be used for adult detection of new unknown videos.
27. The at least one non-transitory computer readable storage medium as recited in claim 26, wherein the computer program instructions are further arranged to perform operations, further comprising:
manually correcting one or more adult indicators of the known images, which include the key frames of the new known video, prior to executing the learning process on such known images.
28. At least one non-transitory computer readable storage medium having computer program instructions stored thereon that are arranged to perform operations, comprising:
sending a request for a plurality of videos, wherein the request is associated with a parameter that indicates that pornographic or adult videos are to be filtered from such videos; and
receiving a plurality of references to a plurality of videos from which a plurality of adult videos have been filtered using an adult detection model that was generated by a learning process based on a training set of videos that each include an adult indicator that specifies whether the each known video is an adult video, a non-adult video, or a suspected adult video.
29. An apparatus comprising at least a processor and a memory, wherein the processor and/or memory are configured to perform operations, comprising:
sending a request for a plurality of videos, wherein the request is associated with a parameter that indicates that pornographic or adult videos are to be filtered from such videos; and
receiving a plurality of references to a plurality of videos from which a plurality of adult videos have been filtered using an adult detection model that was generated by a learning process based on a training set of videos that each include an adult indicator that specifies that the known video is an adult video when the adult indicator is in a first state, that the known video is a non-adult video when the adult indicator is in a second state, or that the known video is a suspected adult video when the adult indicator is in a third state.
30. At least one non-transitory computer readable storage medium having computer program instructions stored thereon that are arranged to perform operations, comprising:
sending a request for a plurality of videos, wherein the request is associated with a parameter that indicates that pornographic or adult videos are to be filtered from such videos; and
receiving a plurality of references to a plurality of videos from which a plurality of adult videos have been filtered using an adult detection model that was generated by a learning process based on a training set of videos that each include an adult indicator that specifies that the known video is an adult video when the adult indicator is in a first state, that the known video is a non-adult video when the adult indicator is in a second state, or that the known video is a suspected adult video when the adult indicator is in a third state.
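The operations recited in claims 1, 2, and 10 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: frames are modeled as equal-length flat lists of grayscale values, a "portion" is a fixed-size slice, the thresholds (`threshold`, `suspected_min`, `adult_min`) are arbitrary, the per-frame scorer `score_frame` stands in for a trained adult detection model, and taking the maximum key frame score as the video's confidence is a choice made only for this sketch.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # assumption: flattened grayscale pixel values


def changed_portions(frame: Frame, neighbors: List[Frame],
                     portion_size: int, threshold: float) -> List[int]:
    """Return start offsets of portions that differ from every neighbor."""
    starts = []
    for start in range(0, len(frame), portion_size):
        end = min(start + portion_size, len(frame))  # last portion may be short

        def mean_diff(other: Frame) -> float:
            total = sum(abs(frame[i] - other[i]) for i in range(start, end))
            return total / (end - start)

        if neighbors and all(mean_diff(n) > threshold for n in neighbors):
            starts.append(start)
    return starts


def select_key_frames(frames: List[Frame], portion_size: int = 4,
                      threshold: float = 0.5, window: int = 1) -> List[int]:
    """Indices of frames with at least one significantly different portion."""
    key = []
    for idx, frame in enumerate(frames):
        lo, hi = max(0, idx - window), min(len(frames), idx + window + 1)
        neighbors = [frames[j] for j in range(lo, hi) if j != idx]
        if changed_portions(frame, neighbors, portion_size, threshold):
            key.append(idx)
    return key


def indicator_from_confidence(confidence: float, suspected_min: float = 0.4,
                              adult_min: float = 0.8) -> str:
    """Map a confidence level onto three value ranges (hypothetical cutoffs)."""
    if confidence >= adult_min:
        return "adult"
    if confidence >= suspected_min:
        return "suspected adult"
    return "non-adult"


def classify_video(frames: List[Frame],
                   score_frame: Callable[[Frame], float]) -> str:
    """Score only the key frames and derive one indicator for the video."""
    key = select_key_frames(frames)
    if not key:
        return "non-adult"  # assumption: no key frames means nothing to flag
    confidence = max(score_frame(frames[i]) for i in key)
    return indicator_from_confidence(confidence)
```

For example, `classify_video(frames, score_frame=my_model)` would score only the frames returned by `select_key_frames`, where `my_model` is any callable returning a value in [0, 1] for a frame.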
US12/113,835 2008-05-01 2008-05-01 Apparatus and methods for detecting adult videos Active 2031-11-23 US8358837B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/113,835 US8358837B2 (en) 2008-05-01 2008-05-01 Apparatus and methods for detecting adult videos

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/113,835 US8358837B2 (en) 2008-05-01 2008-05-01 Apparatus and methods for detecting adult videos

Publications (2)

Publication Number Publication Date
US20090274364A1 US20090274364A1 (en) 2009-11-05
US8358837B2 US8358837B2 (en) 2013-01-22

Family

ID=41257121

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/113,835 Active 2031-11-23 US8358837B2 (en) 2008-05-01 2008-05-01 Apparatus and methods for detecting adult videos

Country Status (1)

Country Link
US (1) US8358837B2 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103400155A (en) * 2013-06-28 2013-11-20 西安交通大学 Pornographic video detection method based on semi-supervised learning of images
US20140157096A1 (en) * 2012-12-05 2014-06-05 International Business Machines Corporation Selecting video thumbnail based on surrounding context
US20150125074A1 (en) * 2013-11-05 2015-05-07 Electronics And Telecommunications Research Institute Apparatus and method for extracting skin area to block harmful content image
CN104951742A (en) * 2015-03-02 2015-09-30 北京奇艺世纪科技有限公司 Detection method and system for sensitive video
US9529840B1 (en) 2014-01-14 2016-12-27 Google Inc. Real-time duplicate detection of videos in a massive video sharing system
US9542976B2 (en) 2013-09-13 2017-01-10 Google Inc. Synchronizing videos with frame-based metadata using video content
CN106658048A (en) * 2016-12-20 2017-05-10 天脉聚源(北京)教育科技有限公司 Method and device for updating preview images during live monitoring
US9723344B1 (en) 2015-12-29 2017-08-01 Google Inc. Early detection of policy violating media
CN110163300A (en) * 2019-05-31 2019-08-23 北京金山云网络技术有限公司 A kind of image classification method, device, electronic equipment and storage medium
US10516918B2 (en) * 2017-07-27 2019-12-24 Global Tel*Link Corporation System and method for audio visual content creation and publishing within a controlled environment
US11108885B2 (en) 2017-07-27 2021-08-31 Global Tel*Link Corporation Systems and methods for providing a visual content gallery within a controlled environment
US11213754B2 (en) 2017-08-10 2022-01-04 Global Tel*Link Corporation Video game center for a controlled environment facility
US11595701B2 (en) 2017-07-27 2023-02-28 Global Tel*Link Corporation Systems and methods for a video sharing service within controlled environments

Families Citing this family (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4433027B2 (en) * 2007-09-21 2010-03-17 ソニー株式会社 Signal processing apparatus, signal processing method, and program
KR100932537B1 (en) * 2007-11-26 2009-12-17 한국전자통신연구원 Forensic Evidence Analysis System and Method Using Image Filter
US8358837B2 (en) * 2008-05-01 2013-01-22 Yahoo! Inc. Apparatus and methods for detecting adult videos
US20090287655A1 (en) * 2008-05-13 2009-11-19 Bennett James D Image search engine employing user suitability feedback
US8472728B1 (en) * 2008-10-31 2013-06-25 The Rubicon Project, Inc. System and method for identifying and characterizing content within electronic files using example sets
KR101174057B1 (en) * 2008-12-19 2012-08-16 한국전자통신연구원 Method and apparatus for analyzing and searching index
JP4766197B2 (en) * 2009-01-29 2011-09-07 日本電気株式会社 Feature selection device
TWI464706B (en) * 2009-03-13 2014-12-11 Micro Star Int Co Ltd Dark portion exposure compensation method for simulating high dynamic range with single image and image processing device using the same
IT1395648B1 (en) * 2009-05-28 2012-10-16 St Microelectronics Srl PROCEDURE AND SYSTEM FOR DETECTION OF PORNOGRAPHIC CONTENT IN VIDEO SEQUENCES, RELATIVE COMPUTER PRODUCT
KR20110066676A (en) * 2009-12-11 2011-06-17 한국전자통신연구원 Apparatus and method for blocking the objectionable multimedia based on skin-color and face information
US8359642B1 (en) * 2010-06-25 2013-01-22 Sprint Communications Company L.P. Restricting mature content
KR101468863B1 (en) 2010-11-30 2014-12-04 한국전자통신연구원 System and method for detecting global harmful video
KR101435778B1 (en) * 2011-03-16 2014-08-29 한국전자통신연구원 Method for classifying objectionable movies using visual features based on video and multi-level statistical combination and apparatus for the same
CN102930553B (en) * 2011-08-10 2016-03-30 中国移动通信集团上海有限公司 Bad video content recognition method and device
CN103093180B (en) * 2011-10-28 2016-06-29 阿里巴巴集团控股有限公司 A kind of method and system of pornographic image detecting
US8943426B2 (en) * 2011-11-03 2015-01-27 Htc Corporation Method for displaying background wallpaper and one or more user interface elements on display unit of electrical apparatus at the same time, computer program product for the method and electrical apparatus implementing the method
US9223986B2 (en) * 2012-04-24 2015-12-29 Samsung Electronics Co., Ltd. Method and system for information content validation in electronic devices
US9135712B2 (en) * 2012-08-01 2015-09-15 Augmented Reality Lab LLC Image recognition system in a cloud environment
GB201315859D0 (en) * 2013-09-05 2013-10-23 Image Analyser Ltd Video analysis method and system
CN103544498B (en) * 2013-09-25 2017-02-08 华中科技大学 Video content detection method and video content detection system based on self-adaption sampling
KR20150092546A (en) * 2014-02-05 2015-08-13 한국전자통신연구원 Harmless frame filter and apparatus for harmful image block having the same, method for filtering harmless frame
US9847101B2 (en) * 2014-12-19 2017-12-19 Oracle International Corporation Video storytelling based on conditions determined from a business object
KR20160107417A (en) * 2015-03-03 2016-09-19 한국전자통신연구원 Method and apparatus for detecting harmful video
US9530082B2 (en) * 2015-04-24 2016-12-27 Facebook, Inc. Objectionable content detector
CN105183758A (en) * 2015-07-22 2015-12-23 深圳市万姓宗祠网络科技股份有限公司 Content recognition method for continuously recorded video or image
US20170185841A1 (en) * 2015-12-29 2017-06-29 Le Holdings (Beijing) Co., Ltd. Method and electronic apparatus for identifying video characteristic
EP3437326A4 (en) * 2016-03-30 2019-08-21 Covenant Eyes, Inc. Applications, systems and methods to monitor, filter and/or alter output of a computing device
BR102016007265B1 (en) 2016-04-01 2022-11-16 Samsung Eletrônica da Amazônia Ltda. MULTIMODAL AND REAL-TIME METHOD FOR FILTERING SENSITIVE CONTENT
US9996769B2 (en) 2016-06-08 2018-06-12 International Business Machines Corporation Detecting usage of copyrighted video content using object recognition
WO2018023453A1 (en) * 2016-08-02 2018-02-08 步晓芳 Patent information pushing method performed during automatic pornography identification, and recognition system
WO2018023452A1 (en) * 2016-08-02 2018-02-08 步晓芳 Method for collecting usage condition of adult shot identification technique, and recognition system
WO2018023454A1 (en) * 2016-08-02 2018-02-08 步晓芳 Automatic pornography identification method, and recognition system
CN106778486A (en) * 2016-11-18 2017-05-31 乐视控股(北京)有限公司 A kind of method and apparatus for differentiating specific image
US10349126B2 (en) * 2016-12-19 2019-07-09 Samsung Electronics Co., Ltd. Method and apparatus for filtering video
CN108229262B (en) * 2016-12-22 2021-10-15 腾讯科技(深圳)有限公司 Pornographic video detection method and device
US20180247161A1 (en) * 2017-01-23 2018-08-30 Intaimate LLC System, method and apparatus for machine learning-assisted image screening for disallowed content
CN107896335B (en) * 2017-12-06 2019-12-31 重庆智韬信息技术中心 Video detection and rating method based on big data technology
CN110913243B (en) * 2018-09-14 2021-09-14 华为技术有限公司 Video auditing method, device and equipment
WO2020123124A1 (en) * 2018-12-14 2020-06-18 Google Llc Methods, systems, and media for identifying videos containing objectionable content
RU2743932C2 (en) 2019-04-15 2021-03-01 Общество С Ограниченной Ответственностью «Яндекс» Method and server for repeated training of machine learning algorithm
CN110119788B (en) * 2019-05-27 2021-06-01 航美传媒集团有限公司 Intelligent identification system for electronic media advertisement playing content
CN110751224B (en) * 2019-10-25 2022-08-05 Oppo广东移动通信有限公司 Training method of video classification model, video classification method, device and equipment
KR102504321B1 (en) * 2020-08-25 2023-02-28 한국전자통신연구원 Apparatus and method for online action detection
CN112381114A (en) * 2020-10-20 2021-02-19 广东电网有限责任公司中山供电局 Deep learning image annotation system and method
CN113766297B (en) * 2021-05-27 2023-12-05 腾讯科技(深圳)有限公司 Video processing method, playing terminal and computer readable storage medium
CN115550684B (en) * 2021-12-30 2023-07-25 北京国瑞数智技术有限公司 Improved video content filtering method and system
CN115988229A (en) * 2022-11-16 2023-04-18 阿里云计算有限公司 Image identification method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5796948A (en) * 1996-11-12 1998-08-18 Cohen; Elliot D. Offensive message interceptor for computers
US5835722A (en) * 1996-06-27 1998-11-10 Logon Data Corporation System to control content and prohibit certain interactive attempts by a person using a personal computer
US6266664B1 (en) * 1997-10-01 2001-07-24 Rulespace, Inc. Method for scanning, analyzing and rating digital information content
US7076527B2 (en) * 2001-06-14 2006-07-11 Apple Computer, Inc. Method and apparatus for filtering email
US20080134282A1 (en) * 2006-08-24 2008-06-05 Neustar, Inc. System and method for filtering offensive information content in communication systems
US20080159624A1 (en) * 2006-12-27 2008-07-03 Yahoo! Inc. Texture-based pornography detection
US20090274364A1 (en) * 2008-05-01 2009-11-05 Yahoo! Inc. Apparatus and methods for detecting adult videos
US7689913B2 (en) * 2005-06-02 2010-03-30 Us Tax Relief, Llc Managing internet pornography effectively
US7814545B2 (en) * 2003-07-22 2010-10-12 Sonicwall, Inc. Message classification using classifiers

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
Boon-Lock Yeo et al., "Rapid Scene Analysis on Compressed Video", IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, No. 6, Dec. 1995, pp. 533-544.
Frederic Dufaux, "Key Frame Selection to represent a video" Proceeding 2000 International Conference on Image Processing, Sep. 10-13, 2000, pp. 275-278.
Google Preferences, http://www.google.com/preferences?hl=en, printed Apr. 18, 2008.
Henry A. Rowley et al., "Neural Network-Based Face Detection" PAMI, Jan. 1998, pp. 1-28.
Marti A. Hearst, "Support Vector Machines" IEEE Intelligent Systems, Jul./Aug. 1998, pp. 18-28.
Michael J. Jones et al., "Statistical Color Models with Application to Skin Detection" Compaq Computer Corporation, Cambridge Research Laboratory Technical Report Series, Dec. 1998, pp. 1-28.
Ron Meir, "Support Vector Machines-an Introduction" Department of Electrical Engineering Technion, Israel, Jun. 2002, pp. 1-44.
Yahoo! Advanced Video Search, Advanced Video Search, http://video.search.yahoo.com/video/advanced?ei=UTF-8, printed Apr. 18, 2008.
Yahoo! Search Preferences, Search Preferences, http://search.yahoo.com/preferences/preferences?page=filters&pref-done=http%3A%2F%, printed Apr. 18, 2008.
Yueting Zhuang et al., Adaptive Key Frame Extraction Using Unsupervised Clustering, Department of Computer Science, Zhejiang University, Beckman Institute Computer Science, University of Illinois, 2000.

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140157096A1 (en) * 2012-12-05 2014-06-05 International Business Machines Corporation Selecting video thumbnail based on surrounding context
CN103400155A (en) * 2013-06-28 2013-11-20 西安交通大学 Pornographic video detection method based on semi-supervised learning of images
US9542976B2 (en) 2013-09-13 2017-01-10 Google Inc. Synchronizing videos with frame-based metadata using video content
US20150125074A1 (en) * 2013-11-05 2015-05-07 Electronics And Telecommunications Research Institute Apparatus and method for extracting skin area to block harmful content image
US10198441B1 (en) 2014-01-14 2019-02-05 Google Llc Real-time duplicate detection of videos in a massive video sharing system
US9529840B1 (en) 2014-01-14 2016-12-27 Google Inc. Real-time duplicate detection of videos in a massive video sharing system
CN104951742B (en) * 2015-03-02 2018-06-22 北京奇艺世纪科技有限公司 The detection method and system of objectionable video
CN104951742A (en) * 2015-03-02 2015-09-30 北京奇艺世纪科技有限公司 Detection method and system for sensitive video
US9723344B1 (en) 2015-12-29 2017-08-01 Google Inc. Early detection of policy violating media
CN106658048A (en) * 2016-12-20 2017-05-10 天脉聚源(北京)教育科技有限公司 Method and device for updating preview images during live monitoring
US10516918B2 (en) * 2017-07-27 2019-12-24 Global Tel*Link Corporation System and method for audio visual content creation and publishing within a controlled environment
US11108885B2 (en) 2017-07-27 2021-08-31 Global Tel*Link Corporation Systems and methods for providing a visual content gallery within a controlled environment
US11115716B2 (en) 2017-07-27 2021-09-07 Global Tel*Link Corporation System and method for audio visual content creation and publishing within a controlled environment
US11595701B2 (en) 2017-07-27 2023-02-28 Global Tel*Link Corporation Systems and methods for a video sharing service within controlled environments
US11750723B2 (en) 2017-07-27 2023-09-05 Global Tel*Link Corporation Systems and methods for providing a visual content gallery within a controlled environment
US11213754B2 (en) 2017-08-10 2022-01-04 Global Tel*Link Corporation Video game center for a controlled environment facility
CN110163300A (en) * 2019-05-31 2019-08-23 北京金山云网络技术有限公司 A kind of image classification method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
US20090274364A1 (en) 2009-11-05

Similar Documents

Publication Publication Date Title
US8358837B2 (en) Apparatus and methods for detecting adult videos
US20200372662A1 (en) Logo Recognition in Images and Videos
CN109151501B (en) Video key frame extraction method and device, terminal equipment and storage medium
Hannane et al. An efficient method for video shot boundary detection and keyframe extraction using SIFT-point distribution histogram
US8867828B2 (en) Text region detection system and method
JP5050075B2 (en) Image discrimination method
CN102007499B (en) Detecting facial expressions in digital images
US7171042B2 (en) System and method for classification of images and videos
Crandall et al. Extraction of special effects caption text events from digital video
US20170083770A1 (en) Video segmentation techniques
EP2568429A1 (en) Method and system for pushing individual advertisement based on user interest learning
US9471675B2 (en) Automatic face discovery and recognition for video content analysis
US20090290752A1 (en) Method for producing video signatures and identifying video clips
Küçüktunç et al. Video copy detection using multiple visual cues and MPEG-7 descriptors
Chasanis et al. Simultaneous detection of abrupt cuts and dissolves in videos using support vector machines
Fan et al. Fuzzy color distribution chart-based shot boundary detection
CN111191591A (en) Watermark detection method, video processing method and related equipment
e Souza et al. Survey on visual rhythms: A spatio-temporal representation for video sequences
US8073963B1 (en) Data stream change detector
Lucena et al. Improving face detection performance by skin detection post-processing
Mishra Hybrid feature extraction and optimized deep convolutional neural network based video shot boundary detection
KR100656373B1 (en) Method for discriminating obscene video using priority and classification-policy in time interval and apparatus thereof
Smitha et al. Illumination invariant text recognition system based on contrast limit adaptive histogram equalization in videos/images
Zumer et al. Color-independent classification of animation video
Xu et al. SAIVT-QUT@ TRECVid 2012: Interactive surveillance event detection

Legal Events

Date Code Title Description
AS Assignment

Owner name: YAHOO! INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHAKYA, SUBODH;ZHANG, RUOFEI;REEL/FRAME:020889/0126

Effective date: 20080429

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: YAHOO HOLDINGS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO! INC.;REEL/FRAME:042963/0211

Effective date: 20170613

AS Assignment

Owner name: OATH INC., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO HOLDINGS, INC.;REEL/FRAME:045240/0310

Effective date: 20171231

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

AS Assignment

Owner name: VERIZON MEDIA INC., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OATH INC.;REEL/FRAME:054258/0635

Effective date: 20201005

AS Assignment

Owner name: VERIZON PATENT AND LICENSING INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VERIZON MEDIA INC.;REEL/FRAME:057453/0431

Effective date: 20210801