CN103617262B - Picture content attribute identification method and system - Google Patents

Picture content attribute identification method and system


Publication number
CN103617262B
Authority
CN
China
Prior art keywords
picture
cluster
reprinting
homology
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310632676.8A
Other languages
Chinese (zh)
Other versions
CN103617262A (en)
Inventor
陶哲
白明
韩玉刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201310632676.8A
Publication of CN103617262A
Priority to PCT/CN2014/087109 (WO2015081748A1)
Application granted
Publication of CN103617262B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on distances to training or reference patterns

Abstract

The invention provides a picture content attribute identification method and system. The method includes: calculating, for each of a plurality of homologous picture clusters, a relative reprint count with respect to a specific resource site; training a screener model on the homologous picture clusters and their corresponding relative reprint counts; and identifying the picture content attribute in a target picture cluster with the trained screener model. An advantage of the invention is that the content attribute of a picture can be identified from data on how the picture is reprinted or propagated across the network; in particular, the method can be used to determine whether a picture is an advertising picture.

Description

Picture content attribute identification method and system
Technical field
The present invention relates to the field of image recognition, and in particular to a picture content attribute identification method and system.
Background art
Advertising pictures appear on many types of resource sites on the network. They are highly varied: they include advertisements for all kinds of goods (for example, milk powder or clothing), advertisements for physical stores, and advertisements of other types.
These advertising pictures appear not only on merchants' own sites but also on the pages of other resource sites. For example, in communities that allow users to upload pictures (forums, picture sites, and the like), some users upload advertising pictures. A large number of advertising pictures often disturbs users; advertising pictures unrelated to the user's needs can even appear in picture search results.
In terms of image content, different advertising pictures do not share many similarities, so current image recognition techniques have difficulty identifying the content attribute of a picture, that is, determining which pictures are advertising pictures. As a result, advertising pictures cannot be handled in a targeted way, and the user experience inevitably suffers.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a picture content attribute identification method and system that overcome, or at least partially solve, those problems.
According to one aspect of the present invention, a picture content attribute identification method is provided, which includes: calculating relative reprint counts of a plurality of homologous picture clusters with respect to a specific resource site; training a screener model according to the plurality of homologous picture clusters and the corresponding relative reprint counts; and identifying the picture content attribute in a target picture cluster according to the trained screener model.
Optionally, the step of calculating the relative reprint counts of the plurality of homologous picture clusters with respect to the specific resource site includes: for one homologous picture cluster among the plurality, comparing the reprint count of the pictures in that cluster on the specific resource site with their reprint count on a plurality of resource sites, to obtain the relative reprint count of that cluster with respect to the specific resource site, where the plurality of resource sites includes the specific resource site.
Optionally, the step of comparing the reprint count of the pictures in the homologous picture cluster on the specific resource site with their reprint count on the plurality of resource sites includes: calculating a first average reprint count of pictures on the specific resource site; calculating a second average reprint count of pictures on the plurality of resource sites; taking a first difference between the reprint count of the pictures in the homologous picture cluster on the specific resource site and the first average reprint count; taking a second difference between the reprint count of the pictures in the homologous picture cluster on the plurality of resource sites and the second average reprint count; and comparing the first difference with the second difference to obtain the relative reprint count of the homologous picture cluster with respect to the specific resource site.
Optionally, the step of calculating the first average reprint count of pictures on the specific resource site includes: taking those pictures of the plurality of homologous picture clusters that are located on the specific resource site, and comparing the number of those pictures with the number of homologous picture clusters to which they belong, to obtain the first average reprint count.
Optionally, the step of calculating the second average reprint count of pictures on the plurality of resource sites includes: comparing the number of pictures in the plurality of homologous picture clusters with the number of homologous picture clusters, to obtain the second average reprint count.
Optionally, before the step of comparing the reprint count of the pictures in the homologous picture cluster on the specific resource site with their reprint count on the plurality of resource sites, the method further includes: crawling the picture links that appear on the plurality of resource sites; detecting whether a picture link is identical to the link of a picture in the homologous picture cluster, and/or detecting whether the checksum information of the picture at the picture link is identical to the checksum information of a picture in the homologous picture cluster, and/or detecting whether the picture at the picture link shares one or more identical image features with a picture in the homologous picture cluster; and determining, according to the detection result, whether the picture link is a reprint of a picture in the homologous picture cluster, and counting the reprints of the pictures in the homologous picture cluster.
Optionally, the specific resource site is the resource site that reprints the most pictures of each homologous picture cluster among the plurality of homologous picture clusters.
Optionally, the pictures of each homologous picture cluster correspond to the same source picture, and each picture of a homologous picture cluster shares one or more identical image features with its corresponding source picture.
Optionally, the method further includes: extracting format features and/or link features of the pictures contained in the homologous picture clusters, and training the screener model according to the plurality of homologous picture clusters, the corresponding relative reprint counts, and the format features of the pictures they contain; and identifying the picture content attribute in the target picture cluster according to the trained screener model, based on the relative reprint count and on the format features and/or link features of the pictures contained in the target picture cluster.
Optionally, the format features of a picture include, but are not limited to, one or more of the following: the length/width of the picture, the size of the picture, and the sharpness of the picture.
Optionally, the link features of a picture include, but are not limited to, one or more of the following: whether the picture link is on the same site as the web page, and whether the picture's redirect link points off-site.
According to another aspect of the present invention, a picture content attribute identification system is provided, which includes: a relative reprint count calculation module, for calculating relative reprint counts of a plurality of homologous picture clusters with respect to a specific resource site; a training module, for inputting the plurality of homologous picture clusters and the corresponding relative reprint counts into a screener to train a screener model; a screener, adapted to obtain the trained screener model from the training module and to screen a target picture cluster according to the model; and an identification module, for identifying the picture content attribute in the target picture cluster according to the screener's screening of the target picture cluster.
Optionally, for one homologous picture cluster among the plurality, the relative reprint count calculation module compares the reprint count of the cluster's pictures on the specific resource site with their reprint count on a plurality of resource sites, to obtain the relative reprint count of the cluster with respect to the specific resource site, where the plurality of resource sites includes the specific resource site.
Optionally, the system further includes: a first average reprint count calculation module, for calculating a first average reprint count of pictures on the specific resource site; and a second average reprint count calculation module, for calculating a second average reprint count of pictures on the plurality of resource sites. The relative reprint count calculation module takes a first difference between the reprint count of the pictures in the homologous picture cluster on the specific resource site and the first average reprint count, takes a second difference between the reprint count of the pictures in the homologous picture cluster on the plurality of resource sites and the second average reprint count, and compares the first difference with the second difference to obtain the relative reprint count of the homologous picture cluster with respect to the specific resource site.
Optionally, the first average reprint count calculation module takes those pictures of the plurality of homologous picture clusters that are located on the specific resource site, and compares the number of those pictures with the number of homologous picture clusters to which they belong, to obtain the first average reprint count.
Optionally, the second average reprint count calculation module compares the number of pictures in the plurality of homologous picture clusters with the number of homologous picture clusters, to obtain the second average reprint count.
Optionally, the system further includes: a picture link crawling module, for crawling the picture links that appear on the plurality of resource sites; a picture link detection module, for detecting whether a picture link is identical to the link of a picture in the homologous picture cluster, and/or detecting whether the checksum information of the picture at the picture link is identical to the checksum information of a picture in the homologous picture cluster, and/or detecting whether the picture at the picture link shares one or more identical image features with a picture in the homologous picture cluster; and a picture reprint counting module, for determining, according to the detection result, whether the picture link is a reprint of a picture in the homologous picture cluster, and for counting the reprints of the pictures in the homologous picture cluster.
Optionally, the specific resource site is the resource site that reprints the most pictures of each homologous picture cluster among the plurality of homologous picture clusters.
Optionally, the pictures of each homologous picture cluster correspond to the same source picture, and each picture of a homologous picture cluster shares one or more identical image features with its corresponding source picture.
The picture content attribute identification method and system according to the present invention use the relative reprint counts of homologous picture clusters with respect to a specific resource site as training data for the screener model. The relative reprint count reflects the ratio between a picture's on-site and off-site reprints for the specific resource site. A main characteristic of pictures that serve as advertisements is that they are reprinted many times on one particular resource site, while they are reprinted noticeably less often on other resource sites across the Internet. The magnitude of the relative reprint count can therefore be used to distinguish whether a picture is being propagated as an advertisement, and a screener model trained on relative reprint counts can identify the content attribute of a picture on its own and judge accurately whether the picture is an advertising picture.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
From the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flow chart of a picture content identification method according to an embodiment of the invention;
Fig. 2 shows a partial flow chart of a picture content identification method according to an embodiment of the invention;
Fig. 3 shows a flow chart of a picture content identification method according to an embodiment of the invention;
Fig. 4 shows a block diagram of a picture content identification system according to an embodiment of the invention;
Fig. 5 shows a block diagram of a picture content identification system according to an embodiment of the invention;
Fig. 6 shows a block diagram of a picture content identification system according to an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed completely to those skilled in the art.
As shown in Fig. 1, an embodiment of the present invention provides a picture content attribute identification method, which includes the following steps.
Step 110: calculate the relative reprint counts of a plurality of homologous picture clusters with respect to a specific resource site. Each picture cluster is an aggregation of a group of pictures, for example a group of pictures with high similarity to one another. The relative reprint count is data that reflects the ratio between the on-site and off-site reprints of a cluster's pictures for the specific resource site. It can be calculated in many ways, and this embodiment does not restrict the calculation.
Step 120: train a screener model according to the plurality of homologous picture clusters and the corresponding relative reprint counts. Research on advertising pictures shows that they have the following characteristics. Advertising pictures are costly to produce; many are made by merchants at an expense of both money and time. Because of this cost, a merchant propagates one advertising picture many times. Essentially only the merchant propagates it, however; other users essentially do not. This difference in propagation ultimately shows up in the reprint counts on resource sites: the picture is reprinted very many times on a specific resource site (where the merchant propagates it deliberately) and comparatively few times on other sites on the Internet (where other users do not propagate it). In other words, the ratio of on-site to off-site reprints for the specific resource site is usually high for advertising pictures, so the relative reprint count can serve as a basis for distinguishing advertising pictures from non-advertising pictures. Tools for training the screener model include, but are not limited to, the open-source LIBSVM.
Step 130: identify the picture content attribute in a target picture cluster according to the trained screener model, that is, identify whether the pictures in the target picture cluster are advertising pictures. This makes it possible to filter or otherwise process advertising pictures so that they do not degrade the user experience. For example, if the target picture cluster is a group of pictures matching a picture search request, the technical solution of this embodiment can identify the advertising pictures among them and filter them out, so that only non-advertising pictures are returned to the user as search results.
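Steps 120 and 130 can be illustrated with a minimal sketch. The text names LIBSVM as one possible training tool; the example below deliberately swaps in a one-feature threshold learner so that it needs only the Python standard library. The function names and the training samples are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple


def train_threshold_screener(samples: List[Tuple[float, bool]]) -> float:
    """Learn a cut-off on the relative reprint count (stand-in for LIBSVM).

    samples: (relative_reprint_count, is_advertisement) pairs, one per cluster.
    Returns the threshold that classifies the most training samples correctly.
    """
    if not samples:
        raise ValueError("at least one training sample is required")
    values = sorted({value for value, _ in samples})
    # Candidate thresholds: one below the minimum, then midpoints between values.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]

    def correct(threshold: float) -> int:
        return sum((value > threshold) == label for value, label in samples)

    return max(candidates, key=correct)


def is_advertisement(relative_reprint_count: float, threshold: float) -> bool:
    """Step 130: a cluster above the learned threshold is treated as advertising."""
    return relative_reprint_count > threshold


# Hypothetical training data: advertising clusters have high relative counts.
training = [(0.95, True), (0.90, True), (0.20, False), (0.30, False)]
threshold = train_threshold_screener(training)
print(is_advertisement(30 / 35, threshold))
```

A real screener would use a proper classifier and more than one feature; the threshold learner only shows where the relative reprint count enters training and identification.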
In practice, features other than the relative reprint count proposed by the invention can be considered at the same time, for example the length/width of the picture, the size of the picture, the sharpness of the picture, whether the picture link is on the same site as the web page, and whether the picture's redirect link points off-site. When the screener is trained, it can first learn from the relative reprint count of each homologous picture cluster together with one or more of these features of the pictures in the cluster. When a target picture cluster is identified, one or more of the same features can likewise be used in screening to determine whether its pictures are advertising pictures.
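One way to picture the combined input is as a single feature vector per picture. The sketch below is an assumption about how such a vector might be laid out; the class name, field names, and the numeric encoding of the sharpness score are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PictureFeatures:
    relative_reprint_count: float   # from step 110
    width: int                      # format feature: picture width in pixels
    height: int                     # format feature: picture length in pixels
    file_size_bytes: int            # format feature: picture size
    sharpness: float                # format feature: hypothetical sharpness score
    link_same_site_as_page: bool    # link feature
    redirect_link_off_site: bool    # link feature


def to_vector(features: PictureFeatures) -> List[float]:
    """Flatten the features into the numeric vector a classifier would consume."""
    return [
        features.relative_reprint_count,
        float(features.width),
        float(features.height),
        float(features.file_size_bytes),
        features.sharpness,
        1.0 if features.link_same_site_as_page else 0.0,
        1.0 if features.redirect_link_off_site else 0.0,
    ]


example = PictureFeatures(0.86, 640, 480, 20480, 0.7, False, True)
print(to_vector(example))
```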
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, in the method of this embodiment step 110 may include: for one homologous picture cluster among the plurality, comparing the reprint count of the cluster's pictures on the specific resource site (for example, 30 reprints on picture site A) with their reprint count on a plurality of resource sites (for example, 35 reprints in total across 10 picture sites, including picture site A), to obtain the relative reprint count of the cluster with respect to the specific resource site. The plurality of resource sites includes the specific resource site. This embodiment provides one feasible way to calculate the relative reprint count and does not restrict the specific manner of comparison; for example, either 30/35 or 30/(35-30) may be taken as the relative reprint count.
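The two example comparisons, 30/35 and 30/(35-30), can be written out as a short sketch. The function names are hypothetical, and the handling of zero denominators is an assumption the text does not address.

```python
def relative_reprint_share(on_site: int, all_sites: int) -> float:
    """On-site reprints as a share of all reprints, e.g. 30/35."""
    if all_sites <= 0:
        raise ValueError("the cluster has no reprints on any site")
    return on_site / all_sites


def relative_reprint_on_vs_off(on_site: int, all_sites: int) -> float:
    """On-site reprints divided by off-site reprints, e.g. 30/(35-30)."""
    off_site = all_sites - on_site
    if off_site <= 0:
        # Assumption: a cluster with no off-site reprints is maximally on-site.
        return float("inf")
    return on_site / off_site


print(relative_reprint_share(30, 35))
print(relative_reprint_on_vs_off(30, 35))  # 6.0
```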
As shown in Fig. 2 another embodiment of the present invention proposes a kind of picture content attribute identification method, with above-described embodiment Compare, the picture content attribute identification method of the present embodiment, step 110 includes:Step 111, calculates on specific resources website First average reprinting number of picture is it is assumed for example that the first average number of reprinting of picture station A is 5;Step 112, calculates multiple resources Second average number of reprinting of the picture on website is it is assumed for example that 10 picture stations(Including picture station A)The second average reprinting number For 20;Step 113, takes the picture in homology picture cluster to reprint the of number reprinting average with first number on specific resources website One difference, then the first difference actually can reflect the picture of homology picture cluster and reprinting on specific resources website for other pictures Difference, the more big probability then representing that homology picture cluster is advertising pictures of difference is bigger, understands first in conjunction with aforesaid embodiment Difference is 30-5=25, and takes reprinting number on multiple resource websites for the picture in homology picture cluster averagely to reprint with second Second difference of number, then the second difference actually can reflect the picture of homology picture cluster and other pictures on multiple resource websites Reprinting difference, difference bigger represent homology picture cluster be advertising pictures probability less, in conjunction with aforesaid embodiment understand Second difference is 35-20=15, and the first difference and the contrast of the second difference are obtained the phase for specific resources website for the homology picture cluster To reprinting number, in the present embodiment, provide another kind of mode calculating and relatively reprinting 
number, and the picture in view of homology picture cluster With the reprinting difference of other pictures so that relative reprinting number can preferably reflect whether picture is advertising pictures, in the present embodiment First difference and the second difference way of contrast are not defined, for example, take 25/15,(25±a)/(15±b)It is all permissible , a, b are constant.
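The difference-based calculation of step 113 can be sketched as follows. The function name is hypothetical; the constants a and b from the text are modelled as signed offsets that default to zero, which is one reading of the "±" notation.

```python
def relative_reprint_from_differences(
    on_site: int,
    all_sites: int,
    first_average: float,
    second_average: float,
    a: float = 0.0,
    b: float = 0.0,
) -> float:
    """Compare the first and second differences as a ratio.

    first difference  = on-site reprints   - first average reprint count
    second difference = all-site reprints  - second average reprint count
    a and b are signed constants added to the numerator and denominator.
    """
    first_difference = on_site - first_average
    second_difference = all_sites - second_average
    denominator = second_difference + b
    if denominator == 0:
        raise ValueError("second difference plus b must be non-zero")
    return (first_difference + a) / denominator


# Figures from the text: (30 - 5) / (35 - 20) = 25 / 15
print(relative_reprint_from_differences(30, 35, 5, 20))
```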
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, in the method of this embodiment step 111 includes: taking those pictures of the plurality of homologous picture clusters that are located on the specific resource site, and comparing the number of those pictures with the number of homologous picture clusters to which they belong, to obtain the first average reprint count. For example, if picture site A holds 100 pictures and these 100 pictures fall into 20 picture clusters, the first average reprint count is 100/20=5. The technical solution of this embodiment provides a fast and efficient way to obtain the average reprint count.
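A minimal sketch of step 111, assuming each crawled picture is recorded as a (cluster id, site name) pair; that record layout and the function name are hypothetical.

```python
from typing import Iterable, Tuple


def first_average_reprint_count(
    pictures: Iterable[Tuple[str, str]], site: str
) -> float:
    """Pictures on `site` divided by the number of clusters they belong to."""
    clusters_on_site = [cluster_id for cluster_id, s in pictures if s == site]
    if not clusters_on_site:
        raise ValueError(f"no clustered pictures found on site {site!r}")
    return len(clusters_on_site) / len(set(clusters_on_site))


# The example from the text: 100 pictures on site A spread over 20 clusters.
records = [(f"cluster-{i}", "A") for i in range(20) for _ in range(5)]
print(first_average_reprint_count(records, "A"))  # 5.0
```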
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, in the method of this embodiment step 112 includes: comparing the number of pictures in the plurality of homologous picture clusters with the number of homologous picture clusters, to obtain the second average reprint count. For example, if 10 picture sites (including picture site A) hold 1000 pictures and these 1000 pictures can be clustered into 50 picture clusters, the second average reprint count is 1000/50=20. The technical solution of this embodiment provides a fast and efficient way to obtain the average reprint count.
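Step 112 can be sketched in the same spirit, assuming the clusters are held as a mapping from a cluster id to the picture URLs in that cluster. The mapping layout, the function name, and the example.com URLs are hypothetical.

```python
from typing import Dict, List


def second_average_reprint_count(clusters: Dict[str, List[str]]) -> float:
    """Total number of pictures divided by the number of clusters."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    total_pictures = sum(len(urls) for urls in clusters.values())
    return total_pictures / len(clusters)


# The example from the text: 1000 pictures clustered into 50 clusters.
all_clusters = {
    f"cluster-{i}": [f"http://example.com/{i}/{j}.jpg" for j in range(20)]
    for i in range(50)
}
print(second_average_reprint_count(all_clusters))  # 20.0
```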
As shown in Fig. 3, another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, the method of this embodiment further includes the following steps before step 110.
Step 101: crawl the picture links (URLs) that appear on the plurality of resource sites.
Step 102: detect whether a picture link is identical to the link of a picture in the homologous picture cluster, which shows whether one picture has been reprinted under different URLs; and/or detect whether the checksum information (including but not limited to the MD5 value) of the picture at the picture link is identical to that of a picture in the homologous picture cluster, which shows whether multiple identical pictures exist; and/or detect whether the picture at the picture link shares one or more identical image features with a picture in the homologous picture cluster, which shows whether several pictures are identical or were obtained by modifying the same picture. In this embodiment, image features include, but are not limited to, contour features, color features, and histogram features.
Step 103: determine, according to the detection result, whether the picture link is a reprint of a picture in the homologous picture cluster, and count the reprints of the pictures in the homologous picture cluster. This embodiment thus provides a technical solution that can count picture reprints comprehensively.
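Steps 102 and 103 can be sketched as follows, assuming a crawled picture is represented by its URL, its raw bytes, and a set of precomputed image feature labels. The `Picture` class, the feature labels, and the choice to treat any single matching check as a reprint are assumptions made for illustration; extraction of real contour, color, or histogram features is out of scope here.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Picture:
    url: str
    content: bytes
    features: FrozenSet[str]  # hypothetical precomputed image feature labels


def md5_of(content: bytes) -> str:
    """Checksum information of a picture, here its MD5 hex digest."""
    return hashlib.md5(content).hexdigest()


def is_reprint(candidate: Picture, member: Picture) -> bool:
    """Step 102: same link, and/or same checksum, and/or a shared image feature."""
    return (
        candidate.url == member.url
        or md5_of(candidate.content) == md5_of(member.content)
        or bool(candidate.features & member.features)
    )


def count_reprints(crawled: Iterable[Picture], cluster: List[Picture]) -> int:
    """Step 103: count crawled pictures that reprint any picture of the cluster."""
    return sum(any(is_reprint(c, m) for m in cluster) for c in crawled)


source = Picture("http://a.example/ad.jpg", b"ad-bytes", frozenset({"hist:1"}))
crawled_pictures = [
    Picture("http://a.example/ad.jpg", b"other", frozenset()),          # same URL
    Picture("http://b.example/x.jpg", b"ad-bytes", frozenset()),        # same MD5
    Picture("http://c.example/y.jpg", b"edit", frozenset({"hist:1"})),  # shared feature
    Picture("http://d.example/z.jpg", b"cat", frozenset({"hist:9"})),   # unrelated
]
print(count_reprints(crawled_pictures, [source]))  # 3
```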
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, in the method of this embodiment the specific resource site is the resource site that reprints the most pictures of each homologous picture cluster among the plurality of homologous picture clusters. The site that reprints a picture the most times is likely to be the site on which the merchant behind an advertising picture propagates it, so the reprint count for that site reflects most effectively whether the picture is an advertising picture.
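Choosing the specific resource site for a cluster can be sketched with a counter, assuming the site of each reprint has already been recorded. The function name is hypothetical, and how ties are broken is an assumption (the first site encountered wins).

```python
from collections import Counter
from typing import Iterable, Tuple


def pick_specific_site(reprint_sites: Iterable[str]) -> Tuple[str, int]:
    """Return the site with the most reprints of a cluster and its reprint count."""
    counts = Counter(reprint_sites)
    if not counts:
        raise ValueError("no reprints recorded for this cluster")
    return counts.most_common(1)[0]


# 30 reprints on site A, 5 on other sites, as in the earlier numerical example.
sites = ["A"] * 30 + ["B"] * 3 + ["C"] * 2
print(pick_specific_site(sites))  # ('A', 30)
```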
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, in the method of this embodiment the pictures of each homologous picture cluster correspond to the same source picture, and each picture of a homologous picture cluster shares one or more identical image features with its corresponding source picture. In the technical solution of this embodiment, the pictures of each homologous picture cluster are therefore either identical or obtained by modifying the same picture. Image features in this embodiment include, but are not limited to, contour features, color features, and histogram features.
As shown in Fig. 4, an embodiment of the present invention provides a picture content attribute identification system, which includes the following modules.
A relative reprint count calculation module 210, for calculating the relative reprint counts of a plurality of homologous picture clusters with respect to a specific resource site. Each picture cluster is an aggregation of a group of pictures, for example a group of pictures with high similarity to one another. The relative reprint count is data that reflects the ratio between the on-site and off-site reprints of a cluster's pictures for the specific resource site. It can be calculated in many ways, and this embodiment does not restrict the calculation.
A training module 220, for inputting the plurality of homologous picture clusters and the corresponding relative reprint counts into a screener to train a screener model. Research on advertising pictures shows that they have the following characteristics. Advertising pictures are costly to produce; many are made by merchants at an expense of both money and time. Because of this cost, a merchant propagates one advertising picture many times. Essentially only the merchant propagates it, however; other users essentially do not. This difference in propagation ultimately shows up in the reprint counts on resource sites: the picture is reprinted very many times on a specific resource site (where the merchant propagates it deliberately) and comparatively few times on other sites on the Internet (where other users do not propagate it). In other words, the ratio of on-site to off-site reprints for the specific resource site is usually high for advertising pictures, so the relative reprint count can serve as data for distinguishing advertising pictures from non-advertising pictures.
A screener 230, adapted to obtain the trained screener model from the training module and to screen a target picture cluster according to the model. The screener used in this embodiment includes, but is not limited to, the open-source LIBSVM.
An identification module 240, for identifying the picture content attribute in the target picture cluster according to the screener's screening of the target picture cluster, that is, identifying whether the pictures in the target picture cluster are advertising pictures.
In addition, in practical applications the system further includes a picture format feature module 310 and/or a picture link feature module 320. The picture format feature module 310 is adapted to extract the format features of the pictures contained in the homologous picture clusters and in the target picture cluster. The picture link feature module 320 is adapted to extract the link features of the pictures contained in the homologous picture clusters and in the target picture cluster. The training module 220 is further adapted to input the plurality of homologous picture clusters, the corresponding relative reprint counts, and the corresponding picture format features and/or picture link features into the screener together to train the screener model. The screener 230 is further adapted to screen the target picture cluster according to the trained model, combining the target picture cluster's relative reprint count with its picture format features and/or picture link features. The identification module 240 is further used to identify the picture content attribute in the target picture cluster according to that screening.
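The two link features handled by a module such as 320 can be sketched with the standard library. Treating "same site" as "same host name" is an assumption; the text does not define how sites are compared, and the function names and example URLs are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def same_site(url_a: str, url_b: str) -> bool:
    """Assumption: two URLs are on the same site when their host names match."""
    host_a = urlparse(url_a).hostname
    host_b = urlparse(url_b).hostname
    return host_a is not None and host_a == host_b


def link_features(image_url: str, page_url: str, redirect_url: str) -> Tuple[bool, bool]:
    """Return (picture link on same site as page, redirect link points off-site)."""
    return (
        same_site(image_url, page_url),
        not same_site(redirect_url, page_url),
    )


print(link_features(
    "http://forum.example.com/img/1.jpg",
    "http://forum.example.com/thread/9",
    "http://shop.example.net/item/42",
))  # (True, True)
```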
This facilitates processing such as filtering of advertising pictures and prevents them from degrading the user experience. Suppose the target picture cluster is a group of pictures returned for a picture search request; then, according to the technical solution of this embodiment, advertising pictures can be identified among them and filtered out, so that only non-advertising pictures are provided to the user as search results, thereby safeguarding the user experience.
In practical applications, besides the relative reprint count proposed by the present invention, other features may also be considered, such as the length/width of the picture, the size of the picture, the sharpness of the picture, whether the picture link is on the same site as the web page, or whether the picture's redirect link points off-site. These features are likewise first learned and trained by the classifier. When the target picture cluster is identified, one or more of these other features may also be taken into account in screening and in identifying whether the pictures are advertising pictures.
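As a rough illustration of how the relative reprint count and the additional features might be handed to a LIBSVM-style screener, the sketch below serializes one picture cluster into LIBSVM's sparse text format (`<label> <index>:<value> ...`, with 1-based ascending indices). The feature order, the label convention (+1 for advertising, -1 for non-advertising), and the function name `to_libsvm_line` are assumptions made for illustration; the patent does not specify them.

```python
def to_libsvm_line(label, features):
    """Serialize one sample as a line in LIBSVM's sparse text format.

    label    -- +1 (advertising) or -1 (non-advertising); an assumed convention
    features -- numeric values in a fixed, assumed order, for example
                [relative_reprint_count, width, height, link_is_off_site]
    """
    parts = [f"{label:+d}"]
    for index, value in enumerate(features, start=1):
        if value == 0:
            continue  # the sparse format allows zero-valued features to be omitted
        parts.append(f"{index}:{value:g}")
    return " ".join(parts)
```

Lines produced this way could be written to a training file, one line per homologous picture cluster, and then passed to whichever LIBSVM front end is in use.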
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the relative reprint count calculation module 210, for one homologous picture cluster among the multiple homologous picture clusters, compares the reprint count of the pictures of that cluster on the specific resource website (for example, 30 reprints on picture site A) with the reprint count on multiple resource websites (for example, 35 reprints in total across 10 picture sites, including picture site A), and thereby obtains the relative reprint count of the homologous picture cluster for the specific resource website. The multiple resource websites include the specific resource website. This embodiment provides a feasible way of calculating the relative reprint count and does not limit the specific manner of comparison; for example, either 30/35 or 30/(35-30) may be taken as the relative reprint count.
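The two example ratios in this embodiment can be sketched as follows. This is a minimal illustration only: the function name, the `mode` parameter, and the handling of the zero-denominator cases are assumptions, since the embodiment deliberately leaves the manner of comparison open.

```python
def relative_reprint_count(on_site, across_sites, mode="share"):
    """Compare a cluster's reprints on one site with its reprints on all sites.

    on_site      -- reprints on the specific resource website (e.g. 30)
    across_sites -- reprints on all resource websites, including that site (e.g. 35)
    mode         -- "share": on_site / across_sites
                    "odds":  on_site / (across_sites - on_site)
    """
    if on_site < 0 or across_sites < on_site:
        raise ValueError("across_sites must include the on_site reprints")
    if mode == "share":
        return on_site / across_sites if across_sites else 0.0
    if mode == "odds":
        off_site = across_sites - on_site
        # All reprints are on the specific site: treat the ratio as unbounded.
        return on_site / off_site if off_site else float("inf")
    raise ValueError(f"unknown mode: {mode}")
```

With the figures from the text, `relative_reprint_count(30, 35)` corresponds to 30/35 and `relative_reprint_count(30, 35, mode="odds")` to 30/(35-30).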
As shown in Fig. 5, another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, the system of this embodiment further includes: a first average reprint count calculation module 250, configured to calculate the first average reprint count of the pictures on the specific resource website (assume, for example, that the first average reprint count of picture site A is 5); and a second average reprint count calculation module 260, configured to calculate the second average reprint count of the pictures on the multiple resource websites (assume, for example, that the second average reprint count of the 10 picture sites, including picture site A, is 20). The relative reprint count calculation module 210 takes the first difference between the reprint count of the pictures of the homologous picture cluster on the specific resource website and the first average reprint count. The first difference reflects how the reprinting of this cluster's pictures on the specific resource website differs from that of other pictures; the larger the difference, the greater the probability that the homologous picture cluster consists of advertising pictures. Combined with the preceding embodiment, the first difference is 30-5=25. The module also takes the second difference between the reprint count of the pictures of the homologous picture cluster on the multiple resource websites and the second average reprint count. The second difference reflects how the reprinting of this cluster's pictures on the multiple resource websites differs from that of other pictures; the larger the difference, the smaller the probability that the homologous picture cluster consists of advertising pictures. Combined with the preceding embodiment, the second difference is 35-20=15. Comparing the first difference with the second difference gives the relative reprint count of the homologous picture cluster for the specific resource website.
This embodiment thus provides another way of calculating the relative reprint count, one that takes into account the difference in reprinting between the pictures of the homologous picture cluster and other pictures, so that the relative reprint count better reflects whether a picture is an advertising picture. The manner of comparing the first difference with the second difference is not limited in this embodiment; for example, 25/15 or (25±a)/(15±b) are both possible, where a and b are constants.
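The difference-based variant can be sketched as below. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the constants `a` and `b` are modelled as signed offsets (so a negative value covers the "minus" case of ±), and raising an error on a zero denominator is a choice the patent does not address.

```python
def difference_based_relative_count(site_count, site_average,
                                    all_count, all_average,
                                    a=0.0, b=0.0):
    """Contrast the first and second differences as a ratio.

    first difference  = reprints on the specific site - first average reprint count
    second difference = reprints on all sites         - second average reprint count
    a, b              = optional signed constant offsets (assumed interpretation of ±a, ±b)
    """
    first_difference = site_count - site_average
    second_difference = all_count - all_average
    denominator = second_difference + b
    if denominator == 0:
        raise ZeroDivisionError("second difference plus b is zero")
    return (first_difference + a) / denominator
```

Using the numbers in the text, `difference_based_relative_count(30, 5, 35, 20)` evaluates (30-5)/(35-20), that is, 25/15.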
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the first average reprint count calculation module 250 takes, from the pictures of the multiple homologous picture clusters, the pictures located on the specific resource website and compares the number of those pictures with the number of homologous picture clusters to which they belong, obtaining the first average reprint count. For example, if picture site A hosts 100 pictures that fall into 20 picture clusters, the first average reprint count is 100/20=5. The technical solution of this embodiment provides a fast and efficient way of obtaining the average reprint count.
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the second average reprint count calculation module 260 compares the number of pictures in the multiple homologous picture clusters with the number of homologous picture clusters, obtaining the second average reprint count. For example, if the 10 picture sites (including picture site A) host 1000 pictures that cluster into 50 picture clusters, the second average reprint count is 1000/50=20. The technical solution of this embodiment provides a fast and efficient way of obtaining the average reprint count.
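Both averages reduce to "number of pictures divided by number of clusters", computed over different picture sets. The sketch below assumes each crawled picture is represented by a hypothetical `Picture` record carrying its cluster identifier and the site it was found on; neither the record type nor the function names come from the patent.

```python
from collections import namedtuple

# Hypothetical record: which homologous cluster a picture belongs to and where it was found.
Picture = namedtuple("Picture", ["cluster_id", "site"])

def first_average_reprint_count(pictures, site):
    """Pictures on one site divided by the number of clusters they belong to."""
    on_site = [p for p in pictures if p.site == site]
    clusters = {p.cluster_id for p in on_site}
    return len(on_site) / len(clusters) if clusters else 0.0

def second_average_reprint_count(pictures):
    """All pictures divided by the number of homologous clusters."""
    clusters = {p.cluster_id for p in pictures}
    return len(pictures) / len(clusters) if clusters else 0.0
```

For 100 pictures on site A spread over 20 clusters, the first function yields 100/20; for 1000 pictures in 50 clusters overall, the second yields 1000/50, matching the worked figures in the text.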
As shown in Fig. 6, another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, the system of this embodiment further includes: a picture link capture module 270, configured to capture the picture links (URLs) appearing on the multiple resource websites; a picture link detection module 280, configured to detect whether a picture link is identical to the link of a picture of the homologous picture cluster, which reflects whether a picture has been reprinted under different URLs, and/or to detect whether the check information (including, but not limited to, the MD5 value) of the picture at the picture link is identical to the check information of a picture of the homologous picture cluster, which reflects whether multiple identical pictures exist, and/or to detect whether the picture at the picture link and a picture of the homologous picture cluster share one or more identical image features, which reflects whether several pictures are identical or were obtained by modifying the same picture, the image features in this embodiment including, but not limited to, contour features, color features, and histogram features; and a picture reprint count statistics module 290, configured to determine from the detection result whether the picture link is a reprint of a picture of the homologous picture cluster and to count the reprints of the pictures of the homologous picture cluster. This embodiment thus provides a technical solution that can count picture reprints comprehensively.
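The three detection checks (same link, same check information, shared image feature) and the reprint tally can be sketched as follows. The dictionary layout, the function names, and the representation of image features as opaque string labels are assumptions for illustration; real contour, color, or histogram features would need an image-processing library, which is outside this sketch. Only `hashlib.md5` from the Python standard library is used for the check information.

```python
import hashlib

def md5_hex(data):
    """Return the MD5 digest of raw picture bytes as a hex string."""
    return hashlib.md5(data).hexdigest()

def is_reprint(link, cluster):
    """Decide whether a captured picture link is a reprint of a cluster picture.

    link    -- dict with "url", "data" (picture bytes), "features" (iterable of labels)
    cluster -- dict with "urls", "md5s", "features" (sets describing the cluster)
    """
    if link["url"] in cluster["urls"]:
        return True                      # identical picture link
    if md5_hex(link["data"]) in cluster["md5s"]:
        return True                      # identical check information
    # At least one shared image feature (assumed to be precomputed labels).
    return bool(set(link["features"]) & set(cluster["features"]))

def count_reprints(links, cluster):
    """Count how many captured links are reprints of the cluster's pictures."""
    return sum(1 for link in links if is_reprint(link, cluster))
```

The checks are combined with a logical OR here, which is one reading of the "and/or" wording; a stricter deployment could require several checks to agree.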
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the specific resource website is, for each homologous picture cluster among the multiple homologous picture clusters, the resource website that reprints the pictures of that cluster the most. The website that reprints a picture most often is likely to be the site that the advertiser uses for promotion, so the reprint count for this website most effectively reflects whether a picture is an advertising picture.
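Selecting the specific resource website for one cluster amounts to finding the site with the highest reprint tally. A minimal sketch, assuming the reprints of a cluster are available as a list of site names (one entry per reprint) and using a hypothetical function name:

```python
from collections import Counter

def pick_specific_site(reprint_sites):
    """Return (site, reprint_count) for the site that reprints a cluster most.

    reprint_sites -- iterable of site names, one entry per detected reprint
    Returns None when the cluster has no recorded reprints.
    """
    counts = Counter(reprint_sites)
    if not counts:
        return None
    return counts.most_common(1)[0]
```

For a cluster reprinted 30 times on site A and 5 times elsewhere, this returns site A with a count of 30, which can then serve as the numerator of the relative reprint count.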
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the pictures of each homologous picture cluster correspond to the same source picture, and the pictures of each homologous picture cluster share one or more identical image features with their corresponding source picture. In the technical solution of this embodiment, the pictures of each homologous picture cluster are therefore either identical or obtained by modifying the same picture. The image features in this embodiment include, but are not limited to, contour features, color features, and histogram features.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the picture content attribute identification system according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

Claims (16)

1. A picture content attribute identification method, comprising:
calculating relative reprint counts of multiple homologous picture clusters for a specific resource website, the relative reprint count being data that reflects the ratio of reprints of the pictures of a homologous picture cluster inside and outside the specific resource website;
training a screener model according to the multiple homologous picture clusters and the corresponding relative reprint counts;
identifying a picture content attribute of a target picture cluster according to the trained screener model, the picture content attribute being whether a picture is an advertising picture.
2. The picture content attribute identification method according to claim 1, wherein the step of calculating the relative reprint counts of the multiple homologous picture clusters for the specific resource website comprises:
for one homologous picture cluster among the multiple homologous picture clusters, comparing the reprint count of the pictures of the homologous picture cluster on the specific resource website with the reprint count on multiple resource websites to obtain the relative reprint count of the homologous picture cluster for the specific resource website, the multiple resource websites including the specific resource website.
3. The picture content attribute identification method according to claim 2, wherein the step of comparing the reprint count of the pictures of the homologous picture cluster on the specific resource website with the reprint count on the multiple resource websites comprises:
calculating a first average reprint count of the pictures on the specific resource website;
calculating a second average reprint count of the pictures on the multiple resource websites;
taking a first difference between the reprint count of the pictures of the homologous picture cluster on the specific resource website and the first average reprint count, taking a second difference between the reprint count of the pictures of the homologous picture cluster on the multiple resource websites and the second average reprint count, and comparing the first difference with the second difference to obtain the relative reprint count of the homologous picture cluster for the specific resource website.
4. The picture content attribute identification method according to claim 3, wherein the step of calculating the first average reprint count of the pictures on the specific resource website comprises:
taking, from the pictures of the multiple homologous picture clusters, multiple pictures located on the specific resource website, and comparing the number of the multiple pictures with the number of homologous picture clusters corresponding to the multiple pictures to obtain the first average reprint count.
5. The picture content attribute identification method according to claim 3, wherein the step of calculating the second average reprint count of the pictures on the multiple resource websites comprises:
comparing the number of pictures of the multiple homologous picture clusters with the number of the multiple homologous picture clusters to obtain the second average reprint count.
6. The picture content attribute identification method according to claim 2, further comprising, before the step of comparing the reprint count of the pictures of the homologous picture cluster on the specific resource website with the reprint count on the multiple resource websites:
capturing picture links appearing on the multiple resource websites;
detecting whether a picture link is identical to the link of a picture of the homologous picture cluster, and/or detecting whether the check information of the picture at the picture link is identical to the check information of a picture of the homologous picture cluster, and/or detecting whether the picture at the picture link and a picture of the homologous picture cluster share one or more identical image features;
determining, according to the detection result, whether the picture link is a reprint of a picture of the homologous picture cluster, and counting the reprints of the pictures of the homologous picture cluster.
7. The picture content attribute identification method according to claim 2, wherein
the specific resource website is the resource website that reprints the pictures of each homologous picture cluster among the multiple homologous picture clusters the most.
8. The picture content attribute identification method according to any one of claims 1 to 7, wherein
the pictures of each homologous picture cluster correspond to the same source picture, and the pictures of each homologous picture cluster share one or more identical image features with the corresponding source picture.
9. A picture content attribute identification system, comprising:
a relative reprint count calculation module, configured to calculate relative reprint counts of multiple homologous picture clusters for a specific resource website, the relative reprint count being data that reflects the ratio of reprints of the pictures of a homologous picture cluster inside and outside the specific resource website;
a training module, configured to input the multiple homologous picture clusters and the corresponding relative reprint counts into a screener to train a screener model;
a screener, adapted to obtain the trained screener model from the training module and to screen a target picture cluster according to the model;
an identification module, configured to screen the target picture cluster using the screener and to identify a picture content attribute of the target picture cluster, the picture content attribute being whether a picture is an advertising picture.
10. The picture content attribute identification system according to claim 9, wherein
the relative reprint count calculation module, for one homologous picture cluster among the multiple homologous picture clusters, compares the reprint count of the pictures of the homologous picture cluster on the specific resource website with the reprint count on multiple resource websites to obtain the relative reprint count of the homologous picture cluster for the specific resource website, the multiple resource websites including the specific resource website.
11. The picture content attribute identification system according to claim 10, further comprising:
a first average reprint count calculation module, configured to calculate a first average reprint count of the pictures on the specific resource website;
a second average reprint count calculation module, configured to calculate a second average reprint count of the pictures on the multiple resource websites;
wherein the relative reprint count calculation module takes a first difference between the reprint count of the pictures of the homologous picture cluster on the specific resource website and the first average reprint count, takes a second difference between the reprint count of the pictures of the homologous picture cluster on the multiple resource websites and the second average reprint count, and compares the first difference with the second difference to obtain the relative reprint count of the homologous picture cluster for the specific resource website.
12. The picture content attribute identification system according to claim 11, wherein
the first average reprint count calculation module takes, from the pictures of the multiple homologous picture clusters, multiple pictures located on the specific resource website, and compares the number of the multiple pictures with the number of homologous picture clusters corresponding to the multiple pictures to obtain the first average reprint count.
13. The picture content attribute identification system according to claim 11, wherein
the second average reprint count calculation module compares the number of pictures of the multiple homologous picture clusters with the number of the multiple homologous picture clusters to obtain the second average reprint count.
14. The picture content attribute identification system according to claim 10, further comprising:
a picture link capture module, configured to capture picture links appearing on the multiple resource websites;
a picture link detection module, configured to detect whether a picture link is identical to the link of a picture of the homologous picture cluster, and/or to detect whether the check information of the picture at the picture link is identical to the check information of a picture of the homologous picture cluster, and/or to detect whether the picture at the picture link and a picture of the homologous picture cluster share one or more identical image features;
a picture reprint count statistics module, configured to determine, according to the detection result, whether the picture link is a reprint of a picture of the homologous picture cluster, and to count the reprints of the pictures of the homologous picture cluster.
15. The picture content attribute identification system according to claim 10, wherein
the specific resource website is the resource website that reprints the pictures of each homologous picture cluster among the multiple homologous picture clusters the most.
16. The picture content attribute identification system according to any one of claims 9 to 15, wherein
the pictures of each homologous picture cluster correspond to the same source picture, and the pictures of each homologous picture cluster share one or more identical image features with the corresponding source picture.
CN201310632676.8A 2013-12-02 2013-12-02 Picture content attribute identification method and system Active CN103617262B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310632676.8A CN103617262B (en) 2013-12-02 2013-12-02 Picture content attribute identification method and system
PCT/CN2014/087109 WO2015081748A1 (en) 2013-12-02 2014-09-22 Method and system for identifying content attribute of picture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310632676.8A CN103617262B (en) 2013-12-02 2013-12-02 Picture content attribute identification method and system

Publications (2)

Publication Number Publication Date
CN103617262A CN103617262A (en) 2014-03-05
CN103617262B true CN103617262B (en) 2017-03-08

Family

ID=50167965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310632676.8A Active CN103617262B (en) 2013-12-02 2013-12-02 Picture content attribute identification method and system

Country Status (1)

Country Link
CN (1) CN103617262B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015081748A1 (en) * 2013-12-02 2015-06-11 北京奇虎科技有限公司 Method and system for identifying content attribute of picture
CN105022738A (en) * 2014-04-21 2015-11-04 上海京知信息科技有限公司 Extracting and mapping method of network picture format file on the basis of histograms
CN103995857A (en) * 2014-05-14 2014-08-20 北京奇虎科技有限公司 Method and device for achieving image search and sorting
CN106599177B (en) * 2016-12-12 2020-02-14 国云科技股份有限公司 Advertisement page shielding processing method
CN107451180B (en) * 2017-06-13 2021-02-19 百度在线网络技术(北京)有限公司 Method, device, equipment and computer storage medium for identifying site homologous relation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5832119A (en) * 1993-11-18 1998-11-03 Digimarc Corporation Methods for controlling systems using control signals embedded in empirical data
CN101071433A (en) * 2007-05-10 2007-11-14 腾讯科技(深圳)有限公司 Picture download system and method
CN102419777A (en) * 2012-01-10 2012-04-18 凤凰在线(北京)信息技术有限公司 System and method for filtering internet image advertisements
CN102591983A (en) * 2012-01-10 2012-07-18 凤凰在线(北京)信息技术有限公司 Advertisement filter system and advertisement filter method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5832119A (en) * 1993-11-18 1998-11-03 Digimarc Corporation Methods for controlling systems using control signals embedded in empirical data
US5832119C1 (en) * 1993-11-18 2002-03-05 Digimarc Corp Methods for controlling systems using control signals embedded in empirical data
CN101071433A (en) * 2007-05-10 2007-11-14 腾讯科技(深圳)有限公司 Picture download system and method
CN102419777A (en) * 2012-01-10 2012-04-18 凤凰在线(北京)信息技术有限公司 System and method for filtering internet image advertisements
CN102591983A (en) * 2012-01-10 2012-07-18 凤凰在线(北京)信息技术有限公司 Advertisement filter system and advertisement filter method

Also Published As

Publication number Publication date
CN103617262A (en) 2014-03-05


Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220727

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.
