CN103617262B - Picture content attribute identification method and system - Google Patents
- Publication number
- CN103617262B (application CN201310632676.8A)
- Authority
- CN
- China
- Prior art keywords
- picture
- cluster
- reprinting
- homology
- website
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
Abstract
The invention provides a method and system for identifying picture content attributes. The method includes: calculating, for each of multiple homologous picture clusters, a relative reprint count with respect to a specific resource site; training a screener model on the homologous picture clusters and their corresponding relative reprint counts; and identifying the picture content attribute of a target picture cluster with the trained screener model. An advantage of the invention is that the content attribute of a picture can be identified from data on how the picture is reprinted or propagated across the network; in particular, the method can be used to determine whether a picture is an advertising picture.
Description
Technical field
The present invention relates to the field of image recognition, and in particular to a method and system for identifying picture content attributes.
Background art
Advertising pictures appear on many types of resource sites on the network. These pictures are highly varied: they include advertisements for all kinds of goods (for example, milk powder or clothing), advertisements for physical stores, and advertisements of other types.
These advertising pictures appear not only on merchants' own websites but also on the pages of other resource sites. For example, in communities that allow users to upload pictures (forums, picture sites, and so on), some users upload advertising pictures. The presence of a large number of advertising pictures often disturbs users; advertising pictures unrelated to the user's request can even show up when the user runs a picture search.
In terms of their visual content, different advertising pictures do not have much in common, so it is difficult to identify the content attribute of a picture with current image recognition technology; that is, it is difficult to tell which pictures are advertising pictures. As a result, advertising pictures cannot be handled in a targeted way, and the user experience inevitably suffers.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a picture content attribute identification method and system that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a picture content attribute identification method is provided, which includes: calculating the relative reprint count of each of multiple homologous picture clusters with respect to a specific resource site; training a screener model on the multiple homologous picture clusters and their corresponding relative reprint counts; and identifying the picture content attribute of a target picture cluster with the trained screener model.
Optionally, the step of calculating the relative reprint counts of the multiple homologous picture clusters with respect to the specific resource site includes: for one of the multiple homologous picture clusters, comparing the reprint count of the pictures in that cluster on the specific resource site with their reprint count on multiple resource sites to obtain the cluster's relative reprint count with respect to the specific resource site, where the multiple resource sites include the specific resource site.
Optionally, the step of comparing the reprint count of the pictures in the homologous picture cluster on the specific resource site with their reprint count on the multiple resource sites includes: calculating a first average reprint count of the pictures on the specific resource site; calculating a second average reprint count of the pictures on the multiple resource sites; taking a first difference between the reprint count of the cluster's pictures on the specific resource site and the first average reprint count; taking a second difference between the reprint count of the cluster's pictures on the multiple resource sites and the second average reprint count; and comparing the first difference with the second difference to obtain the cluster's relative reprint count with respect to the specific resource site.
Optionally, the step of calculating the first average reprint count of the pictures on the specific resource site includes: taking, from the pictures of the multiple homologous picture clusters, the pictures located on the specific resource site, and comparing the number of those pictures with the number of homologous picture clusters to which they belong to obtain the first average reprint count.
Optionally, the step of calculating the second average reprint count of the pictures on the multiple resource sites includes: comparing the number of pictures in the multiple homologous picture clusters with the number of homologous picture clusters to obtain the second average reprint count.
Optionally, before the step of comparing the reprint count of the pictures in the homologous picture cluster on the specific resource site with their reprint count on the multiple resource sites, the method further includes: crawling the picture links that appear on the multiple resource sites; detecting whether a picture link is identical to the link of a picture in the homologous picture cluster, and/or detecting whether the check information of the picture behind the picture link is identical to the check information of a picture in the homologous picture cluster, and/or detecting whether the picture behind the picture link shares one or more identical image features with a picture in the homologous picture cluster; and determining, according to the detection result, whether the picture link is a reprint of a picture in the homologous picture cluster, and counting the reprints of the cluster's pictures.
Optionally, the specific resource site is the resource site that has reprinted the pictures of each of the multiple homologous picture clusters the most times.
Optionally, the pictures of each homologous picture cluster correspond to the same source picture, and each picture of the cluster shares one or more identical image features with that source picture.
Optionally, the method further includes: extracting the format features and/or link features of the pictures contained in the homologous picture clusters, and training the screener model on the multiple homologous picture clusters, their corresponding relative reprint counts, and the format features of the pictures they contain; and identifying the picture content attribute of the target picture cluster with the trained screener model, based on the relative reprint count and on the format features and/or link features of the pictures contained in the target picture cluster.
Optionally, the format features of a picture include, but are not limited to, one or more of the following: the length/width of the picture, the size of the picture, and the definition (sharpness) of the picture.
Optionally, the link features of a picture include, but are not limited to, one or more of the following: whether the picture link is on the same site as the web page, and whether the picture's jump link leads off-site.
According to another aspect of the present invention, a picture content attribute identification system is provided, which includes: a relative reprint count calculation module, for calculating the relative reprint count of each of multiple homologous picture clusters with respect to a specific resource site; a training module, for feeding the multiple homologous picture clusters and their corresponding relative reprint counts into a screener to train a screener model; a screener, adapted to obtain the trained screener model from the training module and to screen a target picture cluster according to the model; and an identification module, for identifying the picture content attribute of the target picture cluster according to the screener's screening of that cluster.
Optionally, for one of the multiple homologous picture clusters, the relative reprint count calculation module compares the reprint count of the pictures in that cluster on the specific resource site with their reprint count on multiple resource sites to obtain the cluster's relative reprint count with respect to the specific resource site, where the multiple resource sites include the specific resource site.
Optionally, the system further includes: a first average reprint count calculation module, for calculating the first average reprint count of the pictures on the specific resource site; and a second average reprint count calculation module, for calculating the second average reprint count of the pictures on the multiple resource sites. The relative reprint count calculation module takes a first difference between the reprint count of the cluster's pictures on the specific resource site and the first average reprint count, takes a second difference between the reprint count of the cluster's pictures on the multiple resource sites and the second average reprint count, and compares the first difference with the second difference to obtain the cluster's relative reprint count with respect to the specific resource site.
Optionally, the first average reprint count calculation module takes, from the pictures of the multiple homologous picture clusters, the pictures located on the specific resource site, and compares the number of those pictures with the number of homologous picture clusters to which they belong to obtain the first average reprint count.
Optionally, the second average reprint count calculation module compares the number of pictures in the multiple homologous picture clusters with the number of homologous picture clusters to obtain the second average reprint count.
Optionally, the system further includes: a picture link crawling module, for crawling the picture links that appear on the multiple resource sites; a picture link detection module, for detecting whether a picture link is identical to the link of a picture in the homologous picture cluster, and/or detecting whether the check information of the picture behind the picture link is identical to the check information of a picture in the homologous picture cluster, and/or detecting whether the picture behind the picture link shares one or more identical image features with a picture in the homologous picture cluster; and a picture reprint count statistics module, for determining, according to the detection result, whether the picture link is a reprint of a picture in the homologous picture cluster, and counting the reprints of the cluster's pictures.
Optionally, the specific resource site is the resource site that has reprinted the pictures of each of the multiple homologous picture clusters the most times.
Optionally, the pictures of each homologous picture cluster correspond to the same source picture, and each picture of the cluster shares one or more identical image features with that source picture.
The picture content attribute identification method and system according to the present invention use the relative reprint counts of homologous picture clusters with respect to a specific resource site as training data for the screener model. The relative reprint count reflects the ratio between a picture's reprints inside the specific resource site and its reprints outside that site. A main characteristic of a picture used as an advertisement is that it is reprinted many times on one particular resource site, while it is reprinted noticeably less on other resource sites across the Internet. The magnitude of the relative reprint count can therefore be used to distinguish whether a picture is being propagated as an advertisement, and a screener model trained on relative reprint counts can identify the content attribute of a picture automatically and judge accurately whether it is an advertising picture.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and practiced according to the content of the description, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flow chart of a picture content attribute identification method according to an embodiment of the invention;
Fig. 2 shows a partial flow chart of a picture content attribute identification method according to an embodiment of the invention;
Fig. 3 shows a flow chart of a picture content attribute identification method according to an embodiment of the invention;
Fig. 4 shows a block diagram of a picture content attribute identification system according to an embodiment of the invention;
Fig. 5 shows a block diagram of a picture content attribute identification system according to an embodiment of the invention;
Fig. 6 shows a block diagram of a picture content attribute identification system according to an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed completely to those skilled in the art.
As shown in Fig. 1, an embodiment of the present invention provides a picture content attribute identification method, which includes the following steps.
Step 110: calculate the relative reprint count of each of multiple homologous picture clusters with respect to a specific resource site. Each picture cluster is an aggregation of a group of pictures, for example a group of pictures with high mutual similarity. The relative reprint count is data that reflects the ratio between the reprints of a cluster's pictures inside the specific resource site and their reprints outside it. There are many ways to calculate the relative reprint count, and this embodiment does not limit the calculation.
Step 120: train a screener model on the multiple homologous picture clusters and their corresponding relative reprint counts. Research on advertising pictures shows that they have the following characteristics. Advertising pictures are costly to produce; merchants spend money and time making many of them. Because the production cost is high, a merchant propagates one advertising picture many times, but essentially only the merchant propagates it, and other users essentially do not. This difference in propagation ultimately shows up in the reprint counts on resource sites: the picture is reprinted very many times on a specific resource site (the merchant spreads it deliberately), while it is reprinted far fewer times on other sites on the Internet (other users do not spread it). In other words, the ratio of on-site to off-site reprints for the specific resource site is high for advertising pictures, so the relative reprint count can serve as data for distinguishing advertising pictures from non-advertising pictures. Tools for training the screener model include, but are not limited to, the open-source LIBSVM.
Step 130: identify the picture content attribute of a target picture cluster with the trained screener model, that is, identify whether the pictures in the target picture cluster are advertising pictures. This makes it possible to filter or otherwise process advertising pictures so that they do not affect the user experience. Suppose the target picture cluster is a group of pictures matching a picture search request; with the technical solution of this embodiment, the advertising pictures can be identified among them and filtered out, so that non-advertising pictures are returned to the user as search results and the user experience is preserved.
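The text names the open-source LIBSVM as one possible training tool. As a minimal sketch, assuming each cluster is reduced to a single feature (its relative reprint count) and labeled +1 for advertising or -1 otherwise, training samples could be written in LIBSVM's sparse text format (`label index:value`, indices starting at 1). The function name, the label convention, and the sample values are hypothetical and not taken from the patent.

```python
def to_libsvm_line(label, features):
    """Serialize one training sample in LIBSVM's sparse text format.

    label: +1 for an advertising cluster, -1 otherwise (assumed convention).
    features: list of numeric feature values; feature indices start at 1.
    """
    parts = [f"{label:+d}"]
    for index, value in enumerate(features, start=1):
        parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# Hypothetical labeled clusters: (label, relative reprint count).
samples = [(+1, 6.0), (-1, 0.4)]
lines = [to_libsvm_line(label, [count]) for label, count in samples]
```

Lines in this form can be saved to a file and passed to LIBSVM's training tool; the choice of kernel and parameters is left open here, as it is in the text.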
In practical applications, other features are considered together with the relative reprint count proposed by the present invention, for example the length/width of the picture, the size of the picture, the definition of the picture, whether the picture link is on the same site as the web page, and whether the picture's jump link leads off-site. When the screener is trained, it can first learn from the relative reprint count of each homologous picture cluster together with one or more of these features of the pictures in the cluster. When a target picture cluster is identified, one or more of these features can likewise be combined with the relative reprint count to screen the cluster and identify whether its pictures are advertising pictures.
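One way to picture the combined input is as a fixed-order feature vector per cluster. The sketch below is an illustration under assumptions: the class name, field names, units, and the ordering of features are all hypothetical, and the patent does not prescribe any particular encoding.

```python
from dataclasses import dataclass


@dataclass
class ClusterFeatures:
    relative_reprint_count: float
    width: int               # picture width in pixels (assumed unit)
    height: int              # picture height in pixels (assumed unit)
    size_bytes: int          # picture file size (assumed unit)
    sharpness: float         # any definition/clarity score; scale is assumed
    same_site_link: bool     # picture link is on the same site as the page
    offsite_redirect: bool   # the picture's jump link leads off-site

    def to_vector(self):
        """Flatten the features into a numeric list in a fixed order."""
        return [
            self.relative_reprint_count,
            float(self.width),
            float(self.height),
            float(self.size_bytes),
            self.sharpness,
            1.0 if self.same_site_link else 0.0,
            1.0 if self.offsite_redirect else 0.0,
        ]
```

Keeping the order fixed matters because the same positions must carry the same meaning at training time and at identification time.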
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, step 110 of this embodiment may include: for one of the multiple homologous picture clusters, comparing the reprint count of the cluster's pictures on the specific resource site (for example, 30 reprints on picture site A) with their reprint count on multiple resource sites (for example, 35 reprints in total on 10 picture sites, picture site A included) to obtain the cluster's relative reprint count with respect to the specific resource site. The multiple resource sites include the specific resource site. This embodiment gives one feasible way to calculate the relative reprint count and does not limit the specific manner of comparison; for example, either 30/35 or 30/(35-30) can be taken as the relative reprint count.
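The two example ratios can be sketched as a single helper. The function and parameter names are hypothetical; the text leaves the manner of comparison open, so this only reproduces the two variants it mentions.

```python
def relative_reprint_ratio(on_site, total, exclude_site_from_total=False):
    """Compare on-site reprints with reprints across all sites.

    on_site: reprints of the cluster's pictures on the specific site.
    total: reprints on all sites, the specific site included.
    exclude_site_from_total: if True, divide by off-site reprints only.
    """
    denominator = total - on_site if exclude_site_from_total else total
    if denominator <= 0:
        raise ValueError("no reprints to compare against")
    return on_site / denominator


share_of_all = relative_reprint_ratio(30, 35)              # 30/35
on_vs_off = relative_reprint_ratio(30, 35, True)           # 30/(35-30)
```

The second variant grows without bound as off-site reprints approach zero, which is why the sketch rejects a zero denominator instead of returning a value.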
As shown in Fig. 2, another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, step 110 of this embodiment includes the following steps.
Step 111: calculate the first average reprint count of the pictures on the specific resource site; suppose, for example, that the first average reprint count of picture site A is 5.
Step 112: calculate the second average reprint count of the pictures on the multiple resource sites; suppose, for example, that the second average reprint count of the 10 picture sites (picture site A included) is 20.
Step 113: take the first difference between the reprint count of the cluster's pictures on the specific resource site and the first average reprint count. The first difference reflects how the reprinting of the cluster's pictures on the specific resource site differs from that of other pictures; the larger the difference, the more likely the cluster consists of advertising pictures. With the figures from the preceding embodiment, the first difference is 30-5=25. Also take the second difference between the reprint count of the cluster's pictures on the multiple resource sites and the second average reprint count. The second difference reflects how the reprinting of the cluster's pictures on the multiple resource sites differs from that of other pictures; the larger this difference, the less likely the cluster consists of advertising pictures. With the figures from the preceding embodiment, the second difference is 35-20=15. Compare the first difference with the second difference to obtain the cluster's relative reprint count with respect to the specific resource site.
This embodiment gives another way to calculate the relative reprint count. It takes into account how the reprinting of the cluster's pictures differs from that of other pictures, so that the relative reprint count better reflects whether a picture is an advertising picture. This embodiment does not limit how the first and second differences are compared; for example, either 25/15 or (25±a)/(15±b) is possible, where a and b are constants.
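The difference-based calculation can be sketched as follows, using the figures from the example (30, 5, 35, 20). The function name is hypothetical, and the constants a and b are modeled as signed offsets so that one signature covers both the plus and minus cases.

```python
def relative_reprint_from_differences(on_site, site_average, total,
                                      overall_average, a=0.0, b=0.0):
    """Ratio of the two differences, optionally shifted by constants.

    first difference  = on_site - site_average      (e.g. 30 - 5 = 25)
    second difference = total - overall_average     (e.g. 35 - 20 = 15)
    a, b: signed constant offsets; both default to 0.
    """
    first_difference = on_site - site_average
    second_difference = total - overall_average
    denominator = second_difference + b
    if denominator == 0:
        raise ValueError("second difference plus b must be non-zero")
    return (first_difference + a) / denominator


plain = relative_reprint_from_differences(30, 5, 35, 20)   # 25/15
```

Because the second difference can be zero or negative for clusters reprinted less than average, a real implementation would need a policy for those cases; the text does not specify one.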
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, step 111 of this embodiment includes: taking, from the pictures of the multiple homologous picture clusters, the pictures located on the specific resource site, and comparing the number of those pictures with the number of homologous picture clusters to which they belong to obtain the first average reprint count. For example, if picture site A holds 100 pictures and these 100 pictures belong to 20 picture clusters, the first average reprint count is 100/20=5. The technical solution of this embodiment provides a fast and efficient way to obtain the average reprint count.
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, step 112 of this embodiment includes: comparing the number of pictures in the multiple homologous picture clusters with the number of homologous picture clusters to obtain the second average reprint count. For example, if the 10 picture sites (picture site A included) hold 1000 pictures and these 1000 pictures can be clustered into 50 picture clusters, the second average reprint count is 1000/50=20. The technical solution of this embodiment provides a fast and efficient way to obtain the average reprint count.
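Both averages are a picture count divided by a cluster count, so one helper can cover them. This is a sketch under assumptions: the record layout (one `(cluster_id, site)` pair per picture occurrence) and the function name are hypothetical.

```python
def average_reprint_count(records, site=None):
    """Average number of pictures per homologous cluster.

    records: iterable of (cluster_id, site_name), one per picture occurrence.
    site: if given, only pictures on that site are counted (first average);
          if None, pictures on all sites are counted (second average).
    """
    selected = [(c, s) for c, s in records if site is None or s == site]
    clusters = {c for c, _ in selected}
    if not clusters:
        raise ValueError("no pictures to average over")
    return len(selected) / len(clusters)
```

With the figures from the text, 100 pictures on site A spread over 20 clusters give 100/20=5, and 1000 pictures on all sites spread over 50 clusters give 1000/50=20.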
As shown in Fig. 3, another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, before step 110 the method of this embodiment further includes the following steps.
Step 101: crawl the picture links (URLs) that appear on the multiple resource sites.
Step 102: detect whether a picture link is identical to the link of a picture in the homologous picture cluster, which reflects whether one picture has been reprinted under different URLs; and/or detect whether the check information (including but not limited to an MD5 value) of the picture behind the picture link is identical to the check information of a picture in the homologous picture cluster, which reflects whether multiple identical pictures exist; and/or detect whether the picture behind the picture link shares one or more identical image features with a picture in the homologous picture cluster, which reflects whether several pictures are identical or were obtained by modifying the same picture. Image features in this embodiment include, but are not limited to, contour features, color features, and histogram features.
Step 103: according to the detection result, determine whether the picture link is a reprint of a picture in the homologous picture cluster, and count the reprints of the cluster's pictures. This embodiment thus provides a technical solution that can count picture reprints comprehensively.
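The three checks in step 102 and the counting in step 103 can be sketched as below. This is an illustration only: the dictionary layout, the representation of image features as a set of tokens, and the `shared_features_needed` threshold are assumptions, and real contour, color, or histogram comparison would need an image library that this sketch does not include.

```python
import hashlib


def md5_hex(data):
    """MD5 digest of raw picture bytes, used here as the check information."""
    return hashlib.md5(data).hexdigest()


def is_reprint(candidate, cluster_pictures, shared_features_needed=1):
    """Return True if the candidate matches any picture in the cluster.

    candidate and each cluster picture are dicts with keys
    'url' (str), 'md5' (str) and 'features' (set of feature tokens).
    """
    for picture in cluster_pictures:
        if candidate["url"] == picture["url"]:
            return True  # same picture link
        if candidate["md5"] == picture["md5"]:
            return True  # byte-identical picture
        shared = candidate["features"] & picture["features"]
        if len(shared) >= shared_features_needed:
            return True  # shares enough image features
    return False


def count_reprints(candidates, cluster_pictures):
    """Count crawled picture links that are reprints of the cluster."""
    return sum(1 for c in candidates if is_reprint(c, cluster_pictures))
```

Any one of the three checks is enough here, mirroring the "and/or" wording of the text.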
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, in this embodiment the specific resource site is the resource site that has reprinted the pictures of each of the multiple homologous picture clusters the most times. The site that reprints a picture the most is likely the site on which the merchant behind the advertising picture propagates it, so the reprint count for this site most effectively reflects whether the picture is an advertising picture.
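Choosing the site with the most reprints is a simple frequency count. A minimal sketch, assuming the detected reprints of one cluster are available as a list of site names (the function name and input layout are hypothetical):

```python
from collections import Counter


def most_reprinting_site(reprint_sites):
    """Return the site that hosts the most reprints of a cluster.

    reprint_sites: iterable of site names, one entry per detected reprint.
    """
    counts = Counter(reprint_sites)
    if not counts:
        raise ValueError("no reprints recorded")
    site, _ = counts.most_common(1)[0]
    return site
```

How ties between sites should be broken is not addressed in the text; the sketch simply takes whichever site `Counter.most_common` lists first.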
Another embodiment of the present invention proposes a picture content attribute identification method. Compared with the above embodiment, in this embodiment the pictures of each homologous picture cluster correspond to the same source picture, and each picture of the cluster shares one or more identical image features with that source picture. In the technical solution of this embodiment, the pictures of each homologous picture cluster are therefore either identical or obtainable by modifying the same picture. Image features in this embodiment include, but are not limited to, contour features, color features, and histogram features.
As shown in Fig. 4, an embodiment of the present invention provides a picture content attribute identification system, which includes the following modules.
A relative reprint count calculation module 210, for calculating the relative reprint count of each of multiple homologous picture clusters with respect to a specific resource site. Each picture cluster is an aggregation of a group of pictures, for example a group of pictures with high mutual similarity. The relative reprint count is data that reflects the ratio between the reprints of a cluster's pictures inside the specific resource site and their reprints outside it. There are many ways to calculate the relative reprint count, and this embodiment does not limit the calculation.
A training module 220, for feeding the multiple homologous picture clusters and their corresponding relative reprint counts into a screener to train a screener model. Research on advertising pictures shows that they have the following characteristics. Advertising pictures are costly to produce; merchants spend money and time making many of them. Because the production cost is high, a merchant propagates one advertising picture many times, but essentially only the merchant propagates it, and other users essentially do not. This difference in propagation ultimately shows up in the reprint counts on resource sites: the picture is reprinted very many times on a specific resource site (the merchant spreads it deliberately), while it is reprinted far fewer times on other sites on the Internet (other users do not spread it). In other words, the ratio of on-site to off-site reprints for the specific resource site is high for advertising pictures, so the relative reprint count can serve as data for distinguishing advertising pictures from non-advertising pictures.
A screener 230, adapted to obtain the trained screener model from the training module and to screen a target picture cluster according to the model. The screener used in this embodiment includes, but is not limited to, the open-source LIBSVM.
An identification module 240, for identifying the picture content attribute of the target picture cluster according to the screener's screening of that cluster, that is, identifying whether the pictures in the target picture cluster are advertising pictures.
In addition, in practical applications the system further includes a picture format feature module 310 and/or a picture link feature module 320. The picture format feature module 310 is adapted to extract the format features of the pictures contained in the homologous picture clusters and in the target picture cluster. The picture link feature module 320 is adapted to extract the link features of the pictures contained in the homologous picture clusters and in the target picture cluster. The training module 220 is further adapted to feed the multiple homologous picture clusters, their corresponding relative reprint counts, and the corresponding picture format features and/or picture link features into the screener together to train the screener model. The screener 230 is further adapted to screen the target picture cluster according to the trained model, combining the target picture cluster's relative reprint count with its picture format features and/or picture link features. The identification module 240 is further used to identify the picture content attribute of the target picture cluster according to that screening.
This makes it easier to process advertising pictures, for example by filtering them, so that they do not degrade the user experience. Suppose the target picture cluster is a group of pictures returned for a picture search request. With the technical solution of this embodiment, the advertising pictures can be identified and filtered out, and only the non-advertising pictures are provided to the user as search results, thereby safeguarding the user experience.
In practical applications, besides the relative reprint count proposed by the present invention, other features may also be considered, such as the length/width of the picture, the size of the picture, the sharpness of the picture, whether the picture link is on the same site as the web page, or whether the link that the picture redirects to is off-site. These features are likewise first learned and trained by the classifier. When a target picture cluster is identified, one or more of these other features may also be taken into account to screen the cluster and identify whether its pictures are advertising pictures.
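A minimal sketch of how such features might be assembled into one vector for the classifier is given below. The feature order, the function names, and the rule that "same site" means an identical host name are all assumptions for illustration; the embodiment does not prescribe them, and sharpness is omitted because no measure for it is specified.

```python
from urllib.parse import urlparse


def same_site(url_a, url_b):
    """Assumed rule: two URLs are on the same site if their host names match."""
    return urlparse(url_a).hostname == urlparse(url_b).hostname


def build_features(relative_reprint_count, width, height, size_bytes,
                   image_url, page_url, jump_url):
    """Return a numeric feature vector for one picture (height must be > 0)."""
    return [
        relative_reprint_count,                           # proposed feature
        width / height,                                   # length/width of the picture
        float(size_bytes),                                # size of the picture
        1.0 if same_site(image_url, page_url) else 0.0,   # picture link on the page's site?
        0.0 if same_site(jump_url, page_url) else 1.0,    # redirect link off-site?
    ]


# Hypothetical picture hosted on another host and redirecting off-site.
features = build_features(30 / 35, 600, 300, 20480,
                          "http://img.example.com/a.jpg",
                          "http://www.example.com/page",
                          "http://shop.example.net/item")
```

Comparing host names is the simplest possible test; a production system would more likely compare registered domains, so that an image host and a page host belonging to the same operator count as one site.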
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the relative reprint count computing module 210, for one homologous picture cluster among the multiple homologous picture clusters, compares the reprint count of the pictures in that cluster on the specific resource website (for example, 30 reprints on picture site A) with the reprint count on multiple resource websites (for example, 35 reprints in total on 10 picture sites, including picture site A), and thereby obtains the relative reprint count of the homologous picture cluster for the specific resource website. The multiple resource websites include the specific resource website. This embodiment provides a feasible way of computing the relative reprint count and does not limit the specific manner of comparison; for example, either 30/35 or 30/(35-30) may be taken as the relative reprint count.
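The two example comparisons can be written out as a short sketch. The function name and the `mode` parameter are hypothetical, and returning infinity when no reprints exist outside the specific site is an assumption, since the embodiment leaves that case open.

```python
def relative_reprint_count(on_site, on_all_sites, mode="share"):
    """Compare reprints on the specific site with reprints on all sites.

    on_site:      reprints of the cluster's pictures on the specific site
    on_all_sites: reprints on all sites, the specific site included
    mode:         "share"             -> on_site / on_all_sites
                  "inside_vs_outside" -> on_site / (on_all_sites - on_site)
    """
    if mode == "share":
        return on_site / on_all_sites
    if mode == "inside_vs_outside":
        outside = on_all_sites - on_site
        # Assumption: treat "no reprints outside the site" as infinity.
        return float("inf") if outside == 0 else on_site / outside
    raise ValueError(f"unknown mode: {mode}")


share = relative_reprint_count(30, 35)                          # 30/35
ratio = relative_reprint_count(30, 35, "inside_vs_outside")     # 30/(35-30)
```

Both variants grow as reprints concentrate on the specific site; the first stays within 0 to 1, while the second is unbounded.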
As shown in Fig. 5, another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, the system of this embodiment further includes: a first average reprint count computing module 250, configured to compute a first average reprint count of the pictures on the specific resource website (assume, for example, that the first average reprint count of picture site A is 5); and a second average reprint count computing module 260, configured to compute a second average reprint count of the pictures on the multiple resource websites (assume, for example, that the second average reprint count of the 10 picture sites, including picture site A, is 20). The relative reprint count computing module 210 takes a first difference between the reprint count of the pictures in the homologous picture cluster on the specific resource website and the first average reprint count. The first difference reflects how the reprinting of the pictures of this homologous picture cluster on the specific resource website differs from that of other pictures; the larger the difference, the more likely the pictures of the homologous picture cluster are advertising pictures. Following the preceding example, the first difference is 30-5=25. The module also takes a second difference between the reprint count of the pictures in the homologous picture cluster on the multiple resource websites and the second average reprint count. The second difference reflects how the reprinting of these pictures on the multiple resource websites differs from that of other pictures; the larger the difference, the less likely the pictures of the homologous picture cluster are advertising pictures. Following the preceding example, the second difference is 35-20=15. Comparing the first difference with the second difference yields the relative reprint count of the homologous picture cluster for the specific resource website. This embodiment provides another way of computing the relative reprint count; because it takes into account how the reprinting of the cluster's pictures differs from that of other pictures, the relative reprint count better reflects whether a picture is an advertising picture. This embodiment does not limit the way in which the first difference and the second difference are compared; for example, either 25/15 or (25±a)/(15±b) may be taken, where a and b are constants.
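The difference-based variant can be sketched in the same way. The function name is hypothetical, and the constants a and b are modeled as signed offsets (so that a negative value covers the "minus" case); this is one reading of (25±a)/(15±b), not the only possible one.

```python
def relative_reprint_from_differences(on_site, on_all_sites,
                                      first_avg, second_avg,
                                      a=0.0, b=0.0):
    """Contrast the two differences as (first_diff + a) / (second_diff + b).

    a and b are signed constants; the caller must make sure that
    second_diff + b is not zero.
    """
    first_diff = on_site - first_avg          # e.g. 30 - 5  = 25
    second_diff = on_all_sites - second_avg   # e.g. 35 - 20 = 15
    return (first_diff + a) / (second_diff + b)


plain = relative_reprint_from_differences(30, 35, 5, 20)             # 25/15
shifted = relative_reprint_from_differences(30, 35, 5, 20, a=1, b=1)  # 26/16
```

A non-zero b is also a simple guard against division by zero when a cluster's reprint count on the multiple sites happens to equal the second average.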
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the first average reprint count computing module 250 takes, from the pictures of the multiple homologous picture clusters, the pictures located on the specific resource website, and compares the number of these pictures with the number of homologous picture clusters to which they belong to obtain the first average reprint count. For example, if picture site A holds 100 pictures and these 100 pictures fall into 20 picture clusters, the first average reprint count is 100/20=5. The technical solution of this embodiment provides a fast and efficient way of obtaining the average reprint count.
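As a sketch of this computation, assume each crawled picture is stored as a `(picture_id, cluster_id, site)` record; that record layout and the function name are hypothetical.

```python
def first_average_reprint_count(records, site):
    """Pictures on `site` divided by the number of clusters they belong to."""
    on_site = [(pic, cluster) for pic, cluster, s in records if s == site]
    clusters = {cluster for _, cluster in on_site}
    if not clusters:
        return 0.0  # assumption: no pictures on the site means an average of 0
    return len(on_site) / len(clusters)


# 100 pictures on site "A" spread over 20 clusters, plus one picture elsewhere.
records = [(f"pic{i}", i % 20, "A") for i in range(100)]
records.append(("other", 99, "B"))
first_avg = first_average_reprint_count(records, "A")
```

Only pictures on the specific site contribute, so the record for site "B" is ignored and the result matches the 100/20 example.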
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the second average reprint count computing module 260 compares the number of pictures of the multiple homologous picture clusters with the number of homologous picture clusters to obtain the second average reprint count. For example, if the 10 picture sites (including picture site A) hold 1000 pictures and these 1000 pictures cluster into 50 picture clusters, the second average reprint count is 1000/50=20. The technical solution of this embodiment provides a fast and efficient way of obtaining the average reprint count.
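A corresponding sketch for the average over all sites, assuming the clustering result is available as a mapping from a cluster identifier to the list of pictures in that cluster (a hypothetical layout):

```python
def second_average_reprint_count(clusters):
    """Total number of pictures divided by the number of clusters."""
    if not clusters:
        return 0.0  # assumption: an empty clustering yields an average of 0
    total_pictures = sum(len(pictures) for pictures in clusters.values())
    return total_pictures / len(clusters)


# 50 clusters of 20 pictures each, i.e. 1000 pictures across all sites.
clusters = {c: [f"pic{c}_{i}" for i in range(20)] for c in range(50)}
second_avg = second_average_reprint_count(clusters)
```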
As shown in Fig. 6, another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, the system of this embodiment further includes: a picture link capture module 270, configured to capture the picture links (URLs) that appear on the multiple resource websites; a picture link detection module 280, configured to detect whether a picture link is identical to the link of a picture of the homologous picture cluster, which reflects whether the same picture is reprinted under different URLs, and/or to detect whether the check information (including but not limited to an MD5 value) of the picture corresponding to the picture link is identical to that of a picture of the homologous picture cluster, which reflects whether multiple identical pictures exist, and/or to detect whether the picture corresponding to the picture link and a picture of the homologous picture cluster share one or more identical image features, which reflects whether several pictures are identical or were obtained by modifying the same picture, where the image features in this embodiment include but are not limited to contour features, color features and histogram features; and a picture reprint count statistics module 290, configured to determine, according to the detection results, whether the picture link is a reprint of a picture of the homologous picture cluster, and to count the reprints of the pictures of the homologous picture cluster. This embodiment thus provides a technical solution for counting picture reprints comprehensively.
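The link and check-information tests can be sketched with Python's standard `hashlib` module. The function names and sample data are hypothetical, and the image-feature comparison (contours, colors, histograms) is deliberately left out because the embodiment does not fix a particular feature extractor.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Check information of a picture: the MD5 digest of its bytes."""
    return hashlib.md5(data).hexdigest()


def is_reprint(link_url, picture_bytes, cluster_urls, cluster_md5s):
    """A captured link counts as a reprint if its URL or its MD5 value
    matches a picture already in the homologous picture cluster."""
    if link_url in cluster_urls:
        return True
    return md5_hex(picture_bytes) in cluster_md5s


def count_reprints(captured, cluster_urls, cluster_md5s):
    """captured: iterable of (url, picture_bytes) pairs from the crawl."""
    return sum(is_reprint(url, data, cluster_urls, cluster_md5s)
               for url, data in captured)


cluster_urls = {"http://a.example/ad.jpg"}
cluster_md5s = {md5_hex(b"ad-bytes")}
captured = [
    ("http://a.example/ad.jpg", b"ad-bytes"),    # same URL
    ("http://b.example/copy.jpg", b"ad-bytes"),  # different URL, same content
    ("http://c.example/cat.jpg", b"cat-bytes"),  # unrelated picture
]
reprints = count_reprints(captured, cluster_urls, cluster_md5s)
```

An MD5 match only catches byte-identical copies; a picture that was resized or re-encoded would need the image-feature comparison described above to be recognized as a reprint.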
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the specific resource website is the resource website that has reprinted the pictures of each homologous picture cluster the most times. The website that reprints a picture most often may well be the site through which the merchant behind an advertising picture distributes it, so the reprint count on this website most effectively reflects whether the picture is an advertising picture.
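Selecting that website amounts to finding the most frequent site among a cluster's reprints. The sketch below assumes the reprints of one cluster are available as a non-empty list of site names, one entry per reprint; the function name and data are hypothetical.

```python
from collections import Counter


def pick_specific_site(reprint_sites):
    """Return (site, reprint_count) for the site with the most reprints.

    reprint_sites must be non-empty; ties are resolved in favor of the
    site that appears first in the list.
    """
    site, count = Counter(reprint_sites).most_common(1)[0]
    return site, count


# 35 reprints in total, 30 of them on picture site "A".
sites = ["A"] * 30 + ["B"] * 3 + ["C"] * 2
specific_site, on_site = pick_specific_site(sites)
```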
Another embodiment of the present invention proposes a picture content attribute identification system. Compared with the above embodiment, in the system of this embodiment the pictures of each homologous picture cluster correspond to the same source picture, and each picture of the cluster shares one or more identical image features with that source picture. In the technical solution of this embodiment, the pictures of each homologous picture cluster are therefore either identical or obtainable by modifying the same picture. The image features in this embodiment include but are not limited to contour features, color features and histogram features.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. It should be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the picture content attribute identification system according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order. These words may be interpreted as names.
Claims (16)
1. A picture content attribute identification method, comprising:
calculating relative reprint counts of multiple homologous picture clusters for a specific resource website, the relative reprint count being data that reflects the ratio between reprints of the pictures of a homologous picture cluster inside and outside the specific resource website;
training a screener model according to the multiple homologous picture clusters and the corresponding relative reprint counts; and
identifying a picture content attribute in a target picture cluster according to the trained screener model, the picture content attribute being whether a picture is an advertising picture.
2. The picture content attribute identification method according to claim 1, wherein the step of calculating the relative reprint counts of the multiple homologous picture clusters for the specific resource website comprises:
for one homologous picture cluster among the multiple homologous picture clusters, comparing the reprint count of the pictures in the homologous picture cluster on the specific resource website with the reprint count on multiple resource websites to obtain the relative reprint count of the homologous picture cluster for the specific resource website, the multiple resource websites including the specific resource website.
3. The picture content attribute identification method according to claim 2, wherein the step of comparing the reprint count of the pictures in the homologous picture cluster on the specific resource website with the reprint count on the multiple resource websites comprises:
calculating a first average reprint count of the pictures on the specific resource website;
calculating a second average reprint count of the pictures on the multiple resource websites; and
taking a first difference between the reprint count of the pictures in the homologous picture cluster on the specific resource website and the first average reprint count, taking a second difference between the reprint count of the pictures in the homologous picture cluster on the multiple resource websites and the second average reprint count, and comparing the first difference with the second difference to obtain the relative reprint count of the homologous picture cluster for the specific resource website.
4. The picture content attribute identification method according to claim 3, wherein the step of calculating the first average reprint count of the pictures on the specific resource website comprises:
taking, from the pictures of the multiple homologous picture clusters, multiple pictures located on the specific resource website, and comparing the number of the multiple pictures with the number of homologous picture clusters corresponding to the multiple pictures to obtain the first average reprint count.
5. The picture content attribute identification method according to claim 3, wherein the step of calculating the second average reprint count of the pictures on the multiple resource websites comprises:
comparing the number of pictures of the multiple homologous picture clusters with the number of the multiple homologous picture clusters to obtain the second average reprint count.
6. The picture content attribute identification method according to claim 2, wherein, before the step of comparing the reprint count of the pictures in the homologous picture cluster on the specific resource website with the reprint count on the multiple resource websites, the method further comprises:
capturing the picture links that appear on the multiple resource websites;
detecting whether a picture link is identical to the link corresponding to a picture of the homologous picture cluster, and/or detecting whether the check information of the picture corresponding to the picture link is identical to the check information of a picture of the homologous picture cluster, and/or detecting whether the picture corresponding to the picture link and a picture of the homologous picture cluster have one or more identical image features; and
determining, according to the detection results, whether the picture link is a reprint of a picture of the homologous picture cluster, and counting the reprints of the pictures of the homologous picture cluster.
7. The picture content attribute identification method according to claim 2, wherein
the specific resource website is the resource website that has reprinted the pictures of each homologous picture cluster among the multiple homologous picture clusters the most times.
8. The picture content attribute identification method according to any one of claims 1 to 7, wherein
the pictures of each homologous picture cluster correspond to the same source picture, and the pictures of each homologous picture cluster have one or more image features identical to those of the corresponding source picture.
9. A picture content attribute identification system, comprising:
a relative reprint count computing module, configured to calculate relative reprint counts of multiple homologous picture clusters for a specific resource website, the relative reprint count being data that reflects the ratio between reprints of the pictures of a homologous picture cluster inside and outside the specific resource website;
a training module, configured to input the multiple homologous picture clusters and the corresponding relative reprint counts into a screener to train a screener model;
the screener, adapted to obtain the screener model trained by the training module and to screen a target picture cluster according to the model; and
an identification module, configured to screen the target picture cluster using the screener and to identify a picture content attribute in the target picture cluster, the picture content attribute being whether a picture is an advertising picture.
10. The picture content attribute identification system according to claim 9, wherein
the relative reprint count computing module, for one homologous picture cluster among the multiple homologous picture clusters, compares the reprint count of the pictures in the homologous picture cluster on the specific resource website with the reprint count on multiple resource websites to obtain the relative reprint count of the homologous picture cluster for the specific resource website, the multiple resource websites including the specific resource website.
11. The picture content attribute identification system according to claim 10, further comprising:
a first average reprint count computing module, configured to calculate a first average reprint count of the pictures on the specific resource website; and
a second average reprint count computing module, configured to calculate a second average reprint count of the pictures on the multiple resource websites;
wherein the relative reprint count computing module takes a first difference between the reprint count of the pictures in the homologous picture cluster on the specific resource website and the first average reprint count, takes a second difference between the reprint count of the pictures in the homologous picture cluster on the multiple resource websites and the second average reprint count, and compares the first difference with the second difference to obtain the relative reprint count of the homologous picture cluster for the specific resource website.
12. The picture content attribute identification system according to claim 11, wherein
the first average reprint count computing module takes, from the pictures of the multiple homologous picture clusters, multiple pictures located on the specific resource website, and compares the number of the multiple pictures with the number of homologous picture clusters corresponding to the multiple pictures to obtain the first average reprint count.
13. The picture content attribute identification system according to claim 11, wherein
the second average reprint count computing module compares the number of pictures of the multiple homologous picture clusters with the number of the multiple homologous picture clusters to obtain the second average reprint count.
14. The picture content attribute identification system according to claim 10, further comprising:
a picture link capture module, configured to capture the picture links that appear on the multiple resource websites;
a picture link detection module, configured to detect whether a picture link is identical to the link corresponding to a picture of the homologous picture cluster, and/or to detect whether the check information of the picture corresponding to the picture link is identical to the check information of a picture of the homologous picture cluster, and/or to detect whether the picture corresponding to the picture link and a picture of the homologous picture cluster have one or more identical image features; and
a picture reprint count statistics module, configured to determine, according to the detection results, whether the picture link is a reprint of a picture of the homologous picture cluster, and to count the reprints of the pictures of the homologous picture cluster.
15. The picture content attribute identification system according to claim 10, wherein
the specific resource website is the resource website that has reprinted the pictures of each homologous picture cluster among the multiple homologous picture clusters the most times.
16. The picture content attribute identification system according to any one of claims 9 to 15, wherein
the pictures of each homologous picture cluster correspond to the same source picture, and the pictures of each homologous picture cluster have one or more image features identical to those of the corresponding source picture.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310632676.8A CN103617262B (en) | 2013-12-02 | 2013-12-02 | Picture content attribute identification method and system |
PCT/CN2014/087109 WO2015081748A1 (en) | 2013-12-02 | 2014-09-22 | Method and system for identifying content attribute of picture |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310632676.8A CN103617262B (en) | 2013-12-02 | 2013-12-02 | Picture content attribute identification method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103617262A CN103617262A (en) | 2014-03-05 |
CN103617262B true CN103617262B (en) | 2017-03-08 |
Family
ID=50167965
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310632676.8A Active CN103617262B (en) | 2013-12-02 | 2013-12-02 | Picture content attribute identification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103617262B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015081748A1 (en) * | 2013-12-02 | 2015-06-11 | 北京奇虎科技有限公司 | Method and system for identifying content attribute of picture |
CN105022738A (en) * | 2014-04-21 | 2015-11-04 | 上海京知信息科技有限公司 | Extracting and mapping method of network picture format file on the basis of histograms |
CN103995857A (en) * | 2014-05-14 | 2014-08-20 | 北京奇虎科技有限公司 | Method and device for achieving image search and sorting |
CN106599177B (en) * | 2016-12-12 | 2020-02-14 | 国云科技股份有限公司 | Advertisement page shielding processing method |
CN107451180B (en) * | 2017-06-13 | 2021-02-19 | 百度在线网络技术(北京)有限公司 | Method, device, equipment and computer storage medium for identifying site homologous relation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5832119A (en) * | 1993-11-18 | 1998-11-03 | Digimarc Corporation | Methods for controlling systems using control signals embedded in empirical data |
CN101071433A (en) * | 2007-05-10 | 2007-11-14 | 腾讯科技(深圳)有限公司 | Picture download system and method |
CN102419777A (en) * | 2012-01-10 | 2012-04-18 | 凤凰在线(北京)信息技术有限公司 | System and method for filtering internet image advertisements |
CN102591983A (en) * | 2012-01-10 | 2012-07-18 | 凤凰在线(北京)信息技术有限公司 | Advertisement filter system and advertisement filter method |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5832119A (en) * | 1993-11-18 | 1998-11-03 | Digimarc Corporation | Methods for controlling systems using control signals embedded in empirical data |
US5832119C1 (en) * | 1993-11-18 | 2002-03-05 | Digimarc Corp | Methods for controlling systems using control signals embedded in empirical data |
CN101071433A (en) * | 2007-05-10 | 2007-11-14 | 腾讯科技(深圳)有限公司 | Picture download system and method |
CN102419777A (en) * | 2012-01-10 | 2012-04-18 | 凤凰在线(北京)信息技术有限公司 | System and method for filtering internet image advertisements |
CN102591983A (en) * | 2012-01-10 | 2012-07-18 | 凤凰在线(北京)信息技术有限公司 | Advertisement filter system and advertisement filter method |
Also Published As
Publication number | Publication date |
---|---|
CN103617262A (en) | 2014-03-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20220727 Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015 Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd. Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park) Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd. Patentee before: Qizhi software (Beijing) Co.,Ltd. |
|
TR01 | Transfer of patent right |