CN100490426C - Method and system for counteracting rubbish e-mails - Google Patents

Info

Publication number
CN100490426C
CN100490426C CNB2005100375200A CN200510037520A
Authority
CN
China
Prior art keywords
mail
spam
server
learning
clients
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2005100375200A
Other languages
Chinese (zh)
Other versions
CN1941746A (en)
Inventor
徐嘉键
李光
柯军严
冯晓勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CNB2005100375200A priority Critical patent/CN100490426C/en
Priority to PCT/CN2006/002546 priority patent/WO2007036152A1/en
Publication of CN1941746A publication Critical patent/CN1941746A/en
Application granted granted Critical
Publication of CN100490426C publication Critical patent/CN100490426C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking

Abstract

The method comprises: 1) a mail learning module learns the features of mail stored on the mail client and sends the learning result to the mail server; 2) the mail server merges the newly learned result with the mail client's original feature database to form a new feature database; 3) the spam filter uses the latest feature database to filter new mail that the mail server receives for the corresponding mail client.

Description

A method and system for countering spam
[Technical field]
The present invention relates to spam-handling technology in the field of e-mail, and in particular to a method and system for countering spam.
[Background art]
Global communication networks such as the Internet have created business opportunities for a great many consulting firms, sales companies, and the like. E-mail is increasingly popular with businesses as a channel for advertising and promotion. As a result, spam has become rampant on the network. It wastes network bandwidth, users' processing time, and system resources, annoys users beyond endurance, and has become a major obstacle to basic network applications.
The key technology for stopping spam is the filtering system. At present, content-based spam filtering mostly relies on keyword statistics. Bayesian filtering, from the original naive Bayesian filter through all of its later improved forms, is the most widely used keyword-based mail filtering approach and the one with the most evident effect. A Bayesian filter learns from a certain number of known spam and non-spam messages to generate a Bayesian learning database, then applies Bayes' formula together with that database to judge whether a message is spam. It is also capable of continuous self-learning.
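The learning and judging steps of a Bayesian filter can be illustrated with a minimal naive Bayes sketch. It is not taken from the patent: the tokenizer, the add-one smoothing, the dictionary layout of the learning database, and the names `learn` and `spam_probability` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into keyword tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def learn(spam_samples, ham_samples):
    """Build a learning database: per-class token counts and message counts."""
    db = {"spam": Counter(), "ham": Counter(), "n_spam": 0, "n_ham": 0}
    for text in spam_samples:
        db["spam"].update(tokenize(text))
        db["n_spam"] += 1
    for text in ham_samples:
        db["ham"].update(tokenize(text))
        db["n_ham"] += 1
    return db

def spam_probability(db, text):
    """Return P(spam | tokens) under the naive independence assumption."""
    vocab_size = len(set(db["spam"]) | set(db["ham"])) or 1
    total_spam = sum(db["spam"].values())
    total_ham = sum(db["ham"].values())
    n = db["n_spam"] + db["n_ham"]
    # Class priors, smoothed so that an empty database does not divide by zero.
    log_spam = math.log((db["n_spam"] + 1) / (n + 2))
    log_ham = math.log((db["n_ham"] + 1) / (n + 2))
    for tok in tokenize(text):
        # Add-one (Laplace) smoothing for tokens unseen in a class.
        log_spam += math.log((db["spam"][tok] + 1) / (total_spam + vocab_size))
        log_ham += math.log((db["ham"][tok] + 1) / (total_ham + vocab_size))
    # Turn the log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

A message would then be treated as spam when `spam_probability` exceeds some threshold such as 0.5. The threshold is also an assumption, since the text does not specify one.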
Spam filtering gateways (some dedicated to a single user, others serving many users, such as all users in several domains) can be classed as client-side or server-side according to how they are implemented. A mail client is a software product that helps a user receive, send, and manage mail locally. A mail server is responsible for receiving and sending mail for all users in a given domain.
Bayesian spam filtering at the mail client mainly works as follows. Spam samples are first obtained from the user's mailbox, and Bayesian learning is carried out at the mail client to generate a Bayesian learning database. After the mail client receives new mail from the mail server, it filters that mail with the learning database. Finally, mail judged to be spam is put in the trash and normal mail is put in the inbox. The drawback is that filtering can begin only after the mail has been downloaded to the user's machine, which consumes a large amount of network bandwidth and user system resources, lengthens the time needed to receive mail, and degrades the user experience.
Bayesian spam filtering at the mail server mainly works as follows. Mail samples are first drawn from the mail of all users in a given domain, and Bayesian learning is carried out to generate a single shared Bayesian learning database. That database is then used to filter all new mail arriving at the mail server. The drawback is that the criterion for whether a message is spam is not exactly the same for every user. For example, a particular advertisement may be exactly what a small fraction of users need, yet be spam to most users. For some users, a shared Bayesian database therefore lowers the spam recognition rate and raises the misjudgment rate for non-spam.
Moreover, filtering out as much spam as possible while keeping the misjudgment rate low requires continuous learning from known spam and non-spam, and in particular stronger learning from each user's own non-spam samples. This is because mistaking a non-spam message for spam has far more serious consequences than mistaking a spam message for non-spam. The prior art fails to achieve this.
[Summary of the invention]
The object of the present invention is to provide a method and system for countering spam that combines the mail client with the mail server, meets the needs of each individual user, and overcomes the defects of the prior art.
The technical solution adopted by the present invention to solve this technical problem is a method for countering spam, comprising the steps of:
101. the mail learning module performs feature learning on mail samples stored at the mail client and sends the latest learning result to the mail server;
102. the mail server integrates the latest learning result with the original feature database of that mail client to form the latest feature database;
103. the spam filter uses the latest feature database to filter new mail that the mail server receives for the corresponding mail client.
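Steps 101 to 103 can be wired together roughly as in the sketch below. It is only an outline under stated assumptions: a plain keyword-count comparison stands in for the feature learning, and the names `learn_features`, `MailServer`, `integrate`, and `is_spam` are hypothetical and do not appear in the patent.

```python
from collections import Counter

def learn_features(spam_samples, ham_samples):
    """Step 101 (client side): count keywords per class in the mail samples."""
    result = {"spam": Counter(), "ham": Counter()}
    for text in spam_samples:
        result["spam"].update(text.lower().split())
    for text in ham_samples:
        result["ham"].update(text.lower().split())
    return result

class MailServer:
    def __init__(self):
        self.feature_dbs = {}  # one feature database per user

    def integrate(self, user, result):
        """Step 102: merge the new learning result into the user's database."""
        db = self.feature_dbs.setdefault(
            user, {"spam": Counter(), "ham": Counter()})
        db["spam"] += result["spam"]
        db["ham"] += result["ham"]

    def is_spam(self, user, text):
        """Step 103: filter new mail with that user's latest database."""
        db = self.feature_dbs.get(user, {"spam": Counter(), "ham": Counter()})
        words = text.lower().split()
        spam_hits = sum(db["spam"][w] for w in words)
        ham_hits = sum(db["ham"][w] for w in words)
        # Placeholder decision rule; a Bayesian score would be used in practice.
        return spam_hits > ham_hits
```

Because the server keeps a separate entry in `feature_dbs` for every user, the same message can be judged differently for different recipients, which is the point of combining client-side learning with server-side filtering.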
The mail samples are spam and non-spam messages that the mail learning module selects after scanning, at times set by the mail management configuration, the mail stored at the mail client, or mail that the user selects manually.
Mail that has already been scanned is not scanned again.
In step 101, the mail samples are learned with the Bayesian method to generate the latest Bayesian learning database, and this learning database is sent to the mail server.
Step 103 further comprises: the mail server stores the spam it filters out in a spam recycle bin on the server side and stores normal mail in the inbox location of the corresponding user on the server side.
As one improvement of the present invention, the mail server also generates a list containing characteristic information about the spam and stores it in the inbox location of the corresponding mail client.
As another improvement of the present invention, the mail client stores the learning results of a recent period.
The present invention also provides an anti-spam system, comprising:
a mail learning module, arranged at the mail client, for selecting spam and non-spam messages as mail samples, performing feature learning on the mail samples to generate the latest learning database, and sending this learning database to the mail server;
a feature specification module, arranged at the mail server, for integrating the latest learning database with the original feature database of that mail client to form the user's latest feature database;
a spam filter, arranged at the mail server, which uses the user's latest feature database to filter new mail that the mail server receives for the corresponding mail client.
The system of the present invention further comprises a spam recycle bin, arranged at the mail server, for storing the spam that the mail server filters out.
The system of the present invention further comprises a learning database store, arranged at the mail client, for storing the learning results of a recent period.
The present invention filters spam through cooperation between the mail client and the mail server. Spam is filtered out at the server and never needs to be downloaded to the user's machine, which reduces network bandwidth usage, shortens the time needed to receive mail, and improves the user experience. The mail server holds, for each user, an up-to-date Bayesian learning database specific to that user, so the spam filtering rate can be raised while the misjudgment rate for non-spam is lowered. In addition, the whole learning and filtering process requires almost no user involvement, and each round of learning on the user's mail has very little impact on the user's system, so the user barely notices that the system is running and the burden on the user is reduced.
[Description of the drawings]
Fig. 1 is a schematic diagram of the system structure of an anti-spam system of the present invention.
Fig. 2 is a flowchart of an implementation of an anti-spam method of the present invention.
[Embodiments]
The present invention is further described below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the system of the present invention mainly comprises a mail learning module 11, a learning database store 12, a feature specification module 21, a spam filter 22, and a spam recycle bin 23.
The mail learning module 11 is arranged at the mail client 10. It performs feature learning on mail samples, generates the latest learning database, and sends this learning database to the mail server 20. The mail samples may be spam and non-spam messages that the mail learning module 11 selects automatically, using its internal algorithms and strategies, after scanning the mail stored at the mail client 10 at times set by the mail management configuration. They may also be spam or non-spam that the user selects manually. For example, the user selects a batch of messages in the mailbox and then clicks a button on the mail client 10 control panel, such as "Learn as good mail" or "Learn as spam", to have the system learn them. When the user selects samples manually, the mail learning module 11 is triggered immediately: it learns the mail samples and sends the learning result to the mail server 20. In the general case, the mail learning module 11 selects mail samples at the configured time (such as after system start-up each day), learns them, and sends the learning result to the mail server 20. To improve system efficiency and reduce wasted resources, mail that has already been scanned is not scanned again during automatic sample selection. The mail management configuration is generated automatically after the mail client 10 is installed, and the user can modify it. The administrator of the mail system can also use the system to update and distribute the management configuration centrally from the mail server 20, in which case no user involvement is needed and no extra burden is placed on the user.
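The rule that previously scanned mail is skipped during automatic sample selection could be sketched as follows. The patent does not describe how scanned mail is tracked, so the set of message IDs, the tuple layout of the mailbox, and the function name `select_new_samples` are assumptions; the spam or non-spam label of each message is taken as already known.

```python
def select_new_samples(mailbox, scanned_ids):
    """Return messages not scanned before and record them as scanned.

    mailbox: iterable of (message_id, label, text), label in {"spam", "ham"}.
    scanned_ids: set of message IDs seen in earlier scans (mutated in place).
    """
    samples = []
    for message_id, label, text in mailbox:
        if message_id in scanned_ids:
            continue  # skip mail that an earlier scan already handled
        scanned_ids.add(message_id)
        samples.append((label, text))
    return samples
```

Persisting `scanned_ids` between runs (for instance in a local file) would keep each scheduled scan proportional to the amount of new mail instead of the size of the whole mailbox.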
The learning database store 12 is arranged at the mail client 10 and stores the learning results of a recent period. Under normal conditions, the mail client 10 generates a new learning database every day and uploads it to the mail server 20. After receiving a new learning database, the mail server 20 replies with a response indicating whether reception succeeded. Occasionally, for reasons such as network problems, the mail server 20 fails to receive the learning database of some client, and the mail client 10 must then upload that database again. The learning database store 12 therefore serves mainly for fault tolerance.
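The fault-tolerance role described here amounts to keeping recent learning results until the server acknowledges them. The sketch below shows one way this might look. The class name, the `send` callback, and the limit of seven stored entries are hypothetical, since the text says only that results from "a recent period" are kept.

```python
from collections import deque

class LearningDatabaseStore:
    """Keeps recent learning results until the server acknowledges them."""

    def __init__(self, max_entries=7):
        # Bounded queue: when full, the oldest entry is dropped first.
        self.pending = deque(maxlen=max_entries)

    def add(self, day, learning_db):
        self.pending.append((day, learning_db))

    def upload_pending(self, send):
        """Try to upload every stored result; keep the ones that fail.

        send(day, learning_db) must return True when the server confirms
        receipt. Returns the number of results still waiting for upload.
        """
        still_pending = deque(maxlen=self.pending.maxlen)
        for day, learning_db in self.pending:
            try:
                ok = send(day, learning_db)
            except OSError:  # for example, a network failure
                ok = False
            if not ok:
                still_pending.append((day, learning_db))
        self.pending = still_pending
        return len(still_pending)
```

Calling `upload_pending` again on the next start-up re-sends whatever the server did not confirm, which matches the re-upload behaviour described above.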
The feature specification module 21 is arranged at the mail server 20. It receives the latest learning database sent by each client, integrates it with the original feature database of the corresponding mail client 10 to form the latest feature database of that mail client 10, and stores the result on the mail server 20.
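The patent does not say how the integration is performed. One plausible reading, assuming a learning database is a set of per-class token counts, is to add the new counts to the stored ones, as in this sketch. The names `integrate`, `receive_learning_db`, and `feature_dbs` are hypothetical.

```python
from collections import Counter

def integrate(original_db, latest_db):
    """Return a new feature database combining original and latest results.

    Each database maps a class name ("spam" or "ham") to a Counter of tokens.
    """
    merged = {}
    for label in ("spam", "ham"):
        merged[label] = Counter(original_db.get(label, {}))
        merged[label].update(latest_db.get(label, {}))  # adds the counts
    return merged

feature_dbs = {}  # client id -> latest feature database held on the server

def receive_learning_db(client_id, latest_db):
    """Merge an uploaded learning database into that client's feature database."""
    original = feature_dbs.get(client_id, {})
    feature_dbs[client_id] = integrate(original, latest_db)
    return feature_dbs[client_id]
```

Summing counts keeps everything learned earlier. A real system might instead decay or cap old counts so that the database follows changes in a user's mail, but that choice is outside what the text specifies.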
The spam filter 22 is arranged at the mail server 20. It uses the latest feature database of each mail client 10 to filter the new mail that the mail server 20 receives for that mail client 10.
The spam recycle bin 23 is arranged at the mail server 20 and stores the spam that the mail server 20 filters out. Typically, each mail client 10 has a directory (or folder) marked as its junk mailbox within the spam recycle bin 23 on the mail server 20. The user can view this spam through WebMail and retrieve any mail that the system misjudged.
The mail server 20 naturally also includes a mail client inbox location 24, which stores the normal mail passed by the spam filter 22 until the corresponding mail client collects its new mail from that location.
Practice has shown that Bayesian filtering is the most widely used keyword-based mail filtering approach and the one with the most evident effect. The preferred embodiment of the present invention therefore uses the Bayesian method for feature learning. As shown in Fig. 2, the method of the present invention is implemented as follows:
First, the user installs and runs the mail client 10 on a computer.
Step 110: each time the user starts the mail client 10, or at a fixed time set in the configuration, the mail learning module 11 scans the mail in the mailbox and selects mail samples automatically; alternatively, the user selects mail samples manually.
Step 120: the mail learning module 11 performs Bayesian learning on the selected mail samples.
Step 130: the latest Bayesian learning database is generated.
Step 140: the newly generated Bayesian learning database is uploaded to the mail server 20 and, at the same time, saved in the learning database store 12.
Step 150: the mail server 20 receives the latest Bayesian learning database sent by each mail client 10.
Step 160: the feature specification module 21 integrates each latest learning database with the original feature database of the corresponding mail client 10 to form the latest feature database of each mail client 10.
Step 170: the mail server 20 receives new mail and sorts it (determines which user each message belongs to).
Step 180: the spam filter 22 uses the latest feature database of each mail client 10 to filter the new mail that the mail server 20 receives for that mail client 10. Spam is stored in the spam recycle bin 23, and normal mail is stored in the inbox location 24 of the corresponding user on the server side. In this way, when the mail client 10 collects mail it downloads only non-spam and never downloads spam.
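The server-side routing in steps 170 and 180 might look like the sketch below. Each user's filter is represented by a plain function standing in for classification against that user's latest feature database. The class `ServerMailboxes` and its method names are hypothetical.

```python
from collections import defaultdict

class ServerMailboxes:
    def __init__(self, classifiers):
        # classifiers: user -> function(text) -> bool (True means spam),
        # standing in for filtering with that user's latest feature database.
        self.classifiers = classifiers
        self.recycle_bin = defaultdict(list)  # per-user spam folder (23)
        self.inbox = defaultdict(list)        # per-user inbox location (24)

    def deliver(self, recipient, message):
        """Steps 170-180: pick the recipient's filter, then store the message."""
        is_spam = self.classifiers.get(recipient, lambda text: False)
        if is_spam(message):
            self.recycle_bin[recipient].append(message)
        else:
            self.inbox[recipient].append(message)

    def collect(self, recipient):
        """What the mail client downloads: non-spam only; spam stays behind."""
        messages, self.inbox[recipient] = self.inbox[recipient], []
        return messages
```

With one classifier per recipient, an advertisement that one user treats as spam can still reach another user who wants it, and spam never leaves the server unless the user retrieves it through WebMail.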
As one improvement of the present invention, after storing spam in the spam recycle bin 23, the mail server 20 can automatically generate a list containing characteristic information about the spam, such as sender information and mail subject, and store it in the inbox location 24 of the corresponding mail client. When collecting mail, the user thus obtains information about every message that was classed as spam and, on finding a misjudged message, can retrieve it through WebMail. This also allows the filtering capability of the system to keep improving.
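Such a list could be built from the headers of the filtered messages, as in this sketch using Python's standard `email` package. It assumes that the characteristic information is the sender and the subject line and that the messages are available as raw RFC 5322 text. The function name `spam_summary` and the dictionary layout are hypothetical.

```python
from email import message_from_string

def spam_summary(raw_spam_messages):
    """Return one {"from", "subject"} entry per filtered spam message."""
    summary = []
    for raw in raw_spam_messages:
        msg = message_from_string(raw)  # parse headers and body
        summary.append({
            "from": msg.get("From", ""),       # empty string if header missing
            "subject": msg.get("Subject", ""),
        })
    return summary
```

The resulting list is small compared with the spam itself, so delivering it to the client costs little bandwidth while still letting the user spot misjudged mail.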
As another improvement of the present invention, the system internally applies a series of optimizations. For example, learning is carried out only when the system load is very low, and the CPU time used during learning is held at a low level. Each round of learning on the user's mail therefore has very little impact on the user's system, the user barely notices that the system is running, and the burden on the user is reduced.
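The patent does not detail these optimizations. A minimal sketch of load-aware learning, assuming a caller-supplied `get_load` function and a hypothetical threshold `max_load`, is shown below. On Unix-like systems `get_load` could, for instance, be derived from `os.getloadavg()`.

```python
import time

def learn_when_idle(samples, learn_one, get_load, max_load=0.2, pause=0.0):
    """Learn samples one by one, only while get_load() is at or below max_load.

    Returns the number of samples learned; stops early if the system is busy.
    """
    learned = 0
    for sample in samples:
        if get_load() > max_load:
            break  # leave the rest for a later run when the system is idle
        learn_one(sample)
        learned += 1
        time.sleep(pause)  # yield the CPU between samples to keep usage low
    return learned
```

Samples that were not learned in one run would simply be picked up by a later run, so the only cost of a busy system is a delay in updating the learning database.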
Statistics from the actual operation of some large mail systems show that when each user's mail is filtered with that user's own Bayesian learning database, the spam filtering rate exceeds 99.4% and the misjudgment rate for non-spam does not exceed 0.8%. When 100,000 users share a single Bayesian database, however, the spam filtering rate falls below 90% and the misjudgment rate for non-spam exceeds 7%. Giving each user a Bayesian database of their own thus filters markedly better than having many users share one Bayesian database.
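To make these percentages concrete, the short calculation below applies the quoted boundary rates to a hypothetical batch of 1,000 spam and 1,000 non-spam messages. The batch sizes and the function name `expected_counts` are assumptions. The rates are the bounds stated above, so the real figures would be at least as favourable for the per-user case and at least as unfavourable for the shared case.

```python
def expected_counts(n_spam, n_ham, filter_rate, misjudgment_rate):
    """Return (spam caught, spam missed, non-spam misjudged) for the given rates."""
    caught = n_spam * filter_rate
    return caught, n_spam - caught, n_ham * misjudgment_rate

# Boundary rates quoted in the text, applied to a hypothetical batch.
per_user = expected_counts(1000, 1000, 0.994, 0.008)
shared = expected_counts(1000, 1000, 0.90, 0.07)
```

At these rates the per-user database misjudges about 8 legitimate messages per 1,000 where the shared database misjudges about 70, and it lets through about 6 spam messages per 1,000 where the shared database lets through about 100.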
The present invention filters spam through cooperation between the mail client 10 and the mail server 20. Each user has a Bayesian database of their own, spam is no longer downloaded to the mail client 10, and learning from each user's non-spam samples is strengthened. This effectively reduces network bandwidth usage, shortens the time needed to receive mail, improves the user experience, and lowers the misjudgment rate for non-spam while raising the spam filtering rate.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims (10)

1. A method for countering spam, characterized by comprising the steps of:
101. a mail learning module (11) performs feature learning on mail samples stored at a mail client (10) and sends the latest learning result to a mail server (20);
102. the mail server (20) integrates the latest learning result with the original feature database of that mail client (10) to form the latest feature database;
103. a spam filter (22) uses the latest feature database to filter new mail that the mail server (20) receives for the corresponding mail client (10).
2. The method for countering spam according to claim 1, characterized in that: the mail samples are spam and non-spam messages that the mail learning module (11) selects after scanning, at times set by the mail management configuration, the mail stored at the mail client (10), or mail that the user selects manually.
3. The method for countering spam according to claim 2, characterized in that: mail that has already been scanned is not scanned again.
4. The method for countering spam according to claim 1, characterized in that: in step 101, the mail samples are learned with the Bayesian method to generate the latest Bayesian learning database, and this learning database is sent to the mail server (20).
5. The method for countering spam according to claim 1, characterized in that: step 103 further comprises the mail server (20) storing the spam it filters out in a spam recycle bin (23) on the server side and storing normal mail in the inbox location of the corresponding user on the server side.
6. The method for countering spam according to claim 5, characterized in that: the mail server (20) also generates a list containing characteristic information about the spam and stores it in the inbox location (24) of the corresponding mail client.
7. The method for countering spam according to claim 1, characterized in that: the mail client (10) stores the learning results of a recent period.
8. An anti-spam system, characterized by comprising:
a mail learning module (11), arranged at a mail client (10), for selecting spam and non-spam messages as mail samples, performing feature learning on the mail samples to generate the latest learning database, and sending this learning database to a mail server (20);
a feature specification module (21), arranged at the mail server (20), for integrating the latest learning database with the original feature database of that mail client (10) to form the user's latest feature database;
a spam filter (22), arranged at the mail server (20), which uses the user's latest feature database to filter new mail that the mail server (20) receives for the corresponding mail client (10).
9. The anti-spam system according to claim 8, characterized in that: it further comprises a spam recycle bin (23), arranged at the mail server (20), for storing the spam that the mail server (20) filters out.
10. The anti-spam system according to claim 8, characterized in that: it further comprises a learning database store (12), arranged at the mail client (10), for storing the learning results of a recent period.
CNB2005100375200A 2005-09-27 2005-09-27 Method and system for counteracting rubbish e-mails Active CN100490426C (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CNB2005100375200A CN100490426C (en) 2005-09-27 2005-09-27 Method and system for counteracting rubbish e-mails
PCT/CN2006/002546 WO2007036152A1 (en) 2005-09-27 2006-09-27 A system and method for filtering spam mail, and a mail client terminal and mail server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005100375200A CN100490426C (en) 2005-09-27 2005-09-27 Method and system for counteracting rubbish e-mails

Publications (2)

Publication Number Publication Date
CN1941746A CN1941746A (en) 2007-04-04
CN100490426C true CN100490426C (en) 2009-05-20

Family

ID=37899378

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100375200A Active CN100490426C (en) 2005-09-27 2005-09-27 Method and system for counteracting rubbish e-mails

Country Status (2)

Country Link
CN (1) CN100490426C (en)
WO (1) WO2007036152A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101150756B (en) * 2007-11-08 2010-05-19 电子科技大学 A spam filtering method
CN101594312B (en) * 2008-05-30 2012-12-26 电子科技大学 Method for recognizing junk mail based on artificial immunity and behavior characteristics
CN101388859B (en) * 2008-09-16 2010-09-01 王玉冰 System and method preventing junk mail
CN103188136B (en) * 2011-12-30 2016-04-27 盈世信息科技(北京)有限公司 A kind of filtrating mail information saving method, mail server and e-mail system
CN103684971B (en) * 2012-09-07 2017-02-08 盈世信息科技(北京)有限公司 Method and system for processing mails
CN103490979B (en) * 2013-09-03 2016-09-14 福建伊时代信息科技股份有限公司 electronic mail identification method and system
CN112433989A (en) * 2020-12-14 2021-03-02 国网辽宁省电力有限公司葫芦岛供电公司 System and method for automatically collecting e-mails based on web mode
CN115277612A (en) * 2022-08-03 2022-11-01 西安热工研究院有限公司 Junk mail detection and filtering method and system for intranet

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7219148B2 (en) * 2003-03-03 2007-05-15 Microsoft Corporation Feedback loop for spam prevention
US7320020B2 (en) * 2003-04-17 2008-01-15 The Go Daddy Group, Inc. Mail server probability spam filter
US7519668B2 (en) * 2003-06-20 2009-04-14 Microsoft Corporation Obfuscation of spam filter

Also Published As

Publication number Publication date
CN1941746A (en) 2007-04-04
WO2007036152A1 (en) 2007-04-05

Similar Documents

Publication Publication Date Title
CN100490426C (en) Method and system for counteracting rubbish e-mails
US7275082B2 (en) System for policing junk e-mail messages
CN100546288C (en) A kind of Email tracking system and method thereof
US8135779B2 (en) Method, system, apparatus, and software product for filtering out spam more efficiently
AU2003213262B2 (en) Hierarchical org-chart based email mailing list maintenance
US8312085B2 (en) Self-tuning statistical method and system for blocking spam
US7469292B2 (en) Managing electronic messages using contact information
DE60127569T2 (en) ELECTRONIC ADDED MESSAGE SERVICES AND ITS TRANSPARENT IMPLEMENTATION USING AN INTERMEDIATE SERVICE
EP1530771B1 (en) Method and system for transmitting notifications to users of a logistic system
US7457844B2 (en) Correspondent-centric management email system for associating message identifiers with instances of properties already stored in database
US7089241B1 (en) Classifier tuning based on data similarities
CN103607339B (en) The method and system of mail sending strategy it is automatically adjusted based on content
US20050091320A1 (en) Method and system for categorizing and processing e-mails
WO1999032985A1 (en) E-mail filter and method thereof
CN1719812A (en) Method and system for filtering refuse E-mail
CN102456022A (en) Short message management method and system
CN101877837A (en) Method and device for short message filtration
CN102413076A (en) Spam mail judging system based on behavior analysis
CN1921458B (en) System and method for uniform switch-in and exchange of enterprise E-mail
CN101494546B (en) Method for preventing collaboration type junk mail
CN109600300B (en) Artificial intelligent mail management system and method
FR2917205A1 (en) VIRTUAL DISTRIBUTION SYSTEM FOR MAIL ARTICLES
CN100556039C (en) Eliminate the method and system of spam erroneous judgement
WO2001053965A1 (en) E-mail spam filter
Kim et al. Spam filtering with dynamically updated URL statistics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20151218

Address after: The South Road in Guangdong province Shenzhen city Fiyta building 518057 floor 5-10 Nanshan District high tech Zone

Patentee after: Shenzhen Tencent Computer System Co., Ltd.

Address before: 518057, room 410, east two, SEG science and Technology Park, Zhenxing Road, Shenzhen, Guangdong

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.