WO2004097676A1 - A method of, and system for, replacing external links in electronic documents - Google Patents

A method of, and system for, replacing external links in electronic documents

Info

Publication number
WO2004097676A1
Authority
WO
WIPO (PCT)
Prior art keywords
link
content
document
email
external
Prior art date
Application number
PCT/GB2004/000212
Other languages
French (fr)
Inventor
Alexander Shipp
Original Assignee
Messagelabs Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Messagelabs Limited
Priority to US10/500,958 (US7487540B2)
Priority to AU2004235513A (AU2004235513A1)
Priority to EP04703210A (EP1618492A1)
Publication of WO2004097676A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99944 Object-oriented database structure

Definitions

  • the present invention relates to a method of, and system for, replacing external links in electronic documents such as email with links which can be controlled.
  • One use of this is to ensure that email that attempts to bypass email content scanners no longer succeeds.
  • Another use is to reduce the effectiveness of web bugs.
  • Content scanning can be carried out at a number of places in the passage of electronic documents from one system to another. Taking email as an example, it may be carried out by software operated by the user, e.g. incorporated in or an adjunct to, his email client, and it may be carried out on a mail server to which the user connects, over a LAN or WAN, in order to retrieve email. Also, Internet Service Providers (ISPs) can carry out content scanning as a value-added service on behalf of customers who, for example, then retrieve their content-scanned email via a POP3 account or similar.
  • One trick which can be used to bypass email content scanners is to create an email which just contains a link (such as an HTML hyperlink) to the undesirable or "nasty" content.
  • content may include viruses and other varieties of malware as well as potentially offensive material such as pornographic images and text, and other material to which the email recipient may not wish to be subjected, such as spam.
  • the content scanner sees only the link, which is not suspicious, and the email is let through.
  • the object referred to may be brought in either automatically by the email client or when the reader clicks on the link.
  • the nasty object ends up on the user's desktop, without ever passing through the email content scanner. It is possible for the content scanner to download the object by following the link itself.
  • the server delivering the object to the content scanner may be able to detect that the request is from a content scanner and not from the end user. It may then serve up a different, innocent object to be scanned. However, when the end-user requests the object, they get the nasty one.
  • the present invention seeks to reduce or eliminate the problems of embedded links in electronic documents and does so by having the content scanner attempt to follow a link found in an electronic document and scan the object which is the target of the link. If the object is found to be acceptable from the point of view of content-scanning criteria, it is retrieved by the scanner and stored on a local, trusted server which is under the control of the person or organisation operating the invention.
  • the link in the electronic document is adjusted to point at the copy of the object stored on the trusted server rather than the original; the document can then be delivered to the recipient without the possibility that the version received by the recipient differs from the one originally scanned. Note that it does not matter to the principle of the invention whether the linked object is stored on the trusted server before or after it has been scanned for acceptability; if it is stored first and found unacceptable on scanning, the link to it can simply be deleted.
  • remedial actions may be taken: for example, the link may be replaced by a non-functional link and/or a notice that the original link has been removed and why; another possibility is that the electronic document can be quarantined and an email or alert generated and sent to the intended recipient advising him that this has been done and perhaps including a link via which he can retrieve it nevertheless or delete it.
  • the process of following links, scanning the linked object and replacing it or not with an embedded copy and an adjusted link may be applied recursively.
  • An upper limit may be placed on the number of recursion levels, to stop the system getting stuck in an infinite loop (e.g. because there are circular links) and to effectively limit the amount of time the processing will take.
  • a content scanning system for electronic documents such as emails comprising: a) a link analyser for identifying hyperlinks in document content; b) means for causing a content scanner to scan objects referenced by links identified by the link analyser and to determine their acceptability according to predefined rules, the means being operative, when the link is to an object external to the document and is determined by the content analyser to be acceptable, to retrieve the external object and modify the document by replacing the link to the external object by one to a copy of the object stored on a trusted server.
  • the invention also provides a method of content-scanning electronic documents such as emails comprising: a) using a link analyser for identifying hyperlinks in document content; b) using a content scanner to scan objects referenced by links identified by the link analyser and to determine their acceptability according to predefined rules, the means being operative, when the link is to an object external to the document and is determined by the content analyser to be acceptable, to retrieve the external object and modify the document by replacing the link to the external object by one to a copy of the object stored on a trusted server.
  • the content scanner can follow the link, and download and scan the object. If the object is judged satisfactory, a copy of it is stored on the trusted server, and the link to the external object replaced by a link to that copy.
  • One trick used by spammers is to embody 'web bugs' in their spam emails. These are unique or semi-unique links to web sites - so a spammer sending out 1000 emails would use 1000 different links.
  • a connection is made to the web site, and by finding which link has been hit, the spammer can match it with their records to tell which person has read the spam email. This then confirms that the email address is a genuine one. The spammer can continue to send email to that address, or perhaps even sell the address on to other spammers.
  • Figure 1 shows the "before" and "after" states of an email processed by an embodiment of the present invention.
  • Figure 2 shows the email processor of a system embodying the present invention
  • Figure 3 shows an object server for providing objects referenced by links in email which has been rebuilt by the system of Figure 2.
  • Figure 1a shows an email 1 formatted according to an internet (e.g. SMTP/MIME) format.
  • the body includes a hypertext link 2 which points to an object 3 on a web server 4 somewhere on the internet.
  • the object 3 may for example be a graphical image embedded in a web page (e.g. HTML or XML).
  • Figure 1b shows the situation after the email 1 has been processed by the system to be described below and the content pointed at by the link 2 has been judged to be acceptable.
  • the content i.e. image 3 has been copied to an object server 5 as image 3'; the object server 5 is hosted on a secure server machine 6 (or array of such machines) under the control of the person or organisation operating the system (e.g. an ISP).
  • the original link 2 has been replaced by a new link 2' pointing at the image 3' stored on the secure or trusted server 6.
  • the server 6 operates in the security domain of the operator of the system and has access permissions associated with the stored content objects such as 3' which enable eventual recipients of emails such as 1, or more strictly speaking their email client software to follow the link 2' and retrieve the linked-to object.
  • the access permissions of server 6 should prevent persons or software without appropriate security credentials from writing to the linked-to object storage area.
  • FIGS 2 and 3 illustrate a system according to the present invention.
  • this example embodiment is given in terms of a content scanner operated by an ISP to process email stream e.g. passing through an email gateway.
  • Figure 2 shows the part of the system which processes emails and modifies them to replace links to objects on untrusted servers such as 4 by links to objects on trusted server 6, where the linked-to object is considered to be acceptable content.
  • Figure 3 shows the object server which provides the linked-to objects when recipients follow the processed links in their emails.
  • the part 100 of the system which is shown in Figure 2 may operate as follows in respect of each item of email delivered to an input 101.
  • the email is analysed by analyser 102 to determine whether it contains external links. This determination may be made, for example, by scanning it for standard markup tags which point to external content or objects, for example the <A> and <IMAGE> tags in HTML. If none are found, steps 2 to 5 are omitted and the email is delivered unprocessed to output 103 via path 104.
  • the analyser 102 operates in concert with a link replacer 105 to process links to external objects. For each link, the link replacer 105 creates a new link which is stored in a link database 106. The new link is generated by a process guaranteed to generate unique links each time. A database entry is created, tying the original link and the new, replacement, one together.
  • An email rebuilder 107 rebuilds the email with each link replaced by its new counterpart stored in link database 106 and the rebuilt email is forwarded on.
  • the new link may be requested either by the email client software, or by the person reading the email clicking or otherwise selecting the link. This generates a request to retrieve an object from the trusted server 6.
  • the server 6 looks up in the link database 106 to find the original object, and retrieves it. If it cannot be retrieved, go to step 8.
  • The external objects are scanned for pornography, viruses, spam and other undesirables. If any are found, go to step 8.
  • The external objects are analysed to see whether they contain external links. If the nesting limit has been reached, go to step 8. Otherwise each external link is replaced by a new link in a manner similar to step 2, and a database entry is created, tying the old and new links together.
  • the rebuilt object is forwarded to the requester, and the process ends.
  • If processing arrives here, an undesirable object has been found, or the object could not be retrieved, or the nesting limit has been reached.
  • the system can now take some appropriate error action, such as returning an error message, alerting an operator or returning a default object.
  • Figure 3 shows the object server 300 which services requests received at an input 301 to retrieve a linked-to object on an untrusted server, scan it for acceptability and, if acceptable, to store it on the secure "safelink" server 6.
  • An object locator 300 locates the linked-to object, e.g. on the internet and initiates a retrieval operation by which the object is retrieved by the retriever 303. This retrieval process takes place using the internet protocol appropriate to the link and linked-to object. If the retrieval fails, an error handler 304 is invoked. If successful, the object is processed by an object control scanner which makes a determination of whether the content is acceptable. If it is not, the error handler 304 is invoked, otherwise an object returner 306 returns the object and stores it on the trusted server 6.
  • the following email contains a link to a website.
  • Subject: email with link
  • the safelink.com server will look up http://safelink.com/09052002161710a33071ef407.gif in the database, and find that the original link is http://www.messagelabs.com/threatlist.gif. It will download the object, and scan it, perhaps for pornography or other inappropriate content.
  • the link can be generated by processing the name of the server generating the link, the current time, the process id, a number that increments each time and a random number. This will all be appended to an appropriate reference to the 'safelink' server.
  • the server generating the link is mail2071.messagelabs.com, the reference is for the http protocol, the time is 27 Jan 2003, 17:45:01, the process id is 1717, 27 references have already been generated and the safelink server is safelink.com, then a typical link might be: http://safelink.com/mail2071_messagelabs_com/27012003174501/1717/28/10131354834
  • the system does not have to wait until the object is first requested. It may proactively fetch the object ahead of time, scan it, and either cache the object or remember that the scan did not pass.
  • the system can also cache and intelligently remember which objects have been retrieved - if five emails contain the same original link, then even though they will end up with five different new links the referred to object only needs to be retrieved once and not five times.
  • the system might want to ensure the same filename extension is used for the old and new links.

Abstract

A content scanner for electronic documents such as email scans objects which are the target of hyperlinks within the documents. If they are determined to be acceptable, the hyperlinks are replaced by ones pointing to copies of the objects stored on a trusted server.

Description

A METHOD OF, AND SYSTEM FOR, REPLACING EXTERNAL LINKS IN ELECTRONIC DOCUMENTS
The present invention relates to a method of, and system for, replacing external links in electronic documents such as email with links which can be controlled. One use of this is to ensure that email that attempts to bypass email content scanners no longer succeeds. Another use is to reduce the effectiveness of web bugs.
Content scanning can be carried out at a number of places in the passage of electronic documents from one system to another. Taking email as an example, it may be carried out by software operated by the user, e.g. incorporated in or an adjunct to, his email client, and it may be carried out on a mail server to which the user connects, over a LAN or WAN, in order to retrieve email. Also, Internet Service Providers (ISPs) can carry out content scanning as a value-added service on behalf of customers who, for example, then retrieve their content-scanned email via a POP3 account or similar.
One trick which can be used to bypass email content scanners is to create an email which just contains a link (such as an HTML hyperlink) to the undesirable or "nasty" content. Such content may include viruses and other varieties of malware as well as potentially offensive material such as pornographic images and text, and other material to which the email recipient may not wish to be subjected, such as spam. The content scanner sees only the link, which is not suspicious, and the email is let through. However, when viewed in the email client, the object referred to may be brought in either automatically by the email client or when the reader clicks on the link. Thus, the nasty object ends up on the user's desktop, without ever passing through the email content scanner. It is possible for the content scanner to download the object by following the link itself. It can then scan the object. However, this method is not foolproof: for instance, the server delivering the object to the content scanner may be able to detect that the request is from a content scanner and not from the end user. It may then serve up a different, innocent object to be scanned. However, when the end-user requests the object, they get the nasty one.
The present invention seeks to reduce or eliminate the problems of embedded links in electronic documents and does so by having the content scanner attempt to follow a link found in an electronic document and scan the object which is the target of the link. If the object is found to be acceptable from the point of view of content-scanning criteria, it is retrieved by the scanner and stored on a local, trusted server which is under the control of the person or organisation operating the invention. The link in the electronic document is adjusted to point at the copy of the object stored on the trusted server rather than the original; the document can then be delivered to the recipient without the possibility that the version received by the recipient differs from the one originally scanned. Note that it does not matter to the principle of the invention whether the linked object is stored on the trusted server before or after it has been scanned for acceptability; if it is stored first and found unacceptable on scanning, the link to it can simply be deleted.
If the object is not found to be acceptable, one or more remedial actions may be taken: for example, the link may be replaced by a non-functional link and/or a notice that the original link has been removed and why; another possibility is that the electronic document can be quarantined and an email or alert generated and sent to the intended recipient advising him that this has been done and perhaps including a link via which he can retrieve it nevertheless or delete it. The process of following links, scanning the linked object and replacing it or not with an embedded copy and an adjusted link may be applied recursively. An upper limit may be placed on the number of recursion levels, to stop the system getting stuck in an infinite loop (e.g. because there are circular links) and to effectively limit the amount of time the processing will take.
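The recursive processing with a depth limit described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the invention: the callables `fetch`, `is_acceptable` and `extract_links` are hypothetical stand-ins for the retriever, the content scanner and the link analyser, and the value of `MAX_DEPTH` is an assumption, since the text gives no figure.

```python
from typing import Callable, Iterable

MAX_DEPTH = 3  # hypothetical nesting limit; the source does not state a value


def scan_recursively(
    link: str,
    fetch: Callable[[str], bytes],
    is_acceptable: Callable[[bytes], bool],
    extract_links: Callable[[bytes], Iterable[str]],
    depth: int = 0,
) -> bool:
    """Return True only if the object and everything it links to is acceptable."""
    if depth > MAX_DEPTH:
        # Limit reached: treat as unacceptable. This also stops circular links
        # from causing an infinite loop.
        return False
    try:
        obj = fetch(link)
    except OSError:
        return False  # the object could not be retrieved
    if not is_acceptable(obj):
        return False
    # Recurse into every link found inside the retrieved object.
    return all(
        scan_recursively(child, fetch, is_acceptable, extract_links, depth + 1)
        for child in extract_links(obj)
    )
```

With two objects that link to each other, the recursion stops once `depth` exceeds `MAX_DEPTH` and the chain is reported as unacceptable, which matches the behaviour claimed for the nesting limit.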
Thus according to the present invention there is provided a content scanning system for electronic documents such as emails comprising: a) a link analyser for identifying hyperlinks in document content; b) means for causing a content scanner to scan objects referenced by links identified by the link analyser and to determine their acceptability according to predefined rules, the means being operative, when the link is to an object external to the document and is determined by the content analyser to be acceptable, to retrieve the external object and modify the document by replacing the link to the external object by one to a copy of the object stored on a trusted server.
The invention also provides a method of content-scanning electronic documents such as emails comprising: a) using a link analyser for identifying hyperlinks in document content; b) using a content scanner to scan objects referenced by links identified by the link analyser and to determine their acceptability according to predefined rules, the means being operative, when the link is to an object external to the document and is determined by the content analyser to be acceptable, to retrieve the external object and modify the document by replacing the link to the external object by one to a copy of the object stored on a trusted server.
Thus the content scanner can follow the link, and download and scan the object. If the object is judged satisfactory, a copy of it is stored on the trusted server, and the link to the external object replaced by a link to that copy.
One trick used by spammers is to embody 'web bugs' in their spam emails. These are unique or semi-unique links to web sites - so a spammer sending out 1000 emails would use 1000 different links. When the email is read, a connection is made to the web site, and by finding which link has been hit, the spammer can match it with their records to tell which person has read the spam email. This then confirms that the email address is a genuine one. The spammer can continue to send email to that address, or perhaps even sell the address on to other spammers.
By following every external link in every email that passes through the content scanner, all the web bugs the spammer sends out will be activated. Their effectiveness therefore becomes much reduced, because they can no longer be used to tell which email addresses were valid or not.
The invention will be further described by way of non-limiting example with reference to the accompanying drawings, in which:-
Figure 1 shows the "before" and "after" states of an email processed by an embodiment of the present invention;
Figure 2 shows the email processor of a system embodying the present invention; and
Figure 3 shows an object server for providing objects referenced by links in email which has been rebuilt by the system of Figure 2.
Figure 1a shows an email 1 formatted according to an internet (e.g. SMTP/MIME) format. The body includes a hypertext link 2 which points to an object 3 on a web server 4 somewhere on the internet. The object 3 may for example be a graphical image embedded in a web page (e.g. HTML or XML).
Figure 1b shows the situation after the email 1 has been processed by the system to be described below and the content pointed at by the link 2 has been judged to be acceptable. The content, i.e. image 3, has been copied to an object server 5 as image 3'; the object server 5 is hosted on a secure server machine 6 (or array of such machines) under the control of the person or organisation operating the system (e.g. an ISP). The original link 2 has been replaced by a new link 2' pointing at the image 3' stored on the secure or trusted server 6. The server 6 operates in the security domain of the operator of the system and has access permissions associated with the stored content objects such as 3' which enable eventual recipients of emails such as 1, or more strictly speaking their email client software, to follow the link 2' and retrieve the linked-to object. Of course, the access permissions of server 6 should prevent persons or software without appropriate security credentials from writing to the linked-to object storage area.
Figures 2 and 3 illustrate a system according to the present invention. Although the invention is not limited to this application, this example embodiment is given in terms of a content scanner operated by an ISP to process an email stream, e.g. one passing through an email gateway.
Figure 2 shows the part of the system which processes emails and modifies them to replace links to objects on untrusted servers such as 4 by links to objects on trusted server 6, where the linked-to object is considered to be acceptable content. Figure 3 shows the object server which provides the linked-to objects when recipients follow the processed links in their emails.
The part 100 of the system which is shown in Figure 2 may operate as follows in respect of each item of email delivered to an input 101.
1. The email is analysed by analyser 102 to determine whether it contains external links. This determination may be made, for example, by scanning it for standard markup tags which point to external content or objects, for example the <A> and <IMAGE> tags in HTML. If none are found, steps 2 to 5 are omitted and the email is delivered unprocessed to output 103 via path 104.
2. The analyser 102 operates in concert with a link replacer 105 to process links to external objects. For each link, the link replacer 105 creates a new link which is stored in a link database 106. The new link is generated by a process guaranteed to generate unique links each time. A database entry is created, tying the original link and the new, replacement, one together.
3. An email rebuilder 107 rebuilds the email with each link replaced by its new counterpart stored in link database 106 and the rebuilt email is forwarded on.
4. When the email is read, the part 200 of the system illustrated in Figure 3 comes into play. The new link may be requested either by the email client software, or by the person reading the email clicking or otherwise selecting the link. This generates a request to retrieve an object from the trusted server 6. The server 6 looks up the new link in the link database 106 to find the original object, and retrieves it. If it cannot be retrieved, go to step 8.
5. The external objects are scanned for pornography, viruses, spam and other undesirables. If any are found, go to step 8.
6. The external objects are analysed to see whether they contain external links. If the nesting limit has been reached, go to step 8. Otherwise each external link is replaced by a new link in a manner similar to step 2, and a database entry is created, tying the old and new links together.
7. The rebuilt object is forwarded to the requester, and the process ends.
8. If processing arrives here, an undesirable object has been found, or the object could not be retrieved, or the nesting limit has been reached. The system can now take some appropriate error action, such as returning an error message, alerting an operator or returning a default object.
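Steps 1 to 3 (finding external links, generating replacements, recording them and rebuilding the email) might be sketched as below. This is a minimal sketch under stated assumptions: a regular expression stands in for the analyser 102, a plain dictionary stands in for the link database 106, and `uuid.uuid4()` is used as a convenient stand-in for the unspecified "process guaranteed to generate unique links". The names `rewrite_links`, `LINK_RE` and `SAFELINK_BASE` are hypothetical.

```python
import re
import uuid

SAFELINK_BASE = "http://safelink.com/"  # trusted-server prefix used in the example

# Matches src="..." or href="..." attributes that point at http(s) URLs.
LINK_RE = re.compile(
    r'(?P<attr>\b(?:src|href)\s*=\s*")(?P<url>https?://[^"]+)"',
    re.IGNORECASE,
)


def rewrite_links(html: str, link_db: dict) -> str:
    """Replace each external link and record new link -> original link in link_db."""

    def replace(match: re.Match) -> str:
        original = match.group("url")
        # A fresh link per occurrence, so identical originals get distinct new links.
        new_link = SAFELINK_BASE + uuid.uuid4().hex
        link_db[new_link] = original
        return f'{match.group("attr")}{new_link}"'

    # If no external links are present, the text is returned unchanged (step 1).
    return LINK_RE.sub(replace, html)
```

A real analyser would more likely use an HTML parser than a regular expression; the regex is chosen here only to keep the sketch short.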
Figure 3 shows the object server 300 which services requests received at an input 301 to retrieve a linked-to object on an untrusted server, scan it for acceptability and, if acceptable, to store it on the secure "safelink" server 6. An object locator 300 locates the linked-to object, e.g. on the internet and initiates a retrieval operation by which the object is retrieved by the retriever 303. This retrieval process takes place using the internet protocol appropriate to the link and linked-to object. If the retrieval fails, an error handler 304 is invoked. If successful, the object is processed by an object control scanner which makes a determination of whether the content is acceptable. If it is not, the error handler 304 is invoked, otherwise an object returner 306 returns the object and stores it on the trusted server 6.
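The request handling performed by the object server can be illustrated with a short sketch. It assumes the link database is a dictionary mapping new links to original links, and that `retrieve` and `is_acceptable` are caller-supplied callables standing in for the retriever and the scanner. The function name `serve_object` and the `default_object` parameter are hypothetical. Returning a default object is one of the error actions the text mentions.

```python
from typing import Callable, Optional


def serve_object(
    new_link: str,
    link_db: dict,
    retrieve: Callable[[str], bytes],
    is_acceptable: Callable[[bytes], bool],
    default_object: Optional[bytes] = None,
) -> Optional[bytes]:
    """Look up the original link, retrieve and scan the object, then return it."""
    original = link_db.get(new_link)
    if original is None:
        return default_object  # unknown link: take the error action
    try:
        obj = retrieve(original)
    except OSError:
        return default_object  # retrieval failed: error handler
    if not is_acceptable(obj):
        return default_object  # scan failed: error handler
    return obj  # acceptable: pass the object back to the requester
```

Passing the retriever and scanner in as arguments keeps the sketch independent of any particular network library or scanning engine.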
Example
The following email contains a link to a website.
Subject: email with link
Date: Thu, 9 May 2002 16:17:01 +0600
MIME-Version: 1.0
Content-Type: text/html;
Content-Transfer-Encoding: 7bit
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV>&nbsp;</DIV>
This is some text<BR>
<DIV><IMAGE src="http://www.messagelabs.com/threatlist"
</DIV>
This is some more text<BR>
</BODY></HTML>
A new link is generated: http://safelink.com/09052002161710a33071ef407.gif, the email is updated, and a database entry is generated.
Database Entry
Old link: http://www.messagelabs.com/threatlist.gif
New link: http://safelink.com/09052002161710a33071ef407.gif
Updated email
Subject: email with link
Date: Thu, 9 May 2002 16:17:01 +0600
MIME-Version: 1.0
Content-Type: text/html;
Content-Transfer-Encoding: 7bit
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV>&nbsp;</DIV>
This is some text<BR>
<DIV><IMAGE src="http://safelink.com/09052002161710a33071ef407.gif"
</DIV>
This is some more text<BR>
</BODY></HTML>
When the email is read, the email client may try to download the image referred to by the link. However, it will try to retrieve the image from http://safelink.com/09052002161710a33071ef407.gif rather than http://www.messagelabs.com/threatlist.gif.
The safelink.com server will look up http://safelink.com/09052002161710a33071ef407.gif in the database, and find that the original link is http://www.messagelabs.com/threatlist.gif. It will download the object, and scan it, perhaps for pornography or other inappropriate content.
If the scan shows the object is harmless, it can be passed back to the original requestor.
Example Link Generator
The link can be generated by processing the name of the server generating the link, the current time, the process id, a number that increments each time and a random number. This will all be appended to an appropriate reference to the 'safelink' server. Thus if the server generating the link is mail2071.messagelabs.com, the reference is for the http protocol, the time is 27 Jan 2003, 17:45:01, the process id is 1717, 27 references have already been generated and the safelink server is safelink.com, then a typical link might be: http://safelink.com/mail2071_messagelabs_com/27012003174501/1717/28/10131354834
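A link generator following this recipe might look like the sketch below. The function name `generate_link`, its parameters and the range of the random number are assumptions made for illustration. The path layout (server name with dots replaced by underscores, a DDMMYYYYHHMMSS timestamp, process id, serial number, random number) is inferred from the single example link above.

```python
import itertools
import os
import random
from datetime import datetime
from typing import Optional

_counter = itertools.count(1)  # number that increments with every generated link


def generate_link(
    server_name: str,
    safelink_host: str = "safelink.com",
    now: Optional[datetime] = None,
    pid: Optional[int] = None,
    serial: Optional[int] = None,
    rand: Optional[int] = None,
) -> str:
    """Build a new link from server name, time, process id, serial and random number."""
    now = now or datetime.now()
    pid = os.getpid() if pid is None else pid
    serial = next(_counter) if serial is None else serial
    rand = random.randrange(10**11) if rand is None else rand  # assumed range
    host_part = server_name.replace(".", "_")
    stamp = now.strftime("%d%m%Y%H%M%S")
    return f"http://{safelink_host}/{host_part}/{stamp}/{pid}/{serial}/{rand}"


# Reproduces the example link from the text when given the same inputs.
example = generate_link(
    "mail2071.messagelabs.com",
    now=datetime(2003, 1, 27, 17, 45, 1),
    pid=1717,
    serial=28,
    rand=10131354834,
)
```

The optional parameters exist only so the example from the text can be reproduced deterministically; in normal use they would be left at their defaults. If the links also need to be hard to guess, `secrets.randbelow` could replace `random.randrange`.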
Other Improvements
Other improvements can be added to the system. For instance, the system does not have to wait until the object is first requested. It may proactively fetch the object ahead of time, scan it, and either cache the object or remember that the scan did not pass.
This will cut down on delays when the object is requested. If all links are followed then this will activate all web bugs placed in the email, thereby much reducing their effectiveness. The system can also cache and intelligently remember which objects have been retrieved - if five emails contain the same original link, then even though they will end up with five different new links the referred to object only needs to be retrieved once and not five times.
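The caching idea can be sketched as a small class keyed by the original link, so that several different new links pointing at the same original trigger only one retrieval. The class name `ObjectCache` and its callables are hypothetical. In this sketch a failed retrieval is remembered in the same way as a failed scan, which is a simplification.

```python
from typing import Callable, Dict, Optional


class ObjectCache:
    """Caches scan outcomes by original link so each object is fetched once."""

    def __init__(
        self,
        retrieve: Callable[[str], bytes],
        is_acceptable: Callable[[bytes], bool],
    ) -> None:
        self._retrieve = retrieve
        self._is_acceptable = is_acceptable
        # Maps original link -> object bytes, or None if retrieval or scan failed.
        self._results: Dict[str, Optional[bytes]] = {}

    def get(self, original_link: str) -> Optional[bytes]:
        if original_link not in self._results:
            try:
                obj: Optional[bytes] = self._retrieve(original_link)
            except OSError:
                obj = None  # could not be retrieved
            if obj is not None and not self._is_acceptable(obj):
                obj = None  # remember that the scan did not pass
            self._results[original_link] = obj
        return self._results[original_link]
```

Calling `get` ahead of time, when the email is first processed, gives the proactive fetching described above; later requests are then served from the cache.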
The system might want to ensure the same filename extension is used for the old and new links.
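Carrying the filename extension over from the old link to the new one could be done as in this small sketch. The helper name `with_original_extension` is hypothetical, and only the path component of the original URL is inspected, so a query string is ignored.

```python
import posixpath
from urllib.parse import urlsplit


def with_original_extension(new_link: str, original_link: str) -> str:
    """Append the original link's filename extension (if any) to the new link."""
    path = urlsplit(original_link).path
    extension = posixpath.splitext(path)[1]  # e.g. ".gif", or "" if none
    return new_link + extension
```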

Claims

1. A content scanning system for electronic documents such as emails comprising: a) a link analyser for identifying hyperlinks in document content; b) means for causing a content scanner to scan objects referenced by links identified by the link analyser and to determine their acceptability according to predefined rules, the means being operative, when the link is to an object external to the document and is determined by the content analyser to be acceptable, to retrieve the external object and modify the document by replacing the link to the external object by one to a copy of the object stored on a trusted server.
2. A system according to claim 1 wherein the link analyser a) and means b) are operative to recursively process links identified in such external objects.
3. A system according to claim 2 in which only a maximum depth of recursion is permitted and the document is flagged as unacceptable if that limit is reached.
4. A system according to any one of the preceding claims wherein if any linked-to object is determined by the content scanner to be unacceptable the document is flagged or modified to indicate that fact.
5. A method of content-scanning electronic documents such as emails comprising: a) using a link analyser for identifying hyperlinks in document content; b) using a content scanner to scan objects referenced by links identified by the link analyser and to determine their acceptability according to predefined rules, the means being operative, when the link is to an object external to the document and is determined by the content analyser to be acceptable, to retrieve the external object and modify the document by replacing the link to the external object by one to a copy of the object stored on a trusted server.
6. A method according to claim 5 wherein the steps a) and b) are used recursively to process links identified in such external objects.
7. A method according to claim 6 in which only a maximum depth of recursion is permitted and the document is flagged as unacceptable if that limit is reached.
8. A method according to any one of claims 5 to 7, wherein if any linked-to object is determined by the content scanner to be unacceptable the document is flagged or modified to indicate that fact.
PCT/GB2004/000212 2003-04-25 2004-01-19 A method of, and system for, replacing external links in electronic documents WO2004097676A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US10/500,958 US7487540B2 (en) 2003-04-25 2004-01-19 Method of, and system for, replacing external links in electronic documents
AU2004235513A AU2004235513A1 (en) 2003-04-25 2004-01-19 A method of, and system for, replacing external links in electronic documents
EP04703210A EP1618492A1 (en) 2003-04-25 2004-01-19 A method of, and system for, replacing external links in electronic documents

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0309462.0 2003-04-25
GB0309462A GB2400931B (en) 2003-04-25 2003-04-25 A method of, and system for, replacing external links in electronic documents

Publications (1)

Publication Number Publication Date
WO2004097676A1 (en) 2004-11-11

Family

ID=33042175

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2004/000212 WO2004097676A1 (en) 2003-04-25 2004-01-19 A method of, and system for, replacing external links in electronic documents

Country Status (5)

Country Link
US (1) US7487540B2 (en)
EP (1) EP1618492A1 (en)
AU (1) AU2004235513A1 (en)
GB (1) GB2400931B (en)
WO (1) WO2004097676A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006136605A1 (en) 2005-06-22 2006-12-28 Surfcontrol On-Demand Limited Method and system for filtering electronic messages
US7404209B2 (en) 2002-08-14 2008-07-22 Messagelabs Limited Method of, and system for, scanning electronic documents which contain links to external objects
US7487540B2 (en) 2003-04-25 2009-02-03 Messagelabs Limited Method of, and system for, replacing external links in electronic documents

Families Citing this family (59)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9219755B2 (en) 1996-11-08 2015-12-22 Finjan, Inc. Malicious mobile code runtime monitoring system and methods
US7058822B2 (en) 2000-03-30 2006-06-06 Finjan Software, Ltd. Malicious mobile code runtime monitoring system and methods
US8079086B1 (en) 1997-11-06 2011-12-13 Finjan, Inc. Malicious mobile code runtime monitoring system and methods
US20070299915A1 (en) * 2004-05-02 2007-12-27 Markmonitor, Inc. Customer-based detection of online fraud
US7457823B2 (en) 2004-05-02 2008-11-25 Markmonitor Inc. Methods and systems for analyzing data related to possible online fraud
US8041769B2 (en) * 2004-05-02 2011-10-18 Markmonitor Inc. Generating phish messages
US7870608B2 (en) * 2004-05-02 2011-01-11 Markmonitor, Inc. Early detection and monitoring of online fraud
US20070107053A1 (en) * 2004-05-02 2007-05-10 Markmonitor, Inc. Enhanced responses to online fraud
US8769671B2 (en) * 2004-05-02 2014-07-01 Markmonitor Inc. Online fraud solution
US7992204B2 (en) 2004-05-02 2011-08-02 Markmonitor, Inc. Enhanced responses to online fraud
US9203648B2 (en) * 2004-05-02 2015-12-01 Thomson Reuters Global Resources Online fraud solution
US7913302B2 (en) * 2004-05-02 2011-03-22 Markmonitor, Inc. Advanced responses to online fraud
US7461170B1 (en) * 2004-09-01 2008-12-02 Microsoft Corporation Zone-based rendering of resource addresses
GB2418999A (en) * 2004-09-09 2006-04-12 Surfcontrol Plc Categorizing uniform resource locators
GB2418037B (en) * 2004-09-09 2007-02-28 Surfcontrol Plc System, method and apparatus for use in monitoring or controlling internet access
US7841003B1 (en) * 2005-05-04 2010-11-23 Capital One Financial Corporation Phishing solution method
US20070028301A1 (en) * 2005-07-01 2007-02-01 Markmonitor Inc. Enhanced fraud monitoring systems
US8615800B2 (en) 2006-07-10 2013-12-24 Websense, Inc. System and method for analyzing web content
US8020206B2 (en) 2006-07-10 2011-09-13 Websense, Inc. System and method of analyzing web content
US9654495B2 (en) * 2006-12-01 2017-05-16 Websense, Llc System and method of analyzing web addresses
GB2458094A (en) * 2007-01-09 2009-09-09 Surfcontrol On Demand Ltd URL interception and categorization in firewalls
GB2445764A (en) * 2007-01-22 2008-07-23 Surfcontrol Plc Resource access filtering system and database structure for use therewith
US8015174B2 (en) * 2007-02-28 2011-09-06 Websense, Inc. System and method of controlling access to the internet
US7970761B2 (en) * 2007-03-28 2011-06-28 International Business Machines Corporation Automatic identification of components for a compound document in a content management system
US8140589B2 (en) * 2007-03-28 2012-03-20 International Business Machines Corporation Autonomic updating of templates in a content management system
US8024652B2 (en) * 2007-04-10 2011-09-20 Microsoft Corporation Techniques to associate information between application programs
GB0709527D0 (en) 2007-05-18 2007-06-27 Surfcontrol Plc Electronic messaging system, message processing apparatus and message processing method
US8074162B1 (en) * 2007-10-23 2011-12-06 Google Inc. Method and system for verifying the appropriateness of shared content
EP2318955A1 (en) * 2008-06-30 2011-05-11 Websense, Inc. System and method for dynamic and real-time categorization of webpages
US9130972B2 (en) 2009-05-26 2015-09-08 Websense, Inc. Systems and methods for efficient detection of fingerprinted data and information
US8850584B2 (en) * 2010-02-08 2014-09-30 Mcafee, Inc. Systems and methods for malware detection
US10404615B2 (en) 2012-02-14 2019-09-03 Airwatch, Llc Controlling distribution of resources on a network
US9680763B2 (en) 2012-02-14 2017-06-13 Airwatch, Llc Controlling distribution of resources in a network
US9241259B2 (en) 2012-11-30 2016-01-19 Websense, Inc. Method and apparatus for managing the transfer of sensitive information to mobile devices
US8978110B2 (en) 2012-12-06 2015-03-10 Airwatch Llc Systems and methods for controlling email access
US9021037B2 (en) 2012-12-06 2015-04-28 Airwatch Llc Systems and methods for controlling email access
US8826432B2 (en) * 2012-12-06 2014-09-02 Airwatch, Llc Systems and methods for controlling email access
US8862868B2 (en) 2012-12-06 2014-10-14 Airwatch, Llc Systems and methods for controlling email access
US8832785B2 (en) 2012-12-06 2014-09-09 Airwatch, Llc Systems and methods for controlling email access
US20140280955A1 (en) 2013-03-14 2014-09-18 Sky Socket, Llc Controlling Electronically Communicated Resources
US8997187B2 (en) 2013-03-15 2015-03-31 Airwatch Llc Delegating authorization to applications on a client device in a networked environment
US9787686B2 (en) 2013-04-12 2017-10-10 Airwatch Llc On-demand security policy activation
US9219741B2 (en) 2013-05-02 2015-12-22 Airwatch, Llc Time-based configuration policy toggling
US11232250B2 (en) * 2013-05-15 2022-01-25 Microsoft Technology Licensing, Llc Enhanced links in curation and collaboration applications
US20140344257A1 (en) * 2013-05-16 2014-11-20 International Business Machines Corporation Detecting a Preferred Implementation of an Operation
US9900261B2 (en) 2013-06-02 2018-02-20 Airwatch Llc Shared resource watermarking and management
US9584437B2 (en) 2013-06-02 2017-02-28 Airwatch Llc Resource watermarking and management
US9686304B1 (en) * 2013-06-25 2017-06-20 Symantec Corporation Systems and methods for healing infected document files
US8756426B2 (en) 2013-07-03 2014-06-17 Sky Socket, Llc Functionality watermarking and management
US8775815B2 (en) 2013-07-03 2014-07-08 Sky Socket, Llc Enterprise-specific functionality watermarking and management
US8806217B2 (en) 2013-07-03 2014-08-12 Sky Socket, Llc Functionality watermarking and management
US9665723B2 (en) 2013-08-15 2017-05-30 Airwatch, Llc Watermarking detection and management
WO2015042681A1 (en) * 2013-09-24 2015-04-02 Netsweeper (Barbados) Inc. Network policy service for dynamic media
US9544306B2 (en) 2013-10-29 2017-01-10 Airwatch Llc Attempted security breach remediation
US9824332B1 (en) 2017-04-12 2017-11-21 eTorch Inc. Email data collection compliance enforcement
US9674129B1 (en) 2016-10-05 2017-06-06 eTorch Inc. Email privacy enforcement
US9559997B1 (en) 2016-01-11 2017-01-31 Paul Everton Client agnostic email processing
US10097580B2 (en) * 2016-04-12 2018-10-09 Microsoft Technology Licensing, Llc Using web search engines to correct domain names used for social engineering
WO2023028599A1 (en) * 2021-08-27 2023-03-02 Rock Cube Holdings LLC Systems and methods for time-dependent hyperlink presentation

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781901A (en) 1995-12-21 1998-07-14 Intel Corporation Transmitting electronic mail attachment over a network using a e-mail page
JP2940459B2 (en) 1996-02-08 1999-08-25 日本電気株式会社 Node / link search device
EP0945811B1 (en) 1996-10-23 2003-01-22 Access Co., Ltd. Information apparatus having automatic web reading function
US6038601A (en) * 1997-07-21 2000-03-14 Tibco, Inc. Method and apparatus for storing and delivering documents on the internet
US6119231A (en) 1997-10-31 2000-09-12 Cisco Technologies, Inc. Data scanning network security technique
US6321242B1 (en) * 1998-02-06 2001-11-20 Sun Microsystems, Inc. Re-linking technology for a moving web site
KR100684986B1 (en) 1999-12-31 2007-02-22 주식회사 잉카인터넷 Online dangerous information screening system and method
US6924828B1 (en) 1999-04-27 2005-08-02 Surfnotes Method and apparatus for improved information representation
JP2002544582A (en) 1999-05-11 2002-12-24 アメリカ オンライン インコーポレイテッド Control access to content
TW504619B (en) 1999-06-04 2002-10-01 Ibm Internet mail delivery agent with automatic caching of file attachments
JP3487789B2 (en) 1999-07-26 2004-01-19 中部日本電気ソフトウェア株式会社 E-mail processing system, e-mail processing method, and recording medium
US6665838B1 (en) * 1999-07-30 2003-12-16 International Business Machines Corporation Web page thumbnails and user configured complementary information provided from a server
US8543901B1 (en) * 1999-11-01 2013-09-24 Level 3 Communications, Llc Verification of content stored in a network
US6954783B1 (en) * 1999-11-12 2005-10-11 Bmc Software, Inc. System and method of mediating a web page
WO2001050353A1 (en) * 2000-01-04 2001-07-12 Ma'at System and method for anonymous observation and use of premium content
US6701440B1 (en) * 2000-01-06 2004-03-02 Networks Associates Technology, Inc. Method and system for protecting a computer using a remote e-mail scanning device
US7043757B2 (en) * 2001-05-22 2006-05-09 Mci, Llc System and method for malicious code detection
WO2003044617A2 (en) * 2001-10-03 2003-05-30 Reginald Adkins Authorized email control system
US20030097591A1 (en) * 2001-11-20 2003-05-22 Khai Pham System and method for protecting computer users from web sites hosting computer viruses
US7096500B2 (en) 2001-12-21 2006-08-22 Mcafee, Inc. Predictive malware scanning of internet data
GB2391964B (en) 2002-08-14 2006-05-03 Messagelabs Ltd Method of and system for scanning electronic documents which contain links to external objects
US20040117450A1 (en) * 2002-12-13 2004-06-17 Campbell David T. Gateway email concentrator
US20040177042A1 (en) * 2003-03-05 2004-09-09 Comverse Network Systems, Ltd. Digital rights management for end-user content
GB2400931B (en) 2003-04-25 2006-09-27 Messagelabs Ltd A method of, and system for, replacing external links in electronic documents

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"LOOK AHEAD FILTERING OF INTERNET CONTENT", IBM TECHNICAL DISCLOSURE BULLETIN, IBM CORP. NEW YORK, US, vol. 40, no. 12, 1 December 1997 (1997-12-01), pages 143, XP000754118, ISSN: 0018-8689 *
GREENFIELD P ET AL: "Access Prevention techniques for Internet Content Filtering", CSIRIO, December 1999 (1999-12-01), XP002265027 *
WIEGEL B ED - ASSOCIATION FOR COMPUTING MACHINERY: "SECURE EXTERNAL REFERENCES IN MULTIMEDIA EMAIL MESSAGES", 3RD. ACM CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY. NEW DELHI, MAR. 14 - 16, 1996, ACM CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, NEW YORK, ACM, US, vol. CONF. 3, 14 March 1996 (1996-03-14), pages 11 - 18, XP000620973, ISBN: 0-89791-829-0 *

Also Published As

Publication number Publication date
US20050071748A1 (en) 2005-03-31
EP1618492A1 (en) 2006-01-25
AU2004235513A1 (en) 2004-11-11
GB2400931A (en) 2004-10-27
US7487540B2 (en) 2009-02-03
GB2400931B (en) 2006-09-27

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 10500958; Country of ref document: US)
AK Designated states (Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW)
AL Designated countries for regional patents (Kind code of ref document: A1; Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase (Ref document number: 2004703210; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2004235513; Country of ref document: AU)
ENP Entry into the national phase (Ref document number: 2004235513; Country of ref document: AU; Date of ref document: 20040119; Kind code of ref document: A)
WWP Wipo information: published in national office (Ref document number: 2004235513; Country of ref document: AU)
WWP Wipo information: published in national office (Ref document number: 2004703210; Country of ref document: EP)