US8788500B2 - Electronic mail duplicate detection - Google Patents
Electronic mail duplicate detection
- Publication number
- US8788500B2 (application US12/879,478)
- Authority
- US
- United States
- Prior art keywords
- segment
- signature
- index
- query
- root
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/216—Handling conversation history, e.g. grouping of messages in sessions or threads
Claims (21)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/879,478 US8788500B2 (en) | 2010-09-10 | 2010-09-10 | Electronic mail duplicate detection |
Publications (2)
Publication Number | Publication Date |
---|---|
US20120066209A1 (en) | 2012-03-15 |
US8788500B2 (en) | 2014-07-22 |
Family
ID=45807683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/879,478 Active 2031-06-17 US8788500B2 (en) | Electronic mail duplicate detection | 2010-09-10 | 2010-09-10 |
Country Status (1)
Country | Link |
---|---|
US (1) | US8788500B2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488734A (en) * | 2013-09-17 | 2014-01-01 | 华为技术有限公司 | Data processing method and deduplication engine |
US9805099B2 (en) * | 2014-10-30 | 2017-10-31 | The Johns Hopkins University | Apparatus and method for efficient identification of code similarity |
Patent Citations (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5404488A (en) * | 1990-09-26 | 1995-04-04 | Lotus Development Corporation | Realtime data feed engine for updating an application with the most currently received data from multiple data feeds |
US6510453B1 (en) | 1999-02-26 | 2003-01-21 | Microsoft Corporation | System and method for creating and inserting multiple data fragments into an electronic mail message |
US6820081B1 (en) | 2001-03-19 | 2004-11-16 | Attenex Corporation | System and method for evaluating a structured message store for message redundancy |
US20030105716A1 (en) | 2001-12-03 | 2003-06-05 | Sutton Lorin R. | Reducing duplication of files on a network |
US20120191716A1 (en) * | 2002-06-24 | 2012-07-26 | Nosa Omoigui | System and method for knowledge retrieval, management, delivery and presentation |
US20100169888A1 (en) | 2003-05-21 | 2010-07-01 | Resilient, Inc. | Virtual process collaboration |
US7725475B1 (en) * | 2004-02-11 | 2010-05-25 | Aol Inc. | Simplifying lexicon creation in hybrid duplicate detection and inductive classifier systems |
US8429178B2 (en) * | 2004-02-11 | 2013-04-23 | Facebook, Inc. | Reliability of duplicate document detection algorithms |
US7539871B1 (en) | 2004-02-23 | 2009-05-26 | Sun Microsystems, Inc. | System and method for identifying message propagation |
WO2006008733A2 (en) | 2004-07-21 | 2006-01-26 | Equivio Ltd. | A method for determining near duplicate data objects |
US7574409B2 (en) * | 2004-11-04 | 2009-08-11 | Vericept Corporation | Method, apparatus, and system for clustering and classification |
US8010466B2 (en) * | 2004-11-04 | 2011-08-30 | Tw Vericept Corporation | Method, apparatus, and system for clustering and classification |
US20060095521A1 (en) * | 2004-11-04 | 2006-05-04 | Seth Patinkin | Method, apparatus, and system for clustering and classification |
US20100017487A1 (en) * | 2004-11-04 | 2010-01-21 | Vericept Corporation | Method, apparatus, and system for clustering and classification |
US7716217B2 (en) | 2006-01-13 | 2010-05-11 | Bluespace Software Corporation | Determining relevance of electronic content |
US7743051B1 (en) | 2006-01-23 | 2010-06-22 | Clearwell Systems, Inc. | Methods, systems, and user interface for e-mail search and retrieval |
US20070255803A1 (en) * | 2006-04-28 | 2007-11-01 | Gabe Cherian | X-mail (tm) |
US8200762B2 (en) | 2006-06-01 | 2012-06-12 | Aol Inc. | Displaying complex messaging threads into a single display |
US20080208992A1 (en) | 2007-01-03 | 2008-08-28 | Madnani Rajkumar R | Mechanism for discovering and recovering missing emails in an email conversation |
US20100030798A1 (en) * | 2007-01-23 | 2010-02-04 | Clearwell Systems, Inc. | Systems and Methods for Tagging Emails by Discussions |
US20080183826A1 (en) * | 2007-01-31 | 2008-07-31 | Ranjit Notani | System and Method For Transactional, Addressable Communication |
WO2008137308A1 (en) | 2007-05-03 | 2008-11-13 | Microsoft Corporation | Identifying and correlating electronic mail messages |
US20090012984A1 (en) | 2007-07-02 | 2009-01-08 | Equivio Ltd. | Method for Organizing Large Numbers of Documents |
US20090089383A1 (en) | 2007-09-30 | 2009-04-02 | Tsuen Wan Ngan | System and method for detecting content similarity within emails documents employing selective truncation |
US8266430B1 (en) | 2007-11-29 | 2012-09-11 | Emc Corporation | Selective shredding in a deduplication system |
US8032534B2 (en) | 2007-12-17 | 2011-10-04 | Electronics And Telecommunications Research Institute | Method and system for indexing and searching high-dimensional data using signature file |
US20100287196A1 (en) | 2007-12-21 | 2010-11-11 | Thomas Clay Shields | Automated forensic document signatures |
US8351678B1 (en) * | 2008-06-11 | 2013-01-08 | United Services Automobile Association (Usaa) | Duplicate check detection |
US20090319500A1 (en) | 2008-06-24 | 2009-12-24 | Microsoft Corporation | Scalable lookup-driven entity extraction from indexed document collections |
US20120158728A1 (en) * | 2008-07-29 | 2012-06-21 | Clearwell Systems, Inc. | Systems and methods for tagging emails by discussions |
Non-Patent Citations (7)
Title |
---|
"Encore Discovery Solutions Selects Equivio Technology for Near-Duplicate Detection and Email Thread Analysis", available at http://www.encorelegal.com/pdfs/Equivo-Technology-Press-Release.pdf, Aug. 10, 2009, Phoenix, Arizona. |
Carenini, Giuseppe et al., "Summarizing Email Conversations with Clue Words," WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pp. 91-100, ACM Digital Library. |
Huy, Nguyen The, "Elimination of Redundant Emails", Honour Year Project Report, 2006-2007, National University of Singapore, available at http://www.comp.nus.edu.sg/~wongls/projects/redundant-mails/nguyen-report-4apr07.pdf. |
Wu, Yejun, and Oard, Douglas, W., "Indexing Emails and Email Threads for Retrieval", SIGIR '05, Aug. 15-19, 2005, Salvador, Brazil, available at http://portal.acm.org/citation.cfm?id=1076180&dl=GUIDE&coll=GUIDE&CFID=64252189&CFTOKEN=34781219. |
Yeh, Jen-Yuan and Harnly, Aaron, "Email Thread Reassembly Using Similarity Matching," CEAS 2006-Third Conference on Email and Anti-Spam, Jul. 27-28, 2006, Mountain View, California, USA, 8 pages. |
Zhou, Xiaodong, "Discovering and Summarizing Email Conversations," Thesis, Feb. 2008, 140 pages, The University of British Columbia, Vancouver, Canada. |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11449545B2 (en) * | 2019-05-13 | 2022-09-20 | Snap Inc. | Deduplication of media file search results |
US11899715B2 (en) | 2019-05-13 | 2024-02-13 | Snap Inc. | Deduplication of media files |
Also Published As
Publication number | Publication date |
---|---|
US20120066209A1 (en) | 2012-03-15 |
Similar Documents
Publication | Title |
---|---|
US10454864B2 (en) | Delivering messages from message sources to subscribing recipients |
US8422786B2 (en) | Analyzing documents using stored templates |
CN102999562B (en) | Routing inquiry result |
CN111368013B (en) | Unified identification method, system, equipment and storage medium based on multiple accounts |
US11539726B2 (en) | System and method for generating heuristic rules for identifying spam emails based on fields in headers of emails |
CN112463991B (en) | Historical behavior data processing method and device, computer equipment and storage medium |
US9667737B2 (en) | Publisher-assisted, broker-based caching in a publish-subscription environment |
US20130325863A1 (en) | Data Clustering for Multi-Layer Social Link Analysis |
US8788500B2 (en) | Electronic mail duplicate detection |
US8396877B2 (en) | Method and apparatus for generating a fused view of one or more people |
US20160381154A1 (en) | Predicting Geolocation Of Users On Social Networks |
US11677699B2 (en) | Cognitive pre-loading of referenced content in electronic messages |
CN113886683A (en) | Label cluster construction method and system, storage medium and electronic equipment |
US8898177B2 (en) | E-mail thread hierarchy detection |
US20220284501A1 (en) | Probabilistic determination of compatible content |
CN113807056B (en) | Document name sequence error correction method, device and equipment |
US20190199671A1 (en) | Ad-hoc virtual organization communication platform |
US9426173B2 (en) | System and method for elimination of spam in a data stream according to information density |
US10819622B2 (en) | Batch checkpointing for inter-stream messaging system |
CN112612865A (en) | Document storage method and device based on elastic search |
EP3746964A1 (en) | Automatic image classification in electronic communications |
CN110084710A (en) | Determine the method and device of message subject |
US10970068B2 (en) | Computer structures for computer artifacts |
CN110619086B (en) | Method and apparatus for processing information |
US20170126605A1 (en) | Identifying and merging duplicate messages |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: CONTRACTOR, DANISH; HOSURMATH, MANJULA GOLLA; JOSHI, SACHINDRA; AND OTHERS; SIGNING DATES FROM 20100823 TO 20100909; REEL/FRAME: 025144/0658 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FEPP | Fee payment procedure | Free format text: SURCHARGE FOR LATE PAYMENT, LARGE ENTITY (ORIGINAL EVENT CODE: M1554) |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); Year of fee payment: 4 |
AS | Assignment | Owner name: BREAKWATER SOLUTIONS LLC, DELAWARE; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: INTERNATIONAL BUSINESS MACHINES CORPORATION; REEL/FRAME: 058616/0384; Effective date: 20210120 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
FEPP | Fee payment procedure | Free format text: 7.5 YR SURCHARGE - LATE PMT W/IN 6 MO, SMALL ENTITY (ORIGINAL EVENT CODE: M2555); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2552); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY; Year of fee payment: 8 |