WO2000049517A3 - Multi-document summarization system and method - Google Patents

Multi-document summarization system and method

Info

Publication number
WO2000049517A3
Authority
WO
WIPO (PCT)
Prior art keywords
phrases
phrase intersection
phrase
intersection table
document summarization
Prior art date
Application number
PCT/US2000/004118
Other languages
French (fr)
Other versions
WO2000049517A2 (en)
Inventor
Kathleen R McKeown
Regina Barzilay
Original Assignee
Univ Columbia
Kathleen R McKeown
Regina Barzilay
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Columbia, Kathleen R McKeown, Regina Barzilay
Priority to CA2363017A (CA2363017C)
Priority to IL14495100A (IL144951A0)
Priority to US09/913,745 (US7366711B1)
Priority to AU40026/00A (AU775978B2)
Priority to EP00919318A (EP1190343A4)
Publication of WO2000049517A2
Publication of WO2000049517A3
Priority to IL144951A (IL144951A)
Priority to HK02106992.3A (HK1045391A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation

Abstract

A summary of a collection of related documents can be generated by extracting from the documents phrases (100) that include common focus elements. Phrase intersection analysis (120) is then performed on the extracted phrases to generate a phrase intersection table, in which identical or equivalent phrases are identified. Temporal processing (140) is performed on the phrases in the phrase intersection table to remove ambiguous time references and to sort the phrases into a temporal sequence. Sentence generation is then used to combine the phrases in the phrase intersection table into a coherent summary.
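The four stages named in the abstract can be illustrated with a toy sketch. The abstract does not say how any stage is implemented, so everything below is an assumption made for illustration: whole sentences stand in for phrases, "equivalent" is approximated by an identical set of content words, temporal processing handles only the words "yesterday" and "today", and sentence generation is plain concatenation. All names (`Phrase`, `extract_phrases`, `build_intersection_table`, `resolve_time`, `summarize`) are hypothetical and do not come from the patent.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

STOP_WORDS = {"the", "a", "an", "on", "in", "of", "was", "were"}
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0}  # days relative to the document date


@dataclass(frozen=True)
class Phrase:
    text: str
    doc_id: str
    doc_date: date


def extract_phrases(docs, focus_terms):
    """Stage 1 (assumed form): keep sentences that mention a focus term."""
    phrases = []
    for doc_id, (doc_date, text) in docs.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            if words & focus_terms:
                phrases.append(Phrase(sentence.rstrip(".!?"), doc_id, doc_date))
    return phrases


def equivalence_key(text):
    """Crude equivalence test: same content words, ignoring time words."""
    words = re.findall(r"[a-z]+", text.lower())
    return frozenset(w for w in words
                     if w not in STOP_WORDS and w not in RELATIVE_OFFSETS)


def build_intersection_table(phrases):
    """Stage 2 (assumed form): group equivalent phrases, keep those found in 2+ documents."""
    table = {}
    for phrase in phrases:
        table.setdefault(equivalence_key(phrase.text), []).append(phrase)
    return {key: group for key, group in table.items()
            if len({p.doc_id for p in group}) >= 2}


def resolve_time(phrase):
    """Stage 3 helper: replace a relative time word with an absolute date."""
    for word, offset in RELATIVE_OFFSETS.items():
        pattern = rf"\b{word}\b"
        if re.search(pattern, phrase.text, flags=re.IGNORECASE):
            event_date = phrase.doc_date + timedelta(days=offset)
            text = re.sub(pattern, f"on {event_date.isoformat()}",
                          phrase.text, flags=re.IGNORECASE)
            return event_date, text
    return phrase.doc_date, phrase.text


def summarize(docs, focus_terms):
    """Run the four stages; stage 4 here is simple concatenation."""
    table = build_intersection_table(extract_phrases(docs, focus_terms))
    dated = sorted(resolve_time(group[0]) for group in table.values())
    return " ".join(text + "." for _, text in dated)


docs = {
    "d1": (date(2000, 2, 18), "The court approved the merger yesterday. Shares rose sharply."),
    "d2": (date(2000, 2, 17), "The court approved the merger today. Analysts were surprised."),
    "d3": (date(2000, 2, 19), "Regulators opened an inquiry today. The weather was mild."),
    "d4": (date(2000, 2, 20), "Regulators opened an inquiry yesterday."),
}
print(summarize(docs, {"court", "merger", "regulators", "inquiry"}))
```

In this sketch, "yesterday" in a document dated 2000-02-18 and "today" in a document dated 2000-02-17 resolve to the same absolute date, which is the kind of ambiguity the temporal processing stage is described as removing before the phrases are ordered.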
PCT/US2000/004118 1999-02-19 2000-02-18 Multi-document summarization system and method WO2000049517A2 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
CA2363017A CA2363017C (en) 1999-02-19 2000-02-18 Multi-document summarization system and method
IL14495100A IL144951A0 (en) 1999-02-19 2000-02-18 Multi-document summarization system and method
US09/913,745 US7366711B1 (en) 1999-02-19 2000-02-18 Multi-document summarization system and method
AU40026/00A AU775978B2 (en) 1999-02-19 2000-02-18 Multi-document summarization system and method
EP00919318A EP1190343A4 (en) 1999-02-19 2000-02-18 Multi-document summarization system and method
IL144951A IL144951A (en) 1999-02-19 2001-08-16 Multi-document summarization system and method
HK02106992.3A HK1045391A1 (en) 1999-02-19 2002-09-25 Multi-document summarization system and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12065999P 1999-02-19 1999-02-19
US60/120,659 1999-02-19

Publications (2)

Publication Number Publication Date
WO2000049517A2 WO2000049517A2 (en) 2000-08-24
WO2000049517A3 WO2000049517A3 (en) 2000-11-30

Family

ID=22391735

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/004118 WO2000049517A2 (en) 1999-02-19 2000-02-18 Multi-document summarization system and method

Country Status (6)

Country Link
EP (1) EP1190343A4 (en)
AU (1) AU775978B2 (en)
CA (1) CA2363017C (en)
HK (1) HK1045391A1 (en)
IL (2) IL144951A0 (en)
WO (1) WO2000049517A2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7027974B1 (en) * 2000-10-27 2006-04-11 Science Applications International Corporation Ontology-based parser for natural language processing
US6766316B2 (en) 2001-01-18 2004-07-20 Science Applications International Corporation Method and system of ranking and clustering for document indexing and retrieval
US7818117B2 (en) * 2007-06-20 2010-10-19 Amadeus S.A.S. System and method for integrating and displaying travel advices gathered from a plurality of reliable sources
US11374888B2 (en) 2015-09-25 2022-06-28 Microsoft Technology Licensing, Llc User-defined notification templates

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4965763A (en) * 1987-03-03 1990-10-23 International Business Machines Corporation Computer method for automatic extraction of commonly specified information from business correspondence
US5077668A (en) * 1988-09-30 1991-12-31 Kabushiki Kaisha Toshiba Method and apparatus for producing an abstract of a document
US5297027A (en) * 1990-05-11 1994-03-22 Hitachi, Ltd. Method of and apparatus for promoting the understanding of a text by using an abstract of that text
US5384703A (en) * 1993-07-02 1995-01-24 Xerox Corporation Method and apparatus for summarizing documents according to theme
US5638543A (en) * 1993-06-03 1997-06-10 Xerox Corporation Method and apparatus for automatic document summarization
US5689716A (en) * 1995-04-14 1997-11-18 Xerox Corporation Automatic method of generating thematic summaries
US5778397A (en) * 1995-06-28 1998-07-07 Xerox Corporation Automatic method of generating feature probabilities for automatic extracting summarization
US5838323A (en) * 1995-09-29 1998-11-17 Apple Computer, Inc. Document summary computer system user interface
US5848191A (en) * 1995-12-14 1998-12-08 Xerox Corporation Automatic method of generating thematic summaries from a document image without performing character recognition
US5924108A (en) * 1996-03-29 1999-07-13 Microsoft Corporation Document summarizer for word processors

Also Published As

Publication number Publication date
HK1045391A1 (en) 2002-11-22
IL144951A (en) 2006-08-01
CA2363017A1 (en) 2000-08-24
WO2000049517A2 (en) 2000-08-24
AU4002600A (en) 2000-09-04
EP1190343A4 (en) 2006-08-09
CA2363017C (en) 2011-04-19
EP1190343A2 (en) 2002-03-27
IL144951A0 (en) 2002-06-30
AU775978B2 (en) 2004-08-19

Similar Documents

Publication Publication Date Title
EP0805403A3 (en) Translating apparatus and translating method
Tang et al. A cascade method for detecting hedges and their scope in natural language text
EP1217533A3 (en) Method and computer system for part-of-speech tagging of incomplete sentences
EP1227409A3 (en) Extracting sentence translations from translated documents
CA2236623A1 (en) Method and apparatus for automatically identifying key words within a document
Ahrenberg et al. Evaluation of Word Alignment Systems.
EP1271355A3 (en) Auto-index method
WO1997038376A3 (en) A system, software and method for locating information in a collection of text-based information sources
Nirenburg et al. A full-text experiment in example-based machine translation
Pal et al. Automatic building and using parallel resources for SMT from comparable corpora
ATE362141T1 (en) CREATION AND EVALUATION OF THE USEFULNESS OF A MULTI-TRAIT BASED CLASSIFICATION SYSTEM USING GENETIC ALGORITHMS
Villavicencio The availability of verb–particle constructions in lexical resources: How much is enough?
Heid A linguistic bootstrapping approach to the extraction of term candidates from German text
WO2000049517A3 (en) Multi-document summarization system and method
Wilkinson Chinese document retrieval at TREC-6
Qu et al. Resolving translation ambiguity using monolingual corpora
Tesema et al. Towards the sense disambiguation of Afan Oromo words using hybrid approach (unsupervised machine learning and rule based)
van Halteren New feature sets for summarization by sentence extraction
Pal et al. Role of paraphrases in pb-smt
Becks et al. Phrases or Terms? The Impact of Different Query Types.
Yakushiji et al. Use of a full parser for information extraction in molecular biology domain
Pal et al. The University of Edinburgh’s Bengali-Hindi submissions to the WMT21 news translation task
van der Plas et al. Automatic acquisition of synonyms for French using parallel corpora
Castelli et al. Mining parallel data from comparable corpora via triangulation
Al-Shammari A novel algorithm for normalizing noisy Arabic text

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 144951

Country of ref document: IL

ENP Entry into the national phase

Ref document number: 2363017

Country of ref document: CA

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: IN/PCT/2001/00737/DE

Country of ref document: IN

WWE WIPO information: entry into national phase

Ref document number: 2000919318

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 09913745

Country of ref document: US

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP WIPO information: published in national office

Ref document number: 2000919318

Country of ref document: EP