WO2001001289A1 - Semantic processor and method with knowledge analysis of and extraction from natural language documents

Publication number
WO2001001289A1
Authority
WO
WIPO (PCT)
Prior art keywords
sao
storing
subject
association
unit
Prior art date
Application number
PCT/US2000/017444
Other languages
French (fr)
Other versions
WO2001001289A8 (en
Inventor
Valery Tsourikov
Leonid Batchilo
Igor Sovpel
Original Assignee
Invention Machine Corporation, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Invention Machine Corporation, Inc. filed Critical Invention Machine Corporation, Inc.
Priority to AU56370/00A priority Critical patent/AU5637000A/en
Priority to EP00941702A priority patent/EP1208457A1/en
Publication of WO2001001289A1 publication Critical patent/WO2001001289A1/en
Publication of WO2001001289A8 publication Critical patent/WO2001001289A8/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

A method of semantically processing natural language representations in a general purpose computer including retrieving from remote and local databases (12, 14) and storing representations of the texts of a plurality of natural language documents, formatting said representations and storing the formatted representation (18), identifying and extracting from the formatted representation subject-action-object (SAO) extractions and storing the SAO extractions (20), processing the SAO extractions into normalized SAO structures and storing the SAO structures (22), designating the AO portions as substantially the names of Folders of at least some of the SAO structures, and storing in association with each Folder name the identity of one or more subject portions (S1, S2, ...Sn) that are associated with the respective AO portion of stored SAO structures. The method further includes storing in association with each respective (S1, S2, ... Sn) the full sentence in which the respective SAO appears and highlighting each S-A-O portion that appears in each said full sentence. The list of subjects (S1, S2 ...Sn) stored in association with a respective AO portion is displayed in response to the user selecting the displayed AO portion or Folder name. If desired, the retrieved and processed documents can relate to a user-entered criterion.

Description

TITLE: SEMANTIC PROCESSOR AND METHOD WITH KNOWLEDGE ANALYSIS OF AND EXTRACTION FROM NATURAL LANGUAGE DOCUMENTS
REFERENCE TO PRIOR APPLICATION:
This is a continuation-in-part of US Patent Application SN 09/321,804, filed May 27, 1999, which matured from US Provisional Patent Application SN 60/099641,
filed September 9, 1998, both applications of which are incorporated herein by reference.
BACKGROUND:
The present invention relates to natural language processing systems, and more specifically to a method and system for converting natural language texts into a Subject-Action-Object Knowledge Database (SAO KB). This database can form the heart of various new applications or methods of natural language processing and analysis.
Different implementations of parsers are included in known natural language processors, such as the Ergo Linguistic Technologies parser (U.S. Pat. No. 5,878,385), which has the following features: Part-of-Speech (POS) identification; Parts of Sentences identification; Passive to Active and Active to Passive mode conversion; Statement to Question conversion; Sentence Type identification; Tense conversion. A functional Dependency Grammar Parser is also known (Pasi Tapanainen, Timo Jarvinen, A Non-Projective Dependency Parser: Proceedings of Fifth Conference on Applied Natural Language Processing, Washington, D.C., 1997), which builds the dependency tree of a sentence (i.e. which word depends on which word in the sentence) and extracts parts of the sentence. Syntactical parsing is also used in U.S. Patent Nos. 5,060,155 (J.M. van Juijlen) and 5,424,947 (K. Nagao, et al). Another useful technique includes Parts-Of-Speech Tagging (POST), examples of which are disclosed in U.S. Patent No. 5,146,405 to K.W. Church, and U.S. Patent No. 4,887,212 to A. Zamora, et al.
Prior art systems such as these and others analyze the text mostly grammatically or syntactically and do not have significant ability to consider the semantics of natural language. Subjects, verb chains and objects of the sentence are extracted syntactically but not semantically. As a result, semantic actions (verb chains) can be recognized only if they are described by finite verbs and generally cannot be recognized if the actions are described by non-finite verb forms, like Infinitive, Participle I, Participle II, Gerund and Verbal Nouns. Because prior known systems or methods lack a meaningful semantic analysis capability, the problem of recognition of semantic relations for subjects, actions and objects that go beyond one sentence and the problem of hierarchical and synonymical relations between triplets are not addressed by the prior art. Even grammatical analysis is often incomplete for complex sentences and sentences with unknown words (not listed in the dictionary in advance). Furthermore, prior art systems are often too inaccurate, extremely slow, and not suitable for working with considerable amounts of text, which limits their value for industrial or commercial level applications.
SUMMARY OF EXEMPLARY EMBODIMENT OF THE PRESENT INVENTION
One exemplary embodiment according to the principles of the present invention includes a system or method of semantically processing natural language representations in a general purpose computer including entering and storing a user criterion, entering into a first storage area representations of the texts of a plurality of natural language documents that have some relationship with the stored user criterion, formatting said representations and storing the formatted text in a second storage area, identifying and extracting from the second storage area subject-action-object (SAO) triplets and storing the SAO extractions in a third storage area, lemmatizing and storing the SAO extractions, generating a Problem Folder for each stored lemmatized SAO extraction and designating the AO portion as the name of the Problem Folder, and storing in the Problem Folder a list of one or more possible solutions comprising the subject portions of the one or more stored lemmatized SAO triplets.
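The Problem Folder idea can be illustrated with a minimal Python sketch. It assumes that lemmatized SAO triplets are already available as plain (subject, action, object) tuples; the function name `build_problem_folders` and the sample triplets are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def build_problem_folders(sao_triplets):
    """Group lemmatized (subject, action, object) triplets by their AO portion."""
    folders = defaultdict(list)
    for subject, action, obj in sao_triplets:
        name = f"{action} {obj}"          # the AO portion serves as the folder name
        if subject not in folders[name]:  # list each possible solution only once
            folders[name].append(subject)
    return dict(folders)


# Hypothetical sample data, used only for illustration.
triplets = [
    ("device", "move", "air"),
    ("pump", "move", "air"),
    ("filter", "clean", "water"),
    ("device", "move", "air"),
]
folders = build_problem_folders(triplets)
```

In this sketch each folder maps a problem (the action plus its object) to the subjects that could solve it, which is one plausible in-memory shape for the structure described above.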
DRAWINGS:
These and other objects, features and benefits of the method and system according to the principles of the present invention will become apparent with the following detailed description when taken in view of the appended drawings, in which:
Figure 1 is a pictorial representation of one exemplary embodiment of the system according to the principles of the present invention.
Figure 2 is a schematic representation of the main architectural elements of the system and functional links according to the present invention.
Figure 3 is a structural and functional schematic representation of Unit 18 of Figure 2.
Figure 4 is a structural and functional schematic representation of Unit 20 of Figure 2.
Figure 5 is a structural and functional schematic representation of Unit 22 of Figure 2.
Figure 6 is a schematic representation of Unit 42 of Figure 4.
Figure 7 is a schematic representation of Unit 44 of Figure 4.
Figure 8 is a schematic representation of Unit 46 of Figure 4.
Figure 9 is a schematic representation of Unit 26 of Figure 2.
Figure 10 is a typical example of the text to be semantically processed.
Figure 11 is a representation of formatted text of Figure 10.
Figure 12 is a representation of error corrected text of Figure 11.
Figure 13 is a representation of word-split text of Figure 12.
Figure 14 is a representation of sentence-split text of Figure 13.
Figure 15 is a representation of tagged text of Figure 14.
Figure 16 is a representation of parsed text of Figure 15.
Figure 17 is a representation of SAO DB extracted from parsed text of Figure 16.
Figure 18 is a representation of lemmatized SAO DB of Figure 17.
Figure 19 is a typical example of relevant SAO DB entry of Figure 18.
Figure 20 is a representation of a Problem Folder generated in response to the relevant SAO DB of Figure 18.
Figure 21A is a representation of three original input texts from various sources.
Figure 21B is a representation of the output structured SAO KB resulting from process step 60.
DETAILED DESCRIPTION OF AN EXEMPLARY EMBODIMENT
Note the glossary at the end of this detailed description which will assist the reader.
One exemplary embodiment of the SAO Semantic Processor according to the principles of the present invention includes (Fig. 1) a CPU 4 with a modem and/or cable box 5, which could comprise a general purpose computer or networked server or minicomputer with standard user input and output devices such as keyboard 10, mouse 8, printer 6 and monitor 2 and/or other user data entry device 9.
With reference to Figures 2-9, the SAO Semantic Processor (Fig. 2) includes a database 16 of original documents for receiving and storing documents downloaded from the web 14 or local database 12 or generated as a user request text with the use of keyboard 10 or other input devices, such as a scanner or voice recognition device.
Preformator 18 receives the document data 28 from the database 16, removes formatting symbols and other symbols that are not part of natural language text (Unit 30), automatically identifies and corrects mismatches and mistakes (Unit 32), divides the text into words (Unit 34) and sentences (Unit 36) and supplies the output 38 to the SAO Extractor 20.
Unit 30 in Preformator 18 (Fig. 3) removes from the input text 28 all formatting, images and tables, converts different text formats to ASCII text, and recognizes unknown words not stored in one of the dictionaries described below. The Preformator's Unit 32 detects in the input text 28 all the spelling errors in accordance with four basic types of errors (substitution, omission, transposition, insertion) and corrects them automatically. The Preformator also splits the text into words (Unit 34) and sentences. Complex sentences are divided or organized into simple sentences (Unit 36).
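The four basic error types can be made concrete with a short Python sketch that generates every string one edit away from a misspelled word and keeps those found in a dictionary. This is a generic single-edit candidate generator, not the patent's own algorithm; `single_edit_candidates`, `correct` and `TOY_DICTIONARY` are hypothetical names, and choosing among several candidates is left out.

```python
import string


def single_edit_candidates(word, alphabet=string.ascii_lowercase):
    """Return all strings that differ from `word` by exactly one basic edit."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Deleting a letter repairs an insertion error.
    deletes = {a + b[1:] for a, b in splits if b}
    # Swapping neighbours repairs a transposition error.
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    # Replacing a letter repairs a substitution error.
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    # Adding a letter repairs an omission error.
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def correct(word, dictionary):
    """Return dictionary words reachable from `word` by at most one edit."""
    if word in dictionary:
        return [word]
    return sorted(single_edit_candidates(word) & dictionary)


# Hypothetical toy dictionary, used only for illustration.
TOY_DICTIONARY = {"pressure", "device", "moves", "air", "lumen"}
```

For instance, `correct("deivce", TOY_DICTIONARY)` finds "device" through the transposition branch. A fuller implementation would also need to rank competing candidates.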
SAO Extractor Unit 20 (Figure 4) tags the text with part-of-speech tags (Unit 42), parses (Unit 44) the text 40 syntactically, recognizes Subjects, Actions and Objects, their attributes, and Cause-Effect relations between SAO-triplets, and builds the Syntactical Tree of each sentence of the text 40 (Unit 46), which it then outputs to the SAO Editor (Unit 22). The Preformator supplies the Formatted text 38 to the input of the SAO Extractor (Fig. 4).
The SAO Extractor uses the Linguistic Knowledge Base in order to tag the Formatted text 40 with part-of-speech tags (Unit 42). There are preferably three stages in the POS tagging process.
First, a context-independent analysis module (Unit 68) assigns each word of the text 66 a set of one or more part-of-speech tags. Then the context-dependent disambiguation module, based on a statistical Hidden Markov Model algorithm, assigns each word of the text a unique part-of-speech tag (Unit 70). Next, Unit 72 uses a rule-based POS tagging module to correct the output of Unit 70 and to recognize unknown words, forming the POS-tagged text 74 as the output. Then the SAO Extractor performs the Text Parsing (Unit 44, Figure 4), which includes the steps presented in Figure 7. These steps are verb group (Unit 80) and noun group (Unit 82) extraction and Functional Tree construction (Unit 84).
The parsed text 86 is supplied to Unit 46, which extracts SAOs from the Parsed text 90 (Fig. 8). At first, SAOs with finite verbs as Actions are extracted, where the Action type recognized in Unit 100 enables Unit 102 to extract Subject and Object from the right or the left part of the Action. Then SAOs with Actions as non-finite verb forms are recognized in Unit 104, and SAOs with Actions as verbal nouns are recognized in Unit 106. All Subject and Object attributes (location, composition, etc.) are recognized in Unit 108. Next, Unit 109 recognizes Cause-Effect relations for SAO-triplets. As a result, the SAO Extractor 20 outputs the Functionally analyzed text 112, which serves as input to the SAO Editor (Unit 22).
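The first extraction step (finite verb as Action, Subject and Object taken from its left and right) can be sketched in Python as follows. This is a deliberately naive illustration that works on a flat list of (word, tag) pairs rather than on a parsed tree; the tag sets `FINITE_VERB_TAGS` and `NOUN_GROUP_TAGS`, the function names, and the choice of Subject on the left and Object on the right (active voice) are all assumptions.

```python
# Assumed tag names; the patent's own classifier may use different codes.
FINITE_VERB_TAGS = {"VBZ", "VBP", "VBD"}
NOUN_GROUP_TAGS = {"DT", "JJ", "NN", "NNS"}


def noun_group(tagged, start, step):
    """Collect a contiguous run of noun-group words from `start`, moving by `step`."""
    words = []
    i = start
    while 0 <= i < len(tagged) and tagged[i][1] in NOUN_GROUP_TAGS:
        words.append(tagged[i][0])
        i += step
    return " ".join(words if step > 0 else reversed(words))


def extract_sao(tagged):
    """Return (subject, action, object) around the first finite verb, or None."""
    for i, (word, tag) in enumerate(tagged):
        if tag in FINITE_VERB_TAGS:
            subject = noun_group(tagged, i - 1, -1)  # noun group left of the Action
            obj = noun_group(tagged, i + 1, +1)      # noun group right of the Action
            if subject and obj:
                return (subject, word, obj)
    return None


tagged_sentence = [
    ("The", "DT"), ("pressure-sensitive", "JJ"), ("device", "NN"),
    ("moves", "VBZ"), ("the", "DT"), ("air", "NN"),
    ("through", "IN"), ("the", "DT"), ("conducting", "JJ"), ("lumen", "NN"),
]
sao = extract_sao(tagged_sentence)
```

Non-finite verb forms, verbal nouns, attributes and Cause-Effect relations, which the text assigns to Units 104 through 109, are outside the scope of this sketch.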
SAO Editor Unit 22 (Figures 2 and 5) performs the lemmatization of Actions (Unit 54), Subjects and Objects (Unit 56), filters SAOs (Unit 58), structures them (Unit 60) and forms the SAO Knowledge Base (Unit 24). In Action lemmatization (Unit 54, Fig. 5), the SAO Editor converts all Actions to the infinitive; in Subject and Object lemmatization (Unit 56), Subject and Object are represented or described by noun(s). For example:
sets → set
sets of processors → set (of) processor
In addition to all the above mentioned features, the SAO Editor provides the possibility to filter SAOs (Unit 58), i.e. to remove from the SAO database (Unit 52) SAOs that are not relevant to the analyzed text, and to structure SAOs (Unit 60), i.e. to set the relevant SAOs in synonymical and hierarchical structures. The resulting SAO Knowledge Base (Unit 24) includes the SAO database and various tools for analyzing SAOs and building synonymical and hierarchical relations.
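The lemmatization examples can be reproduced with a small Python sketch. It assumes simple lookup tables in place of real morphological analysis; `NOUN_LEMMAS`, `VERB_LEMMAS`, `LINKING_WORDS` and the function names are hypothetical.

```python
# Hypothetical lookup tables standing in for real morphological analysis.
NOUN_LEMMAS = {"sets": "set", "processors": "processor"}
VERB_LEMMAS = {"moves": "move", "moved": "move", "contains": "contain"}
LINKING_WORDS = {"of"}


def lemmatize_action(action):
    """Convert an Action to its infinitive form when it is known."""
    return VERB_LEMMAS.get(action.lower(), action.lower())


def lemmatize_noun_group(group):
    """Reduce each noun to its base form and bracket linking words."""
    out = []
    for word in group.lower().split():
        if word in LINKING_WORDS:
            out.append(f"({word})")
        else:
            out.append(NOUN_LEMMAS.get(word, word))
    return " ".join(out)


def lemmatize_sao(sao):
    """Lemmatize all three parts of a (subject, action, object) triplet."""
    subject, action, obj = sao
    return (lemmatize_noun_group(subject),
            lemmatize_action(action),
            lemmatize_noun_group(obj))
```

Normalizing triplets this way lets differently inflected mentions of the same fact collapse to one entry, which is what makes later filtering and grouping practical.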
System modules (Units 18, 20, 22) access the Linguistic Knowledge Base (Unit 26), one example of whose structure is presented in Figure 9. The Linguistic Knowledge Base includes a Database section (1) and a Database of Recognizing Linguistic Models section (2), which describes algorithms for recognizing linguistic objects and relations in the text.
Preformator (Unit 18) accesses and is controlled by information stored in blocks (3), (4), (5), (10), (12), (13), (14). SAO Extractor (Unit 20) accesses and is controlled by information stored in blocks (3), (4), (5), (6), (8), (9), (10), (11), (12), (15), (16), (17), (18), (19), (21), (22). SAO Editor (Unit 22) accesses and is controlled by information stored in blocks (4), (12), (20).
The method and apparatus of the present invention provide the user with the possibility of automatically extracting World Knowledge from text and storing it in the form of SAOs. The SAOs can be lemmatized and unified into complex hierarchical structures using their attributes and meanings, which in turn can help extract other types of knowledge that use natural language facts and dependencies to reflect real world regularities and information.
The Linguistic KB (Figure 9) will now be described.
Classifier (3)
The classifier contains a list of tags which are traditionally called part-of-speech tags. The list includes tags for nouns, verbs, adjectives, adverbs, prepositions, etc. But these tags should be more detailed than traditional parts of speech. For example, words that combine noun and adverb functions have a separate tag. Punctuation symbols also have appropriate tags. Other tag classes are names of devices, systems, enterprises that can be treated as semantic tags. See US Pat. No. 4,868,750 (Kucera et al.) and US Pat. No. 4,887,212 (Zamora et al.) for examples of implementation of a classified database.
Examples of tags:
NN — common noun, singular
NNS — common noun, plural
These part-of-speech tags are used for assigning each word in the Main Dictionary (4) a set of tags that it can have. For example, as the word "well" can be a noun (NN), an adverb (RB), and an adjective (JJ), the dictionary entry can be the following:
Well — RB, NN, JJ
One suitable list of codes is published in Publication No. 1, which discloses 154 codes in the list.
Main Dictionary (4)
In the Main Dictionary, each stored word is linked with a set of part-of-speech tags that it can have in the text, for example,
Absorb — VB
Abstract — JJ, NN, VB
Well — RB, NN, JJ
It is used for spelling correction, automatic part-of-speech tagging, syntactic analysis, semantic analysis, and anywhere else part-of-speech analysis is used. If we describe the text as a chain of words
$w_1\ w_2\ w_3\ \ldots\ w_n$
then after a first look-up in the Main Dictionary we obtain:
$w_1 K_1\ w_2 K_2\ w_3 K_3\ \ldots\ w_n K_n$
where $K_i$, $i = 1, \ldots, n$, is the set of tags from the Main Dictionary for the word $w_i$. In one exemplary embodiment of the present invention, the Main Dictionary contains about 60,000 English words. Further details of a Main Dictionary are published in Publication No. 1.
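The first look-up can be sketched in Python as a mapping from each word to its set of possible tags. The three entries come from the examples above; the names `MAIN_DICTIONARY` and `lookup_tags`, the lower-casing of keys, and the empty set for unknown words are assumptions made for illustration.

```python
# Entries taken from the examples in the text; keys are lower-cased by assumption.
MAIN_DICTIONARY = {
    "absorb": {"VB"},
    "abstract": {"JJ", "NN", "VB"},
    "well": {"RB", "NN", "JJ"},
}


def lookup_tags(words, dictionary=MAIN_DICTIONARY):
    """Pair each word with its set of possible tags; unknown words get an empty set."""
    return [(word, dictionary.get(word.lower(), set())) for word in words]


tagged = lookup_tags(["Well", "absorb", "lumen"])
```

Words that come back with more than one tag are the ones a later disambiguation stage has to resolve from context.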
Dictionary of Abbreviation (5)
This is a database list of abbreviations. Each abbreviation is assigned one part-of-speech tag, e.g.
A.C. — NNU where AC means alternating current
i.e. — RB
Because abbreviations act just like ordinary words in the text, the abbreviations dictionary is quite similar to the Main Dictionary.
Idiomatic Dictionary (6)
This is a database list of idioms. The Idiomatic Dictionary comprises set expressions and idioms. Each idiom or unit is assigned a part-of-speech tag or a set of part-of-speech tags, e.g.
go into detail — VB
a great deal of — ABL
In one exemplary embodiment of the present invention, the Idiomatic Dictionary contains 2200 idioms. It is well known that part-of-speech properties of idioms cannot be obtained by analyzing the words that constitute idioms. So, the use of idioms can dramatically improve the accuracy of part-of-speech tagging.
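One way an idiom dictionary could be applied to a word sequence is a greedy longest-match scan, sketched below in Python. The greedy strategy, the names `IDIOMS` and `mark_idioms`, and the third dictionary entry (added only so that two idioms share a first word) are assumptions, not details confirmed by this section.

```python
IDIOMS = {
    ("go", "into", "detail"): "VB",
    ("a", "great", "deal", "of"): "ABL",
    ("a", "great", "deal"): "RB",  # hypothetical entry, present only to exercise longest-match
}
MAX_IDIOM_LEN = max(len(key) for key in IDIOMS)


def mark_idioms(words):
    """Replace each recognized idiom with one (text, tag) unit; other words get tag None."""
    out = []
    i = 0
    while i < len(words):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_IDIOM_LEN, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in IDIOMS:
                out.append((" ".join(words[i:i + n]), IDIOMS[key]))
                i += n
                break
        else:
            out.append((words[i], None))
            i += 1
    return out


marked = mark_idioms("We spent a great deal of time".split())
```

Treating the whole idiom as one tagged unit keeps its component words from being tagged individually, which is the accuracy benefit the paragraph above describes.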
Dictionary of Parameters (7)
The Dictionary of Parameters is a database list of parameters that characterize objects of the outer world, i.e. inherent properties (features) of an object measured by a numerical value, e.g. weight, temperature, density, current strength, etc. In one exemplary embodiment of the present invention, this Dictionary contains about 1250 parameters.
Syntactic Classifier (8)
This is a database of the Syntactic Classifier of lexical items and relations. It includes syntactic classes (codes), which are used for classification of structural elements of syntactically analyzed sentences and which are optimized for further SAO extraction.
One of the most widely used syntactic classifiers of this kind is described in the Penn Treebank Project [Bracketing guidelines for Treebank II style Penn Treebank Project, January 1995, University of Pennsylvania]. More information about the Penn Treebank Project can be found at http://www.cis.upenn.edu/~treebank/home.html. Another example of a syntactic classifier can be found in U.S. Patent No. 5,878,385. Syntactic classifier (8) differs from the prior art classifiers because it includes new unique codes, for example, the following codes:
w_NN — code for noun group
w_VBN_XX — code for one verb chain pattern
w_Sentence — code for sentence
These codes enable generation or formatting of certain linguistic structures (noun groups, verb chain patterns and sentences as basic elements of semantic analysis) that are important for further SAO extraction.
Semantic Classifier (9)
This is a database list of the Semantic Classifier of lexical items and relations. It includes semantic classes (codes) of SAO triplets, their components S, A, O, A-O and their attributes and relations, including cause-effect, object-parameter and other relations. It is used for classifying structural elements of sentence trees.
Probabilistic Grammar (10)
This is a database of the Probabilistic Grammar of Natural Language. This Grammar uses the main lexicon, the Idiomatic Dictionary and abbreviations, an algorithm for looking up the main lexicon, the Idiomatic Dictionary and abbreviations, and also an algorithm for word part-of-speech disambiguation, i.e. determining a word's part of speech using context. This Probabilistic Grammar provides means for automatically annotating the text with part-of-speech information (Units 66, 68, 70). The algorithm is based on the known Hidden Markov Model and uses statistical data from block 12.
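The context-based disambiguation described for the Probabilistic Grammar can be illustrated with a small Viterbi decoder over a hidden Markov model. This is a minimal sketch under stated assumptions, not the implementation of Unit 70: the three-tag set, the start and transition probabilities (standing in for a Statistical Matrix Code-Code) and the emission probabilities (standing in for a Dictionary Word-Code-Frequency) are invented toy values, and `viterbi` is a hypothetical name.

```python
import math

# Toy statistics; every number below is an assumption for illustration only.
TAGS = ["AT", "NN", "VB"]                      # article, noun, verb
START = {"AT": 0.6, "NN": 0.3, "VB": 0.1}      # P(tag at sentence start)
TRANS = {                                      # P(next tag | current tag)
    "AT": {"AT": 0.01, "NN": 0.94, "VB": 0.05},
    "NN": {"AT": 0.10, "NN": 0.30, "VB": 0.60},
    "VB": {"AT": 0.50, "NN": 0.40, "VB": 0.10},
}
EMIT = {                                       # P(word | tag)
    "AT": {"the": 0.9},
    "NN": {"run": 0.3, "dogs": 0.4},
    "VB": {"run": 0.6, "dogs": 0.05},
}
FLOOR = 1e-6                                   # probability used for unseen events


def _log(p):
    """Log-probability with a floor so unseen events do not yield log(0)."""
    return math.log(max(p, FLOOR))


def viterbi(words):
    """Return the most probable tag sequence for a list of lowercase words."""
    if not words:
        return []
    # best[tag] = (log-probability of the best path ending in tag, that path)
    best = {t: (_log(START[t]) + _log(EMIT[t].get(words[0], 0.0)), [t]) for t in TAGS}
    for word in words[1:]:
        step = {}
        for tag in TAGS:
            score, path = max(
                (best[prev][0] + _log(TRANS[prev][tag]), best[prev][1]) for prev in TAGS
            )
            step[tag] = (score + _log(EMIT[tag].get(word, 0.0)), path + [tag])
        best = step
    return max(best.values())[1]
```

With these toy tables, "run" is tagged as a noun after "the" and as a verb after "dogs", even though its emission probability is higher under VB in both cases; the transition statistics supply the context.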
Rule-Based Grammar (11)
This is a database of the Rule-Based Part-of-Speech Tagging module. It includes rules and a rules processor which detects erroneous output of Unit 70 and corrects it. An example of a rule is: if Unit 70 outputs a sequence of an article, such as "a" or "the", and a verb, this sequence should be replaced by the sequence of an article and a noun.
This Rule-Based Grammar is used as the final step of the part-of-speech tagging process.
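The example rule above (an article followed by a verb is retagged as an article followed by a noun) can be sketched as a single pass over tagged tokens. The function name `fix_article_verb`, the article list and the set of verb tags are assumptions chosen for illustration; the actual rule set and rules processor are not given in this description.

```python
ARTICLES = {"a", "an", "the"}
VERB_TAGS = {"VB", "VBD", "VBN", "VBZ"}   # assumed subset of verb tags


def fix_article_verb(tagged):
    """Retag as NN any verb-tagged word that directly follows an article.

    `tagged` is a list of (word, tag) pairs; a corrected copy is returned.
    """
    corrected = list(tagged)
    for i in range(1, len(corrected)):
        previous_word = corrected[i - 1][0]
        word, tag = corrected[i]
        if previous_word.lower() in ARTICLES and tag in VERB_TAGS:
            corrected[i] = (word, "NN")
    return corrected
```

For instance, the tagger output `[("the", "AT"), ("run", "VB")]` would be corrected to `[("the", "AT"), ("run", "NN")]`, while verbs that do not follow an article are left unchanged.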
Linguistic Facts (12)
The Linguistic Facts module contains the Filters Database, the Dictionary Word-Code-Frequency, the Statistical Matrix Code-Code and so on. The Filters Database includes a list of lexical items and their codes which are considered to be non-informative by knowledge engineers. This information is used by the SAO Editor (Fig. 5), which checks whether it should include a given SAO in the SAO Knowledge Base. The other above-mentioned components of the Linguistic Facts module are used in the Probabilistic Grammar (Unit 10) for part-of-speech tagging of text.
Error Detection and Correction (13)
The Error Detection and Correction module contains Recognizing Linguistic Models (algorithms) for the automatic spelling corrector. The algorithms of the automatic spelling corrector are detailed in Publication No. 1. Similar algorithms are described in Publication No. 3. This module is used to detect and correct four basic types of spelling errors (substitution of a symbol in a word, omission of a symbol, transposition of adjacent symbols, insertion of a superfluous symbol). Unit 32 (Fig. 3) uses the Recognizing Linguistic Models in order to find errors, determine their types and select a set of candidate words for correction without using context. The Probabilistic Grammar (Unit 10) calculates the most likely word from this set and corrects the spelling error automatically. If the word is longer than 5 letters, it is also checked for a combination of any two of the four basic types of errors.
Splitter (14)
These are stored Recognizing Linguistic Models for Text-to-Words and Text-to-Sentences splitting. Unit 14 uses formal characteristics such as spaces, capital letters and punctuation for determining word and sentence boundaries. The splitter is used by the Preformatter (Figure 3).
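A splitter that relies only on the formal characteristics named above (spaces, capital letters and punctuation) can be sketched with two regular expressions. This is an illustrative assumption, not the stored models of Unit 14: the function names and both patterns are hypothetical, and a rule this simple will wrongly split after abbreviations such as "e.g." when a capitalized word follows.

```python
import re

# A sentence boundary: terminal punctuation, then whitespace, then a capital
# letter or an opening quotation mark.
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"])')

# A token: a word (optionally with internal hyphens or apostrophes) or a
# single punctuation character.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_sentences(text):
    """Split text into sentences at punctuation followed by a capital letter."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def split_words(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)
```

Keeping hyphenated forms such as "pressure-sensitive" as one token is a design choice of this sketch; the description does not say how the Preformatter treats them.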
Idiom/Set Expression Recognizer (15)
These are stored Recognizing Linguistic Models for idiom recognition. The model described in depth in Publication No. 1 can be used. It uses the Idiomatic Dictionary (Unit 6) to recognize idioms in the text during the first stage of part-of-speech tagging (Unit 42). Each idiom is assigned a part-of-speech tag from the list of tags that it can have. The algorithm prefers the longest idiom with a given first word.
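The longest-match preference can be sketched as a dictionary lookup keyed on the first word of each idiom. The two idioms tagged VB and ABL are the examples given for the Idiomatic Dictionary; the shorter entry "a great deal" and its NN tag are hypothetical, added only to show the longest candidate winning, and the function names are likewise assumptions.

```python
IDIOMS = {
    ("go", "into", "detail"): "VB",
    ("a", "great", "deal", "of"): "ABL",
    ("a", "great", "deal"): "NN",      # hypothetical entry, for illustration only
}


def build_index(idioms):
    """Group idioms by first word, longest idiom first."""
    index = {}
    for words, tag in idioms.items():
        index.setdefault(words[0], []).append((words, tag))
    for entries in index.values():
        entries.sort(key=lambda entry: len(entry[0]), reverse=True)
    return index


def tag_idioms(tokens, index):
    """Return (text, tag) pairs; tokens outside any idiom get tag None."""
    result = []
    i = 0
    while i < len(tokens):
        for words, tag in index.get(tokens[i].lower(), []):
            window = tuple(t.lower() for t in tokens[i:i + len(words)])
            if window == words:
                result.append((" ".join(tokens[i:i + len(words)]), tag))
                i += len(words)
                break
        else:
            # No idiom starts here: pass the token through untagged.
            result.append((tokens[i], None))
            i += 1
    return result
```

Because candidates are tried from longest to shortest, "a great deal of money" yields the four-word idiom tagged ABL instead of the shorter three-word entry.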
Verb Chain Recognizer (16)
This Recognizer includes Recognizing Linguistic Models for Verb Chain Recognition. These Models use part-of-speech tagged text (Unit 78) and rules for extracting verb chains in the text. Rules can be described in Backus-Naur Form, e.g.
<present perfect passive> ::= <HVZ><BEN><VBN>
is a pattern for extracting verb chains like "has been produced".
Noun Group Recognizer (17)
This Recognizer includes Linguistic Models for Noun Group Recognition. They can also be described in Backus-Naur Form. Noun group recognition rules use part-of-speech tagged text and lexemes (such as prepositions, conjunctions and adverbs) in order to extract noun groups, keeping the information on the internal structure of noun groups, which is used during the next steps of SAO analysis (Subject and Object extraction, Subject and Object lemmatization).
Tree Construction (18)
This module includes stored Recognizing Linguistic Models for Functional and Syntactic Phrase Tree Construction. They describe rules for structuring the sentence, i.e. for correlating part-of-speech tags, syntactic and semantic classes, etc., which are used by Text Parsing (Unit 44) and SAO Extraction (Unit 46) for building Syntactic and Functional phrases.
SAO Recognition (19)
These are stored Recognizing Linguistic Models for Subject, Action and Object extraction. They describe rules that use part-of-speech tags, lexemes and syntactic categories, which are then used to extract from the parsed text (Unit 90) SAOs with finite actions (Units 100, 102), non-finite actions (Unit 104) and verbal nouns (Unit 106). One example of an Action extraction rule is:
<HVZ><BEN><VBN> → (<A>=<VBN>)
This rule means that "if an input sentence contains a sequence of words w1, w2, w3 which at the step of part-of-speech tagging obtained HVZ, BEN, VBN tags respectively, then the word with the VBN tag in this sequence is the Action". For example,
has_HVZ been_BEN produced_VBN → (A=produced)
There are more than one hundred rules for action extraction in one exemplary embodiment of the present invention.
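The Action extraction rule above can be sketched as a tag-sequence match over part-of-speech tagged text. Only the single HVZ-BEN-VBN rule from this description is encoded; the rule representation (a tag pattern plus the index of the Action word within it) and the name `extract_actions` are assumptions made for illustration.

```python
# Each rule pairs a tag pattern with the position of the Action inside it.
ACTION_RULES = [
    (("HVZ", "BEN", "VBN"), 2),   # e.g. "has been produced" -> A = "produced"
]


def extract_actions(tagged):
    """Return the Action words found in a list of (word, tag) pairs."""
    tags = [tag for _, tag in tagged]
    actions = []
    for pattern, action_index in ACTION_RULES:
        width = len(pattern)
        for start in range(len(tags) - width + 1):
            if tuple(tags[start:start + width]) == pattern:
                actions.append(tagged[start + action_index][0])
    return actions
```

Further rules would be added as extra entries in `ACTION_RULES`; how the real rule base resolves overlapping matches is not stated here, so this sketch simply reports every match.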
Lemmatization (20)
These are stored Recognizing Linguistic Models for Lemmatization of Subject, Action and Object. They describe rules that use part-of-speech tags, lexemes and syntactic categories, which are then used by the SAO Editor (Fig. 5) while lemmatizing Actions (Unit 54), Subjects and Objects (Unit 56).
Below are examples of such rules for Action and Object Lemmatization respectively:
<VBN> → <Infinitive>    produced_VBN → produce
<NNS> → <NN>    processors_NNS → processor
Subject-Object Attributes (21)
These are stored Recognizing Linguistic Models for detecting Subject and Object attributes. These models describe rules (algorithms) for detecting subjects, objects, their attributes (placement, inclusion, parameter, etc.) and their meanings in the syntactic tree. These algorithms work with noun groups and act as linguistic patterns that control extraction of the above-mentioned relations from the text. For example, for relations of the parameter-object type, the basic patterns are
<Parameter> of <Object>
and
<Object> <Parameter>
The relation is valid only when the lexeme which corresponds to <Parameter> is found in the list of parameters included in block (7).
These models are used by Unit 108 of the SAO extraction module (Fig. 8).
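The two parameter-object patterns, together with the validity check against the Dictionary of Parameters, can be sketched as follows. The three-entry parameter set reuses examples named for block (7); the function name, the article stripping and the restriction of the second pattern to single-word parameters are simplifying assumptions of this sketch.

```python
PARAMETERS = {"weight", "temperature", "density"}   # tiny stand-in for block (7)
ARTICLES = {"a", "an", "the"}


def _strip_article(words):
    """Drop one leading article, if present."""
    return words[1:] if words and words[0] in ARTICLES else words


def parameter_object(noun_group):
    """Return (parameter, object) for a noun group, or None if no pattern fits."""
    words = noun_group.lower().split()
    # Pattern 1: <Parameter> of <Object>
    if "of" in words:
        k = words.index("of")
        parameter = " ".join(_strip_article(words[:k]))
        obj = " ".join(_strip_article(words[k + 1:]))
        if parameter in PARAMETERS and obj:
            return (parameter, obj)
    # Pattern 2: <Object> <Parameter>
    words = _strip_article(words)
    if len(words) >= 2 and words[-1] in PARAMETERS:
        return (words[-1], " ".join(words[:-1]))
    return None
```

A noun group such as "colour of the liquid" returns None here because "colour" is not in the toy parameter list, mirroring the rule that the relation is valid only for listed parameters.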
SAO Semantic Relations (22)
These are stored Recognizing Linguistic Models for detecting semantic relations of SAOs. These models describe algorithms for detecting cause-effect relations between SAOs. These models use linguistic patterns, lexemes and predefined codes from a list of codes. These patterns describe the location of cause and effect in the input sentence. For example, the condition
when caused + TO + VB
shows that the Cause is to the right of the word "caused" and is expressed by an infinitive and the noun group that follows it. The Effect is to the left of the word "caused" and is expressed by a noun group, e.g.
The network termination unit includes a plurality of semi-conductor switches electrically connected to conductors of the telephone line to establish a network of electrical paths capable of altering the electrical conduction of the telephone line when caused to assume a state of conduction.
These models are used by the SAO extraction module (Fig. 8, Unit 109).
Figures 10-19 show the results of various process steps designated in the respective figure for the sentence:
"As the ambubag is squeezed by the control unit, the pressure-sensitive device moves the air through the conducting lumen and into the intubated patient's airway."
The user in this example entered: "How to move air." Figure 20A shows related
text from documents found on the web, for example, and stored.
Figure 20B shows the Problem Folder for this task with each of the four possible solutions listed along with their citations and quotes generated by the present system and method and displayed for the user on monitor 2.
Publications and Patents incorporated herein by reference:
1. Sovpel, I. V. Injenerno-lingvisticheskie prinzipi, metodi i algoritmi avtomatizirovannoi pererabotki teksta. Minsk: Visheishaya Shkola, 1991, 116 p.
2. Zamora, Antonio "Automatic Detection and Correction of Spelling Errors in a Large Data Base," Journal of the American Society for Information Science, 31(1), pp. 51-57, 1980.
3. All patents mentioned throughout this specification are incorporated herein by reference.
GLOSSARY:
1. Action: is the constituent that is expressed either by a finite verb or non-finite verb or
verbal noun and denotes a relation between Subject and Object.
2. corpus (pl. corpora or corpuses): a collection of text in machine-readable form,
compiled to be representative of a particular kind of language and provided with some kind of additional information.
3. Functional Tree of Sentence: is a syntactical tree of the sentence where SAO-triples
and semantic relations for them are recognized.
4. lemmatization: the process or result of dividing a text into sets of different forms of a
word, such as the inflected forms of a verb. Ex. "sing", "sang", "sung" are one lemma, "boy", "boys" another.
5. Linguistic KB (knowledge base): (i) a database of Recognizing Linguistic Models and
(ii) a database of linguistic rules and dictionaries [see for example Figure 9].
6. NL: Natural Language
7. Object: is the constituent that is affected by the Action, e.g. John likes Mary. Object is
"Mary" because "Mary" supports the Action.
8. parsing: the process or result of making a syntactic analysis of the natural language text.
9. parser: tool (often an automatic or semi-automatic computer program) used for parsing the text.
10. part-of-speech (POS): word class, such as verb, noun, adjective.
11. part-of-speech tagging: assigning part-of-speech tags to a text.
12. part-of-speech tag: a label associated with a word (or other unit) providing
information about the word, or the process of assigning tags. Ex: "run" can be tagged as a noun (run_NN) or verb (run_VB).
13. Problem Folder: Computer storage address and area for storing structured SAO KB
entries in problem statement form (e.g. "How to move air") and a plurality of possible
solutions (i.e., subjects from all documents related to the problem), the document citations thereof, the full sentence in which the subject and SAO appears with the subject, action
and object preferably highlighted.
14. Recognizing Linguistic Models: are linguistic algorithms for recognizing and
transforming certain linguistic objects and their relations in a text. The models are
described as rules using lexical units, tags, syntactic and semantic categories.
15. SAO: Subject-Action-Object
16. SAO-DB (database): is a database of SAO-triples and semantic relations.
17. SAO-KB (knowledge base): includes the SAO-DB, a set of rules for structuring the SAO-DB and tools for managing the SAO-DB.
18. SAO Triple: SAO-triplet
19. SAO Triplet: is a set of Subject, Action and Object, related one with another.
20. Semantic Relations for SAO triples: are semantic relations for separate components of SAOs (e.g. relations like Object-Parameter) and for an SAO as a whole (e.g. relations like SAO1→SAO2, where SAO1 is Cause and SAO2 is Effect).
21. Storage Area: either a separate storage facility in a general purpose computer or
address designated storage within a general purpose computer.
22. Subject: is the constituent that performs the Action, e.g. John likes Mary. Subject is
"John" because "John" performs the Action.
23. Subject Attributes, Object Attributes: is a property of a Subject (Object), e.g.
parameter, placement.
24. Syntactic Tree of Sentence: is a tree view of the sentence where nodes are syntactic
categories and edges are dependencies between syntactic categories.
25. tag-classifier: set of tags used for part of speech tagging.
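
The Problem Folder defined in item 13 can be pictured as a mapping from a normalized Action-Object name to the Subjects that perform it, each stored with its source sentence and citation. The sketch below is one possible in-memory layout under stated assumptions: the class names, field names, the `build_folders` function and the example citations are hypothetical and are not taken from this specification.

```python
from dataclasses import dataclass, field


@dataclass
class Solution:
    subject: str     # the S of a normalized SAO
    sentence: str    # full source sentence in which the SAO appears
    citation: str    # reference to the source document


@dataclass
class ProblemFolder:
    name: str                                  # the A-O portion, e.g. "move air"
    solutions: list = field(default_factory=list)


def build_folders(saos):
    """Group normalized SAOs by their Action-Object portion.

    `saos` is an iterable of (subject, action, object, sentence, citation).
    """
    folders = {}
    for subject, action, obj, sentence, citation in saos:
        name = f"{action} {obj}"
        folder = folders.setdefault(name, ProblemFolder(name))
        folder.solutions.append(Solution(subject, sentence, citation))
    return folders
```

Selecting the folder "move air" then amounts to listing `folders["move air"].solutions`, each of which carries the sentence and citation needed for display.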

Claims

WE CLAIM:
Claim 1. A method of semantically processing natural language representations in a general purpose computer comprising: entering and storing a user criterion, retrieving from remote and local databases and storing representations of the texts of a plurality of natural language documents that have some relationship with the stored user criterion, formatting said representations and storing the formatted representation, identifying and extracting from the formatted representation subject-action-object (SAO) extractions and storing the SAO extractions, processing the SAO extractions into normalized SAO structures and storing the SAO structures, designating the AO portions as substantially the names of Folders of at least some of the SAO structures, storing in association with each Folder name the identity of one or more subject portions (S1, S2, ... Sn) that are associated with the respective AO portion of stored SAO structures.
Claim 2. The method of Claim 1 including storing in association with each respective S1, S2, ... Sn the full sentence in which the respective SAO appears.
Claim 3. The method of Claim 2 including highlighting each S-A-O portion that appears in each said full sentence.
Claim 4. The method of Claim 1 further including displaying the list of subjects (S1, S2 ... Sn) stored in association with a respective AO portion in response to the user selecting the displayed AO portion or folder name.
Claim 5. The method of Claim 1, further including storing in association with at least each subject (S) the respective sentence of the source document from which the respective SAO structure originated.
Claim 6. The method of Claim 1, further including storing in association with at least each subject (S) the citation to the source document from which the respective SAO structure originated.
AMENDED CLAIMS
[received by the International Bureau on 19 September 2000 (19.09.00); original claims 1, 2 and 3 amended; new claims 7 and 8 added; other claims unchanged (2 pages)]
Claim 1. A method of semantically processing, managing, and displaying natural language representations in a general purpose computer comprising: retrieving from remote and local databases and storing representations of the texts of a plurality of natural language documents, formatting said representations and storing the formatted representation, identifying and extracting from the formatted representation subject-action-object (SAO) extractions and storing the SAO extractions, processing the SAO extractions into normalized SAO structures and storing the SAO structures, designating the AO portions as substantially the names of Folders of at least some of the SAO structures, storing in association with each Folder name the identity of one or more subject portions (S1, S2, ... Sn) that are associated with the respective AO portion of stored SAO structures, displaying a plurality of Folder names, and in response to user selection of a particular Folder name, displaying the subject portions (S1, S2, ... Sn) associated with the selected Folder name.
Claim 2. The method of Claim 1 including storing in association with each respective S1, S2, ... Sn the full sentence of the source document in which the respective SAO appears.
Claim 3. The method of Claim 2 including displaying in response to user selection of a particular S1, S2, ... Sn said full sentence and highlighting each S-A-O portion that appears in said full sentence.
Claim 4. The method of Claim 1 further including displaying the list of subjects (S1, S2 ... Sn) stored in association with a respective AO portion in response to the user selecting the displayed AO portion or folder name.
Claim 5. The method of Claim 1, further including storing in association with at least each subject (S) the respective sentence of the source document from which the respective SAO structure originated.
Claim 6. The method of Claim 1, further including storing in association with at least each subject (S) the citation to the source document from which the respective SAO structure originated.
Claim 7. The method of Claim 5, further including displaying said respective sentence in response to user selection of the respective subject (S).
Claim 8. The method of Claim 6, further including displaying said citation to the source document of said respective SAO.
PCT/US2000/017444 1999-06-30 2000-06-23 Semantic processor and method with knowledge analysis of and extraction from natural language documents WO2001001289A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU56370/00A AU5637000A (en) 1999-06-30 2000-06-23 Semantic processor and method with knowledge analysis of and extraction from natural language documents
EP00941702A EP1208457A1 (en) 1999-06-30 2000-06-23 Semantic processor and method with knowledge analysis of and extraction from natural language documents

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US34554799A 1999-06-30 1999-06-30
US09/345,547 1999-06-30

Publications (2)

Publication Number Publication Date
WO2001001289A1 true WO2001001289A1 (en) 2001-01-04
WO2001001289A8 WO2001001289A8 (en) 2001-06-21

Family

ID=23355462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/017444 WO2001001289A1 (en) 1999-06-30 2000-06-23 Semantic processor and method with knowledge analysis of and extraction from natural language documents

Country Status (3)

Country Link
EP (1) EP1208457A1 (en)
AU (1) AU5637000A (en)
WO (1) WO2001001289A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003077154A2 (en) * 2002-03-14 2003-09-18 Universita'degli Studi Di Firenze System and method for performing functional analyses making use of a plurality of inputs
GB2417103A (en) * 2004-08-11 2006-02-15 Sdl Plc Natural language translation system
ITTO20120303A1 (en) * 2012-04-05 2012-07-05 Wolf S R L Dr METHOD AND SYSTEM FOR CARRYING OUT ANALYSIS AND AUTOMATIC COMPARISON OF PATENTS AND TECHNICAL DESCRIPTIONS.
US9128929B2 (en) 2011-01-14 2015-09-08 Sdl Language Technologies Systems and methods for automatically estimating a translation time including preparation time in addition to the translation itself
US9569425B2 (en) 2013-03-01 2017-02-14 The Software Shop, Inc. Systems and methods for improving the efficiency of syntactic and semantic analysis in automated processes for natural language understanding using traveling features
EP3316148A1 (en) * 2016-10-30 2018-05-02 Wipro Limited Method and system for determining action items from knowledge base for execution of operations
US10198438B2 (en) 1999-09-17 2019-02-05 Sdl Inc. E-services translation utilizing machine translation and translation memory
US10248650B2 (en) 2004-03-05 2019-04-02 Sdl Inc. In-context exact (ICE) matching
CN109918640A (en) * 2018-12-22 2019-06-21 浙江工商大学 A kind of Chinese text proofreading method of knowledge based map
US10635863B2 (en) 2017-10-30 2020-04-28 Sdl Inc. Fragment recall and adaptive automated translation
US10817676B2 (en) 2017-12-27 2020-10-27 Sdl Inc. Intelligent routing services and systems
US11256867B2 (en) 2018-10-09 2022-02-22 Sdl Inc. Systems and methods of machine learning for digital assets and message creation

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040001099A1 (en) * 2002-06-27 2004-01-01 Microsoft Corporation Method and system for associating actions with semantic labels in electronic documents
US8521506B2 (en) 2006-09-21 2013-08-27 Sdl Plc Computer-implemented method, computer software and apparatus for use in a translation system
US9262403B2 (en) 2009-03-02 2016-02-16 Sdl Plc Dynamic generation of auto-suggest dictionary for natural language translation
WO2018203912A1 (en) * 2017-05-05 2018-11-08 Midmore Roger Interactive story system using four-valued logic

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4829423A (en) * 1983-01-28 1989-05-09 Texas Instruments Incorporated Menu-based natural language understanding system
US4864502A (en) * 1987-10-07 1989-09-05 Houghton Mifflin Company Sentence analyzer
US5331556A (en) * 1993-06-28 1994-07-19 General Electric Company Method for natural language data processing using morphological and part-of-speech information
US5369575A (en) * 1992-05-15 1994-11-29 International Business Machines Corporation Constrained natural language interface for a computer system
US5559940A (en) * 1990-12-14 1996-09-24 Hutson; William H. Method and system for real-time information analysis of textual material
US5696916A (en) * 1985-03-27 1997-12-09 Hitachi, Ltd. Information storage and retrieval system and display method therefor
US5761497A (en) * 1993-11-22 1998-06-02 Reed Elsevier, Inc. Associative text search and retrieval system that calculates ranking scores and window scores
US5799268A (en) * 1994-09-28 1998-08-25 Apple Computer, Inc. Method for extracting knowledge from online documentation and creating a glossary, index, help database or the like
US5873076A (en) * 1995-09-15 1999-02-16 Infonautics Corporation Architecture for processing search queries, retrieving documents identified thereby, and method for using same
US5873056A (en) * 1993-10-12 1999-02-16 The Syracuse University Natural language processing system for semantic vector representation which accounts for lexical ambiguity

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10198438B2 (en) 1999-09-17 2019-02-05 Sdl Inc. E-services translation utilizing machine translation and translation memory
US10216731B2 (en) 1999-09-17 2019-02-26 Sdl Inc. E-services translation utilizing machine translation and translation memory
EP1351156A1 (en) * 2002-03-14 2003-10-08 Universita' Degli Studi di Firenze System and method for automatically performing functional analyses of technical texts
WO2003077154A3 (en) * 2002-03-14 2004-04-08 Univ Firenze System and method for performing functional analyses making use of a plurality of inputs
WO2003077154A2 (en) * 2002-03-14 2003-09-18 Universita'degli Studi Di Firenze System and method for performing functional analyses making use of a plurality of inputs
US10248650B2 (en) 2004-03-05 2019-04-02 Sdl Inc. In-context exact (ICE) matching
GB2417103A (en) * 2004-08-11 2006-02-15 Sdl Plc Natural language translation system
US9128929B2 (en) 2011-01-14 2015-09-08 Sdl Language Technologies Systems and methods for automatically estimating a translation time including preparation time in addition to the translation itself
ITTO20120303A1 (en) * 2012-04-05 2012-07-05 Wolf S R L Dr METHOD AND SYSTEM FOR CARRYING OUT ANALYSIS AND AUTOMATIC COMPARISON OF PATENTS AND TECHNICAL DESCRIPTIONS.
US9965461B2 (en) 2013-03-01 2018-05-08 The Software Shop, Inc. Systems and methods for improving the efficiency of syntactic and semantic analysis in automated processes for natural language understanding using argument ordering
US9594745B2 (en) 2013-03-01 2017-03-14 The Software Shop, Inc. Systems and methods for improving the efficiency of syntactic and semantic analysis in automated processes for natural language understanding using general composition
US9569425B2 (en) 2013-03-01 2017-02-14 The Software Shop, Inc. Systems and methods for improving the efficiency of syntactic and semantic analysis in automated processes for natural language understanding using traveling features
EP3316148A1 (en) * 2016-10-30 2018-05-02 Wipro Limited Method and system for determining action items from knowledge base for execution of operations
US10635863B2 (en) 2017-10-30 2020-04-28 Sdl Inc. Fragment recall and adaptive automated translation
US11321540B2 (en) 2017-10-30 2022-05-03 Sdl Inc. Systems and methods of adaptive automated translation utilizing fine-grained alignment
US10817676B2 (en) 2017-12-27 2020-10-27 Sdl Inc. Intelligent routing services and systems
US11475227B2 (en) 2017-12-27 2022-10-18 Sdl Inc. Intelligent routing services and systems
US11256867B2 (en) 2018-10-09 2022-02-22 Sdl Inc. Systems and methods of machine learning for digital assets and message creation
CN109918640A (en) * 2018-12-22 2019-06-21 浙江工商大学 A kind of Chinese text proofreading method of knowledge based map
CN109918640B (en) * 2018-12-22 2023-05-02 浙江工商大学 Chinese text proofreading method based on knowledge graph

Also Published As

Publication number Publication date
AU5637000A (en) 2001-01-31
WO2001001289A8 (en) 2001-06-21
EP1208457A1 (en) 2002-05-29

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: C1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

CFP Corrected version of a pamphlet front page

Free format text: REVISED ABSTRACT RECEIVED BY THE INTERNATIONAL BUREAU AFTER COMPLETION OF THE TECHNICAL PREPARATIONS FOR INTERNATIONAL PUBLICATION

WWE Wipo information: entry into national phase

Ref document number: 2000941702

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 2000941702

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2000941702

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP