WO2002035376A2 - Ontology-based parser for natural language processing - Google Patents

Ontology-based parser for natural language processing

Info

Publication number
WO2002035376A2
Authority
WO
WIPO (PCT)
Prior art keywords
ontological
predicate
parsing
recited
language text
Application number
PCT/US2001/032636
Other languages
French (fr)
Other versions
WO2002035376A3 (en)
Inventor
Justin Eliot Busch
Albert Deirchow Lin
Patrick John Graydon
Maureen Caudill
Original Assignee
Science Applications International Corporation
Application filed by Science Applications International Corporation
Priority to AU2002224446A (AU2002224446A1)
Publication of WO2002035376A2
Publication of WO2002035376A3


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99934 Query formulation, input preparation, or translation

Definitions

  • The present invention relates to an ontological parser for natural language processing. More particularly, the present invention relates to a system and method for ontological parsing of natural language that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents.
  • The system utilizes unstructured text as input and produces as output a set of data structures representing the conceptual content of the document.
  • The data is transformed using a syntactic parser and an ontology.
  • The ontology is used as a lexical resource.
  • The resulting output is also an ontological entity, with a structure that matches the organization of concepts in natural language.
  • The resulting ontological entities are predicate-argument structures designed in accordance with the best practices of artificial intelligence and knowledge-base research.
  • The ontology-based parser is designed around the idea that predicate structures represent a convenient approach to searching through text.
  • Predicate structures constitute the most compact possible representation for the relations between grammatical entities.
  • Most of the information required to construct predicates does not need to be stored, and once the predicates have been derived from a document, the predicates may be stored as literal text strings, to be used in the same way.
  • The system and method of ontology-based parsing of the present invention is directed towards techniques for deriving predicate structures with minimal computational effort.
  • The ontology-based parser is designed to permit the use of arithmetic operations instead of string operations in text-processing programs that employ the ontology-based parser.
  • The output predicate structures contain numeric tags that represent the location of each concept within the ontology.
  • The tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure. All applications that make use of the fact that the output of the ontology-based parser is an ontological entity may realize enormous speed benefits from the parameterized ontology that the parser utilizes.
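The idea of measuring conceptual similarity as distance in a tree, using only arithmetic on numeric tags, can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a hypothetical positional tag with one decimal digit per ontology level and trailing zeros for unused levels (in the style of the four-digit example given later for the selectional restriction filter), a fixed width of four levels, and a nonzero digit at every used level. The `similarity` formula is an arbitrary illustrative choice.

```python
# Hypothetical positional tags: one decimal digit per ontology level,
# trailing zeros for unused levels (e.g. 1000 -> 1100 -> 1110 -> 1111).
WIDTH = 4  # assumed number of levels in every tag


def depth(code: int) -> int:
    """Levels below the root: the number of digits before the trailing zeros."""
    d = WIDTH
    while d > 0 and code % 10 == 0:
        code //= 10
        d -= 1
    return d


def ancestor(code: int, level: int) -> int:
    """The leading `level` digits of `code`, i.e. its ancestor at that level."""
    return code // 10 ** (WIDTH - level)


def tree_distance(a: int, b: int) -> int:
    """Number of edges on the path a -> lowest common ancestor -> b."""
    da, db = depth(a), depth(b)
    common = 0
    for level in range(1, min(da, db) + 1):
        if ancestor(a, level) != ancestor(b, level):
            break
        common = level
    return (da - common) + (db - common)


def similarity(a: int, b: int) -> float:
    """Illustrative score: 1.0 for identical concepts, falling with distance."""
    return 1.0 / (1 + tree_distance(a, b))


print(tree_distance(1111, 1110))  # a leaf and its parent
print(tree_distance(1120, 1111))  # 1120 is a hypothetical sibling of 1110
```

Because ancestry reduces to integer division, no string comparison is needed anywhere in the distance calculation.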
  • U.S. Patent No. 4,864,502 to Kucera et al. discloses a device that tags and parses natural-language sentences, and provides interactive facilities for grammar correction by an end user.
  • The system taught by Kucera et al. has a complicated analysis, and cannot afford semantic status to each word relative to all the other words within the dictionary.
  • The Kucera et al. system uses three parsing stages, each of which needs more than one pass through the sentence to complete its analysis.
  • U.S. Patent No. 4,887,212 to Zamora et al. discloses a parser for syntactic analysis of text using a fast and compact technique. After part-of-speech tagging and disambiguation, syntactic analysis occurs in four steps.
  • The grammar of Zamora et al. operates by making multiple passes to guess at noun phrases and verb phrases and then attempts to reconcile the results. Furthermore, the grammar violation checking technique of the Zamora et al. system checks only for syntactic correctness.
  • U.S. Patent No. 4,914,590 to Loatman et al. discloses a natural language understanding system.
  • The goal of the Loatman et al. system is to provide a formal representation of the context of a sentence, not merely the sentence itself.
  • Case frames used in Loatman et al. require substantial hard-coded information to be programmed about each word, and a large number of case frames must be provided to obtain reasonable coverage.
  • U.S. Patent No. 5,101,349 discloses a natural language processing system that makes provisions for validating grammar from the standpoint of syntactic well-formedness, but does not provide facilities for validating the semantic well-formedness of feature structures.
  • U.S. Patent No. 5,146,496 to Jensen discloses a technique for identifying predicate-argument relationships in natural language text. The Jensen system must create intermediate feature structures to store semantic roles, which are then used to fill in predicates whose deep structures have missing arguments. Post-parsing analysis is needed, and the parsing time is impacted by the maintenance of these variables. Additionally, semantic feature compatibility checking is not possible with Jensen's system.
  • Stuckey discloses a parsing technique that organizes natural language into symbolic complexes, which treat all words as either nouns or verbs.
  • The Stuckey system is oriented towards grammar-checker-style applications, and does not produce output suitable for a wide range of natural-language processing applications.
  • The parser of the Stuckey system is only suitable for grammar-checking applications.
  • U.S. Patent No. 5,960,384 to Brash discloses a parsing method and apparatus for symbolic expressions of thought such as English-language sentences.
  • The parser of the Brash system assumes a strict compositional semantics, where a sentence's interpretation is the sum of the lexical meanings of nearby constituents.
  • The Brash system cannot accommodate predicates with different numbers of arguments, and makes an arbitrary assumption that all relationships are transitive.
  • The Brash system makes no provisions for the possibility that immediate relationships are not in fact the correct expression of sentence-level concepts, because it assumes that syntactic constituency is always defined by immediate relationships.
  • The Brash system does not incorporate ontologies as the basis for its lexical resource, and therefore does not permit the output of the parser to be easily modified by other applications.
  • The Brash system requires target languages to have a natural word order that already largely corresponds to the style of its syntactic analysis.
  • Languages such as Japanese or Russian, which permit free ordering of words but mark intended usage by morphological changes, would be difficult to parse using the Brash system.
  • U.S. Patent No. 5,386,406 to Hedin et al. discloses a system for converting natural-language expressions into a language-independent conceptual schema.
  • The output of the Hedin et al. system is not suitable for use in a wide variety of applications (e.g., machine translation, document summarization, categorization).
  • The Hedin et al. system depends on the application in which it is used.
  • The present invention is directed to an ontology-based parser for natural language processing. More particularly, the present invention relates to a system that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents.
  • The design of the ontology-based parser is based on the premise that predicate structures represent a convenient approach to searching through text. Predicate structures constitute the most compact possible representation for the relations between grammatical entities. Most of the information required to construct predicates does not need to be stored, and once the predicates have been derived from a document, the predicates may be stored as literal text strings, to be used in the same way.
  • The ontology-based parser of the present invention is directed towards techniques for deriving predicate structures with minimal computational effort.
  • The present system imposes a logical structure on text, and a semantic representation is the form used for storage.
  • The present system further provides logical representations for all content in documents. The advantages of the present system are the provision of a semantic representation of comparable utility with significantly reduced processing requirements, and no need to train the system to produce semantic representations of text content.
  • The system and method for ontological parsing of natural language has a far simpler analysis process than conventional parsing techniques, and utilizes a dictionary containing tags with syntactic information.
  • The preferred implementation of the present system and method affords semantic status to each word relative to all the other words within the dictionary, and uses a single-pass context-free grammar to provide complete predicate structures containing subject and object relationships.
  • The system and method of the present invention also provide a robust feature-checking system that accounts for semantic compatibility as well as syntactic compatibility.
  • The ontology of the present invention converts all inflected words to their canonical forms. Additionally, the system and method can filter lexical items according to their information content. For example, in an information retrieval application, it is capable of pulling out stop words and unintended query words (as in the pseudo-concept and pseudo-predicate filters).
  • The grammar of the system and method of the present invention operates in a single pass to produce predicate structure analyses, and groups noun phrases and verb phrases as they occur, rather than making multiple passes to guess at them and then attempting to reconcile the results.
  • The grammar violation checking of the system and method of the present invention filters both by the probability of a syntactically successful parse and by the compatibility of the lexical semantics of words in the ontology.
  • The compatibility referred to here is the self-consistent compatibility of words within the ontology; no particular requirement is imposed to force the ontology to be consistent with anything outside the present system.
  • Predicates may be enhanced with selectional restriction information, which, because of the ontological scheme, can be coded automatically for entire semantic classes of words rather than on an individual basis.
  • The system of the present invention maintains arguments as variables during the parsing process, and automatically fills in long-distance dependencies as part of the parsing process.
  • The system and method of the present invention isolate predicate-argument relationships into a consistent format regardless of text type.
  • The predicate-argument relationships can be used in search, grammar-checking, summarization, and categorization applications, among others.
  • The system and method of the present invention can accommodate predicates with different numbers of arguments, and do not make arbitrary assumptions about predicate transitivity or intransitivity. Instead, the system and method of the present invention incorporate a sophisticated syntactic analysis component, which allows facts about parts of speech to determine the correct syntactic analysis. Additionally, by incorporating ontologies as the basis for the lexical resource, the present invention permits the output of the parser to be easily modified by other applications. For example, a search engine incorporating our parser can easily substitute words corresponding to different levels of abstraction into the arguments of a predicate, thus broadening the search. As long as grammatical roles can be identified, the present system and method can be easily adapted to any language. For example, certain case-marked languages, such as Japanese or German, can be parsed through a grammar that simply records the grammatical relationships encoded by particular markers, and the resulting output is still compatible with the parsing results achieved for other languages.
  • Another object of the present invention is to provide a system and method for parsing natural language input that derives predicate structures with minimal computational effort.
  • Still another object of the present invention is to provide a system and method for parsing natural language input that permits the use of arithmetic operations in text-processing programs, where the output predicate structures contain numeric tags that represent the location of each concept within the ontology, and the tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure.
  • Another object of the present invention is to provide a system and method for parsing natural language input that realizes enormous speed benefits from the parameterized ontology that the parser utilizes.
  • FIG. 1 is a block diagram of the sentence lexer according to the present invention.
  • FIG. 2 is a block diagram of the parser according to the present invention.
  • FIG. 3 is a diagram showing two complete parse trees produced according to the present invention.
  • FIG. 4 is an example parse tree according to the present invention.
  • FIG. 5 is another example parse tree according to the present invention.
  • FIG. 6 is another example parse tree according to the present invention.
  • FIG. 7 is another example parse tree incorporating real words according to the present invention.
  • "Concept" means an abstract formal representation of meaning, which corresponds to multiple generic or specific words in multiple languages. Concepts may represent the meanings of individual words or phrases, or the meanings of entire sentences.
  • "Predicate" means a concept that defines an n-ary relationship between other concepts.
  • A predicate structure is a data type that includes a predicate and multiple additional concepts; as a grouping of concepts, it is itself a concept.
  • An ontology is a hierarchically organized complex data structure that provides a context for the lexical meaning of concepts. An ontology may contain both individual concepts and predicates.
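These definitions map naturally onto a few data types. The sketch below is an illustrative assumption rather than the patent's data model: the `code` field stands in for the numeric ontology tag, `arity` is a hypothetical field reflecting that different predicates take different numbers of arguments, and the argument order in the example is arbitrary.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Concept:
    """An abstract unit of meaning, located in the ontology by a numeric tag."""
    label: str
    code: int  # hypothetical ontology position tag


@dataclass(frozen=True)
class Predicate(Concept):
    """A concept that defines an n-ary relationship between other concepts."""
    arity: int = 2  # assumed field: how many arguments this predicate takes


@dataclass(frozen=True)
class PredicateStructure:
    """A predicate plus its arguments. An argument may itself be a predicate
    structure, since a predicate structure is also a concept."""
    predicate: Predicate
    arguments: Tuple[Union[Concept, "PredicateStructure"], ...] = ()

    def __post_init__(self) -> None:
        if len(self.arguments) != self.predicate.arity:
            raise ValueError(
                f"{self.predicate.label!r} expects {self.predicate.arity} arguments, "
                f"got {len(self.arguments)}"
            )


# "The teacher gave the student a book" -> give(teacher, book, student)
give = Predicate("give", 2100, arity=3)
gift = PredicateStructure(
    give, (Concept("teacher", 1210), Concept("book", 1320), Concept("student", 1220))
)
# Nesting: believe(she, give(...))
belief = PredicateStructure(Predicate("believe", 2200), (Concept("she", 1200), gift))
```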
  • The ontology-based parser incorporates both a system and a method for converting natural-language text into predicate-argument format that can be easily used by a variety of applications, including search engines, summarization applications, categorization applications, and word processors.
  • The ontology-based parser contains functional components for receiving documents in a plurality of formats, tokenizing them into instances of concepts from an ontology, and assembling the resulting concepts into predicate structures.
  • The ontological parser is designed to be modular, so that improvements and language-specific changes can be made to individual components without reengineering the other components.
  • The components are discussed in detail below.
  • The ontological parser has two major functional elements: a sentence lexer and a parser.
  • The sentence lexer takes a sentence and converts it into a sequence of ontological entities that are tagged with part-of-speech information.
  • The parser converts the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the sentence and then applies rules to it that bind arguments into predicates.
  • Ontological parsing is a grammatical analysis technique built on the proposition that the most useful information that can be extracted from a sentence is the set of concepts within it, as well as their formal relations to each other.
  • Ontological parsing derives its power from the use of ontologies to situate words within the context of their meaning, and from the fact that it does not need to find the correct purely syntactic analysis of the structure of a sentence in order to produce the correct analysis of the meaning of a sentence.
  • An ontological parser is a tool that transforms natural-language sentences into predicate structures.
  • Predicate structures are representations of logical relationships between the words in a sentence. Every predicate structure contains a predicate, which is either a verb or a preposition, and a set of arguments, which may be any part of speech.
  • Predicates are words which not only have intrinsic meaning of their own, but which also provide logical relations between other concepts in a sentence. Those other concepts are the arguments of the predicate, and are generally nouns, because predicate relationships are usually between entities.
  • The ontological parser has two major components, a sentence lexer 100 and a parser 200.
  • The sentence lexer 100 is a tool for transforming text strings into ontological entities.
  • The parser is a tool for analyzing syntactic relationships between entities.
  • Document iterator 120 receives documents or text input 110, and outputs individual sentences to the lexer 130. As the lexer 130 receives each sentence, it passes each individual word to the ontology 140. If the word exists within the ontology 140, it is returned as an ontological entity; if not, it is returned as a word tagged with default assumptions about its ontological status. In one embodiment, such unknown words are automatically assumed to be nouns; however, they may be assumed to be other parts of speech.
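The lexer flow described above (document iterator, lexer, ontology lookup with a default for unknown words) can be sketched as below. The toy `ONTOLOGY` dictionary and its tags, the regex-based sentence splitting, and the `Lexeme` fields are all hypothetical simplifications; in particular, no conversion of inflected words to canonical forms is attempted.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy ontology: word -> (part of speech, numeric concept tag).
ONTOLOGY: Dict[str, Tuple[str, int]] = {
    "the": ("determiner", 9000),
    "dog": ("noun", 1210),
    "cat": ("noun", 1220),
    "chased": ("verb", 2110),
}


@dataclass
class Lexeme:
    text: str
    pos: str
    code: Optional[int]  # None: the word was not found in the ontology


def iterate_sentences(document: str) -> Iterator[str]:
    """Document iterator: a naive split after sentence-final punctuation."""
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        if sentence:
            yield sentence


def lex(sentence: str, ontology: Dict[str, Tuple[str, int]] = ONTOLOGY) -> List[Lexeme]:
    """Return ontological entities; unknown words default to nouns."""
    lexemes = []
    for word in re.findall(r"[A-Za-z']+", sentence):
        entry = ontology.get(word.lower())
        if entry is not None:
            lexemes.append(Lexeme(word, entry[0], entry[1]))
        else:
            lexemes.append(Lexeme(word, "noun", None))  # default assumption
    return lexemes


for s in iterate_sentences("The dog chased Felix. The cat slept!"):
    print([(x.text, x.pos, x.code) for x in lex(s)])
```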
  • Lexer filters 150 are modular plug-ins that modify sentences based on knowledge about word meanings.
  • The preferred embodiment contains several filters 150, although more may be developed, and existing filters may be removed from future versions, without altering the scope of the invention.
  • An ontological parser may employ the following filters: proper noun filter, adjective filter, adverb filter, modal verb filter, and stop word filter.
  • An embodiment of the ontological parser optimized for queries may make use of all these filters, and add a pseudo-predicate filter and a pseudo-concept filter.
  • The stop word filter removes stop words from sentences. Stop words are words that serve only as placeholders in English-language sentences.
  • The stop word filter will contain a set of words accepted as stop words; any lexeme whose text is in that set is considered to be a stop word.
  • An adjective filter serves to remove lexemes representing adjective concepts from sentences. The adjective filter checks each adjective for a noun following it. The noun must either follow immediately after the adjective or be separated from it only by adjective and conjunction words. If no such noun is found, the adjective filter will veto the sentence.
  • The noun must also meet the selectional restrictions required by the adjective; if it does not, the adjective filter will veto the sentence. If a noun is found and it satisfies the restrictions of the adjective, the adjective filter will apply the selectional features of the adjective to the noun by adding all of the adjective's selectional features to the noun's set of selectional features.
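A sketch of the adjective filter rules just described, under stated assumptions: selectional features and restrictions are modeled as sets of strings (a hypothetical representation), a restriction counts as satisfied when every restricted feature is present on the noun, and `None` stands for a veto. Note that the function mutates matched nouns in place when it transfers features.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Lexeme:
    text: str
    pos: str  # e.g. "adjective", "noun", "conjunction", "verb"
    features: Set[str] = field(default_factory=set)      # selectional features
    restrictions: Set[str] = field(default_factory=set)  # required of a modified noun


def adjective_filter(sentence: List[Lexeme]) -> Optional[List[Lexeme]]:
    """Return the sentence with adjectives removed, or None to veto it."""
    for i, word in enumerate(sentence):
        if word.pos != "adjective":
            continue
        # The noun may be separated from the adjective only by adjectives/conjunctions.
        j = i + 1
        while j < len(sentence) and sentence[j].pos in ("adjective", "conjunction"):
            j += 1
        if j == len(sentence) or sentence[j].pos != "noun":
            return None  # veto: no noun follows the adjective
        noun = sentence[j]
        if not word.restrictions <= noun.features:
            return None  # veto: noun violates the adjective's restrictions
        noun.features |= word.features  # transfer the adjective's features
    return [w for w in sentence if w.pos != "adjective"]
```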
  • The proper noun filter groups proper nouns in a sentence into single lexical nouns, rather than allowing them to pass as multiple-word sequences, which may be unparsable.
  • A proper noun is any word or phrase representing a non-generic noun concept. A number of proper nouns are already present in the lexicon, and these are already treated properly as regular lexical items.
  • Because proper nouns behave syntactically as regular nouns, there is no need to distinguish proper nouns from nouns already in the lexicon.
  • The purpose of the proper noun filter is to ensure that sequences not already in the lexicon are treated as single words where appropriate.
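One way to realize the proper noun filter, assuming (hypothetically) that a proper noun missing from the lexicon shows up as a run of capitalized words with no ontology tag: merge each such run of two or more words into a single noun lexeme. Capitalization is only a heuristic chosen for this sketch; the detection rule is not specified in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lexeme:
    text: str
    pos: str
    code: Optional[int] = None  # None: not found in the lexicon


def _unknown_capitalized(lex: Lexeme) -> bool:
    # Heuristic (assumed): a capitalized word with no lexicon entry.
    return lex.code is None and lex.text[:1].isupper()


def proper_noun_filter(sentence: List[Lexeme]) -> List[Lexeme]:
    """Merge each run of two or more unknown capitalized words into one noun."""
    result: List[Lexeme] = []
    run: List[Lexeme] = []
    for lex in [*sentence, None]:  # trailing None flushes the final run
        if lex is not None and _unknown_capitalized(lex):
            run.append(lex)
            continue
        if len(run) > 1:
            result.append(Lexeme(" ".join(w.text for w in run), "noun"))
        else:
            result.extend(run)
        run = []
        if lex is not None:
            result.append(lex)
    return result
```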
  • The modal verb filter removes modal verbs from sentence objects.
  • Modal verbs are verbs such as "should", "could", and "would". Such verbs alter the conditions under which a sentence is true, but do not affect the basic meaning of the sentence. Since truth conditions do not need to be addressed by the ontological parser, such words can be eliminated to reduce parsing complexity.
  • The modal verb filter will contain a set of modal verbs similar to the stop word list contained in the stop word filter. Any Lexeme whose text is in that set and whose concept is a verb is identified as a modal verb, and will be removed.
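The modal verb filter reduces to a set lookup plus a part-of-speech check, as in this sketch. Beyond "should", "could", and "would", the contents of `MODAL_VERBS` are an assumption. The part-of-speech check is what keeps a noun such as the name "Will" from being removed.

```python
from dataclasses import dataclass
from typing import List

# "should", "could", "would" come from the text; the other entries are assumed.
MODAL_VERBS = frozenset(
    {"should", "could", "would", "can", "may", "might", "must", "shall", "will"}
)


@dataclass
class Lexeme:
    text: str
    pos: str


def modal_verb_filter(sentence: List[Lexeme]) -> List[Lexeme]:
    """Drop lexemes whose text is in the modal set and whose concept is a verb."""
    return [
        lex for lex in sentence
        if not (lex.pos == "verb" and lex.text.lower() in MODAL_VERBS)
    ]
```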
  • The adverb filter removes Lexemes containing adverb concepts from sentences.
  • Adverbs detail the meaning of the verbs they accompany, but do not change them. Since the meaning of the sentence remains the same, adverbs can be removed to simplify parsing.
  • The pseudo-predicate filter is used in one embodiment of the invention, a query ontological parser. It removes verbs from queries that are not likely to be the actual predicate of the query. Pseudo-predicate verbs include "give", "show", and "find". Not all instances of these verbs are pseudo-predicates; however, the first instance of one of them in a query often is.
  • The deterministic rule to be used in implementing the pseudo-predicate filter is that it should remove any instance of these verbs not preceded by a content-bearing noun (i.e., one not appearing in the list of pseudo-concepts or stop words).
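A sketch of the deterministic pseudo-predicate rule. It interprets "not preceded by a content-bearing noun" as "no content-bearing noun anywhere earlier in the query", which is an assumption. The word lists are abbreviated samples, `STOP_WORDS` is entirely hypothetical, and tokens are (word, part-of-speech) pairs.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part of speech)

PSEUDO_PREDICATES = {"give", "show", "find"}
PSEUDO_CONCEPTS = {"i", "me", "you", "information", "news"}  # abbreviated
STOP_WORDS = {"a", "an", "the", "all", "some"}  # hypothetical sample


def pseudo_predicate_filter(query: List[Token]) -> List[Token]:
    """Drop give/show/find unless a content-bearing noun occurred earlier."""
    result: List[Token] = []
    seen_content_noun = False
    for word, pos in query:
        w = word.lower()
        if pos == "verb" and w in PSEUDO_PREDICATES and not seen_content_noun:
            continue  # a pseudo-predicate, as in "Show me ..."
        if pos == "noun" and w not in PSEUDO_CONCEPTS and w not in STOP_WORDS:
            seen_content_noun = True
        result.append((word, pos))
    return result
```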
  • The pseudo-concept filter is used in one embodiment of the invention, a query ontological parser.
  • Pseudo-concepts are largely nouns, and can be captured by a stop word list. Pseudo-concepts include "I", "me", "you", and, in certain syntactic usages, "information", "news", and related words.
  • Two rules are included in this example of a pseudo-concept filter implementation. The first rule is that any word relating to the user or the user's current situation, such as "I" or "me", is always deleted. The second rule is that any of the "information"-type words is deleted when followed by a preposition.
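The two pseudo-concept rules can be sketched the same way. `USER_WORDS` and `INFORMATION_WORDS` are abbreviated, partly assumed lists (the text names "I", "me", "information", and "news" for these rules; "my" is an added assumption), and tokens are again (word, part-of-speech) pairs.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, part of speech)

USER_WORDS = {"i", "me", "my"}  # "my" is an assumed addition
INFORMATION_WORDS = {"information", "news"}  # "related words" omitted


def pseudo_concept_filter(query: List[Token]) -> List[Token]:
    result: List[Token] = []
    for i, (word, pos) in enumerate(query):
        w = word.lower()
        if w in USER_WORDS:
            continue  # rule 1: words about the user are always deleted
        next_pos: Optional[str] = query[i + 1][1] if i + 1 < len(query) else None
        if w in INFORMATION_WORDS and next_pos == "preposition":
            continue  # rule 2: "information"-type word followed by a preposition
        result.append((word, pos))
    return result
```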
  • The sentence receiver 220 obtains sentences 210 consisting of ontological entities produced by the sentence lexer 100. These sentences are parsed by the parser 230, which is designed to use a context-free grammar, although other grammatical models may be used without departing from the scope and spirit of the invention. Sentences are parsed into structures called parse trees, which represent the relationships between concepts in a sentence. The parse tree converter 240 receives the output of the parser 230 and converts the parse trees into predicates. Following the parse tree converter, parser filters 250 operate on the predicates to remove erroneously generated predicates based on rules about the probability of syntactic analyses, as well as rules about the compatibility of concepts with each other.
  • The sentence receiver 220 is an architectural feature designed to provide an interface between the sentence lexer 100 and the ontological parser 200.
  • The sentence receiver is a software abstraction that may be realized through any number of techniques.
  • The parser 230 takes a sequence of instances from an ontology, in the form of a sentence, and converts them into a collection of parse trees.
  • The parser 230 will use a modified version of an LALR parser, which looks ahead by one word, scans the input from left to right, and constructs a parse tree from the bottom up.
  • The LALR parser is widely used and is best known as the approach used by parser generators such as yacc and bison.
  • LALR parsers and parser generators are incapable of handling ambiguous grammars, as well as some grammars that are not ambiguous but do not follow the prescribed LALR format. Consequently, a parser that handles both of these conditions is needed.
  • The parser 230 must pursue all possible parse trees, in effect branching and pursuing more than one path at every ambiguity.
  • The standard LALR parser is a finite state machine designed to build a parse tree from the set of grammar rules (called productions) one input symbol at a time.
  • The finite state machine makes use of a two-dimensional table, called an action table, that specifies what action the finite state machine is to perform when the state machine is in a given current state and the next symbol in the input stream is a given symbol.
  • At each step, a new symbol is read from the input stream, and the symbol and the current state are used to look up, in the action table, which action to perform.
  • The actions are in one of the following forms:
  • Shift actions cause the parser to enter a new state and indicate that some progress has been made in assembling the production currently in progress;
  • Reduce actions cause the parser to finish the current production and replace the assembled symbols with the nonterminal symbol that the production builds from them;
  • LALR parsers can be generated by a standard algorithm that builds the parser finite state machine's action table from a set of grammar rules. These grammar rules, called productions, specify the language that the target parser is supposed to recognize. Each production indicates that a specific combination of input symbols, called terminals, and assembled groups of terminals, called nonterminals, can be assembled into a new nonterminal.
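To make the shift/reduce cycle concrete, here is a minimal table-driven LR driver. The grammar (S → S a | a, which accepts one or more 'a', in the spirit of the example that follows) and its action and goto tables are assumptions written by hand for illustration; they are not the productions from the patent, and the table-generation algorithm itself is not shown. `$` marks the end of input.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical grammar (one or more 'a'), numbered for the tables below:
#   rule 1: S -> S a
#   rule 2: S -> a
RULES: Dict[int, Tuple[str, int]] = {1: ("S", 2), 2: ("S", 1)}  # rule -> (lhs, rhs length)

# Hand-built tables: (state, lookahead) -> action, and (state, nonterminal) -> state.
ACTION: Dict[Tuple[int, str], Tuple[str, Optional[int]]] = {
    (0, "a"): ("shift", 2),
    (1, "a"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "a"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "a"): ("reduce", 1),
    (3, "$"): ("reduce", 1),
}
GOTO: Dict[Tuple[int, str], int] = {(0, "S"): 1}


def recognize(tokens: Iterable[str]) -> bool:
    stream = list(tokens) + ["$"]  # "$" marks the end of input
    stack = [0]  # stack of states; state 0 is the start state
    pos = 0
    while True:
        action = ACTION.get((stack[-1], stream[pos]))
        if action is None:
            return False  # no table entry: a syntax error
        kind, target = action
        if kind == "shift":
            stack.append(target)  # consume one input symbol
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[target]
            del stack[len(stack) - length:]  # pop the states of the assembled symbols
            stack.append(GOTO[(stack[-1], lhs)])  # push the state for the new nonterminal
        else:
            return True  # accept
```

Every table cell here holds exactly one action; a conflict would mean a cell that needs two, which is precisely the situation the following paragraphs describe.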
  • The grammar, or set of productions, set forth below recognizes a string of at least one 'a':
  • The standard LALR parser generator algorithm fails when the grammar does not provide the parser generator enough information to decide whether the correct action to perform, given a certain current state and input symbol, is to shift or to reduce.
  • The generator algorithm also fails when the grammar does not provide the parser generator enough information to decide which of two or more rules should be reduced. For instance, consider the following grammar:
  • An LALR parser generator would fail to produce a parser for this grammar because of a shift/reduce conflict.
  • The parser generator would be unable to decide whether, after having seen 'a' as input and having looked ahead to see the coming 'b', it should continue to work on assembling the production S → ab (shift action) or reduce the rule A → a (reduce action).
  • The modified LALR parser generator algorithm that the ontological parser of the present invention uses must be aware of the possibility of more than one possible course of action, and should recursively try both actions.
  • A parser built to recognize the ambiguous grammar above would produce both of the complete parse trees shown in FIG. 3 for the input string 'ab'.
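The "try both actions" idea can be illustrated with a brute-force nondeterministic shift-reduce parser that explores every shift and every applicable reduce and collects all complete parses. This is a swapped-in simplification, not a modified LALR generator: it uses no tables, assumes a grammar without empty or cyclic unit productions, and can take exponential time. The grammar S → a b | A b, A → a is reconstructed from the productions named above and is an assumption.

```python
from typing import List, Sequence, Tuple, Union

Tree = Union[str, Tuple[str, tuple]]  # terminal, or (nonterminal, children)

# Assumed grammar, reconstructed from the productions named in the text.
GRAMMAR: List[Tuple[str, Tuple[str, ...]]] = [
    ("S", ("a", "b")),
    ("S", ("A", "b")),
    ("A", ("a",)),
]


def label(node: Tree) -> str:
    return node if isinstance(node, str) else node[0]


def parse_all(tokens: Sequence[str], grammar=GRAMMAR, start: str = "S") -> List[Tree]:
    """Return every parse tree, pursuing both branches of each shift/reduce choice.

    Assumes no empty productions and no cycles of unit productions.
    """
    results: List[Tree] = []

    def step(stack: List[Tree], pos: int) -> None:
        # Accept: input exhausted and only the start symbol remains.
        if pos == len(tokens) and len(stack) == 1 and label(stack[0]) == start:
            results.append(stack[0])
        # Branch on every reduce whose right-hand side is on top of the stack.
        for lhs, rhs in grammar:
            n = len(rhs)
            if n <= len(stack) and tuple(label(x) for x in stack[-n:]) == rhs:
                step(stack[:-n] + [(lhs, tuple(stack[-n:]))], pos)
        # Branch on the shift action as well.
        if pos < len(tokens):
            step(stack + [tokens[pos]], pos + 1)

    step([], 0)
    return results


for tree in parse_all(["a", "b"]):
    print(tree)
```

For the input `ab` this yields two trees, one with `S` built directly from `a b` and one in which `a` is first reduced to `A`, which is the kind of ambiguity the two complete parse trees of FIG. 3 illustrate. A sentence that cannot be reduced yields an empty list, matching the requirement that such sentences produce no output trees.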
  • The modified LALR parser generator, grammar, and modified LALR parsing engine discussed previously should generate a nondeterministic recursive parser. Since natural language is the input to the grammar, some sentences will fail to meet the foregoing conditions. In other cases, syntactic ambiguity will result in multiple possible parses. The parser should not generate any output trees for a sentence that does not reduce according to the rules; rather, it should generate a tree for every possible parse of an ambiguous sentence. In the above example, NP represents a nominal phrase, VP represents a verbal phrase, and CONJ represents a conjunction.
  • Parser filters 250 are designed to prune out spurious parse trees generated by the parser 230 by removing trees that violate either statistical or ontological criteria for well-formedness. While several types of parser filter are set forth above, other filters may be included, such as a selectional restriction filter and a parse probability filter.
  • The parser filters 250 may be chained together to form a list of filters to be applied to each candidate parse tree. Each parser filter 250 will keep track of the filter that should be applied immediately before it, and will submit candidate parse trees to that filter before performing its own filtering function. Since each parser filter 250 may alter or veto each candidate parse tree, each parser filter 250 must expect this possible behavior from the previous filter in the chain.
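The chaining scheme described above can be sketched as a chain of decorators: each filter first passes the tree to its predecessor and applies its own check only if the tree survived. In this sketch, `None` represents a veto, trees are nested `(label, children)` tuples, and both concrete filters (`RootLabelFilter`, `MaxDepthFilter`) are hypothetical examples rather than filters named in the patent.

```python
from typing import Optional, Tuple, Union

Tree = Union[str, Tuple[str, tuple]]  # terminal, or (label, children)


class ParserFilter:
    """Base filter: run the previous filter first, then this filter's check."""

    def __init__(self, previous: Optional["ParserFilter"] = None) -> None:
        self.previous = previous

    def apply(self, tree: Tree) -> Optional[Tree]:
        if self.previous is not None:
            tree = self.previous.apply(tree)  # may alter the tree...
            if tree is None:  # ...or veto it earlier in the chain
                return None
        return self.check(tree)

    def check(self, tree: Tree) -> Optional[Tree]:
        """Return the (possibly altered) tree, or None to veto it."""
        return tree


def tree_depth(tree: Tree) -> int:
    if isinstance(tree, str):
        return 0
    return 1 + max((tree_depth(child) for child in tree[1]), default=0)


class RootLabelFilter(ParserFilter):  # hypothetical example filter
    def __init__(self, root: str, previous: Optional[ParserFilter] = None) -> None:
        super().__init__(previous)
        self.root = root

    def check(self, tree: Tree) -> Optional[Tree]:
        return tree if not isinstance(tree, str) and tree[0] == self.root else None


class MaxDepthFilter(ParserFilter):  # hypothetical example filter
    def __init__(self, limit: int, previous: Optional[ParserFilter] = None) -> None:
        super().__init__(previous)
        self.limit = limit

    def check(self, tree: Tree) -> Optional[Tree]:
        return tree if tree_depth(tree) <= self.limit else None


# The last filter in the chain is the one callers invoke.
chain = MaxDepthFilter(3, previous=RootLabelFilter("S"))
```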
  • a selectional restriction filter vetoes any parse tree where there are conflicts between the selectional features of the concepts serving as arguments to another concept and the restrictions of that concept. Selectional restrictions are imposed on the argument positions of predicate structures.
  • the filter checks the selectional features of the concepts, which could fill the argument slots, to see if they are compatible. This operation may be accomplished in several ways: If the ontology used by the parser only contains string labels for the nodes in a tree structure, the tree leading to the restriction must be established as a subtree of the selectional features of the argument. They must share the same hierarchy of features up to the point of the restriction. Consider a sample path through an ontology: transportation ⁇ vehicle ⁇ car ⁇ Ford.
  • if the restriction is transportation, any of the three more-specific concepts (vehicle, car, or Ford) will be an acceptable argument for the predicate.
  • a parameterized ontology assigns numbers to these concepts, such that each level is a larger number than the previous level.
  • the set of numbers 1000 → 1100 → 1110 → 1111.
  • the parse probability filter vetoes parse trees that fall below a minimum probability for valid semantic interpretation.
  • the parse probability filter will calculate the probability of a sentence parse by taking the product of the probabilities of the syntactic rules used to generate a given parse tree. Certain rules are more probable than others; however, appropriate probabilities for each rule can only be determined by experimentation. In the initial version, probabilities will be assigned by linguistic intuition; as iterations of the design progress, they will be determined through experimentation. Since sentence probabilities are generally very small numbers, the filter uses a relative threshold: it should pass any parse tree with a probability of at least 30% of that of the highest-probability parse.
  • Parse trees may be useful in some applications, and thus an interface is provided to output parse trees directly.
  • the intended output of the parser is the set of predicate structures that it builds for each sentence, and so the preferred parse tree receiver is a software module called a parse tree converter, which extracts predicate structures from the parse trees.
  • the predicate structures may be used by any application that incorporates the present invention.
  • the modular design of the ontological parser permits the use of any part-of-speech-tagged ontology, with only minimal rewriting of the lexer and parser to accommodate format-specific issues.
  • maximum benefits are recognized through the use of a parameterized ontology, an innovation heretofore unavailable in any parser or information retrieval system.
  • Ontologies are hierarchies of related concepts, traditionally represented by tree structures. These trees are implemented via a variety of techniques, which are generally equivalent to doubly-linked lists.
  • a doubly-linked list is a collection of data objects containing at least three significant members: a pointer to the previous node in the list, a pointer to the following node in the list, and the data itself, which may take any form, depending on the purpose of the list.
  • Doubly-linked lists must be created with head and tail nodes, which terminate the list and are designed to keep traversals of the list in bounds. That is, the pointer to the node previous to the head contains the address of the head, and the pointer to the node after the tail contains the address of the tail. This structure guarantees that an arbitrary number of nodes may be inserted into the list without losing track of the locations of existing nodes, as well as enabling the list to be searched from either the top or bottom.
  • this approach assumes that the number of branches in an ontological hierarchy, and their depth, can be determined at the time of creation by designing it to fixed parameters, that is, by selecting maximum values for the branching factor and the depth.
  • when the ontology is applied to natural-language processing applications, such as indexing web pages for a search engine, it will only be able to assign feature structures to those words that are instances of concepts already in the ontology.
  • a limitation of this assumption is that substantially more effort must be applied in crafting the ontology, since re-indexing large volumes of text becomes extraordinarily expensive as the text grows.
  • the designers of a parameterized ontology must be certain that their coverage is adequate before making a decision to freeze the structure.
  • this does not mean, however, that a parameterized ontology cannot be extended.
  • a key to intelligent design is leaving room for expansion. As long as the maximum depth of trees is not reached, adding additional levels is transparent.
  • the trade-off in a parameterized ontology is selecting the size of a data structure so that it is no larger than it needs to be, but with adequate room for correcting mistakes or expanding coverage later on. It is possible to mitigate the risk entailed in reengineering a parameterized ontology by mapping the old structure to a new one, and simply writing a translation routine to recode existing data into the new form.
  • the proposed data structure includes an integer value, where each digit of the integer corresponds to a specific branch taken at the corresponding level in the tree.
  • the parameterization is thus encoded in two ways: the base (i.e., decimal, octal, etc.) of the integer bounds the number of branches extending from each node of the ontology, while the number of digits in the integer bounds the potential depth of the tree. For example, if an array of 10 base-10 digits were defined to be the representation of an ontology, a maximum of 10^10 (10 billion) distinct concepts could be defined.
  • the above data structure naturally lends itself to one particular algorithm for comparing the identity or subsumption of ontological features.
  • the algorithm relies on implementing the tree so that each node is associated with an integer value that represents the position of that node within the hierarchical structure.
  • Each arrowhead in Fig. 4 represents a concept node. The deeper a node lies in the tree, the higher its number.
  • Such a representation scheme gives each node in the tree a unique identifier that completely determines the relative place of that node in the tree structure. It also provides a simple way to compare the relative positions of two discovered node instances: this is as simple as subtracting the value of one node identifier from the other. For example, in a search engine application, it may be useful to check whether or not a particular noun can serve as an argument of a predicate. The features of the noun should be more specific than the features of the argument position it is attached to, which means that the noun should be deeper in the tree than the argument node. Similar features will have similar paths through the tree.
  • Node A is represented with the decimal number "1212."
  • Node B is represented with the decimal number "1220."
  • the difference between Node A and Node B, taken digit by digit from left to right, is "001-". It is worth noting that once the first digit difference is detected, there is no further need to compute the remaining digits. The nodes diverge at level 3, the third digit in the representation, and thereafter lie along completely different subtrees that do not intersect. Any further differences are thus meaningless and irrelevant.
  • the use of decimal digits for each level of the tree has an inherent weakness in such a representation scheme.
  • a 10-digit decimal number allows 10^10, or 10 billion, possible concepts to be stored in the tree. That is a sufficient number of total concepts, but the branching factor is too small.
  • consider the concept "move." Clearly there are many more than ten general ways (i.e., branches to the next level) in which to move, such as walk, run, drive, sail, ride, fly, hop, swim, crawl, dance, slide, skid, roll, etc.
  • a 16-digit hexadecimal representation that assigns two digits to each level provides eight total ontological levels and gives a branching factor at each node of 16^2, or 256. This representation also speeds up the difference comparison: because each level occupies exactly one byte, the logical digit-by-digit comparison becomes a computer-efficient byte-by-byte comparison.
  • decimal, hexadecimal, or multi-digit hexadecimal are typical parameter choices for the node encoding included in the present invention.
  • the specific parameters chosen do not alter the conception of the invention, which is the numerically encoded ontology tree.
  • another possible encoding of the ontology tree might involve a 40-digit decimal number, with 4 digits assigned to each node of the tree. Such an encoding would allow 10^4 - 1, or 9,999, branches at each level and a tree depth of up to 10 levels.
  • a 36-digit hexadecimal encoding that assigns 3 digits to each node allows a branching factor at each node of 4095 (i.e., 16^3 - 1) and a total depth of 12 levels.
  • a further design question is whether node representation values should be computed on the fly as the tree is traversed or stored at each node. It would certainly be possible to compute these dynamically, since any tree-search algorithm must keep track of which branches it traverses in trying to locate a particular node. However, as the search backtracks and corrects its path, a fair number of adjustments and recalculations of the current node value would likely result.
  • the trade-off is to store at each node the relative position of the node in the tree via the 16-digit hexadecimal number. This would add 8 bytes of storage to each node in the tree; for a 10,000-concept tree, this is only 80 KB.
  • the second method is to perform research from the ground up in defining an ontology, assigning elements on an as-needed basis. Since minimal representation size is a main goal of parameterizing the ontology, one would want to eliminate many of the redundancies found in general-purpose ontologies such as WordNet.
  • WordNet provides a concept for "run" which is derived from "move," and another concept for "run" which is derived from "leave/go away," where the two parent concepts are in no way linked. This distinction may have some psychological validity, but it is not computationally attractive to maintain it in separate array elements.
  • a compromise approach is to attempt to make judgments about redundancy, and write software to merge branches as specified by the judgments of a knowledge engineer. This requires the creation of a table of equivalent branches and tree depths, and requires substantial knowledge engineering time, but not as much as attempting to create an ontology from the ground up.
  • the following is an example of a sentence and demonstrates both how it is parsed as a sentence within a document, and how a question to an information retrieval system would produce matching predicates to retrieve the document containing this sentence.
  • the example is explained with regard to how the sentence would be parsed as a declarative, and in a sample search engine application, how a query matching the sentence parse would also be generated.
  • the example sentence is:
  • The octopus has a heart.
  • the sentence lexer 100 would process this sentence.
  • the first component of the sentence lexer 100, the document iterator 110, would extract this sentence from the document containing it. At this stage, it would exist as the text string shown above. It would then be passed to the lexer 120, which would access the ontology 140 and return the sequence:
  • The-det octopus-noun have-verb a-det heart-noun.
  • det stands for determiner, which is a word with a purely grammatical function, namely specifying a noun phrase.
  • the other tags, noun and verb indicate parts of speech with ontological content.
  • octopus-noun have-verb heart-noun
  • the sentence is then taken up by the sentence receiver 210, which passes it to the parser 220.
  • in the parser 220, the tree shown in Figure 6 is produced.
  • the parse tree converter 230 then converts this tree into a predicate, where octopus is the subject of have, and heart is the object.
  • the predicate is:
  • this predicate is then passed through the parser filters, where it successfully passes the parse probability and selectional feature compatibility tests.
  • "have" is a verb unlikely to impose any selectional restrictions on its arguments.
  • the predicate can be used within any application which benefits from the ability to manipulate natural language. Suppose that a user of a search engine which makes use of this parser asks the question:
  • the sentence lexer 100 will read the question, and a sentence made of ontological entities is produced. It reads:
  • the pseudo-predicate filter removes the first verb, "do," because it is not the main verb of the sentence. "Do" only serves to fill a grammatical role within this type of question, and is thus removed, leaving:
  • octopus-noun have-verb heart-noun
  • when the ontological parser in this example embodiment receives this question, it generates a predicate identical to that produced from the declarative sentence, so the two can be matched. In this way, the parser enables information retrieval using natural language.
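The behavior required of the parser above, trying both the shift and the reduce action whenever they conflict and returning every complete parse, can be illustrated with a minimal Python sketch. It is an exhaustive shift-reduce search rather than the modified LALR generator and engine (no parse tables are built), and the toy grammar S → a b, S → A b, A → a is a hypothetical stand-in chosen so that the input 'ab' has two parses; the names `GRAMMAR` and `parse_all` are illustrative only.

```python
from typing import List, Tuple, Union

# A parse tree is either a terminal token or (nonterminal, children).
Tree = Union[str, Tuple[str, list]]

# Hypothetical toy grammar with a shift/reduce conflict after seeing "a":
#   S -> a b    S -> A b    A -> a
GRAMMAR: List[Tuple[str, List[str]]] = [
    ("S", ["a", "b"]),
    ("S", ["A", "b"]),
    ("A", ["a"]),
]


def label(node: Tree) -> str:
    """Grammar symbol of a stack entry."""
    return node if isinstance(node, str) else node[0]


def parse_all(tokens: List[str], start: str = "S") -> List[Tree]:
    """Return every complete parse tree; an empty list means no parse."""
    results: List[Tree] = []

    def step(stack: List[Tree], pos: int) -> None:
        # Accept: input consumed and the stack holds only the start symbol.
        if pos == len(tokens) and len(stack) == 1 and label(stack[0]) == start:
            results.append(stack[0])
        # Reduce: try every rule whose right-hand side matches the stack top.
        for lhs, rhs in GRAMMAR:
            n = len(rhs)
            if n <= len(stack) and [label(x) for x in stack[-n:]] == rhs:
                step(stack[:-n] + [(lhs, stack[-n:])], pos)
        # Shift: also try consuming the next token (both branches are explored).
        if pos < len(tokens):
            step(stack + [tokens[pos]], pos + 1)

    step([], 0)
    return results
```

With this grammar, `parse_all(["a", "b"])` returns both trees, while `parse_all(["b"])` returns an empty list, in line with the requirement that a sentence which does not reduce yields no trees. The search is exponential in the worst case and terminates only for grammars without empty or cyclic unit rules; it shows the control flow, not the performance of a table-driven parser.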
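The chaining of parser filters 250 described above, in which each filter first submits candidates to the filter applied immediately before it, can be sketched as follows. This is an illustration under assumptions: the class names `ParserFilter` and `VetoFilter` are hypothetical, and these filters operate on the whole list of candidate trees rather than one tree at a time, so that a filter needing a global view (such as a threshold relative to the best parse) fits the same interface.

```python
from typing import Any, Callable, List, Optional

ParseTree = Any  # the concrete tree type is left open in this sketch


class ParserFilter:
    """One link in a chain of parser filters (hypothetical design)."""

    def __init__(self, previous: Optional["ParserFilter"] = None) -> None:
        # The filter that must be applied immediately before this one.
        self.previous = previous

    def apply(self, trees: List[ParseTree]) -> List[ParseTree]:
        # Submit candidates to the previous filter first; it may drop or alter them.
        if self.previous is not None:
            trees = self.previous.apply(trees)
        return self.screen(trees)

    def screen(self, trees: List[ParseTree]) -> List[ParseTree]:
        """Default behavior: pass everything through unchanged."""
        return list(trees)


class VetoFilter(ParserFilter):
    """Vetoes any tree for which `keep` returns False."""

    def __init__(self, keep: Callable[[ParseTree], bool],
                 previous: Optional[ParserFilter] = None) -> None:
        super().__init__(previous)
        self.keep = keep

    def screen(self, trees: List[ParseTree]) -> List[ParseTree]:
        return [t for t in trees if self.keep(t)]
```

Building `second = VetoFilter(check_b, previous=VetoFilter(check_a))` and calling `second.apply(candidates)` runs `check_a` before `check_b`, so each filter only sees what its predecessor let through.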
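A minimal sketch of the doubly-linked list described above, with head and tail sentinel nodes whose outward pointers refer back to themselves so that traversals stay in bounds. The class and method names are illustrative and not taken from the source.

```python
from typing import Any, Iterator


class Node:
    def __init__(self, data: Any = None) -> None:
        # A fresh node points at itself; sentinels keep this self-reference outward.
        self.prev: "Node" = self
        self.next: "Node" = self
        self.data = data


class DoublyLinkedList:
    """Doubly-linked list bounded by head and tail sentinel nodes."""

    def __init__(self) -> None:
        self.head = Node()  # head.prev stays pointing at head
        self.tail = Node()  # tail.next stays pointing at tail
        self.head.next = self.tail
        self.tail.prev = self.head

    def insert_after(self, node: Node, data: Any) -> Node:
        """Splice a new node in after `node`; existing nodes keep their links."""
        if node is self.tail:
            raise ValueError("cannot insert after the tail sentinel")
        new = Node(data)
        new.prev, new.next = node, node.next
        node.next.prev = new
        node.next = new
        return new

    def append(self, data: Any) -> Node:
        return self.insert_after(self.tail.prev, data)

    def forward(self) -> Iterator[Any]:
        """Traverse from the top (head side)."""
        node = self.head.next
        while node is not self.tail:
            yield node.data
            node = node.next

    def backward(self) -> Iterator[Any]:
        """Traverse from the bottom (tail side)."""
        node = self.tail.prev
        while node is not self.head:
            yield node.data
            node = node.prev
```

Because insertion only rewires the neighbors of the new node, any number of nodes can be added without invalidating references to existing ones, and the list can be searched from either end.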
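The octopus walk-through can be mimicked end to end with a toy pipeline. This sketch assumes a hypothetical tagged question (an auxiliary `do-verb` followed by determiners and the same content words), because the question itself is not reproduced above; it also replaces the real grammar with a fixed noun-verb-noun pattern, so it only shows why the declarative and the question reduce to the same predicate once determiners and the auxiliary "do" are filtered out.

```python
from typing import List, NamedTuple, Tuple


class Predicate(NamedTuple):
    verb: str
    subject: str
    obj: str


# Assumed filter settings for this sketch.
PSEUDO_CONCEPT_TAGS = {"det"}   # purely grammatical words such as determiners
AUXILIARIES = {"do"}            # verbs that only fill a grammatical role in questions


def split_tag(token: str) -> Tuple[str, str]:
    """'octopus-noun' -> ('octopus', 'noun')."""
    word, tag = token.rsplit("-", 1)
    return word.lower(), tag


def filter_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    pairs = [split_tag(t) for t in tokens]
    pairs = [(w, t) for w, t in pairs if t not in PSEUDO_CONCEPT_TAGS]
    # Drop the first verb if it is an auxiliary and another (main) verb follows.
    verbs = [i for i, (_, t) in enumerate(pairs) if t == "verb"]
    if len(verbs) > 1 and pairs[verbs[0]][0] in AUXILIARIES:
        del pairs[verbs[0]]
    return pairs


def to_predicate(tokens: List[str]) -> Predicate:
    pairs = filter_tokens(tokens)
    tags = [t for _, t in pairs]
    if tags != ["noun", "verb", "noun"]:
        raise ValueError(f"this sketch only handles noun-verb-noun, got {tags}")
    (subject, _), (verb, _), (obj, _) = pairs
    return Predicate(verb, subject, obj)


declarative = "The-det octopus-noun have-verb a-det heart-noun".split()
question = "do-verb an-det octopus-noun have-verb a-det heart-noun".split()  # hypothetical
print(to_predicate(declarative) == to_predicate(question))  # True
```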

Abstract

An ontology-based parser incorporates both a system and method for converting natural-language text into predicate-argument format that can be easily used by a variety of applications, including search engines, summarization applications, categorization applications, and word processors. The ontology-based parser contains functional components for receiving documents in a plurality of formats, tokenizing them into instances of concepts from an ontology, and assembling the resulting concepts into predicates. The ontological parser has two major functional elements, a sentence lexer and a parser. The sentence lexer takes a sentence and converts it into a sequence of ontological entities that are tagged with part-of-speech information. The parser converts the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the sentence, and then applies rules to it that bind arguments into predicates.

Description

ONTOLOGY-BASED PARSER FOR NATURAL LANGUAGE PROCESSING
This application claims the benefit of U.S. application Serial No. 09/697,676 filed October 27, 2000
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an ontological parser for natural language processing. More particularly, the present invention relates to a system and method for ontological parsing of natural language that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents. The system utilizes unstructured text as input and produces a set of data structures representing the conceptual content of the document as output. The data is transformed using a syntactic parser and ontology. The ontology is used as a lexical resource. The output that results is also an ontological entity with a structure that matches the organization of concepts in natural language. The resulting ontological entities are predicate-argument structures designed in accordance with the best practices of artificial intelligence and knowledge-base research.
The ontology-based parser is designed around the idea that predicate structures represent a convenient approach to searching through text. Predicate structures constitute the most compact possible representation for the relations between grammatical entities. Most of the information required to construct predicates does not need to be stored, and once the predicates have been derived from a document, the predicates may be stored as literal text strings, to be used in the same way. The system and method of ontology-based parsing of the present invention is directed towards techniques for deriving predicate structures with minimal computational effort.
In addition, the ontology-based parser is designed to permit the use of arithmetic operations instead of string operations in text-processing programs that employ the ontology-based parser. The output predicate structures contain numeric tags that represent the location of each concept within the ontology. The tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure. All applications making use of the fact that the output of the ontology-based parser is an ontological entity may realize enormous speed benefits from the parameterized ontology that the parser utilizes.
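The distance calculation mentioned above can be sketched with the decimal positional codes described earlier (e.g. Node A = 1212, Node B = 1220), where each digit is the branch taken at one level and a trailing 0 marks an unused level, as in the path 1000 → 1100 → 1110 → 1111. The function names are hypothetical, and the assumption that branches are numbered from 1 (so that 0 never appears inside a path) belongs to this sketch, not to the source.

```python
def _digits(code: int, width: int) -> str:
    """Fixed-width decimal string; digit i is the branch taken at level i + 1."""
    return str(code).zfill(width)


def depth(code: int, width: int = 4) -> int:
    """Number of levels used (trailing zeros mark unused levels in this sketch)."""
    return len(_digits(code, width).rstrip("0"))


def divergence_level(a: int, b: int, width: int = 4) -> int:
    """1-based level of the first differing digit, or 0 if the codes are equal."""
    for level, (x, y) in enumerate(zip(_digits(a, width), _digits(b, width)), start=1):
        if x != y:
            return level  # later digits lie in disjoint subtrees, so stop here
    return 0


def tree_distance(a: int, b: int, width: int = 4) -> int:
    """Edges on the path between two nodes, via their deepest common ancestor."""
    common = 0
    for x, y in zip(_digits(a, width), _digits(b, width)):
        if x != y or x == "0":
            break
        common += 1
    return depth(a, width) + depth(b, width) - 2 * common
```

`divergence_level(1212, 1220)` is 3, matching the statement that Nodes A and B diverge at level 3, and `tree_distance(1212, 1220)` is also 3: two steps up from Node A to the common ancestor 1200, then one step down to Node B.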
2. Background of the Invention
Numerous techniques have been developed to process natural language input. These techniques tend to be complicated and cumbersome. Often numerous passes through the input sentence(s) are required to fully parse the input, thereby adding to the time required to parse the input. Often the previous techniques do not have very robust feature checking capabilities. In particular, the techniques do not check for both syntactic and semantic compatibility. Often these techniques expend significant time trying to parse words that can be pruned or filtered according to their information content.
The previous techniques of natural language processing are often limited to the performance of a particular purpose and cannot be used for other purposes. Conventional parsing techniques may be designed to function as part of a grammar checking system, but cannot function as part of a search engine, summarization application, or categorization application. Furthermore, conventional parsing techniques do not take full advantage of an ontology as a lexical resource. This limits the versatility of the techniques.
U.S. Patent No. 4,864,502 to Kucera et al. discloses a device that tags and parses natural-language sentences, and provides interactive facilities for grammar correction by an end user. The system taught by Kucera et al. has a complicated analysis, and cannot afford semantic status to each word relative to all the other words within the dictionary. The Kucera et al. system uses three parsing stages, each of which needs more than one pass through the sentence to complete its analysis.
U.S. Patent No. 4,887,212 to Zamora et al. discloses a parser for syntactic analysis of text using a fast and compact technique. After part-of-speech tagging and disambiguation, syntactic analysis occurs in four steps. The grammar of Zamora et al. operates by making multiple passes to guess at noun phrases and verb phrases and then attempts to reconcile the results. Furthermore, the grammar violation checking technique of the Zamora et al. system checks only for syntactic correctness.
U.S. Patent No. 4,914,590 to Loatman et al. discloses a natural language understanding system. The goal of the Loatman et al. system is to provide a formal representation of the context of a sentence, not merely the sentence itself. Case frames used in Loatman et al. require substantial hard-coded information to be programmed about each word, and a large number of case frames must be provided to obtain reasonable coverage.
Tokuume et al., U.S. Patent No. 5,101,349, discloses a natural language processing system that makes provisions for validating grammar from the standpoint of syntactic well-formedness, but does not provide facilities for validating the semantic well-formedness of feature structures. U.S. Patent No. 5,146,496 to Jensen discloses a technique for identifying predicate-argument relationships in natural language text. The Jensen system must create intermediate feature structures to store semantic roles, which are then used to fill in predicates whose deep structures have missing arguments. Post-parsing analysis is needed and the parsing time is impacted by the maintenance of these variables. Additionally, semantic feature compatibility checking is not possible with Jensen's system. U.S. Patent No. 5,721,938 to Stuckey discloses a parsing technique which organizes natural language into symbolic complexes that treat all words as either nouns or verbs. The Stuckey system is oriented towards grammar-checker-style applications, and does not produce output suitable for a wide range of natural-language processing applications. The parser of the Stuckey system is only suitable for grammar-checking applications.
U.S. Patent No. 5,960,384 to Brash discloses a parsing method and apparatus for symbolic expressions of thought such as English-language sentences. The parser of the Brash system assumes a strict compositional semantics, where a sentence's interpretation is the sum of the lexical meanings of nearby constituents. The Brash system cannot accommodate predicates with different numbers of arguments, and makes an arbitrary assumption that all relationships are transitive. The Brash system makes no provisions for the possibility that immediate relationships are not in fact the correct expression of sentence-level concepts, because it assumes that syntactic constituency is always defined by immediate relationships. The Brash system does not incorporate ontologies as the basis for its lexical resource, and therefore does not permit the output of the parser to be easily modified by other applications. Furthermore, the Brash system requires target languages to have a natural word order that already largely corresponds to the style of its syntactic analysis. Languages such as Japanese or Russian, which permit free ordering of words, but mark intended usage by morphological changes, would be difficult to parse using the Brash system.
The patent to Hemphill et al. (U.S. Patent No. 4,984,178) discloses a chart parser designed to implement a probabilistic version of a unification-based grammar. The decision-making process occurs at intermediate parsing stages, and parse probabilities are considered before all parse paths have been pursued. Intermediate parse probability calculations have to be stored, and the system has to check for intermediate feature clashes.
U.S. Patent No. 5,386,406 to Hedin et al. discloses a system for converting natural-language expressions into a language-independent conceptual schema. The output of the Hedin et al. system is not suitable for use in a wide variety of applications (e.g., machine translation, document summarization, categorization). The Hedin et al. system depends on the application in which it is used.
SUMMARY OF THE INVENTION
The foregoing and other deficiencies are addressed by the present invention, which is directed to an ontology-based parser for natural language processing. More particularly, the present invention relates to a system that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents. The system utilizes unstructured text as input and produces a set of data structures representing the conceptual content of the document as output. The data is transformed using a syntactic parser and ontology. The ontology is used as a lexical resource. The output that results is also an ontological entity with a structure that matches the organization of concepts in natural language. The resulting ontological entities are predicate-argument structures designed in accordance with the best practices of artificial intelligence and knowledge-base research. The design of the ontology-based parser is based on the premise that predicate structures represent a convenient approach to searching through text. Predicate structures constitute the most compact possible representation for the relations between grammatical entities. Most of the information required to construct predicates does not need to be stored, and once the predicates have been derived from a document, the predicates may be stored as literal text strings, to be used in the same way. The ontology-based parser of the present invention is directed towards techniques for deriving predicate structures with minimal computational effort.
In addition, the ontology-based parser is designed to permit the use of arithmetic operations instead of string operations in text-processing programs that employ the ontology-based parser. The output predicate structures contain numeric tags that represent the location of each concept within the ontology. The tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure. All applications making use of the fact that the output of the ontology-based parser is an ontological entity may realize enormous speed benefits from the parameterized ontology that the parser utilizes. The present system imposes a logical structure on text, and a semantic representation is the form used for storage. The present system further provides logical representations for all content in documents. The advantages of the present system are the provision of a semantic representation of comparable utility with significantly reduced processing requirements, and no need to train the system to produce semantic representations of text content.
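The 16-digit hexadecimal variant described earlier (two hex digits, i.e. one byte, per level, eight levels) can be packed into a single 64-bit integer, so that locating the first differing level becomes one XOR plus a bit-length computation instead of a digit loop. This is a sketch under stated assumptions: the names are hypothetical, and reserving byte value 0 for unused levels leaves 255 usable branches per node rather than the 256 (16^2) quoted in the text.

```python
from typing import Sequence

LEVELS = 8          # 16 hex digits = 8 bytes, one byte (two hex digits) per level
MAX_BRANCH = 0xFF   # byte value 0 is reserved here for "unused level"


def encode(path: Sequence[int]) -> int:
    """Pack branch indices (level 1 first) into one 64-bit integer."""
    if len(path) > LEVELS:
        raise ValueError("path is deeper than the ontology allows")
    if any(not 1 <= b <= MAX_BRANCH for b in path):
        raise ValueError("branch index must be in 1..255")
    padded = list(path) + [0] * (LEVELS - len(path))
    return int.from_bytes(bytes(padded), "big")


def byte_divergence_level(a: int, b: int) -> int:
    """1-based level of the first differing byte, or 0 if the codes are equal."""
    diff = a ^ b
    if diff == 0:
        return 0
    # The highest set bit of the XOR falls in the most significant differing byte.
    return LEVELS - (diff.bit_length() - 1) // 8
```

Each code occupies 8 bytes, so storing it at every node of a 10,000-concept tree costs 80,000 bytes, the 80 KB figure given earlier.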
The system and method for ontological parsing of natural language according to the present invention has a far simpler analysis process than conventional parsing techniques, and utilizes a dictionary containing tags with syntactic information. The preferred implementation of the present system and method affords semantic status to each word relative to all the other words within the dictionary, and uses a single-pass context-free grammar to provide complete predicate structures containing subject and object relationships. The system and method of the present invention also provides a robust feature-checking system that accounts for semantic compatibility as well as syntactic compatibility.
The ontology of the present invention converts all inflected words to their canonical forms. Additionally, the system and method can filter lexical items according to their information content. For example, in an information retrieval application, it is capable of pulling out stopwords and unintended query words (as in the pseudo-concept and pseudo-predicate filters). In one embodiment, the grammar of the system and method of the present invention operates in a single pass to produce predicate structure analyses, and groups noun phrases and verb phrases as they occur, not by making multiple passes to guess at them and then attempting to reconcile the results. In the embodiment discussed above, the grammar violation checking of the system and method of the present invention filters both by the probability of a syntactically successful parse and the compatibility of the lexical semantics of words in the ontology. The compatibility referred to here is the self-consistent compatibility of words within the ontology; no particular requirement is imposed to force the ontology to be consistent with anything outside the present system.
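The probability side of this filtering can be sketched as follows: a parse's probability is the product of the probabilities of the rules used to build it, and a parse survives if it reaches at least 30% of the best parse's probability. The sketch works in log space (summing logarithms instead of multiplying), an implementation choice of this sketch to avoid floating-point underflow on long sentences; the function names and the rule probabilities in the usage note are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (left-hand side, right-hand side)
Derivation = Sequence[Rule]          # the rules used to build one parse tree


def log_probability(rules_used: Derivation, rule_prob: Dict[Rule, float]) -> float:
    """log of the product of rule probabilities = sum of their logs."""
    return sum(math.log(rule_prob[r]) for r in rules_used)


def probability_filter(parses: List[Derivation], rule_prob: Dict[Rule, float],
                       ratio: float = 0.3) -> List[Derivation]:
    """Keep parses whose probability is at least `ratio` times the best one."""
    if not parses:
        return []
    scores = [log_probability(p, rule_prob) for p in parses]
    cutoff = max(scores) + math.log(ratio)   # p >= ratio * p_max, in log space
    return [p for p, s in zip(parses, scores) if s >= cutoff]
```

With invented probabilities S → a b: 0.9, S → A b: 0.1, A → a: 1.0, the two parses of "ab" score 0.9 and 0.1; since 0.1 is below 30% of 0.9, only the first parse passes.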
In the predicate representation scheme of the present invention, there are only a few distinct frames for predicate structures, as many as needed to cover the different numbers of arguments taken by different verbs. Predicates may be enhanced with selectional restriction information, which can be coded automatically for entire semantic classes of words, rather than on an individual basis, because of the ontological scheme. The manner in which the present invention constructs parse trees, from which predicate structures and their arguments can be read directly, uses context-free grammars, which result in faster execution. The system of the present invention maintains arguments as variables during the parsing process, and automatically fills in long-distance dependencies as part of the parsing process. No post-parsing analysis is needed to obtain this benefit, and the parsing time is not impacted by the maintenance of these variables, thus resulting in faster parsing execution. Additionally, the ontologies used permit semantic feature compatibility checking. The system and method of the present invention isolates predicate-argument relationships into a consistent format regardless of text types. The predicate-argument relationships can be used in search, grammar-checking, summarization, and categorization applications, among others.
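Selectional restriction checking of this kind can be sketched in both forms described earlier: with string labels, the restriction's path must be a prefix of the argument's path; with a parameterized ontology, the same test reduces to comparing leading digits. The function names are hypothetical, and the code assumes 4-digit decimal codes in which a trailing 0 marks an unused level (as in 1000 → 1100 → 1110 → 1111).

```python
from typing import Sequence


def satisfies_labels(argument_path: Sequence[str], restriction_path: Sequence[str]) -> bool:
    """String-label ontology: the restriction's path must be a prefix of the argument's."""
    n = len(restriction_path)
    return list(argument_path[:n]) == list(restriction_path)


def satisfies_code(argument: int, restriction: int, width: int = 4) -> bool:
    """Numeric ontology: the argument must lie at or below the restriction node.

    Digits are branch indices; trailing zeros mark unused levels (assumed).
    """
    r = str(restriction).zfill(width)
    a = str(argument).zfill(width)
    depth = len(r.rstrip("0"))
    return a[:depth] == r[:depth]
```

`satisfies_code(1110, 1000)` accepts car for a transportation restriction, while `satisfies_code(1000, 1100)` rejects the more general transportation where vehicle is required. A restriction code of 0 accepts every argument, which suits a verb such as "have" that imposes no restrictions.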
The system and method of the present invention can accommodate predicates with different numbers of arguments, and does not make arbitrary assumptions about predicate transitivity or intransitivity. Instead the system and method of the present invention incorporates a sophisticated syntactic analysis component, which allows facts about parts-of-speech to determine the correct syntactic analysis. Additionally, by incorporating ontologies as the basis for the lexical resource, the present invention permits the output of the parser to be easily modified by other applications. For example, a search engine incorporating our parser can easily substitute words corresponding to different levels of abstraction into the arguments of a predicate, thus broadening the search. As long as grammatical roles can be identified, the present system and method can be easily adapted to any language. For example, certain case-marked languages, such as Japanese or German, can be parsed through a grammar which simply records the grammatical relationships encoded by particular markers, and the resulting output is still compatible with the parsing results achieved for other languages.
From the foregoing, it is an object of the present invention to provide a system and method for parsing natural language input that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents.
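Broadening a search by substituting more abstract concepts into a predicate's arguments can be sketched with numeric positional codes: zeroing a code's trailing digits yields its ancestors. This is a hypothetical illustration, assuming 4-digit decimal codes with trailing zeros for unused levels and a (verb, subject, object) predicate layout; neither the layout nor the codes in the test come from the source.

```python
from typing import List, Tuple

Predicate = Tuple[int, int, int]   # (verb, subject, object) concept codes; assumed layout


def ancestors(code: int, width: int = 4) -> List[int]:
    """More general concepts above `code`, nearest first (trailing 0 = unused level)."""
    digits = str(code).zfill(width)
    depth = len(digits.rstrip("0"))
    return [int(digits[:d] + "0" * (width - d)) for d in range(depth - 1, 0, -1)]


def broaden(pred: Predicate, width: int = 4) -> List[Predicate]:
    """Variants of `pred` with one argument replaced by a more abstract concept."""
    verb, subject, obj = pred
    variants = [(verb, s, obj) for s in ancestors(subject, width)]
    variants += [(verb, subject, o) for o in ancestors(obj, width)]
    return variants
```

For example, `ancestors(1111)` yields 1110, 1100, and 1000 (car, vehicle, transportation in the earlier path), so a query about Ford can be widened step by step without any string lookups.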
Another object of the present invention is to provide a system and method for parsing natural language input that utilizes unstructured text as an input and produces a set of data structures representing the conceptual content of the document as output, where the output is an ontological entity with a structure that matches the organization of concepts in natural language. Still another object of the present invention is to provide a system and method for parsing natural language input that transforms data using a syntactic parser and ontology, where the ontology is used as a lexical resource. Yet another object of the present invention is to provide a system and method for parsing natural language input that provides ontological entities as output that are predicate-argument structures.
Another object of the present invention is to provide a system and method for parsing natural language input that derives predicate structures with minimal computational effort.
Still another object of the present invention is to provide a system and method for parsing natural language input that permits the use of arithmetic operations in text-processing programs, where the output predicate structures contain numeric tags that represent the location of each concept within the ontology, and the tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure. Another object of the present invention is to provide a system and method for parsing natural language input that realizes enormous speed benefits from the parameterized ontology that the parser utilizes.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other attributes of the present invention will be described with respect to the following drawings in which:
FIG. 1 is a block diagram of the sentence lexer according to the present invention;
FIG. 2 is a block diagram of the parser according to the present invention;
FIG. 3 is a diagram showing two complete parse trees produced according to the present invention;
FIG. 4 is an example parse tree according to the present invention;
FIG. 5 is another example parse tree according to the present invention;
FIG. 6 is another example parse tree according to the present invention; and
FIG. 7 is another example parse tree incorporating real words according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION
In the following detailed discussion of the present invention, numerous terms, specific to the subject matter of a system and method for concept-based searching, are used. In order to provide a complete understanding of the present invention, the meanings of these terms are set forth below:
The term concept as used herein means an abstract formal representation of meaning, which corresponds to multiple generic or specific words in multiple languages. Concepts may represent the meanings of individual words or phrases, or the meanings of entire sentences. The term predicate means a concept that defines an n-ary relationship between other concepts. A predicate structure is a data type that includes a predicate and multiple additional concepts; as a grouping of concepts, it is itself a concept. An ontology is a hierarchically organized complex data structure that provides a context for the lexical meaning of concepts. An ontology may contain both individual concepts and predicates.
The ontology-based parser incorporates both a system and method for converting natural-language text into predicate-argument format that can be easily used by a variety of applications, including search engines, summarization applications, categorization applications, and word processors. The ontology-based parser contains functional components for receiving documents in a plurality of formats, tokenizing them into instances of concepts from an ontology, and assembling the resulting concepts into predicate structures.
The ontological parser is designed to be modular, so that improvements and language-specific changes can be made to individual components without reengineering the other components. The components are discussed in detail below.
The ontological parser has two major functional elements, a sentence lexer and a parser. The sentence lexer takes a sentence and converts it into a sequence of ontological entities that are tagged with part-of-speech information. The parser converts the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the sentence and then applies rules to it that bind arguments into predicates.
Ontological parsing is a grammatical analysis technique built on the proposition that the most useful information that can be extracted from a sentence is the set of concepts within it, as well as their formal relations to each other. Ontological parsing derives its power from the use of ontologies to situate words within the context of their meaning, and from the fact that it does not need to find the correct purely syntactic analysis of the structure of a sentence in order to produce the correct analysis of the meaning of a sentence.
An ontological parser is a tool that transforms natural-language sentences into predicate structures. Predicate structures are representations of logical relationships between the words in a sentence. Every predicate structure contains a predicate, which is either a verb or a preposition, and a set of arguments, which may be any part of speech. Predicates are words which not only have intrinsic meaning of their own, but which also provide logical relations between other concepts in a sentence. Those other concepts are the arguments of the predicate, and are generally nouns, because predicate relationships are usually between entities.
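As an illustration only, a predicate structure as defined above might be modeled with Python dataclasses. The class names `Concept` and `PredicateStructure`, the part-of-speech tags, and the example sentence are hypothetical, because the patent does not prescribe a concrete data layout. A predicate structure is itself a concept, so the sketch allows an argument to be another predicate structure.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass(frozen=True)
class Concept:
    """An ontological entity: a label plus a part-of-speech tag (hypothetical layout)."""
    label: str
    pos: str  # e.g. "N", "V", "P", "Adj"

@dataclass
class PredicateStructure:
    """A predicate (verb or preposition) with an n-ary list of arguments."""
    predicate: Concept
    arguments: List[Union[Concept, "PredicateStructure"]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Per the text, every predicate is either a verb or a preposition.
        if self.predicate.pos not in ("V", "P"):
            raise ValueError(f"predicate must be a verb or preposition, got {self.predicate.pos!r}")

    @property
    def arity(self) -> int:
        return len(self.arguments)

# One possible analysis of "The cat ate the fish with a fork": with(eat(cat, fish), fork)
eat = PredicateStructure(Concept("eat", "V"), [Concept("cat", "N"), Concept("fish", "N")])
with_fork = PredicateStructure(Concept("with", "P"), [eat, Concept("fork", "N")])
```

Letting the argument list vary in length reflects the point made earlier: predicates may take different numbers of arguments.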
As stated previously, the ontological parser has two major components, a sentence lexer 100 and a parser 200. The sentence lexer 100 is a tool for transforming text strings into ontological entities. The parser is a tool for analyzing syntactic relationships between entities.
Referring to Figure 1, the sentence lexer 100 is shown. Document iterator 120 receives documents or text input 110 and outputs individual sentences to the lexer 130. As the lexer 130 receives each sentence, it passes each individual word to the ontology 140. If the word exists within the ontology 140, it is returned as an ontological entity; if not, it is returned as a word tagged with default assumptions about its ontological status. In one embodiment, such unknown words are automatically assumed to be nouns; in other embodiments, they may be assigned a different default part of speech.
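The lookup step can be sketched as follows. The sketch assumes the ontology is available as a plain dictionary that maps lowercase word forms to a (part of speech, ontology code) pair. `Lexeme`, `lex_sentence`, and `TOY_ONTOLOGY` are hypothetical names, the codes are made up, and whitespace tokenization stands in for whatever tokenizer a real embodiment uses. Unknown words default to nouns, as in the embodiment described above.

```python
import string
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Lexeme:
    text: str
    pos: str             # part-of-speech tag
    code: Optional[str]  # ontology code; None when the word is not in the ontology
    known: bool          # True if the word was found in the ontology

def lex_sentence(sentence: str,
                 ontology: Dict[str, Tuple[str, Optional[str]]],
                 default_pos: str = "N") -> List[Lexeme]:
    lexemes = []
    for raw in sentence.split():
        word = raw.strip(string.punctuation).lower()
        if not word:
            continue  # the token was punctuation only
        if word in ontology:
            pos, code = ontology[word]
            lexemes.append(Lexeme(word, pos, code, True))
        else:
            # Default assumption for unknown words: treat them as nouns.
            lexemes.append(Lexeme(word, default_pos, None, False))
    return lexemes

# Hypothetical toy ontology entries with invented codes.
TOY_ONTOLOGY = {"the": ("Det", None), "dog": ("N", "1110"), "ate": ("V", "2100")}
```

For example, `lex_sentence("The dog ate quinoa.", TOY_ONTOLOGY)` tags "quinoa" as an unknown noun with no ontology code.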
After the lexer 130 has checked the last word in a sentence against the contents of the ontology 140, the unparsed sentence is passed to a series of lexer filters 150. Lexer filters 150 are modular plug-ins that modify sentences based on knowledge about word meanings. The preferred embodiment contains several filters 150, although more may be developed, and existing filters may be removed from future versions, without altering the scope of the invention. For example, in an information retrieval application, an ontological parser may employ the following filters: proper noun filter, adjective filter, adverb filter, modal verb filter, and stop word filter. Similarly, for information retrieval purposes, an embodiment of the ontological parser optimized for queries may make use of all these filters, but add a pseudo-predicate filter and a pseudo-concept filter.
The stop word filter removes stop words from sentences. Stop words are words that serve only as placeholders in English-language sentences. The stop word filter will contain a set of words accepted as stop words; any lexeme whose text is in that set is considered to be a stop word.
An adjective filter serves to remove lexemes representing adjective concepts from sentences. The adjective filter checks each adjective for a noun following the adjective. The noun must either follow immediately after the adjective or have only adjective and conjunction words appearing between the noun and the adjective. If no such noun is found, the adjective filter will veto the sentence. The noun must also meet the selectional restrictions required by the adjective; if it does not, the adjective filter will veto the sentence. If a noun is found and it satisfies the restrictions of the adjective, the adjective filter will apply the selectional features of the adjective to the noun by adding all of the adjective's selectional features to the noun's set of selectional features.
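The following is a minimal sketch of two of these filters. They are chained so that a veto from any filter (represented here as `None`) discards the sentence. The sketch makes three assumptions: lexemes carry a part-of-speech tag plus sets of selectional features and restrictions (a hypothetical representation), the stop word list is made up, and an adjective's restrictions count as satisfied when they are a subset of the noun's features.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional, Set

@dataclass
class Lexeme:
    text: str
    pos: str                                             # "N", "V", "Adj", "Conj", "Det", ...
    features: Set[str] = field(default_factory=set)      # selectional features
    restrictions: Set[str] = field(default_factory=set)  # features required of the modified noun

Sentence = List[Lexeme]
LexerFilter = Callable[[Sentence], Optional[Sentence]]  # None means "veto"

STOP_WORDS = {"the", "a", "an"}  # hypothetical, deliberately tiny list

def stop_word_filter(sentence: Sentence) -> Optional[Sentence]:
    return [lx for lx in sentence if lx.text.lower() not in STOP_WORDS]

def adjective_filter(sentence: Sentence) -> Optional[Sentence]:
    # Work on copies so a vetoed sentence leaves the caller's lexemes untouched.
    sentence = [replace(lx, features=set(lx.features)) for lx in sentence]
    for i, lx in enumerate(sentence):
        if lx.pos != "Adj":
            continue
        # Skip over intervening adjectives and conjunctions to find the noun.
        j = i + 1
        while j < len(sentence) and sentence[j].pos in ("Adj", "Conj"):
            j += 1
        if j == len(sentence) or sentence[j].pos != "N":
            return None  # veto: no noun follows the adjective
        noun = sentence[j]
        if not lx.restrictions <= noun.features:
            return None  # veto: noun violates the adjective's selectional restrictions
        noun.features |= lx.features  # copy the adjective's features onto the noun
    return [lx for lx in sentence if lx.pos != "Adj"]

def apply_lexer_filters(sentence: Sentence, filters: List[LexerFilter]) -> Optional[Sentence]:
    for f in filters:
        sentence = f(sentence)
        if sentence is None:
            return None
    return sentence
```

Because each filter has the same signature, filters can be added to or removed from the list without touching the others, in keeping with the plug-in design described above.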
The proper noun filter groups proper nouns in a sentence into single lexical nouns, rather than allowing them to pass as multiple-word sequences, which may be unparsable. A proper noun is any word or phrase representing a non-generic noun concept. Proper nouns that are already present in the lexicon are treated as regular lexical items. Since proper nouns behave syntactically as regular nouns, there is no need to distinguish proper nouns from nouns already in the lexicon. The purpose of the proper noun filter is to ensure that sequences not already in the lexicon are treated as single words where appropriate.
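The grouping step might look like the following sketch. The text leaves the "where appropriate" criterion unspecified, so this sketch substitutes a hypothetical rule: merge each run of consecutive capitalized tokens that are not in the lexicon into one noun. `proper_noun_filter` and its lexicon argument are illustrative names.

```python
from typing import List, Set

def proper_noun_filter(words: List[str], lexicon: Set[str]) -> List[str]:
    """Merge runs of capitalized, out-of-lexicon tokens into single noun tokens."""
    out: List[str] = []
    run: List[str] = []

    def flush() -> None:
        if run:
            out.append(" ".join(run))
            run.clear()

    for word in words:
        if word[:1].isupper() and word.lower() not in lexicon:
            run.append(word)  # candidate piece of a proper noun
        else:
            flush()
            out.append(word)
    flush()
    return out
```

With a lexicon containing "the", "filed", and "it", the tokens of "The Science Applications International Corporation filed it" become four items, with the company name treated as one noun.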
The modal verb filter removes modal verbs from sentence objects. Modal verbs are verbs such as "should", "could", and "would". Such verbs alter the conditions under which a sentence is true, but do not affect the basic meaning of the sentence. Since truth conditions do not need to be addressed by the ontological parser, such words can be eliminated to reduce parsing complexity. The modal verb filter will contain a set of modal verbs similar to the stop word list contained in the stop word filter. Any lexeme whose text is in that set and whose concept is a verb is identified as a modal verb and will be removed.
The adverb filter removes lexemes containing adverb concepts from sentences. Adverbs detail the meaning of the verbs they accompany, but do not change them. Since the meaning of the sentence remains the same, adverbs can be removed to simplify parsing.
The pseudo-predicate filter operates in one embodiment of the invention, a query ontological parser. It removes from queries those verbs that are not likely to be the actual predicate of the query. Pseudo-predicate verbs include "give", "show", and "find". Not all instances of these verbs are pseudo-predicates; however, the first instance of them in a query often is. In one embodiment, the deterministic rule used in implementing the pseudo-predicate filter is that it should remove any instance of these verbs not preceded by a content-bearing noun (i.e., one not appearing in the list of pseudo-concepts or stop words).
The pseudo-concept filter also operates in the query ontological parser embodiment. It removes from queries those concepts that are not likely to be the actual concept the user intends. Pseudo-concepts are largely nouns and can be captured by a stop word list. Pseudo-concepts include "I", "me", "you", and, in certain syntactic usages, "information", "news", and related words. Two rules are included in this example of a pseudo-concept filter implementation. The first rule is that any word relating to the user or his current situation, such as "I" or "me", is always deleted. The second rule is that any of the "information"-type words is deleted when followed by a preposition.
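The two query filters can be sketched directly from the rules above. Tokens are assumed to arrive as (text, part-of-speech) pairs. The word lists are small hypothetical samples built from the examples in the text, and the two filters are applied pseudo-concept first.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (text, part-of-speech tag)

PSEUDO_PREDICATES = {"give", "show", "find"}
USER_WORDS = {"i", "me", "my"}             # words relating to the user (rule 1)
INFO_WORDS = {"information", "news"}       # "information"-type words (rule 2)
PSEUDO_CONCEPTS = USER_WORDS | INFO_WORDS | {"you"}
STOP_WORDS = {"the", "a", "an", "please"}  # hypothetical sample list

def is_content_noun(tok: Token) -> bool:
    text, pos = tok
    return pos == "N" and text.lower() not in PSEUDO_CONCEPTS | STOP_WORDS

def pseudo_concept_filter(query: List[Token]) -> List[Token]:
    kept = []
    for i, (text, pos) in enumerate(query):
        word = text.lower()
        if word in USER_WORDS:
            continue  # rule 1: always delete words relating to the user
        next_pos = query[i + 1][1] if i + 1 < len(query) else None
        if word in INFO_WORDS and next_pos == "P":
            continue  # rule 2: delete "information"-type words followed by a preposition
        kept.append((text, pos))
    return kept

def pseudo_predicate_filter(query: List[Token]) -> List[Token]:
    kept, seen_content_noun = [], False
    for tok in query:
        text, pos = tok
        if pos == "V" and text.lower() in PSEUDO_PREDICATES and not seen_content_noun:
            continue  # not preceded by a content-bearing noun: drop it
        kept.append(tok)
        seen_content_noun = seen_content_noun or is_content_noun(tok)
    return kept
```

For the query "Show me information about hybrid cars", both filters together leave "about hybrid cars". In "Mechanics find faults", "find" survives because it follows a content-bearing noun.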
The configuration of the parser 200 is shown in Figure 2. First, the sentence receiver 220 obtains sentences 210 consisting of ontological entities produced by the sentence lexer 100. These sentences are parsed by the parser 230, which is designed to use a context-free grammar, although other grammatical models may be used without departing from the scope and spirit of the invention. Sentences are parsed into structures called parse trees, which represent the relationships between concepts in a sentence. Parse tree converter 240 receives the output of the parser 230 and converts the parse trees into predicates. Following the parse tree converter, parser filters 250 operate on the predicates to remove erroneously generated predicates based on rules about the probability of syntactic analyses, as well as rules about the compatibility of concepts with each other.
The sentence receiver 220 is an architectural feature designed to provide an interface between the sentence lexer 100 and the ontological parser 200. The sentence receiver is a software abstraction that may be realized through any number of techniques. The parser 230 takes a sequence of instances from an ontology, in the form of a sentence, and converts them into a collection of parse trees. Preferably, the parser 230 will use a modified version of an LALR parser, which looks ahead (by one word), scans the input from left to right, and constructs a parse tree from the bottom up. The LALR parser is widely used and is best known as the approach used by parser generators such as yacc and bison. While the description is of a preferred embodiment, it will be understood that any implementation of a context-free grammar within a similar architecture, including such variants as an LALR-2 parser (which looks ahead by two words), is within the scope of the present invention. LALR parsers and parser generators are incapable of handling ambiguous grammars, as well as some grammars that are not ambiguous but do not follow the prescribed LALR format. Consequently, a parser that handles both of these conditions is needed. The parser 230 must pursue all possible parse trees, in effect branching and pursuing more than one path at every ambiguity.
The standard LALR parser is a finite state machine designed to build a parse tree from the set of grammar rules (called productions) one input symbol at a time. The finite state machine makes use of a two-dimensional table, called an action table, that specifies what action the finite state machine is to perform when the state machine is in a given current state and the next symbol in the input stream is a given symbol. At every cycle, a new symbol is read from the input stream, and that symbol and the current state are used to look up, in the action table, which action to perform. The actions take one of the following forms:
Shift actions cause the parser to enter a new state and indicate that some progress has been made in assembling the production currently in progress;
Reduce actions cause the parser to finish the current production and replace the assembled symbols with the symbol that replaces them;
Accepts cause the parser to finish assembling a complete parse tree and halt;
Errors cause the parser to give up because no grammar rule is available to reconcile what has already been parsed with what remains in the input stream.
LALR parsers can be generated by a standard algorithm that builds the parser finite state machine's action table from a set of grammar rules. These grammar rules, called productions, specify the language that the target parser is supposed to recognize. Each production indicates that a specific combination of input symbols, called terminals, and assembled groups of terminals, called non-terminals, can be assembled into a new non-terminal. For example, the grammar (set of productions) set forth below recognizes a string of at least one 'a':
S → S a
S → a
The standard LALR parser generator algorithm fails when the grammar does not provide the parser generator enough information to decide whether the correct action to perform, given a certain current state and input symbol, is to shift or to reduce. The generator algorithm also fails when the grammar does not provide the parser generator enough information to decide which of two or more rules should be reduced. For instance, consider the following grammar:
S → A B
S → a b
A → a
B → b
Given this grammar, an LALR parser generator would fail to produce a parser because of a shift/reduce conflict. The parser generator would be unable to decide whether, after having seen 'a' as input and having looked ahead to see the coming 'b', it should continue to work on assembling the production S → a b (shift action) or reduce the rule A → a (reduce action). The modified LALR parser generator algorithm that the ontological parser of the present invention uses must be aware of the possibility of more than one course of action, and should recursively try both actions. Using the modified LALR parsing approach, a parser built to recognize the ambiguous grammar above would produce both of the complete parse trees shown in Fig. 3 for the input string 'ab'.
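A small backtracking shift-reduce parser can illustrate the "try both actions" idea. This is a deliberately simplified sketch, not the modified LALR generator itself. It builds no action table and uses no lookahead. Instead, at every step it explores every applicable reduction as well as a shift, and it records every path that ends in the accept condition. The sketch assumes the grammar has no empty productions and no cycles of unit productions, either of which would make the search loop forever. Trees are nested tuples `(label, children)` with terminals as plain strings, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

Tree = Union[str, Tuple[str, tuple]]
Grammar = Sequence[Tuple[str, Tuple[str, ...]]]

# The ambiguous example grammar from the text.
GRAMMAR: Grammar = [
    ("S", ("A", "B")),
    ("S", ("a", "b")),
    ("A", ("a",)),
    ("B", ("b",)),
]

def label(node: Tree) -> str:
    return node if isinstance(node, str) else node[0]

def parse_all(tokens: Sequence[str], grammar: Grammar, start: str = "S") -> List[Tree]:
    """Return every parse tree for `tokens`, exploring shift and reduce nondeterministically."""
    results: List[Tree] = []

    def step(stack: List[Tree], pos: int) -> None:
        # Accept: all input consumed and the stack holds a single start symbol.
        if pos == len(tokens) and len(stack) == 1 and label(stack[0]) == start:
            results.append(stack[0])
        # Try every reduction whose right-hand side matches the top of the stack.
        for lhs, rhs in grammar:
            n = len(rhs)
            if n <= len(stack) and tuple(label(x) for x in stack[-n:]) == rhs:
                step(stack[:-n] + [(lhs, tuple(stack[-n:]))], pos)
        # Also try shifting the next input symbol.
        if pos < len(tokens):
            step(stack + [tokens[pos]], pos + 1)

    step([], 0)
    return results

for tree in parse_all(["a", "b"], GRAMMAR):
    print(tree)
```

Running this prints two trees for 'ab', one built with S → A B and one built with S → a b, which correspond to the two trees of Fig. 3. The exhaustive search costs exponential time in the worst case. Production systems such as GLR parsers avoid this by sharing work between branches.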
An example of a context-free grammar that would be used in implementing the parser is as follows:
S→NP VP
VP→V
VP→V NP
VP→VP CONJ NP
VP→V NP CONJ NP
VP→V NP PP
VP→V NP VP
VP→V that S
NP→N
NP→N PP
NP→Adj N
NP→Adj Adj N
NP→Adj N PP
NP→Adj Adj N PP
NP→NP CONJ NP
PP→P NP
PP→P CONJ NP
COMMA→CONJ
The modified LALR parser generator, grammar, and modified LALR parsing engine discussed previously should generate a non-deterministic recursive parser. Since a natural language is the input to the grammar, some sentences will fail to meet the foregoing conditions. In other cases, syntactic ambiguity will result in multiple possible parses. The parser should not generate any output trees for a sentence that does not reduce according to the rules; rather, it should generate a tree for every possible parse of an ambiguous sentence. In the above example, NP represents a nominal phrase, VP represents a verbal phrase, and CONJ represents a conjunction.
Since the parser is both probabilistic and operating on multiple streams of possible ontological entities, it is necessary to prune out spurious parse trees. Parser filters 250 do this by removing trees generated by the parser 230 that violate either statistical or ontological criteria for well-formedness. Examples of parser filters include a selectional restriction filter and a parse probability filter, described below; other filters may also be included.
Similar to the lexer filters 150, the parser filters 250 may be chained together to form a list of filters to be applied to each candidate parse tree. Each parser filter 250 will keep track of the filter that should be applied immediately before it, and will submit candidate parse trees to that filter before performing its own filtering function. Since each parser filter 250 may alter or veto each candidate parse tree, each parser filter 250 must expect this possible behavior from the previous filter in the chain.
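One way to express this chaining is sketched below, assuming that a vetoed tree is represented by `None`. Each filter holds a reference to its predecessor and runs it first. Any filter may then return a (possibly altered) tree or veto it. The class name, the nested-tuple tree format, and the two example checks are hypothetical.

```python
from typing import Callable, Iterator, Optional, Tuple

Tree = Tuple[str, tuple]  # (label, children); leaves are plain strings
Check = Callable[[Tree], Optional[Tree]]

class ParserFilter:
    """A link in a filter chain; runs the previous filter before its own check."""

    def __init__(self, check: Check, previous: Optional["ParserFilter"] = None) -> None:
        self.check = check
        self.previous = previous

    def apply(self, tree: Optional[Tree]) -> Optional[Tree]:
        if self.previous is not None:
            tree = self.previous.apply(tree)
        if tree is None:
            return None  # vetoed by an earlier filter (or nothing to filter)
        return self.check(tree)

def labels(tree: Tree) -> Iterator[str]:
    node_label, children = tree
    yield node_label
    for child in children:
        if isinstance(child, tuple):
            yield from labels(child)

def require(wanted: str) -> Check:
    """Build a hypothetical check that vetoes trees lacking a node labelled `wanted`."""
    return lambda tree: tree if wanted in set(labels(tree)) else None

# The NP check runs after (and only if) the VP check passes.
chain = ParserFilter(require("NP"), previous=ParserFilter(require("VP")))
```

In this design, only the last filter in the chain needs to be called. Every earlier filter is reached through the `previous` links, matching the description above.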
A selectional restriction filter vetoes any parse tree where there are conflicts between the selectional features of the concepts serving as arguments to another concept and the restrictions of that concept. Selectional restrictions are imposed on the argument positions of predicate structures. The filter checks the selectional features of the concepts that could fill the argument slots to see if they are compatible. This operation may be accomplished in several ways. If the ontology used by the parser only contains string labels for the nodes in a tree structure, the tree leading to the restriction must be established as a subtree of the selectional features of the argument; they must share the same hierarchy of features up to the point of the restriction. Consider a sample path through an ontology: transportation→vehicle→car→Ford. In this example, if the argument position of a predicate must be an example of transportation, then any of the three more-specific words will be an acceptable argument for the predicate. However, it will take multiple iterations through the hierarchy to discover this fact. For example, if the word that actually occurs in a sentence is "Ford," the filter will first determine that Ford is an example of a car, then that car is an example of a vehicle, and only after three attempts will it find that Ford is a word that agrees with the selectional restriction of "transportation." Similarly, the filter would need to check twice to determine that "car" is in agreement with "transportation," and once for "vehicle."
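For string labels, the check amounts to walking up parent links until the restriction is found or the root is passed. This sketch assumes a single-parent ontology stored as a child-to-parent dictionary. Cross-linked concepts, mentioned below, would need a graph search instead. The sketch also counts the steps, so the "three attempts" for "Ford" can be checked. Names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# The sample path from the text, stored as child -> parent links.
PARENT: Dict[str, Optional[str]] = {
    "Ford": "car",
    "car": "vehicle",
    "vehicle": "transportation",
    "transportation": None,
}

def meets_restriction(word: str, restriction: str,
                      parent: Dict[str, Optional[str]]) -> Tuple[bool, int]:
    """Return (satisfied, steps), where steps counts the parent links followed."""
    node: Optional[str] = word
    steps = 0
    while node is not None:
        if node == restriction:
            return True, steps
        node = parent.get(node)
        steps += 1
    return False, steps
```

The cost grows with the distance between the word and the restriction, which is the overhead the parameterized ontology is meant to remove.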
In contrast, a parameterized ontology assigns numbers to these concepts, such that each level is a larger number than the previous level. Suppose we assign to the same sequence of concepts the set of numbers: 1000→1100→1110→1111.
We can then subtract numbers to see if the features are in agreement, and a nonnegative result suffices to prove this. Thus, if we want to see if "Ford" is an example of "transportation," we subtract as follows: 1111 - 1000 = 111.
Since 111 is nonnegative, we know that the features are in agreement. If concepts are identical, they will subtract to zero, which is equivalent to passing the filter by having two identical strings. As a final example, if an argument had to be an instance of "vehicle," and the word actually used in the sentence was "transportation," then the selectional restriction filter would calculate: 1000 - 1100 = -100.
This result is negative, so the parse would be rejected because of feature incompatibility.
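This arithmetic reduces to a single subtraction of the restriction code from the word code. The sketch uses the example codes from the text. On its own, the sign test is a coarse check. With the same coding, a hypothetical sibling code such as 1200 also gives a nonnegative difference against 1100 without lying beneath it. The digit-by-digit comparison described later is the more precise test.

```python
# Example codes from the text: transportation -> vehicle -> car -> Ford.
CODES = {"transportation": 1000, "vehicle": 1100, "car": 1110, "Ford": 1111}

def passes_by_subtraction(word_code: int, restriction_code: int) -> bool:
    """Coarse check from the text: a nonnegative difference counts as agreement.

    Caveat: a sibling code (e.g. 1200 against 1100) also passes this test.
    """
    return word_code - restriction_code >= 0

print(CODES["Ford"] - CODES["transportation"])     # 111
print(CODES["transportation"] - CODES["vehicle"])  # -100
```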
The parse probability filter vetoes parse trees that fall below a minimum probability for valid semantic interpretation. The parse probability filter will calculate the probability of a sentence parse by taking the product of the probabilities of the syntactic rules used to generate a given parse tree. Certain rules are more probable than others. However, appropriate probabilities for each rule can only be determined by experimentation. In the initial version, probabilities will be assigned by linguistic intuition; as iterations of the design progress, probabilities will be determined through experimentation. Since sentence probabilities are generally very small numbers, the parse probability filter should pass any parse tree with a probability of at least 30% of the highest probability parse.
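In the following sketch of the parse probability filter, each candidate's probability is the product of its rule probabilities. A candidate survives if it reaches at least 30% of the best candidate's probability. The rule probabilities and the three candidate parses are invented for illustration: two attachments of a prepositional phrase and one unlikely analysis. As the text notes, real values would come from intuition at first and then from experimentation.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[str, Sequence[str]]  # (name, productions used to build the tree)

# Hypothetical rule probabilities.
RULE_PROB: Dict[str, float] = {
    "S->NP VP": 1.0, "NP->N": 0.5, "NP->N PP": 0.1, "PP->P NP": 1.0,
    "VP->V": 0.2, "VP->V NP": 0.6, "VP->V NP PP": 0.2, "VP->V NP VP": 0.01,
}

def parse_probability(rules: Sequence[str], rule_prob: Dict[str, float]) -> float:
    return prod(rule_prob[r] for r in rules)

def probability_filter(candidates: List[Candidate], rule_prob: Dict[str, float],
                       ratio: float = 0.3) -> List[str]:
    """Keep candidates whose probability is at least `ratio` times the best one."""
    if not candidates:
        return []
    scores = {name: parse_probability(rules, rule_prob) for name, rules in candidates}
    best = max(scores.values())
    return [name for name, p in scores.items() if p >= ratio * best]

CANDIDATES: List[Candidate] = [
    ("verb-attach", ["S->NP VP", "NP->N", "VP->V NP PP", "NP->N", "PP->P NP", "NP->N"]),
    ("noun-attach", ["S->NP VP", "NP->N", "VP->V NP", "NP->N PP", "PP->P NP", "NP->N"]),
    ("unlikely", ["S->NP VP", "NP->N", "VP->V NP VP", "NP->N", "VP->V"]),
]
```

Products of many probabilities become very small. Comparing each candidate against the best parse, rather than against an absolute cutoff, keeps the test meaningful anyway.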
Parse trees may be useful in some applications, and thus an interface is provided to output parse trees directly. However, the intended output of the parser is the set of predicate structures that it builds for each sentence, and so the preferred parse tree receiver is a software module called a parse tree converter, which extracts predicate structures from the parse trees. The predicate structures may be used by any application that incorporates the present invention.
The modular design of the ontological parser permits the use of any part-of-speech-tagged ontology, with only minimal rewriting of the lexer and parser to accommodate format-specific issues. However, maximum benefits are recognized through the use of a parameterized ontology, an innovation heretofore unavailable in any parser or information retrieval system.
Ontologies are hierarchies of related concepts, traditionally represented by tree structures. These trees are implemented via a variety of techniques, which are generally equivalent to doubly-linked lists. A doubly-linked list is a collection of data objects containing at least three significant members: a pointer to the previous node in the list, a pointer to the following node in the list, and the data itself, which may take any form, depending on the purpose of the list. Doubly-linked lists must be created with head and tail nodes, which terminate the list and are designed to keep traversals of the list in bounds. That is, the pointer to the node previous to the head contains the address of the head, and the pointer to the node after the tail contains the address of the tail. This structure guarantees that an arbitrary number of nodes may be inserted into the list without losing track of the locations of existing nodes, as well as enabling the list to be searched from either the top or bottom.
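The following sketch shows the sentinel arrangement described above. The head's previous pointer and the tail's next pointer refer back to the sentinels themselves, so traversal in either direction stops at a sentinel. Class and method names are hypothetical.

```python
from typing import Any, Iterator

class Node:
    def __init__(self, data: Any = None) -> None:
        self.data = data
        # A fresh node points at itself; the sentinels keep one of these self-references.
        self.prev: "Node" = self
        self.next: "Node" = self

class DoublyLinkedList:
    def __init__(self) -> None:
        self.head = Node()  # sentinel: head.prev stays equal to head
        self.tail = Node()  # sentinel: tail.next stays equal to tail
        self.head.next = self.tail
        self.tail.prev = self.head

    def insert_after(self, node: Node, data: Any) -> Node:
        if node is self.tail:
            raise ValueError("cannot insert after the tail sentinel")
        new = Node(data)
        new.prev, new.next = node, node.next
        node.next.prev = new
        node.next = new
        return new

    def append(self, data: Any) -> Node:
        return self.insert_after(self.tail.prev, data)

    def forward(self) -> Iterator[Any]:
        node = self.head.next
        while node is not self.tail:
            yield node.data
            node = node.next

    def backward(self) -> Iterator[Any]:
        node = self.tail.prev
        while node is not self.head:
            yield node.data
            node = node.prev

lst = DoublyLinkedList()
for concept in ["transportation", "vehicle", "car"]:
    lst.append(concept)
```

Insertion touches only the neighbors of the new node, so existing nodes never move. This is the property the text relies on.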
However, the great flexibility of tree data structures, which may encompass trees of arbitrary depth, also imposes a significant cost in computability. The utility of ontologies derives from their use as a reference tree structure encompassing all relationships between all concepts within the information domain they are created for. Knowledge bases contain instances of real data, which represent a location somewhere within the ontology. Validating the equivalence of an instance with a concept in an ontology entails comparing the features of an instance with the features of a concept. Since algorithms to compare these features must be general enough to cover the potentially arbitrary number of levels from the root of the ontology to the feature in question, they cannot be optimized to compare such trees in a single operation. Instead, they must traverse the list of links and compare structures on a node-by-node basis to guarantee identity. Complicating this procedure is the fact that concepts may be cross-linked across multiple branches of a tree, sharing multiple structures. This entails even more general-purpose algorithms for logic programming, as several branches of a tree need to be followed. The result is that the time complexity of structure-comparison algorithms attains the polynomial order of the number of features (or nodes) being compared. This fact makes the use of ontologies inefficient for high-performance computing applications, such as searching terabyte-sized databases with wide-ranging conceptual content.
A crucial assumption may be used to define the problem so that algorithms can be designed much more efficiently to compare structures. This assumption is that the number of branches in an ontological hierarchy, and their depth, can be determined by designing it to fixed parameters at the time of creation, and by selecting maximum values for the branches and the depths.
When the ontology is applied to natural-language processing applications, such as indexing web pages for a search engine, it will only be able to assign feature structures to those words that are instances of concepts already in the ontology. Crucially, a limitation of this assumption is that substantially more effort must be applied in crafting the ontology, since re-indexing large volumes of text becomes extraordinarily expensive as the text grows. The designers of a parameterized ontology must be certain that their coverage is adequate before making a decision to freeze the structure.
This does not mean, however, that a parameterized ontology is not extensible. A key to intelligent design is leaving room for expansion. As long as the maximum depth of the tree is not reached, adding additional levels is transparent. The trade-off in a parameterized ontology is selecting the size of a data structure so that it is no larger than it needs to be, but with adequate room for correcting mistakes or expanding coverage later on. It is possible to mitigate the risk entailed in reengineering a parameterized ontology by mapping the old structure to a new one and simply writing a translation routine to recode existing data into the new form.
Since algorithm design and implementation are distinct and separable issues, an embodiment of a parameterized ontology's data structures has not yet been discussed. The following is a suggested implementation. The proposed data structure includes an integer value, where each digit of the integer corresponds to a specific branch taken at the corresponding level in the tree. The parameterization is thus encoded in two ways: the base (i.e., decimal, octal, etc.) of the integer bounds the number of branches extending from each node of the ontology, while the number of digits in the integer bounds the potential depth of the tree. For example, if an array of 10 elements, each a single base-10 digit, were defined to be the representation of an ontology, a maximum of 10^10 (10 billion) distinct concepts could be defined.
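The following sketch packs a path of branch choices into such an integer and unpacks it again. It follows the Fig. 4 convention described next: branches are counted from 1, and the digit 0 is reserved as a terminator. Because 0 is reserved, each node in this sketch can have at most base − 1 children. `encode_path`, `decode_path`, and `capacity` are hypothetical names.

```python
from typing import List

def encode_path(path: List[int], base: int = 10, depth: int = 10) -> int:
    """Pack 1-based branch choices, one per level, into an integer padded with 0 terminators."""
    if len(path) > depth:
        raise ValueError("path is deeper than the ontology allows")
    code = 0
    for level in range(depth):
        branch = path[level] if level < len(path) else 0
        if level < len(path) and not 1 <= branch < base:
            raise ValueError(f"branch {branch} does not fit in one base-{base} digit")
        code = code * base + branch
    return code

def decode_path(code: int, base: int = 10, depth: int = 10) -> List[int]:
    """Recover the branch choices, stopping at the first 0 terminator."""
    digits = []
    for _ in range(depth):
        code, digit = divmod(code, base)
        digits.append(digit)
    digits.reverse()  # most significant digit (the root level) first
    path = []
    for d in digits:
        if d == 0:
            break
        path.append(d)
    return path

def capacity(base: int, depth: int) -> int:
    """Upper bound on distinct codes: base ** depth."""
    return base ** depth
```

With `base=10` and `depth=4`, the path root → 2nd branch → 3rd branch, written `[1, 2, 3]`, encodes to 1230. This matches the Fig. 4 example below.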
The above data structure naturally lends itself to one particular algorithm for comparing the identity or subsumption of ontological features. The algorithm relies on implementing the tree by associating with each node an integer value that represents the position of that node within the hierarchical structure. Consider, for example, the tree illustrated in Figure 4. Each arrowhead in Fig. 4 represents a concept node. The deeper into the tree (i.e., the higher the numbered level of the concept node), the more specific the concept is. Consider one path through Fig. 4. The path starts at the root node (level 1) and takes the 2nd branch to level 2, then takes the 3rd branch from that node to get to level 3. Thus, an appropriate (decimal) representation of this node might be "1230" (where all horizontal branch choices are counted from left to right, the root node is the most significant digit of the representation, and the root node is counted as node #1 of level 1). The final "0" is a terminator, indicating that this particular node is not at the lowest possible level of the tree; it does not necessarily indicate that no nodes branch from this level, and in this example nodes clearly do branch from it. Such a representation scheme gives each node in the tree a unique identifier that completely determines the relative place of that node in the tree structure. It also provides a simple way to compare the relative positions of two node instances: subtract the value of one node identifier from the other. For example, in a search engine application, it may be useful to check whether a particular noun can serve as an argument of a predicate. The features of the noun should be more specific than the features of the argument position it is attached to, which means that the noun should be deeper in the tree than the argument node. Similar features will have similar paths through the tree.
Referring to Fig. 5, an example is illustrated. Node A is represented by the decimal number "1212", and Node B by the decimal number "1220". The difference between Node A and Node B, taken digit by digit from left to right, is "001-", where the dash marks a digit that is never computed. Once the first digit difference is detected, there is no need to compute the remaining digits: the two nodes diverge at level 3, the third digit in the representation, and thereafter lie along completely different subtrees that do not intersect. Any further differences are thus meaningless and irrelevant.
If the ontological tree structure is carefully crafted, proximity within the tree should, in some measure, correspond to ontological proximity. Therefore, detecting the first digit difference, as above, gives a reasonable measure of the degree of ontological proximity of the two concepts: the closer the concepts are, the smaller the numerical value of the divergence. For example, the node to Node A's immediate left is represented by "1211". The difference comparison works out to "0001", which implies a correspondingly close ontological relationship between the two concepts.
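The following is a minimal sketch of the first-divergence comparison on decimal code strings. It assumes both codes have the same number of digits. It reports the level at which they first differ, together with that digit difference scaled by its place value. Under this score, "1212" versus "1220" comes out farther apart than "1212" versus "1211". The function name and the scaled score are illustrative choices, not taken from the patent.

```python
from typing import Optional, Tuple

def first_divergence(a: str, b: str) -> Optional[Tuple[int, int]]:
    """Return (level, magnitude) of the first differing digit, or None if the codes match.

    Digits after the first difference are never examined: from that level on,
    the two nodes lie in disjoint subtrees.
    """
    if len(a) != len(b):
        raise ValueError("codes must have the same number of digits")
    for level, (da, db) in enumerate(zip(a, b), start=1):
        if da != db:
            place_value = 10 ** (len(a) - level)
            return level, abs(int(da) - int(db)) * place_value
    return None
```

Taking the absolute value reflects the later remark that the sign of the difference is irrelevant.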
At this point, it is useful to consider how real words map into this sort of coding, and what its limitations are. For example, consider a tree shown in Fig. 7. It is clear that in some cases, it is useful to know the distance between words, but that it is not equally useful in all cases. For example, since "bread" and "broccoli" are nodes which directly inherit all the properties of "food," it is useful to know that one of these words is more specified than the other in cases where we want to search for only foods of certain types. However, since neither of these terms shares any properties beyond "organic" with "amino acid," it is not helpful to know the distance between "bread" and "amino acid," even though they are only one level apart.
This makes the utility of the numerical encoding scheme as a parsing tool clearer. During the sentence lexer stage, words are labeled with information from the ontology, including these numerical codes. The argument position for each predicate structure may be tagged with codes from any level of the ontology. The parser will only output predicate structures where the noun inherits at least those features specified by the code. For example, the object of the verb "eat" is usually a type of food. A predicate structure built from "eat" might thus require that the object of the predicate have a code beginning with "112". As the tree in Fig. 7 shows, all the foods listed inherit the "112" prefix.
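The prefix requirement can be sketched as a string test applied after the trailing 0 terminators are dropped from the restriction code. The codes below are hypothetical values chosen only to be consistent with the "112" food prefix in the text. They are not read from Fig. 7.

```python
from typing import Dict, List, Tuple

# Hypothetical codes: organic 1100, amino acid 1110, food 1120, bread 1121, broccoli 1122.
CODES: Dict[str, str] = {
    "amino acid": "1110",
    "bread": "1121",
    "broccoli": "1122",
}

def inherits(code: str, restriction_code: str) -> bool:
    """True if `code` lies at or below the node named by `restriction_code`."""
    prefix = restriction_code.rstrip("0")  # 0 is only ever a terminator, never a branch
    return code.startswith(prefix)

def build_predicates(verb: str, restriction_code: str,
                     objects: Dict[str, str]) -> List[Tuple[str, str]]:
    """Emit (verb, object) structures only for objects that satisfy the restriction."""
    return [(verb, word) for word, code in objects.items() if inherits(code, restriction_code)]
```

With the food code "1120" as the object restriction for "eat", "bread" and "broccoli" yield predicate structures and "amino acid" does not.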
The sign of the difference between tree entries is irrelevant. The difference is simply a digit-by-digit comparison that starts with the most significant digit and continues until the first decimal digit difference is located. Importantly, though, differences due to inheritance along incompatible subtrees do not correspond to elements of natural-language meaning.
Thus, to use the example above, even though "amino acid" and "food" differ by the same order of magnitude from "organic," they are not synonymous, and applications making use of this coding must be aware of this fact. A side benefit of this algorithm is that it provides an intuitive, natural ranking algorithm. Larger values from the subtraction operation mean greater distance apart in the tree, so even when two concepts are in the same branch, the representation provides a convenient metric of conceptual distance. The results from the feature-comparison operation could be used in a ranking algorithm so that smaller differences receive higher relevance rankings. However, it is clear from the tree above that not all differences are equally meaningful. For the magnitude of the difference to be relevant, it must first be the case that one of the concepts inherits all the properties of the other.
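The ranking idea might be sketched as follows, including the requirement that one concept inherit from the other before its distance is trusted. Candidates are first restricted to codes that fall under the query concept (a prefix test). They are then ordered by the absolute numeric difference of their codes, smallest first. All codes are hypothetical five-digit values.

```python
from typing import Dict, List

def inherits(code: str, ancestor: str) -> bool:
    """True if `code` lies at or below `ancestor` (trailing 0s are terminators)."""
    return code.startswith(ancestor.rstrip("0"))

def rank(query_code: str, candidates: Dict[str, str]) -> List[str]:
    """Order candidates that inherit from the query by |code difference|, smallest first."""
    related = {w: c for w, c in candidates.items() if inherits(c, query_code)}
    return sorted(related, key=lambda w: abs(int(related[w]) - int(query_code)))

# Hypothetical codes: organic 11000, amino acid 11100, food 11200, bread 11210, rye bread 11211.
CANDIDATES = {"rye bread": "11211", "amino acid": "11100", "bread": "11210",
              "food": "11200", "organic": "11000"}
```

For the query "food" (11200), the result is food, bread, rye bread. Without the prefix test, "amino acid" (difference 100) would be ranked alongside genuine kinds of food, which is exactly the misleading case the text warns about.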
As described above, using one decimal digit for each level of the tree is an inherent weakness of such a representation scheme. A 10-digit decimal number allows 10^10, or 10 billion, possible concepts to be stored in the tree. That is a sufficient number of total concepts, but the branching factor is too small: there can be at most ten branches out of each node to the next level. As an example of the problem inherent in this limit, consider the concept "move." There are clearly many more than ten general ways (i.e., branches to the next level) in which to move, such as walk, run, drive, sail, ride, fly, hop, swim, crawl, dance, slide, skid, roll, etc. As a more specialized example, consider a warfare ontology. The concept of "weapon" could include such varied concepts as tank, rifle, cannon, machine gun, chemical gas, viral agent, germ agent, bomber, fighter plane, pistol, bomb, incendiary device, nuclear weapon, missile, bazooka, and so on. Consequently, ten is too small a limit on the branching factor at each level. A hexadecimal representation would improve this somewhat by increasing the branching factor to 16. Using a 16-digit (i.e., 64-bit) hexadecimal number gives 16 branches at each node for 16 levels, or 16^16 possible concepts. In addition to eliminating the need for binary-to-decimal conversions, such a hexadecimal representation stores more concepts than any reasonable ontology will ever need. Despite this improvement over a decimal representation, a branching factor of only 16 is still unacceptably small.
A solution to this is to use a modified hexadecimal representation. Since it is unlikely that a reasonable, specialized ontology will need more than eight levels of general concept representation, a 16-digit hexadecimal number can be interpreted slightly differently, as an octet of hexadecimal pairs:
52C2 6296 AC19 0000 -> 52 C2 62 96 AC 19 00 00
Such a representation provides eight total ontological levels and gives a branching factor at each node of 16^2, or 256. It also speeds up the difference comparison: because the code is hexadecimal rather than decimal, the logical digit-by-digit comparison becomes a computer-efficient byte-by-byte comparison.
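A minimal Python sketch of the octet-of-pairs layout follows, using a Python int to stand in for the 64-bit word. It uses an XOR-and-bit-length trick to find the first differing byte. That trick is one possible way to implement the byte-by-byte comparison; the text does not specify it.

```python
# Minimal sketch of the "octet of hexadecimal pairs" layout: a 64-bit code
# holds eight one-byte levels, most significant level first.
from typing import List, Optional


def to_levels(code: int) -> List[int]:
    """Split a 64-bit code into its eight one-byte levels."""
    return list(code.to_bytes(8, "big"))


def first_differing_level(a: int, b: int) -> Optional[int]:
    """Byte-level comparison: index (0-7) of the first level where a and b differ."""
    diff = a ^ b                          # bits set wherever the codes disagree
    if diff == 0:
        return None
    highest_bit = diff.bit_length() - 1   # 0..63
    return (63 - highest_bit) // 8        # byte index counted from the top


code = 0x52C26296AC190000
print(" ".join(f"{level:02X}" for level in to_levels(code)))  # 52 C2 62 96 AC 19 00 00
print(first_differing_level(code, 0x52C26296AC1A0000))        # 5
```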
It should also be noted that the above examples of decimal, hexadecimal, and multi-digit hexadecimal encodings are typical parameter choices for the node encoding included in the present invention. The specific parameters chosen do not alter the conception of the invention, which is the numerically encoded ontology tree. For example, another possible encoding of the ontology tree might use a 40-digit decimal number. In that case, 4 digits could be assigned to each node of the tree, so the tree could have up to 10 levels of depth. Such an encoding would allow 10^4 - 1, or 9,999, branches at each level.
Similarly, a 36-digit hexadecimal encoding that assigns 3 digits to each node allows a branching factor at each node of 4,095 (i.e., 16^3 - 1) and a total depth of 12 levels.
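The depth and branching figures for these parameter choices can be checked with a short calculation. The `reserve_zero` flag is an assumption made to reconcile the text's numbers. It models the all-zero digit group being held back, which reproduces the 9,999 and 4,095 figures, while the 256 figure for hexadecimal pairs counts every value.

```python
# Minimal sketch: levels and per-node branching for a given node encoding.
# `reserve_zero` is an assumption: it models the all-zero digit group being
# held back (e.g. as "no branch here"), which reproduces the 9,999 and 4,095
# figures; the 256 figure for hexadecimal pairs counts every value.
from typing import Tuple


def encoding_capacity(radix: int, total_digits: int, digits_per_level: int,
                      reserve_zero: bool = False) -> Tuple[int, int]:
    """Return (tree depth in levels, maximum branches per node)."""
    if total_digits % digits_per_level:
        raise ValueError("total_digits must be a multiple of digits_per_level")
    levels = total_digits // digits_per_level
    branches = radix ** digits_per_level - (1 if reserve_zero else 0)
    return levels, branches


for label, args in [
    ("10-digit decimal, 1 digit/level", (10, 10, 1)),
    ("16-digit hex, 1 digit/level", (16, 16, 1)),
    ("16-digit hex, 2 digits/level", (16, 16, 2)),
    ("40-digit decimal, 4 digits/level", (10, 40, 4, True)),
    ("36-digit hex, 3 digits/level", (16, 36, 3, True)),
]:
    print(label, encoding_capacity(*args))
```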
One other factor to consider is whether these node values should be computed on the fly as the tree is traversed or stored at each node. It would certainly be possible to compute them dynamically, since any tree-search algorithm must keep track of which branches it traverses in trying to locate a particular node. However, as the search backtracks and corrects its path, a fair number of adjustments and recalculations of the current node value would likely result. The trade-off is to store at each node its relative position in the tree as the 16-digit hexadecimal number. This adds 8 bytes of storage to each node in the tree: for a 10,000-concept tree, this is only 80 KB; for a 100,000-concept tree, 800 KB; and for a 1,000,000-concept tree, 8 MB. Regardless of whether the values are maintained statically or dynamically, both implementation choices fall within the spirit and scope of the invention. It should be readily apparent that the ordering of elements within the code can be arbitrary, but it must be used consistently in order to compare features.
There are two ways to construct a parameterized ontology. The first method is simply to freeze an existing ontology, write a program to find the maximum tree depth and number of branches, and then write another program to recode the pointer information into array elements and depths. This method allows rapid bootstrapping of existing ontologies to higher levels of performance, although it preserves any redundancies and inefficiencies in the original construction.
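A minimal sketch of the first construction method follows, assuming the frozen ontology is a pointer-based tree of `Node` objects and that concept names are unique. The target layout is the eight-level octet of hexadecimal pairs. Branch numbers start at 1 so that a zero byte marks an unused level. That is an illustrative choice, not one the text specifies, and it caps each node at 255 children.

```python
# Minimal sketch of the first construction method: freeze a pointer-based
# tree, measure its depth and branching, then recode every node as a 64-bit
# "octet of hexadecimal pairs". Assumptions: concept names are unique, and
# branch numbers start at 1 so that a zero byte means "unused level".
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LEVELS = 8  # one byte per level in a 64-bit code


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def tree_shape(node: Node) -> Tuple[int, int]:
    """Return (levels below this node, largest branching factor in the sub-tree)."""
    depth, branching = 0, len(node.children)
    for child in node.children:
        child_depth, child_branching = tree_shape(child)
        depth = max(depth, child_depth + 1)
        branching = max(branching, child_branching)
    return depth, branching


def assign_codes(root: Node) -> Dict[str, int]:
    """Map each concept name to its 64-bit positional code (root is 0)."""
    depth, branching = tree_shape(root)
    if depth > LEVELS or branching > 255:
        raise ValueError("tree does not fit eight one-byte levels")
    codes = {root.name: 0}

    def walk(node: Node, code: int, level: int) -> None:
        for branch, child in enumerate(node.children, start=1):
            child_code = code | (branch << (8 * (LEVELS - 1 - level)))
            codes[child.name] = child_code
            walk(child, child_code, level + 1)

    walk(root, 0, 0)
    return codes


# Hypothetical fragment of an ontology.
root = Node("entity", [Node("organic", [Node("food"), Node("amino acid")])])
codes = assign_codes(root)
for name, code in codes.items():
    print(f"{name:12} {code:016X}")
print("storage:", 8 * len(codes), "bytes")  # 8 bytes per stored code
```

The last line applies the same 8-bytes-per-node arithmetic used above for the 10,000-, 100,000-, and 1,000,000-concept trees.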
The second method is to perform research from the ground up in defining an ontology, assigning elements on an as-needed basis. Since minimal representation size is a main goal of parameterizing the ontology, one would want to eliminate many of the redundancies found in general-purpose ontologies such as WordNet. For example, WordNet provides a concept for "run" that is derived from "move," and another concept for "run" that is derived from "leave/go away," where the two parent concepts are in no way linked. This distinction may have some psychological validity, but it is not computationally attractive to maintain it in separate array elements.
A compromise approach is to attempt to make judgments about redundancy, and write software to merge branches as specified by the judgments of a knowledge engineer. This requires the creation of a table of equivalent branches and tree depths, and requires substantial knowledge engineering time, but not as much as attempting to create an ontology from the ground up.
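The merge step might look like the following sketch. It assumes the ontology is held as a parent-to-children table and that the knowledge engineer's judgments arrive as a table mapping each redundant node to its canonical counterpart. The node names ("run#1", "run#2", "sprint", "flee", "depart") and both data formats are hypothetical.

```python
# Minimal sketch of merging redundant branches from an equivalence table.
# Data formats and node names are hypothetical.
from typing import Dict, List


def merge_branches(children: Dict[str, List[str]],
                   equivalents: Dict[str, str]) -> Dict[str, List[str]]:
    """Fold each redundant node (and its sub-tree) into its canonical node.

    children:    parent -> list of child nodes
    equivalents: redundant node -> canonical node (assumed to contain no cycles)
    """
    def canonical(node: str) -> str:
        while node in equivalents:   # follow chains such as a -> b -> c
            node = equivalents[node]
        return node

    merged: Dict[str, List[str]] = {}
    for parent, kids in children.items():
        bucket = merged.setdefault(canonical(parent), [])
        for kid in kids:
            if kid in equivalents:
                continue  # the canonical copy already hangs under its own parent
            if kid not in bucket:
                bucket.append(kid)
    return merged


children = {
    "move": ["run#1", "walk"],
    "leave/go away": ["run#2", "depart"],
    "run#1": ["sprint"],
    "run#2": ["flee"],
}
print(merge_branches(children, {"run#2": "run#1"}))
```

Dropping the edge from "leave/go away" keeps the result a tree, which the positional codes require. Keeping both parents would instead produce a graph.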
The following example sentence demonstrates both how a sentence within a document is parsed and how a question to an information retrieval system would produce matching predicates that retrieve the document containing that sentence. The example first shows how the sentence would be parsed as a declarative and then, in a sample search engine application, how a query matching that parse would be generated. The example sentence is:
The octopus has a heart.
First, the sentence lexer 100 would process this sentence. The first component of the sentence lexer 100, the document iterator 110, would extract this sentence from the document it was contained in. At this stage, it would exist as the text string shown above. Following that, it would be passed to the lexer 120, which would access the ontology 140, and return the sequence:
The-det octopus-noun have-verb a-det heart-noun.
Here, det stands for determiner, which is a word with a purely grammatical function, namely specifying a noun phrase. The other tags, noun and verb, indicate parts of speech with ontological content. Thus, when the sentence passes through the lexer filters 150 as discussed in the previous example embodiment, the stop word filter removes "a" and "the," leaving:
octopus-noun have-verb heart-noun
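The stop word filter step can be sketched on the lexer's word-tag output. The text names only "a" and "the" as stop words, so the `STOP_WORDS` set and the string format handled by `parse_tagged` are assumptions for illustration.

```python
# Minimal sketch of the stop word filter over word-tag tokens.
# STOP_WORDS is a hypothetical list; the text names only "a" and "the".
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

STOP_WORDS = {"a", "an", "the"}


def parse_tagged(text: str) -> List[Token]:
    """Turn 'word-tag word-tag ...' into (word, tag) pairs."""
    tokens = []
    for item in text.rstrip(".").split():
        word, _, tag = item.rpartition("-")
        tokens.append((word.lower(), tag))
    return tokens


def stop_word_filter(tokens: List[Token]) -> List[Token]:
    return [(w, t) for w, t in tokens if w not in STOP_WORDS]


def render(tokens: List[Token]) -> str:
    return " ".join(f"{w}-{t}" for w, t in tokens)


print(render(stop_word_filter(parse_tagged(
    "The-det octopus-noun have-verb a-det heart-noun."))))
# octopus-noun have-verb heart-noun
```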
The sentence is then taken up by the sentence receiver 210, which passes it to the parser 220. In the parser 220, the tree shown in Figure 6 is produced. The parse tree converter 230 then converts this tree into a predicate, where octopus is the subject of have, and heart is the object. The predicate is:
have<octopus, heart>
In this sample embodiment, the predicate is then passed through the parser filters, where it passes both the parse probability and selectional feature compatibility tests. In this example, "have" is a verb unlikely to impose any selectional restrictions on its arguments. Following filtering, the predicate can be used within any application that benefits from the ability to manipulate natural language. Suppose that a user of a search engine that makes use of this parser asks the question:
Do octopuses have hearts?
The sentence lexer 100 will read the question, and a sentence made of ontological entities is produced. It reads:
Do-verb octopus-noun have-verb heart-noun
In the preferred embodiment's lexer filters, the pseudo-predicate filter removes the first verb, "do," because it is not the main verb of the sentence. "Do" serves only to fill a grammatical role within this type of question and is thus removed, leaving:
octopus-noun have-verb heart-noun
This is identical to the sentence produced above and results in the same parse tree and the same predicate structure. Thus, when the ontological parser in this example embodiment receives this question, it generates a predicate identical to that from the declarative sentence, and the two can be matched. In this way, the parser enables information retrieval using natural language.
Having described several embodiments of the concept-based indexing and search system in accordance with the present invention, it is believed that other modifications, variations and changes will be suggested to those skilled in the art in view of the description set forth above. It is therefore to be understood that all such variations, modifications and changes are believed to fall within the scope of the invention as defined in the appended claims.
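The octopus walk-through can be condensed into a short Python sketch showing why the declarative and the question yield the same predicate. It makes three assumptions. The lexer has already reduced "octopuses" to "octopus". The stop-word and auxiliary lists are hypothetical. A simple noun-verb-noun pattern stands in for the real parser and parse tree converter, whose tree (Figure 6) is not reproduced here.

```python
# Minimal sketch of the octopus walk-through. Assumptions: the lexer has
# already lemmatized words ("octopuses" -> "octopus"), the stop-word and
# auxiliary lists are hypothetical, and a noun-verb-noun pattern stands in
# for the real parser and parse tree converter.
from typing import List, Optional, Tuple

Token = Tuple[str, str]                   # (lemma, part of speech)
Predicate = Tuple[str, Tuple[str, ...]]   # (predicate, arguments)

STOP_WORDS = {"a", "an", "the"}
AUXILIARIES = {"do", "does", "did"}


def lexer_filters(tokens: List[Token]) -> List[Token]:
    kept = [(w, t) for w, t in tokens if w not in STOP_WORDS]  # stop word filter
    verbs = [i for i, (_, t) in enumerate(kept) if t == "verb"]
    # Pseudo-predicate filter: an auxiliary verb followed by another verb is
    # not the main verb of the sentence, so it is dropped.
    if len(verbs) > 1 and kept[verbs[0]][0] in AUXILIARIES:
        del kept[verbs[0]]
    return kept


def to_predicate(tokens: List[Token]) -> Optional[Predicate]:
    """Subject-verb-object becomes verb<subject, object>; anything else fails."""
    if [t for _, t in tokens] == ["noun", "verb", "noun"]:
        subject, verb, obj = (w for w, _ in tokens)
        return (verb, (subject, obj))
    return None


document = [("the", "det"), ("octopus", "noun"), ("have", "verb"),
            ("a", "det"), ("heart", "noun")]
question = [("do", "verb"), ("octopus", "noun"), ("have", "verb"), ("heart", "noun")]

doc_predicate = to_predicate(lexer_filters(document))
query_predicate = to_predicate(lexer_filters(question))
print(doc_predicate)                     # ('have', ('octopus', 'heart'))
print(doc_predicate == query_predicate)  # True
```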

Claims

WHAT IS CLAIMED IS:
1. A system for ontological parsing that converts natural-language text into predicate-argument format comprising: a sentence lexer for converting a natural language sentence into a sequence of ontological entities that are tagged with part-of-speech information; and a parser for converting the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the natural language sentence and binds arguments into predicates.
2. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 1, wherein said sentence lexer comprises: a document iterator that receives text input and outputs individual sentences; a lexer that receives said individual sentences from said sentence lexer and outputs individual words; and an ontology that receives said words from said lexer and returns ontological entities or a word tagged with default assumptions about ontological status of said words, to said lexer.
3. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 2, further comprising lexer filters for modifying said sentences based on word meanings.
4. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 3, wherein said lexer filters may comprise at least one of a noun filter, adjective filter, adverb filter, modal verb filter, stop word filter, a pseudo-predicate filter, and a pseudo-concept filter.
5. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 4, wherein said stop word filter removes stop words from said sentences.
6. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 4, wherein said adjective filter removes lexemes representing adjectives from said sentences.
7. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 4, wherein said noun filter groups proper nouns into single lexical nouns.
8. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 4, wherein said modal verb filter removes modal verbs from objects of said sentences.
9. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 4, wherein said adverb filter removes lexemes containing adverb concepts from said sentences.
10. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 4, wherein said pseudo-predicate filter removes verbs from queries.
11. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 4, wherein said pseudo-concept filter removes concepts from queries.
12. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 1, wherein said parser comprises: a sentence receiver that receives sentences including ontological entities from said sentence lexer; a parser that parses said sentences, received by said sentence receiver, into parse trees representing concepts in a sentence; and a parse tree converter that receives the output of said parser and converts said parse trees into predicates.
13. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 12, wherein said parser further comprises: parser filters operating on said predicates to remove erroneous predicates.
14. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 12, wherein said parser looks ahead at least one word, scans input from left-to-right, and constructs said parse tree.
15. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 13, wherein said parser filters remove parse trees that violate one of statistical and ontological criteria for well-formedness.
16. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 15, wherein said parser filters include a selectional restriction filter and a parse probability filter.
17. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 16, wherein said selectional restriction filter vetoes parse trees having conflicts between selectional features of concepts serving as arguments to a second concept and restrictions of said concept.
18. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 2, wherein said ontology is a parameterized ontology that assigns numbers to said concepts.
19. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 18, wherein said numbers can be subtracted to determine if features are in agreement, wherein a non-negative number indicates agreement.
20. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 18, wherein said numbers can be subtracted to determine if features are in agreement, wherein a negative number indicates feature incompatibility.
21. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 16, wherein said parse probability filter vetoes parse trees that fall below a minimum probability for semantic interpretation.
22. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 1, wherein said system is modular to permit the use of any part-of-speech-tagged ontology.
23. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 18, wherein in said parameterized ontology each data structure includes an integer value, where each digit of said integer corresponds to a specific branch taken at a corresponding level in said parse tree.
24. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 23, wherein said parameterization is encoded in two ways: a base of said integer bounds a number of branches extending from a root node of said ontology, while a number of digits in the integer bounds a potential depth of said parse tree.
25. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 23, wherein a first digit difference between two nodes provides a measure of the degree of ontological proximity of two concepts.
26. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 22, wherein said parse trees are represented by modified hexadecimal digits that have an octet of hexadecimal pairs to provide eight ontological levels and a branching factor at each node of 256.
27. A method of ontological parsing that converts natural-language text into predicate-argument format comprising the steps of: converting a natural language sentence into a sequence of ontological entities that are tagged with part-of-speech information; and converting said sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of a natural language sentence, and binds arguments into predicates.
28. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the step of modifying said sentences based on word meanings.
29. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the steps of: receiving sentences including ontological entities; parsing said sentences into parse trees representing concepts in a sentence; and converting said parse trees into predicates.
30. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, wherein said parsing comprises the step of looking ahead one word, scanning input from left-to-right, and constructing said parse tree.
31. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the step of removing parse trees that violate one of the statistical and ontological criteria for well-formedness.
32. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the step of vetoing parse trees having conflicts between selectional features of concepts serving as arguments to a second concept and restrictions of said concept.
33. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the step of assigning numbers to said concepts.
34. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the step of subtracting said numbers to determine if features are in agreement, wherein a non-negative number indicates agreement.
35. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 33, further comprising the step of subtracting said numbers to determine if features are in agreement, wherein a negative number indicates feature incompatibility.
36. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the step of vetoing parse trees that fall below a minimum probability for semantic interpretation.
37. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, wherein in said parameterized ontology each data structure includes an integer value, where each digit of said integer corresponds to a specific branch taken at a corresponding level in said parse tree.
38. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the step of encoding said parameterization in two ways: a base of said integer bounds a number of branches extending from a root node of said ontology, while a number of digits in the integer bounds a potential depth of said parse tree.
39. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 38, wherein a first digit difference between two nodes provides a measure of the degree of ontological proximity of two concepts.
40. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the step of representing said parse trees by modified hexadecimal numbers that have an octet of hexadecimal pairs to provide eight ontological levels and a branching factor at each node of 256.
41. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 23, wherein said parse trees are represented by multiple digits that are separated into multiple groups to provide multiple ontological levels and a branching factor at each node.
42. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 27, further comprising the step of representing said parse trees by multiple digits that are separated into groups to provide multiple ontological levels and a branching factor at each node.
PCT/US2001/032636 2000-10-27 2001-10-26 Ontology-based parser for natural language processing WO2002035376A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002224446A AU2002224446A1 (en) 2000-10-27 2001-10-26 Ontology-based parser for natural language processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/697,676 US7027974B1 (en) 2000-10-27 2000-10-27 Ontology-based parser for natural language processing
US09/697,676 2000-10-27

Publications (2)

Publication Number Publication Date
WO2002035376A2 true WO2002035376A2 (en) 2002-05-02
WO2002035376A3 WO2002035376A3 (en) 2003-08-28

Family

ID=24802088

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/032636 WO2002035376A2 (en) 2000-10-27 2001-10-26 Ontology-based parser for natural language processing

Country Status (3)

Country Link
US (1) US7027974B1 (en)
AU (1) AU2002224446A1 (en)
WO (1) WO2002035376A2 (en)

US7505989B2 (en) 2004-09-03 2009-03-17 Biowisdom Limited System and method for creating customized ontologies
US8473449B2 (en) * 2005-01-06 2013-06-25 Neuric Technologies, Llc Process of dialogue and discussion
US8775158B2 (en) * 2005-08-04 2014-07-08 Nec Corporation Data processing device, data processing method, and data processing program
US8677377B2 (en) * 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
JP4427500B2 (en) * 2005-09-29 2010-03-10 Toshiba Corporation Semantic analysis device, semantic analysis method, and semantic analysis program
US7633076B2 (en) 2005-09-30 2009-12-15 Apple Inc. Automated response to and sensing of user activity in portable devices
US7840451B2 (en) * 2005-11-07 2010-11-23 Sap Ag Identifying the most relevant computer system state information
US8805675B2 (en) * 2005-11-07 2014-08-12 Sap Ag Representing a computer system state to a user
US8832064B2 (en) * 2005-11-30 2014-09-09 At&T Intellectual Property Ii, L.P. Answer determination for natural language questioning
US7979295B2 (en) * 2005-12-02 2011-07-12 Sap Ag Supporting user interaction with a computer system
US7676489B2 (en) * 2005-12-06 2010-03-09 Sap Ag Providing natural-language interface to repository
WO2007097208A1 (en) * 2006-02-27 2007-08-30 Nec Corporation Language processing device, language processing method, and language processing program
US20090112583A1 (en) * 2006-03-07 2009-04-30 Yousuke Sakao Language Processing System, Language Processing Method and Program
US7962328B2 (en) * 2006-03-13 2011-06-14 Lexikos Corporation Method and apparatus for generating a compact data structure to identify the meaning of a symbol
US7765097B1 (en) * 2006-03-20 2010-07-27 Intuit Inc. Automatic code generation via natural language processing
US7869125B2 (en) * 2006-04-17 2011-01-11 Raytheon Company Multi-magnification viewing and aiming scope
WO2008027503A2 (en) * 2006-08-31 2008-03-06 The Regents Of The University Of California Semantic search engine
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US9633005B2 (en) 2006-10-10 2017-04-25 Abbyy Infopoisk Llc Exhaustive automatic processing of textual information
US9495358B2 (en) * 2006-10-10 2016-11-15 Abbyy Infopoisk Llc Cross-language text clustering
US20080086298A1 (en) * 2006-10-10 2008-04-10 Anisimovich Konstantin Method and system for translating sentences between langauges
US8145473B2 (en) 2006-10-10 2012-03-27 Abbyy Software Ltd. Deep model statistics method for machine translation
US8548795B2 (en) * 2006-10-10 2013-10-01 Abbyy Software Ltd. Method for translating documents from one language into another using a database of translations, a terminology dictionary, a translation dictionary, and a machine translation system
US9047275B2 (en) 2006-10-10 2015-06-02 Abbyy Infopoisk Llc Methods and systems for alignment of parallel text corpora
US9984071B2 (en) 2006-10-10 2018-05-29 Abbyy Production Llc Language ambiguity detection of text
US8214199B2 (en) * 2006-10-10 2012-07-03 Abbyy Software, Ltd. Systems for translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions
US9645993B2 (en) 2006-10-10 2017-05-09 Abbyy Infopoisk Llc Method and system for semantic searching
US8195447B2 (en) 2006-10-10 2012-06-05 Abbyy Software Ltd. Translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions
US9235573B2 (en) 2006-10-10 2016-01-12 Abbyy Infopoisk Llc Universal difference measure
US9588958B2 (en) * 2006-10-10 2017-03-07 Abbyy Infopoisk Llc Cross-language text classification
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
US20080256026A1 (en) * 2006-10-17 2008-10-16 Michael Glen Hays Method For Optimizing And Executing A Query Using Ontological Metadata
US7725466B2 (en) * 2006-10-24 2010-05-25 Tarique Mustafa High accuracy document information-element vector encoding server
US8515912B2 (en) 2010-07-15 2013-08-20 Palantir Technologies, Inc. Sharing and deconflicting data changes in a multimaster database system
JP4451435B2 (en) * 2006-12-06 2010-04-14 Honda Motor Co., Ltd. Language understanding device, language understanding method, and computer program
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US7912828B2 (en) * 2007-02-23 2011-03-22 Apple Inc. Pattern searching methods and apparatuses
US8959011B2 (en) 2007-03-22 2015-02-17 Abbyy Infopoisk Llc Indicating and correcting errors in machine translation systems
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US20080249762A1 (en) * 2007-04-05 2008-10-09 Microsoft Corporation Categorization of documents using part-of-speech smoothing
US8812296B2 (en) 2007-06-27 2014-08-19 Abbyy Infopoisk Llc Method and system for natural language dictionary generation
US8452725B2 (en) * 2008-09-03 2013-05-28 Hamid Hatami-Hanza System and method of ontological subject mapping for knowledge processing applications
ITFI20070177A1 (en) 2007-07-26 2009-01-27 Riccardo Vieri SYSTEM FOR THE CREATION AND SETTING OF AN ADVERTISING CAMPAIGN DERIVING FROM THE INSERTION OF ADVERTISING MESSAGES WITHIN AN EXCHANGE OF MESSAGES AND METHOD FOR ITS FUNCTIONING.
WO2009026140A2 (en) * 2007-08-16 2009-02-26 Hollingsworth William A Automatic text skimming using lexical chains
US8712758B2 (en) * 2007-08-31 2014-04-29 Microsoft Corporation Coreference resolution in an ambiguity-sensitive natural language processing system
US20090070322A1 (en) * 2007-08-31 2009-03-12 Powerset, Inc. Browsing knowledge on the basis of semantic relations
US8209321B2 (en) * 2007-08-31 2012-06-26 Microsoft Corporation Emphasizing search results according to conceptual meaning
US8229970B2 (en) * 2007-08-31 2012-07-24 Microsoft Corporation Efficient storage and retrieval of posting lists
US8463593B2 (en) 2007-08-31 2013-06-11 Microsoft Corporation Natural language hypernym weighting for word sense disambiguation
US8229730B2 (en) * 2007-08-31 2012-07-24 Microsoft Corporation Indexing role hierarchies for words in a search index
US8346756B2 (en) * 2007-08-31 2013-01-01 Microsoft Corporation Calculating valence of expressions within documents for searching a document index
US8868562B2 (en) * 2007-08-31 2014-10-21 Microsoft Corporation Identification of semantic relationships within reported speech
US8280721B2 (en) * 2007-08-31 2012-10-02 Microsoft Corporation Efficiently representing word sense probabilities
US7984032B2 (en) * 2007-08-31 2011-07-19 Microsoft Corporation Iterators for applying term occurrence-level constraints in natural language searching
US8041697B2 (en) * 2007-08-31 2011-10-18 Microsoft Corporation Semi-automatic example-based induction of semantic translation rules to support natural language search
US8316036B2 (en) * 2007-08-31 2012-11-20 Microsoft Corporation Checkpointing iterators during search
US8024177B2 (en) * 2007-09-28 2011-09-20 Cycorp, Inc. Method of transforming natural language expression into formal language representation
US9053089B2 (en) 2007-10-02 2015-06-09 Apple Inc. Part-of-speech tagging using latent analogy
US8595642B1 (en) 2007-10-04 2013-11-26 Great Northern Research, LLC Multiple shell multi faceted graphical user interface
US8165886B1 (en) 2007-10-04 2012-04-24 Great Northern Research LLC Speech interface system and method for control and interaction with applications on a computing system
US8364694B2 (en) 2007-10-26 2013-01-29 Apple Inc. Search assistant for digital media assets
US8620662B2 (en) 2007-11-20 2013-12-31 Apple Inc. Context-aware unit selection
US8412516B2 (en) * 2007-11-27 2013-04-02 Accenture Global Services Limited Document analysis, commenting, and reporting system
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US10002189B2 (en) 2007-12-20 2018-06-19 Apple Inc. Method and apparatus for searching using an active ontology
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US8327272B2 (en) 2008-01-06 2012-12-04 Apple Inc. Portable multifunction device, method, and graphical user interface for viewing and managing electronic calendars
US8065143B2 (en) 2008-02-22 2011-11-22 Apple Inc. Providing text input using speech data and non-speech data
US8289283B2 (en) * 2008-03-04 2012-10-16 Apple Inc. Language input interface on a device
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8473279B2 (en) * 2008-05-30 2013-06-25 Eiman Al-Shammari Lemmatizing, stemming, and query expansion method and system
US8738360B2 (en) 2008-06-06 2014-05-27 Apple Inc. Data detection of a character sequence having multiple possible data types
US8464150B2 (en) 2008-06-07 2013-06-11 Apple Inc. Automatic language identification for dynamic text processing
US20100030549A1 (en) 2008-07-31 2010-02-04 Lee Michael M Mobile device having human language translation capability with positional feedback
US9262409B2 (en) 2008-08-06 2016-02-16 Abbyy Infopoisk Llc Translation of a selected text fragment of a screen
US8794972B2 (en) * 2008-08-07 2014-08-05 Lynn M. LoPucki System and method for enhancing comprehension and readability of legal text
US8768702B2 (en) 2008-09-05 2014-07-01 Apple Inc. Multi-tiered voice feedback in an electronic device
US8898568B2 (en) 2008-09-09 2014-11-25 Apple Inc. Audio user interface
US8352268B2 (en) 2008-09-29 2013-01-08 Apple Inc. Systems and methods for selective rate of speech and speech preferences for text to speech synthesis
US20100082328A1 (en) * 2008-09-29 2010-04-01 Apple Inc. Systems and methods for speech preprocessing in text to speech synthesis
US8352272B2 (en) * 2008-09-29 2013-01-08 Apple Inc. Systems and methods for text to speech synthesis
US8396714B2 (en) * 2008-09-29 2013-03-12 Apple Inc. Systems and methods for concatenation of words in text to speech synthesis
US8583418B2 (en) 2008-09-29 2013-11-12 Apple Inc. Systems and methods of detecting language and natural language strings for text to speech synthesis
US8355919B2 (en) * 2008-09-29 2013-01-15 Apple Inc. Systems and methods for text normalization for text to speech synthesis
US8712776B2 (en) 2008-09-29 2014-04-29 Apple Inc. Systems and methods for selective text to speech synthesis
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
WO2010067118A1 (en) 2008-12-11 2010-06-17 Novauris Technologies Limited Speech recognition involving a mobile device
US8862252B2 (en) 2009-01-30 2014-10-14 Apple Inc. Audio user interface for displayless electronic device
US8417523B2 (en) * 2009-02-03 2013-04-09 SoftHUS Sp z.o.o Systems and methods for interactively accessing hosted services using voice communications
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
US20100228538A1 (en) * 2009-03-03 2010-09-09 Yamada John A Computational linguistic systems and methods
US8380507B2 (en) 2009-03-09 2013-02-19 Apple Inc. Systems and methods for determining the language to use for speech generated by a text to speech engine
US8326602B2 (en) * 2009-06-05 2012-12-04 Google Inc. Detecting writing systems and languages
US8468011B1 (en) 2009-06-05 2013-06-18 Google Inc. Detecting writing systems and languages
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10540976B2 (en) 2009-06-05 2020-01-21 Apple Inc. Contextual voice commands
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US20100332217A1 (en) * 2009-06-29 2010-12-30 Shalom Wintner Method for text improvement via linguistic abstractions
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
US20110010179A1 (en) * 2009-07-13 2011-01-13 Naik Devang K Voice synthesis and processing
CN101996208B (en) * 2009-08-31 2014-04-02 International Business Machines Corporation Method and system for database semantic query answering
US8682649B2 (en) 2009-11-12 2014-03-25 Apple Inc. Sentiment prediction from textual data
US8600743B2 (en) 2010-01-06 2013-12-03 Apple Inc. Noise profile determination for voice-related feature
US8347276B2 (en) * 2010-01-07 2013-01-01 Gunther Schadow Systems and methods for software specification and design using a unified document
US8311838B2 (en) 2010-01-13 2012-11-13 Apple Inc. Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts
US8381107B2 (en) 2010-01-13 2013-02-19 Apple Inc. Adaptive audio feedback system and method
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
WO2011089450A2 (en) 2010-01-25 2011-07-28 Andrew Peter Nelson Jerram Apparatuses, methods and systems for a digital conversation management platform
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US8788260B2 (en) * 2010-05-11 2014-07-22 Microsoft Corporation Generating snippets based on content features
US9002700B2 (en) 2010-05-13 2015-04-07 Grammarly, Inc. Systems and methods for advanced grammar checking
US8935196B2 (en) * 2010-05-28 2015-01-13 Siemens Aktiengesellschaft System and method for providing instance information data of an instance
US8639516B2 (en) 2010-06-04 2014-01-28 Apple Inc. User-specific noise suppression for voice quality improvements
EP2583192A4 (en) 2010-06-21 2016-11-30 Commw Scient Ind Res Org Modification of description logic expressions
US8713021B2 (en) 2010-07-07 2014-04-29 Apple Inc. Unsupervised document clustering using latent semantic density analysis
US9104670B2 (en) 2010-07-21 2015-08-11 Apple Inc. Customized search or acquisition of digital media assets
US8719006B2 (en) 2010-08-27 2014-05-06 Apple Inc. Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis
US9713774B2 (en) 2010-08-30 2017-07-25 Disney Enterprises, Inc. Contextual chat message generation in online environments
JP5695199B2 (en) * 2010-08-30 2015-04-01 Honda Motor Co., Ltd. Thought tracking and action selection in dialogue systems
US8719014B2 (en) 2010-09-27 2014-05-06 Apple Inc. Electronic device with text error correction based on voice recognition data
US9928296B2 (en) * 2010-12-16 2018-03-27 Microsoft Technology Licensing, Llc Search lexicon expansion
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10515147B2 (en) 2010-12-22 2019-12-24 Apple Inc. Using statistical language models for contextual lookup
US10185477B1 (en) 2013-03-15 2019-01-22 Narrative Science Inc. Method and system for configuring automatic generation of narratives from data
US9720899B1 (en) 2011-01-07 2017-08-01 Narrative Science, Inc. Automatic generation of narratives from data using communication goals and narrative analytics
US9552353B2 (en) * 2011-01-21 2017-01-24 Disney Enterprises, Inc. System and method for generating phrases
US8781836B2 (en) 2011-02-22 2014-07-15 Apple Inc. Hearing assistance system for providing consistent human speech
US20120239381A1 (en) 2011-03-17 2012-09-20 Sap Ag Semantic phrase suggestion engine
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10642934B2 (en) 2011-03-31 2020-05-05 Microsoft Technology Licensing, Llc Augmented conversational understanding architecture
US9760566B2 (en) 2011-03-31 2017-09-12 Microsoft Technology Licensing, Llc Augmented conversational understanding agent to identify conversation context between two humans and taking an agent action thereof
US9842168B2 (en) 2011-03-31 2017-12-12 Microsoft Technology Licensing, Llc Task driven user intents
US9110883B2 (en) * 2011-04-01 2015-08-18 Rima Ghannam System for natural language understanding
US9064006B2 (en) 2012-08-23 2015-06-23 Microsoft Technology Licensing, Llc Translating natural language utterances to keyword search queries
CA2741212C (en) * 2011-05-27 2020-12-08 Ibm Canada Limited - Ibm Canada Limitee Automated self-service user support based on ontology analysis
US20120310642A1 (en) 2011-06-03 2012-12-06 Apple Inc. Automatically creating a mapping between text data and audio data
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US8812294B2 (en) 2011-06-21 2014-08-19 Apple Inc. Translating phrases from one language into another using an order-based set of declarative rules
US9501455B2 (en) * 2011-06-30 2016-11-22 The Boeing Company Systems and methods for processing data
US8521769B2 (en) 2011-07-25 2013-08-27 The Boeing Company Locating ambiguities in data
US8706472B2 (en) 2011-08-11 2014-04-22 Apple Inc. Method for disambiguating multiple readings in language conversion
US9176947B2 (en) * 2011-08-19 2015-11-03 Disney Enterprises, Inc. Dynamically generated phrase-based assisted input
US9245253B2 (en) 2011-08-19 2016-01-26 Disney Enterprises, Inc. Soft-sending chat messages
US9576573B2 (en) 2011-08-29 2017-02-21 Microsoft Technology Licensing, Llc Using multiple modality input to feedback context for natural language understanding
US8994660B2 (en) 2011-08-29 2015-03-31 Apple Inc. Text correction processing
US8762156B2 (en) 2011-09-28 2014-06-24 Apple Inc. Speech recognition repair using contextual information
JP2015505082A (en) 2011-12-12 2015-02-16 International Business Machines Corporation Generation of natural language processing model for information domain
CA2767676C (en) 2012-02-08 2022-03-01 Ibm Canada Limited - Ibm Canada Limitee Attribution using semantic analysis
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9092504B2 (en) 2012-04-09 2015-07-28 Vivek Ventures, LLC Clustered information processing and searching with structured-unstructured database bridge
US8971630B2 (en) 2012-04-27 2015-03-03 Abbyy Development Llc Fast CJK character recognition
US8989485B2 (en) 2012-04-27 2015-03-24 Abbyy Development Llc Detecting a junction in a text line of CJK characters
US9127950B2 (en) 2012-05-03 2015-09-08 Honda Motor Co., Ltd. Landmark-based location belief tracking for voice-controlled navigation system
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US8775442B2 (en) 2012-05-15 2014-07-08 Apple Inc. Semantic search using a single-source semantic model
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US20130311166A1 (en) * 2012-05-15 2013-11-21 Andre Yanpolsky Domain-Specific Natural-Language Processing Engine
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
WO2013185109A2 (en) 2012-06-08 2013-12-12 Apple Inc. Systems and methods for recognizing textual identifiers within a plurality of words
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9336297B2 (en) * 2012-08-02 2016-05-10 Paypal, Inc. Content inversion for user searches and product recommendations systems and methods
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US8935167B2 (en) 2012-09-25 2015-01-13 Apple Inc. Exemplar-based latent perceptual modeling for automatic speech recognition
US9165329B2 (en) 2012-10-19 2015-10-20 Disney Enterprises, Inc. Multi layer chat detection and classification
US9081975B2 (en) 2012-10-22 2015-07-14 Palantir Technologies, Inc. Sharing information between nexuses that use different classification schemes for information access control
WO2014066651A2 (en) * 2012-10-25 2014-05-01 Walker Reading Technologies, Inc. Sentence parsing correction system
US10650089B1 (en) * 2012-10-25 2020-05-12 Walker Reading Technologies Sentence parsing correction system
US9501761B2 (en) 2012-11-05 2016-11-22 Palantir Technologies, Inc. System and method for sharing investigation results
US9244909B2 (en) 2012-12-10 2016-01-26 General Electric Company System and method for extracting ontological information from a body of text
US9053422B1 (en) 2012-12-21 2015-06-09 Lockheed Martin Corporation Computer system and method for planning a strategy from data and model ontologies to meet a challenge
CN113470640B (en) 2013-02-07 2022-04-26 Apple Inc. Voice trigger of digital assistant
DE102013003055A1 (en) 2013-02-18 2014-08-21 Nadine Sina Kurz Method and apparatus for performing natural language searches
US9569425B2 (en) * 2013-03-01 2017-02-14 The Software Shop, Inc. Systems and methods for improving the efficiency of syntactic and semantic analysis in automated processes for natural language understanding using traveling features
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US10572476B2 (en) 2013-03-14 2020-02-25 Apple Inc. Refining a search based on schedule items
US9733821B2 (en) 2013-03-14 2017-08-15 Apple Inc. Voice control to diagnose inadvertent activation of accessibility features
US9977779B2 (en) 2013-03-14 2018-05-22 Apple Inc. Automatic supplementation of word correction dictionaries
US10642574B2 (en) 2013-03-14 2020-05-05 Apple Inc. Device, method, and graphical user interface for outputting captions
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US8855999B1 (en) 2013-03-15 2014-10-07 Palantir Technologies Inc. Method and system for generating a parser and parsing complex data
US10303762B2 (en) 2013-03-15 2019-05-28 Disney Enterprises, Inc. Comprehensive safety schema for ensuring appropriateness of language in online chat
US20140278362A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Entity Recognition in Natural Language Processing Systems
US9262550B2 (en) 2013-03-15 2016-02-16 Business Objects Software Ltd. Processing semi-structured data
CN110096712B (en) 2013-03-15 2023-06-20 Apple Inc. User training through intelligent digital assistant
US9299041B2 (en) 2013-03-15 2016-03-29 Business Objects Software Ltd. Obtaining data from unstructured data for a structured data collection
WO2014144579A1 (en) 2013-03-15 2014-09-18 Apple Inc. System and method for updating an adaptive speech recognition model
US20140280008A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Axiomatic Approach for Entity Attribution in Unstructured Data
US10742577B2 (en) 2013-03-15 2020-08-11 Disney Enterprises, Inc. Real-time search and validation of phrases using linguistic phrase components
US9218568B2 (en) 2013-03-15 2015-12-22 Business Objects Software Ltd. Disambiguating data using contextual and historical information
CN105027197B (en) 2013-03-15 2018-12-14 Apple Inc. Training at least partly voice command system
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US10078487B2 (en) 2013-03-15 2018-09-18 Apple Inc. Context-sensitive handling of interruptions
US10579835B1 (en) * 2013-05-22 2020-03-03 Sri International Semantic pre-processing of natural language input in a virtual personal assistant
US9286029B2 (en) * 2013-06-06 2016-03-15 Honda Motor Co., Ltd. System and method for multimodal human-vehicle interaction and belief tracking
WO2014197336A1 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
WO2014197334A2 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
WO2014197335A1 (en) 2013-06-08 2014-12-11 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
KR101922663B1 (en) 2013-06-09 2018-11-28 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
KR101809808B1 (en) 2013-06-13 2017-12-15 Apple Inc. System and method for emergency calls initiated by voice command
DE112014003653B4 (en) 2013-08-06 2024-04-18 Apple Inc. Automatically activate intelligent responses based on activities from remote devices
US9223773B2 (en) 2013-08-08 2015-12-29 Palatir Technologies Inc. Template system for custom document generation
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
RU2592395C2 (en) 2013-12-19 2016-07-20 Abbyy Infopoisk Llc Resolution semantic ambiguity by statistical analysis
RU2586577C2 (en) 2014-01-15 2016-06-10 Abbyy Infopoisk Llc Filtering arcs parser graph
US9009827B1 (en) 2014-02-20 2015-04-14 Palantir Technologies Inc. Security sharing system
US11436270B2 (en) * 2014-02-28 2022-09-06 San Diego State University Research Foundation Knowledge reference system and method
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
TWI566107B (en) 2014-05-30 2017-01-11 Apple Inc. Method for processing a multi-part voice command, non-transitory computer readable storage medium and electronic device
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9471283B2 (en) 2014-06-11 2016-10-18 Ca, Inc. Generating virtualized application programming interface (API) implementation from narrative API documentation
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10572496B1 (en) 2014-07-03 2020-02-25 Palantir Technologies Inc. Distributed workflow system and database with access controls for city resiliency
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
RU2596600C2 (en) 2014-09-02 2016-09-10 Abbyy Development Llc Methods and systems for processing images of mathematical expressions
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9898459B2 (en) * 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
US9626703B2 (en) 2014-09-16 2017-04-18 Voicebox Technologies Corporation Voice commerce
US9836529B2 (en) 2014-09-22 2017-12-05 Oracle International Corporation Semantic text search
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9665564B2 (en) 2014-10-06 2017-05-30 International Business Machines Corporation Natural language processing utilizing logical tree structures
US9715488B2 (en) * 2014-10-06 2017-07-25 International Business Machines Corporation Natural language processing utilizing transaction based knowledge representation
US9588961B2 (en) 2014-10-06 2017-03-07 International Business Machines Corporation Natural language processing utilizing propagation of knowledge through logical parse tree structures
US9424298B2 (en) * 2014-10-07 2016-08-23 International Business Machines Corporation Preserving conceptual distance within unstructured documents
CN107003999B (en) 2014-10-15 2020-08-21 Voicebox Technologies System and method for subsequent response to a user's prior natural language input
US11922344B2 (en) 2014-10-22 2024-03-05 Narrative Science Llc Automatic generation of narratives from data using communication goals and narrative analytics
US11341338B1 (en) 2016-08-31 2022-05-24 Narrative Science Inc. Applied artificial intelligence technology for interactively using narrative analytics to focus and control visualizations of data
US11238090B1 (en) 2015-11-02 2022-02-01 Narrative Science Inc. Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from visualization data
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US9626358B2 (en) 2014-11-26 2017-04-18 Abbyy Infopoisk Llc Creating ontologies by analyzing natural language texts
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9734254B2 (en) 2015-01-13 2017-08-15 Bank Of America Corporation Method and apparatus for automatic completion of an entry into an input field
RU2596599C2 (en) * 2015-02-03 2016-09-10 ABBYY InfoPoisk LLC System and method of creating and using user ontology-based patterns for processing user text in natural language
US10019437B2 (en) * 2015-02-23 2018-07-10 International Business Machines Corporation Facilitating information extraction via semantic abstraction
US9727560B2 (en) 2015-02-25 2017-08-08 Palantir Technologies Inc. Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags
US20160259851A1 (en) * 2015-03-04 2016-09-08 The Allen Institute For Artificial Intelligence System and methods for generating treebanks for natural language processing by modifying parser operation through introduction of constraints on parse tree structure
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US9898447B2 (en) 2015-06-22 2018-02-20 International Business Machines Corporation Domain specific representation of document text for accelerated natural language processing
US10853378B1 (en) 2015-08-25 2020-12-01 Palantir Technologies Inc. Electronic note management via a connected entity graph
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
EP3142028A3 (en) * 2015-09-11 2017-07-12 Google, Inc. Handling failures in processing natural language queries through user interactions
CN107016011B (en) 2015-09-11 2021-03-30 Google LLC Disambiguation of join paths for natural language queries
US11301502B1 (en) * 2015-09-15 2022-04-12 Google Llc Parsing natural language queries without retraining
US11157260B2 (en) 2015-09-18 2021-10-26 ReactiveCore LLC Efficient information storage and retrieval using subgraphs
US9864598B2 (en) 2015-09-18 2018-01-09 ReactiveCore LLC System and method for providing supplemental functionalities to a computer program
US9372684B1 (en) * 2015-09-18 2016-06-21 ReactiveCore LLC System and method for providing supplemental functionalities to a computer program via an ontology instance
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US11222184B1 (en) 2015-11-02 2022-01-11 Narrative Science Inc. Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from bar charts
US11188588B1 (en) 2015-11-02 2021-11-30 Narrative Science Inc. Applied artificial intelligence technology for using narrative analytics to interactively generate narratives from visualization data
US11232268B1 (en) 2015-11-02 2022-01-25 Narrative Science Inc. Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from line charts
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10698938B2 (en) 2016-03-18 2020-06-30 Palantir Technologies Inc. Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags
US10169454B2 (en) * 2016-05-17 2019-01-01 Xerox Corporation Unsupervised ontology-based graph extraction from texts
RU2640297C2 (en) * 2016-05-17 2017-12-27 ABBYY Production LLC Definition of confidence degrees related to attribute values of information objects
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US20230410504A9 (en) * 2016-06-06 2023-12-21 Purdue Research Foundation System and method for sentence directed video object codetection
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
EP3465487A4 (en) * 2016-06-06 2020-01-22 Purdue Research Foundation System and method for sentence directed video object codetection
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
DK179588B1 (en) 2016-06-09 2019-02-22 Apple Inc. Intelligent automated assistant in a home environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc Intelligent task discovery
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
DK179049B1 (en) 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10042840B2 (en) 2016-08-04 2018-08-07 Oath Inc. Hybrid grammatical and ungrammatical parsing
US10102200B2 (en) 2016-08-25 2018-10-16 International Business Machines Corporation Predicate parses using semantic knowledge
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
WO2018060777A1 (en) * 2016-09-29 2018-04-05 Yokogawa Electric Corporation Method and system for optimizing software testing
RU2646386C1 (en) * 2016-12-07 2018-03-02 ABBYY Production LLC Extraction of information using alternative variants of semantic-syntactic analysis
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
RU2646380C1 (en) * 2016-12-22 2018-03-02 ABBYY Production LLC Using verified by user data for training models of confidence
US10943069B1 (en) 2017-02-17 2021-03-09 Narrative Science Inc. Applied artificial intelligence technology for narrative generation based on a conditional outcome framework
US11068661B1 (en) 2017-02-17 2021-07-20 Narrative Science Inc. Applied artificial intelligence technology for narrative generation based on smart attributes
US11568148B1 (en) 2017-02-17 2023-01-31 Narrative Science Inc. Applied artificial intelligence technology for narrative generation based on explanation communication goals
US10755053B1 (en) 2017-02-17 2020-08-25 Narrative Science Inc. Applied artificial intelligence technology for story outline formation using composable communication goals to support natural language generation (NLG)
US11954445B2 (en) 2017-02-17 2024-04-09 Narrative Science Llc Applied artificial intelligence technology for narrative generation based on explanation communication goals
DK201770439A1 (en) 2017-05-11 2018-12-13 Apple Inc. Offline personal assistant
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
DK179560B1 (en) 2017-05-16 2019-02-18 Apple Inc. Far-field extension for digital assistant services
US10810273B2 (en) 2017-06-13 2020-10-20 Bank Of America Corporation Auto identification and mapping of functional attributes from visual representation
US11240184B2 (en) 2017-06-23 2022-02-01 Realpage, Inc. Interaction driven artificial intelligence system and uses for same, including presentation through portions of web pages
US11138249B1 (en) 2017-08-23 2021-10-05 Realpage, Inc. Systems and methods for the creation, update and use of concept networks to select destinations in artificial intelligence systems
US10872125B2 (en) 2017-10-05 2020-12-22 Realpage, Inc. Concept networks and systems and methods for the creation, update and use of same to select images, including the selection of images corresponding to destinations in artificial intelligence systems
US10997259B2 (en) * 2017-10-06 2021-05-04 Realpage, Inc. Concept networks and systems and methods for the creation, update and use of same in artificial intelligence systems
US11042709B1 (en) 2018-01-02 2021-06-22 Narrative Science Inc. Context saliency-based deictic parser for natural language processing
US10896294B2 (en) 2018-01-11 2021-01-19 End Cue, Llc Script writing and content generation tools and improved operation of same
EP3714382A4 (en) * 2018-01-11 2021-01-20 End Cue, LLC Script writing and content generation tools and improved operation of same
US10963649B1 (en) 2018-01-17 2021-03-30 Narrative Science Inc. Applied artificial intelligence technology for narrative generation using an invocable analysis service and configuration-driven analytics
US11023684B1 (en) * 2018-03-19 2021-06-01 Educational Testing Service Systems and methods for automatic generation of questions from text
RU2681356C1 (en) * 2018-03-23 2019-03-06 ABBYY Production LLC Classifier training used for extracting information from texts in natural language
US10606951B2 (en) 2018-04-17 2020-03-31 International Business Machines Corporation Optimizing resource allocation to a bid request response based on cognitive analysis of natural language documentation
US10650186B2 (en) 2018-06-08 2020-05-12 Handycontract, LLC Device, system and method for displaying sectioned documents
US11042713B1 (en) 2018-06-28 2021-06-22 Narrative Science Inc. Applied artificial intelligence technology for using natural language processing to train a natural language generation system
US11341330B1 (en) 2019-01-28 2022-05-24 Narrative Science Inc. Applied artificial intelligence technology for adaptive natural language understanding with term discovery
US11416558B2 (en) * 2019-03-29 2022-08-16 Indiavidual Learning Private Limited System and method for recommending personalized content using contextualized knowledge base
US11526515B2 (en) 2020-07-28 2022-12-13 International Business Machines Corporation Replacing mappings within a semantic search application over a commonly enriched corpus
US11640430B2 (en) 2020-07-28 2023-05-02 International Business Machines Corporation Custom semantic search experience driven by an ontology
US11481561B2 (en) * 2020-07-28 2022-10-25 International Business Machines Corporation Semantic linkage qualification of ontologically related entities
US11748342B2 (en) 2021-08-06 2023-09-05 Cloud Software Group, Inc. Natural language based processor and query constructor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0413132A2 (en) * 1989-08-16 1991-02-20 International Business Machines Corporation A computer method for identifying predicate-argument structures in natural language text
US5870701A (en) * 1992-08-21 1999-02-09 Canon Kabushiki Kaisha Control signal processing method and apparatus having natural language interfacing capabilities
US5930746A (en) * 1996-03-20 1999-07-27 The Government Of Singapore Parsing and translating natural language sentences automatically
WO2000049517A2 (en) * 1999-02-19 2000-08-24 The Trustees Of Columbia University In The City Of New York Multi-document summarization system and method

Family Cites Families (58)

Publication number Priority date Publication date Assignee Title
US4270182A (en) 1974-12-30 1981-05-26 Asija Satya P Automated information input, storage, and retrieval system
US4887212A (en) 1986-10-29 1989-12-12 International Business Machines Corporation Parser for natural language text
US4864502A (en) 1987-10-07 1989-09-05 Houghton Mifflin Company Sentence analyzer
US4914590A (en) 1988-05-18 1990-04-03 Emhart Industries, Inc. Natural language understanding system
US4984178A (en) 1989-02-21 1991-01-08 Texas Instruments Incorporated Chart parser for stochastic unification grammar
SE466029B (en) 1989-03-06 1991-12-02 Ibm Svenska Ab DEVICE AND PROCEDURE FOR ANALYSIS OF NATURAL LANGUAGES IN A COMPUTER-BASED INFORMATION PROCESSING SYSTEM
JPH02240769A (en) 1989-03-14 1990-09-25 Canon Inc Device for preparing natural language sentence
US5056021A (en) 1989-06-08 1991-10-08 Carolyn Ausborn Method and apparatus for abstracting concepts from natural language
JPH03129472A (en) 1989-07-31 1991-06-03 Ricoh Co Ltd Processing method for document retrieving device
US5309359A (en) 1990-08-16 1994-05-03 Boris Katz Method and apparatus for generating and utlizing annotations to facilitate computer text retrieval
US5404295A (en) 1990-08-16 1995-04-04 Katz; Boris Method and apparatus for utilizing annotations to facilitate computer retrieval of database material
US5321833A (en) 1990-08-29 1994-06-14 Gte Laboratories Incorporated Adaptive ranking system for information retrieval
EP0473864A1 (en) 1990-09-04 1992-03-11 International Business Machines Corporation Method and apparatus for paraphrasing information contained in logical forms
US5317507A (en) 1990-11-07 1994-05-31 Gallant Stephen I Method for document retrieval and for word sense disambiguation using neural networks
JP2943447B2 (en) 1991-01-30 1999-08-30 Mitsubishi Electric Corporation Text information extraction device, text similarity matching device, text search system, text information extraction method, text similarity matching method, and question analysis device
US5680627A (en) 1991-02-15 1997-10-21 Texas Instruments Incorporated Method and apparatus for character preprocessing which translates textual description into numeric form for input to a neural network
US5265065A (en) 1991-10-08 1993-11-23 West Publishing Company Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
US5446891A (en) 1992-02-26 1995-08-29 International Business Machines Corporation System for adjusting hypertext links with weighed user goals and activities
US6055531A (en) 1993-03-24 2000-04-25 Engate Incorporated Down-line transcription system having context sensitive searching capability
US5475588A (en) 1993-06-18 1995-12-12 Mitsubishi Electric Research Laboratories, Inc. System for decreasing the time required to parse a sentence
US5331556A (en) 1993-06-28 1994-07-19 General Electric Company Method for natural language data processing using morphological and part-of-speech information
US5619709A (en) 1993-09-20 1997-04-08 Hnc, Inc. System and method of context vector generation and retrieval
US5873056A (en) 1993-10-12 1999-02-16 The Syracuse University Natural language processing system for semantic vector representation which accounts for lexical ambiguity
US5576954A (en) 1993-11-05 1996-11-19 University Of Central Florida Process for determination of text relevancy
JP3476237B2 (en) 1993-12-28 2003-12-10 Fujitsu Limited Parser
US5706497A (en) 1994-08-15 1998-01-06 Nec Research Institute, Inc. Document retrieval using fuzzy-logic inference
JPH0877010A (en) 1994-09-07 1996-03-22 Hitachi Ltd Method and device for data analysis
US5790754A (en) 1994-10-21 1998-08-04 Sensory Circuits, Inc. Speech recognition apparatus for consumer electronic applications
US5758257A (en) 1994-11-29 1998-05-26 Herz; Frederick System and method for scheduling broadcast of and access to video programs and other data using customer profiles
US5642502A (en) 1994-12-06 1997-06-24 University Of Central Florida Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text
US5794050A (en) * 1995-01-04 1998-08-11 Intelligent Text Processing, Inc. Natural language understanding system
US5694523A (en) 1995-05-31 1997-12-02 Oracle Corporation Content processing system for discourse
US5721938A (en) 1995-06-07 1998-02-24 Stuckey; Barbara K. Method and device for parsing and analyzing natural language sentences and text
US5675710A (en) 1995-06-07 1997-10-07 Lucent Technologies, Inc. Method and apparatus for training a text classifier
EP0856175A4 (en) 1995-08-16 2000-05-24 Univ Syracuse Multilingual document retrieval system and method using semantic vector matching
US5963940A (en) 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
US6026388A (en) 1995-08-16 2000-02-15 Textwise, Llc User interface and other enhancements for natural language information retrieval system and method
US5721902A (en) 1995-09-15 1998-02-24 Infonautics Corporation Restricted expansion of query terms using part of speech tagging
US5974455A (en) 1995-12-13 1999-10-26 Digital Equipment Corporation System for adding new entry to web page table upon receiving web page including link to another web page not having corresponding entry in web page table
US5864855A (en) 1996-02-26 1999-01-26 The United States Of America As Represented By The Secretary Of The Army Parallel document clustering process
US5802515A (en) 1996-06-11 1998-09-01 Massachusetts Institute Of Technology Randomized query generation and document relevance ranking for robust information retrieval from a database
US5915249A (en) 1996-06-14 1999-06-22 Excite, Inc. System and method for accelerated query evaluation of very large full-text databases
US5864863A (en) 1996-08-09 1999-01-26 Digital Equipment Corporation Method for parsing, indexing and searching world-wide-web pages
US5920854A (en) 1996-08-14 1999-07-06 Infoseek Corporation Real-time document collection search engine with phrase indexing
US5870740A (en) 1996-09-30 1999-02-09 Apple Computer, Inc. System and method for improving the ranking of information retrieval results for short queries
US6076051A (en) 1997-03-07 2000-06-13 Microsoft Corporation Information retrieval utilizing semantic representation of text
US6049799A (en) 1997-05-12 2000-04-11 Novell, Inc. Document link management using directory services
US6038560A (en) 1997-05-21 2000-03-14 Oracle Corporation Concept knowledge base search and retrieval system
US5940821A (en) 1997-05-21 1999-08-17 Oracle Corporation Information presentation in a knowledge base search and retrieval system
US6047277A (en) 1997-06-19 2000-04-04 Parry; Michael H. Self-organizing neural network for plain text categorization
US6012053A (en) 1997-06-23 2000-01-04 Lycos, Inc. Computer system with user-controlled relevance ranking of search results
US6233575B1 (en) 1997-06-24 2001-05-15 International Business Machines Corporation Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values
US5933822A (en) 1997-07-22 1999-08-03 Microsoft Corporation Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision
US5960384A (en) 1997-09-03 1999-09-28 Brash; Douglas E. Method and device for parsing natural language sentences and other sequential symbolic expressions
US5974412A (en) 1997-09-24 1999-10-26 Sapient Health Network Intelligent query system for automatically indexing information in a database and automatically categorizing users
US5953718A (en) 1997-11-12 1999-09-14 Oracle Corporation Research mode for a knowledge base search and retrieval system
US6961728B2 (en) 2000-11-28 2005-11-01 Centerboard, Inc. System and methods for highly distributed wide-area data management of a network of data sources through a database interface
US6778979B2 (en) * 2001-08-13 2004-08-17 Xerox Corporation System for automatically generating queries

Cited By (66)

Publication number Priority date Publication date Assignee Title
US8082264B2 (en) 2004-04-07 2011-12-20 Inquira, Inc. Automated scheme for identifying user intent in real-time
US8924410B2 (en) 2004-04-07 2014-12-30 Oracle International Corporation Automated scheme for identifying user intent in real-time
US9747390B2 (en) 2004-04-07 2017-08-29 Oracle Otc Subsidiary Llc Ontology for use with a system, method, and computer readable medium for retrieving information and response to a query
US8612208B2 (en) * 2004-04-07 2013-12-17 Oracle Otc Subsidiary Llc Ontology for use with a system, method, and computer readable medium for retrieving information and response to a query
WO2006013233A1 (en) * 2004-07-01 2006-02-09 France Telecom Method and device for automatic processing of a language
US7739104B2 (en) 2005-05-27 2010-06-15 Hakia, Inc. System and method for natural language processing and using ontological searches
US7921099B2 (en) 2006-05-10 2011-04-05 Inquira, Inc. Guided navigation system
US7672951B1 (en) 2006-05-10 2010-03-02 Inquira, Inc. Guided navigation system
US8296284B2 (en) 2006-05-10 2012-10-23 Oracle International Corp. Guided navigation system
US7668850B1 (en) 2006-05-10 2010-02-23 Inquira, Inc. Rule based navigation
US8781813B2 (en) 2006-08-14 2014-07-15 Oracle Otc Subsidiary Llc Intent management tool for identifying concepts associated with a plurality of users' queries
US7747601B2 (en) 2006-08-14 2010-06-29 Inquira, Inc. Method and apparatus for identifying and classifying query intent
US8478780B2 (en) 2006-08-14 2013-07-02 Oracle Otc Subsidiary Llc Method and apparatus for identifying and classifying query intent
US10872067B2 (en) 2006-11-20 2020-12-22 Palantir Technologies, Inc. Creating data in a data store using a dynamic ontology
US8095476B2 (en) 2006-11-27 2012-01-10 Inquira, Inc. Automated support scheme for electronic forms
US8793208B2 (en) 2009-12-17 2014-07-29 International Business Machines Corporation Identifying common data objects representing solutions to a problem in different disciplines
US9053180B2 (en) 2009-12-17 2015-06-09 International Business Machines Corporation Identifying common data objects representing solutions to a problem in different disciplines
US9298817B2 (en) 2012-03-28 2016-03-29 International Business Machines Corporation Building an ontology by transforming complex triples
US8747115B2 (en) 2012-03-28 2014-06-10 International Business Machines Corporation Building an ontology by transforming complex triples
US9489453B2 (en) 2012-03-28 2016-11-08 International Business Machines Corporation Building an ontology by transforming complex triples
US8539001B1 (en) 2012-08-20 2013-09-17 International Business Machines Corporation Determining the value of an association between ontologies
US8799330B2 (en) 2012-08-20 2014-08-05 International Business Machines Corporation Determining the value of an association between ontologies
US9152623B2 (en) 2012-11-02 2015-10-06 Fido Labs, Inc. Natural language processing system and method
US10809888B2 (en) 2013-03-15 2020-10-20 Palantir Technologies, Inc. Systems and methods for providing a tagging interface for external content
US9740369B2 (en) 2013-03-15 2017-08-22 Palantir Technologies Inc. Systems and methods for providing a tagging interface for external content
US10572529B2 (en) 2013-03-15 2020-02-25 Palantir Technologies Inc. Data integration tool
US10120857B2 (en) 2013-03-15 2018-11-06 Palantir Technologies Inc. Method and system for generating a parser and parsing complex data
US9898167B2 (en) 2013-03-15 2018-02-20 Palantir Technologies Inc. Systems and methods for providing a tagging interface for external content
US11100154B2 (en) 2013-03-15 2021-08-24 Palantir Technologies Inc. Data integration tool
EP2988231A1 (en) * 2014-08-21 2016-02-24 Samsung Electronics Co., Ltd. Method and apparatus for providing summarized content to users
US10191926B2 (en) 2014-11-05 2019-01-29 Palantir Technologies, Inc. Universal data pipeline
US9946738B2 (en) 2014-11-05 2018-04-17 Palantir Technologies, Inc. Universal data pipeline
US9483506B2 (en) 2014-11-05 2016-11-01 Palantir Technologies, Inc. History preserving data pipeline
US10853338B2 (en) 2014-11-05 2020-12-01 Palantir Technologies Inc. Universal data pipeline
US10803106B1 (en) 2015-02-24 2020-10-13 Palantir Technologies Inc. System with methodology for dynamic modular ontology
US9865257B2 (en) 2015-04-30 2018-01-09 Kabushiki Kaisha Toshiba Device and method for a spoken dialogue system
GB2537903B (en) * 2015-04-30 2019-09-04 Toshiba Res Europe Limited Device and method for a spoken dialogue system
GB2537903A (en) * 2015-04-30 2016-11-02 Toshiba Res Europe Ltd Device and method for a spoken dialogue system
US9996595B2 (en) 2015-08-03 2018-06-12 Palantir Technologies, Inc. Providing full data provenance visualization for versioned datasets
US9965534B2 (en) 2015-09-09 2018-05-08 Palantir Technologies, Inc. Domain-specific language for dataset transformations
US11080296B2 (en) 2015-09-09 2021-08-03 Palantir Technologies Inc. Domain-specific language for dataset transformations
US10248722B2 (en) 2016-02-22 2019-04-02 Palantir Technologies Inc. Multi-language support for dynamic ontology
US10909159B2 (en) 2016-02-22 2021-02-02 Palantir Technologies Inc. Multi-language support for dynamic ontology
US11106638B2 (en) 2016-06-13 2021-08-31 Palantir Technologies Inc. Data revision control in large-scale data analytic systems
US10007674B2 (en) 2016-06-13 2018-06-26 Palantir Technologies Inc. Data revision control in large-scale data analytic systems
US10102229B2 (en) 2016-11-09 2018-10-16 Palantir Technologies Inc. Validating data integrations using a secondary data store
US9946777B1 (en) 2016-12-19 2018-04-17 Palantir Technologies Inc. Systems and methods for facilitating data transformation
US11768851B2 (en) 2016-12-19 2023-09-26 Palantir Technologies Inc. Systems and methods for facilitating data transformation
US11416512B2 (en) 2016-12-19 2022-08-16 Palantir Technologies Inc. Systems and methods for facilitating data transformation
US10482099B2 (en) 2016-12-19 2019-11-19 Palantir Technologies Inc. Systems and methods for facilitating data transformation
US9922108B1 (en) 2017-01-05 2018-03-20 Palantir Technologies Inc. Systems and methods for facilitating data transformation
US10776382B2 (en) 2017-01-05 2020-09-15 Palantir Technologies Inc. Systems and methods for facilitating data transformation
US10956406B2 (en) 2017-06-12 2021-03-23 Palantir Technologies Inc. Propagated deletion of database records and derived data
US11301499B2 (en) 2017-07-07 2022-04-12 Palantir Technologies Inc. Systems and methods for providing an object platform for datasets
US10691729B2 (en) 2017-07-07 2020-06-23 Palantir Technologies Inc. Systems and methods for providing an object platform for a relational database
US11741166B2 (en) 2017-11-10 2023-08-29 Palantir Technologies Inc. Systems and methods for creating and managing a data integration workspace
US10956508B2 (en) 2017-11-10 2021-03-23 Palantir Technologies Inc. Systems and methods for creating and managing a data integration workspace containing automatically updated data models
US10956670B2 (en) 2018-03-03 2021-03-23 Samurai Labs Sp. Z O.O. System and method for detecting undesirable and potentially harmful online behavior
US11151318B2 (en) 2018-03-03 2021-10-19 SAMURAI LABS sp. z. o.o. System and method for detecting undesirable and potentially harmful online behavior
US11507745B2 (en) 2018-03-03 2022-11-22 Samurai Labs Sp. Z O.O. System and method for detecting undesirable and potentially harmful online behavior
US11663403B2 (en) 2018-03-03 2023-05-30 Samurai Labs Sp. Z O.O. System and method for detecting undesirable and potentially harmful online behavior
US10754822B1 (en) 2018-04-18 2020-08-25 Palantir Technologies Inc. Systems and methods for ontology migration
US11461355B1 (en) 2018-05-15 2022-10-04 Palantir Technologies Inc. Ontological mapping of data
CN109241522A (en) * 2018-08-02 2019-01-18 义语智能科技(上海)有限公司 Coding-decoding method and equipment
CN110569342A (en) * 2019-08-15 2019-12-13 阿里巴巴集团控股有限公司 Question matching method, device, equipment and computer readable storage medium
CN110569342B (en) * 2019-08-15 2023-04-07 创新先进技术有限公司 Question matching method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
WO2002035376A3 (en) 2003-08-28
AU2002224446A1 (en) 2002-05-06
US7027974B1 (en) 2006-04-11

Similar Documents

Publication Publication Date Title
US7027974B1 (en) Ontology-based parser for natural language processing
US9710458B2 (en) System for natural language understanding
US6330530B1 (en) Method and system for transforming a source language linguistic structure into a target language linguistic structure based on example linguistic feature structures
US9824083B2 (en) System for natural language understanding
US5475587A (en) Method and apparatus for efficient morphological text analysis using a high-level language for compact specification of inflectional paradigms
US5243520A (en) Sense discrimination system and method
Carroll Practical unification-based parsing of natural language
Silberztein INTEX: an FST toolbox
WO1999021105A9 (en) Automatically recognizing the discourse structure of a body of text
Dahl Translating spanish into logic through logic
KR20100116595A (en) Managing an archive for approximate string matching
US10503769B2 (en) System for natural language understanding
Neumann et al. A shallow text processing core engine
JP2007026451A (en) Processing method of x-path query
EP0814417B1 (en) Method of and system for unifying data structures
Seo et al. Syntactic graphs: A representation for the union of all ambiguous parse trees
Abiteboul et al. A logical view of structured files
EA037156B1 (en) Method for template match searching in a text
Grana et al. Compilation methods of minimal acyclic finite-state automata for large dictionaries
Kaiser et al. Information extraction
Starc et al. Joint learning of ontology and semantic parser from text
Perfiliev et al. Methods of syntactic analysis and comparison of constructions of a natural language oriented to use in search systems
Alshawi Qualitative and quantitative models of speech translation
Harandi et al. Rule base management using meta knowledge
Yoon et al. A New Parsing Method Using a Global Association Table

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code
Ref country code: DE
Ref legal event code: 8642
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase
Ref country code: JP