CA1306062C - Computer information retrieval using latent semantic structure - Google Patents

Computer information retrieval using latent semantic structure

Info

Publication number
CA1306062C
Authority
CA
Canada
Prior art keywords
term
data
pseudo
matrix
information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA000596524A
Other languages
French (fr)
Inventor
Scott Craig Deerwester
Susan Theresa Dumais
George William Furnas
Richard Allan Harshman
Thomas K. Landauer
Karen Elizabeth Lochbaum
Lynn Anne Streeter
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Content Analyst Co LLC
Original Assignee
Bell Communications Research Inc
Application filed by Bell Communications Research Inc

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access

Abstract

Abstract of the Disclosure
A methodology for retrieving textual data objects is disclosed. The information is treated in the statistical domain by presuming that there is an underlying, latent semantic structure in the usage of words in the data objects. Estimates of this latent structure are utilized to represent and retrieve objects. A user query is recouched in the new statistical domain and then processed in the computer system to extract the underlying meaning to respond to the query.

Description


This invention relates generally to computer-based information retrieval and, in particular, to user accessibility to and display of textual material stored in computer files.
Background of the Invention
Increases in computer storage capacity, transmission rates and processing speed mean that many large and important collections of data are now available electronically, such as via bulletin boards, mail, and on-line texts, documents and directories. While many of the technological barriers to information access and display have been removed, the human/system interface problem of being able to locate what one really needs from the collections remains. Methods for storing, organizing and accessing this information range from electronic analogs of familiar paper-based techniques, such as tables of contents or indices, to richer associative connections that are feasible only with computers, such as hypertext and full-context addressability. While these techniques may provide retrieval benefits over the prior paper-based techniques, many advantages of electronic storage are yet unrealized. Most systems still require a user or provider of information to specify explicit relationships and links between data objects or text objects, thereby making the systems tedious to use or to apply to large, heterogeneous computer information files whose content may be unfamiliar to the user.
To exemplify one standard approach whose difficulties and deficiencies are representative of conventional approaches, the retrieval of information using keyword matching is considered. This technique depends on matching individual words in a user's request with individual words in the total database of textual material. Text objects that contain one or more words in common with those in the user's query are returned as relevant. Keyword-based retrieval systems like this are, however, far from ideal. Many objects relevant to the query may be missed, and oftentimes unrelated objects are retrieved.
The fundamental deficiency of current information retrieval methods is that the words a searcher uses are often not the same as those by which the information sought has been indexed. There are actually two aspects to the problem. First, there is a tremendous diversity in the words people use to describe the same object or concept; this is called synonymy. Users in different contexts, or with different needs, knowledge or linguistic habits will describe the same information using different terms. For example, it has been demonstrated that any two people choose the same main keyword for a single, well-known object less than 20% of the time on average. Indeed, this variability is much greater than commonly believed and this places strict, low limits on the expected performance of word-matching systems.
The second aspect relates to polysemy, a word having more than one distinct meaning. In different contexts or when used by different people the same word takes on varying referential significance (e.g., "bank" in river bank versus "bank" in a savings bank). Thus the use of a term in a search query does not necessarily mean that a text object containing or labeled by the same term is of interest.
Because human word use is characterized by extensive synonymy and polysemy, straightforward term-matching schemes have serious shortcomings: relevant materials will be missed because different people describe the same topic using different words and, because the same word can have different meanings, irrelevant material will be retrieved. The basic problem may be simply summarized by stating that people want to access information based on meaning, but the words they select do not adequately express intended meaning. Previous attempts to improve standard word searching and overcome the diversity in human word usage have involved: restricting the allowable vocabulary and training intermediaries to generate indexing and search keys; hand-crafting thesauri to provide synonyms; or constructing explicit models of the relevant domain knowledge. Not only are these methods expert-labor intensive, but they are often not very successful.
Summary of the Invention
These shortcomings as well as other deficiencies and limitations of information retrieval are obviated, in accordance with the present invention, by automatically constructing a semantic space for retrieval. This is effected by treating the unreliability of observed word-to-text object association data as a statistical problem. The basic postulate is that there is an underlying latent semantic structure in word usage data that is partially hidden or obscured by the variability of word choice. A statistical approach is utilized to estimate this latent structure and uncover the latent meaning. Words, the text objects and, later, user queries are processed to extract this underlying meaning and the new, latent semantic structure domain is then used to represent and retrieve information.
The organization and operation of this invention will be better understood from a consideration of the detailed description of the illustrative embodiment thereof, which follows, when taken in conjunction with the accompanying drawing.

Brief Description of the Drawing
FIG. 1 is a plot of the "term" coordinates and the "document" coordinates based on a two-dimensional singular value decomposition of an original "term-by-document" matrix; and
FIG. 2 is a flow diagram depicting the processing to generate the "term" and "document" matrices using singular value decomposition as well as the processing of a user's query.
Detailed Description
Before discussing the principles and operational characteristics of this invention in detail, it is helpful to present a motivating example. This also aids in introducing terminology utilized later in the discussion.

Simple Example Illustrating the Method
The contents of Table 1 are used to illustrate how semantic structure analysis works and to point out the differences between this method and conventional keyword matching.

TABLE 1
DOCUMENT SET BASED ON TITLES

c1: Human machine interface for Lab ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: Systems and human systems engineering testing of EPS-2
c5: Relation of user-perceived response time to error measurement
m1: The generation of random, binary, unordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

In this example, a file of text objects consists of nine titles of technical documents with titles c1-c5 concerned with human/computer interaction and titles m1-m4 concerned with mathematical graph theory. In Table 1, words occurring in more than one title are italicized. Using conventional keyword retrieval, if a user requested papers dealing with "human computer interaction," titles c1, c2, and c4 would be returned, since these titles contain at least one keyword from the user request. However, c3 and c5, while related to the query, would not be returned since they share no words in common with the request. It is now shown how latent semantic structure analysis treats this request to return titles c3 and c5.
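The conventional keyword retrieval just described can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper names `tokenize` and `keyword_match` are hypothetical, and a word is taken to be any run of letters after lowercasing.

```python
import re

# Titles of Table 1, keyed by document label.
TITLES = {
    "c1": "Human machine interface for Lab ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "Systems and human systems engineering testing of EPS-2",
    "c5": "Relation of user-perceived response time to error measurement",
    "m1": "The generation of random, binary, unordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}


def tokenize(text):
    """Lowercase the text and return the set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_match(query, documents):
    """Return labels of documents sharing at least one word with the query."""
    query_words = tokenize(query)
    return [label for label, title in documents.items()
            if query_words & tokenize(title)]


print(keyword_match("human computer interaction", TITLES))
```

Titles c3 and c5 are absent from the result because neither contains "human", "computer" or "interaction", which is the shortcoming the example is meant to expose.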
Table 2 depicts the "term-by-document" matrix for the 9 technical document titles. Each cell entry, (i,j), is the frequency of occurrence of term i in document j. This basic term-by-document matrix or a mathematical transformation thereof is used as input to the statistical procedure described below.
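As a minimal sketch of how such a matrix might be assembled, the following Python counts each of the twelve terms in each title. The function names are hypothetical, and the matching rule (a token counts if it equals the term or the term plus a trailing "s") is a crude stand-in, assumed here, for the suffix stripping described later.

```python
import re

# Titles of Table 1, keyed by document label.
TITLES = {
    "c1": "Human machine interface for Lab ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "Systems and human systems engineering testing of EPS-2",
    "c5": "Relation of user-perceived response time to error measurement",
    "m1": "The generation of random, binary, unordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}

# The twelve index terms, lowercased, in the row order of the TERM matrix.
TERMS = ["human", "interface", "computer", "user", "system", "response",
         "time", "eps", "survey", "tree", "graph", "minor"]


def count_term(term, text):
    """Count tokens equal to the term or to its simple plural (assumed rule)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token in (term, term + "s"))


def term_by_document(terms, documents):
    """Return one row per term and one column per document."""
    return [[count_term(term, title) for title in documents.values()]
            for term in terms]


matrix = term_by_document(TERMS, TITLES)
for term, row in zip(TERMS, matrix):
    print(f"{term:<10}", *row)
```

The only entry larger than one is the cell for "system" in c4, whose title contains the word twice.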

TABLE 2

TERMS        DOCUMENTS
             c1  c2  c3  c4  c5  m1  m2  m3  m4
human         1   0   0   1   0   0   0   0   0
interface     1   0   1   0   0   0   0   0   0
computer      1   1   0   0   0   0   0   0   0
user          0   1   1   0   1   0   0   0   0
system        0   1   1   2   0   0   0   0   0
response      0   1   0   0   1   0   0   0   0
time          0   1   0   0   1   0   0   0   0
EPS           0   0   1   1   0   0   0   0   0
survey        0   1   0   0   0   0   0   0   1
tree          0   0   0   0   0   1   1   1   0
graph         0   0   0   0   0   0   1   1   1
minor         0   0   0   0   0   0   0   1   1

For this example the documents and terms have been carefully selected to yield a good approximation in just two dimensions for expository purposes. FIG. 1 is a two-dimensional graphical representation of the two largest dimensions resulting from the statistical process, singular value decomposition. Both document titles and the terms used in them are fit into the same space. Terms are shown as circles and labeled by number. Document titles are represented by squares with the numbers of constituent terms indicated parenthetically. The cosine or dot product between two objects (terms or documents) describes their estimated similarity. In this representation, the two types of documents form two distinct groups: all the mathematical graph theory titles occupy the same region in space (basically along Dimension 1 of FIG. 1), whereas a quite distinct group is formed for human/computer interaction titles (essentially along Dimension 2 of FIG. 1).
To respond to a user query about "human computer interaction," the query is first folded into this two-dimensional space using those query terms that occur in the space (namely, "human" and "computer"). The query vector is located in the direction of the weighted average of these constituent terms, and is denoted by a directional arrow labeled "Q" in FIG. 1. A measure of closeness or similarity is related to the angle between the query vector and any given term or document vector. One such measure is the cosine between the query vector and a given term or document vector. In FIG. 1 the cosine between the query vector and each of the c1-c5 titles is greater than 0.90; the angle corresponding to the cosine value of 0.90 with the query is shown by the dashed lines in FIG. 1. With this technique, documents c3 and c5 would be returned as matches to the user query, even though they share no common terms with the query. This is because the latent semantic structure (represented in FIG. 1) fits the overall pattern of term usage across documents.

Description of Singular Value Decomposition
To obtain the data to plot FIG. 1, the "term-by-document" matrix of Table 2 is decomposed using singular value decomposition (SVD). A reduced SVD is employed to approximate the original matrix in terms of a much smaller number of orthogonal dimensions. This reduced SVD is used for retrieval; it describes major associational structures in the matrix but it ignores small variations in word usage. The number of dimensions to represent adequately a particular domain is largely an empirical matter. If the number of dimensions is too large, random noise or variations in word usage will be modeled. If the number of dimensions is too small, significant semantic content will remain uncaptured. For diverse information sources, 100 or more dimensions may be needed.
To illustrate the decomposition technique, the term-by-document matrix, denoted $Y$, is decomposed into three other matrices, namely, the term matrix (TERM), the document matrix (DOCUMENT), and a diagonal matrix of singular values (DIAGONAL), as follows:

$$Y_{t,d} = \mathrm{TERM}_{t,m}\,\mathrm{DIAGONAL}_{m,m}\,\mathrm{DOCUMENT}^{T}_{m,d}$$

where $Y$ is the original t-by-d matrix, TERM is the t-by-m matrix that has unit-length orthogonal columns, $\mathrm{DOCUMENT}^{T}$ is the transpose of the d-by-m DOCUMENT matrix with unit-length orthogonal columns, and DIAGONAL is the m-by-m diagonal matrix of singular values typically ordered by magnitude.

The dimensionality of the full solution, denoted m, is the rank of the t-by-d matrix, that is, $m \le \min(t,d)$. Tables 3, 4 and 5 below show the TERM and DOCUMENT matrices and the diagonal elements of the DIAGONAL matrix, respectively, as found via SVD.

TABLE 3
TERM MATRIX (12 terms by 9 dimensions)

human      0.22 -0.11  0.29 -0.41 -0.11 -0.34 -0.52 -0.06 -0.41
interface  0.20 -0.07  0.14 -0.55  0.28  0.50 -0.07 -0.01 -0.11
computer   0.24  0.04 -0.16 -0.59 -0.11 -0.25 -0.30  0.06  0.49
user       0.40  0.06 -0.34  0.10  0.33  0.38  0.00  0.00  0.01
system     0.64 -0.17  0.36  0.33 -0.16 -0.21 -0.16  0.03  0.27
response   0.26  0.11 -0.42  0.07  0.08 -0.17  0.28 -0.02 -0.05
time       0.26  0.11 -0.42  0.07  0.08 -0.17  0.28 -0.02 -0.05
EPS        0.30 -0.14  0.33  0.19  0.11  0.27  0.03 -0.02 -0.16
survey     0.20  0.27 -0.18 -0.03 -0.54  0.08 -0.47 -0.04 -0.58
tree       0.01  0.45  0.23  0.02  0.59 -0.39 -0.29  0.25 -0.22
graph      0.04  0.62  0.22  0.00 -0.07  0.11  0.16 -0.68  0.23
minor      0.03  0.45  0.14 -0.01 -0.30  0.28  0.34  0.68  0.18

TABLE 4
DOCUMENT MATRIX (9 documents by 9 dimensions)

c1  0.20 -0.06  0.11 -0.95  0.04 -0.08  0.18 -0.01 -0.06
c2  0.60  0.16 -0.50 -0.03 -0.21 -0.02 -0.43  0.05  0.24
c3  0.46 -0.13  0.21  0.04  0.38  0.07 -0.24  0.01  0.02
c4  0.54 -0.23  0.57  0.27 -0.20 -0.04  0.25 -0.02 -0.08
c5  0.28  0.11 -0.50  0.15  0.33  0.03  0.67 -0.06 -0.26
m1  0.00  0.19  0.10  0.02  0.39 -0.30 -0.34  0.45 -0.62
m2  0.01  0.44  0.19  0.02  0.35 -0.21 -0.15 -0.76  0.02
m3  0.02  0.62  0.25  0.01  0.15  0.00  0.25  0.45  0.52
m4  0.08  0.53  0.08 -0.02 -0.60  0.36  0.04 -0.07 -0.45

TABLE 5
DIAGONAL (9 singular values)

3.34 2.54 2.35 1.64 1.50 1.31 0.84 0.56 0.36

As alluded to earlier, data to plot FIG. 1 was obtained by presuming that two dimensions are sufficient to capture the major associational structure of the t-by-d matrix, that is, m is set to two in the expression for $Y_{t,d}$, yielding an approximation of the original matrix. Only the first two columns of the TERM and DOCUMENT matrices are considered with the remaining columns being ignored. Thus, the term data point corresponding to "human" in FIG. 1 is plotted with coordinates (0.22,-0.11), which are extracted from the first row and the two left-most columns of the TERM matrix. Similarly, the document data point corresponding to title m1 has coordinates (0.00,0.19), coming from row six and the two left-most columns of the DOCUMENT matrix.
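The placement of the query "human computer interaction" among these two-dimensional points can be sketched as follows. The sketch assumes an unweighted average of the two query term points and uses the unscaled coordinates from the two left-most columns of the TERM and DOCUMENT matrices, so the exact cosine values need not match those quoted for FIG. 1; the helper names are hypothetical.

```python
import math

# Two left-most columns of the TERM matrix for the query terms in the space.
TERM_2D = {"human": (0.22, -0.11), "computer": (0.24, 0.04)}

# Two left-most columns of the DOCUMENT matrix.
DOC_2D = {
    "c1": (0.20, -0.06), "c2": (0.60, 0.16), "c3": (0.46, -0.13),
    "c4": (0.54, -0.23), "c5": (0.28, 0.11),
    "m1": (0.00, 0.19), "m2": (0.01, 0.44), "m3": (0.02, 0.62),
    "m4": (0.08, 0.53),
}


def average(points):
    """Component-wise mean of a list of equal-length points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


# Assumption: equal weights for the query terms present in the space.
query = average([TERM_2D[word] for word in ("human", "computer")])
cosines = {label: cosine(query, point) for label, point in DOC_2D.items()}

for label, value in sorted(cosines.items(), key=lambda item: -item[1]):
    print(label, round(value, 2))
```

Even under these simplifications every c title scores well above every m title, including c3 and c5, which share no word with the query.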

General Model Details
It is now instructive to describe in somewhat more detail the mathematical model underlying the latent structure, singular value decomposition technique.
Any rectangular matrix $Y$ of t rows and d columns, for example, a t-by-d matrix of terms and documents, can be decomposed into a product of three other matrices:

$$Y = T_0 S_0 D_0^T \qquad (1)$$

such that $T_0$ and $D_0$ have unit-length orthogonal columns (i.e., $T_0^T T_0 = I$; $D_0^T D_0 = I$) and $S_0$ is diagonal. This is called the singular value decomposition (SVD) of $Y$. (A procedure for SVD is described in the text Numerical Recipes, by Press, Flannery, Teukolsky and Vetterling, 1986, Cambridge University Press, Cambridge, England.) $T_0$ and $D_0$ are the matrices of left and right singular vectors and $S_0$ is the diagonal matrix of singular values. By convention, the diagonal elements of $S_0$ are ordered in decreasing magnitude.
With SVD, it is possible to devise a simple strategy for an optimal approximation to $Y$ using smaller matrices. The k largest singular values and their associated columns in $T_0$ and $D_0$ may be kept and the remaining entries set to zero. The product of the resulting matrices is a matrix $Y_R$ which is approximately equal to $Y$, and is of rank k. The new matrix $Y_R$ is the matrix of rank k which is the closest in the least squares sense to $Y$. Since zeros were introduced into $S_0$, the representation of $S_0$ can be simplified by deleting the rows and columns having these zeros to obtain a new diagonal matrix $S$, and then deleting the corresponding columns of $T_0$ and $D_0$ to define new matrices $T$ and $D$, respectively. The result is a reduced model such that

$$Y_R = T S D^T. \qquad (2)$$

The value of k is chosen for each application; it is generally such that $k \ge 100$ for collections of 1000-3000 data objects.
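The least squares property has a convenient numerical reading: the squared error of the rank-k approximation equals the sum of the squares of the discarded singular values. The sketch below applies this to the nine singular values of the example. The retained fraction it reports is a common diagnostic assumed here for illustration, not a selection rule stated in this description, and the function names are hypothetical.

```python
# Singular values of the example term-by-document matrix (Table 5).
SINGULAR_VALUES = [3.34, 2.54, 2.35, 1.64, 1.50, 1.31, 0.84, 0.56, 0.36]


def squared_error(singular_values, k):
    """Sum of squared entries of Y - Y_R when the k largest values are kept."""
    squares = sorted((s * s for s in singular_values), reverse=True)
    return sum(squares[k:])


def retained_fraction(singular_values, k):
    """Share of the sum of squared entries of Y reproduced by Y_R."""
    total = sum(s * s for s in singular_values)
    return 1.0 - squared_error(singular_values, k) / total


for k in range(1, len(SINGULAR_VALUES) + 1):
    print(k, round(retained_fraction(SINGULAR_VALUES, k), 3))
```

With k equal to zero the error is the sum of squared entries of $Y$ itself, which for the example is close to 31, the sum of the squared counts in the term-by-document matrix (the small gap comes from rounding of the tabulated singular values).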
For discussion purposes, it is useful to interpret the SVD geometrically. The rows of the reduced matrices $T$ and $D$ may be taken as vectors representing the terms and documents, respectively, in a k-dimensional space. With appropriate rescaling of the axes, by quantities related to the associated diagonal values of $S$, dot products between points in the space can be used to access and compare objects. (A simplified approach which did not involve rescaling was used to plot the data of FIG. 1, but this was strictly for expository purposes.) These techniques are now discussed.

Fundamental Comparisons
There are basically three types of comparisons of interest: (i) those comparing two terms; (ii) those comparing two documents or text objects; and (iii) those comparing a term and a document or text object. As used throughout, the notion of a text object or data object is general whereas a document is a specific instance of a text object or data object. Also, text or data objects are stored in the computer system in files.

Two Terms: In the data, the dot product between two row vectors of $Y_R$ tells the extent to which two terms have a similar pattern of occurrence across the set of documents. The matrix $Y_R Y_R^T$ is the square symmetric matrix approximation containing all the term-by-term dot products. Using equation (2),

$$Y_R Y_R^T = (TSD^T)(TSD^T)^T = TS^2T^T = (TS)(TS)^T. \qquad (3)$$

This means that the dot product between the i-th row and j-th row of $Y_R$ can be obtained by calculating the dot product between the i-th and j-th rows of the $TS$ matrix. That is, considering the rows of $TS$ as vectors representing the terms, dot products between these vectors give the comparison between the terms. The relation between taking the rows of $T$ as vectors and those of $TS$ as vectors is simple since $S$ is a diagonal matrix; each vector element has been stretched or shrunk by the corresponding element of $S$.

Two Documents: In this case, the dot product is between two column vectors of $Y$. The document-to-document dot product is approximated by

$$Y_R^T Y_R = (TSD^T)^T(TSD^T) = DS^2D^T = (DS)(DS)^T. \qquad (4)$$

Thus the rows of the $DS$ matrix are taken as vectors representing the documents, and the comparison is via the dot product between the rows of the $DS$ matrix.

Term and Document: This comparison is somewhat different. Instead of trying to estimate the dot product between rows or between columns of $Y$, the fundamental comparison between a term and a document is the value of an individual cell in $Y$. The approximation of $Y$ is simply equation (2), i.e., $Y_R = TSD^T$. The i,j cell of $Y_R$ may therefore be obtained by taking the dot product between the i-th row of the matrix $TS^{1/2}$ and the j-th row of the matrix $DS^{1/2}$. While the "within" (term or document) comparisons involved using rows of $TS$ and $DS$ as vectors, the "between" comparison requires $TS^{1/2}$ and $DS^{1/2}$ for coordinates. Thus it is not possible to make a single configuration of points in a space that will allow both "between" and "within" comparisons. They will be similar, however, differing only by a stretching or shrinking of the dimensional elements by a factor $S^{1/2}$.
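The three comparisons can be sketched with the two-dimensional reduction of the example, taking a few rows from the two left-most columns of the TERM and DOCUMENT matrices and the two largest singular values. The function names and the choice of sample rows are assumptions made for illustration.

```python
# Two largest singular values and a few rows of T and D for k = 2.
S = [3.34, 2.54]
T = {"human": [0.22, -0.11], "user": [0.40, 0.06], "graph": [0.04, 0.62]}
D = {"c1": [0.20, -0.06], "m3": [0.02, 0.62]}


def scale(row, power):
    """Stretch each coordinate by its singular value raised to `power`."""
    return [x * s ** power for x, s in zip(row, S)]


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def term_term(a, b):
    """'Within' comparison of two terms: rows of TS, as in equation (3)."""
    return dot(scale(T[a], 1), scale(T[b], 1))


def doc_doc(a, b):
    """'Within' comparison of two documents: rows of DS, as in equation (4)."""
    return dot(scale(D[a], 1), scale(D[b], 1))


def term_doc(term, doc):
    """'Between' comparison: rows of T S^(1/2) and D S^(1/2), a cell of Y_R."""
    return dot(scale(T[term], 0.5), scale(D[doc], 0.5))


print(round(term_term("human", "user"), 2), round(term_term("human", "graph"), 2))
print(round(doc_doc("c1", "m3"), 2))
print(round(term_doc("human", "c1"), 2), round(term_doc("graph", "m3"), 2))
```

Under these numbers "human" compares more closely with "user" than with "graph", and the estimated cell for "graph" in m3 is much larger than the one for "human" in c1, since the two-dimensional model only partly reproduces the latter.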

Representations of Pseudo-Objects
The previous results show how it is possible to compute comparisons between the various objects associated with the rows or columns of $Y$. It is very important in information retrieval applications to compute similar comparison quantities for objects such as queries that do not appear explicitly in $Y$. For example, it is necessary to be able to take a completely novel query, find a location in the k-dimensional latent semantic space for it, and then evaluate its cosine or inner product with respect to terms or objects in the space. Another example would be trying, after-the-fact, to find representations for documents that did not appear in the original space. The new objects for both these examples are equivalent to objects in the matrix $Y$ in that they may be represented as vectors of terms. For this reason they are called pseudo-documents specifically or pseudo-objects generically.
In order to compare pseudo-documents to other documents, the starting point is defining a pseudo-document vector, designated $Y_q$. Then a representation $D_q$ is derived such that $D_q$ can be used just like a row of $D$ in the comparison relationships described in the foregoing sections. One criterion for such a derivation is that the insertion of a real document $Y_i$ should give $D_i$ when the model is ideal (i.e., $Y = Y_R$). With this constraint,

$$Y_q = T S D_q^T$$

or, since $T^T T$ equals the identity matrix,

$$D_q^T = S^{-1} T^T Y_q$$

or, finally,

$$D_q = Y_q^T T S^{-1}. \qquad (5)$$

Thus, with appropriate rescaling of the axes, this amounts to placing the pseudo-object at the vector sum of its corresponding term points. Then $D_q$ may be used like any row of $D$ and, appropriately scaled by $S$ or $S^{1/2}$, can be used like a usual document vector for making "within" and "between" comparisons. It is to be noted that if the measure of similarity to be used in comparing the query against all the documents is one in which only the angle between the vectors is important (such as the cosine), there is no difference for comparison purposes between placing the query at the vector sum or the vector average of its terms.
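Equation (5) can be sketched for the first two dimensions of the example. Because $S$ is diagonal, each coordinate of $D_q$ is the count-weighted sum of the matching term coordinates divided by the singular value of that dimension. The function name `fold_in` is hypothetical; the check on title c1 illustrates the criterion that a real document should land near its own row of $D$.

```python
# Two largest singular values and the TERM rows (first two columns) needed here.
S = [3.34, 2.54]
T = {
    "human": [0.22, -0.11],
    "interface": [0.20, -0.07],
    "computer": [0.24, 0.04],
}


def fold_in(term_counts, term_rows, singular_values):
    """Equation (5), D_q = Y_q^T T S^-1, over the terms found in the lexicon."""
    sums = [0.0] * len(singular_values)
    for term, count in term_counts.items():
        row = term_rows.get(term)
        if row is None:
            continue  # a term outside the lexicon contributes nothing
        for i, coordinate in enumerate(row):
            sums[i] += count * coordinate
    return [total / s for total, s in zip(sums, singular_values)]


# Pseudo-document for the query; "interaction" is not in the lexicon.
query = fold_in({"human": 1, "computer": 1, "interaction": 1}, T, S)

# Title c1 treated as a pseudo-document: its indexed terms are these three.
c1 = fold_in({"human": 1, "interface": 1, "computer": 1}, T, S)

print([round(x, 2) for x in query])
print([round(x, 2) for x in c1])
```

The folded-in c1 agrees with the first two entries of its row in the DOCUMENT matrix, (0.20, -0.06), to within the rounding of the tabulated values.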

Illustrative Embodiment
The foundation principles presented in the foregoing sections are now applied to a practical example by way of teaching an illustrative embodiment in accordance with the present invention.
The system under consideration is one that receives a request for technical information from a user and returns as a response display the most appropriate groups in a large, technically diverse company dealing with that technical information. The size of each group is from five to ten people. There is no expert who understands in detail what every group is accomplishing. Each person's understanding or knowledge of the company's technical work tends to be myopic, that is, each one knows their particular group's work, less about neighboring groups and their knowledge becomes less precise or even nonexistent as one moves further away from the core group.
If each group can be described by a set of terms, then the latent semantic indexing procedure can be applied. For instance, one set of textual descriptions might include annual write-ups each group member must prepare in describing the planned activity for the coming year. Another input could be the abstracts of technical memoranda written by members of each group.
The technique for processing the documents gathered together to represent the company technical information is shown in block diagram form in FIG. 2. The first processing activity, as illustrated by processing block 100, is that of text preprocessing.
All the combined text is preprocessed to identify terms and possible compound noun phrases. First, phrases are found by identifying all words between (1) a precompiled list of stop words; or (2) punctuation marks; or (3) parenthetical remarks.
To obtain more stable estimates of word frequencies, all inflectional suffixes (past tense, plurals, adverbials, progressive tense, and so forth) are removed from the words. Inflectional suffixes, in contrast to derivational suffixes, are those that do not usually change the meaning of the base word. (For example, removing the "s" from "boys" does not change the meaning of the base word whereas stripping "ation" from "information" does change the meaning.) Since no single set of pattern-action rules can correctly describe the English language, the suffix stripper sub-program may contain an exception list.
The next step in the processing is represented by block 110 in FIG. 2. Based upon the earlier text preprocessing, a system lexicon is created. The lexicon includes both single words and noun phrases. The noun phrases provide for a richer semantic space. For example, the "information" in "information retrieval" and "information theory" have different meanings. Treating these as separate terms places each of the compounds at different places in the k-dimensional space. (For a word in radically different semantic environments, treating it as a single word tends to place the word in a meaningless place in k-dimensional space, whereas treating each of its different semantic environments separately using separate compounds yields spatial differentiation.)
Compound noun phrases may be extracted using a simplified, automatic procedure. First, phrases are found using the "pseudo" parsing technique described with respect to step 100. Then all left and right branching subphrases are found. Any phrase or subphrase that occurs in more than one document is a potential compound phrase. Compound phrases may range from two to many words (e.g., "semi-insulating Fe-doped InP current blocking layer"). From these potential compound phrases, all longest-matching phrases as well as single words making up the compounds are entered into the lexicon base to obtain spatial separation.
In the illustrative embodiment, all inflectionally stripped single words occurring in more than one document and that are not on the list of most frequently used words in English (such as "the", "and") are also included in the system lexicon. Typically, the exclusion list comprises about 150 common words.
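The preprocessing and lexicon steps of blocks 100 and 110 might be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the seven-word stop list stands in for the exclusion list of about 150 words, the three suffix rules and the single exception are placeholders, all function names are hypothetical, and the longest-matching selection among compound phrases is omitted.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the stop list, suffix rules and exception list.
STOP_WORDS = {"a", "and", "for", "in", "of", "the", "to"}
SUFFIXES = ("ing", "ed", "s")
EXCEPTIONS = {"analysis": "analysis"}


def strip_suffix(word):
    """Remove one inflectional suffix unless the word is a listed exception."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def phrases(text):
    """Runs of consecutive non-stop words, broken at punctuation and brackets."""
    runs = []
    for chunk in re.split(r"[^\w\s-]+", text.lower()):
        run = []
        for word in re.findall(r"[a-z]+", chunk):
            if word in STOP_WORDS:
                if run:
                    runs.append(run)
                run = []
            else:
                run.append(strip_suffix(word))
        if run:
            runs.append(run)
    return runs


def subphrases(run):
    """Left and right branching subphrases of two or more words."""
    found = set()
    for n in range(2, len(run) + 1):
        found.add(" ".join(run[:n]))
        found.add(" ".join(run[-n:]))
    return found


def build_lexicon(documents):
    """Keep words and phrases that occur in more than one document."""
    document_frequency = Counter()
    for text in documents:
        candidates = set()
        for run in phrases(text):
            candidates.update(run)
            candidates.update(subphrases(run))
        document_frequency.update(candidates)
    return sorted(t for t, n in document_frequency.items() if n > 1)


docs = [
    "Information retrieval systems for the library",
    "Testing of information retrieval and indexing",
    "Graph theory",
]
print(build_lexicon(docs))
```

In this small run the compound "information retrieval" enters the lexicon alongside its two constituent words, while terms seen in only one document are dropped.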
From the list of lexicon terms, the Term-by-Document matrix is created, as depicted by processing block 120 in FIG. 2. In one exemplary situation, the matrix contained 7100 terms and 728 documents representing 480 groups.
The next step is to perform the singular value decomposition on the Term-by-Document matrix, as depicted by processing block 130. This analysis is only effected once (or each time there is a significant update in the storage files).
The last step in processing the documents prior to a user query is depicted by block 140. In order to relate a selected document to the group responsible for that document, an organizational database is constructed. This latter database may contain, for instance, the group manager's name and the manager's mail address.
The user query processing activity is depicted on the right-hand side of FIG. 2. The first step, as represented by processing block 200, is to preprocess the query in the same way as the original documents.
As then depicted by block 210, the longest matching compound phrases as well as single words not part of compound phrases are extracted from the query. For each query term also contained in the system lexicon, the k-dimensional vector is located. The query vector is the weighted vector average of the k-dimensional vectors. Processing block 220 depicts the generation step for the query vector.
The next step in the query processing is depicted by processing block 230. In order that the best matching document is located, the query vector is compared to all documents in the space. The similarity metric used is the cosine between the query vector and the document vectors. A cosine of 1.0 would indicate that the query vector and the document vector were on top of one another in the space. The cosine metric is similar to a dot product measure except that it ignores the magnitude of the vectors and simply uses the angle between the vectors being compared.
The cosines are sorted, as depicted by processing block 240, and for each of the best N matching documents (typically N=8), the value of the cosine along with organizational information corresponding to the document's group are displayed to the user, as depicted by processing block 250. Table 6 shows a typical input and output for N=5.
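The sorting and display steps of blocks 240 and 250 might be sketched as below. The document vectors, group labels and manager names are invented placeholders, and the function names and record layout are assumptions for illustration; repeated groups are not merged in this sketch.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def best_matches(query_vector, document_vectors, document_group, groups, n=8):
    """Rank documents by cosine and attach organizational data to the top n."""
    scored = sorted(
        ((cosine(query_vector, vector), doc_id)
         for doc_id, vector in document_vectors.items()),
        reverse=True,
    )
    results = []
    for fit, doc_id in scored[:n]:
        group_id = document_group[doc_id]
        results.append({
            "document": doc_id,
            "group": group_id,
            "manager": groups[group_id]["manager"],
            "fit": round(fit, 2),
        })
    return results


# Hypothetical two-dimensional data, used only to exercise the function.
document_vectors = {"d1": (1.0, 0.0), "d2": (0.6, 0.8), "d3": (0.0, 1.0)}
document_group = {"d1": "A", "d2": "B", "d3": "C"}
groups = {
    "A": {"manager": "Manager A"},
    "B": {"manager": "Manager B"},
    "C": {"manager": "Manager C"},
}

for row in best_matches((1.0, 0.2), document_vectors, document_group, groups, n=2):
    print(row)
```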

TABLE 6

INPUT QUERY: An Expert/Expert-Locating System Based on Automatic Representation of Semantic Structure

OUTPUT RESULTS:
1. Group: B
   Group Title: Artificial Intelligence and Information Science Research
   Group Manager: D. E. Walker, Address B, Phone B
   Fit (Cosine): 0.67
2. Group: A
   Group Title: Artificial Intelligence and Communications Research
   Group Manager: L. A. Streeter, Address A, Phone A
   Fit (Cosine): 0.64
3. Group: E
   Group Title: Cognitive Science Research
   Group Manager: T. K. Landauer, Address E, Phone E
   Fit (Cosine): 0.63
4. Group: C
   Group Title: Experimental Systems
   Group Manager: C. A. Riley, Address C, Phone C
   Fit (Cosine): 0.62
5. Group: D
   Group Title: Software Technology
   Group Manager: C. P. Lewis, Address D, Phone D
   Fit (Cosine): 0.55

It is to be further understood that the methodology described herein is not limited to the specific forms disclosed by way of illustration, but may assume other embodiments limited only by the scope of the appended claims.

Claims (11)

1. An information retrieval method comprising the steps of generating term-by-data object matrix data to represent information files stored in a computer system, said matrix data being indicative of the frequency of occurrence of selected terms contained in the data objects stored in the information files, decomposing said matrix into a reduced singular value representation composed of distinct term and data object files, in response to a user query, generating a pseudo-object utilizing said selected terms and inserting said pseudo-object into said matrix data, and examining the similarity between said pseudo-object and said term and data object files to generate an information response and storing said response in the system in a form accessible by the user.
2. The method as recited in claim 1 wherein said step of generating said matrix data includes the step of producing a lexicon database defining said selected terms.
3. The method as recited in claim 2 wherein said step of producing said lexicon database includes the step of parsing the data objects.
4. The method as recited in claim 3 wherein said step of parsing includes the steps of removing inflectional suffixes and isolating phrases in the data objects.
5. The method as recited in claim 2 wherein said step of generating said pseudo-object includes the step of parsing said pseudo-object with reference to said lexicon database.
6. The method as recited in claim 1 further including the step of generating an organizational database associated with the authorship of the data objects and storing said organizational database in the system and said response includes information from said organizational database based on said similarity.
7. The method as recited in claim 1 wherein said matrix database is expressed as $Y$, said step of decomposing produces said representation in the form $Y = T_0 S_0 D_0^T$ of rank $m$, and an approximation representation $Y_R = T S D^T$ of rank $k < m$, where $T_0$ and $D_0$ represent said term and data object databases and $S_0$ corresponds to said singular value representation and where $T$, $D$ and $S$ represent reduced forms of $T_0$, $D_0$ and $S_0$, respectively, said pseudo-object is expressible as $Y_q$ and said step of inserting includes the step of computing $D_q = Y_q^T T S^{-1}$, and said step of examining includes the step of evaluating the dot products between said pseudo-object and said term and document matrices.
8. The method as recited in claim 7 wherein the degree of similarity is measured by said dot products exceeding a predetermined threshold.
9. The method as recited in claim 8 wherein said approximation representation is obtained by setting (k+1) through m diagonal values of S0 to zero.
10. The method as recited in claim 1 wherein said matrix database is expressed as $Y$, said step of decomposing produces said representation in the form $Y = T_0 S_0 D_0^T$ of rank $m$, and an approximation representation $Y_R = T S D^T$ of rank $k < m$, where $T_0$ and $D_0$ represent said term and data object databases and $S_0$ corresponds to said singular value representation and where $T$, $D$ and $S$ represent reduced forms of $T_0$, $D_0$ and $S_0$, respectively, said pseudo-object is expressible as $Y_q$ and said step of inserting includes the step of computing $D_q = Y_q^T T S^{-1}$, and said step of examining includes the step of evaluating the cosines between said pseudo-object and said term and document matrices.
11. A method for retrieving information from an information file stored in a computer system comprising the steps of generating term-by-data object matrix data by processing the information file, performing a singular value decomposition on said matrix data to obtain the reduced term and data object vectors and diagonal values, in response to a user query, generating a pseudo-object vector and augmenting said matrix data with said pseudo-vector using reduced forms of said term vector and said diagonal values and storing said augmented data in the system, and examining the similarities between said pseudo-object vector and said reduced term vector and a reduced form of said data object vector to generate the information and storing the information in a response file accessible to the user.
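
By way of illustration only, the matrix operations recited in claims 7, 9 and 10 may be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `reduced_svd`, `fold_in` and `cosines` are hypothetical, and scaling the data object coordinates by the singular values before taking cosines is an assumption made here, not a requirement stated in the claims.

```python
import numpy as np


def reduced_svd(Y, k):
    """Rank-k reduction of Y = T0 S0 D0^T, returning T, the diagonal of S, and D.

    Y has one row per term and one column per data object.
    """
    T0, s0, D0t = np.linalg.svd(Y, full_matrices=False)
    # Keeping only the k largest singular values has the same effect as
    # setting the (k+1)-th through m-th diagonal values of S0 to zero.
    return T0[:, :k], s0[:k], D0t[:k, :].T


def fold_in(y_q, T, s):
    """Coordinates of a pseudo-object: D_q = Y_q^T T S^{-1}.

    y_q is the term-frequency vector of the query (one entry per term);
    s holds the k retained singular values, all assumed non-zero.
    """
    return (y_q @ T) / s


def cosines(d_q, D, s):
    """Cosine between the pseudo-object and each data object (row of D).

    Assumption: coordinates are scaled by the singular values first.
    """
    docs = D * s
    query = d_q * s
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    return (docs @ query) / norms
```

One property that can be used to check such a sketch: when the query vector is identical to a column of Y, `fold_in` reproduces the corresponding row of D, so the cosine between the pseudo-object and that data object is 1 up to rounding.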
CA000596524A 1988-09-15 1989-04-12 Computer information retrieval using latent semantic structure Expired - Lifetime CA1306062C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US07/244,349 1988-09-15
US07/244,349 US4839853A (en) 1988-09-15 1988-09-15 Computer information retrieval using latent semantic structure

Publications (1)

Publication Number Publication Date
CA1306062C (en) 1992-08-04

Family

ID=22922358

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000596524A Expired - Lifetime CA1306062C (en) 1988-09-15 1989-04-12 Computer information retrieval using latent semantic structure

Country Status (2)

Country Link
US (1) US4839853A (en)
CA (1) CA1306062C (en)

US9053089B2 (en) 2007-10-02 2015-06-09 Apple Inc. Part-of-speech tagging using latent analogy
AU2008312423B2 (en) 2007-10-17 2013-12-19 Vcvc Iii Llc NLP-based content recommender
US8594996B2 (en) 2007-10-17 2013-11-26 Evri Inc. NLP-based entity recognition and disambiguation
US8694483B2 (en) 2007-10-19 2014-04-08 Xerox Corporation Real-time query suggestion in a troubleshooting context
US20090254540A1 (en) * 2007-11-01 2009-10-08 Textdigger, Inc. Method and apparatus for automated tag generation for digital content
US8580149B2 (en) 2007-11-16 2013-11-12 Lawrence Livermore National Security, Llc Barium iodide and strontium iodide crystals and scintillators implementing the same
US8620662B2 (en) 2007-11-20 2013-12-31 Apple Inc. Context-aware unit selection
US10002189B2 (en) 2007-12-20 2018-06-19 Apple Inc. Method and apparatus for searching using an active ontology
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US20090226872A1 (en) * 2008-01-16 2009-09-10 Nicholas Langdon Gunther Electronic grading system
US8065143B2 (en) 2008-02-22 2011-11-22 Apple Inc. Providing text input using speech data and non-speech data
US20090228296A1 (en) * 2008-03-04 2009-09-10 Collarity, Inc. Optimization of social distribution networks
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US20090276694A1 (en) * 2008-05-02 2009-11-05 Accupatent, Inc. System and Method for Document Display
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8103669B2 (en) 2008-05-23 2012-01-24 Xerox Corporation System and method for semi-automatic creation and maintenance of query expansion rules
US8464150B2 (en) 2008-06-07 2013-06-11 Apple Inc. Automatic language identification for dynamic text processing
US8438178B2 (en) * 2008-06-26 2013-05-07 Collarity Inc. Interactions among online digital identities
US20100030549A1 (en) 2008-07-31 2010-02-04 Lee Michael M Mobile device having human language translation capability with positional feedback
WO2010021530A1 (en) * 2008-08-20 2010-02-25 Instituto Tecnologico Y De Estudios Superiores De Monterrey System and method for displaying relevant textual advertising based on semantic similarity
US8768702B2 (en) 2008-09-05 2014-07-01 Apple Inc. Multi-tiered voice feedback in an electronic device
US8898568B2 (en) 2008-09-09 2014-11-25 Apple Inc. Audio user interface
TW201013430A (en) 2008-09-17 2010-04-01 Ibm Method and system for providing suggested tags associated with a target page for manipulation by a user
US8583418B2 (en) 2008-09-29 2013-11-12 Apple Inc. Systems and methods of detecting language and natural language strings for text to speech synthesis
US8712776B2 (en) 2008-09-29 2014-04-29 Apple Inc. Systems and methods for selective text to speech synthesis
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US20100094814A1 (en) * 2008-10-13 2010-04-15 James Alexander Levy Assessment Generation Using the Semantic Web
US8156120B2 (en) 2008-10-22 2012-04-10 James Brady Information retrieval using user-generated metadata
US20100114890A1 (en) * 2008-10-31 2010-05-06 Purediscovery Corporation System and Method for Discovering Latent Relationships in Data
US20100131569A1 (en) * 2008-11-21 2010-05-27 Robert Marc Jamison Method & apparatus for identifying a secondary concept in a collection of documents
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
WO2010075888A1 (en) * 2008-12-30 2010-07-08 Telecom Italia S.P.A. Method and system of content recommendation
US8862252B2 (en) 2009-01-30 2014-10-14 Apple Inc. Audio user interface for displayless electronic device
US8380507B2 (en) 2009-03-09 2013-02-19 Apple Inc. Systems and methods for determining the language to use for speech generated by a text to speech engine
US8166032B2 (en) * 2009-04-09 2012-04-24 MarketChorus, Inc. System and method for sentiment-based text classification and relevancy ranking
US9245243B2 (en) * 2009-04-14 2016-01-26 Ureveal, Inc. Concept-based analysis of structured and unstructured data using concept inheritance
US20100268600A1 (en) * 2009-04-16 2010-10-21 Evri Inc. Enhanced advertisement targeting
US8527523B1 (en) 2009-04-22 2013-09-03 Equivio Ltd. System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
US8533194B1 (en) 2009-04-22 2013-09-10 Equivio Ltd. System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
US8346685B1 (en) 2009-04-22 2013-01-01 Equivio Ltd. Computerized system for enhancing expert-based processes and methods useful in conjunction therewith
WO2010134885A1 (en) * 2009-05-20 2010-11-25 Farhan Sarwar Predicting the correctness of eyewitness' statements with semantic evaluation method (sem)
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10255566B2 (en) * 2011-06-03 2019-04-09 Apple Inc. Generating and processing task items that represent tasks to perform
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10540976B2 (en) 2009-06-05 2020-01-21 Apple Inc. Contextual voice commands
US8510308B1 (en) * 2009-06-16 2013-08-13 Google Inc. Extracting semantic classes and instances from text
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
GB2472250A (en) * 2009-07-31 2011-02-02 Stephen Timothy Morris Method for determining document relevance
AU2010300096B2 (en) 2009-09-26 2012-10-04 Sajari Pty Ltd Document analysis and association system and method
US8645372B2 (en) * 2009-10-30 2014-02-04 Evri, Inc. Keyword-based search engine results using enhanced query strategies
US8682649B2 (en) 2009-11-12 2014-03-25 Apple Inc. Sentiment prediction from textual data
RU2012121711A (en) * 2009-12-04 2013-11-27 Сони Корпорейшн SEARCH DEVICE, PROGRAM SEARCH METHOD
US8600743B2 (en) 2010-01-06 2013-12-03 Apple Inc. Noise profile determination for voice-related feature
US8311838B2 (en) 2010-01-13 2012-11-13 Apple Inc. Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts
US8381107B2 (en) 2010-01-13 2013-02-19 Apple Inc. Adaptive audio feedback system and method
US9715332B1 (en) 2010-08-26 2017-07-25 Cypress Lake Software, Inc. Methods, systems, and computer program products for navigating between visual components
US8661361B2 (en) 2010-08-26 2014-02-25 Sitting Man, Llc Methods, systems, and computer program products for navigating between visual components
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8780130B2 (en) 2010-11-30 2014-07-15 Sitting Man, Llc Methods, systems, and computer program products for binding attributes between visual components
DE202011111062U1 (en) 2010-01-25 2019-02-19 Newvaluexchange Ltd. Device and system for a digital conversation management platform
US9183288B2 (en) * 2010-01-27 2015-11-10 Kinetx, Inc. System and method of structuring data for search using latent semantic analysis techniques
US10397639B1 (en) 2010-01-29 2019-08-27 Sitting Man, Llc Hot key systems and methods
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US9710556B2 (en) 2010-03-01 2017-07-18 Vcvc Iii Llc Content recommendation based on collections of entities
US8645125B2 (en) 2010-03-30 2014-02-04 Evri, Inc. NLP-based systems and methods for providing quotations
US8255401B2 (en) 2010-04-28 2012-08-28 International Business Machines Corporation Computer information retrieval using latent semantic structure via sketches
US7933859B1 (en) * 2010-05-25 2011-04-26 Recommind, Inc. Systems and methods for predictive coding
US8161325B2 (en) 2010-05-28 2012-04-17 Bank Of America Corporation Recommendation of relevant information to support problem diagnosis
WO2011153508A2 (en) * 2010-06-04 2011-12-08 Google Inc. Service for aggregating event information
US8713021B2 (en) 2010-07-07 2014-04-29 Apple Inc. Unsupervised document clustering using latent semantic density analysis
US8838633B2 (en) 2010-08-11 2014-09-16 Vcvc Iii Llc NLP-based sentiment analysis
US8719006B2 (en) 2010-08-27 2014-05-06 Apple Inc. Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis
US9405848B2 (en) 2010-09-15 2016-08-02 Vcvc Iii Llc Recommending mobile device activities
US8719014B2 (en) 2010-09-27 2014-05-06 Apple Inc. Electronic device with text error correction based on voice recognition data
US10734115B1 (en) 2012-08-09 2020-08-04 Cerner Innovation, Inc Clinical decision support for sepsis
US10431336B1 (en) 2010-10-01 2019-10-01 Cerner Innovation, Inc. Computerized systems and methods for facilitating clinical decision making
US11398310B1 (en) 2010-10-01 2022-07-26 Cerner Innovation, Inc. Clinical decision support for sepsis
US11348667B2 (en) 2010-10-08 2022-05-31 Cerner Innovation, Inc. Multi-site clinical decision support
US8725739B2 (en) 2010-11-01 2014-05-13 Evri, Inc. Category-based content recommendation
US10515147B2 (en) 2010-12-22 2019-12-24 Apple Inc. Using statistical language models for contextual lookup
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10289802B2 (en) 2010-12-27 2019-05-14 The Board Of Trustees Of The Leland Stanford Junior University Spanning-tree progression analysis of density-normalized events (SPADE)
US10628553B1 (en) 2010-12-30 2020-04-21 Cerner Innovation, Inc. Health information transformation system
US8781836B2 (en) 2011-02-22 2014-07-15 Apple Inc. Hearing assistance system for providing consistent human speech
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9116995B2 (en) 2011-03-30 2015-08-25 Vcvc Iii Llc Cluster-based identification of news stories
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10672399B2 (en) 2011-06-03 2020-06-02 Apple Inc. Switching between text data and audio data based on a mapping
US9785634B2 (en) 2011-06-04 2017-10-10 Recommind, Inc. Integration and combination of random sampling and document batching
US8812294B2 (en) 2011-06-21 2014-08-19 Apple Inc. Translating phrases from one language into another using an order-based set of declarative rules
JP5742506B2 (en) * 2011-06-27 2015-07-01 日本電気株式会社 Document similarity calculation device
US8983963B2 (en) 2011-07-07 2015-03-17 Software Ag Techniques for comparing and clustering documents
US8706472B2 (en) 2011-08-11 2014-04-22 Apple Inc. Method for disambiguating multiple readings in language conversion
US8994660B2 (en) 2011-08-29 2015-03-31 Apple Inc. Text correction processing
US9442928B2 (en) 2011-09-07 2016-09-13 Venio Inc. System, method and computer program product for automatic topic identification using a hypertext corpus
US9442930B2 (en) 2011-09-07 2016-09-13 Venio Inc. System, method and computer program product for automatic topic identification using a hypertext corpus
US8762156B2 (en) 2011-09-28 2014-06-24 Apple Inc. Speech recognition repair using contextual information
US8856156B1 (en) 2011-10-07 2014-10-07 Cerner Innovation, Inc. Ontology mapper
US9430563B2 (en) 2012-02-02 2016-08-30 Xerox Corporation Document processing employing probabilistic topic modeling of documents represented as text words transformed to a continuous space
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US8805842B2 (en) 2012-03-30 2014-08-12 Her Majesty The Queen In Right Of Canada, As Represented By The Minister Of National Defence, Ottawa Method for displaying search results
US10249385B1 (en) 2012-05-01 2019-04-02 Cerner Innovation, Inc. System and method for record linkage
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US8775442B2 (en) 2012-05-15 2014-07-08 Apple Inc. Semantic search using a single-source semantic model
WO2013185109A2 (en) 2012-06-08 2013-12-12 Apple Inc. Systems and methods for recognizing textual identifiers within a plurality of words
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9002842B2 (en) 2012-08-08 2015-04-07 Equivio Ltd. System and method for computerized batching of huge populations of electronic documents
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US8935167B2 (en) 2012-09-25 2015-01-13 Apple Inc. Exemplar-based latent perceptual modeling for automatic speech recognition
US9075846B2 (en) 2012-12-12 2015-07-07 King Fahd University Of Petroleum And Minerals Method for retrieval of arabic historical manuscripts
KR20230137475A (en) 2013-02-07 2023-10-04 애플 인크. Voice trigger for a digital assistant
US10769241B1 (en) 2013-02-07 2020-09-08 Cerner Innovation, Inc. Discovering context-specific complexity and utilization sequences
US10946311B1 (en) 2013-02-07 2021-03-16 Cerner Innovation, Inc. Discovering context-specific serial health trajectories
US11894117B1 (en) 2013-02-07 2024-02-06 Cerner Innovation, Inc. Discovering context-specific complexity and utilization sequences
US9601026B1 (en) 2013-03-07 2017-03-21 Posit Science Corporation Neuroplasticity games for depression
US9972030B2 (en) 2013-03-11 2018-05-15 Criteo S.A. Systems and methods for the semantic modeling of advertising creatives in targeted search advertising campaigns
US9977779B2 (en) 2013-03-14 2018-05-22 Apple Inc. Automatic supplementation of word correction dictionaries
US10642574B2 (en) 2013-03-14 2020-05-05 Apple Inc. Device, method, and graphical user interface for outputting captions
US10572476B2 (en) 2013-03-14 2020-02-25 Apple Inc. Refining a search based on schedule items
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US9733821B2 (en) 2013-03-14 2017-08-15 Apple Inc. Voice control to diagnose inadvertent activation of accessibility features
US8788516B1 (en) 2013-03-15 2014-07-22 Purediscovery Corporation Generating and using social brains with complimentary semantic brains and indexes
KR101857648B1 (en) 2013-03-15 2018-05-15 애플 인크. User training by intelligent digital assistant
AU2014251347B2 (en) 2013-03-15 2017-05-18 Apple Inc. Context-sensitive handling of interruptions
WO2014144579A1 (en) 2013-03-15 2014-09-18 Apple Inc. System and method for updating an adaptive speech recognition model
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US9122681B2 (en) 2013-03-15 2015-09-01 Gordon Villy Cormack Systems and methods for classifying electronic information using advanced active learning techniques
AU2014233517B2 (en) 2013-03-15 2017-05-25 Apple Inc. Training an at least partial voice command system
US9760644B2 (en) 2013-04-17 2017-09-12 Google Inc. Embedding event creation link in a document
WO2014197334A2 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
WO2014197336A1 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
WO2014197335A1 (en) 2013-06-08 2014-12-11 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
EP3937002A1 (en) 2013-06-09 2022-01-12 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
AU2014278595B2 (en) 2013-06-13 2017-04-06 Apple Inc. System and method for emergency calls initiated by voice command
JP6225543B2 (en) * 2013-07-30 2017-11-08 富士通株式会社 Discussion support program, discussion support apparatus, and discussion support method
DE112014003653B4 (en) 2013-08-06 2024-04-18 Apple Inc. Automatically activate intelligent responses based on activities from remote devices
US10483003B1 (en) 2013-08-12 2019-11-19 Cerner Innovation, Inc. Dynamically determining risk of clinical condition
US10446273B1 (en) 2013-08-12 2019-10-15 Cerner Innovation, Inc. Decision support with clinical nomenclatures
US10378329B2 (en) 2013-08-20 2019-08-13 Nabors Drilling Technologies Usa, Inc. Rig control system and methods
JP6241211B2 (en) * 2013-11-06 2017-12-06 富士通株式会社 Education support program, method, apparatus and system
US10224119B1 (en) 2013-11-25 2019-03-05 Quire, Inc. (Delaware corporation) System and method of prediction through the use of latent semantic indexing
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
AU2015266863B2 (en) 2014-05-30 2018-03-15 Apple Inc. Multi-command single utterance input method
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US10049102B2 (en) 2014-06-26 2018-08-14 Hcl Technologies Limited Method and system for providing semantics based technical support
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9703858B2 (en) 2014-07-14 2017-07-11 International Business Machines Corporation Inverted table for storing and querying conceptual indices
US10162882B2 (en) 2014-07-14 2018-12-25 International Business Machines Corporation Automatically linking text to concepts in a knowledge base
US9710570B2 (en) 2014-07-14 2017-07-18 International Business Machines Corporation Computing the relevance of a document to concepts not specified in the document
US10503761B2 (en) 2014-07-14 2019-12-10 International Business Machines Corporation System for searching, recommending, and exploring documents through conceptual associations
US9576023B2 (en) 2014-07-14 2017-02-21 International Business Machines Corporation User interface for summarizing the relevance of a document to a query
US10437869B2 (en) 2014-07-14 2019-10-08 International Business Machines Corporation Automatic new concept definition
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9734144B2 (en) 2014-09-18 2017-08-15 Empire Technology Development Llc Three-dimensional latent semantic analysis
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10372718B2 (en) 2014-11-03 2019-08-06 SavantX, Inc. Systems and methods for enterprise data search and analysis
US10915543B2 (en) 2014-11-03 2021-02-09 SavantX, Inc. Systems and methods for enterprise data search and analysis
US20160154844A1 (en) * 2014-11-29 2016-06-02 Infinitt Healthcare Co., Ltd. Intelligent medical image and medical information search method
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10671675B2 (en) 2015-06-19 2020-06-02 Gordon V. Cormack Systems and methods for a scalable continuous active learning approach to information classification
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US9734141B2 (en) 2015-09-22 2017-08-15 Yang Chang Word mapping
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10003559B2 (en) * 2015-11-12 2018-06-19 International Business Machines Corporation Aggregating redundant messages in a group chat
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US9836669B2 (en) 2016-02-22 2017-12-05 International Business Machines Corporation Generating a reference digital image based on an indicated time frame and searching for other images using the reference digital image
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10372872B2 (en) 2016-04-22 2019-08-06 The Boeing Company Providing early warning and assessment of vehicle design problems with potential operational impact
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
DK179588B1 (en) 2016-06-09 2019-02-22 Apple Inc. Intelligent automated assistant in a home environment
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK179049B1 (en) 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc Intelligent task discovery
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
US20180173850A1 (en) * 2016-12-21 2018-06-21 Kevin Erich Heinrich System and Method of Semantic Differentiation of Individuals Based On Electronic Medical Records
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
EP3590053A4 (en) 2017-02-28 2020-11-25 SavantX, Inc. System and method for analysis and navigation of data
US11328128B2 (en) 2017-02-28 2022-05-10 SavantX, Inc. System and method for analysis and navigation of data
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
CN107943978B (en) * 2017-11-29 2020-11-24 北京金堤科技有限公司 Storage method and device for user access records
US10902066B2 (en) 2018-07-23 2021-01-26 Open Text Holdings, Inc. Electronic discovery using predictive filtering
WO2021009861A1 (en) * 2019-07-17 2021-01-21 富士通株式会社 Specifying program, specifying method, and specifying device
DE102019212421A1 (en) 2019-08-20 2021-02-25 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for identifying similar documents
US11730420B2 (en) 2019-12-17 2023-08-22 Cerner Innovation, Inc. Maternal-fetal sepsis indicator
CN113377923B (en) * 2021-06-25 2024-01-09 北京百度网讯科技有限公司 Semantic retrieval method, apparatus, device, storage medium and computer program product
DE102022203475A1 (en) 2022-04-07 2023-10-12 Zf Friedrichshafen Ag System for generating a human-perceptible explanation output for an anomaly predicted by an anomaly detection module on high-frequency sensor data or quantities derived therefrom of an industrial manufacturing process, method and computer program for monitoring artificial intelligence-based anomaly detection in high-frequency sensor data or quantities derived therefrom of an industrial manufacturing process and method and computer program for monitoring artificial intelligence-based anomaly detection during an end-of-line acoustic test of a transmission

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4384325A (en) * 1980-06-23 1983-05-17 Sperry Corporation Apparatus and method for searching a data base using variable search criteria
EP0054588B1 (en) * 1980-12-19 1984-09-26 International Business Machines Corporation Interactive data retrieval apparatus
US4495566A (en) * 1981-09-30 1985-01-22 System Development Corporation Method and means using digital data processing means for locating representations in a stored textual data base
US4506326A (en) * 1983-02-28 1985-03-19 International Business Machines Corporation Apparatus and method for synthesizing a query for accessing a relational data base
US4575798A (en) * 1983-06-03 1986-03-11 International Business Machines Corporation External sorting using key value distribution and range formation

Also Published As

Publication number Publication date
US4839853A (en) 1989-06-13

Similar Documents

Publication Publication Date Title
CA1306062C (en) Computer information retrieval using latent semantic structure
US5987446A (en) Searching large collections of text using multiple search engines concurrently
Lochbaum et al. Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval
Liu et al. Mining topic-specific concepts and definitions on the web
Ding A similarity-based probability model for latent semantic indexing
Wang et al. Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization
Deerwester et al. Indexing by latent semantic analysis
US7269598B2 (en) Extended functionality for an inverse inference engine based web search
Dumais Improving the retrieval of information from external sources
US5301109A (en) Computerized cross-language document retrieval using latent semantic indexing
EP0597630A1 (en) Method for resolution of natural-language queries against full-text databases
Croft Advances in information retrieval: recent research from the center for intelligent information retrieval
US20070143235A1 (en) Method, system and computer program product for organizing data
Cruz et al. Measuring structural similarity among web documents: preliminary results
CA2423476C (en) Extended functionality for an inverse inference engine based web search
Kim et al. Cluster-based FAQ retrieval using latent term weights
Corston-Oliver et al. Less is more: eliminating index terms from subordinate clauses
Khan et al. Web document clustering using a hybrid neural network
Steinberger et al. Text summarization: An old challenge and new approaches
Feuer et al. Implementing and evaluating phrasal query suggestions for proximity search
Rodrigues et al. Concept based search using LSI and automatic keyphrase extraction
Anick et al. Interactive document retrieval using faceted terminological feedback
Rungsawang DSIR: The first TREC-7 attempt
Rafiei Fourier transform based techniques in efficient retrieval of similar time sequences
Liao et al. A domain‐independent software reuse framework based on a hierarchical thesaurus

Legal Events

Date Code Title Description
MKEX Expiry