WO2002008946A2 - A method and system for a document search system using search criteria comprised of ratings prepared by experts - Google Patents
- Publication number
- WO2002008946A2 (PCT/US2001/023058)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rating
- significance
- taxonomy
- database
- multidimensional
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/954—Navigation, e.g. using categorised browsing
Definitions
- This invention relates to the field of electronic computer related systems. More particularly, the present invention relates to a method and system for the automated search of document files using search criteria based upon document identifiers generated by expert reviewers in lieu of key words in context, and relates to a method and system for indexing such documents.
- Patent Office web site www.uspto.gov permit customized searches with known parameters in lieu of key words, such as Inventor name, assignee name, patent agent name, etc., but also include key-word searches. These searches also suffer from the same malady: returning many documents which must generally be read to find the pertinent ones.
- Dr. Woods has addressed the problem wherein the articulation of the desired subject matter is different from that used by the authors of the documents being searched. This is sometimes referred to as the "synonym problem," although Dr. Woods characterizes the problem more broadly as the "paraphrase problem." His general solution approach is called "conceptual indexing," and more specifically "subsumption technology."
- Subsumption technology is used to automatically integrate syntactic, semantic, and morphological relationships among concepts that occur in the material, and to organize them into a structured conceptual taxonomy that is efficiently useable by retrieval algorithms and also effective for browsing.
- Dr. Woods' conceptual indexing approach is described in a number of papers including "Natural Language Technology in Precision Content Retrieval" by Jacek Ambroziak and William A. Woods, Proceedings of the International Conference on Natural Language Processing and Industrial Applications, August 18-21, 1998, Moncton, New Brunswick, Canada, and
- Woods and Manning & Napier approaches are that a two-step process is required: first, a linguistic vector or structured conceptual taxonomy must be constructed by the indexing engine when the material is indexed, and second, a special retrieval algorithm is used to find either equivalent linguistic vectors or combinations of morphological and semantic subsumption relationships that connect concepts in the request with concepts that occur in the indexed material. While both approaches appear to provide significant efficiency over key word searches, and while the Woods approach appears to be the more efficient of the two, both have the same disadvantages. Both systems require first a baseline database of target documents and second a powerful lexical computing engine to create the linguistic vectors or combinations of morphological and semantic subsumption relationships. Only then can the search technologies of the two be used.
- Biomedicine is largely a knowledge industry. While a physical product, the medicine, does have to be developed, tested, manufactured and delivered, the knowledge of how to do so and the knowledge of which product works best in particular cases contributes most of the value.
- A second characteristic of biomedical knowledge is that it is highly dynamic. At the research level, significant advances in our understanding of biomedical phenomena happen on a weekly basis. Therefore, biomedical professionals have an ongoing need to keep up with the advances relevant to their own specialty area. Such needs have become particularly acute in health care, because patients can now use the Web to learn about the latest developments themselves; as a result, they demand increasingly detailed and timely information from health-care professionals.
- There is as yet no centralized source of biomedical information on the Web. The information one seeks may be available somewhere on the Web; the hard part is finding it. There are thousands of biomedical Web pages, ranging from individual sites to corporate sites. These sites generally fall into the following categories:
- taxonomies which are, basically, hierarchical i.e. one-dimensional.
- n-dimensional taxonomy is more appropriate. That is, a biomedical development might be considered mundane from a technical standpoint, yet highly significant from a social or business viewpoint. While it is true that this "significance" issue might be expected to be handled by the way the query is structured (i.e. from the technical viewpoint or from the social or business viewpoint), systems such as the Sun and Manning & Napier systems cannot handle these issues because of the pre-defined mathematical indexing algorithms they use.
- The solution to these technical problems therefore is to provide a method for analyzing a database of documents wherein a multi-dimensional taxonomy of attributes for a specific domain can be developed and used to tag the related documents with significance rating indicia, which can then be searched by a qualitative matching engine.
- The present invention provides a solution to the needs described above through a system and method for creating and maintaining a biomedical document database, wherein the documents have been reviewed by biomedical and other experts, who have assigned taxonomy-based attribute indicia to each document, so that a specialized search engine can rapidly retrieve relevant documents based upon the commonly known taxonomy.
- "Reference” is defined as a URL or literature citation associated with an original research article or similar content.
- the input interface components for this database are:
- Expert input may be provided by the editorial staff of a leading trade journal, by selected leading practitioners in the field, or by other similar expert mechanism.
- leading practitioners or “domain experts” are defined as those within the top echelon of researchers within the biomedical sub-specialty in question, as judged by publication record, public recognition through competitive awards (such as the Nobel prize) and peer evaluations.
- An interface for expert ratings of references: Expert input (as defined in the previous section) regarding each reference is collected under a multidimensional rating taxonomy. Each reference receives ratings under each relevant taxonomic category from appropriate expert sources. A composite rating may be computed from the mean of multiple ratings received under a single taxonomic category.
- An optional interface for expert commentary: Expert ratings may be accompanied by text commentary on each reference. Such commentary received from multiple expert sources can provide additional insight into the relevance of a particular reference.
- An optional interface for acquiring profiles of experts: Input from each expert source may be normalized for certain variables, based on attributes measured for that expert source. For example, mean ratings and distributions collected and analyzed from each expert source may allow that expert's rating input to be expressed as standard deviations from the mean.
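The composite rating and the per-expert normalization described above can be illustrated with a minimal Python sketch. The function names and the sample numbers are hypothetical, and the use of the population standard deviation is an assumption; the text only says that ratings may be expressed as standard deviations from an expert's mean.

```python
from statistics import mean, pstdev

def composite_rating(ratings):
    """Mean of the ratings one reference received under a single taxonomic category."""
    return mean(ratings)

def normalize_expert(rating, expert_history):
    """Express one rating as standard deviations from that expert's own mean.

    expert_history is the list of ratings previously collected from the expert
    (a hypothetical input; the source does not define how it is stored).
    """
    mu = mean(expert_history)
    sigma = pstdev(expert_history)  # assumption: population standard deviation
    if sigma == 0:
        return 0.0  # the expert always gives the same rating, so there is no spread
    return (rating - mu) / sigma

# Hypothetical example: three experts rate one reference under one category.
print(composite_rating([7, 5, 9]))
# The same raw rating of 9 from an expert whose history is [5, 7, 9].
print(normalize_expert(9, [5, 7, 9]))
```

Normalizing before averaging would keep a habitually generous expert from dominating the composite, which appears to be the motivation for the optional profiling interface.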
- Figure 1 illustrates an exemplary Internet distributed system configuration.
- Figure 2 illustrates a representative general purpose computer server configuration.
- Figure 3 illustrates a block diagram of a preferred embodiment of the process of rating documents and storing them in a database.
- Figure 4 illustrates a block diagram of a preferred embodiment of the process of search and retrieval from the database.
- Figure 5 illustrates a preferred embodiment of an exemplary data structure for the indicia storage related to a document.
- Figure 6 illustrates a preferred embodiment of an exemplary data structure of a multidimensional taxonomy structure.
- Figure 7 illustrates a preferred embodiment of an exemplary data structure of a taxonomy structure for use with the present invention.
- Figure 8 illustrates a preferred embodiment of an exemplary input screen showing how to rate significance in a first dimension.
- Figure 9 illustrates a preferred embodiment of an exemplary input screen showing how to rate significance in subcategories of the first dimension.
- Figure 10 illustrates a preferred embodiment of an exemplary input screen showing how to rate significance in a second dimension.
- Figure 11 illustrates a preferred embodiment of an exemplary input screen showing how to rate significance in a third dimension.
- Figure 12 illustrates a preferred embodiment of an exemplary input screen showing how to rate significance in subcategories of the third dimension.
- Figure 13 illustrates a preferred embodiment of an exemplary input screen showing how to rate significance in a fourth dimension.
- Figure 14 illustrates a preferred embodiment of an exemplary input screen showing a summary review significance pattern.
- Figure 15 illustrates a preferred embodiment of an exemplary input screen showing how to capture a document from the PubMed database.
- Figure 16 illustrates a preferred embodiment of an exemplary input screen showing how to submit a reference to the database of the invention.
- Figure 17 illustrates a preferred embodiment of an exemplary input screen showing how to submit a critique of a reference to the system of the invention.
- The present invention provides a solution to the needs described above through a system and method for creating and maintaining a biomedical document database, wherein the documents have been reviewed by biomedical and other experts, who have assigned taxonomy-based indicia to each document, so that a specialized search engine can rapidly retrieve relevant documents based upon the commonly known taxonomy.
- input interfaces may be designed to collect information relating to time of input, and other relevant attributes. Keywords, summaries, abstracts, slides, audio, editorials, interactive modules, educational content and other items related to a reference may also be stored in the database.
- A search query may utilize taxonomic category information (example: cancer), date of reference (example: significant developments in the last 3 months), bias of the party making the search (for example, an individual may be interested in cancer, but particularly interested in new clinical treatment modalities, not in basic science research advances) or any other pertinent attributes.
- the relative contribution of each attribute comparison to the final match reported may be manipulated by any prediction algorithm.
- the result returned by the search algorithm may be a score. Results of a search may be displayed as a list of references sorted by score, or it may be searched further by additional criteria.
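As a minimal sketch of the scoring idea above, the following Python code weights each attribute comparison and returns references sorted by score. The weighting scheme, the 0-to-1 attribute scale, and all names and values are assumptions made for illustration; the text leaves the actual prediction algorithm open.

```python
def match_score(query, reference, weights):
    """Weighted agreement between a query and one reference.

    Assumes every attribute is rated on a common 0..1 scale, so that
    1 - |difference| is a simple per-attribute agreement measure.
    """
    total_weight = sum(weights.values())
    score = 0.0
    for attr, weight in weights.items():
        agreement = 1.0 - abs(query.get(attr, 0.0) - reference.get(attr, 0.0))
        score += weight * agreement
    return score / total_weight

def search(query, references, weights):
    """Return (reference id, score) pairs, best match first."""
    scored = [(ref_id, match_score(query, attrs, weights))
              for ref_id, attrs in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical data: the searcher cares about clinical, not basic, advances.
references = {
    "ref-A": {"clinical": 0.9, "basic": 0.1},
    "ref-B": {"clinical": 0.2, "basic": 0.8},
}
query = {"clinical": 1.0, "basic": 0.0}
weights = {"clinical": 2.0, "basic": 1.0}

for ref_id, score in search(query, references, weights):
    print(ref_id, round(score, 3))
```

Changing the `weights` dictionary is one way the relative contribution of each attribute comparison could be manipulated without touching the stored ratings.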
- Each reference may be linked to derivatives such as keywords, summaries, abstracts, slides, audio, editorials, interactive modules, educational content and other items related to a reference.
- the biomedical reference rating database may be created and searched, using the elements described in the preceding paragraphs.
- the salient features of this system are its multi-dimensional taxonomy of rating categories, and its strict dependence on expert ratings.
- An expert panel-based mechanism, or other similar peer-review-based mechanism is the only traditionally credible method for assigning significance in the biomedical domain. For this reason it remains quite distinct from "popularity contest" rating systems wherein the source of the rating is not known to be expert.
- One value of a credible rating system is that the database and search functions described above generate results which are credible and trusted by biomedical professionals and, by extension, those non-professional or non-specialist audiences that rely on the judgement of biomedical professionals.
- the environment in which the present invention is used encompasses the general Internet-based systems hardware and infrastructure along with well known electronic transmission protocols both conventional and wireless.
- Some of the elements of a typical Internet network configuration are shown in Figure 1, wherein a number of client machines 105, possibly in a remote local office, are shown connected to a gateway/hub/tunnel-server/etc. 106, which is itself connected to the internet 107 via some internet service provider (ISP) connection 108. Also shown are other possible clients 101, 103 similarly connected to the internet 107 via an ISP connection 104, with these units communicating to possibly a central lab or office via an ISP connection 109 to a gateway/tunnel-server 110, which is connected 111 to various enterprise application servers 112, 113, 114, which could be connected through another hub/router 115 to various local clients 116, 117, 118. Any of these servers 112, 113, 114 could function as a database server for the storage of the indexed documents and messages of the present invention as well as the server for the search engine of the present invention, as more fully described below.
- the general purpose system 201 includes a motherboard 203 having thereon an input/output ("I/O") section 205, one or more central processing units (“CPU”) 207, and a memory section 209 which may have a flash memory card 211 related to it.
- The I/O section 205 is connected to a keyboard 226, other similar general purpose computer units 225, 215, a disk storage unit 223 and a CD-ROM drive unit 217.
- the CD-ROM drive unit 217 can read a CD-ROM medium 219 which typically contains programs 221 and other data.
- Logic circuits or other components of these programmed computers will perform series of specifically identified operations dictated by computer programs as described more fully below.
- the rating taxonomy which, in our case, is multidimensional and developed over the biomedical domain.
- The multidimensional nature of the classification of knowledge is a key element because those skilled in these arts have not been able to come up with a satisfactory systematization of biomedical knowledge "significance" without it. That is, significance or relevance (whether in its individual or in its broader societal sense) is dimensional. For example, a particular research paper in the area of cancer will be rated for significance in a particular way if the dimension is, say, drug development, but in a completely different way if the dimension is basic science impact, technological impact, or societal impact.
- An n-dimensional taxonomy is used wherein each dimension is independent. That is, the same item may eventually appear under several dimensions of the taxonomy (see Figure 6). This is novel and important for developing a comprehensive taxonomy in the biomedical arena. Most traditional taxonomies are hierarchical. For example, the music rating taxonomy of Listen.com would have various branches and the final categories would all be distinct.
- An input interface for tagging items with a significance rating according to the taxonomy (this includes the peer-review mechanisms described below, which are different from the art in that they employ expert panels).
- The rating is done by acknowledged experts in the dimension of relevance, in a peer-review process analogous to grant proposal review, or pre-publication manuscript review. It is this process that brings credibility to the ratings.
- A profiling interface (which may itself contain several embedded technologies) for creating profiles of users. Again, a multidimensional taxonomy of user profiles is used, which is also believed to be distinct from the art. For example, one dimension may be knowledge taxonomic domains; the other dimension may be some or all of the following profile layers:
  a. User ID
  b. predicted preferences (based, for instance, on inferred cognitive style)
  c. reported preferences (by user)
  d. reported experience (by user)
  e. locally documented experience (such as assessment test results, purchase records)
  f. composite profile layer
- A personalized search strategy based on each individual's weighted preferences in the categories of our multidimensional taxonomy: A person might want to know the most significant recent developments in the field of cancer, but what does that really mean? If that person's preferences are for developments of translational interest or, even more complicating, developments of translational interest which offer opportunities for investment, the present system of keyword searches could never satisfy this requirement. Thus, our rating taxonomy allows for novel ways of searching for "significant" knowledge. The search is personalized with respect to what would be considered significant.
- An alternative embodiment may include predictional algorithms based on prior research findings. These may optionally be used to compute the "composite profile" of a user.
- Calculating a composite profile may be useful in situations where information about a user's interest in specific taxonomic categories might be derived from several different sources such as known professional experience, self-reported experience, self-reported interest, or actually recorded performance in educational assessments or other interactive modules. In such cases, it may be desirable to compute and use a composite profile calculated from these disparate sources of input information.
- the actual method of computation may be an arithmetic mean, a weighted mean based on hypothesis or previous findings, or any other suitable computation.
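One way to picture the composite profile computation is the short Python sketch below, which combines per-category interest scores from several profile layers using either an arithmetic mean or a weighted mean. The layer names, scores, and weights are hypothetical; the text only states that any suitable computation may be used.

```python
def composite_profile(layers, weights=None):
    """Combine per-category interest scores from several profile layers.

    layers  -- dict mapping layer name -> {taxonomic category: score}
    weights -- optional dict mapping layer name -> weight; when omitted,
               every layer counts equally (plain arithmetic mean)
    """
    if weights is None:
        weights = {name: 1.0 for name in layers}
    categories = {cat for scores in layers.values() for cat in scores}
    profile = {}
    for cat in categories:
        # Only layers that actually report on this category contribute.
        present = [(name, scores[cat]) for name, scores in layers.items()
                   if cat in scores]
        weight_sum = sum(weights[name] for name, _ in present)
        profile[cat] = sum(weights[name] * value
                           for name, value in present) / weight_sum
    return profile

# Hypothetical layers: self-reported interest and recorded assessment results.
layers = {
    "reported_interest": {"cancer": 8, "cardiology": 2},
    "assessment_results": {"cancer": 4},
}
print(composite_profile(layers))
print(composite_profile(layers, {"reported_interest": 1, "assessment_results": 3}))
```

In the weighted call, the recorded assessment results count three times as much as self-reported interest, which is the kind of weighting "based on hypothesis or previous findings" that the text mentions.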
- any appropriate quantitative matching algorithm and search engine may be used.
- Liquid Engines™ Inc. has developed a generally applicable matching algorithm which may be used for this search purpose.
- a method for creating and maintaining a biomedical reference database in which each reference is associated with rating attributes across a multi-dimensional taxonomy.
- the search function of this invention will serve professional users in a far more sophisticated manner than currently available search methods, with directory/search engine features more suited to current needs.
- the primary goal is to make relevant biomedical content as easy as possible to find - and to present it in a form that is easy to digest.
- Each search result lists a variety of derivative products. For example, instead of providing the original research articles, a link to summaries and short editorial commentary relating to such articles may be provided. Each summary is also linked to the original content.
- each derivative page is hypertextually linked to a central biomedical knowledge database, allowing for quick educational reference to the underlying concepts. This adds value to the consumer's need to understand the content more fully, in the shortest amount of time.
- Personalization: In order to customize the display of content to each user, some information will be stored and served in two formats (expert and non-expert). A biomedical researcher with a Ph.D. might see a different version of selected content than a non-specialist.
- documents are selected for special significance to this community of users 303.
- the expert reviewers supply ratings for each document 305.
- Expert input may be provided by the editorial staff of a leading trade journal, by selected leading practitioners in the field, or by other similar expert mechanism.
- leading practitioners are defined as those within the top 10% of researchers within the biomedical sub-specialty in question, as judged by publication record, public recognition through competitive awards (such as the Nobel prize) and peer evaluations.
- Rating values are assigned to search elements 307 according to the taxonomy indicia (the taxonomy is described in more detail below). Expert input (as defined in the previous section) regarding each reference is collected under a multidimensional rating taxonomy. Each reference receives ratings under each relevant taxonomic category from each appropriate expert source. A composite rating may be computed from the mean of multiple ratings received under a single taxonomic category. This rating process and calculation is explained in more detail below. When completed the annotated document is stored in the database 309. A generalized data structure for a document in the database is shown in Figure 5 although those skilled in the art will understand that there are a multitude of ways to structure such an index.
- Referring to FIG. 4, a generalized proprietary database search and access system 400 is described. Input to such a system 400 may be direct or through the Internet from a client machine, which may be a Personal Computer (PC) or a Personal Data Assistant (PDA) device such as a 3Com™ handheld device, and may use any number of communications protocols such as HTML, XML, WAP, WML, etc.
- A user contacts the system through its web page 403. The system checks the user's password and ID 405 in order to determine whether the user is a subscriber to the service. If not 407, the user is requested to become a subscriber; if the user declines, the system exits 409, 411.
- If the user is a subscriber 413, he is given a search format page wherein he can enter the specific search criteria 415 in which he is interested.
- the database is searched 417 for matches to his input criteria, and a page of pointers to relevant documents is returned 419.
- the user may request another search if he desires 421, 427 or he may terminate the search 423 at which time his search time and costs are calculated and saved 425 for periodic billing of the user.
- Alternative billing/subscription schemes may be used wherein the subscriber is billed a flat fee per period.
- the rating is done by acknowledged experts in the dimension of relevance, in a peer-review process analogous to grant proposal review, or pre-publication manuscript review. It is this process that brings credibility to the ratings. One cannot simulate this with reviews written by online users. When one reads a review for a book on Amazon that says: "...most useful book I have read in the past two years" one doesn't know what qualifications that individual has, whether in fact he or she has read more than one book in the past two years, or whether he/she is a nut.
- document reviewers perform the reviews as follows. Referring to Figure 7, a document reviewer initially selects the Enterprise Domain 701 type B 703, type T 705, type C 707 or type S 709. The reviewer then selects a type 713 in the Disease Group 711, if applicable, and a type 717 in the Underlying Concepts Group 715 if applicable.
- If B 703 or T 705 was selected in the Enterprise Domain 701, he then selects a type Reductionist 721 or Abstractive 723 in the Investigative Emphasis group 719. If B 703 was selected in the Enterprise Domain 701, the reviewer then selects a topic type 727 from the 1B1 Topic Group 725. If T 705 was selected in the Enterprise Domain 701, he selects a topic type 731 from the 1T1 Topic Group 729. If C 707 was selected in the Enterprise Domain 701, the reviewer then selects a topic type 735 from the 1C1 Topic Group 733. If S 709 was selected in the Enterprise Domain 701, the reviewer then selects a topic type 739 from the 1S1 Topic Group 737.
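The branching in the review steps above can be summarized in a small Python sketch. It assumes the topic-group labels read 1B1, 1T1, 1C1 and 1S1 (the extracted text is partly garbled), and the function name is hypothetical. The optional Disease Group and Underlying Concepts selections are left out because the text marks them "if applicable".

```python
# Topic group that follows each Enterprise Domain type (labels as best read
# from the source text; treat them as assumptions).
TOPIC_GROUP = {"B": "1B1", "T": "1T1", "C": "1C1", "S": "1S1"}

# Only these domain types are followed by an Investigative Emphasis choice
# (Reductionist or Abstractive).
EMPHASIS_DOMAINS = {"B", "T"}

def required_selections(domain):
    """List the selections a reviewer must make for a given Enterprise Domain type."""
    if domain not in TOPIC_GROUP:
        raise ValueError(f"unknown Enterprise Domain type: {domain!r}")
    steps = ["Enterprise Domain"]
    if domain in EMPHASIS_DOMAINS:
        steps.append("Investigative Emphasis")
    steps.append(f"{TOPIC_GROUP[domain]} Topic Group")
    return steps

print(required_selections("B"))
print(required_selections("C"))
```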
- search criteria For such a document would specify in the search criteria
- The attributes 1B, 2G, 3L, 4R, 1B1-03 of the above example might carry associated weightings assigned based on prior observation, or by hypothesis.
- preference information might, for example, have been collected at the time of initial user registration at the web site.
- a person might want to know the most significant recent developments in the field of cancer, but what does that really mean?
- A search function that takes x inputs on the user profile and y inputs on each database item, and returns a list of documents, each with a mathematical probability of a match, may be used.
- Any appropriate quantitative matching algorithm may be used.
- Liquid Engines™, Inc. has developed a generally applicable matching algorithm which may be used for this purpose.
- Subscription Groups: Users pay a fee to see comments and feedback on articles and content from experts within a field. Articles and comments are rated within a set of categories by experts and are then searchable by date or relevance for those subscribing to those groups.
- the system will generate a framework in which both types of groups can exist in the same suite of tables in a database, with different front end implementations.
- Subscription Member: Observes postings by elite members and can search the database by taxonomy criteria.
- The implementation consists of the addition of a user membership table, group tables, and finally a messaging table for determining messages contained within a particular group.
- The membership table simply indicates whether the user is a member of a group or not. It also contains the time at which a user's membership expires (if at all), as well as a billing reference (customizable depending on what billing service is used).
- This table contains a PRIMARY KEY which is (UserID, GroupID).
- A UNIQUE constraint is placed upon (UserID, GroupID) to prevent multiple inclusions of a user in a group.
- Perms is a bitfield (where any value can be on or off) which describes the type of privileges a user has within a group.
- The "Flags" field is a bitfield (so multiple values may be on or off) which maps to the following values:
- the Messages Table contains messages posted to all groups.
- the ContentType is the type of information being commented about:
- An index should be created for GroupID to accelerate searching for all messages within a group.
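The table layout described above might be sketched as follows, using Python's built-in `sqlite3` module. Only UserID, GroupID, Perms, ContentType and the GroupID index come from the text; the remaining column names, the index name, and the specific permission bits are hypothetical, since the source does not list them.

```python
import sqlite3
from enum import IntFlag

class Perms(IntFlag):
    # Hypothetical privilege bits; the source does not give the actual values.
    READ = 1
    POST = 2
    RATE = 4

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Membership (
    UserID     INTEGER NOT NULL,
    GroupID    INTEGER NOT NULL,
    Perms      INTEGER NOT NULL DEFAULT 0,  -- bitfield of privileges
    Expires    TEXT,                        -- NULL: membership never expires
    BillingRef TEXT,                        -- depends on the billing service
    PRIMARY KEY (UserID, GroupID)           -- one row per user per group
);
CREATE TABLE Messages (
    MessageID   INTEGER PRIMARY KEY,
    GroupID     INTEGER NOT NULL,
    UserID      INTEGER NOT NULL,
    ContentType INTEGER NOT NULL,
    Body        TEXT
);
-- Speeds up "all messages within a group" lookups.
CREATE INDEX idx_messages_group ON Messages (GroupID);
""")

conn.execute(
    "INSERT INTO Membership (UserID, GroupID, Perms) VALUES (?, ?, ?)",
    (1, 10, int(Perms.READ | Perms.POST)),
)

def has_perm(user_id, group_id, perm):
    """True if the user is a member of the group and holds the given privilege bit."""
    row = conn.execute(
        "SELECT Perms FROM Membership WHERE UserID = ? AND GroupID = ?",
        (user_id, group_id),
    ).fetchone()
    return row is not None and bool(Perms(row[0]) & perm)

print(has_perm(1, 10, Perms.POST))
print(has_perm(1, 10, Perms.RATE))
```

In SQLite the composite primary key already rejects a second row for the same user and group, so a separate UNIQUE constraint on the same pair would be redundant in this sketch.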
- the Ratings Table contains ratings for subscription based groups. Ratings are based upon the relevance of a particular reference to a taxonomic category, as described above.
- the document classification sub-system measures the user's perception and evaluation of the document.
- A set of questions in the form of analysis screens is posed to the user regarding the relevance of the document being rated, to which the user responds by selecting an option box ("Highly irrelevant," "irrelevant," "Slightly irrelevant," "Neutral," "Slightly relevant," "relevant," and "Very relevant").
- Sample rating categories include: "Basic Science Impact"; "Technology Impact"; "Business Impact"; "Societal Impact"; and "Clinical Impact." Each category is scored on a seven-point scale (-3 to +3) where a score of -3 is "Highly Irrelevant" and +3 is "Very Relevant."
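A minimal Python sketch of the seven-point scale follows. The text fixes only the endpoints (-3 and +3); assigning the intermediate labels to -2 through +2 in the order listed, and treating an unrated category as Neutral, are assumptions made here for illustration.

```python
# Option-box labels mapped to scores; only the -3 and +3 endpoints are stated
# in the text, the values in between are an assumed even spacing.
SCALE = {
    "highly irrelevant": -3,
    "irrelevant": -2,
    "slightly irrelevant": -1,
    "neutral": 0,
    "slightly relevant": 1,
    "relevant": 2,
    "very relevant": 3,
}

CATEGORIES = [
    "Basic Science Impact",
    "Technology Impact",
    "Business Impact",
    "Societal Impact",
    "Clinical Impact",
]

def score_responses(responses):
    """Convert {category: option label} into a score vector ordered like CATEGORIES.

    Categories the user did not rate are scored 0 (assumption: same as Neutral).
    """
    vector = []
    for category in CATEGORIES:
        label = responses.get(category, "Neutral")
        vector.append(SCALE[label.strip().lower()])
    return vector

print(score_responses({
    "Clinical Impact": "Very relevant",
    "Business Impact": "Slightly irrelevant",
}))
```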
- A present exemplary embodiment of the invention is found at the web site www.biocritique.com. Once a user has logged on to the system she is able to post an article, rate an article, critique an article or search for articles that correlate most closely with the specific interests of the user. Those users who are permitted to post and rate an article are members of a selected "expert panel" who have agreed to participate in this system.
- the work of the expert panels in the BioCritique Forums rapidly forms a database of rated articles and reviews from which users can obtain significant information relevant to their specific needs, i.e. the latest important developments in their own and related fields.
- Panelists will map articles using BioCritique's multi-dimensional taxonomy of significance (see below). Users store quantitative profiles of their interests rated across the same taxonomy. Pair-wise correlations are performed to sort the database for each user. A search of the BioCritique database is thus based on an intuitive "pattern-matching" concept akin to the way human beings relate to information in the real world. Users can store multiple profiles, thereby customizing them to different needs.
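The pair-wise correlation step might look like the Python sketch below. It assumes a Pearson correlation between a user's profile vector and each article's rating vector over the same taxonomy categories; the text does not name the correlation measure, and the article identifiers and numbers are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def rank_articles(profile, articles):
    """Sort article ids so the ratings most correlated with the profile come first."""
    return sorted(articles,
                  key=lambda article_id: pearson(profile, articles[article_id]),
                  reverse=True)

# Hypothetical profile and article ratings over the same four categories.
profile = [9, 1, 5, 1]
articles = {
    "article-a": [8, 2, 5, 1],
    "article-b": [1, 9, 5, 8],
    "article-c": [5, 5, 5, 5],
}
print(rank_articles(profile, articles))
```

Because a user may store several profiles, the same ranking function could simply be called once per stored profile.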
- BioCritique looks at each object in its database in 6 primary dimensions, rating each on a scale. For example, the Attributes dimension scores the following parameters on a scale of 1 to 9:
- The expertise dimension contains a large number of biomedical specialties ranging from Biostatistics to Toxicology. Disease dimension categories are subcategorized. A rating under Cardiovascular would trigger the following subcategories:
- A first rating screen 800 is illustrated, indicating that the user is to rate the significance in a first dimension 801. Shown on the screen are five basic categories 807. The user chooses one or two categories by indicating which categories are not relevant 803. In this example "clinical impact" has been chosen. For each category chosen, the user selects a relevancy rating on a scale of 1-9. Here the user has selected an indication which has a value of 5 (805). On completion of the rating process the user clicks the button 809 to go to the next section. These selections would produce a basic "Impact Domain” selection vector, for example, that would look like this:
- The user is requested to rate the significance in a third dimension ("disease states") 1101.
- Disease states constitute a third dimension.
- The selection vector would look like:
- Third dimension 1201, which in this instance relates to "cardiovascular subcategories.”
- The user again selects the categories that are "not relevant” 1203, and for each category deemed relevant, a rating is selected 1205. In this case the selection vector would look like:
- This exemplary rating of a given document by a user would produce the significance vectors as indicated above, and these would be stored with the document.
- Another user who reviews the same document and provides a similar rating could very well select different categories, and different significance ratings even for the same categories.
- These different significance vector values are averaged, and the resulting vector of averages is saved with the document along with a "number of raters" value which is used to compute the new average. For example, looking at the significance vector associated with the description of Figure 9 above, if we had a second review of this document the system would execute the acts to produce a significance vector with averages for each category as follows:
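The averaging step described above, which uses the stored "number of raters" value, can be sketched as a running-average update. This is an illustration under stated assumptions, not the patent's own listing: the function name `update_average`, the dictionary representation, the treatment of a category missing from one side as 0, and all numbers in the example are hypothetical.

```python
def update_average(stored, num_raters, new_rating):
    """Fold one more rater's vector into the stored per-category averages.

    stored and new_rating map category name -> score. A category absent
    from either side is treated as 0 here; the text does not say how
    unrated categories are averaged, so this is an assumption.
    Returns the new averages and the new "number of raters" value.
    """
    total = num_raters + 1
    categories = set(stored) | set(new_rating)
    averaged = {
        category: (stored.get(category, 0.0) * num_raters
                   + new_rating.get(category, 0.0)) / total
        for category in categories
    }
    return averaged, total


# Illustrative values only: one earlier rater, then a second review.
stored_vector = {"Clinical Impact": 5.0}
second_review = {"Clinical Impact": 7, "Technology Impact": 3}
averages, raters = update_average(stored_vector, 1, second_review)
print(sorted(averages.items()), raters)
```

Storing the rater count alongside the averages lets each new review be folded in without keeping every individual rating.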
- Figures 15-17 disclose an exemplary set of screens which facilitate downloading a reference (Fig. 15), submitting the reference to the BioCritique database (Fig. 16), and adding personal comments to a reference (Fig. 17).
- Registered BioCritique users of the system containing the invention can create and save a set of significance vectors to be used regularly thereafter whenever they sign on, or can generate a special one-time set of significance vectors.
- These significance vectors are the same as those indicated above, and are generated by the user going through the same set of screens shown in Figures 8-14.
- The search is conducted by doing a pair-wise correlation between the user-specific significance vectors and the significance vectors stored for each document.
- This pair-wise correlation is performed using the Pearson correlation coefficient method (which is explained below).
- A Pearson correlation coefficient ("r") is calculated for each document, and the documents are then sorted with those having the highest "r" value first. The first 10 such documents meeting a minimal "r" value level are then presented to the searching user.
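The search steps above (compute "r" per document, sort descending, keep the first 10 above a minimum) can be sketched as follows. The function names, the flat-list vector representation, the default `min_r` of 0.5, and the sample data are all assumptions for illustration; the text gives no threshold value and does not say how a constant vector (for which "r" is undefined) is handled.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # r is undefined for a constant vector; treating it as 0 is an assumption.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def search(profile, documents, min_r=0.5, limit=10):
    """Rank documents by correlation with the user's profile vector.

    documents maps a document id to its stored significance vector.
    min_r is a hypothetical threshold; limit=10 follows the text.
    """
    scored = [(pearson_r(profile, vector), doc_id)
              for doc_id, vector in documents.items()]
    hits = [hit for hit in scored if hit[0] >= min_r]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]


# Illustrative data only.
profile = [1, 2, 3, 4, 5]
documents = {"a": [2, 4, 6, 8, 10], "b": [5, 4, 3, 2, 1], "c": [1, 3, 2, 5, 4]}
for r, doc_id in search(profile, documents):
    print(doc_id, round(r, 4))
```

Because "r" depends on the shape of the vectors rather than their absolute size, a document rated uniformly higher or lower than the user's profile can still rank first.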
- The correlation between two variables reflects the degree to which the variables are related.
- The most common measure of correlation is the Pearson Product Moment Correlation (called Pearson's correlation for short).
- When measured in a population, the Pearson Product Moment correlation is designated by the Greek letter rho (ρ). When computed in a sample, it is designated by the letter “r” and is sometimes called “Pearson's r.”
- Pearson's correlation reflects the degree of linear relationship between two variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between variables.
- A correlation of -1 means that there is a perfect negative linear relationship between variables.
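For reference, the standard sample formula for Pearson's r over n paired values (x_i, y_i), with sample means written as \bar{x} and \bar{y}, is given below. The symbols are the conventional textbook ones and are not taken from the source text.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator measures how the two variables vary together, and the denominator normalizes by each variable's own spread, which is why r always lies between -1 and +1.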
- This value, 0.9608, would say that the numbers in the X column are highly correlated with the numbers in the Y column (a value of +1.0 meaning the numbers were perfectly correlated).
- This high correlation (0.9608) would characterize this document as highly likely to be of significant interest to this user and his given search criteria.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002415608A CA2415608A1 (en) | 2000-07-24 | 2001-07-23 | A method and system for a document search system using search criteria comprised of ratings prepared by experts |
EP01954862A EP1428143A2 (en) | 2000-07-24 | 2001-07-23 | A method and system for a document search system using search criteria comprised of ratings prepared by experts |
AU2001277082A AU2001277082A1 (en) | 2000-07-24 | 2001-07-23 | A method and system for a document search system using search criteria comprised of ratings prepared by experts |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US22039800P | 2000-07-24 | 2000-07-24 | |
US60/220,398 | 2000-07-24 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002008946A2 (en) | 2002-01-31 |
WO2002008946A3 (en) | 2004-04-01 |
Family
ID=22823388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/023058 WO2002008946A2 (en) | 2000-07-24 | 2001-07-23 | A method and system for a document search system using search criteria comprised of ratings prepared by experts |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1428143A2 (en) |
AU (1) | AU2001277082A1 (en) |
CA (1) | CA2415608A1 (en) |
WO (1) | WO2002008946A2 (en) |
-
2001
- 2001-07-23 WO PCT/US2001/023058 patent/WO2002008946A2/en active Search and Examination
- 2001-07-23 EP EP01954862A patent/EP1428143A2/en not_active Withdrawn
- 2001-07-23 CA CA002415608A patent/CA2415608A1/en not_active Abandoned
- 2001-07-23 AU AU2001277082A patent/AU2001277082A1/en not_active Abandoned
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997008604A2 (en) * | 1995-08-16 | 1997-03-06 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
US5721910A (en) * | 1996-06-04 | 1998-02-24 | Exxon Research And Engineering Company | Relational database system containing a multidimensional hierachical model of interrelated subject categories with recognition capabilities |
WO2000036529A1 (en) * | 1998-12-16 | 2000-06-22 | Grassi Mantelli, Maria, Teresa | Dynamic taxonomy process for browsing and retrieving information in large heterogeneous data bases |
Non-Patent Citations (1)
Title |
---|
CHAKRABARTI S ET AL: "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies" VLDB JOURNAL, SPRINGER VERLAG, BERLIN, DE, vol. 7, no. 3, August 1998 (1998-08), pages 163-178, XP002141635 ISSN: 1066-8888 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004049097A2 (en) * | 2002-11-27 | 2004-06-10 | Accenture Global Services Gmbh | Content feedback in a multiple-owner content management system |
WO2004049097A3 (en) * | 2002-11-27 | 2004-10-07 | Accenture Global Services Gmbh | Content feedback in a multiple-owner content management system |
US7062505B2 (en) | 2002-11-27 | 2006-06-13 | Accenture Global Services Gmbh | Content management system for the telecommunications industry |
US7200614B2 (en) | 2002-11-27 | 2007-04-03 | Accenture Global Services Gmbh | Dual information system for contact center users |
US7395499B2 (en) | 2002-11-27 | 2008-07-01 | Accenture Global Services Gmbh | Enforcing template completion when publishing to a content management system |
AU2003302349B2 (en) * | 2002-11-27 | 2008-08-28 | Accenture Global Services Limited | Content feedback in a multiple-owner content management system |
US8572058B2 (en) | 2002-11-27 | 2013-10-29 | Accenture Global Services Limited | Presenting linked information in a CRM system |
US9785906B2 (en) | 2002-11-27 | 2017-10-10 | Accenture Global Services Limited | Content feedback in a multiple-owner content management system |
WO2005013162A1 (en) * | 2003-07-30 | 2005-02-10 | Trialstat Corporation | Systematic review system |
CN112801530A (en) * | 2021-02-05 | 2021-05-14 | 江西清能高科技术有限公司 | Intelligent review system based on semantic splitting and working method |
Also Published As
Publication number | Publication date |
---|---|
CA2415608A1 (en) | 2002-01-31 |
WO2002008946A3 (en) | 2004-04-01 |
AU2001277082A1 (en) | 2002-02-05 |
EP1428143A2 (en) | 2004-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7058516B2 (en) | Computer implemented searching using search criteria comprised of ratings prepared by leading practitioners in biomedical specialties | |
US20060161353A1 (en) | Computer implemented searching using search criteria comprised of ratings prepared by leading practitioners in biomedical specialties | |
US6694331B2 (en) | Apparatus for and method of searching and organizing intellectual property information utilizing a classification system | |
US8484177B2 (en) | Apparatus for and method of searching and organizing intellectual property information utilizing a field-of-search | |
Markov et al. | Data mining the Web: uncovering patterns in Web content, structure, and usage | |
US8296296B2 (en) | Method and apparatus for formatting information within a directory tree structure into an encyclopedia-like entry | |
Hawkins | Information science abstracts: tracking the literature of information science. Part 1: definition and map | |
US20070022125A1 (en) | Systems, methods, and computer program products for accumulating, strong, sharing, annotating, manipulating, and combining search results | |
Uren et al. | The usability of semantic search tools: a review | |
Hoekstra et al. | Data scopes for digital history research | |
US20060129538A1 (en) | Text search quality by exploiting organizational information | |
US20020138297A1 (en) | Apparatus for and method of analyzing intellectual property information | |
US20050177561A1 (en) | Learning search algorithm for indexing the web that converges to near perfect results for search queries | |
US20070022111A1 (en) | Systems, methods, and computer program products for accumulating, storing, sharing, annotating, manipulating, and combining search results | |
WO2002008946A2 (en) | A method and system for a document search system using search criteria comprised of ratings prepared by experts | |
Thelwall et al. | Why do web sites from different academic subjects interlink? | |
Li et al. | People search: Searching people sharing similar interests from the Web | |
EP1672544A2 (en) | Improving text search quality by exploiting organizational information | |
Pirmann | Using tags to improve findability in library OPACs: a Usability Study of LibraryThing for Libraries | |
Kling et al. | Research articles in scholarly electronic communication | |
Gilchrist | Text retrieval: an overview | |
Suárez-Figueroa | D1. 3.2 Identification of standards on metadata for ontologies | |
Liu | An empirical investigation of expertise matching within academia | |
Boulware et al. | Buddy: Harnessing the power of the internet | |
Li et al. | Web Mining to Identify People of Similar Background |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
WWE | WIPO information: entry into national phase | Ref document number: 2001277082; Country of ref document: AU; Ref document number: 2415608; Country of ref document: CA |
WWE | WIPO information: entry into national phase | Ref document number: 2001954862; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 200300673; Country of ref document: ZA |
WWP | WIPO information: published in national office | Ref document number: 2001954862; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: JP |
WWW | WIPO information: withdrawn in national office | Ref document number: 2001954862; Country of ref document: EP |
DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (PCT application filed from 20040101) | |