WO2001073587A2 - Universal biomolecular data system - Google Patents

Universal biomolecular data system

Info

Publication number
WO2001073587A2
Authority
WO
WIPO (PCT)
Prior art keywords
databases
data
computer
biomolecular
biomolecular data
Prior art date
Application number
PCT/US2001/003038
Other languages
French (fr)
Other versions
WO2001073587A3 (en
Inventor
Gareth Jones
Original Assignee
Arena Pharmaceuticals, Inc.
Priority date
Filing date
Publication date
Application filed by Arena Pharmaceuticals, Inc.
Priority to AU2001233134A (AU2001233134A1)
Publication of WO2001073587A2
Publication of WO2001073587A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/40 Searching chemical structures or physicochemical data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing

Definitions

  • the field of the invention is database management systems.
  • the field of the invention includes a database management system for storing, using, maintaining and retrieving large amounts of biomolecular data, including chemical, biological, genomic, testing, and other related data and information.
  • the present invention relates generally to a database management system ("DBMS") for providing fast access to information relating to large collections of biomolecular data, including, for example, data on organic compounds and nucleotide sequences (for example gene sequences), structural and chemical information, in vivo and in vitro data, protein sequences, assays, and other screening information.
  • the present invention utilizes, as appropriate for a desired system, multiple databases, including relational databases, chemical structure databases, and related persistent storage mechanisms, sequence databases, database-searching mechanisms, dedicated laboratory instruments, server computers, and internal and external software including
  • Computer systems in general are known.
  • a typical system comprises a computer, keyboard, mouse, and a monitor.
  • the computer comprises a single or multiple central processing unit(s) (“CPU”) and random access memory (“RAM”) and allows various software programs to be used.
  • the computer might comprise a modem, an Ethernet card or other similar device for connecting to a system of networked computers, such as the
  • the Internet provides a useful technique for making information available to a variety of individuals each of whom may be located at a variety of different locations. Indeed, within the vast Internet environment, individuals can access information tools from remote locations.
  • the Internet, which originally came about in the late 1960s, is a computer network made up of many smaller networks spanning the entire globe. The host computers or networks of computers on the Internet allow public or private access to databases containing information in numerous areas of expertise. Hosts can be sponsored by a wide range of entities including, for example, universities, government organizations, commercial enterprises and individuals.
  • Internet information is made available to the public through servers running on an Internet host.
  • the servers make documents or other files available to those accessing the host site.
  • files can be stored in databases and on storage media such as, for example, optical or magnetic storage devices, preferably local to the host.
  • TCP/IP: Transmission Control Protocol/Internet Protocol
  • IP: Internet Protocol
  • the Web comprises hundreds of thousands of interconnected "web pages", or documents, which can be displayed on a user's computer monitor. These web pages are provided by hosts running special servers. Software that runs these web servers is relatively simple and is available on a wide range of computer platforms including PCs ("personal computers"). Equally available is web browser software, used to display web pages as well as traditional non-web files on the user's system.
  • HTTP: Hypertext Transfer Protocol
  • HTTP is designed to run primarily over TCP/IP and uses the standard Internet setup, where a server issues the data and a client displays or processes it.
  • One format for information transfer is to create documents using Hypertext Markup Language ("HTML"). HTML pages are made up of standard text as well as formatting codes indicating how to display the page. The browser reads these codes to display the page.
  • Each web page may contain pictures and sounds in addition to text. Associated with certain text, pictures or sounds are connections, known as hypertext links, to other pages within the same server or even on other computers within the Internet. For example, links may appear as underlined or highlighted words or phrases. Each link is directed to a web page by using a special name called a URL ("Uniform Resource Locator"). URLs enable the browser to go directly to the associated resource, even if it is on another web server.
  • URL: Uniform Resource Locator
  • experimental data are not always recognized as being relevant or important, particularly when such data are located across millions of pages of information. For example, "negative" data from one scientific program is often, in retrospect, "positive" within the context of another program. Additionally, it is increasingly the case that companies may have data relating to thousands of nucleic acid sequences. For example, the explosion of genomic information has created an unprecedented need for users with a deep working knowledge of biological sciences, chemical sciences and computational methods. Accordingly, there exists a need for data models and database systems that allow one to store, manipulate, retrieve, use or search various biomolecular data, including sequences, structures, genetic linkages and maps, signal pathways, and the like. Biological databases also need a means to provide experimental data relating to these sequences and structures, including both positive and negative data.
  • the database management system invention disclosed and claimed herein provides users with transparent access to a set of data that otherwise would require separate software packages to access.
  • the database management system includes, as necessary or desired, a local area network, a relational database management system, a chemical structure database, persistent storage (such as flat files and DBM ("Disk Based Hash") files), a relational database, a sequence database and/or sequence searching mechanism, one or more laboratory instruments collecting data (for example, one or more of in vitro data, in vivo data, in situ data, tissue distribution information and data, or the like), external software methods (e.g., file format translation programs), a set of two or more computers with Internet browsing software, a server computer (or a computer cluster), web server software and/or programs for collecting, publishing, editing, and searching data.
  • SSL: Secure Sockets Layer
  • a database management system that provides an interface to a plurality of databases storing biomolecular data
  • the system comprises: a plurality of databases storing biomolecular data; a processing unit; a first computer instruction that directs the processing unit to receive a request for access to biomolecular data stored in at least one of the plurality of databases; a second computer instruction that determines which of the plurality of databases stores the biomolecular data; a third computer instruction that accesses the biomolecular data in the at least one of the plurality of databases; a fourth computer instruction that receives the biomolecular data from the at least one of the plurality of databases; and a web page that is generated by the processing unit and displays the biomolecular data received from the at least one of the plurality of databases.
  • Biomolecular data refers to any data and/or information concerning molecules that are studied, made, used, sold, offered for sale, exported or imported by an individual or entity.
  • Such molecules include, but are not limited to, nucleic acids, proteins, lipids, carbohydrates, and the like, as well as any constituents or building blocks of such molecules, for example amino acids, nucleotides, and the like.
  • Such molecules also include, but are not limited to, organic and inorganic compounds and mixtures of such compounds, whether in liquid, solid, gaseous, or any other form.
  • Information concerning such molecules includes, but is not limited to, structure, formula, weight, chemical activities and characteristics, biological activities and characteristics, in vivo and in vitro activities and characteristics, kinetic activities and characteristics, physiological activities and characteristics, receptor binding activities and characteristics, and so on, including any and all laboratory, testing, experimental and other data relating to such molecules.
  • This definition is intended to be exemplary only. It is not intended to limit the types of data that may be manipulated by the invention to the classes or types of information described or referred to throughout the specification. Rather it is meant to include any information that a person or entity would find useful to store and quickly retrieve using the methods and apparatus described and claimed herein.
  • Color refers to any item that can be distinguished from another item based on visible or machine readable color recognition.
  • a technical advantage of one embodiment of the present invention is that it provides a fast and easy system for obtaining biomolecular data, such as chemical, structural, genomic, screening and other laboratory data, genomic, and gene-expression data, over the Internet, and thus is accessible virtually from anywhere.
  • Another technical advantage of one embodiment of the present invention is that it provides a customizable system for gathering, analyzing, and relating data. Additionally, such information may easily and securely be shared within an organization or with other desired third parties.
  • An additional technical advantage of one embodiment of the present invention is that it allows real-time distribution of experimental data and allows users to analyze and search information with little to no delay.
  • Another technical advantage of one embodiment of the present invention is that multiple interacting databases may be used to provide useful information to a user. For example, gene sequence based information may be linked to biological assay information and chemical structure information.
  • An additional technical advantage of one embodiment of the present invention is that it may be used to gather and collect data from on-going experiments. Additionally, one embodiment of the present invention may be directly connected to laboratory instruments to record and track data in real time and/or update information in one or more of the plurality of databases.
  • Another technical advantage of one embodiment of the present invention is that users with limited computer experience may easily gather and read data. Additionally, one embodiment of the present invention outputs clearly formatted data.
  • Another technical advantage of one embodiment of the present invention is that it may operate over a LAN, wide-area network ("WAN"), stand-alone computer, the Internet, or any other system of networked computers.
  • WAN: wide-area network
  • FIG. 1 illustrates an overview of a system contemplated by one embodiment of the present invention
  • FIG. 2 illustrates an overview of a computer system of one embodiment of the present invention
  • FIG. 3 illustrates a schematic overview of one embodiment of the system of the present invention
  • FIG. 4 illustrates another schematic overview of one embodiment of the system of the present invention
  • FIG. 5 illustrates a series of menu options available in one embodiment of the system of the present invention
  • FIG. 6 illustrates a flowchart of an overview of one embodiment of the present invention
  • FIG. 7 illustrates a flowchart of a specific aspect of the flowchart of FIG. 6
  • FIG. 8 illustrates a flowchart of a specific aspect of the flowchart of FIG. 6
  • FIG. 9 illustrates an introduction screen as output from one embodiment of the present invention
  • FIG. 10 illustrates a visualization tool for runset counts as output from one embodiment of the present invention
  • FIG. 11 illustrates a hydrophobicity sequence analysis, with two applet output windows, as output from one embodiment of the present invention
  • FIG. 12 illustrates a histogram of frequency counts for a runset, as output from one embodiment of the present invention
  • FIG. 13 illustrates a molecular datasheet as output from one embodiment of the present invention
  • FIG. 14 illustrates a primary screening runset as output in two windows from an embodiment of the present invention
  • FIG. 15 illustrates a primary screening plate as output in two windows from an embodiment of the present invention
  • FIG. 16 illustrates animal data output for a motor function/dysfunction experiment as output from one embodiment of the present invention
  • FIG. 17 illustrates an interface while a program is running as output from one embodiment of the present invention
  • FIG. 18 illustrates an IC50 form for searching results from IC50 searches as output in two windows from an embodiment of the present invention
  • FIG. 19 illustrates a scatter plot comparison, with axes inputs, as output from one embodiment of the present invention
  • FIG. 20 illustrates a query for searching database results as output in two windows from an embodiment of the present invention
  • FIG. 21 illustrates a form for entering assay data as output from one embodiment of the present invention
  • FIG. 22 illustrates dot blot experimental details as output in two windows from an embodiment of the present invention
  • FIG. 23 illustrates an IC50 plate assay data sheet as output in two windows from an embodiment of the present invention
  • the present invention comprises a system for maintaining, searching and using a system of databases and for storing and retrieving large amounts of biomolecular and other data, including chemical, biological, genomic, and other information. Moreover the present invention provides fast access to information relating to large collections of organic compounds, proteins, and nucleotide sequences, including structural information, in vivo and in vitro data, gene sequences, protein sequences, and other laboratory data including but not limited to assay and screening information.
  • Biomolecular data comprises the various data types mentioned herein, including, but not limited to, chemical, biological, genomic, organic, structural, and the like
  • a client computer may be used to make a request for data to a server.
  • the client computer passes to the server computer the name of a server program to run along with a set of arguments and values.
  • this request is encoded in a client web page via a link or form so that the user may access the program by using a computer mouse.
  • this request may originate from a JAVA® (Sun Microsystems, Inc.) program and may request that the browser load a new web page, or the request may also be in response to action through a computer mouse by single or multiple clicks.
  • the request is passed to the server computer and onto the server program using the HTTP web protocol and executed on the server using, for example, the Common Gateway Interface ("CGI") mechanism.
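
The request described above (a server program name plus a set of arguments and values, carried over HTTP) can be pictured with a minimal Python sketch. It is an illustration only, not the patent's implementation: the host `server.example`, the `/cgi-bin/` path, the program name `molecule_datasheet.cgi`, and the argument names are all assumptions made for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_request_url(base_url, program, arguments):
    """Encode a server program name and its arguments as a GET-style URL."""
    query = urlencode(arguments)
    return f"{base_url}/cgi-bin/{program}?{query}"


def parse_request_url(url):
    """Recover the program name and argument values on the server side."""
    parts = urlsplit(url)
    program = parts.path.rsplit("/", 1)[-1]
    # parse_qs returns a list per name; keep the first value of each.
    arguments = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return program, arguments


# Hypothetical request: program name and parameters are illustrative only.
url = build_request_url(
    "http://server.example",
    "molecule_datasheet.cgi",
    {"molecule_id": "12345", "format": "html"},
)
program, arguments = parse_request_url(url)
print(url)
print(program, arguments)
```

A link or form in the client web page would carry the same encoded query, so a mouse click is enough to issue the request.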
  • CGI: Common Gateway Interface
  • PHP: PHP Hypertext Preprocessor
  • ISAPI: Internet Server Application Program Interface
  • ASP: Active Server Pages (a Microsoft Corporation application)
  • Standard web server programs such as the APACHE™ (Apache Software Foundation) web server or MICROSOFT'S INTERNET INFORMATION SERVER® (Microsoft Corporation) can be configured to handle these requests.
  • the server program executes and creates a new web page as output, for example in the HTML, the Extensible Markup Language (“XML”) format, or any other web page format, which replaces the current web page in the client browser.
  • the client request may originate from an applet, such as a JAVA® program, that runs in the web page.
  • Applets are programs that can run inside web pages.
  • the applet may obtain its data points from the server program and display a scatter-plot based on that data.
  • the applet is also able to communicate with the CGI program through web server software.
  • the CGI program may then use a data format that the applet could readily interpret.
  • any applet may be written in any programming language.
  • C, the Practical Extraction and Report Language ("PERL")
  • CGI (Lincoln D. Stein)
  • DBI (Tim Bunce)
  • LWP (Gisle Aas, Martijn Koster)
  • NET Win32
  • GD (Lincoln D. Stein)
  • DB_File (Paul Marquess)
  • PTK (Nick Ing-Simmons)
  • APACHE
  • XS
  • Some tools that are useful to use in the present invention include, for example: APACHE™, UNITY® (TRIPOS Associates, Inc.), public domain software such as APACHE™, NCBI BLAST™ (National Center for Biotechnology Information), CLUSTAL™ (developed by Toby Gibson, Des Higgins, and Julie Thompson), PHYLIP™ (University of Washington), IMAGEMAGICK™ (Imagemagick Studio), GNUPLOT™ (Thomas Williams and Colin Kelley), and the like.
  • Some output formats for visualization of the data include: histograms, scatterplots, barcharts, tables, assay browsers, menus, curves, plate maps, and the like.
  • the client computer may be any workstation, personal computer, server computer, handheld computer, laptop computer, mobile or wireless computing device, or alternatively any other computing device, for example a client in a client/server environment.
  • the client may be a user machine that performs computer processing, while the server acts as a remote storage medium for the data in the databases.
  • the processing may be done by the client computer.
  • the client computer may interact with the databases or the DBMS.
  • although the present invention is generally referenced as utilizing chemical, biological, and genomic information, it may use any type of information that is suitable for use on a database system.
  • a local area network (“LAN”) or intranet 11 may be used to carry out the present invention.
  • computers 12 and 14 may be connected to LAN 11.
  • Computers 12 and 14 may connect to web server software 26 through LAN 11.
  • Web server software 26 may connect to a database management system 5 ("DBMS").
  • the DBMS 5 may be on a server computer or on a plurality of computers (such as a tier of computers), wherein the server is comprised of a plurality of computers.
  • computers 12 and 14 may connect directly to DBMS 5.
  • computers 12 and 14 may be client computers.
  • the client computers 12 and 14 interact with the server computer housing the DBMS 5 or with other computers on the LAN. In the present invention, as many client computers may be used as necessary. Thus, there may be a client computer for each user that desires access to the present invention.
  • FIG. 2 illustrates a portion of a computer, including a CPU and conventional memory in which one embodiment of the present invention may be embodied.
  • the environment in which the present invention operates encompasses a general distributed computing system, wherein general purpose computers, workstations, or personal computers are connected via communication links of various types to the Internet.
  • Some of the elements of a general purpose workstation computer 100 are illustrated in FIG. 2, wherein a processor 101 is illustrated, having an input/output ("I/O") section 102, a central processing unit (“CPU”) 103 and a memory section 104.
  • I/O: input/output
  • CPU: central processing unit
  • the I/O section 102 may be connected to a keyboard 105, a display unit 106, a disk storage unit 109 and a CD-ROM drive unit 107.
  • the CD-ROM unit 107 can read a CD-ROM medium 108, which typically contains programs and data 110.
  • This computer 100 may be connected to the Internet or the Web, via a modem, Ethernet connection, or other communications link.
  • Computers described herein may use a computer system as described above or computer systems similar to this computer system.
  • DBMS 5 may use web server software 26.
  • Computers 12 and 14 may then use Internet browsing software, such as NETSCAPE NAVIGATOR® (Netscape Communications Corporation) or MICROSOFT INTERNET EXPLORER® (Microsoft Corporation).
  • remote client computers 38 that are outside of LAN 11 may connect to the system of the present invention.
  • a firewall may be erected, passwords may be required, or data encryption may also be used.
  • Database management system 5 oversees organization of the data.
  • Database management system 5 may be the computer system that oversees management, integration and operation of the various databases.
  • Database management system 5 may be housed on a computer or a plurality of computers, such as the server.
  • the present invention may use any database that contains information useful to the user, including, but not limited to, biomolecular data.
  • databases that may be used in the present invention include databases that contain information on compound collections, organic compounds, proteins, nucleotide sequences (including structural information), structural descriptors (such as TRIPOS® SLNS or SMILES® (Daylight Chemical Information Systems)), in vivo or in vitro data, gene sequences, protein sequences, a tissue screen, structural information, sequence information, data-mining results, assay information, and the like.
  • Data-mining results may come from searches (such as for primary screens, reconfirmations, or IC50s), comparisons of assays, track screening, structure-based searches (such as for substructure, similarity, or clustering), or a combination of structure/assay searches, for example.
  • results output from the present invention may provide users with various information in various formats.
  • these results may allow a user to: view data from laboratory devices; add data from counters to a database; compare assay results; obtain statistics on assays; edit plates, runsets, or wells out of an assay; edit chemical structures; add compounds; compare results from various searches; browse dot blots; search dot blots; browse genes; search genes; search sequences; retrieve literature for sequences or other information; and the like.
  • a runset profile may be generated.
  • the runset profile may include a grid of colored wells wherein each well contains a representation of a number of data points from sequential plates in the runset.
  • the user may control the number of data points. Additionally, the user may scroll through the visualization of the runset.
  • DBMS 5 determines in which database or databases the particular data sought by the user are located. This may be accomplished, for example, by sequentially examining each database in the system or by examining specific databases in the system, depending on the request. For example, if a chemical structure is desired, the system may look at chemical structure databases, such as the UNITY® (TRIPOS Associates, Inc.) database and other chemical structure databases.
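
The two lookup strategies described for DBMS 5 (examining every database in turn, or only the databases of a type relevant to the request) can be sketched as follows. This is a minimal illustration under stated assumptions: the `DataSource` class, the source names, the `kind` labels, and the sample records are all hypothetical and stand in for real database connections.

```python
class DataSource:
    """Hypothetical stand-in for one database known to the DBMS."""

    def __init__(self, name, kind, records):
        self.name = name
        self.kind = kind          # e.g. "chemical_structure", "screening"
        self.records = records    # key -> record; a real source would run a query

    def contains(self, key):
        return key in self.records

    def fetch(self, key):
        return self.records[key]


def locate(sources, key, kind=None):
    """Return the sources that hold `key`.

    With kind=None every source is examined in sequence; otherwise only
    sources of the requested kind are examined.
    """
    candidates = [s for s in sources if kind is None or s.kind == kind]
    return [s for s in candidates if s.contains(key)]


def retrieve(sources, key, kind=None):
    """Collect the record for `key` from every source that holds it."""
    return {s.name: s.fetch(key) for s in locate(sources, key, kind)}


# Illustrative data only; identifiers and values are made up for the sketch.
SOURCES = [
    DataSource("structure_db", "chemical_structure", {"CMPD-1": {"descriptor": "CCO"}}),
    DataSource("screening_db", "screening", {"CMPD-1": {"percent_response": 12.5}}),
    DataSource("sequence_db", "sequence", {}),
]

print(retrieve(SOURCES, "CMPD-1"))
print(retrieve(SOURCES, "CMPD-1", kind="chemical_structure"))
```

Restricting the search by kind avoids touching sources that cannot hold the requested data, at the cost of requiring the request to state what kind of data it wants.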
  • Database management system 5 is connected to the various databases used in the present invention.
  • database management system 5 may be connected to a relational database management system ("RDBMS").
  • RDBMS: relational database management system
  • a relational database is a set of tables containing data fit into predefined categories, where each table contains one or more data categories.
  • a RDBMS may use structured query language (“SQL”). Examples of RDBMS are: an ORACLE® database (Oracle Corporation), an IBM DB2® (International Business Machines), or MICROSOFT SQL SERVER® (Microsoft Corporation).
  • DBMS 5 may be connected to: a persistent storage medium, such as flat files 22; disk based hash ("DBM") files, such as Berkeley DB™ (Sleepycat Software); a sequence database; a sequence searching database 20, such as NCBI BLAST™; a sequence searching mechanism; a chemical structure database 18; a chemical structure searching mechanism, such as UNITY®; a screening database 16, such as an RDBMS storing screening information (for example using ORACLE® to store screening information); gene expression sequence databases (such as one using ORACLE®); and the like.
  • instruments 30, 32, and 34 for collecting data may also be connected to DBMS 5.
  • Instruments 30, 32, and 34 are preferably laboratory instruments; however, they may be any instrument that is useful to connect to DBMS 5 and gather data.
  • the laboratory instruments used in the present invention may include but are not limited to: thermometers, electrochemical readout devices, refractive index devices, biosensor readout devices, chemiluminescent readout devices, plot counters, and plate readers such as beta scintillation counters, fluorescent plate readers, fluorescent polarization plate readers, colorimetric plate readers, ultraviolet detector readers and data derived from animal studies. Further, there may be as many instruments used in the present invention as necessary to collect the desired data.
  • a software data source 28 that gathers information from other programs, such as external software methods (for example, file format translation programs) and/or computer programs for collecting, publishing, editing, and searching data, may also be connected to DBMS 5.
  • DBMS 5 and the various databases may be on one computer and communicate with the various databases within the one computer.
  • the databases and other systems may be on various computers and DBMS 5 may communicate with these various systems by using electronic connections, standard communications systems, secure communications, wireless communications, encrypted communications, or the like.
  • the system begins with a user, via the client computer, requesting data via a web server program, using the appropriate parameters and security access as necessary, at 140.
  • the user may be required to enter a login and/or a password in order to use the system.
  • the system may utilize other forms of user authorization, such as biometric information (for example, a fingerprint, voice recognition, or a retinal scan), knowledge based identifying information (for example, a mother's maiden name), or the like.
  • the system may identify the user and associate various parameters with the user, such as an access level, the number of times the user has accessed the system, the price the user pays per access to the system or per minute on the system, or the like.
  • the server CGI program starts.
  • the server CGI program may be a collection of software objects, associated methods and/or procedural programs.
  • the server CGI program, for example, is selected at 142.
  • these programs and objects comprising the CGI program may be coded in the computer software programming language PERL.
  • PERL is an object-orientated scripting language that allows useful extension modules for Internet and database programming.
  • PERL may also load C or C++ objects.
  • numerical routines may be coded as C or C++ and then loaded as PERL modules.
  • Many common data structures used in the present invention may be encoded as PERL objects. For example, a molecule object that describes data associated with a molecule in the compound collection may be encoded using PERL. Such an object, when instantiated, may contain data-fields that describe the molecule, together with software methods.
  • The data-fields may contain raw information, for example, molecular weight or molecular co-ordinates, or may themselves be complex software objects.
  • Examples of methods in the molecule object may include: a method to retrieve basic information about the molecule from the structural database 16 (such as a structural descriptor, molecule name, or compounds containing this molecular structure); a method to retrieve similar compounds from a structure database 18; a method to generate a structure diagram graphic that may be used in a web page; a method to retrieve assay results from database 16; and methods to format and visualize information in the molecule object. Additionally, such methods may create other objects and run other methods. For a given client computer request, a PERL encoded program may be invoked with the CGI parameters. Generally the program will create objects and apply methods, though tasks may be achieved purely through procedural code.
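The molecule object described above can be pictured with a minimal sketch. It is written in Python rather than the PERL named in the text, and the class name, field names, and the SMILES-style descriptor string are all assumptions made for illustration; the text does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hypothetical molecule object: data-fields plus methods."""
    molecule_id: int
    name: str
    descriptor: str              # structural descriptor string (format assumed)
    molecular_weight: float
    assay_results: dict = field(default_factory=dict)  # assay name -> percent response

    def add_assay_result(self, assay: str, percent_response: float) -> None:
        """Store one assay result in the object's data-fields."""
        self.assay_results[assay] = percent_response

    def summary(self) -> str:
        """Format the basic information for display."""
        return f"{self.molecule_id}: {self.name} (MW {self.molecular_weight:.2f})"


phenol = Molecule(1, "phenol", "c1ccccc1O", 94.11)
phenol.add_assay_result("receptor A primary", 42.0)
print(phenol.summary())
```

In a fuller version, the retrieval methods would query the structural and assay databases instead of being filled in by hand.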
  • Data may be obtained through data-sources on the server computer, as illustrated in 143 to 154.
  • The order of steps in the present invention may involve accessing data-sources in whatever order is beneficial for the user to obtain the desired data.
  • The system determines if a relational database management system ("RDBMS") is to be accessed at 143.
  • The RDBMS may be ORACLE®; however, the present invention may be used in conjunction with any other RDBMS, for example MICROSOFT SQL SERVER® or IBM DB2®.
  • An RDBMS stores data as a set of tables. For example, two tables in the database of one embodiment of the present invention are the "molecules" table, which stores specific molecule identification numbers and structural descriptors, and the "primary assay results" table, which stores percent response and counts-per-minute for wells in primary assays. Relationships between these and the many other tables are described using foreign keys. Extensive use of indexes in this embodiment enables rapid retrieval of data from the RDBMS.
  • If RDBMS data is needed, the data is retrieved from the RDBMS using SQL.
  • The present invention may generate the appropriate SQL commands automatically and retrieve the data at 144. These commands are executed against the RDBMS and the results may be stored in the object's data fields.
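As a rough illustration of generating and executing SQL against the two tables mentioned above, the following Python sketch uses an in-memory SQLite database as a stand-in for the production RDBMS. The column names, the sample rows, and the `build_query` helper are assumptions; only the table names come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the production RDBMS
conn.executescript("""
    CREATE TABLE molecules (molecule_id INTEGER PRIMARY KEY, descriptor TEXT);
    CREATE TABLE primary_assay_results (
        molecule_id INTEGER REFERENCES molecules(molecule_id),
        percent_response REAL,
        counts_per_minute REAL);
    CREATE INDEX idx_results_molecule ON primary_assay_results(molecule_id);
    INSERT INTO molecules VALUES (1, 'c1ccccc1O'), (2, 'CCO');
    INSERT INTO primary_assay_results VALUES (1, 85.0, 1200.0), (2, 12.5, 310.0);
""")


def build_query(min_response):
    """Generate a parameterized SQL command and its bind values."""
    sql = ("SELECT m.molecule_id, m.descriptor, r.percent_response "
           "FROM molecules m JOIN primary_assay_results r "
           "ON r.molecule_id = m.molecule_id "
           "WHERE r.percent_response >= ? ORDER BY m.molecule_id")
    return sql, (min_response,)


sql, params = build_query(50.0)
rows = conn.execute(sql, params).fetchall()  # results could be stored in an object
print(rows)
```

Binding the threshold as a parameter, rather than pasting it into the SQL string, keeps automatically generated commands safe from malformed input.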
  • The system determines if a chemical structure search is to be performed.
  • UNITY®, a chemical structure search system, may be used.
  • The present invention may use any other chemical structure search system, such as the Integrated Scientific Information System ("ISIS") (MDL Information Systems, Inc.) or the DAYLIGHT DATABASE PACKAGE™ (Daylight Chemical Information Systems, Inc.).
  • Chemical structure searches were not possible using SQL, which is why such searches could not be performed in traditional RDBMSs.
  • The present invention may use a chemical structure search that is embedded within an RDBMS (for example RS3 from Oxford Molecular).
  • The chemical structure search is performed.
  • Common chemical structure searches, such as a search for "phenol," may take the form of: (i) an exact match search (for example, retrieve phenol from the database if it is present); (ii) a substructure search (for example, retrieve a list of structures that contain a phenol as a functional group from the database); or (iii) a similarity search (for example, retrieve a list of structures that have a similar chemical structure to phenol).
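The text does not say how similarity is scored. One widely used measure, assumed here purely for illustration, is the Tanimoto coefficient over molecular fingerprints. The Python sketch below uses made-up fingerprints (sets of "on" bit numbers); a real system would obtain them from a chemical structure toolkit.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_search(query, library, threshold=0.5):
    """Return (name, score) pairs at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


# Toy fingerprints: the bit numbers are invented for this example.
library = {
    "phenol": frozenset({1, 2, 3, 4}),
    "cresol": frozenset({1, 2, 3, 4, 5}),
    "ethanol": frozenset({4, 9}),
}
hits = similarity_search(library["phenol"], library)
print(hits)
```

An exact match corresponds to a score of 1.0 in this scheme; substructure searching needs graph matching and is not covered by this sketch.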
  • The database management system may have access to a number of other databases, such as UNITY® databases. These databases contain chemical structures and a registration identifier for each structure.
  • One such useful UNITY® database is one containing the chemical structures in the compound collection, where the registration identifier relates to the specific molecule identification number.
  • Methods that retrieve data from this UNITY® database are able to perform a "software join" to data in the RDBMS database.
  • This feature allows the user to navigate between structural and biological data. For example, the user may perform a substructure search to retrieve all compounds containing a particular substituent from the compound collection. The user may then generate a table of these compounds with additional columns containing biological activity. Additional databases are also available to enhance the above information, such as commercial compound catalogues that may contain compounds of interest to users.
  • The UNITY® database software uses an input file and command line syntax to perform searches. Results from the searches are written to a "hitlist" file. Methods in the database management system generate the input file and associated command line, execute the search and parse the hitlist file. Typically, a hitlist object is created to browse the results from the hitlist file. It is stored in a data-field in the original calling object.
  • The system of an embodiment of the present invention determines if a sequence-similarity search, such as a bioinformatics search, is to be performed.
  • Gene-sequences may be stored in an NCBI BLAST™ database, which is a public database. This software performs sequence homology searches.
  • BLAST™ uses an input file and command line, and produces an output file.
  • The present invention contains methods for generating BLAST™ queries and input files and also for parsing BLAST™ output files automatically, and does so at 148.
  • The present invention may use any other sequence searching mechanisms.
  • The system of the present invention determines if saved data is to be used. If so, the saved data object is retrieved at 150.
  • Cached or saved data may be used. For example, results from searches are often stored in tabular data files. As a user browses the data, the user may add columns to the data or perform operations on the table. Such information may be cached to prevent the search operation from having to be repeated. Caching also speeds up operation of the present invention.
  • Other saved objects may further include, for example, molecular structure icons, UNITY® hitlist files or BLAST™ output files.
  • Data may exist in flat files.
  • The output from laboratory devices is an example of data in a flat file.
  • Flat files are collected from devices by software services. Such data may be time-stamped and stored on the web server. Users may download such data for local processing or the user may enter such data into the database as assay results.
  • The present invention may use Berkeley DB™ files. This is a DBM file that contains a set of keys and values where the values may be rapidly retrieved using the keys.
  • There are multiple public implementations of DBM files, including Berkeley DB™, MLDBM, and SDBM. Any of these files may be used in the present invention.
  • The present invention uses software programs that may store search results in the DBM file for quick retrieval later. These files may be used to store results from bioinformatics data-mining experiments, which are typically run every week.
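Storing search results in a DBM file for later retrieval can be sketched with Python's standard `dbm` module, which is used here as a stand-in for Berkeley DB™. The `cached_search` and `slow_search` names and the query string are hypothetical.

```python
import dbm
import os
import tempfile


def cached_search(db_path, key, run_search):
    """Return the stored value for key, running the search only on a miss."""
    with dbm.open(db_path, "c") as db:    # "c": create the file if needed
        k = key.encode("utf-8")
        if k in db:
            return db[k].decode("utf-8"), True
        value = run_search(key)
        db[k] = value.encode("utf-8")
        return value, False


calls = []


def slow_search(query):                    # hypothetical expensive search
    calls.append(query)
    return f"hits for {query}"


path = os.path.join(tempfile.mkdtemp(), "search_cache")
first = cached_search(path, "GPCR family A", slow_search)
second = cached_search(path, "GPCR family A", slow_search)
print(first, second, len(calls))
```

The second call is answered from the DBM file, so the expensive search runs only once.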
  • A software object may store the data in a file.
  • Higher level programming languages such as JAVA® and PERL provide simple interfaces for this type of data persistence. An example of this occurs when displaying locomotor animal data.
  • Both the online images and the page itself involve the retrieval of thousands of data points from the database management system followed by statistical analysis.
  • The search and calculation need only be done once and the resulting complex data structure may be saved to disk.
  • The system determines if data from an external procedure is to be used. If so, at 152, the data is received from the external procedure.
  • Such external procedures include data-sources and other types of databases or persistent sources.
  • While such external procedures use the same method of retrieval as those databases described above, calling such external procedures involves defining the input (for example, in a command line argument and/or input file), executing the procedure, and then parsing the results.
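The define-input, execute, parse pattern described above might look like the following Python sketch. The external procedure here is a stand-in (a one-line script that upper-cases its input file); a real deployment would call a search or conversion tool, whose command line is not given in the text.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in external procedure: upper-cases each line of the input file.
PROCEDURE = "import sys; [print(l.strip().upper()) for l in open(sys.argv[1])]"


def run_external(lines):
    """Write the input file, execute the procedure, and parse its output."""
    with tempfile.TemporaryDirectory() as tmp:
        input_path = os.path.join(tmp, "query.in")
        with open(input_path, "w") as fh:
            fh.write("\n".join(lines) + "\n")
        result = subprocess.run(
            [sys.executable, "-c", PROCEDURE, input_path],
            capture_output=True, text=True, check=True)
    return result.stdout.split()


parsed = run_external(["acgt", "ttga"])
print(parsed)
```

Passing `check=True` turns a failed procedure into an exception, so a caller never parses partial output by mistake.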
  • The present invention may communicate with such external procedures in a number of different manners.
  • The present invention may use utility commands from TRIPOS Associates® that add functionality, including chemical structure file format translation programs (for example a program that allows the present invention to read and write structures to the CHEMDRAW® (CambridgeSoft Corporation) plugin) and "sln2gif," a TRIPOS® program that converts structural descriptor strings to molecular image files.
  • The present invention may use any other external procedures. Such external procedures increase the breadth and scope of the services available to the user.
  • GNUPLOT, a public plotting program, may be used.
  • IMAGEMAGICK CONVERT™, an image translation program of ImageMagick Studios, allows PNG format images to be converted to MACINTOSH® (Apple Computer, Inc.) PICT or WINDOWS® (Microsoft Corporation) BMP file formats.
  • CLUSTAL™ sequence alignment software may be used together with PHYLIP™, which allows the generation of a phylogenetic tree diagram from the CLUSTAL™ sequence alignment.
  • The present invention may use external procedures that provide a chemical structure clustering program that (in conjunction with a TRIPOS Associates, Inc. tool that generates molecular fingerprints for chemical structures) divides a set of molecules into chemically similar groups.
  • This tool allows a user to collect hits from a search for biologically active compounds and sort these hits into chemically similar classes.
  • Another external procedure may be used to gather data from IC50 experiments and fit the results to a standard dose response curve using simplex minimization.
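The text does not state the functional form of the dose response curve. A common choice, assumed here, is the four-parameter logistic (Hill) model, fitted by minimizing the sum of squared residuals over the measured points. The symbols below are introduced for illustration: c is concentration, h is the Hill slope, and (c_i, r_i) are the measured concentration and response pairs.

```latex
R(c) = R_{\min} + \frac{R_{\max} - R_{\min}}{1 + \left( c / \mathrm{IC}_{50} \right)^{h}},
\qquad
S\left(R_{\min}, R_{\max}, \mathrm{IC}_{50}, h\right) = \sum_{i=1}^{n} \bigl( r_i - R(c_i) \bigr)^{2}
```

Under this form the response falls from R_max at low concentration to R_min at high concentration and passes through the midpoint at c = IC50. A simplex minimizer would search the four parameters for the smallest S.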
  • Data collected may originate from an instrument, such as a laboratory instrument, that is connected to the present invention. This data may be stored in the appropriate databases, as outlined in detail below. Alternatively, the data from such instruments may be used to update or modify existing data stored in one of the plurality of databases connected to the database management system. Once data has been collected from all sources, processing of the data occurs at 153.
  • Data processing in the present invention may occur in various forms, including organization, tabulation, or statistical calculation of such data, for example.
  • The data may be used to modify or update data already existing in the databases. Calculations of the data may range from simple calculations such as the mean or standard deviation, to more complex numeric calculations such as a t-test, a Student's t-distribution probability, the area under a section of a normal distribution curve or an ANOVA (analysis of variance), etc.
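A short Python sketch of the simpler calculations named above (mean, standard deviation, and a t statistic) follows. The sample values are invented, and Welch's unequal-variance form of the t statistic is an assumption, since the text does not say which t-test variant is used.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    return (mean_a - mean_b) / math.sqrt(
        var_a / len(sample_a) + var_b / len(sample_b))


control = [10.0, 12.0, 11.0, 13.0]   # made-up percent responses
treated = [20.0, 22.0, 21.0, 23.0]
print(statistics.mean(control), statistics.stdev(control))
print(welch_t(treated, control))
```

Turning the statistic into a probability requires the Student's t-distribution, which the standard library does not provide; a statistics package would supply that step.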
  • The present invention may also update databases as the present invention operates. For example, a user may add new screening results to the database as he or she works in the lab, or may collect data from a lab instrument. The present invention determines biological activities and assay controls from a device data-file and user supplied parameters. These new results are added to the appropriate database; however, SQL queries for the update should be formulated and executed in a transaction such that the updated database is not corrupted in the event of an error.
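The transactional update described above can be sketched as follows in Python, with an in-memory SQLite database standing in for the production RDBMS. The table layout and the `add_results` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in for the production RDBMS
conn.execute("CREATE TABLE assay_results ("
             "well TEXT PRIMARY KEY, percent_response REAL NOT NULL)")


def add_results(conn, results):
    """Insert all rows or none: roll back if any row fails."""
    try:
        with conn:   # commits on success, rolls back on exception
            conn.executemany(
                "INSERT INTO assay_results VALUES (?, ?)", results)
        return True
    except sqlite3.Error:
        return False


ok = add_results(conn, [("A1", 55.0), ("A2", 61.5)])
bad = add_results(conn, [("A3", 40.0), ("A1", 99.0)])   # duplicate well A1
count = conn.execute("SELECT COUNT(*) FROM assay_results").fetchone()[0]
print(ok, bad, count)
```

The second batch fails on the duplicate well, and the rollback also discards well A3, so the table is left exactly as it was before that batch.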
  • Users may also edit existing data. For example, a user may mark one or more of the wells in an experimental 96-well plate as "bad" if he or she believes a dilution or other error has occurred. The present invention then updates the assay results and controls. The present invention may monitor the databases to ensure the integrity of the system.
  • The present invention may update multiple databases. Further, the present invention may perform frequent error checks to ensure that the various databases are synchronized. The present invention may also perform error checks when a compound structure is being edited. In addition, other error checks also may occur, such as when a structure is newly added. The system may check to determine if the newly added structure is already in the database. Such error checks help ensure the integrity of the system, because, for example, a structural analysis using a nuclear magnetic resonance ("NMR") spectrometer and/or a mass-spectrometer may show that a compound had a different structure than what was expected. At 154, the present invention determines whether further searches are needed or if additional information from data-sources is required.
  • Step 154 helps prevent information from being missed. For example, a user may request a table of all compounds having an average "IC50" against a "receptor A" that is, for example, less than 1 µM, cross-referenced against "receptor B" and "receptor C" activities. Rather than constructing a complex SQL query to obtain all results at one time, the present invention may return to step 143 and retrieve the receptor A hits, construct a table object from these hits, and then retrieve any receptor B and receptor C activities, which may then be added as additional columns to the table object. At 155, persistent data objects may be saved. If just an HTML page is output, the system may not require saving any persistent objects.
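The iterative approach in this example, retrieving the receptor A hits first and then adding the other receptors as columns, is sketched below in Python. The IC50 values, molecule numbers, and helper names are invented; each lookup stands in for one pass through the data-source steps.

```python
# Made-up average IC50 values in micromolar, keyed by molecule id.
ic50 = {
    "receptor A": {101: 0.4, 102: 3.0, 103: 0.9},
    "receptor B": {101: 12.0, 103: 0.2},
    "receptor C": {101: 5.0},
}


def fetch_hits(receptor, max_ic50):
    """First pass: molecules whose IC50 at the receptor is below the cutoff."""
    return {mol: v for mol, v in ic50[receptor].items() if v < max_ic50}


def add_column(table, receptor):
    """Later pass: add one receptor's activities as a new column."""
    for mol, row in table.items():
        row[receptor] = ic50[receptor].get(mol)   # None when not measured


table = {mol: {"receptor A": v}
         for mol, v in fetch_hits("receptor A", 1.0).items()}
for other in ("receptor B", "receptor C"):
    add_column(table, other)
print(table)
```

Compounds with no measurement at a receptor keep a row with an empty cell, which a single inner-join query would have dropped.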
  • A primary assay data sheet may contain one applet that displays assay data in bar-chart form and a second applet that displays the same data as a colored grid.
  • The present invention can write files containing the assay data to disk.
  • Each applet is given the name of the data file as a program parameter. Both applets then open a connection to the server and read the data file.
  • Such applets may be developed under JAVA® and may include methods to read and interpret data files generated by the present invention. Again, if a user is repeatedly using the same data, such data may be saved to a cached file. Alternatively, all search data may be cached and saved as a table file that the user is free to manipulate in later requests.
  • The standard output of the program is redirected by the web server software back to the client computer.
  • The data is formatted in such a way that the applet may read the data. If an applet is being used, at 157 the data is output to the client.
  • The data may be in the format requested by the client. For example, if a dot blot was desired, it may be output as a "jpg" file, a "bmp" file, a "gif" file, or some other computer image file.
  • A dot blot is any matrix based system for hybridization.
  • A dot blot may be northern blot data from any DNA technology, any data from two-dimensional proteomic gels, a matrix of RNA extract from tissues in dot format, a chip with DNA sequences attached to it, or the like.
  • A sample of the results of a dot blot experiment, as output from one embodiment of the present invention, is shown in FIG. 22.
  • Images may be saved in the system, such that a user may cut and paste the image to some other program (such as a word processing document).
  • The output may be in various other formats stated above, such as a histogram, a relational table format, or in some application specific format, such as formatted for ORACLE®.
  • The data may be in any different format that is convenient for the user to utilize such data.
  • The data may be returned in an HTML file to the client.
  • The present invention may use software objects to ensure that a properly formatted HTML page is created. This allows users with limited computer experience to have clearly formatted data that is easily utilized by such users.
  • The generated HTML may contain multiple links to allow the user to retrieve related data.
  • The new web page may also contain one or more forms or easy to fill in pages that the user may fill out to perform a data search.
  • Data may also be output in XML or any other alternative format that a browser can interpret.
  • The present invention may use any available web browser, such as MICROSOFT INTERNET EXPLORER®, NETSCAPE NAVIGATOR®, or any other web browser such as ones used on WINDOWS®, MACINTOSH® and UNIX® (Unix System Laboratories, Inc.) operating systems.
  • The original request to the system may initiate from a web server, but there may not be a need for an HTML web page file to be returned as output. In such an instance, some other file is returned.
  • A user may request a MICROSOFT® Excel spreadsheet.
  • A MICROSOFT® Excel spreadsheet may be returned to the user as a readable web page file.
  • The system of the present invention may also contain methods that may communicate with other MICROSOFT® OLE objects that allow the creation of other documents.
  • A user may request the output to be in a MICROSOFT® Word format.
  • A browser that requests an inline image, or a user that requests an image for document preparation, may be returned the appropriate image file instead of an HTML web page file.
  • The output of the program sends to the client a web page file that is readable from a web browser on the client's computer.
  • The server program finishes.
  • The client loads the server data through its web browser.
  • Data that is not in JAVA® or HTML format, such as a MICROSOFT® Excel spreadsheet or images (as described above), is handled automatically by the web browser.
  • The web browser loads the HTML page.
  • The web browser may automatically format the page and retrieve any included images.
  • The system determines if applets are in the web page at 162. If there are none, the system is complete at 163, as the web browser has displayed the pages requested by the user. If there are applets in the page, at 164 the system automatically starts such applets through the web browser.
  • The applets are stored in compiled byte-code on the server and are downloaded on demand by the web browser. The behavior and size of the applet are described by HTML tags in the client web page. Normally, HTML web pages contain static content. Having an applet in a web page allows dynamic content within the page. Alternatively, other technologies that allow dynamic content in the web page may be used, such as JavaScript and dynamic HTML.
  • The system determines if data files are needed from the server for the applet. If such data files are needed, then these files are retrieved from the server at 166.
  • The system may open an HTTP connection back to the server and read the file. Many applets may need to gather data from the server. For example, if the applet is displaying table data, then the applet will read the appropriate data file on the server.
  • The system determines whether the applet needs to run server software. If so, the applet opens an HTTP connection back to the server and runs the software.
  • The web server software ensures that the output from the server program is returned to the applet (as described above). For example, the applet may need to generate a database query. This query is passed to a server program for execution at 143 and the data from the server program is returned to the applet through the appropriate steps described above.
  • The applet processes the data.
  • The applet may run within a frame in the web page. Additionally, the applet may provide features such as buttons, menus and entry-boxes as well as graphic elements. Also, the applet may respond to user events such as mouse movements or single or multiple clicks.
  • Applets that may be used in the present invention include bar-charts, histograms, scatter-plots for the visualization of numeric data, and the like. These components can be used in a variety of situations, such as to download and display table files. Additionally, a user may click on a column in a bar-chart displaying the plate-profile for a primary assay and the web browser can load the relevant compound data-page. Such applets may also be user-friendly by displaying data in a clear and simple manner to users. Further, many features give the user control over the data display (for example, depending on the particular applet, the user may turn off or on the display of error bars; add points or other labeling; perform scrolling; perform selection operations; or the like).
  • A preferred use for the applet is to display tabular data.
  • Static web pages can also be used to display tabular data; however, such web pages may be unwieldy for a table with many hundreds of rows.
  • Applets may scroll through large amounts of data while controlling the display of contents in the cells of the table. For example, if a user clicks on a cell containing a molecule identification number, the applet directs the browser to that molecule's data sheet.
  • A further applet that may be used is one that allows programs running on the server to display progress messages. This allows the user to track the progress of long tasks. Without this applet, the user would typically have to wait for the program to finish before seeing any progress. Without such updates, the user can often become frustrated and believe that the connection to the server has been lost.
  • A colored grid applet may be used for the visualization of results, such as screening plate results from laboratory assay experiments.
  • Each well in a screening plate may be illustrated as a colored box, with the color being determined by the well measurement.
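How a well measurement might determine a box color can be shown with a small Python sketch. The blue-to-red ramp, the 0 to 100 range, and the `well_color` name are assumptions; the text leaves the color scheme to the user's control.

```python
def well_color(value, low=0.0, high=100.0):
    """Map a measurement to an (r, g, b) tuple on a blue-to-red ramp."""
    span = high - low
    fraction = 0.0 if span == 0 else (value - low) / span
    fraction = min(1.0, max(0.0, fraction))      # clamp out-of-range values
    red = round(255 * fraction)
    return (red, 0, 255 - red)


plate_row = [0.0, 50.0, 100.0, 140.0]            # made-up percent responses
print([well_color(v) for v in plate_row])
```

Exposing `low` and `high` as parameters mirrors the user's ability to control the displayed color range.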
  • The user may use buttons to toggle between, for example, a measured biological response and counts-per-minute and the user may control the range and type of color displayed. Further, the user may use the mouse to obtain more detailed information on the contents of each well.
  • The system may use an applet for the visualization of dose-response data, allowing the user to toggle, for example, the display of error bars, data-points and the curve itself.
  • A further applet that may be used is an applet that performs dose response curve fitting using a downhill simplex minimization or other minimization techniques such as simulated annealing.
  • The user may interactively: add points to the curve; remove points from the fitting calculation; determine the hill slope; perform a two-site dose response fit; anchor either of the curve end-points; and the like.
  • Applets can also be used to build navigation tools, such as an applet that downloads and displays menu files from the server (such as standard menus that occur at the foot of web pages in the database) or menus for selecting specific assay screens (screens may be sorted by type, experimental technique and receptor target).
  • Any other utility applet objects may be used.
  • Some web browsers currently are unable to print applet contents. Therefore, there is an applet object method to capture the display of another graphic object. The method calls a server program to display the printable image graphic in a new browser window.
  • Applets can also be used to "stack" displays. For example one applet may store many tables but display only one. The user can select another table by clicking on a button. This applet is used for concise display in what would be otherwise cluttered data sheets (for example the molecule and substance data sheets).
  • A similar applet may stack plate profile bar-charts, allowing a user to view either counts-per-minute of biological response (for a more complex assay such as a flashplate-cyclic AMP ("cAMP") assay) or any other measurements (such as pMol cAMP). Stacked bar-charts and histograms can also be used to analyze the distribution of data within table columns.
  • Another useful applet that may be used combines features from two previously described applets: the scatter-plot applet and the assay browser applet. This allows the user to select two assays.
  • The applet generates SQL commands for a comparison of the assays and uses a server program to fetch the data from the database management system. The data is then displayed in a scatter-plot.
  • The present invention may use commercial JAVA® objects, such as, for example, objects that relate to chemical structure sketching.
  • One such object allows the DBMS to retrieve a chemical structure from the user, where the user has sketched the structure using, for example, the CHEMDRAW® plugin.
  • Returning to step 143, more data may be retrieved from the server. For example, multiple iterations of step 143 may occur.
  • The applet may continue to run while it is displayed in the web browser.
  • Different web browsers react differently to the applet; for example, MICROSOFT INTERNET EXPLORER® stops the program when the user leaves the page and NETSCAPE NAVIGATOR® may allow the applet to continue to execute after the user leaves the page.
  • The present invention may react in accordance with these differences and other differences between various web browsers.
  • Referring to FIG. 4, another schematic of one embodiment of the present invention is illustrated. This schematic depicts when the server passes status messages to a client during a search or computation.
  • FIG. 4 depicts how one embodiment of the present invention informs a user of the progress of the present invention.
  • The client requests that a program be run on the server. This request comes from a hyperlink, from within a web page, or from an applet running in the current page.
  • The client passes to the server the name of a server program to run and various parameters, such as arguments and values, at 202.
  • The web server then starts the program and saves the arguments and the program name to a state file at 203. Then at 204, the server sends the applet and the name of the state file with the arguments to the client in HTML format. At this point, the server program stops.
  • The client loads the new HTML web page and starts the applet.
  • Two threads of execution 206 and 213 are set up.
  • The applet finishes when both threads stop.
  • The first thread 206 starts a timer at 207.
  • The timer is useful for showing the user that the programs are still running and for showing the elapsed time the program has taken to process.
  • The system determines whether the second thread has indicated that the program is finished. If not, the timer is incremented by 0.1 seconds and this is indicated on the display at 209. The display is then redrawn.
  • The display may show a moving icon and the elapsed program time.
  • The thread then sleeps for 0.1 seconds until the next determination at 208.
  • An example screen shot of such a display is illustrated in FIG. 17.
  • The second thread starts and opens a connection to the web server at 215.
  • The thread sends a request to the server to run the server program at 223 and the thread passes to the server the filename of the "state" file, where the state file stores the parameters and the state of the process.
  • The thread continues receiving messages from the server program until the connection closes.
  • The thread determines if a redirect message has been received at 217.
  • The redirect message indicates that the server program is about to finish and that all results have been written out to an HTML file. This message includes the name of the HTML file.
  • The thread loads the HTML file into the browser at 218.
  • The thread determines if any other message or messages from the server have been received. If so, a message is displayed in the applet at 220. Such a message may indicate the progress of the server program, for example stating that the server is "Setting up BLAST™ search" or "Added new compound to database" or the like.
  • The thread determines if the server connection has closed at 221. If so, the server program is finished at 222 and thread 1 is notified as such at 211. Thread 2 then terminates at 212. If the server connection has not closed, the loop waits for incoming messages at 216. At 223, the server program is started once again. However, when the server program starts this time, it has the state filename. At 224, the state file is read by the server and loaded. The state file is checked to see if it indicates that this program has already been run before at 225. If it has, then all previous messages (that are in the state file) are sent to the client at 226 along with an additional message that informs the client that this program has already been run. If, however, the user wants to rerun the program, the user can just reload the page. After sending this message to the client, the server program stops at 238.
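The two applet threads described above, a timer thread and a message-reading thread, can be sketched in Python (the text describes a JAVA® applet). The message list stands in for the server connection, the function name is hypothetical, and the tick is shortened from the 0.1 seconds in the text so the example finishes quickly.

```python
import threading
import time


def run_progress_applet(messages, tick=0.01):
    """Run a timer thread and a message-reader thread until the reader ends."""
    finished = threading.Event()
    shown, ticks = [], []

    def timer():                       # first thread: elapsed-time display
        while not finished.is_set():
            ticks.append(len(ticks) * tick)
            time.sleep(tick)

    def reader():                      # second thread: server messages
        for message in messages:       # stands in for reading the connection
            shown.append(message)
            time.sleep(tick)
        finished.set()                 # connection closed: notify the timer

    threads = [threading.Thread(target=timer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                       # the applet finishes when both stop
    return shown, len(ticks)


shown, tick_count = run_progress_applet(["Setting up search", "Search finished"])
print(shown, tick_count)
```

The shared event plays the role of the notification from the second thread to the first; the number of timer ticks depends on scheduling and is not fixed.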
  • The main search program 238 begins.
  • The main search program is not shown in FIG. 4; however, it may involve the same process as the current process.
  • The main search program is monitored for messages or for termination at 229.
  • The system checks for such a message or termination. If such a message is received, the message is sent to the client at 231 and the state file is updated with the message at 232.
  • The web server may automatically send the message to the client; however, the web server output should be unbuffered for this to occur.
  • The system determines if the program has finished. If not, the system loops back to 229 monitoring for messages. If the program has finished, at 234, the state file is updated to show that this program has now been run.
  • The system determines if there are any results to report at 235. If yes, the results are formatted as an HTML file and saved at 236. Additionally, other data may be saved at this point, as stated above.
  • A redirect request is sent to the client at 237.
  • The redirect request may include the HTML filename.
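The state-file handling on the server side can be sketched in Python as follows. The JSON layout, the field names, and the `demo_search` generator are assumptions; the text only says that the state file holds the program name, its arguments, the messages sent so far, and whether the program has already run.

```python
import json
import os
import tempfile


def run_with_state(state_path, search):
    """Replay saved messages if already run; otherwise run and record them."""
    with open(state_path) as fh:
        state = json.load(fh)
    if state.get("has_run"):
        return state["messages"] + ["This program has already been run"]
    sent = []
    for message in search(state["args"]):     # each message goes to the client
        sent.append(message)
        state["messages"] = sent
        with open(state_path, "w") as fh:     # keep the state file current
            json.dump(state, fh)
    state["has_run"] = True
    with open(state_path, "w") as fh:
        json.dump(state, fh)
    return sent


def demo_search(args):                        # hypothetical main search program
    yield f"Searching for {args['query']}"
    yield "Search complete"


path = os.path.join(tempfile.mkdtemp(), "state.json")
with open(path, "w") as fh:
    json.dump({"program": "demo_search", "args": {"query": "phenol"},
               "messages": [], "has_run": False}, fh)
first = run_with_state(path, demo_search)
second = run_with_state(path, demo_search)
print(first)
print(second)
```

Because the messages are written to the state file as they are sent, a second request for the same state file replays them without repeating the search.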
  • The system of the present invention is capable of interfacing and handling tabular data. Such data may include information on molecules, plates, assays, or the like. To do this, server and client software objects and methods for handling tabular data have been developed. Additionally, a suite of routines for generation and display of tabular data has also been developed. Such tabular data includes data and meta-data (for example, the preferred display methods and the table title). Formats for files can also be exchanged between different server and client programs/tools.
  • For example, tabular data may be saved in a simple file format that may be exchanged between different server and client programs.
  • This file format may be read by both client applets and server CGI programs.
  • A server program may read a table data file and display that table in HTML, or a client JAVA® applet may read the same table and display it dynamically.
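The text does not define the table file format. The Python sketch below assumes one possible layout, with meta-data lines of the form `#key=value` followed by tab-separated rows, to show how a single file could carry both the data and meta-data such as the table title. The function names are hypothetical.

```python
import csv
import io


def write_table(title, columns, rows):
    """Serialize a table: '#key=value' meta-data lines, then tab-separated data."""
    out = io.StringIO()
    out.write(f"#title={title}\n")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return out.getvalue()


def read_table(text):
    """Parse the format written by write_table into (meta, columns, rows)."""
    lines = text.splitlines()
    meta = dict(l[1:].split("=", 1) for l in lines if l.startswith("#"))
    data = [l for l in lines if not l.startswith("#")]
    parsed = list(csv.reader(data, delimiter="\t"))
    return meta, parsed[0], parsed[1:]


text = write_table("Primary assay", ["molecule_id", "percent_response"],
                   [[101, 85.0], [102, 12.5]])
meta, columns, rows = read_table(text)
print(meta, columns, rows)
```

Values come back as strings in this sketch; a reader that needs numbers would convert them per column, for example using type hints stored in the meta-data.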
  • Applets that can read table files include, for example: a table viewer, a scatter plot viewer, a histogram viewer, a barchart viewer, and the like.
  • Cell information may be linked to other database pages. For example, if a column in a table contains structural data, the software can link the user to structure data sheets. Additionally, users can delete and edit columns.
  • Users may add related data from other databases or the database management system, or add screening data to build a selectivity profile.
  • Users can interact with data using tools like interactive JAVA® scatter-plots, histograms, curves, bar-charts for interpretation of numerical data in tables, or the like.
  • Data may be viewed with a table viewer, scatterplots, histograms, or in spreadsheet formats. Additionally, the system of the present invention may use other graphics applets that display graphics, such as creating a barchart or creating a histogram profile of a screen. Alternatively, colored grid-plate viewers may also be created.
  • the system ofthe present invention may be carried out using a computer, and specifically, the processing units of such computers.
  • an interface to the above databases that store the biomolecular data receives an instruction to search for a particular piece of biomolecular data.
  • This may come in the form of a computer instruction that directs the processing unit to receive a request for access to certain biomolecular data stored in one of the plurality of databases.
  • a second computer instruction determines which of the plurality of databases stores the biomolecular data.
  • a third computer instruction accesses the biomolecular data in one of the plurality of databases.
  • a fourth computer instruction receives the biomolecular data from one of the plurality of databases.
  • the processing unit displays the biomolecular data received from the one of the plurality of databases.
  • another embodiment of the present invention includes a computer system for electronically retrieving biomolecular data from a plurality of databases over a system of networked computers, where the computer system includes a central processing unit (CPU) or units and random access memory (RAM) coupled to said CPU, for use in compiling a target program to run on a target computer architecture.
  • the present invention may further include a client computer and the database management system. These systems may be connected by electronic connections such as hardwire or wireless connections.
  • the database management system is electronically connected to a plurality of databases that include biomolecular data.
  • the client computer receives a request for biomolecular data in a desired format from a user and sends the request to the database management system.
  • the database management system accesses the biomolecular data from the plurality of databases.
  • the database management system then generates as output a web page that is sent to the client computer over the electronic connection.
  • the web page includes the biomolecular data in the desired format.
  • FIG. 5 illustrates a series of menu options available in one embodiment of the system of the present invention. These menu options are shown purely as an example of available menu options and are not intended to limit the present invention in any way. Further, additional menu options may be added as necessary to the system of the present invention.
  • the present invention may display a "Main" page 310, which provides the option of entering either a "Gene Distribution Home Page" 311 or a "Screening Home Page" 329. These menu choices may be displayed using an HTML web page or an applet. Depending on which menu choice the user selects, the user may navigate through various menu choice options to obtain the desired information.
  • the "Gene Distribution Home Page" provides the user with, for example, genomic and bioinformatic data.
  • the user may be directed to four other menu choices: "Main" 312, "Search Sequence" 313, "Search Dot Blot" 314 or "Update" 315. The user may then select which menu option he or she desires. The "Main" 312 selection allows the user to select "Gene Distribution Home Page" 311 or "Screening Home Page" 329. These options return the user to the beginning of the menu options.
  • the user may print a hydrophobicity plot of an amino acid sequence.
  • the user may, for example, choose one of five options: “Browse dot blots" 322; “Search by organ and disease state” 323;
  • a user may select a dot blot and print it, view the plate map, or show a bar graph. The dot blot is displayed with its ID, the gene, and the date on which the dot blot was added.
  • a user may instead select to "Search by organ and disease state" 323, in which a list of organs and possible diseases may be selected. The user may choose to view the results in a scatterplot and/or barchart.
  • If a user selects "Search by gene and disease state" 324, the user may select a gene and a corresponding disease state. The user may then review the results. Alternatively, the user may "Search by gene and organ" 325, in which the user may select a gene and an organ, and view the results.
  • Under "Find interesting dot blots" 326, for example, the user may enter various parameters, such as to search for dot blots containing organs whose relative abundance is more than a desired number of standard deviations larger than the mean for the dot blot. The desired number of standard deviations may be entered by the user. A table of results is then displayed.
  • the "Screening Home Page" 329 allows a user to obtain analysis of screening and molecular information. This page allows a user to select from one of six menus, for example: "Main" 330; "Molecular Info" 331; "Assay Info" 332; "Lab Info" 333; "Chemistry" 334; and "Other Pages" 335.
  • the "Main" 330 selection allows the user to select "Gene Distribution Home Page" 311 or "Screening Home Page" 329, similarly to the "Main" 312 under the "Gene Distribution Home Page" 311. Again, this returns the user to the beginning of the menu choice options.
  • the user may select one of four options, for example, comprising: “Select molecule from database” 336; “Select substance from database” 337;
  • the option to "Select molecule from database” 336 allows a user to enter a molecule ID and retrieve a list of molecules based on the ID. If the user selects "Select substance from database” 337, the user may enter a substance ID and a substance name and retrieve a list of substances.
  • Under "Perform chemical structure searches" 338, the user may select a database, a type of search, a cutoff (for two-dimensional similarity searches), and the maximum number of hits. The user then may, using for example CHEMDRAW®, draw a chemical structure. This chemical structure is searched in chemical structure databases, for example UNITY®. The results show the matches of the chemical structure. For example, the results of a sample UNITY® search are shown in FIG. 20.
  • the user may enter a plate ID or plate name and retrieve information on plates matching the entered criteria. Information on the plates that may be shown includes the plate ID, plate name, plate type, chemical structures and any comment that was added. Thus, the user may view an entire plate at one time to compare different wells.
  • the user may choose one of six additional options, for example: "Browse assay results" 340, "Search IC50 data" 341, "Search reconfirmations" 342, "Search primary screen" 343, "Compare results for two assays" 344, and "List animal experiments" 345.
  • the option to "Browse assay results" 340 allows a user to browse assay results by various parameters, such as assay method type.
  • the user may search IC50 data using various parameters.
  • the user may search reconfirmation information using various parameters.
  • If the option "Search primary screen" 343 is selected, the user may search a primary screen using various parameters. Under "Compare results for two assays" 344, a user may plot one assay against another assay.
  • the option "List animal experiments" 345 allows a user to select an animal experiment.
  • a user may choose one of five options, for example: "Add data captured from lab devices to database" 346, "View output from lab devices" 347, "Screening tools" 348, "Generate reconfirmation plate maps" 349, and "Add animal data" 350.
  • Under "Add data captured from lab devices to database" 346, the user may take data captured from a lab instrument and add it to the system. Under the option "View output from lab devices" 347, the user may view data from lab devices. "Screening tools" 348 allows a user to select a screening tool from a list.
  • By selecting "Generate reconfirmation plate maps" 349, the user may generate plate maps from reconfirmation data. "Add animal data" 350 allows the user to add animal data to the system.
  • By selecting the "Chemistry" 334 option, the user may select one of three other options, for example: "Add compound to database" 351; "Edit substance" 352; or "Create new plate" 353.
  • “Add compound to database” 351 permits the user to add a compound to the system, using a plugin such as CHEMDRAW®.
  • the option “Edit substance” 352 allows a user to edit an existing substance in the system and "Create new plate” 353 creates a blank new plate to add data.
  • the "Other Pages” 335 screen allows the user to perform various other functions. For example, the user may select “IC50 Calculator” 354; “Overview of database” 355; or other system administrator tools.
  • the option “IC50 Calculator” 354 allows a user to build an IC50 curve using data points input by a user.
  • Under "Overview of database" 355, a user may obtain sample pages that depict various features of the system. Additionally, other system administration tools may be placed under this menu option.
  • the system may confirm the access level of a user before granting access to certain menu options. Alternatively, the system may not even display menu options to a user that does not possess the proper access level.
  • the access level of a user may be set when the user's account is created. Referring now to FIG. 6, a flowchart 400 of an overview of one embodiment of the present invention is shown.
  • a request for biomolecular data is received.
  • the system determines at least one of the databases that contains the biomolecular data at 402.
  • the system generates database instructions for accessing data in the at least one database.
  • the database instructions are transmitted to the at least one database at 404.
  • Biomolecular data from the at least one database is received at 405.
  • display data is generated for the biomolecular data and at 407 the display data is transmitted.
  • FIG. 7 details more information on what occurs at step 406.
  • a request is received from a user, preferably on the client machine.
  • the program and the program arguments are read.
  • the program is then stored in memory at 503.
  • FIG. 8 details more information on what occurs at step 401. For example, at 601 results from at least one database are received. Then at 605, results of the program are stored and at 603 a display including the results is generated.
  • PROGRAM | DESCRIPTION OF PROGRAM
    java-image | Java image capture
    sql-query | Interface to allow the user to execute their own SQL queries (also may be used by JAVA® clients)
  • PROGRAM | DESCRIPTION OF PROGRAM
    get-assays | Retrieves all assay data for the compounds on the plate
    get-plate | Interface for getting a plate data sheet
  • PROGRAM | DESCRIPTION OF PROGRAM
    unity-search/hitlist | Displays results from a UNITY hitlist
    unity-search/sketch | Interface to query sketchers
    unity-search/submit-unity-sim-search | Interface to chemical similarity searches
    unity-search/unity-results | Does UNITY search and displays results
  • various programs can be used in the present invention on the client computer and are enabled by the applet, as shown in Tables 21-31. These programs include applets that allow the incorporation of dynamic content into web pages. These programs, organized by the type of program and data gathered along with a short description of what the programs accomplish, may include, for example:
  • FIGS. 9 - 23 show images from a preferred embodiment of the present invention. These images are the copyright of Arena Pharmaceuticals, Inc., and are subject to the restrictions listed above.
  • In FIG. 9, a view of an introduction screen as output from one embodiment of the present invention is shown. This screen allows a user to navigate through the various features of the present invention.
  • In FIG. 10, a visualization tool for runset counts as output from one embodiment of the present invention is shown. Information on the plate, as well as various colors to demonstrate different wells, is shown.
  • In FIG. 11, a hydrophobicity sequence analysis, with two applet output windows, as output from one embodiment of the present invention is shown. The two applet windows show various graphs that may be output to visualize the sequence analysis.
  • the histogram is a plot of the various data points.
  • Various qualities about the histogram, such as the maximum and minimum points are also illustrated.
  • In FIG. 13, a view of a molecule data sheet as output from one embodiment of the present invention is illustrated.
  • a molecule may be searched and looked up using the present invention.
  • Information on the molecule such as its structure, its ID, SLN, molecular weight, clogP, and comments, can easily be seen by the output of the present invention.
  • In FIG. 14, a primary screening runset as output in two windows from an embodiment of the present invention is shown.
  • This shows data from a runset.
  • the number of the plates, along with information obtained about each plate, is also illustrated. For example, the information is grouped by plates.
  • This figure is shown in two windows because the output may be too voluminous to be displayed on one screen and thus a user has to scroll down to see the rest of the data.
  • In FIG. 15, a primary screening plate as output in two windows from an embodiment of the present invention is shown.
  • the results show information such as the runset, plate ID, assay, target, assay method, plate map, secondary assay, runset comments, and the date.
  • In FIG. 16, animal data output for a motor function/dysfunction experiment as output from one embodiment of the present invention is shown. This data shows useful information about a local motor experiment.
  • In FIG. 17, an interface while a program is running as output from one embodiment of the present invention is shown.
  • the present invention outputs update information to the user such as "setting up search.” Additionally a timer indicates how long the program has taken to execute.
  • In FIG. 18, an IC50 form for searching results from IC50 searches as output in two windows from an embodiment of the present invention is shown. This allows a user to enter various search parameters and receive search results.
  • the screen on the left depicts the entry form for entering search parameters and the screen on the right depicts the results that are output.
  • In FIG. 19, a scatter plot comparison, with axes inputs, as output from one embodiment of the present invention is shown.
  • the scatterplot shows the relationship between assays. Additionally, two input windows are also shown, in which the X and Y axes may be selected.
  • In FIG. 20, a query for searching database results as output in two windows from an embodiment of the present invention is shown.
  • a structure may be entered, along with the databases to search, the search type, the cutoff, and a maximum number of hits.
  • the present invention outputs the hits. In this example, there were 2107 hits.
  • the output is displayed such that a user may scroll down the various hits and select one of the hits if desired.
  • the window on the left displays the input window while the window on the right displays the output window.
  • In FIG. 21, a form for entering assay data as output from one embodiment of the present invention is shown.
  • the form includes various inputs, such as plate number, comments, and entries for the various substance numbers.
  • In FIG. 22, dot blot experimental details as output in two windows from an embodiment of the present invention are shown.
  • the details show an image of the dot blot along with various points of information for the dot blot experiment. This figure is shown in two windows because the output may be too voluminous to be displayed on one screen and thus a user has to scroll down to see the rest of the data.
  • In FIG. 23, an IC50 plate assay data sheet as output in two windows from an embodiment of the present invention is shown.
  • This displays various information about the assay plate, for example, the runset, assay method, and/or a control summary.
  • This figure is shown in two windows because the output may be too voluminous to be displayed on one screen and thus a user has to scroll down to see the rest of the data.
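The "Find interesting dot blots" search 326 described in the list above selects organs whose relative abundance exceeds the mean for the dot blot by a user-entered number of standard deviations. A minimal sketch of that filter follows. The function name, the dictionary layout, and the sample values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def find_interesting_organs(abundances, n_std):
    """Return (organ, abundance) pairs above mean + n_std * standard deviation.

    `abundances` maps organ name -> relative abundance for one dot blot.
    This in-memory dictionary is a hypothetical stand-in for a database result.
    """
    values = list(abundances.values())
    if len(values) < 2:
        return []  # a sample standard deviation needs at least two points
    threshold = mean(values) + n_std * stdev(values)  # assumption: sample stdev
    hits = [(organ, value) for organ, value in abundances.items() if value > threshold]
    # Strongest signals first, as a results table would likely list them
    return sorted(hits, key=lambda row: row[1], reverse=True)


# Illustrative values only; not data from the patent
dot_blot = {"brain": 9.0, "heart": 1.0, "liver": 1.2, "lung": 0.8, "kidney": 1.0}
print(find_interesting_organs(dot_blot, 1.5))
```

With few organs on a blot, a single outlier also inflates the standard deviation, so a large cutoff may return no rows at all; a robust statistic could be substituted if that matters.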

Abstract

The present invention includes a database management system that interfaces with a plurality of databases storing biomolecular data. A request to access biomolecular data stored in at least one of a plurality of databases is received by the database management system from a user's computer. The database management system determines which of the plurality of databases store the biomolecular data. Instructions to access the biomolecular data in the at least one of the plurality of databases are generated and the information is accessed. The biomolecular data is received from the at least one of the plurality of databases. Then a web page display of the biomolecular data received from the at least one of the plurality of databases is generated and sent to the user's computer.

Description

UNIVERSAL BIOMOLECULAR DATA SYSTEM
RELATED APPLICATIONS
This application claims priority from U.S. Provisional Application Serial No. 60/193,065, filed March 29, 2000, which is incorporated by reference in its entirety, and U.S. Patent Application No. 09/635,833, filed August 9, 2000, which is incorporated by reference in its entirety.
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner does not object to facsimile of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent files or records following issuance, but otherwise reserves all copyright rights whatsoever.
FIELD OF THE INVENTION
The field of the invention is database management systems. In particular, the field of the invention includes a database management system for storing, using, maintaining and retrieving large amounts of biomolecular data, including chemical, biological, genomic, testing, and other related data and information.
BACKGROUND
The present invention relates generally to a database management system (or "DBMS") for storing, using, maintaining and retrieving large amounts of biomolecular data, including chemical, biological, genomic, and other information. More particularly, the invention relates to database management systems for providing fast access to information relating to large collections of biomolecular data, including, for example, data on organic compounds and nucleotide sequences (for example gene sequences), structural and chemical information, in vivo and in vitro data, protein sequences, assays, and other screening information. The present invention utilizes, as appropriate for a desired system, multiple databases, including relational databases, chemical structure databases, and related persistent storage mechanisms, sequence databases, database-searching mechanisms, dedicated laboratory instruments, server computers, and internal and external software including Internet access software.
Computer systems in general are known. A typical system comprises a computer, keyboard, mouse, and a monitor. Additionally, the computer comprises a single or multiple central processing unit(s) ("CPU") and random access memory ("RAM") and allows various software programs to be used. Further, the computer might comprise a modem, an Ethernet card or other similar device for connecting to a system of networked computers, such as the Internet.
The Internet provides a useful technique for making information available to a variety of individuals each of whom may be located at a variety of different locations. Indeed, within the vast Internet environment, individuals can access information tools from remote locations. The Internet, which originally came about in the late 1960s, is a computer network made up of many smaller networks spanning the entire globe. The host computers or networks of computers on the Internet allow public or private access to databases containing information in numerous areas of expertise. Hosts can be sponsored by a wide range of entities including, for example, universities, government organizations, commercial enterprises and individuals.
Internet information is made available to the public through servers running on an Internet host. The servers make documents or other files available to those accessing the host site. Such files can be stored in databases and on storage media such as, for example, optical or magnetic storage devices, preferably local to the host.
Networking protocols can be used to facilitate communications between the host and a requesting client. TCP/IP ("Transmission Control Protocol/Internet Protocol") is one such networking protocol. Computers on a TCP/IP network utilize unique identification ("ID") codes, allowing each computer or host on the Internet to be uniquely identified. Such codes can include an IP ("Internet Protocol") number or address, and corresponding network and computer names. Created in 1991, the World-Wide Web ("Web" or "www") provides access to information on the Internet, allowing a user to navigate Internet resources intuitively, without IP addresses or other specialized knowledge. The Web comprises hundreds of thousands of interconnected "web pages", or documents, which can be displayed on a user's computer monitor. These web pages are provided by hosts running special servers. Software that runs these web servers is relatively simple and is available on a wide range of computer platforms including PC's ("personal computers"). Equally available is web browser software, used to display web pages as well as traditional non-web files on the user's system.
The Web is based on the concept of hypertext and a transfer method known as "HTTP" ("Hypertext Transfer Protocol"). HTTP is designed to run primarily over TCP/IP and uses the standard Internet setup, where a server issues the data and a client displays or processes it. One format for information transfer is to create documents using Hypertext Markup Language ("HTML"). HTML pages are made up of standard text as well as formatting codes indicating how to display the page. The browser reads these codes to display the page.
Each web page may contain pictures and sounds in addition to text. Associated with certain text, pictures or sounds are connections, known as hypertext links, to other pages within the same server or even on other computers within the Internet. For example, links may appear as underlined or highlighted words or phrases. Each link is directed to a web page by using a special name called a URL ("Uniform Resource Locator"). URLs enable the browser to go directly to the associated resource, even if it is on another web server.
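As a brief illustration of how a URL identifies a resource and how a relative hypertext link resolves against the page that contains it, the sketch below uses Python's standard `urllib.parse` module. The program names get-plate and get-assays are taken from the program tables in this document; the host name, port, `cgi-bin` path, and `plate_id` parameter are hypothetical.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

# Hypothetical URL of a plate data sheet served by a CGI program
url = "http://dbms.example.com:8080/cgi-bin/get-plate?plate_id=42"

parts = urlsplit(url)
print(parts.scheme)    # protocol used to fetch the resource
print(parts.hostname)  # server that hosts the page
print(parts.port)      # TCP port on that server
print(parts.path)      # resource (here, a program) on the server
print(parse_qs(parts.query))  # arguments passed to the program

# A relative link inside the page resolves against the page's own URL
link = urljoin(url, "get-assays?plate_id=42")
print(link)
```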
In addition to the Internet, which allows for general, public retrieval of information, other means of accessing such information exist and are commonly utilized. For example, direct modem connections between two computers, proprietary internal networks within large institutions and organizations, or the like, are equally available and useful means for accessing catalogued information stored in databases.
Chemical and pharmaceutical industries and chemically orientated government agencies typically maintain large chemical substance and/or nucleotide sequence databases. Such organizations are faced with managing increasing amounts of information relating to chemical compounds, proteins, gene sequences, and large amounts of data relating thereto. It is not uncommon for a company to have hundreds of thousands, even millions, of organic molecules, and millions of pages of information relating to the characteristics and testing of those molecules.
Furthermore, experimental data are not always recognized as being relevant or important, particularly when such data are located across millions of pages of information. For example, "negative" data from one scientific program is often, in retrospect, "positive" within the context of another program. Additionally, it is increasingly the case that companies may have data relating to thousands of nucleic acid sequences. For example, the explosion of genomic information has created an unprecedented need for users with a deep working knowledge of biological sciences, chemical sciences and computational methods. Accordingly, there exists a need for data models and database systems that allow one to store, manipulate, retrieve, use or search various biomolecular data, including sequences, structures, genetic linkages and maps, signal pathways, and the like. Biological databases also need a means to provide experimental data relating to these sequences and structures, including both positive and negative data.
Notwithstanding the above, it is understood that it is difficult to perform chemical and related searches quickly. Chemical and related searching remains relatively slow and inefficient. Therefore, there is a need for an improved database management system for storing, using, maintaining and retrieving biomolecular data.
SUMMARY OF THE INVENTION
The database management system invention disclosed and claimed herein provides users with transparent access to a set of data that otherwise would require separate software packages to access. The database management system includes, as necessary or desired, a local area network, a relational database management system, a chemical structure database, persistent storage (such as flat files and DBM ("Disk Based Hash") files), a relational database, a sequence database and/or sequence searching mechanism, one or more laboratory instruments collecting data (for example, one or more of in vitro data, in vivo data, in situ data, tissue distribution information and data, or the like), external software methods (e.g., file format translation programs), a set of two or more computers with Internet browsing software, a server computer (or a computer cluster), web server software and/or programs for collecting, publishing, editing, and searching data. For secure communications with collaborators using the present invention, a secure sockets layer ("SSL") connection to the Internet or some other encryption technology may be used.
In one embodiment of the present invention, a database management system that provides an interface to a plurality of databases storing biomolecular data comprises: a plurality of databases storing biomolecular data; a processing unit; a first computer instruction that directs the processing unit to receive a request for access to biomolecular data stored in at least one of the plurality of databases; a second computer instruction that determines which of the plurality of databases stores the biomolecular data; a third computer instruction that accesses the biomolecular data in the at least one of the plurality of databases; a fourth computer instruction that receives the biomolecular data from the at least one of the plurality of databases; and a web page that is generated by the processing unit and displays the biomolecular data received from the at least one of the plurality of databases.
"Biomolecular data" refers to any data and/or information concerning molecules that are studied, made, used, sold, offered for sale, exported or imported by an individual or entity. Such molecules include, but are not limited to, nucleic acids, proteins, lipids, carbohydrates, and the like, as well as any constituents or building blocks of such molecules, for example amino acids, nucleotides, and the like. Such molecules also include, but are not limited to, organic and inorganic compounds and mixtures of such compounds, whether in liquid, solid, gaseous, or any other form. Information concerning such molecules includes, but is not limited to, structure, formula, weight, chemical activities and characteristics, biological activities and characteristics, in vivo and in vitro activities and characteristics, kinetic activities and characteristics, physiological activities and characteristics, receptor binding activities and characteristics, and so on, including any and all laboratory, testing, experimental and other data relating to such molecules.
This definition is intended to be exemplary only. It is not intended to limit the types of data that may be manipulated by the invention to the classes or types of information described or referred to throughout the specification. Rather it is meant to include any information that a person or entity would find useful to store and quickly retrieve using the methods and apparatus described and claimed herein.
"Colored" refers to any item that can be distinguished from another item based on visible or machine readable color recognition.
A technical advantage of one embodiment of the present invention is that it provides a fast and easy system for obtaining biomolecular data, such as chemical, structural, genomic, screening and other laboratory data, and gene-expression data, over the Internet, and thus is accessible virtually from anywhere.
Another technical advantage of one embodiment of the present invention is that it provides a customizable system for gathering, analyzing, and relating data. Additionally, such information may easily and securely be shared within an organization or with other desired third parties.
An additional technical advantage of one embodiment of the present invention is that it allows real-time distribution of experimental data and allows users to analyze and search information with little to no delay.
Another technical advantage of one embodiment of the present invention is that multiple interacting databases may be used to provide useful information to a user. For example, gene sequence based information may be linked to biological assay information and chemical structure information.
An additional technical advantage of one embodiment of the present invention is that it may be used to gather and collect data from on-going experiments. Additionally, one embodiment of the present invention may be directly connected to laboratory instruments to record and track data in real time and/or update information in one or more of the plurality of databases.
Another technical advantage of one embodiment of the present invention is that users with limited computer experience may easily gather and read data. Additionally, one embodiment of the present invention outputs clearly formatted data.
Another technical advantage of one embodiment of the present invention is that it may operate over a LAN, wide-area network ("WAN"), stand-alone computer, the Internet, or any other system of networked computers.
Other aspects, embodiments, and technical advantages of the present invention are set forth in or will be apparent from drawings, claims, and the disclosure of the invention which follows, or may be learned from the practice of the invention. Such other aspects, embodiments, and technical advantages shall be deemed to be a part of the invention as if they were disclosed herein.
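The embodiment summarized above (receive a request, determine which database stores the data, access that database, receive the data, and generate a web page) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: in-memory SQLite databases stand in for the plurality of databases, and the catalog, table names, column names, and sample rows are all hypothetical.

```python
import sqlite3
from html import escape


def make_db(table, columns, rows):
    """Create an in-memory SQLite database with one table (stand-in for a real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn


# Hypothetical catalog mapping a data type to (database connection, table name)
DATABASES = {
    "molecule": (make_db("molecule", ["id", "name", "mol_weight"],
                         [(1, "compound A", 180.16)]), "molecule"),
    "assay": (make_db("assay", ["id", "molecule_id", "ic50_nm"],
                      [(10, 1, 25.0)]), "assay"),
}


def handle_request(data_type, record_id):
    """Serve one request for biomolecular data as an HTML page."""
    # First instruction: the request arrives as (data_type, record_id).
    # Second instruction: determine which database stores the data.
    if data_type not in DATABASES:
        raise KeyError(f"no database holds data of type {data_type!r}")
    conn, table = DATABASES[data_type]
    # Third instruction: generate and send the database instruction.
    # The table name comes from the trusted catalog; the ID is a bound parameter.
    cursor = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (record_id,))
    # Fourth instruction: receive the data from the database.
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    # Finally, generate the web page that displays the data.
    header = "".join(f"<th>{escape(name)}</th>" for name in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(value))}</td>" for value in row) + "</tr>"
        for row in rows
    )
    return f"<html><body><table><tr>{header}</tr>{body}</table></body></html>"


print(handle_request("molecule", 1))
```

Keeping the routing step in a single catalog means a new database can be added without changing the request handler, which is one way to give users a single interface to several data sources.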
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete explanation of the present invention, reference is now made to the following disclosure and the accompanying drawings, in which:
FIG. 1 illustrates an overview of a system contemplated by one embodiment of the present invention;
FIG. 2 illustrates an overview of a computer system of one embodiment of the present invention;
FIG. 3 illustrates a schematic overview of one embodiment of the system of the present invention;
FIG. 4 illustrates another schematic overview of one embodiment of the system of the present invention;
FIG. 5 illustrates a series of menu options available in one embodiment of the system of the present invention;
FIG. 6 illustrates a flowchart of an overview of one embodiment of the present invention;
FIG. 7 illustrates a flowchart of a specific aspect of the flowchart of FIG. 6;
FIG. 8 illustrates a flowchart of a specific aspect of the flowchart of FIG. 6;
FIG. 9 illustrates an introduction screen as output from one embodiment of the present invention;
FIG. 10 illustrates a visualization tool for runset counts as output from one embodiment of the present invention;
FIG. 11 illustrates a hydrophobicity sequence analysis, with two applet output windows, as output from one embodiment of the present invention;
FIG. 12 illustrates a histogram of frequency counts for a runset, as output from one embodiment of the present invention;
FIG. 13 illustrates a molecular datasheet as output from one embodiment of the present invention;
FIG. 14 illustrates a primary screening runset as output in two windows from an embodiment of the present invention;
FIG. 15 illustrates a primary screening plate as output in two windows from an embodiment of the present invention;
FIG. 16 illustrates animal data output for a motor function/dysfunction experiment as output from one embodiment of the present invention;
FIG. 17 illustrates an interface while a program is running as output from one embodiment of the present invention;
FIG. 18 illustrates an IC50 form for searching results from IC50 searches as output in two windows from an embodiment of the present invention;
FIG. 19 illustrates a scatter plot comparison, with axes inputs, as output from one embodiment of the present invention;
FIG. 20 illustrates a query for searching database results as output in two windows from an embodiment of the present invention;
FIG. 21 illustrates a form for entering assay data as output from one embodiment of the present invention;
FIG. 22 illustrates dot blot experimental details as output in two windows from an embodiment of the present invention; and
FIG. 23 illustrates an IC50 plate assay data sheet as output in two windows from an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention comprises a system for maintaining, searching and using a system of databases and for storing and retrieving large amounts of biomolecular and other data, including chemical, biological, genomic, and other information. Moreover, the present invention provides fast access to information relating to large collections of organic compounds, proteins, and nucleotide sequences, including structural information, in vivo and in vitro data, gene sequences, protein sequences, and other laboratory data including but not limited to assay and screening information. Biomolecular data comprises the various data types mentioned herein, including, but not limited to, chemical, biological, genomic, organic, structural, and the like.
In the system of the present invention, a client computer may be used to make a request for data to a server. The client computer passes to the server computer the name of a server program to run along with a set of arguments and values. In one embodiment, this request is encoded in a client web page via a link or form so that the user may access the program by using a computer mouse. In other embodiments, for example, this request may originate from a JAVA® (Sun Microsystems, Inc.) program and may request that the browser load a new web page, or the request may also be in response to action through a computer mouse by single or multiple clicks. The request is passed to the server computer and on to the server program using the HTTP web protocol and executed on the server using, for example, the Common Gateway Interface ("CGI") mechanism. Alternatively, this may be accomplished using other server-side mechanisms, such as PHP, PHP Hypertext Preprocessor (PHP Development Team), or an ISAPI, Internet Server Application Program Interface (Microsoft Corporation), application such as ASP, Active Server Pages (Microsoft Corporation). Standard web server programs such as the APACHE™ (Apache Software Foundation) web server or MICROSOFT'S INTERNET INFORMATION SERVER® (Microsoft Corporation) can be configured to handle these requests. The server program executes and creates a new web page as output, for example in HTML, the Extensible Markup Language ("XML") format, or any other web page format, which replaces the current web page in the client browser.
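The request described above, in which the client passes the name of a server program together with a set of arguments and values, can be sketched as an ordinary URL with a query string. This is only an illustrative sketch: the host name, the `/cgi-bin/` path, the program name `molecule_sheet.pl`, and the argument names are all hypothetical and do not come from the disclosure.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_request_url(base_url, program, arguments):
    """Return a URL that names a server program and carries its arguments."""
    query = urlencode(arguments)
    return f"{base_url}/cgi-bin/{program}?{query}"


# Hypothetical program name and arguments, for illustration only.
url = build_request_url(
    "http://server.example",
    "molecule_sheet.pl",
    {"molecule_id": "12345", "format": "html"},
)

# On the server side, the same arguments can be recovered from the query string.
parsed = parse_qs(urlsplit(url).query)
```

A link or form in the client web page would carry the same information; the form fields simply become the query-string arguments.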
Alternatively, for example, the client request may originate from an applet, such as a JAVA® program, that runs in the web page. Applets are programs that can run inside web pages. For example, the applet may obtain its data points from the server program and display a scatter-plot based on that data. The applet is also able to communicate with the CGI program through web server software. The CGI program, for example, may then use a data format that the applet can readily interpret. Throughout this description of the present invention, anywhere that is described as using JAVA® may alternatively use any applet. The software used in the present invention may be written in any programming language.
Some languages that are useful to use include, for example: C, the Practical Extraction and Report Language ("PERL") (with its modules, for example: CGI (Lincoln D. Stein), DBI (Tim Bunce), LWP (Gisle Aas, Martijn Koster), NET, Win32, GD (Lincoln D. Stein), DB_File (Paul Marquess), PTK (Nick Ing-Simmons), APACHE, and XS), and COBOL. Some tools that are useful to use in the present invention include, for example: UNITY® (TRIPOS Associates, Inc.), public domain software such as APACHE™, NCBI BLAST™ (National Center for Biotechnology Information), CLUSTAL™ (developed by Toby Gibson, Des Higgins, and Julie Thompson), PHYLIP™ (University of Washington), IMAGEMAGICK™ (Imagemagick Studios), GNUPLOT™ (Thomas Williams and Colin Kelley), and the like. Some output formats for visualization of the data that may be used in the present invention include: histograms, scatterplots, barcharts, tables, assay browsers, menus, curves, plate maps, and the like. While the present invention is generally referenced as being used on the Internet, the present invention will equally work on any system of networked computers, or on one standalone computer. Additionally, the client computer may be any workstation, personal computer, server computer, handheld computer, laptop computer, mobile or wireless computing device, or alternatively any other computing device, for example a client in a client/server environment. Alternatively, the client may be a user machine that performs computer processing, while the server acts as a remote storage medium for the data in the databases. For example, when an applet is running, the processing may be done by the client computer. Additionally, when running an applet, the client computer may interact with the databases or the DBMS.
Additionally, while the present invention is generally referenced as utilizing chemical, biological, and genomic information, the present invention may use any type of information that is suitable for use on a database system.
With reference to FIG. 1, an overview of an embodiment of the present invention 10 is illustrated. A local area network ("LAN") or intranet 11 may be used to carry out the present invention. For example, computers 12 and 14 may be connected to LAN 11. Computers 12 and 14 may connect to web server software 26 through LAN 11. Web server software 26 may connect to a database management system 5 ("DBMS"). The DBMS 5 may be on a server computer or on a plurality of computers (such as a tier of computers), wherein the server is comprised of a plurality of computers. Alternatively, computers 12 and 14 may connect directly to DBMS 5. Alternatively still, computers 12 and 14 may be client computers. The client computers 12 and 14 interact with the server computer housing the DBMS 5 or with other computers on the LAN. In the present invention, as many client computers may be used as necessary. Thus, there may be a client computer for each user that desires access to the present invention.
An example of a standard computer system is illustrated in FIG. 2. FIG. 2 illustrates a portion of a computer, including a CPU and conventional memory in which one embodiment of the present invention may be embodied. The environment in which the present invention operates encompasses a general distributed computing system, wherein general purpose computers, workstations, or personal computers are connected via communication links of various types to the Internet. Some of the elements of a general purpose workstation computer 100 are illustrated in FIG. 2, wherein a processor 101 is illustrated, having an input/output ("I/O") section 102, a central processing unit ("CPU") 103 and a memory section 104. The I/O section 102 may be connected to a keyboard 105, a display unit 106, a disk storage unit 109 and a CD-ROM drive unit 107. The CD-ROM unit 107 can read a CD-ROM medium 108, which typically contains programs and data 110. This computer 100 may be connected to the Internet or the Web, via a modem, Ethernet connection, or other communications link. Computers described herein may use a computer system as described above or computer systems similar to this computer system.
Referring again to FIG. 1, DBMS 5 may use web server software 26. Computers 12 and 14 may then use Internet browsing software, such as NETSCAPE NAVIGATOR® (Netscape Communications Corporation) or MICROSOFT INTERNET EXPLORER® (Microsoft Corporation). Further, if on the Web through Internet 36, remote client computers 38 that are outside of LAN 11 may connect to the system of the present invention. Additionally, to protect access to the present invention from unauthorized individuals on the Web, a firewall may be erected, passwords may be required, or data encryption may also be used. Database management system 5 oversees organization of the data. Database management system 5 may be the computer system that oversees management, integration and operation of the various databases. Database management system 5 may be housed on a computer or a plurality of computers, such as the server.
The present invention may use any database that contains information useful to the user, including, but not limited to, biomolecular data. For example, some databases that may be used in the present invention include databases that contain information on compound collections, organic compounds, proteins, nucleotide sequences (including structural information), structural descriptors (such as TRIPOS® SLNS or SMILES® (Daylight Chemical Information Systems)), in vivo or in vitro data, gene sequences, protein sequences, a tissue screen, structural information, sequence information, data-mining results, assay information, and the like. Data-mining results may come from searches (such as for primary screens, reconfirmations, or IC50s), comparisons of assays, track screening, structure-based searches (such as for substructure, similarity, or clustering), or a combination of structure/assay searches, for example.
Additionally, the results output from the present invention may provide users with various information in various formats. For example, these results may allow a user to: view data from laboratory devices; add data from counters to a database; compare assay results; obtain statistics on assays; edit plates, runsets, or wells out of an assay; edit chemical structures; add compounds; compare results from various searches; browse dot blots; search dot blots; browse genes; search genes; search sequences; retrieve literature for sequences or other information; and the like.
Moreover, if viewing a runset, a runset profile may be generated. The runset profile may include a grid of colored wells wherein each well contains a representation of a number of data points from sequential plates in the runset. The user may control the number of data points. Additionally, the user may scroll through the visualization of the runset.
When a user makes a request, DBMS 5 determines in which database or databases the particular data sought by the user is located. This may be accomplished, for example, by sequentially examining each database in the system or by examining specific databases in the system, depending on the request. For example, if a chemical structure is desired, the system may look at chemical structure databases, such as the UNITY® (TRIPOS Associates, Inc.) database and other chemical structure databases.
Database management system 5 is connected to the various databases used in the present invention. For example, database software management system 5 may be connected to a relational database management system ("RDBMS"). A relational database is a set of tables containing data fit into predefined categories, where each table contains one or more data categories. A RDBMS may use structured query language ("SQL"). Examples of RDBMS are: an ORACLE® database (Oracle Corporation), an IBM DB2® (International Business Machines), or MICROSOFT SQL SERVER® (Microsoft Corporation).
Alternatively, DBMS 5 may be connected to: a persistent storage medium, such as flat files 22; disk based hash ("DBM") files, such as Berkeley DB™ (Sleepycat Software); a sequence database; a sequence searching database 20, such as NCBI BLAST™; a sequence searching mechanism; a chemical structure database 18; a chemical structure searching mechanism, such as UNITY®; a screening database 16, such as an RDBMS storing screening information (for example using ORACLE® to store screening information); gene expression sequence databases (such as one using ORACLE®); and the like. Additionally, one or more instruments 30, 32, and 34 for collecting data (for example, instruments that collect in vitro data, in vivo data, in situ data, tissue distribution, and the like) may also be connected to DBMS 5. Instruments 30, 32, and 34 are preferably laboratory instruments; however, they may be any instrument that is useful to connect to DBMS 5 and gather data. For example, the laboratory instruments used in the present invention may include but are not limited to: thermometers, electrochemical readout devices, refractive index devices, biosensor readout devices, chemiluminescent readout devices, plot counters, and plate readers such as beta scintillation counters, fluorescent plate readers, fluorescent polarization plate readers, colorimetric plate readers, ultraviolet detector readers and data derived from animal studies. Further, there may be as many instruments used in the present invention as necessary to collect the desired data.
Additionally, a software data source 28, that gathers information from other programs, such as external software methods (for example, file format translation programs) and/or computer programs for collecting, publishing, editing, and searching data, may also be connected to DBMS 5. DBMS 5 and the various databases may be on one computer and communicate with the various databases within the one computer. Alternatively, the databases and other systems may be on various computers and DBMS 5 may communicate with these various systems by using electronic connections, standard communications systems, secure communications, wireless communications, encrypted communications, or the like.
Referring now to FIG. 3, a schematic drawing of another embodiment of the present invention is illustrated. The system begins with a user, via the client computer, requesting data via a web server program, using the appropriate parameters and security access as necessary, at 140. The user may be required to enter a login and/or a password in order to use the system. Alternatively, the system may utilize other forms of user authorization, such as biometric information (for example, a fingerprint, voice recognition, or retinal), knowledge based identifying information (for example, a mother's maiden name), or the like. Once the user has entered a login and/or a password, the system may identify the user and associate various parameters with the user, such as an access level, the number of times the user has accessed the system, the price the user pays per access to the system or per minute on the system, or the like.
At 141, the server CGI program starts. The server CGI program may be a collection of software objects, associated methods and/or procedural programs.
The server CGI program, for example, is selected at 142. On the server side these programs and objects comprising the CGI program may be coded in the computer software programming language PERL. PERL is an object-orientated scripting language that allows useful extension modules for Internet and database programming. PERL may also load C or C++ objects. Thus, numerical routines may be coded as C or C++ and then loaded as PERL modules. Many common data structures used in the present invention may be encoded as PERL objects. For example, a molecule object that describes data associated with a molecule in the compound collection may be encoded using PERL. Such an object, when instantiated, may contain data-fields that describe the molecule, together with software methods. The data-fields may contain raw information, for example, molecular weight or molecular co-ordinates, or may themselves be complex software objects. Examples of methods in the molecule object may include: a method to retrieve basic information about the molecule from the structural database 16 (such as a structural descriptor, molecule name, compounds containing this molecular structure); a method to retrieve similar compounds from a structure database 18; a method to generate a structure diagram graphic that may be used in a web page; a method to retrieve assay results from database 16; and methods to format and visualize information in the molecule object. Additionally, such methods may create other objects and run other methods. For a given client computer request, a PERL encoded program may be invoked with the CGI parameters. Generally the program will create objects and apply methods, though tasks may be achieved purely through procedural code.
Data may be obtained through data-sources on the server computer, as illustrated in 143 to 154. However, while a particular order of data retrieval is illustrated in FIG. 3, the order of steps in the present invention may involve accessing data-sources in whatever order is beneficial for the user to obtain the desired data.
The system determines if a relational database management system ("RDBMS") is to be accessed at 143. The RDBMS may be ORACLE®; however, the present invention may be used in conjunction with any other RDBMS, for example MICROSOFT SQL SERVER® or IBM DB2®. A RDBMS stores data as a set of tables. For example, two tables in the database of one embodiment of the present invention are the "molecules" table, which stores specific molecule identification numbers and structural descriptors, and the "primary assay results" table, which stores percent response and counts-per-minute for wells in primary assays. Relationships between these and the many other tables are described using foreign keys. Extensive use of indexes in this embodiment enables rapid retrieval of data from the RDBMS.
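The two tables named above, linked by a foreign key and supported by an index, can be sketched as follows. This is a minimal sketch that uses the SQLite engine from the Python standard library in place of ORACLE®; the column names, the sample structural descriptor, and the sample values are assumptions made for illustration and are not taken from the disclosure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical column names; the disclosure names only the two tables.
conn.executescript("""
CREATE TABLE molecules (
    molecule_id INTEGER PRIMARY KEY,
    structure   TEXT NOT NULL          -- structural descriptor string
);
CREATE TABLE primary_assay_results (
    result_id        INTEGER PRIMARY KEY,
    molecule_id      INTEGER NOT NULL REFERENCES molecules(molecule_id),
    percent_response REAL,
    counts_per_min   REAL
);
-- Index on the foreign key so results for a molecule are found quickly.
CREATE INDEX idx_results_molecule ON primary_assay_results(molecule_id);
""")

conn.execute("INSERT INTO molecules VALUES (1, 'c1ccccc1O')")
conn.executemany(
    "INSERT INTO primary_assay_results "
    "(molecule_id, percent_response, counts_per_min) VALUES (?, ?, ?)",
    [(1, 82.5, 1540.0), (1, 79.0, 1498.0)],
)

# Follow the foreign key from assay results back to the molecule.
rows = conn.execute(
    "SELECT m.structure, r.percent_response "
    "FROM molecules m JOIN primary_assay_results r USING (molecule_id) "
    "ORDER BY r.result_id"
).fetchall()
```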
If RDBMS data is needed, data is retrieved from the RDBMS using SQL. The present invention may generate the appropriate SQL commands automatically and retrieve the data at 144. These commands are executed against the RDBMS and the results may be stored in the object's data fields.
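Automatic generation of SQL commands from request parameters might look like the sketch below. The disclosure does not describe how the commands are built, so the approach here (a whitelist of table and column names plus bound parameters) is an assumption, and all table and column names are hypothetical.

```python
# Hypothetical whitelist of tables and columns that a request may reference.
ALLOWED = {
    "primary_assay_results": {"molecule_id", "percent_response", "counts_per_min"},
}


def build_select(table, columns, filters):
    """Build a parameterized SELECT from request parameters.

    Names are checked against a whitelist; values are passed as bound
    parameters rather than being pasted into the SQL text.
    """
    if table not in ALLOWED:
        raise ValueError(f"unknown table: {table}")
    for name in list(columns) + list(filters):
        if name not in ALLOWED[table]:
            raise ValueError(f"unknown column: {name}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in filters)
    return sql, list(filters.values())


sql, params = build_select(
    "primary_assay_results",
    ["molecule_id", "percent_response"],
    {"molecule_id": 1},
)
```

The returned pair can be handed directly to a database driver's `execute(sql, params)` call.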
At 145, the system determines if a chemical structure search is to be performed. In one embodiment of the present invention, UNITY®, a chemical structure search system, may be used to perform such a search. However, the present invention may use any other chemical structure search system, such as MDL, the Integrated Scientific Information System ("ISIS") (MDL Information Systems Inc.) or the DAYLIGHT DATABASE PACKAGE™ (Daylight Chemical Information Systems, Inc.). Traditionally, chemical structure searches were not possible using SQL, which is why such searches could not be done in traditional RDBMSs. However, the present invention may use a chemical structure search that is embedded within a RDBMS (for example RS3 from Oxford Molecular).
At 146, the chemical structure search is performed. When a chemical structure query is performed, common chemical structure searches (such as a search for "phenol") should result in: (i) an exact match (e.g., retrieve phenol from the database if it is present); (ii) a substructure search (for example, retrieve a list of structures that contain a phenol as a functional group from the database); or (iii) a similarity search (for example, retrieve a list of structures that have a similar chemical structure to phenol). The database management system may have access to a number of other databases, such as UNITY® databases. These databases contain chemical structures and a registration identifier for each structure. One such useful UNITY® database is one containing the chemical structures in the compound collection, where the registration identifier relates to the specific molecule identification number. Thus, methods that retrieve data from this UNITY® database are able to perform a "software join" to data in the RDBMS database. This feature allows the user to navigate between structural and biological data. For example, the user may perform a substructure search to retrieve all compounds containing a particular substituent from the compound collection. The user may then generate a table of these compounds with additional columns containing biological activity. Additional databases are also available to enhance the above information, such as commercial compound catalogues that may contain compounds of interest to users. The UNITY® database software uses an input file and command line syntax to perform searches. Results from the searches are written to a "hitlist" file. Methods in the database management system generate the input file and associated command line, execute the search and parse the hitlist file. Typically, a hitlist object is created to browse the results from the hitlist file. It is stored in a data-field in the original calling object.
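The "software join" described above, in which registration identifiers parsed from a hitlist file are matched to rows in the RDBMS, can be sketched as follows. The actual hitlist file format is not given in the disclosure, so this sketch assumes a hypothetical format with one registration identifier per line; the `activity` table and its columns are likewise assumptions.

```python
import sqlite3


def parse_hitlist(text):
    """Parse a hypothetical hitlist: one registration identifier per line."""
    return [int(line) for line in text.splitlines() if line.strip()]


def software_join(conn, hit_ids):
    """Attach biological activity from the RDBMS to each structure-search hit."""
    if not hit_ids:
        return []
    placeholders = ", ".join("?" for _ in hit_ids)
    rows = conn.execute(
        "SELECT molecule_id, percent_response FROM activity "
        f"WHERE molecule_id IN ({placeholders})",
        hit_ids,
    ).fetchall()
    activity = dict(rows)
    # Keep every hit, in hitlist order; hits with no assay data get None.
    return [(mid, activity.get(mid)) for mid in hit_ids]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (molecule_id INTEGER PRIMARY KEY, percent_response REAL)")
conn.executemany("INSERT INTO activity VALUES (?, ?)", [(101, 85.0), (103, 12.5)])

hits = parse_hitlist("101\n102\n103\n")
table = software_join(conn, hits)
```

The join happens in application code rather than in SQL because the structure search and the assay data live in two different systems.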
At 147, the system of an embodiment of the present invention determines if a sequence-similarity search, such as a bioinformatics search, is to be performed. In the present invention, for example, gene-sequences may be stored in a NCBI BLAST™ database, which is a public database. This software performs sequence homology searches. Like the UNITY® database, BLAST™ uses an input file and command line, and produces an output file. The present invention contains methods for generating BLAST™ queries and input files and also for parsing BLAST™ output files automatically, and does so at 148. Alternatively, the present invention may use any other sequence searching mechanisms.
At 149, the system of the present invention determines if saved data is to be used. If so, the saved data object is retrieved at 150. In the system of the present invention, cached or saved data may be used. For example, results from searches are often stored in tabular data files. As a user browses the data, the user may add columns to the data or perform operations on the table. Such information may be cached to prevent the search operation from having to be repeated. Caching also speeds up operation of the present invention. Other saved objects may further include, for example, molecular structure icons, UNITY® hitlist files or BLAST™ output files.
Additionally, in the present invention, data may exist in flat files. The output from laboratory devices is an example of data in a flat file. Flat files are collected from devices by software services. Such data may be time-stamped and stored on the web server. Users may download such data for local processing or the user may enter such data into the database as assay results.
Moreover, the present invention may use Berkeley DB™ files. This is a DBM file that contains a set of keys and values where the values may be rapidly retrieved using the keys. There are multiple public implementations of DBM files, including the Berkeley DB™, MLDBM, and SDBM. Any of these files may be used in the present invention. The present invention uses software programs that may store search results in the DBM file for quick retrieval later. These files may be used to store results from bioinformatics data-mining experiments, which are typically run every week.
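Storing search results in a DBM file for quick retrieval later can be sketched with the `dbm` module from the Python standard library, which stands in here for Berkeley DB™. The cache key, the JSON encoding of the stored value, and the `fake_search` function are assumptions made for illustration.

```python
import dbm
import json
import os
import tempfile


def cached_search(db_path, key, run_search):
    """Return (result, was_cached); run the search only on a cache miss."""
    with dbm.open(db_path, "c") as db:
        raw_key = key.encode("utf-8")
        if raw_key in db:
            return json.loads(db[raw_key].decode("utf-8")), True
        result = run_search()
        db[raw_key] = json.dumps(result).encode("utf-8")
        return result, False


calls = []


def fake_search():
    # Stand-in for an expensive data-mining search (hypothetical results).
    calls.append(1)
    return ["hit1", "hit2"]


path = os.path.join(tempfile.mkdtemp(), "search_cache")
first = cached_search(path, "weekly_run", fake_search)
second = cached_search(path, "weekly_run", fake_search)
```

The second call is answered from the DBM file, so the search function runs only once.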
Where more complex data is to be saved, a software object may store the data in a file. Higher level programming languages such as JAVA® and PERL provide simple interfaces for this type of data persistency. An example of this occurs when displaying locomotor animal data. Both the online images and the page itself involve the retrieval of thousands of data points from the database management system followed by statistical analysis. Using the present invention, the search and calculation need only be done once and the resulting complex data structure may be saved to disk.
At 151, the system determines if data from an external procedure is to be used. If so, at 152, the data is received from the external procedure. Such external procedures include data-sources and other types of databases or persistent sources. While such external procedures use the same method of retrieval as those databases described above, calling such external procedures involves defining the input (for example, in a command line argument and/or input file), executing the procedure, and then parsing the results. The present invention may communicate with such external procedures in a number of different manners. For example, the present invention may use utility commands from TRIPOS Associates® that add functionality, including chemical structure file format translation programs (for example a program that allows the present invention to read and write structures to the CHEMDRAW® (CambridgeSoft Corporation) plugin) and "sln2gif," a TRIPOS® program that converts structural descriptor strings to molecular image files. Additionally, the present invention may use any other external procedures. Such external procedures increase the breadth and scope of the services available to the user.
Additionally, many external procedures used in the present invention are public utilities, such as GNUPLOT (a public plotting program), IMAGEMAGICK CONVERT™ (an image translation program of Imagemagick Studios that allows PNG format images to be converted to MACINTOSH® (Apple Computer, Inc.) PICT or WINDOWS® (Microsoft Corporation) BMP file formats) and CLUSTAL™ sequence alignment software and PHYLIP™ (that allows the generation of a phylogenetic tree diagram from the CLUSTAL™ sequence alignment). Additionally, the present invention may use external procedures that provide a chemical structure clustering program that (in conjunction with a TRIPOS Associates, Inc. tool that generates molecular fingerprints for chemical structures) divides a set of molecules into chemically similar groups. Additionally, this tool allows a user to collect hits from a search for biologically active compounds and sort these hits into chemically similar classes. Further, another external procedure may be used to gather data from IC50 experiments and configure the results into a standard dose response curve using simplex minimization. Additionally, data collected may originate from an instrument, such as a laboratory instrument, that is connected to the present invention. This data may be stored in the appropriate databases, as outlined in detail below. Alternatively, the data from such instruments may be used to update or modify existing data stored in one of the plurality of databases connected to the database management system. Once data has been collected from all sources, processing of the data occurs at 153.
Data processing in the present invention may occur in various forms, including organization, tabulation, or statistical calculation of such data, for example. Alternatively, for example, the data may be used to modify or update data already existing in the databases. Calculations of the data may range from simple calculations such as the mean or standard deviation, to more complex numeric calculations such as a t-test, a student's distribution probability, the area under a section of a normal distribution curve or an ANOVA (analysis of variance), etc.
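The simpler calculations mentioned above (mean, standard deviation, and a t-test statistic) can be sketched with the Python standard library. The disclosure does not say which form of t-test is used, so this sketch assumes Welch's unequal-variance statistic; the sample values are invented for illustration.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Invented counts for a control group and a treated group.
control = [10.0, 12.0, 11.0, 13.0]
treated = [14.0, 15.0, 16.0, 15.0]

control_mean = statistics.mean(control)
control_sd = statistics.stdev(control)   # sample standard deviation
t_value = welch_t(control, treated)
```

Converting the statistic to a probability additionally requires the Student's t distribution, which is not part of the standard library and is left out of this sketch.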
Moreover, the present invention may also update databases as the present invention operates. For example, a user may add new screening results to the database as he or she works in the lab, or may collect data from a lab instrument. The present invention determines biological activities and assay controls from a device data-file and user supplied parameters. These new results are added to the appropriate database; however, SQL queries for the update should be formulated and executed in a transaction such that the updated database is not corrupted in the event of an error. In addition to adding new data, users may also edit existing data. For example, a user may mark one or more of the wells in an experimental 96-well plate as "bad" if he or she believes a dilution or other error has occurred. The present invention then updates the assay results and controls. The present invention may monitor the databases to ensure the integrity of the system.
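Executing an update inside a transaction, so that a failed batch leaves the database unchanged, can be sketched as below. The sketch uses SQLite from the Python standard library rather than a production RDBMS, and the table layout and well names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assay_results ("
    "well TEXT PRIMARY KEY, percent_response REAL NOT NULL)"
)
conn.commit()


def add_results(conn, results):
    """Insert a batch of results atomically; return True on success."""
    try:
        # The connection context manager commits on success and rolls the
        # whole batch back if any statement raises an error.
        with conn:
            conn.executemany("INSERT INTO assay_results VALUES (?, ?)", results)
        return True
    except sqlite3.Error:
        return False


ok = add_results(conn, [("A1", 50.0), ("A2", 61.5)])
# The second batch repeats well A1, so the whole batch (including A3) is undone.
failed = add_results(conn, [("A3", 40.0), ("A1", 99.0)])
count = conn.execute("SELECT COUNT(*) FROM assay_results").fetchone()[0]
```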
For example, when in the present invention a new compound is to be registered, the present invention may update multiple databases. Further, the present invention may perform frequent error checks to ensure that the various databases are synchronized. The present invention may also perform error checks when a compound structure is being edited. In addition, other error checks also may occur, such as when a structure is newly added. The system may check to determine if the newly added structure is already in the database. Such error checks help ensure the integrity of the system, because, for example, a structural analysis using a nuclear magnetic resonance ("NMR") spectrometer and/or a mass-spectrometer may show that a compound had a different structure than what was expected.
At 154, the present invention determines whether further searches are needed or if additional information from data-sources is required. Because some software objects that may be used in the present invention are iterative, step 154 helps prevent information from being missed. For example, a user may request a table of all compounds having an average "IC50" against a "receptor A" that is, for example, less than 1 μM, cross referenced against "receptor B" and "receptor C" activities. Rather than constructing a complex SQL query to obtain all results at one time, the present invention may return to step 143 and retrieve the receptor A hits, construct a table object from these hits, and then retrieve any receptor B and receptor C activities which may then be added as additional columns to the table object.
At 155, persistent data objects may be saved. If just an HTML page is output, the system may not require saving any persistent objects. However, a more complex page may benefit from the creation of persistent files.
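The iterative retrieval described above for receptors A, B, and C can be sketched as follows: fetch the receptor A hits first, build a table from them, then add the other receptors' activities as extra columns. The in-memory `ACTIVITIES` dictionary stands in for the database, and all compound identifiers and IC50 values are invented for illustration.

```python
# Hypothetical average IC50 values (micromolar), keyed by receptor and compound.
ACTIVITIES = {
    "receptor_a": {1: 0.4, 2: 2.5, 3: 0.9},
    "receptor_b": {1: 7.0, 3: 0.2},
    "receptor_c": {3: 12.0},
}


def fetch_hits(receptor, max_ic50):
    """First pass: compounds whose IC50 against the receptor is below the cutoff."""
    return {cid: v for cid, v in ACTIVITIES[receptor].items() if v < max_ic50}


def build_table(primary, others, max_ic50):
    """Build a table from the primary hits, then add one column per receptor."""
    table = {cid: {primary: ic50} for cid, ic50 in fetch_hits(primary, max_ic50).items()}
    for receptor in others:                 # one further retrieval per receptor
        for cid, row in table.items():
            row[receptor] = ACTIVITIES[receptor].get(cid)   # None if not tested
    return table


table = build_table("receptor_a", ["receptor_b", "receptor_c"], max_ic50=1.0)
```

Each pass is a simple lookup, which is the trade the passage describes: several small retrievals in place of one complex query.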
For example, a primary assay data sheet may contain an applet that displays assay data in bar-chart form and a second applet that displays the same data as a colored grid. To avoid fetching this data from the database management system twice, the present invention can write to disk files containing the assay data. Each applet is given the name of the data file as a program parameter. Both applets then open a connection to the server and read the data file. Such applets may be developed under JAVA® and may include methods to read and interpret data files generated by the present invention. Again, if a user is repeatedly using the same data, such data may be saved to a cached file. Alternatively, all search data may be cached and saved as a table file that the user is free to manipulate in later requests.
In a CGI program, the standard output of the program is redirected by the web server software back to the client computer. At 156, when using an applet, the data is formatted in such a way that the applet may read the data. If an applet is being used, at 157 the data is output to the client.
The data may be in the format requested by the client. For example, if a dot blot was desired, it may be output as a "jpg" file, a "bmp" file, a "gif" file, or some other computer image file. A dot blot is any matrix based system for hybridization. For example, a dot blot may be northern blots data from any DNA technology, any data from two dimensional proteomic gels, a matrix of RNA extract from tissues in dot format, a chip with DNA sequences attached to them, or the like. A sample of the results of a dot blot experiment, as output from one embodiment of the present invention, is shown in FIG. 22.
Further, images may be saved in the system, such that a user may cut and paste the image to some other program (such as a word processing document). Alternatively, the output may be in various other formats stated above, such as a histogram, a relational table format, or in some application specific format, such as formatted for ORACLE®. Alternatively still, if connected to laboratory instruments, the data may be in any different format that is convenient for the user to utilize such data. If an applet is not being used, then at 158, the data may be returned in an HTML file to the client. When the client computer is using a web browser, the present invention may use software objects to ensure that a properly formatted HTML page is created. This allows users with limited computer experience to have clearly formatted data that is easily utilized by such users. The generated HTML may contain multiple links to allow the user to retrieve related data. The new web page may also contain one or more forms or easy to fill in pages that the user may fill out to perform a data search. Data may also be output in XML or any other alternative format that a browser can interpret. The present invention may use any available web browser, such as MICROSOFT INTERNET EXPLORER®, NETSCAPE NAVIGATOR®, or any other web browser such as ones used on WINDOWS®, MACINTOSH® and UNIX® (Unix System Laboratories, Inc.) operating systems.
The original request to the system may initiate from a web server, but there may not be a need for an HTML web page file to be returned as output. In such an instance, some other file is returned. For example, a user may request a MICROSOFT® Excel spreadsheet. In such an instance, a MICROSOFT® Excel spreadsheet may be returned to the user, as a readable web page file. Additionally, the system of the present invention may also contain methods that may communicate with other MICROSOFT® OLE objects that allow the creation of other documents. Further, a user may request the output to be in a MICROSOFT® Word format. Alternatively, a browser that requests an inline image or a user that requests an image for document preparation may receive the appropriate image file instead of an HTML web page file.
Regardless of the form the output takes (for example, HTML, JAVA®, a spreadsheet file, or the like), the program sends to the client a web page file that is readable from a web browser on the client's computer.
At 159 the server program finishes. At 161, the client loads the server data through its web browser. Data that is not in JAVA® or HTML format, such as a MICROSOFT® Excel spreadsheet or images (as described above), is handled automatically by the web browser. When an HTML web page has been sent to the client, the web browser loads the HTML page. The web browser may automatically format the page and retrieve any included images.
The system determines if applets are in the web page at 162. If there are none, the system is complete at 163, as the web browser has displayed the pages requested by the user. If there are applets in the page, at 164 the system automatically starts such applets through the web browser. The applets are stored in compiled byte-code on the server and are downloaded on demand by the web browser. The behavior and size of the applet are described by HTML tags in the client web page. Normally, HTML web pages contain static content. Having an applet in a web page allows dynamic content within the page. Alternatively, other technologies that allow dynamic content in the web page may be used, such as JavaScript and dynamic HTML.
At 165, the system determines if data files are needed from the server for the applet. If such data files are needed, then these files are retrieved from the server at 166. When the applet needs a data file from the server, the system may open an HTTP connection back to the server and read the file. Many applets may need to gather data from the server. For example, if the applet is displaying table data, then the applet will read the appropriate data file on the server.
At 167, the system determines whether the applet needs to run server software. If so, the applet opens an HTTP connection back to the server and runs the software. The web server software ensures that the output from the server program is returned to the applet (as described above). For example, the applet may need to generate a database query. This query is passed to a server program for execution at 143 and the data from the server program is returned to the applet through the appropriate steps described above. At 168, the applet processes the data. The applet may run within a frame in the web page. Additionally, the applet may provide features such as buttons, menus and entry-boxes as well as graphic elements. Also, the applet may respond to user events such as mouse movements or single or multiple clicks. Additionally, the program can open additional windows or can load a new page onto the browser. Additional examples of applets that may be used in the present invention include bar-charts, histograms, scatter-plots for the visualization of numeric data, and the like. These components can be used in a variety of situations, such as to download and display table files. Additionally, a user may click on a column in a bar-chart displaying the plate-profile for a primary assay and the web browser can load the relevant compound data-page. Such applets may also be user-friendly by displaying data in a clear and simple manner to users. Further, many features give the user control over the data display (for example, depending on the particular applet, the user may turn off or on the display of error bars; add points or other labeling; perform scrolling; perform selection operations; or the like).
A preferred use for the applet is to display tabular data. Static web pages can also be used to display tabular data, however, such web pages may be unwieldy for a table with many hundreds of rows. Applets may scroll through large amounts of data while controlling the display of contents in the cells of the table. For example, if a user clicked on a cell containing a molecule identification number, the applet directs the browser to that molecule's data sheet. A further applet that may be used is one that allows programs running on the server to display progress messages. This allows the user to track the progress of long tasks. Without this applet, the user would typically have to wait for the program to finish before seeing any progress. Often the user can become frustrated and believe that the connection to the server has been lost without such updates.
Additionally, the system of the present invention may use other specific applets. For example, a colored grid applet may be used for the visualization of results, such as screening plate results from laboratory assay experiments. Each well in a screening plate may be illustrated as a colored box, with the color being determined by the well measurement. The user may use buttons to toggle between, for example, a measured biological response and counts-per-minute and the user may control the range and type of color displayed. Further, the user may use the mouse to obtain more detailed information on the contents of each well. Moreover, the system may use an applet for the visualization of dose-response data, allowing the user to toggle, for example, the display of error bars, data-points and the curve itself. A further applet that may be used is an applet that performs dose response curve fitting using a downhill simplex minimization or other minimization techniques such as simulated annealing. With such an applet, the user may interactively: add points to the curve; remove points from the fitting calculation; determine the hill slope; perform a two-site dose response fit; anchor either of the curve end-points; and the like. Applets can also be used to build navigation tools, such as an applet that downloads and displays menu files from the server (such as standard menus that occur at the foot of web pages in the database) or menus for selecting specific assay screens (screens may be sorted by type, experimental technique and receptor target).
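The colored grid described above, in which each well's color is determined by its measurement and the user controls the range, can be sketched as follows. The blue-to-red linear scale and the function names are assumptions chosen for illustration; the source does not specify a particular color scheme.

```python
def well_color(value, low, high):
    """Map a well measurement to an (r, g, b) tuple on a blue-to-red scale.

    low and high are the user-chosen display range; values outside the
    range are clamped to the nearest end of the scale.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    fraction = min(1.0, max(0.0, (value - low) / (high - low)))
    red = round(255 * fraction)
    return (red, 0, 255 - red)


def plate_colors(plate, low, high):
    """Convert a plate (list of rows of measurements) to a grid of colors."""
    return [[well_color(value, low, high) for value in row] for row in plate]
```

Toggling between measurement types, as the text describes, would amount to calling `plate_colors` again with a different plate of values and range.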
Additionally, any other utility applet objects may be used. For example, some web browsers currently are unable to print applet contents. Therefore, there is an applet object method to capture the display of another graphic object. The method calls a server program to display the printable image graphic in a new browser window. Applets can also be used to "stack" displays. For example, one applet may store many tables but display only one. The user can select another table by clicking on a button. This applet is used for concise display in what would be otherwise cluttered data sheets (for example, the molecule and substance data sheets). A similar applet may stack plate profile bar-charts, allowing a user to view either counts-per-minute or biological response (for a more complex assay such as a flashplate-cyclic AMP ("cAMP") assay) or any other measurements (such as pMol cAMP). Stacked bar-charts and histograms can also be used to analyze the distribution of data within table columns.
Another useful applet that may be used combines features from two previously described applets: the scatter-plot applet and the assay browser applet. This allows the user to select two assays. The applet generates SQL commands for a comparison of the assays and uses a server program to fetch the data from the database management system. The data is then displayed in a scatter-plot.
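A minimal sketch of such an assay comparison is given below. It assumes a hypothetical `assay_result` table with `assay_id`, `compound_id` and `value` columns and uses an in-memory SQLite database purely for illustration; the actual schema and database management system are not given in the source.

```python
import sqlite3

# Hypothetical schema: one row per (assay, compound) measurement.
COMPARE_SQL = (
    "SELECT a.compound_id, a.value, b.value "
    "FROM assay_result a JOIN assay_result b "
    "ON a.compound_id = b.compound_id "
    "WHERE a.assay_id = ? AND b.assay_id = ? "
    "ORDER BY a.compound_id"
)


def scatter_points(conn, assay_x, assay_y):
    """Return (compound_id, x, y) for compounds measured in both assays."""
    return [tuple(row) for row in conn.execute(COMPARE_SQL, (assay_x, assay_y))]


def demo():
    """Build a small in-memory database and compare assay 1 with assay 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assay_result (assay_id INTEGER, compound_id TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO assay_result VALUES (?, ?, ?)",
        [(1, "C1", 10.0), (1, "C2", 20.0), (1, "C3", 30.0),
         (2, "C1", 1.5), (2, "C3", 3.5)],
    )
    return scatter_points(conn, 1, 2)
```

Each returned tuple corresponds to one point in the scatter-plot; compounds measured in only one of the two assays are omitted by the join.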
Further, the present invention may use commercial JAVA® objects, such as, for example, objects that relate to chemical structure sketching. One such object allows the DBMS to retrieve a chemical structure from the user, where the user has sketched the structure using, for example, the CHEMDRAW® plugin.
Alternatively, during execution of the applet, more data may be retrieved from the server. For example, multiple iterations to step 143 may occur. In the present invention, the applet may continue to run while it is displayed in the web browser. Different web browsers react differently to the applet; for example, MICROSOFT INTERNET EXPLORER® stops the program when the user leaves the page and NETSCAPE NAVIGATOR® may allow the applet to continue to execute after the user leaves the page. The present invention may react in accordance with these differences and other differences between various web browsers.
At 169, the applet loops through the user requests. If there are more user requests at 170, such requests are processed at 165. Once the web page is loaded and all applets have ceased execution, the client side process ends at 163.
Referring now to FIG. 4, another schematic of one embodiment of the present invention is illustrated. This schematic depicts when the server passes status messages to a client during a search or computation.
Running time-consuming programs through a web-server may be confusing to a user, because the user may not know if the connection to the web server has been lost or that the process is taking a long period of time. Additionally, different web browsers react differently while the process is running. The present invention may react accordingly and may provide the user with assurance that the process is still running smoothly. For example, FIG. 4 depicts how one embodiment of the present invention informs a user of the progress of the present invention. At 201, the client requests that a program be run on the server. This request comes from a hyperlink, from within a web page, or from an applet running in the current page. Using the CGI mechanism, the client passes to the server the name of a server program to run and various parameters, such as arguments and values, at 202.
The web server then starts the program and saves the arguments and the program name to a state file at 203. Then at 204, the server sends the applet and the name of the state file with the arguments to the client in HTML format. At this point, the server program stops.
At 205, the client loads the new HTML web page and starts the applet. Two threads of execution 206 and 213 are set up. Ultimately, the applet finishes when both threads stop. The first thread 206 starts a timer at 207. The timer is useful for showing the user that the programs are still running and for showing the elapsed time the program has taken to process. At 208, the system determines whether the second thread has indicated that the program is finished. If not, the timer is incremented by 0.1 seconds and this is indicated on the display at 209. The display is then redrawn. The display may show a moving icon and the elapsed program time. The thread then sleeps for 0.1 seconds until the next determination at 208. When the second thread finishes, thread one terminates at 210. An example screen shot of such a display is illustrated in FIG. 17.
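The two-thread arrangement can be sketched as below. This is an illustrative Python analogue of the applet behavior, not the applet itself: `run_with_timer`, the `task` callable and the `on_tick` callback (standing in for the display redraw) are all assumed names.

```python
import threading
import time


def run_with_timer(task, tick=0.1, on_tick=None):
    """Run task() in one thread while a second thread tracks elapsed time."""
    finished = threading.Event()
    result = {}
    ticks = []

    def worker():
        # Corresponds to the second thread: does the work, then signals.
        try:
            result["value"] = task()
        finally:
            finished.set()

    def timer():
        # Corresponds to the first thread: wait one tick, then update the
        # elapsed time, until the worker signals that it has finished.
        elapsed = 0.0
        while not finished.wait(tick):
            elapsed += tick
            ticks.append(elapsed)
            if on_tick is not None:
                on_tick(elapsed)  # e.g. redraw the display

    threads = [threading.Thread(target=timer), threading.Thread(target=worker)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result.get("value"), ticks
```

Using an event with a timeout in place of a plain sleep lets the timer thread stop promptly once the worker finishes, rather than only at the next 0.1 second check.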
Referring again to FIG. 4, at 213, the second thread starts and opens a connection to the web server at 215. The thread sends a request to the server to run the server program at 223 and the thread passes to the server the filename of the "state" file, where the state file stores the parameters and the state of the process. At 216, the thread continues receiving messages from the server program until the connection closes.
The thread determines if a redirect message has been received at 217. The redirect message indicates that the server program is about to finish and that all results have been written out to an HTML file. This message includes the name of the HTML file. The thread loads the HTML file into the browser at 218.
At 219, the thread determines if any other message or messages from the server have been received. If so, a message is displayed in the applet at 220. Such a message may indicate the progress of the server program, for example stating that the server is "Setting up BLAST™ search" or "Added new compound to database" or the like.
The thread determines if the server connection has closed at 221. If so, the server program is finished at 222 and thread 1 is notified as such at 211. Thread 2 then terminates at 212. If the server connection has not closed, the loop waits for incoming messages at 216. At 223, the server program is started once again. However, when the server program starts this time, it has the state filename. At 224, the state file is read by the server and loaded. The state file is checked to see if it indicates that this program has already been run before at 225. If it has, then all previous messages (that are in the state file) are sent to the client at 226 along with an additional message that informs the client that this program has already been run. If, however, the user wants to rerun the program, the user can just reload the page. After sending this message to the client, the server program stops at 238.
At 237, the main search program 238 begins. The main search program is not shown in FIG. 4, however, it may involve the same process as the current process.
The main search program is monitored for messages or for termination at 229. At 230, the system checks for such a message or termination. If such a message is received, the message is sent to the client at 231 and the state file is updated with the message at 232. The web server may automatically send the message to the client, however the web server output should be unbuffered for this to occur.
At 233, the system determines if the program has finished. If not, the system loops back to 229 monitoring for messages. If the program has finished, at 234, the state file is updated to show that this program has now been run.
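The state-file handling in steps 224 through 234 can be sketched as follows. The JSON layout, the `has_run` flag and the function names are assumptions made for illustration; the source says only that the state file stores the parameters, the messages and whether the program has been run.

```python
import json
import os
import tempfile

ALREADY_RUN = "This program has already been run."


def save_state(path, state):
    with open(path, "w") as handle:
        json.dump(state, handle)


def run_with_state(path, search):
    """Return the messages sent to the client for one server invocation."""
    with open(path) as handle:
        state = json.load(handle)
    if state["has_run"]:
        # Replay the stored messages instead of running the search again.
        return state["messages"] + [ALREADY_RUN]
    sent = []
    for message in search(state["args"]):
        sent.append(message)               # send the message to the client
        state["messages"].append(message)  # record it in the state file
        save_state(path, state)
    state["has_run"] = True                # mark the program as run
    save_state(path, state)
    return sent


def demo():
    """Invoke the same (hypothetical) search twice against one state file."""
    def search(args):
        yield "Setting up search for " + args["query"]
        yield "Search complete"

    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    save_state(path, {"program": "search", "args": {"query": "receptor A"},
                      "messages": [], "has_run": False})
    first = run_with_state(path, search)
    second = run_with_state(path, search)
    os.remove(path)
    return first, second
```

In `demo`, the second invocation does not repeat the search; it returns the two stored messages followed by the notice that the program has already been run.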
The system then determines if there are any results to report at 235. If yes, the results are formatted as an HTML file and saved at 236. Additionally, other data may be saved at this point, as stated above. A redirect request is sent to the client at 237. The redirect request may include the HTML filename. At 238, the server program stops.
The system of the present invention is capable of interfacing and handling tabular data. Such data may include information on molecules, plates, assays, or the like. To do this, server and client software objects and methods for handling tabular data have been developed. Additionally, a suite of routines for generation and display of tabular data has also been developed. Such tabular data includes data and meta-data (for example, the preferred display methods and the table title). Formats for files can also be exchanged between different server and client programs/tools.
Tabular data may be saved in a simple file format, for example, that may be exchanged between different server and client programs. This file format may be read by both client applets and server CGI programs. For example, a server program may read a table data file and display that table in HTML, or a client JAVA® applet may read the same table and display it dynamically. Applets that can read table files include, for example: a table viewer, a scatter plot viewer, a histogram viewer, a barchart viewer, and the like. Cell information may be linked to other database pages. For example, if a column in a table contains structural data, the software can link the user to structure data sheets. Additionally, users can delete and edit columns. Alternatively, users may add related data from other databases or the database management system, or add screening data to build a selectivity profile. Using an applet, users can interact with data using tools like interactive JAVA® scatter-plots, histograms, curves, bar-charts for interpretation of numerical data in tables, or the like.
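A table file that carries both data and meta-data might look like the sketch below. The layout (meta-data lines prefixed with `#`, followed by tab-separated columns) is an assumption for illustration only; the source does not define the actual file format.

```python
import csv
import io


def write_table(title, columns, rows, display="table"):
    """Serialize a table with meta-data lines followed by tab-separated data."""
    out = io.StringIO()
    out.write("#title={}\n#display={}\n".format(title, display))
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return out.getvalue()


def read_table(text):
    """Parse the same format back into (meta, columns, rows)."""
    meta, data_lines = {}, []
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line[1:].partition("=")
            meta[key] = value
        elif line:
            data_lines.append(line)
    reader = csv.reader(data_lines, delimiter="\t")
    columns = next(reader)
    rows = [dict(zip(columns, row)) for row in reader]
    return meta, columns, rows
```

Because the format is plain text, a server program could render it as HTML while a client viewer reads the same file for dynamic display, in line with the exchange described above.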
Data may be viewed with a table viewer, scatterplots, histograms, or in spreadsheet formats. Additionally, the system of the present invention may use other graphics applets that display graphics, such as creating a barchart or creating a histogram profile of a screen. Alternatively, colored grid-plate viewers may also be created.
The system of the present invention may be carried out using a computer, and specifically, the processing units of such computers. For example, in one embodiment of the present invention, an interface to the above databases that store the biomolecular data receives an instruction to search for a particular piece of biomolecular data. This may come in the form of a computer instruction that directs the processing unit to receive a request for access to certain biomolecular data stored in one of the plurality of databases. A second computer instruction determines which of the plurality of databases stores the biomolecular data. A third computer instruction accesses the biomolecular data in one of the plurality of databases. A fourth computer instruction receives the biomolecular data from one of the plurality of databases. The processing unit then displays the biomolecular data received from the one of the plurality of databases. These steps may be carried out automatically, as described above. The above prior descriptions, examples, and modifications are all applicable to this embodiment of the present invention.
Alternatively, another embodiment of the present invention includes a computer system for electronically retrieving biomolecular data from a plurality of databases over a system of networked computers, where the computer system includes a central processing unit (CPU) or units and random access memory (RAM) coupled to said CPU, for use in compiling a target program to run on a target computer architecture. The present invention may further include a client computer and the database management system. These systems may be connected by electronic connections such as hardwire or wireless connections. The database management system is electronically connected to a plurality of databases that include biomolecular data. The client computer receives a request for biomolecular data in a desired format from a user and sends the request to the database management system. The database management system accesses the biomolecular data from the plurality of databases. The database management system then generates as output a web page that is sent to the client computer over the electronic connection. The web page includes the biomolecular data in the desired format. The above prior descriptions, examples, and modifications are all applicable to this embodiment of the present invention.
In a preferred embodiment of the present invention, a system is set up such that a user may easily navigate through menu choices to obtain the desired information. FIG. 5 illustrates a series of menu options available in one embodiment of the system of the present invention. These menu options are shown purely as an example of available menu options and are not intended to limit the present invention in any way. Further, additional menu options may be added as necessary to the system of the present invention.
Referring to FIG. 5, one embodiment of the present invention may display a "Main" page 310, which provides the option of entering either a "Gene Distribution Home Page" 311 or a "Screening Home Page" 329. These menu choices may be displayed using an HTML web page or an applet. Depending on what menu choice the user selects, the user may navigate through various menu choice options to obtain the desired information. The "Gene Distribution Home Page" provides the user with, for example, genomic and bioinformatic data.
If the user selected the "Gene Distribution Home Page" 311, the user may be directed to four other menu choices: "Main" 312, "Search Sequence" 313, "Search Dot Blot" 314 or "Update" 315. The user may then select which menu option he or she desires. The "Main" 312 selection allows the user to select "Gene Distribution Home Page" 311 or "Screening Home Page" 329. These options return the user to the beginning of the menu options.
Under the "Search Sequence" 313, six options appear: "Browse genes" 316; "Do multiple sequence alignments" 317; "Do BLAST sequence search" 318; "Monthly BLAST probes" 319; "Database mining" 320; or "Hydrophobicity analysis" 321. Under the option of "Browse genes" 316, a user may browse various genes. If selecting "Do multiple sequence alignments" 317, the user may select genes and align either the nucleic or peptide sequences by homology identity. The user may further select to view the image as a dendrogram. Under the option of "Do BLAST sequence search" 318, a user may select a gene and run a BLAST search. If selecting "Monthly BLAST probes" 319, the user may recover new sequences that are similar to known probes, from a weekly or monthly download. Under "Database mining" 320, a user may examine several probes in the database. Under "Hydrophobicity analysis" 321, the user may print a hydrophobicity plot of an amino acid sequence.
If a user selects the "Search Dot Blot" page 314, the user may, for example, choose one of five options: "Browse dot blots" 322; "Search by organ and disease state" 323; "Search by gene and disease state" 324; "Search by gene and organ" 325; or "Find interesting dot blots" 326. If electing to "Browse dot blots" 322, a user may select a dot blot and print it, view the plate map, or show a bar graph. The dot blot is displayed with its ID, the gene, and the date on which the dot blot was added. A user may instead select to "Search by organ and disease state" 323, in which a list of organs and possible diseases may be selected. The user may choose to view the results in a scatterplot and/or barchart. If a user selects to "Search by gene and disease state" 324, the user may select a gene and a corresponding disease state. The user may then review the results. Alternatively, the user may "Search by gene and organ" 325, in which the user may select a gene and an organ, and view the results. When electing to "Find interesting dot blots" 326, for example, the user may enter various parameters, such as to search for dot blots containing organs whose relative abundance is more than a desired number of standard deviations larger than the mean for the dot blot. The desired number of standard deviations may be entered by the user. A table of results is then displayed.
If the user selects "Update" 315, two options appear: "Enter tissue samples" 327 and "Load dot blot experimental results" 328. Under "Enter tissue samples" 327, the user may enter information on tissue samples. By selecting "Load dot blot experimental results" 328, the user may provide more dot blots for the system. Therefore, the "Update" 315 option allows the user to add information to the databases.
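The "Find interesting dot blots" 326 search described above, which flags organs whose relative abundance exceeds the dot blot mean by a user-entered number of standard deviations, can be sketched as below. The use of the population standard deviation and the function name are assumptions; the source does not say which estimator is used.

```python
from statistics import mean, pstdev


def interesting_organs(abundance, n_sd):
    """Return organs whose abundance exceeds mean + n_sd * standard deviation.

    abundance maps organ name to relative abundance for one dot blot.
    The result maps each flagged organ to its distance from the mean,
    expressed in standard deviations.
    """
    values = list(abundance.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return {}
    return {
        organ: (value - mu) / sigma
        for organ, value in abundance.items()
        if value > mu + n_sd * sigma
    }
```

For a blot with abundances of 10, 11, 9, 10 and 40, the mean is 16 and the population standard deviation is about 12.0, so the last organ is flagged at a threshold of 1.5 standard deviations but not at 2.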
The "Screening Home Page" 329 allows a user to obtain analysis of screening and molecular information. This page allows a user to select from one of six menus, for example: "Main" 330; "Molecular Info" 331; "Assay Info" 332; "Lab Info" 333; "Chemistry" 334; and "Other Pages" 335.
The "Main" 330 selection allows the user to select "Gene Distribution Home Page" 311 or "Screening Home Page" 329, similarly to the "Main" 312 under the "Gene Distribution Home Page" 311. Again, this returns the user to the beginning of the menu choice options.
Under "Molecular Info" 331, the user may select one of four options, for example, comprising: "Select molecule from database" 336; "Select substance from database" 337; "Perform chemical structure searches" 338; and "Select plate from database" 339. The option to "Select molecule from database" 336 allows a user to enter a molecule ID and retrieve a list of molecules based on the ID. If the user selects "Select substance from database" 337, the user may enter a substance ID and a substance name and retrieve a list of substances. Under "Perform chemical structure searches" 338, the user may select a database, type of search, a cutoff (for two dimensional similarity searches), and the maximum number of hits. The user then may, using for example CHEMDRAW®, draw a chemical structure. This chemical structure is searched in chemical structure databases, for example UNITY®. The results show the matches of the chemical structure. For example, the results of a sample UNITY® search are shown in FIG. 20.
If a user chooses to "Select plate from database" 339, the user may enter a plate ID or plate name and retrieve information on plates matching the entered criteria. Information on the plates that may be shown includes the plate ID, plate name, plate type, chemical structures and any comment that was added. Thus, the user may view an entire plate at one time to compare different wells.
By selecting "Assay Info" 332, the user may choose one of six additional options, for example: "Browse assay results" 340; "Search IC50 data" 341; "Search reconfirmations" 342; "Search primary screen" 343; "Compare results for two assays" 344; and "List animal experiments" 345. The option to "Browse assay results" 340 allows a user to browse assay results by various parameters, such as assay method type. If selecting "Search IC50 data" 341, the user may search IC50 data using various parameters. By selecting "Search reconfirmations" 342, the user may search reconfirmation information using various parameters. If the option "Search primary screen" 343 is selected, the user may search a primary screen using various parameters. Under "Compare results for two assays" 344, a user may plot one assay against another assay. The option "List animal experiments" 345 allows a user to select an animal experiment from a list of experiments.
Under the option "Lab Info" 333, a user may choose one of five options, for example: "Add data captured from lab devices to database" 346; "View output from lab devices" 347; "Screening tools" 348; "Generate reconfirmation plate maps" 349; and "Add animal data" 350. If selecting "Add data captured from lab devices to database" 346, the user may take data captured from a lab instrument and add it to the system. Under the option "View output from lab devices" 347, the user may view data from lab devices. "Screening tools" 348 allows a user to select a screening tool from a list. If selecting "Generate reconfirmation plate maps" 349, the user may generate plate maps from reconfirmation data. "Add animal data" 350 allows the user to add animal data to the system.
If selecting the "Chemistry" 334 option, the user may select one of three other options, for example: "Add compound to database" 351; "Edit substance" 352; or "Create new plate" 353. "Add compound to database" 351 permits the user to add a compound to the system, using a plugin such as CHEMDRAW®. The option "Edit substance" 352 allows a user to edit an existing substance in the system and "Create new plate" 353 creates a blank new plate to add data.
The "Other Pages" 335 screen allows the user to perform various other functions. For example, the user may select "IC50 Calculator" 354; "Overview of database" 355; or other system administrator tools. The option "IC50 Calculator" 354 allows a user to build an IC50 curve using data points input by a user. By selecting "Overview of database" 355, a user may obtain sample pages that depict various features of the system. Additionally, other system administration tools may be placed under this menu option.
Different pages in the database may be protected such that only users with specific access levels may view certain pages. For example, the "Other Pages" 335 screen is predominantly system administration and thus it may not be desirable or useful for other system users to have access to this menu option. Therefore, the system may confirm the access level of a user before granting access to certain menu options. Alternatively, the system may not even display menu options to a user that does not possess the proper access level. The access level of a user may be set when the user's account is created.
Referring now to FIG. 6, a flowchart 400 of an overview of one embodiment of the present invention is shown. At 401, a request for biomolecular data is received. The system determines at least one of the databases that contains the biomolecular data at 402. At 403, the system generates database instructions for accessing data in the at least one database. The database instructions are transmitted to the at least one database at 404. Biomolecular data from the at least one database is received at 405. Then, at 406, display data is generated for the biomolecular data and at 407 the display data is transmitted.
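The flow of FIG. 6 can be sketched in a few lines. Everything named here is a hypothetical stand-in: `CATALOG` maps a kind of data to the database assumed to hold it, and each database is represented by a plain callable rather than a real connection.

```python
import html

# Hypothetical catalog: which database holds which kind of biomolecular data.
CATALOG = {"ic50": "screening", "sequence": "genes"}


def handle_request(data_type, key, databases, catalog=CATALOG):
    """Steps 401-407: receive a request, query one database, build display data."""
    db_name = catalog[data_type]                # 402: determine the database
    instruction = (data_type, key)              # 403: generate the instruction
    records = databases[db_name](instruction)   # 404-405: transmit and receive
    cells = "".join(                            # 406: generate display data
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(name)), html.escape(str(value)))
        for name, value in records
    )
    return "<table>" + cells + "</table>"       # 407: transmit the display data


def fake_screening_db(instruction):
    """Stand-in for a real database; returns (field, value) pairs."""
    data_type, key = instruction
    return [("compound", key), (data_type, "0.3 uM")]
```

A call such as `handle_request("ic50", "C1", {"screening": fake_screening_db})` returns a small HTML table; in the described system the display data could equally be a spreadsheet, an image, or a table file.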
Referring now to FIG. 7, a flowchart of a specific aspect of the flowchart of FIG. 6 is shown. FIG. 7 details more information on what occurs at step 406. For example, at 501, a request is received from a user, preferably on the client machine. At 502, the program and the program arguments are read. The program is then stored in memory at 503.
Referring now to FIG. 8, a flowchart of a specific aspect of the flowchart of FIG. 6 is shown. FIG. 8 details more information on what occurs at step 401. For example, at 601 results from at least one database are received. Then at 605, results of the program are stored and at 603 a display including the results is generated.
Additionally, various programs can be used in the present invention on the DBMS.
These programs, as shown in Tables 1-20, organized by the type of program and data gathered along with a short description of what the programs accomplish, may include, for example:
Assay Table 1
Figure imgf000037_0001
Figure imgf000038_0001
Barcode Table 2
Figure imgf000038_0002
Server Side Includes (SSI) Table 3
Figure imgf000038_0003
Chemistry Table 4
Figure imgf000038_0004
Figure imgf000039_0001
Devices Table 5
Figure imgf000039_0002
IC50 Table 6
Figure imgf000039_0003
Miscellaneous Table 7
NAME OF PROGRAM | DESCRIPTION OF PROGRAM
java-image | Java image capture
sql-query | Interface to allow the user to execute their own SQL queries (also may be used by JAVA® clients)
Clustering Table 8
Figure imgf000039_0004
Molecule (chemical structure) Table 9
Figure imgf000039_0005
Figure imgf000040_0001
Overview Table 10
Figure imgf000040_0002
Plates Table 11
NAME OF PROGRAM    DESCRIPTION OF PROGRAM
get-assays         Retrieves all assay data for the compounds on the plate
get-plate          Interface for getting a plate data sheet
Programs Table 12
Figure imgf000040_0003
Screening Table 13
Figure imgf000040_0004
Figure imgf000041_0001
Figure imgf000042_0001
Tissue and Genes Table 18
Figure imgf000042_0002
Structure Searches Table 19
NAME OF PROGRAM                         DESCRIPTION OF PROGRAM
unity-search/hitlist                    Displays results from a UNITY hitlist
unity-search/sketch                     Interface to query sketchers
unity-search/submit-unity-sim-search    Interface to chemical similarity searches
unity-search/unity-results              Does unity search and displays results
Animal Data Table 20
Figure imgf000042_0003
Figure imgf000043_0001
Further, various programs can be used in the present invention on the client computer and are enabled by the applet, as shown in Tables 21-31. These programs include applets that allow the incorporation of dynamic content into web pages. These programs, organized by the type of program and data gathered along with a short description of what the programs accomplish, may include, for example:
Assay Table 21
Figure imgf000043_0002
Plate Table 26
Figure imgf000044_0001
FIGS. 9 - 23 show images from a preferred embodiment of the present invention. These images are the copyright of Arena Pharmaceuticals, Inc., and are subject to the restrictions listed above.
Referring now to FIG. 9, a view of an introduction screen as output from one embodiment of the present invention is shown. This screen allows a user to navigate through the various features of the present invention.
Referring now to FIG. 10, a visualization tool for runset counts as output from one embodiment of the present invention is shown. Information on the plate, as well as various colors to demonstrate different wells, is shown.
Referring now to FIG. 11, a hydrophobicity sequence analysis, with two applet output windows, as output from one embodiment of the present invention is shown. The two applet windows show various graphs that may be output to visualize the sequence analysis.
Referring now to FIG. 12, a histogram of frequency counts for a runset, as output from one embodiment of the present invention is shown. The histogram is a plot of the various data points. Various qualities about the histogram, such as the maximum and minimum points are also illustrated.
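For illustration only, the frequency counts and the maximum and minimum points of such a histogram could be computed as in the following Python sketch. The function name `histogram`, the fixed `bin_width` parameter, and the sample values are assumptions made for this example; the specification does not state how the bins are chosen.

```python
from collections import Counter
from typing import Dict, List, Tuple


def histogram(values: List[float],
              bin_width: float) -> Tuple[Dict[int, int], float, float]:
    """Return (count per bin index, minimum value, maximum value)."""
    if not values:
        raise ValueError("no data points")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    lo, hi = min(values), max(values)
    # Bin 0 starts at the minimum value; each bin spans `bin_width`.
    counts = Counter(int((v - lo) // bin_width) for v in values)
    return dict(counts), lo, hi


# Usage with made-up data points for a runset.
counts, lo, hi = histogram([10.0, 12.0, 25.0, 31.0, 38.0], bin_width=10.0)
```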
Referring now to FIG. 13, a view of a molecule data sheet as output from one embodiment of the present invention is illustrated. A molecule may be searched and looked up using the present invention. Information on the molecule, such as its structure, its ID, SLN, molecular weight, clogP, and comments, can easily be seen by the output of the present invention.
Referring now to FIG. 14, a primary screening runset as output in two windows from an embodiment of the present invention is shown. This shows data from a runset. The number of the plates, along with information obtained about each plate, is also illustrated. For example, the information is grouped by plates. This figure is shown in two windows because the output may be too voluminous to be displayed on one screen and thus a user has to scroll down to see the rest of the data.
Referring now to FIG. 15, a primary screening plate as output in two windows from an embodiment of the present invention is shown. The results show information such as the runset, plate ID, assay, target, assay method, plate map, secondary assay, runset comments, and the date. This figure is shown in two windows because the output may be too voluminous to be displayed on one screen and thus a user has to scroll down to see the rest of the data.
Referring now to FIG. 16, animal data output for a motor function/dysfunction experiment as output from one embodiment of the present invention is shown. This data shows useful information about a local motor experiment.
Referring now to FIG. 17, an interface while a program is running as output from one embodiment of the present invention is shown. The present invention outputs update information to the user such as "setting up search." Additionally, a timer indicates how long the program has taken to execute.
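One hedged way to picture the update messages and the timer is the Python generator below, which yields a status message together with the elapsed time before each stage of a program runs. The function name `run_with_updates`, the stage list, and the messages "searching" and "done" are hypothetical; only "setting up search" comes from the description above.

```python
import time
from typing import Callable, Iterator, List, Tuple


def run_with_updates(
    stages: List[Tuple[str, Callable[[], None]]]
) -> Iterator[Tuple[str, float]]:
    """Yield (status message, seconds elapsed) around each stage."""
    start = time.monotonic()
    for message, work in stages:
        # Report the stage that is about to run, with the timer value.
        yield message, time.monotonic() - start
        work()
    # Final update once every stage has completed.
    yield "done", time.monotonic() - start


# Usage with placeholder stages that do no real work.
updates = list(run_with_updates([
    ("setting up search", lambda: None),
    ("searching", lambda: None),
]))
```

A client such as an applet could display each yielded pair as it arrives, rather than collecting them in a list as this usage example does.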
Referring now to FIG. 18, an IC50 form for searching results from IC50 searches as output in two windows from an embodiment of the present invention is shown. This allows a user to enter various search parameters and receive search results. The screen on the left depicts the entry form for entering search parameters and the screen on the right depicts the results that are output.
Referring now to FIG. 19, a scatter plot comparison, with axes inputs, as output from one embodiment of the present invention is shown. The scatterplot shows the relationship between assays. Additionally, two input windows are also shown, in which the X and Y axes may be selected.
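As a sketch only, the points of such a comparison could be formed by pairing the results of the assay selected for the X axis with those of the assay selected for the Y axis, keeping the compounds measured in both. The function name `scatter_points`, the dictionary layout keyed by compound identifier, and the sample values are assumptions for illustration and are not taken from the specification.

```python
from typing import Dict, List, Tuple


def scatter_points(x_assay: Dict[str, float],
                   y_assay: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Pair results by compound ID; keep compounds present in both assays."""
    shared = sorted(x_assay.keys() & y_assay.keys())
    return [(cid, x_assay[cid], y_assay[cid]) for cid in shared]


# Usage with made-up results for two assays.
points = scatter_points(
    {"C1": 0.5, "C2": 1.2, "C3": 3.3},
    {"C2": 1.0, "C3": 2.9, "C4": 0.1},
)
```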
Referring now to FIG. 20, a query for searching database results as output in two windows from an embodiment of the present invention is shown. For example, a structure may be entered, along with the databases to search, the search type, the cutoff, and a maximum number of hits. The present invention outputs the hits. In this example, there were 2107 hits. The output is displayed such that a user may scroll down the various hits and select one of the hits if desired. The window on the left displays the input window while the window on the right displays the output window.
Referring now to FIG. 21, a form for entering assay data as output from one embodiment of the present invention is shown. The form includes various inputs, such as plate number, comments, and entries for the various substance numbers.
Referring now to FIG. 22, dot blot experimental details as output in two windows from an embodiment of the present invention is shown. The details show an image of the dot blot along with various points of information for the dot blot experiment. This figure is shown in two windows because the output may be too voluminous to be displayed on one screen and thus a user has to scroll down to see the rest of the data.
Referring now to FIG. 23, an IC50 plate assay data sheet as output in two windows from an embodiment of the present invention is shown. This displays various information about the assay plate, for example, the runset, assay method, and/or a control summary. This figure is shown in two windows because the output may be too voluminous to be displayed on one screen and thus a user has to scroll down to see the rest of the data.
One skilled in the art will readily appreciate that the present invention is well adapted to carry out the objects and obtain the ends and technical advantages mentioned, as well as those inherent therein. The embodiments presented herein are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. It will be readily apparent to one skilled in the art that various substitutions and modifications may be made to the invention disclosed herein without departing from the scope and spirit of the invention.
All patents, patent applications, and publications mentioned in the specification are indicative of the levels of those skilled in the art to which the invention pertains. All patents and publications are herein incorporated by reference to the same extent as if each individual publication was specifically and individually indicated to be incorporated by reference.
The invention illustratively described herein suitably may be practiced in the absence of any element or elements, limitation or limitations not specifically disclosed herein. The terms and expressions which have been employed are used as terms of description and not of limitation, and there is no intention that in the use of such terms and expressions of excluding any equivalents of the features illustrated and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modifications and variations of the concepts herein disclosed may be appreciated by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention as defined by the appended claims. The invention has been set forth broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein. Other embodiments are within the following claims.

Claims

What is claimed is:
1) A database management system that provides an interface to a plurality of databases storing biomolecular data, said system comprising: a plurality of databases storing biomolecular data; at least one processing unit; a first computer instruction that directs said processing unit to receive a request for access to biomolecular data stored in at least one of said plurality of databases; a second computer instruction that determines which of said plurality of databases stores said biomolecular data; a third computer instruction that accesses said biomolecular data in said at least one of said plurality of databases; a fourth computer instruction that receives said biomolecular data from said at least one of said plurality of databases; and a web page that is generated by said processing unit and displays said biomolecular data received from said at least one of said plurality of databases.
2) The system of claim 1, wherein said plurality of databases includes a relational database that stores statistical biomolecular data.
3) The system of claim 2 wherein said relational database comprises a molecule identification number for a molecule.
4) The system of claim 2 wherein said relational database comprises a structural descriptor of a molecule.
5) The system of claim 2 wherein said relational database comprises assay results for a molecule.
6) The system of claim 1 wherein said plurality of databases include a chemical structure search system.
7) The system of claim 6 wherein said chemical structure search system is a UNITY system.
8) The system of claim 6 wherein said chemical structure search system comprises a chemical structure of at least one molecule.
9) The system of claim 8 wherein said third computer instruction comprises computer instructions that generate a search for structures similar to a desired molecule.
10) The system of claim 8 wherein said third computer instructions comprises computer instructions that generate a search of said database for an exact match to a desired molecule.
11) The system of claim 8 wherein said third computer instruction comprises computer instructions that generate a hit-list objects storing records found in search of said chemical structure search system.
12) The system of claim 8 wherein said chemical structure search system comprises a registration identifier for said at least one molecule.
13) The system of claim 2 wherein said plurality of databases comprises a gene sequence database.
14) The system of claim 13 wherein said first computer instructions comprises instructions to said first processing unit to generate a sequence homology search for a sequence in said gene sequence database.
15) The system of claim 1 wherein said web page comprises an applet.
16) The system of claim 15 wherein said applet comprises an update of progress of a program being executed by said at least one of said plurality of databases.
17) The system of claim 15 wherein said applet comprises a comparison of two assays of said biomolecular data.
18) The system of claim 1, wherein said plurality of databases comprises databases that store chemical, screening, and genomic data.
19) The system of claim 18 wherein said first computer instruction comprises instructions to said first processing unit to receive said request from a second processing unit and further wherein said web page is transmitted to said second processing unit.
20) The system of claim 18 wherein said first processing unit receives an interface request to provide a graphical user interface to said second processing unit, said first processing unit generates said graphical user interface; and said first processing unit transmits said graphical user interface to said second processing unit.
21) The system of claim 18 wherein at least one of said plurality of databases is located on a remote database maintained by a remote processing unit and said first processing unit accesses said biomolecular data from said remote database.
22) The system of claim 18 wherein said biomolecular data received from said at least one of said plurality of databases is stored in a computer memory.
23) The system of claim 22, wherein said biomolecular data received from said at least one of said plurality of databases is stored in tabular data files.
24) The system of claim 18 further comprising a fifth computer instruction that directs said first processing unit to add said biomolecular data to said plurality of databases based on said second computer instruction.
25) The system of claim 18 further comprising a sixth computer instruction that directs said first processing unit to edit biomolecular data stored in said at least one of said plurality of databases based on said second computer instruction.
26) The system of claim 18, further comprising a seventh computer instruction that directs said first processing unit to perform a software join on one of said plurality of databases with another of said plurality of databases.
27) The method of claim 18 wherein said biomolecular data comprises at least one of: chemical, biological, and genomic data.
28) The method of claim 1, wherein said second computer instruction comprises looking in all of said plurality of databases.
29) The method of claim 1, further comprising at least one laboratory instrument that collects data, wherein said laboratory instrument is connected to said first processing unit.
30) The method of claim 29, wherein data collected from said laboratory instrument is stored in said plurality of databases.
31) A method for providing an interface to a plurality of databases storing biomolecular data, comprising the steps of: receiving a request to access biomolecular data stored in at least one of a plurality of databases; determining which of said plurality of databases store said biomolecular data; generating instructions to access said biomolecular data in said at least one of said plurality of databases; receiving said biomolecular data from said at least one of said plurality of databases; and generating a web page display of said biomolecular data received from said at least one of said plurality of databases.
32) The method of claim 31 wherein step of generating said instructions comprises generating a request for a structural descriptor of a molecule.
33) The method of claim 31 wherein step of generating said instructions comprises generating a request to retrieve assay results for a molecule.
34) The method of claim 31 wherein said step of generating said instructions comprises generating a request to access said biomolecular data stored in a chemical structure search system.
35) The method of claim 31 wherein said step of generating said instructions comprises generating a request to find structures similar to a molecule in a chemical structure search system.
36) The method of claim 35 wherein said step of generating said instructions comprises generating a request to find an exact match to a molecule in a chemical structure search system.
37) The method of claim 31 wherein said biomolecular data comprises at least one of chemical, biological, or genomic data.
38) The method of claim 37 wherein said step of generating said instructions comprises generating a sequence homology search for a sequence in a gene sequence database.
39) The method of claim 37 wherein step of receiving said request comprises receiving said request from a second processing unit.
40) The method of claim 37 further comprising transmitting said display to said second processing unit.
41) The method of claim 40 wherein said step of generating said display comprises generating an applet displaying said biomolecular data.
42) The method of claim 41 further comprising maintaining a connection with said second processing unit to execute programs requested by said applet.
43) The method of claim 41 wherein step of generating said applet comprises generating an update of progress of a program being executed by said at least one of said plurality of databases.
44) The method of claim 41 wherein said step of generating said applet comprises generating a colored grid displaying a visualization of plate results of said biomolecular data.
45) The method of claim 41 wherein step of generating said applet comprises generating a visualization of dose-response data.
46) The method of claim 41 wherein said step of generating said applet comprises generating a comparison of two assays of said biomolecular data.
47) The method of claim 31 further comprising: receiving an interface request to provide a graphical user interface to said second processing unit, wherein said graphical user interface comprises a web page; generating said graphical user interface; and transmitting said graphical user interface to said second processing unit.
48) The method of claim 47 further comprising: reading a program from said request wherein said program manipulates said biomolecular data in said at least one of said plurality of databases from said request; generating said instructions to execute said program in said at least one of said plurality of databases; transmitting said instructions to said at least one of said plurality of database; transmitting updates to said second processing unit indicating said program is being executed responsive to said at least one plurality of databases executing said program; and transmitting results of said program to said second processing unit responsive to said program being completed.
49) The method of claim 48 further comprising accessing said biomolecular data from a remote database maintained by a remote processing unit.
50) The method of claim 49 further comprising storing said biomolecular data received from said at least one of said plurality of databases.
51) The method of claim 48, wherein said step of generating said instructions comprises generating said instructions to edit biomolecular data stored in said at least one of said plurality of databases.
52) The method of claim 48, wherein said step of transmitting said instructions comprise periodically transmitting said instructions to perform iterative functions.
53) The method of claim 48, wherein said step of receiving said request to access biomolecular data stored in at least one of a plurality of databases comprises receiving a request, by said database management system, to access biomolecular data stored in at least one of a plurality of databases.
54) The method of claim 48, wherein said step of determining which of said plurality of databases store said biomolecular data comprises determining, by said database management system, which of said plurality of databases store said biomolecular data.
55) The method of claim 54, wherein said step of generating instructions to access said biomolecular data in said at least one of said plurality of databases comprises generating instructions, by said database management system, to access said biomolecular data in said at least one of said plurality of databases.
56) The method of claim 55, wherein said step of receiving said biomolecular data from said at least one of said plurality of databases comprises receiving, by said database management system, said biomolecular data from said at least one of said plurality of databases.
57) The method of claim 31 further comprising: reading a program to execute on said biomolecular data in said at least one of said plurality of databases; generating said instructions for executing said programs for said at least one of said plurality of databases; transmitting said instructions to said at least one of said plurality of databases; receiving updates indications said at least one of said plurality of databases is executing said program; indicating to said user that said at least one of said plurality of databases is executing said program; and generating a web page display of results of said program responsive to said one of said plurality of databases completing execution.
58) A method for providing an interface to a plurality of databases storing biomolecular data over a system of networked computers, said method comprising: processing computer instructions that direct a first computer to receive a request for access to biomolecular data stored in at least one of said plurality of databases, wherein said request was sent over a system of networked computers from a second computer; automatically determining which of said plurality of databases store said biomolecular data; automatically accessing said biomolecular data in said at least one of said plurality of databases; automatically receiving said biomolecular data from said at least one of said plurality of databases; automatically generating a web page file comprising said biomolecular data; and sending said web page file over said system of networked computers to said second computer.
59) The method of claim 58, wherein said system of networked computers is the Internet.
60) The method of claim 58, wherein said step of automatically determining searches a relational database, a chemical database, and a bioinformatics database.
61) The method of claim 58, wherein said web page comprising said biomolecular data comprises said biomolecular data in a convenient format.
62) The method of claim 58, further comprising: transmitting updates to said second computer indicating said method is being executed until said step of automatically generating a web page.
63) The method of claim 58, wherein said first computer is a server computer and said second computer is a client computer.
64) The method of claim 58, wherein said step of automatically determining which of said plurality of databases store said biomolecular data comprises automatically determining, by said first computer, which of said plurality of databases store said biomolecular data.
65) The method of claim 58, wherein said step of automatically accessing said biomolecular data in said at least one of said plurality of databases comprises automatically accessing, by said first computer, said biomolecular data in said at least one of said plurality of databases.
66) The method of claim 58, wherein said step of automatically generating a web page comprising said biomolecular data comprises automatically generating a web page, by said first computer, comprising said biomolecular data.
67) The method of claim 58, wherein said first computer is a database management system.
68) The method of claim 58, wherein said step of processing computer instructions requires a password.
69) A computer system for electronically retrieving biomolecular data from a plurality of databases over a system of networked computers, wherein said computer system comprises at least one central processing unit (CPU) and random access memory (RAM) coupled to said CPU, for use in compiling a target program to run on a target computer architecture, said computer system comprising: a client computer; a plurality of databases comprising biomolecular data; a database management system; a first electronic connection between said database management system and said client computer, wherein said first electronic connection is over a system of networked computers and said client computer requests biomolecular data from said database management system in a desired format and said database management system determines which of said plurality of databases stores said requested biomolecular data; a second electronic connection between said database management system and said plurality of databases, wherein said database management system accesses said biomolecular data from said plurality of databases; and a web page that is output from said database management system and sent to said client computer over said first electronic connection, wherein said web page comprises said biomolecular data in said desired format.
70) The system of claim 69, wherein said desired format comprises a histogram.
71) The system of claim 69, wherein said desired format comprises a table.
72) The system of claim 69, wherein said desired format comprises a chemical structure.
73) The system of claim 69, wherein said web page is in HTML format or XML format.
74) The system of claim 69, wherein said web page comprises an applet.
75) A method for providing an interface to a plurality of databases storing biomolecular data, said method comprising: processing computer instructions that direct a computer to receive a request for access to data output from an instrument, wherein said instrument is connected to said computer and said computer is connected to a plurality of databases that store biomolecular data; gathering said data from said instrument; determining which of said plurality of databases is associated with said data output from said instrument; accessing said at least one of said plurality of databases associated with said data output from said instrument; and storing said data from said instrument in said at least one of said plurality of databases.
76) The method of claim 75, wherein said instrument is a laboratory instrument.
77) The method of claim 76, wherein said plurality of databases comprises databases that store chemical, screening, and genomic data.
78) The method of claim 77, further comprising generating a web page file comprising said data output from said instrument.
79) The method of claim 78, wherein said plurality of databases are over a system of networked computers and said step of accessing occurs over said system of networked computers.
80) The method of claim 79, wherein said data from said instrument is used to modify data already existing in said at least one of said plurality of databases.
81) The method of claim 80, wherein said web page comprises an applet.
PCT/US2001/003038 2000-03-29 2001-01-30 Universal biomolecular data system WO2001073587A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001233134A AU2001233134A1 (en) 2000-03-29 2001-01-30 Universal biomolecular data system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US19306500P 2000-03-29 2000-03-29
US60/193,065 2000-03-29
US63583300A 2000-08-09 2000-08-09
US09/635,833 2000-08-09

Publications (2)

Publication Number Publication Date
WO2001073587A2 true WO2001073587A2 (en) 2001-10-04
WO2001073587A3 WO2001073587A3 (en) 2003-01-03

Family

ID=26888647

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/003038 WO2001073587A2 (en) 2000-03-29 2001-01-30 Universal biomolecular data system

Country Status (2)

Country Link
AU (1) AU2001233134A1 (en)
WO (1) WO2001073587A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5577239A (en) * 1994-08-10 1996-11-19 Moore; Jeffrey Chemical structure storage, searching and retrieval system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DAVIDSON S B ET AL: "BIOKLEISLI: A DIGITAL LIBRARY FOR BIOMEDICAL RESEARCHERS" INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, HEIDELBERG, DE, vol. 1, no. 1, April 1997 (1997-04), pages 36-53, XP000904387 ISSN: 1432-5012 *
KERLAVAGE A R ET AL: "DATA MANAGEMENT AND ANALYSIS FOR HIGH-THROUGHPUT DNA SEQUENCING PROJECTS" IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE, IEEE INC. NEW YORK, US, vol. 14, no. 6, 1 November 1995 (1995-11-01), pages 710-717, XP000598295 ISSN: 0739-5175 *
PATON N W ET AL: "Query processing in the TAMBIS bioinformatics source integration system" SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 1999. ELEVENTH INTERNATIONAL CONFERENCE ON CLEVELAND, OH, USA 28-30 JULY 1999, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 28 July 1999 (1999-07-28), pages 138-147, XP010348735 ISBN: 0-7695-0046-3 *
WANG CHIEW TAN ET AL: "QUICK: graphical user interface to multiple databases" DATABASE AND EXPERT SYSTEMS APPLICATIONS, 1996. PROCEEDINGS., SEVENTH INTERNATIONAL WORKSHOP ON ZURICH, SWITZERLAND 9-10 SEPT. 1996, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 9 September 1996 (1996-09-09), pages 404-409, XP010200903 ISBN: 0-8186-7662-0 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7428554B1 (en) 2000-05-23 2008-09-23 Ocimum Biosolutions, Inc. System and method for determining matching patterns within gene expression data
US7177878B2 (en) * 2004-04-13 2007-02-13 International Business Machines Corporation Simple persistence mechanism for server based web applications
US9607490B2 (en) 2012-09-13 2017-03-28 Sony Corporation Haptic device
WO2017041016A1 (en) * 2015-09-03 2017-03-09 Becton, Dickinson And Company Methods and systems for providing labelled biomolecules
US11187699B2 (en) 2015-09-03 2021-11-30 Becton, Dickinson And Company Methods and systems for providing labelled biomolecules
US11860159B2 (en) 2015-09-03 2024-01-02 Becton, Dickinson And Company Methods and systems for providing labelled biomolecules

Also Published As

Publication number Publication date
WO2001073587A3 (en) 2003-01-03
AU2001233134A1 (en) 2001-10-08

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP