US20040019432A1 - System and method for integrated computer-aided molecular discovery - Google Patents
- Publication number
- US20040019432A1 (US 10/410,965)
- Authority
- US
- United States
- Prior art keywords
- module
- computer
- aided molecular
- molecular discovery
- executing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
- ZGBPVHDDOPDOGP-UHFFFAOYSA-N COC1=C(OC)C=C(CNC(=O)C2=CC(Cl)=CC=C2NC(=O)C2=NNC3=CC=CC=C32)C=C1 Chemical compound COC1=C(OC)C=C(CNC(=O)C2=CC(Cl)=CC=C2NC(=O)C2=NNC3=CC=CC=C32)C=C1 ZGBPVHDDOPDOGP-UHFFFAOYSA-N 0.000 description 2
- RMYUTCQOSXGHIC-UHFFFAOYSA-N CC(NC(=O)C(NC(=O)/C1=N/NC2=CC=CC=C21)C1CCNCC1)C1=CC=CC=C1 Chemical compound CC(NC(=O)C(NC(=O)/C1=N/NC2=CC=CC=C21)C1CCNCC1)C1=CC=CC=C1 RMYUTCQOSXGHIC-UHFFFAOYSA-N 0.000 description 1
- XFMXUMOIPYHMFP-UHFFFAOYSA-N CCCCN(CCCC)CCCNC(=O)C1=CC(Cl)=CC=C1NC(=O)C1=NNC2=CC=CC=C21 Chemical compound CCCCN(CCCC)CCCNC(=O)C1=CC(Cl)=CC=C1NC(=O)C1=NNC2=CC=CC=C21 XFMXUMOIPYHMFP-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/20—Heterogeneous data integration
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/20—ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/40—ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
- G16H40/67—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99936—Pattern matching access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99944—Object-oriented database structure
- Y10S707/99945—Object-oriented database structure processing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99948—Application of database or data structure, e.g. distributed, multimedia, or image
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Primary Health Care (AREA)
- Epidemiology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Radiology & Medical Imaging (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- General Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Bioethics (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- User Interface Of Digital Computer (AREA)
- Image Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
Systems and methods for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications are described. In one embodiment, an integrated user interface provides the user with access to the capabilities from a broad array of commercial and custom application modules, such as calculation engines. In another embodiment of this invention a heterogeneous cluster provides the ability to divide the processing required for a molecular discovery process to increase efficiency and reduce the time required to perform molecular discovery.
Description
- This application claims priority under 35 USC 119 from U.S. provisional application serial No. 60/371,644, entitled “System and Method for Data Analysis, Manipulation and Visualization”, filed Apr. 10, 2002; U.S. provisional application serial No. 60/371,956, entitled “System and Method for Data Analysis, Manipulation and Visualization”, filed Apr. 11, 2002; U.S. provisional application serial No. 60/371,643, entitled “System and Method for Integrated Computer-Aided Molecular Discovery,” filed Apr. 10, 2002; and U.S. provisional application serial No. 60/371,871, entitled “System and Method for Integrated Computer-Aided Molecular Discovery,” filed Apr. 11, 2002. The disclosure of each of these provisional applications is hereby incorporated herein by reference. This application also relates to U.S. patent application Ser. No. 10/120,278 entitled “Probes, Systems, and Methods for Drug Discovery,” filed Apr. 10, 2002 which is incorporated herein by reference. This application also relates to attorney docket number 41305-283186, filed simultaneously, entitled “System and Method for Data Analysis, Manipulation and Visualization” which is incorporated herein by reference.
- A portion of the disclosure of this patent document and its figures contain material subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, but otherwise reserves all copyrights whatsoever.
- This invention generally relates to computer-aided molecular discovery. This invention more particularly relates to a system and method for integrating diverse commercial and custom computer application modules to perform molecular discovery.
- Combinatorial chemistry historically required the screening of thousands of compounds to identify a lead compound or scaffold(s) for detailed analysis. The introduction of computational methods for identifying the scaffold(s) has increased the efficiency of the screening by reducing the number of compounds to be analyzed. However, although the number of compounds to be analyzed has decreased, the number remains large.
- Also, performing computer-aided molecular discovery has conventionally required the use of multiple incompatible systems or the use of a single application, which includes some but not all of the functionality required to efficiently and effectively identify compounds of interest. An application from one software vendor may include one module or a suite of modules for performing various tasks in the discovery process. Generally, the application modules from one vendor work together by design; however, they are generally incompatible with modules from other vendors.
- Conventionally, chemists and biologists have addressed the incompatibility or ineffectiveness of existing systems in various ways. One approach has been to perform each step in the process using the best tool, regardless of what application it resides in. Following this approach is relatively slow and labor-intensive. The scientist must wait for each step to finish before beginning the next step. In addition, various modules from different applications are often incompatible, forcing the scientist to make changes to the input and output files manually in order to continue the process.
- Because of the effort involved in attempting to integrate multiple incompatible systems, many scientists simply use a single application. Unfortunately, a single application may not perform every function or may not perform every function in an optimal manner. By following this approach, the scientist is forced to accept the shortcomings of a particular module within an application in order to avoid the manual processes involved in using the best modules of several applications.
- In addition to the problems experienced by scientists, the computer and network administrators face multiple challenges in supporting these scientific applications. One challenge is to ensure that all applications, including the user interfaces, operate on the scientists' workstations. Another major challenge is ensuring that the performance of the system is acceptable. Some of these applications are computationally intensive and place great demands on the system on which they are installed.
- The problem of ensuring adequate performance of a computationally-intensive application is conventionally addressed in a number of ways. The simplest approach is to buy more powerful servers. However, very powerful servers, such as supercomputers, are also very expensive. Also, one server may not be able to run all the diverse applications required by the scientists.
- Other approaches for addressing these needs include implementing a grid computing architecture. One such approach used by some organizations outside the molecular discovery area is to use the unused computer cycles of computers on a network. For example, the Search for Extraterrestrial Intelligence (SETI) Institute (SETI.org) has established a program called SETI@Home (http://www.seti.org/science/setiathome.html), whereby individuals download a software module that allows SETI to use the unused computing cycles on the individuals' computers. However, as SETI states, “the data processing does not occur ‘real time’ so that interesting signals must be followed up at a later date”. The SETI system does not attempt to determine which systems are available. When an individual's computer is not busy, for example, when the screensaver appears, the computer requests a “work unit” from SETI. If and when processing is complete, the computer transfers the file back to SETI. Ensuring that work is performed on a timely basis is very difficult with the SETI implementation and determining the status of a particular job is virtually impossible until that job is returned.
- Various other systems use similar approaches. For example, the FightAIDS@Home project (http://www.fightaidsathome.org/) utilizes MS Windows PCs in much the same manner as SETI. FightAIDS@Home is even more limited than the SETI project, requiring that the computers have a Windows operating system.
- Embodiments of this invention provide systems and methods for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications. Embodiments utilize proprietary and customized tools to perform efficient computer-aided molecular discovery in a parallel, automated, and platform independent fashion.
- One embodiment of this invention includes a method for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the method comprising: (a) receiving an input; (b) providing said input to a first module of a first computer-aided molecular discovery application; (c) providing said input to a second module of a second computer-aided molecular discovery application; (d) executing said first module to create a first output; and (e) executing said second module to create a second output.
- Another embodiment of this invention includes a method for integrating a computer-aided molecular discovery process using a plurality of computer-aided molecular discovery applications, the method comprising: (a) receiving an input; (b) providing said input to a first module of a first computer-aided molecular discovery application; (c) executing said first module to create a first output; (d) providing said first output to a second module of a second computer-aided molecular discovery application; and (e) executing said second module to create a second output.
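The two method embodiments above differ only in how data reaches the second module: in one, both modules receive the same input; in the other, the second module receives the first module's output. A minimal sketch of both patterns follows. The names `run_same_input`, `run_chained`, `module_a`, and `module_b` are hypothetical illustrations, not modules or interfaces named in this document.

```python
from typing import Callable, Dict, List

# A "module" is modeled as any callable mapping an input string to an output
# string. This is an illustrative assumption, not the interface of any real
# molecular discovery application.
Module = Callable[[str], str]


def run_same_input(data: str, modules: Dict[str, Module]) -> Dict[str, str]:
    """Provide the same input to every module and collect one output each."""
    return {name: module(data) for name, module in modules.items()}


def run_chained(data: str, modules: List[Module]) -> str:
    """Provide the input to the first module, then pass each output onward."""
    result = data
    for module in modules:
        result = module(result)
    return result


# Hypothetical stand-ins for modules from two different applications.
def module_a(sequence: str) -> str:
    return sequence.upper()


def module_b(sequence: str) -> str:
    return sequence[::-1]
```

With the first pattern the caller can compare the outputs side by side; with the second, only the final output is returned.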
- Another embodiment of this invention includes a system for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the system comprising: an application-neutral computer-aided molecular discovery application framework; a first module interface for a first computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework; and a second module interface for a second computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework.
- Another embodiment of this invention includes a computer-readable medium on which is encoded programming code for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the computer-readable medium comprising: (a) program code for receiving an input; (b) program code for providing said input to a first module of a first computer-aided molecular discovery application; (c) program code for providing said input to a second module of a second computer-aided molecular discovery application; (d) program code for executing said first module to create a first output; and (e) program code for executing said second module to create a second output.
- Another embodiment of this invention includes a computer-readable medium on which is encoded programming code for integrating a computer-aided molecular discovery process using a plurality of computer-aided molecular discovery applications, the computer-readable medium comprising: (a) receiving an input in a user interface; (b) providing said input to a first module of a first computer-aided molecular discovery application; (c) executing said first module to create a first output; (d) providing said first output to a second module of a second computer-aided molecular discovery application; and (e) executing said second module to create a second output.
- Another embodiment of this invention includes a laboratory comprising a system for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the system comprising: an application-neutral computer-aided molecular discovery application framework; a first module interface for a first computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework; and a second module interface for a second computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework.
- Another embodiment of this invention includes a computer network comprising a system for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the system comprising: an application-neutral computer-aided molecular discovery application framework; a first module interface for a first computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework; and a second module interface for a second computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework.
- In one embodiment of this invention, the user is provided with a graphical interface that provides the user with the capabilities of a broad array of application modules, such as calculation engines, from a variety of commercial and custom applications. The calculations may be model and platform independent. Therefore, implementation of new calculation methods is very simple. One embodiment of this invention is capable of utilizing many different computer platforms, including UNIX and LINUX, and provides load balancing for heterogeneous clusters comprising multiple computing platforms.
- Since the system is able to utilize a variety of applications and modules, the system is extremely flexible. The user and/or system administrator chooses the modules to use for performing each task or sub-task.
- An embodiment of this invention provides an automated, integrated all-in-one solution. One embodiment includes a queuing system that discriminates job priorities and provides a “divide and conquer” mechanism in job execution. The user interface provides web deployable standardized docking protocols for novice users. One embodiment provides output plate-id information for subsequent screening activities and/or the ability to flexibly include standard drug-like filters. Embodiments of this invention are reconfigurable for any combination of modeling tools.
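As an illustration of a queue that discriminates job priorities and of a "divide and conquer" split, the sketch below uses a binary heap and simple chunking. It is one possible reading under stated assumptions, not the queuing system of this document: the class `JobQueue`, the function `split_job`, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
from itertools import count
from typing import List, Sequence, Tuple


class JobQueue:
    """Priority queue of job names; lower number = higher priority (assumed)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = count()  # tie-breaker keeps submission order stable

    def submit(self, priority: int, name: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def next_job(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def split_job(items: Sequence[str], parts: int) -> List[List[str]]:
    """Divide a job's input items into at most `parts` contiguous chunks."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if not items:
        return []
    size = -(-len(items) // parts)  # ceiling division
    return [list(items[i:i + size]) for i in range(0, len(items), size)]
```

Each chunk returned by `split_job` could then be submitted as its own job, so that a large screening run does not block smaller, higher-priority work.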
- Also, embodiments of this invention provide enormous benefits in terms of scalability. Each of the processes of the system may be executed in a parallel manner utilizing a heterogeneous cluster of networked computers. These computers may be different in terms of both hardware and operating system from one another. The system determines which nodes of the cluster are available and offloads a portion of the processing for any step to the underutilized node.
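The node selection described above can be sketched as follows, assuming each node reports whether it is available and how many jobs it is currently running. The `Node` fields, the `pick_node` and `offload` functions, and the one-job-per-chunk cost model are hypothetical simplifications; the document does not specify how availability or utilization is measured.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    platform: str        # e.g. "IRIX" or "Linux"; nodes may differ
    available: bool
    running_jobs: int    # assumed utilization measure


def pick_node(nodes: List[Node]) -> Optional[Node]:
    """Return the available node with the fewest running jobs, if any."""
    candidates = [n for n in nodes if n.available]
    return min(candidates, key=lambda n: n.running_jobs) if candidates else None


def offload(chunks: List[str], nodes: List[Node]) -> Dict[str, List[str]]:
    """Assign each chunk of work to the currently least-utilized node."""
    plan: Dict[str, List[str]] = {}
    for chunk in chunks:
        node = pick_node(nodes)
        if node is None:
            raise RuntimeError("no available node in the cluster")
        plan.setdefault(node.name, []).append(chunk)
        node.running_jobs += 1  # the assigned chunk now counts as load
    return plan
```

Because the load counter is updated after every assignment, successive chunks spread across nodes rather than all landing on the node that was least busy at the start.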
- The flexibility of embodiments of this invention provides advantages to many different members of the computer-aided molecular discovery market. For example, a laboratory or other organization can increase the efficiency of its scientists, improve the utilization of its computing resources, and easily integrate the variety of applications necessary to perform discovery. Also, by utilizing embodiments of this invention, software developers are able to create custom or commercial modules that can be easily integrated with existing commercial applications. Embodiments of this invention also provide great flexibility to software sellers. The sellers can tout the benefit of multiple commercial applications, which can be integrated under a single easy-to-use interface. System integrators also benefit from utilizing embodiments of this invention. The process of integration becomes much simpler because the integrator is not forced to write various separate applications to integrate each of the various modules a molecular discovery laboratory utilizes.
- Further details and advantages of this invention are set forth below.
- These and other features, aspects, and advantages of this invention are better understood when the following Detailed Description is read with reference to the accompanying drawings, wherein:
- FIG. 1 illustrates an exemplary environment for one embodiment of this invention;
- FIG. 2 illustrates a multi-layer application framework in one embodiment of this invention;
- FIG. 3 illustrates one embodiment of this invention as a 3-level structure of interrelated modules;
- FIG. 4 illustrates the general process one embodiment of this invention utilizes in reference to the high-level modules of FIG. 3;
- FIG. 5 illustrates the process implemented by the Protein Sequence Translation module in one embodiment of this invention;
- FIG. 6 illustrates the binding site hypothesis process in one embodiment of this invention;
- FIG. 7 illustrates the docking or screening process in one embodiment of this invention;
- FIG. 8 illustrates the process implemented by the Selection and Analysis module in one embodiment of this invention;
- FIG. 9 illustrates the general process of presenting and updating the user interface and scheduling and executing jobs in one embodiment of this invention;
- FIG. 10 illustrates the search process in one embodiment of this invention;
- FIG. 11 illustrates the general process of creating and executing jobs in one embodiment of this invention;
- FIG. 12 illustrates utilizing templates and customized jobs in one embodiment of this invention;
- FIG. 13 illustrates providing email notification of search results in one embodiment of this invention;
- FIG. 14 illustrates providing modeling results via email in one embodiment of this invention;
- FIGS. 15A & B illustrate providing binding sites results via email in one embodiment of this invention;
- FIG. 16 illustrates automated docking results via email in one embodiment of this invention;
- FIG. 17 illustrates the creation and execution of a custom script for a commercial application module in one embodiment of this invention;
- FIG. 18 illustrates the pre-parallelization process in one embodiment of this invention;
- FIG. 19 illustrates the parallelization of a process in one embodiment of this invention;
- FIG. 20 illustrates a scheme for performing in silico screening of probes or compounds;
- FIGS. 21A and B are screen shots illustrating an advanced user configuration interface in one embodiment of this invention;
- FIG. 22 is a screen shot illustrating an administrator configuration interface in one embodiment of this invention;
- FIG. 23 is a screen shot illustrating a user interface for providing the status of jobs submitted to a heterogeneous cluster in one embodiment of this invention;
- FIG. 24 is a screen shot illustrating the help system in one embodiment of this invention;
- FIG. 25 illustrates the process of performing 3-D structure determination in one embodiment of this invention;
- FIG. 26 illustrates a process for binding site identification in one embodiment of this invention; and
- FIG. 27 illustrates a process for docking in one embodiment of this invention.
- Embodiments of this invention provide systems and methods for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications. One embodiment utilizes one or more modules from various commercial and custom application modules to perform a step in a molecular discovery process, utilizing structure-based, ligand-based, and/or property filter based approaches.
- Another embodiment utilizes various application modules to perform multiple steps or tasks in a molecular discovery process, such as the steps that comprise building homology model(s) and detecting a set of potential binding sites. Yet another embodiment incorporates both horizontal and vertical integration of multiple modules from both commercial and custom applications. Embodiments of this invention may utilize application modules that execute on any hardware/operating system platform and may provide the ability to execute modules in a parallel manner.
- In addition, embodiments of this invention may execute any portion of the discovery process in an iterative manner in order to attempt to enhance the results and/or simplify the process for the user. One embodiment of this invention provides a graphical user interface in which a user defines a series of tasks to be performed by one or more application modules. The user also provides the input data necessary for processing. In another embodiment, the user enters information into a file, which is used by the system to perform the discovery process.
- The tasks may comprise executing one or many application modules. An input may be provided to a plurality of tasks or the output from one task may provide input to a successive task. The output from a task may be processed by an appropriate script to modify the format as required by the input specification of the successive task. The tasks may perform similar or complementary functions. For example, a task may comprise using one input data set to execute multiple application modules that perform the same general function. The user is then able to compare the resulting output sets.
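The format-conversion step between successive tasks can be pictured as an adapter placed between two callables. The sketch below assumes a first task that emits comma-separated "name,score" lines and a second task that expects tab-separated lines. Both tasks, the adapter, and the helper `chain` are hypothetical examples and do not correspond to any real application's file format.

```python
from typing import Callable, Optional

Task = Callable[[str], str]


def chain(first: Task, second: Task, adapter: Optional[Task] = None) -> Task:
    """Build a task that runs `first`, converts its output, then runs `second`."""
    def run(data: str) -> str:
        output = first(data)
        if adapter is not None:
            output = adapter(output)  # reshape output to the next input format
        return second(output)
    return run


def score_ligands(names: str) -> str:
    """Hypothetical first task: emits one 'name,score' line per ligand name."""
    return "\n".join(f"{name},{len(name)}" for name in names.split())


def comma_to_tab(text: str) -> str:
    """Adapter script: convert comma-separated fields to tab-separated fields."""
    return text.replace(",", "\t")


def top_hit(table: str) -> str:
    """Hypothetical second task: expects 'name<TAB>score' lines."""
    rows = [line.split("\t") for line in table.splitlines()]
    return max(rows, key=lambda row: int(row[1]))[0]
```

For example, `chain(score_ligands, top_hit, comma_to_tab)` yields a single task in which neither module needs to know the other's format.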
- In one embodiment, a user assembles the tasks within a graphical user interface. Icons or other visual symbols may represent the tasks and the user arranges the tasks in a sequence or flow model to perform the desired discovery process. The interface provides the user with the capability of entering and/or modifying conditions, thresholds, iteration parameters, and other information necessary to execute the desired process.
- In defining the sequence the user may allow interruptions between various tasks so that the user can change the flow as needed. For example, if a user feels that multiple iterations of a task may be needed, the user can set an interruption in the flow after a particular task and decide after each iteration of the task whether to repeat the task or series of tasks until a desired result is achieved or to proceed with subsequent tasks in the process. In addition, the user may desire to repeat or modify particular task(s) using different application module(s). An embodiment of this invention allows such flexibility.
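An interruption point after a task can be sketched as a loop that pauses after each iteration and asks a decision function whether to repeat or to proceed. The function `run_with_interruption`, its `decide` callback, and the `max_iterations` safety limit are hypothetical; in an interactive system the decision would come from the user rather than from code.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_interruption(
    task: Callable[[T], T],
    data: T,
    decide: Callable[[int, T], bool],
    max_iterations: int = 10,
) -> T:
    """Repeat `task`, consulting `decide` after each iteration.

    `decide(iteration, result)` returns True to repeat the task and False
    to stop and hand the result to the subsequent tasks in the flow.
    """
    result = data
    for iteration in range(1, max_iterations + 1):
        result = task(result)
        if not decide(iteration, result):
            break
    return result


# Example: halve a value until it is no longer above a threshold of 1.0.
final = run_with_interruption(lambda x: x / 2, 8.0, lambda i, r: r > 1.0)
```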
- Once a user has optimized the sequence or flow of the process, an embodiment of this invention allows a user to store the sequence or flow for use by other users. Also, the tasks that make up the topology may be executed on a single computer or within a heterogeneous network. Execution in a heterogeneous network may comprise splitting the data and creating scripts to be distributed on the various computing platforms. One embodiment provides the expert users with various prioritization and management facilities.
- Referring now to the drawings in which like numerals represent like elements throughout the several figures, FIG. 1 illustrates an exemplary environment for one embodiment of this invention utilizing both horizontal and vertical integration as well as parallel execution. In the embodiment shown, a user workstation displays a user interface. The workstation may provide a command line interface, a graphical user interface, or any other interface with which a user may interact. For example, the user interface may comprise a web page, including HyperText Markup Language (HTML), Java, script, and other components. The user interface may also include Tool Command Language (TCL).
- A variety of hardware and operating system combinations may support the interface, including Silicon Graphics (SGI) workstations 102, Unix and Linux (*NIX) workstations 104, workstations capable of supporting one of the many flavors of Microsoft Windows 106, and Apple's Macintosh workstations 107.
- In the embodiment shown, the user workstation 102-107 accesses an application server 108. The application server 108 may include a web server. In such an embodiment, the web server generates the user interface, accepts parameters from the user interface, and inserts those parameters into a database to, among other purposes, initiate program flow in the application as is discussed in detail below. The application server 108 is able to provide the user interface to a plurality of users concurrently. The application server 108 may also comprise a plurality of application servers depending on the number of concurrent users anticipated by the systems administrator.
- In order to present the user interface and provide various other features, the application server 108 accesses a database. In the embodiment shown, the application server 108 accesses multiple databases, including remote databases 110 and local databases 112, such as control or administrative databases. These databases may include corporate or commercial databases. These databases may be stand-alone databases on a single database server, such as those exemplified by databases databases 114.
- In one embodiment of this invention, the application server 108 uses CGI (Common Gateway Interface), Personal Home Page Tools (PHP), eXtensible Markup Language (XML), and standard data access modules to provide the user interface and process user requests. To initiate jobs, the application server 108 also accesses a computer that executes an application module, such as a server or other member of a heterogeneous cluster 116.
- A heterogeneous cluster 116 includes processors. Processors can include, for example, digital logical processors capable of processing input, executing algorithms, and generating output as necessary. Such processors may include a microprocessor, an Application Specific Integrated Circuit (ASIC), and state machines. Such processors include, or may be in communication with, media, for example computer-readable media, which store instructions that, when executed by the processor, cause the processor to perform the steps described herein as carried out, or assisted, by a processor. Embodiments of computer-readable media include, but are not limited to, an electronic, optical, magnetic, or other storage or transmission device capable of providing a processor, such as the processor in a web server, with computer-readable instructions. Other examples of suitable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, ASIC, configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel.
- An application module is a program or portion of a program that can be executed in some manner through the user interface. The module may be an entire commercial application, a single module from a commercial application, a custom module, or some other executable code.
- By utilizing a variety of application modules to perform calculations, embodiments of this invention operate independently from the constraints of any one commercial application. In addition, it is relatively simple to implement new modules from various applications. Additionally, embodiments of this invention are not limited to operation on a single hardware and software platform. The modules may execute on any platform on which they are designed to function, including *NIX, Microsoft Windows, and other platforms. Not only does this platform independence increase the flexibility of a system according to this invention, it also increases the scalability. One embodiment of this invention is capable of balancing the processing load for performing calculations across heterogeneous clusters, such as heterogeneous cluster 116.
- Some commercial applications are only capable of running on a limited number of hardware and operating system environments. For example, one commercially available application may only execute within a Windows environment. An embodiment of this invention does not seek to provide a means for the application to run on hardware or operating systems on which it is not designed to run, but rather to allow the user to control the execution of a module or modules of the application from an integrated user interface.
- In the embodiment shown in FIG. 1, rather than accessing a single server, the application server 108 accesses a heterogeneous cluster 116 of computers that execute the application module specified by the application server 108. The heterogeneous cluster may include any type and number of computers, both workstations and servers. In the embodiment shown, the heterogeneous cluster includes a rack server 118, the SGI 102 and *NIX 104 workstations, which also may display the user interface, a server cluster 120, and a grid computing architecture 121. An example of the manner in which the application server 108 utilizes the heterogeneous cluster 116 is presented in detail below.
- To provide maximum flexibility and scalability, one embodiment of this invention utilizes the multi-layer application framework illustrated in FIG. 2 to process requests from the user interface. FIG. 2 will now be described with reference to the exemplary environment shown in FIG. 1. However, the environment shown in FIG. 1 is merely exemplary; the application framework shown in FIG. 2 is in no way limited to operating within the environment shown in FIG. 1.
- The application framework shown in FIG. 2 includes a user interface 202 executing on a user workstation, such as an SGI workstation 102. The user interface includes modules 204 a-d. The components 204 a-d may be presented individually in the user interface 202, such as with component-1 204 a and component-2 204 b, or be presented in combination 204 c,d. When a user specifies a request in the user interface 202, the embodiment shown in FIG. 2 executes an “Add Job” process 206. The “Add Job” process 206 creates database records in a table in a database, such as local database 110. For each component 204 a-d, multiple “Add Job” processes 206 may execute, creating multiple jobs 208 a-d. In addition, in a multi-user environment, each user interface creates independent jobs 208 a-d. As jobs 208 a-d are created, a “Status” process 209 alerts the user via user workstation 102 or via other means when a change in the status of the particular job 208 a-d occurs.
- In the embodiment shown in FIG. 2, a background process or daemon 210 is activated when jobs 208 a-d are created in the database 110. The daemon 210 executes the code necessary to create processes within the heterogeneous network 116 corresponding to the job 208 a-d. The daemon 210 may be a background process in a *NIX or other environment or may exist as a screen saver in a Microsoft Windows environment or as a process in a different operating environment.
- A hypothetical search provides an example of how the process shown in FIG. 2 works. A user wishes to search for a protein or nucleic acid structure, so the user enters search criteria in a component 204 a-d in the user interface 202. The search request causes the “Add Job” process 206 to add a job 208 a-d to database 110. The job 208 a-d includes various parameters, including, for example, the sequence, user name, search engines to utilize, and others. The daemon 210 evaluates these parameters and submits the job 208 a-d to one or more application modules, search 212 in FIG. 2, for processing. The search module 212 performs the necessary processing and then determines whether additional jobs must be performed 218. If so, the “Add Job” process 206 is again executed. If not, a “Notification” process 220 notifies the user that the process is complete. In the example, notification occurs via user workstation 102. However, notification may occur using a variety of methods, including fax, instant messaging, automated phone messaging, e-mail or any other means capable of providing notification to a user. As is shown in FIG. 2, one embodiment of this invention may utilize various application modules, including modeling 214 and docking 216 modules. “C-Engines” in FIG. 2 refer to calculation engines.
- FIG. 3 illustrates one embodiment of this invention as a 3-level structure of interrelated modules. The embodiment shown utilizes both horizontal and vertical integration of various application modules as well as the capability of executing various modules in a parallel manner. The embodiment shown integrates visualization, simulation, and application modeling development under the control of a comprehensive user interface 202. The user interface 202 may be a command-line interface, a browser-based interface, or other GUI. The scientific aspects of the embodiment shown include four broad high-level modules 302-308, which include twelve lower-level modules 312-334. In addition, the embodiment shown also includes an application framework module 310, which includes three lower-level modules 336-340. One embodiment of this invention need not include all of the modules shown in FIG. 3. The structure shown is merely illustrative of one embodiment of this invention.
- One embodiment of this invention delivers high throughput computer-aided molecular discovery by coupling computational chemistry with high throughput screening. Custom methodology modules can be developed by utilizing tools currently available in the software industry or created independently for data analysis, mining, and visualization. The system may utilize commands, macros, and scripts, allowing applications to be customized by end-users throughout an organization. Also, although embodiments have been described in terms of molecular discovery, the application framework implemented in embodiments of the present invention may be utilized in a variety of applications.
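- The “Add Job” process, daemon, and “Notification” process described for FIG. 2 can be sketched as a small database-backed job queue. The following is an illustrative sketch only: the `jobs` table layout, the function names, and the convention that a module returns a list of follow-up jobs are all assumptions made for the example, and an in-memory SQLite database stands in for database 110.

```python
import json
import sqlite3


def init_db(conn):
    # Hypothetical job table; the specification does not define a schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, module TEXT, params TEXT, status TEXT)"
    )


def add_job(conn, module, params):
    """'Add Job': record a request as a database row for the daemon to find."""
    cur = conn.execute(
        "INSERT INTO jobs (module, params, status) VALUES (?, ?, 'queued')",
        (module, json.dumps(params)),
    )
    conn.commit()
    return cur.lastrowid


def daemon_pass(conn, modules, notify):
    """One polling pass: run queued jobs, queue follow-ups, notify when done."""
    rows = conn.execute(
        "SELECT id, module, params FROM jobs WHERE status = 'queued' ORDER BY id"
    ).fetchall()
    for job_id, module, params in rows:
        follow_ups = modules[module](json.loads(params))
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
        conn.commit()
        if follow_ups:
            # Additional jobs are needed: run 'Add Job' again for each one.
            for next_module, next_params in follow_ups:
                add_job(conn, next_module, next_params)
        else:
            notify(job_id)  # stands in for the 'Notification' process
    return len(rows)


# Hypothetical application modules: a search that requests a modeling job.
def search(params):
    return [("modeling", {"hit": params["sequence"]})]


def modeling(params):
    return []


conn = sqlite3.connect(":memory:")
init_db(conn)
add_job(conn, "search", {"sequence": "MKT", "user": "demo"})
notified = []
modules = {"search": search, "modeling": modeling}
while daemon_pass(conn, modules, notified.append):
    pass  # keep polling until no queued jobs remain
```

- In this sketch the daemon is a simple polling loop in one process; a real deployment would dispatch each job to a member of the cluster rather than call the module directly.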
- For example, one embodiment of this invention utilizes the following commercially available software packages: Cerius2 (Accelrys Inc, San Diego, Calif.) and Molecular Operating Environment (MOE™) (Chemical Computing Group, Montreal, Canada) as calculation engines in some of its modules. However, embodiments of this invention are not limited to those or other commercially-available applications. The modular structure of an embodiment allows the implementation of other calculation engines.
- The five first-level modules shown in FIG. 3 include: (1) a Protein Sequence Translation module 302, which automates the translation of a protein sequence to a three-dimensional structure(s) in an efficient manner (Protein is used only as an example in this specification; any target may be sequenced and ranked in one embodiment of this invention); (2) an Identify Binding Sites module 304, which automates the detection of the putative binding sites, calculates their physico-chemical properties and may perform other functions specified by a user, such as eliminating incorrect sites; (3) a Dock Compounds module 306, which automates the docking of a large number of compounds in an efficient fashion utilizing parallel approaches to split the process among different processors based on protein structures and protein sites and ranks them utilizing a number of scoring functions; (4) a Selection and Analysis module 308, which selects high ranking probes or compounds (Probe and compound are used interchangeably throughout this specification as examples) and submits queries to a database to identify the plates they reside in, analyze them, perform identity, similarity and clustering checks, and rank them for in biologico (in vitro/in vivo) screening by generating structure and site specific reports containing plate numbers, location, and the chemical structure of all their constituents; and (5) an Applications Framework module 310, which generates the user interface and provides job control and parallel execution management in the embodiment shown in FIG. 3.
- As used herein, the term “probe” refers to a molecular framework encompassing association elements suitable for interaction with a macromolecular biological target, such as but not limited to DNA, RNA, peptides, and proteins, said proteins being those such as but not limited to enzymes and receptors.
- FIG. 4 illustrates a process of in silico molecular screening utilized by one embodiment of this invention in reference to the high-level modules of FIG. 3. Also illustrated in FIG. 4 are exemplary calculation engines that may be applied to each step in the process.
- Although the processes shown are described as occurring in a certain order, embodiments of the current invention are not limited to performing a sequential set of modules. The modules may be run in any combination and in any order. Also, each module or group of modules may execute iteratively if a user prefers.
- Referring again to FIGS. 3 and 4, the Protein Sequence Translation module 302 first determines if the submitted sequence corresponds to an existing crystal structure or other experimentally determined three-dimensional structures 402. If not, the three-dimensional structure is determined from the sequence 404. The experimental structure(s) may be retrieved from a protein data bank (www.rcsb.org) or determined using a commercial product, such as but not limited to MOE™ or InsightII® (Accelrys Inc., San Diego, Calif.). Once the three-dimensional structure is determined, or if the crystal structure already exists, the process proceeds to the next step, the binding site hypothesis 406, which is performed by the Identify Binding Sites module 304. A commercial application, such as MOE™, Dock, or Cerius2, may perform the binding site hypothesis step.
- The next step in the process shown is in silico screening 408, a step performed by the Dock Compounds module 306. Commercial products that may be used for this step in the process include but are not limited to MOE™, Cerius2, and Schrödinger. This step in the process also retrieves data from a database, such as local database 110. The final step in the in silico process is plate selection 410, which is accomplished by the Selection and Analysis module 308. In one embodiment of this invention, plate selection is accomplished via custom code. Once the in silico process steps are complete, the compound(s) proceed to in biologico screening 412.
- Each of the modules of one embodiment of this invention will now be described in detail with reference to FIG. 3. The first high-level module is the Protein Sequence Translation module 302. The goal of this module 302 is to automate the creation of a three-dimensional protein model from a protein sequence. Several databases may be used in a concerted fashion to optimize the structural diversity and relevance of the final three-dimensional model that may be used for in silico screening, including commercial, public, and proprietary databases. This process is not aimed at substituting for the scientist, but at performing rapid and automated tasks in a way that may not require the user's intervention. In one embodiment of this invention, the module 302 generates a series of log files. The scientist has the ability to examine the log files to perform quality control checks and to identify any potential issues. The scientist also has the ability to re-execute (replay) the log file or to modify the log file directly and then execute the series of steps contained in the modified log file as desired.
- The embodiment illustrated in FIG. 3 is merely exemplary. Other embodiments of this invention include subsets of the modules shown or additional modules. For example, one embodiment of this invention provides links to an integrated data analysis solution. In such an embodiment, information from in silico and in biologico screening is combined in an integrated user interface. Such an embodiment is described in attorney docket number 41305-283186, filed simultaneously, entitled “System and Method for Data Analysis, Manipulation and Visualization” which is hereby incorporated by reference. Other types of applications may require ad hoc combinations of custom and commercially-available application modules.
- FIG. 5 illustrates the process implemented by the Protein Sequence Translation module 302. The module 302 first accepts the sequence as an input 502. The module 302 searches for similar sequences in commercial and/or proprietary databases and performs multi-sequence alignment 504.
- Sequence alignment attempts to align several protein sequences such that regions of structural and/or functional similarity are identified and highlighted. Different matrices are used to perform such alignment, such as but not limited to the freely available engines ClustalW (Jeanmougin, F., Thompson, J. D., Gouy, M., Higgins, D. G. and Gibson, T. J., Trends Biochem Sci, 23, 403-5 (1998)) or MatchBox (Depiereux, E., Baudoux, G., Briffeuil, P., Reginster, I., De Bolle, X., Vinals, C., Feytmans, E., Comput. Appl. Biosci. 13(3) 249-256 (1997)). Databases of protein sequences can be used to identify protein sequences that possess some (user defined) degree of similarity with the protein target of unknown structure, such as but not limited to the freely available internet-based programs FASTA (http://www.ebi.ac.uk/fasta3/) or BLAST (http://www.ncbi.nlm.nih.gov/BLAST/).
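- To illustrate how an alignment score can be used to rank candidate sequences by similarity to a target, the following sketch scores pairwise global alignments with the Needleman-Wunsch recurrence. This is a deliberately simplified stand-in and not the method of ClustalW, MatchBox, FASTA, or BLAST: it aligns only two sequences at a time, and the match, mismatch, and gap values are arbitrary assumptions rather than a substitution matrix.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def rank_by_similarity(target, candidates):
    """Order candidate sequences from most to least similar to the target."""
    return sorted(candidates,
                  key=lambda s: global_alignment_score(target, s),
                  reverse=True)
```

- A user-defined similarity threshold, as mentioned above, could then be applied to the scores to decide which candidates are retained.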
- Also, commercially available computer programs, such as but not limited to MOE™ (Chemical Computing Group Inc, Montreal, Canada), Homology (Accelrys Inc., San Diego, Calif.), and Composer™ (Tripos, Inc., St. Louis, Mo.) can perform database searches of the application's proprietary database and sequence alignments as an integrated process. Emphasis can be put on finding similarity among sequences that are known to be associated to certain biological functions, in order to predict not only the structure but also the possible function of the target protein.
- The module 302 next selects the highly homologous sequences 506 with known three-dimensional structures and constructs three-dimensional model(s) 508 (homology models). Once construction of the three-dimensional models is complete, the process proceeds to the binding site hypothesis process 406 described in FIG. 6.
- The process illustrated in FIG. 6 begins with the three-dimensional structures output by the Structure Determination from Sequence process 404. These three-dimensional structures are used for binding and/or association site(s) detection 602 (referred to herein as “binding sites”). Once the binding site detection is complete, the binding sites are characterized physically 604. Then the binding sites are ranked 606 and a user-specified number of sites are used for subsequent in silico screening. The process then proceeds to in silico screening 408.
- Referring again to FIG. 3, the Protein Sequence Translation module 302 includes three lower-level modules: Retrieve Protein Sequence/Structures 312, Perform Sequence Alignment 314, and Produce 3D Structure 316. In the Retrieve Protein Sequence/Structures module 312, one embodiment of this invention starts from a target sequence and retrieves protein structures that have structural/functional similarity with the target sequence. The module 312 processes the target sequence through a search engine, such as BLAST in NCBI, to search for known protein(s) with similar sequence(s). The module 312 may utilize public sequence and three-dimensional structure databases. In one embodiment, the Retrieve Protein Sequence/Structures module 312 performs a search in a database, such as a protein data bank (PDB). In another embodiment of this invention, the user may perform a keyword search. The keywords describe the biological nature of the protein. For example, kinases and GPCR are keywords that the user may specify. Other modules use the retrieved three-dimensional structures during processing. For example, in the embodiment shown, these three-dimensional protein structures are used to construct a homology model for the target.
- Several commercially available computer programs, such as but not limited to MOE™ (Chemical Computing Group Inc, Montreal, Canada), InsightII® (Accelrys Inc., San Diego, Calif.), and Modeler© (Andrej Sali, Rockefeller University, New York, N.Y., http://guitar.rockefeller.edu/modeller/modeller.html) can be used to perform homology modeling. Threading algorithms are described in Godzik A, Skolnick J, Kolinski A., J. Mol. Biol., 227, 227-238 (1992) and in other literature. Commercially available threading software includes MatchMaker™ (Tripos, Inc., St. Louis, Mo.).
- The next module in the embodiment shown in FIG. 3 is the Perform Sequence Alignment module 314. This module accepts a sequence in a standard format, such as the FASTA format, and searches for proteins of similar sequence in a commercial or other database (e.g. MOE™). The module retrieves these three-dimensional protein structures as well as the three-dimensional protein structures from the previous module 312 and performs a sequence alignment on all of them. The aligned chains, including alignment scores, are passed to the subsequent module.
- The Produce 3D Structure module 316 runs a homology model engine for the chain with the highest alignment score, or for the chain selected by the user, and produces a three-dimensional model(s) for the target sequence in Protein Data Bank (PDB) format. In one embodiment, the user may modify the default parameter values of the homology modeling process via user interface 202. The user may also perform quality control checks and may also re-run the same process by selecting a chain whose alignment score differs from the highest score.
- In the embodiment shown in FIG. 4, the Produce 3D Structure module 316 is the final lower-level module of the Protein Sequence Translation module 302. However, energy minimization and/or molecular dynamics simulations may also be performed for the three-dimensional structural model(s) using InsightII (Accelrys Inc., San Diego, Calif.) or MOE™ (Chemical Computing Group, Montreal, Canada) or other application software known to those skilled in the art. The next high-level module is the Identify Binding Sites module 304.
- The Identify Binding Sites module 304 includes one lower-level module, the Identify and Rank Binding Sites module 318. This module 318 accepts the three-dimensional model for the target protein and processes it through one of the custom or commercial calculation engines, e.g., Cerius2. The Identify and Rank Binding Sites module 318 uses the calculation engine to identify possible binding sites for the protein and ranks the binding sites by size, saving the first n binding sites (n specified by the user). These sites are then passed to a specified calculation engine or engines together with the protein information. The module 318 may utilize additional or other algorithms aimed at identifying possible sites as well, such as Putative Active Sites with Spheres (PASS: Brady, Jr., G. P. and Stouten, P. F. W., J. Comp. Aided Molec. Design, 14, 383-401 (2000)).
- In the case of shape-based methods, the sites are defined based on the shape of the target protein. Within the volume of the target protein, a flood-filling algorithm is employed to search unoccupied, connected grid points, which form the cavities (sites). All sites detected can be browsed according to their size, and a user defined size cutoff eliminates sites smaller than the specified size. Mixed shape/properties sites are defined as connections of hydrophobic and hydrophilic spheres in contact with complementary interacting regions of the target protein. The sites may also be ranked according to the number of hydrophobic contacts made with the receptor, thereby including information about the chemistry of the protein in addition to its geometry.
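- The shape-based site detection described above can be illustrated with a breadth-first flood fill over grid points. The sketch below is a simplified illustration, not the algorithm of any named calculation engine: it assumes the caller has already reduced the grid to the unoccupied points inside the protein volume, treats points as connected when they share a face (six neighbors), and uses the hypothetical names `find_sites`, `min_size`, and `top_n` for the size cutoff and the user-specified number of sites.

```python
from collections import deque

# Face-sharing neighbors on a cubic grid (an assumption of this sketch).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def find_sites(free_points, min_size=1, top_n=None):
    """Group unoccupied grid points into connected cavities, largest first."""
    remaining = set(free_points)
    sites = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        site = [seed]
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                nxt = (x + dx, y + dy, z + dz)
                if nxt in remaining:
                    remaining.remove(nxt)
                    queue.append(nxt)
                    site.append(nxt)
        if len(site) >= min_size:  # size cutoff eliminates small sites
            sites.append(sorted(site))
    sites.sort(key=len, reverse=True)  # rank sites by size
    return sites if top_n is None else sites[:top_n]
```

- For example, six free points forming cavities of three, two, and one points yield three sites ranked by size; with `min_size=2` and `top_n=1` only the three-point cavity is kept.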
- Once three-dimensional structure(s) of the target protein(s) is (are) obtained, computer programs are used to predict possible drug association sites in these three-dimensional structures. These results are input to the subsequent in silico screening process. The Dock Compounds module 306 performs this function and is the next high-level module illustrated in FIG. 4. In the embodiment shown, this module 306 uses docking engines in a parallel fashion to screen compound databases or a probe set and so on against protein models to predict compounds that have a higher binding affinity with the protein. Various scoring functions and combinations of scoring functions may then be utilized based on user preferences for scoring the docked protein or compound complex.
- FIG. 7 illustrates the docking or screening process. The process begins with output from the binding site hypothesis process 406. The parallel optimizer extracts three-dimensional structures of the compounds or probes from a database, such as the local database 110, and prepares the data for parallel processing 702. In the embodiment shown, the data is processed in parallel for both compound structures 704 and identified binding sites 706. Next, automated docking is performed 708. Once the docking is complete, the compounds are ranked according to the scoring function value 710. The docking and ranking information is then output to the plate selection process 410.
- As an example of the process shown in FIG. 7, in one embodiment, a probe set is treated sequentially and docking can be performed in parallel. For each probe, rotating the bonds of the probe generates a user-defined number of conformers. For example, one thousand (1000) conformers are generated for each probe through a Monte-Carlo procedure. Other conformational search procedures such as but not limited to simulated annealing, knowledge-based search, systematic conformational search, and others known to one skilled in the art may be employed.
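- A very rough sketch of the random bond-rotation step is shown below. It only draws random torsion angles for each rotatable bond, which is the sampling core of a Monte-Carlo conformer search; it does not build three-dimensional coordinates, apply an energy-based acceptance test, or remove duplicates. The function name and parameters are assumptions made for the example.

```python
import random


def sample_conformers(n_rotatable_bonds, n_conformers=1000, seed=None):
    """Draw random torsion-angle sets (in degrees), one tuple per conformer."""
    rng = random.Random(seed)  # seeded generator so runs can be reproduced
    return [
        tuple(rng.uniform(-180.0, 180.0) for _ in range(n_rotatable_bonds))
        for _ in range(n_conformers)
    ]
```

- Each tuple would then be applied to the probe's rotatable bonds to produce one conformer for docking.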
- Each of these conformers is docked in an association site using computational methods such as but not limited to those described below. One such method employs the alignment of the non mass-weighted three-dimensional principal moments of inertia of the probes with that of the association site. The conformer is shifted in its best alignment orientation in the association site to improve the docking. The orientation of the conformer that optimizes the fit between the principal moments of inertia of the probe and the association site is saved to disk, the docking score is calculated as described below for that conformer and the docking process repeats with a new conformer of the same probe. Computer programs such as but not limited to Cerius2® LigandFit (Accelrys Inc., San Diego), DOCK (University of California at San Francisco), F.R.E.D. (OpenEye Scientific Software, Santa Fe, N. Mex.) and others may be used for the docking procedure.
- After docking of the conformers, a score is calculated for each of the probe's conformers in the association site. Several scoring functions can be used for that purpose. One such scoring function is described below.
- Non-bonded electrostatic interactions and volume exclusion calculations can be performed. In this approach, ΔE, the non-bonded interactions between the probe and the target protein, is calculated from the coulombic and van der Waals terms of an empirical potential energy function. ΔE is defined theoretically as: ΔE = E(complex) − [E(probe) + E(protein)], where E(complex) is the potential energy of the (protein + docked probe) complex, E(probe) is the internal potential energy of the probe in its docked conformation, and E(protein) is the potential energy of the protein alone, i.e., with no probe docked. The protein may be kept fixed during the docking procedure and therefore E(protein) would need to be estimated only once. E(complex) can be calculated either from an explicit description of all the atoms of the protein, or from a grid representation of the association site, the latter being faster in the case where a large number of compounds is to be screened. This approach includes explicitly the calculation of van der Waals interactions between atoms using a Lennard-Jones function. This scoring function favors probes that are small (minimizing van der Waals clashes) and that have large charge-charge interactions between the probe and the protein (maximizing the electrostatic interactions). The scoring function also disfavors probes and/or conformers that exhibit large van der Waals clashes between the probes and the protein.
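- For a pairwise-additive non-bonded potential with both conformations held fixed, the intramolecular terms of E(complex) cancel against E(probe) and E(protein), so ΔE reduces to a sum over probe-protein atom pairs. The sketch below illustrates that sum with a Coulomb term and a 12-6 Lennard-Jones term. The `Atom` fields, the Lorentz-Berthelot combining rules, the uniform dielectric, and the approximate conversion constant are assumptions of this example and are not taken from the specification or from any particular force field.

```python
import math
from dataclasses import dataclass

# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2).
COULOMB_CONSTANT = 332.06


@dataclass
class Atom:
    x: float
    y: float
    z: float
    charge: float    # partial charge in units of e
    epsilon: float   # Lennard-Jones well depth, kcal/mol
    sigma: float     # Lennard-Jones zero-crossing distance, Angstrom


def interaction_energy(probe, protein, dielectric=1.0):
    """Delta E as the sum of probe-protein Coulomb and Lennard-Jones pair terms."""
    total = 0.0
    for a in probe:
        for b in protein:
            r = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
            sigma = 0.5 * (a.sigma + b.sigma)           # assumed combining rule
            epsilon = math.sqrt(a.epsilon * b.epsilon)  # assumed combining rule
            sr6 = (sigma / r) ** 6
            total += 4.0 * epsilon * (sr6 * sr6 - sr6)  # van der Waals term
            total += COULOMB_CONSTANT * a.charge * b.charge / (dielectric * r)
    return total
```

- Under this sign convention a more negative ΔE indicates a more favorable interaction, and overlapping atoms produce a large positive Lennard-Jones contribution, which is how van der Waals clashes are penalized.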
- Other scoring functions may be used. These include, but are not limited to, LUDI (Böhm, H. J., J. Comp. Aided Molec. Design, 8, 243-256 (1994)); PLP (Piecewise Linear Potential, Gehlhaar et al, Chem. Bio., 2, 317-324 (1995)); DOCK (Meng, E. C., Shoichet, B. K., and Kuntz, I. D., J. Comp. Chem. 13: 505-524 (1992)); and Poisson-Boltzmann (Honig, B. et al, Science, 268, 1144-9 (1995)).
- Some of the above scoring functions are implemented in commercially available software packages such as but not limited to Cerius2® from Accelrys, Inc. (San Diego, Calif.) and MOE™ (Chemical Computing Group Inc., Montreal, Canada).
- This docking/scoring process is done independently for each probe. The score calculated for one probe's conformers does not depend on the calculations for other probes. Therefore, this process is highly scalable, and can be distributed among any number of computers that have the required programs. For two computers for instance, the probes can be divided into two groups that will be docked and scored in parallel. Ultimately, each probe could be docked and scored individually on one processor. Massively parallel computer architecture could then be used to linearly improve the efficiency of the process. The docking/scoring approaches described above can be used to perform massive throughput in silico screening of compounds.
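- Because each probe is scored independently, the division of work described above can be sketched in a few lines. In this illustrative example, worker threads stand in for separate computers, the probes are dealt round-robin into groups, and lower scores are assumed to be better; the function names and the `score` callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(probes, n_groups):
    """Deal probes round-robin into n_groups lists of near-equal size."""
    return [probes[i::n_groups] for i in range(n_groups)]


def score_group(group, score):
    """Dock and score every probe in one group; no group depends on another."""
    return [(probe, score(probe)) for probe in group]


def screen_in_parallel(probes, score, n_workers=2):
    """Score all groups concurrently, then merge and rank the results."""
    groups = split_into_groups(probes, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda g: score_group(g, score), groups))
    ranked = [pair for group in results for pair in group]
    ranked.sort(key=lambda pair: pair[1])  # assumes lower score is better
    return ranked
```

- Setting `n_workers` equal to the number of probes corresponds to the limiting case in which each probe is docked and scored on its own processor.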
- Referring again to FIG. 3, the Dock Compounds
module 306 includes various lower level or sub-modules. The first lower-level module is the CalculateNode Load module 320. Thismodule 320 calculates the load for each node on a given heterogeneous cluster. TheDivide Data module 322 then divides the data into several pieces to be processed independently on each node in a parallel fashion. For example, in the case of a large structure database (SD) file of chemical structures, the data is divided so that one member of theheterogeneous cluster 116 processes only a portion of the entire data set. Both of thesemodules 320 & 322 are pre-processing modules; they initiate and launch the tasks necessary to prepare data for docking. - The Create Scripts and
Copy Data module 324 is also a pre-processing module. This module 324 (1) executes programs to create per node docking engine scripts and per node shell scripts that ensure data management and proper data allocation and (2) copies the data to the individual nodes. For example, themodule 324 creates scripts that are used by later modules to process each portion of the SD file as divided in the preceding module. Once the file is divided into smaller files, each of the smaller files may be copied, such as by FTP (File Transfer Protocol) to the nodes in theheterogeneous cluster 116. - Once pre-processing is complete, the Execute Docking in
Parallel module 326 executes. Thismodule 326 executes the docking programs in parallel, i.e., at the same time on different members of theheterogeneous cluster 116. Themodule 326 may run on any member of thecluster 116, e.g., on the leading node. In particular, themodule 326 executes and manages the execution of all the processes created by preceding modules 322-324 until they have all successfully completed. - In the embodiment shown in FIG. 3, once pre-processing and docking are complete320-324, the
Perform Post-Processing module 328 executes. Thismodule 328 executes programs for post-processing, including programs that (1) combine the individual SD files after calculation of the in silico screening score into one large final SD file, (2) clean up the data on the individual nodes, removing unused files, and (3) perform any additional per node calculation that might be necessary at this point. These modules 322-324 may utilize various formats. For example, to minimize the volume of network traffic utilized by the modules 322-324, the files may be transferred and processed in a compressed format, such as gzip. - The next high-level module in the embodiment shown is the Selection and
Analysis module 308. This module includes three lower-level modules: a Select Best Compound(s) module 330, a Retrieve Location Information module 332, and a Perform Similarity Analysis module 334. - FIG. 8 illustrates the process implemented by the Selection and
Analysis module 308. The process shown in FIG. 8 receives output from the in silico screening process 408. Based on the ranking process, the best n compounds are selected (wherein n is specified by the user or otherwise) 802. Using identifying information, such as the compound or ID number, plate information is extracted from the database (110) 804. The plates are analyzed by module 806. For example, in one embodiment, additional wells from each plate that are not selected in the in silico ranking process are analyzed to determine if similarities exist with the in silico ranked and selected compounds identified in the screening process. These compounds are optionally considered based on their similarity and closeness to the in silico ranked compounds. The process iterates for each site 808. - Instead of performing in biologico screening on all of the in silico probe hits obtained, only high-ranking probes may be used for subsequent screening activities. Although it may be more relevant to screen only those probes that are identified as in silico probe hits in these plates, various similarity measurements, such as the Tanimoto Coefficient (Tc), may reveal the other probes in each of the plates containing in silico probe hits to be near neighbors. Hence, all the probes contained in all the plates containing in silico hit(s) may be subjected to in biologico screening. Once the plate selection process is complete, the results are used for the in biologico screening of the identified and selected compounds or
plates 412. - The Selection and
Analysis module 308 provides automated selection of chemistry scaffolds. The module 308 also provides automated queries against commercial, public, and proprietary databases to select suggested chemistry to be pursued further. In addition, the module 308 provides plate analysis and clustering, providing an indication of confidence in site specificity and identification of scaffolds. The module 308 may also provide automated generation of final reports. - The Select Best Compound(s)
module 330 selects the best-ranked conformation for each selected compound. The module 330 next selects the best n compounds or the best m % of all the compounds in their best conformation. The values of n and m may be specified by a system administrator or specified by the user. The module 330 outputs various compound identifiers, such as the compound ID number, so that related information, such as the plate ID number, well ID number, and structure, can be retrieved for each compound. - The Retrieve
Location Information module 332 uses the related information to search additional database tables for information, such as the location of the plate identified by the plate ID number. Once a plate has been identified, the information is passed to the next module, the Perform Similarity Analysis module 334. This module 334 may receive information for one or many plates. - The Perform
Similarity Analysis module 334 performs similarity analysis between the suggested lists of plates to identify any potentially redundant lists, and provides additional information, such as information to assist in prioritizing list submission for in biologico screening. The module 334 also allows for filtering the lists to remove any plate or compound from the list. This feature allows a user to remove a compound from the screening list for any number of reasons, including, for example, the compound's nature or presence in another project. Various other analysis functionalities, such as Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) or other property filters, may also be implemented as part of this module. - In the embodiment of this invention illustrated in FIG. 3, the modules 302-308 and sub-modules 312-334 described above execute within the application framework described in relation to FIG. 2. The application framework is illustrated in FIG. 3 as the
Application Framework module 310. - The Application Framework module includes three lower-level modules: the
Job Scheduling module 336, the User Interface module 338, and the Development Kit module 340. - The
Job Scheduling module 336 allows a database such as MySQL or Oracle to be used as a job queuing system for any and all modules of the embodiment shown in FIG. 3. The module 336 includes the Add Job 206 and Daemon 210 shown in FIG. 2 and may also include wrappers for each module as necessary. - The
User Interface module 338 provides the user interface 202. In one embodiment, the module 338 provides a web interface for job submissions, job administration, and viewing of job results. The module 338 may allow cross-platform independence, remote access to job information, and other useful functionality. - The
Development Kit module 340 provides the capability to add custom modules to the embodiment illustrated in FIG. 3. These modules execute under the application framework as illustrated in FIG. 2. They may be written in any of a number of languages, including, for example, Perl and C++. - FIG. 9 illustrates the general process of presenting and updating the user interface and scheduling and executing jobs in one embodiment of this invention. In the embodiment shown, the interface is a dynamic page named
GUI 902. GUI includes top header 904, which includes a dynamic menu module, contentCreator 906. contentCreator 906 generates web page content based on values passed to the script by a drop-down menu or other user interface element. This script creates all the form elements allowing users to enter information and upload multiple files into the application. The Add2Queue module 910 updates Status 908, which presents status to a user. - The
contentCreator 906 accesses the Add2Que module 910 to create jobs. The Add2Que module 910 reads information about the sequence, for example, from a FASTA or other formatted file 912, checks for errors, and utilizes the data along with user parameters supplied from the contentCreator 906 to execute the qAddJob query 914. The qAddJob query 914 inserts records into the local database qDB 110. -
qDB 110 in the embodiment shown is a series of database tables that store information on requested job calculations, what calculation types are available for a user's site, how to handle each calculation type, and qDaemon 916 parameters for specific computers, including default parameters. qDB 110 is independent of the computer or user requesting a calculation and the computer that will handle the calculation. One function qDB 110 may implement is to store calculation requests, calculation parameters, input and output data, calculation status, and other information related to requested calculations. Some examples of other information related to a requested calculation include, but are not limited to, who requested the calculation, when the calculation was requested, priority level of the calculation, and searchable user-supplied comments related to the requested calculation. The qDB 110 may also store input and output data file information, such as name pattern of the files and how many files, for each calculation type. -
qDaemon 916 represents a query executing in a background process waiting for jobs to be inserted into the qDB 110. When a new job is found, qDaemon 916 starts a job 920. Changes to the job table in the database 110 are reflected in GUI 902 via the qStatus 922 and qIDStatus 924 queries. -
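By way of a non-limiting illustration, the database-backed job queue described above might be sketched as follows. The sketch assumes SQLite (from the Python standard library) as a stand-in for MySQL or Oracle; the table name, column names, and the functions `open_queue`, `add_job`, and `claim_next_job` are hypothetical and do not appear in the embodiment shown.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    calc_type TEXT    NOT NULL,
    params    TEXT    NOT NULL,
    priority  INTEGER NOT NULL DEFAULT 0,
    status    TEXT    NOT NULL DEFAULT 'waiting'
)
"""


def open_queue(path=":memory:"):
    """Open (or create) the job database; SQLite stands in for MySQL or Oracle."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_job(conn, calc_type, params, priority=0):
    """Insert one job record, in the spirit of the qAddJob query."""
    cur = conn.execute(
        "INSERT INTO jobs (calc_type, params, priority) VALUES (?, ?, ?)",
        (calc_type, params, priority),
    )
    conn.commit()
    return cur.lastrowid


def claim_next_job(conn, valid_types):
    """Return (id, calc_type, params) for the best waiting job, or None.

    Higher priority wins; ties go to the oldest job. The claimed job is
    marked 'running' so that a second poll does not return it again.
    """
    if not valid_types:
        return None
    marks = ",".join("?" for _ in valid_types)
    row = conn.execute(
        "SELECT id, calc_type, params FROM jobs "
        "WHERE status = 'waiting' AND calc_type IN (" + marks + ") "
        "ORDER BY priority DESC, id ASC LIMIT 1",
        tuple(valid_types),
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    conn.commit()
    return row
```

A background worker could call `claim_next_job` in a polling loop, passing the calculation types it accepts; with several workers sharing one database, the select and update would need to run in a single transaction.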
qDaemon 916 is a precompiled executable daemon that manages calculations running on the computer on which the daemon was started. The qDaemon 916 determines when to start a calculation based on a number of variables including but not limited to time of day and current CPU usage. qDaemon 916 requests information from the qDB 110 for the next calculation job that the daemon can run; the qDB 110 then returns information for the next available valid requested calculation based on a list of valid calculation types given by a qDaemon 916 instance, currently waiting requests, and a priority algorithm. If the calculation type requires input data files from the qDB 110, the qDaemon 916 creates any input data files stored in the qDB 110 in a working directory that is also associated with the calculation that is about to run. The qDaemon 916 then calls a calculation-specific wrapper script, based on the calculation type, with the requested calculation parameters. If the calculation type requires data files to be uploaded, the qDaemon 916 uploads the data files to the qDB 110; log files and error log files can be treated as data files. - Valid calculation types that are performed by a particular instance of a
qDaemon 916 are determined at initial startup of the daemon via command line or other parameters. Multiple instances of qDaemon 916 may execute on a single computer, supporting multiprocessor computers running multiple non-parallel calculations simultaneously. - FIG. 10 illustrates a search process in one embodiment of this invention. The user begins the process shown by starting a search, such as a BLAST search, using a remote, local, commercial party, or custom search utility. The user may also be allowed to select any combination of the available search utilities. The user is also allowed to include results from other searches as an input. In one embodiment of the present invention,
Init Search 1002 initiates the BLAST search, PDB file search, or other search programs. If a remote search was chosen, Mirror Search 1006 is executed. If a local search was chosen, Local Search 1010 is executed. If a commercial party search utility was chosen, Commercial Party Utility 1004 is executed. If a custom search utility was chosen, then Custom Search Utility 1011 is executed. If the user chooses to use more than one search utility 1002, then the chosen search utilities (1004, 1006, 1010, 1011) will run simultaneously. -
Mirror Search 1006 is called for queries against remote public databases. This module mirrors result files to the local server for searching. Local Search 1010 is called for searching a local mirrored copy of public databases. This module saves results on the local server. Commercial Party Search Utility 1004 is called for commercial party database queries. This module saves results to the local server for searching. Custom Search Utility 1011 is used for queries against a custom database. This module saves result files to the local server for searching. - Regardless of which search utilities are chosen, Search_all 1012 combines the result files from each search utility.
Search_all 1012 then appends the results that the user entered from other searches 1002. Pdb_search 1014 derives unique pdb names from Search_all 1012 and applies other conditions/parameters set by the user 1002, resulting in a list of unique pdb file names. Then download_pdb is called 1016. -
Download_pdb 1016 accepts a list of pdb file names and uses the query_PDB module 1018 to query the local pdb database to see if the pdb files exist locally. If the files exist locally, the script reports the results to the log file and ends 1020. If the files are not found locally, download_pdb generates requests necessary to download the files using 1022 and then calls updateDB 1024. updateDB 1024 updates the internal database with the names and locations of the downloaded files. - FIG. 11 illustrates the general process of creating and executing jobs in one embodiment of this invention. The first step in the process after
Start 1101 is the qAddJob process 1102. The qAddJob process 1102 may execute as a result of a command from a user, an automated system event, or any other process or event that results in the creation and execution of a job. The qAddJob process 1102 simply adds records to the qDB database 110. qDaemon 916 is a background process that waits for jobs to be added to the database 110. When jobs are added to the database 110, the qDaemon process 916 evaluates the records and starts the corresponding process. - In the embodiment shown in FIG. 11, this process may be one of
qSearch 1108, qModel 1110, qSite 1112, qDock 1114, or qSelect 1115. This process is not limited to the five jobs shown. Any other process, such as other 1116, may be executed in this manner with little or no change to the integrated user interface. Thus, one embodiment of this invention provides great flexibility in the implementation and customization of a computer-aided molecular discovery system. - FIG. 12 illustrates utilizing templates and customized jobs in one embodiment of this invention. In the embodiment shown, the first process after
Start 1201 is the qAddJob process 1210, which adds a job record to the database, qDB 110. qDaemon 916 again waits for jobs to be added to the database 110. When a job is added, an application template, qTemplate 1202, is executed, which, in turn, executes a customized calculation 1204. If additional jobs are spawned from the calculation 1206, another job is simply added to the database, qDB 110, by qAddJob 1210. If not, a notification is sent by some means, such as instant messaging, email, or by another method 1208. - FIGS. 13-17 illustrate the process of providing notification, such as by email or other method, of the completion of a job in one embodiment of this invention. As in other aspects of this invention, the
qDaemon process 916 waits for jobs to be added to the database, qDB 110. When a job is added, qDaemon 916 begins the appropriate job. In the embodiments shown, the job is one of qSearch 1108, qModel 1110, qSite 1112, qDock 1114, qSelect 1115, or other module process 1116. Each of these jobs executes a corresponding process or series of processes, shown as Init Search through download_PDB 1302, Modelseq 1402, Site 1501, and Dock/Dockrepeat 1504, respectively, in the Figures. Once the process is complete, the notification module 1304 provides notification to a user, such as by email, fax, instant messaging, or other suitable communication method. - FIG. 15a illustrates the creation and execution of a custom script for a commercial application module in one embodiment of this invention. In the embodiment shown, the Site process is started 1502 after adding a job to the job database as described above. The execution of the Site process results in the creation of a script, which controls the execution of a third-party commercial, public, or custom application. In FIG. 15b, this step is illustrated by the
Site.scriptMaker step 1504. This script is then executed in the Site.exe 1506, which executes the calculation engine 1506 necessary to perform calculations for the Site process. - Embodiments of this invention provide many benefits over conventional computer-aided molecular discovery systems and processes. One advantage is the ability to parallelize processes across heterogeneous clusters. FIG. 18 illustrates the pre-parallelization process in one embodiment of this invention. The docking process is shown in FIG. 18 for purposes of illustration. However, any of the processes of this invention may be parallelized in the same manner. In the embodiment shown, the docking process is started 1802. The start of the process triggers the
parallel process 1804. In order to process the information in parallel, the data file, which is an SD file in the embodiment shown, must be split into multiple smaller files 1806. The process of splitting is performed by an Agent 1808, which is described in detail below. The Agent 1808 next copies the smaller data files to the appropriate node in the heterogeneous cluster 1810. The next process then begins 1812, which is illustrated in FIG. 19. - FIG. 19 illustrates the parallelization of a process in one embodiment of this invention. The efficient parallelization of the process is achieved through a combination of processes called Agents that pre-process and post-process the tasks required for parallel runs. A global process, Oligarch (1910), manages the actual run of the docking engine on several nodes. The security of the process is ensured by appropriate firewall implementations.
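- As one illustrative, non-limiting sketch, the splitting of an SD file into smaller pieces, and the later recombination of those pieces, might be expressed as follows. The sketch relies on the SD file convention that each record ends with a line containing `$$$$`; the function names and the round-robin distribution are assumptions made for illustration and are not taken from the embodiment shown.

```python
def split_sd_records(text):
    """Split SD-file text into records, each ending with its '$$$$' line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    if "".join(current).strip():
        records.append("".join(current))  # trailing record without delimiter
    return records


def chunk_records(records, n_chunks):
    """Distribute records round-robin into n_chunks lists (some may be empty)."""
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return chunks


def merge_chunks(chunks):
    """Concatenate chunk contents back into one SD-file text."""
    return "".join(rec for chunk in chunks for rec in chunk)
```

- The merged text contains the same records as the input, although not necessarily in the original order.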
- Agent is a dynamic process that manages the parallelization of all the tasks involved in the in silico screening process. Several Agents may handle the pre-processing and the post-processing of the various computational stages in a coherent fashion. As an example, one Agent could be creating input files for the docking engine; another Agent could manage the distribution of all the chemical structures on all the nodes; another Agent could post-process the collection of data.
- To perform its function, Agent determines the configuration of the computer cluster by reading a file (input: cluster.conf file) or through some other means. This file contains information about the server name, the common directory for that particular machine, and calibration data that are used for heterogeneous cluster load balancing. The parallelization process can be used on a heterogeneous Unix/Linux cluster, including SGI, SUN, IBM, or Linux machines with different CPU mixes.
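- By way of example only, calibration data could be used for load balancing by giving each node a share of the work proportional to its measured speed. The sketch below assumes that the calibration data reduce to one positive relative-speed factor per node; that reduction, the node names, and the function `allocate_by_speed` are hypothetical, since the format of the cluster.conf file is not specified here.

```python
def allocate_by_speed(n_items, speeds):
    """Split n_items among nodes in proportion to their relative speeds.

    speeds maps a node name to a positive calibration factor (hypothetical
    units: larger means faster). The largest-remainder method is used so
    that the counts are integers that sum exactly to n_items.
    """
    total = sum(speeds.values())
    exact = {node: n_items * s / total for node, s in speeds.items()}
    counts = {node: int(x) for node, x in exact.items()}
    leftover = n_items - sum(counts.values())
    # Hand the remaining items to the largest fractional parts first.
    by_fraction = sorted(speeds, key=lambda n: exact[n] - counts[n], reverse=True)
    for node in by_fraction[:leftover]:
        counts[node] += 1
    return counts
```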
- Oligarch reads a file describing what programs to run in parallel and runs them simultaneously. Oligarch can be located on any member of the cluster but preferably on the leading node of the cluster. Pre-processing Agents create and distribute programs to be run on each node. When they are done, Oligarch runs and manages the execution of all these processes until they have all successfully completed. After completion, Post-processing Agents post-process the data.
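- A minimal sketch of a manager process in the spirit of Oligarch is given below. It assumes a hypothetical file format of one command per line and launches the commands as local child processes; launching on remote nodes (for example through a remote shell) and retrying failed commands are outside the scope of the sketch.

```python
import shlex
import subprocess
import sys


def read_command_file(path):
    """Read one command per non-empty line (a hypothetical file format)."""
    with open(path) as fh:
        return [shlex.split(line) for line in fh if line.strip()]


def run_all(commands):
    """Start every command at once, then wait for all of them to finish.

    commands is a list of argument lists. Returns the exit codes in the
    same order as the commands.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Example: two trivial commands run side by side with the same interpreter.
    print(run_all([[sys.executable, "-c", "pass"]] * 2))
```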
- The Dock process as illustrated in FIG. 19 provides an illustrative example of the Agents and Oligarch in one embodiment of this invention. The process shown in FIG. 19 begins where the process in FIG. 18 stops. The data, in this case a large SD file of chemical structures to be screened in silico, has been divided into several pieces to be processed independently on each node in a parallel fashion.
Pre-processing Agents 1808a,b initiate and launch tasks and prepare data. - One
Agent 1808a creates per-node docking engine scripts 1906. Another Agent (not shown) creates per-node shell scripts that ensure data management and proper data allocation. One Agent 1808b copies the data to the individual nodes 1908, e.g., in this case the pieces of the original large SD file. Agent 1808b also creates the file that will be used by Oligarch 1910. Oligarch 1910 then executes. After completion, post-processing Agent 1808c executes, combining data and copying the data results 1916. - In the embodiment shown,
Agent 1808c may actually be multiple Agents. For example, in one embodiment, one Agent combines the individual SD files after calculation of the in silico screening score into one large final SD file. One Agent cleans up the data on the individual nodes, removing unused files. One Agent performs any additional per-node calculation that might be necessary at this point. - One embodiment of the in silico screening method is detailed in the block diagram in FIG. 20. Additional aspects of this in silico screening method are detailed below. If the molecular target is a protein, the target's sequence (2002) is compared to sequences of proteins of known three-dimensional structures (2003). Multiple sequence alignment (2004) may be performed using sequence threading algorithms, other methods and algorithms known by those skilled in the art, or using methods such as those described below. Sequence alignment attempts to align several protein sequences such that regions of structural and/or functional similarity are identified and highlighted. Different engines are used to perform such alignment, such as but not limited to the freely available ClustalW (Jeanmougin, F., Thompson, J. D., Gouy, M., Higgins, D. G. and Gibson, T. J. Trends Biochem Sci, 23, 403-5 (1998)) or MatchBox (Depiereux, E., Baudoux, G., Briffeuil, P., Reginster, I., De Bolle, X., Vinals, C., Feytmans, E. Comput. Appl. Biosci. 13 (3) 249-256 (1997)). Databases of protein sequences can be searched to identify protein sequences that possess some (user-defined) degree of similarity with the protein target of unknown structure, using programs such as but not limited to the freely available internet-based programs FASTA or BLAST.
Commercially available computer programs, such as but not limited to MOE™ (Chemical Computing Group Inc, Montreal, Canada), or Modeler© (Andrej Sali, Rockefeller University, New York, N.Y., http://guitar.rockefeller.edu/modeller/modeller.html), can perform database searches and sequence alignments as an integrated process. Emphasis can be put on finding similarity among sequences that are known to be associated with certain biological functions, in order to predict not only the structure but also the possible function of the target protein.
- Once a protein of known three-dimensional structure (template) has been identified as homologous to the target protein sequence, one or more three-dimensional structures of the target protein may be built (2006) based on the three-dimensional structure of the template using homology modeling techniques known to one skilled in the art.
- In homology modeling, one attempts to develop models of an unknown protein from homologous proteins. These proteins will have some measure of sequence similarity and a conservation of folds among the homologues. It is hypothesized that for a set of proteins to be homologous, their three-dimensional structures are conserved to a greater extent than their sequences. This observation has been used to generate models of proteins from homologues with very low sequence similarities.
- The steps to creating a homology model may be summarized as follows:
- Identifying homologous proteins and determining the extent of their sequence similarity with one another and the unknown
- Aligning the sequences
- Identifying structurally conserved and structurally variable regions
- Generating coordinates for core (structurally conserved) residues of the unknown structure from those of the known structure(s)
- Generating conformations for the loops (structurally variable) in the unknown structure
- Building the side-chain conformations
- Refining and evaluating the properties of the unknown structure
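- As a small illustration of the first two steps listed above, the extent of sequence similarity between two already-aligned sequences is often summarized as a percent identity. The sketch below assumes that the alignment has been produced elsewhere and that gaps are written as `-`; the choice of denominator (columns in which neither sequence has a gap) is one of several conventions in use and is an assumption of this sketch.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned (non-gap) columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```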
- Several commercially available computer programs, such as but not limited to MOE™ (Chemical Computing Group Inc, Montreal, Canada), InsightII® (Accelrys, Inc., San Diego, Calif.), Homology (Accelrys, San Diego, Calif.), and Composer™ (Tripos, Inc., St. Louis, Mo.) can be used to perform homology modeling. Threading algorithms are described in Godzik A, Skolnick J, Kolinski A. 1992, J Mol Biol 227:227-238 and in other literature. Commercially available threading software includes MatchMaker™ (Tripos, Inc., St. Louis, Mo.).
- Several templates can be identified and used to derive one or more three-dimensional structures for the target protein. These different three-dimensional structures for the target protein may be used in a parallel fashion in the in silico screening process (2008) described below. Once three-dimensional structure(s) of the target protein(s) is (are) obtained (2006), computer programs are used to predict possible drug binding site(s) (2010) for the compounds in these three-dimensional structures.
- Several computer programs can be used to identify possible association site(s) (2010), such as but not limited to the shape-based approach from “Cerius2 LigandFit” (Accelrys Inc, San Diego, Calif.), or the mixed size/properties approach from “MOE™ Site Finder” (Chemical Computing Group Inc., Montreal, Canada).
- In the case of shape-based methods, the sites are defined based on the shape of the target protein. Within the volume of the target protein, a flood-filling algorithm is employed to search unoccupied, connected grid points, which form the cavities (sites). All sites detected can be browsed according to their size, and a user defined size cutoff eliminates sites smaller than the specified size. Mixed shape/properties sites are defined as connections of hydrophobic and hydrophilic spheres in contact with mainly hydrophobic regions of the target protein. The sites are ranked according to the number of hydrophobic contacts made with the receptor, therefore including information about the chemistry of the receptor in addition to its geometry.
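- The flood-filling idea described above can be sketched on a simple occupancy grid as follows. This is an illustrative simplification and not the algorithm of any named commercial program: the grid is a nested list of booleans, connectivity is through the six face neighbors, and any free region touching the grid boundary is treated as open solvent and skipped. These choices, and the function name `find_cavities`, are assumptions of the sketch.

```python
from collections import deque


def find_cavities(occupied, min_size=1):
    """Flood-fill connected unoccupied grid points into candidate sites.

    occupied is a 3-D nested list of booleans (True where protein atoms
    fill the grid point). Returns a list of sites, largest first, each a
    list of (i, j, k) indices; sites smaller than min_size are dropped.
    """
    nx, ny, nz = len(occupied), len(occupied[0]), len(occupied[0][0])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    sites = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                if occupied[i][j][k] or (i, j, k) in seen:
                    continue
                # Breadth-first flood fill starting from this free point.
                region, touches_edge = [], False
                queue = deque([(i, j, k)])
                seen.add((i, j, k))
                while queue:
                    a, b, c = queue.popleft()
                    region.append((a, b, c))
                    if a in (0, nx - 1) or b in (0, ny - 1) or c in (0, nz - 1):
                        touches_edge = True
                    for da, db, dc in steps:
                        p = (a + da, b + db, c + dc)
                        inside = 0 <= p[0] < nx and 0 <= p[1] < ny and 0 <= p[2] < nz
                        if inside and p not in seen and not occupied[p[0]][p[1]][p[2]]:
                            seen.add(p)
                            queue.append(p)
                if not touches_edge and len(region) >= min_size:
                    sites.append(region)
    sites.sort(key=len, reverse=True)
    return sites
```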
- Possible association sites, once identified using one or more of the methods described above, are used to perform in silico screening (2008) of the probes (2012) or a suitable subset or other compound collections. The screening may be separated into two parts: (i) the docking and (ii) the scoring/ranking (2014) of probes. Both processes may be performed in parallel.
- The probes in the probe set (2012) may be treated sequentially or processed in parallel. For each probe, a user-defined number of three-dimensional conformers (2016) are generated by rotating the bonds of the probe. Typically, one thousand conformers are generated for each probe through a Monte-Carlo procedure. Other conformational search procedures such as but not limited to simulated annealing, knowledge-based search, systematic conformational search, and others known to one skilled in the art may be employed.
- Each of these conformers is docked in the association site (2008) using computational methods such as, but not limited to, those described below. One such method employs the alignment of the non mass-weighted three-dimensional principal moments of inertia of the probes with those of the association site. The conformer is shifted in its best alignment orientation in the association site to improve the docking. The orientation of the conformer that optimizes the fit between the principal moments of inertia of the probe and the association site is saved to disk, the docking score is calculated (2014) as described below for that conformer, and the docking process repeats with a new conformer of the same probe. Computer programs such as but not limited to “Cerius2® LigandFit” from Accelrys Inc. (San Diego, Calif.), DOCK (University of California at San Francisco, UCSF), F.R.E.D. (OpenEye Scientific Software, Santa Fe, N. Mex.) and others can be used for the docking procedure.
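- For illustration only, the non mass-weighted principal moments of inertia of a set of points (the atoms of a conformer, or the grid points of a site) might be computed as sketched below. Every point is given unit weight, and the eigenvalues of the symmetric 3x3 inertia tensor are obtained with a closed-form trigonometric method. The sketch computes the moments only; the alignment of the corresponding principal axes, and the function names used, are not taken from any of the programs named above.

```python
import math


def inertia_tensor(coords):
    """Non mass-weighted inertia tensor about the centroid (unit weights)."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for px, py, pz in coords:
        x, y, z = px - cx, py - cy, pz - cz
        ixx += y * y + z * z
        iyy += x * x + z * z
        izz += x * x + y * y
        ixy -= x * y
        ixz -= x * z
        iyz -= y * z
    return [[ixx, ixy, ixz], [ixy, iyy, iyz], [ixz, iyz, izz]]


def principal_moments(t):
    """Eigenvalues of a symmetric 3x3 matrix, in ascending order."""
    p1 = t[0][1] ** 2 + t[0][2] ** 2 + t[1][2] ** 2
    if p1 == 0.0:
        return sorted([t[0][0], t[1][1], t[2][2]])  # already diagonal
    q = (t[0][0] + t[1][1] + t[2][2]) / 3.0
    p2 = sum((t[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(t[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e_mid = 3.0 * q - e_max - e_min
    return [e_min, e_mid, e_max]
```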
- After docking of the conformers as described above, a score is calculated (2014) for each of the probe's conformers in the association site. Several scoring functions can be used for that purpose. One such scoring function is described below.
- In this approach, ΔE, the non-bonded interaction energy between the probe and the target protein, is calculated from the coulombic and van der Waals terms of an empirical potential energy function. ΔE is defined theoretically as: ΔE = E(complex) − [E(probe) + E(protein)], where E(complex) is the potential energy of the (protein+docked probe) complex, E(probe) is the internal potential energy of the probe in its docked conformation, and E(protein) is the potential energy of the protein alone, i.e., with no probe docked. The protein may be kept fixed during the docking procedure and therefore E(protein) would need to be estimated only once. E(complex) can be calculated either from an explicit description of all the atoms of the protein, or from a grid representation of the association site, the latter being faster in the case where a large number of compounds is to be screened. This approach includes explicitly the calculation of van der Waals interactions between atoms using a Lennard-Jones function. This scoring function favors probes that are small (minimizing van der Waals clashes) and that have large charge-charge interactions between the probe and the receptor (maximizing the electrostatic interactions). The scoring function also disfavors probes and/or conformers that exhibit large van der Waals clashes between the probes and the receptor.
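- As a hedged illustration, when the protein and the docked probe are held fixed and the energy function is pairwise additive, the intramolecular terms cancel in ΔE and what remains is a sum of coulombic and Lennard-Jones terms over probe-protein atom pairs. The sketch below computes that sum. The single epsilon and sigma shared by all atom pairs, the vacuum dielectric, and the approximate conversion factor of 332.0636 kcal·Å/(mol·e²) are simplifying assumptions; a real force field assigns parameters per atom type.

```python
import math

# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2).
COULOMB_CONSTANT = 332.0636


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy for one atom pair at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj, dielectric=1.0):
    """Coulomb energy for two point charges at distance r."""
    return COULOMB_CONSTANT * qi * qj / (dielectric * r)


def interaction_energy(probe_atoms, protein_atoms, epsilon=0.1, sigma=3.4):
    """Sum pairwise Coulomb and Lennard-Jones terms between probe and protein.

    Each atom is a tuple (x, y, z, charge) with coordinates in Angstroms.
    epsilon and sigma are illustrative values applied to every pair.
    """
    total = 0.0
    for xi, yi, zi, qi in probe_atoms:
        for xj, yj, zj, qj in protein_atoms:
            r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
            total += coulomb(r, qi, qj) + lennard_jones(r, epsilon, sigma)
    return total
```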
- Other scoring functions may be used. These include, but are not limited to, LUDI (Böhm, H. J., J. Comp. Aided Molec. Design, 8, 243-256 (1994)); PLP (piecewise linear potential, Gehlhaar et al, Chem. Bio., 2, 317-324 (1995)); DOCK (Meng, E. C., Shoichet, B. K., and Kuntz, I. D. J. Comp. Chem. 1992 13: 505-524); and Poisson-Boltzmann (Honig, B. et al, Science, 268, 1144-9 (1995)).
- Some of the above scoring functions are implemented in several commercially available software packages, such as but not limited to Cerius2® from Accelrys, Inc. (San Diego, Calif.) and MOE™ (Chemical Computing Group Inc., Montreal, Canada).
- FIG. 20 illustrates a process according to one embodiment of this invention. This docking (2008)/scoring (2014) process is done independently for each probe. The score calculated for one probe's conformers does not depend on the calculations for other probes or conformers. Therefore, this process is highly scalable, and can be distributed among any number of computers that have the required programs. With two computers, for instance, the probes can be divided into two groups that will be docked and scored in parallel. Ultimately, each probe could be docked and scored individually on one processor. Massively parallel computer architecture could then be used to linearly improve the efficiency of the process. The docking (2008)/scoring (2014) approaches described above can be used to perform massive throughput in silico screening (2008) of compounds.
- Each combination of protein structure and probe conformer may be rank ordered based on the scores calculated as described above. In the present embodiment, the highest-ranking protein structure-probe conformer complexes (based on their scores) are saved for each probe. Optionally, several scoring functions (as described above) may also be utilized, yielding a set of scores for each protein structure-probe conformer complex, with a consensus score and rank order determined from the set of scores and utilized for the final ranking. Other methods for rank ordering known to one skilled in the art may also be employed.
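- One possible, non-limiting way to keep the best conformer per probe and to form a consensus ranking from several scoring functions is sketched below. The sketch assumes that lower scores are better (as for an energy-like score), that every scoring function has scored the same probes, and that the consensus is the mean of the per-function ranks; none of these choices is prescribed by the embodiment described.

```python
def best_score_per_probe(scores):
    """Keep the best (lowest) score for each probe.

    scores is an iterable of (probe_id, conformer_id, score) tuples.
    Returns a dict mapping probe_id to its best score.
    """
    best = {}
    for probe, _conformer, score in scores:
        if probe not in best or score < best[probe]:
            best[probe] = score
    return best


def rank_positions(best):
    """Map each probe to its 1-based rank (1 = lowest, i.e. best, score)."""
    ordered = sorted(best, key=lambda p: (best[p], p))
    return {p: i + 1 for i, p in enumerate(ordered)}


def consensus_ranking(score_sets):
    """Rank probes by their mean rank across several scoring functions."""
    ranks = [rank_positions(best_score_per_probe(s)) for s in score_sets]
    probes = set(ranks[0])
    mean_rank = {p: sum(r[p] for r in ranks) / len(ranks) for p in probes}
    return sorted(probes, key=lambda p: (mean_rank[p], p))
```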
- The above rank ordered probe list is used to select a subset of probes from the entire probe set to be considered for in biologico screening. This subset may be determined using one or more of the following protocols or other protocols known to one skilled in the art.
- a) A user-specified percentage of the rank ordered probe list
- b) The first “N” members of the rank ordered probe list, where “N” is the number of probes requested by the user
- c) The sample plates containing the probes selected in either protocol a or b
- d) The first “M” sample plates containing the probes selected in either protocol a or b where “M” is user specified
- e) Optionally, the nearest neighbors of the probes selected in either protocol a or b, where the neighbor selection criteria are user specified (the nearest neighbors of the probes are themselves probes)
- f) The sample plates containing the probes selected in protocol e.
- g) The first “M” sample plates containing the probes selected in protocol f, where “M” is user specified.
- h) A diverse subset of the high ranking probes
- i) The corresponding sample plates containing the probe subset from protocol h
- In the above protocols, the user specified percentage may typically range from 10 to 60 percent, more preferably from 10 to 50 percent. The number of samples or plates designated as “N” or “M” is dependent on the specific in biologico assay, but typically ranges from 1,000 to 100,000 compounds or 10 to 1,000 plates respectively.
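- Protocols a through d above might be sketched as follows, assuming the rank ordered probe list is a Python list with the best probe first and that a dictionary maps each probe to its sample plate. The function names and the rounding-up of the percentage are illustrative assumptions.

```python
import math


def top_percent(ranked, percent):
    """Protocol a: the leading `percent` of a rank ordered probe list."""
    count = math.ceil(len(ranked) * percent / 100.0)
    return ranked[:count]


def top_n(ranked, n):
    """Protocol b: the first n members of a rank ordered probe list."""
    return ranked[:n]


def plates_for(probes, plate_of, max_plates=None):
    """Protocols c and d: plates holding the selected probes, in rank order.

    plate_of maps a probe id to its plate id. When max_plates is given,
    only the first max_plates distinct plates are returned.
    """
    plates = []
    for probe in probes:
        plate = plate_of[probe]
        if plate not in plates:
            plates.append(plate)
    return plates if max_plates is None else plates[:max_plates]
```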
- The rank ordered probe list (2018) obtained as described above is subjected to in biologico screening (not shown) against the target(s). Optionally, the entire probe set (2012), or a diverse subset (selected using methods known to one skilled in the art) of the entire probe set, or other means of selection (known to one skilled in the art) of a custom subset may be subjected to in biologico screening (not shown) against the target(s). The biological activity measured in this screening (described above) is used in the selection of a subset of probes based on a user-selected level of biological activity measured in the in biologico screening. This subset of probes is defined as the list of in biologico hits (not shown).
- Optionally, the nearest neighbors of the in biologico hits selected above may be determined using methods for neighbor list selection as described above and subjected to further in biologico screening. In the case where one or more near neighbor probe(s) have not been synthesized, they may be synthesized.
- The lists of in silico and in biologico hits are divided into three categories (29410): hits found only in silico, hits found only in biologico, and hits found both in silico and in biologico. The members of the in silico only category are in silico hits that are not identified as hits in biologico. Conversely, members of the in biologico only category are in biologico hits that are not identified as in silico hits. The members of the remaining category are in silico hits that are also identified as in biologico hits. A populated category of hits found both in silico and in biologico serves to validate the entire process and especially the in silico protocols. In practice, a population of 10 percent or more of the selected in silico hits (2018) that are also validated through in biologico screening is considered to be a strong validation.
- The in biologico hits populating structural scaffolds/templates may be considered leads, may optionally be utilized in the generation of more complex probes, and may also be included in the Candidate Probe Set.
- Optionally, the relative populations of the three categories may be reviewed to determine if there is a need to refine the in silico protocols described in FIG. 20. In practice, if the in silico category contains more than 50 to 60 percent of the in silico hits (2018) (the threshold level), refinement is recommended. Likewise, if the second category is populated (the threshold level), refinement is also recommended.
- In the case where neighbors of the in silico hits and/or the plates containing the in silico hits are subjected to in biologico screening, the potential arises wherein some of the in biologico hits may not have been selected in the in silico screening (2018). In this case, the first category may be populated.
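- The division of hits into three categories, and the 10 percent validation criterion, can be illustrated with set operations. This is a sketch under stated assumptions: the category names and function names are hypothetical labels, and hits are assumed to be represented by hashable probe identifiers.

```python
def categorize_hits(in_silico_hits, in_biologico_hits):
    """Split hits into the three categories described in the text."""
    silico = set(in_silico_hits)
    biologico = set(in_biologico_hits)
    return {
        "in_silico_only": silico - biologico,
        "in_biologico_only": biologico - silico,
        "both": silico & biologico,
    }


def validated_fraction(categories):
    """Fraction of the in silico hits that were confirmed in biologico."""
    n_in_silico = len(categories["in_silico_only"]) + len(categories["both"])
    return len(categories["both"]) / n_in_silico if n_in_silico else 0.0


def is_strong_validation(categories, threshold=0.10):
    """Apply the 10 percent criterion stated in the text."""
    return validated_fraction(categories) >= threshold


if __name__ == "__main__":
    cats = categorize_hits(["a", "b", "c", "d"], ["c", "d", "e"])
    print(cats, validated_fraction(cats), is_strong_validation(cats))
```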
- FIGS. 21-27 are screen shots of one embodiment of this invention. The screen shots illustrated in FIGS. 21-27 are for illustration purposes only. A particular embodiment of this invention may not currently include every feature shown and may include additional features. For example, the advanced user configuration shown in FIG. 21A may not contain all of the applications listed due to licensing constraints, changes in availability, and other factors. FIG. 21A is a screen shot illustrating an advanced user configuration interface in one embodiment of this invention. FIG. 21B is a screen shot illustrating an advanced user configuration interface in an embodiment of this invention used for ligand configuration.
- FIG. 22 is a screen shot illustrating an administrator configuration interface in one embodiment of this invention. FIG. 23 is a screen shot illustrating a user interface for providing the status of jobs submitted to the heterogeneous cluster. FIG. 24 is a screen shot illustrating the help system in one embodiment of this invention.
- FIG. 25 illustrates the process of performing 3-D structure determination. In the embodiment shown, a sequence is entered in the user interface 2502. The sequence is converted to a structure 2504. As described above, multiple sequences may be converted to multiple structures in parallel 2506.
- FIG. 26 illustrates a process for binding site identification in one embodiment of this invention. In the embodiment shown, the structure 2504 generated according to the process illustrated in FIG. 25 is the input. The user utilizes a user interface 2602 to enter parameters to control the identification process. Once the identification is complete, the structure with binding sites identified 2604 is produced. As described above, multiple binding site processes may be performed in parallel 2606.
- FIG. 27 illustrates a process for docking according to this invention. In the process shown, a three-dimensional model 2702 is input. A user enters docking-related parameters into the user interface 2704. The application modules execute, producing an output, such as the graph 2706 shown in FIG. 27. As described above, multiple docking processes may be performed in parallel 2708.
- One embodiment of this invention uses a variety of software languages to integrate various modules. For example, in one embodiment of this invention, Perl is used to perform integration within the user interface; SVL is used for protein modeling; and Cerius2 and other proprietary and public scripts are used to implement procedures within commercial software packages. Also, shell scripts are implemented where necessary, for example, for parallelization of the process. CGI, PHP, HTML, XML, Java, and JavaScript provide the necessary functionality for presentation with the user interface.
- In another embodiment of this invention, ligand based design (LBD) approaches can be used in lieu of or in combination with structure-based design approaches. Furthermore, several LBD tools may be deployed as illustrated in FIG. 21B. For instance, pharmacophore models could be developed based on the three-dimensional structure(s) of known biologically active compounds using programs such as CATALYST (Accelrys Inc., San Diego, Calif.) or UNITY (Tripos Inc., St. Louis, Mo.). Using these approaches, chemical features aligned in a three-dimensional space are identified and related to biological activity.
- Further refinements can be achieved by combining shape-based filters such as excluded volume and principal moments of inertia with pharmacophore models. Such models can be used to screen TTProbes™ and TTPIntergrated libraries™ (TransTech Pharma, High Point, N.C.; www.ttpharma.com), commercial or private compound databases, or in silico designed compound libraries to identify compounds with similar pharmacophoric features.
- In another embodiment, the features of the biologically active reference compound(s) can be represented as fingerprints. These are bit-strings (sequences of 1's (on) or 0's (off)) representing the presence or absence of various sub-structural features within a compound's chemical structure. Each bit represents an axis in a multi-dimensional substructure space. Fingerprints may consist of thousands of bits. Thus, a 1000-bit fingerprint represents a point in a 1000-dimensional chemistry space. Similar compounds are expected to be located near each other in this space; dissimilar or “diverse” compounds are expected to be further apart from each other. Thus, biologically active reference compound(s) can be projected in the same chemistry space along with TTProbes™, TTPIntergrated libraries™, commercial or private compound databases, or in silico designed compound libraries to identify novel compounds whose biological activities may mimic those of the known reference compound(s).
- Fingerprints are calculated using computer programs available from vendors such as but not limited to MDL Information Systems (San Leandro, Calif.) (ISIS fingerprints) or Daylight Chemical Information Systems Inc. (Mission Viejo, Calif.) (Daylight fingerprints). Similarity metrics such as Tanimoto coefficient (described below) are used to compute inter-compound “distance”. The magnitude of this “distance” is directly proportional to the structural similarity between compounds.
- Tanimoto coefficient is calculated by Tc=[Nab]/[Na+Nb−Nab], where Na is the number of on-bits in molecule a; Nb the number of on-bits in molecule b, and Nab the number of common on-bits. Identical molecules have Tc of 1. Two compounds are deemed to be similar if they have a Tanimoto coefficient greater than a preset cutoff value. This value depends on the fingerprint used, but is usually 0.8 or above. Computer programs developed and/or described herein allow the selection of TTProbes™ or TTPIntergrated libraries™, commercial or private databases, or in silico designed compound libraries to identify compounds that have a Tc above a user-defined cutoff relative to biologically active reference compound(s).
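- The Tanimoto calculation and cutoff-based selection described above can be sketched as follows. This is an illustration only: fingerprints are assumed to be represented as Python sets of on-bit indices rather than ISIS or Daylight fingerprints, the function names are hypothetical, and returning 0.0 for two empty fingerprints is an assumption the text does not address.

```python
def tanimoto(fp_a, fp_b):
    """Tc = Nab / (Na + Nb - Nab) for fingerprints given as sets of on-bits."""
    n_a = len(fp_a)            # Na: on-bits in molecule a
    n_b = len(fp_b)            # Nb: on-bits in molecule b
    n_ab = len(fp_a & fp_b)    # Nab: on-bits common to both
    denominator = n_a + n_b - n_ab
    # Assumption: two fingerprints with no on-bits are scored 0.0.
    return n_ab / denominator if denominator else 0.0


def similar_to_reference(library, reference_fp, cutoff=0.8):
    """Return IDs of library compounds whose Tc to the reference exceeds `cutoff`."""
    return [
        compound_id
        for compound_id, fp in library.items()
        if tanimoto(fp, reference_fp) > cutoff
    ]


if __name__ == "__main__":
    reference = {1, 2, 3, 4, 5}
    library = {
        "probe_A": {1, 2, 3, 4, 5},       # identical to the reference
        "probe_B": {1, 2, 3, 4, 5, 6},    # five of six bits shared
        "probe_C": {7, 8, 9},             # nothing shared
    }
    print(similar_to_reference(library, reference))
```

The default cutoff of 0.8 follows the typical value quoted in the text; in practice it would be user-defined and fingerprint dependent.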
- In another embodiment, common substructures or topology or graph theoretical methods (molecular graphs) are employed to identify TTProbes™ or TTPIntergrated libraries™, commercial or private, or in silico designed compound libraries similar to biologically active reference compound(s). Computer programs such as ClassPharmer Suite (Bioreason, Santa Fe, N. Mex.) or OEChem (OpenEye Scientific Software, Santa Fe, N. Mex.) or implementations of algorithms such as those described by Willet et al (J. Chem. Inf. Comput. Sci., 2002, 42, 305-316) or Schneider et al (Angew. Chem. Int. Ed. Engl., 1999, 38, 2894-2896) or others known to those skilled in the art could also be employed.
- In yet another embodiment of the LBD approach, the steric and electrostatic fields of biologically active reference compound(s) are used to identify TTProbes™ or TTPIntergrated libraries™, commercial or private, or in silico designed compound libraries. Using this technique, quantitative and/or qualitative models that predict the biological activity or property of the reference compound(s) can be used to predict biological activities of TTProbes™ or TTPIntergrated libraries™, commercial or private databases, or in silico designed compound libraries. Computer software such as COMFA® (Tripos Inc., St. Louis, Mo.) and/or COMSIA developed by Klebe et al (J. Med. Chem., 37, 4130-4146 (1994)) or others known to those skilled in the art could be used for this purpose.
- In another embodiment of this invention, Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET) filters or other chemical functionality/feature filters may be employed to obtain TTProbes™ or TTPIntergrated libraries™, commercial or private databases, or in silico designed compound libraries. For instance, a user may be allowed to filter TTProbes™ or TTPIntergrated libraries™, commercial or private databases, or in silico designed compound libraries with molecular weight <500 and/or include compounds that contain at least one basic nitrogen. These chemical functionality/features may be obtained from physicochemical descriptors that could be computed for any given compound using commercial or private software such as but not limited to MOE™ (Chemical Computing Group, Montreal, Canada) or Cerius2 (Accelrys Inc., San Diego, Calif.). In addition, flexibility to filter TTProbes™ or TTPIntergrated libraries™, commercial or private databases, or in silico designed compound libraries based on ADMET thresholds set by the user could also be achieved. These thresholds could be defined based on experimental measurements available for a series of related compounds or predicted using commercial or private software such as but not limited to iDEA™ (Lion bioscience AG, Heidelberg, Germany), QikProp (Schroedinger Inc., Portland, Oreg.) or GastroPlus™ (Simulations Plus, Inc., Lancaster, Calif.). Alternatively, local ADMET models could be developed using any or many of the QSAR/QSPR techniques as implemented in MOE™ (Chemical Computing Group, Montreal, Canada) or Cerius2 (Accelrys Inc., San Diego, Calif.) or other implementations known to those skilled in the art. These models could then be implemented for predicting the ADMET properties of structurally and chemically related or diverse compounds.
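- A property filter of the kind described above (molecular weight below 500 and at least one basic nitrogen) might look like the following sketch. The descriptor keys `mol_weight` and `basic_nitrogens` and the function names are hypothetical; the descriptor values are assumed to have been computed beforehand by a descriptor package, and only the "and" combination of the two criteria is shown.

```python
def passes_filter(descriptors, max_mol_weight=500.0, min_basic_nitrogens=1):
    """True if a compound satisfies both user-set property thresholds."""
    return (
        descriptors["mol_weight"] < max_mol_weight
        and descriptors["basic_nitrogens"] >= min_basic_nitrogens
    )


def filter_library(library, **thresholds):
    """Keep only the compounds whose precomputed descriptors pass the filter."""
    return {
        compound_id: descriptors
        for compound_id, descriptors in library.items()
        if passes_filter(descriptors, **thresholds)
    }


if __name__ == "__main__":
    # Illustrative values only, not measured or computed data.
    library = {
        "cmpd_1": {"mol_weight": 423.5, "basic_nitrogens": 1},
        "cmpd_2": {"mol_weight": 612.7, "basic_nitrogens": 2},
        "cmpd_3": {"mol_weight": 310.2, "basic_nitrogens": 0},
    }
    print(sorted(filter_library(library)))
```

Because the filter returns a dictionary of the same shape as its input, its output can be passed directly to a later stage, which matches the use of a properties filter as either a pre-processing or a post-processing step.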
- Although structure-based design approaches, ligand-based design approaches, and compound properties filters could be used independently, the embodiments of this invention may combine these approaches in any combinations in performing computer-aided molecular discovery. For instance, the properties filter can be a post-processing filter for the compound subset that comes out of the structure-based design approach results. Alternatively, the properties filter may be employed to screen TTProbes™ or TTPIntergrated libraries™, commercial or private databases, or in silico designed compound libraries and the resulting compound subset could then be passed as an input for ligand based design approaches.
- Embodiments of this invention may support a variety of functions related to molecular discovery beyond the processes described above. For example, embodiments may support: (1) large scale (millions) enumeration of library compounds; (2) parallelized conformation generation; (3) large scale physico-chemical descriptor and molecular fingerprint calculation; (4) same ligand set, variable protein model analysis; (5) cross-site same protein/variable ligand set analysis; (6) in silico high-throughput screening of compounds; (7) energy minimization and/or molecular dynamics simulations of protein three dimensional structural model(s) and/or protein three dimensional model(s) with ligand bound; (8) ligand based design approaches; (9) property based filters; and (10) QSAR/QSPR modeling.
- In addition to the functionality described in detail above, one embodiment of this invention may include a variety of other functions and processes. For example, an embodiment may include administration functions. Various user types are defined, such as administrator, advanced user, and casual or novice user, and the interface and functioning of the system are varied based on the user type.
- Some organizations utilizing an embodiment of this invention may require that security measures be implemented to ensure that the data generated and consumed by the system will not become known outside the organization. One embodiment of this invention operates only within a firewall and utilizes Secure Sockets Layer (SSL) to provide security.
- One embodiment of this invention may be implemented on a single client site or across multiple client sites, utilizing standard protocols, such as TCP/IP. Therefore, a variety of billing and licensing strategies may be utilized. For example, an organization may purchase an unlimited license, or an organization may simply purchase one or more per-seat licenses. In addition, one embodiment of this invention may be implemented as an application or web service to which organizations subscribe.
- The following example provides an illustration of how one embodiment of this invention may be utilized. The process described below was executed by integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications. TTPredict™ provided the automation and integration features necessary to provide both horizontal and vertical integration of the various modules within a heterogeneous computer platform.
- Thrombin is a suitable target for drug discovery using this method. Thrombin lies in the final common pathway of coagulation and cleaves fibrinogen to fibrin thereby generating the biological polymer that constitutes part of a blood clot in mammals. Therefore, inhibition of thrombin would be expected to exert an antithrombotic effect.
- In the present embodiment, TTProbes™ that are thrombin inhibitors were identified by using TTPredict™ starting from the protein sequence of thrombin. The sequence of thrombin was obtained from Swiss-Prot (www.expasy.ch; Accession number: P00734). This sequence was input to the TTPostGene™ module within TTPredict™.
- Using this input protein sequence, TTPostGene™ identified several template protein 3D-structures from the protein data bank belonging to the same family. At this point, the X-ray structure of human thrombin (PDB code: 1AD8) was selected as the template structure to build the homology model using InsightII® (Accelrys Inc., San Diego, Calif.).
- Putative binding sites for the thrombin homology model were identified using Cerius2® (Accelrys Inc, San Diego, Calif.). Of the several binding sites identified by the TTPSite™ module, only the first two sites were utilized for further in silico screening efforts.
- The two-dimensional structures of the TTProbes™ stored in the database were initially cleaned to remove the salts (if present) and subjected to an energy minimization in order to generate the three-dimensional conformation of the probes. The energy minimization of the TTProbes™ was performed using MOE™ (Chemical Computing Group, Montreal, Canada) and the results were stored in an SD file. This input compound database containing over 50,000 TTProbes™ was used to perform docking studies against both of the identified putative binding sites. For each probe, a maximum of 1000 conformations were generated on the fly using the Monte Carlo procedure implemented within Cerius2® (Accelrys Inc, San Diego, Calif.).
- The docking of TTProbes™ for each of the binding sites was executed in a parallel process using four Linux nodes and an SGI node. The TTProbes™ were dispatched to a given node in chunks of 1000 probes at a time; after docking of a chunk was complete, the next 1000 probes were passed to that node so that load balancing between the various processors could be achieved.
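- The chunked dispatch described above can be sketched with a worker pool in which each worker picks up the next chunk as soon as it finishes its current one. This is an illustration under stated assumptions: local threads stand in for the five cluster nodes, `dock_chunk` is a hypothetical placeholder that returns dummy scores rather than calling any docking software, and the function names are not taken from the embodiment.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def dock_chunk(probe_ids):
    """Hypothetical placeholder for docking one chunk on one node."""
    return [(probe_id, 0.0) for probe_id in probe_ids]  # dummy scores


def dock_all(probe_ids, n_nodes=5, chunk_size=1000, dock=dock_chunk):
    """Dock all probes, handing each idle worker the next pending chunk."""
    results = []
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        # map() queues every chunk; a worker takes a new chunk only when it
        # has finished the previous one, which balances the load.
        for chunk_result in pool.map(dock, chunked(probe_ids, chunk_size)):
            results.extend(chunk_result)
    return results


if __name__ == "__main__":
    scored = dock_all(list(range(2500)))
    print(len(scored))
```

Results are gathered in submission order, so the combined output preserves the original probe order regardless of which worker finished first.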
- Each probe conformer was docked into the putative binding site(s) and a score value was assigned to each of the thrombin-related probe conformer complexes using the LigScore_Dreiding scoring function. However, only the best scored conformer for each probe was saved. Subsequently, four more scoring functions (PLP1, PLP2, PMF, and JAIN) were employed to score the saved thrombin-related probe conformer complexes for each probe. A correlation matrix obtained for the five scoring functions showed over 80% correlation between PLP1 and PLP2. Consequently, the results of PLP2 were not used or considered further. In addition, several combinations of the scoring functions (usually three) were used to rank order the first 20% of the top ranked probes within each binding site. For instance, if a particular probe fell within the first 20% of the rank ordered list for one scoring function and also for two other scoring functions, the probe was assigned a rank of 3. If the probe fell within the first 20% for only two of the three selected scoring functions, the probe was assigned a rank of 2.
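- The consensus ranking described above, in which a probe's rank is the number of selected scoring functions that place it in their first 20%, can be sketched as follows. The function names are hypothetical, and the sketch assumes that a higher score is better for every scoring function, which may not hold for the actual scoring functions named in the text.

```python
def top_fraction(scores, fraction=0.2, higher_is_better=True):
    """Return the probe IDs in the best `fraction` of one scoring function."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    keep = max(1, int(len(ordered) * fraction))
    return set(ordered[:keep])


def consensus_rank(score_tables, fraction=0.2):
    """Count, per probe, how many scoring functions rank it in the top fraction.

    `score_tables` maps a scoring function name to {probe_id: score}.
    """
    tops = [top_fraction(table, fraction) for table in score_tables.values()]
    probes = set().union(*(table.keys() for table in score_tables.values()))
    return {probe: sum(probe in top for top in tops) for probe in probes}


if __name__ == "__main__":
    # Illustrative scores only: ten probes, three scoring functions.
    base = {f"P{i}": 10 - i for i in range(10)}
    shifted = dict(base, P2=9.5)          # P2 overtakes P1 in this function
    ranks = consensus_rank({"f1": base, "f2": shifted, "f3": base})
    print(ranks["P0"], ranks["P1"], ranks["P2"], ranks["P3"])
```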
- The TTProbe™ ID, along with the five scoring function values and the multiple scoring function combination results, was passed to TTPSelect™ to select or identify the plate identifiers. Two thousand of the top ranked unique probes for each of the four scoring functions were identified, labeled as in silico probe hits and saved separately, thereby generating 8,000 in silico probe hits. Subsequently, the identification numbers of the plates containing the in silico probe hits, along with the number of in silico probe hits in each of these plates, were obtained.
- Instead of performing in biologico screening on all 8,000 in silico probe hits (the two thousand best ranked unique probes from each of the four scoring functions), a subset of the 8,000 in silico probe hits was selected for subsequent screening activities. A subset of 800 in silico probe hits was obtained by selecting the top ranked plates, that is, the plates containing the maximum number of in silico probe hits for each of the scoring functions, and was submitted for in biologico screening against thrombin.
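- Ranking plates by the number of in silico probe hits they contain can be sketched as follows. The `plate_of` mapping and the function names are hypothetical, and breaking ties by plate identifier is an assumption made only to keep the ordering deterministic.

```python
from collections import Counter


def rank_plates_by_hits(hit_ids, plate_of):
    """Return (plate_id, hit_count) pairs, most heavily populated plate first."""
    counts = Counter(plate_of[hit] for hit in hit_ids)
    # Sort by descending hit count, then by plate ID to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def select_plates(hit_ids, plate_of, n_plates):
    """Pick the `n_plates` plates holding the most in silico probe hits."""
    ranked = rank_plates_by_hits(hit_ids, plate_of)
    return [plate for plate, _ in ranked[:n_plates]]


if __name__ == "__main__":
    # Illustrative data only.
    plate_of = {"h1": "PL03", "h2": "PL01", "h3": "PL03", "h4": "PL02", "h5": "PL03"}
    print(rank_plates_by_hits(plate_of.keys(), plate_of))
    print(select_plates(plate_of.keys(), plate_of, 2))
```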
- Although it would have been more relevant to screen only those probes in these plates that were identified as in silico probe hits, the computed Tc values revealed the other probes in each of the plates containing in silico probe hits to be near neighbors. Hence, all the probes in the selected plates were subjected to in biologico screening against thrombin.
- The biological assay for thrombin inhibitory activity is detailed below. To Costar® 96-well black fluorescence plate wells is added 70 microliters of assay buffer, followed by 10 microliters of 1 millimolar substrate solution. Test probe (10 microliters in 30% DMSO) is then added to wells according to the desired concentrations for the assay. The mixture is incubated at 37° C. for 5 minutes, followed by addition of 10 microliters of thrombin (100 micrograms/mL in assay buffer), to make a final assay volume of 100 microliters. The plate is mixed gently and incubated for 15 minutes at 37° C. Stop buffer (100 microliters) is added, and the plate is read by detecting fluorescence intensity (excitation at 360 nm, emission at 460 nm). Percent inhibition of test compound is calculated by comparison with control wells. “Assay buffer” is composed of 100 mM KH2PO4, 100 mM Na2HPO4, 1 mM EDTA, 0.01% BRIJ-35, and 1 mM dithiothreitol (added fresh on the day the assay is performed). “Stop buffer” is composed of 100 mM sodium monochloroacetate (Na-O(O)CCH2Cl) and 30 mM sodium acetate, brought to pH 2.5 with glacial acetic acid. Thrombin was purchased from Sigma (cat #T-3399). Thrombin substrate III fluorogenic was purchased from ICN (cat #195915). Sodium acetate, dithiothreitol, and Brij-35 were purchased from Sigma. Sodium monochloroacetate was purchased from Lancaster 223-498-3. Glacial acetic acid was purchased from Alfa Aesar (cat # 33252). Thrombin was stored at −20° C. Thrombin substrate fluorogenic was stored at −20° C. (5 mM in DMSO).
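- The text states only that percent inhibition is calculated by comparison with control wells and does not give a formula. The sketch below uses a common convention as an assumption: fluorescence of an uninhibited control well defines 0% inhibition and fluorescence of a blank (no enzyme) well defines 100%. The function and parameter names are hypothetical, and the numbers in the usage example are illustrative, not measured data.

```python
def percent_inhibition(test_signal, control_signal, blank_signal=0.0):
    """Percent inhibition relative to an uninhibited control well.

    Assumed convention (not stated in the source):
        100 * (1 - (test - blank) / (control - blank))
    """
    window = control_signal - blank_signal
    if window == 0:
        raise ValueError("control and blank signals must differ")
    return 100.0 * (1.0 - (test_signal - blank_signal) / window)


if __name__ == "__main__":
    # A well reading halfway between blank and control is 50% inhibited.
    print(percent_inhibition(test_signal=550.0, control_signal=1000.0, blank_signal=100.0))
```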
- Based on the dose-response behavior of the probes screened in biologico, the success of the in silico protocols in discovering probes for any given target is exemplified by the in silico probe hits that were also identified as in biologico hits.
- The foregoing description of embodiments of the invention has been presented only for the purpose of illustration and description and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Numerous modifications and adaptations thereof will be apparent to those skilled in the art without departing from the spirit and scope of the invention.
Claims (45)
1. A method for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the method comprising:
(a) receiving an input;
(b) providing said input to a first module of a first computer-aided molecular discovery application;
(c) providing said input to a second module of a second computer-aided molecular discovery application;
(d) executing said first module to create a first output; and
(e) executing said second module to create a second output.
2. The method of claim 1, wherein receiving said input comprises receiving a sequence.
3. The method of claim 1, wherein said receiving said input comprises receiving a structure.
4. The method of claim 1, wherein receiving said input comprises receiving said input in a user interface.
5. The method of claim 4, wherein receiving said input in said user interface comprises receiving said input in a graphical user interface.
6. The method of claim 4, wherein said receiving said input in a user interface comprises receiving said input in a user interface comprising markup language.
7. The method of claim 4, wherein said receiving said input in a user interface comprises receiving said input in a user interface comprising Tool Command Language (TCL).
8. The method of claim 1, wherein receiving said input comprises receiving said input from a file.
9. The method of claim 1, wherein said first module and said second module are operable to perform a similar function.
10. The method of claim 1, wherein said first module and said second module are operable to perform a complementary function.
11. The method of claim 10, wherein said first module comprises a sequence retrieval module and said second module comprises an alignment module.
12. The method of claim 1, wherein said first computer-aided molecular discovery application comprises a commercially-available application.
13. The method of claim 1, further comprising (f) combining said first output and said second output to create a combined output.
14. The method of claim 13, further comprising (g) presenting said combined output.
15. The method of claim 13, further comprising:
(g) providing said combined output to a third module of a third computer-aided molecular discovery application; and
(h) executing said third module to create a third output.
16. The method of claim 15, further comprising:
(i) providing said input to a fourth module of a fourth computer-aided molecular discovery application;
(j) executing said fourth module to create a fourth output; and
(k) combining said third output and said fourth output to create a second combined output.
17. The method of claim 1, further comprising, before step (b), receiving a selection of said first module and a selection of said second module.
18. The method of claim 1, wherein said executing of said first module comprises executing said first module on a heterogeneous computing platform.
19. The method of claim 18, wherein said executing said first module on a heterogeneous computing platform comprises:
determining the load on at least one of a plurality of nodes of said heterogeneous computing platform to identify at least one available node;
creating at least one script for processing on said at least one available node;
performing one or more of the following steps:
dividing a plurality of data elements for processing on said at least one available node,
copying said data to said at least one available node,
copying said at least one script to said at least one available node;
executing said at least one script on said at least one available node; and
combining the output of said execution of said at least one script.
20. The method of claim 18, wherein:
executing said first module comprises executing said first module on a first computing platform; and
executing said second module comprises executing said second module on a second computing platform.
21. The method of claim 1, further comprising, before step (c):
pausing to receive a continuation or cancellation input; and
receiving said continuation or cancellation input.
22. A method for integrating a computer-aided molecular discovery process using a plurality of computer-aided molecular discovery applications, the method comprising:
(a) receiving an input;
(b) providing said input to a first module of a first computer-aided molecular discovery application;
(c) executing said first module to create a first output;
(d) providing said first output to a second module of a second computer-aided molecular discovery application; and
(e) executing said second module to create a second output.
23. The method of claim 22, further comprising:
(f) providing said input to a third module of a third computer-aided molecular discovery application; and
(g) executing said third module to create a third output.
24. The method of claim 23, further comprising:
(h) providing said third output to a fourth module of a fourth computer-aided molecular discovery application; and
(i) executing said fourth module to create a fourth output.
25. The method of claim 24, further comprising:
(j) combining said second output and said fourth output to create a combined output;
(k) providing said combined output to a fifth module of a fifth computer-aided molecular discovery application; and
(l) executing said fifth module to create a fifth output.
26. A system for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the system comprising:
an application-neutral computer-aided molecular discovery application framework;
a first module interface for a first computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework; and
a second module interface for a second computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework.
27. The system of claim 26, wherein said computer-aided molecular discovery application framework comprises a module manager for managing execution of a plurality of modules of a plurality of computer-aided molecular discovery applications within a heterogeneous computing platform.
28. The system of claim 27, wherein said heterogeneous computing platform comprises a grid computing architecture.
29. The system of claim 26, wherein said computer-aided molecular discovery application framework comprises:
a job scheduler;
a parallelization manager; and
a status notifier.
30. The system of claim 29, wherein said parallelization manager comprises:
a pre-processor;
a process manager; and
a post-processor.
31. The system of claim 30, wherein said pre-processor comprises:
a node load manager;
a file splitter; and
a script generator.
32. The system of claim 30, wherein said process manager comprises:
a job database; and
a job daemon.
33. The system of claim 32, wherein said job database comprises a relational database.
34. The system of claim 30, wherein said post-processor comprises:
a file combiner;
a file clean up module; and
a per-node calculation engine.
35. A computer-readable medium on which is encoded programming code for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the computer-readable medium comprising:
(a) program code for receiving an input;
(b) program code for providing said input to a first module of a first computer-aided molecular discovery application;
(c) program code for providing said input to a second module of a second computer-aided molecular discovery application;
(d) program code for executing said first module to create a first output; and
(e) program code for executing said second module to create a second output.
36. The computer-readable medium of claim 35, further comprising:
(f) program code for providing said input to a third module of a third computer-aided molecular discovery application; and
(g) program code for executing said third module to create a third output.
37. The computer-readable medium of claim 36, further comprising:
(h) program code for providing said input to a fourth module of a fourth computer-aided molecular discovery application; and
(i) program code for executing said fourth module to create a fourth output.
38. The computer-readable medium of claim 37, further comprising (f) program code for combining said first output and said second output to create a combined output.
39. The computer-readable medium of claim 38, further comprising (g) program code for presenting said combined output.
40. The computer-readable medium of claim 35, wherein said program code for executing of said first module comprises program code for executing said first module on a heterogeneous computing platform.
41. The computer-readable medium of claim 40, wherein said program code for executing said first module on a heterogeneous computing platform comprises:
program code for determining the load on at least one of a plurality of nodes of said heterogeneous computing platform to identify at least one available node;
program code for creating at least one script for processing on said at least one available node;
program code for performing one or more of the following steps:
dividing a plurality of data elements for processing on said at least one available node,
copying said data to said at least one available node,
copying said at least one script to said at least one available node;
program code for executing said at least one script on said at least one available node; and
program code for combining the output of said execution of said at least one script.
42. The computer-readable medium of claim 40, wherein:
program code for executing said first module comprises executing said first module on a first computing platform; and
program code for executing said second module comprises executing said second module on a second computing platform.
43. A computer-readable medium on which is encoded programming code for integrating a computer-aided molecular discovery process using a plurality of computer-aided molecular discovery applications, the computer-readable medium comprising:
(a) receiving an input in a user interface;
(b) providing said input to a first module of a first computer-aided molecular discovery application;
(c) executing said first module to create a first output;
(d) providing said first output to a second module of a second computer-aided molecular discovery application; and
(e) executing said second module to create a second output.
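The chaining recited in steps (a) through (e), together with the combining step of claim 38, can be sketched as follows. The two module functions and the dictionary used for the combined output are hypothetical placeholders chosen for illustration; the application does not prescribe them.

```python
def parse_candidates(text):
    """Hypothetical first module: split comma-separated identifiers into a list."""
    return [item.strip() for item in text.split(",") if item.strip()]


def rank_by_size(candidates):
    """Hypothetical second module: order candidates by string length."""
    return sorted(candidates, key=len)


def run_pipeline(user_input, first_module, second_module):
    """Feed the input to the first module and its output to the second module."""
    first_output = first_module(user_input)        # steps (b) and (c)
    second_output = second_module(first_output)    # steps (d) and (e)
    # Combined output in the sense of claim 38 (the dict layout is an assumption).
    return {"first": first_output, "second": second_output}
```

Passing the modules in as arguments keeps the pipeline function independent of any particular application, which is the point of the integration described in the claim.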
44. A laboratory comprising a system for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the system comprising:
an application-neutral computer-aided molecular discovery application framework;
a first module interface for a first computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework; and
a second module interface for a second computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework.
45. A computer network comprising a system for integrating a computer-aided molecular discovery process across a plurality of computer-aided molecular discovery applications, the system comprising:
an application-neutral computer-aided molecular discovery application framework;
a first module interface for a first computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework; and
a second module interface for a second computer-aided molecular discovery application in communication with said computer-aided molecular discovery application framework.
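One way to picture the arrangement in claims 44 and 45 is a framework that knows only a common module interface, with one adapter per application. The sketch below is illustrative only; `ModuleInterface`, `Framework`, and the two toy adapters are hypothetical names that do not appear in the application.

```python
from abc import ABC, abstractmethod


class ModuleInterface(ABC):
    """Adapter through which the framework calls one application's module."""
    name: str

    @abstractmethod
    def execute(self, data):
        """Run the wrapped application module on `data` and return its output."""


class Framework:
    """Application-neutral framework: it depends only on ModuleInterface."""

    def __init__(self):
        self._modules = {}

    def register(self, module):
        self._modules[module.name] = module

    def run(self, names, data):
        # Each module's output becomes the next module's input.
        for name in names:
            data = self._modules[name].execute(data)
        return data


class UppercaseModule(ModuleInterface):
    """Toy adapter standing in for a first application."""
    name = "upper"

    def execute(self, data):
        return [s.upper() for s in data]


class DedupModule(ModuleInterface):
    """Toy adapter standing in for a second application."""
    name = "dedup"

    def execute(self, data):
        # dict.fromkeys removes duplicates while keeping first-seen order.
        return list(dict.fromkeys(data))
```

Adding a third application under this design means writing one more adapter and registering it; the framework itself does not change.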
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/410,965 US20040019432A1 (en) | 2002-04-10 | 2003-04-10 | System and method for integrated computer-aided molecular discovery |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US37164302P | 2002-04-10 | 2002-04-10 | |
US37164402P | 2002-04-10 | 2002-04-10 | |
US37195602P | 2002-04-11 | 2002-04-11 | |
US37187102P | 2002-04-11 | 2002-04-11 | |
US10/410,965 US20040019432A1 (en) | 2002-04-10 | 2003-04-10 | System and method for integrated computer-aided molecular discovery |
Publications (1)
Publication Number | Publication Date |
---|---|
US20040019432A1 (en) | 2004-01-29 |
Family ID: 29255578
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/410,965 Abandoned US20040019432A1 (en) | 2002-04-10 | 2003-04-10 | System and method for integrated computer-aided molecular discovery |
US10/411,568 Expired - Fee Related US7146384B2 (en) | 2002-04-10 | 2003-04-10 | System and method for data analysis, manipulation, and visualization |
US11/586,776 Abandoned US20070043694A1 (en) | 2002-04-10 | 2006-10-26 | System and method for data analysis, manipulation, and visualization |
Family Applications After (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/411,568 Expired - Fee Related US7146384B2 (en) | 2002-04-10 | 2003-04-10 | System and method for data analysis, manipulation, and visualization |
US11/586,776 Abandoned US20070043694A1 (en) | 2002-04-10 | 2006-10-26 | System and method for data analysis, manipulation, and visualization |
Country Status (5)
Country | Link |
---|---|
US (3) | US20040019432A1 (en) |
EP (2) | EP1495432A2 (en) |
AU (2) | AU2003226043A1 (en) |
CA (2) | CA2480202A1 (en) |
WO (2) | WO2003088090A2 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070050092A1 (en) * | 2005-08-12 | 2007-03-01 | Symyx Technologies, Inc. | Event-based library process design |
WO2008046208A1 (en) * | 2006-10-16 | 2008-04-24 | Zymeworks Inc. | System and method for simulating the time-dependent behaviour of atomic and/or molecular systems subject to static or dynamic fields |
US7373541B1 (en) * | 2004-03-11 | 2008-05-13 | Adaptec, Inc. | Alignment signal control apparatus and method for operating the same |
US20080275884A1 (en) * | 2007-05-04 | 2008-11-06 | Salesforce.Com, Inc. | Method and system for on-demand communities |
US20100145896A1 (en) * | 2007-08-22 | 2010-06-10 | Fujitsu Limited | Compound property prediction apparatus, property prediction method, and program for implementing the method |
EP2216429A1 (en) * | 2007-11-12 | 2010-08-11 | In-Silico Sciences, Inc. | In silico screening system and in silico screening method |
US20130046808A1 (en) * | 2005-03-01 | 2013-02-21 | Csc Holdings, Inc. | Methods and systems for distributed processing on consumer devices |
WO2013163068A1 (en) * | 2012-04-23 | 2013-10-31 | Targacept, Inc. | Chemical entity search, for a collaboration and content management system |
WO2015148546A1 (en) * | 2014-03-24 | 2015-10-01 | Life Technologies Corporation | Methods and systems for knowledge discovery using biological data |
US20150286637A1 (en) * | 2007-10-16 | 2015-10-08 | Jpmorgan Chase Bank, N.A. | Document Management Techniques To Account For User-Specific Patterns In Document Metadata |
WO2017072794A1 (en) | 2015-10-30 | 2017-05-04 | Council Of Scientific And Industrial Research | An automated remote computing method and system by email platform for molecular analysis |
Families Citing this family (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7496480B2 (en) * | 2003-08-15 | 2009-02-24 | National Instruments Corporation | Sweep manager for signal analysis |
US7978716B2 (en) | 2003-11-24 | 2011-07-12 | Citrix Systems, Inc. | Systems and methods for providing a VPN solution |
WO2005069012A2 (en) * | 2003-12-12 | 2005-07-28 | Transtech Pharma, Inc. | Ligands for i7l as modulators of orthopox viruses and methods for discovery thereof |
US7757074B2 (en) | 2004-06-30 | 2010-07-13 | Citrix Application Networking, Llc | System and method for establishing a virtual private network |
US8739274B2 (en) | 2004-06-30 | 2014-05-27 | Citrix Systems, Inc. | Method and device for performing integrated caching in a data communication network |
US8495305B2 (en) | 2004-06-30 | 2013-07-23 | Citrix Systems, Inc. | Method and device for performing caching of dynamically generated objects in a data communication network |
EP1771998B1 (en) | 2004-07-23 | 2015-04-15 | Citrix Systems, Inc. | Systems and methods for optimizing communications between network nodes |
KR20070037649A (en) | 2004-07-23 | 2007-04-05 | Citrix Systems, Inc. | A method and systems for routing packets from a gateway to an endpoint |
US20060143244A1 (en) * | 2004-12-28 | 2006-06-29 | Taiwan Semiconductor Manufacturing Co., Ltd. | Semiconductor data archiving management systems and methods |
US8700695B2 (en) | 2004-12-30 | 2014-04-15 | Citrix Systems, Inc. | Systems and methods for providing client-side accelerated access to remote applications via TCP pooling |
US8549149B2 (en) | 2004-12-30 | 2013-10-01 | Citrix Systems, Inc. | Systems and methods for providing client-side accelerated access to remote applications via TCP multiplexing |
US7810089B2 (en) | 2004-12-30 | 2010-10-05 | Citrix Systems, Inc. | Systems and methods for automatic installation and execution of a client-side acceleration program |
US8954595B2 (en) | 2004-12-30 | 2015-02-10 | Citrix Systems, Inc. | Systems and methods for providing client-side accelerated access to remote applications via TCP buffering |
US8706877B2 (en) | 2004-12-30 | 2014-04-22 | Citrix Systems, Inc. | Systems and methods for providing client-side dynamic redirection to bypass an intermediary |
US8255456B2 (en) | 2005-12-30 | 2012-08-28 | Citrix Systems, Inc. | System and method for performing flash caching of dynamically generated objects in a data communication network |
WO2007005769A1 (en) * | 2005-06-30 | 2007-01-11 | Applera Corporation | Automated quality control method and system for genetic analysis |
CA2632235A1 (en) | 2005-12-02 | 2007-06-07 | Citrix Systems, Inc. | Method and apparatus for providing authentication credentials from a proxy server to a virtualized computing environment to access a remote resource |
US7571151B1 (en) * | 2005-12-15 | 2009-08-04 | Gneiss Software, Inc. | Data analysis tool for analyzing data stored in multiple text files |
US8301839B2 (en) | 2005-12-30 | 2012-10-30 | Citrix Systems, Inc. | System and method for performing granular invalidation of cached dynamically generated objects in a data communication network |
US7921184B2 (en) | 2005-12-30 | 2011-04-05 | Citrix Systems, Inc. | System and method for performing flash crowd caching of dynamically generated objects in a data communication network |
US8151323B2 (en) | 2006-04-12 | 2012-04-03 | Citrix Systems, Inc. | Systems and methods for providing levels of access and action control via an SSL VPN appliance |
US8046322B2 (en) * | 2007-08-07 | 2011-10-25 | The Boeing Company | Methods and framework for constraint-based activity mining (CMAP) |
US20110040726A1 (en) * | 2007-09-17 | 2011-02-17 | Nicholas Daryl Crosbie | Layout Manager |
US7925694B2 (en) | 2007-10-19 | 2011-04-12 | Citrix Systems, Inc. | Systems and methods for managing cookies via HTTP content layer |
US8090877B2 (en) | 2008-01-26 | 2012-01-03 | Citrix Systems, Inc. | Systems and methods for fine grain policy driven cookie proxying |
US8117145B2 (en) * | 2008-06-27 | 2012-02-14 | Microsoft Corporation | Analytical model solver framework |
US8620635B2 (en) * | 2008-06-27 | 2013-12-31 | Microsoft Corporation | Composition of analytics models |
US20090322739A1 (en) * | 2008-06-27 | 2009-12-31 | Microsoft Corporation | Visual Interactions with Analytics |
US8411085B2 (en) * | 2008-06-27 | 2013-04-02 | Microsoft Corporation | Constructing view compositions for domain-specific environments |
US8255192B2 (en) * | 2008-06-27 | 2012-08-28 | Microsoft Corporation | Analytical map models |
US10114875B2 (en) * | 2008-06-27 | 2018-10-30 | Microsoft Technology Licensing, Llc | Dashboard controls to manipulate visual data |
US8145615B2 (en) * | 2008-11-26 | 2012-03-27 | Microsoft Corporation | Search and exploration using analytics reference model |
US8155931B2 (en) * | 2008-11-26 | 2012-04-10 | Microsoft Corporation | Use of taxonomized analytics reference model |
US8103608B2 (en) * | 2008-11-26 | 2012-01-24 | Microsoft Corporation | Reference model for data-driven analytics |
US8190406B2 (en) * | 2008-11-26 | 2012-05-29 | Microsoft Corporation | Hybrid solver for data-driven analytics |
US8314793B2 (en) * | 2008-12-24 | 2012-11-20 | Microsoft Corporation | Implied analytical reasoning and computation |
US8788574B2 (en) * | 2009-06-19 | 2014-07-22 | Microsoft Corporation | Data-driven visualization of pseudo-infinite scenes |
US8259134B2 (en) * | 2009-06-19 | 2012-09-04 | Microsoft Corporation | Data-driven model implemented with spreadsheets |
US8866818B2 (en) | 2009-06-19 | 2014-10-21 | Microsoft Corporation | Composing shapes and data series in geometries |
US8531451B2 (en) * | 2009-06-19 | 2013-09-10 | Microsoft Corporation | Data-driven visualization transformation |
US8493406B2 (en) * | 2009-06-19 | 2013-07-23 | Microsoft Corporation | Creating new charts and data visualizations |
US9330503B2 (en) | 2009-06-19 | 2016-05-03 | Microsoft Technology Licensing, Llc | Presaging and surfacing interactivity within data visualizations |
US8692826B2 (en) * | 2009-06-19 | 2014-04-08 | Brian C. Beckman | Solver-based visualization framework |
US8352397B2 (en) * | 2009-09-10 | 2013-01-08 | Microsoft Corporation | Dependency graph in data-driven model |
US8370386B1 (en) | 2009-11-03 | 2013-02-05 | The Boeing Company | Methods and systems for template driven data mining task editing |
US9053454B2 (en) * | 2009-11-30 | 2015-06-09 | Bank Of America Corporation | Automated straight-through processing in an electronic discovery system |
US20110145714A1 (en) * | 2009-12-15 | 2011-06-16 | At&T Intellectual Property I, L.P. | System and method for web-integrated statistical analysis |
US9043296B2 (en) | 2010-07-30 | 2015-05-26 | Microsoft Technology Licensing, Llc | System of providing suggestions based on accessible and contextual information |
DE102010036287A1 (en) * | 2010-08-27 | 2012-03-01 | Siemens Aktiengesellschaft | Method of interrogating a data point of a switch |
US9965597B2 (en) * | 2014-04-29 | 2018-05-08 | Schrödinger, Inc. | Collaborative drug discovery system |
US9727623B1 (en) * | 2016-02-05 | 2017-08-08 | Accenture Global Solutions Limited | Integrated developer workflow for data visualization development |
US10853368B2 (en) | 2018-04-02 | 2020-12-01 | Cloudera, Inc. | Distinct value estimation for query planning |
US10991185B1 (en) | 2020-07-20 | 2021-04-27 | Abbott Laboratories | Digital pass verification systems and methods |
Citations (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5526281A (en) * | 1993-05-21 | 1996-06-11 | Arris Pharmaceutical Corporation | Machine-learning approach to modeling biological activity for molecular design and to modeling other characteristics |
US5574656A (en) * | 1994-09-16 | 1996-11-12 | 3-Dimensional Pharmaceuticals, Inc. | System and method of automatically generating chemical compounds with desired properties |
US5583973A (en) * | 1993-09-17 | 1996-12-10 | Trustees Of Boston University | Molecular modeling method and system |
US5703792A (en) * | 1993-05-21 | 1997-12-30 | Arris Pharmaceutical Corporation | Three dimensional measurement of molecular diversity |
US5866343A (en) * | 1997-04-15 | 1999-02-02 | Universite De Montreal | Energetically favorable binding site determination between two molecules |
US5989827A (en) * | 1995-11-14 | 1999-11-23 | Abbott Laboratories | Use of nuclear magnetic resonance to design ligands to target biomolecules |
US6010861A (en) * | 1994-08-03 | 2000-01-04 | Dgi Biotechnologies, Llc | Target specific screens and their use for discovering small organic molecular pharmacophores |
US6125383A (en) * | 1997-06-11 | 2000-09-26 | Netgenics Corp. | Research system using multi-platform object oriented program language for providing objects at runtime for creating and manipulating biological or chemical data |
US6127524A (en) * | 1996-10-18 | 2000-10-03 | Dade Behring Inc. | Binding molecules and computer-based methods of increasing the binding affinity thereof |
US6182016B1 (en) * | 1997-08-22 | 2001-01-30 | Jie Liang | Molecular classification for property prediction |
US6185506B1 (en) * | 1996-01-26 | 2001-02-06 | Tripos, Inc. | Method for selecting an optimally diverse library of small molecules based on validated molecular structural descriptors |
US6207861B1 (en) * | 1998-01-05 | 2001-03-27 | Neogenesis, Inc. | Method for producing and screening mass coded combinatorial libraries for drug discovery and target validation |
US6219622B1 (en) * | 1995-03-24 | 2001-04-17 | University Of Guelph | Computational method for designing chemical structures having common functional characteristics |
US6240374B1 (en) * | 1996-01-26 | 2001-05-29 | Tripos, Inc. | Further method of creating and rapidly searching a virtual library of potential molecules using validated molecular structural descriptors |
US6287763B1 (en) * | 1996-06-10 | 2001-09-11 | Millennium Pharmaceuticals, Inc. | Screening methods for compounds useful in the regulation of body weight |
US6303322B1 (en) * | 1996-05-09 | 2001-10-16 | 3-Dimensional Pharmaceuticals, Inc. | Method for identifying lead compounds |
US6308145B1 (en) * | 1992-03-27 | 2001-10-23 | Akiko Itai | Methods for searching stable docking models of biopolymer-ligand molecule complex |
US20010046684A1 (en) * | 2000-02-25 | 2001-11-29 | Robert Powers | Methods of structure-based drug design using MS/MNR |
US6343257B1 (en) * | 1999-04-23 | 2002-01-29 | Peptor Ltd. | Identifying pharmacophore containing combinations of scaffold molecules and substituents from a virtual library |
US20020040612A1 (en) * | 2000-10-10 | 2002-04-11 | Syoichi Miyamoto | Synchronous meshing type automatic transmission control system |
US20020055536A1 (en) * | 1996-09-26 | 2002-05-09 | Dewitte Robert S. | System and method for structure-based drug design that includes accurate prediction of binding free energy |
US6389378B2 (en) * | 1994-10-31 | 2002-05-14 | Akiko Itai | Method of searching novel ligand compounds from three-dimensional structure database |
US20020066073A1 (en) * | 2000-07-12 | 2002-05-30 | Heinz Lienhard | Method and system for implementing process-based Web applications |
US6421612B1 (en) * | 1996-11-04 | 2002-07-16 | 3-Dimensional Pharmaceuticals Inc. | System, method and computer program product for identifying chemical compounds having desired properties |
US20020099506A1 (en) * | 2000-03-23 | 2002-07-25 | Floriano Wely B. | Methods and apparatus for predicting ligand binding interactions |
US20020107359A1 (en) * | 1998-02-06 | 2002-08-08 | P. Mark Hogarth | Three dimensional structures and models of fc receptors and uses thereof |
US6453246B1 (en) * | 1996-11-04 | 2002-09-17 | 3-Dimensional Pharmaceuticals, Inc. | System, method, and computer program product for representing proximity data in a multi-dimensional space |
US6465192B2 (en) * | 1997-12-12 | 2002-10-15 | Jeffrey C. Way | Compounds and methods for the inhibition of protein-protein interactions |
US20020151028A1 (en) * | 2000-05-18 | 2002-10-17 | Cornell Research Foundation, Inc. | Structure-based drug design for Ulp1 protease substrates |
US6491871B1 (en) * | 1989-06-07 | 2002-12-10 | Affymetrix, Inc. | System for determining receptor-ligand binding affinity |
US20030008326A1 (en) * | 2001-05-30 | 2003-01-09 | Sem Daniel S | Nuclear magnetic resonance-docking of compounds |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5375201A (en) * | 1992-12-18 | 1994-12-20 | Borland International, Inc. | System and methods for intelligent analytical graphing |
JPH08297748A (en) * | 1995-04-27 | 1996-11-12 | Canon Inc | Method and device for analysis data display |
JP2000048087A (en) * | 1998-07-15 | 2000-02-18 | Internatl Business Mach Corp <Ibm> | View synthesizing system |
US6356256B1 (en) * | 1999-01-19 | 2002-03-12 | Vina Technologies, Inc. | Graphical user interface for display of statistical data |
US6389376B1 (en) * | 1999-07-26 | 2002-05-14 | Sun Microsystems, Inc. | Method and apparatus for generating n-segment steiner trees |
US6732172B1 (en) * | 2000-01-04 | 2004-05-04 | International Business Machines Corporation | Method and system for providing cross-platform access to an internet user in a heterogeneous network environment |
WO2001065349A1 (en) * | 2000-03-01 | 2001-09-07 | Smithkline Beecham Corporation | Computer user interface for visualizing assay results |
US20020052882A1 (en) * | 2000-07-07 | 2002-05-02 | Seth Taylor | Method and apparatus for visualizing complex data sets |
WO2002005209A2 (en) * | 2000-07-12 | 2002-01-17 | Molecularware, Inc. | Method and apparatus for visualizing complex data sets |
US20020095621A1 (en) * | 2000-10-02 | 2002-07-18 | Lawton Scott S. | Method and system for modifying search criteria based on previous search date |
US20030014420A1 (en) * | 2001-04-20 | 2003-01-16 | Jessee Charles B. | Method and system for data analysis |
US7487148B2 (en) * | 2003-02-28 | 2009-02-03 | Eaton Corporation | System and method for analyzing data |
2003
- 2003-04-10 US: application US10/410,965, published as US20040019432A1 (Abandoned)
- 2003-04-10 US: application US10/411,568, published as US7146384B2 (Expired - Fee Related)
- 2003-04-10 AU: application AU2003226043A, published as AU2003226043A1 (Abandoned)
- 2003-04-10 EP: application EP03718342A, published as EP1495432A2 (Ceased)
- 2003-04-10 AU: application AU2003221884A, published as AU2003221884A1 (Abandoned)
- 2003-04-10 CA: application CA002480202A, published as CA2480202A1 (Abandoned)
- 2003-04-10 WO: application PCT/US2003/011180, published as WO2003088090A2 (Application Discontinuation)
- 2003-04-10 CA: application CA002479818A, published as CA2479818A1 (Abandoned)
- 2003-04-10 EP: application EP03746693A, published as EP1500030A2 (Ceased)
- 2003-04-10 WO: application PCT/US2003/010961, published as WO2003088125A2 (Application Discontinuation)
2006
- 2006-10-26 US: application US11/586,776, published as US20070043694A1 (Abandoned)
Patent Citations (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6491871B1 (en) * | 1989-06-07 | 2002-12-10 | Affymetrix, Inc. | System for determining receptor-ligand binding affinity |
US6308145B1 (en) * | 1992-03-27 | 2001-10-23 | Akiko Itai | Methods for searching stable docking models of biopolymer-ligand molecule complex |
US5526281A (en) * | 1993-05-21 | 1996-06-11 | Arris Pharmaceutical Corporation | Machine-learning approach to modeling biological activity for molecular design and to modeling other characteristics |
US6081766A (en) * | 1993-05-21 | 2000-06-27 | Axys Pharmaceuticals, Inc. | Machine-learning approach to modeling biological activity for molecular design and to modeling other characteristics |
US5703792A (en) * | 1993-05-21 | 1997-12-30 | Arris Pharmaceutical Corporation | Three dimensional measurement of molecular diversity |
US5583973A (en) * | 1993-09-17 | 1996-12-10 | Trustees Of Boston University | Molecular modeling method and system |
US6010861A (en) * | 1994-08-03 | 2000-01-04 | Dgi Biotechnologies, Llc | Target specific screens and their use for discovering small organic molecular pharmacophores |
US6434490B1 (en) * | 1994-09-16 | 2002-08-13 | 3-Dimensional Pharmaceuticals, Inc. | Method of generating chemical compounds having desired properties |
US5574656A (en) * | 1994-09-16 | 1996-11-12 | 3-Dimensional Pharmaceuticals, Inc. | System and method of automatically generating chemical compounds with desired properties |
US6490588B2 (en) * | 1994-10-31 | 2002-12-03 | Akiko Itai | Method of searching novel ligand compounds from three-dimensional structure database |
US6389378B2 (en) * | 1994-10-31 | 2002-05-14 | Akiko Itai | Method of searching novel ligand compounds from three-dimensional structure database |
US6219622B1 (en) * | 1995-03-24 | 2001-04-17 | University Of Guelph | Computational method for designing chemical structures having common functional characteristics |
US5989827A (en) * | 1995-11-14 | 1999-11-23 | Abbott Laboratories | Use of nuclear magnetic resonance to design ligands to target biomolecules |
US6185506B1 (en) * | 1996-01-26 | 2001-02-06 | Tripos, Inc. | Method for selecting an optimally diverse library of small molecules based on validated molecular structural descriptors |
US6240374B1 (en) * | 1996-01-26 | 2001-05-29 | Tripos, Inc. | Further method of creating and rapidly searching a virtual library of potential molecules using validated molecular structural descriptors |
US6303322B1 (en) * | 1996-05-09 | 2001-10-16 | 3-Dimensional Pharmaceuticals, Inc. | Method for identifying lead compounds |
US6287763B1 (en) * | 1996-06-10 | 2001-09-11 | Millennium Pharmaceuticals, Inc. | Screening methods for compounds useful in the regulation of body weight |
US20020055536A1 (en) * | 1996-09-26 | 2002-05-09 | Dewitte Robert S. | System and method for structure-based drug design that includes accurate prediction of binding free energy |
US6127524A (en) * | 1996-10-18 | 2000-10-03 | Dade Behring Inc. | Binding molecules and computer-based methods of increasing the binding affinity thereof |
US6421612B1 (en) * | 1996-11-04 | 2002-07-16 | 3-Dimensional Pharmaceuticals Inc. | System, method and computer program product for identifying chemical compounds having desired properties |
US6453246B1 (en) * | 1996-11-04 | 2002-09-17 | 3-Dimensional Pharmaceuticals, Inc. | System, method, and computer program product for representing proximity data in a multi-dimensional space |
US5866343A (en) * | 1997-04-15 | 1999-02-02 | Universite De Montreal | Energetically favorable binding site determination between two molecules |
US6125383A (en) * | 1997-06-11 | 2000-09-26 | Netgenics Corp. | Research system using multi-platform object oriented program language for providing objects at runtime for creating and manipulating biological or chemical data |
US6182016B1 (en) * | 1997-08-22 | 2001-01-30 | Jie Liang | Molecular classification for property prediction |
US6465192B2 (en) * | 1997-12-12 | 2002-10-15 | Jeffrey C. Way | Compounds and methods for the inhibition of protein-protein interactions |
US6207861B1 (en) * | 1998-01-05 | 2001-03-27 | Neogenesis, Inc. | Method for producing and screening mass coded combinatorial libraries for drug discovery and target validation |
US20020107359A1 (en) * | 1998-02-06 | 2002-08-08 | P. Mark Hogarth | Three dimensional structures and models of fc receptors and uses thereof |
US6343257B1 (en) * | 1999-04-23 | 2002-01-29 | Peptor Ltd. | Identifying pharmacophore containing combinations of scaffold molecules and substituents from a virtual library |
US20010046684A1 (en) * | 2000-02-25 | 2001-11-29 | Robert Powers | Methods of structure-based drug design using MS/MNR |
US20020099506A1 (en) * | 2000-03-23 | 2002-07-25 | Floriano Wely B. | Methods and apparatus for predicting ligand binding interactions |
US20020151028A1 (en) * | 2000-05-18 | 2002-10-17 | Cornell Research Foundation, Inc. | Structure-based drug design for Ulp1 protease substrates |
US20020066073A1 (en) * | 2000-07-12 | 2002-05-30 | Heinz Lienhard | Method and system for implementing process-based Web applications |
US20020040612A1 (en) * | 2000-10-10 | 2002-04-11 | Syoichi Miyamoto | Synchronous meshing type automatic transmission control system |
US20030008326A1 (en) * | 2001-05-30 | 2003-01-09 | Sem Daniel S | Nuclear magnetic resonance-docking of compounds |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7373541B1 (en) * | 2004-03-11 | 2008-05-13 | Adaptec, Inc. | Alignment signal control apparatus and method for operating the same |
US8638937B2 (en) * | 2005-03-01 | 2014-01-28 | CSC Holdings, LLC | Methods and systems for distributed processing on consumer devices |
US9727389B2 (en) | 2005-03-01 | 2017-08-08 | CSC Holdings, LLC | Methods and systems for distributed processing on consumer devices |
US9059996B2 (en) | 2005-03-01 | 2015-06-16 | CSC Holdings, LLC | Methods and systems for distributed processing on consumer devices |
US20130046808A1 (en) * | 2005-03-01 | 2013-02-21 | Csc Holdings, Inc. | Methods and systems for distributed processing on consumer devices |
US20070050092A1 (en) * | 2005-08-12 | 2007-03-01 | Symyx Technologies, Inc. | Event-based library process design |
WO2008046208A1 (en) * | 2006-10-16 | 2008-04-24 | Zymeworks Inc. | System and method for simulating the time-dependent behaviour of atomic and/or molecular systems subject to static or dynamic fields |
US20080147360A1 (en) * | 2006-10-16 | 2008-06-19 | Anthony Peter Fejes | System and method for simulating the time-dependent behaviour of atomic and/or molecular systems subject to static or dynamic fields |
US20080275884A1 (en) * | 2007-05-04 | 2008-11-06 | Salesforce.Com, Inc. | Method and system for on-demand communities |
US9742708B2 (en) | 2007-05-04 | 2017-08-22 | Salesforce.Com, Inc. | Method and system for on-demand communities |
US8706696B2 (en) * | 2007-05-04 | 2014-04-22 | Salesforce.Com, Inc. | Method and system for on-demand communities |
US8473448B2 (en) * | 2007-08-22 | 2013-06-25 | Fujitsu Limited | Compound property prediction apparatus, property prediction method, and program for implementing the method |
US20100145896A1 (en) * | 2007-08-22 | 2010-06-10 | Fujitsu Limited | Compound property prediction apparatus, property prediction method, and program for implementing the method |
US20150286637A1 (en) * | 2007-10-16 | 2015-10-08 | Jpmorgan Chase Bank, N.A. | Document Management Techniques To Account For User-Specific Patterns In Document Metadata |
US9734150B2 (en) * | 2007-10-16 | 2017-08-15 | Jpmorgan Chase Bank, N.A. | Document management techniques to account for user-specific patterns in document metadata |
EP2216429A4 (en) * | 2007-11-12 | 2011-06-15 | In Silico Sciences Inc | In silico screening system and in silico screening method |
US20100312538A1 (en) * | 2007-11-12 | 2010-12-09 | In-Silico Sciences, Inc. | Apparatus for in silico screening, and method of in siloco screening |
EP2216429A1 (en) * | 2007-11-12 | 2010-08-11 | In-Silico Sciences, Inc. | In silico screening system and in silico screening method |
WO2013163068A1 (en) * | 2012-04-23 | 2013-10-31 | Targacept, Inc. | Chemical entity search, for a collaboration and content management system |
WO2015148546A1 (en) * | 2014-03-24 | 2015-10-01 | Life Technologies Corporation | Methods and systems for knowledge discovery using biological data |
US10930373B2 (en) | 2014-03-24 | 2021-02-23 | Life Technologies Corporation | Methods and systems for knowledge discovery using biological data |
WO2017072794A1 (en) | 2015-10-30 | 2017-05-04 | Council Of Scientific And Industrial Research | An automated remote computing method and system by email platform for molecular analysis |
US10467068B2 (en) | 2015-10-30 | 2019-11-05 | Council Of Scientific And Industrial Research | Automated remote computing method and system by email platform for molecular analysis |
Also Published As
Publication number | Publication date |
---|---|
AU2003226043A8 (en) | 2003-10-27 |
EP1500030A2 (en) | 2005-01-26 |
AU2003221884A1 (en) | 2003-10-27 |
WO2003088125A2 (en) | 2003-10-23 |
AU2003221884A8 (en) | 2003-10-27 |
AU2003226043A1 (en) | 2003-10-27 |
US20040010515A1 (en) | 2004-01-15 |
WO2003088090A3 (en) | 2004-10-14 |
US7146384B2 (en) | 2006-12-05 |
EP1495432A2 (en) | 2005-01-12 |
WO2003088090A2 (en) | 2003-10-23 |
WO2003088125A8 (en) | 2004-11-11 |
US20070043694A1 (en) | 2007-02-22 |
WO2003088125A3 (en) | 2004-09-10 |
CA2480202A1 (en) | 2003-10-23 |
CA2479818A1 (en) | 2003-10-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20040019432A1 (en) | System and method for integrated computer-aided molecular discovery | |
Chacon et al. | Low-resolution structures of proteins in solution retrieved from X-ray scattering with a genetic algorithm | |
Sousa et al. | Protein-ligand docking in the new millennium–a retrospective of 10 years in the field | |
Amaro et al. | Emerging methods for ensemble-based virtual screening | |
Demir et al. | PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways | |
McConkey et al. | The performance of current methods in ligand–protein docking | |
Schreyer et al. | CREDO: a protein–ligand interaction database for drug discovery | |
Jacq et al. | Grid-enabled virtual screening against malaria | |
Wild et al. | Similarity searching in files of three-dimensional chemical structures. Alignment of molecular electrostatic potential fields with a genetic algorithm | |
Zhang et al. | Molecular docking-based computational platform for high-throughput virtual screening | |
Abreu et al. | MOLA: a bootable, self-configuring system for virtual screening using AutoDock4/Vina on computer clusters | |
Bullock et al. | DockoMatic 2.0: high throughput inverse virtual screening and homology modeling | |
Ausiello et al. | Query3d: a new method for high-throughput analysis of functional residues in protein structures | |
US8036831B2 (en) | Ligand searching device, ligand searching method, program, and recording medium | |
Jacq et al. | Grid as a bioinformatic tool | |
Xu et al. | Protein databases on the internet | |
Head-Gordon et al. | Computational challenges in structural and functional genomics | |
Guterres et al. | CHARMM-GUI LBS finder & refiner for ligand binding site prediction and refinement | |
Shah et al. | A computational pipeline for protein structure prediction and analysis at genome scale | |
Aloisio et al. | ProGenGrid: A grid-enabled platform for bioinformatics | |
WO2008091225A1 (en) | Comparative detection of structure patterns in interaction sites of molecules | |
Taufer et al. | Predictor@ Home: a" protein structure prediction supercomputer" based on public-resource computing | |
US20100305930A1 (en) | System and Method for Improved Computer Drug Design | |
He et al. | eSHAFTS: Integrated and graphical drug design software based on 3D molecular similarity | |
US20040215398A1 (en) | Method and a system for automating the execution of AMoRe over a heterogenous network of computers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: TRANSTECH PHARMA, INC., NORTH CAROLINA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SAWAFTA, REYAD I.; BAUDRY, JEROME; KUTZ, MICHAEL E.; AND OTHERS; REEL/FRAME: 015188/0022; SIGNING DATES FROM 20030802 TO 20030806 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |