WO2005036441A1 - Method for visualizing correlation data between bio-related events, analysis method, and database
- Publication number
- WO2005036441A1 (PCT/JP2004/010250)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- information
- correlation
- display
- protein
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/30—Unsupervised data analysis
Definitions
- The present invention relates to a method for visualizing correlation data between bio-related events, in particular interaction information between biological substances such as proteins, small molecule compounds, and DNA, and gene expression profiles. Further, the present invention relates to a graphical user interface and a visualization system incorporating the above method, and to an analysis method and a database incorporating the above method.
Background art
- This interaction data shows which proteins a small molecule affects and, conversely, which small molecules act on a given protein.
- By combining this information with information on the interactions between proteins and small molecule compounds, one can understand the functions of a protein in the body and predict which functions will change. In other words, it is possible to predict whether a low-molecular-weight compound can become a drug.
- large-scale data collection has recently begun between two different bio-related events. The problem was that the larger the amount of data, the more difficult it was to look at the entire data and extract features from it. In addition, if the data volume increases, many detailed references of individual data are required, and the observation of individual sites becomes frequent. Therefore, the importance of information visualization methods is increasing in order to effectively extract information embedded in large amounts of correlation data.
- One display method uses a matrix in which one event is arranged in rows and the other event is arranged in columns, and the correlation data between the two events is written in the cell where the row and column intersect.
- a method of displaying colors according to the expression intensity in the cells of the matrix is generally used.
- a method of displaying colors or shades according to the interaction in a matrix cell has been used.
- A method of displaying qualitative information such as “+” or “−” in a matrix cell according to the interaction has also been used (PCT: WO 02/23199 A2).
- In methods that display the correlation information between two events in a matrix, it is common to perform clustering based on the pattern of the correlation data on the matrix. By analyzing what kinds of events fall in each cluster, the relationship between the correlation information and the characteristics of each event can be found. Similarly, by sorting the events according to their characteristics and comparing the resulting correlation pattern with those characteristics, the relationship between the correlation information and the characteristics of each event can be determined. As described above, in visualizing correlation data using a matrix, it is important to be able to observe both the pattern of the correlation information and the characteristics of each event.
- A matrix display is produced for correlation data with a large number of data, and clustering based on the correlation data pattern or sorting of events based on the characteristics of each event is performed to identify a pattern. Then, by accessing detailed information on the features and interactions of the components of the identified pattern, the meaning of the pattern can be considered. Furthermore, clustering and sorting are performed again using a different method, and the entire correlation data pattern obtained is observed. Examining which cluster the components then belong to has the potential to lead to new discoveries. In this way, by alternately switching between displaying a large correlation data matrix and displaying individual correlation data, it is considered possible to discover new knowledge about the correlation data.
- The conventional method of visualizing correlation data using a matrix has the problem that, when the scale of the number of data fluctuates greatly, appropriate information for each scale cannot be obtained. For example, suppose that the screen has about 1000 pixels × 1000 pixels (30 cm × 30 cm in size). If the data size is on the order of several tens to hundreds, the number of pixels per cell is 10 to several tens of pixels × 10 to several tens of pixels, the size of one cell is several mm² to about 1 cm², and the color or shading pattern and each data point can be observed simultaneously.
- If the data size grows to several hundred or more, the number of pixels per cell falls to a few pixels × a few pixels or less, and the size of one cell is about 1 mm² or less.
- The cells are then so small that the pattern information becomes complicated and, at the same time, individual cells become difficult to recognize.
- Another problem is that drawing time is long.
- When the data size increases to several hundred or more, coarse-graining of the pattern is selected, in which a fixed number of cells, or the cells corresponding to a cluster, are collectively treated as one piece of correlation data.
- The size of one cell is then about several mm to 1 cm × several mm to 1 cm, and the correlation data pattern and each data point can be observed simultaneously.
- Conversely, if the number of rows or columns is reduced to several tens or less, the number of pixels per cell is several tens of pixels × several tens of pixels or more. Although one cell is then as large as several cm² or more, the amount of information per cell remains at a level that can be expressed by color, so the amount of information obtained from the entire screen decreases. To obtain more information, a user who wants to refer to information about individual cells must access a different information source for each cell. In this case it was difficult and troublesome to refer simultaneously to the correlation data pattern and to information on the plurality of cells constituting the pattern.
Disclosure of the invention
- The problem to be solved by the present invention is to provide, in a visualization method for displaying correlation data between two events in a matrix format, a means of observing the correlation data pattern and the information on the plurality of cells constituting the pattern simultaneously and in an appropriate format, in response to changes in the scale of the number of data.
- In a screen display system for displaying correlation data between two events in a matrix format, one of several data display formats prepared in advance, which differ in the degree of data integration per unit of correlation data, is automatically selected; for the information on individual cells (correlation and information on each event), one of several display methods prepared in advance, which differ in summarization level, is automatically selected; and the correlation data and the information on each cell are displayed accordingly.
- A typical example of correlation data between two events is one in which one event is a protein, the other event is a small molecule, and the correlation data between the events is the strength of the interaction between the protein and the small molecule.
- Alternatively, both events may be proteins, and the correlation data between the events may be the strength of the interaction between proteins or the sequence similarity between proteins.
- one event may be a gene, the other event may be a cDNA library from which the gene is derived, and the correlation data between the events may be the expression intensity of the gene for each cDNA library.
- both events are low-molecular-weight compounds, and the correlation data between the events may be structural similarity between low-molecular-weight compounds or interactions on drug efficacy or side effects.
- the first step is to sort the data.
- The data can be sorted in ascending or descending order of one of the protein's physical properties, or grouped by protein class. Similarly, the data can be sorted in ascending or descending order of one of the compound's physical properties, or grouped by compound class.
- Proteins and small molecules can also be rearranged so that those having similar interactions are next to each other.
- Calculating the similarity between proteins and between small molecules based on interaction strength and rearranging them accordingly is called clustering. It is a sorting method that is particularly useful for extracting information from interaction data between two events.
- After clustering, the table of interaction strengths is displayed with the strong and weak parts separated from each other. If the strong parts are displayed in dark color, they resemble islands floating on the sea. Each “island” is called a cluster. Clusters with higher intensities deserve more attention, so by arranging the clusters diagonally in descending order of intensity, the clustering results can be examined in detail starting from the important clusters.
- the second step is to use these clusters obtained as a result of clustering.
- clusters are classified into the following three types according to their shapes.
- a long cluster is a cluster formed when two or more proteins interact strongly with one small molecule, or when two or more small molecules interact strongly with one protein.
- a large cluster is a cluster formed when all or a part of a combination of a plurality of low molecular compounds and a plurality of proteins strongly bind to each other.
- singletons are clusters formed when a specific strong interaction is observed in a combination of one small molecule and one protein.
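The three cluster shapes can be told apart from the number of small molecules and proteins in a cluster. The following is a minimal sketch under that reading; the function name `classify_cluster` is hypothetical and does not appear in the source.

```python
def classify_cluster(n_compounds: int, n_proteins: int) -> str:
    """Classify a cluster by shape, using only its member counts.

    Assumption: a cluster is described by how many small molecules and
    how many proteins it contains, following the three definitions above.
    """
    if n_compounds < 1 or n_proteins < 1:
        raise ValueError("a cluster needs at least one compound and one protein")
    if n_compounds == 1 and n_proteins == 1:
        return "singleton"   # one small molecule, one protein
    if n_compounds == 1 or n_proteins == 1:
        return "long"        # one-to-many in either direction
    return "large"           # many small molecules and many proteins


print(classify_cluster(1, 4))  # one compound binding four proteins
```

The sketch ignores the clause "all or a part of a combination" in the definition of a large cluster; it only looks at the outer dimensions of the cluster.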
- the common part of multiple low-molecular compounds (or proteins) is extracted.
- the common part may be a range in which physical properties represented by numerical values can be taken, or may have similar structural features. Further, the case may be such that the attribute of a compound or a protein is represented by a profile composed of a plurality of elements.
- These common parts are considered to be indispensable factors to generate binding with the target protein (or target low molecular weight compound).
- the structural features of low-molecular-weight compounds involved in binding to the target protein lead to a concept called pharmacophore, which is information that plays an important role in drug discovery.
- The structural features of proteins involved in binding to the target low-molecular compound correspond to active sites, described for proteins in terms such as “binding pocket” and “dent”.
- Molecular design is also possible in which structural modification of the low molecular weight compound maintains the interaction with one protein in the cluster or eliminates the interaction with another protein in the cluster.
- the next step is to search for low molecular weight compounds (or proteins) that do not belong to the cluster and have the same common partial structure.
- the low-molecular compound (or protein) obtained as a result of the search is one for which no strong interaction with the target protein (or target low-molecular compound) was found by the definition of the cluster.
- This pair may be in the relationship of a drug and its target protein, or of a small molecule compound that causes side effects and its target protein. The binding may or may not cause a biologically meaningful change. If this pair is a drug and its target protein, chemical modification may make it possible to design low-molecular compounds that bind more specifically to the target protein.
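As an illustration of the case where the common part is a numerical range of a physical property, the sketch below extracts that range from the cluster members and then looks for compounds outside the cluster that fall inside it. The compound names, the property values, and both function names are hypothetical; structural features and profile-type attributes are not covered.

```python
def common_range(values):
    """Range (min, max) spanned by a physical property within a cluster."""
    return min(values), max(values)


def candidates_outside_cluster(properties, cluster_members):
    """Compounds not in the cluster whose property lies in the cluster's range.

    properties: dict mapping compound name -> numeric property value
    cluster_members: set of compound names that form the cluster
    """
    low, high = common_range([properties[c] for c in cluster_members])
    return sorted(
        name
        for name, value in properties.items()
        if name not in cluster_members and low <= value <= high
    )


# Hypothetical molecular weights for five compounds.
molecular_weight = {"C1": 310.0, "C2": 342.5, "C3": 298.1, "C4": 330.0, "C5": 512.7}
print(candidates_outside_cluster(molecular_weight, {"C1", "C2", "C3"}))
```

Here C4 is returned because its value lies within the range spanned by C1 to C3, while C5 does not.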
- a database of the cluster analysis results in the second step is created.
- The analysis results for the attributes common to the interaction clusters described above, together with known information related to each item extracted from the literature and patents, are collected and compiled into a database.
- This database is equipped with a function to search for known related information from cluster analysis results and a search for cluster analysis results from known information. By utilizing this search function, users will be able to make molecular or pharmacological interpretations of the interaction cluster.
- The screen display method is characterized by having three data display formats:
- (A) a display format in which the correlation data element itself, for example the binding constant between a low-molecular compound and a protein, is used as the screen display data unit (called the individual data display format);
- (B) a display format in which a group of multiple pieces of interaction data is used as the screen display data unit (called the cluster display format), where a cluster obtained by clustering based on the pattern of the correlation data or the characteristics of the events is a set of multiple pieces of interaction data; and
- (C) a display format in which a statistic of multiple pieces of correlation data is used as the screen display data unit (called the statistical display format).
- the statistics of correlated data refer to the number of clusters themselves and the number of relevant information obtained from different data sources for each element of the cluster.
- The screen display method according to the present invention is further characterized in that, as the display method for information on individual cells (correlation and information on each event), it has display methods corresponding to a plurality of summarization levels set according to the amount of information.
- the summarization degree is defined as a higher value as the amount of information in expressing one event is smaller.
- the plurality of summarization levels defined by the present invention are as follows. When all the semantically non-overlapping information stored in the data field is output to the screen, the data is not summarized, so the degree of data summarization is assumed to be zero. For different types of data fields, data formats corresponding to multiple summarization levels are defined. For example, when displaying real data containing an exponent,
- Summarization level 0 displays the field value itself
- the value of the exponent part is classified into five clusters, and the information is displayed in colors corresponding to the clusters.
- each definition of the hierarchical structure is displayed in a staircase
- a value corresponding to the value of the top layer of the hierarchical structure is displayed with a color.
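For real-valued data containing an exponent, the contrast between showing the field value itself and showing a color for the exponent class might look like the sketch below. The exponent boundaries, the palette, and the function names are assumptions made for illustration; the source only states that the exponent part is classified into five clusters shown by color.

```python
import bisect
import math

# Hypothetical exponent boundaries and palette (five classes in total).
EXPONENT_EDGES = [-8, -7, -6, -5]
PALETTE = ["white", "light gray", "gray", "dark gray", "black"]


def exponent_class(value: float) -> int:
    """Return a class index 0..4 derived from the decimal exponent of value."""
    if value == 0:
        return 0
    exponent = math.floor(math.log10(abs(value)))
    return bisect.bisect_right(EXPONENT_EDGES, exponent)


def render(value: float, summarized: bool) -> str:
    """Show the field value itself (summarization level 0) or only a color."""
    if not summarized:
        return f"{value:.2e}"
    return PALETTE[exponent_class(value)]


print(render(2.5e-7, summarized=False))
print(render(2.5e-7, summarized=True))
```

The summarized form carries less information per cell, which is what allows more cells to be shown at once.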
- The screen display method according to the present invention comprises, according to a change in the number of data: automatically or manually selecting one of the multiple data display formats described above; automatically or manually selecting one of the multiple display methods described above that differ in the degree of summarization of information on individual cells (correlation and information on each event); and displaying the correlation data and the information on each event using the selected data display format and degree of summarization.
- When the data display format and the summarization level are selected automatically according to the present invention, they must be selected so that the amount of information displayed on the screen stays close to a fixed value near the maximum amount of information that the user can recognize.
- The data display format and summarization level are automatically selected on the premise that all relevant information is displayed on one screen. However, a small amount of screen scrolling may be allowed.
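One way to read the automatic selection is as a rule on how many screen pixels each cell would receive. The sketch below is only an illustration under that assumption: the 1000-pixel screen follows the example given in the background section, while the 10-pixel minimum cell size, the level thresholds, and the function names are hypothetical and are not the rules of Fig. 7 or Fig. 8.

```python
def select_display_format(n_rows, n_cols, n_row_clusters, n_col_clusters,
                          screen_px=1000, min_cell_px=10):
    """Pick the finest display format whose cells stay recognizable."""
    def fits(rows, cols):
        # Whole table on one screen with at least min_cell_px per cell side.
        return screen_px // max(rows, cols, 1) >= min_cell_px

    if fits(n_rows, n_cols):
        return "individual"
    if fits(n_row_clusters, n_col_clusters):
        return "cluster"
    return "statistics"


def select_summarization_level(cell_px):
    """Larger cells can carry more detail, so they get a lower level (0..3)."""
    if cell_px >= 100:
        return 0
    if cell_px >= 40:
        return 1
    if cell_px >= 20:
        return 2
    return 3


print(select_display_format(2000, 500, 25, 15))
print(select_summarization_level(1000 // 25))
```

With 2000 × 500 individual entries the cells would be smaller than one pixel, so the sketch falls back to the cluster display format, where 25 × 15 cluster cells of about 40 pixels each leave room for a moderately detailed summary.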
- Figure 1 is a flow chart of data visualization.
- Figure 2 is an example of a screen display of interaction data between a small molecule and a protein.
- Fig. 3 is an example of a screen display of data sorted based on the clustering result using the interaction data profile.
- Figure 4 is an example of a screen display of data sorted based on the results of clustering using row and column features.
- Fig. 5 shows an example of information display in cluster display format.
- Fig. 6 is a screen display example of information for each of the four summarization levels in the individual data display format.
- Fig. 7 shows the rules for determining the data display format and the degree of data summarization.
- Figure 8 is a summary rule decision table for the low molecular compound physical property table.
- Figure 9 is an overview of the related information extraction method.
- FIG. 10 shows the result of extracting relevant information.
- FIG. 11 is an example of a user interface in which the present invention is implemented.
- Fig. 12 shows the results before and after the PLD data was divided into clusters that divide the low-molecular-weight compounds into 25 groups and the proteins into 15 groups.
- Fig. 13 shows two types of display examples of clustering results of PLD data.
- FIG. 14 shows a matrix of the interaction between the low molecular weight compounds and the protein, an expression profile matrix in the cell tissue of the protein displayed adjacently, and an adverse event matrix of the low molecular weight compound.
- Fig. 15 shows an example in which information on the interaction between low-molecular-weight compounds obtained by experiments and the information on the interaction between proteins of known low-molecular-weight compounds obtained from the literature etc.
- Fig. 16 is a matrix in which information on chemical structure similarity of low-molecular-weight drugs and classification information based on an adverse event matrix are simultaneously displayed in one matrix as an interaction between two events.
- Figure 17 shows an example of displaying complex information of a protein and a low-molecular compound using a two-dimensional table.
- 101 User operation
- 102 Internal calculation
- 103 Data processing
- 104 Protein-low molecular compound interaction database
- 105 Various correlation tables
- 106 Display data
- 107 Data display format and summarization level determination rule.
- 201 low molecular compound label
- 202 protein label
- 203 matrix portion
- 204 molecular weight
- 205 number of alpha helix and beta strands
- 206 clustering information based on homology.
- 305 Protein cluster B
- 306 Protein cluster C
- 307 Cluster consisting of a specific low-molecular compound and protein pair
- 308 Cluster consisting of a set of compounds that interact specifically with one protein.
- 401 Cluster A with relatively large molecular weight
- 402 Cluster B with medium molecular weight
- 403 Cluster C with relatively small molecular weight
- 404 Cluster 1 based on homology of amino acid sequence
- 405 Cluster 2 based on homology of amino acid sequence
- 406 Region with relatively high interaction.
- 501 label
- 502 number of elements belonging to the cluster
- 503 list of elements belonging to the cluster
- 504 matrix part.
- 601 Screen display at summarization level 0
- 602 Screen display at summarization level 1
- 603 Screen display at summarization level 2
- 604 Screen display at summarization level 3.
- 701 Summarization level
- 702 Data item
- 703 Location
- 704 Summarization rule
- 705 Rule “as is”
- 706 Rule “color (2 0, 300, 400, 500)”.
- 801 Condition
- 802 Display format
- 803 Summarization level.
- 901 Protein-low molecular weight compound interaction table
- 902 Protein-protein interaction table
- 903 Protein-expression table
- 904 Low-molecular compound-low-molecular compound interaction table
- 1 03 Related information acquisition button
- 1104 Function group related to action
- 1105 Function group related to selection
- 1106 Related information display screen.
- 1203 An area where meaning can be found in the results of clustering
- 1204 An area in which dissimilar interaction data is mixed in with the results of clustering
- 1301 Example of a part of matrix data in units of clusters displayed on the screen at summarization level 2
- 1302 Number of low molecular compounds belonging to the cluster
- 1303 Number of proteins belonging to the cluster
- 1304 Number of interactions belonging to the cluster
- 1305 Matrix display of individual proteins and low molecular weight compounds
- 1306 12x12 matrix cluster
- 1307 Physical property values of the compound group that is an element of the cluster
- 1308 Cluster in which the physical property of the compound corresponds to the interaction strength
- 1309 Physical property of the compounds that are elements of cluster 1308
- 1310 Table in which the interaction strength of cluster 1308 and the physical properties of the compounds 1309 are projected into three levels.
- 1401 Matrix of small molecule compound-protein interaction
- 1402 Matrix of expression profile in cell tissue
- 1403 Adverse event matrix
- 1404 Small molecule compound-protein interaction cluster
- 1405 Small molecule compound-protein interaction cluster
- 1406 Small molecule compound-protein interaction cluster region
- 1407 Small molecule compound-protein interaction cluster region
- 1408 Small molecule compound-protein interaction cluster region
- 1409 Small molecule compound-protein interaction cluster region
- 1410 Expression profile in cell tissue
- 1411 Expression profile in cell tissue
- 1412 Adverse event matrix profile
- 1413 Adverse event matrix profile
- 1501 Interaction matrix between small molecule compounds and proteins
- 1502 Cluster obtained by clustering based on known interaction information
- 1601 Matrix displaying the chemical structure similarity information of low-molecular-weight compounds and classification information based on the adverse event matrix at the same time
- 1602 Cluster obtained by clustering based on the chemical structure similarity information
- 1603 Pair of low molecular compounds C5 and C4
- 1604 Compound pair without chemical structural similarity
- 1701 Column displaying the distance information between the centers of gravity of the complex of protein and low molecular weight compound
- 1702 Cluster containing low molecular weight compound
- 1703 Model of protein-low molecular weight compound complex
BEST MODE FOR CARRYING OUT THE INVENTION
Example 1
- As a correlation between two events, we consider the interaction between biological substances such as proteins, low molecular weight compounds, and DNA.
- An example in which the interaction data between “small molecule compound” and “protein” is handled as two events of interest will be described below.
- The interaction data here refers to information on whether complex data for a small molecule and a protein are available in the Protein Data Bank (PDB, http://www.pdb.org), or to data obtained by experimentally measuring the degree of binding between a low-molecular compound and a protein.
- Protein characteristic data includes information from various external databases and calculated clustering results.
- SWISS-PROT (http://www.expasy.ch/sprot) ID
- clustering results based on amino acid sequence homology
- Gene Ontology (http://www.geneontology.org)
- annotation information, solubility in solvents, etc.
- Characteristic data of low molecular compounds consist of values for various molecular characteristics such as molecular name, molecular weight, drug classification, charge distribution, hydrophilicity / hydrophobicity, steric structure, number of donors / acceptors for hydrogen bonds, and type and number of functional groups.
- User operation 101 is a part for selecting data and an action to be executed.
- the functions include data acquisition 102 and data processing 103.
- data acquisition 102 In order to obtain data, a database of protein-low molecular compound interactions under various search conditions 1
- Data processing includes processing such as clustering for the entry specified on the display screen, and processing such as changing the display scale.
- the acquired or processed data is treated as display data 106.
- the data display format and the degree of summarization are determined for the displayed data.
- the data display format and the summarization level are determined based on the data display format and the summarization level determination rule 107 prepared in advance according to the number of pieces of display data. According to the determined display format and the degree of summarization of the data, a data screen display 108 is performed.
- Possible correlation tables include a protein-protein interaction table, a protein expression profile table, a table of structural similarity between low molecular compounds, and a table of interactions relating to drug efficacy or toxicity.
- The key point of the present invention, namely that the data display format and the degree of summarization are determined based on the data display format and summarization degree determination rule prepared in advance according to the number of pieces of display data, will be described in detail below.
- Figure 2 shows an example of a screen display of the interaction data between a low-molecular compound and a protein.
- The small molecule label 201 and the protein label 202 are arranged along the edges of the matrix, and the matrix part 203 shows the experimentally measured binding between each protein and small molecule compound.
- The color intensity is displayed according to the strength of the binding.
- Molecular weight 204 is displayed as the characteristic amount of the compound. On the upper side of the protein label, the number of alpha helices and beta strands 205 and the clustering information 206 based on homology are displayed as the characteristic amounts of the protein.
- The interaction data displayed on the screen in tabular format can be clustered based on the interaction data profiles, or based on the characteristic amounts of the proteins and low molecular weight compounds, and the data can then be rearranged and displayed based on the resulting clustering information.
- Clustering using the interaction data is performed, for example, by the following method.
- Focusing on one low molecular compound C_i, we consider its interaction intensity profile I_ij with each protein P_j (j = 1, ..., N_p, where N_p is the number of proteins).
- The distances between the interaction intensity profiles are calculated for all pairs of low molecular weight compounds.
- The distance D_ik between the interaction intensity profiles of the low-molecular compound C_i and the low-molecular compound C_k is calculated by the following formula, for example.
- The sum in the above equation is taken over j = 1, ..., N_p.
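The formula itself does not survive in this text. As a stand-in, the sketch below assumes a Euclidean distance, that is, the square root of the sum over j of the squared differences between the two profiles. This is one common choice and is an assumption, not necessarily the formula of the original application; the function names and the sample profiles are hypothetical.

```python
import math
from itertools import combinations


def profile_distance(profile_i, profile_k):
    """Euclidean distance between two interaction intensity profiles (assumed form)."""
    if len(profile_i) != len(profile_k):
        raise ValueError("profiles must cover the same proteins")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_i, profile_k)))


def all_pair_distances(profiles):
    """Distances for every pair of compounds, keyed by (name_i, name_k)."""
    return {
        (i, k): profile_distance(profiles[i], profiles[k])
        for i, k in combinations(sorted(profiles), 2)
    }


# Hypothetical interaction strengths of three compounds against three proteins.
profiles = {
    "C1": [1.0, 0.0, 2.0],
    "C2": [1.0, 0.0, 2.0],
    "C3": [4.0, 4.0, 2.0],
}
for pair, distance in all_pair_distances(profiles).items():
    print(pair, distance)
```

The resulting pairwise distances are the input a clustering step would need; compounds with identical profiles, such as C1 and C2 here, have distance zero and would fall into the same cluster.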
- Figure 3 shows the results of actually performing the clustering described above. The small molecule compounds are classified into three clusters, and the proteins are also classified into three clusters. The results are identified by color intensity: on the small molecule compound labels as small molecule cluster A 301, small molecule cluster B 302, and small molecule cluster C, and on the protein labels as protein cluster A 304, protein cluster B 305, and protein cluster C 306.
- the average of the binding constants which is the interaction data, is calculated internally for each cluster, and the clusters are sorted from top to bottom and from left to right in descending order by the average of the binding constants.
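The ordering step can be sketched as follows. The data layout (a dictionary keyed by compound and protein), the function name `cluster_order`, and the choice to average each row cluster over all proteins and each column cluster over all compounds are assumptions made for illustration.

```python
from statistics import mean


def cluster_order(matrix, row_clusters, col_clusters):
    """Order row and column clusters by descending mean binding constant.

    matrix: dict mapping (compound, protein) -> binding constant
    row_clusters: dict mapping cluster name -> list of compounds
    col_clusters: dict mapping cluster name -> list of proteins
    """
    compounds = [c for members in row_clusters.values() for c in members]
    proteins = [p for members in col_clusters.values() for p in members]

    def row_mean(name):
        return mean(matrix[c, p] for c in row_clusters[name] for p in proteins)

    def col_mean(name):
        return mean(matrix[c, p] for p in col_clusters[name] for c in compounds)

    rows = sorted(row_clusters, key=row_mean, reverse=True)  # top to bottom
    cols = sorted(col_clusters, key=col_mean, reverse=True)  # left to right
    return rows, cols


# Hypothetical binding constants for two compounds and two proteins.
matrix = {("C1", "P1"): 1.0, ("C1", "P2"): 2.0, ("C2", "P1"): 8.0, ("C2", "P2"): 9.0}
print(cluster_order(matrix, {"A": ["C1"], "B": ["C2"]}, {"X": ["P1"], "Y": ["P2"]}))
```

Sorting both axes in descending order places the strongest clusters toward the upper left of the table.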
- A cluster 307 consisting of a pair of a specific low-molecular compound and a protein, or a cluster 308 containing many compounds that specifically interact with one protein, becomes visually evident.
- For clustering based on feature values, the molecular weight can be divided into several categories, and for proteins the number of alpha helices and beta strands can be classified into certain levels. The display data can then be sorted by clusters based on molecular weight, by clusters based on the number of alpha helices and beta strands, or by clusters based on previously calculated amino acid sequence homology. In particular, if the data are rearranged by a certain feature and a characteristic color pattern of binding constants appears, one can tell that the feature and the binding constant are closely related.
- Figure 4 shows the results of clustering the data based on the molecular weight for the low molecular weight compound and the amino acid homology for the protein side, and rearranging the table according to the clustering results.
- the low molecular weight compounds are classified into clusters A401 with relatively large molecular weight, clusters B402 with medium molecular weight, and clusters C403 with relatively small molecular weight according to the molecular weight.
- the whole is sorted in descending order by molecular weight.
- On the protein side, cluster 1 404 and cluster 2 405, based on amino acid sequence homology, are shown on the screen.
- the low-molecular compounds belonging to cluster B overlap with the relatively high-interaction region 406 in the interaction matrix.
- each cell in the table corresponds to an interaction between one protein and a small molecule.
- This is called the "individual data display format” here.
- In the individual data display format, as the number of proteins and the number of low-molecular compounds increase, the table grows and it becomes difficult to grasp the data as a whole. That is, unless the size of each cell in the table is reduced as the number of data increases, the entire table cannot be displayed on the screen, and the state of the entire data cannot be surveyed. Conversely, if the size of each cell is reduced so that the entire table fits on the screen, the pattern of the interaction data displayed in the cells becomes finer, making it difficult to recognize its features. Therefore, so that the interaction pattern of the entire table can be recognized at a glance even when the number of data increases, each cluster in Fig. 3 or Fig. 4 can be displayed as one cell of the table. This is called the "cluster display format" here.
- Fig. 5 shows an example of information display in cluster display format.
- The label 501 contains the number of the cluster; the number of elements belonging to the cluster (502) and the list of elements belonging to the cluster (503) are shown as its features.
- The average value of the measured data for each cluster is indicated by the color density, and the number of elements constituting the cluster is indicated by a numerical value. It is possible to switch between the individual data display format and the cluster display format. Operations such as sorting and deleting rows and columns in one display format are reflected in the other display format.
- In the cluster display format, similar proteins and similar low-molecular compounds form clusters, so representative data can be visualized without being dropped.
- As a result, the number of rows and columns in the displayed table can be controlled even when the number of interaction data is large.
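The cluster display format described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of the invention: the function name `cluster_summary`, the label lists, and the sample values are all hypothetical. It collapses an interaction matrix into one cell per pair of clusters and keeps the average value (which would drive the color density) and the number of elements.

```python
from statistics import mean

def cluster_summary(matrix, row_clusters, col_clusters):
    """Collapse an interaction matrix into one cell per (row cluster, column cluster).

    matrix        -- list of rows; matrix[i][j] is the measured interaction value
    row_clusters  -- cluster label of each row (e.g. protein clusters)
    col_clusters  -- cluster label of each column (e.g. compound clusters)
    Returns {(row_label, col_label): (average value, number of elements)}.
    """
    cells = {}
    for i, r in enumerate(row_clusters):
        for j, c in enumerate(col_clusters):
            cells.setdefault((r, c), []).append(matrix[i][j])
    return {key: (mean(vals), len(vals)) for key, vals in cells.items()}

# Hypothetical 3 x 3 interaction matrix with two row clusters and two column clusters.
matrix = [
    [10, 20, 90],
    [30, 40, 70],
    [50, 60, 80],
]
summary = cluster_summary(matrix, [1, 1, 2], ["A", "A", "B"])
```

Because only the labels are aggregated, the original rows and columns remain available, which is what allows switching back to the individual data display format.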
- The "statistics display format" is an information display format complementary to the individual data display format and the cluster display format. It is used to display statistical data such as the average and standard deviation for all or part of the data, and to display the number of data extracted from different data sources. In the statistics display format, an overview of the data can be obtained regardless of the number of interaction data. In particular, when the number of data increases, it becomes difficult to recognize the interaction pattern of the entire table at a glance even in the cluster display format. In such cases, the statistics display format is very effective for grasping the whole picture of the data.
- In the present invention, a plurality of display formats are prepared and, at the same time, a plurality of data with different degrees of summarization are prepared as the information to be displayed in each cell of the matrix; the information appropriate to the number of data is selected from among them and used.
- When displaying the interaction data of proteins and small molecules, four summarization levels (0 to 3) are prepared. At summarization level 0, all the information stored in the database and the statistics calculated from it are displayed. At summarization level 1, up to 64 characters of character data, symbols, and colors can be displayed per cell. Text fields in the database are displayed as they are if they are 64 characters or less; longer fields are reduced to 64 characters or less. At summarization level 2, up to eight characters of character data, symbols, and colors can be displayed per cell. At summarization level 3, character data is not displayed; all information is expressed in color.
- The information display at summarization level 0 is free format. At summarization level 1, the size of one cell is 60 pixels vertically × 120 pixels horizontally, with an area for displaying text of 16 characters × 4 lines allocated in it.
- At summarization level 2, the size of one cell is 20 pixels vertically × 60 pixels horizontally, and an area for displaying text of 8 characters × 1 line is secured in it.
- At summarization level 3, the size of one cell is 5 pixels vertically × 5 pixels horizontally. In principle, the size of one cell could be reduced to as little as 1 pixel × 1 pixel, but a cell size was chosen that still allows individual data to be manipulated with a mouse.
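The cell sizes and character limits given for summarization levels 1 to 3 can be collected in a small lookup table. The sketch below is only illustrative: the names `LEVELS` and `cell_text` are hypothetical, and plain truncation is an assumption, since the text says only that long fields are reduced to the character limit without stating how.

```python
# Cell geometry and text budget per summarization level, as stated in the text.
# Level 0 is free format, so it has no fixed cell size or character limit.
LEVELS = {
    1: {"height": 60, "width": 120, "max_chars": 64},  # 16 characters x 4 lines
    2: {"height": 20, "width": 60, "max_chars": 8},    # 8 characters x 1 line
    3: {"height": 5, "width": 5, "max_chars": 0},      # color only, no text
}

def cell_text(value, level):
    """Return the text shown in one cell at the given summarization level.

    Assumption: over-long text is simply cut off at the character limit.
    """
    text = str(value)
    if level == 0:
        return text  # free format: show everything
    return text[: LEVELS[level]["max_chars"]]

label_level2 = cell_text("Aspirin, analgesic", 2)
label_level3 = cell_text("Aspirin, analgesic", 3)
```

A real summarization rule would probably be defined per data item rather than by uniform truncation.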
- Figure 6 shows an example of a screen display of information for each of the four summarization levels in the individual data display format.
- In the screen display at summarization level 0 (601), the interaction data, the low-molecular compound data, and the protein data are displayed in detail.
- The display format is free, and it is possible to display and manipulate the structures of proteins and low molecular weight compounds.
- The screen display at summarization level 1 shows the keys for accessing various external databases related to the protein, the name and efficacy of the low molecular weight compound, and detailed numerical values of the interaction measurement data.
- At summarization level 2, the displayed character data is limited to 8 characters, so only limited information is displayed, such as labels for identifying rows and columns and the main values of the interaction measurement data.
- At summarization level 3, the value taken by each cell is converted into color information and displayed. Thus, similar data can be visually recognized from the color pattern.
- Rules need to be created on how the information is summarized according to the degree of summarization.
- The basic rules are that at summarization level 0 all information is displayed, at summarization levels 1 and 2 information is displayed according to the character length, and at summarization level 3 color is displayed. Following these basic rules, detailed summarization rules need to be defined for each data item in the database.
- Fig. 7 shows an example of a summary rule determination table for the low molecular compound feature table. For each summarization level 701, it specifies which data item 702 among the fields of the table is displayed on the screen, at which location 703, and processed by which summary rule 704.
- The three data display formats and the four data summarization levels have been described above. By combining these, it is possible to visualize data from various angles.
- The present invention is characterized in that, when a user selects information to be viewed, the optimum data display format and data summarization level are determined automatically according to the number of data.
- Figure 8 shows the rules for determining the data display format and data summarization level in a table format.
- The conditions 801 are examined in order from the top; when a condition is satisfied, the display format 802 and the summarization level 803 given on that line are adopted. If not, the next line is examined.
- G, R, Gc, and Rc are the numerical values defined in the figure. This table is described below.
- When both the number of column-wise features and the number of row-wise features are 1, the number of proteins P and the number of low-molecular compounds C are both 2 or more and 9 or less.
- In this case the size of one cell is 60 pixels vertically × 120 pixels horizontally, and all the data are displayed in the information display area of 450 pixels vertically × 900 pixels horizontally.
- The display size of the data ranges from 240 pixels vertically × 480 pixels horizontally to 660 pixels vertically × 1320 pixels horizontally. This is within 1.5 × 1.5 times the entire information display area.
- As the number of data increases, the summarization level is gradually raised to 2 and then 3 according to Fig. 8. If P and C increase further, the display switches to the cluster display format, and the summarization level is raised through 1, 2, and 3 as the number of protein clusters Pc and the number of low-molecular compound clusters Cc increase.
- The conditions on G, R, Gc, and Rc for switching the display format and summarization level described above are set so that the display size of all data is within 1.5 × 1.5 times the entire information display area.
- To meet the generalized criterion of displaying the information for all data within n × m times the data display area, a correspondingly generalized condition can be used to determine the data display format and the degree of summarization.
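The automatic selection can be sketched as a walk through candidate formats and levels, taking the first one whose total display size fits within n × m times the display area. This is a hedged reconstruction, not the rule table of Fig. 8: the names `fits` and `choose_display` are hypothetical, the `extra=2` rows and columns (one feature line plus one label line) are an assumption chosen because it reproduces the 240 × 480 and 660 × 1320 pixel figures quoted in the text, and the fallback to the statistics display format is likewise an assumption.

```python
# Cell size in pixels (height, width) per summarization level, from the text.
CELL = {1: (60, 120), 2: (20, 60), 3: (5, 5)}

def fits(n_rows, n_cols, level, area=(450, 900), factor=(1.5, 1.5), extra=2):
    """True if the whole table fits within factor times the display area.

    extra -- assumed number of additional rows/columns (feature line + label line).
    """
    cell_h, cell_w = CELL[level]
    height = (n_rows + extra) * cell_h
    width = (n_cols + extra) * cell_w
    return height <= area[0] * factor[0] and width <= area[1] * factor[1]

def choose_display(P, C, Pc, Cc, **kw):
    """Pick (format, level): individual data first, then clusters, then statistics."""
    for fmt, rows, cols in (("individual", P, C), ("cluster", Pc, Cc)):
        for level in (1, 2, 3):
            if fits(rows, cols, level, **kw):
                return fmt, level
    return "statistics", None

small = choose_display(9, 9, 3, 3)          # 660 x 1320 px fits at level 1
medium = choose_display(10, 10, 3, 3)       # 720 px tall at level 1 is too large
large = choose_display(1000, 1000, 5, 5)    # too many items, few clusters
```

Passing `factor=(n, m)` gives the generalized n × m criterion mentioned above.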
- Acquisition of related information is performed as follows. Select the cell area of interest in the displayed data table, and extract the low molecular compound ID and protein ID belonging to this cell area. These IDs are searched in the related data table, and information accompanying the searched ID is extracted from the related data table.
- Figure 9 shows a specific method for extracting relevant information.
- The binding strength between proteins is specified as 100, the maximum value.
- From the protein-protein interaction table 902 and the protein-expression table 903, which shows the qualitative protein expression level in expression library 1, the data for the protein with protein ID P12 are extracted. Similarly, from the low-molecular compound-low-molecular compound interaction table 904, which stores data on the presence or absence of combined-use effects between low-molecular compounds, the entries for IDs C5 and C9, for which data exist, are extracted.
- The related information extraction results are arranged and displayed for each source table.
- The information display format and summarization level are set automatically according to the number of hits, and the information is displayed on the screen in that display format and summarization level.
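The extraction of related information amounts to a lookup of the selected IDs in each related data table, with the hits grouped by source table. The sketch below illustrates this in Python under stated assumptions: the function name `extract_related`, the table names, the field names, and every sample row are hypothetical and are not taken from Fig. 9. Only the IDs P12, C5, and C9 come from the text.

```python
def extract_related(selected_ids, related_tables):
    """Return, per source table, the rows whose ID is among the selected IDs.

    Tables without any hit are left out, so the result can be displayed
    grouped by source table.
    """
    results = {}
    for table_name, rows in related_tables.items():
        hits = [row for row in rows if row["id"] in selected_ids]
        if hits:
            results[table_name] = hits
    return results

# Hypothetical contents of three related data tables.
related_tables = {
    "protein_protein_interaction": [
        {"id": "P12", "partner": "P3", "strength": 100},
        {"id": "P7", "partner": "P9", "strength": 40},
    ],
    "protein_expression": [{"id": "P12", "library": 1, "level": "high"}],
    "compound_compound_interaction": [
        {"id": "C5", "partner": "C9", "combined_effect": True},
        {"id": "C2", "partner": "C4", "combined_effect": False},
    ],
}

# IDs taken from the selected cell area of the displayed table.
hits = extract_related({"P12", "C5", "C9"}, related_tables)
n_hits = sum(len(rows) for rows in hits.values())
```

The total `n_hits` is the kind of count that could then be used to choose the display format and summarization level for the result.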
- Related information can in turn be obtained from a part of the information displayed in this manner. Therefore, according to the present invention, multidimensional interaction data can be visualized by efficiently following the links between the one-to-one interaction data.
- A part of the information displayed on the screen is selected, an action chosen from a plurality of actions is performed on the selected data, and the resulting information is displayed on the screen.
- Fig. 11 shows an example of the user interface.
- In addition to the display mode change button 1101, the summarization level change button 1102, and the related information acquisition button 1103, the interface has a function group 1104 for row and column operations such as swapping, sorting, clustering, and deletion, and a function group 1105 for selecting characteristic rows and columns or rows and columns forming a representative subset.
- Mouse actions are assigned to each of the cells displayed in the table on the screen so that rows and columns can be selected, and the related information display screen 1106 can also display long character string data that cannot be displayed in a cell.
- Fig. 12 shows the results before and after clustering the PLD data into clusters that divide the low molecular weight compounds into 25 groups and the proteins into 15 groups.
- The matrix before clustering 1201 is rearranged into the matrix after clustering 1202.
- Before clustering, dots indicating combinations of interacting proteins and low molecular weight compounds are scattered over the matrix, but after clustering, rows and columns with similar interaction intensity patterns are displayed adjacent to one another. Where the clustering result is meaningful, regions of strong interaction appear as "islands" in the matrix.
- Fig. 13 shows two examples of display of clustering results of PLD data.
- If the clustering results are significant, the data belonging to each cluster should have similar interaction intensities. Therefore, the number of rows and columns in the table can be reduced by expressing all the elements included in a cluster with one representative value. The average value was used here as the representative value.
- For each cluster, the number of low-molecular compounds belonging to the cluster, the number of proteins that interact with them, and the number of interactions belonging to the cluster, defined as their product, are shown.
- The low molecular weight compounds are divided into 25 clusters and the proteins into 15 clusters, so the size of the entire table is 25 × 15.
- The position of the element containing the maximum value is identified in the 25 × 15 matrix. If the position of that element is (p, q), exchanging the first row with the p-th row and the first column with the q-th column moves the element to position (1, 1), that is, to the upper left.
- Repeating this operation arranges the clustering results along the diagonal. The only difference in the second pass is that the element with the largest value is searched for in the 24 × 14 matrix that excludes the first row and the first column, and is moved to position (2, 2).
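The diagonal arrangement described above can be sketched as a repeated search-and-swap. The following Python function is a minimal illustration under the assumption that the procedure simply continues pass by pass until the shorter dimension is exhausted; the name `arrange_diagonal` is hypothetical, and row and column labels, which a real display would have to permute along with the values, are not tracked here.

```python
def arrange_diagonal(matrix):
    """Move the largest remaining element onto the diagonal, pass by pass.

    In pass k (0-based) the largest value in the submatrix that excludes the
    first k rows and columns is located, and one row swap plus one column swap
    bring it to position (k, k).
    """
    m = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    for k in range(min(n_rows, n_cols)):
        # Position (p, q) of the maximum within rows k.. and columns k..
        p, q = max(
            ((i, j) for i in range(k, n_rows) for j in range(k, n_cols)),
            key=lambda ij: m[ij[0]][ij[1]],
        )
        m[k], m[p] = m[p], m[k]              # swap rows k and p
        for row in m:
            row[k], row[q] = row[q], row[k]  # swap columns k and q
    return m

arranged = arrange_diagonal([
    [1, 2, 3],
    [4, 9, 5],
    [6, 7, 8],
])
```

In this small example the first pass moves 9 to the upper left and the second pass moves 8, the largest value of the remaining 2 × 2 block, to the second diagonal position.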
- The matrix displayed in units of clusters can be returned to the display 1305, a matrix in units of individual proteins and low molecular weight compounds.
- The above cluster 1304 contains 12 interactions, so when it is expressed in units of individual proteins and low molecular weight compounds, it becomes a matrix of 12 rows × 1 column (cluster 1306).
- The structure of a compound can be displayed and compared by clicking on the label of the compound. By comparing such structures, it is possible to infer the common structure and active site of the compounds. Such detailed analysis is omitted here because it is outside the scope of the present invention.
- Molar refractivity is between 8.3 and 11.5, and the log P value is between 2.4 and 4.5.
- Most compounds belonging to this cluster also belong to the 3 AND MORE RING SYSTEMS (compounds with three or more ring structures) in terms of structural classification.
- A more detailed relationship between physical properties and binding strength can be seen from Table 1310, in which the interaction strength of the cluster (1308) and the physical property values of the compounds (1309) are each projected onto three levels.
- The condition on the physical property values for strong binding is that the partition coefficient between water and octanol is small and the molar refractivity is medium or large. If only one of the two is satisfied, the binding strength is moderate, and if neither is satisfied, the binding strength is the weakest among the compounds in the cluster.
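The observed rule can be written as a small decision function. This is only a restatement of the rule in code: the name `binding_level` is hypothetical, and the two inputs are passed as booleans because the text does not give the thresholds that separate a "small" partition coefficient or a "medium or large" molar refractivity.

```python
def binding_level(logp_is_small, refractivity_is_medium_or_large):
    """Map the two property conditions to a three-level binding strength.

    Both satisfied -> "strong", exactly one -> "moderate", none -> "weak".
    """
    satisfied = int(logp_is_small) + int(refractivity_is_medium_or_large)
    return ("weak", "moderate", "strong")[satisfied]

levels = [
    binding_level(True, True),
    binding_level(True, False),
    binding_level(False, True),
    binding_level(False, False),
]
```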
- Example 3: As a method of extracting a common attribute of compounds or proteins from a cluster obtained based on the interaction, the attribute of the compound or protein is expressed as a profile composed of multiple elements.
- Fig. 14 shows the expression profile matrix 1402 in cell tissues as the attribute of the proteins, and the adverse event matrix 1403 as the attribute of the low molecular weight compounds. These are displayed adjacent to the matrix 1401 of the interaction between compounds and proteins. Proteins are indicated as P1 to P7, cell tissues as T1 to T7, low molecular compounds as C1 to C6, and adverse events as S1 to S5.
- The protein-protein interaction matrix may be one obtained by experiment or one obtained from literature.
- The adverse event matrix can be obtained, for example, by examining the occurrence of each term of the glossary (MedDRA) in the Japan Pharmaceutical Collection DB (http://www.japic.or.jp/publications/inaex3.ntml).
- The low-molecular compound-protein interaction cluster 1404 can be divided into two regions, 1406 and 1407. In the expression profile matrix in cell tissues, these two regions correspond to two protein groups, (P4, P5) and (P6, P7), which have different profiles 1410 and 1411, respectively. This shows that the proteins in cluster 1404 interact with the common low-molecular compound C2 but form two groups with different expression profiles in cell tissues. This means that when the low molecular weight compound is a drug, it interacts with two types of target proteins having different physiological functions. Furthermore, by examining the functions of the interacting partner proteins, it would be possible to speculate about how they relate to the efficacy of this drug.
- The low-molecular compound-protein interaction cluster 1405 can be divided into two regions, 1408 and 1409. These two regions correspond to two groups of low molecular compounds, (C2, C3) and (C4, C5), which have different adverse event profiles 1412 and 1413, respectively. Of these two groups of low molecular weight compounds, one interacts with the single protein P1, while the other interacts with two proteins, P1 plus another protein P2. This makes it possible to infer that the two proteins are each associated with a different adverse event profile.
- The profile composed of multiple elements used as an attribute of the low molecular weight compound or the protein may also be a protein-protein interaction profile, a protein phylogenetic tree profile, a compound structural profile (MACCS key descriptors, etc.), and the like. In all these cases, it becomes possible to determine where and how the low-molecular compounds and proteins that make up a cluster obtained based on the interaction differ when viewed in terms of another attribute expressed as a profile of multiple elements.
- Next, a method of simultaneously displaying, in a distinguishable way, a plurality of types of correlation data between bio-related events in a matrix cell will be described.
- Fig. 15 shows an example in which the interaction information obtained by the experiment and the known interaction information obtained from the literature are displayed simultaneously.
- Fig. 15 shows the interaction matrix 1501 between low-molecular compounds and proteins. The low-molecular compounds are indicated by C1 to C6, and the proteins by P1 to P7.
- Each cell of the low-molecular compound-protein interaction matrix is divided into an upper and a lower region, corresponding to the interactions obtained from experiments and from the literature, respectively, and the presence or absence of an interaction is indicated by whether a symbol is written in the corresponding region (a different symbol is used for experiment and for literature).
- Cluster 1502, obtained by clustering based on known interaction information from the literature and the like, is shown.
- By focusing on the interactions obtained by experiments within cluster 1502, it is possible to evaluate how much of the known interaction information could be reproduced by experiments.
- For example, in the cell (C3, P4), an interaction between the low-molecular compound C3 and the protein P4 is reported in the literature, but no interaction was obtained in the experiment.
- From the experimental interactions 1503 that do not belong to the cluster of known interaction information, it is possible to identify interactions that are not found in the literature but were newly obtained by experiment.
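The comparison between experimental and literature interactions corresponds to simple set operations on (compound, protein) pairs. The sketch below is an illustration only: the function name `compare_interactions` is hypothetical, and apart from the pair (C3, P4), which the text describes as found in the literature but not in the experiment, the sample pairs are made up.

```python
def compare_interactions(experiment, literature):
    """Split interaction pairs into reproduced, literature-only, and new ones."""
    return {
        "reproduced": experiment & literature,       # known and confirmed
        "literature_only": literature - experiment,  # known but not reproduced
        "experiment_only": experiment - literature,  # newly found by experiment
    }

# Hypothetical interaction sets of (compound ID, protein ID) pairs.
literature = {("C2", "P4"), ("C3", "P4"), ("C3", "P5")}
experiment = {("C2", "P4"), ("C3", "P5"), ("C6", "P1")}

result = compare_interactions(experiment, literature)
reproduced_ratio = len(result["reproduced"]) / len(literature)
```

The ratio of reproduced pairs to literature pairs is one possible way to express how much of the known information the experiment recovered.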
- FIG. 16 shows a matrix 1601 in which the chemical structural similarity information of the low-molecular-weight drug and the classification information based on the adverse event matrix are simultaneously displayed.
- Information on the chemical structural similarity of low-molecular-weight pharmaceutical compounds can be obtained, for example, using MACCS key descriptors (Reoptimization of MDL Keys for Use in Drug Discovery, J. L. Durant, B. A. Leland, D. R. Henry, J. G. Nourse, JCICS, 2002, 42 (6), 1273-1280).
- the classification information based on the adverse event matrix can be obtained by comparing the adverse event profiles in the adverse event matrix described in Example 2.
- Each cell of the matrix is divided into two areas, in which the chemical structure similarity information and the classification information based on the adverse event matrix are displayed, respectively.
- The strength of the chemical structure similarity is represented by the color intensity at three levels (high, medium, and low similarity).
- Figure 16 shows the results of clustering based on chemical structure similarity information and collecting the clusters near the diagonal of the matrix. By comparing the chemical structure similarity information with the classification information based on the adverse event matrix within each cluster formed from the chemical structure similarity information, it is possible to see how far similarity in chemical structure coincides with the same classification under the adverse event matrix. For example, in one such cluster the low molecular weight compounds C2, C3, C4, and C5 are structurally similar to one another.
- Between the low molecular weight compounds C5 and C4 there is a weak chemical structural similarity, but it can be seen that they do not belong to the same cluster according to the adverse event matrix.
- Conversely, where compounds form the same cluster under the adverse event matrix regardless of their structural similarity, the presence of an adverse event independent of chemical structural similarity can be confirmed.
- The correlation data displayed simultaneously may also be sequence similarity and structural similarity between proteins, sequence similarity and functional similarity between proteins, sequence similarity and expression profile similarity between proteins, structural similarity and efficacy classification between low molecular weight compounds, or structural classifications by two different methods between low molecular weight compounds. Interaction information obtained by different experimental methods may also be used. In all of these cases, it is possible to obtain concrete and intuitive information on how a cluster obtained by one criterion differs from a cluster obtained by another criterion.
- Next, a method of displaying information on a complex of a protein and a low-molecular compound using a two-dimensional table will be described.
- Both bio-related events are taken from the atoms of the protein residues and the centers of gravity of the low-molecular compounds.
- A plurality of proteins and a plurality of low molecular weight compounds may be present in the complex.
- As the correlation data between them, the interatomic distance, the distance between centers of gravity, or the distance between a Cα atom and the center of gravity of a low-molecular compound is used.
- The case of one protein and one low molecular compound will be described with reference to the figure.
- The distances between atoms of the protein are arranged vertically and horizontally in the order of residue number.
- Fig. 17 shows the result of writing a mark in each cell whose distance is less than a certain threshold as the distance information, and then rearranging the data after clustering. In the upper left corner, on the diagonal of the distance matrix, there is a cluster 1702 containing the low-molecular compound.
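Marking the cells whose distance falls below a threshold can be sketched as follows. This Python example is illustrative only: the function name `contact_matrix`, the coordinates, and the cutoff value are hypothetical, the points are assumed to be 3D positions of residue atoms followed by the center of gravity of the low-molecular compound, and the subsequent clustering and rearrangement step is not shown.

```python
from math import dist

def contact_matrix(points, cutoff):
    """Return a square matrix of booleans: True where two points are closer than cutoff.

    The diagonal is True because every point is at distance 0 from itself.
    """
    n = len(points)
    return [
        [dist(points[i], points[j]) < cutoff for j in range(n)]
        for i in range(n)
    ]

# Hypothetical coordinates: three residue atoms, then one ligand center of gravity.
points = [(0, 0, 0), (3, 0, 0), (10, 0, 0), (1, 1, 0)]
contacts = contact_matrix(points, cutoff=4.0)
ligand_neighbors = [i for i in range(3) if contacts[3][i]]
```

Here the last row of the matrix shows which residue atoms lie near the ligand, which is the kind of information that ends up grouped into one cluster after rearrangement.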
- By using the visualization method according to the present invention, which displays correlation data between two bio-related events in a matrix format, together with an interface that implements it, the user no longer needs to manually coarse-grain the correlation data pattern or access other information sources for each cell as the size of the correlation data changes. The correlation data pattern and the information on the cells that make up the pattern can be observed simultaneously, in an appropriate display format and summarization level that are selected automatically as the number of data varies. This makes it possible to observe the whole picture of the data while automatically maximizing the amount of information obtained from the individual cells, regardless of the number of data to be displayed.
- When the present invention is applied to interaction data between bio-related events, for example protein-small molecule compound interaction data, the user can see the strengths of all of these interactions at a glance. In addition, when the number of data is large, proteins and low molecular weight compounds with similar interaction intensities are presented on the screen in a compact form suited to the amount of data. Conversely, when a user focuses on a portion of the interaction data, they can make decisions in drug discovery research while viewing detailed information. Similarly, by analyzing protein-protein interaction data and other important interaction data with the present invention while visualizing them, data processing in the drug discovery process can be accelerated, which in turn speeds up the drug discovery process as a whole.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/569,494 US20060287831A1 (en) | 2003-10-07 | 2004-07-12 | Method for visualizing data on correlation between biological events, analysis method, and database |
JP2005514528A JP4690199B2 (ja) | 2003-10-07 | 2004-07-12 | Method for visualizing correlation data between bio-related events, and computer-readable recording medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003348438 | 2003-10-07 | ||
JP2003-348438 | 2003-10-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005036441A1 (ja) | 2005-04-21 |
Family
ID=34430961
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2004/010250 WO2005036441A1 (ja) | 2003-10-07 | 2004-07-12 | Method for visualizing correlation data between bio-related events, analysis method, and database |
Country Status (3)
Country | Link |
---|---|
US (1) | US20060287831A1 (ja) |
JP (1) | JP4690199B2 (ja) |
WO (1) | WO2005036441A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7716169B2 (en) | 2005-12-08 | 2010-05-11 | Electronics And Telecommunications Research Institute | System for and method of extracting and clustering information |
JP2016514321A (ja) * | 2013-03-13 | 2016-05-19 | Salesforce.com, Inc. | System, method, and apparatus for implementing data upload, processing, and predictive query API exposure |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070214133A1 (en) * | 2004-06-23 | 2007-09-13 | Edo Liberty | Methods for filtering data and filling in missing data using nonlinear inference |
US8655800B2 (en) * | 2008-10-07 | 2014-02-18 | Hewlett-Packard Development Company, L.P. | Distance based visualization of event sequences |
WO2010126407A1 (en) * | 2009-04-27 | 2010-11-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Dynamic tag control and fingerprinting event localization |
US9165112B2 (en) * | 2012-02-03 | 2015-10-20 | Fresenius Medical Care Holdings, Inc. | Systems and methods for displaying objects at a medical treatment apparatus display screen |
US9280612B2 (en) | 2012-12-14 | 2016-03-08 | Hewlett Packard Enterprise Development Lp | Visualizing a relationship of attributes using a relevance determination process to select from candidate attribute values |
US9779524B2 (en) | 2013-01-21 | 2017-10-03 | Hewlett Packard Enterprise Development Lp | Visualization that indicates event significance represented by a discriminative metric computed using a contingency calculation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
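JPH10269050A (ja) * | 1997-03-24 | 1998-10-09 | Canon Inc | Information processing apparatus and method therefor |
JPH1185448A (ja) * | 1997-09-05 | 1999-03-30 | Matsushita Electric Ind Co Ltd | Information display device |
JP2002149300A (ja) * | 2000-11-15 | 2002-05-24 | Isao Higashihara | Method and apparatus for displaying and handling tables |
JP2003505749A (ja) * | 1999-02-23 | 2003-02-12 | Warner-Lambert Company | System and method for managing and presenting information derived from gene expression profiling |
JP2003242154A (ja) * | 2002-02-18 | 2003-08-29 | Celestar Lexico-Sciences Inc | Gene expression information management device, gene expression information management method, program, and recording medium |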
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020091681A1 (en) * | 2000-04-03 | 2002-07-11 | Jean-Yves Cras | Report then query capability for a multidimensional database model |
WO2003081471A1 (fr) * | 2002-02-18 | 2003-10-02 | Celestar Lexico-Sciences, Inc. | Dispositif de gestion de donnees d'expression genetique |
CA2429909A1 (en) * | 2003-05-27 | 2004-11-27 | Cognos Incorporated | Transformation of tabular and cross-tabulated queries based upon e/r schema into multi-dimensional expression queries |
-
2004
- 2004-07-12 WO PCT/JP2004/010250 patent/WO2005036441A1/ja active Application Filing
- 2004-07-12 US US10/569,494 patent/US20060287831A1/en not_active Abandoned
- 2004-07-12 JP JP2005514528A patent/JP4690199B2/ja not_active Expired - Fee Related
Non-Patent Citations (2)
Title |
---|
KAWASHIMA H.: "Shinseiki iryo o mezashite-SNP to DNA chip DNA chip to bioinformatics", GENE & MEDICINE, KABUSHIKI KAISHA MEDICAL DO, vol. 4, no. 1, 10 February 2000 (2000-02-10), pages 129 - 133, XP002987075 * |
KITANO H.: "System biology seimei o system toshite rikai suru", SHUJUNSHA CO., LTD., 1 July 2001 (2001-07-01), pages 72 - 90, XP002987074 * |
Also Published As
Publication number | Publication date |
---|---|
JP4690199B2 (ja) | 2011-06-01 |
JPWO2005036441A1 (ja) | 2006-12-21 |
US20060287831A1 (en) | 2006-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lex et al. | Comparative analysis of multidimensional, quantitative data | |
Gratzl et al. | Domino: Extracting, comparing, and manipulating subsets across multiple tabular datasets | |
Brazma et al. | Gene expression data analysis | |
Kincaid et al. | Line graph explorer: scalable display of line graphs using focus+ context | |
JP2004133903A (ja) | Method and apparatus for simultaneously visually displaying and manipulating multiple data types | |
US20160232224A1 (en) | Categorization and filtering of scientific data | |
US20020165674A1 (en) | Method and system for analyzing biological response signal data | |
Simillion et al. | Building genomic profiles for uncovering segmental homology in the twilight zone | |
Torkkola et al. | Self-organizing maps in mining gene expression data | |
Partl et al. | ConTour: data-driven exploration of multi-relational datasets for drug discovery | |
Furmanova et al. | Taggle: Scalable visualization of tabular data through aggregation | |
WO2016118771A1 (en) | System and method for drug target and biomarker discovery and diagnosis using a multidimensional multiscale module map | |
Kim et al. | Visualizing set concordance with permutation matrices and fan diagrams | |
Wiltgen et al. | DNA microarray analysis: principles and clinical impact | |
Klein et al. | Visual analysis of biological activity data with Scaffold Hunter | |
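WO2005036441A1 (ja) | Method for visualizing correlation data between bio-related events, analysis method, and database | |
CN109033747B (zh) | Tumor-specific gene identification method based on PLS multi-perturbation ensemble gene selection | |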
CN109033747B (zh) | 基于pls多扰动集成基因选择的肿瘤特异基因识别方法 | |
Ta et al. | A novel method for assigning functional linkages to proteins using enhanced phylogenetic trees | |
Saffer et al. | Visual analytics in the pharmaceutical industry | |
EP1221126A2 (en) | Graphical user interface for display and analysis of biological sequence data | |
JP2004535612A (ja) | Gene expression data management system and method | |
Kincaid | VistaClara: an interactive visualization for exploratory analysis of DNA microarrays | |
Lee et al. | The next frontier for bio-and cheminformatics visualization | |
Havre et al. | Bioinformatic insights from metagenomics through visualization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2005514528 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2006287831 Country of ref document: US Ref document number: 10569494 Country of ref document: US |
|
122 | Ep: pct application non-entry in european phase | ||
WWP | Wipo information: published in national office |
Ref document number: 10569494 Country of ref document: US |