US20060195266A1

US20060195266A1 - Methods for predicting cancer outcome and gene signatures for use therein

Info

Publication number: US20060195266A1
Application number: US11/065,794
Authority: US
Inventors: Timothy Yeatman
Original assignee: Individual
Current assignee: Individual
Priority date: 2005-02-25
Filing date: 2005-02-25
Publication date: 2006-08-31
Also published as: US10181009B2; EP1894131A2; WO2006093507A3; EP1894131B1; US20060195269A1; WO2006093507A2; EP1894131A4

Abstract

The present invention pertains to specific gene signatures for cancer that are used to predict survival and novel processes for identifying such gene signatures. In one embodiment, gene signatures for human colorectal cancer are identified and outcomes are linked to the specific gene signatures using significance analysis of microarrays (SAM) and support vector machines (SVM) to provide a prognosis/survival classifier.

Description

CROSS-REFERENCE TO A RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application Ser. No. 60/547,871, filed Feb. 25, 2004, which is hereby incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION

In the last decade, scientists have labored to complete a high-quality, comprehensive sequence of the human genome. With its recent completion, a large number of genomic data sets have been made available in public databases. The available data, however, does not provide explanations regarding which aspects of human biology affect which genes. Researchers are just beginning to explore genomic function.
Several technological advances have made it possible to accurately measure cellular constituents and therefore derive profiles. For example, new techniques provide the ability to monitor the expression level of a large number of transcripts at any one time (see, for example, Schena et al., “Quantitative monitoring of gene expression patterns with a complementary DNA micro-array,” Science, 270:467-470 (1995); Lockhart et al., “Expression monitoring by hybridization to high-density oligonucleotide arrays,” Nature Biotechnology, 14:1675-1680 (1996); and Blanchard et al., “Sequence to array: Probing the genome's secrets,” Nature Biotechnology, 14:1649 (1996)). In organisms for which the complete genome is known, it is possible to analyze the transcripts of all genes within the cell. With other organisms, such as humans, for which there is an increasing knowledge regarding the genome, it is possible to simultaneously monitor large numbers of the genes within the cell.
One aspect of human biology/genomic function that is of great interest to the medical research community is cancer. Currently, genetic samples have been taken from patients having various stages of various types of cancer. Such samples have provided an extensive genetic data collection. To provide a system of organization, such genetic data are collected in DNA microarrays, which are commonly referred to as biochips, DNA chips, gene arrays, gene chips, and genome chips.
DNA microarrays exploit a phenomenon known as base-pairing or hybridization. To form the array, genetic samples are arranged in an orderly manner (typically in a rectangular grid) on a substrate. Examples of commonly used substrates include microplates and blotting membranes. Many modern microarrays include an array of oligonucleotide or peptide nucleic acid (PNA) probes, and the array is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array on the chip is exposed to labeled sample DNA, hybridized, and the identity/abundance of complementary sequences is determined.
There are two major uses of DNA microarray technology. The first involves identification of the gene sequence. The second involves determination of expression level of genes, generally referred to as the abundance of the genes. In particular, expression or abundance of a gene is a measure of a relative level of activity of the gene in replication or translation in the presence of the probe. By analyzing the abundance of various genes in people of various conditions, a relationship between the genetic state of a person, in terms of relative levels of activity of various genes of that person, and that person's condition is assessed. To conduct such analysis, such arrays of expression levels include metadata describing characteristics of the people whose genetic material is sampled and additional metadata which identifies specific genes whose expression levels are represented in such arrays.
Microarrays are already being used for a number of beneficial purposes including, for example, identifying biomarkers of cancer (Welsh, J B et al., “Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum,” PNAS, 100(6):3410-3415 (March 2003)), creating gene expression-based classifications of cancers (Alizadeh, A A et al., “Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling,” Nature, 403:503-11 (2000); and Garber, M E et al., “Diversity of gene expression in adenocarcinoma of the lung,” Proc Natl Acad Sci USA, 98:13784-9 (2001)), and drug discovery (Marton, M J et al., “Drug target validation and identification of secondary drug target effects using Microarrays,” Nat Med, 4(11):1293-301 (1998); and Gray, N S et al., “Exploiting chemical libraries, structure, and genomics in the search for kinase inhibitors,” Science, 281:533-538 (1998)). One tool that has been applied to microarrays to decipher and compare genome expression patterns in biological systems is Significance Analysis of Microarrays, or SAM (Tusher, V. et al., “Significance analysis of microarrays applied to ionizing radiation response,” Proceedings of the National Academy of Sciences, 2001. First published Apr. 17, 2001, 10.1073/pnas.091062498). This statistical method was developed as a cluster tool for use in identifying genes with statistically significant changes in expression. SAM has been used for a variety of purposes, including identifying potential drugs that would be effective in treating various conditions associated with specific gene expressions (Bunney W E, et al., “Microarray technology: a review of new strategies to discover candidate vulnerability genes in psychiatric disorders,” Am J Psychiatry, 160(4):657-66 (April 2003)).
The known SVM (Support Vector Machine) (as described in Michael P. et al., “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proceedings of the National Academy of Sciences, 97(1):262-67 (2000)) is a correlation tool shown to perform well in multiple areas of biological analysis, including evaluating microarray expression data (Brown et al, “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proc Natl Acad Sci USA, 97:262-267 (2000)), detecting remote protein homologies (Jaakkola, T. et al., “Using the Fisher kernel method to detect remote protein homologies,” Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, Calif. (1999)), and recognizing translation initiation sites (Zien, A. et al., “Engineering support vector machine kernels that recognize translation initiation sites,” Bioinformatics, 16(9):799-807 (2000)). When used for classification, SVMs separate a given set of binary labeled training data with a hyper-plane that is maximally distant from the set of data (the “maximal margin hyper-plane”). Where no linear separation is possible, SVMs utilize the technique of “kernels” to automatically realize a non-linear mapping to a feature space (Furey, T. S. et al., “Support vector machine classification and validation of cancer tissue samples using microarray expression data,” Bioinformatics, 16(10):906-914 (2000)).
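The notion of a hyper-plane that is "maximally distant" from the training data can be made concrete with a short sketch. The Python below does not train an SVM; it only computes the geometric margin of a given hyper-plane and compares two candidates, which is the quantity an SVM seeks to maximize. The sample points, labels, and candidate hyper-planes are hypothetical values chosen purely for illustration.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyper-plane w.x + b = 0.

    A positive result means every point lies on the side matching its label (+1 or -1).
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical two-gene expression values for four samples (illustrative only).
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, +1, +1]

# Two hypothetical separating hyper-planes, each given as (w, b).
candidates = {
    "A": ((1.0, 1.0), -5.5),
    "B": ((1.0, 0.0), -3.5),
}
margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}

# Of the two candidates, the one with the larger margin is preferred.
best = max(margins, key=margins.get)
```

Both candidates separate the two classes, but candidate A keeps the nearest sample farther from the boundary than candidate B does. An SVM searches over all hyper-planes for the one with the largest such margin.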
Ranked as the third most commonly diagnosed cancer and the second leading cause of cancer deaths in the United States (American Cancer Society, “Cancer facts and figures,” Washington, D.C.: American Cancer Society (2000)), colon cancer is a deadly disease afflicting nearly 130,000 new patients yearly in the United States. Colon cancer is the only cancer that occurs with approximately equal frequency in men and women. There are several potential risk factors for the development of colon and/or rectal cancer. Known factors for the disease include older age, excessive alcohol consumption, sedentary lifestyle (Reddy, B. S., “Dietary fat and its relationship to large bowel cancer,” Cancer Res., 41:3700-3705 (1981)), and genetic predisposition (Potter, J D “Colorectal cancer: molecules and populations,” J Natl Cancer Institute, 91:916-932 (1999)).
Several molecular pathways have been linked to the development of colon cancer (see, for example, Leeman M F, et al., “New insights into the roles of matrix metalloproteinases in colorectal cancer development and progression,” J Pathol., 201(4):528-34 (2003); Kanazawa, T et al., “Does early polypoid colorectal cancer with depression have a pathway other than adenoma-carcinoma sequence?,” Tumori., 89(4):408-11 (2003); and Notarnicola, M. et al., “Genetic and biochemical changes in colorectal carcinoma in relation to morphologic characteristics,” Oncol Rep., 10(6):1987-91 (2003)), and the expression of key genes in any of these pathways may be affected by inherited or acquired mutation or by hypermethylation. A great deal of research has been performed with regard to identifying genes for which changes in expression may provide an early indicator of colon cancer or a predisposition for the development of colon cancer. Unfortunately, no research has yet been conducted on identifying specific genes associated with colorectal cancer and specific outcomes to provide an accurate prediction of prognosis.
Survival of patients with colon and/or rectal cancer depends to a large extent on the stage of the disease at diagnosis. Devised nearly seventy years ago, the modified Dukes' staging system for colon cancer discriminates four stages (A, B, C, and D), primarily based on clinicopathologic features such as the presence or absence of lymph node or distant metastases. Specifically, colonic tumors are classified by four Dukes' stages: A, tumor within the intestinal mucosa; B, tumor into muscularis mucosa; C, metastasis to lymph nodes; and D, metastasis to other tissues. Of the systems available, the Dukes' staging system, based on the pathological spread of disease through the bowel wall, to lymph nodes, and to distant organ sites such as the liver, has remained the most popular. Despite providing only a relative estimate for cure for any individual patient, the Dukes' staging system remains the standard for predicting colon cancer prognosis, and is the primary means for directing adjuvant therapy.
The Dukes' staging system, however, has only been found useful in predicting the behaviour of a population of patients, rather than an individual. For this reason, any patient with a Dukes A, B, or C lesion would be predicted to be alive at 36 months while a patient staged as Dukes D would be predicted to be dead. Unfortunately, application of this staging system results in the potential over-treatment or under-treatment of a significant number of patients. Further, Dukes' staging can only be applied after complete surgical resection rather than after a pre-surgical biopsy.
Microarray technology, as described above, has permitted development of multi-organ cancer classifiers (Giordano, T. J. et al., “Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles,” Am J Pathol, 159:1231-8 (2001); Ramaswamy, S. et al., “Multiclass cancer diagnosis using tumor gene expression signatures,” Proc Natl Acad Sci USA, 98:15149-54 (2001); and Su, A. I. et al., “Molecular classification of human carcinomas by use of gene expression signatures,” Cancer Res, 61:7388-93 (2001)), identification of tumor subclasses (Dyrskjot, L. et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat Genet, 33:90-6 (2003); Bhattacharjee, A. et al., “Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses,” Proc Natl Acad Sci USA, 98:13790-5 (2001); Garber, M. E. et al., “Diversity of gene expression in adenocarcinoma of the lung,” Proc Natl Acad Sci USA, 98:13784-9 (2001); and Sorlie, T. et al., “Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications,” Proc Natl Acad Sci USA, 98:10869-74 (2001)), discovery of progression markers (Sanchez-Carbayo, M. et al., “Gene Discovery in Bladder Cancer Progression using cDNA Microarrays,” Am J Pathol, 163:505-16 (2003); and Frederiksen, C M, et al., “Classification of Dukes' B and C colorectal cancers using expression arrays,” J Cancer Res Clin Oncol, 129:263-71 (2003)); and prediction of disease outcome (Henshall, S M et al., “Survival analysis of genome-wide gene expression profiles of prostate cancers identifies new prognostic targets of disease relapse,” Cancer Res, 63:4196-203 (2003); Shipp, M A et al., “Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning,” Nat Med, 8:68-74 (2002); Beer, D G et al., “Gene-expression profiles predict survival of patients with lung adenocarcinoma,” Nat Med, 8:816-24 (2002); Pomeroy, S L et al., “Prediction of central nervous system embryonal tumor outcome based on gene expression,” Nature, 415:436-42 (2002); van 't Veer, L J et al., “Gene expression profiling predicts clinical outcome of breast cancer,” Nature, 415:530-6 (2002); Vasselli, J R et al., “Predicting survival in patients with metastatic kidney cancer by gene-expression profiling in the primary tumor,” Proc Natl Acad Sci USA, 100:6958-63 (2003); and Takahashi, M. et al., “Gene expression profiling of clear cell renal cell carcinoma: gene identification and prognostic classification,” Proc Natl Acad Sci USA, 98:9754-9 (2001)) in many types of cancer.
Classification of patient prognosis by microarray analysis has promise in predicting the long-term outcome of any one individual based on the gene expression profile of the tumor at diagnosis. Inherent to this approach is the hypothesis that every tumor contains informative gene expression signatures, at the time of diagnosis, which can direct the biological behaviour of the tumor over time. To date, however, little success has been achieved in developing a classifier that will predict colon cancer outcome equivalent to or better than that which is possible using the standard clinicopathologic staging systems (i.e., Dukes' stage system). What is needed is a particularly effective mechanism for analyzing genomic array data to provide a classifier that accurately predicts cancer outcomes, in particular, colon cancer outcomes.

BRIEF SUMMARY OF THE INVENTION

The present invention provides systems and methods for predicting outcomes in patients diagnosed with cancer. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer. In a specific embodiment, the present invention provides a gene expression profile based classifier that provides a means for accurately predicting colon cancer outcome.
In accordance with an aspect of the invention, genes are classified according to degree of correlation with a clinical outcome for a cancer of interest (such as colon cancer). These genes are used to establish a set of reference gene expression levels (also referred to herein as a “classifier”). Biological information regarding the patient is received and used to extrapolate intracellular gene expression. The intracellular gene expression levels are compared to those in the classifier to predict clinical outcome.
In one embodiment of the invention, a method is provided in which the specific gene signatures for colon cancer are identified. To do so, tumor specimens from patients with known outcomes are collected and frozen. The outcomes are linked to a specific core set of genes that are weighted in importance by (1) selecting genes of interest by applying microarray analysis; (2) producing a classifier using support vector machines (SVM); and (3) cross-validating the genes of interest and the classifier by comparing them against an independent set of test data. In a preferred embodiment, significance analysis of microarrays (SAM) is utilized to select genes of interest.
Genome wide microarray analyses can produce large datasets that can be pattern-matched to clinicopathologic parameters such as patient outcomes and prognosis. Accordingly, the subject invention identifies gene expression signatures that would predict colon cancer outcome more accurately than the well-accepted Dukes' staging system.
In one embodiment, a group of colon cancer patients was examined to develop a survival classifier, which was subsequently validated using an entirely independent test set of data derived on a different microarray platform at a different performance site. The classifier of the subject invention was ultimately based on a core set of genes selected for their correlation to survival. A number of the genes in the core set demonstrated intrinsic biological significance for colon cancer progression.
With the ability to predict cancer outcomes/prognosis using the subject invention, appropriate treatment protocols can be selected for patients. For example, patients assessed using the subject invention and identified to have poor outcomes may be treated more aggressively or with specific agents (i.e., anti-sense agents, RNA inhibition agents, small molecule inhibitors of the cancer activity, gene therapy, etc.). Accordingly, an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial.

DESCRIPTION OF THE FIGURES

The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.
FIG. 1A is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when correlated with prognosis/patient survival.
FIG. 1B is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when grouped by Dukes' stage B and C.
FIG. 2A graphically illustrates a Kaplan-Meier survival curve based on gene expression profiling in accordance with the present invention.
FIG. 2B graphically illustrates a Kaplan-Meier survival curve based on Dukes' staging.
FIGS. 3A-3C illustrate survival curves for molecular classifiers in accordance with the subject invention.

DETAILED DISCLOSURE OF THE INVENTION

The present invention provides systems and methods for predicting cancer prognosis and outcomes. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer. In a specific embodiment, the present invention provides a gene expression profile based classifier for predicting cancer outcomes/prognosis. Both microarray analysis and binary classification are used to create the classifier of the invention.
The subject invention provides methods for predicting patient outcomes comprising: identifying genes that correlate with a clinical outcome for a cancer of interest (such as colon cancer); establishing a set of reference gene expression levels (also referred to herein as a “classifier”) for said identified genes; receiving biological information regarding the patient; using the biological information to extrapolate intracellular gene expression; and comparing intracellular gene expression levels to those in the classifier to predict clinical outcome.
Biological information of the invention includes, but is not limited to, clinical samples of bodily fluids or tissues; DNA profile information; and RNA profile information. Methods for preparing clinical samples for gene expression analysis are well known in the art, and can be carried out using commercially available kits.
In one embodiment, the subject invention provides methods for predicting colon cancer patient outcomes using a SAM selected set of genes derived from a genome wide analysis of gene expression. Those patients with good and bad prognoses are first clustered into groups that suggest outcome-rich information that is likely present in the gene expression dataset. Subsequently, a supervised SVM analysis identifies a core set of genes that appears in a majority (i.e., 50% or greater, including for example, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%) of the cross validation folds and accurately predicts colon cancer survival. Preferably, a core set of genes that appears in 75% of the cross validation folds is identified by an SVM to be used in predicting colon cancer survival.
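The selection of a core set by frequency across cross validation folds can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the invention: the function name `core_gene_set`, the gene labels, and the per-fold selections are all hypothetical placeholders, and the gene selection within each fold is assumed to have been done already.

```python
from collections import Counter

def core_gene_set(genes_per_fold, min_fraction=0.75):
    """Return genes selected in at least `min_fraction` of the cross validation folds."""
    n_folds = len(genes_per_fold)
    # Count each gene at most once per fold.
    counts = Counter(g for fold in genes_per_fold for g in set(fold))
    return sorted(g for g, c in counts.items() if c / n_folds >= min_fraction)

# Hypothetical gene labels selected in each of four folds (placeholders, not real results).
folds = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_B", "GENE_D"],
    ["GENE_A", "GENE_C", "GENE_D"],
    ["GENE_A", "GENE_B", "GENE_E"],
]

core = core_gene_set(folds)            # preferred 75% threshold
majority = core_gene_set(folds, 0.5)   # 50% "majority" threshold
```

With these placeholder folds, the 75% threshold retains only the genes chosen in at least three of four folds, while the 50% threshold admits a larger set. Raising the threshold trades set size for stability of the selected genes.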
In one embodiment, a gene core set is derived from a cDNA microarray that includes both named and unnamed genes. The resultant gene set is highly accurate in predicting cancer survival when compared with Dukes staging data from the same patients. To validate a cDNA-based classifier of the subject invention, a normalized and scaled oligonucleotide-based cancer database is evaluated against a completely independent set of test data derived from a different microarray platform.
Accordingly, the subject invention provides a system for predicting clinical outcome in a patient diagnosed with cancer, wherein the system is useful in offering support/advice in making treatment decisions. The system comprises (1) a data storage device for collecting data (i.e., gene data); and (2) a computing means for receiving and analyzing data to accurately determine genes associated with poor or good patient prognosis. A graphical user interface can be included with the systems of the invention to display clinical data as well as enable user-interaction.
In one embodiment, the system of the invention further includes an intelligence system that can use the analyzed clinical data to classify gene samples and offer support/advice for making clinical decisions (i.e., to interpret predicted clinical outcome and provide appropriate treatment). An intelligence system of the subject invention can include, but is not limited to, artificial neural networks, fuzzy logic, evolutionary computation, knowledge-based systems, and artificial intelligence.
In accordance with the subject invention, the computing means is preferably a digital signal processor, which can automatically and accurately analyze gene data and determine those genes that strongly correlate to clinical outcome.
In one embodiment, the system of the subject invention is stationary. For example, the system of the invention can be used within a healthcare setting (i.e., hospital, physician's office).
Definitions
As used herein, the term “patient” refers to humans as well as non-human animals including, and not limited to, mammals, birds, reptiles, amphibians, and fish. Preferred non-human animals include mammals (i.e., mouse, rat, rabbit, monkey, dog, cat, primate, pig). A patient may also include transgenic animals. In certain embodiments, a patient may be a laboratory animal raised by humans in a controlled environment other than its natural habitat.
The term “cancer,” as used herein, refers to a malignant tumor (i.e., colon or prostate cancer) or growth of cells (i.e., leukaemia). Cancers tend to be less differentiated than benign tumors, grow more rapidly, show infiltration, invasion, and destruction, and may metastasize. Cancers include, and are not limited to, colon and rectal cancers, fibrosarcoma, myxosarcoma, angiosarcoma, leukaemia, squamous cell carcinoma, basal cell carcinoma, malignant melanoma, renal cell carcinoma, and hepatocellular carcinoma.
A “marker gene,” as used herein, refers to any gene or gene product (i.e., protein, peptide, mRNA) that indicates a particular clinicopathological state (i.e., carcinoma, normal, dysplasia, and outcomes) or indicates a particular cell type, tissue type, or origin. The expression or lack of expression of a marker gene may indicate a particular physiological and/or diseased state of a patient, organ, tissue, or cell. Preferably, the expression or lack of expression may be determined using standard techniques such as RT-PCR, sequencing, immunochemistry, gene chip analysis, etc. In certain particular embodiments, the level of expression of a marker gene is quantifiable.
The term “polynucleotide” or “oligonucleotide,” as used herein, refers to a polymer of nucleotides. Typically, a polynucleotide comprises at least three nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (i.e., 2-aminoadenosine, 2-thio-thymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), chemically modified bases, biologically modified bases (i.e., methylated bases), intercalated bases, modified sugars (i.e., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose), or modified phosphate groups (i.e., phosphorothioates and 5′-N-phosphoramidite linkages).
As used herein, the term “tumor” refers to an abnormal growth of cells. The growth of the cells of a tumor typically exceeds the growth of normal tissue and tends to be uncoordinated. The tumor may be benign (i.e., lipoma, fibroma, myxoma, lymphangioma, meningioma, nevus, adenoma, leiomyoma, mature teratoma, etc.) or malignant (i.e., malignant melanoma, ovarian cancer, carcinoma in situ, carcinoma, adenocarcinoma, liposarcoma, mesothelioma, squamous cell carcinoma, basal cell carcinoma, colon cancer, lung cancer, etc.).
The term “bodily fluid,” as used herein, refers to a mixture of molecules obtained from a patient. Bodily fluids include, but are not limited to, exhaled breath, whole blood, blood plasma, urine, semen, saliva, lymph fluid, meningeal fluid, amniotic fluid, glandular fluid, sputum, feces, sweat, mucus, and cerebrospinal fluid. Bodily fluid also includes experimentally separated fractions of all of the preceding solutions or mixtures containing homogenized solid material, such as feces, tissues, and biopsy samples.
Computing Means
Correlating genes to clinical outcomes in accordance with the subject invention can be performed using software on a computing means. The computing means can also be responsible for maintenance of acquired data as well as the maintenance of the classifier system itself. The computing means can also detect and act upon user input via user interface means known to the skilled artisan (i.e., keyboard, interactive graphical monitors) for entering data to the computing system.
In one embodiment, the computing means further comprises means for storing and means for outputting processed data. The computing means includes any digital instrumentation capable of processing data input from the user. Such digital instrumentation, as understood by the skilled artisan, can process communicated data by applying algorithm and filter operations of the subject invention. Preferably, the digital instrumentation is a microprocessor, a personal desktop computer, a laptop, and/or a portable palm device. The computing means can be general purpose or application specific.
The subject invention can be practiced in a variety of situations. The computing means can directly or remotely connect to a central office or health care center. In one embodiment, the subject invention is practiced directly in an office or hospital. In another embodiment, the subject invention is practiced in a remote setting, for example, personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, wherein the patient is located some distance from the physician.
In a related embodiment, the computing means is a custom, portable design and can be carried or attached to the health care provider in a manner similar to other portable electronic devices such as a portable radio or computer.
The computing means used in accordance with the subject invention can contain at least one user-interface device including, but not limited to, a keyboard, stylus, microphone, mouse, speaker, monitor, and printer. Additional user-interface devices contemplated herein include touch screens, strip recorders, joysticks, and rollerballs.
Preferably, the computing means comprises a central processing unit (CPU) having sufficient processing power to perform algorithm operations in accordance with the subject invention. The algorithm operations, including the microarray analysis operations (such as SAM or binary classification), can be embodied in the form of computer processor usable media, such as floppy diskettes, CD-ROMs, zip drives, non-volatile memory, or any other computer-readable storage medium, wherein the computer program code is loaded into and executed by the computing means. Optionally, the operational algorithms of the subject invention can be programmed directly onto the CPU using any appropriate programming language, preferably using the C programming language.
In certain embodiments, the computing means comprises a memory capacity sufficiently large to perform algorithm operations in accordance with the subject invention. The memory capacity of the invention can support loading a computer program code via a computer-readable storage media, wherein the program contains the source code to perform the operational algorithms of the subject invention. Optionally, the memory capacity can support directly programming the CPU to perform the operational algorithms of the subject invention. A standard bus configuration can transmit data between the CPU, memory, ports and any communication devices.
In addition, as understood by the skilled artisan, the memory capacity of the computing means can be expanded with additional hardware and by saving data directly onto external media including, for example, without limitation, floppy diskettes, zip drives, non-volatile memory and CD-ROMs.
Further, the computing means can also include the necessary software and hardware to receive, route and transfer data to a remote location.
In one embodiment, the patient is hospitalized, and clinical data generated by a computing means is transmitted to a central location, for example, a monitoring station or to a specialized physician located in a different locale.
In another embodiment, the patient is in remote communication with the health care provider. For example, patients can be located at personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, and by using the classifier system of the invention, still provide clinical data to the health care provider. Advantageously, mobile stations, such as ambulances, and mobile clinics, can monitor patient health by using a portable computing means of the subject invention when transporting and/or treating a patient.
To ensure patient privacy, security measures, such as encryption software and firewalls, can be employed. Optionally, clinical data can be transmitted as unprocessed or “raw” signal(s) and/or as processed signal(s). Advantageously, transmitting raw signals allows any software upgrades to occur at the remote location where a computing means is located. In addition, both historical clinical data and real-time clinical data can be transmitted.
Communication devices such as wireless interfaces, cable modems, satellite links, microwave relays, and traditional telephonic modems can transfer clinical data from a computing means to a healthcare provider via a network. Networks available for transmission of clinical data include, but are not limited to, local area networks, intranets and the open internet. A browser interface, for example, NETSCAPE NAVIGATOR or INTERNET EXPLORER, can be incorporated into communications software to view the transmitted data.
Advantageously, a browser or network interface is incorporated into the processing device to allow the user to view the processed data in a graphical user interface device, for example, a monitor. The results of algorithm operations of the subject invention can be displayed in the form of interactive graphics.
Dukes' Staging as a Classifier
Since Dukes' staging describes the survival of a population of patients, rather than an individual, any individual patient can be classified as alive or dead using the survivorship of the population to predict that of the individual. In other words, if the survival of a Dukes' C population is 55% at 36 months of follow-up, the individual Dukes' C patient would be classified as alive at 36 months, but with only a 55% accuracy rate. Under these assumptions, the accuracy of staging by a microarray classifier of the subject invention can be compared to that of a clinical staging system.
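The population-based rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `population_baseline` and the handling of a rate of exactly 50% (labeled alive) are assumptions and do not come from the specification.

```python
def population_baseline(survival_rate):
    """Label every patient in a stage with that stage's majority outcome.

    survival_rate is the fraction of the stage population alive at the
    cutoff (for example 0.55 for the Dukes' C figure quoted above).
    Returns the label given to each individual and the expected accuracy.
    """
    if not 0.0 <= survival_rate <= 1.0:
        raise ValueError("survival_rate must lie between 0 and 1")
    # Assumption: a rate of exactly 0.5 is labeled "alive".
    label = "alive" if survival_rate >= 0.5 else "dead"
    # The rule is right for the majority group and wrong for the rest.
    expected_accuracy = max(survival_rate, 1.0 - survival_rate)
    return label, expected_accuracy


label, accuracy = population_baseline(0.55)
print(label, accuracy)
```

Under this rule the expected accuracy can never fall below 50%, which is why a stage with a survival rate near 50% is the hardest case for a staging-based classifier.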
Identification of Prognosis-Related Genes
As a first step in the survival analysis of microarray data, genes that best separate cancer patients with poor and good prognosis were identified. Censored-survival analysis using significance analysis of microarrays (SAM) or any other microarray analysis (i.e., clustering methods such as those disclosed by Eisen et al., “Cluster analysis and display of genome-wide expression patterns,” Proc. Natl. Acad. Sci. USA, 95:14863-14868 (1998); Alon et al., “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proc. Natl. Acad. Sci. USA, 96:6745-6750 (1999); and Ben-Dor et al., “Tissue classification with gene expression profiles,” J. Comput. Biol., 7:559-583 (2000); classification trees such as those disclosed by Dubitzky et al., “A database system for comparative genomic hybridization analysis,” IEEE Eng Med Biol Mag, 20(4):75-83 (2001); genetic algorithms such as those disclosed by Li et al., “Computational analysis of leukemia microarray expression data using the GA/KNN,” in Methods of Microarray Data Analysis, Kluwer Academic Publishers (2001); neural networks such as those disclosed by Hwang et al., “Applying machine learning techniques to analysis of gene expression data: cancer diagnosis,” in Methods of Microarray Data Analysis, Kluwer Academic Publishers (2001); and the “Neighborhood Analysis” (a weighted correlation method) as disclosed by Golub et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science, 286:531-537 (1999)) can be used to select genes correlated with prognosis in accordance with the subject invention.
Using SAM or any other microarray analysis, genes can be selected that most closely correlate with selected survival times. Permutation analysis can then be used to estimate the false discovery rate (FDR). The resultant mean-centered gene expression vectors can then be clustered and visualized using known computer software (i.e., Cluster 3.0 and Java TreeView 1.03, both of which are provided by Hoon MJLd, et al., “Open Source Clustering Software,” Bioinformatics 2003, in press).
Classifier Construction and Evaluation
According to the present invention, a gene classifier can be constructed to predict a set time of outcome among a set number of patients using microarray data produced on a cDNA platform. In one embodiment, the classifier of the subject invention is produced on a computing means that uses SAM two-class gene selection and support vector machine classification. In one embodiment, the SAM procedure is empirically set to select enough genes to satisfy a set FDR. Such selected genes can then be used in a linear support vector machine to classify the samples as having poor or good prognosis.
Leave-one-out cross-validation (LOOCV) operation can also be utilized to construct a classifier (i.e., neural network-based classifier) as well as to estimate the prediction accuracy of the classifier of the subject invention. In one embodiment, the classification process includes both gene selection and SVM classification creation; therefore, both steps can be performed on each training set after the test example is removed. According to the subject invention, samples can be classified as having “good” or “poor” prognosis based on survival for a certain set amount of time. In a preferred embodiment, “good” or “poor” prognosis is based on more or less than 36 months, respectively.
By using the leave-one-out cross validation approach, the subject invention provides a means for ranking the genes selected. The number of times a particular gene is chosen can be an indicator of the usefulness of that gene for general classification and may imply biological significance.
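The two ideas above, repeating gene selection inside every fold and ranking genes by how often they are selected, can be sketched as follows. This is a minimal illustration under stated assumptions: the difference of class means stands in for SAM or t-test selection, a nearest-centroid rule stands in for the SVM or neural network, and all function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean


def select_genes(X, y, k):
    """Return the indices of the k genes with the largest gap in class means."""
    scores = []
    for g in range(len(X[0])):
        good = [row[g] for row, lab in zip(X, y) if lab == 1]
        poor = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(mean(good) - mean(poor)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid(X, y, genes, sample):
    """Predict the class whose centroid (on the selected genes) is closest."""
    best = None
    for lab in (0, 1):
        members = [row for row, l in zip(X, y) if l == lab]
        dist = sum((mean(r[g] for r in members) - sample[g]) ** 2 for g in genes)
        if best is None or dist < best[0]:
            best = (dist, lab)
    return best[1]


def loocv(X, y, k):
    """Leave-one-out accuracy plus genes ranked by selection frequency."""
    counts = Counter()
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        # Gene selection sees only the training fold, never the held-out case.
        genes = select_genes(train_X, train_y, k)
        counts.update(genes)
        correct += nearest_centroid(train_X, train_y, genes, X[i]) == y[i]
    return correct / len(X), counts.most_common()


# Hypothetical toy data: 6 samples x 3 genes, label 1 = good prognosis.
X = [[5.0, 1.0, 0.2], [5.2, 0.9, 0.1], [4.9, 1.1, 0.3],
     [1.0, 1.0, 0.2], [1.1, 0.8, 0.1], [0.9, 1.2, 0.3]]
y = [1, 1, 1, 0, 0, 0]
accuracy, ranking = loocv(X, y, k=1)
print(accuracy, ranking)
```

A gene selected in every fold receives a count equal to the number of samples, which is the sense in which selection frequency can serve as a ranking.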
In a preferred embodiment, the classifier of the subject invention is prepared by (1) SAM gene selection using a t-test and (2) classification using a neural network. The classifier is prepared after a test sample is left out (from the LOOCV) to avoid bias from the gene selection step. Since the classification problem is a binary decision, a t-test was used for gene selection.
Preferably, once a gene set is selected, a feed-forward back-propagation neural network system (see Rumelhart, D. E. and J. L. McClelland, “Parallel Distributed Processing: Explorations in the Microstructure of Cognition,” Cambridge, Mass.: MIT Press (1986); and Fahlman, S. E., “Faster-Learning Variations on Back-Propagation: An Empirical Study,” Proceedings of the 1988 Connectionist Models Summer School, Los Altos, Calif.: Morgan-Kaufmann (1988)) is used. In one embodiment, a feed-forward back-propagation neural network with a single layer of 10 units is used. Neural network systems are extremely robust to both the number of genes selected and the level of noise in these genes.
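A feed-forward network with one hidden layer of 10 units, trained by back-propagation, can be sketched as below. The specification fixes only the layer size; the sigmoid activations, cross-entropy loss, learning rate, number of epochs, weight initialization and toy data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return hidden activations and the output probability for one sample."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    out = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, out


def train_mlp(X, y, hidden=10, epochs=300, lr=0.5, seed=0):
    """Train a one-hidden-layer network with per-sample back-propagation."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(X, y):
            h, out = forward(x, W1, b1, W2, b2)
            p = min(max(out, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            # Error signals for cross-entropy loss with sigmoid units.
            delta_out = out - t
            delta_h = [delta_out * W2[j] * h[j] * (1.0 - h[j])
                       for j in range(hidden)]
            for j in range(hidden):
                W2[j] -= lr * delta_out * h[j]
                b1[j] -= lr * delta_h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta_h[j] * x[i]
            b2 -= lr * delta_out
        losses.append(total / len(X))
    return (W1, b1, W2, b2), losses


# Hypothetical toy data: two selected genes, label 1 = good prognosis.
X = [[2.0, 0.1], [1.8, -0.2], [2.2, 0.0],
     [-2.0, 0.1], [-1.9, 0.2], [-2.1, -0.1]]
y = [1, 1, 1, 0, 0, 0]
params, losses = train_mlp(X, y)
predictions = [round(forward(x, *params)[1]) for x in X]
print(losses[0], losses[-1], predictions)
```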
Statistical Significance
Differences between Kaplan-Meier curves can be evaluated using the log-rank test, which is well known to the skilled statistician. This can be performed both for the initial survival analysis and for the classifier results. In accordance with the present invention, the classifier can split the samples into various groups (i.e., two groups: those predicted as good or poor prognosis). Classifier accuracy can be reported to the user both as overall accuracy and as specificity/sensitivity. In one embodiment, a McNemar's Chi-Squared test is used to compare the molecular classifier with the use of a Dukes' staging classifier. In a related embodiment, several permutations of the dataset (i.e., 1,000 permutations) are used to measure the significance of the classifier results as compared to chance.
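The log-rank comparison of two predicted groups can be sketched as follows. This is a minimal two-group version using the standard hypergeometric variance; the function name `logrank_test` and the follow-up times are hypothetical and are not data from the study.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up in months; events_*: 1 = death observed, 0 = censored.
    Returns (chi-squared statistic, p-value) with one degree of freedom.
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
            [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_a = expected_a = variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e == 1}):
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 d.f.
    return chi2, erfc(sqrt(chi2 / 2.0))


# Hypothetical groups: predicted poor prognosis vs. predicted good prognosis.
poor_times, poor_events = [2, 3, 4, 5, 6], [1, 1, 1, 1, 1]
good_times, good_events = [30, 35, 40, 45, 50], [1, 1, 1, 1, 1]
chi2, p_value = logrank_test(poor_times, poor_events, good_times, good_events)
print(chi2, p_value)
```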

EXAMPLE 1

Human Colon Cancer Survival Classifier

Training Set Tumor Samples
In one embodiment of the subject invention, a colon cancer survival classifier was developed using 78 tumor samples, including 3 adenomas and 75 cancers. Informative frozen colorectal cancer samples were selected from the Moffitt Cancer Center Tumor Bank (Tampa, Fla.) based on evidence for good (survival >36 mo) or poor prognosis (survival <36 mo) from the Tumor Registry. Dukes' stages can include B, C, and D. In this particular embodiment, survival was measured as last contact minus collection date for living patients, or date of death minus collection date for patients who have died.
In this embodiment, the number of samples per Dukes' stage was as follows: 23 patients with stage B, 22 patients with stage C and 30 patients with stage D disease. Just as adenomas can be included to help train the classifier to recognize good prognosis patients, Dukes D patients with synchronous metastatic disease can be used to train the classifier to recognize poor prognosis patients.
In a related embodiment, all samples were selected to have at least 36 months of follow-up. The follow-up results in this embodiment showed that thirty-two of the patients survived more than 36 months, while 46 patients died within 36 months. With this particular embodiment, the median follow-up time for all 78 patients was 27.9 months. The median follow-up for the poor prognosis cases (<36 months survival) was 11.7 months and for the good prognosis cases (>36 months survival) it was 64.2 months.
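The survival definition and the 36-month labeling used for the training set can be sketched as below. The conversion of days to months (30.4375 days per month) and the treatment of a survival of exactly 36 months as poor prognosis are assumptions, since the text specifies neither; the dates are hypothetical.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # assumption: average month length


def survival_months(collection_date, end_date):
    """Months from sample collection to death or to last contact."""
    return (end_date - collection_date).days / DAYS_PER_MONTH


def prognosis_label(months, cutoff=36.0):
    """Return "good" for survival beyond the cutoff, otherwise "poor"."""
    return "good" if months > cutoff else "poor"


# Hypothetical patient: sample collected in 1998, last contact in 2002.
months = survival_months(date(1998, 1, 15), date(2002, 1, 15))
print(months, prognosis_label(months))
```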
Since the NIH consensus conference in 1990, chemotherapeutic application in the United States has been relatively homogeneous, with nearly all Dukes stage B avoiding chemotherapy, and nearly all Dukes stage C receiving 6 months of adjuvant 5-fluorouracil (5-FU) and leucovorin.
Test Set Tumor Samples (Denmark)
In another embodiment, eighty-eight patients with Dukes' stage B and C colorectal cancer and a minimum follow-up time of 60 months were selected for array hybridization. Ten micrograms of total RNA were used as starting material for the cDNA preparation and hybridized to Affymetrix U133A GeneChips (Santa Clara, Calif.) by standard protocols supplied by the manufacturer. The U133A gene chip is disclosed in U.S. Pat. Nos. 5,445,934; 5,700,637; 5,744,305; 5,945,334; 6,054,270; 6,140,044; 6,261,776; 6,291,183; 6,346,413; 6,399,365; 6,420,169; 6,551,817; 6,610,482; and 6,733,977; and in European Patent Nos. 619,321 and 373,203, all of which are hereby incorporated in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
With this particular embodiment, there were 28 patients with stage B and 60 patients with stage C colorectal cancers. All Dukes' stage B patients were treated by surgical resection alone whereas all C patients received 5-FU/leucovorin adjuvant chemotherapy in addition to surgery. Colorectal tumor samples were obtained fresh from surgery and were immediately snap-frozen in liquid nitrogen but were not microdissected, with the potential for inclusion of samples with <80% purity. Total RNA was isolated from 50-150 mg tumor sample using RNAzol (WAK-Chemie Medical) or using spin column technology (Sigma) according to the manufacturer's instructions. Results were noted (i.e., fifty-seven of the patients survived more than 36 months, while 31 died within 36 months).
32K cDNA Array Hybridization and Scanning
According to the subject invention, samples can be microdissected (>80% tumor cells) by frozen section guidance and RNA extraction performed using Trizol followed by secondary purification on RNeasy columns. The samples can then be profiled on cDNA arrays (i.e., TIGR's 32,488-element spotted cDNA arrays, containing 31,872 human cDNAs representing 30,849 distinct transcripts (23,936 unique TIGR TCs and 6,913 ESTs), 10 exogenous controls printed 36 times, and 4 negative controls printed 36-72 times).
In one embodiment, tumor samples are co-hybridized with a common reference pool in the Cy5 channel for normalization purposes. cDNA synthesis, aminoallyl labeling and hybridizations can be performed according to previously published protocols (see Hegde, P. et al., “A concise guide to cDNA microarray analysis,” Biotechniques; 29:552-562 (2000) and Yang, I. V, et al., “Within the fold: assessing differential expression measures and reproducibility in microarray assays,” Genome Biol; 3:research0062 (2002)). For example, labeled first-strand cDNA is prepared and co-hybridized with labeled samples prepared from a universal reference RNA consisting of equimolar quantities of total RNA derived from three cell lines, CaCO2 (colon), KM12L4A (colon), and U118MG (brain). Detailed protocols and description of the array are available at <http://cancer.tigr.org>. Array probes are identified and local background can be subtracted in Spotfinder (Saeed, A. I. et al., “TM4: a free, open-source system for microarray data management and analysis,” Biotechniques; 34:374-8 (2003)). Individual arrays can be normalized in MIDAS (see Saeed, A.I. ibid.) using LOWESS (an algorithm known to the skilled artisan for use in normalizing data) with smoothing parameter set to 0.33.
Microarray Hybridization and Scanning of Denmark Samples
The first and second strand cDNA synthesis can be performed using the SuperScript II System (Invitrogen) according to the manufacturer's instructions except using an oligodT primer containing a T7 RNA polymerase promoter site. Labeled cRNA is prepared using the BioArray High Yield RNA Transcript Labeling Kit (Enzo). Biotin labeled CTP and UTP (Enzo) are used in the reaction together with unlabeled NTP's. Following the IVT reaction, the unincorporated nucleotides are removed using RNeasy columns (Qiagen). Fifteen micrograms of cRNA are fragmented at 94° C. for 35 min in a fragmentation buffer containing 40 mM Tris-acetate pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization, the fragmented cRNA in a 6×SSPE-T hybridization buffer (1 M NaCl, 10 mM Tris pH 7.6, 0.005% Triton) is heated to 95° C. for 5 min and subsequently to 45° C. for 5 min before loading onto the Affymetrix HG_U133A probe array cartridge. The probe array is then incubated for 16 h at 45° C. at constant rotation (60 rpm). The washing and staining procedure can be performed in an Affymetrix Fluidics Station.
The probe array can be exposed to several washes (i.e., 10 washes in 6×SSPE-T at 25° C. followed by 4 washes in 0.5×SSPE-T at 50° C.). The biotinylated cRNA can then be stained with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, Oreg.) in 6×SSPE-T for 30 min at 25° C. followed by 10 washes in 6×SSPE-T at 25° C. An antibody amplification step can then follow, using normal goat IgG as blocking reagent, final concentration 0.1 mg/ml (Sigma) and biotinylated anti-streptavidin antibody (goat), final concentration 3 mg/ml (Vector Laboratories). This can be followed by a staining step with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, Oreg.) in 6×SSPE-T for 30 min at 25° C. and 10 washes in 6×SSPE-T at 25° C. The probe arrays are scanned (i.e., at 560 nm using a confocal laser-scanning microscope (Hewlett Packard GeneArray Scanner G2500A)). The readings from the quantitative scanning can then be analyzed by the Affymetrix Gene Expression Analysis Software (MAS 5.0) and normalized to a common mean expression value of 150.
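Normalizing each array to a common mean expression value of 150 amounts to multiplying every signal on the array by a single scale factor. The sketch below uses a plain arithmetic mean, which is an assumption: the vendor software may compute the array average differently. The function name and signal values are hypothetical.

```python
def scale_to_target(signals, target=150.0):
    """Rescale one array so that its mean signal equals the target value."""
    mean_signal = sum(signals) / len(signals)
    if mean_signal <= 0:
        raise ValueError("mean signal must be positive")
    factor = target / mean_signal
    return [value * factor for value in signals]


# Hypothetical signals from one array.
chip = [80.0, 120.0, 400.0, 1000.0]
scaled = scale_to_target(chip)
print(scaled, sum(scaled) / len(scaled))
```

Because every array is brought to the same mean, expression values from different arrays become comparable on a common scale before classification.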
Survival Analysis
The first analysis of the colon cancer survival data can be performed using censored survival time (in months) and 500 permutations. Significance analysis of microarrays (SAM) can then be used to select genes most closely correlated to survival. The subset of genes that correspond to an empirically derived, estimated false discovery rate (FDR) is then chosen. This subset of genes can then be used in subsequent analyses. In one embodiment, Cluster 3.0 and Java TreeView 1.03 are used to cluster and visualize the SAM-selected genes.
A hierarchical clustering algorithm can be chosen, with complete linkage and the correlation coefficient (i.e., Pearson correlation coefficient) as the similarity metric. In another embodiment, the Dukes' staging clusters are manually created in the appropriate format. Clustering software produces heatmaps (see FIGS. 1A and 1B) and dendrograms. The highest level partition of the SAM-selected genes can then be chosen as a survival grouping. Given two clusters of survival times, Kaplan-Meier curves can be plotted (see FIGS. 2A and 2B).
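Once samples have been assigned to clusters, the Kaplan-Meier curve for each cluster can be computed with the product-limit estimator. The sketch below is a minimal version; the function name `kaplan_meier` and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up in months; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical cluster of six patients; three are censored.
curve = kaplan_meier([5, 8, 12, 20, 36, 40], [1, 1, 0, 1, 0, 0])
for month, probability in curve:
    print(month, round(probability, 3))
```

Censored patients leave the risk set without lowering the curve, which is how the estimator uses incomplete follow-up.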
Identification of Prognosis-Related Genes

According to the subject invention, SAM survival analysis can be used to identify a set of genes most correlated with censored survival time using the training set tumor samples. In one embodiment, a set of 53 genes was found, corresponding to a median expected false discovery rate (FDR) of 28%. These genes are listed in the following Table 1, wherein genes denoted with (+) are positively correlated with survival time and genes without the (+) notation are negatively correlated with survival time (overexpressed in poor prognosis cases). Included in this list of genes in Table 1 are several genes believed to be biologically significant, such as osteopontin and neuregulin.

TABLE 1


Censored survival analysis using SAM, resultant 53 genes selected with median 28% FDR

GenBank ID	UniGene ID	Description

N36176	Hs.108636	membrane protein CH1
AA149253	Hs.107987	N/A
AA425320	Hs.250461	hypothetical protein; MDG1; similar to putative microvascular
		endothelial differentiation gene 1; similar to X98993 (PID: g1771560)
AA775616	Hs.313	OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin, bone
		sialoprotein I, early T-lymphocyte activation 1)
N72847	Hs.125221	Alu subfamily SP sequence contamination warning entry. [Human]
		{Homo sapiens}
AA706226	Hs.113264	neuregulin 2 isoform 4
AA976642	Hs.42116	axin 2 (conductin, axil)
AA133215	Hs.32989	Receptor activity-modifying protein 1 precursor (CRLR activity-
		modifying protein 1)
AA457267	Hs.70669	P19 protein; HMP19 protein
N50073	Hs.84926	hypothetical protein
R38360	Hs.145567	Unknown {Homo sapiens}
AA450205	Hs.8146	translocation protein-1; Sec62; Dtrp1 protein; membrane protein
		SEC62, S. cerevisiae, homolog of [Homo sapiens];
AA148578	Hs.110956	KOX 13 protein (56 AA)
R38640	Hs.89584	insulinoma-associated 1; bA470C13.2 (insulinoma-associated protein 1)
AA487274	Hs.48950	heptacellular carcinoma novel gene-3 protein; DAPPER1
N53172	Hs.23016	orphan receptor; orphan G protein-coupled receptor RDC1
AA045308	Hs.7089	insulin induced protein 2; INSIG-2 membrane protein
AA045075	Hs.62751	syntaxin 7
N63366	Hs.161488	N/A
R22340	null	chr2 synaptotagmin; KIAA1228 protein
AA437223	Hs.46640	Adult retina protein
AA481250	Hs.154138	chitinase precursor; chitinase 3-like 2; chondrocyte protein 39
AA045793	Hs.6790	hypothetical protein; MDG1; similar to putative microvascular
		endothelial differentiation gene 1; similar to X98993 (PID: g1771560);
		microvascular endothelial differentiation gene 1 product; microvascular
		endothelial differentiation gene 1; DKFZP564F1862 p
H87795	Hs.233502	N/A
AA121806	Hs.84564	Rab3c; hypothetical protein BC013033
AA284172	Hs.89385	NPAT; predicted amino acids have three regions which share similarity
		to annotated domains of transcriptional factor oct-1, nucleolus-
		cytoplasm shuttle phosphoprotein and protein kinases; NPAT; nuclear
		protein, ataxia-telangiectasia locus; Similar to nuc
R68106	Hs.233450	Fc-gamma-RIIb2; precursor polypeptide (AA −42 to 249); IgG Fc
		receptor; IgG Fc receptor; IgG Fc receptor beta-Fc-gamma-RII; IgG Fc
		fragment receptor precursor; Fc gamma RIIB [Homo sapiens]; Fc
		gamma RIIB [Ho
AA479270	Hs.250802	Diff33 protein homolog; KIAA1253 protein [Homo sapiens];
		KIAA1253 protein [Homo sapiens]
AA432030	Hs.179972	Interferon-induced protein 6-16 precursor (Ifi-6-16). [Human] {Homo
		sapiens}
R10545	Hs.148877	dJ425C14.2 (Placental protein
AA453508	Hs.168075	transportin; karyopherin (importin) beta 2 [Homo sapiens]; karyopherin
		beta 2; importin beta 2; transportin; M9 region interaction protein [Homo
		sapiens]
AI149393	Hs.9302	phosducin-like protein; phosducin-like protein; phosducin-like protein;
		phosducin-like protein; hypothetical protein; phosducin-like; Unknown
		(protein for MGC: 14088) [Homo sapiens]
AA883496	Hs.125778	Null
AA167823	Hs.112058	CD27BP {Homo sapiens}
AI203139	Hs.180370	hypothetical protein FLJ30934 [Homo sapiens]
⁺H19822	Hs.2450	KIAA0028; leucyl-tRNA synthetase, mitochondrial [Homo sapiens];
		leucyl-tRNA synthetase, mitochondrial [Homo sapiens]; leucine-tRNA
		ligase precursor; leucine translase [Homo sapiens]
⁺W73732	Hs.83634	Null
⁺AA777892	Hs.121939	Null
⁺AA885478	Hs.125741	unnamed protein product [Homo sapiens]; hypothetical protein
		FLJ12505 [Homo sapiens]; Unknown (protein for MGC: 39884) [Homo
		sapiens]
⁺AA932696	Hs.8022	TU3A protein; TU3A protein [Homo sapiens]
⁺AA481507	Hs.159492	unnamed protein product [Homo sapiens]
⁺H18953	Hs.15232	Null
⁺AA709158	Hs.42853	put. DNA binding protein; put. DNA binding protein; cAMP responsive
		element binding protein-like 1; Creb-related protein [Homo sapiens]
⁺AA488652	Hs.4209	HSPC235; ribosomal protein L2; Similar to ribosomal protein,
		mitochondrial, L2 [Homo sapiens]; mitochondrial ribosomal protein
		L37; ribosomal protein, mitochondrial, L2 [Homo sapiens]
⁺N39584	Hs.17404	Null
⁺H62801	Hs.125059	Unknown (protein for IMAGE: 4309224) [Homo sapiens]; hypothetical
		protein [Homo sapiens]
⁺H17638	Hs.17930	dJ1033B10.2.2 (chromosome 6 open reading frame 11 (BING4),
		isoform 2) [Homo sapiens]
⁺R43684	Hs.165575	dJ402G11.5 (novel protein similar to yeast and bacterial predicted
		proteins) {Homo sapiens}
⁺N21630	Hs.143039	hypothetical protein PRO1942
⁺T81317	Hs.189846	Alu subfamily J sequence contamination warning entry. [Human]
		{Homo sapiens}
⁺R45595	Hs.23892	Null
⁺T90789	Hs.121586	ray; small GTP binding protein RAB35 [Homo sapiens]; RAB35,
		member RAS oncogene family,; ras-related protein rab-1c (GTP-binding
		protein ray) [Homo sapiens]
⁺AA283062	Hs.73986	Similar to CDC-like kinase 2 {Homo sapiens}

Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 1 are hereby incorporated by reference.

FIG. 1A presents a graphical representation of the 53 SAM-selected genes (as described above) as a clustered heat map. The red color represents over-expressed genes relative to green, under-expressed genes. FIG. 1A shows only the Dukes' stage B and C cases, whose outcome Dukes' staging predicts poorly. Since only genes correlated with survival are used in clustering, the distinctly illustrated clusters in the heatmap correspond to very different prognosis groups.
The 53 SAM-selected genes were also arranged by annotated Dukes' stage in FIG. 1B. Unlike FIG. 1A, where two gene groups were apparent, there was no discernible gene expression grouping when arranged by Dukes' stage.
FIG. 2A shows the Kaplan-Meier plot for two dominant clusters of genes correlated with stage B and C test set tumor samples. Clearly, these genes separated the cases into two distinct clusters of patients with good prognosis (cluster 2) and poor prognosis (cluster 1) (P<0.001 using a log rank test). FIG. 2B presents a Kaplan-Meier plot of the survival times of Dukes' stage B and C tumors grouped by stage, showing no statistically significant difference.
As illustrated in FIGS. 1A, 1B, 2A, and 2B, gene expression profiles separate good and poor prognosis cases better than Dukes' staging. This suggests that a gene-expression based classifier, as provided by the present invention, is more accurate at predicting patient prognosis than the traditional Dukes' staging.
Dukes' Staging as a Prognosis Classifier
As noted above, Dukes' staging provides only a probability of survival for each member of a population of patients, based on historical statistics. Accordingly, the prognosis of an individual patient can be predicted based on historical outcome probabilities of the associated Dukes' stage. For example, if a Dukes' C survival rate was 55% at 36 months of follow-up, any individual Dukes' C patient would be classified as having a good prognosis since more than 50% of patients would be predicted to be alive.
Performance of a Colorectal Cancer Survival Classifier of the Present Invention as Compared to Dukes' Staging
In order to determine the value of the human colon cancer prognosis/survival classifier of the subject invention, a classifier of the invention was compared to the Dukes' clinical staging approach currently in widespread use. In an initial set of 78 tumors (from the training set tumor samples described above), a classifier (Classifier A) of the present invention predicted 100%, 69%, 55% and 20% for Adenomas, and Dukes' stages B, C and D cancers, respectively. The overall accuracy was 77% (63% sensitivity/97% specificity).
Using LOOCV, Classifier A was evaluated in predicting prognosis for each patient at 36 months of follow-up as compared to Dukes' staging predictions. The results of LOOCV demonstrated that Classifier A of the subject invention was 90% accurate (93% sensitivity/84% specificity) in predicting the correct prognosis for each patient at 36 months of follow-up. A log-rank test of the two predicted groups (good and poor prognosis) was significant (P<0.001), demonstrating the ability of Classifier A to distinguish the two outcomes (FIG. 2A). Permutation analysis demonstrates the result is better than possible by chance (P<0.001; 1,000 permutations).
This result is also significantly higher than that observed using Dukes' staging as a classifier (77%) for the same group of patients (P=0.03878). The results for both Dukes' staging and molecular staging are summarized in Tables 2A-2C below. Shown first in Table 2A are the relative accuracies of Dukes' staging and the cDNA classifier (molecular staging) for all tumors, followed in Table 2B by a comparison by Dukes' stage. As shown in Table 2B, Dukes' staging was particularly bad at predicting outcome for patients with poor prognosis (70% and 55% for stages B and C, respectively). In contrast, molecular staging, as provided by the present invention, identified the good prognosis cases (the “default” classification using Dukes' staging), but also identified poor prognosis cases with a high degree of accuracy. Table 2C shows the detailed confusion matrix for all samples in the dataset, showing the equivalent misclassification rate of both good and poor prognosis groups by the classifier of the subject invention.

TABLE 2A

LOOCV Accuracy of Dukes' vs. Molecular Staging for all tumors.

Classification Method	Total Accuracy	Sensitivity	Specificity

Dukes' Staging	77%	63%	97%
Molecular Staging	*90%	93%	84%

TABLE 2B


Comparison of Molecular Staging and Dukes' Staging Accuracy.

Dukes' Stage	Molecular Staging	Dukes' Staging

Adenoma	100%	100%
B	87%	70%
C	91%	55%
D	90%	97%

TABLE 2C

Confusion Matrix of cDNA Classifier Results.

Observed/Predicted	Poor	Good	Totals

Poor	43	3	46
Good	5	27	32
Total	48	30	78

*Dukes' staging vs. cDNA Classifier, P = 0.03878, one-sided McNemar's test.
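The accuracy, sensitivity and specificity reported for molecular staging can be recomputed from the confusion matrix, and the comparison with Dukes' staging can be sketched with a one-sided McNemar test. In the sketch below, treating poor prognosis as the positive class is an assumption (it reproduces the reported 93% sensitivity and 84% specificity). The discordant counts passed to the McNemar function are hypothetical, because the text does not report them, and the continuity-corrected chi-squared form is likewise an assumption.

```python
from math import erfc, sqrt


def classifier_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def mcnemar_one_sided(b, c):
    """One-sided McNemar chi-squared test with continuity correction.

    b: cases only the first classifier got right.
    c: cases only the second classifier got right.
    Returns (chi-squared statistic, one-sided p-value for b > c).
    """
    n = b + c
    if n == 0:
        return 0.0, 0.5
    z = max(abs(b - c) - 1, 0) / sqrt(n)
    two_sided = erfc(z / sqrt(2.0))  # upper tail of chi-squared, 1 d.f.
    one_sided = two_sided / 2.0 if b > c else 1.0 - two_sided / 2.0
    return z * z, one_sided


# Confusion matrix counts, with poor prognosis as the positive class.
metrics = classifier_metrics(tp=43, fn=3, fp=5, tn=27)
print({name: round(value, 3) for name, value in metrics.items()})

# Hypothetical discordant counts; the text does not report them.
chi2, p = mcnemar_one_sided(18, 8)
print(round(chi2, 3), round(p, 4))
```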

Classifier Construction
A leave-one-out cross-validation technique can be utilized for evaluating the performance of a classifier construction method of the subject invention. This approach tends to give accuracy estimates with high variance but low bias.
Within each step of the leave-one-out cross-validation (or fold), a classifier of the subject invention can be created on all available training data, then tested for accuracy by classifying the left-out example. In one embodiment, a classifier was constructed in two steps: first a gene selection procedure was performed with SAM and then a support vector machine was constructed.
In a related embodiment, the gene selection approach used was a univariate selection. SAM (significance analysis of microarrays) was the method chosen for selecting genes. Since gene selection was to be based on two classes (good vs. poor prognosis), the two-class SAM method can be used for selecting genes with the best d values. SAM calculates false discovery rates empirically through the use of permutation analysis. SAM provides an estimate of the false discovery rate (FDR) along with a list of genes considered significant relative to censored survival. This feature of SAM was used with this particular embodiment to select the number of genes that resulted in the smallest FDR possible. In one embodiment, this FDR was zero.
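The two-class d value and the permutation estimate of the FDR can be sketched as below. This is a simplified illustration, not the SAM software itself: it uses a fixed fudge constant `s0`, a symmetric cutoff on |d|, and the median count of genes called in label permutations, and it omits refinements of the published method such as ordered statistics. All names and the simulated data are hypothetical.

```python
import random
from statistics import mean, median


def d_statistics(rows, labels, s0=0.1):
    """SAM-style relative difference d for each gene.

    rows: one list of expression values per gene; labels: 0 or 1 per sample.
    """
    n0, n1 = labels.count(0), labels.count(1)
    result = []
    for values in rows:
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        ma, mb = mean(a), mean(b)
        ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
        # Pooled standard error of the difference in class means.
        s = ((1.0 / n0 + 1.0 / n1) * ss / (n0 + n1 - 2)) ** 0.5
        result.append((mb - ma) / (s + s0))
    return result


def permutation_fdr(rows, labels, threshold, n_perm=200, seed=0):
    """Return (number of genes called, estimated FDR) at a |d| cutoff."""
    rng = random.Random(seed)
    called = sum(abs(d) >= threshold for d in d_statistics(rows, labels))
    shuffled = list(labels)
    false_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and expression
        permuted = d_statistics(rows, shuffled)
        false_counts.append(sum(abs(d) >= threshold for d in permuted))
    fdr = median(false_counts) / called if called else 0.0
    return called, min(fdr, 1.0)


# Simulated data: 50 genes x 20 samples; the first five genes differ by class.
rng = random.Random(1)
labels = [0] * 10 + [1] * 10
rows = []
for g in range(50):
    shift = 3.0 if g < 5 else 0.0
    rows.append([rng.gauss(0.0, 1.0) + (shift if lab else 0.0) for lab in labels])
called, fdr = permutation_fdr(rows, labels, threshold=3.0)
print(called, fdr)
```

Raising the cutoff calls fewer genes and generally lowers the estimated FDR, which is the trade-off used when selecting the gene list.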
The set of 53 genes (significant genes, as described above) at an FDR of 28% was used in this particular embodiment. Using this subset of 53 genes, the samples were clustered as a way of visualizing the SAM results (see FIGS. 1A and 1B). Once the genes were selected using the SAM method, a linear support vector machine (SVM) was constructed. This approach can be implemented in the Weka machine learning toolkit. A linear SVM was chosen to reduce the potential for overfitting the data, given the small sample sizes and large dimensionality. One further advantage of this approach is the transparency of the constructed model, which is of particular interest when comparing the classifier of the subject invention on two different platforms (see below).
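The transparency of a linear SVM comes from the fact that the model is just one weight per gene plus a bias. The sketch below trains such a model by batch sub-gradient descent on the regularized hinge loss. It is not the Weka implementation; the optimizer, regularization strength, learning rate and toy data are assumptions chosen for illustration.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit w and b by minimizing lam/2 * |w|^2 + mean hinge loss.

    X: list of samples (selected gene values); y: labels in {-1, +1}.
    """
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # sample is inside the margin or misclassified
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (good prognosis) or -1 (poor prognosis)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: two selected genes per sample.
X = [[2.0, 0.1], [1.8, -0.2], [2.2, 0.0],
     [-2.0, 0.1], [-1.9, 0.2], [-2.1, -0.1]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

Because the decision function is a weighted sum, the sign and size of each weight show directly how a gene pushes a sample toward good or poor prognosis.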

In another embodiment, using LOOCV via statistical analytic tools for comparing groups (i.e., parametric tests such as t-test/ANOVA; see also Dyrskjot L et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat. Genet., 33:90-6 (2003)), a list of 43 genes (from the 53 SAM selected genes as described above) was selected for use in constructing a second human colorectal cancer survival classifier, in accordance with the present invention. The list of 43 genes is provided in the following Table 3.

TABLE 3


Genes used in the cDNA classifier (selected by t-test) and ranked by selection
frequency using LOOCV.

Number
Times	GeneBank	UniGene
Occurred	ID	ID	Description

_M*78	AA045075	Hs.62751	syntaxin 7
_M*78	AA425320	Hs.250461	hypothetical protein; MDG1; similar to putative
			microvascular endothelial differentiation gene 1; similar to
			X98993 (PID: g1771560);
			microvascular endothelial differentiation gene 1 product;
			microvascularendothelial differentiation gene 1;
			DKFZP564F1862 p
_M78	AA437223	Hs.46640	adult retina protein
_M*78	AA479270	Hs.250802	Diff33 protein homolog; KIAA1253 protein
_M*78	AA486233	Hs.2707	G1 to S phase transition 1
_M*78	AA487274	Hs.48950	heptacellular carcinoma novel gene-3 protein; DAPPER1
_M78	AA488652	Hs.4209	HSPC235; ribosomal protein L2; Similar to ribosomal
			protein, mitochondrial, L2 [Homo sapiens]; mitochondrial
			ribosomal protein L37; ribosomal protein, mitochondrial, L2
			[Homo sapiens]
_M78	AA694500	Hs.116328	hypothetical protein MGC33414; Similar to PR domain
			containing 1, with ZNF domain
_M78	AA704270	Hs.189002	Null
_M*78	AA706226	Hs.113264	neuregulin 2 isoform 4
_M*78	AA709158	Hs.42853	put. DNA binding protein; put. DNA binding protein; cAMP
			responsive element binding protein-like 1; Creb-related
			protein
_M*78	AA775616	Hs.313	OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin,
			bone sialoprotein I, early T-lymphocyte activation 1)
_M78	AA777892	Hs.121939	Null
_M*78	AA873159	Hs.182778	apolipoprotein CI; apolipoprotein C-I variant II;
			apolipoprotein C-I variant I
_M*78	AA969508	Hs.10225	HEYL protein; hairy-related transcription factor 3;
			hairy/enhancer-ofsplit related with YRPW motif-like
_M78	AI203139	Hs.180370	hypothetical protein FLJ30934
_M*78	AI299969	Hs.255798	unnamed protein product; HN1 like; Unknown (protein for
			MGC: 22947)
_M*78	H17364	Hs.80285	CRE-BP1 family member; cyclic AMP response element
			DNA-binding protein isoform 1 family; cAMP response
			element binding protein (AA1-505); cyclic AMP response
			element-binding protein (HB16); Similar to activating
			transcription factor 2 [Homo sapiens]; act
_M78	H17627	Hs.83869	unnamed protein
_M*78	H19822	Hs.2450	KIAA0028; leucyl-tRNA synthetase, mitochondrial [Homo
			sapiens]; leucyl-tRNA synthetase, mitochondrial [Homo
			sapiens]; leucine-tRNA ligase precursor; leucine translase
			[Homo sapiens]
_M*78	H23551	Hs.30974	NADH dehydrogenase subunit 4 {Deirochelys reticularia}
_M78	H62801	Hs.125059	Unknown (protein for IMAGE: 4309224) [Homo sapiens];
			hypothetical protein [Homo sapiens]
_M78	H85015	Hs.138614	null
_M78	N21630	Hs.143039	hypothetical protein PRO1942
_M*78	N36176	Hs.108636	membrane protein CH1; membrane protein CH1 [Homo
			sapiens]; membrane protein CH1 [Homo sapiens]; membrane
			protein CH1 [Homo sapiens]
_M*78	N72847	Hs.125221	Alu subfamily SP sequence contamination warning entry.
			[Human] {Homo sapiens}
_M78	N92519	Hs.1189	Unknown (protein for MGC: 10231) [Homo sapiens]
_M*78	R27767	Hs.79946	thyroid hormone receptor-associated protein, 150 kDa
			subunit; Similar to thyroid hormone receptor-associated
			protein, 150 kDa subunit [Homo sapiens];;
_M*78	R34578	Hs.111314	null
_M78	R38360	Hs.145567	unknown {Homo sapiens}
_M78	R43597	Hs.137149	trehalase homolog T19F6.30 - Arabidopsis thaliana
_M78	R43684	Hs.165575	dJ402G11.5 (novel protein similar to yeast and bacterial
			predicted proteins)
_M*78	W73732	Hs.83634	Null
_M*77	AA450205	Hs.8146	translocation protein-1; Sec62; translocation protein 1; Dtrp1
			protein; membrane protein SEC62, S. cerevisiae, homolog of
			[Homo sapiens];
_M77	AI081269	Hs.184108	Alu subfamily SX sequence contamination warning entry.
_M*77	R59314	Hs.170056	null
_M*72	AA702174	Hs.75263	pRb-interacting protein RbBP-36
_M*70	AI002566	Hs.81234	immunoglobulin superfamily, member 3
_M*63	AA676797	Hs.1973	cyclin F
_M*62	AA453508	Hs.168075	transportin; karyopherin (importin) beta 2; M9 region
			interaction protein
_M62	W93980	Hs.59511	Null
_M*58	AA045308	Hs.7089	insulin induced protein 2; INSIG-2 membrane protein
_M58	AA953396	Hs.127557	null
52	AA962236	Hs.124005	hypothetical protein MGC19780
*50	AA418726	Hs.4764	null
50	R43713	Hs.22945	null
*41	AA664240	Hs.8454	artifact-warning sequence (translated ALU class C) - human
*38	AA477404	Hs.125262	hypothetical protein; unnamed protein product; GL003;
			AAAS protein; adracalin; aladin
*37	AA826237	Hs.3426	Era GTPase A protein; conserved ERA-like GTPase [Homo
			sapiens]; ERA-W [Homo sapiens]; Era G-protein-like 1;
			GTPase, human homolog of E. coli essential cell cycle
			protein Era; era (E. coli G-protein homolog)-like 1 [Homo
			sapiens]
*30	AA007421	Hs.113992	candidate tumor suppressor protein {Homo sapiens}
*30	AA478952	Hs.91753	unnamed protein product; hypothetical protein [Homo
			sapiens]; unnamed protein product [Homo sapiens];
			hypothetical protein [Homo sapiens]
30	AA885096	Hs.43948	Alu subfamily SQ sequence contamination warning entry.
28	H29032	Hs.7094	null
*24	R10545	Hs.148877	dJ425C14.2 (Placental protein
*22	AA448641	Hs.108371	transcription factor; E2F transcription factor 4; p107/p130-
			binding protein
20	R38266	Hs.12431	Unknown (protein for MGC: 30132)
19	H17543	Hs.92580	Alu subfamily J sequence contamination warning entry.
11	T81317	Hs.189846	Alu subfamily J sequence contamination warning entry.
*9	AA453790	Hs.255585	null
9	R22340	null	unnamed protein product; chr2 synaptotagmin KIAA1228
			protein
7	AA987675	Hs.176759	null
7	N51543	Hs.47292	null
*7	N74527	Hs.5420	unnamed protein product
*6	AA121778	Hs.95685	null
*6	AA258031	Hs.125104	unnamed protein product; MUS81 endonuclease
*6	AA702422	Hs.66521	josephin MJD1; super cysteine rich protein; SCRP
6	T64924	Hs.220619	null
*5	R42984	Hs.4863	null
*5	R59360	Hs.12533	null
*5	R63816	Hs.28445	unnamed protein product
5	T49061	Hs.8934	HA-70 {Clostridium botulinum}
4	AA016210	Hs.24920	null
4	AA682585	Hs.193822	null
4	AA705040	Hs.119646	Alu subfamily J sequence contamination warning entry.
			[Human] {Homo sapiens}
4	AA909959	Hs.130719	NESH; hypothetical protein; NESH protein [Homo sapiens];
			NESH protein; new molecule including SH3 [Homo sapiens]
4	AI240881	Hs.89688	complement receptor type 1-like protein {Homo sapiens}
*3	AA133215	Hs.32989	Receptor activity-modifying protein 1 precursor (CRLR
			activity-modifying-protein 1)
3	AA699408	Hs.168103	prp28, U5 snRNP 100 kd protein; prp28, U5 snRNP 100 kd
			protein [Homo sapiens]
3	AA910771	Hs.130421	null
*3	AI362799	Hs.110757	hypothetical protein; NNP3 [Homo sapiens]
*3	H51549	Hs.21899	UDP-galactose translocator; UDP-galactose transporter 1
			[Homo sapiens]
3	R06568	Hs.187556	null
2	AA001604	Hs.204840	null
*2	AA132065	Hs.109144	unknown; SMAP-5; Similar to hypothetical protein
			AF140225
*2	AA490493	Hs.24340	null
2	AA633845	Hs.192156	null
*2	AI261561	Hs.182577	Alu subfamily SQ sequence contamination warning entry.
*2	H81024	Hs.180655	Aik2; aurora-related kinase 2; serine/threonine kinase 12;
			Unknown (protein for MGC: 11031) [Homo sapiens];
			Unknown (protein for MGC: 4243) [Homo sapiens]
2	N75004	Hs.49265	hypothetical protein {Plasmodium falciparum 3D7}
2	W96216	Hs.110196	NICE-1 protein
1	AA045793	Hs.6790	hypothetical protein; MDG1; similar to putative microvascular
			endothelial differentiation gene 1; similar to X98993
			(PID: g1771560); microvascular endothelial differentiation gene 1
			product; microvascular endothelial differentiation gene 1;
			DKFZP564F1862 p
*1	AA284172	Hs.89385	NPAT; predicted amino acids have three regions which share
			similarity to annotated domains of transcriptional factor oct-1,
			nucleolus-cytoplasm shuttle phosphoprotein and protein
			kinases; NPAT; nuclear protein, ataxia-telangiectasia locus;
			Similar to nuc
*1	AA411324	Hs.67878	interleukin-13 receptor; interleukin-13 receptor; interleukin
			13 receptor, alpha 1 [Homo sapiens]; Similar to interleukin 13
			receptor, alpha 1 [Homo sapiens]; bB128O4.2.1 (interleukin
			13 receptor, alpha 1) [Homo sapiens]; interleukin 13 receptor,
			alpha 1
*1	AA448261	Hs.139800	high mobility group AT-hook 1 isoform b; nonhistone
			chromosomal high-mobility group protein HMG-I/HMG-Y
			[Homo sapiens]
*1	AA479952	Hs.154145	Alu subfamily SX sequence contamination warning entry.
			[Human] {Homo sapiens}
*1	AA485752	Hs.9573	ATP-binding cassette, sub-family F, member 1; ATP-binding
			cassette 50; ATP-binding cassette, sub-family F (GCN20),
			member 1 [Homo sapiens];;
*1	AA504266	Hs.8217	nuclear protein SA-2; bA517O1.1 (similar to SA2 nuclear
			protein); hypothetical protein [Homo sapiens]; stromal
			antigen 2 [Homo sapiens]
*1	AA630376	Hs.8121	null
*1	AA634261	Hs.25035	null
1	AA701167	Hs.191919	Alu subfamily SB sequence contamination warning entry.
			[Human] {Homo sapiens}
*1	AA703019	Hs.114159	small GTP-binding protein; RAB-8b protein; Unknown
			(protein for MGC: 22321) [Homo sapiens]
*1	AA706041	Hs.170253	unnamed protein product [Homo sapiens]; hypothetical
			protein FLJ23282 [Homo sapiens];;
1	AA773139	Hs.66103	null
1	AA776813	Hs.191987	hypothetical protein {Macaca fascicularis}
*1	AA862465	Hs.71	zinc-alpha2-glycoprotein precursor; Zn-alpha2-glycoprotein;
			Zn-alpha2-glycoprotein; alpha-2-glycoprotein 1, zinc;
			alpha-2-glycoprotein 1, zinc [Homo sapiens];;
*1	AA977711	Hs.128859	null
1	AI288845	Hs.105938	putative chemokine receptor; putative chemokine receptor;
			chemokine receptor X; C-C chemokine receptor 6. (CCR6)
			(Evidence is not experimental); chemokine (C-C motif)
			receptor-like 2 [Homo sapiens]
*1	H15267	Hs.210863	null
1	H18956	Hs.21035	unnamed protein product [Homo sapiens]
1	H73608	Hs.94903	null
*1	H99544	Hs.153445	unknown; endothelial and smooth muscle cell-derived
			neuropilin-like protein [Homo sapiens]; endothelial and
			smooth muscle cell-derived neuropilin-like protein;
			coagulation factor V/VIII-homology domains protein 1
			[Homo sapiens]
*1	N45282	Hs.201591	calcitonin receptor-like
*1	N48270	Hs.45114	Similar to golgi autoantigen, golgin subfamily a, member 6
			[Homo sapiens]
1	N59451	Hs.48389	null
*1	N95226	Hs.22039	KIAA0758 protein;
1	R37028	Hs.20956	cytochrome bd-type quinol oxidase subunit I related protein
			{Thermoplasma acidophilum}
1	R66605	Hs.182485	Unknown (protein for IMAGE: 4843317) {Homo sapiens}
*1	T51004	Hs.167847	null
1	T51316	null	null
1	T72535	Hs.189825	null
*1	W72103	Hs.236443	beta-spectrin 2 isoform 2

_M denotes genes that were used to classify 75% of all tumors, and genes appearing in both the cDNA classifier and the U133A-limited cDNA classifier are marked by *.

Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 3 are hereby incorporated by reference.

In yet another embodiment, a third human colorectal cancer survival classifier, in accordance with the present invention, was prepared using U133A-limited genes selected by LOOCV via a statistical analytic tool (i.e., a t-test). The list of U133A-limited genes selected using LOOCV via t-test is provided in Table 4 below. The named genes common to both the original classifier (a set of 43 genes) and the U133A-limited classifier are marked with an asterisk. Table 5 lists seven genes selected by SAM survival analysis; osteopontin and neuregulin are among them and are common to the gene lists for all classifiers. In Table 5, genes denoted with (+) are positively correlated with survival time, and genes without the (+) notation are negatively correlated with survival time (overexpression in poor prognosis cases).
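The selection-frequency ranking reported in Table 4 can be illustrated with a minimal sketch. It assumes a Welch-style t statistic and a fixed number of top-ranked genes per fold (`top_k`); the text states only that a t-test was used within LOOCV, so both choices, along with the function names and the toy data, are hypothetical. Each sample is held out once, genes are ranked on the remaining samples, and a counter records how often each gene is selected. With 78 samples, the largest possible count would be 78.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two lists of expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def loocv_selection_counts(expr, labels, top_k):
    """Count how often each gene ranks in the top_k by |t| across LOOCV folds.

    expr   : dict mapping gene -> list of expression values (one per sample)
    labels : list of 0 (poor prognosis) / 1 (good prognosis), one per sample
    top_k  : hypothetical number of genes kept in each fold
    """
    n = len(labels)
    counts = Counter()
    for held_out in range(n):
        keep = [i for i in range(n) if i != held_out]
        scores = {}
        for gene, values in expr.items():
            good = [values[i] for i in keep if labels[i] == 1]
            poor = [values[i] for i in keep if labels[i] == 0]
            scores[gene] = abs(welch_t(good, poor))
        ranked = sorted(scores, key=scores.get, reverse=True)
        counts.update(ranked[:top_k])
    return counts


# Toy data (illustrative only): geneA separates the two groups, the others do not.
expr = {
    "geneA": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.0, 2.1, 1.9, 2.0],
    "geneC": [0.5, 1.5, 0.7, 1.4, 0.9, 1.6, 0.6, 1.2],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]
counts = loocv_selection_counts(expr, labels, top_k=1)
```

In this toy run, the count for each gene plays the role of the "Number of Times Occurred" column.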

TABLE 4


Genes used in U133A-limited cDNA classifier (selected by t-test) and ranked
by selection frequency using LOOCV.

Number of Times Occurred	GenBank ID	UniGene ID	Description

_M*78	AA007421	Hs.113992	candidate tumor suppressor protein
_M*78	AA045075	Hs.62751	syntaxin 7
_M*78	AA045308	Hs.7089	insulin induced protein 2, INSIG-2 membrane protein
_M*78	AA418726	Hs.4764	null
_M*78	AA425320	Hs.250461	hypothetical protein; MDG1; similar to putative
			microvascular endothelial differentiation gene 1; similar to
			X98993 (PID: g1771560); microvascular endothelial
			differentiation gene 1 product; microvascular endothelial
			differentiation gene 1; DKFZP564F1862 p
_M*78	AA450205	Hs.8146	translocation protein-1; Sec62; translocation protein 1; Dtrp1
			protein; membrane protein SEC62, S. cerevisiae, homolog of
			[Homo sapiens];
_M*78	AA453508	Hs.168075	transportin; karyopherin (importin) beta 2; M9 region
			interaction protein
_M*78	AA453790	Hs.255585	null
_M*78	AA477404	Hs.125262	hypothetical protein; unnamed protein product; GL003;
			AAAS protein; adracalin; aladin; adracalin
_M*78	AA478952	Hs.91753	unnamed protein product
_M*78	AA479270	Hs.250802	Diff33 protein homolog; KIAA1253 protein
_M*78	AA486233	Hs.2707	G1 to S phase transition 1 [Homo sapiens]
_M*78	AA487274	Hs.48950	hepatocellular carcinoma novel gene-3 protein; DAPPER1
			[Homo sapiens]; unnamed protein product [Homo sapiens]
_M*78	AA664240	Hs.8454	artifact-warning sequence (translated ALU class C) - human
_M*78	AA676797	Hs.1973	cyclin F
_M*78	AA702174	Hs.75263	pRb-interacting protein RbBP-36
_M*78	AA706226	Hs.113264	neuregulin 2 isoform 4
_M*78	AA709158	Hs.42853	put. DNA binding protein; put. DNA binding protein; cAMP
			responsive element binding protein-like 1; Creb-related
			protein [Homo sapiens]
_M*78	AA775616	Hs.313	OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin,
			bone sialoprotein I, early T-lymphocyte activation 1);
			secreted phosphoprotein 1 (osteopontin, bone sialoprotein I,
			early T-lymphocyte activation 1) [Homo sapiens]; secreted
			phosphoprotein 1 (ost
_M*78	AA826237	Hs.3426	Era GTPase A protein; conserved ERA-like GTPase [Homo
			sapiens]; ERA-W [Homo sapiens]; Era G-protein-like 1;
			GTPase, human homolog of E. coli essential cell cycle
			protein Era; era (E. coli G-protein homolog)-like 1 [Homo
			sapiens]
_M*78	AA873159	Hs.182778	apolipoprotein CI; apolipoprotein CI; apolipoprotein C-I;
			apolipoprotein C-I precursor; apolipoprotein C-I variant II;
			apolipoprotein C-I variant I; Similar to apolipoprotein C-I
			[Homo sapiens]
_M*78	AA969508	Hs.10225	HEYL protein; hairy-related transcription factor 3;
			hairy/enhancer-of-split related with YRPW motif-like [Homo
			sapiens]
_M*78	AI002566	Hs.81234	immunoglobulin superfamily, member 3
_M*78	AI299969	Hs.255798	unnamed protein product [Homo sapiens]; HN1 like [Homo
			sapiens]; Unknown (protein for MGC: 22947) [Homo
			sapiens]; HN1 like [Homo sapiens]
_M*78	H17364	Hs.80285	CRE-BP1 family member; cyclic AMP response element
			DNA-binding protein isoform 1 family; cAMP response
			element binding protein (AA 1-505); cyclic AMP response
			element-binding protein (HB16); Similar to activating
			transcription factor 2 [Homo sapiens]; act
_M*78	H19822	Hs.2450	KIAA0028; leucyl-tRNA synthetase, mitochondrial [Homo
			sapiens]; leucyl-tRNA synthetase, mitochondrial [Homo
			sapiens]; leucine-tRNA ligase precursor; leucine translase
			[Homo sapiens]
_M*78	H23551	Hs.30974	NADH dehydrogenase subunit 4 {Deirochelys reticularia}
_M*78	N36176	Hs.108636	membrane protein CH1; membrane protein CH1 [Homo
			sapiens]; membrane protein CH1 [Homo sapiens]; membrane
			protein CH1 [Homo sapiens]
_M*78	N72847	Hs.125221	Alu subfamily SP sequence contamination warning entry.
			[Human] {Homo sapiens}
_M*78	R10545	Hs.148877	dJ425C14.2 (Placental protein
_M*78	R27767	Hs.79946	thyroid hormone receptor-associated protein, 150 kDa
			subunit; Similar to thyroid hormone receptor-associated
			protein, 150 kDa subunit [Homo sapiens];;
_M*78	R34578	Hs.111314	null
_M*78	R59314	Hs.170056	null
_M*78	W73732	Hs.83634	null
_M*74	AA448641	Hs.108371	transcription factor; E2F transcription factor 4; p107/p130-
			binding protein [Homo sapiens]; E2F transcription factor 4,
			p107/p130-binding [Homo sapiens]; E2F transcription factor
			4, p107/p130-binding [Homo sapiens];
_M*68	R59360	Hs.12533	null
_M*63	AA121778	Hs.95685	null
_M*59	H51549	Hs.21899	UDP-galactose translocator; UDP-galactose transporter 1
			[Homo sapiens]
*57	H81024	Hs.180655	Aik2; aurora-related kinase 2; serine/threonine kinase 12;
			serine/threonine kinase 12 [Homo sapiens]; Unknown
			(protein for MGC: 11031) [Homo sapiens]; Unknown (protein
			for MGC: 4243) [Homo sapiens]
*56	AA490493	Hs.24340	0
*56	R42984	Hs.4863	null
*53	AA258031	Hs.125104	unnamed protein product [Homo sapiens]; MUS81
			endonuclease [Homo sapiens]; MUS81 endonuclease [Homo
			sapiens]
*52	AA133215	Hs.32989	Receptor activity-modifying protein 1 precursor (CRLR
			activity-modifying-protein 1)
*52	R63816	Hs.28445	unnamed protein product [Homo sapiens]
*51	N95226	Hs.22039	KIAA0758 protein
*45	N74527	Hs.5420	unnamed protein product {Homo sapiens}
*36	AA702422	Hs.66521	josephin MJD1; super cysteine rich protein; SCRP
*29	AI261561	Hs.182577	Alu subfamily SQ sequence contamination warning entry.
			[Human] {Homo sapiens}
*28	AA132065	Hs.109144	unknown; SMAP-5; Similar to hypothetical protein
			AF140225 [Homo sapiens]; Similar to hypothetical protein
			AF140225 [Homo sapiens]; unnamed protein product [Homo
			sapiens]; unknown [Homo sapiens]; hypothetical protein
			AF140225 [Homo sapiens]
*28	AI362799	Hs.110757	hypothetical protein; NNP3 [Homo sapiens]
*27	AA045793	Hs.6790	hypothetical protein; MDG1; similar to putative
			microvascular endothelial differentiation gene 1; similar to
			X98993 (PID: g1771560); microvascular endothelial
			differentiation gene 1 product; microvascular endothelial
			differentiation gene 1; DKFZP564F1862 p
*27	AA284172	Hs.89385	NPAT; predicted amino acids have three regions which share
			similarity to annotated domains of transcriptional factor oct-
			1, nucleolus-cytoplasm shuttle phosphoprotein and protein
			kinases; NPAT; nuclear protein, ataxia-telangiectasia locus;
			Similar to nuc
24	N51632	Hs.75353	The KIAA0123 gene product is related to rat general
			mitochondrial matrix processing protease (MPP).; Unknown
			(protein for IMAGE: 3632957) [Homo sapiens]; Unknown
			(protein for IMAGE: 3857242) [Homo sapiens]; inositol
			polyphosphate-5-phosphatase, 72 kDa; KIAA0
23	AA482110	Hs.4900	Unknown gene product; PRO0915; CUA001; hypothetical
			protein [Homo sapiens]; hypothetical protein [Homo sapiens]
22	AA485450	Hs.132821	flavin containing monooxygenase 2; flavin containing
			monooxygenase 2 [Homo sapiens]
*19	AA699408	Hs.168103	prp28, U5 snRNP 100 kd protein; prp28, U5 snRNP 100 kd
			protein [Homo sapiens]
18	N70777	Hs.49927	BA103J18.1.2 (novel protein, isoform 2) [Homo sapiens]
16	AA993736	Hs.169838	hypothetical protein; vesicle-associated membrane protein 4
			[Homo sapiens]; Similar to vesicle-associated membrane
			protein 4 [Homo sapiens]
15	AI139498	Hs.151899	delta sarcoglycan; delta-sarcoglycan isoform 2; Sarcoglycan,
			delta (35 kD dystrophin-associated glycoprotein); dystrophin
			associated glycoprotein, delta sarcoglycan; 35 kD dystrophin-
			associated glycoprotein [Homo sapiens]
15	N59721	Hs.21858	glia-derived nexin precursor; serine (or cysteine) proteinase
			inhibitor, clade E (nexin, plasminogen activator inhibitor type
			1), member 2; protease inhibitor 7 (protease nexin I); glia-
			derived nexin [Homo sapiens]; similar to serine (or cysteine)
			protein
14	AA431885	Hs.5591	MAP kinase-interacting serine/threonine kinase 1; MAP
			kinase interacting kinase 1 [Homo sapiens]
14	AA911661	Hs.2733	Hox2H protein (AA 1-356); K8 homeo protein; HOX2.8 gene
			product; HOXB2 protein; HOX-2.8 protein (77 AA); homeo
			box B2; homeo box 2H; homeobox protein Hox-B2; K8
			home protein [Homo sapiens];
13	AA775865	Hs.7579	KIAA1192 protein; HSPC273; unnamed protein product;
			hypothetical protein FLJ10402 [Homo sapiens]; unnamed
			protein product [Homo sapiens]; hypothetical protein
			FLJ10402 [Homo sapiens]; hypothetical protein [Homo
			sapiens]; unnamed protein product [Homo sapiens]
13	R30941	Hs.24064	signal transducer and activator of transcription Stat5B;
			transcription factorStat5b; STAT5B_CDS [Homo sapiens];
			signal transducer and activator of transcription 5B; signal
			transducer and activator of transcription 5; transcription
			factor STAT5B [Homo sapiens]
*11	AA703019	Hs.114159	small GTP-binding protein; RAB-8b protein; Unknown
			(protein for MGC: 22321) [Homo sapiens]
11	AA777192	Hs.47062	RNA Polymerase II subunit 14.5 kD; DNA directed RNA
			polymerase II polypeptide I; DNA directed RNA polymerase
			II 14.5 kda polypeptide [Homo sapiens]; polymerase (RNA)
			II (DNA directed) polypeptide I (14.5 kD) [Homo sapiens]
*10	W72103	Hs.236443	beta-spectrin 2 isoform 2 [Homo sapiens]
*9	H15267	Hs.210863	null
8	H17638	Hs.17930	dJ1033B10.2.2 (chromosome 6 open reading frame 11
			BING4), isoform 2) [Homo sapiens]
8	R60193	Hs.11637	null
7	R92717	Hs.170129	choroideremia-like Rab escort protein 2; dJ317G22.3
			(choroideremia-like (Rab escort protein 2))
*6	AA706041	Hs.170253	unnamed protein product [Homo sapiens]; hypothetical
			protein FLJ23282 [Homo sapiens];;
*5	AA411324	Hs.67878	interleukin-13 receptor; interleukin-13 receptor; interleukin
			13 receptor, alpha 1 [Homo sapiens]; Similar to interleukin
			13 receptor, alpha 1 [Homo sapiens]; bB128O4.2.1
			(interleukin 13 receptor, alpha 1) [Homo sapiens]; interleukin
			13 receptor, alpha 1
*5	AA504266	Hs.8217	nuclear protein SA-2; bA517O1.1 (similar to SA2 nuclear
			protein); hypothetical protein [Homo sapiens]; stromal
			antigen 2 [Homo sapiens]
5	AA932696	Hs.8022	TU3A protein; TU3A protein [Homo sapiens]
5	AA973494	Hs.153003	serine/threonine kinase; myristilated and palmitylated serine-
			threonine kinase MPSK; protein kinase expressed in day 12
			fetal liver; F5-2; serine/threonine kinase KRCT;
			serine/threonine kinase 16 [Homo sapiens];
5	N45100	Hs.34871	HRIHFB2411; KIAA0569 gene product; Smad interacting
			protein 1 [Homo sapiens]; smad-interacting protein-1 [Homo
			sapiens]
4	AA418410	Hs.9880	cyclophilin; U-snRNP-associated cyclophilin; peptidyl prolyl
			isomerase H (cyclophilin H) [Homo sapiens]
4	AA725641	Hs.154397	WD-repeat protein
4	AA954482	Hs.222677	SSX1; synovial sarcoma, X breakpoint 1 [Homo sapiens];
			synovial sarcoma, X breakpoint 8 [Homo sapiens]; synovial
			sarcoma, X breakpoint 1; sarcoma, synovial, X-chromosome-
			related 1; SSX1 protein [Homo sapiens]
4	H45391	Hs.31793	null
4	T86932	Hs.131924	T-cell death-associated gene 8; similar to G protein-coupled
			receptor [Homo sapiens]
3	AA279188	Hs.86947	disintegrin and metalloprotease domain 8 precursor
*3	AA485752	Hs.9573	ATP-binding cassette, sub-family F, member 1; ATP-binding
			cassette 50; ATP-binding cassette, sub-family F (GCN20),
			member 1 [Homo sapiens];;
3	AA680132	Hs.55235	sphingomyelin phosphodiesterase 2, neutral membrane
			(neutral sphingomyelinase); Unknown (protein for MGC: 1617)
			[Homo sapiens]
*3	AA977711	Hs.128859	null
3	W93370	Hs.174219	NKG2E; type II integral membrane protein; killer cell lectin-
			like receptor subfamily C, member 3; killer cell lectin-like
			receptor subfamily C, member 3 isoform NKG2-H; NKG2E
			[Homo sapiens]; NKG2E [Homo sapiens]; NKG2E [Homo sapiens]
2	AA036727	Hs.180236	null
2	AA071075	Hs.25523	Alu subfamily SP sequence contamination warning entry.
			[Human] {Homo sapiens}
2	AA464612	Hs.190161	PTD017; HSPC183; PTD017 protein [Homo sapiens];
			mitochondrial ribosomal protein S18B; mitochondrial
			ribosomal protein S18-2; mitochondrial 28S ribosomal
			protein S18-2 [Homo sapiens]
2	AA481250	Hs.154138	chitinase precursor; chitinase 3-like 2; chondrocyte protein
			39; chitinase 3-like 2 [Homo sapiens]
2	AA598659	Hs.168516	NuMA protein {Homo sapiens}
2	AA682905	Hs.8004	huntingtin-associated protein interacting protein
2	R17811	Hs.77897	splicing factor SF3a60; pre-mRNA splicing factor SF3a
			(60 kD), similar to S. cerevisiae PRP9 (spliceosome-
			associated protein 61); splicing factor 3a, subunit 3, 60 kD
			[Homo sapiens]; Similar to splicing factor 3a, subunit 3,
			60 kD [Homo sapiens]
2	W93592	Hs.47343	hWNT5A; wingless-type MMTV integration site family,
			member 5A precursor; proto-oncogene Wnt-5A precursor;
			WNT-5A protein precursor [Homo sapiens]
1	AA017301	Hs.60796	artifact-warning sequence (translated ALU class C) - human
1	AA046406	Hs.100134	unnamed protein product [Homo sapiens]; hypothetical
			protein FLJ12787 [Homo sapiens]
1	AA256304	Hs.172648	Unknown (protein for MGC: 9448) [Homo sapiens]; distal-
			less homeo box 7 [Homo sapiens]; distal-less homeobox 4,
			isoform a; beta protein 1 [Homo sapiens]
1	AA416759	Hs.239760	Unknown (protein for MGC: 2503) [Homo sapiens]; unnamed
			protein product [Homo sapiens]
*1	AA448261	Hs.139800	high mobility group AT-hook 1 isoform b; nonhistone
			chromosomal high-mobility group protein HMG-I/HMG-Y
			[Homo sapiens]
1	AA452130	Hs.28219	Alu subfamily SX sequence contamination warning entry.
			[Human] {Homo sapiens}
1	AA457528	Hs.22979	unnamed protein product [Homo sapiens]; hypothetical
			protein FLJ13993 [Homo sapiens]; FLJ00167 protein [Homo
			sapiens]
1	AA460542	Hs.121849	microtubule-associated proteins 1A/1B light chain 3;
			microtubule-associated proteins 1A/1B light chain 3;
			microtubule-associated proteins 1A/1B light chain 3 [Homo
			sapiens]; microtubule-associated proteins 1A/1B light chain 3
			[Homo sapiens]
*1	AA479952	Hs.154145	Alu subfamily SX sequence contamination warning entry.
			[Human] {Homo sapiens}
1	AA481507	Hs.159492	unnamed protein product [Homo sapiens]
1	AA504342	Hs.7763	null
1	AA598970	Hs.7918	unnamed protein product; hypothetical protein; dJ453C12.6.2
			(uncharacterized hypothalamus protein (isoform 2));
			hypothetical protein [Homo sapiens]; uncharacterized
			hypothalamus protein HSMNP1 [Homo sapiens]
*1	AA630376	Hs.8121	null
*1	AA634261	Hs.25035	null
1	AA677254	Hs.52002	CT-2; CD5 antigen-like (scavenger receptor cysteine rich
			family); bA120D12.1 (CD5 antigen-like (scavenger receptor
			cysteine rich family)) [Homo sapiens]; CD5 antigen-like
			(scavenger receptor cysteine rich family) [Homo sapiens]
1	AA757564	Hs.13214	Probable G protein-coupled receptor GPR27 (Super
			conserved receptor expressed in brain 1). [Human]
1	AA775888	Hs.163151	null
1	AA844864	Hs.4158	regenerating protein I beta; regenerating islet-derived 1 beta
			precursor; lithostathine 1 beta; regenerating protein I beta;
			secretory pancreatic stone protein 2 [Homo sapiens]
*1	AA862465	Hs.71	zinc-alpha2-glycoprotein precursor; Zn-alpha2-glycoprotein;
			Zn-alpha2-glycoprotein; alpha-2-glycoprotein 1, zinc; alpha-
			2-glycoprotein 1, zinc [Homo sapiens];;
1	AA989139	Hs.16608	candidate tumor suppressor protein; candidate tumor
			suppressor protein [Homo sapiens]
1	AI253017	Hs.183438	U4/U6 snRNP-associated 61 kDa protein {Homo sapiens}
1	AI394426	Hs.57732	acid phosphatase {Homo sapiens}
*1	H99544	Hs.153445	unknown; endothelial and smooth muscle cell-derived
			neuropilin-like protein [Homo sapiens]; endothelial and
			smooth muscle cell-derived neuropilin-like protein;
			coagulation factor V/VIII-homology domains protein 1
			[Homo sapiens]
1	N41021	Hs.114408	Toll/interleukin-1 receptor-like protein 3; Toll-like receptor
			5; Toll-like receptor 5 [Homo sapiens]; toll-like receptor 5;
			Toll/interleukin-1 receptor-like protein 3 [Homo sapiens]
*1	N45282	Hs.201591	calcitonin receptor-like
1	N46845	Hs.144287	hairy/enhancer-of-split related with YRPW motif 2; basic
			helix-loop-helix factor 1; HES-related repressor protein 1
			HERP1; GRIDLOCK; basic helix-loop-helix protein;
			hairy-related transcription factor 2; hairy/enhancer-of-split related
			with YRPW motif 2 [H
*1	N48270	Hs.45114	Similar to golgi autoantigen, golgin subfamily a, member 6
			[Homo sapiens]
1	N59846	Hs.177812	Unknown (protein for MGC: 41314) {Mus musculus}
1	R16760	Hs.20509	HBV pX associated protein-8
1	R44546	Hs.82563	dJ526I14.2 (KIAA0153 (similar
1	R92994	Hs.1695	metalloelastase; metalloelastase; matrix metalloproteinase 12
			(macrophage elastase)
*1	T51004	Hs.167847	null
1	T56281	Hs.8765	metallothionein I-F; RNA helicase-related protein [Homo
			sapiens]; metallothionein 1F [Homo sapiens]
1	T70321	Hs.247129	G3a protein; Apo M; apolipoprotein M; Unknown (protein
			for MGC: 22400) [Homo sapiens]; apolipoprotein M; NG20-like
			protein [Homo sapiens]
1	W45025	Hs.170268	Alu subfamily SX sequence contamination warning entry.
			[Human] {Homo sapiens}

_M denotes genes used to classify 75% of all tumors, and genes appearing in both the cDNA classifier and U133A-limited cDNA classifier are marked by *.

Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 4 are hereby incorporated by reference.

TABLE 5


Censored survival analysis using SAM; seven genes selected with
median estimated FDR of 13.5%.

GenBank ID	UniGene ID	Description

N36176	Hs.108636	membrane protein CH1
AA149253	Hs.107987	N/A
AA425320	Hs.250461	hypothetical protein; MDG1; similar to putative
		microvascular endothelial differentiation
		gene 1; similar to X98993 (PID: g1771560)
AA775616	Hs.313	OPN-b; osteopontin; secreted phosphoprotein 1
		(osteopontin, bone sialoprotein I, early
		T-lymphocyte activation 1)
N72847	Hs.125221	N/A
AA706226	Hs.113264	neuregulin 2 isoform 4
(+) AA883496	Hs.125778	N/A

Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 5 are hereby incorporated by reference.

Cross Platform Validation

Systems and methods of the subject invention can be tested by applying a classifier to an immediately available, well-annotated, independent test set of colon cancer tumor samples (Denmark, as described above) run on the Affymetrix platform. Using database software such as the Resourcerer software from TIGR (see also Tsai J et al., “RESOURCERER: a database for annotating and linking microarray resources within and across species,” Genome Biol, 2:software0002.1-0002.4 (2001)), genes on the cDNA chip can be mapped to corresponding genes on the Affymetrix platform.
The linkage is made through common UniGene IDs.
In one embodiment, 12,951 genes (out of 32,000) were mapped to an Affymetrix U133A GeneChip. In certain instances, probes on the cDNA chip are unknown expressed sequence tag markers (ESTs) which can reduce the number of usable genes identified. Thus, a classifier of the subject invention can address this lack of correspondence in platforms. Accordingly, in a related embodiment, a U133A-limited cDNA classifier was constructed in accordance with the subject invention by using the identical approach on this reduced set of overlapping genes.
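The UniGene-based linkage described above can be sketched as a simple join. This is a minimal illustration and not the Resourcerer implementation: the function name and the Affymetrix probe set identifiers (`ps_0001` and so on) are hypothetical placeholders, while the accession and UniGene pairs are taken from the tables above. Probes without a UniGene ID, such as unannotated ESTs, drop out of the mapping, which is how the usable gene set shrinks.

```python
def map_by_unigene(cdna_probes, affy_probesets):
    """Link cDNA probes to Affymetrix probe sets through shared UniGene IDs.

    cdna_probes    : dict mapping GenBank accession -> UniGene ID (or None)
    affy_probesets : dict mapping probe set ID -> UniGene ID (or None)
    Returns a dict mapping accession -> sorted list of matching probe set IDs.
    """
    by_unigene = {}
    for probeset, unigene in affy_probesets.items():
        if unigene:
            by_unigene.setdefault(unigene, []).append(probeset)
    return {
        accession: sorted(by_unigene[unigene])
        for accession, unigene in cdna_probes.items()
        if unigene in by_unigene
    }


# Accession/UniGene pairs from the tables; probe set IDs are made up.
cdna_probes = {
    "AA775616": "Hs.313",     # osteopontin
    "AA706226": "Hs.113264",  # neuregulin 2 isoform 4
    "T51316": None,           # no UniGene ID listed
}
affy_probesets = {
    "ps_0001": "Hs.313",
    "ps_0002": "Hs.113264",
    "ps_0003": "Hs.313",
    "ps_0004": "Hs.999999",   # hypothetical cluster absent from the cDNA chip
}
mapped = map_by_unigene(cdna_probes, affy_probesets)
```

Only the probes present in `mapped` would be eligible for a platform-limited classifier of the kind described here.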
With the U133A-limited cDNA classifier, only those cDNA probes that (according to Resourcerer) map to an Affymetrix probe set are chosen. This approach enables cross-platform comparison. For example, the training set samples were used together with the test set tumor samples in a flip-dye design. The end expression value from a cDNA probe is then the log2 of the training set to test set sample ratio. This same reference RNA was used on two U133A Affymetrix chips.
Once the U133A-limited cDNA classifier was constructed, a linear scaling factor was applied equally to all Affymetrix samples (training set as well as test set samples from Denmark). The factor was based on the expression of a common training set sample (H. Lee Moffitt Cancer Center & Research Institute, Tampa, Fla.) that was applied to both the cDNA microarrays and the U133A GeneChips. Under this assumption, the U133A chip value corresponding to a cDNA probe is the ratio of training set to test set sample (on U133A chips). Each of the Affymetrix U133A arrays (both the test set and the reference samples) was scaled to a constant average intensity (150) prior to taking the ratio, and the test sample chip values were averaged.
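The scaling and ratio steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and toy intensities are hypothetical, and which array serves as numerator or denominator is left to the caller, following the ratio direction given in the text. Each array is first scaled so that its mean intensity equals the target (150), and the log2 ratio is then taken probe by probe.

```python
from math import log2


def scale_to_target(intensities, target=150.0):
    """Linearly scale one array so that its average intensity equals target."""
    mean_intensity = sum(intensities.values()) / len(intensities)
    factor = target / mean_intensity
    return {probe: value * factor for probe, value in intensities.items()}


def log2_ratio(numerator, denominator):
    """Per-probe log2 ratio for probes present and positive on both arrays."""
    return {
        probe: log2(numerator[probe] / denominator[probe])
        for probe in numerator
        if probe in denominator and numerator[probe] > 0 and denominator[probe] > 0
    }


# Toy intensities (illustrative only).
array_a = scale_to_target({"p1": 10.0, "p2": 30.0})   # mean 20 -> factor 7.5
array_b = scale_to_target({"p1": 40.0, "p2": 20.0})   # mean 30 -> factor 5.0
ratios = log2_ratio(array_a, array_b)
```

Scaling before taking the ratio keeps a global brightness difference between two chips from appearing as a uniform shift in every log ratio.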
The results of a full LOOCV for the U133A-limited classifier on the training set samples (Moffitt Cancer Center cDNA microarray data set; original 78 samples) are shown in Tables 6A-6C. The accuracy of the U133A-limited classifier was 72% (80% sensitivity/59% specificity), which contrasted with the original cDNA classifier results (90%, P=0.001154). Many ESTs were selected both in the SAM survival analysis and in the original cDNA-based classifier, indicating that unknown genes (ESTs) may be very important to colorectal cancer outcome. The U133A-limited classifier was not, however, significantly different from Dukes' staging (77%), P=0.4862 using a two-sided McNemar's test, and still significantly discriminated the two groups, as can be seen in FIG. 3B (P<0.001).
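A two-sided McNemar's test of the kind cited above compares two classifiers on the same samples using only the discordant pairs. The sketch below assumes the exact binomial form of the test; the text does not say whether the exact or the chi-square version was used, and the discordant counts are not reported, so the function name and the example counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b : samples classified correctly by method 1 only
    c : samples classified correctly by method 2 only
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, for illustration only.
p_value = mcnemar_exact(10, 6)
```

Samples that both methods classify correctly, or both misclassify, do not enter the statistic, so two methods with different overall accuracies can still fail to differ significantly when few samples are discordant.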
FIGS. 3A through 3C illustrate survival curves for molecular classifiers in accordance with the subject invention. Specifically, FIG. 3A illustrates the survival curve for a cDNA classifier of the subject invention on the 78 training set samples (LOOCV); FIG. 3B illustrates the survival curve for the U133A-limited cDNA classifier (LOOCV); and FIG. 3C illustrates the survival curve for an independent test set classification (Denmark test set sample). A large difference in sensitivity can be seen between the Dukes' method and the classifier (Tables 6A-6C). The confusion matrix and accuracy rates by Dukes' stage are also presented in Tables 6A-6C.

TABLE 6A

LOOCV Accuracy of Dukes' vs. Molecular Staging for all tumors.

Classification Method	Total Accuracy	Sensitivity	Specificity

Dukes' Staging	76.9%	63%	97%
Molecular Staging	71.8%	80%	59%

TABLE 6B


Comparison of Molecular Staging and Dukes' Staging Accuracy

Dukes'	Molecular	Dukes'
Stage	Staging	Staging

Adenoma	67%	100%
B	70%	70%
C	64%	55%
D	80%	97%

TABLE 6C


Confusion Matrix of cDNA Classifier Results

Observed/Predicted	Poor	Good	Totals

Poor	38	8	46
Good	14	18	32
Total	52	26	78

To compare the predictive power of a classifier of the subject invention with that of Dukes' staging, the U133A-limited classifier was tested on the test set of colorectal cancer samples from Denmark that were profiled on the Affymetrix U133A platform. The normalized and scaled test-set data were evaluated with the U133A-limited cDNA classifier. Because the Denmark cases included only Dukes' stages B and C, classification of outcome by Dukes' staging would predict all samples to be of good prognosis. The accuracy of the cDNA classifier was reduced from 72% in LOOCV of the training set (Tables 6A-6C) to 68% in the Denmark cross-platform test set (Tables 7A-7C). A decrease in accuracy (4%) was expected due to the limitations imposed by cross-platform analyses; however, this reduction was very small compared to that caused by limiting the classifier gene set to U133A content. This result is not significantly different from that achieved by classification using Dukes' staging (64%, P=0.7194 using a two-sided McNemar's test) and is better than other reported results (47%) (see Sorlie T et al., “Repeated observation of breast tumor subtypes in independent gene expression data sets,” Proc Natl Acad Sci USA, 100:8418-23 (2003)) for cross-platform analyses where scaling was required. Moreover, the classifier of the subject invention was able to predict the outcome for poor prognosis patients (sensitivity) with an accuracy of 55%, whereas 0% would be predicted correctly by Dukes' staging.

TABLE 7A

Accuracy of U133A limited Molecular Staging on Cross-Platform Denmark Independent Test Set.

Classification Method	Total Accuracy	Sensitivity	Specificity

Dukes' Staging	64%	0%	100%
Molecular Staging	68.5%	55%	75%

TABLE 7B


Comparison of Dukes' Staging and U133A limited Molecular Staging
Accuracy on Cross-Platform Denmark Independent Test Set.

Dukes' Stage	Molecular Staging	Dukes' Staging

B	64%	79%
C	70%	58%

TABLE 7C


Confusion Matrix of U133A limited Molecular Staging Results on
Cross-Platform Denmark Independent Test Set

Observed/Predicted	Poor	Good	Totals

Poor	17	14	31
Good	14	43	57
Total	31	57	88
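The summary figures for the Denmark test set can be derived from the confusion matrix in Table 7C. The following is a minimal sketch that treats poor prognosis as the positive class; the function name `staging_metrics` is hypothetical, and the Dukes' counts rest on the assumption (stated in the text) that Dukes' staging predicts good prognosis for every stage B and C case.

```python
def staging_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, sensitivity and specificity with 'poor' as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # observed-poor cases predicted poor
        "specificity": tn / (tn + fp),  # observed-good cases predicted good
    }

# Counts from Table 7C (rows = observed, columns = predicted)
molecular = staging_metrics(tp=17, fn=14, fp=14, tn=43)

# Assumption: Dukes' staging calls all 88 stage B/C cases "good"
dukes = staging_metrics(tp=0, fn=31, fp=0, tn=57)

for name, metrics in (("Molecular", molecular), ("Dukes'", dukes)):
    print(name, {key: f"{value:.1%}" for key, value in metrics.items()})
```

With these counts the molecular classifier's sensitivity and specificity come out at about 55% and 75%, with an accuracy of about 68%, while the all-good Dukes' assignment has 0% sensitivity and 100% specificity.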

The present invention provides a colon cancer clinical classifier with significant accuracy in LOOCV that exceeds that of Dukes' staging. The utility of the classifier of the subject invention can be validated, such as in an independent colon cancer population using a completely different microarray platform. The gene classifier of the subject invention can be based on a core set of genes that have biological significance for any type of cancer, including human colon cancer progression.
Application of Prognosis Classifier with Therapy
The benefit of adjuvant chemotherapy for colorectal cancer appears limited to patients with Dukes' stage C disease, where the cancer has metastasized to lymph nodes at the time of diagnosis. For this reason, the clinicopathological Dukes' staging system is critical for determining how adjuvant therapy is administered. Unfortunately, as noted above, Dukes' staging is not very accurate in predicting overall survival, and thus its application likely results in the treatment of a large number of patients to benefit an unknown few. Conversely, there are a number of patients who would benefit from therapy but who do not receive it based on the Dukes' staging system. Accordingly, an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial.
The molecular staging/classifier of the subject invention provides more accurate predictions of patient outcome than is possible with current clinical staging systems, which may, in fact, misclassify patients. In accordance with the present invention, a set of genes is derived from a genome-wide analysis of gene expression using known microarray analysis techniques (i.e., SAM). By clustering groups of patients with good and bad prognoses, it is illustrated that the prognosis/classifier of the subject invention presents outcome-rich information. In a further aspect of the present invention, a supervised learning analysis can be used to identify a core set of informative genes. In a preferred embodiment, a core set of 43 genes was identified that appeared in 75% of the cross-validation iterations and accurately predicted colorectal cancer survival. This core set was derived from a 32,000-element cDNA microarray that included both named and unnamed genes. This gene set was highly accurate in predicting survival when compared with Dukes' staging data from the same patients.
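The idea of a core gene set defined by how often a gene is selected across cross-validation iterations can be sketched as follows. This is an illustration only: it assumes that "appeared in 75%" means selected in at least 75% of iterations, the function name `core_gene_set` is hypothetical, and the four-fold input is toy data rather than results from the study.

```python
from collections import Counter

def core_gene_set(fold_gene_lists, min_fraction=0.75):
    """Return genes selected in at least `min_fraction` of cross-validation folds."""
    n_folds = len(fold_gene_lists)
    # Count each gene at most once per fold
    counts = Counter(gene for genes in fold_gene_lists for gene in set(genes))
    return sorted(g for g, n in counts.items() if n / n_folds >= min_fraction)

# Hypothetical toy input: genes chosen by the feature-selection step in 4 folds
folds = [
    ["osteopontin", "neuregulin", "INSIG-2", "gene_x"],
    ["osteopontin", "neuregulin", "INSIG-2"],
    ["osteopontin", "INSIG-2", "gene_y"],
    ["osteopontin", "neuregulin", "INSIG-2", "gene_y"],
]
print(core_gene_set(folds))
```

In this toy input, neuregulin is selected in three of four folds and just meets the threshold, while gene_x and gene_y fall below it and are excluded.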
A means for validating a prognosis/survival classifier is provided by the present invention. In one embodiment, to validate a cDNA-based classifier for human colorectal cancer, a normalized and scaled oligonucleotide-based colorectal cancer database from Denmark was evaluated based on the Affymetrix U133A GeneChip™. In a related embodiment, a colorectal cancer classifier (U133A-based cDNA classifier) was produced on the training data set using a limited set of genes common to both the U133A and the cDNA microarray (for 78 genes). The U133A-based cDNA classifier was then applied directly to the normalized and scaled Denmark test population.
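The cross-platform step described above, restricting the classifier to genes present on both platforms and scaling the test data before applying it, can be sketched as follows. This passage does not specify the normalization and scaling procedure, so the per-gene standardization shown here is an assumption chosen for illustration. The function names, gene identifiers and expression values are all hypothetical.

```python
import statistics

def restrict_to_common(cdna_genes, u133a_genes):
    """Genes measurable on both platforms, in a stable order."""
    return sorted(set(cdna_genes) & set(u133a_genes))

def standardize(values):
    """Scale one gene's expression values to zero mean and unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # A gene with no variation carries no signal; map it to zeros
    return [(v - mean) / sd if sd else 0.0 for v in values]

# Hypothetical identifiers and expression values, for illustration only
cdna = ["geneA", "geneB", "geneC", "geneD"]
u133a = ["geneB", "geneD", "geneE"]
common = restrict_to_common(cdna, u133a)

test_expr = {"geneB": [2.0, 4.0, 6.0], "geneD": [1.0, 1.0, 1.0]}
scaled = {gene: standardize(test_expr[gene]) for gene in common}
print(common, scaled)
```

Placing each gene on a comparable scale is one way to let a classifier trained on one platform be applied to measurements from another; other normalization schemes would fit the same outline.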
In addition to identifying those patients for whom therapy is most beneficial, the classifier of the subject invention can identify those genes that are most biologically significant based on their frequency of appearance in the classification set. In one embodiment, those genes that are most biologically significant to colorectal cancer were identified using the classifier provided in Example 1. Specifically, osteopontin and neuregulin have reported biological significance in the context of colorectal cancer.
Osteopontin, a secreted glycoprotein and ligand for CD44 and αvβ3, appears to have a number of biological functions associated with cellular adhesion, invasion, angiogenesis and apoptosis (see Fedarko NS et al., “Elevated serum bone sialoprotein and osteopontin in colon, breast, prostate, and lung cancer,” Clin Cancer Res, 7:4060-6 (2001); Yeatman T J and Chambers A F, “Osteopontin and colon cancer progression,” Clin Exp Metastasis, 20:85-90 (2003)). Using an oligonucleotide microarray platform, osteopontin was identified as a gene whose expression was strongly associated with colorectal cancer stage progression (Agrawal D et al., “Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling,” J Natl Cancer Inst, 94:513-21 (2002)). INSIG-2, one of the 43 core classifier genes provided in Example 1, was recently identified as an osteopontin signature gene, suggesting that an osteopontin pathway may be prominent in regulating colon cancer survival.
Similarly, neuregulin appeared to have biological significance in the context of colorectal cancer based on frequency of appearance in the classification set of the present invention. Neuregulin, a ligand for tyrosine kinase receptors (ERBB receptors), may have biological significance in the context of colorectal cancer where current data suggest a strong relationship between colon cancer growth and the ERBB family of receptors (Carraway K L, 3rd, et al., “Neuregulin-2, a new ligand of ErbB3/ErbB4-receptor tyrosine kinases,” Nature, 387:512-6 (1997)). Neuregulin was recently identified as a prognostic gene whose expression correlated with bladder cancer recurrence (Dyrskjot L, et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat Genet, 33:90-6 (2003)).

Accordingly, the identification of such genes may be significant in terms of gene therapy. For example, a therapeutic gene may be identified, which when reintroduced into tumor cells, may arrest or even prevent growth in cancer cells. Additionally, using the classifier of the present invention, a therapeutic gene may be identified that enables increased responsiveness to interventions such as radiation or chemotherapy.


Sequences
ACCESSION No. AA149253
ORIGIN

1	aatatggaca gggagtctca ttgtgtttat catatcaatt aatattacag tacatccttg
61	gtaatacaaa attgtacacc ttcatcaaat aaattaggat aaattaaacc aataaattat
121	gcaaagtctt cagaacaata gacaacaaca aaaattcaca attgaaattg cctctagcta
181	aaaaaaacaa acaaaaatca aaaattgact ttatcagttc agttattgta ctatattcaa
241	atcaaagggt ctttattaca aaaaagagct taataatgct atttacaaca tattgctaaa
301	taatataaag gcagtgtttt gtcacggttt atactatata catatgagaa atggctggga
361	caatattgag ggaagcccat gaccttttgg attcttccag gtagcgctga gaccnatccc
421	aatacatttt ttttccttag ttccaaattt gganggcgta atatngcagt tttnagaaat
481	tttccncccc ccntttttag gggggattgg atattttana aaaattccgg atggaatacg
541	gtttccccna aggagggtag cntggtt

ACCESSION No. AA775616
ORIGIN

1	tttttacatt caagataaaa gatttattca caccacaaaa agataatcac aacaaaatat
61	acactaactt aaaaaacaaa agattatagt gacataaaat gttatattct ctttttaagt
121	gggtaaaagt attttgtttg cgtctacata aatttctatt catgagagaa taacaaatat
181	taaaatacag tgatagtttg catttcttct atagaatgaa catagacata accctgaagc
241	ttttagttta cagggagttt ccatgaagcc acaaactaaa ctaattatca aacacatcag
301	ttatttccag actcaaatag atacacattc aaccaataaa ctgagaaaga agcatttcat
361	gttctctttc attttgctat aaagcatttt ttcttttgac taaatgcaaa gtgagagatt
421	gtattttttc tccttttaat tgacctcaga agatgcacta tctaattcat gagaaatacg
481	aaatttcagg tgtttatctt cttccttact tttggggtct acaccagcat atcttcatgg
541	ctg

ACCESSION No. AA045075
ORIGIN

1	ttttttnttt tttttttttt tttttttttt tccaggaaag acagatgtta tttaccacca
61	atgaattttt atcatattta aatgaacttg aaaatgtcat tcaactcaaa tccctcaatc
121	aacttacttc agcccattct gaaacttcat attgcagcaa accagccatg tgaaagaaat
181	aaattcaat

ACCESSION No. AA425320
ORIGIN

1	ttttcaggtt gtaaatattt atatttctct cacatacaat gttgtatgag acacttgttt
61	taatatgtat ccataggatt aatactcata tggagtataa tgtggaaaag tgcagaacta
121	aagaaataag tctatccgaa aacaaaagca cacatttctc aggatttaaa aatattgcac
181	atagtaaggt tgcacagaaa ttactggctg gttttacaaa cagaatgagg tatcagtcaa
241	tctctagata aagatgagag agaggataaa ctacacacac acaaacacat aaatccatac
301	taagacctaa gagtgccaac aactaagaaa gaaatatgaa aaagctatgt taggtagcca
361	ggatttcaac actacaaaat catttttagg ctggaaccaa acacataaca atctcttggc
421	aatatttcgt taagttttca acttttttcc agcctaaatg actatgggca ataaaaccat
481	ttcctttacc ccagttctac tgtagaaagg cacagcgctg tggtaaatat caaaccattc
541	ctttctcaac

ACCESSION No. AA437223
ORIGIN

1	tttggtgaat aaactaacag ctttattaat gaaggcaaac atcagatcat tgtatgaata
61	ttatatatat atataaaaag aaatccaaac taacagcatt gtatttcaaa agtactgtac
121	ttctgtttct tttaaagaga cttgtcatct gtttttataa aacaaaatgg gtactcttct
181	cctaaaaaat cctggaaaaa tgaaatagtc aatttcaagc tgatgaattg aacacacctt
241	tctttaaatg cagactattg ctaggaagca aataaagtca agcatcagaa agaagatgta
301	tgagaaatgc atgaaagtca gagaaaaggg atgtagtgaa attactgcta atctttcccc
361	cctatattca aagaccatcc aaaactggtc tttcatacaa atataaaata actataaaga
421	gagggaattt gaaaccatac ccatctgaaa tc

ACCESSION No. AA479270
ORIGIN

1	ctctgaattc atttatttag aggtaaaaca cagccattca aaattgtgga atacaatgtc
61	tacacacaga ataaggttgg ggaattaagc tgaattgtta tattccattc acattaataa
121	atatttttaa agaagaaatt gtagatttta aaagcttcat tagacactag tgacacatac
181	aaataactaa actctcatac tgcttgattt tcaggttgaa aggttacaat aatctatata
241	tttcaattac atggcagtaa atacaaaagc attttaaaca tcttttgaac tgtgtagtat
301	actataagca ggagttt

ACCESSION No. AA486233
ORIGIN

1	caaattgaat attttattaa catggtagtt gcctttgtaa catgtgcaca cacactcgca
61	cactcagaat gatctgcctg ggggaaaaat actaaatatg cctaagggga aaatgaaaaa
121	taaaaaaatt cctgtaggtt ttcattattg taggcaatta tgtccacatc acttacaaag
181	ctattgccaa atctgtccaa ggaagcagag tttgaagtga gggctaggga caggaatctt
241	gggaaaaatt caacagtggc atagcagagc tctcaatatg agaaagctga cataatgtgg
301	acttttgctg tgaattacct ctttgcaaaa tatggggaga ggtttatcaa tgggcagaaa
361	ataagagaag gcggtgtgaa gtaggcttct gcagtcaatt ttcctcacag tattgtgcag
421	ggtcatcaag aaaatgctta gtctttctct ggaaccagtt tcagaacttt tccaattgca
481	atggtcttac cctcatctct taagggtgaa cgacccacct aagggaagtc tttaaag

ACCESSION No. AA487274
ORIGIN

1	tattactgca tatgttatat taaatttaca caatgatata taaaaacaca tactgtttat
61	attatatagt aatttaacat caacaggagt atcaacacaa gtactactca tgcacaaaac
121	atgcatatat tggtatacaa aaagcaattt tacacaatac tgtttaccaa aaattttttc
181	ttaaaaaaca gcccttccac ataggatcaa aggtccaatc tggactggat tgcactaata
241	tgttcaggtc aacgcttcgg tggcatagcg ctcagtgagc aattctggga ttggagtcat
301	gcccaagggc tacttcatta atagtga

ACCESSION No. AA488652
ORIGIN

1	tttttttttt tttgcaacgc aagggctctt tattgtcagc gagacgagca ggccaaacgg
61	gcactgaggc tccacggggc ccaggcctct ttccgtggaa gagaggcaag aggggtttca
121	ggattcagag gggtcctccg ctcacgcagc accatgcaaa tatagagcta aaaactttct
181	gaatgtctct ggcttgaaac caactgggcc aacaggttcc acaaccactc tctttttgat
241	cactgggaga caccaaaaat gctgatagag gagctggtct gagtccaccc aggccaaatt
301	cttgacaccc tcgttagagt ccaggtctgt ggtattcagt tgaaacacta ggaaatggaa
361	gacacgtcca tccgtgccca ggctctgcac caccacgggc tgctccaaga ccttggcatc
421	attcccatag aggagccggg cctgagcagg gcactgcaaa agcaaacagg atcatcttgg
481	cccgcagctg atctggttga aggcggtgtg gtcgtaaatt ggctttgtcc agtaagtaca
541	gggtatgggg ataggggtaa ggatag

ACCESSION No. AA694500
ORIGIN

1	tttgacagaa gaaacatttt taattgttct tgtcctgccc catcaccagg ggagtcccgg
61	cattgctcag gctcactgcg cttgctttcc cctgggatgt cgaggacact ttgacctcat
121	ctatgtcata gcccatgtgt ttctcagatg ccaccgccat aagatctagt gccccctggt
181	gccattggga taggcaggcc agagaggcat gggagctggg tgtgcaccag gccacagggc
241	tgtggggcat gcagccgatg gtgcagcttc aggtggatgt gctgggtgaa gcgactccgg
301	cagacactgc actggaaggg ccgggtccgg aggtgca

ACCESSION No. AA704270
ORIGIN

1	ctaaatcaag tagtgctact gaaatccagt gcctaatgga gcagatggtg gaggtcttag
61	actctggaac atttatagtg atgcttctga atgcaaaaca ccaagagtgg atttcacagg
121	ctgtgaatct gatttgattt tgatgggagt aaagcttcca ttttcactgt acttgaacca
181	caaaagaaaa aaagcatgtg tgactgacac aagctagtta agaaaaagga acatgttaaa
241	tattagtccc ataaagggaa gcagtttaaa caagtgatta tttgtttgta tcatttaaca
301	tgattatgtt tgtatacaat accaccgttt

ACCESSION No. AA706226

ACCESSION No. AA709158
ORIGIN

1	tttttttcct tcaactccct ccaagttgtt tatttaataa taataaaaaa gaaatgcaca
61	cacataaacc tgaactcccc cccaccccac cctcccttac tcccagtaac tagctccaaa
121	atgaaaaaac ttcccttgtc ccacctgggg actaaattcc cacctccact gccataacac
181	tagagaaaca aaataaaaaa tatgcagcag ctcaccaccc accccacaac tgaacctcac
241	acaatcccct caaacaaaga agccaggact gggggttcac aggaatgaga ggagccctat
301	attctgaaaa gggatgagaa gagaggtgaa cacccccacc tcaaataagt gcttaacccc
361	cacacctgct ctttccttta ccaattgccc caagcctggg gaatcaggga aatttgaaac
421	agt

ACCESSION No. AA775616
ORIGIN

ACCESSION No. AA777892
ORIGIN

1	cagcttgcat cataagtttt attcccgatg cgggacagat ctttccatcc ctcaaatgta
61	ttacatgtcg ccacggaagg gcttaggatg ctgctcccat ctccaggaaa gatgagaaaa
121	aggtacagac tgggagccag tccaggacca ttctgcagtt cctggctctc ttaccctccc
181	ttctcagcag aggaattatc tctcatccat tcagttaaaa agaaaaaaaa aaaaatcatt
241	aacaaaacaa aacacacctt aagtattggg caggggtgtt cttgtcctca gtaggacgtc
301	aagttctggg tcaccaatgg tgattttttt tgtttttgtt ttttgtcatt tttgtttgtt
361	attttttttt tttnnatttg ttagttatgg ntagcagttg tgtgtccacc tcatctgcag
421	gcagctgcac atagcggacg actgagcccc tgatgaagca gttcttgact gataacatgt
481	gagggtattt ctcagggtct gtgacactga tgtcggttag tttgatattg aggtactggt
541	ccacagagtg gagggttcca cagatgctca ggtcattctt gagttccacg actacatacc
601	ttgccacaag agacttgaaa aaggagtaga agagcat

ACCESSION No. AA873159
ORIGIN

1	tttctgtagg atttttattg gtggcacctg gggccacatg gagggagtcc tcagcacagg
61	cgctggggtg tgggaaattt cagaggcccc tcctgggatg tcacccttca ggtcctcatg
121	agtcaatctt gagtttctcc ttcactttct gaaatggctc tggaaaacca ctcccgcatc
181	ttggcagaaa gttcactctg tttgatgcgg ctgatgagtt cccgagcctt gtcctccagt
241	gtgtttccaa actccttcag cttatccaag gcactggaga cgtctggggt cccctgggct
301	ggggctgggc cttccaagac gatcgacaga accaccacca ggaccgggag cgacaggaag

ACCESSION No. AA969508
ORIGIN

1	tttttttttt ttttttcact tcttcaacaa gtatttattg aacgccaact atggaccagg
61	ccctgtgctc aatgctgggt acagagtgga gactgaacca ggcatggcac ctggcctcat
121	gagcttacac tcgagtggga ggcacagtca accaacaagt aaattacaca aatggatatg
181	cagtggcaaa ttctccatga agggaaagaa cagaggcctt gtgatagagg aactccacaa
241	gtaaagtagt cgaggaaggc ctcttggacg aggcaacgtt gaagccaagg cctgagggtc
301	tgcagaactc agccatgcac agggtagggg aagagcattc ttggcaaagg gaacagcata
361	tgcaaagtg

ACCESSION No. AI203139
ORIGIN

1	ttttttgagt ttggcatgtt aatttttatc agcgacttct ggggcctagc accattcccg
61	gaagaaggga gttgtcgggc agggtcctta atgggggttg caattcttgt cttggttggg
121	aaagagccta gctgggaaca ggggtcgttt gtgtagtaac tgtattaagc

ACCESSION No. AI299969
ORIGIN

1	gcggccgcgc cggctccagg gccatttagc ccccaggagg agaatcgagc aatctttttg
61	gaagtccaga agaagctact ccttccagca ggcctaatag gatggcatct aatatttttg
121	gaccaacaga agaacctcag aacataccca agaggacaaa tcccccaggg ggtaaaggaa
181	gtggtatctt tgacgaatca acccccgtgc agactcgaca gcacctgaac ccacctggag
241	ggaagaccag cgacattttt gggtctccgg tcactgccac ttcacgcttg gcacacccaa
301	acaaacccaa ggatcatgtt ttcttatgtg aaggagaaga accaaaatcg gatcttaaag
361	ctgcaaggag catcccggct ggagcagagc caggtgagaa aggcagcgcc agaaaagcag
421	gccccgccaa ggagcag

ACCESSION No. H17364
ORIGIN

1	tttttacttg aaattaaatt tggnctctaa agttggtgta gcagcagttg atcagnactg
61	aaaaacggtt tttagtctcg gaaaaagact gattttgctt ttttataaat attattagat
121	ttattaattt ttcgtgctca atgtgtaaat tgtattataa ttcattgtga tttatttcac
181	ttttaatttg ctggtgtttt aataaatggg ggtgttactg aatctttctt cccacttcca
241	tttcttttga ccacccctta accctcaact gtgacggtag tagtattatc atttatacca
301	aagttttgca tagtccctgt tgactttgta atgttaacgg agtcataaaa gcactaggca
361	agagaaagat agaaatttgc ttttaatctt tttgcctttt attttgcaca ttatgcaaaa
421	gggaaaacat taaaggacac tttttttaag ngagtgaaac atgggnaagg catccagtgc
481	tttatgcaca ttgtnagcta atcaggccat tat

ACCESSION No. H17627
ORIGIN

1	tttttttttg ggcagatgag aaacagaatt atcatcagag tcttgctaca aacagggaaa
61	aacacaaacc aagatgacac acggacatgg tagattaaac attcctcccc accttcagga
121	tacatttaca ttgnaataaa tactgcaatc tcagcagcgg caaacaagga ggaatntagg
181	aaatgcccac ctcctcccct ctgtcttatc tgtgtgctct cttccttggg tagcaccgat
241	ctccccaggg tgctgggtga gaaacaggac aggggngaag aggtccgtgc atgctcactt
301	gcccttttgc

ACCESSION No. H19822
ORIGIN

1	gaagtcatan tatgataaac attttattac actaaaaaag tcatctgtta actgactgaa
61	ctgcaggggg accacatgtg aggttacttc agaaaaatgg catcagataa catatataga
121	tttctggcat tataaaatgg ctagattctc ccctaccttc cctcattaaa tattaatcag
181	tggcttaggt cagttctagt gggaacactt aattgctgac ttcacataaa accaggntta
241	gcctaatgtg ccaatggtat gagtccattc ctgggccatn ttcccaacag ccagaccgct
301	gtggcttgga caccggaggc aacatctggg gggcctcagt tccactcctc tgtggtnagc
361	ttgctttccc aataactggc tntggagtca catcaacaat ggtggcattn catctggggn
421	ccacatgagc cctttggggg tgctgcatcc ctactng

ACCESSION No. H23551
ORIGIN

1	ttttttttta tgcacactaa ggnatatttt attgtggcat taattagatg aaagttagta
61	atatgncatt gaccaaaaca tttgattgac aagnaccata aaggttaact gagagttttc
121	tttaatataa ttgttgtaca gacaaggatt cctgctgtat agagtatata gaaggatgac
181	atactctagg aattaggaac aatatatatt caatacaata acaaaactat atagtacttt
241	aagaactctt tcacatatat gaacactctt acttaggaac ttcagctgtt taaagtaagc
301	aatatgcaaa cctataaagt acacaccaaa aaaatctaac ctacaaaaca cccaaagcaa
361	atgttagcat atctctatta tcaagaatat cttctcacca tcgtttcttt caaaaatatg
421	tgaaaaagtt ctttctttcc ttatgagtgg caatttttaa aggcccctct tctgaaatta
481	gntatgttcc aatccactat cactcttaag ggaaaatgga acdnctctgg g

ACCESSION No. H62801
ORIGIN

1	aatgatatca gaacctttta aatgatctag tatctgtgat gttagcgccc ttgggattca
61	gaaagtggtg tgcatagtaa aagctttcat tgtaactcac cctgcctaga tatgcagaaa
121	gcaaattcag tgataagatc tttcctggga gaccaatcag cagcctcagg ctctgttggg
181	gtctatcaca atgatgttat ctaaatttag ggcaaggaac cctttcccca tcttttagag
241	ggcagtgagt gttctaatca cttcaagata ggtatctgat aaaagtcttg gggccaactt
301	tttcatactt aggnagggca caactaaaat ggatatactt aaaatggtat caaaggaggg
361	ttaggtgtac actctactag gtgtaaggtn tatttcatta caaaatggct ttgg

ACCESSION No. H85015
ORIGIN

1	cacccaggct acagtgcagt agagcaatca caactcactg cagcctcaac ctccctgggn
61	ncatgcaatc ctcccacctc agcctcgcaa gtagctcgga ccatggccac acgccaccac
121	acccggccaa ctttcgtact tcttgcagag agagggattt gccatgttgc ccaggccggt
181	cttgaatttc cgggctcgag tgatccactc acctcagcct cccaaagtac tgtgattaca
241	ggcatgagnc actntgccca gccaataaan tcttt

ACCESSION No. N21630
ORIGIN

1	gaacagacta aatttgtttt aacaatccca tttacaattc aaattccttt aaacaactta
61	atagcattta tacatttaaa aaaatgattc ttttaagcag cattgcaaat gcttgacccc
121	attagcataa accttcccaa gtgcttaact ctcataaaca taataaatta aacatatggt
181	gactttccaa gttctctgaa acatttcagt acttttgcag acttagtaac attttaaaat
241	acctttcaac tgaaactcat aagtctaaaa gtctgttaag cattttaaat tagaatctta
301	aggccagtgt cacatattgt aatatgccaa ttatgtttaa atacttcaaa cagcaaatac
361	tacagtttat ctcaatgaat ataataacca ttcctgctgg gcgcagtggc tcatgccttt
421	aatcccagtc attaaggagg ctgaggtggg aagattgctt gaaaccagga gattgcctca
481	ggcctgggca acatggtgag acctcctatc tcaaaaatcn aaataaaaat tagctgggca
541	ggtggctcat cctgtagccc agcntctcag gaggctgagg tgggaggata gcctcgccta
601	ggagacggag ctgcagtgag c

ACCESSION No. N36176
ORIGIN

1	aataaagaca agtgttcaga tttatttgga aattcacagt ttctaatggc actacagctc
61	cgtagttaca tattgaaaat tctcttccca caacacacag atcacataat ttctcactgt
121	atctctgctc tcatctggac ctcttttcaa ggggcttcta taaaatcagg ncctcttgnt
181	cngganagnn nantngngcn gacaggaaag aaatttaaat cttctaaaac acgctgttaa
241	cctaaagcag caacttaaac aaacaaaaaa ggcgttaaat aagtcacatt acaaacaata
301	cccaagaaag gtattaggca agtttaaaaa cagttatcac tactaaaagt gctcaataag
361	ttataactta aacatcacaa caataaatgg tcaattctct ccctttcaaa aagaaacatg
421	ttccactttc attcactact gtacaatcat acta

ACCESSION No. N72847
ORIGIN

1	attgttactc tagttttaat ggtttcacaa atacaaaagt tgctagataa gcagtaccaa
61	catatctaaa tctccaatga tgttcaatta aaattttatt tatagactca tacactcagc
121	aaaaccactc atttaataag tccaactgaa ataaattctt attaataaaa tacctatatt
181	gaaagtaata tattgtaaga actctacctt aaattgacca tggggatgaa ctacaatgtc
241	ataaaatatg agccaaaatg ttcactcaat aattttaatt acatcacaat taagcccaga
301	actatgcctt ttttttggtg taaggctgaa taaggaccga aactggatgg agagaaaatt
361	gctttctaaa gcctcattta ctggcaataa cttaccttat gcaataacca acatcacgng
421	actgg

ACCESSION No. N92519
ORIGIN

1	ttttttttaa ctcttaaaaa aaatcatttt attgatcctt taccatacaa aatttattca
61	aattacaccc atttgaagtg gtaagatcac agctagagaa caggtcaccc tgtaacaaat
121	ctatttacaa aatccatcat aaaagctttt ttttgttttt ttttacatta tattacatat
181	tttctttttt aaaagcatac aacacaaagc taaactgatt agtagtttgc ctactcccaa
241	ttttgggaga aatacttcct ttttacaaaa tcacgtnccc cgtaggaaaa gaaattccca
301	caccctgaca attggccaac cgacttactc tgcaagccat cttcttcaaa tccctccttc
361	tcatacacac gangttgtca tgcacacact gaatcntaat ttcttttccn ggaagcttaa
421	ncctttaaat accgggaatt attttcagat ctncacgtnc caacaaaaat ggaaacaagg
481	gccccaccaa gnccgggaaa acnaaaccca ataccctntt aaaaatttca aggc

ACCESSION No. R27767
ORIGIN

1	tttttancna tttgtaaata agtttaattt ttnagttttt caatgacatt cagtagagat
61	agttatattg gctatataac acaagtaaag tggtgtttgg aaagtggagg actaggtttt
121	ggcacggggc taggacgggg tgaccgccgc ctcaccacca cagactggag ggggcttttg
181	agagctgggc ttcgctcccg aggactcagc tcagaaactg ctgaggcccg tgatgcagaa
241	ccagtgccgt aggtgggcat ctggccatgg cttcgagctc tcaggatgct tttgtatctt
301	gagagggtgc ctccagagaa tgtctgctcc ttgggcctca tctncccggg ttatnccccg
361	gcag

ACCESSION No. R34578
ORIGIN

1	atttttgaag nngnttcgat gtcttactgt tatgaccata aaaccaataa agctactttg
61	aaaagttaaa gccaggngta attaaacaac tcatacttga ttgttaaagt cagtctctna
121	aaagtgtaat tttaaaaagg taataaaaaa ggtatancat tat

ACCESSION No. R38360
ORIGIN

1	tttttttttt ttcaaaaatg tcaaacttta ttcaagtgtt atggtaagaa atttgaaatt
61	cttaggtaag ctantgaata aatccttggg caggtgcagg catacagatt ctggggtgca
121	gctgctgagt ttaaaagctt cctttggaga tgccccgnng gggnnacacc ccctntcccg
181	cctntcaaga ggaggccatc ctggggcagc acgttagggg caaatggccc agatgcccag
241	ctnagggaaa cctccatgcc tagaggagga ggtcgctctg ggagcaggag gaccttcttg
301	gaacccctgt tnacaggntc ctttttcttg ntttttccag nacctcctgc aggg

ACCESSION No. R43597
ORIGIN

1	tttttttttt ttttttcagg attcactgcc tggggtatcc cactatatat atctcaccta
61	tgatgtagtg gtgcttgaaa tactcatctc attagctcga ttttattatt ctaatctaag
121	gttttttata ttattcatac tatgatattt ttagggacaa tcagtaatat ttggggcaga
181	gtactgaggg acctcttgaa gtctgcaaca gcatgcattt tctttgtttt tgtggggagt
241	gcttccctgt aggctgtctt tgttctagga acactgnctc caaatttatt tccatgggga
301	tgtagggggc tagtaggccc atggtggaaa ggtcttctgt aaatctccnt gggggggtnt
361	gagttattgg gggttatttc taacagggan ttttcccaaa ggggg

ACCESSION No. R43684
ORIGIN

1	tttttttttt ttttcattca aaaatatata atttattgag tacttgctag acacaatgga
61	tacaatgatt atatagtccc aatcctccag gagaacaata gacagacacc tttataatat
121	gtatgtggag tgctctgaca gggaaaagca caaggtccat gggggtggga gtggcccagn
181	agctaaggaa ctcttccccc atgaagtggt tacttacttt ctaatcttta atttaggatt
241	ctctcatgga acatttgant ggtgaaattt tactacataa aggttctcaa ccctaggagg
301	tttatccctg cccccctggg aacatttggn caatgtctga acaacaagtt tattntcaca
361	actggggagg ggngaaggaa gttagcagag gccaaggatg nctggctaaa ccttaaattc
421	ctacat

ACCESSION No. W73732
ORIGIN

1	tatttcaaaa aaagtctttt aattgttcaa aatagcacaa aacgacatcg cactatggta
61	atattgagtc acaggggtta cnctacaata gtgaacggng tactcncctc agaaacaaat
121	cant

ACCESSION No. AA450205
ORIGIN

1	tttttgtttt ctttcattat ctttatttta aatttgatat tttagaatag gaaattatct
61	ttcacagcaa tgcctcctgg tctgataata cagtatctca tttctgaatg taaagattta
121	aaataaatca aaatgaacat taaggcgtac aaagctactt taagtctgct cttaagatca
181	gtttttgctc atattcaaaa tacatggaat gttggcacaa aactgaagct gctgtagaaa
241	gatcacagat gttctgtggg ttactcaaac ttccatttct ctaaaaacat acccttacat
301	ggtcttaatt ttatgaattt aagtgttgag aaatatctaa ataataagta acaattaaaa
361	taaaatgttt tatttgtaaa ttatgtacag aatacacttt acgttacgc

ACCESSION No. AI081269
ORIGIN

1	tttttttttt ttctaaaact acctttattg tggttggctc gacataagat gccgccatca
61	gcagaattat aaaactgtac aggaggcaca aaaataggct gtttaactta gataatgacc
121	ctcatgtctt caagctttaa aaatgcacat aaaagttgta caatctggca gtttataaaa
181	tataaagcta aaaagaggat tttgggttcc acaaagaaga ctgtatcaca caattaacac
241	gtactaatta aacaattaac catccacaca gaagacataa tg

ACCESSION No. R59314
ORIGIN

1	tttttttttt ttttcaaaaa ctttattctt ttctaataaa aatgatatat gttcattata
61	aaaagtttca aacacacatg agtctganga ntgtaaagat cacccaaata ccacagccca
121	gaaaaaaaaa tccttaacat ttggtganga tctctctatg aaacatacat tatcttaaaa
181	tattcaatgt tataaatgag ctcatattca acatatatcc tgtngtctac tttttgattc
241	aataatattt tgggaacata tatccatngc antaaacata tatctaaata tttttaaatg
301	acaactggca tgggnnttta tttaatccat cttttactga gggatgtttc agttgtttcc
361	aatgttttaa tatcataaac atcatggaaa tataccnttg gggctccatg tttgganggc
421	ttggggcaac ctt

ACCESSION No. AA702174
ORIGIN

1	catcttcagc attaagaagt gctgacacaa tatcattaac tgttttatag ttctctccag
61	ttgtcaggat tttactttga actgtttgtt tcaccaggtc tctattaaag cccatttcca
121	aggcagattt aaccacaggt gtattcatca tgacagcatc ttctgaagaa ctttctccag
181	gtccaaaatg aataattggt gggtcagcat tttcttctcc agtggtatct gaagttgaca
241	acagctgttc aagaagatga ggatatctac cttgaatctc atcaacaaac tcttggcctt
301	tcattcgtat caagaactca caccttggaa accacttggc atgttctacc catggatcat
361	ctccagattc ccaacacctc aagccaccat cacaacaaaa gcatttgaca tcatcattgc
421	gacccacata ataaaaacca gcacttgcaa gctgctcagg ctgaactgga acactagatg
481	gccagtacat aaatgttctc attcgagctg catgtgtctg catgctcaga tttgaaatgc
541	taaacctcag agtttctaga gaa

ACCESSION No. AI002566
ORIGIN

1	tttttttttt tttttttttt tttttttttt ttttcacaat tcttaagtct tgttaagaaa
61	gtaaaaaacg tttgggtata ttttgatcca tgggtggcat tttcaaatgt gcaaaaacaa
121	agtcttggaa gagattcctt gtcactagaa agttcgccct tccttttgct gtcagttgta
181	cgtaagagaa attcgtccac attaaggaat ccaaaaaggg taaactaaag ggatttaaaa
241	agagtacatt acaaagaata agaagccctg taacatctat ctgagaatac tagataaatc
301	tgtgagtaga tgtggcacct ggagctactc actacattac taaaaacaga aacaagaaat
361	ctataatggc aggatcacaa catttgcgcg caaatagcta acc

ACCESSION No. AA676797
ORIGIN

1	aataccttct gttttaagtt tttcttttgt tttcatcttg gaaaaaagga aatttagaaa
61	taagacagga aaagaatggc ccagaaattc agcacaaaga gaggtgtaca cattgacgcc
121	atctgtgggt cacatacgaa cgcctctggg acagagctct aaaacgagtc acgtgtcgta
181	gggagtgggc ctgtggcaag gcagtcctcg cagtgtgcag ggacgcaggc ccccttacca
241	tggaagcccc acccagaagg aagtgggtgc cccatgcagg ccgaggtgga tgaggggaca
301	gtggtgtgct cacagctgtc agctccccac tgaagcccca aaccagcaga tgtgggcagg
361	ggctcaagtg gtgtctgact acccaggtca cacgtgcctt aagcgtgaaa gctgtcagct
421	cccggcacgg gctctggtgg ggctgggaac accaggacac acatgggctg aagcttccag
481	agacagtgag acacggaagg gacagagagg tgccctccac acagtgtg

ACCESSION No. AA453508
ORIGIN

1	tttggttatt cagtatttat tctgcaatgc aaaggtgaca aactaaaata taaaaaggct
61	gttatggctt aacatttttg ttgcagatta aatatgcagc attgaaaaat ggaaaggcgt
121	ggcttcatct ctgaccagca gagttaaaaa gaaaaatctc tccattttcc ttcatcatca
181	tgggatacac tgttcaggca atccaaatta ataaagactt gcactttcat atgaacacaa
241	gatcaagtgt accagttagg ttttcacatt cacagtatat aagaaaatac acatggaagg
301	aaaagtaaag ggttaact

ACCESSION No. W93980
ORIGIN

1	tgaatgaggc aacaaaagca gagatttatt gaaaatgaag gtacacttca cagggtggga
61	gtggcttgag caagtggttc aagagcctgg ttaccgaatt ttttgggggt taaatatcct
121	ctagaggttt cccattggtt acttgatgta cacccttgta aatgaagtag tgcccacaat
181	cagtctgatt ggttgaggga ggggacctat cagaggctga agcaagtttc aaagttacac
241	cctatgcaaa tctctgattg attgggaaaa ggctgaagtg aagttacaaa gttatactcc
301	tatgcaaatg aagacttggg cccatgacca gcctcattgg gttgtggaaa gggaccaatc
361	agaggtactt tcaatttttc catctaccat gcagaaaaag gttcgggggt ggggggttgc
421	caaagggaag ttagccnaac aaactcctga cctaccaaca gagggtccca gttgggtagg
481	ggggcctggg

ACCESSION No. AA045308
ORIGIN

1	ctattaatca acacttttta atgtagtaca tatatatctt acagttattt aagtcaaata
61	tgtaaaggtt tacaactgat ttacagatga agcaatcaca gattgcagta atatgtgtgt
121	gtgtatatat atatttatnc catatataca cacacgccaa tcaaggggaa aactgcatcc
181	tggcaatttt acagtctgaa gttttgttgg tatatctacc atttcacatc cttttcatct
241	tgcttttctg tacaaaagat atttttngcc ttcttcattc ctgatgagat ttttctgcga
301	taactttaca ttcgtacatt gccagttgtc gaccaatgtt tcccattgtt atgcctccag
361	caaaaaatat

ACCESSION No. AA953396
ORIGIN

1	atctgtcagt aaattacatg tatcctggct gtttatttca aaaatgcttc agtatgtatt
61	tcctaaaata gggatattct cctttgtaat cacagcaggg tagatactgc tctttagttg
121	tcatgtctct tagccttctt taatgtggaa cacgtccaca ccctttcttt atcttctgtc
181	ttttaaacat cttttctgtt gtccaatttt taacaacaaa gatgttaaaa atcagaaaac
241	tcagaaaagc acatggtgta ttaaaattcc acctaggaat aactgccatt aaagttttgg
301	tgtctccctt tctgtctctt cagatgcaac ttactagtct agacaaagca ggtttctcag
361	tgaataaaac at

ACCESSION No. AA962236
ORIGIN

1	ctaatcctgc gaatatgggt agtgcttcgt tccatggacg ttacgccccg ggagtctctc
61	agtatcttgg tagtggctgg gtccggtggg cataccactg agatcctgag gctgcttggg
121	agcttgtcca atgcctactc acctagacat tatgtcattg ctgacactga tgaaatgagt
181	gccaataaaa taaattcttt tgaactagat cgagctgata gagaccctag taacatgtat
241	accaaatact acattcaccg aattccaaga agccgggagg ttcagcagtc ctggccctcc
301	accgttttca ccaccttgca ctccatgtgg ctctcctttc ccctaattca cagggtgaag
361	ccagatttgg tgttgtgtaa cggaccagga a

ACCESSION No. AA418726
ORIGIN

1	tttgagtttc aaaggattta tttgatttcc ccacatgatc acaaccatgg ttttacattg
61	atagagtctg ttgccactga caaacagaat gcagatgaaa acaaacgcac tcctttcctc
121	tcaaaggtac acagtggggg tgccaggctt cttgtgaggg aggtgtcctt gaagtctctg
181	aacagtctgg ggattcagga cctgattcta attgcttaaa acaactcgga ggcaaaagat
241	attttccaag aggagatgca tgctgtgtgc agtctcgatg tgactgcaca cagaa

ACCESSION No. R43713
ORIGIN

1	tttttttttg atgtgctaat tttatttttc taatacttac caaaataaat gccaccactt
61	aacatagaaa aaattgttcc catgtgacct aaaatcattc ctcagtcacc cctgaactgg
121	ctagtagcga gcatatgtgg agcggtggtg agggcaggat agcctggtta taggaaacct
181	cagantagga aagacctggg ttcaaatccc cactctgcca cttactagnc tgtgtgactt
241	tgggacaagt tgtgaaacct ctctgaggat ttatttcttc atgtaaaatg tcaccgataa
301	tggataactc agtgggtgta agantgatct attttaagga ttctagggca gagtcccngg
361	gcagggcagt taaggcactt aaataggatg gacaguctat tcattnaatt attaggcagt
421	tttttcctta atggagggtc cttgttggaa ggaccccttt tttcttaacc tcc

ACCESSION No. AA664240
ORIGIN

1	tgtgataggg ttccactttt tctctcatac tggtgtgcag ttgctgattc atggctcact
61	gcatcttcag tctcccatgt taaaggaatc ctttcacctc agcctactga gtgtgcacca
121	ccaggtccag ctaattgttt ttttaacttt tttttttttt tttttttctt ggtagagaca
181	gggtcccctc tgttgcccag gatggtttgg aactcctggg ctcaagcaat cctcccactt
241	tggcttccca aagtgctgag attacaggca tgagcactat gcccaacctg agcaggatga
301	cttaaacctg atcaattcta ctccaaaaca gcaactatca ttaagtcagg ggtgtcaagg
361	aggactctgt gaaggcaaag actagactgg gatgtgtgcg agagtgggat aagaaggccc
421	atccctagca gactg

ACCESSION No. AA477404
ORIGIN

1	ggaaaacaaa aggaaaactt atttattctt agaggtggga atgtggggag tggggcagaa
61	caggtggtgg ccctgggaga gggtcccaag gggcagaggt tggggatgtc tcagtaaaga
121	ggggcaggtc atgaatagag cctccacccc cagcaggggt tccttgggcc cgcccaagca
181	ctgggctaaa acgtggaaac tgggcattga caaagtacag cgg

ACCESSION No. AA826237
ORIGIN

1	aaagatgaga accagaatgc ttatatttta ttagtatcca agactgggga gagggatggg
61	gtgggagaga tcaagaattg gggagcagat gggaggcgct acctcactca ggagacacga
121	gttcttatcc aagttcaagg tgaaagaagt gagggcagga agagaaatct ccctgctagc
181	aacagcgact cagggagaaa ctctgggccc atagctagct ggaggcaggg tgacattgct
241	cccaccaatg ggccatcttc ttagctacac ctttgtagct gtggtgccag gcagaagaac
301	cacctggaaa ctgagctaag gcaggttcct tcttccaaca gaagacacag ctgggcaggg
361	actgtgcaga ctcaacaggg ccaggccagc tagtggcang tcagtgttca tgtctctcac
421	cagtgcctgg agggtcccca gccaaggaaa gaactggtca gttcctgc

ACCESSION No. AA007421
ORIGIN

1	gtttgtagca gttccaaaaa gaaagcagaa ctcatttagc aattgtgata aaagaaggaa
61	aaatgcatat gttttaaaag tcattaacgc atcgtgaaag cgctcccaat caacctcatt
121	ccctaggatt ttcagctaac taacaatagt gtctttttaa tttgatgtca tgaaaatctg
181	gtcacagcaa acacaatgtt ttctaaagca gatctggcct ccgagggagg aaagctctcc
241	agggcctcca gtgccttgtt tccatggtaa cgacacaggt caatagctga agtcacacct
301	ttgccagctt tgattctttc tcgcaactgg gagtctgagg caagaggatc acttgagccc
361	aggagtggga ggctgcagta agctatgatt gtgacactgc actccagcct gagcgacaga
421	gcgagaccct atctcttagc atagtccaat cttccttttt cttgag

ACCESSION No. AA478952
ORIGIN

1	tttcccagcc ctcaggccac tttattgctc aagagtggtc agtctggggt atctgcatgc
61	ctgaactcca tgatgatgtc gcctgtgtcg gggtgaaact ccactgcata gctgacagtc
121	cgtgggccac ccagcagtgc tctgggatct ggggcagggc tgaagaagta gacggcctgc
181	ttgcagtggg ggttccagca gcagcccccc tcgggatctg caggctccag gaggccagtg
241	ctgagcgtgc actccggggt caggtggtac tccatccata gcaccgctgc gtggctctgc
301	acgggccttc tgagctccac ggtgccctcg gcacacaggg gctgcagggg ca

ACCESSION No. AA885096
ORIGIN

1	gtctgtgact cttggttagg gcaaatttca aatccattat aatacataca ttgcagcaac
61	actgagtttc ttataatagg tactatccaa agctttcttt tttttacatg tatcacttaa
121	tcctcacaac cacctgagga ttaataccat ttacctgttt tacagataag gaaaacaatc
181	atttttcaat tatgactatg cccccaaaca ctggtttgga tggagccttc actggtatag
241	agaatgacct tcttccctta gactagactc tggctataat aaaggatggt ttaatcatcc
301	cctgaagcaa tgcataagat aatctgcaat gtatcttcac atactgtacc ttatttgata
361	ggcaagagac ccataaagga agctgagcat ggattatcag cttcatcaca aatctgaaga
421	aactgacatt tatgttatgt tgccttaccc aagttgggac atcagagcag caac

ACCESSION No. H29032
ORIGIN

1	tttttttttt tctataaatc tctaatgtta tttaggtttt ttaaggutt ggaagtaaca
61	gagggataca tacagcaaga tccacttaca tagttttaaa acatgcaaaa caagattata
121	tatcgtccat atgtaattat atctgtggta aaatataaag atatgcattt tggggacata
181	gtcaccagat tattagtagc tcaaggaaag gcaggaggaa gagtgctctg ggtgggggga
241	ggttcacagg gtgcttggac tgtacctatg atttcttcaa ataaaaattt caagcaagta
301	taaaatatgg gatataggaa tgtaaaggat ttgggcaaag ctgggctggg tgggtatcca
361	atgttcctta tcaccatctc tgtacttctc tgantgcttt aaataggtca caatcnttgt
421	aag

ACCESSION No. R10545
ORIGIN

1	tagaatgaat tgcagaggaa agttttatga atatggtgat gagttagtaa aagtggccat
61	tattgggctt attctctgct ttatagttgt gaaatganga gtaaaancaa ttngtttgac
121	tattttaaaa ttatattaga ccttaagctn ttttagcaag c

ACCESSION No. AA448641
ORIGIN

1	agccttagga atggttttta ttcacttgaa cactgtacaa atattacaat ttccttttgc
61	tgcaaaaagt ataaaaataa tctttatata ggaatccatt cgttactgta aatctttcta
121	aatctctgca aatggcccta aatgagggta aatgaaaaag ccgaaatgaa gagagggtta
181	tggggcagca ggaggtgggg ccaatcatca gggctggacc acccagactc ctccccagag
241	acctctgttc cttcttggta gccgccccca ccacctgcag gttctagggc taaaggccca
301	gcagaagtgg gcacgtgaga gggccaggag gagctggagg gtcagggggt gggggatagc
361	gaaggaagct agaagtggtg ctggcatgtg cccagttcca ccccacca

ACCESSION No. R38266
ORIGIN

1	tttttttttt atcttttaaa tgggatttat ttatgtttac ataaaaggta gcaaatgtta
61	cataagttgt ttccttaaga acatttattt tgtacaatca cattgttatc aagcaagact
121	tatggaaaat ttcctgggtc cacaacactg aactttgaaa ctactgtagc attctctttt
181	ccaagtttaa acatgacttt gtgcactgaa gaagtatggc ttcgcattgc acagtgggtc
241	acatgtgaca acctgacacc aagcgagaag ccttttgatg aaggaatgtt ttatcttttg
301	ttgaggttac caaaatgggg actttcatgt gtggtggatt atccaaaccc catanttttt
361	ttttncggtt ccatttctgg cttccaattn aaattaaccc ggtttaaact aggcnggttt
421	nggccaatgn ta

ACCESSION No. H17543
ORIGIN

1	tttttttttt tttaacctct tgctcatttt tattccagaa cctaggaaga actagtacac
61	tgaaggcatt tgatgtttgt tatgaaaagg aaacaacaaa aaaatcaagt tcaggctggg
121	catggtgcct catacccgta atcccaagca ctttgggagg ctgaggcagg agggatgctt
181	gagcccaggg agtttgagat cagcctaggc cacatattca gaccccattg ctaccaaaaa
241	atttttaaat taaaaaatgg ctaggcatgg tgggcataca actgtaattc aagctacttg
301	aggaggctga ggtggggagg atcacttgaa cccggggggt tgagggccac agcgagctgt
361	gattcacaac actacactcc accctggggc gacgaagcaa gatttcgttt tcaaaaaaca
421	atttttgttt caantcccat cttcaccnta aaaacctngc tacattcccc aggggaaaac
481	caattttca

ACCESSION No. T81317
ORIGIN

1	taaagnnatg aggtcttgct ctgtcaccca ggctggagtg cagtggcaat tgtccctcct
61	cagtaagtgc aagccaccat accaggccct ttgaacatat tttaaatggc tgatttaaag
121	tctttgccta atactaaagt ctaacatttg ggcttcctca gggaacattt tctaatttac
181	tgctttctct cctatgtgtg gaccatactt aagtggtttt ttgcatgctt tgtaataaca
241	gtctcttgaa aactaaacat tttaaataag gtaatgtgac aactcgnaaa aatcaggatt
301	cttcccctac cagggnattt gttgttatta ctgtttactg ttggttactg gtttattgtt
361	gttnctntta ggtgactttc ctggaactaa ttatctaana tatta

ACCESSION No. AA453790
ORIGIN

1	aacaaatata tttagatata tttaaaagaa ttaaaaaaaa catttcacaa aacatttgtt
61	gccataggaa ttatttttag caataaatgc ccacatcaaa atttaaacat ttttcaaagt
121	atgattatct gtactaagta atgcaacaaa ttatgtaaac agagtcagat acatttccct
181	gtaggagtca cttccttccc gggattaaag ctgtcccaga catctttcca ggggaccaat
241	taagaaactg ctattttcag agcaacagaa ataaaagctt ttatttgttc atttgaatat
301	aaaacaggcg ttatcacaga tgtacaaagc gtactggtgg tttaacatac aagaaggttg
361	ctgtcctttg cacataaaaa ttttgtttga aactgtggct ggttgagtac atgagtt

ACCESSION No. R22340
ORIGIN

1	ttttttaaca taaaggtttt attgaataaa tacatgcact gtcacgtgaa attagttgaa
61	cagaaaggag gttctctact ttttaacccc catcccccac cgctgttctc tatttgcagt
121	ggggggtcca gctggaggtg gaataaatgc ggcaaccaca ganaaaacac acagctacac
181	acaggcctgc atttggctta tgtgcctgaa aaagaagggc cgacctcttg ataaagaatg
241	tctgtaaaag gaattcttac cgtgcagaat atattatcat gggcnantac agttacaagg
301	ctgcttctat tttatttatt ttttgagacg gagttcacct ctgttgccca gggtgggagt
361	gcagtggtgc gatcttgggc tcactggcaa cctccgcctc ctgggttcaa gcantt

ACCESSION No. AA987675
ORIGIN

1	gggtagatag ctagaagtga tagtgctagg tcatatggta aatatatctt caacatttta
61	agatactgcc aaactggttt ccaacgtgac tgcatgtccc atcaacaatg cgtgagtgtt
121	ttagtttttc cacgtcatta tttcacttcc cccaggtgtt actgtccttt tttattatag
181	cattctagtg ggtaagaagt ggtgtctcac tgtagttttg atttgcatgt ccctgctgac
241	tgatgatgct gaccatcttt tcatgtattt tattgtctat tcctacacct ttttgatgaa
301	atggttattc aaatattttg cctattttaa aaatggggta attatcattt tgttgcgtag
361	ttgtaagtgt atttcatatt ctggatatga gtcctgtatt aaatatatga tttgaatttt
421	taaaaaaaaa aaaaaaacct cgt

ACCESSION No. N51543
ORIGIN

1	acgattaatg ttttattatt catattttga caaagatagc atattatatt ccaggacatg
61	gtagttacca tgtggggaaa cctatcaaag catttttaat gactgcttag aataactgta
121	gaaagtactt tctcaatgat ttttgtatgc aagaaaaaaa atacctgaaa gtaaccaaaa
181	gtttcagact ggaaaatatg ccaggaagat tttcttctct cattctcagg tgaggttata
241	atccagtttt agcaaatgtt tgacaattta aaatactttt gaaaactgga gatttaaaaa
301	atgtaaacaa ttggtaggca cagcaaaatc gtagttttcc cttctgatat tatacatttt
361	ggcatctctc tacagttatg attaaccatt aaatnaaggg nagctaaaac gttccaaaaa
421	taggttttac caacattcan tttttaaaat tttccattca agctggtaat ccttttgggt
481	ttcc

ACCESSION No. N74527
ORIGIN

1	aaacgtggca cagtgtgtgt agtgtatgtg actactatca tttgtgtaag agaaagaaaa
61	gtttactatc agagactgta tctggaggga taaacagact ggcaagggtt gcctctggna
121	agaaaccggg gaatagagag cgggagtaga aagactgtat tagctgggtg tggcagcaca
181	cactgtaggc ccagctactc cagaggctga ggggaagact tgctcaagcc caggagttca
241	ggtccagcct gggcaacaca gcaagactaa aaaaaaacaa ctttcttttc caagaatacc
301	ctttttgtaa cttttgaatt ccgtattttt taatggtcta tggtctacaa acactcatgt
361	gcaaacacat tacacgcaga ataagggatc acctgcacga agctatgaac tatttcctca
421	tcccttctag ccccttccta gaggcgaacc ctccgccccc aaccccaggc actatctgtc
481	ctgcttgcac cca

ACCESSION No. AA121778
ORIGIN

1	tttctgtcaa gctgttcttt atttcangga gagggcaggg gcagagcttt acaggagtag
61	agattttgta tgctattgaa ggtaaattgg tatcagttta aattagattg ttttaagtgt
121	aggatgttaa ctataatccc catagcaacc acaaataaaa catctaacaa atatacacaa
181	aggggagtgg aaagagaatc agactagttc actacaaaaa aacagaaaag aaggccataa
241	agaggaaatg aggggccaaa aaagtatatg acatatagaa gaagtgttaa atggtagaag
301	aaagtccttc cttaattact ttaaatgcaa atggattaaa ttttccaatc caaaaggcag
361	aaattggcag aatggacaga naaaacaana catnaacatg atagtgatat gcctgtc

ACCESSION No. AA258031
ORIGIN

1	ggggccccgt gatctcaacg gtcctgccct cggtctccct cttcccccgc cccgccctgg
61	gccaggtgtt cgaatcccga ctccagaact ggcggcgtcc cagtcccgcg ggcgtggagc
121	gctggaggac ccgccctcgg gctcatggcg gccccggtcc gcatgggccg gaagcgcctg
181	ctgcctgcct gtcccaaccc gctcttcgtt cgctggctga ccgagtggcg ggacgaggcg
241	acccgcagca ggcaccgcac gcgcttcgta tttcagaagg cgctgcgttc cctccgacgg
301	tacccactgc cgctgcgcac gggaaggaag ctaagatcct acagcacttc ggagacgggc
361	tctgctggat gctggacgag cggctgcagc ggcaccgaac atcgggcggt gaccatgccc
421	cggact

ACCESSION No. AA702422
ORIGIN

1	aaatgtcttt aattgctgaa tgcctctttg gctaatattt ggaagatcat tatttagtcc
61	tacaacagac gcattgttcc actttcccat cattttgttt gcaaaccgct aaaagtctta
121	tttcctcatc tctttgacac attaccaaag tggaccctat gctgtaatca cacaggataa
181	tgttggaaag tatgaatatc taaattattt tttaaaggta ttattttttt ccttctgttt
241	tcaaatcatt tctgacagtt tctaaagaca tggtcacagc tgcctgaagc atgtcttctt
301	cactcatagc atcacctaga tcactcccaa gtgctcctga actggtggct ggcctttcac
361	atggatgtga actctgtcct gataggtccc cctgctgctg ctgctgctgc tgctgctgct
421	gctgctgctg ctgttgctgc ttttgctgct gtttttcaaa gtaggcttct cgtctcttcc
481	gaagctcttc tgaagtaaga tttgtacctg atgtctgtgt catatcttga gaaatgtttc
541	g

ACCESSION No. T64924
ORIGIN

1	tgagacggan ttgctctgtc gcttaggctg gagagagact ctgtctcaaa aataaaaata
61	aaaataaaat aggagtaatt cacgaggaaa agattacata ggctgctttc ctgcttttct
121	tatccacagg cagttctttg caatgactat ttaaaaacta aaacaacatc acaagtcatg
181	aagtttgtgc tacccctgaa cttgacaaat tgtctgattc aagtgggcaa agcacaatga
241	ttggatgcat ctgaacagaa cctcctctgg aatgggggcc tcactagagt gagctcttca
301	tgagccttgc caccaggggc aggggattat tctgttattt tggcctgttg tagccaagtc
361	tgcaccccta ggcacccaaa acaaactggg gngagttgg

ACCESSION No. R42984
ORIGIN

1	tttttttttt tttttggaaa acactgttta tttgaaaaca atgagacctc aaatatgaaa
61	tatagttaac aatgacattg acactgttgc tagcactttc ccctaaacca cccgtaagtc
121	ttggacgcat gtgcatgcag cacacacaca cacacacaaa aaccaaaaac aaagccaaaa
181	aaaaaaaant cccaaacaca acattccatg nttgttcatt gaactcctga tgccgggagn
241	acaggactgt taaaagattt tgtctcccac attatctctg ggagtggggc acaaagc

ACCESSION No. R59360
ORIGIN

1	ttttttttgg ttttattttc tcctgaagct gaaaatgttt cacccatata aatgtggcat
61	tttagactct agctataaac ctcatcgacc agtatgtttt cagagttgtt cacaacaaaa
121	tattattcgt ttctaaaatc agttttcact ttttggtgat agtattccag gctggactgc
181	ttgaatttta gatgcagaga tcattttata tatatctgtc aatgtaatac agaaaaatta
241	catgtgaatt gtttatgtgc cccctctacg tagggacaca gtatcaatca ctcaataagg
301	cactgtaaca tcaggtgggt gtttggggat aaataacctc ttcggggttt ctttcaatcc
361	cactaccata tggct

ACCESSION No. R63816
ORIGIN

1	aagtcannga tntttactta atttctttca ttgtatactt gtatctcatt ttctcttaac
61	actgaaaatc ctgacttcta aagaaatgta actacttgtt ttcttacaac atagtattct
121	agatacaata ggttcaaaat aacaccagta ttaccattaa caatgagact actaaatgca
181	ttttcacagt gcactaaaat ctcaggaatt cactggcaat ataattcatc catgtaataa
241	aaaaccactt ggtaactcca aaactattca aataaaangg taataacaaa tttaaaaatg
301	gcattttgng ggtttcttcg gaattttttc accctttata ttcccccaaa gggccttctc
361	ctattaattg nggaggggcc ttgggnattg g

ACCESSION No. T49061
ORIGIN

1	ggaccaaaga actttatatt tattttaaat atcaaagtaa cacaaagaac tagttcaata
61	tacagtacac ttcctactct tcacagagaa ctgaaatttt ctataaagac atttatactt
121	aggaaacatc agacaaccaa agtatgtata aaactcacaa gatattttac acacagttca
181	caataattaa ttctgatatt ttaggntttt tctgtcattg cttttaaagc atccttaatt
241	taaaaacaaa aattattatt tgaggactgg aaaacaggtg gcaaaggcat ttctactttt
301	aattatacac tggtaaatcc ccccttaatc caaaacattt tacttncaca t

ACCESSION No. AA016210
ORIGIN

1	cacagcaatt catctttgct tttattaata atttcaacgt atgttttgag cactttacaa
61	tgtaggaaat gctttcatag acattatttc ctatgattct cacaaaacct tcactgaaaa
121	aaaagacttc aaggtcactt gccctatgtt tataaaataa tccgctttaa ataagcagat
181	aggagtccaa aaattcttac aatcataaga aaaaaaaagt ctaaccagta cttaattatt
241	tcttgtcatg attactttgt tttaacgcca ctgtttcctt gcttccccca ttttcttcag
301	ataagtttac tccttttggc ttgtcctgca tccttttctg acagctgccc tgtgtacacc
361	tgccttaaac atctatcctt ctactctgga atagactaag ccaaaagcaa ttaagaaata
421	tttcattcta aagaaaacag aattttagtc caaaacccaa at

ACCESSION No. AA682585
ORIGIN

1	cctgtgggct atattttcct gtatgttttg tatttttttg ttggaaactg aacattccaa
61	gttttacact ggggaagctc tggaaactga attattttac tcctccagga ttgtttattt
121	ttaaaatttt gctggcttat gataaagggt atttcgagga aacagataaa gggatgtata
181	gggcgaggta tgggggaagg ggtgcagagc ttccatgccc tccgtaggtg caccactctc
241	caggaacctg caggtgttca gctatgtgga ggctccctga atgcggtcct cttgggtttt
301	tatggaagct tcataatgtc agcattcctt cccccaaggt atagggcaag actctctctg
361	gggaaggtct taggaccaca atcagaaaag tgggcagaca ttagagtcct gccttggggc
421	agatgaaagg agggcaggag aaggtcagag aaattgtttt tcttgag

ACCESSION No. AA705040
ORIGIN

1	gtagagtcgc ggtctcactg tgttgcccag actcgtctca aaaaactcct gggctcaagc
61	aatcctcctg cctcagcctc ccaaagtgct gggagtctag gggtgagcca tcatgcccag
121	ccaagcctga ttttaaatca ggtctctgcc actagcagct gagagctcct cactgataaa
181	tcctttgcag ctggaagtat tcaatggtat ccagtatatt cccaatggct cattcctctt
241	ggacagagaa actcaagtta aatgaactct tttggctgtt tttctccctc ccctttgttt
301	cctccctctc ccttgcctgt gtctctctgt ccactctctc aggcccttc

ACCESSION No. AA909959
ORIGIN

1	ttttaatggg caaaagaaca agttgcagtc aatggctgca gaggggtgtc tggggtccaa
61	tgtgggctgc actttgtggg tactgaggaa atgggaagat gctgcttcta ggtcagctgg
121	tgggttggag gttgggggct gtaattagca gcagccttag aactgggatg cctttcaatc
181	cctcctggcc ccttatctct gtggggcagt cacaggacat catctgtttt attcaaagtt
241	gggacttgca gcaggagacc ctgtcctgca tggagtaggg gtcctctgtt gacaaacttc
301	ttggtttcca gctcttcccc atctgcagca ggcctctgga ta

ACCESSION No. AI240881
ORIGIN

1	tcggttaaga tttttattat tccagagaaa aattagaatg tatcggtaaa agaaatagga
61	atgcatattt caactcactg tcacaaacag gtgttttatt atcccaaatg acagtgttgc
121	ctgagatgat gcatgtggca gacgaggaac caatgagtcg gtatccttta ggacaagaat
181	atttaatttg ggatccgaac tggatgtctt tgatcacatg tgccatgcca ttcacaggat
241	ctggaggatt acgacatgat ttacgtttgc acttgtcctt agcacttgtc cagactgagt
301	tttttaggca gatgatagaa aacggtcttc cggaataacc agggcggcat tcatagttca
361	gatatgtccc aatgggaaac tcagagtcat cagttaggtt ggtaggcctg gcaaatggaa
421	gcccattccg gacattgcat tga

ACCESSION No. AA133215
ORIGIN

1	caagaacatc ccttttaatc acaaaccact catccacaaa tgtggctatg gggtaagcag
61	tctaggctgg gaccctttcc agaggtaagt caaggtcacg tccctgcccc cttcctaggg
121	tggcggtggc tccagccagg ggggcttcca ggttaatacc agagcctcgg ctactctgga
181	ctcctgtgag ctcttcttgg ctggaagaag gggggcattg tgggcctgct ctgtcccaag
241	gctccagaag ctgcccctac ccaggcctgc ctgc

ACCESSION No. AA699408
ORIGIN

1	taacagtctt aatattcatg tatttattct cagaacatac aaacttatct tctcagagaa
61	tagaaaacag agatttcact cagtgacaaa gatggacaca gccagttcac cgtgtccccc
121	catctactta gaaaatcccc tgggggaggg gatgcctaga gcatacagca ccccttggtg
181	gccggctgtg cacaggtcta aagactctca acttccttta ccatccaaaa aggaaaacag
241	ctgtccagat gacagtaaga ttccactgtc tgtaatcctc atggtgccag gtctcctggg
301	gcatctaggg caatgatgct actgcagttt atgcagttac acagtcaagt ctgtgccaaa
361	ggaggtccca tccggcggcc aggtttctgt

ACCESSION No. AA910771
ORIGIN

1	ttttgttgta gaaatatatt tattaacata agcagttcac aatttactgt aagaaaaaaa
61	gcaagctaca aaacagtgat tccatgttta tattaaaata aacatacaca aattaaaaat
121	ttccttagat atccatttaa tctctgggat cataagcaat gtttaggtat tttttgctca
181	tttattgcct aggttttaca caatgagcat atatgttaat tgtgtaattt aaaattatgg
241	aattaagtgc aagagttcct aaccaccttt tacaaaactg ttatgagaaa atacattcta
301	gattcaaaca aaaactaagc aatatatccc ttattctaac agctctaaaa tctgttcttc
361	tcattatact cccac

ACCESSION No. AI362799
ORIGIN

1	tttttttttt tttttttgca agggctgcgc ggcattttat tttctgaacc ccccacagca
61	ggggcggcca gtcctgctgc aggcagagtt tcagtcttcg gagtttgacc ttctggccca
121	aggtcatcac agccacaggc ggaggctctg gggaaaggtc cagttcctgg gatgctggcc
181	cctaatgatg ggcccatctt tccagtgccg cccttccctc ccgcctggca caggagttct
241	ggagccacgg tcctgagtct acagaacagc ccggtcagcc tcgtcccgcg gtgcaagcga
301	ggcctggcct ccctccctgc ctgtccttgg cccggccaca tcactccctg cgtttcttct
361	tcttctccgg ctcctggaca ttggccgcct ttgctcgggc actggtcagg ggccgaggtg
421	tcctccttct ttggcgagcc cctttttggc cacgggccct

ACCESSION No. H51549
ORIGIN

1	atacaacatc tttatttggc attgganatc ctgacatttg tncattacag ttccttaaaa
61	aacaaaccaa aaaatcagaa caaattaatc aaaaataaag atccaatggc tctatttaca
121	tatngcaaag acagcccagg natcttccnt gcacacacac accccgcccc gatacagtta
181	aggggttaat aagctttggg gagcgcagga ggcaggttcc acagttcatc aatcccaagn
241	cacccccatg aggtaggggt gcctcacaca gccagacggn tatcaagagt atgattggta
301	gctttttcct c

ACCESSION No. R06568
ORIGIN

1	ctgtcctgat tagaattaat tttcataaag agaacaagaa tcttgactgg ttcacccttc
61	aattccttgt gcccgcaaca gtgaccggca catggaaagc attcagggaa taaaagcaca
121	atggaaaatt aaaacatact cactgcatgc ctgccaccta taggaaccaa attaaatcac
181	tgccaatatg gcatgggggg aaaaccttcc catttttctg ggaataatgt ttacaaaggg
241	tgggaaaata aggtggcaca ttcacctggg gtggggcatt ttaatttaaa cgctngttga
301	ccccagtngg ttgttacntt tttcaggtgg aatta

ACCESSION No. AA001604
ORIGIN

1	cttatgaata atgttagaaa tggaacatga tgttttaaat gtatacataa accttccaat
61	taattatcag gtgatccagt agtagacctg tgacctctga aggctcctgc ttctcatccc
121	ttcccttctg ctgtgatttg ttgtcttccc tctgctcatt ccccttgtgt ctgtttcttc
181	catcctctcc ccatgctccc tctgttgtca tttcccctta ctctccactg cacccagcct
241	ctgttcataa tttttactgc aattccgatg attgaattat aaactggaag ggagcaggga
301	tattgatctt catgtagttg gacatgtact agactcacgg agaacaagga ctgggttgta
361	ggcacaatgc tgtgtgggtt ttgggtaaat ctaactcaca ctcaacttga ttttgttttc
421	c

ACCESSION No. AA132065
ORIGIN

1	gagacacagt acaacagtct ttaatgtata tataaatatg cctacataac agagtttgat
61	aagagaagtt ttggctatat acaactctgc atgtaatcaa actctagaac atcaaatgca
121	actccactgc atagctgttt tgacagagca acagttaagc ataaaatagc tttgcacctt
181	attattttgg agcaaaataa aaaataacca ccacaaaaaa aatctctaca ataatttaaa
241	ctaaaaatgt tgttgaggat agggtaaaca acaaaaaaga aaataatttg atccatatgt
301	gatatttggc tgaagattaa cagtgttaag tctaaccaac agcgagataa ttttaatttt
361	cccaagcatc ttnctaccgg tttattagcc atatttggat attaagggga agggcatttn
421	gccctttacc aaaaccn

ACCESSION No. AA490493
ORIGIN

1	tctttattga cttattgtaa ttttttggca tacaaattac ttaagtatat ttacaattct
61	tacataatgt acattttaga agataatgta ctttgctcca tttacaatga caaactactg
121	taaaactaca ttcatgaatt agatacaaat cctctacata ctaataaaaa gtaaatggac
181	tgttggttat acattcttta aaatatacct tttcacaggt agcaagaaat agtacatgta
241	ataagtcttt atgactggaa tga

ACCESSION No. AA633845
ORIGIN

1	gtttttaaaa gtcagggttt tttgttgttg cttgtgtgtt ttataattaa catagtttat
61	ttttaatact ggcatccaag aatcctggtt tactcaggtg cagaaagact ctctaactaa
121	gcagccaaaa aaatttttgg tatgcaagtt ttatcatttt ttaatttgca tatgacttga
181	acgtgtcttc aagtataggt ctacataata actttttaag aaaattataa agctcaatac
241	aataaatcta atacataaat gctgcttgta agtcaaatat ttaagagact ataaaaatgg
301	gtaattttgt gataaaattt agaatcattt gacaagagat caatgaattg

ACCESSION No. AI261561
ORIGIN

1	cactgttaaa aatacattta tcattaaaat atattacaca tggagacagg atgcatcata
61	tacagtttgg aagacttgct ggcccagaaa atcccacttg tttcaccgaa cactcatttt
121	ttcagggatt ttacatttta tttttagaga cggggtctcc ctctctcacc cgggctggcg
181	tacagtgatg tggtcatagg tcactgcagc ctcaaactcc tgtgctcaag tgagccaccc
241	acgtcagcct cccaagtaac tgggaccaca ggcacgcatc accacgccca gccaattttt
301	taaaaatgtt tttgtagaga gggggtctcc ccgtgt

ACCESSION No. H81024
ORIGIN

1	agcttcagcc tttattaaac aaaggaggag gtagaaaaca gataagggaa cagttaggga
61	tcccttcttt cccctataca tacacagaca tacaaacaca cgcacccgag tgaatgacag
121	ggaccatcag gcgacagatt gaagggcaga gggaggcagc accctccgag agttggcccg
181	gacccaaggg tgggctgaga cctgggccag gggcagccgt tccgaggggt tntgcctgag
241	cagtttggag atgaggtcct gggctcccgt ggggcacaga agcggggaac tttaggtcca
301	ccttggacga tggcgg

ACCESSION No. N75004
ORIGIN

1	tcaagtcata agataaagtt taatcatttg atcatgttaa aagacacaaa acacagccaa
61	tctaaccaaa tttcaggcat gcatttacat aaatatatta aattaagaaa agaaattgta
121	cacttaaacg tccttttcac ctagaaatca ttaaatccac agatcaacaa taaaaccaat
181	tctctgcatt taccacttca agatacaatt gttctatttt aaagataaca caaactncac
241	tagtctggtt aggaatttat ntgcattata catatattat

ACCESSION No. W96216
ORIGIN

1	tctcaggagg tagaagcttt attatgacat cttcaaaaga caatcaaatc aatagacatt
61	tgctgagcac ctgctgtgtg caagcccgtg tagacagtag ggtccagtgt cccacgcatg
121	gctctcgaat ccccggggag aaaaatcaca tcnggggtca gggagttttg cgtggctgag
181	aacaaagtgg gtttctgaac atcaaagtgc aattcgcttt acggggcaaa ctccgangcc
241	cagccccgcg tngggaagcc gcagcngggc gggcccgctt cctggggctn gcggccgggg
301	tttctctaag ccgcacgcnt tgcgtggtgt tgcggggcct ctcaagcaag cccggaagca
361	gcatccttga gctccggttg ttggagcgct gggacctctg gctgccgccc ccgcagcagc
421	agcaaccact actccgctgt c

ACCESSION No. AA045793
ORIGIN

1	caaggtatag ctaattttat tattatcaaa caaaactagt agatataact tccaggaaat
61	aagttacata aatataacag aataaattca ttttcttaag tttcaaatta aagatgatta
121	agaaatacag ctttatgtaa agtttctgct ttttctcaac cacgcctaaa gaggaaagaa
181	ctggcagcag gaacacttgc tcctaggaaa caaatacaac aaaattataa ttaaaaagat
241	cttcaagcta tcaaaatttg tgagagaagg atggtaagaa tgcagtagaa attaccanat
301	gacaaacaaa atcctatcag ttttcaggtt ggtcaaaaag taacttccat gaatatagcc
361	tgtggatccg gccat

ACCESSION No. AA284172
ORIGIN

1	gtgttaaagt tggatggatt tattttttta aaggcccagt acaaaaaaat ggttgaggaa
61	agtgactctt caacaaaata tacacctgta gaaaaaaatc cctaatatac tgatatttaa
121	ttgaacggaa agtactaaag agaacatact ttaatatcta ggcacaattg gtcaggtact
181	aattataatt tctgttctca tttaaaagtt taaaccaatt cttcaactgg actgatgtgt
241	gtgagtctaa tacagagaag gcacctctct catctctcac tctccttaag gaccttttga
301	gagaaactct ttgtaacact ttaagggaca cagacaatgc actatatcta agtatagata
361	tagttattta acatac

ACCESSION No. AA411324
ORIGIN

1	tttttttttt tcccaaacaa tacatatcag attttatcca ttttgttttc tacatgttct
61	ttgtgactca agtttgacat tagcatttgc accccaaatg agttccccta caaataaaat
121	ttgttcatgt tgacacaaag aacacaaagc aagtatagat ccctcaggaa gttgtcacaa
181	ctcttgataa gattaactcc accactatca tcactttttg ctttgtcccc tagtttgaag
241	cctgctggct tttataattc aatgagaatg actccacact cttctccaaa gcgcccatta
301	tttttagttt ttcggtgcgc gactcaacat aaagacctgt ggctcttatg agctgcctgt
361	ttttaaatgg tgcagtagtt tcagtttcca tttaataagt tcccagataa caaatggaga
421	atgggaagaa tcttctcaag gtcacagtga aggtaaaaat aaattatctc catcactgag
481	aggct

ACCESSION No. AA448261
ORIGIN

1	tttccagaaa aggatatttt ttttattcaa gtaactgcaa ataggaaacc agagagggag
61	ccccaggctg ggacaaatca tggctacccc tccccaacag aacaggggga ggaggtggcc
121	cctacaccct ttatggtcga ttcgggcccc cttgctcact ctgctgcagc atcctagggg
181	cagggccagc cttccctggg actggggtag tcggtcaccc agcctgccat gccccagccc
241	ctcttcccca caaagagtat cttgggggag gggatcgtgg gcagaacagg aggcaatgag
301	gatgaacatt tggcgctggt agcagcagca atgacggatt gtcgaagaat ggaacattga
361	aca

ACCESSION No. AA479952
ORIGIN

1	aacagtctgg ctgttgtttg aattaaactc ttaaacagga tgtttagtta gagggtaatt
61	gttgagtaat gatgcataca acagcatact tccctttctt gctgggggtg cagcttttca
121	gttttcttgt tttactttga cagtgcaagg ggaactgaaa ataatttcca ttgtattatt
181	tatcttagtt cagctgaggg ctttatgaga cagtggatgg ggaggcagta agacggtgat
241	gagataaaat gtgtgtgttg cactgactgt ctataaagtt atcctttctt catgaaaaag
301	tagcatttaa atctggatga gtttataaag gattacaaaa tgctgattta tagagtaaac
361	tttaaaatat taaagactaa agactaaaag aagagtaata atgaagtaat gtag

ACCESSION No. AA485752
ORIGIN

1	ttcggcagca actcctttcc tttatttctt ccccttgtaa agggaaattc aagttcagca
61	gcattccttt cctgccccaa gtcctcaacc agacaagagg ctgcaggcac caaatcttgg
121	gctggataat ggcaaaggcc tcagaagctc acctccagct ctgagcttca acagctgttt
181	gtaccagtga gtcagcatta aatccaccag aaaagaacag caccacccaa agactggggg
241	gcagctgggc ctgaagctgt agggtaaatc agaggcaggc ttctgagtga tgagagtcct
301	gagaca

ACCESSION No. AA504266
ORIGIN

1	tttttttttt tttatatata tatataattt tatttaaaat ttagatccct attcccacac
61	tctaataagc tgtataattt ttgtttagaa tttttctgca aacatactac aataagcttc
121	ttttatttgg agacaaaata cagtggcatt actggaagga atatcacaac attacatttt
181	tatcttaaag gacaagcaaa ctttcagggt tgataatggg ataagcatgt ttgagactgg
241	ttaccttctg gcagttcact gcatctggat atttctgaaa agtatagaga agctcttgga
301	ttttaaaaat atcttaaaat acttttagat gaaaaaattg taaaagttct gcttataagt
361	ttacttttct ccacaattac aatatttaaa acaaagtttt gttgattgac gttttaagca
421	tttaaattta gaatgctaaa aacaattcta tcctacactt tcttcagggt aggggaataa
481	atacatcctt aacattgttt tctggatgta aacagaaatc cagcagaggt catcattatt
541	tagtacaacc agtaaataaa tgtaagagaa t

ACCESSION No. AA630376
ORIGIN

1	agcttggcaa acctttttta ttttgtgata aaaatgcttt catataaatt tcatcttaac
61	tacctttaga atgaaacgga aaagtaaaaa caaagtgtgc attttcctta ctacgtttag
121	tcaggaatat gcggtcattt tattggttac tgggtttctc atacaaacag atataatatc
181	acttttaaga gaaatgtaca caaggaagta accatagtac cacttattag tgggggcctc
241	tgggtacata aatgtgtcct cccaaatagt catcatacat tcaatggtat t

ACCESSION No. AA634261
ORIGIN

1	atagtgaaaa tatactttat tttttaatac aatagctgcc agcaatatac tggtgctgat
61	gttccaaaga taaaagaaaa tacatgcatt ctataataag ctttcatttg cctgttcaag
121	aaattataaa gaaaatactc caattctgtt caacattacg gcttgaggag ttgaaatttt
181	tccatgataa aaatatactt tgtgtggccc aaaccttgac tatttataaa ggatggagtt
241	tttaaaagcc cacatgtatc aataatggat gctcccctct ctttgaatta aatgcctaaa
301	ttcaaattaa tgcaagaaat tggtgaatca ttaaatgatg aaatttgtat caaaatgttc
361	atgaaaaaat acatttctat ttcctctaca tttttacttt gtagttattt tctaaatggg
421	tttaagggca cagaaataaa tgctatctac atgcaactct ggagagattc aaaacacaac
481	agaagttaac atgcctaaat cctagagttg atccatttag tgtaagaata aatgtcagaa
541	atc

ACCESSION No. AA701167
ORIGIN

1	ggtagaggca aagtttcgct atgttgccca ggctggtgtc gaattccagg cctcaggtga
61	tcttcccacc ttggcctccc aaagtgctgg gattacaggc gtgaaccacc gtgccaaacc
121	tacattttta gatttattat ggtgttctga ttaacaataa agctaggtta ttagctgcct
181	gggaagagga ggaagtagat ttttacagtc acttttatag aaactgttaa attcacatga
241	gaaattccac cttacgagaa ttggctccct gacatgtctt tggactacct ctgtttctct
301	aagtttttgt ttttttctgg tgtctgaatt aagttggtga cagatttggg ggatatttga
361	gtagcacttt atctagagtt gc

ACCESSION No. AA703019
ORIGIN

1	ggcatttcag taaatttttt taatgacttt aatgattctt atttaagaaa aagcccttaa
61	ataaatgcta ccaaggcagt aatatttgac catatgaacc agaccaaata ccctttaatt
121	ttagtatatt aacctctgct gtaaatgctc ttttaacatt gccacatgta caaatttgtc
181	tagaacttca cgacacaaaa gtgtgcaaat atgagtctaa gattgtgctg aaatagggaa
241	aggctaacac tgatgtgcaa agtaaaaaag aaagataacc gcttctgcaa caggtaataa
301	aacaaggaaa aaacgagtta ggtcctgcat gtgtctccac ttcattgctt ccatgtttga
361	aaaagggagt ctgttctttt gctaggccat gaggctggaa tccacttggc atactgtgtt
421	gagaggtcta agttcagtgg tgctctcagc agcagccggg agg

ACCESSION No. AA706041
ORIGIN

1	cgctgagctg cttatttatt gaaaataaac gacggaaaag tctggccttg ctcctgtgca
61	agcttggagg cctgggtcgc cgctgtggac aagcgtctta gtgtcatgca gaccagaagg
121	cagctgctgt cccagggccg gggccacctc actgcctctg atggggactc ccagccccca
181	tggctccgct gtgccctggg caggggacgg gctgggggca ggggagggct ggagcccagg
241	aggcagcaca gcagccagaa agccgcacgc tgagcctgca cctatggttc cgggaggggc
301	ttgggccgtc acccaagtgt gatccctaag aacaggaggc ccagcaccct ggaaggaggc
361	gctggaaggc ggggcggtgg tggccccgtc a

ACCESSION No. AA773139
ORIGIN

1	ccatgaacac agtagtgaga tattcctttt ccactcctac actatcttct gcttaaaacc
61	ctctgagggg tcccatctct ctcagggtga tgtctagact tcttctgagg ctagaccagg
121	tggtgcggcc ccatgtgcca cgcacccaag ccccctgcct cagtgtcccc catatcccac
181	accacagggg ggtggctgcg ttctgtatgg taggtggtgc tgaccactgg gcctctgcac
241	acgctgctct cagttccctg gccaactctc cttcaggcct cagc

ACCESSION No. AA776813
ORIGIN

1	ttttgtagag ctgggatctc actatgttgc ccaaggtggt ctcaaactcc tggcctcaac
61	tgattctcag gcctcagctc cggaagtgct ggaatcacag gcaggagcac ggtaacccgg
121	gccccacagg ggtttggggt c

ACCESSION No. AA862465
ORIGIN

1	tttatgctag gcaaggaggg atgattattt attagcttct acagattaga caatggggtg
61	ggggtgggct caaggtgaga tgattttttg ggtccaagtc tactcaagac aggcatccca
121	gtcttcggtc tccaaatcca cctcctgtct gtccccccac actgctcctc aggccttgtg
181	gatccattga ctgtgatttc tgtggttcag ctcccacatc aggcaggaag ggcagctact
241	gggtctgaga tcccacattg cctccaaccc ttgcttccta gctggcctcc cagggcacca
301	cgaggggctg ggccaggctg ctgtgctgca cgtggcagga gtagggggct gtgtcctgcg
361	ggggcactgc accaccaccc aggactggta agtgccattt ccattgtgaa gaacatctcc
421	cgtactcagg ctcctgcacc tcgcggcccg agtccagtgc acatcaattt ccctgggtag
481	aagtcgtagg ccagcacttc agtttcttct tttctcctgg gggctggtgg ctggtgacac
541	cacagaggga ggatctgccg gtccaggata tttttgct

ACCESSION No. AA977711
ORIGIN

1	tttggcattg taattatgca gaagaaaatc tttattctta gggatcatgc tgggaactga
61	gggatgaagt atatgcatat tccaaatggt tcaggaaaaa tcctgtctat aaagcataca
121	tgataaaatg tcaacaataa gacaaactag aggaaggata tacaggtgct tactgtcaaa
181	tttcaaattt tctgtaggtt tgagagattc aagatgaaaa cttgggggaa aattatatat
241	tctgataata aaacagatgg gaaacaaaga gggcccataa gacagtcact gattaagatg
301	ctttctacat ggatgggcct catccttttg tccaaaggga ctacctggca tctgttccat
361	gttagtgaca gtgactcacc ccaggttgct gcacagatat gagaggcttt agatcatagc
421	acagtc

ACCESSION No. AI288845
ORIGIN

1	tttttagatg ttttaaaata catttatttc atgtcgtttg tccccagggt ttggagtttg
61	atgttctgga ccaagcgtag gctctgagca aatgctacca gggctggaga atcagttctg
121	ccacttccta gttaagtgat cttagacaaa tttccgcgcc ttagttttct tctcagagaa
181	atgagactag tcctatccac actatggaca agtggtagga ggcgaaggag ctcacgtttg
241	taaagagcct tgcacggtgc ctgagacaaa ttcagtgctt agcaaatgtt agctcacctc
301	tcccttttct tcctgtatcc gattttgtat acaaatgtgt agaaaattta catgaaataa
361	tgcagaaag

ACCESSION No. H15267
ORIGIN

1	tttttttttt ttacatgaag tagaactttt atttggaaag ttgaatttca tgtataatga
61	aaatattttc aaaccataca tagtcataag cataatacaa acaccaccta caatacaaac
121	acgttttata aagttctact atgaatatta atccaagcca aaagaaaaag gtaatcacgt
181	gaacctgttc tacatacctt tcatctcttt tgatgacgta atcgaacaat ttaaggtaca
241	aaacaangaa agctttgggc tgaaccctac ttatttcact ataggaacac taggatatat
301	actaccacag gtaaccaaac ccaatcccat tataattaat ttaacattgt tacatggatc
361	ctatcttaat ggnatgtaaa cat

ACCESSION No. H18956
ORIGIN

1	tttttttttt ttttttttac atgtaagaag tggttttatt ccaggngtgt gtttcataaa
61	gacgaggtcc tcaaggacag ctagtggcac atgctttggt caagaagagg aaaagcaaaa
121	acagaacagg gctgcgttgc cacaaaggac cggctgataa gtgcagagcc tgatctgacc
181	acagcaaagg acagagagac cctcttgaag gccctctggt cagcagtcct cttacattca
241	acaggcgcac ccggctcccc agccccaaag gtccatgccc gagtntggcc cgggcttcta
301	gtccatcctc tgggggagag gcctttgccc tggggcccag ttttgtccta aggtttnggc
361	aggganggtt tcccagatgg aacaggggga tttttagggn tgcacttggg tttncggaag
421	gaaacntcac gacagaggga caggcaaagc ttggccntgg g

ACCESSION No. H73608
ORIGIN

1	aaattttatt aattttattc aggaaagaca ttgactgtta agtttttttt tngggggggg
61	ggtgatgtct tgctattttt taaaaattat atccagacta tgaatttaat atttactacg
121	gctaatcaac tgctcatgtc agtaatcaaa gncagaaatg agccttatac gtacatctac
181	attaaacaca cacacacccc tttaaggggt gctcagtgta gnttctaatg tcagtctgtc
241	cattcaaccc agggcccaag gttgcatcac atcaccaagt tggaatcatg aagacagccc
301	agatttgact gacatgggca cagcagggct ccctcaccac agcccntggc accagttaac
361	tatttctngc tcgngccgaa ttnttgggcc tcgagggcaa ntttccctat tagtnag

ACCESSION No. H99544
ORIGIN

1	gcgnccgccg cccccgcctg ggccgcgctc cccctctccc gctccctccc tccctgctcc
61	aactcctcct ccttctccat gcctctgttc ctcctgctct tacttgtcct gctcctgctg
121	ctcgaggacg ctggagccca gcaaggtgat ggatgtggac acactgtact aggccctgag
181	agtggaaccc ttacatccat aaactaccca cagacctatc ccaacagcac tgtttgtgaa
241	tgggagatcc gtgtaaagat tggganagag gagttcgcat caaatttggt gactttgaca
301	tttgaagatt ctgattcttg tcactntaat tacttgnaga atttataatg ggaattggga
361	gtcagcggaa cttgaaaata aggcaaaata cttggtaggt ctgggggtnt ggcaaaat

ACCESSION No. N45282
ORIGIN

1	ctaggcataa cataaattgt tataattgat cagaatatct tgaatatatt tttacagata
61	actagtggtt tctactagca gattaaaacc aagagaaaat taaaagtaag ttcacattta
121	aaaaaaatta taagcaataa atacagcact acagccacca ctaattctat atacattgga
181	ttacatttaa acaaacactg cattccagaa tgaatatttt atgaataaat gcattggaaa
241	ttaactttag gaaataaaat gacaaattac gaatttagaa aattaaaata tgactttcac
301	aangtaatca cagtaaaatg cagatctaca ttttaaaagc tagaaatttc cccaaattta
361	tttttttgga cagccaagaa gnttgcctta aaaa

ACCESSION No. N48270
ORIGIN

1	tttgcacctt gaaacaattt aataatgtat tacattatag tagcatcaca gcagcagtca
61	ataatgccac tttagacaaa aatcagtatt tccattatgc attctgtgta taagaattca
121	taaatcggta aaagtcattc taagaaaact tggcaaatac agctttggac tggaattggc
181	atttctttgt ctacttttcc ttcccctaga ttctttgttt taaactacag tattcatatt
241	ttaaaatgtt ttaaattatt ttaagacgtt aatatagcag ttacattttt gaatagttat
301	ttgaaagtga ctgtaagata aagttttaga gaatctatta atgggatagg gttgatttac
361	attttcacat ttttcctaaa aatcagcttt ggttttagaa ctgattggtt tttcattttg
421	ggaa

ACCESSION No. N59451
ORIGIN

1	aaaatcactt caagaagcat ttattgagaa tctaagacaa acaccctata ttcaaagagc
61	ttacagttta tggaaaggcc agccaatcaa tatgcaatat ttaagtcttt tcattgaggc
121	aagtgttgat tttgagagca gagagatgat gatcgttttc gagctgagtt accaaggttg
181	gagcttacta aactcacaag ggcagtttca ggaaaggaaa ataccatctg caaaggtata
241	tggctcattc aggggctctc tgaattgtgg ctggagcaaa aggtttgaaa tcttttttct
301	tcccaagaag atgaaagagc tcctggagga cagaaactgc tttttattcc ctttgtatct
361	ctcacagcac ctggatactt aagactaaac tattctttca ctcatatggc ccattatcaa
421	tgtcagcatt gtaaggccct gatggg

ACCESSION No. N95226
ORIGIN

1	tccctttctc cctgtttccc tcccttcttt ccttccttcc ttccttcctt ccttcttaga
61	attcactgaa gtatttccta ggtagccttt tacttactac tttaatcaaa gcttatcttt
121	gtgcccaatg tgtaaaaagt gaaaatgtct cttcgaaatt ctatattaca atatagacag
181	agaagttggg ccttgagggc ttgagtttca cttaaatact atacacatgt ggtatcacac
241	aaggtggagg gggagggaac aaacagaaac ataacaatta tttttattct gtctttacaa
301	aagaaagcct cttctctatg aaaaagtctt tttggcatct gctcccggaa acctgccccg
361	agaacacgtt ccccattgct ttgcaagcat ctctttttaa aagcacanca ctgtccccgg
421	gagtcacgta ggttggatta anctgtctta gttgaccaac gaagaancac tggatgagtt
481	ttccagggat gantggttgt ctggggtgga acatatagtc ctgtctacaa caaatgtaac
541	tcctgatatg ggacnatgaa cncagtgtgt gacccaggag tgnttgatct gtnaacantc
601	gcatgnaatt

ACCESSION No. R37028
ORIGIN

1	ttttttttct ctaagtgata atgatatccc agctagaata attgtgctct ccagaagcaa
61	ttaatctgat ttgcaagcac tgattttttc ttttgcaaaa actaataata ttagcctgac
121	caattatgaa ataattccta aatttacaaa ttcccaaatt tgtgctttca tggcttcctt
181	ctattttaaa tctatattat tttaaacaaa ttttccttaa gnaaaaatga cttaacttca
241	taaaaatcta cccatttatg gtaaataaaa cattaaccaa aaaccaaaat taaagggntt
301	actataaatg gnaacattta cattgctggn tattaaatcc ctttccttgg catt

ACCESSION No. R66605
ORIGIN

1	ttttttatcc ttcttaannn ttattacatg ttttattatc ctgtccccag aggtgggttt
61	atccagaaac caagaaaaaa aatcaatcag aataaactca aaaaaaaaag gtagggggag
121	caaaaccatc aaccaccagg gcagccaggc catcagccca cctccacctc tggagggtcc
181	ccagagaccc acgcccgacg cagacccgga ggaggcatca gcaagggggc ccgggcagag
241	aatcggctat gtctttcatt atgaggaggc agggagagac gggcagagat atgtttgcta
301	gggtgantat atattttata ttaattaaat ccgtaagttt aattaaagta aataggtatt
361	tctctggaag tttttttaat ttctttcntt ttttatagtt tttttggttt tttgtggntt
421	tttttttttt ttttggggtt t

ACCESSION No. T51004
ORIGIN

1	gcagctgttg tcttccaact cagcggcagg tttgctttcc ccacggacac tctggacctt
61	gtagctcctc aagcttccct gtctattgag cagataggaa gccgtgtcaa atatgtggca
121	ccttgaggaa atgcctagtg aatgacagta tgtcctattg tgctctaact ttatttcagc
181	cttatttctt ttctgaatat tatttttcat ttatcttcat ttccttacct attttctttt
241	cttctaaagt atgtatcttt gttagctcca tcatcctttt tgggaatgag gcaagtataa
301	aaataaggta aataaataag gaccccatcc ctaggtattt ttaaggaaac cacccttttg
361	cggggcacac ttggctacct tggggtcttt agggctctgg ggggctttng ggtgtncctc
421	tngggcaggt cctggctggc attggcct

ACCESSION No. T51316
ORIGIN

1	ttcatccgct gcatgtggaa aactggcccg atacctcgca ctacgagttt ctcgccgaca
61	ctatgtggag cgattttgcc tacggtcgca atgccgtata cccggaagcn atcacggcaa
121	cgcanctngt cgcgttatcc cattgaacat tatgagaatc gcgatgtttc ggtcgatggt
181	gcggaaaagc gcggcntgct tcttacttgc cgcattgtgc cgccgattga ccgggaaaag
241	cgattcatgt tgatgttgcg tacatcttgg ggccttgcgt tgagggcgca ccgttcagg

ACCESSION No. T72535
ORIGIN

1	atgacctctg caaagagaag gtcagctata ngtagggaga aaaggaagaa ggcaagaaaa
61	ggagactcga gatgagttta catccaagag aagcacagat gtttgtaatc tacctagaat
121	aatgtgaagt acctgtccag catgtatgct cagatcctcc attcattagc acaagctgaa
181	aacatgaact gcaaattcta caccagcatc ctttgcttcc tccatggcag tgggaggtag
241	caaggggagt ccaacacttc tccatgacgt angaaaggca gggaaaaata ctgnt

ACCESSION No. W72103
ORIGIN

1	gtttgtgaaa aggaacaaaa tgaanttgaa ttggacatgt gctttaagca ngccaacaga
61	caacacacca ctagagacac acatcaaaag caatcacagt gctatgatca aatgatgggt
121	acatgtgaac acatc

All patents, patent applications, provisional applications, and publications referred to or cited herein are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
All nucleotide and/or amino acid sequences associated with accession numbers referred to or cited herein are incorporated by reference in their entirety.
It should be understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application.

Claims

1. A system for predicting clinical outcome for a patient diagnosed with cancer comprising a computing means and a user interface means that enables data entry, wherein said user interface means is coupled to said computing means, and wherein said computing means is configured to perform microarray analysis and binary classification to generate a set of genes used in predicting clinical outcome.

2. The system of claim 1, wherein the microarray analysis is significance analysis of microarrays and the binary classification is a support vector machine.

3. The system of claim 1, wherein the computer is further configured to perform leave-one-out cross validation.
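
Leave-one-out cross validation holds out each sample in turn, trains on the remaining samples, and scores the prediction for the held-out sample. The following is a minimal sketch only, not the claimed implementation. A nearest-centroid rule stands in for the support vector machine so that the example needs no external library, and the function names, expression profiles, and outcome labels are all hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (not an SVM): pick the label whose centroid is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        # Column-wise mean of the training rows that carry this label.
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and return the hit rate."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


# Invented expression profiles (two genes per patient) and outcome labels.
profiles = [[1.0, 0.2], [0.9, 0.1], [1.1, 0.3], [0.1, 1.0], [0.2, 0.9], [0.0, 1.1]]
outcomes = ["good", "good", "good", "poor", "poor", "poor"]
accuracy = leave_one_out_accuracy(profiles, outcomes)
```

Any classifier with the same call signature can be passed as `predict`, so a support vector machine could be substituted without changing the cross-validation loop.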

4. The system of claim 1, wherein the computer comprises a database for storing the set of genes, said computer being further configured to analyze biological information from a patient against the set of genes to generate a predicted clinical outcome.

5. The system of claim 1, wherein the patient is diagnosed with colon cancer.

6. A classifier for predicting clinical outcome in a patient diagnosed with cancer comprising a computing means and a user interface, wherein said computing means comprises a storing means and a means for outputting processed data, wherein said storing means comprises a set of genes classified by outcome, wherein said interface is coupled to said computing means.

7. The classifier of claim 6, wherein said set of genes consists of the following genes: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA976642; AA133215; AA457267; N50073; R38360; AA450205; AA148578; R38640; AA487274; N53172; AA045308; AA045075; N63366; R22340; AA437223; AA481250; AA045793; H87795; AA121806; AA284172; R68106; AA479270; AA432030; R10545; AA453508; AI149393; AA883496; AA167823; AI203139; H19822; W73732; AA777892; AA885478; AA932696; AA481507; H18953; AA709158; AA488652; N39584; H62801; H17638; R43684; N21630; T81317; R45595; T90789; and AA283062.

8. The classifier of claim 6, wherein said set of genes consists of the following genes: AA045075; AA425320; AA437223; AA479270; AA486233; AA487274; AA488652; AA694500; AA704270; AA706226; AA709158; AA775616; AA777892; AA873159; AA969508; AI203139; AI299969; H17364; H17627; H19822; H23551; H62801; H85015; N21630; N36176; N72847; N92519; R27767; R34578; R38360; R43597; R43684; W73732; AA450205; AI081269; R59314; AA702174; AI002566; AA676797; AA453508; W93980; AA045308; AA953396; AA962236; AA418726; R43713; AA664240; AA477404; AA826237; AA007421; AA478952; AA885096; H29032; R10545; AA448641; R38266; H17543; T81317; AA453790; R22340; AA987675; N51543; N74527; AA121778; AA258031; AA702422; T64924; R42984; R59360; R63816; T49061; AA016210; AA682585; AA705040; AA909959; AI240881; AA133215; AA699408; AA910771; AI362799; H51549; R06568; AA001604; AA132065; AA490493; AA633845; AI261561; H81024; N75004; W96216; AA045793; AA284172; AA411324; AA448261; AA479952; AA485752; AA504266; AA630376; AA634261; AA701167; AA703019; AA706041; AA773139; AA776813; AA862465; AA977711; AI288845; H15267; H18956; H73608; H99544; N45282; N48270; N59451; N95226; R37028; R66605; T51004; T51316; T72535; and W72103.

9. The classifier of claim 6, wherein said set of genes consists of the following genes: AA007421; AA045075; AA045308; AA418726; AA425320; AA450205; AA453508; AA453790; AA477404; AA478952; AA479270; AA486233; AA487274; AA664240; AA676797; AA702174; AA706226; AA709158; AA775616; AA826237; AA873159; AA969508; AI002566; AI29969; H17364; H19822; H23551; N36176; N72847; R10545; R27767; R34578; R59314; W73732; AA448641; R59360; AA121778; H51549; H81024; AA490493; R42984; AA258031; AA133215; R63816; N95226; N74527; AA702422; AI261561; AA132065; AI362799; AA045793; AA284172; N51632; AA482110; AA485450; AA699408; N70777; AA993736; AI139498; N59721; AA431885; AA911661; AA775865; R30941; AA703019; AA777192; W72103; H15267; H17638; R60193; R92717; AA706041; AA411324; AA504266; AA932696; AA973494; N45100; AA418410; AA725641; AA954482; H45391; T86932; AA279188; AA485752; AA680132; AA977711; W93370; AA036727; AA071075; AA464612; AA481250; AA598659; AA682905; R17811; W93592; AA017301; AA046406; AA256304; AA416759; AA448261; AA452130; AA457528; AA460542; AA479952; AA481507; AA504342; AA598970; AA630376; AA634261; AA677254; AA757564; AA775888; AA844864; AA862465; AA989139; AI253017; AI394426; H99544; N41021; N45282; N46845; N48270; N59846; R16760; R44546; R92994; T51004; T56281; T70321; and W45025.

10. The classifier of claim 6, wherein said set of genes consists of the following genes: N36176; AA149253; AA425320; AA775616; N72847; AA706226; and AA883496.

11. A method for predicting a clinical outcome for a patient diagnosed with cancer, said method comprising the steps of:

a) classifying at least one gene that correlates with a clinical outcome;

b) establishing a set of reference gene expression levels based on the at least one gene;

c) receiving biological information from the patient;

d) extrapolating from the biological information the level of intracellular expression of said at least one gene;

e) comparing said level of intracellular expression against said set of reference gene expression levels; and

f) predicting a clinical outcome based on the deviation of said level of intracellular expression from said set of reference gene expression levels.
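
Steps (e) and (f) can be illustrated with a minimal sketch, which is not the claimed implementation. The per-gene z-score, the mean-absolute-deviation score, the cutoff of 1.5, the "good"/"poor" labels, the assumption that the reference group consists of good-outcome patients, and every expression value are hypothetical choices made for illustration. Only the accession numbers are taken from the claims.

```python
from statistics import mean, stdev

# Hypothetical reference expression levels (step b). Each gene maps to values
# observed in an assumed good-outcome reference group; the numbers are invented.
REFERENCE = {
    "N36176": [1.0, 1.2, 0.9, 1.1],
    "AA425320": [2.0, 2.1, 1.9, 2.0],
    "N72847": [0.5, 0.6, 0.4, 0.5],
}


def deviation_score(patient, reference):
    """Step e: mean absolute z-score of the patient's levels against the reference."""
    z_scores = []
    for gene, values in reference.items():
        mu, sigma = mean(values), stdev(values)
        z_scores.append(abs(patient[gene] - mu) / sigma)
    return mean(z_scores)


def predict_outcome(patient, reference, cutoff=1.5):
    """Step f: label the patient by how far the profile deviates (cutoff is assumed)."""
    return "poor" if deviation_score(patient, reference) > cutoff else "good"


# Invented patient profiles: one close to the reference, one far from it.
typical_patient = {"N36176": 1.05, "AA425320": 2.0, "N72847": 0.5}
atypical_patient = {"N36176": 3.0, "AA425320": 0.5, "N72847": 2.0}
```

Calling `predict_outcome(typical_patient, REFERENCE)` labels the first profile "good", while the second profile deviates strongly from the reference on every gene and is labeled "poor".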

12. The method of claim 11, wherein identification of said at least one gene is performed with any one or combination of the following: significance analysis of microarrays, cluster analysis, support vector technology, neural network, and leave-one-out cross validation.

13. The method of claim 11, further comprising the step of estimating the accuracy of the predicted clinical outcome.

14. The method of claim 11, wherein the biological information is a clinical specimen of bodily fluid or tissue.

15. The method of claim 14, wherein the biological information is a clinical tumor sample.

16. The method of claim 11, wherein the outcome being evaluated is for a patient diagnosed with colon cancer.

17. The method of claim 11, wherein the predicted clinical outcome is the probability of patient survival at a predetermined date.

18. The method of claim 11, further comprising the step of generating a treatment regimen based on the predicted clinical outcome.

19. The method of claim 11, wherein the gene that is identified is one with the accession number selected from the group consisting of: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA976642; AA133215; AA457267; N50073; R38360; AA450205; AA148578; R38640; AA487274; N53172; AA045308; AA045075; N63366; R22340; AA437223; AA481250; AA045793; H87795; AA121806; AA284172; R68106; AA479270; AA432030; R10545; AA453508; AI149393; AA883496; AA167823; AI203139; H19822; W73732; AA777892; AA885478; AA932696; AA481507; H18953; AA709158; AA488652; N39584; H62801; H17638; R43684; N21630; T81317; R45595; T90789; and AA283062.