US20060195266A1 - Methods for predicting cancer outcome and gene signatures for use therein - Google Patents

Methods for predicting cancer outcome and gene signatures for use therein

Info

Publication number
US20060195266A1
Authority
US
United States
Prior art keywords
protein
homo sapiens
genes
classifier
accession
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/065,794
Inventor
Timothy Yeatman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US11/065,794 (US20060195266A1)
Priority to EP05754399A (EP1894131B1)
Priority to PCT/US2005/017988 (WO2006093507A2)
Priority to US11/134,688 (US10181009B2)
Publication of US20060195266A1
Assigned to NATIONAL INSTITUTES OF HEALTH (NIH), U.S. DEPT. OF HEALTH AND HUMAN SERVICES (DHHS), U.S. GOVERNMENT: CONFIRMATORY LICENSE (SEE DOCUMENT FOR DETAILS). Assignors: UNIVERSITY OF SOUTH FLORIDA
Assigned to US ARMY, SECRETARY OF THE ARMY: CONFIRMATORY LICENSE (SEE DOCUMENT FOR DETAILS). Assignors: UNIVERSITY OF SOUTH FLORIDA

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10: Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • G16B40/20: Supervised data analysis
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/106: Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C12Q2600/118: Prognosis of disease development

Definitions

  • DNA microarrays are also commonly referred to as biochips, DNA chips, gene arrays, gene chips, and genome chips.
  • DNA microarrays exploit a phenomenon known as base-pairing or hybridization.
  • genetic samples are arranged in an orderly manner (typically in a rectangular grid) on a substrate.
  • substrates include microplates and blotting membranes.
  • Many modern microarrays include an array of oligonucleotide or peptide nucleic acid (PNA) probes, and the array is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array on the chip is exposed to labeled sample DNA, hybridized, and the identity and abundance of complementary sequences are determined.
  • PNA: peptide nucleic acid
  • expression or abundance of a gene is a measure of a relative level of activity of the gene in replication or translation in the presence of the probe.
  • arrays of expression levels include metadata describing characteristics of the people whose genetic material is sampled and additional metadata which identifies specific genes whose expression levels are represented in such arrays.
  • microarrays are already being used for a number of beneficial purposes including, for example, identifying biomarkers of cancer (Welsh, J B et al., “Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum,” PNAS, 100(6):3410-3415 (March 2003)), creating gene expression-based classifications of cancers (Alizadeh, A A et al., “Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling,” Nature, 403:503-11 (2000); and Garber, M E et al., “Diversity of gene expression in adenocarcinoma of the lung,” Proc Natl Acad Sci USA, 98:13784-9 (2001)), and in drug discovery (Marton, M J et al., “Drug target validation and identification of secondary drug target effects using Microarrays,” Nat Med, 4(11):1293-301 (1998); and Gray, N S et al.
  • SAM has been used for a variety of purposes, including identifying potential drugs that would be effective in treating various conditions associated with specific gene expressions (Bunney W E, et al., “Microarray technology: a review of new strategies to discover candidate vulnerability genes in psychiatric disorders,” Am J Psychiatry, 160(4):657-66 (April 2003)).
  • SVM: Support Vector Machine
  • the known Support Vector Machine (SVM) is a correlation tool shown to perform well in multiple areas of biological analysis, including evaluating microarray expression data (Brown et al., “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proc Natl Acad Sci USA, 97:262-267 (2000)), detecting remote protein homologies (Jaakkola, T.
  • SVMs utilize the technique of “kernels” to automatically realize a non-linear mapping to a feature space (Furey, T. S. et al., “Support vector machine classification and validation of cancer tissue samples using microarray expression data,” Bioinformatics, 16(10):906-914 (2000)).
  • colon cancer is a deadly disease afflicting nearly 130,000 new patients yearly in the United States.
  • Colon cancer is the only cancer that occurs with approximately equal frequency in men and women.
  • risk factors for the development of colon and/or rectal cancer include older age, excessive alcohol consumption, sedentary lifestyle (Reddy, B.
  • the Dukes' staging system based on the pathological spread of disease through the bowel wall, to lymph nodes, and to distant organ sites such as the liver, has remained the most popular. Despite providing only a relative estimate for cure for any individual patient, the Dukes' staging system remains the standard for predicting colon cancer prognosis, and is the primary means for directing adjuvant therapy.
  • the Dukes' staging system has only been found useful in predicting the behaviour of a population of patients, rather than an individual. For this reason, any patient with a Dukes A, B, or C lesion would be predicted to be alive at 36 months while a patient staged as Dukes D would be predicted to be dead. Unfortunately, application of this staging system results in the potential over-treatment or under-treatment of a significant number of patients. Further, Dukes' staging can only be applied after complete surgical resection rather than after a pre-surgical biopsy.
  • Microarray technology has permitted development of multi-organ cancer classifiers (Giordano, T. J. et al., “Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles,” Am J Pathol, 159:1231-8 (2001); Ramaswamy, S. et al., “Multiclass cancer diagnosis using tumor gene expression signatures,” Proc Natl Acad Sci USA, 98:15149-54 (2001); and Su, A. I. et al., “Molecular classification of human carcinomas by use of gene expression signatures,” Cancer Res, 61:7388-93 (2001)), identification of tumor subclasses (Dyrskjot, L.
  • Vasselli, J R et al. “Predicting survival in patients with metastatic kidney cancer by gene-expression profiling in the primary tumor,” Proc Natl Acad Sci USA, 100:6958-63 (2003); and Takahashi, M. et al., “Gene expression profiling of clear cell renal cell carcinoma: gene identification and prognostic classification,” Proc Natl Acad Sci USA, 98:9754-9 (2001)) in many types of cancer.
  • Classification of patient prognosis by microarray analysis has promise in predicting the long-term outcome of any one individual based on the gene expression profile of the tumor at diagnosis.
  • Inherent to this approach is the hypothesis that every tumor contains informative gene expression signatures, at the time of diagnosis, which can direct the biological behaviour of the tumor over time.
  • little success has been achieved in developing a classifier that will predict colon cancer outcome equivalent to or better than that which is possible using the standard clinicopathologic staging systems (i.e., Dukes' stage system).
  • What is needed is a particularly effective mechanism for analyzing genomic array data to provide a classifier that accurately predicts cancer outcomes, in particular, colon cancer outcomes.
  • genes are classified according to degree of correlation with a clinical outcome for a cancer of interest (such as colon cancer). These genes are used to establish a set of reference gene expression levels (also referred to herein as a “classifier”). Biological information regarding the patient is received and used to extrapolate intracellular gene expression. The intracellular gene expression levels are compared to those in the classifier to predict clinical outcome.
  • a method in which the specific gene signatures for colon cancer are identified.
  • tumor specimens from patients with known outcomes are collected and frozen.
  • the outcomes are linked to a specific core set of genes that are weighted in importance by (1) selecting genes of interest by applying microarray analysis; (2) producing a classifier using support vector machines (SVM); and (3) cross-validating the genes of interest and the classifier by comparing them against an independent set of test data.
  • significance analysis of microarrays (SAM) is utilized to select genes of interest.
  • Genome wide microarray analyses can produce large datasets that can be pattern-matched to clinicopathologic parameters such as patient outcomes and prognosis. Accordingly, the subject invention identifies gene expression signatures that would predict colon cancer outcome more accurately than the well-accepted Dukes' staging system.
  • a group of colon cancer patients was examined to develop a survival classifier, which was subsequently validated using an entirely independent test set of data derived on a different microarray platform at a different performance site.
  • the classifier of the subject invention was ultimately based on a core set of genes selected for their correlation to survival. A number of the genes in the core set demonstrated intrinsic biological significance for colon cancer progression.
  • patients assessed using the subject invention and identified to have poor outcomes may be treated more aggressively or with specific agents (i.e., anti-sense agents, RNA inhibition agents, small molecule inhibitors of the cancer activity, gene therapy, etc.).
  • an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial.
  • FIG. 1A is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when correlated with prognosis/patient survival.
  • FIG. 1B is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when grouped by Dukes' stage B and C.
  • FIG. 2A graphically illustrates a Kaplan-Meier survival curve based on gene expression profiling in accordance with the present invention.
  • FIG. 2B graphically illustrates a Kaplan-Meier survival curve based on Dukes' staging.
  • FIGS. 3A-3C illustrate survival curves for molecular classifiers in accordance with the subject invention.
  • the present invention provides systems and methods for predicting cancer prognosis and outcomes.
  • the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer.
  • the present invention provides a gene expression profile based classifier for predicting cancer outcomes/prognosis. Both microarray analysis and binary classification are used to create the classifier of the invention.
  • the subject invention provides methods for predicting patient outcomes comprising: identifying genes that correlate with a clinical outcome for a cancer of interest (such as colon cancer); establishing a set of reference gene expression levels (also referred to herein as a “classifier”) for said identified genes; receiving biological information regarding the patient; using the biological information to extrapolate intracellular gene expression; and comparing intracellular gene expression levels to those in the classifier to predict clinical outcome.
  • Biological information of the invention includes, but is not limited to, clinical samples of bodily fluids or tissues; DNA profile information; and RNA profile information. Methods for preparing clinical samples for gene expression analysis are well known in the art, and can be carried out using commercially available kits.
  • the subject invention provides methods for predicting colon cancer patient outcomes using a SAM selected set of genes derived from a genome wide analysis of gene expression. Those patients with good and bad prognoses are first clustered into groups that suggest outcome-rich information that is likely present in the gene expression dataset. Subsequently, a supervised SVM analysis identifies a core set of genes that appears in a majority (i.e., 50% or greater, including for example, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%) of the cross validation folds and accurately predicts colon cancer survival. Preferably, a core set of genes that appears in 75% of the cross validation folds is identified by an SVM to be used in predicting colon cancer survival.
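The core-set idea above (keep only genes that recur in a given share of the cross-validation folds) can be sketched as follows. This is a minimal illustration, not the procedure used in the specification: the function name `core_gene_set`, the `min_fraction` parameter, and the gene identifiers are hypothetical, and the per-fold gene lists are assumed to come from whatever selection step is run in each fold.

```python
from collections import Counter


def core_gene_set(selected_per_fold, min_fraction=0.75):
    """Return (gene, fold_count) pairs for genes chosen in at least
    `min_fraction` of the folds, most frequently chosen first."""
    n_folds = len(selected_per_fold)
    if n_folds == 0:
        return []
    # Count each gene at most once per fold.
    counts = Counter(gene for fold in selected_per_fold for gene in set(fold))
    core = [(gene, n) for gene, n in counts.items() if n / n_folds >= min_fraction]
    # Sort by descending frequency, then by name for a stable order.
    core.sort(key=lambda item: (-item[1], item[0]))
    return core


# Hypothetical gene identifiers, for illustration only.
folds = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_C"],
    ["GENE_A", "GENE_B", "GENE_D"],
]
print(core_gene_set(folds))  # [('GENE_A', 4), ('GENE_B', 3)]
```

The fold counts returned alongside each gene also give a simple ranking of how consistently a gene is selected.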
  • a gene core set is derived from a cDNA microarray that includes both named and unnamed genes.
  • the resultant gene set is highly accurate in predicting cancer survival when compared with Dukes staging data from the same patients.
  • a normalized and scaled oligonucleotide-based cancer database is evaluated against a completely independent set of test data derived from a different microarray platform.
  • the subject invention provides a system for predicting clinical outcome in a patient diagnosed with cancer, wherein the system is useful in offering support/advice in making treatment decisions.
  • the system comprises (1) a data storage device for collecting data (i.e., gene data); and (2) a computing means for receiving and analyzing data to accurately determine genes associated with poor or good patient prognosis.
  • a graphical user interface can be included with the systems of the invention to display clinical data as well as enable user-interaction.
  • the system of the invention further includes an intelligence system that can use the analyzed clinical data to classify gene samples and offer support/advice for making clinical decisions (i.e., to interpret predicted clinical outcome and provide appropriate treatment).
  • An intelligence system of the subject invention can include, but is not limited to, artificial neural networks, fuzzy logic, evolutionary computation, knowledge-based systems, and artificial intelligence.
  • the computing means is preferably a digital signal processor, which can automatically and accurately analyze gene data and determine those genes that strongly correlate to clinical outcome.
  • the system of the subject invention is stationary.
  • the system of the invention can be used within a healthcare setting (i.e., hospital, physician's office).
  • the term “patient” refers to humans as well as non-human animals including, and not limited to, mammals, birds, reptiles, amphibians, and fish.
  • Preferred non-human animals include mammals (i.e., mouse, rat, rabbit, monkey, dog, cat, primate, pig).
  • a patient may also include transgenic animals.
  • a patient may be a laboratory animal raised by humans in a controlled environment other than its natural habitat.
  • cancer refers to a malignant tumor (i.e., colon or prostate cancer) or growth of cells (i.e., leukaemia). Cancers tend to be less differentiated than benign tumors, grow more rapidly, show infiltration, invasion, and destruction, and may metastasize. Cancers include, and are not limited to, colon and rectal cancers, fibrosarcoma, myxosarcoma, angiosarcoma, leukaemia, squamous cell carcinoma, basal cell carcinoma, malignant melanoma, renal cell carcinoma, and hepatocellular carcinoma.
  • a “marker gene,” as used herein, refers to any gene or gene product (i.e., protein, peptide, mRNA) that indicates a particular clinicopathological state (i.e., carcinoma, normal dysplasia and outcomes) or indicates a particular cell type, tissue type, or origin.
  • the expression or lack of expression of a marker gene may indicate a particular physiological and/or diseased state of a patient, organ, tissue, or cell.
  • the expression or lack of expression may be determined using standard techniques such as RT-PCR, sequencing, immunochemistry, gene chip analysis, etc.
  • the level of expression of a marker gene is quantifiable.
  • polynucleotide refers to a polymer of nucleotides. Typically, a polynucleotide comprises at least three nucleotides.
  • the polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (i.e., 2-aminoadenosine, 2-thio-thymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deaza
  • tumor refers to an abnormal growth of cells.
  • the growth of the cells of a tumor typically exceeds the growth of normal tissue and tends to be uncoordinated.
  • the tumor may be benign (i.e., lipoma, fibroma, myxoma, lymphangioma, meningioma, nevus, adenoma, leiomyoma, mature teratoma, etc.) or malignant (i.e., malignant melanoma, ovarian cancer, carcinoma in situ, carcinoma, adenocarcinoma, liposarcoma, mesothelioma, squamous cell carcinoma, basal cell carcinoma, colon cancer, lung cancer, etc.).
  • Bodily fluid refers to a mixture of molecules obtained from a patient. Bodily fluids include, but are not limited to, exhaled breath, whole blood, blood plasma, urine, semen, saliva, lymph fluid, meningeal fluid, amniotic fluid, glandular fluid, sputum, feces, sweat, mucus, and cerebrospinal fluid. Bodily fluid also includes experimentally separated fractions of all of the preceding solutions or mixtures containing homogenized solid material, such as feces, tissues, and biopsy samples.
  • Correlating genes to clinical outcomes in accordance with the subject invention can be performed using software on a computing means.
  • the computing means can also be responsible for maintenance of acquired data as well as the maintenance of the classifier system itself.
  • the computing means can also detect and act upon user input via user interface means known to the skilled artisan (i.e., keyboard, interactive graphical monitors) for entering data to the computing system.
  • the computing means further comprises means for storing and means for outputting processed data.
  • the computing means includes any digital instrumentation capable of processing data input from the user. Such digital instrumentation, as understood by the skilled artisan, can process communicated data by applying algorithm and filter operations of the subject invention.
  • the digital instrumentation is a microprocessor, a personal desktop computer, a laptop, and/or a portable palm device.
  • the computing means can be general purpose or application specific.
  • the subject invention can be practiced in a variety of situations.
  • the computing means can directly or remotely connect to a central office or health care center.
  • the subject invention is practiced directly in an office or hospital.
  • the subject invention is practiced in a remote setting, for example, personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, wherein the patient is located some distance from the physician.
  • the computing means is a custom, portable design and can be carried or attached to the health care provider in a manner similar to other portable electronic devices such as a portable radio or computer.
  • the computing means used in accordance with the subject invention can contain at least one user-interface device including, but not limited to, a keyboard, stylus, microphone, mouse, speaker, monitor, and printer. Additional user-interface devices contemplated herein include touch screens, strip recorders, joysticks, and rollerballs.
  • the computing means comprises a central processing unit (CPU) having sufficient processing power to perform algorithm operations in accordance with the subject invention.
  • the algorithm operations including the microarray analysis operations (such as SAM or binary classification), can be embodied in the form of computer processor usable media, such as floppy diskettes, CD-ROMS, zip drives, non-volatile memory, or any other computer-readable storage medium, wherein the computer program code is loaded into and executed by the computing means.
  • the operational algorithms of the subject invention can be programmed directly onto the CPU using any appropriate programming language, preferably using the C programming language.
  • the computing means comprises a memory capacity sufficiently large to perform algorithm operations in accordance with the subject invention.
  • the memory capacity of the invention can support loading a computer program code via a computer-readable storage media, wherein the program contains the source code to perform the operational algorithms of the subject invention.
  • the memory capacity can support directly programming the CPU to perform the operational algorithms of the subject invention.
  • a standard bus configuration can transmit data between the CPU, memory, ports and any communication devices.
  • the memory capacity of the computing means can be expanded with additional hardware and with saving data directly onto external mediums including, for example, without limitation, floppy diskettes, zip drives, non-volatile memory and CD-ROMs.
  • the computing means can also include the necessary software and hardware to receive, route and transfer data to a remote location.
  • the patient is hospitalized, and clinical data generated by a computing means is transmitted to a central location, for example, a monitoring station or to a specialized physician located in a different locale.
  • the patient is in remote communication with the health care provider.
  • patients can be located at personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, and by using the classifier system of the invention, still provide clinical data to the health care provider.
  • mobile stations such as ambulances, and mobile clinics, can monitor patient health by using a portable computing means of the subject invention when transporting and/or treating a patient.
  • clinical data can be transmitted as unprocessed or “raw” signal(s) and/or as processed signal(s).
  • transmitting raw signals allows any software upgrades to occur at the remote location where a computing means is located.
  • both historical clinical data and real-time clinical data can be transmitted.
  • Communication devices such as wireless interfaces, cable modems, satellite links, microwave relays, and traditional telephonic modems can transfer clinical data from a computing means to a healthcare provider via a network.
  • Networks available for transmission of clinical data include, but are not limited to, local area networks, intranets and the open internet.
  • a browser interface for example, NETSCAPE NAVIGATOR or INTERNET EXPLORER, can be incorporated into communications software to view the transmitted data.
  • a browser or network interface is incorporated into the processing device to allow the user to view the processed data in a graphical user interface device, for example, a monitor.
  • the results of algorithm operations of the subject invention can be displayed in the form of interactive graphics.
  • genes can be selected that most closely correlate with selected survival times. Permutation analysis can then be used to estimate the false discovery rate (FDR).
  • FDR: false discovery rate
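A permutation-based FDR estimate of the kind mentioned above might look like the sketch below. It is a simplification under stated assumptions: the per-gene score is a plain absolute difference in group means rather than SAM's moderated statistic, the FDR is taken as the mean number of genes passing the threshold under shuffled labels divided by the number passing with the real labels, and the names `gene_scores`, `permutation_fdr`, and the toy data are hypothetical.

```python
import random
from statistics import mean


def gene_scores(expr, labels):
    """Absolute difference in mean expression between the two label groups.
    `expr` has one row per gene and one column per sample."""
    scores = []
    for row in expr:
        group1 = [v for v, y in zip(row, labels) if y == 1]
        group0 = [v for v, y in zip(row, labels) if y == 0]
        scores.append(abs(mean(group1) - mean(group0)))
    return scores


def permutation_fdr(expr, labels, threshold, n_perm=200, seed=0):
    """Return (number of genes called, estimated FDR) at `threshold`."""
    rng = random.Random(seed)
    called = sum(s >= threshold for s in gene_scores(expr, labels))
    if called == 0:
        return 0, 0.0
    shuffled = list(labels)
    false_calls = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between labels and expression
        false_calls.append(sum(s >= threshold for s in gene_scores(expr, shuffled)))
    return called, min(1.0, mean(false_calls) / called)


# Toy data (assumed): gene 0 differs between groups, genes 1 and 2 do not.
expr = [
    [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    [3.0, 1.0, 2.0, 3.1, 0.9, 2.1],
]
labels = [1, 1, 1, 0, 0, 0]
called, fdr = permutation_fdr(expr, labels, threshold=2.0)
print(called, fdr)
```

In practice the threshold would be tuned until the estimated FDR falls at or below the chosen target.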
  • the resultant mean-centered gene expression vectors can then be clustered and visualized using known computer software (i.e., Cluster 3.0 and Java TreeView 1.03, both of which are provided by de Hoon, M J L, et al., “Open Source Clustering Software,” Bioinformatics 2003, in press).
  • a gene classifier can be constructed to predict a set time of outcome among a set number of patients using microarray data produced on a cDNA platform.
  • the classifier of the subject invention is produced on a computing means using SAM two-class gene selection and support vector machine classification.
  • the SAM procedure is empirically set to select enough genes to satisfy a set FDR. Such selected genes can then be used in a linear support vector machine to classify the samples as having poor or good prognosis.
  • Leave-one-out cross-validation (LOOCV) operation can also be utilized to construct a classifier (i.e., neural network-based classifier) as well as to estimate the prediction accuracy of the classifier of the subject invention.
  • the classification process includes both gene selection and SVM classification creation; therefore, both steps can be performed on each training set after the test example is removed.
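The point that gene selection must be repeated on each training set, after the test example is removed, can be sketched as below. Two substitutions are assumptions made to keep the example dependency-free: genes are ranked by a Welch t statistic instead of SAM, and a nearest-centroid rule stands in for the linear SVM. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_genes(samples, labels, k):
    """Indices of the k genes with the largest |t| between the classes.
    `samples` has one row per sample and one column per gene."""
    scores = []
    for g in range(len(samples[0])):
        good = [s[g] for s, y in zip(samples, labels) if y == 1]
        poor = [s[g] for s, y in zip(samples, labels) if y == 0]
        scores.append((abs(welch_t(good, poor)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid_predict(train, labels, test, genes):
    """Stand-in classifier (NOT an SVM): choose the class with the closer centroid."""
    def centroid(cls):
        rows = [s for s, y in zip(train, labels) if y == cls]
        return [mean(r[g] for r in rows) for g in genes]

    def sq_dist(center):
        return sum((test[g] - c) ** 2 for g, c in zip(genes, center))

    return 1 if sq_dist(centroid(1)) <= sq_dist(centroid(0)) else 0


def loocv(samples, labels, k=2):
    """Leave-one-out CV with gene selection redone inside every fold."""
    predictions, chosen = [], []
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_y, k)  # never sees the held-out sample
        chosen.append(genes)
        predictions.append(nearest_centroid_predict(train, train_y, samples[i], genes))
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    return accuracy, predictions, chosen


# Toy data (assumed): gene 0 separates the classes, genes 1 and 2 are noise.
samples = [
    [5.0, 1.0, 0.2], [5.5, 1.2, 0.9], [4.8, 0.9, 0.4], [5.2, 1.1, 0.7],
    [1.0, 1.0, 0.3], [1.3, 1.1, 0.8], [0.9, 0.8, 0.5], [1.1, 1.2, 0.6],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
accuracy, predictions, chosen = loocv(samples, labels)
print(accuracy, chosen)
```

Selecting genes once on the full dataset and only then cross-validating would let information from the held-out sample leak into the classifier and inflate the accuracy estimate.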
  • samples can be classified as having “good” or “poor” prognosis based on survival for a certain set amount of time. In a preferred embodiment, “good” or “poor” prognosis is based on more or less than 36 months, respectively.
  • the subject invention provides a means for ranking the genes selected.
  • the number of times a particular gene is chosen can be an indicator of the usefulness of that gene for general classification and may imply biological significance.
  • the classifier of the subject invention is prepared by (1) SAM gene selection using a t-test and (2) classification using a neural network.
  • the classifier is prepared after a test sample is left out (from the LOOCV) to avoid bias from the gene selection step. Since the classification problem is a binary decision, a t-test was used for gene selection.
  • a feed-forward back-propagation neural network system (see Rumelhart, D. E. and J. L. McClelland, “Parallel Distributed Processing: Explorations in the Microstructure of Cognition,” Cambridge, Mass.: MIT Press (1986); and Fahlman, S. E., “Faster-Learning Variations on Back-Propagation: An Empirical Study,” Proceedings of the 1988 Connectionist Models Summer School, Los Altos, Calif.: Morgan-Kaufmann (1988))
  • a feed-forward back-propagation neural network with a single layer of 10 units is used. Neural network systems are extremely robust to both the number of genes selected and the level of noise in these genes.
  • Differences between Kaplan-Meier curves can be evaluated using the log-rank test, which is well known to the skilled statistician. This can be performed both for the initial survival analysis and for the classifier results.
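As an illustration of the log-rank comparison mentioned above, the following sketch computes the two-group log-rank chi-squared statistic and its p-value (one degree of freedom). The function name and argument layout are assumptions made for this example; an event flag of 1 denotes death and 0 denotes a censored observation.

```python
from math import erfc, sqrt

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-squared, p-value) with 1 d.f."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = var = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in pooled if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in pooled if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            var += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / var
    # For 1 d.f., the chi-squared upper tail equals erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical survival times in months; every patient has an observed event.
chi2, p = logrank([2, 3, 4, 5, 6], [1] * 5, [20, 25, 30, 35, 40], [1] * 5)
```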
  • the classifier can split the samples into various groups (i.e., two groups: those predicted as good or poor prognosis). Classifier accuracy can be reported to the user both as overall accuracy and as specificity/sensitivity.
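The accuracy figures mentioned above can be computed from paired lists of actual and predicted labels. The sketch below treats "poor" prognosis as the positive class, which is consistent with the document's later use of sensitivity for poor prognosis cases; the function name and the example labels are hypothetical and are not data from the study.

```python
def classifier_metrics(actual, predicted, positive="poor"):
    """Overall accuracy, sensitivity and specificity for a two-class prediction."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

# Hypothetical outcomes for eight patients.
actual = ["poor", "poor", "poor", "good", "good", "good", "good", "good"]
molecular = ["poor", "poor", "good", "good", "good", "good", "poor", "good"]
stage_default = ["good"] * len(actual)  # a rule that calls every case "good"

molecular_metrics = classifier_metrics(actual, molecular)
default_metrics = classifier_metrics(actual, stage_default)
```

A rule that labels every case "good" has 100% specificity but 0% sensitivity, which is why overall accuracy alone can be misleading.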
  • a McNemar's Chi-Squared test is used to compare the molecular classifier with the use of a Dukes' staging classifier.
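A McNemar comparison of two classifiers evaluated on the same patients depends only on the discordant cases, that is, the cases one classifier gets right and the other gets wrong. The sketch below uses the continuity-corrected form of the statistic; the source does not say whether a correction was applied, so that choice, the function name, and the example data are assumptions.

```python
from math import erfc, sqrt

def mcnemar(truth, pred_a, pred_b):
    """Continuity-corrected McNemar chi-squared test; returns (chi2, p-value)."""
    # b: A correct and B wrong; c: A wrong and B correct.
    b = sum(a == t and o != t for t, a, o in zip(truth, pred_a, pred_b))
    c = sum(a != t and o == t for t, a, o in zip(truth, pred_a, pred_b))
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-squared with 1 d.f.

# Hypothetical example: 10 cases favor classifier A, 2 favor B, 8 agree.
truth = ["poor"] * 20
pred_a = ["poor"] * 10 + ["good"] * 2 + ["poor"] * 8
pred_b = ["good"] * 10 + ["poor"] * 2 + ["poor"] * 8
chi2, p = mcnemar(truth, pred_a, pred_b)
```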
  • several permutations of the dataset (i.e., 1,000 permutations) are used to measure the significance of the classifier results as compared to chance.
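A permutation test of this kind can be sketched as follows: the outcome labels are shuffled repeatedly, the evaluation is rerun on each shuffled set, and the p-value is the fraction of permutations that score at least as well as the real labels. The `evaluate` callback is a hypothetical placeholder; in a full analysis it would rerun the entire cross-validated classifier, whereas the toy rule below only thresholds a single feature.

```python
import random

def permutation_p_value(evaluate, labels, n_permutations=1000, seed=0):
    """Fraction of label permutations scoring at least as well as the real labels."""
    rng = random.Random(seed)
    observed = evaluate(labels)
    hits = 0
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if evaluate(shuffled) >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical toy setup: one feature that perfectly tracks the true labels.
feature = [1.0] * 10 + [-1.0] * 10
labels = ["good"] * 10 + ["poor"] * 10

def rule_accuracy(candidate_labels):
    """Accuracy of the fixed rule 'feature > 0 means good' against the given labels."""
    return sum((v > 0) == (y == "good")
               for v, y in zip(feature, candidate_labels)) / len(candidate_labels)

p_value = permutation_p_value(rule_accuracy, labels)
```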
  • a colon cancer survival classifier was developed using 78 tumor samples, including 3 adenomas and 75 cancers. Informative frozen colorectal cancer samples were selected from the Moffitt Cancer Center Tumor Bank (Tampa, Fla.) based on evidence for good (survival >36 mo) or poor prognosis (survival <36 mo) from the Tumor Registry. Dukes' stages can include B, C, and D. In this particular embodiment, survival was measured as last contact minus collection date for living patients, or date of death minus collection date for patients who have died.
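The survival measurement and the 36-month labeling rule described above can be illustrated with a short sketch. The function names, the dates, and the average month length are assumptions made for this example; the source does not state how days were converted to months.

```python
from datetime import date

MONTH_DAYS = 30.4375  # assumed average month length (365.25 / 12)

def survival_months(collection, last_contact=None, death=None):
    """Months from sample collection to death, or to last contact if still alive."""
    end = death if death is not None else last_contact
    return (end - collection).days / MONTH_DAYS

def prognosis(months, cutoff=36.0):
    """Label a case 'good' if survival exceeds the cutoff, otherwise 'poor'."""
    return "good" if months > cutoff else "poor"

# Hypothetical patients.
deceased = survival_months(date(2000, 1, 1), death=date(2001, 1, 1))
living = survival_months(date(2000, 1, 1), last_contact=date(2004, 1, 1))
labels = [prognosis(deceased), prognosis(living)]
```

A living patient with less than 36 months of follow-up would be censored rather than "poor"; this sketch does not handle that case.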
  • the number of samples per Dukes' stage was as follows: 23 patients with stage B, 22 patients with stage C and 30 patients with stage D disease.
  • adenomas can be included to help train the classifier to recognize good prognosis patients
  • Dukes D patients with synchronous metastatic disease can be used to train the classifier to recognize poor prognosis patients.
  • all samples were selected to have at least 36 months of follow-up.
  • the follow-up results in this embodiment showed that thirty-two of the patients survived more than 36 months, while 46 patients died within 36 months.
  • the median follow-up time for all 78 patients was 27.9 months.
  • the median follow-up for the poor prognosis cases (<36 months survival) was 11.7 months and for the good prognosis cases (>36 months survival) it was 64.2 months.
  • RNAzol WAK-Chemie Medical
  • spin column technology Sigma
  • samples can be microdissected (>80% tumor cells) by frozen section guidance and RNA extraction performed using Trizol followed by secondary purification on RNAEasy columns.
  • the samples can then be profiled on cDNA arrays (i.e., TIGR's 32,488-element spotted cDNA arrays, containing 31,872 human cDNAs representing 30,849 distinct transcripts—23,936 unique TIGR TCs and 6,913 ESTs, 10 exogenous controls printed 36 times, and 4 negative controls printed 36-72 times).
  • tumor samples are co-hybridized with a common reference pool in the Cy5 channel for normalization purposes.
  • cDNA synthesis, aminoallyl labeling and hybridizations can be performed according to previously published protocols (see Hegde, P. et al., “A concise guide to cDNA microarray analysis,” Biotechniques; 29:552-562 (2000) and Yang, I. V, et al., “Within the fold: assessing differential expression measures and reproducibility in microarray assays,” Genome Biol; 3:research0062 (2002)).
  • labeled first-strand cDNA is prepared and co-hybridized with labeled samples prepared from a universal reference RNA consisting of equimolar quantities of total RNA derived from three cell lines: CaCO2 (colon), KM12L4A (colon), and U118MG (brain).
  • Array probes are identified and local background can be subtracted in Spotfinder (Saeed, A. I. et al., “TM4: a free, open-source system for microarray data management and analysis,” Biotechniques; 34:374-8 (2003)).
  • Individual arrays can be normalized in MIDAS (see Saeed, A.I. ibid.) using LOWESS (an algorithm known to the skilled artisan for
  • the first and second strand cDNA synthesis can be performed using the SuperScript II System (Invitrogen) according to the manufacturer's instructions except using an oligodT primer containing a T7 RNA polymerase promoter site.
  • Labeled cRNA is prepared using the BioArray High Yield RNA Transcript Labeling Kit (Enzo). Biotin labeled CTP and UTP (Enzo) are used in the reaction together with unlabeled NTP's. Following the IVT reaction, the unincorporated nucleotides are removed using RNeasy columns (Qiagen).
  • Fifteen micrograms of cRNA are fragmented at 94° C. for 35 min in a fragmentation buffer containing 40 mM Tris-acetate pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization, the fragmented cRNA in a 6× SSPE-T hybridization buffer (1 M NaCl, 10 mM Tris pH 7.6, 0.005% Triton) is heated to 95° C. for 5 min and subsequently to 45° C. for 5 min before loading onto the Affymetrix HG_U133A probe array cartridge. The probe array is then incubated for 16 h at 45° C. at constant rotation (60 rpm). The washing and staining procedure can be performed in an Affymetrix Fluidics Station.
  • the probe array can be exposed to several washes (i.e., 10 washes in 6× SSPE-T at 25° C. followed by 4 washes in 0.5× SSPE-T at 50° C.).
  • the biotinylated cRNA can then be stained with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, Oreg.) in 6× SSPE-T for 30 min at 25° C. followed by 10 washes in 6× SSPE-T at 25° C.
  • An antibody amplification step can then follow, using normal goat IgG as blocking reagent, final concentration 0.1 mg/ml (Sigma) and biotinylated anti-streptavidin antibody (goat), final concentration 3 mg/ml (Vector Laboratories). This can be followed by a staining step with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, Oreg.) in 6× SSPE-T for 30 min at 25° C. and 10 washes in 6× SSPE-T at 25° C.
  • the probe arrays are scanned (i.e., at 560 nm using a confocal laser-scanning microscope (Hewlett Packard GeneArray Scanner G2500A)). The readings from the quantitative scanning can then be analyzed by the Affymetrix Gene Expression Analysis Software (MAS 5.0) and normalized to a common mean expression value of 150.
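Scaling an array to a common mean expression value can be illustrated with a few lines of code. This sketch uses a plain arithmetic mean, which is a simplifying assumption: the exact averaging performed by the vendor software is not described in this document. The function name and the intensity values are hypothetical.

```python
def scale_to_target(intensities, target=150.0):
    """Linearly rescale one array so that its mean intensity equals `target`."""
    factor = target / (sum(intensities) / len(intensities))
    return [value * factor for value in intensities]

# Hypothetical raw intensities for a three-probe array.
scaled = scale_to_target([100.0, 200.0, 300.0])
```

Because every array is multiplied by its own single factor, relative differences between probes on the same array are preserved while arrays become comparable to one another.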
  • the first analysis of the colon cancer survival data can be performed using censored survival time (in months) and 500 permutations. Significance analysis of microarrays (SAM) can then be used to select genes most closely correlated to survival. The subset of genes that correspond to an empirically derived, estimated false discovery rate (FDR) is then chosen. This subset of genes can then be used in subsequent analyses.
  • Cluster 3.0 and Java TreeView 1.03 are used to cluster and visualize the SAM-selected genes.
  • a hierarchical clustering algorithm can be chosen, with complete linkage and the correlation coefficient (i.e., Pearson correlation coefficient) as the similarity metric.
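Complete-linkage hierarchical clustering with a Pearson correlation metric can be sketched in plain Python. Here the distance between two profiles is taken as one minus their correlation, and merging stops when the requested number of clusters remains, which corresponds to reading off the highest-level partition of the dendrogram. This is an illustrative stand-in for the named clustering software, and the function names and profiles are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den  # undefined (division by zero) for a constant profile

def complete_linkage(profiles, n_clusters=2):
    """Agglomerative clustering; cluster distance is the largest pairwise 1 - r."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(i, j):
        return 1.0 - pearson(profiles[i], profiles[j])

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

# Hypothetical profiles: two rising and two falling expression patterns.
profiles = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1], [9, 6, 4, 1]]
groups = complete_linkage(profiles, n_clusters=2)
```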
  • the Dukes' staging clusters are manually created in the appropriate format.
  • Clustering software produces a heatmap (see FIGS. 1A and 1B ) and dendrograms.
  • the highest level partition of the SAM-selected genes can then be chosen as a survival grouping.
  • Kaplan-Meier curves can be plotted (see FIGS. 2A and 2B ).
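The values plotted in a Kaplan-Meier curve come from the product-limit estimator: at each time with at least one death, the running survival probability is multiplied by the fraction of at-risk patients who survive that time. A minimal sketch follows, with a hypothetical function name and toy data (event flag 1 for death, 0 for censored).

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve, survival = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up in months; the second and fourth patients are censored.
curve = kaplan_meier([10, 20, 30, 40], [1, 0, 1, 0])
```

Censored patients leave the at-risk set without lowering the curve, which is why the second step in this example drops by half rather than by a third.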
  • SAM survival analysis can be used to identify a set of genes most correlated with censored survival time using the training set tumor samples.
  • a set of 53 genes was found, corresponding to a median expected false discovery rate (FDR) of 28%.
  • genes denoted with (+) indicate a positive correlation to survival time and genes without the (+) notation indicate a negative correlation in survival time (over expression in poor prognosis cases). Included in this list of genes in Table 1 are several genes believed to be biologically significant, such as osteopontin and neuregulin.
  • FIG. 1A presents a graphical representation of the 53 SAM-selected genes (as described above) as a clustered heat map.
  • the red color represents over-expressed genes relative to green, under-expressed genes.
  • FIG. 1A shows only the Dukes' stage B and C cases, whose outcome Dukes' staging predicts poorly. Since only genes correlated with survival are used in clustering, the distinctly illustrated clusters in the heatmap correspond to very different prognosis groups.
  • the 53 SAM-selected genes were also arranged by annotated Dukes' stage in FIG. 1B . Unlike FIG. 1A , where two gene groups were apparent, there was no discernible gene expression grouping when arranged by Dukes' stage.
  • FIG. 2A shows the Kaplan-Meier plot for two dominant clusters of genes correlated with stage B and C test set tumor samples. Clearly, these genes separated the cases into two distinct clusters of patients with good prognosis (cluster 2) and poor prognosis (cluster 1) (P<0.001 using a log rank test).
  • FIG. 2B presents a Kaplan-Meier plot of the survival times of Dukes' stage B and C tumors grouped by stage, showing no statistically significant difference.
  • gene expression profiles separate good and poor prognosis cases better than Dukes' staging. This suggests that a gene-expression based classifier, as provided by the present invention, is more accurate at predicting patient prognosis than the traditional Dukes' staging.
  • Dukes' staging provides only a probability of survival for each member of a population of patients, based on historical statistics. Accordingly, the prognosis of an individual patient can be predicted based on historical outcome probabilities of the associated Dukes' stage. For example, if a Dukes' C survival rate was 55% at 36 months of follow up, any individual Dukes' C patient would be classified as having a good prognosis since more than 50% of patients would be predicted to be alive.
  • a classifier of the invention was compared to the Dukes' clinical staging approach currently in widespread use.
  • a classifier (Classifier A) of the present invention predicted 100%, 69%, 55% and 20% for Adenomas, and Dukes' stages B, C and D cancers, respectively.
  • the overall accuracy was 77% (63% sensitivity/97% specificity).
  • molecular staging identified the good prognosis cases (the “default” classification using Dukes' staging), but also identified poor prognosis cases with a high degree of accuracy, Table 2C.
  • Tables 2A-2C also show the detailed confusion matrix for all samples in the dataset, showing the equivalent misclassification rate of both good and poor prognosis groups by the classifier of the subject invention.
  • Leave-one-out cross-validation technique can be utilized for evaluating the performance of a classifier construction method of the subject invention. This approach tends towards high variance in accuracy estimates, but with low bias.
  • a classifier of the subject invention can be created on all available training data, then tested for accuracy by classifying the left-out example.
  • a classifier was constructed in two steps: first a gene selection procedure was performed with SAM and then a support vector machine was constructed.
  • the gene selection approach used was a univariate selection.
  • SAM calculates false discovery rates empirically through the use of permutation analysis.
  • SAM provides an estimate of the false discovery rate (FDR) along with a list of genes considered significant relative to censored survival. This feature of SAM was used with this particular embodiment to select the number of genes that resulted in the smallest FDR possible. In one embodiment, this FDR was zero.
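The idea of estimating a false discovery rate by permutation can be shown with a deliberately simplified sketch. It is not the SAM algorithm itself: it uses a two-class score with a small constant added to the denominator, a single symmetric threshold, and the median number of genes called under shuffled labels as the estimate of false calls. The embodiment described here works with censored survival times, which this sketch does not model. All names, the constant `s0`, and the data are hypothetical.

```python
import random
from statistics import mean, median, stdev

def scores(data, labels, s0=0.1):
    """Per-gene score: class mean difference over (standard error + s0)."""
    out = []
    for gene in data:
        a = [v for v, y in zip(gene, labels) if y == 1]
        b = [v for v, y in zip(gene, labels) if y == 0]
        se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
        out.append((mean(a) - mean(b)) / (se + s0))
    return out

def estimate_fdr(data, labels, threshold, n_permutations=500, seed=0):
    """Return (estimated FDR, number of genes called) at the given |score| cutoff."""
    rng = random.Random(seed)
    called = sum(abs(d) >= threshold for d in scores(data, labels))
    if called == 0:
        return 0.0, 0
    false_calls = []
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        false_calls.append(sum(abs(d) >= threshold for d in scores(data, shuffled)))
    return min(1.0, median(false_calls) / called), called

# Hypothetical data: rows are genes, columns are 10 samples (5 per class).
data = [[10, 11, 12, 10, 11, 1, 2, 1, 2, 1],   # differs strongly between classes
        [1, 2, 3, 1, 2, 2, 1, 3, 2, 1],        # no class difference
        [5, 4, 5, 4, 5, 4, 5, 4, 5, 4]]        # no class difference
labels = [1] * 5 + [0] * 5
fdr, n_called = estimate_fdr(data, labels, threshold=5.0)
```

Lowering the threshold calls more genes but usually raises the estimated FDR, which is the trade-off behind choosing a gene list at a stated FDR.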
  • the set of 53 genes (significant genes, as described above) at a FDR of 28% was used in this particular embodiment.
  • the samples were clustered as a way of visualizing the SAM results (see FIGS. 1A and 1B ).
  • SVM linear support vector machine
  • the software used for this approach can be implemented in the Weka machine learning toolkit.
  • a linear SVM was then chosen to reduce the potential for overfitting the data, given the small sample sizes and large dimensionality.
  • One further advantage of this approach is the transparency of the constructed model, which is of particular interest when comparing the classifier of the subject invention on two different platforms (see below).
  • a list of 43 genes was selected for use in constructing a second human colorectal cancer survival classifier, in accordance with the present invention.
  • the list of 43 genes is provided in the following Table 3. TABLE 3 Genes used in the cDNA classifier (selected by t-test) and ranked by selection frequency using LOOCV.
  • coli essential cell cycle protein Era Era
  • era E. coli Gprotein homolog
  • Homo sapiens *30 AA007421 Hs.113992 candidate tumor suppressor protein
  • Homo sapiens ⁇ *30 AA478952 Hs.91753 unnamed protein product
  • CCR6 (Evidence is not experimental); chemokine (C—C motif) receptor-like 2 [ Homo sapiens ] *1 H15267 Hs.210863 null 1 H18956 Hs.21035 unnamed protein product [ Homo sapiens ] 1 H73608 Hs.94903 null *1 H99544 Hs.153445 unknown; endothelial and smooth muscle cell-derived neuropilin-like protein [ Homo sapiens ]; endothelial and smooth muscle cell-derived neuropilin-like protein; coagulation factor V/VIII-homology domains protein 1 [ Homo sapiens ] *1 N45282 Hs.201591 calcitonin receptor-like *1 N48270 Hs.45114 Similar to golgi autoantigen, golgin subfamily a, member 6 [ Homo sapiens ] 1 N59451 Hs.48389 null *1 N95226 Hs.22039 KIAA0758 protein;
  • a third human colorectal cancer survival classifier in accordance with the present invention, was prepared using U133A-limited genes selected by LOOCV via statistical analytic tools (i.e., t-test).
  • the list of U133A-limited genes selected using LOOCV via t-test is provided in the following Table 4.
  • the named genes common to both the original classifier (a set of 43 genes) and the U133A-limited classifier are marked with an asterisk.
  • Table 5 illustrates seven genes selected by SAM survival analysis, where osteopontin and neuregulin are noted to be present and in common with the gene lists for all classifiers.
  • Systems and methods of the subject invention can be tested by applying a classifier to an immediately available, well-annotated, independent test set of colon cancer tumor samples (Denmark, as described above) run on the Affymetrix platform.
  • database software such as the Resourcer software from TIGR (see also Tsai J et al., “RESOURCER: a database for annotating and linking microarray resources within and across species,” Genome Biol, 2:software0002.1-0002.4 (2001))
  • genes can be mapped out from the cDNA chip to a corresponding gene on the Affymetrix platform.
  • the linkage is done by common Unigene IDs.
  • 12,951 genes (out of 32,000) were mapped to an Affymetrix U133A GeneChip.
  • probes on the cDNA chip are unknown expressed sequence tag markers (ESTs) which can reduce the number of usable genes identified.
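Linking probes across platforms by a shared UniGene identifier amounts to a dictionary join. In the sketch below, the two cDNA accession and UniGene pairs are taken from the gene table in this document, while the Affymetrix probe identifiers, the unannotated EST entry, and the function name are hypothetical placeholders.

```python
def map_by_unigene(cdna_probes, affy_probes):
    """Return {cDNA probe id: [Affymetrix probe ids]} for shared UniGene IDs."""
    by_unigene = {}
    for probe_id, unigene in affy_probes.items():
        if unigene:
            by_unigene.setdefault(unigene, []).append(probe_id)
    return {cdna: by_unigene[ug]
            for cdna, ug in cdna_probes.items() if ug in by_unigene}

cdna_probes = {
    "AA007421": "Hs.113992",
    "H15267": "Hs.210863",
    "EST_0001": None,            # hypothetical EST with no UniGene annotation
}
affy_probes = {                   # hypothetical probe-set identifiers
    "affy_probe_1": "Hs.113992",
    "affy_probe_2": "Hs.113992",
    "affy_probe_3": "Hs.999999",
}
mapping = map_by_unigene(cdna_probes, affy_probes)
```

Probes without a UniGene annotation, or with an identifier absent from the other platform, simply drop out of the mapping, which is how the usable gene set shrinks.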
  • a classifier of the subject invention can address this lack of correspondence in platforms. Accordingly, in a related embodiment, a U133A-limited cDNA classifier was constructed in accordance with the subject invention by using the identical approach on this reduced set of overlapping genes.
  • Once the U133A-limited cDNA classifier was constructed, a linear scaling factor, based on the expression of a common training set (H. Lee Moffitt Cancer Center & Research Institute, Tampa, Fla.) sample applied to both the cDNA microarrays and the U133A GeneChips, was applied equally to all Affymetrix samples (training set as well as test set samples from Denmark). Using this assumption, the U133A chip value corresponding to a cDNA probe is the ratio of training set to test set sample (on U133A chips). Each of the Affymetrix U133A arrays (both the test set and the reference samples) was scaled to a constant average intensity (150) prior to taking the ratio and the test sample chip values were averaged.
  • FIGS. 3A through 3C illustrate survival curves for molecular classifiers in accordance with the subject invention.
  • FIG. 3A illustrates the survival curve for a cDNA classifier of the subject invention on the 78 training set samples (LOOCV);
  • FIG. 3B illustrates the survival curve for the U133A-limited cDNA classifier (LOOCV);
  • FIG. 3C illustrates the survival curve for an independent test set classification (Denmark test set sample).
  • The confusion matrix and accuracy rates by Dukes' stage are also presented in Tables 6A-6C.
  • the U133A-limited classifier was tested on the test set of colorectal cancer samples from Denmark that were profiled on the Affymetrix U133A platform.
  • the normalized and scaled test-set data were evaluated with the U133A-limited cDNA classifier. Because the Denmark cases included only Dukes' stages B and C, classification of outcome by Dukes' staging would predict all samples to be of good prognosis.
  • the accuracy of the cDNA classifier was reduced from 72% in LOOCV of the training set (Tables 6A-6C) to 68% in the Denmark cross-platform test set (Tables 7A-7C).
  • the classifier of the subject invention was able to predict the outcome for poor prognosis patients (sensitivity) with an accuracy of 55% whereas 0% would be predicted correctly by Dukes' staging.
  • TABLE 7A Accuracy of U133A limited Molecular Staging on Cross-Platform Denmark Independent Test Set.
      Classification Method   Total Accuracy   Sensitivity   Specificity
      Dukes' Staging          64%              0%            100%
      Molecular Staging       68.5%            55%           75%
  • the present invention provides a colon cancer clinical classifier with significant accuracy in LOOCV that exceeds that of Dukes staging.
  • the utility of the classifier of the subject invention can be validated, such as against an independent colon cancer population using a completely different microarray platform.
  • the gene classifier of the subject invention can be based on a core set of genes that have biological significance for any type of cancer, including human colon cancer progression.
  • the molecular staging/classifier of the subject invention provides more accurate predictions of patient outcome than is currently possible with current clinical staging systems, which may, in fact, misclassify patients.
  • a set of genes is derived from a genome wide analysis of gene expression using known microarray analysis techniques (i.e., SAM). By clustering groups of patients with good and bad prognoses, it is illustrated that the prognosis/classifier of the subject invention presents outcome-rich information.
  • a supervised learning analysis can be used to identify a core set of informative genes.
  • a core set of 43 genes was identified that appeared in 75% of the cross validation iterations and accurately predicted colorectal cancer survival. This core set was derived from a 32,000-element cDNA microarray that included both named and unnamed genes. This gene set was highly accurate in predicting survival when compared with Dukes' staging data from the same patients.
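Given per-gene selection counts from cross-validation, extracting a core set is a simple filter on selection frequency. The sketch below uses the 75% cutoff stated above and 78 iterations (one per training sample); the gene names and counts are hypothetical.

```python
def core_genes(selection_counts, n_iterations, min_fraction=0.75):
    """Genes selected in at least `min_fraction` of cross-validation iterations."""
    return sorted(gene for gene, count in selection_counts.items()
                  if count / n_iterations >= min_fraction)

# Hypothetical selection counts over 78 leave-one-out iterations.
counts = {"gene_a": 78, "gene_b": 59, "gene_c": 58, "gene_d": 10}
core = core_genes(counts, n_iterations=78)
```

With 78 iterations, a gene must be selected at least 59 times to reach the 75% cutoff.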
  • a means for validating a prognosis/survival classifier is provided by the present invention.
  • a normalized and scaled oligonucleotide-based colorectal cancer database from Denmark was evaluated based on the Affymetrix U133A GeneChip™.
  • a colorectal cancer classifier (U133A-based cDNA classifier) was produced on the training data set using a limited set of genes common to both the U133A and the cDNA microarray (for 78 genes). The U133A-based cDNA classifier was then applied directly to the normalized and scaled Denmark test population.
  • the classifier of the subject invention can identify those genes that are most biologically significant based on their frequency of appearance in the classification set.
  • those genes that are most biologically significant to colorectal cancer were identified using the classifier provided in Example 1. Specifically, osteopontin and neuregulin have reported biological significance in the context of colorectal cancer.
  • Osteopontin, a secreted glycoprotein and ligand for CD44 and αvβ3, appears to have a number of biological functions associated with cellular adhesion, invasion, angiogenesis and apoptosis (see Fedarko NS et al., “Elevated serum bone sialoprotein and osteopontin in colon, breast, prostate, and lung cancer,” Clin Cancer Res, 7:4060-6 (2001); Yeatman T J and Chambers A F, “Osteopontin and colon cancer progression,” Clin Exp Metastasis, 20:85-90 (2003)).
  • osteopontin was identified as a gene whose expression was strongly associated with colorectal cancer stage progression (Agrawal D et al., “Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling,” J Natl Cancer Inst, 94:513-21 (2002)).
  • INSIG-2, one of the 43 core classifier genes provided in Example 1, was recently identified as an osteopontin signature gene, suggesting that an osteopontin pathway may be prominent in regulating colon cancer survival.
  • neuregulin appeared to have biological significance in the context of colorectal cancer based on frequency of appearance in the classification set of the present invention.
  • Neuregulin, a ligand for tyrosine kinase receptors (ERBB receptors)
  • Neuregulin was recently identified as a prognostic gene whose expression correlated with bladder cancer recurrence (Dyrskjot L, et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat Genet, 33:90-6 (2003)).
  • a therapeutic gene may be identified, which when reintroduced into tumor cells, may arrest or even prevent growth in cancer cells. Additionally, using the classifier of the present invention, a therapeutic gene may be identified that enables increased responsiveness to interventions such as radiation or chemotherapy. Sequences ACCESSION No.
  • AI203139 ORIGIN 1 tttttttgagt ttggcatgtt aatttttatc agcgacttct ggggcctagc accattcccg 61 gaagaaggga gttgtcgggc agggtcctta atgggggttg caattcttgt cttggttggg 121 aaagagccta gctgggaaca ggggtcgttt gtgtagtaac tgtattaagc ACCESSION No.
  • AI288845 ORIGIN 1 tttttagatg ttttaaata catttatttc atgtcgtttg tccccagggt ttggagtttg 61 atgttctgga ccaagcgtag gctctgagca aatgctacca gggctggaga atcagttctg 121 ccacttccta gttaagtgat cttagacaaa tttccgcgccc ttagttttct tctcagagaa 181 atgagactag tcctatccac actatggaca agtggtagga ggcgaaggag ctcacgtttg 241 taaagagcct tgctgagacaaa ttcagtgcttt agca

Abstract

The present invention pertains to specific gene signatures for cancer that are used to predict survival and novel processes for identifying such gene signatures. In one embodiment, gene signatures for human colorectal cancer are identified and outcomes are linked to the specific gene signatures using significance analysis of microarrays (SAM) and support vector machines (SVM) to provide a prognosis/survival classifier.

Description

    CROSS-REFERENCE TO A RELATED APPLICATION
  • This application claims the benefit of U.S. Provisional Application Ser. No. 60/547,871, filed Feb. 25, 2004, which is hereby incorporated by reference in its entirety.
  • BACKGROUND OF THE INVENTION
  • In the last decade, scientists have labored to complete a high-quality, comprehensive sequence of the human genome. With its recent completion, a large number of genomic data sets have been made available in public databases. The available data, however, does not provide explanations regarding which aspects of human biology affect which genes. Researchers are just beginning to explore genomic function.
  • Several technological advances have made it possible to accurately measure cellular constituents and therefore derive profiles. For example, new techniques provide the ability to monitor the expression level of a large number of transcripts at any one time (see, for example, Schena et al., “Quantitative monitoring of gene expression patterns with a complementary DNA micro-array,” Science, 270:467-470 (1995); Lockhart et al., “Expression monitoring by hybridization to high-density oligonucleotide arrays,” Nature Biotechnology, 14:1675-1680 (1996); and Blanchard et al., “Sequence to array: Probing the genome's secrets,” Nature Biotechnology, 14:1649 (1996)). In organisms for which the complete genome is known, it is possible to analyze the transcripts of all genes within the cell. With other organisms, such as humans, for which there is an increasing knowledge regarding the genome, it is possible to simultaneously monitor large numbers of the genes within the cell.
  • One aspect of human biology/genomic function that is of great interest to the medical research community is cancer. Currently, genetic samples have been taken from patients having various stages of various types of cancer. Such samples have provided an extensive genetic data collection. To provide a system of organization, such genetic data are collected in DNA microarrays, which are sometimes commonly referred to as biochips, DNA chips, gene arrays, gene chips, and genome chips.
  • DNA microarrays exploit a phenomenon known as base-pairing or hybridization. To form the array, genetic samples are arranged in an orderly manner (typically in a rectangular grid) on a substrate. Examples of commonly used substrates include microplates and blotting membranes. Many modern microarrays include an array of oligonucleotide or peptide nucleic acid (PNA) probes, and the array is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array on the chip is exposed to labeled sample DNA, hybridized, and the identity/abundance of complementary sequences are determined.
  • There are two major uses of DNA microarray technology. The first involves identification of the gene sequence. The second involves determination of expression level of genes, generally referred to as the abundance of the genes. In particular, expression or abundance of a gene is a measure of a relative level of activity of the gene in replication or translation in the presence of the probe. By analyzing the abundance of various genes in people of various conditions, a relationship between the genetic state of a person, in terms of relative levels of activity of various genes of that person, and that person's condition is assessed. To conduct such analysis, such arrays of expression levels include metadata describing characteristics of the people whose genetic material is sampled and additional metadata which identifies specific genes whose expression levels are represented in such arrays.
  • Microarrays are already being used for a number of beneficial purposes including, for example, identifying biomarkers of cancer (Welsh, J B et al., “Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum,” PNAS, 100(6):3410-3415 (March 2003)), creating gene expression-based classifications of cancers (Alizadeh, A A et al., “Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling,” Nature, 403:503-11 (2000); and Garber, M E et al., “Diversity of gene expression in adenocarcinoma of the lung,” Proc Natl Acad Sci USA, 98:13784-9 (2001)), and in drug discovery (Marton, M J et al., “Drug target validation and identification of secondary drug target effects using Microarrays,” Nat Med, 4(11):1293-301 (1998); and Gray, N S et al., “Exploiting chemical libraries, structure, and genomics in the search for kinase inhibitors,” Science, 281:533-538 (1998)). One tool that has been applied to microarrays to decipher and compare genome expression patterns in biological systems is Significance Analysis of Microarrays, or SAM (Tusher, V. et al., “Significance analysis of microarrays applied to ionizing radiation response,” Proceedings of the National Academy of Sciences, 2001. First published Apr. 17, 2001, 10.1073/pnas.091062498). This statistical method was developed as a cluster tool for use in identifying genes with statistically significant changes in expression. SAM has been used for a variety of purposes, including identifying potential drugs that would be effective in treating various conditions associated with specific gene expressions (Bunney W E, et al., “Microarray technology: a review of new strategies to discover candidate vulnerability genes in psychiatric disorders,” Am J Psychiatry, 160(4):657-66 (April 2003)).
  • The known Support Vector Machine (SVM) (as described in Brown, M. P. S. et al., “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proceedings of the National Academy of Sciences, 97(1):262-67 (2000)) is a correlation tool shown to perform well in multiple areas of biological analysis, including evaluating microarray expression data (Brown et al, “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proc Natl Acad Sci USA, 97:262-267 (2000)), detecting remote protein homologies (Jaakkola, T. et al., “Using the Fisher kernel method to detect remote protein homologies,” Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, Calif. (1999)), and recognizing translation initiation sites (Zien, A. et al., “Engineering support vector machine kernels that recognize translation initiation sites,” Bioinformatics, 16(9):799-807 (2000)). When used for classification, SVMs separate a given set of binary labeled training data with a hyper-plane that is maximally distant from the set of data (the “maximal margin hyper-plane”). Where no linear separation is possible, SVMs utilize the technique of “kernels” to automatically realize a non-linear mapping to a feature space (Furey, T. S. et al., “Support vector machine classification and validation of cancer tissue samples using microarray expression data,” Bioinformatics, 16(10):906-914 (2000)).
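For reference, the maximal margin hyper-plane described above is commonly written as the following optimization problem for linearly separable data. The symbols are introduced here for illustration and do not appear in the source: x_i is the expression vector of sample i, y_i is its class label (+1 or -1), w is the normal vector of the hyper-plane, and b is its offset.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad\text{subject to}\quad
y_i\,\bigl(w^{\top} x_i + b\bigr) \ge 1, \qquad i = 1,\dots,n
```

Under these constraints the margin between the two classes has width 2/\lVert w \rVert, so minimizing \lVert w \rVert maximizes the margin, and a new sample x is assigned the class \operatorname{sign}(w^{\top} x + b).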
  • Ranked as the third most commonly diagnosed cancer and the second leading cause of cancer deaths in the United States (American Cancer Society, “Cancer facts and figures,” Washington, D.C.: American Cancer Society (2000)), colon cancer is a deadly disease afflicting nearly 130,000 new patients yearly in the United States. Colon cancer is the only cancer that occurs with approximately equal frequency in men and women. There are several potential risk factors for the development of colon and/or rectal cancer. Known factors for the disease include older age, excessive alcohol consumption, sedentary lifestyle (Reddy, B. S., “Dietary fat and its relationship to large bowel cancer,” Cancer Res., 41:3700-3705 (1981)), and genetic predisposition (Potter, J D “Colorectal cancer: molecules and populations,” J Natl Cancer Institute, 91:916-932 (1999)).
  • Several molecular pathways have been linked to the development of colon cancer (see, for example, Leeman M F, et al., “New insights into the roles of matrix metalloproteinases in colorectal cancer development and progression,” J Pathol., 201(4):528-34 (2003); Kanazawa, T et al., “Does early polypoid colorectal cancer with depression have a pathway other than adenoma-carcinoma sequence?,” Tumori., 89(4):408-11 (2003); and Notarnicola, M. et al., “Genetic and biochemical changes in colorectal carcinoma in relation to morphologic characteristics,” Oncol Rep., 10(6):1987-91 (2003)), and the expression of key genes in any of these pathways may be affected by inherited or acquired mutation or by hypermethylation. A great deal of research has been performed with regard to identifying genes for which changes in expression may provide an early indicator of colon cancer or a predisposition for the development of colon cancer. Unfortunately, no research has yet been conducted on identifying specific genes associated with colorectal cancer and specific outcomes to provide an accurate prediction of prognosis.
  • Survival of patients with colon and/or rectal cancer depends to a large extent on the stage of the disease at diagnosis. Devised nearly seventy years ago, the modified Dukes' staging system for colon cancer discriminates four stages (A, B, C, and D), primarily based on clinicopathologic features such as the presence or absence of lymph node or distant metastases. Specifically, colonic tumors are classified by four Dukes' stages: A, tumor within the intestinal mucosa; B, tumor into muscularis mucosa; C, metastasis to lymph nodes; and D, metastasis to other tissues. Of the systems available, the Dukes' staging system, based on the pathological spread of disease through the bowel wall, to lymph nodes, and to distant organ sites such as the liver, has remained the most popular. Despite providing only a relative estimate of cure for any individual patient, the Dukes' staging system remains the standard for predicting colon cancer prognosis, and is the primary means for directing adjuvant therapy.
  • The Dukes' staging system, however, has only been found useful in predicting the behaviour of a population of patients, rather than an individual. For this reason, any patient with a Dukes A, B, or C lesion would be predicted to be alive at 36 months while a patient staged as Dukes D would be predicted to be dead. Unfortunately, application of this staging system results in the potential over-treatment or under-treatment of a significant number of patients. Further, Dukes' staging can only be applied after complete surgical resection rather than after a pre-surgical biopsy.
  • Microarray technology, as described above, has permitted development of multi-organ cancer classifiers (Giordano, T. J. et al., “Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles,” Am J Pathol, 159:1231-8 (2001); Ramaswamy, S. et al., “Multiclass cancer diagnosis using tumor gene expression signatures,” Proc Natl Acad Sci USA, 98:15149-54 (2001); and Su, A. I. et al., “Molecular classification of human carcinomas by use of gene expression signatures,” Cancer Res, 61:7388-93 (2001)), identification of tumor subclasses (Dyrskjot, L. et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat Genet, 33:90-6 (2003); Bhattacharjee, A. et al., “Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses,” Proc Natl Acad Sci USA, 98:13790-5 (2001); Garber, M. E. et al., “Diversity of gene expression in adenocarcinoma of the lung,” Proc Natl Acad Sci USA, 98:13784-9 (2001); and Sorlie, T. et al., “Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications,” Proc Natl Acad Sci USA, 98:10869-74 (2001)), discovery of progression markers (Sanchez-Carbayo, M. et al., “Gene Discovery in Bladder Cancer Progression using cDNA Microarrays,” Am J Pathol, 163:505-16 (2003); and Frederiksen, C M, et al., “Classification of Dukes' B and C colorectal cancers using expression arrays,” J Cancer Res Clin Oncol, 129:263-71 (2003)), and prediction of disease outcome (Henshall, S M et al., “Survival analysis of genome-wide gene expression profiles of prostate cancers identifies new prognostic targets of disease relapse,” Cancer Res, 63:4196-203 (2003); Shipp, M A et al., “Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning,” Nat Med, 8:68-74 (2002); Beer, D G et al., “Gene-expression profiles predict survival of patients with lung adenocarcinoma,” Nat Med, 8:816-24 (2002); Pomeroy, S L et al., “Prediction of central nervous system embryonal tumor outcome based on gene expression,” Nature, 415:436-42 (2002); van 't Veer, L J et al., “Gene expression profiling predicts clinical outcome of breast cancer,” Nature, 415:530-6 (2002); Vasselli, J R et al., “Predicting survival in patients with metastatic kidney cancer by gene-expression profiling in the primary tumor,” Proc Natl Acad Sci USA, 100:6958-63 (2003); and Takahashi, M. et al., “Gene expression profiling of clear cell renal cell carcinoma: gene identification and prognostic classification,” Proc Natl Acad Sci USA, 98:9754-9 (2001)) in many types of cancer.
  • Classification of patient prognosis by microarray analysis has promise in predicting the long-term outcome of any one individual based on the gene expression profile of the tumor at diagnosis. Inherent to this approach is the hypothesis that every tumor contains informative gene expression signatures, at the time of diagnosis, which can direct the biological behaviour of the tumor over time. To date, however, little success has been achieved in developing a classifier that will predict colon cancer outcome equivalent to or better than that which is possible using the standard clinicopathologic staging systems (i.e., Dukes' stage system). What is needed is a particularly effective mechanism for analyzing genomic array data to provide a classifier that accurately predicts cancer outcomes, in particular, colon cancer outcomes.
  • BRIEF SUMMARY OF THE INVENTION
  • The present invention provides systems and methods for predicting outcomes in patients diagnosed with cancer. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer. In a specific embodiment, the present invention provides a gene expression profile based classifier that provides a means for accurately predicting colon cancer outcome.
  • In accordance with an aspect of the invention, genes are classified according to degree of correlation with a clinical outcome for a cancer of interest (such as colon cancer). These genes are used to establish a set of reference gene expression levels (also referred to herein as a “classifier”). Biological information regarding the patient is received and used to extrapolate intracellular gene expression. The intracellular gene expression levels are compared to those in the classifier to predict clinical outcome.
  • In one embodiment of the invention, a method is provided in which the specific gene signatures for colon cancer are identified. To do so, tumor specimens from patients with known outcomes are collected and frozen. The outcomes are linked to a specific core set of genes that are weighted in importance by (1) selecting genes of interest by applying microarray analysis; (2) producing a classifier using support vector machines (SVM); and (3) cross-validating the genes of interest and the classifier by comparing them against an independent set of test data. In a preferred embodiment, significance analysis of microarrays (SAM) is utilized to select genes of interest.
  • Genome wide microarray analyses can produce large datasets that can be pattern-matched to clinicopathologic parameters such as patient outcomes and prognosis. Accordingly, the subject invention identifies gene expression signatures that would predict colon cancer outcome more accurately than the well-accepted Dukes' staging system.
  • In one embodiment, a group of colon cancer patients was examined to develop a survival classifier, which was subsequently validated using an entirely independent test set of data derived on a different microarray platform at a different performance site. The classifier of the subject invention was ultimately based on a core set of genes selected for their correlation to survival. A number of the genes in the core set demonstrated intrinsic biological significance for colon cancer progression.
  • With the ability to predict cancer outcomes/prognosis using the subject invention, appropriate treatment protocols can be selected for patients. For example, patients assessed using the subject invention and identified to have poor outcomes may be treated more aggressively or with specific agents (i.e., anti-sense agents, RNA inhibition agents, small molecule inhibitors of the cancer activity, gene therapy, etc.). Accordingly, an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial.
  • DESCRIPTION OF THE FIGURES
  • The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.
  • FIG. 1A is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when correlated with prognosis/patient survival.
  • FIG. 1B is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when grouped by Dukes' stage B and C.
  • FIG. 2A graphically illustrates a Kaplan-Meier survival curve based on gene expression profiling in accordance with the present invention.
  • FIG. 2B graphically illustrates a Kaplan-Meier survival curve based on Dukes' staging.
  • FIGS. 3A-3C illustrate survival curves for molecular classifiers in accordance with the subject invention.
  • DETAILED DISCLOSURE OF THE INVENTION
  • The present invention provides systems and methods for predicting cancer prognosis and outcomes. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer. In a specific embodiment, the present invention provides a gene expression profile based classifier for predicting cancer outcomes/prognosis. Both microarray analysis and binary classification are used to create the classifier of the invention.
  • The subject invention provides methods for predicting patient outcomes comprising: identifying genes that correlate with a clinical outcome for a cancer of interest (such as colon cancer); establishing a set of reference gene expression levels (also referred to herein as a “classifier”) for said identified genes; receiving biological information regarding the patient; using the biological information to extrapolate intracellular gene expression; and comparing intracellular gene expression levels to those in the classifier to predict clinical outcome.
  • Biological information of the invention includes, but is not limited to, clinical samples of bodily fluids or tissues; DNA profile information; and RNA profile information. Methods for preparing clinical samples for gene expression analysis are well known in the art, and can be carried out using commercially available kits.
  • In one embodiment, the subject invention provides methods for predicting colon cancer patient outcomes using a SAM selected set of genes derived from a genome wide analysis of gene expression. Those patients with good and bad prognoses are first clustered into groups that suggest outcome-rich information that is likely present in the gene expression dataset. Subsequently, a supervised SVM analysis identifies a core set of genes that appears in a majority (i.e., 50% or greater, including for example, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%) of the cross validation folds and accurately predicts colon cancer survival. Preferably, a core set of genes that appears in 75% of the cross validation folds is identified by an SVM to be used in predicting colon cancer survival.
  • In one embodiment, a gene core set is derived from a cDNA microarray that includes both named and unnamed genes. The resultant gene set is highly accurate in predicting cancer survival when compared with Dukes staging data from the same patients. To validate a cDNA-based classifier of the subject invention, a normalized and scaled oligonucleotide-based cancer database is evaluated against a completely independent set of test data derived from a different microarray platform.
  • Accordingly, the subject invention provides a system for predicting clinical outcome in a patient diagnosed with cancer, wherein the system is useful in offering support/advice in making treatment decisions. The system comprises (1) a data storage device for collecting data (i.e., gene data); and (2) a computing means for receiving and analyzing data to accurately determine genes associated with poor or good patient prognosis. A graphical user interface can be included with the systems of the invention to display clinical data as well as enable user-interaction.
  • In one embodiment, the system of the invention further includes an intelligence system that can use the analyzed clinical data to classify gene samples and offer support/advice for making clinical decisions (i.e., to interpret predicted clinical outcome and provide appropriate treatment). An intelligence system of the subject invention can include, but is not limited to, artificial neural networks, fuzzy logic, evolutionary computation, knowledge-based systems, and artificial intelligence.
  • In accordance with the subject invention, the computing means is preferably a digital signal processor, which can automatically and accurately analyze gene data and determine those genes that strongly correlate to clinical outcome.
  • In one embodiment, the system of the subject invention is stationary. For example, the system of the invention can be used within a healthcare setting (i.e., hospital, physician's office).
  • Definitions
  • As used herein, the term “patient” refers to humans as well as non-human animals including, and not limited to, mammals, birds, reptiles, amphibians, and fish. Preferred non-human animals include mammals (i.e., mouse, rat, rabbit, monkey, dog, cat, primate, pig). A patient may also include transgenic animals. In certain embodiments, a patient may be a laboratory animal raised by humans in a controlled environment other than its natural habitat.
  • The term “cancer,” as used herein, refers to a malignant tumor (i.e., colon or prostate cancer) or growth of cells (i.e., leukaemia). Cancers tend to be less differentiated than benign tumors, grow more rapidly, show infiltration, invasion, and destruction, and may metastasize. Cancers include, but are not limited to, colon and rectal cancers, fibrosarcoma, myxosarcoma, angiosarcoma, leukaemia, squamous cell carcinoma, basal cell carcinoma, malignant melanoma, renal cell carcinoma, and hepatocellular carcinoma.
  • A “marker gene,” as used herein, refers to any gene or gene product (i.e., protein, peptide, mRNA) that indicates a particular clinicopathological state (i.e., carcinoma, normal dysplasia and outcomes) or indicates a particular cell type, tissue type, or origin. The expression or lack of expression of a marker gene may indicate a particular physiological and/or diseased state of a patient, organ, tissue, or cell. Preferably, the expression or lack of expression may be determined using standard techniques such as RT-PCR, sequencing, immunochemistry, gene chip analysis, etc. In certain particular embodiments, the level of expression of a marker gene is quantifiable.
  • The term “polynucleotide” or “oligonucleotide,” as used herein, refers to a polymer of nucleotides. Typically, a polynucleotide comprises at least three nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (i.e., 2-aminoadenosine, 2-thio-thymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), chemically modified bases, biologically modified bases (i.e., methylated bases), intercalated bases, modified sugars (i.e., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose), or modified phosphate groups (i.e., phosphorothioates and 5′-N-phosphoramidite linkages).
  • As used herein, the term “tumor” refers to an abnormal growth of cells. The growth of the cells of a tumor typically exceeds the growth of normal tissue and tends to be uncoordinated. The tumor may be benign (i.e., lipoma, fibroma, myxoma, lymphangioma, meningioma, nevus, adenoma, leiomyoma, mature teratoma, etc.) or malignant (i.e., malignant melanoma, ovarian cancer, carcinoma in situ, carcinoma, adenocarcinoma, liposarcoma, mesothelioma, squamous cell carcinoma, basal cell carcinoma, colon cancer, lung cancer, etc.).
  • The term “bodily fluid,” as used herein, refers to a mixture of molecules obtained from a patient. Bodily fluids include, but are not limited to, exhaled breath, whole blood, blood plasma, urine, semen, saliva, lymph fluid, meningeal fluid, amniotic fluid, glandular fluid, sputum, feces, sweat, mucus, and cerebrospinal fluid. Bodily fluid also includes experimentally separated fractions of all of the preceding solutions or mixtures containing homogenized solid material, such as feces, tissues, and biopsy samples.
  • Computing Means
  • Correlating genes to clinical outcomes in accordance with the subject invention can be performed using software on a computing means. The computing means can also be responsible for maintenance of acquired data as well as the maintenance of the classifier system itself. The computing means can also detect and act upon user input via user interface means known to the skilled artisan (i.e., keyboard, interactive graphical monitors) for entering data to the computing system.
  • In one embodiment, the computing means further comprises means for storing and means for outputting processed data. The computing means includes any digital instrumentation capable of processing data input from the user. Such digital instrumentation, as understood by the skilled artisan, can process communicated data by applying algorithm and filter operations of the subject invention. Preferably, the digital instrumentation is a microprocessor, a personal desktop computer, a laptop, and/or a portable palm device. The computing means can be general purpose or application specific.
  • The subject invention can be practiced in a variety of situations. The computing means can directly or remotely connect to a central office or health care center. In one embodiment, the subject invention is practiced directly in an office or hospital. In another embodiment, the subject invention is practiced in a remote setting, for example, personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, wherein the patient is located some distance from the physician.
  • In a related embodiment, the computing means is a custom, portable design and can be carried by or attached to the health care provider in a manner similar to other portable electronic devices such as a portable radio or computer.
  • The computing means used in accordance with the subject invention can contain at least one user-interface device including, but not limited to, a keyboard, stylus, microphone, mouse, speaker, monitor, and printer. Additional user-interface devices contemplated herein include touch screens, strip recorders, joysticks, and rollerballs.
  • Preferably, the computing means comprises a central processing unit (CPU) having sufficient processing power to perform algorithm operations in accordance with the subject invention. The algorithm operations, including the microarray analysis operations (such as SAM or binary classification), can be embodied in the form of computer processor usable media, such as floppy diskettes, CD-ROMS, zip drives, non-volatile memory, or any other computer-readable storage medium, wherein the computer program code is loaded into and executed by the computing means. Optionally, the operational algorithms of the subject invention can be programmed directly onto the CPU using any appropriate programming language, preferably using the C programming language.
  • In certain embodiments, the computing means comprises a memory capacity sufficiently large to perform algorithm operations in accordance with the subject invention. The memory capacity of the invention can support loading a computer program code via a computer-readable storage media, wherein the program contains the source code to perform the operational algorithms of the subject invention. Optionally, the memory capacity can support directly programming the CPU to perform the operational algorithms of the subject invention. A standard bus configuration can transmit data between the CPU, memory, ports and any communication devices.
  • In addition, as understood by the skilled artisan, the memory capacity of the computing means can be expanded with additional hardware and with saving data directly onto external mediums including, for example, without limitation, floppy diskettes, zip drives, non-volatile memory and CD-ROMs.
  • Further, the computing means can also include the necessary software and hardware to receive, route and transfer data to a remote location.
  • In one embodiment, the patient is hospitalized, and clinical data generated by a computing means is transmitted to a central location, for example, a monitoring station or to a specialized physician located in a different locale.
  • In another embodiment, the patient is in remote communication with the health care provider. For example, patients can be located at personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, and by using the classifier system of the invention, still provide clinical data to the health care provider. Advantageously, mobile stations, such as ambulances, and mobile clinics, can monitor patient health by using a portable computing means of the subject invention when transporting and/or treating a patient.
  • To ensure patient privacy, security measures, such as encryption software and firewalls, can be employed. Optionally, clinical data can be transmitted as unprocessed or “raw” signal(s) and/or as processed signal(s). Advantageously, transmitting raw signals allows any software upgrades to occur at the remote location where a computing means is located. In addition, both historical clinical data and real-time clinical data can be transmitted.
  • Communication devices such as wireless interfaces, cable modems, satellite links, microwave relays, and traditional telephonic modems can transfer clinical data from a computing means to a healthcare provider via a network. Networks available for transmission of clinical data include, but are not limited to, local area networks, intranets and the open internet. A browser interface, for example, NETSCAPE NAVIGATOR or INTERNET EXPLORER, can be incorporated into communications software to view the transmitted data.
  • Advantageously, a browser or network interface is incorporated into the processing device to allow the user to view the processed data in a graphical user interface device, for example, a monitor. The results of algorithm operations of the subject invention can be displayed in the form of interactive graphics.
  • Dukes' Staging as a Classifier
  • Since Dukes' staging describes the survival of a population of patients, rather than an individual, any individual patient can be classified as alive or dead using the survivorship of the population to predict that of the individual. In other words, if the survival of a Dukes C population is 55% at 36 months of follow up, the Dukes C individual patient would be classified as alive at 36 months but with only a 55% accuracy rate. By making these assumptions, the accuracy of staging by a microarray classifier of the subject invention can be compared with that of a clinical staging system.
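  • By way of non-limiting illustration, the population-based reasoning above can be sketched in a few lines of Python. The function names and the two-stage cohort in the usage note are hypothetical; only the 55% Dukes C figure comes from the text. The sketch assumes that every patient in a stage receives the majority outcome of that stage, so the expected accuracy is the larger of the survival rate and its complement.

```python
def stage_baseline(survival_rate):
    """Classify every patient in a stage by the population majority outcome.

    survival_rate: fraction of the stage population alive at the cutoff
    (for example 0.55 for the Dukes C case described in the text).
    Returns (predicted_label, expected_accuracy).
    """
    if not 0.0 <= survival_rate <= 1.0:
        raise ValueError("survival_rate must be between 0 and 1")
    if survival_rate >= 0.5:
        return "alive", survival_rate
    return "dead", 1.0 - survival_rate


def pooled_accuracy(stages):
    """Expected accuracy over several stages.

    stages: list of (n_patients, survival_rate) tuples (hypothetical input shape).
    """
    total = sum(n for n, _ in stages)
    correct = sum(n * stage_baseline(rate)[1] for n, rate in stages)
    return correct / total
```

  • For the Dukes C case in the text, stage_baseline(0.55) yields the label "alive" with an expected accuracy of 0.55. A pooled figure such as pooled_accuracy([(10, 0.8), (10, 0.2)]) uses made-up cohort sizes and rates purely to show how a staging baseline could be set beside a molecular classifier.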
  • Identification of Prognosis-Related Genes
  • As a first step in the survival analysis of microarray data, genes that best separate cancer patients with poor and good prognosis were identified. Censored-survival analysis using significance analysis of microarrays (SAM) or any other microarray analysis (i.e., clustering methods such as those disclosed by Eisen et al., “Cluster analysis and display of genome-wide expression patterns,” Proc. Natl. Acad. Sci. USA, 95:14863-14868 (1998); Alon et al., “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proc. Natl. Acad. Sci. USA, 96:6745-6750 (1999); and Ben-Dor et al., “Tissue classification with gene expression profiles,” J. Comput. Biol., 7:559-583 (2000); classification trees such as those disclosed by Dubitzky et al., “A database system for comparative genomic hybridization analysis,” IEEE Eng Med Biol Mag, 20(4):75-83 (2001); genetic algorithms such as those disclosed by Li et al., “Computational analysis of leukemia microarray expression data using the GA/KNN,” in Methods of Microarray Data Analysis, Kluwer Academic Publishers (2001); neural networks such as those disclosed by Hwang et al., “Applying machine learning techniques to analysis of gene expression data: cancer diagnosis,” in Methods of Microarray Data Analysis, Kluwer Academic Publishers (2001); and the “Neighborhood Analysis” (a weighted correlation method) as disclosed by Golub et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science, 286:531-537 (1999)) can be used to select genes correlated with prognosis in accordance with the subject invention.
  • Using SAM or any other microarray analysis, genes can be selected that most closely correlate with selected survival times. Permutation analysis can then be used to estimate the false discovery rate (FDR). The resultant mean-centered gene expression vectors can then be clustered and visualized using known computer software (i.e., Cluster 3.0 and Java TreeView 1.03, both of which are provided by Hoon MJLd, et al., “Open Source Clustering Software,” Bioinformatics 2003, in press).
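  • By way of non-limiting illustration, a permutation-based FDR estimate for a two-class gene selection can be sketched as follows. This is a simplified stand-in and not the SAM software itself: the statistic is a SAM-style relative difference with an assumed constant s0, the FDR is taken as the mean number of genes called under shuffled labels divided by the number called with the true labels, and the function names, threshold, and permutation count are all hypothetical.

```python
import math
import random


def d_statistic(values, labels, s0=0.1):
    """SAM-style relative difference for one gene (two-class case).

    values: expression of the gene in each sample; labels: 1 or 0 per sample.
    s0 is an assumed small constant that damps genes with tiny variance.
    """
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    pooled = math.sqrt(ss / (len(a) + len(b) - 2) * (1 / len(a) + 1 / len(b)))
    return (mean_a - mean_b) / (pooled + s0)


def estimate_fdr(matrix, labels, threshold, n_perm=200, seed=0):
    """Return (selected gene indices, estimated FDR) for |d| >= threshold.

    matrix: one row of expression values per gene.
    """
    rng = random.Random(seed)
    called = [g for g, row in enumerate(matrix)
              if abs(d_statistic(row, labels)) >= threshold]
    if not called:
        return called, 0.0
    false_calls = 0
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between expression and outcome
        false_calls += sum(abs(d_statistic(row, shuffled)) >= threshold
                           for row in matrix)
    fdr = min(1.0, false_calls / n_perm / len(called))
    return called, fdr
```

  • In such a sketch, the threshold could be lowered until the estimated FDR reaches a chosen limit, which mirrors the idea of selecting enough genes to satisfy a set FDR.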
  • Classifier Construction and Evaluation
  • According to the present invention, a gene classifier can be constructed to predict a set time of outcome among a set number of patients using microarray data produced on a cDNA platform. In one embodiment, the classifier of the subject invention is produced on a computing means that uses SAM two-class gene selection and support vector machine classification. In one embodiment, the SAM procedure is empirically set to select enough genes to satisfy a set FDR. Such selected genes can then be used in a linear support vector machine to classify the samples as having poor or good prognosis.
  • Leave-one-out cross-validation (LOOCV) operation can also be utilized to construct a classifier (i.e., neural network-based classifier) as well as to estimate the prediction accuracy of the classifier of the subject invention. In one embodiment, the classification process includes both gene selection and SVM classification creation; therefore, both steps can be performed on each training set after the test example is removed. According to the subject invention, samples can be classified as having “good” or “poor” prognosis based on survival for a certain set amount of time. In a preferred embodiment, “good” or “poor” prognosis is based on more or less than 36 months, respectively.
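  • By way of non-limiting illustration, the requirement that gene selection be repeated on each training set after the test example is removed can be sketched as follows. To keep the sketch self-contained, a difference-of-means ranking stands in for SAM and a nearest-centroid rule stands in for the SVM; both substitutions, and all function names, are assumptions made for illustration only. Labels are 1 for good and 0 for poor prognosis.

```python
def top_genes(samples, labels, k):
    """Rank genes by absolute difference of class means; keep the top k."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        good = [s[g] for s, y in zip(samples, labels) if y == 1]
        poor = [s[g] for s, y in zip(samples, labels) if y == 0]
        scores.append(abs(sum(good) / len(good) - sum(poor) / len(poor)))
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]


def centroid_predict(train, labels, genes, sample):
    """Assign the sample to the class whose mean profile is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [s for s, y in zip(train, labels) if y == label]
        dist = sum((sample[g] - sum(r[g] for r in rows) / len(rows)) ** 2
                   for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv(samples, labels, k=2):
    """Leave-one-out cross-validation; returns (accuracy, per-fold gene lists)."""
    hits, folds = 0, []
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # Gene selection sees only the training samples of this fold.
        genes = top_genes(train, train_labels, k)
        folds.append(genes)
        hits += centroid_predict(train, train_labels, genes, samples[i]) == labels[i]
    return hits / len(samples), folds
```

  • Selecting genes once on all samples and only then cross-validating would let the held-out sample influence the gene list, which tends to inflate the accuracy estimate; performing the selection inside the loop avoids that leak.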
  • By using the leave-one-out cross validation approach, the subject invention provides a means for ranking the genes selected. The number of times a particular gene is chosen can be an indicator of the usefulness of that gene for general classification and may imply biological significance.
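  • By way of non-limiting illustration, ranking genes by how often they are chosen across cross-validation folds can be sketched as follows. The function name and the gene identifiers in the usage note are hypothetical; the default cutoff of 0.75 reflects the preferred 75% of folds mentioned elsewhere in this specification.

```python
from collections import Counter


def core_gene_set(fold_gene_lists, min_fraction=0.75):
    """Genes selected in at least min_fraction of folds, most frequent first.

    fold_gene_lists: one list of selected gene identifiers per fold.
    Returns a list of (gene, fraction_of_folds) tuples.
    """
    counts = Counter(g for genes in fold_gene_lists for g in set(genes))
    n_folds = len(fold_gene_lists)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, n / n_folds) for gene, n in ranked
            if n / n_folds >= min_fraction]
```

  • For example, with four folds selecting ["A", "B"], ["A", "C"], ["A", "B"], and ["A", "B"], gene A appears in all folds and gene B in three of four, so both meet the 0.75 cutoff while gene C does not.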
  • In a preferred embodiment, the classifier of the subject invention is prepared by (1) SAM gene selection using a t-test and (2) classification using a neural network. The classifier is prepared after a test sample is left out (from the LOOCV) to avoid bias from the gene selection step. Since the classification problem is a binary decision, a t-test was used for gene selection.
  • Preferably, once a gene set is selected, a feed-forward back-propagation neural network system (see Rumelhart, D. E. and J. L. McClelland, “Parallel Distributed Processing: Exploration in the Microstructure of Cognition,” Cambridge, Mass.: MIT Press (1986); and Fahlman, S. E., “Faster-Learning Variations on Back-Propagation: An Empirical Study,” Proceedings of the 1988 Connectionist Models Summer School, Los Altos, Calif.: Morgan-Kaufmann (1988)) is used. In one embodiment, a feed-forward back-propagation neural network with a single layer of 10 units is used. Neural network systems are extremely robust to both the number of genes selected and the level of noise in these genes.
  • Statistical Significance
  • Differences between Kaplan-Meier curves can be evaluated using the log-rank test, which is well known to the skilled statistician. This can be performed both for the initial survival analysis and for the classifier results. In accordance with the present invention, the classifier can split the samples into various groups (i.e., two groups: those predicted as good or poor prognosis). Classifier accuracy can be reported to the user both as overall accuracy and as specificity/sensitivity. In one embodiment, a McNemar's Chi-Squared test is used to compare the molecular classifier with the use of a Dukes' staging classifier. In a related embodiment, several permutations of the dataset (i.e., 1,000 permutations) are used to measure the significance of the classifier results as compared to chance.
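  • By way of non-limiting illustration, reporting sensitivity/specificity and comparing two classifiers with a McNemar's Chi-Squared test can be sketched as follows. The function names are hypothetical, the choice of "poor" as the positive class is an assumption, and the continuity-corrected form of the statistic is one common variant rather than necessarily the one used in any particular embodiment.

```python
import math


def sensitivity_specificity(truth, predicted, positive="poor"):
    """Sensitivity and specificity, treating `positive` as the event class.

    Assumes both classes occur at least once in `truth`.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def mcnemar(truth, pred_a, pred_b):
    """Continuity-corrected McNemar statistic and p-value (1 degree of freedom).

    Only samples on which exactly one of the two classifiers is correct
    contribute to the statistic.
    """
    only_a = sum(a == t and b != t for t, a, b in zip(truth, pred_a, pred_b))
    only_b = sum(a != t and b == t for t, a, b in zip(truth, pred_a, pred_b))
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0
    stat = (abs(only_a - only_b) - 1) ** 2 / discordant
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))
```

  • In such a comparison, pred_a could hold the molecular classifier's calls and pred_b the calls implied by Dukes' staging for the same patients; a small p-value would suggest that the two classifiers differ in accuracy.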
  • EXAMPLE 1 Human Colon Cancer Survival Classifier
  • Training Set Tumor Samples
  • In one embodiment of the subject invention, a colon cancer survival classifier was developed using 78 tumor samples, including 3 adenomas and 75 cancers. Informative frozen colorectal cancer samples were selected from the Moffitt Cancer Center Tumor Bank (Tampa, Fla.) based on evidence for good (survival >36 mo) or poor prognosis (survival <36 mo) from the Tumor Registry. Dukes' stages can include B, C, and D. In this particular embodiment, survival was measured as last contact minus collection date for living patients, or date of death minus collection date for patients who have died.
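  • By way of non-limiting illustration, the survival measurement and the 36-month labeling described above can be sketched as follows. The function names, the average month length of 30.44 days, and the choice to leave a living patient with less than 36 months of follow-up unlabeled are assumptions made for the sketch and are not taken from the text.

```python
from datetime import date


def survival_months(collected, last_contact=None, died=None):
    """Months from collection to death (if known) or to last contact."""
    end = died if died is not None else last_contact
    if end is None:
        raise ValueError("need a death date or a last-contact date")
    return (end - collected).days / 30.44  # assumed average month length


def prognosis(collected, last_contact=None, died=None, cutoff=36.0):
    """Label a sample by survival relative to the cutoff in months.

    Returns 'good' if survival exceeds the cutoff, 'poor' if the patient
    died at or before it, and None if a living patient has not yet been
    followed long enough to decide (an assumed handling of such cases).
    """
    months = survival_months(collected, last_contact, died)
    if months > cutoff:
        return "good"
    return "poor" if died is not None else None
```

  • For example, a sample collected on 1 Jan. 2000 from a patient last contacted on 1 Jan. 2004 would be labeled "good", whereas the same sample with a death date of 1 Jan. 2001 would be labeled "poor"; the dates are hypothetical.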
  • In this embodiment, the number of samples per Dukes' stage was as follows: 23 patients with stage B, 22 patients with stage C and 30 patients with stage D disease. Just as adenomas can be included to help train the classifier to recognize good prognosis patients, Dukes D patients with synchronous metastatic disease can be used to train the classifier to recognize poor prognosis patients.
  • In a related embodiment, all samples were selected to have at least 36 months of follow-up. The follow-up results in this embodiment showed that thirty-two of the patients survived more than 36 months, while 46 patients died within 36 months. With this particular embodiment, the median follow-up time for all 78 patients was 27.9 months. The median follow-up for the poor prognosis cases (<36 months survival) was 11.7 months and for the good prognosis cases (>36 months survival) it was 64.2 months.
  • Since the NIH consensus conference in 1990, chemotherapeutic application in the United States has been relatively homogeneous, with nearly all Dukes stage B avoiding chemotherapy, and nearly all Dukes stage C receiving 6 months of adjuvant 5-fluorouracil (5-FU) and leucovorin.
  • Test Set Tumor Samples (Denmark)
  • In another embodiment, eighty-eight patients with Dukes' stage B and C colorectal cancer and a minimum follow-up time of 60 months were selected for array hybridization. Ten micrograms of total RNA were used as starting material for the cDNA preparation and hybridized to Affymetrix U133A GeneChips (Santa Clara, Calif.) by standard protocols supplied by the manufacturer. The U133A gene chip is disclosed in U.S. Pat. Nos. 5,445,934; 5,700,637; 5,744,305; 5,945,334; 6,054,270; 6,140,044; 6,261,776; 6,291,183; 6,346,413; 6,399,365; 6,420,169; 6,551,817; 6,610,482; and 6,733,977; and in European Patent Nos. 619,321 and 373,203, all of which are hereby incorporated in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
  • With this particular embodiment, there were 28 patients with stage B and 60 patients with stage C colorectal cancers. All Dukes' stage B patients were treated by surgical resection alone whereas all stage C patients received 5-FU/leucovorin adjuvant chemotherapy in addition to surgery. Colorectal tumor samples were obtained fresh from surgery and were immediately snap-frozen in liquid nitrogen but were not microdissected, with the potential for inclusion of samples with <80% purity. Total RNA was isolated from 50-150 mg tumor sample using RNAzol (WAK-Chemie Medical) or using spin column technology (Sigma) according to the manufacturer's instructions. In this test set, fifty-seven of the patients survived more than 36 months, while 31 died within 36 months.
  • 32K cDNA Array Hybridization and Scanning
  • According to the subject invention, samples can be microdissected (>80% tumor cells) by frozen section guidance and RNA extraction performed using Trizol followed by secondary purification on RNeasy columns. The samples can then be profiled on cDNA arrays (i.e., TIGR's 32,488-element spotted cDNA arrays, containing 31,872 human cDNAs representing 30,849 distinct transcripts (23,936 unique TIGR TCs and 6,913 ESTs), 10 exogenous controls printed 36 times, and 4 negative controls printed 36-72 times).
  • In one embodiment, tumor samples are co-hybridized with a common reference pool in the Cy5 channel for normalization purposes. cDNA synthesis, aminoallyl labeling and hybridizations can be performed according to previously published protocols (see Hegde, P. et al., “A concise guide to cDNA microarray analysis,” Biotechniques; 29:552-562 (2000) and Yang, I. V. et al., “Within the fold: assessing differential expression measures and reproducibility in microarray assays,” Genome Biol; 3:research0062 (2002)). For example, labeled first-strand cDNA is prepared from each tumor sample and co-hybridized with labeled cDNA prepared from a universal reference RNA consisting of equimolar quantities of total RNA derived from three cell lines, CaCO2 (colon), KM12L4A (colon), and U118MG (brain). Detailed protocols and description of the array are available at <http://cancer.tigr.org>. Array probes are identified and local background can be subtracted in Spotfinder (Saeed, A. I. et al., “TM4: a free, open-source system for microarray data management and analysis,” Biotechniques; 34:374-8 (2003)). Individual arrays can be normalized in MIDAS (see Saeed, A. I. ibid.) using LOWESS (an algorithm known to the skilled artisan for use in normalizing data) with the smoothing parameter set to 0.33.
  • Microarray Hybridization and Scanning of Denmark Samples
  • The first and second strand cDNA synthesis can be performed using the SuperScript II System (Invitrogen) according to the manufacturer's instructions except using an oligo-dT primer containing a T7 RNA polymerase promoter site. Labeled cRNA is prepared using the BioArray High Yield RNA Transcript Labeling Kit (Enzo). Biotin labeled CTP and UTP (Enzo) are used in the reaction together with unlabeled NTPs. Following the IVT reaction, the unincorporated nucleotides are removed using RNeasy columns (Qiagen). Fifteen micrograms of cRNA are fragmented at 94° C. for 35 min in a fragmentation buffer containing 40 mM Tris-acetate pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization, the fragmented cRNA in a 6×SSPE-T hybridization buffer (1 M NaCl, 10 mM Tris pH 7.6, 0.005% Triton) is heated to 95° C. for 5 min and subsequently to 45° C. for 5 min before loading onto the Affymetrix HG_U133A probe array cartridge. The probe array is then incubated for 16 h at 45° C. at constant rotation (60 rpm). The washing and staining procedure can be performed in an Affymetrix Fluidics Station.
  • The probe array can be exposed to several washes (i.e., 10 washes in 6×SSPE-T at 25° C. followed by 4 washes in 0.5×SSPE-T at 50° C.). The biotinylated cRNA can then be stained with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, Oreg.) in 6×SSPE-T for 30 min at 25° C. followed by 10 washes in 6×SSPE-T at 25° C. An antibody amplification step can then follow, using normal goat IgG as blocking reagent, final concentration 0.1 mg/ml (Sigma) and biotinylated anti-streptavidin antibody (goat), final concentration 3 mg/ml (Vector Laboratories). This can be followed by a staining step with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, Oreg.) in 6×SSPE-T for 30 min at 25° C. and 10 washes in 6×SSPE-T at 25° C. The probe arrays are scanned (i.e., at 560 nm using a confocal laser-scanning microscope (Hewlett Packard GeneArray Scanner G2500A)). The readings from the quantitative scanning can then be analyzed by the Affymetrix Gene Expression Analysis Software (MAS 5.0) and normalized to a common mean expression value of 150.
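  • As an illustration of normalizing to a common mean expression value of 150, the following minimal sketch rescales one array's signals so that their arithmetic mean equals the target. It is not the MAS 5.0 implementation; the function name and the sample values are hypothetical, and the use of a plain arithmetic mean is an assumption made for simplicity.

```python
def scale_to_target_mean(intensities, target=150.0):
    """Rescale one array's signals so that their arithmetic mean equals `target`.

    Assumption: a plain arithmetic mean is used here for illustration only.
    """
    if not intensities:
        raise ValueError("no intensities supplied")
    current_mean = sum(intensities) / len(intensities)
    if current_mean <= 0:
        raise ValueError("mean intensity must be positive")
    factor = target / current_mean  # one global scaling factor per array
    return [value * factor for value in intensities]


# Hypothetical probe-set signals from a single array (mean is 100).
raw_signals = [40.0, 80.0, 120.0, 160.0]
scaled_signals = scale_to_target_mean(raw_signals)
scaled_mean = sum(scaled_signals) / len(scaled_signals)
```

  • Because a single factor is applied to every signal on the array, ratios between genes within that array are unchanged; only the overall scale is aligned across arrays.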
  • Survival Analysis
  • The first analysis of the colon cancer survival data can be performed using censored survival time (in months) and 500 permutations. Significance analysis of microarrays (SAM) can then be used to select genes most closely correlated to survival. The subset of genes that correspond to an empirically derived, estimated false discovery rate (FDR) is then chosen. This subset of genes can then be used in subsequent analyses. In one embodiment, Cluster 3.0 and Java TreeView 1.03 are used to cluster and visualize the SAM-selected genes.
  • A hierarchical clustering algorithm can be chosen, with complete linkage and the correlation coefficient (i.e., Pearson correlation coefficient) as the similarity metric. In another embodiment, the Dukes' staging clusters are manually created in the appropriate format. Clustering software produces heatmaps (see FIGS. 1A and 1B) and dendrograms. The highest level partition of the SAM-selected genes can then be chosen as a survival grouping. Given two clusters of survival times, Kaplan-Meier curves can be plotted (see FIGS. 2A and 2B).
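  • The similarity metric and linkage rule named above can be sketched as follows. This is a minimal illustration, not the Cluster 3.0 implementation: the function names and the example profiles are hypothetical, and the distance is assumed to be one minus the Pearson correlation coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Profiles with zero variance are not handled in this sketch.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def complete_linkage_distance(cluster_a, cluster_b):
    """Complete linkage: the largest pairwise distance (1 - r) between two clusters."""
    return max(1.0 - pearson(a, b) for a in cluster_a for b in cluster_b)


# Hypothetical expression profiles (one list per sample).
profile_1 = [1.0, 2.0, 3.0]
profile_2 = [2.0, 4.0, 6.0]   # perfectly correlated with profile_1
profile_3 = [3.0, 2.0, 1.0]   # perfectly anti-correlated with profile_1

near = complete_linkage_distance([profile_1], [profile_2])
far = complete_linkage_distance([profile_1, profile_2], [profile_3])
```

  • An agglomerative procedure would repeatedly merge the pair of clusters with the smallest such distance; the last merge corresponds to the highest level partition referred to above.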
  • Identification of Prognosis-Related Genes
  • According to the subject invention, SAM survival analysis can be used to identify a set of genes most correlated with censored survival time using the training set tumor samples. In one embodiment, a set of 53 genes was found, corresponding to a median expected false discovery rate (FDR) of 28%. These genes are listed in the following Table 1, wherein genes denoted with (+) indicate a positive correlation with survival time and genes without the (+) notation indicate a negative correlation with survival time (overexpression in poor prognosis cases). Included in this list of genes in Table 1 are several genes believed to be biologically significant, such as osteopontin and neuregulin.
    TABLE 1
    Censored survival analysis using SAM, resultant 53 genes selected with median
    28% FDR
    GenBank ID    UniGene ID    Description
    N36176 Hs.108636 membrane protein CH1
    AA149253 Hs.107987 N/A
    AA425320 Hs.250461 hypothetical protein; MDG1; similar to putative microvascular
    endothelial differentiation gene 1; similar to X98993 (PID: g1771560)
    AA775616 Hs.313 OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin, bone
    sialoprotein I, early T-lymphocyte activation 1)
    N72847 Hs.125221 Alu subfamily SP sequence contamination warning entry. [Human]
    {Homo sapiens}
    AA706226 Hs.113264 neuregulin 2 isoform 4
    AA976642 Hs.42116 axin 2 (conductin, axil)
    AA133215 Hs.32989 Receptor activity-modifying protein 1 precursor (CRLR
    activity-modifying protein 1)
    AA457267 Hs.70669 P19 protein; HMP19 protein
    N50073 Hs.84926 hypothetical protein
    R38360 Hs.145567 Unknown {Homo sapiens}
    AA450205 Hs.8146 translocation protein-1; Sec62; Dtrp1 protein; membrane protein
    SEC62, S. cerevisiae, homolog of [Homo sapiens];
    AA148578 Hs.110956 KOX 13 protein (56 AA)
    R38640 Hs.89584 insulinoma-associated 1; bA470C13.2 (insulinoma-associated protein 1)
    AA487274 Hs.48950 heptacellular carcinoma novel gene-3 protein; DAPPER1
    N53172 Hs.23016 orphan receptor; orphan G protein-coupled receptor RDC1
    AA045308 Hs.7089 insulin induced protein 2; INSIG-2 membrane protein
    AA045075 Hs.62751 syntaxin 7
    N63366 Hs.161488 N/A
    R22340 null chr2 synaptotagmin; KIAA1228 protein
    AA437223 Hs.46640 Adult retina protein
    AA481250 Hs.154138 chitinase precursor; chitinase 3-like 2; chondrocyte protein 39
    AA045793 Hs.6790 hypothetical protein; MDG1; similar to putative microvascular
    endothelial differentiation gene 1; similar to X98993 (PID: g1771560);
    microvascular endothelial differentiation gene 1 product; microvascular
    endothelial differentiation gene 1; DKFZP564F1862 p
    H87795 Hs.233502 N/A
    AA121806 Hs.84564 Rab3c; hypothetical protein BC013033
    AA284172 Hs.89385 NPAT; predicted amino acids have three regions which share similarity
    to annotated domains of transcriptional factor oct-1,
    nucleolus-cytoplasm shuttle phosphoprotein and protein kinases; NPAT; nuclear
    protein, ataxia-telangiectasia locus; Similar to nuc
    R68106 Hs.233450 Fc-gamma-RIIb2; precursor polypeptide (AA −42 to 249); IgG Fc
    receptor; IgG Fc receptor; IgG Fc receptor beta-Fc-gamma-RII; IgG Fc
    fragment receptor precursor; Fc gamma RIIB [Homo sapiens]; Fc
    gamma RIIB [Ho
    AA479270 Hs.250802 Diff33 protein homolog; KIAA1253 protein [Homo sapiens];
    KIAA1253 protein [Homo sapiens]
    AA432030 Hs.179972 Interferon-induced protein 6-16 precursor (Ifi-6-16). [Human] {Homo
    sapiens}
    R10545 Hs.148877 dJ425C14.2 (Placental protein
    AA453508 Hs.168075 transportin; karyopherin (importin) beta 2 [Homo sapiens]; karyopherin
    beta 2; importin beta 2; transportin; M9 region interaction protein [Homo
    sapiens]
    AI149393 Hs.9302 phosducin-like protein; phosducin-like protein; phosducin-like protein;
    phosducin-like protein; hypothetical protein; phosducin-like; Unknown
    (protein for MGC: 14088) [Homo sapiens]
    AA883496 Hs.125778 Null
    AA167823 Hs.112058 CD27BP {Homo sapiens}
    AI203139 Hs.180370 hypothetical protein FLJ30934 [Homo sapiens]
    +H19822 Hs.2450 KIAA0028; leucyl-tRNA synthetase, mitochondrial [Homo sapiens];
    leucyl-tRNA synthetase, mitochondrial [Homo sapiens]; leucine-tRNA
    ligase precursor; leucine translase [Homo sapiens]
    +W73732 Hs.83634 Null
    +AA777892 Hs.121939 Null
    +AA885478 Hs.125741 unnamed protein product [Homo sapiens]; hypothetical protein
    FLJ12505 [Homo sapiens]; Unknown (protein for MGC: 39884) [Homo
    sapiens]
    +AA932696 Hs.8022 TU3A protein; TU3A protein [Homo sapiens]
    +AA481507 Hs.159492 unnamed protein product [Homo sapiens]
    +H18953 Hs.15232 Null
    +AA709158 Hs.42853 put. DNA binding protein; put. DNA binding protein; cAMP responsive
    element binding protein-like 1; Creb-related protein [Homo sapiens]
    +AA488652 Hs.4209 HSPC235; ribosomal protein L2; Similar to ribosomal protein,
    mitochondrial, L2 [Homo sapiens]; mitochondrial ribosomal protein
    L37; ribosomal protein, mitochondrial, L2 [Homo sapiens]
    +N39584 Hs.17404 Null
    +H62801 Hs.125059 Unknown (protein for IMAGE: 4309224) [Homo sapiens]; hypothetical
    protein [Homo sapiens]
    +H17638 Hs.17930 dJ1033B10.2.2 (chromosome 6 open reading frame 11 (BING4),
    isoform 2) [Homo sapiens]
    +R43684 Hs.165575 dJ402G11.5 (novel protein similar to yeast and bacterial predicted
    proteins) {Homo sapiens}
    +N21630 Hs.143039 hypothetical protein PRO1942
    +T81317 Hs.189846 Alu subfamily J sequence contamination warning entry. [Human]
    {Homo sapiens}
    +R45595 Hs.23892 Null
    +T90789 Hs.121586 ray; small GTP binding protein RAB35 [Homo sapiens]; RAB35,
    member RAS oncogene family,; ras-related protein rab-1c (GTP-binding
    protein ray) [Homo sapiens]
    +AA283062 Hs.73986 Similar to CDC-like kinase 2 {Homo sapiens}

    Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 1 are hereby incorporated by reference.
  • FIG. 1A presents a graphical representation of the 53 SAM-selected genes (as described above) as a clustered heat map. The red color represents over-expressed genes relative to green, under-expressed genes. FIG. 1A shows only the Dukes' stage B and C cases, whose outcome Dukes' staging predicts poorly. Since only genes correlated with survival are used in clustering, the distinctly illustrated clusters in the heatmap correspond to very different prognosis groups.
  • The 53 SAM-selected genes were also arranged by annotated Dukes' stage in FIG. 1B. Unlike FIG. 1A, where two gene groups were apparent, there was no discernible gene expression grouping when arranged by Dukes' stage.
  • FIG. 2A shows the Kaplan-Meier plot for two dominant clusters of genes correlated with stage B and C test set tumor samples. Clearly, these genes separated the cases into two distinct clusters of patients with good prognosis (cluster 2) and poor prognosis (cluster 1) (P<0.001 using a log rank test). FIG. 2B presents a Kaplan-Meier plot of the survival times of Dukes' stage B and C tumors grouped by stage, showing no statistically significant difference.
  • As illustrated in FIGS. 1A, 1B, 2A, and 2B, gene expression profiles separate good and poor prognosis cases better than Dukes' staging. This suggests that a gene-expression based classifier, as provided by the present invention, is more accurate at predicting patient prognosis than the traditional Dukes' staging.
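  • The Kaplan-Meier curves and log rank test referred to above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and all survival times are hypothetical and are not data from the study, and the log rank P value is taken from a chi-square distribution with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    events[i] is True when a death was observed and False when censored.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi_square, p_value)."""
    death_times = sorted({t for t, e in zip(times_a + times_b, events_a + events_b) if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi_square, math.erfc(math.sqrt(chi_square / 2.0))


# Hypothetical follow-up times in months; False marks a censored patient.
curve = kaplan_meier([5, 8, 12, 20, 30], [True, False, True, True, False])
chi_square, p_value = log_rank([1, 2, 3], [True] * 3, [10, 11, 12], [True] * 3)
```

  • Plotting each group's curve as a step function gives a figure of the kind shown in FIGS. 2A and 2B, and the log rank P value summarizes whether the two curves differ.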
  • Dukes' Staging as a Prognosis Classifier
  • As noted above, Dukes' staging provides only a probability of survival for each member of a population of patients, based on historical statistics. Accordingly, the prognosis of an individual patient can be predicted based on historical outcome probabilities of the associated Dukes' stage. For example, if a Dukes' C survival rate was 55% at 36 months of follow up, any individual Dukes' C patient would be classified as having a good prognosis since more than 50% of patients would be predicted to be alive.
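  • The decision rule described above can be written as a short sketch. The function name and the 50% cutoff parameter are illustrative assumptions; the 55% figure is the example rate given in the text, and the second rate is purely hypothetical.

```python
def predict_from_stage_rate(stage_survival_rate, cutoff=0.5):
    """Assign every patient of a stage the majority historical outcome for that stage."""
    return "good" if stage_survival_rate > cutoff else "poor"


# Example from the text: a Dukes' C survival rate of 55% at 36 months.
dukes_c_call = predict_from_stage_rate(0.55)
# Hypothetical rate below the cutoff, for contrast.
low_rate_call = predict_from_stage_rate(0.45)
```

  • Because the call depends only on the stage, every patient within a stage receives the same prediction, which is why staging used this way cannot separate good and poor prognosis cases within a stage.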
  • Performance of a Colorectal Cancer Survival Classifier of the Present Invention as Compared to Dukes' Staging
  • In order to determine the value of the human colon cancer prognosis/survival classifier of the subject invention, a classifier of the invention was compared to the Dukes' clinical staging approach currently in widespread use. In an initial set of 78 tumors (from the test set tumor samples described above), a classifier (Classifier A) of the present invention predicted 100%, 69%, 55% and 20% for Adenomas, and Dukes' stages B, C and D cancers, respectively. The overall accuracy was 77% (63% sensitivity/97% specificity).
  • Using LOOCV, Classifier A was evaluated in predicting prognosis for each patient at 36 months follow-up as compared to Dukes' staging predictions. The results of LOOCV demonstrated that Classifier A of the subject invention was 90% accurate (93% sensitivity/84% specificity) in predicting the correct prognosis for each patient at 36 months of follow-up. A log-rank test of the two predicted groups (good and poor prognosis) was significant (P<0.001), demonstrating the ability of Classifier A to distinguish the two outcomes (FIG. 2A). Permutation analysis demonstrates the result is better than expected by chance (P<0.001; 1000 permutations).
  • This result is also significantly higher than that observed using Dukes' staging as a classifier (77%) for the same group of patients (P=0.03878). The results for both Dukes' staging and molecular staging are summarized in Tables 2A-2C below. Table 2A shows the relative accuracies of Dukes' staging and the cDNA classifier (molecular staging) for all tumors, and Table 2B compares them by Dukes' stage. As shown in Table 2B, Dukes' staging was particularly bad at predicting outcome for patients with stage B and C disease (70% and 55% for stages B and C, respectively). In contrast, molecular staging, as provided by the present invention, identified not only the good prognosis cases (the “default” classification using Dukes' staging) but also the poor prognosis cases with a high degree of accuracy (Table 2C). Table 2C also shows the detailed confusion matrix for all samples in the dataset, with similar misclassification rates for the good and poor prognosis groups by the classifier of the subject invention.
    TABLE 2A
    LOOCV Accuracy of Dukes' vs. Molecular Staging for all tumors.
    Classification Method    Total Accuracy    Sensitivity    Specificity
    Dukes' Staging           77%               63%            97%
    Molecular Staging        *90%              93%            84%

    TABLE 2B
    Comparison of Molecular Staging and Dukes' Staging Accuracy.
    Dukes' Stage    Molecular Staging    Dukes' Staging
    Adenoma         100%                 100%
    B               87%                  70%
    C               91%                  55%
    D               90%                  97%

    TABLE 2C
    Confusion Matrix of cDNA Classifier Results.
    Observed/Predicted    Poor    Good    Totals
    Poor                  43      3       46
    Good                  5       27      32
    Total                 48      30      78

    *Dukes' staging vs. cDNA Classifier, P = 0.03878, one-sided McNemar's test.
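  • The accuracy, sensitivity and specificity reported for molecular staging can be recomputed from the confusion matrix of Table 2C. The sketch below assumes that poor prognosis is treated as the positive class, which is consistent with the reported 93% sensitivity and 84% specificity; the function name is illustrative.

```python
def classifier_metrics(true_pos, false_neg, false_pos, true_neg):
    """Accuracy, sensitivity and specificity from the four confusion matrix cells."""
    total = true_pos + false_neg + false_pos + true_neg
    return {
        "accuracy": (true_pos + true_neg) / total,
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
    }


# Table 2C, with "poor" as the positive class (an assumption):
# observed poor: 43 predicted poor, 3 predicted good
# observed good: 5 predicted poor, 27 predicted good
metrics = classifier_metrics(true_pos=43, false_neg=3, false_pos=5, true_neg=27)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

  • Rounded to whole percentages, these values agree with the molecular staging row of Table 2A (70 of 78 tumors correct overall, 43 of 46 poor prognosis cases and 27 of 32 good prognosis cases).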

    Classifier Construction
  • A leave-one-out cross-validation (LOOCV) technique can be utilized for evaluating the performance of a classifier construction method of the subject invention. This approach tends towards high variance in accuracy estimates, but with low bias.
  • Within each step of the leave-one-out cross-validation (or fold), a classifier of the subject invention can be created on all available training data (every sample except the one left out), then tested for accuracy by classifying the left-out example. In one embodiment, a classifier was constructed in two steps: first a gene selection procedure was performed with SAM and then a support vector machine was constructed.
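  • The fold structure described above, with gene selection repeated inside every fold, can be sketched as follows. This is not the SAM plus support vector machine pipeline of the embodiment: as clearly labeled stand-ins, genes are ranked by the absolute difference of class means and samples are classified by the nearest class centroid. All names and expression values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def select_genes(samples, labels, k):
    """Stand-in for SAM: rank genes by absolute difference of class means, keep top k."""
    scores = []
    for g in range(len(samples[0])):
        good = [s[g] for s, y in zip(samples, labels) if y == "good"]
        poor = [s[g] for s, y in zip(samples, labels) if y == "poor"]
        scores.append((abs(mean(good) - mean(poor)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def nearest_centroid(train, labels, genes, query):
    """Stand-in for the SVM: assign the class whose centroid is closest to the query."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels)):
        rows = [s for s, y in zip(train, labels) if y == label]
        centroid = [mean([r[g] for r in rows]) for g in genes]
        dist = sum((query[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(samples, labels, k):
    """Leave each sample out once; select genes and train only on the remaining samples."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, k)  # selection happens inside the fold
        if nearest_centroid(train, train_labels, genes, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical data: 6 samples x 3 genes; only gene 0 separates the classes.
samples = [[1.0, 5.0, 3.0], [1.2, 4.0, 3.1], [0.9, 6.0, 2.9],
           [3.0, 5.1, 3.0], [3.2, 4.2, 3.2], [2.9, 5.9, 2.8]]
labels = ["good", "good", "good", "poor", "poor", "poor"]
accuracy = loocv_accuracy(samples, labels, k=1)
```

  • Repeating gene selection inside each fold keeps the left-out sample from influencing which genes are chosen, so the accuracy estimate is not inflated by selection on the test example.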
  • In a related embodiment, the gene selection approach used was a univariate selection. SAM (significance analysis of microarrays) was the method chosen for selecting genes. Since gene selection was to be based on two classes (good vs. poor prognosis), the two-class SAM method can be used for selecting genes with the best d values. SAM calculates false discovery rates empirically through the use of permutation analysis. SAM provides an estimate of the false discovery rate (FDR) along with a list of genes considered significant relative to censored survival. This feature of SAM was used with this particular embodiment to select the number of genes that resulted in the smallest FDR possible. In one embodiment, this FDR was zero.
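  • The idea of estimating a false discovery rate empirically from permutations can be illustrated with a simplified sketch. This is not the full SAM procedure, which compares ordered statistics against their expected values under permutation and applies further corrections. Here, as an assumption made for illustration, a gene is called significant when the absolute value of its score reaches a fixed threshold; all names and scores are hypothetical.

```python
from statistics import median


def estimate_fdr(observed_scores, permuted_score_sets, threshold):
    """Median number of genes called under permuted labels, divided by the number
    called with the true labels (a simplified permutation-based FDR estimate)."""
    called = sum(1 for s in observed_scores if abs(s) >= threshold)
    if called == 0:
        return 0.0
    false_calls = [sum(1 for s in scores if abs(s) >= threshold)
                   for scores in permuted_score_sets]
    return min(1.0, median(false_calls) / called)


# Hypothetical per-gene scores with the true labels and with three label permutations.
observed = [4.1, 3.5, 0.2, -0.4, 2.9]
permuted = [
    [1.0, 3.1, 0.1, -0.2, 0.5],
    [0.3, -0.1, 0.9, 1.2, -0.6],
    [3.3, 0.2, -3.4, 0.1, 0.4],
]
fdr = estimate_fdr(observed, permuted, threshold=3.0)
```

  • Raising the threshold calls fewer genes but usually lowers the estimated FDR, which is the trade-off used when choosing the size of the gene list.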
  • The set of 53 genes (significant genes, as described above) at an FDR of 28% was used in this particular embodiment. Using this subset of 53 genes, the samples were clustered as a way of visualizing the SAM results (see FIGS. 1A and 1B). Once the genes were selected using the SAM method, a linear support vector machine (SVM) was constructed. This approach can be implemented using the Weka machine learning toolkit. A linear SVM was chosen to reduce the potential for overfitting the data, given the small sample sizes and large dimensionality. One further advantage of this approach is the transparency of the constructed model, which is of particular interest when comparing the classifier of the subject invention on two different platforms (see below).
  • In another embodiment, using LOOCV via statistical analytic tools for comparing groups (i.e., parametric tests such as t-test/ANOVA; see also Dyrskjot L et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat. Genet., 33:90-6 (2003)), a list of 43 genes (from the 53 SAM selected genes as described above) was selected for use in constructing a second human colorectal cancer survival classifier, in accordance with the present invention. The list of 43 genes is provided in the following Table 3.
    TABLE 3
    Genes used in the cDNA classifier (selected by t-test) and ranked by selection
    frequency using LOOCV.
    Number Times Occurred    GenBank ID    UniGene ID    Description
    M*78 AA045075 Hs.62751 syntaxin 7
    M*78 AA425320 Hs.250461 hypothetical protein; MDG1; similar to putative
    microvascular endothelial differentiation gene 1; similar to
    X98993 (PID: g1771560);
    microvascular endothelial differentiation gene 1 product;
    microvascular endothelial differentiation gene 1;
    DKFZP564F1862 p
    M78 AA437223 Hs.46640 adult retina protein
    M*78 AA479270 Hs.250802 Diff33 protein homolog; KIAA1253 protein
    M*78 AA486233 Hs.2707 G1 to S phase transition 1
    M*78 AA487274 Hs.48950 heptacellular carcinoma novel gene-3 protein; DAPPER1
    M78 AA488652 Hs.4209 HSPC235; ribosomal protein L2; Similar to ribosomal
    protein, mitochondrial, L2 [Homo sapiens]; mitochondrial
    ribosomal protein L37; ribosomal protein, mitochondrial, L2
    [Homo sapiens]
    M78 AA694500 Hs.116328 hypothetical protein MGC33414; Similar to PR domain
    containing 1, with ZNF domain
    M78 AA704270 Hs.189002 Null
    M*78 AA706226 Hs.113264 neuregulin 2 isoform 4
    M*78 AA709158 Hs.42853 put. DNA binding protein; put. DNA binding protein; cAMP
    responsive element binding protein-like 1; Creb-related
    protein
    M*78 AA775616 Hs.313 OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin,
    bone sialoprotein I, early T-lymphocyte activation 1)
    M78 AA777892 Hs.121939 Null
    M*78 AA873159 Hs.182778 apolipoprotein CI; apolipoprotein C-I variant II;
    apolipoprotein C-I variant I
    M*78 AA969508 Hs.10225 HEYL protein; hairy-related transcription factor 3;
    hairy/enhancer-of-split related with YRPW motif-like
    M78 AI203139 Hs.180370 hypothetical protein FLJ30934
    M*78 AI299969 Hs.255798 unnamed protein product; HN1 like; Unknown (protein for
    MGC: 22947)
    M*78 H17364 Hs.80285 CRE-BP1 family member; cyclic AMP response element
    DNA-binding protein isoform 1 family; cAMP response
    element binding protein (AA1-505); cyclic AMP response
    element-binding protein (HB16); Similar to activating
    transcription factor 2 [Homo sapiens]; act
    M78 H17627 Hs.83869 unnamed protein
    M*78 H19822 Hs.2450 KIAA0028; leucyl-tRNA synthetase, mitochondrial [Homo
    sapiens]; leucyl-tRNA synthetase, mitochondrial [Homo
    sapiens]; leucine-tRNA ligase precursor; leucine translase
    [Homo sapiens]
    M*78 H23551 Hs.30974 NADH dehydrogenase subunit 4 {Deirochelys reticularia}
    M78 H62801 Hs.125059 Unknown (protein for IMAGE: 4309224) [Homo sapiens];
    hypothetical protein [Homo sapiens]
    M78 H85015 Hs.138614 null
    M78 N21630 Hs.143039 hypothetical protein PRO1942
    M*78 N36176 Hs.108636 membrane protein CH1; membrane protein CH1 [Homo
    sapiens]; membrane protein CH1 [Homo sapiens]; membrane
    protein CH1 [Homo sapiens]
    M*78 N72847 Hs.125221 Alu subfamily SP sequence contamination warning entry.
    [Human] {Homo sapiens}
    M78 N92519 Hs.1189 Unknown (protein for MGC: 10231) [Homo sapiens]
    M*78 R27767 Hs.79946 thyroid hormone receptor-associated protein, 150 kDa
    subunit; Similar to thyroid hormone receptor-associated
    protein, 150 kDa subunit [Homo sapiens];;
    M*78 R34578 Hs.111314 null
    M78 R38360 Hs.145567 unknown {Homo sapiens}
    M78 R43597 Hs.137149 trehalase homolog T19F6.30 - Arabidopsis thaliana
    M78 R43684 Hs.165575 dJ402G11.5 (novel protein similar to yeast and bacterial
    predicted proteins)
    M*78 W73732 Hs.83634 Null
    M*77 AA450205 Hs.8146 translocation protein-1; Sec62; translocation protein 1; Dtrp1
    protein; membrane protein SEC62, S. cerevisiae, homolog of
    [Homo sapiens];
    M77 AI081269 Hs.184108 Alu subfamily SX sequence contamination warning entry.
    M*77 R59314 Hs.170056 null
    M*72 AA702174 Hs.75263 pRb-interacting protein RbBP-36
    M*70 AI002566 Hs.81234 immunoglobulin superfamily, member 3
    M*63 AA676797 Hs.1973 cyclin F
    M*62 AA453508 Hs.168075 transportin; karyopherin (importin) beta 2; M9 region
    interaction protein
    M62 W93980 Hs.59511 Null
    M*58 AA045308 Hs.7089 insulin induced protein 2; INSIG-2 membrane protein
    M58 AA953396 Hs.127557 null
    52 AA962236 Hs.124005 hypothetical protein MGC19780
    *50 AA418726 Hs.4764 null
    50 R43713 Hs.22945 null
    *41 AA664240 Hs.8454 artifact-warning sequence (translated ALU class C) - human
    *38 AA477404 Hs.125262 hypothetical protein; unnamed protein product; GL003;
    AAAS protein; adracalin; aladin
    *37 AA826237 Hs.3426 Era GTPase A protein; conserved ERA-like GTPase [Homo
    sapiens]; ERA-W [Homo sapiens]; Era G-protein-like 1;
    GTPase, human homolog of E. coli essential cell cycle
    protein Era; era (E. coli G-protein homolog)-like 1 [Homo
    sapiens]
    *30 AA007421 Hs.113992 candidate tumor suppressor protein {Homo sapiens}
    *30 AA478952 Hs.91753 unnamed protein product; hypothetical protein [Homo
    sapiens]; unnamed protein product [Homo sapiens];
    hypothetical protein [Homo sapiens]
    30 AA885096 Hs.43948 Alu subfamily SQ sequence contamination warning entry.
    28 H29032 Hs.7094 null
    *24 R10545 Hs.148877 dJ425C14.2 (Placental protein
    *22 AA448641 Hs.108371 transcription factor; E2F transcription factor 4; p107/p130-
    binding protein
    20 R38266 Hs.12431 Unknown (protein for MGC: 30132)
    19 H17543 Hs.92580 Alu subfamily J sequence contamination warning entry.
    11 T81317 Hs.189846 Alu subfamily J sequence contamination warning entry.
    *9 AA453790 Hs.255585 null
    9 R22340 null unnamed protein product; chr2 synaptotagmin KIAA1228
    protein
    7 AA987675 Hs.176759 null
    7 N51543 Hs.47292 null
    *7 N74527 Hs.5420 unnamed protein product
    *6 AA121778 Hs.95685 null
    *6 AA258031 Hs.125104 unnamed protein product; MUS81 endonuclease
    *6 AA702422 Hs.66521 josephin MJD1; super cysteine rich protein; SCRP
    6 T64924 Hs.220619 null
    *5 R42984 Hs.4863 null
    *5 R59360 Hs.12533 null
    *5 R63816 Hs.28445 unnamed protein product
    5 T49061 Hs.8934 HA-70 {Clostridium botulinum}
    4 AA016210 Hs.24920 null
    4 AA682585 Hs.193822 null
    4 AA705040 Hs.119646 Alu subfamily J sequence contamination warning entry.
    [Human] {Homo sapiens}
    4 AA909959 Hs.130719 NESH; hypothetical protein; NESH protein [Homo sapiens];
    NESH protein; new molecule including SH3 [Homo sapiens]
    4 AI240881 Hs.89688 complement receptor type 1-like protein {Homo sapiens}
    *3 AA133215 Hs.32989 Receptor activity-modifying protein 1 precursor (CRLR
    activity-modifying-protein 1)
    3 AA699408 Hs.168103 prp28, U5 snRNP 100 kd protein; prp28, U5 snRNP 100 kd
    protein [Homo sapiens]
    3 AA910771 Hs.130421 null
    *3 AI362799 Hs.110757 hypothetical protein; NNP3 [Homo sapiens]
    *3 H51549 Hs.21899 UDP-galactose translocator; UDP-galactose transporter 1
    [Homo sapiens]
    3 R06568 Hs.187556 null
    2 AA001604 Hs.204840 null
    *2 AA132065 Hs.109144 unknown; SMAP-5; Similar to hypothetical protein
    AF140225
    *2 AA490493 Hs.24340 null
    2 AA633845 Hs.192156 null
    *2 AI261561 Hs.182577 Alu subfamily SQ sequence contamination warning entry.
    *2 H81024 Hs.180655 Aik2; aurora-related kinase 2; serine/threonine kinase 12;
    Unknown (protein for MGC: 11031) [Homo sapiens];
    Unknown (protein for MGC: 4243) [Homo sapiens]
    2 N75004 Hs.49265 hypothetical protein {Plasmodium falciparum 3D7}
    2 W96216 Hs.110196 NICE-1 protein
    1 AA045793 Hs.6790 hypothetical protein; MDG1; similar to putative microvascular
    endothelial differentiation gene 1; similar to X98993
    (PID: g1771560); microvascular endothelial differentiation gene 1
    product; microvascular endothelial differentiation gene 1;
    DKFZP564F1862 p
    *1 AA284172 Hs.89385 NPAT; predicted amino acids have three regions which share
    similarity to annotated domains of transcriptional factor
    oct-1, nucleolus-cytoplasm shuttle phosphoprotein and protein
    kinases; NPAT; nuclear protein, ataxia-telangiectasia locus;
    Similar to nuc
    *1 AA411324 Hs.67878 interleukin-13 receptor; interleukin-13 receptor; interleukin
    13 receptor, alpha 1 [Homo sapiens]; Similar to interleukin 13
    receptor, alpha 1 [Homo sapiens]; bB128O4.2.1 (interleukin
    13 receptor, alpha 1) [Homo
    sapiens]; interleukin 13 receptor, alpha 1
    *1 AA448261 Hs.139800 high mobility group AT-hook 1 isoform b; nonhistone
    chromosomal high-mobility group protein HMG-I/HMG-Y
    [Homo sapiens]
    *1 AA479952 Hs.154145 Alu subfamily SX sequence contamination warning entry.
    [Human] {Homo sapiens}
    *1 AA485752 Hs.9573 ATP-binding cassette, sub-family F, member 1; ATP-binding
    cassette 50; ATP-binding cassette, sub-family F (GCN20),
    member 1 [Homo sapiens];;
    *1 AA504266 Hs.8217 nuclear protein SA-2; bA517O1.1 (similar to SA2 nuclear
    protein); hypothetical protein [Homo sapiens]; stromal
    antigen 2 [Homo sapiens]
    *1 AA630376 Hs.8121 null
    *1 AA634261 Hs.25035 null
    1 AA701167 Hs.191919 Alu subfamily SB sequence contamination warning entry.
    [Human] {Homo sapiens}
    *1 AA703019 Hs.114159 small GTP-binding protein; RAB-8b protein; Unknown
    (protein for MGC: 22321) [Homo sapiens]
    *1 AA706041 Hs.170253 unnamed protein product [Homo sapiens]; hypothetical
    protein FLJ23282 [Homo sapiens];;
    1 AA773139 Hs.66103 null
    1 AA776813 Hs.191987 hypothetical protein {Macaca fascicularis}
    *1 AA862465 Hs.71 zinc-alpha2-glycoprotein precursor; Zn-alpha2-glycoprotein;
    Zn-alpha2-glycoprotein; alpha-2-glycoprotein 1, zinc;
    alpha-2-glycoprotein 1, zinc [Homo sapiens];;
    *1 AA977711 Hs.128859 null
    1 AI288845 Hs.105938 putative chemokine receptor; putative chemokine receptor;
    chemokine receptor X; C—C chemokine receptor 6. (CCR6)
    (Evidence is not experimental); chemokine (C—C motif)
    receptor-like 2 [Homo sapiens]
    *1 H15267 Hs.210863 null
    1 H18956 Hs.21035 unnamed protein product [Homo sapiens]
    1 H73608 Hs.94903 null
    *1 H99544 Hs.153445 unknown; endothelial and smooth muscle cell-derived
    neuropilin-like protein [Homo sapiens]; endothelial and
    smooth muscle cell-derived neuropilin-like protein;
    coagulation factor V/VIII-homology domains protein 1
    [Homo sapiens]
    *1 N45282 Hs.201591 calcitonin receptor-like
    *1 N48270 Hs.45114 Similar to golgi autoantigen, golgin subfamily a, member 6
    [Homo sapiens]
    1 N59451 Hs.48389 null
    *1 N95226 Hs.22039 KIAA0758 protein;
    1 R37028 Hs.20956 cytochrome bd-type quinol oxidase subunit I related protein
    {Thermoplasma acidophilum}
    1 R66605 Hs.182485 Unknown (protein for IMAGE: 4843317) {Homo sapiens}
    *1 T51004 Hs.167847 null
    1 T51316 null null
    1 T72535 Hs.189825 null
    *1 W72103 Hs.236443 beta-spectrin 2 isoform 2

    M denotes genes that were used to classify 75% of all tumors, and genes appearing in both the cDNA classifier and the U133A-limited cDNA classifier are marked by *.

    Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 3 are hereby incorporated by reference.
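  • The "Number Times Occurred" column of Table 3 counts how many LOOCV folds selected each gene. The tally can be sketched as follows. The accession numbers are taken from Table 3 only as labels; the fold memberships, the function names and the `min_count` parameter are hypothetical and do not reproduce the actual folds.

```python
from collections import Counter


def selection_frequency(genes_per_fold):
    """Count, for each gene, the number of folds in which it was selected."""
    counts = Counter()
    for genes in genes_per_fold:
        counts.update(set(genes))  # count a gene at most once per fold
    return counts.most_common()    # highest selection frequency first


def frequently_selected(ranked, min_count):
    """Genes selected in at least `min_count` folds (compare the M marker in Table 3)."""
    return [gene for gene, count in ranked if count >= min_count]


# Hypothetical gene lists chosen in three folds.
folds = [
    ["AA045075", "AA775616"],
    ["AA045075", "AA706226"],
    ["AA045075", "AA775616"],
]
ranked = selection_frequency(folds)
stable = frequently_selected(ranked, min_count=2)
```

  • With 78 samples there are 78 folds, so a gene with a count of 78 was selected in every fold, and genes with low counts were selected only when particular samples were left out.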
  • In yet another embodiment, a third human colorectal cancer survival classifier, in accordance with the present invention, was prepared using U133A-limited genes selected by LOOCV via statistical analytic tools (i.e., t-test). The U133A-limited genes selected using LOOCV via t-test are listed in Table 4 below. The named genes common to both the original classifier (a set of 43 genes) and the U133A-limited classifier are marked with an asterisk. Table 5 lists seven genes selected by SAM survival analysis; osteopontin and neuregulin are present in, and common to, the gene lists for all classifiers. In Table 5, genes denoted with (+) are positively correlated with survival time, and genes without the (+) notation are negatively correlated with survival time (overexpression in poor prognosis cases).
    TABLE 4
    Genes used in U133A-limited cDNA classifier (selected by t-test) and ranked
    by selection frequency using LOOCV.
    Number of Times Occurred; GenBank ID; UniGene ID; Description
    M*78 AA007421 Hs.113992 candidate tumor suppressor protein
    M*78 AA045075 Hs.62751 syntaxin 7
    M*78 AA045308 Hs.7089 insulin induced protein 2, INSIG-2 membrane protein
    M*78 AA418726 Hs.4764 null
    M*78 AA425320 Hs.250461 hypothetical protein; MDG1; similar to putative
    microvascular endothelial differentiation gene 1; similar to
    X98993 (PID: g1771560); microvascular endothelial
    differentiation gene 1 product; microvascular endothelial
    differentiation gene 1; DKFZP564F1862 p
    M*78 AA450205 Hs.8146 translocation protein-1; Sec62; translocation protein 1; Dtrp1
    protein; membrane protein SEC62, S. cerevisiae, homolog of
    [Homo sapiens];
    M*78 AA453508 Hs.168075 transportin; karyopherin (importin) beta 2; M9 region
    interaction protein
    M*78 AA453790 Hs.255585 null
    M*78 AA477404 Hs.125262 hypothetical protein; unnamed protein product; GL003;
    AAAS protein; adracalin; aladin; adracalin
    M*78 AA478952 Hs.91753 unnamed protein product
    M*78 AA479270 Hs.250802 Diff33 protein homolog; KIAA1253 protein
    M*78 AA486233 Hs.2707 G1 to S phase transition 1 [Homo sapiens]
    M*78 AA487274 Hs.48950 heptacellular carcinoma novel gene-3 protein; DAPPER1
    [Homo sapiens]; unnamed protein product [Homo sapiens]
    M*78 AA664240 Hs.8454 artifact-warning sequence (translated ALU class C) - human
    M*78 AA676797 Hs.1973 cyclin F
    M*78 AA702174 Hs.75263 pRb-interacting protein RbBP-36
    M*78 AA706226 Hs.113264 neuregulin 2 isoform 4
    M*78 AA709158 Hs.42853 put. DNA binding protein; put. DNA binding protein; cAMP
    responsive element binding protein-like 1; Creb-related
    protein [Homo sapiens]
    M*78 AA775616 Hs.313 OPN-b; osteopontin; secreted phosphoprotein 1 (osteopontin,
    bone sialoprotein I, early T-lymphocyte activation 1);
    secreted phosphoprotein 1 (osteopontin, bone sialoprotein I,
    early T-lymphocyte activation 1) [Homo sapiens]; secreted
    phosphoprotein 1 (ost
    M*78 AA826237 Hs.3426 Era GTPase A protein; conserved ERA-like GTPase [Homo
    sapiens]; ERA-W [Homo sapiens]; Era G-protein-like 1;
    GTPase, human homolog of E. coli essential cell cycle
    protein Era; era (E. coli G-protein homolog)-like 1 [Homo
    sapiens]
    M*78 AA873159 Hs.182778 apolipoprotein CI; apolipoprotein CI; apolipoprotein C-I;
    apolipoprotein C-I precursor; apolipoprotein C-I variant II;
    apolipoprotein C-I variant I; Similar to apolipoprotein C-I
    [Homo sapiens]
    M*78 AA969508 Hs.10225 HEYL protein; hairy-related transcription factor 3;
    hairy/enhancer-of-split related with YRPW motif-like [Homo
    sapiens]
    M*78 AI002566 Hs.81234 immunoglobin superfamily, member 3
    M*78 AI299969 Hs.255798 unnamed protein product [Homo sapiens]; HN1 like [Homo
    sapiens]; Unknown (protein for MGC: 22947) [Homo
    sapiens]; HN1 like [Homo sapiens]
    M*78 H17364 Hs.80285 CRE-BP1 family member; cyclic AMP response element
    DNA-binding protein isoform 1 family; cAMP response
    element binding protein (AA 1-505); cyclic AMP response
    element-binding protein (HB16); Similar to activating
    transcription factor 2 [Homo sapiens]; act
    M*78 H19822 Hs.2450 KIAA0028; leucyl-tRNA synthetase, mitochondrial [Homo
    sapiens]; leucyl-tRNA synthetase, mitochondrial [Homo
    sapiens]; leucine-tRNA ligase precursor; leucine translase
    [Homo sapiens]
    M*78 H23551 Hs.30974 NADH dehydrogenase subunit 4 {Deirochelys reticularia}
    M*78 N36176 Hs.108636 membrane protein CH1; membrane protein CH1 [Homo
    sapiens]; membrane protein CH1 [Homo sapiens]; membrane
    protein CH1 [Homo sapiens]
    M*78 N72847 Hs.125221 Alu subfamily SP sequence contamination warning entry.
    [Human] {Homo sapiens}
    M*78 R10545 Hs.148877 dJ425C14.2 (Placental protein
    M*78 R27767 Hs.79946 thyroid hormone receptor-associated protein, 150 kDa
    subunit; Similar to thyroid hormone receptor-associated
    protein, 150 kDa subunit [Homo sapiens];;
    M*78 R34578 Hs.111314 null
    M*78 R59314 Hs.170056 null
    M*78 W73732 Hs.83634 null
    M*74 AA448641 Hs.108371 transcription factor; E2F transcription factor 4; p107/p130-
    binding protein [Homo sapiens]; E2F transcription factor 4,
    p107/p130-binding [Homo sapiens]; E2F transcription factor
    4, p107/p130-binding [Homo sapiens];
    M*68 R59360 Hs.12533 null
    M*63 AA121778 Hs.95685 null
    M*59 H51549 Hs.21899 UDP-galactose translocator; UDP-galactose transporter 1
    [Homo sapiens]
    *57 H81024 Hs.180655 Aik2; aurora-related kinase 2; serine/threonine kinase 12;
    serine/threonine kinase 12 [Homo sapiens]; Unknown
    (protein for MGC: 11031) [Homo sapiens]; Unknown (protein
    for MGC: 4243) [Homo sapiens]
    *56 AA490493 Hs.24340 0
    *56 R42984 Hs.4863 null
    *53 AA258031 Hs.125104 unnamed protein product [Homo sapiens]; MUS81
    endonuclease [Homo sapiens]; MUS81 endonuclease [Homo
    sapiens]
    *52 AA133215 Hs.32989 Receptor activity-modifying protein 1 precursor (CRLR
    activity-modifying-protein 1)
    *52 R63816 Hs.28445 unnamed protein product [Homo sapiens]
    *51 N95226 Hs.22039 KIAA0758 protein
    *45 N74527 Hs.5420 unnamed protein product {Homo sapiens}
    *36 AA702422 Hs.66521 josephin MJD1; super cysteine rich protein; SCRP
    *29 AI261561 Hs.182577 Alu subfamily SQ sequence contamination warning entry.
    [Human] {Homo sapiens}
    *28 AA132065 Hs.109144 unknown; SMAP-5; Similar to hypothetical protein
    AF140225 [Homo sapiens]; Similar to hypothetical protein
    AF140225 [Homo sapiens]; unnamed protein product [Homo
    sapiens]; unknown [Homo sapiens]; hypothetical protein
    AF140225 [Homo sapiens]
    *28 AI362799 Hs.110757 hypothetical protein; NNP3 [Homo sapiens]
    *27 AA045793 Hs.6790 hypothetical protein; MDG1; similar to putative
    microvascular endothelial differentiation gene 1; similar to
    X98993 (PID: g1771560); microvascular endothelial
    differentiation gene 1 product; microvascular endothelial
    differentiation gene 1; DKFZP564F1862 p
    *27 AA284172 Hs.89385 NPAT; predicted amino acids have three regions which share
    similarity to annotated domains of transcriptional factor oct-
    1, nucleolus-cytoplasm shuttle phosphoprotein and protein
    kinases; NPAT; nuclear protein, ataxia-telangiectasia locus;
    Similar to nuc
    24 N51632 Hs.75353 The KIAA0123 gene product is related to rat general
    mitochondrial matrix processing protease (MPP).; Unknown
    (protein for IMAGE: 3632957) [Homo sapiens]; Unknown
    (protein for IMAGE: 3857242) [Homo sapiens]; inositol
    polyphosphate-5-phosphatase, 72 kDa; KIAA0
    23 AA482110 Hs.4900 Unknown gene product; PRO0915; CUA001; hypothetical
    protein [Homo sapiens]; hypothetical protein [Homo sapiens]
    22 AA485450 Hs.132821 flavin containing monooxygenase 2; flavin containing
    monooxygenase 2 [Homo sapiens]
    *19 AA699408 Hs.168103 prp28, U5 snRNP 100 kd protein; prp28, U5 snRNP 100 kd
    protein [Homo sapiens]
    18 N70777 Hs.49927 BA103J18.1.2 (novel protein, isoform 2) [Homo sapiens]
    16 AA993736 Hs.169838 hypothetical protein; vesicle-associated membrane protein 4
    [Homo sapiens]; Similar to vesicle-associated membrane
    protein 4 [Homo sapiens]
    15 AI139498 Hs.151899 delta sarcoglycan; delta-sarcoglycan isoform 2; Sarcoglyan,
    delta (35 kD dystrophin-associated glycoprotein); dystrophin
    associated glycoprotein, delta sarcoglycan; 35 kD dystrophin-
    associated glycoprotein [Homo sapiens]
    15 N59721 Hs.21858 glia-derived nexin precursor; serine (or cysteine) proteinase
    inhibitor, clade E (nexin, plasminogen activator inhibitor type
    1), member 2; protease inhibitor 7 (protease nexin I); glia-
    derived nexin [Homo sapiens]; similar to serine (or cysteine)
    protein
    14 AA431885 Hs.5591 MAP kinase-interacting serine/threonine kinase 1; MAP
    kinase interacting kinase 1 [Homo sapiens]
    14 AA911661 Hs.2733 Hox2H protein (AA 1-356); K8 homeo protein; HOX2.8 gene
    product; HOXB2 protein; HOX-2.8 protein (77 AA); homeo
    box B2; homeo box 2H; homeobox protein Hox-B2; K8
    homeo protein [Homo sapiens];
    13 AA775865 Hs.7579 KIAA1192 protein; HSPC273; unnamed protein product;
    hypothetical protein FLJ10402 [Homo sapiens]; unnamed
    protein product [Homo sapiens]; hypothetical protein
    FLJ10402 [Homo sapiens]; hypothetical protein [Homo
    sapiens]; unnamed protein product [Homo sapiens]
    13 R30941 Hs.24064 signal transducer and activator of transcription Stat5B;
    transcription factorStat5b; STAT5B_CDS [Homo sapiens];
    signal transducer and activator of transcription 5B; signal
    transducer and activator of transcription 5; transcription
    factor STAT5B [Homo sapiens]
    *11 AA703019 Hs.114159 small GTP-binding protein; RAB-8b protein; Unknown
    (protein for MGC: 22321) [Homo sapiens]
    11 AA777192 Hs.47062 RNA Polymerase II subunit 14.5 kD; DNA directed RNA
    polymerase II polypeptide I; DNA directed RNA polymerase
    II 14.5 kda polypeptide [Homo sapiens]; polymerase (RNA)
    II (DNA directed) polypeptide I (14.5 kD) [Homo sapiens]
    *10 W72103 Hs.236443 beta-spectrin 2 isoform 2 [Homo sapiens]
    *9 H15267 Hs.210863 null
    8 H17638 Hs.17930 dJ1033B10.2.2 (chromosome 6 open reading frame 11
    BING4), isoform 2) [Homo sapiens]
    8 R60193 Hs.11637 null
    7 R92717 Hs.170129 choroideremia-like Rab escort protein 2; dJ317G22.3
    (choroideremia-like (Rab escort protein 2))
    *6 AA706041 Hs.170253 unnamed protein product [Homo sapiens]; hypothetical
    protein FLJ23282 [Homo sapiens];;
    *5 AA411324 Hs.67878 interleukin-13 receptor; interleukin-13 receptor; interleukin
    13 receptor, alpha 1 [Homo sapiens]; Similar to interleukin
    13 receptor, alpha 1 [Homo sapiens]; bB128O4.2.1
    (interleukin 13 receptor, alpha 1) [Homo sapiens]; interleukin
    13 receptor, alpha 1
    *5 AA504266 Hs.8217 nuclear protein SA-2; bA517O1.1 (similar to SA2 nuclear
    protein); hypothetical protein [Homo sapiens]; stromal
    antigen 2 [Homo sapiens]
    5 AA932696 Hs.8022 TU3A protein; TU3A protein [Homo sapiens]
    5 AA973494 Hs.153003 serine/threonine kinase; myristilated and palmitylated serine-
    threonine kinase MPSK; protein kinase expressed in day 12
    fetal liver; F5-2; serine/threonine kinase KRCT;
    serine/threonine kinase 16 [Homo sapiens];
    5 N45100 Hs.34871 HRIHFB2411; KIAA0569 gene product; Smad interacting
    protein 1 [Homo sapiens]; smad-interacting protein-1 [Homo
    sapiens]
    4 AA418410 Hs.9880 cyclophilin; U-snRNP-associated cyclophilin; peptidyl prolyl
    isomerase H (cyclophilin H) [Homo sapiens]
    4 AA725641 Hs.154397 WD-repeat protein
    4 AA954482 Hs.222677 SSX1; synovial sarcoma, X breakpoint 1 [Homo sapiens];
    synovial sarcoma, X breakpoint 8 [Homo sapiens]; synovial
    sarcoma, X breakpoint 1; sarcoma, synovial, X-chromosome-
    related 1; SSX1 protein [Homo sapiens]
    4 H45391 Hs.31793 null
    4 T86932 Hs.131924 T-cell death-associated gene 8; similar to G protein-coupled
    receptor [Homo sapiens]
    3 AA279188 Hs.86947 disintegrin and metalloprotease domain 8 precursor
    *3 AA485752 Hs.9573 ATP-binding cassette, sub-family F, member 1; ATP-binding
    cassette 50; ATP-binding cassette, sub-family F (GCN20),
    member 1 [Homo sapiens];;
    3 AA680132 Hs.55235 sphingomyelin phosphodiesterase 2, neutral membrane
    (neutral sphingomyelinase); Unknown (protein for MGC: 1617)
    [Homo sapiens]
    *3 AA977711 Hs.128859 null
    3 W93370 Hs.174219 NKG2E; type II integral membrane protein; killer cell lectin-
    like receptor subfamily C, member 3; killer cell lectin-like
    receptor subfamily C, member 3 isoform NKG2-H; NKG2E
    [Homo sapiens]; NKG2E [Homo
    sapiens]; NKG2E [Homo sapiens]
    2 AA036727 Hs.180236 null
    2 AA071075 Hs.25523 Alu subfamily SP sequence contamination warning entry.
    [Human] {Homo sapiens}
    2 AA464612 Hs.190161 PTD017; HSPC183; PTD017 protein [Homo sapiens];
    mitochondrial ribosomal protein S18B; mitochondrial
    ribosomal protein S18-2; mitochondrial 28S ribosomal
    protein S18-2 [Homo sapiens]
    2 AA481250 Hs.154138 chitinase precursor; chitinase 3-like 2; chondrocyte protein
    39; chitinase 3-like 2 [Homo sapiens]
    2 AA598659 Hs.168516 NuMA protein {Homo sapiens}
    2 AA682905 Hs.8004 huntingtin-associated protein interacting protein
    2 R17811 Hs.77897 splicing factor SF3a60; pre-mRNA splicing factor SF3a
    (60 kD), similar to S. cerevisiae PRP9 (spliceosome-
    associated protein 61); splicing factor 3a, subunit 3, 60 kD
    [Homo sapiens]; Similar to splicing factor 3a, subunit 3,
    60 kD [Homo sapiens]
    2 W93592 Hs.47343 hWNT5A; wingless-type MMTV integration site family,
    member 5A precursor; proto-oncogene Wnt-5A precursor;
    WNT-5A protein precursor [Homo sapiens]
    1 AA017301 Hs.60796 artifact-warning sequence (translated ALU class C) - human
    1 AA046406 Hs.100134 unnamed protein product [Homo sapiens]; hypothetical
    protein FLJ12787 [Homo sapiens]
    1 AA256304 Hs.172648 Unknown (protein for MGC: 9448) [Homo sapiens]; distal-
    less homeo box 7 [Homo sapiens]; distal-less homeobox 4,
    isoform a; beta protein 1 [Homo sapiens]
    1 AA416759 Hs.239760 Unknown (protein for MGC: 2503) [Homo sapiens]; unnamed
    protein product [Homo sapiens]
    *1 AA448261 Hs.139800 high mobility group AT-hook 1 isoform b; nonhistone
    chromosomal high-mobility group protein HMG-I/HMG-Y
    [Homo sapiens]
    1 AA452130 Hs.28219 Alu subfamily SX sequence contamination warning entry.
    [Human] {Homo sapiens}
    1 AA457528 Hs.22979 unnamed protein product [Homo sapiens]; hypothetical
    protein FLJ13993 [Homo sapiens]; FLJ00167 protein [Homo
    sapiens]
    1 AA460542 Hs.121849 microtubule-associated proteins 1A/1B light chain 3;
    microtubule-associated proteins 1A/1B light chain 3;
    microtubule-associated proteins 1A/1B light chain 3 [Homo
    sapiens]; microtubule-associated proteins 1A/1B light chain 3
    [Homo sapiens]
    *1 AA479952 Hs.154145 Alu subfamily SX sequence contamination warning entry.
    [Human] {Homo sapiens}
    1 AA481507 Hs.159492 unnamed protein product [Homo sapiens]
    1 AA504342 Hs.7763 null
    1 AA598970 Hs.7918 unnamed protein product; hypothetical protein; dJ453C12.6.2
    (uncharacterized hypothalamus protein (isoform 2));
    hypothetical protein [Homo sapiens]; uncharacterized
    hypothalamus protein HSMNP1 [Homo sapiens]
    *1 AA630376 Hs.8121 null
    *1 AA634261 Hs.25035 null
    1 AA677254 Hs.52002 CT-2; CD5 antigen-like (scavenger receptor cysteine rich
    family); bA120D12.1 (CD5 antigen-like (scavenger receptor
    cysteine rich family)) [Homo sapiens]; CD5 antigen-like
    (scavenger receptor cysteine rich family) [Homo sapiens]
    1 AA757564 Hs.13214 Probable G protein-coupled receptor GPR27 (Super
    conserved receptor expressed in brain 1). [Human]
    1 AA775888 Hs.163151 null
    1 AA844864 Hs.4158 regenerating protein I beta; regenerating islet-derived 1 beta
    precursor; lithostathine 1 beta; regenerating protein I beta;
    secretory pancreatic stone protein 2 [Homo sapiens]
    *1 AA862465 Hs.71 zinc-alpha2-glycoprotein precursor; Zn-alpha2-glycoprotein;
    Zn-alpha2-glycoprotein; alpha-2-glycoprotein 1, zinc; alpha-
    2-glycoprotein 1, zinc [Homo sapiens];;
    1 AA989139 Hs.16608 candidate tumor suppressor protein; candidate tumor
    suppressor protein [Homo sapiens]
    1 AI253017 Hs.183438 U4/U6 snRNP-associated 61 kDa protein {Homo sapiens}
    1 AI394426 Hs.57732 acid phosphatase {Homo sapiens}
    *1 H99544 Hs.153445 unknown; endothelial and smooth muscle cell-derived
    neuropilin-like protein [Homo sapiens]; endothelial and
    smooth muscle cell-derived neuropilin-like protein;
    coagulation factor V/VIII-homology domains protein 1
    [Homo sapiens]
    1 N41021 Hs.114408 Toll/interleukin-1 receptor-like protein 3; Toll-like receptor
    5; Toll-like receptor 5 [Homo sapiens]; toll-like receptor 5;
    Toll/interleukin-1 receptor-like protein 3 [Homo sapiens]
    *1 N45282 Hs.201591 calcitonin receptor-like
    1 N46845 Hs.144287 hairy/enhancer-of-split related with YRPW motif 2; basic
    helix-loop-helix factor 1; HES-related repressor protein 1
    HERP1; GRIDLOCK; basic helix-loop-helix protein; hairy-
    related transcription factor 2; hairy/enhancer-of-split related
    with YRPW motif 2 [H
    *1 N48270 Hs.45114 Similar to golgi autoantigen, golgin subfamily a, member 6
    [Homo sapiens]
    1 N59846 Hs.177812 Unknown (protein for MGC: 41314) {Mus musculus}
    1 R16760 Hs.20509 HBV pX associated protein-8
    1 R44546 Hs.82563 dJ526I14.2 (KIAA0153 (similar
    1 R92994 Hs.1695 metalloelastase; metalloelastase; matrix metalloproteinase 12
    (macrophage elastase)
    *1 T51004 Hs.167847 null
    1 T56281 Hs.8765 metallothionein I-F; RNA helicase-related protein [Homo
    sapiens]; metallothionein 1F [Homo sapiens]
    1 T70321 Hs.247129 G3a protein; Apo M; apolipoprotein M; Unknown (protein
    for MGC: 22400) [Homo sapiens]; apolipoprotein M; NG20-like
    protein [Homo sapiens]
    1 W45025 Hs.170268 Alu subfamily SX sequence contamination warning entry.
    [Human] {Homo sapiens}

    M denotes genes used to classify 75% of all tumors; genes appearing in both the cDNA classifier and the U133A-limited cDNA classifier are marked by *.
  • Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 4 are hereby incorporated by reference.
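The selection-frequency ranking in Table 4 can be illustrated with a minimal sketch. It assumes a Welch t statistic, a fixed number of genes kept per fold (`top_k`), and synthetic expression data; none of these details is specified in the text, and the classifier that would be trained on the selected genes is omitted. All gene names and values below are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, variance


def t_statistic(a, b):
    """Welch two-sample t statistic (an assumed variant of the t-test)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def loocv_selection_counts(expr, labels, top_k):
    """Count how often each gene ranks in the top_k by |t| across LOOCV folds.

    expr   : dict mapping gene name -> list of expression values per sample
    labels : list of 0/1 outcome labels (0 = good, 1 = poor prognosis)
    """
    n = len(labels)
    counts = Counter()
    for left_out in range(n):
        # Gene selection is repeated on the n-1 samples kept in this fold.
        keep = [i for i in range(n) if i != left_out]
        scores = {}
        for gene, values in expr.items():
            good = [values[i] for i in keep if labels[i] == 0]
            poor = [values[i] for i in keep if labels[i] == 1]
            scores[gene] = abs(t_statistic(good, poor))
        ranked = sorted(scores, key=scores.get, reverse=True)
        counts.update(ranked[:top_k])
    return counts


# Synthetic data: 20 samples, 30 genes, two genes with a planted group shift.
rng = random.Random(0)
labels = [0] * 10 + [1] * 10
expr = {f"gene_{g}": [rng.gauss(0.0, 1.0) for _ in labels] for g in range(30)}
for gene in ("gene_0", "gene_1"):
    expr[gene] = [v + 8.0 * lab for v, lab in zip(expr[gene], labels)]

counts = loocv_selection_counts(expr, labels, top_k=2)

# Genes selected in at least 75% of folds, analogous to the "M" marking.
core = sorted(g for g, c in counts.items() if c >= 0.75 * len(labels))
print(counts.most_common(5), core)
```

Repeating the gene selection inside every fold keeps the left-out sample from influencing which genes are chosen, and the per-gene count plays the role of the "Number of Times Occurred" column.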
    TABLE 5
    Censored survival analysis using SAM; seven genes selected with
    median estimated FDR of 13.5%.
    GenBank ID; UniGene ID; Description
    N36176 Hs.108636 membrane protein CH1
    AA149253 Hs.107987 N/A
    AA425320 Hs.250461 hypothetical protein; MDG1; similar to putative
    microvascular endothelial differentiation
    gene 1; similar to X98993 (PID: g1771560)
    AA775616 Hs.313 OPN-b; osteopontin; secreted phosphoprotein 1
    (osteopontin, bone sialoprotein I, early
    T-lymphocyte activation 1)
    N72847 Hs.125221 N/A
    AA706226 Hs.113264 neuregulin 2 isoform 4
    +AA883496 Hs.125778 N/A

    Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 5 are hereby incorporated by reference.

    Cross Platform Validation
  • Systems and methods of the subject invention can be tested by applying a classifier to an immediately available, well-annotated, independent test set of colon cancer tumor samples (Denmark, as described above) run on the Affymetrix platform. Using database software such as the Resourcerer software from TIGR (see also Tsai J et al., “RESOURCERER: a database for annotating and linking microarray resources within and across species,” Genome Biol, 2:software0002.1-0002.4 (2001)), each gene on the cDNA chip can be mapped to a corresponding gene on the Affymetrix platform.
  • The linkage is done by common UniGene IDs.
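As a rough illustration of linking probes across platforms by a shared UniGene ID, the following sketch joins two small probe lists. The cDNA accessions and UniGene IDs are taken from the tables above; the Affymetrix probe set names and the ID `Hs.999999` are hypothetical placeholders, and the function is not the Resourcerer implementation.

```python
from collections import defaultdict

# cDNA probes as (GenBank accession, UniGene ID); None marks a probe
# with no UniGene ID listed (T51316 appears that way in Table 3).
cdna_probes = [
    ("AA775616", "Hs.313"),     # osteopontin
    ("AA706226", "Hs.113264"),  # neuregulin 2 isoform 4
    ("T51316", None),
]

# Affymetrix probe sets as (probe set name, UniGene ID).
# Names and the ID Hs.999999 are hypothetical placeholders.
affy_probe_sets = [
    ("probeset_A", "Hs.313"),
    ("probeset_B", "Hs.313"),
    ("probeset_C", "Hs.999999"),
]


def map_by_unigene(cdna, affy):
    """Return {cDNA accession: [matching Affymetrix probe sets]}."""
    by_unigene = defaultdict(list)
    for probe_set, unigene in affy:
        by_unigene[unigene].append(probe_set)
    mapping = {}
    for accession, unigene in cdna:
        # Probes without a UniGene ID, or without a partner, are dropped.
        if unigene is not None and unigene in by_unigene:
            mapping[accession] = by_unigene[unigene]
    return mapping


mapping = map_by_unigene(cdna_probes, affy_probe_sets)
print(mapping)
```

Probes that lack a UniGene ID or have no counterpart on the other platform drop out of the mapping, which is why the overlapping gene set is smaller than the full cDNA chip.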
  • In one embodiment, 12,951 genes (out of 32,000) were mapped to an Affymetrix U133A GeneChip. In certain instances, probes on the cDNA chip are unknown expressed sequence tag markers (ESTs), which can reduce the number of usable genes identified. Thus, a classifier of the subject invention can address this lack of correspondence between platforms. Accordingly, in a related embodiment, a U133A-limited cDNA classifier was constructed in accordance with the subject invention by using the identical approach on this reduced set of overlapping genes.
  • With the U133A-limited cDNA classifier, only those cDNA probes are chosen that (according to Resourcerer) mapped to an Affymetrix probe set. This approach enables cross-platform comparison. For example, the training set samples were used together with the test set tumor samples in a flip-dye design. The end expression value from a cDNA probe is then the log2 of the training set to test set sample ratio. This same reference RNA was used on two U133A Affymetrix chips.
  • Once the U133A-limited cDNA classifier was constructed, a linear scaling factor was applied equally to all Affymetrix samples (training set samples as well as test set samples from Denmark). The factor was based on the expression of a common training set sample (H. Lee Moffitt Cancer Center & Research Institute, Tampa, Fla.) that was applied to both the cDNA microarrays and the U133A GeneChips. Using this assumption, the U133A chip value corresponding to a cDNA probe is the ratio of training set to test set sample (on U133A chips). Each of the Affymetrix U133A arrays (both the test set and the reference samples) was scaled to a constant average intensity (150) prior to taking the ratio, and the test sample chip values were averaged.
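The scaling and ratio steps can be sketched as follows. This is a minimal illustration under stated assumptions: the intensities are made-up values, the direction of the ratio (sample over reference) and the choice to average the replicate reference chips are assumptions, and the log2 transform is borrowed from the cDNA description above rather than stated for the U133A values.

```python
from math import log2

TARGET_MEAN = 150.0  # constant average intensity given in the text


def scale_to_mean(intensities, target=TARGET_MEAN):
    """Linearly rescale one array so that its mean intensity equals target."""
    factor = target / (sum(intensities) / len(intensities))
    return [x * factor for x in intensities]


def average_chips(chips):
    """Probe-wise average over replicate chips of equal length."""
    return [sum(values) / len(values) for values in zip(*chips)]


def log2_ratios(sample, reference):
    """Probe-wise log2(sample / reference); the direction is an assumption."""
    return [log2(s / r) for s, r in zip(sample, reference)]


# Hypothetical intensities for three probe sets per chip.
tumor_chip = [100.0, 200.0, 300.0]
reference_chips = [[50.0, 100.0, 150.0], [70.0, 100.0, 130.0]]

# Scale every chip first, then average replicates, then take the ratio.
tumor_scaled = scale_to_mean(tumor_chip)
reference_scaled = average_chips([scale_to_mean(c) for c in reference_chips])
ratios = log2_ratios(tumor_scaled, reference_scaled)
print(ratios)
```

Scaling each chip before taking the ratio removes differences in overall brightness between chips, so the ratio reflects relative expression of each probe set only.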
  • The results of a full LOOCV for the U133A-limited classifier on the test set sample (Moffitt Cancer Center cDNA microarray data set; original 78 samples) are shown in Tables 6A-6C. The accuracy of the U133A-limited classifier was 72% (80% sensitivity/59% specificity), which was lower than that of the original cDNA classifier (90%, P=0.001154). Many ESTs were selected both in the SAM survival analysis and in the original cDNA-based classifier, indicating that unknown genes (ESTs) may be very important to colorectal cancer outcome. The accuracy of the U133A-limited classifier was not, however, significantly different from that of Dukes' staging (77%; P=0.4862 using a two-sided McNemar's test), and the classifier still significantly discriminated the two groups, as can be seen in FIG. 3B (P<0.001).
  • FIGS. 3A through 3C illustrate survival curves for molecular classifiers in accordance with the subject invention. Specifically, FIG. 3A illustrates the survival curve for a cDNA classifier of the subject invention on the 78 training set samples (LOOCV); FIG. 3B illustrates the survival curve for the U133A-limited cDNA classifier (LOOCV); and FIG. 3C illustrates the survival curve for an independent test set classification (Denmark test set sample). A large difference in sensitivity can be seen between the Dukes' method and the classifier (Tables 6A-6C). The confusion matrix and accuracy rates by Dukes' stage are also presented in Tables 6A-6C.
    TABLE 6A
    LOOCV Accuracy of Dukes' vs. Molecular Staging for all tumors.
    Classification Method Total Accuracy Sensitivity Specificity
    Dukes' Staging 76.9% 63% 97%
    Molecular Staging 71.8% 80% 59%
  • TABLE 6B
    Comparison of Molecular Staging and Dukes' Staging Accuracy
    Dukes' Stage Molecular Staging Dukes' Staging
    Adenoma 67% 100%
    B 70% 70%
    C 64% 55%
    D 80% 97%
  • TABLE 6C
    Confusion Matrix of cDNA Classifier Results
    Observed/Predicted Poor Good Totals
    Poor 38  8 46
    Good 14 18 32
    Total 52 26 78
  • With respect to comparing the predictive power of a classifier of the subject invention to Dukes' staging, the U133A-limited classifier was tested on the test set of colorectal cancer samples from Denmark that were profiled on the Affymetrix U133A platform. The normalized and scaled test-set data were evaluated with the U133A-limited cDNA classifier. Because the Denmark cases included only Dukes' stages B and C, classification of outcome by Dukes' staging would predict all samples to be of good prognosis. The accuracy of the cDNA classifier was reduced from 72% in LOOCV of the training set (Tables 6A-6C) to 68% in the Denmark cross-platform test set (Tables 7A-7C). A decrease in accuracy (4%) was expected owing to the limitations imposed by cross-platform analyses; however, this reduction was very small compared to that caused by limiting the classifier gene set to U133A content. This result is not significantly different from that achieved by classification using Dukes' staging (64%, P=0.7194 using a two-sided McNemar's test) and is better than other reported results (47%) (see Sorlie T et al., “Repeated observation of breast tumor subtypes in independent gene expression data sets,” Proc Natl Acad Sci USA, 100:8418-23 (2003)) for cross-platform analyses where scaling was required. Moreover, the classifier of the subject invention was able to predict the outcome for poor prognosis patients (sensitivity) with an accuracy of 55%, whereas 0% would be predicted correctly by Dukes' staging.
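The McNemar comparison can be sketched from the counts in Table 7C together with the statement that Dukes' staging predicts every Denmark case as good prognosis. The text does not say which variant of the test was used; the continuity-corrected chi-square form below is an assumption, although it yields a value close to the reported P=0.7194.

```python
from math import erfc, sqrt


def mcnemar_p(b, c):
    """Two-sided McNemar p-value, chi-square with continuity correction.

    b : cases classifier A gets right and classifier B gets wrong
    c : cases classifier B gets right and classifier A gets wrong
    """
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


# Discordant pairs derived from Table 7C, with Dukes' staging calling
# every case "good":
#   17 poor cases predicted poor: molecular right, Dukes' wrong
#   14 good cases predicted poor: Dukes' right, molecular wrong
p_value = mcnemar_p(b=17, c=14)
print(round(p_value, 4))  # about 0.719
```

Only the discordant cases enter the statistic, because cases on which both methods agree carry no information about which method is more accurate.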
    TABLE 7A
    Accuracy of U133A limited Molecular Staging on Cross-Platform
    Denmark Independent Test Set.
    Classification Method Total Accuracy Sensitivity Specificity
    Dukes' Staging   64%  0% 100%
    Molecular Staging 68.5% 55%  75%
  • TABLE 7B
    Comparison of Dukes' Staging and U133A limited Molecular Staging
    Accuracy on Cross-Platform Denmark Independent Test Set.
    Dukes' Stage Molecular Staging Dukes' Staging
    B 64% 79%
    C 70% 58%
  • TABLE 7C
    Confusion Matrix of U133A limited Molecular Staging Results on
    Cross-Platform Denmark Independent Test Set
    Observed/Predicted Poor Good Totals
    Poor 17 14 31
    Good 14 43 57
    Total 31 57 88
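The summary figures for molecular staging can be recomputed from the confusion matrix in Table 7C. In this sketch, "poor prognosis" is treated as the positive class, which matches the use of sensitivity in the text; the function and variable names are illustrative only.

```python
def classifier_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # poor cases predicted as poor
        "specificity": tn / (tn + fp),  # good cases predicted as good
    }


# Table 7C rows are observed outcomes, columns are predicted outcomes:
#   observed poor: 17 predicted poor, 14 predicted good
#   observed good: 14 predicted poor, 43 predicted good
metrics = classifier_metrics(tp=17, fn=14, fp=14, tn=43)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

These counts give roughly 68% accuracy, 55% sensitivity and 75% specificity, in line with the Molecular Staging row of Table 7A.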
  • The present invention provides a colon cancer clinical classifier with significant accuracy in LOOCV that exceeds that of Dukes' staging. The utility of the classifier of the subject invention can be validated, such as against an independent colon cancer population using a completely different microarray platform. The gene classifier of the subject invention can be based on a core set of genes that have biological significance for any type of cancer, including human colon cancer progression.
  • Application of Prognosis Classifier with Therapy
  • The benefit of adjuvant chemotherapy for colorectal cancer appears limited to patients with Dukes' stage C disease, where the cancer has metastasized to lymph nodes at the time of diagnosis. For this reason, the clinicopathological Dukes' staging system is critical for determining how adjuvant therapy is administered. Unfortunately, as noted above, Dukes' staging is not very accurate in predicting overall survival, and thus its application likely results in the treatment of a large number of patients to benefit an unknown few. Conversely, there are a number of patients who would benefit from therapy but do not receive it based on the Dukes' staging system. Accordingly, an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial.
  • The molecular staging/classifier of the subject invention provides more accurate predictions of patient outcome than is possible with current clinical staging systems, which may, in fact, misclassify patients. In accordance with the present invention, a set of genes is derived from a genome-wide analysis of gene expression using known microarray analysis techniques (i.e., SAM). By clustering groups of patients with good and bad prognoses, it is illustrated that the prognosis/classifier of the subject invention presents outcome-rich information. In a further aspect of the present invention, a supervised learning analysis can be used to identify a core set of informative genes. In a preferred embodiment, a core set of 43 genes was identified that appeared in 75% of the cross validation iterations and accurately predicted colorectal cancer survival. This core set was derived from a 32,000-element cDNA microarray that included both named and unnamed genes. This gene set was highly accurate in predicting survival when compared with Dukes' staging data from the same patients.
  • A means for validating a prognosis/survival classifier is provided by the present invention. In one embodiment, to validate a cDNA-based classifier for human colorectal cancer, a normalized and scaled oligonucleotide-based colorectal cancer database from Denmark was evaluated based on the Affymetrix U133A GeneChip™. In a related embodiment, a colorectal cancer classifier (U133A-based cDNA classifier) was produced on the training data set using a limited set of genes common to both the U133A and the cDNA microarray (for 78 genes). The U133A-based cDNA classifier was then applied directly to the normalized and scaled Denmark test population.
  • In addition to identifying those patients for whom therapy is most beneficial, the classifier of the subject invention can identify those genes that are most biologically significant based on their frequency of appearance in the classification set. In one embodiment, those genes that are most biologically significant to colorectal cancer were identified using the classifier provided in Example 1. Specifically, osteopontin and neuregulin have reported biological significance in the context of colorectal cancer.
  • Osteopontin, a secreted glycoprotein and ligand for CD44 and αvβ3, appears to have a number of biological functions associated with cellular adhesion, invasion, angiogenesis and apoptosis (see Fedarko NS et al., “Elevated serum bone sialoprotein and osteopontin in colon, breast, prostate, and lung cancer,” Clin Cancer Res, 7:4060-6 (2001); Yeatman T J and Chambers A F, “Osteopontin and colon cancer progression,” Clin Exp Metastasis, 20:85-90 (2003)). Using an oligonucleotide microarray platform, osteopontin was identified as a gene whose expression was strongly associated with colorectal cancer stage progression (Agrawal D et al., “Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling,” J Natl Cancer Inst, 94:513-21 (2002)). INSIG-2, one of the 43 core classifier genes provided in Example 1, was recently identified as an osteopontin signature gene, suggesting that an osteopontin pathway may be prominent in regulating colon cancer survival.
  • Similarly, neuregulin appeared to have biological significance in the context of colorectal cancer based on frequency of appearance in the classification set of the present invention. Neuregulin, a ligand for tyrosine kinase receptors (ERBB receptors), may have biological significance in the context of colorectal cancer where current data suggest a strong relationship between colon cancer growth and the ERBB family of receptors (Carraway K L, 3rd, et al., “Neuregulin-2, a new ligand of ErbB3/ErbB4-receptor tyrosine kinases,” Nature, 387:512-6 (1997)). Neuregulin was recently identified as a prognostic gene whose expression correlated with bladder cancer recurrence (Dyrskjot L, et al., “Identifying distinct classes of bladder carcinoma using microarrays,” Nat Genet, 33:90-6 (2003)).
  • Accordingly, the identification of such genes may be significant in terms of gene therapy. For example, a therapeutic gene may be identified, which when reintroduced into tumor cells, may arrest or even prevent growth in cancer cells. Additionally, using the classifier of the present invention, a therapeutic gene may be identified that enables increased responsiveness to interventions such as radiation or chemotherapy.
    Sequences
    ACCESSION No. AA149253
    ORIGIN
    1 aatatggaca gggagtctca ttgtgtttat catatcaatt aatattacag tacatccttg
    61 gtaatacaaa attgtacacc ttcatcaaat aaattaggat aaattaaacc aataaattat
    121 gcaaagtctt cagaacaata gacaacaaca aaaattcaca attgaaattg cctctagcta
    181 aaaaaaacaa acaaaaatca aaaattgact ttatcagttc agttattgta ctatattcaa
    241 atcaaagggt ctttattaca aaaaagagct taataatgct atttacaaca tattgctaaa
    301 taatataaag gcagtgtttt gtcacggttt atactatata catatgagaa atggctggga
    361 caatattgag ggaagcccat gaccttttgg attcttccag gtagcgctga gaccnatccc
    421 aatacatttt ttttccttag ttccaaattt gganggcgta atatngcagt tttnagaaat
    481 tttccncccc ccntttttag gggggattgg atattttana aaaattccgg atggaatacg
    541 gtttccccna aggagggtag cntggtt
    ACCESSION No. AA775616
    ORIGIN
    1 tttttacatt caagataaaa gatttattca caccacaaaa agataatcac aacaaaatat
    61 acactaactt aaaaaacaaa agattatagt gacataaaat gttatattct ctttttaagt
    121 gggtaaaagt attttgtttg cgtctacata aatttctatt catgagagaa taacaaatat
    181 taaaatacag tgatagtttg catttcttct atagaatgaa catagacata accctgaagc
    241 ttttagttta cagggagttt ccatgaagcc acaaactaaa ctaattatca aacacatcag
    301 ttatttccag actcaaatag atacacattc aaccaataaa ctgagaaaga agcatttcat
    361 gttctctttc attttgctat aaagcatttt ttcttttgac taaatgcaaa gtgagagatt
    421 gtattttttc tccttttaat tgacctcaga agatgcacta tctaattcat gagaaatacg
    481 aaatttcagg tgtttatctt cttccttact tttggggtct acaccagcat atcttcatgg
    541 ctg
    ACCESSION No. AA045075
    ORIGIN
    1 ttttttnttt tttttttttt tttttttttt tccaggaaag acagatgtta tttaccacca
    61 atgaattttt atcatattta aatgaacttg aaaatgtcat tcaactcaaa tccctcaatc
    121 aacttacttc agcccattct gaaacttcat attgcagcaa accagccatg tgaaagaaat
    181 aaattcaat
    ACCESSION No. AA425320
    ORIGIN
    1 ttttcaggtt gtaaatattt atatttctct cacatacaat gttgtatgag acacttgttt
    61 taatatgtat ccataggatt aatactcata tggagtataa tgtggaaaag tgcagaacta
    121 aagaaataag tctatccgaa aacaaaagca cacatttctc aggatttaaa aatattgcac
    181 atagtaaggt tgcacagaaa ttactggctg gttttacaaa cagaatgagg tatcagtcaa
    241 tctctagata aagatgagag agaggataaa ctacacacac acaaacacat aaatccatac
    301 taagacctaa gagtgccaac aactaagaaa gaaatatgaa aaagctatgt taggtagcca
    361 ggatttcaac actacaaaat catttttagg ctggaaccaa acacataaca atctcttggc
    421 aatatttcgt taagttttca acttttttcc agcctaaatg actatgggca ataaaaccat
    481 ttcctttacc ccagttctac tgtagaaagg cacagcgctg tggtaaatat caaaccattc
    541 ctttctcaac
    ACCESSION No. AA437223
    ORIGIN
    1 tttggtgaat aaactaacag ctttattaat gaaggcaaac atcagatcat tgtatgaata
    61 ttatatatat atataaaaag aaatccaaac taacagcatt gtatttcaaa agtactgtac
    121 ttctgtttct tttaaagaga cttgtcatct gtttttataa aacaaaatgg gtactcttct
    181 cctaaaaaat cctggaaaaa tgaaatagtc aatttcaagc tgatgaattg aacacacctt
    241 tctttaaatg cagactattg ctaggaagca aataaagtca agcatcagaa agaagatgta
    301 tgagaaatgc atgaaagtca gagaaaaggg atgtagtgaa attactgcta atctttcccc
    361 cctatattca aagaccatcc aaaactggtc tttcatacaa atataaaata actataaaga
    421 gagggaattt gaaaccatac ccatctgaaa tc
    ACCESSION No. AA479270
    ORIGIN
    1 ctctgaattc atttatttag aggtaaaaca cagccattca aaattgtgga atacaatgtc
    61 tacacacaga ataaggttgg ggaattaagc tgaattgtta tattccattc acattaataa
    121 atatttttaa agaagaaatt gtagatttta aaagcttcat tagacactag tgacacatac
    181 aaataactaa actctcatac tgcttgattt tcaggttgaa aggttacaat aatctatata
    241 tttcaattac atggcagtaa atacaaaagc attttaaaca tcttttgaac tgtgtagtat
    301 actataagca ggagttt
    ACCESSION No. AA486233
    ORIGIN
    1 caaattgaat attttattaa catggtagtt gcctttgtaa catgtgcaca cacactcgca
    61 cactcagaat gatctgcctg ggggaaaaat actaaatatg cctaagggga aaatgaaaaa
    121 taaaaaaatt cctgtaggtt ttcattattg taggcaatta tgtccacatc acttacaaag
    181 ctattgccaa atctgtccaa ggaagcagag tttgaagtga gggctaggga caggaatctt
    241 gggaaaaatt caacagtggc atagcagagc tctcaatatg agaaagctga cataatgtgg
    301 acttttgctg tgaattacct ctttgcaaaa tatggggaga ggtttatcaa tgggcagaaa
    361 ataagagaag gcggtgtgaa gtaggcttct gcagtcaatt ttcctcacag tattgtgcag
    421 ggtcatcaag aaaatgctta gtctttctct ggaaccagtt tcagaacttt tccaattgca
    481 atggtcttac cctcatctct taagggtgaa cgacccacct aagggaagtc tttaaag
    ACCESSION No. AA487274
    ORIGIN
    1 tattactgca tatgttatat taaatttaca caatgatata taaaaacaca tactgtttat
    61 attatatagt aatttaacat caacaggagt atcaacacaa gtactactca tgcacaaaac
    121 atgcatatat tggtatacaa aaagcaattt tacacaatac tgtttaccaa aaattttttc
    181 ttaaaaaaca gcccttccac ataggatcaa aggtccaatc tggactggat tgcactaata
    241 tgttcaggtc aacgcttcgg tggcatagcg ctcagtgagc aattctggga ttggagtcat
    301 gcccaagggc tacttcatta atagtga
    ACCESSION No. AA488652
    ORIGIN
    1 tttttttttt tttgcaacgc aagggctctt tattgtcagc gagacgagca ggccaaacgg
    61 gcactgaggc tccacggggc ccaggcctct ttccgtggaa gagaggcaag aggggtttca
    121 ggattcagag gggtcctccg ctcacgcagc accatgcaaa tatagagcta aaaactttct
    181 gaatgtctct ggcttgaaac caactgggcc aacaggttcc acaaccactc tctttttgat
    241 cactgggaga caccaaaaat gctgatagag gagctggtct gagtccaccc aggccaaatt
    301 cttgacaccc tcgttagagt ccaggtctgt ggtattcagt tgaaacacta ggaaatggaa
    361 gacacgtcca tccgtgccca ggctctgcac caccacgggc tgctccaaga ccttggcatc
    421 attcccatag aggagccggg cctgagcagg gcactgcaaa agcaaacagg atcatcttgg
    481 cccgcagctg atctggttga aggcggtgtg gtcgtaaatt ggctttgtcc agtaagtaca
    541 gggtatgggg ataggggtaa ggatag
    ACCESSION No. AA694500
    ORIGIN
    1 tttgacagaa gaaacatttt taattgttct tgtcctgccc catcaccagg ggagtcccgg
    61 cattgctcag gctcactgcg cttgctttcc cctgggatgt cgaggacact ttgacctcat
    121 ctatgtcata gcccatgtgt ttctcagatg ccaccgccat aagatctagt gccccctggt
    181 gccattggga taggcaggcc agagaggcat gggagctggg tgtgcaccag gccacagggc
    241 tgtggggcat gcagccgatg gtgcagcttc aggtggatgt gctgggtgaa gcgactccgg
    301 cagacactgc actggaaggg ccgggtccgg aggtgca
    ACCESSION No. AA704270
    ORIGIN
    1 ctaaatcaag tagtgctact gaaatccagt gcctaatgga gcagatggtg gaggtcttag
    61 actctggaac atttatagtg atgcttctga atgcaaaaca ccaagagtgg atttcacagg
    121 ctgtgaatct gatttgattt tgatgggagt aaagcttcca ttttcactgt acttgaacca
    181 caaaagaaaa aaagcatgtg tgactgacac aagctagtta agaaaaagga acatgttaaa
    241 tattagtccc ataaagggaa gcagtttaaa caagtgatta tttgtttgta tcatttaaca
    301 tgattatgtt tgtatacaat accaccgttt
    ACCESSION No. AA706226
    ACCESSION No. AA709158
    ORIGIN
    1 tttttttcct tcaactccct ccaagttgtt tatttaataa taataaaaaa gaaatgcaca
    61 cacataaacc tgaactcccc cccaccccac cctcccttac tcccagtaac tagctccaaa
    121 atgaaaaaac ttcccttgtc ccacctgggg actaaattcc cacctccact gccataacac
    181 tagagaaaca aaataaaaaa tatgcagcag ctcaccaccc accccacaac tgaacctcac
    241 acaatcccct caaacaaaga agccaggact gggggttcac aggaatgaga ggagccctat
    301 attctgaaaa gggatgagaa gagaggtgaa cacccccacc tcaaataagt gcttaacccc
    361 cacacctgct ctttccttta ccaattgccc caagcctggg gaatcaggga aatttgaaac
    421 agt
    ACCESSION No. AA777892
    ORIGIN
    1 cagcttgcat cataagtttt attcccgatg cgggacagat ctttccatcc ctcaaatgta
    61 ttacatgtcg ccacggaagg gcttaggatg ctgctcccat ctccaggaaa gatgagaaaa
    121 aggtacagac tgggagccag tccaggacca ttctgcagtt cctggctctc ttaccctccc
    181 ttctcagcag aggaattatc tctcatccat tcagttaaaa agaaaaaaaa aaaaatcatt
    241 aacaaaacaa aacacacctt aagtattggg caggggtgtt cttgtcctca gtaggacgtc
    301 aagttctggg tcaccaatgg tgattttttt tgtttttgtt ttttgtcatt tttgtttgtt
    361 attttttttt tttnnatttg ttagttatgg ntagcagttg tgtgtccacc tcatctgcag
    421 gcagctgcac atagcggacg actgagcccc tgatgaagca gttcttgact gataacatgt
    481 gagggtattt ctcagggtct gtgacactga tgtcggttag tttgatattg aggtactggt
    541 ccacagagtg gagggttcca cagatgctca ggtcattctt gagttccacg actacatacc
    601 ttgccacaag agacttgaaa aaggagtaga agagcat
    ACCESSION No. AA873159
    ORIGIN
    1 tttctgtagg atttttattg gtggcacctg gggccacatg gagggagtcc tcagcacagg
    61 cgctggggtg tgggaaattt cagaggcccc tcctgggatg tcacccttca ggtcctcatg
    121 agtcaatctt gagtttctcc ttcactttct gaaatggctc tggaaaacca ctcccgcatc
    181 ttggcagaaa gttcactctg tttgatgcgg ctgatgagtt cccgagcctt gtcctccagt
    241 gtgtttccaa actccttcag cttatccaag gcactggaga cgtctggggt cccctgggct
    301 ggggctgggc cttccaagac gatcgacaga accaccacca ggaccgggag cgacaggaag
    ACCESSION No. AA969508
    ORIGIN
    1 tttttttttt ttttttcact tcttcaacaa gtatttattg aacgccaact atggaccagg
    61 ccctgtgctc aatgctgggt acagagtgga gactgaacca ggcatggcac ctggcctcat
    121 gagcttacac tcgagtggga ggcacagtca accaacaagt aaattacaca aatggatatg
    181 cagtggcaaa ttctccatga agggaaagaa cagaggcctt gtgatagagg aactccacaa
    241 gtaaagtagt cgaggaaggc ctcttggacg aggcaacgtt gaagccaagg cctgagggtc
    301 tgcagaactc agccatgcac agggtagggg aagagcattc ttggcaaagg gaacagcata
    361 tgcaaagtg
    ACCESSION No. AI203139
    ORIGIN
    1 ttttttgagt ttggcatgtt aatttttatc agcgacttct ggggcctagc accattcccg
    61 gaagaaggga gttgtcgggc agggtcctta atgggggttg caattcttgt cttggttggg
    121 aaagagccta gctgggaaca ggggtcgttt gtgtagtaac tgtattaagc
    ACCESSION No. AI299969
    ORIGIN
    1 gcggccgcgc cggctccagg gccatttagc ccccaggagg agaatcgagc aatctttttg
    61 gaagtccaga agaagctact ccttccagca ggcctaatag gatggcatct aatatttttg
    121 gaccaacaga agaacctcag aacataccca agaggacaaa tcccccaggg ggtaaaggaa
    181 gtggtatctt tgacgaatca acccccgtgc agactcgaca gcacctgaac ccacctggag
    241 ggaagaccag cgacattttt gggtctccgg tcactgccac ttcacgcttg gcacacccaa
    301 acaaacccaa ggatcatgtt ttcttatgtg aaggagaaga accaaaatcg gatcttaaag
    361 ctgcaaggag catcccggct ggagcagagc caggtgagaa aggcagcgcc agaaaagcag
    421 gccccgccaa ggagcag
    ACCESSION No. H17364
    ORIGIN
    1 tttttacttg aaattaaatt tggnctctaa agttggtgta gcagcagttg atcagnactg
    61 aaaaacggtt tttagtctcg gaaaaagact gattttgctt ttttataaat attattagat
    121 ttattaattt ttcgtgctca atgtgtaaat tgtattataa ttcattgtga tttatttcac
    181 ttttaatttg ctggtgtttt aataaatggg ggtgttactg aatctttctt cccacttcca
    241 tttcttttga ccacccctta accctcaact gtgacggtag tagtattatc atttatacca
    301 aagttttgca tagtccctgt tgactttgta atgttaacgg agtcataaaa gcactaggca
    361 agagaaagat agaaatttgc ttttaatctt tttgcctttt attttgcaca ttatgcaaaa
    421 gggaaaacat taaaggacac tttttttaag ngagtgaaac atgggnaagg catccagtgc
    481 tttatgcaca ttgtnagcta atcaggccat tat
    ACCESSION No. H17627
    ORIGIN
    1 tttttttttg ggcagatgag aaacagaatt atcatcagag tcttgctaca aacagggaaa
    61 aacacaaacc aagatgacac acggacatgg tagattaaac attcctcccc accttcagga
    121 tacatttaca ttgnaataaa tactgcaatc tcagcagcgg caaacaagga ggaatntagg
    181 aaatgcccac ctcctcccct ctgtcttatc tgtgtgctct cttccttggg tagcaccgat
    241 ctccccaggg tgctgggtga gaaacaggac aggggngaag aggtccgtgc atgctcactt
    301 gcccttttgc
    ACCESSION No. H19822
    ORIGIN
    1 gaagtcatan tatgataaac attttattac actaaaaaag tcatctgtta actgactgaa
    61 ctgcaggggg accacatgtg aggttacttc agaaaaatgg catcagataa catatataga
    121 tttctggcat tataaaatgg ctagattctc ccctaccttc cctcattaaa tattaatcag
    181 tggcttaggt cagttctagt gggaacactt aattgctgac ttcacataaa accaggntta
    241 gcctaatgtg ccaatggtat gagtccattc ctgggccatn ttcccaacag ccagaccgct
    301 gtggcttgga caccggaggc aacatctggg gggcctcagt tccactcctc tgtggtnagc
    361 ttgctttccc aataactggc tntggagtca catcaacaat ggtggcattn catctggggn
    421 ccacatgagc cctttggggg tgctgcatcc ctactng
    ACCESSION No. H23551
    ORIGIN
    1 ttttttttta tgcacactaa ggnatatttt attgtggcat taattagatg aaagttagta
    61 atatgncatt gaccaaaaca tttgattgac aagnaccata aaggttaact gagagttttc
    121 tttaatataa ttgttgtaca gacaaggatt cctgctgtat agagtatata gaaggatgac
    181 atactctagg aattaggaac aatatatatt caatacaata acaaaactat atagtacttt
    241 aagaactctt tcacatatat gaacactctt acttaggaac ttcagctgtt taaagtaagc
    301 aatatgcaaa cctataaagt acacaccaaa aaaatctaac ctacaaaaca cccaaagcaa
    361 atgttagcat atctctatta tcaagaatat cttctcacca tcgtttcttt caaaaatatg
    421 tgaaaaagtt ctttctttcc ttatgagtgg caatttttaa aggcccctct tctgaaatta
    481 gntatgttcc aatccactat cactcttaag ggaaaatgga acdnctctgg g
    ACCESSION No. H62801
    ORIGIN
    1 aatgatatca gaacctttta aatgatctag tatctgtgat gttagcgccc ttgggattca
    61 gaaagtggtg tgcatagtaa aagctttcat tgtaactcac cctgcctaga tatgcagaaa
    121 gcaaattcag tgataagatc tttcctggga gaccaatcag cagcctcagg ctctgttggg
    181 gtctatcaca atgatgttat ctaaatttag ggcaaggaac cctttcccca tcttttagag
    241 ggcagtgagt gttctaatca cttcaagata ggtatctgat aaaagtcttg gggccaactt
    301 tttcatactt aggnagggca caactaaaat ggatatactt aaaatggtat caaaggaggg
    361 ttaggtgtac actctactag gtgtaaggtn tatttcatta caaaatggct ttgg
    ACCESSION No. H85015
    ORIGIN
    1 cacccaggct acagtgcagt agagcaatca caactcactg cagcctcaac ctccctgggn
    61 ncatgcaatc ctcccacctc agcctcgcaa gtagctcgga ccatggccac acgccaccac
    121 acccggccaa ctttcgtact tcttgcagag agagggattt gccatgttgc ccaggccggt
    181 cttgaatttc cgggctcgag tgatccactc acctcagcct cccaaagtac tgtgattaca
    241 ggcatgagnc actntgccca gccaataaan tcttt
    ACCESSION No. N21630
    ORIGIN
    1 gaacagacta aatttgtttt aacaatccca tttacaattc aaattccttt aaacaactta
    61 atagcattta tacatttaaa aaaatgattc ttttaagcag cattgcaaat gcttgacccc
    121 attagcataa accttcccaa gtgcttaact ctcataaaca taataaatta aacatatggt
    181 gactttccaa gttctctgaa acatttcagt acttttgcag acttagtaac attttaaaat
    241 acctttcaac tgaaactcat aagtctaaaa gtctgttaag cattttaaat tagaatctta
    301 aggccagtgt cacatattgt aatatgccaa ttatgtttaa atacttcaaa cagcaaatac
    361 tacagtttat ctcaatgaat ataataacca ttcctgctgg gcgcagtggc tcatgccttt
    421 aatcccagtc attaaggagg ctgaggtggg aagattgctt gaaaccagga gattgcctca
    481 ggcctgggca acatggtgag acctcctatc tcaaaaatcn aaataaaaat tagctgggca
    541 ggtggctcat cctgtagccc agcntctcag gaggctgagg tgggaggata gcctcgccta
    601 ggagacggag ctgcagtgag c
    ACCESSION No. N36176
    ORIGIN
    1 aataaagaca agtgttcaga tttatttgga aattcacagt ttctaatggc actacagctc
    61 cgtagttaca tattgaaaat tctcttccca caacacacag atcacataat ttctcactgt
    121 atctctgctc tcatctggac ctcttttcaa ggggcttcta taaaatcagg ncctcttgnt
    181 cngganagnn nantngngcn gacaggaaag aaatttaaat cttctaaaac acgctgttaa
    241 cctaaagcag caacttaaac aaacaaaaaa ggcgttaaat aagtcacatt acaaacaata
    301 cccaagaaag gtattaggca agtttaaaaa cagttatcac tactaaaagt gctcaataag
    361 ttataactta aacatcacaa caataaatgg tcaattctct ccctttcaaa aagaaacatg
    421 ttccactttc attcactact gtacaatcat acta
    ACCESSION No. N72847
    ORIGIN
    1 attgttactc tagttttaat ggtttcacaa atacaaaagt tgctagataa gcagtaccaa
    61 catatctaaa tctccaatga tgttcaatta aaattttatt tatagactca tacactcagc
    121 aaaaccactc atttaataag tccaactgaa ataaattctt attaataaaa tacctatatt
    181 gaaagtaata tattgtaaga actctacctt aaattgacca tggggatgaa ctacaatgtc
    241 ataaaatatg agccaaaatg ttcactcaat aattttaatt acatcacaat taagcccaga
    301 actatgcctt ttttttggtg taaggctgaa taaggaccga aactggatgg agagaaaatt
    361 gctttctaaa gcctcattta ctggcaataa cttaccttat gcaataacca acatcacgng
    421 actgg
    ACCESSION No. N92519
    ORIGIN
    1 ttttttttaa ctcttaaaaa aaatcatttt attgatcctt taccatacaa aatttattca
    61 aattacaccc atttgaagtg gtaagatcac agctagagaa caggtcaccc tgtaacaaat
    121 ctatttacaa aatccatcat aaaagctttt ttttgttttt ttttacatta tattacatat
    181 tttctttttt aaaagcatac aacacaaagc taaactgatt agtagtttgc ctactcccaa
    241 ttttgggaga aatacttcct ttttacaaaa tcacgtnccc cgtaggaaaa gaaattccca
    301 caccctgaca attggccaac cgacttactc tgcaagccat cttcttcaaa tccctccttc
    361 tcatacacac gangttgtca tgcacacact gaatcntaat ttcttttccn ggaagcttaa
    421 ncctttaaat accgggaatt attttcagat ctncacgtnc caacaaaaat ggaaacaagg
    481 gccccaccaa gnccgggaaa acnaaaccca ataccctntt aaaaatttca aggc
    ACCESSION No. R27767
    ORIGIN
    1 tttttancna tttgtaaata agtttaattt ttnagttttt caatgacatt cagtagagat
    61 agttatattg gctatataac acaagtaaag tggtgtttgg aaagtggagg actaggtttt
    121 ggcacggggc taggacgggg tgaccgccgc ctcaccacca cagactggag ggggcttttg
    181 agagctgggc ttcgctcccg aggactcagc tcagaaactg ctgaggcccg tgatgcagaa
    241 ccagtgccgt aggtgggcat ctggccatgg cttcgagctc tcaggatgct tttgtatctt
    301 gagagggtgc ctccagagaa tgtctgctcc ttgggcctca tctncccggg ttatnccccg
    361 gcag
    ACCESSION No. R34578
    ORIGIN
    1 atttttgaag nngnttcgat gtcttactgt tatgaccata aaaccaataa agctactttg
    61 aaaagttaaa gccaggngta attaaacaac tcatacttga ttgttaaagt cagtctctna
    121 aaagtgtaat tttaaaaagg taataaaaaa ggtatancat tat
    ACCESSION No. R38360
    ORIGIN
    1 tttttttttt ttcaaaaatg tcaaacttta ttcaagtgtt atggtaagaa atttgaaatt
    61 cttaggtaag ctantgaata aatccttggg caggtgcagg catacagatt ctggggtgca
    121 gctgctgagt ttaaaagctt cctttggaga tgccccgnng gggnnacacc ccctntcccg
    181 cctntcaaga ggaggccatc ctggggcagc acgttagggg caaatggccc agatgcccag
    241 ctnagggaaa cctccatgcc tagaggagga ggtcgctctg ggagcaggag gaccttcttg
    301 gaacccctgt tnacaggntc ctttttcttg ntttttccag nacctcctgc aggg
    ACCESSION No. R43597
    ORIGIN
    1 tttttttttt ttttttcagg attcactgcc tggggtatcc cactatatat atctcaccta
    61 tgatgtagtg gtgcttgaaa tactcatctc attagctcga ttttattatt ctaatctaag
    121 gttttttata ttattcatac tatgatattt ttagggacaa tcagtaatat ttggggcaga
    181 gtactgaggg acctcttgaa gtctgcaaca gcatgcattt tctttgtttt tgtggggagt
    241 gcttccctgt aggctgtctt tgttctagga acactgnctc caaatttatt tccatgggga
    301 tgtagggggc tagtaggccc atggtggaaa ggtcttctgt aaatctccnt gggggggtnt
    361 gagttattgg gggttatttc taacagggan ttttcccaaa ggggg
    ACCESSION No. R43684
    ORIGIN
    1 tttttttttt ttttcattca aaaatatata atttattgag tacttgctag acacaatgga
    61 tacaatgatt atatagtccc aatcctccag gagaacaata gacagacacc tttataatat
    121 gtatgtggag tgctctgaca gggaaaagca caaggtccat gggggtggga gtggcccagn
    181 agctaaggaa ctcttccccc atgaagtggt tacttacttt ctaatcttta atttaggatt
    241 ctctcatgga acatttgant ggtgaaattt tactacataa aggttctcaa ccctaggagg
    301 tttatccctg cccccctggg aacatttggn caatgtctga acaacaagtt tattntcaca
    361 actggggagg ggngaaggaa gttagcagag gccaaggatg nctggctaaa ccttaaattc
    421 ctacat
    ACCESSION No. W73732
    ORIGIN
    1 tatttcaaaa aaagtctttt aattgttcaa aatagcacaa aacgacatcg cactatggta
    61 atattgagtc acaggggtta cnctacaata gtgaacggng tactcncctc agaaacaaat
    121 cant
    ACCESSION No. AA450205
    ORIGIN
    1 tttttgtttt ctttcattat ctttatttta aatttgatat tttagaatag gaaattatct
    61 ttcacagcaa tgcctcctgg tctgataata cagtatctca tttctgaatg taaagattta
    121 aaataaatca aaatgaacat taaggcgtac aaagctactt taagtctgct cttaagatca
    181 gtttttgctc atattcaaaa tacatggaat gttggcacaa aactgaagct gctgtagaaa
    241 gatcacagat gttctgtggg ttactcaaac ttccatttct ctaaaaacat acccttacat
    301 ggtcttaatt ttatgaattt aagtgttgag aaatatctaa ataataagta acaattaaaa
    361 taaaatgttt tatttgtaaa ttatgtacag aatacacttt acgttacgc
    ACCESSION No. AI081269
    ORIGIN
    1 tttttttttt ttctaaaact acctttattg tggttggctc gacataagat gccgccatca
    61 gcagaattat aaaactgtac aggaggcaca aaaataggct gtttaactta gataatgacc
    121 ctcatgtctt caagctttaa aaatgcacat aaaagttgta caatctggca gtttataaaa
    181 tataaagcta aaaagaggat tttgggttcc acaaagaaga ctgtatcaca caattaacac
    241 gtactaatta aacaattaac catccacaca gaagacataa tg
    ACCESSION No. R59314
    ORIGIN
    1 tttttttttt ttttcaaaaa ctttattctt ttctaataaa aatgatatat gttcattata
    61 aaaagtttca aacacacatg agtctganga ntgtaaagat cacccaaata ccacagccca
    121 gaaaaaaaaa tccttaacat ttggtganga tctctctatg aaacatacat tatcttaaaa
    181 tattcaatgt tataaatgag ctcatattca acatatatcc tgtngtctac tttttgattc
    241 aataatattt tgggaacata tatccatngc antaaacata tatctaaata tttttaaatg
    301 acaactggca tgggnnttta tttaatccat cttttactga gggatgtttc agttgtttcc
    361 aatgttttaa tatcataaac atcatggaaa tataccnttg gggctccatg tttgganggc
    421 ttggggcaac ctt
    ACCESSION No. AA702174
    ORIGIN
    1 catcttcagc attaagaagt gctgacacaa tatcattaac tgttttatag ttctctccag
    61 ttgtcaggat tttactttga actgtttgtt tcaccaggtc tctattaaag cccatttcca
    121 aggcagattt aaccacaggt gtattcatca tgacagcatc ttctgaagaa ctttctccag
    181 gtccaaaatg aataattggt gggtcagcat tttcttctcc agtggtatct gaagttgaca
    241 acagctgttc aagaagatga ggatatctac cttgaatctc atcaacaaac tcttggcctt
    301 tcattcgtat caagaactca caccttggaa accacttggc atgttctacc catggatcat
    361 ctccagattc ccaacacctc aagccaccat cacaacaaaa gcatttgaca tcatcattgc
    421 gacccacata ataaaaacca gcacttgcaa gctgctcagg ctgaactgga acactagatg
    481 gccagtacat aaatgttctc attcgagctg catgtgtctg catgctcaga tttgaaatgc
    541 taaacctcag agtttctaga gaa
    ACCESSION No. AI002566
    ORIGIN
    1 tttttttttt tttttttttt tttttttttt ttttcacaat tcttaagtct tgttaagaaa
    61 gtaaaaaacg tttgggtata ttttgatcca tgggtggcat tttcaaatgt gcaaaaacaa
    121 agtcttggaa gagattcctt gtcactagaa agttcgccct tccttttgct gtcagttgta
    181 cgtaagagaa attcgtccac attaaggaat ccaaaaaggg taaactaaag ggatttaaaa
    241 agagtacatt acaaagaata agaagccctg taacatctat ctgagaatac tagataaatc
    301 tgtgagtaga tgtggcacct ggagctactc actacattac taaaaacaga aacaagaaat
    361 ctataatggc aggatcacaa catttgcgcg caaatagcta acc
    ACCESSION No. AA676797
    ORIGIN
    1 aataccttct gttttaagtt tttcttttgt tttcatcttg gaaaaaagga aatttagaaa
    61 taagacagga aaagaatggc ccagaaattc agcacaaaga gaggtgtaca cattgacgcc
    121 atctgtgggt cacatacgaa cgcctctggg acagagctct aaaacgagtc acgtgtcgta
    181 gggagtgggc ctgtggcaag gcagtcctcg cagtgtgcag ggacgcaggc ccccttacca
    241 tggaagcccc acccagaagg aagtgggtgc cccatgcagg ccgaggtgga tgaggggaca
    301 gtggtgtgct cacagctgtc agctccccac tgaagcccca aaccagcaga tgtgggcagg
    361 ggctcaagtg gtgtctgact acccaggtca cacgtgcctt aagcgtgaaa gctgtcagct
    421 cccggcacgg gctctggtgg ggctgggaac accaggacac acatgggctg aagcttccag
    481 agacagtgag acacggaagg gacagagagg tgccctccac acagtgtg
    ACCESSION No. AA453508
    ORIGIN
    1 tttggttatt cagtatttat tctgcaatgc aaaggtgaca aactaaaata taaaaaggct
    61 gttatggctt aacatttttg ttgcagatta aatatgcagc attgaaaaat ggaaaggcgt
    121 ggcttcatct ctgaccagca gagttaaaaa gaaaaatctc tccattttcc ttcatcatca
    181 tgggatacac tgttcaggca atccaaatta ataaagactt gcactttcat atgaacacaa
    241 gatcaagtgt accagttagg ttttcacatt cacagtatat aagaaaatac acatggaagg
    301 aaaagtaaag ggttaact
    ACCESSION No. W93980
    ORIGIN
    1 tgaatgaggc aacaaaagca gagatttatt gaaaatgaag gtacacttca cagggtggga
    61 gtggcttgag caagtggttc aagagcctgg ttaccgaatt ttttgggggt taaatatcct
    121 ctagaggttt cccattggtt acttgatgta cacccttgta aatgaagtag tgcccacaat
    181 cagtctgatt ggttgaggga ggggacctat cagaggctga agcaagtttc aaagttacac
    241 cctatgcaaa tctctgattg attgggaaaa ggctgaagtg aagttacaaa gttatactcc
    301 tatgcaaatg aagacttggg cccatgacca gcctcattgg gttgtggaaa gggaccaatc
    361 agaggtactt tcaatttttc catctaccat gcagaaaaag gttcgggggt ggggggttgc
    421 caaagggaag ttagccnaac aaactcctga cctaccaaca gagggtccca gttgggtagg
    481 ggggcctggg
    ACCESSION No. AA045308
    ORIGIN
    1 ctattaatca acacttttta atgtagtaca tatatatctt acagttattt aagtcaaata
    61 tgtaaaggtt tacaactgat ttacagatga agcaatcaca gattgcagta atatgtgtgt
    121 gtgtatatat atatttatnc catatataca cacacgccaa tcaaggggaa aactgcatcc
    181 tggcaatttt acagtctgaa gttttgttgg tatatctacc atttcacatc cttttcatct
    241 tgcttttctg tacaaaagat atttttngcc ttcttcattc ctgatgagat ttttctgcga
    301 taactttaca ttcgtacatt gccagttgtc gaccaatgtt tcccattgtt atgcctccag
    361 caaaaaatat
    ACCESSION No. AA953396
    ORIGIN
    1 atctgtcagt aaattacatg tatcctggct gtttatttca aaaatgcttc agtatgtatt
    61 tcctaaaata gggatattct cctttgtaat cacagcaggg tagatactgc tctttagttg
    121 tcatgtctct tagccttctt taatgtggaa cacgtccaca ccctttcttt atcttctgtc
    181 ttttaaacat cttttctgtt gtccaatttt taacaacaaa gatgttaaaa atcagaaaac
    241 tcagaaaagc acatggtgta ttaaaattcc acctaggaat aactgccatt aaagttttgg
    301 tgtctccctt tctgtctctt cagatgcaac ttactagtct agacaaagca ggtttctcag
    361 tgaataaaac at
    ACCESSION No. AA962236
    ORIGIN
    1 ctaatcctgc gaatatgggt agtgcttcgt tccatggacg ttacgccccg ggagtctctc
    61 agtatcttgg tagtggctgg gtccggtggg cataccactg agatcctgag gctgcttggg
    121 agcttgtcca atgcctactc acctagacat tatgtcattg ctgacactga tgaaatgagt
    181 gccaataaaa taaattcttt tgaactagat cgagctgata gagaccctag taacatgtat
    241 accaaatact acattcaccg aattccaaga agccgggagg ttcagcagtc ctggccctcc
    301 accgttttca ccaccttgca ctccatgtgg ctctcctttc ccctaattca cagggtgaag
    361 ccagatttgg tgttgtgtaa cggaccagga a
    ACCESSION No. AA418726
    ORIGIN
    1 tttgagtttc aaaggattta tttgatttcc ccacatgatc acaaccatgg ttttacattg
    61 atagagtctg ttgccactga caaacagaat gcagatgaaa acaaacgcac tcctttcctc
    121 tcaaaggtac acagtggggg tgccaggctt cttgtgaggg aggtgtcctt gaagtctctg
    181 aacagtctgg ggattcagga cctgattcta attgcttaaa acaactcgga ggcaaaagat
    241 attttccaag aggagatgca tgctgtgtgc agtctcgatg tgactgcaca cagaa
    ACCESSION No. R43713
    ORIGIN
    1 tttttttttg atgtgctaat tttatttttc taatacttac caaaataaat gccaccactt
    61 aacatagaaa aaattgttcc catgtgacct aaaatcattc ctcagtcacc cctgaactgg
    121 ctagtagcga gcatatgtgg agcggtggtg agggcaggat agcctggtta taggaaacct
    181 cagantagga aagacctggg ttcaaatccc cactctgcca cttactagnc tgtgtgactt
    241 tgggacaagt tgtgaaacct ctctgaggat ttatttcttc atgtaaaatg tcaccgataa
    301 tggataactc agtgggtgta agantgatct attttaagga ttctagggca gagtcccngg
    361 gcagggcagt taaggcactt aaataggatg gacaguctat tcattnaatt attaggcagt
    421 tttttcctta atggagggtc cttgttggaa ggaccccttt tttcttaacc tcc
    ACCESSION No. AA664240
    ORIGIN
    1 tgtgataggg ttccactttt tctctcatac tggtgtgcag ttgctgattc atggctcact
    61 gcatcttcag tctcccatgt taaaggaatc ctttcacctc agcctactga gtgtgcacca
    121 ccaggtccag ctaattgttt ttttaacttt tttttttttt tttttttctt ggtagagaca
    181 gggtcccctc tgttgcccag gatggtttgg aactcctggg ctcaagcaat cctcccactt
    241 tggcttccca aagtgctgag attacaggca tgagcactat gcccaacctg agcaggatga
    301 cttaaacctg atcaattcta ctccaaaaca gcaactatca ttaagtcagg ggtgtcaagg
    361 aggactctgt gaaggcaaag actagactgg gatgtgtgcg agagtgggat aagaaggccc
    421 atccctagca gactg
    ACCESSION No. AA477404
    ORIGIN
    1 ggaaaacaaa aggaaaactt atttattctt agaggtggga atgtggggag tggggcagaa
    61 caggtggtgg ccctgggaga gggtcccaag gggcagaggt tggggatgtc tcagtaaaga
    121 ggggcaggtc atgaatagag cctccacccc cagcaggggt tccttgggcc cgcccaagca
    181 ctgggctaaa acgtggaaac tgggcattga caaagtacag cgg
    ACCESSION No. AA826237
    ORIGIN
    1 aaagatgaga accagaatgc ttatatttta ttagtatcca agactgggga gagggatggg
    61 gtgggagaga tcaagaattg gggagcagat gggaggcgct acctcactca ggagacacga
    121 gttcttatcc aagttcaagg tgaaagaagt gagggcagga agagaaatct ccctgctagc
    181 aacagcgact cagggagaaa ctctgggccc atagctagct ggaggcaggg tgacattgct
    241 cccaccaatg ggccatcttc ttagctacac ctttgtagct gtggtgccag gcagaagaac
    301 cacctggaaa ctgagctaag gcaggttcct tcttccaaca gaagacacag ctgggcaggg
    361 actgtgcaga ctcaacaggg ccaggccagc tagtggcang tcagtgttca tgtctctcac
    421 cagtgcctgg agggtcccca gccaaggaaa gaactggtca gttcctgc
    ACCESSION No. AA007421
    ORIGIN
    1 gtttgtagca gttccaaaaa gaaagcagaa ctcatttagc aattgtgata aaagaaggaa
    61 aaatgcatat gttttaaaag tcattaacgc atcgtgaaag cgctcccaat caacctcatt
    121 ccctaggatt ttcagctaac taacaatagt gtctttttaa tttgatgtca tgaaaatctg
    181 gtcacagcaa acacaatgtt ttctaaagca gatctggcct ccgagggagg aaagctctcc
    241 agggcctcca gtgccttgtt tccatggtaa cgacacaggt caatagctga agtcacacct
    301 ttgccagctt tgattctttc tcgcaactgg gagtctgagg caagaggatc acttgagccc
    361 aggagtggga ggctgcagta agctatgatt gtgacactgc actccagcct gagcgacaga
    421 gcgagaccct atctcttagc atagtccaat cttccttttt cttgag
    ACCESSION No. AA478952
    ORIGIN
    1 tttcccagcc ctcaggccac tttattgctc aagagtggtc agtctggggt atctgcatgc
    61 ctgaactcca tgatgatgtc gcctgtgtcg gggtgaaact ccactgcata gctgacagtc
    121 cgtgggccac ccagcagtgc tctgggatct ggggcagggc tgaagaagta gacggcctgc
    181 ttgcagtggg ggttccagca gcagcccccc tcgggatctg caggctccag gaggccagtg
    241 ctgagcgtgc actccggggt caggtggtac tccatccata gcaccgctgc gtggctctgc
    301 acgggccttc tgagctccac ggtgccctcg gcacacaggg gctgcagggg ca
    ACCESSION No. AA885096
    ORIGIN
    1 gtctgtgact cttggttagg gcaaatttca aatccattat aatacataca ttgcagcaac
    61 actgagtttc ttataatagg tactatccaa agctttcttt tttttacatg tatcacttaa
    121 tcctcacaac cacctgagga ttaataccat ttacctgttt tacagataag gaaaacaatc
    181 atttttcaat tatgactatg cccccaaaca ctggtttgga tggagccttc actggtatag
    241 agaatgacct tcttccctta gactagactc tggctataat aaaggatggt ttaatcatcc
    301 cctgaagcaa tgcataagat aatctgcaat gtatcttcac atactgtacc ttatttgata
    361 ggcaagagac ccataaagga agctgagcat ggattatcag cttcatcaca aatctgaaga
    421 aactgacatt tatgttatgt tgccttaccc aagttgggac atcagagcag caac
    ACCESSION No. H29032
    ORIGIN
    1 tttttttttt tctataaatc tctaatgtta tttaggtttt ttaaggutt ggaagtaaca
    61 gagggataca tacagcaaga tccacttaca tagttttaaa acatgcaaaa caagattata
    121 tatcgtccat atgtaattat atctgtggta aaatataaag atatgcattt tggggacata
    181 gtcaccagat tattagtagc tcaaggaaag gcaggaggaa gagtgctctg ggtgggggga
    241 ggttcacagg gtgcttggac tgtacctatg atttcttcaa ataaaaattt caagcaagta
    301 taaaatatgg gatataggaa tgtaaaggat ttgggcaaag ctgggctggg tgggtatcca
    361 atgttcctta tcaccatctc tgtacttctc tgantgcttt aaataggtca caatcnttgt
    421 aag
    ACCESSION No. R10545
    ORIGIN
    1 tagaatgaat tgcagaggaa agttttatga atatggtgat gagttagtaa aagtggccat
    61 tattgggctt attctctgct ttatagttgt gaaatganga gtaaaancaa ttngtttgac
    121 tattttaaaa ttatattaga ccttaagctn ttttagcaag c
    ACCESSION No. AA448641
    ORIGIN
    1 agccttagga atggttttta ttcacttgaa cactgtacaa atattacaat ttccttttgc
    61 tgcaaaaagt ataaaaataa tctttatata ggaatccatt cgttactgta aatctttcta
    121 aatctctgca aatggcccta aatgagggta aatgaaaaag ccgaaatgaa gagagggtta
    181 tggggcagca ggaggtgggg ccaatcatca gggctggacc acccagactc ctccccagag
    241 acctctgttc cttcttggta gccgccccca ccacctgcag gttctagggc taaaggccca
    301 gcagaagtgg gcacgtgaga gggccaggag gagctggagg gtcagggggt gggggatagc
    361 gaaggaagct agaagtggtg ctggcatgtg cccagttcca ccccacca
    ACCESSION No. R38266
    ORIGIN
    1 tttttttttt atcttttaaa tgggatttat ttatgtttac ataaaaggta gcaaatgtta
    61 cataagttgt ttccttaaga acatttattt tgtacaatca cattgttatc aagcaagact
    121 tatggaaaat ttcctgggtc cacaacactg aactttgaaa ctactgtagc attctctttt
    181 ccaagtttaa acatgacttt gtgcactgaa gaagtatggc ttcgcattgc acagtgggtc
    241 acatgtgaca acctgacacc aagcgagaag ccttttgatg aaggaatgtt ttatcttttg
    301 ttgaggttac caaaatgggg actttcatgt gtggtggatt atccaaaccc catanttttt
    361 ttttncggtt ccatttctgg cttccaattn aaattaaccc ggtttaaact aggcnggttt
    421 nggccaatgn ta
    ACCESSION No. H17543
    ORIGIN
    1 tttttttttt tttaacctct tgctcatttt tattccagaa cctaggaaga actagtacac
    61 tgaaggcatt tgatgtttgt tatgaaaagg aaacaacaaa aaaatcaagt tcaggctggg
    121 catggtgcct catacccgta atcccaagca ctttgggagg ctgaggcagg agggatgctt
    181 gagcccaggg agtttgagat cagcctaggc cacatattca gaccccattg ctaccaaaaa
    241 atttttaaat taaaaaatgg ctaggcatgg tgggcataca actgtaattc aagctacttg
    301 aggaggctga ggtggggagg atcacttgaa cccggggggt tgagggccac agcgagctgt
    361 gattcacaac actacactcc accctggggc gacgaagcaa gatttcgttt tcaaaaaaca
    421 atttttgttt caantcccat cttcaccnta aaaacctngc tacattcccc aggggaaaac
    481 caattttca
    ACCESSION No. T81317
    ORIGIN
    1 taaagnnatg aggtcttgct ctgtcaccca ggctggagtg cagtggcaat tgtccctcct
    61 cagtaagtgc aagccaccat accaggccct ttgaacatat tttaaatggc tgatttaaag
    121 tctttgccta atactaaagt ctaacatttg ggcttcctca gggaacattt tctaatttac
    181 tgctttctct cctatgtgtg gaccatactt aagtggtttt ttgcatgctt tgtaataaca
    241 gtctcttgaa aactaaacat tttaaataag gtaatgtgac aactcgnaaa aatcaggatt
    301 cttcccctac cagggnattt gttgttatta ctgtttactg ttggttactg gtttattgtt
    361 gttnctntta ggtgactttc ctggaactaa ttatctaana tatta
    ACCESSION No. AA453790
    ORIGIN
    1 aacaaatata tttagatata tttaaaagaa ttaaaaaaaa catttcacaa aacatttgtt
    61 gccataggaa ttatttttag caataaatgc ccacatcaaa atttaaacat ttttcaaagt
    121 atgattatct gtactaagta atgcaacaaa ttatgtaaac agagtcagat acatttccct
    181 gtaggagtca cttccttccc gggattaaag ctgtcccaga catctttcca ggggaccaat
    241 taagaaactg ctattttcag agcaacagaa ataaaagctt ttatttgttc atttgaatat
    301 aaaacaggcg ttatcacaga tgtacaaagc gtactggtgg tttaacatac aagaaggttg
    361 ctgtcctttg cacataaaaa ttttgtttga aactgtggct ggttgagtac atgagtt
    ACCESSION No. R22340
    ORIGIN
    1 ttttttaaca taaaggtttt attgaataaa tacatgcact gtcacgtgaa attagttgaa
    61 cagaaaggag gttctctact ttttaacccc catcccccac cgctgttctc tatttgcagt
    121 ggggggtcca gctggaggtg gaataaatgc ggcaaccaca ganaaaacac acagctacac
    181 acaggcctgc atttggctta tgtgcctgaa aaagaagggc cgacctcttg ataaagaatg
    241 tctgtaaaag gaattcttac cgtgcagaat atattatcat gggcnantac agttacaagg
    301 ctgcttctat tttatttatt ttttgagacg gagttcacct ctgttgccca gggtgggagt
    361 gcagtggtgc gatcttgggc tcactggcaa cctccgcctc ctgggttcaa gcantt
    ACCESSION No. AA987675
    ORIGIN
    1 gggtagatag ctagaagtga tagtgctagg tcatatggta aatatatctt caacatttta
    61 agatactgcc aaactggttt ccaacgtgac tgcatgtccc atcaacaatg cgtgagtgtt
    121 ttagtttttc cacgtcatta tttcacttcc cccaggtgtt actgtccttt tttattatag
    181 cattctagtg ggtaagaagt ggtgtctcac tgtagttttg atttgcatgt ccctgctgac
    241 tgatgatgct gaccatcttt tcatgtattt tattgtctat tcctacacct ttttgatgaa
    301 atggttattc aaatattttg cctattttaa aaatggggta attatcattt tgttgcgtag
    361 ttgtaagtgt atttcatatt ctggatatga gtcctgtatt aaatatatga tttgaatttt
    421 taaaaaaaaa aaaaaaacct cgt
    ACCESSION No. N51543
    ORIGIN
    1 acgattaatg ttttattatt catattttga caaagatagc atattatatt ccaggacatg
    61 gtagttacca tgtggggaaa cctatcaaag catttttaat gactgcttag aataactgta
    121 gaaagtactt tctcaatgat ttttgtatgc aagaaaaaaa atacctgaaa gtaaccaaaa
    181 gtttcagact ggaaaatatg ccaggaagat tttcttctct cattctcagg tgaggttata
    241 atccagtttt agcaaatgtt tgacaattta aaatactttt gaaaactgga gatttaaaaa
    301 atgtaaacaa ttggtaggca cagcaaaatc gtagttttcc cttctgatat tatacatttt
    361 ggcatctctc tacagttatg attaaccatt aaatnaaggg nagctaaaac gttccaaaaa
    421 taggttttac caacattcan tttttaaaat tttccattca agctggtaat ccttttgggt
    481 ttcc
    ACCESSION No. N74527
    ORIGIN
    1 aaacgtggca cagtgtgtgt agtgtatgtg actactatca tttgtgtaag agaaagaaaa
    61 gtttactatc agagactgta tctggaggga taaacagact ggcaagggtt gcctctggna
    121 agaaaccggg gaatagagag cgggagtaga aagactgtat tagctgggtg tggcagcaca
    181 cactgtaggc ccagctactc cagaggctga ggggaagact tgctcaagcc caggagttca
    241 ggtccagcct gggcaacaca gcaagactaa aaaaaaacaa ctttcttttc caagaatacc
    301 ctttttgtaa cttttgaatt ccgtattttt taatggtcta tggtctacaa acactcatgt
    361 gcaaacacat tacacgcaga ataagggatc acctgcacga agctatgaac tatttcctca
    421 tcccttctag ccccttccta gaggcgaacc ctccgccccc aaccccaggc actatctgtc
    481 ctgcttgcac cca
    ACCESSION No. AA121778
    ORIGIN
    1 tttctgtcaa gctgttcttt atttcangga gagggcaggg gcagagcttt acaggagtag
    61 agattttgta tgctattgaa ggtaaattgg tatcagttta aattagattg ttttaagtgt
    121 aggatgttaa ctataatccc catagcaacc acaaataaaa catctaacaa atatacacaa
    181 aggggagtgg aaagagaatc agactagttc actacaaaaa aacagaaaag aaggccataa
    241 agaggaaatg aggggccaaa aaagtatatg acatatagaa gaagtgttaa atggtagaag
    301 aaagtccttc cttaattact ttaaatgcaa atggattaaa ttttccaatc caaaaggcag
    361 aaattggcag aatggacaga naaaacaana catnaacatg atagtgatat gcctgtc
    ACCESSION No. AA258031
    ORIGIN
    1 ggggccccgt gatctcaacg gtcctgccct cggtctccct cttcccccgc cccgccctgg
    61 gccaggtgtt cgaatcccga ctccagaact ggcggcgtcc cagtcccgcg ggcgtggagc
    121 gctggaggac ccgccctcgg gctcatggcg gccccggtcc gcatgggccg gaagcgcctg
    181 ctgcctgcct gtcccaaccc gctcttcgtt cgctggctga ccgagtggcg ggacgaggcg
    241 acccgcagca ggcaccgcac gcgcttcgta tttcagaagg cgctgcgttc cctccgacgg
    301 tacccactgc cgctgcgcac gggaaggaag ctaagatcct acagcacttc ggagacgggc
    361 tctgctggat gctggacgag cggctgcagc ggcaccgaac atcgggcggt gaccatgccc
    421 cggact
    ACCESSION No. AA702422
    ORIGIN
    1 aaatgtcttt aattgctgaa tgcctctttg gctaatattt ggaagatcat tatttagtcc
    61 tacaacagac gcattgttcc actttcccat cattttgttt gcaaaccgct aaaagtctta
    121 tttcctcatc tctttgacac attaccaaag tggaccctat gctgtaatca cacaggataa
    181 tgttggaaag tatgaatatc taaattattt tttaaaggta ttattttttt ccttctgttt
    241 tcaaatcatt tctgacagtt tctaaagaca tggtcacagc tgcctgaagc atgtcttctt
    301 cactcatagc atcacctaga tcactcccaa gtgctcctga actggtggct ggcctttcac
    361 atggatgtga actctgtcct gataggtccc cctgctgctg ctgctgctgc tgctgctgct
    421 gctgctgctg ctgttgctgc ttttgctgct gtttttcaaa gtaggcttct cgtctcttcc
    481 gaagctcttc tgaagtaaga tttgtacctg atgtctgtgt catatcttga gaaatgtttc
    541 g
    ACCESSION No. T64924
    ORIGIN
    1 tgagacggan ttgctctgtc gcttaggctg gagagagact ctgtctcaaa aataaaaata
    61 aaaataaaat aggagtaatt cacgaggaaa agattacata ggctgctttc ctgcttttct
    121 tatccacagg cagttctttg caatgactat ttaaaaacta aaacaacatc acaagtcatg
    181 aagtttgtgc tacccctgaa cttgacaaat tgtctgattc aagtgggcaa agcacaatga
    241 ttggatgcat ctgaacagaa cctcctctgg aatgggggcc tcactagagt gagctcttca
    301 tgagccttgc caccaggggc aggggattat tctgttattt tggcctgttg tagccaagtc
    361 tgcaccccta ggcacccaaa acaaactggg gngagttgg
    ACCESSION No. R42984
    ORIGIN
    1 tttttttttt tttttggaaa acactgttta tttgaaaaca atgagacctc aaatatgaaa
    61 tatagttaac aatgacattg acactgttgc tagcactttc ccctaaacca cccgtaagtc
    121 ttggacgcat gtgcatgcag cacacacaca cacacacaaa aaccaaaaac aaagccaaaa
    181 aaaaaaaant cccaaacaca acattccatg nttgttcatt gaactcctga tgccgggagn
    241 acaggactgt taaaagattt tgtctcccac attatctctg ggagtggggc acaaagc
    ACCESSION No. R59360
    ORIGIN
    1 ttttttttgg ttttattttc tcctgaagct gaaaatgttt cacccatata aatgtggcat
    61 tttagactct agctataaac ctcatcgacc agtatgtttt cagagttgtt cacaacaaaa
    121 tattattcgt ttctaaaatc agttttcact ttttggtgat agtattccag gctggactgc
    181 ttgaatttta gatgcagaga tcattttata tatatctgtc aatgtaatac agaaaaatta
    241 catgtgaatt gtttatgtgc cccctctacg tagggacaca gtatcaatca ctcaataagg
    301 cactgtaaca tcaggtgggt gtttggggat aaataacctc ttcggggttt ctttcaatcc
    361 cactaccata tggct
    ACCESSION No. R63816
    ORIGIN
    1 aagtcannga tntttactta atttctttca ttgtatactt gtatctcatt ttctcttaac
    61 actgaaaatc ctgacttcta aagaaatgta actacttgtt ttcttacaac atagtattct
    121 agatacaata ggttcaaaat aacaccagta ttaccattaa caatgagact actaaatgca
    181 ttttcacagt gcactaaaat ctcaggaatt cactggcaat ataattcatc catgtaataa
    241 aaaaccactt ggtaactcca aaactattca aataaaangg taataacaaa tttaaaaatg
    301 gcattttgng ggtttcttcg gaattttttc accctttata ttcccccaaa gggccttctc
    361 ctattaattg nggaggggcc ttgggnattg g
    ACCESSION No. T49061
    ORIGIN
    1 ggaccaaaga actttatatt tattttaaat atcaaagtaa cacaaagaac tagttcaata
    61 tacagtacac ttcctactct tcacagagaa ctgaaatttt ctataaagac atttatactt
    121 aggaaacatc agacaaccaa agtatgtata aaactcacaa gatattttac acacagttca
    181 caataattaa ttctgatatt ttaggntttt tctgtcattg cttttaaagc atccttaatt
    241 taaaaacaaa aattattatt tgaggactgg aaaacaggtg gcaaaggcat ttctactttt
    301 aattatacac tggtaaatcc ccccttaatc caaaacattt tacttncaca t
    ACCESSION No. AA016210
    ORIGIN
    1 cacagcaatt catctttgct tttattaata atttcaacgt atgttttgag cactttacaa
    61 tgtaggaaat gctttcatag acattatttc ctatgattct cacaaaacct tcactgaaaa
    121 aaaagacttc aaggtcactt gccctatgtt tataaaataa tccgctttaa ataagcagat
    181 aggagtccaa aaattcttac aatcataaga aaaaaaaagt ctaaccagta cttaattatt
    241 tcttgtcatg attactttgt tttaacgcca ctgtttcctt gcttccccca ttttcttcag
    301 ataagtttac tccttttggc ttgtcctgca tccttttctg acagctgccc tgtgtacacc
    361 tgccttaaac atctatcctt ctactctgga atagactaag ccaaaagcaa ttaagaaata
    421 tttcattcta aagaaaacag aattttagtc caaaacccaa at
    ACCESSION No. AA682585
    ORIGIN
    1 cctgtgggct atattttcct gtatgttttg tatttttttg ttggaaactg aacattccaa
    61 gttttacact ggggaagctc tggaaactga attattttac tcctccagga ttgtttattt
    121 ttaaaatttt gctggcttat gataaagggt atttcgagga aacagataaa gggatgtata
    181 gggcgaggta tgggggaagg ggtgcagagc ttccatgccc tccgtaggtg caccactctc
    241 caggaacctg caggtgttca gctatgtgga ggctccctga atgcggtcct cttgggtttt
    301 tatggaagct tcataatgtc agcattcctt cccccaaggt atagggcaag actctctctg
    361 gggaaggtct taggaccaca atcagaaaag tgggcagaca ttagagtcct gccttggggc
    421 agatgaaagg agggcaggag aaggtcagag aaattgtttt tcttgag
    ACCESSION No. AA705040
    ORIGIN
    1 gtagagtcgc ggtctcactg tgttgcccag actcgtctca aaaaactcct gggctcaagc
    61 aatcctcctg cctcagcctc ccaaagtgct gggagtctag gggtgagcca tcatgcccag
    121 ccaagcctga ttttaaatca ggtctctgcc actagcagct gagagctcct cactgataaa
    181 tcctttgcag ctggaagtat tcaatggtat ccagtatatt cccaatggct cattcctctt
    241 ggacagagaa actcaagtta aatgaactct tttggctgtt tttctccctc ccctttgttt
    301 cctccctctc ccttgcctgt gtctctctgt ccactctctc aggcccttc
    ACCESSION No. AA909959
    ORIGIN
    1 ttttaatggg caaaagaaca agttgcagtc aatggctgca gaggggtgtc tggggtccaa
    61 tgtgggctgc actttgtggg tactgaggaa atgggaagat gctgcttcta ggtcagctgg
    121 tgggttggag gttgggggct gtaattagca gcagccttag aactgggatg cctttcaatc
    181 cctcctggcc ccttatctct gtggggcagt cacaggacat catctgtttt attcaaagtt
    241 gggacttgca gcaggagacc ctgtcctgca tggagtaggg gtcctctgtt gacaaacttc
    301 ttggtttcca gctcttcccc atctgcagca ggcctctgga ta
    ACCESSION No. AI240881
    ORIGIN
    1 tcggttaaga tttttattat tccagagaaa aattagaatg tatcggtaaa agaaatagga
    61 atgcatattt caactcactg tcacaaacag gtgttttatt atcccaaatg acagtgttgc
    121 ctgagatgat gcatgtggca gacgaggaac caatgagtcg gtatccttta ggacaagaat
    181 atttaatttg ggatccgaac tggatgtctt tgatcacatg tgccatgcca ttcacaggat
    241 ctggaggatt acgacatgat ttacgtttgc acttgtcctt agcacttgtc cagactgagt
    301 tttttaggca gatgatagaa aacggtcttc cggaataacc agggcggcat tcatagttca
    361 gatatgtccc aatgggaaac tcagagtcat cagttaggtt ggtaggcctg gcaaatggaa
    421 gcccattccg gacattgcat tga
    ACCESSION No. AA133215
    ORIGIN
    1 caagaacatc ccttttaatc acaaaccact catccacaaa tgtggctatg gggtaagcag
    61 tctaggctgg gaccctttcc agaggtaagt caaggtcacg tccctgcccc cttcctaggg
    121 tggcggtggc tccagccagg ggggcttcca ggttaatacc agagcctcgg ctactctgga
    181 ctcctgtgag ctcttcttgg ctggaagaag gggggcattg tgggcctgct ctgtcccaag
    241 gctccagaag ctgcccctac ccaggcctgc ctgc
    ACCESSION No. AA699408
    ORIGIN
    1 taacagtctt aatattcatg tatttattct cagaacatac aaacttatct tctcagagaa
    61 tagaaaacag agatttcact cagtgacaaa gatggacaca gccagttcac cgtgtccccc
    121 catctactta gaaaatcccc tgggggaggg gatgcctaga gcatacagca ccccttggtg
    181 gccggctgtg cacaggtcta aagactctca acttccttta ccatccaaaa aggaaaacag
    241 ctgtccagat gacagtaaga ttccactgtc tgtaatcctc atggtgccag gtctcctggg
    301 gcatctaggg caatgatgct actgcagttt atgcagttac acagtcaagt ctgtgccaaa
    361 ggaggtccca tccggcggcc aggtttctgt
    ACCESSION No. AA910771
    ORIGIN
    1 ttttgttgta gaaatatatt tattaacata agcagttcac aatttactgt aagaaaaaaa
    61 gcaagctaca aaacagtgat tccatgttta tattaaaata aacatacaca aattaaaaat
    121 ttccttagat atccatttaa tctctgggat cataagcaat gtttaggtat tttttgctca
    181 tttattgcct aggttttaca caatgagcat atatgttaat tgtgtaattt aaaattatgg
    241 aattaagtgc aagagttcct aaccaccttt tacaaaactg ttatgagaaa atacattcta
    301 gattcaaaca aaaactaagc aatatatccc ttattctaac agctctaaaa tctgttcttc
    361 tcattatact cccac
    ACCESSION No. AI362799
    ORIGIN
    1 tttttttttt tttttttgca agggctgcgc ggcattttat tttctgaacc ccccacagca
    61 ggggcggcca gtcctgctgc aggcagagtt tcagtcttcg gagtttgacc ttctggccca
    121 aggtcatcac agccacaggc ggaggctctg gggaaaggtc cagttcctgg gatgctggcc
    181 cctaatgatg ggcccatctt tccagtgccg cccttccctc ccgcctggca caggagttct
    241 ggagccacgg tcctgagtct acagaacagc ccggtcagcc tcgtcccgcg gtgcaagcga
    301 ggcctggcct ccctccctgc ctgtccttgg cccggccaca tcactccctg cgtttcttct
    361 tcttctccgg ctcctggaca ttggccgcct ttgctcgggc actggtcagg ggccgaggtg
    421 tcctccttct ttggcgagcc cctttttggc cacgggccct
    ACCESSION No. H51549
    ORIGIN
    1 atacaacatc tttatttggc attgganatc ctgacatttg tncattacag ttccttaaaa
    61 aacaaaccaa aaaatcagaa caaattaatc aaaaataaag atccaatggc tctatttaca
    121 tatngcaaag acagcccagg natcttccnt gcacacacac accccgcccc gatacagtta
    181 aggggttaat aagctttggg gagcgcagga ggcaggttcc acagttcatc aatcccaagn
    241 cacccccatg aggtaggggt gcctcacaca gccagacggn tatcaagagt atgattggta
    301 gctttttcct c
    ACCESSION No. R06568
    ORIGIN
    1 ctgtcctgat tagaattaat tttcataaag agaacaagaa tcttgactgg ttcacccttc
    61 aattccttgt gcccgcaaca gtgaccggca catggaaagc attcagggaa taaaagcaca
    121 atggaaaatt aaaacatact cactgcatgc ctgccaccta taggaaccaa attaaatcac
    181 tgccaatatg gcatgggggg aaaaccttcc catttttctg ggaataatgt ttacaaaggg
    241 tgggaaaata aggtggcaca ttcacctggg gtggggcatt ttaatttaaa cgctngttga
    301 ccccagtngg ttgttacntt tttcaggtgg aatta
    ACCESSION No. AA001604
    ORIGIN
    1 cttatgaata atgttagaaa tggaacatga tgttttaaat gtatacataa accttccaat
    61 taattatcag gtgatccagt agtagacctg tgacctctga aggctcctgc ttctcatccc
    121 ttcccttctg ctgtgatttg ttgtcttccc tctgctcatt ccccttgtgt ctgtttcttc
    181 catcctctcc ccatgctccc tctgttgtca tttcccctta ctctccactg cacccagcct
    241 ctgttcataa tttttactgc aattccgatg attgaattat aaactggaag ggagcaggga
    301 tattgatctt catgtagttg gacatgtact agactcacgg agaacaagga ctgggttgta
    361 ggcacaatgc tgtgtgggtt ttgggtaaat ctaactcaca ctcaacttga ttttgttttc
    421 c
    ACCESSION No. AA132065
    ORIGIN
    1 gagacacagt acaacagtct ttaatgtata tataaatatg cctacataac agagtttgat
    61 aagagaagtt ttggctatat acaactctgc atgtaatcaa actctagaac atcaaatgca
    121 actccactgc atagctgttt tgacagagca acagttaagc ataaaatagc tttgcacctt
    181 attattttgg agcaaaataa aaaataacca ccacaaaaaa aatctctaca ataatttaaa
    241 ctaaaaatgt tgttgaggat agggtaaaca acaaaaaaga aaataatttg atccatatgt
    301 gatatttggc tgaagattaa cagtgttaag tctaaccaac agcgagataa ttttaatttt
    361 cccaagcatc ttnctaccgg tttattagcc atatttggat attaagggga agggcatttn
    421 gccctttacc aaaaccn
    ACCESSION No. AA490493
    ORIGIN
    1 tctttattga cttattgtaa ttttttggca tacaaattac ttaagtatat ttacaattct
    61 tacataatgt acattttaga agataatgta ctttgctcca tttacaatga caaactactg
    121 taaaactaca ttcatgaatt agatacaaat cctctacata ctaataaaaa gtaaatggac
    181 tgttggttat acattcttta aaatatacct tttcacaggt agcaagaaat agtacatgta
    241 ataagtcttt atgactggaa tga
    ACCESSION No. AA633845
    ORIGIN
    1 gtttttaaaa gtcagggttt tttgttgttg cttgtgtgtt ttataattaa catagtttat
    61 ttttaatact ggcatccaag aatcctggtt tactcaggtg cagaaagact ctctaactaa
    121 gcagccaaaa aaatttttgg tatgcaagtt ttatcatttt ttaatttgca tatgacttga
    181 acgtgtcttc aagtataggt ctacataata actttttaag aaaattataa agctcaatac
    241 aataaatcta atacataaat gctgcttgta agtcaaatat ttaagagact ataaaaatgg
    301 gtaattttgt gataaaattt agaatcattt gacaagagat caatgaattg
    ACCESSION No. AI261561
    ORIGIN
    1 cactgttaaa aatacattta tcattaaaat atattacaca tggagacagg atgcatcata
    61 tacagtttgg aagacttgct ggcccagaaa atcccacttg tttcaccgaa cactcatttt
    121 ttcagggatt ttacatttta tttttagaga cggggtctcc ctctctcacc cgggctggcg
    181 tacagtgatg tggtcatagg tcactgcagc ctcaaactcc tgtgctcaag tgagccaccc
    241 acgtcagcct cccaagtaac tgggaccaca ggcacgcatc accacgccca gccaattttt
    301 taaaaatgtt tttgtagaga gggggtctcc ccgtgt
    ACCESSION No. H81024
    ORIGIN
    1 agcttcagcc tttattaaac aaaggaggag gtagaaaaca gataagggaa cagttaggga
    61 tcccttcttt cccctataca tacacagaca tacaaacaca cgcacccgag tgaatgacag
    121 ggaccatcag gcgacagatt gaagggcaga gggaggcagc accctccgag agttggcccg
    181 gacccaaggg tgggctgaga cctgggccag gggcagccgt tccgaggggt tntgcctgag
    241 cagtttggag atgaggtcct gggctcccgt ggggcacaga agcggggaac tttaggtcca
    301 ccttggacga tggcgg
    ACCESSION No. N75004
    ORIGIN
    1 tcaagtcata agataaagtt taatcatttg atcatgttaa aagacacaaa acacagccaa
    61 tctaaccaaa tttcaggcat gcatttacat aaatatatta aattaagaaa agaaattgta
    121 cacttaaacg tccttttcac ctagaaatca ttaaatccac agatcaacaa taaaaccaat
    181 tctctgcatt taccacttca agatacaatt gttctatttt aaagataaca caaactncac
    241 tagtctggtt aggaatttat ntgcattata catatattat
    ACCESSION No. W96216
    ORIGIN
    1 tctcaggagg tagaagcttt attatgacat cttcaaaaga caatcaaatc aatagacatt
    61 tgctgagcac ctgctgtgtg caagcccgtg tagacagtag ggtccagtgt cccacgcatg
    121 gctctcgaat ccccggggag aaaaatcaca tcnggggtca gggagttttg cgtggctgag
    181 aacaaagtgg gtttctgaac atcaaagtgc aattcgcttt acggggcaaa ctccgangcc
    241 cagccccgcg tngggaagcc gcagcngggc gggcccgctt cctggggctn gcggccgggg
    301 tttctctaag ccgcacgcnt tgcgtggtgt tgcggggcct ctcaagcaag cccggaagca
    361 gcatccttga gctccggttg ttggagcgct gggacctctg gctgccgccc ccgcagcagc
    421 agcaaccact actccgctgt c
    ACCESSION No. AA045793
    ORIGIN
    1 caaggtatag ctaattttat tattatcaaa caaaactagt agatataact tccaggaaat
    61 aagttacata aatataacag aataaattca ttttcttaag tttcaaatta aagatgatta
    121 agaaatacag ctttatgtaa agtttctgct ttttctcaac cacgcctaaa gaggaaagaa
    181 ctggcagcag gaacacttgc tcctaggaaa caaatacaac aaaattataa ttaaaaagat
    241 cttcaagcta tcaaaatttg tgagagaagg atggtaagaa tgcagtagaa attaccanat
    301 gacaaacaaa atcctatcag ttttcaggtt ggtcaaaaag taacttccat gaatatagcc
    361 tgtggatccg gccat
    ACCESSION No. AA284172
    ORIGIN
    1 gtgttaaagt tggatggatt tattttttta aaggcccagt acaaaaaaat ggttgaggaa
    61 agtgactctt caacaaaata tacacctgta gaaaaaaatc cctaatatac tgatatttaa
    121 ttgaacggaa agtactaaag agaacatact ttaatatcta ggcacaattg gtcaggtact
    181 aattataatt tctgttctca tttaaaagtt taaaccaatt cttcaactgg actgatgtgt
    241 gtgagtctaa tacagagaag gcacctctct catctctcac tctccttaag gaccttttga
    301 gagaaactct ttgtaacact ttaagggaca cagacaatgc actatatcta agtatagata
    361 tagttattta acatac
    ACCESSION No. AA411324
    ORIGIN
    1 tttttttttt tcccaaacaa tacatatcag attttatcca ttttgttttc tacatgttct
    61 ttgtgactca agtttgacat tagcatttgc accccaaatg agttccccta caaataaaat
    121 ttgttcatgt tgacacaaag aacacaaagc aagtatagat ccctcaggaa gttgtcacaa
    181 ctcttgataa gattaactcc accactatca tcactttttg ctttgtcccc tagtttgaag
    241 cctgctggct tttataattc aatgagaatg actccacact cttctccaaa gcgcccatta
    301 tttttagttt ttcggtgcgc gactcaacat aaagacctgt ggctcttatg agctgcctgt
    361 ttttaaatgg tgcagtagtt tcagtttcca tttaataagt tcccagataa caaatggaga
    421 atgggaagaa tcttctcaag gtcacagtga aggtaaaaat aaattatctc catcactgag
    481 aggct
    ACCESSION No. AA448261
    ORIGIN
    1 tttccagaaa aggatatttt ttttattcaa gtaactgcaa ataggaaacc agagagggag
    61 ccccaggctg ggacaaatca tggctacccc tccccaacag aacaggggga ggaggtggcc
    121 cctacaccct ttatggtcga ttcgggcccc cttgctcact ctgctgcagc atcctagggg
    181 cagggccagc cttccctggg actggggtag tcggtcaccc agcctgccat gccccagccc
    241 ctcttcccca caaagagtat cttgggggag gggatcgtgg gcagaacagg aggcaatgag
    301 gatgaacatt tggcgctggt agcagcagca atgacggatt gtcgaagaat ggaacattga
    361 aca
    ACCESSION No. AA479952
    ORIGIN
    1 aacagtctgg ctgttgtttg aattaaactc ttaaacagga tgtttagtta gagggtaatt
    61 gttgagtaat gatgcataca acagcatact tccctttctt gctgggggtg cagcttttca
    121 gttttcttgt tttactttga cagtgcaagg ggaactgaaa ataatttcca ttgtattatt
    181 tatcttagtt cagctgaggg ctttatgaga cagtggatgg ggaggcagta agacggtgat
    241 gagataaaat gtgtgtgttg cactgactgt ctataaagtt atcctttctt catgaaaaag
    301 tagcatttaa atctggatga gtttataaag gattacaaaa tgctgattta tagagtaaac
    361 tttaaaatat taaagactaa agactaaaag aagagtaata atgaagtaat gtag
    ACCESSION No. AA485752
    ORIGIN
    1 ttcggcagca actcctttcc tttatttctt ccccttgtaa agggaaattc aagttcagca
    61 gcattccttt cctgccccaa gtcctcaacc agacaagagg ctgcaggcac caaatcttgg
    121 gctggataat ggcaaaggcc tcagaagctc acctccagct ctgagcttca acagctgttt
    181 gtaccagtga gtcagcatta aatccaccag aaaagaacag caccacccaa agactggggg
    241 gcagctgggc ctgaagctgt agggtaaatc agaggcaggc ttctgagtga tgagagtcct
    301 gagaca
    ACCESSION No. AA504266
    ORIGIN
    1 tttttttttt tttatatata tatataattt tatttaaaat ttagatccct attcccacac
    61 tctaataagc tgtataattt ttgtttagaa tttttctgca aacatactac aataagcttc
    121 ttttatttgg agacaaaata cagtggcatt actggaagga atatcacaac attacatttt
    181 tatcttaaag gacaagcaaa ctttcagggt tgataatggg ataagcatgt ttgagactgg
    241 ttaccttctg gcagttcact gcatctggat atttctgaaa agtatagaga agctcttgga
    301 ttttaaaaat atcttaaaat acttttagat gaaaaaattg taaaagttct gcttataagt
    361 ttacttttct ccacaattac aatatttaaa acaaagtttt gttgattgac gttttaagca
    421 tttaaattta gaatgctaaa aacaattcta tcctacactt tcttcagggt aggggaataa
    481 atacatcctt aacattgttt tctggatgta aacagaaatc cagcagaggt catcattatt
    541 tagtacaacc agtaaataaa tgtaagagaa t
    ACCESSION No. AA630376
    ORIGIN
    1 agcttggcaa acctttttta ttttgtgata aaaatgcttt catataaatt tcatcttaac
    61 tacctttaga atgaaacgga aaagtaaaaa caaagtgtgc attttcctta ctacgtttag
    121 tcaggaatat gcggtcattt tattggttac tgggtttctc atacaaacag atataatatc
    181 acttttaaga gaaatgtaca caaggaagta accatagtac cacttattag tgggggcctc
    241 tgggtacata aatgtgtcct cccaaatagt catcatacat tcaatggtat t
    ACCESSION No. AA634261
    ORIGIN
    1 atagtgaaaa tatactttat tttttaatac aatagctgcc agcaatatac tggtgctgat
    61 gttccaaaga taaaagaaaa tacatgcatt ctataataag ctttcatttg cctgttcaag
    121 aaattataaa gaaaatactc caattctgtt caacattacg gcttgaggag ttgaaatttt
    181 tccatgataa aaatatactt tgtgtggccc aaaccttgac tatttataaa ggatggagtt
    241 tttaaaagcc cacatgtatc aataatggat gctcccctct ctttgaatta aatgcctaaa
    301 ttcaaattaa tgcaagaaat tggtgaatca ttaaatgatg aaatttgtat caaaatgttc
    361 atgaaaaaat acatttctat ttcctctaca tttttacttt gtagttattt tctaaatggg
    421 tttaagggca cagaaataaa tgctatctac atgcaactct ggagagattc aaaacacaac
    481 agaagttaac atgcctaaat cctagagttg atccatttag tgtaagaata aatgtcagaa
    541 atc
    ACCESSION No. AA701167
    ORIGIN
    1 ggtagaggca aagtttcgct atgttgccca ggctggtgtc gaattccagg cctcaggtga
    61 tcttcccacc ttggcctccc aaagtgctgg gattacaggc gtgaaccacc gtgccaaacc
    121 tacattttta gatttattat ggtgttctga ttaacaataa agctaggtta ttagctgcct
    181 gggaagagga ggaagtagat ttttacagtc acttttatag aaactgttaa attcacatga
    241 gaaattccac cttacgagaa ttggctccct gacatgtctt tggactacct ctgtttctct
    301 aagtttttgt ttttttctgg tgtctgaatt aagttggtga cagatttggg ggatatttga
    361 gtagcacttt atctagagtt gc
    ACCESSION No. AA703019
    ORIGIN
    1 ggcatttcag taaatttttt taatgacttt aatgattctt atttaagaaa aagcccttaa
    61 ataaatgcta ccaaggcagt aatatttgac catatgaacc agaccaaata ccctttaatt
    121 ttagtatatt aacctctgct gtaaatgctc ttttaacatt gccacatgta caaatttgtc
    181 tagaacttca cgacacaaaa gtgtgcaaat atgagtctaa gattgtgctg aaatagggaa
    241 aggctaacac tgatgtgcaa agtaaaaaag aaagataacc gcttctgcaa caggtaataa
    301 aacaaggaaa aaacgagtta ggtcctgcat gtgtctccac ttcattgctt ccatgtttga
    361 aaaagggagt ctgttctttt gctaggccat gaggctggaa tccacttggc atactgtgtt
    421 gagaggtcta agttcagtgg tgctctcagc agcagccggg agg
    ACCESSION No. AA706041
    ORIGIN
    1 cgctgagctg cttatttatt gaaaataaac gacggaaaag tctggccttg ctcctgtgca
    61 agcttggagg cctgggtcgc cgctgtggac aagcgtctta gtgtcatgca gaccagaagg
    121 cagctgctgt cccagggccg gggccacctc actgcctctg atggggactc ccagccccca
    181 tggctccgct gtgccctggg caggggacgg gctgggggca ggggagggct ggagcccagg
    241 aggcagcaca gcagccagaa agccgcacgc tgagcctgca cctatggttc cgggaggggc
    301 ttgggccgtc acccaagtgt gatccctaag aacaggaggc ccagcaccct ggaaggaggc
    361 gctggaaggc ggggcggtgg tggccccgtc a
    ACCESSION No. AA773139
    ORIGIN
    1 ccatgaacac agtagtgaga tattcctttt ccactcctac actatcttct gcttaaaacc
    61 ctctgagggg tcccatctct ctcagggtga tgtctagact tcttctgagg ctagaccagg
    121 tggtgcggcc ccatgtgcca cgcacccaag ccccctgcct cagtgtcccc catatcccac
    181 accacagggg ggtggctgcg ttctgtatgg taggtggtgc tgaccactgg gcctctgcac
    241 acgctgctct cagttccctg gccaactctc cttcaggcct cagc
    ACCESSION No. AA776813
    ORIGIN
    1 ttttgtagag ctgggatctc actatgttgc ccaaggtggt ctcaaactcc tggcctcaac
    61 tgattctcag gcctcagctc cggaagtgct ggaatcacag gcaggagcac ggtaacccgg
    121 gccccacagg ggtttggggt c
    ACCESSION No. AA862465
    ORIGIN
    1 tttatgctag gcaaggaggg atgattattt attagcttct acagattaga caatggggtg
    61 ggggtgggct caaggtgaga tgattttttg ggtccaagtc tactcaagac aggcatccca
    121 gtcttcggtc tccaaatcca cctcctgtct gtccccccac actgctcctc aggccttgtg
    181 gatccattga ctgtgatttc tgtggttcag ctcccacatc aggcaggaag ggcagctact
    241 gggtctgaga tcccacattg cctccaaccc ttgcttccta gctggcctcc cagggcacca
    301 cgaggggctg ggccaggctg ctgtgctgca cgtggcagga gtagggggct gtgtcctgcg
    361 ggggcactgc accaccaccc aggactggta agtgccattt ccattgtgaa gaacatctcc
    421 cgtactcagg ctcctgcacc tcgcggcccg agtccagtgc acatcaattt ccctgggtag
    481 aagtcgtagg ccagcacttc agtttcttct tttctcctgg gggctggtgg ctggtgacac
    541 cacagaggga ggatctgccg gtccaggata tttttgct
    ACCESSION No. AA977711
    ORIGIN
    1 tttggcattg taattatgca gaagaaaatc tttattctta gggatcatgc tgggaactga
    61 gggatgaagt atatgcatat tccaaatggt tcaggaaaaa tcctgtctat aaagcataca
    121 tgataaaatg tcaacaataa gacaaactag aggaaggata tacaggtgct tactgtcaaa
    181 tttcaaattt tctgtaggtt tgagagattc aagatgaaaa cttgggggaa aattatatat
    241 tctgataata aaacagatgg gaaacaaaga gggcccataa gacagtcact gattaagatg
    301 ctttctacat ggatgggcct catccttttg tccaaaggga ctacctggca tctgttccat
    361 gttagtgaca gtgactcacc ccaggttgct gcacagatat gagaggcttt agatcatagc
    421 acagtc
    ACCESSION No. AI288845
    ORIGIN
    1 tttttagatg ttttaaaata catttatttc atgtcgtttg tccccagggt ttggagtttg
    61 atgttctgga ccaagcgtag gctctgagca aatgctacca gggctggaga atcagttctg
    121 ccacttccta gttaagtgat cttagacaaa tttccgcgcc ttagttttct tctcagagaa
    181 atgagactag tcctatccac actatggaca agtggtagga ggcgaaggag ctcacgtttg
    241 taaagagcct tgcacggtgc ctgagacaaa ttcagtgctt agcaaatgtt agctcacctc
    301 tcccttttct tcctgtatcc gattttgtat acaaatgtgt agaaaattta catgaaataa
    361 tgcagaaag
    ACCESSION No. H15267
    ORIGIN
    1 tttttttttt ttacatgaag tagaactttt atttggaaag ttgaatttca tgtataatga
    61 aaatattttc aaaccataca tagtcataag cataatacaa acaccaccta caatacaaac
    121 acgttttata aagttctact atgaatatta atccaagcca aaagaaaaag gtaatcacgt
    181 gaacctgttc tacatacctt tcatctcttt tgatgacgta atcgaacaat ttaaggtaca
    241 aaacaangaa agctttgggc tgaaccctac ttatttcact ataggaacac taggatatat
    301 actaccacag gtaaccaaac ccaatcccat tataattaat ttaacattgt tacatggatc
    361 ctatcttaat ggnatgtaaa cat
    ACCESSION No. H18956
    ORIGIN
    1 tttttttttt ttttttttac atgtaagaag tggttttatt ccaggngtgt gtttcataaa
    61 gacgaggtcc tcaaggacag ctagtggcac atgctttggt caagaagagg aaaagcaaaa
    121 acagaacagg gctgcgttgc cacaaaggac cggctgataa gtgcagagcc tgatctgacc
    181 acagcaaagg acagagagac cctcttgaag gccctctggt cagcagtcct cttacattca
    241 acaggcgcac ccggctcccc agccccaaag gtccatgccc gagtntggcc cgggcttcta
    301 gtccatcctc tgggggagag gcctttgccc tggggcccag ttttgtccta aggtttnggc
    361 aggganggtt tcccagatgg aacaggggga tttttagggn tgcacttggg tttncggaag
    421 gaaacntcac gacagaggga caggcaaagc ttggccntgg g
    ACCESSION No. H73608
    ORIGIN
    1 aaattttatt aattttattc aggaaagaca ttgactgtta agtttttttt tngggggggg
    61 ggtgatgtct tgctattttt taaaaattat atccagacta tgaatttaat atttactacg
    121 gctaatcaac tgctcatgtc agtaatcaaa gncagaaatg agccttatac gtacatctac
    181 attaaacaca cacacacccc tttaaggggt gctcagtgta gnttctaatg tcagtctgtc
    241 cattcaaccc agggcccaag gttgcatcac atcaccaagt tggaatcatg aagacagccc
    301 agatttgact gacatgggca cagcagggct ccctcaccac agcccntggc accagttaac
    361 tatttctngc tcgngccgaa ttnttgggcc tcgagggcaa ntttccctat tagtnag
    ACCESSION No. H99544
    ORIGIN
    1 gcgnccgccg cccccgcctg ggccgcgctc cccctctccc gctccctccc tccctgctcc
    61 aactcctcct ccttctccat gcctctgttc ctcctgctct tacttgtcct gctcctgctg
    121 ctcgaggacg ctggagccca gcaaggtgat ggatgtggac acactgtact aggccctgag
    181 agtggaaccc ttacatccat aaactaccca cagacctatc ccaacagcac tgtttgtgaa
    241 tgggagatcc gtgtaaagat tggganagag gagttcgcat caaatttggt gactttgaca
    301 tttgaagatt ctgattcttg tcactntaat tacttgnaga atttataatg ggaattggga
    361 gtcagcggaa cttgaaaata aggcaaaata cttggtaggt ctgggggtnt ggcaaaat
    ACCESSION No. N45282
    ORIGIN
    1 ctaggcataa cataaattgt tataattgat cagaatatct tgaatatatt tttacagata
    61 actagtggtt tctactagca gattaaaacc aagagaaaat taaaagtaag ttcacattta
    121 aaaaaaatta taagcaataa atacagcact acagccacca ctaattctat atacattgga
    181 ttacatttaa acaaacactg cattccagaa tgaatatttt atgaataaat gcattggaaa
    241 ttaactttag gaaataaaat gacaaattac gaatttagaa aattaaaata tgactttcac
    301 aangtaatca cagtaaaatg cagatctaca ttttaaaagc tagaaatttc cccaaattta
    361 tttttttgga cagccaagaa gnttgcctta aaaa
    ACCESSION No. N48270
    ORIGIN
    1 tttgcacctt gaaacaattt aataatgtat tacattatag tagcatcaca gcagcagtca
    61 ataatgccac tttagacaaa aatcagtatt tccattatgc attctgtgta taagaattca
    121 taaatcggta aaagtcattc taagaaaact tggcaaatac agctttggac tggaattggc
    181 atttctttgt ctacttttcc ttcccctaga ttctttgttt taaactacag tattcatatt
    241 ttaaaatgtt ttaaattatt ttaagacgtt aatatagcag ttacattttt gaatagttat
    301 ttgaaagtga ctgtaagata aagttttaga gaatctatta atgggatagg gttgatttac
    361 attttcacat ttttcctaaa aatcagcttt ggttttagaa ctgattggtt tttcattttg
    421 ggaa
    ACCESSION No. N59451
    ORIGIN
    1 aaaatcactt caagaagcat ttattgagaa tctaagacaa acaccctata ttcaaagagc
    61 ttacagttta tggaaaggcc agccaatcaa tatgcaatat ttaagtcttt tcattgaggc
    121 aagtgttgat tttgagagca gagagatgat gatcgttttc gagctgagtt accaaggttg
    181 gagcttacta aactcacaag ggcagtttca ggaaaggaaa ataccatctg caaaggtata
    241 tggctcattc aggggctctc tgaattgtgg ctggagcaaa aggtttgaaa tcttttttct
    301 tcccaagaag atgaaagagc tcctggagga cagaaactgc tttttattcc ctttgtatct
    361 ctcacagcac ctggatactt aagactaaac tattctttca ctcatatggc ccattatcaa
    421 tgtcagcatt gtaaggccct gatggg
    ACCESSION No. N95226
    ORIGIN
    1 tccctttctc cctgtttccc tcccttcttt ccttccttcc ttccttcctt ccttcttaga
    61 attcactgaa gtatttccta ggtagccttt tacttactac tttaatcaaa gcttatcttt
    121 gtgcccaatg tgtaaaaagt gaaaatgtct cttcgaaatt ctatattaca atatagacag
    181 agaagttggg ccttgagggc ttgagtttca cttaaatact atacacatgt ggtatcacac
    241 aaggtggagg gggagggaac aaacagaaac ataacaatta tttttattct gtctttacaa
    301 aagaaagcct cttctctatg aaaaagtctt tttggcatct gctcccggaa acctgccccg
    361 agaacacgtt ccccattgct ttgcaagcat ctctttttaa aagcacanca ctgtccccgg
    421 gagtcacgta ggttggatta anctgtctta gttgaccaac gaagaancac tggatgagtt
    481 ttccagggat gantggttgt ctggggtgga acatatagtc ctgtctacaa caaatgtaac
    541 tcctgatatg ggacnatgaa cncagtgtgt gacccaggag tgnttgatct gtnaacantc
    601 gcatgnaatt
    ACCESSION No. R37028
    ORIGIN
    1 ttttttttct ctaagtgata atgatatccc agctagaata attgtgctct ccagaagcaa
    61 ttaatctgat ttgcaagcac tgattttttc ttttgcaaaa actaataata ttagcctgac
    121 caattatgaa ataattccta aatttacaaa ttcccaaatt tgtgctttca tggcttcctt
    181 ctattttaaa tctatattat tttaaacaaa ttttccttaa gnaaaaatga cttaacttca
    241 taaaaatcta cccatttatg gtaaataaaa cattaaccaa aaaccaaaat taaagggntt
    301 actataaatg gnaacattta cattgctggn tattaaatcc ctttccttgg catt
    ACCESSION No. R66605
    ORIGIN
    1 ttttttatcc ttcttaannn ttattacatg ttttattatc ctgtccccag aggtgggttt
    61 atccagaaac caagaaaaaa aatcaatcag aataaactca aaaaaaaaag gtagggggag
    121 caaaaccatc aaccaccagg gcagccaggc catcagccca cctccacctc tggagggtcc
    181 ccagagaccc acgcccgacg cagacccgga ggaggcatca gcaagggggc ccgggcagag
    241 aatcggctat gtctttcatt atgaggaggc agggagagac gggcagagat atgtttgcta
    301 gggtgantat atattttata ttaattaaat ccgtaagttt aattaaagta aataggtatt
    361 tctctggaag tttttttaat ttctttcntt ttttatagtt tttttggttt tttgtggntt
    421 tttttttttt ttttggggtt t
    ACCESSION No. T51004
    ORIGIN
    1 gcagctgttg tcttccaact cagcggcagg tttgctttcc ccacggacac tctggacctt
    61 gtagctcctc aagcttccct gtctattgag cagataggaa gccgtgtcaa atatgtggca
    121 ccttgaggaa atgcctagtg aatgacagta tgtcctattg tgctctaact ttatttcagc
    181 cttatttctt ttctgaatat tatttttcat ttatcttcat ttccttacct attttctttt
    241 cttctaaagt atgtatcttt gttagctcca tcatcctttt tgggaatgag gcaagtataa
    301 aaataaggta aataaataag gaccccatcc ctaggtattt ttaaggaaac cacccttttg
    361 cggggcacac ttggctacct tggggtcttt agggctctgg ggggctttng ggtgtncctc
    421 tngggcaggt cctggctggc attggcct
    ACCESSION No. T51316
    ORIGIN
    1 ttcatccgct gcatgtggaa aactggcccg atacctcgca ctacgagttt ctcgccgaca
    61 ctatgtggag cgattttgcc tacggtcgca atgccgtata cccggaagcn atcacggcaa
    121 cgcanctngt cgcgttatcc cattgaacat tatgagaatc gcgatgtttc ggtcgatggt
    181 gcggaaaagc gcggcntgct tcttacttgc cgcattgtgc cgccgattga ccgggaaaag
    241 cgattcatgt tgatgttgcg tacatcttgg ggccttgcgt tgagggcgca ccgttcagg
    ACCESSION No. T72535
    ORIGIN
    1 atgacctctg caaagagaag gtcagctata ngtagggaga aaaggaagaa ggcaagaaaa
    61 ggagactcga gatgagttta catccaagag aagcacagat gtttgtaatc tacctagaat
    121 aatgtgaagt acctgtccag catgtatgct cagatcctcc attcattagc acaagctgaa
    181 aacatgaact gcaaattcta caccagcatc ctttgcttcc tccatggcag tgggaggtag
    241 caaggggagt ccaacacttc tccatgacgt angaaaggca gggaaaaata ctgnt
    ACCESSION No. W72103
    ORIGIN
    1 gtttgtgaaa aggaacaaaa tgaanttgaa ttggacatgt gctttaagca ngccaacaga
    61 caacacacca ctagagacac acatcaaaag caatcacagt gctatgatca aatgatgggt
    121 acatgtgaac acatc
  • All patents, patent applications, provisional applications, and publications referred to or cited herein are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
  • All nucleotide and/or amino acid sequences associated with accession numbers referred to or cited herein are incorporated by reference in their entirety.
  • It should be understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application.

Claims (19)

1. A system for predicting clinical outcome for a patient diagnosed with cancer comprising a computing means; a user interface means that enables data entry, wherein said interface is coupled to said computing means, wherein said computing means is configured to perform microarray analysis and binary classification to generate a set of genes used in predicting clinical outcome.
2. The system of claim 1, wherein the microarray analysis is significance analysis of microarrays and the binary classification is support vector machine.
3. The system of claim 1, wherein the computer is further configured to perform leave-one-out cross validation.
4. The system of claim 1, wherein the computer comprises a database for storing the set of genes, said computer further configured to analyze biological information from a patient against the set of genes to generate a predicted clinical outcome.
5. The system of claim 1, wherein the patient is diagnosed with colon cancer.
6. A classifier for predicting clinical outcome in a patient diagnosed with cancer comprising a computing means and a user interface, wherein said computing means comprises a storing means and a means for outputting processed data, wherein said storing means comprises a set of genes classified by outcome, wherein said interface is coupled to said computing means.
7. The classifier of claim 6, wherein said set of genes consists of the following genes: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA976642; AA133215; AA457267; N50073; R38360; AA450205; AA148578; R38640; AA487274; N53172; AA045308; AA045075; N63366; R22340; AA437223; AA481250; AA045793; H87795; AA121806; AA284172; R68106; AA479270; AA432030; R10545; AA453508; AI149393; AA883496; AA167823; AI203139; H19822; W73732; AA777892; AA885478; AA932696; AA481507; H18953; AA709158; AA488652; N39584; H62801; H17638; R43684; N21630; T81317; R45595; T90789; and AA283062.
8. The classifier of claim 6, wherein said set of genes consists of the following genes: AA045075; AA425320; AA437223; AA479270; AA486233; AA487274; AA488652; AA694500; AA704270; AA706226; AA709158; AA775616; AA777892; AA873159; AA969508; AI203139; AI299969; H17364; H17627; H19822; H23551; H62801; H85015; N21630; N36176; N72847; N92519; R27767; R34578; R38360; R43597; R43684; W73732; AA450205; AI081269; R59314; AA702174; AI002566; AA676797; AA453508; W93980; AA045308; AA953396; AA962236; AA418726; R43713; AA664240; AA477404; AA826237; AA007421; AA478952; W93980; AA045308; AA953396; AA962236; AA418726; R43713; AA664240; AA477404; AA826237; AA007421; AA478952; AA885096; H29032; R10545; AA448641; R38266; H17543; T81317; AA453790; R22340; AA987675; N51543; N74527; AA121778; AA258031; AA702422; T64924; R42984; R59360; R63816; T49061; AA016210; AA682585; AA705040; AA909959; AI240881; AA133215; AA699408; AA910771; AI362799; H51549; R06568; AA001604; AA132065; AA490493; AA633845; AI261561; H81024; N75004; W96216; AA045793; AA284172; AA411324; AA448261; AA479952; AA485752; AA504266; AA630376; AA634261; AA701167; AA703019; AA706041; AA773139; AA776813; AA862465; AA977711; AI288845; H15267; H18956; H73608; H99544; N45282; N48270; N59451; N95226; R37028; R66605; T51004; T51316; T72535; and W72103.
9. The classifier of claim 6, wherein said set of genes consists of the following genes: AA007421; AA045075; AA045308; AA418726; AA425320; AA450205; AA453508; AA453790; AA477404; AA478952; AA479270; AA486233; AA487274; AA664240; AA676797; AA702174; AA706226; AA709158; AA775616; AA826237; AA873159; AA969508; AI002566; AI29969; H17364; H19822; H23551; N36176; N72847; R10545; R27767; R34578; R59314; W73732; AA448641; R59360; AA121778; H51549; H81024; AA490493; R42984; AA258031; AA133215; R63816; N95226; N74527; AA702422; AI261561; AA132065; AI362799; AA045793; AA284172; N51632; AA482110; AA485450; AA699408; N70777; AA993736; AI139498; N59721; AA431885; AA911661; AA775865; R30941; AA703019; AA777192; W72103; H15267; H17638; R60193; R92717; AA706041; AA411324; AA504266; AA932696; AA973494; N45100; AA418410; AA725641; AA954482; H45391; T86932; AA279188; AA485752; AA680132; AA977711; W93370; AA036727; AA071075; AA464612; AA481250; AA598659; AA682905; R17811; W93592; AA017301; AA046406; AA256304; AA416759; AA448261; AA452130; AA457528; AA460542; AA479952; AA481507; AA504342; AA598970; AA630376; AA634261; AA677254; AA757564; AA775888; AA844864; AA862465; AA989139; AI253017; AI394426; H99544; N41021; N45282; N46845; N48270; N59846; R16760; R44546; R92994; T51004; T56281; T70321; and W45025.
10. The classifier of claim 6, wherein said set of genes consists of the following genes: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA883496.
11. A method for predicting a clinical outcome for a patient diagnosed with cancer, said method comprising the steps of:
a) classifying at least one gene that correlates with a clinical outcome;
b) establishing a set of reference gene expression levels based on the at least one gene;
c) receiving biological information from the patient;
d) extrapolating from the biological information the level of intracellular expression of said at least one gene;
e) comparing said level of intracellular expression against said set of reference gene expression levels; and
f) predicting a clinical outcome based on the deviation of the intracellular level expression from that of the reference gene expression levels.
12. The method of claim 1, wherein identification of said at least one gene is performed with any one or combination of the following: significance analysis of microarrays, cluster analysis, support vector technology, neural network, and leave-one-out cross validation.
13. The method of claim 1, further comprising the step of estimating the accuracy of the predicted clinical outcome.
14. The method of claim 1, wherein the biological information is a clinical specimen of bodily fluid or tissue.
15. The method of claim 14, wherein the biological information is a clinical tumor sample.
16. The method of claim 1, wherein the outcome being evaluated is for a patient diagnosed with colon cancer.
17. The method of claim 1, wherein the predicted clinical outcome is the probability of patient survival at a predetermined date.
18. The method of claim 1, further comprising the step of generating a treatment regimen based on the predicted clinical outcome.
19. The method of claim 1, wherein the gene that is identified is one with the accession number selected from the group consisting of: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA976642; AA133215; AA457267; N50073; R38360; AA450205; AA148578; R38640; AA487274; N53172; AA045308; AA045075; N63366; R22340; AA437223; AA481250; AA045793; H87795; AA121806; AA284172; R68106; AA479270; AA432030; R10545; AA453508; AI149393; AA883496; AA167823; AI203139; H19822; W73732; AA777892; AA885478; AA932696; AA481507; H18953; AA709158; AA488652; N39584; H62801; H17638; R43684; N21630; T81317; R45595; T90789; and AA283062.
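For illustration only, steps (b), (e) and (f) of claim 11 and the leave-one-out cross validation recited in claim 3 might be sketched in Python as shown below. The sketch is an assumption-laden stand-in rather than the claimed system: it substitutes a simple nearest-centroid rule for the support vector machine named in claim 2, and the function names, the three-gene expression values and the "good"/"poor" outcome labels are hypothetical and are not taken from the specification.

```python
from statistics import mean

# Hypothetical toy data: each row is one patient, each column one gene's
# expression level. Values and labels are invented for illustration.
samples = [
    [2.1, 0.3, 1.0],
    [1.9, 0.4, 1.2],
    [2.3, 0.2, 0.9],
    [0.5, 1.8, 1.1],
    [0.4, 2.0, 1.0],
    [0.6, 1.7, 0.8],
]
labels = ["good", "good", "good", "poor", "poor", "poor"]


def build_reference(train_samples, train_labels):
    """Step (b): per-outcome mean expression of each gene (reference levels)."""
    reference = {}
    for outcome in set(train_labels):
        rows = [s for s, y in zip(train_samples, train_labels) if y == outcome]
        # zip(*rows) iterates gene by gene across the selected patients
        reference[outcome] = [mean(column) for column in zip(*rows)]
    return reference


def predict_outcome(reference, expression):
    """Steps (e)-(f): choose the outcome whose reference profile deviates least."""
    def deviation(profile):
        # Sum of squared differences between patient and reference levels
        return sum((x - r) ** 2 for x, r in zip(expression, profile))
    # sorted() makes tie-breaking deterministic
    return min(sorted(reference), key=lambda outcome: deviation(reference[outcome]))


def leave_one_out_accuracy(all_samples, all_labels):
    """Hold out each patient once, rebuild the reference, and score the prediction."""
    correct = 0
    for i in range(len(all_samples)):
        train_x = all_samples[:i] + all_samples[i + 1:]
        train_y = all_labels[:i] + all_labels[i + 1:]
        reference = build_reference(train_x, train_y)
        if predict_outcome(reference, all_samples[i]) == all_labels[i]:
            correct += 1
    return correct / len(all_samples)


new_patient = [2.0, 0.5, 1.0]  # hypothetical expression profile
print(predict_outcome(build_reference(samples, labels), new_patient))
print(leave_one_out_accuracy(samples, labels))
```

In this sketch the held-out patient never contributes to the reference levels used to classify that patient, which is the property that lets leave-one-out cross validation serve as an accuracy estimate of the kind contemplated in claim 13.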
US11/065,794 2004-02-25 2005-02-25 Methods for predicting cancer outcome and gene signatures for use therein Abandoned US20060195266A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US11/065,794 US20060195266A1 (en) 2005-02-25 2005-02-25 Methods for predicting cancer outcome and gene signatures for use therein
EP05754399A EP1894131B1 (en) 2005-02-25 2005-05-19 Methods and systems for predicting cancer outcome
PCT/US2005/017988 WO2006093507A2 (en) 2005-02-25 2005-05-19 Methods and systems for predicting cancer outcome
US11/134,688 US10181009B2 (en) 2004-02-25 2005-05-19 Methods and systems for predicting cancer outcome

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/134,688 Continuation-In-Part US10181009B2 (en) 2004-02-25 2005-05-19 Methods and systems for predicting cancer outcome

Publications (1)

Publication Number Publication Date
US20060195266A1 true US20060195266A1 (en) 2006-08-31

Family

ID=36932887

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/065,794 Abandoned US20060195266A1 (en) 2004-02-25 2005-02-25 Methods for predicting cancer outcome and gene signatures for use therein
US11/134,688 Expired - Fee Related US10181009B2 (en) 2004-02-25 2005-05-19 Methods and systems for predicting cancer outcome

Country Status (3)

Country Link
US (2) US20060195266A1 (en)
EP (1) EP1894131B1 (en)
WO (1) WO2006093507A2 (en)

Families Citing this family (196)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0307403D0 (en) 2003-03-31 2003-05-07 Medical Res Council Selection by compartmentalised screening
GB0307428D0 (en) 2003-03-31 2003-05-07 Medical Res Council Compartmentalised combinatorial chemistry
US20060078893A1 (en) 2004-10-12 2006-04-13 Medical Research Council Compartmentalised combinatorial chemistry by microfluidic control
US20050221339A1 (en) 2004-03-31 2005-10-06 Medical Research Council Harvard University Compartmentalised screening by microfluidic control
CA2572450A1 (en) 2004-05-28 2005-12-15 Ambion, Inc. Methods and compositions involving microrna
US20120252050A1 (en) 2004-06-22 2012-10-04 Crescent Diagnostics Limited Methods for assessing risk of bone fracture
US7968287B2 (en) 2004-10-08 2011-06-28 Medical Research Council Harvard University In vitro evolution in microfluidic systems
EP2302055B1 (en) 2004-11-12 2014-08-27 Asuragen, Inc. Methods and compositions involving miRNA and miRNA inhibitor molecules
US20070017809A1 (en) * 2005-07-21 2007-01-25 Power3 Medical Products, Inc. Assay for ALS and ALS-like disorders
CA2632327A1 (en) * 2005-12-22 2007-06-28 F. Hoffmann-La Roche Ag Use of a marker combination comprising osteopontin and carcinoembryonic antigen in the assessment of colorectal cancer
WO2007081386A2 (en) 2006-01-11 2007-07-19 Raindance Technologies, Inc. Microfluidic devices and methods of use
DK1974058T3 (en) * 2006-01-11 2014-09-01 Genomic Health Inc Gene Expression Markers for Prognostication of Colorectal Cancer
US8024282B2 (en) * 2006-03-31 2011-09-20 Biodesix, Inc. Method for reliable classification of samples in clinical diagnostics using an improved method of classification
US9562837B2 (en) 2006-05-11 2017-02-07 Raindance Technologies, Inc. Systems for handling microfludic droplets
EP2047910B1 (en) 2006-05-11 2012-01-11 Raindance Technologies, Inc. Microfluidic device and method
US9074242B2 (en) 2010-02-12 2015-07-07 Raindance Technologies, Inc. Digital analyte analysis
EP2258874B1 (en) 2006-06-02 2015-03-18 GlaxoSmithKline Biologicals S.A. Method for identifying whether a patient will be responder or not to immunotherapy
WO2008011046A2 (en) * 2006-07-17 2008-01-24 The H.Lee Moffitt Cancer And Research Institute, Inc. Computer systems and methods for selecting subjects for clinical trials
EP3536396B1 (en) 2006-08-07 2022-03-30 The President and Fellows of Harvard College Fluorocarbon emulsion stabilizing surfactants
US9867530B2 (en) 2006-08-14 2018-01-16 Volcano Corporation Telescopic side port catheter device with imaging system and method for accessing side branch occlusions
WO2008036765A2 (en) * 2006-09-19 2008-03-27 Asuragen, Inc. Micrornas differentially expressed in pancreatic diseases and uses thereof
WO2008097559A2 (en) 2007-02-06 2008-08-14 Brandeis University Manipulation of fluids and reactions in microfluidic systems
WO2008130623A1 (en) 2007-04-19 2008-10-30 Brandeis University Manipulation of fluids, fluid components and reactions in microfluidic systems
US7908231B2 (en) * 2007-06-12 2011-03-15 Miller James R Selecting a conclusion using an ordered sequence of discriminators
US7810365B2 (en) * 2007-06-14 2010-10-12 Schlage Lock Company Lock cylinder with locking member
US9596993B2 (en) 2007-07-12 2017-03-21 Volcano Corporation Automatic calibration systems and methods of use
WO2009009799A1 (en) 2007-07-12 2009-01-15 Volcano Corporation Catheter for in vivo imaging
WO2009009802A1 (en) 2007-07-12 2009-01-15 Volcano Corporation Oct-ivus catheter for concurrent luminal imaging
US8361714B2 (en) 2007-09-14 2013-01-29 Asuragen, Inc. Micrornas differentially expressed in cervical cancer and uses thereof
WO2009039479A1 (en) * 2007-09-21 2009-03-26 H. Lee Moffitt Cancer Center And Research Institute, Inc. Genotypic tumor progression classifier and predictor
US8244654B1 (en) * 2007-10-22 2012-08-14 Healthways, Inc. End of life predictive model
US20110046002A1 (en) * 2007-11-20 2011-02-24 University Of South Florida Seven Gene Breast Cancer Predictor
WO2009070805A2 (en) 2007-12-01 2009-06-04 Asuragen, Inc. Mir-124 regulated genes and pathways as targets for therapeutic intervention
US8645424B2 (en) * 2007-12-19 2014-02-04 Sam Stanley Miller System for electronically recording and sharing medical information
WO2009095319A1 (en) * 2008-01-28 2009-08-06 Siemens Healthcare Diagnostics Gmbh Cancer prognosis by majority voting
EP2990487A1 (en) 2008-05-08 2016-03-02 Asuragen, INC. Compositions and methods related to mirna modulation of neovascularization or angiogenesis
EP2315629B1 (en) 2008-07-18 2021-12-15 Bio-Rad Laboratories, Inc. Droplet libraries
WO2010028288A2 (en) * 2008-09-05 2010-03-11 Aueon, Inc. Methods for stratifying and annotating cancer drug treatment options
US8551505B2 (en) 2008-10-31 2013-10-08 The Invention Science Fund I, Llc Compositions and methods for therapeutic delivery with frozen particles
US9060926B2 (en) 2008-10-31 2015-06-23 The Invention Science Fund I, Llc Compositions and methods for therapeutic delivery with frozen particles
US8568363B2 (en) 2008-10-31 2013-10-29 The Invention Science Fund I, Llc Frozen compositions and methods for piercing a substrate
US8409376B2 (en) 2008-10-31 2013-04-02 The Invention Science Fund I, Llc Compositions and methods for surface abrasion with frozen particles
US8563012B2 (en) * 2008-10-31 2013-10-22 The Invention Science Fund I, Llc Compositions and methods for administering compartmentalized frozen particles
US9050317B2 (en) 2008-10-31 2015-06-09 The Invention Science Fund I, Llc Compositions and methods for therapeutic delivery with frozen particles
US9050070B2 (en) 2008-10-31 2015-06-09 The Invention Science Fund I, Llc Compositions and methods for surface abrasion with frozen particles
US8793075B2 (en) 2008-10-31 2014-07-29 The Invention Science Fund I, Llc Compositions and methods for therapeutic delivery with frozen particles
US8551506B2 (en) 2008-10-31 2013-10-08 The Invention Science Fund I, Llc Compositions and methods for administering compartmentalized frozen particles
US8788211B2 (en) 2008-10-31 2014-07-22 The Invention Science Fund I, Llc Method and system for comparing tissue ablation or abrasion data to data related to administration of a frozen particle composition
US9072799B2 (en) 2008-10-31 2015-07-07 The Invention Science Fund I, Llc Compositions and methods for surface abrasion with frozen particles
US20100111857A1 (en) 2008-10-31 2010-05-06 Boyden Edward S Compositions and methods for surface abrasion with frozen particles
US8731840B2 (en) * 2008-10-31 2014-05-20 The Invention Science Fund I, Llc Compositions and methods for therapeutic delivery with frozen particles
US9072688B2 (en) 2008-10-31 2015-07-07 The Invention Science Fund I, Llc Compositions and methods for therapeutic delivery with frozen particles
US9060931B2 (en) 2008-10-31 2015-06-23 The Invention Science Fund I, Llc Compositions and methods for delivery of frozen particle adhesives
US8603495B2 (en) 2008-10-31 2013-12-10 The Invention Science Fund I, Llc Compositions and methods for biological remodeling with frozen particle compositions
US9060934B2 (en) 2008-10-31 2015-06-23 The Invention Science Fund I, Llc Compositions and methods for surface abrasion with frozen particles
US8414356B2 (en) 2008-10-31 2013-04-09 The Invention Science Fund I, Llc Systems, devices, and methods for making or administering frozen particles
US8721583B2 (en) 2008-10-31 2014-05-13 The Invention Science Fund I, Llc Compositions and methods for surface abrasion with frozen particles
US8731841B2 (en) 2008-10-31 2014-05-20 The Invention Science Fund I, Llc Compositions and methods for therapeutic delivery with frozen particles
US9050251B2 (en) 2008-10-31 2015-06-09 The Invention Science Fund I, Llc Compositions and methods for delivery of frozen particle adhesives
US8545855B2 (en) 2008-10-31 2013-10-01 The Invention Science Fund I, Llc Compositions and methods for surface abrasion with frozen particles
US8762067B2 (en) 2008-10-31 2014-06-24 The Invention Science Fund I, Llc Methods and systems for ablation or abrasion with frozen particles and comparing tissue surface ablation or abrasion data to clinical outcome data
US8221480B2 (en) 2008-10-31 2012-07-17 The Invention Science Fund I, Llc Compositions and methods for biological remodeling with frozen particle compositions
US8725420B2 (en) 2008-10-31 2014-05-13 The Invention Science Fund I, Llc Compositions and methods for surface abrasion with frozen particles
US8528589B2 (en) 2009-03-23 2013-09-10 Raindance Technologies, Inc. Manipulation of microfluidic droplets
AU2010242073C1 (en) 2009-04-30 2015-12-24 Good Start Genetics, Inc. Methods and compositions for evaluating genetic markers
US10179936B2 (en) * 2009-05-01 2019-01-15 Genomic Health, Inc. Gene expression profile algorithm and test for likelihood of recurrence of colorectal cancer and response to chemotherapy
GB0917457D0 (en) 2009-10-06 2009-11-18 Glaxosmithkline Biolog Sa Method
SG10201405883XA (en) 2009-09-23 2014-11-27 Celmatix Inc Methods and devices for assessing infertility and/or egg quality
WO2011042564A1 (en) 2009-10-09 2011-04-14 Universite De Strasbourg Labelled silica-based nanomaterial with enhanced properties and uses thereof
US10837883B2 (en) 2009-12-23 2020-11-17 Bio-Rad Laboratories, Inc. Microfluidic systems and methods for reducing the exchange of molecules between droplets
TWI488496B (en) * 2010-01-20 2015-06-11 Altek Corp Face capture method for image capture device
US10351905B2 (en) 2010-02-12 2019-07-16 Bio-Rad Laboratories, Inc. Digital analyte analysis
US9399797B2 (en) 2010-02-12 2016-07-26 Raindance Technologies, Inc. Digital analyte analysis
US9366632B2 (en) 2010-02-12 2016-06-14 Raindance Technologies, Inc. Digital analyte analysis
WO2011140662A1 (en) * 2010-05-13 2011-11-17 The Royal Institution For The Advancement Of Learning / Mcgill University Cux1 signature for determination of cancer clinical outcome
KR20130115250A (en) 2010-09-15 2013-10-21 알막 다이아그노스틱스 리미티드 Molecular diagnostic test for cancer
IN2013MN00522A (en) 2010-09-24 2015-05-29 Univ Leland Stanford Junior
US9562897B2 (en) 2010-09-30 2017-02-07 Raindance Technologies, Inc. Sandwich assays in droplets
US9163281B2 (en) 2010-12-23 2015-10-20 Good Start Genetics, Inc. Methods for maintaining the integrity and identification of a nucleic acid template in a multiplex sequencing reaction
US11141063B2 (en) 2010-12-23 2021-10-12 Philips Image Guided Therapy Corporation Integrated system architectures and methods of use
US11040140B2 (en) 2010-12-31 2021-06-22 Philips Image Guided Therapy Corporation Deep vein thrombosis therapeutic methods
US20120184454A1 (en) * 2011-01-14 2012-07-19 Kalady Matthew F Gene signature is associated with early stage rectal cancer recurrence
KR20140006898A (en) 2011-01-25 2014-01-16 알막 다이아그노스틱스 리미티드 Colon cancer gene expression signatures and methods of use
EP3859011A1 (en) 2011-02-11 2021-08-04 Bio-Rad Laboratories, Inc. Methods for forming mixed droplets
WO2012112804A1 (en) 2011-02-18 2012-08-23 Raindance Technoligies, Inc. Compositions and methods for molecular labeling
US8841071B2 (en) 2011-06-02 2014-09-23 Raindance Technologies, Inc. Sample multiplexing
US9556470B2 (en) 2011-06-02 2017-01-31 Raindance Technologies, Inc. Enzyme quantification
US8793209B2 (en) 2011-06-22 2014-07-29 James R. Miller, III Reflecting the quantitative impact of ordinal indicators
US8658430B2 (en) 2011-07-20 2014-02-25 Raindance Technologies, Inc. Manipulating droplet size
WO2013033489A1 (en) 2011-08-31 2013-03-07 Volcano Corporation Optical rotary joint and methods of use
US9644241B2 (en) 2011-09-13 2017-05-09 Interpace Diagnostics, Llc Methods and compositions involving miR-135B for distinguishing pancreatic cancer from benign pancreatic disease
CA2850930A1 (en) 2011-10-03 2013-04-11 Celmatix, Inc. Methods and devices for assessing risk to a putative offspring of developing a condition
CA2852665A1 (en) 2011-10-17 2013-04-25 Good Start Genetics, Inc. Analysis methods
US8418249B1 (en) * 2011-11-10 2013-04-09 Narus, Inc. Class discovery for automated discovery, attribution, analysis, and risk assessment of security threats
EP3495817A1 (en) 2012-02-10 2019-06-12 Raindance Technologies, Inc. Molecular diagnostic screening assay
US8209130B1 (en) 2012-04-04 2012-06-26 Good Start Genetics, Inc. Sequence assembly
US10227635B2 (en) 2012-04-16 2019-03-12 Molecular Loop Biosolutions, Llc Capture reactions
EP3524693A1 (en) 2012-04-30 2019-08-14 Raindance Technologies, Inc. Digital analyte analysis
US20140024546A1 (en) 2012-05-01 2014-01-23 Synapdx Corporation Systems and methods for normalizing gene expression profiles of biological samples having a mixed cell population
EP2844773B1 (en) 2012-05-04 2017-08-16 Boreal Genomics Corp. Biomarker analysis using scodaphoresis
CN102760210A (en) * 2012-06-19 2012-10-31 南京理工大学常熟研究院有限公司 Adenosine triphosphate binding site predicting method for protein
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
US9478940B2 (en) 2012-10-05 2016-10-25 Volcano Corporation Systems and methods for amplifying light
US9292918B2 (en) 2012-10-05 2016-03-22 Volcano Corporation Methods and systems for transforming luminal images
US9858668B2 (en) 2012-10-05 2018-01-02 Volcano Corporation Guidewire artifact removal in images
US11272845B2 (en) 2012-10-05 2022-03-15 Philips Image Guided Therapy Corporation System and method for instant and automatic border detection
US9324141B2 (en) 2012-10-05 2016-04-26 Volcano Corporation Removal of A-scan streaking artifact
US9367965B2 (en) 2012-10-05 2016-06-14 Volcano Corporation Systems and methods for generating images of tissue
US10568586B2 (en) 2012-10-05 2020-02-25 Volcano Corporation Systems for indicating parameters in an imaging data set and methods of use
US9307926B2 (en) 2012-10-05 2016-04-12 Volcano Corporation Automatic stent detection
US9286673B2 (en) 2012-10-05 2016-03-15 Volcano Corporation Systems for correcting distortions in a medical image and methods of use thereof
US10070827B2 (en) 2012-10-05 2018-09-11 Volcano Corporation Automatic image playback
US10162800B2 (en) 2012-10-17 2018-12-25 Celmatix Inc. Systems and methods for determining the probability of a pregnancy at a selected point in time
US9177098B2 (en) 2012-10-17 2015-11-03 Celmatix Inc. Systems and methods for determining the probability of a pregnancy at a selected point in time
US9840734B2 (en) 2012-10-22 2017-12-12 Raindance Technologies, Inc. Methods for analyzing DNA
CA2894403A1 (en) 2012-12-13 2014-06-19 Volcano Corporation Devices, systems, and methods for targeted cannulation
US9836577B2 (en) 2012-12-14 2017-12-05 Celmatix, Inc. Methods and devices for assessing risk of female infertility
WO2014113188A2 (en) 2012-12-20 2014-07-24 Jeremy Stigall Locating intravascular images
US11406498B2 (en) 2012-12-20 2022-08-09 Philips Image Guided Therapy Corporation Implant delivery system and implants
US10942022B2 (en) 2012-12-20 2021-03-09 Philips Image Guided Therapy Corporation Manual calibration of imaging system
CA2895502A1 (en) 2012-12-20 2014-06-26 Jeremy Stigall Smooth transition catheters
US9709379B2 (en) 2012-12-20 2017-07-18 Volcano Corporation Optical coherence tomography system that is reconfigurable between different imaging modes
US10939826B2 (en) 2012-12-20 2021-03-09 Philips Image Guided Therapy Corporation Aspirating and removing biological material
US9612105B2 (en) 2012-12-21 2017-04-04 Volcano Corporation Polarization sensitive optical coherence tomography system
WO2014100530A1 (en) 2012-12-21 2014-06-26 Whiseant Chester System and method for catheter steering and operation
EP2936626A4 (en) 2012-12-21 2016-08-17 David Welford Systems and methods for narrowing a wavelength emission of light
WO2014100162A1 (en) 2012-12-21 2014-06-26 Kemp Nathaniel J Power-efficient optical buffering using optical switch
EP2934280B1 (en) 2012-12-21 2022-10-19 Mai, Jerome Ultrasound imaging with variable line density
US9486143B2 (en) 2012-12-21 2016-11-08 Volcano Corporation Intravascular forward imaging device
WO2014099672A1 (en) 2012-12-21 2014-06-26 Andrew Hancock System and method for multipath processing of image signals
EP2936426B1 (en) 2012-12-21 2021-10-13 Jason Spencer System and method for graphical processing of medical data
WO2014100606A1 (en) 2012-12-21 2014-06-26 Meyer, Douglas Rotational ultrasound imaging catheter with extended catheter body telescope
US10058284B2 (en) 2012-12-21 2018-08-28 Volcano Corporation Simultaneous imaging, monitoring, and therapy
CN105103163A (en) 2013-03-07 2015-11-25 火山公司 Multimodal segmentation in intravascular images
US10226597B2 (en) 2013-03-07 2019-03-12 Volcano Corporation Guidewire with centering mechanism
US11154313B2 (en) 2013-03-12 2021-10-26 The Volcano Corporation Vibrating guidewire torquer and methods of use
WO2014164696A1 (en) 2013-03-12 2014-10-09 Collins Donna Systems and methods for diagnosing coronary microvascular disease
US9301687B2 (en) 2013-03-13 2016-04-05 Volcano Corporation System and method for OCT depth calibration
US11026591B2 (en) 2013-03-13 2021-06-08 Philips Image Guided Therapy Corporation Intravascular pressure sensor calibration
JP6339170B2 (en) 2013-03-13 2018-06-06 ジンヒョン パーク System and method for generating images from a rotating intravascular ultrasound device
US10292677B2 (en) 2013-03-14 2019-05-21 Volcano Corporation Endoluminal filter having enhanced echogenic properties
US10219887B2 (en) 2013-03-14 2019-03-05 Volcano Corporation Filters with echogenic characteristics
WO2014152421A1 (en) 2013-03-14 2014-09-25 Good Start Genetics, Inc. Methods for analyzing nucleic acids
EP2967606B1 (en) 2013-03-14 2018-05-16 Volcano Corporation Filters with echogenic characteristics
EP2986762B1 (en) 2013-04-19 2019-11-06 Bio-Rad Laboratories, Inc. Digital analyte analysis
US9898575B2 (en) 2013-08-21 2018-02-20 Seven Bridges Genomics Inc. Methods and systems for aligning sequences
US9116866B2 (en) 2013-08-21 2015-08-25 Seven Bridges Genomics Inc. Methods and systems for detecting sequence variants
CN105793859B (en) 2013-09-30 2020-02-28 七桥基因公司 System for detecting sequence variants
US11901041B2 (en) 2013-10-04 2024-02-13 Bio-Rad Laboratories, Inc. Digital analysis of nucleic acid modification
JP2016533182A (en) 2013-10-18 2016-10-27 セブン ブリッジズ ジェノミクス インコーポレイテッド Methods and systems for identifying disease-induced mutations
WO2015058093A1 (en) 2013-10-18 2015-04-23 Seven Bridges Genomics Inc. Methods and systems for genotyping genetic samples
US11049587B2 (en) 2013-10-18 2021-06-29 Seven Bridges Genomics Inc. Methods and systems for aligning sequences in the presence of repeating elements
US10851414B2 (en) 2013-10-18 2020-12-01 Good Start Genetics, Inc. Methods for determining carrier status
WO2015058095A1 (en) 2013-10-18 2015-04-23 Seven Bridges Genomics Inc. Methods and systems for quantifying sequence alignment
US9063914B2 (en) 2013-10-21 2015-06-23 Seven Bridges Genomics Inc. Systems and methods for transcriptome analysis
US9944977B2 (en) 2013-12-12 2018-04-17 Raindance Technologies, Inc. Distinguishing rare variations in a nucleic acid sequence from a sample
WO2015103367A1 (en) 2013-12-31 2015-07-09 Raindance Technologies, Inc. System and method for detection of rna species
KR20160107237A (en) 2014-01-10 2016-09-13 세븐 브릿지스 지노믹스 인크. Systems and methods for use of known alleles in read mapping
US9817944B2 (en) 2014-02-11 2017-11-14 Seven Bridges Genomics Inc. Systems and methods for analyzing sequence data
WO2015175530A1 (en) 2014-05-12 2015-11-19 Gore Athurva Methods for detecting aneuploidy
US10208350B2 (en) 2014-07-17 2019-02-19 Celmatix Inc. Methods and systems for assessing infertility and related pathologies
WO2016040446A1 (en) 2014-09-10 2016-03-17 Good Start Genetics, Inc. Methods for selectively suppressing non-target sequences
EP3224595A4 (en) 2014-09-24 2018-06-13 Good Start Genetics, Inc. Process control for increased robustness of genetic assays
CN107408043A (en) 2014-10-14 2017-11-28 Seven Bridges Genomics Inc. Systems and methods for smart tools in sequence pipelines
US10066259B2 (en) 2015-01-06 2018-09-04 Good Start Genetics, Inc. Screening for structural variants
EP3057046A1 (en) * 2015-02-13 2016-08-17 Tata Consultancy Services Limited Method and system for employee assesment
US10192026B2 (en) 2015-03-05 2019-01-29 Seven Bridges Genomics Inc. Systems and methods for genomic pattern analysis
US10275567B2 (en) 2015-05-22 2019-04-30 Seven Bridges Genomics Inc. Systems and methods for haplotyping
US10793895B2 (en) 2015-08-24 2020-10-06 Seven Bridges Genomics Inc. Systems and methods for epigenetic analysis
US10584380B2 (en) 2015-09-01 2020-03-10 Seven Bridges Genomics Inc. Systems and methods for mitochondrial analysis
US10724110B2 (en) 2015-09-01 2020-07-28 Seven Bridges Genomics Inc. Systems and methods for analyzing viral nucleic acids
US10647981B1 (en) 2015-09-08 2020-05-12 Bio-Rad Laboratories, Inc. Nucleic acid library generation methods and compositions
US11347704B2 (en) 2015-10-16 2022-05-31 Seven Bridges Genomics Inc. Biological graph or sequence serialization
US20170199960A1 (en) 2016-01-07 2017-07-13 Seven Bridges Genomics Inc. Systems and methods for adaptive local alignment for graph genomes
US10364468B2 (en) 2016-01-13 2019-07-30 Seven Bridges Genomics Inc. Systems and methods for analyzing circulating tumor DNA
US10460829B2 (en) 2016-01-26 2019-10-29 Seven Bridges Genomics Inc. Systems and methods for encoding genetic variation for a population
US10262102B2 (en) 2016-02-24 2019-04-16 Seven Bridges Genomics Inc. Systems and methods for genotyping with graph reference
CN106023258B (en) * 2016-05-26 2019-02-15 南京工程学院 Improved adaptive GMM moving target detecting method
US10775361B2 (en) 2016-07-22 2020-09-15 Qualcomm Incorporated Monitoring control channel with different encoding schemes
US11250931B2 (en) 2016-09-01 2022-02-15 Seven Bridges Genomics Inc. Systems and methods for detecting recombination
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
WO2018143540A1 (en) * 2017-02-02 2018-08-09 사회복지법인 삼성생명공익재단 Method, device, and program for predicting prognosis of stomach cancer by using artificial neural network
US10998178B2 (en) 2017-08-28 2021-05-04 Purdue Research Foundation Systems and methods for sample analysis using swabs
US11335460B2 (en) * 2017-11-09 2022-05-17 International Business Machines Corporation Neural network based selection of representative patients
US10630709B2 (en) * 2018-02-13 2020-04-21 Cisco Technology, Inc. Assessing detectability of malware related traffic
EP3762493A4 (en) 2018-03-08 2021-11-24 University of Notre Dame du Lac Systems and methods for assessing colorectal cancer molecular subtype and risk of recurrence and for determining and administering treatment protocols based thereon
CN110197474B (en) * 2018-03-27 2023-08-25 腾讯科技(深圳)有限公司 Image processing method and device and training method of neural network model
CA3003032A1 (en) * 2018-04-27 2019-10-27 Nanostics Inc. Methods of diagnosing disease using microflow cytometry
US11798653B2 (en) 2018-10-18 2023-10-24 Medimmune, Llc Methods for determining treatment for cancer patients
KR102188115B1 (en) * 2019-03-20 2020-12-07 인천대학교 산학협력단 Electronic device capable of selecting a biomarker to be used in cancer prognosis prediction based on generative adversarial networks and operating method thereof
CN111863126B (en) * 2020-05-28 2024-03-26 上海市生物医药技术研究院 Method for constructing colorectal tumor state evaluation model and application
CN112268732B (en) * 2020-11-02 2021-08-06 河南省肿瘤医院 Auxiliary detection system for breast duct in-situ cancer
CN112599190B (en) * 2020-12-17 2024-04-05 重庆大学 Method for identifying deafness-related genes based on mixed classifier
WO2024019471A1 (en) * 2022-07-18 2024-01-25 아주대학교산학협력단 Survival curve generating system using exponential function, and method thereof
CN115457361A (en) * 2022-09-19 2022-12-09 京东方科技集团股份有限公司 Classification model obtaining method, expression class determining method, apparatus, device and medium
CN116072214B (en) * 2023-03-06 2023-07-11 之江实验室 Phenotype intelligent prediction and training method and device based on gene significance enhancement

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DK1410011T3 (en) 2001-06-18 2011-07-18 Netherlands Cancer Inst Diagnosis and prognosis of breast cancer patients
US7171311B2 (en) 2001-06-18 2007-01-30 Rosetta Inpharmatics Llc Methods of assigning treatment to breast cancer patients
CA2498418A1 (en) * 2002-09-10 2004-03-25 Guennadi V. Glinskii Gene segregation and biological sample classification methods
US20040146921A1 (en) * 2003-01-24 2004-07-29 Bayer Pharmaceuticals Corporation Expression profiles for colon cancer and methods of use
ES2651849T3 (en) 2003-07-10 2018-01-30 Genomic Health, Inc. Expression profile and test algorithm for cancer prognosis
US20060195266A1 (en) * 2005-02-25 2006-08-31 Yeatman Timothy J Methods for predicting cancer outcome and gene signatures for use therein

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10181009B2 (en) 2004-02-25 2019-01-15 H. Lee Moffitt Cancer Center And Research Institute, Inc. Methods and systems for predicting cancer outcome
US20060195269A1 (en) * 2004-02-25 2006-08-31 Yeatman Timothy J Methods and systems for predicting cancer outcome
US8110363B2 (en) 2006-03-31 2012-02-07 Illumina, Inc. Expression profiles to predict relapse of prostate cancer
US20110153534A1 (en) * 2006-03-31 2011-06-23 Illumina, Inc. Expression Profiles to Predict Relapse of Prostate Cancer
US7914988B1 (en) 2006-03-31 2011-03-29 Illumina, Inc. Gene expression profiles to predict relapse of prostate cancer
US8440407B2 (en) 2006-03-31 2013-05-14 Illumina, Inc. Gene expression profiles to predict relapse of prostate cancer
US20100184065A1 (en) * 2007-07-10 2010-07-22 University Of South Florida Method of Predicting Non-Response to First Line Chemotherapy
US20100292131A1 (en) * 2007-11-16 2010-11-18 Pronota N.V. Biomarkers and methods for diagnosing, predicting and/or prognosing sepsis and uses thereof
WO2009089548A3 (en) * 2008-01-11 2010-01-07 H. Lee Moffitt Cancer & Research Institute, Inc. Malignancy-risk signature from histologically normal breast tissue
WO2009089548A2 (en) * 2008-01-11 2009-07-16 H. Lee Moffitt Cancer & Research Institute, Inc. Malignancy-risk signature from histologically normal breast tissue
WO2010017559A1 (en) * 2008-08-08 2010-02-11 University Of Georgia Research Foundation, Inc. Methods and systems for predicting proteins that can be secreted into bodily fluids
US20110224913A1 (en) * 2008-08-08 2011-09-15 Juan Cui Methods and systems for predicting proteins that can be secreted into bodily fluids
CN102177434A (en) * 2008-08-08 2011-09-07 University of Georgia Research Foundation, Inc. Methods and systems for predicting proteins that can be secreted into bodily fluids
US20100145893A1 (en) * 2008-10-31 2010-06-10 Abbott Laboratories Genomic classification of non-small cell lung carcinoma based on patterns of gene copy number alterations
US20100145897A1 (en) * 2008-10-31 2010-06-10 Abbott Laboratories Genomic classification of malignant melanoma based on patterns of gene copy number alterations
US20100145894A1 (en) * 2008-10-31 2010-06-10 Abbott Laboratories Genomic classification of colorectal cancer based on patterns of gene copy number alterations
US8498821B2 (en) * 2008-10-31 2013-07-30 Abbvie Inc. Genomic classification of malignant melanoma based on patterns of gene copy number alterations
US8498822B2 (en) * 2008-10-31 2013-07-30 Abbvie Inc. Genomic classification of colorectal cancer based on patterns of gene copy number alterations
US8498820B2 (en) * 2008-10-31 2013-07-30 Abbvie Inc. Genomic classification of non-small cell lung carcinoma based on patterns of gene copy number alterations
US9002653B2 (en) 2008-10-31 2015-04-07 Abbvie, Inc. Methods for assembling panels of cancer cell lines for use in testing the efficacy of one or more pharmaceutical compositions
US20100144554A1 (en) * 2008-10-31 2010-06-10 Abbott Laboratories Methods for assembling panels of cancer cell lines for use in testing the efficacy of one or more pharmaceutical compositions
CN107548498A (en) * 2015-01-20 2018-01-05 Nantomics, LLC Systems and methods for response prediction to chemotherapy in high grade bladder cancer
JP2018507470A (en) * 2015-01-20 2018-03-15 ナントミクス,エルエルシー System and method for predicting response to chemotherapy for high-grade bladder cancer
US11101038B2 (en) 2015-01-20 2021-08-24 Nantomics, Llc Systems and methods for response prediction to chemotherapy in high grade bladder cancer
CN108753981A (en) * 2018-07-31 2018-11-06 深圳大学 Application of quantitative detection of the HOXB8 gene in colorectal cancer prognosis

Also Published As

Publication number Publication date
US10181009B2 (en) 2019-01-15
EP1894131A2 (en) 2008-03-05
WO2006093507A3 (en) 2009-06-04
EP1894131B1 (en) 2013-01-23
US20060195269A1 (en) 2006-08-31
WO2006093507A2 (en) 2006-09-08
EP1894131A4 (en) 2010-01-27

Similar Documents

Publication Publication Date Title
US20060195266A1 (en) Methods for predicting cancer outcome and gene signatures for use therein
JP6404304B2 (en) Prognosis prediction of melanoma cancer
US20190249260A1 (en) Method for Using Gene Expression to Determine Prognosis of Prostate Cancer
US9115401B2 (en) Partition defined detection methods
JP5745848B2 (en) Signs of growth and prognosis in gastrointestinal cancer
KR101530689B1 (en) Prognosis prediction for colorectal cancer
ES2692333T3 (en) Resolution of genome fractions using polymorphism count
CN108368551B (en) Method for diagnosing tuberculosis
KR20140006898A (en) Colon cancer gene expression signatures and methods of use
US11053551B2 (en) Method and apparatus for determining a probability of colorectal cancer in a subject
EP2304630A1 (en) Molecular markers for cancer prognosis
EP2419540B1 (en) Methods and gene expression signature for assessing ras pathway activity
KR20110057188A (en) System and methods for measuring biomarker profiles
JP2020191896A (en) Gene expression profile algorism for calculating recurrence score for patient with kidney cancer
WO2005083128A2 (en) Methods for predicting cancer outcome and gene signatures for use therein
WO2016118670A1 (en) Multigene expression assay for patient stratification in resected colorectal liver metastases
AU2021286283B2 (en) Chromosome conformation markers of prostate cancer and lymphoma
US20090143238A1 (en) Oligonucleotide matrix and methods of use
Kelmansky Where statistics and molecular microarray experiments biology meet
Westbrook et al. Novel Targets for Diagnosis and Treatment of Breast Cancer Identified by Genomic Analysis
Crick cDNA Microarrays
NZ752676B2 (en) Gene expression profile algorithm for calculating a recurrence score for a patient with kidney cancer

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment
Owner name: NATIONAL INSTITUTES OF HEALTH (NIH), U.S. DEPT. OF
Free format text: CONFIRMATORY LICENSE;ASSIGNOR:UNIVERSITY OF SOUTH FLORIDA;REEL/FRAME:021186/0272
Effective date: 20060314

AS Assignment
Owner name: US ARMY, SECRETARY OF THE ARMY, MARYLAND
Free format text: CONFIRMATORY LICENSE;ASSIGNOR:UNIVERSITY OF SOUTH FLORIDA;REEL/FRAME:032159/0437
Effective date: 20060314

AS Assignment
Owner name: NATIONAL INSTITUTES OF HEALTH (NIH), U.S. DEPT. OF
Free format text: CONFIRMATORY LICENSE;ASSIGNOR:UNIVERSITY OF SOUTH FLORIDA;REEL/FRAME:040665/0721
Effective date: 20161118