WO2012156515A1 - Molecular analysis of acute myeloid leukemia - Google Patents

Molecular analysis of acute myeloid leukemia

Info

Publication number
WO2012156515A1
Authority
WO
WIPO (PCT)
Prior art keywords
rnas
abundance
aml
rna
listed
Prior art date
Application number
PCT/EP2012/059280
Other languages
French (fr)
Inventor
Joachim Schultze
Andrea Hofmann
Andrea Staratschek-Jox
Original Assignee
Rheinische Friedrich-Wilhelms-Universität Bonn
Priority date
Filing date
Publication date
Application filed by Rheinische Friedrich-Wilhelms-Universität Bonn
Priority to EP12724304.6A (patent EP2710147A1)
Publication of WO2012156515A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • the present invention provides a method for molecular analysis of Acute Myeloid Leukemia (AML) based on the abundance of particular RNAs from blood samples, as well as diagnostic tools such as kits and arrays suitable for such a method.
  • AML: Acute Myeloid Leukemia
  • WO2010/143941 discloses the subclassifying of juvenile leukemia via molecular signatures to predict the development of the disease. This subclassifying, however, requires that a primary diagnosis of AML has been established beforehand.
  • WO2006/071088 discloses gene expression analysis by RNA hybridization and quantitation based on a small number of patients. The genes determined in this analysis include CITED2, MGST1, BIN1, RAB32, ICAM3, PXN, PPGB and TAF15.
  • Such patients usually visit their private practitioner first, and very often the early signs of leukemia are misclassified as viral or bacterial infections with leukocytosis. Only if symptoms persist for a prolonged time are patients sent to the hospital (see also Figure 7A). Particularly if such hospitals are not specialized in hematology, these patients have to be referred on to regional centers or university hospitals before they are correctly diagnosed with AML. At the centers, the primary diagnosis of AML is performed by an experienced hematologist using the patient's history, blood counts and light microscopy of cells derived from bone marrow aspirates.
  • a hematologist usually performs further tests for differential diagnosis, subclassification of the disease, prognosis of disease outcome, or therapy outcome prediction.
  • These include flow cytometric analysis, cytogenetics, and PCR-based assays for genetic translocations.
  • Previous inventions in the field of gene expression profiling (GEP) are exclusively targeted at improving differential diagnosis, subclassification, prognosis of disease outcome, and therapy outcome prediction, and rely on primary diagnosis by current diagnostic procedures (patient history, physical exam, blood counts and light microscopy of bone marrow cells) prior to introduction of the inventions as new diagnostics.
  • the present invention is directed to replacing the currently used approaches (patient history, physical exam, blood counts and light microscopy of bone marrow cells) for primary diagnosis with gene expression profiling (GEP) of peripheral blood (grey box in Figure 7B).
  • GEP: gene expression profiling
  • our test is targeted at an earlier time point during the diagnostic process and should be available for the private practitioner rather than the specialist, e.g. a hematologist or an oncologist. This is shown in Figure 7B.
  • a private practitioner can order such a GEP-based test for primary diagnosis of AML from a specialized laboratory at an early time point of the diagnostic process.
  • the specialized laboratory will provide the private practitioner with a probability for the patient being diagnosed with AML.
  • the private practitioner can directly refer the patient to a center for the treatment of the patient.
  • the center can then quickly perform further diagnostics for subclassification, therapy outcome prediction and prognosis (e.g. by other GEP-based algorithms) and can immediately start with therapy.
  • Another very important advantage of the present invention is the possibility to diagnose patients significantly earlier. Since our GEP-based assay can be performed from a small amount of blood in an unbiased fashion, the primary diagnosis does not require the expertise of specialized hematologists (in specialized centers). In other words, the private practitioner (with the help of a specialized laboratory) can be enabled to primarily diagnose AML in a rather short time frame.
  • the invention thus provides methods and kits for diagnosing, detecting, and screening for Acute Myeloid Leukemia (AML). Also provided is a method for preparing an RNA expression profile that is indicative of the presence or absence of AML in a subject. Further provided is the evaluation of the patient RNA expression profiles for the presence or absence of one or more RNA expression signatures that are indicative of AML. More concretely, the application provides a method for the detection of AML in a human subject based on RNA from a blood sample obtained from said subject, comprising:
  • measuring the abundance of at least 4 RNAs in the sample that are chosen from the RNAs listed in Table 2, and
  • the above method is suitable as a primary test for AML, i.e. it does not require a preceding primary test by classical methods.
  • the conclusion whether the patient has AML or not may comprise, in a preferred embodiment of the method, classifying the sample as being from a healthy individual or from an individual having AML based on the specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML (in a reference set).
  • a sample can be classified as being from a patient with AML or from a healthy individual without the necessity to run a reference sample of known origin (i.e. from an AML patient or a healthy individual) at the same time.
  • the method of the invention is a method for the detection of AML in a human individual based on RNA obtained from a blood sample obtained from the individual, comprising:
  • classifying the sample as being from a healthy individual or from an individual having AML based on the specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML.
  • the abundance of at least 6 RNAs, of at least 8 RNAs, of at least 10 RNAs, of at least 12 RNAs, or of at least 14 RNAs listed in Table 2 is determined. It is further preferred to determine the highest ranked RNAs of Table 2 in the method of the invention, i.e. the first 4, first 6, first 8, first 12 or first 14 RNAs of Table 2.
  • the invention provides a method for preparing RNA expression profiles that are indicative of the presence or absence of AML.
  • the RNA expression profiles are prepared from patient blood samples.
  • the number of transcripts in the RNA expression profile may be selected so as to offer a convenient and cost effective means for screening samples for the presence or absence of AML cancer with high sensitivity and high specificity.
  • the RNA expression profile includes the expression level or "abundance" of from 4 to about 3000 transcripts.
  • the expression profile includes the RNA levels of 2500 transcripts or less, 2000 transcripts or less, 1500 transcripts or less, 1000 transcripts or less, 500 transcripts or less, 250 transcripts or less, 100 transcripts or less, or 50 transcripts or less.
  • the profile may contain the abundance or expression level of at least 4 RNAs that are indicative of the presence or absence of AML, and specifically, as selected from Table 2, or may contain the expression level of at least 6, at least 8, at least 12 or at least 14 RNAs selected from Table 2.
  • the profile may contain the expression level or abundance of at least about 60, at least 100, at least 150, or 200 RNAs that are indicative of the presence or absence of AML, and such RNAs may be selected from Table 2.
  • Combinations of genes and/or transcripts that make up or are included in expression profiles are available from Examples 1 to 15 shown in Tables 3, 4 and 5.
  • RNA expression profiles in accordance with this aspect may be evaluated for the presence or absence of an RNA expression signature indicative of AML.
  • the sequential addition of transcripts from Table 2 to the expression profile provides for higher sensitivity and/or specificity for the detection of AML.
  • the area under the ROC curve (AUC) may be at least 0.8, or at least 0.82, or at least 0.85, or at least 0.9.
  • the AUC is a quantitative parameter for the clinical utility (specificity and sensitivity) of the detection method described herein.
  • An AUC of 1.0 refers to a sensitivity and specificity of 100 %.
  • the median Matthews Correlation Coefficient (MCC) may be at least 0.8, or at least 0.82, or at least 0.85, or at least 0.9.
  • MCC is a quantitative parameter for the clinical utility (specificity and sensitivity) of the detection method described herein.
  • An MCC of 1.0 refers to a sensitivity and specificity of 100 %.
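The two readouts named above can be computed from classifier output in a few lines. The following is a minimal sketch, not the inventors' implementation: the function names `roc_auc` and `mcc` and the label coding (1 = AML case, 0 = control) are assumptions made for illustration. The AUC is computed in its rank form, i.e. as the fraction of case/control pairs in which the case receives the higher score.

```python
from math import sqrt


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for an AML case, 0 for a control (assumed coding).
    scores: classifier output, higher meaning "more likely AML".
    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom
```

With perfectly separated scores, `roc_auc` returns 1.0, and a confusion matrix without false positives or false negatives gives an MCC of 1.0, matching the statement that a value of 1.0 corresponds to 100 % sensitivity and specificity.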
  • the invention provides a method for detecting, diagnosing, or screening for AML.
  • the method comprises preparing an RNA expression profile by measuring the abundance of at least 4, at least 6, at least 8, or at least 12, or at least 14 RNAs in a patient blood sample, where the abundance of such RNAs is indicative of the presence or absence of AML.
  • the RNAs may be selected from the RNAs listed in Table 2.
  • the method further comprises evaluating the profile for the presence or absence of an RNA expression signature indicative of AML, to thereby conclude whether the patient has or does not have AML.
  • the method generally provides a sensitivity for the detection of AML of at least about 70 %, while providing a specificity of at least about 70 %.
  • the method comprises determining the abundance of at least 4 RNAs, at least 60 RNAs, at least 100 RNAs, at least 200, or of at least 500 RNAs chosen from the RNAs listed in Table 2, and classifying the sample as being indicative of AML, or not being indicative of AML.
  • kits and custom arrays for preparing the gene expression profiles, and for determining the presence or absence of AML.
  • Figure 1 Development of a classifier to molecularly diagnose AML.
  • (A) Schema of the approach to define classifiers. A dataset was compiled out of a total of 17 individual studies containing a total of 2013 samples. Randomly, a total of 150 samples was drawn. Within these 150 samples, 75 cases of AML were included; the other samples were called non-AML or control samples. The 150 samples were evenly split into three independent cohorts of 50 samples, called training set (TS), validation set 1 (V1) and validation set 2 (V2). The classifier was built in TS and applied to V1 and V2. Classifier performance is only shown for external validation in V1 and V2.
  • Classifiers were built according to a combined approach of 1) feature selection, 2) application of a classifier algorithm, and 3) 10-fold cross-validation (internal validation). The influence of 1) feature size, 2) classification algorithm, 3) ratio of cases and controls in the TS, and 4) the size of the TS was assessed by varying the respective parameters. As readout we assessed AUC, MCC, specificity and sensitivity. The process of generating three independent sets of data out of the 150 samples was performed 10,000 times and termed 'trial simulation approach' (TSA).
  • TSA: 'trial simulation approach'
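The sampling step of the trial simulation approach can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the function name `simulate_trial`, the sample identifiers, and the choice to place exactly 25 cases and 25 controls in each cohort are hypothetical (the text only states that 75 cases and 75 controls were drawn and split evenly into three cohorts of 50).

```python
import random


def simulate_trial(case_ids, control_ids, n_per_class=75, n_sets=3, seed=None):
    """Draw one random trial: n_per_class cases and controls, split into cohorts.

    Returns a list [TS, V1, V2]; each cohort is a list of (sample_id, label)
    tuples with label 1 for AML cases and 0 for controls.
    Assumption: every cohort receives the same number of cases and controls.
    """
    rng = random.Random(seed)
    cases = rng.sample(list(case_ids), n_per_class)
    controls = rng.sample(list(control_ids), n_per_class)
    per_set = n_per_class // n_sets
    cohorts = []
    for i in range(n_sets):
        block = slice(i * per_set, (i + 1) * per_set)
        cohort = [(s, 1) for s in cases[block]] + [(s, 0) for s in controls[block]]
        rng.shuffle(cohort)
        cohorts.append(cohort)
    return cohorts


# Hypothetical identifiers; the real case/control counts are not given here.
all_cases = ["case_%d" % i for i in range(500)]
all_controls = ["ctrl_%d" % i for i in range(1500)]
ts, v1, v2 = simulate_trial(all_cases, all_controls, seed=1)
```

Repeating the call with different seeds (10,000 times in the description above) would yield the independent TS/V1/V2 triplets on which classifiers are built and validated.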
  • Figure 2 Development of a classifier to molecularly diagnose AML using a compiled dataset of 2013 samples.
  • (A) Schema of the approach to define classifiers.
  • a dataset was compiled out of a total of 17 individual studies containing a total of 2013 samples.
  • the 2013 samples were evenly split into three independent cohorts of 671 samples, called training set (TS), validation set 1 (V1) and validation set 2 (V2).
  • the classifier was built in TS and applied to V1 and V2. Classifier performance is only shown for external validation in V1 and V2.
  • Classifiers were built according to a combined approach of 1) feature selection, 2) application of a classifier algorithm, and 3) 10-fold cross-validation (internal validation) (as defined in Figure 1).
  • Figure 3 Correlation of feature size and feature selection with classifier performance.
  • (A) The number of features in each individual classifier of a total of 10,000 classifiers is plotted against the AUC of the respective classifier. For each level of filter settings (a total of five different levels of filter settings) the data are plotted separately. On the top panel, the data obtained in the small dataset (150 samples) are shown; on the lower panel, the data obtained in the complete dataset (2013 samples) are shown. It can be clearly seen that the variation in AUC is reduced with higher feature sizes in the small cohort, but this effect is no longer apparent in the complete dataset.
  • (B) For each transcript interrogated on the array, its participation in any of the 60,000 classifiers (6 levels of filtering) was calculated and ranked. If a transcript was part of at least 1 classifier, its participation frequency was plotted. In B the results for the small dataset of 150 samples are shown.
  • (C) Similar analysis, but this time for the complete dataset.
  • (A) The influence of feature size on classifier performance was assessed using 10,000 independent TSA settings for each filter setting. 5 different filter settings based on differentially expressed genes (combined fold change and p-value filter, here abbreviated by FC only) were used. The larger the FC value, the smaller the number of transcripts per classifier. Shown is classifier performance in V1 and V2 independently.
  • (B) The influence of sample distribution in TS was assessed by decreasing the number of cases from 25 out of 50 in TS to 5 out of 30 in TS. Shown is again classifier performance in V1 and V2 independently.
  • (C) The influence of sample size in TS was assessed by decreasing both the number of cases and controls from 25 each to 5 each in TS. Shown is again classifier performance in V1 and V2 independently.
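The feature-selection step based on differentially expressed genes can be illustrated with a small filter. This is a simplified sketch under assumptions: expression values are taken to be log2-scaled (so the fold change is a difference of means), and a threshold on the Welch t-statistic stands in for the p-value filter mentioned above. The function name `select_features` and both default thresholds are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def select_features(expr_cases, expr_controls, min_abs_log2fc=1.0, min_abs_t=2.0):
    """Keep transcripts that differ between cases and controls in the training set.

    expr_cases / expr_controls: dict mapping transcript id -> list of log2
    expression values (at least two values per group).
    Returns (transcript_id, log2_fold_change, t_statistic) tuples sorted by
    decreasing absolute t-statistic.
    """
    selected = []
    for tid, a in expr_cases.items():
        b = expr_controls[tid]
        log2fc = mean(a) - mean(b)
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se > 0:
            t = log2fc / se
        elif log2fc != 0:
            t = float("inf")  # no within-group spread but a real difference
        else:
            t = 0.0
        if abs(log2fc) >= min_abs_log2fc and abs(t) >= min_abs_t:
            selected.append((tid, log2fc, t))
    selected.sort(key=lambda row: abs(row[2]), reverse=True)
    return selected
```

Raising `min_abs_log2fc` shrinks the list of selected transcripts, which mirrors the statement that a larger FC value gives a smaller number of transcripts per classifier.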
  • Figure 5 Corresponding to Figure 2, instead of AUC, the MCC is shown for the complete dataset.
  • (A) For V1 (black dots) and V2 (red dots), the performance of the 10,000 classifiers generated in this simulation approach and the influence of feature size is shown. It can be clearly seen that the classifiers generated with a larger training set perform significantly better than the data presented in Figure 1A. Data are shown in boxplots (mean, 25 to 75 percentiles, standard deviation and outliers). (B) The influence of different classification algorithms on classifier performance is shown. For each condition, 10,000 individual classifiers in 10,000 independent trial simulation approaches are shown for V1 and V2 independently. SVM with linear or radial kernel was used in combination with either t-test (t) or Wilcoxon-test (wilc). (C) PAM or LDA in combination with t-test or Wilcoxon-test were used.
  • Figure 6 Correlation of feature size and feature selection with classifier performance. Shown here is the MCC as readout. (A) The number of features in each individual classifier of a total of 10,000 classifiers is plotted against the MCC of the respective classifier. For each level of filter settings (a total of five different levels of filter settings) the data are plotted separately. On the top panel, the data obtained in the small dataset (150 samples) are shown; the data obtained in the complete dataset (2013 samples) are shown on the lower panel. It can be clearly seen that the variation in MCC is reduced with higher feature sizes in the small cohort, but this effect is no longer apparent in the complete dataset.
  • the invention provides methods and kits for screening, diagnosing, and detecting AML in human patients (subjects).
  • a synonym for a patient with AML is "AML-case" or simply "case."
  • the present invention provides methods and kits for screening patient samples for those that are positive for AML, e.g., in the absence of surgery or any other diagnostic procedure.
  • the invention relates to the determination of the abundance of RNAs to detect AML in a human subject, wherein the determination of the abundance is based on RNA obtained (or isolated) from whole blood of the subject or from blood cells of the subject.
  • the invention involves preparing an RNA expression profile from a patient sample.
  • the method may comprise isolating RNA from whole blood, and detecting the abundance or relative abundance of selected transcripts.
  • the "RNAs" may be defined by reference to an expressed gene, or by reference to a transcript, or by reference to a particular oligonucleotide probe for detecting the RNA (or cDNA derived therefrom), each of which is listed in Table 2 for 680 RNAs that are indicative of the presence or absence of AML.
  • the number of transcripts in the RNA expression profile may be selected so as to offer a convenient and cost effective means for screening samples for the presence or absence of AML with high sensitivity and high specificity.
  • the RNA expression profile may include the expression level or "abundance" of from 4 to about 3000 transcripts.
  • the expression profile includes the RNA levels of 2500 transcripts or less, 2000 transcripts or less, 1500 transcripts or less, 1000 transcripts or less, 500 transcripts or less, 250 transcripts or less, 200 transcripts or less, 100 transcripts or less, or 50 transcripts or less.
  • Such profiles may be prepared, for example, using custom microarrays or multiplex gene expression assays as described in detail herein.
  • RNA expression profiles in accordance with this aspect may be evaluated for the presence or absence of an RNA expression signature indicative of AML.
  • the sequential addition of transcripts from Table 2 to the expression profile provides for higher sensitivity and/or specificity for the detection of AML, as indicated by the AUC.
  • a clinical utility is reached if the AUC is at least 0.8.
  • the inventors have surprisingly found that an AUC of 0.8 is reached if and only if at least 4 RNAs are measured that are chosen from the RNAs listed in Table 2.
  • measuring 4 RNAs is necessary and sufficient for the detection of AML in a human subject based on RNA from a blood sample obtained from said subject by measuring the abundance of at least 4 RNAs in the sample, that are chosen from the RNAs listed in Table 2, and concluding based on the measured abundance whether the subject has AML or not.
  • the area under the ROC curve may be at least 0.8, or at least 0.82, or at least 0.85 or at least 0.9.
  • the AUC is a quantitative parameter for the clinical utility (specificity and sensitivity) of the detection method described herein.
  • An AUC of 1.0 refers to a sensitivity and specificity of 100 %.
  • the profile may contain the expression level of at least 4 RNAs that are indicative of the presence or absence of AML, and specifically, as selected from Table 2, or may contain the expression level of at least 6, 8, 10, 12 or 14 RNAs selected from Table 2.
  • the profile may contain the expression level or abundance of at least 60, 100, 200, 500, or 680 RNAs that are indicative of the presence or absence of AML, and such RNAs may be (at least in part) selected from Table 2.
  • RNAs may be defined by gene, or by transcript ID, or by probe ID.
  • RNAs of Table 2 support the detection of AML with high sensitivity and high specificity.
  • the abundance of at least 4, at least 6, at least 8, at least 10, at least 12, at least 14, at least 60, at least 100, at least 200, or at least 500 distinct RNAs is measured, in order to arrive at a reliable diagnosis of AML.
  • the set of RNAs may comprise, consist essentially of, or consist of, a set or subset of RNAs exemplified in Table 2.
  • the term "consists essentially of" in this context allows for the expression level of additional transcripts to be determined that are not differentially expressed in AML subjects, and which may therefore be used as positive or negative expression level controls or for normalization of expression levels between samples.
  • RNA expression profiles may be evaluated for the presence or absence of an RNA expression signature indicative of AML.
  • the sequential addition of transcripts from Table 2 to the expression profile provides for higher sensitivity and/or specificity and stability (i.e. independence from the sample analyzed) for the detection of AML.
  • the sensitivity and specificity of the methods provided herein may be equivalent to an area under the ROC curve (AUC) of at least 0.8, or at least 0.82, at least 0.85, or at least 0.9.
  • the present invention provides an in-vitro diagnostic test system (IVD) that is trained (as described further below) for the detection of AML.
  • IVD in-vitro diagnostic test system
  • RNA abundance values for AML positive and negative samples are determined.
  • the RNAs can be quantitatively measured on an adequate set of training samples comprising cases and controls, and with adequate clinical information on leukemia status, applying adequate quality control measures, and on an adequate set of test samples, for which the detection is yet to be made.
  • a classifier can be trained and applied to the test samples to calculate the probability of the presence or absence of AML.
  • a sample can be classified as being from a patient with AML or from a healthy individual without the necessity to run a reference sample of known origin (i.e. from an AML patient or a healthy individual) at the same time.
  • classification schemes are known for classifying samples between two or more classes or groups, and these include, without limitation: Naive Bayes, Support Vector Machines, Nearest Neighbors, Decision Trees, Logistic Regression, Artificial Neural Networks, and Rule-based schemes.
  • the predictions from multiple models can be combined to generate an overall prediction.
  • a classification algorithm or "class predictor" may be constructed to classify samples. The process for preparing a suitable class predictor is reviewed in R. Simon, Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data, British Journal of Cancer (2003) 89, 1599-1604, which review is hereby incorporated by reference.
  • the invention teaches an in-vitro diagnostic test system (IVD) that is trained in the detection of AML as referred to above, comprising at least 4 RNAs, which can be quantitatively measured on an adequate set of training samples comprising cases and controls, with adequate clinical information on leukemia status, applying adequate quality control measures, and on an adequate set of test samples, for which the detection has yet to be made.
  • Given the quantitative values for the RNAs and the clinical data for the training samples, a classifier can be trained and applied to the test samples to calculate the probability of the presence or absence of AML.
  • the present invention provides methods for detecting, diagnosing, or screening for AML in a human subject with a high sensitivity and specificity.
  • the sensitivity of the methods provided herein is equivalent to an area under the ROC curve (AUC) of at least 0.8, or at least 0.82, at least 0.85, or at least 0.9.
  • the above finding may be due to the fact that an organism such as a human systemically reacts to the development of an AML by altering the expression levels of genes in different pathways.
  • although the change in expression might be small for each gene in a particular signature, measuring a set of at least 4 genes, preferably even larger numbers such as 6, 8, 10, 12, 14, 100, 200, 500 or even more RNAs, for example at least 5, at least 8, at least 120, at least 160 RNAs at the same time, allows for the detection of AML in a human with high sensitivity and high specificity.
  • an RNA obtained from a subject's blood sample, i.e. an RNA biomarker
  • an RNA biomarker is an RNA molecule with a particular base sequence whose presence within a blood sample from a human subject can be quantitatively measured.
  • the measurement can be based on a part of the RNA molecule, namely a part of the RNA molecule that has a certain base sequence, which allows for its detection and thereby allows for the measurement of its abundance in a sample.
  • the measurement can be by methods known in the art, for example analysis on a solid phase device, or in solution (for example, by RT-PCR). Probes for the particular RNAs can either be bought commercially, or designed based on the respective RNA sequence.
  • the abundance of several RNA molecules is determined in a relative or an absolute manner, wherein an absolute measurement of RNA abundance is preferred.
  • the RNA abundance is, if applicable, compared with that of other individuals, or with multivariate quantitative thresholds.
  • measurement of RNA abundance is performed from blood samples using quantitative methods.
  • RNA is isolated from a blood sample obtained from a human subject that is to undergo AML testing, i.e. for example a smoker or a person with high fever and weakness.
  • microarray-based methods the invention is not limited thereto.
  • RNA abundance can be measured by in situ hybridization, amplification assays such as the polymerase chain reaction (PCR), sequencing, or microarray-based methods.
  • RT-PCR (e.g., TAQMAN)
  • hybridization-based assays such as DNA microarray analysis, as well as direct mRNA capture with branched DNA (QUANTIGENE) or HYBRID CAPTURE (DIGENE).
  • the invention employs a microarray.
  • a "microarray" includes a specific set of probes, such as oligonucleotides and/or cDNAs (e.g., expressed sequence tags, "ESTs") corresponding in whole or in part, and/or continuously or discontinuously, to regions of RNAs that can be extracted from a blood sample of a human subject.
  • the probes are bound to a solid support.
  • the support may be selected from beads (magnetic, paramagnetic, etc.), glass slides, and silicon wafers.
  • the probes can correspond in sequence to the RNAs of the invention such that hybridization between the RNA from the subject sample (or cDNA derived therefrom) and the probe occurs.
  • the sample RNA can optionally be amplified before hybridization to the microarray.
  • Prior to hybridization, the sample RNA is fluorescently labeled.
  • fluorescence emission is quantified. Fluorescence emission for each particular RNA is directly correlated with the amount of the particular RNA in the sample. The signal can be detected and, together with its location on the support, can be used to determine which probe hybridized with RNA from the subject's blood sample.
  • the invention is directed to a kit or microarray for detecting the level of expression or abundance of RNAs in the subject's blood sample, where this "profile" allows for the conclusion of whether the subject has AML or not (at a level of accuracy described herein).
  • the invention relates to a probe set that allows for the detection of the RNAs associated with AML. If these particular RNAs are present in a sample, they (or corresponding cDNA) will hybridize with their respective probe (i.e., a complementary nucleic acid sequence), which will yield a detectable signal. Probes are designed to minimize cross reactivity and false positives.
  • the invention in certain aspects provides a microarray, which generally comprises a solid support and a set of oligonucleotide probes.
  • the set of probes generally contains from 4 to about 3,000 probes, including at least 4 probes deduced from Table 2. In certain embodiments, the set contains 2000 probes or less, or 1000 probes or less, 500 probes or less, 200 probes or less, or 100 probes or less.
  • the conclusion whether the subject has AML or not is preferably reached on the basis of a classification algorithm, which can be developed using e.g. a random forest method, a support vector machine (SVM), a K-nearest neighbor method (K-NN), such as a 3-nearest neighbor method (3-NN), a linear discrimination analysis (LDA), or a prediction analysis for microarrays (PAM), as known in the art.
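Of the classification algorithms listed above, the 3-nearest neighbor method is the simplest to write down. The sketch below is illustrative only: the function name `knn_predict`, the use of Euclidean distance, and the toy 4-RNA abundance profiles are assumptions, and the returned vote fraction is only a crude stand-in for the probability a trained classifier would report.

```python
from collections import Counter
from math import dist


def knn_predict(train_profiles, train_labels, profile, k=3):
    """Classify one RNA abundance profile by majority vote of its k nearest
    training profiles (Euclidean distance).

    Returns (predicted_label, fraction_of_votes_for_that_label).
    """
    ranked = sorted(
        zip(train_profiles, train_labels),
        key=lambda pair: dist(pair[0], profile),
    )
    votes = Counter(label for _, label in ranked[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k


# Hypothetical training data: abundance of 4 RNAs per sample.
train = [
    (1.0, 1.0, 1.0, 1.0), (1.2, 0.9, 1.1, 1.0), (0.9, 1.1, 1.0, 1.2),
    (5.0, 5.0, 5.0, 5.0), (5.1, 4.9, 5.2, 5.0), (4.8, 5.0, 5.1, 4.9),
]
labels = ["healthy"] * 3 + ["AML"] * 3
prediction = knn_predict(train, labels, (5.0, 5.1, 4.9, 5.0))
```

Because the training profiles serve as the reference set, a new sample is classified without running a reference sample of known origin at the same time.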
  • F-statistics is used to identify specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML.
  • sensitivity (S+ or true positive fraction, TPF) refers to the count of positive test results among all true positive disease states divided by the count of all true positive disease states.
  • specificity (S- or true negative fraction, TNF) refers to the count of negative test results among all true negative disease states divided by the count of all true negative disease states.
  • CCR: Correct Classification Rate
  • TF: true fraction
  • PV+ or PPV: Positive Predictive Value
  • PV- or NPV: Negative Predictive Value
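These quantities all follow from the four cells of a confusion matrix. The sketch below uses the standard textbook definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and do not come from the patent's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic readouts from confusion-matrix counts.

    tp: AML samples called AML        fn: AML samples called non-AML
    tn: non-AML samples called non-AML  fp: non-AML samples called AML
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),          # true positive fraction
        "specificity": ratio(tn, tn + fp),          # true negative fraction
        "ppv": ratio(tp, tp + fp),                  # positive predictive value
        "npv": ratio(tn, tn + fn),                  # negative predictive value
        "ccr": ratio(tp + tn, tp + fp + tn + fn),   # correct classification rate
    }


# Hypothetical validation cohort of 50 samples (25 cases, 25 controls).
metrics = diagnostic_metrics(tp=20, fp=5, tn=20, fn=5)
```

Note that the predictive values depend on the ratio of cases to controls in the cohort, whereas sensitivity and specificity do not.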
  • RNA molecules that can be used in combinations described herein for diagnosing and detecting AML in a subject according to the invention can be found in Table 2.
  • the inventors have shown that the selection of at least 4 or more RNAs of the markers listed in Table 2 can be used to diagnose or detect AML in a subject using a blood sample from that subject.
  • the RNA molecules that can be used for detecting, screening and diagnosing AML are selected from the RNAs provided in Table 2.
  • the method of the invention comprises at least the following steps: measuring the abundance of at least 4 RNAs (preferably 9 RNAs or 10 RNAs) in the sample, that are chosen from the RNAs listed in Table 2, and concluding, based on the measured abundance, whether the subject has AML or not.
  • Measuring the abundance of RNAs may comprise isolating RNA from blood samples as described, and hybridizing the RNA or cDNA prepared therefrom to a microarray. Alternatively, other methods for determining RNA levels may be employed.
  • Examples for sets of 4 or more RNAs that are measured together, i.e. sequentially or preferably simultaneously, are shown in Examples 1 to 15 of Tables 3, 4 and 5.
  • the sets of at least 4 RNAs of Tables 3 and 4 are defined by a common threshold of AUC>0.8.
  • the abundance of at least 4 RNAs (preferably 6, 8, 10, or 12 RNAs) in the sample is measured, wherein the at least 4 RNAs are chosen from the RNAs listed in Table 2. Examples for sets of 4 RNAs that can be measured together, i.e. sequentially or preferably simultaneously, to detect AML in a human subject are shown in Table 2.
  • RNAs refers to a minimum number of RNAs that are measured. It is possible to use up to 10,000 or 20,000 genes in the invention, a fraction of which can be RNAs listed in Table 2. In preferred embodiments of the invention, abundance of up to 5,000, 2,500, 2,000, 1,000, 500, 250, 100, 80, 70, 60, 50, 40, 30, 20, 10, 5, 4, 3, 2, or 1 RNA of randomly chosen RNAs that are not listed in Table 2 is measured in addition to RNAs of Table 2 (or subsets thereof).
  • RNAs that are mentioned in Table 2 are measured.
  • RNA markers for AML for example the at least 4 RNAs described above, (or more RNAs as disclosed above and herein), is determined preferably by measuring the quantity of the transcribed RNA of the marker gene.
  • This quantity of the mRNA of the marker gene can be determined for example through chip technology (microarray), (RT-) PCR (for example also on fixated material), Northern hybridization, dot-blotting, sequencing, or in situ hybridization.
  • the microarray technology, which is most preferred, allows for the simultaneous measurement of RNA abundance of up to many thousand RNAs and is therefore an important tool for determining differential expression (or differences in RNA abundance), in particular between two biological samples or groups of biological samples.
  • the RNAs of the sample need to be amplified and labeled and the hybridization and detection procedure can be performed as known to a person of skill in the art.
  • the analysis can also be performed through single reverse transcriptase-PCR, competitive PCR, real time PCR, differential display RT-PCR, Northern blot analysis, sequencing, and other related methods.
  • the larger the number of markers is that are to be measured the more preferred is the use of the microarray technology.
  • multiplex PCR for example, real time multiplex PCR is known in the art and is amenable for use with the present invention, in order to detect the presence of 2 or more genes or RNAs simultaneously.
  • the RNA whose abundance is measured in the method of the invention can be mRNA, cDNA, unspliced RNA, or its fragments. Measurements can be performed using the complementary DNA (cDNA) or complementary RNA (cRNA), which is produced on the basis of the RNA to be analyzed, e.g. using microarrays.
  • cDNA complementary DNA
  • cRNA complementary RNA
  • microarrays A great number of different arrays as well as their manufacture are known to a person of skill in the art and are described for example in the U.S. Patent Nos.
  • the decision whether the subject has AML comprises the step of training a classification algorithm on an adequate training set of cases and controls and applying it to RNA abundance data that was experimentally determined based on the blood sample from the human subject to be diagnosed.
  • the classification method can be a random forest method, a support vector machine (SVM), or a K-nearest neighbor method (K-NN), such as 3-NN.
  • SVM support vector machine
  • K-NN K-nearest neighbor method
  • RNAs For the development of a model that allows for the classification for a given set of biomarkers, such as RNAs, methods generally known to a person of skill in the art are sufficient, i.e., new algorithms need not be developed.
  • a classifier i.e. a mathematical model that generalizes properties of the different classes (leukemia vs. healthy individual) from the training data and applies them to the test data resulting in a classification for each test sample.
  • the raw data from microarray hybridizations can first be condensed with FARMS as shown by Hochreiter et al., Bioinformatics 22(8): 943-9 (2006).
  • Alternative methods for condensation such as Robust Multi-Array Analysis (RMA, GC-RMA, see Irizarry et al. Biostatistics. 4, 249-264 (2003)) can be used.
  • RMA Robust Multi-Array Analysis
  • GC-RMA see Irizarry et al. Biostatistics. 4, 249-264 (2003)
  • classification of the test data set through a support vector machine or other classification algorithms is known to a person of skill in the art, like for example classification and regression trees, penalized logistic regression, sparse linear discriminant analysis, Fisher linear discriminant analysis, K-nearest neighbors, shrunken centroids, and artificial neural networks (see V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, New York, NY, USA, 1995; Bernhard Schölkopf, Alex Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, 2002; S. Kotsiantis, Informatica J. 31: 249-268 (2007)).
  • RNA biomarkers that are used as input to the classification algorithm.
  • the invention refers to the use of a method as described above and herein for the detection of AML in a human subject, based on RNA from a blood sample.
  • the invention also refers to the use of a microarray for the detection of AML in a human subject based on RNA from a blood sample.
  • a use can comprise measuring the abundance of at least 4 RNAs (or more, as described above and herein) that are listed in Table 2.
  • the microarray comprises at least 4 probes for measuring the abundance of the at least 4 RNAs.
  • Commercially available microarrays, such as from Illumina or Affymetrix, may be used.
  • the abundance of the at least 4 RNAs is measured by multiplex RT-PCR.
  • the RT-PCR includes real time detection, e.g., with fluorescent probes such as Molecular beacons or TaqMan® probes.
  • the microarray comprises probes for measuring only RNAs that are listed in Table 2 (or subsets thereof).
  • the invention also refers to a kit for the detection of AML in a human subject based on RNA obtained from a blood sample.
  • Such a kit comprises a means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2.
  • the means for measuring expression can be probes that allow for the detection of RNA in the sample or primers that allow for the amplification of RNA in the sample. Ways to devise probes and primers for such a kit are known to a person of skill in the art.
  • the invention refers to the use of a kit as described above and herein for the detection of AML in a human subject based on RNA from a blood sample comprising means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2.
  • a use may comprise the following steps : contacting at least one component of the kit with RNA from a blood sample from a human subject, measuring the abundance of at least 4 RNAs (or more as described above and herein) that are chosen from the RNAs listed in Table 2 using the means for measuring the abundance of at least 4 RNAs, and concluding, based on the measured abundance, whether the subject has AML.
  • the invention also refers to a method for preparing an RNA expression profile that is indicative of the presence or absence of AML, comprising : isolating RNA from a whole blood sample, and determining the level or abundance of from 4 to about 3000 RNAs, including at least 4 RNAs selected from Table 2.
  • the expression profile contains the level or abundance of 680 RNAs or less, 500 or less, of 150 RNAs or less, or of 100 RNAs or less. Further, it is preferred that at least 10 RNAs, at least 30 RNAs, at least 100 RNAs are listed in Table 2.
  • the invention also refers to a microarray, comprising a solid support and a set of oligonucleotide probes, the set containing from 4 to about 3,000 probes, and including at least 4 probes selected from Table 2.
  • the set contains 680 probes or less (such as e.g. 500 probes, or less).
  • At least 10 probes can be those listed in Table 2.
  • At least 30 probes can be those listed in Table 2.
  • at least 100 probes are listed in Table 2.
  • AML has a low prevalence but is a deadly disease if not diagnosed in time
  • a test used for screening or primary diagnosis of AML would have to achieve sensitivity and specificity greater than 90%, preferably greater than 95%, preferably greater than 98%, preferably greater than 99%, to minimize false-negative results while avoiding unacceptable levels of false-positive results.
  • a total of 2013 microarray samples (Affymetrix U133A chip) from 17 individual studies were compiled to form a new dataset (AML dataset, Table 1, Fig. 1A). Samples were only included in the study when passing all quality control checks. Following recent guidelines suggested by the MAQC consortium, the preferred methodologies for data processing, feature selection and classifier development were established using one of the datasets provided by the MAQC consortium (Fig.
  • AML acute myeloid leukemia
  • Table 1 summarizes the origin of the samples that have been used for the development of a test to diagnose molecularly AML.
  • Quality check procedure Samples were subjected to an extended quality check prior to use. First, a visual inspection of the distribution of raw expression values was performed using pairwise scatterplots of expression values from all arrays of a dataset. Overall median correlation within a combined dataset was required to be above 0.8. Next, the present call rate had to reach a threshold determined within the dataset as median present rate > 0.3. Third, overall sample distribution was visually analyzed by density plots.
  • Batch-effect removal Due to our overall strategy, and to better mimic the expected clinical routine of subsequent data generation, which is naturally prone to batch effects, we decided against batch-effect removal.
  • Classification algorithms During setup we tested several classification algorithms (support vector machine (SVM), linear discrimination analysis (LDA), and prediction analysis for microarrays (PAM)). The different classifiers were built and optimized based on the training set using a 10-fold cross-validation design repeated 10 times. The training set was divided 10 times into an internal training set and an internal validation set in a ratio of 9:1 (for the distribution to the internal validation group see supplementary Table 1). In the internal training set, the differentially expressed genes between positive and negative samples (cases and controls) were calculated using a t-test. Using the feature list extracted by this feature selection method, the different algorithms were trained on the internal training set and used to calculate the probability score for each case of the respective internal validation set.
  • SVM support vector machine
  • LDA linear discrimination analysis
  • PAM prediction analysis for microarrays
  • AUC and median MCC were used to measure the quality of the classifier. Sensitivity and specificity were calculated at the maximum Youden index (sensitivity + specificity - 1). AUC and MCC values were calculated using prediction probabilities as implemented in the ROCR package. For description of specificity controls see the following paragraph.
  • Randomized permutation of training and validation sets To assess robustness of classification, a permutation approach was applied, where each classification was repeated in 10,000 iterations (Trial Simulation Approach, TSA). In this re-sampling design, the dataset was randomly divided into one training set and two validation sets. If not stated otherwise, all divided sets comprise one third of the entire dataset. The classifier was built in the training set based on differentially expressed genes selected by statistical testing in the training set and then applied to the two independent validation sets.
  • Fig. 1A To develop classifiers to diagnose AML the procedure shown in Fig. 1A was performed initially. First, 150 samples (75 AML samples and 75 non-AML samples) were randomly drawn from the complete dataset to simulate a typical pilot trial situation. These 150 samples were divided into three independent datasets, each containing 25 AML cases and 25 non-AML samples. One of the three datasets was set to become the training set (TS), one the first validation cohort (V1) and the third dataset to become the second validation cohort (V2). Within TS, the classifier was generated using a defined approach for feature selection, a defined classification algorithm and a 10x cross validation (internal validation). The classifier built in TS was then validated in the independent datasets V1 and V2.
  • Fig. 1B shows the graphical representation of the AUC of each of the 10,000 classifiers in V1 (black circles) and V2 (red circles), respectively (left panel). On the right panel, mean (line in grey box), 25/75 percentiles (boxes) and 95 percentiles (lines) as well as outliers (dots) are shown.
  • Fig. 1E the influence of sample size and distribution in the test set (TS) on classifier performance is shown.
  • TS the number of AML cases was varied between 5 and 25.
  • TS sample size in the test set
  • controls the number of AML cases as well as the number of non-AML (controls) was varied between 5 and 25.
  • Fig. 2A the whole dataset comprising all 2013 samples was used (Fig. 2A). These 2013 samples were divided into three independent datasets, each containing 671 samples with equal distribution between AML and non-AML samples.
  • One of the three datasets was set to become the training set (TS), one the first validation cohort (V1) and the third dataset to become the second validation cohort (V2).
  • TS training set
  • V1 first validation cohort
  • V2 second validation cohort
  • the classifier was generated using a defined approach for feature selection, a defined classification algorithm and a 10x cross validation (internal validation).
  • the classifier built in TS was then validated in the independent datasets V1 and V2. To better understand the role of feature size, classification algorithm, ratio of cases and controls as well as size of training set, these variables were varied (see figures below).
  • Fig. 2B the AUC in V1 and V2 of all classifiers generated in the respective TS are shown for 6 different feature size criteria (FC>4 to FC>12).
  • FC>4 to FC>12 the result of 60,000 classifiers in both V1 and V2.
  • the left panel shows the same scale on the y-axis as shown in Figure 1D, the right panel reduces the scale to the range from 0.96 to 1.
  • Fig. 2C correlates the percentage of generated classifiers with their result (here AUC) for the small cohort (150 samples, see Figure 1, red dots) and for the complete cohort (2013 samples, Figure 2B, black dots).
  • Fig. 2D correlates the percentage of generated classifiers with their result (here MCC) for the small cohort (150 samples, see Figure 1, red dots) and for the complete cohort (2013 samples, Fig. 2B, black dots).
  • Fig. 3A correlates the feature size of each classifier with its AUC in V1 and V2 for both the small cohort (upper panel) and the complete cohort (lower panel). Again, a total of 60,000 classifiers in 5 panels is shown for the small cohort and the same number of classifiers for the large cohort.
  • Fig. 3B addresses the question of how often specific features (transcripts) are part of a classifier. For this purpose all 3540 transcripts that appeared in at least one of 60,000 classifiers were plotted against their participation (in percent) in all classifiers.
  • transcripts > 3000 transcripts
  • transcripts are only part of less than 40% of all classifiers.
  • transcript (21) is part of at least 50% of all classifiers.
  • Not a single transcript is observed in more than 90% of all classifiers.
  • Fig. 3C addresses the question of how often specific features (transcripts) are part of a classifier in the complete dataset. For this purpose all 680 transcripts that appeared in at least one of 60,000 classifiers were plotted against their participation (in percent) in all classifiers.
  • transcripts > 600 transcripts
  • 45 transcripts are present in at least 50% of all classifiers, 25 transcripts in more than 80% of all classifiers, 19 transcripts in more than 90% of all classifiers and 8 transcripts even in all classifiers.
  • transcripts that are always part of a classifier irrespective of the distribution of patients into test and validation sets. These few transcripts are the prime candidates for building the test for the primary molecular diagnosis of AML.

Abstract

The present invention provides a method for molecular analysis of Acute Myeloid Leukemia (AML) based on the abundance of particular RNAs from blood samples, as well as diagnostic tools such as kits and arrays suitable for such method.

Description

Molecular Analysis of Acute Myeloid Leukemia
The present invention provides a method for molecular analysis of Acute Myeloid Leukemia (AML) based on the abundance of particular RNAs from blood samples, as well as diagnostic tools such as kits and arrays suitable for such method.
Background of the Invention
With the initial introduction of DNA microarray analysis for cancer diagnostics in the late 1990s (T. R. Golub et al., Science 286: 531-537 (1999)), a rush towards diagnostic, prognostic and even predictive gene signatures was initiated (J. A. Ludwig and J. N. Weinstein, Nat. Rev. Cancer 5: 845-856 (2005); A. Rosenwald et al., New England J. Med. 346: 1937-1947 (2002); E. J. Yeoh et al., Cancer Cell 1: 133-143 (2002); L. J. van 't Veer et al., Nature 415: 530-536 (2002); D. G. Beer et al., Nat. Med. 8: 816-824 (2002); A. Bhattacharjee et al., Proc. Natl. Acad. Sci. USA 98: 13790-13795 (2001); S. Ramaswamy et al., Nat. Genet. 33: 49-54 (2003); S. L. Pomeroy et al., Nature 415: 436-442 (2002)). Not much later, similar strategies were applied to other areas of medical sciences, e.g. infectious diseases (M. P. Berry et al., Nature 466: 973-977 (2010)). At the same time, there was a tremendous surge in novel analytical tools and mathematical algorithms to be utilized for the analysis of high throughput data for diagnostic purposes (M. D. Radmacher et al., J Comput Biol 9: 505-511 (2002); A. M. Glas et al., BMC Genomics 7: 278 (2006); Q. Liu et al., PLoS One 4: e8250 (2009); T. Reme et al., BMC Bioinformatics 9: 16 (2008); R. M. Parry et al., Pharmacogenomics J. 10: 292-309 (2010)). Irrespective of these significant advances, the number of gene signatures that have entered clinical practice is alarmingly small (FDA, FDA Clears Breast Cancer Specific Molecular Prognostic Test (2007) and F. M. Goodsaid et al., Nat Rev Drug Discov 9: 435-445 (2010)). More troubling, there have been serious concerns about the validity of several landmark studies in gene signature development (S. Michiels et al., Lancet 365: 488-492 (2005)), if not about the approach in general (S. Michiels et al., Lancet 365: 488-492 (2005); J. P. Ioannidis et al., Nat Genet 41: 149-155 (2009); A. Dupuy and R. M. Simon, J Natl Cancer Inst 99: 147-157 (2007)).
As an important result of these concerns, the MicroArray Quality Control (MAQC-I) project was successfully installed, demonstrating that the technology itself is reliable and reproducible (L. Shi et al., Nat Biotechnol 24: 1151-1161 (2006); L. Shi et al., Curr Opin Biotechnol 19: 10-18 (2008)). More recently, concerns about overoptimistic data presentation in clinical gene signature studies could be eased by improving the analytical processes used within these pilot studies (X. Fan et al., doi: 10.1158/1078-0432.CCR-09-1815). In addition, the MAQC-II study set the framework for further development of microarray-based predictive models (L. Shi et al., Nat Biotechnol 28: 827-838 (2010); S. Dudoit et al., J. Am. Stat. Assoc. 97, 77-87 (2002)). Several important points are derived from this large consortium effort. Most important, model prediction performance is largely (biological) endpoint dependent, probably the most critical finding supporting further development of gene signature technology. Further, internal validation performance from well-implemented, unbiased cross-validation shows a high degree of concordance with external validation performance. Nevertheless, external validation is a critical feature for signature development. Although formerly questioned by others (L. Ein-Dor et al., Bioinformatics 21: 171-178 (2005); L. Ein-Dor et al., Proc Natl Acad Sci U S A 103: 5923-5928 (2006)), the MAQC-II study also clearly established that many classifiers with similar performance can be developed from a given data set. Not surprisingly, proficiency of investigators and good modeling practice lead to improved results (L. Shi et al., Nat Biotechnol 28: 827-838 (2010)).
WO2010/143941 discloses the subclassifying of juvenile leukemia via molecular signatures to predict the development of the disease. This subclassifying, however, requires that a primary diagnosis of AML has been established beforehand.
Finally, WO2006/071088 discloses gene expression analysis by RNA hybridization and quantitation based on a small number of patients. The genes determined in this analysis include CITED2, MGST1, BIN1, RAB32, ICAM3, PXN, PPGB and TAF15.
Summary of the Invention
Increasing data support the notion that biological high throughput data will transform molecular diagnostics of many diseases including cancer and infection. Among the most advanced approaches are gene signatures based on gene expression profiling of diseased tissue or peripheral blood. However, despite the enormous number of studies performed, translation of gene signatures into clinical use has been very limited and continues to be tremendously difficult. Here, we introduce adaptive learning and simulation approaches to significantly improve and accelerate the development of gene signature-based diagnostic biomarkers, as exemplified for acute myeloid leukemia. In addition to current approaches determining optimal classifiers within a defined study setting (training and validation set), the overall study setting was permuted (n > 10,000), thereby simulating the performance range (sensitivities, specificities, AUC) of potential disease classifiers in other study settings. With these significant improvements we establish an exceedingly robust and clinically applicable gene signature for the diagnosis of acute myeloid leukemia.
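By way of illustration only, the permutation of the study setting may be sketched in Python as follows. The sketch assumes the re-sampling design described elsewhere in this document, in which the dataset is repeatedly divided at random into one training set (TS) and two validation sets (V1, V2) of equal size. The function names `split_trial` and `simulate_trials`, and the `evaluate` callback that stands in for building a classifier in TS and scoring it in a validation set, are hypothetical and not part of the disclosure.

```python
import random


def split_trial(labels, seed):
    """Randomly divide sample indices into TS, V1 and V2 of equal size.

    Cases (label 1) and controls (label 0) are shuffled separately so that
    each of the three sets receives the same number of cases and controls.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(cases)
    rng.shuffle(controls)
    sets = {"TS": [], "V1": [], "V2": []}
    for group in (cases, controls):
        third = len(group) // 3
        sets["TS"] += group[:third]
        sets["V1"] += group[third:2 * third]
        sets["V2"] += group[2 * third:3 * third]
    return sets


def simulate_trials(labels, evaluate, n_iter):
    """Repeat the random split n_iter times.

    evaluate(train_idx, valid_idx) is a user-supplied (hypothetical) function
    that builds a classifier on the training indices and returns a performance
    value, such as the AUC, on the validation indices.
    """
    results = []
    for iteration in range(n_iter):
        s = split_trial(labels, seed=iteration)
        results.append((evaluate(s["TS"], s["V1"]), evaluate(s["TS"], s["V2"])))
    return results
```

Collecting the pairs returned by `simulate_trials` over many iterations gives the range of performance values in V1 and V2 rather than a single estimate from one fixed split.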
These comprehensive findings strongly suggest that high throughput gene expression data should be quickly developed into diagnostic tests, which requires addressing several unresolved issues. First, to better judge the validity of small pilot trials, a two-step validation approach is developed that is combined with randomized permutation ("10,000 clinical trial simulation"). Second, to predict the minimum size of a consecutive pivotal validation trial, an algorithm is described which combines sample simulation and adaptive learning approaches ("on the fly optimization strategy"). This approach can also estimate overall best test performance. Further, evidence is provided that patients included in such a pivotal trial can already benefit from these adaptive learning algorithms. Utilizing these approaches, a high-performance test for primary molecular diagnosis of leukemia is established.
A typical patient history for a patient with AML is characterized by early episodes with fever, abnormal fatigue, signs of an infection or a cold. Such patients usually visit their private practitioner, and very often the early signs of leukemia are misclassified as viral or bacterial infections with leukocytosis. Only if symptoms persist for a prolonged time are patients sent to the hospital (see also Figure 7A). Particularly if such hospitals are not specialized in hematology, these patients have to be referred further to regional centers or university hospitals before they are correctly diagnosed with AML. At the centers, the primary diagnosis of AML is performed by an experienced hematologist using the patient's history, blood counts and light microscopy of cells derived from bone marrow aspirates. Based on the primary diagnosis of AML using these rather old technologies (light microscope, blood counts), a hematologist usually performs further tests for differential diagnosis, subclassification of the disease, prognosis of disease outcome, or therapy outcome prediction. These include flow cytometric analysis, cytogenetics, and PCR-based assays for genetic translocations. Previous inventions in the field of gene expression profiling (GEP) are exclusively targeted at improving differential diagnosis, subclassification, prognosis of disease outcome, and therapy outcome prediction, and rely on primary diagnosis by current diagnostic procedures (patient history, physical exam, blood counts and light microscopy of bone marrow cells) prior to the introduction of the inventions as new diagnostics. In other words, these tests are supposed to be used in addition to current methodology for differential diagnosis and subclassification and only add further value by introducing prognosis of disease outcome and therapy outcome prediction as endpoints. Figure 7A shows their relation to current standards in AML diagnostics. However, as described in WO2010/143941, the performance of GEP-based assays for these outcomes is currently still insufficient for a successful clinical application.
The present invention is directed to substituting the currently used approaches (patient history, physical exam, blood counts and light microscopy of bone marrow cells) for primary diagnosis with gene expression profiling (GEP) of peripheral blood (grey box in Figure 7B). Moreover, our test is targeted at an earlier time point during the diagnostic process and should be available to the private practitioner rather than the specialist, e.g. a hematologist or an oncologist. This is shown in Figure 7B. It is envisioned that a private practitioner can order such a GEP-based test for primary diagnosis of AML from a specialized laboratory at an early time point of the diagnostic process. The specialized laboratory will provide the private practitioner with a probability for the patient being diagnosed with AML. If the patient is diagnosed with AML, the private practitioner can directly refer the patient to a center for treatment. The center can then quickly perform further diagnostics for subclassification, therapy outcome prediction and prognosis (e.g. by other GEP-based algorithms) and can immediately start with therapy.
Of note, we do not know the specificity and sensitivity for a correct diagnosis of AML patients, neither for the clinically based tentative diagnosis of AML by private practitioners, nor for the tentative diagnosis by regional hospitals, nor for the primary diagnosis (using classical diagnostic procedures) in specialized centers. Estimations range from 60 to 95% for both specificity and sensitivity in the three scenarios. Therefore, the very high statistical performance of our GEP-based assay (> 99%) is considerably higher than that of current practice.
Another very important advantage of the present invention is the possibility to diagnose patients significantly earlier. Since our GEP-based assay can be performed from a small amount of blood in an unbiased fashion, the primary diagnosis does not require the expertise of specialized hematologists (in specialized centers). In other words, the private practitioner (with the help of a specialized laboratory) can be enabled to primarily diagnose AML in a rather short time frame.
The invention thus provides methods and kits for diagnosing, detecting, and screening for Acute Myeloid Leukemia (AML). Also provided is a method for preparing an RNA expression profile that is indicative of the presence or absence of AML in a subject. Further provided is the evaluation of the patient RNA expression profiles for the presence or absence of one or more RNA expression signatures that are indicative of AML. More concretely, the application provides a method for the detection of AML in a human subject based on RNA from a blood sample obtained from said subject, comprising:
measuring the abundance of at least 4 RNAs in the sample that are chosen from the RNAs listed in Table 2, and
concluding, based on the measured abundance, whether the subject has AML.
The above method is suitable as a primary test for AML, i.e. it does not require a preceding primary test by classical methods. The conclusion whether the patient has AML or not may comprise, in a preferred embodiment of the method, classifying the sample as being from a healthy individual or from an individual having AML based on the specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML (in a reference set). In the present method, a sample can be classified as being from a patient with AML or from a healthy individual without the necessity to run a reference sample of known origin (i.e. from an AML patient or a healthy individual) at the same time.
In a preferred embodiment the method of the invention is a method for the detection of AML in a human individual based on RNA obtained from a blood sample obtained from the individual, comprising :
determining the abundance of at least 4 RNAs in the sample that are chosen from the RNAs listed in Table 2, and
classifying the sample as being from a healthy individual or from an individual having AML based on the specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML.
Particularly preferred for this method is that the abundance of at least 6 RNAs, of at least 8 RNAs, of at least 10 RNAs, of at least 12 RNAs, or of at least 14 RNAs listed in Table 2 is determined. It is further preferred to determine the highest ranked RNAs of Table 2 in the method of the invention, i.e. the first 4, first 6, first 8, first 12 or first 14 RNAs of Table 2.
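For illustration, the classification step may be sketched with a 3-nearest neighbor (3-NN) rule, one of the classification methods named in this document. The following Python sketch is a minimal example under stated assumptions: the function name `knn_classify` is hypothetical, the abundance values are invented toy numbers for 4 unnamed RNAs, and the Euclidean distance is an assumed choice of metric.

```python
import math
from collections import Counter


def knn_classify(train_profiles, train_labels, sample, k=3):
    """Assign the majority label of the k reference profiles closest to sample.

    Each profile is a list of abundance values for the same RNAs in the same
    order. Distances are Euclidean (an assumption made for this sketch).
    """
    distances = sorted(
        (math.dist(profile, sample), label)
        for profile, label in zip(train_profiles, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy reference set: invented abundance values for 4 RNAs per sample.
reference_profiles = [
    [8.1, 2.0, 5.5, 9.0],
    [7.9, 2.2, 5.7, 8.8],
    [8.3, 1.9, 5.4, 9.1],
    [3.0, 6.5, 9.2, 4.1],
    [3.2, 6.3, 9.0, 4.3],
    [2.9, 6.6, 9.4, 4.0],
]
reference_labels = ["AML", "AML", "AML", "healthy", "healthy", "healthy"]

prediction = knn_classify(reference_profiles, reference_labels, [8.0, 2.1, 5.6, 8.9])
```

Because the reference profiles are stored with the classifier, a new sample can be classified without running a reference sample of known origin at the same time.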
In one aspect, the invention provides a method for preparing RNA expression profiles that are indicative of the presence or absence of AML. The RNA expression profiles are prepared from patient blood samples. The number of transcripts in the RNA expression profile may be selected so as to offer a convenient and cost effective means for screening samples for the presence or absence of AML with high sensitivity and high specificity. Generally, the RNA expression profile includes the expression level or "abundance" of from 4 to about 3000 transcripts. In certain embodiments, the expression profile includes the RNA levels of 2500 transcripts or less, 2000 transcripts or less, 1500 transcripts or less, 1000 transcripts or less, 500 transcripts or less, 250 transcripts or less, 100 transcripts or less, or 50 transcripts or less. In such embodiments, the profile may contain the abundance or expression level of at least 4 RNAs that are indicative of the presence or absence of AML, and specifically, as selected from Table 2, or may contain the expression level of at least 6, at least 8, at least 12 or at least 14 RNAs selected from Table 2. Where larger profiles are desired, the profile may contain the expression level or abundance of at least about 60, at least 100, at least 150, or 200 RNAs that are indicative of the presence or absence of AML, and such RNAs may be selected from Table 2. Combinations of genes and/or transcripts that make up or are included in expression profiles are available from Examples 1 to 15 shown in Tables 3, 4 and 5.
Such RNA expression profiles in accordance with this aspect may be evaluated for the presence or absence of an RNA expression signature indicative of AML. Generally, the sequential addition of transcripts from Table 2 to the expression profile provides for higher sensitivity and/or specificity for the detection of AML. For example, the area under the ROC curve (AUC) may be at least 0.8, or at least 0.82, or at least 0.85 or at least 0.9. The AUC is a quantitative parameter for the clinical utility (specificity and sensitivity) of the detection method described herein. An AUC of 1.0 corresponds to a sensitivity and specificity of 100 %.
Alternatively, the median Matthews Correlation Coefficient (MCC) may be at least 0.8, or at least 0.82, or at least 0.85 or at least 0.9. Again, the MCC is a quantitative parameter for the clinical utility (specificity and sensitivity) of the detection method described herein. An MCC of 1.0 corresponds to a sensitivity and specificity of 100 %.
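As an illustration of the two readouts named above, the following minimal Python sketch computes an AUC from classifier scores and an MCC from binary predictions. The function names, the scores and the decision threshold of 0.45 are hypothetical and are not taken from the examples of the invention; the AUC is computed with the standard pairwise (rank-sum) identity.

```python
from math import sqrt

def auc(labels, scores):
    """Area under the ROC curve via the pairwise (rank-sum) identity.

    labels: 1 for AML case, 0 for control; scores: higher means more AML-like.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs in which the case scores higher; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mcc(labels, predictions):
    """Matthews correlation coefficient from binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when a row or column of the table is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical scores for three cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
predictions = [1 if s >= 0.45 else 0 for s in scores]  # assumed threshold
print(auc(labels, scores), mcc(labels, predictions))
```

The AUC does not depend on a decision threshold, whereas the MCC is computed for one particular threshold, which is why the two readouts can differ for the same classifier.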
In a second aspect, the invention provides a method for detecting, diagnosing, or screening for AML. In this aspect, the method comprises preparing an RNA expression profile by measuring the abundance of at least 4, at least 6, at least 8, at least 12, or at least 14 RNAs in a patient blood sample, where the abundance of such RNAs is indicative of the presence or absence of AML. The RNAs may be selected from the RNAs listed in Table 2. The method further comprises evaluating the profile for the presence or absence of an RNA expression signature indicative of AML, to thereby conclude whether the patient has or does not have AML. The method generally provides a sensitivity for the detection of AML of at least about 70 %, while providing a specificity of at least about 70 %.
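The sensitivity and specificity figures quoted above can be derived from the four counts of a two-class confusion table. The following minimal Python sketch shows the arithmetic, together with the correct classification rate and the predictive values that are defined later in this description. The function names and the example counts are hypothetical and are not results of the invention.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def diagnostic_metrics(tp, fp, tn, fn):
    """Compute test performance measures from confusion-table counts.

    tp/fn: AML cases classified as AML / as non-AML;
    tn/fp: controls classified as non-AML / as AML.
    """
    return {
        "sensitivity": ratio(tp, tp + fn),         # true positive fraction
        "specificity": ratio(tn, tn + fp),         # true negative fraction
        "ccr": ratio(tp + tn, tp + fp + tn + fn),  # correct classification rate
        "ppv": ratio(tp, tp + fp),                 # positive predictive value
        "npv": ratio(tn, tn + fn),                 # negative predictive value
    }

# Hypothetical counts for 50 cases and 50 controls.
metrics = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)
meets_threshold = metrics["sensitivity"] >= 0.70 and metrics["specificity"] >= 0.70
print(metrics, meets_threshold)
```

Unlike sensitivity and specificity, the predictive values depend on the proportion of cases in the tested population, so they should be read only in the context of the cohort from which the counts were taken.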
In various embodiments, the method comprises determining the abundance of at least 4 RNAs, at least 60 RNAs, at least 100 RNAs, at least 200, or of at least 500 RNAs chosen from the RNAs listed in Table 2, and classifying the sample as being indicative of AML, or not being indicative of AML.
In other aspects, the invention provides kits and custom arrays for preparing the gene expression profiles, and for determining the presence or absence of AML.
Short Description of the Figures
Figure 1: Development of a classifier to molecularly diagnose AML. (A) Schema of approach to define classifiers. A dataset was compiled out of a total of 17 individual studies containing a total of 2013 samples. Randomly, a total of 150 samples was drawn. Within these 150 samples, 75 cases of AML were included; the other samples were called non-AML or control samples. The 150 samples were evenly split into three independent cohorts of 50 samples, called training set (TS), validation set 1 (V1) and validation set 2 (V2). The classifier was built in TS and applied to V1 and V2. Classifier performance is only shown for external validation in V1 and V2. Classifiers were built according to a combined approach of 1) feature selection, 2) application of a classifier algorithm, and 3) 10-fold cross-validation (internal validation). Influence of 1) feature size, 2) classification algorithm, 3) ratio of cases and controls in the TS, and 4) the size of the TS were assessed by varying the respective parameters. As readout we assessed AUC, MCC, specificity and sensitivity. The process of generating three independent sets of data out of the 150 samples was performed 10,000 times and termed 'trial simulation approach' (TSA). (B) For V1 (black dots) and V2 (red dots), the performance of the 10,000 classifiers generated in this simulation approach is shown using a defined feature selection (transcripts defined by a combined fold change and p-value filter) combined with an SVM-based classification algorithm. It can be clearly seen that, depending on the distribution of the samples into TS, V1 and V2, the performance read out as AUC can vary widely between 0.3 and 1. On the right panel the data are shown in an integrated format as a boxplot (mean, 25 to 75 percentiles, standard deviation and outliers). (C) The influence of different classification algorithms on classifier performance is shown.
For each condition, 10,000 individual classifiers in 10,000 independent trial simulation approaches are shown for V1 and V2 independently. On the left, SVM with linear or radial kernel was used in combination with either t-test (t) or Wilcoxon-test (wilc). On the right, PAM or LDA in combination with t-test or Wilcoxon-test were used. (D) The influence of feature size on classifier performance was assessed again using 10,000 independent TSA settings. Filters based on differentially expressed genes (combined fold change, p-value filter, here abbreviated by FC only) were used. The larger the FC value, the smaller the number of transcripts per classifier. Shown is again classifier performance in V1 and V2 independently. (E) The influence of sample distribution in TS was assessed by decreasing the number of cases from 25 out of 50 in TS to 5 out of 30 in TS. Shown is again classifier performance in V1 and V2 independently. (F) The influence of sample size in TS was assessed by decreasing both the number of cases and controls from each 25 to each 5 in TS. Shown is again classifier performance in V1 and V2 independently.
Figure 2: Development of a classifier to molecularly diagnose AML using a compiled dataset of 2013 samples. (A) Schema of approach to define classifiers. A dataset was compiled out of a total of 17 individual studies containing a total of 2013 samples. The 2013 samples were evenly split into three independent cohorts of 671 samples, called training set (TS), validation set 1 (V1) and validation set 2 (V2). The classifier was built in TS and applied to V1 and V2. Classifier performance is only shown for external validation in V1 and V2. Classifiers were built according to a combined approach of 1) feature selection, 2) application of a classifier algorithm, and 3) 10-fold cross-validation (internal validation) (as defined in Figure 1). Influence of 1) feature size, 2) classification algorithm, 3) ratio of cases and controls in the TS, and 4) the size of the TS were assessed by varying the respective parameters. As readout we assessed AUC, MCC, specificity and sensitivity. The process of generating three independent sets of data out of the 2013 samples was performed 10,000 times (TSA). (B) For V1 (black dots) and V2 (red dots), the performance of the 10,000 classifiers generated in this simulation approach and the influence of feature size are shown. It can be clearly seen that the classifiers generated with a larger training set perform significantly better than those presented in Figure 1A. Depending on the distribution of the samples into TS, V1 and V2, the performance read out as AUC varied in a small range of 0.96 to 1. Data are shown in boxplots (mean, 25 to 75 percentiles, standard deviation and outliers). (C) Classifier performance (here AUC) was plotted against the frequency of classifiers reaching a certain AUC level. In red, performance within the small cohort of 150 samples described in Figure 1 is shown; in black, the performance of classifiers in the complete dataset (n = 2013) is shown. (D) Instead of AUC, MCC is shown.
Figure 3: Correlation of feature size and feature selection with classifier performance. (A) The number of features in each individual classifier of a total of 10,000 classifiers is plotted against the AUC of the respective classifier. For each level of filter settings (a total of five different levels of filter settings) the data are plotted separately. On the top panel, the data obtained in the small dataset (150 samples) are shown; on the lower panel, the data obtained in the complete dataset (2013 samples) are shown. It can be clearly seen that the variation in AUC is reduced with higher feature sizes in the small cohort, but this effect is no longer apparent in the complete dataset. (B) For each transcript interrogated on the array, its participation in any of the 60,000 classifiers (6 levels of filtering) was calculated and ranked. If a transcript was part of at least 1 classifier, its participation frequency was plotted. In B the results of the small dataset of 150 samples are shown. (C) Similar analysis, but this time for the complete dataset.
Figure 4: Corresponding to Figure 1, instead of the AUC, the MCC results in the small sample cohort (n = 150) are shown. (A) The influence of feature size on classifier performance was assessed using 10,000 independent TSA settings for each filter setting. 5 different filter settings based on differentially expressed genes (combined fold change, p-value filter, here abbreviated by FC only) were used. The larger the FC value, the smaller the number of transcripts per classifier. Shown is classifier performance in V1 and V2 independently. (B) The influence of sample distribution in TS was assessed by decreasing the number of cases from 25 out of 50 in TS to 5 out of 30 in TS. Shown is again classifier performance in V1 and V2 independently. (C) The influence of sample size in TS was assessed by decreasing both the number of cases and controls from each 25 to each 5 in TS. Shown is again classifier performance in V1 and V2 independently.
Figure 5: Corresponding to Figure 2, instead of AUC, the MCC is shown for the complete dataset. (A) For V1 (black dots) and V2 (red dots), the performance of the 10,000 classifiers generated in this simulation approach and the influence of feature size are shown. It can be clearly seen that the classifiers generated with a larger training set perform significantly better than those presented in Figure 1A. Data are shown in boxplots (mean, 25 to 75 percentiles, standard deviation and outliers). (B) The influence of different classification algorithms on classifier performance is shown. For each condition, 10,000 individual classifiers in 10,000 independent trial simulation approaches are shown for V1 and V2 independently. SVM with linear or radial kernel was used in combination with either t-test (t) or Wilcoxon-test (wilc). (C) PAM or LDA in combination with t-test or Wilcoxon-test were used.
Figure 6: Correlation of feature size and feature selection with classifier performance. Shown here is the MCC as readout. (A) The number of features in each individual classifier of a total of 10,000 classifiers is plotted against the MCC of the respective classifier. For each level of filter settings (a total of five different levels of filter settings) the data are plotted separately. On the top panel, the data obtained in the small dataset (150 samples) are shown; the data obtained in the complete dataset (2013 samples) are shown on the lower panel. It can be clearly seen that the variation in MCC is reduced with higher feature sizes in the small cohort, but this effect is no longer apparent in the complete dataset.
Figure 7: (A) Current time line for diagnosis of patients with AML. Use of classical diagnostic approaches. Gene expression profiling of bone marrow is targeted at further diagnostics (including differential diagnosis, subclassification, therapy outcome prediction and prognosis) as an additional part of standard diagnostics (add on technology). (B) Scenario envisioning the use of the GEP-based technology for primary diagnosis in a setting where the private practitioner can cooperate with a specialized laboratory applying our GEP-based test to blood from patients with a tentative diagnosis of AML. The technology would substitute for previous technology (blood counts, light microscopy of bone marrow cells) = substitution technology. Time axes are estimates. Since the scenarios for further diagnostics by specialized hematologists (as the focus of previous inventions) and our invention targeting primary diagnosis by private practitioners are mutually exclusive, cited prior inventions cannot be seen as prior art in the field of primary diagnosis of AML. However, they can be seen as complementary.
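The 'trial simulation approach' described for Figures 1 and 2 repeatedly redistributes the samples into a training set and two validation sets of equal size. The following minimal Python sketch illustrates only this resampling loop. The function names, the seed and the `evaluate` callback (which would stand for feature selection, classifier training on TS, and scoring on V1 and V2) are hypothetical placeholders and do not reproduce the inventors' implementation.

```python
import random

def split_three(sample_ids, rng):
    """Shuffle the samples and split them into TS, V1 and V2 of equal size."""
    ids = list(sample_ids)
    rng.shuffle(ids)
    k = len(ids) // 3
    return ids[:k], ids[k:2 * k], ids[2 * k:3 * k]

def trial_simulation(sample_ids, evaluate, n_trials=10, seed=0):
    """Repeat the random three-way split and collect one result per trial.

    evaluate(ts, v1, v2) is an assumed callback returning any readout,
    for example the AUC or MCC obtained in V1 and V2.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        ts, v1, v2 = split_three(sample_ids, rng)
        results.append(evaluate(ts, v1, v2))
    return results

# With 150 samples each cohort holds 50 samples; the figures describe 10,000 trials.
sizes = trial_simulation(range(150), lambda ts, v1, v2: (len(ts), len(v1), len(v2)), n_trials=3)
print(sizes)
```

Collecting the readout over many random splits is what makes the spread of classifier performance visible, rather than a single value that depends on one particular distribution of samples.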
Detailed Description of the invention
The invention provides methods and kits for screening, diagnosing, and detecting AML in human patients (subjects). A synonym for a patient with AML is "AML-case" or simply "case."
As disclosed herein, the present invention provides methods and kits for screening patient samples for those that are positive for AML, e.g., in the absence of surgery or any other diagnostic procedure.
The invention relates to the determination of the abundance of RNAs to detect AML in a human subject, wherein the determination of the abundance is based on RNA obtained (or isolated) from whole blood of the subject or from blood cells of the subject.
In various aspects, the invention involves preparing an RNA expression profile from a patient sample. The method may comprise isolating RNA from whole blood, and detecting the abundance or relative abundance of selected transcripts. The "RNAs" may be defined by reference to an expressed gene, or by reference to a transcript, or by reference to a particular oligonucleotide probe for detecting the RNA (or cDNA derived therefrom), each of which is listed in Table 2 for 680 RNAs that are indicative of the presence or absence of AML.
The number of transcripts in the RNA expression profile may be selected so as to offer a convenient and cost effective means for screening samples for the presence or absence of AML with high sensitivity and high specificity. For example, the RNA expression profile may include the expression level or "abundance" of from 4 to about 3000 transcripts. In certain embodiments, the expression profile includes the RNA levels of 2500 transcripts or less, 2000 transcripts or less, 1500 transcripts or less, 1000 transcripts or less, 500 transcripts or less, 250 transcripts or less, 200 transcripts or less, 100 transcripts or less, or 50 transcripts or less. Such profiles may be prepared, for example, using custom microarrays or multiplex gene expression assays as described in detail herein.
Such RNA expression profiles in accordance with this aspect may be evaluated for the presence or absence of an RNA expression signature indicative of AML. Generally, the sequential addition of transcripts from Table 2 to the expression profile provides for higher sensitivity and/or specificity for the detection of AML, as indicated by the AUC. A clinical utility is reached if the AUC is at least 0.8.
The inventors have surprisingly found that an AUC of 0.8 is reached if and only if at least 4 RNAs are measured that are chosen from the RNAs listed in Table 2. In other words, measuring 4 RNAs is necessary and sufficient for the detection of AML in a human subject based on RNA from a blood sample obtained from said subject by measuring the abundance of at least 4 RNAs in the sample, that are chosen from the RNAs listed in Table 2, and concluding based on the measured abundance whether the subject has AML or not. An analysis of 1, 2 or 3 RNAs chosen from the RNAs listed in Table 2, however, does not allow for this detection.
For example, the area under the ROC curve (AUC) may be at least 0.8, or at least 0.82, or at least 0.85 or at least 0.9. The AUC is a quantitative parameter for the clinical utility (specificity and sensitivity) of the detection method described herein. An AUC of 1.0 corresponds to a sensitivity and specificity of 100 %. In such embodiments, the profile may contain the expression level of at least 4 RNAs that are indicative of the presence or absence of AML, and specifically, as selected from Table 2, or may contain the expression level of at least 6, 8, 10, 12 or 14 RNAs selected from Table 2. Where larger profiles are desired, the profile may contain the expression level or abundance of at least 60, 100, 200, 500, or 680 RNAs that are indicative of the presence or absence of AML, and such RNAs may be (at least in part) selected from Table 2. Such RNAs may be defined by gene, or by transcript ID, or by probe ID.
The identities of genes and/or transcripts that make up, or are included in exemplary expression profiles are disclosed in Table 2. As shown herein, profiles selected from the RNAs of Table 2 support the detection of AML with high sensitivity and high specificity.
Thus, in various embodiments, the abundance of at least 4, at least 6, at least 8, at least 10, at least 12, at least 14, at least 60, at least 100, at least 200, or at least 500 distinct RNAs is measured, in order to arrive at a reliable diagnosis of AML. The set of RNAs may comprise, consist essentially of, or consist of, a set or subset of RNAs exemplified in Table 2. The term "consists essentially of" in this context allows for the expression level of additional transcripts to be determined that are not differentially expressed in AML subjects, and which may therefore be used as positive or negative expression level controls or for normalization of expression levels between samples.
Such RNA expression profiles may be evaluated for the presence or absence of an RNA expression signature indicative of AML. Generally, the sequential addition of transcripts from Table 2 to the expression profile provides for higher sensitivity and/or specificity and stability (i.e. independence from the sample analyzed) for the detection of AML. For example, the sensitivity and specificity of the methods provided herein may be equivalent to an area under the ROC curve (AUC) of at least 0.8, or at least 0.82, at least 0.85, or at least 0.9.
The present invention provides an in-vitro diagnostic test system (IVD) that is trained (as described further below) for the detection of AML. For example, in order to determine whether a patient has AML, reference RNA abundance values for AML positive and negative samples are determined. The RNAs can be quantitatively measured on an adequate set of training samples comprising cases and controls, and with adequate clinical information on leukemia status, applying adequate quality control measures, and on an adequate set of test samples, for which the detection is yet to be made. With such quantitative values for the RNAs and the clinical data for the training samples, a classifier can be trained and applied to the test samples to calculate the probability of the presence or non-presence of AML. Therefore, in one embodiment of the present method, a sample can be classified as being from a patient with AML or from a healthy individual without the necessity to run a reference sample of known origin (i.e. from an AML patient or a healthy individual) at the same time.
Various classification schemes are known for classifying samples between two or more classes or groups, and these include, without limitation: Naive Bayes, Support Vector Machines, Nearest Neighbors, Decision Trees, Logistic Regression, Artificial Neural Networks, and Rule-based schemes. In addition, the predictions from multiple models can be combined to generate an overall prediction. Thus, a classification algorithm or "class predictor" may be constructed to classify samples. The process for preparing a suitable class predictor is reviewed in R. Simon, Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data, British Journal of Cancer (2003) 89, 1599-1604, which review is hereby incorporated by reference.
In this context, the invention teaches an in-vitro diagnostic test system (IVD) that is trained in the detection of AML as referred to above, comprising at least 4 RNAs, which can be quantitatively measured on an adequate set of training samples comprising cases and controls, with adequate clinical information on leukemia status, applying adequate quality control measures, and on an adequate set of test samples, for which the detection is yet to be made. Given the quantitative values for the RNAs and the clinical data for the training samples, a classifier can be trained and applied to the test samples to calculate the probability of the presence or absence of AML.
The present invention provides methods for detecting, diagnosing, or screening for AML in a human subject with a high sensitivity and specificity. Specifically, the sensitivity of the methods provided herein is equivalent to an area under the ROC curve (AUC) of at least 0.8, or at least 0.82, at least 0.85, or at least 0.9.
Without wishing to be bound by any particular theory, the above finding may be due to the fact that an organism such as a human systemically reacts to the development of an AML by altering the expression levels of genes in different pathways. Although the change in expression (abundance) might be small for each gene in a particular signature, measuring a set of at least 4 genes, preferably even larger numbers such as 6, 8, 10, 12, 14, 100, 200, 500 or even more RNAs, for example at least 5, at least 8, at least 120, at least 160 RNAs at the same time allows for the detection of AML in a human with high sensitivity and high specificity.
In this context, an RNA obtained from a subject's blood sample, i.e. an RNA biomarker, is an RNA molecule with a particular base sequence whose presence within a blood sample from a human subject can be quantitatively measured. The measurement can be based on a part of the RNA molecule, namely a part of the RNA molecule that has a certain base sequence, which allows for its detection and thereby allows for the measurement of its abundance in a sample. The measurement can be by methods known in the art, for example analysis on a solid phase device, or in solution (for example, by RT-PCR). Probes for the particular RNAs can either be bought commercially, or designed based on the respective RNA sequence.
In the method of the invention, the abundance of several RNA molecules (e.g. mRNA or pre-spliced RNA, intron-lariat RNA, micro RNA, small nuclear RNA, or fragments thereof) is determined in a relative or an absolute manner, wherein an absolute measurement of RNA abundance is preferred. The RNA abundance is, if applicable, compared with that of other individuals, or with multivariate quantitative thresholds.
The determination of the abundance of the RNAs described herein is performed from blood samples using quantitative methods. In particular, RNA is isolated from a blood sample obtained from a human subject that is to undergo AML testing, i.e., for example, a smoker or a person with high fever and weakness. Although the examples described herein use microarray-based methods, the invention is not limited thereto. For example, RNA abundance can be measured by in situ hybridization, amplification assays such as the polymerase chain reaction (PCR), sequencing, or microarray-based methods. Other methods that can be used include polymerase-based assays, such as RT-PCR (e.g., TAQMAN), hybridization-based assays, such as DNA microarray analysis, as well as direct mRNA capture with branched DNA (QUANTIGENE) or HYBRID CAPTURE (DIGENE).
In certain embodiments, the invention employs a microarray. A "microarray" includes a specific set of probes, such as oligonucleotides and/or cDNAs (e.g., expressed sequence tags, "ESTs") corresponding in whole or in part, and/or continuously or discontinuously, to regions of RNAs that can be extracted from a blood sample of a human subject. The probes are bound to a solid support. The support may be selected from beads (magnetic, paramagnetic, etc.), glass slides, and silicon wafers. The probes can correspond in sequence to the RNAs of the invention such that hybridization between the RNA from the subject sample (or cDNA derived therefrom) and the probe occurs. In the microarray embodiments, the sample RNA can optionally be amplified before hybridization to the microarray. Prior to hybridization, the sample RNA is fluorescently labeled. Upon hybridization to the array and excitation at the appropriate wavelength, fluorescence emission is quantified. Fluorescence emission for each particular RNA is directly correlated with the amount of the particular RNA in the sample. The signal can be detected and together with its location on the support can be used to determine which probe hybridized with RNA from the subject's blood sample.
Accordingly, in certain aspects, the invention is directed to a kit or microarray for detecting the level of expression or abundance of RNAs in the subject's blood sample, where this "profile" allows for the conclusion of whether the subject has AML or not (at a level of accuracy described herein). In another aspect, the invention relates to a probe set that allows for the detection of the RNAs associated with AML. If these particular RNAs are present in a sample, they (or corresponding cDNA) will hybridize with their respective probe (i.e., a complementary nucleic acid sequence), which will yield a detectable signal. Probes are designed to minimize cross reactivity and false positives.
Thus, the invention in certain aspects provides a microarray, which generally comprises a solid support and a set of oligonucleotide probes. The set of probes generally contains from 4 to about 3,000 probes, including at least 4 probes deduced from Table 2. In certain embodiments, the set contains 2000 probes or less, or 1000 probes or less, 500 probes or less, 200 probes or less, or 100 probes or less.
The conclusion whether the subject has AML or not is preferably reached on the basis of a classification algorithm, which can be developed using e.g. a random forest method, a support vector machine (SVM), a K-nearest neighbor method (K-NN), such as a 3-nearest neighbor method (3-NN), a linear discrimination analysis (LDA), or a prediction analysis for microarrays (PAM), as known in the art.
Preferably, F-statistics (ANOVA) are used to identify the specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML.
"Sensitivity" (S+ or true positive fraction (TPF)) refers to the count of positive test results among all true positive disease states divided by the count of all true positive disease states.
"Specificity" (S- or true negative fraction (TNF)) refers to the count of negative test results among all true negative disease states divided by the count of all true negative disease states. "Correct Classification Rate" (CCR or true fraction (TF)) refers to the sum of the count of positive test results among all true positive disease states and the count of negative test results among all true negative disease states divided by the sum of all cases. The measures S+, S-, and CCR address the question: To what degree does the test reflect the true disease state?
"Positive Predictive Value" (PV+ or PPV) refers to the count of true positive disease states among all positive test results divided by the count of all positive test results.
"Negative Predictive Value" (PV- or NPV) refers to the count of true negative disease states among all negative test results divided by the count of all negative test results. The predictive values address the question: How likely is the disease given the test results?
The preferred RNA molecules that can be used in combinations described herein for diagnosing and detecting AML in a subject according to the invention can be found in Table 2. The inventors have shown that the selection of at least 4 or more RNAs of the markers listed in Table 2 can be used to diagnose or detect AML in a subject using a blood sample from that subject. The RNA molecules that can be used for detecting, screening and diagnosing AML are selected from the RNAs provided in Table 2.
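The F-statistic mentioned above for identifying differences in abundance between healthy individuals and individuals with AML can be used to rank candidate RNAs before a subset is selected. The following minimal Python sketch computes a one-way ANOVA F-statistic for two groups and sorts RNAs by it. The function names, the RNA identifiers and the abundance values are hypothetical; the ranking in Table 2 is not reproduced here, and no p-value or fold-change filter is applied.

```python
def f_statistic(group_a, group_b):
    """One-way ANOVA F-statistic for two groups of abundance values."""
    groups = [list(group_a), list(group_b)]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)

def rank_rnas(healthy, aml):
    """Return RNA identifiers sorted by decreasing F-statistic.

    healthy, aml: dicts mapping an RNA identifier to a list of abundances.
    """
    scores = {rna: f_statistic(healthy[rna], aml[rna]) for rna in healthy}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical abundances for two RNAs in three healthy and three AML samples.
healthy = {"rna_a": [1.0, 2.0, 3.0], "rna_b": [1.0, 2.0, 3.0]}
aml = {"rna_a": [4.0, 5.0, 6.0], "rna_b": [2.0, 3.0, 4.0]}
print(rank_rnas(healthy, aml))
```

For two groups the F-statistic equals the square of the pooled two-sample t-statistic, so this ranking matches a ranking by the absolute t-value.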
Specifically, the method of the invention comprises at least the following steps: measuring the abundance of at least 4 RNAs (preferably 9 RNAs or 10 RNAs) in the sample, that are chosen from the RNAs listed in Table 2, and concluding, based on the measured abundance, whether the subject has AML or not. Measuring the abundance of RNAs may comprise isolating RNA from blood samples as described, and hybridizing the RNA or cDNA prepared therefrom to a microarray. Alternatively, other methods for determining RNA levels may be employed.
Examples for sets of 4 or more RNAs that are measured together, i.e. sequentially or preferably simultaneously, are shown in Examples 1 to 15 of Tables 3, 4 and 5. The sets of at least 4 RNAs of Tables 3 and 4 are defined by a common threshold of AUC>0.8.
In a preferred embodiment of the invention as mentioned herein, the abundance of at least 4 RNAs (preferably 6, 8, 10, or 12 RNAs) in the sample is measured, wherein the at least 4 RNAs are chosen from the RNAs listed in Table 2. Examples for sets of 4 RNAs that can be measured together, i.e. sequentially or preferably simultaneously, to detect AML in a human subject are shown in Table 2.
An example for a set of 680 RNAs of which the abundance can be measured in the method of the invention is listed in Table 2.
The wording "at least a number of RNAs" refers to a minimum number of RNAs that are measured. It is possible to use up to 10,000 or 20,000 genes in the invention, a fraction of which can be RNAs listed in Table 2. In preferred embodiments of the invention, abundance of up to 5,000, 2,500, 2,000, 1,000, 500, 250, 100, 80, 70, 60, 50, 40, 30, 20, 10, 5, 4, 3, 2, or 1 RNA of randomly chosen RNAs that are not listed in Table 2 is measured in addition to RNAs of Table 2 (or subsets thereof).
In a preferred embodiment, only RNAs that are mentioned in Table 2 are measured.
The expression profile or abundance of RNA markers for AML, for example the at least 4 RNAs described above (or more RNAs as disclosed above and herein), is determined preferably by measuring the quantity of the transcribed RNA of the marker gene. This quantity of the mRNA of the marker gene can be determined for example through chip technology (microarray), (RT-) PCR (for example also on fixated material), Northern hybridization, dot-blotting, sequencing, or in situ hybridization.
The microarray technology, which is most preferred, allows for the simultaneous measurement of RNA abundance of up to many thousand RNAs and is therefore an important tool for determining differential expression (or differences in RNA abundance), in particular between two biological samples or groups of biological samples. In order to apply the microarray technology, the RNAs of the sample need to be amplified and labeled and the hybridization and detection procedure can be performed as known to a person of skill in the art.
As will be understood by those of ordinary skill in the art, the analysis can also be performed through single reverse transcriptase-PCR, competitive PCR, real time PCR, differential display RT-PCR, Northern blot analysis, sequencing, and other related methods. In general, the larger the number of markers is that are to be measured, the more preferred is the use of the microarray technology. However, multiplex PCR, for example, real time multiplex PCR is known in the art and is amenable for use with the present invention, in order to detect the presence of 2 or more genes or RNAs simultaneously.
The RNA whose abundance is measured in the method of the invention can be mRNA, cDNA, unspliced RNA, or its fragments. Measurements can be performed using the complementary DNA (cDNA) or complementary RNA (cRNA), which is produced on the basis of the RNA to be analyzed, e.g. using microarrays. A great number of different arrays as well as their manufacture are known to a person of skill in the art and are described for example in the U.S. Patent Nos. 5,445,934; 5,532,128; 5,556,752; 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,472,672; 5,527,681; 5,529,756; 5,545,331; 5,554,501; 5,561,071; 5,571,639; 5,593,839; 5,599,695; 5,624,711; 5,658,734; and 5,700,637.
Preferably the decision whether the subject has AML comprises the step of training a classification algorithm on an adequate training set of cases and controls and applying it to RNA abundance data that was experimentally determined based on the blood sample from the human subject to be diagnosed.
The classification method can be a random forest method, a support vector machine (SVM), or a K-nearest neighbor method (K-NN), such as 3-NN.
For the development of a model that allows for the classification for a given set of biomarkers, such as RNAs, methods generally known to a person of skill in the art are sufficient, i.e., new algorithms need not be developed.
The major steps of such a model are:
1] condensation of the raw measurement data (for example combining probes of a microarray to probe set data, and/or normalizing measurement data against common controls);
2] training and applying a classifier (i.e., a mathematical model that generalizes properties of the different classes (leukemia vs. healthy individual) from the training data and applies them to the test data), resulting in a classification for each test sample.
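The two steps above can be illustrated with a minimal sketch. It assumes a strongly simplified condensation step (log2 transformation and median centering of each sample, which is not FARMS or RMA) and uses the 3-NN classifier named above. The function names and the toy abundance values are hypothetical and are not taken from Table 2.

```python
import math
from collections import Counter

def log2_median_center(sample):
    """Simplified condensation: log2-transform and median-center one sample."""
    logged = [math.log2(v) for v in sample]
    ordered = sorted(logged)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return [v - median for v in logged]

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy abundances for three RNAs (illustration only).
raw_train = [
    [10, 400, 300], [12, 380, 350], [8, 420, 280],    # AML-like profiles
    [200, 15, 300], [190, 20, 320], [210, 12, 310],   # control-like profiles
]
labels = ["AML"] * 3 + ["control"] * 3

train = [log2_median_center(s) for s in raw_train]
prediction = knn_predict(train, labels, log2_median_center([11, 390, 310]))
print(prediction)
```

In practice the training data would be the condensed probe set values of the selected RNAs, and the classifier could equally be an SVM or a random forest.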
For example, the raw data from microarray hybridizations can first be condensed with FARMS as shown by Hochreiter et al., Bioinformatics 22(8): 943-9 (2006). Alternative methods for condensation such as Robust Multi-Array Analysis (RMA, GC-RMA, see Irizarry et al., Biostatistics 4, 249-264 (2003)) can be used. Similar to condensation, classification of the test data set through a support vector machine or other classification algorithms is known to a person of skill in the art, like for example classification and regression trees, penalized logistic regression, sparse linear discriminant analysis, Fisher linear discriminant analysis, K-nearest neighbors, shrunken centroids, and artificial neural networks (see V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, New York, NY, USA, 1995; Bernhard Schölkopf, Alex Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, 2002; S. Kotsiantis, Informatica J. 31: 249-268 (2007)).
The key component of these classifier training and classification techniques is the choice of RNA biomarkers that are used as input to the classification algorithm. In a further aspect, the invention refers to the use of a method as described above and herein for the detection of AML in a human subject, based on RNA from a blood sample.
In a further aspect, the invention also refers to the use of a microarray for the detection of AML in a human subject based on RNA from a blood sample. According to the invention, such a use can comprise measuring the abundance of at least 4 RNAs (or more, as described above and herein) that are listed in Table 2. Accordingly, the microarray comprises at least 4 probes for measuring the abundance of the at least 4 RNAs. Commercially available microarrays, such as from Illumina or Affymetrix, may be used.
In another embodiment, the abundance of the at least 4 RNAs is measured by multiplex RT-PCR. In a further embodiment, the RT-PCR includes real time detection, e.g., with fluorescent probes such as Molecular beacons or TaqMan® probes.
In a preferred embodiment, the microarray comprises probes for measuring only RNAs that are listed in Table 2 (or subsets thereof).
In yet a further aspect, the invention also refers to a kit for the detection of AML in a human subject based on RNA obtained from a blood sample. Such a kit comprises a means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2. The means for measuring expression can be probes that allow for the detection of RNA in the sample or primers that allow for the amplification of RNA in the sample. Ways to devise probes and primers for such a kit are known to a person of skill in the art.
Further, the invention refers to the use of a kit as described above and herein for the detection of AML in a human subject based on RNA from a blood sample, the kit comprising means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2. Such a use may comprise the following steps: contacting at least one component of the kit with RNA from a blood sample from a human subject, measuring the abundance of at least 4 RNAs (or more as described above and herein) that are chosen from the RNAs listed in Table 2 using the means for measuring the abundance of at least 4 RNAs, and concluding, based on the measured abundance, whether the subject has AML.
In yet a further aspect, the invention also refers to a method for preparing an RNA expression profile that is indicative of the presence or absence of AML, comprising: isolating RNA from a whole blood sample, and determining the level or abundance of from 4 to about 3000 RNAs, including at least 4 RNAs selected from Table 2.
Preferably, the expression profile contains the level or abundance of 680 RNAs or less, of 500 RNAs or less, of 150 RNAs or less, or of 100 RNAs or less. Further, it is preferred that at least 10 RNAs, at least 30 RNAs, or at least 100 RNAs are listed in Table 2.
In yet a further aspect, the invention also refers to a microarray, comprising a solid support and a set of oligonucleotide probes, the set containing from 4 to about 3,000 probes, and including at least 4 probes selected from Table 2. Preferably, the set contains 680 probes or less (such as e.g. 500 probes, or less). At least 10 probes can be those listed in Table 2. At least 30 probes can be those listed in Table 2. In another embodiment, at least 100 probes are listed in Table 2.
Features of the invention that were described herein in combination with a method, a microarray, a kit, or a use also refer, if applicable, to all other aspects of the invention.
The evaluation of simulation and adaptive learning approaches to accelerate biomarker development first required the establishment of a larger set of high-throughput data suitable for assessing a clinically relevant endpoint. We chose primary molecular diagnosis of acute myeloid leukemia (AML) as the first model endpoint, since several studies independently reported gene expression profiling (GEP)-based classifiers for disease subclassification, outcome prediction and differential diagnosis (Miesner et al., Blood 116: 2742-51 (2010)), while primary molecular diagnosis was not a primary endpoint in these studies. As AML has a low prevalence but is a deadly disease if not diagnosed in time, a test used for screening or primary diagnosis of AML would have to achieve sensitivity and specificity greater than 90%, preferably greater than 95%, preferably greater than 98%, preferably greater than 99%, to minimize false-negative results while avoiding unacceptable levels of false-positive results. A total of 2013 microarray samples (Affymetrix U133A chip) from 17 individual studies were compiled to form a new dataset (AML dataset, Table 1, Fig. 1A). Samples were only included in the study when passing all quality control checks. Following recent guidelines suggested by the MAQC consortium, the preferred methodologies for data processing, feature selection and classifier development were established using one of the datasets provided by the MAQC consortium (Fig. 4 and protocol of decision making below) prior to application to the AML dataset. To simulate a typical pilot trial setting as a first step of test development, 150 samples were drawn randomly from the complete dataset and distributed into three sets (training set (TS) and validation cohorts V1 and V2, each containing 25 AML cases and 25 controls) (Figure 2A). This initial setting allowed the development of a classifier within TS and two independent validations (in V1 and V2, respectively).
This procedure was performed three times and representative data of one experiment are shown. Since classifier performance in subsequent validation cohorts might be strongly influenced by the characteristics of the patient population within TS, it was already suggested in MAQC-II to perform swap analysis of training and validation cohorts. In principle, swapping independent patient cohorts is just a special case of random permutation of samples. We therefore extended this approach from a single swap to 10,000 permutations (termed '10,000 trial simulation approach'; TSA), each containing a TS and two independent validation cohorts (Fig. 2A). Applied to AML, a mean AUC of 0.9644 (V1) and 0.9636 (V2), respectively, was achieved by TSA (Fig. 1B). However, depending on the sample distribution to the three independent cohorts (TS, V1, V2), the performance of a significant number of classifiers would not have supported further classifier development. Moreover, mean classifier performance did not reach the preferred target of specificity and sensitivity (> 0.99). To elucidate dependency on methodology we varied SVM settings and compared SVM to LDA and PAM algorithms (n = 8 x 10,000 classifiers, Fig. 1C). While the results of SVM were further optimized using a linear kernel combined with a t-test instead of a radial kernel and a Wilcoxon test, neither PAM nor LDA reached a similarly high mean AUC. This was similarly true when reading out MCC (Fig. 4). As TSA clearly established the framework for classifier performance (range, median, 75% percentile), the next issues addressed were the dependency of classifier performance on feature size (n = 5 x 10,000 classifiers, Fig. 1D), the sample distribution in TS (n = 5 x 10,000 classifiers, Fig. 1E) and the sample size in TS (n = 5 x 10,000 classifiers, Fig. 1F) using TSA.
Unexpectedly, reducing feature size resulted in inferior classifier performance, suggesting that due to the overall small sample size, more features are required to correctly classify in independent validation cohorts (Fig. 1D). In contrast, TSA clearly established that further reduction of sample size (Fig. 1F) or unequal distribution of samples (Fig. 1E) in TS results in reduced overall classifier performance. Taken together, TSA is well-suited to establish the overall classifier performance that can be expected independent of the actual clinical situation with subsequent patient recruitment into TS and validation cohorts. Moreover, dependencies of classifier performance are easily uncovered. In clinical biomarker development, results from small pilot trials are supposed to form the basis for larger validation trials; however, prediction of classifier performance in the larger cohorts is still an unsolved issue. Furthermore, classifier performance in larger cohorts is expected to improve. To capture the overall improvement by enlarging the cohorts (TS, V1, V2), we repeated the trial simulation approach on the complete AML dataset (Fig. 2A). As shown in Fig. 2B, this improved classifier performance dramatically, with 61.8% of all tests reaching an AUC > 0.99 and 98.4% reaching an AUC > 0.98. Although there was still a slight improvement of the spectrum of classifiers when increasing the feature size (FC>4, left panel), all 60,000 tests performed at least with an AUC of 0.9638. Similar improvements were observed when reading out MCC (Fig. 4). When directly comparing the initial small AML dataset with the complete AML dataset, it became clear that only the larger dataset results in a sufficiently high AUC in the majority of classifiers developed (Fig. 2B). In fact, not even 60% of all classifiers generated in the small dataset showed an AUC > 0.95.
Assessing the MCC showed similar results: while basically all classifiers generated in the large dataset reached an MCC > 0.98, not even 80% of all classifiers within the small AML dataset reached an MCC > 0.95 (Figs. 2C and 2D). To elucidate whether the improvement in the larger dataset is associated with differences in feature distribution, all features (n = 23,000) were evaluated for being part of at least one of 60,000 classifiers (Figs. 3B and 3C). While a total of 3540 features were part of at least one classifier in the small AML dataset, only 680 features were identified in the large dataset. Even more striking, while no feature was identified to be present in all classifiers in the small AML dataset, 8 features were present in all 60,000 classifiers in the large dataset. When assessing feature size and classifier performance (here AUC), an enormous variance became apparent in the small AML dataset, while there was clearly less variance in the larger dataset (Fig. 3 and Fig. 4). Interestingly, reduced variance in feature size in the large dataset was seen irrespective of the filter criteria settings that determine the potential feature size. Together, these results support a robust gene expression profiling-based classifier as a test for primary diagnosis of AML with a sensitivity and specificity of greater than 99.5%. At the same time these results also indicate that even initial pilot trials would require larger patient cohorts for robust classifier development, a requirement that can rarely be met.
The invention is further described in the following Examples and Figures, which are, however, not to be construed as limiting the invention.
Examples
Methods Summary: In this study a total of 2013 gene expression profiling (GEP) samples in 17 independent datasets derived from 17 different studies were included (Table 1). Following standards suggested by MAQC-II, a procedure for data processing, feature selection and classifier optimization was developed using Bioconductor (R packages) as the basis for all subsequent assessments. Using a dataset of peripheral blood derived GEP samples (n = 2013), a smaller dataset (n2 = 150) from this study was randomly generated prior to applying a permutation approach (n = 10,000) to simulate a typical clinical pilot trial setting (trial simulation approach, TSA) consisting of three small but independent patient cohorts, here termed 'training set' (TS), validation set 1 (V1), and validation set 2 (V2) (Figure 1A). To assess patient distribution issues and classifier performance range and dependencies in this permutation approach, molecular diagnosis of acute myeloid leukemia (AML) was chosen as a primary endpoint. Using AUC, MCC, sensitivity and specificity as readouts for the generated classifiers, 1) the range of performance of the classifiers, and 2) their dependency on a) the classifier algorithms used, b) feature size, c) sample size within TS, and d) class distribution were assessed.
Description of gene expression profiling (GEP) datasets.
For this analysis new data were generated in our laboratory and previously published datasets were also used. The complete list of datasets is summarized in Table 1. Dataset for the development of a primary diagnostic test for AML: A dataset of gene expression profiles (GEP) was derived from peripheral blood mononuclear cells (PBMC) collected from a total of 19 individual studies, including our own unpublished samples (in total n = 2013). Patients with AML (n = 725), ALL (n = 218), chronic leukemias (CML, CLL, n = 98), infectious diseases (n = 262), coronary artery disease (n = 101), Parkinson's disease (n = 85), ulcerative colitis and Crohn's disease (n = 85), Huntington's disease (n = 19), post-infectious chronic fatigue syndrome (n = 8), and 560 healthy controls from these studies were included. All GEP samples were generated using the Affymetrix U133A microarray.
Figure imgf000027_0001
Borovecki F / Krainc D | GSE1767 | Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease.
Burczynski ME / Dorner AJ | GSE3365 | Molecular classification of Crohn's disease and ulcerative colitis patients using transcriptional profiles in peripheral blood mononuclear cells.
Burczynski ME / Dorner AJ | GSE3365 |
Ramilo O / Chaussabel D | GSE6269 | Gene expression patterns in blood leukocytes discriminate patients with acute infections
Ramilo O / Chaussabel D | GSE6269 |
Connolly PH / Cooper DM. | GSE1140 | http://www.ncbi.nlm.nih.gov/pubmed/15194674
Debey-Pascher S / Schultze JL | To be loaded |
Gow JW / Chaudhuri A. | GSE14577 | A gene signature for post-infectious chronic fatigue syndrome.
Gow JW / Chaudhuri A. | |
Sinnaeve PR / Granger CB. | GSE12288 | Gene expression patterns in peripheral blood correlate with the extent of coronary artery disease.
Sinnaeve PR / Granger CB. | |
Scherzer CR / Gullans SR | GSE6613 | Molecular markers of early Parkinson's disease based on gene expression in blood.
Scherzer CR / Gullans SR | GSE6613 |
Watford | GSE12839 | Tpl2 kinase regulates T cell interferon-γ production and host resistance to Toxoplasma gondii
Table 1 summarizes the origin of the samples that have been used for the development of a test for the molecular diagnosis of AML.
Protocol of decision making during feature selection and model development.
Following recommendations made by MAQC-II, the decision making process has been documented electronically (see supplemental information). All statistical and bioinformatical analyses were performed using R software (www.r-project.org) version 2.10.1 and packages from the Bioconductor project (http://www.bioconductor.org/). Using one of the datasets (multiple myeloma) provided by the MAQC-II project (GSE24080), the preferred protocol for decision making during feature selection and model development was established. In principle, raw data (for Affymetrix: CEL files) were downloaded from the GEO website and prepared for further analysis by R software.
Quality check procedure: Samples were subjected to an extended quality check prior to use. First, a visual inspection of the distribution of raw expression values was performed using pairwise scatterplots of expression values from all arrays of a dataset. The overall median correlation within a combined dataset was required to be above 0.8. Next, the present call rate had to reach a threshold determined within the dataset as median present rate > 0.3. Third, the overall sample distribution was visually analyzed by density plots.
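The two numerical criteria of this quality check can be sketched as follows. This is a minimal illustration under stated assumptions: the correlation is taken to be the Pearson correlation of expression values between pairs of arrays, and the present-call criterion is read as a threshold on the median present rate across arrays. The function names and the toy data are hypothetical, and the visual inspections are not covered.

```python
from itertools import combinations
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def passes_quality_check(arrays, present_calls, min_corr=0.8, min_present=0.3):
    """arrays: expression vectors; present_calls: boolean vectors, one per array."""
    correlations = [pearson(a, b) for a, b in combinations(arrays, 2)]
    present_rates = [sum(calls) / len(calls) for calls in present_calls]
    return median(correlations) > min_corr and median(present_rates) > min_present

# Hypothetical toy dataset of three arrays with four probe sets each.
arrays = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.2, 3.9]]
good_calls = [[True, True, False, True]] * 3
poor_calls = [[True, False, False, False]] * 3

print(passes_quality_check(arrays, good_calls))
print(passes_quality_check(arrays, poor_calls))
```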
Normalization procedures: When Affymetrix microarrays were used, all samples of a dataset were normalized using the MAS5 method from the affy package, a method comprising background correction, signal intensity and scaling calculations.
Batch-effect removal: Due to our overall strategy and to be able to better mimic the expected clinical routine of subsequent data generation, which is naturally prone to batch effects, we decided against batch-effect removal.
Feature selection approach: As the preferred feature selection approach we used a combination of fold changes and p-values based on t-tests (2-sided, unequal variance = Welch test, unpaired; "FC+P") between experimental groups in the training set. The p-values were adjusted for multiple testing using Benjamini-Hochberg correction. If not stated otherwise, we used FC > 2 and p < 0.05. To minimize overfitting issues, the number of features was usually kept small. Overall, endpoints with a higher predictability showed a larger number of features that could be successfully used to obtain efficient classifiers, an observation that also became apparent during MAQC-II.
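The "FC+P" filter can be sketched in a few lines. The example below assumes that the Welch test p-values have already been computed elsewhere and that the fold change criterion is applied in both directions (FC > 2 or FC < 1/2), which the text does not state explicitly. The function names, probe names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_features(names, fold_changes, p_values, fc_min=2.0, alpha=0.05):
    """Keep features passing both the fold change and the adjusted p-value cut."""
    adjusted = benjamini_hochberg(p_values)
    return [
        name
        for name, fc, p in zip(names, fold_changes, adjusted)
        if (fc > fc_min or fc < 1.0 / fc_min) and p < alpha
    ]

# Hypothetical probes: fold change (case / control) and raw p-value.
names = ["probe_a", "probe_b", "probe_c", "probe_d"]
fold_changes = [33.07, 0.033, 1.5, 5.0]
raw_p = [0.001, 0.002, 0.03, 0.5]

adjusted_p = benjamini_hochberg(raw_p)
selected = select_features(names, fold_changes, raw_p)
print(selected)
```

Here probe_c fails the fold change cut and probe_d fails the adjusted p-value cut, so only the strongly up- and down-regulated probes remain.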
Classification algorithms: During setup we tested several classification algorithms (support vector machine (SVM), linear discriminant analysis (LDA), or prediction analysis for microarrays (PAM)). The different classifiers were built and optimized based on the training set using a 10-fold cross-validation design repeated 10 times. The training set was divided 10 times into an internal training set and an internal validation set in a ratio of 9:1 (for the distribution to the internal validation groups see supplementary Table 1). In the internal training set the differentially expressed genes between positive and negative samples (cases and controls) were calculated using a t-test. Using the feature list extracted following the feature selection method, the different algorithms were trained on the internal training set and used to calculate the probability score for each case of the respective internal validation set. This approach was repeated 10 times according to the 10 dataset splits of this 10-fold cross-validation. For each of the 10 cross-validation steps the area under the receiver operating characteristic curve (AUC) and the median Matthews Correlation Coefficient (MCC) were calculated for the internal validation set. The optimal classification algorithm was selected according to the maximum AUC and the maximum MCC reached among all algorithms tested. In our hands, SVM performed best and was therefore chosen for further analysis.
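The splitting scheme of the repeated cross-validation can be sketched as below. This is only an index generator under simplifying assumptions: the folds are drawn at random without stratifying cases and controls, and feature selection and classifier training (which, as described, take place inside each internal training set) are left to the caller. The function name and the seed are hypothetical.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (internal_training, internal_validation) index lists."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of (nearly) equal size.
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            validation = folds[held_out]
            training = [
                i for j, fold in enumerate(folds) if j != held_out for i in fold
            ]
            yield training, validation

# A training set of 50 samples gives 100 splits in a 9:1 ratio (45 vs. 5).
splits = list(repeated_kfold(50))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```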
Validation of the optimized classifier: The optimized classifier was then applied to the validation cohorts. AUC and median MCC were used to measure the quality of the classifier. Sensitivity and specificity were calculated at the maximum Youden index (sensitivity + specificity - 1). AUC and MCC values were calculated using prediction probabilities as implemented in the ROCR package. For a description of specificity controls see the following paragraph.
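The readouts named here can be written down compactly. The sketch below is a plain re-implementation for illustration and does not reproduce the ROCR package; it assumes labels coded as 1 (AML) and 0 (control), prediction scores where higher means more AML-like, and both classes being present. The function names and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random case scores higher than a random control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion(scores, labels, threshold):
    """Counts (tp, fp, fn, tn) when calling score >= threshold positive."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    return tp, fp, fn, tn

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 if a margin is empty."""
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_youden(scores, labels):
    """Return (J, threshold, sensitivity, specificity) at the maximum Youden index."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion(scores, labels, t)
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best

# Hypothetical prediction scores for four cases and four controls.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

j, threshold, sensitivity, specificity = best_youden(scores, labels)
print(auc(scores, labels), threshold, sensitivity, specificity)
print(mcc(*confusion(scores, labels, threshold)))
```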
Randomized permutation of training and validation sets: To assess the robustness of classification, a permutation approach was applied, where each classification was repeated in 10,000 iterations (Trial Simulation Approach, TSA). In this re-sampling design, the dataset was randomly divided into one training set and two validation sets. If not stated otherwise, each of the divided sets comprises one third of the entire dataset. The classifier was built in the training set based on differentially expressed genes selected by statistical testing in the training set and then applied to the two independent validation sets.
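The re-sampling design can be sketched as a loop over random three-way splits. The sketch assumes that cases and controls are split separately so that each set keeps the same class ratio, which matches the pilot setting with 25 AML cases and 25 controls per set but is an assumption for the general case. The `evaluate` callback stands in for feature selection, classifier training in TS and scoring in the two validation sets; all names and the sample identifiers are hypothetical.

```python
import random

def stratified_thirds(cases, controls, rng):
    """Randomly split cases and controls into three sets of equal composition."""
    parts = []
    for group in (cases, controls):
        shuffled = group[:]
        rng.shuffle(shuffled)
        third = len(shuffled) // 3
        parts.append([shuffled[i * third:(i + 1) * third] for i in range(3)])
    # Combine the i-th third of the cases with the i-th third of the controls.
    return [parts[0][i] + parts[1][i] for i in range(3)]

def trial_simulation(cases, controls, evaluate, n_trials=10_000, seed=0):
    """Repeat split, train and validate; `evaluate` is supplied by the caller."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        ts, v1, v2 = stratified_thirds(cases, controls, rng)
        results.append(evaluate(ts, v1, v2))
    return results

# Hypothetical sample identifiers; the callback only reports the set sizes.
cases = [f"aml_{i}" for i in range(75)]
controls = [f"ctrl_{i}" for i in range(75)]
sizes = trial_simulation(
    cases, controls, lambda ts, v1, v2: (len(ts), len(v1), len(v2)), n_trials=100
)
print(sizes[0])
```

With a real callback that returns, for example, the AUC in each validation set, the collected results can be summarized by their mean and quantiles across all trials.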
Discussion of Experiments with reference to the Figures
In total, gene expression profiling samples derived from peripheral blood mononuclear cells in 17 independent studies were collected from public databases (GEO) or generated in our laboratory, quality controlled, and normalized for further analysis. A total of 2013 samples passed all quality check filters.
To develop classifiers to diagnose AML, the procedure shown in Fig. 1A was performed initially. First, 150 samples (75 AML samples and 75 non-AML samples) were randomly drawn from the complete dataset to simulate a typical pilot trial situation. These 150 samples were divided into three independent datasets, each containing 25 AML cases and 25 non-AML samples. One of the three datasets was set to become the training set (TS), one the first validation cohort (V1) and the third dataset the second validation cohort (V2). Within TS, the classifier was generated using a defined approach for feature selection, a defined classification algorithm and a 10x cross-validation (internal validation). The classifier built in TS was then validated in the independent datasets V1 and V2. To better understand the role of feature size, classification algorithm, ratio of cases and controls as well as size of the training set, these variables were varied (see figures below). The setting described so far could reflect a clinical situation where TS would have been drawn first, followed by V1 and V2. However, a classifier identified in such a typical clinical setting might not be generalizable. In fact, it might be biased and might underperform. To address the influence of patient entry into the independent cohorts, 10,000 clinical trial settings (Trial Simulation Approach, TSA) were tested and for each setting a classifier for V1 and V2, respectively, was determined. In such a scenario, it is possible to determine the average performance as well as the 25, 75, 90 and 95 quantiles of test performances to be expected irrespective of patient entry.
Fig. 1B shows the graphical representation of the AUC of each of the 10,000 classifiers in V1 (black circles) and V2 (red circles), respectively (left panel). On the right panel, mean (line in grey box), 25/75 percentiles (boxes) and 95 percentiles (lines) as well as outliers (dots) are shown.
The data clearly demonstrate that there is no significant difference between V1 and V2 concerning the AUC results of the 10,000 classifiers. The data also show that rather high AUC values can already be observed using only a rather small number of samples. Nevertheless, the analysis also indicates that under certain circumstances, classifiers would have been generated that would have led to the dismissal of any further development (e.g. AUC < 0.9).
As further statistical readouts of classifier performance, MCC, sensitivity and specificity were calculated as well. These data are available upon request. In Figure 1C the influence of the classification algorithm on classifier performance is shown. To elucidate dependency on methodology we varied SVM (support vector machine) settings and compared SVM to LDA (linear discriminant analysis) and PAM (prediction analysis for microarrays) algorithms (n = 8 x 10,000 classifiers, Fig. 1C). While the results of SVM were further optimized using a linear kernel combined with a t-test instead of a radial kernel and a Wilcoxon test, neither PAM nor LDA reached a similarly high mean AUC. This was similarly true when reading out MCC, specificity and sensitivity.
In Figure 1D, the influence of feature size on classifier performance is shown. The number of features was altered based on a filter combining different levels of significance (p-value) and fold change for each feature (transcript on the array) between the groups AML and non-AML. Under these conditions, a slightly better performance was observed in V1 than in V2. More importantly, for both validation cohorts, the classifier performance was better with higher numbers of features (FC > 4 = more features).
In Fig. 1E, the influence of sample size and distribution in the training set (TS) on classifier performance is shown. In TS, the number of AML cases was varied between 5 and 25.
Reducing the number of AML cases in TS reduces the performance of the classifier in both validation cohorts V1 and V2.
In Fig. 1F, the influence of sample size in the training set (TS) on classifier performance is shown. In TS, the number of AML cases as well as the number of non-AML samples (controls) was varied between 5 and 25.
Reducing the number of both AML cases and controls in TS reduces the performance of the classifier in both validation cohorts V1 and V2.
These data indicate that increasing the number of samples would greatly improve the quality of the classifiers that could be obtained from such an approach.
As a next step, the whole dataset comprising all 2013 samples was used (Fig. 2A). These 2013 samples were divided into three independent datasets, each containing 637 samples with equal distribution between AML and non-AML samples. One of the three datasets was set to become the training set (TS), one the first validation cohort (V1) and the third dataset the second validation cohort (V2). Within TS, the classifier was generated using a defined approach for feature selection, a defined classification algorithm and a 10x cross-validation (internal validation). The classifier built in TS was then validated in the independent datasets V1 and V2. To better understand the role of feature size, classification algorithm, ratio of cases and controls as well as size of the training set, these variables were varied (see figures below). The setting described so far could reflect a clinical situation where TS would have been drawn first, followed by V1 and V2. However, a classifier identified in such a typical clinical setting might not be generalizable. In fact, it might be biased and might underperform. To address the influence of patient entry into the independent cohorts, 10,000 clinical trial settings (Trial Simulation Approach, TSA) were tested and for each setting a classifier for V1 and V2, respectively, was determined. In such a scenario, it is possible to determine the average performance as well as the 25, 75, 90 and 95 quantiles of test performances to be expected irrespective of patient entry.
In Fig. 2B the AUC in V1 and V2 of all classifiers generated in the respective TS is shown for 6 different feature size criteria (FC>4 to FC>12). In total the figure shows the result (here AUC) of 60,000 classifiers in both V1 and V2. The left panel shows the same scale on the y-axis as shown in Figure 1D; the right panel reduces the scale to the range from 0.96 to 1.
As shown in Fig. 2B, this improved classifier performance dramatically, with 61.8% of all tests reaching an AUC > 0.99 and 98.4% reaching an AUC > 0.98. Although there was still a slight improvement of the spectrum of classifiers when increasing the feature size (FC>4, left panel), all 60,000 tests performed at least with an AUC of 0.9638. Similar improvements were observed when reading out MCC (Fig. 4). When directly comparing the initial small AML dataset with the complete AML dataset, it became clear that only the larger dataset results in a sufficiently high AUC in the majority of classifiers developed (Fig. 2B). In fact, not even 60% of all classifiers generated in the small dataset showed an AUC > 0.95. Assessing the MCC showed similar results: while basically all classifiers generated in the large dataset reached an MCC > 0.98, not even 80% of all classifiers within the small AML dataset reached an MCC > 0.95. Fig. 2C correlates the percentage of generated classifiers with their result (here AUC) for the small cohort (150 samples, see Figure 1, red dots) and for the complete cohort (2013 samples, Figure 2B, black dots).
Not even 60% of all classifiers generated in the small dataset showed an AUC > 0.95. In contrast, in the complete cohort, 100% of all classifiers reached an AUC of at least 0.98 and more than 80% reached an AUC of at least 0.99. This clearly indicates that the analysis in the large dataset generates numerous classifiers reaching the quality required for primary molecular diagnosis of AML.
Fig. 2D correlates the percentage of generated classifiers with their result (here MCC) for the small cohort (150 samples, see Figure 1, red dots) and for the complete cohort (2013 samples, Fig. 2B, black dots).
Assessing the MCC showed similar results: while basically all classifiers generated in the large dataset reached an MCC > 0.98, not even 80% of all classifiers within the small AML dataset reached an MCC > 0.95.
Fig. 3A correlates the feature size of each classifier with its AUC in V1 and V2 for both the small cohort (upper panel) and the complete cohort (lower panel). Again, a total of 60,000 classifiers in 5 panels is shown for the small cohort and the same number of classifiers for the large cohort.
It can be clearly seen that the variance of AUC is large in the small cohort and that this variance is seen across the whole range of feature sizes, with a tendency towards smaller variance when classifiers with larger feature sizes were generated.
In stark contrast, there was almost no variance in the complete cohort and this was irrespective of feature size. Also the range of feature sizes was clearly smaller in the complete cohort (10-378 features per classifier versus 1-837 features per classifier for the small cohort).
Fig. 3B addresses the question of how often specific features (transcripts) are part of a classifier. For this purpose all 3540 transcripts that appeared in at least one of 60,000 classifiers were plotted against their participation (in percent) in all classifiers.
It can be clearly seen that the majority of all transcripts (> 3000 transcripts) are part of less than 40% of all classifiers. Only a very small subset of transcripts (21) is part of at least 50% of all classifiers. Not a single transcript is observed in more than 90% of all classifiers.
Fig. 3C addresses the question of how often specific features (transcripts) are part of a classifier in the complete dataset. For this purpose all 680 transcripts that appeared in at least one of 60,000 classifiers were plotted against their participation (in percent) in all classifiers.
It can be clearly seen that the majority of all transcripts (> 600 transcripts) are part of less than 40% of all classifiers. However, 45 transcripts are present in at least 50% of all classifiers, 25 transcripts in more than 80% of all classifiers, 19 transcripts in more than 90% of all classifiers and 8 transcripts even in all classifiers.
These data indicate that there is a small set of transcripts that are always part of a classifier irrespective of the distribution of patients into test and validation sets. These few transcripts are the prime candidates for building the test for the primary molecular diagnosis of AML.
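The participation analysis of Figs. 3B and 3C amounts to counting, for each transcript, the fraction of classifiers that contain it. A minimal sketch is given below; the gene symbols appear in Table 2, but the four feature lists are hypothetical and serve only to illustrate the counting. The function names are assumptions as well.

```python
from collections import Counter

def participation(classifiers):
    """Map each feature to the fraction of classifiers that contain it."""
    counts = Counter(f for features in classifiers for f in set(features))
    return {f: c / len(classifiers) for f, c in counts.items()}

def present_in_at_least(fractions, threshold):
    """Features whose participation reaches the given fraction, sorted by name."""
    return sorted(f for f, frac in fractions.items() if frac >= threshold)

# Hypothetical feature lists of four classifiers.
classifiers = [
    ["MME", "CXCR2", "HOXA9"],
    ["MME", "CXCR2"],
    ["MME", "MPO"],
    ["MME", "CXCR2", "MPO"],
]

fractions = participation(classifiers)
print(present_in_at_least(fractions, 0.5))
print(present_in_at_least(fractions, 1.0))
```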
Together, these results support a robust gene expression profiling-based classifier as a test for the primary diagnosis of AML with a sensitivity and specificity of greater than 99.5%.
Table 2
SEQ ID Probe.Set.ID Gene.Symbol Rank mean_AML mean_Control fold change
1 203434_s_at MME 1 6,484842407 194,828945 0,0332848
2 203435_s_at MME 2 11,7774216 390,5671937 0,030154662
3 204007_at FCGR3B 3 96,36588679 2547,897377 0,03782173
4 207008_at CXCR2 4 19,22025237 420,9509119 0,04565913
5 207094_at CXCR1 5 8,434680603 296,4180862 0,028455351
6 210084_x_at TPSAB1 6 469,5698231 14,19884381 33,07099009
7 211163_s_at TNFRSF10C 7 13,7516325 286,181047 0,048052213
8 217023_x_at TPSAB1 /// TPSB2 8 357,8705911 14,72675465 24,30070981
9 209905_at HOXA9 10 331,4165733 15,65550111 21,16933664
10 203691_at PI3 9 9,495926826 169,7706524 0,055933854
11 216782_at — 11 6,005876026 92,38991825 0,065005751
12 204006_s_at FCGR3A /// FCGR3B 12 72,28041694 1134,121235 0,063732531
13 210119_at KCNJ15 13 14,43091797 241,644622 0,059719591
14 201427_s_at SEPP1 18 292,5521236 19,48037107 15,01779008
15 216474_x_at TPSAB1 /// TPSB2 15 483,0454115 32,49118243 14,86696929
16 39318_at TCL1A 17 45,49184435 677,2316601 0,067173239
17 207907_at TNFSF14 14 5,520702158 79,03808 0,069848637
18 209995_s_at TCL1A 19 56,42459636 805,2634375 0,070069736
19 221345_at FFAR2 16 14,15907694 196,3522654 0,072110586
20 214651_s_at HOXA9 21 508,5936926 36,96465891 13,75891751
21 210321_at GZMH 20 25,44312502 344,5439601 0,073845802
22 204561_x_at APOC2 27 154,8462107 11,33686201 13,65864827
23 203828_s_at IL32 22 21,61826865 281,8012063 0,076714606
24 203948_s_at MPO 23 2980,815153 231,3270405 12,88571862
25 215382_x_at TPSAB1 24 419,4320099 33,08337558 12,67802945
26 205683_x_at TPSAB1 26 553,9765291 44,19819564 12,5339173
27 207134_x_at TPSB2 28 468,6948008 38,86841261 12,05850122
28 206622_at TRH 55 175,8067271 15,36030018 11,44552678
29 210549_s_at CCL23 32 65,28562885 5,894866402 11,07499719
30 203949_at MPO 30 2889,547982 264,6356365 10,91896776
31 205051_s_at KIT 25 365,945213 33,69167105 10,86159284
32 207741_x_at TPSAB1 34 350,7145365 33,07766809 10,6027588
33 204885_s_at MSLN 71 59,36785281 5,529934065 10,73572526
34 220010_at KCNE1L 63 144,630379 13,98374647 10,34274894
35 205131_x_at CLEC11A 29 446,0704585 44,07899432 10,11979664
36 220068_at VPREB3 37 26,77972855 266,9641689 0,100312071
37 204698_at ISG20 33 35,03623094 342,703004 0,102234969
38 204891_s_at LCK 31 61,8281313 604,9303348 0,102207027
39 205798_at IL7R 35 108,240112 1032,103319 0,10487333
40 207826_s_at ID3 56 7,932373391 75,09538195 0,105630642
41 211796_s_at TRBC1 36 152,617601 1388,961527 0,109878926
42 205568_at AQP9 41 83,23465063 748,7517669 0,111164547
43 37145_at GNLY 38 96,9066424 872,656328 0,111047888
44 206310_at SPINK2 60 541,0837849 61,01626776 8,86786106
45 214575_s_at AZU1 64 1705,900814 193,187188 8,830299939
46 210783_x_at CLEC11A 39 283,5868231 31,88617234 8,893724219
47 221558_s_at LEF1 40 73,31851205 641,0090366 0,114379842
48 206591_at RAG1 148 9,613271244 83,70008021 0,114853788
49 AFFX-r2-Bs-dap-5_at — 62 1,649184496 14,32383753 0,115135661
50 220418_at UBASH3A 43 5,870364325 50,46535814 0,116324634
51 205366_s_at HOXB6 42 71,07813035 8,302351445 8,561204716
52 206804_at CD3G 44 14,19499845 121,4526087 0,116876851
53 205495_s_at GNLY 45 87,77323834 748,2775156 0,117300382
54 209488_s_at RBPMS 78 51,68384724 6,148101098 8,406473221
55 209757_s_at MYCN 77 121,3497462 14,80685543 8,195510973
56 212775_at OBSL1 57 80,16776939 9,722280977 8,245777876
57 210972_x_at TRA@ /// TRAC /// TRAJ17 /// TRAV20 46 129,3957012 1059,666672 0,122109815
58 209671_x_at TRA@ /// TRAC 47 101,2817483 829,5941872 0,122085894
59 205922_at VNN2 58 80,6765747 658,1964947 0,122572173
60 207339_s_at LTB 48 109,1885328 885,6579051 0,123285224
61 209670_at TRAC 49 86,08434501 694,3831361 0,123972402
62 210915_x_at TRBC1 50 172,9595219 1396,137195 0,123884331
63 210031_at CD247 51 74,38866909 600,5255353 0,123872616
64 206413_s_at TCL1B 165 15,43778348 124,0294703 0,124468672
65 210484_s_at MGC31957 /// TNFRSF10C 52 16,24082917 130,0153704 0,124914686
66 210997_at HGF 149 64,7825388 8,337912254 7,769635471
67 213150_at HOXA10 66 177,0410268 22,9407217 7,717325945
68 221349_at VPREB1 173 24,19474485 180,146343 0,134306056
69 210998_s_at HGF 150 41,2827123 5,376523535 7,678328203
70 204115_at GNG11 76 22,76278451 172,8273674 0,131708218
71 219243_at GIMAP4 68 58,29859795 439,4511796 0,132662286
72 206067_s_at WT1 88 86,95974851 12,02375551 7,232328405
73 206666_at GZMK 59 34,74787804 259,2225125 0,134046529
74 204468_s_at TIE1 83 43,57479354 5,951716432 7,321382669
75 204548_at STAR 61 71,84858889 9,586105161 7,495076226
76 211902_x_at TRA@ 53 74,8070672 563,8647769 0,132668452
77 205488_at GZMA 54 41,56099931 311,1555885 0,133569831
78 206135_at ST18 194 34,95344191 5,034439616 6,942866452
79 219630_at PDZK1IP1 69 34,61887348 252,9530427 0,136858893
80 AFFX-r2-Bs-dap-M_at — 91 3,175115317 22,61724812 0,140384688
81 205267_at POU2AF1 70 43,69031275 312,8932685 0,139633278
82 207815_at PF4V1 72 7,484984187 53,6660255 0,139473421
83 205119_s_at FPR1 80 284,3281629 2014,406449 0,141147365
84 221602_s_at FAIM3 79 45,84000399 323,4024518 0,141742908
85 210164_at GZMB 82 56,30162949 394,0862387 0,142866266
86 205254_x_at TCF7 75 13,51171938 94,6088088 0,142816716
87 222285_at IGHD 81 8,170971952 56,88368827 0,143643498
88 213193_x_at TRBC1 67 244,4105938 1711,030032 0,142844129
89 204890_s_at LCK 73 42,01306176 292,0235388 0,143868751
90 205624_at CPA3 86 676,0833449 97,65229344 6,923373954
91 206150_at CD27 65 50,32670569 344,7974439 0,145960205
92 205484_at SIT1 74 15,68390877 107,6818621 0,145650423
93 213844_at HOXA5 85 365,8363745 53,03963106 6,897415521
94 41469_at PI3 84 25,91120807 175,5752987 0,147578892
95 211709_s_at CLEC11A 89 794,0224893 119,3520958 6,652773745
96 211339_s_at ITK 87 66,57609306 437,1517676 0,152295148
97 217418_x_at MS4A1 90 56,81209315 374,0850183 0,151869469
98 220416_at ATP8B4 93 342,5641787 52,08948971 6,576454878
99 222222_s_at HOMER3 92 27,07998807 4,114495956 6,581605223
100 206871_at ELANE 156 1723,797103 261,233626 6,598679997
101 206515_at CYP4F3 94 29,85213119 194,1115237 0,153788557
102 202718_at IGFBP2 151 190,6137939 28,86367281 6,603934127
103 220807_at HBQ1 105 22,50679864 146,2910049 0,153849505
104 213258_at TFPI 96 122,6600569 18,88299885 6,495793274
105 206255_at BLK 157 11,56607876 74,85865397 0,154505567
106 206222_at TNFRSF10C 95 28,52518936 181,4515694 0,157205526
107 219529_at CLIC3 98 10,6577767 68,70552516 0,155122556
108 205609_at ANGPT1 97 160,9818384 25,16903735 6,396026839
109 214567_s_at XCL1 /// XCL2 101 8,624927227 55,52068817 0,155346187
110 208406_s_at GRAP2 104 5,636358491 36,03533169 0,156412005
111 213110_s_at COL4A5 99 99,11357882 15,75970386 6,289050843
112 209395_at CHI3L1 158 32,13301161 204,0712832 0,157459742
113 212776_s_at OBSL1 100 70,13493895 11,14462397 6,293163334
114 210724_at EMR3 102 18,43840105 115,2397638 0,160000337
115 203066_at CHST15 103 76,54644213 479,3878118 0,159675403
116 214470_at KLRB1 106 61,39466397 382,8952588 0,160343234
117 216565_x_at — 107 108,6709256 668,4902474 0,162561722
118 217572_at — 108 29,79450706 184,073138 0,16186233
119 44790_s_at C13orf18 109 21,57755211 132,6182676 0,162704222
120 205590_at RASGRP1 110 39,44573031 240,2518309 0,164184931
121 205831_at CD2 111 90,90211133 555,0670764 0,163767795
122 210439_at ICOS 152 5,062850753 30,93240296 0,163674667
123 220005_at P2RY13 112 57,08634227 348,4278317 0,163839789
124 208304_at CCR3 113 14,71133034 89,50159919 0,164369469
125 204581_at CD22 159 11,15111378 67,24471136 0,165828859
126 221958_s_at WLS 114 7,625327055 45,98332511 0,165828092
127 205174_s_at QPCT 115 31,87833796 191,2661475 0,166670048
128 218963_s_at KRT23 116 21,0687926 126,3074299 0,166805647
129 207651_at GPR171 117 18,18767815 108,4152486 0,16775941
130 201242_s_at ATP1B1 118 194,195005 32,61092831 5,954905764
131 205899_at CCNA1 119 235,6262453 40,0148415 5,888471287
132 209396_s_at CHI3L1 160 46,96279853 275,7761577 0,170293179
133 220744_s_at IFT122 120 73,23634146 12,42326923 5,895094125
134 213830_at TRD@ 198 32,16267358 178,5521652 0,180130404
135 218805_at GIMAP5 121 79,09887161 463,707162 0,170579361
136 206674_at FLT3 122 657,0140317 113,5788776 5,784649802
137 220057_at XAGE1A /// XAGE1B /// XAGE1C /// XAGE1D /// XAGE1E 211 30,73374656 5,28138168 5,819262538
138 210356_x_at MS4A1 123 72,73134865 423,6405427 0,171681747
139 205931_s_at CREB5 124 13,79652321 80,0034599 0,172449082
140 221601_s_at FAIM3 125 122,8955756 718,390153 0,171070796
141 221211_s_at C21orf7 126 22,21700459 127,5605364 0,174168322
142 214022_s_at IFITM1 127 481,6096005 2790,278586 0,172602694
143 219054_at C5orf23 176 255,0090238 45,90585443 5,555043621
144 201601_x_at IFITM1 128 340,0437507 1955,181704 0,173919258
145 201506_at TGFBI 161 103,2399435 572,9778234 0,180181395
146 206765_at KCNJ2 129 33,088223 186,4384185 0,177475347
147 201189_s_at ITPR3 130 18,63391075 104,4596169 0,17838387
148 220646_s_at KLRF1 131 30,89890892 176,2038458 0,175358879
149 202890_at MAP7 169 85,36716348 15,50625813 5,505336154
150 221234_s_at BACH2 132 29,10573499 164,6899425 0,176730495
151 206785_s_at KLRC1 /// KLRC2 133 8,121891973 45,34929275 0,179096332
152 213958_at CD6 134 77,20360954 440,6890877 0,175188385
153 206390_x_at PF4 181 184,7433193 993,0692085 0,186032673
154 AFFX-BioB-3_at — 172 114,6197576 629,9850895 0,181940429
155 201058_s_at MYL9 217 9,680994213 49,69003618 0,194827675
156 213906_at MYBL1 135 18,41132161 102,1215408 0,180288326
157 200935_at CALR 136 246,3586224 44,54281106 5,530827906
158 217143_s_at TRA@ /// TRD@ 200 57,53383859 293,0047225 0,196358059
159 205653_at CTSG 209 887,3045511 174,1565166 5,094868503
160 207341_at PRTN3 215 598,6288261 118,041971 5,071321845
161 216191_s_at TRA@ /// TRD@ 235 41,97835302 210,121277 0,199781543
162 210948_s_at LEF1 137 19,45058329 103,8370026 0,18731842
163 220377_at FAM30A 216 131,5058081 26,27473391 5,005029113
164 209960_at HGF 197 67,87936228 13,19224493 5,145398881
165 202761_s_at SYNE2 138 32,0041479 176,3319556 0,181499421
166 206337_at CCR7 139 73,8188086 397,0296006 0,18592772
167 205671_s_at HLA-DOB 140 19,27426004 102,1656849 0,188656887
168 204777_s_at MAL 153 61,10519311 319,5547608 0,191219786
169 206480_at LTC4S 166 49,81632329 9,580914719 5,199537284
170 208601_s_at TUBB1 174 19,20991684 98,76030769 0,1945105
171 207460_at GZMM 141 15,49596558 83,97740658 0,184525412
172 212914_at CBX7 142 39,4554564 215,0751427 0,183449635
173 216667_at LOC643332 /// RNASE2 175 255,0062238 49,50545939 5,151072771
174 210483_at MGC31957 154 11,78876243 61,64553789 0,191234643
175 205049_s_at CD79A 143 74,47784552 389,7401354 0,191096166
176 220421_at BTNL8 184 6,950711076 34,79597569 0,199756177
177 206398_s_at CD19 188 59,34682501 294,1320186 0,201769346
178 214617_at PRF1 162 176,8364486 902,0238807 0,196044087
179 202075_s_at PLTP 144 91,32586184 17,43885341 5,236918947
180 213539_at CD3D 195 161,8678226 791,3866312 0,204536969
181 212538_at DOCK9 179 9,386344607 46,42889036 0,202166034
182 221969_at PAX5 167 54,66248961 274,2889631 0,199287966
183 202016_at MEST 177 513,8088003 103,1788319 4,979788885
184 207979_s_at CD8B 145 36,7856376 192,1824056 0,191410017
185 210606_x_at KLRD1 163 30,34968459 154,4087646 0,196554157
186 217326_x_at IL23A /// TRBV19 155 8,796525844 44,06721966 0,199616085
187 214014_at CDC42EP2 168 3,982186857 19,87844669 0,200326862
188 219201_s_at TWSG1 182 23,91781181 4,853478362 4,927973305
189 210933_s_at FSCN1 170 97,93080862 19,79532619 4,947168219
190 205758_at CD8A 146 76,32138489 378,7377981 0,201515099
191 219812_at PVRIG 147 39,64281582 201,9232452 0,196326162
192 33304_at ISG20 164 100,3888068 479,7976027 0,209231572
193 210607_at FLT3LG 171 13,42088257 63,58015417 0,211086034
194 216268_s_at JAG1 187 158,1367533 32,30953482 4,894429899
195 206082_at HCP5 178 31,30560915 145,5687142 0,215057262
196 218999_at TMEM140 180 42,22485105 193,8173124 0,217859027
197 210755_at HGF 202 122,3041546 24,68334512 4,954926247
198 210805_x_at RUNX1 183 22,61797142 4,674066365 4,839035146
199 202269_x_at GBP1 201 52,49052695 254,8562973 0,205961271
200 219471_at C13orf18 186 52,07303611 251,4217343 0,207114298
201 209099_x_at JAG1 192 205,8446797 42,32087979 4,863903602
202 204118_at CD48 185 329,0406975 1491,037992 0,220678949
203 207890_s_at MMP25 189 68,0053477 308,9291247 0,220132523
204 215332_s_at CD8B 190 9,004046574 41,27744739 0,218134772
205 215967_s_at LY9 191 21,97460499 102,3662852 0,21466643
206 220987_s_at C11orf17 /// NUAK2 193 72,87829115 319,5156383 0,228089904
207 210113_s_at NLRP1 196 34,24453351 156,2955704 0,219101113
208 219837_s_at CYTL1 229 248,4767409 51,46081148 4,828465269
209 203485_at RTN1 210 12,01325143 57,37660195 0,209375443
210 219528_s_at BCL11B 199 47,46514885 219,0741265 0,216662504
211 213611_at AQP5 206 5,767866015 27,36226467 0,210796368
212 210772_at FPR2 203 32,79449117 147,7694912 0,221930054
213 206420_at IGSF6 204 48,67005371 218,4338085 0,22281374
214 208087_s_at ZBP1 205 7,428788811 32,96717009 0,22533899
215 205821_at KLRK1 207 66,35915665 290,6477224 0,228314731
216 211596_s_at LRIG1 208 17,77256431 80,33400834 0,221233382
217 206366_x_at XCL1 212 18,53302351 83,5298982 0,221872933
218 211532_x_at KIR2DS2 214 3,612556812 16,71297558 0,216152821
219 219521_at B3GAT1 213 6,325672366 28,85097756 0,219253311
220 206980_s_at FLT3LG 218 12,25700708 53,99433053 0,227005446
221 206560_s_at MIA 219 7,174218359 32,40053375 0,221422845
222 205544_s_at CR2 232 7,929147176 37,68130215 0,21042657
223 217078_s_at CD300A 220 49,84409702 218,9016351 0,227700889
224 212827_at IGHM 221 211,0969518 930,2018958 0,226936704
225 204959_at MNDA 222 498,2110764 2175,509169 0,229008953
226 218858_at DEPDC6 223 156,82729 35,79659864 4,381066805
227 201655_s_at HSPG2 228 48,36542278 10,49165472 4,609894632
228 204731_at TGFBR3 224 28,14059147 121,7104946 0,231209244
229 208190_s_at LSR 225 5,466221877 24,41918876 0,223849446
230 219024_at PLEKHA1 226 16,59069748 71,97474514 0,230507207
231 219478_at WFDC1 227 41,33698495 9,246594785 4,470508972
232 203561_at FCGR2A 230 164,7501278 723,1536389 0,22782175
233 201162_at IGFBP7 231 525,9651412 115,7009149 4,545903044
234 210665_at TFPI 236 47,97070476 10,33030902 4,643685361
235 206343_s_at NRG1 239 9,731740853 45,30222202 0,214818179
236 202270_at GBP1 242 21,94638578 101,3709845 0,216495735
237 203413_at NELL2 233 60,31409784 256,9574596 0,234724059
238 1405_i_at CCL5 234 199,3787132 872,8199562 0,228430516
239 208029_s_at LAPTM4B 249 205,7679967 44,51654258 4,622281624
240 213135_at TIAM1 237 44,09140961 192,4269415 0,229133245
241 221724_s_at CLEC4A 238 36,38533489 160,3125276 0,226965013
242 215894_at PTGDR 240 10,09097165 43,50172171 0,231967179
243 213668_s_at SOX4 241 153,0294372 33,88793262 4,51575016
244 207850_at CXCL3 265 126,5750632 27,43978233 4,612830441
245 212097_at CAV1 252 46,70628007 10,05645496 4,644408019
246 211583_x_at NCR3 243 20,58903269 86,96146449 0,236760418
247 205239_at AREG 245 274,5081477 61,12600985 4,490856648
248 219789_at NPR3 244 128,0553738 29,15718266 4,391898053
249 214146_s_at PPBP 246 284,4743409 1260,489401 0,225685627
250 205403_at IL1R2 269 48,40448289 217,90211 0,222138661
251 215401_at — 247 39,77525385 9,189621184 4,32828003
252 209840_s_at LRRN3 248 14,59045825 61,97676034 0,235418214
253 64064_at GIMAP5 250 108,5107297 452,2601507 0,239929893
254 207072_at IL18RAP 251 50,29770937 216,3647967 0,232467158
255 202687_s_at TNFSF10 253 68,11015856 289,9564719 0,234897873
256 AFFX-r2-Ec-bioB-3_at — 254 151,0729408 653,8535262 0,231050128
257 210992_x_at FCGR2C 255 64,18957197 276,2000709 0,232402446
258 213147_at HOXA10 256 159,4771971 37,58347194 4,24328006
259 208602_x_at CD6 257 7,956666243 34,82064492 0,228504276
260 208105_at GIPR 258 6,472909399 28,02840403 0,230941062
261 205826_at MYOM2 259 16,83532566 73,644085 0,228603908
262 210773_s_at FPR2 260 35,49593155 150,6824345 0,235567813
263 220187_at STEAP4 261 5,554693253 23,8798066 0,232610479
264 206301_at TEC 262 23,47813332 5,470628278 4,291670377
265 201596_x_at KRT18 263 163,6557229 38,7918178 4,218820673
266 215447_at — 264 65,63625353 15,33516558 4,28011378
267 211719_x_at FN1 275 27,30741833 5,766984105 4,735129806
268 217683_at HBE1 266 12,76149291 53,87722483 0,236862477
269 201830_s_at NET1 267 182,0513217 43,27333631 4,207009148
270 205118_at FPR1 268 17,74688024 75,79984767 0,234128178
271 205253_at PBX1 291 3,606617912 15,4798246 0,232988293
272 214974_x_at CXCL5 270 11,67376168 49,65947571 0,235076217
273 221764_at C19orf22 271 235,1097173 977,6804763 0,240477051
274 201315_x_at IFITM2 272 1051,742923 4313,573375 0,24382173
275 205259_at NR3C2 273 9,400127322 39,03005123 0,240843325
276 AFFX-BioB-M_at — 274 225,6345859 942,2412407 0,239465835
277 205472_s_at DACH1 276 26,06541147 6,109951983 4,266058316
278 209602_s_at GATA3 277 5,478401298 23,161233 0,236533232
279 211688_x_at KIR3DL1 /// KIR3DL2 /// LOC727787 278 6,52011132 27,02964596 0,241220744
280 210244_at CAMP 284 90,01931097 383,9264596 0,234470193
281 202723_s_at FOXO1 279 55,6684428 231,5998035 0,24036481
282 204647_at HOMER3 280 204,5931968 49,91107525 4,099154262
283 205221_at HGD 281 5,137262489 22,65238755 0,2267868
284 213716_s_at SECTM1 282 76,1690302 314,2311226 0,24239811
285 205442_at MFAP3L 283 11,91382367 48,90532924 0,243609927
286 209871_s_at APBA2 285 11,87705149 48,99932907 0,242392125
287 211396_at FCGR2C 286 8,740357118 35,97454095 0,242959518
288 214032_at ZAP70 287 50,44304687 206,0124128 0,244854406
289 205255_x_at TCF7 288 228,8036835 929,503869 0,246156784
290 215925_s_at CD72 289 19,29299684 79,24384826 0,243463654
291 216050_at — 290 4,318961656 17,70995806 0,24387193
292 205382_s_at CFD 292 1245,680507 306,7572469 4,060802212
293 220118_at ZBTB32 293 3,615032013 14,9413273 0,241948519
294 207567_at SLC13A2 294 6,442853014 26,26980642 0,245256966
295 210664_s_at TFPI 295 92,30790656 22,75010566 4,057471553
296 201069_at MMP2 296 183,2057337 45,1295109 4,059555046
297 201171_at ATP6V0E1 297 12,17161755 49,19727804 0,247404288
298 206726_at HPGDS 298 128,7199149 31,66110819 4,065553047
299 203675_at NUCB2 299 581,0657373 144,3560154 4,025227044
300 207321_s_at ABCB9 300 9,597679218 38,66261506 0,248241853
301 201324_at EMP1 301 153,7211847 38,08502784 4,036262893
302 215101_s_at CXCL5 302 3,35769507 13,88400719 0,241839047
303 213880_at LGR5 305 6,158370932 24,63759123 0,249958321
304 201243_s_at ATP1B1 303 209,5042172 52,32143794 4,004175447
305 215783_s_at ALPL 304 19,19983355 76,42576535 0,251221999
306 204070_at RARRES3 306 93,47552988 372,5548527 0,250904073
307 221790_s_at LDLRAP1 307 35,08165706 139,9222606 0,250722486
308 210450_at LOC90925 308 5,026756025 19,97748911 0,251621012
309 210279_at GPR18 309 39,03136373 155,3738119 0,251209411
310 205627_at CDA 310 72,29770311 286,2792658 0,252542576
311 206208_at CA4 311 7,276368268 28,73302408 0,2532406
312 202768_at FOSB 312 813,7193284 206,7508908 3,935747629
313 210397_at DEFB1 313 100,8985594 25,76022308 3,916835622
314 202478_at TRIB2 314 46,28447853 181,8258135 0,254553947
315 212768_s_at OLFM4 317 21,86567681 82,21592322 0,265954282
316 210517_s_at AKAP12 315 21,06046949 80,9125085 0,260286943
317 206157_at PTX3 316 175,4752753 45,54020632 3,853194561
318 211372_s_at IL1R2 319 32,1615205 117,4894062 0,273739749
319 207892_at CD40LG 318 9,579237346 37,4450169 0,255821419
320 206145_at RHAG 320 155,697322 40,93290691 3,803720129
321 219790_s_at NPR3 321 25,48030957 6,591865991 3,865416803
322 209101_at CTGF 333 31,43797681 98,65688962 0,31865972
323 207384_at PGLYRP1 327 37,55728221 134,0435242 0,280187218
324 217889_s_at CYBRD1 322 33,92266153 8,813028867 3,849149032
325 AFFX-DapX-M_at — 323 10,45116394 40,57710595 0,257563069
326 AFFX-DapX-3_at — 324 3,009980604 11,57277719 0,260091468
327 213479_at NPTX2 325 57,5121763 15,41351592 3,731282116
328 204163_at EMILIN1 326 19,79347365 5,157990463 3,837438978
329 219753_at STAG3 328 34,41725114 130,8446191 0,263039102
330 213418_at HSPA6 329 94,75691341 360,5381526 0,262820766
331 209687_at CXCL12 330 38,62336714 10,74669311 3,593976933
332 209774_x_at CXCL2 331 377,0516652 101,2161224 3,725213497
333 207840_at CD160 332 23,59459132 92,22980292 0,255823937
334 209368_at EPHX2 334 7,575194058 28,70908945 0,263860478
335 204482_at CLDN5 335 11,52757686 43,96198629 0,262216925
336 210762_s_at DLC1 336 131,6311686 34,33703476 3,833504249
337 201325_s_at EMP1 337 106,590224 28,47481783 3,743315397
338 201564_s_at FSCN1 338 90,18765003 23,30620864 3,869683458
339 205041_s_at ORM1 /// ORM2 340 13,62207281 49,31113214 0,276247416
340 209487_at RBPMS 339 49,81636296 13,37095141 3,725715653
341 212094_at PEG10 341 6,11628344 22,72014038 0,269200953
342 219737_s_at PCDH9 347 40,74671663 140,1220029 0,290794563
343 219947_at CLEC4A 342 52,315492 199,6909314 0,261982313
344 202458_at PRSS23 343 11,88231903 45,51876042 0,261042237
345 205040_at ORM1 345 16,70943711 59,31793055 0,281692853
346 207143_at CDK6 344 98,47496801 25,58440479 3,849023216
347 214735_at IPCEF1 346 29,16411265 113,2214675 0,257584655
348 219396_s_at NEIL1 348 19,29679235 74,06164868 0,260550402
349 209772_s_at CD24 349 53,50458827 196,620889 0,272120569
350 202889_x_at MAP7 350 64,75898466 16,95485668 3,819494666
351 205456_at CD3E 351 88,70541382 340,1003006 0,260821333
352 210426_x_at RORA 352 24,4999862 93,27138492 0,262674198
353 220528_at VNN3 353 20,57181028 78,35327718 0,262552008
354 211893_x_at CD6 354 22,71738031 85,90547549 0,264446244
355 209570_s_at D4S234E /// FOXP1 355 10,86988873 41,60539973 0,26126149
356 207802_at CRISP3 361 33,0800047 107,7207623 0,307090332
357 206298_at ARHGAP22 356 50,68511461 13,5239308 3,747809374
358 213589_s_at B3GNTL1 357 66,46884401 17,42019768 3,815619389
359 206851_at RNASE3 358 792,4918987 222,5644658 3,560729678
360 204627_s_at ITGB3 359 33,67278139 117,0891953 0,287582311
361 208963_x_at FADS1 360 72,86226083 19,36492074 3,762590192
362 37152_at PPARD 362 21,34283775 81,33684836 0,262400599
363 219491_at LRFN4 363 43,87539292 11,54224652 3,80128711
364 219840_s_at TCL6 364 4,902334453 18,15862754 0,269972741
365 210432_s_at SCN3A 367 12,13879748 36,92689918 0,328725069
366 204150_at STAB1 365 536,3675127 145,002979 3,6990103
367 210847_x_at TNFRSF25 366 13,26552709 49,5194275 0,267885308
368 AFFX-r2-Ec-bioB-M_at — 368 225,151872 844,6451295 0,266563867
369 210495_x_at FN1 369 43,91255749 13,55438379 3,23973101
370 215352_at — 370 4,698420824 17,25699317 0,272261846
371 205780_at BIK 371 69,12782123 18,83026074 3,67110271
372 204363_at F3 372 21,74795736 7,063272999 3,079019792
373 210222_s_at RTN1 373 17,05262168 62,39665066 0,273293863
374 206111_at RNASE2 374 1897,890733 510,5854218 3,717087587
375 204420_at FOSL1 375 64,43432114 17,36979658 3,709561068
376 214039_s_at LAPTM4B 376 407,0318192 114,4323026 3,556966082
377 206371_at FOLR3 377 32,78225635 108,9641646 0,300853556
378 214406_s_at SLC7A4 378 3,572451858 13,2279157 0,270069143
379 40850_at FKBP8 379 92,04355549 320,0965874 0,287549318
380 202208_s_at ARL4C 380 83,03493325 308,9671664 0,268750024
381 221557_s_at LEF1 381 6,665880362 24,47212025 0,272386712
382 207723_s_at KLRC3 382 9,003876751 33,35409327 0,269948179
383 209604_s_at GATA3 383 84,75773903 311,082066 0,272461027
384 221166_at FGF23 384 6,890303801 25,16691863 0,273784165
385 216052_x_at ARTN 385 41,92526647 11,54855094 3,630348664
386 212531_at LCN2 386 123,9381705 400,6931348 0,309309443
387 211341_at POU4F1 387 136,563044 44,81182385 3,047477927
388 208789_at PTRF 388 45,82371052 13,50580322 3,392890433
389 212417_at SCAMP1 389 83,08371823 22,29484986 3,726587923
390 203153_at IFIT1 390 50,77713759 161,3290932 0,314742596
391 210327_s_at AGXT 391 11,93635623 3,564682525 3,348504711
392 212956_at TBC1D9 392 43,9621454 156,9219381 0,28015296
393 213345_at NFATC4 393 12,98766488 46,92612515 0,276768321
394 201744_s_at LUM 394 8,553660712 2,246540094 3,807481885
395 203716_s_at DPP4 395 3,900478927 10,60569133 0,367772246
396 204304_s_at PROM1 396 248,231266 76,72629975 3,235282646
397 204823_at NAV3 397 22,83940387 7,865496467 2,903745995
398 211451_s_at KCNJ4 398 5,901161865 21,48271516 0,274693484
399 206385_s_at ANK3 399 8,003621356 28,73572287 0,278525144
400 202178_at PRKCZ 400 28,72043762 103,694556 0,276971509
401 210479_s_at RORA 401 26,24105362 94,9665145 0,276319014
402 206641_at TNFRSF17 402 7,383637239 25,19867483 0,293016886
403 219519_s_at SIGLEC1 403 11,35222793 33,21540395 0,341776001
404 221004_s_at ITM2C 404 399,7257188 110,3751871 3,621517928
405 205297_s_at CD79B 405 49,25287699 180,2145746 0,273301297
406 205801_s_at RASGRP3 406 67,14611383 20,29521765 3,308469758
407 217977_at SEPX1 407 275,6100752 1016,988696 0,271006036
408 205291_at IL2RB 408 108,9317352 413,417277 0,263491008
409 205767_at EREG 409 160,5319644 47,62248521 3,370927907
410 206522_at MGAM 410 52,86947241 179,8121031 0,294026217
411 219355_at CXorf57 411 2,509719861 8,364075645 0,300059441
412 204439_at IFI44L 412 57,48378003 171,7652467 0,334664789
413 209850_s_at CDC42EP2 413 21,30414858 79,03557646 0,269551378
414 207533_at CCL1 414 11,19043384 3,970571602 2,818343291
415 204081_at NRGN 415 166,0205022 603,4671564 0,275111082
416 202524_s_at SPOCK2 416 111,5556972 415,7774729 0,268306256
417 213194_at ROBO1 417 29,67578968 9,434148553 3,145571592
418 209191_at TUBB6 418 382,1031007 103,235872 3,701262878
419 206545_at CD28 419 22,28602698 82,60916056 0,269776703
420 204044_at QPRT 420 56,55366927 16,02144715 3,529872724
421 219174_at IFT74 421 22,74132093 6,215253954 3,658952812
422 221698_s_at CLEC7A 422 135,8671929 473,0917455 0,287189946
423 205237_at FCN1 423 601,272474 2093,548815 0,28720251
424 219315_s_at TMEM204 424 53,21239446 198,6300266 0,267897031
425 216331_at ITGA7 425 17,87263636 5,3325625 3,35160373
426 221658_s_at IL21R 426 16,91898847 60,9356509 0,277653364
427 210202_s_at BIN1 427 28,64089372 103,5160191 0,276680788
428 214491_at SSTR3 428 5,028913459 17,99671709 0,279435046
429 202444_s_at ERLIN1 429 208,5380356 56,23837595 3,708109135
430 AFFX-r2-Bs-dap-3_at — 430 2,914327944 9,518616748 0,306171371
431 202688_at TNFSF10 431 171,8167109 610,3845345 0,281489293
432 214761_at ZNF423 432 12,86219673 41,62111802 0,309030544
433 211781_x_at — 433 87,3203394 277,0059158 0,31522915
434 214183_s_at TKTL1 434 67,05697902 26,06824719 2,572362404
435 219218_at BAHCC1 435 241,8785928 70,54449223 3,428738164
436 214208_at KLHL35 436 9,84969892 34,6065618 0,284619402
437 219360_s_at TRPM4 437 22,2462255 8,029596359 2,770528493
438 213474_at KCTD7 438 18,1145844 65,90770755 0,274847739
439 214073_at CTTN 439 6,909059922 23,352207 0,295863253
440 210550_s_at RASGRF1 440 4,651249578 15,96492653 0,291341747
441 204661_at CD52 441 421,8538252 1490,368782 0,283053316
442 215229_at LOC100129973 442 1,879865887 6,350893496 0,296000222
443 205730_s_at ABLIM3 443 4,290350355 14,44952868 0,296919744
444 209469_at GPM6A 444 4,418308303 13,47079877 0,32799156
445 211135_x_at LILRB3 445 104,7735173 358,2493216 0,292459779
446 219073_s_at OSBPL10 446 14,56733828 50,48107682 0,288570276
447 216831_s_at RUNX1T1 447 5,55785056 1,8521834 3,000702069
448 209892_at FUT4 448 488,1113339 142,4203247 3,427258959
449 205476_at CCL20 449 20,24927654 13,37704654 1,513732981
450 205987_at CD1C 450 33,44184112 104,0186473 0,32149852
451 211743_s_at PRG2 451 388,5653205 147,9242207 2,626786328
452 206394_at MYBPC2 452 3,726424062 11,84805956 0,314517668
453 207655_s_at BLNK 453 79,61767293 253,0620248 0,314617229
454 211429_s_at SERPINA1 454 602,5901045 2073,691338 0,290588138
455 213395_at MLC1 455 196,9332 54,84027081 3,591032596
456 215599_at GUSBP3 456 115,1078605 33,24731704 3,462169905
457 210459_at PSMD4 457 5,513675912 18,15317821 0,303730611
458 217152_at — 458 13,85537991 49,19475683 0,281643427
459 214195_at TPP1 459 9,963782387 34,87030505 0,28573832
460 220848_x_at OBP2A 460 5,149303189 17,61071259 0,292396072
461 203936_s_at MMP9 461 198,6648713 608,7010696 0,326375098
462 220110_s_at NXF3 462 29,29537075 9,418501985 3,110406602
463 206647_at HBZ 463 16,22106639 44,42459798 0,365137044
464 216598_s_at CCL2 464 17,15333473 27,46638387 0,624521044
465 215215_s_at LOC81691 465 67,00882524 19,02725285 3,521728847
466 205240_at GPSM2 466 47,6865216 14,18149453 3,362587877
467 210233_at IL1RAP 467 51,1505373 15,56395547 3,286474149
468 220485_s_at SIRPG 468 27,77533293 100,7989732 0,275551745
469 202728_s_at LTBP1 469 49,61464554 15,85609793 3,129057715
470 203131_at PDGFRA 470 7,612340927 18,83485909 0,404162351
471 213317_at CLIC5 471 9,585497195 29,5976418 0,323860166
472 213759_at — 472 6,452569733 21,92831689 0,29425741
473 205528_s_at RUNX1T1 473 22,96892059 7,761091583 2,959496141
474 209160_at AKR1C3 474 196,6497244 60,56969443 3,24666859
475 209560_s_at DLK1 475 76,04599909 28,4041981 2,677280267
476 213606_s_at ARHGDIA 476 100,5644113 29,04451502 3,462423497
477 217996_at PHLDA1 477 129,2736235 40,81200475 3,167539167
478 219839_x_at TCL6 478 3,376230181 10,9349598 0,30875561
479 AFFX-HUMRGE/M10098_5_at — 479 155,1917966 367,4382667 0,422361552
480 211794_at FYB 480 47,56925293 165,3015158 0,287772636
481 211005_at LAT /// SPNS1 481 74,56530095 254,7457726 0,292704763
482 32625_at NPR1 482 9,413226801 31,45345243 0,299274835
483 206001_at NPY 483 33,90019072 107,7172116 0,314714707
484 202555_s_at MYLK 484 17,6498803 56,54429021 0,312142574
485 202947_s_at GYPC 485 438,0492591 1473,404162 0,297304209
486 208502_s_at PITX1 486 6,475260259 2,891004469 2,239796004
487 206881_s_at LILRA3 487 53,01479537 157,2896643 0,337051996
488 217032_at FOXD4 /// FOXD4L1 488 4,435979139 14,07091796 0,315258688
489 220105_at RTDR1 489 2,35459886 7,523745178 0,31295569
490 206760_s_at FCER2 490 8,269909642 18,27351255 0,452562671
491 202083_s_at SEC14L1 491 50,24268811 178,2359714 0,281888598
492 218876_at TPPP3 492 28,96722244 9,533025961 3,038617806
493 218717_s_at LEPREL1 493 5,797160007 19,54504803 0,296605053
494 213241_at PLXNC1 494 137,3833808 474,617399 0,289461324
495 216442_x_at FN1 495 44,7763732 16,68457747 2,683698359
496 219093_at PID1 496 17,69716266 56,70471044 0,312093343
497 206500_s_at C14orf106 497 78,58177815 23,25790405 3,378712802
498 204724_s_at COL9A3 498 7,192744843 19,95750597 0,360402991
499 204187_at GMPR 499 82,32427111 262,0740554 0,314125986
500 214200_s_at COL6A1 500 5,661479341 18,40994903 0,307522815
501 221081_s_at DENND2D 501 67,74893896 244,401497 0,277203453
502 206762_at KCNA5 502 6,458960684 13,90921891 0,464365449
503 205837_s_at GYPA /// GYPB 503 45,09590857 16,67651741 2,704156237
504 203411_s_at LMNA 504 334,7553006 96,68020934 3,462500784
505 205900_at KRT1 505 70,31502319 192,2066359 0,365830362
506 208107_s_at LOC81691 506 91,60864249 26,65203461 3,437210098
507 217422_s_at CD22 507 13,092683 41,22798088 0,317567893
508 213558_at PCLO 508 18,52869849 49,81515017 0,371949064
509 AFFX-r2-Hs18SrRNA-5_at — 509 170,9156246 396,3749727 0,431196812
510 205768_s_at SLC27A2 510 69,33620639 21,02451925 3,297873571
511 210784_x_at LILRB3 511 112,3644722 363,2603716 0,309322131
512 216953_s_at WT1 512 23,62027137 7,256229692 3,2551714
513 222288_at — 513 16,56522555 6,898363806 2,401326752
514 205529_s_at RUNX1T1 514 60,82331353 21,10287 2,882229457
515 211010_s_at NCR3 515 9,280737065 31,33873876 0,296142647
516 201465_s_at JUN 516 245,8359286 84,29698208 2,916307589
517 206823_at L3MBTL 517 14,56830388 48,37964621 0,301124647
518 210548_at CCL23 518 50,97203163 15,97634998 3,190467892
519 214957_at ACTL8 519 8,586441992 28,71619083 0,29901048
520 210225_x_at LILRB3 520 107,2733578 344,2690325 0,311597465
521 202219_at SLC6A8 521 63,51307239 167,8602635 0,378368716
522 206258_at ST8SIA5 522 4,193388344 13,15678551 0,318724383
523 210487_at DNTT 523 136,3501608 348,2639946 0,391513803
524 204793_at GPRASP1 524 28,76781848 98,98179248 0,290637477
525 217646_at SURF1 525 3,398341487 11,58454026 0,293351433
526 206889_at PDIA2 526 6,995534807 23,2639881 0,300702303
527 206865_at HRK 527 4,284780741 13,15347348 0,325752794
528 220051_at PRSS21 528 127,828969 38,114854 3,353783514
529 208173_at IFNB1 529 14,78109024 6,924916205 2,134479293
530 209013_x_at TRIO 530 60,00903118 17,9124573 3,350128359
531 213419_at APBB2 531 3,948544808 11,26179798 0,350614068
532 221054_s_at TCL6 532 2,088266245 6,061255695 0,344527001
533 202391_at BASP1 533 465,5514334 1569,632379 0,296599025
534 216439_at TNK2 534 3,244107408 9,983822017 0,324936422
535 201743_at CD14 535 363,4036127 1079,531628 0,336630816
536 218614_at C12orf35 536 103,3335574 369,2586796 0,279840565
537 219332_at MICALL2 537 108,3058109 30,92565246 3,502135034
538 215184_at DAPK2 538 8,938536759 29,47403245 0,3032682
539 200706_s_at LITAF 539 268,9395082 961,1513014 0,279809753
540 210123_s_at CHRFAM7A /// CHRNA7 540 43,20985297 12,71264868 3,398965396
541 210693_at SPPL2B 541 34,74849862 11,04668857 3,145603173
542 215621_s_at IGHD 542 24,29191874 73,99951176 0,328271338
543 221011_s_at LBH 543 106,8910618 373,5487677 0,286150219
544 205898_at CX3CR1 544 290,80575 893,4310906 0,325493206
545 203382_s_at APOE 545 22,4631062 7,633104462 2,942853241
546 210215_at TFR2 546 110,6857898 36,2278831 3,055265181
547 211965_at ZFP36L1 547 33,5120891 108,2544284 0,309567836
548 218454_at PLBD1 548 376,9197365 1164,065785 0,323795907
549 219667_s_at BANK1 549 49,6639925 163,6574876 0,303463002
550 209890_at TSPAN5 550 56,71657358 176,9493755 0,320524294
551 222313_at — 551 47,22704184 15,0289847 3,14239736
552 203386_at TBC1D4 552 31,95159083 110,4209654 0,289361633
553 206023_at NMU 553 10,27969494 4,515081122 2,276746455
554 215116_s_at DNM1 554 107,4454296 31,12432687 3,452136653
555 202833_s_at SERPINA1 555 395,5598101 1204,491906 0,328403876
556 220448_at KCNK12 556 10,51237011 22,24892968 0,472488801
557 204655_at CCL5 557 203,9761701 673,5873305 0,302820675
558 214433_s_at SELENBP1 558 205,4009036 550,9771577 0,372793864
559 201163_s_at IGFBP7 559 436,8289333 129,6484475 3,369334087
560 205020_s_at ARL4A 560 333,8241393 100,3344959 3,327112339
561 201667_at GJA1 561 26,84940228 10,5567582 2,543337811
562 201951_at ALCAM 562 98,41207931 28,33146363 3,473596725
563 207900_at CCL17 563 5,395262864 14,21679318 0,379499286
564 204961_s_at NCF1 /// NCF1B /// NCF1C 564 227,2586246 703,0853062 0,323230514
565 220085_at HELLS 565 27,58974469 9,313601559 2,962306742
566 202086_at MX1 566 145,8156484 433,4498026 0,336407232
567 206707_x_at FAM65B 567 208,9449058 718,834561 0,290671758
568 214049_x_at CD7 568 57,34040344 150,5641058 0,38083714
569 221210_s_at NPL 569 43,84888418 137,6447879 0,318565525
570 205559_s_at PCSK5 570 13,67756241 47,57753162 0,287479446
571 217147_s_at TRAT1 571 24,10594227 74,64821006 0,322927264
572 209173_at AGR2 572 16,44547721 3,727531975 4,411894337
573 209870_s_at APBA2 573 39,47313158 131,8007164 0,299491025
574 218000_s_at PHLDA1 574 18,11620036 6,156224782 2,942745108
575 204915_s_at SOX11 575 8,008870153 19,88810888 0,402696415
576 209098_s_at JAG1 576 52,41753592 16,05357925 3,265161938
577 211062_s_at CPZ 577 4,993545378 15,3553649 0,325198744
578 215666_at HLA-DRB4 578 11,54287437 28,64390811 0,402978334
579 218788_s_at SMYD3 579 241,2241714 70,15763805 3,438316598
580 201393_s_at IGF2R 580 95,90477435 305,094262 0,314344733
581 212464_s_at FN1 581 28,33237471 12,24010107 2,314717383
582 204836_at GLDC 582 12,2735886 36,41385321 0,337058222
583 204914_s_at SOX11 583 9,743656564 17,28567813 0,56368379
584 205569_at LAMP3 584 7,533652404 23,56021919 0,319761558
585 205947_s_at VIPR2 585 3,856028319 11,72181071 0,328961832
586 213809_x_at — 586 6,924154548 22,70759828 0,304926768
587 203290_at HLA-DQA1 587 59,06591664 147,3332677 0,400900065
588 205551_at SV2B 588 31,02442444 10,38834724 2,986463941
589 210571_s_at CMAH 589 57,00508063 17,60843778 3,237372976
590 206237_s_at NRG1 590 4,193441279 13,14575131 0,318995939
591 211560_s_at ALAS2 591 449,6626641 1171,81268 0,383732547
592 216560_x_at IGL@ 592 16,20418794 32,65910164 0,496161472
593 205805_s_at ROR1 593 6,999813141 20,09972135 0,348254238
594 206589_at GFI1 594 171,5736844 53,47557389 3,208449613
595 213737_x_at LOC728498 595 1421,978175 409,5601867 3,471963881
596 220832_at TLR8 596 30,10469076 81,48279796 0,36946069
597 202207_at ARL4C 597 188,0838384 608,8454928 0,308918832
598 206025_s_at TNFAIP6 598 37,87001587 109,062072 0,3472336
599 209949_at NCF2 599 428,0947114 1394,03312 0,307090775
600 211791_s_at KCNAB2 600 70,2584383 21,02265115 3,342035112
601 215408_at — 601 5,061611061 15,48399674 0,326893059
602 220757_s_at UBXN6 602 78,62345214 253,0770162 0,310670061
603 38487_at STAB1 603 627,2784611 211,1342638 2,970993195
604 205769_at SLC27A2 604 85,24096679 28,03232222 3,040810038
605 206655_s_at GP1BB /// SEPT5 605 52,43712352 134,5859121 0,38961822
606 210370_s_at LY9 606 27,88002563 92,89604242 0,300120704
607 219971_at IL21R 607 11,13659814 36,49920802 0,305118898
608 221933_at NLGN4X 608 5,041460688 12,93262025 0,389825154
609 204446_s_at ALOX5 609 275,4781873 890,9996409 0,309178786
610 205863_at S100A12 610 408,92268 1180,59847 0,346368973
611 206584_at LY96 611 65,72672895 195,2368465 0,336651253
612 210254_at MS4A3 612 494,4628085 196,6079032 2,514969136
613 210794_s_at MEG3 613 53,90626914 24,46522773 2,203383093
614 211322_s_at SARDH 614 9,099867531 28,71755245 0,31687476
615 200965_s_at ABLIM1 615 139,7876868 428,2493685 0,326416563
616 206126_at CXCR5 616 17,05125613 54,53607654 0,312660118
617 210332_at LOC100287322 617 7,822669126 24,37482588 0,320932308
618 214223_at — 618 11,11240596 34,02430162 0,326602029
619 217002_s_at HTR3A 619 6,304007273 19,99209746 0,315324957
620 217007_s_at ADAM15 620 40,64499847 12,57143926 3,233122129
621 219541_at LIME1 621 31,325066 96,91941105 0,32320735
622 220139_at DNMT3L 622 4,900931854 15,28529644 0,320630475
623 220359_s_at ARPP21 623 8,167500742 17,12517946 0,476929352
624 220684_at TBX21 624 49,15091719 172,7943202 0,284447528
625 221908_at RNFT2 625 9,84315517 28,14751293 0,349698931
626 202007_at NID1 626 43,83762291 15,37301939 2,851594849
627 202074_s_at OPTN 627 100,9324187 300,4300815 0,335959762
628 204560_at FKBP5 628 90,55096559 32,97152091 2,746338752
629 206363_at MAF 629 19,28349034 52,38644688 0,368100749
630 206940_s_at POU4F1 630 162,354841 73,96280564 2,195087647
631 206983_at CCR6 631 13,09213821 43,87107396 0,298423016
632 207550_at MPL 632 66,92029938 23,30423292 2,87159417
633 210873_x_at APOBEC3A 633 89,05940092 247,2303209 0,360228473
634 211728_s_at HYAL3 634 50,42111505 15,88323173 3,174487152
635 212400_at FAM102A 635 79,22250974 272,917962 0,290279574
636 212526_at SPG20 636 170,4150398 49,90562301 3,414746266
637 212951_at GPR116 637 4,368074965 13,02488495 0,335363804
638 213217_at ADCY2 638 25,58959654 8,693213393 2,943629171
639 214639_s_at HOXA1 639 14,19516046 4,768993962 2,976552408
640 219511_s_at SNCAIP 640 21,91485939 9,387562922 2,334456724
641 201188_s_at ITPR3 641 21,00405595 66,65039799 0,315137742
642 201291_s_at TOP2A 642 113,3517316 41,65536344 2,721179755
643 201418_s_at SOX4 643 266,2805664 89,39503398 2,978695287
644 202411_at IFI27 644 67,01374124 132,2353661 0,506776237
645 203038_at PTPRK 645 18,86888439 53,44034439 0,353083136
646 203979_at CYP27A1 646 26,03933076 64,9624031 0,400836938
647 204011_at SPRY2 647 149,8074771 58,50625409 2,560537834
648 205268_s_at ADD2 648 74,82011176 29,71762923 2,517701234
649 205414_s_at RICH2 649 5,75002671 14,22008435 0,404359536
650 206338_at ELAVL3 650 6,189514338 18,93997948 0,326796254
651 206478_at KIAA0125 651 305,2692051 103,5767082 2,947276569
652 206759_at FCER2 652 19,31055036 45,29108946 0,426365331
653 206964_at NAT8B 653 13,39217244 43,83345397 0,305524006
654 207166_at GNGT1 654 2,651431532 6,160516593 0,430391103
655 208582_s_at DUX1 /// DUX3 /// DUX5 655 2,817130586 7,464028784 0,377427616
656 208605_s_at NTRK1 656 13,36495444 6,104661657 2,189303058
657 209116_x_at HBB 657 5151,46449 15690,4194 0,328319107
658 209318_x_at PLAGL1 658 339,5678397 98,19916762 3,457950286
659 209387_s_at TM4SF1 659 27,32845715 9,592380299 2,848975572
660 209969_s_at STAT1 660 77,05106756 227,0532379 0,339352428
661 211397_x_at KIR2DL2 661 10,7564824 34,88724019 0,30832139
662 211699_x_at HBA1 /// HBA2 662 5042,097129 16013,90625 0,314857415
663 212396_s_at KIAA0090 663 54,75826763 17,21615004 3,180633737
664 212589_at RRAS2 664 20,46611417 56,71154848 0,360880891
665 213069_at HEG1 665 12,77477924 36,8291775 0,346865722
666 213094_at GPR126 666 37,38863864 16,22611479 2,304226189
667 213354_s_at LOC100287051 667 2,193403893 5,890678695 0,372351643
668 214142_at ZG16 668 3,992833875 12,25666984 0,325768249
669 214920_at THSD7A 669 29,31285529 13,850589 2,116361643
670 215117_at RAG2 670 5,775528094 15,78789807 0,365819951
671 216485_s_at TPSAB1 671 7,580271498 22,89962185 0,331021689
672 217062_at LOC100287204 672 3,276983203 10,48640413 0,312498275
673 217394_at — 673 11,13730093 36,86690698 0,302094801
674 217523_at CD44 674 265,8406832 86,63753958 3,06842374
675 219463_at C20orf103 675 106,8746644 57,17508763 1,869252306
676 220751_s_at C5orf4 676 53,36359127 126,6530725 0,421336729
677 221920_s_at SLC25A37 677 188,720358 552,4597632 0,341600186
678 222139_at KIAA1466 678 20,74906306 67,17899751 0,30886235
679 222211_x_at SCAND2 679 7,051228533 21,42200102 0,329158258
680 34210_at CD52 680 329,6937304 1033,462373 0,31901861
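The rows above use a decimal comma, and in each row the last value appears to equal the quotient of the two values before it (for example 137,3833808 / 474,617399 ≈ 0,289461324). The column headings are not part of this section, so the field names in the following Python sketch are hypothetical. It parses one row and rechecks that quotient:

```python
def parse_decimal(text):
    """Convert a decimal-comma string such as '137,3833808' to a float."""
    return float(text.replace(",", "."))


def parse_row(line):
    """Split one table row into named fields.

    Field names are assumptions; the table header is not shown in this section.
    The gene field may contain spaces (e.g. 'GYPA /// GYPB'), so the four
    trailing fields are split off from the right first.
    """
    head, rank, value_a, value_b, ratio = line.rsplit(maxsplit=4)
    index, probe_id, gene = head.split(maxsplit=2)
    return {
        "index": int(index),
        "probe_id": probe_id,
        "gene": gene,
        "rank": int(rank),
        "value_a": parse_decimal(value_a),
        "value_b": parse_decimal(value_b),
        "ratio": parse_decimal(ratio),
    }


def ratio_is_consistent(row, tolerance=1e-6):
    """Check that the reported ratio matches value_a / value_b."""
    return abs(row["value_a"] / row["value_b"] - row["ratio"]) < tolerance


row = parse_row("494 213241_at PLXNC1 494 137,3833808 474,617399 0,289461324")
print(row["probe_id"], row["gene"], ratio_is_consistent(row))
```

Splitting from the right keeps multi-part gene entries such as "GYPA /// GYPB" in a single field.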
Table 3
Example 1 Example 2 Example 3 Example 4 Example 5
201242_s_at 201242_s_at 201427_s_at 200935_at 201058_s_at
201427_s_at 201427_s_at 202761_s_at 201189_s_at 201242_s_at
201601_x_at 201506_at 203434_s_at 201242_s_at 201427_s_at
202075_s_at 202718_at 203435_s_at 201427_s_at 202718_at
202718_at 202890_at 203691_at 201506_at 203066_at
202890_at 203066_at 203828_s_at 202075_s_at 203434_s_at
203066_at 203434_s_at 203948_s_at 202718_at 203435_s_at
203434_s_at 203435_s_at 203949_at 203066_at 203691_at
203435_s_at 203691_at 204006_s_at 203434_s_at 203828_s_at
203691_at 203828_s_at 204007_at 203435_s_at 203948_s_at
203828_s_at 203948_s_at 204115_at 203485_at 203949_at
203948_s_at 203949_at 204468_s_at 203691_at 204006_s_at
203949_at 204006_s_at 204548_at 203828_s_at 204007_at
204006_s_at 204007_at 204561_x_at 203948_s_at 204115_at
204007_at 204115_at 204698_at 203949_at 204548_at
204115_at 204468_s_at 204885_s_at 204006_s_at 204561_x_at
204468_s_at 204548_at 204890_s_at 204007_at 204698_at
204548_at 204561_x_at 204891_s_at 204115_at 204885_s_at
204561_x_at 204698_at 205051_s_at 204468_s_at 204890_s_at
204581_at 204885_s_at 205119_s_at 204548_at 204891_s_at
204698_at 204890_s_at 205131_x_at 204561_x_at 205051_s_at
204885_s_at 204891_s_at 205254_x_at 204581_at 205119_s_at
204890_s_at 205051_s_at 205267_at 204698_at 205131_x_at
204891_s_at 205119_s_at 205366_s_at 204885_s_at 205174_s_at
205051_s_at 205131_x_at 205484_at 204890_s_at 205254_x_at
205119_s_at 205239_at 205488_at 204891_s_at 205267_at
205131_x_at 205254_x_at 205495_s_at 205051_s_at 205366_s_at
205254_x_at 205267_at 205568_at 205119_s_at 205484_at
205267_at 205366_s_at 205590_at 205131_x_at 205488_at
205366_s_at 205484_at 205609_at 205174_s_at 205495_s_at
205484_at 205488_at 205624_at 205254_x_at 205568_at
205488_at 205495_s_at 205683_x_at 205267_at 205624_at
205495_s_at 205568_at 205798_at 205366_s_at 205683_x_at
205568_at 205590_at 205831_at 205403_at 205798_at
205590_at 205609_at 205922_at 205484_at 205831_at
205609_at 205624_at 206067_s_at 205488_at 205922_at
205624_at 205683_x_at 206150_at 205495_s_at 205931_s_at
205683_x_at 205798_at 206222_at 205568_at 206067_s_at
205798_at 205831_at 206255_at 205590_at 206135_at
205831_at 205922_at 206310_at 205609_at 206150_at
205922_at 206067_s_at 206413_s_at 205624_at 206222_at
206067_s_at 206135_at 206515_at 205653_at 206310_at
206135_at 206150_at 206591_at 205683_x_at 206413_s_at
206150_at 206222_at 206622_at 205798_at 206515_at
206222_at 206310_at 206666_at 205831_at 206591_at
206255_at 206413_s_at 206785_s_at 205899_at 206622_at
206310_at 206515_at 206804_at 205922_at 206666_at
206413_s_at 206591_at 206871_at 205931_s_at 206765_at
206515_at 206622_at 207008_at 206067_s_at 206804_at
206591_at 206666_at 207094_at 206135_at 207008_at
206622_at 206674_at 207134_x_at 206150_at 207094_at
206666_at 206765_at 207339_s_at 206222_at 207134_x_at
206804_at 206785_s_at 207741_x_at 206255_at 207339_s_at
206871_at 206804_at 207815_at 206310_at 207651_at
207008_at 206871_at 207826_s_at 206413_s_at 207741_x_at
207094_at 207008_at 207850_at 206480_at 207815_at
207134_x_at 207094_at 207907_at 206515_at 207826_s_at
207339_s_at 207134_x_at 209395_at 206591_at 207907_at
207651_at 207339_s_at 209396_s_at 206622_at 208304_at
207741_x_at 207741_x_at 209488_s_at 206666_at 208406_s_at
207815_at 207815_at 209670_at 206674_at 209488_s_at
207826_s_at 207826_s_at 209671_x_at 206765_at 209670_at
207850_at 207907_at 209757_s_at 206804_at 209671_x_at
207907_at 208406_s_at 209905_at 206871_at 209757_s_at
208406_s_at 209395_at 209995_s_at 207008_at 209905_at
209488_s_at 209396_s_at 210031_at 207094_at 209995_s_at
209670_at 209488_s_at 210084_x_at 207134_x_at 210031_at
209671_x_at 209670_at 210119_at 207339_s_at 210084_x_at
209757_s_at 209671_x_at 210164_at 207341_at 210119_at
209905_at 209757_s_at 210321_at 207651_at 210164_at
209995_s_at 209905_at 210439_at 207741_x_at 210321_at
210031_at 209995_s_at 210484_s_at 207815_at 210484_s_at
210084_x_at 210031_at 210549_s_at 207826_s_at 210549_s_at
210119_at 210084_x_at 210783_x_at 207907_at 210724_at
210164_at 210119_at 210915_x_at 208304_at 210783_x_at
210321_at 210164_at 210972_x_at 208406_s_at 210915_x_at
210439_at 210321_at 210997_at 209395_at 210972_x_at
210484_s_at 210356_x_at 210998_s_at 209396_s_at 210997_at
210549_s_at 210484_s_at 211163_s_at 209488_s_at 210998_s_at
210783_x_at 210549_s_at 211339_s_at 209602_s_at 211163_s_at
210915_x_at 210724_at 211796_s_at 209670_at 211339_s_at
210972_x_at 210783_x_at 211902_x_at 209671_x_at 211709_s_at
210997_at 210915_x_at 212775_at 209757_s_at 211796_s_at
210998_s_at 210972_x_at 212776_s_at 209905_at 211902_x_at
211163_s_at 210997_at 213110_s_at 209995_s_at 212775_at
211339_s_at 211163_s_at 213150_at 210031_at 213110_s_at
211709_s_at 211339_s_at 213193_x_at 210084_x_at 213150_at
211796_s_at 211709_s_at 213844_at 210119_at 213193_x_at
211902_x_at 211796_s_at 213906_at 210164_at 213611_at
212775_at 211902_x_at 214470_at 210321_at 213844_at
212776_s_at 212775_at 214567_s_at 210439_at 214470_at
213110_s_at 212776_s_at 214575_s_at 210484_s_at 214575_s_at
213150_at 212914_at 214651_s_at 210549_s_at 214651_s_at
213193_x_at 213110_s_at 215382_x_at 210724_at 215382_x_at
213258_at 213150_at 216474_x_at 210783_x_at 216474_x_at
213844_at 213193_x_at 216782_at 210915_x_at 216565_x_at
213958_at 213258_at 217023_x_at 210972_x_at 216782_at
214470_at 213844_at 217418_x_at 210997_at 217023_x_at
214567_s_at 214470_at 217572_at 210998_s_at 217418_x_at
214575_s_at 214575_s_at 219243_at 211163_s_at 218805_at
214651_s_at 214651_s_at 219529_at 211339_s_at 218963_s_at
215382_x_at 215382_x_at 219630_at 211709_s_at 219243_at
216474_x_at 216474_x_at 220010_at 211796_s_at 219630_at
216782_at 216782_at 220068_at 211902_x_at 220005_at
217023_x_at 217023_x_at 220416_at 212775_at 220010_at
217418_x_at 217418_x_at 220418_at 212776_s_at 220068_at
218963_s_at 217572_at 220807_at 213110_s_at 220416_at
219243_at 218805_at 221345_at 213150_at 220418_at
219529_at 218963_s_at 221349_at 213193_x_at 220744_s_at
219630_at 219054_at 221558_s_at 213258_at 220807_at
220010_at 219243_at 221601_s_at 213830_at 221211_s_at
220068_at 219630_at 221602_s_at 213844_at 221345_at
220416_at 219837_s_at 222222_s_at 214470_at 221558_s_at
220418_at 220005_at 222285_at 214567_s_at 221602_s_at
220646_s_at 220010_at 37145_at 214575_s_at 221958_s_at
220807_at 220068_at 39318_at 214651_s_at 222222_s_at
221345_at 220377_at 41469_at 215382_x_at 222285_at
221349_at 220416_at 44790_s_at 216474_x_at 37145_at
221558_s_at 220418_at AFFX-BioB-3_at 216565_x_at 39318_at
221602_s_at 220646_s_at AFFX-r2-Bs-dap-5_at 216782_at 41469_at
221958_s_at 220807_at AFFX-r2-Bs-dap-M_at 217023_x_at AFFX-r2-Bs-dap-5_at
222222_s_at 221211_s_at 217418_x_at AFFX-r2-Bs-dap-M_at
222285_at 221345_at 218963_s_at
37145_at 221558_s_at 219054_at
39318_at 221602_s_at 219243_at
41469_at 221958_s_at 219630_at
AFFX-r2-Bs-dap-5_at 222285_at 219837_s_at
AFFX-r2-Bs-dap-M_at 37145_at 220005_at
39318_at 220010_at
41469_at 220057_at
44790_s_at 220068_at
AFFX-r2-Bs-dap-5_at 220416_at
AFFX-r2-Bs-dap-M_at 220418_at
220744_s_at
221211_s_at
221345_at
221349_at
221558_s_at
221602_s_at
221958_s_at
222222_s_at
222285_at
37145_at
39318_at
41469_at
44790_s_at
AFFX-BioB-3_at
AFFX-r2-Bs-dap-5_at
AFFX-r2-Bs-dap-M_at
Table 4
Example 6 Example 7 Example 8 Example 9 Example 10
201189_s_at 201189_s_at 201427_s_at 201058_s_at 200935_at
201427_s_at 201242_s_at 201506_at 201242_s_at 201058_s_at
202718_at 201427_s_at 203066_at 201427_s_at 201242_s_at
203066_at 201506_at 203434_s_at 202718_at 201427_s_at
203434_s_at 202890_at 203435_s_at 203066_at 201601_x_at
203435_s_at 203066_at 203691_at 203434_s_at 202718_at
203691_at 203434_s_at 203828_s_at 203435_s_at 203066_at
203828_s_at 203435_s_at 203948_s_at 203691_at 203434_s_at
203948_s_at 203691_at 203949_at 203828_s_at 203435_s_at
203949_at 203828_s_at 204006_s_at 203948_s_at 203691_at
204006_s_at 203948_s_at 204007_at 203949_at 203828_s_at
204007_at 203949_at 204115_at 204006_s_at 203948_s_at
204115_at 204006_s_at 204468_s_at 204007_at 203949_at
204468_s_at 204007_at 204548_at 204115_at 204006_s_at
204548_at 204115_at 204561_x_at 204468_s_at 204007_at
204561_x_at 204468_s_at 204698_at 204548_at 204115_at
204581_at 204548_at 204885_s_at 204561_x_at 204468_s_at
204698_at 204561_x_at 204891_s_at 204698_at 204548_at
204885_s_at 204581_at 205051_s_at 204885_s_at 204561_x_at
204890_s_at 204698_at 205119_s_at 204890_s_at 204581_at
204891_s_at 204885_s_at 205131_x_at 204891_s_at 204698_at
205051_s_at 204890_s_at 205174_s_at 205051_s_at 204885_s_at
205119_s_at 204891_s_at 205267_at 205119_s_at 204890_s_at
205131_x_at 205051_s_at 205366_s_at 205131_x_at 204891_s_at
205174_s_at 205119_s_at 205484_at 205254_x_at 205051_s_at
205254_x_at 205131_x_at 205488_at 205267_at 205119_s_at
205267_at 205174_s_at 205495_s_at 205366_s_at 205131_x_at
205366_s_at 205254_x_at 205568_at 205484_at 205254_x_at
205484_at 205267_at 205609_at 205488_at 205267_at
205488_at 205366_s_at 205624_at 205495_s_at 205366_s_at
205495_s_at 205484_at 205683_x_at 205568_at 205484_at
205568_at 205488_at 205798_at 205609_at 205488_at
205590_at 205495_s_at 205922_at 205624_at 205495_s_at
205624_at 205568_at 205931_s_at 205683_x_at 205568_at
205683_x_at 205590_at 206067_s_at 205798_at 205590_at
205798_at 205609_at 206135_at 205899_at 205609_at
205831_at 205624_at 206150_at 205922_at 205624_at
205922_at 205683_x_at 206222_at 206067_s_at 205653_at
206150_at 205798_at 206310_at 206135_at 205683_x_at
206255_at 205831_at 206413_s_at 206150_at 205798_at
206310_at 205899_at 206515_at 206222_at 205831_at
206398_s_at 205922_at 206591_at 206255_at 205922_at
206413_s_at 205931_s_at 206622_at 206310_at 206067_s_at
206515_at 206067_s_at 206666_at 206413_s_at 206135_at
206591_at 206135_at 206674_at 206480_at 206150_at
206622_at 206150_at 206804_at 206591_at 206222_at
206666_at 206222_at 206871_at 206622_at 206255_at
206804_at 206255_at 207008_at 206666_at 206310_at
206871_at 206310_at 207094_at 206804_at 206413_s_at
207008_at 206413_s_at 207134_x_at 206871_at 206591_at
207094_at 206515_at 207339_s_at 207008_at 206622_at
207134_x_at 206591_at 207341_at 207094_at 206666_at
207339_s_at 206622_at 207741_x_at 207134_x_at 206674_at
207741_x_at 206666_at 207815_at 207339_s_at 206804_at
207815_at 206674_at 207826_s_at 207741_x_at 206871_at
207826_s_at 206765_at 207907_at 207815_at 207008_at
207907_at 206804_at 208406_s_at 207826_s_at 207094_at
208406_s_at 207008_at 209395_at 207850_at 207134_x_at
209101_at 207094_at 209396_s_at 207907_at 207339_s_at
209395_at 207134_x_at 209488_s_at 208406_s_at 207651_at
209488_s_at 207339_s_at 209670_at 209488_s_at 207741_x_at
209670_at 207651_at 209671_x_at 209670_at 207815_at
209671_x_at 207741_x_at 209757_s_at 209671_x_at 207826_s_at
209757_s_at 207815_at 209905_at 209757_s_at 207907_at
209905_at 207826_s_at 209995_s_at 209905_at 208304_at
209995_s_at 207907_at 210031_at 209995_s_at 208406_s_at
210031_at 208304_at 210084_x_at 210031_at 209395_at
210084_x_at 208406_s_at 210119_at 210084_x_at 209396_s_at
210119_at 209488_s_at 210164_at 210119_at 209488_s_at
210164_at 209670_at 210321_at 210164_at 209670_at
210321_at 209671_x_at 210439_at 210321_at 209671_x_at
210439_at 209757_s_at 210484_s_at 210439_at 209757_s_at
210484_s_at 209905_at 210549_s_at 210484_s_at 209905_at
210549_s_at 209995_s_at 210724_at 210549_s_at 209995_s_at
210783_x_at 210031_at 210783_x_at 210724_at 210031_at
210915_x_at 210084_x_at 210915_x_at 210783_x_at 210084_x_at
210972_x_at 210119_at 210972_x_at 210915_x_at 210119_at
210997_at 210164_at 210997_at 210972_x_at 210164_at
210998_s_at 210321_at 210998_s_at 210997_at 210321_at
211163_s_at 210484_s_at 211163_s_at 210998_s_at 210439_at
211339_s_at 210549_s_at 211339_s_at 211163_s_at 210484_s_at
211796_s_at 210724_at 211709_s_at 211339_s_at 210549_s_at
211902_x_at 210783_x_at 211796_s_at 211709_s_at 210724_at
212775_at 210915_x_at 211902_x_at 211796_s_at 210783_x_at
213110_s_at 210972_x_at 212775_at 211902_x_at 210915_x_at
213150_at 210997_at 212776_s_at 212775_at 210972_x_at
213193_x_at 210998_s_at 213110_s_at 212776_s_at 210997_at
213258_at 211163_s_at 213150_at 213110_s_at 210998_s_at
213830_at 211339_s_at 213193_x_at 213150_at 211163_s_at
213844_at 211796_s_at 213258_at 213193_x_at 211339_s_at
214575_s_at 211902_x_at 213844_at 213258_at 211709_s_at
214651_s_at 212775_at 214470_at 213844_at 211796_s_at
215382_x_at 212776_s_at 214567_s_at 214470_at 211902_x_at
216191_s_at 212914_at 214575_s_at 214567_s_at 212775_at
216474_x_at 213110_s_at 214651_s_at 214575_s_at 212776_s_at
216782_at 213150_at 215382_x_at 214651_s_at 213110_s_at
217023_x_at 213193_x_at 216474_x_at 215382_x_at 213150_at
217418_x_at 213258_at 216565_x_at 216474_x_at 213193_x_at
217572_at 213844_at 216782_at 216565_x_at 213258_at
218963_s_at 213906_at 217023_x_at 216782_at 213844_at
219243_at 214470_at 217418_x_at 217023_x_at 214022_s_at
219529_at 214567_s_at 217572_at 217418_x_at 214470_at
219630_at 214575_s_at 218963_s_at 219054_at 214567_s_at
220010_at 214651_s_at 219243_at 219243_at 214575_s_at
220068_at 215382_x_at 219529_at 219529_at 214651_s_at
220416_at 216474_x_at 219630_at 219630_at 215382_x_at
220418_at 216565_x_at 219737_s_at 219737_s_at 216474_x_at
220807_at 216782_at 219837_s_at 220005_at 216565_x_at
221234_s_at 217023_x_at 220005_at 220010_at 216782_at
221345_at 217418_x_at 220010_at 220068_at 217023_x_at
221349_at 217572_at 220068_at 220416_at 217418_x_at
221558_s_at 218805_at 220377_at 220418_at 217572_at
221601_s_at 218963_s_at 220416_at 220646_s_at 218805_at
221602_s_at 219054_at 220418_at 220744_s_at 218963_s_at
221958_s_at 219243_at 220807_at 220807_at 219054_at
222222_s_at 219630_at 221211_s_at 221345_at 219243_at
222285_at 220005_at 221345_at 221349_at 219630_at
37145_at 220010_at 221558_s_at 221558_s_at 220005_at
39318_at 220068_at 221602_s_at 221958_s_at 220010_at
41469_at 220377_at 222285_at 222222_s_at 220068_at
44790_s_at 220416_at 37145_at 222285_at 220416_at
AFFX-r2-Bs-dap-5_at 220418_at 39318_at 37145_at 220418_at
AFFX-r2-Bs-dap-M_at 220744_s_at 41469_at 39318_at 220807_at
220807_at 44790_s_at 41469_at 221345_at
221211_s_at AFFX-r2-Bs-dap-5_at 44790_s_at 221349_at
221345_at AFFX-r2-Bs-dap-M_at AFFX-r2-Bs-dap-5_at 221558_s_at
221349_at AFFX-r2-Bs-dap-M_at 221601_s_at
221558_s_at 221602_s_at
221602_s_at 222285_at
221958_s_at 37145_at
222285_at 39318_at
37145_at 41469_at
39318_at AFFX-r2-Bs-dap-5_at
41469_at AFFX-r2-Bs-dap-M_at
44790_s_at
AFFX-r2-Bs-dap-5_at
Table 5
Example 11 Example 12 Example 13 Example 14 Example 15
AUC VS1 0,9591275 0,9591275 0,9591275 0,9591275 0,9591275
AUC VS2 0,8669514 0,8669514 0,8669514 0,8669514 0,8669514
203434_s_at 203434_s_at 203434_s_at 203434_s_at 203434_s_at
203435_s_at 203435_s_at 203435_s_at 203435_s_at 203435_s_at
204007_at 204007_at 204007_at 204007_at 204007_at
207008_at 207008_at 207008_at 207008_at 207008_at
207094_at 207094_at 207094_at 207094_at
210084_x_at 210084_x_at 210084_x_at 210084_x_at
211163_s_at 211163_s_at 211163_s_at
217023_x_at 217023_x_at 217023_x_at
209905_at 209905_at
203691_at 203691_at
216782_at 204006_s_at
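
Table 5 reports an AUC for each example. The abbreviations VS1 and VS2 are not defined in this section (presumably two validation sets), and the classifier scores behind the values are not given. As a hedged illustration only, an AUC can be computed from classifier scores as the fraction of case/control pairs that are ranked correctly, with ties counted as one half. The scores in this Python sketch are invented:

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented scores, not data from the application.
aml_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.7, 0.3, 0.2]
print(auc(aml_scores, healthy_scores))
```

With these invented scores, 8 of the 9 case/control pairs are ordered correctly, so the function returns 8/9.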

Claims

1. A method for the detection of Acute Myeloid Leukemia (AML) in a human subject based on RNA from a blood sample obtained from said subject, comprising:
measuring the abundance of at least 4 RNAs in the sample, that are chosen from the RNAs listed in Table 2, and
concluding based on the measured abundance whether the subject has AML.
2. The method of claim 1, wherein
(i) the abundance of the first 4 RNAs listed in Table 2 is measured;
(ii) the abundance of at least 6 RNAs listed in Table 2, preferably of the first 6 RNAs listed in Table 2, is measured;
(iii) the abundance of at least 8 RNAs listed in Table 2, preferably of the first 8 RNAs listed in Table 2, is measured;
(iv) the abundance of at least 12 RNAs listed in Table 2, preferably of the first 12 RNAs listed in Table 2, is measured;
(v) the abundance of at least 14 RNAs listed in Table 2, preferably of the first 14 RNAs listed in Table 2, is measured.
3. The method of claim 1 or 2, wherein concluding comprises classifying the sample as being from a healthy individual or from an individual having AML based on the specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML.
4. The method of claims 1 to 3, wherein concluding whether the subject has AML comprises the step of training a classification algorithm on a training set of cases and controls, and applying it to measured RNA abundance.
5. The method of claim 3 or 4, wherein the classifying is achieved by applying a random forest method, a support vector machine (SVM), or a K-nearest neighbor method (K-NN), such as a 3-nearest neighbor method (3-NN), linear discrimination analysis (LDA) or prediction analysis for microarrays (PAM).
6. The method of claims 1 to 5, wherein the measuring of RNA abundance is performed using a microarray, a real-time polymerase chain reaction or sequencing.
7. The method of claims 1 to 6, wherein the measuring of the abundance is performed through a hybridization with probes for determining the abundance of the at least 4 RNAs of Table 2, preferably said probes comprise 15 to 150, most preferably 30 to 70 consecutive nucleotides with a reverse complementary sequence to the at least 4 RNAs whose abundance is to be determined.
8. The method of claims 1 to 7, wherein
(i) the RNA of the sample to be determined is mRNA, cDNA, micro RNA, small nuclear RNA, unspliced RNA, or its fragments; and/or
(ii) the abundance of the RNAs in the sample is increased or decreased as shown in Table 2.
9. A microarray, comprising a solid support and a set of oligonucleotide probes, the set containing from 4 to about 3,000 probes, and including at least 4 probes for detecting an RNA selected from Table 2.
10. The microarray of claim 9, wherein the probes comprise 15 to 150, preferably 30 to 70 consecutive nucleotides with a reverse complementary sequence to the at least 4 RNAs.
11. Use of a microarray of claim 9 or 10 for detection of AML in a human subject based on RNA from a blood sample, comprising measuring the abundance of at least 4 RNAs listed in Table 2.
12. A kit for the detection of AML in a human subject based on RNA obtained from a blood sample, comprising means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2, preferably comprising means for exclusively measuring the abundance of RNAs that are chosen from Table 2.
13. The kit of claim 12, which comprises
probes comprising 15 to 150, preferably 30 to 70 consecutive nucleotides with a reverse complementary sequence to the at least 4 RNAs whose abundance is to be determined, or
a microarray comprising probes with a reverse complementary sequence to the at least 4 RNAs whose abundance is to be determined.
14. Use of a kit of claim 12 or 13 for the detection of AML in a human subject based on RNA from a blood sample, comprising means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2, comprising measuring the abundance of at least 4 RNAs in a blood sample from a human subject, wherein the at least 4 RNAs are chosen from the RNAs listed in Table 2, and
concluding based on the measured abundance whether the subject has AML.
15. A method for preparing an RNA expression profile that is indicative of the presence or absence of AML in a subject, comprising:
isolating RNA from a blood sample obtained from the subject, and
determining the abundance of from 4 to about 3000 RNAs, including at least 4 RNAs selected from Table 2.
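
Claims 4 and 5 describe training a classification algorithm on cases and controls and name a 3-nearest neighbor method among the options. The following is a minimal pure-Python 3-NN sketch under stated assumptions: the abundance vectors, labels, and function names are invented for illustration and are not taken from the application, and Euclidean distance is an assumed choice.

```python
from collections import Counter
import math


def knn_classify(training, sample, k=3):
    """Label a sample by majority vote among its k nearest training vectors.

    training: list of (abundance_vector, label) pairs
    sample:   abundance vector of the same length
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], sample))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Invented abundances for 4 hypothetical RNAs; not measurements from Table 2.
training_set = [
    ([140.0, 45.0, 18.0, 80.0], "AML"),
    ([130.0, 50.0, 20.0, 75.0], "AML"),
    ([150.0, 40.0, 15.0, 85.0], "AML"),
    ([470.0, 17.0, 57.0, 23.0], "healthy"),
    ([480.0, 15.0, 60.0, 25.0], "healthy"),
    ([460.0, 18.0, 55.0, 20.0], "healthy"),
]

print(knn_classify(training_set, [145.0, 44.0, 17.0, 79.0]))
```

In practice the abundances would typically be normalized before distances are computed, since a single highly expressed RNA would otherwise dominate the vote.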
PCT/EP2012/059280 2011-05-18 2012-05-18 Molecular analysis of acute myeloid leukemia WO2012156515A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP12724304.6A EP2710147A1 (en) 2011-05-18 2012-05-18 Molecular analysis of acute myeloid leukemia

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP11166626 2011-05-18
EP11166626.9 2011-05-18
EP11166775 2011-05-19
EP11166775.4 2011-05-19

Publications (1)

Publication Number Publication Date
WO2012156515A1 (en) 2012-11-22

Family

ID=46178525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/059280 WO2012156515A1 (en) 2011-05-18 2012-05-18 Molecular analysis of acute myeloid leukemia

Country Status (2)

Country Link
EP (1) EP2710147A1 (en)
WO (1) WO2012156515A1 (en)


Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5242974A (en) 1991-11-22 1993-09-07 Affymax Technologies N.V. Polymer reversal on solid surfaces
US5384261A (en) 1991-11-22 1995-01-24 Affymax Technologies N.V. Very large scale immobilized polymer synthesis using mechanically directed flow paths
US5405783A (en) 1989-06-07 1995-04-11 Affymax Technologies N.V. Large scale photolithographic solid phase synthesis of an array of polymers
US5412087A (en) 1992-04-24 1995-05-02 Affymax Technologies N.V. Spatially-addressable immobilization of oligonucleotides and other biological polymers on surfaces
US5424186A (en) 1989-06-07 1995-06-13 Affymax Technologies N.V. Very large scale immobilized polymer synthesis
US5429807A (en) 1993-10-28 1995-07-04 Beckman Instruments, Inc. Method and apparatus for creating biopolymer arrays on a solid support surface
US5436327A (en) 1988-09-21 1995-07-25 Isis Innovation Limited Support-bound oligonucleotides
US5472672A (en) 1993-10-22 1995-12-05 The Board Of Trustees Of The Leland Stanford Junior University Apparatus and method for polymer synthesis using arrays
US5527681A (en) 1989-06-07 1996-06-18 Affymax Technologies N.V. Immobilized molecular synthesis of systematically substituted compounds
US5532128A (en) 1991-11-19 1996-07-02 Houston Advanced Research Center Multi-site detection apparatus
US5545331A (en) 1991-04-08 1996-08-13 Romar Technologies, Inc. Recycle process for removing dissolved heavy metals from water with iron particles
US5554501A (en) 1992-10-29 1996-09-10 Beckman Instruments, Inc. Biopolymer synthesis using surface activated biaxially oriented polypropylene
US5556752A (en) 1994-10-24 1996-09-17 Affymetrix, Inc. Surface-bound, unimolecular, double-stranded DNA
US5561071A (en) 1989-07-24 1996-10-01 Hollenberg; Cornelis P. DNA and DNA technology for the construction of networks to be used in chip construction and chip production (DNA-chips)
US5571639A (en) 1994-05-24 1996-11-05 Affymax Technologies N.V. Computer-aided engineering system for design of sequence arrays and lithographic masks
US5599695A (en) 1995-02-27 1997-02-04 Affymetrix, Inc. Printing molecular library arrays using deprotection agents solely in the vapor phase
US5624711A (en) 1995-04-27 1997-04-29 Affymax Technologies, N.V. Derivatization of solid supports and methods for oligomer synthesis
US5658734A (en) 1995-10-17 1997-08-19 International Business Machines Corporation Process for synthesizing chemical compounds
US5700637A (en) 1988-05-03 1997-12-23 Isis Innovation Limited Apparatus and method for analyzing polynucleotide sequences and method of generating oligonucleotide arrays
WO2004097051A2 (en) * 2003-04-29 2004-11-11 Wyeth Methods for diagnosing aml and mds differential gene expression
WO2006048270A2 (en) * 2004-11-04 2006-05-11 Roche Diagnostics Gmbh Methods of detecting leukemia and its subtypes
WO2006071088A1 (en) 2004-12-29 2006-07-06 Digital Genomics Inc. Markers for the diagnosis of aml, b-all and t-all
WO2006125195A2 (en) * 2005-05-18 2006-11-23 Wyeth Leukemia disease genes and uses thereof
WO2010143941A1 (en) 2009-06-12 2010-12-16 Erasmus University Medical Center Rotterdam Classification and risk-assignment of childhood acute myeloid leukaemia (aml) by gene expression signatures

Patent Citations (27)

Publication number Priority date Publication date Assignee Title
US5700637A (en) 1988-05-03 1997-12-23 Isis Innovation Limited Apparatus and method for analyzing polynucleotide sequences and method of generating oligonucleotide arrays
US5436327A (en) 1988-09-21 1995-07-25 Isis Innovation Limited Support-bound oligonucleotides
US5445934A (en) 1989-06-07 1995-08-29 Affymax Technologies N.V. Array of oligonucleotides on a solid substrate
US5405783A (en) 1989-06-07 1995-04-11 Affymax Technologies N.V. Large scale photolithographic solid phase synthesis of an array of polymers
US5424186A (en) 1989-06-07 1995-06-13 Affymax Technologies N.V. Very large scale immobilized polymer synthesis
US5527681A (en) 1989-06-07 1996-06-18 Affymax Technologies N.V. Immobilized molecular synthesis of systematically substituted compounds
US5561071A (en) 1989-07-24 1996-10-01 Hollenberg; Cornelis P. DNA and DNA technology for the construction of networks to be used in chip construction and chip production (DNA-chips)
US5545331A (en) 1991-04-08 1996-08-13 Romar Technologies, Inc. Recycle process for removing dissolved heavy metals from water with iron particles
US5532128A (en) 1991-11-19 1996-07-02 Houston Advanced Research Center Multi-site detection apparatus
US5242974A (en) 1991-11-22 1993-09-07 Affymax Technologies N.V. Polymer reversal on solid surfaces
US5384261A (en) 1991-11-22 1995-01-24 Affymax Technologies N.V. Very large scale immobilized polymer synthesis using mechanically directed flow paths
US5412087A (en) 1992-04-24 1995-05-02 Affymax Technologies N.V. Spatially-addressable immobilization of oligonucleotides and other biological polymers on surfaces
US5554501A (en) 1992-10-29 1996-09-10 Beckman Instruments, Inc. Biopolymer synthesis using surface activated biaxially oriented polypropylene
US5472672A (en) 1993-10-22 1995-12-05 The Board Of Trustees Of The Leland Stanford Junior University Apparatus and method for polymer synthesis using arrays
US5529756A (en) 1993-10-22 1996-06-25 The Board Of Trustees Of The Leland Stanford Junior University Apparatus and method for polymer synthesis using arrays
US5429807A (en) 1993-10-28 1995-07-04 Beckman Instruments, Inc. Method and apparatus for creating biopolymer arrays on a solid support surface
US5571639A (en) 1994-05-24 1996-11-05 Affymax Technologies N.V. Computer-aided engineering system for design of sequence arrays and lithographic masks
US5593839A (en) 1994-05-24 1997-01-14 Affymetrix, Inc. Computer-aided engineering system for design of sequence arrays and lithographic masks
US5556752A (en) 1994-10-24 1996-09-17 Affymetrix, Inc. Surface-bound, unimolecular, double-stranded DNA
US5599695A (en) 1995-02-27 1997-02-04 Affymetrix, Inc. Printing molecular library arrays using deprotection agents solely in the vapor phase
US5624711A (en) 1995-04-27 1997-04-29 Affymax Technologies, N.V. Derivatization of solid supports and methods for oligomer synthesis
US5658734A (en) 1995-10-17 1997-08-19 International Business Machines Corporation Process for synthesizing chemical compounds
WO2004097051A2 (en) * 2003-04-29 2004-11-11 Wyeth Methods for diagnosing aml and mds differential gene expression
WO2006048270A2 (en) * 2004-11-04 2006-05-11 Roche Diagnostics Gmbh Methods of detecting leukemia and its subtypes
WO2006071088A1 (en) 2004-12-29 2006-07-06 Digital Genomics Inc. Markers for the diagnosis of aml, b-all and t-all
WO2006125195A2 (en) * 2005-05-18 2006-11-23 Wyeth Leukemia disease genes and uses thereof
WO2010143941A1 (en) 2009-06-12 2010-12-16 Erasmus University Medical Center Rotterdam Classification and risk-assignment of childhood acute myeloid leukaemia (aml) by gene expression signatures

Non-Patent Citations (37)

Title
A. BHATTACHARJEE ET AL., PROC. NATL. ACAD. SCI. USA, vol. 98, 2001, pages 13790 - 13795
A. DUPUY; R.M. SIMON, J NATL CANCER INST, vol. 99, 2007, pages 147 - 157
A. ROSENWALD ET AL., NEW ENGLAND J. MED., vol. 346, 2002, pages 1937 - 1947
A.M. GLAS ET AL., BMC GENOMICS, vol. 7, 2006, pages 278
B. J. WOUTERS ET AL: "A decade of genome-wide gene expression profiling in acute myeloid leukemia: flashback and prospects", BLOOD, vol. 113, no. 2, 2 October 2008 (2008-10-02), pages 291 - 298, XP055015531, ISSN: 0006-4971, DOI: 10.1182/blood-2008-04-153239 *
BERNHARD SCHÖLKOPF; ALEX SMOLA: "Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond", 2002, MIT PRESS
D.G. BEER ET AL., NAT. MED., vol. 8, 2002, pages 816 - 824
DEREK L. STIREWALT ET AL: "Identification of genes with abnormal expression changes in acute myeloid leukemia", GENES, CHROMOSOMES AND CANCER, vol. 47, no. 1, 1 January 2008 (2008-01-01), pages 8 - 20, XP055015530, ISSN: 1045-2257, DOI: 10.1002/gcc.20500 *
E.J. YEOH ET AL., CANCER CELL, vol. 1, 2002, pages 133 - 143
F.M. GOODSAID ET AL., NAT REV DRUG DISCOV, vol. 9, 2010, pages 435 - 445
FDA, FDA CLEARS BREAST CANCER SPECIFIC MOLECULAR PROGNOSTIC TEST, 2007
HAFERLACH TORSTEN ET AL: "Global approach to the diagnosis of leukemia using gene expression profiling", BLOOD, AMERICAN SOCIETY OF HEMATOLOGY, US, vol. 106, no. 4, 1 August 2005 (2005-08-01), pages 1189 - 1198, XP002361704, ISSN: 0006-4971, DOI: 10.1182/BLOOD-2004-12-4938 *
HOCHREITER ET AL., BIOINFORMATICS, vol. 22, no. 8, 2006, pages 943 - 9
IRIZARRY ET AL., BIOSTATISTICS, vol. 4, 2003, pages 249 - 264
J.A. LUDWIG; J.N. WEINSTEIN, NAT. REV. CANCER, vol. 5, 2005, pages 845 - 856
J.P. IOANNIDIS ET AL., NAT GENET, vol. 41, 2009, pages 149 - 155
KERN W ET AL: "GENE EXPRESSION PROFILING AS A DIAGNOSTIC TOOL IN ACUTE MYELOID LEUKEMIA", AMERICAN JOURNAL OF PHARMACOGENOMICS, WOLTERS KLUWER HEALTH, XX, vol. 4, no. 4, 1 January 2004 (2004-01-01), pages 225 - 237, XP009059426, ISSN: 1175-2203, DOI: 10.2165/00129785-200404040-00002 *
L. EIN-DOR ET AL., BIOINFORMATICS, vol. 21, 2005, pages 171 - 178
L. EIN-DOR ET AL., PROC NATL ACAD SCI U S A, vol. 103, 2006, pages 5923 - 5928
L. SHI ET AL., CURR OPIN BIOTECHNOL, vol. 19, 2008, pages 10 - 18
L. SHI ET AL., NAT BIOTECHNOL, vol. 24, 2006, pages 1151 - 1161
L. SHI ET AL., NAT BIOTECHNOL, vol. 28, 2010, pages 827 - 838
L.J. VAN 'T VEER ET AL., NATURE, vol. 415, 2002, pages 530 - 536
M.D. RADMACHER ET AL., J COMPUT BIOL, vol. 9, 2002, pages 505 - 511
M.P. BERRY ET AL., NATURE, vol. 466, 2010, pages 973 - 977
MIESNER ET AL., BLOOD, vol. 116, 2010, pages 2742 - 51
Q. LIU ET AL., PLOS ONE, vol. 4, 2009, pages E8250
R. SIMON: "Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data", BRITISH JOURNAL OF CANCER, vol. 89, 2003, pages 1599 - 1604, XP002661447, DOI: 10.1038/sj.bjc.6601326
R.M. PARRY ET AL., PHARMACOGENOMICS J., vol. 10, 2010, pages 292 - 309
S. DUDOIT ET AL., J. AM. STAT. ASSOC., vol. 97, 2002, pages 77 - 87
S. KOTSIANTIS, INFORMATICA J., vol. 31, 2007, pages 249 - 268
S. MICHIELS ET AL., LANCET, vol. 365, 2005, pages 488 - 492
S. RAMASWAMY ET AL., NAT. GENET., vol. 33, 2003, pages 49 - 54
S.L. POMEROY ET AL., NATURE, vol. 415, 2002, pages 436 - 442
T. REME ET AL., BMC BIOINFORMATICS, vol. 9, 2008, pages 16
T.R. GOLUB ET AL., SCIENCE, vol. 286, 1999, pages 531 - 537
V. VAPNIK: "The Nature of Statistical Learning Theory", 1995, SPRINGER VERLAG

Cited By (16)

Publication number Priority date Publication date Assignee Title
US11220542B2 (en) 2015-02-19 2022-01-11 Compugen Ltd. Anti-PVRIG antibodies and methods of use
US10351625B2 (en) 2015-02-19 2019-07-16 Compugen Ltd. Anti-PVRIG antibodies and methods of use
US11795209B2 (en) 2015-02-19 2023-10-24 Compugen Ltd. PVRIG polypeptides and methods of treatment
US10227408B2 (en) 2015-02-19 2019-03-12 Compugen Ltd. Anti-PVRIG antibodies and methods of use
US9714289B2 (en) 2015-02-19 2017-07-25 Compugen Ltd. Anti-PVRIG antibodies and methods of use
US10550173B2 (en) 2015-02-19 2020-02-04 Compugen, Ltd. PVRIG polypeptides and methods of treatment
US11795220B2 (en) 2015-02-19 2023-10-24 Compugen Ltd. Anti-PVRIG antibodies and methods of use
US11623955B2 (en) 2015-02-19 2023-04-11 Compugen Ltd. Anti-PVRIG antibodies and methods of use
US11549146B2 (en) 2016-05-20 2023-01-10 Cedars-Sinai Medical Center Diagnosis of inflammatory bowel disease based on genes
US10751415B2 (en) 2016-08-17 2020-08-25 Compugen Ltd. Anti-TIGIT antibodies, anti-PVRIG antibodies and combinations thereof
US11701424B2 (en) 2016-08-17 2023-07-18 Compugen Ltd. Anti-TIGIT antibodies, anti-PVRIG antibodies and combinations thereof
US10124061B2 (en) 2016-08-17 2018-11-13 Compugen Ltd. Anti-TIGIT antibodies, anti-PVRIG antibodies and combinations thereof
US10213505B2 (en) 2016-08-17 2019-02-26 Compugen Ltd. Anti-TIGIT antibodies, anti-PVRIG antibodies and combinations thereof
US11225523B2 (en) 2017-06-01 2022-01-18 Compugen Ltd. Triple combination antibody therapies
WO2021178166A1 (en) * 2020-03-06 2021-09-10 Denovo Biopharma Llc Compositions and methods for assessing the efficacy of inhibitors of neurotransmitter transporters
WO2022256607A3 (en) * 2021-06-04 2023-01-19 Duke University Compositions for and methods of evaluating gap junction formation and function

Also Published As

Publication number Publication date
EP2710147A1 (en) 2014-03-26

Similar Documents

Publication Publication Date Title
US20220112562A1 (en) Prognostic tumor biomarkers
US20220396842A1 (en) Method for using gene expression to determine prognosis of prostate cancer
US20210062275A1 (en) Methods to predict clinical outcome of cancer
EP2553118B1 (en) Method for breast cancer recurrence prediction under endocrine treatment
US10196691B2 (en) Colon cancer gene expression signatures and methods of use
US20080020379A1 (en) Diagnosis and prognosis of infectious diseases clinical phenotypes and other physiologic states using host gene expression biomarkers in blood
WO2012156515A1 (en) Molecular analysis of acute myeloid leukemia
WO2008104608A1 (en) Method for the determination and the classification of rheumatic conditions
WO2011144718A2 (en) Methods and kits for diagnosing colorectal cancer
US20100298160A1 (en) Method and tools for prognosis of cancer in er-patients
AU2020201779A1 (en) Method for using gene expression to determine prognosis of prostate cancer
Belder et al. From RNA isolation to microarray analysis: comparison of methods in FFPE tissues
US20110306507A1 (en) Method and tools for prognosis of cancer in her2+ patients
WO2013163134A2 (en) Biomolecular events in cancer revealed by attractor metagenes
US20150105272A1 (en) Biomolecular events in cancer revealed by attractor metagenes
EP2524966A1 (en) Molecular analysis of tuberculosis
US20220259674A1 (en) Compositions and methods for treating breast cancer

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 12724304

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 2012724304

Country of ref document: EP