WO2003018775A2 - Human mitochondrial dna polymorphisms, haplogroups, associations with physiological conditions, and genotyping arrays - Google Patents

Publication number
WO2003018775A2
WO2003018775A2 PCT/US2002/028471 US0228471W WO03018775A2 WO 2003018775 A2 WO2003018775 A2 WO 2003018775A2 US 0228471 W US0228471 W US 0228471W WO 03018775 A2 WO03018775 A2 WO 03018775A2
Authority
WO
WIPO (PCT)
Prior art keywords
haplogroup
nucleotide
allele
group
sample
Prior art date
Application number
PCT/US2002/028471
Other languages
French (fr)
Other versions
WO2003018775A3 (en)
Inventor
Douglas C. Wallace
Seyed Hosseini
Dan Mishmar
Eduardo Ruiz-Pesini
Marie Lott
Original Assignee
Emory University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CA 2356536 external-priority patent/CA2356536A1/en
Application filed by Emory University filed Critical Emory University
Priority to JP2003523626A priority Critical patent/JP2005525082A/en
Priority to EP02796465A priority patent/EP1432831A4/en
Priority to US10/488,618 priority patent/US20050123913A1/en
Priority to CA002459127A priority patent/CA2459127A1/en
Publication of WO2003018775A2 publication Critical patent/WO2003018775A2/en
Publication of WO2003018775A3 publication Critical patent/WO2003018775A3/en

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • haplogroups can be combined into macro-haplogroups.
  • Haplogroups can be subdivided into subhaplogroups.
  • the complete Cambridge mitochondrial DNA sequence may be found at MITOMAP, http://www.gen.emory.edu/cgi-giri/MITOMAP, Genbank accession no. J01415, and is provided in SEQ ID NO:2. Also see Andrews et al. (1999), "Reanalysis and Revision of the Cambridge Reference Sequence for Human Mitochondrial DNA," Nature Genetics 23:147.
  • Haplogroup T has been associated with reduced sperm motility in European males (E. Ruiz-Pesini et al., [2000] American Journal of Human Genetics 67:682-696), and the tRNA Gln np 4336 variant in haplogroup H is associated with late-onset Alzheimer Disease (J. M. Shoffner et al., [1993] Genomics 17:171-184).
  • The D-loop is the most variable region in the mitochondrial genome, and the most polymorphic nucleotide sites within this loop are concentrated in two 'hypervariable segments', HVS-I and HVS-II (Wilkinson-Herbots, H.M. et al., (1996) "Site 73 in hypervariable region II of the human mitochondrial genome and the origin of European populations," Ann Hum Genet 60:499-508).
  • Population-specific, neutral mtDNA variants have been identified by surveying mtDNA restriction site variants or by sequencing hypervariable segments in the displacement loop. Restriction analysis using fourteen restriction endonucleases allowed screening of 15-20% of the mtDNA sequence for variations (Chen Y.S.
  • the coding and classification system that has been used for mtDNA haplogroups refers primarily to the information provided by RFLPs and the hypervariable segments of the control region.
  • Neutrality testing, including Ka/Ks analysis, has not been applied for the purpose of identifying disease-associated mutations.
  • Populations for neutrality testing analysis were identified by observation of normal phenotypic variation.
  • Neutrality testing has been performed to determine whether a gene is under selection. None of these publications describe neutrality analysis with the purpose of identifying phenotype-associated mutations, and no suspected phenotype-associated mutations were identified.
  • US Patent 6,228,586 (issued May 8, 2001) and US Patent 6,280,953 (issued August 28, 2001) describe methods for identifying polynucleotide and polypeptide sequences in human and/or non-human primates, which may be associated with a physiological condition. The methods employ comparison of human and non-human primate sequences using statistical methods.
  • U.S. Patent 6,274,319 (issued August 14, 2001) describes Ka/Ks methods for identifying polynucleotide and polypeptide sequences that may be associated with commercially or aesthetically relevant traits in domesticated plants or animals. The methods employ comparison of homologous genes from the domesticated organism and its wild ancestor to identify evolutionarily significant changes.
  • Neutrality testing, including Ka/Ks analysis, is only applied to interspecific, not intraspecific, comparisons, and only genes from the nuclear genome, not from organelle genomes, are analyzed.
  • Microarray technologies are intimately connected with the Human Genome Project, which has development of rapid methods of nucleic acid sequencing and genome analysis as key objectives (E. Marshall, (1995) Science 268:1270), as well as elucidation of sequence-function relationships (M. Schena et al., (1996) Proc. Nat'l. Acad. Sci. USA, 93:10614).
  • The Affymetrix GeneChip® HuSNP™ Array enables whole-genome surveys by simultaneously tracking nearly 1,500 genetic variations, known as single nucleotide polymorphisms (SNPs), dispersed throughout the genome.
  • The HuSNP Affymetrix Array is being used for familial linkage studies that aim to map inherited disease or drug susceptibilities as well as for tracking de novo genetic alterations.
  • arrays rely on multiple probes to interrogate individual nucleotides in a sequence. The identity of a target base can be deduced using four identical probes that vary only in the target position, each containing one of the four possible bases. Alternatively, the presence of a consensus sequence can be tested using one or two probes representing specific alleles.
  • arrays with many probes can be created to provide redundant information.
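The four-probe interrogation and the use of redundant probes described above can be sketched as follows. This is a minimal illustration, not the base-calling algorithm of any particular array platform: the function names, the `min_ratio` threshold of 2.0, and the majority-vote rule are assumptions introduced here. The sketch also assumes that the brightest probe is the perfectly complementary one, so the target base is reported as the complement of that probe's variable base.

```python
from collections import Counter

# Watson-Crick complements, used to convert the best probe base to the target base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_target_base(intensities, min_ratio=2.0):
    """Call the target base from four probes that differ only at one position.

    intensities maps each probe's variable base ('A', 'C', 'G', 'T') to its
    hybridization signal. Returns 'N' when the strongest signal is not at
    least min_ratio times the runner-up (hypothetical ambiguity rule).
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (probe_base, best), (_, runner_up) = ranked[0], ranked[1]
    if best <= 0 or (runner_up > 0 and best / runner_up < min_ratio):
        return "N"
    return COMPLEMENT[probe_base]


def consensus_call(replicates, min_ratio=2.0):
    """Combine redundant probe sets by majority vote over unambiguous calls."""
    calls = [call_target_base(r, min_ratio) for r in replicates]
    informative = [c for c in calls if c != "N"]
    if not informative:
        return "N"
    base, count = Counter(informative).most_common(1)[0]
    return base if count > len(informative) / 2 else "N"
```

Returning "N" rather than forcing a call keeps ambiguous positions visible, which is the point of printing redundant probes in the first place.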
  • Arrays, also called DNA microarrays or DNA chips, are fabricated by high-speed robotics, generally on glass but sometimes on nylon substrates, for which probes (Phimister, B. (1999) Nature Genetics 21s:1-60) with known identity are used to determine complementary binding.
  • An experiment with a single DNA chip can provide researchers information on thousands of genes simultaneously.
  • Many strategies have been investigated at each of the steps in the design and implementation of a DNA array experiment: 1) DNA types; 2) Chip fabrication; 3) Sample preparation; 4) Assay; 5) Readout; and 6) Software (informatics).
  • Format I consists of probe cDNA (500~5,000 bases long) immobilized to a solid surface such as glass using robot spotting and exposed to a set of targets either separately or in a mixture. This method, "traditionally" called DNA microarray, is widely considered as having been developed at Stanford University. (R. Ekins and F.W. Chu "Microarrays: their origins and applications," [1999] Trends in Biotechnology, 17:217-218).
  • Format II consists of an array of oligonucleotide (20~80-mer oligos) or peptide nucleic acid (PNA) probes synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array is exposed to labeled sample DNA, hybridized, and the identity/abundance of complementary sequences is determined.
  • This method, "historically" called DNA chips, was developed at Affymetrix, Inc., which sells its photolithographically fabricated products under the GeneChip® trademark. Many companies are manufacturing oligonucleotide-based chips using alternative in-situ synthesis or depositioning technologies.
  • Probes on arrays can be hybridized with fluorescently-labeled target polynucleotides and the hybridized array can be scanned by means of scanning fluorescence microscopy.
  • the fluorescence patterns are then analyzed by an algorithm that determines the extent of mismatch content, identifies polymorphisms, and provides some general sequencing information (M. Chee et al., [1996] Science 274:610). Selectivity is afforded in this system by low stringency washes to rinse away non-selectively adsorbed materials. Subsequent analysis of relative binding signals from array elements determines where base-pair mismatches may exist. This method then relies on conventional chemical methods to maximize stringency, and automated pattern recognition processing is used to discriminate between fully complementary and partially complementary binding.
  • Devices such as standard nucleic acid microarrays or gene chips require data processing algorithms and the use of sample redundancy (i.e., many of the same types of array elements for statistically significant data interpretation and avoidance of anomalies) to provide semi-quantitative analysis of polymorphisms or levels of mismatch between the target sequence and sequences immobilized on the device surface.
  • Labels appropriate for array analysis are known in the art. Examples are the two-color fluorescent systems, such as Cy3/Cy5 and Cy3.5/Cy5.5 phosphoramidites (Glen Research, Sterling, Virginia). Patents covering cyanine dyes include: U.S. 6,114,350 (Sept. 5, 2000); U.S. 6,197,956 (March 6, 2001); U.S. 6,204,389 (March 20, 2001) and U.S. 6,224,644 (May 1, 2001). Array printers and readers are available in the art.
  • The high mutation rate of human mitochondrial DNA has been thought to result in the accumulation of a wide range of neutral, population-specific base substitutions in mtDNA. These have accumulated sequentially along radiating maternal lineages that have diverged approximately on the same time scale as human populations have colonized different geographical regions of the world. About 76% of all African mtDNAs fall into haplogroup L, defined by an HpaI restriction site gain at bp 3592. 77% of Asian mtDNAs are encompassed within a super-haplogroup defined by a DdeI site gain at bp 10394 and an AluI site gain at bp 10397. Essentially all native American mtDNAs fall into four haplogroups, A-D.
  • Haplogroup A is defined by a HaeIII site gain at bp 663, B by a 9 bp deletion between bp 8271 to bp 8281, C by a HincII site loss at bp 13259, and D by an AluI site loss at bp 5176.
  • Ten haplogroups encompass almost all mtDNAs in European populations. The ten mtDNA haplogroups of Europeans can be surveyed by using a combination of data from RFLP analysis of the coding region and sequencing of the hypervariable segment I. About 99% of European mtDNAs fall into one of ten haplogroups: H, I, J, K, M, T, U, V, W or X.
  • This invention provides human mtDNA polymorphisms that are diagnostic of all the major human haplogroups and methods of diagnosing those haplogroups and selected subhaplogroups.
  • This invention also provides methods for identifying evolutionarily significant mitochondrial DNA genes, nucleotide alleles, and amino acid alleles.
  • Evolutionarily significant genes and alleles are identified using one or two populations of a single species.
  • The process of identifying evolutionarily significant nucleotide alleles involves identifying evolutionarily significant genes and then evolutionarily significant nucleotide alleles in those genes, and identifying evolutionarily significant amino acid alleles involves identifying amino acids encoded by all nonsynonymous alleles.
  • Synonymous codings of the nucleotide alleles encoding evolutionarily significant amino acid alleles of this invention are equivalent to the evolutionarily significant amino acid alleles disclosed herein and are included within the scope of this invention.
  • Synonymous codings include alleles at neighboring nucleotide loci that are within the same codon.
  • This invention also provides methods for associating haplogroups and evolutionarily significant nucleotide and amino acid alleles with predispositions to physiological conditions.
  • Methods for diagnosing predisposition to LHON, and methods for diagnosing increased likelihood of developing blindness, centenaria, and increased longevity that are not dependent on the geographical location of the individual being diagnosed are provided herein.
  • Diagnosis of an individual with a predisposition to an energy metabolism-related physiological condition is dependent on the geographic region of the individual.
  • Physiological conditions diagnosable by the methods of this invention include healthy conditions and pathological conditions.
  • Physiological conditions that are associated with haplogroups and with alleles provided by this invention include energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.
  • Molecules having sequences provided by this invention are provided in libraries and on genotyping arrays.
  • This invention provides methods of making and using the genotyping arrays of this invention.
  • the arrays of this invention are useful for determining the presence and absence of nucleotide alleles of this invention, for determining a haplogroup, and for diagnosis.
  • This invention also provides machine-readable storage devices and program devices for storing data and programmed methods for diagnosing haplogroups and physiological conditions.
  • FIG. 1 shows a consensus neighbor-joining tree of 104 human mtDNA complete sequences and two primate sequences. Numbers correspond to bootstrap values (% of 500 total bootstrap replicates) (Felsenstein, J. (1993) PHYLIP (Phylogeny Inference Package) 3.53c. Distributed by author, Department of Genetics, University of Washington, Seattle, WA). Maximum Likelihood (ML) and UPGMA yielded consistent branching orders with respect to continent-specific mtDNA haplogroups. Sequences: 11-53: Genbank AF346963-AF347015 (4); E21U: Genbank X93334, AlLla: Genbank D38112, cam revise: Genbank NC_001807 corrected according to (R. M.
  • Haplogroups A, B, C, D, and X were drawn from both Eurasia and the Americas. Haplogroup names are designated with capital letters. P. paniscus and P. troglodytes mtDNA sequences were used as outgroups. Haplogroups L0 and L1 encompass previously assigned L1a and L1b mtDNAs, respectively (Y. S. Chen et al., American Journal of Human Genetics 66, 1362-1383 (2000)).
  • FIG. 2 shows the migrations of human haplogroups around the world. +/-, +/+, or -/- equals DdeI 10394 and AluI 10397. * equals RsaI 16329.
  • the mutation rate is 2.2-2.9% per million years. Time estimates are YBP (years before present).
  • FIG. 3 shows a cladogram listing nucleotide alleles describing 21 major human haplogroups, 21 sub-haplogroups, and several macro-haplogroups. The groups on the left are described by the alleles to their right. A vertical bar designates that each group to the left of the bar has all of the alleles to the right of the bar.
  • FIG. 4 shows the selective constraint (kc values) of mtDNA protein genes with comparisons among mammalian species.
  • Statistical significance (P < 0.05) was determined using ANOVA, t-tests or the Tukey-Kramer Multiple Comparisons tests. Most programs used are from DNAsp (J. Rozas and R. Rozas, (1999) Bioinformatics 15:174-5). DNA sequence divergence was analyzed using the DIVERGE program (Wisconsin Package Version 10.0, Genetics Computer Group (GCG), Madison, WI).
  • Table 1 shows human mitochondrial nucleotide alleles, which have been associated with physiological conditions.
  • columns three nucleotide locus
  • five physiological condition nucleotide allele
  • column two physiological condition
  • MITOMAP: A Human Mitochondrial Genome Database. Center for Molecular Medicine, Emory University, Atlanta, GA, USA. http://www.gen.emory.edu/mitomap.html, 2001.
  • Codon usage for mtDNA differs slightly from the universal code. For example, UGA codes for tryptophan instead of termination, AUA codes for methionine instead of isoleucine, and AGA and AGG are terminators instead of coding for arginine.
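The codon differences listed above can be written down as a small lookup table. The sketch below builds the standard genetic code from its conventional 64-letter layout (base order T, C, A, G) and then applies only the four changes named in the text; the names `STANDARD_CODE`, `MITO_CODE`, and `translate_codon` are hypothetical and introduced here for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering; '*' marks a terminator.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# The differences stated in the text: UGA -> Trp, AUA -> Met, AGA/AGG -> stop.
MITO_OVERRIDES = {"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}
MITO_CODE = {**STANDARD_CODE, **MITO_OVERRIDES}


def translate_codon(codon, mitochondrial=True):
    """Translate one codon (DNA or RNA letters) to a one-letter amino acid."""
    dna = codon.upper().replace("U", "T")
    table = MITO_CODE if mitochondrial else STANDARD_CODE
    return table[dna]
```

For example, `translate_codon("UGA")` gives tryptophan under the mitochondrial table, while `translate_codon("UGA", mitochondrial=False)` gives a terminator.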
  • printing refers to the process of creating an array of nucleic acids on known positions of a solid substrate.
  • the arrays of this invention can be printed by spotting, e.g., applying arrays of probes to a solid substrate, or to the synthesis of probes in place on a solid substrate.
  • glass slide refers to a small piece of glass of the same dimensions as a standard microscope slide.
  • prepared substrate refers to a substrate that is prepared with a substance capable of serving as an attachment medium for attaching the probes to the substrate, such as poly Lysine.
  • sample refers to a composition containing human mitochondrial DNA that can be genotyped.
  • quantitative hybridization refers to hybridization performed under appropriate conditions and using appropriate materials such that the sequence of one nucleotide allele (a single nucleotide polymorphism) can be determined, such as by hybridization of a molecule containing that allele to two or more probes, each containing different alleles at that nucleotide locus, all as is known in the art.
  • physiological condition includes diseased conditions, healthy conditions, and cosmetic conditions.
  • Diseased conditions include, but are not limited to, metabolic diseases such as diabetes, hypertension, and cardiovascular disease.
  • Healthy conditions include, but are not limited to, traits such as increased longevity.
  • Physiological conditions include cosmetic conditions.
  • Cosmetic conditions include, but are not limited to, traits such as amount of body fat.
  • Physiological conditions can change health status in different contexts, such as for the same organism in a different environment. Such different environments for humans are different cultural environments or different climatic contexts such as are found on different continents.
  • neutrality analysis refers to analysis to determine the neutrality of one or more nucleotide alleles and/or the gene containing the allele(s) using at least two alleles of a sequence. Commonly, the alleles in a sequence to be analyzed are divided into two groups, synonymous and nonsynonymous. Codon usage tables showing which codons encode which amino acids are used in this analysis. Codon usage tables for many organisms and genomes are available in the art. If a gene is determined to not be neutral, the gene is determined to have had selection pressure applied to it during evolution, and to be evolutionarily significant. The alleles that change amino acids in the gene (nonsynonymous) are then determined to be non-neutral and evolutionarily significant.
  • Ka/Ks refers to a ratio of the proportion of nonsynonymous differences to the proportion of synonymous differences in a DNA sequence analysis, as is known to the art.
  • The proportion of nonsynonymous differences is the number of nonsynonymous nucleotide substitutions in a sequence per site at which a nonsynonymous substitution could occur.
  • The proportion of synonymous differences is the number of synonymous nucleotide substitutions in a sequence per site at which synonymous substitutions could occur.
  • In an alternative definition, the number of alternative substitutions that could occur at each site is also included. Either definition may be used as long as similar definitions are used for both Ka and Ks in an analysis.
  • Kc is Ka/Ks.
  • nonsynonymous refers to mutations that result in changes to the encoded amino acid.
  • synonymous refers to mutations that do not result in changes to the encoded amino acids.
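The Ka/Ks definitions above reduce to two proportions and their ratio. The sketch below assumes the substitution counts and the numbers of available sites have already been tallied by some other procedure (the text does not specify the counting method); the function names and the handling of Ks = 0 are assumptions made for illustration.

```python
def proportion(substitutions, sites):
    """Substitutions per site at which that kind of substitution could occur."""
    if sites <= 0:
        raise ValueError("number of sites must be positive")
    return substitutions / sites


def ka_ks_ratio(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Return Kc = Ka / Ks from pre-tallied counts.

    When Ks is zero the ratio is undefined; this sketch returns infinity if
    any nonsynonymous change was seen and NaN otherwise (an assumption).
    """
    ka = proportion(nonsyn_subs, nonsyn_sites)
    ks = proportion(syn_subs, syn_sites)
    if ks == 0:
        return float("inf") if ka > 0 else float("nan")
    return ka / ks
```

With 3 nonsynonymous substitutions over 300 nonsynonymous sites and 10 synonymous substitutions over 100 synonymous sites, Ka is 0.01, Ks is 0.1, and the ratio is about 0.1.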
  • haplogroup refers to radiating lineages on the human evolutionary tree, as is known in the art.
  • macro-haplogroup refers to a group of evolutionarily related haplogroups.
  • sub-haplogroup refers to an evolutionarily related subset of a haplogroup. An individual's haplotype is the haplogroup to which he belongs.
  • extended longevity or extended lifespan refers to living longer than the average expected lifespan for the population to which one belongs.
  • centenaria refers to an extended lifespan that is at least 100 years.
  • abnormal energy metabolism in an individual who is non-native to the geographical region in which he lives refers to energy metabolism that differs from that of the population that is native to where the individual lives.
  • abnormal temperature regulation in such an individual refers to temperature regulation that differs from that of the population that is native to where he lives.
  • abnormal oxidative phosphorylation in such an individual refers to oxidative phosphorylation that differs from that of the population that is native to where he lives.
  • abnormal electron transport in such an individual refers to electron transport that differs from that of the population that is native to where he lives.
  • metabolic disease of such an individual refers to metabolism that differs from that of the population that is native to where he lives.
  • energetic imbalance of such an individual refers to a balance of energy generation or use that differs from that of the population that is native to where he lives.
  • obesity of such an individual refers to a body weight that, for the height of the individual, is 20% higher than the average body weight that is recommended for the population native to where the individual lives.
  • amount of body fat of such an individual refers to a low or high percentage of body fat relative to what is recommended for the population that is native to where he lives.
  • an isolated nucleic acid is a nucleic acid outside of the context in which it is found in nature.
  • the term covers, for example: (a) a DNA which has the sequence of part of a naturally-occurring genomic DNA molecule but is not flanked by both of the coding or noncoding sequences that flank that part of the molecule in the genome of the organism in which it naturally occurs; (b) a nucleic acid incorporated into a vector or into the genomic DNA of a prokaryote or eukaryote in a manner such that the resulting molecule is not identical to any naturally-occurring vector or genomic DNA; (c) a separate molecule such as a cDNA, a genomic fragment, a fragment produced by polymerase chain reaction (PCR), or a restriction fragment; and (d) a recombinant nucleotide sequence that is part of a hybrid gene, i.e., a gene encoding a fusion protein, or a modified gene having a sequence not
  • nucleotide locus refers to a nucleotide position of the human mitochondrial genome.
  • the Cambridge sequence SEQ ID NO:2 is used as a reference sequence, and the positions of the mitochondrial genome referred to herein are assigned relative to that sequence.
  • loci refers to more than one locus.
  • nucleotide allele refers to a single nucleotide at a selected nucleotide locus from a selected sequence when different bases occur naturally at that locus in different individuals.
  • nucleotide allele information is provided herein as the nucleotide locus number and the base that is at that locus, such as 3796C, which means that at human mitochondrial position 3796 in the Cambridge sequence, there is a cytosine (C).
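The locus-plus-base notation for nucleotide alleles (for example 3796C) is easy to parse mechanically. The following is a minimal sketch; the `NucleotideAllele` type and `parse_allele` function are hypothetical names, and the 16,569 bp length of the Cambridge reference sequence used for range checking is a known figure that the text itself does not state.

```python
import re
from typing import NamedTuple

# Length of the Cambridge reference sequence in base pairs (not given in the text).
MTDNA_LENGTH = 16569

_ALLELE_RE = re.compile(r"^\s*(\d+)\s*([ACGT])\s*$", re.IGNORECASE)


class NucleotideAllele(NamedTuple):
    locus: int  # position relative to the Cambridge sequence
    base: str   # one of A, C, G, T


def parse_allele(text):
    """Parse notation such as '3796C' into (locus, base).

    Whitespace between number and base is tolerated, since scanned text
    sometimes separates them.
    """
    match = _ALLELE_RE.match(text)
    if match is None:
        raise ValueError(f"not a nucleotide allele: {text!r}")
    locus = int(match.group(1))
    if not 1 <= locus <= MTDNA_LENGTH:
        raise ValueError(f"locus {locus} is outside 1..{MTDNA_LENGTH}")
    return NucleotideAllele(locus, match.group(2).upper())
```

Normalizing alleles to a structured form like this makes it straightforward to compare a sample against a table of reference alleles.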
  • amino acid allele refers to the amino acid that is at a selected amino acid location in the human mitochondrial genome when different amino acids occur naturally at that location in different individuals.
  • ntl 15884 P means that there is a proline (P) encoded by the codon containing nucleotide locus 15884.
  • “evolutionarily significant gene” refers to a gene that has statistically significantly more nonsynonymous nucleotide changes, when compared to the corresponding gene in another individual, than would be expected by chance.
  • “evolutionarily significant nucleotide allele” refers to a nucleotide allele that is located in a gene that has been determined to be evolutionarily significant using that nucleotide allele, or an equivalent nucleotide allele in a corresponding gene in another individual.
  • “intraspecific” means within one species.
  • “subpopulation” refers to a population within a larger population. A subpopulation can be as small as one individual.
  • “geographic region” refers to a geographic area in which a statistically significant number of individuals have the same haplotype.
  • being “native” to a geographic region refers to having the haplotype associated with that geographic region.
  • the haplotype associated with a geographic region is that which originated in the region or that of many individuals who settled historically in the region with respect to human evolution.
  • “target” or “target sample” refers to the collection of nucleic acids used as a sample for array analysis.
  • the target is interrogated by the probes of the array.
  • a “target” or “target sample” may be a mixture of several samples that are combined.
  • an experimental target sample may be combined with a differently labeled control target sample and hybridized to an array, the combined samples being referred to as the “target” interrogated by the probes of the array during that experiment.
  • interrogated means tested. Probes, targets, and hybridization conditions are chosen such that the probes are capable of interrogating the target, i.e., of hybridizing to complementary sequences in the target sample.
  • This invention provides a list of human mtDNA polymorphisms found in all the major human haplogroups.
  • Example 1 summarizes data from sequencing over 100 human mtDNA genomes that are representative of the major human haplogroups around the world. The summary includes over 900 point mutations and one nine-base pair deletion.
  • Table 3, Human MtDNA Nucleotide Alleles, lists the alleles identified in 103 such sequences in the third column, the corresponding alleles of the Cambridge mtDNA sequence in the second column, and the nucleotide loci (position in the Cambridge sequence) in the first column.
  • Table 3 lists the set of human mtDNA nucleotide alleles that occur naturally in different haplogroups.
  • Table 3 does not include alleles previously known to be associated with disease (i.e., does not include the alleles of Table 1).
  • Table 4 lists the nucleotide alleles identified by the inventors hereof in 48 human mtDNA genomes in column three, and the corresponding Cambridge alleles in column two. Columns one and three of Table 4 make up the set of non-Cambridge human mtDNA nucleotide alleles in 48 genomes.
  • nucleotide alleles listed in Table 3, including the Cambridge nucleotide alleles, being naturally occurring, are useful for identifying alleles that are associated with abnormal physiological conditions. These nucleotide alleles can be ignored during analysis steps when performing methods for identifying novel alleles associated with selected physiological conditions.
  • certain alleles of Table 3 are useful for identifying physiological conditions related to energy metabolism such as energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease when the affected individuals have the abnormal physiological condition because they are in a geographical region that is not native for their haplogroup.
  • Example 2 summarizes phylogenetic analyses of the sequence data of the 103 individuals and the Cambridge sequence along with two chimpanzee mtDNA sequences. The results are shown in FIG. 1 in a cladogram. Calculations of the time since the most recent common ancestor (MRCA) are shown in Table 5. The 104 individuals were chosen from known haplogroups, and the corresponding haplogroups are labeled on the figure. Combining the sequence data of the 104 individuals with FIG.
  • FIG.2 Example 3
  • FIG. 3 Example 4
  • sub-haplogroups and haplogroups are listed.
  • Macrohaplogroups are shown in parentheses. Nucleotide loci and alleles that are present in all the members of each group (sub-haplo or haplo) are listed.
  • FIG. 3 is drawn as a cladogram.
  • FIG. 3 demonstrates that the macro-haplogroup (R) individuals all contain 12705C and 16223C, and no other individuals are known to have these alleles; therefore, macro-haplogroup (R) can be diagnosed by identifying, in a sample containing mtDNA, the presence of either 12705C or 16223C.
  • macro-haplogroup (N) can be diagnosed by identifying the presence of 8701A, 9540T, or 10873T.
  • the presence of only one particular allele is usually sufficient for diagnosing a haplogroup; however, often it is not known which locus needs to be tested.
  • the haplogroup of an unknown sample can be diagnosed.
  • macro-haplogroups can be diagnosed or excluded first, thereby decreasing the number of loci that need to be tested to distinguish between the remaining, possible haplogroups.
  • Alleles useful for diagnosing macro-haplogroups by methods that require testing only one or a few loci are included in Table 11. Further analysis of the data provided by this invention will demonstrate which sets of alleles identify additional sub-haplogroups and additional macro-haplogroups.
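The single-allele diagnosis of macro-haplogroups described above amounts to checking a sample's alleles against a set of diagnostic markers. The sketch below encodes only the two macro-haplogroups whose markers are spelled out in the text, (R) and (N); the full marker sets of FIG. 3 and Table 11 are not reproduced, and the names `DIAGNOSTIC_ALLELES` and `diagnose_macro_haplogroups` are hypothetical.

```python
# Diagnostic alleles as stated in the text; any one of them suffices.
DIAGNOSTIC_ALLELES = {
    "R": {"12705C", "16223C"},
    "N": {"8701A", "9540T", "10873T"},
}


def diagnose_macro_haplogroups(sample_alleles):
    """Return the macro-haplogroups supported by at least one observed allele.

    sample_alleles is an iterable of strings in locus-plus-base notation,
    e.g. '12705C'. Spaces and letter case are normalized before matching.
    """
    observed = {a.replace(" ", "").upper() for a in sample_alleles}
    return sorted(
        group
        for group, markers in DIAGNOSTIC_ALLELES.items()
        if observed & markers
    )
```

Screening macro-haplogroups first, as the text suggests, narrows the set of loci that still need to be tested to separate the remaining candidate haplogroups.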
  • Diagnosing the haplogroup of a sample is useful in criminal investigations and forensic analyses. Identifying a sample as belonging to a particular haplogroup, and knowing which alleles have not been associated with a selected physiological condition and context, are useful when identifying novel alleles associated with a selected physiological condition, as described above and in Example 6. Diagnosing the haplogroup of a sample is also useful for identifying a novel allele associated with a selected physiological condition when the novel allele causes the physiological condition only in the genetic context of a particular haplogroup, as shown in Example 6. In Example 6, the list of alleles associated with haplogroups found in Russia was used in the sequence analysis of two Russian LHON families.
  • Example 7 demonstrates the identification of a new primary LHON mutation, 10663C, in complex I, that appears to cause a predisposition to LHON only when associated with haplogroup J.
  • Haplogroup J is defined by a nonsynonymous difference that is useful for diagnosing haplogroup J, 458T in ND5.
  • This invention provides a method of diagnosing a person with a predisposition to LHON and/or to developing early onset blindness by identifying, in a sample containing mtDNA from the person, the nucleotide allele 10663C, or a synonymous nucleotide allele thereof, and also identifying alleles diagnostic of haplogroup J, such as 458T in ND5.
  • ND5 458T is a missense mutation present in all haplogroup J individuals; this particular mutation may be directly involved in causing LHON.
  • ND1 304H is another missense mutation that is present in all haplogroup J individuals, and may also be directly involved in causing LHON. 458T is also present in haplogroup T individuals. Haplogroup J is also associated with a predisposition to centenaria and an extended lifespan. ND5 458T and ND1 304H may also be directly involved in causing the predisposition to centenaria and extended lifespan.
  • Example 8 demonstrates the importance of demographic factors in intercontinental mtDNA sequence radiation. Haplogroups are combined and separated into various populations for statistical analyses.
  • nucleotide loci in Table 3 are located in the mitochondrial protein-coding genes (Table 2). Of those loci, some of the identified nucleotide alleles alter the protein encoded by the codon in which the nucleotide locus resides. This is determined using the mitochondrial codon usage table, as is known in the art. Nucleotide alleles that change an amino acid are called missense mutations, missense polymorphisms, or nonsynonymous differences. Missense polymorphisms alter the protein sequence relative to a compared sequence, but they may still be neutral if they do not affect the function of the encoded protein.
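  • As an illustration of how a codon table is used to decide whether a nucleotide allele is synonymous or nonsynonymous, the following sketch builds the vertebrate mitochondrial genetic code (NCBI translation table 2) and compares the amino acids encoded by a reference codon and a variant codon. The function name and the example codons are assumptions chosen for the illustration and are not taken from the Tables.

```python
from itertools import product

BASES = "TCAG"
# Vertebrate mitochondrial genetic code (NCBI translation table 2), codons in TCAG order.
# Differences from the standard code: TGA = Trp, ATA = Met, AGA/AGG = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(reference_codon, variant_codon):
    """Return (reference amino acid, variant amino acid, kind of difference)."""
    ref_aa = CODON_TABLE[reference_codon.upper()]
    var_aa = CODON_TABLE[variant_codon.upper()]
    kind = "synonymous" if ref_aa == var_aa else "nonsynonymous"
    return ref_aa, var_aa, kind


print(classify_substitution("ATA", "ATG"))  # both codons encode Met in mitochondria
print(classify_substitution("ACA", "GCA"))  # Thr versus Ala
```

  • Classifying a difference as nonsynonymous in this way does not by itself indicate whether the difference is neutral; that question is addressed by the neutrality testing described below.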
  • Neutrality testing of nucleotide alleles first requires neutrality testing of the genes containing those nucleotide alleles. Neutrality testing of one or more genes by comparing two sets of allelic genes from two intraspecific populations was performed, as described in Example 9. Haplogroups were combined to make populations for the comparison. In Example 9, nucleotide alleles from the entire coding region of the mtDNA genome, representing haplogroups native to a geographic region, were combined to make a first population and first set of sequences. Nucleotide alleles of the entire coding region of the mtDNA genome, from haplogroups native to a different geographic region, were combined to make the second population and the second set of sequences.
  • Nucleotide alleles were divided into those encoding synonymous and nonsynonymous differences. The ratio of Ka/Ks for each gene, separated by the population containing the allele, is shown in Table 12.
  • Neutrality testing of genes by comparing one set of at least two nucleotide alleles of at least one gene from one population of one species was performed in Example 10.
  • sequences of the entire coding region of the mtDNA genome, of haplogroups in all geographic regions on earth were combined to make one population and set of sequences for analysis.
  • FIG 4 shows the results of the comparison of one set of sequences from one population of only one species, 104 human sequences.
  • Example 11 includes comparisons of sets of sequences between two populations, human vs. P. paniscus, human vs. P. troglodytes, human vs. eight other primate species, and human vs. thirteen mammalian species.
  • nucleotide sequences representing parts of genes or one or more whole genes are useful.
  • the sets of sequences are compared to each other by neutrality analysis. Differences in the sequences from each set are determined to be synonymous or nonsynonymous differences. The proportion of nonsynonymous differences is compared to the proportion of synonymous differences (Ka/Ks).
  • the results of the analysis are compiled in a data set and the data set is analyzed, as is known in the art, to identify one or more evolutionarily significant genes.
  • the gene or part of the gene is determined to be evolutionarily significant.
  • If the synonymous differences occur significantly more often, relative to the nonsynonymous differences, than is expected by chance, the gene or part of the gene is determined to be conserved.
  • If the ratio is as expected by chance, then there is no evidence of selection or evolutionary significance.
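  • The comparison of proportions described above can be sketched as follows. This is a simplified illustration under stated assumptions: the counts of differences and of synonymous and nonsynonymous sites are supplied by the caller, no correction for multiple substitutions at one site is applied, and the statistical significance test that the method relies on is not shown. The function name and the example counts are hypothetical.

```python
def ka_ks_ratio(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of the nonsynonymous proportion (Ka) to the synonymous proportion (Ks).

    Uncorrected proportions are used: Ka = nonsyn_diffs / nonsyn_sites and
    Ks = syn_diffs / syn_sites. Both site counts must be positive.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    ka = nonsyn_diffs / nonsyn_sites
    ks = syn_diffs / syn_sites
    if ks == 0:
        # No synonymous differences observed: the ratio is undefined or unbounded.
        return float("inf") if ka > 0 else float("nan")
    return ka / ks


# Hypothetical counts for one gene compared between two populations.
ratio = ka_ks_ratio(nonsyn_diffs=6, nonsyn_sites=300, syn_diffs=10, syn_sites=100)
print(round(ratio, 3))
```

  • The ratio alone does not establish evolutionary significance; it is the input to a significance test that asks whether the observed ratio departs from the ratio expected by chance.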
  • nucleotide sequences from only one population may also be analyzed, e.g., the nucleotide sequences representative of humans living on one continent.
  • the set must contain at least two corresponding nucleotide alleles (i.e., there must be sequence polymorphism).
  • Corresponding sequences are sequences of the same gene or gene part from at least two individuals. The sequences from different individuals within the population must contain polymorphisms with respect to each other. Differences in the sequences relative to each other are determined to be synonymous or nonsynonymous.
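  • The requirement that corresponding sequences be polymorphic with respect to each other can be checked with a short sketch such as the following, which assumes the sequences have already been aligned to the same length. The function name and the toy sequences are hypothetical and are not taken from the sequence data of this invention.

```python
def polymorphic_sites(sequences):
    """Map each polymorphic position (1-based) to the sorted alleles seen there."""
    if len(sequences) < 2:
        raise ValueError("need sequences from at least two individuals")
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for position, column in enumerate(zip(*sequences), start=1):
        alleles = set(column)
        if len(alleles) > 1:
            sites[position] = sorted(alleles)
    return sites


# Three toy sequences of the same gene part from three individuals.
print(polymorphic_sites(["ATGACA", "ATGGCA", "ATGACA"]))
```

  • An empty result indicates that the set contains no sequence polymorphism and is therefore not suitable for the single-population analysis.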
  • Neutrality analysis is performed to generate a data set. The data set is analyzed to identify an evolutionarily significant gene.
  • the set of nucleotide sequences can be increased, such as by increasing the size of the population from which the sequences are derived, to determine if one or more genes are evolutionarily significant in the enlarged population.
  • Example 12 is similar to Example 9 except that the data is further analyzed by manipulating Ka/Ks to Kc. Examples 9-12 demonstrate that all but one mtDNA gene are not neutral and therefore are evolutionarily significant. Genes are determined to not be neutral by statistical significance tests known in the art. Some genes are only evolutionarily significant when comparing selected populations.
  • ND4 was demonstrated to be significant when comparing Native American sequences to African sequences and when comparing all human sequences to each other, but not when comparing European to African sequences.
  • ND4L is the only mtDNA gene not shown to be evolutionarily significant by the current analyses. ND4L might be demonstrated to be evolutionarily significant by the methods of this invention using one or more different populations or using only part of the gene sequence. In Examples 9-12, the entire sequence of each gene was used for analysis; however, portions of genes are also useful in the methods of this invention. The statistical significance tests prevent too small a gene portion from being used to determine non-neutrality.
  • evolutionarily significant nucleotide alleles can be identified.
  • the steps for identifying an evolutionarily significant gene, using one or two populations are performed with the addition of a step of analyzing the sequence data set to determine an evolutionarily significant nucleotide allele.
  • An evolutionarily significant nucleotide allele is part of a sequence encoding an allelic amino acid in an evolutionarily significant gene or part of a gene. Examples 13 and 14 demonstrate identification of evolutionarily significant nucleotide alleles and evolutionarily significant amino acid alleles in the evolutionarily significant genes identified in Examples 9-12.
  • Evolutionarily significant amino acid alleles are the amino acids encoded by the codons containing evolutionarily significant nucleotide alleles.
  • nucleotides at loci not listed in Table 3 are identical to the Cambridge sequence so that the entire codon containing an evolutionarily significant nucleotide allele and the amino acid encoded by that codon can be determined.
  • All nucleotide alleles that are part of a codon encoding the same amino acid as an evolutionarily significant amino acid allele identified herein, or identified by methods of this invention, are also evolutionarily significant and are intended to be within the scope of this invention.
  • An evolutionarily significant amino acid allele may include more than one nucleotide allele, such as at two neighboring nucleotide loci.
  • Evolutionarily significant nucleotide alleles and evolutionarily significant amino acid alleles in human mitochondrial sequences, identified by the methods of this invention, are listed in Table 14. In column one, Table 14 lists the gene containing the alleles, column two indicates the locus of the nucleotide allele, column three lists the Cambridge nucleotide allele at that nucleotide locus, column four lists a non-Cambridge allele of this invention, column five lists the amino acid encoded by the codon containing the Cambridge nucleotide allele (when other Cambridge nucleotides are present at the other nucleotide loci of the codon), and column six lists the amino acid encoded by the codon containing the non-Cambridge allele (when Cambridge nucleotides are present at the other nucleotide loci of the codon).
  • Table 14 designates the nucleotide locus of the listed alleles. For the amino acid alleles listed in columns five and six, the relevant loci are all three nucleotide loci in the encoding codon containing the nucleotide locus listed in column two.
  • To identify an evolutionarily significant amino acid allele, the steps for identifying an evolutionarily significant gene, using one or two populations, are performed with the addition of two steps: 1) analyzing the data set to determine an evolutionarily significant nucleotide allele; and 2) determining the encoded amino acid allele.
  • An evolutionarily significant amino acid allele is a different amino acid, representing a nonsynonymous difference, relative to the corresponding amino acid allele against which it was compared, wherein the gene has been determined to be evolutionarily significant in the corresponding one or more populations.
  • amino acid substitution mutations are much more common in human mtDNAs than would be expected by chance, and that most of them are evolutionarily significant.
  • This invention demonstrates that these alleles have become fixed by selection.
  • the mitochondrial genes encode proteins that are responsible for generating energy and for generating heat to maintain body temperature. As humans migrated to different parts of the world, they encountered changes in diet and climate. The high mutation rate of mtDNA and the central role of mitochondrial proteins in cellular energetics make the mtDNA an ideal system for permitting rapid mammalian adaptation to varying climatic and dietary conditions.
  • the increased amino acid sequence variability that has been found among human mtDNA genes is due to the fact that natural selection favored mtDNA alleles that altered the coupling efficiency between the electron transport chain (ETC) and ATP synthesis, determined by the mitochondrial inner membrane proton gradient.
  • the coupling efficiency between the ETC and ATP synthesis is mediated to a considerable extent by the proton channel of the ATP synthase, which is composed of the mtDNA-encoded ATP6 protein and the nuclear DNA-encoded ATP9 protein. Mutations in the ATP6 gene, which create a more leaky ATP synthase proton channel, reduced ATP production but increased heat production for each calorie consumed.
  • Modern mtDNA variation has been shaped by adaptation as our ancestors moved into different environmental conditions. Variants that are advantageous in one climatic and dietary environment are maladaptive when individuals relocate to a different environment.
  • the methods of this invention associate mtDNA nucleotide alleles with haplogroups and combine this data with native haplogroup geographic regions as is known in the art, to diagnose individuals as having predispositions to late-onset clinical disorders such as obesity, diabetes, hypertension, and cardiovascular disease when those individuals live in climatic and dietary environments that are disadvantageous with respect to their mtDNA alleles.
  • This invention provides a method of diagnosing a human with a predisposition to a physiological condition such as, but not limited to, energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.
  • the method involves testing a sample containing mitochondrial nucleic acid from an individual in a geographic region to determine the haplogroup of the sample and therefore of the individual, comparing the haplogroup of the individual to the set of haplogroups known to be native to that geographic region, and diagnosing the individual human with a predisposition to the above-mentioned conditions if the haplogroup of the individual is not in the set of haplogroups native to that geographic region.
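  • The comparison step of the method described above, in which the haplogroup of the individual is checked against the set of haplogroups native to the geographic region, can be sketched as a set-membership test. The region and haplogroup labels below are placeholders invented for the illustration; they are not data from this invention, and the function name is likewise hypothetical.

```python
def is_haplogroup_non_native(sample_haplogroup, region, native_haplogroups):
    """Return True when the sample haplogroup is absent from the region's native set."""
    try:
        native = native_haplogroups[region]
    except KeyError:
        raise ValueError(f"no native haplogroup set recorded for {region!r}") from None
    return sample_haplogroup not in native


# Placeholder labels only; a real table would come from the haplogroup data.
NATIVE_HAPLOGROUPS = {
    "region_1": {"hg_a", "hg_b"},
    "region_2": {"hg_c"},
}

print(is_haplogroup_non_native("hg_c", "region_1", NATIVE_HAPLOGROUPS))
print(is_haplogroup_non_native("hg_a", "region_1", NATIVE_HAPLOGROUPS))
```

  • The sketch covers only the comparison; determining the haplogroup of the sample and assembling the native haplogroup sets are separate steps of the method.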
  • This invention enables treatment of one of the above-mentioned conditions that is diagnosed by the above-mentioned method, comprising relocating the diagnosed human to a geographic region that is of similar climate as the region(s) native to the human's haplogroup and/or changing the diagnosed human's diet to more closely match the diet historically available in the region(s) native to the human's haplogroup.
  • the above-described method for diagnosing a predisposition to a physiological condition is also useful for associating an amino acid allele with the physiological condition.
  • the evolutionarily significant amino acid alleles present in the haplogroup of the diagnosed individual and not in the haplogroups native to the individual's geographic location are associated with the physiological condition by the methods of this invention.
  • Amino acid alleles, and the corresponding nucleotide alleles, useful for diagnosing haplogroups, and the haplogroup they are useful for diagnosing, are listed in Table 15.
  • the amino acid alleles and corresponding nucleotide alleles listed in Table 15, and synonymously coding nucleotide alleles are associated with the above-mentioned physiological conditions.
  • Table 15 lists the set of amino acid alleles useful for diagnosing haplogroups. Column one of Table 15 lists the gene, column two lists the nucleotide locus, column three lists the useful nucleotide allele, column four lists the useful amino acid allele encoded by the useful nucleotide allele when Cambridge nucleotides are present at the other nucleotide loci of the encoding codon, and column five lists the haplogroups or sub-haplogroups, in parentheses, that contain the corresponding alleles.
  • the amino acid alleles (column four) can be identified by the codon containing the nucleotide locus (column two).
  • the proline in the ND1 gene is identified as ntl 3796 P, where ntl signifies the codon containing the nucleotide locus (ntl) 3796.
  • the amino acid allele is selected from the group consisting of ntl 14917 D, ntl 8701 T, and ntl 15452 I.
  • the haplogroup is haplogroup W
  • the amino acid allele is selected from the group consisting of ntl 5046 I, ntl 5460 T, ntl 8701 T, and ntl 15884 P.
  • the haplogroup is haplogroup D
  • the amino acid allele is selected from the group consisting of ntl 5178 M and ntl 8414 F.
  • the amino acid allele is selected from the group consisting of ntl 5442 L, ntl 7146 A, ntl 9402 P, ntl 13105 V, and ntl 13276 V.
  • the haplogroup is haplogroup L1
  • the amino acid allele is selected from the group consisting of ntl 7146 A, ntl 7389 H, ntl 13105 V, ntl 13789 H, and ntl 14178 V.
  • the haplogroup is haplogroup C and the amino acid allele is selected from the group consisting of ntl 8584 T and ntl 14318 S.
  • the amino acid allele is ntl 8701 T.
  • for haplogroup J, the amino acid allele is selected from the group consisting of ntl 8701 T, ntl 13708 T, and ntl 15452 I.
  • for haplogroups V and H, the amino acid allele is selected from the group consisting of ntl 8701 T and ntl 14766 T.
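  • The ntl designation used above (for example, ntl 3796 P) can be handled programmatically with a small parser. The following sketch is illustrative only: the regular expression, the function names, and the dictionary layout are assumptions, and the dictionary contains only the haplogroup C and haplogroup J allele groups quoted above.

```python
import re

# "ntl", a nucleotide locus, and a one-letter amino acid code, e.g. "ntl 3796 P".
NTL_PATTERN = re.compile(r"^ntl\s+(\d+)\s+([A-Z])$")


def parse_ntl(label):
    """Split an ntl designation into (nucleotide locus, amino acid)."""
    match = NTL_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not an ntl designation: {label!r}")
    return int(match.group(1)), match.group(2)


# Allele groups quoted in the text for two haplogroups.
HAPLOGROUP_AMINO_ACID_ALLELES = {
    "C": ["ntl 8584 T", "ntl 14318 S"],
    "J": ["ntl 8701 T", "ntl 13708 T", "ntl 15452 I"],
}


def haplogroups_with_allele(locus, amino_acid):
    """List the haplogroups (of those tabulated here) whose group includes the allele."""
    return sorted(
        group
        for group, labels in HAPLOGROUP_AMINO_ACID_ALLELES.items()
        if (locus, amino_acid) in {parse_ntl(label) for label in labels}
    )


print(parse_ntl("ntl 3796 P"))
print(haplogroups_with_allele(13708, "T"))
```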
  • nucleotide and amino acid alleles also exist in nuclear-encoded ATP9 that are useful for diagnosing predisposition to an energy metabolism-related physiological condition such as energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, centenaria, diabetes, hypertension, and cardiovascular disease. These alleles may be identified by methods of this invention.
  • the evolutionarily significant amino acid alleles and corresponding nucleotide alleles are candidates for alleles causing a physiological condition for which a predisposition is diagnosable by the methods of this invention.
  • the evolutionarily significant amino acid and nucleotide alleles identified by the methods of this invention (Table 19) are useful for gene therapy and mitochondrial replacement therapy to treat the corresponding physiological conditions.
  • the evolutionarily significant genes, amino acid alleles, and nucleotide alleles identified by the methods of this invention are useful for identifying targets for traditional therapy, and for designing corresponding therapeutic agents.
  • the evolutionarily significant genes and amino acid and nucleotide changes identified by the methods of this invention are useful for generating animal models of the corresponding human physiological conditions.
  • individuals may contain more than one mitochondrial DNA allele at any given nucleotide locus.
  • One cell contains many mitochondria, and one cell or different cells within one organism may contain genetically different mitochondria.
  • Heteroplasmy is the occurrence of more than one type of mitochondria in an individual or sample. Varying degrees of heteroplasmy are associated with varying degrees of the physiological conditions described herein. Heteroplasmy may be identified by means known to the art, and the severity of the physiological condition associated with specific nucleotide alleles is expected to vary with the percentage of such associated alleles within the individual.
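  • The percentage of an allele within a heteroplasmic sample can be expressed as a simple proportion. The sketch below assumes that per-base observation counts at one nucleotide locus are available (for example, from sequencing reads); that input format, the function name, and the counts shown are assumptions made for the illustration, since the means of identifying heteroplasmy are left to those known in the art.

```python
def allele_percentage(base_counts, allele):
    """Percentage of observations at one locus that carry `allele`."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no observations at this locus")
    return 100.0 * base_counts.get(allele, 0) / total


# Hypothetical counts at a single locus in one sample.
counts = {"C": 30, "T": 70}
print(allele_percentage(counts, "C"))  # 30.0
```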
  • the methods of this invention are used to analyze the human mitochondrial genome in the listed examples, but the methods are also useful for analyzing other genomes and other species.
  • the methods of this invention are useful for identifying evolutionarily significant protein-coding genes and the correspondingly encoded mutations in other genomes in addition to mitochondrial genomes, such as in nuclear and chloroplast genomes.
  • Using human haplogroups as populations (FIG 1) the methods of this invention are useful for identifying evolutionarily significant protein-coding genes and the corresponding evolutionarily significant alleles in human nuclear genes.
  • the methods of this invention are also useful for identifying evolutionarily significant protein-coding genes and the corresponding alleles in many species. For example, the methods of this invention are applicable to varieties of beef or dairy cattle, or pig lines.
  • Corn lines are divisible by phenotypic and/or molecular markers into heterotic groups that are useful populations in the methods of this invention. Using corn heterotic groups as populations, the methods of this invention are useful for identifying evolutionarily significant protein-coding genes and the corresponding mutations in the nuclear, chloroplast, and mitochondrial genomes of corn.
  • This invention provides isolated nucleic acid molecules containing novel nucleotide alleles of this invention in libraries.
  • the libraries contain at least two such molecules. Preferably the molecules have unique sequences.
  • the molecules typically have a length from about 7 to about 30 nucleotides. "About" as used herein means within about 10% (e.g., "about 30 nucleotides" means 27-33 nucleotides). However, the molecules may be longer, such as about 50 nucleotides long.
  • a library of this invention contains at least two isolated nucleic acid molecules each containing at least one non-Cambridge nucleotide allele of this invention.
  • a library of this invention may contain at least ten, twenty-five, fifty, 100, 500 or more isolated nucleic acid molecules, at least one of which contains a nucleotide allele of this invention.
  • a library of this invention may contain molecules having at least two to all of the nucleotide alleles of this invention, including synonymous codings of evolutionarily significant amino acid alleles.
  • the nucleotide alleles of this invention are defined by a nucleotide locus, that is, the nucleotide location in the human mitochondrial genome, and by the nucleotide (A, G, C, or T (or U)) at that locus.
  • An isolated nucleic acid molecule, in a library of this invention, can be identified as containing a nucleotide allele of this invention, because the nucleotide allele of this invention is bounded on at least one side by its context in the mitochondrial genome.
  • Statistically, to be unique in the human mitochondrial genome, such a molecule would need to be at least about seven nucleotides long.
  • Statistically, to be unique in the total human genome, including the mitochondrial genome, such a molecule would need to be at least about fifteen nucleotides long.
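  • The two length estimates above can be reproduced with a back-of-the-envelope calculation: a random sequence of length k is expected to occur about once in a genome of L nucleotides when 4^k is approximately L. The sketch below makes several simplifying assumptions that are not part of the original text: equal base frequencies, a single strand, no repeats, a Cambridge sequence length of 16,569 nucleotides, and a nuclear genome of roughly 3.2 billion nucleotides.

```python
import math


def expected_unique_length(genome_length):
    """Length k at which 4**k equals the genome length (k may be fractional)."""
    return math.log(genome_length, 4)


MTDNA_LENGTH = 16_569             # length of the Cambridge reference sequence
NUCLEAR_LENGTH = 3_200_000_000    # rough size of the human nuclear genome (assumption)

print(round(expected_unique_length(MTDNA_LENGTH), 2))
print(round(expected_unique_length(NUCLEAR_LENGTH + MTDNA_LENGTH), 2))
```

  • Under these assumptions the estimates come out close to the figures of about seven and about fifteen nucleotides quoted above; real sequences are not random, so longer molecules may be needed in practice.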
  • Examples of isolated nucleic acid molecules of this invention are molecules containing the following nucleotide alleles: 1) Cambridge alleles at human mtDNA nucleotide loci 168-170, the non-Cambridge allele 171A, and Cambridge alleles at human mtDNA nucleotide loci 172-174; and 2) Cambridge alleles at 11940-11946, the non-Cambridge allele 11947G, and Cambridge alleles at 11948-11954.
  • An isolated nucleic acid molecule of this invention may contain more than one nucleotide allele of this invention.
  • the nucleotide allele of this invention may be at any position in the isolated nucleic acid molecule.
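  • The construction of such a molecule, with a nucleotide allele bounded by its reference context, can be sketched as follows. The reference string used here is a toy sequence and not the Cambridge sequence; the function name and parameters are hypothetical, loci are numbered from 1, and the circularity of the mitochondrial genome is ignored for simplicity.

```python
def allele_molecule(reference, locus, allele, left=3, right=3):
    """Return the reference context around `locus` with `allele` placed at `locus`.

    `left` and `right` give the number of flanking reference nucleotides, so the
    allele may sit at any position within the returned molecule.
    """
    if len(allele) != 1 or allele not in "ACGT":
        raise ValueError("allele must be a single A, C, G, or T")
    start = locus - left
    end = locus + right
    if start < 1 or end > len(reference):
        raise ValueError("flanks run past the ends of the reference")
    return reference[start - 1:locus - 1] + allele + reference[locus:end]


toy_reference = "ACGTACGTAC"  # placeholder, not mitochondrial sequence
print(allele_molecule(toy_reference, locus=5, allele="G"))  # CGTGCGT
```

  • With three flanking nucleotides on each side, the result has the seven-nucleotide layout of the first example above (three reference nucleotides, the allele, three reference nucleotides); unequal flanks place the allele off-center.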
  • Isolated nucleic acid molecules of this invention are useful for interrogating (that is, determining the presence or absence of) a nucleotide allele at the corresponding nucleotide locus in the mitochondrial genome in a sample containing mitochondrial nucleic acid from a human, using any method known in the art. Methods for determining the presence or absence of the nucleotide allele include allele-specific PCR and nucleic acid array hybridization or sequencing.
  • the alleles and libraries of this invention are useful for designing probes for nucleic acid arrays.
  • This invention provides nucleic acid arrays having two or more nucleic acid molecules or spots (each spot comprising a plurality of substantially identical isolated nucleic acid molecules), each molecule having the sequence of an allele of this invention.
  • the molecules on the arrays of this invention are usually about 7 to about 30 nucleotides long.
  • the arrays are useful for detecting the presence or absence of alleles.
  • Arrays of this invention are also useful for sequencing human mtDNA.
  • Alleles may be selected from sets of nucleotide alleles including human mtDNA nucleotide alleles, non-Cambridge human mtDNA nucleotide alleles, human mtDNA nucleotide alleles in 48 genomes and the Cambridge sequence, non-Cambridge human mtDNA nucleotide alleles in 48 genomes, nucleotide alleles useful for diagnosing human haplogroups and macro-haplogroups, nucleotide alleles useful for diagnosing human haplogroups, and evolutionarily significant human mitochondrial nucleotide alleles as listed in the various Tables and portions of tables hereof.
  • Arrays of this invention may contain molecules capable of interrogating all of the alleles in one of the above-mentioned sets of alleles.
  • a genotyping array useful for detecting sequence polymorphisms, such as those provided by this invention, is similar to Affymetrix (Santa Clara, CA, USA) genotyping arrays containing a Perfect Match probe (PM) and a corresponding Mismatch probe (MM).
  • PM probe could comprise a non-Cambridge allele at a selected nucleotide locus and the corresponding MM probe could comprise the corresponding Cambridge allele at the selected nucleotide locus.
  • Arrays of this invention include sequencing arrays for human mtDNA.
  • array refers to an ordered set of isolated nucleic acid molecules or spots consisting of pluralities of substantially identical isolated nucleic acid molecules. Preferably the molecules are attached to a substrate. The spots or molecules are ordered so that the location of each (on the substrate) is known and the identity of each is known. Arrays on a microscale can be called microarrays. Microarrays on solid substrates, such as glass or other ceramic slides, can be called gene chips or chips.
  • Arrays are preferably printed on solid substrates. Before printing, substrates such as glass slides are prepared to provide a surface useful for binding, as is known to the art. Arrays may be printed using any printing techniques and machines known in the art. Printing involves placing the probes on the substrate, attaching the probes to the substrate, and blocking the substrate to prevent non-specific hybridization. Spots are printed at known locations. Arrays may be printed on glass microscope slides. Alternatively, probes may be synthesized in known positions on prepared solid substrates (Affymetrix, Santa Clara, CA, USA).
  • Arrays of this invention may contain as few as two spots, or more than about ten spots, more than about twenty-five spots, more than about one hundred spots, more than about 1000 spots, more than about 65,000 spots, or up to about several hundred thousand spots.
  • microarrays may require amplification of the sequences of interest (generation of multiple copies of the same sequence), such as by PCR or reverse transcription.
  • As the nucleic acid is copied, it is tagged with a fluorescent label that emits light like a light bulb.
  • the labeled nucleic acid is introduced to the microarray and allowed to react for a period of time. This nucleic acid sticks to, or hybridizes, with the probes on the array when the probe is sufficiently complementary to the labeled, amplified, sample nucleic acid. The extra nucleic acid is washed off of the array, leaving behind only the nucleic acid that has bound to the probes.
  • Arrays of this invention may be made by any array synthesis methods known in the art such as spotting technology or solid phase synthesis.
  • the arrays of this invention are synthesized by solid phase synthesis using a combination of photolithography and combinatorial chemistry.
  • Some of the key elements of probe selection and array design are common to the production of all arrays.
  • Strategies to optimize probe hybridization, for example, are invariably included in the process of probe selection.
  • Hybridization under particular pH, salt, and temperature conditions can be optimized by taking into account melting temperatures and by using empirical rules that correlate with desired hybridization behaviors.
  • Computer models may be used for predicting the intensity and concentration-dependence of probe hybridization. Detecting a particular polymorphism can be accomplished using two probes.
  • One probe is designed to be perfectly complementary to a target sequence, and a partner probe is generated that is identical except for a single base mismatch in its center.
  • these probe pairs are called the Perfect Match probe (PM) and the Mismatch probe (MM). They allow for the quantitation and subtraction of signals caused by non-specific cross-hybridization.
  • the difference in hybridization signals between the partners, as well as their intensity ratios, serve as indicators of specific target abundance, and consequently of the sequence.
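  • The two indicators mentioned above, the difference in hybridization signal between the PM and MM partners and their intensity ratio, can be computed as in the following sketch. The function name and the intensity values are hypothetical, and no particular vendor algorithm is implied.

```python
def pm_mm_summary(pm_intensity, mm_intensity):
    """Return (PM - MM difference, PM / MM ratio) for one probe pair."""
    difference = pm_intensity - mm_intensity
    # A zero or negative mismatch signal leaves the ratio unbounded.
    ratio = pm_intensity / mm_intensity if mm_intensity > 0 else float("inf")
    return difference, ratio


# Hypothetical background-corrected intensities for one probe pair.
print(pm_mm_summary(1200.0, 300.0))  # (900.0, 4.0)
```

  • A large positive difference together with a ratio well above one suggests specific hybridization to the PM probe, whereas similar PM and MM signals suggest non-specific cross-hybridization.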
  • Arrays can rely on multiple probes to interrogate individual nucleotides in a sequence.
  • the identity of a target base can be deduced using four identical probes that vary only in the target position, each containing one of the four possible bases.
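  • Deducing a target base from four probes that differ only at the target position can be sketched as choosing the probe with the strongest signal. The minimum ratio used to reject ambiguous calls is an arbitrary assumption for the illustration, as are the function name and the intensity values.

```python
def call_base(intensities, min_ratio=1.5):
    """Call the base whose probe is brightest, or "N" when the call is ambiguous.

    `intensities` maps each of the four bases to the signal of the probe that
    carries that base at the target position.
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, runner_up) = ranked[0], ranked[1]
    if best <= 0:
        return "N"  # no usable signal from any probe
    if runner_up > 0 and best / runner_up < min_ratio:
        return "N"  # top two probes are too close to separate
    return best_base


print(call_base({"A": 100, "C": 90, "G": 900, "T": 80}))  # G
print(call_base({"A": 500, "C": 480, "G": 20, "T": 10}))  # N
```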
  • the presence of a consensus sequence can be tested using one or two probes representing specific alleles.
  • arrays with many probes can be created to provide redundant information, resulting in unequivocal genotyping.
  • Probes fixed on solid substrates and targets are combined in a hybridization buffer solution and held at an appropriate temperature until annealing occurs. Thereafter, the substrate is washed free of extraneous materials, leaving the target nucleic acids bound to the fixed probe molecules and allowing for detection and quantitation by methods known in the art, such as autoradiography, liquid scintillation counting, and/or fluorescence. As improvements are made in hybridization and detection techniques, they can be readily applied by one of ordinary skill in the art.
  • When the probe molecules and target molecules hybridize by forming a strong noncovalent bond between the two molecules, it can be reasonably assumed that the probe and target nucleic acid are essentially identical, or almost completely complementary, if the annealing and washing steps are carried out under conditions of high stringency.
  • the detectable label provides a means for determining whether hybridization has occurred.
  • the probes may be labeled.
  • the target may instead be labeled by means known to the art.
  • Target may be labeled with radioactive or non-radioactive labels.
  • Targets preferably contain fluorescent labels.
  • Various degrees of stringency of hybridization can be employed. The more stringent the conditions are, the greater the complementarity that is required for duplex formation. Stringency can be controlled by temperature, probe concentration, probe length, ionic strength, time, and the like.
  • Hybridization experiments are often conducted under moderate to high stringency conditions by techniques well known in the art, as described, for example, in Keller, G.H., and M.M. Manak (1987) DNA Probes, Stockton Press, New York, NY, pp. 169-170, hereby incorporated by reference.
  • sequencing arrays typically use lower hybridization stringencies, as is known in the art.
  • Moderate to high stringency conditions for hybridization are known to the art.
  • An example of high stringency conditions for a blot is hybridizing at 68 °C in 5X SSC/5X Denhardt's solution/0.1% SDS, and washing in 0.2X SSC/0.1% SDS at room temperature.
  • An example of conditions of moderate stringency is hybridizing at 68 °C in 5X SSC/5X Denhardt's solution/0.1% SDS and washing at 42 °C in 3X SSC.
  • the parameters of temperature and salt concentration can be varied to achieve the desired level of sequence identity between probe and target nucleic acid. See, e.g., Sambrook et al. (1989) vide infra or Ausubel et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, NY, NY, for further guidance on hybridization conditions.
  • the melting temperature (Tm) is described by the following formula (Beltz, G.A. et al. [1983] Methods in Enzymology, R. Wu, L. Grossman and K. Moldave [Eds.] Academic Press, New York 100:266-285).
  • Tm = 81.5 °C + 16.6 log[Na+] + 0.41(%G+C) - 0.61(% formamide) - 600/(length of duplex in base pairs).
  • Washes can typically be carried out as follows: twice at room temperature for 15 minutes in 1X SSPE, 0.1% SDS (low stringency wash), and once at Tm - 20 °C for 15 minutes in 0.2X SSPE, 0.1% SDS (moderate stringency wash).
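  • The melting temperature formula and the moderate stringency wash temperature derived from it can be evaluated as in the following sketch. The function names and the input values (1.0 M sodium, 50% G+C, no formamide, a 100 base pair duplex) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def melting_temperature(sodium_molar, gc_percent, formamide_percent, duplex_length_bp):
    """Tm in degrees Celsius from the formula quoted in the text."""
    return (
        81.5
        + 16.6 * math.log10(sodium_molar)
        + 0.41 * gc_percent
        - 0.61 * formamide_percent
        - 600.0 / duplex_length_bp
    )


def moderate_stringency_wash_temperature(tm_celsius):
    """Wash temperature of Tm - 20 degrees Celsius, as described in the text."""
    return tm_celsius - 20.0


tm = melting_temperature(
    sodium_molar=1.0, gc_percent=50.0, formamide_percent=0.0, duplex_length_bp=100
)
print(round(tm, 1))                                        # 96.0
print(round(moderate_stringency_wash_temperature(tm), 1))  # 76.0
```

  • With these inputs the sodium term is zero, so the result is 81.5 + 20.5 - 6.0 = 96.0 °C, and the corresponding moderate stringency wash would be carried out at 76.0 °C.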
  • Nucleic acid useful in this invention can be created by Polymerase Chain Reaction (PCR) amplification. PCR products can be confirmed by agarose gel electrophoresis. PCR is a repetitive, enzymatic, primed synthesis of a nucleic acid sequence. This procedure is well known and commonly used by those skilled in this art (see Mullis, U.S. Patent Nos. 4,683,195, 4,683,202, and 4,800,159; Saiki et al. [1985] Science 230:1350-1354). PCR is used to enzymatically amplify a DNA fragment of interest that is flanked by two oligonucleotide primers that hybridize to opposite strands of the target sequence.
  • PCR Polymerase Chain Reaction
  • the primers are oriented with the 3' ends pointing towards each other. Repeated cycles of heat denaturation of the template, annealing of the primers to their complementary sequences, and extension of the annealed primers with a DNA polymerase result in the amplification of the segment defined by the 5' ends of the PCR primers. Since the extension product of each primer can serve as a template for the other primer, each cycle essentially doubles the amount of DNA template produced in the previous cycle. This results in the exponential accumulation of the specific target fragment, up to several million-fold in a few hours.
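The doubling argument above can be checked numerically. The sketch below assumes an idealized reaction in which every cycle multiplies the product by (1 + efficiency); the `efficiency` parameter and both function names are illustrative and do not come from the text.

```python
def fold_amplification(cycles, efficiency=1.0):
    """Fold increase after `cycles` cycles.

    efficiency=1.0 models perfect doubling per cycle (an idealization).
    """
    return (1.0 + efficiency) ** cycles


def cycles_for_fold(target_fold, efficiency=1.0):
    """Smallest number of cycles whose fold increase reaches target_fold."""
    cycles = 0
    fold = 1.0
    while fold < target_fold:
        fold *= 1.0 + efficiency
        cycles += 1
    return cycles
```

Under perfect doubling, `cycles_for_fold(1_000_000)` gives 20, which is consistent with the "several million-fold in a few hours" figure for reactions of a few dozen cycles.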
  • a thermostable DNA polymerase such as the Taq polymerase, which is isolated from the thermophilic bacterium Thermus aquaticus, the amplification process can be completely automated. Other enzymes that can be used are known to those skilled in the art.
  • Polynucleotide sequences of the present invention can be truncated and/or mutated such that certain of the resulting fragments and/or mutants of the original full-length sequence can retain the desired characteristics of the full-length sequence.
  • restriction enzymes that are suitable for generating fragments from larger nucleic acid molecules are well known.
  • Bal31 exonuclease can be conveniently used for time-controlled limited digestion of DNA. See, for example, Maniatis (1982) Molecular Cloning: A Laboratory Manual Cold Spring Harbor Laboratory, New York, pages 135-139, incorporated herein by reference. See also Wei et al. (1983) J. Biol. Chem. 258:13006-13512.
  • Bal31 exonuclease commonly referred to as "erase-a-base" procedures
  • the ordinarily skilled artisan can remove nucleotides from either or both ends of the subject nucleic acids to generate a wide spectrum of fragments that are functionally equivalent to the subject nucleotide sequences.
  • One of ordinary skill in the art can, in this manner, generate hundreds of fragments of controlled, varying lengths from locations all along the original molecule.
  • the ordinarily skilled artisan can routinely test or screen the generated fragments for their characteristics and determine the utility of the fragments as taught herein. It is also well known that the mutant sequences can be easily produced with site-directed mutagenesis. See, for example, Larionov, O.A.
  • Standard techniques for cloning, DNA isolation, amplification and purification, for enzymatic reactions involving DNA ligase, DNA polymerase, restriction endonucleases and the like, and various separation techniques useful herein are those known and commonly employed by those skilled in the art.
  • a number of standard techniques are described in Sambrook et al. (1989) Molecular Cloning, Second Edition, Cold Spring Harbor Laboratory, Plainview, New York; Maniatis et al. (1982) Molecular Cloning, Cold Spring Harbor Laboratory, Plainview, New York; Wu (ed.) (1993) Meth. Enzymol. 218, Part I; Wu (ed.) (1979) Meth. Enzymol. 68; Wu et al.
  • This invention provides machine-readable storage devices and program storage devices having data and methods for diagnosing haplogroups and physiological conditions.
  • One program storage device contains the program steps: a) determining the haplogroup of a sample from an individual using nucleotide sequence data from nucleic acid in the sample; b) associating the haplogroup with information identifying the geographic region of the individual; c) comparing the haplogroup and geographic region of the sample to the set of haplogroups native to the geographic region of the individual; and d) diagnosing the individual with a predisposition to an energy metabolism-related physiological condition if the haplogroup of the individual is not within the set of haplogroups native to the geographic region of the individual; all said program steps being encoded in machine readable form, and all said information encoded in machine readable form.
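Program steps b) through d) reduce to a set-membership test once the haplogroup has been determined. The following is a minimal sketch under stated assumptions: step a) is taken as already done, the region names and haplogroup labels in `NATIVE` are placeholders rather than real data, and the function only returns a flag without making any clinical statement.

```python
def flag_non_native_haplogroup(haplogroup, region, native_haplogroups):
    """Return True when `haplogroup` is not in the native set for `region`.

    native_haplogroups maps a region name to the set of haplogroups
    considered native to it (hypothetical structure).
    """
    if region not in native_haplogroups:
        raise KeyError(f"no native haplogroup set recorded for region {region!r}")
    return haplogroup not in native_haplogroups[region]


# Placeholder data for illustration only; not taken from the document.
NATIVE = {
    "region_1": {"hg_a", "hg_b"},
    "region_2": {"hg_c"},
}
```

Raising an error for an unknown region, rather than returning a flag, keeps a missing reference set from being mistaken for a positive result.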
  • This invention also provides a data set, encoded in machine-readable form, containing nucleotide alleles listed in Table 19, with each allele associated with encoded information identifying a physiological condition in humans.
  • physiological conditions are energy-metabolism-related conditions including energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.
  • This storage device may also contain information associating each allele with one or more native geographic regions.
  • a program storage device provided by this invention contains input means for inputting the haplogroup of an individual and the geographic region of that individual, and contains information associating alleles with native geographic regions, and program steps for diagnosing the individual with a predisposition to a physiological condition.
  • a storage device containing a data set in machine readable form provided by this invention may include encoded information comprising amino acid alleles listed in Table 19, with each allele associated with a physiological condition in humans.
  • This invention provides human mtDNA polymorphisms found in all the major human haplogroups.
  • Table 3 shows naturally occurring nucleotide alleles identified in the complete mtDNA sequences of 103 individuals, as compared to the mtDNA Cambridge sequence. All nucleotide sequences not listed are identical to the Cambridge sequence. Nucleotide alleles previously known to be associated with disease conditions, such as those listed in Table 1, are not listed in Table 3. Some deletion or rearrangement polymorphisms have also been excluded. All polymorphisms listed are nucleotide substitutions except for a nine-adenine nucleotide deletion at positions 8271-8279.
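Since every position not listed is identical to the reference, a sequence can be represented as a list of substitution alleles against that reference. The sketch below assumes alleles are written as a 1-based position followed by the variant base (the "4586C" style used in this document); it handles substitutions only, not the deletion mentioned above, and the short reference string in the usage line is a toy example, not mtDNA.

```python
import re

# Allele notation assumed here: 1-based position followed by the variant base.
ALLELE_PATTERN = re.compile(r"^(\d+)([ACGT])$")


def parse_allele(allele):
    """Split an allele such as '4586C' into (4586, 'C')."""
    match = ALLELE_PATTERN.match(allele)
    if match is None:
        raise ValueError(f"unrecognized allele notation: {allele!r}")
    return int(match.group(1)), match.group(2)


def apply_substitutions(reference, alleles):
    """Return a copy of `reference` carrying the listed substitution alleles."""
    bases = list(reference)
    for allele in alleles:
        position, base = parse_allele(allele)
        if not 1 <= position <= len(bases):
            raise ValueError(f"position {position} is outside the reference")
        bases[position - 1] = base
    return "".join(bases)


# Toy reference used only to show the indexing convention.
variant = apply_substitutions("ACGTACGT", ["2T", "8A"])
```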
  • Table 4 lists the nucleotide alleles identified in 48 mitochondrial genomes as compared to the Cambridge sequence.
  • the mtDNA sequences of Example 1 were chosen because they represent all of the major haplogroup lineages in humans. Analysis of these sequences has reaffirmed that all human mtDNAs belong to a single maternal tree, rooted in Africa (R. L. Cann et al, Nature 325:31-36 (1987); M. J. Johnson et al., (1983) Journal of Molecular Evolution 19:255-271; D. C. Wallace et al., "Global Mitochondrial DNA Variation and the Origin of Native Americans" in The Origin of Humankind, M. Aloisi, B. Battaglia, E. Carafoli, G. A. Danieli, Eds., Venice (IOS Press, 2000); M.
  • the most striking feature of the mtDNA tree is the remarkable reduction in the number of mtDNA lineages that are associated with the transition from one continent to another.
  • the number of mitochondrial lineages was reduced from dozens to two lineages.
  • northeastern Africa encompasses the entire range of African mtDNA variation from the exclusively African haplogroups L0- L2 to the progenitors of the European and Asian mtDNA lineages
  • macro-haplogroups M and N which arose about 65,000 YBP, left Africa to colonize Eurasia.
  • the times of the MRCAs of macro-haplogroups M and N as well as sub-macro-haplogroup R are similar, suggesting rapid population expansion associated with the colonization of Eurasia.
  • alleles are descriptive of the major haplogroups, selected sub-haplogroups, and selected macro-haplogroups.
  • the mtDNA nucleotide positions and the relevant alleles are shown in FIG 3.
  • the data is arranged as a cladogram, such that a group on the left contains all of the alleles to its right.
  • a vertical bar designates that the alleles to the right of the bar are present in all of the groups to the left of the bar.
  • the haplogroup data in FIG. 3 is summarized in Tables 6 and 7.
  • the sub-haplogroup data is summarized in Tables 8 and 9. Each group contains the alleles listed below it.
  • A set of nucleotide alleles useful for diagnosing all of the haplogroups and sub-haplogroups in FIG. 3 is listed in Table 10. There are many equivalent methods for diagnosing the haplogroups. Examples of methods requiring testing only one or a few loci follow. Alleles are identified in human samples containing mtDNA. Haplogroup L0 can be diagnosed by identifying 4586C, 9818T, or 8113A. Haplogroup L1 can be diagnosed by identifying 825A, 2758A, 2885C, 7146G, 8468T, 8655T, 10688A, 10810C, or 13105G.
  • Haplogroup L2 can be diagnosed by identifying 2416C, 2758G, 8206A, 9221G, 11944C, or 16390G.
  • Haplogroup L3 can be diagnosed by identifying 10819G, 14212C, 8618C, 10086C, 16362C, 10398A, or 16124C.
  • Haplogroup C can be diagnosed by identifying 3552C, 4715G, 7196A, 8584A, 9545G, 13263G, 14318C, or 16327T.
  • Haplogroup D can be diagnosed by identifying 4883T, 5178A, 8414T, 14668T, or 15487T.
  • Haplogroup E can be diagnosed by identifying 16227G.
  • Haplogroup G can be diagnosed by identifying 4833G, 8200C, or 16017C.
  • Haplogroup Z can be diagnosed by identifying 11078G, 16185T, or 16260T.
  • Haplogroup A can be diagnosed by identifying 663G, 16290T, or 16319A.
  • Haplogroup I can be diagnosed by identifying 4529T, 10034C, or 16391A.
  • Haplogroup W can be diagnosed by identifying 204C, 207A, 1243C, 5046A, 5460A, 8994A, 11947G, 15884C, or 16292T.
  • Haplogroup X can be diagnosed by identifying 1719A, 3516G, 6221C, or 14470C.
  • Haplogroup F can be diagnosed by identifying 12406A or 16304C.
  • Haplogroup Y can be diagnosed by identifying 7933G, 8392A, 16231C, or 16266T.
  • Haplogroup U can be diagnosed by identifying 3197C, 4646C, 7768G, 9055A, 11332T, 13104G, 14070G, 15907G, 16051G, 16129C, 16172C, 16219G, 16249C, 16270T, 16311T, 16318T, 16343G, or 16356C.
  • Haplogroup J can be diagnosed by identifying 295T, 12612G, 13708A, or 16069T.
  • Haplogroup T can be diagnosed by identifying 11812G, 12633T, 14233G, 16163C, 16186T, 1888A, 4917G, 8697A, 10463C, 13368A, 14905A, 15607G, 15928A, or 16294T.
  • Haplogroup V can be diagnosed by identifying 72C, 4580A, or 15904T.
  • Haplogroup H can be diagnosed by identifying 2706A or 7028C. Diagnosis of haplogroup B is more complicated, requiring three steps.
  • Haplogroup B can be diagnosed by identifying 16189C; and by identifying the absence of 1719A, 3516G, 6221C, 14470C, or 16278T; and by identifying the absence of 1888A, 4216C, 4917G, 8697A, 10463C, 11251G, 11467G, 12308G, 12372A, 12633T, 13104G, 13368A, 14070G, 14905A, 15452A, 15607G, 15928A, 16126C, 16163C, 16186T, 16249C, or 16294T.
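The single-allele methods and the three-step method for haplogroup B can be sketched as set operations on the alleles observed in a sample. This is an illustrative sketch, not the implementation of the invention: only four haplogroups are included to keep it short, the allele lists are copied from the text above, and the two "absence" steps are read here as requiring that none of the listed alleles is present, which is an interpretation.

```python
# Marker alleles copied from the text above (subset of haplogroups only).
SINGLE_ALLELE_MARKERS = {
    "A": {"663G", "16290T", "16319A"},
    "E": {"16227G"},
    "H": {"2706A", "7028C"},
    "V": {"72C", "4580A", "15904T"},
}

B_REQUIRED = "16189C"
B_EXCLUDED_FIRST = {"1719A", "3516G", "6221C", "14470C", "16278T"}
B_EXCLUDED_SECOND = {
    "1888A", "4216C", "4917G", "8697A", "10463C", "11251G", "11467G",
    "12308G", "12372A", "12633T", "13104G", "13368A", "14070G", "14905A",
    "15452A", "15607G", "15928A", "16126C", "16163C", "16186T", "16249C",
    "16294T",
}


def candidate_haplogroups(observed):
    """Haplogroups for which at least one marker allele was observed."""
    observed = set(observed)
    return sorted(
        group
        for group, markers in SINGLE_ALLELE_MARKERS.items()
        if observed & markers
    )


def is_haplogroup_b(observed):
    """Three-step test: 16189C present, and neither exclusion set observed.

    Assumption: 'absence' means none of the listed alleles is present.
    """
    observed = set(observed)
    return (
        B_REQUIRED in observed
        and not (observed & B_EXCLUDED_FIRST)
        and not (observed & B_EXCLUDED_SECOND)
    )
```

Keeping each exclusion step as its own set mirrors the wording of the method and makes it easy to report which step ruled a sample out.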
  • Table 10 Nucleotide Alleles Useful for Diagnosing Human
  • Additional alleles are included in Table 11. These alleles are useful for designing methods equivalent to those described above for diagnosing the haplogroups. Alleles in Table 11 are useful for designing efficient methods for diagnosing macro-haplogroups. The data in Tables 10 and 11 and FIG 3 are also useful for identifying sub-haplogroups.
  • This invention provides a method for diagnosing sub-haplogroup Ll l by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 4586C and 9818T.
  • This invention provides a method for diagnosing sub-haplogroup L1a2 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 8113A and 8251A.
  • This invention provides a method for diagnosing sub-haplogroup L1b1 by identifying in a human sample, the nucleotide allele 2352C and one of the nucleotide alleles selected from the group consisting of 3666A, 7055G, 7389C, 13789C, and 14178C.
  • This invention provides a method for diagnosing sub-haplogroup L1b2 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 3796C, 5951G, 5984G, 6071C, 9072G, 10586A, 12810G, and 13485G.
  • This invention provides a method for diagnosing sub-haplogroup L2a by identifying in a human sample the nucleotide allele 13803G.
  • This invention provides a method for diagnosing sub-haplogroup L2b by identifying in a human sample the nucleotide allele 4158G.
  • This invention provides a method for diagnosing sub-haplogroup L2c by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 325T, 680C, and 13958C.
  • This invention provides a method for diagnosing sub-haplogroup L3a by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 2325C, 10819G, and 14212C.
  • This invention provides a method for diagnosing sub-haplogroup L3b by identifying in a human sample the nucleotide allele 8618C.
  • This invention provides a method for diagnosing sub-haplogroup L3c by identifying in a human sample the nucleotide allele 10086C.
  • This invention provides a method for diagnosing sub-haplogroup L3d by identifying in a human sample the nucleotide allele 10398A.
  • This invention provides a method for diagnosing sub-haplogroup Uk by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 9055A and 16311T.
  • This invention provides a method for diagnosing sub-haplogroup U7 by identifying in a human sample the nucleotide allele 16318T.
  • This invention provides a method for diagnosing sub-haplogroup U6 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 16172C and 16219G.
  • This invention provides a method for diagnosing sub-haplogroup U5 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 3197C, 7768G, and 16270T.
  • This invention provides a method for diagnosing sub-haplogroup U4 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 4646C, 11332T, and 16356C.
  • This invention provides a method for diagnosing sub-haplogroup U3 by identifying in a human sample the nucleotide allele 16343G.
  • This invention provides a method for diagnosing sub-haplogroup U2 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 15907G, 16051G, and 16129C.
  • This invention provides a method for diagnosing sub-haplogroup U1 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 13104G, 14070G, 16189C, and 16249C.
  • This invention provides a method for diagnosing sub-haplogroup T* by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 11812G and 14233G.
  • This invention provides a method for diagnosing sub-haplogroup T1 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 12633T, 16163C, and 16186T.
  • An equivalent method for diagnosing a haplogroup is diagnosing haplogroup L0 by identifying the presence of one of 825A, 2758A, 2885C, 7146G, 8468T, 8655T, 10688A, 10810C, or 13105G; and identifying the absence of one of 3666A, 7055G, 7389C, 13789C, or 14178C.
  • Other equivalent methods can be derived from the data in FIG 3, and are within the scope of this invention.
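Methods that combine required, alternative, and excluded alleles can be expressed with one small rule type. The sketch below is a hypothetical data structure, not part of the disclosure: `AlleleRule` and its field names are invented for illustration, the allele sets are copied from the sub-haplogroup method that requires 2352C plus one further allele and from the presence/absence method in the preceding paragraph, and "absence of one of" is read as "none of these alleles is present", which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleRule:
    """Hypothetical rule: all_of required, any_of alternatives, none_of excluded."""
    name: str
    all_of: frozenset = frozenset()
    any_of: frozenset = frozenset()
    none_of: frozenset = frozenset()

    def matches(self, observed):
        observed = frozenset(observed)
        if not self.all_of <= observed:
            return False
        if self.any_of and not (self.any_of & observed):
            return False
        return not (self.none_of & observed)


FIVE_ALLELES = frozenset({"3666A", "7055G", "7389C", "13789C", "14178C"})

# 2352C plus at least one of the five alleles above.
RULE_2352C_PLUS_ONE = AlleleRule(
    name="2352C plus one further allele",
    all_of=frozenset({"2352C"}),
    any_of=FIVE_ALLELES,
)

# Presence of one allele from the first list, none from the second.
RULE_PRESENCE_ABSENCE = AlleleRule(
    name="presence/absence method",
    any_of=frozenset({"825A", "2758A", "2885C", "7146G", "8468T",
                      "8655T", "10688A", "10810C", "13105G"}),
    none_of=FIVE_ALLELES,
)
```

Expressing each method as data rather than as a separate function makes it straightforward to add further equivalent methods derived from FIG 3 without changing the matching logic.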
  • LHON Lebers Hereditary Optic Neuropathy
  • mtDNA mitochondrial DNA
  • 3460A, 11778A, 14484C, and 14459A are designated "primary" mutations.
  • a new primary LHON mtDNA mutation, 10663C, affecting a Complex I gene was homoplasmic in 3 Caucasian LHON families, all of which belonged to haplogroup J. These 3 families were the only haplogroup J-associated LHON families (out of 17) that did not harbor a known, primary LHON mutation.
  • Comprehensive phylogenetic analysis of haplogroup J using complete mtDNA sequences demonstrated that the 10663C variant has arisen 3 independent times on this background. This mutation was not present in over 200 non-haplogroup J European controls, 74 haplogroup J patient and control mtDNAs, or 36 putative LHON patients without primary mutations.
  • a partial Complex I defect was found in 10663C-containing lymphoblast and cybrid mitochondria.
  • the 10663C mutation has occurred three independent times, each time on haplogroup J and only in LHON patients without a known LHON mutation.
  • the number of non-synonymous to synonymous base substitutions was analyzed for all 13 mtDNA protein genes of those haplogroups which contributed to the colonization of each of the major continental spaces: African, European, and Native American.
  • the mtDNAs from the Asian-Native American haplogroups A, B, C, D and X were combined.
  • the Asian-Native American mtDNAs from the haplogroups were combined because random mutations accumulate in founder populations and those mtDNAs which prove advantageous in new environments are enriched. Hence, the founding mutations of the haplogroup are important in the continental success of the lineage.
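Counting non-synonymous and synonymous substitutions first requires translating codons with the vertebrate mitochondrial genetic code, which differs from the standard code (for example, ATA encodes methionine and TGA encodes tryptophan). The sketch below classifies single codon changes and reports a raw count ratio. It is a simplified illustration and an assumption about how such a tally could be made: it does not normalize by the number of synonymous and non-synonymous sites as Ka/Ks-style statistics do, and the codon pairs in the usage line are hypothetical.

```python
BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change as 'synonymous' or 'non-synonymous'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


def count_ratio(codon_pairs):
    """Return (non-synonymous count, synonymous count, raw ratio or None)."""
    counts = {"synonymous": 0, "non-synonymous": 0}
    for ref_codon, alt_codon in codon_pairs:
        counts[classify_substitution(ref_codon, alt_codon)] += 1
    syn = counts["synonymous"]
    nonsyn = counts["non-synonymous"]
    return nonsyn, syn, (nonsyn / syn if syn else None)


# Hypothetical codon pairs: one Thr-to-Ala change and two silent changes.
result = count_ratio([("ACC", "GCC"), ("ATA", "ATG"), ("CTA", "CTG")])
```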
  • the kc values for each human mtDNA gene were compared across the total global collection of human mtDNA sequences (Figure 4).
  • the ATP6 gene was the least conserved gene in the human mtDNA, though previously it had been shown to be relatively highly conserved in inter-specific comparisons (N. Neckelmann et al., (1987) Proc. Natl. Acad. Sci. USA 84:7580-7584).
  • The higher inter-specific conservation of ATP6 was confirmed by comparing the kc values of human versus chimpanzee (Pan troglodytes) and bonobo (Pan paniscus); human versus eight primate species (baboon, Borneo and Sumatran orangutan, gibbon, gorilla, lowland gorilla, bonobo, and chimpanzee); and human versus 13 diverse mammalian species (bovine, mouse, cat, dog, pig, rat, rhinoceros, horse, gibbon, gorilla, orangutan, bonobo, chimpanzee) (Figure 3).
  • Although ATP6 is highly conserved between species, it is very poorly conserved within humans.
  • mtDNA protein sequence correlates with the climatic transitions that humans would have experienced as they migrated out of tropical and sub-tropical Africa and into temperate Eurasia and arctic Siberia and Beringia.
  • the mtDNA genes that showed the highest amino acid sequence variation between continents were COIII and ATP6.
  • a threonine to alanine substitution at codon 59 (T59A, nucleotide location 8701-8703) in ATP6 separates the mtDNAs of macro-haplogroup N from the rest of the World.
  • the polar threonine at position 59 is conserved in all great apes and some old-world monkeys.
  • Among the haplogroups of macro-haplogroup M, the related Siberian-Native American haplogroups C and Z are delineated by an A20T (nucleotide location 8584-8586) variant.
  • a non-polar amino acid found in this position occurs in all animal species except for Macaca, Papio, Balaenoptera and Drosophila.
  • the non-R lineage N1b harbors two distinctive amino acid substitutions: M104V (nucleotide location 8836-8838) and T146A.
  • the methionine at position 104 is conserved in all mammals, and the threonine at position 146 is conserved throughout all animal mtDNAs. Moreover, the T146A substitution is within the same transmembrane α-helix as the pathogenic mutation L156R that alters the coupling efficiency of the ATP synthase and causes the NARP and Leigh syndromes (I. Trounce, S. Neill, D. C. Wallace, Proceedings of the National Academy of Sciences of the United States of America 91, 8334-8338 (1994)).
  • haplogroup A mtDNAs harbor a H90Y (nucleotide location 8794-8796) amino acid substitution.
  • the histidine in this position is conserved in all placental mammals except Pongo, Cebus and Loxodonta and occurs within a highly conserved region.
  • haplogroup B one mtDNA harbored a F193L (nucleotide location 9103-9105) substitution. This position is conserved in all mammals except Pongo, Papio, Cebus and Erinaceus.
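The nucleotide locations quoted for the ATP6 codons follow from simple arithmetic on the gene start. The sketch below assumes the ATP6 coding sequence begins at nucleotide 8527 in Cambridge numbering; that start position is not stated in the text above, but it reproduces every location given there (codons 20, 59, 90, 104, and 193). The function name is illustrative.

```python
# Assumed first nucleotide of the ATP6 coding sequence (Cambridge numbering).
ATP6_START = 8527


def codon_to_nucleotides(codon_number, gene_start=ATP6_START):
    """Return the (first, last) nucleotide positions of a 1-based codon."""
    if codon_number < 1:
        raise ValueError("codon numbers are 1-based")
    first = gene_start + 3 * (codon_number - 1)
    return first, first + 2
```

Under the same assumption, codon 146 (the T146A substitution, for which no location is quoted) would map to nucleotides 8962-8964.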
  • SEQ ID NO:1 is a theoretical human mtDNA genome sequence containing the nucleotide alleles of this invention as listed in Table 3.
  • SEQ ID NO:2 is the human mtDNA reference sequence called the Cambridge Sequence (GenBank Accession No. J01415).

Abstract

This invention provides human mtDNA polymorphisms that are diagnostic of all the major human haplogroups and methods of diagnosing those haplogroups and selected haplogroups.

Description

HUMAN MITOCHONDRIAL DNA POLYMORPHISMS, HAPLOGROUPS, ASSOCIATIONS WITH PHYSIOLOGICAL CONDITIONS, AND GENOTYPING ARRAYS
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Patent Applications Serial No. 60/316,333 filed August 30, 2001 and Serial No. 60/380,546 filed May 13, 2002, and to Canadian Patent Application No. 2,356,536 filed on August 31, 2001, which are hereby incorporated in their entirety by reference to the extent not inconsistent with the disclosure herein.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
This invention was made in part with funding from the United States Government (NIH grants AG13154, HL64017, NS21328, and NS37167). The United States Government may have certain rights therein.
BACKGROUND OF THE INVENTION
Human mitochondrial DNA (mtDNA) is maternally inherited. Mutations accumulate sequentially in radiating lineages creating branches on the human evolutionary tree. Using sequences of mtDNA, human populations are divisible evolutionarily into haplogroups (Wallace, D.C. et al. (1999) Gene 238:211-230; Ingman M. et al., (2000) Nature 408:708-713; Maca-Meyer, N. (August 2001) BioMed Central 2:13; T. G. Schurr et al., (1999) American Journal of Physical Anthropology 108:1-39; and V. Macaulay et al., (1999) American Journal of Human Genetics 64:232-249). Related haplogroups can be combined into macro-haplogroups. Haplogroups can be subdivided into subhaplogroups. The complete Cambridge mitochondrial DNA sequence may be found at MITOMAP, http://www.gen.emory.edu/cgi-bin/MITOMAP, Genbank accession no. J01415, and is provided in SEQ ID NO:2. Also see Andrews et al. (1999), "Reanalysis and Revision of the Cambridge Reference Sequence for Human Mitochondrial DNA," Nature Genetics 23:147.
Publications on the subject of mitochondrial biology include: Scheffler, I.E. (1999) Mitochondria. Wiley-Liss, NY; Lestienne P Ed.; Mitochondrial Diseases: Models and Methods, Springer-Verlag, Berlin; Methods in Enzymology (2000) 322:Section V Mitochondria and Apoptosis, Academic Press, CA; Mitochondria and Cell Death (1999) Princeton University Press, NJ; Papa S, Ferruciio G, and Tager J Eds.; Frontiers of Cellular Bioenergetics: Molecular Biology, Biochemistry, and Physiopathology, Kluwer Academic / Plenum Publishers, NY; Lemasters, J. and Nieminen, A. (2001) Mitochondria in Pathogenesis, Kluwer Academic / Plenum Publishers, NY; MITOMAP, http://www.gen.emory.edu/cgi-bin/MITOMAP; Wallace, D.C. (2001) "A mitochondrial paradigm for degenerative diseases and ageing" Novartis Foundation Symposium 235:247-266; Wallace, D.C. (1997) "Mitochondrial DNA in Aging and Disease" Scientific American August 277:40-47; Wallace, D.C. et al., (1998) "Mitochondrial biology, degenerative diseases and aging," BioFactors 7:187-190; Heddi, A. et al., (1999) "Coordinate Induction of Energy Gene Expression in Tissues of Mitochondrial Disease Patients" JBC 274:22968-22976; Wallace, D.C. (1999) "Mitochondrial Diseases in Man and Mouse" Science 283:1482-1488; Saraste, M. (1999) "Oxidative Phosphorylation at the fin de siecle" Science 283:1488-1493; Kokoszka et al. (2001) "Increased mitochondrial oxidative stress in the Sod2 (+/-) mouse results in the age-related decline of mitochondrial function culminating in increased apoptosis" PNAS 98:2278-2283; Wallace, D.C. (2001) Mental Retardation and Developmental Disabilities 7:158-166; Wallace, D.C. (2001) Am. J. Med. Gen. 106:71-93; Wei, Y-H et al. (2001) Chinese Medical Journal (Taipei) 64:259-270; and Wallace, D.C. (2001) EuroMit 5 Abstract.
Certain mitochondrial mutations have been associated with physiological conditions (U.S. Patent 6,280,966 issued on August 28, 2001; U.S. Patent 6,140,067 issued on October 31, 2000; U.S. Patent 5,670,320; U.S. Patent 5,296,349; U.S. Patent 5,185,244; U.S. Patent 5,494,794; Wallace, D.C. (1999) Science 283:1482-1488; Brown, M.D. et al. (2001) American Society for Human Genetics Poster #2332; Brown, M.D. et al., (2001) Human Genet. 109:33-39; and Brown, M.D. et al. (January 2002) Human Genet. 110:130-138). Wallace, D.C. et al. (1999) Gene 238:211-230 describes analysis of LHON mutants. Grossman, L.I. et al. (2001) Molecular Phylogenetics and Evolution 18(1):26-36, describes changes in the biochemical machinery for aerobic energy metabolism. Kalman, B. et al. (1999) Acta Neurol. Scand. 99(1):16-25 describes mitochondrial mutations and multiple sclerosis (MS). Wei, Y.H. et al. (2001) Chinese Medical Journal 64:259-270 describes recent results in support of the mitochondrial theory of aging.
Ivanova, R. et al. (1998) Gerontology 44:349 describes mitochondrial haplotypes and longevity in a French population. Tanaka, M. et al. (1998) Lancet 351:185-186 describes longevity and haplogroups in a Japanese population. De Benedictis, G. et al. (1999) FASEB 13:1532-1536 describes haplogroups and longevity in an Italian population. Rose, G. et al. (2001) European Journal of Human Genetics 9:701-707 describes haplogroup J in centenarians. Ross, O.A. et al. (2001) Experimental Gerontology 36(7):1161-1178 describes haplotypes and longevity in an Irish population.
Haplogroup T has been associated with reduced sperm motility in European males (E. Ruiz-Pesini et al., [2000] American Journal of Human Genetics 67:682-696), and the tRNAGln np 4336 variant in haplogroup H is associated with late-onset Alzheimer Disease (J. M. Shoffner et al., [1993] Genomics 17:171-184).
Taylor, R.W. (1997) J. of Bioenergetics and Biomembranes 29(2):195-205 describes methods for treating mitochondrial disease. Collombet, J. and Coutelle, C. (1998) Molecular Medicine Today 4(1):1-8 describes gene therapy for mitochondrial disorders, including using cell fusion to introduce healthy mitochondria. Owen, R. and Flotte, T.R. (2001) Antioxidants and Redox Signaling 3(3):451-460 discuss approaches and limitations to gene therapy for mitochondrial diseases.
Human mitochondrial DNA sequence variation, except that which has been associated with particular diseases, has not been associated with specific phenotypic conditions, has been considered neutral, and has been used to reconstruct human phylogenies (Henry Gee, "Statistical Cloud over African Eden," (13 February 1992) Nature 355:583; Marcia Barinaga, "African Eve Backers Beat a Retreat," (7 February 1992) Science, 255:687; S. Blair Hedges et al., "Human Origins and Analysis of Mitochondrial DNA Sequences," (7 February 1992) Science, 255:737-739; Allan C. Wilson and Rebecca L. Cann, "The Recent African Genesis of Humans," (April 1992) Scientific American, 68). The average number of base pair differences between two human mitochondrial genomes is estimated to be from 9.5 to 66 (Zeviani M. et al. (1998) "Reviews in molecular medicine: Mitochondrial disorders," Medicine 77:59-72).
The D-loop is the most variable region in the mitochondrial genome, and the most polymorphic nucleotide sites within this loop are concentrated in two 'hypervariable segments', HVS-I and HVS-II (Wilkinson-Herbots, H.M. et al., (1996) "Site 73 in hypervariable region II of the human mitochondrial genome and the origin of European populations," Ann Hum Genet 60:499-508). Population-specific, neutral mtDNA variants have been identified by surveying mtDNA restriction site variants or by sequencing hypervariable segments in the displacement loop. Restriction analysis using fourteen restriction endonucleases allowed screening of 15-20% of the mtDNA sequence for variations (Chen Y.S. et al., (1995) "Analysis of mtDNA variation in African populations reveals the most ancient of all human continent-specific haplogroups," Am J Hum Genet 57:133-149). The large majority of mtDNA sequence data published to date are limited to HVS-I (Bandelt, H.J. et al., (1995) "Mitochondrial portraits of human populations using median networks" Genetics 141:743-753).
The coding and classification system that has been used for mtDNA haplogroups refers primarily to the information provided by RFLPs and the hypervariable segments of the control region (Torroni, A. et al. (1996) "Classification of European mtDNAs from an analysis of three European populations," Genetics 144:1835-1850 and Richards MB et al., (1998) "Phylogeography of mitochondrial DNA in western Europe," Ann Hum Genet 62:241-260).
Methods are known for testing the likelihood of neutrality of mutations (Tajima, F. (1989) Genetics 123:585-595; Fu, Y. and Li, W. (1993) Genetics 133:693-709; Li, W. et al. (1985) Mol. Biol. Evol. 2(2):150-174; and Nei, M. and Gojobori, T. (1986) Mol. Biol. Evol. 3(5):418-426). All of the methods in these publications are used to compare datasets taken from separate groups. None of these methods are used to analyze a dataset not containing data representing an outgroup.
Wise, C.A. et al. (1998) Genetics 148:409-421, describes neutrality analysis of the human mitochondrial NADH Dehydrogenase Subunit 2 gene, when compared to the NADH Dehydrogenase Subunit 2 gene from chimpanzees. Templeton, A.R. (1996) Genetics 144:1263-1270, describes neutrality analysis of the human mitochondrial Cytochrome Oxidase II (COXII) gene when compared to the COXII gene in hominoid primates. Messier, W. and Stewart, C. (1997) Nature 385:151-154 describes neutrality analysis of primate lysozymes. Endo, T. et al. (1996) Mol. Biol. Evol. 13(5):685-690 describes large-scale neutrality analysis of sequences from DDBJ, EMBL, and GenBank databases. Hughes, A.L. and Nei, M. (1988) Nature 335:167-170 describes neutrality analysis of MHC Class I loci. Nachman, M.W. (1996) Genetics 142:953-963 describes neutrality analysis of the human mitochondrial NADH Dehydrogenase subunit 3 (NADH3) gene, when compared to the NADH Dehydrogenase subunit 3 gene from chimpanzees. Nachman, M.W. et al. (1994) Proc. Natl. Acad. Sci. USA 76:5269-5273 describes neutrality analysis of the mitochondrial NADH dehydrogenase subunit 3 gene in 3 strains of mouse. Rand, D.M. et al. (1994) Genetics 138:741-756; Ballard, J.W.O. and Kreitman, M. (1994) Genetics 138:757-772; and Kaneko, M.Y. et al. (1993) Genet. Res. 61:195-204, describe neutrality analysis for mitochondrial NADH dehydrogenase subunit 5, Cytochrome b, and ATPase6 in strains of Drosophila.
In the above-mentioned publications, neutrality testing, including Ka/Ks analysis, has not been applied for the purpose of identifying disease-associated mutations. Populations for neutrality testing analysis were identified by observation of normal phenotypic variation. Neutrality testing has been performed to determine whether a gene is under selection. None of these publications describe neutrality analysis with the purpose of identifying phenotype-associated mutations, and no suspected phenotype-associated mutations were identified.
US Patent 6,228,586 (issued May 8, 2001) and US Patent 6,280,953 (issued August 28, 2001) describe methods for identifying polynucleotide and polypeptide sequences in human and/or non-human primates, which may be associated with a physiological condition. The methods employ comparison of human and non-human primate sequences using statistical methods. U.S. Patent 6,274,319 (issued August 14, 2001) describes Ka/Ks methods for identifying polynucleotide and polypeptide sequences that may be associated with commercially or aesthetically relevant traits in domesticated plants or animals. The methods employ comparison of homologous genes from the domesticated organism and its wild ancestor to identify evolutionarily significant changes. In the above-mentioned publications, neutrality testing, including Ka/Ks analysis, is only applied to interspecific, not intraspecific, comparisons, and only genes from the nuclear genome, not from organelle genomes, are analyzed.
Methods for constructing peptide and nucleotide libraries are well known to the art, e.g. as described in U.S. Patents 6,156,511 and 6,130,092. Sequencing methods are also known to the art, e.g., as described in U.S. Patent 6,087,095. Arrays of nucleic acid have been used for sequencing and for identifying exceptional alleles including disease-associated alleles. Nucleic acid arrays have been described, e.g., in patent nos.: U.S. 5,837,832, U.S. 5,807,522, U.S. 6,007,987, U.S. 6,110,426, WO 99/05324, 99/05591, WO 00/58516, WO 95/11995, WO 95/35505A1, WO 99/42813, JP10503841T2, GR3030430T3, ES2134481T3, EP804731B1, DE69509925C0, CA2192095AA, AU2862995A1, AU709276B2, AT180570, EP 1066506, and AU 2780499. Computational methods are useful for analyzing hybridization results, e.g., as described in PCT Publication WO 99/05574, and U.S. Patents 5,754,524; 6,228,575; 5,593,839; and 5,856,101. Methods for screening for disease markers are also known to the art, e.g. as described in U.S. Patents 6,228,586; 6,160,104; 6,083,698; 6,268,398; 6,228,578; and 6,265,174.
The development of microarray technologies has stemmed from the desire to examine very large numbers of nucleic acid probe sequences simultaneously, in an effort to obtain information about genetic mutations, gene expression or nucleic acid sequences. Microarray technologies are intimately connected with the Human Genome Project, which has development of rapid methods of nucleic acid sequencing and genome analysis as key objectives (E. Marshall, (1995) Science 268:1270), as well as elucidation of sequence-function relationships (M. Schena et al., (1996) Proc. Nat'l. Acad. Sci. USA, 93:10614). Microarray hybridization of PCR-amplified fragments to allele-specific oligonucleotide (ASO) probes is widely used in large-scale single nucleotide polymorphism (SNP) genotyping (Huber M. et al. (2002) Analytical Biochemistry 303:25-33 and Southern, E.M. (1996) Trends Genet. 12:110-115).
The Affymetrix GeneChip® HuSNP™ Array enables whole-genome surveys by simultaneously tracking nearly 1,500 genetic variations, known as single nucleotide polymorphisms (SNPs), dispersed throughout the genome. The HuSNP Affymetrix Array is being used for familial linkage studies that aim to map inherited disease or drug susceptibilities as well as for tracking de novo genetic alterations. For genotyping, arrays rely on multiple probes to interrogate individual nucleotides in a sequence. The identity of a target base can be deduced using four probes that are identical except at the target position, each containing one of the four possible bases at that position. Alternatively, the presence of a consensus sequence can be tested using one or two probes representing specific alleles. To genotype heterozygous or genetically mixed samples, arrays with many probes can be created to provide redundant information. Arrays, also called DNA microarrays or DNA chips, are fabricated by high-speed robotics, generally on glass but sometimes on nylon substrates, and carry probes of known identity that are used to determine complementary binding (Phimister, B. (1999) Nature Genetics 21s:1-60). An experiment with a single DNA chip can provide researchers information on thousands of genes simultaneously. There are several steps in the design and implementation of a DNA array experiment. Many strategies have been investigated at each of these steps: 1) DNA types; 2) Chip fabrication; 3) Sample preparation; 4) Assay; 5) Readout; and 6) Software (informatics).
There are two major application forms for the array technology: 1) Determination of expression level (abundance) of genes; and 2) Identification of sequence (gene / gene mutation). There appear to be two variants of the array technology, in terms of intellectual property, of arrayed DNA sequence with known identity: Format I consists of probe cDNA (500~5,000 bases long) immobilized to a solid surface such as glass using robot spotting and exposed to a set of targets either separately or in a mixture. This method, "traditionally" called DNA microarray, is widely considered as having been developed at Stanford University (R. Ekins and F.W. Chu "Microarrays: their origins and applications," [1999] Trends in Biotechnology, 17:217-218). Format II consists of an array of oligonucleotide (20~80-mer oligos) or peptide nucleic acid (PNA) probes synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array is exposed to labeled sample DNA, hybridized, and the identity/abundance of complementary sequences is determined. This method, "historically" called DNA chips, was developed at Affymetrix, Inc., which sells its photolithographically fabricated products under the GeneChip® trademark. Many companies are manufacturing oligonucleotide-based chips using alternative in-situ synthesis or depositioning technologies.
Probes on arrays can be hybridized with fluorescently-labeled target polynucleotides and the hybridized array can be scanned by means of scanning fluorescence microscopy. The fluorescence patterns are then analyzed by an algorithm that determines the extent of mismatch content, identifies polymorphisms, and provides some general sequencing information (M. Chee et al., [1996] Science 274:610). Selectivity is afforded in this system by low stringency washes to rinse away non-selectively adsorbed materials. Subsequent analysis of relative binding signals from array elements determines where base-pair mismatches may exist. This method then relies on conventional chemical methods to maximize stringency, and automated pattern recognition processing is used to discriminate between fully complementary and partially complementary binding.
Devices such as standard nucleic acid microarrays or gene chips require data processing algorithms and the use of sample redundancy (i.e., many of the same types of array elements for statistically significant data interpretation and avoidance of anomalies) to provide semi-quantitative analysis of polymorphisms or levels of mismatch between the target sequence and sequences immobilized on the device surface.
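By way of illustration only, the four-probe interrogation of a target base and the use of redundant array elements described above might be sketched as follows. The function name `call_base`, the dictionary layout of the intensities, and the `min_ratio` threshold are assumptions made for this sketch; they are not taken from any of the cited array systems or their software.

```python
from statistics import median

BASES = "ACGT"


def call_base(replicates, min_ratio=1.5):
    """Call the target base from redundant four-probe measurements.

    replicates: list of dicts mapping each probe base (A, C, G, T) to the
    fluorescence intensity of one redundant array element.
    min_ratio: hypothetical threshold; the strongest probe must exceed the
    runner-up by this factor, otherwise no call ("N") is made.
    """
    # The median across redundant elements damps single-spot anomalies.
    signal = {b: median(r[b] for r in replicates) for b in BASES}
    ranked = sorted(BASES, key=lambda b: signal[b], reverse=True)
    best, second = ranked[0], ranked[1]
    if signal[best] <= 0:
        return "N"  # no usable signal at this position
    if signal[second] > 0 and signal[best] / signal[second] < min_ratio:
        return "N"  # ambiguous, e.g. a genetically mixed sample
    return best


if __name__ == "__main__":
    spots = [
        {"A": 100, "C": 900, "G": 120, "T": 90},
        {"A": 110, "C": 950, "G": 100, "T": 95},
        {"A": 5000, "C": 870, "G": 130, "T": 100},  # anomalous "A" spot
    ]
    print(call_base(spots))
```

In this sketch a single anomalous element does not change the call, because the median rather than the mean is taken over the redundant elements; other robust statistics could be substituted.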
Labels appropriate for array analysis are known in the art. Examples are the two-color fluorescent systems, such as Cy3/Cy5 and Cy3.5/Cy5.5 phosphoramidites (Glen Research, Sterling, Virginia). Patents covering cyanine dyes include: U.S. 6,114,350 (Sept. 5, 2000); U.S. 6,197,956 (March 6, 2001); U.S. 6,204,389 (March 20, 2001) and U.S. 6,224,644 (May 1, 2001). Array printers and readers are available in the art.
A process of using arrays is described in Grigorenko, E.V. ed., (2002) DNA Arrays: Technologies and Experimental Strategies. CRC Press, NY; Vrana, K.E. et al., (May 2001) Microarrays and Related Technologies: Miniaturization and Acceleration of Genomics Research, CHI, Upper Falls, MA; and Branca, M.A. et al., (February 2002) DNA Microarray Informatics: Key Technological Trends and Commercial Opportunities, CHI, Upper Falls, MA.
All publications referred to herein are incorporated by reference to the extent not inconsistent herewith. The mention of a publication in this Background Section does not constitute an admission that it is prior art.
SUMMARY OF INVENTION
The high mutation rate of human mitochondrial DNA (mtDNA) has been thought to result in the accumulation of a wide range of neutral, population-specific base substitutions in mtDNA. These have accumulated sequentially along radiating maternal lineages that have diverged on approximately the same time scale as human populations have colonized different geographical regions of the world. About 76% of all African mtDNAs fall into haplogroup L, defined by an HpaI restriction site gain at bp 3592. 77% of Asian mtDNAs are encompassed within a super-haplogroup defined by a DdeI site gain at bp 10394 and an AluI site gain at bp 10397. Essentially all Native American mtDNAs fall into four haplogroups, A-D. Haplogroup A is defined by a HaeIII site gain at bp 663, B by a 9 bp deletion between bp 8271 and bp 8281, C by a HincII site loss at bp 13259, and D by an AluI site loss at bp 5176. Ten haplogroups encompass almost all mtDNAs in European populations. The ten mtDNA haplogroups of Europeans can be surveyed by using a combination of data from RFLP analysis of the coding region and sequencing of the hypervariable segment I. About 99% of European mtDNAs fall into one of ten haplogroups: H, I, J, K, M, T, U, V, W or X.
This invention provides human mtDNA polymorphisms that are diagnostic of all the major human haplogroups and methods of diagnosing those haplogroups and selected subhaplogroups.
This invention also provides methods for identifying evolutionarily significant mitochondrial DNA genes, nucleotide alleles, and amino acid alleles. Evolutionarily significant genes and alleles are identified using one or two populations of a single species. The process of identifying evolutionarily significant nucleotide alleles involves identifying evolutionarily significant genes and then evolutionarily significant nucleotide alleles in those genes, and identifying evolutionarily significant amino acid alleles involves identifying amino acids encoded by all nonsynonymous alleles. Synonymous codings of the nucleotide alleles encoding evolutionarily significant amino acid alleles of this invention are equivalent to the evolutionarily significant amino acid alleles disclosed herein and are included within the scope of this invention. Synonymous codings include alleles at neighboring nucleotide loci that are within the same codon.
This invention also provides methods for associating haplogroups and evolutionarily significant nucleotide and amino acid alleles with predispositions to physiological conditions. Methods for diagnosing predisposition to LHON, and methods for diagnosing increased likelihood of developing blindness, centenaria, and increased longevity that are not dependent on the geographical location of the individual being diagnosed are provided herein. Diagnosis of an individual with a predisposition to an energy metabolism-related physiological condition is dependent on the geographic region of the individual. Physiological conditions diagnosable by the methods of this invention include healthy conditions and pathological conditions. Physiological conditions that are associated with haplogroups and with alleles provided by this invention include energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.
Molecules having sequences provided by this invention are provided in libraries and on genotyping arrays. This invention provides methods of making and using the genotyping arrays of this invention. The arrays of this invention are useful for determining the presence and absence of nucleotide alleles of this invention, for determining a haplogroup, and for diagnosis.
This invention also provides machine-readable storage devices and program devices for storing data and programmed methods for diagnosing haplogroups and physiological conditions.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 shows a consensus neighbor-joining tree of 104 human mtDNA complete sequences and two primate sequences. Numbers correspond to bootstrap values (% of 500 total bootstrap replicates) (Felsenstein, J. (1993) PHYLIP (Phylogeny Inference Package) 3.53c. Distributed by author, Department of Genetics, University of Washington, Seattle, WA). Maximum Likelihood (ML) and UPGMA yielded consistent branching orders with respect to continent-specific mtDNA haplogroups. Sequences: 11-53: Genbank AF346963-AF347015 (4); E21U: Genbank X93334, AlLla: Genbank D38112, cam revise: Genbank NC_001807 corrected according to (R. M. Andrews et al., Nature Genetics 23, 147 (1999)); the rest are 48 sequences generated in this invention using an ABI 377. Specific mutations in patient samples that have been implicated in disease were excluded from this analysis, as well as gaps and deletions, with the exception of the 9 bp deletion (nucleotide position (np) 8272 to 8281). Haplogroups A, B, C, D, and X were drawn from both Eurasia and the Americas. Haplogroup names are designated with capital letters. P. paniscus and P. troglodytes mtDNA sequences were used as outgroups. Haplogroups L0 and L1 encompass previously assigned L1a and L1b mtDNAs, respectively (Y. S. Chen et al., American Journal of Human Genetics 66, 1362-1383 (2000)).
FIG. 2 shows the migrations of human haplogroups around the world. +/-, +/+, or -/- equals DdeI 10394 and AluI 10397. * equals RsaI 16329. The mutation rate is 2.2-2.9% per million years. Time estimates are YBP (years before present).
FIG. 3 shows a cladogram listing nucleotide alleles describing 21 major human haplogroups, 21 sub-haplogroups, and several macro-haplogroups. The groups on the left are described by the alleles to their right. A vertical bar designates that each group to the left of the bar has all of the alleles to the right of the bar.
FIG. 4 shows the selective constraint (kc values) of mtDNA protein genes with comparisons among mammalian species. Statistical significance (P < 0.05) was determined using ANOVA, t-tests or the Tukey-Kramer Multiple Comparisons tests. Most programs used are from DNAsp (J. Rozas and R. Rozas, (1999) Bioinformatics 15:174-5). DNA sequence divergence was analyzed using the DIVERGE program (Wisconsin Package Version 10.0, Genetics Computer Group (GCG), Madison, WI). For all thirteen mtDNA genes, data is shown for human, human compared to P. troglodytes, human compared to P. paniscus, and nine species of primates. For only ATP6 and ATP8, data is also shown for fourteen species of mammals.
DETAILED DESCRIPTION OF THE INVENTION
Table 1 shows human mitochondrial nucleotide alleles that have been associated with physiological conditions. In Table 1, column three (nucleotide locus), column five (physiological condition nucleotide allele), and column two (physiological condition) make up the set of Human Mitochondrial Nucleotide Alleles Known to be Associated with Physiological Conditions.
TABLE 1¹ Human Mitochondrial Alleles Known to be Associated with Physiological Conditions
Figure imgf000014_0001
1 (MITOMAP: A Human Mitochondrial Genome Database. Center for Molecular Medicine, Emory University, Atlanta, GA, USA. http://www.gen.emory.edu/mitomap.html, 2001).
Figure imgf000015_0001
Definitions:
LHON Leber Hereditary Optic Neuropathy
MM Mitochondrial Myopathy
AD Alzheimer's Disease
LIMM Lethal Infantile Mitochondrial Myopathy
ADPD Alzheimer's Disease and Parkinson's Disease
MMC Maternal Myopathy and Cardiomyopathy
NARP Neurogenic muscle weakness, Ataxia, and Retinitis Pigmentosa; alternate phenotype at this locus is reported as Leigh Disease
FICP Fatal Infantile Cardiomyopathy Plus a MELAS-associated cardiomyopathy
MELAS Mitochondrial Encephalomyopathy, Lactic Acidosis, and Stroke-like episodes
LDYT Leber's hereditary optic neuropathy and DYsTonia
MERRF Myoclonic Epilepsy and Ragged Red Muscle Fibers
MHCM Maternally inherited Hypertrophic CardioMyopathy
CPEO Chronic Progressive External Ophthalmoplegia
KSS Kearns Sayre Syndrome
DM Diabetes Mellitus
DMDF Diabetes Mellitus + DeaFness
CIPO Chronic Intestinal Pseudoobstruction with myopathy and Ophthalmoplegia
DEAF Maternally inherited DEAFness or aminoglycoside-induced DEAFness
PEM Progressive encephalopathy
SNHL SensoriNeural Hearing Loss
Thirteen protein-coding mitochondrial genes are known (MitoMap, http://www.gen.emory.edu/cgi-bin/MITOMAP).
Table 2 Protein-coding Human MtDNA Genes
Figure imgf000016_0001
a,b As defined on MitoMap, http://www.gen.emory.edu/cgi-bin/MITOMAP, which is numbered relative to the Cambridge Sequence (Genbank accession no. J01415 and Andrews et al. (1999), A Reanalysis and Revision of the Cambridge Reference Sequence for Human Mitochondrial DNA, Nature Genetics 23:147).
Codon usage for mtDNA differs slightly from the universal code. For example, UGA codes for tryptophan instead of termination, AUA codes for methionine instead of isoleucine, and AGA and AGG are terminators instead of coding for arginine.
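The codon differences noted above can be expressed as a small set of overrides applied to the universal code. The following is a minimal sketch, assuming only the four reassignments stated in the preceding paragraph; the names `STANDARD`, `MITO`, and `translate` are illustrative and are not part of the disclosure.

```python
from itertools import product

# Universal genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Human mtDNA reassignments described in the text ("*" marks termination).
MITO = dict(STANDARD)
MITO.update({"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"})


def translate(sequence, code=MITO):
    """Translate a coding sequence (DNA or RNA letters) codon by codon."""
    rna = sequence.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore an incomplete trailing codon
    return "".join(code[rna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    example = "ATGATATGAAGA"
    print(translate(example, STANDARD))
    print(translate(example, MITO))
```

Translating the same sequence with both tables shows the effect of the reassignments: ATA and TGA are read as methionine and tryptophan with the mitochondrial table, and AGA is read as a terminator.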
As used herein, "printing" refers to the process of creating an array of nucleic acids on known positions of a solid substrate. The arrays of this invention can be printed by spotting, e.g., applying arrays of probes to a solid substrate, or by synthesizing probes in place on a solid substrate. As used herein, "glass slide" refers to a small piece of glass of the same dimensions as a standard microscope slide. As used herein, "prepared substrate" refers to a substrate that is prepared with a substance capable of serving as an attachment medium for attaching the probes to the substrate, such as polylysine. As used herein, "sample" refers to a composition containing human mitochondrial DNA that can be genotyped. As used herein, "quantitative hybridization" refers to hybridization performed under appropriate conditions and using appropriate materials such that the sequence of one nucleotide allele (a single nucleotide polymorphism) can be determined, such as by hybridization of a molecule containing that allele to two or more probes, each containing different alleles at that nucleotide locus, all as is known in the art.
As used herein, "physiological condition" includes diseased conditions, healthy conditions, and cosmetic conditions. Diseased conditions include, but are not limited to, metabolic diseases such as diabetes, hypertension, and cardiovascular disease. Healthy conditions include, but are not limited to, traits such as increased longevity. Physiological conditions include cosmetic conditions. Cosmetic conditions include, but are not limited to, traits such as amount of body fat. Physiological conditions can change health status in different contexts, such as for the same organism in a different environment. Such different environments for humans are different cultural environments or different climatic contexts such as are found on different continents.
As used herein, "neutrality analysis" refers to analysis to determine the neutrality of one or more nucleotide alleles and/or the gene containing the allele(s) using at least two alleles of a sequence. Commonly, the alleles in a sequence to be analyzed are divided into two groups, synonymous and nonsynonymous. Codon usage tables showing which codons encode which amino acids are used in this analysis. Codon usage tables for many organisms and genomes are available in the art. If a gene is determined to not be neutral, the gene is determined to have had selection pressure applied to it during evolution, and to be evolutionarily significant. The alleles that change amino acids in the gene (nonsynonymous) are then determined to be non-neutral and evolutionarily significant.
As used herein, "Ka/Ks" refers to a ratio of the proportion of nonsynonymous differences to the proportion of synonymous differences in a DNA sequence analysis, as is known to the art. The proportion of nonsynonymous differences is the number of nonsynonymous nucleotide substitutions in a sequence per site at which a nonsynonymous substitution could occur. The proportion of synonymous differences is the number of synonymous nucleotide substitutions in a sequence per site at which synonymous substitutions could occur. Alternatively, instead of only including the number of sites in the denominator of each proportion, the number of alternative substitutions that could occur at each site are also included. Either definition may be used as long as similar definitions are used for both Ka and Ks in an analysis. Kc is Ka/Ks.
As used herein "nonsynonymous" refers to mutations that result in changes to the encoded amino acid. As used herein, "synonymous" refers to mutations that do not result in changes to the encoded amino acids.
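The definitions of Ka, Ks, synonymous, and nonsynonymous given above can be illustrated with a simplified calculation on two aligned coding sequences. The sketch below is an assumption-laden illustration and is not the procedure used in the Examples: it counts potential sites fractionally from the first sequence only, skips codons that differ at more than one position, treats a change to or from a terminator as nonsynonymous, and applies no correction for multiple substitutions. The function names are hypothetical.

```python
from itertools import product

# Codon table with the human mtDNA reassignments (DNA letters, "*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
CODE.update({"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"})


def site_counts(codon):
    """Return (synonymous, nonsynonymous) potential sites for one codon."""
    syn_changes = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn_changes += 1
    syn = syn_changes / 3  # each position has three possible changes
    return syn, 3 - syn


def ka_ks(seq_a, seq_b):
    """Return (Ka, Ks, Ka/Ks) as uncorrected proportions of differences."""
    syn_sites = non_sites = syn_diffs = non_diffs = 0.0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        a, b = seq_a[i:i + 3], seq_b[i:i + 3]
        s, n = site_counts(a)
        syn_sites += s
        non_sites += n
        if sum(x != y for x, y in zip(a, b)) == 1:  # single-base change only
            if CODE[a] == CODE[b]:
                syn_diffs += 1
            else:
                non_diffs += 1
    ka = non_diffs / non_sites
    ks = syn_diffs / syn_sites
    return ka, ks, (ka / ks if ks else float("inf"))


if __name__ == "__main__":
    print(ka_ks("CTTGCAAAA", "CTCGCAAAC"))
```

In the usage example the two sequences differ by one synonymous change (CTT to CTC, both leucine) and one nonsynonymous change (AAA to AAC, lysine to asparagine), so the ratio reflects only the different numbers of synonymous and nonsynonymous sites available.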
As used herein, "haplogroup" refers to radiating lineages on the human evolutionary tree, as is known in the art. As used herein, "macro-haplogroup" refers to a group of evolutionarily related haplogroups. As used herein, "sub-haplogroup" refers to an evolutionarily related subset of a haplogroup. An individual's haplotype is the haplogroup to which he belongs.
As used herein, "extended longevity" or "extended lifespan" refers to living longer than the average expected lifespan for the population to which one belongs. As used herein, "centenaria" refers to an extended lifespan that is at least 100 years.
As used herein, "abnormal energy metabolism" in an individual who is non-native to the geographical region in which he lives refers to energy metabolism that differs from that of the population that is native to where the individual lives. As used herein, "abnormal temperature regulation" in such an individual refers to temperature regulation that differs from that of the population that is native to where he lives. As used herein, "abnormal oxidative phosphorylation" in such an individual refers to oxidative phosphorylation that differs from that of the population that is native to where he lives. As used herein, "abnormal electron transport" in such an individual refers to electron transport that differs from that of the population that is native to where he lives. As used herein, "metabolic disease" of such an individual refers to metabolism that differs from that of the population that is native to where he lives. As used herein, "energetic imbalance" of such an individual refers to a balance of energy generation or use that differs from that of the population that is native to where he lives. As used herein, "obesity" of such an individual refers to a body weight that, for the height of the individual, is 20% higher than the average body weight that is recommended for the population native to where the individual lives. As used herein, "amount of body fat" of such an individual refers to a low or high percentage of body fat relative to what is recommended for the population that is native to where he lives.
As used herein, an isolated nucleic acid is a nucleic acid outside of the context in which it is found in nature. The term covers, for example: (a) a DNA which has the sequence of part of a naturally-occurring genomic DNA molecule but is not flanked by both of the coding or noncoding sequences that flank that part of the molecule in the genome of the organism in which it naturally occurs; (b) a nucleic acid incorporated into a vector or into the genomic DNA of a prokaryote or eukaryote in a manner such that the resulting molecule is not identical to any naturally-occurring vector or genomic DNA; (c) a separate molecule such as a cDNA, a genomic fragment, a fragment produced by polymerase chain reaction (PCR), or a restriction fragment; and (d) a recombinant nucleotide sequence that is part of a hybrid gene, i.e., a gene encoding a fusion protein, or a modified gene having a sequence not found in nature.
As used herein, "nucleotide locus" refers to a nucleotide position of the human mitochondrial genome. The Cambridge sequence SEQ ID NO:2 is used as a reference sequence, and the positions of the mitochondrial genome referred to herein are assigned relative to that sequence. As used herein, "loci" refers to more than one locus. As used herein, "nucleotide allele" refers to a single nucleotide at a selected nucleotide locus from a selected sequence when different bases occur naturally at that locus in different individuals. The nucleotide allele information is provided herein as the nucleotide locus number and the base that is at that locus, such as 3796C, which means that at human mitochondrial position 3796, numbered relative to the Cambridge sequence, there is a cytosine (C). As used herein, "amino acid allele" refers to the amino acid that is at a selected amino acid location in a protein encoded by the human mitochondrial genome when different amino acids occur naturally at that location in different individuals. There are thirteen protein-coding genes in the human mitochondrial genome. For each gene, the encoded protein consists of amino acids that are numbered starting at one. For example, ND1 304H means that there is a histidine at amino acid 304 in the ND1 protein. Amino acids are encoded by codons. As used herein, "codon" refers to the group of three nucleotides that encode an amino acid in a protein, as is known in the art. An amino acid allele can be referred to by one or more of the nucleotide loci that code for it. For example, ntl 15884 P means that there is a proline (P) encoded by the codon containing nucleotide locus 15884.
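The allele notation defined above lends itself to simple machine handling. The following sketch, offered only as an illustration, parses a nucleotide allele such as 3796C and converts a nucleotide locus into an amino acid number and codon position. The function names and the `gene_start` parameter are hypothetical; the start position of each gene would have to be taken from Table 2, and the sketch assumes a gene that is read in the direction of increasing position number.

```python
import re

# A nucleotide allele is written as the locus number followed by the base.
ALLELE_PATTERN = re.compile(r"^(\d+)([ACGT])$")


def parse_allele(text):
    """Split a nucleotide allele such as '3796C' into (locus, base)."""
    match = ALLELE_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a nucleotide allele: {text!r}")
    return int(match.group(1)), match.group(2)


def codon_position(locus, gene_start):
    """Return (amino acid number, position within codon), both 1-based.

    gene_start is the nucleotide locus of the first base of the first
    codon of the gene (a value assumed to be supplied by the caller).
    """
    offset = locus - gene_start
    if offset < 0:
        raise ValueError("locus lies before the start of the gene")
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    print(parse_allele("3796C"))
    # gene_start=1000 is a made-up value used only to show the arithmetic.
    print(codon_position(1006, gene_start=1000))
```

Because amino acids are numbered starting at one, the first three nucleotide loci of a gene all map to amino acid 1, the next three to amino acid 2, and so on.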
As used herein, "evolutionarily significant gene" refers to a gene that has statistically significantly more nonsynonymous nucleotide changes, when compared to the corresponding gene in another individual, than would be expected by chance. As used herein, "evolutionarily significant nucleotide allele" refers to a nucleotide allele that is located in a gene that has been determined to be evolutionarily significant using that nucleotide allele, or an equivalent nucleotide allele in a corresponding gene in another individual. As used herein, "intraspecific" means within one species. As used herein, "subpopulation" refers to a population within a larger population. A subpopulation can be as small as one individual. As used herein, "geographic region" refers to a geographic area in which a statistically significant number of individuals have the same haplotype. As used herein, being "native" to a geographic region refers to having the haplotype associated with that geographic region. The haplotype associated with a geographic region is the haplotype that originated in the region, or that of many individuals who settled in the region historically with respect to human evolution.
As used herein, "target" or "target sample" refers to the collection of nucleic acids used as a sample for array analysis. The target is interrogated by the probes of the array. A "target" or "target sample" may be a mixture of several samples that are combined. For example, an experimental target sample may be combined with a differently labeled control target sample and hybridized to an array, the combined samples being referred to as the "target" interrogated by the probes of the array during that experiment. As used herein, "interrogated" means tested. Probes, targets, and hybridization conditions are chosen such that the probes are capable of interrogating the target, i.e., of hybridizing to complementary sequences in the target sample.
As used herein, "increased likelihood of developing blindness" refers to a higher than normal probability of losing the ability to see normally and/or of losing the ability to see normally at a younger age.
All sequences defined herein are meant to encompass the complementary strand as well as double-stranded polynucleotides comprising the given sequence.
This invention provides a list of human mtDNA polymorphisms found in all the major human haplogroups. Example 1 summarizes data from sequencing over 100 human mtDNA genomes that are representative of the major human haplogroups around the world. The summary includes over 900 point mutations and one nine-base pair deletion. Table 3, Human MtDNA Nucleotide Alleles, lists the alleles identified in 103 such sequences in the third column, the corresponding alleles of the Cambridge mtDNA sequence in the second column, and the nucleotide loci (position in the Cambridge sequence) in the first column. Table 3 lists the set of human mtDNA nucleotide alleles that occur naturally in different haplogroups. Table 3 does not include alleles previously known to be associated with disease (i.e., does not include the alleles of Table 1). The nucleotide alleles listed in column three of Table 3, together with the corresponding nucleotide loci in column one, make up the set of non-Cambridge human mtDNA nucleotide alleles. Table 4 lists the nucleotide alleles identified by the inventors hereof in 48 human mtDNA genomes in column three, and the corresponding Cambridge alleles in column two. Columns one and three of Table 4 make up the set of non-Cambridge human mtDNA nucleotide alleles in 48 genomes.
The nucleotide alleles listed in Table 3, including the Cambridge nucleotide alleles, being naturally occurring, are useful for identifying alleles that are associated with abnormal physiological conditions. These nucleotide alleles can be ignored during analysis steps when performing methods for identifying novel alleles associated with selected physiological conditions.
As described below, certain alleles of Table 3 are useful for identifying physiological conditions related to energy metabolism such as energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease when the affected individuals have the abnormal physiological condition because they are in a geographical region that is not native for their haplogroup.
The nucleotide alleles listed in Table 3, including the Cambridge nucleotide alleles, are also useful for identifying mtDNA sequences associated with and diagnostic of human haplogroups. Example 2 summarizes phylogenetic analyses of the sequence data of the 103 individuals and the Cambridge sequence along with two chimpanzee mtDNA sequences. The results are shown in FIG. 1 in a cladogram. Calculations of the time since the most recent common ancestor (MRCA) are shown in Table 5. The 104 individuals were chosen from known haplogroups, and the corresponding haplogroups are labeled on the figure. Combining the sequence data of the 104 individuals with FIG. 1 and the geographic regions native to human haplogroups, as is known in the art, results in FIG. 2 (Example 3), which tracks human mtDNA migrations. Analysis of several mtDNA genomic sequences representing each haplogroup demonstrated which alleles are segregating within a haplogroup as well as which alleles are present in every individual within one or more haplogroups. The alleles that are present in every individual within each haplogroup are shown in FIG. 3 (Example 4). On the left, sub-haplogroups and haplogroups are listed. Macro-haplogroups are shown in parentheses. Nucleotide loci and alleles that are present in all the members of each group (sub-haplo or haplo) are listed. A vertical bar designates that all of the alleles to the right are present in all of the haplogroups and/or sub-haplogroups to the left. FIG. 3 is drawn as a cladogram. For example, FIG. 3 demonstrates that the macro-haplogroup (R) individuals all contain 12705C and 16223C, and no other individuals are known to have these alleles. Therefore, macro-haplogroup (R) can be diagnosed by identifying, in a sample containing mtDNA, the presence of either 12705C or 16223C. Similarly, macro-haplogroup (N) can be diagnosed by identifying the presence of 8701A, 9540T, or 10873T.
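The diagnosis of macro-haplogroups (R) and (N) described above amounts to testing a sample for the presence of any one allele from a diagnostic set. A minimal sketch follows, using only the marker alleles named in the preceding paragraph; the dictionary `MACRO_MARKERS` and the function `diagnose_macrohaplogroups` are illustrative names, and the representation of a sample as a collection of allele strings is an assumption.

```python
# Diagnostic alleles for two macro-haplogroups, as stated in the text.
MACRO_MARKERS = {
    "R": {"12705C", "16223C"},
    "N": {"8701A", "9540T", "10873T"},
}


def diagnose_macrohaplogroups(sample_alleles, markers=MACRO_MARKERS):
    """Return the macro-haplogroups for which any diagnostic allele is seen.

    sample_alleles: iterable of nucleotide alleles found in the sample,
    written as locus number plus base (for example "12705C").
    """
    observed = set(sample_alleles)
    return sorted(
        group for group, alleles in markers.items() if observed & alleles
    )


if __name__ == "__main__":
    # "3796C" stands in for an allele that is not a macro-haplogroup marker.
    print(diagnose_macrohaplogroups(["3796C", "12705C"]))
    print(diagnose_macrohaplogroups(["9540T", "16223C"]))
```

The same lookup could be extended to haplogroups and sub-haplogroups by adding their diagnostic allele sets to the dictionary.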
Analysis of the data in FIG. 3 demonstrated sets of alleles useful for diagnosing the haplogroups (Example 5). These alleles are listed by haplogroup in Tables 6 and 7, and by sub-haplogroup in Tables 8 and 9. A set of alleles useful for diagnosing all of the haplogroups and sub-haplogroups in FIG. 3 is listed in Table 10. Table 10 lists the nucleotide loci in column one and the nucleotide alleles useful for diagnosing haplogroups in column two. Table 10 contains some alleles from the Cambridge sequence. There are many equivalent methods for diagnosing the haplogroups. Methods for diagnosing haplogroups that require testing only one or a few loci are listed in Example 5. The presence of only one particular allele is usually sufficient for diagnosing a haplogroup; however, often it is not known in advance which locus needs to be tested. By determining the allele at each nucleotide locus listed in Table 10, the haplogroup of an unknown sample can be diagnosed. Alternatively, macro-haplogroups can be diagnosed or excluded first, thereby decreasing the number of loci that need to be tested to distinguish between the remaining, possible haplogroups. Alleles useful for diagnosing macro-haplogroups by methods that require testing only one or a few loci are included in Table 11. Further analysis of the data provided by this invention will demonstrate which sets of alleles identify additional sub-haplogroups and additional macro-haplogroups.
Diagnosing the haplogroup of a sample is useful in criminal investigations and forensic analyses. Identifying a sample as belonging to a particular haplogroup, and knowing which alleles have not been associated with a selected physiological condition and context, are useful when identifying novel alleles associated with a selected physiological condition, as described above and in Example 6. Diagnosing the haplogroup of a sample is also useful for identifying a novel allele associated with a selected physiological condition when the novel allele causes the physiological condition only in the genetic context of a particular haplogroup, as shown in Example 6. In Example 6, the list of alleles associated with haplogroups found in Russia was used in the sequence analysis of two Russian LHON families. By eliminating alleles listed in Table 3, two novel mutations were identified that are associated with LHON. These new complex I mutations, 3635A and 4640C, are useful for diagnosing a predisposition to Leber Hereditary Optic Neuropathy (LHON).
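The elimination step described above, in which naturally occurring alleles are removed from consideration so that candidate novel alleles remain, can be sketched as a set difference. The function name and the example allele sets below are hypothetical: the "known" alleles are made-up stand-ins, not entries of Table 3, and only 3635A is taken from the text.

```python
def candidate_novel_alleles(sample_alleles, known_natural, known_disease=()):
    """Return sample alleles that appear in neither reference list.

    sample_alleles: alleles found by sequencing the sample.
    known_natural: alleles known to occur naturally in haplogroups
        (the role played by Table 3 in the text).
    known_disease: alleles already associated with a condition
        (the role played by Table 1 in the text).
    """
    remaining = set(sample_alleles) - set(known_natural) - set(known_disease)
    # Sort by nucleotide locus; alleles are written as locus number plus base.
    return sorted(remaining, key=lambda allele: int(allele[:-1]))


if __name__ == "__main__":
    # Placeholder reference alleles, used only to demonstrate the filtering.
    stand_in_natural = {"1000A", "2000G", "5000T"}
    patient = ["2000G", "3635A", "1000A"]
    print(candidate_novel_alleles(patient, stand_in_natural))
```

Alleles that survive the filter are only candidates; as the text indicates, their association with a physiological condition must still be established, for example by analysis of affected families.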
Example 7 demonstrates the identification of a new primary LHON mutation, 10663C, in complex I, that appears to cause a predisposition to LHON only when associated with haplogroup J. Haplogroup J is defined by a nonsynonymous difference that is useful for diagnosing haplogroup J, 458T in ND5. This invention provides a method of diagnosing a person with a predisposition to LHON and/or to developing early onset blindness by identifying, in a sample containing mtDNA from the person, the nucleotide allele, or a synonymous nucleotide allele of 10663 C and also identifying alleles diagnostic of haplogroup J, such as 458T in ND5. Because ND5 458T is a missense mutation in all haplogroup J individuals, this particular mutation may be directly involved in causing LHON. • NDl 304H is another missense mutation that is present in all haplogroup J individuals, and may also be directly involved in causing LHON. 458T is also present in haplogroup T individuals. Haplogroup J is also associated with a predisposition to centenaria and an extended lifespan. ND5 458T and NDl 304H may also be directly involved in causing the predisposition to centenaria and extended lifespan.
Example 8 demonstrates the importance of demographic factors in intercontinental mtDNA sequence radiation. Haplogroups are combined and separated into various populations for statistical analyses.
Previously in the art, it has been thought that polymorphisms in human mtDNA, such as the nucleotide alleles listed in Table 3, were neutral in all contexts and could not be associated with physiological conditions. It has been thought that differences in human mtDNA diversity associated with inter-continental migrations were due to random genetic drift (e.g., founder effects followed by rapid population expansion). In this invention, the biological and clinical significance of these human mtDNA polymorphisms is disclosed. The neutrality of the nucleotide alleles listed in Table 3 was tested using neutrality analysis (Examples 9-12).
Some of the nucleotide loci in Table 3 are located in the mitochondrial protein-coding genes (Table 2). Of those loci, some of the identified nucleotide alleles alter the amino acid encoded by the codon in which the nucleotide locus resides. This is determined using the mitochondrial codon usage table, as is known in the art. Nucleotide alleles that change an amino acid are called missense mutations, missense polymorphisms, or nonsynonymous differences. Missense polymorphisms alter the protein sequence relative to a compared sequence, but they still may be neutral because they do not affect the function of the encoded protein. Without performing biochemical studies on the affected proteins, statistical analyses can be performed to determine whether a polymorphism is neutral, whether evolution imposed selection on the encoding allele, and whether that selection is positive. This invention provides results of the statistical analyses of the polymorphisms in Table 3 and provides a list of which alleles are not neutral, and therefore evolutionarily significant.
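As a minimal sketch of the codon-table step described above, the following classifies a single-base change within a codon as synonymous or nonsynonymous using the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TGA encodes tryptophan, ATA encodes methionine, and AGA and AGG are stop codons. The function name `classify_substitution` and the zero-based `position` argument are assumptions made for illustration; the sketch assumes that the other two bases of the codon are those of the compared sequence.

```python
from itertools import product

BASES = "TCAG"
# Vertebrate mitochondrial genetic code (NCBI translation table 2),
# listed for codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
MITO_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change at `position` (0, 1, or 2) of `codon`.

    Returns (kind, old_amino_acid, new_amino_acid), where kind is
    "synonymous" or "nonsynonymous" and "*" denotes a stop codon.
    """
    variant = codon[:position] + new_base + codon[position + 1:]
    old_aa, new_aa = MITO_CODE[codon], MITO_CODE[variant]
    kind = "synonymous" if old_aa == new_aa else "nonsynonymous"
    return kind, old_aa, new_aa


if __name__ == "__main__":
    print(classify_substitution("CTA", 2, "G"))  # third-position change in a Leu codon
    print(classify_substitution("GCC", 0, "A"))  # first-position change, Ala to Thr
```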
Neutrality testing of nucleotide alleles first requires neutrality testing of the genes containing those nucleotide alleles. Neutrality testing of one or more genes by comparing two sets of allelic genes from two intraspecific populations was performed, as described in Example 9. Haplogroups were combined to make populations for the comparison. In Example 9, nucleotide alleles from the entire coding region of the mtDNA genome, representing haplogroups native to a geographic region, were combined to make a first population and first set of sequences. Nucleotide alleles of the entire coding region of the mtDNA genome, from haplogroups native to a different geographic region, were combined to make the second population and the second set of sequences. Nucleotide alleles were divided into those encoding synonymous and non-synonymous differences. The ratio of Ka/Ks for each gene, separated by the population containing the allele, is shown in Table 12. Neutrality testing of genes by comparing one set of at least two nucleotide alleles of at least one gene from one population of one species was performed in Example 10. In Example 10, sequences of the entire coding region of the mtDNA genome, of haplogroups in all geographic regions on earth, were combined to make one population and set of sequences for analysis. FIG. 4 shows the results of the comparison of one set of sequences from one population of only one species, 104 human sequences. Example 11 includes comparisons of sets of sequences between two populations, human vs. P. paniscus, human vs. P. troglodytes, human vs. eight other primate species, and human vs. thirteen mammalian species.
To identify an evolutionarily significant gene, two sets of nucleotide sequences, each set from a different population, are compared to each other. Nucleotide sequences representing parts of genes or one or more whole genes are useful. The sets of sequences are compared to each other by neutrality analysis. Differences in the sequences from each set are determined to be synonymous or nonsynonymous differences. The proportion of nonsynonymous differences is compared to the proportion of synonymous differences (Ka/Ks). The results of the analysis are compiled in a data set and the data set is analyzed, as is known in the art, to identify one or more evolutionarily significant genes. When the nonsynonymous differences occur significantly more often than is expected by chance than the synonymous differences, the gene or part of the gene is determined to be evolutionarily significant. When the synonymous differences occur significantly more often than is expected by chance than the nonsynonymous differences, the gene or part of the gene is determined to be conserved. When the ratio is as expected by chance, then there is no evidence of selection or evolutionary significance.
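The ratio referred to above can be illustrated with a deliberately simplified calculation. The sketch below assumes that Ka and Ks are taken as uncorrected proportions (nonsynonymous differences per nonsynonymous site, and synonymous differences per synonymous site); it applies no multiple-hit correction and performs no significance test, both of which an actual neutrality analysis would require. The function name and arguments are hypothetical.

```python
def ka_ks_ratio(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return an uncorrected Ka/Ks ratio from counts of differences and sites.

    Ka = nonsynonymous differences per nonsynonymous site.
    Ks = synonymous differences per synonymous site.
    This is a simplification: no correction for multiple substitutions
    at one site is applied, and no statistical test is performed.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    ka = nonsyn_diffs / nonsyn_sites
    ks = syn_diffs / syn_sites
    if ks == 0:
        # The ratio is undefined without synonymous differences.
        return float("inf") if ka > 0 else float("nan")
    return ka / ks


if __name__ == "__main__":
    # Fewer nonsynonymous than synonymous differences per site.
    print(ka_ks_ratio(nonsyn_diffs=6, nonsyn_sites=300, syn_diffs=10, syn_sites=100))
```

A ratio well below one is the pattern described above for a conserved gene, and a ratio well above one is the pattern described for an evolutionarily significant gene; whether a given departure from one is meaningful depends on the statistical tests applied to the data set.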
To identify an evolutionarily significant gene, only one set of nucleotide sequences (from only one population) may also be analyzed, e.g., the nucleotide sequences representative of humans living on one continent. When only one set of sequences is analyzed, the set must contain at least two corresponding nucleotide alleles (i.e., there must be sequence polymorphism). Corresponding sequences are sequences of the same gene or gene part from at least two individuals. The sequences from different individuals within the population must contain polymorphisms with respect to each other. Differences in the sequences relative to each other are determined to be synonymous or nonsynonymous. Neutrality analysis is performed to generate a data set. The data set is analyzed to identify an evolutionarily significant gene. If an analysis determines that none of the analyzed genes are evolutionarily significant, the set of nucleotide sequences can be increased, such as by increasing the size of the population from which the sequences are derived, to determine if one or more genes are evolutionarily significant in the enlarged population. Example 12 is similar to Example 9 except that the data is further analyzed by manipulating Ka/Ks to Kc. Examples 9-12 demonstrate that all but one mtDNA gene are not neutral and therefore are evolutionarily significant. Genes are determined to not be neutral by statistical significance tests known in the art. Some genes are only evolutionarily significant when comparing selected populations. For example, ND4 was demonstrated to be significant when comparing Native American sequences to African sequences and when comparing all human sequences to each other, but not when comparing European to African sequences. ND4L is the only mtDNA gene not shown to be evolutionarily significant by the current analyses.
ND4L might be demonstrated to be evolutionarily significant by the methods of this invention using one or more different populations or using only part of the gene sequence. In Examples 9-12, the entire sequence of each gene was used for analysis; however, portions of genes are also useful in the methods of this invention. The statistical significance tests prevent too small a gene portion from being used to determine non-neutrality.
After identifying evolutionarily significant genes, evolutionarily significant nucleotide alleles can be identified. To identify an evolutionarily significant nucleotide allele, the steps for identifying an evolutionarily significant gene, using one or two populations, are performed with the addition of a step of analyzing the sequence data set to determine an evolutionarily significant nucleotide allele. An evolutionarily significant nucleotide allele is part of a sequence encoding an allelic amino acid in an evolutionarily significant gene or part of a gene. Examples 13 and 14 demonstrate identification of evolutionarily significant nucleotide alleles and evolutionarily significant amino acid alleles in the evolutionarily significant genes identified in Examples 9-12. Evolutionarily significant amino acid alleles are the amino acids encoded by the codons containing evolutionarily significant nucleotide alleles. In these examples, nucleotides at loci not listed in Table 3 are identical to the Cambridge sequence so that the entire codon containing an evolutionarily significant nucleotide allele and the amino acid encoded by that codon can be determined. All nucleotide alleles that are part of a codon encoding the same amino acid as an evolutionarily significant amino acid allele identified herein, or identified by methods of this invention, are also evolutionarily significant and are intended to be within the scope of this invention. An evolutionarily significant amino acid allele may include more than one nucleotide allele, such as at two neighboring nucleotide loci. Evolutionarily significant nucleotide alleles and evolutionarily significant amino acid alleles in human mitochondrial sequences, identified by the methods of this invention, are listed in Table 14.
In column one, Table 14 lists the gene containing the alleles, column two indicates the locus of the nucleotide allele, column three lists the Cambridge nucleotide allele at that nucleotide locus, column four lists a non-Cambridge allele of this invention, column five lists the amino acid encoded by the codon containing the Cambridge nucleotide allele (when other Cambridge nucleotides are present at the other nucleotide loci of the codon), and column six lists the amino acid encoded by the codon containing the non-Cambridge allele (when Cambridge nucleotides are present at the other nucleotide loci of the codon). Columns two, three, and four make the set of evolutionarily significant human mitochondrial nucleotide alleles. Columns two, five, and six make the set of evolutionarily significant human mitochondrial amino acid alleles. Table 14 designates the nucleotide locus of the listed alleles. For the amino acid alleles listed in columns five and six, the relevant loci are all three nucleotide loci in the encoding codon containing the nucleotide locus listed in column two.
To identify an evolutionarily significant amino acid allele, the steps for identifying an evolutionarily significant gene, using one or two populations, are performed with the addition of two steps: 1) analyzing the data set to determine an evolutionarily significant nucleotide allele; and 2) determining the encoded amino acid allele. An evolutionarily significant amino acid allele is a different amino acid, representing a nonsynonymous difference, relative to the corresponding amino acid allele against which it was compared, wherein the gene has been determined to be evolutionarily significant in the corresponding one or more populations.
In this invention it is demonstrated that amino acid substitution mutations (nonsynonymous differences) are much more common in human mtDNAs than would be expected by chance, and that most of them are evolutionarily significant. This invention demonstrates that these alleles have become fixed by selection. The mitochondrial genes encode proteins that are responsible for generating energy and for generating heat to maintain body temperature. As humans migrated to different parts of the world, they encountered changes in diet and climate. The high mutation rate of mtDNA and the central role of mitochondrial proteins in cellular energetics make the mtDNA an ideal system for permitting rapid mammalian adaptation to varying climatic and dietary conditions. The increased amino acid sequence variability that has been found among human mtDNA genes is due to the fact that natural selection favored mtDNA alleles that altered the coupling efficiency between the electron transport chain (ETC) and ATP synthesis, determined by the mitochondrial inner membrane proton gradient (ΔΨ). The coupling efficiency between the ETC and ATP synthesis is mediated to a considerable extent by the proton channel of the ATP synthase, which is composed of the mtDNA-encoded ATP6 protein and the nuclear DNA-encoded ATP9 protein. Mutations in the ATP6 gene, which create a more leaky ATP synthase proton channel, reduced ATP production but increased heat production for each calorie consumed. Such a change in energy balance was beneficial in a temperate or arctic climate, but deleterious in a tropical climate. Humans acquiring mtDNA alleles enabling better adaptation to the encountered changes in diet and climate experienced a higher genetic fitness and those alleles were selected for. In particular, these alleles were established genetically because they had an adaptive advantage as humans moved from the African tropics into the Eurasian temperate zone and on into the arctic (FIG. 2).
The lack of recombination of the maternally inherited mtDNAs favored the rapid segregation, expression and adaptive selection of advantageous mtDNA alleles. The apparent non-randomness of the differences in non-synonymous versus synonymous mtDNA variation between continents demonstrates that selection also influenced inter-continental colonization. Random genetic hitchhiking, such as in the synonymous alleles, then resulted in identifiable continent-specific haplogroups.
Modern mtDNA variation has been shaped by adaptation as our ancestors moved into different environmental conditions. Variants that are advantageous in one climatic and dietary environment are maladaptive when individuals locate to a different environment. The methods of this invention associate mtDNA nucleotide alleles with haplogroups and combine this data with native haplogroup geographic regions as is known in the art, to diagnose individuals as having predispositions to late-onset clinical disorders such as obesity, diabetes, hypertension, and cardiovascular disease when those individuals live in climatic and dietary environments that are disadvantageous with respect to their mtDNA alleles. When humans having regional mtDNA alleles move into a different thermal and/or dietary environment from the one in which the alleles were selected, they are energetically unbalanced with their environment, and as a result are predisposed to having metabolic diseases such as diabetes, hypertension, cardiovascular disease, and other diseases known to the art to be associated with metabolism and mitochondrial functions. The above-mentioned late-onset clinical disorders are rapidly becoming epidemic around the world in members of our globally mobile society. This invention provides a method of diagnosing a human with a predisposition to a physiological condition such as, but not limited to, energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.
The method involves testing a sample containing mitochondrial nucleic acid from an individual in a geographic region to determine the haplogroup of the sample and therefore of the individual, comparing the haplogroup of the individual to the set of haplogroups known to be native to that geographic region, and diagnosing the individual human with a predisposition to the above-mentioned conditions if the haplogroup of the individual is not in the set of haplogroups native to that geographic region. This invention enables treatment of one of the above-mentioned conditions that is diagnosed by the above-mentioned method, comprising relocating the diagnosed human to a geographic region that is of similar climate as the region(s) native to the human's haplogroup and/or changing the diagnosed human's diet to more closely match the diet historically available in the region(s) native to the human's haplogroup.
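The comparison step of the above method reduces to a set-membership test, sketched below for illustration only. The region names and haplogroup sets in `NATIVE_HAPLOGROUPS` are hypothetical placeholders and are not taken from this disclosure; in practice the mapping would be populated from the native haplogroup geographic regions known in the art.

```python
# Placeholder mapping of geographic region -> haplogroups native to it.
# Region names and haplogroup labels are illustrative only.
NATIVE_HAPLOGROUPS = {
    "region_1": {"hg_alpha", "hg_beta"},
    "region_2": {"hg_gamma"},
}


def flag_predisposition(haplogroup, region, native=NATIVE_HAPLOGROUPS):
    """Return True when the haplogroup is not native to the region.

    The haplogroup is assumed to have been diagnosed already from a sample
    containing mitochondrial nucleic acid from the individual.
    """
    if region not in native:
        raise KeyError(f"no native haplogroup set recorded for {region!r}")
    return haplogroup not in native[region]


if __name__ == "__main__":
    print(flag_predisposition("hg_gamma", "region_1"))  # non-native haplogroup
    print(flag_predisposition("hg_alpha", "region_1"))  # native haplogroup
```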
The above-described method for diagnosing a predisposition to a physiological condition is also useful for associating an amino acid allele with the physiological condition. The evolutionarily significant amino acid alleles present in the haplogroup of the diagnosed individual and not in the haplogroups native to the individual's geographic location are associated with the physiological condition by the methods of this invention. Amino acid alleles, and the corresponding nucleotide alleles, useful for diagnosing haplogroups, and the haplogroup they are useful for diagnosing, are listed in Table 15. The amino acid alleles and corresponding nucleotide alleles listed in Table 15, and synonymously coding nucleotide alleles, are associated with the above-mentioned physiological conditions. Table 15 lists the set of amino acid alleles useful for diagnosing haplogroups. Column one of Table 15 lists the gene, column two lists the nucleotide locus, column three lists the useful nucleotide allele, column four lists the useful amino acid allele encoded by the useful nucleotide allele when Cambridge nucleotides are present at the other nucleotide loci of the encoding codon, and column five lists the haplogroups or sub-haplogroups, in parentheses, that contain the corresponding alleles. The amino acid alleles (column four) can be identified by the codon containing the nucleotide locus (column two). For example, the proline in the ND1 gene is identified as ntl 3796 P, where ntl signifies the codon containing the nucleotide locus (ntl) 3796. When an individual of one of the haplogroups listed in column five of Table 15 is diagnosed with one of the above-mentioned physiological conditions by the above-mentioned method, the physiological condition is associated with the presence of one of the alleles listed in Table 15. When the haplogroup of the individual is haplogroup G, the amino acid allele likely to have caused the physiological condition is ntl 4833 A.
When the haplogroup of the individual is haplogroup T, the amino acid allele is selected from the group consisting of ntl 14917 D, ntl 8701 T, and ntl 15452 I. When the haplogroup is haplogroup W, the amino acid allele is selected from the group consisting of ntl 5046 I, ntl 5460 T, ntl 8701 T, and ntl 15884 P. When the haplogroup is haplogroup D, the amino acid allele is selected from the group consisting of ntl 5178 M and ntl 8414 F. When the haplogroup is haplogroup L0, the amino acid allele is selected from the group consisting of ntl 5442 L, ntl 7146 A, ntl 9402 P, ntl 13105 V, and ntl 13276 V. When the haplogroup is haplogroup L1, the amino acid allele is selected from the group consisting of ntl 7146 A, ntl 7389 H, ntl 13105 V, ntl 13789 H, and ntl 14178 V. When the haplogroup is haplogroup C, the amino acid allele is selected from the group consisting of ntl 8584 T and ntl 14318 S. When the haplogroup is selected from the group consisting of haplogroups A, I, X, B, F, Y, and U, the amino acid allele is ntl 8701 T. When the haplogroup is haplogroup J, the amino acid allele is selected from the group consisting of ntl 8701 T, ntl 13708 T, and ntl 15452 I. When the haplogroup is selected from the group consisting of haplogroups V and H, the amino acid allele is selected from the group consisting of ntl 8701 T and ntl 14766 T.
Evolutionarily significant nucleotide and amino acid alleles also exist in nuclear-encoded ATP9 that are useful for diagnosing predisposition to an energy metabolism-related physiological condition such as energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, centenaria, diabetes, hypertension, and cardiovascular disease. These alleles may be identified by methods of this invention.
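For convenience, the haplogroup-to-allele associations recited in the two preceding paragraphs can be held in a simple lookup structure, sketched below. Each entry is a (nucleotide locus, one-letter amino acid) pair taken from the text above; the dictionary name, the function name, and the reading of the sixth haplogroup in the enumeration as L1 are assumptions made for this sketch.

```python
# (nucleotide locus, amino acid) pairs as recited in the text, keyed by haplogroup.
_SHARED_8701 = [(8701, "T")]
CANDIDATE_ALLELES = {
    "G": [(4833, "A")],
    "T": [(14917, "D"), (8701, "T"), (15452, "I")],
    "W": [(5046, "I"), (5460, "T"), (8701, "T"), (15884, "P")],
    "D": [(5178, "M"), (8414, "F")],
    "L0": [(5442, "L"), (7146, "A"), (9402, "P"), (13105, "V"), (13276, "V")],
    # Read here as L1; an assumption if the source renders it differently.
    "L1": [(7146, "A"), (7389, "H"), (13105, "V"), (13789, "H"), (14178, "V")],
    "C": [(8584, "T"), (14318, "S")],
    "J": [(8701, "T"), (13708, "T"), (15452, "I")],
    "V": [(8701, "T"), (14766, "T")],
    "H": [(8701, "T"), (14766, "T")],
}
# Haplogroups A, I, X, B, F, Y, and U share the single allele ntl 8701 T.
for _hg in ("A", "I", "X", "B", "F", "Y", "U"):
    CANDIDATE_ALLELES[_hg] = list(_SHARED_8701)


def candidate_alleles(haplogroup):
    """Return the candidate amino acid alleles as 'ntl <locus> <aa>' strings."""
    return [f"ntl {locus} {aa}" for locus, aa in CANDIDATE_ALLELES[haplogroup]]


if __name__ == "__main__":
    print(candidate_alleles("J"))
```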
The evolutionarily significant amino acid alleles and corresponding nucleotide alleles are candidates for alleles causing a physiological condition for which a predisposition is diagnosable by the methods of this invention. The evolutionarily significant amino acid and nucleotide alleles identified by the methods of this invention (Table 19) are useful for gene therapy and mitochondrial replacement therapy to treat the corresponding physiological conditions. The evolutionarily significant genes, amino acid alleles, and nucleotide alleles identified by the methods of this invention are useful for identifying targets for traditional therapy, and for designing corresponding therapeutic agents. The evolutionarily significant genes and amino acid and nucleotide changes identified by the methods of this invention are useful for generating animal models of the corresponding human physiological conditions.
As is known to the art, individuals may contain more than one mitochondrial DNA allele at any given nucleotide locus. One cell contains many mitochondria, and one cell or different cells within one organism may contain genetically different mitochondria. Heteroplasmy is the occurrence of more than one type of mitochondria in an individual or sample. Varying degrees of heteroplasmy are associated with varying degrees of the physiological conditions described herein. Heteroplasmy may be identified by means known to the art, and the severity of the physiological condition associated with specific nucleotide alleles is expected to vary with the percentage of such associated alleles within the individual.
The methods of this invention are used to analyze the human mitochondrial genome in the listed examples, but the methods are also useful for analyzing other genomes and other species. The methods of this invention are useful for identifying evolutionarily significant protein-coding genes and the correspondingly encoded mutations in other genomes in addition to mitochondrial genomes, such as in nuclear and chloroplast genomes. Using human haplogroups as populations (FIG. 1), the methods of this invention are useful for identifying evolutionarily significant protein-coding genes and the corresponding evolutionarily significant alleles in human nuclear genes. The methods of this invention are also useful for identifying evolutionarily significant protein-coding genes and the corresponding alleles in many species. For example, the methods of this invention are applicable to varieties of beef or dairy cattle, or pig lines. Corn lines are divisible by phenotypic and/or molecular markers into heterotic groups that are useful populations in the methods of this invention. Using corn heterotic groups as populations, the methods of this invention are useful for identifying evolutionarily significant protein-coding genes and the corresponding mutations in the nuclear, chloroplast, and mitochondrial genomes of corn.
This invention provides isolated nucleic acid molecules containing novel nucleotide alleles of this invention in libraries. The libraries contain at least two such molecules. Preferably the molecules have unique sequences. The molecules typically have a length from about 7 to about 30 nucleotides. "About" as used herein means within about 10% (e.g., "about 30 nucleotides" means 27-33 nucleotides). However, the molecules may be longer, such as about 50 nucleotides long. A library of this invention contains at least two isolated nucleic acid molecules each containing at least one non-Cambridge nucleotide allele of this invention. A library of this invention may contain at least ten, twenty-five, fifty, 100, 500 or more isolated nucleic acid molecules, at least one of which contains a nucleotide allele of this invention. A library of this invention may contain molecules having at least two to all of the nucleotide alleles of this invention, including synonymous codings of evolutionarily significant amino acid alleles. The nucleotide alleles of this invention are defined by a nucleotide locus, the nucleotide location in the human mitochondrial genome, and by the A G C T (or U) nucleotide. An isolated nucleic acid molecule, in a library of this invention, can be identified as containing a nucleotide allele of this invention, because the nucleotide allele of this invention is bounded on at least one side by its context in the mitochondrial genome. Statistically, to be unique in the human mitochondrial genome, such a molecule would need to be at least about seven nucleotides long. Statistically, to be unique in the total human genome, including the mitochondrial genome, such a molecule would need to be at least about fifteen nucleotides long.
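The statistical lengths mentioned above can be approximated by a back-of-envelope calculation, offered here only as a sketch. Under the simplifying assumption that bases are independent and equally frequent, a given sequence of length k is expected to occur by chance about L / 4^k times in a genome of L nucleotides, so the length at which the expectation falls to one is the base-4 logarithm of L. The Cambridge reference sequence is 16,569 nucleotides long; the figure of 3.1 billion nucleotides used for the nuclear genome is an approximate assumption.

```python
import math


def length_for_one_expected_match(genome_length):
    """Return log base 4 of the genome length.

    Under a uniform, independent-base model this is the sequence length k
    at which the expected number of chance occurrences, L / 4**k, equals one.
    """
    return math.log(genome_length, 4)


if __name__ == "__main__":
    mito = length_for_one_expected_match(16_569)
    total = length_for_one_expected_match(3.1e9 + 16_569)  # nuclear size is approximate
    print(f"mitochondrial genome: {mito:.2f} nucleotides")
    print(f"total genome: {total:.2f} nucleotides")
```

The result is close to seven nucleotides for the mitochondrial genome and between fifteen and sixteen for the total genome, which is of the same order as the lengths stated above. Real genomes contain repeats and compositional bias, so a particular molecule may need to be longer to be unique.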
Examples of isolated nucleic acid molecules of this invention are molecules containing the following nucleotide alleles: 1) Cambridge alleles at human mtDNA nucleotide loci 168-170, the non-Cambridge allele 171A, and Cambridge alleles at human mtDNA nucleotide loci 172-174; and 2) Cambridge alleles at 11940-11946, the non-Cambridge allele 11947G, and Cambridge alleles at 11948-11954. An isolated nucleic acid molecule of this invention may contain more than one nucleotide allele of this invention. The nucleotide allele of this invention may be at any position in the isolated nucleic acid molecule. Often it is useful to have the relevant nucleotide allele in the center of the isolated nucleic acid molecule or on the 3' end of the molecule. Isolated nucleic acid molecules of this invention are useful for interrogating, that is, determining the presence or absence of, a nucleotide allele at the corresponding nucleotide locus in the mitochondrial genome in a sample containing mitochondrial nucleic acid from a human, using any method known in the art. Methods for determining the presence or absence of the nucleotide allele include allele-specific PCR and nucleic acid array hybridization or sequencing.
The alleles and libraries of this invention are useful for designing probes for nucleic acid arrays. This invention provides nucleic acid arrays having two or more nucleic acid molecules or spots (each spot comprising a plurality of substantially identical isolated nucleic acid molecules), each molecule having the sequence of an allele of this invention. The molecules on the arrays of this invention are usually about 7 to about 30 nucleotides long. The arrays are useful for detecting the presence or absence of alleles. Arrays of this invention are also useful for sequencing human mtDNA. Alleles may be selected from sets of nucleotide alleles including human mtDNA nucleotide alleles, non-Cambridge human mtDNA nucleotide alleles, human mtDNA nucleotide alleles in 48 genomes and the Cambridge sequence, non-Cambridge human mtDNA nucleotide alleles in 48 genomes, nucleotide alleles useful for diagnosing human haplogroups and macro-haplogroups, nucleotide alleles useful for diagnosing human haplogroups, and evolutionarily significant human mitochondrial nucleotide alleles as listed in the various Tables and portions of tables hereof. Arrays of this invention may contain molecules capable of interrogating all of the alleles in one of the above-mentioned sets of alleles. A genotyping array useful for detecting sequence polymorphisms, such as is provided by this invention, is similar to Affymetrix (Santa Clara, CA, USA) genotyping arrays containing a Perfect Match probe (PM) and a corresponding Mismatch probe (MM). A PM probe could comprise a non-Cambridge allele at a selected nucleotide locus and the corresponding MM probe could comprise the corresponding Cambridge allele at the selected nucleotide locus. Arrays of this invention include sequencing arrays for human mtDNA.
As used herein, "array" refers to an ordered set of isolated nucleic acid molecules or spots consisting of pluralities of substantially identical isolated nucleic acid molecules. Preferably the molecules are attached to a substrate. The spots or molecules are ordered so that the location of each (on the substrate) is known and the identity of each is known. Arrays on a microscale can be called microarrays. Microarrays on solid substrates, such as glass or other ceramic slides, can be called gene chips or chips.
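The construction of such a molecule, with the allele of interest in the center and reference (Cambridge) alleles on either side, can be sketched as follows. The reference string used in the demonstration is a made-up placeholder and not the Cambridge sequence; the function name and the `flank` parameter are assumptions made for illustration. Loci are numbered from one, and the circularity of the mitochondrial genome is ignored for simplicity.

```python
def make_centered_probe(reference, locus, allele, flank=3):
    """Build a sequence of length 2*flank + 1 with `allele` at its center.

    `reference` is the reference sequence as a string, `locus` is the
    1-based nucleotide position, and the flanking bases are copied from
    the reference. Positions too close to either end raise ValueError,
    because wrap-around of the circular genome is not modeled.
    """
    index = locus - 1
    if index - flank < 0 or index + flank >= len(reference):
        raise ValueError("locus too close to the end of the reference")
    left = reference[index - flank:index]
    right = reference[index + 1:index + flank + 1]
    return left + allele + right


if __name__ == "__main__":
    toy_reference = "ACGTACGTACGT"  # placeholder, not real mtDNA sequence
    print(make_centered_probe(toy_reference, locus=6, allele="T", flank=3))
```

With `flank=3` the molecule spans seven loci, as in the first example above (168-174), and with `flank=7` it spans fifteen loci, as in the second (11940-11954).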
Arrays are preferably printed on solid substrates. Before printing, substrates such as glass slides are prepared to provide a surface useful for binding, as is known to the art. Arrays may be printed using any printing techniques and machines known in the art. Printing involves placing the probes on the substrate, attaching the probes to the substrate, and blocking the substrate to prevent non-specific hybridization. Spots are printed at known locations. Arrays may be printed on glass microscope slides. Alternatively, probes may be synthesized in known positions on prepared solid substrates (Affymetrix, Santa Clara, CA, USA).
Arrays of this invention may contain as few as two spots, or more than about ten spots, more than about twenty-five spots, more than about one hundred spots, more than about 1000 spots, more than about 65,000 spots, or up to about several hundred thousand spots.
Using microarrays may require amplification of the target sequences of interest (generation of multiple copies of the same sequence), such as by PCR or reverse transcription. As the nucleic acid is copied, it is tagged with a fluorescent label that emits light. The labeled nucleic acid is introduced to the microarray and allowed to react for a period of time. This nucleic acid sticks to, or hybridizes with, the probes on the array when the probe is sufficiently complementary to the labeled, amplified, sample nucleic acid. The extra nucleic acid is washed off of the array, leaving behind only the nucleic acid that has bound to the probes. By obtaining an image of the array with a fluorescent scanner and using software to analyze the hybridized array image, it can be determined if, and to what extent, genes are switched on and off, or whether or not sequences are present, by comparing fluorescent intensities at specific locations on the array. The intensity of the signal indicates to what extent a sequence is present. In expression arrays, high fluorescent signals indicate that many copies of a gene are present in a sample, and a lower fluorescent signal indicates that a gene is less active. By selecting appropriate hybridization conditions and probes, this technique is useful for detecting single nucleotide polymorphisms (SNPs) and for sequencing. Methods of designing and using microarrays are continuously being improved (Relogio, A. et al. (2002) Nuc. Acids. Res. 30(11):e51; Iwasaki, H. et al. (2002) DNA Res. 9(2):59-62; and Lindroos, K. et al. (2002) Nuc. Acids. Res. 30(14):E70).
Arrays of this invention may be made by any array synthesis methods known in the art such as spotting technology or solid phase synthesis. Preferably the arrays of this invention are synthesized by solid phase synthesis using a combination of photolithography and combinatorial chemistry. Some of the key elements of probe selection and array design are common to the production of all arrays. Strategies to optimize probe hybridization, for example, are invariably included in the process of probe selection. Hybridization under particular pH, salt, and temperature conditions can be optimized by taking into account melting temperatures and by using empirical rules that correlate with desired hybridization behaviors. Computer models may be used for predicting the intensity and concentration-dependence of probe hybridization. Detecting a particular polymorphism can be accomplished using two probes. One probe is designed to be perfectly complementary to a target sequence, and a partner probe is generated that is identical except for a single base mismatch in its center. In the Affymetrix system, these probe pairs are called the Perfect Match probe (PM) and the Mismatch probe (MM). They allow for the quantitation and subtraction of signals caused by non-specific cross-hybridization. The difference in hybridization signals between the partners, as well as their intensity ratios, serve as indicators of specific target abundance, and consequently of the sequence.
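The probe-pair arrangement described above can be illustrated with a short sketch. It derives a mismatch partner from a perfect-match probe by changing only the central base, and it computes the difference and ratio of the two hybridization signals. Substituting the complement of the central base is one possible choice of mismatch and is an assumption of this sketch, as are the function names and the example intensities.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_partner(perfect_match):
    """Return a probe identical to `perfect_match` except at its central base.

    The central base is replaced by its complement (an assumed convention).
    For an even-length probe the base just right of center is changed.
    """
    mid = len(perfect_match) // 2
    return perfect_match[:mid] + COMPLEMENT[perfect_match[mid]] + perfect_match[mid + 1:]


def specific_signal(pm_intensity, mm_intensity):
    """Return (PM - MM difference, PM / MM ratio) for one probe pair."""
    if mm_intensity <= 0:
        raise ValueError("mismatch intensity must be positive to form a ratio")
    return pm_intensity - mm_intensity, pm_intensity / mm_intensity


if __name__ == "__main__":
    pm = "GTATGTA"
    print(pm, mismatch_partner(pm))
    print(specific_signal(900.0, 300.0))
```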
Arrays can rely on multiple probes to interrogate individual nucleotides in a sequence. The identity of a target base can be deduced using four identical probes that vary only in the target position, each containing one of the four possible bases. Alternatively, the presence of a consensus sequence can be tested using one or two probes representing specific alleles. To genotype heterozygous or genetically mixed samples, arrays with many probes can be created to provide redundant information, resulting in unequivocal genotyping.
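The four-probe approach to deducing a target base can be sketched as a comparison of the four probe intensities. This is a minimal sketch under assumptions: the function name, the ratio threshold, and the example intensities are hypothetical, and a real base caller would also model noise and heteroplasmy.

```python
def call_base(intensities, min_ratio=1.5):
    """Call the base at one target position from four probes.

    intensities: dict mapping 'A', 'C', 'G', 'T' to the signal of the probe
    carrying that base at the interrogated position.
    Returns the called base, or 'N' when no probe clearly dominates.
    min_ratio is an assumed, illustrative threshold.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if best <= 0:
        return "N"                      # no usable signal at all
    if second <= 0 or best / second >= min_ratio:
        return best_base                # strongest probe clearly dominates
    return "N"                          # ambiguous, e.g. a mixed sample


# Hypothetical intensities for one position.
print(call_base({"A": 50, "C": 900, "G": 40, "T": 60}))
```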
Probes fixed on solid substrates and targets (nucleotide sequences in the sample) are combined in a hybridization buffer solution and held at an appropriate temperature until annealing occurs. Thereafter, the substrate is washed free of extraneous materials, leaving the target nucleic acids bound to the fixed probe molecules and allowing for detection and quantitation by methods known in the art such as autoradiography, liquid scintillation counting, and/or fluorescence. As improvements are made in hybridization and detection techniques, they can be readily applied by one of ordinary skill in the art. As is well known in the art, if the probe molecules and target molecules hybridize by forming a strong non-covalent bond between the two molecules, it can be reasonably assumed that the probe and target nucleic acid are essentially identical, or almost completely complementary, if the annealing and washing steps are carried out under conditions of high stringency. The detectable label provides a means for determining whether hybridization has occurred.
When using oligonucleotides or polynucleotides as hybridization probes, the probes may be labeled. In arrays of this invention, the target may instead be labeled by means known to the art. Target may be labeled with radioactive or non-radioactive labels. Targets preferably contain fluorescent labels. Various degrees of stringency of hybridization can be employed. The more stringent the conditions are, the greater the complementarity that is required for duplex formation. Stringency can be controlled by temperature, probe concentration, probe length, ionic strength, time, and the like. Hybridization experiments are often conducted under moderate to high stringency conditions by techniques well known in the art, as described, for example, in Keller, G.H., and M.M. Manak (1987) DNA Probes, Stockton Press, New York, NY, pp. 169-170, hereby incorporated by reference. However, sequencing arrays typically use lower hybridization stringencies, as is known in the art.
Moderate to high stringency conditions for hybridization are known to the art. An example of high stringency conditions for a blot are hybridizing at 68° C in 5X SSC/5X Denhardt's solution/0.1% SDS, and washing in 0.2X SSC/0.1% SDS at room temperature. An example of conditions of moderate stringency are hybridizing at 68° C in 5X SSC/5X Denhardt's solution/0.1% SDS and washing at 42° C in 3X SSC. The parameters of temperature and salt concentration can be varied to achieve the desired level of sequence identity between probe and target nucleic acid. See, e.g., Sambrook et al. (1989) vide infra or Ausubel et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, NY, NY, for further guidance on hybridization conditions.
The melting temperature (Tm) is described by the following formula (Beltz, G.A. et al., [1983] Methods of Enzymology, R. Wu, L. Grossman and K. Moldave [Eds.] Academic Press, New York 100:266-285).
Tm = 81.5 °C + 16.6 log[Na+] + 0.41(%G+C) - 0.61(% formamide) - 600/(length of duplex in base pairs).
Washes can typically be carried out as follows: twice at room temperature for 15 minutes in 1X SSPE, 0.1% SDS (low stringency wash), and once at Tm - 20 °C for 15 minutes in 0.2X SSPE, 0.1% SDS (moderate stringency wash).
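The melting temperature formula and the moderate stringency wash temperature can be worked through numerically. The sketch below assumes that the logarithm is base 10 and that [Na+] is given in mol/L; the function name and the example input values are hypothetical.

```python
import math


def melting_temperature(na_molar, gc_percent, formamide_percent, length_bp):
    """Tm in degrees C from the formula in the text.

    Assumptions: log is base 10 and na_molar is the Na+ concentration in mol/L.
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 600.0 / length_bp)


# Hypothetical duplex: 25 bp, 50% G+C, 0.5 M Na+, no formamide.
tm = melting_temperature(na_molar=0.5, gc_percent=50.0,
                         formamide_percent=0.0, length_bp=25)
wash_temp = tm - 20.0   # moderate stringency wash at Tm - 20 degrees C
print(round(tm, 1), round(wash_temp, 1))
```

Because the length term is 600 divided by the duplex length, short probes are penalized strongly, which is why probe length appears among the parameters that control stringency.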
Nucleic acid useful in this invention can be created by Polymerase Chain Reaction (PCR) amplification. PCR products can be confirmed by agarose gel electrophoresis. PCR is a repetitive, enzymatic, primed synthesis of a nucleic acid sequence. This procedure is well known and commonly used by those skilled in this art (see Mullis, U.S. Patent Nos. 4,683,195, 4,683,202, and 4,800,159; Saiki et al. [1985] Science 230:1350-1354). PCR is used to enzymatically amplify a DNA fragment of interest that is flanked by two oligonucleotide primers that hybridize to opposite strands of the target sequence. The primers are oriented with the 3' ends pointing towards each other. Repeated cycles of heat denaturation of the template, annealing of the primers to their complementary sequences, and extension of the annealed primers with a DNA polymerase result in the amplification of the segment defined by the 5' ends of the PCR primers. Since the extension product of each primer can serve as a template for the other primer, each cycle essentially doubles the amount of DNA template produced in the previous cycle. This results in the exponential accumulation of the specific target fragment, up to several million-fold in a few hours. By using a thermostable DNA polymerase such as the Taq polymerase, which is isolated from the thermophilic bacterium Thermus aquaticus, the amplification process can be completely automated. Other enzymes that can be used are known to those skilled in the art.
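The statement that each cycle essentially doubles the template can be checked with simple arithmetic. This sketch assumes an idealized reaction; the per-cycle efficiency parameter is a hypothetical addition (1.0 means perfect doubling) and is not described in the text.

```python
def pcr_fold_amplification(cycles, efficiency=1.0):
    """Idealized fold amplification after a number of PCR cycles.

    efficiency: assumed fraction of templates copied per cycle (0.0 to 1.0);
    1.0 corresponds to perfect doubling every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


# With perfect doubling, about 22 cycles are enough to exceed a
# four-million-fold increase.
for n in (10, 20, 22, 30):
    print(n, pcr_fold_amplification(n))
```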
Polynucleotide sequences of the present invention can be truncated and/or mutated such that certain of the resulting fragments and/or mutants of the original full-length sequence can retain the desired characteristics of the full-length sequence. A wide variety of restriction enzymes that are suitable for generating fragments from larger nucleic acid molecules are well known. In addition, it is well known that Bal31 exonuclease can be conveniently used for time-controlled limited digestion of DNA. See, for example, Maniatis (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, pages 135-139, incorporated herein by reference. See also Wei et al. (1983) J. Biol. Chem. 258:13006-13512. By use of Bal31 exonuclease (commonly referred to as "erase-a-base" procedures), the ordinarily skilled artisan can remove nucleotides from either or both ends of the subject nucleic acids to generate a wide spectrum of fragments that are functionally equivalent to the subject nucleotide sequences. One of ordinary skill in the art can, in this manner, generate hundreds of fragments of controlled, varying lengths from locations all along the original molecule. The ordinarily skilled artisan can routinely test or screen the generated fragments for their characteristics and determine the utility of the fragments as taught herein. It is also well known that the mutant sequences can be easily produced with site-directed mutagenesis. See, for example, Larionov, O.A. and Nikiforov, V.G. (1982) Genetika 18(3):349-59; and Shortle, D. et al., (1981) Annu. Rev. Genet. 15:265-94, both incorporated herein by reference. The skilled artisan can routinely produce deletion-, insertion-, or substitution-type mutations and identify those resulting mutants that contain the desired characteristics of wild-type sequences, or fragments thereof.
Percent sequence identity of two nucleic acids may be determined using the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al. (1990) J. Mol. Biol. 215:402-410. BLAST nucleotide searches are performed with the NBLAST program, score = 100, wordlength = 12, to obtain nucleotide sequences with the desired percent sequence identity. To obtain gapped alignments for comparison purposes, Gapped BLAST is used as described in Altschul et al. (1997) Nucl. Acids. Res. 25:3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (NBLAST and XBLAST) are used. See http://www.ncbi.nih.gov.
Standard techniques for cloning, DNA isolation, amplification and purification, for enzymatic reactions involving DNA ligase, DNA polymerase, restriction endonucleases and the like, and various separation techniques useful herein are those known and commonly employed by those skilled in the art. A number of standard techniques are described in Sambrook et al. (1989) Molecular Cloning, Second Edition, Cold Spring Harbor Laboratory, Plainview, New York; Maniatis et al. (1982) Molecular Cloning, Cold Spring Harbor Laboratory, Plainview, New York; Wu (ed.) (1993) Meth. Enzymol. 218, Part I; Wu (ed.) (1979) Meth. Enzymol. 68; Wu et al. (eds.) (1983) Meth. Enzymol. 100 and 101; Grossman and Moldave (eds.) Meth. Enzymol. 65; Miller (ed.) (1972) Experiments in Molecular Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York; Old and Primrose (1981) Principles of Gene Manipulation, University of California Press, Berkeley; Schleif and Wensink (1982) Practical Methods in Molecular Biology; Glover (Ed.) (1985) DNA Cloning Vol. I and II, IRL Press, Oxford, UK; Hames and Higgins (Eds.) (1985) Nucleic Acid Hybridization, IRL Press, Oxford, UK; Setlow and Hollaender (1979) Genetic Engineering: Principles and Methods, Vols. 1-4, Plenum Press, New York; and Ausubel et al. (1992) Current Protocols in Molecular Biology, Greene/Wiley, New York, NY. Abbreviations and nomenclature, where employed, are deemed standard in the field and commonly used in professional journals such as those cited herein.
This invention provides machine-readable storage devices and program storage devices having data and methods for diagnosing haplogroups and physiological conditions.
One program storage device provided by this invention contains the program steps: a) determining the haplogroup of a sample from an individual using nucleotide sequence data from nucleic acid in the sample; b) associating the haplogroup with information identifying the geographic region of the individual; c) comparing the haplogroup and geographic region of the sample to the set of haplogroups native to the geographic region of the individual; and d) diagnosing the individual with a predisposition to an energy metabolism-related physiological condition if the haplogroup of the individual is not within the set of haplogroups native to the geographic region of the individual; all said program steps being encoded in machine readable form, and all said information encoded in machine readable form. This invention also provides a data set, encoded in machine-readable form, containing nucleotide alleles listed in Table 19, with each allele associated with encoded information identifying a physiological condition in humans. These physiological conditions are energy metabolism-related conditions including energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease. This storage device may also contain information associating each allele with one or more native geographic regions. A program storage device provided by this invention contains input means for inputting the haplogroup of an individual and the geographic region of that individual, and contains information associating alleles with native geographic regions, and program steps for diagnosing the individual with a predisposition to a physiological condition.
A storage device containing a data set in machine readable form provided by this invention may include encoded information comprising amino acid alleles listed in Table 19, with each allele associated with a physiological condition in humans.
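Program steps a) through d) described above can be sketched as a small lookup-and-compare routine. This is a minimal sketch under assumptions, not the encoded program of the invention: the function and variable names are hypothetical, the diagnostic alleles are a small subset of those listed in Example 5, and the region table contains only the haplogroup assignments mentioned in Example 3.

```python
# Subset of single-allele diagnoses from Example 5 (allele -> haplogroup).
DIAGNOSTIC = {"4586C": "L0", "2416C": "L2", "663G": "A",
              "5178A": "D", "7028C": "H"}

# Illustrative region table (assumed structure), limited to statements in Example 3.
NATIVE_HAPLOGROUPS = {
    "Africa": {"L0", "L1", "L2"},
    "Americas": {"A", "B", "C", "D", "X"},
}


def determine_haplogroup(sample_alleles):
    """Step a: return the single haplogroup indicated, or None if unclear."""
    hits = {DIAGNOSTIC[a] for a in sample_alleles if a in DIAGNOSTIC}
    return hits.pop() if len(hits) == 1 else None


def assess_individual(sample_alleles, region):
    """Steps b to d: compare the haplogroup with those native to the region."""
    haplogroup = determine_haplogroup(sample_alleles)
    native = NATIVE_HAPLOGROUPS.get(region)
    if haplogroup is None or native is None:
        predisposed = None              # not enough information to decide
    else:
        predisposed = haplogroup not in native
    return {"haplogroup": haplogroup, "region": region,
            "predisposed": predisposed}


print(assess_individual({"4586C"}, "Americas"))
```

Returning None when the haplogroup or region is unknown keeps an undetermined sample distinct from one whose haplogroup is native to the region.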
It will be appreciated by those of ordinary skill in the art that populations, subpopulations, organelles, and amino acid and nucleotide sequence comparison methods, neutrality test methods, nucleotide sequencing methods, codons, samples, sample collection techniques, sample preparation techniques, probes, probe generation techniques, genes involved in mitochondrial biology, hybridization techniques, array printing techniques, physiological conditions, cell lines, mutant strains, organisms, tissues, solid substrates, machine-readable storage devices, program devices, and methods of data analyses other than those specifically disclosed herein are available in the art and can be employed in the practice of this invention. All art-known functional equivalents are intended to be encompassed within the scope of this invention.
The following examples are provided for illustrative purposes, and are not intended to limit the scope of the invention as claimed here. Any variations in the compositions and methods exemplified that occur to the skilled artisan are intended to fall within the scope of the present invention.
EXAMPLES
Example 1
This invention provides human mtDNA polymorphisms found in all the major human haplogroups. Table 3 shows naturally occurring nucleotide alleles identified in the complete mtDNA sequences of 103 individuals, as compared to the mtDNA Cambridge sequence. All nucleotide sequences not listed are identical to the Cambridge sequence. Nucleotide alleles previously known to be associated with disease conditions, such as those listed in Table 1, are not listed in Table 3. Some deletion or rearrangement polymorphisms have also been excluded. All polymorphisms listed are nucleotide substitutions except for a nine-adenine nucleotide deletion at positions 8271-8279.
Table 3 Human MtDNA Nucleotide Alleles
Figure imgf000041_0002
Figure imgf000041_0001
Figure imgf000042_0001
Figure imgf000042_0002
Figure imgf000043_0001
Figure imgf000043_0002
Figure imgf000044_0001
Figure imgf000044_0002
Figure imgf000045_0001
Figure imgf000045_0002
Figure imgf000046_0001
Figure imgf000046_0002
Figure imgf000047_0001
Figure imgf000047_0002
Figure imgf000048_0001
Figure imgf000048_0002
Figure imgf000049_0001
Figure imgf000049_0002
Figure imgf000050_0001
Figure imgf000050_0002
Figure imgf000051_0001
Figure imgf000051_0002
Figure imgf000052_0001
Figure imgf000052_0002
Figure imgf000053_0001
Table 4 lists the nucleotide alleles identified in 48 mitochondrial genomes as compared to the Cambridge sequence.
Table 4 Human MtDNA Nucleotide Alleles in 48 Genomes
Figure imgf000055_0001
Figure imgf000055_0002
Figure imgf000056_0001
Figure imgf000056_0002
Figure imgf000057_0001
Figure imgf000057_0002
Figure imgf000058_0001
Figure imgf000058_0002
Figure imgf000059_0001
Figure imgf000059_0002
Figure imgf000060_0001
Figure imgf000060_0002
Figure imgf000061_0001
Figure imgf000061_0002
Figure imgf000062_0001
Figure imgf000062_0002
Figure imgf000063_0001
Example 2
The mtDNA sequences of Example 1 were chosen because they represent all of the major haplogroup lineages in humans. Analysis of these sequences has reaffirmed that all human mtDNAs belong to a single maternal tree, rooted in Africa (R. L. Cann et al., Nature 325:31-36 (1987); M. J. Johnson et al., (1983) Journal of Molecular Evolution 19:255-271; D. C. Wallace et al., "Global Mitochondrial DNA Variation and the Origin of Native Americans" in The Origin of Humankind, M. Aloisi, B. Battaglia, E. Carafoli, G. A. Danieli, Eds., Venice (IOS Press, 2000); M. Ingman et al., (2000) Nature 408:708-13; and D. C. Wallace et al., (1999) Gene 238:211-230). A cladogram of these mtDNA sequences is shown in FIG 1. Haplogroups are designated on branches of the tree. A calibration of the sequence evolution rate for the coding regions of the mtDNA, based on a human-chimpanzee divergence time of 6.5 million years ago (MYA) (M. Goodman et al., (1998) Mol. Phylogenet. Evol. 9:585-98), has permitted an estimate of the time to the most recent common ancestor (MRCA) of the human mtDNA phylogeny at ~200,000 years before present (YBP), and an estimate of the time of the MRCA for each major haplogroup (Table 5).
Table 5 Coalescence dates for haplogroups*
Figure imgf000065_0001
* The high probability of reverse mutations in the control region led us to calculate the times to the MRCAs using the entire mtDNA, excluding the control region (np 577-16023). Based on this value we estimated the average sequence evolution rate as (1.26 ± 0.08) x 10^-8 per nucleotide per year, using the HKY85 model (M. Hasegawa et al., (1985) J. Mol. Evol. 22:160-74). Standard errors calculated from the inverse Hessian at the maximum of the likelihood do not include any uncertainty in the calibration point, and were calculated using the delta method. The coalescence times of the various haplogroups may well be underestimated because of their small sample size.
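To give a sense of scale for the rate quoted in the footnote, the following arithmetic sketch converts between elapsed time and expected substitutions along a single lineage. It assumes a strict molecular clock and treats np 577-16023 as the analyzed region; it is not the maximum likelihood procedure used for Table 5, and the function names are hypothetical.

```python
RATE = 1.26e-8                    # substitutions per nucleotide per year (footnote)
REGION_START, REGION_END = 577, 16023
SITES = REGION_END - REGION_START + 1   # analyzed nucleotide positions, inclusive


def expected_substitutions(years, rate=RATE, sites=SITES):
    """Expected substitutions along one lineage over the given time span."""
    return years * rate * sites


def years_from_divergence(per_site_divergence, rate=RATE):
    """Time implied by a per-site divergence from the ancestor (strict clock)."""
    return per_site_divergence / rate


# Roughly 200,000 years corresponds to a few dozen substitutions per lineage.
print(SITES, expected_substitutions(200_000))
```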
Example 3
Inter-Continental Founder Events
The most striking feature of the mtDNA tree is the remarkable reduction in the number of mtDNA lineages that are associated with the transition from one continent to another. For example, when humans moved to Eurasia from Africa, the number of mitochondrial lineages was reduced from dozens to two lineages. While northeastern Africa encompasses the entire range of African mtDNA variation from the exclusively African haplogroups L0-L2 to the progenitors of the European and Asian mtDNA lineages, only two African mtDNA lineages, macro-haplogroups M and N, which arose about 65,000 YBP, left Africa to colonize Eurasia. Moreover, the times of the MRCAs of macro-haplogroups M and N as well as sub-macro-haplogroup R are similar, suggesting rapid population expansion associated with the colonization of Eurasia.
Similarly, when humans later moved from Central Asia to the Americas, the number of lineages was again reduced from dozens to about five. There is great mtDNA diversity in Asia, yet this diversity is substantially reduced in Siberia, and only five mtDNA haplogroups (A, B, C, D, and X), which arose in Asia about 28,000-34,000 YBP, successfully crossed the Bering land bridge to occupy the Americas. Human mtDNA haplogroup migrations are depicted in FIG 2.
Example 4
Further analysis demonstrated which alleles are descriptive of the major haplogroups, selected sub-haplogroups, and selected macro-haplogroups. The mtDNA nucleotide positions and the relevant alleles are shown in FIG 3. The data is arranged as a cladogram, such that a group on the left contains all of the alleles to its right. A vertical bar designates that the alleles to the right of the bar are present in all of the groups to the left of the bar. The haplogroup data in FIG. 3 is summarized in Tables 6 and 7. The sub-haplogroup data is summarized in Tables 8 and 9. Each group contains the alleles listed below it.
Table 6
Figure imgf000067_0001
Figure imgf000068_0001
Figure imgf000069_0001
Figure imgf000070_0001
Example 5
Further analysis of the data in FIG. 3 demonstrated sets of nucleotide alleles useful for diagnosing the haplogroups. A set of nucleotide alleles useful for diagnosing all of the haplogroups and sub-haplogroups in FIG. 3 is listed in Table 10. There are many equivalent methods for diagnosing the haplogroups. Examples of methods requiring testing of only one or a few loci follow. Alleles are identified in human samples containing mtDNA. Haplogroup L0 can be diagnosed by identifying 4586C, 9818T, or 8113A. Haplogroup L1 can be diagnosed by identifying 825A, 2758A, 2885C, 7146G, 8468T, 8655T, 10688A, 10810C, or 13105G. Haplogroup L2 can be diagnosed by identifying 2416C, 2758G, 8206A, 9221G, 11944C, or 16390G. Haplogroup L3 can be diagnosed by identifying 10819G, 14212C, 8618C, 10086C, 16362C, 10398A, or 16124C. Haplogroup C can be diagnosed by identifying 3552C, 4715G, 7196A, 8584A, 9545G, 13263G, 14318C, or 16327T. Haplogroup D can be diagnosed by identifying 4883T, 5178A, 8414T, 14668T, or 15487T. Haplogroup E can be diagnosed by identifying 16227G. Haplogroup G can be diagnosed by identifying 4833G, 8200C, or 16017C. Haplogroup Z can be diagnosed by identifying 11078G, 16185T, or 16260T. Haplogroup A can be diagnosed by identifying 663G, 16290T, or 16319A. Haplogroup I can be diagnosed by identifying 4529T, 10034C, or 16391A. Haplogroup W can be diagnosed by identifying 204C, 207A, 1243C, 5046A, 5460A, 8994A, 11947G, 15884C, or 16292T. Haplogroup X can be diagnosed by identifying 1719A, 3516G, 6221C, or 14470C. Haplogroup F can be diagnosed by identifying 12406A or 16304C. Haplogroup Y can be diagnosed by identifying 7933G, 8392A, 16231C, or 16266T. Haplogroup U can be diagnosed by identifying 3197C, 4646C, 7768G, 9055A, 11332T, 13104G, 14070G, 15907G, 16051G, 16129C, 16172C, 16219G, 16249C, 16270T, 16311T, 16318T, 16343G, or 16356C. Haplogroup J can be diagnosed by identifying 295T, 12612G, 13708A, or 16069T. Haplogroup T can be diagnosed by identifying 11812G, 12633T, 14233G, 16163C, 16186T, 1888A, 4917G, 8697A, 10463C, 13368A, 14905A, 15607G, 15928A, or 16294T. Haplogroup V can be diagnosed by identifying 72C, 4580A, or 15904T. Haplogroup H can be diagnosed by identifying 2706A or 7028C. Diagnosis of haplogroup B is more complicated, requiring three steps. Haplogroup B can be diagnosed by identifying 16189C; and by identifying the absence of 1719A, 3516G, 6221C, 14470C, or 16278T; and by identifying the absence of 1888A, 4216C, 4917G, 8697A, 10463C, 11251G, 11467G, 12308G, 12372A, 12633T, 13104G, 13368A, 14070G, 14905A, 15452A, 15607G, 15928A, 16126C, 16163C, 16186T, 16249C, or 16294T.
Table 10 Nucleotide Alleles Useful for Diagnosing Human Haplogroups
72 C
204 C
207 A
295 T
663 G
825 A
1243 C
1719 A
1888 A
2416 C
2706 A
2758 A
2758 G
2885 C
3197 C
3516 G
3552 C
4216 C
4529 T
4580 A
4586 C
4646 C
4715 G
4833 G
4883 T
4917 G
5046 A
5178 A
5460 A
6221 C
7028 C
7146 G
7196 A
7768 G
7933 G
8113 A
8200 C
8206 A
Figure imgf000072_0001
Figure imgf000072_0002
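The single-allele diagnoses of Example 5 lend themselves to a simple lookup. The sketch below encodes the allele sets for a few haplogroups exactly as listed in Example 5 and reports every haplogroup for which at least one diagnostic allele is present. The data structure, the function name, and the allele string format (position followed by base) are assumptions made for illustration.

```python
# Allele sets copied from Example 5 for a subset of haplogroups.
HAPLOGROUP_ALLELES = {
    "L0": {"4586C", "9818T", "8113A"},
    "D": {"4883T", "5178A", "8414T", "14668T", "15487T"},
    "E": {"16227G"},
    "A": {"663G", "16290T", "16319A"},
    "J": {"295T", "12612G", "13708A", "16069T"},
    "V": {"72C", "4580A", "15904T"},
    "H": {"2706A", "7028C"},
}


def candidate_haplogroups(sample_alleles):
    """Return haplogroups with at least one diagnostic allele in the sample."""
    sample = set(sample_alleles)
    return sorted(name for name, alleles in HAPLOGROUP_ALLELES.items()
                  if sample & alleles)


# Hypothetical sample: alleles detected relative to the Cambridge sequence.
print(candidate_haplogroups({"5178A", "73G"}))
```

A sample that returns more than one candidate would need the multi-step rules, such as the three-step diagnosis given for haplogroup B.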
Additional alleles are included in Table 11. These alleles are useful for designing methods equivalent to those described above for diagnosing the haplogroups. Alleles in Table 11 are useful for designing efficient methods for diagnosing macro-haplogroups. The data in Tables 10 and 11 and FIG 3 are also useful for identifying sub-haplogroups. This invention provides a method for diagnosing sub-haplogroup L1a1 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 4586C and 9818T. This invention provides a method for diagnosing sub-haplogroup L1a2 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 8113A and 8251A. This invention provides a method for diagnosing sub-haplogroup L1b1 by identifying in a human sample, the nucleotide allele 2352C and one of the nucleotide alleles selected from the group consisting of 3666A, 7055G, 7389C, 13789C, and 14178C. This invention provides a method for diagnosing sub-haplogroup L1b2 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 3796C, 5951G, 5984G, 6071C, 9072G, 10586A, 12810G, and 13485G. This invention provides a method for diagnosing sub-haplogroup L2a by identifying in a human sample the nucleotide allele 13803G. This invention provides a method for diagnosing sub-haplogroup L2b by identifying in a human sample the nucleotide allele 4158G. This invention provides a method for diagnosing sub-haplogroup L2c by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 325T, 680C, and 13958C. This invention provides a method for diagnosing sub-haplogroup L3a by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 2325C, 10819G, and 14212C. This invention provides a method for diagnosing sub-haplogroup L3b by identifying in a human sample the nucleotide allele 8618C. This invention provides a method for diagnosing sub-haplogroup L3c by identifying in a human sample the nucleotide allele 10086C. This invention provides a method for diagnosing sub-haplogroup L3d by identifying in a human sample the nucleotide allele 10398A. This invention provides a method for diagnosing sub-haplogroup Uk by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 9055A and 16311T. This invention provides a method for diagnosing sub-haplogroup U7 by identifying in a human sample the nucleotide allele 16318T. This invention provides a method for diagnosing sub-haplogroup U6 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 16172C and 16219G. This invention provides a method for diagnosing sub-haplogroup U5 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 3197C, 7768G, and 16270T. This invention provides a method for diagnosing sub-haplogroup U4 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 4646C, 11332T, and 16356C. This invention provides a method for diagnosing sub-haplogroup U3 by identifying in a human sample the nucleotide allele 16343G. This invention provides a method for diagnosing sub-haplogroup U2 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 15907G, 16051G, and 16129C. This invention provides a method for diagnosing sub-haplogroup U1 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 13104G, 14070G, 16189C, and 16249C. This invention provides a method for diagnosing sub-haplogroup T* by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 11812G and 14233G. This invention provides a method for diagnosing sub-haplogroup T1 by identifying in a human sample, one of the nucleotide alleles selected from the group consisting of 12633T, 16163C, and 16186T.
Table 11 Nucleotide Alleles Useful for Diagnosing Human Haplogroups and Macro-Haplogroups
72 C
73 A
204 C
207 A
295 T
325 T
489 C
663 G
680 C
769 A
825 A
1018 A
1048 T
1243 C
1719 A
1888 A
2352 C
2416 C
2706 A
2758 A
2758 G
2885 C
3197 C
3516 A
3516 G
3552 C
3594 T
3666 A
3796 C
4104 G
4158 G
4216 C
4312 T
4529 T
4580 A
4586 C
Figure imgf000075_0001
Figure imgf000075_0002
Figure imgf000076_0002
Figure imgf000076_0001
An equivalent method for diagnosing a haplogroup is diagnosing haplogroup L0 by identifying the presence of one of 825A, 2758A, 2885C, 7146G, 8468T, 8655T, 10688A, 10810C, or 13105G; and identifying the absence of one of 3666A, 7055G, 7389C, 13789C, or 14178C. Other equivalent methods can be derived from the data in FIG 3, and are within the scope of this invention.
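The presence-and-absence rule in the preceding paragraph can be expressed as a small set test. In this sketch the absence condition is interpreted as "none of the listed alleles is found in the sample", which is an assumption about the intended reading; the function and constant names are hypothetical.

```python
# Allele lists copied from the equivalent method described above.
REQUIRE_ANY = {"825A", "2758A", "2885C", "7146G", "8468T",
               "8655T", "10688A", "10810C", "13105G"}
EXCLUDE = {"3666A", "7055G", "7389C", "13789C", "14178C"}


def matches_rule(sample_alleles, require_any=REQUIRE_ANY, exclude=EXCLUDE):
    """True if any required allele is present and no excluded allele is present.

    Assumption: 'absence' means that none of the excluded alleles is found.
    """
    sample = set(sample_alleles)
    return bool(sample & require_any) and not (sample & exclude)


print(matches_rule({"825A", "73G"}))      # required allele present, none excluded
print(matches_rule({"825A", "3666A"}))    # an excluded allele is present
```

The same function could represent the three-step haplogroup B diagnosis by passing 16189C as the required allele and the union of the two absence lists as the excluded set.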
Example 6
Leber's Hereditary Optic Neuropathy (LHON) is a form of blindness caused by mitochondrial DNA (mtDNA) mutations. Four mutations, 3460A, 11778A, 14484C, and 14459A, account for over 90% of LHON worldwide and are designated "primary" mutations. Primary mutations strongly predispose carriers to LHON, are not found in controls, are all in Complex I genes, and do not co-occur with each other. It has been demonstrated that the 11778A and 14484C mutations occurred more frequently than expected in association with European mtDNA haplogroup J (found in 9% of European-derived mtDNAs), suggesting that a synergistic interaction among mtDNA mutations increased the probability of disease expression. Sequence analysis of two Russian LHON families without primary LHON mutations, including removal of nucleotide alleles listed in Table 3, demonstrated two new Complex I mutations, 3635A and 4640C. Venous blood samples were obtained from the family members. Genomic DNA was isolated from the buffy coat blood fraction using Chelex 100 (Cetus, Emeryville, CA, USA). mtDNA was amplified by PCR in 2-3 kb fragments, purified on Centricon 100 columns, and cycle-sequenced using BigDye Terminators (ABI Perkin Elmer Cetus) and an ABI Prism 377 automated DNA sequencer. The mutations were confirmed using mutation-specific restriction enzyme digestion following mismatched-primer PCR amplification of white blood cell mtDNA (Brown M.D. et al., (1995) Human Mutat. 6:311-325).
Example 7
A new primary LHON mtDNA mutation, 10663C, affecting a Complex I gene was homoplasmic in 3 Caucasian LHON families, all of which belonged to haplogroup J. These 3 families were the only haplogroup J-associated LHON families (out of 17) that did not harbor a known, primary LHON mutation. Comprehensive phylogenetic analysis of haplogroup J using complete mtDNA sequences demonstrated that the 10663C variant has arisen 3 independent times on this background. This mutation was not present in over 200 non-haplogroup J European controls, 74 haplogroup J patient and control mtDNAs, or 36 putative LHON patients without primary mutations. A partial Complex I defect was found in 10663C-containing lymphoblast and cybrid mitochondria. Thus, the 10663C mutation has occurred three independent times, each time on haplogroup J and only in LHON patients without a known LHON mutation. This makes the 10663C mutation unique among all pathogenic mtDNA mutations in that it appears to require the genetic background provided by haplogroup J for expression. These results provide further evidence for the predisposing role of haplogroup J and for the paradigm of "mild" mtDNA mutations interacting in an additive way to precipitate disease expression. Europeans with the mild ND6 np 14484 and ND3 np 10663 Leber's Hereditary Optic Neuropathy (LHON) missense mutations are more prone to blindness if they also possess the mtDNA haplogroup J.
Example 8
To assess the importance of demographic factors in inter-continental mtDNA sequence radiation, the distribution of mtDNA sequence variants was tested for deviations from the standard neutral model using the Tajima's D and Fu and Li D* tests (Y. X. Fu, W. H. Li, (1993) Genetics 133:693-709; and F. Tajima, (1989) Genetics 123:585-95). The standard neutral model of population genetics assumes a random-mating population of constant size, with all mutations uniquely arising and selectively neutral. The continental frequency distribution of pairwise mtDNA sequence differences was calculated to test for rapid population expansion using the method of A. R. Rogers, H. Harpending, (1992) Mol. Biol. Evol. 9:552-569.
For the African mtDNA sequences (n = 32), the results did not significantly deviate from the standard neutral model, and the frequency distribution of pairwise sequence difference counts was broad and ragged. Both of these results are consistent with the model that the African population has been relatively stable for a long time. By contrast, the non- African mtDNAs (n = 72) showed a highly significant deviation from neutrality (Tajima's D = -2.43, P < 0.01; Fu and Li D* = -5.09, P < 0.02), as well as a bell-shaped frequency distribution of pairwise sequence differences. Thus, these results are consistent with population expansions having distorted the frequency distribution (L. Excoffier, J. Mol. Evol. 30:125-39 (1990) and D.A. Merriwether et al. (1991) J. Mol. Evol 33:543-555). To better define the regional distribution of these demographic influences, the Eurasian samples were divided into European and Asian plus Native American. Analysis of all European mtDNAs also revealed significant deviations from the standard neutral model (Tajima's D = -2.19, P < 0.01; Fu and Li 79* = -3.31, P < 0.02). The distribution of pairwise sequence differences for the European mtDNAs revealed two sharp peaks, hinting at two major expansion phases. The most recent of these peaks was lost when haplogroup H and V mtDNAs were deleted from the sample. Hence, haplogroup H, which represents 40% of modem European mtDNAs ( A. Torroni et al., American Journal of Human Genetics 62, 1137-1152 (1998)) and has a MRCA of 19,000 YBP, came to predominate in Europe relatively recently.
Analysis of the aggregated Asian and Native American mtDNAs (n = 41) also revealed significant deviations from the standard neutral model (Tajima's D = -2.28, P < 0.01; Fu and Li D* = -4.31, P < 0.02) as well as revealing a broad, bell-shaped distribution of pairwise differences consistent with rapid population expansion.
When the Asian-Native American haplogroups A, B, C, D and X mtDNAs (n = 26) were analyzed separately, they also showed significant deviation from neutrality for the Fu and Li D* test (D* = -2.65, P < 0.05), although not for the Tajima's D test (D = -1.60, ns). Their distribution of pairwise sequence differences was also strongly uni-modal, indicating that the population expanded as people moved through Siberia and Beringia and into the Americas.
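The pairwise-difference (mismatch) distributions discussed in this Example can be tabulated as in the following minimal sketch. It only counts differences between aligned sequences; it does not fit the expansion model of Rogers and Harpending. The function name `mismatch_distribution` is an assumption made for illustration.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Map each pairwise difference count to the fraction of sequence pairs showing it."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2))  # differences between one pair
        for s1, s2 in combinations(sequences, 2)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}
```

A smooth, single-peaked result would correspond to the bell-shaped or uni-modal distributions described above, while a broad and ragged one would correspond to the African pattern.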
Example 9
Variable Replacement Mutation Rates in Human mtDNA Genes
To determine if selection was an important factor in causing the sudden shifts in mtDNA sequence variation between continents, the ratio of non-synonymous to synonymous base substitutions was analyzed for all 13 mtDNA protein genes of those haplogroups which contributed to the colonization of each of the major continental spaces: African, European, and Native American. For example, for the "Native Americans" the mtDNAs from the Asian-Native American haplogroups A, B, C, D and X were combined. The Asian-Native American mtDNAs from the haplogroups were combined because random mutations accumulate in founder populations and those mtDNAs which prove advantageous in new environments are enriched. Hence, the founding mutations of the haplogroup are important in the continental success of the lineage. We then tested for possible selective effects during the colonization of each continent by comparing the ratio of non-synonymous versus synonymous nucleotide substitutions for each mtDNA gene. An increase in the non-synonymous to synonymous mutation ratio suggests that selection has favored the propagation of a functionally altered protein.
The comparison of the ratio of nonsynonymous to synonymous mutations, counting each change only once, revealed great variation between continents for several genes (Table 12). Marked increases in the accumulation of non-synonymous mutations were seen for ND3 in Africans, Cytb and COIII in Europeans, and ATP6 in Native Americans. The number of non-synonymous and synonymous mutations for each gene was also compared between the different continents by computing the P value using a Two-tailed Fisher Exact Test. This revealed significant differences between Africans and both Europeans and Native Americans for COIII, between Africans and Native Americans for ATP6, and between Africans and Europeans for the sum of all mtDNA genes (Table 12). Hence, this analysis supports the hypothesis that selection has played a role in shaping continental mtDNA protein variation.
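A two-tailed Fisher Exact Test on a 2x2 table of replacement and silent mutation counts for two continents can be sketched as follows. This is a minimal standard-library illustration, not the procedure actually used to produce Table 12; the function name and the argument layout (first continent in the top row, second continent in the bottom row) are assumptions.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a, b = replacement and silent counts for one continent;
    c, d = replacement and silent counts for the other continent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)
    )
```

The small tolerance in the comparison guards against floating-point ties between equally likely tables. A comparison would be reported as significant at the 0.05 level, as in the note to Table 12, when the returned value is below 0.05.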
Table 12*
Figure imgf000080_0001
* Replacement versus synonymous mutation numbers of mtDNA genes. Rplmt = replacement mutations, ratio = rplmt/silent. FET = Fisher Exact Test. Afr = Africa, Eur = Europe, Am = Native American. The ratios of polymorphic sites in bold-italics highlight some of the higher values observed. Those in bold-italics under Two-Tailed FET indicate comparisons that are significant at the 0.05 level.
Example 10
Since the above analysis counts each mutation only once, irrespective of its frequency within the haplogroup, it under-emphasizes the importance of nodal mutations and overemphasizes the importance of terminal private polymorphisms. As an alternative to this approach, we calculated the corrected non-synonymous (Ka) and synonymous (Ks) mutation frequencies and then determined the relative selective constraints acting on that gene by calculating the kc value {kc = -ln(Ka/Ks)}. A high kc value is indicative of high protein sequence conservation and low amino acid variation, while a low value is indicative of low protein conservation and high amino acid variation (N. Neckelmann et al., (1987) Proc. Natl. Acad. Sci. USA 84:7580-7584).
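The kc calculation defined above can be written as a short function. This sketch assumes that the corrected Ka and Ks values have already been estimated by other means; the function name is hypothetical. It returns `None` when either value is zero, since the logarithm is then undefined, which matches the note to Table 13.

```python
from math import log


def selective_constraint(ka, ks):
    """Return kc = -ln(Ka/Ks), or None when Ka or Ks is zero (kc undefined)."""
    if ka <= 0 or ks <= 0:
        return None
    return -log(ka / ks)
```

Because kc is a negative logarithm of the ratio, a gene with Ka much smaller than Ks (strong conservation) yields a large positive kc, and a gene with Ka approaching Ks yields a kc near zero.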
The kc values for each human mtDNA gene were compared across the total global collection of human mtDNA sequences (Figure 4). The ATP6 gene was the least conserved gene in the human mtDNA, though previously it had been shown to be relatively highly conserved in inter-specific comparisons (N. Neckelmann et al., (1987) Proc. Natl. Acad. Sci. USA 84:7580-7584).
Example 11
The higher inter-specific conservation of ATP6 was confirmed by comparing the kc values of human versus chimpanzee (Pan troglodytes) and bonobo (Pan paniscus); human versus eight primate species (baboon, Borneo and Sumatran orangutan, gibbon, gorilla, lowland gorilla, bonobo, and chimpanzee); and human versus 13 diverse mammalian species (bovine, mouse, cat, dog, pig, rat, rhinoceros, horse, gibbon, gorilla, orangutan, bonobo, chimpanzee) (Figure 3). Thus, while ATP6 is highly conserved between species, it is very poorly conserved within humans. These results are consistent with the reduced intra-specific versus inter-specific conservation observed for other genes (C. A. Wise et al., (1998) Genetics 148:409-21), and with the hypothesis that mitochondrial protein variation is accelerated in humans and other primates, as seen in cytochrome c oxidase genes (L. I. Grossman et al., (2001) Mol. Phylogenet. Evol. 18:26-36).
Example 12
To further investigate the possibility that individual mtDNA protein genes differ in their selective constraints in different human continental populations, kc values for all 13 mtDNA protein genes from each set of continental haplogroups were calculated: African, European, and Native American. The cumulative selective pressure that separated the mtDNAs of pairs of continents was calculated by pair-wise comparison of the kc values for the genes of each mtDNA (Table 13). Comparison of mtDNA protein kc values in Europeans versus Africans revealed that three genes (ND1, cytb and COIII) had significantly lower sequence conservation in Europeans. A comparison of the kc values of Native American versus African mtDNA genes revealed six genes (ND4, ND6, COII, COIII, ATP6 and ATP8) that had significantly lower sequence conservation in Native Americans. Finally, comparison of the kc values of Africans versus Europeans or Native Americans revealed that four mtDNA genes (ND3, ND5, cytb, and COI) had significantly lower sequence conservation in Africans. The greatest differences in kc values were seen for the comparisons of COIII and ATP6 between Africans and Native Americans and for COIII between Africans and Europeans (Table 13).
Table 13*
Figure imgf000082_0001
* Estimates of coefficients of selective constraint (kc) stratified by gene and region. kc values and standard deviations calculated for African, European and Asian-American haplogroups A, B, C, D and X mtDNA protein-coding genes. * indicates that kc values could not be calculated, since either Ks or Ka were 0. Haplogroup X is represented only by the Native-American sequence, the European X sequence being excluded.
Taken together, these data show that different selective forces have acted on individual mtDNA genes as humans colonized different continents. Moreover, the observed differences in mtDNA protein sequence correlate with the climatic transitions that humans would have experienced as they migrated out of tropical and sub-tropical Africa and into temperate Eurasia and arctic Siberia and Beringia. The mtDNA genes that showed the highest amino acid sequence variation between continents were COIII and ATP6.
Example 13
The nucleotide alleles in Table 3 residing in evolutionarily significant genes identified in Examples 9-12 were analyzed for evolutionary significance. Evolutionarily significant alleles reside in evolutionarily significant genes and cause amino acid changes. A list of the evolutionarily significant nucleotide alleles in ND1, ND2, ND3, ND4, ND5, ND6, Cytb, COI, COII, COIII, ATP6, and ATP8 appears in Table 14. The Cambridge nucleotide alleles in Table 14 are evolutionarily significant. The amino acid alleles in Table 14, including the Cambridge alleles, are also evolutionarily significant. The locations of the amino acid alleles are identified by the location of the nucleotide allele listed in Table 3. Other evolutionarily significant nucleotide alleles not listed in Table 14 include alleles at neighboring nucleotide loci that are within the same codon and code for the same amino acids that are listed in Table 14.
Table 14 Evolutionarily Significant Human Mitochondrial Nucleotide and Amino Acid Alleles
Figure imgf000083_0001
Figure imgf000084_0001
Figure imgf000085_0001
Figure imgf000086_0001
Figure imgf000087_0001
A subset of the alleles in Table 14 that are associated with predispositions to physiological conditions using the methods of this invention is listed in Table 15.
Table 15 Amino Acid Alleles Associated with Physiological Conditions in this Invention
Figure imgf000087_0002
Figure imgf000088_0001
Example 14
Continent-Specific Amino Acid Substitutions in ATP6
To further investigate the biological significance of the human continent-specific ATP6 amino acid substitutions, the amino acid conservation for each variable human position using 39 animal species mtDNAs (12 primates, 22 other mammals, four non-mammalian vertebrates, and Drosophila) was analyzed. This revealed that many of the ATP6 substitutions that are associated with particular mtDNA haplogroups alter evolutionarily conserved, and hence potentially functionally important, amino acids.
A threonine to alanine substitution at codon 59 (T59A, nucleotide location 8701-8703) in ATP6 separates the mtDNAs of macro-haplogroup N from the rest of the world. The polar threonine at position 59 is conserved in all great apes and some old-world monkeys.
Among the haplogroups of macro-haplogroup M, the related Siberian-Native American haplogroups C and Z are delineated by an A20T (nucleotide location 8584-8586) variant. A non-polar amino acid found in this position occurs in all animal species except for Macaca, Papio, Balaenoptera and Drosophila. Among the haplogroups of macro-haplogroup N, the non-R lineage N1b harbors two distinctive amino acid substitutions: M104V (nucleotide location 8836-8838) and T146A (nucleotide location 8962-8964). The methionine at position 104 is conserved in all mammals, and the threonine at position 146 is conserved throughout all animal mtDNAs. Moreover, the T146A substitution is within the same transmembrane α-helix as the pathogenic mutation L156R that alters the coupling efficiency of the ATP synthase and causes the NARP and Leigh syndromes (I. Trounce, S. Neill, D. C. Wallace, Proceedings of the National Academy of Sciences of the United States of America 91, 8334-8338 (1994)).
Also in macro-haplogroup N, haplogroup A mtDNAs harbor an H90Y (nucleotide location 8794-8796) amino acid substitution. The histidine in this position is conserved in all placental mammals except Pongo, Cebus and Loxodonta and occurs within a highly conserved region. Furthermore, among the heterogeneous group of mtDNAs carrying the tRNALys-COII 9 bp deletion and arbitrarily assigned to haplogroup B, one mtDNA harbored an F193L (nucleotide location 9103-9105) substitution. This position is conserved in all mammals except Pongo, Papio, Cebus and Erinaceus.
Since each of the mtDNA sequences used in this comparison of different species is derived from only one or two individuals, it is possible that the rare deviant cases are due to the accumulation of environmentally adaptive mutations in those species that parallel those in humans. Thus, the above ATP6 amino acid polymorphisms have the characteristics expected for evolutionarily adaptive mutations.
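The nucleotide locations quoted for each ATP6 codon in this Example follow from simple codon arithmetic. The sketch below assumes that ATP6 begins at nucleotide 8527 of the Cambridge numbering, a value not stated in this Example but consistent with every codon and nucleotide range quoted above; the function names are hypothetical.

```python
ATP6_START = 8527  # assumed first nucleotide of ATP6 in the Cambridge numbering


def codon_span(codon, gene_start=ATP6_START):
    """Return the (first, last) nucleotide positions of a 1-based codon number."""
    first = gene_start + 3 * (codon - 1)
    return first, first + 2


def codon_of(nucleotide, gene_start=ATP6_START):
    """Return the 1-based codon number containing a nucleotide position."""
    if nucleotide < gene_start:
        raise ValueError("nucleotide lies upstream of the gene start")
    return (nucleotide - gene_start) // 3 + 1
```

For example, `codon_span(59)` gives `(8701, 8703)`, the location quoted for T59A, and `codon_of(8584)` gives `20`, the codon of the A20T variant.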
Table 16
Figure imgf000089_0001
Figure imgf000090_0001
Figure imgf000090_0002
Figure imgf000091_0001
Figure imgf000091_0002
Figure imgf000092_0002
Figure imgf000092_0003
Figure imgf000092_0001
Figure imgf000093_0001
Figure imgf000093_0002
Figure imgf000094_0001
Figure imgf000094_0002
Figure imgf000095_0001
Figure imgf000095_0002
Figure imgf000096_0001
Figure imgf000096_0002
Figure imgf000097_0002
Figure imgf000097_0003
Figure imgf000097_0001
Figure imgf000098_0001
Figure imgf000098_0002
Figure imgf000099_0002
Figure imgf000099_0001
REFERENCE TO SEQUENCE LISTINGS
SEQ ID NO:1 is a theoretical human mtDNA genome sequence containing the nucleotide alleles of this invention as listed in Table 3.
SEQ ID NO:2 is the human mtDNA reference sequence called the Cambridge Sequence (GenBank Accession No. J01415).

Claims

We claim:
1. A method for diagnosing a haplogroup of a human comprising: a) providing a sample comprising mitochondrial nucleic acid from said human; and b) identifying, in said sample, the presence or absence of at least one nucleotide allele diagnostic of a haplogroup.
2. The method of claim 1 wherein said haplogroup is haplogroup L1 and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 825A, 2758A, 2885C, 7146G, 8468T, 8655T, 10688A, 10810C, and 13105G.
3. The method of claim 1 wherein said haplogroup is haplogroup L2 and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 2416C, 2758G, 8206A, 9221G, 11944C, and 16390G.
4. The method of claim 1 wherein said haplogroup is haplogroup L3 and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 10819G, 14212C, 8618C, 10086C, 16362C, 10398A, and 16124C.
5. The method of claim 1 wherein said haplogroup is haplogroup C and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 3552C, 4715G, 7196A, 8584A, 9545G, 13263G, 14318C, and 16327T.
6. The method of claim 1 wherein said haplogroup is haplogroup D and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 4883T, 5178A, 8414T, 14668T, and 15487T.
7. The method of claim 1 wherein said haplogroup is haplogroup E and wherein method step b) comprises identifying in said sample the nucleotide allele 16227G.
8. The method of claim 1 wherein said haplogroup is haplogroup G and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 4833G, 8200C, and 16017C.
9. The method of claim 1 wherein said haplogroup is haplogroup Z and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 11078G, 16185T, and 16260T.
10. The method of claim 1 wherein said haplogroup is haplogroup A and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 663G, 16290T, and 16319A.
11. The method of claim 1 wherein said haplogroup is haplogroup I and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 4529T, 10034C, and 16391A.
12. The method of claim 1 wherein said haplogroup is haplogroup W and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 204C, 207A, 1243C, 5046A, 5460A, 8994A, 11947G, 15884C, and 16292T.
13. The method of claim 1 wherein said haplogroup is haplogroup X and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 1719A, 3516G, 6221C, and 14470C.
14. The method of claim 1 wherein said haplogroup is haplogroup B and wherein method step b) comprises:
1) identifying in said sample nucleotide allele 16189C;
2) identifying in said sample the absence of a nucleotide allele selected from the group consisting of 1719A, 3516G, 6221C, 14470C, and 16278T; and
3) identifying in said sample the absence of a nucleotide allele selected from the group consisting of 1888A, 4216C, 4917G, 8697A, 10463C, 11251G, 11467G, 12308G, 12372A, 12633T, 13104G, 13368A, 14070G, 14905A, 15452A, 15607G, 15928A, 16126C, 16163C, 16186T, 16249C, and 16294T.
15. The method of claim 1 wherein said haplogroup is haplogroup F and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 12406A and 16304C.
16. The method of claim 1 wherein said haplogroup is haplogroup Y and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 7933G, 8392A, 16231C, and 16266T.
17. The method of claim 1 wherein said haplogroup is haplogroup U and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 3197C, 4646C, 7768G, 9055A, 11332T, 13104G, 14070G, 15907G, 16051G, 16129C, 16172C, 16219G, 16249C, 16270T, 16311T, 16318T, 16343G, and 16356C.
18. The method of claim 1 wherein said haplogroup is haplogroup J and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 295T, 12612G, 13708A, and 16069T.
19. The method of claim 1 wherein said haplogroup is haplogroup T and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 11812G, 12633T, 14233G, 16163C, 16186T, 1888A, 4917G, 8697A, 10463C, 13368A, 14905A, 15607G, 15928A, and 16294T.
20. The method of claim 1 wherein said haplogroup is haplogroup V and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 72C, 4580A, and 15904T.
21. The method of claim 1 wherein said haplogroup is haplogroup H and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 2706A and 7028C.
22. The method of claim 1 wherein said haplogroup is haplogroup L0 and wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 4586C, 9818T, and 8113A.
23. The method of claim 1 wherein said identifying step is performed using an array comprising two or more isolated nucleic acid molecules attached to a substrate at a known location, each molecule having a length of about 7 to about 30 nucleotides, each molecule comprising a sequence identical with a portion of SEQ ID NO:1 containing at least one nucleotide allele at a locus selected from the group of loci consisting of those listed in column 1 of Table 3.
24. A machine readable storage device comprising a data set encoded in machine readable form, said data set comprising a plurality of nucleotide alleles and a haplogroup designation associated with each allele.
25. A program storage device comprising the storage device of claim 24 and also comprising input means for inputting a data set comprising one or more nucleotide alleles, said device also comprising program steps for diagnosing a haplogroup by associating said input nucleotide alleles with an associated haplogroup, and displaying the result.
26. A method for identifying an evolutionarily significant gene, said method comprising: a) providing a first set of nucleotide sequences comprising nucleic acid sequences of at least one allelic gene or portion thereof from a first population; b) providing a second set of nucleotide sequences comprising nucleic acid sequences of the corresponding at least one allelic gene or portion thereof from a second, intraspecific, population; c) performing neutrality analysis, comprising comparing said first set to said second set to generate a data set; and d) analyzing said data set to identify an evolutionarily significant gene.
27. The method of claim 26 wherein said one or more of said allelic genes are located in the mitochondrial genome.
28. The method of claim 26 wherein said populations are human populations.
29. The method of claim 26 wherein said first population and/or said second population comprises at least one subpopulation, said subpopulation being selected from the group consisting of macro-haplogroup, haplogroup, sub-haplogroup, and individual.
30. The method of claim 26 wherein said second set of nucleotide sequences comprises at least 100 nucleotides identical to a portion of SEQ ID NO:2.
31. The method of claim 26 wherein said evolutionarily significant gene is a mitochondrial gene selected from the group consisting of ND1, ND2, ND3, ND4, ND5, ND6, Cytb, COI, COII, COIII, ATP6, and ATP8.
32. The method of claim 31 wherein said evolutionarily significant mitochondrial gene is selected from the group consisting of COIII and ATP6.
33. The method of claim 26 also comprising identifying at least one evolutionarily significant nucleotide allele by identifying a sequence difference between said first and second nucleotide sequences.
34. The method of claim 33 also comprising identifying an evolutionarily significant amino acid allele by determining the evolutionarily significant amino acid allele encoded by the codon comprising said evolutionarily significant nucleotide allele.
35. The method of claim 34 also comprising identifying an amino acid allele diagnostic of a predisposition to a physiological condition by using as said first population, individuals having said physiological condition, and using as the second population, individuals not having said physiological condition.
36. A method for identifying an evolutionarily significant gene, said method comprising: a) providing a set of nucleotide sequences comprising two or more corresponding allelic genes from one population of one species; b) performing neutrality analysis comprising comparing said nucleotide sequences of corresponding allelic genes to generate a data set; and c) analyzing said data set to identify an evolutionarily significant gene.
37. The method of claim 36 also comprising identifying an evolutionarily significant nucleotide allele by analyzing a nucleic acid sequence of an evolutionarily significant gene to identify an evolutionarily significant nucleotide allele.
38. The method of claim 37 also comprising identifying an evolutionarily significant amino acid allele by determining the evolutionarily significant amino acid allele encoded by the codon containing said evolutionarily significant nucleotide allele.
39. A method for diagnosing an individual with a predisposition to a selected physiological condition comprising: a) providing a sample comprising mitochondrial nucleic acid molecule from an individual; b) providing information identifying the geographic region in which said individual resides; c) providing information identifying a set of haplogroups native to said geographic region; d) determining the haplogroup of said individual from said sample; e) comparing said haplogroup of said individual to said set of haplogroups native to said geographic region; and f) diagnosing said individual with a predisposition to said selected physiological condition if said haplogroup of said individual is not within said set of haplogroups native to said geographic region.
40. The method of claim 39 wherein said physiological condition is selected from the group consisting of energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.
41. The method of claim 39 also comprising associating an amino acid allele with said physiological condition, said method comprising selecting an amino acid allele useful for diagnosing said haplogroup of said individual, wherein the presence of said amino acid allele is not useful for diagnosing one or more haplogroups in said set of haplogroups native to said geographical region in which said individual resides.
42. The method of claim 41 wherein said haplogroup is haplogroup G and the amino acid allele is ntl 4833 A.
43. The method of claim 41 wherein said haplogroup is haplogroup T and the amino acid allele is selected from the group consisting of ntl 4917 D, ntl 8701 T, and ntl 15452 I.
44. The method of claim 41 wherein said haplogroup is haplogroup W and the amino acid allele is selected from the group consisting of ntl 5046 I, ntl 5460 T, ntl 8701 T, and ntl 15884 P.
45. The method of claim 41 wherein said haplogroup is haplogroup D and the amino acid allele is selected from the group consisting of ntl 5178 M and ntl 8414 F.
46. The method of claim 41 wherein said haplogroup is haplogroup L0 and the amino acid allele is selected from the group consisting of ntl 5442 L, ntl 7146 A, ntl 9402 P, ntl 13105 V, and ntl 13276 V.
47. The method of claim 41 wherein said haplogroup is haplogroup L1 and the amino acid allele is selected from the group consisting of ntl 7146 A, ntl 7389 H, ntl 13105 V, ntl 13789 H, and ntl 14178 V.
48. The method of claim 41 wherein said haplogroup is haplogroup C and the amino acid allele is selected from the group consisting of ntl 8584 T and ntl 14318 S.
49. The method of claim 41 wherein said haplogroup is selected from the group consisting of haplogroups A, I, X, B, F, Y, and U and the amino acid allele is ntl 8701 T.
50. The method of claim 41 wherein said haplogroup is haplogroup G and the amino acid allele is selected from the group consisting of ntl 8701 T, ntl 13708 T, and ntl 15452 I.
51. The method of claim 41 wherein said haplogroup is selected from the group consisting of haplogroups V and H and the amino acid allele is selected from the group consisting of ntl 8701 T and ntl 14766 T.
52. The method of claim 41 wherein a nucleotide allele contained in a codon encoding said amino acid allele is not a nucleotide allele of Table 1.
53. A program storage device in which the steps of claim 39 are encoded in machine-readable form, said device also comprising a storage medium encoding said information identifying the geographic region in which said individual resides and a set of haplogroups native to said geographic region in machine-readable form.
54. A storage device comprising a data set encoded in machine-readable form comprising nucleotide alleles selected from the group consisting of evolutionarily significant human mitochondrial nucleotide alleles, each said allele being associated in said storage device with encoded information identifying a physiological condition in humans.
55. The storage device of claim 54 wherein said physiological condition is selected from the group consisting of energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.
56. The storage device of claim 54 also comprising encoded information associating each said nucleotide allele with a native geographic region.
57. A program storage device comprising the storage device of claim 54 and also comprising input means for inputting a haplogroup of an individual and a geographic region of said individual, said device further comprising program steps for diagnosing said individual as having a predisposition to a physiological condition.
58. A storage device comprising a data set encoded in machine-readable form comprising evolutionarily significant human mitochondrial amino acid alleles, each said allele being associated in said storage device with encoded information identifying a physiological condition in humans.
59. A method for diagnosing a predisposition to LHON in a human comprising: a) providing a sample from said human; b) identifying in said sample nucleotide allele 10663C; and c) identifying in said sample, nucleotide alleles encoding threonine at amino acid position 458 of gene ND5; wherein the presence of said nucleotide alleles is diagnostic of a predisposition to LHON.
60. A method for diagnosing a predisposition to LHON in a human comprising: a) providing a sample from said human; b) identifying in said sample nucleotide allele 10663C; and c) identifying in said sample at least one nucleotide allele selected from the group consisting of 295T, 12612G, 13708A, and 16069T, wherein the presence of said nucleotide alleles is diagnostic of a predisposition to LHON.
61. A method for diagnosing a predisposition to LHON in a human comprising: a) providing a sample from said human; and b) identifying in said sample a nucleotide allele selected from the group consisting of 3635A and 4640C, wherein the presence of said nucleotide alleles is diagnostic of a predisposition to LHON.
62. A method for diagnosing increased likelihood of developing blindness in a human comprising: a) providing a sample from said human; b) identifying in said sample a nucleotide allele selected from the group consisting of 11778A, 14484C and 10663C; and c) identifying in said sample, nucleotide alleles encoding threonine at amino acid position 458 of gene ND5, wherein the presence of said nucleotide alleles is diagnostic of a predisposition to develop blindness.
63. A library comprising at least two isolated nucleic acid molecules, each molecule having a length of about 7 to about 30 nucleotides, each molecule comprising a sequence identical with a portion of SEQ ID NO: 1 containing at least one nucleotide allele at a locus selected from the group of loci consisting of those listed in column 1 of Table 3.
64. The library of claim 63 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of non-Cambridge human mtDNA nucleotide alleles of Table 3.
65. The library of claim 63 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of non-Cambridge human mtDNA nucleotide alleles of Table 4.
66. The library of claim 63 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of nucleotide alleles useful for diagnosing human haplogroups and macro-haplogroups (Table 11).
67. The library of claim 63 comprising all said nucleic acid molecules.
68. A nucleic acid array comprising two or more spots, each spot comprising a plurality of substantially identical isolated nucleic acid molecules of the library of claim 63 attached to a substrate at a defined location.
69. The array of claim 68 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of non-Cambridge human mtDNA nucleotide alleles of Table 3.
70. The array of claim 68 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of non-Cambridge human mtDNA nucleotide alleles of Table 4.
71. The array of claim 68 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of nucleotide alleles useful for diagnosing human haplogroups and macro-haplogroups (Table 11).
72. The array of claim 68 comprising all said nucleic acid molecules.
73. The array of claim 68 printed on a glass slide.
74. The array of claim 68 comprising more than about ten spots.
75. The array of claim 68 comprising more than about twenty-five spots.
76. The array of claim 68 wherein said isolated nucleic acid molecules are about 20 nucleotides in length.
77. A method of making a nucleic acid array comprising: a) providing a prepared substrate; and b) printing two or more spots in known positions on said substrate, each spot comprising a plurality of substantially identical isolated nucleic acid molecules, each molecule having a length of about 7 to about 30 nucleotides, each molecule comprising a sequence identical with a portion of SEQ ID NO:1, and containing at least one nucleotide allele at a locus selected from the group of loci consisting of those listed in column 1 of Table 3.
78. The method of claim 77 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of non-Cambridge human mtDNA nucleotide alleles of Table 3.
79. The method of claim 77 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of non-Cambridge human mtDNA nucleotide alleles of Table 4.
80. The method of claim 77 wherein said array comprises all said nucleic acid molecules.
81. A method for determining the presence or absence of a nucleotide allele in a sample comprising: a) providing a prepared human sample; b) providing an array of claim 68; c) contacting said array with said sample under conditions allowing quantitative hybridization; d) measuring the pattern of hybridization of said sample to said array; and e) analyzing said hybridization.
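
As an illustration only, the probe constraints recited in claim 77 (a molecule of about 7 to about 30 nucleotides, identical with a portion of the reference sequence and carrying an allele at a chosen locus) and the analysis step of the hybridization method above can be sketched in Python. Everything named here is an assumption made for the sketch and is not taken from the specification: the `Probe` type, the functions `design_probe` and `call_allele`, the toy reference string (which is not SEQ ID NO:1), the locus, and the `min_ratio` signal threshold.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Probe:
    locus: int      # 1-based position in the reference sequence
    allele: str     # nucleotide placed at that position
    sequence: str   # full probe sequence


def design_probe(reference: str, locus: int, allele: str, length: int = 20) -> Probe:
    """Cut a window of `length` nucleotides around `locus` and set the allele.

    Hypothetical helper: the 7-30 range mirrors the claimed probe length;
    the default of 20 mirrors the "about 20 nucleotides" dependent claim.
    """
    if not 7 <= length <= 30:
        raise ValueError("probe length must be between 7 and 30 nucleotides")
    if len(reference) < length:
        raise ValueError("reference is shorter than the requested probe")
    if not 1 <= locus <= len(reference):
        raise ValueError("locus lies outside the reference sequence")
    idx = locus - 1
    # Centre the window on the locus, clamped to the ends of the reference.
    start = max(0, min(idx - length // 2, len(reference) - length))
    window = reference[start:start + length]
    offset = idx - start
    return Probe(locus, allele, window[:offset] + allele + window[offset + 1:])


def call_allele(intensities: Dict[str, float], min_ratio: float = 2.0) -> Optional[str]:
    """Return the allele whose spot gives the strongest signal, or None.

    Assumed rule (not from the source): a call is made only when the top
    signal exceeds the runner-up by at least `min_ratio`.
    """
    if len(intensities) < 2:
        raise ValueError("need signals for at least two allele-specific spots")
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if top <= 0:
        return None
    if second <= 0 or top / second >= min_ratio:
        return best
    return None


# Toy usage with a placeholder reference, not real mtDNA sequence.
toy_reference = "ACGT" * 10
variant_probe = design_probe(toy_reference, locus=15, allele="A")
print(variant_probe.sequence, call_allele({"G": 100.0, "A": 900.0}))
```

A ratio threshold is only one simple way to turn spot intensities into a presence or absence call; a real analysis would need controls and calibration that this sketch does not model.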
PCT/US2002/028471 2001-08-30 2002-08-30 Human mitochondrial dna polymorphisms, haplogroups, associations with physiological conditions, and genotyping arrays WO2003018775A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2003523626A JP2005525082A (en) 2001-08-30 2002-08-30 Human mitochondrial DNA polymorphisms, haplogroups, associations with physiological conditions, and genotyping arrays
EP02796465A EP1432831A4 (en) 2001-08-30 2002-08-30 Human mitochondrial dna polymorphisms, haplogroups, associations with physiological conditions, and genotyping arrays
US10/488,618 US20050123913A1 (en) 2001-08-30 2002-08-30 Human mitochondrial dna polymorphisms, haplogroups, associations with physiological conditions, and genotyping arrays
CA002459127A CA2459127A1 (en) 2001-08-30 2002-08-30 Human mitochondrial dna polymorphisms, haplogroups, associations with physiological conditions, and genotyping arrays

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US31633301P 2001-08-30 2001-08-30
US60/316,333 2001-08-30
CA2,356,536 2001-08-31
CA 2356536 CA2356536A1 (en) 2001-08-30 2001-08-31 Mitochondrial dna sequence alleles
US38054602P 2002-05-13 2002-05-13
US60/380,546 2002-05-13

Publications (2)

Publication Number Publication Date
WO2003018775A2 2003-03-06
WO2003018775A3 2003-10-23

Family

ID=27171588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/028471 WO2003018775A2 (en) 2001-08-30 2002-08-30 Human mitochondrial dna polymorphisms, haplogroups, associations with physiological conditions, and genotyping arrays

Country Status (3)

Country Link
EP (1) EP1432831A4 (en)
JP (1) JP2005525082A (en)
WO (1) WO2003018775A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1897958A2 (en) * 2006-09-07 2008-03-12 Genocheck Co., Ltd. Method, polynucleotide probe, DNA chip and kit for identifying mutation of human mitochondrial DNA
CN103290109A (en) * 2013-04-17 2013-09-11 浙江大学 Kit for detecting mitochondrial T4353C mutation linked to hypertension and application thereof
EP3091083A1 (en) * 2015-05-07 2016-11-09 Latvian Biomedical Research and Study Centre A kit for detecting mutation or polymorphism in the human mitochondrial dna

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5360854B2 (en) * 2006-06-14 2013-12-04 雅嗣 田中 Gene detection method for type 2 diabetes
JP5360853B2 (en) * 2006-06-14 2013-12-04 雅嗣 田中 Gene detection method for metabolic syndrome
JP5276257B2 (en) * 2006-06-14 2013-08-28 雅嗣 田中 Gene detection method for human mitochondrial DNA
JP5396586B2 (en) * 2006-06-14 2014-01-22 雅嗣 田中 Gene detection method for atherothrombotic cerebral infarction
JP5360855B2 (en) * 2006-06-14 2013-12-04 雅嗣 田中 Gene detection method for myocardial infarction

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5837832A (en) * 1993-06-25 1998-11-17 Affymetrix, Inc. Arrays of nucleic acid probes on biological chips

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5976798A (en) * 1994-03-30 1999-11-02 Mitokor Methods for detecting mitochondrial mutations diagnostic for Alzheimer's disease and methods for determining heteroplasmy of mitochondrial nucleic acid

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANDREWS ET AL.: 'Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA' NATURE GENETICS vol. 23, October 1999, page 147, XP002961687 *
DATABASE GENBANK [Online] CENTER FOR BIOTECHNOLOGY INFORMATION, NATIONAL LIBRARY OF MEDICINE, NIH (BETHESDA, MD, USA) 18 April 2000 CREWS S. ET AL., XP002108769 Retrieved from NCBI Database accession no. (J01415) & NATURE vol. 277, no. 5693, 1979, pages 192 - 198 *
See also references of EP1432831A2 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1897958A2 (en) * 2006-09-07 2008-03-12 Genocheck Co., Ltd. Method, polynucleotide probe, DNA chip and kit for identifying mutation of human mitochondrial DNA
EP1897958A3 (en) * 2006-09-07 2008-04-30 Genocheck Co., Ltd. Method, polynucleotide probe, DNA chip and kit for identifying mutation of human mitochondrial DNA
CN103290109A (en) * 2013-04-17 2013-09-11 浙江大学 Kit for detecting mitochondrial T4353C mutation linked to hypertension and application thereof
CN103290109B (en) * 2013-04-17 2015-01-28 浙江大学 Kit for detecting mitochondrial T4353C mutation linked to hypertension and application thereof
EP3091083A1 (en) * 2015-05-07 2016-11-09 Latvian Biomedical Research and Study Centre A kit for detecting mutation or polymorphism in the human mitochondrial dna

Also Published As

Publication number Publication date
EP1432831A2 (en) 2004-06-30
WO2003018775A3 (en) 2003-10-23
EP1432831A4 (en) 2006-06-14
JP2005525082A (en) 2005-08-25

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG US UZ VC VN YU ZA ZM

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2003523626

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 2459127

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2002332905

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 2002796465

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2002796465

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 10488618

Country of ref document: US

WWW Wipo information: withdrawn in national office

Ref document number: 2002796465

Country of ref document: EP