WO2001009384A2 - Serial analysis of genetic alterations - Google Patents

Serial analysis of genetic alterations

Info

Publication number
WO2001009384A2
Authority
WO
WIPO (PCT)
Prior art keywords
adapter
sequence
strand
polynucleotide
joining
Prior art date
Application number
PCT/US2000/020557
Other languages
French (fr)
Other versions
WO2001009384A3 (en)
Inventor
James E. Stefano
Original Assignee
Genzyme Corporation
Priority date
Filing date
Publication date
Application filed by Genzyme Corporation
Priority to AU63870/00A (AU6387000A)
Publication of WO2001009384A2
Publication of WO2001009384A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6853: Nucleic acid amplification reactions using modified primers or templates
    • C12Q1/6855: Ligating adaptors
    • C12Q1/6809: Methods for determination or identification of nucleic acids involving differential detection

Definitions

  • the invention is in the field of genetic analysis, including mutation detection and single nucleotide polymorphism (SNP) analysis.
  • SNP single nucleotide polymorphism
  • Nucleotide sequence polymorphism is a hallmark of the human genome. It is estimated that approximately 0.1% of the 3 billion base-pair human genome is subject to polymorphism. Thus, approximately 3 million base pairs of the human genome are subject to variation from one individual to another. A unique genetic fingerprint can be obtained for each individual, based upon which of these sites varies in a given individual, as well as the nature of the variations. The determination of the constellation of single nucleotide polymorphisms (SNPs) that exist in the genome of a given individual will be useful for disease phenotyping, as molecular markers for genome mapping, and as markers for forensic identification and related techniques. SNPs are but one form of the genetic variation that exists within the human population. Other forms of genetic variation include insertions and deletions of nucleotide sequences, sequence repetitions, translocations and inversions.
  • genetic variation contributes to many, if not most, types of human disease. Genetic factors can affect an individual's susceptibility to disease, as well as the response of an individual to pharmaceuticals used in the treatment of disease. In addition, both inherited and acquired genetic changes contribute to the development and progression of major human diseases such as cancer, cardiovascular disease, neurological and neurodegenerative disease. Consequently, the unique nature of an individual's genetic variation will be informative with respect to that individual's susceptibility to disease and response to treatment. Most diseases result from the cumulative effect of multiple genetic changes.
  • U.S. Patent No. 5,459,039 discloses a method for detecting base sequence differences between two DNA molecules, using a protein that recognizes base pair mismatches, by detection of a protein: DNA complex.
  • U.S. Patent No. 5,459,039 does not provide a method for parallel analysis of multiple genetic alterations, nor does it provide a method for identifying genetic alterations.
  • U.S. Patent No. 5,695,937 discloses a method for serial analysis of gene expression. It does not describe methods for detection and identification of genetic alterations, nor does it disclose methods for parallel analysis of multiple genetic alterations.
  • a method for rapid parallel analysis of multiple genetic alterations would be particularly useful, in light of the vast genetic diversity of the human species, the consequent preponderance of genetic alterations in the human genome, and the importance of these genetic variations for diagnosis and treatment of disease, among other things.
  • This invention provides this method.
  • the present invention provides methods and compositions for the rapid parallel analysis of multiple genetic alterations in a collection of sample nucleic acids.
  • Practice of the invention allows the identification of one or more genetic alterations in one or more sample polynucleotides, and in addition provides the nucleotide sequences of the genetic alterations so identified.
  • the invention provides a method for determining the nucleotide sequence of a sample polynucleotide containing one or more genetic alterations.
  • step (d) amplifying the products of step (c) using primers complementary to a portion of the adapter oligonucleotide, to generate amplification products;
  • each monomeric unit of the concatemer comprises a region of the sample polynucleotide containing the genetic alteration, and the monomeric units are separated by regions of sequence corresponding to a portion of the adapter oligonucleotide sequence;
  • determining the nucleotide sequence of the concatemer. In one embodiment of the invention, multiple sample polynucleotides are assayed using the same reference polynucleotide. In additional embodiments, a plurality of reference polynucleotides are used to assay one or more sample polynucleotides.
  • the invention additionally provides oligonucleotide adapters useful for forming concatemers of sequences containing genetic alterations.
  • the oligonucleotide adapters of the invention are designed to minimize self-ligation, to be resistant to nucleolytic activities and to comprise a capture moiety that allows selective immobilization and/or removal of one strand of an amplification product.
  • Figures 1A and 1B are a schematic diagram of the assay method.
  • reference and test DNAs are heteroduplexed by heating and slow cooling. For each nucleotide difference, two mismatched nucleotide pairs are generated (only one is shown in the figure). These heteroduplexes are contacted with MutS protein. Digestion of the complexed DNA results in protection of small, mostly double-stranded DNA fragments bound to MutS. These are purified from the other digest products and DNase by spin column chromatography. The ends of the protected fragments are polished with T4 DNA polymerase.
  • Figure 1B is a continuation of Figure 1A.
  • a pair of double-stranded adapters is ligated to the ends of the protected, polished fragments using T4 DNA ligase.
  • the adapters lack 5' phosphates, leaving one strand of the adapter unligated.
  • Contact with an exonuclease destroys the unligated strand, allowing the 3' ends of the protected fragment to be extended by T4 DNA polymerase.
  • the product is amplified by PCR using a set of biotinylated primers corresponding to one strand of the adapters.
  • Figures 2A to 2C show concerted generation of tandem arrays from amplified protected fragments.
  • a plasmid vector bearing nonpalindromic single-stranded overhangs is generated by ligation of two adapter oligonucleotides complementary to the restriction fragment ends used to linearize the vector. Both adapters have single-stranded overhangs of identical sequence on the non-ligating terminus, except that one of the two lacks a 5' phosphate.
  • the adapter-decorated vector is then purified from any free adapter.
  • Figure 2B shows that the restricted and purified population of amplified protected fragments is ligated together in the presence of a small amount (5%) of a chain-terminating adapter oligonucleotide that is complementary on one end to the ends of the fragments. The other end of this adapter is complementary to the single-stranded overhang on the decorated vector.
  • the mixture is ligated to completion. All chains receiving a chain terminator will be forced to form larger linear arrays until another chain terminator (or another growing chain also ending in a terminator) is ligated, yielding a large tandem array whose average size is determined by the ratio of terminator to fragment.
  • Figure 2C shows the decorated vector being added to the ligation reaction to accept the array.
  • one strand may be removed by T7 gene 6 exonuclease.
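The dependence of average array size on the terminator-to-fragment ratio described for Figure 2B can be sketched with a simple probabilistic model. The model is an assumption made for illustration and is not taken from the source: each ligation at the free end of a chain that already carries one terminator is taken to add a terminator with probability equal to the terminator fraction, and a fragment otherwise. The function names are hypothetical.

```python
import random


def expected_units_per_array(terminator_fraction: float) -> float:
    """Mean number of fragments ligated before a second terminator arrives.

    Under the assumed model the fragment count is geometric, with
    mean (1 - p) / p for terminator fraction p.
    """
    p = terminator_fraction
    if not 0.0 < p <= 1.0:
        raise ValueError("terminator_fraction must be in (0, 1]")
    return (1.0 - p) / p


def simulate_array_sizes(terminator_fraction: float, n_arrays: int, seed: int = 0) -> list:
    """Simulate the number of fragments in each of n_arrays tandem arrays."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_arrays):
        units = 0
        # Keep adding fragments at the free end until a terminator is drawn.
        while rng.random() >= terminator_fraction:
            units += 1
        sizes.append(units)
    return sizes


if __name__ == "__main__":
    sizes = simulate_array_sizes(0.05, 20000)
    print(expected_units_per_array(0.05), sum(sizes) / len(sizes))
```

Under this model, lowering the terminator fraction lengthens the arrays roughly in inverse proportion. Real ligation reactions also include fragment-to-fragment joining between growing chains and circularization, which the sketch ignores.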
  • a cell includes a plurality of cells, including mixtures thereof.
  • compositions and methods include the recited elements, but not excluding others.
  • Consisting essentially of when used to define compositions and methods shall mean excluding other elements of any essential significance to the combination.
  • a composition consisting essentially of the elements as defined herein would not exclude trace contaminants from the isolation and purification method and pharmaceutically acceptable carriers, such as phosphate buffered saline, preservatives, and the like.
  • Consisting of shall mean excluding more than trace elements of other ingredients and substantial method steps for administering the compositions of this invention. Embodiments defined by each of these transition terms are within the scope of this invention.
  • polynucleotide and “nucleic acid molecule” are used interchangeably to refer to polymeric forms of nucleotides of any length.
  • the polynucleotides may contain deoxyribonucleotides, ribonucleotides, and/or their analogs. Nucleotides may have any three-dimensional structure, and may perform any function, known or unknown.
  • polynucleotide includes single-, double-stranded and triple helical molecules.
  • Oligonucleotide refers to polynucleotides of between about 6 and about 100 nucleotides of single- or double-stranded DNA or RNA.
  • Oligonucleotides are also known as oligomers and may be isolated from genes, or chemically synthesized by methods known in the art.
  • a "primer" refers to an oligonucleotide, usually single-stranded, that provides a 3'-hydroxyl end for the initiation of nucleic acid synthesis.
  • examples of polynucleotides: a gene or gene fragment, exons, introns, mRNA, tRNA, rRNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
  • a nucleic acid molecule may also comprise modified nucleic acid molecules, such as methylated nucleic acid molecules and nucleic acid molecule analogs.
  • Analogs of purines and pyrimidines are known in the art, and include, but are not limited to, aziridinylcytosine, 4-acetylcytosine, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethyl-aminomethyluracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, pseudouracil, 5-pentynyluracil and 2,6-diaminopurine.
  • Oligonucleotides are short polymers of nucleotides, generally less than 200 nucleotides, preferably less than 150 nucleotides, more preferably less than 100 nucleotides, more preferably less than 50 nucleotides and most preferably less than 30 nucleotides in length. Oligonucleotides are generally considered to comprise shorter polymers of nucleotides than do polynucleotides, although there is an art-recognized overlap between the upper limit of oligonucleotide length and the lower limit of polynucleotide length. Consequently, for the purposes of the present invention, the terms "oligonucleotide" and "polynucleotide" shall not be considered limiting with respect to polymer length.
  • base pair also designated “bp” refers to the complementary nucleic acid molecules; in DNA the purine adenine (A) is hydrogen bonded with the pyrimidine base thymine (T), and the purine guanine (G) with pyrimidine cytosine (C), also known as Watson-Crick base-pairing.
  • a thousand base pairs is often called a kilobase, or kb.
  • a “base pair mismatch” refers to a location in a nucleic acid molecule in which the bases are not complementary Watson-Crick pairs.
  • duplex refers to the complex formed between two strands of hydrogen-bonded, complementary nucleic acid molecules.
  • a duplex need not be entirely complementary, but can contain one or more mismatches or one or more deletions or additions.
  • a duplex is sufficiently long-lasting to persist between formation of the duplex or complex and subsequent manipulations, including, for example, any optional washing steps.
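The definitions of base pair, base pair mismatch and duplex above can be made concrete with a minimal sketch that locates non-Watson-Crick positions in a DNA duplex. The function name and the convention of writing the second strand 3'->5' (so that facing bases share an index) are assumptions made for illustration; insertions and deletions are not handled.

```python
# Watson-Crick pairs for DNA, written as (top base, bottom base).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def find_mismatches(top: str, bottom: str) -> list:
    """Return 0-based positions where the duplex lacks a Watson-Crick pair.

    `top` is written 5'->3' and `bottom` 3'->5', so position i of one
    strand faces position i of the other. Equal lengths are assumed.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    pairs = zip(top.upper(), bottom.upper())
    return [i for i, pair in enumerate(pairs) if pair not in WATSON_CRICK]


if __name__ == "__main__":
    print(find_mismatches("ATGCA", "TACGT"))  # fully complementary duplex
    print(find_mismatches("ATGCA", "TATGT"))  # G facing T at position 2
```

An empty result corresponds to a fully complementary duplex; any non-empty result marks the positions a mismatch binding reagent would be expected to recognize.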
  • reference strand or "wild-type strand" refers to the nucleic acid molecule or polynucleotide having a sequence prevalent in the general population that is not associated with any disease or discernible phenotype. It is noted that in the general population, wild-type genes may include multiple prevalent versions that contain alterations in sequence relative to each other and yet do not cause a discernible pathological effect. These variations are designated "polymorphisms" or "allelic variations." It is therefore possible to prepare multiple reference strands, thereby providing a mixture of the most common polymorphisms.
  • one reference strand may be used that has been selected for its particular sequence.
  • the reference strand can also be chemically or enzymatically modified, for example to remove or add methyl groups.
  • the reference strand is comprised of a PCR product identical at least in part to the sequence prevalent in the general population.
  • the reference strand or wild-type strand comprises a portion of a particular gene or genetic locus in the patient's genomic DNA known to be involved in a pathological condition or syndrome.
  • genetic syndromes include cystic fibrosis, sickle-cell anemia, thalassemias, Gaucher's disease, adenosine deaminase deficiency, alpha1-antitrypsin deficiency, Duchenne muscular dystrophy, familial hypercholesterolemia, fragile X syndrome, glucose-6-phosphate dehydrogenase deficiency, hemophilia A, Huntington's disease, myotonic dystrophy, neurofibromatosis type 1, osteogenesis imperfecta, phenylketonuria, retinoblastoma, Tay-Sachs disease, and Wilms tumor (Thompson and Thompson, Genetics in Medicine, 5th Ed.).
  • the reference strand comprises part of a particular gene or genetic locus that may not be known to be linked to a particular disease, but in which polymorphism is known or suspected.
  • obesity may be linked with variations in the apolipoprotein B gene
  • hypertension may be due to genetic variations in sodium or other transport systems
  • aortic aneurysms may be linked to variations in I-haptoglobin and cholesterol ester transfer protein
  • alcoholism may be related to variant forms of alcohol dehydrogenase and mitochondrial aldehyde dehydrogenase.
  • an individual's response to medicaments may be affected by variations in drug modification systems such as cytochrome P450s, and susceptibility to particular infectious diseases may also be influenced by genetic status.
  • the methods of the present invention can be applied to HLA analysis for identity testing.
  • sample strand or “patient strand” refers to the polynucleotide having unknown sequence and potentially containing one or more mutations or mismatches as compared to the reference strand. This may be a PCR product amplified from patient DNA or other sample(s).
  • the reference strand comprises part of a foreign genetic sequence e.g., the genome of an invading microorganism.
  • Non-limiting examples include bacteria and their phages, viruses, fungi, protozoa, mycoplasmas, and the like. The present methods are particularly applicable when it is desired to distinguish between different variants or strains of a microorganism in order to choose appropriate therapeutic interventions.
  • genetic alterations or “mutations” is used to refer to a change from the wild-type or reference sequence of one or more nucleic acid molecules. It refers to base pair substitutions, additions and deletions of a sample strand when compared to a reference strand.
  • a linear sequence of polynucleotides is "substantially homologous" to another linear sequence, having the opposite polarity, if both sequences are capable of hybridizing to form duplexes with the same complementary polynucleotide.
  • sequences that hybridize under conditions of high stringency are more preferred. It is understood that hybridization reactions can accommodate insertions, deletions, and substitutions in the nucleotide sequence. Thus, linear sequences of nucleotides can be essentially identical even if some of the nucleotide residues do not precisely correspond or align.
  • the "substantially homologous" sample sequences of the invention contain a single mutation (mismatch) or an addition or deletion of 1 to about 10 base pairs when compared to the reference polynucleotide.
  • the term "reagent which recognizes a non-duplex polynucleotide structure” is any agent, proteinaceous or otherwise, which provides this functional activity when used in the method of this invention.
  • this agent is a mismatch binding protein or "MBP.”
  • MBP refers to the group of proteins which recognize and bind to nucleotide mismatches and unpaired nucleotides in polynucleotide duplexes.
  • non-duplex polynucleotide structure shall mean the absence at any position of a Watson-Crick base pair, i.e., any pair other than A:T or G:C, or A:U in RNA, and unpaired nucleotides.
  • MBPs includes several embodiments. These embodiments include any fragment, analog, mutein, variant or mixture thereof, that retains the ability to recognize and bind to a nucleotide mismatch.
  • a "variant" is a protein or polypeptide with conservative amino acid substitutions as compared to the wild-type amino acid sequence. The term therefore encompasses MutS and its homologues including hMSH2, hPMS1, and hPMS2.
  • Mismatch repair proteins for use in the present invention may be derived from E. coli (as described above) or from any organism containing mismatch repair proteins with appropriate functional properties.
  • useful proteins include those derived from Salmonella typhimurium (MutS, see, Su, S.S. and Modrich, P., Proc. Natl. Acad. Sci.
  • heteroduplexes formed between patients' DNA and wild-type DNA as described above are incubated with p53 or its C-terminal domain (Lee, et al., Cell 81:1013-1020 (1995)).
  • MutS, MutL, and MutH are used to cleave mismatch regions (Su et al., Proc. Natl. Acad. Sci. USA 83:5057 (1986); Grilley et al., J. Biol. Chem. 264:1000 (1989)).
  • the agents are proteins or polypeptides, they can be in the L or D form so long as the biological activity of the polypeptide is maintained.
  • the protein can be altered so as to be secreted from the cell for recombinant production and purification.
  • proteins which are post-translationally modified by reactions that include glycosylation, acetylation and phosphorylation.
  • Such polypeptides also include analogs, alleles and allelic variants which can contain amino acid derivatives or non-amino acid moieties that do not affect the biological or functional activity of the protein as compared to wild-type or naturally occurring protein.
  • amino acid refers both to the naturally occurring amino acids and their derivatives, such as TyrMe and PheCl, as well as other moieties characterized by the presence of both an available carboxyl group and an amine group.
  • Non-amino acid moieties which can be contained in such polypeptides include, for example, amino acid mimicking structures. Mimicking structures are those structures which exhibit substantially the same spatial arrangement of functional groups as amino acids but do not necessarily have both the α-amino and α-carboxyl groups characteristic of amino acids.
  • nucleolytic agent refers to an enzyme or chemical agent that cleaves at least one strand of DNA.
  • Non-limiting examples of such agents include exonucleases such as BAL31, lambda exonuclease, exonuclease III, T7 gene 6 exonuclease, and endonucleases such as DNase I, S. aureus (micrococcal) nuclease, P1 nuclease, and the like.
  • examples of chemical agents include bleomycin and/or iron-bound intercalators such as o-phenanthroline that can direct the formation of hydroxyl radicals in close proximity to a DNA-bound intercalator to effect DNA cleavage.
  • exonuclease refers to an enzyme that cleaves nucleotides sequentially from the free ends of a linear nucleic acid substrate. Exonucleases can be specific for double- or single-stranded nucleic acids and/or directionally specific, for instance, 3'→5' and/or 5'→3'. Some exonucleases exhibit other enzymatic activities; for example, native T7 DNA polymerase is both a polymerase and an active 3'→5' exonuclease.
  • Exonuclease III removes nucleotides one at a time from the 3'-end of duplex DNA
  • exonuclease VII releases oligonucleotides several nucleotides long from both ends of single-stranded DNA
  • lambda exonuclease removes nucleotides having attached 5' phosphate groups from the 5' end of duplex DNA.
  • nick translation refers to a process comprising the combined action of a 5'→3' exonuclease and a polymerase to degrade the 5' terminus of a polynucleotide in duplex with a template strand polynucleotide and extend the 3' terminus of an adjacent second polynucleotide strand in duplex with the same template strand polynucleotide, wherein the said 5' terminus is separated from the 3' terminus of the adjacent polynucleotide by a nick or gap.
  • adjacent includes any number of nucleotides without limit (more than zero nucleotides is usually referred to as a "gap"), but is typically zero nucleotides (usually referred to as a "nick").
  • exonuclease and polymerase activities may reside in the same enzyme (such as with E. coli DNA polymerase I) or different enzymes, and their action may be either simultaneous or sequential.
  • PCR polymerase chain reaction
  • PCR refers to a method for amplifying a DNA base sequence using a heat-stable polymerase such as Taq polymerase, and two oligonucleotide primers, one complementary to the (+)-strand at one end of the sequence to be amplified and the other complementary to the (-)-strand at the other end.
  • PCR also can be used to detect the existence of the defined sequence in a DNA sample.
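The primer geometry in the PCR definition above can be illustrated with a short sketch that derives a primer pair from the (+)-strand of a target. The function names and the fixed primer length are assumptions made for illustration; practical primer design also weighs melting temperature, GC content and primer-dimer formation, none of which is modeled here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primer_pair(plus_strand: str, primer_len: int = 20) -> tuple:
    """Return (forward, reverse) primers, both written 5'->3'.

    The forward primer copies the 5' end of the (+)-strand and therefore
    anneals to the (-)-strand. The reverse primer is the reverse complement
    of the 3' end of the (+)-strand and therefore anneals to the (+)-strand.
    """
    if primer_len <= 0 or len(plus_strand) < 2 * primer_len:
        raise ValueError("target too short for two non-overlapping primers")
    forward = plus_strand[:primer_len].upper()
    reverse = reverse_complement(plus_strand[-primer_len:])
    return forward, reverse


if __name__ == "__main__":
    print(design_primer_pair("ATGCCGTTAGGC", primer_len=4))
```

Each primer supplies the 3'-hydroxyl end from which the polymerase extends toward the other primer site, so the region between the two sites is copied in every cycle.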
  • This invention provides a method for determining the sequence of a sample polynucleotide containing one or more genetic alterations.
  • the genetic alterations are presented in the form of a heteroduplex of polynucleotides which is then contacted with a reagent that recognizes a non-duplex polynucleotide structure, to form a reagent-heteroduplex complex, wherein the duplex contains at least one base-pair mismatch.
  • a heteroduplex can comprise: DNA:DNA; DNA:RNA; or RNA:RNA.
  • a nucleolytic agent is contacted with the heteroduplex to produce a protected fragment to which a double-stranded adapter oligonucleotide is joined.
  • the duplex: adapter complex is amplified using primers complementary to a portion of the adapter oligonucleotide, to generate amplification products.
  • the amplification products are joined to one another to form a concatemer, wherein each monomeric unit of the concatemer comprises a region of the sample polynucleotide containing the genetic alteration or the substantially-complementary reference strand region, and the monomeric units are separated by regions of sequence corresponding to a portion of the adapter oligonucleotide sequence.
  • the sequence of the concatemer is comprised of sequences of the strands of the protected fragments in random order and orientation.
  • sequence or sequences of the protected fragment strands are compared with the known sequence of the reference strand to identify the genetic alterations.
  • where the sequence of the reference strand is not fully known, the sequence of the two strands of the same protected fragment and/or sequences of other protected fragments arising from the same mismatch (identifiable as comprising substantially-identical overlapping or nested sequences) are compared to each other to identify the nucleotide sequence variations within or between the sample and/or reference polynucleotides.
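Reading individual protected fragments out of a concatemer sequence can be sketched as follows. The spacer sequence (a BamHI-like site, GGATCC), the example read and the function names are all hypothetical and chosen for illustration; the source does not specify the adapter-derived spacer. Because the units appear in random order and orientation, the sketch also derives an orientation-independent key so that the two strands of the same fragment can be grouped.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def split_concatemer(read: str, spacer: str) -> list:
    """Split a concatemer read into monomeric units at each spacer.

    The spacer is matched in both orientations, since a unit may have been
    ligated in either direction.
    """
    read, spacer = read.upper(), spacer.upper()
    variants = sorted({re.escape(spacer), re.escape(reverse_complement(spacer))})
    return [unit for unit in re.split("|".join(variants), read) if unit]


def canonical(unit: str) -> str:
    """Orientation-independent key: the smaller of a unit and its reverse complement."""
    return min(unit.upper(), reverse_complement(unit))


if __name__ == "__main__":
    read = "ACGTT" + "GGATCC" + "TTGAC" + "GGATCC" + "AACGT"
    units = split_concatemer(read, "GGATCC")
    print(units, [canonical(u) for u in units])
```

In the example, the first and third units are reverse complements of each other and share a key, which is how the two strands of one protected fragment would be recognized before comparison with a reference sequence. The sketch assumes the spacer does not occur inside any fragment.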
  • a plurality of reference polynucleotides are used with either the same or different sample strands.
  • the adapter oligonucleotide comprises one or more restriction enzyme recognition sites.
  • the amplification products are digested with a restriction enzyme which cleaves at the recognition site prior to joining to adapter polynucleotides.
  • adapter polynucleotides of differing sequences are utilized to join the amplified products to each other.
  • the adapter oligonucleotide has the following characteristics: It contains an inner end and an outer end; it is non-phosphorylated at the 5' terminus of the inner end; it contains a capture moiety at the outer end, and it has one or more blocking linkages adjacent to the capture moiety.
  • joining of the adapter comprises joining the first strand of the double-stranded adapter oligonucleotide to the products of the prior step and joining the second strand of the double-stranded adapter oligonucleotide to the products of this step.
  • a further embodiment of the invention provides that the first strand is covalently joined to the protected fragment by ligation, and the second strand is covalently joined to the protected fragment by nick-translation.
  • Reference DNA can be synthesized by chemical means or, preferably, isolated from any organism by any method known in the art. The organism will have no discernible disease or phenotypic effects.
  • This DNA may be obtained from any cell source, tissue source or body fluid.
  • Non-limiting examples of cell sources available in clinical practice include blood cells, buccal cells, cervicovaginal cells, epithelial cells from urine, fetal cells, or any cells present in tissue obtained by biopsy.
  • Body fluids include urine, blood, cerebrospinal fluid (CSF), and tissue exudates at the site of infection or inflammation.
  • DNA is extracted from the cells or body fluid using any method known in the art. Preferably, at least 5 pg of DNA is extracted. The extracted DNA can be used without further modification or stored for future use.
  • one or more specific regions in the extracted reference polynucleotide are amplified by PCR using a set of PCR primers complementary to genomic DNA separated by up to about 500 base pairs.
  • PCR conditions found to be suitable are described below in the Examples. It will be understood that optimal PCR conditions can be readily determined by those skilled in the art. (See, e.g., PCR 2: A PRACTICAL APPROACH (1995) eds. M.J. McPherson, B.D. Hames and G.R. Taylor, IRL Press, Oxford).
  • PCR products can be purified by a variety of methods, including but not limited to, microfiltration, dialysis, gel electrophoresis and the like. It is desirable to remove the polymerase used in PCR so that no new DNA synthesis can occur.
  • a reference:sample heteroduplex can be formed by any method of hybridization known in the art.
  • the reference and samples are separately heated and then annealed together.
  • the heating step is between about 70°C and about 100°C, more preferably between about 80°C and 100°C, and even more preferably between about 90°C and 100°C.
  • the polynucleotide is kept at the elevated temperature for sufficient time to separate the strands, preferably between about 2 minutes and about 15 minutes, more preferably between about 2 and about 10 minutes and even more preferably about 5 minutes.
  • sample duplexes including those with both high and low Tm.
  • the duplexes can be used immediately, or stored at 4°C until use.
  • a duplex can be formed by adjusting the salt and temperature to achieve suitable hybridization conditions.
  • Hybridization reactions can be performed in solutions ranging from about 10 mM NaCl to about 600 mM NaCl, at temperatures ranging from about 37°C to about 65°C. It will be understood that the stringency of the hybridization reaction is determined by both the salt concentration and the temperature. For instance, a hybridization performed in 10 mM salt at 37°C may be of similar stringency to one performed in 500 mM salt at 65°C.
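The trade-off between salt concentration and temperature can be illustrated with a widely used empirical approximation for the melting temperature of long DNA duplexes, Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N. This formula is not part of the source text, and the 50% GC content and 500 bp length used below are illustrative assumptions. Stringency is treated here simply as the gap between the estimated Tm and the hybridization temperature; the function names are hypothetical.

```python
import math


def estimate_tm_celsius(na_molar: float, gc_percent: float, length_bp: int) -> float:
    """Empirical Tm estimate for a long DNA duplex (rough approximation only)."""
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("na_molar and length_bp must be positive")
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / length_bp


def stringency_margin(temp_c: float, na_molar: float,
                      gc_percent: float = 50.0, length_bp: int = 500) -> float:
    """Degrees Celsius between the estimated Tm and the hybridization temperature.

    A smaller margin means more stringent conditions.
    """
    return estimate_tm_celsius(na_molar, gc_percent, length_bp) - temp_c


if __name__ == "__main__":
    low_salt = stringency_margin(37.0, 0.010)   # 10 mM NaCl at 37 °C
    high_salt = stringency_margin(65.0, 0.500)  # 500 mM NaCl at 65 °C
    print(round(low_salt, 1), round(high_salt, 1))
```

Under these assumptions the two conditions named in the text leave nearly the same margin below the estimated Tm, which is consistent with the statement that they may be of similar stringency. The approximation is less reliable at the extremes of salt concentration and for short duplexes.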
  • organic solvents and/or chaotropic salts such as guanidine thiocyanate (2.5M) may be used, allowing hybridization to be performed at 37°C.
  • any hybridization conditions can be used that form hybrids between substantially homologous complementary sequences, provided the reagents employed are compatible with the MBP and exonuclease employed. Generally, this can be accomplished by exchange into the reaction buffer of choice by dilution, extraction followed by ethanol precipitation, ultrafiltration or spin column chromatography and the like. In a preferred embodiment stringent hybridization conditions are used.
  • a genetic alteration is a difference in nucleotide sequence between two polynucleotides.
  • a genetic alteration can be a mutation, resulting in a detectable phenotype.
  • a genetic alteration may not be linked to a phenotype, but will be useful, e.g., for forensic purposes.
  • a particular sample polynucleotide that is assayed in the practice of the invention can contain a single genetic alteration or it may contain multiple genetic alterations.
  • the genetic alteration will be a single nucleotide polymorphism, i.e., a change in the sequence of a single base, compared to the wild-type sequence.
  • a plurality of sample polynucleotides, each of which contains one or more genetic alterations is analyzed by the practice of the invention, and the nucleotide sequences of the genetic alterations thereby determined.
  • one or more sample polynucleotides are hybridized with one or more reference polynucleotides to generate a polynucleotide duplex.
  • a polynucleotide duplex is a double-stranded polynucleotide, in which the association between the two strands is mediated, at least in part, by complementary base-pairing.
  • Hybridization of polynucleotides to form duplexes proceeds according to well-known and art-recognized base-pairing properties, such that adenine base-pairs with thymine or uracil, and guanine base-pairs with cytosine.
  • The property of a nucleotide that allows it to base-pair with a second nucleotide is called complementarity.
  • adenine is complementary to both thymine and uracil, and vice versa; similarly, guanine is complementary to cytosine and vice versa.
  • in a polynucleotide duplex, one or both of the two component strands may not be duplex along their entire length if, for example, one strand is longer than the other, or if the two strands have non-complementary terminal sequences.
  • a duplex can be a homoduplex, in which all bases in both component polynucleotide strands form complementary base pairs (along the double-stranded portion of the duplex), or a heteroduplex, in which one or more additions, deletions or base-pair mismatches exist within the duplex.
  • Formation of a heteroduplex is indicative of a genetic alteration in one of the two polynucleotide strands of a duplex, with respect to the other strand.
  • Two polynucleotide strands containing one or more additions, deletions or mismatches with respect to each other are nevertheless capable of forming a stable duplex, if sufficient complementarity exists between the two strands.
  • hybridization conditions can be adjusted to facilitate the formation of heteroduplexes.
  • the stability of a heteroduplex is such that it will persist after its formation through subsequent manipulations such as washing, protein binding and treatment with nucleolytic agents.
  • the nucleotide sequence of a reference polynucleotide will be known as a reference sequence and, in general, a reference sequence will be the wild-type sequence of a particular genetic region. Variations from the wild-type sequence present in the sample polynucleotide(s) can thereby be determined.
  • a single reference sequence comprising a region of known genetic variability, is used to assay multiple sample polynucleotides.
  • a reference polynucleotide could comprise a sequence that has been defined as "mutant" and be used to detect either wild-type sequences or sequences containing a different mutant allele.
  • a sample polynucleotide is contacted with a reference polynucleotide, either single- or double-stranded, to form a mixture.
  • the mixture is treated so as to denature double-stranded polynucleotides and to remove regions of secondary structure in single-stranded polynucleotides, for instance, by heating.
  • the mixture is incubated in solution under conditions of temperature, ionic strength, pH, etc., that are favorable to annealing, i.e., under hybridization conditions.
  • Hybridization conditions are chosen to allow duplex formation between sequences having one or more mismatches, as is known in the art.
  • Hybridization of a sample polynucleotide with a reference polynucleotide will thus result in the formation of a heteroduplex polynucleotide, containing one or more mismatches, i.e., sites at which complementary base-pairing does not occur.
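The homoduplex/heteroduplex distinction described above can be illustrated with a minimal Python sketch. It assumes two strands of equal length written so that position i of one strand faces position i of the other, and it detects only base-pair mismatches, not additions or deletions. The function names and example sequences are hypothetical and are not taken from the specification.

```python
# Complementary pairs as stated in the text: A pairs with T or U, G pairs with C.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatch_positions(reference_top, sample_bottom):
    """Return 0-based positions at which the two strands fail to base-pair.

    reference_top is written 5'->3' and sample_bottom 3'->5', so that
    index i of one strand faces index i of the other (an assumption of
    this sketch; real strands may also differ in length).
    """
    if len(reference_top) != len(sample_bottom):
        raise ValueError("this sketch only handles strands of equal length")
    return [
        i
        for i, pair in enumerate(zip(reference_top.upper(), sample_bottom.upper()))
        if pair not in PAIRS
    ]


def classify_duplex(reference_top, sample_bottom):
    """Label a duplex 'homoduplex' if fully paired, else 'heteroduplex'."""
    if mismatch_positions(reference_top, sample_bottom):
        return "heteroduplex"
    return "homoduplex"


if __name__ == "__main__":
    reference = "ATGCCGTA"        # hypothetical reference strand, 5'->3'
    matching = "TACGGCAT"         # fully complementary strand, 3'->5'
    altered = "TACGACAT"          # same strand with one changed base
    print(classify_duplex(reference, matching))
    print(classify_duplex(reference, altered), mismatch_positions(reference, altered))
```

A single changed base is enough to turn the label from homoduplex to heteroduplex, which is the signal that the mismatch-recognizing reagents described later act on.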
  • wild-type genes may include multiple prevalent versions that contain alterations in sequence relative to each other and yet do not cause a discernible pathological effect. These variations are designated "polymorphisms" or "allelic variations." It is therefore possible to prepare multiple reference polynucleotides, thereby providing a mixture of the most common polymorphisms.
  • a single reference polynucleotide may be used that has been selected for its particular sequence.
  • the reference polynucleotide can also be chemically or enzymatically modified, for example to remove or add methyl groups.
  • the reference polynucleotide comprises a PCR product identical, at least in part, to the sequence prevalent in the general population.
  • polynucleotides as defined above, i.e., a gene or gene fragment, restriction fragment, exons, introns, mRNA, tRNA, rRNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
  • the reference polynucleotide comprises a portion of a particular gene or genetic locus in the patient's genomic DNA known to be involved in a pathological condition or syndrome.
  • genetic syndromes include cystic fibrosis, sickle-cell anemia, thalassemias, Gaucher's disease, adenosine deaminase deficiency, alpha1-antitrypsin deficiency, Duchenne muscular dystrophy, familial hypercholesterolemia, fragile X syndrome, glucose-6-phosphate dehydrogenase deficiency, hemophilia A, Huntington's disease, myotonic dystrophy, neurofibromatosis type 1, osteogenesis imperfecta, phenylketonuria, retinoblastoma, Tay-Sachs disease, and Wilms' tumor. Thompson and Thompson, Genetics in Medicine, 5th Ed.
  • the reference polynucleotide comprises part of a particular gene or genetic locus that may not be known to be linked to a particular disease, but in which polymorphism is known or suspected.
  • obesity may be linked with variations in the apolipoprotein B gene
  • hypertension may be due to genetic variations in transport systems for sodium or other ions
  • aortic aneurysms may be linked to variations in I-haptoglobin and cholesterol ester transfer protein
  • alcoholism may be related to variant forms of alcohol dehydrogenase and mitochondrial aldehyde dehydrogenase.
  • an individual's response to medicaments may be affected by variations in drug modification systems such as cytochrome P-450s, and susceptibility to particular infectious diseases may also be influenced by genetic status.
  • the methods of the present invention can be applied to HLA analysis for identity testing.
  • the invention allows the generation of a smaller subregion of the sample polynucleotide:reference polynucleotide heteroduplex, wherein the subregion encompasses the sequence difference between the sample polynucleotide and the reference polynucleotide.
  • the subregion is generated by protection from nuclease digestion, and is therefore denoted a protected fragment.
  • An exemplary method for generating a protected fragment is as follows.
  • a heteroduplex formed between a sample polynucleotide and a reference polynucleotide is contacted with a reagent which recognizes a non-duplex structure.
  • the reagent can be chemical or enzymatic.
  • the reagent is enzymatic; in a more preferred embodiment, the reagent is a mismatch binding protein; in a still more preferred embodiment, the reagent is the E. coli mutS protein. Reaction conditions compatible with the binding of mutS and other mismatch binding proteins to heteroduplexes are known in the art and are additionally provided in the examples below.
  • reagent-heteroduplex complex will generally comprise a duplex polynucleotide in which one (or more) portions of the duplex are bound by the reagent (in the vicinity of the genetic alteration), and the remainder of the duplex is free of bound reagent. Those portions of the duplex that are free of bound reagent are susceptible to nucleolytic agents. Accordingly, to generate a protected fragment, a reagent-heteroduplex complex is subjected to the action of one or more nucleolytic agents. Such nucleolytic agents can be chemical or enzymatic.
  • An enzymatic nucleolytic agent is also known as a nuclease.
  • a nuclease is an enzyme capable of degrading nucleic acids.
  • An exonuclease degrades from the ends of a nucleic acid molecule.
  • a 5'-specific exonuclease will begin degradation at the 5' end of a nucleic acid molecule, and a 3'-specific exonuclease will begin degradation at the 3' end of a nucleic acid molecule.
  • 5'-specific exonucleases may additionally be specific for either 5'-phosphate- or 5'-hydroxyl-terminated ends.
  • 3'-specific exonucleases may be specific for either 3'-phosphate- or 3'-hydroxyl-terminated ends.
  • An endonuclease degrades internally in a nucleic acid molecule.
  • a single strand-specific nuclease degrades single-stranded nucleic acids, either exonucleolytically or endonucleolytically, but is unable to degrade a double-stranded nucleic acid.
  • a preferred nuclease is an endonuclease.
  • Suitable endonucleases include S1 nuclease, P1 nuclease, micrococcal nuclease, Mung Bean nuclease and DNase I. Concentrations of endonuclease sufficient to digest non-protected portions of a duplex polynucleotide are known in the art and an example is provided in the examples, infra.
  • Nucleolytic agents of a chemical nature include, for example, NaOH or other bases, which are capable of nucleolytic degradation of RNA.
  • the requirement for a reagent that recognizes a non-duplex polynucleotide structure is fulfilled by the concerted, sequential action of several reagents.
  • a reagent-heteroduplex complex can be contacted with a chemical or enzymatic reagent which cleaves one strand of an otherwise duplex polynucleotide at or near a mismatch, followed by contact of the nicked polynucleotide with a nick-binding protein.
  • Suitable reagents which cleave at or near a mismatch include S1 nuclease, Mung Bean nuclease, MutY protein, and MutM protein.
  • the region of the polynucleotide component of a reagent-heteroduplex complex that is bound by the reagent is protected from the action of the nucleolytic agent.
  • treatment of the reagent-heteroduplex complex with a nucleolytic agent generates a protected fragment.
  • Protected fragments can be purified, if desired, by any method which separates the protected fragment from (usually) smaller unprotected polynucleotide fragments produced by the nucleolytic agent. For example, a size separation, such as gel filtration or gel electrophoresis, can be used.
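The idea of a protected fragment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of the specification: the bound reagent is modeled as shielding a fixed number of bases (`flank`, a hypothetical parameter) on each side of the mismatch, and everything outside that window is treated as digested. Real footprints depend on the binding protein, the nuclease and the reaction conditions.

```python
def protected_fragment(duplex_top, mismatch_index, flank=10):
    """Return (start, fragment) for the region shielded around a mismatch.

    duplex_top      one strand of the heteroduplex, written 5'->3'
    mismatch_index  0-based position of the mismatch on that strand
    flank           hypothetical number of bases protected on each side
    """
    if not 0 <= mismatch_index < len(duplex_top):
        raise ValueError("mismatch_index lies outside the duplex")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    # Clip the protected window at the physical ends of the duplex.
    start = max(0, mismatch_index - flank)
    end = min(len(duplex_top), mismatch_index + flank + 1)
    return start, duplex_top[start:end]


if __name__ == "__main__":
    strand = "A" * 30 + "C" + "G" * 30   # hypothetical 61-base strand
    print(protected_fragment(strand, 30, flank=5))
```

The returned fragment is much shorter than the input duplex but still contains the altered position, which is why sequencing the protected fragments is sufficient to identify the genetic alteration.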
  • sample duplex is contacted with one or more agents having the ability to specifically bind to bp mismatches.
  • This includes, but is not limited to, mismatch binding proteins.
  • the agent is contacted under conditions which allow binding of the agent to the mismatch.
  • the MBP is E. coli MutS (AP Biotech) although other MBPs or mixtures of MBPs can be used.
  • homologues of MutS such as MutS from Thermus aquaticus (Epicentre), Streptococcus pneumoniae HexA, hMSH2, genetically modified MutS or other mutation binding proteins such as RuvC protein from E.
  • the duplex is contacted with MutS at 0°C for between about 10 and 30 min., preferably about 30 min.
  • MutS binding that yields consistent patterns of protection with a large "footprint" (range of bp protected) occurs near neutral pH, preferably between a pH of about 6.5 and 8.5, and more preferably, between a pH of about 7.0 and 8.0.
  • a source of magnesium ions (Mg2+) can also be added to the reaction to enhance MutS binding.
  • Adapter oligonucleotides can also be added to the reaction to enhance MutS binding.
  • oligonucleotide adapters are added to the protected fragments.
  • the adapters facilitate the amplification of sequences corresponding to a protected fragment, by providing a pair of primer sites.
  • an adapter oligonucleotide duplex lacks terminal phosphate residues.
  • the lack of 5'-phosphate termini on the adapter oligonucleotides prevents self-ligation of the adapter oligonucleotides, which would lead to the production of spurious amplification products (i.e., "primer dimers") in later steps.
  • Lack of 5'-phosphate termini also prevents ligation of multiple adapters to a protected fragment, ensuring that a single adapter is ligated to each end of a protected fragment.
  • the inner end of the adapter oligonucleotide is that end which becomes joined to a protected fragment in the practice of the invention.
  • the outer end of the adapter oligonucleotide is that end which is not joined to the protected fragment.
  • the outer end of the adapter oligonucleotide forms a terminus of the polynucleotide which results from ligation of an adapter oligonucleotide to a protected fragment. It is possible to specify the inner and outer ends of an adapter oligonucleotide in several ways.
  • the adapter oligonucleotide duplex comprises a capture moiety at its outer end, which serves to allow immobilization of the outer portions of the adapter-modified fragment after digestion with a restriction enzyme, and also serves to sterically block ligation of that end of the adapter oligonucleotide to a protected fragment.
  • the adapter oligonucleotides can optionally comprise a capture moiety at the outer end of the adapter oligonucleotide duplex.
  • the capture moiety can be attached either to the strand that forms the 5'-end of the outer end of the adapter oligonucleotide duplex, or to the strand that forms the 3'-end of the outer end of the adapter oligonucleotide duplex.
  • the capture moiety is generally a molecule that is capable of interacting with a second molecule (a recognition moiety) to form a stable complex.
  • Adapter oligonucleotides and any other nucleic acid that is directly or indirectly attached to the capture moiety will be present in a complex formed between a capture moiety and a recognition moiety.
  • a recognition moiety will often be attached to a solid substrate, or otherwise immobilized such that a capture moiety:recognition moiety complex can be brought out of solution.
  • Exemplary capture moiety:recognition moiety pairs include biotin:avidin, biotin:streptavidin, biotin:anti-biotin, antigen:antibody, hapten:antibody, enzyme:substrate, sugar:lectin, protein:ligand and nucleic acid:complementary nucleic acid.
  • Other interacting molecules that can serve as capture moiety:recognition moiety pairs will be known to those of skill in the art. It is also clear that the roles of capture moiety and recognition moiety can be reversed.
  • Adapter oligonucleotides will comprise, within their sequence, a primer binding site.
  • a primer binding site refers to a region of an oligonucleotide or polynucleotide, such as an adapter or a sequence encoded by an adapter, that is capable of base-pairing with a primer, or that encodes a sequence that is able to base-pair with a primer.
  • a primer is an oligonucleotide or polynucleotide capable of base-pairing with another oligonucleotide or polynucleotide and serving as a site from which polymerization can be initiated, normally from a 3'-hydroxyl end. Because of the ease with which oligonucleotides of defined sequence can be synthesized, virtually any sequence capable of base-pairing can function as a primer binding site.
  • An adapter oligonucleotide duplex can comprise one or more blocking linkages adjacent to its outer end.
  • a blocking linkage is an internucleotide linkage which is less susceptible to nucleolytic degradation, compared to the phosphodiester linkage normally found in most naturally-occurring nucleic acids.
  • Exemplary blocking linkages include phosphorothioate, methyl phosphonate, boronate and others.
  • Other modifications such as nucleic acid analogs (PNAs, morpholidate DNAs, locked nucleoside analogs (LNA), and the like) may also be used.
  • haptens such as fluorescent tags, inverted nucleosides (in 5'-5' or 3'-3' linkage), or biotin, any of which inhibit the action of nucleases used in the procedure, may also be used.
  • the presence of blocking linkages minimizes loss of protected fragments due to nucleolytic action, as described, infra.
  • the protected fragments are rendered blunt-ended (if not already blunt-ended as a result of the action of a nucleolytic agent), prior to the addition of adapters, by one of a number of end repair processes. In one embodiment, this is accomplished by incubating the protected fragments with T4 DNA Polymerase, dATP, dCTP, dGTP and dTTP under conditions suitable for both the polymerization and exonucleolytic activities of T4 DNA polymerase. Fragments containing 3' overhanging ends will be rendered blunt-ended by the 3'-specific exonuclease activity of T4 DNA Polymerase, while 5' overhangs will be converted to blunt ends by polymerization of the recessed 3' termini.
  • protected fragments are treated with a single strand-specific exonuclease, such as E. coli exonuclease VII or exonuclease I.
  • 5'-phosphate-terminated blunt ends are generated, either through the end repair process or as a result of nucleolytic action.
  • Other methods for generating blunt-ended fragments such as, for example, physical methods, treatment with single-strand-specific exo- or endonucleases or the use of other types of nucleic acid polymerase, are also within the scope of this invention.
  • ligase enzymes include T4 DNA ligase, E. coli DNA ligase, T. aquaticus ligase.
  • the protected fragments will have 5'-phosphate termini.
  • the oligonucleotide adapters lack terminal phosphate residues.
  • the lack of 5'-phosphate termini on the adapter oligonucleotides prevents self-ligation of the adapter oligonucleotides, which would lead to the production of spurious amplification products (i.e., "primer dimers") in later steps.
  • Lack of 5'-phosphate termini also prevents ligation of multiple adapters to a protected fragment, ensuring that a single adapter is ligated to each end of a protected fragment.
  • the complementary strands, in which a 3'-OH end of the protected fragment is juxtaposed to a 5'-OH end of the adapter oligonucleotide, will not be ligated. Since this occurs at both ends of the protected fragment, the result of adapter ligation is a duplex, comprising a protected fragment flanked by adapters, with a nick near each 3'-end at the boundary between protected fragment sequences and adapter sequences.
  • the 3'-end of the "outer" end of the adapter(s) may be modified with blocking moieties, such as a 3' deoxynucleotide, 3' dideoxynucleotide, inverted nucleotide or hapten, to prevent wrong orientation ligation.
  • the non-ligated adapter strand is degraded by a nuclease, and resynthesized by a DNA polymerase.
  • the nuclease is the T7 gene 6 exonuclease
  • the polymerase is T4 DNA polymerase. Degradation and resynthesis can be sequential or simultaneous.
  • nucleolytic and polymerization activity are present in the same polypeptide.
  • the combined polymerase and 5'->3' exonuclease activities of a DNA polymerase (such as the Klenow fragment of E. coli DNA Polymerase I) are utilized to close the gap by "nick translation."
  • Duplex polynucleotides comprising a protected fragment flanked by adapter oligonucleotide duplexes are subjected to amplification.
  • Duplex polynucleotides comprising a protected fragment flanked by adapter oligonucleotide duplexes, or optionally their amplification products, are ligated into tandem arrays (concatemers) and the nucleotide sequence of the concatemer is determined, thereby providing the nucleotide sequences of the genetic alterations in the sample polynucleotides.
  • an internal subfragment containing a portion of the adapter ligated to the protected fragment, or its amplification product, and bearing a palindromic single-stranded overhang ("sticky end") may be generated by providing a restriction site within the adapter sequence and digesting the adapted products with a restriction enzyme prior to ligating the array.
  • a capture moiety such as biotin may be provided on the terminal fragment by inclusion in the adapter and/or primer oligonucleotides, and removing them by capture with a recognition moiety such as streptavidin prior to ligation.
  • Such ligation will generate arrays of highly random size, many of which will circularize, and thus be prevented from ligating into a vector. This effect will reduce the frequency of clones with inserts of appropriate size for sequencing.
  • This may be controlled by including in the adapter population a second set of adapters with a restriction site that creates the same sticky end as the bulk of the adapters, but which is cleaved exclusively by a second enzyme, by virtue of having different adjacent nucleotides.
  • One such example is BamHI and BglII, both of which leave a 5'-GATC overhang, but recognize that sequence in the context of differing adjacent nucleotides.
  • the product can then be cleaved with BglII to generate linear arrays of more defined size.
  • the average size of the arrays, or number of inserts, may be adjusted by varying the ratio of BamHI-site-containing adapters or primers to BglII-site-containing adapters or primers.
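The compatible-end idea can be checked with a short Python sketch. It assumes that the two enzymes meant are BamHI (recognition site GGATCC) and BglII (recognition site AGATCT), each cutting after the first base of its site; the helper names are hypothetical. The sketch derives the overhang each enzyme leaves and the site formed when two cut ends are ligated, then reports which enzyme, if any, could cut that junction again.

```python
# Recognition site and top-strand cut offset (cut after this many bases).
# These two entries are well-established enzyme properties; the dictionary
# layout and function names are illustrative only.
SITES = {
    "BamHI": ("GGATCC", 1),
    "BglII": ("AGATCT", 1),
}


def overhang(enzyme):
    """Single-stranded 5' overhang left by a symmetric staggered cut."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


def junction(left_enzyme, right_enzyme):
    """Top-strand site formed by ligating an end cut by left_enzyme
    to an end cut by right_enzyme."""
    left_site, left_cut = SITES[left_enzyme]
    right_site, right_cut = SITES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


def recut_by(junction_site):
    """Enzymes in SITES whose recognition site matches the junction."""
    return [name for name, (site, _) in SITES.items() if site == junction_site]


if __name__ == "__main__":
    for left in SITES:
        for right in SITES:
            joined = junction(left, right)
            print(left, "+", right, "->", joined, "recut by", recut_by(joined))
```

Under these assumptions both enzymes leave the same GATC overhang, so their ends ligate to one another, but a mixed BamHI/BglII junction matches neither recognition site. The specification does not spell out the junctions formed in its ligation, so this only illustrates why two such enzymes can be used to distinguish otherwise identical sticky ends.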
  • An alternative means to generate defined arrays may be provided by including a small amount of a "chain-terminating" adapter oligonucleotide in the ligation reaction.
  • Such molecules comprise double-stranded oligonucleotides bearing at one end, a sequence complementary to the sticky ends of the adapted protected fragments, and on the other end, a single-stranded non-palindromic sequence that lacks the ability to form a hybrid with itself, either by Watson-Crick or other base-pairing that would otherwise lead to ligation of a dimer. This is most easily achieved by utilizing a non-palindromic sequence that cannot pair with itself. Inclusion of this chain terminator in the ligation reaction forces the formation of linear products ending with the chain terminator adapters.
  • a vector to which adapters complementary to the chain terminator sticky ends have been previously ligated ("decorated vector"), may then be added to the reaction to accept the arrays without allowing the closure of the vector or arrays into circular products.
  • the protected fragments or their amplified products will contain mismatches, that when introduced into host cells, will give rise to mismatch correction mechanisms, including the post DNA replication repair system. This may be avoided by methylation of the ligation product with a DNA methylase corresponding to the host cell.
  • one strand of the vector containing the ligated array may be removed by treatment with an exonuclease prior to transformation.
  • a suitable exonuclease is T7 gene 6 exonuclease. A single nick in the ligated product is required to prevent undesirable degradation of the vector.
  • a vector may be used that contains an M13 or f1 origin of replication and that is cleaved in one strand of the origin with gpII protein prior to treatment with T7 gene 6 exonuclease.
  • sequences may be subject to host restriction-modification systems. This may be overcome by appropriate host selection or pretreatment with a modification methylase.
  • DNA from colonies containing tandem arrays is prepared and sequenced using methods well understood in the art.
  • all of the reactions following purification of the protected fragments are performed sequentially in the same reaction vessel, by sequential addition of enzymes (and buffers) required for the subsequent step(s).
  • Two µg of the purified heteroduplex was mixed with 10 pmol of E. coli mutS protein (AP Biotech) in 20 µl 50 mM Tris pH 7.5, 8 mM MgCl2, and 0.5 mM DTT and incubated for 1 h on ice. Then, 1 µl 2 mg/ml DNase I was added and the sample incubated for 10 min. at 37°C. The digestion was terminated by addition of 23 µl 50 mM Tris pH 7.5, 10 mM EDTA, and the mutS-protected complexes purified by chromatography on a Sephacryl S-200 spin column (AP Biotech).
  • each forward and reverse double-stranded adapters (annealed sequences 1 and 2 mixed with annealed sequences 3 and 4), each separately pretreated with calf intestinal phosphatase (CIP)
  • T4 DNA ligase (Stratagene)
  • the unligated strands of the adapters (corresponding to sequences 2 and 4) in the ligation products (10 µl) were then removed and the 3' ends of the protected fragments simultaneously extended by addition of 50 U T7 gene 6 exonuclease (AP Biotech) and incubation for 30 min. at room temperature.
  • the final products were digested with 10 units Dra I in the same buffer for 1 h at 37°C.
  • the final products were amplified by denaturing for 1 min. at 95°C, followed by 29 cycles of PCR, with denaturation at 95°C for 10 s, primer annealing at 62°C for 20 s, and primer extension at 72°C for 10 s, followed by a final extension for 5 min. at 72°C using 1 µM oligonucleotides 1 and 3 in a 50 µl reaction containing 2.5 U Taq DNA polymerase.
  • the products were digested with 80 U Hind III (New England Biolabs) for 3 h at 37°C.
  • the released biotinylated adapter termini were removed after adding NaCl to 0.7 M by addition of 40 µl prewashed streptavidin agarose (Gibco-BRL) and incubation for 30 min. at room temperature. The mixture was extracted with phenol/chloroform and desalted by chromatography on a G-50 spin column. A portion (7 µl) of the eluate was then ligated with 100 ng Hind III-digested and phosphatase-treated pUC19 with 2 U T4 DNA ligase overnight at 14°C in a final volume of 10 µl. The products were used to transform competent cells which were plated on LB/ampicillin agar. Analysis indicated 50% of the clones contained inserts.
  • All of the insert-containing clones contained single protected fragments. Among 64 clones, 3 separate isolates were obtained for each of three of the mutation sites. The sequence changes corresponding to the mutations were properly identified in each by sequence variation between the isolates. Nine of the clones contained inserts corresponding to the fourth site, and five corresponded to the fifth site. Seven other clones had inserts to a site (and indicated a mutation by sequence variation) which was not contained in the sequenced region containing the known mutations. Four clones contained sequences within a 40 nucleotide region in mutS but which did not identify any nucleotide changes by sequence variation. Nine of the clones corresponded to vector sequence also without changes from the known sequence.
  • Example 2: Detection of single-nucleotide polymorphisms in a 4 kb PCR product from APC using a solid-phase mutS protection reaction. Genomic DNA samples from four normal individuals were combined in equal mass ratio. A portion of exon 15 of the adenomatous polyposis coli (APC) gene was amplified from 400 ng of the pooled genomic template DNA using 1 µM oligonucleotide sequences 5 and 6 as primers and Pfu Turbo polymerase (Stratagene) in the supplied buffer supplemented with 5% glycerol. The reaction was heated to 95°C for one minute, followed by 29 cycles of 95°C denaturation for 1 min., 62°C primer annealing for 1 min. and extension at 72°C for 6 min., followed by a final extension at 72°C for 10 minutes, generating a product ~4 kb in length.
  • a reference DNA product was prepared in parallel except using 15 ng of a cloned PCR product from one of the four individuals as template and biotinylated primers corresponding to oligonucleotides 5 and 6.
  • Biotinylated reference DNA (0.5 µg) and patient DNA (0.5 µg) were heteroduplexed in 150 µl 50% formamide, 2.78X SSPE and heated to 95°C for 5 minutes followed by incubation at 37°C for 1 h.
  • the buffer was exchanged and the sample concentrated by diafiltration on a Centricon 100 followed by a single wash with 2 ml 10 mM Tris, 1 mM EDTA pH 7.5 (TE).
  • the product was diluted to 200 µl with 1X B&W (1 M NaCl, 10 mM Tris, 1 mM EDTA, 0.1% Tween 20), and 100 µg Dynal M-280 Streptavidin (Dynal Corp., prewashed with 1X B&W) was added and the hybrids captured by end-over-end mixing for 30 min. at room temperature.
  • the particles were collected on a magnet, washed once with 300 µl mutS binding buffer (50 mM Tris pH 7.5, 8 mM MgCl2, and 0.5 mM DTT), and resuspended in 20 µl of the same buffer. MutS (10 pmol) was added and the suspension incubated on ice without mixing for 1 hr.
  • DNase I (2 µg in 1 µl) was then added and the suspension incubated for 10 minutes at 37°C.
  • the digest was terminated by addition of 23 µl 50 mM Tris pH 7.5, 10 mM EDTA.
  • the particles were removed by applying a magnetic field, and the supernatant was chromatographed over a Sephacryl S-200 MicroSpin column (AP Biotech).
  • a portion of the S-200 eluate (20 µl) was processed as described above (example 1), and the Hind III-digested inserts ligated into restricted and phosphatased pUC19.
  • the cloned products were sequenced. Thirty-one of the 45 inserts sequenced corresponded to a single (A/G) polymorphism at position 5034 of the gene (codon 1678). One insert corresponded to the (G/A) polymorphism at position 4479 (codon 1493) and two inserts corresponded to the (A/G) polymorphism at position 5880 (codon 1960). Eleven inserts contained APC sequence but did not show any changes from the expected sequence.
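The insert counts reported for this example can be tallied with a short Python sketch. The counts and positions are those stated in the text; the dictionary labels, the function names and the rounding are illustrative choices. The codon calculation assumes that positions are numbered from the first base of the coding sequence, an assumption that is consistent with the position and codon numbers quoted above.

```python
# Insert counts as reported for the 45 sequenced clones.
INSERT_COUNTS = {
    "A/G at position 5034": 31,
    "G/A at position 4479": 1,
    "A/G at position 5880": 2,
    "APC sequence, no change": 11,
}


def fractions(counts):
    """Fraction of all sequenced inserts in each category (3 decimals)."""
    total = sum(counts.values())
    return {label: round(n / total, 3) for label, n in counts.items()}


def codon_for(position):
    """Codon number containing a 1-based coding-sequence position."""
    return (position + 2) // 3


if __name__ == "__main__":
    print(sum(INSERT_COUNTS.values()))
    print(fractions(INSERT_COUNTS))
    print([codon_for(p) for p in (5034, 4479, 5880)])
```

On these figures, 34 of the 45 inserts identified one of the three polymorphic sites, and the remaining 11 carried APC sequence without a detected change.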
  • a pUC vector decorated with nonpalindromic adapter ends is generated.
  • One hundred micrograms of pUC19 is digested with BamHI and EcoRI, and the large fragment separated on a 1% agarose gel and collected by electroelution.
  • the vector is ligated with two oligonucleotide adapters (Q and R) created by mixing and annealing sequence 7 with sequence 8 (Q) and sequence 9 with sequence 10 (R).
  • adapter "S" created by annealing oligonucleotide sequences 11 and 12.
  • This adapter provides a HindIII-compatible single-stranded terminus and a sequence complementary to the adapter-modified ends of the pUC19 vector above.
  • an adapter "S2" is created by annealing oligonucleotide sequences 11 and 17, and may be used to limit the formation of S2 dimers during the subsequent ligation reaction.
  • the mixture is ligated overnight at 16°C with 2 U T4 DNA ligase (Stratagene) in 50 mM Tris pH 7.5, 5 mM MgCl2, 1 mM DTT and 15% polyethylene glycol (Sigma) to create a tandem array terminated in non-self-ligatable 5'-GCTA-3' single-stranded overhangs.
  • the arrays are then ligated into the decorated vector by addition of 1 µg of the pUC19 prepared as described above and incubation for 4 h at 16°C.
  • the ligase is then inactivated by heating the mixture to 65°C for 15 min.
  • one strand of the vector-ligated product may be removed by addition of 100 units T7 gene 6 exonuclease (AP Biotech) and incubation at 30°C for 20 min.
  • a portion (1 µl) of the final reaction product diluted 1:10 in 10 mM Tris, 1 mM EDTA is used to transform 25 µl of MAX efficiency DH5α E. coli (Gibco-BRL).
  • Colonies are screened for a preferred insert size of 300-800 bp by minipreparation or colony PCR. Background arising from occasional ligation of the "S" adapters may be eliminated by digestion of the final (double-stranded) ligation product with Sac I prior to transformation (or T7 gene 6 exonuclease treatment, if used).
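Why a 5'-GCTA-3' overhang is non-self-ligatable while a restriction-enzyme overhang is not can be shown with a small Python check. An overhang can pair with a second copy of itself only if it equals its own reverse complement. The sketch below tests only perfect Watson-Crick self-pairing and ignores other forms of base-pairing; the function names are hypothetical.

```python
# Translation table for DNA complementarity (A<->T, G<->C).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def is_self_complementary(overhang):
    """True if two copies of the overhang could base-pair with each other."""
    return overhang.upper() == reverse_complement(overhang)


if __name__ == "__main__":
    for overhang in ("GATC", "AGCT", "GCTA"):
        print(overhang, reverse_complement(overhang), is_self_complementary(overhang))
```

The GATC and AGCT overhangs left by common restriction enzymes are palindromic and can pair with themselves, whereas GCTA has the reverse complement TAGC and therefore needs a partner carrying that different sequence, such as the adapter ends on the decorated vector.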
  • the primers for generating SNP amplimers have biotinylated termini allowing their removal by capture on a streptavidin solid support.
  • a set of amplimer products generated from patient DNA using the adapters and primers as in example 2, except lacking biotin modifications, is heated in the presence of a 10-fold molar excess of biotinylated SNP amplimers.
  • Two µl of the PCR reaction from patient DNA is combined with 20 µl of a PCR reaction containing the SNP amplimers in addition to 1.5 M sodium thiocyanate, 120 mM disodium phosphate, 10 mM EDTA in a final volume of 100 µl.
  • the annealing is supplemented with 50 pmol each oligonucleotide sequences 2, 4, 14 and 16.
  • 8 µl redistilled phenol is added and the mixture heated to 100°C for 10 min., and then chilled on ice.
  • the reactions are then placed in a thermocycler and cycled for 2 min. at 65°C and 15 min. at 25°C for a total of 10 cycles (~2.5 h) (Miller, R, and Riblet, R, Nucl. Acids Res. 23:2339-2340 (1995)).
  • the mixture is extracted once with chloroform and 50 µl is desalted over a G-50 spin column (AP Biotech).
  • the eluate is then mixed with 50 µl 2 M NaCl, and incubated with 50 µl streptavidin agarose (Gibco-BRL) equilibrated with 1 M NaCl, 10 mM Tris pH 7.5, 1 mM EDTA with mixing for 30 min.
  • the final product is then desalted by chromatography on a G-50 spin column.
  • Two µl of the G-50 eluate is reamplified with Taq DNA polymerase using biotinylated primers (sequences 1, 4) by heating to 95°C for 1 min., and then cycling at 95°C for 10 sec., annealing at 62°C for 20 sec. and extending at 72°C for 10 sec. for a total of 14 cycles.
  • the product(s) are digested with Hind III and processed as described above (examples 2 and 3) to generate tandem insert array clones.

Abstract

Methods and compositions for the rapid detection and identification of multiple genetic alterations are provided.

Description

SERIAL ANALYSIS OF GENETIC ALTERATIONS
TECHNICAL FIELD
The invention is in the field of genetic analysis, including mutation detection and single nucleotide polymorphism (SNP) analysis.
BACKGROUND
Nucleotide sequence polymorphism is a hallmark of the human genome. It is estimated that approximately 0.1% of the 3 billion base-pair human genome is subject to polymorphism. Thus, approximately 3 million base pairs of the human genome are subject to variation from one individual to another. A unique genetic fingerprint can be obtained for each individual, based upon which of these sites varies in a given individual, as well as the nature of the variations. The determination of the constellation of single nucleotide polymorphisms (SNPs) that exist in the genome of a given individual will be useful for disease phenotyping, as molecular markers for genome mapping, and as markers for forensic identification and related techniques. SNPs are but one form of the genetic variation that exists within the human population. Other forms of genetic variation include insertions and deletions of nucleotide sequences, sequence repetitions, translocations and inversions.
It is now clear that genetic variation contributes to many, if not most, types of human disease. Genetic factors can affect an individual's susceptibility to disease, as well as the response of an individual to pharmaceuticals used in the treatment of disease. In addition, both inherited and acquired genetic changes contribute to the development and progression of major human diseases such as cancer, cardiovascular disease, neurological and neurodegenerative disease. Consequently, the unique nature of an individual's genetic variation will be informative with respect to that individual's susceptibility to disease and response to treatment. Most diseases result from the cumulative effect of multiple genetic changes.
Hence, methods for the parallel analysis of multiple mutations will contribute greatly to our understanding of the molecular mechanisms of human disease, and to our eventual ability to design more effective treatments. In addition, the effectiveness of a forensic identification increases with the number of markers that are tested. Hence, the ability to conduct parallel analyses of multiple genetic changes will facilitate progress in these and other areas.
Although there are several existing methods for the identification of genetic polymorphisms, none of them allows rapid characterization and parallel analysis of multiple genetic changes. Many methods detect the presence of a mutation but do not identify it. However, this method does not lend itself easily to analysis of multiple sequence changes, nor is it certain that every sequence difference will result in a corresponding conformational change. Restriction fragment length polymorphism (RFLP) analysis will only detect sequence changes that occur in a restriction enzyme recognition site and, hence, its usefulness is limited. Array methods for analysis of genetic variation involve hybridization to oligonucleotide arrays which contain probes complementary to both the wild-type sequence and the polymorphic variant(s). However, array methods require construction of the arrays and often require elaborate controls for distinguishing single-nucleotide differences in sequence.
U.S. Patent No. 5,459,039 discloses a method for detecting base sequence differences between two DNA molecules, using a protein that recognizes base pair mismatches, by detection of a protein: DNA complex. U.S. Patent No. 5,459,039 does not provide a method for parallel analysis of multiple genetic alterations, nor does it provide a method for identifying genetic alterations.
U.S. Patent No. 5,695,937 discloses a method for serial analysis of gene expression. It does not describe methods for detection and identification of genetic alterations, nor does it disclose methods for parallel analysis of multiple genetic alterations.
A method for rapid parallel analysis of multiple genetic alterations would be particularly useful, in light of the vast genetic diversity of the human species, the consequent preponderance of genetic alterations in the human genome, and the importance of these genetic variations for diagnosis and treatment of disease, among other things. This invention provides this method.
DISCLOSURE OF THE INVENTION
The present invention provides methods and compositions for the rapid parallel analysis of multiple genetic alterations in a collection of sample nucleic acids. Practice of the invention allows the identification of one or more genetic alterations in one or more sample polynucleotides, and in addition provides the nucleotide sequences of the genetic alterations so identified. In one embodiment, the invention provides a method for determining the nucleotide sequence of a sample polynucleotide containing one or more genetic alterations.
In practicing this method, the following steps are performed:
(a) contacting a duplex containing the sample polynucleotide and a reference polynucleotide with a reagent which recognizes a non-duplex polynucleotide structure, to form a reagent-heteroduplex complex, wherein the duplex contains at least one base-pair mismatch;
(b) contacting the reagent-heteroduplex complex with a nucleolytic agent to produce a protected fragment;
(c) joining a double-stranded adapter oligonucleotide to the protected fragment;
(d) amplifying the products of step (c) using primers complementary to a portion of the adapter oligonucleotide, to generate amplification products;
(e) joining the amplification products to one another to form a concatemer, wherein each monomeric unit of the concatemer comprises a region of the sample polynucleotide containing the genetic alteration, and the monomeric units are separated by regions of sequence corresponding to a portion of the adapter oligonucleotide sequence; and
(f) determining the nucleotide sequence of the concatemer.
In one embodiment of the invention, multiple sample polynucleotides are assayed using the same reference polynucleotide. In additional embodiments, a plurality of reference polynucleotides are used to assay one or more sample polynucleotides.
The invention additionally provides oligonucleotide adapters useful for forming concatemers of sequences containing genetic alterations. The oligonucleotide adapters of the invention are designed to minimize self-ligation, to be resistant to nucleolytic activities and to comprise a capture moiety that allows selective immobilization and/or removal of one strand of an amplification product.
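By way of non-limiting illustration, the concatemer structure of steps (e) and (f) can be sketched in software as a read in which monomeric units are separated by adapter-derived sequence. The sketch below assumes, purely for illustration, that the spacer is a single Hind III recognition site (AAGCTT); the function name `split_concatemer` and the toy sequences are hypothetical and are not part of the disclosed method.

```python
def split_concatemer(read: str, spacer: str = "AAGCTT") -> list[str]:
    """Split a concatemer read into the units lying between spacer copies.

    ASSUMPTION: the spacer is one fixed, known sequence (default: the
    Hind III recognition site); real adapter-derived spacers may be longer.
    """
    return [unit for unit in read.upper().split(spacer.upper()) if unit]


# Hypothetical toy read: two made-up fragments flanked by spacer copies.
read = "AAGCTT" + "ACGTTGCC" + "AAGCTT" + "GGATCCAT" + "AAGCTT"
units = split_concatemer(read)
print(units)  # ['ACGTTGCC', 'GGATCCAT']
```

A unit that happened to contain the spacer sequence internally would be split in two by this simple approach, so the sketch is only suitable where the spacer is known not to occur within the monomeric units.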
Methods and compositions for selective amplification of one strand of a duplex are also provided.
BRIEF DESCRIPTION OF THE FIGURES
Figures 1A and 1B are a schematic diagram of the assay method. In Figure 1A, reference and test DNAs are heteroduplexed by heating and slow cooling. For each nucleotide difference, two mismatched nucleotide pairs are generated (only one is shown in the figure). These heteroduplexes are contacted with mutS protein. Digestion of the complexed DNA results in protection of small, mostly double-stranded DNA fragments bound to mutS. These are purified from the other digest products and DNase by spin column chromatography. The ends of the protected fragments are polished with T4 DNA polymerase.
Figure 1B is a continuation of Figure 1A. A pair of double-stranded adapters is ligated to the ends of the protected, polished fragments using T4 DNA ligase. The adapters lack 5' phosphates, leaving one strand of the adapter unligated. Contact with an exonuclease (T7 gene 6 exonuclease) destroys the unligated strand, allowing the 3' ends of the protected fragment to be extended by T4 DNA polymerase. The product is amplified by PCR using a set of biotinylated primers corresponding to one strand of the adapters. The PCR products are digested with a restriction enzyme (Hind III shown), the terminal subfragments are removed by capture on streptavidin agarose, and the internal subfragments (containing the protected fragment sequence) are ligated together in tandem arrays and then ligated into a vector.
Figures 2A to 2C show concerted generation of tandem arrays from amplified protected fragments. In Figure 2A, a plasmid vector bearing nonpalindromic single-stranded overhangs is generated by ligation of two adapter oligonucleotides complementary to the restriction fragment ends used to linearize the vector. Both adapters have single-stranded overhangs of identical sequence on the non-ligating terminus, except that one of the two lacks a 5' phosphate. The adapter-decorated vector is then purified from any free adapter.
Figure 2B shows that the restricted and purified population of amplified protected fragments is ligated together in the presence of a small amount (5%) of a chain-terminating adapter oligonucleotide complementary on one end to the ends of the fragments. The other end of this adapter is complementary to the single-stranded overhang on the decorated vector. The mixture is ligated to completion. All chains receiving a chain terminator will be forced to form larger linear arrays until another chain terminator (or another growing chain also ending in a terminator) is ligated, yielding a large tandem array whose average size is determined by the ratio of terminator to fragment.
Figure 2C shows that the decorated vector is added to the ligation reaction to accept the array. To minimize repair activity directed against the final ligation product, one strand may be removed by T7 gene 6 exonuclease.
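The dependence of average array size on the terminator-to-fragment ratio described for Figure 2B can be illustrated with a simple probabilistic sketch. It assumes, as a simplification not stated in the text, that each unit added to a growing chain end is a chain terminator with a fixed probability equal to the terminator's molar fraction among all ligatable units, and that ligation is otherwise unbiased. Under that assumption the number of fragments added before termination is geometrically distributed; the function names below are hypothetical.

```python
import random


def expected_array_length(terminator_fraction: float) -> float:
    """Mean number of fragments added before a terminator is ligated.

    ASSUMPTION: each added unit is a terminator with probability t,
    so the fragment count is geometric with mean (1 - t) / t.
    """
    t = terminator_fraction
    if not 0.0 < t <= 1.0:
        raise ValueError("terminator_fraction must be in (0, 1]")
    return (1.0 - t) / t


def simulate_array_length(terminator_fraction: float,
                          trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same mean, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Keep adding fragments until a terminator is drawn.
        while rng.random() >= terminator_fraction:
            total += 1
    return total / trials


# With a 5% terminator fraction the model predicts about 19 fragments.
print(round(expected_array_length(0.05), 2))
print(round(simulate_array_length(0.05), 2))
```

If the 5% figure is instead read as a ratio of terminator to fragment, the corresponding fraction would be 0.05/1.05, which gives a slightly longer predicted array; the sketch makes no claim as to which reading was intended.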
MODES FOR CARRYING OUT THE INVENTION
The practice of the present invention employs, unless otherwise indicated, conventional techniques of microbiology, molecular biology, recombinant DNA and related fields, which are within the skill of the art. These techniques are fully explained in the literature. See, e.g., Maniatis et al., Molecular Cloning: A Laboratory Manual (1982) Cold Spring Harbor Laboratory Press; D. Glover, ed., DNA Cloning: A Practical Approach, vols. I & II (1985) IRL Press; M. Gait, ed., Oligonucleotide Synthesis (1984) IRL Press; B. Hames & S. Higgins, eds., Nucleic Acid Hybridization (1985) IRL Press; Perbal, A Practical Guide to Molecular Cloning (1984); Ausubel, et al., Current Protocols In Molecular Biology (1987 and annual updates) John Wiley & Sons; and Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Edition); vols. I, II & III (1989) Cold Spring Harbor Laboratory Press.
Definitions
As used herein, certain terms will have specific meanings.
The singular forms "a," "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a cell" includes a plurality of cells, including mixtures thereof.
The term "comprising" is intended to mean that the compositions and methods include the recited elements, but not excluding others. "Consisting essentially of," when used to define compositions and methods, shall mean excluding other elements of any essential significance to the combination. Thus, a composition consisting essentially of the elements as defined herein would not exclude trace contaminants from the isolation and purification method and pharmaceutically acceptable carriers, such as phosphate buffered saline, preservatives, and the like. "Consisting of" shall mean excluding more than trace elements of other ingredients and substantial method steps for administering the compositions of this invention. Embodiments defined by each of these transition terms are within the scope of this invention.
The terms "polynucleotide" and "nucleic acid molecule" are used interchangeably to refer to polymeric forms of nucleotides of any length. The polynucleotides may contain deoxyribonucleotides, ribonucleotides, and/or their analogs. Nucleotides may have any three-dimensional structure, and may perform any function, known or unknown. The term "polynucleotide" includes single-, double-stranded and triple helical molecules. "Oligonucleotide" refers to polynucleotides of between about 6 and about 100 nucleotides of single- or double-stranded DNA or RNA. Oligonucleotides are also known as oligomers and may be isolated from genes, or chemically synthesized by methods known in the art. A "primer" refers to an oligonucleotide, usually single-stranded, that provides a 3'-hydroxyl end for the initiation of nucleic acid synthesis. The following are non-limiting embodiments of polynucleotides: a gene or gene fragment, exons, introns, mRNA, tRNA, rRNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
A nucleic acid molecule may also comprise modified nucleic acid molecules, such as methylated nucleic acid molecules and nucleic acid molecule analogs. Analogs of purines and pyrimidines are known in the art, and include, but are not limited to, aziridinylcytosine, 4-acetylcytosine, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethyl-aminomethyluracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, pseudouracil, 5-pentynyluracil and 2,6-diaminopurine. The use of uracil as a substitute for thymine in a deoxyribonucleic acid is also considered an analogous form of pyrimidine.
Oligonucleotides are short polymers of nucleotides, generally less than 200 nucleotides, preferably less than 150 nucleotides, more preferably less than 100 nucleotides, more preferably less than 50 nucleotides and most preferably less than 30 nucleotides in length. Oligonucleotides are generally considered to comprise shorter polymers of nucleotides than do polynucleotides, although there is an art-recognized overlap between the upper limit of oligonucleotide length and the lower limit of polynucleotide length. Consequently, for the purposes of the present invention, the terms "oligonucleotide" and "polynucleotide" shall not be considered limiting with respect to polymer length.
As used herein, "base pair," also designated "bp," refers to a pair of complementary, hydrogen-bonded bases in nucleic acid molecules; in DNA the purine adenine (A) is hydrogen bonded with the pyrimidine thymine (T), and the purine guanine (G) with the pyrimidine cytosine (C), also known as Watson-Crick base-pairing. A thousand base pairs is often called a kilobase, or kb. A "base pair mismatch" refers to a location in a nucleic acid molecule in which the bases are not complementary Watson-Crick pairs. The term "duplex" refers to the complex formed between two strands of hydrogen-bonded, complementary nucleic acid molecules. A duplex need not be entirely complementary, but can contain one or more mismatches or one or more deletions or additions. A duplex is sufficiently long-lasting to persist between formation of the duplex or complex and subsequent manipulations, including, for example, any optional washing steps.
As used herein, the term "reference strand" or "wild-type strand" refers to the nucleic acid molecule or polynucleotide having a sequence prevalent in the general population that is not associated with any disease or discernible phenotype. It is noted that in the general population, wild-type genes may include multiple prevalent versions that contain alterations in sequence relative to each other and yet do not cause a discernible pathological effect. These variations are designated "polymorphisms" or "allelic variations." It is therefore possible to prepare multiple reference strands, thereby providing a mixture of the most common polymorphisms. Alternatively, one reference strand may be used that has been selected for its particular sequence. The reference strand can also be chemically or enzymatically modified, for example to remove or add methyl groups. In one or more embodiments, the reference strand is comprised of a PCR product identical at least in part to the sequence prevalent in the general population.
In a preferred embodiment, the reference strand or wild-type strand comprises a portion of a particular gene or genetic locus in the patient's genomic DNA known to be involved in a pathological condition or syndrome. Non-limiting examples of genetic syndromes include cystic fibrosis, sickle-cell anemia, thalassemias, Gaucher's disease, adenosine deaminase deficiency, alpha1-antitrypsin deficiency, Duchenne muscular dystrophy, familial hypercholesterolemia, fragile X syndrome, glucose-6-phosphate dehydrogenase deficiency, hemophilia A, Huntington's disease, myotonic dystrophy, neurofibromatosis type 1, osteogenesis imperfecta, phenylketonuria, retinoblastoma, Tay-Sachs disease, and Wilms tumor (Thompson and Thompson, Genetics in Medicine, 5th Ed.).
In another embodiment, the reference strand comprises part of a particular gene or genetic locus that may not be known to be linked to a particular disease, but in which polymorphism is known or suspected. For example, obesity may be linked with variations in the apolipoprotein B gene, hypertension may be due to genetic variations in sodium or other transport systems, aortic aneurysms may be linked to variations in I-haptoglobin and cholesterol ester transfer protein, and alcoholism may be related to variant forms of alcohol dehydrogenase and mitochondrial aldehyde dehydrogenase. Furthermore, an individual's response to medicaments may be affected by variations in drug modification systems such as cytochrome P450s, and susceptibility to particular infectious diseases may also be influenced by genetic status. Finally, the methods of the present invention can be applied to HLA analysis for identity testing.
The term "sample strand" or "patient strand" refers to the polynucleotide having unknown sequence and potentially containing one or more mutations or mismatches as compared to the reference strand. This may be a PCR product amplified from patient DNA or other sample(s).
In yet another embodiment, the reference strand comprises part of a foreign genetic sequence, e.g., the genome of an invading microorganism. Non-limiting examples include bacteria and their phages, viruses, fungi, protozoa, mycoplasmas, and the like. The present methods are particularly applicable when it is desired to distinguish between different variants or strains of a microorganism in order to choose appropriate therapeutic interventions.
The term "genetic alterations" or "mutations" is used to refer to a change from the wild-type or reference sequence of one or more nucleic acid molecules. It refers to base pair substitutions, additions and deletions of a sample strand when compared to a reference strand.
A linear sequence of polynucleotides is "substantially homologous" to another linear sequence, having the opposite polarity, if both sequences are capable of hybridizing to form duplexes with the same complementary polynucleotide.
Sequences that hybridize under conditions of high stringency are more preferred. It is understood that hybridization reactions can accommodate insertions, deletions, and substitutions in the nucleotide sequence. Thus, linear sequences of nucleotides can be essentially identical even if some of the nucleotide residues do not precisely correspond or align. Preferably, the "substantially homologous" sample sequences of the invention contain a single mutation (mismatch) or an addition or deletion of 1 to about 10 base pairs when compared to the reference polynucleotide.
As used herein, the term "reagent which recognizes a non-duplex polynucleotide structure" is any agent, proteinaceous or otherwise, which provides this functional activity when used in the method of this invention. In one embodiment, this agent is a mismatch binding protein or "MBP." MBP refers to the group of proteins which recognize and bind to nucleotide mismatches and unpaired nucleotides in polynucleotide duplexes. As used herein, the term "non-duplex polynucleotide structure" shall mean the absence at any position of a Watson-Crick base pair, i.e., any pair other than A:T or G:C, or A:U in RNA, and unpaired nucleotides. By recognizing and binding to improperly paired nucleotide strands, these proteins are involved in the complex pathway of genetic repair. Repair is generally initiated by the binding of the protein mutS to the mismatch. (See, Modrich (1994), supra). According to current understanding, the portion of DNA between the mutS-bound mismatch and the nearest GATC element (bound by mutH) is looped out by a translocase activity of the mutS protein, assisted by the DNA helicase activity of mutL, leading to activation of the GATC endonuclease associated with mutH. Cooperative action of mutS, mutH and the DNA helicase (mutL) is required to mark the mismatch region, which is then repaired using exonucleases and polymerases.
"MBPs" includes several embodiments. These embodiments include any fragment, analog, mutein, variant or mixture thereof, that retains the ability to recognize and bind to a nucleotide mismatch. In one embodiment, a "variant" is a protein or polypeptide with conservative amino acid substitutions as compared to the wild-type amino acid sequence. The term therefore encompasses MutS and its homologues including hMSH2, hPMS1, and hPMS2.
Mismatch repair proteins for use in the present invention may be derived from E. coli (as described above) or from any organism containing mismatch repair proteins with appropriate functional properties. Non-limiting examples of useful proteins include those derived from Salmonella typhimurium (MutS, see, Su, S.S. and Modrich, P., Proc. Natl. Acad. Sci. USA 83:5057-5061 (1986); MutL); Streptococcus pneumoniae (HexA, HexB); Saccharomyces cerevisiae ("all-type," MSH2, MLH1, MSH3); Schizosaccharomyces pombe (SWI4); mouse (rep1, rep3); and human ("all-type," hMSH2, hMLH1, hPMS1, hPMS2, duc1). In another embodiment, heteroduplexes formed between patients' DNA and wild-type DNA as described above are incubated with p53 or its C-terminal domain (Lee, et al., Cell 81:1013-1020 (1995)). In another embodiment, purified MutS, MutL, and MutH are used to cleave mismatch regions (Su et al., Proc. Natl. Acad. Sci. USA 83:5057 (1986); Grilley et al., J. Biol. Chem. 264:1000 (1989)).
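The definition of a "non-duplex polynucleotide structure" given above (any pair other than A:T or G:C, or A:U in RNA, and unpaired nucleotides) can be illustrated by a short position-by-position check of an aligned duplex. The sketch below assumes the two strands are supplied already aligned, with the second strand written 3' to 5' so that paired bases share an index, and with "-" marking an unpaired position; these conventions and the function name are hypothetical and serve only as an example.

```python
# Watson-Crick pairs named in the definition (A:T, G:C, and A:U in RNA).
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
}


def non_duplex_positions(top: str, bottom: str) -> list[int]:
    """Return 0-based indices where the aligned strands are not paired.

    ASSUMPTION: `top` is written 5'->3', `bottom` 3'->5', both the same
    length, and '-' marks an unpaired nucleotide opposite a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strands must have equal length")
    return [
        i
        for i, pair in enumerate(zip(top.upper(), bottom.upper()))
        if pair not in WATSON_CRICK
    ]


# Hypothetical duplex with a single T:G mismatch at index 3.
print(non_duplex_positions("ACGTAC", "TGCGTG"))  # [3]
```

Positions flagged in this way correspond to the sites at which a mismatch binding protein would be expected to bind; the sketch does not model binding affinity or sequence context.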
When the agents are proteins or polypeptides, they can be in the L or D form so long as the biological activity of the polypeptide is maintained. For example, the protein can be altered so as to be secreted from the cell for recombinant production and purification. These also include proteins which are post-translationally modified by reactions that include glycosylation, acetylation and phosphorylation. Such polypeptides also include analogs, alleles and allelic variants which can contain amino acid derivatives or non-amino acid moieties that do not affect the biological or functional activity of the protein as compared to wild-type or naturally occurring protein. The term amino acid refers both to the naturally occurring amino acids and their derivatives, such as TyrMe and PheCl, as well as other moieties characterized by the presence of both an available carboxyl group and an amine group. Non-amino acid moieties which can be contained in such polypeptides include, for example, amino acid mimicking structures. Mimicking structures are those structures which exhibit substantially the same spatial arrangement of functional groups as amino acids but do not necessarily have both the α-amino and α-carboxyl groups characteristic of amino acids.
As used herein, the term "nucleolytic agent" refers to an enzyme or chemical agent that cleaves at least one strand of DNA. Non-limiting examples of such agents include exonucleases such as BAL31, lambda exonuclease, exonuclease III, T7 gene 6 exonuclease, and endonucleases such as DNase I, S. aureus (micrococcal) nuclease, P1 nuclease, and the like. In addition, examples of chemical agents include bleomycin and/or iron-bound intercalators such as o-phenanthroline that can direct the formation of hydroxyl radicals in close proximity to a DNA-bound intercalator to effect DNA cleavage.
The term "exonuclease" refers to an enzyme that cleaves nucleotides sequentially from the free ends of a linear nucleic acid substrate. Exonucleases can be specific for double or single stranded nucleotides and/or directionally specific, for instance, 3'→5' and/or 5'→3'. Some exonucleases exhibit other enzymatic activities, for example, native T7 DNA polymerase is both a polymerase and an active 3'→5' exonuclease. Exonuclease III removes nucleotides one at a time from the 3'-end of duplex DNA, exonuclease VII releases oligonucleotides several nucleotides in length from both ends of single-stranded DNA, and lambda exonuclease removes nucleotides having attached 5' phosphate groups from the 5' end of duplex DNA.
As used herein, the term "nick translation" refers to a process comprising the combined action of a 5'→3' exonuclease and a polymerase to degrade the 5' terminus of a polynucleotide in duplex with a template strand polynucleotide and extend the 3' terminus of an adjacent second polynucleotide strand in duplex with the same template strand polynucleotide wherein the said 5' terminus is separated from the 3' terminus of the adjacent polynucleotide by a nick or gap. For these purposes, "adjacent" includes any number of nucleotides without limit (more than zero nucleotides is usually referred to as a "gap"), but is typically zero nucleotides (usually referred to as a "nick"). As is well understood in the art, such exonuclease and polymerase activities may reside in the same enzyme (such as with E. coli DNA polymerase I) or different enzymes, and their action may be either simultaneous or sequential.
The term "polymerase chain reaction" or "PCR" refers to a method for amplifying a DNA base sequence using a heat-stable polymerase such as Taq polymerase, and two oligonucleotide primers, one complementary to the (+)-strand at one end of the sequence to be amplified and the other complementary to the (-)-strand at the other end. Because the newly synthesized DNA strands can subsequently serve as additional templates for the same primer sequences, successive rounds of primer annealing, strand elongation, and dissociation can produce rapid and highly specific exponential amplification of the desired sequence. PCR also can be used to detect the existence of the defined sequence in a DNA sample.
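The exponential character of PCR amplification noted above can be illustrated numerically. The sketch below uses the standard idealized model in which each cycle multiplies the number of template copies by (1 + efficiency), where an efficiency of 1.0 corresponds to perfect doubling; the function name and the efficiency parameter are illustrative assumptions and are not taken from the present disclosure.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold increase in copy number after `cycles` PCR cycles.

    ASSUMPTION: a constant per-cycle efficiency between 0 and 1, with
    1.0 meaning every template is copied in every cycle.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency in [0, 1]")
    return (1.0 + efficiency) ** cycles


# Perfect doubling over 14 cycles gives 2**14 = 16384-fold amplification.
print(fold_amplification(14))  # 16384.0
```

In practice efficiency falls in later cycles as reagents are consumed, so the value returned by this model is best read as an upper bound.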
This invention provides a method for determining the sequence of a sample polynucleotide containing one or more genetic alterations. The genetic alterations are presented in the form of a heteroduplex of polynucleotides which is then contacted with a reagent that recognizes a non-duplex polynucleotide structure, to form a reagent-heteroduplex complex, wherein the duplex contains at least one base-pair mismatch. As used herein, a heteroduplex can comprise: DNA:DNA; DNA:RNA; or RNA:RNA. A nucleolytic agent is contacted with the heteroduplex to produce a protected fragment to which a double-stranded adapter oligonucleotide is joined. The duplex:adapter complex is amplified using primers complementary to a portion of the adapter oligonucleotide, to generate amplification products. The amplification products are then joined to one another to form a concatemer, wherein each monomeric unit of the concatemer comprises a region of the sample polynucleotide containing the genetic alteration or the substantially-complementary reference strand region, and the monomeric units are separated by regions of sequence corresponding to a portion of the adapter oligonucleotide sequence. The sequence of the concatemer is comprised of sequences of the strands of the protected fragments in random order and orientation. The sequence or sequences of the protected fragment strands are compared with the known sequence of the reference strand to identify the genetic alterations. Alternatively, in cases where the sequence of the reference strand is not fully known, the sequence of the two strands of the same protected fragment and/or sequences of other protected fragments arising from the same mismatch (identifiable as comprising substantially-identical overlapping or nested sequences) are compared to each other to identify the nucleotide sequence variations within or between the sample and/or reference polynucleotides.
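The comparison of a protected-fragment sequence with a known sequence, allowing for the random orientation of fragments in the concatemer, can be sketched as follows. The example assumes substitutions only (no insertions or deletions) and uses a simple ungapped scan of both orientations; the function names and toy sequences are hypothetical, and a practical implementation would more likely rely on an established alignment tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def best_ungapped_match(fragment: str, known: str) -> tuple[int, str, list[int]]:
    """Find the placement of `fragment` on `known` with fewest mismatches.

    Both orientations are tried because fragments occur in random
    orientation. Returns (offset, strand, mismatch positions), with
    positions given in the coordinates of `known`.
    ASSUMPTION: substitutions only; fragment is no longer than `known`.
    """
    best = None
    for strand, query in (("+", fragment), ("-", revcomp(fragment))):
        for offset in range(len(known) - len(query) + 1):
            window = known[offset:offset + len(query)]
            mismatches = [offset + i
                          for i, (q, k) in enumerate(zip(query, window))
                          if q != k]
            if best is None or len(mismatches) < len(best[2]):
                best = (offset, strand, mismatches)
    if best is None:
        raise ValueError("fragment is longer than the known sequence")
    return best


# Hypothetical data: the fragment differs from the known sequence at position 9.
known = "GATTACAGGCTTACCGATAGC"
fragment = "CAGGTTTACC"
print(best_ungapped_match(fragment, known))  # (5, '+', [9])
```

The reported mismatch positions indicate where the fragment differs from the known sequence, which is the information needed to identify a substitution.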
In one embodiment, a plurality of reference polynucleotides are used with either the same or different sample strands.
In one aspect, the adapter oligonucleotide comprises one or more restriction enzyme recognition sites. In a further aspect, the amplification products are digested with a restriction enzyme which cleaves at the recognition site prior to joining to adapter polynucleotides. In a yet further aspect, adapter polynucleotides of differing sequences are utilized to join the amplified products to each other.
In a further embodiment, the adapter oligonucleotide has the following characteristics: it contains an inner end and an outer end; it is non-phosphorylated at the 5' terminus of the inner end; it contains a capture moiety at the outer end; and it has one or more blocking linkages adjacent to the capture moiety. In a further embodiment, joining of the adapter comprises joining the first strand of the double-stranded adapter oligonucleotide to the products of the prior step and joining the second strand of the double-stranded adapter oligonucleotide to the products of this step. As set forth in more detail below, a further embodiment of the invention provides that the first strand is covalently joined to the protected fragment by ligation, and the second strand is covalently joined to the protected fragment by nick-translation.
It is to be understood, although not always explicitly stated, that the methods as disclosed herein can be practiced with a plurality of sample strands with one or more reference strands, and vice versa. In addition, a plurality of reference strands can be initially contacted with a plurality of sample strands.
Materials and Methods
Preparation of Sample and Reference Polynucleotides
Reference DNA can be synthesized by chemical means or, preferably, isolated from any organism by any method known in the art. The organism will have no discernible disease or phenotypic effects. This DNA may be obtained from any cell source, tissue source or body fluid. Non-limiting examples of cell sources available in clinical practice include blood cells, buccal cells, cervicovaginal cells, epithelial cells from urine, fetal cells, or any cells present in tissue obtained by biopsy. Body fluids include urine, blood, cerebrospinal fluid (CSF), and tissue exudates at the site of infection or inflammation. DNA is extracted from the cells or body fluid using any method known in the art. Preferably, at least 5 pg of DNA is extracted. The extracted DNA can be used without further modification or stored for future use. Preferably, one or more specific regions in the extracted reference polynucleotide are amplified by PCR using a set of PCR primers complementary to genomic DNA separated by up to about 500 base pairs. PCR conditions found to be suitable are described below in the Examples. It will be understood that optimal PCR conditions can be readily determined by those skilled in the art. (See, e.g., PCR 2: A PRACTICAL APPROACH (1995) eds. M.J. McPherson, B.D. Hames and G.R. Taylor, IRL Press, Oxford).
PCR products can be purified by a variety of methods, including but not limited to, microfiltration, dialysis, gel electrophoresis and the like. It is desirable to remove the polymerase used in PCR so that no new DNA synthesis can occur.
Duplex Formation
A reference:sample heteroduplex can be formed by any method of hybridization known in the art. In one embodiment, the reference and samples are separately heated and then annealed together. Preferably the heating step is between about 70°C and about 100°C, more preferably between about 80°C and 100°C, and even more preferably between about 90°C and 100°C. The polynucleotide is kept at the elevated temperature for sufficient time to separate the strands, preferably between about 2 minutes and about 15 minutes, more preferably between about 2 and about 10 minutes and even more preferably about 5 minutes.
The separately heated reference and sample strands are then combined while at the elevated temperatures and allowed to cool. Generally, cooling occurs rather slowly; for instance, the solution is allowed to cool to 50°C over a period of about an hour. The cooling must be sufficiently slow as to allow formation of reference:sample duplexes including those with both high and low Tm. The duplexes can be used immediately, or stored at 4°C until use.
Alternatively, a duplex can be formed by adjusting the salt and temperature to achieve suitable hybridization conditions. Hybridization reactions can be performed in solutions ranging from about 10 mM NaCl to about 600 mM NaCl, at temperatures ranging from about 37°C to about 65°C. It will be understood that the stringency of the hybridization reaction is determined by both the salt concentration and the temperature. For instance, a hybridization performed in 10 mM salt at 37°C may be of similar stringency to one performed in 500 mM salt at 65°C. In addition, organic solvents and/or chaotropic salts such as guanidine thiocyanate (2.5M) may be used, allowing hybridization to be performed at 37°C. Finally, means of accelerating hybridization such as phenol emulsion (Miller & Riblet, Nucl. Acids Res. 23:2339-2340 (1995)) can be employed. For the present invention, any hybridization conditions can be used that form hybrids between substantially homologous complementary sequences, provided the reagents employed are compatible with the MBP and exonuclease employed. Generally, this can be accomplished by exchange into the reaction buffer of choice by dilution, extraction followed by ethanol precipitation, ultrafiltration or spin column chromatography and the like. In a preferred embodiment stringent hybridization conditions are used.
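The trade-off between salt concentration and temperature can be illustrated with a widely used empirical approximation for the melting temperature of long DNA duplexes, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 675/N. This formula is not part of the present disclosure and is only a rough estimate, particularly at the extremes of salt concentration. The function name, the 50% GC content and the 500 bp duplex length are assumptions chosen for illustration.

```python
import math


def estimate_tm(na_molar, gc_percent, length_bp):
    """Rough melting temperature (deg C) of a long DNA duplex.

    Empirical salt-adjusted approximation; an illustrative assumption,
    not a formula taken from the text above.
    """
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / length_bp


# Hypothetical duplex: 50% GC, 500 bp.
GC_PERCENT = 50.0
LENGTH_BP = 500

tm_low_salt = estimate_tm(0.010, GC_PERCENT, LENGTH_BP)   # 10 mM salt
tm_high_salt = estimate_tm(0.500, GC_PERCENT, LENGTH_BP)  # 500 mM salt

# Stringency is taken here as how far the incubation temperature
# lies below the estimated Tm.
margin_low_salt = tm_low_salt - 37.0
margin_high_salt = tm_high_salt - 65.0

print(f"Tm shift from 10 mM to 500 mM salt: {tm_high_salt - tm_low_salt:.1f} C")
print(f"Margin below Tm at 10 mM / 37 C:  {margin_low_salt:.1f} C")
print(f"Margin below Tm at 500 mM / 65 C: {margin_high_salt:.1f} C")
```

Under these assumptions both conditions sit roughly 30°C below the estimated Tm, which is consistent with the statement that the two hybridizations may be of similar stringency.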
A genetic alteration is a difference in nucleotide sequence between two polynucleotides. In certain cases, a genetic alteration can be a mutation, resulting in a detectable phenotype. In other cases, a genetic alteration may not be linked to a phenotype, but will be useful, e.g., for forensic purposes. A particular sample polynucleotide that is assayed in the practice of the invention can contain a single genetic alteration or it may contain multiple genetic alterations. In a preferred embodiment, the genetic alteration will be a single nucleotide polymorphism, i.e., a change in the sequence of a single base, compared to the wild-type sequence. In a preferred embodiment, a plurality of sample polynucleotides, each of which contains one or more genetic alterations, is analyzed by the practice of the invention, and the nucleotide sequences of the genetic alterations thereby determined.
To identify and determine the sequences of genetic alterations, one or more sample polynucleotides are hybridized with one or more reference polynucleotides to generate a polynucleotide duplex. A polynucleotide duplex is a double-stranded polynucleotide, in which the association between the two strands is mediated, at least in part, by complementary base-pairing. Hybridization of polynucleotides to form duplexes proceeds according to well-known and art-recognized base-pairing properties, such that adenine base-pairs with thymine or uracil, and guanine base-pairs with cytosine. The property of a nucleotide that allows it to base-pair with a second nucleotide is called complementarity. Thus, adenine is complementary to both thymine and uracil, and vice versa; similarly, guanine is complementary to cytosine and vice versa. In a polynucleotide duplex, one or both of the two component strands may not be duplex along their entire length if, for example, one strand is longer than the other, or if the two strands have non-complementary terminal sequences.
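The base-pairing rules just described can be written down directly. The following is a minimal sketch; the names `PAIRS`, `is_complementary` and `reverse_complement` are illustrative and do not appear in the disclosure.

```python
# Complementary base pairs as described above: A with T or U, G with C.
PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_complementary(base1, base2):
    """Return True if the two bases can form a complementary pair."""
    return (base1.upper(), base2.upper()) in PAIRS


def reverse_complement(dna):
    """Return the DNA strand (5'->3') that pairs with `dna` (5'->3')."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


print(is_complementary("A", "U"))   # adenine pairs with uracil
print(is_complementary("G", "T"))   # not a complementary pair
print(reverse_complement("GATTC"))
```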
For the purposes of the present invention, it is useful to distinguish between a homoduplex, in which all bases in both component polynucleotide strands form complementary base pairs (along the double-stranded portion of the duplex) and a heteroduplex, in which one or more additions, deletions or base-pair mismatches exist within the duplex. Formation of a heteroduplex is indicative of a genetic alteration in one of the two polynucleotide strands of a duplex, with respect to the other strand. Two polynucleotide strands containing one or more additions, deletions or mismatches with respect to each other are nevertheless capable of forming a stable duplex, if sufficient complementarity exists between the two strands. Moreover, those of skill in the art are aware that lower hybridization stringency allows formation of stable duplexes between polynucleotide strands with higher degrees of noncomplementarity. Therefore, hybridization conditions can be adjusted to facilitate the formation of heteroduplexes. For the purposes of the invention, the stability of a heteroduplex is such that it will persist after its formation through subsequent manipulations such as washing, protein binding and treatment with nucleolytic agents.
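The homoduplex/heteroduplex distinction can be sketched for the simplest case of two equal-length DNA strands that differ only by base substitutions. Additions and deletions, which the text also covers, are deliberately not handled here. The sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(top, bottom):
    """Return 0-based positions on `top` that are not correctly paired.

    Both strands are written 5'->3' and must be the same length, so the
    base opposite top[i] is the i-th base of `bottom` read backwards.
    """
    if len(top) != len(bottom):
        raise ValueError("this sketch only handles equal-length strands")
    opposite = bottom[::-1]
    return [i for i, (t, b) in enumerate(zip(top, opposite)) if COMPLEMENT[t] != b]


def classify_duplex(top, bottom):
    """Label a duplex as 'homoduplex' or 'heteroduplex'."""
    return "heteroduplex" if mismatch_positions(top, bottom) else "homoduplex"


reference_top = "ATGCAGTC"
reference_bottom = "GACTGCAT"  # fully complementary to reference_top
sample_bottom = "GACCGCAT"     # carries a single-base change

print(classify_duplex(reference_top, reference_bottom))
print(classify_duplex(reference_top, sample_bottom),
      mismatch_positions(reference_top, sample_bottom))
```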
The nucleotide sequence of a reference polynucleotide will be known as a reference sequence and, in general, a reference sequence will be the wild-type sequence of a particular genetic region. Variations from the wild-type sequence present in the sample polynucleotide(s) can thereby be determined. In one embodiment of the invention, a single reference sequence, comprising a region of known genetic variability, is used to assay multiple sample polynucleotides. In additional embodiments, a reference polynucleotide could comprise a sequence that has been defined as "mutant" and be used to detect either wild-type sequences or sequences containing a different mutant allele. In the practice of the invention, a sample polynucleotide, either single- or double-stranded, is contacted with a reference polynucleotide, either single- or double-stranded, to form a mixture. The mixture is treated so as to denature double-stranded polynucleotides and to remove regions of secondary structure in single-stranded polynucleotides, for instance, by heating. After denaturation, the mixture is incubated in solution under conditions of temperature, ionic strength, pH, etc., that are favorable to annealing, i.e., under hybridization conditions. Hybridization conditions are chosen to allow duplex formation between sequences having one or more mismatches, as is known in the art. Hybridization of a sample polynucleotide with a reference polynucleotide, according to the practice of the invention, will thus result in the formation of a heteroduplex polynucleotide, containing one or more mismatches, i.e., sites at which complementary base-pairing does not occur.
It is noted that in the general population, wild-type genes may include multiple prevalent versions that contain alterations in sequence relative to each other and yet do not cause a discernible pathological effect. These variations are designated "polymorphisms" or "allelic variations." It is therefore possible to prepare multiple reference polynucleotides, thereby providing a mixture of the most common polymorphisms. Alternatively, a single reference polynucleotide may be used that has been selected for its particular sequence. The reference polynucleotide can also be chemically or enzymatically modified, for example to remove or add methyl groups. In one or more embodiments, the reference polynucleotide comprises a PCR product identical, at least in part, to the sequence prevalent in the general population. It is intended to include, but not be limited to, polynucleotides as defined above, i.e., a gene or gene fragment, restriction fragment, exons, introns, mRNA, tRNA, rRNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
In a preferred embodiment, the reference polynucleotide comprises a portion of a particular gene or genetic locus in the patient's genomic DNA known to be involved in a pathological condition or syndrome. Non-limiting examples of genetic syndromes include cystic fibrosis, sickle-cell anemia, thalassemias, Gaucher's disease, adenosine deaminase deficiency, alpha1-antitrypsin deficiency, Duchenne muscular dystrophy, familial hypercholesterolemia, fragile X syndrome, glucose 6-phosphate dehydrogenase deficiency, hemophilia A, Huntington's disease, myotonic dystrophy, neurofibromatosis type 1, osteogenesis imperfecta, phenylketonuria, retinoblastoma, Tay-Sachs disease, and Wilms' tumor. Thompson and Thompson, Genetics in Medicine, 5th Ed.
In another embodiment, the reference polynucleotide comprises part of a particular gene or genetic locus that may not be known to be linked to a particular disease, but in which polymorphism is known or suspected. For example, obesity may be linked with variations in the apolipoprotein B gene, hypertension may be due to genetic variations in transport systems for sodium or other ions, aortic aneurysms may be linked to variations in I-haptoglobin and cholesterol ester transfer protein, and alcoholism may be related to variant forms of alcohol dehydrogenase and mitochondrial aldehyde dehydrogenase. Furthermore, an individual's response to medicaments may be affected by variations in drug modification systems such as cytochrome P-450s, and susceptibility to particular infectious diseases may also be influenced by genetic status. Finally, the methods of the present invention can be applied to HLA analysis for identity testing.
Protected Fragments
To facilitate the rapid analysis of multiple genetic alterations, the invention allows the generation of a smaller subregion of the sample polynucleotide:reference polynucleotide heteroduplex, wherein the subregion encompasses the sequence difference between the sample polynucleotide and the reference polynucleotide. In one embodiment, the subregion is generated by protection from nuclease digestion, and is therefore denoted a protected fragment. An exemplary method for generating a protected fragment is as follows.
A heteroduplex formed between a sample polynucleotide and a reference polynucleotide is contacted with a reagent which recognizes a non-duplex structure. The reagent can be chemical or enzymatic. In a preferred embodiment, the reagent is enzymatic; in a more preferred embodiment, the reagent is a mismatch binding protein; in a still more preferred embodiment, the reagent is the E. coli mutS protein. Reaction conditions compatible with the binding of mutS and other mismatch binding proteins to heteroduplexes are known in the art and are additionally provided in the examples below. Those of skill in the art are aware that such conditions can be varied and are also aware of art-recognized methods for the detection of protein binding to a heteroduplex, such as filter binding, gel electrophoresis and the like. Thus, additional conditions compatible with binding can be determined without undue experimentation. Contact between the reagent and the heteroduplex results in the formation of a reagent-heteroduplex complex. This complex will generally comprise a duplex polynucleotide in which one (or more) portions of the duplex are bound by the reagent (in the vicinity of the genetic alteration), and the remainder of the duplex is free of bound reagent. Those portions of the duplex that are free of bound reagent are susceptible to nucleolytic agents. Accordingly, to generate a protected fragment, a reagent-heteroduplex complex is subjected to the action of one or more nucleolytic agents. Such nucleolytic agents can be chemical or enzymatic.
An enzymatic nucleolytic agent is also known as a nuclease. Thus, a nuclease is an enzyme capable of degrading nucleic acids. An exonuclease degrades from the ends of a nucleic acid molecule. A 5'-specific exonuclease will begin degradation at the 5' end of a nucleic acid molecule, and a 3'-specific exonuclease will begin degradation at the 3' end of a nucleic acid molecule. 5'-specific exonucleases may additionally be specific for either 5'-phosphate- or 5'-hydroxyl-terminated ends. Similarly, 3'-specific exonucleases may be specific for either 3'-phosphate- or 3'-hydroxyl-terminated ends. An endonuclease degrades internally in a nucleic acid molecule. A single strand-specific nuclease degrades single-stranded nucleic acids, either exonucleolytically or endonucleolytically, but is unable to degrade a double-stranded nucleic acid.
A preferred nuclease is an endonuclease. Suitable endonucleases include S1 nuclease, P1 nuclease, Micrococcal nuclease, Mung Bean nuclease and DNAse I. Concentrations of endonuclease sufficient to digest non-protected portions of a duplex polynucleotide are known in the art and an example is provided in the examples, infra.
Nucleolytic agents of a chemical nature include, for example, NaOH or other bases, which are capable of nucleolytic degradation of RNA. In another embodiment of the invention, the requirement for a reagent that recognizes a non-duplex polynucleotide structure is fulfilled by the concerted, sequential action of several reagents. For example, a reagent-heteroduplex complex can be contacted with a chemical or enzymatic reagent which cleaves one strand of an otherwise duplex polynucleotide at or near a mismatch, followed by contact of the nicked polynucleotide with a nick-binding protein. Suitable reagents which cleave at or near a mismatch include S1 nuclease, Mung Bean nuclease, Mut Y protein, and Mut M protein.
The region of the polynucleotide component of a reagent-heteroduplex complex that is bound by the reagent is protected from the action of the nucleolytic agent. Thus, treatment of the reagent-heteroduplex complex with a nucleolytic agent generates a protected fragment. Protected fragments can be purified, if desired, by any method which separates the protected fragment from (usually) smaller unprotected polynucleotide fragments produced by the nucleolytic agent. For example, a size separation, such as gel filtration or gel electrophoresis, can be used.
Mismatch Recognition
The reference:sample duplex is contacted with one or more agents having the ability to specifically bind to bp mismatches. This includes, but is not limited to, mismatch binding proteins. The agent is contacted under conditions which allow binding of the agent to the mismatch. Preferably, the MBP is E. coli MutS (AP Biotech) although other MBPs or mixtures of MBPs can be used. For instance, homologues of MutS such as MutS from Thermus aquaticus (Epicentre), Streptococcus pneumoniae HexA, hMSH2, genetically modified MutS or other mutation binding proteins such as RuvC protein from E. coli, human p53, or genetically modified (non-cleaving) forms of mutY or T4 endonuclease VII may be used. Preferably, the duplex is contacted with MutS at 0°C for between about 10 and 30 min., preferably about 30 min. MutS binding, yielding consistent patterns of protection with a large "footprint" range of bp protected, occurs near neutral pH, preferably between a pH of about 6.5 and 8.5, and more preferably, between a pH of about 7.0 and 8.0. A source of magnesium ions (Mg²⁺) can also be added to the reaction to enhance MutS binding.
Adapter oligonucleotides
To facilitate the formation of tandem arrays of protected fragments (i.e., concatemers) and their sequence analysis, oligonucleotide adapters are added to the protected fragments. The adapters facilitate the amplification of sequences corresponding to a protected fragment, by providing a pair of primer sites. In a preferred embodiment of the invention, an adapter oligonucleotide duplex lacks terminal phosphate residues. The lack of 5'-phosphate termini on the adapter oligonucleotides prevents self-ligation of the adapter oligonucleotides, which would lead to the production of spurious amplification products (i.e., "primer dimers") in later steps. Lack of 5'-phosphate termini also prevents ligation of multiple adapters to a protected fragment, ensuring that a single adapter is ligated to each end of a protected fragment.
For the purposes of the invention, it is convenient to distinguish between an inner end and an outer end of the adapter oligonucleotide. The inner end of the adapter oligonucleotide is that end which becomes joined to a protected fragment in the practice of the invention. Conversely, the outer end of the adapter oligonucleotide is that end which is not joined to the protected fragment. Hence, the outer end of the adapter oligonucleotide forms a terminus of the polynucleotide which results from ligation of an adapter oligonucleotide to a protected fragment. It is possible to specify the inner and outer ends of an adapter oligonucleotide in several ways. In one embodiment, the adapter oligonucleotide duplex comprises a capture moiety at its outer end, which serves to allow immobilization of the outer portions of the adapter-modified fragment after digestion with a restriction enzyme, and also serves to sterically block ligation of that end of the adapter oligonucleotide to a protected fragment. The adapter oligonucleotides can optionally comprise a capture moiety at the outer end of the adapter oligonucleotide duplex. The capture moiety can be attached either to the strand that forms the 5'-end of the outer end of the adapter oligonucleotide duplex, or to the strand that forms the 3'-end of the outer end of the adapter oligonucleotide duplex. The capture moiety is generally a molecule that is capable of interacting with a second molecule (a recognition moiety) to form a stable complex. Adapter oligonucleotides and any other nucleic acid that is directly or indirectly attached to the capture moiety will be present in a complex formed between a capture moiety and a recognition moiety. A recognition moiety will often be attached to a solid substrate, or otherwise immobilized such that a capture moiety:recognition moiety complex can be brought out of solution.
Exemplary capture moiety:recognition moiety pairs include biotin:avidin, biotin:streptavidin, biotin:anti-biotin, antigen:antibody, hapten:antibody, enzyme:substrate, sugar:lectin, protein:ligand and nucleic acid:complementary nucleic acid. Other interacting molecules that can serve as capture moiety:recognition moiety pairs will be known to those of skill in the art. It is also clear that the roles of capture moiety and recognition moiety can be reversed.
Adapter oligonucleotides will comprise, within their sequence, a primer binding site. A primer binding site refers to a region of an oligonucleotide or polynucleotide, such as an adapter or a sequence encoded by an adapter, that is capable of base-pairing with a primer, or that encodes a sequence that is able to base-pair with a primer. A primer is an oligonucleotide or polynucleotide capable of base-pairing with another oligonucleotide or polynucleotide and serving as a site from which polymerization can be initiated, normally from a 3'-hydroxyl end. Because of the ease with which oligonucleotides of defined sequence can be synthesized, virtually any sequence capable of base-pairing can function as a primer binding site.
An adapter oligonucleotide duplex can comprise one or more blocking linkages adjacent to its outer end. A blocking linkage is an internucleotide linkage which is less susceptible to nucleolytic degradation, compared to the phosphodiester linkage normally found in most naturally-occurring nucleic acids. Exemplary blocking linkages include phosphorothioate, methyl phosphonate, boronate and others. Other modifications such as nucleic acid analogs (PNAs, morpholidate DNAs, locked nucleoside analogs (LNA), and the like) may also be used. Further, addition of bulky haptens such as fluorescent tags, inverted nucleosides (in 5'-5' or 3'-3' linkage), or biotin, any of which inhibit the action of nucleases used in the procedure, may also be used. The presence of blocking linkages minimizes loss of protected fragments due to nucleolytic action, as described, infra.
End repair
In one embodiment of the invention, the protected fragments are rendered blunt-ended (if not already blunt-ended as a result of the action of a nucleolytic agent), prior to the addition of adapters, by one of a number of end repair processes. In one embodiment, this is accomplished by incubating the protected fragments with T4 DNA Polymerase, dATP, dCTP, dGTP and dTTP under conditions suitable for both the polymerization and exonucleolytic activities of T4 DNA polymerase. Fragments containing 3' overhanging ends will be rendered blunt-ended by the 3'-specific exonuclease activity of T4 DNA Polymerase, while 5' overhangs will be converted to blunt ends by polymerization of the recessed 3' termini. In an alternative embodiment, protected fragments are treated with a single strand-specific exonuclease, such as E. coli exonuclease VII or exonuclease I. In a preferred embodiment, 5'-phosphate-terminated blunt ends are generated, either through the end repair process or as a result of nucleolytic action. Other methods for generating blunt-ended fragments, such as, for example, physical methods, treatment with single-strand-specific exo- or endonucleases or the use of other types of nucleic acid polymerase, are also within the scope of this invention.
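The two routes by which T4 DNA polymerase produces a blunt end can be summarized in a small model. This is only a sketch: the function name, the string labels and the bookkeeping of duplex length are illustrative assumptions, and the model tracks a single fragment end.

```python
def polish_end(overhang_type, overhang_len, duplex_len):
    """Model end repair of one fragment end by T4 DNA polymerase.

    overhang_type: "3'" or "5'" for a protruding strand, or "blunt".
    overhang_len:  number of unpaired nucleotides at this end.
    duplex_len:    number of base pairs before polishing.
    Returns (activity used, base pairs after polishing).
    """
    if overhang_type == "blunt":
        return ("none", duplex_len)
    if overhang_type == "3'":
        # The protruding 3' strand is trimmed back; no base pairs are added.
        return ("3' exonuclease", duplex_len)
    if overhang_type == "5'":
        # The recessed 3' terminus is extended along the 5' overhang.
        return ("polymerization", duplex_len + overhang_len)
    raise ValueError(f"unknown overhang type: {overhang_type!r}")


print(polish_end("3'", 4, 20))
print(polish_end("5'", 4, 20))
```

In this model a filled-in 5' overhang keeps its sequence in the blunt-ended product, whereas a 3' overhang is lost.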
Joining of adapter oligonucleotides to protected fragments
Subsequent to end repair of the protected fragments, one or more double-stranded adapter oligonucleotides are attached to the protected fragments. In a preferred embodiment, initial attachment of one strand of the adapter oligonucleotide duplex to one strand of a protected fragment is achieved by enzymatic ligation using a ligase enzyme. Exemplary ligase enzymes include T4 DNA ligase, E. coli DNA ligase and T. aquaticus ligase.
As a result of either the action of the nucleolytic agent or of the end repair process, the protected fragments will have 5'-phosphate termini. The oligonucleotide adapters, on the other hand, lack terminal phosphate residues. The lack of 5'-phosphate termini on the adapter oligonucleotides prevents self-ligation of the adapter oligonucleotides, which would lead to the production of spurious amplification products (i.e., "primer dimers") in later steps. Lack of 5'-phosphate termini also prevents ligation of multiple adapters to a protected fragment, ensuring that a single adapter is ligated to each end of a protected fragment.
In order to catalyze ligation of two nucleic acid strands, most DNA ligases require the juxtaposition of a 5'-phosphate terminus to a 3'-OH terminus. As a result of the lack of 5'-phosphate termini on the adapter oligonucleotides, when a duplex adapter oligonucleotide is joined to one end of a duplex protected fragment, ligation (i.e., covalent joining by a ligase enzyme) will occur only between the strand comprising the 5'-phosphate terminated end of the protected fragment and the strand comprising the 3'-OH end of the adapter. However, the complementary strands, in which a 3'-OH end of the protected fragment is juxtaposed to a 5'-OH end of the adapter oligonucleotide, will not be ligated. Since this occurs at both ends of the protected fragment, the result of adapter ligation is a duplex, comprising a protected fragment flanked by adapters, with a nick near each 3'-end at the boundary between protected fragment sequences and adapter sequences.
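The ligation rule above (a 5'-phosphate must sit next to a 3'-OH) determines which strand at each fragment:adapter junction is sealed and which is left nicked. A minimal sketch, with hypothetical names and string labels:

```python
def can_ligate(five_prime_group, three_prime_group):
    """True if a ligase can seal the junction: 5'-phosphate next to 3'-OH."""
    return five_prime_group == "phosphate" and three_prime_group == "OH"


# One junction between a protected fragment (5'-phosphate ends) and a
# dephosphorylated adapter (5'-OH ends); one entry per strand.
junction = {
    "fragment 5' end / adapter 3' end": can_ligate("phosphate", "OH"),
    "adapter 5' end / fragment 3' end": can_ligate("OH", "OH"),
}

sealed = [name for name, ok in junction.items() if ok]
nicked = [name for name, ok in junction.items() if not ok]

# Two dephosphorylated adapters meeting end to end cannot be joined at all.
adapter_self_ligation = can_ligate("OH", "OH")

print("sealed:", sealed)
print("nicked:", nicked)
print("adapter self-ligation possible:", adapter_self_ligation)
```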
The 3'-end of the "outer" end of the adapter(s) may be modified with blocking moieties, such as a 3' deoxynucleotide, 3' dideoxynucleotide, inverted nucleotide or hapten, to prevent wrong orientation ligation. However, as will be understood, even without such steps, some fraction of the products will be properly ligated, which is all that is necessary to achieve the final product. Without such modification, the efficiency may be affected which may reduce the utility in some applications, but it is not essential. Covalent closure of the non-ligated adapter strand can be achieved by any method known in the art. In a preferred embodiment, the non-ligated adapter strand is degraded by a nuclease, and resynthesized by a DNA polymerase. In one aspect, the nuclease is the T7 gene 6 exonuclease, and the polymerase is T4 DNA polymerase. Degradation and resynthesis can be sequential or simultaneous. In certain embodiments of the invention, nucleolytic and polymerization activity are present in the same polypeptide. For example, the combined polymerase and 5'->3' exonuclease activities of a DNA polymerase (such as the Klenow fragment of E. coli DNA Polymerase I) are utilized to close the gap by "nick translation."
Amplification
Duplex polynucleotides comprising a protected fragment flanked by adapter oligonucleotide duplexes are subjected to amplification.
Formation of tandem arrays of protected fragments
Duplex polynucleotides comprising a protected fragment flanked by adapter oligonucleotide duplexes, or optionally their amplification products, are ligated into tandem arrays (concatemers) and the nucleotide sequence of the concatemer is determined, thereby providing the nucleotide sequences of the genetic alterations in the sample polynucleotides.
In order to efficiently generate tandem arrays, an internal subfragment containing a portion of the adapter ligated to the protected fragment, or its amplification product, and bearing a palindromic single-stranded overhang ("sticky end") may be generated by providing a restriction site within the adapter sequence and digesting the adapted products with a restriction enzyme prior to ligating the array. To prevent religation of the terminal ("outer") subfragments, a capture moiety such as biotin may be provided on the terminal fragment by inclusion in the adapter and/or primer oligonucleotides, and removing them by capture with a recognition moiety such as streptavidin prior to ligation.
As will be understood, such ligation will generate arrays of highly random size, many of which will circularize, and thus be prevented from ligating into a vector. This effect will reduce the frequency of clones with inserts of appropriate size for sequencing. This may be controlled by including in the adapter population a second set of adapters with a restriction site that creates the same sticky end as the bulk of the adapters, but which is cleaved exclusively by a second enzyme, by virtue of having different adjacent nucleotides. One such example is BamHI and BglII, both of which leave a 5'-GATC overhang, but recognize that sequence in the context of differing adjacent nucleotides.
After cleaving the adapted products with BamHI, capturing the terminal portions, and ligating the internal fragments, the product can then be cleaved with BglII to generate linear arrays of more defined size. The average size of the arrays, or number of inserts, may be adjusted by varying the ratio of BamHI-site-containing adapters or primers to BglII-site-containing adapters or primers.
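The reason the two enzymes can be used in this way is that their recognition sites share the central GATC but differ in the flanking bases. The sketch below uses the standard recognition sequences for BamHI (G^GATCC) and BglII (A^GATCT), which are not stated in the text above; the dictionary layout and function names are illustrative assumptions.

```python
# Enzyme name -> (recognition site 5'->3', cut offset from the 5' end of each strand).
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "BglII": ("AGATCT", 1),
}


def five_prime_overhang(site, cut_offset):
    """Single-stranded 5' overhang left by a symmetric staggered cut
    `cut_offset` bases in from the 5' end of each strand of a palindromic site."""
    return site[cut_offset:len(site) - cut_offset]


def recognizes(enzyme, sequence):
    """True if the enzyme's recognition site occurs in `sequence`."""
    return ENZYMES[enzyme][0] in sequence


for name, (site, offset) in ENZYMES.items():
    print(name, site, "overhang:", five_prime_overhang(site, offset))

# Each enzyme recognizes only its own site, despite the shared overhang.
print(recognizes("BamHI", "AGATCT"), recognizes("BglII", "AGATCT"))
```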
An alternative means to generate defined arrays may be provided by including a small amount of a "chain-terminating" adapter oligonucleotide in the ligation reaction. Such molecules comprise double-stranded oligonucleotides bearing at one end a sequence complementary to the sticky ends of the adapted protected fragments, and on the other end a single-stranded non-palindromic sequence that lacks the ability to form a hybrid with itself, either by Watson-Crick or other base-pairing that would otherwise lead to ligation of a dimer. This is most easily achieved by utilizing a non-palindromic sequence that cannot pair with itself. Inclusion of this chain terminator in the ligation reaction forces the formation of linear products ending with the chain terminator adapters. A vector to which adapters complementary to the chain terminator sticky ends have been previously ligated ("decorated vector") may then be added to the reaction to accept the arrays without allowing the closure of the vector or arrays into circular products.
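Whether an overhang is palindromic, and can therefore pair with a second copy of itself, is easy to test: the sequence must equal its own reverse complement. In this sketch the overhang "GACT" is a hypothetical chain-terminator end chosen for illustration, and the check covers full-length Watson-Crick self-pairing only, not partial or non-canonical pairing.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_self_complementary(overhang):
    """True if two copies of the overhang can anneal to each other."""
    return overhang.upper() == reverse_complement(overhang)


print(is_self_complementary("GATC"))  # palindromic sticky end
print(is_self_complementary("GACT"))  # hypothetical non-palindromic end
```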
Further, it will be recognized that the protected fragments or their amplified products will contain mismatches that, when introduced into host cells, will give rise to mismatch correction mechanisms, including the post DNA replication repair system. This may be avoided by methylation of the ligation product with a DNA methylase corresponding to the host cell. Alternatively, one strand of the vector containing the ligated array may be removed by treatment with an exonuclease prior to transformation. A suitable exonuclease is T7 gene 6 exonuclease. A single nick in the ligated product is required to prevent undesirable degradation of the vector. This may be conveniently achieved by placing a non-ligating adapter on one end of the vector or the insert array, or by omitting the 5' terminal phosphate on the adapter. Alternatively, a vector containing an M13 or f1 origin of replication may be used and cleaved in one strand of the origin with gpII protein prior to treatment with T7 gene 6 exonuclease.
Finally, certain sequences may be subject to host restriction-modification systems. This may be overcome by appropriate host selection or pretreatment with a modification methylase.
Analysis of sequence of tandem arrays
DNA from colonies containing tandem arrays is prepared and sequenced using methods well understood in the art.
In a preferred embodiment, all of the reactions following purification of the protected fragments are performed sequentially in the same reaction vessel, by sequential addition of enzymes (and buffers) required for the subsequent step(s).
The following examples are intended to illustrate, but not limit the invention.
EXAMPLES
Example 1. Detection of mutations in cloned mutS genes.
Five circular plasmid clones of E. coli mutS containing one mutation each in a 2 kb region of the mutS coding sequence (each approximately 6 kb total including vector sequence) were linearized with EcoRI. Then 0.1 pmol (0.42 μg) aliquots of each were combined in 25 μl of 2.5 M guanidine isothiocyanate (GTC) and heteroduplexed by heating to 95°C for 2 min. and incubating at 37°C for 1 h. The GTC was then removed by chromatography on a Sephadex G-25 spin column (AP Biotech).
Two μg of the purified heteroduplex was mixed with 10 pmol of E. coli mutS protein (AP Biotech) in 20 μl 50 mM Tris pH 7.5, 8 mM MgCl2, and 0.5 mM DTT and incubated for 1 h on ice. Then, 1 μl 2 mg/ml DNase I was added and the sample incubated for 10 min. at 37°C. The digestion was terminated by addition of 23 μl 50 mM Tris pH 7.5, 10 mM EDTA, and the mutS-protected complexes purified by chromatography on a Sephacryl S-200 spin column (AP Biotech). Twenty μl of the eluate was adjusted to 0.1 M Tris pH 7.5, 5 mM MgCl2, 7.5 mM DTT, 100 μM ATP and 1 mM each dATP, dGTP, dCTP, and TTP. The termini of the protected fragments were then polished by the addition of 10 U T4 DNA polymerase and incubation for 10 min. at 37°C. The products were then adapted by addition of 35 pmol each forward and reverse double-stranded adapters (annealed sequences 1 and 2 mixed with annealed sequences 3 and 4, each separately pretreated with calf intestinal phosphatase (CIP)) with 2 units T4 DNA ligase (Stratagene) and incubation for 1 h at 16°C. The unligated strands of the adapters (corresponding to sequences 2 and 4) in the ligation products (10 μl) were then removed and the 3' ends of the protected fragments simultaneously extended by addition of 50 U T7 gene 6 exonuclease (AP Biotech) and incubation for 30 min. at room temperature. To reduce background arising from residual adapter ligation, the final products were digested with 10 units Dra I in the same buffer for 1 h at 37°C.
The final products were amplified by denaturing for 1 min. at 95°C, followed by 29 cycles of PCR, with denaturation at 95°C for 10 seconds, primer annealing at 62°C for 20s, and primer extension at 72°C for 10s, followed by a final extension for 5 min. at 72°C using 1μM oligonucleotides 1 and 3 in a 50μl reaction containing 2.5 U Taq DNA polymerase. The products were digested with 80 U Hind III (New England Biolabs) for 3h. at 37°C. The released biotinylated adapter termini were removed after adding NaCl to 0.7 M by addition of 40μl prewashed streptavidin agarose (Gibco-BRL) and incubation for 30 min. at room temperature. The mixture was extracted with phenol/chloroform and desalted by chromatography on a G-50 spin column. A portion (7 μl) of the eluate was then ligated with 100 ng Hind III-digested and phosphatase-treated pUC19 with 2 U T4 DNA ligase overnight at 14°C in a final volume of 10μl. The products were used to transform competent cells which were plated on LB/ampicillin agar. Analysis indicated 50% of the clones contained inserts. All of the insert-containing clones contained single protected fragments. Among 64 clones, 3 separate isolates were obtained for each of three of the mutation sites. The sequence changes corresponding to the mutations were properly identified in each by sequence variation between the isolates. Nine of the clones contained inserts corresponding to the fourth site, and five corresponded to the fifth site. Seven other clones had inserts corresponding to a site (and indicated a mutation by sequence variation) which was not contained in the sequenced region containing the known mutations. Four clones contained sequences within a 40 nucleotide region in mutS but which did not identify any nucleotide changes by sequence variation. Nine of the clones corresponded to vector sequence also without changes from the known sequence.
Seventeen clones either contained very small (<10 bp) inserts or inserts which matched neither mutS nor vector sequence; several of these clones had homology to prokaryotic sequences suggesting contaminating E. coli genomic DNA in the plasmid preparations.
Example 2. Detection of single-nucleotide polymorphisms in a 4kb PCR product from APC using a solid-phase mutS protection reaction.
Genomic DNA samples from four normal individuals were combined in equal mass ratio. A portion of exon 15 of the adenomatous polyposis coli (APC) gene was amplified from 400ng of the pooled genomic template DNA using 1μM oligonucleotide sequences 5 and 6 as primers and Pfu Turbo polymerase (Stratagene) in the supplied buffer supplemented with 5% glycerol. The reaction was heated to 95°C for one minute, followed by 29 cycles of 95°C denaturation for 1 min., 62°C primer annealing for 1 min. and extension at 72°C for 6 min., followed by a final extension at 72°C for 10 minutes, generating a product ~4kb in length.
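To gauge the length of the amplification program described above, the programmed hold times can be summed. This minimal sketch counts hold times only and ignores ramping between temperatures, which varies by instrument; the data layout and names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Step:
    temp_c: int
    seconds: int

def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum programmed hold times: initial step + n_cycles of the cycle + final step."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds

# Example 2 program: 95 C 1 min; 29 x (95 C 1 min, 62 C 1 min, 72 C 6 min); 72 C 10 min.
example2 = total_hold_seconds(
    Step(95, 60),
    [Step(95, 60), Step(62, 60), Step(72, 360)],
    29,
    Step(72, 600),
)
print(example2 / 3600)  # hours of hold time, excluding ramps
```

The long 6 min. extension dominates the run, giving a little over four hours of hold time for the ~4kb product.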
A reference DNA product was prepared in parallel except using 15 ng of a cloned PCR product from one of the four individuals as template and biotinylated primers corresponding to oligonucleotides 5 and 6.
Biotinylated reference DNA (0.5μg) and patient DNA (0.5μg) were heteroduplexed in 150μl 50% formamide, 2.78X SSPE and heated to 95°C for 5 minutes followed by incubation at 37°C for 1 h. The buffer was exchanged and the sample concentrated by diafiltration on a Centricon 100 followed by a single wash with 2ml 10mM Tris, 1mM EDTA pH7.5 (TE). The product was diluted to 200μl with 1X B&W (1M NaCl, 10mM Tris, 1mM EDTA, 0.1% Tween 20), and 100μg Dynal M-280 Streptavidin (Dynal Corp., prewashed with 1X B&W) was added and the hybrids captured by end-over-end mixing for 30 min. at room temperature. The particles were collected on a magnet, washed once with 300μl mutS binding buffer (50mM Tris pH7.5, 8 mM MgCl2, and 0.5mM DTT), and resuspended in 20μl of the same buffer. MutS (10 pmol) was added and the suspension incubated on ice without mixing for 1 hr. DNase I (2μg in 1 μl) was then added and the suspension incubated for 10 minutes at 37°C. The digest was terminated by addition of 23 μl 50mM Tris pH7.5, 10mM EDTA. The particles were removed by applying a magnetic field, and the supernatant was chromatographed over a Sephacryl S-200 MicroSpin column (AP Biotech).
A portion of the S-200 eluate (20μl) was processed as described above (example 1), and the Hind III-digested inserts ligated into restricted and phosphatased pUC19. The cloned products were sequenced. Thirty-one of the 45 inserts sequenced corresponded to a single (A/G) polymorphism at position 5034 of the gene (codon 1678). One insert corresponded to the (G/A) polymorphism at position 4479 (codon 1493) and two inserts corresponded to the (A/G) polymorphism at position 5880 (codon 1960). Eleven inserts contained APC sequence but did not show any changes from the expected sequence.
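The relationship between the nucleotide positions and codon numbers reported above, and the tally of sequenced inserts, can be checked with a short sketch. It assumes that positions are numbered from the first nucleotide of the coding sequence, which the text implies but does not state; the names are illustrative.

```python
def codon_number(nt_position: int) -> int:
    """1-based codon index containing a 1-based coding-sequence nucleotide position."""
    return (nt_position + 2) // 3

# position -> number of inserts recovered at that polymorphism
polymorphisms = {5034: 31, 4479: 1, 5880: 2}
unchanged = 11  # inserts with APC sequence but no change

for pos, n in polymorphisms.items():
    print(pos, codon_number(pos), n)

total = sum(polymorphisms.values()) + unchanged
print(total)
```

Under that numbering assumption the three positions fall in codons 1678, 1493 and 1960, and the four categories of inserts add up to the 45 sequenced.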
Example 3. Generation of cloned tandem protected-fragment insert arrays.
A pUC vector decorated with nonpalindromic adapter ends is generated. One hundred micrograms of pUC19 is digested with BamHI and EcoRI, and the large fragment separated on a 1% agarose gel and collected by electroelution. The vector is ligated with two oligonucleotide adapters (Q and R) created by mixing and annealing sequence 7 with sequence 8 (Q) and sequence 9 with sequence 10 (R).
After removal of the adapter termini from the amplified mutS-protected fragments by Hind III digestion and capture on streptavidin agarose as described in example 1, the products are quantitated either by absorbance or fluorescence using ethidium bromide or Hoechst 33258. The mixture is supplemented with 5% (mol/mol, assuming an average insert + adapter remnant size of 33bp) of adapter "S" created by annealing oligonucleotide sequences 11 and 12. This adapter provides a Hind III-compatible single-stranded terminus and a sequence complementary to the adapter-modified ends of the pUC19 vector above. Alternatively, an adapter "S2" is created by annealing oligonucleotide sequences 11 and 17, and may be used to limit the formation of S2 dimers during the subsequent ligation reaction. The mixture is ligated overnight at 16°C with 2 U T4 DNA ligase (Stratagene) in 50mM Tris pH7.5, 5mM MgCl2, 1mM DTT and 15% polyethylene glycol (Sigma) to create a tandem array terminated in non-self-ligatable 5'-GCTA-3' single-stranded overhangs. The arrays are then ligated into the decorated vector by addition of 1μg of the pUC19 prepared as described above and incubation for 4h. at 16°C. The ligase is then inactivated by heating the mixture to 65°C for 15 min. To minimize the possibility of intracellular mismatch repair on cross-hybridized, mismatched PCR products in the final ligation product, one strand of the vector-ligated product may be removed by addition of 100 units T7 gene 6 exonuclease (AP Biotech) and incubation at 30°C for 20 min. A portion (1 μl) of the final reaction product diluted 1:10 in 10mM Tris, 1 mM EDTA is used to transform 25 μl of MAX efficiency DH5α E. coli (Gibco-BRL). Colonies are screened for a preferred insert size of 300-800 bp by minipreparation or colony PCR. Background arising from occasional ligation of the "S" adapters may be eliminated by digestion of the final (double-stranded) ligation product with Sac I prior to transformation (or T7 gene 6 exonuclease treatment, if used).
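The overhang logic described above can be checked directly from the listed oligonucleotides. The sketch below anneals each adapter pair in silico and reports the single-stranded 5' ends; 5' phosphates are omitted, the Sac I recognition sequence (GAGCTC) is supplied as an assumption since it is not given in the text, and the way the S-adapter dimer is modelled (two adapters joined through their Hind III-compatible ends) is one interpretation of "ligation of the S adapters". Function and variable names are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))

def five_prime_overhangs(top: str, bottom: str):
    """Return the 5' overhangs (top, bottom) for the longest duplex in which
    the 3' portion of `top` pairs with the 3' portion of `bottom`."""
    best = None
    for i in range(len(top)):
        for j in range(len(bottom)):
            if top[i:] == revcomp(bottom[j:]) and (best is None or i + j < sum(best)):
                best = (i, j)
    if best is None:
        raise ValueError("oligonucleotides do not anneal in this geometry")
    return top[:best[0]], bottom[:best[1]]

ADAPTERS = {
    "Q": ("TAGCCGAGGGC", "GATCGCCCTCG"),  # sequences 7 and 8
    "R": ("AATTCCGCCTG", "TAGCCAGGCGG"),  # sequences 9 and 10
    "S": ("GCTAGGGCCG", "AGCTCGGCCC"),    # sequences 11 and 12
}
overhangs = {name: five_prime_overhangs(*pair) for name, pair in ADAPTERS.items()}
for name, ends in overhangs.items():
    print(name, ends)

s_end = overhangs["S"][0]
# Non-palindromic end, so two such ends cannot ligate to each other
print(revcomp(s_end) != s_end)
# The S end pairs with the free end of both vector adapters
print(revcomp(s_end) == overhangs["Q"][0] == overhangs["R"][1])

# Two S adapters joined through their AGCT ends (assumed dimer structure)
s_dimer = ADAPTERS["S"][0] + ADAPTERS["S"][1]
print("GAGCTC" in s_dimer)  # Sac I site present at the junction
```

In this model adapter S exposes a 5'-GCTA end and a Hind III-compatible 5'-AGCT end, the vector adapters Q and R each expose a complementary 5'-TAGC end alongside BamHI- and EcoRI-compatible ends, and an S-S junction contains a Sac I site.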
Example 4. Removal of inserts arising from polymorphisms in APC.
A set of amplified, protected fragments from sites of single-nucleotide polymorphisms (SNP amplimers) in a gene of interest is generated as in example 2 from a pool of normal DNA, except using different adapters (and corresponding primers; adapter X = oligonucleotide sequences 13 and 14, sequence 13 as primer; adapter Y = oligonucleotide sequences 15 and 16, sequence 15 as primer). Any such sequences may be used as long as they function as adapter/primers in the method presented in examples 1 and 2 and will not anneal to the adapter sequences used for processing patient DNA. The primers for generating SNP amplimers have biotinylated termini allowing their removal by capture on a streptavidin solid support. A set of amplimer products generated from patient DNA using the adapters and primers as in example 2, except lacking biotin modifications, is heated in the presence of a 10-fold molar excess of biotinylated SNP amplimers. Two μl of the PCR reaction from patient DNA is combined with 20μl of a PCR reaction containing the SNP amplimers in addition to 1.5M sodium thiocyanate, 120mM disodium phosphate, 10mM EDTA in a final volume of 100μl. To minimize cross-hybridization of the amplimers due to the adapter sequences used for the SNP and patient DNAs, the annealing is supplemented with 50 pmol each oligonucleotide sequences 2, 4, 14 and 16. To this mixture, 8μl redistilled phenol is added and the mixture heated to 100°C for 10 min., and then chilled on ice. The reactions are then placed in a thermocycler and cycled for 2 min. at 65°C and 15 min. at 25°C for a total of 10 cycles (~2.5h) (Miller, R, and Riblet, R, Nucl. Acids Res. 23:2339-2340 (1995)). The mixture is extracted once with chloroform and 50μl is desalted over a G50 spin column (AP Biotech). The eluate is then mixed with 50μl 2M NaCl, and incubated with 50μl streptavidin agarose (Gibco-BRL) equilibrated with 1M NaCl, 10mM Tris pH7.5, 1mM EDTA with mixing for 30 min.
The final product is then desalted by chromatography on a G-50 spin column. Two μl of the G-50 eluate is reamplified with Taq DNA polymerase using biotinylated primers (sequences 1, 4) by heating to 95°C for 1 min., and then cycling at 95°C for 10 sec, annealing at 62°C for 20 sec. and extending at 72°C for 10 sec. for a total of 14 cycles. The product(s) are digested with Hind III and processed as described above (examples 2 and 3) to generate tandem insert array clones.
It is to be understood that, while the invention has been described in conjunction with the above embodiments, the foregoing description and examples are intended to illustrate and not limit the scope of the invention. Other aspects, advantages and modifications within the scope of the invention will be apparent to those skilled in the art to which the invention pertains.
Sequences (Referenced above)
Oligonucleotide #
1. 5'-BIO-TsGsCs-TsAC-CAG-TGC-CAG-CCA-AGC-TTT-T-3' where s = phosphorothioate linkage
2. 5'-AAA-AGC-TTG-GCT-GGC-ACT-GGT-AGC-AZ-3' where Z = 3' inverted A (3'-3' linkage)
3. 5'-BIO-CsCsT-sCsAA-GGA-TGG-CTC-CGA-AGC-TTT-T-3' where s = phosphorothioate linkage
4. 5'-AAA-AGC-TTC-GGA-GCC-ATC-CTT-GAG-GZ-3' where Z = 3' inverted A (3'-3' linkage)
5. 5'-GTT-GAA-CTC-TGG-AAG-GCA-AAG-TCC-T-3'
6. 5'-TTT-CTA-CCA-GGG-GAA-ATT-GAG-TTT-3'
7. 5'-pTAG-CCG-AGG-GC-3' where p = 5' phosphate
8. 5'-pGAT-CGC-CCT-CG-3'
9. 5'-pAAT-TCC-GCC-TG-3'
10. 5'-TAG-CCA-GGC-GG-3' (5'-OH)
11. 5'-pGC-TAG-GGC-CG-3'
12. 5'-pAG-CTC-GGC-CC-3'
13. 5'-BIO-TsGsA-sCsGT-GCA-CTC-GGG-CGG-GAT-CCT-T-3' where s = phosphorothioate linkage
14. 5'-AAG-GAT-CCC-GCC-CGA-GTG-CAC-GTC-AZ-3' where Z = 3' inverted A (3'-3' linkage)
15. 5'-BIO-TsGsA-sCsGC-GCT-GCC-ATG-CCG-GAT-CCT-T-3' where s = phosphorothioate linkage
16. 5'-AAG-GAT-CCG-GCA-TGG-CAG-CGC-GTC-AZ-3' where Z = 3' inverted A (3'-3' linkage)
17. 5'-pAG-TTC-GGC-CC-3'
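As a consistency check on the listing, the paired strands of the blunt-ended adapters (1/2, 3/4, 13/14 and 15/16) can be tested for exact complementarity. The sketch below parses the notation used above by discarding everything except the bases, so the BIO, s, p and Z annotations are ignored; the parser and its names are illustrative assumptions, and the Hind III site (AAGCTT) is checked only for sequences 1 and 3, which the examples digest with Hind III.

```python
def bases(notation: str) -> str:
    """Keep only A/C/G/T, dropping BIO, s, p, Z, hyphens and end labels."""
    return "".join(ch for ch in notation if ch in "ACGT")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))

OLIGOS = {
    1: "5'-BIO-TsGsCs-TsAC-CAG-TGC-CAG-CCA-AGC-TTT-T-3'",
    2: "5'-AAA-AGC-TTG-GCT-GGC-ACT-GGT-AGC-AZ-3'",
    3: "5'-BIO-CsCsT-sCsAA-GGA-TGG-CTC-CGA-AGC-TTT-T-3'",
    4: "5'-AAA-AGC-TTC-GGA-GCC-ATC-CTT-GAG-GZ-3'",
    13: "5'-BIO-TsGsA-sCsGT-GCA-CTC-GGG-CGG-GAT-CCT-T-3'",
    14: "5'-AAG-GAT-CCC-GCC-CGA-GTG-CAC-GTC-AZ-3'",
    15: "5'-BIO-TsGsA-sCsGC-GCT-GCC-ATG-CCG-GAT-CCT-T-3'",
    16: "5'-AAG-GAT-CCG-GCA-TGG-CAG-CGC-GTC-AZ-3'",
}

def is_blunt_pair(a: int, b: int) -> bool:
    """True if oligo b is the exact reverse complement of oligo a."""
    return revcomp(bases(OLIGOS[a])) == bases(OLIGOS[b])

for a, b in [(1, 2), (3, 4), (13, 14), (15, 16)]:
    print(a, b, is_blunt_pair(a, b))

print(all("AAGCTT" in bases(OLIGOS[n]) for n in (1, 3)))  # Hind III site present
```

All four pairs are exact reverse complements over 25 bases, consistent with blunt-ended double-stranded adapters, and sequences 1 and 3 both carry a Hind III site near the inner end.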

Claims

What is claimed is:
1. A method for determining the sequence of a sample polynucleotide containing one or more genetic alterations, wherein the method comprises:
(a) contacting a heteroduplex containing a sample polynucleotide and a reference polynucleotide with a reagent which recognizes a non-duplex polynucleotide structure, to form a reagent-heteroduplex complex, wherein the duplex contains at least one base-pair mismatch;
(b) contacting the reagent-heteroduplex complex with a nucleolytic agent to produce a protected fragment;
(c) joining a double-stranded adapter oligonucleotide to the protected fragment;
(d) amplifying the products of step (c) using primers complementary to a portion of the adapter oligonucleotide, to generate amplification products;
(e) joining the amplification products to one another to form a concatemer, wherein each monomeric unit of the concatemer comprises a region of the sample polynucleotide containing the genetic alteration, and the monomeric units are separated by regions of sequence corresponding to a portion of the adapter oligonucleotide sequence; and
(f) determining the nucleotide sequence of the concatemer.
2. The method according to claim 1, wherein a plurality of reference polynucleotides are used.
3. The method according to claim 1, further comprising step (b)(i), wherein the ends of the protected fragments are repaired.
4. The method according to claim 1, wherein the adapter oligonucleotide comprises one or more restriction enzyme recognition sites.
5. The method according to claim 4, further comprising step (d)(i), wherein the amplification products are digested with a restriction enzyme which cleaves at the recognition site.
6. The method according to claim 1, wherein the adapter oligonucleotide:
(a) comprises an inner end and an outer end,
(b) is non-phosphorylated at the 5' terminus of the inner end,
(c) comprises a capture moiety at the outer end, and
(d) comprises one or more blocking linkages adjacent to the capture moiety.
7. The method according to claim 1 wherein, in step (c), two adapter oligonucleotides, having different sequences, are used.
8. The method according to claim 1, wherein, in step (c), joining the adapter comprises the following steps:
(c)(i) joining the first strand of the double-stranded adapter oligonucleotide to the products of step (b), and
(c)(ii) joining the second strand of the double-stranded adapter oligonucleotide to the products of step (c)(i).
9. The method according to claim 8, wherein
(a) the first strand is covalently joined to the protected fragment by ligation, and
(b) the second strand is covalently joined to the protected fragment by nick-translation.
10. A method for determining the sequences of one or more genetic alterations in a plurality of sample polynucleotides, wherein the method comprises:
(a) contacting a plurality of duplexes with a reagent which recognizes a non-duplex polynucleotide structure, wherein each duplex comprises a sample polynucleotide strand and a reference polynucleotide strand and at least one of the duplexes contains at least one base-pair mismatch, to form at least one reagent-heteroduplex complex;
(b) contacting the reagent-heteroduplex complexes with a nucleolytic agent to produce a plurality of protected fragments;
(c) joining a double-stranded adapter oligonucleotide to the protected fragments;
(d) amplifying the products of step (c) using primers complementary to a portion of the adapter oligonucleotide, to generate a plurality of amplification products;
(e) joining the amplification products to one another to form a concatemer, wherein each monomeric unit of the concatemer comprises a region of a sample polynucleotide containing a genetic alteration, and the monomeric units are separated by regions of sequence corresponding to a portion of the adapter oligonucleotide sequence; and
(f) determining the nucleotide sequence of the concatemer.
11. The method according to claim 10, wherein a plurality of reference polynucleotides are used.
12. The method according to claim 10, further comprising step (b)(i), wherein the ends of the protected fragments are repaired.
13. The method according to claim 10, wherein the adapter oligonucleotide comprises one or more restriction enzyme recognition sites.
14. The method according to claim 13 further comprising step (d)(i), wherein the amplification products are digested with a restriction enzyme which cleaves at the recognition site.
15. The method according to claim 10, wherein the adapter oligonucleotide:
(a) comprises an inner end and an outer end,
(b) is non-phosphorylated at the 5' terminus of the inner end,
(c) comprises a capture moiety at the outer end, and
(d) comprises one or more blocking linkages adjacent to the capture moiety.
16. The method according to claim 10 wherein, in step (c), two adapter oligonucleotides, having different sequences, are used.
17. The method according to claim 10, wherein, in step (c), joining the adapter comprises the following steps:
(c)(i) joining the first strand of the double-stranded adapter oligonucleotide to the products of step (b), and
(c)(ii) joining the second strand of the double-stranded adapter oligonucleotide to the products of step (c)(i).
18. The method of claims 1 or 10, wherein the joining of steps (c) and (e) comprises the formation of a covalent bond.
19. The method of claim 8 or 17, wherein the joining comprises the formation of a covalent bond.
PCT/US2000/020557 1999-07-29 2000-07-28 Serial analysis of genetic alterations WO2001009384A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU63870/00A AU6387000A (en) 1999-07-29 2000-07-28 Serial analysis of genetic alterations

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14643799P 1999-07-29 1999-07-29
US60/146,437 1999-07-29

Publications (2)

Publication Number Publication Date
WO2001009384A2 true WO2001009384A2 (en) 2001-02-08
WO2001009384A3 WO2001009384A3 (en) 2001-06-07

Family

ID=22517359

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/020557 WO2001009384A2 (en) 1999-07-29 2000-07-28 Serial analysis of genetic alterations

Country Status (2)

Country Link
AU (1) AU6387000A (en)
WO (1) WO2001009384A2 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995029258A1 (en) * 1994-04-27 1995-11-02 St. James' And Seacroft University Hospitals Nhs Trust Nucleic acid mutation assays
WO1996041002A2 (en) * 1995-06-07 1996-12-19 Genzyme Corporation Methods for the identification of genetic modification of dna involving dna sequencing and positional cloning
EP0761822A2 (en) * 1995-09-12 1997-03-12 The Johns Hopkins University School Of Medicine Method for serial analysis of gene expression
WO1997029211A1 (en) * 1996-02-09 1997-08-14 The Government Of The United States Of America, Represented By The Secretary, Department Of Health And Human Services RESTRICTION DISPLAY (RD-PCR) OF DIFFERENTIALLY EXPRESSED mRNAs
US5750335A (en) * 1992-04-24 1998-05-12 Massachusetts Institute Of Technology Screening for genetic variation
WO1999023256A1 (en) * 1997-10-30 1999-05-14 Cold Spring Harbor Laboratory Probe arrays and methods of using probe arrays for distinguishing dna
WO1999039003A1 (en) * 1998-01-30 1999-08-05 Genzyme Corporation Method for detecting and identifying mutations

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
COTTIN R: "Current methods of mutation research" MUTATION RESEARCH, vol. 285, 1993, pages 125-144, XP000443992 *
ELLIS L A ET AL: "MUTS BINDING PROTECTS HETERODUPLEX DNA FROM EXONUCLEASE DIGESTION IN VITRO: A SIMPLE METHOD FOR DETECTING MUTATIONS" NUCLEIC ACIDS RESEARCH,GB,OXFORD UNIVERSITY PRESS, SURREY, vol. 22, no. 13, 11 July 1994 (1994-07-11), pages 2710-2711, XP000606262 ISSN: 0305-1048 *
PARSONS B L ET AL: "EVALUATION OF MUTS AS A TOOL FOR DIRECT MEASUREMENT OF POINT MUTATIONS IN GENOMIC DNA" MUTATION RESEARCH,NL,AMSTERDAM, vol. 374, no. 2, 21 March 1997 (1997-03-21), pages 277-285, XP002914507 ISSN: 0027-5107 *
VELCULESCU V E ET AL: "SERIAL ANALYSIS OF GENE EXPRESSION" SCIENCE,US,AMERICAN ASSOCIATION FOR THE ADVANCEMENT OF SCIENCE,, vol. 270, 20 October 1995 (1995-10-20), pages 484-487, XP002053721 ISSN: 0036-8075 *
WHITE M J ET AL: "CONCATEMER CHAIN REACTION: A TAQ DNA POLYMERASE-MEDIATED MECHANISM FOR GENERATING LONG TANDEMLY REPETITIVE DNA SEQUENCES" ANALYTICAL BIOCHEMISTRY,US,ACADEMIC PRESS, SAN DIEGO, CA, vol. 199, no. 2, 1 December 1991 (1991-12-01), pages 184-190, XP000236491 ISSN: 0003-2697 *

Also Published As

Publication number Publication date
AU6387000A (en) 2001-02-19
WO2001009384A3 (en) 2001-06-07

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP