WO1999042621A2 - Methods for identifying or characterising a site based on the thermodynamic properties of nucleic acids - Google Patents

Methods for identifying or characterising a site based on the thermodynamic properties of nucleic acids

Info

Publication number
WO1999042621A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
nucleotide
protein
determining
nucleotide sequence
Application number
PCT/US1999/003754
Other languages
French (fr)
Other versions
WO1999042621A3 (en)
Inventor
Michael J. Lane
Albert S. Benight
Brian D. Faldasz
Original Assignee
Tm Technologies, Inc.
Application filed by Tm Technologies, Inc.
Priority to AU33049/99A
Publication of WO1999042621A2
Publication of WO1999042621A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Definitions

  • the invention relates to nucleotide and polypeptide sequences having selected functional characteristics or having selected values for a free energy parameter.
  • the methods of the invention allow selection of anti-sense oligonucleotides (e.g., for use as pharmacological agents), determination of protein or polypeptide function from a nucleotide sequence which codes for the polypeptide, prediction of antigenic regions of a protein, determination of a region of a polynucleotide or protein which is susceptible to mutation, and the like.
  • the primary structure of the protein or polypeptide which is encoded by a nucleic acid sequence, such as a gene can be determined through the use of the well-known genetic code.
  • the function of a protein which is encoded by a particular gene is often unknown at the time the gene is sequenced.
  • the function of such proteins can often be determined, but the cost can be considerable.
  • Antisense therapy involves the administration of exogenous oligonucleotides that bind to a target nucleic acid, typically an RNA molecule, located within cells.
  • the name "antisense" is given because the oligonucleotides are typically complementary to mRNA molecules ("sense strands") which encode a cellular product. Anti-sense oligonucleotides can be used to inhibit expression of mRNAs.
  • Anti-sense agents typically need to continuously bind all target RNA molecules so as to inactivate them or alternatively provide a substrate for endogenous ribonuclease H (RNase H) activity.
  • Sensitivity of RNA-oligonucleotide complexes, generated by the methods of the present invention, to RNase H digestion can be evaluated by standard methods (see, e.g., Donia, B. P., et al., J. Biol. Chem. 268 (19):14514-14522 (1993); Kawasaki, A. M., et al., J. Med. Chem. 6(7):831-841 (1993), incorporated herein by reference).
  • the invention relates to nucleic acid sequences, and, in particular, to methods for identifying or designing structural and/or energetic characteristics of nucleic acids, and proteins which are encoded by the nucleic acids.
  • the invention relates to methods for selecting antisense oligonucleotides.
  • Prior art methods do not provide efficient means of determining which oligonucleotides complementary to a given mRNA will be useful in an application.
  • Shorter (15-200 base) anti-sense molecules are preferred in clinical applications. In fact, anti-sense oligonucleotides of at least 15 bases are preferred.
  • the invention includes methods for selecting desired anti-sense oligonucleotides from the set of candidates provided by any given nucleic acid, e.g., an mRNA.
  • the invention provides a means of determining desired sequence positions on the mRNA, e.g., those which present a desired level of free energy variation, against which to design anti-sense oligonucleotides, thus reducing the empiricism currently employed.
  • the invention features a method of identifying a site on a nucleic acid sequence having high free energy variability. This allows determination of sites which are preferred for oligonucleotide, e.g., antisense, binding.
  • the method includes some or all of the following steps: providing a nucleotide sequence, e.g., sequence from a target gene; casting the nucleotide sequence as the free energy as a function of base pair position; calculating the free energy of X windows centered on a base pair for a plurality of base pairs from the nucleotide sequence for every, or at least a plurality of, window sizes between 2 and Y.
  • Y is an integer between 3 and 1,000, more preferably between 2 and 100
  • for each window size constructing a free energy distribution along the sequence, preferably normalizing the distribution to a standard scale (to account for the fact that the free energy is proportional to window size) (this calculation gives the results which can be plotted as shown in Figure 1);
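The windowing, normalization, and variability steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the per-base free energies are taken as a given input list, min-max scaling stands in for the unspecified "standard scale", and the population standard deviation across window sizes stands in for "variability". The function names are hypothetical.

```python
from statistics import pstdev

def window_energies(per_base, size):
    """Sum per-base free energies over every full window of `size` bases."""
    return [sum(per_base[i:i + size]) for i in range(len(per_base) - size + 1)]

def normalize(values):
    """Min-max scale a contour to [0, 1] (assumed scale); flat contours map to zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def variability(per_base, max_window):
    """Spread of normalized window energy at each base across window sizes 2..max_window.

    Each window is attributed to its centre base (index start + size // 2).
    Bases covered by fewer than two window sizes get None.
    """
    samples = [[] for _ in per_base]
    for size in range(2, max_window + 1):
        contour = normalize(window_energies(per_base, size))
        for start, value in enumerate(contour):
            samples[start + size // 2].append(value)
    return [pstdev(s) if len(s) >= 2 else None for s in samples]
```

Positions with the largest returned values would be the "high variability" candidates in this sketch; whether a threshold is relative to the rest of the sequence or to a predefined value is left to the caller.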
  • the invention provides for a method for determining preferred anti-sense sequence complements within a predefined RNA sequence; these are generally high variability sequences.
  • high variability can be a relative parameter, e.g., relative to other variability in the sequence. Alternatively, it can be relative to a predefined value.
  • the invention provides for sets (e.g., sets of 2, 3, 4 or more) of sequences, e.g., anti-sense oligonucleotides, of an optimal duplex free energy or variability but variable length at the sites of anti-sense candidates within candidate regions.
  • the invention provides sets of isoenergetic, or isovariable, oligonucleotides, e.g., anti-sense candidates of a set length within a candidate region.
  • the invention provides for establishing oligonucleotides (e.g., sets of 2, 3, 4 or more oligonucleotides), e.g., anti-sense oligonucleotides, of a preselected melting temperature, Tm, within candidate regions.
  • the method allows for identification, choosing, and matching of sequences with desired free energy variability characteristics.
  • Methods of the invention can be used for any of the following: Determining the best anti-sense candidate regions, or sub-sequences, within any given anti-sense target. Such sequences exhibit wide variation in average energy as a function of increasing length.
  • compositions of sequence which display the identified variation in sequence composition with increasing window size.
  • the invention provides methods for determining functionally important regions of a protein. In another aspect, the invention provides methods for predicting mutation-prone regions of a protein. In still another aspect, the invention provides methods for determining flanking sequences which provide a pre-selected value for a thermodynamic parameter, such as free energy of ligand binding, to a ligand binding site of a polynucleotide.
  • the invention provides methods for selecting flanking sequences to a ligand binding site of a nucleic acid which affect ligand binding to a nucleic acid.
  • the invention provides methods for determining the ability of a pre-selected nucleic acid sequence to transmit binding energy to a portion of nucleic acid which is remote from the pre-selected sequence.
  • the invention provides methods for increasing or decreasing the mutation rate of a nucleic acid sequence, e.g., by stabilizing or destabilizing the nucleotide sequence.
  • Any method of the invention can include providing a sequence, e.g., by synthesizing, or by placing in a reaction mixture which includes a carrier, e.g., a liquid, e.g., water.
  • Figure 1 is a plot of normalized energy as a function of window size and position along a representative DNA sequence.
  • Figure 2 is an overlaid plot of the data shown in Figure 1.
  • Figure 3 is a plot of the variability of energy distributions along the representative DNA sequence.
  • Figure 4 is a schematic illustration of a ligand binding experiment.
  • Figure 5 is an autoradiogram of a gel shift assay.
  • Figure 6 shows the sequences of several DNA constructs identified by ligand binding experiments.
  • Figure 7 is a graph of the distributions of free energies from 15-base window free energy calculations along the mouse betaglobin major gene, plotted as a function of amino acid position.
  • the distributions were determined by calculating the free energy for each of the possible codon sequences dictated by the amino acid sequences and categorized into discrete energy ranges. Since all permutations of codons representing the amino acids are present, the distribution describes a probability of the energies available to the five amino acids in each window.
  • the two introns of the betaglobin gene were deleted for this calculation.
  • the axes are: x, free energy in calories; y, normalized probability; z, base position in the gene sequence.
  • Figure 8 is an alternative representation of the energetic analysis shown in Figure 7.
  • the distributions are normalized to their respective means, and each distribution is reduced by 2.5% from each end.
  • the energy profile of the actual mouse betaglobin sequence is plotted after normalizing it to the same mean value.
  • These confidence intervals define the range over which 95% of the energies of the DNA sequences that can be coded from the amino acid sequence of the gene can fall.
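The codon-permutation distribution and its 95% range can be illustrated with a short Python sketch. It enumerates every DNA sequence that encodes a short peptide using the standard genetic code, scores each one, and trims 2.5% from each end of the sorted energies. The scoring function `toy_energy` is a placeholder invented for this example (it simply weights G/C above A/T) and is not the free-energy model of the patent; any real calculation would substitute measured nearest-neighbor parameters.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for idx, aa in enumerate(AMINO):
    codon = BASES[idx // 16] + BASES[(idx // 4) % 4] + BASES[idx % 4]
    CODONS.setdefault(aa, []).append(codon)

def toy_energy(dna):
    """Placeholder score, NOT a measured free energy: -2 per G/C, -1 per A/T."""
    return -sum(2.0 if base in "GC" else 1.0 for base in dna)

def coding_energy_range(peptide, energy=toy_energy, trim=0.025):
    """Return (low, high, count) for the energies of all DNA sequences encoding `peptide`.

    `trim` is the fraction removed from each end of the sorted energies,
    so the default leaves the central 95%.
    """
    energies = sorted(
        energy("".join(codons))
        for codons in product(*(CODONS[aa] for aa in peptide))
    )
    cut = int(len(energies) * trim)
    kept = energies[cut:len(energies) - cut]
    return kept[0], kept[-1], len(energies)
```

For a five-residue window (15 bases) the enumeration is small, at most 6^5 = 7776 sequences, so exhaustive listing is practical at this window size.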
  • Figure 9 is a graph which displays the relative free energy of a segment of the HTLV-I LTR as a function of base position and the natural logarithm of the mutation rate over the same region (inset). Seven HTLV-I sequences were used in the mutation rate calculations.
  • Figure 10: a) The positions along the consensus sequence which have a standard deviation of energy less than 1 kcal determined from the statistical analysis of the 24 ldh sequences are plotted as a function of base position in the consensus sequence. The total number of points is 79. b) The points in the mean free energy deviation versus base position which exceed the +95% and -95% confidence limits are designated as +1 or -1, respectively, and are plotted as a function of consensus base position. All points which fall within the 95% confidence limits are zero. The two regions denoted by vertical bars have correlations between these two datasets; around 600 and around 825.
  • the antigenic c) and nonantigenic d) regions of lactate dehydrogenase are displayed relative to the consensus gapped DNA sequence. Regions at 600 and 825 cross-correlate to a nonantigenic and an antigenic region, respectively. The antigenic region is the third most antigenic region found by Hogrefe et al.
  • Figure 11 shows the wildtype and unstable insert DNA sequences of a neo gene.
  • the invention relates to methods for determining structurally or functionally important regions of a nucleic acid sequence or a protein or polypeptide sequence.
  • the invention provides a method(s) for determining base position(s) on a preselected mRNA sequence where best hybridization of an oligonucleotide will occur.
  • the mRNA may be a pre-mRNA (hnRNA), thus containing untranscribed regions to be spliced out, and included in this mRNA/pre-mRNA are a variety of control sequences which allow binding of various cellular components.
  • oligonucleotide e.g., 30 bases
  • the method is described below with reference to data from a representative target nucleic acid sequence (LDH M72545, base positions 64-924; the sequence is available through GENBANK).
  • the algorithm for determining relatively "reactive" sites along genomic DNA is based on a representation of duplex DNA in terms of its sequence dependent melting free-energy. This provides DNA sequence as energy contours that, when scrutinized in the proper way, can lead to direct determination of specific sites that are optimum for targeting by anti-sense therapeutic agents.
  • each bp $i$ can be assigned a melting free-energy value.
  • $$\Delta G_i = \Delta G_i^{H\text{-}B} + \frac{\Delta G_{i,i-1}^{S} + \Delta G_{i,i+1}^{S}}{2}$$
  • $\Delta G_i^{H\text{-}B}$ is the free-energy of hydrogen bonding that typically can take on only two values (for A-T or G-C type bps) and $\Delta G_{i,i-1}^{S}$ and $\Delta G_{i,i+1}^{S}$ are the nearest-neighbor sequence dependent stacking free-energies for the stacking interactions between bp $i$ and bps $i+1$ and $i-1$. Utilizing this equation each bp can be assigned a free-energy of melting.
  • plotting the values of $\Delta G_j$ vs bp position $s$ results in an energy contour for that particular window size, $j$. Since the magnitude of $\Delta G_j$ increases with the size of $j$, relative features of energy contours constructed for different window sizes are difficult to compare directly.
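As a hedged illustration of the per-base-pair assignment, the sketch below gives each base pair a hydrogen-bonding term plus half of each neighboring stacking term. The numeric tables `HB` and `STACK` are placeholders invented for this example and are not measured thermodynamic parameters; treating a missing neighbor at either end of the sequence as contributing zero is also an assumption.

```python
# Placeholder hydrogen-bonding terms (two values: A-T type and G-C type).
# These numbers are illustrative only, NOT measured parameters.
HB = {"A": -1.0, "T": -1.0, "G": -2.0, "C": -2.0}

# Placeholder stacking terms for each dinucleotide step, keyed by the two
# top-strand bases. Again illustrative only.
STACK = {
    a + b: -0.5 - 0.25 * sum(x in "GC" for x in a + b)
    for a in "ACGT"
    for b in "ACGT"
}

def base_pair_energies(seq, hb=HB, stack=STACK):
    """Assign each base pair H-bond energy plus half of each adjacent stacking energy.

    End base pairs have a single neighbor; the missing stacking term is
    taken as zero (an assumption of this sketch).
    """
    seq = seq.upper()
    energies = []
    for i, base in enumerate(seq):
        left = stack[seq[i - 1] + base] if i > 0 else 0.0
        right = stack[base + seq[i + 1]] if i < len(seq) - 1 else 0.0
        energies.append(hb[base] + (left + right) / 2)
    return energies
```

Because every stacking term is split evenly between the two base pairs it joins, the per-base values sum to the total of all hydrogen-bonding and stacking terms, which makes window sums over adjacent positions straightforward.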
  • ΔG/ V ) I ( ΔG v - ΔG (min))
  • $N_w$ is the number of window sizes
  • $T_m$ is the melting temperature
  • $\Delta H_D$ and $\Delta S_D$ are the calculated melting enthalpy and entropy for the particular sequence.
  • the entropy of nucleation is $\Delta S_{nuc}$ and is regarded as a constant for a particular type of target in our equational formulation. That is, it does not depend on oligomer length.
  • the enthalpy of duplex nucleation, $\Delta H_{nuc}$, is primarily electrostatic in nature and therefore depends on sequence length, G-C percentage and salt concentration.
  • the total strand concentration is $C$ and $\alpha$ is a factor that properly accounts for sequence degeneracies in association of the oligomers. Overall, stability of the chosen oligomers can therefore be adjusted by changes in G-C percentage and length.
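The quantities defined above can be combined into a melting temperature with the widely used two-state expression for a bimolecular duplex. The sketch below is an assumption about how the terms fit together, since the excerpt does not reproduce the equation itself: the sequence and nucleation terms are simply summed, all enthalpies and entropies are given in the melting (dissociation) sign convention, and the degeneracy factor is named `alpha` in the code.

```python
import math

R = 1.987  # gas constant in cal/(mol*K)

def melting_temperature(dH_D, dS_D, dH_nuc, dS_nuc, C, alpha):
    """Two-state melting temperature in kelvin (assumed form, see lead-in).

    dH_D, dH_nuc: melting enthalpies in cal/mol (positive for melting).
    dS_D, dS_nuc: melting entropies in cal/(mol*K) (positive for melting).
    C: total strand concentration in mol/L.
    alpha: degeneracy factor for strand association.
    """
    dH = dH_D + dH_nuc
    dS = dS_D + dS_nuc
    return dH / (dS - R * math.log(C / alpha))
```

With this form, raising the total enthalpy (for example through higher G-C content or greater length) or the strand concentration raises the computed melting temperature, consistent with the statement that stability can be adjusted through G-C percentage and length.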
  • the invention relates to methods for selecting flanking nucleic acid sequences, e.g., nucleic acid sequences which flank (in the 3' and/or 5' direction) a selected nucleic acid sequence (such as a ligand binding site).
  • the methods of the invention are useful for determining flanking sequences which provide desired characteristics, such as Tm, ability of the ligand binding site to bind to a ligand, ability of a ligand to react with the nucleic acid sequence, stability of the nucleic acid sequence, and the like.
  • thermodynamic parameters of a nucleic acid sequence can be modulated by providing appropriate flanking sequences.
  • a flanking sequence(s) can be selected to increase the ability of a ligand to bind to a ligand binding site; decrease the ability of a ligand to bind to a ligand binding site; increase the ability of a ligand to react with a nucleic acid sequence; decrease the ability of a ligand to react with a nucleic acid sequence; increase the stability of the nucleic acid sequence (e.g., the mutability of the sequence); decrease the stability of the nucleic acid sequence; and the like.
  • the ability of a nucleic acid sequence to bind to a ligand is related to the thermodynamic stability of the nucleic acid sequence. It is believed that less stable sequences bind to ligands better than more stable sequences do.
  • the ability of a polynucleotide to bind to a ligand is related to the stability of the polynucleotide sequence.
  • the stability of a polynucleotide which includes the ligand binding site can be modulated by providing flanking sequences which affect the stability of the polynucleotide, e.g., at the ligand binding site.
  • the method includes the steps of providing a polynucleotide which includes a ligand binding site and at least one sequence which flanks the ligand binding site (e.g., in the 3' and/or 5' direction); and determining the ability of the ligand binding site to bind to (or react with) the ligand.
  • a plurality (such as a combinatorial library) of polynucleotides, each including the same ligand binding site and different (e.g., randomly differing) flanking sequences, are provided.
  • the mixture of polynucleotides can be screened against a limiting concentration of the ligand, and polynucleotide sequences which preferentially bind to the ligand (or do not bind to the ligand) can be selected, and (preferably) are then sequenced to determine an appropriate flanking sequence(s).
  • Example 1, infra, describes the preparation of a plurality of DNA sequences; each sequence included a binding site for the restriction enzyme BamHI, flanked on each end by a random polynucleotide sequence 40 bases in length.
  • the mixture of sequences was titrated with a known concentration of BamHI, and those sequences which bound most strongly to BamHI were selected (in this example, by gel shift assay and recovery of the shifted bands). Similarly, the poorest-binding sequences were selected. Certain of the selected polynucleotides were then sequenced to determine the flanking sequences which conferred increased or decreased ligand-binding ability on the ligand binding site.
  • the ability of a polynucleotide sequence to bind to a ligand is believed to be related to the stability of the polynucleotide sequence. It is further believed that the ability of a polynucleotide sequence to bind to ligands is at least largely independent of the ligand selected. Thus, a flanking sequence which destabilizes the binding of any one ligand to a ligand binding site adjacent the flanking sequence, will also destabilize the binding of other ligands to the ligand binding site. Thus, the particular ligand selected for use according to the methods of the invention, to determine the ability of a flanking sequence to affect ligand binding, is a matter of convenience and design choice which will be routine to one of ordinary skill in the art.
  • flanking sequences which confer particular energetic or reactivity attributes upon a neighboring sequence will have many potential uses.
  • flanking sequences can be selected to promote binding to or reaction with a ligand, such as an RNA or DNA binding protein, a polymerase, a reverse transcriptase, a telomerase, a helicase, a transcription factor, and the like.
  • the invention provides methods for selecting flanking sequences which can be used in vivo, e.g., to study the interaction of ligands and nucleic acids, or to provide improved probes or primers for PCR amplification, and the like.
  • flanking sequences of a polynucleotide can also be provided in a non-random manner.
  • a flanking sequence can be provided, e.g., by oligonucleotide chemical or biochemical synthesis, to provide a flanking region of any known sequence.
  • This flanking sequence can then be tested to determine the effect on ligand binding or sequence stability.
  • One particularly preferred practice of the invention involves the construction of a plurality of oligonucleotides, each including a ligand binding site flanked by at least one flanking sequence which has a known, repeating motif. The effect on stability of each flanking sequence can then be assayed, e.g., as described herein.
  • This embodiment of the invention is useful in constructing sequence reactivity data compilations, e.g., a database, which quantifies the effect on stability of any possible flanking sequence (see infra).
  • the invention provides methods for determining polynucleotides which are more (or less) prone to mutation.
  • the methods of the invention can be used to determine regions of a polynucleotide sequence, including a gene, which are more (or less) likely to mutate, e.g., in response to a selection pressure on the organism.
  • the methods of the invention are useful, e.g., for determining which portions of a gene are optimal targets for design of probes, e.g., for the detection of the presence of a microorganism in a biological sample.
  • a probe which is complementary to a portion of bacterial nucleic acid can be used to detect the presence of the bacterium in a biological sample, e.g., to detect bacterial infection, as is well known in the art.
  • if the targeted portion of the bacterial nucleic acid mutates, the probe will no longer bind (or will bind with decreased affinity) to the nucleic acid of the mutated bacterium, thus rendering detection of the bacterium more difficult.
  • a probe is therefore preferably designed to be complementary to a portion of the bacterial nucleic acid which is stable, e.g., less prone to mutation. Mutations of the bacterium are less likely to occur in a stable region, and therefore, the probability that the probe will be rendered useless by subsequent mutation is decreased.
  • destabilization of a portion of a polynucleotide sequence can result in increased mutation of nucleic acid sequences which are remote from the destabilized portion of the polynucleotide.
  • insertion of a destabilized, A-T rich region of polynucleotide sequence into a gene can result in significant increases in mutation rate of regions of the gene as much as 200-250 bases away from the destabilized region, compared to the wild-type gene, when the gene is inserted into a bacterial host cell which is then subjected to mutational pressure.
  • the methods of the invention are also useful for determining functionally important portions of a protein which is encoded by a nucleic acid sequence.
  • nucleic acid sequences which code for critical residues or regions of the protein will reside in regions of the gene which are relatively resistant to mutation, i.e., in stable regions of the gene, to avoid deleterious mutations which would decrease or abolish the desired function of the protein.
  • by determining the stability of regions of a gene, e.g., by determining the effect on stability of flanking sequences, critical residues of the encoded protein can be identified.
  • the methods of the invention can be used to determine or predict regions of a protein or polypeptide which are antigenically important.
  • multiple nucleic acid sequences can often code for a single polypeptide. Routine computational methods allow the determination of the free energy of each nucleic acid sequence which encodes a selected polypeptide. For a given polypeptide, the energy of a naturally-occurring coding sequence can be compared to the energies of all possible nucleic acids which could code for that polypeptide, to determine unusually stable or unstable coding sequences.
  • as in Example 3, by identifying regions of the coding nucleic acid sequence where a coding nucleic acid is unusually stable or unusually unstable with respect to the possible range of nucleic acid sequences that could code for a given polypeptide sequence, important regions of the polypeptide can be identified.
  • a particular "window" of length n bases e.g., 15 bases in Example 3
  • the window can be moved through the entire coding sequence, or any portion thereof, to determine the free energy of the polynucleotide window subsequence.
  • free energies of window subsequences of other potential (generally non-natural) coding sequences can be determined, and the window free energies for the naturally-occurring (or test) coding sequence can be compared to the corresponding window free energies from the potential coding sequences.
  • Unusually stable regions of the test sequence, compared to the potential coding sequences, can then be identified.
  • the identified regions of the polypeptide (or the coding nucleic acid) can then be altered to provide polypeptides with altered properties (e.g., increased or decreased antigenicity) according to a variety of methods, some of which are known in the art.
  • the invention provides methods for modulating (e.g., increasing or decreasing) the relative or absolute susceptibility to mutation of a polynucleotide sequence.
  • the method includes the step of providing a polynucleotide sequence in which a portion of the polynucleotide sequence has been stabilized (or destabilized) relative to a control polynucleotide sequence, such that the relative or absolute susceptibility to mutation of a polynucleotide sequence is modulated.
  • the invention provides means for stabilizing or destabilizing a polynucleotide sequence with respect to mutational susceptibility.
  • a polynucleotide sequence can be stabilized, e.g., to prevent mutation when the polynucleotide sequence is inserted into a host cell or organism, or destabilized, e.g., to promote mutation.
  • the addition of destabilizing regions of nucleic acid into a gene can provide increased rates of mutation when the gene is inserted into an organism.
  • This method can thereby provide a method for producing non-naturally occurring nucleic acid sequences (and proteins encoded by them); this is a form of "directed evolution" in that a particular gene or portion thereof can be targeted for mutation without increasing the propensity for mutation of other regions of the genome.
  • Such non-naturally occurring proteins can be assayed to determine properties such as binding specificity, binding affinity, rate of catalysis of a reaction, and the like, to identify proteins which have desirable characteristics.
  • the methods of the invention can be used to speed the process of preparing and selecting mutant proteins.
  • the methods of the invention can be used to increase the stability of a sequence and thereby decrease the mutation frequency of the sequence.
  • a nucleic acid sequence, and the protein encoded thereby, can be "protected" to prevent mutations, e.g., by altering the coding nucleic acid sequence, or a nearby (e.g., flanking) sequence to increase stability of the coding sequence.
  • the invention also provides methods for determining sequence stability of a test polynucleotide sequence by comparing the test sequence with sequences having a known stability, and thereby determining the stability of the test sequence.
  • a "database" of sequence stability values can be constructed, e.g., using the methods described herein.
  • the invention provides methods for selecting, from a plurality of polynucleotides, flanking sequences which confer increased (or decreased) thermodynamic stability on a nucleic acid binding site.
  • the results can be stored in a database. It will be appreciated that for a polynucleotide sequence, such as a flanking sequence, of length n bases, there will be 4^n possible polynucleotides. For sequences of length greater than about 10 bases, the number of possible polynucleotides becomes large, and experimental determination of the energies of all possible sequences may not be practical. It is therefore desirable to reduce the number of sequences which must be tested, while still providing sufficient data to permit the determination of the energy of any test sequence with reasonable accuracy.
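The growth in the number of possible sequences is easy to check numerically. The short snippet below simply evaluates four raised to the sequence length for a few lengths, including the 40-base random inserts mentioned elsewhere in the text; the function name is hypothetical.

```python
def library_size(n):
    """Number of distinct DNA sequences of length n (four bases per position)."""
    return 4 ** n

for n in (5, 10, 20, 40):
    print(f"{n:>2} bases: {library_size(n):,} possible sequences")
```

At 10 bases there are already more than a million sequences, which illustrates why exhaustive experimental measurement quickly becomes impractical and a reduced, representative test set is attractive.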
  • flanking sequence i.e., more homogeneous flanking sequences (e.g., flanking sequences which have repeated subsequence motifs) are better able to "transmit" energy to the sequence of interest. This relationship can be expressed as
  • the free energy, $\Delta G_j$, can be readily computed, e.g., by using known values and methods. Therefore, the ability of a flanking sequence to stabilize (or destabilize) a sequence of interest can be evaluated by determining the effect of sequence homogeneity on reactivity for a variety of sample flanking sequences. It is believed that conformational variability is a common, although heretofore unappreciated, feature of nucleic acids such as DNA. Studies have indicated that small (GC)8 DNA segments can influence the overall structure of sequences of DNA as large as one thousand bases in length (e.g., Kim et al., Biopolymers 33:1725-1745 (1993); see also Example 4, infra).
  • flanking sequences are constructed to have a pre-determined amount of sequence homogeneity (e.g., repeated subsequence motifs).
  • the ability of a ligand binding site flanked by such a flanking sequence to bind a ligand can be determined, e.g., by methods described herein, e.g., the binding assay described in Example 1, infra.
  • the effect of sequence homogeneity on reactivity can then be determined to provide a database of sequence reactivity values.
  • the methods described in Example 1 and Example 2 herein can advantageously be combined to better determine the effect of any given flanking sequence on the binding or reactivity characteristics of a ligand binding site.
  • the method of Example 1 can be used to explore flanking sequence effects by sampling a population (or at least a random subpopulation) of all possible flanking sequences to determine those sequences which most increase, or decrease, the reactivity of a ligand binding site for a ligand.
  • the method of Example 2 can be used to explore flanking sequence effects in a systematic fashion, by preparing any desired number of flanking sequences which have known amounts of sequence homogeneity.
  • the methods are complementary, and can be used in tandem to provide greater information about flanking sequence effects.
  • A schematic diagram illustrating one embodiment of a method of selecting flanking sequence(s) which affect the relative reactivity of a binding site flanked by such sequence(s) is shown in Figure 4.
  • the ligand can be, e.g., a protein which binds to a nucleic acid, e.g., an enzyme, e.g., a restriction enzyme.
  • the ligand can be the endonuclease BamHI, which binds to the sequence 5'-GGATCC-3'; binding sites for other ligands can be employed as is known in the art.
  • the binding site is flanked in both directions by A (to desymmetrize the construct), and then a 40-base long random insert is provided in both the 5' and 3' directions (longer or shorter random sequences can be employed if desired, e.g., to study the effect of remote flanking sequences on binding site reactivity).
  • PCR primer sites were provided to permit amplification of the construct, if desired.
  • Each PCR primer site included an EcoRI restriction site.
  • the constructs used in this example were synthesized on an automated DNA synthesizer (although other synthesis methods can be used). During the synthesis, the synthesizer was programmed to provide a mixture of each of the four nucleotides A, G, C, and T.
  • the population of constructs was amplified with PCR under standard conditions and purified by polyacrylamide gel electrophoresis (PAGE), followed by elution into buffer including 50mM NaCl and 50mM Tris-HCl (pH 8.0). The result was a population of polynucleotide duplexes (the PCR reaction provided the complementary strand). It was determined through appropriate control experiments that the synthesis and PCR amplification of the duplexes resulted in correct binding sites for BamHI.
  • Lanes 2 through 10 show the result of incubating duplex polynucleotide aliquots with varying amounts of BamHI as indicated in the legend. It was found that the ratio of shifted to unshifted molecules increased with increasing BamHI concentration, as expected. The duplexes having highest affinity for BamHI should be shifted at relatively low BamHI concentrations, while lower-affinity duplexes will begin to bind as the BamHI concentration is increased.
  • Samples of shifted and unshifted duplexes were then digested with 200 units of EcoRI per microgram of duplex polynucleotide.
  • the polynucleotides were cleaved at the EcoRI recognition sites, purified by PAGE, and ligated into Lambda ZAP vector predigested with EcoRI and treated with CIAP at 1:1 insert:vector ratio in the presence of 2U of T4 ligase in 5 microliters of T4 ligase buffer at 40°C overnight.
  • the ligated samples were then packaged using Gigapack II Gold packaging extract (Stratagene) and cloned into E. coli XL1-Blue host strain and subjected to blue/white selection.
  • the recombinant (white) clones are selected and eluted in 500 microliters SM buffer (100 mM NaCl, 8 mM MgSO4, 50 mM Tris-HCl, pH 7.5, 0.01% gelatin, 0.04% chloroform).
  • Ten microliters of the eluate was amplified by PCR using T3/T7 primers, purified by Qiagen PCR purification kit and sequenced.
  • methods such as the methods described above can be used to generate compilations of data for the prediction of the reactivity of a potential binding site for a ligand based upon sequences which flank the ligand binding site.
  • Example 2: This Example describes a method for providing nucleotide sequences that represent all sequence possibilities. A minimal number of nucleotide repeat sequences of any specified length are created that represent all sequence possibilities using the nucleotides A, G, C, and T.
  • a "pure repeat" is a repeating polynucleotide (e.g., DNA) sequence for which all base positions are defined.
  • a pure dinucleotide repeat is (AG) n , where n is an integer and A and G are the defined nucleotides. Pure trinucleotide repeats, tetranucleotide repeats, pentanucleotide repeats, and higher repeating units are also possible.
  • An "impure repeat" is a repeating polynucleotide (e.g., DNA) sequence in which random, non-repeating nucleotides may occur.
  • an impure trinucleotide repeat is (AGX) n , where n is an integer and X is a random nucleotide. Impure trinucleotide repeats, tetranucleotide repeats, pentanucleotide repeats, and higher repeating units are also possible.
  • flanking sequences can be used to probe flanking sequence reactivity, e.g., in a binding experiment analogous to the experiment described in Example 1, supra.
  • instead of random flanking sequences surrounding a BamHI binding site, the flanking sequences include pure or impure repeat units. Synthesis of a population of duplex polynucleotides having pure dinucleotide repeats, pure trinucleotide repeats, impure dinucleotide repeats, impure trinucleotide repeats, and the like, can be performed according to standard methods, e.g., on a DNA synthesizer.
  • the standard deviation (SD) of DNA free energy for a protein shared by different organisms was studied.
  • the calculation of the SD of energy for a gene starts with extracting the coding sequences for the protein of interest and translating them into their respective protein sequence. These sequences are aligned simultaneously to provide a representation of the protein sequences containing gaps which can be used for intersequence comparison.
  • a consensus protein sequence is determined.
  • the energy profile for each of the DNA sequences is calculated using known hydrogen-bonding and base stacking free energies over an interval or window of the DNA sequence. This window is then moved one base step along the sequence and a new free energy value is calculated for the new window. This procedure is continued for the entire length of the DNA.
  • the energy profiles are then gapped according to the gapped protein sequences.
  • the SD and mean free energy values are calculated.
  • the distribution of energies from the permutations is calculated for each base position. Upper and lower confidence limits for each base position are then calculated.
  • Regions where the mean free energy exceeds the confidence limits are tabulated.
  • the confidence limit outliers are cross-correlated with the positions of low SD. Regions of cross-correlation are prime candidates for antigenic and nonantigenic response for the protein of interest.
  • the analysis of the lactate dehydrogenase genes from bacteria along with some data from the mouse beta globin gene and HTLV-I LTR gene are presented.
  • the distribution of free energy states available to the DNA as determined by the amino acid sequence is calculated for each position along the mouse beta globin gene with a 15 base window and defined in discrete energy ranges. A representation of this data is shown in Figure 7. Each distribution represents the probability of finding the DNA at any particular energy.
  • energy profiles for the LDH (lactate dehydrogenase) gene from three different bacilli were calculated using the method described above.
  • the low temperature bacillus has a larger number of points above the upper 95% confidence limit.
  • the ambient temperature bacillus has a number of points exceeding the upper and lower 95% confidence limits which is between the number of points found for the high and low temperature bacilli.
  • the mutation rate is dependent on the DNA stability. Too high a mutation rate may kill the organism due to lethal mutations, and too low a mutation rate would not allow the organism to adapt to environmental changes, thereby reducing the survivability of the organism.
  • lactate dehydrogenase was chosen since it is a very common gene, there is a large database of information about the protein structure and antigenicity and many gene sequences are available from Genbank. The search for gene sequences was limited to gram positive bacteria and provided 27 unique complete lactate dehydrogenase gene sequences for the analysis. The methodology described above for determining the SD of energy and the unusually high and low energy positions was applied to this set of gene sequences.
  • Positions along the DNA which have a mean free energy deviation that is higher or lower than the 95% confidence limits are designated as +1 or -1, respectively. All points which fall within the 95% confidence limits are assigned a 0.
  • the result is plotted in the second line graph of Figure 10. From a total of 400 points (amino acid positions) 22 exceeded the 95% confidence limits.
  • the antigenic and nonantigenic sites relative to the consensus gapped DNA determined by Hogrefe et al., J.B.C., 1987 for mouse lactate dehydrogenase in rabbits are displayed in the two bottom graphs of Figure 10.
  • the vertical bars around base positions 600 and 825 denote the only regions where we find low SD of energy and the mean free energy deviation exceeds the 95% confidence limits. In fact, the region with the unusually stable sequence at 600 is nonantigenic and the region at 825 with an unusually unstable sequence is the second most antigenic region in the protein.
  • regions of the coding DNA that have a low SD of energy and are stable with respect to the overall possible range of energy states available for the peptide code for nonantigenic regions of the protein. In other words, these regions of the protein look like "self" and thus do not induce an antigenic response. Conversely, regions of the coding DNA which have a low SD of energy and are unstable with respect to the overall possible range of energy states available for the peptide code for antigenic regions of the protein. Based on the consensus sequence determined from a set of coding sequences for a particular protein from different organisms and the statistical variation of the free energy of the DNA, it is possible to predict which peptides will be the most likely candidates for eliciting an antigenic response. One advantage of this technique is the short time that is necessary to determine the most effective peptide candidates for inducing an antigenic response.
  • the stable regions exhibit 2/3 of their GC substitution bias in the 3rd position of the codon. Those that are unstable exhibit 50% of their GC substitution in the 3rd position.
  • the stable regions appear to have a higher prevalence of GC substitutions than the unstable regions, which appear to be more like a normal DNA sequence. Nevertheless, the stable regions with the higher rate of substitution only change their amino acid (i.e., make nonsynonymous substitutions) 12.5% of the time, whereas the unstable regions make nonsynonymous substitutions 27% of the time.
  • the unstable regions look more like normal or random DNA than the stable regions which have a higher CG substitution which does not change the amino acid identity.
  • a section of a gene ("Wild Type Sequence"; the sequence is shown in Figure 11) was replaced with a DNA sequence which codes for the same polypeptide sequence, but which is richer in A and T than the wildtype sequence, and therefore is energetically less stable than the wildtype ("Unstable Insert Sequence" shown in Figure 11).
  • the altered gene was inserted into a plasmid and used to transform E. coli.
  • the wildtype sequence was also inserted into a plasmid and used to transform E. coli as a control.
  • the plasmids included genes to permit blue/white selection of bacteria (e.g., see Example 1, supra).
  • Transformed bacteria were then subjected to chemical mutagenesis conditions to cause mutations in the sequences.
  • the level of mutagenesis was monitored by scoring the number of white colonies that result (white colonies form if the lacZ alpha-complementation, also carried on the plasmids, becomes mutated to the point where it is no longer functional). Isolated blue colonies were selected for expansion and the plasmid isolated from the resulting colony. This plasmid was then used to transform stock bacteria and the cycle of mutagenesis and plasmid isolation was repeated for a total of four cycles.
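The pure and impure repeats defined in Example 2 above can be illustrated with a minimal Python sketch. The function names `pure_repeat` and `impure_repeat`, and the use of the letter X to mark a random position, are illustrative assumptions; the source performs this step on a DNA synthesizer, not in software.

```python
import random

BASES = "AGCT"

def pure_repeat(unit, n):
    """Return a pure repeat such as (AG)n: every base position is defined."""
    return unit * n

def impure_repeat(unit, n, rng=None):
    """Return an impure repeat such as (AGX)n.

    Each 'X' in the unit is replaced by a base drawn at random, independently
    for every copy of the unit (an assumption about how X is filled in).
    """
    rng = rng or random.Random()
    copies = []
    for _ in range(n):
        copies.append("".join(rng.choice(BASES) if b == "X" else b for b in unit))
    return "".join(copies)
```

For example, `pure_repeat("AG", 3)` gives `AGAGAG`, while `impure_repeat("AGX", 4)` gives a 12-base sequence in which every third base varies.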

Abstract

The invention relates to nucleic acid sequences, and, in particular, to methods for identifying or designing structural and/or energetic characteristics of nucleic acids, and proteins which are encoded by the nucleic acids.

Description

THERMODYNAMIC PROPERTIES OF NUCLEIC ACIDS
Related Application
This application is related to copending U.S. provisional applications Serial No. 60/038,796, filed February 24, 1997, and Serial No. 60/068,616, filed December 23, 1997; the entire contents of these applications are hereby incorporated by reference.
Field of the Invention
The invention relates to nucleotide and polypeptide sequences having selected functional characteristics or having selected values for a free energy parameter. The methods of the invention allow selection of anti-sense oligonucleotides (e.g., for use as pharmacological agents), determination of protein or polypeptide function from a nucleotide sequence which codes for the polypeptide, prediction of antigenic regions of a protein, determination of a region of a polynucleotide or protein which is susceptible to mutation, and the like.
Background of the Invention
Recent advances in sequencing of nucleic acids have resulted in the accumulation of large amounts of sequence information, including the sequences of the genomes of entire organisms. The primary structure of the protein or polypeptide which is encoded by a nucleic acid sequence, such as a gene, can be determined through the use of the well-known genetic code. However, in many instances, the function of a protein which is encoded by a particular gene is unknown at the time the gene is sequenced. The function of such proteins can often be determined, but the cost can be considerable.
The design of nucleic acids which are complementary to a target nucleic acid (e.g., probes, primers, antisense oligonucleotides, and the like) can be complicated by factors such as secondary structure formation by the complementary strand. Moreover, in many cases, the design of probes having a pre-determined value for a parameter, such as melting temperature, would be desirable.
Antisense therapy involves the administration of exogenous oligonucleotides that bind to a target nucleic acid, typically an RNA molecule, located within cells. The term antisense is so given because the oligonucleotides are typically complementary to mRNA molecules ("sense strands") which encode a cellular product. The ability to use anti-sense oligonucleotides to inhibit expression of mRNAs, and thereby to inhibit protein expression in vivo, is well documented. However, selection of an appropriate complementary oligonucleotide (or oligonucleotides) to a given mRNA is not always simple (see, e.g., Crooke, S.T. FASEB J. 7: 533-539 (1993), incorporated herein by reference). Anti-sense agents typically need to continuously bind all target RNA molecules so as to inactivate them or alternatively provide a substrate for endogenous ribonuclease H (RNase H) activity. Sensitivity of RNA-oligonucleotide complexes, generated by the methods of the present invention, to RNase H digestion can be evaluated by standard methods (see, e.g., Donia, B. P., et al., J. Biol. Chem. 268 (19):14514-14522 (1993); Kawasaki, A. M., et al., J. Med. Chem. 6(7):831-841 (1993), incorporated herein by reference).
Methods for addressing problems in determination of protein function and in design of oligonucleotides are needed.
Summary of the Invention
The invention relates to nucleic acid sequences, and, in particular, to methods for identifying or designing structural and/or energetic characteristics of nucleic acids, and proteins which are encoded by the nucleic acids.
In one aspect, the invention relates to methods for selecting antisense oligonucleotides. Prior art methods do not provide efficient means of determining which complementary oligonucleotides to a given mRNA will be useful in an application. Shorter (15-200 base) anti-sense molecules are preferred in clinical applications. In fact, anti-sense oligonucleotides of at least 15 bases are preferred. The invention includes methods for selecting desired anti-sense oligonucleotides from the set of candidates provided by any given nucleic acid, e.g., an mRNA. In particular, the invention provides a means of determining desired sequence positions on the mRNA, e.g., those which present a desired level of free energy variation, against which to design anti-sense oligonucleotides, thus reducing the empiricism currently employed.
In one embodiment, the invention features a method of identifying a site on a nucleic acid sequence having high free energy variability. This allows determination of sites which are preferred for oligonucleotide, e.g., antisense, binding. The method includes some or all of the following steps:
providing a nucleotide sequence, e.g., a sequence from a target gene;
casting the nucleotide sequence as the free energy as a function of base pair position;
calculating the free energy of X windows centered on a base pair for a plurality of base pairs from the nucleotide sequence for every, or at least a plurality of, window sizes between 2 and Y, where Y is an integer between 3 and 1,000, more preferably between 2 and 100;
for each window size, constructing a free energy distribution along the sequence, preferably normalizing the distribution to a standard scale (to account for the fact that the free energy is proportional to window size) (this calculation gives results which can be plotted as shown in Figure 1);
finding the mean normalized free energy values for all windows for each base pair position (this gives results which can be plotted as in Figure 2; it also represents the "carrier");
subtracting the mean value for a position to provide the deviation from the mean at each base position, to determine those sequences which show high variability. The results can be plotted as in Figure 3 (point "a" in Figs. 2 and 3 corresponds to high variability).
In another aspect, the invention provides for a method for determining preferred anti-sense sequence complements within a predefined RNA sequence; these are generally high variability sequences. (As used herein, high variability can be a relative parameter, e.g., relative to other variability in the sequence. Alternatively, it can be relative to a predefined value.)
In another aspect, the invention provides for sets (e.g., sets of 2, 3, 4 or more) of sequences, e.g., anti-sense oligonucleotides, of an optimal duplex free energy or variability but variable length at the sites of anti-sense candidates within candidate regions.
In another aspect, the invention provides sets of isoenergetic, or isovariable, oligonucleotides, e.g., anti-sense candidates of a set length within candidate region.
In yet another aspect, the invention provides for establishing oligonucleotides (e.g., sets of 2, 3, 4 or more oligonucleotides), e.g., anti-sense oligonucleotides, of a preselected melting temperature, Tm, within candidate regions.
Generally, the method allows for identification, choosing, and matching of sequences with desired free energy variability characteristics.
Methods of the invention can be used for any of the following: Determining the best anti-sense candidate regions, or sub-sequences, within any given anti-sense target. Such sequences exhibit wide variation in average energy as a function of increasing length.
Designing desirable attributes such as Tm, free energy and length coupled with sequence composition to arrive at the best anti-sense oligonucleotide candidates 10-200 bases in length including the pre-identified candidate regions.
Providing compositions of sequence which display the identified variation in sequence composition with increasing window size.
In another aspect, the invention provides methods for determining functionally important regions of a protein. In another aspect, the invention provides methods for predicting mutation-prone regions of a protein. In still another aspect, the invention provides methods for determining flanking sequences which provide a pre-selected value for a thermodynamic parameter, such as free energy of ligand binding, to a ligand binding site of a polynucleotide.
In another aspect, the invention provides methods for selecting flanking sequences to a ligand binding site of a nucleic acid which affect ligand binding to a nucleic acid.
In another aspect, the invention provides methods for determining the ability of a pre-selected nucleic acid sequence to transmit binding energy to a portion of nucleic acid which is remote from the pre-selected sequence. In still another embodiment, the invention provides methods for increasing or decreasing the mutation rate of a nucleic acid sequence, e.g., by stabilizing or destabilizing the nucleotide sequence.
Any method of the invention can include providing a sequence, e.g., by synthesizing, or by placing in a reaction mixture which includes a carrier, e.g., a liquid, e.g., water.
Brief Description of the Drawings
Figure 1 is a plot of normalized energy as a function of window size and position along a representative DNA sequence.
Figure 2 is an overlaid plot of the data shown in Figure 1.
Figure 3 is a plot of the variability of energy distributions along the representative DNA sequence.
Figure 4 is a schematic illustration of a ligand binding experiment.
Figure 5 is an autoradiogram of a gel shift assay.
Figure 6 shows the sequences of several DNA constructs identified by ligand binding experiments.
Figure 7 is a graph in which the distributions of free energies from 15 base window free energy calculations along the mouse betaglobin major gene are plotted as a function of amino acid position. The distributions were determined by calculating the free energy for each of the possible codon sequences dictated by the amino acid sequences and categorized into discrete energy ranges. Since all permutations of codons representing the amino acids are present, the distribution describes a probability of the energies available to the five amino acids in each window. The two introns of the betaglobin gene were deleted for this calculation. The axes are: x, free energy in calories; y, normalized probability; z, base position in the gene sequence.
Figure 8 is an alternative representation of the energetic analysis shown in Figure 6. The distributions are normalized to their respective means, and each distribution is reduced by 2.5% from each end. The energy profile of the actual mouse betaglobin sequence is plotted after normalizing it to the same mean value. These confidence intervals define the range over which 95% of the energies of the DNA sequences that can be coded from the amino acid sequence of the gene can fall.
Figure 9 is a graph which displays the relative free energy of a segment of the HTLV-I LTR as a function of base position and the natural logarithm of the mutation rate over the same region (inset). Seven HTLV-I sequences were used in the mutation rate calculations.
Figure 10. a) The positions along the consensus sequence which have a standard deviation of energy less than 1 kcal determined from the statistical analysis of the 24 ldh sequences are plotted as a function of base position in the consensus sequence. The total number of points is 79. b) The points in the mean free energy deviation versus base position which exceed the +95% and -95% confidence limits are designated as +1 or -1, respectively, and are plotted as a function of consensus base position. All points which fall within the 95% confidence limits are zero. The two regions denoted by vertical bars have correlations between these two datasets: around 600 and around 825. The antigenic c) and nonantigenic d) regions of lactate dehydrogenase (Hogrefe, et al., J.B.C., 1987) are displayed relative to the consensus gapped DNA sequence. Regions at 600 and 825 cross-correlate to a nonantigenic and an antigenic region, respectively. The antigenic region is the third most antigenic region found by Hogrefe et al.
Figure 11 shows the wildtype and unstable insert DNA sequences of a neo gene.
Detailed Description of the Invention
In general, the invention relates to methods for determining structurally or functionally important regions of a nucleic acid sequence or a protein or polypeptide sequence.
Methods for Designing Antisense Oligonucleotides
The invention provides a method(s) for determining base position(s) on a preselected mRNA sequence where best hybridization of an oligonucleotide will occur. Note that the mRNA may be a pre-mRNA (hnRNA), thus containing untranscribed regions to be spliced out, and that included in this mRNA/pre-mRNA are a variety of control sequences which allow binding of various cellular components.
For example, if one were to approach the problem of anti-sense design randomly on, say, a 1000 base target mRNA molecule, then one could pick a set length oligonucleotide, e.g., 30 bases, and synthesize a thirty-mer starting at position 1 of the mRNA and complementary to positions 1-30 of the mRNA, followed by synthesis of a second thirty-mer starting at position 2 and ending at position 31. This iterative process of synthesis followed to its conclusion results in [1000 base mRNA - 2(30 base anti-sense length) + 2] = 942 thirty base anti-sense oligonucleotides. Similarly, of course, one might also select nineteen-mers as the optimal length, resulting in [1000 base mRNA - 2(19 base anti-sense length) + 2] = 964 nineteen base anti-sense oligonucleotides. In fact, one could synthesize all such complementary oligonucleotides of length less than the mRNA length and try to inhibit protein synthesis with each in an attempt to find the best anti-sense oligonucleotide for a given mRNA. However, in practice this approach would be an enormous undertaking. Clearly the process of selecting an anti-sense oligonucleotide of length suitable for large scale use as a pharmaceutical while showing in vivo activity would be simplified by identifying the "best" mRNA sequence position to target an anti-sense oligonucleotide against.
The method is described below with reference to data from a representative target nucleic acid sequence (LDH M72545, base positions from 64-924; the sequence is available through GENBANK). The algorithm for determining relatively "reactive" sites along genomic DNA is based on a representation of duplex DNA in terms of its sequence dependent melting free-energy. This provides DNA sequence as energy contours that, when scrutinized in the proper way, can lead to direct determination of specific sites that are optimum for targeting by anti-sense therapeutic agents. There are six steps to the current method, with at least one step (4) being considered optional.
(1) Free-Energy Representation of DNA Sequences: For a DNA sequence comprised of N base pairs (bps), each bp i can be assigned a melting free-energy value, $\Delta G_i$,
$$\Delta G_i = \Delta G_i^{H\text{-}B} + \left(\Delta G_{i,i-1}^{S} + \Delta G_{i,i+1}^{S}\right)/2$$
where $\Delta G_i^{H\text{-}B}$ is the free-energy of hydrogen bonding that typically can take on only two values (for A-T or G-C type bps) and $\Delta G_{i,i-1}^{S}$ and $\Delta G_{i,i+1}^{S}$ are the nearest-neighbor sequence dependent stacking free-energies for the stacking interactions between bp i and bps i-1 and i+1. Utilizing this equation each bp can be assigned a free-energy of melting.
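The per-base-pair assignment of step (1) can be sketched in Python as follows. This is only an illustration under stated assumptions: the hydrogen-bonding and stacking numbers are hypothetical placeholders (the source refers to known values without listing them), the names `HBOND`, `stacking_energy` and `per_bp_free_energy` are invented for this sketch, and a missing neighbor at either end of the sequence is assumed to contribute zero stacking energy.

```python
# Illustrative placeholder energies in kcal/mol. These numbers are
# hypothetical and are NOT taken from the source or any published set.
HBOND = {"A": -1.0, "T": -1.0, "G": -2.0, "C": -2.0}  # two-valued H-bond term

def stacking_energy(b1, b2):
    """Hypothetical stacking term for the step b1 -> b2 (placeholder rule)."""
    gc_count = sum(b in "GC" for b in (b1, b2))
    return -0.5 - 0.25 * gc_count

def per_bp_free_energy(seq):
    """Return dG_i = dG_HB(i) + (dG_S(i,i-1) + dG_S(i,i+1)) / 2 for each bp."""
    seq = seq.upper()
    if any(b not in HBOND for b in seq):
        raise ValueError("sequence must contain only A, C, G, T")
    n = len(seq)
    energies = []
    for i, base in enumerate(seq):
        # Assumption: a missing neighbor at either end contributes zero.
        prev_stack = stacking_energy(seq[i - 1], base) if i > 0 else 0.0
        next_stack = stacking_energy(base, seq[i + 1]) if i < n - 1 else 0.0
        energies.append(HBOND[base] + (prev_stack + next_stack) / 2)
    return energies
```

In practice the placeholder table and rule would be replaced by measured hydrogen-bonding and nearest-neighbor stacking free energies for the salt and temperature conditions of interest.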
(2) Construction of Free-Energy Windows: In this procedure windows of bps containing from 2 to 200 bps are individually examined. For each window size, starting at bp 1, the free-energies of the bps in the window are summed and plotted as the first point. The window is then moved over one bp and the free-energy of the new window, which contains the free-energy of a new bp and not the free-energy of the first bp of the previous window, and all the intervening bps, is determined. The process is continued until the last window reaches the end of the DNA sequence under consideration. Formally, for each window size, j = 10-62 bps, starting at bp s = 1 to N-j+1, the free-energy of each window is given by
$$\Delta G_j^{w}(s) = \sum_{i=s}^{j+s-1} \Delta G_i$$
Thus, plotting the values of $\Delta G_j^{w}(s)$ vs bp position s results in an energy contour for that particular window size, j. Since the magnitude of $\Delta G_j^{w}$ increases with the size of j, relative features of energy contours constructed for different window sizes are difficult to compare directly.
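The window construction of step (2) amounts to a sliding sum, which might be sketched as below. The function name `window_free_energies` is an assumption of this sketch, positions are 0-based (the text counts from bp 1), and the input is simply a list of per-base-pair free energies however they were obtained.

```python
def window_free_energies(bp_energies, j):
    """Return the energy contour for window size j.

    Element s of the result is the sum of bp_energies[s : s + j], so the
    contour has N - j + 1 points for a sequence of N base pairs.
    """
    n = len(bp_energies)
    if not 1 <= j <= n:
        raise ValueError("window size must be between 1 and the sequence length")
    total = sum(bp_energies[:j])
    contour = [total]
    for s in range(1, n - j + 1):
        # Slide by one bp: add the newly included bp, drop the one left behind.
        total += bp_energies[s + j - 1] - bp_energies[s - 1]
        contour.append(total)
    return contour
```

Calling the function once per window size yields one contour per value of j; because each contour is a raw sum, larger windows give larger magnitudes, which is why the normalization of step (3) is needed before contours are compared.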
(3) Direct Comparisons of Energy Contours Constructed with Different Window Sizes: To facilitate such a direct comparison the values of $\Delta G_j^{w}$ determined for different values of j are normalized relative to the maximum free-energy difference of any two windows of size j. Thus the normalized free-energy for each window is given by
$$\langle \Delta G_j^{w} \rangle(s) = \frac{\left|\Delta G_j^{w}(s) - \Delta G_j^{w}(\min)\right|}{\left|\Delta G_j^{w}(\max) - \Delta G_j^{w}(\min)\right|}$$
where $\Delta G_j^{w}(\max)$ and $\Delta G_j^{w}(\min)$ are the maximum and minimum free-energies observed for all the windows of size j along the sequence. Now the free-energy contours constructed with different window sizes consist of a distribution of relative free-energies with values between 0 and 1 vs bp position.
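The normalization of step (3) is a min-max rescaling of each contour, sketched below. The name `normalize_contour` is illustrative, and returning all zeros for a perfectly flat contour is an assumption made here to avoid dividing by zero; the source does not address that case.

```python
def normalize_contour(contour):
    """Rescale one window-size contour to the range 0..1.

    Each value becomes |G - G_min| / |G_max - G_min|, where G_min and G_max
    are the smallest and largest window energies for that window size.
    """
    lo, hi = min(contour), max(contour)
    if hi == lo:
        # Assumption: a flat contour carries no relative features.
        return [0.0] * len(contour)
    return [abs(g - lo) / abs(hi - lo) for g in contour]
```

After this step, contours computed with different window sizes share a common 0 to 1 scale and can be overplotted or averaged position by position.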
(4) [Optional step] Overlapping Energy Contours Constructed with Different Window Sizes: A more direct comparison of these energy contours is to "overplot" them (e.g., plot one data set over another) as shown in Fig. 2. Features of the distribution of melting stability are clearly apparent and apparently only slightly dependent on window size over the range examined. Regions of lowest magnitude are the least stable while regions of highest magnitude are the most stable. Although the same general features are observed on all the distribution functions shown in Figure 2, there are small deviations (on the order of 10-20%) about what appears to be the "average" shape of the distribution. These distributions directly reveal the contributions of hydrogen bonding and nearest-neighbor stacking to DNA stability. The prominent features of the distribution are determined by the amount of A-T or G-C type bps in the sequence. For example, the peaks on Fig. 2 depict regions relatively higher in G-C percentage. The converse is true for the valleys, which reveal a larger percentage of A-T type base pairs in that region.
(5) Deviations from the Window Size Average Reveal Targetable Regions: The superimposed "noise" or deviations from the mean behavior of the distribution for the different window sizes seen in Fig. 2 reveals the influence of nearest-neighbor stacking on DNA stability. It is this noise pattern that can be isolated. To better examine this component of the distribution functions, the average over all normalized energies determined for each window size is determined at each bp position, s. That is,
$$\langle \Delta G^{w} \rangle_{ave}(s) = \sum_j \langle \Delta G_j^{w} \rangle(s) / N_w$$
where $N_w$ is the number of window sizes. Now the differences,
$$\delta\langle \Delta G^{w} \rangle_{ave}(s) = \langle \Delta G^{w} \rangle_{ave}(s) - \langle \Delta G_j^{w} \rangle(s)$$
are determined and plotted vs sequence position for each window size as shown in Fig. 2. The result is a "noise" pattern with most values between -0.20 and +0.20, centering around 0 along the bp position. Notably, several regions emerge from this pattern which display a larger range than the preselected noise criteria. These regions are clearly seen in Fig. 3 and display the highest variability in sequence dependent stability with changes in window size and after scaling the values for the entire sequence set as described above. These are the desired targets for sequence specific anti-sense therapeutics.
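Step (5) can be sketched as an average over window sizes followed by a per-window-size deviation and a threshold test. The function names are illustrative. Two assumptions are made that the source does not spell out: contours are indexed by 0-based window start position and only positions present for every window size are averaged, and a position counts as highly variable when any window size deviates by more than the threshold (0.20 here, taken from the noise range quoted in the text).

```python
def deviation_from_average(normalized):
    """normalized maps window size j -> normalized contour (list of floats).

    Returns (average, deviations): average[s] is the mean over window sizes
    at position s, and deviations[j][s] = average[s] - normalized[j][s].
    """
    # Assumption: use only start positions available for every window size.
    n_common = min(len(c) for c in normalized.values())
    n_w = len(normalized)
    average = [sum(c[s] for c in normalized.values()) / n_w
               for s in range(n_common)]
    deviations = {j: [average[s] - c[s] for s in range(n_common)]
                  for j, c in normalized.items()}
    return average, deviations

def variable_positions(deviations, threshold=0.20):
    """Return positions where any window size deviates beyond the threshold."""
    n = len(next(iter(deviations.values())))
    return [s for s in range(n)
            if any(abs(d[s]) > threshold for d in deviations.values())]
```

Runs of adjacent flagged positions then mark candidate regions of high variability for closer examination.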
(6) Selection of Sequences: The 200-base sequences, 100 bases to either side of the "variational maxima" seen on the plots of $\delta\langle \Delta G_w \rangle_{ave}(s)$ vs. s, are identified from the mRNA sequence and subjected to further examination. While these 200-mers could be used as anti-sense oligonucleotides immediately, it is more desirable to use smaller oligomers composed of approximately 30 or fewer bases that are subsequences of the selected 200-mer. Optimal anti-sense candidate oligomers within the 200-mer will contain a 2-10 bp more stable region flanked by relatively unstable regions.
In some applications it may be desirable to select sets of anti-sense oligonucleotides all with a pre-defined optimal duplex free energy but with different lengths. This is done by scanning the energetic distribution of the 200 bp region and determining the various pieces from 15 to 30 bps in length that have the same calculated free energy of stability.
In other applications it may be desirable to select sets of isoenergetic anti-sense candidates of a given length. This is done by scanning the energetic distribution of the 200 bp region and determining the various pieces of a given length that have the same calculated free energy.
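The two scans just described (same energy with variable length, or same energy at a fixed length) can be sketched as one search over subsequences. The energy function below is a deliberately crude placeholder (hydrogen-bond count per base pair), not the nearest-neighbor calculation used in the source; `toy_energy`, `isoenergetic_candidates` and the tolerance parameter are hypothetical names introduced for illustration.

```python
def toy_energy(seq):
    """Placeholder stability score: 3 per G-C pair, 2 per A-T pair.

    A real calculation would use hydrogen-bonding and nearest-neighbor
    stacking free energies; this stand-in only keeps the example runnable.
    """
    return sum(3 if base in "GC" else 2 for base in seq)


def isoenergetic_candidates(region, target, tol, min_len=15, max_len=30):
    """Return (start, length, subsequence) for pieces whose energy is within tol of target."""
    hits = []
    for length in range(min_len, max_len + 1):
        for start in range(len(region) - length + 1):
            piece = region[start:start + length]
            if abs(toy_energy(piece) - target) <= tol:
                hits.append((start, length, piece))
    return hits


# Small demonstration with short lengths so the result is easy to check by hand.
for hit in isoenergetic_candidates("ATGCGCATATGC", target=10, tol=0, min_len=4, max_len=5):
    print(hit)
```

Setting `min_len` equal to `max_len` gives the fixed-length variant; leaving the default 15 to 30 range gives the variable-length variant.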
In other applications it may be desirable to choose anti-sense oligonucleotides of a preselected melting temperature, Tm. This is done using the formula,
$$T_m = \frac{\Delta H_D + \Delta H_{nuc}}{\Delta S_D + \Delta S_{nuc} + \ln(\alpha C_T)}$$
where $\Delta H_D$ and $\Delta S_D$ are the calculated melting enthalpy and entropy for the particular sequence. The entropy of nucleation is $\Delta S_{nuc}$ and is regarded as a constant for a particular type of target in our formulation; that is, it does not depend on oligomer length. In contrast, the enthalpy of duplex nucleation, $\Delta H_{nuc}$, is primarily electrostatic in nature and therefore depends on sequence length, G-C percentage and salt concentration. The total strand concentration is $C_T$, and $\alpha$ is a factor that properly accounts for sequence degeneracies in association of the oligomers. Overall, stability of the chosen oligomers can therefore be adjusted by changes in G-C percentage and length.
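As a rough numerical sketch of the melting-temperature expression, the function below adds the duplex and nucleation terms and divides enthalpy by entropy. Two points are assumptions rather than statements from the source: the logarithmic term is multiplied by the gas constant R, as in the standard two-state melting expression (the formula as printed does not show R), and the inputs are taken in cal/mol and cal/(mol K) with whatever sign convention the caller uses consistently. The function and parameter names are hypothetical.

```python
import math

R = 1.987  # gas constant in cal/(mol*K); its use here is an assumption


def melting_temperature(dh_d, dh_nuc, ds_d, ds_nuc, alpha, c_total):
    """Return Tm in kelvin from duplex and nucleation enthalpy/entropy terms.

    dh_d, dh_nuc: enthalpies in cal/mol
    ds_d, ds_nuc: entropies in cal/(mol*K)
    alpha: degeneracy factor; c_total: total strand concentration in mol/L
    """
    enthalpy = dh_d + dh_nuc
    entropy = ds_d + ds_nuc + R * math.log(alpha * c_total)
    return enthalpy / entropy


# Illustrative numbers only, not measured values.
tm = melting_temperature(-70000.0, -10000.0, -200.0, -20.0, alpha=0.25, c_total=4e-6)
print(round(tm, 1), "K")
```

Because length and G-C content enter through the enthalpy and entropy terms, changing either shifts the computed value, which is the adjustment the text describes.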
Methods for Selecting Flanking Sequences
In another aspect, the invention relates to methods for selecting flanking nucleic acid sequences, e.g., nucleic acid sequences which flank (in the 3' and/or 5' direction) a selected nucleic acid sequence (such as a ligand binding site). The methods of the invention are useful for determining flanking sequences which provide desired characteristics, such as Tm, ability of the ligand binding site to bind to a ligand, ability of a ligand to react with the nucleic acid sequence, stability of the nucleic acid sequence, and the like.
According to this aspect of the invention, the thermodynamic parameters of a nucleic acid sequence can be modulated by providing appropriate flanking sequences. For example, a flanking sequence(s) can be selected to increase the ability of a ligand to bind to a ligand binding site; decrease the ability of a ligand to bind to a ligand binding site; increase the ability of a ligand to react with a nucleic acid sequence; decrease the ability of a ligand to react with a nucleic acid sequence; increase the stability of the nucleic acid sequence (e.g., the mutability of the sequence); decrease the stability of the nucleic acid sequence; and the like. Without wishing to be bound by any particular theory, it is believed that the ability of a nucleic acid sequence to bind to a ligand is related to the thermodynamic stability of the nucleic acid sequence. It is believed that less stable sequences bind to ligands better than more stable sequences do. Thus, the ability of a polynucleotide to bind to a ligand is related to the stability of the polynucleotide sequence. For a given ligand binding site, the stability of a polynucleotide which includes the ligand binding site can be modulated by providing flanking sequences which affect the stability of the polynucleotide, e.g., at the ligand binding site. In one embodiment, the method includes the steps of providing a polynucleotide which includes a ligand binding site and at least one sequence which flanks the ligand binding site (e.g., in the 3' and/or 5' direction); and determining the ability of the ligand binding site to bind to (or react with) the ligand. In certain embodiments, a plurality (such as a combinatorial library) of polynucleotides, each including the same ligand binding site and different (e.g., randomly differing) flanking sequences, are provided.
In this embodiment, the mixture of polynucleotides can be screened against a limiting concentration of the ligand, and polynucleotide sequences which preferentially bind to the ligand (or do not bind to the ligand) can be selected, and (preferably) are then sequenced to determine an appropriate flanking sequence(s). Thus, for instance, Example 1, infra, describes the preparation of a plurality of
DNA sequences; each sequence included a binding site for the restriction enzyme BamHI, flanked on each end by a random polynucleotide sequence 40 bases in length. The mixture of sequences was titrated with a known concentration of BamHI, and those sequences which bound most strongly to BamHI were selected (in this example, by gel shift assay and recovery of the shifted bands). Similarly, the poorest-binding sequences were selected. Certain of the selected polynucleotides were then sequenced to determine the flanking sequences which conferred increased or decreased ligand-binding ability on the ligand binding site.
As discussed above, the ability of a polynucleotide sequence to bind to a ligand is believed to be related to the stability of the polynucleotide sequence. It is further believed that the ability of a polynucleotide sequence to bind to ligands is at least largely independent of the ligand selected. Thus, a flanking sequence which destabilizes the binding of any one ligand to a ligand binding site adjacent the flanking sequence, will also destabilize the binding of other ligands to the ligand binding site. Thus, the particular ligand selected for use according to the methods of the invention, to determine the ability of a flanking sequence to affect ligand binding, is a matter of convenience and design choice which will be routine to one of ordinary skill in the art.
It will be appreciated that flanking sequences which confer particular energetic or reactivity attributes upon a neighboring sequence will have many potential uses. For example, flanking sequences can be selected to promote binding to or reaction with a ligand, such as an RNA or DNA binding protein, a polymerase, a reverse transcriptase, a telomerase, a helicase, a transcription factor, and the like. Thus, the invention provides methods for selecting flanking sequences which can be used in vivo, e.g., to study the interaction of ligands and nucleic acids, or to provide improved probes or primers for PCR amplification, and the like.
The flanking sequences of a polynucleotide can also be provided in a non-random manner. For example, a flanking sequence can be provided, e.g., by oligonucleotide chemical or biochemical synthesis, to provide a flanking region of any known sequence. This flanking sequence can then be tested to determine the effect on ligand binding or sequence stability. One particularly preferred practice of the invention involves the construction of a plurality of oligonucleotides, each including a ligand binding site flanked by at least one flanking sequence which has a known, repeating motif. The effect on stability of each flanking sequence can then be assayed, e.g., as described herein. This embodiment of the invention is useful in constructing sequence reactivity data compilations, e.g., a database, which quantifies the effect on stability of any possible flanking sequence (see infra).
Methods for Predicting Mutable Sites
In another aspect, the invention provides methods for determining polynucleotides which are more (or less) prone to mutation. The methods of the invention can be used to determine regions of a polynucleotide sequence, including a gene, which are more (or less) likely to mutate, e.g., in response to a selection pressure on the organism.
The methods of the invention are useful, e.g., for determining which portions of a gene are optimal targets for design of probes, e.g., for the detection of the presence of a microorganism in a biological sample. For example, a probe which is complementary to a portion of bacterial nucleic acid can be used to detect the presence of the bacterium in a biological sample, e.g., to detect bacterial infection, as is well known in the art. However, if a mutation occurs in the bacterial genome at the site to which the probe binds, the probe will no longer bind (or will bind with decreased affinity) to the nucleic acid of the mutated bacterium, thus rendering detection of the bacterium more difficult. According to the invention, a probe can be designed to be complementary to a portion of the bacterial nucleic acid which is stable, e.g., less prone to mutation. Mutations of the bacterium are less likely to occur in a stable region, and therefore, the probability that the probe will be rendered useless by subsequent mutation is decreased.
The present inventors have now discovered that destabilization of a portion of a polynucleotide sequence can result in increased mutation of nucleic acid sequences which are remote from the destabilized portion of the polynucleotide. For example, as described in Example 4, infra, insertion of a destabilized, A-T rich region of polynucleotide sequence into a gene can result in significant increases in mutation rate of regions of the gene as much as 200-250 bases away from the destabilized region, compared to the wild-type gene, when the gene is inserted into a bacterial host cell which is then subjected to mutational pressure. This result demonstrates that stability to mutation is directly related to stability of the polynucleotide sequence (or a flanking region thereof); that is, as the stability of the sequence (or flanking region) decreases, the likelihood of mutation increases. This result can be employed in "directed evolution" schemes, e.g., as described infra.
The methods of the invention are also useful for determining functionally important portions of a protein which is encoded by a nucleic acid sequence. Without wishing to be bound by theory, it is believed that nucleic acid sequences which code for critical residues or regions of the protein will reside in regions of the gene which are relatively resistant to mutation, i.e., in stable regions of the gene, to avoid deleterious mutations which would decrease or abolish the desired function of the protein. Thus, by determining the stability of regions of a gene, e.g., by determining the effect on stability of flanking sequences, critical residues of the encoded protein can be identified. In another embodiment, the methods of the invention can be used to determine or predict regions of a protein or polypeptide which are antigenically important. Due to the degeneracy of the genetic code, a plurality of nucleic acid sequences can often code for a single polypeptide. Routine computational methods allow the determination of the free energy of each nucleic acid sequence which encodes a selected polypeptide. For a given polypeptide, the energy of a naturally-occurring coding sequence can be compared to the energies of all possible nucleic acids which could code for that polypeptide, to determine unusually stable or unstable coding sequences.
For example, as described in Example 3, infra, by identifying regions of the coding nucleic acid sequence where a coding nucleic acid is unusually stable or unusually unstable with respect to the possible range of nucleic acid sequences that could code for a given polypeptide sequence, important regions of the polypeptide can be identified. Thus, as described in Example 3, a particular "window" of length n bases (e.g., 15 bases in Example 3) can be defined, and the free energy of each subsequence of length n bases of the coding nucleic acid can be determined, e.g., by routine computational methods. The window can be moved through the entire coding sequence, or any portion thereof, to determine the free energy of the polynucleotide window subsequence. In a similar fashion, free energies of window subsequences of other potential (generally non-natural) coding sequences can be determined, and the window free energies for the naturally-occurring (or test) coding sequence can be compared to the corresponding window free energies from the potential coding sequences. Unusually stable regions of the test sequence, compared to the potential coding sequences, can then be identified. The identified regions of the polypeptide (or the coding nucleic acid) can then be altered to provide polypeptides with altered properties (e.g., increased or decreased antigenicity) according to a variety of methods, some of which are known in the art.
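The window comparison described above can be sketched for a very short peptide as follows. This is an illustrative sketch only: the energy function is a placeholder (hydrogen-bond count per base pair) standing in for a real free-energy calculation, the codon table is the standard genetic code, and all function names are hypothetical. Exhaustive enumeration of synonymous sequences grows exponentially with peptide length, so a practical version would need sampling or a per-window enumeration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {a + b + c: aa
               for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def synonymous_sequences(peptide):
    """Yield every DNA sequence that encodes the peptide."""
    for codons in product(*(SYNONYMS[aa] for aa in peptide)):
        yield "".join(codons)


def toy_energy(seq):
    """Placeholder stability score: 3 per G-C pair, 2 per A-T pair."""
    return sum(3 if base in "GC" else 2 for base in seq)


def window_profile(seq, n):
    """Energy of each length-n window, moving one base at a time."""
    return [toy_energy(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def compare_to_alternatives(dna, n):
    """Per window: (natural energy, lowest alternative, highest alternative)."""
    natural = window_profile(dna, n)
    alternatives = [window_profile(alt, n)
                    for alt in synonymous_sequences(translate(dna))]
    return [(natural[i],
             min(p[i] for p in alternatives),
             max(p[i] for p in alternatives))
            for i in range(len(natural))]


for row in compare_to_alternatives("ATGGCTAAA", 3):
    print(row)
```

Windows where the natural value sits at or near either extreme of the alternatives are the "unusually stable" or "unusually unstable" regions the text refers to.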
Methods for Modulating Mutation Rate of Polynucleotide Sequences
In another aspect, the invention provides methods for modulating (e.g., increasing or decreasing) the relative or absolute susceptibility to mutation of a polynucleotide sequence. In one embodiment, the method includes the step of providing a polynucleotide sequence in which a portion of the polynucleotide sequence has been stabilized (or destabilized) relative to a control polynucleotide sequence, such that the relative or absolute susceptibility to mutation of a polynucleotide sequence is modulated. In this embodiment, the invention provides means for stabilizing or destabilizing a polynucleotide sequence with respect to mutational susceptibility. Thus, a polynucleotide sequence can be stabilized, e.g., to prevent mutation when the polynucleotide sequence is inserted into a host cell or organism, or destabilized, e.g., to promote mutation. As shown in Example 4, infra, the addition of destabilizing regions of nucleic acid into a gene can provide increased rates of mutation when the gene is inserted into an organism. This method can thereby provide a method for producing non-naturally occurring nucleic acid sequences (and proteins encoded by them); this is a form of "directed evolution" in that a particular gene or portion thereof can be targeted for mutation without increasing the propensity for mutation of other regions of the genome. Such non-naturally occurring proteins can be assayed to determine properties such as binding specificity, binding affinity, rate of catalysis of a reaction, and the like, to identify proteins which have desirable characteristics. The methods of the invention can be used to speed the process of preparing and selecting mutant proteins.
Alternatively, the methods of the invention can be used to increase the stability of a sequence and thereby decrease the mutation frequency of the sequence. In this embodiment, a nucleic acid sequence, and the protein encoded thereby, can be "protected" to prevent mutations, e.g., by altering the coding nucleic acid sequence, or a nearby (e.g., flanking) sequence to increase stability of the coding sequence.
Methods for Predicting Sequence Stability
The invention also provides methods for determining sequence stability of a test polynucleotide sequence by comparing the test sequence with sequences having a known stability, and thereby determining the stability of the test sequence.
For example, in one embodiment, a "database" of sequence stability values can be constructed, e.g., using the methods described herein. For example, as described above, the invention provides methods for selecting, from a plurality of polynucleotides, those polynucleotides which are most (or least) effectively bound by a ligand, e.g., for determining flanking sequences which confer increased (or decreased) thermodynamic stability on a nucleic acid binding site. Once the stability of the polynucleotides (e.g., flanking sequences) has been determined, the results can be stored in a database. It will be appreciated that for a polynucleotide sequence, such as a flanking sequence, of length n bases, there will be 4^n possible polynucleotides. For sequences of length greater than about 10 bases, the number of possible polynucleotides becomes large, and experimental determination of the energies of all possible sequences may not be practical. It is therefore desirable to reduce the number of sequences which must be tested, while still providing sufficient data to permit the determination of the energy of any test sequence with reasonable accuracy.
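To give a sense of how quickly the number of possible sequences grows with length, the short calculation below counts the sequences of n bases over the four nucleotides. The function name is a hypothetical label; the arithmetic itself is simply four raised to the power n.

```python
def sequence_space(n):
    """Number of distinct sequences of n bases drawn from A, G, C and T."""
    return 4 ** n


for n in (5, 10, 20, 40):
    print(f"n = {n:2d}: {sequence_space(n):,} possible sequences")
```

At n = 10 the count already exceeds one million, which is consistent with the remark that exhaustive experimental testing becomes impractical beyond about 10 bases.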
This result can be achieved as follows. It is believed (without wishing to be bound by any theory) that the stability, and therefore the reactivity, of a test sequence of interest is a function of both the free energy $\Delta G_D$ of the sequence (i.e., the free energy of hydrogen bonding and the free energy of base stacking, which are both values known in the art) and the ability of flanking sequences to "transmit" energy to the sequence of interest. This "transmissibility" is in turn believed to be a function of the homogeneity of the flanking sequence, i.e., more homogeneous flanking sequences (e.g., flanking sequences which have repeated subsequence motifs) are better able to "transmit" energy to the sequence of interest. This relationship can be expressed as
$$\text{Reactivity} = f(\Delta G_D) + f(\text{homogeneity})$$
The free energy, $\Delta G_D$, can be readily computed, e.g., by using known values and methods. Therefore, the ability of a flanking sequence to stabilize (or destabilize) a sequence of interest can be evaluated by determining the effect of sequence homogeneity on reactivity for a variety of sample flanking sequences. It is believed that conformational variability is a common, although heretofore unappreciated, feature of nucleic acids such as DNA. Studies have indicated that small (GC)g DNA segments can influence the overall structure of sequences of DNA as large as one thousand bases in length (e.g., Kirn et al., Biopolymers 33:1725-1745 (1993); see also Example 4, infra). The effect of sequence homogeneity on reactivity can be studied in a variety of ways. For example, Example 2, infra, describes a method for providing non-random flanking sequences to a ligand binding site. The flanking sequences are constructed to have a pre-determined amount of sequence homogeneity (e.g., repeated subsequence motifs). The ability of a ligand binding site flanked by such a flanking sequence to bind to a ligand can be determined, e.g., by methods described herein, e.g., the binding assay described in Example 1, infra. By creating both "pure" and "impure" repeat sequences, the effect of sequence homogeneity on reactivity can be evaluated. The effect of sequence homogeneity on reactivity can then be determined to provide a database of sequence reactivity values. The methods described in Example 1 and Example 2 herein can advantageously be combined to better determine the effect of any given flanking sequence on the binding or reactivity characteristics of a ligand binding site. For example, the method of Example 1 can be used to explore flanking sequence effects by sampling a population (or at least a random subpopulation) of all possible flanking sequences to determine those sequences which most increase, or decrease, the reactivity of a ligand binding site for a ligand.
The method of Example 2 can be used to explore flanking sequence effects in a systematic fashion, by preparing any desired number of flanking sequences which have known amounts of sequence homogeneity. The methods are complementary, and can be used in tandem to provide greater information about flanking sequence effects.
Exemplification
The invention is further illustrated by the following non-limiting examples.
Example 1
A schematic diagram illustrating one embodiment of a method of selecting flanking sequence(s) which affect the relative reactivity of binding site flanked by such sequence(s) is shown in Figure 4.
As depicted in Figure 4, a binding site for a ligand is provided. The ligand can be, e.g., a protein which binds to a nucleic acid, e.g., an enzyme, e.g., a restriction enzyme. As shown in Figure 4, the ligand can be the endonuclease BamHI, which binds to the sequence 5'-GGATCC-3'; binding sites for other ligands can be employed as is known in the art. The binding site is flanked in both directions by A (to desymmetrize the construct), and then a 40-base long random insert is provided in both the 5' and 3' directions (longer or shorter random sequences can be employed if desired, e.g., to study the effect of remote flanking sequences on binding site reactivity). At both the 5' and 3' ends of the construct, PCR primer sites were provided to permit amplification of the construct, if desired. Each PCR primer site included an EcoRI restriction site. The constructs used in this example were synthesized on an automated DNA synthesizer (although other synthesis methods can be used). During the synthesis, the synthesizer was programmed to provide a mixture of each of the four nucleotides A, G, C, and T at each position of the 40-base random sequences; thus, a population of constructs was created as a statistical mixture differing at the random portions of the construct. (It will be appreciated that only a subpopulation of the 4^40 possible random sequences was obtained due to practical limitations on the amount of DNA synthesized.) The population of constructs was amplified with PCR under standard conditions and purified by polyacrylamide gel electrophoresis (PAGE), followed by elution into buffer including 50 mM NaCl and 50 mM Tris-HCl (pH 8.0). The result was a population of polynucleotide duplexes (the PCR reaction provided the complementary strand). It was determined through appropriate control experiments that the synthesis and PCR amplification of the duplexes resulted in correct binding sites for BamHI.
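The layout of the construct (primer site, 40-base random flank, A, BamHI recognition sequence GGATCC, A, 40-base random flank, primer site) can be mimicked in silico as below. The primer-site sequences are hypothetical placeholders that merely contain the EcoRI site GAATTC; the actual primer sequences are not given in this passage. A random flank may by chance contain an extra BamHI or EcoRI site, which this sketch does not filter out.

```python
import random

BAMHI_SITE = "GGATCC"
# Hypothetical primer sites, each containing an EcoRI site (GAATTC).
PRIMER_5 = "CTAGGAATTCAGTC"
PRIMER_3 = "GACTGAATTCCTAG"


def random_flank(length, rng):
    """Draw each position independently from the four nucleotides."""
    return "".join(rng.choice("AGCT") for _ in range(length))


def make_construct(rng, flank_length=40):
    """Primer site + random flank + A-GGATCC-A + random flank + primer site."""
    core = "A" + BAMHI_SITE + "A"
    return (PRIMER_5 + random_flank(flank_length, rng) + core
            + random_flank(flank_length, rng) + PRIMER_3)


rng = random.Random(1999)  # fixed seed so the sketch is reproducible
library = [make_construct(rng) for _ in range(5)]
for construct in library:
    print(construct)
```

A simulated library of this kind covers only a vanishing fraction of the 4^40 possible flanks on each side, just as the synthesized population does.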
Aliquots of these duplexes were then incubated with appropriate quantities of BamHI under conditions such that BamHI will bind to its binding site on the duplex but will not cleave the site (100 ng of the duplexes containing 0.01 pmol of 32P end-labelled duplex polynucleotide was incubated with varying concentrations of BamHI (shown in the legend of Figure 5) in a total volume of 30 microliters of 50 mM Tris-HCl, pH 8; 50 mM EDTA; 50 mM NaCl; 1 mM dithiothreitol, for 1 hour at 37°C).
After BamHI incubation, the aliquots were subjected to PAGE analysis on an 8% native polyacrylamide gel and visualized. Duplexes to which the enzyme bound should show a retarded mobility on the gel compared to unbound duplexes, and low mobility bands were in fact seen. Shifted (low mobility) and unshifted (mobility similar to duplex in the absence of BamHI) bands were excised from the gel and eluted overnight in 1 mL of 50 mM Tris-HCl, 50 mM NaCl, pH 8.0 buffer. The sample was concentrated to 50 microliters of the same buffer and amplified by PCR. Results of the gel-shift assay are shown in Figure 5. Lane 1 is a molecular weight marker lane. Lanes 2 through 10 show the result of incubating duplex polynucleotide aliquots with varying amounts of BamHI as indicated in the legend. It was found that the ratio of shifted to unshifted molecules increased with increasing BamHI concentration, as expected. The duplexes having highest affinity for BamHI should be shifted at relatively low BamHI concentrations, while lower-affinity duplexes will begin to bind as the BamHI concentration is increased.
Samples of shifted and unshifted duplexes were then digested with 200 units of EcoRI per microgram of duplex polynucleotide. The polynucleotides were cleaved at the EcoRI recognition sites, purified by PAGE, and ligated into Lambda ZAP vector predigested with EcoRI and treated with CIAP, at a 1:1 insert:vector ratio, in the presence of 2U of T4 ligase in 5 microliters of T4 ligase buffer at 40°C overnight.
The ligated samples were then packaged using Gigapack II Gold packaging extract (Stratagene), cloned into E. coli XL1-Blue host strain, and subjected to blue/white selection. The recombinant (white) clones were selected and eluted in 500 microliters SM buffer (100 mM NaCl, 8 mM MgSO4, 50 mM Tris-HCl, pH 7.5, 0.01% gelatin, 0.04% chloroform). Ten microliters of the eluate was amplified by PCR using T3/T7 primers, purified by Qiagen PCR purification kit and sequenced.
Certain of the sequences determined by the above procedure are shown in Figure 6 (six shifted and six unshifted sequences are shown). It was found that the shifted (high-affinity) sequences were readily cleaved by BamHI, while the unshifted (low affinity) sequences were resistant to BamHI cleavage. While no homology between the random portions of the high- or low-affinity sequences is immediately apparent, the results show that certain flanking sequences can have a strong effect on the reactivity of a ligand binding site, e.g., the ability of the ligand binding site to bind to or react with a ligand, such as a restriction enzyme. These sequences can then be used to construct probes, primers, binding sites, and the like, having desired ligand binding or reactivity attributes.
In addition, methods such as the methods described above can be used to generate compilations of data for the prediction of the reactivity of a potential binding site for a ligand based upon sequences which flank the ligand binding site. Thus, as increasing numbers of sequences which confer high (or low) ligand binding or reactivity upon a neighboring site are identified, the ability to predict the characteristics of a previously unknown flanking sequence will be improved, without the requirement of performing a binding experiment to determine such characteristics.
Example 2
This Example describes a method for providing nucleotide sequences that represent all sequence possibilities. A minimal number of nucleotide repeat sequences of any specified length are created that represent all sequence possibilities using the nucleotides A, G, C, and T.
As used herein, a "pure repeat" is a repeating polynucleotide (e.g., DNA) sequence for which all base positions are defined. For example, a pure dinucleotide repeat is (AG)n, where n is an integer and A and G are the defined nucleotides. Pure trinucleotide repeats, tetranucleotide repeats, pentanucleotide repeats, and higher repeating units are also possible. An "impure repeat" is a repeating polynucleotide (e.g., DNA) sequence in which random, non-repeating nucleotides may occur. For example, an impure trinucleotide repeat is (AGX)n, where n is an integer and X is a random nucleotide. Impure trinucleotide repeats, tetranucleotide repeats, pentanucleotide repeats, and higher repeating units are also possible.
Pure and impure repeats can be used to probe flanking sequence reactivity, e.g., in a binding experiment analogous to the experiment described in Example 1, supra. In this experiment, instead of random flanking sequences surrounding a BamHI binding site, the flanking sequences include pure or impure repeat units. Synthesis of a population of duplex polynucleotides having pure dinucleotide repeats, pure trinucleotide repeats, impure dinucleotide repeats, impure trinucleotide repeats, and the like, can be performed according to standard methods, e.g., on a DNA synthesizer. Populations (mixtures) of the duplexes can then be incubated with varying quantities of a binding ligand such as BamHI, as described in Example 1. By determining the relative binding affinity of pure and impure repeats for the binding ligand, the ability of a flanking sequence to affect the reactivity of a ligand binding site can be systematically explored, and the results can be used to create a database of reactivity values and/or a predictive algorithm, for use in predicting or identifying the ligand binding characteristics which will be conferred upon a ligand binding site by any flanking sequence.
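The pure and impure repeats defined in this Example can be generated programmatically, as in the sketch below. The function names are hypothetical. Collapsing repeat units that are cyclic rotations of one another (e.g., AG and GA) is one possible reading of the "minimal number" of repeat sequences and is an assumption, not something the text specifies.

```python
import random
from itertools import product

NUCLEOTIDES = "AGCT"


def pure_repeat_units(k):
    """All repeat units of length k, e.g. the 16 dinucleotides for k = 2."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]


def units_up_to_rotation(k):
    """Keep one unit per rotation class, since (AG)n and (GA)n differ only in phase."""
    canonical = {min(u[i:] + u[:i] for i in range(k)) for u in pure_repeat_units(k)}
    return sorted(canonical)


def pure_repeat(unit, n):
    """(unit)n with every position defined, e.g. (AG)3 -> AGAGAG."""
    return unit * n


def impure_repeat(unit, n, rng):
    """(unit)n where each X is replaced by an independently drawn random nucleotide."""
    return "".join(rng.choice(NUCLEOTIDES) if ch == "X" else ch for ch in unit * n)


rng = random.Random(2)
print(pure_repeat("AG", 5))
print(impure_repeat("AGX", 5, rng))
print(units_up_to_rotation(2))
```

Flanks built this way could replace the random flanks of the Example 1 construct to compare pure and impure repeats in the same binding assay.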
Example 3
In this Example, the standard deviation (SD) of DNA free energy for a protein shared by different organisms was studied. We demonstrate that when the SD of energy is small, the corresponding DNA sequence or range of sequences has been selected for over time and is a consequence of evolution. We also studied genes to find regions of a gene where the DNA is unusually stable or unusually unstable with respect to the energies of all possible DNAs that could code for the peptide sequence. Such regions are of great importance in predicting the location of antigenic and nonantigenic sites. The calculation of the SD of energy for a gene starts with extracting the coding sequences for the protein of interest and translating them into their respective protein sequences. These sequences are aligned simultaneously to provide a representation of the protein sequences containing gaps which can be used for intersequence comparison. A consensus protein sequence is determined. The energy profile for each of the DNA sequences is calculated using known hydrogen-bonding and base-stacking free energies over an interval or window of the DNA sequence. This window is then moved one base step along the sequence and a new free energy value is calculated for the new window. This procedure is continued for the entire length of the DNA. The energy profiles are then gapped according to the gapped protein sequences. The SD and mean free energy values are calculated. The SD values are then ranked and the regions with the lowest SD are tabulated. All possible DNA sequence permutations which could code for the consensus protein sequence are determined by applying the known degeneracy of the genetic code. The distribution of energies from the permutations is calculated for each base position. Upper and lower confidence limits for each base position are then calculated. Regions where the mean free energy exceeds the confidence limits are tabulated.
The confidence limit outliers are cross-correlated with the positions of low SD. Regions of cross-correlation are prime candidates for antigenic and nonantigenic response for the protein of interest. As an example, the analysis of the lactate dehydrogenase genes from bacteria along with some data from the mouse beta globin gene and HTLV-I LTR gene are presented. The distribution of free energy states available to the DNA as determined by the amino acid sequence is calculated for each position along the mouse beta globin gene with a 15 base window and defined in discrete energy ranges. A representation of this data is shown in Figure 7. Each distribution represents the probability of finding the DNA at any particular energy. Upon inspection of Figure 7 it is clear that the range of energies available to any particular segment of the DNA varies in width and mean energy. A more informative representation is shown in Figure 8 where the mean energy value at each position is subtracted from the normalized curves. This produces an energetic "baseline". The distributions are reduced by 2.5% from the high and low energy limits to define a 95% confidence interval.
In Figure 8, it is difficult to discern where the mean energy profile of this data set exceeds the upper or lower 95% confidence limits. Therefore, a simplified version was created where only three relative values are used. These are arbitrarily declared as +1, -1 and 0 for points that exceed the upper 95% confidence limit, points that fall below the lower 95% confidence limit, and points that are within the two limits, respectively. Two different base pair windows, 90 and 180, were used for the calculations. We found that there exist cross-correlations between the coding regions of the gene and positions of relatively high stability. In addition, there are promoter regions 5' and 3' to the coding sequence which also correspond to regions of relatively low stability.
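The 95% limits and the three-valued simplification can be sketched as follows, assuming the distribution at a given position is available as a plain list of energies from the possible coding sequences. Trimming 2.5% of the values from each end of the sorted list is one straightforward way to obtain the limits; the function names and the rounding of the trim count are assumptions made for illustration.

```python
def confidence_limits(energies, tail=0.025):
    """Lower and upper limits after trimming a fraction `tail` from each end."""
    ordered = sorted(energies)
    k = round(len(ordered) * tail)  # number of values trimmed from each tail
    return ordered[k], ordered[len(ordered) - 1 - k]


def classify(mean_energy, lower, upper):
    """+1 above the upper limit, -1 below the lower limit, 0 within the limits."""
    if mean_energy > upper:
        return 1
    if mean_energy < lower:
        return -1
    return 0


# Made-up distribution of 200 energies at one base position.
distribution = list(range(1, 201))
lower, upper = confidence_limits(distribution)
print(lower, upper, [classify(x, lower, upper) for x in (3, 100, 198)])
```

Applying `classify` at every base position yields the simplified +1 / 0 / -1 profile described above.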
The relationship between DNA stability and mutation rate discussed above is shown in Figure 9. Matching segments of seven HTLV-I LTR sequences with no gaps were used in the energy calculations and to determine the natural logarithm of the mutation rate. The natural logarithm of the mutation rate versus base position is shown in the inset. Upon inspection of the relative free energy profile, two features become apparent: relatively high stability regions are seen at ~400 and ~700 base positions. These same regions also exhibit relatively low mutation rates as seen in the inset. This correlation supports the link between DNA stability and mutation rate.
The relative free energy profiles of a segment of the lactate dehydrogenase (LDH) gene from three different bacilli were calculated using the method described above. One LDH gene was from a 70°C organism, one from an ambient temperature organism and one from a 0°C organism. The 5' end of the three profiles showed a high degree of similarity with each other up to approximately base position 400, suggesting that even though the absolute free energies may differ, their profiles are conserved. However, upon comparison of the 3' ends of the profiles, it was apparent that the conservation of the energy profiles is not always necessary. It was found that the number of points for which the energy profile is below the lower 95% confidence limit is much greater for the high temperature bacillus. The low temperature bacillus has a larger number of points above the upper 95% confidence limit. The ambient temperature bacillus has a number of points exceeding the upper and lower 95% confidence limits which is between the number of points found for the high and low temperature bacilli. Without wishing to be bound by any theory, it is believed that the mutation rate is dependent on the DNA stability. Too high a mutation rate may kill the organism due to lethal mutations, and too low a mutation rate would not allow the organism to adapt to environmental changes, thereby reducing the survivability of the organism.
We chose lactate dehydrogenase as our gene target since it is a very common gene, there is a large database of information about the protein structure and antigenicity and many gene sequences are available from Genbank. The search for gene sequences was limited to gram positive bacteria and provided 27 unique complete lactate dehydrogenase gene sequences for the analysis. The methodology described above for determining the SD of energy and the unusually high and low energy positions was applied to this set of gene sequences.
The SD of free energy as a function of base position from the set of 27 lactate dehydrogenase genes was calculated and plotted as a set of positive and negative values symmetric about zero. Multiple sequence alignment produced regions which were represented by only a few sequences. In order to maintain statistical significance the regions represented by fewer than 14 sequences were not used. The result was that these regions had gaps with no energy values. Regions where the genes are highly constrained in the number of energy states exhibited by the DNA could be readily determined by this approach.
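A minimal Python sketch of the per-position SD calculation follows, assuming that each aligned sequence has already been converted to a list of free energy values with `None` at alignment gaps. The function name `sd_profile`, the input layout and the use of the sample standard deviation are assumptions; the text does not specify them.

```python
from statistics import stdev


def sd_profile(aligned_energies, min_sequences=14):
    """Per-position SD of free energy across aligned sequences.

    aligned_energies holds one list per sequence, all of equal length,
    with None wherever the alignment has a gap. Positions represented
    by fewer than min_sequences values are returned as None, so the
    profile has gaps there. min_sequences must be at least 2.
    """
    length = len(aligned_energies[0])
    profile = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned_energies if seq[pos] is not None]
        if len(column) < min_sequences:
            profile.append(None)  # too few sequences: leave a gap
        else:
            profile.append(stdev(column))
    return profile
```

Plotting each value `s` as the pair `+s` and `-s` reproduces the symmetric presentation described above.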
The deviation of the mean free energy from the expected free energy of the lactate dehydrogenase gene consensus sequence determined as described above was plotted along with 95% confidence limits, and positions where correlations between the low SD of energy and the position where the mean free energy exceeds the 95% confidence limit were found. More details of this comparison are discussed below. The results from the two energy based calculations and the antigenic and nonantigenic maps of the lactate dehydrogenase are summarized in Figure 10. Analysis of the low SD of energy points provides the top line. The points along the DNA with SD of energy less than 1 kcal were included. Therefore, out of 1200 DNA positions 79 exhibited a "low" SD. Positions along the DNA which have a mean free energy deviation that is higher or lower than the 95% confidence limits are designated as +1 or -1, respectively. All points which fall within the 95% confidence limits are assigned a 0. The result is plotted in the second line graph of Figure 10. From a total of 400 points (amino acid positions) 22 exceeded the 95% confidence limits. The antigenic and nonantigenic sites relative to the consensus gapped DNA determined by Hogrefe et al., J.B.C., 1987 for mouse lactate dehydrogenase in rabbits are displayed in the two bottom graphs of Figure 10. The vertical bars around base positions 600 and 825 denote the only regions where we find low SD of energy and the mean free energy deviation exceeds the 95% confidence limits. In fact, the region with the unusually stable sequence at 600 is nonantigenic and the region at 825 with an unusually unstable sequence is the second most antigenic region in the protein.
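The comparison of the two tracks can be sketched as below. The SD track is indexed by DNA position and the +1/-1/0 track by amino acid position, so this sketch assumes that codon `i` covers bases `3*i` to `3*i + 2`; that mapping, the function name `flagged_codons` and the rule that one low-SD base suffices are illustrative assumptions rather than details given in the text.

```python
def flagged_codons(sd_by_base, flag_by_codon, sd_cutoff=1.0):
    """Return (codon index, flag) pairs where both criteria coincide.

    A codon is reported when its confidence-limit flag is nonzero and at
    least one of its three bases has an SD of energy below sd_cutoff.
    None entries in sd_by_base (alignment gaps) are ignored.
    """
    hits = []
    for codon, flag in enumerate(flag_by_codon):
        bases = sd_by_base[3 * codon:3 * codon + 3]
        low_sd = any(sd is not None and sd < sd_cutoff for sd in bases)
        if flag != 0 and low_sd:
            hits.append((codon, flag))
    return hits
```

Each reported codon is a candidate position of the kind marked by the vertical bars in Figure 10.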
Therefore, it is believed that regions of the coding DNA that have a low SD of energy and are stable with respect to the overall possible range of energy states available for the peptide code for nonantigenic regions of the protein. In other words, these regions of the protein look like "self" and thus do not induce an antigenic response. Conversely, regions of the coding DNA which have a low SD of energy and are unstable with respect to the overall possible range of energy states available for the peptide code for antigenic regions of the protein. Based on the consensus sequence determined from a set of coding sequences for a particular protein from different organisms and the statistical variation of the free energy of the DNA, it is possible to predict which peptides will be the most likely candidates for eliciting an antigenic response. One advantage of this technique is the short time that is necessary to determine the most effective peptide candidates for inducing an antigenic response.
Surprisingly, the stable regions exhibit 2/3 of their GC substitution bias in the 3rd position of the codon. Those that are unstable exhibit 50% of their GC substitution in the 3rd position. The stable regions appear to have a higher prevalence of GC substitutions than the unstable regions, which appear to be more like a normal DNA sequence. Nevertheless, the stable regions with the higher rate of substitution change their amino acid (i.e., make nonsynonymous substitutions) only 12.5% of the time, whereas the unstable regions make nonsynonymous substitutions 27% of the time. The unstable regions look more like normal or random DNA than the stable regions, which have a higher rate of GC substitution that does not change the amino acid identity.
Example 4
In this Example, a section of a gene ("Wild Type Sequence"; the sequence is shown in Figure 11) was replaced with a DNA sequence which codes for the same polypeptide sequence, but which is richer in A and T than the wildtype sequence, and therefore is energetically less stable than the wildtype ("Unstable Insert Sequence" shown in Figure 11). The altered gene was inserted into a plasmid and used to transform E. coli. The wildtype sequence was also inserted into a plasmid and used to transform E. coli as a control. The plasmids included genes to permit blue/white selection of bacteria (e.g., see Example 1, supra).
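One way to obtain a sequence that codes for the same polypeptide but is richer in A and T is to replace each codon with its most A/T-rich synonym. The Python sketch below illustrates that idea using the standard genetic code; it is not the procedure used to design the insert of Figure 11, and `at_rich_recode`, `translate` and `at_count` are hypothetical helper names.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" marks a stop).
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def at_count(seq):
    """Number of A and T bases in a sequence."""
    return sum(base in "AT" for base in seq)


def at_rich_recode(dna):
    """Replace each codon with the synonymous codon richest in A and T."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    out = []
    for i in range(0, len(dna), 3):
        amino_acid = CODE[dna[i:i + 3]]
        synonyms = [c for c, aa in CODE.items() if aa == amino_acid]
        # Ties are broken alphabetically so the result is deterministic.
        out.append(min(synonyms, key=lambda c: (-at_count(c), c)))
    return "".join(out)
```

The recoded sequence translates to the same peptide as the input while its A/T content is never lower, which is the property the Example relies on.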
Transformed bacteria were then subjected to chemical mutagenesis conditions to cause mutations in the sequences. The level of mutagenesis was monitored by scoring the number of white colonies that result (white colonies form if the lacZ alpha-complementation, also carried on the plasmids, becomes mutated to the point where it is no longer functional). Isolated blue colonies were selected for expansion and the plasmid was isolated from the resulting colony. This plasmid was then used to transform stock bacteria and the cycle of mutagenesis and plasmid isolation was repeated for a total of four cycles.
The results showed that there was a significant increase in the number of white colonies formed by bacteria carrying the unstable gene sequence, compared to the wildtype-carrying control bacteria. Direct sequencing of the isolated plasmids also showed that the "unstable" sequence-carrying bacteria had a higher overall mutation rate than the controls, and that the mutation profiles also differed: the "unstable" plasmid had more mutations in the region 3' to the altered sequence than did the control. These results show that the relative tempo of mutation (evolution) can be affected by controlled manipulation of DNA thermodynamic properties. Alteration of DNA sequences can have an effect on the stability of neighboring sequences, even sequences up to 200-250 bases upstream or downstream from the altered sequence. This example thus provides a means for providing "directed evolution" in lower organisms such as bacteria.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, numerous equivalents to the specific procedures described herein. Such equivalents are considered to be within the scope of this invention and are covered by the following claims.
The contents of all publications cited herein are hereby incorporated by reference.
Other embodiments are within the following claims.
What is claimed is:

Claims

1. A method of identifying a site on a nucleic acid sequence having a desired free energy variability comprising: providing a nucleotide sequence; casting the nucleotide sequence as the free energy as a function of base pair position; calculating the free energy of X windows centered on a base pair for a plurality of base pairs from the nucleotide sequence for every, or at least a plurality of, window sizes between 2 and Y, where Y is an integer between 3 and 100; for each window size, constructing a free energy distribution along the sequence and normalizing the distribution to a standard scale; finding the mean normalized free energy values for all windows for each base pair position; and subtracting the mean value for a position and providing the deviation from the mean of each base position to determine those sequences which show the desired variability.
2. A method of ranking relative reactivities of polynucleotide flanking sequences, the method comprising:
(a) providing a plurality of different nucleic acid molecules wherein each of the molecules has the same ligand binding site located adjacent to at least one flanking sequence;
(b) exposing the plurality of molecules to a ligand capable of binding to the binding site, under conditions such that the relative binding affinity of the ligand for the binding site within at least two of the molecules of the plurality of molecules is determined; and
(c) ranking the relative binding affinities determined in step (b) to rank relative reactivities of polynucleotide flanking sequences.
3. A method for identifying an antigenic site of a protein encoded by a naturally-occurring nucleic acid sequence, the method comprising: determining a free energy profile for the naturally-occurring nucleic acid sequence which encodes the protein; determining regions of the free energy profiles for at least one alternate nucleic acid sequence which encodes the protein; and identifying regions of the naturally-occurring protein which are relatively unstable compared to a corresponding region of the at least one alternate nucleic acid sequence, such that an antigenic site of the protein is determined.
4. A method for determining relative reactivity of a first nucleotide and a second nucleotide in a pre-selected polynucleotide sequence with a ligand, the method comprising: determining a mutation frequency for the first nucleotide; determining a mutation frequency for the second nucleotide; comparing the mutation frequency for the first nucleotide to the mutation frequency for the second nucleotide; such that the relative reactivity of the first nucleotide and the second nucleotide with the ligand is determined.
5. A method for determining the relative susceptibility to mutation of a first nucleotide and a second nucleotide in a pre-selected polynucleotide sequence, the method comprising: determining sequence homogeneity and free energy of sequences flanking the first nucleotide; determining sequence homogeneity and free energy of sequences flanking the second nucleotide; comparing the sequence homogeneity and free energy of sequences flanking the first nucleotide to the sequence homogeneity and free energy of sequences flanking the second nucleotide; such that the relative susceptibility to mutation of the first nucleotide and the second nucleotide is determined.
6. A method for determining whether a first protein is functionally similar to a second protein, the method comprising: providing a first nucleotide sequence which encodes the first protein; providing a second nucleotide sequence which encodes the second protein; determining a reactivity profile for the first nucleotide sequence, wherein the reactivity profile for the first nucleotide sequence is related to function of the first protein; determining a reactivity profile for the second nucleotide sequence, wherein the reactivity profile for the second nucleotide sequence is related to function of the second protein; comparing the reactivity profile for the first nucleotide sequence to the reactivity profile for the second nucleotide sequence, thereby determining whether the first protein is functionally similar to the second protein.
7. A method for determining whether a first nucleotide sequence encodes a protein which is functionally similar to a protein encoded by a second nucleotide sequence, the method comprising: providing the first nucleotide sequence; determining a reactivity profile for the first nucleotide sequence; providing the second nucleotide sequence; determining a reactivity profile for the second nucleotide sequence; comparing the reactivity profile for the first nucleotide sequence to the reactivity profile for the second nucleotide sequence, thereby determining whether the first nucleotide sequence encodes a protein which is functionally similar to the protein encoded by the second nucleotide sequence.
PCT/US1999/003754 1998-02-21 1999-02-19 Methods for identifying or characterising a site based on the thermodynamic properties of nucleic acids WO1999042621A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU33049/99A AU3304999A (en) 1998-02-21 1999-02-19 Thermodynamic properties of nucleic acids

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US7563398P 1998-02-21 1998-02-21
US60/075,633 1998-02-21

Publications (2)

Publication Number Publication Date
WO1999042621A2 true WO1999042621A2 (en) 1999-08-26
WO1999042621A3 WO1999042621A3 (en) 2000-03-16

Family

ID=22127047

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/003754 WO1999042621A2 (en) 1998-02-21 1999-02-19 Methods for identifying or characterising a site based on the thermodynamic properties of nucleic acids

Country Status (2)

Country Link
AU (1) AU3304999A (en)
WO (1) WO1999042621A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999063074A2 (en) * 1998-06-04 1999-12-09 Tm Technologies, Inc. Altering the ligand-binding characteristics of a nucleic acid ligand binding sequence by altering the nucleotide composition of its flanking sequences
WO2001037191A2 (en) * 1999-11-19 2001-05-25 Proteom Limited Method for manipulating protein or dna sequence data (in order to generate complementary peptide ligands)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1993000446A1 (en) * 1991-06-27 1993-01-07 Genelabs Technologies, Inc. Screening assay for the detection of dna-binding molecules
WO1995000665A1 (en) * 1993-06-17 1995-01-05 The Research Foundation Of State University Of New York Thermodynamics, design, and use of nucleic acid sequences
WO1998037242A1 (en) * 1997-02-24 1998-08-27 Tm Technologies, Inc. Process for selecting anti-sense oligonucleotides
WO1999032664A1 (en) * 1997-12-23 1999-07-01 Tm Technologies, Inc. Method of selecting flanking sequences which convey relative binding affinities to a ligand binding site

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BRUICE T W ET AL: "Control of complexity constraints on combinatorial screening for preferred oligonucleotide hybridisation sites on structured RNA" BIOCHEMISTRY,US,AMERICAN CHEMICAL SOCIETY. EASTON, PA, vol. 36, no. 16, pages 5004-5019 XP002110003 ISSN: 0006-2960 *
DATABASE CHEMABS [Online] Chemical Abstracts Service, Columbus, Ohio, US Accession Number 117:105283, STULL, R.A. ET AL.: "Predicting antisense oligonucleotide inhibitory efficacy: a computational approach using histograms and thermodynamic indices" XP002110007 & NUCLEIC ACIDS RES., vol. 20, no. 13, 1992, pages 3501-3508, *
DATABASE CHEMICAL ABSTRACTS [Online] abstr. no. 92401, 18 August 1997 (1997-08-18) SCZAKIEL G. ET AL.: "Computer-aided calculation of the local folding potential of target RNA and its use for ribozyme design " XP002110004 & METHODS MOL. BIOL., vol. 74, 1997, pages 11-15, US *
DATABASE CHEMICAL ABSTRACTS [Online] Accession no. 119:132896, SCZAKIEL G. ET AL.: "Computer-aided search for effective antisense RNA target sequences of the human immunodeficiency virus type 1" XP002110005 & ANTISENSE RES.DEV., vol. 3, no. 1, 1993, pages 45-52, *
DATABASE MEDLINE [Online] US National Library of Medicine Accession Number: 96374653, HYNDMAN D. ET AL.: "Software to determine optimal oligonucleotide sequences based on hybridization simulation data" XP002110006 & BIOTECHNIQUES, vol. 20, no. 6, June 1996 (1996-06), pages 1090-1097, *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999063074A2 (en) * 1998-06-04 1999-12-09 Tm Technologies, Inc. Altering the ligand-binding characteristics of a nucleic acid ligand binding sequence by altering the nucleotide composition of its flanking sequences
WO1999063074A3 (en) * 1998-06-04 2000-04-06 Tm Technologies Inc Altering the ligand-binding characteristics of a nucleic acid ligand binding sequence by altering the nucleotide composition of its flanking sequences
WO2001037191A2 (en) * 1999-11-19 2001-05-25 Proteom Limited Method for manipulating protein or dna sequence data (in order to generate complementary peptide ligands)
WO2001037191A3 (en) * 1999-11-19 2002-03-21 Proteom Ltd Method for manipulating protein or dna sequence data (in order to generate complementary peptide ligands)
US6721663B1 (en) 1999-11-19 2004-04-13 Proteom Limited Method for manipulating protein or DNA sequence data in order to generate complementary peptide ligands

Also Published As

Publication number Publication date
WO1999042621A3 (en) 2000-03-16
AU3304999A (en) 1999-09-06

Similar Documents

Publication Publication Date Title
Cech et al. Secondary structure of the Tetrahymena ribosomal RNA intervening sequence: structural homology with fungal mitochondrial intervening sequences.
Harris et al. Use of photoaffinity crosslinking and molecular modeling to analyze the global architecture of ribonuclease P RNA.
Wells et al. The role of DNA structure in genetic regulatio
Hanvey et al. Site-specific inhibition of Eco RI restriction/modification enzymes by a DNA triple helix
Pan et al. Variable structures of Fis-DNA complexes determined by flanking DNA–protein contacts
Lowary et al. New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning
Escolar et al. Binding of the fur (ferric uptake regulator) repressor of Escherichia coli to arrays of the GATAAT sequence
Wang et al. The actinomycete Thermobispora bispora contains two distinct types of transcriptionally active 16S rRNA genes
Komine et al. Genomic organization and physical mapping of the transfer RNA genes in Escherichia coli K12
Nielsen et al. Sequence specific inhibition of DNA restriction enzyme cleavage by PNA
EP3088533B1 (en) Recombinase polymerase amplification
EP2245187B1 (en) Methods for accurate sequence data and modified base position determination
Postel et al. Human NM23/nucleoside diphosphate kinase regulates gene expression through DNA binding to nuclease-hypersensitive transcriptional elements
US5556747A (en) Method for site-directed mutagenesis
US6242222B1 (en) Programmed sequential mutagenesis
Westhof et al. Mapping in three dimensions of regions in a catalytic RNA protected from attack by an Fe (II)-EDTA reagent
Blackburn et al. DNA termini in ciliate macronuclei
Nieuwlandt et al. The RNA component of RNase P from the archaebacterium Haloferax volcanii.
Yoshizawa et al. Nuclease resistance of an extraordinarily thermostable mini-hairpin DNA fragment, d (GCGAAGC) and its application to in vitro protein synthesis
Zito et al. Lead-catalyzed cleavage of ribonuclease P RNA as a probe for integrity of tertiary structure
Qian et al. Structural alterations far from the anticodon of the tRNAGGGProof Salmonella typhimurium induce+ 1 frameshifting at the peptidyl-site
Jilk et al. The organization of the outside end of transposon Tn5
Munishkin et al. Systematic Deletion Analysis of Ricin A-Chain Function: SINGLE AMINO ACID DELETIONS (∗)
WO1999042621A2 (en) Methods for identifying or characterising a site based on the thermodynamic properties of nucleic acids
Senger et al. The presence of a D-stem but not a T-stem is essential for triggering aminoacylation upon anticodon binding in yeast methionine tRNA

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase