WO1999022025A1 - Inference sequencing by hybridization - Google Patents

Inference sequencing by hybridization

Publication number
WO1999022025A1
Authority
WO
WIPO (PCT)
Application number
PCT/US1998/022519
Other languages
French (fr)
Inventor
Charles R. Cantor
Fouad A. Siddiqi
Original Assignee
Trustees Of Boston University
Application filed by Trustees Of Boston University
Priority to AU11185/99A (AU1118599A)
Publication of WO1999022025A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6874: Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6834: Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837: Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips

Definitions

  • Sequence Reassembly: The 16-mers remaining in the inferred set are then assembled into longer sequences. This is done by comparing the 3'-terminal fifteen bases of each 16-mer to the 5'-terminal fifteen bases of every other 16-mer in the data set. If a match is found, then the two 16-mers are combined into a single 17-mer. The two terminal 15-mers on either side of this newly formed 17-mer are compared for overlap with the remaining 16-mers in the data set, and the process is repeated until no more overlaps are found. To ensure reconstruction accuracy, only fragments at least 100 bases in length were considered to be part of the target sequence.
  • ISBH simulations were performed on 76 target sequences obtained from GenBank comprising a total of 2.45 million bases, ranging from 5 to 100 kilobases in length.
  • The size of the inferred 16-mer pool increases as a power law function of the number of different 16-mers in the target DNA. Data reduction reliably eliminates all but a handful of the false positives for all target lengths investigated, even when false positives accounted for more than 99% of the inferred 16-mer pool (Figure 5).
  • The set of inferred 16-mers remaining after data reduction closely approximates the information that would be gathered from an SBH array containing all explicit 16-mers (~4.3 billion probes).
  • The ISBH reconstruction algorithm typically returns a target DNA as several non-overlapping fragments that are in the range of several hundred to several thousand bases in length. Reconstructed fragments always show 100% identity (by BLASTN) to some region of unknown position in the target DNA sequence. The largest single fragment reconstructed was 28 kb, and fragments longer than 10 kb were commonly observed (Figure 6). In most cases, more than 95% of the target DNA was recovered in a simulated ISBH experiment (Figure 7). ISBH performs optimally on target DNAs having no repeated 16-mers, generally returning a handful of long (3-15 kb) fragments. Even on sequences with many repeated 16-mers, ISBH returns dozens of fragments shorter than 5 kb, which is five to ten times the performance of electrophoretic sequencing methods.
  • Target DNAs longer than 50 kb tend to produce large numbers of false positives in the inference step, a few of which remain after data reduction. False positives introduce ambiguities during reassembly, leading to lower average lengths for reconstructed fragments.
  • A comprehensive list of the sequences analyzed and a summary of the ISBH simulation results are shown in Figure 8.
  • A benefit of the present invention, in contrast to conventional SBH, is that the inventors' ISBH method has the potential to sequence very long targets at high accuracy using an oligonucleotide array of moderate size.
  • The hypothetical ISBH array studied here could easily sequence 15-45 kb of DNA in a single experiment.
  • The ISBH method requires no electrophoresis and no information about the target DNA, and could be used for diagnostic as well as de novo applications.
  • ISBH could generate more sequence data than two dozen Sanger sequencing reactions after shotgun subcloning of a target DNA.
  • Each fragment reconstructed by the ISBH method can outperform electrophoretic methods by 28-fold.
  • ISBH performance is equivalent to electrophoretic methods.
  • ISBH reconstruction of a single target DNA generally required less than 10 minutes of supercomputer time to complete.
  • The computational complexity of the inference step is of order N^2
  • The data reduction and reassembly steps are of order N log(N).
  • Sequence reconstruction using a highly streamlined ISBH algorithm running on a typical desktop computer could be completed in a few hours. For DNA of random sequence, a given 16-mer should appear once every 4^16 ≈
  • DNAs from a wide variety of organisms in the range of 10-100 kb typically have hundreds or thousands of repeated 16-mers. As shorter subsequences are examined, the number of repeated subsequences increases dramatically. For example, the 48.5 kb genome of bacteriophage lambda, which has no repeated 16-mers, has a single repeated 15-mer, and ten distinct 14-mers appearing more than once.
  • De novo SBH using oligonucleotide probes shorter than 16 bases will perform poorly on target DNAs longer than a few kilobases.
  • ISBH appears to perform optimally both in terms of absolute read lengths and relative target coverage on DNAs in the range of 25kb that have small numbers of repeated 16-mers.
  • ISBH is a scalable technique - the number of false positives generated by inference increases as the number of probes used in the ISBH array decreases.
  • Each degenerate probe is actually a mixture of many individual probes that are bound to the same area in an SBH array.
  • The binding capacity of such a degenerate probe is greatly reduced in comparison to a pure individual probe; for the hypothetical ISBH array in this study, the complexity of each degenerate probe is 65,536.
  • Such a high probe complexity may mean that accurate physical hybridization cannot be achieved with a high signal to noise ratio.
  • Possible solutions to this problem include the use of base analogs to decrease probe complexity or the addition of an enzymatic step (e.g., ligation) to augment the accuracy of simple physical hybridization.
  • ISBH performance is equivalent to the case of a single-stranded DNA twice as long. If a double stranded target is cleaved by a restriction endonuclease before hybridization, ISBH will return the sequence of each restriction fragment. If this experiment is repeated using a restriction endonuclease with a different recognition site, then the fragments can be aligned relative to one another using standard contig reassembly algorithms. The potential of the ISBH strategy is so strong that we are now investigating strategies to implement it in practice.
  • Another embodiment of the ISBH Probe Array, Experimental, and Data Analysis Algorithm Design consists of the following:
  • Probe Array Design: The proposed array consists of 768K oligonucleotide probes divided into three groups: 1) all 256K possible 9-base single-stranded sequences, 2) all 256K possible 9-base 5'-overhanging partial duplexes, and 3) all 256K possible 9-base 3'-overhanging partial duplexes.
  • The target DNA must be single-stranded; it may be prepared, for example, by long PCR of a double-stranded target using one primer that is biotinylated at its 5' end to facilitate purification from the other strand.
  • The biotinylated strand may be captured on streptavidin-coated beads or a streptavidin-coated column.
  • Group I 5'-ANNNNNNN-3', 5'-CNNNNNNN-3', 5'-GNNNNNNN-3', 5'-TNNNNNNN-3'.
  • Group II 5'-P-NNNNNNNA-Sdd-3', 5'-P-NNNNNNNC-Sdd-3', 5'-P-NNNNNNNG-Sdd-3', and 5'-P-NNNNNNNT-Sdd-3'.
  • Each oligonucleotide from group I is combined with an oligonucleotide from group II and then hybridized to the single-stranded target under conditions favoring accurate base-pairing. After hybridization has occurred, the oligonucleotides that still remain in solution must be removed, for example by size-exclusion column chromatography.
  • DNA ligase (and all necessary cofactors) is added to the reaction mixture. After the ligation reaction, exonuclease III is added to the reaction mixture to destroy any unligated oligonucleotides that are still hybridized to the target. Under ideal conditions, the ligation products will be 16-mers of the form:
  • Each of the sixteen probe array hybridization experiments described above generates the following data: a set of 9-mers from probe group 1, a set of 9-mers from probe group 2, and a set of 9-mers from group 3.
  • The following algorithm is used to expand the data: compare each 9-mer from group 2 to each 9-mer from group 3. If the 3' two bases of the 9-mer from group 2 are the same as the 5' two bases of the 9-mer from group 3 (i.e., they have a two-base overlap), then combine the two 9-mers to form a single 16-mer (concatenate the 3' seven bases of the group 3 oligo to the 3' end of the group 2 oligo).
  • This newly formed 16-mer is retained only if all eight of its 9-base subsequences are present in group 1. This analysis is performed for all probe array experiments and all retained 16-mers are collected into a single set. This set of 16-mers (which is the inferred set of explicit 16-mers from the target) is then subjected to the same data reduction and sequence reconstruction algorithms that we have previously described for ISBH.
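
The expansion and filtering rule described in the last two items can be sketched in Python as follows. This is an illustration under stated assumptions, not the inventors' implementation: the function name `expand_to_16mers`, the representation of each probe group as a collection of 9-base strings, and the indexing of group 3 by its first two bases (a shortcut that avoids comparing every pair) are all hypothetical choices made for the example.

```python
def expand_to_16mers(group1, group2, group3):
    """Join overlapping 9-mers from groups 2 and 3 into candidate 16-mers,
    keeping a candidate only if all eight of its 9-mers were seen in group 1."""
    group1 = set(group1)

    # Index group 3 by its 5' two bases (assumed shortcut; the text compares all pairs).
    by_head = {}
    for b in group3:
        by_head.setdefault(b[:2], []).append(b)

    inferred = set()
    for a in group2:
        # Two-base overlap: 3' two bases of the group 2 oligo == 5' two bases of the group 3 oligo.
        for b in by_head.get(a[-2:], []):
            candidate = a + b[2:]          # 9 + 7 = 16 bases
            # A 16-mer contains 16 - 9 + 1 = 8 overlapping 9-mers.
            if all(candidate[i:i + 9] in group1 for i in range(8)):
                inferred.add(candidate)
    return inferred


# Toy data built from one made-up 16-mer (not taken from the patent).
target = "ACGTACGGTCATGCAA"
g1 = {target[i:i + 9] for i in range(8)}
print(expand_to_16mers(g1, {target[:9]}, {target[7:]}))
```

The resulting set plays the role of the inferred set of explicit 16-mers, which the text then passes to data reduction and reassembly.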

Abstract

The inventors present a new strategy for de novo sequencing using oligonucleotide probe arrays which we term Inference Sequencing by Hybridization (ISBH). The ISBH method characterises a target DNA with several small degenerate probe arrays and reconstructs the base sequence of the target via computer algorithm at 100% accuracy. ISBH returns a target sequence as multiple non-overlapping fragments several hundred to several thousand bases in length. Target DNAs up to 100 kilobases long can potentially be sequenced in a single experiment. A hypothetical ISBH probe array and sequence reconstruction algorithm were designed and tested by computer simulation on 76 DNA sequences obtained from GenBank comprising a total of 2.45 million base pairs. ISBH performs optimally on target DNAs in the range of 15-45 kilobases that have low levels of repeated sequence. We believe that the ISBH approach has great potential to increase the speed of DNA sequencing.

Description

INFERENCE SEQUENCING BY HYBRIDIZATION
This application for patent under 35 U.S.C. § 111(a) claims priority to Provisional Application Serial No. 60/063,103, filed October 24, 1997 under 35 U.S.C. § 111(b). This invention was made with Government Support under Contract Number DGE-9452651 awarded by the National Science Foundation. The Government has certain rights in the invention.
BACKGROUND OF THE INVENTION
Procedures involving the use of sequencing by hybridization (SBH) are known to those skilled in the art, and SBH has recently been demonstrated to be a powerful alternative to electrophoretic methods for diagnostic DNA analysis [M. Chee et al., Accessing Genetic Information With High Density DNA Arrays, Science, 274, 610 (1996); J. Hacia et al., Detection Of Heterozygous Mutations In BRCA1 Using High-Density Oligonucleotide Arrays And Two-Color Fluorescence Analysis, Nature Genetics, 14, 441 (1996)]. Diagnostic SBH employs hybridization of a target DNA sequence to a tiled array of several thousand short oligonucleotide probes of known sequence [W. Bains et al., A Novel Method For Nucleic Acid Sequence Determination, J Theor Biol, 135, 303 (1988); E. Southern et al., Hybridization With Oligonucleotide Arrays, Genomics, 13, 1008 (1992)]. The pattern of hybridization, detected by fluorescence microscopy, indicates which oligonucleotides in the probe array are present in the target DNA. When this information is compared against a reference target sequence, the entire sequence of the target DNA can be reconstructed at high accuracy.
SBH, while offering great advantages in terms of throughput for diagnostic sequence analysis, suffers from the drawback that a different probe array must be tailored for each target DNA analyzed [M. Chee et al., Accessing Genetic Information With High Density DNA Arrays, Science, 274, 610 (1996)]. For de novo sequencing, current SBH methods are not competitive with electrophoretic sequencing techniques that yield 600-1000 base pair read lengths per experiment [P. Pevzner et al., Improved Chips For Sequencing By Hybridization, J Biomolecular Struct Dyn, 9, 399 (1991)]. Even under perfect experimental conditions, existing SBH designs cannot reconstruct a unique target sequence from hybridization data alone [P. Pevzner et al., Towards DNA Sequencing Chips, In 19th Int Conf Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, Springer-Verlag, Berlin, Vol 841, pp. 143-158 (1994); P. Pevzner, Rearrangements Of DNA Sequences And SBH, Computers Chem, 18, 221 (1994)]. Without a reference sequence for comparison, de novo SBH is fundamentally limited because it acquires base sequence information at the cost of positional information. One knows exactly which subsequences (probe sequences) are present in the target DNA but not where they are located.
Subsequences must be arranged by examining how they overlap with one another. For example, octanucleotides are assembled into longer sequences by finding corresponding seven-base overlaps. The accuracy of reassembly is limited because any particular subsequence can occur more than once in the target DNA, leading to an ambiguity in the final reconstructed sequence [P. Pevzner et al., Towards DNA Sequencing Chips, In 19th Int Conf Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, Springer-Verlag, Berlin, Vol 841, pp. 143-158 (1994); P. Pevzner, Rearrangements Of DNA Sequences And SBH, Computers Chem, 18, 221 (1994)]. De novo SBH designs typically use the complete set of all possible oligonucleotide probes of a given length [W. Bains et al., A Novel Method For Nucleic Acid Sequence Determination, J Theor Biol, 135, 303 (1988); N. Broude et al., Enhanced DNA Sequencing By Hybridization, Proc Natl Acad Sci USA, 91, 3072 (1994); R. Drmanac et al., DNA Sequence Determination by Hybridization: A Strategy For Efficient Large-Scale Sequencing, Science, 260, 1649 (1993); R. Drmanac et al., Sequencing Of Megabase Plus DNA By Hybridization: Theory Of The Method, Genomics, 4, 114 (1989)]; the use of longer probes increases reconstruction accuracy but requires the use of very large arrays (>10^9 probes), since the number of required probes increases exponentially with probe length. Even assuming perfect hybridization, an SBH array containing all ~10^6 possible 10-mers would reliably be able to sequence only about 600 bp of target DNA in a single experiment. As longer target DNA sequences are attempted, the reconstruction accuracy drops precipitously. For a detailed discussion of SBH reassembly algorithms and their limitations, see [P. Pevzner et al., Towards DNA Sequencing Chips, In 19th Int Conf Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, Springer-Verlag, Berlin, Vol 841, pp. 143-158 (1994); P. Pevzner et al., Improved Chips For Sequencing By Hybridization, J Biomolecular Struct Dyn, 9, 399 (1991); P. Pevzner, Rearrangements Of DNA Sequences And SBH, Computers Chem, 18, 221 (1994)]. All de novo SBH strategies proposed thus far are direct methods; that is, they directly probe the target DNA with oligonucleotides whose sequences are then assembled into longer fragments. Since it is currently not feasible to manufacture a probe array containing more than approximately 10^6 tiled oligonucleotide probes, a direct de novo SBH approach cannot outperform electrophoretic sequencing in terms of read length and accuracy.
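
The exponential growth in array size mentioned above follows from there being four possible bases at each probe position, so a complete array of explicit k-mers needs 4^k probes. The short Python sketch below only works through that arithmetic; the function name `full_array_size` is a hypothetical label for the example.

```python
def full_array_size(k):
    """Probes needed for a complete array of all explicit k-mers (4 bases per position)."""
    return 4 ** k


for k in (8, 10, 16):
    print(f"{k}-mers: {full_array_size(k):,} probes")
```

For 10-mers this gives 1,048,576 probes, which is the "~10^6" figure quoted above, and for 16-mers it gives 4,294,967,296, well beyond the roughly 10^6 probes said to be feasible to manufacture.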
IN THE FIGURES
The present invention will become better understood with reference to the following description, appended claims, and accompanying figures where:
Figure 1. Basic concepts underlying the design of an Inference Sequencing by Hybridization (ISBH) probe array. A target, in this case a four-digit phone number, is characterized by a degenerate probe array with a four-fold redundancy using only 64 probes. The probe array does not detect any digit directly, but the information gathered is sufficient to unambiguously infer the identity of the number. A probe array based on conventional SBH designs capable of acquiring the same information would require 10,000 probes, one for each phone number.
Figure 2. General scheme for ISBH. A long single-stranded target DNA is sheared into short oligonucleotides and hybridized to an ISBH probe array. The pattern of hybridization is used to create a set of degenerate 16-mers that characterize the target DNA. Information from this degenerate set is used by an inference algorithm to produce a set of explicit 16-mers. The set of explicit 16-mers produced by the inference algorithm contains all 16-mers actually present in the target DNA sequence as well as "false positive" 16-mers that are not in the target. A data reduction algorithm is then used to eliminate the false positives from the set of explicit 16-mers. The explicit 16-mers that remain after data reduction are then reassembled at high accuracy into contiguous sequence.
Figure 3. Design of the ISBH probe array of degenerate 16-mers used in this study. The array consists of 25 different probe groups. Each group pattern represents 2^16 = 65,536 degenerate 16-mers, for a total of 1,638,400 probes. Each probe in the array represents 65,536 explicit 16-mers. Under ideal conditions, a single target 16-mer will hybridize to exactly one probe in each probe group.
Figure 4. Example of how false positives are generated by inference. Tetranucleotides from a target DNA are characterized by an ISBH probe array of degenerate 4-mers with two-fold redundancy (R = A or G; Y = C or T; W = A or T; S = C or G). The inference algorithm generates all valid combinations of data from the probe array to produce a set of nine inferred tetranucleotides, six of which are false positives. In general, the number of false positives generated by inference decreases as the number of probes used in the ISBH array increases. If the ISBH array used in this example had 16 additional probes, then no false positives would have been generated.
Figure 5. Data reduction after inference for the ISBH array of degenerate 16-mers used in this study. Seventy-six target DNA sequences downloaded from GenBank comprising a total of 2.45 million bases were tested by computer simulation. The number of 16-mers generated by inference increases as a power law function of the number of different 16-mers in the target DNA (filled circles). Data reduction reliably eliminated all but a handful of the false positives for all target lengths investigated (open triangles), even when false positives comprised more than 99% of the inferred 16-mer set.
Figure 6. Absolute performance of the ISBH sequence reassembly algorithm. The reassembly algorithm returns a target DNA as several non-overlapping fragments in the range of several hundred to several thousand bases in length. The largest fragment reconstructed in this study was 28 kilobases, and fragments longer than 10 kb were commonly observed. Reconstructed fragments always show 100% identity to some region of the target DNA.
Figure 7. Total target coverage in a simulated ISBH experiment. ISBH typically covers more than 95% of a target DNA in a single hybridization experiment in fragments that are longer than 500 bases.
Figure 8. Summary of ISBH simulation results. All lengths are in bases. Locus and definition for each sequence are shown exactly as they appear in GenBank. The number of different 16-mers in a target is defined as the number of 16-mers which have different base sequences. The fraction of target 16-mers that are repeated is given by: 1 - [(number of different 16-mers in target)/(target length - 15)], which is a quantitative measure of repetitiveness of the sequence. The fraction of the inferred set that are false positive is given by: 1 - [(number of different 16-mers in target)/(number of 16-mers in inferred set)].
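
The false-positive mechanism described for Figure 4 and the two fractions defined for Figure 8 can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the three target tetranucleotides are made up for the example (they are not the ones drawn in Figure 4), each probe group is assumed to apply one base grouping at every position, and all function names are hypothetical. The M/K group stands in for the "16 additional probes" mentioned in the Figure 4 caption.

```python
from itertools import product

# Two-base groupings (IUPAC codes): R/Y and W/S as in Figure 4, plus M/K.
GROUPINGS = {
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "WS": {"A": "W", "T": "W", "C": "S", "G": "S"},
    "MK": {"A": "M", "C": "M", "G": "K", "T": "K"},
}


def encode(kmer, grouping):
    """Degenerate probe that an explicit k-mer hits in one probe group."""
    return "".join(GROUPINGS[grouping][base] for base in kmer)


def signals(targets, groupings):
    """Set of (group, degenerate pattern) pairs detected on the array."""
    return {(g, encode(t, g)) for t in targets for g in groupings}


def infer(sig, groupings, k):
    """Explicit k-mers whose degenerate form is present in every probe group."""
    inferred = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        if all((g, encode(kmer, g)) in sig for g in groupings):
            inferred.add(kmer)
    return inferred


def false_positive_fraction(n_distinct_in_target, n_inferred):
    """1 - (different k-mers in target) / (k-mers in inferred set)."""
    return 1 - n_distinct_in_target / n_inferred


def repeated_fraction(n_distinct_in_target, target_length, k=16):
    """1 - (different k-mers in target) / (target length - (k - 1))."""
    return 1 - n_distinct_in_target / (target_length - (k - 1))


targets = {"ACGT", "GGTA", "TTCA"}            # hypothetical tetranucleotides
two_fold = ("RY", "WS")
inferred = infer(signals(targets, two_fold), two_fold, k=4)
three_fold = ("RY", "WS", "MK")
inferred_3 = infer(signals(targets, three_fold), three_fold, k=4)
print(len(inferred), len(inferred - targets), len(inferred_3 - targets))
print(false_positive_fraction(len(targets), len(inferred)))
```

With these particular inputs the two-group array infers nine tetranucleotides, six of them false positives, and adding the third group removes all six. Testing every explicit k-mer is practical here only because k = 4.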
SUMMARY OF THE INVENTION
The present invention is directed to a method that addresses the above-mentioned problems by introducing a new SBH implementation for de novo sequencing, which we term Inference Sequencing by Hybridization (ISBH). The basic concepts underlying ISBH are illustrated in Figure 1 with an example of how to determine the last four digits of a phone number without detecting any digit directly. A conventional SBH strategy would require 10^4 = 10,000 probes (one for each phone number), whereas the ISBH approach gathers the same information using only 4 x 2^4 = 64 degenerate probes. The digit groupings in Figure 1 are analogous to familiar base groupings such as purine/pyrimidine (R/Y), amino/keto (M/K), and weak/strong (W/S).
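
The phone-number analogy can be sketched in Python as follows. The actual digit groupings of Figure 1 are not reproduced in the text, so this example assumes a hypothetical choice: probe group g sorts each digit into one of two classes according to bit g of the digit's value. Any four two-way groupings that jointly distinguish all ten digits would serve the same purpose. The names `probe_for`, `hybridize`, and `infer` are likewise illustrative.

```python
from itertools import product

N_GROUPS = 4   # four-fold redundancy: four different two-way digit groupings
N_DIGITS = 4   # last four digits of the phone number


def probe_for(number, group):
    """Degenerate probe hit in `group`: one binary class per digit position.

    Assumption: group g classifies a digit by bit g of its value.
    """
    return tuple((int(d) >> group) & 1 for d in number)


def hybridize(number):
    """Signal pattern: exactly one probe lights up in each group."""
    return {(g, probe_for(number, g)) for g in range(N_GROUPS)}


def infer(signal):
    """All four-digit numbers consistent with every probe group."""
    return [
        "".join(c)
        for c in product("0123456789", repeat=N_DIGITS)
        if hybridize("".join(c)) <= signal
    ]


print(N_GROUPS * 2 ** N_DIGITS)      # size of the degenerate array
print(infer(hybridize("8675")))      # the number is recovered without reading any digit directly
```

No single probe identifies a digit, but the four class memberships together pin down each digit, which is why 64 degenerate probes can stand in for 10,000 explicit ones when a single number is being read.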
ISBH is an indirect strategy that uses several small arrays (65,536 probes each) to closely approximate the information that would be gathered from a single SBH array containing ~4.3 billion probes (Figure 2). Our strategy relies on degenerate probe arrays that are similar to binary SBH arrays proposed by Pevzner et al. [P. Pevzner et al., Towards DNA Sequencing Chips, In 19th Int Conf Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, Springer-Verlag, Berlin, Vol 841, pp. 143-158 (1994)]. Unlike conventional SBH, whose accuracy drops with increasing target length, we find that ISBH always reconstructs targets at 100% accuracy. ISBH returns a target sequence as non-overlapping fragments that range from several hundred to several thousand bases in length. The number of fragments increases with the length and repetitiveness of the target DNA, but in principle, any length of target can be sequenced in a single experiment. We have designed and empirically tested by computer simulation an ISBH array and reconstruction algorithm for use in de novo sequencing of targets up to 100 kilobases in length.
METHODS
The inventors have simulated the following laboratory experiment: 1) A single-stranded target DNA of unknown sequence is sheared into overlapping oligomers 16 bases long. 2) These oligomers are hybridized to an ISBH degenerate probe array and the pattern of hybridization is detected. 3) The hybridization data is reconstructed by computer algorithm into contiguous sequence.
An ISBH probe array containing 1.64 million different degenerate 16-mers was designed. This array is 25-fold redundant: it consists of 25 groups of 2^16 = 65,536 (64K) different probe oligomers. Figure 3 shows the identity of each degenerate probe group in the array. Each group of 64K degenerate probes is capable of hybridizing to all possible explicit 16-mers. Thus, a single explicit 16-mer will, under ideal conditions, hybridize to exactly 25 different degenerate probes in the ISBH array.
Simulations of ISBH experiments and subsequent data analysis were performed on a Silicon Graphics Origin2000 supercomputer (Boston University Center for Computational Science) with code written in C as follows: 1) Each target DNA sequence was retrieved from GenBank (http://www2.ncbi.nlm.nih.gov/genbank) and broken up into all possible component 16-mers. This set of 16-mers was then shuffled to destroy any positional information. 2) Each 16-mer from the target DNA was compared against all 1.64 million degenerate probes in the ISBH array, simulating ideal hybridization. If a 16-mer from the target was found to hybridize with one of the degenerate probes, then the hybridization was considered to be a signal arising from that particular probe in the ISBH array. 3) The 16-mers from the target DNA were then discarded, to account for the fact that in a real experiment, the target DNA is of unknown sequence. 4) The signals from the ISBH array were collected to produce a set of degenerate 16-mers, which is the non-repeated set of degenerate 16-mers present in the target DNA. We emphasize that the only data retained is the signal pattern from the ISBH array, as would be detected in an actual ISBH experiment. The degenerate set contains no information about the explicit base sequence of any 16-mer present or its position in the target DNA sequence.
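The four simulation steps may be sketched as follows. This is a minimal Python sketch and not the C code used by the inventors. It assumes the same hypothetical two-class-per-position probe layout (W/S, R/Y, M/K), and `simulate_signals` and its parameters are illustrative names only.

```python
import random

# Hypothetical two-class partitions of the four bases (IUPAC codes).
PARTITIONS = {
    "WS": {"A": "W", "T": "W", "C": "S", "G": "S"},
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "MK": {"A": "M", "C": "M", "G": "K", "T": "K"},
}

def simulate_signals(target, groups, k=16, seed=0):
    """Return the set of (group index, degenerate probe) signals for `target`."""
    # 1) Break the target into all component k-mers and shuffle them.
    kmers = [target[i:i + k] for i in range(len(target) - k + 1)]
    random.Random(seed).shuffle(kmers)
    # 2) Ideal hybridization: each k-mer lights one probe in every group.
    signals = set()
    for kmer in kmers:
        for g, group in enumerate(groups):
            probe = "".join(PARTITIONS[p][b] for p, b in zip(group, kmer))
            signals.add((g, probe))
    # 3) The explicit k-mers are discarded; 4) only the signal set is returned.
    del kmers
    return signals

toy_groups = [("WS",) * 16, ("RY",) * 16]          # two hypothetical groups
toy_signals = simulate_signals("ACGT" * 5, toy_groups)
```

Because the signals are held in a set, a probe that is hit by several target 16-mers is recorded once, matching the "non-repeated set" described above.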
Sequence Reconstruction
1) Inference. A set of explicit 16-mers is inferred from the set of degenerate 16-mers detected by the ISBH array. The inference is accomplished by testing every possible explicit 16-mer against the degenerate set. Any particular explicit 16-mer is included in the inferred set only if exactly 25 corresponding degenerate 16-mers are present in the data from the ISBH array. Under conditions of ideal hybridization, the inferred 16-mer set is always a superset of the set of 16-mers present in the target DNA sequence. The inferred set usually contains false positive 16-mers, ones which are not actually present in the original target DNA sequence. The number of these false positives increases with the length and repetitiveness of the target DNA (Figure 4).
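The inference step may be illustrated on a toy scale. The Python sketch below uses 4-mers and three hypothetical probe groups instead of 16-mers and 25 groups, so that every explicit k-mer can be enumerated quickly. The probe layout (one two-base class per position) and all names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical two-class partitions of the four bases (IUPAC codes).
PARTITIONS = {
    "WS": {"A": "W", "T": "W", "C": "S", "G": "S"},
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "MK": {"A": "M", "C": "M", "G": "K", "T": "K"},
}

def probe_for(kmer, group):
    """Degenerate probe of `group` that an explicit k-mer would hybridize to."""
    return "".join(PARTITIONS[p][b] for p, b in zip(group, kmer))

def infer_kmers(signals, groups, k):
    """Keep an explicit k-mer only if its probe is present in every group."""
    inferred = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        if all((g, probe_for(kmer, grp)) in signals
               for g, grp in enumerate(groups)):
            inferred.add(kmer)
    return inferred

K = 4                                                    # toy value; the text uses 16
GROUPS = [("WS",) * K, ("RY",) * K, ("MK",) * K]         # toy groups; the text uses 25
target = "ACGTTGCAAC"
true_kmers = {target[i:i + K] for i in range(len(target) - K + 1)}
signals = {(g, probe_for(x, grp)) for x in true_kmers
           for g, grp in enumerate(GROUPS)}

inferred = infer_kmers(signals, GROUPS, K)
false_positives = inferred - true_kmers   # k-mers consistent with the signals but absent from the target
```

A false positive arises when each of its probes is lit by a different true k-mer, which is why the inferred set is a superset of the true set and why later data reduction is needed.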
2) Primary Data Reduction. Since the inferred set of explicit 16-mers contains an unknown number of false positives, a data reduction step is required to eliminate them. Every 16-mer in the inferred set is examined to determine if it overlaps by six bases at least one other 16-mer in the inferred set on both its 3' and 5' ends. Unless both overlaps are found, the 16-mer is discarded. This procedure is repeated iteratively on the resultant set of 16-mers, each time examining an overlap one base longer than was used for the previous iteration, until an overlap of fifteen bases is reached. The set that remains is then iterated using fifteen-base overlaps until four or fewer 16-mers are discarded at each iteration.
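One possible reading of the primary data reduction is sketched below in Python. The sketch interprets "overlap" as an m-base 5' end that matches the m-base 3' end of some other k-mer, and vice versa. The function names and the toy parameters (k = 5, starting overlap 2) are assumptions chosen so that the example runs instantly.

```python
from collections import defaultdict

def reduce_once(kmers, m):
    """Keep k-mers that overlap another k-mer by m bases on both ends."""
    by_prefix, by_suffix = defaultdict(set), defaultdict(set)
    for x in kmers:
        by_prefix[x[:m]].add(x)
        by_suffix[x[-m:]].add(x)
    kept = set()
    for x in kmers:
        upstream = by_suffix.get(x[:m], set()) - {x}     # others ending with x's 5' m bases
        downstream = by_prefix.get(x[-m:], set()) - {x}  # others starting with x's 3' m bases
        if upstream and downstream:
            kept.add(x)
    return kept

def primary_reduction(kmers, k=16, start=6, stop_at=4):
    """Run passes with overlaps start..k-1, then repeat k-1 until few are discarded."""
    current = set(kmers)
    for m in range(start, k):
        current = reduce_once(current, m)
    while True:
        reduced = reduce_once(current, k - 1)
        discarded = len(current) - len(reduced)
        current = reduced
        if discarded <= stop_at:
            return current

# Toy example: 20 true 5-mers plus one stray false positive.
target = "ACGTAGCTTGCAAGGTCCATGATC"
true_kmers = {target[i:i + 5] for i in range(len(target) - 4)}
survivors = primary_reduction(true_kmers | {"CCCCC"}, k=5, start=2)
```

Each pass also removes true k-mers near the ends of the target, since those lack a neighbour on one side. This may be why the procedure stops once only a few k-mers are discarded per iteration rather than iterating until nothing changes.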
3) Secondary Data Reduction. All possible reconstruction ambiguities are eliminated by comparing the 3'-fifteen bases of each 16-mer with the 3'-fifteen bases of every other 16-mer in the set from step 2 above. If two or more 16-mers are found to have identical fifteen base 3'-ends, then they are all discarded from the data set. A similar procedure is used to compare the 5'-fifteen bases of the 16-mers and eliminate any duplication.
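A minimal Python sketch of the secondary data reduction follows. It applies the 3' comparison first and the 5' comparison second. That ordering is an assumption, since the text does not say whether the two comparisons are sequential or simultaneous, and `secondary_reduction` is an illustrative name.

```python
from collections import Counter

def secondary_reduction(kmers):
    """Discard every k-mer that shares its 3' or 5' (k-1)-base end with another."""
    kmers = set(kmers)
    # Count identical 3'-terminal (k-1)-mers and drop all members of any tie.
    three_prime = Counter(x[1:] for x in kmers)
    kept = {x for x in kmers if three_prime[x[1:]] == 1}
    # Same test on the 5'-terminal (k-1)-mers of what remains.
    five_prime = Counter(x[:-1] for x in kept)
    return {x for x in kept if five_prime[x[:-1]] == 1}

# Toy 4-mers: two share the 3' end "CGT", two share the 5' end "TTA".
toy = {"ACGT", "CCGT", "TTAA", "TTAC", "GGGG"}
unambiguous = secondary_reduction(toy)
```

After this step each remaining k-mer has at most one possible successor and one possible predecessor, so reassembly cannot branch.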
4) Sequence Reassembly. The 16-mers remaining in the inferred set are then assembled into longer sequences. This is done by comparing the 3'-fifteen bases of each 16-mer to the 5'-fifteen bases of every other 16-mer in the data set. If a match is found, then the two 16-mers are combined into a single 17-mer. The two terminal 15-mers on either side of this newly formed 17-mer are compared for overlap with the remaining 16-mers in the data set and the process is repeated until no more overlaps are found. To ensure reconstruction accuracy, only fragments at least 100 bases in length were considered to be part of the target sequence.
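The reassembly step may be sketched as a greedy extension, shown below in Python. The sketch assumes its input has already passed secondary data reduction, so that every (k-1)-base end is unique. The function name `assemble` and the toy parameters are illustrative assumptions.

```python
def assemble(kmers, k=16, min_len=100):
    """Join k-mers by (k-1)-base overlaps and return fragments of at least min_len."""
    kmers = set(kmers)
    by_prefix = {x[:-1]: x for x in kmers}   # 5' (k-1)-mer -> k-mer
    by_suffix = {x[1:]: x for x in kmers}    # 3' (k-1)-mer -> k-mer
    unused = set(kmers)
    fragments = []
    while unused:
        frag = unused.pop()
        # Extend toward the 3' end one base at a time.
        while True:
            nxt = by_prefix.get(frag[-(k - 1):])
            if nxt is None or nxt not in unused:
                break
            unused.remove(nxt)
            frag += nxt[-1]
        # Extend toward the 5' end one base at a time.
        while True:
            prev = by_suffix.get(frag[:k - 1])
            if prev is None or prev not in unused:
                break
            unused.remove(prev)
            frag = prev[0] + frag
        if len(frag) >= min_len:
            fragments.append(frag)
    return fragments

# Toy example with 4-mers from a 10-base target.
toy_target = "ACGGTCATTG"
toy_kmers = {toy_target[i:i + 4] for i in range(len(toy_target) - 3)}
toy_fragments = assemble(toy_kmers, k=4, min_len=5)
```

The `min_len` filter plays the role of the 100-base threshold described above; short fragments are dropped rather than reported.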
RESULTS
Inference Algorithm
ISBH simulations were performed on 76 target sequences obtained from GenBank comprising a total of 2.45 million bases, ranging from 5 to 100 kilobases in length. The size of the inferred 16-mer pool increases as a power law function of the number of different 16-mers in the target DNA. Data reduction reliably eliminates all but a handful of the false positives for all target lengths investigated, even when false positives accounted for more than 99% of the inferred 16-mer pool (Figure 5). The set of inferred 16-mers remaining after data reduction closely approximates the information that would be gathered from an SBH array containing all explicit 16-mers (~4.3 billion probes).
Sequence Reconstruction
The ISBH reconstruction algorithm typically returns a target DNA as several non-overlapping fragments that are in the range of several hundred to several thousand bases in length. Reconstructed fragments always show 100% identity (by BLASTN) to some region of unknown position in the target DNA sequence. The largest single fragment reconstructed was 28 kb, and fragments longer than 10 kb were commonly observed (Figure 6). In most cases, more than 95% of the target DNA was recovered in a simulated ISBH experiment (Figure 7). ISBH performs optimally on target DNAs having no repeated 16-mers, generally returning a handful of long (3-15 kb) fragments. Even on sequences with many repeated 16-mers, ISBH returns dozens of fragments shorter than 5 kb, which is five to ten times the performance of electrophoretic sequencing methods. Target DNAs longer than 50 kb tend to produce large numbers of false positives in the inference step, a few of which remain after data reduction. False positives introduce ambiguities during reassembly, leading to lower average lengths for reconstructed fragments. A comprehensive list of each sequence analyzed as well as a summary of the ISBH simulation results are shown in Figure 8.
DETAILED DESCRIPTION OF THE INVENTION
While this invention is satisfied by embodiments in many different forms, there will herein be described preferred embodiments of the invention, with the understanding that the present disclosure is to be considered exemplary of the principles of the invention and is not intended to limit the invention to the embodiments illustrated and described. The scope of the invention will be measured by the appended claims and their equivalents.
A benefit of the present invention, in contrast to conventional SBH, is that the inventors' ISBH method has the potential to sequence very long targets at high accuracy, using an oligonucleotide array of moderate size. The hypothetical ISBH array studied here could easily sequence 15-45 kb of DNA in a single experiment. The ISBH method requires no electrophoresis, no information about the target DNA, and could be used for diagnostic as well as de novo applications. In a single experiment, ISBH could generate more sequence data than two dozen Sanger sequencing reactions after shotgun subcloning of a target DNA. In the best cases, each fragment reconstructed by the ISBH method can outperform electrophoretic methods by 28-fold. In the worst cases, ISBH performance is equivalent to electrophoretic methods. ISBH reconstruction of a single target DNA generally required less than 10 minutes of supercomputer time to complete. The computational complexity of the inference step is of order N^2, while the data reduction and reassembly steps are of order N log(N). Sequence reconstruction using a highly streamlined ISBH algorithm running on a typical desktop computer could be completed in a few hours.
For DNA of random sequence, a given 16-mer should appear once every 4^16 ≈ 4.3 × 10^9 bases, a 15-mer once in 4^15 ≈ 10^9 bases, and a 14-mer once in 4^14 ≈ 2.7 × 10^8 bases. We note however, that DNAs from a wide variety of organisms in the range of 10-100 kb typically have hundreds or thousands of repeated 16-mers. As shorter subsequences are examined, the number of repeated subsequences increases dramatically. For example, the 48.5 kb genome of bacteriophage lambda, which has no repeated 16-mers, has a single repeated 15-mer, and ten distinct 14-mers appearing more than once. This would suggest that any form of de novo SBH using oligonucleotide probes shorter than 16 bases will perform poorly on target DNAs longer than a few kilobases. ISBH appears to perform optimally both in terms of absolute read lengths and relative target coverage on DNAs in the range of 25 kb that have small numbers of repeated 16-mers. ISBH is a scalable technique - the number of false positives generated by inference increases as the number of probes used in the ISBH array decreases. An ISBH array smaller than the one examined here (e.g., 12 probe groups using 12 × 2^16 = 7.86 × 10^5 probes) would still sequence with 100% accuracy, but would return a target as shorter fragments.
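The figures quoted above can be checked with a few lines of Python, and the same sketch shows one simple way to count repeated k-mers in a sequence. The helper `repeated_kmers` and the toy sequence are illustrative assumptions and are not data from the simulations.

```python
from collections import Counter

# Number of distinct explicit k-mers for k = 16, 15 and 14.
kmer_space = {k: 4 ** k for k in (16, 15, 14)}

# Size of the smaller 12-group array mentioned in the text.
small_array = 12 * 2 ** 16

def repeated_kmers(seq, k):
    """Return the set of k-mers that occur more than once in `seq`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}

toy_repeats = repeated_kmers("ACGTACGA", 3)   # "ACG" occurs at positions 0 and 4
```

Running `repeated_kmers` on a real sequence for k = 14, 15 and 16 would be one way to reproduce the kind of repeat counts reported for bacteriophage lambda.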
While ISBH under ideal conditions would appear to provide an enormous gain for de novo sequencing over conventional SBH and electrophoretic methods, several daunting technical obstacles remain. Each degenerate probe is actually a mixture of many individual probes that are bound to the same area in an SBH array. The binding capacity of such a degenerate probe is greatly reduced in comparison to a pure individual probe - for the hypothetical ISBH array in this study, the complexity of each degenerate probe is 65,536. Such a high probe complexity may mean that accurate physical hybridization cannot be achieved with a high signal to noise ratio. Possible solutions to this problem include the use of base analogs to decrease probe complexity or the addition of an enzymatic step (e.g., ligation) to augment the accuracy of simple physical hybridization.
Noise contamination of the data set, particularly in terms of false negatives, must be studied in greater detail. False positives are easily dealt with in the data reduction step, but false negatives (target 16-mers that never appear at all in the inferred data set) will have the effect of lowering the mean fragment length during reconstruction. Aberrant hybridization also increases the complexity of data processing needed for reliable sequence reconstruction; the upper bound for robust sequence reconstruction from an actual ISBH implementation is likely to be somewhat lower than the ideal situation presented here. An ISBH sequencing approach would be very effective for rapid analysis of viral and bacterial genomes which are essentially non-repeating. Sequencing of double-stranded target DNAs by ISBH is also possible, as is sequencing of a mixture of targets. For double-stranded DNA, ISBH performance is equivalent to the case of a single-stranded DNA twice as long. If a double-stranded target is cleaved by a restriction endonuclease before hybridization, ISBH will return the sequence of each restriction fragment. If this experiment is repeated using a restriction endonuclease with a different recognition site, then the fragments can be aligned relative to one another using standard contig reassembly algorithms. The potential of the ISBH strategy is so strong that we are now investigating strategies to implement it in practice.
Another embodiment of the ISBH Probe Array, Experimental, and Data Analysis Algorithm Design consists of the following:
Probe Array Design. The proposed array consists of 768K oligonucleotide probes divided into three groups: 1) all 256K possible 9-base single-stranded sequences, 2) all 256K possible 9-base 5 '-overhanging partial duplexes, and 3) all 256K possible 9-base 3 '-overhanging partial duplexes.
Target Preparation. The target DNA must be single stranded - it may be prepared (for example) by long PCR of a double stranded target using one primer that is biotinylated at its 5' end to facilitate purification from the other strand. The biotinylated strand may be captured on streptavidin-coated beads or column.
Experimental Design. Eight distinct oligonucleotides in two groups are required as follows. Group I: 5'-ANNNNNNN-3', 5'-CNNNNNNN-3', 5'-GNNNNNNN-3', 5'-TNNNNNNN-3'. Group II: 5'-P-NNNNNNNA-Sdd-3', 5'-P-NNNNNNNC-Sdd-3', 5'-P-NNNNNNNG-Sdd-3', and 5'-P-NNNNNNNT-Sdd-3'.
P denotes a phosphate group, and Sdd denotes a 3'-dideoxy base connected by a phosphorothioate linkage. Sixteen separate reactions are then performed: each oligonucleotide from group I is combined with an oligonucleotide from group II and then hybridized to the single-stranded target under conditions favoring accurate base-pairing. After hybridization has occurred, the oligonucleotides still remaining in solution must be removed (for example, by size-exclusion column chromatography). DNA ligase (and all necessary cofactors) is added to the reaction mixture. After the ligation reaction, exonuclease III is added to the reaction mixture to destroy any unligated oligonucleotides that are still hybridized to the target. Under ideal conditions, the ligation products will be 16-mers of the form:
5'-ANNNNNNNNNNNNNNA-Sdd-3' (and all 15 other permutations of the end bases). The ligation products from each of the sixteen reactions are then hybridized separately to a probe array as described above.
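As a simple illustration, the sixteen pairings of group I and group II oligonucleotides, and the general form of their ligation products, can be enumerated in Python. The 5' phosphate and the Sdd modification are omitted from the strings, and the variable names are illustrative only.

```python
from itertools import product

BASES = "ACGT"
group_one = [b + "N" * 7 for b in BASES]   # 5'-XNNNNNNN-3'
group_two = ["N" * 7 + b for b in BASES]   # 5'-P-NNNNNNNY-Sdd-3', modifications omitted

# One reaction per (group I, group II) pair.
reactions = list(product(group_one, group_two))

# Ligation joins the 3' end of the group I oligo to the 5' phosphate of the group II oligo.
ligation_products = [left + right for left, right in reactions]
```

Each reaction therefore selects the target 16-mers with one particular pair of terminal bases, which splits the full set of target 16-mers into sixteen classes before array hybridization.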
Data Analysis. Each of the sixteen probe array hybridization experiments described above generates the following data: a set of 9-mers from probe group 1, a set of 9-mers from probe group 2, and a set of 9-mers from probe group 3. The following algorithm is used to expand the data: compare each 9-mer from group 2 to each 9-mer from group 3. If the 9-mer from group 2 has the same 3' two bases as the 5' two bases of the 9-mer in group 3 (i.e., they have a two-base overlap), then combine the two 9-mers to form a single 16-mer (concatenate the 3' seven bases of the group 3 oligo to the 3' end of the group 2 oligo). This newly formed 16-mer is retained only if all eight of its 9-base subsequences are present in group 1. This analysis is performed for all probe array experiments and all retained 16-mers are collected into a single set. This set of 16-mers (which is the inferred set of explicit 16-mers from the target) is then subjected to the same data reduction and sequence reconstruction algorithms that we have previously described for ISBH.
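The expansion algorithm for a single probe array experiment may be sketched in Python as follows. The function name `expand_to_16mers` and the toy data are illustrative assumptions; the sketch follows the text in taking the 5' nine bases from group 2 and the 3' nine bases from group 3.

```python
from collections import defaultdict

def expand_to_16mers(group1, group2, group3):
    """Join group 2 and group 3 9-mers by a two-base overlap, then verify against group 1."""
    by_first_two = defaultdict(list)
    for right in group3:
        by_first_two[right[:2]].append(right)
    retained = set()
    for left in group2:
        for right in by_first_two.get(left[-2:], []):
            candidate = left + right[2:]          # 9 + 7 = 16 bases
            # Retain only if all eight 9-base subsequences were seen in group 1.
            if all(candidate[i:i + 9] in group1 for i in range(8)):
                retained.add(candidate)
    return retained

# Toy data: one true 16-mer plus a decoy 9-mer in group 3.
true_16mer = "ACGTACGGTCATTGCA"
group1 = {true_16mer[i:i + 9] for i in range(8)}
group2 = {true_16mer[:9]}
group3 = {true_16mer[7:], "GTAAAAAAA"}
inferred_16mers = expand_to_16mers(group1, group2, group3)
```

In the toy data the decoy shares the two-base overlap "GT" but yields a candidate whose internal 9-mers are absent from group 1, so it is rejected.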
Accordingly, this invention is not limited to the particular embodiments disclosed, but is intended to cover all modifications that are within the spirit and scope of the invention as defined by the appended claims.

Claims

What is claimed is:
1. A method of testing a nucleic acid target, comprising: a) fragmenting a single-stranded target DNA into single-stranded target DNA fragments; b) testing said fragments so as to generate a signal for each fragment; c) calculating a first set of N-mers from said signals, each of said N-mers having a sequence with 3' and 5' ends; d) comparing a portion of the nucleic acid sequence of each of said N-mers of said first set with a portion of the nucleic acid sequence of every other N-mer in said first set for sequence overlap; and e) eliminating each N-mer that is found not to display said overlap, so as to create a second set of N-mers.
2. The method of Claim 1, further comprising the steps: f) comparing a portion of the nucleic acid sequence of each of said N-mers in said second set with a portion of the nucleic acid sequence of every other N-mer in said second set for sequence overlap, wherein the portion compared has a length in bases defined by N-1; and g) identifying an instance where said portion of one N-mer from said second set is found to overlap with said portion of another N-mer from said second set, thereby identifying first and second overlapping N-mers.
3. The method of Claim 1, wherein non-target DNA fragments are added to said single-stranded target DNA fragments prior to step (b).
4. The method of Claim 2, wherein N is 16 and said portion compared in step (f) is fifteen bases in length.
5. The method of Claim 4, wherein said fifteen bases compared in step (f) are the 3'-fifteen bases of each 16-mer with the 5'-fifteen bases of every other 16-mer in said second set.
6. The method of Claim 5, further comprising the step (h) constructing a 17-mer from the combination of the sequences of said first and second overlapping N-mers of step (g).
7. A method of testing a nucleic acid target, comprising: a) fragmenting a single-stranded target DNA into single-stranded fragments; b) testing each of said fragments so as to generate a signal for each fragment; c) calculating a first set of N-mers from said signals, each of said N-mers having a sequence with 3' and 5' ends; d) comparing a portion of the nucleic acid sequence of each of said N-mers of said first set with a portion of the nucleic acid sequence of every other N-mer in said first set for sequence overlap; e) eliminating each N-mer that is found not to display said overlap, so as to create a second set of N-mers; f) comparing a portion of the nucleic acid sequence of each of said N-mers in said second set with a portion of the nucleic acid sequence of every other N-mer in said second set for sequence overlap, wherein the portion compared has a length in bases defined by N-1; and g) identifying an instance where said portion of one N-mer from said second set is found to overlap with said portion of another N-mer from said second set, thereby identifying first and second overlapping N-mers.
8. The method of Claim 7, wherein non-target DNA fragments are added to said single-stranded target DNA fragments prior to step (b).
9. The method of Claim 8, wherein N is 16 and said portion compared in step (f) is fifteen bases in length.
10. The method of Claim 9, wherein said fifteen bases compared in step (f) are the 3'-fifteen bases of each 16-mer with the 5'-fifteen bases of every other 16-mer in said second set.
11. The method of Claim 10, further comprising the step (h) constructing a 17-mer from the combination of the sequences of said first and second overlapping N-mers of step (g).
12. A method of testing a nucleic acid target, comprising: a) fragmenting a single-stranded target DNA into single-stranded fragments; b) adding non-target DNA fragments to said single-stranded target DNA fragments to create a mixture; c) testing each of said fragments so as to generate a signal for each fragment; d) calculating a first set of N-mers from said signals, each of said N-mers having a sequence with 3' and 5' ends; e) comparing a portion of the nucleic acid sequence of each of said N-mers of said first set with a portion of the nucleic acid sequence of every other N-mer in said first set for sequence overlap; f) eliminating each N-mer that is found not to display said overlap, so as to create a second set of N-mers; g) comparing a portion of the nucleic acid sequence of each of said N-mers in said second set with a portion of the nucleic acid sequence of every other N-mer in said second set for sequence overlap, wherein the portion compared has a length in bases defined by N-1; and h) identifying an instance where said portion of one N-mer from said second set is found to overlap with said portion of another N-mer from said second set, thereby identifying first and second overlapping N-mers.
13. The method of Claim 12, wherein N is 16 and said portion compared in step (g) is fifteen bases in length.
14. The method of Claim 13, wherein said fifteen bases compared in step (g) are the 3'-fifteen bases of each 16-mer with the 5'-fifteen bases of every other 16-mer in said second set.
15. The method of Claim 14, further comprising the step (i) constructing a 17-mer from the combination of the sequences of said first and second overlapping N-mers of step (h).
PCT/US1998/022519 1997-10-24 1998-10-23 Inference sequencing by hybridization WO1999022025A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU11185/99A AU1118599A (en) 1997-10-24 1998-10-23 Inference sequencing by hybridization

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US6310397P 1997-10-24 1997-10-24
US60/063,103 1997-10-24

Publications (1)

Publication Number Publication Date
WO1999022025A1 true WO1999022025A1 (en) 1999-05-06

Family

ID=22046957

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/022519 WO1999022025A1 (en) 1997-10-24 1998-10-23 Inference sequencing by hybridization

Country Status (2)

Country Link
AU (1) AU1118599A (en)
WO (1) WO1999022025A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5202231A (en) * 1987-04-01 1993-04-13 Drmanac Radoje T Method of sequencing of genomes by hybridization of oligonucleotide probes
US5525464A (en) * 1987-04-01 1996-06-11 Hyseq, Inc. Method of sequencing by hybridization of oligonucleotide probes
US5723320A (en) * 1995-08-29 1998-03-03 Dehlinger; Peter J. Position-addressable polynucleotide arrays
US5759779A (en) * 1995-08-29 1998-06-02 Dehlinger; Peter J. Polynucleotide-array assay and methods
US5683881A (en) * 1995-10-20 1997-11-04 Biota Corp. Method of identifying sequence in a nucleic acid target using interactive sequencing by hybridization
US5763263A (en) * 1995-11-27 1998-06-09 Dehlinger; Peter J. Method and apparatus for producing position addressable combinatorial libraries

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BROUDE N. E., ET AL.: "ENHANCED DNA SEQUENCING BY HYBRIDIZATION.", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, NATIONAL ACADEMY OF SCIENCES, US, vol. 91., 1 April 1994 (1994-04-01), US, pages 3072 - 3076., XP002916360, ISSN: 0027-8424, DOI: 10.1073/pnas.91.8.3072 *
DRMANAC R., ET AL.: "DNA SEQUENCE DETERMINATION BY HYBRIDIZATION: A STRATEGY FOR EFFICIENT LARGE-SCALE SEQUENCING.", SCIENCE, AMERICAN ASSOCIATION FOR THE ADVANCEMENT OF SCIENCE, US, vol. 260., 11 June 1993 (1993-06-11), US, pages 1649 - 1652., XP002916361, ISSN: 0036-8075, DOI: 10.1126/science.8503011 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7504213B2 (en) * 1999-07-09 2009-03-17 Agilent Technologies, Inc. Methods and apparatus for preparing arrays comprising features having degenerate biopolymers
US6376184B1 (en) * 2000-01-13 2002-04-23 Hamamatsu Photonics K.K. Method for gene analysis
WO2002079496A3 (en) * 2001-03-28 2004-01-08 Applied Minds Inc Method and sequences for determinate nucleic acid hybridization
US6949340B2 (en) 2001-03-28 2005-09-27 Creative Mines Llc Optical phase modulator
US7537892B2 (en) 2001-03-28 2009-05-26 Searete Llc Method and sequences for determinate nucleic acid hybridization

Also Published As

Publication number Publication date
AU1118599A (en) 1999-05-17

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA