WO1998015652A1 - Nucleic acid sequencing by adaptator ligation - Google Patents

Publication number
WO1998015652A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
adaptor
sequencing
fragments
nucleic acid
Prior art date
Application number
PCT/GB1997/002734
Other languages
French (fr)
Inventor
Gunter Schmidt
Andrew Hugin Thompson
Original Assignee
Brax Genomics Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Brax Genomics Limited
Priority to AU45663/97A (patent AU4566397A)
Publication of WO1998015652A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6872: Methods for sequencing involving mass spectrometry

Definitions

  • the invention relates to a method for sequencing nucleic acid, especially DNA.
  • modified nucleotides can be used to terminate polymerase extension of nucleic acid being copied from a template strand.
  • To determine the sequence of a template strand four dideoxynucleotides are needed corresponding to the four normal bases.
  • Template strands are added to a polymerisation medium containing all four normal nucleotides and one of the four dideoxynucleotides, which is labeled, usually with a fluorescent dye.
  • the dideoxynucleotide in each medium is at a concentration such that it has a small but defined probability of being incorporated into an extending copy of the template rather than its corresponding normal nucleotide. This terminates chain extension for this fragment. If all the fragments in a particular medium are separated on a sequencing gel, which resolves nucleic acids to a difference in length of one nucleotide, fragments corresponding to termination at every occurrence of the base to which the dideoxynucleotide corresponds should be observed. If a gel for each base is run then there should be observed a band for each nucleotide in the template and the nucleotide sequence should be determined.
  • the cleavage reagents are, however, not totally specific, so this is a fairly 'noisy' system. It suffers from the same problems as the chain termination method too.
  • the method of Harding and Keller acts on immobilised DNA templates.
  • the templates are single stranded and constructed from analogues of normal bases bearing unique fluorescent labels.
  • the immobilised templates are cleaved with a 5' to 3' exonuclease to release bases into a flowing medium that is run through a fluorimeter that detects which base is present at the highest concentration in the medium.
  • the exonucleases cleave off bases from each copy of a nucleic acid simultaneously, so the signal at any time should correspond essentially to the last base cleaved.
  • the sequence of bases flowing past the fluorimeter thus corresponds to the sequence of the fluorescent template.
  • This sort of approach has the potential to be extremely rapid, limited only by the processivity of the exonuclease used, which could be of the order of 100 to 1000 bases a second.
  • This method requires base analogs that are distinguishable by some means from each other, preferably by fluorescence. These analogs must be incorporated into a template as normal bases and be cleaved by an exonuclease as normal bases. Alternatively, polymerases and exonucleases must be engineered to recognise the analogs. Either way is a major technical obstacle. Furthermore, the requirement for simultaneity of cleavage of corresponding bases from a population of immobilised templates allows only a small margin of error, which will severely limit the length of sequence that can be determined by this approach. These technical obstacles remain to be overcome.
  • Arrays of single-stranded oligonucleotides can be constructed representing for example, every possible combination of the 4 bases in an 8 bp oligonucleotide. Each point on an array would correspond to one such oligonucleotide.
  • a single-stranded template with a fluorescent label can be hybridised to such an array. Every overlapping linear sequence of 8 bp that is contained in the template will be represented on the array and the template should hybridise to every point corresponding to each 8 bp sequence that defines it. The fluorescence from every point on the array can then be determined and the sequence of the target reconstructed.
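As a rough illustration of the array approach described above, the sketch below computes the set of overlapping 8-mers that a template would hybridise to and rebuilds the template by chaining 8-mers that overlap by 7 bases. The function names (`array_size`, `spectrum`, `reconstruct`) are illustrative and not taken from the source. The sketch assumes an error-free hybridisation signal and a template in which no 7-mer occurs twice, which real data would not guarantee.

```python
K = 8  # probe length used in the text


def array_size(k=K):
    """Number of points on an array holding every possible k-mer."""
    return 4 ** k


def spectrum(template, k=K):
    """Set of overlapping k-mers contained in the template."""
    return {template[i:i + k] for i in range(len(template) - k + 1)}


def reconstruct(kmers):
    """Chain k-mers by (k-1)-base overlaps.

    Assumption: every (k-1)-mer prefix is unique, so the path is unambiguous.
    """
    kmers = set(kmers)
    suffixes = {m[1:] for m in kmers}
    # The first k-mer is the one whose prefix is not the suffix of any other.
    starts = [m for m in kmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("ambiguous or circular spectrum")
    by_prefix = {m[:-1]: m for m in kmers}
    k = len(starts[0])
    seq = starts[0]
    for _ in range(len(kmers) - 1):
        nxt = by_prefix.get(seq[-(k - 1):])
        if nxt is None:
            raise ValueError("spectrum does not form a single chain")
        seq += nxt[-1]  # each step contributes one new base
    return seq
```

For 8-mers the array holds 4^8 = 65,536 points, which is the figure `array_size()` returns.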
  • the present invention provides a method for sequencing nucleic acid, which comprises:
  • the label may be any label suitable for the purpose, such as a fluorescent label or a mass label.
  • Each label may comprise a mass label associated with a corresponding known base sequence for identifying the corresponding base sequence in mass spectrometry.
  • each adaptor oligonucleotide is labelled with an associated mass label which is uniquely resolvable in mass spectrometry from the other labelled adaptor oligonucleotides.
  • each adaptor oligonucleotide is composed of nucleotide analogues which are resistant to fragmentation in the mass spectrometer.
  • each mass label is cleavably attached to its corresponding adaptor oligonucleotide and uniquely resolvable in mass spectrometry.
  • the mass label may be attached to the adaptor oligonucleotide by a cleavable linker and may be cleaved under any appropriate cleavage conditions such as photocleavage conditions or chemical cleavage conditions.
  • the mass spectrometry may be effected using a mass spectrometer with orthogonal time of flight or array detector geometry.
  • the fragments are contacted in step (i) with the array of adaptor oligonucleotides in a cycle wherein the cycle comprises sequentially contacting each adaptor oligonucleotide of the array with the fragments.
  • the target nucleic acid population is subjected to sorting into sub-populations according to their sticky end sequences and each of the sub-populations is subjected to steps (b) and (c).
  • each fragment may be produced by differential application.
  • the predetermined length of this base sequence of the sticky ends is from 3 to 5, more preferably 4.
  • the sequencing enzyme preferably comprises a type IIs restriction endonuclease.
  • the target nucleic acid population may comprise heterogeneous nucleic acid fragments. The other end of each fragment may be protected by ligation with an immobilisation adaptor oligonucleotide.
  • FIGURE 1 shows a cloning vector for template sequences
  • FIGURE 2 shows PCR amplification of template DNA
  • FIGURE 3 shows immobilisation of amplified template DNA
  • FIGURE 4 shows a method of differential amplification of template
  • FIGURE 5 shows a method of producing protected DNA fragments with termini for sequencing
  • FIGURE 6 shows the action of FokI
  • FIGURE 7 shows cutting behaviour of typical adapters according to the invention
  • FIGURE 8 shows an adapter cycle according to one embodiment of the invention.
  • FIGURE 9 shows graphs of the effects of PEG and Ficoll on ligation at ATTA and GCCG.
  • This invention provides a method capable of simultaneously sequencing a heterogeneous population of nucleic acid fragments.
  • the technology is compatible with numerous methods of template preparation.
  • the invention however provides a novel preferred strategy for sequencing large DNA molecules that limits the need for biological sub-cloning hosts and vectors.
  • the sequencing process may be summarised:
  • the sequencing method described here allows one to produce nucleic acid fragment populations in a reproducible manner that can then be sorted into subsets and finally sequenced by an oligonucleotide adapter based technique.
  • the sequencing method described requires double stranded templates.
  • the sequencing technique is most effective with immobilisation of the nucleic acid templates at one terminus; the other terminus must be accessible to adapters.
  • the sequencing steps use adapter molecules to generate and probe the sequence of terminal single-stranded overlaps of immobilised nucleic acid fragments.
  • Single-stranded overlaps are generated in a cyclical process, preferably through the use of type IIs restriction endonucleases.
  • Recognition sites for these enzymes are provided by adapters at the terminus of a template sequence. The position of the recognition site is arranged so that digestion in the presence of the type IIs endonuclease exposes an ambiguous sticky end in the unknown sequence of the template.
  • the resultant ambiguous sticky ends generated in template sequences are probed as heterogeneous sets and sequence information is determined by measuring the quantity of label detected from correctly hybridised adapters.
  • the sequence of individual fragments is determined by comparing quantities of label for each adapter in each cycle of the sequencing process with quantities derived in previous and subsequent cycles.
  • the invention provides a method for analysing heterogeneous sub-populations of nucleic acids without spatially resolving them. This is achieved by a signal acquisition and signal processing procedure that allows sequences to be identified on the basis of their relative quantities.
  • This process does not require traditional gel methods to acquire sequence information. Since the entire process takes place in solution and is an iterative process, the steps involved could be performed by a liquid-handling robot or a microfluidics system.
  • Type IIs restriction endonucleases have the property that they recognise and bind to a specific sequence within a target DNA molecule, but they cut at a defined distance away from that sequence, generating single-stranded sticky-ends of known length but unknown sequence at the cleavage termini of the restriction products.
  • the enzyme FokI generates an ambiguous sticky-end of 4 bp, 9 bp downstream of its recognition sequence.
  • This ambiguous sticky-end could thus be one of 256 possible 4-mers (see figure 6).
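The cutting geometry just described can be sketched in a few lines. This is a minimal model, not the patent's procedure: it treats the template as a top-strand string, uses the commonly documented FokI recognition sequence GGATG (the text above does not state the site), and reports the four template positions spanned by the staggered cut in top-strand coordinates. The function and constant names are illustrative.

```python
from itertools import product

FOKI_SITE = "GGATG"  # assumption: commonly documented FokI site, not given in the text
CUT_OFFSET = 9       # bases between the site and the top-strand cut (from the text)
OVERHANG_LEN = 4     # length of the ambiguous sticky end (from the text)


def all_sticky_ends(n=OVERHANG_LEN):
    """Every possible sticky-end sequence of length n."""
    return ["".join(p) for p in product("ACGT", repeat=n)]


def foki_overhang(top_strand):
    """Return the 4 bases exposed by cleavage, or None if there is no usable site."""
    pos = top_strand.find(FOKI_SITE)
    if pos < 0:
        return None
    start = pos + len(FOKI_SITE) + CUT_OFFSET
    overhang = top_strand[start:start + OVERHANG_LEN]
    # A site too close to the end of the molecule exposes no complete overhang.
    return overhang if len(overhang) == OVERHANG_LEN else None
```

For example, `foki_overhang("GGATG" + "A" * 9 + "CTGA" + "TTTT")` returns `"CTGA"`, one of the `len(all_sticky_ends()) == 256` possibilities.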
  • a preferred method for use with this invention requires fragmentation of the target nucleic acid followed by molecular sorting into sub-populations that are small enough to allow simultaneous sequencing.
  • This preferred method requires the use of adapters.
  • Two adapter based sorting methods are described here. One requires the use of a type IIs restriction endonuclease or a similar system for generating ambiguous sticky-ends in double stranded DNA while the second uses a primer and DNA amplification based approach to sorting.
  • Nucleic acids may be fragmented in numerous ways which may be either directed or random. For the purposes of sequencing large nucleic acids, an approach which generates numerous fragments that overlap randomly is favoured in conventional sequencing strategies for large nucleic acid templates, due to its redundancy and relative simplicity. Obviously this sort of approach requires multiple copies of the target nucleic acid to ensure that all sequences are represented in the population of fragments with unambiguous overlaps with other fragments.
  • Random fragmentation will work excellently with certain embodiments of this invention, but to reduce redundant sequencing, more controlled fragmentation of a target nucleic acid could be used.
  • a set of relatively high stringency restriction endonucleases can be used to generate sets of overlapping fragments. In this way one would hope to generate overlapping contigs in a more economical manner than random fragmentation.
  • Random fragmentation can be achieved with mild digestion of the target nucleic acid by DNase I or sonication. Generally, blunt-ended fragments are generated by this approach.
  • the DNA is first sub-cloned into a library.
  • the process of producing a library of this sort can be done in-house or by commercially available services, such as that provided by Clontech.
  • the DNA is fragmented (e.g. by restriction enzyme digestion or sonication) to sizes in the range of a few hundred bases and then sub-cloned into a cloning vector of choice. Because each fragment in the library is flanked by the same vector sequence, a standard set of flanking PCR primers can be used to PCR amplify each fragment. Using the same PCR primers for each fragment also helps to normalise the efficiency of each PCR reaction, as primer sequence is one of the most important factors affecting amplification efficiency (see figure 1).
  • the library is then transfected into an appropriate bacterial strain and the bacteria plated out onto selective agar plates.
  • Individual colonies are then picked by a colony picking robot (such robots are commercially available).
  • Each picked colony is then spiked into a unique PCR reaction, set up on a microtitre plate for example, and each fragment is PCR amplified using the standard primer set which flanks the insert.
  • One of the primers used in this reaction must be biotinylated, which will allow the subsequent capture of the amplified fragment (see figure 2).
  • a known amount of each of the amplified fragments can be captured on a streptavidin coated surface by its biotinylated primer.
  • a specific amount of PCR product can be captured. (This does, however, rely on all the primers being incorporated into the amplification products. This should only require a simple primer titration optimisation experiment as PCR reactions using clones are highly efficient.)
  • Different protocols can be used for this purpose, for example streptavidin coated magnetic beads or streptavidin coated wells of a microtitre plate.
  • beads which will bind 1 pmol of biotin per µl of beads
  • adding 5 µl of beads and the appropriate buffer to the PCR reaction will capture 5 pmol of the amplified fragment.
  • the use of beads also allows the capture of different quantities of individual amplified fragments. By adding differing amounts of beads to separate amplification reactions prior to pooling them, one can, for example, create a heterogeneous population with 1 pmol of fragment 1, 4 pmol of fragment 2, 10 pmol of fragment 3 and so on.
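The bead arithmetic above is simple enough to script when setting up many reactions. The sketch below assumes the quoted capacity of 1 pmol of biotin per microlitre of beads and assumes, as the text does, that every biotinylated primer ends up in a full-length product. The names `bead_volume_ul` and `pooling_plan` are hypothetical.

```python
BINDING_CAPACITY_PMOL_PER_UL = 1.0  # capacity quoted in the text


def bead_volume_ul(target_pmol, capacity=BINDING_CAPACITY_PMOL_PER_UL):
    """Volume of beads (in microlitres) needed to capture target_pmol of fragment."""
    return target_pmol / capacity


def pooling_plan(targets_pmol):
    """Map each fragment name to the bead volume to add before pooling."""
    return {name: bead_volume_ul(pmol) for name, pmol in targets_pmol.items()}


# The heterogeneous population used as an example in the text.
plan = pooling_plan({"fragment 1": 1, "fragment 2": 4, "fragment 3": 10})
```

Here `plan` asks for 1, 4 and 10 microlitres of beads respectively, so the pooled population carries the fragments in a 1:4:10 ratio.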
  • streptavidin coated wells of a microtitre plate could also be used by transferring each amplification reaction to a unique well of the microtitre plate.
  • Commercially available streptavidin coated plates usually have a maximum binding capacity of between 5 to 20 pmol of biotin. Therefore, the amount of amplified fragment captured in each well is determined by the binding capacity of that plate.
  • quantification of template must be stringent. This can be achieved by labelling one of the primers. After the amplification reaction has been performed, the biotinylated fragments can be captured on an avidinated substrate and washed. The number of copies of template present can be determined by measuring the retained fluorescence. Appropriate dilution of the amplified template can be performed if desired before sequencing. This gives an additional level of control over and above the capture steps.
  • the vector sequences at either terminus of the DNA can be designed to bear distinct primer sequences. This would ensure that one terminus can be readily identified as a sequencing terminus and one terminus could be designated as the immobilisation terminus. A unique termination sequence at the immobilisation terminus would identify when the clone had been sequenced.
  • Type IIs restriction endonuclease sites in the immobilisation terminus sequence can additionally permit molecular sorting.
  • the sequencing terminus can be engineered to carry a recognition sequence for the type IIs restriction endonuclease to be used as the sequencing endonuclease.
  • These vector sequences can additionally provide primer sequences to permit amplification of template and amplification based sorting.
  • the precise method of ligation will depend on how the fragments are generated. If fragments of the target nucleic acid are generated by using a type IIs restriction endonuclease, adapters with sticky-ends complementary to subsets of the possible sticky-ends that would be generated by the fragmentation endonuclease can be ligated to the resultant fragments. These adapters could carry designed primer sites that would allow much greater control of the amplification step. The combinations of subsets of sticky-ends of the primer adapters will determine which subsets of fragments are amplified and how large these subsets of fragments are. This will allow much greater control over the PCR amplification steps. See Figures 4 and 5.
  • the parallel sequencing process described here lends itself to this sort of cloning strategy because of its ability to simultaneously sequence heterogeneous populations without spatial resolution of nucleic acids which conventional sequencing strategies cannot achieve. This means if a set of primers generated more than one fragment, the ability to sequence multiple templates simultaneously would allow one to determine the sequence without having to separate the amplified fragments.
  • nucleic acid can be fragmented initially with the sequencing enzyme. This will generate 3 classes of fragments: one class with the sequencing enzyme recognition site at one terminus only, one class with the sequencing enzyme recognition site at both termini and a third class with the site at neither terminus.
  • a complete adapter set i.e. corresponding to all sticky-ends, can be added to the restriction fragment population.
  • the adapter would bear the recognition site for the sequencing enzyme. Addition of the sequencing enzyme to a population of fragments with these adapters can have two results. If a given terminus has a recognition site already then the sequencing enzyme can cleave either at the adapter site or at the more internal site. There is a 50 % chance of either cleavage event occurring. At other sites where there is no internal site, clearly, terminal bases must be lost by this process. Since with each round of this process only half of the internal sites will be removed, the process must be repeated at least 7 times to ensure removal of the sequencing enzyme recognition sequence from at least 99 % of the fragments in the population. Thus fragment size may be significantly reduced if a sequencing enzyme is used that cleaves at a significant distance from its recognition site.
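The figure of seven rounds follows from halving the surviving internal sites each round. A minimal sketch of that arithmetic, assuming each round independently leaves an internal site intact with probability 0.5 as stated above (the function names are illustrative):

```python
import math


def fraction_remaining(rounds, p_survive=0.5):
    """Fraction of internal recognition sites still present after the given rounds."""
    return p_survive ** rounds


def rounds_needed(target_removed=0.99, p_survive=0.5):
    """Smallest number of rounds that removes at least target_removed of the sites."""
    remaining = 1.0 - target_removed
    return math.ceil(math.log(remaining) / math.log(p_survive))
```

After six rounds 1/64 (about 1.6 %) of sites remain, which misses the 99 % target; after seven rounds 1/128 (about 0.8 %) remain, so `rounds_needed()` gives 7.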
  • the pair of adapters shown can be ligated to a fragment generated by Sau3AI.
  • the first adapter bears a recognition sequence for FokI while the second adapter bears a restriction site for BsuRI.
  • BsuRI is methylation sensitive and generates blunt-ended fragments. If one synthesises template DNA with 5-methyl cytosine but uses adapters with ordinary DNA, only the adapter will be cleaved; this will leave fragments amenable to blunt end ligation.
  • Adapter 1 provides immobilisation and the recognition site for the sequencing endonuclease. A simple protocol for generating distinct termini would be as follows:
  • the first step is fragmentation of a large number of copies of a large nucleic acid, preferably with an ordinary type II restriction endonuclease, such as Sau3AI, to generate known sticky ends.
  • fragments can then be ligated to adapters. If the fragments are treated with ligase in the presence of the two types of adapters above, this will generate fragments of three types: fragments with both ends carrying adapter 1, fragments with both ends carrying adapter 2 and thirdly fragments carrying adapter 1 at one end and adapter 2 at the other. Statistically the third type of fragment will be in the majority.
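The expected proportions of the three fragment types can be estimated with a simple independence model. The sketch below assumes each fragment end picks up an adapter independently and that both adapters ligate with equal efficiency; neither assumption is stated in the text, and the function name is illustrative.

```python
def ligation_classes(frac_adapter1=0.5):
    """Expected fraction of fragments in each class for a given share of adapter 1."""
    p = frac_adapter1
    q = 1.0 - p
    return {
        "adapter 1 at both ends": p * p,
        "adapter 2 at both ends": q * q,
        "adapter 1 and adapter 2": 2 * p * q,  # either orientation
    }
```

With equal amounts of the two adapters this model puts the mixed class at one half of all fragments and each same-adapter class at one quarter, so the mixed class is the largest single class.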
  • the fragments carrying adapter 1 can be immobilised on a solid phase matrix derivatised with avidin.
  • the fragments carrying adapter 2 at both ends can be washed away and those fragments carrying two immobilisation adapters will also be immobilised.
  • the immobilised fragments can be removed from the solid phase matrix.
  • Biotin/streptavidin interactions can be disrupted by acid.
  • Fragments that bore both adapters can be captured by the new terminus generated by cleavage of adapter 2. Capture requires a further adapter which can be immobilised, allowing fragments with adapter 1 at both termini to be washed away. Alternatively the 'capture' adapter can introduce a primer sequence. Adapter 1 can additionally provide a known primer sequence to permit the captured fragments to be differentially amplified. A more complex protocol which allows molecular sorting could be achieved using adapter 2 to provide a second type IIs recognition site (BspMI in the example below).
  • If a cleavage step after immobilisation is performed with this enzyme, it would generate fragments with ambiguous sticky ends at one terminus, which can again be captured by adapters complementary to the sticky ends generated, but one can select at this stage which sticky ends to capture.
  • An entire array of all possible adapters can be generated to allow all fragments to be captured and isolated.
  • a hybridisation array on a glass surface would allow spatial sorting.
  • An alternative method would use the adapter sequence to perform differential amplification.
  • the 'capture' adapter used above can provide another terminal type IIs restriction endonuclease site. This will allow another set of ambiguous sticky-ends to be generated allowing further sub-sorting until the nucleic acid fragment population is of the correct size for unambiguous sequence determination.
  • The sorting process above generates, for a 4 bp ambiguous sticky-end, 256 sub-populations. This may generate nucleic acid populations small enough to begin sequencing, or further sub-sorting may be necessary.
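The sorting step can be pictured as binning fragments by the sequence of their exposed sticky end. In the sketch below each fragment is modelled as a string whose first four characters stand for the sticky end; that representation and the function names are assumptions made for illustration only.

```python
from collections import defaultdict


def sort_by_sticky_end(fragments, n=4):
    """Group fragments into sub-populations keyed by their n-base sticky end."""
    bins = defaultdict(list)
    for frag in fragments:
        bins[frag[:n]].append(frag)
    return dict(bins)


def max_subpopulations(n=4, rounds=1):
    """Upper bound on the number of bins after repeated sorting on n-base ends."""
    return (4 ** n) ** rounds


bins = sort_by_sticky_end(["ACGTTTGACC", "ACGTGGCATA", "TTAGCCGATC"])
```

One round on 4 bp ends gives at most 256 bins; a second round of sub-sorting raises the bound to 65,536.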
  • the actual sequencing method is essentially sequencing by hybridisation and can be understood first by explaining it for the case of a single nucleic acid.
  • a single nucleic acid immobilised at one terminus to a solid phase substrate, and which has an adapter at the other terminus bearing the recognition site for the type IIs restriction endonuclease chosen for sequencing. Digestion in the presence of FokI will generate a 4 bp ambiguous sticky-end.
  • an adapter molecule: this would be an oligonucleotide carrying a sticky-end with one known sequence of 4 bp of the possible 256.
  • the adapter would additionally carry a label, e.g. a fluorescent tag, and a binding site for the desired type IIs restriction endonuclease to be used to sequence the immobilised nucleic acid. If the adapter is complementary to the ambiguous end of the target nucleic acid, it will hybridise and it will then be possible to ligate the adapter to the target. The immobilised matrix can then be washed to remove any unbound adapter.
  • the terminus of the target nucleic acid will also carry a binding site for the sequencing endonuclease that will allow cleavage of the target nucleic acid, exposing further bases for analysis, and the above process can be repeated for the next 4 bp of the target. This iterative process can be repeated until the entire target nucleic acid has been sequenced.
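The single-template cycle can be mimicked in code as a loop of expose, probe and advance. This is a toy model under several assumptions: the template is a string read from the free terminus, ligation is perfectly specific, and each cleavage advances exactly 4 bases. The helper names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
ADAPTER_ENDS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 probes


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def probe(sticky_end):
    """Return the one adapter sticky end that is complementary to the exposed end."""
    hits = [a for a in ADAPTER_ENDS if a == revcomp(sticky_end)]
    return hits[0]


def sequence_single_template(template, advance=4):
    """Read a template 4 bases per cycle by identifying the ligated adapter."""
    reads, pos = [], 0
    while pos + 4 <= len(template):
        adapter = probe(template[pos:pos + 4])  # only the matching adapter ligates
        reads.append(revcomp(adapter))          # its label reveals the 4 bases
        pos += advance                          # cleavage exposes the next bases
    return "".join(reads)
```

Any trailing stretch shorter than 4 bases is left unread in this sketch.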
  • a more effective method of labelling appropriate for use with this invention is 'mass labelling'.
  • Cleavable mass labels are described in patent GB9700746.2. This describes methods for generation and use of labels that are readily detectable in a mass spectrometer. Mass labelling permits the generation of large numbers of labels. This would permit 256 labels to be generated allowing all 256 probes for a 4 base pair overlap to be tested simultaneously rather than sequentially. This has advantages in a hybridisation based sequencing method as a competitive binding system avoids some of the problems of different binding energies of different 4 base sequences.
  • GB 9700746.2 describes tagging of nucleic acid probes with cleavable mass labels.
  • These labels may be cleaved from the probe at various stages in a probing assay, but a preferred method of cleavage is during the ionisation process.
  • various methods are possible. After the exposed sticky ends of a template are probed with labelled adapters one is left, after washing away non-ligated adapters, with a template terminated with a labelled adapter.
  • labels that are photocleavable, thermo-labile or acid labile, for example, which can be removed at this stage and analysed in a mass spectrometer.
  • one can cleave the adapter from the template with the appropriate type IIs restriction endonuclease whose recognition site is provided by the adapter.
  • the cleaved adapter can be analysed in a mass spectrometer and the mass label can be cleaved during ionisation.
  • Non-cleavable mass labels are also appropriate for analysis of the cleaved adapter. One needs only use sufficient labels to resolve adapters with the same mass in the mass spectrum.
  • a short liquid chromatography step with a denaturing solvent would allow the tagged strand to be separated from the untagged strand.
  • HPLC or capillary electrophoresis separations would be appropriate. Such a separation would probably not be necessary, though.
  • Denaturing the cleaved adaptor might be quite desirable, however. After cleavage, both strands of the adaptor will be extended by 4 bp.
  • the probe strand will be extended by 4 unknown bases.
  • the non-probe strand will be extended by 4 bases complementary to the probe overlap of the probe strand, and so will have a known mass, hence is the preferred strand for mass labelling.
  • Certain 4-mers have the same composition (GGCC, GCCG, GCGC, etc.) and need to be resolved, as these will all give the same peaks in a mass spectrum. One need only add sufficient mass to resolve these uniquely.
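To see how many adapters collide in mass, one can group all 256 4-mers by base composition. The sketch below does only that counting; it assumes that mass depends on composition alone and says nothing about how the extra distinguishing mass would be attached. The function names are illustrative.

```python
from collections import defaultdict
from itertools import product


def composition(seq):
    """Counts of A, C, G and T in a sequence, in that order."""
    return tuple(seq.count(base) for base in "ACGT")


def same_mass_groups(n=4):
    """Group every n-mer with the others that share its base composition."""
    groups = defaultdict(list)
    for p in product("ACGT", repeat=n):
        seq = "".join(p)
        groups[composition(seq)].append(seq)
    return dict(groups)


groups = same_mass_groups()
```

The 256 4-mers fall into 35 composition groups. The group with two G and two C holds six sequences (CCGG, CGCG, CGGC, GCCG, GCGC, GGCC), and the largest group, one of each base, holds 24, so at most 24 distinguishable mass variants are needed within any one group.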
  • Type IIs restriction endonucleases are also known to sometimes cleave at incorrect positions. Such cleavages should also be identifiable with this approach as an extra base or one base too few will give a shifted mass spectral peak. This should again allow improvement in quantitation.
  • the positioning of the recognition site for the sequencing endonuclease in the adapter will determine whether the next 4 bp exposed are the next 4 bp in the sequence. Or they may overlap partially with the last four base pairs thus giving partially redundant information or they may be further downstream missing out a few bases, thus only sampling the sequence of the immobilised target nucleic acid.
  • sequential bases can be exposed with adapter 1 while bases are sampled at intervals by adapter 2.
  • With adapter 3 redundant information is acquired. (Adapter nucleic acid is shown in bold while FokI binding sites are underlined.) Whatever spacing is used, the spatial information relating the 4 bp oligonucleotides is retained.
  • redundant sequence data is desirable from the template nucleic acid in order to relate sequence information from each round of sequencing to the last round. This gives a small amount of redundancy, hence adapter 3 in figure 7 below is a preferred adapter construct.
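The value of a small overlap between rounds can be shown by stitching successive 4-base reads together and checking that the shared bases agree. The sketch assumes a one-base overlap between consecutive rounds; the actual overlap of adapter 3 depends on the figure, which is not reproduced here, so the default is illustrative.

```python
def merge_reads(reads, overlap=1):
    """Join successive reads, verifying that each one overlaps the previous read."""
    seq = reads[0]
    for read in reads[1:]:
        if overlap and seq[-overlap:] != read[:overlap]:
            raise ValueError(f"read {read!r} does not continue {seq!r}")
        seq += read[overlap:]
    return seq
```

For example, `merge_reads(["ACGT", "TGCA", "AGGC"])` gives `"ACGTGCAGGC"`, while a read that fails the overlap check raises an error instead of being silently appended. In a heterogeneous population the same check helps decide which read from one round continues which read from the previous round.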
  • this invention also envisions algorithms for analysing such a data matrix.
  • the algorithm attempts to identify a sequence on the basis of its frequency, i.e. a sequence present at a given frequency will have every sub-sequence present at the same frequency.
  • the algorithm searches through each column of the matrix and attempts to resolve label quantities, that may be sums of sequence frequencies into atomic quantities such that the same set of atomic quantities appear in all columns.
  • the algorithm achieves this by comparing label quantities in a given column with those in all the other columns. A given atomic quantity that appears in all columns is then assumed to correspond to a unique sequence.
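The following is a minimal sketch of one way such an analysis could work, not the algorithm of the patent itself. Each column of the data matrix is modelled as a dictionary from 4-mer to measured label quantity for one sequencing cycle. The sketch assumes noise-free quantities, that every fragment's quantity appears on its own in at least one column, and that no fragment's quantity equals a sum of other fragments' quantities. All names are hypothetical.

```python
from itertools import combinations


def find_atoms(columns, tol=1e-6):
    """Observed quantities that are not sums of two or more smaller atoms."""
    observed = sorted({q for col in columns for q in col.values()})
    atoms = []
    for q in observed:
        is_sum = any(abs(sum(c) - q) < tol
                     for r in range(2, len(atoms) + 1)
                     for c in combinations(atoms, r))
        if not is_sum:
            atoms.append(q)
    return atoms


def decompose(value, atoms, tol=1e-6):
    """Smallest set of atoms that adds up to value."""
    for r in range(1, len(atoms) + 1):
        for combo in combinations(atoms, r):
            if abs(sum(combo) - value) < tol:
                return combo
    raise ValueError(f"quantity {value} cannot be resolved into atoms")


def resolve(columns, tol=1e-6):
    """Map each atomic quantity to the sequence read for it across all cycles."""
    atoms = find_atoms(columns, tol)
    reads = {a: [] for a in atoms}
    for col in columns:
        for kmer, quantity in col.items():
            for a in decompose(quantity, atoms, tol):
                reads[a].append(kmer)  # this 4-mer belongs to every atom in the sum
    return {a: "".join(parts) for a, parts in reads.items()}


# Three fragments present at quantities 1, 2 and 4, read over three cycles.
columns = [
    {"ACGT": 3, "GGCA": 4},  # fragments 1 and 2 share ACGT in cycle 1
    {"TTGA": 5, "CCGA": 2},  # fragments 1 and 3 share TTGA in cycle 2
    {"CCAA": 1, "GGTT": 6},  # fragments 2 and 3 share GGTT in cycle 3
]
sequences = resolve(columns)
```

Here `sequences` maps quantity 1 to ACGTTTGACCAA, quantity 2 to ACGTCCGAGGTT and quantity 4 to GGCATTGAGGTT. Choosing fragment quantities such as 1, 2 and 4 keeps every sum unique, which is what makes the decomposition unambiguous in this toy case.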
  • Ligation of adaptors is another critical aspect of the invention that must be considered.
  • Chemical methods of ligation are known:
      o Ferris et al, Nucleosides and Nucleotides 8, 407 - 414, 1989
      o Shabarova et al, Nucleic Acids Research 19, 4247 - 4251, 1991
  • enzymatic ligation would be used and preferred ligases would be T4 DNA ligase, T7 DNA ligase, E. coli DNA ligase, Taq ligase, Pfu ligase and Tth ligase.
  • References to the literature are given below:
      o Lehman, Science 186, 790 - 797, 1974
      o Engler et al, 'DNA Ligases', pg 3 - 30 in Boyer, editor, 'The Enzymes, Vol 15B', Academic Press, New York, 1982
  • Protocols for use of ligases can be found in:
      o Sambrook et al, cited above
      o Barany, PCR Methods and Applications, 1: 5 - 16, 1991
      o Marsh et al, Strategies 5, 73 - 76, 1992
  • Numerous type IIs restriction endonucleases exist and could be used as sequencing enzymes for this process. Table 1 below gives a list of examples but is by no means comprehensive. A literature review of restriction endonucleases can be found in Roberts, R., J. Nucl. Acids Res. 18, 2351 - 2365, 1988. New enzymes are discovered at an increasing rate and more up to date listings are recorded in specialist databases such as REBase, which is readily accessible on the internet using software packages such as Netscape or Mosaic and is found at the World Wide Web address: http://www.neb.com/rebase/.
  • REBase lists all restriction enzymes as they are discovered and is updated regularly, moreover it lists recognition sequences, isoschizomers of each enzyme, manufacturers and suppliers and references to them in scientific literature.
  • the protocol would be much the same irrespective of the type IIs restriction endonuclease used, but the spacing of recognition sites for a given enzyme within an adaptor would be tailored according to requirements and the enzyme's cutting behaviour (see figure n above).
  • Preferred embodiments of the process will use adaptors labeled with fluorescent labels. Detection of fluorescent signals can be performed using optical equipment that is readily available. Fluorescent labels usually have optimum frequencies for excitation and then fluoresce at specific wavelengths in returning from an excited state to a ground state. Excitation can be performed with lasers at specific frequencies and fluorescence detected using collection lenses, beam splitters and signal distribution optics. These direct fluorescent signals to photomultiplier systems, which convert optical signals to electronic signals that can be interpreted using appropriate electronics systems.
  • a favourable approach is to synthesise the sample molecule with appropriate isotopes to give a slightly different mass spectrum, for a molecule with the same chemical behaviour.
  • This approach might be less desirable than external standards for use with large numbers of mass labels due to the added expense of finding or synthesising appropriate internal standards, but will give better quantification than external standards.
  • An alternative to isotope labelling is to identify a molecule that has similar but not identical chemical behaviour as the sample in the mass spectrometer. Finding such analogues is difficult and is a significant task for large families of mass labels.
  • the configuration of the instrument is critical to determining the actual ion count itself, particularly the ionisation method and the separation method used.
  • Certain separation methods act as mass filters, like the quadrupole, which only permits ions with a particular mass/charge ratio to pass through at one time. This means that a considerable proportion of sample never reaches the detector.
  • most mass spectrometers only detect one part of the mass spectrum at a time. Given that a large proportion of the mass spectrum may be empty or irrelevant but is usually scanned anyway, this means a further large proportion of the sample is wasted.
  • Mattauch-Herzog geometry sector instruments permit this but have a number of limitations.
  • Sector instruments are organised into distinct regions, 'sectors', that perform certain functions.
  • the ionisation chamber feeds into a free sector which feeds into an 'electric sector'.
  • the electric sector basically 'focusses' the ion beam, which is divergent after leaving the ion source.
  • the electric sector also ensures the ion stream has the same energy. This step results in the loss of a certain amount of sample.
  • This focussed ion beam then passes through a second free area into a magnetic sector which splits the beam on the basis of its mass/charge ratio.
  • the magnetic sector behaves almost like a prism.
  • a photographic plate can be placed in front of the split beam to measure the intensities of the spectrum at all positions.
  • With a family of well characterised mass labels one would probably monitor only sufficient peaks to sample all the mass labels unambiguously.
  • array detectors would allow one to monitor a number of regions of the mass spectrum simultaneously and continuously, which might be applicable for use with well characterised mass label families.
  • the limit on the resolution of closely spaced regions of the spectrum might restrict the number of labels one might use, though, if array detectors are chosen.
  • SIM 'selected ion monitoring'
  • the orthogonal time of flight mass spectrometer: this geometry allows for very fast sampling of an ion stream followed by almost instantaneous detection of all ion species.
  • the ion current leaving the source, probably an electrospray source for many biological applications, passes an electrode plate perpendicular to the current.
  • This plate is essentially an electrical gate and is used to generate a repulsive potential which deflects the ion current 'orthogonally' into a time of flight mass analyser that uses a reflectron.
  • the reflectron is essentially a series of circular electrodes that generate an increasingly repulsive electromagnetic field that normalises ion energies and reflects the ion stream into a detector.
  • the reflectron is a simple device that greatly increases the resolution of TOF analysers. Ions leaving the ion source will have different energies. Faster ions penetrate the repulsive field further than ions with a lower energy and so are delayed slightly with respect to the lower energy ions; since the faster ions reach the reflectron slightly earlier, the delay compensates, and all the ions of a given mass/charge ratio arrive at the detector at roughly the same time.
  • when the electrical gate is 'closed' to deflect ions into the TOF analyser, the timer is triggered. The flight time of the deflected ions is recorded and this is sufficient to determine their mass/charge ratio.
  • the gate generally only sends a short pulse of ions into the TOF analyser at any one time. Since the arrival of all ions is recorded and since the TOF separation is extremely fast, the entire mass spectrum is measured effectively simultaneously. Furthermore, the gate electrode can sample the ion stream at extremely high frequencies, so very little sample is required. For these reasons this geometry is extremely sensitive, to the order of a few femtomoles.
  • PCR product Three different PCR products are used to represent 3 different templates at different frequencies.
  • the PCR products used for this are exons 14, 16 and 19 of the anion exchanger (AE1), as these PCRs have already been optimised in our laboratory. These are referred to as AE14, AE16 and AE19.
  • AE16 will be at half the concentration of AE14 and AE19 will be at one fifth the concentration of AE14.
  • AE14 sequence ccaaagctgggagagaacagaatgccttggttttctgctgcagatcttccaggaccacccactacagaagac
  • FAM - CTAGAGGACGATCGA GGATG . GATC . TTCCAGGACCACC ... GATCTCCTGCTAGCT . CCTAC . CTAG . AAGGTCCTGGTGG ...
  • FAM - CTAGAGGACGATCGA GGATG . GATC . TGAGACTCCAGGAATAT . GATCTCCTGCTAGCT . CCTAC . CTAG .ACTCTGAGGTCCTTATA...
  • FAM - CTAGAGGACGATCGA GGATG . GATC .ATCTGCCTGGCAG . GATCTCCTGCTAGCT . CCTAC . CTAG . TAGACGGACCGTC .
  • FAM - CTAGAGGACGATCGA GGATG . GATC . TTCCA
  • FAM - CTAGAGGACGATCGA GGATG. GATC. TGAGA
  • the cleaved fragments are then captured, through ligation, to 3 different wells of a microtitreplate each containing a specific adaptor simulating the first cycle of a sequencing reaction, providing the first 4 bases. See below for full sequences
  • TCT N-CGTCG .
  • GTCC GTCC
  • N is a number of bases
  • TCT CAGGACCTTCTAG .
  • Biotin-N-GCAGC AGA. GGAGTCTCAGATC . CATCC .AGCTAGCAGGAGATC
  • N-CGTCG N-CGTCG .
  • TCT CCTCAGAGTCTAG .
  • GTAGG TCGATCGTCCTCTAG -FAM
  • N-CGTCG N-CGTCG .
  • TCT GTCCGTCTACTAG .
  • GTAGG GTAGG .
  • concentration can be measured through fluorescence of the FAM label and the first 4 bases (XXXX) determined. Successful ligation, measured by fluorescence therefore provides concentration information and the first 4 bases of each fragment .
  • the 'Bbv' adaptors were bound to black, streptavidin coated 96 well microtitre plates (Boehringer Mannheim). This was achieved by incubating 10 pmol of the appropriate adaptor in 35 ul of 1x TE + 0.1 M NaCl in each well overnight at 4°C. Following the overnight incubation each well was washed 3 times with 50 ul of 1x TE + 0.1 M NaCl. The 1x TE + 0.1 M NaCl was removed, 50 ul of 1x ligase buffer was added to each well and the plate was stored at 4°C until used.
  • BioFAMFok adaptor was bound to 8 wells by incubating 10 pmol of the adaptor in 25 ul of 1x TE + 0.1 M NaCl in each well overnight at 4°C. Following the overnight incubation each well was washed 3 times with 50 ul of 1x TE + 0.1 M NaCl. A dilution series of BioFAMFok (5, 2.5, 1.25, 0.675, 0.3375 pmol) diluted in 1x TE + 0.1 M NaCl was added to a series of wells and the fluorescence of the plate read in a Biolumin Microtiter plate Reader (Molecular Dynamics).
  • the 3 PCR products used to represent sequence templates at different concentrations were exons 14, 16 and 19 from the human erythrocyte anion exchanger gene located on chromosome 17q21-22. Primer sequences used to amplify exons 14, 16 and 19
  • biotin into one of the primers in each set will allow their capture to streptavidin coated beads (Dynal UK) .
  • reaction mix was heated at 65°C in a Techne Dryblock for 20 minutes to inactivate the enzyme.
  • DynaBeads M280 will bind 60-120 pmol of biotinylated double stranded DNA.
  • 300 ul of DynaBeads M280 at 1 mg/ml were washed with 100 ul TES by holding the beads to the side of an eppendorf tube with a Magnetic Particle Concentrator (Dynal UK) so that the supernatant could be removed. This was repeated three times. (All subsequent bead manipulations were carried out in this manner according to the manufacturer's instructions.)
  • the beads were resuspended in 100 ul of TES and the Sau 3A digested DNA added and incubated at room temperature for 1 hour to allow the biotinylated DNA to bind to the beads.
  • the beads/DNA were then washed three times with 1x ligase buffer using the Magnetic Particle Concentrator (Dynal UK) as before.
  • the beads/DNA were washed 2 times with 75 ul of 1x Fok I buffer, then resuspended in 100 ul of 1x Fok I buffer and heated at 65°C in a Techne Dryblock for 20 minutes to inactivate any remaining ligase.
  • the buffer was removed and the beads/DNA resuspended in 95 ul of 1x Fok I buffer containing 20 units of Fok I (New England Biolabs).
  • the beads/DNA were then incubated at 37°C for 2 hours. Following incubation the supernatant, containing the fragments cleaved by Fok I, was transferred to a fresh eppendorf tube and heated at 65°C for 20 minutes in a Techne Dryblock to inactivate the Fok I.
  • Fok I fragments were then divided into three tubes, each containing 30 ul of Fok I cleaved fragments, 5 ul of 10x ligase buffer, 3 ul ligase (at 400 units/ul, New England Biolabs) and 12 ul of H2O.
  • the ligase buffer on a plate containing adaptors Bbv14, 16, 19 in separate wells was removed and the above reaction mixtures, containing the Fok I cleaved fragments and ligase, added to each.
  • the reading obtained from the Bbv16 well should be half (i.e. 50%) of that obtained from the Bbv14 well and, as there is one fifth the amount of exon 19 compared to exon 14 (6 pmol exon 19, 30 pmol exon 14), the reading obtained from the Bbv19 well should be one fifth (i.e. 20%) of that obtained from the Bbv14 well.
  • this process is capable of separating a mixed population of DNA and identifying 4 bp, while at the same time maintaining the relative proportions of the original mixture with minimal errors. The captured fragments can in turn be reprobed to obtain another 4 bp and the associated quantitative data.
  • the ligation reaction is a critical step in this sequencing technology. Therefore, full optimisation of this reaction is required to ensure success with these techniques.
  • the conditions for the ligation reaction have been investigated by ligating fluorescently (FAM) labelled adaptors to biotinylated adaptors captured to a streptavidin coated microtitre plate.
  • the biotinylated adaptors consist of a GC rich and AT rich type having the 4 base pair overhang sequence CGGC and TAAT respectively. These represent the extremes of GC and AT hybridisation and are therefore used to determine the conditions required to equalise their differing hybridisation kinetics.
  • reaction time increases the amount of FAM labelled adaptor ligated, as expected.
  • a reaction time of 60 minutes will be impractical for the proposed techniques.
  • these reactions do not contain any agents which promote ligation through intra molecular crowding such as polyethylene glycol (PEG) or ficol.
  • PEG polyethylene glycol
  • the intra molecular crowders PEG, ficol and hexamine chloride were titrated to investigate their effects on ligation.
  • Tetramethylammonium chloride, which modifies Watson and Crick base pairing, was also titrated to investigate its effect on the differing efficiency of ligation of AT and GC rich adaptors. 5 pmol of adaptor was ligated for 10 minutes at 16°C.
  • Figure 9 shows a graph representing the effect that increasing Ficol concentration has on the efficiency of ligating FAM labelled GCCG adaptor (series 1) to captured CGGC target adaptor and FAM labelled ATTA adaptor (series 2) to captured TAAT target adaptor .
  • Ficol has much less of an effect on the efficiency of these reactions as compared to PEG (see below) and therefore will be of less use in helping to equalise the efficiency of ligation between AT and GC rich adaptors .
  • Figure 9 also shows a graph representing the effect that increasing PEG concentration has on the efficiency of ligating FAM labelled ATTA adaptor to captured TAAT target adaptor.
  • Figure 9 also shows a graph representing the effect that increasing PEG concentration has on the efficiency of ligating FAM labelled GCCG adaptor to a captured CGGC target adaptor.
  • the ATCA mismatched adaptor does not ligate to any measurable degree .
  • the presence of the C in the ATCA adaptor must therefore disrupt the base pairing completely thereby preventing any ligation.
  • the ATAA adaptor ligates at only 6.7% of the level of the ATTA adaptor.
  • the replacement of the T with an A in this mismatch therefore disrupts base pairing to a lesser degree than a C and therefore allows some ligation.
  • the ligation of this mismatched adaptor is completely displaced by the presence of any unlabelled specific ATTA adaptor.
  • the mismatched adaptors do ligate, in contrast to the AT rich ones, which do not.
  • the amount of ligation achieved is reduced to 23% for the GCAG and 10% for the GCGG adaptors.
  • a sequencing reaction by this method involves repeated cycles of cleaving a template with a type IIs restriction endonuclease whose recognition sequence is provided by an adaptor. If the reaction is performed with multiple templates then each cycle of the sequencing reaction will generate a signal for a series of n-mers. Many cycles of the reaction will generate a matrix of n-mers which must be analysed to reconstruct the sequences of the source templates.
  • the program operates by first analysing the data matrix to identify in each column of the matrix, corresponding to one cycle of the sequencing reaction, n-mer frequencies or quantities which are equivalent in other columns of the matrix given a predefined margin of error in the measurement of n-mer quantities within which to operate.
  • the raw n-mer frequencies in the data matrix are then replaced with their probable group frequencies in each column .
  • This new data matrix is then analysed by a second algorithm which assumes that there should be the same number of n-mers in each column of the matrix and attempts to resolve any 'sums' of frequencies where the same n-mer has occurred in more than one template in a given cycle of the sequencing reaction.
  • This algorithm takes the group frequencies in the data matrix and generates a sorted 'frequency list' that lists the number of occurrences of each group frequency in order of increasing number of occurrences .
  • the algorithm then takes group frequencies with the lowest number of occurrences first on the assumption that these are likely to be sums, since sums of groups should occur with a relatively low frequency.
  • An alternative would be to generate a sorted list of group frequencies, in order of decreasing quantity, and start with the largest quantities, again on the assumption that these are likely to be sums.
  • the algorithm tests each frequency in the list against each column of the original data matrix. If the group frequency occurs in the column, it is tested against all combinations of pairs of group frequencies that are missing from the column to see if any of these missing frequencies can add up to give the current frequency being tested. If any of these missing frequencies do add up, and there is only one pair that can add up within the predetermined margin of error, then it is assumed that the larger frequency is the sum of the two missing frequencies, and the larger frequency is replaced in the current column of the data matrix by occurrences of the two missing frequencies. Any frequencies that are the sum of two pairs of missing frequencies are marked as such and in the final sequence reconstruction the bases are marked as unknown.
  • the SeqMatrix data structure stores the matrix of n-mers generated by a sequencing reaction.
  • noise subtraction algorithms would be needed and an algorithm to normalise frequencies in each column to account for progressive decrease in signal with each cycle of the sequencing reaction that will result from the fact that no enzymatic step will be 100% efficient.
  • newList = AppendElement(newList, NewElement(meanFreq, groupCount));
  • tempList5 = tempList5->next
  • tempList4 = tempList4->next
  • tempList3 = tempList3->next
  • tempList2 = tempList2->next
  • Step 1 Cleave genomic DNA with type IIs restriction endonuclease
  • Step 2 Add adaptors to fragments each bearing primer binding sites such that each sticky-end or subset thereof bears a unique primer site
  • Step 3 Differentially amplify by PCR by adding different amounts of primer for each adaptor
  • Step 1 Cleave genomic DNA with type IIs restriction endonuclease
  • Step 2 Ligate adaptor pair to fragments to tag termini
  • Step 3 Capture fragments to allow fragments with adaptor 2 at both termini to be washed away
  • Step 4 Cleave adaptor 2 with restriction endonuclease
  • Step 6 Ligate capture adaptor to blunt end generated from fragments with adaptor 2 at one end
  • Step 7 Capture fragments or perform arbitrary further sorting
  • Sorting step Sort fragments onto array of oligonucleotides or into array of 256 wells
  • Cleavage step Cleave immobilised fragments with type IIs restriction endonuclease corresponding to directionality adaptor 1
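The sorting step listed above places fragments into an array of 256 wells, one for each possible 4-base sticky end. A minimal Python sketch of one possible well-numbering scheme follows. The function names `well_index` and `sort_into_wells` and the A, C, G, T ordering are assumptions made for illustration and are not taken from the specification.

```python
from collections import defaultdict

BASES = "ACGT"  # assumed ordering; any fixed ordering would serve

def well_index(sticky_end: str) -> int:
    """Read the sticky end as a base-4 number to get a well number."""
    index = 0
    for base in sticky_end.upper():
        index = index * 4 + BASES.index(base)
    return index

def sort_into_wells(sticky_ends):
    """Group sticky ends by the well they would be sorted into."""
    wells = defaultdict(list)
    for end in sticky_ends:
        wells[well_index(end)].append(end)
    return dict(wells)

# Hypothetical sticky ends, used only to exercise the functions.
print(sort_into_wells(["AAAA", "ACGT", "TTTT", "ACGT"]))
```

With a 4-base sticky end the indices run from 0 to 255, so every possible sticky end has exactly one well.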

Abstract

A method for sequencing nucleic acid, which comprises: (a) obtaining a target nucleic acid population comprising nucleic acid fragments in which each fragment is present in a unique amount and bears at one end a sticky end sequence of predetermined length and unknown sequence, (b) protecting the other end of each fragment, and (c) sequencing each of the fragments by (i) contacting the fragments with an array of adaptor oligonucleotides under hybridisation conditions, each adaptor oligonucleotide bearing a label, a sequencing enzyme recognition site, and a known unique base sequence of same predetermined length as the sticky end sequence, the array containing all possible base sequences of that predetermined length; removing any unhybridised adaptor oligonucleotide and recording the quantity of any hybridised adaptor oligonucleotide by detection of the label, then repeating the cycle, until all of the adaptors in the array have been tested; (ii) contacting the hybridised adaptor oligonucleotides with a sequencing enzyme which binds to the recognition site and cuts the fragment to expose a new sticky end sequence which is contiguous with or overlaps the previous sticky end sequence; (iii) repeating steps (i) and (ii) for a sufficient number of times and determining the sequence of the fragment by comparing the quantities recorded for each sticky end sequence.

Description

NUCLEIC ACID SEQUENCING BY ADAPTATOR LIGATION
Field of the Invention
The invention relates to a method for sequencing nucleic acid, especially DNA.
Background to the Invention
Various methods for sequencing have been developed but most are slow and complicated to automate like Sanger's chain termination method or Maxam's chain degradation method. Others have technical difficulties remaining to be overcome like the single base sequencing methods to be discussed below. This is discussed in Brenner PCT/US95/12678 pg 1-2.
According to the method of chain termination sequencing (Sanger et al, Proc. Natl. Acad. Sci. USA 74, 5463 - 5467, 1977), modified nucleotides can be used to terminate polymerase extension of nucleic acid being copied from a template strand. To determine the sequence of a template strand four dideoxynucleotides are needed corresponding to the four normal bases. Template strands are added to a polymerisation medium containing all four normal nucleotides and one of the four dideoxynucleotides, which is labeled, usually with a fluorescent dye. The dideoxynucleotide in each medium is at a concentration such that it has a small but defined probability of being incorporated into an extending copy of the template rather than its corresponding normal nucleotide. This terminates chain extension for this fragment. If all the fragments in a particular medium are separated on a sequencing gel, which resolves nucleic acids to a difference in length of one nucleotide, fragments corresponding to termination at every occurrence of the base to which the dideoxynucleotide corresponds should be observed. If a gel for each base is run then a band should be observed for each nucleotide in the template, and the nucleotide sequence can be determined.
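The way a sequence is read from the four termination reactions can be illustrated with a short Python sketch. It is an idealised model that assumes every termination product is observed and ignores primer length; the function names are hypothetical.

```python
def termination_lengths(copied_strand: str) -> dict:
    """Fragment lengths expected in each of the four dideoxy reactions."""
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(copied_strand, start=1):
        # A chain terminated by the dideoxy form of `base` ends at this position.
        lanes[base].append(position)
    return lanes

def read_ladder(lanes: dict) -> str:
    """Read bands from shortest to longest across all four lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

lanes = termination_lengths("GATTACA")
print(lanes)               # lengths seen in the A, C, G and T reactions
print(read_ladder(lanes))  # GATTACA
```

Each lane holds the positions of one base, and merging the four lanes by fragment length recovers the copied strand.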
This method is limited to templates of about 1500 bp. It is automatable but there is a time bottleneck in the running of sequencing gels which is technically difficult to overcome. Gels also have assorted problems of their own, such as band-broadening due to temperature effects, compressions due to secondary structure in the template nucleic acids and inhomogeneities in the separation gel.
Conceptually similar to chain termination sequencing, the method of chain degradation sequencing (Maxam et al, Proc. Natl. Acad. Sci. USA 74, 560 - 564, 1977) relies on chemical reagents which specifically cleave a DNA template at a specific base. Radiolabeled templates are cleaved in separate reactions with each reagent and separated on a sequencing gel. The pattern of bands is essentially the same as with chain termination but shifted by one base.
The cleavage reagents are, however, not totally specific, so this is a fairly 'noisy' system. It suffers from the same problems as the chain termination method too.
There are variations on methods of single-base sequencing, one example of which is described here. Others can be found in the references given below:
o Cheeseman, U.S. patent 5,302,509.
o Tsien et al, International Patent Application WO 91/06678.
o J.D. Harding and R.A. Keller, Trends in Biotechnology 10, 55 - 58, 1992.
o Rosenthal et al, International Patent Application WO 93/21340.
o Canard et al, Gene 148, 1 - 6, 1994.
o Metzker et al, Nucleic Acids Research 22, 4259 - 4267, 1994.
The method of Harding and Keller acts on immobilised DNA templates. The templates are single stranded and constructed from analogues of normal bases bearing unique fluorescent labels. The immobilised templates are cleaved with a 5' to 3' exonuclease to release bases into a flowing medium that is run through a fluorimeter that detects which base is present at the highest concentration in the medium. As long as the exonucleases cleave off bases simultaneously from each copy of a nucleic acid, the signal at any time should correspond essentially to the last base cleaved. The sequence of bases flowing past the fluorimeter thus corresponds to the sequence of the fluorescent template. This sort of approach has the potential to be extremely rapid, limited only by the processivity of the exonuclease used, which could be of the order of 100 to 1000 bases a second.
This method requires base analogs that are distinguishable by some means from each other, preferably by fluorescence. These analogs must be incorporated into a template as normal bases and be cleaved by an exonuclease as normal bases. Alternatively polymerases and exonucleases must be engineered to recognise the analogs. Either way is a major technical obstacle. Furthermore, the requirement for simultaneity of cleavage of corresponding bases from a population of immobilised template allows only a small margin of error, which will severely limit the length of sequence that can be determined by this approach. These technical obstructions remain to be overcome.
The alternative to ensuring that the exonuclease remains in step with numerous copies of a single template is to analyse only a single molecule at a time. This approach has severe technical difficulties such as manipulating single templates, developing fluorescent dyes with a large quantum yield and robustness to ensure every cleaved base is detected and developing corresponding detectors .
Sequencing by hybridisation to chips, grids and arrays is an approach described in J. Biomolecular Structure and Dynamics 9, 399 - 410, 1991. Arrays of single-stranded oligonucleotides can be constructed representing, for example, every possible combination of the 4 bases in an 8 bp oligonucleotide. Each point on an array would correspond to one such oligonucleotide. A single-stranded template with a fluorescent label can be hybridised to such an array. Every overlapping linear sequence of 8 bp that is contained in the template will be represented on the array and the template should hybridise to every point corresponding to each 8 bp sequence that defines it. The fluorescence from every point on the array can then be determined and the sequence of the target reconstructed.
The problem with this sort of approach is the size of arrays required to sequence a nucleic acid template of reasonable length unambiguously. Arrays are expensive and technically difficult to construct .
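The scale of the array problem can be made concrete with a small Python sketch of the hybridisation approach described above. It is a simplification under stated assumptions: complementarity is ignored, so windows are written in the template's own orientation, and the function names are hypothetical.

```python
def array_size(k: int) -> int:
    """Number of distinct oligonucleotides of length k over the four bases."""
    return 4 ** k

def hybridisation_spectrum(template: str, k: int) -> set:
    """Every overlapping k-base window of the template (spots expected to give signal)."""
    return {template[i:i + k] for i in range(len(template) - k + 1)}

template = "ACGTACGTAC"  # toy template chosen for illustration
spots = hybridisation_spectrum(template, 8)
print(array_size(8))   # 65536 points are needed to represent every 8-mer
print(sorted(spots))   # ['ACGTACGT', 'CGTACGTA', 'GTACGTAC']
```

A repeated window appears only once in the spectrum, which is one reason longer templates cannot always be reconstructed unambiguously from a fixed array.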
Direct analysis of Sanger ladders by mass spectrometry is an approach described by Köster et al in Nature Biotech 14, 1123 - 1128 (1996). This approach requires determination of the mass of each component of a Sanger ladder.
Analysis of Sanger sequence ladders directly by mass spectrometry has the potential to be extremely rapid, but there is a severe problem with the 'read-length' that can be obtained using this approach. This is because DNA fragments readily in the mass spectrometer and is also poly-ionic, so each DNA species will be found in multiple ionisation states, which gives rise to highly complex spectra. Furthermore the mass resolution of most appropriate mass spectrometers does not permit sequencing of more than about 30 to 40 bases. This problem grows massively as the linear length of DNA sequence analysed increases.
The fragmentation problem and poly-ionisation problem are related: protonation of the bases in DNA is believed to induce fragmentation. Various DNA analogues exist which are less easily protonated, reducing the spectral complexity problem, but the mass resolution limit is less trivial to overcome.
Summary of the Invention
The present invention provides a method for sequencing nucleic acid, which comprises:
(a) obtaining a target nucleic acid population comprising nucleic acid fragments in which each fragment is present in a unique amount and bears at one end a sticky end sequence of predetermined length and unknown sequence,
(b) protecting the other end of each fragment, and
(c) sequencing each of the fragments by
(i) contacting the fragments with an array of adaptor oligonucleotides under hybridisation conditions, each adaptor oligonucleotide bearing a label, a sequencing enzyme recognition site, and a known unique base sequence of same predetermined length as the sticky end sequence, the array containing all possible base sequences of that predetermined length; removing any unhybridised adaptor oligonucleotide and recording the quantity of any hybridised adaptor oligonucleotide by detection of the label, then repeating the cycle, until all of the adaptors in the array have been tested;
(ii) contacting the hybridised adaptor oligonucleotides with a sequencing enzyme which binds to the recognition site and cuts the fragment to expose a new sticky end sequence which is contiguous with or overlaps the previous sticky end sequence;
(iii) repeating steps (i) and (ii) for a sufficient number of times and determining the sequence of the fragment by comparing the quantities recorded for each sticky end sequence.
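Steps (i) to (iii) can be pictured with a toy Python simulation. It is a sketch under simplifying assumptions: each cycle exposes a sticky end contiguous with the previous one, hybridisation and cleavage are perfectly efficient, and each adaptor is identified by the sticky end it reads. The fragment sequences and amounts are illustrative inputs, and `run_cycles` is a hypothetical name.

```python
from collections import Counter

def run_cycles(fragments: dict, n: int = 4) -> list:
    """fragments maps sequence -> amount; returns one {n-mer: quantity} dict per cycle."""
    cycles = []
    longest = max(len(seq) for seq in fragments)
    for start in range(0, longest - n + 1, n):
        signal = Counter()
        for seq, amount in fragments.items():
            sticky = seq[start:start + n]
            if len(sticky) == n:
                # Label detected for this adaptor is the summed amount of matching fragments.
                signal[sticky] += amount
        cycles.append(dict(signal))
    return cycles

# Three fragments present in unique amounts (arbitrary units).
population = {"TTCCAGGA": 30, "TGAGACTC": 15, "ATCTGCCT": 6}
for number, cycle in enumerate(run_cycles(population), start=1):
    print(number, cycle)
```

Because each fragment is present in a unique amount, the quantity recorded for a sticky end in one cycle points to the sticky end with the same quantity in the next.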
The label may be any label suitable for the purpose, such as a fluorescent label or a mass label. Each label may comprise a mass label associated with a corresponding known base sequence for identifying the corresponding base sequence in mass spectrometry. In one embodiment, each adaptor oligonucleotide is labelled with an associated mass label which is uniquely resolvable in mass spectrometry from the other labelled adaptor oligonucleotides. According to this embodiment it is preferable that each adaptor oligonucleotide is composed of nucleotide analogues which are resistant to fragmentation in the mass spectrometer.
In another embodiment, each mass label is cleavably attached to its corresponding adaptor oligonucleotide and uniquely resolvable in mass spectrometry. The mass label may be attached to the adaptor oligonucleotide by a cleavable linker and may be cleaved under any appropriate cleavage conditions such as photocleavage conditions or chemical cleavage conditions.
The mass spectrometry may be effected using a mass spectrometer with orthogonal time of flight or array detector geometry.
According to one embodiment, the fragments are contacted in step (i) with the array of adaptor oligonucleotides in a cycle wherein the cycle comprises sequentially contacting each adaptor oligonucleotide of the array with the fragments.
Preferably, the target nucleic acid population is subjected to sorting into sub-populations according to their sticky end sequences and each of the sub-populations is subjected to steps (b) and (c) .
In one embodiment, where the target nucleic acid is genomic DNA, each fragment may be produced by differential amplification.
Preferably, the predetermined length of this base sequence of the sticky ends is from 3 to 5, more preferably 4.
The sequencing enzyme preferably comprises a type IIs restriction endonuclease. The target nucleic acid population may comprise heterogeneous nucleic acid fragments. The other end of each fragment may be protected by ligation with an immobilisation adaptor oligonucleotide.
Brief description of the drawings
FIGURE 1 shows a cloning vector for template sequences;
FIGURE 2 shows PCR amplification of template DNA;
FIGURE 3 shows immobilisation of amplified template DNA;
FIGURE 4 shows a method of differential amplification of template DNA fragments;
FIGURE 5 shows a method of producing protected DNA fragments with termini for sequencing;
FIGURE 6 shows the action of FokI;
FIGURE 7 shows cutting behaviour of typical adapters according to the invention;
FIGURE 8 shows an adapter cycle according to one embodiment of the invention; and
FIGURE 9 shows graphs of the effects of PEG and Ficol on ligation at ATTA and GCCG.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Simultaneous sequencing of sorted populations of nucleic acids by adapters :
This invention provides a method capable of simultaneously sequencing a heterogeneous population of nucleic acid fragments. The technology is compatible with numerous methods of template preparation. The invention however provides a novel preferred strategy for sequencing large DNA molecules that limits the need for biological sub-cloning hosts and vectors. In outline the sequencing process may be summarised:
1. Generation of a mixed nucleic acid population;
2. Sort molecules into subsets; and
3. Sequence molecules within subsets simultaneously.
The sequencing method described here allows one to produce nucleic acid fragment populations in a reproducible manner that can then be sorted into subsets and finally sequenced by an oligonucleotide adapter based technique. The sequencing method described requires double stranded templates. The sequencing technique is most effective with immobilisation of the nucleic acid templates at one terminus; the other terminus must be accessible to adapters.
The sequencing steps use adapter molecules to generate and probe the sequence of terminal single-stranded overlaps of immobilised nucleic acid fragments. Single-stranded overlaps are generated in a cyclical process, preferably through the use of type IIs restriction endonucleases. Recognition sites for these enzymes are provided by adapters at the terminus of a template sequence. The position of the recognition site is arranged so that digestion in the presence of the type IIs endonuclease exposes an ambiguous sticky end in the unknown sequence of the template. The resultant ambiguous sticky ends generated in template sequences are probed as heterogeneous sets and sequence information is determined by measuring the quantity of label detected from correctly hybridised adapters. The sequence of individual fragments is determined by comparing quantities of label for each adapter in each cycle of the sequencing process with quantities derived in previous and subsequent cycles. The invention provides a method for analysing heterogeneous sub-populations of nucleic acids without spatially resolving them. This is achieved by a signal acquisition and signal processing procedure that allows sequences to be identified on the basis of their relative quantities.
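The comparison of label quantities between cycles can be sketched in Python as follows. This is a minimal illustration and not the procedure of the specification: it assumes each fragment is present in a unique amount, that no two fragments share a sticky end in any cycle, and a hypothetical relative measurement tolerance of 10%. The readings are invented for the example.

```python
def reconstruct(cycles: list, tolerance: float = 0.1) -> dict:
    """Chain together n-mers whose quantities agree from cycle to cycle."""
    sequences = {}
    for nmer, quantity in cycles[0].items():
        sequence = nmer
        for cycle in cycles[1:]:
            matches = [
                candidate for candidate, measured in cycle.items()
                if abs(measured - quantity) <= tolerance * quantity
            ]
            if len(matches) != 1:
                break  # ambiguous or missing signal: stop extending this fragment
            sequence += matches[0]
        sequences[quantity] = sequence
    return sequences

# Illustrative readings for two cycles (arbitrary fluorescence units).
readings = [
    {"TTCC": 30.0, "TGAG": 15.2, "ATCT": 6.1},
    {"AGGA": 29.5, "ACTC": 14.8, "GCCT": 5.9},
]
print(reconstruct(readings))
```

The result is keyed by the quantity seen in the first cycle, so each fragment is identified by its amount rather than by its position.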
This process does not require traditional gel methods to acquire sequence information. Since the entire process takes place in solution and is an iterative process, the steps involved could be performed by a liquid-handling robot or a microfluidics system.
Type IIs Restriction Endonucleases: Type IIs restriction endonucleases have the property that they recognise and bind to a specific sequence within a target DNA molecule, but they cut at a defined distance away from that sequence, generating single-stranded sticky-ends of known length but unknown sequence at the cleavage termini of the restriction products.
For example, the enzyme FokI generates an ambiguous sticky-end of 4 bp, 9 bp downstream of its recognition sequence. This ambiguous sticky-end could thus be one of 256 possible 4-mers (see figure 6). Numerous other type IIs restriction endonucleases exist.
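As an illustration of the cutting behaviour just described, the following Python sketch locates a FokI site in one strand and returns the four bases that would be exposed. It assumes the commonly cited FokI geometry (recognition sequence GGATG, cut 9 bases downstream on the recognition strand and 13 on the opposite strand). The function name and the single-strand, single-orientation search are simplifications introduced here and are not part of the described method.

```python
from itertools import product

FOKI_SITE = "GGATG"  # FokI recognition sequence
TOP_CUT = 9          # bases between the site and the cut on the recognition strand
OVERHANG = 4         # length of the sticky end

def foki_overhang(top_strand):
    """Return the 4-base sticky end exposed downstream of the first FokI site, or None."""
    site = top_strand.find(FOKI_SITE)
    if site < 0:
        return None
    start = site + len(FOKI_SITE) + TOP_CUT
    overhang = top_strand[start:start + OVERHANG]
    return overhang if len(overhang) == OVERHANG else None

# Every sticky end the enzyme could expose in unknown sequence.
ALL_ENDS = ["".join(p) for p in product("ACGT", repeat=OVERHANG)]
```

Because the exposed bases lie in unknown sequence, any of the 256 entries of ALL_ENDS may be returned for a given template.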
Sequencing large nucleic acid molecules:
It is not necessary to sequence an entire molecule at once to determine its sequence, which is fortunate as it is a practical impossibility, at the moment, to sequence molecules as large as chromosomes. It is calculated that any given sequence 17 bp long should be unique within the human genome. Similar calculations can be performed for genomes that are of different sizes. This consideration means that large nucleic acids or entire genomes can be sequenced by degradation into short overlapping fragments, > 17 bp in length, which can then be sequenced, and the total genome sequence can thence be reconstructed using software to determine contig overlaps.
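The 17 bp figure can be checked with a short calculation. The sketch below assumes a haploid human genome of roughly 3 x 10^9 bp, counts both strands, and treats bases as independent and equally likely. These are assumptions made here for illustration; real genomes contain repeats, so the result is only a lower bound on the length needed.

```python
GENOME_SIZE = 3_000_000_000  # assumed haploid human genome size in bp

def shortest_unique_length(positions):
    """Smallest k for which there are more possible k-mers than positions to place them."""
    k = 1
    while 4 ** k <= positions:
        k += 1
    return k

def expected_occurrences(k, positions):
    """Expected number of chance matches of one k-mer among `positions` random sites."""
    return positions / 4 ** k

both_strands = 2 * GENOME_SIZE
print(shortest_unique_length(both_strands))
print(expected_occurrences(17, both_strands))
```

Under these assumptions the calculation gives 17 when both strands are counted and 16 for a single strand. The same functions can be applied to genomes of other sizes.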
Preparing a large Nucleic Acid for Sequencing:
To sequence a complete nucleic acid of significant size is practically very difficult. A preferred method for use with this invention requires fragmentation of the target nucleic acid followed by molecular sorting into sub-populations that are small enough to allow simultaneous sequencing. This preferred method requires the use of adapters. Two adapter based sorting methods are described here. One requires the use of a type IIs restriction endonuclease or a similar system for generating ambiguous sticky-ends in double stranded DNA while the second uses a primer and DNA amplification based approach to sorting.
Establishing a single sequencing terminus for generic nucleic acids is a requirement that faces users of this technology. The approach taken will clearly be determined by what is known about the nucleic acid to be sequenced. An adapter based approach suitable for an unknown nucleic acid is described here.
Fragmentation of large nucleic acids:
Nucleic acids may be fragmented in numerous ways which may be either directed or random. For the purposes of sequencing large nucleic acids, an approach which generates numerous fragments that overlap randomly is favoured in conventional sequencing strategies for large nucleic acid templates, due to its redundancy and relative simplicity. Obviously this sort of approach requires multiple copies of the target nucleic acid to ensure that all sequences are represented in the population of fragments with unambiguous overlaps with other fragments.
Random fragmentation will work well with certain embodiments of this invention, but to reduce redundant sequencing a more controlled fragmentation of the target nucleic acid could be used. One might fragment the target nucleic acid with a relatively high stringency type II restriction endonuclease or a type IIs restriction endonuclease. A set of relatively high stringency restriction endonucleases can be used to generate sets of overlapping fragments. In this way one would hope to generate overlapping contigs in a more economical manner than random fragmentation.
Random fragmentation can be achieved by sonication or by mild digestion of the target nucleic acid with DNase I. Generally, blunt-ended fragments are generated by this approach.
Automated preparation of heterogeneous template populations with known template frequencies:
In order to produce a high throughput DNA sequencing technology the automation of the production of the sequencing template is essential. The following is an outline which describes an automated method of producing sequencing templates using conventional techniques.
For a large scale sequencing project, for example a whole bacterial genome or a full YAC clone, the DNA is first sub-cloned into a library. The process of producing a library of this sort can be done in-house or by commercially available services, such as that provided by Clontech. The DNA is fragmented (e.g. by restriction enzyme digestion or sonication) to sizes in the range of a few hundred bases and then sub-cloned into a cloning vector of choice. Because each fragment in the library is flanked by the same vector sequence, a standard set of flanking PCR primers can be used to PCR amplify each fragment. Using the same PCR primers for each fragment also helps to normalise the efficiency of each PCR reaction, as primer sequence is one of the most important factors affecting amplification efficiency (see figure 1).
The library is then transformed into an appropriate bacterial strain and the bacteria are plated out onto selective agar plates. Individual colonies (each containing a unique fragment within the cloning vector) are then picked by a colony picking robot (such robots are commercially available). Each picked colony is then spiked into a unique PCR reaction, set up on a microtitre plate for example, and each fragment is PCR amplified using the standard primer set which flanks the insert. One of the primers used in this reaction must be biotinylated, which will allow the subsequent capture of the amplified fragment (see figure 2).
Following the PCR amplification, a known amount of each of the amplified fragments can be captured on a streptavidin coated surface by its biotinylated primer. By controlling the amount of available streptavidin a specific amount of PCR product can be captured. (This does, however, rely on all the primers being incorporated into the amplification products. This should only require a simple primer titration optimisation experiment as PCR reactions using clones are highly efficient.)
Different protocols can be used for this purpose, for example streptavidin coated magnetic beads or streptavidin coated wells of a microtitre plate. When using beads which bind 1 pmol of biotin per µl of beads, adding 5 µl of beads and the appropriate buffer to the PCR reaction will capture 5 pmol of the amplified fragment. The use of beads also allows the capture of different quantities of individual amplified fragments. By adding differing amounts of beads to separate amplification reactions prior to pooling them, one can, for example, create a heterogeneous population with 1 pmol of fragment 1, 4 pmol of fragment 2, 10 pmol of fragment 3 and so on. Alternatively, streptavidin coated wells of a microtitre plate could be used by transferring each amplification reaction to a unique well of the microtitre plate. Commercially available streptavidin coated plates usually have a maximum binding capacity of between 5 and 20 pmol of biotin. Therefore, the amount of amplified fragment captured in each well is determined by the binding capacity of that plate.
Following capture, excess amplified fragments are washed away and the double stranded PCR product is denatured with alkali, heat or both to free the non-biotinylated strand. The non-biotinylated strand is then washed away, leaving a single stranded template immobilised in the well or tube ready to be used in a sequencing reaction (see figure 3).
If simultaneous sequencing of heterogeneous templates is to be performed, quantification of template must be stringent. This can be achieved by fluorescently labelling one of the primers. After the amplification reaction has been performed, the biotinylated fragments can be captured on an avidinated substrate and washed. The number of copies of template present can be determined by measuring the retained fluorescence. Appropriate dilution of the amplified template can be performed if desired before sequencing. This gives an additional level of control over and above the capture steps.
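The bead arithmetic above can be sketched as follows. The binding capacity of 1 pmol per µl is the figure quoted in the text; the function names are introduced here for illustration, and the sketch assumes that each PCR yields more biotinylated product than the beads can bind, so that bead volume alone sets the captured amount.

```python
BEAD_CAPACITY_PMOL_PER_UL = 1.0  # binding capacity quoted in the text

def bead_volumes(targets_pmol, capacity=BEAD_CAPACITY_PMOL_PER_UL):
    """Bead volume (in ul) to add to each reaction to capture the target amount."""
    return {name: pmol / capacity for name, pmol in targets_pmol.items()}

def relative_frequencies(targets_pmol):
    """Fraction of the pooled population contributed by each fragment."""
    total = sum(targets_pmol.values())
    return {name: pmol / total for name, pmol in targets_pmol.items()}

# The example population from the text: 1, 4 and 10 pmol.
population = {"fragment 1": 1, "fragment 2": 4, "fragment 3": 10}
print(bead_volumes(population))
print(relative_frequencies(population))
```

The relative frequencies are the known template frequencies that the later signal processing relies on.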
The vector sequences at either terminus of the DNA can be designed to bear distinct primer sequences. This would ensure that one terminus can be readily identified as a sequencing terminus and one terminus could be designated as the immobilisation terminus. A unique termination sequence at the immobilisation terminus would identify when the clone had been sequenced. Type IIs restriction endonuclease sites in the immobilisation terminus sequence can additionally permit molecular sorting. The sequencing terminus can be engineered to carry a recognition sequence for the type IIs restriction endonuclease to be used as the sequencing endonuclease. These vector sequences can additionally provide primer sequences to permit amplification of template and amplification based sorting.
Adapter based techniques for template preparation:
Rather than using cloning vectors to provide sequencing and immobilisation termini it is possible to use an adapter based approach that is more amenable to automation. One can ligate adapters bearing primer binding sites to the nucleic acid fragments generated by a variety of fragmentation processes.
The precise method of ligation will depend on how the fragments are generated. If fragments of the target nucleic acid are generated by using a type IIs restriction endonuclease, adapters with sticky-ends complementary to subsets of the possible sticky-ends that would be generated by the fragmentation endonuclease can be ligated to the resultant fragments. These adapters could carry designed primer sites. The combinations of subsets of sticky-ends of the primer adapters will determine which subsets of fragments are amplified and how large these subsets of fragments are. This will allow much greater control over the PCR amplification steps. See Figures 4 and 5.
To maximise the differences in frequency between fragments within an amplified set, one need only alter the quantities of primers corresponding to the primer binding sites present in each adapter added to the PCR incubator. The combination of adapters at the termini of the nucleic acid fragments should increase the variation in frequency of fragments, exacerbating the known inhomogeneities in PCR amplification.
PCR of genomic DNA:
The use of controlled fragmentation and amplification, outlined above, could conceivably allow specific amplification of DNA subsets directly from genomic DNA. This potentially offers a novel strategy for genomic sequencing that avoids cloning into biological hosts and vectors. If one could reliably amplify genomic subsets of fragments then the biological steps of present sequencing strategies could be avoided, with large savings in time and resources, while also eliminating the unreliability of using biological vectors. A set of primers would also be a rather more manageable way to access genomic fragments: primers would not need to be physically maintained, due to the ease of synthesising short oligonucleotides, whereas clones must be carefully cultured to ensure their availability and continued integrity.
The parallel sequencing process described here lends itself to this sort of cloning strategy because of its ability to simultaneously sequence heterogeneous populations without spatial resolution of nucleic acids which conventional sequencing strategies cannot achieve. This means if a set of primers generated more than one fragment, the ability to sequence multiple templates simultaneously would allow one to determine the sequence without having to separate the amplified fragments.
Ensuring that only the desired type IIs restriction endonuclease sites in the target nucleic acid are available for sequencing:
It is important to ensure no 'sequencing enzyme' binding sites are accessible or present in the template nucleic acid fragments prior to addition of adapters bearing the 'sequencing enzyme' binding site to the terminus of the molecule from which sequencing is to occur. Certain type IIs restriction endonucleases are sensitive to the methylation state of their recognition regions, so to prevent unwanted sites being used by the sequencing endonuclease the target nucleic acid can be methylated prior to ligation of adapters bearing the sequencing endonuclease recognition site. Methylation can be achieved in the preparation of templates by use of 5-methyl cytosine in any amplification reactions. Use of unmethylated adapters would allow recognition sequences present in these to function but not those in the template.
An alternative, but less preferable, way to avoid this problem is to remove enzymatically the recognition sequence of the sequencing endonuclease from within the target nucleic acid population. The nucleic acid can be fragmented initially with the sequencing enzyme. This will generate 3 classes of fragments: one class with the sequencing enzyme recognition site at one terminus only, one class with the sequencing enzyme recognition site at both termini and a third class with the site at neither terminus.
A complete adapter set, i.e. corresponding to all sticky-ends, can be added to the restriction fragment population. The adapter would bear the recognition site for the sequencing enzyme. Addition of the sequencing enzyme to a population of fragments with these adapters can have two results. If a given terminus has a recognition site already then the sequencing enzyme can cleave either at the adapter site or at the more internal site. There is a 50 % chance of either cleavage event occurring. At other sites where there is no internal site, clearly, terminal bases must be lost by this process. Since with each round of this process only half of the internal sites will be removed, the process must be repeated at least 7 times to ensure removal of the sequencing enzyme recognition sequence from at least 99 % of the fragments in the population. Thus fragment size may be significantly reduced if a sequencing enzyme is used that cleaves at a significant distance from its recognition site.
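The figure of 7 rounds follows from halving the remaining internal sites each round. A minimal sketch of that arithmetic, assuming the 50 % per-round removal stated above and independence between rounds (the function names are introduced here for illustration):

```python
import math

def fraction_remaining(rounds, p_removed=0.5):
    """Fraction of fragments still carrying an internal site after the given rounds."""
    return (1 - p_removed) ** rounds

def rounds_needed(target_removed, p_removed=0.5):
    """Smallest number of rounds that removes at least `target_removed` of the sites."""
    return math.ceil(math.log(1 - target_removed) / math.log(1 - p_removed))

print(rounds_needed(0.99))
print(fraction_remaining(7))
```

After 7 rounds fewer than 1 % of the internal sites remain (1/128 of them), whereas 6 rounds would leave 1/64, which is more than 1 %.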
Establishing distinct termini in a population of nucleic acids:
An important facet of this technology is immobilisation of nucleic acids at one terminus. This requires that a randomly generated fragment have directionality, i.e. it requires two distinguishable termini. This can be achieved using adapters. Two types of adapters are required to identify two distinct termini. Adapters for a simple protocol are shown below:
Adapter 1:   Biotin-NNN GGATG NNNNNGATC
                    NNN CCTAC NNNNN

Adapter 2:          NNN GGCC NNNNGATC
                    NNN CCGG NNNN
The pair of adapters shown can be ligated to a fragment generated by Sau3AI. The first adapter bears a recognition sequence for FokI while the second adapter bears a restriction site for BsuRI. BsuRI is methylation sensitive and generates blunt-ended fragments. If one synthesises template DNA with 5-methyl cytosine but uses adapters with ordinary DNA, only the adapter will be cleaved by BsuRI; this will leave fragments amenable to blunt end ligation. Adapter 1 provides immobilisation and the recognition site for the sequencing endonuclease. A simple protocol for generating distinct termini would be as follows:
o The first step is fragmentation of a large number of copies of a large nucleic acid, preferably with an ordinary type II restriction endonuclease, such as Sau3AI, to generate known sticky ends.
o The resultant fragments can then be ligated to adapters. If the fragments are treated with ligase in the presence of the two types of adapters above, this will generate fragments of three types: fragments with both ends carrying adapter 1, fragments with both ends carrying adapter 2 and thirdly fragments carrying adapter 1 at one end and adapter 2 at the other. Statistically the third type of fragment will be in the majority.
o If the immobilisation effector on adapter 1 is biotin then the fragments carrying adapter 1 can be immobilised on a solid phase matrix derivatised with avidin. The fragments carrying adapter 2 at both ends can be washed away, while those fragments carrying two immobilisation adapters will also be immobilised.
o Cleavage with the type II restriction endonuclease whose binding site is carried by adapter 2 will allow blunt end ligation of a new adapter to one terminus of the fragments bearing both types of adapter.
o At this stage the immobilised fragments can be removed from the solid phase matrix. Biotin/streptavidin interactions can be disrupted by acid.
o Fragments that bore both adapters can be captured by the new terminus generated by cleavage of adapter 2. Capture requires a further adapter which can be immobilised, allowing fragments with adapter 1 at both termini to be washed away. Alternatively the 'capture' adapter can introduce a primer sequence. Adapter 1 can additionally provide a known primer sequence to permit the captured fragments to be differentially amplified.
A more complex protocol which allows molecular sorting could be achieved using adapter 2 to provide a second type IIs recognition site (BspMI in the example below):
Adapter 2:          NNN GCAGGT NNNN
                    NNN CGTCCA NNNNGATC
If the cleavage step after immobilisation is performed with BspMI, this would generate fragments with ambiguous sticky ends at one terminus which can again be captured by adapters complementary to the sticky ends generated, but one can select at this stage which sticky ends to capture. An entire array of all possible adapters can be generated to allow all fragments to be captured and isolated. A hybridisation array on a glass surface would allow spatial sorting. An alternative method would use the adapter sequence to perform differential amplification.
Additional sorting:
Once a fragment population has been amplified and distinct termini established for each fragment, an arbitrary degree of sorting can be performed. The 'capture' adapter used above can provide another terminal type IIs restriction endonuclease site. This will allow another set of ambiguous sticky-ends to be generated allowing further sub-sorting until the nucleic acid fragment population is of the correct size for unambiguous sequence determination.
The sorting process above generates, for a 4 bp ambiguous sticky-end, 256 sub-populations. This may generate nucleic acid populations small enough to begin sequencing, or further sub-sorting may be necessary.
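How quickly repeated sorting shrinks the sub-populations can be estimated with a short calculation. The sketch assumes that sticky ends are distributed uniformly over the fragments, which is an idealisation introduced here; the function names are illustrative.

```python
def subpopulations(overhang_len, rounds=1):
    """Number of bins produced by sorting on an ambiguous sticky end, repeated `rounds` times."""
    return (4 ** overhang_len) ** rounds

def mean_fragments_per_bin(n_fragments, overhang_len, rounds=1):
    """Average number of distinct fragments per bin under the uniformity assumption."""
    return n_fragments / subpopulations(overhang_len, rounds)

print(subpopulations(4))                       # one round with a 4 bp sticky end
print(mean_fragments_per_bin(65536, 4, 1))     # 65536 fragments, one round
print(mean_fragments_per_bin(65536, 4, 2))     # the same fragments, two rounds
```

One round on a 4 bp sticky end gives 256 bins and a second round gives 65536, so a library of 65536 distinct fragments would average 256 per bin after one round and 1 per bin after two.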
In order to begin to sequence the sorted nucleic acids, they must be treated with the sequencing enzyme to expose a new ambiguous sticky-end at the sequencing terminus.
Parallel Sequencing of Subsets of Nucleic Acid Fragments with Adapters:
Sequencing a single molecule by ligation of adapters:
The actual sequencing method is essentially sequencing by hybridisation and can be understood first by explaining it for the case of a single nucleic acid. Consider a single nucleic acid, immobilised at one terminus to a solid phase substrate, and which has an adapter at the other terminus bearing the recognition site for the type IIs restriction endonuclease chosen for sequencing. Digestion in the presence of FokI will generate a 4 bp ambiguous sticky-end.
To determine the sequence of that sticky-end one can probe the immobilised nucleic acid with an adapter molecule. This would be an oligonucleotide carrying a sticky-end with one known sequence of 4 bp out of the possible 256. The adapter would additionally carry a label, e.g. a fluorescent tag, and a binding site for the desired type IIs restriction endonuclease to be used to sequence the immobilised nucleic acid. If the adapter is complementary to the ambiguous end of the target nucleic acid, it will hybridise and it will then be possible to ligate the adapter to the target. The immobilised matrix can then be washed to remove any unbound adapter. To determine whether the adapter has been ligated to the immobilised target, one need only measure the fluorescence of the matrix. This will also reveal how much of the adapter has hybridised, hence the amount of immobilised DNA. Other means of detecting hybridisation might be used in this invention. Radio-labelled adapters could be used as an alternative to a fluorescent probe, as could dyes, stable isotopes, tagging oligonucleotides, enzymes, carbohydrates and biotin, amongst others. If the adapter is not complementary to the ambiguous sticky-end of the target nucleic acid and only one label is available, then a second adapter can be tried and the above process repeated until all 256 possible adapters have been tested. (This process is shown in the 'adapter cycle', see figure 8.)
Clearly one of the adapters will have to be complementary to the ambiguous end. Once this has been found, the terminus of the target nucleic acid will also carry a binding site for the sequencing endonuclease that will allow cleavage of the target nucleic acid, exposing further bases for analysis, and the above process can be repeated for the next 4 bp of the target. This iterative process can be repeated until the entire target nucleic acid has been sequenced.
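The iterative process for a single molecule can be sketched as a simulation. In the sketch below hybridisation and ligation are reduced to an exact string comparison, the template is exposed 4 bases at a time with no overlap, and adapters are tried one at a time as in the single-label case. All names are introduced here for illustration; the sketch is not a model of the chemistry.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

# Sticky ends of the 256 possible adapters.
ADAPTER_ENDS = ["".join(p) for p in product("ACGT", repeat=4)]

def sequence_by_ligation(template, step=4):
    """Read a template 4 bases per cycle by testing adapters until one ligates."""
    read = []
    for pos in range(0, len(template) - 3, step):
        exposed = template[pos:pos + 4]              # sticky end left by the last cleavage
        for adapter_end in ADAPTER_ENDS:             # the 'adapter cycle'
            if adapter_end == reverse_complement(exposed):
                # The ligated adapter identifies the exposed bases.
                read.append(reverse_complement(adapter_end))
                break
    return "".join(read)

print(sequence_by_ligation("ACGTTTGACCAT"))
```

In this idealised setting exactly one adapter matches in each cycle, so the read reproduces the template.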
A more effective method of labelling appropriate for use with this invention is 'mass labelling'. Cleavable mass labels are described in patent GB9700746.2. This describes methods for generation and use of labels that are readily detectable in a mass spectrometer. Mass labelling permits the generation of large numbers of labels. This would permit 256 labels to be generated, allowing all 256 probes for a 4 base pair overlap to be tested simultaneously rather than sequentially. This has advantages in a hybridisation based sequencing method, as a competitive binding system avoids some of the problems of different binding energies of different 4 base sequences. GB 9700746.2 describes tagging of nucleic acid probes with cleavable mass labels. These labels may be cleaved from the probe at various stages in a probing assay, but a preferred method of cleavage is during the ionisation process. For use with the sequencing method described here various methods are possible. After the exposed sticky ends of a template are probed with labelled adapters one is left, after washing away non-ligated adapters, with a template terminated with a labelled adapter. One can use labels that are photocleavable, thermo-labile or acid labile, for example, which can be removed at this stage and analysed in a mass spectrometer. Alternatively one can cleave the adapter from the template with the appropriate type IIs restriction endonuclease whose recognition site is provided by the adapter. The cleaved adapter can be analysed in a mass spectrometer and the mass label can be cleaved during ionisation.
Non-cleavable mass labels are also appropriate for analysis of the cleaved adapter. One needs only use sufficient labels to resolve adapters with the same mass in the mass spectrum.
For the purposes of sequencing, with 4 bp ambiguous sticky-ends, one would need 256 adaptors each tagged via a non-cleavable linker, to give the whole adaptor, or preferably just the strand of the adapter without the probe sequence, a unique mass in the mass spectrum. After probing a template with an adaptor population the successfully ligated adapter can then be identified after washing away any unligated probe. To identify the ligated adapter, it is then digested from the template by the sequencing endonuclease. The released adaptor can then be analysed by electrospray mass spectrometry. Preferably, only the mass tagged strand should be analysed. A short liquid chromatography step with a denaturing solvent would allow the tagged strand to be separated from the untagged strand. HPLC or capillary electrophoresis separations would be appropriate. Such a separation would probably not be necessary, though. Denaturing the cleaved adaptor might be quite desirable, however. After cleavage, both strands of the adaptor will be extended by 4 bp. The probe strand will be extended by 4 unknown bases. The non-probe strand will be extended by 4 bases complementary to the probe overlap of the probe strand, and so will have a known mass, hence is the preferred strand for mass labelling. Certain 4-mers have the same composition (GGCC, GCCG, GCGC, etc.) and need to be resolved, as these will all give the same peaks in a mass spectrum. One need only add sufficient mass to resolve these uniquely. One can furthermore choose mass tags which should reveal mis-hybridisations, as an incorrect base will be cleaved off the template and will be present in the extended sequence of the tagged probe. If probes appear shifted to points in the mass spectrum in which probes are not normally found one will detect mis-hybridisation. If the tags are chosen carefully it should be possible to determine the identity of the mis-hybridised base. This should allow correction of hybridisation errors and correct assignment of sequence. Quantitation of probe should be improved if this source of error is removed. Type IIs restriction endonucleases are also known to sometimes cleave at incorrect positions. Such cleavages should also be identifiable with this approach, as an extra base or one base too few will give a shifted mass spectral peak. This should again allow improvement in quantitation.
Analysis of small nucleic acid molecules within the mass spectrometer is very much less affected by fragmentation than the analysis of larger DNA molecules, making this a highly appropriate method of analysis. If fragmentation were a concern one could synthesise adapters and template DNA with analogues that are resistant to fragmentation. N-7 deaza analogues of guanine and adenine are appropriate.
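The extent of the composition problem can be enumerated directly. The sketch below groups the 256 possible 4-mers by base composition, on the assumption that sequences of identical composition are indistinguishable by mass alone; the grouping key and function name are introduced here for illustration.

```python
from collections import defaultdict
from itertools import product

def group_by_composition(length=4):
    """Group all sequences of the given length by their base composition."""
    groups = defaultdict(list)
    for p in product("ACGT", repeat=length):
        seq = "".join(p)
        groups["".join(sorted(seq))].append(seq)   # sorted letters identify the composition
    return groups

groups = group_by_composition()
print(len(groups))                                  # number of distinct compositions
print(sorted(groups["CCGG"]))                       # 4-mers made of two C and two G
print(max(len(members) for members in groups.values()))
```

Under this assumption the 256 sticky ends fall into 35 compositions. The two-C, two-G group mentioned in the text has six members, and the largest group (one of each base) has 24, which gives an idea of how many distinct tag masses are needed within a single composition.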
Sequencing a Population of Nucleic Acid Fragments:
The same process can be applied to a heterogeneous population of immobilised nucleic acids, allowing them to be analysed in parallel. To be successful when applied to a population of nucleic acids, this method relies on the fact that statistically 1 out of 256 molecules within the total population will carry each of the possible 4 bp sticky-ends at any particular site after cleavage with a sequencing enzyme such as FokI. If one sub-sorts the template population into manageable subsets of less than 256 fragments, one would expect that almost all will have different ambiguous sticky-ends (there is about a 1 in 1000 chance of there being 2 distinct cDNAs having the same initial 4 bp sticky end), so for most purposes one can assume that a hybridisation signal corresponds to a single cDNA type.
The positioning of the recognition site for the sequencing endonuclease in the adapter will determine whether the next 4 bp exposed are the next 4 bp in the sequence, whether they overlap partially with the last four base pairs, thus giving partially redundant information, or whether they lie further downstream, missing out a few bases and thus only sampling the sequence of the immobilised target nucleic acid. (See the figure below: sequential bases can be exposed with adapter 1 while bases are sampled at intervals by adapter 2. With adapter 3 redundant information is acquired. Adapter nucleic acid is shown in bold while FokI binding sites are underlined.) Whatever spacing is used, the spatial information relating the 4 bp oligonucleotides is retained. For the purposes of this invention a small amount of redundant sequence data from the template nucleic acid is desirable in order to relate sequence information from each round of sequencing to the last round; hence adapter 3 in figure 7 below is a preferred adapter construct.
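The three spacings can be illustrated by listing the 4-base windows exposed in successive cycles. In the sketch below the number of bases advanced per cycle (`step`) stands in for the position of the recognition site in the adapter; the particular values 4, 6 and 3 for sequential, sampled and overlapping reads are assumptions chosen here for illustration, as are the function names.

```python
def sampled_windows(template, step, width=4):
    """Windows exposed in successive cycles when each cycle advances `step` bases."""
    return [template[i:i + width] for i in range(0, len(template) - width + 1, step)]

def merge_overlapping(windows, step, width=4):
    """Rebuild the sequence from windows that overlap by width - step bases."""
    if not 0 < step <= width:
        raise ValueError("merging needs contiguous or overlapping windows")
    overlap = width - step
    seq = windows[0]
    for w in windows[1:]:
        if overlap > 0 and seq[-overlap:] != w[:overlap]:
            raise ValueError("windows disagree in their overlap")
        seq += w[overlap:]
    return seq

template = "ACGTTGCAAT"
print(sampled_windows(template, 4))   # sequential reads
print(sampled_windows(template, 6))   # sampled reads with a gap
print(sampled_windows(template, 3))   # reads overlapping by one base
```

With a one-base overlap the last base of each window must equal the first base of the next, which is the redundancy used to relate one round of sequencing to the previous round.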
Reconstructing Sequences of Target Nucleic Acids:
Repetitions of the adapter cycle will generate a matrix of quantities of label corresponding to each adapter tested in each cycle.
Adapter Cycle 1 Cycle 2 Cycle 3 Cycle 4
AAAA 5 24 13 7
AAAC 10 5 9 13
AAAG 13 9 15 17
TTTG 7 13 17 10
TTTT 17 10 7 14
To reconstruct the sequences to which these quantities of label correspond, this invention also envisions algorithms for analysing such a data matrix. The algorithm attempts to identify a sequence on the basis of its frequency, i.e. a sequence present at a given frequency will have every sub-sequence present at the same frequency. The algorithm searches through each column of the matrix and attempts to resolve label quantities, which may be sums of sequence frequencies, into atomic quantities such that the same set of atomic quantities appears in all columns. The algorithm achieves this by comparing label quantities in a given column with those in all the other columns. A given atomic quantity that appears in all columns is then assumed to correspond to a unique sequence.
If two sequences have the same n-mer at a particular point in the sequence, these can be resolved by the quantitative nature of this system, in that the quantity of a particular n-mer in a particular ligation will be the sum of the quantities of the two sequences that share the n-mer at the same point. These can be largely resolved by comparison of one cycle with previous and subsequent ligation cycles to identify such sums. This is made particularly simple if the sequences that are being analysed have been amplified by PCR such that the sequence in the lowest quantity is present at not less than half the quantity of the sequence with the greatest frequency, that is to say if the frequency range of sequences lies between some quantity N and 2N. This means that any sum of frequencies will be greater than 2N and hence readily detectable.
If there is a known overlap between sequence samples, as in embodiments that use adapters that generate overlapping sequence samples, one has a certain amount of redundancy with which to account for errors. A one base overlap in samples will associate a quarter of the sequences in each column of the matrix with the next and previous columns.
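One possible reading of this procedure is sketched below, using only the rows of the example matrix shown above. The sketch takes the quantities seen in the first cycle as candidate atomic quantities and follows one of them through the cycles, accepting either an exact match or a sum with one other candidate. This simplified search, and all names in it, are assumptions introduced here for illustration and are not the algorithm of the invention.

```python
CYCLES = 4
MATRIX = {  # label quantities per adapter and cycle, from the rows of the table above
    "AAAA": [5, 24, 13, 7],
    "AAAC": [10, 5, 9, 13],
    "AAAG": [13, 9, 15, 17],
    "TTTG": [7, 13, 17, 10],
    "TTTT": [17, 10, 7, 14],
}

def trace(quantity, matrix=MATRIX):
    """Follow one atomic quantity through the cycles; None marks an unresolved cycle."""
    atoms = {row[0] for row in matrix.values()}   # first-cycle quantities as candidates
    path = []
    for cycle in range(CYCLES):
        hits = [a for a, row in matrix.items() if row[cycle] == quantity]
        if not hits:
            # The quantity may be hidden in a sum with one other candidate.
            sums = {quantity + other for other in atoms if other != quantity}
            hits = [a for a, row in matrix.items() if row[cycle] in sums]
        path.append(hits[0] if len(hits) == 1 else None)
    return path

print(trace(13))
print(trace(17))
print(trace(10))
```

For the rows shown, the quantity 13 appears once in every cycle and traces AAAG, TTTG, AAAA, AAAC. The quantity 17 is recovered in cycle 2 from the sum 24 = 7 + 17. The quantity 10 is left unresolved in cycle 3 because two different sums could contain it, which is the kind of ambiguity that redundancy between cycles is intended to reduce.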
Implementation of the process:
Practical details of implementing the process are described below.
Adaptors, PCR Primers and Oligonucleotides:
Construction of Adaptors, Primers, etc:
Details and reviews on the construction of oligonucleotides are available in numerous up to date texts, which should allow one skilled in the art to construct primers, adaptors and any other oligonucleotides required by the invention:
o Gait, M.J., editor, 'Oligonucleotide Synthesis: A Practical Approach', IRL Press, Oxford, 1990
o Eckstein, editor, 'Oligonucleotides and Analogues: A Practical Approach', IRL Press, Oxford, 1991
o Kricka, editor, 'Nonisotopic DNA Probe Techniques', Academic Press, San Diego, 1992
o Haugland, 'Handbook of Fluorescent Probes and Research Chemicals', Molecular Probes, Inc., Eugene, 1992
o Keller and Manak, 'DNA Probes, 2nd Edition', Stockton Press, New York, 1993
o Kessler, editor, 'Nonradioactive Labeling and Detection of Biomolecules', Springer-Verlag, Berlin, 1992.
Conditions for Using Oligonucleotide Constructs:
Details on effects of hybridisation conditions for nucleic acid probes can be found in the references below:
o Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26, 227-259, 1991
o Sambrook et al, 'Molecular Cloning: A Laboratory Manual, 2nd Edition', Cold Spring Harbor Laboratory, New York, 1989
o Hames, B.D., Higgins, S.J., 'Nucleic Acid Hybridisation: A Practical Approach', IRL Press, Oxford, 1988
Ligation:
Ligation of adaptors is another critical aspect of the invention that must be considered. Chemical methods of ligation are known:
o Ferris et al, Nucleosides and Nucleotides 8, 407 - 414, 1989
o Shabarova et al, Nucleic Acids Research 19, 4247 - 4251, 1991
Preferably enzymatic ligation would be used and preferred ligases would be T4 DNA ligase, T7 DNA ligase, E. coli DNA ligase, Taq ligase, Pfu ligase and Tth ligase. References to the literature are given below:
o Lehman, Science 186, 790 - 797, 1974
o Engler et al, 'DNA Ligases', pg 3 - 30 in Boyer, editor, 'The Enzymes, Vol 15B', Academic Press, New York, 1982
Protocols for use of ligases can be found in:
o Sambrook et al, cited above
o Barany, PCR Methods and Applications, 1: 5 - 16, 1991
o Marsh et al, Strategies 5, 73 - 76, 1992
Phosphorylation of Nucleic Acids :
When ligases and restriction endonucleases are used, changes are made to the 5' phosphates of the sugar molecules in the nucleic acid backbone. It may be advantageous to alter the phosphorylation state of adaptors or target nucleic acids in some versions of the process. For example, the terminal 5' bases of immobilised cDNAs left after Fok I cleavage can be dephosphorylated to prevent cross-ligation of immobilised nucleic acids with complementary termini. Hence references to the literature regarding the use of phosphatases, kinases and chemical methods are included:
o Horn and Urdea, Tetrahedron Lett. 27, 4705, 1986
o Sambrook et al, cited above
Restriction Endonucleases:
Numerous type IIs restriction endonucleases exist and could be used as sequencing enzymes for this process. Table 1 below gives a list of examples but is by no means comprehensive. A literature review of restriction endonucleases can be found in Roberts, R.J., Nucl. Acids Res. 18, 2351 - 2365, 1988. New enzymes are discovered at an increasing rate and more up to date listings are recorded in specialist databases such as REBase, which is readily accessible on the internet using software packages such as Netscape or Mosaic and is found at the World Wide Web address: http://www.neb.com/rebase/. REBase lists all restriction enzymes as they are discovered and is updated regularly; moreover, it lists recognition sequences, isoschizomers of each enzyme, manufacturers and suppliers, and references to them in the scientific literature. The protocol would be much the same irrespective of the type IIs restriction endonuclease used, but the spacing of recognition sites for a given enzyme within an adaptor would be tailored according to requirements and the enzyme's cutting behaviour (see figure n above).
Enzyme Name    Recognition sequence    Cutting site
Fok I          GGATG                   9/13
BstF5I         GGATG                   2/0
SfaNI          GCATC                   5/9
Hga I          GACGC                   5/10
Bbv I          GCAGC                   8/12
Table 1: A sample of type IIs restriction endonucleases
The requirement of the process is the generation of ambiguous sticky-ends at the termini of the nucleic acids being analysed. This could also be achieved by controlled use of 5' to 3' exonucleases. Clearly any method that achieves the creation of such sticky-ends will suffice for the process.
Similarly ordinary type II restriction endonucleases required by this invention can be found in the reference sources listed above. Details on methylation sensitivity and other means of controlling enzyme action can be found in the references given in REBase or can be acquired from the manufacturers.
Other means, however, of cleaving the immobilised nucleic acid might also suffice for this invention. Site specific chemical cleavage has been reported in Chu, B.C.F. and Orgel, L.E., Proc. Natl. Acad. Sci. USA, 1985, 963 - 967. Use of a non-specific nuclease to generate blunt ended fragments might also be used. Preferably, though, a type II restriction endonuclease would be used, chosen for accuracy of recognition of its site, maximal processivity and cheap and ready availability.
Solid Phase Supports :
A full discussion of solid phase supports can be found in Brenner PCT/US95/12678 pg 12 - 14. This is an important issue in the use of fluorimetry to determine sequence abundance in that the design of supports will affect the acquisition of fluorescent signals which must be maximised for this process to be effective.
Fluorimetry:
Preferred embodiments of the process will use adaptors labelled with fluorescent labels. Detection of fluorescent signals can be performed using optical equipment that is readily available. Fluorescent labels usually have optimum frequencies for excitation and then fluoresce at specific wavelengths in returning from an excited state to a ground state. Excitation can be performed with lasers at specific frequencies and fluorescence detected using collection lenses, beam splitters and signal distribution optics. These direct fluorescent signals to photomultiplier systems, which convert optical signals to electronic signals that can be interpreted using appropriate electronics systems.
Brenner PCT/US95/12678 pg 26 - 28 gives a full discussion.
Liquid Handling Robotics :
For this process to be practically useful, automation is essential and liquid handling robots can be acquired from various sources such as Applied Biosystems .
Quantification and mass spectrometry:
For the most part biochemical and molecular biological assays are quantitative. The mass spectrometer is not a simple device for quantification but use of appropriate instrumentation can lead to great sensitivity. It must always be remembered that the ion count is not a direct measure of the source molecule quantity; the relationship is a complex function of the molecule's ionisation behaviour. Quantitation is effected by scanning the mass spectrum and counting ions at each mass/charge ratio scanned. The count is integrated to give the total count at each point in the spectrum over a given time. These counts can be related back to the original quantities of source molecules in a sample. Methods for relating the ion count or current back to the quantity of source molecule vary. External standards are one approach, in which the behaviour of the sample molecules is determined prior to measurement of the unknown sample. A calibration curve for each sample molecule can be determined by measuring the ion current for serial dilutions of a sample molecule when fed into the instrument configuration being used.
Internal standards are probably the more favoured approach, since an internal standard is subjected to the same experimental conditions as the sample, so any experimental vagaries will affect both internal control and sample molecule. To determine the quantity of a sample molecule, an internal standard of a known quantity is added to the sample. The internal standard is chosen to have a similar ionisation behaviour to the molecule being measured. Thus the ratio of sample ion count to standard ion count can be used to determine the quantity of sample, as the ratio of quantities should be the same. Choosing appropriate standards is the main difficulty with this approach. One must find a molecule that is similar but not identical in its mass spectrum. A favourable approach is to synthesise the sample molecule with appropriate isotopes to give a slightly different mass spectrum for a molecule with the same chemical behaviour. This approach might be less desirable than external standards for use with large numbers of mass labels, due to the added expense of finding or synthesising appropriate internal standards, but will give better quantification than external standards. An alternative to isotope labelling is to identify a molecule that has similar but not identical chemical behaviour to the sample in the mass spectrometer. Finding such analogues is difficult and is a significant task for large families of mass labels.
A compromise approach might be appropriate though, since large families of mass labels will ideally be synthesised combinatorially, and will thus be related chemically. A small number of internal controls might be used, where each individual control determines the quantities of a number of mass labels . The precise relationship between internal standard and each mass label might be determined in external calibration experiments to compensate for any differences between them.
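The internal standard calculation described above can be sketched as follows. This is a minimal illustration only: the function names and all ion counts and quantities are hypothetical and are not taken from the specification. The response factor stands for the relationship between one internal standard and one mass label that would be measured in an external calibration experiment.

```python
def response_factor(label_counts, standard_counts, label_amount, standard_amount):
    """Relative response of a mass label versus the internal standard.

    Measured once in a calibration run where both amounts are known.
    """
    return (label_counts / label_amount) / (standard_counts / standard_amount)


def quantify(label_counts, standard_counts, standard_amount, rf=1.0):
    """Estimate the amount of mass label from its ion-count ratio to the standard."""
    return (label_counts / standard_counts) * standard_amount / rf


# Hypothetical calibration run: 10 units of label and 10 units of standard.
rf = response_factor(label_counts=8000, standard_counts=10000,
                     label_amount=10.0, standard_amount=10.0)

# Hypothetical unknown sample spiked with 5 units of the same standard.
amount = quantify(label_counts=2400, standard_counts=5000,
                  standard_amount=5.0, rf=rf)

print(f"response factor {rf:.2f}, estimated amount {amount:.2f}")
```

With a response factor of 1.0 this reduces to the simple ratio method, which is the case where the standard is an isotope-labelled copy of the sample molecule.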
The configuration of the instrument is critical to determining the actual ion count itself, particularly the ionisation method and the separation method used. Certain separation methods act as mass filters like the quadrupole which only permits ions with a particular mass charge ratio to pass through at one time. This means that a considerable proportion of sample never reaches the detector. Furthermore most mass spectrometers only detect one part of the mass spectrum at a time. Given that a large proportion of the mass spectrum may be empty or irrelevant but is usually scanned anyway, this means a further large proportion of the sample is wasted. These factors may be a problem in detecting very low abundance ions but these problems can in large part be overcome by correct configuration of the instrumentation.
To ensure better quantification one could attempt to ensure all ions that are generated are detected. Mattauch-Herzog geometry sector instruments permit this but have a number of limitations. Sector instruments are organised into distinct regions, 'sectors', that perform certain functions. In general the ionisation chamber feeds into a free sector which feeds into an 'electric sector'. The electric sector essentially 'focusses' the ion beam, which is divergent after leaving the ion source. The electric sector also ensures the ion stream has the same energy. This step results in the loss of a certain amount of sample. This focussed ion beam then passes through a second free area into a magnetic sector which splits the beam on the basis of its mass charge ratio. The magnetic sector behaves almost like a prism. A photographic plate can be placed in front of the split beam to measure the intensities of the spectrum at all positions. Unfortunately there is a limit on the dynamic range of these sorts of detector and the approach is messy and cumbersome. Better dynamic range is achievable with electron multiplier arrays, but at a cost of loss in resolution, which is limited by how close together the elements of the array can be constructed. With a family of well characterised mass labels one would probably monitor only sufficient peaks to sample all the mass labels unambiguously. In general array detectors would allow one to monitor a number of regions of the mass spectrum simultaneously and continuously, which might be applicable for use with well characterised mass label families. The limit on the resolution of closely spaced regions of the spectrum might restrict the number of labels one might use, though, if array detectors are chosen.
For 'selected ion monitoring' (SIM) the quadrupole has an advantage over many configurations in that the fields that filter ions can be changed with extreme rapidity, allowing a very high sampling rate over a small number of peaks of interest.
Orthogonal TOF mass spectrometry:
An approach that is preferable to array geometries is the orthogonal time of flight mass spectrometer. This geometry allows for very fast sampling of an ion stream followed by almost instantaneous detection of all ion species. The ion current leaving the source, probably an electrospray source for many biological applications, passes an electrode plate perpendicular to the current. This plate is essentially an electrical gate and is used to generate a repulsive potential which deflects the ion current 'orthogonally' into a time of flight mass analyser that uses a reflectron. The reflectron is essentially a series of circular electrodes that generate an increasingly repulsive electromagnetic field that normalises ion energies and reflects the ion stream into a detector. The reflectron is a simple device that greatly increases the resolution of TOF analysers. Ions leaving the ion source will have different energies. Faster ions penetrate the repulsive field further than ions with a lower energy and so are delayed slightly with respect to the lower energy ions, but since they reach the reflectron slightly before the lower energy ions, the two effects roughly cancel and all the ions of a given mass charge ratio arrive at the detector at roughly the same time. When the electrical gate is 'closed' to deflect ions into the TOF analyser, the timer is triggered. The flight time of the deflected ions is recorded and this is sufficient to determine their mass/charge ratio. The gate generally only sends a short pulse of ions into the TOF analyser at any one time. Since the arrival of all ions is recorded and since the TOF separation is extremely fast, the entire mass spectrum is measured effectively simultaneously. Furthermore, the gate electrode can sample the ion stream at extremely high frequencies so very little sample is required. For these reasons this geometry is extremely sensitive, to the order of a few femtomoles.
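The statement that the flight time is sufficient to determine the mass/charge ratio can be illustrated with the textbook relation for an idealised field-free drift tube, in which an ion accelerated through a potential V travels a length L in time t = L * sqrt(m / (2 z e V)). The sketch below is an assumption-laden simplification: it ignores the reflectron and the initial energy spread, and the accelerating voltage and drift length used are hypothetical values, not parameters from the specification.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
AMU = 1.66053906660e-27      # atomic mass unit in kilograms


def flight_time(mz, accel_volts, length_m):
    """Flight time in seconds for an ion of mass/charge mz (Da per unit charge)."""
    return length_m * math.sqrt(mz * AMU / (2.0 * E_CHARGE * accel_volts))


def mz_from_time(t_seconds, accel_volts, length_m):
    """Invert the relation above to recover mass/charge from a recorded flight time."""
    return 2.0 * E_CHARGE * accel_volts * t_seconds ** 2 / (length_m ** 2 * AMU)


# Hypothetical instrument: 20 kV acceleration and a 1 m flight path.
t = flight_time(500.0, accel_volts=20000.0, length_m=1.0)
print(f"m/z 500 arrives after {t * 1e6:.2f} microseconds")
print(f"recovered m/z: {mz_from_time(t, 20000.0, 1.0):.1f}")
```

Because the flight time grows with the square root of mass/charge, every ion in a pulse can be assigned from a single timing record, which is why the whole spectrum is obtained from each pulse.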
Example 1 :
Ligation of adapters and cycles of digestion have been demonstrated - PCT/GB95/00109. The purpose of the following examples is to show that the ligation of labelled adaptors can give a quantitative signal that is proportional to the quantity of template present .
Three different PCR products are used to represent 3 different templates at different frequencies. The PCR products used for this are exons 14, 16 and 19 of the anion exchanger (AE1), as these PCRs have already been optimised in our laboratory. These are referred to as AE14, AE16 and AE19.
The products are captured to DynaBeads (by incorporating a biotin in one of the PCR primers) and effectively represent captured cDNA. AE16 will be at half the concentration of AE14 and AE19 will be at one fifth the concentration of AE14.
AE14 sequence
ccaaagctgggagagaacagaatgccttggttttctgctgcagatcttccaggaccacccactacagaagac
ttataactacaacgtgttgatggtgcccaaacctcagggccccctgcccaacacagccctcctctcccttgt
gctcatggccggtaccttcttctttgccatgatgctgcgcaagttcaagaacagctcctatttccctggcaa
gtcagcataccctcctcgcctgtccttgccaacactgc
AE16 sequence
ctgggagaatgccagggaaaggtctctgcctcccaccctcccaggcccagcccccaccctgtctctcacgtg
gtgatctgagactccaggaatatgaggatgaagaccagcagagcaggcagggcggaggcaaaatcatccaga
tgggaaactcggaacgcaagcccagtgggtggatgacccagccccgggctgaggagttgacaccttgaagcc
atcaggcaccgagagtttctgtgggagggggtagcaggtaagaatgccaagggc
AE19 sequence
gtgataggcactgaccccagcctccgcctgcaggtgaagacctggcgcatgcacttattcacgggcatccag
atcatctgcctggcagtgctgtgggtggtgaagtccacgccggcctccctggccctgcccttcgtcctcatc
ctcactgtgccgctgcggcgcgtcctgctgccgctcatcttcaggaacgtggagcttcagtgtgtgagtggc
tgcctgggcctggggcacaagagctgggagcatgcg
Following capture, they are first digested with the frequent cutter Sau 3A1. This enzyme recognises the sequence GATC.
This provides the following 4bp overhangs of each of the products.
AE14
    TTCCAGGACCACC...
CTAGAAGGTCCTGGTGG...
AE16
    TGAGACTCCAGGAATAT...
CTAGACTCTGAGGTCCTTATA...
AE19
    ATCTGCCTGGCAG...
CTAGTAGACGGACCGTC...
The following adaptor, complementary to the 4bp overhang revealed by Sau 3A1 and containing a Fok I site, is then ligated to the captured fragments.
Adaptor SauFAM
FAM - CTAGAGGACGATCGA.GGATG
      GATCTCCTGCTAGCT.CCTAC.CTAG
                      |   |
                      Fok I site
This produces the following sequences
AE14
FAM - CTAGAGGACGATCGA.GGATG.GATC.TTCCAGGACCACC...
      GATCTCCTGCTAGCT.CCTAC.CTAG.AAGGTCCTGGTGG...
AE16
FAM - CTAGAGGACGATCGA.GGATG.GATC.TGAGACTCCAGGAATAT...
      GATCTCCTGCTAGCT.CCTAC.CTAG.ACTCTGAGGTCCTTATA...
AE19
FAM - CTAGAGGACGATCGA.GGATG.GATC.ATCTGCCTGGCAG...
      GATCTCCTGCTAGCT.CCTAC.CTAG.TAGACGGACCGTC...
These sequences are then digested with Fok I, which cuts at 9 and 13 bases from GGATG, and the following fragments are released into solution.
AE14
FAM - CTAGAGGACGATCGA.GGATG.GATC.TTCCA
      GATCTCCTGCTAGCT.CCTAC.CTAG.AAGGTCCTG
AE16
FAM - CTAGAGGACGATCGA.GGATG.GATC.TGAGA
      GATCTCCTGCTAGCT.CCTAC.CTAG.ACTCTGAGG
AE19
FAM - CTAGAGGACGATCGA.GGATG.GATC.ATCTG
      GATCTCCTGCTAGCT.CCTAC.CTAG.TAGACGGAC
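The 9/13 cutting behaviour can be checked against the AE14 fragment shown above with a short sketch. It assumes the convention used in the diagrams, namely that the upper strand is written 5' to 3' and the lower strand 3' to 5', both left to right, and that the cut offsets are counted from the last base of the GGATG site. The function name is hypothetical.

```python
SITE = "GGATG"
TOP_CUT, BOTTOM_CUT = 9, 13   # Fok I cutting site 9/13


def fok1_release(upper, lower):
    """Return the released upper strand, lower strand and the 4 base overhang.

    upper is written 5'->3' and lower 3'->5', both left to right,
    as in the diagrams. The overhang is returned read 5'->3'.
    """
    site_end = upper.index(SITE) + len(SITE)
    upper_fragment = upper[:site_end + TOP_CUT]
    lower_fragment = lower[:site_end + BOTTOM_CUT]
    # The lower strand extends past the upper strand; reverse it to read 5'->3'.
    overhang = lower_fragment[len(upper_fragment):][::-1]
    return upper_fragment, lower_fragment, overhang


# AE14 ligated to SauFAM, dots removed from the diagram above.
upper = "CTAGAGGACGATCGAGGATGGATCTTCCAGGACCACC"
lower = "GATCTCCTGCTAGCTCCTACCTAGAAGGTCCTGGTGG"

upper_fragment, lower_fragment, overhang = fok1_release(upper, lower)
print(upper_fragment)
print(lower_fragment)
print(overhang)
```

The staggered cut leaves four unpaired bases on the released fragment, and it is this overhang that is probed in the next step.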
The cleaved fragments are then captured, through ligation, to 3 different wells of a microtitre plate, each containing a specific adaptor, simulating the first cycle of a sequencing reaction and providing the first 4 bases. See below for full sequences.
For AE14 (adaptor Bbv14)
Biotin-N-GCAGC.AGA
       N-CGTCG.TCT.CAGG
         |   |
         Bbv I site
For AE16 (adaptor Bbv16)
Biotin-N-GCAGC.AGA
       N-CGTCG.TCT.CCTC
For AE19 (adaptor Bbv19)
Biotin-N-GCAGC.AGA
       N-CGTCG.TCT.GTCC
Where N is a number of bases
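How each released fragment finds its well can be sketched by comparing overhangs. The sketch assumes the strand conventions of the diagrams (lower strands written 3' to 5', left to right) and uses the lower strands of the Fok I released fragments with the dots removed. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Overhang of each adaptor as drawn on its lower strand.
BBV_OVERHANGS = {"Bbv14": "CAGG", "Bbv16": "CCTC", "Bbv19": "GTCC"}

# Lower strands of the Fok I released fragments, as drawn.
FRAGMENT_LOWER = {
    "AE14": "GATCTCCTGCTAGCTCCTACCTAGAAGGTCCTG",
    "AE16": "GATCTCCTGCTAGCTCCTACCTAGACTCTGAGG",
    "AE19": "GATCTCCTGCTAGCTCCTACCTAGTAGACGGAC",
}


def fragment_overhang(lower_strand, n=4):
    """Last n bases of the lower strand, reversed so that they read 5'->3'."""
    return lower_strand[-n:][::-1]


def matching_adaptors(lower_strand):
    """Names of the adaptors whose overhang base-pairs with the fragment overhang."""
    target = fragment_overhang(lower_strand).translate(COMPLEMENT)
    return [name for name, oh in BBV_OVERHANGS.items() if oh == target]


for name, strand in FRAGMENT_LOWER.items():
    print(name, fragment_overhang(strand), matching_adaptors(strand))
```

Each fragment pairs with exactly one of the three adaptors, so a fluorescent signal in a given well identifies the four bases of the overhang.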
This produces the following sequences:
For AE14
Biotin-N-GCAGC.AGA.GTCCTGGAAGATC.CATCC.AGCTAGCAGGAGATC
       N-CGTCG.TCT.CAGGACCTTCTAG.GTAGG.TCGATCGTCCTCTAG-FAM
For AE16
Biotin-N-GCAGC.AGA.GGAGTCTCAGATC.CATCC.AGCTAGCAGGAGATC
       N-CGTCG.TCT.CCTCAGAGTCTAG.GTAGG.TCGATCGTCCTCTAG-FAM
For AE19
Biotin-N-GCAGC.AGA.CAGGCAGATGATC.CATCC.AGCTAGCAGGAGATC
       N-CGTCG.TCT.GTCCGTCTACTAG.GTAGG.TCGATCGTCCTCTAG-FAM
At this point the concentration can be measured through fluorescence of the FAM label and the first 4 bases (XXXX) determined. Successful ligation, measured by fluorescence therefore provides concentration information and the first 4 bases of each fragment .
Adaptor Sequences and Preparation:
SauFAM
5'-FAM-CTAGAGGACGATCGAGGATG-3'
3'-GATCTCCTGCTAGCTCCTACCTAG-PO4-5'
'Bbv' Adaptors
Bbv14
5'-BIOTIN-6C-CCTAGACTAGAGGACCGATCGAATCAGCAGCAGA-3'
3'-GATCTGATCTCCTGGCTAGCTTAGTCGTCCTCTCAGG-PO4-5'
Bbv16
5'-BIOTIN-6C-CCTAGACTAGAGGACCGATCGAATCAGCAGCAGA-3'
3'-GATCTGATCTCCTGGCTAGCTTAGTCGTCCTCTCCTC-PO4-5'
Bbv19
5'-BIOTIN-6C-CCTAGACTAGAGGACCGATCGAATCAGCAGCAGA-3'
3'-GATCTGATCTCCTGGCTAGCTTAGTCGTCCTCTGTCC-PO4-5'
Cycling Adaptors
C14
5'-FAM-CAACTGTCCAGGATC-3'
3'-GTTGACAGGTCCTAGAAGG-PO4-5'
C16
5'-FAM-CAACTGTCCAGGATC-3'
3'-GTTGACAGGTCCTAGACTC-PO4-5'
C19
5'-FAM-CAACTGTCCAGGATC-3'
3'-GTTGACAGGTCCTAGTAGA-PO4-5'
BioFAMFok
5'-BIOTIN-GGTCACTTAGATCGATCCATGAGGATGCTTCATTCTGATTCAGTCC-3'
3'-CCAGTGAATCTAGCTAGGTACTCCTACGAAGTAAGACTAAGTCAGG-FAM
BioG
5'-BIOTIN-GCATCTGGAGTCTACAGTCGTCTATTGACG-3'
3'-CGTAGACCTCAGATGTCAGCAGATAACTGCCGGC-PO4-5'
GCCG
5'-FAM-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCGCCA-PO4-5'
In the above adaptors the abbreviations used are as follows:
FAM - fluorescein
PO4 - phosphate
All primers were purchased from Oswell DNA Services.
All adaptors were made by heating 200ul of TE containing each primer at 20pmol/ul concentration at 90°C, in a Techne Dryblock and allowing the block to cool to room temperature over 2 hours . The adaptors were then incubated on ice for 1 hour and then frozen at -20°C until used.
Binding Bbv14, 16 and 19 Adaptors to Microtitre Plate
In order to capture the Fok I cleaved fragments to the 'Bbv' adaptors via ligation, the 'Bbv' adaptors were bound to black, streptavidin coated 96 well microtitre plates (Boehringer Mannheim). This was achieved by incubating 10pmol of the appropriate adaptor in 35ul of 1xTE+0.1M NaCl in each well overnight at 4°C. Following the overnight incubation each well was washed 3 times with 50ul of 1xTE+0.1M NaCl. The 1xTE+0.1M NaCl was removed, 50ul of 1x ligase buffer was added to each well and the plate was stored at 4°C until used.
Plate capacity
To determine the binding capacity of each well, 10pmol of BioFAMFok adaptor was bound to each of 8 wells by incubating the adaptor in 25ul of 1xTE+0.1M NaCl overnight at 4°C. Following the overnight incubation each well was washed 3 times with 50ul of 1xTE+0.1M NaCl. A dilution series of BioFAMFok (5, 2.5, 1.25, 0.675, 0.3375pmol) in 1xTE+0.1M NaCl was added to a further series of wells and the fluorescence of the plate was read in a Biolumin Microtiter Plate Reader (Molecular Dynamics).
The following readings (expressed as Relative Fluorescent Units) were obtained.
Dilution wells
5 pmol         74575 RFU
2.5 pmol       35429 RFU
1.25 pmol      16232 RFU
0.625 pmol     9388 RFU
0.3375 pmol    4807 RFU
Wells incubated with 10pmol of adaptor and washed
20872 RFU
21516 RFU
22519 RFU
21679 RFU
22658 RFU
21517 RFU
21742 RFU
22417 RFU
mean 21865 RFU
From these figures one can calculate that 21865 RFUs is equal to 1.5 pmol of BioFAMFok. These data agree with the capacity of the wells to bind biotinylated double stranded DNA (5pmol hybridised in 200ul) provided by the Boehringer Mannheim technical help line.
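One way to arrive at the figure of about 1.5 pmol is sketched below. The specification does not say how the conversion was done, so the method here is an assumption: a straight line through the origin is fitted to the three most concentrated dilution wells, and the mean reading of the eight washed wells is divided by the fitted slope.

```python
# Dilution wells: (pmol of BioFAMFok, RFU), three most concentrated points.
standards = [(5.0, 74575), (2.5, 35429), (1.25, 16232)]

# The eight wells incubated with 10pmol of adaptor and washed.
washed_wells = [20872, 21516, 22519, 21679, 22658, 21517, 21742, 22417]


def slope_through_origin(points):
    """Least-squares slope of y = k * x for the given (x, y) points."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)


mean_rfu = sum(washed_wells) / len(washed_wells)
rfu_per_pmol = slope_through_origin(standards)
bound_pmol = mean_rfu / rfu_per_pmol

print(f"mean reading: {mean_rfu:.0f} RFU")
print(f"bound adaptor: {bound_pmol:.2f} pmol per well")
```

Other reasonable choices, such as scaling from the 5 pmol well alone, give estimates within a few percent of the same value.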
Effect of Tween 20 on Ligation
The addition of 0.1% Tween 20 to the reaction buffer used with Fok I is claimed to reduce the exonuclease activity associated with this enzyme (Fok I data sheet - New England Biolabs). The following experiment was performed in order to determine whether the addition of Tween would have any effect on the subsequent ligation of the cleaved fragments.
Nine reactions were set up in three sets of three, each set containing either 0, 0.05 or 0.1% Tween in 25ul of 1x ligase buffer, 10pmol BioG adaptor, 10pmol GCCG adaptor and 200ul ligase (New England Biolabs). A further set of three reactions was set up as above with 0.1% Tween and no ligase. These were then incubated at 16°C for 1 hour and each reaction was then transferred to a well of a black streptavidin coated microtitre plate (Boehringer Mannheim). The plate was incubated at room temperature for one hour, each well was washed 3 times with 100ul of TES and the fluorescence was measured in a Biolumin Microtiter Plate Reader (Molecular Dynamics).
The following readings (expressed as Relative Fluorescent Units) were obtained.
Sample    0% Tween 20    0.05% Tween 20    0.1% Tween 20    0.1% Tween 20 (no ligase)
1         8592           8742              10213            3660
2         8083           8712              10605            3967
3         8720           8519              11598            3468
means     8465           8657.7            10805            3698
The above data demonstrate that the inclusion of 0.1% Tween 20 increases ligation efficiency and therefore should not be detrimental to the ligation of the Fok I cleaved fragments to the 'Bbv' adaptors.
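The means in the table, and the size of the increase seen with 0.1% Tween 20, can be reproduced from the triplicate readings. The dictionary keys and the ratio calculation are illustrative additions and do not come from the specification.

```python
from statistics import mean

# Triplicate readings in RFU for each condition.
readings = {
    "0% Tween": [8592, 8083, 8720],
    "0.05% Tween": [8742, 8712, 8519],
    "0.1% Tween": [10213, 10605, 11598],
    "0.1% Tween, no ligase": [3660, 3967, 3468],
}

means = {condition: mean(values) for condition, values in readings.items()}
gain = means["0.1% Tween"] / means["0% Tween"]

for condition, value in means.items():
    print(f"{condition}: {value:.1f} RFU")
print(f"signal with 0.1% Tween relative to no Tween: {gain:.2f}")
```

The no ligase wells give a measure of the signal that does not depend on ligation, which could be subtracted before comparing conditions.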
PCR primers and Conditions and Purification
The 3 PCR products used to represent sequence templates at different concentrations were exons 14, 16 and 19 from the human erythrocyte anion exchanger gene located on chromosome 17q21-22.
Primer sequences used to amplify exons 14, 16 and 19
Exon 14
Forward primer
5'-GTATTTTCCAGCCCAAGCCAAAGCTGG-3'
Reverse primer
5'-BIOTIN-GCAGTGTTGGCAAGGACAGGC-3'
Exon 16
Forward primer
5'-BIOTIN-GCCCTTGGCATTCTTACCTGC-3'
Reverse primer
5'-CTGGGAGAATGCCAGGGAAAGG-3'
Exon 19
Forward primer
5'-GTGATAGGCACTGACCCCAG-3'
Reverse primer
5'-BIOTIN-CGCATGCTCCCAGCTCTTGTGC-3'
The inclusion of biotin into one of the primers in each set will allow their capture to streptavidin coated beads (Dynal UK) .
All PCR reactions were performed in 50ul containing 1x Amplitaq buffer (Perkin Elmer), 30pmol of forward and reverse primer, 200uM dNTPs, 1.25 units of Amplitaq (Perkin Elmer) and 100ng of human genomic DNA. The reactions were overlaid with 50ul of mineral oil and cycled on a Techne 'Genie' PCR machine with the following conditions.
Exon 14
1 cycle 95°C for 2 min
35 cycles 57.5°C for 45 sec
72°C for 1 min
95°C for 35 sec
1 cycle 72°C for 5 min
Exon 16
1 cycle 95°C for 2 min
35 cycles 52°C for 45 sec
72°C for 1 min
95°C for 35 sec
1 cycle 72°C for 5 min
Exon 19
1 cycle 95°C for 2 min
35 cycles 57.5°C for 45 sec
72°C for 1 min
95°C for 35 sec
1 cycle 72°C for 5 min
Purification
Excess primers and salts need to be removed before the PCR products are bound to DynaBeads. This is performed as described below.
Ten reactions of each exon were pooled separately following PCR, prior to purification. The PCR products were then ethanol precipitated by adding 2.5 volumes of 100% ethanol and one tenth of a volume of 3M sodium acetate. The solution was then incubated at -20°C for 30 minutes and spun at 13000rpm in a Heraeus A13 benchtop centrifuge for 15 minutes to precipitate the DNA. The supernatant was then poured off and the pellet allowed to air dry. The dry pellet was then resuspended in 150ul of water. Following this, 2 Chromospin-100 columns (Clontech) were prepared for each sample by spinning the columns in a Heraeus 17RS centrifuge for 3 minutes at 3500rpm according to the manufacturer's instructions. Following centrifugation 75ul of the DNA solution was added to each prepared column and spun as before, collecting the purified DNA into a 1.5ml eppendorf tube. The 2 samples for each exon were then pooled and the DNA concentration measured by reading the absorption at 260nm and 280nm in a Pharmacia Genequant spectrophotometer.
Solutions and Buffers
[Figure imgf000046_0001]
    10mM Tris-HCl
    1mM EDTA
TES pH7.5
    10mM Tris-HCl
    1mM EDTA
    2M NaCl
1x Fok I buffer pH7.9
    50mM potassium acetate
    20mM Tris acetate
    10mM magnesium acetate
    1mM DTT
1x Bbv I buffer pH7.9
    50mM NaCl
    [Figure imgf000047_0001]
    10mM MgCl2
    1mM DTT
1x Sau 3A buffer pH7.9
    33mM Tris acetate
    66mM potassium acetate
    10mM magnesium acetate
    0.5mM DTT
1x Ligase buffer pH7.8
    50mM Tris-HCl
    10mM MgCl2
    10mM DTT
    1mM ATP
    50ug/ml BSA
Experiment
Concentrations of Column Purified DNA
exon 14 - 130ng/ul
exon 16 - 120ng/ul
exon 19 - 115ng/ul
1ug exon 14 (255bp) = 5.9pmol, 1ug exon 16 (272bp) = 5.58pmol, 1ug exon 19 (252bp) = 6.03pmol.
1ug exon 14 = 7.7ul, 1ug exon 16 = 8.3ul, 1ug exon 19 = 8.7ul,
therefore exon 14 = 0.76pmol/ul, exon 16 = 0.67pmol/ul, exon 19 = 0.69pmol/ul
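The conversion from mass to molar amounts can be sketched as below. The average mass of 660 g/mol per base pair of double stranded DNA is a commonly used approximation and is an assumption here, since the specification does not state the value used. The results agree with the figures above to within rounding.

```python
BP_MASS = 660.0  # g/mol per base pair of double stranded DNA (assumed average)

# Product length in bp and measured concentration in ng/ul.
products = {
    "exon 14": (255, 130.0),
    "exon 16": (272, 120.0),
    "exon 19": (252, 115.0),
}


def pmol_per_ug(length_bp):
    """Picomoles of a double stranded fragment contained in 1 ug."""
    return 1e6 / (length_bp * BP_MASS)


def pmol_per_ul(length_bp, ng_per_ul):
    """Molar concentration in pmol/ul from a mass concentration in ng/ul."""
    return ng_per_ul / 1000.0 * pmol_per_ug(length_bp)


for name, (length_bp, conc) in products.items():
    print(f"{name}: {pmol_per_ug(length_bp):.2f} pmol/ug, "
          f"{pmol_per_ul(length_bp, conc):.2f} pmol/ul")
```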
Sau 3A Digest
30, 15 and 6pmol of column purified exons 14, 16 and 19, respectively, were digested with 20 units of Sau 3A in 100ul of 1x Sau 3A buffer at 37°C for 4 hours.
exon 14             39.5ul
exon 16             22.4ul
exon 19             8.7ul
Sau 3A              5ul
10x Sau 3A buffer   10ul
H2O                 14.4ul
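The volumes in the digest follow from the stock concentrations and the target amounts, as the following sketch shows. The function and variable names are hypothetical; the numbers are those given above.

```python
# Stock concentrations in pmol/ul and target amounts in pmol.
STOCK = {"exon 14": 0.76, "exon 16": 0.67, "exon 19": 0.69}
TARGET = {"exon 14": 30, "exon 16": 15, "exon 19": 6}


def digest_volumes(total_ul=100.0, enzyme_ul=5.0):
    """Volume in ul of each component of the Sau 3A digest."""
    volumes = {name: round(TARGET[name] / STOCK[name], 1) for name in TARGET}
    volumes["Sau 3A"] = enzyme_ul
    volumes["10x Sau 3A buffer"] = total_ul / 10.0
    # Water makes the reaction up to the final volume.
    volumes["H2O"] = round(total_ul - sum(volumes.values()), 1)
    return volumes


volumes = digest_volumes()
for component, ul in volumes.items():
    print(f"{component}: {ul}ul")
```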
Following digestion the reaction mix was heated at 65°C in a Techne Dryblock for 20 minutes to inactivate the enzyme.
Preparation of DynaBeads M280
According to the manufacturer's instructions 3mg of DynaBeads M280 will bind 60-120 pmol of biotinylated double stranded DNA. 300ul of DynaBeads M280 at 1mg/ml were washed with 100ul TES by holding the beads to the side of an eppendorf tube with a Magnetic Particle Concentrator (Dynal UK) so that the supernatant could be removed. This was repeated three times. (All subsequent bead manipulations were carried out in this manner according to the manufacturer's instructions.) The beads were resuspended in 100ul of TES and the Sau 3A digested DNA was added and incubated at room temperature for 1 hour to allow the biotinylated DNA to bind to the beads.
The beads/DNA were then washed three times with 1x ligase buffer using the Magnetic Particle Concentrator (Dynal UK) as before.
Ligation of SauFAM Adaptor (Containing Fok I site)
The supernatant was removed and the beads/DNA were resuspended in 75ul of 1x ligase buffer containing 300pmol of SauFAM adaptor and 4000 units of ligase (New England Biolabs).
Beads/DNA, 7.5ul 10x ligase buffer, 15ul SauFAM (at 20pmol/ul), 10ul ligase (at 400 units/ul), 42.5ul H2O
The reaction was then incubated at 16°C for 2 hours.
Fok I Digestion
Following ligation the beads/DNA were washed 2 times with 75ul of 1x Fok I buffer, then resuspended in 100ul of 1x Fok I buffer and heated at 65°C in a Techne Dryblock for 20 minutes to inactivate any remaining ligase.
The buffer was removed and the beads/DNA resuspended in 95ul of 1x Fok I buffer containing 20 units of Fok I (New England Biolabs).
Beads/DNA, 9.5ul 10x Fok I buffer, 5ul Fok I (at 4 units/ul)
The beads/DNA were then incubated at 37°C for 2 hours. Following incubation the supernatant, containing the fragments cleaved by Fok I, was transferred to a fresh eppendorf tube and heated at 65°C for 20 minutes in a Techne Dryblock to inactivate the Fok I.
Ligation of Fok I Cleaved Fragments to Bbv Adaptors on Microtiter Plate
The Fok I fragments were then divided into three tubes, each containing 30ul of Fok I cleaved fragments, 5ul of 10x ligase buffer, 3ul ligase (at 400 units/ul - New England Biolabs) and 12ul of H2O.
The ligase buffer on a plate containing adaptors Bbv14, 16 and 19 in separate wells (prepared as previously described) was removed and the above reaction mixtures, containing the Fok I cleaved fragments and ligase, were added, one to each well.
The wells were then incubated at 16°C for one hour and then washed three times with 50ul of TES. The TES was removed from the wells, another 50ul of TES added and the fluorescence measured in a Biolumin Microplate Reader (Molecular Dynamics). A well that contained only Bbv adaptors, to which no fragments were added, was used as a blank.
Data expressed as RFUs
Bbv14 well 1774 RFU
Bbv16 well 1441 RFU
Bbv19 well 1192 RFU
Blank 1010 RFU
The reading from the blank well, which is a background reading, was subtracted from the reading of the other wells and gave the following.
Bbv14 well 764 RFU
Bbv16 well 431 RFU
Bbv19 well 182 RFU
As half as much exon 16 as exon 14 (15pmol exon 16, 30pmol exon 14) was included in the procedure, the reading obtained from the Bbv16 well should be half (i.e. 50%) of that obtained from the Bbv14 well, and as one fifth the amount of exon 19 compared to exon 14 (6pmol exon 19, 30pmol exon 14) was included, the reading obtained from the Bbv19 well should be one fifth (i.e. 20%) of that obtained from the Bbv14 well.
Ideal Readings Expressed As Percentages
Bbv14 well 100
Bbv16 well 50
Bbv19 well 20
Actual Readings Expressed As Percentages (using Bbv14 well as 100%)
Bbv14 well 100
Bbv16 well 56.4
Bbv19 well 23.8
Bbv16 well 6.4% error
Bbv19 well 3.8% error
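The background subtraction and the percentages above can be reproduced with a few lines. The variable names are illustrative; the readings and the ideal percentages are those stated in the text, and the error is the difference between actual and ideal percentages in percentage points.

```python
raw_rfu = {"Bbv14": 1774, "Bbv16": 1441, "Bbv19": 1192}
blank_rfu = 1010
ideal_pct = {"Bbv14": 100.0, "Bbv16": 50.0, "Bbv19": 20.0}

# Subtract the background reading from every well.
net_rfu = {well: rfu - blank_rfu for well, rfu in raw_rfu.items()}

# Express each well relative to the Bbv14 well.
actual_pct = {well: round(100.0 * net / net_rfu["Bbv14"], 1)
              for well, net in net_rfu.items()}

# Deviation from the ideal, in percentage points.
error_pct = {well: round(actual_pct[well] - ideal_pct[well], 1)
             for well in actual_pct}

for well in raw_rfu:
    print(well, net_rfu[well], actual_pct[well], error_pct[well])
```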
Therefore, this process is capable of separating a mixed population of DNA and identifying 4bp, while at the same time maintaining the relative proportions of the original mixture with minimal errors. The captured fragments can in turn be reprobed to obtain another 4bp and the associated quantitative data.
Example 2
Ligation Optimisation
The ligation reaction is a critical step in this sequencing technology. Therefore, full optimisation of this reaction is required to ensure success with these techniques. The conditions for the ligation reaction have been investigated by ligating fluorescently (FAM) labelled adaptors to biotinylated adaptors captured to a streptavidin coated microtitre plate.
The biotinylated adaptors consist of a GC rich and AT rich type having the 4 base pair overhang sequence CGGC and TAAT respectively. These represent the extremes of GC and AT hybridisation and are therefore used to determine the conditions required to equalise their differing hybridisation kinetics.
Data (presented in RFUs - Relative Fluorescent Unit ) provided below are means of duplicate reactions. Reactions performed in 25ul.
FAM labelled Adaptor Titration
Used to evaluate the optimum amount of FAM labelled adaptor for use in the ligation reaction. 0, 2.5, 5 and 10pmol of FAM labelled adaptor were ligated to 1pmol of captured adaptor for 5 minutes at 16°C.
pmol GCCG    RFU    RFU    pmol ATTA
0            56     71     0
2.5          175    614    2.5
5            303    756    5
10           293    783    10
The above data demonstrate that 5pmol of FAM labelled adaptor is optimum for this reaction. Increasing the amount further does little to promote further ligation. This experiment also demonstrates the ligation differences between GC and AT rich adaptors, with GC rich ones ligating 2.5 times more than AT rich under identical conditions.
Ligase Titration
Used to evaluate the optimum ligase concentration for the reaction. 5pmol of GC and AT rich adaptor were ligated to their appropriate captured adaptors for 5 minutes at 16°C with 0, 0.5, 1 and 2ul of ligase (at 4 units per ul - units as defined by New England Biolabs, not Weiss units).
ul ligase    GCCG RFU    ATTA RFU    ul ligase
0            95          133         0
0.5          186         355         0.5
1            205         420         1
2            224         383         2
These data suggest that 1ul of ligase is optimum for this reaction. Increasing the amount of enzyme in the reaction appears to have an inhibitory effect with the GC rich adaptor, which may be a result of the increased amount of glycerol in the reaction, since the ligase is stored in a 50% glycerol solution.
Investigating the effect of a higher concentration ligase (also available from New England Biolabs) is proposed for future work, as this may be a way of equalising the differences between GC and AT adaptors by driving each reaction to completion.
Reaction Time Course
Investigating the effect of time on the reaction. 2.5 and 5pmol of GC and AT rich adaptor were ligated to their appropriate captured adaptors for 5, 10, 30 and 60 minutes at 16°C.
GCCG 2.5 pmol 5 pmol ATTA 2.5 pmol 5 pmol
5 min 178 216 5 min 97 123
10 min 198 255 10 min 107 148
30 min 312 377 30 min 229 216
60 min 474 486 60 min 231 326
As can be seen from these data increasing the reaction time increases the amount of FAM labelled adaptor ligated, as expected. A reaction time of 60 minutes will be impractical for the proposed techniques. However, these reactions do not contain any agents which promote ligation through intra molecular crowding such as polyethylene glycol (PEG) or ficol. By including such additives the reaction time can be reduced to an acceptable duration by increasing ligation speed.
Titration of intra molecular crowders
The intra molecular crowders PEG, ficol and hexamine chloride were titrated to investigate their effects on ligation. Tetramethyl ammonium chloride (TMAC), which modifies Watson and Crick base pairing, was also titrated to investigate its effect on the differing efficiency of ligation of AT and GC rich adaptors. 5pmol of adaptor was ligated for 10 minutes at 16°C.
PEG Titration
PEG % GCCG ATTA
0 545 273
2.5 510 388
7.5 534 384
15 326 422
The addition of PEG to the reaction appears to have little effect on the ligation of the GCCG adaptor up to 7.5%, and at 15% it is inhibitory to the reaction. However, it increases the amount of the ATTA adaptor ligated. Further titration is required to determine at which concentration the efficiency of the two reactions is equal.
Ficol Titration
ficol % GCCG ATTA
0 545 273
2.5 550 341
7.5 570 398
15 274 152
As with PEG, the addition of ficol has little effect on the efficiency of the GCCG reaction up to 7.5% and is inhibitory at 15%. However, increasing the concentration of ficol increases the efficiency of the ATTA reaction up to 7.5%, and at 15% it is inhibitory. Again, further titration is required to evaluate at which concentration the reactions are equalised.
Hexamine Chloride Titration
mM hexamine chloride GCCG ATTA
0 545 273
1 449 295
5 439 262
25 300 116
This data suggests that the addition of hexamine chloride in this system is inhibitory and therefore has little use in promoting ligation.
TMAC Titration
mM TMAC GCCG ATTA
0 545 273
1 681 453
5 647 215
25 686 97
The addition of TMAC to the reaction appears to increase the efficiency of the GCCG adaptor ligation at all concentrations tested, while decreasing the ATTA reaction at concentrations above 1mM. The inclusion of TMAC with PEG or ficol should be investigated as a means of equalising and promoting ligation of GCCG and ATTA adaptors.
It is clear that the inclusion of intra molecular crowders such as PEG or ficol will allow the equalisation of the differences in reaction efficiency observed between AT and GC rich adaptors.
Example 3
COMPETITIVE HYBRIDISATION ASSAY
Additive Titration
It is important that differing efficiencies of ligation between AT and GC rich adaptors are reduced to a minimum if the ligation of adaptors is to be used to obtain quantitative data on a mixed population of nucleic acid.
The effects of polyethylene glycol 8000 and ficol on equalising the differing efficiencies of ligation between AT rich (ATTA) and GC rich (GCCG) adaptors have been investigated (see previous example). Results of further similar experiments are shown in Figure 9 (all results given as Relative Fluorescence Units (RFU)).
Ficol Titration
Figure 9 shows a graph representing the effect that increasing ficol concentration has on the efficiency of ligating FAM labelled GCCG adaptor (series 1) to captured CGGC target adaptor and FAM labelled ATTA adaptor (series 2) to captured TAAT target adaptor.
RFU - Relative Fluorescent Unit
As can be seen, increasing the amount of ficol in the reaction mix increases the efficiency of reactions for the GC rich adaptor substantially and to a limited degree for the AT rich adaptor. At concentrations above 10% the ficol is inhibitory.
Ficol has much less of an effect on the efficiency of these reactions as compared to PEG (see below) and therefore will be of less use in helping to equalise the efficiency of ligation between AT and GC rich adaptors.
PEG Titration
AT Rich reactions
Figure 9 also shows a graph representing the effect that increasing PEG concentration has on the efficiency of ligating FAM labelled ATTA adaptor to captured TAAT target adaptor.
Clearly, increasing the concentration of PEG increases the amount of adaptor ligated to the target in a 5 minute reaction up to around 10%; at concentrations higher than this it begins to have an inhibitory effect. At 10% PEG approximately 3 times more adaptor is successfully ligated to the target than with no PEG in the reaction mix (1481 RFUs at 0%, 4369 RFUs at 10%).
This increase in efficiency is probably due to the PEG increasing the time that the adaptor can hybridise to the target, therefore increasing the chance that the enzyme can ligate it to the target. At concentrations above 10% the reaction solution is rather viscous and this will decrease the mobility of the reaction components and hence reduce reaction efficiency, i.e. the benefit of having the adaptor hybridised to the target for longer is lost if reaction components cannot get to the adaptor quickly enough to complete the reaction. This would explain the inhibitory effect at the higher PEG concentrations.
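The approximately threefold figure can be checked directly from the two readings quoted above; the ratio is simply the RFU at 10% PEG divided by the RFU at 0% PEG:

```latex
\[
\frac{4369\ \text{RFU}}{1481\ \text{RFU}} \approx 2.95
\]
```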
GC Rich Reactions
Figure 9 also shows a graph representing the effect that increasing PEG concentration has on the efficiency of ligating FAM labelled GCCG adaptor to a captured CGGC target adaptor.
Interestingly, increasing the concentration of PEG in this reaction has a general inhibitory effect. This observation is probably due to the increased viscosity of the solution reducing the mobility of the reaction components and therefore the reaction efficiency. This appears to outweigh any effects that increased hybridisation times may have on the efficiency of the reaction. This is probably because GC rich adaptors hybridise more strongly than AT rich ones, owing to the extra hydrogen bond that GC base pairs have, so that increasing the concentration of PEG does little to increase the time that the adaptors remain hybridised.
Therefore, from this data, it is proposed that a PEG concentration of 10% should be used in a reaction where the ligation of adaptors is used to obtain quantitative data.
Competitive assays
If one has 256 uniquely mass labelled adaptors one can reduce the time it takes to obtain quantitative data from a mixed population of nucleic acid by ligating 128 adaptors simultaneously, followed by 128 of the corresponding complementary adaptors. In this system adaptors would compete with each other for their complementary sites. In order to investigate the specificity of this system, various FAM labelled adaptors (specific, e.g. GCCG to CGGC, and mismatched, e.g. GCGG to CGGC) have been tested against increasing concentrations of unlabelled specific adaptors under different conditions, e.g. differing enzyme concentrations. The following are the data from the conditions which produced the greatest specificity and reproducibility. All reactions were performed in duplicate.
AT Rich Adaptors
Firstly, 2.5 pmol of FAM labelled specific (ATTA) and mismatched (ATCA and ATAA) adaptors were ligated in the presence of 0, 1.25, 2.5 and 5 pmol of unlabelled ATTA with 10% PEG for 5 minutes at 16°C. Following incubation, unligated adaptors were removed by 3 washes of 1xTE and the fluorescence measured. Data are expressed as relative fluorescent units.
2.5 pmol 2.5 pmol 2.5 pmol
FAM-ATTA FAM-ATCA FAM-ATAA
0 pmol ATTA 3285 0 223
1.25 pmol ATTA 2994 0 0
2.5 pmol ATTA 2744 0 0
5 pmol ATTA 1605 0 0
The most important observation from the above data is that the ATCA mismatched adaptor does not ligate to any measurable degree. The presence of the C in the ATCA adaptor must therefore disrupt the base pairing completely, thereby preventing any ligation. One would predict that this would also be the case if the mismatch contained a G instead of a C. The ATAA adaptor ligates at only 6.7% of the level of the ATTA adaptor. The replacement of the T with an A in this mismatch therefore disrupts base pairing to a lesser degree than a C and allows some ligation. However, the ligation of this mismatched adaptor is completely displaced by the presence of any unlabelled specific ATTA adaptor.
From these data one can therefore conclude that the ligation of AT rich adaptor will be highly specific in a competitive system and will deliver highly representative quantitative data.
GC Rich Adaptors
2.5 pmol of FAM labelled specific (GCCG) and mismatched (GCAG and GCGG) adaptors were ligated in the presence of 0, 1.25, 2.5 and 5 pmol of unlabelled GCCG with 10% PEG for 5 minutes at 16°C. Following incubation, unligated adaptors were removed by 3 washes of 1xTE and the fluorescence measured. Data are expressed as relative fluorescent units.
2.5 pmol 2.5 pmol 2.5 pmol
FAM-GCCG FAM-GCAG FAM-GCGG
0 pmol GCCG 8615 2992 3311
1.25 pmol GCCG 5091 2442 1472
2.5 pmol GCCG 3660 1991 841
5 pmol GCCG 2430 501 267
With GC rich sequences the mismatched adaptors do ligate, in contrast to the AT rich ones, which largely do not. The GCAG mismatch ligates at 35% and the GCGG at 38% of the amount of the specific GCCG. This is to be expected as GC base pairing is stronger than AT base pairing and can thus accommodate a degree of base pairing disruption. However, when in competition with equal amounts of unlabelled specific GCCG adaptor the amount of ligation achieved is reduced to 23% for the GCAG and 10% for the GCGG adaptors.
Therefore, the above data suggest that the AT rich adaptors will be highly specific but 10 to 23% of the GC rich adaptors could ligate to an incorrect sequence. However, this should not be a problem as these errors can be compensated for in the software that would be required to analyse the data. Also, if the same experiment is repeated with a different reference enzyme, one could use each set of data to cross reference the quantification and sequence data in order to resolve discrepancies produced from non-specific ligations.
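The percentages quoted for the GC rich adaptors can be reproduced from the table by expressing each reading relative to the FAM-GCCG signal obtained with no unlabelled competitor (8615 RFU). The following is a minimal sketch of that arithmetic; the function name percent_of_reference is hypothetical and is not part of the program described later in this specification.

```c
#include <assert.h>

/* Signal of a labelled adaptor expressed as a percentage of a reference
 * signal (here: FAM-GCCG with no unlabelled competitor present). */
double percent_of_reference(double rfu, double reference_rfu)
{
    assert(reference_rfu > 0.0);  /* a zero reference makes the ratio meaningless */
    return 100.0 * rfu / reference_rfu;
}
```

Called with the tabulated values, percent_of_reference(2992, 8615) and percent_of_reference(3311, 8615) give the uncompeted mismatch levels, and percent_of_reference(1991, 8615) and percent_of_reference(841, 8615) give the levels in the presence of 2.5 pmol of unlabelled GCCG.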
Adaptor Sequences and Preparation (Examples 2 and 3):
ATTA Adaptor:
5'-FAM-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCATTA-PO4-5'
ATAA Adaptor:
5'-FAM-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCATAA-PO4-5'
ATGA Adaptor:
5'-FAM-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCATGA-PO4-5'
GCCG Adaptor:
5'-FAM-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCGCCG-PO4-5'
GGCG Adaptor:
5'-FAM-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCGGCG-PO4-5'
GACG Adaptor:
5'-FAM-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCGACG-PO4-5'
TAAT Adaptor:
5'-Biotin-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCTAAT-PO4-5'
CGGC Adaptor:
5'-Biotin-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCCGGC-PO4-5'
Adaptors prepared as described in Example 1.
Abbreviations: FAM - fluorescein; PO4 - phosphate
All oligonucleotides purchased from Oswel DNA services.
Reconstructing sequences from matrices of n-mers:
A sequencing reaction by this method involves repeated cycles of cleaving a template with a type IIs restriction endonuclease whose recognition sequence is provided by an adaptor. If the reaction is performed with multiple templates then each cycle of the sequencing reaction will generate a signal for a series of n-mers. Many cycles of the reaction will generate a matrix of n-mers which must be analysed to reconstruct the sequences of the source templates.
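Such a matrix needs a row index for every possible n-mer. As a minimal sketch, a 4-mer can be mapped to a row number between 0 and 255 by treating it as a base-4 number. The function names nmer_to_index and index_to_nmer and the ordering A=0, C=1, G=2, T=3 are assumptions made for illustration; the specification does not state how the program numbers its rows.

```c
#include <assert.h>
#include <stddef.h>

#define CUTLENGTH 4   /* bases per n-mer */
#define NMERS 256     /* 4^CUTLENGTH possible n-mers */

/* Map one base to a 2-bit code; returns -1 for anything other than A, C, G, T.
 * The A=0, C=1, G=2, T=3 ordering is an assumption of this sketch. */
static int base_code(char base)
{
    switch (base) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Encode a string of exactly CUTLENGTH bases as a row index 0..NMERS-1.
 * Returns -1 if the string has the wrong length or an invalid character. */
int nmer_to_index(const char *nmer)
{
    int index = 0;
    assert(nmer != NULL);
    for (size_t i = 0; i < CUTLENGTH; i++) {
        int code = (nmer[i] != '\0') ? base_code(nmer[i]) : -1;
        if (code < 0)
            return -1;
        index = index * 4 + code;
    }
    return (nmer[CUTLENGTH] == '\0') ? index : -1;
}

/* Decode a row index into a NUL-terminated n-mer; out needs CUTLENGTH+1 bytes. */
void index_to_nmer(int index, char *out)
{
    static const char bases[] = "ACGT";
    assert(out != NULL && index >= 0 && index < NMERS);
    for (int i = CUTLENGTH - 1; i >= 0; i--) {
        out[i] = bases[index % 4];
        index /= 4;
    }
    out[CUTLENGTH] = '\0';
}
```

Under this assumed ordering AAAA is row 0, TTTT is row 255, and the two adaptor overhangs used in the examples, ATTA and GCCG, fall on rows 60 and 150.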
An algorithm has been implemented in the C programming language that can import and interpret such a data matrix. The entire program is not listed but the critical data structure to store the n-mer matrix is shown with the critical sections of code. A complete program that can form part of a data capture and processing system should be trivial to develop from this starting point .
The program operates by first analysing the data matrix to identify in each column of the matrix, corresponding to one cycle of the sequencing reaction, n-mer frequencies or quantities which are equivalent in other columns of the matrix given a predefined margin of error in the measurement of n-mer quantities within which to operate. The raw n-mer frequencies in the data matrix are then replaced with their probable group frequencies in each column .
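The grouping idea can be illustrated on a flat list of measurements. The sketch below is a simplification and not the ResolveGroups routine listed later: it sorts the values, treats every run of values lying within a percentage margin of the run's smallest member as one group, and overwrites each member with the group mean. The function name group_frequencies and the choice of the smallest member as the reference point are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ascending doubles. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort values ascending, then replace every run of values lying within
 * errorPercent of the run's smallest member by the mean of that run.
 * Frequencies are assumed to be non-negative.
 * Returns the number of groups found. */
int group_frequencies(double *values, int count, double errorPercent)
{
    int groups = 0;
    assert(values != NULL && count >= 0 && errorPercent >= 0.0);
    qsort(values, (size_t)count, sizeof values[0], compare_doubles);
    for (int start = 0; start < count; ) {
        assert(values[start] >= 0.0);
        double limit = values[start] * (1.0 + errorPercent / 100.0);
        double sum = 0.0;
        int end = start;
        while (end < count && values[end] <= limit) {
            sum += values[end];
            end++;
        }
        double mean = sum / (end - start);
        for (int i = start; i < end; i++)
            values[i] = mean;   /* raw value replaced by its group frequency */
        groups++;
        start = end;
    }
    return groups;
}
```

With a 4% margin, the readings 98, 100, 101, 249, 251 and 400 collapse into three groups with means of about 99.7, 250 and 400.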
This new data matrix is then analysed by a second algorithm which assumes that there should be the same number of n-mers in each column of the matrix and attempts to resolve any 'sums' of frequencies where the same n-mer has occurred in more than one template in a given cycle of the sequencing reaction. This algorithm takes the group frequencies in the data matrix and generates a sorted 'frequency list' that lists the number of occurrences of each group frequency in order of increasing number of occurrences .
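A frequency list of this kind can be sketched as a tally followed by a sort. The code below is an illustration only: the FreqCount type and build_frequency_list function are hypothetical names, and the program itself keeps this information in the List structure declared in List.h, which is not reproduced in this specification. Ties in occurrence count are broken by increasing frequency, which is a choice made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double frequency;   /* group frequency */
    int occurrences;    /* number of matrix cells holding that frequency */
} FreqCount;

/* Order by increasing occurrence count, then by increasing frequency. */
static int by_occurrences(const void *a, const void *b)
{
    const FreqCount *x = a, *y = b;
    if (x->occurrences != y->occurrences)
        return (x->occurrences > y->occurrences) - (x->occurrences < y->occurrences);
    return (x->frequency > y->frequency) - (x->frequency < y->frequency);
}

/* Tally values that are exactly equal (i.e. already replaced by their group
 * frequencies) into out[], then sort the tally by increasing occurrences.
 * out must have room for count entries.
 * Returns the number of distinct frequencies. */
int build_frequency_list(const double *values, int count, FreqCount *out)
{
    int distinct = 0;
    assert(values != NULL && out != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        int j = 0;
        while (j < distinct && out[j].frequency != values[i])
            j++;
        if (j == distinct) {            /* first time this frequency is seen */
            out[distinct].frequency = values[i];
            out[distinct].occurrences = 0;
            distinct++;
        }
        out[j].occurrences++;
    }
    qsort(out, (size_t)distinct, sizeof out[0], by_occurrences);
    return distinct;
}
```

The rarest frequencies end up at the head of the list, so a caller walking the list from the start meets the likely sums first.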
The algorithm then takes group frequencies with the lowest number of occurrences first on the assumption that these are likely to be sums, since sums of groups should occur with a relatively low frequency. An alternative would be to generate a sorted list of group frequencies, in order of decreasing quantity, and start with the largest quantities, again on the assumption that these are likely to be sums.
The algorithm then tests each frequency in the list against each column of the original data matrix. If the group frequency occurs in the column it is tested against all combinations of pairs of group frequencies that are missing from the column to see if any of these missing frequencies can add up to give the current frequency being tested. If any of these missing frequencies do add up, and there is only one pair that can add up within the predetermined margin of error, then it is assumed that the larger frequency is the sum of the two missing frequencies, and the larger frequency is replaced in the current column of the data matrix by occurrences of the two missing frequencies. Any frequencies that could be the sum of two different pairs of missing frequencies are marked as such, and in the final sequence reconstruction the corresponding bases are marked as unknown.
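The pair test at the heart of this step can be sketched in isolation. The function below is a hypothetical helper, not part of the listed program: given the frequencies missing from a column, it counts the pairs whose sum falls within a percentage margin of the frequency under test. Only a count of exactly one would be accepted as an unambiguous resolution.

```c
#include <assert.h>
#include <stddef.h>

/* Search missing[] for pairs whose sum lies within errorPercent of target.
 * On return *first and *second hold the indices of the last matching pair
 * (they are left untouched if no pair matches).
 * Returns the number of matching pairs; a value of exactly 1 means the
 * target can be resolved unambiguously into two missing frequencies. */
int find_sum_pair(double target, const double *missing, int count,
                  double errorPercent, int *first, int *second)
{
    double margin = target * errorPercent / 100.0;
    int matches = 0;
    assert(missing != NULL && first != NULL && second != NULL);
    for (int i = 0; i < count; i++) {
        for (int j = i + 1; j < count; j++) {
            double sum = missing[i] + missing[j];
            if (sum >= target - margin && sum <= target + margin) {
                *first = i;
                *second = j;
                matches++;
            }
        }
    }
    return matches;
}
```

For example, with missing frequencies 100, 250 and 400 and a 2% margin, a reading of 350 resolves uniquely into 100 and 250, whereas with missing frequencies 100, 250, 150 and 200 the same reading has two candidate pairs and would be marked as ambiguous.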
At the end of this analysis one should be left, in most cases, with a data matrix that has the same group frequencies in each column of the matrix. Each group frequency is then assumed to correspond to a single template and the sequence of each template is then the series of n-mers from all the columns in the matrix identified by its group frequency.
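The Reconstruct routine is not listed in this specification, so the following is only a sketch of the step just described, under several assumptions: the matrix is reduced to one value per cell, rows are numbered with A=0, C=1, G=2, T=3 in base 4, successive n-mers are taken to be contiguous, CYCLES is shortened to 3 for the example, and a cycle in which the group frequency is absent or occurs more than once is written as N. The function name reconstruct_template is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define CYCLES 3      /* kept small for the example; the header file uses 10 */
#define NMERS 256
#define CUTLENGTH 4

/* Concatenate, cycle by cycle, the n-mer whose cell holds groupFrequency.
 * out needs CYCLES*CUTLENGTH+1 bytes. Cells are compared for exact equality,
 * which assumes raw values have already been replaced by group frequencies.
 * Returns the number of cycles that could not be resolved. */
int reconstruct_template(double matrix[NMERS][CYCLES], double groupFrequency,
                         char *out)
{
    static const char bases[] = "ACGT";
    int unresolved = 0;
    assert(matrix != NULL && out != NULL && groupFrequency > 0.0);
    for (int cycle = 0; cycle < CYCLES; cycle++) {
        char *slot = out + cycle * CUTLENGTH;
        int hits = 0, row = -1;
        for (int n = 0; n < NMERS; n++) {
            if (matrix[n][cycle] == groupFrequency) {
                row = n;
                hits++;
            }
        }
        if (hits != 1) {                  /* missing or ambiguous: mark unknown */
            memset(slot, 'N', CUTLENGTH);
            unresolved++;
            continue;
        }
        for (int i = CUTLENGTH - 1; i >= 0; i--) {   /* decode row to bases */
            slot[i] = bases[row % 4];
            row /= 4;
        }
    }
    out[CYCLES * CUTLENGTH] = '\0';
    return unresolved;
}
```

If the cells for GCCG, ATTA and AAAA hold the frequency 100 in cycles 0, 1 and 2 respectively, the call reconstruct_template(matrix, 100.0, out) writes GCCGATTAAAAA to out and returns 0.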
C Header File of N-mer Matrix Data Structure:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "List.h"

#define CYCLES 10
#define NMERS 256
#define CUTLENGTH 4

typedef struct
{
    double element[4];
} Element;

typedef struct
{
    Element matrix[NMERS][CYCLES];
    List *frequencyList;
} SeqMatrix;

void InitSeqMatrix(SeqMatrix *myMatrix);
void AddtoMatrix(SeqMatrix *myMatrix, int columnNo, int fourMer, double frequency);
int ReplaceElement(SeqMatrix *myMatrix, double oldFrequency, double newFrequency);
int FindColumn(SeqMatrix *myMatrix, double frequency);
int InColumn(SeqMatrix *myMatrix, double frequency, int columnNo);
int FindRow(SeqMatrix *myMatrix, double frequency, int columnNo);
void PrintMatrix(SeqMatrix *myMatrix);
void FPrintMatrix(SeqMatrix *myMatrix);
void BuildList(SeqMatrix *myMatrix);
void ResolveGroups(SeqMatrix *myMatrix, int error);
void ResolveSums(SeqMatrix *myMatrix, int errorSize);
void Reconstruct(SeqMatrix *myMatrix);
The SeqMatrix data structure stores the matrix of n-mers generated by a sequencing reaction.
The critical algorithms are:
- The ResolveGroups(SeqMatrix *myMatrix, int error) algorithm analyses the SeqMatrix data structure to identify frequencies in each column of the matrix that are equivalent.
- The ResolveSums(SeqMatrix *myMatrix, int errorSize) algorithm analyses the matrix generated by the ResolveGroups algorithm and attempts to determine which frequencies are sums of other frequencies.
- The Reconstruct(SeqMatrix *myMatrix) algorithm analyses the data matrix produced by the ResolveSums algorithm to reconstruct the sequences encoded by the group frequencies (not listed).
The program as it stands is far from optimal but will reliably reconstruct model matrices generated from fifteen templates simultaneously with an error in the measurement of frequencies of about 2%. Many improvements to the basic algorithms can still be made.
To be useful in analysing real data, noise subtraction algorithms would be needed and an algorithm to normalise frequencies in each column to account for progressive decrease in signal with each cycle of the sequencing reaction that will result from the fact that no enzymatic step will be 100% efficient.
ResolveGroups code:
// attempts to group error laden frequencies into groups corresponding to
// original source frequencies
void ResolveGroups(SeqMatrix *myMatrix, int error)
{
    int i, j, k, l, n, groupCount, ambiguity;
    double maxDistance, minDistance, localError, fError, meanFreq, groups[400][CYCLES];
    List *tempList = NULL, *newList = NULL;

    printf("\nResolving groups of frequencies...\n");
    for(i = 0; i < 400; i++)
    {
        for(j = 0; j < CYCLES; j++)
        {
            groups[i][j] = 0.0;
        }
    }
    myMatrix->frequencyList = FreqSortList(myMatrix->frequencyList);

    // twice percent error as first grouping attempt
    fError = (double)(2 * error);

    // test frequencies to see if they belong to a group
    tempList = myMatrix->frequencyList;
    i = 0;
    while(tempList != NULL)
    {
        localError = (fError/100 * tempList->frequency);
        // upper bound: first frequency plus the allowed error
        maxDistance = tempList->frequency + localError;
        // lower bound: first frequency minus the allowed error
        minDistance = tempList->frequency - localError;
        // [this part of the listing is Figure imgf000068_0001 in the source;
        //  the loop header below is inferred from the matching loop in the
        //  second grouping pass]
        for(j=0;j<CYCLES;j++)
        {
            ambiguity = 0;
            for(k=0;k<NMERS;k++)
            {
                // is 2nd frequency within twice percent error of 1st
                // frequency?
                if(myMatrix->matrix[k][j].element[0] > minDistance)
                {
                    if(myMatrix->matrix[k][j].element[0] < maxDistance)
                    {
                        groups[i][j] = myMatrix->matrix[k][j].element[0];
                        ambiguity += 1;
                        if (ambiguity > 1)
                        {
                            printf("ambiguous grouping - first grouping\n");
                            groups[i][j] = -1.0;
                        }
                    }
                }
            }
        }
        tempList = tempList->next;
        i++;
    }

    /*
    // print out groups matrix to screen
    for(k=0;k<i;k++)
    // [this part of the listing is Figure imgf000069_0001 in the source]
            if(groups[k][j] != 0)
            {
                printf("%5.4d ",(int)groups[k][j]);
            }
        }
        printf("\n");
    }
    */

    // To hold new list of grouped frequencies
    newList = NULL;
    meanFreq = 0;
    groupCount = 0;
    for(j=0;j<CYCLES;j++)
    {
        if(groups[0][j] != 0 && groups[0][j] != -1.0)
        {
            meanFreq += groups[0][j];
            groupCount += 1;
        }
    }
    meanFreq = meanFreq/groupCount;
    //printf("%f, %d\n", meanFreq, groupCount);
    k = 0;
    l = 0;
    while(k<i)    // a second loop condition is illegible in the source listing
    {
        for(l=k+1;l<i;l++)
        // [the body of this loop is Figure imgf000069_0002 in the source and is
        //  not reproduced here; its last visible fragment is "groups[l][j])"]
        newList = AppendElement(newList, NewElement(meanFreq, groupCount));
        k = l;
        groupCount = 0;
        meanFreq = 0;
        for(n=0;n<CYCLES;n++)
        {
            if(groups[k][n] != 0 && groups[k][n] != -1.0)
            {
                meanFreq += groups[k][n];
                groupCount += 1;
            }
        }
        meanFreq = meanFreq/groupCount;
        //printf("%f, %d\n", meanFreq, groupCount);
    }
    newList = AppendElement(newList, NewElement(meanFreq, groupCount));

    // narrow down allowed error range as second grouping attempt:
    fError = (double)(1.5 * error);

    // Replace error ridden frequencies with corresponding group mean values
    // retesting each value against group means
    tempList = newList;
    while (tempList != NULL && tempList->frequency != 0)
    {
        localError = (fError/100 * tempList->frequency);
        // upper bound: group mean plus the allowed error
        maxDistance = tempList->frequency + localError;
        // lower bound: group mean minus the allowed error
        minDistance = tempList->frequency - localError;
        for(l=0;l<CYCLES;l++)
        {
            ambiguity = 0;
            for(n=0;n<NMERS;n++)
            {
                // is 2nd frequency within the allowed error of 1st frequency?
                if(myMatrix->matrix[n][l].element[0] > minDistance)
                {
                    if(myMatrix->matrix[n][l].element[0] < maxDistance)
                    {
                        myMatrix->matrix[n][l].element[0] = tempList->frequency;
                        ambiguity += 1;
                        if (ambiguity > 1)
                        {
                            printf("ambiguous grouping - second grouping\n");
                            groups[i][j] = -1;
                        }
                    }
                }
            }
        }
        tempList = tempList->next;
    }

    // Update frequencyList with values of group means
    myMatrix->frequencyList = InitList(myMatrix->frequencyList);
    myMatrix->frequencyList = CopyList(newList);
}
ResolveSums code:
// Resolve frequencies that are the sums of atomic quantities
void ResolveSums(SeqMatrix *myMatrix, int errorSize)
{
    int flag, colNo, rowNo, candidates, i, j, k;
    double minSum, maxSum, tempFreq, fError, localError;
    List *tempList1, *tempList2, *tempList3, *tempList4, *tempList5;

    printf("\nResolving sums of frequencies...\n");
    myMatrix->frequencyList = SortList(myMatrix->frequencyList);
    fError = ((double)errorSize);
    tempList1 = myMatrix->frequencyList;
    while(tempList1 != NULL)
    {
        flag = 1;
        localError = (fError/100 * tempList1->frequency);
        // value that is within % error greater than first frequency -
        maxSum = tempList1->frequency + localError;
        // value that is within % error less than first frequency -
        minSum = tempList1->frequency - localError;
        while(tempList1->occurrences>0 && flag)
        {
            // FindColumn returns the column in which the frequency is found
            colNo = FindColumn(myMatrix, tempList1->frequency);
            tempList2 = tempList1->next;
            candidates = 0;
            while(tempList2 != NULL)
            {
                if(!(InColumn(myMatrix, tempList2->frequency, colNo)))
                {
                    tempList3 = tempList2->next;
                    while(tempList3 != NULL)
                    {
                        if(!(InColumn(myMatrix, tempList3->frequency, colNo)))
                        {
                            // test the sum of two missing frequencies
                            tempFreq = tempList2->frequency;
                            tempFreq += tempList3->frequency;
                            if(minSum <= tempFreq)
                            {
                                if(maxSum >= tempFreq)
                                {
                                    printf("Frequency %f ", tempList1->frequency);
                                    printf("could be composed of %f ", tempList2->frequency);
                                    printf("and %f\n", tempList3->frequency);
                                    SubtractOccurrence(myMatrix->frequencyList, tempList1->frequency);
                                    rowNo = FindRow(myMatrix, tempList1->frequency, colNo);
                                    myMatrix->matrix[rowNo][colNo].element[0] = tempList2->frequency;
                                    myMatrix->matrix[rowNo][colNo].element[1] = tempList3->frequency;
                                    AddOccurrence(myMatrix->frequencyList, tempList2->frequency);
                                    AddOccurrence(myMatrix->frequencyList, tempList3->frequency);
                                    candidates++;
                                }
                            }
                            tempList4 = tempList3->next;
                            while(tempList4 != NULL)
                            {
                                if(!(InColumn(myMatrix, tempList4->frequency, colNo)))
                                {
                                    // test the sum of three missing frequencies
                                    tempFreq = tempList2->frequency;
                                    tempFreq += tempList3->frequency;
                                    tempFreq += tempList4->frequency;
                                    if(minSum <= tempFreq)
                                    {
                                        if(maxSum >= tempFreq)
                                        {
                                            printf("Frequency %f ", tempList1->frequency);
                                            printf("could be composed of %f ", tempList2->frequency);
                                            printf("and %f\n", tempList3->frequency);
                                            printf("and %f\n", tempList4->frequency);
                                            SubtractOccurrence(myMatrix->frequencyList, tempList1->frequency);
                                            rowNo = FindRow(myMatrix, tempList1->frequency, colNo);
                                            myMatrix->matrix[rowNo][colNo].element[0] = tempList2->frequency;
                                            myMatrix->matrix[rowNo][colNo].element[1] = tempList3->frequency;
                                            myMatrix->matrix[rowNo][colNo].element[2] = tempList4->frequency;
                                            AddOccurrence(myMatrix->frequencyList, tempList2->frequency);
                                            AddOccurrence(myMatrix->frequencyList, tempList3->frequency);
                                            AddOccurrence(myMatrix->frequencyList, tempList4->frequency);
                                            candidates++;
                                        }
                                    }
                                    tempList5 = tempList4->next;
                                    while(tempList5 != NULL)
                                    {
                                        if(!(InColumn(myMatrix, tempList5->frequency, colNo)))
                                        {
                                            // test the sum of four missing frequencies
                                            tempFreq = tempList2->frequency;
                                            tempFreq += tempList3->frequency;
                                            tempFreq += tempList4->frequency;
                                            tempFreq += tempList5->frequency;
                                            if(minSum <= tempFreq)
                                            {
                                                if(maxSum >= tempFreq)
                                                {
                                                    printf("Frequency %f ", tempList1->frequency);
                                                    printf("could be composed of %f ", tempList2->frequency);
                                                    printf("and %f\n", tempList3->frequency);
                                                    printf("and %f\n", tempList4->frequency);
                                                    printf("and %f\n", tempList5->frequency);
                                                    SubtractOccurrence(myMatrix->frequencyList, tempList1->frequency);
                                                    rowNo = FindRow(myMatrix, tempList1->frequency, colNo);
                                                    myMatrix->matrix[rowNo][colNo].element[0] = tempList2->frequency;
                                                    myMatrix->matrix[rowNo][colNo].element[1] = tempList3->frequency;
                                                    myMatrix->matrix[rowNo][colNo].element[2] = tempList4->frequency;
                                                    myMatrix->matrix[rowNo][colNo].element[3] = tempList5->frequency;
                                                    AddOccurrence(myMatrix->frequencyList, tempList2->frequency);
                                                    AddOccurrence(myMatrix->frequencyList, tempList3->frequency);
                                                    AddOccurrence(myMatrix->frequencyList, tempList4->frequency);
                                                    AddOccurrence(myMatrix->frequencyList, tempList5->frequency);
                                                    candidates++;
                                                }
                                            }
                                        }
                                        tempList5 = tempList5->next;
                                    }
                                }
                                tempList4 = tempList4->next;
                            }
                        }
                        tempList3 = tempList3->next;
                    }
                }
                tempList2 = tempList2->next;
            }
            if(candidates > 1)
            {
                flag = 0;
                printf("Ambiguity - candidates = %d :", candidates);
                printf("Frequency = %f\n", tempList1->frequency);
            }
        }
        tempList1 = tempList1->next;
    }
    myMatrix->frequencyList = RemoveNullOcc(myMatrix->frequencyList);
}
Key for Drawings
FIGURE 4
Step 1 Cleave genomic DNA with type IIs restriction endonuclease
Step 2 Add adaptors to fragments each bearing primer binding sites such that each sticky-end or subset thereof bears a unique primer site
Step 3 Differentially amplify by PCR by adding different amounts of primer for each adaptor
FIGURE 5
Step 1 Cleave genomic DNA with type IIs restriction endonuclease
Step 2 Ligate adaptor pair to fragments to tag termini
Step 3 Capture fragments to allow fragments with adaptor 2 at both termini to be washed away
Step 4 Cleave adaptor 2 with restriction endonuclease
Step 5 Release fragments from solid phase substrate
Step 6 Ligate capture adaptor to blunt end generated from fragments with adaptor 2 at one end
Step 7 Capture fragments or perform arbitrary further sorting
FIGURE 8
Sorting step Sort fragments onto array of oligonucleotides or into array of 256 wells
Cleavage step Cleave immobilised fragments with type IIs restriction endonuclease corresponding to directionality adaptor 1
Addition of Adaptor Add adaptor with fluorescent label and with sticky-end complementary to one of the 256 possible 4 base overlaps that might be present on the immobilised nucleic acid fragments. The sequence of each adaptor's sticky-end must be known.

Claims

CLAIMS:
1. A method for sequencing nucleic acid, which comprises:
(a) obtaining a target nucleic acid population comprising nucleic acid fragments in which each fragment is present in a unique amount and bears at one end a sticky end sequence of predetermined length and unknown sequence,
(b) protecting the other end of each fragment, and
(c) sequencing each of the fragments by
(i) contacting the fragments with an array of adaptor oligonucleotides under hybridisation conditions, each adaptor oligonucleotide bearing a label, a sequencing enzyme recognition site, and a known unique base sequence of same predetermined length as the sticky end sequence, the array containing all possible base sequences of that predetermined length; removing any unhybridised adaptor oligonucleotide and recording the quantity of any hybridised adaptor oligonucleotide by detection of the label, then repeating the cycle, until all of the adaptors in the array have been tested;
(ii) contacting the hybridised adaptor oligonucleotides with a sequencing enzyme which binds to the recognition site and cuts the fragment to expose a new sticky end sequence which is contiguous with or overlaps the previous sticky end sequence;
(iii) repeating steps (i) and (ii) for a sufficient number of times and determining the sequence of the fragment by comparing the quantities recorded for each sticky end sequence.
2. A method according to claim 1, wherein each label comprises a mass label associated with a corresponding known base sequence for identifying the corresponding base sequence in mass spectrometry .
3. A method according to claim 2, wherein each adapter oligonucleotide labelled with an associated mass label is uniquely resolvable in mass spectrometry from the other labelled adapter oligonucleotides.
4. A method according to claim 3, wherein each adapter oligonucleotide is composed of nucleotide analogues which are resistant to fragmentation in the mass spectrometer.
5. A method according to claim 2, wherein each mass label is cleavably attached to its corresponding adaptor oligonucleotide and uniquely resolvable in mass spectrometry.
6. A method according to any one of claims 2 to 5, wherein the mass spectrometry is effected using a mass spectrometer with orthogonal time of flight or array detector geometry.
7. A method according to any one of the preceding claims, wherein the fragments are contacted in step (i) with the array of adaptor oligonucleotides in a cycle wherein the cycle comprises sequentially contacting each adaptor oligonucleotide of the array with the fragments .
8. A method according to any one of the preceding claims, wherein the target nucleic acid population is subjected to a step of sorting into sub-populations according to their sticky end sequences and each of the sub-populations is subjected to steps (b) and (c).
9. A method according to any one of the preceding claims, wherein each fragment is produced by differential amplification.
10. A method according to any one of the preceding claims, wherein the predetermined length of the base sequence of the sticky ends is from 1 to 5.
11. A method according to any one of the preceding claims, wherein the sequencing enzyme comprises a type IIs restriction endonuclease .
12. A method according to any one of the preceding claims, wherein the target nucleic acid population comprises heterogenous nucleic acid fragments.
13. A method according to any one of the preceding claims, wherein the other end of each fragment is protected by ligation with an immobilisation adaptor oligonucleotide.
PCT/GB1997/002734 1996-10-04 1997-10-06 Nucleic acid sequencing by adaptator ligation WO1998015652A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU45663/97A AU4566397A (en) 1996-10-04 1997-10-06 Nucleic acid sequencing by adaptator ligation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9620769.1 1996-10-04
GBGB9620769.1A GB9620769D0 (en) 1996-10-04 1996-10-04 Nucleic acid sequencing

Publications (1)

Publication Number Publication Date
WO1998015652A1 true WO1998015652A1 (en) 1998-04-16

Family

ID=10800974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1997/002734 WO1998015652A1 (en) 1996-10-04 1997-10-06 Nucleic acid sequencing by adaptator ligation

Country Status (3)

Country Link
AU (1) AU4566397A (en)
GB (1) GB9620769D0 (en)
WO (1) WO1998015652A1 (en)

US8265878B2 (en) 2001-03-02 2012-09-11 Ibis Bioscience, Inc. Method for rapid detection and identification of bioagents
US8268565B2 (en) 2001-03-02 2012-09-18 Ibis Biosciences, Inc. Methods for identifying bioagents
US8273706B2 (en) 2004-01-05 2012-09-25 Dh Technologies Development Pte. Ltd. Isobarically labeled analytes and fragment ions derived therefrom
US8298760B2 (en) 2001-06-26 2012-10-30 Ibis Bioscience, Inc. Secondary structure defining database and methods for determining identity and geographic origin of an unknown bioagent thereby
US8367322B2 (en) 1999-01-06 2013-02-05 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
US8407010B2 (en) 2004-05-25 2013-03-26 Ibis Biosciences, Inc. Methods for rapid forensic analysis of mitochondrial DNA
US8409804B2 (en) 2005-08-02 2013-04-02 Rubicon Genomics, Inc. Isolation of CpG islands by thermal segregation and enzymatic selection-amplification method
US8440404B2 (en) 2004-03-08 2013-05-14 Rubicon Genomics Methods and compositions for generating and amplifying DNA libraries for sensitive detection and analysis of DNA methylation
US8534447B2 (en) 2008-09-16 2013-09-17 Ibis Biosciences, Inc. Microplate handling systems and related computer program products and methods
US8546082B2 (en) 2003-09-11 2013-10-01 Ibis Biosciences, Inc. Methods for identification of sepsis-causing bacteria
US8551738B2 (en) 2005-07-21 2013-10-08 Ibis Biosciences, Inc. Systems and methods for rapid identification of nucleic acid variants
US8550694B2 (en) 2008-09-16 2013-10-08 Ibis Biosciences, Inc. Mixing cartridges, mixing stations, and related kits, systems, and methods
US8563250B2 (en) 2001-03-02 2013-10-22 Ibis Biosciences, Inc. Methods for identifying bioagents
US8809518B2 (en) 2000-10-05 2014-08-19 Riken Oligonucleotide linkers comprising a variable cohesive portion and method for the preparation of polynucleotide libraries by using said linkers
US8822156B2 (en) 2002-12-06 2014-09-02 Ibis Biosciences, Inc. Methods for rapid identification of pathogens in humans and animals
US8871471B2 (en) 2007-02-23 2014-10-28 Ibis Biosciences, Inc. Methods for rapid forensic DNA analysis
US8950604B2 (en) 2009-07-17 2015-02-10 Ibis Biosciences, Inc. Lift and mount apparatus
US9080209B2 (en) 2009-08-06 2015-07-14 Ibis Biosciences, Inc. Non-mass determined base compositions for nucleic acid detection
US9149473B2 (en) 2006-09-14 2015-10-06 Ibis Biosciences, Inc. Targeted whole genome amplification method for identification of pathogens
WO2015172080A1 (en) * 2014-05-08 2015-11-12 Fluidigm Corporation Integrated single cell sequencing
US9194877B2 (en) 2009-07-17 2015-11-24 Ibis Biosciences, Inc. Systems for bioagent indentification
US9393564B2 (en) 2009-03-30 2016-07-19 Ibis Biosciences, Inc. Bioagent detection systems, devices, and methods
US9416409B2 (en) 2009-07-31 2016-08-16 Ibis Biosciences, Inc. Capture primers and capture sequence linked solid supports for molecular diagnostic tests
US9598724B2 (en) 2007-06-01 2017-03-21 Ibis Biosciences, Inc. Methods and compositions for multiple displacement amplification of nucleic acids
US9719083B2 (en) 2009-03-08 2017-08-01 Ibis Biosciences, Inc. Bioagent detection methods
US9758840B2 (en) 2010-03-14 2017-09-12 Ibis Biosciences, Inc. Parasite detection via endosymbiont detection
US9873906B2 (en) 2004-07-14 2018-01-23 Ibis Biosciences, Inc. Methods for repairing degraded DNA
US9890408B2 (en) 2009-10-15 2018-02-13 Ibis Biosciences, Inc. Multiple displacement amplification
US10837049B2 (en) 2003-03-07 2020-11-17 Takara Bio Usa, Inc. Amplification and analysis of whole genome and whole transcriptome libraries generated by a DNA polymerization process

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999002728A1 (en) * 1997-07-11 1999-01-21 Brax Group Limited Characterising nucleic acids

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0309969A2 (en) * 1987-09-28 1989-04-05 E.I. Du Pont De Nemours And Company Method of gene mapping
WO1995020053A1 (en) * 1994-01-21 1995-07-27 Medical Research Council Sequencing of nucleic acids
WO1995027080A2 (en) * 1994-04-04 1995-10-12 Lynx Therapeutics Inc Dna sequencing by stepwise ligation and cleavage
WO1996012014A1 (en) * 1994-10-13 1996-04-25 Lynx Therapeutics, Inc. Molecular tagging system
WO1996012039A1 (en) * 1994-10-13 1996-04-25 Lynx Therapeutics, Inc. Massively parallel sequencing of sorted polynucleotides
WO1997046704A1 (en) * 1996-06-06 1997-12-11 Lynx Therapeutics, Inc. Sequencing by ligation of encoded adaptors

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TANG K ET AL: "MATRIX-ASSISTED LASER DESORPTION/IONIZATION OF RESTRICTION ENZYME-DIGESTED DNA", RAPID COMMUNICATIONS IN MASS SPECTROMETRY, vol. 8, no. 2, February 1994 (1994-02-01), pages 183 - 186, XP000608266 *
UNRAU P ET AL: "Non-cloning amplification of specific DNA fragments from whole genomic digests using DNA 'indexers'", GENE, vol. 145, 1994, pages 163 - 169, XP002056703 *

Cited By (118)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6117634A (en) * 1997-03-05 2000-09-12 The Regents Of The University Of Michigan Nucleic acid sequencing and mapping
US6762022B2 (en) 1997-03-05 2004-07-13 The Regents Of The University Of Michigan Compositions and methods for analysis of nucleic acids
US6537757B1 (en) 1997-03-05 2003-03-25 The Regents Of The University Of Michigan Nucleic acid sequencing and mapping
US6197557B1 (en) 1997-03-05 2001-03-06 The Regents Of The University Of Michigan Compositions and methods for analysis of nucleic acids
WO1998048047A2 (en) * 1997-04-21 1998-10-29 Brax Group Limited Characterising dna
US6613511B1 (en) 1997-04-21 2003-09-02 Xzillion Gmbh & Co. Characterizing DNA
WO1998048047A3 (en) * 1997-04-21 1999-01-28 Brax Genomics Ltd Characterising dna
US6312904B1 (en) 1997-07-11 2001-11-06 Xzillion Gmbh & Co. Kg Characterizing nucleic acid
WO1999002726A1 (en) * 1997-07-11 1999-01-21 Brax Group Limited Characterising nucleic acid
WO1999014362A1 (en) * 1997-09-15 1999-03-25 Brax Group Limited Characterising nucleic acid by mass spectrometry
US7270958B2 (en) 1998-09-10 2007-09-18 The Regents Of The University Of Michigan Compositions and methods for analysis of nucleic acids
US6270976B1 (en) 1998-09-15 2001-08-07 Brax Group Limited Characterizing nucleic acid by mass spectrometry
WO2000040755A2 (en) * 1999-01-06 2000-07-13 Cornell Research Foundation, Inc. Method for accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
WO2000040755A3 (en) * 1999-01-06 2001-01-04 Cornell Res Foundation Inc Method for accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
US6534293B1 (en) 1999-01-06 2003-03-18 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
US8367322B2 (en) 1999-01-06 2013-02-05 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
WO2000075377A2 (en) * 1999-06-03 2000-12-14 Jacques Schrenzel Non-cognate hybridization system (nchs)
US6544777B1 (en) 1999-06-03 2003-04-08 Jacques Schrenzel Non-cognate hybridization system (NCHS)
WO2000075377A3 (en) * 1999-06-03 2001-05-17 Jacques Schrenzel Non-cognate hybridization system (nchs)
DE19957827A1 (en) * 1999-11-25 2001-06-21 Epigenomics Ag Oligomer array with PNA and / or DNA oligomers on one surface
DE19957827C2 (en) * 1999-11-25 2003-06-12 Epigenomics Ag Use of an oligomer array with PNA and / or DNA oligomers on a surface
US7825069B2 (en) 2000-03-14 2010-11-02 Electrophoretics Limited Mass labels
US7816304B2 (en) 2000-03-14 2010-10-19 Electrophoretics Limited Mass labels
CN102839213A (en) * 2000-03-14 2012-12-26 电泳有限公司 Mass labels
WO2001068664A2 (en) 2000-03-14 2001-09-20 Xzillion Gmbh & Co. Kg Mass labels
EP1806586A1 (en) 2000-03-14 2007-07-11 Electrophoretics Limited Mass labels
WO2001068664A3 (en) * 2000-03-14 2002-03-21 Xzillion Gmbh & Co Kg Mass labels
US7294456B2 (en) 2000-03-14 2007-11-13 Electrophoretics Limited Mass labels
US6828098B2 (en) 2000-05-20 2004-12-07 The Regents Of The University Of Michigan Method of producing a DNA library using positional amplification based on the use of adaptors and nick translation
US8809518B2 (en) 2000-10-05 2014-08-19 Riken Oligonucleotide linkers comprising a variable cohesive portion and method for the preparation of polynucleotide libraries by using said linkers
US8268565B2 (en) 2001-03-02 2012-09-18 Ibis Biosciences, Inc. Methods for identifying bioagents
US8802372B2 (en) 2001-03-02 2014-08-12 Ibis Biosciences, Inc. Methods for rapid forensic analysis of mitochondrial DNA and characterization of mitochondrial DNA heteroplasmy
US8563250B2 (en) 2001-03-02 2013-10-22 Ibis Biosciences, Inc. Methods for identifying bioagents
US8214154B2 (en) 2001-03-02 2012-07-03 Ibis Biosciences, Inc. Systems for rapid identification of pathogens in humans and animals
US8815513B2 (en) 2001-03-02 2014-08-26 Ibis Biosciences, Inc. Method for rapid detection and identification of bioagents in epidemiological and forensic investigations
US9752184B2 (en) 2001-03-02 2017-09-05 Ibis Biosciences, Inc. Methods for rapid forensic analysis of mitochondrial DNA and characterization of mitochondrial DNA heteroplasmy
US9416424B2 (en) 2001-03-02 2016-08-16 Ibis Biosciences, Inc. Methods for rapid identification of pathogens in humans and animals
US8265878B2 (en) 2001-03-02 2012-09-11 Ibis Bioscience, Inc. Method for rapid detection and identification of bioagents
US8921047B2 (en) 2001-06-26 2014-12-30 Ibis Biosciences, Inc. Secondary structure defining database and methods for determining identity and geographic origin of an unknown bioagent thereby
US8380442B2 (en) 2001-06-26 2013-02-19 Ibis Bioscience, Inc. Secondary structure defining database and methods for determining identity and geographic origin of an unknown bioagent thereby
US8298760B2 (en) 2001-06-26 2012-10-30 Ibis Bioscience, Inc. Secondary structure defining database and methods for determining identity and geographic origin of an unknown bioagent thereby
US7655791B2 (en) 2001-11-13 2010-02-02 Rubicon Genomics, Inc. DNA amplification and sequencing using DNA molecules generated by random fragmentation
US10214771B2 (en) 2001-11-13 2019-02-26 Takara Bio Usa, Inc. DNA amplification and sequencing using DNA molecules generated by random fragmentation
US8815504B2 (en) 2001-11-13 2014-08-26 Rubicon Genomics, Inc. DNA amplification and sequencing using DNA molecules generated by random fragmentation
US9410193B2 (en) 2001-11-13 2016-08-09 Rubicon Genomics, Inc. DNA amplification and sequencing using DNA molecules generated by random fragmentation
EP1513950A4 (en) * 2002-05-20 2005-07-20 Intel Corp Application of cantilevers in nucleic acid sequencing
EP1513950A1 (en) * 2002-05-20 2005-03-16 Intel Corporation Application of cantilevers in nucleic acid sequencing
US7291460B2 (en) 2002-05-31 2007-11-06 Verenium Corporation Multiplexed systems for nucleic acid sequencing
US9725771B2 (en) 2002-12-06 2017-08-08 Ibis Biosciences, Inc. Methods for rapid identification of pathogens in humans and animals
US8822156B2 (en) 2002-12-06 2014-09-02 Ibis Biosciences, Inc. Methods for rapid identification of pathogens in humans and animals
US7947513B2 (en) 2003-01-30 2011-05-24 DH Technologies Pte. Ltd. Sets and compositions pertaining to analyte determination
US7195751B2 (en) 2003-01-30 2007-03-27 Applera Corporation Compositions and kits pertaining to analyte determination
US7799576B2 (en) 2003-01-30 2010-09-21 Dh Technologies Development Pte. Ltd. Isobaric labels for mass spectrometric analysis of peptides and method thereof
US8679773B2 (en) 2003-01-30 2014-03-25 Dh Technologies Development Pte. Ltd. Kits pertaining to analyte determination
US10837049B2 (en) 2003-03-07 2020-11-17 Takara Bio Usa, Inc. Amplification and analysis of whole genome and whole transcriptome libraries generated by a DNA polymerization process
US11492663B2 (en) 2003-03-07 2022-11-08 Takara Bio Usa, Inc. Amplification and analysis of whole genome and whole transcriptome libraries generated by a DNA polymerization process
US11661628B2 (en) 2003-03-07 2023-05-30 Takara Bio Usa, Inc. Amplification and analysis of whole genome and whole transcriptome libraries generated by a DNA polymerization process
US8158354B2 (en) 2003-05-13 2012-04-17 Ibis Biosciences, Inc. Methods for rapid purification of nucleic acids for subsequent analysis by mass spectrometry by solution capture
US8476415B2 (en) 2003-05-13 2013-07-02 Ibis Biosciences, Inc. Methods for rapid purification of nucleic acids for subsequent analysis by mass spectrometry by solution capture
US7964343B2 (en) 2003-05-13 2011-06-21 Ibis Biosciences, Inc. Method for rapid purification of nucleic acids for subsequent analysis by mass spectrometry by solution capture
US7956175B2 (en) 2003-09-11 2011-06-07 Ibis Biosciences, Inc. Compositions for use in identification of bacteria
US8288523B2 (en) 2003-09-11 2012-10-16 Ibis Biosciences, Inc. Compositions for use in identification of bacteria
US8394945B2 (en) 2003-09-11 2013-03-12 Ibis Biosciences, Inc. Compositions for use in identification of bacteria
US8242254B2 (en) 2003-09-11 2012-08-14 Ibis Biosciences, Inc. Compositions for use in identification of bacteria
US8013142B2 (en) 2003-09-11 2011-09-06 Ibis Biosciences, Inc. Compositions for use in identification of bacteria
US8097416B2 (en) 2003-09-11 2012-01-17 Ibis Biosciences, Inc. Methods for identification of sepsis-causing bacteria
US8546082B2 (en) 2003-09-11 2013-10-01 Ibis Biosciences, Inc. Methods for identification of sepsis-causing bacteria
US8163895B2 (en) 2003-12-05 2012-04-24 Ibis Biosciences, Inc. Compositions for use in identification of orthopoxviruses
US8273706B2 (en) 2004-01-05 2012-09-25 Dh Technologies Development Pte. Ltd. Isobarically labeled analytes and fragment ions derived therefrom
US9447462B2 (en) 2004-02-18 2016-09-20 Ibis Biosciences, Inc. Methods for concurrent identification and quantification of an unknown bioagent
US8187814B2 (en) 2004-02-18 2012-05-29 Ibis Biosciences, Inc. Methods for concurrent identification and quantification of an unknown bioagent
US8440404B2 (en) 2004-03-08 2013-05-14 Rubicon Genomics Methods and compositions for generating and amplifying DNA libraries for sensitive detection and analysis of DNA methylation
US9708652B2 (en) 2004-03-08 2017-07-18 Rubicon Genomics, Inc. Methods and compositions for generating and amplifying DNA libraries for sensitive detection and analysis of DNA methylation
US9449802B2 (en) 2004-05-24 2016-09-20 Ibis Biosciences, Inc. Mass spectrometry with selective ion filtration by digital thresholding
US8173957B2 (en) 2004-05-24 2012-05-08 Ibis Biosciences, Inc. Mass spectrometry with selective ion filtration by digital thresholding
US8987660B2 (en) 2004-05-24 2015-03-24 Ibis Biosciences, Inc. Mass spectrometry with selective ion filtration by digital thresholding
US8407010B2 (en) 2004-05-25 2013-03-26 Ibis Biosciences, Inc. Methods for rapid forensic analysis of mitochondrial DNA
US9873906B2 (en) 2004-07-14 2018-01-23 Ibis Biosciences, Inc. Methods for repairing degraded DNA
US8182992B2 (en) 2005-03-03 2012-05-22 Ibis Biosciences, Inc. Compositions for use in identification of adventitious viruses
US8084207B2 (en) 2005-03-03 2011-12-27 Ibis Bioscience, Inc. Compositions for use in identification of papillomavirus
US8551738B2 (en) 2005-07-21 2013-10-08 Ibis Biosciences, Inc. Systems and methods for rapid identification of nucleic acid variants
US7803550B2 (en) 2005-08-02 2010-09-28 Rubicon Genomics, Inc. Methods of producing nucleic acid molecules comprising stem loop oligonucleotides
US11072823B2 (en) 2005-08-02 2021-07-27 Takara Bio Usa, Inc. Compositions including a double stranded nucleic acid molecule and a stem-loop oligonucleotide
US8399199B2 (en) 2005-08-02 2013-03-19 Rubicon Genomics Use of stem-loop oligonucleotides in the preparation of nucleic acid molecules
US8071312B2 (en) 2005-08-02 2011-12-06 Rubicon Genomics, Inc. Methods for producing and using stem-loop oligonucleotides
US8409804B2 (en) 2005-08-02 2013-04-02 Rubicon Genomics, Inc. Isolation of CpG islands by thermal segregation and enzymatic selection-amplification method
US8778610B2 (en) 2005-08-02 2014-07-15 Rubicon Genomics, Inc. Methods for preparing amplifiable DNA molecules
US10196686B2 (en) 2005-08-02 2019-02-05 Takara Bio Usa, Inc. Kits including stem-loop oligonucleotides for use in preparing nucleic acid molecules
US8728737B2 (en) 2005-08-02 2014-05-20 Rubicon Genomics, Inc. Attaching a stem-loop oligonucleotide to a double stranded DNA molecule
US10208337B2 (en) 2005-08-02 2019-02-19 Takara Bio Usa, Inc. Compositions including a double stranded nucleic acid molecule and a stem-loop oligonucleotide
US9598727B2 (en) 2005-08-02 2017-03-21 Rubicon Genomics, Inc. Methods for processing and amplifying nucleic acids
WO2007060456A1 (en) * 2005-11-25 2007-05-31 Solexa Limited Preparation of nucleic acid templates for solid phase amplification
EP2918686A1 (en) * 2005-11-25 2015-09-16 Illumina Cambridge Limited Preparation of nucleic acid templates for solid phase amplification
US8168388B2 (en) 2005-11-25 2012-05-01 Illumina Cambridge Ltd Preparation of nucleic acid templates for solid phase amplification
US8088582B2 (en) 2006-04-06 2012-01-03 Ibis Biosciences, Inc. Compositions for the use in identification of fungi
US9149473B2 (en) 2006-09-14 2015-10-06 Ibis Biosciences, Inc. Targeted whole genome amplification method for identification of pathogens
US8871471B2 (en) 2007-02-23 2014-10-28 Ibis Biosciences, Inc. Methods for rapid forensic DNA analysis
US9598724B2 (en) 2007-06-01 2017-03-21 Ibis Biosciences, Inc. Methods and compositions for multiple displacement amplification of nucleic acids
US8252599B2 (en) 2008-09-16 2012-08-28 Ibis Biosciences, Inc. Sample processing units, systems, and related methods
US8148163B2 (en) 2008-09-16 2012-04-03 Ibis Biosciences, Inc. Sample processing units, systems, and related methods
US8534447B2 (en) 2008-09-16 2013-09-17 Ibis Biosciences, Inc. Microplate handling systems and related computer program products and methods
US9027730B2 (en) 2008-09-16 2015-05-12 Ibis Biosciences, Inc. Microplate handling systems and related computer program products and methods
US9023655B2 (en) 2008-09-16 2015-05-05 Ibis Biosciences, Inc. Sample processing units, systems, and related methods
US8550694B2 (en) 2008-09-16 2013-10-08 Ibis Biosciences, Inc. Mixing cartridges, mixing stations, and related kits, systems, and methods
US8609430B2 (en) 2008-09-16 2013-12-17 Ibis Biosciences, Inc. Sample processing units, systems, and related methods
US8158936B2 (en) 2009-02-12 2012-04-17 Ibis Biosciences, Inc. Ionization probe assemblies
US9165740B2 (en) 2009-02-12 2015-10-20 Ibis Biosciences, Inc. Ionization probe assemblies
US8796617B2 (en) 2009-02-12 2014-08-05 Ibis Biosciences, Inc. Ionization probe assemblies
US9719083B2 (en) 2009-03-08 2017-08-01 Ibis Biosciences, Inc. Bioagent detection methods
US9393564B2 (en) 2009-03-30 2016-07-19 Ibis Biosciences, Inc. Bioagent detection systems, devices, and methods
US8950604B2 (en) 2009-07-17 2015-02-10 Ibis Biosciences, Inc. Lift and mount apparatus
US9194877B2 (en) 2009-07-17 2015-11-24 Ibis Biosciences, Inc. Systems for bioagent indentification
US10119164B2 (en) 2009-07-31 2018-11-06 Ibis Biosciences, Inc. Capture primers and capture sequence linked solid supports for molecular diagnostic tests
US9416409B2 (en) 2009-07-31 2016-08-16 Ibis Biosciences, Inc. Capture primers and capture sequence linked solid supports for molecular diagnostic tests
US9080209B2 (en) 2009-08-06 2015-07-14 Ibis Biosciences, Inc. Non-mass determined base compositions for nucleic acid detection
US9890408B2 (en) 2009-10-15 2018-02-13 Ibis Biosciences, Inc. Multiple displacement amplification
US9758840B2 (en) 2010-03-14 2017-09-12 Ibis Biosciences, Inc. Parasite detection via endosymbiont detection
WO2015172080A1 (en) * 2014-05-08 2015-11-12 Fluidigm Corporation Integrated single cell sequencing

Also Published As

Publication number Publication date
GB9620769D0 (en) 1996-11-20
AU4566397A (en) 1998-05-05

Similar Documents

Publication Publication Date Title
WO1998015652A1 (en) Nucleic acid sequencing by adaptator ligation
AU721861B2 (en) Characterising DNA
US6297017B1 (en) Categorising nucleic acids
CA2301875A1 (en) Methods of preparing nucleic acids for mass spectrometric analysis
JP2003339389A (en) Positional sequence determination by hybrid formation
AU728805B2 (en) Nucleic acid sequencing
GB2539675A (en) Reagents, kits and methods for molecular barcoding
AU733924B2 (en) Characterising DNA
US11486003B2 (en) Highly sensitive methods for accurate parallel quantification of nucleic acids
EP4060049B1 (en) Methods for accurate parallel quantification of nucleic acids in dilute or non-purified samples
WO1999014362A1 (en) Characterising nucleic acid by mass spectrometry
US11970736B2 (en) Methods for accurate parallel detection and quantification of nucleic acids
US20240068010A1 (en) Highly sensitive methods for accurate parallel quantification of variant nucleic acids
US20240068022A1 (en) Methods for accurate parallel detection and quantification of nucleic acids
WO2023139309A1 (en) Methods for sensitive and accurate parallel quantification of nucleic acids using bridge probes

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 1998517299

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: CA