POLYNUCLEOTIDE SEQUENCING
Field of the Invention
This invention relates to the sequencing of polynucleotides. In particular, this invention discloses methods for determining the sequence of arrayed polynucleotides.
Background to the Invention
Advances in the study of molecules have been led, in part, by improvement in technologies used to characterise the molecules or their biological reactions. In particular, the study of the nucleic acids DNA and RNA has benefited from developing technologies used for sequence analysis and the study of hybridisation events. One example of a technology that has improved the study of nucleic acids is the development of fabricated arrays of immobilised nucleic acids. These arrays typically consist of a high-density matrix of polynucleotides immobilised onto a solid support material. Fodor et al., Trends in Biotechnology (1994) 12:19-26, describes ways of assembling the nucleic acids using a chemically sensitized glass surface protected by a mask, but exposed at defined areas to allow attachment of suitably modified nucleotide phosphoramidites. Fabricated arrays may also be manufactured by the technique of "spotting" known polynucleotides onto a solid support at predetermined positions (e.g. Stimpson et al., PNAS (1995) 92:6379-6383).
A further development in array technology is the attachment of the polynucleotides to the solid support material via beads (microspheres).
For DNA arrays to be useful their sequences must be determined. US 5302509 discloses a method to sequence polynucleotides immobilised on a solid support. The method relies on the incorporation of 3'-blocked bases A, G, C and T, each having a different fluorescent label, onto the immobilised polynucleotide in the presence of DNA polymerase. The polymerase incorporates a base complementary to the target polynucleotide, but is prevented from further addition by the 3'-blocking group. The label of the incorporated base can then be determined and the blocking group removed to allow further polymerisation to occur.
However, the need to remove the blocking groups after each cycle is time-consuming, and the removal must be performed with high efficiency.
Similarly, EP 0640146 discloses a polymerisation-based technique for sequencing DNA. The technique again requires removal of a blocking group prior to subsequent incorporation of nucleotides.
There is therefore a need for alternative methods for determining the sequence of arrayed polynucleotides.
Summary of the Invention
In the general method of the invention, a target polynucleotide sequence can be determined by generating its complement using the polymerase reaction by the extension of a suitable primer, and characterising the successive incorporation of bases that generate the complement. The method requires the target sequence to be immobilised on a solid support, with multiple copies of the target being localised within discrete regions. Each of the different bases A, T, G and C is then brought, by sequential addition, into contact with the target, and any incorporation events are detected. Repeating the procedure with each of the bases allows the sequence of the complement to be identified, and thereby the target sequence also.
A distinguishing feature from the disclosure in US 5302509 is that the bases do not contain a blocking group preventing further polymerisation from occurring. In addition, the present invention requires the separate and serial addition of each of the different base types to the array, and, when fluorophores are used as the label, removal of the label can be carried out efficiently by photobleaching.
A further distinguishing feature, particularly relevant to EP 0640146, is that for each incorporation step, only a minor proportion of the bases are detectably-labelled. Consequently, among the many copies of the target, relatively few will incorporate a labelled base into the complement. This permits the straightforward identification of any sequence containing two or more consecutive bases of the same type. In this case, copies of the target will incorporate differing amounts of the labelled base into the complement, resulting in differing levels of signal. It is then possible to determine quantitatively the number of consecutive bases on the complement by detecting the different levels of signal generated, as explained later.
Accordingly, a method for determining the sequence of a target polynucleotide on an array comprises the steps of:
(i) forming an array comprising multiple copies of each target polynucleotide;
(ii) contacting the array with a composition comprising one of the bases A, T, G or C under conditions that permit polymerisation to occur, wherein a minor proportion of the bases are detectably-labelled;
(iii) detecting the incorporation of a base onto the complement of the target after removal of non-incorporated bases; and
(iv) repeating steps (ii) and (iii) with each of the different bases until the sequence is determined.
According to one aspect of the invention, the label from incorporated bases may be removed either prior to the addition of bases having the same label or before it becomes difficult to detect incorporation.
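Steps (ii) to (iv) can be illustrated with a minimal sketch. It is not part of the disclosed method: it assumes ideal, error-free incorporation, a fixed order of base addition (the hypothetical parameter `flow_order`), and a template written in the 3' to 5' direction starting at the first position after the primer. Because the bases carry no blocking group, each addition extends the complement through a whole run of identical bases.

```python
def sequence_by_flows(template_3to5, flow_order="ATGC", max_cycles=100):
    """Simulate serial addition of single base types to a primed template.

    template_3to5: template bases beyond the primer, read 3'->5'.
    Returns (events, complement), where events is a list of
    (base_added, number_incorporated) tuples in the order of addition.
    """
    pair = {"A": "T", "T": "A", "G": "C", "C": "G"}
    position = 0
    events = []
    for _ in range(max_cycles):
        for base in flow_order:
            run = 0
            # No blocking group: extension continues for as long as the
            # template calls for the base currently in solution.
            while (position < len(template_3to5)
                   and pair[template_3to5[position]] == base):
                run += 1
                position += 1
            events.append((base, run))
            if position == len(template_3to5):
                complement = "".join(b * n for b, n in events)
                return events, complement
    raise RuntimeError("template not fully extended within max_cycles")


events, complement = sequence_by_flows("AGGTC")
print(events)
print(complement)
```

In this idealised picture, the number of bases incorporated at each addition is what the quantitative signal measurement has to recover; a run of two or more identical bases appears as a single event with a larger count.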
According to a second aspect of the invention, when the label is a fluorophore, the fluorescence signal generated on nucleotide incorporation may be measured quantitatively, without the need to remove labels after each incorporation step. There is therefore a method for determining the sequence of a target polynucleotide as described above, wherein the fluorescence labels are not removed from the incorporated nucleotides, and subsequent detection of incorporation is carried out by measuring the stepwise increase in the fluorescence signal. The advantage of this embodiment is that it does not require the step of photobleaching and may therefore be carried out quickly and efficiently.
Sequencing the polynucleotides on the array makes it possible to form a spatially addressable array. This may then be used for many different applications, including genotyping studies and other characterisation experiments. The method of the present invention may be automated to produce a very efficient and fast sequence determination.
Description of the Drawings
Figure 1 represents a fluorescence (left) or optical (right) image generated in the presence (A) and absence (B) of polymerase enzyme; and
Figure 2 represents a fluorescence image generated from beads with fluorophore-labelled DNA attached (A) or a fluorophore-labelled nucleotide incorporated into DNA using a polymerase (B).
Description of the Invention
The method for determining the sequence of the arrayed polynucleotides is carried out by contacting the array separately with the different bases to form the complement to that of the target polynucleotide, and detecting incorporation. The method makes use of polymerisation, whereby a polymerase enzyme extends the complementary strand by incorporating the correct base complementary to that on the target. The polymerisation reaction also requires a specific primer to initiate polymerisation.
For each cycle, in which one base type is added to the array, only a minor proportion of the bases are detectably-labelled, i.e. less than 50% of the bases are detectably-labelled, preferably less than 20%. Therefore, it is only the incorporation of detectably-labelled bases that can be monitored. The labelled bases are present at a fixed low concentration with respect to the non-labelled bases. The concentration may be chosen to permit a suitable incorporation rate of the labelled bases for efficient detection. For example, the concentration may be chosen to permit between 10% and 0.0001% incorporation of labelled bases, preferably between 5% and 0.01%, most preferably between 1% and 0.1%.
Using many copies of the same polynucleotide in discrete regions, it is possible to detect quantitatively the incorporation of a labelled base. For example, on incorporation of the adenosine nucleotide, a proportion of the polynucleotides will have a non-labelled adenosine nucleotide and a proportion will have a labelled adenosine nucleotide. Detecting the incorporation of the label will allow a sequence determination to be made. If two adenosine nucleotides are incorporated consecutively into the complementary strand, a proportion of the polynucleotide copies will incorporate two non-labelled adenosine nucleotides, a proportion will incorporate one labelled adenosine and one non-labelled adenosine, and a proportion will incorporate two labelled adenosine nucleotides. However, the ratio of labelled to unlabelled nucleotide will be such that very few strands will incorporate more than one labelled nucleotide. This is especially preferable when fluorescent labels are used, since several labels on the same strand may cause fluorescence quenching or loss of linearity of signal. The label will therefore be distributed throughout the population of a given sequence.
Consequently, there will be a quantitative difference in the signal generated within the population of the given sequence. It is possible therefore to detect the incorporation of the two consecutive labelled bases due to the quantitative differences in the signal.
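The distribution of labels described above can be sketched numerically. The following is an illustration only, under the assumption (not stated in the text) that labelled and non-labelled bases are incorporated with equal efficiency and independently at each position, so that the number of labels per strand follows a binomial distribution. The function names and the 1% labelled fraction are hypothetical choices for the example.

```python
from math import comb


def label_distribution(run_length, labelled_fraction):
    """Probability that a strand carries k = 0..run_length labels after
    incorporating run_length consecutive bases of the same type."""
    f = labelled_fraction
    return [comb(run_length, k) * f**k * (1 - f) ** (run_length - k)
            for k in range(run_length + 1)]


def mean_labels_per_copy(run_length, labelled_fraction):
    """Expected number of labels per copy; the signal from a region with
    many copies scales with this value."""
    return run_length * labelled_fraction


# Two consecutive adenosines with 1% of the adenosine labelled (assumed).
p_none, p_one, p_two = label_distribution(2, 0.01)
print(p_none, p_one, p_two)
print(mean_labels_per_copy(1, 0.01), mean_labels_per_copy(2, 0.01))
```

Under these assumptions the mean signal for a run of two bases is twice that for a single base, while only about one strand in ten thousand carries two labels, which is the regime in which quenching between neighbouring fluorophores is unlikely to matter.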
In the context of the invention, reference to the bases A, T, G and C is taken to be a reference to the deoxynucleoside triphosphates, Adenosine, Thymidine, Guanosine and Cytidine, and to functional analogs thereof, including dideoxynucleoside triphosphates.
The terms "arrayed polynucleotides" and "polynucleotide arrays" are used herein to define an array of polynucleotides that are immobilised on a solid support material. The polynucleotides may be immobilised to the solid support indirectly through a linker molecule, or may be attached to a particle, e.g. a microsphere, which is itself attached to a solid support material.
An important requirement is that there are multiple copies of each target polynucleotide on the array. Typically, these will be in discrete positioned regions on the solid support. Each discrete region may typically comprise several hundred to several thousand copies of the target polynucleotide. There may be, for example, up to 10,000 polynucleotide copies per region. The polynucleotides within each region preferably form a substantially uniform arrangement. This permits a high level of discrimination between individual polynucleotides, which may be preferable to resolve individual labels. However, it is not necessarily the density of the polynucleotides that is of primary importance; the concentration of the labelled bases during the sequencing steps is also important, and this can be optimised readily by the skilled person. The term "spatially addressable" is used herein to describe how different molecules may be identified on the basis of their position on an array.
The detection of an incorporated base may be carried out by using a confocal scanning microscope to scan the surface of the array with a laser, to image a fluorophore bound directly to the incorporated base. Alternatively, a sensitive 2-D detector, such as a charge-coupled detector (CCD), can be used to visualise the individual signals generated. The use of such apparatus is known to the skilled person. However, other techniques such as scanning near-field optical microscopy (SNOM) are available which are capable of finer optical resolution, thereby permitting "more dense" arrays to be used. For example, using SNOM, individual polynucleotides may be distinguished when separated by a distance of less than 100 nm, e.g. 10 nm x 10 nm. For a description of scanning near-field optical microscopy, see Moyer et al., Laser Focus World (1993) 29:10.
The polynucleotides that may be sequenced include DNA, RNA and synthetic alternatives such as PNA. The polynucleotides may be attached to the solid support by recognised means, including the use of biotin-avidin interactions or the use of amine linkages. In one embodiment, the polynucleotides are attached to the solid support via microscopic beads (microspheres), which may in turn be attached to the solid support by known means. The microspheres may be of any suitable size, typically in the range of from 10 nm to 100 nm in diameter.
Attachment via microspheres is a preferred embodiment as it allows discrete regions of polynucleotides to be easily generated on the array. Each microsphere may have multiple copies of a polynucleotide attached, and each microsphere can be resolved individually to determine incorporation events.
The method makes use of the polymerisation reaction to generate the complementary sequence of the target. The conditions necessary for polymerisation to occur will be apparent to the skilled person. For example, a polymerase enzyme may be used to extend the complementary strand, and different polymerases, including DNA polymerases and RNA polymerases, are known to those skilled in the art. For example, the Klenow fragment of E. coli DNA polymerase I or the T7 DNA polymerase may be used. To carry out the polymerase reaction it may be necessary to first anneal a primer sequence to the target polynucleotide, the primer sequence being recognised by the polymerase enzyme and acting as an initiation site for the subsequent extension of the complementary strand. Other conditions necessary for carrying out the reaction, including temperature and pH, will be apparent to those skilled in the art.
This polymerisation step is allowed to proceed for a time sufficient to allow incorporation of all the correct bases. This will depend on the efficiency of incorporation and can be determined by the skilled person. Bases that are not incorporated are then removed, for example, by subjecting the array to a washing step, and detection of the incorporated labels may then be carried out.
Detection may be by conventional means. For example, if the label is a fluorescent moiety, detection may be carried out by optical microscopy, e.g. confocal scanning microscopy.
A preferred embodiment of the invention uses fluorophores as the label, and many examples of fluorophores that may be used are known in the prior art, e.g. tetramethylrhodamine (TMR). After detection, the labels may be removed from the bases so that they do not interfere with the signal generated from the next cycle of incorporation. If the label is a fluorophore it is possible to bleach the fluorophore by chemical means or through the use of a laser (photobleaching). Alternatively, the label may be removed by chemical or photochemical means.
The process of incorporating bases may then be repeated using each of the different bases until the sequence has been determined.
It may not always be necessary to remove the labels prior to the addition of the next base sample. Different bases may have distinguishable labels and so it will only be necessary to remove incorporated labels prior to adding bases having an identical label.
In one embodiment, fluorescent labels are used and detection is carried out by optical means without the requirement for removing labels between incorporation steps. For example, a confocal microscope may be used to scan the array and measure quantitatively the stepwise increase in fluorescence after each cycle of incorporation. By measuring the increase in the amount of fluorescence after each cycle, and not the absolute amount, it should be possible to determine whether there are two or more nucleotides incorporated consecutively onto the template. This method relies on using sensitive detectors (e.g. charge coupled detectors) to measure the increase in signal. Suitable apparatus for carrying out the method is available commercially and will be apparent to the skilled person.
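The stepwise measurement can be sketched as follows. This is an illustration under stated assumptions rather than a description of any particular apparatus: the cumulative fluorescence of one region is recorded after each base addition, labels are never removed, and the signal increase corresponding to a single incorporated base (`unit_step`, a hypothetical calibration value) is known and constant. The number of bases incorporated at each addition is then estimated from the increase, not from the absolute signal.

```python
def run_lengths_from_cumulative(cumulative_signal, unit_step, baseline=0.0):
    """Estimate how many bases were incorporated at each addition.

    cumulative_signal: fluorescence of one region after each base addition,
                       with labels left in place between additions.
    unit_step:         assumed signal increase for one incorporated base.
    baseline:          signal of the region before the first addition.
    """
    previous = [baseline] + list(cumulative_signal[:-1])
    increments = [after - before
                  for before, after in zip(previous, cumulative_signal)]
    # Round each increase to the nearest whole number of unit steps.
    return [round(inc / unit_step) for inc in increments]


# Made-up readings for five successive additions (arbitrary units).
readings = [0.0, 100.0, 100.0, 310.0, 405.0]
print(run_lengths_from_cumulative(readings, unit_step=100.0))
```

In practice the achievable run-length resolution would depend on detector noise and on how constant the unit step remains as labels accumulate; neither is modelled here.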
In a separate embodiment of the invention, the labelled bases may be modified so that on incorporation, no further bases may be added. Bases that carry out this chain terminating function include the dideoxynucleoside triphosphates, as used in conventional Sanger sequencing (Proc. Natl. Acad. Sci. USA 74: 5463-5467, 1977).
Therefore, after each incorporation step, a proportion of the polynucleotides will incorporate a labelled base that prevents further chain-extension. The number of polynucleotides available for the polymerisation step will gradually decrease as the sequencing method proceeds. However, provided there are sufficient copies of the polynucleotide, and provided the concentration of the labelled, chain-terminating bases is sufficiently low, it should be possible to sequence the target polynucleotides.
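The gradual loss of extendable copies can be estimated with a short sketch. It assumes, for illustration only, that the same fraction of incorporation events at every position uses a labelled, chain-terminating base, and that termination is the only way a copy is lost. The function names and the 1% value are hypothetical.

```python
from math import floor, log


def extendable_fraction(terminator_fraction, positions):
    """Fraction of copies still extendable after `positions` bases have
    been incorporated, each with the given chance of being a terminator."""
    return (1.0 - terminator_fraction) ** positions


def positions_until(threshold, terminator_fraction):
    """Largest number of positions after which at least `threshold` of
    the copies remain extendable."""
    return floor(log(threshold) / log(1.0 - terminator_fraction))


print(extendable_fraction(0.01, 100))
print(positions_until(0.5, 0.01))
```

The signal available at a given position scales with the product of the remaining extendable fraction and the terminator fraction, so a lower terminator fraction extends the read at the cost of a weaker signal per step.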
The following experiment illustrates the invention.
Example
In this experiment a fluorescently-labelled DNA molecule (SEQ ID NO. 1) was coupled directly to beads and the level of fluorescence measured using an inverted Nikon microscope with an ICCD detector in an epifluorescence set-up. In a separate reaction, an unlabelled DNA (SEQ ID NO. 2) was attached to beads and a fluorescently-labelled nucleotide incorporated onto the DNA (SEQ ID NO. 2) using a polymerase. By comparing the average level of fluorescence between the two sets of beads, the efficiency of incorporation of the fluorescently-labelled nucleotide was shown to be 89%. This was determined by diluting the fluorescent beads in unmodified beads so that each fluorescent bead could be detected individually.
By measuring the signal-to-noise in the experiment an estimate can be obtained of the fraction of nucleotides that can be labelled with a fluorophore and detected when incorporated. This is less than 1%, i.e. it is possible to detect incorporation when the concentration of fluorescently-labelled nucleotides is such that only 1% is incorporated, and the remaining 99% of the incorporated nucleotides are non-labelled.
The experiment is now described in more detail.
DNA Coupling
Carboxylic acid-modified beads (both non-porous polystyrene and silica) of sizes 0.5-2.9 μm were placed in solutions of Milli-Q water (typically 1 mg per 50 ml). 1-(3-Dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (EDC) (1 mg) and the oligonucleotide (added to give a final concentration of 10 μM) were added, the beads agitated by vortexing and left for 12 h at room temperature. The beads were washed with 0.15 M NaOH, twice with TT buffer (250 mM Tris.HCl, pH 8.0, 0.1% Tween 20) and heated at 80°C in TTE (250 mM Tris.HCl, pH 8.0, 0.1% Tween 20, 20 mM sodium EDTA) and rinsed with water. To achieve a dilute array, the beads were sonicated in 200 μl water and 2.5 μl evaporated onto a heated slide.
Enzyme Incorporation
A solution of the 51mer (SEQ ID NO. 3) (4 μM; 2 eqvs) in hybridisation buffer (5 mM MgCl2, 7.5 mM DTT, 10 mM Tris.HCl (pH 7.6), 0.005% Triton X100) (20 ml) was added to 0.05 mg of beads (containing SEQ ID NO. 2), which were heated to 90°C for 2 min and allowed to cool for 1 h. The fluorescent dUTP (400 μM stock, 0.5 μl, 10 μM; 4 eqvs) was added. A fraction of the beads was removed as a washing control and the polymerase (Sequenase) (0.5 μl, 6.5 units; one unit will incorporate 1 nmole dNTP in 30 s at 37°C) was added. The reaction was left at room temperature for 4 h and the beads were washed with NaOH, TT and TTE buffers as above and arrayed onto a coverslip.
The oligos used in this study are as follows:
3'-(TAMRA)AGCGTCGGCAGGTATCCCAA-(amino)-5' SEQ ID NO. 1
and unlabelled:
5'-amino-GTCATCGAACGTCGAGCCTCGCAGCCGTCCAACCAACTCA-3'
SEQ ID NO. 2
and
3'-CAGTAGCTTGCAGCTCGGAGCGTCGGCAGGTTGGTTGAGTAGGTCTTGTTT-5'
SEQ ID NO. 3
as hybridised template.
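The relationship between SEQ ID NO. 2 and SEQ ID NO. 3 can be checked with a short sketch. It takes the two sequences as listed above, reads SEQ ID NO. 2 in the 5' to 3' direction and SEQ ID NO. 3 in the 3' to 5' direction, and assumes simple Watson-Crick pairing with the two strands aligned from their first listed bases. The helper names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# SEQ ID NO. 2, bead-bound strand, written 5'->3'.
SEQ_ID_2 = "GTCATCGAACGTCGAGCCTCGCAGCCGTCCAACCAACTCA"
# SEQ ID NO. 3, hybridised template, written 3'->5'.
SEQ_ID_3 = "CAGTAGCTTGCAGCTCGGAGCGTCGGCAGGTTGGTTGAGTAGGTCTTGTTT"


def complement(seq):
    """Base-by-base Watson-Crick complement (no reversal)."""
    return "".join(PAIR[base] for base in seq)


def template_overhang(primer_5to3, template_3to5):
    """Return the single-stranded template region beyond the primer's
    3' end, read 3'->5', after checking that the duplex region pairs."""
    duplex = template_3to5[:len(primer_5to3)]
    if complement(primer_5to3) != duplex:
        raise ValueError("primer is not complementary to the template")
    return template_3to5[len(primer_5to3):]


overhang = template_overhang(SEQ_ID_2, SEQ_ID_3)
print(len(SEQ_ID_2), len(SEQ_ID_3))
print(overhang)
print(complement(overhang[0]))
```

With the sequences as listed, the 40 bases of SEQ ID NO. 2 pair with the first 40 bases of the 51mer, and the first unpaired template base is A, which is consistent with the use of a labelled dUTP for the incorporation step.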
Figure 1 shows the fluorescence image on the left and the optical image on the right when the experiment on the incorporation of fluorescently-labelled dUTP was performed in the presence (A) and absence (B) of the polymerase. It is clear that no fluorescence is detected in the absence of any enzyme.
Figure 2 shows the beads diluted in unmodified beads so that a quantitative analysis can be performed. The top image (A) shows the fluorescence from the beads with fluorophore-labelled DNA attached to the bead and the lower image (B) shows the level of fluorescence when the fluorophore-labelled nucleotide is incorporated into the DNA using a polymerase. The values of the fluorescence from the beads were compared:
(A) 3'-TAMRA DNA; average counts/bead = 1956 (54 beads, +/- 50%)
(B) unblocked carboxylic acid-modified beads; average counts/bead = 1739 (89%) (88 beads, +/- 40%)
This means the incorporation of the labelled nucleotide is 89% relative to the directly labelled DNA.
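The 89% figure follows directly from the ratio of the two average bead counts. A minimal sketch of the arithmetic is given below; the function name is hypothetical, and the calculation uses only the two averages, without propagating the bead-to-bead spread quoted for each set.

```python
def relative_incorporation(counts_incorporated, counts_reference):
    """Ratio of the average signal from beads carrying an enzymatically
    incorporated label to that from beads carrying directly labelled DNA."""
    return counts_incorporated / counts_reference


efficiency = relative_incorporation(1739, 1956)
print(f"{efficiency:.1%}")
```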
By comparing the signal-to-noise in Figure 1 between the level of fluorescence when the enzyme is present and when it is absent, it is possible to estimate that the level of fluorescence could be reduced by a factor of 100-1000 while still allowing the detection of fluorescence above the background with adequate signal-to-noise. This means that experiments can be performed with the fluorophore-labelled nucleotides highly diluted in non-labelled nucleotides, so that only 1% of the incorporated nucleotides are fluorophore-labelled.