WO2016109808A1 - Methods and compositions for nucleic acid-templated synthesis of large libraries of complex small molecules - Google Patents

Methods and compositions for nucleic acid-templated synthesis of large libraries of complex small molecules

Publication number
WO2016109808A1
Authority
WO
WIPO (PCT)
Prior art keywords
codon
reactive
oligonucleotides
sequence
small molecules
Prior art date
Application number
PCT/US2015/068308
Other languages
French (fr)
Inventor
Nicholas K. TERRETT
William H. CONNORS
Original Assignee
Ensemble Therapeutics Corporation
Priority date
Filing date
Publication date
Application filed by Ensemble Therapeutics Corporation
Publication of WO2016109808A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C12N15/1031 Mutagenizing nucleic acids mutagenesis by gene assembly, e.g. assembly by oligonucleotide extension PCR

Definitions

  • the invention relates generally to methods and compositions for performing nucleic acid-templated synthesis. More particularly, the invention relates to methods and compositions for producing small molecule libraries of greater size, scope and complexity than previously possible by nucleic acid template synthesis.
  • Nucleic acid-templated organic synthesis enables modes of controlling reactivity that are not possible in a conventional synthesis format and allows synthetic molecules to be manipulated using translation, selection, and amplification methods previously available only to biological macromolecules (Gartner et al. (2001) J. AM. CHEM. SOC. 123: 6961-3; Gartner et al. (2002) ANGEW. CHEM., INT. ED. ENGL. 41: 1796-1800; Gartner et al. (2002) J. AM. CHEM. SOC. 124: 10304-6; Calderone et al. (2002) ANGEW. CHEM., INT. ED. ENGL.
  • the present invention provides methods and compositions for producing small molecule libraries of greater size, scope and complexity than those previously possible by nucleic acid template synthesis.
  • the invention provides a method for producing a library of small molecules associated with corresponding oligonucleotides.
  • the method comprises the steps of (a) providing a plurality of templates comprising a plurality of first reactive units associated with a corresponding plurality of first oligonucleotides, (b) providing a plurality of first transfer units comprising a plurality of second reactive units covalently attached to a corresponding plurality of second oligonucleotides, wherein each second oligonucleotide defines a first anti-codon sequence complementary to a first codon sequence; and (c) annealing first and second oligonucleotides having complementary codon and anti-codon sequences to bring the first and second reactive units into reactive proximity, thereby producing a plurality of first small molecules associated with the corresponding first oligonucleotides.
  • Each first oligonucleotide defines at least a first codon sequence, a second codon sequence, and a third codon sequence, and each of the first, second, and third codon sequence is at least 12 bases in length and is different from one another. Each first oligonucleotide is at least 70 bases in length.
  • the method can also include dividing a plurality of templates comprising a plurality of first reactive units associated with a corresponding plurality of first oligonucleotides into a plurality of aliquots, and for each aliquot, providing a plurality of first transfer units, a plurality of second transfer units, a plurality of third transfer units, and, optionally a plurality of fourth transfer units, wherein the order of adding the first, second, third, and optionally fourth transfer units is different from any other aliquot.
  • Each first oligonucleotide defines at least a first codon sequence, a second codon sequence, and a third codon sequence, and each of the first, second, and third codon sequence is at least 12 bases in length and is different from one another. Each first oligonucleotide is at least 70 bases in length.
  • the method includes recombining two or more of the aliquots to create a library of small molecules.
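  • The aliquot scheme above can be illustrated by enumerating the possible orders in which the transfer units are added. This is a minimal sketch: the labels and the one-order-per-aliquot assignment are illustrative assumptions, not a procedure taken from the disclosure.

```python
from itertools import permutations

# Labels for the transfer-unit pools; purely illustrative names.
three_units = ["first", "second", "third"]
four_units = three_units + ["fourth"]

# Every distinct order of addition; each aliquot would receive a different one.
orders_three = list(permutations(three_units))
orders_four = list(permutations(four_units))

# Assign one order to each aliquot (aliquot numbering is hypothetical).
aliquot_plan = {i + 1: order for i, order in enumerate(orders_three)}

print(len(orders_three))  # 3! = 6 possible orders
print(len(orders_four))   # 4! = 24 possible orders
for aliquot, order in aliquot_plan.items():
    print(aliquot, " -> ".join(order))
```

  • With three transfer units at most six aliquots can each receive a distinct order; the optional fourth transfer unit raises that ceiling to 24.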
  • the template may have any one of the following features.
  • at least one codon is at least 14 bases in length.
  • each of the first, second, third, and, optionally, fourth codon is at least 14 bases in length.
  • the first oligonucleotide is at least 90 bases in length.
  • each first oligonucleotide comprises a unique tag sequence that defines the linker or capping group, or any other structural modification to any small molecule that was not achieved through a DNA-templated reaction step.
  • each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the small molecule associated therewith.
  • the concentration of the plurality of templates is at least 90 nM and no greater than 500 nM at each step when a reactive unit is added.
  • the small molecules may have any of the following features.
  • the second, third or fourth small molecule comprises a moiety that was added as a soluble reagent to the first oligonucleotide-associated small molecule (e.g., to the first, second, third, or fourth small molecule); and, optionally, wherein each of the first oligonucleotides comprises a nucleotide sequence that is informative of the soluble reagent-added moiety.
  • at least one of the first, second, third, fourth or fifth reactive unit, or the soluble reagent is a trivalent moiety.
  • the second, third or fourth small molecule, or the soluble reagent comprises a reactive moiety or can be further modified or deprotected to reveal a reactive functional group capable of further reaction with a plurality of chemical moieties.
  • the reactive moiety capable of further reaction with another chemical moiety can be but is not limited to a nucleophilic primary or secondary amine or a free carboxyl group.
  • the methods may further comprise the following steps.
  • the method comprises (i) splitting the library into a plurality of aliquots following addition of the reactive moiety capable of further reaction with a plurality of chemical moieties onto the first oligonucleotide-associated small molecules; and (ii) adding to each of the plurality of aliquots a different reagent that reacts with the reactive moiety present on the first oligonucleotide-associated small molecules present therein.
  • different reagents include an acylating agent, a sulfonating agent, a heteroaryl halide reagent, reductive amination reagents and an amide-forming reagent.
  • an identifying sequence, e.g., a tag
  • a tag can be ligated to the 3' terminus of each of the plurality of templates.
  • one or more of the plurality of second, third, fourth or fifth oligonucleotides is bound to a first member of a binding pair, e.g., biotin.
  • one or more of the plurality of first, second, third or fourth small molecules bound to the first member of a binding pair is purified by contact with a second binding pair member, wherein the second member of the binding pair (e.g., streptavidin) is bound to a solid support.
  • the plurality of templates is reacted with a capping reagent that differentially caps the small molecules that did not react with one or more of the prior-added reactive units or soluble reagents.
  • the capping reagent is an acid anhydride, e.g., acetic anhydride, or acyl chloride or other activated acylating group known to those skilled in the art.
  • the invention relates to a library of compounds produced by any of the methods described herein.
  • FIG. 1 is a schematic illustration depicting an exemplary template covalently attached to a product (macrocycle) encoded by nucleic acid template synthesis.
  • the exemplary template comprises a plurality of regions including two fixed regions (10 bases), two tag regions (7 bases) and three codons (12 bases) for DPC reactions.
  • a macrocycle small molecule is synthesized on the DNA template, with the linker corresponding to Tag 2, the spacer corresponding to Tag 1, and building blocks 1, 2, and 3 corresponding to codons 1, 2 and 3, respectively.
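  • As a quick arithmetic check, the region lengths listed for the exemplary template of FIG. 1 sum to 70 bases, which matches the minimum first-oligonucleotide length recited in the method. The sketch below only adds up the regions; the dictionary and function names are illustrative assumptions.

```python
# Region counts and lengths as listed for the exemplary template of FIG. 1.
# The names used here are illustrative only.
REGIONS = {
    "fixed": (2, 10),  # (number of regions, bases per region)
    "tag": (2, 7),
    "codon": (3, 12),
}


def template_length(regions):
    """Total number of bases contributed by all regions."""
    return sum(count * length for count, length in regions.values())


total = template_length(REGIONS)
print(total)  # 2*10 + 2*7 + 3*12 = 70
```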
  • FIG. 2 is a schematic illustration depicting how to efficiently create a library of templates suitable for nucleic acid template synthesis.
  • each codon position contains one of 24 variants (but more variants are possible), and each of the positions R1, R2, and R3 has its own unique set of 24 codons, for a total of 72 different codon sequences, which generate 13,824 different DNA sequences corresponding to 13,824 different templates.
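  • The template count follows from taking one codon from each of three position-specific sets of 24. The sketch below enumerates the combinations with placeholder labels; the labels stand in for real codon sequences of at least 12 bases and are assumptions made for illustration.

```python
from itertools import product

VARIANTS_PER_POSITION = 24
POSITIONS = ("R1", "R2", "R3")

# Placeholder codon labels such as "R1-07"; real codons would be base sequences.
codon_sets = {
    pos: [f"{pos}-{i:02d}" for i in range(1, VARIANTS_PER_POSITION + 1)]
    for pos in POSITIONS
}

# Each position has its own set, so the codon total is 3 x 24.
total_codons = sum(len(codons) for codons in codon_sets.values())

# One codon per position gives every possible template.
templates = list(product(*(codon_sets[pos] for pos in POSITIONS)))

print(total_codons)    # 72
print(len(templates))  # 24**3 = 13824
```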
  • FIG. 3 is a schematic illustration of an embodiment in which the Tag 2 position of a DNA template is kept fixed with a unique base sequence that defines one trivalent linker building block.
  • FIG. 4 is a schematic illustration of an embodiment in which a plurality of different linkers (16 as shown) are employed in a single library mixture, wherein the template oligonucleotides were synthesized in 16 mixtures denoted L-1 through L-16, each comprising 13,824 different sequences. Combining the 16 different mixtures gives a total template library complexity of 221,184 different template sequences.
  • FIG. 5 is a schematic illustration depicting the addition of a spacer, which is defined by the Tag 1 sequence in the DNA template.
  • FIG. 6 shows the design of DNA templates with four codons which increases library size. Using 24 variants for the new building block introduced by this codon will provide 331,776 different template sequences. When 16 different four-codon template mixtures are pooled, a library of 5,308,416 unique compounds is created, which is a 24-fold increase in library diversity as compared to a 16-mixture pool of three codon templates, which provides only 221,184 compounds.
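  • The library sizes quoted for FIG. 4 and FIG. 6 can be reproduced with a one-line formula: variants per codon raised to the number of codon positions, multiplied by the number of pooled linker mixtures. The function name and parameters below are illustrative assumptions.

```python
def template_count(variants_per_codon, codon_positions, linker_mixtures=1):
    """Number of distinct template sequences in a pooled library."""
    return linker_mixtures * variants_per_codon ** codon_positions


three_codon_pool = template_count(24, 3, linker_mixtures=16)
four_codon_single = template_count(24, 4)
four_codon_pool = template_count(24, 4, linker_mixtures=16)

print(three_codon_pool)                     # 221184
print(four_codon_single)                    # 331776
print(four_codon_pool)                      # 5308416
print(four_codon_pool // three_codon_pool)  # 24-fold increase
```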
  • FIG. 7 is a schematic illustration showing how changing the sequence of the chemical steps in a nucleic acid-templated library synthesis can generate diverse architectures, thereby increasing the diversity of small molecules in a DPC library.
  • the present invention facilitates the creation of large, diverse libraries of small molecules that are created by nucleic acid templated synthesis. This can be facilitated by one or more of the choice of specific templates, the choice of specific chemical reactants, capping processes, the choice of appropriate chemistries, and changing the order of synthesis steps during nucleic acid template synthesis. Each of the features is discussed in more detail below.
  • codon and anti-codon as used herein, refer to complementary oligonucleotide sequences in a template and in a transfer unit, respectively, that permit the transfer unit to anneal to the template during nucleic acid-templated synthesis.
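  • Complementarity between a codon and its anti-codon can be expressed as a reverse-complement relationship. The sketch below assumes DNA sequences written 5' to 3' and perfect Watson-Crick pairing; the example codon is invented for illustration and is not taken from the disclosure.

```python
# Watson-Crick complement table for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def anti_codon(codon):
    """Reverse complement of a DNA codon, written 5'->3'."""
    return codon.upper().translate(COMPLEMENT)[::-1]


def anneals(codon, candidate):
    """True if the candidate is the perfect anti-codon of the codon."""
    return anti_codon(codon) == candidate.upper()


example_codon = "ACGTTGCAAGCT"  # hypothetical 12-base codon
print(anti_codon(example_codon))
print(anneals(example_codon, anti_codon(example_codon)))
```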
  • soluble reagent refers to a chemical reagent or chemical moiety that is not linked to an oligonucleotide and does not participate in nucleic acid- templated synthesis.
  • the soluble reagent can directly modify the small molecule attached to the oligonucleotide by chemical reaction independent of nucleic acid-templated addition.
  • the first and second reactive units of the template for example, and the transfer units are attached to oligonucleotides that can participate in nucleic acid-templated synthesis.
  • “oligonucleotide” or “nucleic acid” as used herein refer to a polymer of nucleotides.
  • the polymer may include, without limitation, natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine,
  • Nucleic acids and oligonucleotides may also include other polymers of bases having a modified backbone, such as a locked nucleic acid (LNA), a peptide nucleic acid (PNA), a threose nucleic acid (TNA) and any other polymers capable of serving as a template for an amplification reaction using an amplification technique, for example, a polymerase chain reaction, a ligase chain reaction, or non-enzymatic template-directed replication.
  • LNA locked nucleic acid
  • PNA peptide nucleic acid
  • TNA threose nucleic acid
  • reaction intermediate refers to a chemical reagent or a chemical moiety chemically transformed into a different reagent or chemical moiety with a soluble reagent.
  • small molecule refers to an organic compound either synthesized in the laboratory or found in nature having a molecular weight from about 300 Daltons (Da) to about 1,500 Da.
  • small molecule scaffold refers to a chemical compound having at least one site or chemical moiety suitable for functionalization.
  • the small molecule scaffold or molecular scaffold may have two, three, four, five or more sites or chemical groups suitable for functionalization. These functionalization sites may be protected or masked as would be appreciated by one of skill in this art. The sites may also be found on an underlying ring structure or backbone.
  • the small molecule scaffolds are not nucleic acids, nucleotides, or nucleotide analogs.
  • transfer unit refers to a molecule comprising an oligonucleotide having an anti-codon sequence attached to a reactive unit including, for example, but not limited to, a building block, monomer, small molecule scaffold, or other reactant useful in nucleic acid-templated chemical synthesis.
  • template refers to a molecule comprising an
  • the template optionally may comprise (i) a plurality of codon sequences, (ii) an amplification means, for example, a PCR primer binding site or a sequence
  • where compositions are described as having, including, or comprising specific components, or where processes are described as having, including, or comprising specific process steps, it is contemplated that compositions of the present invention also consist essentially of, or consist of, the recited components, and that the processes of the present invention also consist essentially of, or consist of, the recited processing steps. Further, it should be understood that the order of steps or order for performing certain actions are immaterial so long as the invention remains operable. Moreover, unless specified to the contrary, two or more steps or actions may be conducted simultaneously.
  • In one aspect, the invention provides a method for producing a library of small molecules associated with corresponding oligonucleotides.
  • the method comprises the steps of (a) providing a plurality of templates comprising a plurality of first reactive units associated with a corresponding plurality of first oligonucleotides, wherein (i) each first oligonucleotide defines at least a first codon sequence, a second codon sequence, and a third codon sequence; (ii) each of the first, second and third codon sequence is at least 12 bases in length; (iii) each of the first, second and third codon sequence is different from one another; and (iv) each first oligonucleotide is at least 70 bases in length; (b) providing a plurality of first transfer units comprising a plurality of second reactive units covalently attached to a corresponding plurality of second oligonucleotides, wherein each second oligonucleotide defines a first anti-codon sequence complementary to a first codon sequence; (c) annealing first and second oligonucleotides having complementary codon and anti-codon sequences to bring the first and second reactive units into reactive proximity, thereby producing a plurality of first small molecules associated with the corresponding first oligonucleotides; (d) providing a plurality of second transfer units comprising a plurality of third reactive units covalently attached to a corresponding plurality of third oligonucleotides, wherein each third oligonucleotide defines a second anti-codon sequence complementary to the second codon sequence; (e) annealing first and third oligonucleotides having complementary codon and anti-codon sequences to bring the reaction products of step (c) and the third reactive units into reactive proximity thereby producing a plurality of second small molecules associated with the corresponding first oligonucleotides; (f) providing a plurality of third transfer units comprising a plurality of fourth reactive units covalently attached to a corresponding plurality of fourth oligonucleotides, wherein each fourth oligon
  • step (e) complementary codon and anti-codon sequences to bring the reaction products of step (e) and the fourth reactive units into reactive proximity thereby producing a plurality of third small molecules associated with the corresponding first oligonucleotides, wherein each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the third small molecule associated therewith.
  • oligonucleotide is at least 70 bases in length; (b) dividing the plurality of templates into a plurality of aliquots; (c) for each aliquot (i) providing a plurality of first transfer units comprising a plurality of second reactive units covalently attached to a corresponding plurality of second oligonucleotides, wherein each second oligonucleotide defines a first anti-codon sequence complementary to a first codon sequence; (ii) annealing first and second oligonucleotides having complementary codon and anti-codon sequences to bring the first and second reactive units into reactive proximity, thereby producing a plurality of first small molecules associated with the corresponding first oligonucleotides; (iii) providing a plurality of second transfer units comprising a plurality of third reactive units covalently attached to a corresponding plurality of third oligonucleotides, wherein each third oligonucleotide defines a second anti-codon sequence complementary to the second codon sequence; (iv) annealing first and third oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of first small molecules of step (iii) and the third reactive units into reactive proximity thereby producing a plurality of second small molecules associated with the corresponding first oligonucleotides; (v) providing a plurality of third transfer units comprising a plurality of fourth reactive units covalently attached to a corresponding plurality of fourth oligonucleot
  • the first oligonucleotide comprises a fourth codon of at least 12 bases in length.
  • the method comprises the additional steps of providing a plurality of fourth transfer units comprising a plurality of fifth reactive units covalently attached to a corresponding plurality of fifth oligonucleotides, wherein each fifth oligonucleotide defines a fourth anti-codon sequence complementary to the fourth codon sequence; and annealing first and fifth oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of third small molecules of step (g) or (vi) and the fifth reactive units into reactive proximity thereby producing a plurality of fourth small molecules associated with the corresponding first oligonucleotides, wherein each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the fourth small molecule associated therewith.
  • each codon region ensures sufficient base-pairing to give high-affinity duplex formation with a suitably high melting temperature such that the duplex will form and be maintained at ambient temperature.
  • at least one codon is at least 14 bases in length.
  • each of the first, second, third, and, when present, fourth codon is at least 14 bases in length.
  • the first oligonucleotide is at least 90 bases in length.
  • a soluble reagent, e.g., a free reactant not attached to an oligonucleotide-transfer unit
  • the third or fourth small molecule comprises a moiety that was added as a soluble reagent to the first oligonucleotide-associated small molecule of one or more of steps (c), (e), (g) or (i) (or (ii), (iv), (vi) or (viii)); and, optionally, wherein each of the first oligonucleotides comprises a nucleotide sequence that is informative of the soluble reagent-added moiety.
  • the structural complexity of small molecules generated by nucleic acid-templated synthesis can be further enhanced by the incorporation of a trivalent (i.e., trifunctional) building block, having three sites of attachment.
  • at least one of the first, second, third, fourth or fifth reactive unit, or the soluble reagent, is a trivalent moiety.
  • the second, third or fourth small molecule or the soluble reagent comprises a reactive moiety or can be further modified or deprotected to reveal a reactive functional group capable of further reaction with a plurality of chemical moieties.
  • the reactive moiety capable of further reaction with another chemical moiety can be but is not limited to a nucleophilic primary or secondary amine or a free carboxyl group.
  • Diversity can also be increased by (i) splitting the library into a plurality of aliquots following addition of the reactive moiety capable of further reaction with a plurality of chemical moieties onto the first oligonucleotide-associated small molecules; and (ii) adding to each of the plurality of aliquots a different reagent that reacts with the reactive moiety present on the first oligonucleotide-associated small molecules present therein.
  • different reagents include an acylating agent, a sulfonating agent, a heteroaryl halide reagent, reductive amination reagents and an amide-forming reagent.
  • an identifying sequence, e.g., a tag
  • a tag can be ligated to the 3' terminus of each of the plurality of templates.
  • the concentration of the plurality of templates is at least 90 nM and no greater than 500 nM at each step when a reactive unit is added.
  • the concentration of the plurality of templates can be 90 nM, 100 nM, 150 nM, 200 nM, 300 nM, 400 nM or 500 nM.
  • one or more of the plurality of second, third, fourth or fifth oligonucleotides is bound to a first member of a binding pair, e.g., biotin.
  • one or more of the plurality of first, second, third or fourth small molecules bound to a first member of a binding pair is purified by contact with a second member of a binding pair, wherein the second member of the binding pair (e.g., streptavidin) is bound to a solid support.
  • Binding pair members can be any binding pairs known in the art, including, for example, biotin and avidin or streptavidin, antibody-antigen pairs, etc.
  • the plurality of templates is reacted with a capping reagent that differentially caps the small molecules that did not react with one or more of the prior-added reactive units or soluble reagents.
  • the cap renders the small molecule that did not react with the one or more of the prior-added reactive units or soluble reagents unable to react with any further reactive units or soluble reagents. In this manner, these components can no longer participate in further chemical steps in the library preparation and are removed from the library pool upon stepwise purification.
  • each first oligonucleotide comprises a unique tag sequence that defines the linker or capping group, or any other structural modification to any small molecule that was not achieved through a DNA-templated reaction step.
  • the nucleic acid template can direct a wide variety of chemical reactions without obvious structural requirements by specifically recruiting reactants linked to complementary oligonucleotides.
  • the template hybridizes or anneals to one or more transfer units to direct the synthesis of a reaction intermediate that can subsequently be converted by further chemical reaction into a reaction product (e.g., a small molecule).
  • the reaction product then is selected or screened based on certain criteria, such as the ability to bind to a preselected target molecule.
  • the associated template can then be sequenced to decode the synthetic history of the reaction intermediate and/or the reaction product.
  • the length of the template may vary greatly depending upon the type of the nucleic acid-templated synthesis contemplated.
  • the template may be from 20 to 400 nucleotides in length, from 30 to 300 nucleotides in length, from 40 to 200 nucleotides in length, or from 50 to 100 nucleotides in length, from 40 to 400 nucleotides or from 40 to 100 nucleotides in length.
  • the template may be 40, 50, 60, 70, 80, 90, or 100 nucleotides in length.
  • the length of the template will of course depend on, for example, the length of the codons, the complexity of the library, the complexity and/or size of a reaction product (e.g., a small molecule), the use of spacer sequences, etc.
  • the sequence of the template may be designed in a number of ways. For example, the length of the codon must be determined and the codon sequences must be set. If a codon length of two is used, then using the four naturally occurring bases only 16 possible combinations are available to be used in encoding the library. If the length of the codon is increased to three (the number Nature uses in encoding proteins), the number of possible combinations increases to 64. If the length of the codon is increased to four, the number of possible combinations increases to 256. Other factors to be considered in determining the length of the codon are mismatching, frame-shifting, complexity of library, etc. As the length of the codon is increased up to a certain point the number of mismatches is decreased; however, excessively long codons likely will hybridize despite mismatched base pairs.
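  • The counts quoted above follow from the four naturally occurring bases: a codon of length n has 4^n possible sequences. The sketch below reproduces the figures for lengths two, three and four, and also evaluates a 12-base codon for comparison; the function name is an illustrative assumption.

```python
BASES = 4  # A, C, G, T


def codon_combinations(length):
    """Number of distinct sequences for a codon of the given length."""
    return BASES ** length


for length in (2, 3, 4, 12):
    print(length, codon_combinations(length))
```

  • A 12-base codon has far more possible sequences than the 24 variants used per position, which leaves room to choose codons that differ from one another at many positions.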
  • the codons may range from 3 to 50 nucleotides, from 3 to 40 nucleotides, from 3 to 30 nucleotides, from 3 to 20 nucleotides, from 3 to 15 nucleotides, from 3 to 10 nucleotides, from 4 to 50 nucleotides, from 4 to 40 nucleotides, from 4 to 30 nucleotides, from 4 to 20 nucleotides, from 4 to 15 nucleotides, from 5 to 50 nucleotides, from 5 to 40 nucleotides, from 5 to 30 nucleotides, from 5 to 20 nucleotides, from 5 to 15 nucleotides, from 5 to 10 nucleotides, from 6 to 50 nucleotides, from 6 to 40 nucleotides, from 6 to 30 nucleotides, from 6 to 20 nucleotides, from 6 to 15 nucleotides, from 6 to 10 nucleotides, from 7 to 50 nucleotides, from 7 to 40 nucleotides, from 7 to 30 nucleotides, from 7 to 20 nucleotides, from 7 to 15 nucleotides, from 7 to 10 nucleotides, from 8 to 50 nucleotides, from 8 to 40 nucleotides, from 8 to 30 nucleotides, from 8 to 20 nucleotides, from 8 to 15 nucleotides, from 8 to 10 nucleotides, from 9 to 50 nucleotides, from 9 to 40 nucleotides, from 9 to 30 nucleotides, from 9 to 20 nucleotides, from 9 to 15 nucleotides, from 9 to 10 nucleotides.
  • codons are 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
  • the set of codons used in the template preferably maximizes the number of mismatches between any two codons within a codon set to ensure that only the proper anti-codons of the transfer units anneal to the codon sites of the template. Furthermore, it is important that the template has mismatches between all the members of one codon set and all the codons of a different codon set to ensure that the anti-codons do not inadvertently bind to the wrong codon set.
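  • One simple way to check the mismatch requirement is to compute the smallest number of differing positions (Hamming distance) within a codon set and between two codon sets. This is a simplified sketch, not the design method of the cited patents: it compares equal-length codons in register only, ignores shifted alignments and duplex thermodynamics, and uses invented example sequences.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    if len(a) != len(b):
        raise ValueError("codons must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance_within(codons):
    """Smallest mismatch count between any two codons of one set."""
    return min(hamming(a, b) for a, b in combinations(codons, 2))


def min_distance_between(set_a, set_b):
    """Smallest mismatch count between codons of two different sets."""
    return min(hamming(a, b) for a, b in product(set_a, set_b))


# Hypothetical 12-base codons, invented for illustration only.
set_1 = ["ACGTACGTACGT", "TGCATGCATGCA", "AAAACCCCGGGG"]
set_2 = ["GGGGTTTTAAAA", "CATGCATGCATG"]

print(min_distance_within(set_1))
print(min_distance_between(set_1, set_2))
```

  • A candidate codon would be rejected if either minimum falls below a chosen threshold; the threshold itself is a design decision that the text does not specify.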
  • the choice of exemplary codon sets and methods of creating functional codon sets are described, for example, in U.S. Patent Nos. 7,491,494; 7,771,935; and 8,206,914, by Liu et al. Using this and other approaches, different sets of codons can be generated so that no codons are repeated.
  • nucleic acid identifiers may be incorporated into the template to identify a spacer moiety, linker moiety, capping reagent, or soluble reagent used in such synthesis.
  • additional nucleic acid (e.g., DNA) tag sequences are used to identify the reaction product covalently attached to the template at the end of the library synthesis, but they do not engage in DPC-catalyzed reactions. Instead, they are used to identify subsets of the library that have a particular linker, spacer, capping reagent, or soluble reagent.
  • a template oligonucleotide can be synthesized with a given tag sequence that corresponds to a specific linker moiety.
  • the individual linker moieties are chemically attached to the 5'-amino terminus of the template, maintaining a direct relationship between the tag sequence and the linker moiety.
  • the template-linker conjugates can then be mixed and used in DPC reactions directly.
  • the final library products that are synthesized can be identified ultimately by sequencing the DNA revealing the structure of the small molecule by a consideration of both the codon regions and the tag region, which defines the linker moiety.
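  • Decoding a sequenced template amounts to slicing it into its regions and looking each region up in a table. The sketch below is hypothetical throughout: the 5' to 3' order of regions, the lookup tables, and every sequence and label are invented for illustration, since the text gives the region lengths but not their arrangement.

```python
# Hypothetical region layout (5'->3'); the actual order in FIG. 1 may differ.
LAYOUT = [
    ("fixed_5", 10), ("tag_2", 7), ("codon_1", 12), ("codon_2", 12),
    ("codon_3", 12), ("tag_1", 7), ("fixed_3", 10),
]


def split_template(sequence, layout=LAYOUT):
    """Slice a template sequence into its named regions."""
    expected = sum(length for _, length in layout)
    if len(sequence) != expected:
        raise ValueError(f"expected {expected} bases, got {len(sequence)}")
    regions, start = {}, 0
    for name, length in layout:
        regions[name] = sequence[start:start + length]
        start += length
    return regions


def decode(sequence, lookup, layout=LAYOUT):
    """Map each tag and codon region to the moiety it encodes."""
    regions = split_template(sequence, layout)
    return {name: lookup[name][seq] for name, seq in regions.items() if name in lookup}


# Invented lookup tables and sequences, for illustration only.
lookup = {
    "tag_2": {"CCCCCCC": "linker L-1"},
    "tag_1": {"GGGGGGG": "spacer S-1"},
    "codon_1": {"ACGTACGTACGT": "building block 1a"},
    "codon_2": {"TGCATGCATGCA": "building block 2a"},
    "codon_3": {"AAAACCCCGGGG": "building block 3a"},
}
read = ("A" * 10 + "CCCCCCC" + "ACGTACGTACGT" + "TGCATGCATGCA"
        + "AAAACCCCGGGG" + "GGGGGGG" + "T" * 10)

print(decode(read, lookup))
```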
  • the linker, spacer, capping moiety or soluble reagent can be added independently to a template mixture and then an identifying tag DNA sequence for such added reagent added to the 3'-end of the nucleic acid template by a ligation reaction.
  • the tag is unique for the linker, spacer, capping moiety or soluble reagent.
  • the sequence of the added tag can be identified and the identity of the non-DPC linker, spacer, capping moiety or soluble reagent can be determined.
  • an exemplary template can comprise one or more (e.g., two) tag regions (e.g., a 7 base tag) encoding a linker, spacer, capping moiety or soluble reagent.
  • Tag 2 corresponds to the linker of the attached small molecule, and Tag 1 to the spacer, although this relationship is not fixed and can be reversed (see also FIG. 3, showing that the fixed Tag 2 region defines the linker building block).
  • the sequence encoding a tag (e.g., Tag 1 or Tag 2; see FIG. 1) stays constant for a given template.
  • the identity of the linker attached to a template can be determined by determining the sequence of Tag 2.
  • the identity of the spacer attached to a given template can be determined by determining the sequence of Tag 1.
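The tag lookup described above can be sketched in Python as follows. This is only an illustration: the tag sequences, the reagent names, and the assumed positions of the tag regions within the template are hypothetical placeholders and are not taken from this disclosure.

```python
# Hypothetical 7-base tag tables; sequences and reagent names are illustrative only.
TAG1_TO_SPACER = {"ACGTACG": "spacer_A", "TGCATGC": "spacer_B"}
TAG2_TO_LINKER = {"GATTACA": "linker_X", "CCTAGGA": "linker_Y"}


def decode_tags(template, tag1_slice, tag2_slice):
    """Return (spacer, linker) encoded by the Tag 1 and Tag 2 regions of a template.

    Unknown tags map to None rather than raising an error.
    """
    tag1 = template[tag1_slice]
    tag2 = template[tag2_slice]
    return TAG1_TO_SPACER.get(tag1), TAG2_TO_LINKER.get(tag2)


# Assumed layout (5' to 3'): 10-base fixed region, Tag 1, a 6-base codon region,
# Tag 2, 10-base fixed region. The real layout depends on the template design.
example = "A" * 10 + "ACGTACG" + "CCCCCC" + "GATTACA" + "T" * 10
spacer, linker = decode_tags(example, slice(10, 17), slice(23, 30))
```

Because each tag stays constant for a given template, a plain dictionary lookup is sufficient here; no alignment or error correction is attempted in this sketch.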
  • the tag regions may range from 3 to 30 nucleotides, from 3 to 20 nucleotides, from 3 to 15 nucleotides, from 3 to 10 nucleotides, from 4 to 30 nucleotides, from 4 to 20 nucleotides, from 4 to 15 nucleotides, from 4 to 10 nucleotides, from 5 to 30 nucleotides, from 5 to 20 nucleotides, from 5 to 15 nucleotides, from 5 to 10 nucleotides, from 6 to 30 nucleotides, from 6 to 20 nucleotides, from 6 to 15 nucleotides, from 6 to 10 nucleotides, from 7 to 30 nucleotides, from 7 to 20 nucleotides, from 7 to 15 nucleotides, from 7 to 10 nucleotides, from 8 to 30 nucleotides, from 8 to 20 nucleotides, from 8 to 15 nucleotides, from 8 to 10 nucle
  • tag regions are 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
  • the 3'- and 5'-ends of the nucleic acid template optionally may comprise "fixed sequence regions" or "fixed regions" of bases (e.g., 10 bases), which represent polymerase chain reaction (PCR) primer sites. These sites can be used to facilitate PCR amplification of the DNA sequences when the specific base sequence of the DNA is used to directly identify library components.
  • a PCR primer site comprises bases from both the fixed sequence region and an adjacent region of the template, e.g., a portion of a tag or codon.
  • the fixed sequence regions can range from 2 to 50 nucleotides, from 2 to 40 nucleotides, from 2 to 30 nucleotides, from 2 to 20 nucleotides, from 2 to 15 nucleotides, from 2 to 10 nucleotides, from 3 to 50 nucleotides, from 3 to 40 nucleotides, from 3 to 30 nucleotides, from 3 to 20 nucleotides, from 3 to 15 nucleotides, from 3 to 10 nucleotides, from 4 to 50 nucleotides, from 4 to 40 nucleotides, from 4 to 30 nucleotides, from 4 to 20 nucleotides, from 4 to 15 nucleotides, from 4 to 10 nucleotides, from 5 to 50 nucleotides, from 5 to 40 nucleotides, from 5 to 30 nucleotides, from 5 to 20 nucleotides, from 5 to 15 nucleotides, from 5 to 10 nucleotides, from 6 to 50 nucleotides, from 6 to 40 nucleotides, from 6 to 30 nucleotides, from 6 to 20 nucleotides, from 6 to 15 nucleotides, from 6 to 10 nucleotides, from 7 to 50 nucleotides, from 7 to 40 nucleotides, from 7 to 30 nucleotides, from 7 to 20 nucleotides, from 7 to 15 nucleotides, from 7 to 10 nucleotides, from 8 to 50 nucleotides, from 8 to 40 nucleotides, from 8 to 30 nucleotides, from 8 to 20 nucleotides, from 8 to 15 nucleotides, from 8 to 10 nucleotides, from 9 to 50 nucleotides, from 9 to 40 nucleotides, from 9 to 30 nucleotides, from 9 to 20 nucleotides, from 9 to 15 nucleotides, from 9 to 10 nucleotides.
  • fixed sequence regions are 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. While primers generally must be about 6 nucleotides in length, a fixed region can be less than 6 nucleotides if a PCR primer site comprises bases from both the fixed sequence region and an adjacent region of the template, e.g., a portion of a tag or codon.
  • the templates can be synthesized using methodologies well known in the art. These methods include both in vivo and in vitro methods including PCR, plasmid preparation, endonuclease digestion, solid phase synthesis (for example, using an automated synthesizer), in vitro transcription, strand separation, etc.
  • the template when desired can be attached (for example, covalently or non-covalently attached) with a reactive unit of interest using standard coupling chemistries known in the art.
  • a linker is used to attach a reactive unit of interest to the template.
  • the linker can be bivalent (have two functional groups for attachment) or trivalent (have three functional groups for attachment).
  • the linker can be defined by a tag sequence on the template, as described above.
  • oligonucleotides are synthesized using standard phosphoramidite 3' to 5' chemistries, although alternatively, synthesis in the 5' to 3' direction can be performed.
  • the constant 3' end is synthesized. This is then split into n different vessels, where n is the number of different codons to appear at that position in the template. For each vessel, one of the n different codons is synthesized on the (growing) 5' end of the constant 3' end. Thus, each vessel contains, from 5' to 3', a different codon attached to a constant 3' end.
  • the n vessels then are pooled, so that a single vessel contains n different codons attached to the constant 3' end. Any constant bases adjacent to the 5' end of the codon are now synthesized.
  • the pool then is split into m different vessels, where m is the number of different codons to appear at the next (more 5') position of the template.
  • a different codon is synthesized (at the 5' end of the growing oligonucleotide) in each of the m vessels.
  • the resulting oligonucleotides are pooled in a single vessel. Splitting, synthesizing, and pooling are repeated as required to synthesize all codons and constant regions in the oligonucleotides.
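The split, synthesize, and pool cycle described above can be simulated with a short Python sketch. The function name, the example sequences, and the optional `constant_between` parameter are assumptions made for illustration; the sketch models only the bookkeeping of which sequences end up in the pool, not the phosphoramidite chemistry.

```python
def split_and_pool(constant_3prime, codon_sets, constant_between=""):
    """Simulate split-and-pool synthesis growing from the 3' end toward the 5' end.

    codon_sets is ordered from the most 3' codon position to the most 5' one.
    All sequences are written 5' to 3'.
    """
    pool = [constant_3prime]
    for codons in codon_sets:
        # Split: one vessel per codon; each vessel adds its codon at the 5' end.
        vessels = [[codon + seq for seq in pool] for codon in codons]
        # Pool: recombine the vessels, then add any constant bases at the 5' end.
        pool = [constant_between + seq for vessel in vessels for seq in vessel]
    return pool


# Hypothetical example: 2 codons at the first position, 3 at the next.
templates = split_and_pool("GGCC", [["AAA", "CCC"], ["GGG", "TTT", "ACG"]])
```

With n codons at the first position and m at the second, the pool holds n × m distinct sequences after two rounds, which is why the vessel count grows additively while the library grows multiplicatively.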
  • a template can be constructed comprising, e.g., a fixed region 3' and 5' of the template, two tags (Tag 1 and Tag 2), and three codons. Each of the three codons has 24 possible variants, for a total of 13,824 different DNA sequences.
  • larger libraries can be produced by combining multiple mixtures of templates encoding different DNA sequences. For example (and as shown in FIG. 4), 16 mixtures of templates encoding 13,824 different DNA sequences can be combined to produce a single library of, e.g., 221,184 different DNA sequences. As shown in FIG. 6, even larger libraries can be produced by using a fourth codon.
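The library sizes quoted above follow from simple multiplication, as the following Python sketch shows. The function name and parameter names are hypothetical; the numbers are those given in the text (24 variants at each of three codon positions, and 16 combined mixtures).

```python
def library_size(variants_per_codon, n_codons, n_mixtures=1):
    """Number of distinct template sequences, assuming every codon position
    has the same number of variants and every mixture is distinct."""
    return n_mixtures * variants_per_codon ** n_codons


single_mixture = library_size(24, 3)            # 24 * 24 * 24
combined = library_size(24, 3, n_mixtures=16)   # 16 mixtures of the above
```

Here `single_mixture` evaluates to 13,824 and `combined` to 221,184, matching the figures stated in the passage.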
  • a transfer unit comprises an oligonucleotide containing an anti-codon sequence and a reactive unit.
  • the anti-codons are designed to be complementary to the codons present in the template. Accordingly, the sequences used in the template and the codon lengths should be considered when designing the anti-codons. Any molecule complementary to a codon used in the template may be used, including natural or non-natural nucleotides. In certain
  • the codons include one or more bases found in nature (i.e., thymine, uracil, guanine, cytosine, and adenine).
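As a minimal sketch of designing an anti-codon complementary to a template codon, the following Python fragment computes the reverse complement of a DNA codon. It assumes natural DNA bases and standard Watson-Crick pairing only; the function name and the example codon are hypothetical, and non-natural nucleotides are not handled.

```python
# Watson-Crick complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def anticodon_for(codon):
    """Return the DNA anti-codon (written 5' to 3') that base-pairs with a codon."""
    return codon.upper().translate(COMPLEMENT)[::-1]


anticodon = anticodon_for("ATGCCA")
```

The reversal is needed because the two strands of a duplex run antiparallel, so the anti-codon read 5' to 3' is the complement of the codon read 3' to 5'.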
  • the anti-codon is associated with a particular type of reactive unit to form a transfer unit.
  • the anti-codon can be associated with a reactive unit or reactant that is used to modify a small molecule scaffold.
  • the reactant is linked to the anti-codon via a connector long enough to allow the reactant to come into reactive proximity with the small molecule scaffold.
  • the connector preferably has a length and composition to permit a specific reaction between the annealed template and reactant, while minimizing and preferably preventing the occurrence of non-specific reactions (e.g., non-specific intramolecular reactions).
  • the reactants include a variety of reagents as demonstrated by the wide range of reactions that can be utilized in nucleic acid-templated synthesis and can be any chemical group, catalyst (e.g., organometallic compounds), or reactive moiety (e.g., electrophiles, nucleophiles) known in the chemical arts.
  • the anti-codon can be associated with the reactant through a cleavable connector.
  • the connector can be cleavable by light, oxidation, hydrolysis, exposure to acid, exposure to base, reduction, etc.
  • Fruchtel et al. (1996) ANGEW. CHEM. INT. ED. ENGL. 35:17 describes a variety of linkages useful in the practice of the invention.
  • the linker facilitates contact of the reactant with the small molecule scaffold and, in certain embodiments, depending on the desired reaction, positions DNA as a leaving group (the "autocleavable" strategy), links reactive groups to the template via the "scarless" connector strategy (which yields product without leaving behind an additional atom or atoms having chemical functionality), or employs a "useful scar" strategy (in which a portion of the connector is left behind to be functionalized in subsequent steps following connector cleavage).
  • the DNA-reactive group bond is cleaved as a natural consequence of the reaction.
  • DNA-templated reaction of one reactive group is followed by cleavage of the connector attached through a second reactive group to yield products without leaving behind additional atoms capable of providing chemical functionality.
  • a "useful scar" may be utilized on the theory that it may be advantageous to introduce useful atoms and/or chemical groups as a consequence of connector cleavage.
  • a "useful scar" is left behind following connector cleavage and can be functionalized in subsequent steps.
  • transfer units can be used at submillimolar concentrations (e.g., less than 100 μM, less than 10 μM, less than 1 μM, less than 100 nM, or less than 10 nM).
  • a linker can be defined by a tag, which can be used to identify the final reaction product covalently attached to the template.
  • the linker preferably is multivalent, and can be bivalent, trivalent, tetravalent, etc.
  • Exemplary linkers include a diamino acid, an azido-amino acid, an acetylenic amino acid, a haloaromatic functionalized amino acid, or any other similar multivalent building block, with the functionality either present or in a protected or precursor form.
  • Exemplary protected or precursor forms include but are not limited to Boc, Alloc, or Fmoc carbamate forms; alternatively, amines can be generated from azides by using a reduction step.
  • Spacer moieties can be used to add additional diversity to small molecule libraries, to increase the size of small molecules (e.g., macrocycles), to increase the spacing between other moieties in molecules, or to introduce new and diverse functionality.
  • a spacer can be defined by a tag (see FIGS. 1, 5 and 6), which can be used to identify the final reaction product covalently attached to the template.
  • Exemplary spacers include but are not limited to amino acids, carboxy alkynes, bis-carboxylic acids, amino aldehydes, and bromo carboxylic acids.
  • a variety of small molecule compounds and/or libraries can be prepared using the methods described herein.
  • compounds that are not, or do not resemble, nucleic acids or analogs thereof, are synthesized according to the method of the invention. It is contemplated that the small molecules can include macrocycles.
  • an evolvable template can be used.
  • the template can include a small molecule scaffold upon which the small molecule is to be built (e.g., a first reactive unit), or a small molecule scaffold may be added to the template.
  • the small molecule scaffold can be any chemical compound with two or more sites for functionalization.
  • the small molecule scaffold can include a ring system (e.g., the ABCD steroid ring system found in cholesterol) with functionalizable groups coupled to the atoms making up the rings.
  • the small molecule may be the underlying core scaffold structure of a pharmaceutical agent such as morphine, epothilone or a cephalosporin antibiotic.
  • the sites or groups to be functionalized on the small molecule scaffold may be protected using methods and protecting groups known in the art.
  • the protecting groups used in a small molecule scaffold may be orthogonal to one another so that protecting groups can be removed one at a time.
  • the transfer units comprise an anti-codon associated with a reactant or a building block for use in modifying, adding to, or taking away from the small molecule scaffold.
  • the reactants or building blocks may be, for example, electrophiles (e.g., anhydrides, acid chlorides, esters, nitriles, imines), nucleophiles (e.g., amines, hydroxyl groups, thiols), catalysts (e.g., organometallic catalysts), or side chains.
  • the transfer units are allowed to contact the template under hybridizing conditions.
  • the attached reactant or building block is allowed to react with a site on the small molecule scaffold to produce one or more reaction intermediates.
  • protecting groups on the small molecule template are removed one at a time from the sites to be functionalized so that the reactant of the transfer unit will react at only the desired position on the scaffold.
  • reaction conditions, linker, reactant, and site to be functionalized are chosen to avoid unwanted side reactions and accelerate desired intramolecular reactions. Sequential or simultaneous contacting of the template with transfer units can be employed depending on the particular compound to be synthesized.
  • the newly synthesized small molecule remains associated with the template that encoded its synthesis.
  • Decoding the sequence of the template permits the deconvolution of the synthetic history and thereby the structure of the small molecule.
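The deconvolution step described above can be sketched as a per-position codon lookup. The codon sequences, building-block names, and the fixed 6-base codon length below are hypothetical and serve only to illustrate how a sequenced codon region might be translated back into a synthetic history.

```python
# Hypothetical codon tables, one per codon position; all entries are illustrative.
CODON_TABLES = [
    {"AAATTT": "building_block_1a", "CCCGGG": "building_block_1b"},
    {"ACACAC": "building_block_2a", "TGTGTG": "building_block_2b"},
]


def decode_history(codon_region, codon_length=6):
    """Split a codon region into codons and look up the building block
    encoded at each position, in template order."""
    codons = [codon_region[i:i + codon_length]
              for i in range(0, len(codon_region), codon_length)]
    if len(codons) != len(CODON_TABLES):
        raise ValueError("unexpected number of codons in codon region")
    return [table[codon] for table, codon in zip(CODON_TABLES, codons)]


history = decode_history("CCCGGGACACAC")
```

Each codon position has its own table because the same position always encodes the same synthetic step; an unrecognized codon raises a `KeyError` in this sketch rather than being silently skipped.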
  • Known chemical reactions for synthesizing polymers, small molecules, or other molecules can be used in nucleic acid-templated reactions.
  • reactions such as those listed in March's Advanced Organic Chemistry, Organic Reactions, Organic Syntheses, organic text books, journals such as Journal of the American Chemical Society, Journal of Organic Chemistry, Tetrahedron, etc., and Carruthers' Some Modern Methods of Organic Synthesis can be used.
  • the chosen reactions preferably are compatible with nucleic acids such as DNA or RNA or are compatible with the modified nucleic acids used as the template.
  • Reactions useful in nucleic acid-templated chemistry include, for example, substitution reactions, carbon-carbon bond forming reactions, elimination reactions, acylation reactions, and addition reactions.
  • An illustrative but not exhaustive list of aliphatic nucleophilic substitution reactions useful in the present invention includes, for example, SN2 reactions, SN1 reactions, SNi reactions, allylic rearrangements, nucleophilic substitution at an aliphatic trigonal carbon, and nucleophilic substitution at an aromatic carbon.
  • Specific aliphatic nucleophilic substitution reactions with oxygen nucleophiles include, for example, hydrolysis of alkyl halides, hydrolysis of gem-dihalides, hydrolysis of 1,1,1-trihalides, hydrolysis of alkyl esters of inorganic acids, hydrolysis of diazo ketones, hydrolysis of acetals and enol ethers, hydrolysis of epoxides, hydrolysis of acyl halides, hydrolysis of anhydrides, hydrolysis of carboxylic esters, hydrolysis of amides, alkylation with alkyl halides (Williamson Reaction), epoxide formation, alkylation with inorganic esters, alkylation with diazo compounds, dehydration of alcohols, transetherification, alcoholysis of epoxides, alkylation with onium salts, hydroxylation of silanes, alcoholysis of acyl halides, alcoholysis of anhydrides, esterification of carboxylic acids, alcoholy
  • Specific aliphatic nucleophilic substitution reactions with sulfur nucleophiles include, for example, attack by SH at an alkyl carbon to form thiols, attack by S at an alkyl carbon to form thioethers, attack by SH or SR at an acyl carbon, formation of disulfides, formation of Bunte salts, alkylation of sulfinic acid salts, and formation of alkyl thiocyanates.
  • Aliphatic nucleophilic substitution reactions with nitrogen nucleophiles include, for example, alkylation of amines, N-arylation of amines, replacement of a hydroxy by an amino group, transamination, transamidation, alkylation of amines with diazo compounds, amination of epoxides, amination of oxetanes, amination of aziridines, amination of alkanes, formation of isocyanides, acylation of amines by acyl halides, acylation of amines by anhydrides, acylation of amines by carboxylic acids, acylation of amines by carboxylic esters, acylation of amines by amides, acylation of amines by other acid derivatives, N-alkylation or N-arylation of amides and imides, N-acylation of amides and imides, formation of aziridines from epoxides, formation of nitro compounds, formation of azides, formation
  • Aliphatic nucleophilic substitution reactions with halogen nucleophiles include, for example, attack at an alkyl carbon, halide exchange, formation of alkyl halides from esters of sulfuric and sulfonic acids, formation of alkyl halides from alcohols, formation of alkyl halides from ethers, formation of halohydrins from epoxides, cleavage of carboxylic esters with lithium iodide, conversion of diazo ketones to α-halo ketones, conversion of amines to halides, conversion of tertiary amines to cyanamides (the von Braun reaction), formation of acyl halides from carboxylic acids, and formation of acyl halides from acid derivatives.
  • Aliphatic nucleophilic substitution reactions using hydrogen as a nucleophile include, for example, reduction of alkyl halides, reduction of tosylates, other sulfonates, and similar compounds, hydrogenolysis of alcohols, hydrogenolysis of esters (Barton-McCombie reaction), hydrogenolysis of nitriles, replacement of alkoxyl by hydrogen, reduction of epoxides, reductive cleavage of carboxylic esters, reduction of a C-N bond, desulfurization, reduction of acyl halides, reduction of carboxylic acids, esters, and anhydrides to aldehydes, and reduction of amides to aldehydes.
  • aliphatic nucleophilic substitution reactions using carbon nucleophiles include, for example, coupling with silanes, coupling of alkyl halides (the Wurtz reaction), the reaction of alkyl halides and sulfonate esters with Group I (I A) and II (II A) organometallic reagents, reaction of alkyl halides and sulfonate esters with organocuprates, reaction of alkyl halides and sulfonate esters with other organometallic reagents, allylic and propargylic coupling with a halide substrate, coupling of organometallic reagents with esters of sulfuric and sulfonic acids, sulfoxides, and sulfones, coupling involving alcohols, coupling of organometallic reagents with carboxylic esters, coupling
  • Reactions which involve nucleophilic attack at a sulfonyl sulfur atom may also be used in the present invention and include, for example, hydrolysis of sulfonic acid derivatives (attack by OH), formation of sulfonic esters (attack by OR), formation of sulfonamides (attack by nitrogen), formation of sulfonyl halides (attack by halides), reduction of sulfonyl chlorides (attack by hydrogen), and preparation of sulfones (attack by carbon).
  • Aromatic electrophilic substitution reactions may also be used in nucleotide-templated chemistry. Hydrogen exchange reactions are examples of aromatic electrophilic substitution reactions that use hydrogen as the electrophile. Aromatic electrophilic substitution reactions which use nitrogen electrophiles include, for example, nitration and nitro-de-hydrogenation, nitrosation or nitroso-de-hydrogenation, diazonium coupling, direct introduction of the diazonium group, and amination or amino-de-hydrogenation. Reactions of this type with sulfur electrophiles include, for example, sulfonation, sulfo-de-hydrogenation, halosulfonation, halosulfo-de-hydrogenation, sulfurization, and sulfonylation. Reactions using halogen electrophiles include, for example, halogenation and halo-de-hydrogenation.
  • Aromatic electrophilic substitution reactions with carbon electrophiles include, for example, Friedel-Crafts alkylation, alkylation, alkyl-de-hydrogenation, Friedel-Crafts arylation (the Scholl reaction), Friedel-Crafts acylation, formylation with disubstituted formamides, formylation with zinc cyanide and HCl (the Gattermann reaction), formylation with chloroform (the Reimer-Tiemann reaction), other formylations, formyl-de-hydrogenation, carboxylation with carbonyl halides, carboxylation with carbon dioxide (the Kolbe-Schmitt reaction), amidation with isocyanates, N-alkylcarbamoyl-de-hydrogenation, hydroxyalkylation, hydroxyalkyl-de-hydrogenation, cyclodehydration of aldehydes and ketones, haloalkylation, halo-de-hydrogenation, aminoalkylation, amidoalkylation, dial
  • dialkylamino-de-hydrogenation, thioalkylation, acylation with nitriles (the Hoesch reaction), cyanation, and cyano-de-hydrogenation.
  • Reactions using oxygen electrophiles include, for example, hydroxylation and hydroxy-de-hydrogenation.
  • Rearrangement reactions include, for example, the Fries rearrangement, migration of a nitro group, migration of a nitroso group (the Fischer-Hepp Rearrangement), migration of an arylazo group, migration of a halogen (the Orton rearrangement), migration of an alkyl group, etc.
  • Other reactions on an aromatic ring include the reversal of a Friedel-Crafts alkylation, decarbonylation of aromatic aldehydes, decarboxylation of aromatic acids, the Jacobsen reaction, deoxygenation, desulfonation, hydro-de-sulfonation, dehalogenation, hydro-de-halogenation, and hydrolysis of organometallic compounds.
  • Aliphatic electrophilic substitution reactions are also useful. Reactions using the SE1, SE2 (front), SE2 (back), SEi, addition-elimination, and cyclic mechanisms can be used in the present invention. Reactions of this type with hydrogen as the leaving group include, for example, hydrogen exchange (deuterio-de-hydrogenation, deuteriation), migration of a double bond, and keto-enol tautomerization. Reactions with halogen electrophiles include, for example, halogenation of aldehydes and ketones, halogenation of carboxylic acids and acyl halides, and halogenation of sulfoxides and sulfones.
  • Reactions with nitrogen electrophiles include, for example, aliphatic diazonium coupling, nitrosation at a carbon bearing an active hydrogen, direct formation of diazo compounds, conversion of amides to α-azido amides, direct amination at an activated position, and insertion by nitrenes.
  • Reactions with sulfur or selenium electrophiles include, for example, sulfenylation, sulfonation, and selenylation of ketones and carboxylic esters.
  • Reactions with carbon electrophiles include, for example, acylation at an aliphatic carbon, conversion of aldehydes to β-keto esters or ketones, cyanation, cyano-de-hydrogenation, alkylation of alkanes, the Stork enamine reaction, and insertion by carbenes.
  • Reactions with metal electrophiles include, for example, metalation with organometallic compounds, metalation with metals and strong bases, and conversion of enolates to silyl enol ethers.
  • Aliphatic electrophilic substitution reactions with metals as leaving groups include, for example, replacement of metals by hydrogen, reactions between organometallic reagents and oxygen, reactions between organometallic reagents and peroxides, oxidation of trialkylboranes to borates, conversion of Grignard reagents to sulfur compounds, halo-de-metalation, the conversion of organometallic compounds to amines, the conversion of organometallic compounds to ketones, aldehydes, carboxylic esters and amides, cyano-de-metalation, transmetalation with a metal, transmetalation with a metal halide, transmetalation with an organometallic compound, reduction of alkyl halides, metallo-de-halogenation, replacement of a halogen by a metal from an organometallic compound, decarboxylation of aliphatic acids, cleavage of alkoxides, replacement of a carboxyl group by an acyl group, basic
  • Electrophilic substitution reactions at nitrogen include, for example, diazotization, conversion of hydrazines to azides, N-nitrosation, N-nitroso-de-hydrogenation, conversion of amines to azo compounds, N-halogenation, N-halo-de-hydrogenation, reactions of amines with carbon monoxide, and reactions of amines with carbon dioxide.
  • Aromatic nucleophilic substitution reactions may also be used in the present invention. Reactions proceeding via the SNAr mechanism, the SN1 mechanism, the benzyne mechanism, the SRN1 mechanism, or other mechanism, for example, can be used. Aromatic nucleophilic substitution reactions with oxygen nucleophiles include, for example, hydroxy-de-halogenation, alkali fusion of sulfonate salts, and replacement of OR or OAr. Reactions with sulfur nucleophiles include, for example, replacement by SH or SR. Reactions using nitrogen nucleophiles include, for example, replacement by NH2, NHR, or NR2, and replacement of a hydroxy group by an amino group.
  • Reactions with halogen nucleophiles include, for example, the introduction of halogens.
  • Aromatic nucleophilic substitution reactions with hydrogen as the nucleophile include, for example, reduction of phenols and phenolic esters and ethers, and reduction of halides and nitro compounds.
  • Reactions with carbon nucleophiles include, for example, the Rosenmund-von Braun reaction, coupling of organometallic compounds with aryl halides, ethers, and carboxylic esters, arylation at a carbon containing an active hydrogen, conversions of aryl substrates to carboxylic acids, their derivatives, aldehydes, and ketones, and the Ullmann reaction.
  • Reactions with hydrogen as the leaving group include, for example, alkylation, arylation, and amination of nitrogen heterocycles.
  • Reactions with N2+ as the leaving group include, for example, hydroxy-de-diazoniation, replacement by sulfur-containing groups, iodo-de-diazoniation, and the Schiemann reaction.
  • Rearrangement reactions include, for example, the von Richter rearrangement, the Sommelet-Hauser rearrangement, rearrangement of aryl hydroxylamines, and the Smiles rearrangement.
  • Reactions involving free radicals can also be used, although the free radical reactions used in nucleotide-templated chemistry should be carefully chosen to avoid modification or cleavage of the nucleotide template.
  • free radical substitution reactions can be used in the present invention.
  • Particular free radical substitution reactions include, for example, substitution by halogen, halogenation at an alkyl carbon, allylic halogenation, benzylic halogenation, halogenation of aldehydes, hydroxylation at an aliphatic carbon, hydroxylation at an aromatic carbon, oxidation of aldehydes to carboxylic acids, formation of cyclic ethers, formation of hydroperoxides, formation of peroxides, acyloxylation, acyloxy-de-hydrogenation, chlorosulfonation, nitration of alkanes, direct conversion of aldehydes to amides, amidation and amination at an alkyl carbon, simple coupling at a susceptible position, coupling of alkynes, arylation of aromatic compounds by diazonium salts, arylation of activated alkenes by diazonium salts (the Meerwein arylation), arylation and alkylation of alkenes
  • Free radical substitution reactions with metals as leaving groups include, for example, coupling of Grignard reagents, coupling of boranes, and coupling of other organometallic reagents. Reactions with halogen as the leaving group are also included.
  • Other free radical substitution reactions with various leaving groups include, for example, desulfurization with Raney Nickel, conversion of sulfides to organolithium compounds, decarboxylative dimerization (the Kolbe reaction), the Hunsdiecker reaction, decarboxylative allylation, and decarbonylation of aldehydes and acyl halides.
  • reactions involving additions to carbon-carbon multiple bonds are also used in nucleotide-templated chemistry. Any mechanism may be used in the addition reaction including, for example, electrophilic addition, nucleophilic addition, free radical addition, and cyclic mechanisms. Reactions involving additions to conjugated systems can also be used. Addition to cyclopropane rings can also be utilized.
  • Particular reactions include, for example, isomerization, addition of hydrogen halides, hydration of double bonds, hydration of triple bonds, addition of alcohols, addition of carboxylic acids, addition of H2S and thiols, addition of ammonia and amines, addition of amides, addition of hydrazoic acid, hydrogenation of double and triple bonds, other reduction of double and triple bonds, reduction of the double and triple bonds of conjugated systems, hydrogenation of aromatic rings, reductive cleavage of cyclopropanes, hydroboration, other hydrometalations, addition of alkanes, addition of alkenes and/or alkynes to alkenes and/or alkynes (e.g., pi-cation cyclization reactions, hydro-alkenyl-addition), ene reactions, the Michael reaction, addition of organometallics to double and triple bonds not conjugated to carbonyls, the addition of two alkyl groups to an alkyne, 1,4-addition of organometallic compounds to activated
  • acylamidation (addition of oxygen, carbon or nitrogen, carbon), 1,3-dipolar cycloaddition (addition of oxygen, nitrogen, carbon), Huisgen reaction of azides and acetylenes, Diels-Alder reaction, heteroatom Diels-Alder reaction, all-carbon 3+2 cycloadditions, dimerization of alkenes, the addition of carbenes and carbenoids to double and triple bonds, trimerization and tetramerization of alkynes, and other cycloaddition reactions.
  • In addition to reactions involving additions to carbon-carbon multiple bonds, addition reactions to carbon-hetero multiple bonds can be used in nucleotide-templated chemistry.
  • Exemplary reactions include, for example, the addition of water to aldehydes and ketones (formation of hydrates), hydrolysis of carbon-nitrogen double bonds, hydrolysis of aliphatic nitro compounds, hydrolysis of nitriles, addition of alcohols and thiols to aldehydes and ketones, reductive alkylation of alcohols, addition of alcohols to isocyanates, alcoholysis of nitriles, formation of xanthates, addition of H2S and thiols to carbonyl compounds, formation of bisulfite addition products, addition of amines to aldehydes and ketones, addition of amides to aldehydes, reductive alkylation of ammonia or amines, the Mannich reaction, the addition of amines to isocyanates, addition of ammonia or amines to nitriles, addition of amines to carbon disulfide and carbon dioxide, addition of hydrazine derivatives to carbonyl compounds, formation of oximes,
  • carbodiimides, the conversion of carboxylic acid salts to nitriles, the formation of epoxides from aldehydes and ketones, the formation of episulfides and episulfones, the formation of β-lactones and oxetanes (e.g., the Paternò-Büchi reaction), the formation of β-lactams, etc.
  • Reactions involving addition to isocyanides include the addition of water to isocyanides, the Passerini reaction, the Ugi reaction, and the formation of metalated aldimines.
  • Elimination reactions, including α, β, and γ eliminations, as well as extrusion reactions, can be performed using nucleotide-templated chemistry, although the strength of the reagents and conditions employed should be considered.
  • Preferred elimination reactions include reactions that go by E1, E2, E1cB, or E2C mechanisms.
  • Exemplary reactions include, for example, reactions in which hydrogen is removed from one side (e.g., dehydration of alcohols, cleavage of ethers to alkenes, the Chugaev reaction, ester decomposition, cleavage of quaternary ammonium hydroxides, cleavage of quaternary ammonium salts with strong bases, cleavage of amine oxides, pyrolysis of keto-ylids, decomposition of toluene-p-sulfonylhydrazones, cleavage of sulfoxides, cleavage of selenoxides, cleavage of sulfones, dehydrohalogenation of alkyl halides, dehydrohalogenation of acyl halides,
  • Extrusion reactions include, for example, extrusion of N2 from pyrazolines, extrusion of N2 from pyrazoles, extrusion of N2 from triazolines, extrusion of CO, extrusion of CO2, extrusion of SO2, the Story synthesis, and alkene synthesis by twofold extrusion.
  • Rearrangements including, for example, nucleophilic rearrangements, electrophilic rearrangements, prototropic rearrangements, and free-radical rearrangements, can also be performed using nucleotide-templated chemistry. Both 1,2 rearrangements and non-1,2 rearrangements can be performed. Exemplary reactions include, for example, carbon-to-carbon migrations of R, H, and Ar (e.g., Wagner-Meerwein and related reactions, the Pinacol rearrangement, ring expansion reactions, ring contraction reactions, acid-catalyzed
  • Villiger rearrangement and rearrangement of hydroperoxides nitrogen-to-carbon, oxygen-to-carbon, and sulfur-to-carbon migration (e.g., the Stevens rearrangement, and the Wittig rearrangement), boron-to-carbon migrations (e.g., conversion of boranes to alcohols (primary or otherwise), conversion of boranes to aldehydes, conversion of boranes to carboxylic acids, conversion of vinylic boranes to alkenes, formation of alkynes from boranes and acetylides, formation of alkenes from boranes and acetylides, and formation of ketones from boranes and acetylides), electrocyclic rearrangements (e.g., of cyclobutenes and 1,3-cyclohexadienes, or conversion of stilbenes to phenanthrenes), sigmatropic rearrangements (e.g., (1,j) sigmatropic migrations of hydrogen,
  • Oxidative and reductive reactions may also be performed using nucleotide-templated chemistry.
  • Exemplary reactions may involve, for example, direct electron transfer, hydride transfer, hydrogen-atom transfer, formation of ester intermediates, displacement mechanisms, or addition-elimination mechanisms.
  • Exemplary oxidations include, for example, eliminations of hydrogen (e.g., aromatization of six-membered rings, dehydrogenations yielding carbon-carbon double bonds, oxidation or dehydrogenation of alcohols to aldehydes and ketones, oxidation of phenols and aromatic amines to quinones, oxidative cleavage of ketones, oxidative cleavage of aldehydes, oxidative cleavage of alcohols, ozonolysis, oxidative cleavage of double bonds and aromatic rings, oxidation of aromatic side chains, oxidative decarboxylation, and bisdecarboxylation), reactions involving replacement of hydrogen by oxygen (e.g., oxidation of methylene to carbonyl, oxidation of methylene to OH, CO2R, or OR, oxidation of arylmethanes, oxidation of ethers to carboxylic esters and related reactions, oxidation of aromatic hydrocarbons to quinones, oxidation of amines or nitro compounds to aldehydes, ketones, or dihalides, oxidation of primary alcohols to carboxylic acids or carboxylic esters, oxidation of alkenes to aldehydes or ketones, oxidation of amines to nitroso compounds and hydroxylamines, oxidation of primary amines, oximes, azides, isocyanates, or nitroso compounds, to nitro compounds, oxidation of thiols and other sulfur compounds to sulfonic acids), reactions in which oxygen is added to the substrate (e.g., oxidation of alkynes to α-diketones, oxidation of tertiary amines to amine oxides, oxidation of thioethers to sulfoxides and sulfones, and oxidation of
  • Exemplary reductive reactions include, for example, reactions involving replacement of oxygen by hydrogen (e.g., reduction of carbonyl to methylene in aldehydes and ketones, reduction of carboxylic acids to alcohols, reduction of amides to amines, reduction of carboxylic esters to ethers, reduction of cyclic anhydrides to lactones and acid derivatives to alcohols, reduction of carboxylic esters to alcohols, reduction of carboxylic acids and esters to alkanes, complete reduction of epoxides, reduction of nitro compounds to amines, reduction of nitro compounds to hydroxylamines, reduction of nitroso compounds and hydroxylamines to amines, reduction of oximes to primary amines or aziridines, reduction of azides to primary amines, reduction of nitrogen compounds, and reduction of sulfonyl halides and sulfonic acids to thiols), removal of oxygen from the substrate (e.g., reduction of amine oxides and az
  • nucleic acid-templated functional group interconversions permit the generation of library diversity by sequential unmasking.
  • the sequential unmasking approach offers the major advantage of enabling reactants that would normally lack the ability to be linked to a nucleic acid (for example, simple alkyl halides) to contribute to library diversity by reacting with a sequence-specified subset of templates in an intermolecular, non-templated reaction mode. This advantage significantly increases the types of structures that can be generated.
  • One embodiment of the invention involves deprotection or unmasking of functional groups present in a reactive unit.
  • a nucleic acid-template is associated with a reactive unit that contains a protected functional group.
  • a transfer unit comprising an oligonucleotide complementary to the template codon region and a reagent capable of removing the protecting group, is annealed to the template, and the reagent reacts with the protecting group, removing it from the reactive unit.
  • the exposed functional group then is subjected to a reagent not linked to a nucleic acid.
  • the reactive unit contains two or more protected functional groups.
  • the protecting groups are orthogonal protecting groups that are sequentially removed by iterated annealing with reagents linked to transfer units.
  • Another embodiment of the invention involves interconversions of functional groups present on a reactive unit.
  • a transfer unit associated with a reagent that can catalyze a reaction is annealed to a template bearing the reactive unit.
  • a reagent not linked to a nucleic acid is added to the reaction, and the transfer unit reagent catalyzes the reaction between the unlinked reagent and the reactive unit, yielding a newly functionalized reactive unit.
  • the reactive unit contains two or more functional groups which are sequentially interconverted by iterative exposure to different transfer unit-bound reagents.
  • Nucleic acid-templated reactions can occur in aqueous or non-aqueous (i.e., organic) solutions, or a mixture of one or more aqueous and non-aqueous solutions.
  • In aqueous solutions, reactions can be performed at pH ranges from about 2 to about 12, or preferably from about 2 to about 10, or more preferably from about 4 to about 10.
  • the reactions used in DNA-templated chemistry preferably should not require very basic conditions (e.g., pH > 12, pH > 10) or very acidic conditions (e.g., pH < 1, pH < 2, pH < 4), because extreme conditions may lead to degradation or modification of the nucleic acid template and/or molecule (for example, the polymer, or small molecule) being synthesized.
  • the aqueous solution can contain one or more inorganic salts, including, but not limited to, NaCl, Na2SO4, KCl, Mg+2, Mn+2, etc., at various concentrations.
  • Organic solvents suitable for nucleic acid-templated reactions include, but are not limited to, methylene chloride, chloroform, dimethylformamide, and organic alcohols, including methanol and ethanol.
  • quaternized ammonium salts, such as, for example, long chain tetraalkylammonium salts, can be added (Jost et al. (1989) NUCLEIC ACIDS RES. 17: 2143; Mel'nikov et al. (1999) LANGMUIR 15: 1923-1928).
  • Nucleic acid-templated reactions may require a catalyst, for example, a homogeneous, heterogeneous, phase transfer, or asymmetric catalyst. In other embodiments, a catalyst is not required.
  • additional, accessory reagents not linked to a nucleic acid are preferred in some embodiments.
  • Useful accessory reagents can include, for example, oxidizing agents (e.g., NaIO4); reducing agents (e.g., NaCNBH3); activating reagents (e.g., EDC, NHS, and sulfo-NHS); transition metals such as nickel (e.g., Ni(NO3)2), rhodium (e.g., RhCl3), ruthenium, copper (e.g., Cu(NO3)2), cobalt (e.g., CoCl2), iron (e.g., Fe(NO3)3), osmium (e.g., OsO4), titanium (e.g., TiCl4 or titanium tetraisopropoxide), palladium (e.g., NaPdC), or Ln; transition metal ligands (e.g., phosphines, amines, and halides); Lewis acids; and Lewis bases.
  • Reaction conditions preferably are optimized to suit the nature of the reactive units and oligonucleotides used.
  • reaction products (e.g., small molecules) can be selected or screened for desired activities, such as catalytic activity, binding affinity, or a particular effect in an activity assay.
  • affinity selections may be performed according to the principles used in library-based selection methods such as phage display, polysome display, and mRNA-fusion protein displayed peptides.
  • Selection for catalytic activity may be performed by affinity selections on transition-state analog affinity columns (Baca et al. (1997) PROC. NATL. ACAD. SCI. USA 94(19): 10063-8) or by function-based selection schemes (Pedersen et al. (1998) PROC. NATL. ACAD. SCI.
  • the templates and reaction products can be selected (or screened) for binding to a target molecule.
  • selection or partitioning means any process whereby a library member bound to a target molecule is separated from library members not bound to target molecules. Selection can be accomplished by various methods known in the art.
  • the templates of the present invention contain a built-in function for direct selection and amplification.
  • binding to a target molecule preferably is selective, such that the template and the resulting reaction product (e.g., a small molecule) bind preferentially with a specific target molecule, perhaps preventing or inducing a specific biological effect.
  • a binding molecule identified using the present invention may be useful as a therapeutic and/or diagnostic agent.
  • the selected templates optionally can be amplified and sequenced.
  • the selected reaction products if present in sufficient quantity, can be separated from the templates, purified (e.g., by HPLC, column chromatography, or other chromatographic method), and further characterized.
  • Binding assays provide a rapid means for isolating and identifying reaction products (e.g., a small molecule) that bind to, for example, a surface (such as metal, plastic, composite, glass, ceramics, rubber, skin, or tissue); a polymer; a catalyst; or a target biomolecule such as a nucleic acid, a protein (including enzymes, receptors, antibodies, and glycoproteins), a signal molecule (such as cAMP, inositol triphosphate, peptides, or prostaglandins), a carbohydrate, or a lipid. Binding assays can be advantageously combined with activity assays for the effect of a reaction product on a function of a target molecule.
  • the selection strategy can be carried out to allow selection against almost any target. Importantly, the selection strategy does not require any detailed structural information about the target molecule or about the molecules in the libraries. The entire process is driven by the binding affinity involved in the specific recognition and binding of the molecules in the library to a given target. Examples of various selection procedures are described below.
  • the libraries of the present invention can contain molecules that could potentially bind to any known or unknown target.
  • the binding region of a target molecule could include a catalytic site of an enzyme, a binding pocket on a receptor (for example, a G-protein coupled receptor), a protein surface area involved in a protein-protein or protein-nucleic acid interaction (preferably a hot-spot region), or a specific site on DNA (such as the major groove).
  • the natural function of the target could be stimulated (agonized), reduced (antagonized), unaffected, or completely changed by the binding of the reaction product (e.g., a small molecule). This will depend on the precise binding mode and the particular binding site the reaction product occupies on the target.
  • Functional sites (such as protein-protein interaction or catalytic sites) on proteins often are more prone to bind molecules than are other more neutral surface areas on a protein.
  • these functional sites normally contain a smaller region that seems to be primarily responsible for the binding energy: the so-called "hot-spot regions" (Wells, et al. (1993) RECENT PROG. HORMONE RES. 48: 253-262). This phenomenon facilitates selection for molecules affecting the biological function of a certain target.
  • the linkage between the template molecule and reaction product allows rapid identification of binding molecules using various selection strategies.
  • This invention broadly permits identifying binding molecules for any known target molecule.
  • novel unknown targets can be discovered by isolating binding molecules against unknown antigens (epitopes) and using these binding molecules for identification and validation.
  • the target molecule is designed to mimic a transition state of a chemical reaction; one or more reaction products resulting from the selection may stabilize the transition state and catalyze the chemical reaction.
  • the template-directed synthesis of the invention permits selection procedures analogous to other display methods such as phage display (Smith (1985) SCIENCE 228: 1315-1317). Phage display selection has been used successfully on peptides (Wells et al. (1992) CURR. OP. STRUCT. BIOL. 2: 597-604), proteins (Marks et al. (1992) J. BIOL. CHEM. 267: 16007-16010) and antibodies (Winter et al. (1994) ANNU. REV. IMMUNOL. 12: 433-455). Similar selection procedures also are exploited for other types of display systems such as ribosome display (Mattheakis et al. (1994) PROC. NATL. ACAD. SCI. 91: 9022-9026) and mRNA display (Roberts, et al. (1997) PROC. NATL. ACAD. SCI. 94: 12297-302).
  • the libraries of the present invention allow direct selection of target-specific molecules without requiring traditional ribosome-mediated translation.
  • the present invention also allows the display of small molecules which have not previously been synthesized directly from a nucleic acid template.
  • Selection of binding molecules from a library can be performed in any format to identify optimal binding molecules. Binding selections typically involve immobilizing the desired target molecule, adding a library of potential binders, and removing non-binders by washing. When the molecules showing low affinity for an immobilized target are washed away, the molecules with a stronger affinity generally remain attached to the target.
  • the enriched population remaining bound to the target after stringent washing is preferably eluted with, for example, acid, chaotropic salts, heat, competitive elution with a known ligand or by proteolytic release of the target and/or of template molecules.
  • the eluted templates are suitable for PCR, leading to many orders of amplification, whereby essentially each selected template becomes available at a greatly increased copy number for cloning, sequencing, and/or further enrichment or diversification.
  • the fraction of ligand bound to target is determined by the effective concentration of the target protein.
  • selection stringency is controllable by varying the effective concentration of target.
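The link between target concentration and stringency can be illustrated with a small numerical sketch. It assumes a simple 1:1 equilibrium binding model with target in large excess over each library member, so that the fraction bound is [T] / ([T] + Kd). That model, the function names, and the Kd values are illustrative assumptions and do not come from the text.

```python
def fraction_bound(target_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of a library member bound to target.

    Assumes simple 1:1 binding with target in large excess over the
    library member (an illustrative model, not taken from the text).
    """
    if target_nM < 0 or kd_nM <= 0:
        raise ValueError("target_nM must be >= 0 and kd_nM must be > 0")
    return target_nM / (target_nM + kd_nM)


def discrimination(target_nM: float, tight_kd_nM: float, weak_kd_nM: float) -> float:
    """Ratio of bound fractions for a tight binder versus a weak binder."""
    if target_nM <= 0:
        raise ValueError("target_nM must be > 0 to compare bound fractions")
    return fraction_bound(target_nM, tight_kd_nM) / fraction_bound(target_nM, weak_kd_nM)


if __name__ == "__main__":
    # Hypothetical binders: Kd = 10 nM (tight) and Kd = 1000 nM (weak).
    for target_nM in (1000.0, 100.0, 10.0):
        ratio = discrimination(target_nM, 10.0, 1000.0)
        print(f"target {target_nM:7.1f} nM -> tight/weak bound ratio {ratio:.1f}")
```

Under these assumptions, lowering the effective target concentration increases the ratio of tight to weak binders retained, which is one way to read the statement that stringency is controllable through target concentration.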
  • the target molecule (peptide, protein, DNA or other antigen) can be immobilized on a solid support, for example, a container wall or a wall of a microtiter plate well.
  • the library preferably is dissolved in aqueous binding buffer in one pot and equilibrated in the presence of immobilized target molecule. Non-binders are washed away with buffer. Those molecules that may be binding to the target molecule through their attached DNA templates rather than through their synthetic moieties can be eliminated by washing the bound library with unfunctionalized templates lacking PCR primer binding sites. Remaining bound library members then can be eluted, for example, by denaturation.
  • the target molecule can be immobilized on beads, particularly if there is doubt that the target molecule will adsorb sufficiently to a container wall, as may be the case for an unfolded target eluted from an SDS-PAGE gel.
  • the derivatized beads can then be used to separate high-affinity library members from nonbinders by simply sedimenting the beads in a benchtop centrifuge.
  • the beads can be used to make an affinity column. In such cases, the library is passed through the column one or more times to permit binding. The column then is washed to remove nonbinding library members.
  • Magnetic beads are essentially a variant on the above; the target is attached to magnetic beads which are then used in the selection.
  • Sepharose beads and the integrity of known properties of the target molecule can be verified.
  • Activated beads are available with attachment sites for -NH2 or -COOH groups (which can be used for coupling).
  • the target molecule is blotted onto nitrocellulose or PVDF.
  • the blot should be blocked (e.g., with BSA or similar protein) after immobilization of the target to prevent nonspecific binding of library members to the blot.
  • Library members that bind a target molecule can be released by denaturation, acid, or chaotropic salts.
  • elution conditions can be more specific to reduce background or to select for a desired specificity. Elution can be accomplished using proteolysis to cleave a connector between the target molecule and the immobilizing surface or between the reaction product (e.g., a small molecule) and the template. Also, elution can be accomplished by competition with a known competitive ligand for the target molecule. Alternatively, a PCR reaction can be performed directly in the presence of the washed target molecules at the end of the selection procedure.
  • the binding molecules need not be elutable from the target to be selectable since only the template is needed for further amplification or cloning, not the reaction product itself. Indeed, some target molecules bind the most avid ligands so tightly that elution would be difficult.
  • the cells themselves can be used as the selection agent.
  • the library preferably is first exposed to cells not expressing the target molecule on their surfaces to remove library members that bind specifically or nonspecifically to other cell surface epitopes.
  • cells lacking the target molecule are present in large excess in the selection process and separable (by fluorescence-activated cell sorting (FACS), for example) from cells bearing the target molecule.
  • a recombinant DNA encoding the target molecule can be introduced into a cell line; library members that bind the transformed cells but not the untransformed cells are enriched for target molecule binders.
  • This approach is also called subtraction selection and has successfully been used for phage display on antibody libraries (Hoogenboom et al. (1998) IMMUNOTECH 4: 1-20).
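The logic of subtraction selection described above can be sketched as a two-step filter: library members that bind control cells are removed first, and the remainder is then selected on cells bearing the target. The member names, the two predicate functions, and the toy binding data are hypothetical and serve only to show the order of operations.

```python
from typing import Callable, Iterable, List


def subtraction_selection(
    library: Iterable[str],
    binds_control_cells: Callable[[str], bool],
    binds_target_cells: Callable[[str], bool],
) -> List[str]:
    """Return library members enriched for target binding.

    Step 1 removes members that bind cells lacking the target.
    Step 2 keeps members of the depleted pool that bind target-bearing cells.
    """
    depleted_pool = [m for m in library if not binds_control_cells(m)]
    return [m for m in depleted_pool if binds_target_cells(m)]


# Hypothetical toy data: M1 and M4 bind epitopes present on all cells,
# M2 binds only cells expressing the target, M3 binds nothing.
CONTROL_BINDERS = {"M1", "M4"}
TARGET_CELL_BINDERS = {"M1", "M2", "M4"}


def example() -> List[str]:
    return subtraction_selection(
        ["M1", "M2", "M3", "M4"],
        lambda m: m in CONTROL_BINDERS,
        lambda m: m in TARGET_CELL_BINDERS,
    )
```

In this toy case only M2 survives both steps, because M1 and M4 are removed in the depletion step and M3 never binds the target-bearing cells.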
  • a selection procedure can also involve selection for binding to cell surface receptors that are internalized so that the receptor together with the selected binding molecule passes into the cytoplasm, nucleus, or other cellular compartment, such as the Golgi or lysosomes.
  • Internalized library members can be distinguished from molecules attached to the cell surface by washing the cells, preferably with a denaturant. More preferably, standard subcellular fractionation techniques are used to isolate the selected library members in a desired subcellular compartment.
  • An alternative selection protocol also includes a known, weak ligand affixed to each member of the library.
  • the known ligand guides the selection by interacting with a defined part of the target molecule and focuses the selection on molecules that bind to the same region, providing a cooperative effect. This can be particularly useful for increasing the affinity of a ligand with a desired biological function but with too low a potency.
  • the selection process is well suited for optimizations, where the selection steps are made in series, starting with the selection of binding molecules and ending with an optimized binding molecule.
  • the procedures in each step can be automated using various robotic systems.
  • the invention permits supplying a suitable library and target molecule to a fully automatic system which finally generates an optimized binding molecule. Under ideal conditions, this process should run without any requirement for external work outside the robotic system during the entire procedure.
  • the selection methods of the present invention can be combined with secondary selection or screening to identify reaction products (e.g., small molecules) capable of modifying target molecule function upon binding.
  • the methods described herein can be employed to isolate or produce binding molecules that bind to and modify the function of any protein or nucleic acid.
  • nucleic acid-templated chemistry can be used to identify, isolate, or produce binding molecules (1) affecting catalytic activity of target enzymes by inhibiting catalysis or modifying substrate binding; (2) affecting the functionality of protein receptors, by inhibiting binding to receptors or by modifying the specificity of binding to receptors; (3) affecting the formation of protein multimers by disrupting the quaternary structure of protein subunits; or (4) modifying transport properties of a protein by disrupting transport of small molecules or ions.
  • Functional assays can be included in the selection process. For example, after selecting for binding activity, selected library members can be directly tested for a desired functional effect, such as an effect on cell signaling. This can, for example, be performed via FACS methodologies.
  • the binding molecules of the invention can be selected for other properties in addition to binding. For example, one can select for stability of binding interactions in a desired working environment. If stability in the presence of a certain protease is desired, that protease can be part of the buffer medium used during selection. Similarly, the selection can be performed in serum or cell extracts or in any type of medium, aqueous or organic. Conditions that disrupt or degrade the template should, however, be avoided to allow subsequent amplification.
  • selections for other desired properties can also be performed.
  • the selection should be designed such that library members with the desired activity are isolatable on that basis from other library members.
  • library members can be screened for the ability to fold or otherwise significantly change conformation in the presence of a target molecule, such as a metal ion, or under particular pH or salinity conditions.
  • the folded library members can be isolated by performing non-denaturing gel electrophoresis under the conditions of interest. The folded library members migrate to a different position in the gel and can subsequently be extracted from the gel and isolated.
  • reaction products that fluoresce in the presence of specific ligands may be selected by FACS based sorting of translated polymers linked through their DNA templates to beads. Those beads that fluoresce in the presence, but not in the absence, of the target ligand are isolated and characterized.
  • Useful beads with a homogeneous population of nucleic acid-templates on any bead can be prepared using the split-pool synthesis technique on the bead, such that each bead is exposed to only a single nucleotide sequence.
  • a different anti-template (each complementary to only a single, different template) can be synthesized on beads using a split-pool technique, and then can anneal to capture a solution-phase library.
  • Biotin-terminated biopolymers can be selected for the actual catalysis of bond-breaking reactions by passing these biopolymers over a resin linked through a substrate to avidin. Those biopolymers that catalyze substrate cleavage self-elute from a column charged with this resin. Similarly, biotin-terminated biopolymers can be selected for the catalysis of bond-forming reactions. One substrate is linked to resin and the second substrate is linked to avidin. Biopolymers that catalyze bond formation between the substrates are selected by their ability to react the substrates together, resulting in attachment of the biopolymer to the resin.
  • Library members can also be selected for their catalytic effects on synthesis of a polymer to which the template is or becomes attached.
  • the library member may influence the selection of monomer units to be polymerized as well as how the polymerization reaction takes place (e.g., stereochemistry, tacticity, activity).
  • the synthesized polymers can be selected for specific properties, such as molecular weight, density, hydrophobicity, tacticity, or stereoselectivity, using standard techniques, such as electrophoresis, gel filtration, centrifugal sedimentation, or partitioning into solvents of different hydrophobicities.
  • the attached template that directed the synthesis of the polymer can then be identified.
  • Library members that catalyze virtually any reaction causing bond formation between two substrate molecules or resulting in bond breakage into two product molecules can be selected using the schemes proposed herein.
  • bond forming catalysts for example, hetero Diels-Alder, Heck coupling, aldol reaction, or olefin metathesis catalysts
  • library members are covalently linked to one substrate through their 5' amino or thiol termini.
  • the other substrate of the reaction is synthesized as a derivative linked to biotin.
  • those library members that catalyze bond formation cause the biotin group to become covalently attached to themselves.
  • Active bond forming catalysts can then be separated from inactive library members by capturing the former with immobilized streptavidin and washing away inactive library members.
  • library members that catalyze bond cleavage reactions such as retro-aldol reactions, amide hydrolysis, elimination reactions, or olefin dihydroxylation followed by periodate cleavage can be selected.
  • library members are covalently linked to biotinylated substrates such that the bond breakage reaction causes the disconnection of the biotin moiety from the library members.
  • active catalysts but not inactive library members, induce the loss of their biotin groups.
  • Streptavidin-linked beads can then be used to capture inactive polymers, while active catalysts are able to be eluted from the beads.
  • Related bond formation and bond cleavage selections have been used successfully in catalytic RNA and DNA evolution (Jaschke et al. (2000) CURR. OPIN. CHEM. BIOL. 4: 257-62). Although these selections do not explicitly select for multiple turnover catalysis, RNAs and DNAs selected in this manner have in general proven to be multiple turnover catalysts when separated from their substrate moieties (Jaschke et al. (2000) CURR. OPIN. CHEM. BIOL. 4: 257-62; Jaeger et al. (1999) PROC. NATL. ACAD. SCI. USA 96: 14712-7; Bartel et al. (1993) SCIENCE 261: 1411-8; Sen et al. (1998) CURR. OPIN. CHEM. BIOL. 2: 680-7).
  • Substrate specificity among catalysts can be selected by selecting for active catalysts in the presence of the desired substrate and then selecting for inactive catalysts in the presence of one or more undesired substrates. If the desired and undesired substrates differ by their configuration at one or more stereocenters, enantioselective or diastereoselective catalysts can emerge from rounds of selection.
  • metal selectivity can be evolved by selecting for active catalysts in the presence of desired metals and selecting for inactive catalysts in the presence of undesired metals.
  • catalysts with broad substrate tolerance can be evolved by varying substrate structures between successive rounds of selection.
  • in vitro selections can also select for specificity in addition to binding affinity.
  • Library screening methods for binding specificity typically require duplicating the entire screen for each target or non-target of interest.
  • selections for specificity can be performed in a single experiment by selecting for target binding as well as for the inability to bind one or more non-targets.
  • the library can be pre-depleted by removing library members that bind to a non-target.
  • selection for binding to the target molecule can be performed in the presence of an excess of one or more non-targets.
  • the non-target can be a homologous molecule.
  • If the target molecule is a protein, appropriate non-target proteins include, for example, a generally promiscuous protein such as an albumin. If the binding assay is designed to target only a specific portion of a target molecule, the non-target can be a variation on the molecule in which that portion has been changed or removed.
  • the templates which are associated with the selected reaction product preferably are amplified using any suitable technique to facilitate sequencing or other subsequent manipulation of the templates.
  • Natural oligonucleotides can be amplified by any state of the art method. These methods include, for example, polymerase chain reaction (PCR); nucleic acid sequence-based amplification (see, for example, Compton (1991) NATURE 350: 91-92), amplified anti-sense RNA (see, for example, van Gelder et al. (1988) PROC. NATL. ACAD. SCI.
  • Ligase-mediated amplification methods such as Ligase Chain Reaction (LCR) may also be used.
  • any means allowing faithful, efficient amplification of selected nucleic acid sequences can be employed in the method of the present invention. It is preferable, although not necessary, that the proportionate representations of the sequences after amplification reflect the relative proportions of sequences in the mixture before amplification.
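Whether amplification preserved the relative proportions of the selected sequences can be checked by comparing each sequence's share of the pool before and after amplification. The sketch below assumes that per-sequence counts (for example, sequencing read counts) are available; the function names and the example counts are hypothetical.

```python
from typing import Dict


def proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-sequence counts into fractions of the total pool."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {seq: n / total for seq, n in counts.items()}


def representation_ratio(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Per-sequence ratio of (share after amplification) / (share before).

    A ratio of 1.0 means the sequence kept its relative proportion;
    values above or below 1.0 indicate over- or under-representation.
    Sequences absent after amplification get a ratio of 0.0.
    """
    p_before = proportions(before)
    p_after = proportions(after)
    return {seq: p_after.get(seq, 0.0) / p for seq, p in p_before.items() if p > 0}


if __name__ == "__main__":
    before = {"T1": 10, "T2": 30}          # hypothetical counts before amplification
    faithful = {"T1": 1000, "T2": 3000}    # both templates amplified 100-fold
    skewed = {"T1": 3000, "T2": 3000}      # T1 over-amplified relative to T2
    print(representation_ratio(before, faithful))
    print(representation_ratio(before, skewed))
```

A pool in which every ratio is close to 1.0 corresponds to the preferred case in the text, where proportionate representation after amplification reflects the mixture before amplification.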
  • For non-natural nucleotides, the choices of efficient amplification procedures are fewer. As non-natural nucleotides can be incorporated by certain enzymes, including polymerases, it will be possible to perform manual polymerase chain reaction by adding the polymerase during each extension cycle.
  • For oligonucleotides containing nucleotide analogs, fewer methods for amplification exist. One may use non-enzyme mediated amplification schemes (Schmidt et al. (1997) NUCLEIC ACIDS RES. 25: 4797-4802). For backbone-modified oligonucleotides such as PNA and LNA, this amplification method may be used. Alternatively, standard PCR can be used to amplify a DNA from a PNA or LNA oligonucleotide template. Before or during amplification the templates or complementing templates may be mutagenized or recombined in order to create an evolved library for the next round of selection or screening.
  • Sequencing can be done by a standard dideoxy chain termination method, or by chemical sequencing, for example, using the Maxam-Gilbert sequencing procedure.
  • the sequence of the template (or, if a long template is used, the variable portion(s) thereof) can be determined by hybridization to a chip.
  • a single-stranded template molecule associated with a detectable moiety such as a fluorescent moiety is exposed to a chip bearing a large number of clonal populations of single-stranded nucleic acids or nucleic acid analogs of known sequence, each clonal population being present at a particular addressable location on the chip.
  • the template sequences are permitted to anneal to the chip sequences.
  • the position of the detectable moieties on the chip then is determined. Based upon the location of the detectable moiety and the immobilized sequence at that location, the sequence of the template can be determined.
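The chip-based readout described above amounts to a lookup followed by a complement operation: the address where the signal is detected identifies the immobilized probe, and the template region that annealed there is that probe's reverse complement. The sketch below assumes perfect Watson-Crick pairing and DNA probes written 5' to 3'; the chip layout, addresses, and sequences are hypothetical.

```python
from typing import Dict, Tuple

# Watson-Crick complement table for DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def decode_template(chip: Dict[Tuple[int, int], str], lit_address: Tuple[int, int]) -> str:
    """Infer the annealed template sequence from the address of the signal.

    chip maps each addressable location to the known probe immobilized there.
    """
    if lit_address not in chip:
        raise KeyError(f"no probe recorded at address {lit_address}")
    return reverse_complement(chip[lit_address])


# Hypothetical two-feature chip: (row, column) -> immobilized probe (5' to 3').
EXAMPLE_CHIP = {
    (0, 0): "ACGTTG",
    (0, 1): "GGATCC",
}
```

For example, a fluorescent signal at address (0, 0) of the hypothetical chip would indicate a template region reading CAACGT.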
  • next-generation sequencing techniques are used, where during DNA sequencing, the bases of a small fragment of DNA are sequentially identified from signals emitted as each fragment is re-synthesized from a DNA template strand. This sequencing method is based on reversible dye-terminators that enable the identification of single bases as they are introduced into complementary DNA strands.
  • [00135] Small molecule compound libraries have been made using nucleic acid template synthesis, also referred to herein as DNA-programmed chemistry (DPC), in which the DNA base sequence corresponds directly to the structure of the molecule made on each unique DNA template strand. Sequence-specific DNA-templated reactions have been carried out covering a range of chemical reaction types. The process requires that the template forms a duplex specifically with the oligonucleotide in the transfer unit, a partner reagent DNA strand comprising an anti-codon sequence for one of the codons in the template. Following duplex formation, the DNA-linked reactive units were brought into close proximity, and a chemical reaction was catalyzed between the building blocks on the template and reagent strands, forming a new covalent bond linking these two small molecules together.
  • the single-stranded DNA sequence used as the template for DPC contained several distinct codon regions of predetermined length and sequence to ensure the specificity of DNA duplex formation and thus integrity of the chemical reaction.
  • DNA templates have been designed containing fixed sequence regions, tag regions and codons for DPC reactions (see FIG. 1).
  • Using the principles described herein, it has been possible to create libraries of novel small molecules conjugated to DNA oligonucleotides from analogous libraries of DNA templates, where the specific base sequence in the template translates directly to a specific small molecule structure that was made on the 5'-end of the DNA.
  • Each codon and tag region within the template corresponds to a particular structural feature or building block employed in the synthesis, and the summation of all tag and codon sequences identifies the unique structure of the attached small molecule.
  • each base employed in the single-stranded DNA template sequence is part of a longer functional region with a specific pre-determined role in the execution of DPC library synthesis or is required as part of the process by which the structures of active components of the library are determined.
  • Located at both the 3'- and 5'-ends of the DNA template are fixed regions of ten bases in length, which represent polymerase chain reaction (PCR) ligation sites. These sites permit PCR amplification of the DNA sequences when it is necessary to determine the specific base sequence of the DNA and thereby identify library components.
  • By determining the particular DNA base sequence of oligonucleotides attached to the small molecule, it is possible to identify compounds that have affinity for target proteins. For example, incubating the library of DNA-conjugated small molecules in the presence of a solid-phase resin-immobilized protein target acts as an affinity-based selection that sequesters compounds with protein affinity. Washing the solid-supported protein free of non-binding compounds, and then eluting binders following protein denaturation, will yield active conjugates. PCR amplification of the attached DNA strands and sequencing will reveal the specific base sequence of the DNA and by extension the unique structures of the small molecule protein ligands. The success and efficiency of the protein binding hit discovery process is greatly enhanced by enlarging the number and diversity of the small molecule collection, and the library design and synthesis process has been refined to increase the productivity and efficiency of hit discovery.
  • Adjacent to the fixed ligation site are three independent codon regions, each of 12 bases. Each of these sites in turn can form a duplex with complementary reagent DNA sequences during the DPC reaction steps.
  • the codons are designed to ensure specific interactions with the predetermined DNA sequences of the reagent 'anti-codon' sequences.
  • the length of the codon region ensures sufficient base-pairing to give high affinity duplex formation with a suitably high melting temperature such that the duplex will form and be maintained at ambient temperature.
  • each codon region contains a number of possible base sequences chosen to present specific building blocks in one diversity location in the final library compound. For example, in the preparation of a triazole-linked library, each codon position (R1 through R3) contains one of 24 variants, and each of the codon sets in positions R1, R2 and R3 has its own unique set of 24 codons, making a total of 72 codon sequences (see FIG. 2).
  • the oligonucleotide synthesis was based on a mix and split process to ensure that all permutations were generated in approximately equal amounts.
  • DNA synthesis was carried out in 24 parallel vessels, and within each the DNA was synthesized from the 3'-end to produce a specific DNA sequence comprising the fixed ligation sequence of 10 bases, two tag sequences totaling 14 bases and then a unique R3 codon sequence of 12 bases.
  • the controlled-pore glass (CPG) solid support from all 24 vessels was removed and thoroughly mixed before redistribution into 24 new vessels for the addition of the bases constituting 24 unique and distinctive R2 codon sequences.
  • the same process carried out for the introduction of the 24 R1 codon sequences provided the 13,824 different nascent DNA templates.
  • Subsequent oligonucleotide synthesis introduced the final fixed ligation sequence at the 5'-end of the DNA.
  • chemical modification of the 5'-hydroxyl introduced an amino group that provided a handle for addition of the first synthetic building block (linker residue).
  • FIGS. 1 and 3 show an exemplary template containing two tags, Tag 1 and Tag 2, each of which is 7 bases long. These DNA sequences are essential in the identification of the small molecule covalently attached to the template DNA at the end of the library synthesis, but they do not engage in DPC-catalyzed reactions. Instead they are used to 'hard code' for subsets of the library that might differ in the identity of the linker or spacer building blocks.
  • the Tag 2 position is kept fixed with a unique base sequence, and the mix and split process of introducing the R1 through R3 codon regions proceeds to give every codon variant (see FIG. 3).
  • the resulting product is a mixture of templates all containing a fixed Tag 2 base sequence that defines one linker building block. Without further mixing, the individual 'linker' building blocks are chemically attached to the 5'-amino terminus, maintaining a direct relationship between the Tag 2 sequence and the linker building block structure.
  • the DNA template-linker conjugates are then mixed and used in DPC reactions directly.
  • the template oligonucleotides will have been synthesized in 16 mixtures each comprising 13,824 different sequences. Combining the 16 different mixtures gives a total template library complexity of 221,184 sequences (see FIG. 4).
  • the final library products that are synthesized can be identified by sequencing the DNA, revealing the structure of the small molecule by considering both (1) the codon regions representing building blocks R1 through R3 and (2) the tag region defining the linker building block.
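The mix-and-split assembly and the sequence-based decoding described above can be sketched in a few lines of Python. This is a minimal illustration rather than the synthesis protocol itself: the fixed regions, the tag block, and all codon sequences below are placeholders invented for the example, and the segment order (5'-fixed, R1, R2, R3, tags, 3'-fixed) is inferred from the 3'-to-5' synthesis order given in the text.

```python
BASES = "ACGT"
CODON_LEN = 12

def placeholder_codons(count, offset, length=CODON_LEN):
    """Return `count` distinct placeholder codons (hypothetical sequences).

    Each codon is just a base-4 numeral written with A, C, G, T; these are
    not the codons used in the patent.
    """
    codons = []
    for number in range(offset, offset + count):
        digits = []
        value = number
        for _ in range(length):
            value, remainder = divmod(value, 4)
            digits.append(BASES[remainder])
        codons.append("".join(reversed(digits)))
    return codons

# Placeholder segments, all written 5'->3' (assumed values).
FIXED_5 = "A" * 10    # 5' fixed ligation region, 10 bases
FIXED_3 = "C" * 10    # 3' fixed ligation region, 10 bases
TAG_BLOCK = "G" * 14  # two 7-base tags, treated here as one block
R1_CODONS = placeholder_codons(24, offset=0)
R2_CODONS = placeholder_codons(24, offset=24)
R3_CODONS = placeholder_codons(24, offset=48)

def split_and_pool():
    """Enumerate every template produced by three rounds of mix and split.

    Synthesis runs 3'->5', so each round prepends a codon to every strand
    in the pooled mixture from the previous round.
    """
    pool = [TAG_BLOCK + FIXED_3]
    for codon_set in (R3_CODONS, R2_CODONS, R1_CODONS):
        pool = [codon + strand for codon in codon_set for strand in pool]
    return [FIXED_5 + strand for strand in pool]

def decode(template):
    """Map a sequenced template back to its (R1, R2, R3) building-block indices."""
    c1, c2, c3 = template[10:22], template[22:34], template[34:46]
    return R1_CODONS.index(c1), R2_CODONS.index(c2), R3_CODONS.index(c3)

templates = split_and_pool()
print(len(templates), len(templates[0]))
```

With these placeholder lengths each template is 70 bases long (10 + 36 + 14 + 10), and the enumeration yields the 13,824 sequences stated in the text.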
  • Example 3 Tag Sequence in the DNA Template Defines a Spacer Group
  • the other tag sequence in the DNA template sequence (see FIGS. 1 and 2) is used to define the spacer building block.
  • the Tag 1 position base sequence is held constant, but any library might comprise multiple library mixtures which differ only in the DNA base sequence of Tag 1.
  • Each library mixture will independently be chemically derivatized in a non-DPC step with a spacer building block. The mixtures are kept separate so that the spacer that is attached is defined by the Tag 1 base sequence ensuring that there is fidelity between the Tag 1 DNA sequence and the spacer structure (see FIG. 5).
  • the DPC library synthesis process is a method for converting the combinatorial set of DNA template sequences into a combinatorial set of small molecules that are constructed on the 5'-end of the DNA.
  • the total library size is the mathematical product of the building block diversity at every variable position in the molecule. For example, in one library, diversity was introduced through the linker position (16 variants) and each of the R1 through R3 building block positions (each 24 variants). This affords a total mixture complexity of 221,184 different library small molecule products (see FIG. 4).
  • Increasing the numerical complexity of the library can be achieved in several different and additive ways, but the inclusion of a fourth codon is one way to increase library size (see FIG. 6).
  • the fourth codon permits the addition of a new building block position during DNA-programmed chemistry (DPC).
  • Using 24 variants for the new building block introduced by this codon will provide 331,776 different template sequences.
  • a library of 5,308,416 unique compounds is created, which is a 24-fold increase in library diversity as compared to a 16-mixture pool of three codon templates, which provides only 221,184 compounds.
  • the length of the codon region ensures sufficient base-pairing between the template and the reagent oligonucleotides, such that there is sufficient specificity in the DNA duplex formation.
  • the choice of codons of 12 bases in length ensures a high level of fidelity between the codon sequence and the identity of the building block added to the small molecule being synthesized on the 5'-end of the DNA.
  • the length is chosen to give high affinity duplex formation with a suitably high melting temperature (above ambient temperature), and also to minimize any mismatched DNA.
  • the number of base sequence permutations that can be achieved with four bases in each of the 12 base positions is 16,777,216, providing a significant choice of alternate sequences for each building block.
  • Appropriate computer algorithms can be used to select codon sequences, to compile all possible full DNA template sequences, and to determine that there is absolute fidelity and no ambiguity in the matching of anti-codons on the reagent DNA to the template codons.
  • Such checks confirm that codons recognize only their designated anti-codons and that the reagent strands containing the anti-codons can bind only to their complementary codons, not to any other codon, nor to any other sequence of bases in any of the templates that are concurrently present in the DPC reaction mixture.
  • the vast diversity of codon sequences ensures that 24 unique codon sequences that work in every library template context can readily be selected.
  • the codon repertoire may be extended by lengthening the codon base sequence.
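One way such a codon-selection algorithm might look is sketched below. The text does not describe the algorithm actually used, so everything here is an assumption made for illustration: candidates are drawn at random from the 4^12 = 16,777,216 possible 12-mers, Hamming distance stands in as a crude proxy for cross-hybridization, and the Wallace rule (2 °C per A/T, 4 °C per G/C) gives a rough melting-temperature estimate. The thresholds `min_distance` and `min_tm` are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def select_codons(count, length=12, min_distance=5, min_tm=36, seed=0):
    """Greedily pick codons that are mutually distinct and not self-complementary.

    A candidate is kept only if it, and its reverse complement, differ from
    every codon already chosen in at least `min_distance` positions.
    """
    rng = random.Random(seed)
    chosen = []
    attempts = 0
    while len(chosen) < count:
        attempts += 1
        if attempts > 200_000:
            raise RuntimeError("could not find enough codons")
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if wallace_tm(cand) < min_tm:
            continue  # estimated duplex stability too low
        rc = reverse_complement(cand)
        if hamming(cand, rc) < min_distance:
            continue  # too close to its own anti-codon
        if all(hamming(cand, c) >= min_distance and
               hamming(rc, c) >= min_distance for c in chosen):
            chosen.append(cand)
    return chosen

# 72 codons: 24 for each of three codon positions.
codons = select_codons(72)
print(len(codons), 4 ** 12)
```

The sketch compares only full-length, aligned sequences. A fuller check of the kind the text describes would also scan each anti-codon against every offset in every complete template sequence.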
  • Example 6 Use of a Multivalent (Trivalent) Building Block
  • the DNA-programmed chemistry (DPC) approach permits building blocks to be added to the growing small molecule on the 5'-end of the DNA independent of any base sequence.
  • a typical building block is a bifunctional reagent such as an amino acid, although, depending on the chemistry used in small molecule synthesis, the building block is not necessarily limited to this type.
  • the amino acid is attached to the reagent DNA anticodon sequence through the amino group.
  • the carboxylic acid is activated, typically with a standard peptide coupling reagent, and the amide bond is generated between the amino acid and the free amine group on the small molecule intermediate attached to the 5'-end of the DNA template.
  • the newly created molecule is sandwiched between the template and reagent DNA strands.
  • the scissile bond can be selectively cleaved allowing the separation from the now redundant reagent DNA.
  • the template now has a newly modified small molecule on the 5'-end with a free exposed amine group which is available for further chemical derivatization.
  • An example of a trifunctional building block might be a suitably protected diamino acid.
  • Such building blocks have been attached to the DNA template through the carboxylic acid, with the two amine groups exposed for further chemistry. To prevent ambiguity in the synthesis, the two amines will be protected in different ways, so that either or both amines can be independently revealed when required.
  • one amine has been protected as the Fmoc-derivative, which can be revealed by treating with piperidine, and the other as an azide, which was converted to the free amine at a later stage of the library synthesis by a reduction or hydrogenation reaction.
  • DPC reactions have been used to add amino acids onto the free amine of the trifunctional group.
  • the amine on the trifunctional building block was revealed by suitable chemical conversion, and the two amines (that on the amino acid added by DPC, and that revealed on the trifunctional building block) can be linked by a bis-carboxylic acid spacer molecule to generate a macrocycle.
  • the terminal functional group might be a carboxy alkyne introduced as a spacer by amide coupling to the terminal amine on the third bifunctional building block.
  • Production of the final macrocyclic product has been achieved by a copper (I) salt-catalyzed Huisgen cyclization, onto the azide of the trifunctional linker molecule, to give a 1,2,3-triazole product.
  • the trifunctional building block can thus be considered to be a diversity generating element that, by being employed at different stages of the synthesis, can result in small molecule products with highly divergent architectures (see FIG. 7). Inclusion of additional trifunctional building blocks can add further structural diversity to the library with minimal additional synthetic effort, and will result in multiple libraries of similar numerical complexity.
  • each codon position encodes for a building block collection of 24 different building blocks.
  • Each of the three codon positions has a different set of 24 building blocks associated with it.
  • the building blocks are added to the growing small molecule by sequential DPC reactions. However there does not need to be a direct relationship between the sequence of DPC catalyzed building block adding events and the relative positions of the encoding codons within the DNA template.
  • the first building block could be introduced by using the codon 1 position.
  • a set of building blocks could be introduced in the first step equally successfully by using the second codon position or indeed the third.
  • Certain library architectures can be engineered to contain reactive functional groups such as carboxylic acids and amines. These functional groups are further derivatized by reaction of the entire library mixture with soluble reagents. For example, making a library comprising multiple diversity positions might conclude the DPC steps with an exposed nucleophilic amino group. Rather than using this amine for a further DPC step, or for cyclization to yield a macrocycle, the amine is derivatized with a number of different soluble reagents. To maximize the total number of library compounds, the library mixture at the end of the DPC steps is split into multiple aliquots such that each aliquot contains every library component. Each aliquot is treated with a different soluble reagent.
  • a library aliquot containing a nucleophilic amine is reacted with acid chlorides, anhydrides, sulfonyl chlorides, isocyanates, isothiocyanates or nucleophilic aromatic systems such as chloroheterocycles.
  • a library mixture with a free, terminal carboxylic acid is reacted with amines using routine amide coupling conditions to generate amide derivatives.
  • a mixture of 5,971,968 library products containing a terminal amino group is split into 50 aliquots and each aliquot reacted with a different acyl chloride, anhydride, sulfonyl chloride or other electrophilic reagent to generate a total of 298,598,400 different library products in 50 mixtures.
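The aliquot arithmetic in this example can be written out as a single product. The symbols are introduced here only for illustration: N_DPC is the number of products at the end of the DPC steps and n_reagents is the number of aliquots, each treated with a different soluble reagent.

```latex
N_{\text{total}} = N_{\text{DPC}} \times n_{\text{reagents}}
                 = 5{,}971{,}968 \times 50
                 = 298{,}598{,}400
```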
  • Example 9 Addition of a DNA Tag Sequence by Ligation to Define a Soluble Reagent or the Sequence of Building Block Addition
  • Tag sequences can be used to define and later identify non-DPC steps, and as a consequence of their use, it is necessary to keep pools of template sequences separate until the point the non-DPC building block is added to the growing small molecule.
  • An example of this is the addition of the linker at the start of the small molecule synthesis. This step is undertaken with individual pools of templates, each comprising a unique tag sequence and each resulting in the addition of only one linker building block.
  • the pools of templates each containing different linkers can be combined subsequently and prior to the DPC reactions.
  • An alternative method is to add the linker to a template mixture and then add a tag DNA sequence to the 3'-end of the DNA template by a ligation reaction.
  • the tag can be just 7 bases long and would be unique for the linker that has been added.
  • the sequence of the added tag can be identified and the nature of the non-DPC building block or soluble reagent can also be unambiguously determined.
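The ligated-tag lookup can be sketched as a simple table read. This is a minimal illustration under stated assumptions: the 7-base tags and the reagent names in `TAG_TO_REAGENT` are hypothetical, and ligation is modeled as nothing more than appending the tag to the 3'-end of a template string written 5'->3'.

```python
TAG_LENGTH = 7

# Hypothetical tag-to-reagent table; the tags and names are invented.
TAG_TO_REAGENT = {
    "ACGTACG": "acyl chloride A",
    "TGCATGC": "sulfonyl chloride B",
    "GATTACA": "isocyanate C",
}

def ligate_tag(template, tag):
    """Model 3'-end ligation by appending the tag to the template string."""
    if len(tag) != TAG_LENGTH:
        raise ValueError("tag must be exactly 7 bases long")
    return template + tag

def identify_reagent(ligated_template):
    """Read the last 7 bases and look up the non-DPC step they encode.

    Returns None when the tag is not in the table.
    """
    tag = ligated_template[-TAG_LENGTH:]
    return TAG_TO_REAGENT.get(tag)

ligated = ligate_tag("A" * 70, "GATTACA")
print(identify_reagent(ligated))
```

Because each aliquot receives one tag and one reagent, the templates can be pooled after ligation without losing the record of which non-DPC step each one underwent.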

Abstract

The present invention provides methods and compositions for expanding the scope of chemical reactions that can be performed during nucleic acid-templated organic syntheses, and for producing small molecule libraries of greater size, scope and complexity than previously possible.

Description

METHODS AND COMPOSITIONS FOR NUCLEIC ACID-TEMPLATED SYNTHESIS OF LARGE LIBRARIES OF COMPLEX SMALL MOLECULES
REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 62/099,092, filed December 31, 2014, the entire contents of which are incorporated by reference herein in their entirety.
FIELD OF THE INVENTION
[0002] The invention relates generally to methods and compositions for performing nucleic acid-templated synthesis. More particularly, the invention relates to methods and compositions for producing small molecule libraries of greater size, scope and complexity than previously possible by nucleic acid template synthesis.
BACKGROUND OF THE INVENTION
[0003] Nucleic acid-templated organic synthesis enables modes of controlling reactivity that are not possible in a conventional synthesis format and allows synthetic molecules to be manipulated using translation, selection, and amplification methods previously available only to biological macromolecules (Gartner et al. (2001) J. AM. CHEM. SOC. 123: 6961-3; Gartner et al. (2002) ANGEW. CHEM., INT. ED. ENGL. 41: 1796-1800; Gartner et al. (2002) J. AM. CHEM. SOC. 124: 10304-6; Calderone et al. (2002) ANGEW. CHEM., INT. ED. ENGL. 41: 4104-8; Gartner et al. (2003) ANGEW. CHEM., INT. ED. ENGL. 42: 1370-5; Li et al. (2004) J. AM. CHEM. SOC. 124: 5090-2; Kanan et al. (2004) NATURE 431: 545-9; Gartner et al. (2004) SCIENCE 305: 1601-5; Li et al. (2004) ANGEW. CHEM. INT. ED. 43: 4848-70; Brenner et al. (1992) PROC. NATL. ACAD. SCI. USA 89: 5181; Doyon et al. (2003) J. AM. CHEM. SOC. 125: 12372-3; Halpin et al. (2004) PLoS BIOL. 2: e174). However, the size, scope and diversity of small molecule libraries that can be produced using nucleic acid-templated synthesis can be limited.
Accordingly, there is a need in the art for methods and compositions for producing large libraries of diverse, complex small molecules.
SUMMARY OF THE INVENTION
[0004] The present invention provides methods and compositions for producing small molecule libraries of greater size, scope and complexity than those previously possible by nucleic acid template synthesis.
[0005] In one aspect, the invention provides a method for producing a library of small molecules associated with corresponding oligonucleotides. The method comprises the steps of (a) providing a plurality of templates comprising a plurality of first reactive units associated with a corresponding plurality of first oligonucleotides; (b) providing a plurality of first transfer units comprising a plurality of second reactive units covalently attached to a corresponding plurality of second oligonucleotides, wherein each second oligonucleotide defines a first anti-codon sequence complementary to a first codon sequence; and (c) annealing first and second oligonucleotides having complementary codon and anti-codon sequences to bring the first and second reactive units into reactive proximity, thereby producing a plurality of first small molecules associated with the corresponding first oligonucleotides. This process can be repeated with a plurality of second transfer units to produce a plurality of second small molecules, with a plurality of third transfer units to produce a plurality of third small molecules, and with a plurality of fourth transfer units to produce a plurality of fourth small molecules. Each first oligonucleotide defines at least a first codon sequence, a second codon sequence, and a third codon sequence, and each of the first, second, and third codon sequence is at least 12 bases in length and is different from one another. Each first oligonucleotide is at least 70 bases in length.
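The annealing step (c) can be pictured as a reverse-complement lookup. The sketch below is an illustration only: the codon sequences, flanking bases, and reactive-unit names are invented, and pairing is modeled as an exact match, whereas real hybridization tolerates some mismatches and depends on temperature and concentration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_codon_position(template, anticodon):
    """Return the 0-based index of the codon that pairs with `anticodon`.

    Returns None when the template holds no perfectly complementary site.
    """
    index = template.find(reverse_complement(anticodon))
    return index if index != -1 else None

def anneal(templates, transfer_units):
    """Pair each template with the transfer units whose anti-codon matches.

    `templates` holds (sequence, first reactive unit) pairs and
    `transfer_units` holds (anti-codon, second reactive unit) pairs.
    """
    products = []
    for template_seq, first_unit in templates:
        for anticodon, second_unit in transfer_units:
            if find_codon_position(template_seq, anticodon) is not None:
                products.append((template_seq, first_unit + "-" + second_unit))
    return products

# Hypothetical 12-base codons and names, for illustration only.
codon_a = "GATTACAGGCTA"
codon_b = "CCGTAAGTCGAT"
template_a = "AAAA" + codon_a + "CCCC"
template_b = "AAAA" + codon_b + "CCCC"
templates = [(template_a, "linker1"), (template_b, "linker2")]
transfer_units = [(reverse_complement(codon_a), "blockA"),
                  (reverse_complement(codon_b), "blockB")]

products = anneal(templates, transfer_units)
print(products)
```

Each template picks up only the reactive unit carried by its matching anti-codon, which is what lets a mixed pool of transfer units react with a mixed pool of templates in one vessel.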
[0006] In another aspect, the method can also include dividing a plurality of templates comprising a plurality of first reactive units associated with a corresponding plurality of first oligonucleotides into a plurality of aliquots, and for each aliquot, providing a plurality of first transfer units, a plurality of second transfer units, a plurality of third transfer units, and, optionally, a plurality of fourth transfer units, wherein the order of adding the first, second, third, and optionally fourth transfer units is different from any other aliquot. Each first oligonucleotide defines at least a first codon sequence, a second codon sequence, and a third codon sequence, and each of the first, second, and third codon sequence is at least 12 bases in length and is different from one another. Each first oligonucleotide is at least 70 bases in length. In certain embodiments, the method includes recombining two or more of the aliquots to create a library of small molecules.
[0007] The template may have any one of the following features. In certain embodiments, at least one codon is at least 14 bases in length. In certain embodiments, each of the first, second, third, and, optionally, fourth codon is at least 14 bases in length. In certain embodiments, the first oligonucleotide is at least 90 bases in length. In certain embodiments, each first oligonucleotide comprises a unique tag sequence that defines the linker or capping group, or any other structural modification to any small molecule that was not achieved through a DNA-templated reaction step. In certain embodiments, each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the small molecule associated therewith. In certain embodiments, the concentration of the plurality of templates is at least 90 nM and no greater than 500 nM at each step when a reactive unit is added.
[0008] The small molecules may have any of the following features. In certain embodiments, the second, third or fourth small molecule comprises a moiety that was added as a soluble reagent to the first oligonucleotide-associated small molecule (e.g., to the first, second, third, or fourth small molecule); and, optionally, wherein each of the first oligonucleotides comprises a nucleotide sequence that is informative of the soluble reagent-added moiety. In certain embodiments, at least one of the first, second, third, fourth or fifth reactive unit, or the soluble reagent, is a trivalent moiety. In certain embodiments, the second, third or fourth small molecule, or the soluble reagent comprises a reactive moiety or can be further modified or deprotected to reveal a reactive functional group capable of further reaction with a plurality of chemical moieties. For example, the reactive moiety capable of further reaction with another chemical moiety can be but is not limited to a nucleophilic primary or secondary amine or a free carboxyl group.
[0009] The methods may further comprise the following steps. In certain embodiments, the method comprises (i) splitting the library into a plurality of aliquots following addition of the reactive moiety capable of further reaction with a plurality of chemical moieties onto the first oligonucleotide-associated small molecules; and (ii) adding to each of the plurality of aliquots a different reagent that reacts with the reactive moiety present on the first oligonucleotide-associated small molecules present therein. Examples of different reagents include an acylating agent, a sulfonating agent, a heteroaryl halide reagent, reductive amination reagents and an amide-forming reagent. To identify the addition of a soluble reagent or other chemical modification, an identifying sequence (e.g., a tag) can be ligated to the 3' terminus of each of the plurality of templates.
[0010] In certain embodiments, one or more of the plurality of second, third, fourth or fifth oligonucleotides is bound to a first member of a binding pair, e.g., biotin. In certain embodiments, one or more of the plurality of first, second, third or fourth small molecules bound to the first member of a binding pair is purified by contact with a second binding pair member, wherein the second member of the binding pair (e.g., streptavidin) is bound to a solid support.
[0011] In certain embodiments, following the addition of one or more of the first, second, third, or fourth reactive unit, or the soluble reagent-added moiety, the plurality of templates is reacted with a capping reagent that differentially caps the small molecules that did not react with one or more of the prior-added reactive units or soluble reagents. In this manner, these components can no longer participate in further chemical steps in the library preparation and are removed from the library pool upon stepwise purification. In certain embodiments, the capping reagent is an acid anhydride, e.g., acetic anhydride, or acyl chloride or other activated acylating group known to those skilled in the art.
[0012] Libraries can be created according to the invention using any combination of the features described above.
[0013] In another aspect, the invention relates to a library of compounds produced by any of the methods described herein.
[0014] The foregoing aspects and features of the invention may be further understood by reference to the following drawings, detailed description, examples, and claims.
DESCRIPTION OF THE DRAWINGS
[0015] FIG. 1 is a schematic illustration depicting an exemplary template covalently attached to a product (macrocycle) encoded by nucleic acid template synthesis. The exemplary template comprises a plurality of regions including two fixed regions (10 bases), two tag regions (7 bases) and three codons (12 bases) for DPC reactions. In this embodiment, a macrocycle small molecule is synthesized on the DNA template, with the linker corresponding to Tag 2, the spacer corresponding to Tag 1, and building blocks 1, 2, and 3 corresponding to codons 1, 2 and 3, respectively.
[0016] FIG. 2 is a schematic illustration depicting how to efficiently create a library of templates suitable for nucleic acid template synthesis. As shown, each codon position (R1 through R3) contains one of 24 variants (but more variants are possible), and each of the codon sets in each of the positions R1, R2, and R3 has its own unique set of 24 codons, making a total of 72 different codon sequences, to generate 13,824 different DNA sequences corresponding to 13,824 different templates.
[0017] FIG. 3 is a schematic illustration of an embodiment in which the Tag 2 position of a DNA template is kept fixed with a unique base sequence that defines one trivalent linker building block.
[0018] FIG. 4 is a schematic illustration of an embodiment in which a plurality of different linkers (16 as shown) are employed in a single library mixture, wherein the template oligonucleotides were synthesized in 16 mixtures, denoted L-1 through L-16, each comprising 13,824 different sequences. Combining the 16 different mixtures gives a total template library complexity of 221,184 different template sequences.
[0019] FIG. 5 is a schematic illustration depicting the addition of a spacer, which is defined by the Tag 1 sequence in the DNA template.
[0020] FIG. 6 shows the design of DNA templates with four codons which increases library size. Using 24 variants for the new building block introduced by this codon will provide 331,776 different template sequences. When 16 different four-codon template mixtures are pooled, a library of 5,308,416 unique compounds is created, which is a 24-fold increase in library diversity as compared to a 16-mixture pool of three codon templates, which provides only 221,184 compounds.
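The library sizes quoted for FIGS. 2, 4 and 6 all follow from multiplying the number of variants at each variable position. A short sketch of that arithmetic is given below; the position labels, including "R4" for the fourth codon, are names chosen here for illustration.

```python
from math import prod

def library_size(diversities):
    """Multiply the number of variants at every variable position."""
    return prod(diversities.values())

three_codon = {"R1": 24, "R2": 24, "R3": 24}
four_codon = {**three_codon, "R4": 24}  # "R4" is a label assumed here

templates_3 = library_size(three_codon)                # one three-codon mixture
pooled_3 = library_size({"linker": 16, **three_codon}) # 16 pooled mixtures
templates_4 = library_size(four_codon)                 # one four-codon mixture
pooled_4 = library_size({"linker": 16, **four_codon})  # 16 pooled mixtures
fold_increase = pooled_4 // pooled_3

print(templates_3, pooled_3, templates_4, pooled_4, fold_increase)
```

Because the sizes are pure products, adding one more 24-variant position multiplies the library by 24 regardless of how many linkers are pooled.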
[0021] FIG. 7 is a schematic illustration showing how changing the sequence of the chemical steps in a nucleic acid-templated library synthesis can generate diverse architectures, thereby increasing the diversity of small molecules in a DPC library.
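Varying the sequence of steps across aliquots can be enumerated directly. The sketch below lists the possible orders in which codon positions could be used and assigns one order per aliquot. It is an illustration only: the text requires that each aliquot's order differ from every other aliquot's, not that every permutation be used, and the aliquot names are invented.

```python
from itertools import permutations

def addition_orders(codon_positions):
    """List every order in which the given codon positions could be used."""
    return list(permutations(codon_positions))

orders = addition_orders(["codon 1", "codon 2", "codon 3"])

# One aliquot per order, so no two aliquots share the same step sequence.
aliquots = {f"aliquot {i + 1}": order for i, order in enumerate(orders)}

for name, order in aliquots.items():
    print(name, " -> ".join(order))
```

Three codon positions give 6 possible orders, and four give 24, so this route to architectural diversity grows factorially with the number of codons.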
DETAILED DESCRIPTION
[0022] The present invention facilitates the creation of large, diverse libraries of small molecules that are created by nucleic acid templated synthesis. This can be facilitated by one or more of the choice of specific templates, the choice of specific chemical reactants, capping processes, the choice of appropriate chemistries, and changing the order of synthesis steps during nucleic acid template synthesis. Each of the features is discussed in more detail below.
DEFINITIONS
[0023] The terms, "codon" and "anti-codon" as used herein, refer to complementary oligonucleotide sequences in a template and in a transfer unit, respectively, that permit the transfer unit to anneal to the template during nucleic acid-templated synthesis.
[0024] The term, "soluble reagent" as used herein refers to a chemical reagent or chemical moiety that is not linked to an oligonucleotide and does not participate in nucleic acid-templated synthesis. The soluble reagent can directly modify the small molecule attached to the oligonucleotide by chemical reaction independent of nucleic acid-templated addition. In comparison, the first and second reactive units of the template, for example, and the transfer units are attached to oligonucleotides that can participate in nucleic acid-templated synthesis.
[0025] The terms, "oligonucleotide" or "nucleic acid" as used herein refer to a polymer of nucleotides. The polymer may include, without limitation, natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyladenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages). Nucleic acids and oligonucleotides may also include other polymers of bases having a modified backbone, such as a locked nucleic acid (LNA), a peptide nucleic acid (PNA), a threose nucleic acid (TNA) and any other polymers capable of serving as a template for an amplification reaction using an amplification technique, for example, a polymerase chain reaction, a ligase chain reaction, or non-enzymatic template-directed replication.
[0026] The term, "reactive unit" as used herein, refers to a chemical reagent or chemical moiety that can participate in a chemical reaction with another chemical reagent or chemical moiety to produce a reaction intermediate and/or a reaction product (e.g., a small molecule).
[0027] The term, "reaction intermediate" as used herein, refers to a chemical reagent or a chemical moiety chemically transformed into a different reagent or chemical moiety with a soluble reagent.
[0028] The term, "small molecule" as used herein, refers to an organic compound either synthesized in the laboratory or found in nature having a molecular weight from about 300 Daltons (Da) to about 1,500 Da.
[0029] The term, "small molecule scaffold" as used herein, refers to a chemical compound having at least one site or chemical moiety suitable for functionalization. The small molecule scaffold or molecular scaffold may have two, three, four, five or more sites or chemical groups suitable for functionalization. These functionalization sites may be protected or masked as would be appreciated by one of skill in this art. The sites may also be found on an underlying ring structure or backbone. The small molecule scaffolds are not nucleic acids, nucleotides, or nucleotide analogs.
[0030] The term, "transfer unit" as used herein, refers to a molecule comprising an oligonucleotide having an anti-codon sequence attached to a reactive unit including, for example, but not limited to, a building block, monomer, small molecule scaffold, or other reactant useful in nucleic acid-templated chemical synthesis.
[0031] The term, "template" as used herein, refers to a molecule comprising an oligonucleotide having at least one codon sequence suitable for a nucleic acid-templated chemical synthesis. The template optionally may comprise (i) a plurality of codon sequences, (ii) an amplification means, for example, a PCR primer binding site or a sequence complementary thereto, (iii) a reactive unit associated therewith, (iv) one or more tag sequences, or (v) any combination of (i) - (iv).
[0032] Throughout the description, where compositions are described as having, including, or comprising specific components, or where processes are described as having, including, or comprising specific process steps, it is contemplated that compositions of the present invention also consist essentially of, or consist of, the recited components, and that the processes of the present invention also consist essentially of, or consist of, the recited processing steps. Further, it should be understood that the order of steps or order for performing certain actions are immaterial so long as the invention remains operable. Moreover, unless specified to the contrary, two or more steps or actions may be conducted simultaneously.
[0033] In one aspect, the invention provides a method for producing a library of small molecules associated with corresponding oligonucleotides. The method comprises the steps of (a) providing a plurality of templates comprising a plurality of first reactive units associated with a corresponding plurality of first oligonucleotides, wherein (i) each first oligonucleotide defines at least a first codon sequence, a second codon sequence, and a third codon sequence; (ii) each of the first, second and third codon sequence is at least 12 bases in length; (iii) each of the first, second and third codon sequence is different from one another; and (iv) each first oligonucleotide is at least 70 bases in length; (b) providing a plurality of first transfer units comprising a plurality of second reactive units covalently attached to a corresponding plurality of second oligonucleotides, wherein each second oligonucleotide defines a first anti-codon sequence complementary to a first codon sequence; (c) annealing first and second oligonucleotides having complementary codon and anti-codon sequences to bring the first and second reactive units into reactive proximity, thereby producing a plurality of first small molecules associated with the corresponding first oligonucleotides; (d) providing a plurality of second transfer units comprising a plurality of third reactive units covalently attached to a corresponding plurality of third oligonucleotides, wherein each third oligonucleotide defines a second anti-codon sequence complementary to the second codon sequence; (e) annealing first and third oligonucleotides having complementary codon and anti-codon sequences to bring the reaction products of step (c) and the third reactive units into reactive proximity thereby producing a plurality of second small molecules associated with the corresponding first oligonucleotides; (f) providing a plurality of third transfer units comprising a plurality of fourth reactive units covalently attached to a corresponding plurality of fourth oligonucleotides, wherein each fourth oligonucleotide defines a third anti-codon sequence complementary to the third codon sequence; and (g) annealing first and fourth oligonucleotides having complementary codon and anti-codon sequences to bring the reaction products of step (e) and the fourth reactive units into reactive proximity thereby producing a plurality of third small molecules associated with the corresponding first oligonucleotides, wherein each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the third small molecule associated therewith.
[0034] In another aspect, the invention provides a method for producing a library of small molecules associated with corresponding oligonucleotides comprising the steps of (a) providing a plurality of templates comprising a plurality of first reactive units associated with a corresponding plurality of first oligonucleotides, wherein (i) each first oligonucleotide defines at least a first codon sequence, a second codon sequence, and a third codon sequence; (ii) each of the first, second and third codon sequence is at least 12 bases in length; (iii) each of the first, second and third codon sequence is different from one another; and (iv) each first oligonucleotide is at least 70 bases in length; (b) dividing the plurality of templates into a plurality of aliquots; (c) for each aliquot (i) providing a plurality of first transfer units comprising a plurality of second reactive units covalently attached to a corresponding plurality of second oligonucleotides, wherein each second oligonucleotide defines a first anti-codon sequence complementary to a first codon sequence; (ii) annealing first and second oligonucleotides having complementary codon and anti-codon sequences to bring the first and second reactive units into reactive proximity, thereby producing a plurality of first small molecules associated with the corresponding first oligonucleotides; (iii) providing a plurality of second transfer units comprising a plurality of third reactive units covalently attached to a corresponding plurality of third oligonucleotides, wherein each third oligonucleotide defines a second anti-codon sequence complementary to the second codon sequence; (iv) annealing first and third oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of first small molecules of step (ii) and the third reactive units into reactive proximity thereby producing a plurality of second small molecules associated with the corresponding first oligonucleotides; (v) providing a plurality of third transfer units comprising a plurality of fourth reactive units covalently attached to a corresponding plurality of fourth oligonucleotides, wherein each fourth oligonucleotide defines a third anti-codon sequence complementary to the third codon sequence; (vi) annealing first and fourth oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of second small molecules of step (iv) and the fourth reactive units into reactive proximity thereby producing a plurality of third small molecules associated with the corresponding first oligonucleotides, wherein the order of adding the first, second, and third transfer units is different from any other aliquot; and each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the third small molecule associated therewith; and (d) optionally recombining two or more of the aliquots to create a library of small molecules.
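Because each aliquot receives the first, second, and third transfer units in an order different from every other aliquot, the number of distinct orders bounds the number of aliquots. The following is a minimal sketch of that count, assuming each set of transfer units is added exactly once per aliquot; the labels and variable names are illustrative only and do not come from the specification.

```python
from itertools import permutations

# Hypothetical labels for the three sets of transfer units.
transfer_units = ("first", "second", "third")

# Every distinct order in which the three sets can be added to an aliquot.
addition_orders = list(permutations(transfer_units))

for aliquot_number, order in enumerate(addition_orders, start=1):
    print(f"aliquot {aliquot_number}: " + " -> ".join(order))

# 3! distinct orders, so at most this many aliquots can each have a unique order.
max_aliquots = len(addition_orders)
print(max_aliquots)
```

Under the same assumption, a template with a fourth codon would allow 4! = 24 distinct orders.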
[0035] In certain embodiments, the first oligonucleotide comprises a fourth codon of at least 12 bases in length, and the method comprises the additional steps of providing a plurality of fourth transfer units comprising a plurality of fifth reactive units covalently attached to a corresponding plurality of fifth oligonucleotides, wherein each fifth oligonucleotide defines a fourth anti-codon sequence complementary to the fourth codon sequence; and annealing first and fifth oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of third small molecules of step (g) or (vi) and the fifth reactive units into reactive proximity thereby producing a plurality of fourth small molecules associated with the corresponding first oligonucleotides, wherein each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the fourth small molecule associated therewith.
[0036] The length of each codon region ensures sufficient base-pairing to give high-affinity duplex formation with a suitably high melting temperature such that the duplex will form and be maintained at ambient temperature. In certain embodiments, at least one codon is at least 14 bases in length. In certain embodiments, each of the first, second, third, and, when present, fourth codon is at least 14 bases in length. In certain embodiments, the first oligonucleotide is at least 90 bases in length.
[0037] To increase the diversity of compounds that can be produced, a soluble reagent (e.g., a free reactant not attached to an oligonucleotide-transfer unit) also can be used. For example, in certain embodiments, the third or fourth small molecule comprises a moiety that was added as a soluble reagent to the first oligonucleotide-associated small molecule of one or more of steps (c), (e), (g) or (i) (or (ii), (iv), (vi) or (viii)); and, optionally, wherein each of the first oligonucleotides comprises a nucleotide sequence that is informative of the soluble reagent-added moiety.
[0038] The structural complexity of small molecules generated by nucleic acid-templated synthesis can be further enhanced by the incorporation of a trivalent (i.e., trifunctional) building block, having three sites of attachment. In certain embodiments, at least one of the first, second, third, fourth or fifth reactive unit, or the soluble reagent, is a trivalent moiety.
[0039] To further generate diversity, in certain embodiments, the second, third or fourth small molecule or the soluble reagent comprises a reactive moiety or can be further modified or deprotected to reveal a reactive functional group capable of further reaction with a plurality of chemical moieties. For example, the reactive moiety capable of further reaction with another chemical moiety can be but is not limited to a nucleophilic primary or secondary amine or a free carboxyl group.
[0040] Diversity can also be increased by (i) splitting the library into a plurality of aliquots following addition of the reactive moiety capable of further reaction with a plurality of chemical moieties onto the first oligonucleotide-associated small molecules; and (ii) adding to each of the plurality of aliquots a different reagent that reacts with the reactive moiety present on the first oligonucleotide-associated small molecules present therein. Examples of different reagents include an acylating agent, a sulfonating agent, a heteroaryl halide reagent, reductive amination reagents and an amide-forming reagent. To identify the addition of a soluble reagent or other chemical modification, an identifying sequence (e.g., a tag) can be ligated to the 3' terminus of each of the plurality of templates.
[0041] In certain embodiments, the concentration of the plurality of templates is at least 90 nM and no greater than 500 nM at each step when a reactive unit is added. For example, the concentration of the plurality of templates can be 90 nM, 100 nM, 150 nM, 200 nM, 300 nM, 400 nM or 500 nM. In certain embodiments, one or more of the plurality of second, third, fourth or fifth oligonucleotides is bound to a first member of a binding pair, e.g., biotin. In certain embodiments, one or more of the plurality of first, second, third or fourth small molecules bound to a first member of a binding pair is purified by contact with a second member of a binding pair, wherein the second member of the binding pair (e.g., streptavidin) is bound to a solid support. Binding pair members can be any binding pairs known in the art, including, for example, biotin and avidin or streptavidin, antibody-antigen pairs, etc.
[0042] In certain embodiments, following the addition of one or more of the first, second, third, or fourth reactive unit, or the soluble reagent-added moiety, the plurality of templates is reacted with a capping reagent that differentially caps the small molecules that did not react with one or more of the prior-added reactive units or soluble reagents. As a result, the cap renders the small molecule that did not react with the one or more of the prior-added reactive units or soluble reagents unable to react with any further reactive units or soluble reagents. In this manner, these components can no longer participate in further chemical steps in the library preparation and are removed from the library pool upon stepwise purification. In certain embodiments, the capping reagent is an acid anhydride, e.g., acetic anhydride, or acyl chloride or other activated acylating group known to those skilled in the art.
[0043] In certain embodiments, each first oligonucleotide comprises a unique tag sequence that defines the linker or capping group, or any other structural modification to any small molecule that was not achieved through a DNA-templated reaction step.
[0044] As will be appreciated by those skilled in the art, the methods and compositions of the invention can be used to expand the number and diversity of small molecules produced during nucleic acid-templated chemical syntheses. A general discussion of these considerations follows.
I. TEMPLATE CONSIDERATIONS
[0045] The nucleic acid template can direct a wide variety of chemical reactions without obvious structural requirements by specifically recruiting reactants linked to complementary oligonucleotides. During synthesis, the template hybridizes or anneals to one or more transfer units to direct the synthesis of a reaction intermediate that can subsequently be converted by further chemical reaction into a reaction product (e.g., a small molecule). The reaction product then is selected or screened based on certain criteria, such as the ability to bind to a preselected target molecule. Once the reaction product has been identified, the associated template can then be sequenced to decode the synthetic history of the reaction intermediate and/or the reaction product.
(i) Template Format
[0046] The length of the template may vary greatly depending upon the type of the nucleic acid-templated synthesis contemplated. For example, in certain embodiments, the template may be from 20 to 400 nucleotides in length, from 30 to 300 nucleotides in length, from 40 to 200 nucleotides in length, or from 50 to 100 nucleotides in length, from 40 to 400 nucleotides or from 40 to 100 nucleotides in length. In certain embodiments, the template may be 40, 50, 60, 70, 80, 90, or 100 nucleotides in length. The length of the template will of course depend on, for example, the length of the codons, the complexity of the library, the complexity and/or size of a reaction product (e.g., a small molecule), the use of spacer sequences, etc.
(ii) Codon Usage
[0047] It is contemplated that the sequence of the template may be designed in a number of ways. For example, the length of the codon must be determined and the codon sequences must be set. If a codon length of two is used, then using the four naturally occurring bases only 16 possible combinations are available to be used in encoding the library. If the length of the codon is increased to three (the number Nature uses in encoding proteins), the number of possible combinations increases to 64. If the length of the codon is increased to four, the number of possible combinations increases to 256. Other factors to be considered in determining the length of the codon are mismatching, frame-shifting, complexity of library, etc. As the length of the codon is increased up to a certain point the number of mismatches is decreased; however, excessively long codons likely will hybridize despite mismatched base pairs.
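The codon counts above follow from raising the number of available bases to the power of the codon length. The following sketch reproduces that arithmetic; the function name is illustrative, and the 12-base case is included only because that length appears elsewhere in the description as a minimum codon length.

```python
def codon_combinations(length, alphabet_size=4):
    """Number of distinct sequences of a given length over the given number of bases."""
    if length < 1:
        raise ValueError("codon length must be at least 1")
    return alphabet_size ** length

# Lengths 2, 3 and 4 are the cases discussed in the text; 12 is shown for scale.
for codon_length in (2, 3, 4, 12):
    print(codon_length, codon_combinations(codon_length))
```

The count is an upper bound on sequence space only; a usable codon set is typically far smaller, because its members must also be well separated from one another by mismatches.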
[0048] Although the length of the codons may vary, the codons may range from 3 to 50 nucleotides, from 3 to 40 nucleotides, from 3 to 30 nucleotides, from 3 to 20 nucleotides, from 3 to 15 nucleotides, from 3 to 10 nucleotides, from 4 to 50 nucleotides, from 4 to 40 nucleotides, from 4 to 30 nucleotides, from 4 to 20 nucleotides, from 4 to 15 nucleotides, from 4 to 10 nucleotides, from 5 to 50 nucleotides, from 5 to 40 nucleotides, from 5 to 30 nucleotides, from 5 to 20 nucleotides, from 5 to 15 nucleotides, from 5 to 10 nucleotides, from 6 to 50 nucleotides, from 6 to 40 nucleotides, from 6 to 30 nucleotides, from 6 to 20 nucleotides, from 6 to 15 nucleotides, from 6 to 10 nucleotides, from 7 to 50 nucleotides, from 7 to 40 nucleotides, from 7 to 30 nucleotides, from 7 to 20 nucleotides, from 7 to 15 nucleotides, from 7 to 10 nucleotides, from 8 to 50 nucleotides, from 8 to 40 nucleotides, from 8 to 30 nucleotides, from 8 to 20 nucleotides, from 8 to 15 nucleotides, from 8 to 10 nucleotides, from 9 to 50 nucleotides, from 9 to 40 nucleotides, from 9 to 30 nucleotides, from 9 to 20 nucleotides, from 9 to 15 nucleotides, or from 9 to 10 nucleotides. In certain embodiments, codons are 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
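Mismatching, noted above as a factor in choosing codon length, can be quantified for a candidate codon set by counting the positions at which each pair of equal-length codons differs. The sketch below is one simple way to do this; the three 12-base sequences are made-up examples, not codons from the specification, and a position-by-position count is only a rough proxy for hybridization specificity (it ignores shifted alignments and thermodynamics).

```python
from itertools import combinations

def mismatches(codon_a, codon_b):
    """Count positions at which two equal-length codons differ."""
    if len(codon_a) != len(codon_b):
        raise ValueError("codons must have the same length")
    return sum(base_a != base_b for base_a, base_b in zip(codon_a, codon_b))

def min_pairwise_mismatches(codons):
    """Smallest mismatch count over all pairs in a codon set."""
    return min(mismatches(a, b) for a, b in combinations(codons, 2))

# Hypothetical 12-base codons used only to exercise the functions.
example_codons = ["ACGTACGTACGT", "TGCATGCATGCA", "AACCGGTTAACC"]

print(min_pairwise_mismatches(example_codons))
```

A larger minimum means that every anti-codon would have to tolerate more mismatched positions to anneal at the wrong codon.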
[0049] In one embodiment, the set of codons used in the template preferably maximizes the number of mismatches between any two codons within a codon set to ensure that only the proper anti-codons of the transfer units anneal to the codon sites of the template. Furthermore, it is important that the template has mismatches between all the members of one codon set and all the codons of a different codon set to ensure that the anti-codons do not inadvertently bind to the wrong codon set. The choice of exemplary codon sets and methods of creating functional codon sets are described, for example, in U.S. Patent Nos. 7,491,494; 7,771,935; and 8,206,914, by Liu et al. Using this and other approaches, different sets of codons can be generated so that no codons are repeated.
(iii) Tags
[0050] While the codons themselves are identifiers for the reactive units used in the synthesis of the reaction product (e.g., a small molecule), additional nucleic acid identifiers, "tag" sequences, may be incorporated into the template to identify a spacer moiety, linker moiety, capping reagent, or soluble reagent used in such synthesis. These additional nucleic acid (e.g., DNA) tag sequences are used to identify the reaction product covalently attached to the template at the end of the library synthesis, but they do not engage in DPC-catalyzed reactions. Instead, they are used to identify subsets of the library that have a particular linker, spacer, capping reagent, or soluble reagent. For example, a template oligonucleotide can be synthesized with a given tag sequence that corresponds to a specific linker moiety. The individual linker moieties are chemically attached to the 5'-amino terminus of the template, maintaining a direct relationship between the tag sequence and the linker moiety. The template-linker conjugates can then be mixed and used in DPC reactions directly. The final library products that are synthesized can be identified ultimately by sequencing the DNA, revealing the structure of the small molecule by a consideration of both the codon regions and the tag region, which defines the linker moiety.
[0051] Alternatively, the linker, spacer, capping moiety or soluble reagent can be added independently to a template mixture and then an identifying tag DNA sequence for such added reagent added to the 3'-end of the nucleic acid template by a ligation reaction. The tag is unique for the linker, spacer, capping moiety or soluble reagent. An advantage of this approach is that a linker, spacer, capping moiety or soluble reagent can be added at any stage of the synthesis, either before, after, or between DPC steps, and the identity of the product can be captured in the template sequence by adding additional DNA bases (e.g., a tag) that define the linker, spacer, capping moiety or soluble reagent. When a template is later amplified and sequenced, in addition to identifying small molecule structures by the base sequence of the codon regions, the sequence of the added tag can be identified and the identity of the non-DPC linker, spacer, capping moiety or soluble reagent can be determined.
[0052] As shown in FIG. 1, an exemplary template can comprise one or more (e.g., two) tag regions (e.g., a 7 base tag) encoding a linker, spacer, capping moiety or soluble reagent. In this example, Tag 2 corresponds to the linker of the attached small molecule, and Tag 1 to the spacer, although this relationship is not fixed and can be reversed (see also FIG. 3, showing that the fixed Tag 2 region defines the linker building block). As shown in FIG. 2, unlike codon sequences, the sequence encoding a tag (e.g., Tag 1 or Tag 2) stays constant for a given template. As shown in FIG. 4, when multiple mixtures (e.g., 16) of templates with unique tags/linkers are combined into a single library, the identity of the linker attached to a template can be determined by determining the sequence of Tag 2. Similarly, as shown in FIG. 5, when multiple mixtures (e.g., 16) of templates with unique tags/spacers are combined into a single library, the identity of the spacer attached to a given template can be determined by determining the sequence of Tag 1.
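The decoding step described above can be sketched as slicing a sequenced template into its regions and looking up the tag sequences. FIG. 1 is not reproduced here, so everything specific in the sketch is an assumption made for illustration: the order of regions, the 10-base fixed regions, the 7-base tags, the 12-base codons, and the tag-to-linker and tag-to-spacer tables are all hypothetical.

```python
# Hypothetical 5'-to-3' layout: (region name, length in bases). Not taken from FIG. 1.
LAYOUT = [
    ("fixed_5p", 10), ("tag2", 7), ("codon1", 12), ("codon2", 12),
    ("codon3", 12), ("tag1", 7), ("fixed_3p", 10),
]

# Hypothetical lookup tables; real tables would hold one entry per linker or spacer used.
LINKER_BY_TAG2 = {"AACCGGT": "linker A (hypothetical)"}
SPACER_BY_TAG1 = {"TTGGCCA": "spacer B (hypothetical)"}

def split_regions(template, layout=LAYOUT):
    """Slice a template sequence into named regions according to the layout."""
    expected_length = sum(length for _, length in layout)
    if len(template) != expected_length:
        raise ValueError(f"expected {expected_length} bases, got {len(template)}")
    regions, position = {}, 0
    for name, length in layout:
        regions[name] = template[position:position + length]
        position += length
    return regions

def decode(template):
    """Report the linker, spacer and codon sequences encoded by a template."""
    regions = split_regions(template)
    return {
        "linker": LINKER_BY_TAG2.get(regions["tag2"], "unknown"),
        "spacer": SPACER_BY_TAG1.get(regions["tag1"], "unknown"),
        "codons": [regions["codon1"], regions["codon2"], regions["codon3"]],
    }

example_template = ("G" * 10 + "AACCGGT" + "ACGTACGTACGT" + "TGCATGCATGCA"
                    + "AACCGGTTAACC" + "TTGGCCA" + "C" * 10)
print(decode(example_template))
```

Because the tag sequences stay constant for a given template mixture, a single lookup per tag is enough to recover the linker or spacer after mixtures have been pooled.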
[0053] Although the length of the tag region may vary, the tag regions may range from 3 to 30 nucleotides, from 3 to 20 nucleotides, from 3 to 15 nucleotides, from 3 to 10 nucleotides, from 4 to 30 nucleotides, from 4 to 20 nucleotides, from 4 to 15 nucleotides, from 4 to 10 nucleotides, from 5 to 30 nucleotides, from 5 to 20 nucleotides, from 5 to 15 nucleotides, from 5 to 10 nucleotides, from 6 to 30 nucleotides, from 6 to 20 nucleotides, from 6 to 15 nucleotides, from 6 to 10 nucleotides, from 7 to 30 nucleotides, from 7 to 20 nucleotides, from 7 to 15 nucleotides, from 7 to 10 nucleotides, from 8 to 30 nucleotides, from 8 to 20 nucleotides, from 8 to 15 nucleotides, from 8 to 10 nucleotides, from 9 to 30 nucleotides, from 9 to 20 nucleotides, from 9 to 15 nucleotides, or from 9 to 10 nucleotides. In certain embodiments, tag regions are 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
(iv) Fixed Sequence Regions
[0054] As shown in FIG. 1, the 3'- and 5'-ends of the nucleic acid template optionally may comprise "fixed sequence regions" or "fixed regions" of bases (e.g., 10 bases), which represent polymerase chain reaction (PCR) primer sites. These sites can be used to facilitate PCR amplification of the DNA sequences, the specific base sequence of which is used to directly identify library components. In certain embodiments, a PCR primer site comprises bases from both the fixed sequence region and an adjacent region of the template, e.g., a portion of a tag or codon. By determining the particular DNA base sequence of oligonucleotides attached to the small molecule it is possible to identify compounds that have affinity for target proteins.
[0055] Although the length of the fixed sequence region can vary, the fixed sequence regions can range from 2 to 50 nucleotides, from 2 to 40 nucleotides, from 2 to 30 nucleotides, from 2 to 20 nucleotides, from 2 to 15 nucleotides, from 2 to 10 nucleotides, from 3 to 50 nucleotides, from 3 to 40 nucleotides, from 3 to 30 nucleotides, from 3 to 20 nucleotides, from 3 to 15 nucleotides, from 3 to 10 nucleotides, from 4 to 50 nucleotides, from 4 to 40 nucleotides, from 4 to 30 nucleotides, from 4 to 20 nucleotides, from 4 to 15 nucleotides, from 4 to 10 nucleotides, from 5 to 50 nucleotides, from 5 to 40 nucleotides, from 5 to 30 nucleotides, from 5 to 20 nucleotides, from 5 to 15 nucleotides, from 5 to 10 nucleotides, from 6 to 50 nucleotides, from 6 to 40 nucleotides, from 6 to 30 nucleotides, from 6 to 20 nucleotides, from 6 to 15 nucleotides, from 6 to 10 nucleotides, from 7 to 50 nucleotides, from 7 to 40 nucleotides, from 7 to 30 nucleotides, from 7 to 20 nucleotides, from 7 to 15 nucleotides, from 7 to 10 nucleotides, from 8 to 50 nucleotides, from 8 to 40 nucleotides, from 8 to 30 nucleotides, from 8 to 20 nucleotides, from 8 to 15 nucleotides, from 8 to 10 nucleotides, from 9 to 50 nucleotides, from 9 to 40 nucleotides, from 9 to 30 nucleotides, from 9 to 20 nucleotides, from 9 to 15 nucleotides, or from 9 to 10 nucleotides. In certain embodiments, fixed sequence regions are 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. While primers generally must be about 6 nucleotides in length, a fixed region can be less than 6 nucleotides if a PCR primer site comprises bases from both the fixed sequence region and an adjacent region of the template, e.g., a portion of a tag or codon.
(v) Template Synthesis
[0056] The templates can be synthesized using methodologies well known in the art. These methods include both in vivo and in vitro methods including PCR, plasmid preparation, endonuclease digestion, solid phase synthesis (for example, using an automated synthesizer), in vitro transcription, strand separation, etc. Following synthesis, the template, when desired, can be attached (for example, covalently or non-covalently attached) to a reactive unit of interest using standard coupling chemistries known in the art. In certain embodiments, a linker is used to attach a reactive unit of interest to the template. The linker can be bivalent (have two functional groups for attachment) or trivalent (have three functional groups for attachment). The linker can be defined by a tag sequence on the template, as described above.
[0057] An efficient method to synthesize a large variety of templates is to use a "split-pool" technique. The oligonucleotides are synthesized using standard phosphoramidite 3' to 5' chemistries, although alternatively, synthesis in the 5' to 3' direction can be performed. First, the constant 3' end is synthesized. This is then split into n different vessels, where n is the number of different codons to appear at that position in the template. For each vessel, one of the n different codons is synthesized on the (growing) 5' end of the constant 3' end. Thus, each vessel contains, from 5' to 3', a different codon attached to a constant 3' end. The n vessels then are pooled, so that a single vessel contains n different codons attached to the constant 3' end. Any constant bases adjacent the 5' end of the codon are now synthesized. The pool then is split into m different vessels, where m is the number of different codons to appear at the next (more 5') position of the template. A different codon is synthesized (at the 5' end of the growing oligonucleotide) in each of the m vessels. The resulting oligonucleotides are pooled in a single vessel. Splitting, synthesizing, and pooling are repeated as required to synthesize all codons and constant regions in the oligonucleotides.
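The split-pool procedure above can be mimicked in software by treating the pool as a list of sequence species written 5' to 3'. The sketch below is an illustration only: the function name, the short placeholder sequences, and the single optional constant region added between codon positions are assumptions, and the list tracks distinct sequences rather than molecules.

```python
def split_pool(constant_3p, codon_sets, spacer=""):
    """Enumerate the template sequences (5' to 3') produced by split-pool synthesis.

    constant_3p: the constant 3' end synthesized first.
    codon_sets:  one list of codons per codon position, ordered 3'-most first.
    spacer:      optional constant bases added to the pooled material between
                 codon positions (hypothetical; may be empty).
    """
    pool = [constant_3p]
    for position, codons in enumerate(codon_sets):
        if position > 0 and spacer:
            # Constant bases are synthesized on the pooled material before the next split.
            pool = [spacer + sequence for sequence in pool]
        # Split: one vessel per codon, each extending every sequence at its 5' end.
        vessels = [[codon + sequence for sequence in pool] for codon in codons]
        # Pool: recombine the contents of all vessels.
        pool = [sequence for vessel in vessels for sequence in vessel]
    return pool

# Placeholder sequences: two codons at the first position, three at the next.
templates = split_pool("GGGG", [["AAA", "CCC"], ["TTT", "GGA", "ACA"]], spacer="T")
print(len(templates))
for sequence in templates:
    print(sequence)
```

The number of sequences grows as the product of the codon set sizes (n × m × ...), while the number of synthesis vessels needed grows only as their sum.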
[0058] An exemplary method for producing diverse groups of templates is shown in FIG. 2. A template can be constructed comprising, e.g., a fixed region 3' and 5' of the template, two tags (Tag 1 and Tag 2), and three codons. Each of the three codons has 24 possible variants, for a total of 13,824 different DNA sequences. As shown in FIG. 4, larger libraries can be produced by combining multiple mixtures of templates encoding different DNA sequences. For example (and as shown in FIG. 4), 16 mixtures of templates encoding 13,824 different DNA sequences can be combined to produce a single library of, e.g., 221,184 different DNA sequences. As shown in FIG. 6, even larger libraries can be produced by using a fourth codon. Using 24 variants for the fourth codon will provide 331,776 different template sequences. When 16 different four-codon template mixtures are pooled, a library of 5,308,416 unique compounds is created, which is a 24-fold increase in library diversity as compared to a 16-mixture pool of three-codon templates, which provides only 221,184 compounds.
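The library sizes quoted above can be checked with a few lines of arithmetic. In the sketch below the variable and function names are illustrative; the numeric inputs (24 variants per codon, 16 mixtures, three or four codons) are those given in the text.

```python
VARIANTS_PER_CODON = 24
MIXTURES = 16

def template_sequences(codon_count, variants=VARIANTS_PER_CODON):
    """Distinct templates in one mixture: variants raised to the number of codons."""
    return variants ** codon_count

three_codon = template_sequences(3)           # one three-codon mixture
four_codon = template_sequences(4)            # one four-codon mixture
pooled_three = MIXTURES * three_codon         # 16 pooled three-codon mixtures
pooled_four = MIXTURES * four_codon           # 16 pooled four-codon mixtures
fold_increase = pooled_four // pooled_three   # gain from adding the fourth codon

print(three_codon, pooled_three, four_codon, pooled_four, fold_increase)
```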
II. TRANSFER UNITS
[0059] A transfer unit comprises an oligonucleotide containing an anti-codon sequence and a reactive unit. The anti-codons are designed to be complementary to the codons present in the template. Accordingly, the sequences used in the template and the codon lengths should be considered when designing the anti-codons. Any molecule complementary to a codon used in the template may be used, including natural or non-natural nucleotides. In certain embodiments, the codons include one or more bases found in nature (i.e., thymine, uracil, guanine, cytosine, and adenine).
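For DNA codons built from the four natural bases, an anti-codon that anneals in the usual antiparallel orientation is the reverse complement of the codon. The sketch below assumes standard Watson-Crick pairing and an unmodified A/C/G/T alphabet; the function name and the example codon are illustrative and not taken from the specification.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def anticodon_for(codon):
    """Return the reverse complement of a DNA codon, written 5' to 3'."""
    codon = codon.upper()
    unexpected = set(codon) - set("ACGT")
    if unexpected:
        raise ValueError(f"unsupported bases: {sorted(unexpected)}")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return codon.translate(_COMPLEMENT)[::-1]

example_codon = "ACGTTGCAAGGC"  # hypothetical 12-base codon
print(anticodon_for(example_codon))
```

Non-natural nucleotides or modified backbones, which the description also permits, would need their own pairing rules and are not covered by this sketch.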
[0060] As discussed above, the anti-codon is associated with a particular type of reactive unit to form a transfer unit.
[0061] In certain other embodiments, where a small molecule library is to be created, the anti-codon can be associated with a reactive unit or reactant that is used to modify a small molecule scaffold. In certain embodiments, the reactant is linked to the anti-codon via a connector long enough to allow the reactant to come into reactive proximity with the small molecule scaffold. The connector preferably has a length and composition to permit a specific reaction between the annealed template and reactant, while minimizing and preferably preventing the occurrence of non-specific reactions (e.g., non-specific intramolecular reactions). The reactants include a variety of reagents as demonstrated by the wide range of reactions that can be utilized in nucleic acid-templated synthesis and can be any chemical group, catalyst (e.g., organometallic compounds), or reactive moiety (e.g., electrophiles, nucleophiles) known in the chemical arts.
[0062] Thus, the anti-codon can be associated with the reactant through a cleavable connector. The connector can be cleavable by light, oxidation, hydrolysis, exposure to acid, exposure to base, reduction, etc. Fruchtel et al. (1996) ANGEW. CHEM. INT. ED. ENGL. 35:17 describes a variety of linkages useful in the practice of the invention. The linker facilitates contact of the reactant with the small molecule scaffold and in certain embodiments, depending on the desired reaction, positions DNA as a leaving group ("autocleavable" strategy), or may link reactive groups to the template via the "scarless" connector strategy (which yields product without leaving behind an additional atom or atoms having chemical functionality), or a "useful scar" strategy (in which a portion of the connector is left behind to be functionalized in subsequent steps following connector cleavage).
[0063] With the "autocleavable" connector strategy, the DNA-reactive group bond is cleaved as a natural consequence of the reaction. In the "scarless" connector strategy, DNA-templated reaction of one reactive group is followed by cleavage of the connector attached through a second reactive group to yield products without leaving behind additional atoms capable of providing chemical functionality. Alternatively, a "useful scar" may be utilized on the theory that it may be advantageous to introduce useful atoms and/or chemical groups as a consequence of connector cleavage. In particular, a "useful scar" is left behind following connector cleavage and can be functionalized in subsequent steps.
[0064] The specific annealing of transfer units to templates permits the use of transfer units at concentrations lower than concentrations used in many traditional organic syntheses. Thus, transfer units can be used at submillimolar concentrations (e.g., less than 100 μM, less than 10 μM, less than 1 μM, less than 100 nM, or less than 10 nM).
III. LINKERS AND SPACERS
[0065] A linker can be defined by a tag, which can be used to identify the final reaction product covalently attached to the template. The linker preferably is multivalent, and can be bivalent, trivalent, tetravalent, etc. Exemplary linkers include a diamino acid, an azido-amino acid, an acetylenic amino acid, a haloaromatic functionalized amino acid, or any other similar multivalent building block, with the functionality either present or in a protected or precursor form. Exemplary protected or precursor forms include but are not limited to Boc, Alloc, or Fmoc carbamate forms, or amines can be generated from azides by using a reduction step.
[0066] Spacer moieties can be used to add additional diversity to small molecule libraries, to increase the size of small molecules (e.g., macrocycles), to increase the spacing between other moieties in molecules, or to introduce new and diverse functionality. A spacer can be defined by a tag (see FIGS. 1, 5 and 6), which can be used to identify the final reaction product covalently attached to the template. Exemplary spacers include but are not limited to amino acids, carboxy alkynes, bis-carboxylic acids, amino aldehydes, and bromo carboxylic acids.
IV. CHEMICAL REACTIONS
[0067] A variety of small molecule compounds and/or libraries can be prepared using the methods described herein. In certain embodiments, compounds that are not, or do not resemble, nucleic acids or analogs thereof, are synthesized according to the method of the invention. It is contemplated that the small molecules can include macrocycles.
(i) Coupling Reactions for Small Molecule Synthesis
[0068] In synthesizing small molecules using the method of the present invention, an evolvable template can be used. The template can include a small molecule scaffold upon which the small molecule is to be built (e.g., a first reactive unit), or a small molecule scaffold may be added to the template. The small molecule scaffold can be any chemical compound with two or more sites for functionalization. For example, the small molecule scaffold can include a ring system (e.g., the ABCD steroid ring system found in cholesterol) with functionalizable groups coupled to the atoms making up the rings. In another example, the small molecule may be the underlying core scaffold structure of a pharmaceutical agent such as morphine, epothilone or a cephalosporin antibiotic. The sites or groups to be functionalized on the small molecule scaffold may be protected using methods and protecting groups known in the art. The protecting groups used in a small molecule scaffold may be orthogonal to one another so that protecting groups can be removed one at a time.
[0069] In this approach, the transfer units comprise an anti-codon associated with a reactant or a building block for use in modifying, adding to, or taking away from the small molecule scaffold. The reactants or building blocks may be, for example, electrophiles (e.g., anhydrides, acid chlorides, esters, nitriles, imines), nucleophiles (e.g., amines, hydroxyl groups, thiols), catalysts (e.g., organometallic catalysts), or side chains. The transfer units are allowed to contact the template under hybridizing conditions. As a result of oligonucleotide annealing, the attached reactant or building block is allowed to react with a site on the small molecule scaffold to produce one or more reaction intermediates. In certain embodiments, protecting groups on the small molecule template are removed one at a time from the sites to be functionalized so that the reactant of the transfer unit will react at only the desired position on the scaffold.
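The annealing step described above can be pictured with a minimal Python sketch. It is illustrative only: the six-base sequences and the reactant names are invented for this example, and perfect Watson-Crick complementarity is assumed as a simplification of hybridization, which in practice depends on conditions such as temperature and salt.

```python
# Illustrative sketch only: the sequences and reactant names are invented
# for this example and do not come from the disclosure.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def anneals(codon: str, anti_codon: str) -> bool:
    """Simplifying assumption: annealing requires a perfect complement."""
    return reverse_complement(codon) == anti_codon.upper()


def select_transfer_units(template_codons, transfer_units):
    """Map each template codon to the reactants whose anti-codon anneals to it."""
    return {
        codon: [reactant for anti_codon, reactant in transfer_units
                if anneals(codon, anti_codon)]
        for codon in template_codons
    }


# Hypothetical transfer units: (anti-codon, attached reactant).
transfer_units = [
    ("TGGCAT", "amine"),
    ("GAATCC", "acid chloride"),
    ("AAAAAA", "thiol"),
]
matches = select_transfer_units(["ATGCCA", "GGATTC"], transfer_units)
print(matches)
```

In this toy model the third transfer unit matches neither codon, so its reactant is never brought into proximity with the scaffold.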
[0070] The reaction conditions, linker, reactant, and site to be functionalized are chosen to avoid unwanted side reactions and accelerate desired intramolecular reactions. Sequential or simultaneous contacting of the template with transfer units can be employed depending on the particular compound to be synthesized.
[0071] After the sites on the scaffold have been modified, the newly synthesized small molecule remains associated with the template that encoded its synthesis. Decoding the sequence of the template permits the deconvolution of the synthetic history and thereby the structure of the small molecule.
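As a rough illustration of how decoding a template could recover the synthetic history, the following Python sketch reads a template sequence codon by codon and looks each codon up in a table. The codon length, the codon table, and the building-block descriptions are all hypothetical; a real template would also carry regions such as primer-binding sites that this sketch ignores.

```python
# Illustrative sketch only: the codon table, codon length, and building-block
# names are hypothetical and are not taken from the disclosure.
CODON_LENGTH = 6
CODON_TABLE = {
    "ATGCCA": "scaffold: diamino acid linker",
    "GGATTC": "step 1: acylation with building block A",
    "CTTGAC": "step 2: alkylation with building block B",
}


def decode_template(template, codon_table=CODON_TABLE, codon_length=CODON_LENGTH):
    """Split a template into fixed-length codons and return the encoded steps in order."""
    template = template.upper()
    if len(template) % codon_length != 0:
        raise ValueError("template length is not a whole number of codons")
    history = []
    for start in range(0, len(template), codon_length):
        codon = template[start:start + codon_length]
        if codon not in codon_table:
            raise KeyError(f"unknown codon: {codon}")
        history.append(codon_table[codon])
    return history


history = decode_template("ATGCCAGGATTCCTTGAC")
for step in history:
    print(step)
```

Because each codon maps to one building block or reaction step, the ordered list returned by the hypothetical `decode_template` stands in for the structure of the encoded small molecule.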
(ii) Classes of Chemical Reactions
[0072] Known chemical reactions for synthesizing polymers, small molecules, or other molecules can be used in nucleic acid-templated reactions. Thus, reactions such as those listed in March's Advanced Organic Chemistry, Organic Reactions, Organic Syntheses, organic text books, journals such as Journal of the American Chemical Society, Journal of Organic Chemistry, Tetrahedron, etc., and Carruthers' Some Modern Methods of Organic Chemistry can be used. The chosen reactions preferably are compatible with nucleic acids such as DNA or RNA or are compatible with the modified nucleic acids used as the template.
[0073] Reactions useful in nucleic acid-templated chemistry include, for example, substitution reactions, carbon-carbon bond forming reactions, elimination reactions, acylation reactions, and addition reactions. An illustrative but not exhaustive list of aliphatic nucleophilic substitution reactions useful in the present invention includes, for example, SN2 reactions, SN1 reactions, SNi reactions, allylic rearrangements, nucleophilic substitution at an aliphatic trigonal carbon, and nucleophilic substitution at an aromatic carbon.
[0074] Specific aliphatic nucleophilic substitution reactions with oxygen nucleophiles include, for example, hydrolysis of alkyl halides, hydrolysis of gem-dihalides, hydrolysis of 1,1,1-trihalides, hydrolysis of alkyl esters of inorganic acids, hydrolysis of diazo ketones, hydrolysis of acetal and enol ethers, hydrolysis of epoxides, hydrolysis of acyl halides, hydrolysis of anhydrides, hydrolysis of carboxylic esters, hydrolysis of amides, alkylation with alkyl halides (Williamson Reaction), epoxide formation, alkylation with inorganic esters, alkylation with diazo compounds, dehydration of alcohols, transetherification, alcoholysis of epoxides, alkylation with onium salts, hydroxylation of silanes, alcoholysis of acyl halides, alcoholysis of anhydrides, esterification of carboxylic acids, alcoholysis of carboxylic esters (transesterification), alcoholysis of amides, alkylation of carboxylic acid salts, cleavage of ether with acetic anhydride, alkylation of carboxylic acids with diazo compounds, acylation of carboxylic acids with acyl halides, acylation of carboxylic acids with activated carboxylic acids, formation of oxonium salts, preparation of peroxides and hydroperoxides, preparation of inorganic esters (e.g., nitrites, nitrates, sulfonates), preparation of alcohols from amines, and preparation of mixed organic-inorganic anhydrides.
[0075] Specific aliphatic nucleophilic substitution reactions with sulfur nucleophiles, which tend to be better nucleophiles than their oxygen analogs, include, for example, attack by SH at an alkyl carbon to form thiols, attack by S at an alkyl carbon to form thioethers, attack by SH or SR at an acyl carbon, formation of disulfides, formation of Bunte salts, alkylation of sulfinic acid salts, and formation of alkyl thiocyanates.
[0076] Aliphatic nucleophilic substitution reactions with nitrogen nucleophiles include, for example, alkylation of amines, N-arylation of amines, replacement of a hydroxy by an amino group, transamination, transamidation, alkylation of amines with diazo compounds, amination of epoxides, amination of oxetanes, amination of aziridines, amination of alkanes, formation of isocyanides, acylation of amines by acyl halides, acylation of amines by anhydrides, acylation of amines by carboxylic acids, acylation of amines by carboxylic esters, acylation of amines by amides, acylation of amines by other acid derivatives, N-alkylation or N-arylation of amides and imides, N-acylation of amides and imides, formation of aziridines from epoxides, formation of nitro compounds, formation of azides, formation of isocyanates and isothiocyanates, and formation of azoxy compounds.
[0077] Aliphatic nucleophilic substitution reactions with halogen nucleophiles include, for example, attack at an alkyl carbon, halide exchange, formation of alkyl halides from esters of sulfuric and sulfonic acids, formation of alkyl halides from alcohols, formation of alkyl halides from ethers, formation of halohydrins from epoxides, cleavage of carboxylic esters with lithium iodide, conversion of diazo ketones to α-halo ketones, conversion of amines to halides, conversion of tertiary amines to cyanamides (the von Braun reaction), formation of acyl halides from carboxylic acids, and formation of acyl halides from acid derivatives.
[0078] Aliphatic nucleophilic substitution reactions using hydrogen as a nucleophile include, for example, reduction of alkyl halides, reduction of tosylates, other sulfonates, and similar compounds, hydrogenolysis of alcohols, hydrogenolysis of esters (Barton-McCombie reaction), hydrogenolysis of nitriles, replacement of alkoxyl by hydrogen, reduction of epoxides, reductive cleavage of carboxylic esters, reduction of a C-N bond, desulfurization, reduction of acyl halides, reduction of carboxylic acids, esters, and anhydrides to aldehydes, and reduction of amides to aldehydes.
[0079] Although certain carbon nucleophiles may be too nucleophilic and/or basic to be used in certain embodiments of the invention, aliphatic nucleophilic substitution reactions using carbon nucleophiles include, for example, coupling with silanes, coupling of alkyl halides (the Wurtz reaction), the reaction of alkyl halides and sulfonate esters with Group I (IA) and II (IIA) organometallic reagents, reaction of alkyl halides and sulfonate esters with organocuprates, reaction of alkyl halides and sulfonate esters with other organometallic reagents, allylic and propargylic coupling with a halide substrate, coupling of organometallic reagents with esters of sulfuric and sulfonic acids, sulfoxides, and sulfones, coupling involving alcohols, coupling of organometallic reagents with carboxylic esters, coupling of organometallic reagents with compounds containing an ester linkage, reaction of organometallic reagents with epoxides, reaction of organometallics with aziridine, alkylation at a carbon bearing an active hydrogen, alkylation of ketones, nitriles, and carboxylic esters, alkylation of carboxylic acid salts, alkylation at a position α to a heteroatom (alkylation of 1,3-dithianes), alkylation of dihydro-1,3-oxazine (the Meyers synthesis of aldehydes, ketones, and carboxylic acids), alkylation with trialkylboranes, alkylation at an alkynyl carbon, preparation of nitriles, direct conversion of alkyl halides to aldehydes and ketones, conversion of alkyl halides, alcohols, or alkanes to carboxylic acids and their derivatives, the conversion of acyl halides to ketones with organometallic compounds, the conversion of anhydrides, carboxylic esters, or amides to ketones with organometallic compounds, the coupling of acyl halides, acylation at a carbon bearing an active hydrogen, acylation of carboxylic esters by carboxylic esters (the Claisen and Dieckmann condensation), acylation of ketones and nitriles with carboxylic esters, acylation of carboxylic acid salts, preparation of acyl cyanides, preparation of diazo ketones, and ketonic decarboxylation.
[0080] Reactions which involve nucleophilic attack at a sulfonyl sulfur atom may also be used in the present invention and include, for example, hydrolysis of sulfonic acid derivatives (attack by OH), formation of sulfonic esters (attack by OR), formation of sulfonamides (attack by nitrogen), formation of sulfonyl halides (attack by halides), reduction of sulfonyl chlorides (attack by hydrogen), and preparation of sulfones (attack by carbon).
[0081] Aromatic electrophilic substitution reactions may also be used in nucleotide-templated chemistry. Hydrogen exchange reactions are examples of aromatic electrophilic substitution reactions that use hydrogen as the electrophile. Aromatic electrophilic substitution reactions which use nitrogen electrophiles include, for example, nitration and nitro-de-hydrogenation, nitrosation or nitroso-de-hydrogenation, diazonium coupling, direct introduction of the diazonium group, and amination or amino-de-hydrogenation. Reactions of this type with sulfur electrophiles include, for example, sulfonation, sulfo-de-hydrogenation, halosulfonation, halosulfo-de-hydrogenation, sulfurization, and sulfonylation. Reactions using halogen electrophiles include, for example, halogenation, and halo-de-hydrogenation. Aromatic electrophilic substitution reactions with carbon electrophiles include, for example, Friedel-Crafts alkylation, alkylation, alkyl-de-hydrogenation, Friedel-Crafts arylation (the Scholl reaction), Friedel-Crafts acylation, formylation with disubstituted formamides, formylation with zinc cyanide and HCl (the Gattermann reaction), formylation with chloroform (the Reimer-Tiemann reaction), other formylations, formyl-de-hydrogenation, carboxylation with carbonyl halides, carboxylation with carbon dioxide (the Kolbe-Schmitt reaction), amidation with isocyanates, N-alkylcarbamoyl-de-hydrogenation, hydroxyalkylation, hydroxyalkyl-de-hydrogenation, cyclodehydration of aldehydes and ketones, haloalkylation, halo-de-hydrogenation, aminoalkylation, amidoalkylation, dialkylaminoalkylation, dialkylamino-de-hydrogenation, thioalkylation, acylation with nitriles (the Hoesch reaction), cyanation, and cyano-de-hydrogenation. Reactions using oxygen electrophiles include, for example, hydroxylation and hydroxy-de-hydrogenation.
[0082] Rearrangement reactions include, for example, the Fries rearrangement, migration of a nitro group, migration of a nitroso group (the Fischer-Hepp Rearrangement), migration of an arylazo group, migration of a halogen (the Orton rearrangement), migration of an alkyl group, etc. Other reactions on an aromatic ring include the reversal of a Friedel-Crafts alkylation, decarboxylation of aromatic aldehydes, decarboxylation of aromatic acids, the Jacobsen reaction, deoxygenation, desulfonation, hydro-de-sulfonation, dehalogenation, hydro-de-halogenation, and hydrolysis of organometallic compounds.
[0083] Aliphatic electrophilic substitution reactions are also useful. Reactions using the SE1, SE2 (front), SE2 (back), SEi, addition-elimination, and cyclic mechanisms can be used in the present invention. Reactions of this type with hydrogen as the leaving group include, for example, hydrogen exchange (deuterio-de-hydrogenation, deuteriation), migration of a double bond, and keto-enol tautomerization. Reactions with halogen electrophiles include, for example, halogenation of aldehydes and ketones, halogenation of carboxylic acids and acyl halides, and halogenation of sulfoxides and sulfones. Reactions with nitrogen electrophiles include, for example, aliphatic diazonium coupling, nitrosation at a carbon bearing an active hydrogen, direct formation of diazo compounds, conversion of amides to α-azido amides, direct amination at an activated position, and insertion by nitrenes. Reactions with sulfur or selenium electrophiles include, for example, sulfenylation, sulfonation, and selenylation of ketones and carboxylic esters. Reactions with carbon electrophiles include, for example, acylation at an aliphatic carbon, conversion of aldehydes to β-keto esters or ketones, cyanation, cyano-de-hydrogenation, alkylation of alkanes, the Stork enamine reaction, and insertion by carbenes. Reactions with metal electrophiles include, for example, metalation with organometallic compounds, metalation with metals and strong bases, and conversion of enolates to silyl enol ethers. Aliphatic electrophilic substitution reactions with metals as leaving groups include, for example, replacement of metals by hydrogen, reactions between organometallic reagents and oxygen, reactions between organometallic reagents and peroxides, oxidation of trialkylboranes to borates, conversion of Grignard reagents to sulfur compounds, halo-de-metalation, the conversion of organometallic compounds to amines, the conversion of organometallic compounds to ketones, aldehydes, carboxylic esters and amides, cyano-de-metalation, transmetalation with a metal, transmetalation with a metal halide, transmetalation with an organometallic compound, reduction of alkyl halides, metallo-de-halogenation, replacement of a halogen by a metal from an organometallic compound, decarboxylation of aliphatic acids, cleavage of alkoxides, replacement of a carboxyl group by an acyl group, basic cleavage of β-keto esters and β-diketones, haloform reaction, cleavage of non-enolizable ketones, the Haller-Bauer reaction, cleavage of alkanes, decyanation, and hydro-de-cyanation. Electrophilic substitution reactions at nitrogen include, for example, diazotization, conversion of hydrazines to azides, N-nitrosation, N-nitroso-de-hydrogenation, conversion of amines to azo compounds, N-halogenation, N-halo-de-hydrogenation, reactions of amines with carbon monoxide, and reactions of amines with carbon dioxide.
[0084] Aromatic nucleophilic substitution reactions may also be used in the present invention. Reactions proceeding via the SNAr mechanism, the SN1 mechanism, the benzyne mechanism, the SRN1 mechanism, or other mechanism, for example, can be used. Aromatic nucleophilic substitution reactions with oxygen nucleophiles include, for example, hydroxy-de-halogenation, alkali fusion of sulfonate salts, and replacement of OR or OAr. Reactions with sulfur nucleophiles include, for example, replacement by SH or SR. Reactions using nitrogen nucleophiles include, for example, replacement by NH2, NHR, or NR2, and replacement of a hydroxy group by an amino group. Reactions with halogen nucleophiles include, for example, the introduction of halogens. Aromatic nucleophilic substitution reactions with hydrogen as the nucleophile include, for example, reduction of phenols and phenolic esters and ethers, and reduction of halides and nitro compounds. Reactions with carbon nucleophiles include, for example, the Rosenmund-von Braun reaction, coupling of organometallic compounds with aryl halides, ethers, and carboxylic esters, arylation at a carbon containing an active hydrogen, conversions of aryl substrates to carboxylic acids, their derivatives, aldehydes, and ketones, and the Ullmann reaction. Reactions with hydrogen as the leaving group include, for example, alkylation, arylation, and amination of nitrogen heterocycles. Reactions with N2+ as the leaving group include, for example, hydroxy-de-diazoniation, replacement by sulfur-containing groups, iodo-de-diazoniation, and the Schiemann reaction. Rearrangement reactions include, for example, the von Richter rearrangement, the Sommelet-Hauser rearrangement, rearrangement of aryl hydroxylamines, and the Smiles rearrangement.
[0085] Reactions involving free radicals can also be used, although the free radical reactions used in nucleotide-templated chemistry should be carefully chosen to avoid modification or cleavage of the nucleotide template. With that limitation, free radical substitution reactions can be used in the present invention. Particular free radical substitution reactions include, for example, substitution by halogen, halogenation at an alkyl carbon, allylic halogenation, benzylic halogenation, halogenation of aldehydes, hydroxylation at an aliphatic carbon, hydroxylation at an aromatic carbon, oxidation of aldehydes to carboxylic acids, formation of cyclic ethers, formation of hydroperoxides, formation of peroxides, acyloxylation, acyloxy-de-hydrogenation, chlorosulfonation, nitration of alkanes, direct conversion of aldehydes to amides, amidation and amination at an alkyl carbon, simple coupling at a susceptible position, coupling of alkynes, arylation of aromatic compounds by diazonium salts, arylation of activated alkenes by diazonium salts (the Meerwein arylation), arylation and alkylation of alkenes by organopalladium compounds (the Heck reaction), arylation and alkylation of alkenes by vinyltin compounds (the Stille reaction), alkylation and arylation of aromatic compounds by peroxides, photochemical arylation of aromatic compounds, and alkylation, acylation, and carbalkoxylation of nitrogen heterocycles. Particular reactions in which N2+ is the leaving group include, for example, replacement of the diazonium group by hydrogen, replacement of the diazonium group by chlorine or bromine, nitro-de-diazoniation, replacement of the diazonium group by sulfur-containing groups, aryl dimerization with diazonium salts, methylation of diazonium salts, vinylation of diazonium salts, arylation of diazonium salts, and conversion of diazonium salts to aldehydes, ketones, or carboxylic acids. Free radical substitution reactions with metals as leaving groups include, for example, coupling of Grignard reagents, coupling of boranes, and coupling of other organometallic reagents. Reactions with halogen as the leaving group are included. Other free radical substitution reactions with various leaving groups include, for example, desulfurization with Raney Nickel, conversion of sulfides to organolithium compounds, decarboxylative dimerization (the Kolbe reaction), the Hunsdiecker reaction, decarboxylative allylation, and decarbonylation of aldehydes and acyl halides.
[0086] Reactions involving additions to carbon-carbon multiple bonds are also used in nucleotide-templated chemistry. Any mechanism may be used in the addition reaction including, for example, electrophilic addition, nucleophilic addition, free radical addition, and cyclic mechanisms. Reactions involving additions to conjugated systems can also be used. Addition to cyclopropane rings can also be utilized. Particular reactions include, for example, isomerization, addition of hydrogen halides, hydration of double bonds, hydration of triple bonds, addition of alcohols, addition of carboxylic acids, addition of H2S and thiols, addition of ammonia and amines, addition of amides, addition of hydrazoic acid, hydrogenation of double and triple bonds, other reduction of double and triple bonds, reduction of the double and triple bonds of conjugated systems, hydrogenation of aromatic rings, reductive cleavage of cyclopropanes, hydroboration, other hydrometalations, addition of alkanes, addition of alkenes and/or alkynes to alkenes and/or alkynes (e.g., pi-cation cyclization reactions, hydro-alkenyl-addition), ene reactions, the Michael reaction, addition of organometallics to double and triple bonds not conjugated to carbonyls, the addition of two alkyl groups to an alkyne, 1,4-addition of organometallic compounds to activated double bonds, addition of boranes to activated double bonds, addition of tin and mercury hydrides to activated double bonds, acylation of activated double bonds and of triple bonds, addition of alcohols, amines, carboxylic esters, aldehydes, etc., carbonylation of double and triple bonds, hydrocarboxylation, hydroformylation, addition of aldehydes, addition of HCN, addition of silanes, radical addition, radical cyclization, halogenation of double and triple bonds (addition of halogen, halogen), halolactonization, halolactamization, addition of hypohalous acids and hypohalites (addition of halogen, oxygen), addition of sulfur compounds (addition of halogen, sulfur), addition of halogen and an amino group (addition of halogen, nitrogen), addition of NOX and NO2X (addition of halogen, nitrogen), addition of XN3 (addition of halogen, nitrogen), addition of alkyl halides (addition of halogen, carbon), addition of acyl halides (addition of halogen, carbon), hydroxylation (addition of oxygen, oxygen) (e.g., asymmetric dihydroxylation reaction with OsO4), dihydroxylation of aromatic rings, epoxidation (addition of oxygen, oxygen) (e.g., Sharpless asymmetric epoxidation), photooxidation of dienes (addition of oxygen, oxygen), hydroxysulfenylation (addition of oxygen, sulfur), oxyamination (addition of oxygen, nitrogen), diamination (addition of nitrogen, nitrogen), formation of aziridines (addition of nitrogen), aminosulfenylation (addition of nitrogen, sulfur), acylacyloxylation and acylamidation (addition of oxygen, carbon or nitrogen, carbon), 1,3-dipolar cycloaddition (addition of oxygen, nitrogen, carbon), Huisgen reaction of azides and acetylenes, Diels-Alder reaction, heteroatom Diels-Alder reaction, all-carbon 3+2 cycloadditions, dimerization of alkenes, the addition of carbenes and carbenoids to double and triple bonds, trimerization and tetramerization of alkynes, and other cycloaddition reactions.
[0087] In addition to reactions involving additions to carbon-carbon multiple bonds, addition reactions to carbon-hetero multiple bonds can be used in nucleotide-templated chemistry.
Exemplary reactions include, for example, the addition of water to aldehydes and ketones (formation of hydrates), hydrolysis of carbon-nitrogen double bond, hydrolysis of aliphatic nitro compounds, hydrolysis of nitriles, addition of alcohols and thiols to aldehydes and ketones, reductive alkylation of alcohols, addition of alcohols to isocyanates, alcoholysis of nitriles, formation of xanthates, addition of H2S and thiols to carbonyl compounds, formation of bisulfite addition products, addition of amines to aldehydes and ketones, addition of amides to aldehydes, reductive alkylation of ammonia or amines, the Mannich reaction, the addition of amines to isocyanates, addition of ammonia or amines to nitriles, addition of amines to carbon disulfide and carbon dioxide, addition of hydrazine derivative to carbonyl compounds, formation of oximes, conversion of aldehydes to nitriles, formation of gem-dihalides from aldehydes and ketones, reduction of aldehydes and ketones to alcohols, reduction of the carbon-nitrogen double bond, reduction of nitriles to amines, reduction of nitriles to aldehydes, addition of Grignard reagents and organolithium reagents to aldehydes and ketones, addition of other organometallics to aldehydes and ketones, addition of trialkylallylsilanes to aldehydes and ketones, addition of conjugated alkenes to aldehydes (the Baylis-Hillman reaction), the Reformatsky reaction, the conversion of carboxylic acid salts to ketones with organometallic compounds, the addition of Grignard reagents to acid derivatives, the addition of organometallic compounds to CO2 and CS2, addition of organometallic compounds to C=N compounds, addition of carbenes and diazoalkanes to C=N compounds, addition of Grignard reagents to nitriles and isocyanates, the Aldol reaction, Mukaiyama Aldol and related reactions, Aldol-type reactions between carboxylic esters or amides and aldehydes or ketones, the Knoevenagel reaction (e.g., the Nef reaction, the Favorskii reaction), the Peterson alkenylation reaction, the addition of active hydrogen compounds to CO2 and CS2, the Perkin reaction, Darzens glycidic ester condensation, the Tollens' reaction, the Wittig reaction, the Tebbe alkenylation, the Petasis alkenylation, alternative alkenylations, the Thorpe reaction, the Thorpe-Ziegler reaction, addition of silanes, formation of cyanohydrins, addition of HCN to C=N and C≡N bonds, the Prins reaction, the benzoin condensation, addition of radicals to C=O, C=S, C=N compounds, the Ritter reaction, acylation of aldehydes and ketones, addition of aldehydes to aldehydes, the addition of isocyanates to isocyanates (formation of carbodiimides), the conversion of carboxylic acid salts to nitriles, the formation of epoxides from aldehydes and ketones, the formation of episulfides and episulfones, the formation of β-lactones and oxetanes (e.g., the Paternò-Büchi reaction), the formation of β-lactams, etc. Reactions involving addition to isocyanides include the addition of water to isocyanides, the Passerini reaction, the Ugi reaction, and the formation of metalated aldimines.
[0088] Elimination reactions, including α, β, and γ eliminations, as well as extrusion reactions, can be performed using nucleotide-templated chemistry, although the strength of the reagents and conditions employed should be considered. Preferred elimination reactions include reactions that go by E1, E2, E1cB, or E2C mechanisms. Exemplary reactions include, for example, reactions in which hydrogen is removed from one side (e.g., dehydration of alcohols, cleavage of ethers to alkenes, the Chugaev reaction, ester decomposition, cleavage of quaternary ammonium hydroxides, cleavage of quaternary ammonium salts with strong bases, cleavage of amine oxides, pyrolysis of keto-ylids, decomposition of toluene-p-sulfonylhydrazones, cleavage of sulfoxides, cleavage of selenoxides, cleavage of sulfones, dehydrohalogenation of alkyl halides, dehydrohalogenation of acyl halides, dehydrohalogenation of sulfonyl halides, elimination of boranes, conversion of alkenes to alkynes, decarbonylation of acyl halides), reactions in which neither leaving atom is hydrogen (e.g., deoxygenation of vicinal diols, cleavage of cyclic thionocarbonates, conversion of epoxides to episulfides and alkenes, the Ramberg-Bäcklund reaction, conversion of aziridines to alkenes, dehalogenation of vicinal dihalides, dehalogenation of α-halo acyl halides, and elimination of a halogen and a hetero group), fragmentation reactions (i.e., reactions in which carbon is the positive leaving group or the electrofuge, such as, for example, fragmentation of γ-amino and γ-hydroxy halides, fragmentation of 1,3-diols, decarboxylation of β-hydroxy carboxylic acids, decarboxylation of β-lactones, fragmentation of α,β-epoxy hydrazones, elimination of CO from bridged bicyclic compounds, and elimination of CO2 from bridged bicyclic compounds), reactions in which C≡N or C=N bonds are formed (e.g., dehydration of aldoximes or similar compounds, conversion of ketoximes to nitriles, dehydration of unsubstituted amides, and conversion of N-alkylformamides to isocyanides), reactions in which C=O bonds are formed (e.g., pyrolysis of β-hydroxy alkenes), and reactions in which N=N bonds are formed (e.g., eliminations to give diazoalkenes). Extrusion reactions include, for example, extrusion of N2 from pyrazolines, extrusion of N2 from pyrazoles, extrusion of N2 from triazolines, extrusion of CO, extrusion of CO2, extrusion of SO2, the Story synthesis, and alkene synthesis by twofold extrusion.
[0089] Rearrangements, including, for example, nucleophilic rearrangements, electrophilic rearrangements, prototropic rearrangements, and free-radical rearrangements, can also be performed using nucleotide-templated chemistry. Both 1,2 rearrangements and non-1,2 rearrangements can be performed. Exemplary reactions include, for example, carbon-to-carbon migrations of R, H, and Ar (e.g., Wagner-Meerwein and related reactions, the Pinacol rearrangement, ring expansion reactions, ring contraction reactions, acid-catalyzed rearrangements of aldehydes and ketones, the dienone-phenol rearrangement, the Favorskii rearrangement, the Arndt-Eistert synthesis, homologation of aldehydes, and homologation of ketones), carbon-to-carbon migrations of other groups (e.g., migrations of halogen, hydroxyl, amino, etc.; migration of boron; and the Neber rearrangement), carbon-to-nitrogen migrations of R and Ar (e.g., the Hofmann rearrangement, the Curtius rearrangement, the Lossen rearrangement, the Schmidt reaction, the Beckmann rearrangement, the Stieglitz rearrangement, and related rearrangements), carbon-to-oxygen migrations of R and Ar (e.g., the Baeyer-Villiger rearrangement and rearrangement of hydroperoxides), nitrogen-to-carbon, oxygen-to-carbon, and sulfur-to-carbon migration (e.g., the Stevens rearrangement, and the Wittig rearrangement), boron-to-carbon migrations (e.g., conversion of boranes to alcohols (primary or otherwise), conversion of boranes to aldehydes, conversion of boranes to carboxylic acids, conversion of vinylic boranes to alkenes, formation of alkynes from boranes and acetylides, formation of alkenes from boranes and acetylides, and formation of ketones from boranes and acetylides), electrocyclic rearrangements (e.g., of cyclobutenes and 1,3-cyclohexadienes, or conversion of stilbenes to phenanthrenes), sigmatropic rearrangements (e.g., (1,j) sigmatropic migrations of hydrogen, (1,j) sigmatropic migrations of carbon, conversion of vinylcyclopropanes to cyclopentenes, the Cope rearrangement, the Claisen rearrangement, the Fischer indole synthesis, (2,3) sigmatropic rearrangements, and the benzidine rearrangement), other cyclic rearrangements (e.g., metathesis of alkenes, the di-π-methane and related rearrangements, and the Hofmann-Löffler and related reactions), and non-cyclic rearrangements (e.g., hydride shifts, the Chapman rearrangement, the Wallach rearrangement, and dyotropic rearrangements).
[0090] Oxidative and reductive reactions may also be performed using nucleotide-templated chemistry. Exemplary reactions may involve, for example, direct electron transfer, hydride transfer, hydrogen-atom transfer, formation of ester intermediates, displacement mechanisms, or addition-elimination mechanisms. Exemplary oxidations include, for example, eliminations of hydrogen (e.g., aromatization of six-membered rings, dehydrogenations yielding carbon-carbon double bonds, oxidation or dehydrogenation of alcohols to aldehydes and ketones, oxidation of phenols and aromatic amines to quinones, oxidative cleavage of ketones, oxidative cleavage of aldehydes, oxidative cleavage of alcohols, ozonolysis, oxidative cleavage of double bonds and aromatic rings, oxidation of aromatic side chains, oxidative decarboxylation, and bisdecarboxylation), reactions involving replacement of hydrogen by oxygen (e.g., oxidation of methylene to carbonyl, oxidation of methylene to OH, CO2R, or OR, oxidation of arylmethanes, oxidation of ethers to carboxylic esters and related reactions, oxidation of aromatic hydrocarbons to quinones, oxidation of amines or nitro compounds to aldehydes, ketones, or dihalides, oxidation of primary alcohols to carboxylic acids or carboxylic esters, oxidation of alkenes to aldehydes or ketones, oxidation of amines to nitroso compounds and hydroxylamines, oxidation of primary amines, oximes, azides, isocyanates, or nitroso compounds, to nitro compounds, oxidation of thiols and other sulfur compounds to sulfonic acids), reactions in which oxygen is added to the substrate (e.g., oxidation of alkynes to α-diketones, oxidation of tertiary amines to amine oxides, oxidation of thioethers to sulfoxides and sulfones, and oxidation of carboxylic acids to peroxy acids), and oxidative coupling reactions (e.g., coupling involving carbanions, dimerization of silyl enol ethers or of lithium enolates, and oxidation of thiols to disulfides).
[0091] Exemplary reductive reactions include, for example, reactions involving replacement of oxygen by hydrogen (e.g., reduction of carbonyl to methylene in aldehydes and ketones, reduction of carboxylic acids to alcohols, reduction of amides to amines, reduction of carboxylic esters to ethers, reduction of cyclic anhydrides to lactones and acid derivatives to alcohols, reduction of carboxylic esters to alcohols, reduction of carboxylic acids and esters to alkanes, complete reduction of epoxides, reduction of nitro compounds to amines, reduction of nitro compounds to hydroxylamines, reduction of nitroso compounds and hydroxylamines to amines, reduction of oximes to primary amines or aziridines, reduction of azides to primary amines, reduction of nitrogen compounds, and reduction of sulfonyl halides and sulfonic acids to thiols), removal of oxygen from the substrate (e.g., reduction of amine oxides and azoxy compounds, reduction of sulfoxides and sulfones, reduction of hydroperoxides and peroxides, and reduction of aliphatic nitro compounds to oximes or nitriles), reductions that include cleavage (e.g., de-alkylation of amines and amides, reduction of azo, azoxy, and hydrazo compounds to amines, and reduction of disulfides to thiols), reductive coupling reactions (e.g., bimolecular reduction of aldehydes and ketones to 1,2-diols, bimolecular reduction of aldehydes or ketones to alkenes, acyloin ester condensation, reduction of nitro to azoxy compounds, and reduction of nitro to azo compounds), and reductions in which an organic substrate is both oxidized and reduced (e.g., the Cannizzaro reaction, the Tishchenko reaction, the Pummerer rearrangement, and the Willgerodt reaction).
(iii) Functional Group Transformations
[0092] Nucleic acid-templated synthesis can be used to effect functional group
transformations that either (i) unmask or (ii) interconvert functionality used in coupling reactions. By exposing or creating a reactive group within a sequence-programmed subset of a library, nucleic acid-templated functional group interconversions permit the generation of library diversity by sequential unmasking. The sequential unmasking approach offers the major advantage of enabling reactants that would normally lack the ability to be linked to a nucleic acid (for example, simple alkyl halides) to contribute to library diversity by reacting with a sequence-specified subset of templates in an intermolecular, non-templated reaction mode. This advantage significantly increases the types of structures that can be generated.
[0093] One embodiment of the invention involves deprotection or unmasking of functional groups present in a reactive unit. According to this embodiment, a nucleic acid-template is associated with a reactive unit that contains a protected functional group. A transfer unit, comprising an oligonucleotide complementary to the template codon region and a reagent capable of removing the protecting group, is annealed to the template, and the reagent reacts with the protecting group, removing it from the reactive unit. To further functionalize the reactive unit, the exposed functional group then is subjected to a reagent not linked to a nucleic acid. In some embodiments, the reactive unit contains two or more protected functional groups. In still other embodiments, the protecting groups are orthogonal protecting groups that are sequentially removed by iterated annealing with reagents linked to transfer units.
[0094] Another embodiment of the invention involves interconversions of functional groups present on a reactive unit. According to this embodiment, a transfer unit associated with a reagent that can catalyze a reaction is annealed to a template bearing the reactive unit. A reagent not linked to a nucleic acid is added to the reaction, and the transfer unit reagent catalyzes the reaction between the unlinked reagent and the reactive unit, yielding a newly functionalized reactive unit. In some embodiments, the reactive unit contains two or more functional groups which are sequentially interconverted by iterative exposure to different transfer unit-bound reagents.
(iv) Reaction Conditions
[0095] Nucleic acid-templated reactions can occur in aqueous or non-aqueous (i.e., organic) solutions, or a mixture of one or more aqueous and non-aqueous solutions. In aqueous solutions, reactions can be performed at pH ranges from about 2 to about 12, or preferably from about 2 to about 10, or more preferably from about 4 to about 10. The reactions used in DNA-templated chemistry preferably should not require very basic conditions (e.g., pH > 12, pH > 10) or very acidic conditions (e.g., pH < 1, pH < 2, pH < 4), because extreme conditions may lead to degradation or modification of the nucleic acid template and/or molecule (for example, the polymer, or small molecule) being synthesized. The aqueous solution can contain one or more inorganic salts, including, but not limited to, NaCl, Na2SO4, KCl, Mg+2, Mn+2, etc., at various concentrations.
[0096] Organic solvents suitable for nucleic acid-templated reactions include, but are not limited to, methylene chloride, chloroform, dimethylformamide, and organic alcohols, including methanol and ethanol. To permit quantitative dissolution of reaction components in organic solvents, quaternized ammonium salts, such as, for example, long chain tetraalkylammonium salts, can be added (Jost et al. (1989) NUCLEIC ACIDS RES. 17: 2143; Mel'nikov et al. (1999) LANGMUIR 15: 1923-1928).
[0097] Nucleic acid-templated reactions may require a catalyst, such as, for example, a homogeneous, heterogeneous, phase transfer, or asymmetric catalyst. In other embodiments, a catalyst is not required. The presence of additional, accessory reagents not linked to a nucleic acid is preferred in some embodiments. Useful accessory reagents can include, for example, oxidizing agents (e.g., NaIO4); reducing agents (e.g., NaCNBH3); activating reagents (e.g., EDC, NHS, and sulfo-NHS); transition metals such as nickel (e.g., Ni(NO3)2), rhodium (e.g., RhCl3), ruthenium (e.g., RuCl3), copper (e.g., Cu(NO3)2), cobalt (e.g., CoCl2), iron (e.g., Fe(NO3)3), osmium (e.g., OsO4), titanium (e.g., TiCl4 or titanium tetraisopropoxide), palladium (e.g., Na2PdCl4), or Ln; transition metal ligands (e.g., phosphines, amines, and halides); Lewis acids; and Lewis bases.
[0098] Reaction conditions preferably are optimized to suit the nature of the reactive units and oligonucleotides used.
V. SELECTION AND SCREENING
[0099] Selection and/or screening for reaction products (e.g., small molecules) with desired activities (such as catalytic activity, binding affinity, or a particular effect in an activity assay) may be performed using methodologies known and used in the art. For example, affinity selections may be performed according to the principles used in library-based selection methods such as phage display, polysome display, and mRNA-fusion protein displayed peptides. Selection for catalytic activity may be performed by affinity selections on transition-state analog affinity columns (Baca et al. (1997) PROC. NATL. ACAD. SCI. USA 94(19): 10063-8) or by function-based selection schemes (Pedersen et al. (1998) PROC. NATL. ACAD. SCI. USA 95(18): 10523-8). Since minute quantities of DNA (~10^-20 mol) can be amplified by PCR (Kramer et al. (1999) CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (ed. Ausubel, F. M.) 15.1-15.3, Wiley), these selections can be conducted on a scale ten or more orders of magnitude less than that required for reaction analysis by current methods, making a truly broad search both economical and efficient.
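To put the quoted quantity in concrete terms, the following minimal Python sketch converts about 10^-20 mol of template into a molecule count and estimates copy number after PCR. The function names and the idealized assumption that every cycle doubles the template (efficiency = 1.0) are illustrative only and are not taken from the specification.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def molecules_from_moles(moles: float) -> float:
    """Convert an amount in moles to a number of molecules."""
    return moles * AVOGADRO


def copies_after_pcr(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate copy number after PCR.

    Assumption (illustrative): each cycle multiplies the template by
    (1 + efficiency); efficiency = 1.0 is ideal doubling, which real
    reactions only approximate.
    """
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    start = molecules_from_moles(1e-20)
    print(f"1e-20 mol is about {start:.0f} template molecules")
    print(f"after 30 ideal cycles: about {copies_after_pcr(start, 30):.2e} copies")
```

Under these assumptions, 10^-20 mol corresponds to roughly six thousand template molecules, which is why selections can be run on very small amounts of library material.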
(i) Selection for Binding to Target Molecule
[00100] The templates and reaction products (e.g., small molecules) can be selected (or screened) for binding to a target molecule. In this context, selection or partitioning means any process whereby a library member bound to a target molecule is separated from library members not bound to target molecules. Selection can be accomplished by various methods known in the art.
[00101] The templates of the present invention contain a built-in function for direct selection and amplification. In most applications, binding to a target molecule preferably is selective, such that the template and the resulting reaction product (e.g. , a small molecule) bind preferentially with a specific target molecule, perhaps preventing or inducing a specific biological effect. Ultimately, a binding molecule identified using the present invention may be useful as a therapeutic and/or diagnostic agent. Once the selection is complete, the selected templates optionally can be amplified and sequenced. The selected reaction products, if present in sufficient quantity, can be separated from the templates, purified (e.g., by HPLC, column chromatography, or other chromatographic method), and further characterized.
(ii) Target Molecules
[00102] Binding assays provide a rapid means for isolating and identifying reaction products (e.g. , a small molecule) that bind to, for example, a surface (such as metal, plastic, composite, glass, ceramics, rubber, skin, or tissue); a polymer; a catalyst; or a target biomolecule such as a nucleic acid, a protein (including enzymes, receptors, antibodies, and glycoproteins), a signal molecule (such as cAMP, inositol triphosphate, peptides, or prostaglandins), a carbohydrate, or a lipid. Binding assays can be advantageously combined with activity assays for the effect of a reaction product on a function of a target molecule.
[00103] The selection strategy can be carried out to allow selection against almost any target. Importantly, the selection strategy does not require any detailed structural information about the target molecule or about the molecules in the libraries. The entire process is driven by the binding affinity involved in the specific recognition and binding of the molecules in the library to a given target. Examples of various selection procedures are described below.
[00104] The libraries of the present invention can contain molecules that could potentially bind to any known or unknown target. The binding region of a target molecule could include a catalytic site of an enzyme, a binding pocket on a receptor (for example, a G-protein coupled receptor), a protein surface area involved in a protein-protein or protein-nucleic acid interaction (preferably a hot-spot region), or a specific site on DNA (such as the major groove). The natural function of the target could be stimulated (agonized), reduced (antagonized), unaffected, or completely changed by the binding of the reaction product (e.g., a small molecule). This will depend on the precise binding mode and the particular binding site the reaction product occupies on the target.
[00105] Functional sites (such as protein-protein interaction or catalytic sites) on proteins often are more prone to bind molecules than are other more neutral surface areas on a protein. In addition, these functional sites normally contain a smaller region that seems to be primarily responsible for the binding energy: the so-called "hot-spot regions" (Wells et al. (1993) RECENT PROG. HORMONE RES. 48: 253-262). This phenomenon facilitates selection for molecules affecting the biological function of a certain target.
[00106] The linkage between the template molecule and reaction product (e.g. , a small molecule) allows rapid identification of binding molecules using various selection strategies. This invention broadly permits identifying binding molecules for any known target molecule. In addition, novel unknown targets can be discovered by isolating binding molecules against unknown antigens (epitopes) and using these binding molecules for identification and validation. In another preferred embodiment, the target molecule is designed to mimic a transition state of a chemical reaction; one or more reaction products resulting from the selection may stabilize the transition state and catalyze the chemical reaction.
(iii) Binding Assays
[00107] The template-directed synthesis of the invention permits selection procedures analogous to other display methods such as phage display (Smith (1985) SCIENCE 228: 1315-1317). Phage display selection has been used successfully on peptides (Wells et al. (1992) CURR. OP. STRUCT. BIOL. 2: 597-604), proteins (Marks et al. (1992) J. BIOL. CHEM. 267: 16007-16010) and antibodies (Winter et al. (1994) ANNU. REV. IMMUNOL. 12: 433-455). Similar selection procedures also are exploited for other types of display systems such as ribosome display (Mattheakis et al. (1994) PROC. NATL. ACAD. SCI. 91: 9022-9026) and mRNA display (Roberts et al. (1997) PROC. NATL. ACAD. SCI. 94: 12297-302). The libraries of the present invention, however, allow direct selection of target-specific molecules without requiring traditional ribosome-mediated translation. The present invention also allows the display of small molecules which have not previously been synthesized directly from a nucleic acid template.
[00108] Selection of binding molecules from a library can be performed in any format to identify optimal binding molecules. Binding selections typically involve immobilizing the desired target molecule, adding a library of potential binders, and removing non-binders by washing. When the molecules showing low affinity for an immobilized target are washed away, the molecules with a stronger affinity generally remain attached to the target. The enriched population remaining bound to the target after stringent washing is preferably eluted with, for example, acid, chaotropic salts, heat, competitive elution with a known ligand or by proteolytic release of the target and/or of template molecules. The eluted templates are suitable for PCR, leading to many orders of amplification, whereby essentially each selected template becomes available at a greatly increased copy number for cloning, sequencing, and/or further enrichment or diversification.
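The bind, wash, elute, and amplify cycle described above can be illustrated with a small Python sketch of how a rare binder is enriched over successive rounds. The model and all numbers are hypothetical: each library species is assumed to survive washing with a fixed probability, and PCR is assumed to restore copy number without changing the relative proportions of the surviving templates.

```python
def enrich(fractions: dict, retention: dict, rounds: int) -> dict:
    """Return library composition after repeated selection rounds.

    fractions: species name -> starting fraction of the library.
    retention: species name -> probability of remaining bound after
               washing (hypothetical values supplied by the caller).
    Assumption: amplification after each round preserves the ratios of
    the surviving templates, so only relative abundance is tracked.
    """
    current = dict(fractions)
    for _ in range(rounds):
        survived = {name: current[name] * retention[name] for name in current}
        total = sum(survived.values())
        current = {name: amount / total for name, amount in survived.items()}
    return current


if __name__ == "__main__":
    # Hypothetical library: one binder per million members.
    start = {"binder": 1e-6, "nonbinder": 1.0 - 1e-6}
    keep = {"binder": 0.5, "nonbinder": 0.001}
    for n in range(4):
        print(n, f"{enrich(start, keep, n)['binder']:.6f}")
```

In this toy model each round enriches the binder by the ratio of the two retention probabilities (here 500-fold), so a few rounds are enough for an initially rare binder to dominate the pool.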
[00109] In a binding assay, when the concentration of ligand is much less than that of the target (as it would be during the selection of a DNA-templated library), the fraction of ligand bound to target is determined by the effective concentration of the target protein. The fraction of ligand bound to target is a sigmoidal function of the concentration of target, with the midpoint (50% bound) at [target] = Kd of the ligand-target complex. This relationship indicates that the stringency of a specific selection (the minimum ligand affinity required to remain bound to the target during the selection) is determined by the target concentration. Therefore, selection stringency is controllable by varying the effective concentration of target.
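The relationship between target concentration and stringency can be sketched numerically. The Python example below assumes simple 1:1 equilibrium binding with ligand far below target concentration, so that the fraction bound is [target] / ([target] + Kd); plotted against the logarithm of target concentration this curve is sigmoidal with its midpoint at [target] = Kd. The function name and the example concentrations are illustrative assumptions.

```python
def fraction_bound(target_conc: float, kd: float) -> float:
    """Fraction of ligand bound at equilibrium for 1:1 binding.

    Assumes ligand concentration is much lower than target
    concentration, so free target is approximately total target.
    Both arguments must use the same concentration units.
    """
    if target_conc < 0 or kd <= 0:
        raise ValueError("target_conc must be >= 0 and kd must be > 0")
    return target_conc / (target_conc + kd)


if __name__ == "__main__":
    # Hypothetical ligands with Kd of 10 nM and 10 uM (values in molar).
    for target in (1e-5, 1e-7, 1e-9):
        strong = fraction_bound(target, 1e-8)
        weak = fraction_bound(target, 1e-5)
        print(f"[target]={target:.0e} M  strong={strong:.3f}  weak={weak:.5f}")
```

Lowering the target concentration in this sketch leaves the tight binder largely bound while the weak binder falls to a negligible bound fraction, which is the sense in which target concentration sets selection stringency.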
[00110] The target molecule (peptide, protein, DNA or other antigen) can be immobilized on a solid support, for example, a container wall or a wall of a microtiter plate well. The library preferably is dissolved in aqueous binding buffer in one pot and equilibrated in the presence of immobilized target molecule. Non-binders are washed away with buffer. Those molecules that may be binding to the target molecule through their attached DNA templates rather than through their synthetic moieties can be eliminated by washing the bound library with unfunctionalized templates lacking PCR primer binding sites. Remaining bound library members then can be eluted, for example, by denaturation.
[00111] Alternatively, the target molecule can be immobilized on beads, particularly if there is doubt that the target molecule will adsorb sufficiently to a container wall, as may be the case for an unfolded target eluted from an SDS-PAGE gel. The derivatized beads can then be used to separate high-affinity library members from nonbinders by simply sedimenting the beads in a benchtop centrifuge. Alternatively, the beads can be used to make an affinity column. In such cases, the library is passed through the column one or more times to permit binding. The column then is washed to remove nonbinding library members. Magnetic beads are essentially a variant on the above; the target is attached to magnetic beads which are then used in the selection.
[00112] There are many reactive matrices available for immobilizing the target molecule, including matrices bearing -NH2 groups or -SH groups. The target molecule can be immobilized by conjugation with NHS ester or maleimide groups covalently linked to Sepharose beads and the integrity of known properties of the target molecule can be verified. Activated beads are available with attachment sites for -NH2 or -COOH groups (which can be used for coupling). Alternatively, the target molecule is blotted onto nitrocellulose or PVDF. When using a blotting strategy, the blot should be blocked (e.g., with BSA or similar protein) after immobilization of the target to prevent nonspecific binding of library members to the blot.
[00113] Library members that bind a target molecule can be released by denaturation, acid, or chaotropic salts. Alternatively, elution conditions can be more specific to reduce background or to select for a desired specificity. Elution can be accomplished using proteolysis to cleave a connector between the target molecule and the immobilizing surface or between the reaction product (e.g. , a small molecule) and the template. Also, elution can be accomplished by competition with a known competitive ligand for the target molecule. Alternatively, a PCR reaction can be performed directly in the presence of the washed target molecules at the end of the selection procedure. Thus, the binding molecules need not be elutable from the target to be selectable since only the template is needed for further amplification or cloning, not the reaction product itself. Indeed, some target molecules bind the most avid ligands so tightly that elution would be difficult.
[00114] To select for a molecule that binds a protein expressible on a cell surface, such as an ion channel or a transmembrane receptor, the cells themselves can be used as the selection agent. The library preferably is first exposed to cells not expressing the target molecule on their surfaces to remove library members that bind specifically or nonspecifically to other cell surface epitopes. Alternatively, cells lacking the target molecule are present in large excess in the selection process and separable (by fluorescence-activated cell sorting (FACS), for example) from cells bearing the target molecule. In either method, cells bearing the target molecule then are used to isolate library members binding the target molecule (e.g., by sedimenting the cells or by FACS sorting). For example, a recombinant DNA encoding the target molecule can be introduced into a cell line; library members that bind the transformed cells but not the untransformed cells are enriched for target molecule binders. This approach is also called subtraction selection and has successfully been used for phage display on antibody libraries (Hoogenboom et al. (1998) IMMUNOTECH 4: 1-20).
[00115] A selection procedure can also involve selection for binding to cell surface receptors that are internalized so that the receptor together with the selected binding molecule passes into the cytoplasm, nucleus, or other cellular compartment, such as the Golgi or lysosomes. Depending on the dissociation rate constant for specific selected binding molecules, these molecules may localize primarily within the intracellular compartments. Internalized library members can be distinguished from molecules attached to the cell surface by washing the cells, preferably with a denaturant. More preferably, standard subcellular fractionation techniques are used to isolate the selected library members in a desired subcellular compartment.
[00116] An alternative selection protocol also includes a known, weak ligand affixed to each member of the library. The known ligand guides the selection by interacting with a defined part of the target molecule and focuses the selection on molecules that bind to the same region, providing a cooperative effect. This can be particularly useful for increasing the affinity of a ligand with a desired biological function but with too low a potency.
[00117] Other methods for selection or partitioning are also available for use with the present invention. These include, for example: immunoprecipitation (direct or indirect) where the target molecule is captured together with library members; mobility shift assays in agarose or polyacrylamide gels, where the selected library members migrate with the target molecule in a gel; cesium chloride gradient centrifugation to isolate the target molecule with library members; and mass spectroscopy to identify target molecules labeled with library members. In general, any method where the library member/target molecule complex can be separated from library members not bound to the target is useful.
[00118] The selection process is well suited for optimizations, where the selection steps are made in series, starting with the selection of binding molecules and ending with an optimized binding molecule. The procedures in each step can be automated using various robotic systems. Thus, the invention permits supplying a suitable library and target molecule to a fully automatic system which finally generates an optimized binding molecule. Under ideal conditions, this process should run without any requirement for external work outside the robotic system during the entire procedure.
[00119] The selection methods of the present invention can be combined with secondary selection or screening to identify reaction products (e.g. , small molecules) capable of modifying target molecule function upon binding. Thus, the methods described herein can be employed to isolate or produce binding molecules that bind to and modify the function of any protein or nucleic acid. For example, nucleic acid-templated chemistry can be used to identify, isolate, or produce binding molecules (1) affecting catalytic activity of target enzymes by inhibiting catalysis or modifying substrate binding; (2) affecting the functionality of protein receptors, by inhibiting binding to receptors or by modifying the specificity of binding to receptors; (3) affecting the formation of protein multimers by disrupting the quaternary structure of protein subunits; or (4) modifying transport properties of a protein by disrupting transport of small molecules or ions.
[00120] Functional assays can be included in the selection process. For example, after selecting for binding activity, selected library members can be directly tested for a desired functional effect, such as an effect on cell signaling. This can, for example, be performed via FACS methodologies.
[00121] The binding molecules of the invention can be selected for other properties in addition to binding. For example, the selection can be designed to favor stability of binding interactions in a desired working environment. If stability in the presence of a certain protease is desired, that protease can be part of the buffer medium used during selection. Similarly, the selection can be performed in serum or cell extracts or in any type of medium, aqueous or organic. Conditions that disrupt or degrade the template should, however, be avoided to allow subsequent amplification.
(iv) Other Selections
[00122] Selections for other desired properties, such as catalytic or other functional activities, can also be performed. Generally, the selection should be designed such that library members with the desired activity are isolatable on that basis from other library members. For example, library members can be screened for the ability to fold or otherwise significantly change conformation in the presence of a target molecule, such as a metal ion, or under particular pH or salinity conditions. The folded library members can be isolated by performing non-denaturing gel electrophoresis under the conditions of interest. The folded library members migrate to a different position in the gel and can subsequently be extracted from the gel and isolated.
[00123] Similarly, reaction products that fluoresce in the presence of specific ligands may be selected by FACS-based sorting of translated polymers linked through their DNA templates to beads. Those beads that fluoresce in the presence, but not in the absence, of the target ligand are isolated and characterized. Useful beads with a homogeneous population of nucleic acid templates on any bead can be prepared using the split-pool synthesis technique on the bead, such that each bead is exposed to only a single nucleotide sequence. Alternatively, a different anti-template (each complementary to only a single, different template) can be synthesized on beads using a split-pool technique, and then can anneal to capture a solution-phase library.
[00124] Biotin-terminated biopolymers can be selected for the actual catalysis of bond-breaking reactions by passing these biopolymers over a resin linked through a substrate to avidin. Those biopolymers that catalyze substrate cleavage self-elute from a column charged with this resin. Similarly, biotin-terminated biopolymers can be selected for the catalysis of bond-forming reactions. One substrate is linked to resin and the second substrate is linked to avidin. Biopolymers that catalyze bond formation between the substrates are selected by their ability to react the substrates together, resulting in attachment of the biopolymer to the resin.
[00125] Library members can also be selected for their catalytic effects on synthesis of a polymer to which the template is or becomes attached. For example, the library member may influence the selection of monomer units to be polymerized as well as how the polymerization reaction takes place (e.g., stereochemistry, tacticity, activity). The synthesized polymers can be selected for specific properties, such as, molecular weight, density, hydrophobicity, tacticity, stereoselectivity, using standard techniques, such as, electrophoresis, gel filtration, centrifugal sedimentation, or partitioning into solvents of different hydrophobicities. The attached template that directed the synthesis of the polymer can then be identified.
[00126] Library members that catalyze virtually any reaction causing bond formation between two substrate molecules or resulting in bond breakage into two product molecules can be selected using the schemes proposed herein. To select for bond-forming catalysts (for example, hetero Diels-Alder, Heck coupling, aldol reaction, or olefin metathesis catalysts), library members are covalently linked to one substrate through their 5' amino or thiol termini. The other substrate of the reaction is synthesized as a derivative linked to biotin. When dilute solutions of library-substrate conjugate are combined with the substrate-biotin conjugate, those library members that catalyze bond formation cause the biotin group to become covalently attached to themselves. Active bond-forming catalysts can then be separated from inactive library members by capturing the former with immobilized streptavidin and washing away inactive library members.
[00127] In an analogous manner, library members that catalyze bond cleavage reactions such as retro-aldol reactions, amide hydrolysis, elimination reactions, or olefin dihydroxylation followed by periodate cleavage can be selected. In this case, library members are covalently linked to biotinylated substrates such that the bond breakage reaction causes the disconnection of the biotin moiety from the library members. Upon incubation under reaction conditions, active catalysts, but not inactive library members, induce the loss of their biotin groups. Streptavidin-linked beads can then be used to capture inactive polymers, while active catalysts are able to be eluted from the beads. Related bond formation and bond cleavage selections have been used successfully in catalytic RNA and DNA evolution (Jaschke et al. (2000) CURR. OPIN. CHEM. BIOL. 4: 257-62). Although these selections do not explicitly select for multiple turnover catalysis, RNAs and DNAs selected in this manner have in general proven to be multiple turnover catalysts when separated from their substrate moieties (Jaschke et al. (2000) CURR. OPIN. CHEM. BIOL. 4: 257-62; Jaeger et al. (1999) PROC. NATL. ACAD. SCI. USA 96: 14712-7; Bartel et al. (1993) SCIENCE 261: 1411-8; Sen et al. (1998) CURR. OPIN. CHEM. BIOL. 2: 680-7).
[00128] In addition to simply evolving active catalysts, the in vitro selections described above are used to evolve non-natural polymer libraries in powerful directions difficult to achieve using other catalyst discovery approaches. Substrate specificity among catalysts can be selected by selecting for active catalysts in the presence of the desired substrate and then selecting for inactive catalysts in the presence of one or more undesired substrates. If the desired and undesired substrates differ by their configuration at one or more stereocenters, enantioselective or diastereoselective catalysts can emerge from rounds of selection. Similarly, metal selectivity can be evolved by selecting for active catalysts in the presence of desired metals and selecting for inactive catalysts in the presence of undesired metals. Conversely, catalysts with broad substrate tolerance can be evolved by varying substrate structures between successive rounds of selection.
[00129] Importantly, in vitro selections can also select for specificity in addition to binding affinity. Library screening methods for binding specificity typically require duplicating the entire screen for each target or non-target of interest. In contrast, selections for specificity can be performed in a single experiment by selecting for target binding as well as for the inability to bind one or more non-targets. Thus, the library can be pre-depleted by removing library members that bind to a non-target. Alternatively, or in addition, selection for binding to the target molecule can be performed in the presence of an excess of one or more non-targets. To maximize specificity, the non-target can be a homologous molecule. If the target molecule is a protein, appropriate non-target proteins include, for example, a generally promiscuous protein such as an albumin. If the binding assay is designed to target only a specific portion of a target molecule, the non-target can be a variation on the molecule in which that portion has been changed or removed.
(v) Amplification and Sequencing
[00130] Once all rounds of selection are complete, the templates which are associated with the selected reaction product (e.g., a small molecule) preferably are amplified using any suitable technique to facilitate sequencing or other subsequent manipulation of the templates. Natural oligonucleotides can be amplified by any state-of-the-art method. These methods include, for example, polymerase chain reaction (PCR); nucleic acid sequence-based amplification (see, for example, Compton (1991) NATURE 350: 91-92); amplified anti-sense RNA (see, for example, van Gelder et al. (1988) PROC. NATL. ACAD. SCI. USA 85: 77652-77656); self-sustained sequence replication systems (Guatelli et al. (1990) PROC. NATL. ACAD. SCI. USA 87: 1874-1878); polymerase-independent amplification (see, for example, Schmidt et al. (1997) NUCLEIC ACIDS RES. 25: 4797-4802); and in vivo amplification of plasmids carrying cloned DNA fragments. Descriptions of PCR methods are found, for example, in Saiki et al. (1985) SCIENCE 230: 1350-1354; Scharf et al. (1986) SCIENCE 233: 1076-1078; and in U.S. Patent No. 4,683,202. Ligase-mediated amplification methods such as Ligase Chain Reaction (LCR) may also be used. In general, any means allowing faithful, efficient amplification of selected nucleic acid sequences can be employed in the method of the present invention. It is preferable, although not necessary, that the proportionate representations of the sequences after amplification reflect the relative proportions of sequences in the mixture before amplification.
[00131] For non-natural nucleotides the choices of efficient amplification procedures are fewer. As non-natural nucleotides can be incorporated by certain enzymes including polymerases, it will be possible to perform manual polymerase chain reaction by adding the polymerase during each extension cycle.
[00132] For oligonucleotides containing nucleotide analogs, fewer methods for amplification exist. One may use non-enzyme mediated amplification schemes (Schmidt et al. (1997) NUCLEIC ACIDS RES. 25: 4797-4802). For backbone-modified oligonucleotides such as PNA and LNA, this amplification method may be used. Alternatively, standard PCR can be used to amplify a DNA from a PNA or LNA oligonucleotide template. Before or during amplification the templates or complementing templates may be mutagenized or recombined in order to create an evolved library for the next round of selection or screening.
(vi) Sequence Determination and Template Evolution
[00133] Sequencing can be done by a standard dideoxy chain termination method, or by chemical sequencing, for example, using the Maxam-Gilbert sequencing procedure. Alternatively, the sequence of the template (or, if a long template is used, the variable portion(s) thereof) can be determined by hybridization to a chip. For example, a single-stranded template molecule associated with a detectable moiety such as a fluorescent moiety is exposed to a chip bearing a large number of clonal populations of single-stranded nucleic acids or nucleic acid analogs of known sequence, each clonal population being present at a particular addressable location on the chip. The template sequences are permitted to anneal to the chip sequences. The position of the detectable moieties on the chip then is determined. Based upon the location of the detectable moiety and the immobilized sequence at that location, the sequence of the template can be determined. It is contemplated that large numbers of such oligonucleotides can be immobilized in an array on a chip or other solid support. In some embodiments, next-generation sequencing techniques are used, where during DNA sequencing, the bases of a small fragment of DNA are sequentially identified from signals emitted as each fragment is re-synthesized from a DNA template strand. This sequencing method is based on reversible dye-terminators that enable the identification of single bases as they are introduced into complementary DNA strands.
[00134] The following examples contain important additional information, exemplification and guidance that can be adapted to the practice of this invention in its various embodiments and equivalents thereof. Practice of the invention will be more fully understood from these following examples, which are presented herein for illustrative purpose only, and should not be construed as limiting in any way.
EXAMPLES
Example 1: Design and Use of DNA Templates
[00135] Small molecule compound libraries have been made using nucleic acid template synthesis, also referred to herein as DNA-programmed chemistry (DPC), in which the DNA base sequence corresponds directly to the structure of the molecule made on each unique DNA template strand. Sequence-specific DNA-templated reactions have been carried out covering a range of chemical reaction types. The process requires that the template forms a duplex specifically with the oligonucleotide in the transfer unit, a partner reagent DNA strand comprising an anti-codon sequence for one of the codons in the template. Following duplex formation, the DNA-linked reactive units were brought into close proximity, and a chemical reaction was catalyzed between the building blocks on the template and reagent strands, forming a new covalent bond linking these two small molecules together.
[00136] The single-stranded DNA sequence used as the template for DPC contained several distinct codon regions of predetermined length and sequence to ensure the specificity of DNA duplex formation and thus integrity of the chemical reaction.
[00137] In the instant invention, DNA templates have been designed containing fixed sequence regions, tag regions and codons for DPC reactions (see FIG. 1). Using the principles described herein, it has been possible to create libraries of novel small molecules conjugated to DNA oligonucleotides from analogous libraries of DNA templates where the specific base sequence in the template translates directly to a specific small molecule structure that was made on the 5'-end of the DNA. Each codon and tag region within the template corresponds to a particular structural feature or building block employed in the synthesis, and the summation of all tag and codon sequences identifies the unique structure of the attached small molecule.
[00138] The composition of the preferred template sequence is illustrated in FIG. 1 and is as follows, where each base employed in the single-stranded DNA template sequence is part of a longer functional region with a specific pre-determined role in the execution of DPC library synthesis or is required as part of the process by which the structures of active components of the library are determined. Fixed regions ten bases in length are located at both the 3'- and 5'-ends of the DNA template and represent polymerase chain reaction (PCR) ligation sites. These sites permit PCR amplification of the DNA sequences when it is required to define the specific base sequence of the DNA to directly identify library components. By determining the particular DNA base sequence of oligonucleotides attached to the small molecule it is possible to identify compounds that have affinity for target proteins. For example, incubating the library of DNA-conjugated small molecules in the presence of a solid-phase resin-immobilized protein target acts in an affinity-based selection format to sequester compounds with protein affinity. Washing the solid-supported protein free of non-binding compounds, and then eluting the binders following protein denaturation, will yield active conjugates. PCR amplification of the attached DNA strands and sequencing will reveal the specific base sequence of the DNA and by extension the unique structures of the small molecule protein ligands. The success and efficiency of the protein binding hit discovery process is greatly enhanced by enlarging the number and diversity of the small molecule collection, and the library design and synthesis process has been refined to increase the productivity and efficiency of hit discovery.
[00139] At the 5'-end of the DNA template, adjacent to the fixed ligation site are three independent codon regions each of 12 bases. Each of these sites in turn can form a duplex with complementary reagent DNA sequences during the DPC reaction steps. The codons are designed to ensure specific interactions with the predetermined DNA sequences of the reagent 'anti-codon' sequences. The length of the codon region ensures sufficient base-pairing to give high affinity duplex formation with a suitably high melting temperature such that the duplex will form and be maintained at ambient temperature.
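As a rough illustration of how codon length and base composition relate to duplex melting temperature, the following minimal sketch applies the Wallace rule of thumb (2 °C per A/T pair, 4 °C per G/C pair), which is commonly used for short oligonucleotides. This rule is an assumption introduced here for illustration only; it is not stated in this disclosure, it ignores salt and strand concentration, and the example codon is hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Estimate the melting temperature (deg C) of a short duplex.

    Wallace rule of thumb: Tm = 2*(A+T) + 4*(G+C).
    Only a coarse estimate; not a method taken from the disclosure.
    """
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 12-base codon with 50% GC content.
example_codon = "ATGCATGCATGC"
estimated_tm = wallace_tm(example_codon)
print(f"{example_codon}: approx. Tm {estimated_tm} C")
```

Under this estimate, a 12-base codon spans roughly 24 °C (all A/T) to 48 °C (all G/C), so codon composition as well as length would bear on whether the duplex is maintained at ambient temperature.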
[00140] The reactions are run ideally at a reagent concentration of at least 60 nM to maintain a clear preference for intramolecular over intermolecular reactions, while minimizing the reaction volume. The length of the codon region also can be designed to minimize any mismatched DNA and thus loss of fidelity in the DPC process. Each codon region contains a number of possible base sequences chosen to present specific building blocks in one diversity location in the final library compound. For example, in the preparation of a triazole-linked library, each codon position (R1 through R3) contains one of 24 variants, and each of the codon sets in each of the positions R1, R2 and R3 had its own unique set of 24 codons, making a total of 72 codon sequences used (see FIG. 2). As the template is created by a split-pool combinatorial synthesis, there were a total of 24 x 24 x 24 or 13,824 different permutations of R1, R2 and R3, and thus the oligonucleotide synthesis was based on a mix and split process to ensure that all permutations were generated in approximately equal amounts.
[00141] For example, in the generation of a template sequence, DNA synthesis was carried out in 24 parallel vessels, and within each the DNA was synthesized from the 3'-end to produce a specific DNA sequence comprising the fixed ligation sequence of 10 bases, two tag sequences totaling 14 bases and then a unique R3 codon sequence of 12 bases. At the completion of the addition of bases constituting the R3 codon, the controlled-pore glass (CPG) solid support from all 24 vessels was removed and thoroughly mixed before redistribution into 24 new vessels for the addition of the bases constituting 24 unique and distinctive R2 codon sequences. The same process carried out for the introduction of the 24 R1 codon sequences provided the 13,824 different nascent DNA templates. Subsequent oligonucleotide synthesis introduced the final fixed ligation sequence at the 5'-end of the DNA. At this point, chemical modification of the 5'-hydroxyl introduced an amino group that provided a handle for addition of the first synthetic building block (linker residue).
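The mix and split process described above can be sketched as a small simulation. This is an idealized illustration only: the codon labels are hypothetical placeholders (the actual 12-base codon sequences are not reproduced here), and the sketch assumes that every vessel receives every pooled template, which the physical process only approximates.

```python
N_VARIANTS = 24  # codon variants per position, as in the triazole-linked library

# Placeholder labels standing in for the 12-base codon sequences (hypothetical).
R3_CODONS = [f"R3-{i:02d}" for i in range(1, N_VARIANTS + 1)]
R2_CODONS = [f"R2-{i:02d}" for i in range(1, N_VARIANTS + 1)]
R1_CODONS = [f"R1-{i:02d}" for i in range(1, N_VARIANTS + 1)]


def split_pool_round(pool, codon_set):
    """Split the pooled support over one vessel per codon, extend, then re-pool."""
    extended = []
    for codon in codon_set:        # one synthesis vessel per codon
        for template in pool:      # idealized: each vessel holds every template
            # Synthesis proceeds from the 3'-end, so each new codon is
            # added on the 5' side of what is already there.
            extended.append((codon,) + template)
    return extended


# Start from a single species: fixed 3' ligation site plus tags (not modeled).
pool = [()]
for codon_set in (R3_CODONS, R2_CODONS, R1_CODONS):
    pool = split_pool_round(pool, codon_set)

print(len(pool))   # number of distinct (R1, R2, R3) templates
print(pool[0])     # codons listed in 5'->3' order
```

Each tuple lists the codons in 5' to 3' order (R1, R2, R3), and the pool size after three rounds is the product of the three set sizes.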
Example 2: Tag Sequence in the DNA Template to Define the Linker Group
[00142] FIGS. 1 and 3 show an exemplary template containing two tags, Tag 1 and Tag 2, each of which is 7 bases long. These DNA sequences are essential in the identification of the small molecule covalently attached to the template DNA at the end of the library synthesis, but they do not engage in DPC-catalyzed reactions. Instead they are used to 'hard code' for subsets of the library that might differ in the identity of the linker or spacer building blocks.
[00143] As an illustration, when the template DNA oligonucleotide is synthesized, within one mixture the Tag 2 position is kept fixed with a unique base sequence, and the mix and split process of introducing the R1 through R3 codon regions proceeds to give every codon variant (see FIG. 3). At the end of the oligonucleotide synthesis the resulting product is a mixture of templates all containing a fixed Tag 2 base sequence that defines one linker building block. Without further mixing, the individual 'linker' building blocks are chemically attached to the 5'-amino terminus maintaining a direct relationship between the Tag 2 sequence and the linker building block structure. The DNA template-linker conjugates are then mixed and used in DPC reactions directly. For example, there might be 16 different linkers employed in a library and the template oligonucleotides will have been synthesized in 16 mixtures each comprising 13,824 different sequences. Combining the 16 different mixtures gives a total template library complexity of 221,184 sequences (see FIG. 4). The final library products that are synthesized can be identified by sequencing the DNA, revealing the structure of the small molecule by considering both (1) the codon regions representing building blocks R1 through R3 and (2) the tag region defining the linker building block.
Example 3: Tag Sequence in the DNA Template Defines a Spacer Group
[00144] The other tag sequence in the DNA template sequence (see FIGS. 1 and 2) is used to define the spacer building block. At the end of the DPC steps in the library sequence, there will be generated a mixture of 221,184 template sequences and library members. For every component, the Tag 1 position base sequence is held constant, but any library might comprise multiple library mixtures which differ only in the DNA base sequence of Tag 1. Each library mixture will independently be chemically derivatized in a non-DPC step with a spacer building block. The mixtures are kept separate so that the spacer that is attached is defined by the Tag 1 base sequence ensuring that there is fidelity between the Tag 1 DNA sequence and the spacer structure (see FIG. 5). After this addition and further chemical modification as necessary (e.g. cyclization of a macrocycle might constitute a final library synthesis step for some libraries), the individual mixtures can be combined. After screening and DNA sequencing, the identity of the attached small molecule is defined by the sequences of Tag 1 (spacer), Tag 2 (linker) and codons 1 through 3 (the R1 through R3 building blocks).
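The way a sequenced 70-base template could be broken into its functional regions can be sketched as follows. The region widths (10-base fixed sites, 12-base codons, two 7-base tags) come from the description above; the 5' to 3' order of the regions follows the synthesis order described in Example 1, but the relative order of the two tags is an assumption, so they are given the neutral, hypothetical names `tag_a` and `tag_b`. The example read is invented for illustration.

```python
from typing import NamedTuple

FIXED_LEN, CODON_LEN, TAG_LEN = 10, 12, 7


class TemplateRegions(NamedTuple):
    fixed_5p: str   # 5' fixed ligation site
    r1: str         # codon 1
    r2: str         # codon 2
    r3: str         # codon 3
    tag_a: str      # first tag read 5'->3' (Tag 1 or Tag 2: assumption)
    tag_b: str      # second tag read 5'->3' (assumption)
    fixed_3p: str   # 3' fixed ligation site


def split_template(seq: str) -> TemplateRegions:
    """Slice a template read (written 5'->3') into its fixed-width regions."""
    widths = [FIXED_LEN, CODON_LEN, CODON_LEN, CODON_LEN,
              TAG_LEN, TAG_LEN, FIXED_LEN]
    if len(seq) != sum(widths):
        raise ValueError(f"expected {sum(widths)} bases, got {len(seq)}")
    parts, pos = [], 0
    for width in widths:
        parts.append(seq[pos:pos + width])
        pos += width
    return TemplateRegions(*parts)


# Invented read; real codon and tag sequences are not reproduced here.
read = "A" * 10 + "C" * 12 + "G" * 12 + "T" * 12 + "ACGTACG" + "TGCATGC" + "G" * 10
regions = split_template(read)
print(regions)
```

In practice each extracted codon and tag would then be looked up in tables mapping sequences to building blocks, linkers and spacers; those tables are specific to each library and are not modeled here.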
Example 4: Use of a Fourth Codon in a DNA Template
[00145] The DPC library synthesis process is a method for converting the combinatorial set of DNA template sequences into a combinatorial set of small molecules that are constructed on the 5'-end of the DNA. The numerical complexity of the library is a key advantage of the technology, because more compounds will naturally enhance success in empirical screening of any compound collection. The total library size is the mathematical product of the building block diversity at every variable position in the molecule. For example, in one library, diversity was introduced through the linker position (16 variants) and each of the R1 through R3 building block positions (each 24 variants). This affords a total mixture complexity of 221,184 different library small molecule products (see FIG. 4).
[00146] Increasing the numerical complexity of the library can be achieved in several different and additive ways, but the inclusion of a fourth codon is one way to increase library size (see FIG. 6). The fourth codon permits the addition of a new building block position during DNA-programmed chemistry (DPC). Using 24 variants for the new building block introduced by this codon will provide 331,776 different template sequences. When 16 different four-codon template mixtures are pooled, a library of 5,308,416 unique compounds is created, which is a 24-fold increase in library diversity as compared to a 16-mixture pool of three codon templates, which provides only 221,184 compounds.
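The library sizes quoted above follow directly from multiplying the diversity at each variable position. A minimal sketch of that arithmetic, using a hypothetical helper name `library_size`:

```python
from math import prod


def library_size(n_linkers, variants_per_codon):
    """Library size as the product of the diversity at every variable position."""
    return n_linkers * prod(variants_per_codon)


three_codon = library_size(16, [24, 24, 24])       # 16 linkers, three codons
four_codon = library_size(16, [24, 24, 24, 24])    # 16 linkers, four codons

print(three_codon, four_codon, four_codon // three_codon)
```

Because the size is a simple product, adding one more 24-variant codon position multiplies the library size by 24 regardless of how many positions are already present.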
Example 5: Increasing the Length of the Codon Region in the DNA Template
[00147] The length of the codon region ensures sufficient base-pairing between the template and the reagent oligonucleotides, such that there is sufficient specificity in the DNA duplex formation. The choice of codons of 12 bases length ensures a high level of fidelity between the codon sequence and the identity of the building block added to the small molecule being synthesized on the 5'-end of the DNA. In particular the length is chosen to give high affinity duplex formation with a suitably high melting temperature (above ambient temperature), and also to minimize any mismatched DNA.
[00148] In addition, the number of base sequence permutations that can be achieved with four bases in each of the 12 base positions is 16,777,216, providing a significant choice of alternate sequences for each building block. Appropriate computer algorithms can be used to select codon sequences, to compile all possible full DNA template sequences, and to determine that there is absolute fidelity and no ambiguity in the matching of anti-codons on the reagent DNA to the template codons. In particular, it can be important to ensure that codons recognize only their designated anti-codons, and that the reagent strands containing the anti-codons can bind only to their complementary codons, not to any other codon, nor to any other sequence of bases in any of the templates which are concurrently present in the DPC reaction mixture. The vast diversity of codon sequences ensures that 24 unique codon sequences that work in every library template context can readily be selected. However, as the methodology is extended to allow the creation of larger libraries up to and including a billion compounds, the codon repertoire may be extended by lengthening the codon base sequence. Increasing the codon length to 14 bases provides a total of approximately 268 million base permutations, thus assisting the choice of sequence-specific codon-anti-codon interactions devoid of mismatch or cross-talk interactions that would detract from the one-to-one correspondence between DNA sequence and small molecule structure generated in the library process.
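One simple way such a computer algorithm might pick mutually distinct codons is a greedy search that accepts a random candidate only if it differs from every codon already chosen at a minimum number of positions (Hamming distance). This is a sketch under stated assumptions, not the algorithm used in this disclosure: the function names, the distance threshold of 6 and the random seed are hypothetical, and the sketch does not check cross-hybridization against other template regions, shifted alignments or secondary structure.

```python
import random
from itertools import combinations

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_codons(n: int, length: int, min_distance: int, seed: int = 0):
    """Greedily collect n random codons that are pairwise >= min_distance apart."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < n:
        candidate = "".join(rng.choice(BASES) for _ in range(length))
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
    return chosen


print(4 ** 12, 4 ** 14)   # size of the 12-base and 14-base sequence spaces

codons = pick_codons(n=24, length=12, min_distance=6)
closest = min(hamming(a, b) for a, b in combinations(codons, 2))
print(len(codons), "codons; closest pair differs at", closest, "positions")
```

A production design tool would add further filters, for example screening each anti-codon against every full template sequence present in the reaction mixture, as the paragraph above describes.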
[00149] Longer codons necessarily require longer template sequences. When three 12-base codons are used, the total template length is 70 bases. Increasing the number of DPC codons from three to four, and increasing codon length from 12 to 14 bases, increases the overall template DNA single strand length to 90 bases (FIG. 6).
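The template lengths above can be reproduced by summing the region lengths given in Example 1 (two 10-base fixed ligation sites and two tags totaling 14 bases) with the codon regions. The sketch below assumes, as the 90-base figure implies, that the fixed sites and tags keep the same lengths when the codons change; the function name is hypothetical.

```python
def template_length(n_codons, codon_len, fixed_site_len=10, tag_total_len=14):
    """Total template length: two fixed sites + tags + all codon regions."""
    return 2 * fixed_site_len + tag_total_len + n_codons * codon_len


print(template_length(3, 12))   # three 12-base codons
print(template_length(4, 14))   # four 14-base codons
```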
Example 6: Use of a Multivalent (Trivalent) Building Block
[00150] The DNA-programmed chemistry (DPC) approach permits building blocks to be added to the growing small molecule on the 5'-end of the DNA independent of any base sequence. A typical building block is a bifunctional reagent such as an amino acid, although depending on the chemistry used in small molecule synthesis, the building block is not necessarily limited to just this type of building block. In a typical library synthesis, the amino acid is attached to the reagent DNA anticodon sequence through the amino group. After DNA duplex formation between the template and the reagent DNA strands, the carboxylic acid is activated, typically with a standard peptide coupling reagent, and the amide bond is generated between the amino acid and the free amine group on the small molecule intermediate attached to the 5'-end of the DNA template. At this juncture, the newly created molecule is sandwiched between the template and reagent DNA strands. Through the inclusion of a scissile bond in the connector that attaches the amino acid to the reagent DNA, under predefined chemical conditions, usually modestly basic conditions in the case of the BOSCOES connector group, the scissile bond can be selectively cleaved allowing the separation from the now redundant reagent DNA. The template now has a newly modified small molecule on the 5'-end with a free exposed amine group which is available for further chemical derivatization.
[00151] A sequence of bifunctional building blocks such as amino acids, when incorporated into small molecules, will generate linear products, where the product is being extended in an unambiguous one-dimensional manner to give a straight chain product. However, the structural complexity of final small molecule products being generated on DNA by DPC can be greatly enhanced by the incorporation of a trifunctional building block. An example might be the use of a suitably protected diamino acid. Such building blocks have been attached to the DNA template through the carboxylic acid, with the two amine groups exposed for further chemistry. To prevent ambiguity in the synthesis, the two amines will be protected in different ways, so that either or both amines can be independently revealed when required. For example, one amine has been protected as the Fmoc-derivative, which can be revealed by treating with piperidine, and the other as an azide, which was converted to the free amine at a later stage of the library synthesis by a reduction or hydrogenation reaction. DPC reactions have been used to add amino acids onto the free amine of the trifunctional group. At a later stage, the amine on the trifunctional building block was revealed by suitable chemical conversion, and the two amines (that on the amino acid added by DPC, and that revealed on the trifunctional building block) can be linked by a bis-carboxylic acid spacer molecule to generate a macrocycle. Alternatively, the terminal functional group might be a carboxy alkyne introduced as a spacer by amide coupling to the terminal amine on the third bifunctional building block. Production of the final macrocyclic product has been achieved by a copper (I) salt-catalyzed Huisgen cyclization, onto the azide of the trifunctional linker molecule, to give a 1,2,3-triazole product.
[00152] Adding the trifunctional building block at a different point in the library synthesis generates macrocycles of different sizes. If the bis-carboxylic acid spacer is not employed in the library synthesis, or included but not used in a cyclization reaction, the synthesis instead will generate branched small molecules. The trifunctional building block can thus be considered to be a diversity generating element that, by being employed at different stages of the synthesis, can result in small molecule products with highly divergent architectures (see FIG. 7). Inclusion of additional trifunctional building blocks can add further structural diversity to the library with minimal additional synthetic effort, and will result in multiple libraries of similar numerical complexity.
Example 7: Splitting the Library into Aliquots and Adding the Building Blocks in Alternate Permutations to Generate Different Libraries and Architectures
[00153] Libraries have been prepared wherein each codon position encodes for a building block collection of 24 different building blocks. Each of the three codon positions has a different set of 24 building blocks associated with it. The building blocks are added to the growing small molecule by sequential DPC reactions. However, there does not need to be a direct relationship between the sequence of DPC catalyzed building block adding events and the relative positions of the encoding codons within the DNA template. Although the first building block could be introduced by using the codon 1 position, a set of building blocks could be introduced in the first step equally successfully by using the second codon position or indeed the third.
[00154] Considering a mixture of macrocycle, linear or branched library products of 221,184 different structures generated by adding three different sets of building blocks in a specific order, those sets of building blocks could be added in any other sequence to generate new compounds. In addition to adding the sets in the sequence 1 followed by 2 followed by 3, the building block sets could be added 1 followed by 3 followed by 2, and so on. This permits the same building blocks to generate 6 different mixtures, increasing the library size from 221,184 to 1,327,104 different products. One further step is to use any building block mixture in any position, which will generate 27 different library mixtures giving a total of 5,971,968 different compounds.
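The counts of 6 and 27 mixtures correspond to orderings of the three building block sets without and with repetition, respectively. A minimal sketch that enumerates both cases (the set names are placeholders):

```python
from itertools import permutations, product

BASE_LIBRARY = 221_184          # products from one fixed order of addition
sets = ("set 1", "set 2", "set 3")

# Each set used exactly once, in any order: 3! orderings.
orderings = list(permutations(sets))

# Any set allowed at any of the three positions: 3**3 assignments.
assignments = list(product(sets, repeat=3))

print(len(orderings), len(orderings) * BASE_LIBRARY)
print(len(assignments), len(assignments) * BASE_LIBRARY)
```

The 27 assignments include the 6 orderings as a subset, so the larger total already counts the reordered libraries.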
Example 8: Splitting the Library into Aliquots and Adding Soluble Reagents
[00155] Certain library architectures can be engineered to contain reactive functional groups such as carboxylic acids and amines. These functional groups are further derivatized by reaction of the entire library mixture with soluble reagents. For example, making a library comprising multiple diversity positions might conclude the DPC steps with an exposed nucleophilic amino group. Rather than using this amine for a further DPC step, or for cyclization to yield a macrocycle, the amine is derivatized with a number of different soluble reagents. To maximize the total number of library compounds, the library mixture at the end of the DPC steps is split into multiple aliquots such that each aliquot contains every library component. Each aliquot is treated with a different soluble reagent. A library aliquot containing a nucleophilic amine is reacted with acid chlorides, anhydrides, sulfonyl chlorides, isocyanates, isothiocyanates or nucleophilic aromatic systems such as chloroheterocycles. Similarly, a library mixture with a free, terminal carboxylic acid is reacted with amines using routine amide coupling conditions to generate amide derivatives. A mixture of 5,971,968 library products containing a terminal amino group is split into 50 aliquots and each aliquot reacted with a different acyl chloride, anhydride, sulfonyl chloride or other electrophilic reagent to generate a total of 298,598,400 different library products in 50 mixtures.
[00156] Example 9: Addition of a DNA Tag Sequence by Ligation to Define a Soluble Reagent or the Sequence of Building Block Addition
[00157] Tag sequences can be used to define and later identify non-DPC steps, and as a consequence of their use, it is necessary to keep pools of template sequences separate until the point the non-DPC building block is added to the growing small molecule. An example of this is the addition of the linker at the start of the small molecule synthesis. This step is undertaken with individual pools of templates, each comprising a unique tag sequence and each resulting in the addition of only one linker building block. The pools of templates each containing different linkers can be combined subsequently and prior to the DPC reactions.
[00158] An alternative method is to add the linker to a template mixture and then add a tag DNA sequence to the 3'-end of the DNA template by a ligation reaction. The tag can be just 7 bases long and would be unique for the linker that has been added. An advantage of this approach is that a building block or soluble reagent that is added to every component in a templated mixture can be added at any stage of the synthesis, either before or after, or between DPC steps, and yet the identity of the product can be captured in the template sequence by adding additional DNA bases that define this chemical transformation. When DNA templates are later amplified and sequenced, in addition to identifying small molecule structures by the base sequence of the codon regions, the sequence of the added tag can be identified and the nature of the non-DPC building block or soluble reagent can also be unambiguously determined.
INCORPORATION BY REFERENCE
[00159] The entire disclosure of each of the patent documents and scientific articles referred to herein is incorporated by reference for all purposes.
EQUIVALENTS
[00160] The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein. Scope of the invention is thus indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.

Claims

WHAT IS CLAIMED IS:
1. A method for producing a library of small molecules associated with corresponding oligonucleotides comprising the steps of:
(a.) providing a plurality of templates comprising a plurality of first reactive units associated with a corresponding plurality of first oligonucleotides, wherein:
i. each first oligonucleotide defines at least a first codon sequence, a second codon sequence, and a third codon sequence;
ii. each of the first, second and third codon sequence is at least 12 bases in length;
iii. each of the first, second and third codon sequence is different from one another; and
iv. each first oligonucleotide is at least 70 bases in length;
(b.) providing a plurality of first transfer units comprising a plurality of second reactive units covalently attached to a corresponding plurality of second oligonucleotides, wherein each second oligonucleotide defines a first anti-codon sequence complementary to a first codon sequence;
(c.) annealing first and second oligonucleotides having complementary codon and anti-codon sequences to bring the first and second reactive units into reactive proximity, thereby producing a plurality of first small molecules associated with the corresponding first oligonucleotides;
(d.) providing a plurality of second transfer units comprising a plurality of third reactive units covalently attached to a corresponding plurality of third oligonucleotides, wherein each third oligonucleotide defines a second anti-codon sequence complementary to the second codon sequence;
(e.) annealing first and third oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of first small molecules of step (c) and the third reactive units into reactive proximity thereby producing a plurality of second small molecules associated with the corresponding first oligonucleotides;
(f.) providing a plurality of third transfer units comprising a plurality of fourth reactive units covalently attached to a corresponding plurality of fourth oligonucleotides, wherein each fourth oligonucleotide defines a third anti-codon sequence complementary to the third codon sequence; and
(g.) annealing first and fourth oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of second small molecules of step (e) and the fourth reactive units into reactive proximity thereby producing a plurality of third small molecules associated with the corresponding first oligonucleotides, wherein each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the third small molecule associated therewith.
2. The method of claim 1, wherein the first oligonucleotide comprises a fourth codon of at least 12 bases in length, the method comprising the additional steps of:
(h.) providing a plurality of fourth transfer units comprising a plurality of fifth reactive units covalently attached to a corresponding plurality of fifth oligonucleotides, wherein each fifth oligonucleotide defines a fourth anti-codon sequence complementary to the fourth codon sequence; and
(i.) annealing first and fifth oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of third small molecules of step (g) and the fifth reactive units into reactive proximity thereby producing a plurality of fourth small molecules associated with the corresponding first oligonucleotides, wherein each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the fourth small molecule associated therewith.
3. The method of claim 1 or 2, wherein at least one codon is at least 14 bases in length.
4. The method of claim 3, wherein each of the first, second, third, and fourth codon is at least 14 bases in length.
5. The method of any one of claims 1-4, wherein the first oligonucleotide is at least 90 bases in length.
6. The method of any one of claims 1-5, wherein the third or fourth small molecule comprises a moiety that was added as a soluble reagent to the first oligonucleotide-associated small molecule of one or more of steps (c), (e), (g) or (i); and, optionally, wherein each of the first oligonucleotides comprises a nucleotide sequence that is informative of the soluble reagent-added moiety.
7. The method of any one of claims 1-6, wherein at least one of the first, second, third, fourth or fifth reactive unit, or the soluble reagent, is a trivalent moiety.
8. The method of any one of claims 1-7, wherein the second, third or fourth small molecule, or the soluble reagent comprises a reactive moiety capable of further reaction with a plurality of chemical moieties.
9. The method of claim 8, wherein the reactive moiety capable of further reaction with another chemical moiety is selected from a nucleophilic primary or secondary amine and a free carboxyl group.
10. The method of claim 8 or 9 comprising the further steps of:
splitting the library into a plurality of aliquots following addition of the reactive moiety capable of further reaction with a plurality of chemical moieties;
adding to each of the plurality of aliquots a different reagent that reacts with the reactive moiety present on the first oligonucleotide-associated small molecules present therein.
11. The method of claim 10, wherein the different reagents comprise two or more of an acylating agent, a sulfonating agent, a heteroaryl halide reagent, reductive amination reagents and an amide-forming reagent.
12. The method of claim 11, wherein an encoding sequence is ligated to the 3' terminus of each of the plurality of templates before or after addition of a moiety provided by a soluble reagent or other chemical modification to identify the moiety.
13. The method of any one of claims 1-12, wherein the concentration of the plurality of templates is at least 90 nM and no greater than 500 nM at each step when a reactive unit is added.
14. The method of any one of claims 1-13, wherein one or more of the plurality of second, third, fourth or fifth oligonucleotides is bound to a first binding pair member.
15. The method of claim 14, wherein the first binding pair member is biotin.
16. The method of any one of claims 1-15, wherein following the addition of one or more of the first, second, third, or fourth reactive unit, or the soluble reagent-added moiety, the plurality of templates is reacted with a capping reagent that differentially caps the small molecules that did not react with the corresponding reactive unit or soluble reagent, wherein the cap renders the small molecule that did not react with the corresponding reactive unit or soluble reagent unable to react with any further reactive units or soluble reagents.
17. The method of claim 16, wherein the capping reagent is an acid anhydride or acyl chloride.
18. The method of claim 17, wherein the capping reagent is acetic anhydride.
19. The method of any one of claims 14-18, wherein one or more of the plurality of first, second, third or fourth small molecules is purified by contact with a second binding pair member, wherein the second binding pair member is bound to a solid support.
20. A method for producing a library of small molecules associated with corresponding oligonucleotides comprising the steps of:
(a.) providing a plurality of templates comprising a plurality of first reactive units associated with a corresponding plurality of first oligonucleotides, wherein:
i. each first oligonucleotide defines at least a first codon sequence, a second codon sequence, and a third codon sequence;
ii. each of the first, second and third codon sequence is at least 12 bases in length;
iii. each of the first, second and third codon sequence is different from one another; and
iv. each first oligonucleotide is at least 70 bases in length;
(b.) dividing the plurality of templates into a plurality of aliquots;
(c.) for each aliquot:
i. providing a plurality of first transfer units comprising a plurality of second reactive units covalently attached to a corresponding plurality of second oligonucleotides, wherein each second oligonucleotide defines a first anti-codon sequence complementary to a first codon sequence;
ii. annealing first and second oligonucleotides having complementary codon and anti-codon sequences to bring the first and second reactive units into reactive proximity, thereby producing a plurality of first small molecules associated with the corresponding first oligonucleotides;
iii. providing a plurality of second transfer units comprising a plurality of third reactive units covalently attached to a corresponding plurality of third oligonucleotides, wherein each third oligonucleotide defines a second anti-codon sequence complementary to the second codon sequence;
iv. annealing first and third oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of first small molecules of step (ii) and the third reactive units into reactive proximity thereby producing a plurality of second small molecules associated with the corresponding first oligonucleotides;
v. providing a plurality of third transfer units comprising a plurality of fourth reactive units covalently attached to a corresponding plurality of fourth oligonucleotides, wherein each fourth oligonucleotide defines a third anti-codon sequence complementary to the third codon sequence;
vi. annealing first and fourth oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of second small molecules of step (iv) and the fourth reactive units into reactive proximity thereby producing a plurality of third small molecules associated with the corresponding first oligonucleotides, wherein:
the order of adding the first, second, and third transfer units is different from any other aliquot; and
each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the third small molecule associated therewith; and
(d.) optionally recombining two or more of the aliquots to create a library of small molecules.
21. The method of claim 20, wherein an encoding sequence is ligated to the 3' terminus of each of the plurality of templates before or after addition of a moiety provided by a soluble reagent or other chemical modification to identify the moiety.
22. The method of claim 21, wherein the first oligonucleotide comprises a fourth codon of at least 12 bases in length, wherein prior to optionally recombining the two or more aliquots, the method comprising the additional steps of:
vii. providing a plurality of fourth transfer units comprising a plurality of fifth reactive units covalently attached to a corresponding plurality of fifth oligonucleotides, wherein each fifth oligonucleotide defines a fourth anti-codon sequence complementary to the fourth codon sequence;
viii. annealing first and fifth oligonucleotides having complementary codon and anti-codon sequences to bring the plurality of third small molecules of step (vi) and the fifth reactive units into reactive proximity thereby producing a plurality of fourth small molecules associated with the corresponding first oligonucleotides, wherein for each aliquot:
the order of adding the first, second, third and fourth transfer units is different from any other aliquot; and
each of the corresponding first oligonucleotides has a nucleotide sequence informative of at least a portion of the synthetic history of the fourth small molecule associated therewith.
23. The method of claim 21 or 22, wherein at least one codon is at least 14 bases in length.
24. The method of claim 23, wherein each of the first, second, third and fourth codon is at least 14 bases in length.
25. The method of any one of claims 20-24, wherein the first oligonucleotide is at least 90 bases in length.
26. The method of any one of claims 20-25, wherein the fourth small molecule comprises a moiety that was added as a soluble reagent to the first oligonucleotide-associated small molecule of one or more of steps (ii), (iv), (vi) or (viii); and, optionally, wherein each of the first oligonucleotides comprises a nucleotide sequence that is informative of the soluble reagent-added moiety.
27. The method of any one of claims 20-25, wherein at least one of the first, second, third, fourth or fifth reactive unit, or the soluble reagent is a trivalent moiety.
28. The method of any one of claims 20-27, wherein the second, third or fourth small molecule, or the soluble reagent comprises a reactive moiety capable of further reaction with a plurality of chemical moieties.
29. The method of claim 28, wherein the reactive moiety capable of further reaction with another chemical moiety is selected from a nucleophilic primary or secondary amine and a free carboxyl group.
30. The method of claim 28 or 29 comprising the further steps of:
splitting the library into a plurality of aliquots following addition of the reactive moiety capable of further reaction with a plurality of chemical moieties; and
adding to each of the plurality of aliquots a different reagent that reacts with the reactive moiety present on the first oligonucleotide-associated small molecules present therein.
31. The method of claim 30, wherein the different reagents comprise two or more of an acylating agent, a sulfonating agent, a heteroaryl halide reagent, reductive amination reagents and an amide-forming reagent.
32. The method of any one of claims 20-31, wherein the concentration of the plurality of templates is at least 90 nM and no greater than 500 nM at each step when a reactive unit is added.
33. The method of any one of claims 20-32, wherein one or more of the plurality of second, third, fourth or fifth oligonucleotides is bound to a first binding pair member.
34. The method of claim 33, wherein the first binding pair member is biotin.
35. The method of any one of claims 20-34, wherein following the addition of one or more of the first, second, third, or fourth reactive unit, or the soluble reagent-added moiety, the plurality of templates is reacted with a capping reagent that differentially caps the small molecules that did not react with the corresponding reactive unit or soluble reagent, wherein the cap renders the small molecule that did not react with the corresponding reactive unit or soluble reagent unable to react with any further reactive units or soluble reagents.
36. The method of claim 35, wherein the capping reagent is an acid anhydride or acyl chloride.
37. The method of claim 36, wherein the capping reagent is acetic anhydride.
38. The method of any one of claims 33-37, wherein one or more of the plurality of first, second, third or fourth small molecules is purified by contact with a second binding pair member, wherein the second binding pair member is bound to a solid support.
39. The method of any one of claims 1-38, wherein each first oligonucleotide comprises a unique tag sequence that defines the linker or capping group, or any other structural modification to the third or fourth small molecule that was not achieved through a DNA-templated reaction step.
40. A library of compounds produced by the method of any one of claims 1-39.
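
The numeric limits recited in the claims above can be illustrated with a short Python sketch. It is not part of the claims and is not the applicants' implementation. It assumes plain Watson-Crick pairing, so that an anti-codon is modelled as the reverse complement of its codon, and it reads claim 20 as allowing at most one aliquot per distinct order of transfer-unit addition. All function names and the example sequences are hypothetical and chosen only for illustration.

```python
from itertools import permutations
from typing import Sequence

# Watson-Crick complement table for DNA bases (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def anti_codon(codon: str) -> str:
    """Return the reverse complement of a codon, used here as its anti-codon."""
    return codon.translate(COMPLEMENT)[::-1]


def template_is_valid(template: str, codons: Sequence[str],
                      min_codon_len: int = 12,
                      min_template_len: int = 70) -> bool:
    """Check the length and distinctness limits recited in claims 1 and 20."""
    return (
        len(template) >= min_template_len                  # template >= 70 bases
        and len(codons) >= 3                               # at least three codons
        and all(len(c) >= min_codon_len for c in codons)   # each codon >= 12 bases
        and len(set(codons)) == len(codons)                # codons all different
        and all(c in template for c in codons)             # codons lie on the template
    )


def aliquot_orders(n_transfer_steps: int = 3) -> list:
    """List every distinct order of adding transfer units 1..n (one per aliquot)."""
    return list(permutations(range(1, n_transfer_steps + 1)))


def concentration_in_range(nanomolar: float) -> bool:
    """Template concentration window of claims 13 and 32 (90 nM to 500 nM)."""
    return 90 <= nanomolar <= 500


# Hypothetical 12-base codons and a 70-base template built around them.
EXAMPLE_CODONS = ["ACGTACGTACGA", "TTGACCGGTAAC", "GGCATTCAGTCC"]
EXAMPLE_TEMPLATE = (
    "A" * 10 + EXAMPLE_CODONS[0] + "T" * 7 + EXAMPLE_CODONS[1]
    + "C" * 7 + EXAMPLE_CODONS[2] + "G" * 10
)

if __name__ == "__main__":
    print(template_is_valid(EXAMPLE_TEMPLATE, EXAMPLE_CODONS))
    print([anti_codon(c) for c in EXAMPLE_CODONS])
    print(len(aliquot_orders(3)), len(aliquot_orders(4)))
```

Under this reading, three sets of transfer units permit up to six differently ordered aliquots, and adding the fourth transfer units of claim 22 raises that to twenty-four.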
PCT/US2015/068308 2014-12-31 2015-12-31 Methods and compositions for nucleic acid-templated synthesis of large libraries of complex small molecules WO2016109808A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462099092P 2014-12-31 2014-12-31
US62/099,092 2014-12-31

Publications (1)

Publication Number Publication Date
WO2016109808A1 (en) 2016-07-07

Family

ID=56285077

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/068308 WO2016109808A1 (en) 2014-12-31 2015-12-31 Methods and compositions for nucleic acid-templated synthesis of large libraries of complex small molecules

Country Status (1)

Country Link
WO (1) WO2016109808A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090035824A1 (en) * 2005-06-17 2009-02-05 Liu David R Nucleic acid-templated chemistry in organic solvents
US7491494B2 (en) * 2002-08-19 2009-02-17 President And Fellows Of Harvard College Evolving new molecular function
US20090149347A1 (en) * 2005-06-07 2009-06-11 President And Fellows Of Harvard College Ordered Multi-Step Synthesis by Nucleic Acid-Mediated Chemistry

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GARTNER ET AL.: "DNA-templated organic synthesis and selection of a library of macrocycles", SCIENCE, vol. 305, 2004, pages 1601-1605, XP002397753, DOI: 10.1126/science.1102629 *
TSE ET AL.: "Translation of DNA into a library of 13000 synthetic small-molecule macrocycles suitable for in vitro selection", THE JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 130, 2008, pages 15611-15626, XP055171629, DOI: 10.1021/ja805649f *

Similar Documents

Publication Publication Date Title
EP1899465B1 (en) Iterated branching reaction pathways via nucleic acid-mediated chemistry
EP1888746B1 (en) Ordered multi-step synthesis by nucleic acid-mediated chemistry
US20090035824A1 (en) Nucleic acid-templated chemistry in organic solvents
JP5254934B2 (en) Evolving new molecular functions
US20090203530A1 (en) Polymer evolution via templated synthesis
JP2005536234A5 (en)
US20180251756A1 (en) Template directed split and mix synthesis of small molecule libraries
JP4969459B2 (en) Free reactants used in nucleic acid template synthesis
JP2004536162A (en) Evolution of new molecular functions
WO2016109808A1 (en) Methods and compositions for nucleic acid-templated synthesis of large libraries of complex small molecules

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15876360

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15876360

Country of ref document: EP

Kind code of ref document: A1