WO2005044979A2 - Glycosylation enzymes and systems and methods of making and using them - Google Patents

Glycosylation enzymes and systems and methods of making and using them

Publication number
WO2005044979A2
Authority
WO
WIPO (PCT)
Prior art keywords
seq
nucleic acid
sequence
polypeptide
set forth
Prior art date
Application number
PCT/US2004/025015
Other languages
French (fr)
Other versions
WO2005044979A3 (en)
Inventor
Axel Trefzer
Brian D. Green
Mervyn Bibb
Dylan Mason
Original Assignee
Diversa Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Diversa Corporation
Publication of WO2005044979A2
Publication of WO2005044979A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P: FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P21/00: Preparation of peptides or proteins
    • C12P21/005: Glycopeptides, glycoproteins
    • A: HUMAN NECESSITIES
    • A01: AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01N: PRESERVATION OF BODIES OF HUMANS OR ANIMALS OR PLANTS OR PARTS THEREOF; BIOCIDES, e.g. AS DISINFECTANTS, AS PESTICIDES OR AS HERBICIDES; PEST REPELLANTS OR ATTRACTANTS; PLANT GROWTH REGULATORS
    • A01N43/00: Biocides, pest repellants or attractants, or plant growth regulators containing heterocyclic compounds
    • A01N43/90: Biocides, pest repellants or attractants, or plant growth regulators containing heterocyclic compounds having two or more relevant hetero rings, condensed among themselves or with a common carbocyclic ring system

Definitions

  • TECHNICAL FIELD This invention relates to the fields of agriculture, pharmacology and molecular biology.
  • the invention provides a novel technology for glycosylation of natural products using novel genetically engineered strains of bacteria.
  • These in vivo glycosylation systems of the invention express a heterologous glycosyltransferase and deoxysugar pathway that are capable of glycosylating a suitable substrate, which can be added to a culture broth.
  • the invention also provides novel compounds glycosylated by these novel genetically engineered strains (the in vivo glycosylation systems of the invention), including novel peptides, mixed polyketide-peptides, or polyketides, including novel macrolides, e.g., glycosylated spinosyn derivatives, glycosylated derivatives of antibiotics such as erythromycin, rifampicin, glycosylated derivatives of anti-tumor drugs such as daunorubicin, mithramycin, derivatives of immunosuppressants such as rapamycin, FK520, FK506, glycosylated derivatives of anti-fungals such as amphotericin, glycosylated derivatives of antibacterials such as tylosin, glycosylated derivatives of antiparasitics such as avermectin, glycosylated derivatives of insecticides such as spinosyn, and methods for making these compounds using the in vivo
  • Actinomycete-derived polyketides are used as antibacterials (tylosin), antiparasitics (avermectin), and insecticides (spinosyn). This wide range of applications underscores the commercial importance of polyketides. Many of the compounds mentioned above contain glycosyl residues attached to the polyketide scaffold. These sugars serve as molecular recognition elements that are frequently critical for biological activity; in their absence, function is often abolished or dramatically reduced. Most of the sugars found in secondary metabolites belong to the 6-deoxysugar family.
  • Spinosyns are a novel family of fermentation-derived natural products that exhibit potent insecticidal activities.
  • Spinosad, a naturally occurring mixture of spinosyn A and spinosyn D, has successfully established its utility for crop protection applications in the agrochemical field.
  • Spinosyns are macrolides produced by Saccharopolyspora spinosa and Saccharopolyspora pogona and show excellent insecticidal activity. They are characterized by an unusual tetracyclic macrolide aglycone to which two deoxysugar moieties are attached.
  • a mixture of the main components, spinosyn A and D from S. spinosa is marketed to control insects in cotton, vegetables, fruit trees and nuts.
  • the first committed step in deoxysugar biosynthesis is the removal of the 6-OH group by NDP-glucose-4,6-dehydratase, yielding the key intermediate NDP-4-keto-6-deoxy-glucose, which is then further modified to yield the final deoxysugar.
  • the NDP-activated deoxysugar is then transferred to the aglycone, or to a growing sugar chain, by pathway-specific glycosyltransferases (GT).
  • GTs can show flexibility towards both their aglycone and their sugar substrates. For example, Trefzer, et al., (2002) J. Am. Chem Soc.
  • UrdGT2 acted on both the urdamycin and premithramycinone aglycones and was able to accept several sugars (D-mycarose, D- and L-rhodinose) besides its natural substrate (D-olivose).
  • deletion of the deoxysugar biosynthetic genes in the urdamycin-producing organism resulted in rerouting of deoxysugar biosynthesis towards sugars not found in the wild-type strain (see Hoffmeister (2000) A. Chem Biol. 7(11):821-831). Similar results were found for enzymes involved in the production of picromycin, elloramycin, and oleandomycin.
  • This system also allows incorporation of unnatural sugars such as halogenated sugars.
  • a more cost-effective in vivo fermentation process is required.
  • Methods for chemical glycosylation have been described as well.
  • synthesis of the highly modified sugar moieties found in natural products and their attachment is difficult and not feasible at a production scale.
  • the ability to modulate the glycosylation state of a natural product in a rapid and cost-effective way would be a highly desirable attribute for any natural product drug discovery program.
  • the invention provides in vivo glycosylation systems comprising an engineered host cell, for example, an actinomycetes (including any organism from the order Actinomycetales), e.g., a recombinantly engineered actinomycetes, comprising a heterologous glycosyltransferase and a heterologous deoxysugar pathway.
  • the actinomycetes is a Streptomyces, such as a Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavendulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygr
  • the actinomycete is an actinomycete plant endophyte.
  • the host cell is an actinomycetes from the family Micromonosporaceae, or the genus Actinomyces, Actinomadura or Nocardia.
  • Micromonosporaceae are preferably Micromonosporaceae, Actinoplanes, Dactylosporangium, Micromonospora or
  • the host cell can also comprise a Pseudonocardineae, Actinosynnema, Lechevalieria, Saccharothrix, Actinoalloteichus, Actinopolyspora, Amycolatopsis, Kibdelosporangium, Pseudonocardia, Saccharomonospora, Saccharopolyspora, and Streptoalloteichus.
  • the host cell can be a Streptomycetaceae, including Kitasatospora and Streptomyces.
  • the host cell can be a Microbispora and Microtetraspora.
  • the heterologous glycosyltransferase or the heterologous deoxysugar pathway can be a glycosyltransferase or a deoxysugar pathway of the invention.
  • the invention provides in vitro glycosylation systems comprising a heterologous glycosyltransferase and a heterologous deoxysugar pathway and a cell extract of a host cell, e.g., actinomycetes, or equivalent, which includes, as noted above, any organism from the order Actinomycetales.
  • the actinomycete is a Streptomyces, e.g., a Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavendulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces
  • Micromonosporaceae are preferably Micromonosporaceae, Actinoplanes, Dactylosporangium, Micromonospora or Verrucosispora.
  • Alternative host cells include Pseudonocardineae, particularly Actinosynnema, Lechevalieria, Saccharothrix, Actinoalloteichus, Actinopolyspora, Amycolatopsis, Kibdelosporangium, Pseudonocardia, Saccharomonospora, Saccharopolyspora, and Streptoalloteichus.
  • Alternative host cells include the Streptomycetaceae, including Kitasatospora and Streptomyces.
  • heterologous glycosyltransferase or the heterologous deoxysugar pathway can be a glycosyltransferase or a deoxysugar pathway of the invention.
  • the glycosyltransferase is a recombinant enzyme and/or the deoxysugar pathway comprises recombinant enzymes, or combinations thereof.
  • the glycosyltransferase is a rhamnosyl-transferase, an anthracycline glycosyltransferase, desosaminyltransferase, mycarosyltransferase, megosaminyltransferase, oleandrosyltransferase, olivosyl-transferase, mycaminosyltransferase, deoxyallose transferase, forosaminyltransferase, mannosyltransferase, daunosaminyltransferase, rhodinosyltransferase, quinovosyltransferase, or a macrolide glycosyltransferase.
  • the deoxysugar pathway comprises a rhamnose, a forosamine, mycarose, mycaminose, desosaminose, megosaminose, oleandrosose, olivosose, deoxyallose, mannose, daunosaminose, rhodinose, quinovose, and/or an L-digitoxose biosynthetic pathway.
  • the glycosyltransferase transfers a deoxy-glucose, -rhamnose, -digitoxose, -forosamine, -mycarose, -mycaminose, -desosaminose, -megosaminose, -oleandrosose, -olivosose, -deoxyallose, -mannose, -daunosaminose, -rhodinose, -quinovose, and/or their D- or L-forms.
  • the glycosyltransferase transfers one of the sugars shown in Figures 31 or 32 or as described herein.
  • the invention provides methods for making a glycosylated natural product comprising the following steps (a) providing an in vivo or in vitro glycosylation system comprising an engineered host cell, e.g. actinomycetes (or cell extract equivalent) comprising a heterologous glycosyltransferase and a heterologous deoxysugar pathway; (b) providing natural product; and (c) adding the natural product to the in vivo or in vitro glycosylation system, thereby glycosylating the natural product.
  • the natural product is either added exogenously or provided in vivo or in vitro by expressing biosynthetic genes for the natural product.
  • the natural product comprises an aglycone or a pseudoaglycone, or 9- or 17-pseudoaglycone, a macrolide, or the natural product comprises a peptide, a mixed polyketide-peptide, or a polyketide, such as a macrolide, e.g., a spinosyn, an erythromycin, a rifampicin, idarubicin, epirubicin, a daunorubicin, a mithramycin, a rapamycin, FK520, FK506, an amphotericin, a tylosin or an avermectin.
  • the macrolide aglycone or pseudoaglycone includes an aglycone or pseudoaglycone of a spinosyn, an erythromycin, a rifampicin, idarubicin, epirubicin, a daunorubicin, a mithramycin, a rapamycin, FK520, FK506, an amphotericin, a tylosin, oleandomycin, rifamycin, immunomycin, narbomycin, pikromycin, spiramycin, dirithromycin, clarithromycin, troleandomycin, azithromycin or an avermectin.
  • the actinomycete is a Streptomyces, e.g., a Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavendulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Strepteptomyces,
  • Micromonosporaceae are preferably Micromonosporaceae, Actinoplanes, Dactylosporangium, Micromonospora or Verrucosispora.
  • the host cell can be a Pseudonocardineae, particularly Actinosynnema, Lechevalieria, Saccharothrix, Actinoalloteichus, Actinopolyspora, Amycolatopsis, Kibdelosporangium, Pseudonocardia, Saccharomonospora,
  • the host cell can be a Streptomycetaceae, including Kitasatospora and Streptomyces.
  • the host cell can be a Microbispora and Microtetraspora.
  • the glycosyltransferase is a recombinant enzyme and/or the deoxysugar pathway comprises recombinant enzymes, or combinations thereof.
  • the glycosyltransferase is a rhamnosyl-transferase, an anthracycline glycosyltransferase, desosaminyltransferase, mycarosyltransferase, megosaminyltransferase, oleandrosyltransferase, olivosyl-transferase, mycaminosyltransferase, deoxyallose transferase, forosaminyltransferase, mannosyltransferase, daunosaminyltransferase, rhodinosyltransferase, quinovosyltransferase, or a macrolide glycosyltransferase.
  • the deoxysugar pathway comprises a rhamnose, a forosamine, mycarose, mycaminose, desosaminose, megosaminose, oleandrosose, olivosose, deoxyallose, mannose, daunosaminose, rhodinose, quinovose and/or an L-digitoxose biosynthetic pathway.
  • the invention provides a compound of the formula 9-6-deoxy-D-glucosyl-spinosyn, a 9-L-rhamnosyl-spinosyn, a 9-L-digitoxosyl-spinosyn, L-rhamnosyl-17-pseudoaglycone spinosyn, 6-deoxy-beta-D-glucose-17-pseudoaglycone spinosyn, D-quinovose-17-pseudoaglycone spinosyn, or L-digitoxosyl-17-pseudoaglycone spinosyn.
  • a spinosyn of the invention can be an A or D form or a 21-butenyl form as described herein.
  • Alternative spinosyns are the spinosyn A and spinosyn D forms.
  • the invention provides a compound of the formula 9-6-deoxy-D-glucosyl-spinosyn A, a 9-L-rhamnosyl-spinosyn A, a 9-L-digitoxosyl-spinosyn A, L-rhamnosyl-17-pseudoaglycone spinosyn A, 6-deoxy-beta-D-glucose-17-pseudoaglycone spinosyn A, D-quinovose-17-pseudoaglycone spinosyn A, or L-digitoxosyl-17-pseudoaglycone spinosyn A.
  • the invention provides a compound of the formula 9-6-deoxy-D-glucosyl-spinosyn D, a 9-L-rhamnosyl-spinosyn D, a 9-L-digitoxosyl-spinosyn D, L-rhamnosyl-17-pseudoaglycone spinosyn D, 6-deoxy-beta-D-glucose-17-pseudoaglycone spinosyn D, D-quinovose-17-pseudoaglycone spinosyn D, or L-digitoxosyl-17-pseudoaglycone spinosyn D.
  • a spinosyn compound where the sugar moiety at the 9-position of a spinosyn aglycone or a pseudoaglycone or a spinosyn is selected from a group provided in Figure 31, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alken
  • the invention also provides a spinosyn compound where the sugar moiety at the 17-position of a spinosyn aglycone or a pseudoaglycone or a spinosyn is selected from a group provided in Figure 32, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alken
  • the invention also provides a novel macrolide having as one of its sugar moieties a group selected from Figure 31, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula M is C1-C6 alkyl,
  • the invention also provides a novel macrolide having as one of its sugar moieties a group selected from Figure 32, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula M is C1-C6 alkyl
  • a glycosyltransferase capable of transferring the at least one deoxy sugar compound described herein.
  • the sugar biosynthesis pathway and/or glycosyltransferase of the invention yield a spinosyn or a 9- or 17-pseudoaglycone having at least one of the deoxy sugar compounds described herein.
  • the methods of the present invention can yield a spinosyn or pseudoaglycone where the sugar moiety at the 9 position of a spinosyn or a pseudoaglycone is selected from a group provided in Figure 31, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-
  • the methods of the present invention can yield a spinosyn or pseudoaglycone where the sugar moiety at the 17 position of a spinosyn or a pseudoaglycone is selected from a group provided in Figure 32, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alken
  • the invention provides an insecticide comprising a glycosylated natural product of the invention, e.g., a natural product glycosylated by a method of the invention or made using an in vivo glycosylation system of the invention, e.g., 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudoaglycone spinosyn, 6-deoxy-D-glucose-17-pseudoaglycone spinosyn, D-quinovose-17-pseudoaglycone spinosyn, or a combination thereof, more preferably 9-6-deoxy-D-glucosyl-spinosyn A or D, 9-L-rhamnosyl-spinosyn A or D, 9-L-digitoxos
  • the invention provides a pharmaceutical composition comprising a glycosylated natural product of the invention, e.g., a natural product glycosylated by a method of the invention or made using an in vivo glycosylation system of the invention, e.g., 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudoaglycone spinosyn, 6-deoxy-D-glucose-17-pseudoaglycone spinosyn, D-quinovose-17-pseudoaglycone spinosyn, or a combination thereof, more preferably 9-6-deoxy-D-glucosyl-spinosyn A or D, 9-L-rhamnosyl-spinosyn A or D, 9-L-
  • the invention provides methods of preventing or treating or ameliorating an infection in a cell or an organism or a plant comprising application of an effective amount of a composition comprising a glycosylated natural product of the invention, e.g., a natural product glycosylated by a method of the invention or made using an in vivo or in vitro glycosylation system of the invention, e.g., comprising 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudoaglycone spinosyn, 6-deoxy-D-glucose-17-pseudoaglycone spinosyn, D-quinovose-17-pseudoaglycone spinosyn, or a combination thereof, more preferably 9-6-deoxy-D-glucosyl-
  • Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum, Pennisetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna or Zea, or the plant can be an angiosperm or a gymnosperm or a monocot or a
  • kits comprising a 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudoaglycone spinosyn, 6-deoxy-D-glucose-17-pseudoaglycone spinosyn, D-quinovose-17-pseudoaglycone spinosyn, or a combination thereof, more preferably 9-6-deoxy-D-glucosyl-spinosyn A or D, 9-L-rhamnosyl-spinosyn A or D, 9-L-digitoxosyl-spinosyn A or D, L-digitoxosyl-17-pseudoaglycone spinosyn A or D, 6-deoxy-D-glucose-17-pseudoaglycon
  • the invention provides methods for treating a plant with a novel compound of the invention, e.g., a novel spinosyn, e.g., as an insecticide in or on a plant, such as a plant from the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot,
  • the plant can be an angiosperm or a gymnosperm.
  • the plant can be a monocot or a dicot. In one aspect, the plant is a transgenic plant.
  • nucleic acids encoding polypeptides comprising glycosyltransferase and/or deoxysugar pathways can comprise an expression cassette, e.g., comprising a polypeptide-encoding nucleic acid operatively linked to a promoter.
  • the nucleic acid can be operatively linked to any kind of promoter, such as an inducible promoter, a constitutive promoter and/or a tissue specific or developmentally or environmentally regulated promoter.
  • the promoter can be a plant promoter (e.g., promoters endogenous to or active in plants), such as a cauliflower mosaic virus (CaMV) 35S transcription initiation region or a 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens.
  • the promoter can be an inducible plant promoter.
  • the inducible promoter can be responsive to an environmental condition, such as an anaerobic condition, elevated temperature, the presence of light or a chemical.
  • a plant is exposed to a chemical to induce the promoter.
  • the plant promoter is a maize In2-2 promoter that is activated by a benzenesulfonamide herbicide.
  • a plant or plant part is sprayed or otherwise treated (e.g., dipped, painted, etc.) with a chemical (e.g., in a solution) to induce the promoter.
  • the entire plant, or seeds, fruits, leaves, roots, tubers and the like can be treated, e.g., sprayed.
  • Plant parts, e.g., leaves, roots, tubers, fruits or seeds can be sprayed after harvesting from the plant.
  • a plant or plant part can be sprayed or otherwise treated (e.g., dipped, painted, etc.) with a composition (e.g., a solution) comprising a natural product of the invention.
  • the entire plant, or seeds, fruits, leaves, roots, tubers and the like can be treated, e.g., sprayed.
  • Plant parts, e.g., leaves, roots, tubers, fruits or seeds, can be sprayed or otherwise treated after harvesting from the plant.
  • the nucleic acid encoding polypeptides comprising a glycosylation system of the invention can comprise an expression vector.
  • the nucleic acid can further comprise any kind of expression vector, e.g., the expression vector can comprise nucleic acid derived from a bacteria, a virus or a transposable element or derivatives thereof, e.g., Agrobacterium spp., potato virus X, tobacco mosaic virus, tomato bushy stunt virus, tobacco etch virus, bean golden mosaic virus, cauliflower mosaic virus, maize Ac/Ds transposable element, maize suppressor mutator (Spm) transposable element or derivatives thereof.
  • the invention provides methods for screening for a composition having an insecticidal or anti-microbial activity comprising the following steps: (a) providing a composition of the invention; (b) providing a test cell or organism; (c) reacting the composition of step (a) with the test cell or organism; and (d) monitoring insecticidal or anti-microbial activity, thereby determining that the composition has an insecticidal or anti-microbial activity.
  • the test cell or organism can be derived from a biological sample, e.g., a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a plant cell, a fungal cell or a mammalian cell.
  • At least one step, or all of the steps, are conducted in a reaction vessel. At least one step, or all of the steps, can be conducted in a cell extract and/or in an intact cell, or a combination thereof.
  • the reaction vessel can comprise a microtiter plate, e.g., a capillary tube or a capillary array, such as a GIGAMATRIX™ array.
  • Monitoring production of the insecticidal or anti-microbial product can be by a growth selection assay or equivalent.
  • the test cell or organism comprises a cell extract or a cell fraction, e.g., a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a plant cell, a fungal cell or a mammalian cell.
  • the invention provides transgenic plants (including parts of the plants, e.g., seeds, leaves, fruits, roots and the like) and transformed plant cells and seeds comprising a glycosyltransferase or a glycosylation system of the invention.
  • the invention provides kits comprising a glycosyltransferase or a glycosylation system of the invention.
  • the kit can further comprise instructions for using the kit, e.g., instructions comprising how to use the methods and compositions of the invention.
  • the invention provides methods of treating a plant, or making a composition of the invention in a plant, comprising the following steps: (a) introducing nucleic acids encoding polypeptides comprising a glycosylation system of the invention, wherein the nucleic acids are operably linked to a promoter; and (b) expressing the polypeptides, thereby treating a plant, or making a composition of the invention in a plant.
  • the promoter is an inducible promoter or a constitutive promoter.
  • the plant can be a monocot or a dicot.
  • the monocot can be selected from the group consisting of maize, corn, sorghum and rice.
  • the plant is a transgenic plant comprising the nucleic acid.
  • the nucleic acid further comprises an expression vector, a recombinant virus and the like.
  • the invention provides methods for treating a cell, a plant, an organism, a food or a feed, or an object, comprising the following steps: providing a composition of the invention, and contacting the composition with the cell, plant, organism, food or feed, or object.
  • the composition can be provided by treating, e.g., spraying, painting, dipping, etc., with a formulation comprising the composition.
  • the composition can comprise a plant or a plant part, e.g., a seed, fruit, root, leaf, tuber and the like.
  • the food or feed comprises an animal feed, feed supplement, an animal grain, a food or a food additive.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%,
  • SEQ ID NO:1 sequence identity to an exemplary nucleic acid of the invention, e.g., SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52;
  • sequence identities can be determined, e.g., by analysis with a sequence comparison algorithm or by a visual inspection.
  • the sequence comparison algorithm is a BLAST version 2.2.2 algorithm where a filtering setting is set to blastall -p blastn -d "nr patnt" -F F, and all other options are set to default.
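As a rough illustration of what a percent-identity threshold means, the following minimal Python sketch scores two sequences that have already been aligned. It is not the BLAST 2.2.2 procedure named above: the function names, the gap character "-", and the choice to divide matches by the number of informative alignment columns are all assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: both inputs are already aligned, have equal length,
    and use '-' as the gap character. This is NOT a BLAST computation.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a gap-gap column carries no information
        columns += 1
        if x == y:
            matches += 1  # a gap never equals a residue, so gaps count as mismatches
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if the aligned pair reaches the given percent identity (hypothetical helper)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For the toy alignment "ACGT-A" against "ACCTTA", four of the six columns match, so the pair clears a 50% threshold but not a 70% one.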
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29, wherein nucleic acids having at least about 50% sequence identity to SEQ ID NO:1 and/or SEQ ID NO:3 encode a polypeptide (e.g., SEQ ID NO:2, SEQ ID NO:4) having a gtt, or nucleotidyl transferase activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:5 and/or SEQ ID NO
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:31, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt gene and at least one gdh gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:32 and SEQ ID NO:33.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:34, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, and SEQ ID NO:38. See pathway 1 cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:39, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, and SEQ ID NO:43. See pathway 2, cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:44, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:45, SEQ ID NO:46, and SEQ ID NO:47. See pathway 2-epi, cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:48, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt, gdh and epi gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:49, SEQ ID NO:50, and SEQ ID NO:51. See pathway 2-kre, cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:52, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, and SEQ ID NO:56. See pathway 3, cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:57, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt, gdh and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:58, SEQ ID NO:59, and SEQ ID NO:60. See pathway 3-epi, cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:61, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64 and SEQ ID NO:65. See pathway 4, cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:66, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt, gdh and epi gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:67, SEQ ID NO:68 and SEQ ID NO:69. See pathway 4-kre, cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:70, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73 and SEQ ID NO:74. See pathway 6, cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:75, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, and SEQ ID NO:81.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:89, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, and SEQ ID NO:95.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:96, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:97, SEQ ID NO:98,
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO: 103, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO: 104, SEQ ID NO:105, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:108 and SEQ ID NO:109.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:110, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, spnR, spnS, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:111, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:115, SEQ ID NO:116 and SEQ ID NO:117.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:118, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, spnR, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123 and SEQ ID NO:124.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:125, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:119, SEQ ID
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO: 132, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO: 133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO: 137 and SEQ ID NO: 138.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:139, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one nucleic acid encoding SEQ ID NO: 140 and SEQ ID NO:141.
  • the nucleic acid of SEQ ID NO:139 is designated the plasmid (vector) "pAT6", cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:142, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one nucleic acid encoding SEQ ID NO:143, SEQ ID NO:144, and SEQ ID NO:145. See the nucleic acid of SEQ ID NO:142, designated the plasmid (vector) "pUWL-spnP", cross-referenced in Table 2, below.
  • the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:146, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one nucleic acid encoding SEQ ID NO:147, SEQ ID NO:148, and SEQ ID NO:149. See the nucleic acid of SEQ ID NO:142, designated the plasmid (vector) "pUWL-spnG", cross-referenced in Table 2, below.
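The "at least about 50% sequence identity" thresholds recited above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by some alignment program (not shown here) and are supplied as equal-length strings with `-` marking gaps. The function names and the choice of denominator (all alignment columns with a residue in at least one sequence) are illustrative assumptions, not the definition used in this document.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identity is counted over alignment columns in which at
    least one sequence has a residue. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a column that is a gap in both rows is ignored
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `meets_threshold(query, reference, 50.0)` would flag a candidate nucleic acid as falling within the 50% bound under this particular convention.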
  • the pathways herein comprise at least one gtt and a gdh.
  • the pathways herein are pathways wherein all or a part of the pathway, preferably one, two or three or four genes of the pathway, are provided by an endogenous gene of the host cell.
  • the kre gene is provided by the host cell rather than the heterologous pathway.
  • all or part of the endogenous pathway genes, preferably one, two, three or four endogenous host pathway genes, are overexpressed at least two-fold above or inactivated at least two-fold below wild-type levels.
  • all or part of the endogenous host pathway gene or genes are selected to complement any of the pathway activities provided herein.
  • the host cell or cell extract therefrom comprises one or more inactivated, defective or inhibited endogenous glycosyltransferase and/or deoxysugar pathway genes or enzymes.
  • the endogenous gene or enzyme can be inactivated by any number of means including gene disruption, antisense, inhibitory RNA (e.g., iRNA), a regulatory mutation or a structural gene mutation (a mutation in the gene itself).
  • the host cell, e.g., an actinomycete host cell, or cell extract therefrom comprises two, three or four inactivated, defective or inhibited endogenous glycosyltransferases and/or deoxysugar pathway enzymes.
  • the one or more endogenous biosynthetic pathway genes can be part of a macrolide biosynthetic pathway that produces the macrolide target of the invention.
  • the inactivated, defective or inhibited endogenous glycosyltransferase and/or deoxysugar pathway gene or enzyme can result in the host cell providing an aglycone, pseudoaglycone, unnatural sugar or modified sugar compared to wild-type cell or extract.
  • the engineered host cell or cell extract comprises at least two heterologous glycosyltransferases and/or at least two heterologous deoxysugar pathways.
  • at least one biosynthetic deoxysugar pathway gene or enzyme and/or a glycosyltransferase of a system is provided by and endogenous to the host cell.
  • two, three or four pathway genes or enzymes are provided by and endogenous to the host cell.
  • the one or more endogenous genes are activated, e.g., induced, constitutive or overexpressed, or are inactivated
  • the engineered host cell comprises one or more genes of a heterologous macrolide biosynthetic pathway thereby producing the heterologous macrolide, or aglycone or pseudoaglycone thereof.
  • a combinatorial cell library comprising novel macrolides is provided that comprises at least two, pooled or separate, engineered host cells of the invention that are of different species, strains or mutants and contain the same heterologous biosynthetic deoxysugar pathway genes and/or heterologous glycosyltransferase.
  • a library is obtained that can provide a variety of modified or improved macrolides of interest.
  • host cells that differ in their ability to modify, i.e. decorate, a sugar moiety, for example as by carbamylation, O-methylation, O-alkylation, N-methylation, C-methylation, nitrosylation, amination, or deoxygenation.
  • the library can be screened for production of a modified macrolide of interest.
  • the deoxysugar biosynthetic pathways of the invention comprise:
  • Deoxysugar biosynthetic pathway 1 (SEQ ID NO:34)
    General Description: DNA pathway 1. Entire molecule length: 3536 bp.
    Feature Map, CDS (4 total):
      gtt-sp: Start: 30; End: 911; Original Location Description: 712..1593
      gdh-sp: Start: 939; End: 1928; Original Location Description: 1621..2610
      epi-sp: Start: 1962; End: 2570; Original Location Description: 719..1328
      kre-sp: Start: 2600; End: 3517; Original Location Description: 1358..2275
  • Deoxysugar biosynthetic pathway 4 (SEQ ID NO:61)
    General Description: DNA pathway 4. Entire molecule length: 3625 bp.
    Feature Map, CDS (4 total):
      gtt-SP: Start: 30; End: 911; Original Location Description: 712..1593
      gdh-SP: Start: 939; End: 1928; Original Location Description: 1621..2610
      epi-DS: Start: 1967; End: 2563; Original Location Description: 725..1321
      kre-DS: Start: 2591; End: 3610; Original Location Description: 1349..2368
    Restriction/Methylation Map: No cuts: ClaI, NcoI
  • Deoxysugar biosynthetic pathway 6 (SEQ ID NO:70)
    General Description: DNA pathway 6. Entire molecule length: 3727 bp.
    Feature Map, CDS (4 total):
      gtt-DS: Start: 79; End: 948; Streptomyces diversa gtt gene; Original Location Description: 73..942
      gdh-DS: Start: 1049; End: 2017; Streptomyces diversa gdh gene; Original Location Description: 1043..2011
      epi-SP: Start: 2056; End: 2664; Original Location Description: 2049..2658
      kre-DS: Start: 2693; End: 3712; Original Location Description: 2687..3706
  • Deoxysugar biosynthetic pathway 9 (SEQ ID NO:75)
    General Description: DNA pathway 9. Entire molecule length: 6010 bp.
    Feature Map, CDS (6 total):
      gtt-SP: Start: 30; End: 911; Original Location Description: 712..1593
      gdh-SP: Start: 939; End: 1928; Original Location Description: 1621..2610
      epi-SP: Start: 1962; End: 2570; Original Location Description: 719..1328
      kre-SP: Start: 2600; End: 3517; Original Location Description: 1358..2275
      tdh-DS: Start: 3545; End: 4960; Original Location Description: 25..1440
      tkr-DS: Start: 4995; End: 5996; Original Location Description: 24..1025
  • Deoxysugar biosynthetic Pathway 10 (SEQ ID NO:82)
    General Description: DNA Pathway 10. Entire molecule length: 6108 bp.
    Feature Map, CDS (6 total):
      gtt-SP: Start: 30; End: 911; Original Location Description: 712..1593
      gdh-SP: Start: 939; End: 1928; Original Location Description: 1621..2610
      epi-SP: Start: 1962; End: 2570; Original Location Description: 719..1328
      kre-DS: Start: 2599; End: 3618; Original Location Description: 1357..2376
      tdh-DS: Start: 3643; End: 5058; Original Location Description: 25..1440
      tkr-DS: Start: 5093; End: 6094; Original Location Description: 24..1025
  • Deoxysugar biosynthetic pathway 11 (SEQ ID NO:89)
    General Description: DNA pathway 11. Entire molecule length: 6002 bp.
    Feature Map, CDS (6 total):
      gtt-SP: Start: 30; End: 911; Original Location Description: 712..1593
      gdh-SP: Start: 939; End: 1928; Original Location Description: 1621..2610
      epi-DS: Start: 1967; End: 2563; Original Location Description: 725..1321
      kre-SP: Start: 2592; End: 3509; Original Location Description: 1350..2267
      tdh-DS: Start: 3537; End: 4952; Original Location Description: 25..1440
      tkr-DS: Start: 4987; End: 5988; Original Location Description: 24..1025
  • Deoxysugar biosynthetic pathway 12 (SEQ ID NO:96)
    General Description: DNA pathway 12. Entire molecule length: 6100 bp.
    Feature Map, CDS (6 total):
      gtt-SP: Start: 30; End: 911; Original Location Description: 712..1593
      gdh-SP: Start: 939; End: 1928; Original Location Description: 1621..2610
      epi-DS: Start: 1967; End: 2563; Original Location Description: 725..1321
      kre-DS: Start: 2591; End: 3610; Original Location Description: 1349..2368
      tdh-DS: Start: 3635; End: 5050; Original Location Description: 25..1440
      tkr-DS: Start: 5085; End: 6086; Original Location Description: 24..1025
    Restriction/Methylation Map: No cuts: ClaI, NcoI
  • Deoxysugar biosynthetic pathway 13 (SEQ ID NO:103)
    General Description: DNA pathway 13. Entire molecule length: 6104 bp.
    Feature Map, CDS (6 total):
      gtt-DS: Start: 79; End: 948; Streptomyces diversa gtt gene; Original Location Description: 72..941
      gdh-DS: Start: 1049; End: 2017; Streptomyces diversa gdh gene; Original Location Description: 1042..2010
      epi-SP: Start: 2056; End: 2664; Original Location Description: 2048..2657
      kre-SP: Start: 2694; End: 3611; Original Location Description: 2687..3604
      tdh-DS: Start: 3639; End: 5054; Original Location Description: 3632..5047
      tkr-DS: Start: 5089; End: 6090; Original Location Description: 5082..6083
  • Deoxysugar biosynthetic pathway 17 (SEQ ID NO:110)
    General Description: DNA pathway 17. Entire molecule length: 7811 bp.
    Feature Map, CDS (7 total):
      gtt-SP: Start: 24; End: 905; Original Location Description: 712..1593
      gdh-SP: Start: 933; End: 1922; Original Location Description: 1621..2610
      spnQ-SP: Start: 1964; End: 3352; spnQ from Spinosyn cluster; Original Location Description: complement(1988..3373)
      spnR-SP: Start: 3390; End: 4547; spnR from Spinosyn cluster; Original Location Description: complement(793..1947)
      spnS-SP: Start: 4550; End: 5299; spnS from Spinosyn cluster; Original Location Description: complement(1..793)
      tdh-DS: Start: 5346; End: 6761; Original Location Description: 25..1440
      tkr-DS: Start: 6796; End: 7797; Original Location Description: 24..1025
  • Deoxysugar biosynthetic Pathway 18 (SEQ ID NO:118)
    General Description: DNA Pathway 18. Entire molecule length: 7065 bp.
    Feature Map, CDS (6 total):
      gtt-SP: Start: 24; End: 905; Original Location Description: 712..1593
      gdh-SP: Start: 933; End: 1922; Original Location Description: 1621..2610
      spnQ-SP: Start: 1964; End: 3352; spnQ from Spinosyn cluster; Original Location Description: complement(1242..2627)
      spnR-SP: Start: 3390; End: 4547; spnR from Spinosyn cluster; Original Location Description: complement(47..1201)
      tdh-DS: Start: 4600; End: 6015; Original Location Description: 25..1440
      tkr-DS: Start: 6050; End: 7051; Original Location Description: 24..1025
  • Deoxysugar biosynthetic Pathway 19 (SEQ ID NO:125)
    General Description: DNA Pathway 19. Entire molecule length: 6825 bp.
    Feature Map, CDS (6 total):
      gtt-SP: Start: 30; End: 911; Original Location Description: 712..1593
      gdh-SP: Start: 939; End: 1928; Original Location Description: 1621..2610
      spnQ-SP: Start: 1970; End: 3355; spnQ from Spinosyn cluster; Original Location Description: complement(44..1429)
      kre-SP: Start: 3415; End: 4332; Original Location Description: 1350..2267
      tdh-DS: Start: 4360; End: 5775; Original Location Description: 25..1440
      tkr-DS: Start: 5810; End: 6811; Original Location Description: 24..1025
  • Deoxysugar biosynthetic Pathway 20 (SEQ ID NO:132)
    General Description: DNA Pathway 20. Entire molecule length: 6923 bp.
    Feature Map, CDS (6 total):
      gtt-SP: Start: 30; End: 911; Original Location Description: 712..1593
      gdh-SP: Start: 939; End: 1928; Original Location Description: 1621..2610
      spnQ-SP: Start: 1970; End: 3355; spnQ from Spinosyn cluster; Original Location Description: complement(44..1429)
      kre-DS: Start: 3414; End: 4433; Original Location Description: 1349..2368
      tdh-DS: Start: 4458; End: 5873; Original Location Description: 25..1440
      tkr-DS: Start: 5908; End: 6909; Original Location Description: 24..1025
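The feature maps above list each coding sequence (CDS) by start and end position within the pathway construct. As a hedged illustration, the following Python sketch checks that such a map is internally consistent. It assumes the coordinates are 1-based and inclusive, that CDS features should not overlap, and that each CDS spans a whole number of codons. These are assumptions made for the example, and the class and function names are hypothetical. The coordinates used are those given for pathway 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CDS:
    name: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def check_feature_map(molecule_length: int, features: list) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    ordered = sorted(features, key=lambda f: f.start)
    for f in ordered:
        if f.start < 1 or f.end > molecule_length or f.start > f.end:
            problems.append(f"{f.name}: coordinates outside 1..{molecule_length}")
        if f.length % 3 != 0:
            problems.append(f"{f.name}: length {f.length} is not a multiple of 3")
    for left, right in zip(ordered, ordered[1:]):
        if right.start <= left.end:
            problems.append(f"{left.name} overlaps {right.name}")
    return problems


# Coordinates as listed for deoxysugar biosynthetic pathway 1 (3536 bp).
PATHWAY_1 = [
    CDS("gtt-sp", 30, 911),
    CDS("gdh-sp", 939, 1928),
    CDS("epi-sp", 1962, 2570),
    CDS("kre-sp", 2600, 3517),
]
```

Calling `check_feature_map(3536, PATHWAY_1)` returns an empty list under these assumptions, and the same check can be run against the other pathway maps.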
  • the invention also provides deoxysugar biosynthetic pathways comprising any combination of the biosynthetic pathways described herein.
  • the invention provides a glycosylation system or a biosynthetic pathway comprising any combination of nucleic acids of the invention, including nucleic acids encoding enzymes of the invention and/or deoxysugar biosynthetic pathways of the invention, such as the exemplary nucleic acids of the invention SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52
  • nucleic acids can be expressed using one or any combination of designs, e.g., they can be assembled together on one or more plasmids (or other expression construct), or, each can be inserted separately in a plasmid (or other expression construct).
  • the nucleic acids of the invention (or, biosynthetic pathways of the invention) can be integrated into the chromosome together or separately, or, expressed episomally.
  • the isolated or recombinant nucleic acid encodes a polypeptide having a thermostable activity.
  • the polypeptide can retain activity under conditions comprising a temperature range of between about 37°C to about 95°C; between about 55°C to about 85°C, between about 70°C to about 95°C, or, between about 90°C to about 95°C.
  • the isolated or recombinant nucleic acid encodes a polypeptide which is thermotolerant.
  • the polypeptide can retain activity after exposure to a temperature in the range from greater than 37°C to about 95°C or anywhere in the range from greater than 55°C to about 85°C; or, after exposure to a temperature in the range between about 1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, between about 37°C to about
  • the isolated or recombinant nucleic acid encodes a polypeptide having improved expression in a host cell, improved enzymatic activity, or a different substrate specificity than wild type.
  • the different substrate specificity can be a more promiscuous activity such that a wider range of acceptors and/or donors is accepted by the enzyme.
  • Methods to improve enzyme or gene activity are well known in the art. Some such methods are described herein such as gene site saturation mutagenesis.
  • the modified enzymes or genes are then screened for a desired property or activity as would be known in the art.
  • the modified gene or enzyme has at least about 50%,
  • the invention provides isolated or recombinant nucleic acids comprising a sequence that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103;
  • the nucleic acid can be at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200 or more residues in length or the full length of the gene or transcript.
  • the stringent conditions include a wash step comprising a wash in 0.2X SSC at a temperature of about 65°C for about 15 minutes.
  • the invention provides a nucleic acid probe for identifying a nucleic acid encoding a polypeptide having a desired activity (as set forth herein), wherein the probe comprises at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more, consecutive bases of a sequence comprising a sequence of the invention, or fragments or subsequences thereof, wherein the probe identifies the nucleic acid by binding or hybridization.
  • the probe can comprise an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a sequence comprising a sequence of the invention, or fragments or subsequences thereof.
  • the invention provides a nucleic acid probe for identifying a nucleic acid encoding a polypeptide having a desired activity (as set forth herein), wherein the probe comprises a nucleic acid comprising a sequence at least about 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more residues having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%
  • the probe can comprise an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a nucleic acid sequence of the invention, or a subsequence thereof.
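The probes described above are defined as runs of consecutive bases taken from a sequence of the invention. A minimal Python sketch of enumerating such candidate probes follows. The function name is hypothetical, the reported positions are assumed to be 1-based, and no filtering for specificity or melting temperature is attempted.

```python
def candidate_probes(sequence: str, length: int):
    """Yield (1-based start, subsequence) for every run of `length` consecutive bases."""
    if length < 1:
        raise ValueError("probe length must be at least 1")
    seq = sequence.upper()
    # range() is empty when the sequence is shorter than the probe length
    for i in range(len(seq) - length + 1):
        yield i + 1, seq[i:i + length]
```

In practice a caller would pass one of the probe lengths recited above (for example 20 or 50) and then screen the resulting windows by whatever hybridization criteria apply.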
  • the invention provides an amplification primer pair for amplifying a nucleic acid encoding a polypeptide having a desired activity (as set forth herein), wherein the primer pair is capable of amplifying a nucleic acid comprising a sequence of the invention, or fragments or subsequences thereof.
  • One or each member of the amplification primer sequence pair can comprise an oligonucleotide comprising at least about 10 to 50 consecutive bases of the sequence, or about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more consecutive bases of the sequence.
  • the invention provides amplification primer pairs, wherein the primer pair comprises a first member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more residues of a nucleic acid of the invention, and a second member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more residues of the complementary strand of the first member.
  • the first member sequence of a primer pair comprises a sequence selected from a polypeptide-encoding region, its complement, or their degenerate sequences of the invention.
  • the amplification primer pair sequence is codon biased based on codon usage or on GC content of an Actinomyces or that of the target cell.
  • the GC content of actinomycetes is about 65-75%, more preferably about 70%, and any from about 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%,
  • the invention provides polypeptide-encoding nucleic acids generated by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention.
  • the invention provides enzymes generated by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention.
  • the invention provides methods of making a polypeptide by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention.
  • the amplification primer pair amplifies a nucleic acid from a library, e.g., a gene library, such as an environmental library.
  • the invention provides methods of amplifying a nucleic acid encoding a polypeptide having a desired activity (as set forth herein) comprising amplification of a template nucleic acid with an amplification primer sequence pair capable of amplifying a nucleic acid sequence of the invention, or fragments or subsequences thereof.
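The primer-pair construction described above (a first member taken from the first 5' residues of a nucleic acid, and a second member taken from the first 5' residues of the complementary strand) can be sketched in Python as follows. This is an illustrative reading of that description rather than a primer-design tool. The function names are hypothetical, only unambiguous A/C/G/T bases are handled, and the GC helper is included only because GC content is mentioned above as a basis for biasing primer sequences.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(template: str, n: int = 20):
    """Return (forward, reverse) primers of n bases each.

    forward: first n bases (5' end) of the given strand.
    reverse: first n bases (5' end) of the complementary strand.
    """
    if n < 1 or n > len(template):
        raise ValueError("primer length must be between 1 and the template length")
    forward = template[:n]
    reverse = reverse_complement(template)[:n]
    return forward, reverse


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 to 1.0)."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)
```

A caller might choose `n` anywhere in the 12 to 30 residue range recited above and use `gc_fraction` to compare a candidate primer against the GC content of the intended host.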
  • the invention provides expression cassettes comprising a nucleic acid of the invention or a subsequence thereof.
  • the expression cassette can comprise the nucleic acid that is operably linked to a promoter.
  • the promoter can be a viral, bacterial, mammalian or plant promoter.
  • the plant promoter can be a potato, rice, corn, wheat, tobacco or barley promoter.
  • the promoter can be a constitutive promoter.
  • the constitutive promoter can comprise CaMV35S.
  • the promoter can be an inducible promoter.
  • the promoter can be a tissue-specific promoter or an environmentally regulated or a developmentally regulated promoter.
  • the promoter can be, e.g., a seed-specific, a leaf-specific, a root-specific, a stem-specific or an abscission-induced promoter.
  • the expression cassette can further comprise a plant or plant virus expression vector.
  • the invention provides cloning vehicles comprising an expression cassette (e.g., a vector) of the invention or a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention).
  • the cloning vehicle can be a viral vector, a plasmid, a phage, a phagemid, a cosmid, a fosmid, a bacteriophage or an artificial chromosome.
  • the viral vector can comprise an adenovirus vector, a retroviral vector or an adeno-associated viral vector.
  • the cloning vehicle can comprise a bacterial artificial chromosome (BAC), a plasmid, a bacteriophage P1-derived vector (PAC), a yeast artificial chromosome (YAC), or a mammalian artificial chromosome (MAC).
  • the invention provides transformed cells comprising a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention), an expression cassette (e.g., a vector) of the invention, or a cloning vehicle of the invention.
  • the transformed cell can be a bacterial cell (e.g., an actinomycete, including any organism from the order Actinomycetales), a mammalian cell, a fungal cell, a yeast cell, an insect cell or a plant cell.
  • the plant cell can be a potato, wheat, rice, corn, tobacco or barley cell.
  • the invention provides transgenic non-human animals comprising a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention) or an expression cassette (e.g., a vector) of the invention.
  • the animal is a mouse.
  • the invention provides transgenic plants comprising a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention) or an expression cassette (e.g., a vector) of the invention.
  • the transgenic plant can be a corn plant, a potato plant, a tomato plant, a wheat plant, an oilseed plant, a rapeseed plant, a soybean plant, a rice plant, a barley plant or a tobacco plant.
  • the invention provides transgenic seeds comprising a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention) or an expression cassette (e.g., a vector) of the invention.
  • the transgenic seed can be a corn seed, a wheat kernel, an oilseed, a rapeseed, a soybean seed, a palm kernel, a sunflower seed, a sesame seed, a peanut or a tobacco plant seed.
  • the invention provides methods of making a transgenic plant comprising the following steps: (a) introducing a heterologous nucleic acid sequence into a plant cell, wherein the heterologous nucleic acid sequence comprises a nucleic acid sequence of the invention (comprising, in one aspect, a glycosylation pathway of the invention), thereby producing a transformed plant cell; and (b) producing a transgenic plant from the transformed cell.
  • the step (a) can further comprise introducing the heterologous nucleic acid sequence by electroporation or microinjection of plant cell protoplasts.
  • the step (a) can further comprise introducing the heterologous nucleic acid sequence directly to plant tissue by DNA particle bombardment.
  • the step (a) can further comprise introducing the heterologous nucleic acid sequence into the plant cell DNA using an Agrobacterium tumefaciens host.
  • the plant cell can be a potato, corn, rice, wheat, tobacco, or barley cell.
  • the invention provides methods of expressing a heterologous nucleic acid sequence in a cell, e.g., a bacterial or plant cell, comprising the following steps: (a) transforming the cell with a heterologous nucleic acid sequence operably linked to a promoter, wherein the heterologous nucleic acid sequence comprises a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention); (b) growing the cell (e.g., bacteria or plant) under conditions wherein the heterologous nucleic acid sequence is expressed in the cell.
  • the invention provides methods of expressing a heterologous nucleic acid sequence in a cell (e.g., bacteria or plant) comprising the following steps: (a) transforming the cell (e.g., bacteria or plant) with a heterologous nucleic acid sequence operably linked to a promoter, wherein the heterologous nucleic acid sequence comprises a sequence of the invention; (b) growing the cell (e.g., plant) under conditions wherein the heterologous nucleic acid sequence is expressed in the cell.
  • the invention provides an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention.
  • the invention provides methods of inhibiting the translation of a message (e.g., a glycosyl transferase, a methyltransferase, an aminotransferase, a 3,4-dehydratase, a 3-ketoreductase, 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, or an O-methyl transferase message) in a cell comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention.
  • the antisense oligonucleotide is between about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 bases in length, or any variation thereof.
  • the invention provides methods of inhibiting the translation of a message in a cell comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention.
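The complementarity requirement for an antisense oligonucleotide can be illustrated with a minimal sketch. It assumes an RNA message written 5'→3' and a perfectly complementary oligonucleotide; the function name `antisense_oligo`, the length check and the example message are hypothetical and are not taken from any SEQ ID NO of the invention. Hybridization under stringent conditions does not require perfect complementarity, so this shows only the simplest case.

```python
# Illustrative sketch only: names and the example message are hypothetical.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_oligo(mrna: str, start: int, length: int) -> str:
    """Return the RNA oligo (written 5'->3') complementary to mrna[start:start+length]."""
    if not 10 <= length <= 100:
        raise ValueError("length is outside the approximately 10-100 base range")
    target = mrna.upper()[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the message")
    if set(target) - set("ACGU"):
        raise ValueError("message must contain only A, C, G and U")
    # Complement each base, then reverse so the oligo reads 5'->3'.
    return target.translate(COMPLEMENT)[::-1]


message = "AUGGCUAGCUUAGGCAUCGAUCGGAUCCGUA"  # made-up message for illustration
oligo = antisense_oligo(message, 0, 12)
print(oligo)  # UAAGCUAGCCAU
```

In practice the target window would be chosen within a message encoded by a nucleic acid of the invention, and candidate oligonucleotides would still need to be tested for inhibition of translation.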
  • the invention provides double-stranded inhibitory RNA (RNAi) molecules comprising a subsequence of a sequence of the invention.
  • the RNAi is about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more duplex nucleotides in length.
  • the invention provides methods of inhibiting the expression of an enzyme (e.g., a glycosyl transferase, a methyltransferase, an aminotransferase, a 3,4-dehydratase, a 3-ketoreductase, 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, or an O-methyl transferase activity) in a cell comprising administering to the cell or expressing in the cell a double-stranded inhibitory RNA (iRNA), wherein the RNA comprises a subsequence of a sequence of the invention.
  • the invention provides an isolated or recombinant polypeptide comprising an amino acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary polypeptide or peptide of the invention over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350 or
  • Exemplary polypeptides and peptides of the invention comprise sequences as set forth in SEQ ID NO:2; SEQ ID NO:4; SEQ ID NO:6; SEQ ID NO:8; SEQ ID NO:10; SEQ ID NO:12; SEQ ID NO:14; SEQ ID NO:16; SEQ ID NO:18; SEQ ID NO:20; SEQ ID NO:22; SEQ ID NO:24; SEQ ID NO:26; SEQ ID NO:28; SEQ ID NO:30; SEQ ID NO:32; SEQ ID NO:33; SEQ ID NO:35; SEQ ID NO:36; SEQ ID NO:37; SEQ ID NO:38; SEQ ID NO:40; SEQ ID NO:41; SEQ ID NO:42; SEQ ID NO:43; SEQ ID NO:45; SEQ ID NO:46; SEQ ID NO:47; SEQ ID NO:49; SEQ ID NO:50; SEQ ID NO:51; SEQ ID NO:53; SEQ ID NO:54; SEQ
  • SEQ ID NO:104; SEQ ID NO:105; SEQ ID NO:106; SEQ ID NO:107; SEQ ID NO:108; SEQ ID NO:109; SEQ ID NO:111; SEQ ID NO:112; SEQ ID NO:113; SEQ ID NO:114; SEQ ID NO:115; SEQ ID NO:116; SEQ ID NO:117; SEQ ID NO:119; SEQ ID NO:120; SEQ ID NO:121; SEQ ID NO:122; SEQ ID NO:123; SEQ ID NO:124; SEQ ID NO:126; SEQ ID NO:127; SEQ ID NO:128; SEQ ID NO:129; SEQ ID NO:130; SEQ ID NO:131; SEQ ID NO:133; SEQ ID NO:134; SEQ ID NO:135; SEQ ID NO:136; SEQ ID NO:137; SEQ ID NO:140; SEQ ID NO:141; SEQ ID NO:143; SEQ ID NO:144; SEQ ID NO:145; SEQ ID NO:147
  • sequence identities can be determined, e.g., by analysis with a sequence comparison algorithm or by visual inspection.
  • sequence comparison algorithm is a BLAST version 2.2.2 algorithm where a filtering setting is set to blastall -p blastp -d "nr pataa" -F F, and all other options are set to default.
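As a rough illustration of what a percent sequence identity over a region means, the sketch below counts identical positions in two sequences that have already been aligned (gaps written as "-"). It is not the BLAST algorithm named above and does not perform the alignment itself; the function name `percent_identity`, the choice of denominator (all aligned columns that are not gap-to-gap) and the example peptides are assumptions made for illustration.

```python
# Illustrative sketch only: a simple identity count over pre-aligned sequences.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" and res_b == "-":
            continue  # a column of two gaps carries no information
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Made-up 10-residue peptides differing at one position.
print(percent_identity("MKTAYIAKQR", "MKTAHIAKQR"))  # 90.0
```

Different tools define the denominator differently (for example, alignment length versus the length of the shorter sequence), so a value computed this way may not match the figure reported by a given alignment program.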
  • Exemplary polypeptides also include peptides of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600 or more residues of an exemplary sequence of the invention, or over the full length of an exemplary enzyme of the invention.
  • the peptide can be, e.g., an immunogenic fragment, a motif (e.g., a binding site), or an active site.
  • exemplary polypeptide or peptide sequences of the invention include sequence encoded by a nucleic acid of the invention.
  • Exemplary polypeptide or peptide sequences of the invention include polypeptides or peptides specifically bound by an antibody of the invention.
  • a polypeptide of the invention can have a glycosyl transferase, a methyltransferase, an aminotransferase, a 3,4-dehydratase, a 3-ketoreductase, 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, or an O-methyl transferase activity, as described herein.
  • the enzyme activity is thermostable.
  • the polypeptide can retain enzyme activity under conditions comprising a temperature range of between about 1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, between about 37°C to about 95°C, between about 55°C to about 85°C, between about 70°C to about 75°C, or between about 90°C to about 95°C, or more.
  • the enzyme activity can be thermotolerant.
  • the polypeptide can retain enzyme activity after exposure to a temperature in the range from greater than 37°C to about 95°C, or in the range from greater than 55°C to about 85°C, or in the range from greater than 90°C to about 95°C at pH 4.5.
  • the isolated or recombinant polypeptide can comprise the polypeptide of the invention that lacks a signal sequence.
  • the isolated or recombinant polypeptide can comprise the polypeptide of the invention comprising a heterologous signal sequence.
  • the invention provides a signal sequence comprising a peptide comprising/consisting of a sequence as set forth in residues 1 to 12, 1 to 13, 1 to 14, 1 to 15, 1 to 16, 1 to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, 1 to 25, 1 to 26, 1 to 27, 1 to 28, 1 to 29, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to
  • the invention provides chimeric proteins comprising a first domain comprising a signal sequence of the invention and at least a second domain.
  • the protein can be a fusion protein.
  • the second domain can comprise an enzyme.
  • the enzyme can be an enzyme.
  • the invention provides chimeric polypeptides comprising at least a first domain comprising signal peptide (SP), a prepro sequence and/or a catalytic domain (CD) of the invention and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP), prepro sequence and/or catalytic domain (CD).
  • the heterologous polypeptide or peptide is not derived from an enzyme of the invention.
  • the heterologous polypeptide or peptide can be amino terminal to, carboxy terminal to or on both ends of the signal peptide (SP), prepro sequence and/or catalytic domain (CD).
  • the invention provides isolated or recombinant nucleic acids encoding a chimeric polypeptide, wherein the chimeric polypeptide comprises at least a first domain comprising signal peptide (SP), a prepro domain and/or a catalytic domain (CD) of the invention and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP), prepro domain and/or catalytic domain (CD).
  • the invention provides the isolated or recombinant polypeptide of the invention, wherein the polypeptide comprises at least one glycosylation site. In one aspect, glycosylation can be an N-linked glycosylation.
  • the polypeptide can be glycosylated after being expressed in a P. pastoris or a S. pombe.
  • the polypeptide can retain enzyme activity under conditions comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4.
  • the polypeptide can retain enzyme activity under conditions comprising about pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11.
  • the polypeptide can retain enzyme activity after exposure to conditions comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4.
  • the polypeptide can retain enzyme activity after exposure to conditions comprising about pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11.
  • the invention provides protein preparations comprising a polypeptide of the invention, wherein the protein preparation comprises a liquid, a solid or a gel.
  • the invention provides heterodimers comprising a polypeptide of the invention and a second protein or domain.
  • the second domain can be a polypeptide and the heterodimer can be a fusion protein.
  • the second domain can be an epitope or a tag.
  • the invention provides homodimers comprising a polypeptide of the invention.
  • the invention provides immobilized polypeptides having enzyme activity, wherein the polypeptide comprises a polypeptide of the invention, a polypeptide encoded by a nucleic acid of the invention, or a polypeptide comprising a polypeptide of the invention and a second domain.
  • the polypeptide can be immobilized on a cell, a metal, a resin, a polymer, a ceramic, a glass, a microelectrode, a graphitic particle, a bead, a gel, a plate, an array or a capillary tube.
  • the invention provides arrays comprising an immobilized nucleic acid or polypeptide of the invention.
  • the invention provides arrays comprising an antibody of the invention.
  • the invention provides isolated or recombinant antibodies that specifically bind to a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the invention.
  • the antibody can be a monoclonal or a polyclonal antibody.
  • the invention provides hybridomas comprising an antibody of the invention, e.g., an antibody that specifically binds to a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the invention.
  • the invention provides methods of making an antibody comprising administering to a non-human animal a nucleic acid of the invention or a polypeptide of the invention or subsequences thereof in an amount sufficient to generate a humoral immune response.
  • the invention provides methods of generating an immune response comprising administering to a non-human animal a nucleic acid of the invention or a polypeptide of the invention or subsequences thereof in an amount sufficient to generate an immune response.
  • the invention provides methods of producing a recombinant polypeptide comprising the steps of: (a) providing a nucleic acid of the invention operably linked to a promoter; and (b) expressing the nucleic acid of step (a) under conditions that allow expression of the polypeptide, thereby producing a recombinant polypeptide.
  • the method can further comprise transforming a host cell with the nucleic acid of step (a) followed by expressing the nucleic acid of step (a), thereby producing a recombinant polypeptide in a transformed cell.
  • the invention provides methods for identifying a polypeptide having a desired activity comprising the following steps: (a) providing a polypeptide of the invention; or a polypeptide encoded by a nucleic acid of the invention; (b) providing an appropriate enzyme substrate; and (c) contacting the polypeptide or a fragment or variant thereof of step (a) with the substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of a reaction product, wherein a decrease in the amount of the substrate or an increase in the amount of the reaction product detects a polypeptide having the desired activity.
  • the invention provides methods for identifying an enzyme substrate comprising the following steps: (a) providing a polypeptide of the invention; or a polypeptide encoded by a nucleic acid of the invention; (b) providing a test substrate; and (c) contacting the polypeptide of step (a) with the test substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of reaction product, wherein a decrease in the amount of the substrate or an increase in the amount of a reaction product identifies the test substrate as the appropriate substrate.
  • the invention provides methods of determining whether a test compound specifically binds to a polypeptide comprising the following steps: (a) expressing a nucleic acid or a vector comprising the nucleic acid under conditions permissive for translation of the nucleic acid to a polypeptide, wherein the nucleic acid comprises a nucleic acid of the invention, or, providing a polypeptide of the invention; (b) providing a test compound; (c) contacting the polypeptide with the test compound; and (d) determining whether the test compound of step (b) specifically binds to the polypeptide.
  • the invention provides methods for identifying a modulator of an enzyme's activity comprising the following steps: (a) providing a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention; (b) providing a test compound; (c) contacting the polypeptide of step (a) with the test compound of step (b) and measuring an activity of the enzyme, wherein a change in the enzyme activity measured in the presence of the test compound compared to the activity in the absence of the test compound provides a determination that the test compound modulates the enzyme's activity.
  • the enzyme activity can be measured by providing an appropriate substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product, or, an increase in the amount of the substrate or a decrease in the amount of a reaction product.
  • a decrease in the amount of the substrate or an increase in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an activator of enzyme activity.
  • An increase in the amount of the substrate or a decrease in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an inhibitor of enzyme activity.
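The comparison described above, in which enzyme activity is measured with and without the test compound, can be sketched as a simple decision rule on the amount of reaction product. The function name `classify_modulator` and the 5% tolerance used to ignore small differences are assumptions for illustration; the text does not specify a threshold, and real assays would use replicate measurements and statistics.

```python
# Illustrative sketch only: the tolerance value is an arbitrary assumption.
def classify_modulator(product_without: float, product_with: float,
                       tolerance: float = 0.05) -> str:
    """Classify a test compound from reaction-product amounts measured
    without and with the compound under otherwise identical conditions."""
    if product_without <= 0:
        raise ValueError("baseline product amount must be positive")
    relative_change = (product_with - product_without) / product_without
    if relative_change > tolerance:
        return "activator"   # more product formed with the compound
    if relative_change < -tolerance:
        return "inhibitor"   # less product formed with the compound
    return "no effect detected"


print(classify_modulator(10.0, 15.0))  # activator
print(classify_modulator(10.0, 4.0))   # inhibitor
```

The same rule could be applied to substrate amounts with the signs reversed, since a decrease in remaining substrate corresponds to an increase in product.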
  • the invention provides computer systems comprising a processor and a data storage device wherein said data storage device has stored thereon a polypeptide sequence or a nucleic acid sequence of the invention (e.g., a polypeptide encoded by a nucleic acid of the invention).
  • the computer system can further comprise a sequence comparison algorithm and a data storage device having at least one reference sequence stored thereon.
  • the sequence comparison algorithm comprises a computer program that indicates polymorphisms.
  • the computer system can further comprise an identifier that identifies one or more features in said sequence.
  • the invention provides computer readable media having stored thereon a polypeptide sequence or a nucleic acid sequence of the invention.
  • the invention provides methods for identifying a feature in a sequence comprising the steps of: (a) reading the sequence using a computer program which identifies one or more features in a sequence, wherein the sequence comprises a polypeptide sequence or a nucleic acid sequence of the invention; and (b) identifying one or more features in the sequence with the computer program.
  • the invention provides methods for comparing a first sequence to a second sequence comprising the steps of: (a) reading the first sequence and the second sequence through use of a computer program which compares sequences, wherein the first sequence comprises a polypeptide sequence or a nucleic acid sequence of the invention; and (b) determining differences between the first sequence and the second sequence with the computer program.
  • the step of determining differences between the first sequence and the second sequence can further comprise the step of identifying polymorphisms.
  • the method can further comprise an identifier that identifies one or more features in a sequence.
  • the method can comprise reading the first sequence using a computer program and identifying one or more features in the sequence.
  • the invention provides methods for isolating or recovering a nucleic acid encoding a polypeptide having a desired activity from an environmental sample comprising the steps of: (a) providing an amplification primer sequence pair for amplifying a nucleic acid encoding a polypeptide having the desired activity, wherein the primer pair is capable of amplifying a nucleic acid of the invention; (b) isolating a nucleic acid from the environmental sample or treating the environmental sample such that nucleic acid in the sample is accessible for hybridization to the amplification primer pair; and, (c) combining the nucleic acid of step (b) with the amplification primer pair of step (a) and amplifying nucleic acid from the environmental sample, thereby isolating or recovering a nucleic acid encoding a polypeptide having the desired activity from an environmental sample.
  • One or each member of the amplification primer sequence pair can comprise an oligonucleotide comprising at least about 10 to 50 consecutive bases of a sequence of the invention.
  • the amplification primer sequence pair is an amplification pair of the invention.
  • the invention provides methods for isolating or recovering a nucleic acid encoding a polypeptide having enzyme activity from an environmental sample comprising the steps of: (a) providing a polynucleotide probe comprising a nucleic acid of the invention or a subsequence thereof; (b) isolating a nucleic acid from the environmental sample or treating the environmental sample such that nucleic acid in the sample is accessible for hybridization to a polynucleotide probe of step (a); (c) combining the isolated nucleic acid or the treated environmental sample of step (b) with the polynucleotide probe of step (a); and (d) isolating a nucleic acid that specifically hybridizes with the polynucleotide probe of step
  • the environmental sample can comprise a water sample, a liquid sample, a soil sample, an air sample or a biological sample.
  • the biological sample can be derived from a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a plant cell, a fungal cell or a mammalian cell.
  • the invention provides methods of generating a variant of a nucleic acid encoding a polypeptide having enzyme (e.g., a glycosyl transferase, a methyltransferase, an aminotransferase, a 3,4-dehydratase, a 3-ketoreductase, 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, or an O-methyl transferase) activity comprising the steps of: (a) providing a template nucleic acid comprising a nucleic acid of the invention; and
  • the method can further comprise expressing the variant nucleic acid to generate a variant polypeptide.
  • the modifications, additions or deletions can be introduced by a method comprising error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis™ (GSSM™), synthetic ligation reassembly (SLR) or a combination thereof.
  • the modifications, additions or deletions are introduced by a method comprising recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof.
  • the method can be iteratively repeated until an enzyme having an altered or different activity, an altered or different stability, an altered or different substrate specificity, or an altered or different level of expression in a host cell, from that of a polypeptide encoded by the template nucleic acid is produced.
  • the variant enzyme is thermotolerant, and retains some activity after being exposed to an elevated temperature.
  • the variant enzyme has increased glycosylation as compared to the enzyme encoded by a template nucleic acid.
  • the enzyme has increased glycosyltransferase activity as compared to the enzyme encoded by the starting template.
  • the variant enzyme polypeptide has enzyme activity under a high temperature, wherein the enzyme encoded by the template nucleic acid is not active under the high temperature.
  • the method can be iteratively repeated until an enzyme coding sequence having an altered codon usage from that of the template nucleic acid is produced.
  • the method can be iteratively repeated until an enzyme gene having higher or lower level of message expression or stability from that of the template nucleic acid is produced.
  • the invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having an enzyme activity to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention encoding a polypeptide having an enzyme activity; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell.
  • the invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having an enzyme activity; the method comprising the following steps: (a) providing a nucleic acid of the invention; and, (b) identifying a codon in the nucleic acid of step (a) and replacing it with a different codon encoding the same amino acid as the replaced codon, thereby modifying codons in a nucleic acid encoding an enzyme.
  • the invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having an enzyme activity to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention encoding an enzyme; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell.
  • the invention provides methods for modifying a codon in a nucleic acid encoding a polypeptide having an enzyme activity to decrease its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention; and (b) identifying at least one preferred codon in the nucleic acid of step (a) and replacing it with a non-preferred or less preferred codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in a host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to decrease its expression in a host cell.
  • the host cell can be a bacterial cell, a fungal cell, an insect cell, a yeast cell, a plant cell or a mammalian cell.
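The codon-replacement step described above (identify a non-preferred codon and replace it with a preferred codon encoding the same amino acid) can be sketched as follows. The codon-to-amino-acid assignments follow the standard genetic code, but the table covers only three amino acids and the "preferred" codons are hypothetical placeholders rather than measured usage for any particular host cell; the function name `optimize_codons` is likewise an assumption.

```python
# Illustrative sketch only: a partial table with hypothetical host preferences.
CODON_TO_AA = {
    "CTG": "L", "CTA": "L", "CTT": "L", "CTC": "L", "TTA": "L", "TTG": "L",
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
    "AAA": "K", "AAG": "K",
}
# Hypothetical preferred codon for each amino acid in the chosen host cell.
PREFERRED = {"L": "CTG", "R": "CGT", "K": "AAA"}


def optimize_codons(cds: str) -> str:
    """Replace each codon found in the table with the host-preferred synonym."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TO_AA.get(codon)
        # Codons outside the partial table (e.g. ATG, stop codons) are kept as-is.
        optimized.append(PREFERRED[amino_acid] if amino_acid else codon)
    return "".join(optimized)


print(optimize_codons("ATGCTAAGGAAGTAA"))  # ATGCTGCGTAAATAA
```

Because every replacement is synonymous, the encoded amino acid sequence is unchanged. Swapping the `PREFERRED` table for a table of under-represented codons would give the corresponding sketch for decreasing expression.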
  • the invention provides methods for producing a library of nucleic acids encoding a plurality of modified enzyme active sites or substrate binding sites, wherein the modified active sites or substrate binding sites are derived from a first nucleic acid comprising a sequence encoding a first active site or a first substrate binding site the method comprising the following steps: (a) providing a first nucleic acid encoding a first active site or first substrate binding site, wherein the first nucleic acid sequence comprises a sequence that hybridizes under stringent conditions to a nucleic acid of the invention, and the nucleic acid encodes an enzyme active site or substrate binding site; (b) providing a set of mutagenic oligonucleotides that encode naturally-occurring amino acid variants at a plurality of targeted codons in the first nucleic acid; and, (c) using the set of
  • the method comprises mutagenizing the first nucleic acid of step (a) by a method comprising an optimized directed evolution system, Gene Site Saturation Mutagenesis™ (GSSM™), synthetic ligation reassembly (SLR), error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, and a combination thereof.
  • the method comprises mutagenizing the first nucleic acid of step (a) or variants by a method comprising recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof.
  • the invention provides methods for making a small molecule comprising the following steps: (a) providing a plurality of biosynthetic enzymes capable of synthesizing or modifying a small molecule, wherein one of the enzymes is encoded by a nucleic acid of the invention; (b) providing a substrate for at least one of the enzymes of step (a); and (c) reacting the substrate of step (b) with the enzymes under conditions that facilitate a plurality of biocatalytic reactions to generate a small molecule by a series of biocatalytic reactions.
  • the invention provides methods for modifying a small molecule comprising the following steps: (a) providing an enzyme of the invention, or, a polypeptide encoded by a nucleic acid of the invention, or a subsequence thereof; (b) providing a small molecule; and (c) reacting the enzyme of step (a) with the small molecule of step (b) under conditions that facilitate an enzymatic reaction catalyzed by the enzyme, thereby modifying a small molecule by an enzymatic reaction.
  • the method can comprise a plurality of small molecule substrates for the enzyme of step
  • the method can comprise a plurality of additional enzymes under conditions that facilitate a plurality of biocatalytic reactions by the enzymes to form a library of modified small molecules produced by the plurality of enzymatic reactions.
  • the method can further comprise the step of testing the library to determine if a particular modified small molecule which exhibits a desired activity is present within the library.
  • the step of testing the library can further comprise the steps of systematically eliminating all but one of the biocatalytic reactions used to produce a portion of the plurality of the modified small molecules within the library by testing the portion of the modified small molecule for the presence or absence of the particular modified small molecule with a desired activity, and identifying at least one specific biocatalytic reaction that produces the particular modified small molecule of desired activity.
  • the invention provides methods for determining a functional fragment of an enzyme comprising the steps of: (a) providing an enzyme of the invention, or a polypeptide encoded by a nucleic acid of the invention, or a subsequence thereof; and (b) deleting a plurality of amino acid residues from the sequence of step (a) and testing the remaining subsequence for activity, thereby determining a functional fragment of the enzyme.
  • the activity is measured by providing an appropriate substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product.
  • the invention provides methods for whole cell engineering of new or modified phenotypes by using metabolic flux analysis, the method comprising the following steps: (a) making a modified cell by modifying the genetic composition of a cell, wherein the genetic composition is modified by addition to the cell of a nucleic acid of the invention; (b) culturing the modified cell to generate a plurality of modified cells; (c) measuring at least one metabolic parameter of the cell by monitoring the cell culture of step (b), optionally in real time; and, (d) analyzing the data of step (c) to determine if the measured parameter differs from a comparable measurement in an unmodified cell under similar conditions, thereby identifying an engineered phenotype in the cell using real-time metabolic flux analysis.
  • the genetic composition of the cell can be modified by a method comprising deletion of a sequence or modification of a sequence in the cell, or, knocking out the expression of a gene.
  • the method can further comprise selecting a cell comprising a newly engineered phenotype.
  • the method can comprise culturing the selected cell, thereby generating a new cell strain comprising a newly engineered phenotype.
  • the invention provides methods of increasing thermotolerance or thermostability of an enzyme of the invention, the method comprising glycosylating a polypeptide, wherein the polypeptide comprises at least thirty contiguous amino acids of a polypeptide of the invention; or a polypeptide encoded by a nucleic acid sequence of the invention, thereby increasing the thermotolerance or thermostability of the enzyme of the invention.
  • the enzyme specific activity can be thermostable or thermotolerant at a temperature in the range from greater than about 37°C to about 95°C.
  • the invention provides methods for overexpressing a recombinant polypeptide in a cell comprising expressing a vector comprising a nucleic acid comprising a nucleic acid of the invention or a nucleic acid sequence of the invention, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection, wherein overexpression is effected by use of a high activity promoter, a dicistronic vector or by gene amplification of the vector.
  • the invention provides a kit comprising a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention.
  • Figure 1 is an illustration of natural products that can be glycosylated using the systems and methods of the invention.
  • Figure 2 illustrates exemplary spinosyns that are modified (glycosylated) using the systems and methods of the invention.
  • Figure 3 illustrates the structures of doxorubicin, MEN10755 and the general structure of anthracyclines that can be modified (glycosylated) using the systems and methods of the invention.
  • Figure 4 illustrates an exemplary doxorubicin biosynthesis pathway.
  • Figure 5 illustrates the structures of anthracyclines that can be modified (glycosylated) using the systems and methods of the invention.
  • Figure 6 illustrates the organization of the spinosyn gene cluster and location of isolated fosmid clones, as discussed in detail in Example 1, below.
  • Figure 7 illustrates substrates used in conversion experiments: aglycone, 9-pseudo-aglycone (9-PSA), and 17-pseudo-aglycone (17-PSA), in Figure 7a, and 17-keto aglycone in Figure 7b, as discussed in detail in Example 5, below.
  • Figure 8 illustrates structures of: Figure 8(a) M548, Figure 8(b) M548-II, and Figure 8(c) M532, as discussed in detail in Example 7, below.
  • Figure 9 illustrates L-rhamnose biosynthesis in Sac. spinosa and 6-deoxy-D-glucose biosynthesis in an exemplary engineered Streptomyces of the invention, as discussed in detail in Example 7, below.
  • Figure 10 illustrates the structure of spinosyn C, as discussed in detail in Example 1, below.
  • Figure 11 illustrates structures of novel spinosyn derivatives of the invention: M689 Figure 11(a), M689-II Figure 11(b) and M673 Figure 11(c), as discussed in detail in Example 8, below.
  • Figure 12 illustrates a general scheme for deoxysugar biosynthesis that can be used to practice the invention, as discussed in detail in Example 9, below.
  • the 4,6-dideoxysugars are additional sugars derived from 4-keto-6-deoxy-dNDP-glucose.
  • Figure 13 illustrates the structure of viriplanin.
  • Figure 14 illustrates an exemplary solid phase gene reassembly technique used to generate enzymes used in the compositions and methods of the invention, as discussed in detail in Example 10, below.
  • Figure 15 illustrates a model for the construction of an exemplary deoxysugar pathway used in the compositions and methods of the invention, as discussed in detail in Example 10, below.
  • Figure 16 illustrates a scheme for constructing a library of pathways including various biosynthetic genes to be used in the compositions and methods of the invention, as discussed in detail in Example 10, below.
  • Figure 17 illustrates the vector pUWL201, as discussed in detail in Example 14, below.
  • Figure 18 illustrates the vector pAT6-gtt21-gdh11-epi12-kre15, into which is cloned a deoxysugar pathway, as discussed in detail in Example 14, below.
  • Figure 19a illustrates the structure of spinosyn A aglycone; and, Figure 19b illustrates a novel compound of the invention, M548, or 6-deoxy-D-glucose-17-pseudoaglycone, as discussed in detail in Example 14, below.
  • Figure 20 illustrates a novel compound of the invention, Compound M548-II, or L-rhamnosyl-17-pseudo-aglycone, as discussed in detail in Example 15, below.
  • Figure 21 illustrates an engineered deoxysugar pathway of the invention, designated pathway #3 in vector pAT6-gtt21-gdh11-epi12-kre9, as discussed in detail in Example 15, below.
  • Figure 22 illustrates an engineered deoxysugar pathway of the invention, pathway #9, cloned into vector pAT6-gtt-gdh-e8-k9-tdh-tkr, as discussed in detail in Example 15, below.
  • Figure 23 illustrates an engineered deoxysugar pathway, pathway #12, as cloned into the vector pAT6-gtt-gdh-e12-k15-tdh-tkr, as discussed in detail in Example 16, below.
  • Figure 24 illustrates a novel compound of the invention, Compound M532, or L-digitoxosyl-17-pseudo-aglycone, as discussed in detail in Example 16, below.
  • Figure 25a illustrates the structure of spinosyn A 9-pseudo-aglycone (9-PSA);
  • Figure 25b illustrates a novel compound of the invention, 9-6-deoxy-D-glucosyl-spinosyn A, as discussed in detail in Example 17, below.
  • Figure 26 illustrates a novel compound of the invention, Compound
  • Figure 27 illustrates a novel compound of the invention, Compound M673, 9-L-digitoxosyl-spinosyn A, as discussed in detail in Example 18, below.
  • Figure 28 illustrates a map of pathway #1 in plasmid pAT6, as discussed in detail in Example 21, below.
  • Figure 29 illustrates various inserts of complete and incomplete 6-deoxysugar pathways, as discussed in detail in Example 21, below.
  • Figure 30 illustrates the conversion of an aglycone with complete or incomplete pathways with SpnG by S. diversa, as discussed in detail in Example 21, below.
  • Figure 31 shows deoxy sugar compounds that can be synthesized and/or transferred to macrolides by embodiments of the invention.
  • these compounds are suitable as a deoxy sugar at the 9 position of a spinosyn or a spinosyn pseudoaglycone.
  • R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl, or formyl.
  • Figure 32 shows deoxy sugar compounds that can be synthesized and/or transferred to macrolides by embodiments of the invention.
  • these compounds are suitable as a deoxy sugar at the 17 position of a spinosyn or a spinosyn pseudoaglycone.
  • R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula M is C1-C6 alkyl, C1-C6 alkenyl, formyl, C1-C6 alkylcarbonyl
  • Figure 33 shows the formulas of additional deoxysugars and of spinosyn variants of the present invention.
  • the deoxysugars can be used to glycosylate other aglycones or pseudoaglycones as discussed herein.
  • Figure 34 is a schematic depicting exemplary combinatorial enzyme pathways for production, transfer and modification of deoxy sugars.
  • Figure 35 depicts the basic ring structure of 21-butenyl spinosyns suitable for use in embodiments
  • the invention provides in vivo glycosylation systems for the biosynthesis of novel glycosylated natural products for pharmaceutical and agrochemical applications.
  • the glycosylated natural products made by the methods and in vivo glycosylation systems of the invention can be a source of valuable leads in pharmaceutical drug development and agrochemical applications.
  • the novel glycosylation platform of the invention exploits the substrate flexibility of glycosyltransferases and the abundance of deoxysugar biosynthetic pathways found in Actinomycetales. As components of glycoconjugates, sugars contribute to a large repertoire of compounds with diverse biological activities (for example see Weymouth-Wilson, A. C. Nat. Prod. Rep.
  • the glycosylated macrolide spinosyn A (1) produced by Saccharopolyspora spinosa is a commercially important insecticide.
  • the spinosyn tetracyclic aglycone carries a per-methylated L-rhamnose moiety at C9 and a D-forosamine moiety at C17.
  • the invention provides a novel technology for glycosylation of natural products using novel genetically engineered strains of bacteria. These in vivo glycosylation systems of the invention express a heterologous glycosyltransferase and deoxysugar pathway that are capable of glycosylating a suitable substrate, which can be added to a culture broth.
  • the invention also provides novel compounds glycosylated by the genetically engineered strains of the invention (the in vivo glycosylation systems of the invention), including novel peptides, a mixed polyketide-peptide, or polyketides, including novel macrolides (see Figure 1), e.g., glycosylated spinosyn derivatives, glycosylated derivatives of antibiotics such as erythromycin, tetracycline, rifampicin, glycosylated derivatives of anti-tumor drugs such as anthracyclines (e.g., doxorubicin, and second generation anthracyclines such as idarubicin and epirubicin) daunorubicin, mithramycin, derivatives of immunosuppressants such as rapamycin, FK520, FK506, glycosylated derivatives of anti-fungals such as amphotericin, glycosylated derivatives of antibacterials such as tylosin, glycos
  • these glycosylated derivatives of the invention are disaccharide derivatives.
  • the natural products can be aglycones or pseudoaglycones.
  • by pseudoaglycone is meant a compound that results from removing one or more, but not all, of the sugars from the parent compound.
  • Figure 1 illustrates exemplary natural products, polyketide derived pharmaceuticals from actinomycetes, that can be modified (glycosylated) using the systems and methods of the invention; sugar moieties shown in red.
  • Figure 2 illustrates exemplary spinosyns that are modified (glycosylated) using the systems and methods of the invention. Spinosyns glycosylated using the systems and methods of the invention include those from S.
  • Glycosylated derivatives of the invention include both spinosyn A and the 17-pseudo-aglycone (17-PSA) generated by replacing the tri-O-methyl L-rhamnose with alternative sugars (see Examples, below).
  • Additional spinosyns that can be modified using the systems and methods of the invention include the 21-butenyl spinosyns and their aglycones and pseudoaglycones as disclosed in WO02077004 published October 3, 2002, which is incorporated by reference herein for its disclosure of these starting compounds and sugars.
  • Figure 35 shows two 21-butenyl spinosyn backbones (I and II) that can be glycosylated (at Sugar and Sugar 1 positions) via the methods of the invention.
  • R2 is H or CH3;
  • R3 and R4 are H or combine to form a double bond or combine to form an epoxide group;
  • R10 in formula II is ethyl;
  • R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl
  • Additional deoxy sugars that can be produced and/or transferred using the methods and systems of the invention include the deoxy sugars shown in Figures 31 and 32, which are also disclosed in WO02077004 published October 3, 2002, which is incorporated by reference for its disclosure of these deoxy sugars. Accordingly in one embodiment are 21-butenyl-spinosyns and pseudoaglycones wherein the Sugar and/or Sugar 2 positions are any of the deoxysugars produced or disclosed herein.
  • spinosyns or their pseudoaglycones produced by the heterologous expression methods of the present invention that have at their 9- and/or 17-positions any of the deoxysugars produced or disclosed herein
  • These butenyl-spinosyn starting materials can be prepared by culturing one of the following strains of Saccharopolyspora sp. that were deposited on the dates indicated in accordance with the terms of the Budapest treaty at the Midwest Area
  • WO02077004 published October 3, 2002 (deposit number, deposit date): NRRL 30141, June 9, 1999; NRRL 30424, March 8, 2001; NRRL 30423, March 8, 2001; NRRL 30422, March 8, 2001; NRRL 30438, March 15, 2001; NRRL 30421, March 8, 2001; NRRL 30437, March 15, 2001
  • These strains are suitable as hosts for the in vivo heterologous expression methods of the present invention. Further these strains are suitable sources for extracts and enzymes for practicing the in vitro heterologous expression system of the present invention.
  • D-forosamine is replaced with the neutral sugars L-mycarose or D-glucose using the erythromycin producer S. erythraea as a host strain.
  • the transfer of glucose was due to an endogenous activity of S. erythraea, while transfer of the L-mycarose occurred after expression of SpnP.
  • SpnG and SpnP are remarkably flexible towards their deoxysugar substrates.
  • Glycosylated spinosyns of the invention are highly active on chewing insects (e.g. caterpillars) and/or sucking insects (e.g. aphids), and, in one aspect, act as broad-spectrum insecticides.
  • Anthracyclines, e.g., doxorubicin, and second generation anthracyclines such as idarubicin and epirubicin, can also be modified (glycosylated) using the systems and methods of the invention.
  • the glycosylated derivatives of anthracyclines, e.g., doxorubicin, and second generation anthracyclines such as idarubicin and epirubicin of the invention are disaccharide derivatives.
  • Doxorubicin is an anthracycline type polyketide widely used in anticancer chemotherapy. It consists of a tetracyclic aglycone that carries an aminosugar.
  • Figure 3 illustrates the structures of doxorubicin, MEN10755 (see, e.g., Bos, et al. (2001) Cancer Chemotherapy and Pharmacol. 48(5):361-369) and the general structure of anthracyclines that can be modified (glycosylated) using the systems and methods of the invention; arrows indicate positions of glycosylation.
  • Any anthracycline (see Figure 5), including aclacinomycin, nogalamycin, rhodomycin and doxorubicin, and/or any intermediate in the biosynthesis of an anthracycline, can be modified (glycosylated) using the systems and methods of the invention, e.g., as illustrated in Figure 4.
  • Anthracycline biosynthesis employs a type II PKS to generate the aglycone, which is then further modified by oxygenases, reductases and cyclases to form the final tetracyclic anthraquinone moiety.
  • the invention also provides methods for the further modification of glycosyl residues of the novel compositions of the invention, including deoxygenation, methylation and amination.
  • the in vivo glycosylation system of the invention further comprises heterologous enzymes for deoxygenation, methylation, carbamoylation and amination, e.g., spinosyn O-methyltransferases, and one, several or all of the enzymes listed in Figure 15 (see also Example 10, below) and Table 8 (see Example 11, below).
  • sugars where the sugars described herein have at least one or more CH3 replaced by an alkyl selected from C2 to C6 alkyl.
  • Such modified forms of sugars A, C and E of Figure 33 can also be used.
  • sugars where the sugars herein have at least one or more CH3 replaced by C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl group.
  • alternative aspects comprise modified forms of sugars A, C, and E of Figure 33.
  • Other alternative aspects comprise modified
  • the position of genes in a pathway of an in vivo glycosylation system of the invention can follow the same order as their gene-product functions in biosynthesis, for example, early genes (dNDP-transferase, 4,6-dehydratase), followed by intermediate genes (e.g. epimerases, 2-deoxygenation, C-methylations), and with late genes (e.g. 4-ketoreductases, aminotransferases) at the end.
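The ordering rule described above (early genes first, intermediate genes next, late genes last) can be sketched as a stable sort in Python. The stage assignments below restate the examples given in the text; the names `GENE_STAGE`, `STAGE_RANK` and `order_pathway` are hypothetical and only serve the illustration.

```python
# Biosynthetic stage of each gene function, following the examples in the text.
GENE_STAGE = {
    "dNDP-transferase": "early",
    "4,6-dehydratase": "early",
    "epimerase": "intermediate",
    "2-deoxygenation": "intermediate",
    "C-methylation": "intermediate",
    "4-ketoreductase": "late",
    "aminotransferase": "late",
}

STAGE_RANK = {"early": 0, "intermediate": 1, "late": 2}


def order_pathway(genes):
    """Arrange genes so that their position follows their biosynthetic stage.

    sorted() is stable, so genes of the same stage keep their input order.
    """
    return sorted(genes, key=lambda gene: STAGE_RANK[GENE_STAGE[gene]])


ordered = order_pathway(
    ["4-ketoreductase", "epimerase", "4,6-dehydratase", "dNDP-transferase"]
)
```

Keeping the stage table separate from the sort makes it easy to add further functions without touching the ordering logic.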
  • in vivo glycosylation systems of the invention comprise smaller, specialized sub-libraries, such as 6-deoxysugar pathway libraries or amino-sugar libraries.
  • the methods of the invention comprise providing a transgenic plant or non-human animal capable of constitutively or inducibly expressing an in vivo glycosylation system of the invention.
  • a transgenic plant or non-human animal of the invention is used to generate a compound of the invention. These can be applied to a plant, plant part, cell, animal or any surface needing treatment.
  • a natural product of the invention can be prophylactically applied to any plant, animal or surface as an anti-microbial or insecticidal agent.
  • the invention includes in vitro or in vivo methods for making the novel compositions of the invention, e.g., using transgenic plants, genetically engineered cells and cell extracts, or other biocatalytic processes.
  • the invention provides transgenic plants, genetically engineered cells and cell extracts comprising introduced nucleic acids encoding a glycosylation system of the invention.
  • the nucleic acid encoding all or part of a glycosylation system of the invention is under the control of an inducible transcriptional control element, e.g., a promoter and/or enhancer, or a constitutive transcriptional control element, e.g., a promoter and/or enhancer, e.g., a cauliflower mosaic virus (CaMV) 35S transcription initiation region, or a 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens.
  • the introduced nucleic acid encoding all or part of a glycosylation system of the invention is cloned into an expression vehicle, e.g., a vector, a plasmid, a phagemid, a phage, a recombinant virus, vectors from Agrobacterium spp., and the like.
  • While the Examples exemplify embodiments in terms of novel systems for generating novel spinosyn compounds, the systems and methods are sufficiently versatile to provide a combinatorial glycosylation system for other natural products, such as macrolides.
  • Streptomyces diversa™, which is an actinomycete that is amenable to genetic manipulation, was chosen as the host strain.
  • other hosts including any species of Streptomyces, are used in the methods and systems of the invention.
  • Deoxysugar biosynthetic genes were sourced from the S.
  • pathways A-D were predicted to produce 6-deoxysugars
  • pathways E-H were predicted to produce 2,6-dideoxysugars
  • pathways I-L to yield 2,3,6-trideoxysugars
  • the 6-deoxysugar pathways consisted of glucose-1-phosphate-thymidyltransferase, dTDP-glucose-4,6-dehydratase, epimerase and 4-ketoreductase
  • the 2,6-dideoxysugar pathways contained additional genes required for 2-deoxygenation
  • the 2,3,6-trideoxysugar pathways contained additional functions for 3-deoxygenation and either a 4-ketoreductase or a 4-aminotransferase.
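The way the three pathway classes build on one another can be sketched as a small Python helper. This is a sketch under stated assumptions: it reads "additional" literally (each class extends the previous one), it places the deoxygenation steps before the final 4-position step, and the names `CORE_6_DEOXY` and `pathway_functions` are hypothetical.

```python
# Functions of the 6-deoxysugar pathways, as listed in the text.
CORE_6_DEOXY = (
    "glucose-1-phosphate-thymidyltransferase",
    "dTDP-glucose-4,6-dehydratase",
    "epimerase",
    "4-ketoreductase",
)

SUGAR_CLASSES = ("6-deoxy", "2,6-dideoxy", "2,3,6-trideoxy")


def pathway_functions(sugar_class, final_step="4-ketoreductase"):
    """List the enzyme functions assumed for one pathway class.

    final_step is only used for the 2,3,6-trideoxy class, where the text
    allows either a 4-ketoreductase or a 4-aminotransferase.
    """
    if sugar_class not in SUGAR_CLASSES:
        raise ValueError(f"unknown sugar class: {sugar_class}")
    functions = list(CORE_6_DEOXY)
    if sugar_class in ("2,6-dideoxy", "2,3,6-trideoxy"):
        # Assumed position: before the last (4-position) step.
        functions.insert(-1, "2-deoxygenation")
    if sugar_class == "2,3,6-trideoxy":
        if final_step not in ("4-ketoreductase", "4-aminotransferase"):
            raise ValueError(f"unsupported final step: {final_step}")
        functions.insert(-1, "3-deoxygenation")
        functions[-1] = final_step
    return functions


amino_trideoxy = pathway_functions("2,3,6-trideoxy", "4-aminotransferase")
```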
  • spnN 2,3-dehydratase
  • spnO 3-ketoreductase
  • spnQ 3,4-dehydratase
  • spnR transaminase
  • spnS dimethyltransferase.
  • gtt NDP-glucose synthase
  • gdh NDP-glucose-4,6-dehydratase
  • epi is 3',5'-epimerase
  • kre 4'-ketoreductase.
  • gtt is NDP-glucose synthase
  • gdh NDP-glucose-4,6-dehydratase
  • epi is 5'-epimerase
  • kre is 4'-ketoreductase
  • tdh 2,3-dehydratase
  • tkr 3-ketoreductase
  • glycosyltransferase genes spnG and spnP were cloned into the vector pUWL201 (Doumith, M.; Weingarten, P.; Wehmeier, U. R; Salah-Bey, K.; Benhamou, B.; Capdevila, C; Michel, J. M.; Piepersberg, W.; Raynal, M. C. Mol. Gen. Genet. 2000, 264 (4), 477-485).
  • Pathways and glycosyltransferases were then combined in S. diversa. As discussed in the Examples, the recombinant S. diversa strains were utilized in bioconversion experiments to assess their capabilities to generate glycosylated spinosyn derivatives.
  • Pathways A-D in combination with SpnG, yielded two products with an apparent molecular weight of 548.3, consistent with attachment of a 6-deoxysugar to the aglycone. Larger amounts of both compounds were generated and their structures were elucidated by NMR to be alpha-L-rhamnosyl-17-PSA (compound 4 in Figure 33) and beta-D-quinovosyl-17-PSA (compound 5 in Figure 33). The relative levels at which these two compounds were produced was dependent on the particular pathway.
  • Alpha-L-rhamnose is the predicted end product of pathways A and B, and so production of alpha-L-rhamnosyl-17-PSA (compound 4 of Figure 33) by these strains confirms the overall viability of the approach described.
  • alpha-L-rhamnose is not a product predicted for pathways C and D, and production of D-quinovose was not predicted for any of pathways A-D.
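The statement that a mass of 548.3 is consistent with attachment of a 6-deoxysugar to the aglycone can be checked with a short calculation. This sketch rests on assumptions that are not stated in the text: the spinosyn A aglycone is taken as C24H34O5, a 6-deoxyhexose (L-rhamnose or D-quinovose) as C6H12O5, a 2,6-dideoxyhexose (e.g., digitoxose) as C6H12O4, and the reported value is assumed to be close to the monoisotopic mass. The helper names are hypothetical.

```python
# Monoisotopic masses of the elements used here (atomic mass units).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Elemental compositions (assumed; not given in the text).
AGLYCONE = {"C": 24, "H": 34, "O": 5}
SIX_DEOXYHEXOSE = {"C": 6, "H": 12, "O": 5}
TWO_SIX_DIDEOXYHEXOSE = {"C": 6, "H": 12, "O": 4}
WATER = {"H": 2, "O": 1}


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def glycoside_mass(acceptor, sugar):
    """Mass after forming one glycosidic bond, which releases one water."""
    return mono_mass(acceptor) + mono_mass(sugar) - mono_mass(WATER)


mass_6_deoxy = glycoside_mass(AGLYCONE, SIX_DEOXYHEXOSE)          # about 548.30
mass_2_6_dideoxy = glycoside_mass(AGLYCONE, TWO_SIX_DIDEOXYHEXOSE)  # about 532.30
```

Under these assumptions the first value matches the reported 548.3, and the second is in line with the designation M532 used for the L-digitoxosyl-17-pseudo-aglycone.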
  • the 17-keto-aglycone was also isolated.
  • Biosynthesis of the two sugars was monitored through production of their respective 17-PSA analogues, compounds 4 and 5 of Figure 33, after feeding the aglycone.
  • Relative production of compounds 4 and 5 by further modified combinatorial 6-deoxysugar pathways was determined and compared to the full combinatorial pathway.
  • Pathway A yielded relative levels of 70 and 6 for compounds 4 and 5, respectively, whereas the modified pathway A (gtt-gdh-kre (S. diversa)), yielded 0 and 37.
  • Pathway B yielded relative levels of 8 and 82 for compounds 4 and 5, respectively, whereas the modified pathway B (gtt-gdh-kre (S. spinosa)) yielded 0 and 64.
  • Pathway C yielded relative levels of 9 and 38 for compounds 4 and 5, respectively, whereas the modified pathway C (gtt-gdh-epi (S. diversa)) yielded 5 and 76.
  • Pathway D yielded relative levels of 6 and 55 for compounds 4 and 5, respectively, whereas the modified pathway D (gtt-gdh-epi (S. spinosa)) yielded 0 and 38.
  • relative levels of compounds 4 and 5 were 0 and 0, whereas with modified pathway gtt-gdh levels were 0 and 38.
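The relative levels reported above for pathways A to D are easier to compare when tabulated. The Python sketch below only restates the numbers given in the text; the names `RELATIVE_LEVELS`, `share_of_compound_5` and `modified_without_compound_4` are hypothetical, and the gtt-gdh-only comparison is left out because its reference row is incomplete in the text.

```python
# (compound 4, compound 5) relative levels, copied from the text.
RELATIVE_LEVELS = {
    "A": {"full": (70, 6), "modified": (0, 37)},
    "B": {"full": (8, 82), "modified": (0, 64)},
    "C": {"full": (9, 38), "modified": (5, 76)},
    "D": {"full": (6, 55), "modified": (0, 38)},
}


def share_of_compound_5(levels):
    """Fraction of the summed relative level that is compound 5 (None if no product)."""
    compound_4, compound_5 = levels
    total = compound_4 + compound_5
    return compound_5 / total if total else None


def modified_without_compound_4():
    """Pathways whose modified form produced no detectable compound 4."""
    return [
        name
        for name, variants in RELATIVE_LEVELS.items()
        if variants["modified"][0] == 0
    ]


for name, variants in RELATIVE_LEVELS.items():
    full = share_of_compound_5(variants["full"])
    modified = share_of_compound_5(variants["modified"])
    print(f"pathway {name}: compound 5 share {full:.2f} (full), {modified:.2f} (modified)")
```

The relative levels are not stated to be absolute yields, so the computed shares are only meaningful as a comparison within one pathway.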
  • heterologous biosynthetic pathways can be obtained by mutagenesis of the novel heterologous pathways, whether by deleting, adding or replacing one or more genes, by mutagenesis of the pathway as by shuffling, or by mutagenesis of one or more genes in the pathway.
  • the host cell can provide one or more genes that encode enzymes that can participate in the pathway or can modify compounds produced by the heterologous pathway.
  • an example is the use of a host enzyme for the final 4-ketoreduction step in D-quinovose biosynthesis.
  • the host cell can provide endogenous genes and encoded enzymes that allow transfer of the sugar products of the biosynthetic pathways to biomolecules of interest.
  • the host cell makes one or more aglycones or pseudoaglycones of a biomolecule of interest, which are the targets for glycosylation by the host using the combinatorial methods and systems of the invention.
  • the host cell is provided with one or more aglycones or pseudoaglycones which are the target for glycosylation by the host using the combinatorial methods and systems of the invention. As demonstrated herein, both glycosylation steps have been obtained heterologously in S. diversa.
  • venezuelae that carries an inactivated desosamine pathway (Borisova, S. A.; Zhao, L.; Sherman, D. H.; Liu, H. W. Org Lett. 1999, 1 (1), 133-136). Since gtt and gdh are sufficient for D-quinovose production in S. diversa, we propose dNDP-4-keto-6-deoxyhexose as the intermediate sugar and the presence of an unidentified endogenous 4-ketoreductase. Surprisingly, the 5-epimerase from L-digitoxose biosynthesis was able to substitute for the 3,5-epimerase from L-rhamnose biosynthesis, providing evidence for the flexibility of this enzyme.
  • a method of increasing the spinosyn-producing ability of a spinosyn-producing microorganism comprising the steps of 1) transforming the host cell with a recombinant DNA vector or portion thereof that produces the biosynthetic pathway enzymes for production of a spinosyn or a spinosyn variant or a precursor thereof.
  • a vector or portion thereof comprising a DNA sequence that codes for the expression of an activity that is rate limiting in the pathway.
  • the microorganism transformed with the vector is incubated under conditions suitable for cell growth and division, expression of said DNA sequence, and production of spinosyn, its aglycone or pseudoaglycone.
  • Such cells are robust cells for the methods and systems of the invention that provide modified spinosyns.
  • the operative spinosyn biosynthetic genes in the genome of the host cell have been modified so that duplicate copies of at least one of the spinosyn biosynthetic genes are present.
  • a spinosyn, aglycone or pseudoaglycone is provided by the host cell having spinosyn biosynthetic genes in its genome, wherein at least one of the genes has been inactivated, the rest of the genes being operational to produce a spinosyn variant.
  • the host cell has been transformed so that its genome contains operative spinosyn biosynthetic genes or operative genes to produce the macrolide, macrolide aglycone or pseudoaglycone of interest as a substrate of the present methods and systems.
  • the heterologous glycosylation system described herein has proved its versatility and robustness to combine deoxysugar pathways, glycosyltransferases, and acceptor substrates in a very efficient way.
  • the invention provides novel in vivo glycosylation systems.
  • the invention can be practiced in conjunction with any method or protocol known in the art, which are well described in the scientific and patent literature. The discussion of the general methods given herein is intended for illustrative purposes only. Other alternative methods and embodiments will be apparent to those of skill in the art upon review of this disclosure.
  • General Techniques. The nucleic acids used to practice this invention, including RNA, iRNA, antisense nucleic acid, cDNA, genomic DNA, vectors, viruses or hybrids thereof, may be isolated from a variety of sources, genetically engineered, amplified, and/or expressed/generated recombinantly.
  • Recombinant polypeptides e.g., heterologous glycosyltransferase and/or heterologous deoxysugar pathway enzymes
  • a desired activity e.g., in vivo glycosylation of a natural product.
  • Any recombinant expression system can be used, including bacterial, mammalian, yeast, insect or plant cell expression systems.
  • these nucleic acids can be synthesized in vitro by well-known chemical synthesis techniques, as described in, e.g., Adams (1983) J. Am. Chem. Soc. 105:661; Belousov (1997) Nucleic Acids Res.
  • nucleic acids such as, e.g., subcloning, labeling probes (e.g., random-primer labeling using Klenow polymerase, nick translation, amplification), sequencing, hybridization and the like are well described in the scientific and patent literature, see, e.g., Sambrook, ed., MOLECULAR CLONING: A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel, ed.
  • General methods for detecting both nucleic acids and corresponding proteins include analytic biochemical methods such as spectrophotometry, radiography, electrophoresis, capillary electrophoresis, high performance liquid chromatography (HPLC), thin layer chromatography (TLC), hyperdiffusion chromatography, and the like, and various immunological methods such as fluid or gel precipitin reactions, immunodiffusion (single or double), immunoelectrophoresis, radioimmunoassays (RIAs), enzyme-linked immunosorbent assays (ELISAs), immunofluorescent assays, and the like.
  • detection of nucleic acids and polypeptides can be by well-known methods such as Southern analysis, northern analysis, gel electrophoresis, PCR, radiolabeling, scintillation counting, and affinity chromatography.
  • Another useful means of obtaining and manipulating nucleic acids used to practice the methods of the invention is to clone from genomic samples, and, if desired, screen and re-clone inserts isolated or amplified from, e.g., genomic clones or cDNA clones.
  • Sources of nucleic acid used in the methods of the invention include genomic or cDNA libraries contained in, e.g., mammalian artificial chromosomes (MACs), see, e.g., U.S. Patent Nos.
  • human artificial chromosomes, see, e.g., Rosenfeld (1997) Nat. Genet. 15:333-335; yeast artificial chromosomes (YAC); bacterial artificial chromosomes (BAC); P1 artificial chromosomes, see, e.g., Woon (1998) Genomics 50:306-316; P1-derived vectors (PACs), see, e.g., Kern (1997) Biotechniques 23:120-124; cosmids, recombinant viruses, phages or plasmids.
  • a nucleic acid encoding a polypeptide used to practice the invention is assembled in appropriate phase with a leader sequence capable of directing secretion of the translated polypeptide or fragment thereof.
  • the novel in vivo glycosylation systems and methods of the invention can use fusion proteins and nucleic acids encoding them.
  • a polypeptide used to practice the invention can be fused to a heterologous peptide or polypeptide, such as N-terminal identification peptides which impart desired characteristics, such as increased stability or simplified purification.
  • Peptides and polypeptides used to practice the invention can also be synthesized and expressed as fusion proteins with one or more additional domains linked thereto for, e.g., producing a more immunogenic peptide, to more readily isolate a recombinantly synthesized peptide, to identify and isolate antibodies and antibody-expressing B cells, and the like.
  • Detection and purification facilitating domains include, e.g., metal chelating peptides such as polyhistidine tracts and histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAG extension/affinity purification system (Immunex Corp, Seattle WA).
  • an expression vector can include an epitope-encoding nucleic acid sequence linked to six histidine residues followed by a thioredoxin and an enterokinase cleavage site (see e.g., Williams (1995) Biochemistry 34:1787-1797; Dobeli (1998) Protein Expr. Purif. 12:404-414).
  • histidine residues facilitate detection and purification while the enterokinase cleavage site provides a means for purifying the epitope from the remainder of the fusion protein.
  • Technology pertaining to vectors encoding fusion proteins and application of fusion proteins are well described in the scientific and patent literature, see e.g., Kroll (1993) DNA Cell. Biol., 12:441-53.
  • the nucleic acid (e.g., DNA) sequences used to practice the invention can be operatively linked to expression (e.g., transcriptional or translational) control sequence(s), e.g., promoters or enhancers, to direct or modulate RNA synthesis/expression or their own replication.
  • expression control sequence can be in an expression vector.
  • Exemplary bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp.
  • Exemplary eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein I.
  • Promoters suitable for expressing a polypeptide in bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda PR promoter, the lambda PL promoter, promoters from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter.
  • Eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters, the early and late SV40 promoter, LTRs from retroviruses, and the mouse metallothionein-I promoter.
  • tissue-specific plant promoters can be used to drive expression in a tissue-specific manner, e.g., expressing a heterologous deoxysugar pathway or glycosyltransferase gene in a tissue-specific manner.
  • the invention also provides plants, plant cells, extracts or seeds that express in vivo glycosylation systems of the invention in a tissue-specific manner.
  • the tissue-specificity can be seed specific, stem specific, leaf specific, root specific, fruit specific and the like.
  • a constitutive promoter such as the CaMV 35S promoter can be used for expression of enzymes (e.g., heterologous glycosyltransferase and a heterologous deoxysugar pathways) in specific parts of the plant or seed or throughout the plant.
  • a plant promoter fragment can be employed which will direct expression of a nucleic acid in some or all tissues of a plant, e.g., a regenerated plant.
  • Such promoters are referred to herein as "constitutive" promoters and are active under most environmental conditions and states of development or cell differentiation.
  • constitutive promoters used to practice the invention include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'- promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill.
  • Such genes include, e.g., ACT11 from Arabidopsis (Huang (1996) Plant Mol Biol. 33:125-139); Cat3 from Arabidopsis (Genbank No. U43147, Zhong (1996) Mol. Gen. Genet. 251:196-203); the gene encoding stearoyl-acyl carrier protein desaturase from Brassica napus (Genbank No.
  • tissue-specific or constitutive promoters derived from viruses which can include, e.g., the tobamovirus subgenomic promoter (Kumagai (1995) Proc. Natl. Acad. Sci.
  • the methods of the invention can use plant promoters to direct expression of enzyme-coding (e.g., heterologous glycosyltransferase and a heterologous deoxysugar pathways) nucleic acid in a specific tissue, organ or cell type (i.e.
  • tissue-specific promoters The methods of the invention can use plant or other promoters under environmental or developmental control. Exemplary promoters include transcriptional control elements inducible under various environmental conditions, including anaerobic conditions, elevated temperature, the presence of light, chemicals and/or hormones. In one aspect, the plants are sprayed with chemicals and/or hormones to induce expression of a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention.
  • the methods of the invention can use the drought-inducible promoter of maize (Busk (1997) supra); or, the cold, drought, and high salt inducible promoter from potato (Kirch (1997) Plant Mol. Biol. 33:897-909).
  • tissue-specific promoters used in the methods of the invention promote transcription only within a certain time frame of developmental stage within that tissue, including, e.g., promoters described in: Blazquez (1998) Plant Cell
  • nucleic acids used to practice the invention are operably linked to a promoter active primarily only in cotton fiber cells.
  • nucleic acids used to practice the invention are operably linked to a promoter active primarily during the stages of cotton fiber cell elongation, e.g., as described by Rinehart (1996) supra.
  • the nucleic acids can be operably linked to the Fbl2A gene promoter to be preferentially expressed in cotton fiber cells (Ibid.). See also,
  • Root-specific promoters may also be used to express the nucleic acids used to practice the invention.
  • examples of root-specific promoters include the promoter from the alcohol dehydrogenase gene (DeLisle (1990) Int. Rev. Cytol. 123:39-60).
  • Other promoters that can be used to express the nucleic acids used to practice the invention include, e.g., ovule-specific, embryo-specific, endosperm-specific, integument-specific, seed coat-specific promoters, or some combination thereof; a leaf-specific promoter (see, e.g., Busk (1997) Plant J.
  • the Blec4 gene from pea which is active in epidermal tissue of vegetative and floral shoot apices of transgenic alfalfa making it a useful tool to target the expression of foreign genes to the epidermal layer of actively growing shoots or fibers
  • the ovule-specific BEL1 gene (see, e.g., Reiser (1995) Cell 83:735-742, GenBank No. U39944)
  • the promoter in Klee, U.S. Patent No. 5,589,583, which describes a plant promoter region capable of conferring high levels of transcription in meristematic tissue and/or rapidly dividing cells.
  • plant promoters which are inducible upon exposure to plant hormones, such as auxins, can be used to express nucleic acids used to practice the invention.
  • the invention can use the auxin-response elements E1 promoter fragment (AuxREs) in the soybean (Glycine max L.) (Liu (1997) Plant Physiol. 115:397-407); the auxin-responsive Arabidopsis GST6 promoter (also responsive to salicylic acid and hydrogen peroxide) (Chen (1996) Plant J. 10:955-966); the auxin-inducible parC promoter from tobacco (Sakai (1996) 37:906-913); a plant biotin response element (Streit (1997) Mol. Plant Microbe Interact.
  • Nucleic acids used to practice the invention can also be operably linked to plant promoters which are inducible upon exposure to chemical reagents which can be applied to the plant, such as herbicides or antibiotics.
  • the maize In2-2 promoter, activated by benzenesulfonamide herbicide safeners, can be used (De Veylder
  • Coding sequence can be under the control of, e.g., a tetracycline-inducible promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. (oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or, a salicylic acid-responsive element (Stange (1997) Plant J. 11:1315-1324).
  • hormone- or pesticide-) induced promoters i.e., promoter responsive to a chemical which can be applied to the transgenic plant in the field, expression of a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention can be induced at a particular stage of development of the plant.
  • the invention also provides for transgenic plants containing an inducible gene encoding for a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention, where the plants' host range is limited to target plant species, such as corn, rice, barley, wheat, potato or other crops, inducible at any stage of development of the crop.
  • a tissue-specific plant promoter may drive expression of operably linked sequences in tissues other than the target tissue.
  • a tissue-specific promoter is one that drives expression preferentially in the target tissue or cell type, but may also lead to some expression in other tissues as well.
  • the nucleic acids used to practice the invention can also be operably linked to plant promoters which are inducible upon exposure to chemical reagents. These reagents include, e.g., herbicides, synthetic auxins, or antibiotics which can be applied, e.g., sprayed, onto transgenic plants.
  • Inducible expression of a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention will allow the grower to induce production in a plant, a plant cell, a seed, a fruit and the like. Inducible expression will also allow selection of plants with fungi-resistant properties. The development of toxin resistant plants, seeds, fruits, etc. can be controlled in this manner. In this way the invention provides the means to facilitate the growth, harvesting and storage of plants and plant parts by constitutive or inducible enzyme detoxification.
  • the maize In2-2 promoter, activated by benzenesulfonamide herbicide safeners, is used (De Veylder (1997) Plant Cell Physiol. 38:568-577); application of different herbicide safeners induces distinct gene expression patterns, including expression in the root, hydathodes, and the shoot apical meristem.
  • Coding sequences of nucleic acids also can be under the control of a tetracycline-inducible promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. (oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or, a salicylic acid-responsive element (Stange (1997) Plant J. 11:1315-1324).
  • Expression vectors and cloning vehicles comprising nucleic acids encoding enzymes (e.g., glycosyltransferases or deoxysugar pathways) are used to practice the invention.
  • Expression vectors and cloning vehicles used to practice the invention can comprise viral particles, baculovirus, phage, plasmids, phagemids, cosmids, fosmids, bacterial artificial chromosomes, viral DNA (e.g., vaccinia, adenovirus, fowl pox virus, pseudorabies and derivatives of SV40), P1-based artificial chromosomes, yeast plasmids, yeast artificial chromosomes, and any other vectors specific for specific hosts of interest (such as Bacillus, Aspergillus and yeast).
  • Vectors used to practice the invention can include chromosomal, non-chromosomal and synthetic DNA sequences. Large numbers of suitable vectors are known to those of skill in the art, and are commercially available. Exemplary vectors include: bacterial: pQE vectors (Qiagen), pBluescript plasmids, pNH vectors, lambda-ZAP vectors (Stratagene); ptrc99a, pKK223-3, pDR540, pRIT2T (Pharmacia); eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, pSVLSV40 (Pharmacia).
  • the expression vector can comprise a promoter, a ribosome binding site for translation initiation and a transcription terminator.
  • the vector may also include appropriate sequences for amplifying expression.
  • Mammalian expression vectors can comprise an origin of replication, any necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking non-transcribed sequences.
  • DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required non-transcribed genetic elements.
  • the expression vectors contain one or more selectable marker genes to permit selection of host cells containing the vector.
  • selectable markers include genes encoding dihydrofolate reductase or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli, and the S. cerevisiae TRP1 gene.
  • Promoter regions can be selected from any desired gene using chloramphenicol transferase (CAT) vectors or other vectors with selectable markers.
  • Vectors for expressing a polypeptide or fragment thereof in eukaryotic cells can also contain enhancers to increase expression levels.
  • Enhancers are cis-acting elements of DNA, usually from about 10 to about 300 bp in length that act on a promoter to increase its transcription. Examples include the SV40 enhancer on the late side of the replication origin bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and the adenovirus enhancers.
  • a nucleic acid sequence can be inserted into a vector by a variety of procedures. In general, the sequence is ligated to the desired position in the vector following digestion of the insert and the vector with appropriate restriction endonucleases. Alternatively, blunt ends in both the insert and the vector may be ligated.
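The digestion-and-ligation procedure described above can be sketched on plain strings. This is a simplified, top-strand-only model: it assumes EcoRI as the restriction endonuclease (recognition sequence GAATTC, cut between G and AATTC), a vector with exactly one site, and made-up toy sequences. The function names are hypothetical, and real cloning also involves the complementary strand, fragment purification and insert orientation.

```python
# Simplified top-strand model of restriction cloning; all sequences are toy examples.
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
ECORI_CUT = 1          # the top strand is cut after the first base: G^AATTC


def digest(seq: str, site: str = ECORI_SITE, cut_offset: int = ECORI_CUT) -> list[str]:
    """Return the top-strand fragments of a linear sequence after complete digestion."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def ligate_into_vector(vector: str, insert_fragment: str) -> str:
    """Open a vector at its single site and join the insert between the two arms."""
    arms = digest(vector)
    if len(arms) != 2:
        raise ValueError("this sketch assumes exactly one restriction site in the vector")
    left_arm, right_arm = arms
    return left_arm + insert_fragment + right_arm


donor = "CCGAATTCATGGCCTGAGAATTCGG"   # toy insert flanked by two EcoRI sites
vector = "TTTTGAATTCAAAA"             # toy vector with one EcoRI site
_, insert_fragment, _ = digest(donor)  # keep the internal fragment
recombinant = ligate_into_vector(vector, insert_fragment)
print(recombinant)
```

Because both the insert and the opened vector carry matching cut ends in this model, the recognition site is regenerated on each side of the insert after joining.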
  • the vector can be in the form of a plasmid, a viral particle, or a phage.
  • Other vectors include chromosomal, non-chromosomal and synthetic DNA sequences, derivatives of SV40; bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies.
  • cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by, e.g., Sambrook.
  • Particular bacterial vectors which can be used include the commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), GEM1 (Promega Biotec, Madison, WI, USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174, pBluescript II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia), pKK232-8 and pCM7.
  • eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, and pSVL (Pharmacia).
  • any other vector may be used as long as it is replicable and viable in the host cell.
  • the nucleic acids used to practice the invention can be expressed in expression cassettes, vectors or viruses and transiently or stably expressed in plant cells and seeds.
  • One exemplary transient expression system uses episomal expression systems, e.g., cauliflower mosaic virus (CaMV) viral RNA generated in the nucleus by transcription of an episomal mini-chromosome containing supercoiled DNA, see, e.g., Covey (1990) Proc. Natl. Acad. Sci. USA 87:1633-1637.
  • coding sequences i.e., all or sub-fragments of sequences encoding polypeptides having a glycosyltransferase or deoxysugar pathway activity can be inserted into a plant host cell genome becoming an integral part of the host chromosomal DNA.
  • Sense or antisense transcripts can be expressed in this manner.
  • a vector comprising sequences (e.g., promoters or coding regions) from nucleic acids used to practice the invention can comprise a marker gene that confers a selectable phenotype on a plant cell or a seed.
  • the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or Basta.
  • the expression vectors comprising nucleic acids encoding a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention are expressed (inducibly or constitutively) in plants and plant parts, including cells, seeds, fruits, leaves, roots, flowers and the like.
  • Expression vectors capable of expressing nucleic acids and proteins in plants are well known in the art, and can include, e.g., vectors from Agrobacterium spp., potato virus X (see, e.g., Angell (1997) EMBO J.
  • tobacco mosaic virus (see, e.g., Casper (1996) Gene 173:69-73)
  • tomato bushy stunt virus (see, e.g., Hillman (1989) Virology 169:42-50)
  • tobacco etch virus (see, e.g., Dolja (1997) Virology 234:243-252)
  • bean golden mosaic virus (see, e.g., Morinaga (1993) Microbiol Immunol. 37:471-476)
  • cauliflower mosaic virus (see, e.g., Cecchini (1997) Mol. Plant Microbe Interact. 10:1094-1101)
  • maize Ac/Ds transposable element see, e.g., Rubin (1997) Mol. Cell. Biol.
  • the expression vector can have two replication systems to allow it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification.
  • the expression vector can contain at least one sequence homologous to the host cell genome. It can contain two homologous sequences which flank the expression construct.
  • the integrating vector can be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector.
  • Constructs for integrating vectors are well known in the art.
  • Expression vectors used to practice the invention may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed, e.g., genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline.
  • Selectable markers can also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways.
  • transformed cells comprising a nucleic acid sequence encoding glycosyltransferases or deoxysugar pathways, or capable of producing a novel compound of the invention, are used to practice the invention.
  • a transformed cell is used to generate a polypeptide having glycosyltransferase or deoxysugar pathway activity, which can in turn be applied to or administered to an animal, a plant, a food, a feed, a patient, and the like, as described below.
  • a transformed cell that generates (e.g., secretes) a glycosyltransferase or deoxysugar pathway activity, or a novel compound of the invention is itself applied to a plant, a food, a feed, and the like.
  • the host cell may be any of the host cells familiar to those skilled in the art, including prokaryotic cells, eukaryotic cells, such as bacterial cells, fungal cells, yeast cells, mammalian cells, insect cells, or plant cells.
  • a host cell can be an actinomycetes (i.e., any organism from the order Actinomycetales), e.g., a recombinantly engineered actinomycetes, comprising a heterologous glycosyltransferase and a heterologous deoxysugar pathway.
  • the actinomycetes is a Streptomyces, such as a Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavenulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygr
  • the actinomycete is an actinomycete plant endophyte.
  • the Actinomycetales is from the family Micromonosporaceae, or the genus Actinomyces, Actinomadura or Nocardia.
  • Micromonosporaceae are preferably Micromonosporaceae, Actinoplanes, Dactylosporangium, Micromonospora or Verrucosispora.
  • Alternative host cells include Pseudonocardineae, Actinosynnema, Lechevalieria, Saccharothrix, Actinoalloteichus, Actinopolyspora, Amycolatopsis, Kibdelosporangium, Pseudonocardia, Saccharomonospora, Saccharopolyspora, and Streptoalloteichus.
  • Alternative host cells include Streptomycetaceae, including Kitasatospora and Streptomyces.
  • Alternative host cells include Microbispora and Microtetraspora.
  • Exemplary bacterial cells include E. coli, Streptomyces, Bacillus subtilis, Bacillus cereus, Salmonella typhimurium and various species within the genera Bacillus, Streptomyces, and Staphylococcus.
  • Exemplary insect cells include Drosophila S2 and Spodoptera Sf9.
  • Exemplary animal cells include CHO, COS or Bowes melanoma or any mouse or human cell line. The selection of an appropriate host is within the abilities of those skilled in the art. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, e.g., Weising (1988) Ann. Rev. Genet. 22:421-477; U.S. Patent No. 5,750,870.
  • the vector can be introduced into the host cells using any of a variety of techniques, including transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Particular methods include calcium phosphate transfection, DEAE-Dextran mediated transfection, lipofection, or electroporation, see, e.g., Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986).
  • the nucleic acids or vectors used to practice the invention are introduced into the cells for screening, thus, the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by the targeted cell type.
  • Exemplary methods include CaPO4 precipitation, liposome fusion, lipofection (e.g., LIPOFECTIN™), electroporation, viral infection, etc.
  • the candidate nucleic acids may stably integrate into the genome of the host cell (for example, with retroviral introduction) or may exist either transiently or stably in the cytoplasm (i.e. through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.).
  • retroviral vectors capable of transfecting such targets can also be used.
  • the engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes encoding a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention.
  • the selected promoter may be induced by appropriate means (e.g., temperature shift or chemical induction) and the cells may be cultured for an additional period to allow them to produce the desired polypeptide or fragment thereof.
  • Cells can be harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract is retained for further purification.
  • Microbial cells employed for expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to those skilled in the art.
  • the expressed polypeptide or fragment thereof can be recovered and purified from recombinant cell cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography. Protein refolding steps can be used, as necessary, in completing configuration of the polypeptide.
  • HPLC high performance liquid chromatography
  • Various mammalian cell culture systems can also be employed to express recombinant protein.
  • Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts and other cell lines capable of expressing proteins from a compatible vector, such as the C127, 3T3, CHO, HeLa and BHK cell lines.
  • the constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence.
  • the polypeptides produced by host cells containing the vector may be glycosylated or may be non-glycosylated.
  • Polypeptides used to practice the invention may or may not also include an initial methionine amino acid residue.
  • enzymes that can modify the sugar moiety before or after transfer to the aglycone or pseudoaglycone are included.
  • the new glycone produced can be subjected to further processing or derivatization by the host cell or post isolation.
  • By "aglycone" is meant a molecule, such as a polyketide, a macrolide, a peptide or a mixed polyketide-peptide or macrolide-peptide, that can be processed, for example, by a glycosyltransferase, to receive at least one carbohydrate (including a natural or unnatural sugar), e.g., at least one activated sugar moiety.
  • sugar moieties of or added to the aglycone or pseudoaglycone can include forms modified by enzymes such as methyl transferases, hydroxylases, or epoxidases, before or after a glycosyltransferase step.
  • Diversity in the glycones of the invention can be achieved by modifying one or more glycone- (or aglycone- or pseudoaglycone-) modifying endogenous genes in a host cell.
  • Host cells can be engineered to inactivate or activate all or parts of an endogenous metabolic pathway, e.g. a deoxysugar biosynthetic pathway, a glycosyltransferase, a macrolide biosynthetic pathway, by means well known in the art.
  • an antisense molecule against an endogenous polypeptide or enzyme of interest in a metabolic pathway can be administered to the host cell or expressed in the host cell to modulate expression of that polypeptide.
  • a pathway or pathways are modulated to allow production of an aglycone or pseudoaglycone of interest that is then a substrate for at least one enzyme and/or pathway of the present invention.
  • Other methods are well known to inactivate a particular enzyme or pathway of interest, for example by gene disruption of the structural gene encoding the target enzyme or of a regulatory gene or cis region controlling expression of that enzyme or its pathway, e.g., operon.
  • a particular active domain of an enzyme e.g. a domain of a PKS, is inactivated to yield a host cell whose modified PKS produces a modified product that is then a substrate for use in the invention.
  • a host cell can either naturally produce a desired aglycone or pseudoaglycone or be modified to produce one. Additionally the host cell can be modified to further modify a glycone, aglycone or pseudoaglycone produced by the glycosyltransferase or sugar pathway of the invention. Cell-free translation systems can also be used to practice the invention.
  • Cell-free translation systems can use mRNAs transcribed from a DNA construct comprising a promoter operably linked to a nucleic acid encoding the polypeptide or fragment thereof.
  • the DNA construct may be linearized prior to conducting an in vitro transcription reaction.
  • the transcribed mRNA is then incubated with an appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to produce the desired polypeptide or fragment thereof.
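The two-step flow just described (in vitro transcription of a DNA construct, then translation of the mRNA in a cell-free extract) can be mimicked informatically. The sketch below is only a sequence-level analogy under stated assumptions: the input is treated as the coding (sense) strand downstream of the promoter, translation starts at the first AUG, the standard genetic code is used, and the construct sequence is a made-up toy example. The function names are hypothetical.

```python
# Sequence-level analogy of transcription followed by translation (toy example).
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, keyed by DNA codon; "*" marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA corresponding to a coding-strand DNA sequence."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


construct = "GGATGGCTAAATGACC"  # toy coding strand: leader, ATG GCT AAA TGA, trailer
mrna = transcribe(construct)
print(mrna, translate(mrna))
```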
  • the expression vectors can contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.
  • the invention provides methods of generating variants of the in vivo glycosylation systems of the invention, including heterologous glycosyltransferases and heterologous deoxysugar pathways, using, e.g., evolution technologies, such as GSSM™ and GENEREASSEMBLY™.
  • GSSM™ is used to create all possible codon substitutions at each position within a given gene.
  • GENEREASSEMBLY™ is routinely used to create libraries of gene chimeras from functionally homologous parental genes, or to recombine desirable mutants identified by GSSM™.
  • the gene fragments can be generated by PCR or from synthetic oligonucleotides.
  • Gene chimeras are formed by ligation of pooled homologous fragments. All fragments in a pool have identical overhangs and therefore all combinations are formed with the same probability. The overhangs are directly engineered into the oligo fragments or generated by class IIS restriction sites included in the PCR primers.
  • the complexity of the resulting library can be customized to fit the needs of the individual project. It can be used for the evolution of protein domains, entire enzymes, or even pathways.
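To make the library-complexity point concrete, the sketch below enumerates every chimera that can form when one fragment is drawn from each pool, and computes the library size as the product of the pool sizes. It assumes, as the text states, that all fragments in a pool ligate with equal probability; the pool contents are arbitrary toy sequences and the function names are hypothetical.

```python
# Toy enumeration of a chimera library built from pooled fragments.
from itertools import product
from math import prod


def library_size(pools: list[list[str]]) -> int:
    """Number of distinct chimeras: the product of the pool sizes."""
    return prod(len(pool) for pool in pools)


def chimera_library(pools: list[list[str]]) -> list[str]:
    """Join one fragment from each pool, in pool order, for every combination."""
    return ["".join(combo) for combo in product(*pools)]


# Three fragment positions with 2, 3 and 2 parental variants (toy sequences).
pools = [
    ["AAA", "AAC"],
    ["GGG", "GGC", "GGT"],
    ["TTT", "TTC"],
]
library = chimera_library(pools)
print(library_size(pools), "chimeras, each expected at frequency", 1 / library_size(pools))
```

Adding or removing parental fragments at any position changes the library size multiplicatively, which is one way the complexity could be tuned to a project.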
  • GENEREASSEMBLY™ works equally well with all parental DNAs independent of sequence homology. This allows for the recombination of proteins even at non-conserved amino acids.
  • the genetic composition of a cell is altered by, e.g., modification of a homologous gene ex vivo, followed by its reinsertion into the cell.
  • a nucleic acid can be altered by any means. For example, random or stochastic methods, or non-stochastic, or "directed evolution," methods can be used, see, e.g., U.S. Patent No. 6,361,974. Methods for random mutation of genes are well known in the art, see, e.g., U.S. Patent No. 5,830,696. For example, mutagens can be used to randomly mutate a gene.
  • Mutagens include, e.g., ultraviolet light or gamma irradiation, or a chemical mutagen, e.g., mitomycin, nitrous acid, photoactivated psoralens, alone or in combination, to induce DNA breaks amenable to repair by recombination.
  • chemical mutagens include, for example, sodium bisulfite, nitrous acid, hydroxylamine, hydrazine or formic acid.
  • Other mutagens are analogues of nucleotide precursors, e.g., nitrosoguanidine, 5-bromouracil, 2-aminopurine, or acridine.
  • Intercalating agents such as proflavine, acriflavine, quinacrine and the like can also be used. Any technique in molecular biology can be used, e.g., random PCR mutagenesis, see, e.g., Rice (1992) Proc. Natl. Acad. Sci. USA 89:5467-5471; or, combinatorial multiple cassette mutagenesis, see, e.g., Crameri (1995) Biotechniques
  • modifications, additions or deletions are introduced by error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis™ (GSSM™), synthetic ligation reassembly (SLR), recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid
  • Mutational methods of generating diversity include, for example, site- directed mutagenesis (Ling et al. (1997) "Approaches to DNA mutagenesis: an overview” Anal Biochem. 254(2): 157-178; Dale et al. (1996) “Oligonucleotide-directed random mutagenesis using the phosphorothioate method” Methods Mol. Biol. 57:369-374; Smith (1985) "In vitro mutagenesis” Ann. Rev. Genet. 19:423-462; Botstein & Shortle (1985) "Strategies and applications of in vitro mutagenesis” Science 229: 1193-1201 ; Carter (1986) “Site-directed mutagenesis” Biochem. J. 237:1-7; and Kunkel (1987) "The site- directed mutagenesis (Ling et al. (1997) "Approaches to DNA mutagenesis: an overview" Anal Biochem. 254(2): 157-178; Dale e
  • Oligonucleotide- directed mutagenesis a simple method using two oligonucleotide primers and a single- stranded DNA template" Methods in Enzymol. 154:329-350); phosphorothioate-modified DNA mutagenesis (Taylor et al. (1985) "The use of phosphorothioate-modified DNA in restriction enzyme reactions to prepare nicked DNA” Nucl. Acids Res. 13: 8749-8764; Taylor et al. (1985) "The rapid generation of oligonucleotide-directed mutations at high frequency using phosphorothioate-modified DNA” Nucl.
  • Additional protocols that can be used to practice the invention include point mismatch repair (Kramer (1984) "Point Mismatch Repair" Cell 38:879-887), mutagenesis using repair-deficient host strains (Carter et al. (1985) "Improved oligonucleotide site-directed mutagenesis using M13 vectors" Nucl. Acids Res. 13:4431-4443; and Carter (1987) "Improved oligonucleotide-directed mutagenesis using M13 vectors" Methods in Enzymol. 154:382-403), deletion mutagenesis (Eghtedarzadeh (1986) "Use of oligonucleotides to generate large deletions" Nucl.
  • Non-stochastic, or "directed evolution," methods, e.g., saturation mutagenesis (e.g., Gene Site Saturation Mutagenesis™ (GSSM™)), synthetic ligation reassembly (SLR), or a combination thereof, can be used to modify the nucleic acids encoding a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention.
  • Polypeptides encoded by the modified nucleic acids can be screened for a new property, e.g., stability, before testing for a glycosyltransferase or a deoxysugar pathway or other activity.
  • Any testing modality or protocol can be used, e.g., using a capillary array platform. See, e.g., U.S. Patent Nos. 6,361,974; 6,280,926; 5,939,250.
  • Saturation mutagenesis, or GSSM™: codon primers containing a degenerate N,N,G/T sequence are used to introduce point mutations into a polynucleotide, e.g., a glycosyltransferase or a deoxysugar pathway enzyme, so as to generate a set of progeny polypeptides in which a full range of single amino acid substitutions is represented at each amino acid position, e.g., an amino acid residue in an enzyme active site or ligand binding site targeted to be modified.
  • oligonucleotides can comprise a contiguous first homologous sequence, a degenerate N,N,G/T sequence, and, optionally, a second homologous sequence.
  • the downstream progeny translational products from the use of such oligonucleotides include all possible amino acid changes at each amino acid site along the polypeptide, because the degeneracy of the N,N,G/T sequence includes codons for all 20 amino acids.
  • one such degenerate oligonucleotide (comprised of, e.g., one degenerate N,N,G/T cassette) is used for subjecting each original codon in a parental polynucleotide template to a full range of codon substitutions.
  • At least two degenerate cassettes are used - either in the same oligonucleotide or not, for subjecting at least two original codons in a parental polynucleotide template to a full range of codon substitutions.
  • more than one N,N,G/T sequence can be contained in one oligonucleotide to introduce amino acid mutations at more than one site.
  • This plurality of N,N,G/T sequences can be directly contiguous, or separated by one or more additional nucleotide sequence(s).
  • oligonucleotides serviceable for introducing additions and deletions can be used either alone or in combination with the codons containing an N,N,G/T sequence, to introduce any combination or permutation of amino acid additions, deletions, and/or substitutions.
  • simultaneous mutagenesis of two or more contiguous amino acid positions is done using an oligonucleotide that contains contiguous N,N,G/T triplets, i.e. a degenerate (N,N,G/T)n sequence.
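  • The oligonucleotide layout described above (a first homologous sequence, one or more contiguous N,N,G/T triplets, and an optional second homologous sequence) can be sketched as follows. This is a minimal illustration only: the function name `nnk_primer`, the flank length, and the example template are hypothetical, and `K` is the IUPAC symbol for G or T.

```python
def nnk_primer(template: str, codon_index: int, n_codons: int = 1, flank: int = 15) -> str:
    """Build a degenerate oligo: 5' homologous flank + (NNK)n + 3' homologous flank.

    template    -- coding sequence of the parental polynucleotide (length divisible by 3)
    codon_index -- zero-based index of the first codon to randomize
    n_codons    -- number of contiguous codons replaced by N,N,G/T triplets
    flank       -- maximum number of homologous nucleotides kept on each side
    """
    if len(template) % 3 != 0:
        raise ValueError("template length must be a multiple of 3")
    start = codon_index * 3
    end = start + 3 * n_codons
    if start < 0 or end > len(template):
        raise ValueError("codon range falls outside the template")
    left = template[max(0, start - flank):start]   # first homologous sequence
    right = template[end:end + flank]              # optional second homologous sequence
    return left + "NNK" * n_codons + right


# Hypothetical 8-codon template: ATG GCT AAA GGT CTG GAA CGT TAA
template = "ATGGCTAAAGGTCTGGAACGTTAA"
single = nnk_primer(template, codon_index=3, flank=6)              # one degenerate cassette
double = nnk_primer(template, codon_index=3, n_codons=2, flank=6)  # contiguous (N,N,G/T)2
print(single, double)
```

  • Calling the function once per codon position gives one degenerate oligonucleotide per original codon; setting `n_codons` above 1 gives the contiguous (N,N,G/T)n case.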
  • degenerate cassettes having less degeneracy than the N,N,G/T sequence are used. For example, it may be desirable in some instances to use a degenerate triplet sequence comprised of only one N, where said N can be in the first, second or third position of the triplet.
  • Any other bases including any combinations and permutations thereof can be used in the remaining two positions of the triplet.
  • use of degenerate triplets allows for systematic and easy generation of a full range of possible natural amino acids (for a total of 20 amino acids) at each and every amino acid position in a polypeptide (in alternative aspects, the methods also include generation of less than all possible substitutions per amino acid residue, or codon, position). For example, for a 100 amino acid polypeptide, 2000 distinct species (i.e. 20 possible amino acids per position × 100 amino acid positions) can be generated.
  • Using an oligonucleotide or set of oligonucleotides containing a degenerate N,N,G/T triplet, 32 individual sequences can code for all 20 possible natural amino acids.
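  • The counts quoted above can be checked by enumerating the N,N,G/T triplet against the standard genetic code. The sketch below assumes the standard code (written in the conventional TCAG ordering); the variable names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG ordering).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Expand the degenerate triplet: any base, any base, then G or T.
codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]

encoded = {CODON_TABLE[c] for c in codons}
amino_acids = encoded - {"*"}                       # drop the stop symbol
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]

# Single-substitution species for a 100-residue polypeptide, as in the text.
species_for_100_residues = 20 * 100

print(len(codons), len(amino_acids), stop_codons, species_for_100_residues)
```

  • Restricting the third position to G or T keeps all 20 amino acids while leaving only one of the three stop codons (TAG) in the set.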
  • Nondegenerate oligonucleotides can optionally be used in combination with degenerate primers disclosed; for example, nondegenerate oligonucleotides can be used to generate specific point mutations in a working polynucleotide.
  • each saturation mutagenesis reaction vessel contains polynucleotides encoding at least 20 progeny polypeptide (e.g., a glycosyltransferase or a deoxysugar pathway enzyme) molecules such that all 20 natural amino acids are represented at the one specific amino acid position corresponding to the codon position mutagenized in the parental polynucleotide (other aspects use less than all 20 natural combinations).
  • the 32-fold degenerate progeny polypeptides generated from each saturation mutagenesis reaction vessel can be subjected to clonal amplification (e.g. cloned into a suitable host, e.g., E. coli host, using, e.g., an expression vector) and subjected to expression screening.
  • an individual progeny polypeptide is identified by screening to display a favorable change in property (when compared to the parental polypeptide, such as increased proteolytic activity under alkaline or acidic conditions), it can be sequenced to identify the correspondingly favorable amino acid substitution contained therein.
  • favorable amino acid changes may be identified at more than one amino acid position.
  • One or more new progeny molecules can be generated that contain a combination of all or part of these favorable amino acid substitutions. For example, if 2 specific favorable amino acid changes are identified in each of 3 amino acid positions in a polypeptide, the permutations include 3 possibilities at each position (no change from the original amino acid, and each of two favorable changes) and 3 positions. Thus, there are 3 x 3 x 3 or 27 total possibilities, including 7 that were previously examined - 6 single point mutations (i.e. 2 at each of three positions) and no change at any position.
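  • The arithmetic in this example can be reproduced by enumerating the combinations directly. The residues below are hypothetical placeholders; `None` stands for "no change from the original amino acid".

```python
from itertools import product

def combine_favorable(options_per_position):
    """Return every combination of 'no change' (None) or one favorable
    substitution at each position."""
    choices = [[None] + list(opts) for opts in options_per_position]
    return list(product(*choices))

# Two hypothetical favorable substitutions at each of three positions.
favorable = [["A", "V"], ["K", "R"], ["S", "T"]]
variants = combine_favorable(favorable)

# Variants with at most one change were already examined during screening:
# the parent itself plus the six single point mutants.
already_examined = [v for v in variants if sum(x is not None for x in v) <= 1]
new_combinations = [v for v in variants if v not in already_examined]

print(len(variants), len(already_examined), len(new_combinations))
```

  • Of the 27 possibilities, 7 were covered by the single-site screens, leaving 20 multi-site combinations to construct and test.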
  • site-saturation mutagenesis can be used together with another stochastic or non-stochastic means to vary sequence, e.g., synthetic ligation reassembly (see below), shuffling, chimerization, recombination and other mutagenizing processes and mutagenizing agents.
  • This invention provides for the use of any mutagenizing process(es), including saturation mutagenesis, in an iterative manner.
  • Synthetic Ligation Reassembly
  • The invention provides a non-stochastic gene modification system termed "synthetic ligation reassembly," or simply "SLR," a "directed evolution process," to generate polypeptides having a glycosyltransferase or a deoxysugar pathway activity with new or altered properties.
  • SLR is a method of ligating oligonucleotide fragments together non-stochastically. This method differs from stochastic oligonucleotide shuffling in that the nucleic acid building blocks are not shuffled, concatenated or chimerized randomly, but rather are assembled non-stochastically. See, e.g., U.S.
  • SLR comprises the following steps: (a) providing a template polynucleotide, wherein the template polynucleotide comprises sequence encoding a homologous gene; (b) providing a plurality of building block polynucleotides, wherein the building block polynucleotides are designed to cross-over reassemble with the template polynucleotide at a predetermined sequence, and a building block polynucleotide comprises a sequence that is a variant of the homologous gene and a sequence homologous to the template polynucleotide flanking the variant sequence; (c) combining a building block polynucleotide with a template polynucleotide such that the building block polynucleotide cross-over reassembles with the
  • SLR does not depend on the presence of high levels of homology between polynucleotides to be rearranged.
  • this method can be used to non-stochastically generate libraries (or sets) of progeny molecules comprised of over 10^100 different chimeras.
  • SLR can be used to generate libraries comprised of over 10^1000 different progeny chimeras.
  • aspects of the present invention include non-stochastic methods of producing a set of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. This method includes the steps of generating by design a plurality of specific nucleic acid building blocks having serviceable mutually compatible ligatable ends, and assembling these nucleic acid building blocks, such that a designed overall assembly order is achieved.
  • the mutually compatible ligatable ends of the nucleic acid building blocks to be assembled are considered to be "serviceable" for this type of ordered assembly if they enable the building blocks to be coupled in predetermined orders.
  • the overall assembly order in which the nucleic acid building blocks can be coupled is specified by the design of the ligatable ends. If more than one assembly step is to be used, then the overall assembly order in which the nucleic acid building blocks can be coupled is also specified by the sequential order of the assembly step(s).
  • the annealed building pieces are treated with an enzyme, such as a ligase (e.g. T4 DNA ligase), to achieve covalent bonding of the building pieces.
  • the design of the oligonucleotide building blocks is obtained by analyzing a set of progenitor nucleic acid sequence templates that serve as a basis for producing a progeny set of finalized chimeric polynucleotides.
  • These parental oligonucleotide templates thus serve as a source of sequence information that aids in the design of the nucleic acid building blocks that are to be mutagenized, e.g., chimerized or shuffled.
  • the sequences of a plurality of parental nucleic acid templates are aligned in order to select one or more demarcation points.
  • the demarcation points can be located at an area of homology, and are comprised of one or more nucleotides.
  • demarcation points are preferably shared by at least two of the progenitor templates.
  • the demarcation points can thereby be used to delineate the boundaries of oligonucleotide building blocks to be generated in order to rearrange the parental polynucleotides.
  • the demarcation points identified and selected in the progenitor molecules serve as potential chimerization points in the assembly of the final chimeric progeny molecules.
  • a demarcation point can be an area of homology (comprised of at least one homologous nucleotide base) shared by at least two parental polynucleotide sequences.
  • a demarcation point can be an area of homology that is shared by at least half of the parental polynucleotide sequences, or it can be an area of homology that is shared by at least two thirds of the parental polynucleotide sequences. Even more preferably, a serviceable demarcation point is an area of homology that is shared by at least three fourths of the parental polynucleotide sequences, or it can be shared by almost all of the parental polynucleotide sequences. In one aspect, a demarcation point is an area of homology that is shared by all of the parental polynucleotide sequences.
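  • Selecting candidate demarcation points from aligned parental templates can be sketched as a column-by-column scan. This is a simplified illustration, not the procedure used in the invention: it assumes the parents are already aligned to equal length (with `-` for gaps), treats a single shared nucleotide as an area of homology, and uses a hypothetical `min_fraction` parameter for the "half", "two thirds", "three fourths" or "all" thresholds.

```python
from collections import Counter

def demarcation_points(aligned, min_fraction=0.5):
    """Return alignment columns where one base is shared by at least two parents
    and by at least `min_fraction` of all parental sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for i, column in enumerate(zip(*aligned)):
        counts = Counter(b for b in column if b != "-")   # ignore gap characters
        if not counts:
            continue
        _, n = counts.most_common(1)[0]
        if n >= 2 and n / len(aligned) >= min_fraction:
            points.append(i)
    return points

# Three hypothetical aligned parental templates.
parents = ["ACGTAC", "ACCTAG", "TCGAAT"]
shared_by_half = demarcation_points(parents, min_fraction=0.5)
shared_by_all = demarcation_points(parents, min_fraction=1.0)
print(shared_by_half, shared_by_all)
```

  • Raising `min_fraction` narrows the candidates to positions conserved across more of the parents, which corresponds to the progressively stricter preferences listed above.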
  • a ligation reassembly process is performed exhaustively in order to generate an exhaustive library of progeny chimeric polynucleotides.
  • all possible ordered combinations of the nucleic acid building blocks are represented in the set of finalized chimeric nucleic acid molecules.
  • the assembly order (i.e. the order of assembly of each building block in the 5' to 3' sequence of each finalized chimeric nucleic acid) is by design (or non-stochastic) as described above. Because of the non-stochastic nature of this invention, the possibility of unwanted side products is greatly reduced.
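  • An exhaustive library with a fixed assembly order can be illustrated by taking one building block per position, in 5' to 3' order, from every parent. The block sequences below are hypothetical and the ligation chemistry is not modeled; the sketch only shows that the designed order determines exactly which chimeras appear.

```python
from itertools import product

def exhaustive_library(blocks_by_position):
    """Join one building block per position, in the designed 5'->3' order,
    for every possible combination of blocks."""
    return ["".join(combo) for combo in product(*blocks_by_position)]

# Hypothetical building blocks: position 1 has 2 variants, position 2 has 3,
# position 3 has 2. Each inner list holds the alternatives for one position.
blocks = [
    ["ATGGCA", "ATGGCT"],
    ["CGTAAA", "CGCAAG", "AGAAAA"],
    ["TTGTAA", "CTGTAA"],
]
library = exhaustive_library(blocks)
print(len(library), library[0])
```

  • The library size is the product of the number of alternatives at each position (here 2 × 3 × 2 = 12), and no chimera with blocks out of order can occur.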
  • the ligation reassembly method is performed systematically.
  • the method is performed in order to generate a systematically compartmentalized library of progeny molecules, with compartments that can be screened systematically, e.g. one by one.
  • this invention provides that, through the selective and judicious use of specific nucleic acid building blocks, coupled with the selective and judicious use of sequentially stepped assembly reactions, a design can be achieved where specific sets of progeny products are made in each of several reaction vessels. This allows a systematic examination and screening procedure to be performed. Thus, these methods allow a potentially very large number of progeny molecules to be examined systematically in smaller groups.
  • the progeny molecules generated preferably comprise a library of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design.
  • the saturation mutagenesis and optimized directed evolution methods also can be used to generate different progeny molecular species.
  • the invention provides freedom of choice and control regarding the selection of demarcation points, the size and number of the nucleic acid building blocks, and the size and design of the couplings. It is appreciated, furthermore, that the requirement for intermolecular homology is highly relaxed for the operability of this invention. In fact, demarcation points can even be chosen in areas of little or no intermolecular homology. For example, because of codon wobble, i.e. the degeneracy of codons, nucleotide substitutions can be introduced into nucleic acid building blocks without altering the amino acid originally encoded in the corresponding progenitor template. Alternatively, a codon can be altered such that the coding for an original amino acid is altered.
  • a nucleic acid building block is used to introduce an intron.
  • Optimized Directed Evolution System
  • The invention provides a non-stochastic gene modification system termed "optimized directed evolution system" to generate polypeptides having a glycosyltransferase or a deoxysugar pathway activity with new or altered properties. Optimized directed evolution is directed to the use of repeated cycles of reductive reassortment, recombination and selection that allow for the directed molecular evolution of nucleic acids through recombination.
  • Optimized directed evolution allows generation of a large population of evolved chimeric sequences, wherein the generated population is significantly enriched for sequences that have a predetermined number of crossover events.
  • a crossover event is a point in a chimeric sequence where a shift in sequence occurs from one parental variant to another parental variant. Such a point is normally at the juncture of where oligonucleotides from two parents are ligated together to form a single sequence.
  • This method allows calculation of the correct concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched for the chosen number of crossover events. This provides more control over choosing chimeric variants having a predetermined number of crossover events.
  • this method provides a convenient means for exploring a tremendous amount of the possible protein variant space in comparison to other systems.
  • Previously, if one generated, for example, 10^13 chimeric molecules during a reaction, it would be extremely difficult to test such a high number of chimeric variants for a particular activity.
  • a significant portion of the progeny population would have a very high number of crossover events which resulted in proteins that were less likely to have increased levels of a particular activity.
  • the population of chimeric molecules can be enriched for those variants that have a particular number of crossover events.
  • each of the molecules chosen for further analysis most likely has, for example, only three crossover events.
  • One method for creating a chimeric progeny polynucleotide sequence is to create oligonucleotides corresponding to fragments or portions of each parental sequence.
  • Each oligonucleotide preferably includes a unique region of overlap so that mixing the oligonucleotides together results in a new variant that has each oligonucleotide fragment assembled in the correct order. Additional information can also be found, e.g., in USSN 09/332,835; U.S. Patent No. 6,361,974.
  • the number of oligonucleotides generated for each parental variant bears a relationship to the total number of resulting crossovers in the chimeric molecule that is ultimately created. For example, three parental nucleotide sequence variants might be provided to undergo a ligation reaction in order to find a chimeric variant having, for example, greater activity at high temperature.
  • a set of 50 oligonucleotide sequences can be generated corresponding to each portion of each parental variant. Accordingly, during the ligation reassembly process there could be up to 50 crossover events within each of the chimeric sequences. The probability that each of the generated chimeric polynucleotides will contain oligonucleotides from each parental variant in alternating order is very low. If each oligonucleotide fragment is present in the ligation reaction in the same molar quantity, it is likely that in some positions oligonucleotides from the same parental polynucleotide will ligate next to one another and thus not result in a crossover event.
  • a probability density function (PDF) can be determined to predict the population of crossover events that are likely to occur during each step in a ligation reaction given a set number of parental variants, a number of oligonucleotides corresponding to each variant, and the concentrations of each variant during each step in the ligation reaction.
  • a target number of crossover events can be predetermined, and the system then programmed to calculate the starting quantities of each parental oligonucleotide during each step in the ligation reaction to result in a probability density function that centers on the predetermined number of crossover events.
  • This system allows generation of a large population of evolved chimeric sequences, wherein the generated population is significantly enriched for sequences that have a predetermined number of crossover events.
  • the method allows calculation of the correct concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched for the chosen number of crossover events. This provides more control over choosing chimeric variants having a predetermined number of crossover events.
  • Because the resulting progeny population can be skewed to have a predetermined number of crossover events, the boundaries on the functional variety between the chimeric molecules are reduced. This provides a more manageable number of variables when calculating which oligonucleotide from the original parental polynucleotides might be responsible for affecting a particular trait.
  • the method creates a chimeric progeny polynucleotide sequence by creating oligonucleotides corresponding to fragments or portions of each parental sequence.
  • Each oligonucleotide preferably includes a unique region of overlap so that mixing the oligonucleotides together results in a new variant that has each oligonucleotide fragment assembled in the correct order. See also USSN 09/332,835.
  • the number of oligonucleotides generated for each parental variant bears a relationship to the total number of resulting crossovers in the chimeric molecule that is ultimately created.
  • three parental nucleotide sequence variants might be provided to undergo a ligation reaction in order to find a chimeric variant having, for example, greater activity at high temperature.
  • the probability that each of the generated chimeric polynucleotides will contain oligonucleotides from each parental variant in alternating order is very low. If each oligonucleotide fragment is present in the ligation reaction in the same molar quantity it is likely that in some positions oligonucleotides from the same parental polynucleotide will ligate next to one another and thus not result in a crossover event. If the concentration of each oligonucleotide from each parent is kept constant during any ligation step in this example, there is a 1/3 chance (assuming 3 parents) that an oligonucleotide from the same parental variant will ligate within the chimeric sequence and produce no crossover.
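  • Under the equal-concentration case just described, the distribution of crossover counts can be approximated with a simple binomial model. This sketch is not the software of the invention: it assumes each junction is independent, that a junction joins fragments from the same parent with probability 1/(number of parents), and it takes the figure of up to 50 crossover events from the example as the number of junctions.

```python
from math import comb

def crossover_pdf(n_junctions: int, n_parents: int) -> list[float]:
    """Probability of k crossovers (k = 0..n_junctions) when every parent's
    oligonucleotides are present at the same molar quantity."""
    p_cross = 1 - 1 / n_parents          # e.g. 2/3 for three parents
    return [
        comb(n_junctions, k) * p_cross**k * (1 - p_cross) ** (n_junctions - k)
        for k in range(n_junctions + 1)
    ]

pdf = crossover_pdf(n_junctions=50, n_parents=3)
mean_crossovers = sum(k * pk for k, pk in enumerate(pdf))
p_fully_alternating = pdf[-1]            # a crossover at every junction
print(round(mean_crossovers, 2), p_fully_alternating)
```

  • In this model the population centers on roughly two thirds of the junctions being crossovers, and a crossover at every junction is extremely unlikely. This is why the oligonucleotide quantities must be adjusted to shift the distribution toward a chosen, smaller number of crossover events.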
  • aspects of the invention include a system and software that receive a desired crossover probability density function (PDF), the number of parent genes to be reassembled, and the number of fragments in the reassembly as inputs.
  • the output of this program is a "fragment PDF" that can be used to determine a recipe for producing reassembled genes, and the estimated crossover PDF of those genes.
  • the processing can be performed, e.g., in MATLAB™ (The Mathworks, Natick, Massachusetts), a programming language and development environment for technical computing.
  • Iterative Processes
  • In practicing the invention, these processes can be iteratively repeated.
  • a nucleic acid responsible for an altered or new glycosyltransferase or a deoxysugar pathway enzyme phenotype is identified, re-isolated, again modified, and re-tested for activity. This process can be iteratively repeated until a desired phenotype is engineered.
  • an entire biochemical anabolic or catabolic pathway can be engineered into a cell, including, e.g., an in vivo glycosylation system of the invention.
  • If a particular oligonucleotide has no effect at all on the desired trait (e.g., a new glycosyltransferase or deoxysugar pathway enzyme phenotype), it can be removed as a variable by synthesizing larger parental oligonucleotides that include the sequence to be removed. Since incorporating the sequence within a larger sequence prevents any crossover events, there will no longer be any variation of this sequence in the progeny polynucleotides. This iterative practice of determining which oligonucleotides are most related to the desired trait, and which are unrelated, allows more efficient exploration of all of the possible protein variants that might provide a particular trait or activity.
  • In vivo shuffling of molecules is used in methods of the invention that provide variants of polypeptides having a glycosyltransferase or deoxysugar pathway activity.
  • In vivo shuffling can be performed utilizing the natural property of cells to recombine multimers. While recombination in vivo has provided the major natural route to molecular diversity, genetic recombination remains a relatively complex process that involves 1) the recognition of homologies; 2) strand cleavage, strand invasion, and metabolic steps leading to the production of recombinant chiasma; and finally 3) the resolution of chiasma into discrete recombined molecules.
  • the invention provides a method for producing a hybrid polynucleotide from at least a first polynucleotide (e.g., a glycosyltransferase or deoxysugar pathway enzyme) and a second polynucleotide (e.g., a polypeptide having a glycosyltransferase or deoxysugar pathway, or, a tag or an epitope).
  • the hybrid polynucleotide can be made by introducing at least a first polynucleotide and a second polynucleotide which share at least one region of partial sequence homology into a suitable host cell.
  • The regions of partial sequence homology promote processes which result in sequence reorganization producing a hybrid polynucleotide.
  • Hybrid polynucleotides can result from intermolecular recombination events which promote sequence integration between DNA molecules.
  • hybrid polynucleotides can result from intramolecular reductive reassortment processes which utilize repeated sequences to alter a nucleotide sequence within a DNA molecule.
  • Producing sequence variants
  • The invention also provides additional methods for making sequence variants of the nucleic acids encoding polypeptides with glycosyltransferase or deoxysugar pathway activity.
  • the invention provides variants of a glycosyltransferase or deoxysugar pathway enzyme coding sequence (e.g., a gene, cDNA or message).
  • the variants can be generated by any means, including, e.g., random or stochastic methods, or non-stochastic, or "directed evolution," methods, as described herein.
  • the isolated variants may be naturally occurring.
  • Variants can also be created in vitro. Variants may be created using genetic engineering techniques such as site directed mutagenesis, random chemical mutagenesis, Exonuclease III deletion procedures, and standard cloning techniques. Alternatively, such variants, fragments, analogs, or derivatives may be created using chemical synthesis or modification procedures.
  • Other methods of making variants are also familiar to those skilled in the art. These include procedures in which nucleic acid sequences obtained from natural isolates are modified to generate nucleic acids which encode polypeptides having characteristics which enhance their value in industrial or laboratory applications. In such procedures, a large number of variant sequences having one or more nucleotide differences with respect to the sequence obtained from the natural isolate are generated and characterized. These nucleotide differences can result in amino acid changes with respect to the polypeptides encoded by the nucleic acids from the natural isolates. For example, variants may be created using error prone PCR.
  • In error prone PCR, PCR is performed under conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is obtained along the entire length of the PCR product.
  • Error prone PCR is described, e.g., in Leung, D.W., et al., Technique, 1:11-15, 1989, and Caldwell, R. C. & Joyce G.F., PCR Methods Applic, 2:28-33, 1992.
  • nucleic acids to be mutagenized are mixed with PCR primers, reaction buffer, MgCl2, MnCl2, Taq polymerase and an appropriate concentration of dNTPs for achieving a high rate of point mutation along the entire length of the PCR product.
  • the reaction may be performed using 20 fmoles of nucleic acid to be mutagenized, 30 pmole of each PCR primer, a reaction buffer comprising 50mM KCl, 10mM Tris HCl (pH 8.3) and 0.01% gelatin, 7mM MgCl2, 0.5mM MnCl2, 5 units of Taq polymerase, 0.2mM dGTP, 0.2mM dATP, 1mM dCTP, and 1mM dTTP.
  • PCR may be performed for 30 cycles of 94°C for 1 min, 45°C for 1 min, and 72°C for 1 min. However, it will be appreciated that these parameters may be varied as appropriate.
  • the mutagenized nucleic acids are cloned into an appropriate vector and the activities of the polypeptides encoded by the mutagenized nucleic acids are evaluated. Variants may also be created using oligonucleotide directed mutagenesis to generate site-specific mutations in any cloned DNA of interest. Oligonucleotide mutagenesis is described, e.g., in Reidhaar-Olson (1988) Science 241:53-57. Briefly, in such procedures a plurality of double stranded oligonucleotides bearing one or more mutations to be introduced into the cloned DNA are synthesized and inserted into the cloned DNA to be mutagenized.
  • Assembly PCR involves the assembly of a PCR product from a mixture of small DNA fragments. A large number of different PCR reactions occur in parallel in the same vial, with the products of one reaction priming the products of another reaction. Assembly PCR is described in, e.g., U.S. Patent No. 5,965,408. Still another method of generating variants is sexual PCR mutagenesis.
  • In sexual PCR mutagenesis, forced homologous recombination occurs between DNA molecules of different but highly related DNA sequence in vitro, as a result of random fragmentation of the DNA molecule based on sequence homology, followed by fixation of the crossover by primer extension in a PCR reaction.
  • Sexual PCR mutagenesis is described, e.g., in Stemmer (1994) Proc. Natl. Acad. Sci. USA 91:10747-10751. Briefly, in such procedures a plurality of nucleic acids to be recombined are digested with DNase to generate fragments having an average size of 50-200 nucleotides. Fragments of the desired average size are purified and resuspended in a PCR mixture.
  • PCR is conducted under conditions which facilitate recombination between the nucleic acid fragments.
  • PCR may be performed by resuspending the purified fragments at a concentration of 10-30 ng/µl in a solution of 0.2mM of each dNTP, 2.2mM MgCl2, 50mM
  • reaction mixture 100 µl of reaction mixture is added and PCR is performed using the following regime:
  • oligonucleotides may be included in the PCR reactions.
  • the Klenow fragment of DNA polymerase I may be used in a first set of PCR reactions and Taq polymerase may be used in a subsequent set of PCR reactions.
  • Recombinant sequences are isolated and the activities of the polypeptides they encode are assessed.
  • Variants may also be created by in vivo mutagenesis.
  • random mutations in a sequence of interest are generated by propagating the sequence of interest in a bacterial strain, such as an E. coli strain, which carries mutations in one or more of the DNA repair pathways.
  • Such "mutator" strains have a higher random mutation rate than that of a wild-type parent. Propagating the DNA in one of these strains will eventually generate random mutations within the DNA.
  • Mutator strains suitable for use for in vivo mutagenesis are described, e.g., in PCT Publication No. WO 91/16427. Variants may also be generated using cassette mutagenesis. In cassette mutagenesis a small region of a double stranded DNA molecule is replaced with a synthetic oligonucleotide "cassette" that differs from the native sequence. The oligonucleotide often contains completely and/or partially randomized native sequence. Recursive ensemble mutagenesis may also be used to generate variants.
  • Recursive ensemble mutagenesis is an algorithm for protein engineering (protein mutagenesis) developed to produce diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis.
  • Recursive ensemble mutagenesis is described, e.g, in Arkin (1992) Proc. Natl. Acad. Sci. USA 89:7811-7815.
  • variants are created using exponential ensemble mutagenesis.
  • Exponential ensemble mutagenesis is a process for generating combinatorial libraries with a high percentage of unique and functional mutants, wherein small groups of residues are randomized in parallel to identify, at each altered position, amino acids which lead to functional proteins.
  • Exponential ensemble mutagenesis is described, e.g, in Delegrave (1993) Biotechnology Res. 11:1548-1552. Random and site-directed mutagenesis are described, e.g, in Arnold (1993) Current Opinion in Biotechnology 4:450-455.
  • the variants are created using shuffling procedures wherein portions of a plurality of nucleic acids which encode distinct polypeptides are fused together to create chimeric nucleic acid sequences which encode chimeric polypeptides as described in, e.g, U.S. Patent Nos. 5,965,408; 5,939,250 (see also discussion, above).
  • the invention also provides variants of polypeptides having a glycosyltransferase or deoxysugar pathway activity comprising sequences in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (e.g, a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code.
  • Conservative substitutions are those that substitute a given amino acid in a polypeptide by another amino acid of like characteristics.
  • polypeptides having a glycosyltransferase or deoxysugar pathway activity include those with conservative substitutions of sequences of known polypeptides having a glycosyltransferase or deoxysugar pathway activity, including but not limited to the following replacements: replacements of an aliphatic amino acid such as Alanine, Valine, Leucine and Isoleucine with another aliphatic amino acid; replacement of a Serine with a Threonine or vice versa; replacement of an acidic residue such as Aspartic acid and Glutamic acid with another acidic residue; replacement of a residue bearing an amide group, such as Asparagine and Glutamine, with another residue bearing an amide group; exchange of a basic residue such as Lysine and Arginine with another basic residue; and replacement of an aromatic residue such as Phenylalanine, Tyrosine with another aromatic residue.
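  • The replacement classes listed above can be expressed as a simple lookup. The following is a minimal sketch, assuming one-letter amino acid codes and equal-length sequences; the names `CONSERVATIVE_GROUPS`, `is_conservative` and `classify_substitutions` are illustrative and not part of the disclosure.

```python
# Residue classes taken from the replacements listed in the text
# (one-letter codes). The grouping is limited to those examples.
CONSERVATIVE_GROUPS = [
    set("AVLI"),  # aliphatic: Ala, Val, Leu, Ile
    set("ST"),    # Ser <-> Thr
    set("DE"),    # acidic: Asp, Glu
    set("NQ"),    # amide-bearing: Asn, Gln
    set("KR"),    # basic: Lys, Arg
    set("FY"),    # aromatic: Phe, Tyr
]


def is_conservative(original, replacement):
    """True if both residues differ and fall in the same class."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(reference, variant):
    """List (position, ref, var, conservative?) for each differing residue.

    Positions are 1-based. Insertions and deletions are not handled.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    report = []
    for position, (ref, var) in enumerate(zip(reference, variant), start=1):
        if ref != var:
            report.append((position, ref, var, is_conservative(ref, var)))
    return report
```

  • For instance, comparing `"MKVLS"` with `"MRVLD"` flags the Lys-to-Arg change at position 2 as conservative and the Ser-to-Asp change at position 5 as non-conservative.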
  • variants are those in which one or more of the amino acid residues of the polypeptides includes a substituent group.
  • Other variants within the scope of the invention are those in which the polypeptide is associated with another compound, such as a compound to increase the half-life of the polypeptide, for example, polyethylene glycol.
  • Additional variants within the scope of the invention are those in which additional amino acids are fused to the polypeptide, such as a leader sequence, a secretory sequence, a proprotein sequence or a sequence which facilitates purification, enrichment, or stabilization of the polypeptide.
  • the variants, fragments, derivatives and analogs of polypeptides having a glycosyltransferase or deoxysugar pathway activity retain the same biological function or activity as the exemplary polypeptides described herein.
  • the variant, fragment, derivative, or analog includes a proprotein, such that the variant, fragment, derivative, or analog can be activated by cleavage of the proprotein portion to produce an active polypeptide.
  • Optimizing codons to achieve high levels of protein expression in host cells
  • The invention provides methods for modifying nucleic acids encoding glycosyltransferase or deoxysugar pathway enzymes by modifying codon usage.
  • the invention provides methods for modifying codons in a nucleic acid encoding a glycosyltransferase or deoxysugar pathway enzyme to increase or decrease its expression in a host cell.
  • the invention also provides nucleic acids encoding a glycosyltransferase or deoxysugar pathway enzyme modified to increase its expression in a host cell, and methods of making the modified glycosyltransferase or deoxysugar pathway.
  • the method comprises identifying a "non-preferred" or a "less preferred" codon in a glycosyltransferase or deoxysugar pathway enzyme-encoding nucleic acid and replacing one or more of these non-preferred or less preferred codons with a "preferred codon" encoding the same amino acid as the replaced codon, such that at least one non-preferred or less preferred codon in the nucleic acid has been replaced by a preferred codon encoding the same amino acid.
  • a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell.
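  • The replacement of under-represented codons by over-represented synonymous codons can be sketched as follows. This is an illustration under stated assumptions: the usage frequencies in `HOST_CODON_USAGE` are invented for a hypothetical host and cover only three amino acids, the 0.15 cutoff for "non-preferred" is arbitrary, and codons absent from the table are left unchanged.

```python
# Hypothetical relative codon usage for an imaginary host cell.
# The numbers are made up for illustration; a real table would be
# derived from the coding sequences of the chosen host.
HOST_CODON_USAGE = {
    "L": {"CTG": 0.50, "CTC": 0.10, "CTT": 0.10, "CTA": 0.04, "TTA": 0.13, "TTG": 0.13},
    "R": {"CGT": 0.38, "CGC": 0.40, "CGA": 0.06, "CGG": 0.10, "AGA": 0.04, "AGG": 0.02},
    "K": {"AAA": 0.76, "AAG": 0.24},
}

# Reverse lookup: codon -> amino acid, for the codons covered above.
CODON_TO_AA = {
    codon: aa for aa, table in HOST_CODON_USAGE.items() for codon in table
}


def replace_non_preferred(cds, threshold=0.15):
    """Swap codons used below `threshold` for the host's most used synonym."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        aa = CODON_TO_AA.get(codon)
        if aa is not None and HOST_CODON_USAGE[aa][codon] < threshold:
            usage = HOST_CODON_USAGE[aa]
            codon = max(usage, key=usage.get)  # preferred synonymous codon
        optimized.append(codon)
    return "".join(optimized)
```

  • Because every swap stays within one amino acid's codon set, the encoded polypeptide is unchanged; only the codon usage shifts toward the assumed host preference.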
  • Host cells for expressing nucleic acids encoding polypeptides having a glycosyltransferase or deoxysugar pathway activity include bacteria, yeast, fungi, plant cells, insect cells and mammalian cells.
  • the invention provides methods for optimizing codon usage in all of these cells, codon-altered nucleic acids and polypeptides made by the codon-altered nucleic acids.
  • Exemplary host cells include gram negative bacteria, such as Escherichia coli; gram positive bacteria, such as a Streptomyces, e.g, Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavenulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces long
  • Exemplary host cells also include eukaryotic organisms, e.g., various yeast, such as Saccharomyces sp., including Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, and Kluyveromyces lactis, Hansenula polymorpha, Aspergillus niger, and mammalian cells and cell lines and insect cells and cell lines.
  • the invention also includes nucleic acids and polypeptides optimized for expression in these organisms and species.
  • the codons of a nucleic acid encoding a glycosyltransferase or deoxysugar pathway enzyme isolated from a bacterial cell are modified such that the nucleic acid is optimally expressed in a bacterial cell different from the bacteria from which the glycosyltransferase or deoxysugar pathway enzyme was derived, a yeast, a fungi, a plant cell, an insect cell or a mammalian cell.
  • Methods for optimizing codons are well known in the art, see, e.g, U.S. Patent No. 5,795,737; Baca (2000) Int. J. Parasitol.
  • the invention provides compositions having anti-microbial (e.g, as a disinfectant) and/or insecticidal activity and methods for using the compositions of the invention. Any or all of the steps of the methods of the invention can be carried out in vitro, in vivo in a whole cell process or in a transgenic plant or transformed plant cell.
  • compositions of the invention can be detected and quantified by any of a number of means well known to those of skill in the art, including, e.g., analytic methods such as spectrophotometry (e.g., mass spectrometry), radiography, electrophoresis, capillary electrophoresis, high performance liquid chromatography (HPLC), thin layer chromatography (TLC), hyperdiffusion chromatography, and the like, and various immunological methods such as fluid or gel precipitin reactions, immunodiffusion (single or double), immunoelectrophoresis, radioimmunoassays (RIAs), enzyme-linked immunosorbent assays (ELISAs), immunofluorescent assays, ion-specific electrodes, see, e.g., Fritsche (1991) Analytica Chimica Acta 244:179-182; West (1992) Analytical Chemistry 64:533-540; by gas chromatography, mass spectrometry or by
  • compositions of the invention (e.g., glycosylated natural products of the invention) in a composition (e.g., a food or feed, such as a grain) can be measured by fluorescence polarization. For example, a feed, e.g., a grain extract, is prepared by shaking a crushed sample with a solvent.
  • a mixture is prepared by combining the extract with a tracer and with monoclonal antibodies specific to a natural product. The tracer is able to bind to the monoclonal antibodies to produce a detectable change in fluorescence polarization.
  • the tracer is prepared by conjugating a glycosylated natural product to a suitable fluorophore. The fluorescence polarization of the mixture is measured.
  • the glycosylated natural product concentration of the mixture may be calculated using a standard curve obtained by measuring the fluorescence polarization of a series of solutions of known concentration.
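  • The standard-curve calculation described above can be sketched as follows. This is a simplified illustration, assuming the curve is monotonic over the measured range and that straight-line interpolation between neighboring standards is adequate; real immunoassay curves are often fitted with a sigmoidal model instead. The function name, the units (mP for polarization, arbitrary concentration units) and the example values are assumptions.

```python
def concentration_from_polarization(measured_mp, standards):
    """Interpolate a concentration from a fluorescence polarization reading.

    `standards` is a list of (concentration, polarization) pairs measured
    on solutions of known concentration. Readings outside the range of
    the standards are rejected rather than extrapolated.
    """
    points = sorted(standards, key=lambda pair: pair[1])  # order by polarization
    lowest, highest = points[0][1], points[-1][1]
    if not lowest <= measured_mp <= highest:
        raise ValueError("reading lies outside the standard curve")
    for (c0, p0), (c1, p1) in zip(points, points[1:]):
        if p0 <= measured_mp <= p1:
            if p1 == p0:
                return c0  # flat segment: no interpolation possible
            fraction = (measured_mp - p0) / (p1 - p0)
            return c0 + fraction * (c1 - c0)
    return points[0][0]  # reached only when a single standard was supplied
```

  • With invented standards `[(0.0, 250.0), (1.0, 200.0), (2.0, 150.0), (4.0, 100.0)]`, in which polarization falls as the unlabeled product displaces the tracer, a reading of 175 mP lies midway between the 1.0 and 2.0 standards and interpolates to 1.5.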
  • the methods of the invention comprise application of a glycosylated natural product of the invention directly to a plant or plant part, including processed plant parts, such as animal feeds, foods, and the like.
  • the polypeptide can be applied to a crop area or a plant to be treated, simultaneously or in succession, with other compounds, such as fertilizers, nutrients or other preparations that influence plant growth, herbicides, insecticides, fungicides, bactericides, nematicides, molluscicides, or mixtures of these preparations.
  • the application of a glycosylated natural product of the invention can be with an agriculturally acceptable carrier, a surfactant, and/or an adjuvant or formulation.
  • the glycosylated natural products of the invention can be formulated as solids or liquids. They can be applied with natural or regenerated mineral substances, solvents, dispersants, wetting agents, tackifiers, binders, or fertilizers.
  • a glycosylated natural product of the invention can be applied to the plant, plant part or any surface using any technique, for example, as a wash or spray, or in dried, lyophilized or powdered form.
  • the glycosylated natural product of the invention is in a milled formulation.
  • the glycosylated natural product of the invention is applied to foods and feeds, e.g, processed grains or silage to be used for animal feed.
  • the glycosylated natural product of the invention can be applied in the form of an inoculant or probiotic additive.
  • the glycosylated natural product of the invention can be useful during processing and/or in animal feed prior to its use.
  • Biological activity of the novel 17-PSA analogues and spinosyn analogues was determined against beet armyworm larvae. Whereas none of the 17-PSA variants showed any activity, a subset of the spinosyn A analogues displayed good insecticidal activity, albeit less than spinosad.
  • the invention provides transgenic non-human animals comprising a nucleic acid encoding a glycosyltransferase or a deoxysugar pathway enzyme.
  • the invention also provides transgenic non-human animals comprising a nucleic acid or a polypeptide generated by a method of the invention.
  • the invention provides transgenic non-human animals comprising an expression cassette or vector or a transfected or transformed cell comprising a nucleic acid encoding a glycosyltransferase or a deoxysugar pathway enzyme.
  • the invention also provides methods of making and using these transgenic non-human animals.
  • the transgenic non-human animals can be used as in vivo screening models for identifying glycosyltransferase or a deoxysugar pathway enzymes, e.g, enzymes modified by the methods of the invention.
  • the transgenic non-human animals can be, e.g, goats, rabbits, sheep, pigs, cows, rats and mice, comprising the nucleic acids encoding glycosyltransferase or a deoxysugar pathway enzyme. These animals can be used, e.g, as in vivo models to screen for or to study glycosyltransferase or a deoxysugar pathway enzymes.
  • the coding sequences for the polypeptides to be expressed in the transgenic non-human animals can be designed to be constitutive, or under the control of tissue-specific, developmental-specific or inducible transcriptional regulatory factors.
  • Transgenic non-human animals can be designed and generated using any method known in the art; see, e.g, U.S. Patent Nos.
  • U.S. Patent No. 6,211,428 describes making and using transgenic non-human mammals which express in their brains a nucleic acid construct comprising a DNA sequence.
  • U.S. Patent No. 5,387,742 describes injecting cloned recombinant or synthetic DNA sequences into fertilized mouse eggs, implanting the injected eggs in pseudo-pregnant females, and growing to term transgenic mice whose cells express proteins related to the pathology of Alzheimer's disease.
  • U.S. Patent No. 6,187,992 describes making and using a transgenic mouse whose genome comprises a disruption of the gene encoding amyloid precursor protein (APP).
  • "Knockout animals" can also be used to practice the methods of the invention or to screen for a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention.
  • the transgenic or modified animals comprise a "knockout animal," e.g., a "knockout mouse," engineered not to express an endogenous gene, which is replaced with a gene expressing a heterologous glycosyltransferase and/or a heterologous deoxysugar pathway.
  • Transgenic Plants and Plant Parts
  • The invention can be practiced using transgenic plants and plant parts (e.g., individual cells, seeds, fruits, flowers, leaves, roots, tubers, etc.) comprising a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway.
  • the methods comprise providing a transgenic plant capable of constitutively or inducibly expressing a glycosyltransferase and/or a deoxysugar pathway.
  • a plant or plant cell is used to generate a glycosylated natural product of the invention, which is then applied to a plant, plant part, or any surface.
  • the invention also provides transgenic plants and plant parts comprising a nucleic acid, a polypeptide, an expression cassette or vector or a transfected or transformed cell comprising a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway.
  • The term "transgenic plant" includes plants that comprise within their genome a heterologous polynucleotide, e.g., a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway.
  • the heterologous polynucleotide is stably integrated within the genome such that the polynucleotide is passed on to successive generations.
  • the heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant expression cassette.
  • Transgenic is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of heterologous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic.
  • the term “transgenic” as used herein does not encompass the alteration of the genome (chromosomal or extra-chromosomal) by conventional plant breeding methods or by naturally occurring events such as random cross-fertilization, nonrecombinant viral infection, nonrecombinant bacterial transformation, nonrecombinant transposition, or spontaneous mutation.
  • the transgenic plant can be dicotyledonous (a dicot) or monocotyledonous (a monocot).
  • the invention also provides methods of making and using transgenic plants and plant parts.
  • the transgenic plants and plant parts can be constructed in accordance with any method known in the art. See, for example, U.S. Patent No. 6,309,872.
  • a plant can be transformed with a nucleotide sequence (e.g, a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway) to, e.g, generate a composition of the invention, in the transformed plant or plant products.
  • Transgenic plants and plants transformed with a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway used to practice the invention include, for example, species from the genera Cucurbita, Rosa, Vitis, Juglans, Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Heterocallis, Nemesis, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Brow
  • Nucleic acids and expression constructs can be introduced into a plant cell by any means.
  • nucleic acids or expression constructs can be introduced into the genome of a desired plant host, or, the nucleic acids or expression constructs can be episomes.
  • Introduction into the genome of a desired plant can be such that the host's glycosyltransferase and/or a deoxysugar production is regulated by endogenous transcriptional or translational control elements.
  • the invention also provides "knockout plants" where insertion of gene sequence by, e.g, homologous recombination, has disrupted the expression of the endogenous gene.
  • Transformation protocols as well as protocols for introducing nucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Suitable methods of introducing nucleotide sequences into plant cells and subsequent insertion into the plant genome include microinjection (Crossway et al. (1986) Biotechniques 4:320-334), electroporation (Riggs et al. (1986) Proc.
  • the first step in production of a transgenic plant involves making an expression construct for expression in a plant cell.
  • These techniques are well known in the art. They can include selecting and cloning a promoter, a coding sequence for facilitating efficient binding of ribosomes to mRNA and selecting the appropriate gene terminator sequences.
  • a constitutive promoter is CaMV35S, from the cauliflower mosaic virus, which generally results in a high degree of expression in plants.
  • Other promoters are more specific and respond to cues in the plant's internal or external environment.
  • An exemplary light-inducible promoter is the promoter from the cab gene, encoding the major chlorophyll a/b binding protein.
  • the nucleic acid is modified to achieve greater expression in a plant cell.
  • a sequence encoding a polypeptide of a glycosyltransferase or a deoxysugar pathway may have a higher percentage of A-T nucleotide pairs compared to that seen in a plant, some of which prefer G-C nucleotide pairs. Therefore, A-T nucleotides in the coding sequence can be substituted with G-C nucleotides without significantly changing the amino acid sequence to enhance production of the gene product in plant cells.
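  • One way to picture an A-T to G-C substitution that leaves the amino acid sequence intact is to change only the third base of codons in four-fold degenerate families, where any third base encodes the same amino acid under the standard genetic code. The sketch below assumes the standard code and an in-frame coding sequence; the function names and the choice of "C" as the replacement base are illustrative, not taken from the disclosure.

```python
# First two bases of the eight four-fold degenerate codon families in the
# standard genetic code (Ala, Gly, Pro, Thr, Val, Leu CTN, Arg CGN, Ser TCN).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}


def gc_fraction(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def raise_gc_synonymously(cds, third_base="C"):
    """Replace an A or T in the third position of four-fold degenerate codons.

    Only codons whose amino acid is fixed by the first two bases are
    touched, so the encoded polypeptide does not change.
    """
    if third_base not in ("G", "C"):
        raise ValueError("third_base must be G or C")
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "AT":
            codon = codon[:2] + third_base
        codons.append(codon)
    return "".join(codons)
```

  • Applied to `ATGGCTGTAACT` (Met-Ala-Val-Thr), the sketch yields `ATGGCCGTCACC`, raising the G-C fraction from 5/12 to 8/12 while still encoding Met-Ala-Val-Thr.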
  • Selectable marker genes can be added to the gene construct in order to identify plant cells or tissues that have successfully integrated the transgene. This may be necessary because achieving incorporation and expression of genes in plant cells is a rare event, occurring in just a few percent of the targeted tissues or cells.
  • Selectable marker genes encode proteins that provide resistance to agents that are normally toxic to plants, such as antibiotics or herbicides. Only plant cells that have integrated the selectable marker gene will survive when grown on a medium containing the appropriate antibiotic or herbicide. As for other inserted genes, marker genes also require promoter and termination sequences for proper function.
  • making transgenic plants or plant parts comprises incorporating sequences (e.g, a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway) and, optionally, marker genes into a target expression construct (e.g, a plasmid), along with positioning of the promoter and the terminator sequences.
  • This can involve transferring the modified gene into the plant through a suitable method.
  • a construct may be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the constructs can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. For example, see, e.g, Christou (1997) Plant Mol. Biol.
  • Plant signal sequences including, but not limited to, signal-peptide encoding DNA/RNA sequences which target proteins to the extracellular matrix of the plant cell (Dratewka-Kos et al, (1989) J. Biol. Chem. 264:4896-4900), the Nicotiana plumbaginifolia extension gene (DeLoose, et al. (1991) Gene 99:95-100), signal peptides which target proteins to the vacuole like the sweet potato sporamin gene (Matsuka et al.
  • protoplasts can be immobilized and injected with a nucleic acid, e.g, an expression construct.
  • Organized tissues can be transformed with naked DNA using gene gun technique, where DNA is coated on tungsten microprojectiles, shot 1/100th the size of cells, which carry the DNA deep into cells and organelles.
  • Transformed tissue is then induced to regenerate, usually by somatic embryogenesis.
  • This technique has been successful in several cereal species including maize and rice.
  • Nucleic acids, e.g., expression constructs, can also be introduced into plant cells using recombinant viruses.
  • Plant cells can be transformed using viral vectors, such as, e.g, tobacco mosaic virus derived vectors (Rouwendal (1997) Plant Mol. Biol. 33:989-999), see Porta (1996) "Use of viral replicons for the expression of genes in plants," Mol. Biotechnol. 5:209-221.
  • nucleic acids, e.g., an expression construct, can be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector.
  • the virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria.
  • Agrobacterium tumefaciens-mediated transformation techniques including disarming and use of binary vectors, are well described in the scientific literature. See, e.g, Horsch (1984) Science 233:496-498; Fraley (1983) Proc. Natl. Acad. Sci.
  • the DNA in an A. tumefaciens cell is contained in the bacterial chromosome as well as in another structure known as a Ti (tumor-inducing) plasmid.
  • the Ti plasmid contains a stretch of DNA termed T-DNA (~20 kb long) that is transferred to the plant cell in the infection process and a series of vir
  • A. tumefaciens can only infect a plant through wounds: when a plant root or stem is wounded it gives off certain chemical signals, in response to which, the vir genes of A. tumefaciens become activated and direct a series of events necessary for the transfer of the T-DNA from the Ti plasmid to the plant's chromosome. The T-DNA then enters the plant cell through the wound.
  • One speculation is that the T-DNA waits until the plant DNA is being replicated or transcribed, then inserts itself into the exposed plant DNA. In order to use A. tumefaciens as a transgene vector, the tumor-inducing section of T-DNA has to be removed, while retaining the T-DNA border regions and the vir genes.
  • the transgene is then inserted between the T-DNA border regions, where it is transferred to the plant cell and becomes integrated into the plant's chromosomes.
  • monocotyledonous plants can be transformed using a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway.
  • Monocotyledonous plants used to practice the invention include all cereals, see Hiei (1997) Plant Mol. Biol. 35:205-218.
  • the third step can involve selection and regeneration of whole plants capable of transmitting the incorporated target gene (e.g, heterologous glycosyltransferase or a heterologous deoxysugar pathway) to the next generation.
  • Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker that has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al. Protoplasts Isolation and Culture,
  • Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee (1987) Ann. Rev. of Plant Phys. 38:467-486.
  • This is a process known as tissue culture. Once whole plants are generated and produce seed, evaluation of the progeny begins. After the expression cassette is stably incorporated in transgenic plants, it can be introduced into other plants by sexual crossing.
  • any of a number of standard breeding techniques can be used, depending upon the species to be crossed. Since transgenic expression of nucleic acids encoding polypeptides having a glycosyltransferase or a deoxysugar pathway can lead to phenotypic changes, plants comprising the recombinant nucleic acids can be sexually crossed with a second plant to obtain a final product.
  • a seed comprising a nucleic acid encoding a heterologous glycosyltransferase and a heterologous deoxysugar pathway is derived from a cross between two transgenic plants, or a cross between a plant comprising a nucleic acid encoding heterologous glycosyltransferase and a heterologous deoxysugar pathway and another plant.
  • the desired effects e.g, expression of a polypeptide having a heterologous glycosyltransferase and a heterologous deoxysugar pathway, can be enhanced when both parental plants express these polypeptides.
  • the desired effects can be passed to future plant generations by standard propagation means.
  • nucleic acids encoding a heterologous glycosyltransferase and a heterologous deoxysugar pathway can be expressed in or inserted in any plant or plant part, e.g, a seed, fruit, flower, a root, a tuber and the like.
  • Transgenic plants can be dicotyledonous or monocotyledonous. Examples of monocot transgenic plants of the invention and used to practice the invention include grasses, such as meadow grass (blue grass, Poa), forage grass such as Festuca, Lolium, temperate grass, such as Agrostis, and cereals, e.g., wheat, oats, rye, barley, rice, sorghum, and maize (corn).
  • Examples of dicot transgenic plants of the invention include tobacco, legumes, such as lupins, potato, sugar beet, pea, bean and soybean, and cruciferous plants (family Brassicaceae), such as cauliflower, rape seed, and the closely related model organism Arabidopsis thaliana.
  • transgenic plants and plant parts used to practice the invention include a broad range of plants, including, but not limited to, species from the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum, Pennisetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus,
  • the nucleic acids encoding heterologous glycosyltransferases and/or heterologous deoxysugar pathways are expressed in plants which contain fiber cells, including, e.g., cotton, silk cotton tree (Kapok, Ceiba pentandra), desert willow, creosote bush, winterfat, balsa, ramie, kenaf, hemp, roselle, jute, sisal abaca and flax.
  • transgenic plants used to practice the invention are members of the genus Gossypium, including members of any Gossypium species, such as G. arboreum, G. herbaceum, G. barbadense, and G. hirsutum.
  • the invention also provides for transgenic plants for producing large amounts of heterologous glycosyltransferases or heterologous deoxysugar pathway enzymes, resulting in generation of compositions of the invention.
  • See, e.g., Palmgren (1997) Trends Genet. 13:348; Chong (1997) Transgenic Res. 6:289-296 (producing human milk protein beta-casein in transgenic potato plants using an auxin-inducible, bi-directional mannopine synthase (mas1',2') promoter with Agrobacterium tumefaciens-mediated leaf disc transformation methods).
  • the modified plant may be grown into plants in accordance with conventional ways. See, for example, McCormick et al. (1986) Plant Cell Reports 5:81-84.
  • heterologous glycosyltransferase and/or heterologous deoxysugar pathway enzymes can be fermented in a bacterial host and the resulting bacteria processed and used as a microbial spray. Any suitable microorganism can be used for this purpose. See, for example, Gaertner et al. (1993) in Advanced Engineered Pesticides, Kim (Ed.).
  • the nucleic acids encoding a heterologous glycosyltransferase and/or heterologous deoxysugar pathway can be introduced into microorganisms that multiply on plants (epiphytes) to deliver the glycosylated natural products of the invention to potential target crops.
  • Epiphytes can be gram-positive or gram-negative bacteria.
  • the microorganisms that have been genetically altered to contain at least one nucleic acid encoding a heterologous glycosyltransferase and/or heterologous deoxysugar pathway are used for protecting agricultural crops and products.
  • whole, i.e., unlysed, cells of transformed organisms of the invention are applied to the environment of a target plant.
  • a nucleic acid encoding a heterologous glycosyltransferase and/or heterologous deoxysugar pathway can be introduced via a suitable vector into a microbial host, and said transformed host applied to the environment or plants or animals.
  • the microorganism hosts will then produce a composition of the invention.
  • microorganism hosts that are known to occupy the "phytosphere" (phylloplane, phyllosphere, rhizosphere, and/or rhizoplane) of one or more crops of interest are selected for transformation.
  • microorganisms are selected so as to be capable of successfully competing in the particular environment with the wild-type microorganisms, to provide for stable maintenance and expression of the gene expressing the pesticide/insecticide of the invention.
  • Exemplary microorganism hosts of the invention for producing a glycosylated natural product include bacteria, algae, and fungi, including bacteria such as Erwinia, Serratia, Klebsiella, Xanthomonas, Streptomyces, Rhizobium, Methylius, Agrobacterium, Acetobacter, Lactobacillus, Arthrobacter, Azotobacter, Leuconostoc, and Alcaligenes; fungi, particularly yeast, e.g, Saccharomyces, Pichia, Cryptococcus, Kluyveromyces, Sporobolomyces, Rhodotorula, and Aureobasidium.
  • Streptomyces includes Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavendulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygroscopicus, Streptomyces lasaliensis, Streptomyces
  • Agrobacteria, Xanthomonas campestris, Rhizobium meliloti, Alcaligenes eutrophus,
  • phytosphere yeast species such as Rhodotorula rubra, R. glutinis, R. marina, R. aurantiaca, Cryptococcus albidus, C. diffluens, C. laurentii, Saccharomyces rosei, S. pretoriensis, S. cerevisiae, Sporobolomyces roseus, S. odorus, Kluyveromyces veronae, and Aureobasidium pullulans are used.
  • Exemplary prokaryote hosts of the invention for producing a glycosylated natural product, both Gram-negative and Gram-positive, include Enterobacteriaceae, such as Escherichia, Erwinia, Shigella, Salmonella, and Proteus; Bacillaceae; Rhizobiaceae, such as Rhizobium; Spirillaceae, such as Photobacterium, Zymomonas, Serratia, Aeromonas, Vibrio, Desulfovibrio, and Spirillum; Lactobacillaceae; Acetobacter; Azotobacteraceae; and Nitrobacteraceae.
  • Exemplary eukaryote hosts of the invention for producing a glycosylated natural product are fungi, such as Phycomycetes and Ascomycetes, which includes yeast, such as Saccharomyces and Schizosaccharomyces; and Basidiomycetes yeast, such as Rhodotorula, Aureobasidium, Sporobolomyces, and the like.
  • characteristics including ease of introducing the nucleic acid into a host, availability of expression systems, efficiency of expression, stability of the protein in the host, and the presence of auxiliary genetic capabilities are considered. Other considerations include ease of formulation and handling, economics, storage stability, and the like.
  • expression cassettes can be constructed that include the DNA constructs operably linked with the transcriptional and translational regulatory signals for expression of the DNA constructs, and a DNA sequence homologous with a sequence in the host organism, whereby integration will occur, and/or a replication system that is functional in the host, whereby integration or stable maintenance will occur.
  • formulations comprising a glycosylated natural product of the invention.
  • the pharmaceutical compositions are formulations that comprise a pharmacologically effective amount of a glycosylated natural product of the invention.
  • the pharmaceutical compositions of the invention comprise antibiotics of the invention (e.g, erythromycin, tetracycline, rifampicin and the like glycosylated by an in vivo glycosylation system of the invention); anti-tumor drugs of the invention (daunorubicin, mithramycin glycosylated by an in vivo glycosylation system of the invention); immunosuppressants of the invention (rapamycin, FK520, FK506 glycosylated by an in vivo glycosylation system of the invention); anti-fungals of the invention (amphotericin glycosylated by an in vivo glycosylation system of the invention); antibacterials of the invention (tylosin glycosylated by an in vivo glycosylation system of the invention); antiparasitics of the invention (avermectin glycosylated by an in vivo glycosylation system of the invention); and, anti-tumor drugs of the invention (
  • compositions of the invention can include pharmaceutically acceptable carriers that can contain a physiologically acceptable compound that acts, e.g, to stabilize the composition or to increase or decrease the absorption of the pharmaceutical composition.
  • Physiologically acceptable compounds can include, for example, carbohydrates, such as glucose, sucrose, or dextrans; antioxidants, such as ascorbic acid or glutathione; chelating agents; low molecular weight proteins; compositions that reduce the clearance or hydrolysis of any co-administered agents; or excipients or other stabilizers and/or buffers. Detergents can also be used to stabilize the composition or to increase or decrease the absorption of the pharmaceutical composition.
  • Other physiologically acceptable compounds include wetting agents, emulsifying agents, dispersing agents or preservatives that are particularly useful for preventing the growth or action of microorganisms. Various preservatives are well known, e.g, ascorbic acid.
  • the choice of a pharmaceutically acceptable carrier, including a physiologically acceptable compound, depends, e.g., on the route of administration and on the particular physico-chemical characteristics of any co-administered agent.
  • the composition for administration comprises a pharmaceutically acceptable carrier, e.g, an aqueous carrier.
  • carriers can be used, e.g, buffered saline and the like. These solutions are sterile and generally free of undesirable matter.
  • These compositions may be sterilized by conventional, well-known sterilization techniques.
  • compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions such as pH adjusting and buffering agents, toxicity adjusting agents and the like, for example, sodium acetate, sodium chloride, potassium chloride, calcium chloride, sodium lactate and the like.
  • concentration of active agent in these formulations can vary widely, and will be selected primarily based on fluid volumes, viscosities, body weight and the like in accordance with the particular mode of administration and imaging modality selected.
  • the pharmaceutical formulations of the invention can be administered in a variety of unit dosage forms, depending on the general medical condition of each patient, the method of administration, and the like. Details on dosages are well described in the scientific and patent literature, see, e.g., the latest edition of Remington's Pharmaceutical Sciences.
  • the exact amount and concentration of pharmaceutical of the invention and the amount of formulation in a given dose, or the "effective dose," can be routinely determined by, e.g., the clinician (see above discussion of "a pharmacologically effective amount of a glycosylated natural product").
  • the "dosing regimen" will depend upon a variety of factors, e.g., the general state of the patient's health, age and the like. Using guidelines describing alternative dosing regimens, e.g., from the use of other imaging contrast agents, the skilled artisan can determine by routine trials optimal effective concentrations of pharmaceutical compositions of the invention.
  • the invention is not limited by any particular dosage range.
  • compositions of the invention can be delivered by any means known in the art systemically (e.g., intravenously), regionally, or locally (e.g., intra- or peri-tumoral or intracystic injection) by, e.g., intraarterial, intratumoral, intravenous (IV), parenteral, intra-pleural cavity, topical, oral, or local administration, as subcutaneous, intra-tracheal (e.g., by aerosol) or transmucosal (e.g., buccal, bladder, vaginal, uterine, rectal, nasal mucosa), intra-tumoral (e.g., transdermal application or local injection).
  • intra-arterial injections can be used to have a "regional effect," e.g, to focus on a specific organ (e.g, brain, liver, spleen, lungs).
  • Formulations suitable for oral administration can comprise liquid solutions, such as an effective amount of the compound dissolved in diluents, such as water, saline, or fruit juice; capsules, sachets or tablets, each containing a predetermined amount of the active ingredient, as solid, granules or freeze-dried cells; solutions or suspensions in an aqueous liquid; and oil-in-water emulsions or water-in-oil emulsions.
  • Tablet forms can include one or more of lactose, mannitol, corn starch, potato starch, microcrystalline cellulose, acacia, gelatin, colloidal silicon dioxide, croscarmellose sodium, talc, magnesium stearate, stearic acid, and other excipients, colorants, diluents, buffering agents, moistening agents, preservatives, flavoring agents, and pharmacologically compatible carriers.
  • Suitable formulations for oral delivery can also be incorporated into synthetic and natural polymeric microspheres, or other means to protect the agents of the present invention from degradation within the gastrointestinal tract. See, for example, Wallace (1993) Science 260:912-915.
  • the glycosylated natural products or conjugates thereof, alone or in combination with other similar acting compounds, can be made into aerosol formulations to be administered via inhalation. These aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen and the like.
  • the glycosylated natural products or conjugates thereof, alone or in combination with other similar acting compounds or absorption modulators, can be made into suitable formulations for transdermal application and absorption.
  • Transdermal electroporation or iontophoresis also can be used to promote and/or control the systemic delivery of a glycosylated natural product of the invention through the skin, e.g, see Theiss et al, Meth. Find. Exp. Clin.
  • Formulations suitable for topical administration of a glycosylated natural product of the invention can include lozenges comprising the active ingredient in a flavor, usually sucrose and acacia or tragacanth; pastilles comprising the active ingredient in an inert base, such as gelatin and glycerin, or sucrose and acacia; and mouthwashes comprising a natural product of the invention in a suitable liquid carrier; as well as creams, emulsions, gels and the like.
  • Formulations for rectal administration can be presented as a suppository with a suitable base comprising, for example, cocoa butter or a salicylate.
  • Formulations suitable for vaginal administration can be presented as pessaries, tampons, creams, gels, pastes, foams, or spray formulas containing, in addition to the active ingredient, such as, for example, freeze-dried bacteria genetically engineered to directly produce a glycosylated natural product of the invention, such carriers as are known in the art to be appropriate.
  • a natural product of the invention can be combined with a lubricant as a coating on a condom.
  • a natural product of the invention can be applied to any contraceptive device, e.g, a condom, a diaphragm, a cervical cap, a vaginal ring or a sponge.
  • Formulations suitable for parenteral administration can include aqueous and non-aqueous, isotonic sterile injection solutions, which can contain anti-oxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives.
  • the pharmaceutical formulations of the invention can be presented in unit-dose or multi-dose sealed containers, such as ampoules and vials, and can be stored in a freeze-dried (lyophilized) condition requiring only the addition of the sterile liquid excipient, for example, water for injections, immediately prior to use.
  • Extemporaneous injection solutions and suspensions can be prepared from sterile powders, granules, and tablets.
  • Therapeutic compositions can also be administered in a lipid formulation, e.g, complexed with liposomes or in lipid/nucleic acid complexes or encapsulated in liposomes, as in immuno-liposomes directed to specific cells.
  • lipid formulations can be administered topically, systemically, or delivered via aerosol. See, e.g, U.S. Patent Nos. 6,149,937; 6,146,659; 6,143,716; 6,133,243; 6,110,490; 6,083,530; 6,063,400; 6,013,278; 5,958,378; 5,552,157.
  • the pharmaceutical composition of the invention comprises an altered glycosylated natural product of the invention, e.g, a glycosylated natural product of the invention with a modified molecular structure to, e.g, provide enhanced stability to the glycosylated natural product, as discussed above.
  • the invention provides various delivery strategy systems and routes of administration for glycosylated natural products of the invention, all of which are well known in the art, see e.g., Epstein, CRC Crit. Rev. Therapeutic Drug Carrier Systems 5, 99-139, 1988; Siddiqui et al, CRC Crit. Rev. Therapeutic Drug Carrier Systems 3, 195-208, 1987; Banga et al. Int. J. Pharmaceutics 48, 15-50, 1988; Sanders, Eur.
  • a glycosylated natural product of the invention is used with an absorption-enhancing agent.
  • Any absorption-enhancing agent can be used, e.g, those applied in combination with protein and peptide drugs for oral delivery and for delivery by other routes, see, e.g, van Hoogdalem, Pharmac. Ther. 44, 407-443, 1989; Davis, J. Pharm. Pharmacol. 44(Suppl.
  • Enhancers used in the compositions and methods of the invention include, e.g., (a) chelators, such as EDTA, salicylates, and N-acyl derivatives of collagen, (b) surfactants, such as lauryl sulfate and polyoxyethylene-9-lauryl ether, (c) bile salts, such as glycocholate and taurocholate, and derivatives, such as taurodihydrofusidate, (d) fatty acids, such as oleic acid and capric acid, and their derivatives, such as acylcarnitines, monoglycerides and diglycerides, (e) non-surfactants, such as unsaturated cyclic ureas, (f) saponins, (g) cyclodextrins, and (h) phospholipids.
  • glycosylated natural product can be administered in combination with other drugs or substances.
  • Another alternative approach to prevent or delay gastrointestinal absorption of glycosylated natural product is to incorporate it into a delivery system that is designed to protect the natural product from contact with the proteolytic enzymes in the intestinal lumen and to release the natural product only upon reaching an area favorable for its absorption.
  • biodegradable microcapsules or microspheres are used with a glycosylated natural product of the invention, both to protect it from degradation and to effect a prolonged release of active drug, see, e.g., Deasy, in Microencapsulation and Related Processes, Swarbrick, ed., Marcel Dekker, Inc.: New York, 1984, pp. 1-60, 88-89, 208-211.
  • Microcapsules also can provide a useful way to effect a prolonged delivery of a natural product drug after injection, see, e.g., Maulding, J. Controlled Release 6, 167-176, 1987.
  • the invention also provides a pharmaceutical composition comprising an isolated or purified glycosylated natural product or glycosylated natural product conjugate, a matrix-anchored glycosylated natural product or a matrix-anchored glycosylated natural product conjugate.
  • the composition of the invention can further comprise a carrier, such as a pharmaceutically acceptable carrier.
  • the composition of the invention can further comprise an antiviral compound, e.g., AZT, ddI, ddC, gancyclovir, fluorinated dideoxynucleosides, nevirapine, R82913, Ro 31-8959, BI-RJ-70, acyclovir, alpha-interferon, recombinant sCD4, michellamines, calanolides, nonoxynol-9, gossypol and derivatives thereof, and gramicidin.
  • a glycosylated natural product conjugate of the invention can be genetically engineered or chemically coupled.
  • Formulations of the invention comprising a glycosylated natural product or glycosylated natural product conjugate of the invention can be used for sterilization of inanimate objects, such as medical supplies or equipment, laboratory equipment and supplies, instruments, devices, and the like.
  • formulations of the invention are used for ex vivo sterilization of a sample, such as blood, blood products, sperm, or other bodily products, such as a fluid, cells, a tissue or an organ, or any other solution, suspension, emulsion, vaccine formulation, or any other material which can be administered to a patient in a medical procedure; suitable formulations can be selected or adapted as appropriate by one skilled in the art from any of the aforementioned compositions or formulations.
  • a glycosylated natural product of the invention is attached to a solid support matrix as an antiseptic or anti-microbial in a sample, e.g, a bodily product such as a fluid, cells, a tissue or an organ from an organism, in particular a mammal, such as a human, including, for example, blood, a component of blood, or sperm.
  • a glycosylated natural product of the invention comprises a condom, a diaphragm, a cervical cap, a vaginal ring or a sponge.
  • the compositions of the invention can be used to treat objects or materials, such as medical equipment, supplies, or fluids, including biological fluids, such as blood, blood products and vaccine formulations, cells, tissues and organs.
  • Example 1 Cloning of a spinosyn biosynthetic gene cluster
  • a fosmid library of the spinosyn producer Sac. spinosa was generated.
  • a set of four probes was designed and used in a colony hybridization experiment to identify clones containing the sugar biosynthetic genes. Forty hybridizing clones were obtained and analyzed by PCR and restriction analysis.
  • Example 2 Cloning of individual genes for expression in a Streptomyces
  • All GT, methyltransferase and deoxysugar biosynthetic genes were cloned by PCR using either chromosomal DNA or appropriate fosmid clones as templates. A total of 14 genes were cloned. Restriction sites to facilitate subsequent cloning steps were included in the PCR amplification primers. All PCR fragments were initially cloned in TOPO-vectors (Invitrogen, San Diego, CA) to verify their sequence. Clones having the predicted sequence were then used for further work. A list of genes cloned is shown in Table 1:
  • Example 3 Tools for the expression of deoxysugar pathways in Streptomyces
  • Host strains: The cloned and verified GT genes, spnG and spnP, were cloned into the expression vector pUWL201. These constructs, as well as a vector-only control, were then transferred to the Streptomyces by electroporation. After verifying the presence of the plasmids, spores were prepared that were used subsequently as recipients of the cloned deoxysugar pathways described below. Thus, three different strains were generated for the analysis of deoxysugar pathway expression.
  • the chromosomal integration vector pAT6 was constructed by incorporation of the ermEp* promoter (49) adjacent to the cloning site of pSET152 (50).
  • ermEp* is a strong constitutive promoter from Saccharopolyspora erythraea ((9) Bibb, M. J.; Janssen, G. R.; Ward, J. M. Gene 1985, 38 (1-3), 215-226).
  • restriction sites suitable for insertion of cloned deoxysugar pathways downstream of the promoter were added.
  • Example 4 Construction of deoxysugar pathways
  • To generate deoxysugar pathways, individual genes were combined in a step-by-step process in the E. coli cloning vector pMUN3. Complete pathways were then transferred to the expression vector pAT6. This process was time consuming, reflecting the large number of sequential steps involved: 4 different 6-deoxysugar pathways
  • deoxysugar pathways were constructed as described herein. All pathways contained the gtt and gdh genes from Sac. spinosa, in addition to the genes listed (1-4: 6-deoxysugar pathways, 9-12: 2,6-di-deoxysugar pathways, 17-20: forosamine analogous pathways; red (or underlined): genes from Sac. spinosa, blue: genes from Streptomyces). These include reconstruction of the L-rhamnose and L-digitoxose pathways from Sac. spinosa and the Streptomyces used in these studies, respectively. Based on the same functionality of the two kre genes, these eight pathways should have resulted in the production of four different deoxysugars.
  • Example 5 Culture conditions
  • Initially, stability of the substrates and their influence on growth of the
  • Example 6 Chemical analysis and purification of spinosyn derivatives
  • Liquid/liquid extraction with ethyl acetate proved to be almost 100% efficient and delivered highly reproducible, clean extracts. This method was therefore chosen. However, it is not easily adaptable to high-throughput formats, such as 96-well plates. Other methods can also be used.
  • HPLC-UV-Vis was chosen for the initial analysis of samples.
  • HPLC-MS analysis was used to determine the molecular weights of novel peaks in selected samples.
  • Example 7 Analysis of deoxysugar pathway expression in Streptomyces
  • All pathways were analyzed in the Streptomyces used in these studies (pUWL201) (vector-only control, i.e. in the absence of a cloned GT gene) and, as expected, no product formation was detected. The influence of expression of the cloned pathways on the Streptomyces itself was investigated by analyzing samples of extracts without addition of any substrate to the cultures.
  • 6-deoxysugar pathways
  • The 6-deoxysugar pathways were analyzed in the Streptomyces (pUWL-SpnG). All pathways resulted in conversion of the aglycone to two different glycosylated products, designated M548 and M548-II. Relative production levels varied with the pathway, as shown in Table 4:
  • Table 4 shows conversion ratios to M548 and M548-II from the aglycone; ratios were calculated as relative peak areas of M548 : M548-II : aglycone.
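  • The ratio calculation described for Table 4 can be sketched as follows. This is a minimal illustration that assumes each compound contributes one integrated HPLC peak; the function name and the peak areas are hypothetical and are not data from Table 4.

```python
def conversion_ratios(peak_areas):
    """Express each integrated peak area as a percentage of the summed area.

    peak_areas: dict mapping compound name -> integrated peak area
    (arbitrary units). Assumes equal detector response for all compounds.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical peak areas for one culture extract (not measured values).
example_areas = {"M548": 300.0, "M548-II": 100.0, "aglycone": 600.0}
ratios = conversion_ratios(example_areas)
print(" : ".join(f"{name} {value:.0f}%" for name, value in ratios.items()))
```

  • Normalizing to the summed area keeps the M548 : M548-II : aglycone ratio comparable between cultures that differ in extraction volume or injection amount.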
  • M548-II and M548 production ratios obtained with different pathways may be explained by differential expression of cloned pathway genes or by contributions from the chromosomal background of the Streptomyces used in these studies.
  • the Streptomyces used in these studies contains several other epi and kre homologs in addition to the L-digitoxose pathway genes used for construction of these pathways.
  • the host strain also contains chromosomal copies of the L-digitoxose genes. Depending on which genes are over-expressed in the artificial pathways, the ratio of activated L-rhamnose and 6-deoxy-D-glucose as substrate for SpnG could change, resulting in different ratios of M548 : M548-II.
  • 2,6-dideoxysugar pathways
  • The 2,6-dideoxysugar pathways were analyzed in the Streptomyces used in these studies (pUWL-SpnG). Strains containing these pathways also produced M548 and M548-II, the compounds that were identified using 6-deoxysugar pathways. In addition, the expression of the reconstructed L-digitoxose pathway (#12) resulted in production of a compound with a molecular weight of 532, as expected for attachment of a 2,6-dideoxyhexose. This compound, designated M532 (fig. lie), was purified and attachment of L-digitoxose confirmed by NMR. This result confirms that M532 is also a novel compound.
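  • The mass reasoning behind the M548 and M532 designations can be checked with nominal-mass arithmetic. The sketch below assumes the generic compositions C6H12O5 for a 6-deoxyhexose and C6H12O4 for a 2,6-dideoxyhexose, and back-calculates the aglycone mass from M548 instead of taking it from the text; all names are illustrative.

```python
# Nominal (integer) atomic masses, adequate for checking low-resolution MS labels.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}

WATER = {"H": 2, "O": 1}
SIX_DEOXYHEXOSE = {"C": 6, "H": 12, "O": 5}        # e.g. L-rhamnose, 6-deoxy-D-glucose
TWO_SIX_DIDEOXYHEXOSE = {"C": 6, "H": 12, "O": 4}  # e.g. L-digitoxose


def nominal_mass(formula):
    """Nominal mass of a composition given as {element: count}."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())


def glycoside_mass(acceptor_mass, sugar):
    """Mass after forming one glycosidic bond (one water is lost)."""
    return acceptor_mass + nominal_mass(sugar) - nominal_mass(WATER)


def acceptor_mass_from_glycoside(glycoside, sugar):
    """Invert glycoside_mass to recover the acceptor (aglycone) mass."""
    return glycoside - nominal_mass(sugar) + nominal_mass(WATER)


# Assumption: M548 is the aglycone carrying a single 6-deoxyhexose.
aglycone_mass = acceptor_mass_from_glycoside(548, SIX_DEOXYHEXOSE)
dideoxy_product_mass = glycoside_mass(aglycone_mass, TWO_SIX_DIDEOXYHEXOSE)
print(aglycone_mass, dideoxy_product_mass)
```

  • Because a 2,6-dideoxyhexose carries one oxygen fewer than a 6-deoxyhexose, the product is 16 mass units lighter, which matches the step from 548 to 532.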
  • D-forosamine and analogous pathways were analyzed both in the Streptomyces (pUWL-SpnP) and the Streptomyces (pUWL-SpnG) for conversion of the 17-PSA or aglycone, respectively.
  • the reconstructed D-forosamine pathway (#17) resulted in conversion of the 17-PSA to spinosyn A.
  • In addition to spinosyn A, other products were also identified. These included spinosyn B, or 4"-demethyl-spinosyn A, as well as two additional, as yet unidentified products. These two products were found after expression of three forosamine analogous pathways.
  • pathway (#18) lacking the N-methyltransferase resulted in production of a compound of molecular weight 703, consistent with the expected attachment of a di-demethyl derivative of D-forosamine resulting in formation of spinosyn C, as illustrated in Figure 10.
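  • The molecular weight of 703 can be related to spinosyn A by counting methyl groups. The sketch below assumes the composition C41H65NO10 for spinosyn A, a value that is not stated in this text, and treats each demethylation as the loss of one CH2 unit (a methyl group replaced by hydrogen); the names are illustrative.

```python
# Nominal (integer) atomic masses.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}

# Assumption: elemental composition of spinosyn A (not given in the text).
SPINOSYN_A = {"C": 41, "H": 65, "N": 1, "O": 10}


def nominal_mass(formula):
    """Nominal mass of a composition given as {element: count}."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())


def demethylate(formula, n_methyl):
    """Replace n_methyl methyl groups by hydrogen: removes n_methyl * CH2."""
    result = dict(formula)
    result["C"] -= n_methyl
    result["H"] -= 2 * n_methyl
    return result


for n_removed in range(3):
    mass = nominal_mass(demethylate(SPINOSYN_A, n_removed))
    print(f"{n_removed} methyl group(s) removed: nominal mass {mass}")
```

  • Removing both N-methyl groups lowers the nominal mass by 28 units, from 731 to 703, in line with the di-demethyl assignment for the product of pathway #18.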
  • Pathways 19 and 20 generated M548, indicating again that gtt, gdh, and kre present in these pathways form a minimal 6-deoxy-D-glucose pathway without any contribution from any of the other genes.
  • Example 8 Production of compounds for biological testing
  • Studies in Sac. spinosa have shown that L-rhamnose is normally attached prior to attachment of the D-forosamine. Despite this, conversion of the 9-PSA by strains containing SpnG and the L-rhamnose, 6-deoxy-D-glucose, or L-digitoxose biosynthetic pathways was attempted. In all cases, transfer of the sugar to the 9-PSA was demonstrated but at slightly lower conversion ratios when compared to the aglycone (see Table 4).
  • spinosyn analogues were designated M689 (6-deoxy-D-glucosyl analogue), M689-II (L-rhamnosyl analogue) and M673 (L-digitoxosyl analogue), as illustrated in Figure 11.
  • Figure 11: M689 (11a), M689-II (11b) and M673 (11c); modified sugars are shown in red.
  • Using spinosyn as a model compound, a system in a Streptomyces was successfully developed that is suitable for glycosylation of natural products. Twelve recombinant deoxysugar pathways were constructed and expressed. A total of at least six novel products were identified.
  • Example 9 Cloning of deoxysugar biosynthetic genes
  • the in vivo glycosylation systems of the invention can use any of the many deoxysugar biosynthetic genes that have been identified in actinomycetes, related species or other organisms, including insects, mammals, etc, including the sequences of genes from the more than 30 deoxysugar biosynthetic pathways available in public databases.
  • Table 5 lists some of the sugar biosynthetic pathways that are used to practice the invention, and their organisms of origin:
  • this cluster contains interesting deoxysugar biosynthetic pathway genes, as well as genes encoding anthracycline GTs, which are cloned de novo.
  • PCR-based methods targeting the type II PKS (53), dNDP-glucose-4,6-dehydratase (54), and dNDP-glucose-2,3-dehydratase (55) genes, all of which are expected to be part of the viriplanin cluster, are used to generate probes for screening a fosmid library of Ampullariella regularis ATCC 31417, the producing organism, by colony hybridization. Positive clones are confirmed and characterized by restriction analysis, PCR and hybridization.
  • One or two fosmid clones are expected to cover the complete pathway; these are sequenced by a shotgun approach. Genes and their functions can be identified using sequence analysis tools. In addition to the genes from the pathways listed in Table 5, single genes with interesting functions from other pathways are cloned, for example, the C5-C-methyltransferase novU from Streptomyces sphaeroides. To identify other genes that can be used in the in vivo glycosylation systems and methods of the invention, environmental sequence archives can be screened for homologues of sugar biosynthetic genes and GTs. For example, using genes from the L-rhamnose pathway from Sac.
  • Example 10 Construction of hybrid deoxysugar pathways
  • An alternative cloning vector can be constructed that can efficiently maintain the sequence integrity of cloned DNA.
  • Low copy number vectors can be used as an alternative to pUC-based high-copy vectors pMUN3 and pAT6 used, as discussed above.
  • a suitable multi cloning site (mcs) for pathway construction can be incorporated in pACYC-derived vectors.
  • pACYC contains the replicon of P15A, which is reported to have a copy number of 15-50 per chromosome in E. coli.
  • an efficient transcriptional terminator, such as the fd-terminator, can be included upstream of the mcs to inhibit any expression from vector-borne promoters.
  • the stability of pathways that proved to be problematic e.g, pathways # 1 and
  • GENEREASSEMBLY™ uses a bead-based ligation procedure. The first fragment is coupled to magnetic beads by a biotin label. Each additional pool of fragments is added and allowed to ligate, and excess fragments are removed. Then the next pool of fragments is added and the cycle repeated, as illustrated in Figure 14. At the end, the product is cleaved from the beads, ligated and introduced into E. coli by transformation. To adapt this technology for the construction of deoxysugar pathways, the L-rhamnose / L-digitoxose analogous pathways (see Table 2, above) generated as described above can be used as model systems. Using GENEREASSEMBLY™, the same subset of 8 pathways will be constructed as a library.
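  • The cycle of pool addition, ligation and washing can be pictured with a small simulation. The sketch assumes that exactly one fragment from each pool ligates per cycle with equal probability; the anchor and pool names are hypothetical placeholders chosen so that three two-member pools give 8 possible pathways, and they do not reproduce the actual constructs of Table 2.

```python
import random
from collections import Counter

# Hypothetical fragment pools, added in this order after the bead-bound anchor.
ANCHOR = "gtt-gdh"
POOLS = [
    ("epi_A", "epi_B"),
    ("kre_A", "kre_B"),
    ("tkr_A", "tkr_B"),
]


def assemble_on_bead(anchor, pools, rng):
    """One construct: the anchor is bead-bound, then one fragment per pool ligates."""
    construct = [anchor]
    for pool in pools:
        # Assumption: one fragment ligates per cycle; excess is washed away.
        construct.append(rng.choice(pool))
    return tuple(construct)  # corresponds to cleavage from the bead


def build_library(anchor, pools, n_clones, seed=0):
    """Simulate n_clones independent assemblies with a reproducible seed."""
    rng = random.Random(seed)
    return [assemble_on_bead(anchor, pools, rng) for _ in range(n_clones)]


library = build_library(ANCHOR, POOLS, n_clones=200)
for pathway, count in sorted(Counter(library).items()):
    print("-".join(pathway), count)
```

  • Tallying the simulated clones gives the distribution expected from unbiased assembly, which can serve as a baseline when genotyping real clones for bias.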
  • Table 6 shows the expected production patterns and frequency of occurrence, see description of pathways in Table 2, above. Analysis of the production profile allows assessment as to whether the process results in any bias in pathway construction or yields a large percentage of nonfunctional pathways. Distribution of genes within the pathways can be analyzed at the E. coli stage by performing PCR and restriction analysis. The use of primers specific for the different epi, kre and tdh / tkr genes allows assessment of whether the genes are evenly distributed, as expected. Restriction analysis can be performed to assess the number of full-length pathways, and to identify specific pathways based on their patterns. After successful implementation of GENEREASSEMBLY™ with the model pathways, a library of pathways can be constructed that includes additional biosynthetic genes.
  • 2 to 4 homologues for each gene-function can be included, as illustrated in Figure 16.
  • the position of genes in the hybrid pathways can follow the same order as their gene-product functions in biosynthesis: early genes (dNDP-transferase, 4,6-dehydratase), followed by intermediate genes (e.g. epimerases, 2-deoxygenation, C-methylations), and with late genes (e.g. 4-ketoreductases, aminotransferases) at the end. Because the reconstructed sugar biosynthetic pathways can be ordered with respect to gene function, it will be possible to modulate the length of pathways and the type of deoxysugar pathways produced, as illustrated in Figure 16.
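  • The size of such an ordered library follows from simple combinatorics. The sketch below assumes five gene functions with 2 to 4 hypothetical homologues each and treats the intermediate functions as optional, which is one way to model pathways of different length; none of the gene names come from the text.

```python
from itertools import product

# (gene function, hypothetical homologues, optional?) in biosynthetic order.
STAGES = [
    ("early: dNDP-transferase", ["ndt_1", "ndt_2"], False),
    ("early: 4,6-dehydratase", ["dh46_1", "dh46_2"], False),
    ("intermediate: epimerase", ["epi_1", "epi_2", "epi_3"], True),
    ("intermediate: 2-deoxygenation", ["deox_1", "deox_2"], True),
    ("late: 4-ketoreductase", ["kre_1", "kre_2", "kre_3", "kre_4"], False),
]


def enumerate_pathways(stages):
    """Yield every pathway as a tuple of genes, keeping the stage order."""
    choices = []
    for _function, homologues, optional in stages:
        # None stands for "this optional function is left out".
        choices.append(list(homologues) + ([None] if optional else []))
    for combination in product(*choices):
        yield tuple(gene for gene in combination if gene is not None)


pathways = list(enumerate_pathways(STAGES))
lengths = sorted({len(p) for p in pathways})
print(f"{len(pathways)} pathways, lengths {lengths}")
```

  • With these assumed pools the library holds 2 x 2 x 4 x 3 x 4 = 192 ordered pathways of three to five genes, which shows how quickly the number of clones to screen grows with each added homologue.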
  • the availability of specialized libraries can allow matching GTs of known specificity with pathways more likely to provide suitable substrates, thus minimizing the number of samples that have to be analyzed.
  • the quality of the library can be confirmed by restriction analysis of a representative number of clones to determine both average pathway length, as well as the distribution of the different genes within the library.
  • Pathway length can be estimated by analysis of insert size. Distribution of genes can be estimated by analysis of repeats in restriction patterns using enzymes that cut actinomycete DNA at relatively high frequency, such as SacII or NarI.
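  • A restriction pattern for a cloned insert can be predicted in silico before running a gel. The sketch below uses the recognition sites of SacII (CCGC^GG) and NarI (GG^CGCC), treats the insert as a linear molecule, and runs on a made-up toy sequence; the function names are illustrative.

```python
import re

# Recognition site and top-strand cut offset within the site.
ENZYMES = {
    "SacII": ("CCGCGG", 4),  # CCGC^GG
    "NarI": ("GGCGCC", 2),   # GG^CGCC
}


def cut_positions(sequence, enzyme):
    """Return top-strand cut positions; a lookahead also finds overlapping sites."""
    site, offset = ENZYMES[enzyme]
    sequence = sequence.upper()
    return [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]


def fragment_lengths(sequence, enzyme):
    """Fragment sizes after complete digestion of a linear molecule."""
    bounds = [0] + cut_positions(sequence, enzyme) + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy insert (not a real gene): one SacII site and one NarI site.
toy_insert = "AAAA" + "CCGCGG" + "TTTTT" + "GGCGCC" + "AA"
for name in ENZYMES:
    print(name, fragment_lengths(toy_insert, name))
```

  • Both sites are palindromic, so scanning one strand is sufficient; both consist only of G and C, which is why they occur frequently in GC-rich actinomycete DNA.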
  • If GENEREASSEMBLY™ does not prove to be suitable, alternative approaches to generate a pathway library can be explored.
  • One such alternative is the use of restriction-independent cloning systems, for example, the XI-CLONE™ system from Gene Therapy Systems, San Diego, CA. This system takes advantage of homologous recombination for directional cloning of linear DNA fragments at any position in a vector, and allows for the efficient addition and/or replacement of genes in existing constructs.
  • the fragment to be inserted must contain homologous regions of ⁇ 50 bp, which can be added by means of extended PCR-primers.
  • By using multiple fragments in each step and following the scheme in Figure 16, it is possible to construct a library similar to that made using GENEREASSEMBLY™.
  • a combination of these methods can be used: GENEREASSEMBLY™ to generate partial library fragments and XI-CLONE™ to combine the fragments into functional pathways.
  • Pathway clones that result in the production of a novel glycosylated derivative can be recovered in E. coli. Plasmids isolated from E. coli clones will then be analyzed by restriction analysis and sequencing to identify the genes present in these pathways.
  • Example 11 Construction of specific deoxysugar pathways
  • GT genes have been chosen to explore their ability to glycosylate either spinosyn- or anthracycline-type substrates (see Table 8 and Table 9, below).
  • these GTs can initially be supplied with their natural sugar substrate. Therefore, the pathways for the associated deoxysugars of these enzymes, listed in Table 7, can be reconstructed.
  • a bead-based ligation procedure as described for GENEREASSEMBLY™ can be used, in which only those fragments necessary for a particular pathway are included.
  • common functions, such as 2-deoxygenation or 3-aminotransferases, can be chosen from a single source. Table 7 shows specific pathways that can be reconstructed and used in the in vivo glycosylation systems and methods of the invention:
  • Example 12 Conversion system optimization The number of possible combinations of deoxysugar pathways and GTs that can be generated and used in the in vivo glycosylation systems and methods of the invention is extensive. Initial optimization can be done using the well-characterized spinosyn system of the invention. The optimized conditions can then be applied to glycosylated anthracycline products to further validate the system. Transfer of constructs: Allowing for the need for sporulation of Streptomyces, the sequential transfer of two constructs into the strain required approximately 6 weeks.
  • pathway 1 and SpnG can be used, selecting the plasmids with apramycin and thiostrepton, respectively.
  • Resulting exconjugants can be verified by analysis of conversion of the aglycone to M548-II. Based on the conjugation frequency for single constructs (1 per 1,000 to 10,000 recipients), it is conceivable that about 10 double exconjugants could be obtained by such double transfers.
  • the deoxysugar pathway library can be prepared in E. coli and a host strain prepared containing each GT of interest. Spores of these individual host strains can be used for introduction of the library. This should result in less bias, as the exconjugants grow as individual colonies on plates and do not compete against each other.
  • Culture conditions Experiments have shown that conversion can be achieved in a 96 deep-well (2.2 ml/well) format with 1 ml of medium. However, conversion ratios were poor (10% to 20%) compared to those obtained from tube or flask cultures (30% to 100%).
  • a possible cause could be the different media-volume to vessel-volume ratio (1:2 vs. 1:5) (see above), resulting in differential aeration of the cultures.
  • a plate-based format is desirable. Optimization of conditions in a plate format can be performed using the mini-library described above. Lower media volumes in a 96-well format can be analyzed, as well as longer periods of incubation and increased speed of shaking. For Streptomyces cultures, this is limited by evaporation to a minimum of 700 µl and 7 days. If the 96-well format cannot be improved significantly, plates with a larger volume, such as 48-well (5 ml/well) and 24-well (10 ml/well) plates, can be used.
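As a rough worked example of the media-volume to vessel-volume comparison above, the sketch below computes how much medium each plate format would hold at a flask-like 1:5 ratio. The well volumes are those quoted in the text; the function name and the choice of 1:5 as the target are assumptions made for illustration, and the calculation says nothing about evaporation or shaking.

```python
# Well volumes (ml) quoted in the text for each plate format.
WELL_VOLUMES_ML = {"96-well": 2.2, "48-well": 5.0, "24-well": 10.0}


def medium_volume_for_ratio(vessel_volume_ml, ratio_denominator):
    """Medium volume (ml) that gives a medium:vessel ratio of 1:ratio_denominator."""
    if vessel_volume_ml <= 0 or ratio_denominator <= 0:
        raise ValueError("volumes and ratios must be positive")
    return vessel_volume_ml / ratio_denominator


for plate, well_ml in WELL_VOLUMES_ML.items():
    target_ml = medium_volume_for_ratio(well_ml, 5)
    print(f"{plate}: {target_ml:.2f} ml medium gives a 1:5 ratio")
```

With 1 ml of medium in a 2.2 ml well, the ratio is about 1:2.2, which matches the roughly 1:2 figure given for the deep-well format.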
  • Sample preparation and chromatographic analysis The current liquid / liquid extraction is not adaptable to a truly high throughput process in a plate format. Other options, such as evaporation of the medium, direct sample preparation by centrifugation, filtration, or solid-phase extraction can be optimized by routine screening alternatives to obtain low variation between samples, good product recovery and a low background.
  • the current HPLC-UV-vis method for analysis of spinosyn derivatives consists of a linear gradient of H2O and acetonitrile, with a total runtime of 30 min. This method is suitable for analysis of the full spectrum of substrates and products (aglycone,
  • HPLC-MS data can be used to build a database containing retention times, mass-spectra and UV-vis-spectra. This data can then be used to prioritize conversion products for further analysis. Products showing different characteristics than those in the database can be scaled up and isolated for structure elucidation by NMR and testing of their biological activity.
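The database-driven prioritization described above could look roughly like the following sketch. Everything here is an assumption made for illustration: the `Peak` record, the tolerance values, and the idea of summarizing a UV-vis spectrum by a single absorbance maximum are simplifications, and no real retention times or spectra from this work are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """One chromatographic peak; fields are a simplified, hypothetical schema."""
    name: str
    retention_min: float  # retention time in minutes
    mass: float           # observed molecular weight
    uv_max_nm: float      # UV-vis absorbance maximum (stand-in for a full spectrum)


def is_known(peak, database, rt_tol=0.2, mass_tol=0.5, uv_tol=3.0):
    """True if some database entry matches the peak on all three properties."""
    return any(
        abs(peak.retention_min - ref.retention_min) <= rt_tol
        and abs(peak.mass - ref.mass) <= mass_tol
        and abs(peak.uv_max_nm - ref.uv_max_nm) <= uv_tol
        for ref in database
    )


def prioritize(peaks, database, **tolerances):
    """Return peaks with no database match: candidates for scale-up and NMR."""
    return [p for p in peaks if not is_known(p, database, **tolerances)]
```

In use, known substrates and previously characterized products would be loaded as the database, and each new culture extract would be passed through `prioritize` so that only unmatched peaks go forward for isolation.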
  • the TLC method can be used and optimized by routine screening of alternatives. This may allow handling of a large number of samples simultaneously and can be easily automated.
  • the influence of differential glycosylation on Rf-values can be analyzed using the novel compounds of the invention.
  • Example 13 Construction of a library of spinosyn derivatives
  • a library of modified glycosylated spinosyn derivatives is generated to identify compounds with increased or modified insecticidal activity. This is achieved by increasing the number of deoxysugar pathway / GT combinations used to generate the derivatives. Optimal activity of the spinosyns is achieved only after methylation of the sugar moieties.
  • a series of O-methyltransferase genes from the spinosyn pathway is added to the system.
  • derivatives of spinosyn containing modified sugars at both the 9- and 17-positions are generated. Additional glycosyltransferases A second spinosyn producing strain, Sac. pogona, is available from the
  • hybridizing clones are mapped by restriction analysis, and suitable clones sequenced using a shotgun approach.
  • deoxysugar biosynthetic genes and GT genes are identified and used for preparing spinosyn analogs.
  • SpnP has recently been shown to transfer neutral sugars to the 17-PSA28.
  • both SpnP and the homologue from Sac. pogona are screened for their ability to transfer neutral sugars to the 17-PSA.
  • a number of macrolide biosynthetic gene clusters have been sequenced.
  • the erythromycin producer Sac. erythraea is capable of transferring a glucose moiety to the 17-position of the 17-PSA 28. Sac. erythraea is also able to transfer a glucose moiety to tylactone and avermectin. While the gene responsible for this transfer has not been identified, it is conceivable that this activity is normally associated with self-resistance of the organism to erythromycin (in addition to the methylation of 23S rRNA conferred by the ermE gene). A similar resistance mechanism has been demonstrated in the oleandomycin producer Streptomyces antibioticus.
  • This mechanism of resistance involves an intracellular glucosyltransferase that transfers glucose to the 2"- OH of desosamine, coupled with an extracellular glucosidase that removes the sugar after excretion. Homologues of these genes have been found in several Streptomyces species.
  • genes from Streptomyces, e.g., the oleandomycin producer S. antibioticus, are cloned and investigated for their ability to glycosylate spinosyn derivatives. Analysis of the deoxysugar pathway library
  • pathway libraries to be constructed in this project are tested in conjunction with both SpnP and SpnG for conversion of the 17-PSA and aglycone.
  • Any other GT that shows glycosylation of a spinosyn substrate can be assayed together with the deoxysugar pathway library.
  • a sublibrary of pathways closely related to the cognate substrate will be chosen and analyzed.
  • Methyltransferases It is known that spinosyn derivatives lacking the O-methyl groups on the L-rhamnose are in general less active than their methylated counterparts.
  • the methyltransferase genes spnI, spnH, and spnK were isolated by PCR amplification from Sac. spinosa genomic DNA.
  • these genes are cloned into plasmid pUWL20. GENEREASSEMBLY™ is used to efficiently generate these constructs. After transfer of these constructs to a Streptomyces, the resulting strains can be investigated for glycosylation and methylation using pathways 4 (6-deoxy-D-glucose), 1 (L-rhamnose), and 12 (L-digitoxose). In one aspect, up to 17 different methylated derivatives of each of the 17-PSA analogues and spinosyn A are generated and screened for biological activity.
  • the glycosylated spinosyns of the invention are methylated at one of the 17 different possible sites, or, at several of the 17 different possible sites, or, at all of the 17 different possible sites.
  • the compositions are modified at a single glycosylation site, or, are modified in a single glycosylation step.
  • diversity is added to the compositions of the invention by combining modifications at both the 9- and the 17-position within a single compound.
  • a sequential feeding strategy can be employed.
  • the product of the first glycosylation step can be produced at a larger scale and partially purified.
  • This partially purified material carrying one modified sugar can then be used for a second feeding using a strain that carries out the second glycosylation step.
  • This product can then be purified for further characterization.
  • the advantage of this method is that it does not require any further modification of the strains.
  • this sequential strategy is time consuming and requires a large amount of substrate due to the loss of compound during purification.
  • the method for making a compound of the invention is done as a one-step process. This can require combining all genes necessary for the 9- and 17-glycosylation in one strain of a Streptomyces.
  • SpnG and SpnP can be introduced into strains that contain pathways required for the synthesis of the sugar moieties found at the 9- and 17-position (L-rhamnose, L-digitoxose, or 6-deoxy-D-glucose, respectively, and D-forosamine). Conversion of the aglycone to spinosyn derivatives by these strains can be assessed by routine procedures. Activity testing of novel spinosyn derivatives All novel spinosyn derivatives can be tested for their insecticidal activity using routine screens known in the art. For example, the nematode Caenorhabditis elegans and the mosquito Aedes aegypti can be used as target organisms.
  • Both assays are available in a 96-well format and have been shown to be suitable for analyzing activity of spinosyn derivatives. Initial testing may require about 2 mg of pure compound for each derivative. Based on results of these screens, compounds can be produced at a larger scale for extensive activity profiling.
  • glycosylation platforms of the invention are used to generate novel anthracycline derivatives. Initially, this requires cloning and expression of anthracycline resistance and GT genes in a Streptomyces. Then the deoxysugar pathway library can be used to generate novel glycosylated doxorubicin derivatives. The focus can be on generating di-glycosylated derivatives that have shown promising profiles for anticancer therapy. Doxorubicin resistance in a Streptomyces In addition to their anticancer activity, anthracyclines also possess potent antibacterial activity. The doxorubicin producer S. peucetius contains two resistance mechanisms, conferred by DrrAB and DrrC.
  • DrrAB form an ABC-type export system
  • DrrC is an UvrA-like DNA repair enzyme. Both confer doxorubicin resistance in S. lividans.
  • the Streptomyces strain used in these studies contains a close homologue of DrrC, which may confer resistance as well. Resistance of this Streptomyces against doxorubicin can be determined.
  • drrAB and drrC are cloned.
  • the level of resistance of this Streptomyces expressing either or both is determined. If this does not result in sufficient levels of resistance, spontaneous doxorubicin resistant mutants of this Streptomyces can be selected.
  • toxicity levels for all substrates are determined.
  • the level of resistance can determine the substrate concentration that can be used in the glycosylation experiments.
  • Chemical analysis of doxorubicin In one aspect, commercially available anthracyclines, such as daunorubicin, doxorubicin and aclarubicin, are used in routine screening to determine HPLC methods suitable for analysis of doxorubicin derivatives of the invention.
  • HPLC methods have been described in the literature. A method with an overall analysis time of less than 5 min/sample can be established. Detection limits can be determined. Based on detection limits and substrate loading, the minimal detectable conversion ratios can be calculated.
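The calculation mentioned above, relating detection limit and substrate loading to the smallest conversion that can be seen, can be illustrated with a short sketch. It assumes that the product and the substrate give a similar detector response and that both quantities are expressed in the same units (for example, ng on column); the function name and the example numbers are hypothetical.

```python
def minimal_detectable_conversion(detection_limit, substrate_loaded):
    """Smallest fraction of substrate that must be converted to be detectable.

    Both arguments must use the same units (e.g., ng injected on column).
    Assumes product and substrate have comparable detector response.
    """
    if detection_limit < 0 or substrate_loaded <= 0:
        raise ValueError("detection_limit must be >= 0 and substrate_loaded > 0")
    # A detection limit above the total loading means even full conversion
    # would be missed; cap the ratio at 1.0 to flag that case.
    return min(1.0, detection_limit / substrate_loaded)


# Hypothetical numbers: 5 ng detection limit, 1000 ng substrate on column.
ratio = minimal_detectable_conversion(5, 1000)
print(f"minimal detectable conversion: {ratio:.1%}")
```

Increasing substrate loading lowers the minimal detectable conversion, but loading is itself bounded by the toxicity and resistance levels discussed earlier.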
  • TLC methods for anthracyclines are available.
  • one or both sugars within the disaccharide of a composition of the invention, e.g., a glycosylated anthracycline such as doxorubicin, are modified.
  • both a mono-glycosylated and a non-glycosylated substrate can be used.
  • Doxorubicin (e.g., from Sigma, St. Louis, MO) can be used as a mono-glycosylated substrate.
  • aglycones are used as non-glycosylated substrate; they can be generated from commercially available anthracyclines, such as doxorubicin, daunorubicin and/or aclarubicin, by mild-acid hydrolysis using published procedures.
  • an aclarubicin aglycone (similar to the rhodomycin aglycone) as substrate may not result in a doxorubicin derivative.
  • aclarubicin aglycone is an intermediate in doxorubicin biosynthesis and final conversion to the doxorubicin aglycone occurs after glycosylation.
  • the genes involved in these late steps in doxorubicin biosynthesis (dnrKP, doxA) are well characterized and, in one aspect, are included into the systems of the invention to generate doxorubicin type compounds.
  • Glycosyltransferase (GT) Genes Anthracyclines are a diverse group of glycosylated polyketides. Glycosylation occurs at various positions of the anthracycline aglycone, and sugar chains containing up to five moieties exist.
  • the invention provides anthracycline aglycones containing up to five glycosylation moieties.
  • diglycosylated doxorubicin derivatives are constructed. Therefore, GTs expected to be able to attach sugars to the 7-position (red arrow in Figure 3c) and to form a disaccharide are used.
  • Publicly available anthracycline GTs can be used, for example, sequences for anthracycline GTs, their deoxysugar substrates, and their sources are listed in Table 9:
  • Additional GTs can be identified by cloning, e.g., cloning the viriplanin cluster from Ampullariella regularis ATCC31417.
  • the exact functions of only a few of these genes e.g. dnmS, dnrH, dauH
  • the enzymes listed above are predicted to be involved in the glycosylation steps indicated by the arrows in Figure 5.
  • DnmS, RdmH, RhoG, and SnogE are predicted to act on the 7-position of the aglycone, whereas DauH, DnmH, and AknK are predicted to attach a second sugar to the growing saccharide chain.
  • Glycosylated doxorubicin derivatives In one aspect, both of the sugar moieties are modified independently of each other. In one aspect, the results are combined to generate a truly combinatorial library of diglycosylated doxorubicin derivatives. Accordingly, in one aspect, the invention provides libraries of novel diglycosylated doxorubicin derivatives (as are provided libraries of novel diglycosylated natural product, e.g., spinosyn, derivatives). In one aspect, a twofold approach is taken:
  • 1. Direct glycosylation of doxorubicin, resulting immediately in diglycosides (and providing insight into the functions and flexibility of the enzymes used).
  • 2. Glycosylation of anthracycline aglycones to generate novel mono-glycosylated derivatives as substrates for a second glycosylation.
  • Initial analysis can be performed providing each GT with its native deoxysugar substrate by co-expression of the relevant pathway (see Table 7).
  • GTs can be assessed using either aglycones or doxorubicin, and the appropriate deoxysugar pathways listed in Table 9. These can be specifically reconstructed (see Table 7).
  • GTs showing activity on any of the substrates are used in conjunction with the deoxysugar pathway library in an analogous fashion to the experiments described above.
  • Any mono-glycosylated products from this effort can be produced on a larger scale to provide substrates for attachment of a second sugar moiety.
  • combinations of the two different sugar moieties are carried out using a sequential feeding strategy analogous to that described above.
  • Activity of novel glycosylated doxorubicin analogues Diglycosidic derivatives of doxorubicin are predicted to have both higher efficacy and lower toxicity compared to doxorubicin.
  • the initial evaluation of the efficacy of the novel diglycosidic doxorubicins of the invention can be done using routine screening, e.g., by cell-based assays.
  • a wide range of cell lines, including human cancer cell lines, are used, as shown in Table 10:
  • LDH assay which measures cell membrane integrity using lactate dehydrogenase (LDH) activity, as well as Tetrazolium dye (MTT and MTS) conversion assays that measure cell metabolic activity.
  • results from these cell-based screens can be used to prioritize compounds of the invention.
  • Those compounds showing improved efficacy in the cell-based assay, as compared to doxorubicin, are further analyzed in animal models, e.g., using xenografted mice carrying human tumors.
  • Daudi solid tumors can be established in CB-17 SCID mice and serially passaged 3 times before use in study. Six- to eight-week-old mice are injected subcutaneously with 1 × 10^6 cells in both flanks. Mice can be monitored for tumor growth at 2 to 3 weeks post tumor inoculation and divided into groups of 10 mice.
  • Mice are treated intravenously with novel doxorubicin analogs at both a high and low dose (to be determined by IC50 values from in vitro cytotoxicity assays).
  • a positive control group treated with doxorubicin can be included.
  • Mice can be monitored for tumor growth and progression. To determine possible side effects, weight can be monitored as a general parameter of health. In one aspect, tumor growth inhibition and long-term survival are chosen as the end point for the study. Compounds showing good activity can be further tested in xenograft tumor models established in additional cell lines, including doxorubicin-resistant tumor cell lines, to determine if the doxorubicin derivatives are effective in inhibiting drug-resistant tumor growth.
  • Example 14 6-deoxy-D-glucose-17-pseudoaglycone, Compound M548
  • the invention provides a novel compound 6-deoxy-D-glucose-17-pseudoaglycone.
  • the strain expressed the rhamnosyl-transferase spnG from Saccharopolyspora spinosa ATCC49460 cloned under the control of the ermEp* promoter in the vector pUWL201, as illustrated in Figure 17.
  • the strain also expressed a deoxysugar pathway, expressed by genes as cloned into vector pAT6 (pAT6-gtt21-gdh11-epi12-kre15), illustrated as pathway #4, shown in Figure 18, comprising the gdh and gtt genes from S.
  • a 10% vol/vol inoculum was transferred to SCM with 12.5 mg/ml thiostrepton and 50 mg/ml apramycin to select for the vectors and 100 mg/ml spinosyn A aglycone (whose structure is illustrated in Figure 19a) as substrate for glycosylation.
  • the culture was extracted with ethyl-acetate and analyzed by HPLC-UV-vis and HPLC-MS. Production of a peak with a molecular weight of 548.3 and the same UV-vis spectrum as the spinosyn A aglycone was detected.
  • This compound was named M548 and purified by HPLC from a larger volume and the structure elucidated by NMR spectroscopy. This data showed that the compound M548 contains a 6-deoxy-D-glucose moiety attached to the 9-position of the spinosyn aglycone whose structure is illustrated in Figure 19b.
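As a sanity check on the molecular weight of 548.3 reported above, the expected mass of a mono-glycosylated aglycone can be estimated as aglycone plus sugar minus one water (lost on forming the glycosidic bond). The sketch below is illustrative only: the molecular formulas are assumptions not stated in this document (C24H34O5 for the spinosyn A aglycone, C6H12O5 for a 6-deoxyhexose such as 6-deoxy-D-glucose or L-rhamnose, C6H12O4 for digitoxose), and it uses average atomic masses rather than the monoisotopic masses a mass spectrometer reports.

```python
import re

# Average atomic masses (g/mol) for the elements needed here.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_mass(formula):
    """Average molecular mass of a simple formula such as 'C24H34O5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def glycoside_mass(aglycone_formula, sugar_formula):
    """Mass after attaching one sugar: aglycone + sugar - H2O."""
    return (formula_mass(aglycone_formula)
            + formula_mass(sugar_formula)
            - formula_mass("H2O"))


AGLYCONE = "C24H34O5"       # assumed formula of the spinosyn A aglycone
DEOXYHEXOSE = "C6H12O5"     # assumed: 6-deoxy-D-glucose or L-rhamnose (isomers)
DIGITOXOSE = "C6H12O4"      # assumed: 2,6-dideoxyhexose

print(f"6-deoxyhexoside: {glycoside_mass(AGLYCONE, DEOXYHEXOSE):.1f}")
print(f"digitoxoside:    {glycoside_mass(AGLYCONE, DIGITOXOSE):.1f}")
```

Under these assumptions the 6-deoxyhexoside comes out within about half a mass unit of the observed 548.3, and because 6-deoxy-D-glucose and L-rhamnose are isomers, the same value applies to both sugars.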
  • Example 15 L-rhamnosyl-17-pseudo-aglycone, Compound M548-II
  • the invention provides a novel compound L-rhamnosyl-17-pseudoaglycone.
  • HPLC-UV-vis showed an identical spectrum to the spinosyn A aglycone and LC-MS showed a molecular weight of 548.3.
  • This compound was named M548-II and the structure elucidated by NMR, Figure 20.
  • Example 16 L-digitoxosyl-17-pseudo-aglycone
  • the invention provides a novel compound L-digitoxosyl-17-pseudo-aglycone.
  • Example 17 9-6-deoxy-D-glucosyl-spinosyn A, Compound M689
  • the invention provides a novel compound, 9-6-deoxy-D-glucosyl-spinosyn A.
  • the engineered Streptomyces strain (S. diversa) expressing SpnG and the 6-deoxy-D-glucose biosynthetic pathway (pathway #4, Figure 18, see above) was used to convert the spinosyn A 9-pseudo-aglycone (9-PSA, whose structure is illustrated in Figure 25a) to the 9-6-deoxy-D-glucosyl-spinosyn A of the invention, whose structure is illustrated in Figure 25b.
  • the 9-PSA was added to a culture of the strain, extracted and analyzed by HPLC and HPLC-MS. Production of a new compound with a molecular weight of 689.7 could be shown. This compound was designated M689.
  • Example 18 9-L-rhamnosyl-spinosyn A, Compound M689-II
  • the invention provides a novel compound, 9-L-rhamnosyl-spinosyn A, Compound M689-II, whose structure is illustrated in Figure 26.
  • Streptomyces strain (S. diversa) expressing SpnG and the L-rhamnose biosynthetic pathway (pathway #9, Figure 22, see above) was used to convert the 9-PSA (Figure 25a) to the 9-L-rhamnosyl-spinosyn A of the invention, whose structure is illustrated in Figure 26.
  • the 9-PSA was added to a culture of the strain, extracted and analyzed by HPLC and HPLC-MS. Production of a new compound with a molecular weight of 689.7 could be shown. This compound was designated M689-II.
  • the structures of both M689 and M673 were confirmed by NMR after purification of larger quantities from the strains described herein.
  • Example 19 9-L-digitoxosyl-spinosyn A, Compound M673
  • the invention provides a novel compound 9-L-digitoxosyl-spinosyn A, Compound M673, whose structure is illustrated in Figure 27.
  • the engineered Streptomyces strain expressing SpnG and the L-digitoxose biosynthetic pathway was used to convert the 9-PSA (Figure 25a) to the 9-L-digitoxosyl-spinosyn A of the invention, whose structure is illustrated in Figure 27.
  • the 9-PSA was added to a culture of the strain, extracted and analyzed by HPLC and HPLC-MS.
  • Example 20 Activity of novel compounds Compounds M548, M548-II, and M532 were inactive. Activity was tested by an injection assay using beet armyworms as the test organisms. Six larvae per test were all run at 10 µg/larva in 0.5 µl of DMSO. Unless otherwise indicated, data is reported as percent of injected larvae showing symptoms of intoxication.
  • Example 21 Deoxysugar biosynthesis in S. diversa Reconstructed L-rhamnose pathway Production of M548-II and M689-II in S. diversa, after feeding the aglycone and the 9-pseudo-aglycone, respectively, was confirmed by HPLC and HPLC-MS analysis. Investigation of 6-deoxysugar biosynthesis in S. diversa To elucidate the minimal gene set required for biosynthesis of 6-deoxy-D-glucose and the reasons for various levels of 6-deoxysugar after expression of reconstructed 6-deoxysugar pathways, several incomplete pathways were constructed by removing genes from pathways #2, #3, and #4. These were expressed in S.
  • Vector construction pAT6 was constructed by insertion of ermEp* from pUWL201 into pSET152 to generate a new vector suitable for over-expression of genes and pathways in Streptomyces. All pieces used are from publicly available plasmids. These data demonstrate: - Over-expression of gtt and gdh genes is sufficient for biosynthesis of 6-deoxy-D-glucose. S. diversa must contain a ketoreductase, which is able to catalyze the reduction of 4-keto-6-deoxy-D-glucose to 6-deoxy-D-glucose. - L-rhamnose biosynthesis requires over-expression of an epi gene.

Abstract

The invention provides a novel technology for glycosylation of natural products using novel genetically engineered strains of bacteria. These in vivo glycosylation systems of the invention express a heterologous glycosyltransferase and deoxysugar pathway that are capable of glycosylating a suitable substrate, which can be added to a culture broth. The invention also provides novel compounds glycosylated by these novel genetically engineered strains (the in vivo glycosylation systems of the invention), including novel peptides, mixed polyketide-peptides, or polyketides, including novel macrolides, e.g., glycosylated spinosyn derivatives, glycosylated derivatives of antibiotics such as erythromycin, tetracycline, rifampicin, glycosylated derivatives of anti-tumor drugs such as daunorubicin, mithramycin, derivatives of immunosuppressants such as rapamycin, FK520, FK506, glycosylated derivatives of anti-fungals such as amphotericin, glycosylated derivatives of antibacterials such as tylosin, glycosylated derivatives of antiparasitics such as avermectin, glycosylated derivatives of insecticides such as spinosyn, and methods for making these compounds using the in vivo glycosylation systems of the invention.

Description

GLYCOSYLATION ENZYMES AND SYSTEMS AND METHODS OF MAKING AND USING THEM
GOVERNMENT INTEREST The United States Government may have certain rights to this invention by virtue of funding received from the Department of Health and Human Services NIH grant number 1R430GM067468-01.
CROSS REFERENCE TO RELATED APPLICATIONS This application claims priority benefit to United States provisional patent applications 60/492,781, filed August 4, 2003, and 60/515,950, filed October 29, 2003, which are incorporated by reference in their entirety for all purposes.
REFERENCE TO SEQUENCE LISTING SUBMITTED ON A COMPACT DISC This application includes a compact disc (submitted in quadruplicate) containing a sequence listing. The entire content of the sequence listing is herein incorporated by reference. The sequence listing is identified on the compact disc as follows.
Figure imgf000002_0001
TECHNICAL FIELD This invention relates to the fields of agriculture, pharmacology and molecular biology. The invention provides a novel technology for glycosylation of natural products using novel genetically engineered strains of bacteria. These in vivo glycosylation systems of the invention express a heterologous glycosyltransferase and deoxysugar pathway that are capable of glycosylating a suitable substrate, which can be added to a culture broth. The invention also provides novel compounds glycosylated by these novel genetically engineered strains (the in vivo glycosylation systems of the invention), including novel peptides, mixed polyketide-peptides, or polyketides, including novel macrolides, e.g., glycosylated spinosyn derivatives, glycosylated derivatives of antibiotics such as erythromycin, tetracycline, rifampicin, glycosylated derivatives of anti-tumor drugs such as daunorubicin, mithramycin, derivatives of immunosuppressants such as rapamycin, FK520, FK506, glycosylated derivatives of anti-fungals such as amphotericin, glycosylated derivatives of antibacterials such as tylosin, glycosylated derivatives of antiparasitics such as avermectin, glycosylated derivatives of insecticides such as spinosyn, and methods for making these compounds using the in vivo glycosylation systems of the invention.
BACKGROUND The occurrence of resistant bacterial pathogens and the desire for anti-tumor drugs with reduced side effects require the constant development of novel pharmaceuticals. Historically, many important pharmaceuticals have been derived from natural products, which have also served many useful roles in agriculture. The Gram-positive filamentous actinomycetes produce approximately 60% of all known natural products, and about two thirds of all known antibiotics of microbial origin. Actinomycete-derived polyketides, peptides or mixed polyketide-peptides possess a wide variety of properties resulting in many different applications.
In the pharmaceutical field they are used as antibiotics (erythromycin, tetracycline, rifampicin), anti-tumor drugs (daunorubicin, mithramycin), immunosuppressants (rapamycin, FK520, FK506) and anti-fungals (amphotericin). In agriculture, actinomycete-derived polyketides are used as antibacterials (tylosin), antiparasitics (avermectin), and as insecticides (spinosyn). This wide range of applications underscores the commercial importance of polyketides. Many of the compounds mentioned above contain glycosyl residues attached to the polyketide scaffold. These sugars serve as molecular recognition elements that are frequently critical for biological activity, and in their absence function is often abolished or dramatically reduced. Most of the sugars found in secondary metabolites belong to the 6-deoxysugar family. Spinosyns are a novel family of fermentation-derived natural products that exhibit potent insecticidal activities. Spinosad, a naturally occurring mixture of spinosyn A and spinosyn D, has successfully established its utility for crop protective applications in the agrochemical field. Spinosyns are macrolides produced by Saccharopolyspora spinosa and Saccharopolyspora pogona and show excellent insecticidal activity. They are characterized by an unusual tetracyclic macrolide aglycone to which two deoxysugar moieties are attached. A mixture of the main components, spinosyn A and D from S. spinosa, is marketed to control insects in cotton, vegetables, fruit trees and nuts. They are highly active on chewing insects (e.g. caterpillars) but have only limited activity on sucking insects (e.g. aphids). A compound with activity on both would turn spinosyn into a truly broad-spectrum insecticide, greatly enhancing its market potential. Approximately 80 different deoxysugars have been found in secondary metabolites made by actinomycetes. Recognition of their importance for bioactivity has prompted a number of recent studies on deoxysugar biosynthesis, resulting in a thorough understanding of the corresponding biosynthetic pathways.
Biosynthesis generally starts with activation of glucose-1-phosphate to yield NDP-glucose. The first committed step in deoxysugar biosynthesis is the removal of the 6-OH-group by NDP-glucose-4,6-dehydratase, yielding the key intermediate NDP-4-keto-6-deoxy-glucose, which is then further modified to yield the final deoxysugar. The NDP-activated deoxysugar is then transferred to the aglycone, or to a growing sugar chain, by pathway-specific glycosyltransferases (GT). Many of the enzymes involved in deoxysugar biosynthesis and transfer are flexible towards their substrates. GTs can show flexibility towards both their aglycone and their sugar substrates. For example, Trefzer, et al., (2002) J. Am. Chem Soc. 124(21):6056-6062, showed that the enzyme UrdGT2 acted on both the urdamycin and premithramycinone aglycones and was able to accept several sugars (D-mycarose, D- and L-rhodinose) besides its natural substrate (D-olivose). Similarly, deletion of the deoxysugar biosynthetic genes in the urdamycin-producing organism resulted in rerouting of deoxysugar biosynthesis towards sugars not found in the wild-type strain (see Hoffmeister (2000) A. Chem Biol. 7(11):821-831). Similar results were found for enzymes involved in the production of picromycin, elloramycin, and oleandomycin. Blanco, et al. (2001) Chem. Biol. 8(3):253-263; Rodriguez, et al. (2002) Chem. Biol. 9(6):721-729, generated a system using a heterologous host strain in which deoxysugar pathways and GTs are expressed. An aglycone is either added exogenously or provided in vivo by expressing the respective biosynthetic genes. Gaisser, et al. (2000) Mol. Microbiol. 36(2):391-401, used the erythromycin producer Saccharopolyspora erythraea as a host, taking advantage of the resident deoxysugar pathways for D-desosamine, L-mycarose, and L-rhamnose biosynthesis. Several mutant strains of S. erythraea were used as hosts for heterologous expression of GTs. Depending on whether the erythromycin aglycone genes were present or not, these hosts were used for generating erythromycin or other macrolide derivatives. In contrast to these in vivo systems, an in vitro approach using chemo-enzymatically generated activated deoxysugars and heterologously expressed GTs has been described by Albermann, et al. (2003) Organic Letters 5(6):933-936; Losey, et al. (2002) Chem. Biol. 9(12):1305-1314. Several novel novobiocin and vancomycin derivatives were generated using this approach, which was termed glyco-randomization. This system also allows incorporation of unnatural sugars such as halogenated sugars. However, for a large-scale production process, a more cost-effective in vivo fermentation process is required.
Methods for chemical glycosylation have been described as well. However, synthesis of the highly modified sugar moieties found in natural products and their attachment is difficult and not feasible at a production scale. Given the importance of the sugar moieties for bioactivity, the ability to modulate the glycosylation state of a natural product in a rapid and cost-effective way would be a highly desirable attribute for any natural product drug discovery program.
SUMMARY
The invention provides in vivo glycosylation systems comprising an engineered host cell, for example, an actinomycetes (including any organism from the order Actinomycetales), e.g., a recombinantly engineered actinomycetes, comprising a heterologous glycosyltransferase and a heterologous deoxysugar pathway. In one aspect, the actinomycetes is a Streptomyces, such as a Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavenulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygroscopicus, Streptomyces lasaliensis, Streptomyces venezuelae, Streptomyces antibioticus,
Streptomyces albus, Streptomyces tsukubaensis, Streptomyces galilaeus or Streptomyces diversa. In one aspect, the actinomycete is an actinomycete plant endophyte. In one aspect, the host cell is an actinomycetes from the family Micromonosporaceae, or the genus Actinomyces, Actinomadura or Nocardia. Micromonosporaceae are preferably Micromonosporaceae, Actinoplanes, Dactylosporangium, Micromonospora or
Verrucosispora. Alternatively, the host cell can also comprise a Pseudonocardineae, Actinosynnema, Lechevaleria, Saccharothrix, Actinoalloteichus, Actinopolyspora, Amycolatopsis, Kibedelos porangium, Pseudonocardia, Saccharomonospora, Saccharopolyspora, and Streptoalloteichus. Alternatively, the host cell can be a Streptomycetacea, including Kitasatospora and Streptomyces. Alternatively, the host cell can be a Microbispora and Microtetraspora. The heterologous glycosyltransferase or the heterologous deoxysugar pathway can be a glycosyltransferase or a deoxysugar pathway of the invention. The invention provides in vitro glycosylation systems comprising a heterologous glycosyltransferase and a heterologous deoxysugar pathway and a cell extract of a host cell, e.g., actinomycetes, or equivalent, which includes, as noted above, any organism from the order Actinomycetales. In one aspect, the actinomycete is a Streptomyces, e.g., a Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavenulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygroscopicus, Streptomyces lasaliensis, Streptomyces venezuelae, Streptomyces antibioticus, Streptomyces albus, Streptomyces tsukubaensis, Streptomyces galilaeus or Streptomyces diversa, or as described herein, or the actinomycete is from the family Micromonosporaceae, or the genus Actinomyces, Actinomadura or Nocardia, or the actinomycete is an actinomycete plant endophyte. Micromonosporaceae are preferably Micromonos poraceae, Actinoplaes, Dactylos porangium, Micromonospora or Verrucosispora. 
Alternative host cells include Pseudonocardineae, particularly Actinosynnema, Lechevaleria, Saccharothrix, Actinoalloteichus, Actinopolyspora, Amycolatopsis, Kibedelos porangium, Pseudonocardia, Saccharomonospora, Saccharopolyspora, and Streptoalloteichus. Alternative host cells include the Streptomycetacea, including Kitasatospora and Streptomyces. Alternative host cells include Microbispora and Microtetraspora. The heterologous glycosyltransferase or the heterologous deoxysugar pathway can be a glycosyltransferase or a deoxysugar pathway of the invention. In one aspect, the glycosyltransferase is recombinant enzyme and/or the deoxysugar pathway comprises recombinant enzymes, or combinations thereof. In one aspect, the glycosyltransferase is rhamnosyl-transferase, an anthracycline glycosyltransferase , desosaminyltransferase, mycarosyltransferase, desosaminyltransferase, megosaminyltransferase, oleandrosyltransferase, olivosyl- transferase, mycaminosyltransferase, deoxyallose transferase, forosaminyltransferase, mannosyltransferase, daunosaminyltransferase, rhodinosyltransferase, quinovosyltransferase, or a macrolide glycosyltransferase. In one aspect, the deoxysugar pathway comprises a rhamnose, a forosamine, mycarose, mycaminose, desosaminose, megosaminose, oleandrosose, olivosose, deoxyallose, mannose, daunosaminose, rhodinose, quinovose, and/or a L-digitoxose biosynthetic pathway. In one embodiment the glycosyltransferase transfers a deoxy-glucose, -rhamnose, -digitoxose, -forosamine , - mycarose, -mycaminose, -desosaminose, -megosaminose, -oleandrosose, -olivosose, - deoxyallose, —mannose, -daunosaminose, -rhodinose, quinovose, and/or their D- or L- forms. In another embodiment the glycosyltransferase transfers one of the sugars shown in Figures 31 or 32 or as described herein. 
The invention provides methods for making a glycosylated natural product comprising the following steps (a) providing an in vivo or in vitro glycosylation system comprising an engineered host cell, e.g. actinomycetes (or cell extract equivalent) comprising a heterologous glycosyltransferase and a heterologous deoxysugar pathway; (b) providing natural product; and (c) adding the natural product to the in vivo or in vitro glycosylation system, thereby glycosylating the natural product. In one aspect, the natural product is either added exogenously or provided in vivo or in vitro by expressing biosynthetic genes for the natural product. In one aspect, the natural product comprises an aglycone or a pseudoaglycone, or 9- or 17-pseudoaglycone, a macrolide, or the natural product comprises a peptide, a mixed polyketide-peptide, or a polyketide, such as a macrolide, e.g., a spinosyn, an erythromycin, , a rifampicin, , idarubicin, epirubicin, a daunorubicin, a mithramycin, a rapamycin, FK520, FK506, an amphotericin, a tylosin or an avermectin. The macrolide aglycone or pseudoaglycone includes an aglycone or pseudoaglycone of a spinosyn, an erythromycin, a rifampicin, idarubicin, epirubicin, a daunorubicin, a mithramycin, a rapamycin, FK520, FK506, an amphotericin, a tylosin, oleandomycin, rifamycin, immunomycin, narbomycin, pikromycin, spiramycin, dirithromycin, clarithromycin, troleandomycin, azithromycin or an avermectin. 
In one aspect of the methods, the actinomycete is a Streptomyces., e.g., a Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavenulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygroscopicus, Streptomyces lasaliensis, Streptomyces venezuelae, Streptomyces antibioticus, Streptomyces albus, Streptomyces tsukubaensis, Streptomyces galilaeus or Streptomyces diversa, or as described herein, or the actinomycete is from the family Micromonosporaceae, or the genus Actinomyces, Actinomadura or Nocardia, or the actinomycete is an actinomycete plant endophyte. Micromonosporaceae are preferably Micromonos poraceae, Actinoplaes, Dactylos porangium, Micromonospora or Verrucosispora. Alternatively, the host cell can be a Pseudonocardineae, particularly Actinosynnema, Lechevaleria, Saccharothrix, Actinoalloteichus, Actinopolyspora, Amycolatopsis, Kibedelos porangium, Pseudonocardia, Saccharomonospora,
Saccharopolyspora, and Streptoalloteichus. Alternatively, the host cell can be a Streptomycetacea, including Kitasatospora and Streptomyces. Alternatively, the host cell can be a Microbispora and Microtetraspora. In one aspect of the methods, the glycosyltransferase is recombinant enzyme and/or the deoxysugar pathway comprises recombinant enzymes, or combinations thereof. In one aspect, the glycosyltransferase is a rhamnosyl-transferase, an anthracycline glycosyltransferase , desosaminyltransferase, mycarosyltransferase, desosaminyltransferase, megosaminyltransferase, oleandrosyltransferase, olivosyl-transferase, mycaminosyltransferase, deoxyallose transferase, forosaminyltransferase, mannosyltransferase, daunosaminyltransferase, rhodinosyltransferase, quinovosyltransferase, or a macrolide glycosyltransferase. In one aspect, the deoxysugar pathway comprises a rhamnose, a forosamine, mycarose, mycaminose, desosaminose, megosaminose, oleandrosose, olivosose, deoxyallose, mannose, daunosaminose, rhodinose, quinovose and/or a L-digitoxose biosynthetic pathway. The invention provides a compound of the formula 9-6-deoxy-D-glucosyl- spinosyn, a 9-L-rhamnosyl-spinosyn, a 9-L-digitoxosyl-spinosyn, L-rhamnosyl- 17- pseudoaglycone spinosyn, 6-deoxy-beta-D-glucose- 17-pseudoaglycone spinosyn, D- quinovose- 17-pseudoaglycone spinosyn, or L-digitoxosyl- 17-pseudoaglycone spinosyn. A spinosyn of the invention can be an A or D form or a 21-butenyl form as described herein. Alternative spinosyns are the spinosyn A and spinosyn D forms. The invention provides a compound of the formula 9-6-deoxy-D-glucosyl-spinosyn A, a 9-L-rhamnosyl- spinosyn A, a 9-L-digitoxosyl-spinosyn A, L-rhamnosyl- 17-pseudoaglycone spinosyn A, 6-deoxy-beta-D-glucose- 17-pseudoaglycone spinosyn A, D-quinovose- 17- pseudoaglycone spinosyn A, or L-digitoxosyl- 17-pseudoaglycone spinosyn A. 
The invention provides a compound of the formula 9-6-deoxy-D-glucosyl-spinosyn D, a 9-L- rhamnosyl-spinosyn D, a 9-L-digitoxosyl-spinosyn D, L-rhamnosyl- 17-pseudoaglycone spinosyn D, 6-deoxy-beta-D-glucose- 17-pseudoaglycone spinosyn D, D-quinovose- 17- pseudoaglycone spinosyn D, or L-digitoxosyl- 17-pseudoaglycone spinosyn D. In one embodiment is provided a spinosyn compound where the sugar moiety at the 9-position of a spinosyn aglycone or a pseudoaglycone or a spinosyn is selected from a group provided in Figure 31, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-Cl-C6 alkyl, C1-C6 alkylthio-Cl-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-Cl-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula compound M is C1-C6 alkyl, C1-C6 alkenyl, formyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl. The invention also provides a spinosyn compound where the sugar moiety at the 17-position of a spinosyn aglycone or a pseudoaglycone or a spinosyn is selected from a group provided in Figure 32, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-Cl-C6 alkyl, C1-C6 alkylthio-Cl-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-Cl-C6 alkylcarbonyl, halo Cl- C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula compound M is C1-C6 alkyl, C1-C6 alkenyl, formyl, C1-C6 alkylcarbonyl, or C3- C6 branched alkylcarbonyl. 
The invention also provides a novel macrolide having as one of its sugar moieties a group selected from Figure 31, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-Cl-C6 alkyl, C1-C6 alkylthio-Cl-C6 alkyl, halo Cl- C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-Cl-C6 alkylcarbonyl, halo Cl- C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula M is C1-C6 alkyl, C1-C6 alkenyl, formyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl. The invention also provides a novel macrolide having as one of its sugar moieties a group selected from Figure 32, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-Cl-C6 alkyl, C1-C6 alkylthio-Cl-C6 alkyl, halo Cl- C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-Cl-C6 alkylcarbonyl, halo Cl- C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula M is C1-C6 alkyl, C1-C6 alkenyl, formyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl. In one embodiment is provided a macrolide or a spinosyn compound or their pseudoaglycone having any one or more of the sugar moieties disclosed herein. In one embodiment is provided a heterologous in vivo glycosylation system and/or heterologous deoxy sugar pathway that generates at least one of the deoxy sugar compounds describe herein, a glycosyltransferase capable of transferring the at least one deoxy sugar compound described herein. Each combination is incorporated explicitly as appropriate for any particular macrolide. In another embodiment the sugar biosynthesis pathway and/or glycosyltransferase of the invention yield a spinosyn or a 9- or -17 pseudoaglycone having at least one of the deoxy sugar compounds described herein. 
Each pairwise combination is also incorporated explicitly, e.g., where the spinosyn has a compound of formula A of Figure 31 for its 9 position sugar and a compound of formula B of Figure 32 for its 17 position sugar. For example, the methods of the present invention can yield a spinosyn or pseudoaglycone where the sugar moiety at the 9 position of a spinosyn or a pseudoaglycone is selected from a group provided in Figure 31, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-Cl-C6 alkyl, C1-C6 alkylthio-Cl-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2- C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7- cycloalkylcarbonyl, C1-C6 alkoxy-Cl-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2- C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula M is C1-C6 alkyl, C1-C6 alkenyl, formyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl. In a further example the methods of the present invention can yield a spinosyn or pseudoaglycone where the sugar moiety at the 17 position of a spinosyn or a pseudoaglycone is selected from a group provided in Figure 32, wherein R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-Cl-C6 alkyl, C1-C6 alkylthio-Cl-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-Cl-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula M is C1-C6 alkyl, C1-C6 alkenyl, formyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl. 
The invention provides an insecticide comprising a glycosylated natural product of the invention, e.g., a natural product glycosylated by a method of the invention or made using an in vivo glycosylation system of the invention, e.g., 9-6-deoxy-D- glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl- 17- pseudo-aglycone spinosyn, 6-deoxy-D-glucose- 17-pseudoaglycone spinosyn, D- quinovose- 17-pseudoaglycone spinosyn, or a combination thereof, more preferably 9-6- deoxy-D-glucosyl-spinosyn A or D, 9-L-rhamnosyl-spinosyn A or D, 9-L-digitoxosyl- spinosyn A or D, L-digitoxosyl- 17-pseudo-aglycone spinosyn A or D, 6-deoxy-D- glucose- 17-pseudoaglycone spinosyn A or D, D-quinovose- 17-pseudoaglycone spinosyn A or D, or a combination thereof. The invention provides a pharmaceutical composition comprising a glycosylated natural product of the invention, e.g., a natural product glycosylated by a method of the invention or made using an in vivo glycosylation system of the invention, e.g., 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl- 17-pseudo-aglycone spinosyn, 6-deoxy-D-glucose- 17-pseudoaglycone spinosyn, D-quinovose- 17-pseudoaglycone spinosyn, or a combination thereof, more preferably 9-6-deoxy-D-glucosyl-spinosyn A or D, 9-L-rhamnosyl-spinosyn A or D, 9-L- digitoxosyl-spinosyn A or D, L-digitoxosyl- 17-pseudo-aglycone spinosyn A or D, 6- deoxy-D-glucose- 17-pseudoaglycone spinosyn A or D, D-quinovose- 17-pseudoaglycone spinosyn A or D, or a combination thereof. 
The invention provides methods of preventing or treating or ameliorating an infection in a cell or an organism or a plant comprising application of an effective amount of a composition comprising a glycosylated natural product of the invention, e.g., a natural product glycosylated by a method of the invention or made using an in vivo or in vitro glycosylation system of the invention, e.g., comprising 9-6-deoxy-D-glucosyl- spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl- 17-pseudoaglycone spinosyn, 6-deoxy-D-glucose- 17-pseudoaglycone spinosyn, D-quinovose- 17- pseudoaglycone spinosyn, or a combination thereof, more preferably 9-6-deoxy-D- glucosyl-spinosyn A or D, 9-L-rhamnosyl-spinosyn A or D, 9-L-digitoxosyl-spinosyn A or D, L-digitoxosyl- 17-pseudo-aglycone spinosyn A or D, 6-deoxy-D-glucose- 17- pseudoaglycone spinosyn A or D, D-quinovose- 17-pseudoaglycone spinosyn A or D, or a combination thereof. In one aspect, the cell is a plant cell, an insect cell, a fungal cell, a mammalian cell. The plant cell can be from the genera Anacardium, Arachis, Asparagus,
Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panieum, Pannesetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna or Zea, or the plant can be an angiosperm or a gymnosperm or a monocot or a dicot. The invention provides kits comprising a 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudo-aglycone spinosyn, 6-deoxy-D-glucose-17-pseudoaglycone spinosyn, D-quinovose-17-pseudoaglycone spinosyn, or a combination thereof, more preferably 9-6-deoxy-D-glucosyl-spinosyn A or D, 9-L-rhamnosyl-spinosyn A or D, 9-L-digitoxosyl-spinosyn A or D, L-digitoxosyl-17-pseudo-aglycone spinosyn A or D, 6-deoxy-D-glucose-17-pseudoaglycone spinosyn A or D, D-quinovose-17-pseudoaglycone spinosyn A or D, or a combination thereof. Also provided in each of the embodiments herein are 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudo-aglycone spinosyn, 6-deoxy-D-glucose-17-pseudoaglycone spinosyn, D-quinovose-17-pseudoaglycone spinosyn, or a combination thereof, where the spinosyn is any one of the 21-butenyl spinosyns as described herein.
In one aspect, the invention provides methods for treating a plant with a novel compound of the invention, e.g., a novel spinosyn, e.g., as an insecticide in or on a plant, such as a plant from the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot,
Majorana, Medicago, Nicotiana, Olea, Oryza, Panieum, Pannesetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna and/or Zea. The plant can be an angiosperm or a gymnosperm. The plant can be a monocot or a dicot. In one aspect, the plant is a transgenic plant. In the methods and systems of the invention, nucleic acids encoding a polypeptides comprising glycosyltransferase and/or deoxysugar pathways (including the glycosyltransferase and/or deoxysugar pathways of the invention) can comprise an expression cassette, e.g., comprising a polypeptide-encoding nucleic acid operatively linked to a promoter. The nucleic acid can be operatively linked to any kind of promoter, such as an inducible promoter, a constitutive promoter and/or a tissue specific or developmentally or environmentally regulated promoter. The promoter can be a plant promoter (e.g., promoters endogenous to or active in plants), such as a cauliflower mosaic virus (CaMV) 35S transcription initiation region or a 1'- or 2'- promoter derived from T- DNA of Agrobacterium tumefaciens. The promoter can be an inducible plant promoter. The inducible promoter can be responsive to an environmental condition, such as an anaerobic condition, elevated temperature, the presence of light or a chemical. In one aspect, a plant is exposed to a chemical to induce the promoter. In one aspect, the plant promoter is a maize In2-2 promoter that is activated by a benzenesulfonamide herbicide. In alternative aspects, a plant or plant part is sprayed or otherwise treated (e.g., dipped, painted, etc.) with a chemical (e.g., in a solution) to induce the promoter. For example, the entire plant, or seeds, fruits, leaves, roots, tubers and the like, can be treated, e.g., sprayed. Plant parts, e.g., leaves, roots, tubers, fruits or seeds, can be sprayed after harvesting from the plant. 
Similarly, in alternative aspects, a plant or plant part can be sprayed or otherwise treated (e.g., dipped, painted, etc.) with a composition (e.g., a solution) comprising a natural product of the invention. For example, the entire plant, or seeds, fruits, leaves, roots, tubers and the like, can be treated, e.g., sprayed. Plant parts, e.g., leaves, roots, tubers, fruits or seeds, can be sprayed or otherwise treated after harvesting from the plant. In the systems and methods of the invention, the nucleic acid encoding polypeptides comprising a glycosylation system of the invention can comprise an expression vector. The nucleic acid can further comprise any kind of expression vector, e.g., the expression vector can comprise nucleic acid derived from a bacteria, a virus or a transposable element or derivatives thereof, e.g., Agrobacterium spp., potato virus X, tobacco mosaic virus, tomato bushy stunt virus, tobacco etch virus, bean golden mosaic virus, cauliflower mosaic virus, maize Ac/Ds transposable element, maize suppressor mutator (Spm) transposable element or derivatives thereof. The invention provides methods for screening for a composition having an insecticidal or anti-microbial activity comprising the following steps: (a) providing a composition of the invention; (b) providing a test cell or organism; (c) reacting the composition of step (a) with the test cell or organism; and (d) monitoring insecticidal or anti-microbial activity, thereby determining that the composition has a insecticidal or anti-microbial activity. The test cell or organism can be derived from a biological sample, e.g., a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a plant cell, a fungal cell or a mammalian cell. In one aspect of the methods of screening of the invention, at least one step, or, all of the steps, are conducted in a reaction vessel. 
At least one step, or, all of the steps, can be conducted in a cell extract, and/or in an intact cell, or a combination thereof. The reaction vessel can comprise a microtiter plate, e.g., a capillary tube or a capillary array, such as a GIGAMATRIX™ array. Monitoring production of the insecticidal or anti-microbial product can be by a growth selection assay or equivalent. In one aspect of the methods of screening of the invention, the test cell or organism comprises a cell extract or a cell fraction, e.g., a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a plant cell, a fungal cell or a mammalian cell. The invention provides transgenic plants (including parts of the plants, e.g., seeds, leaves, fruits, roots and the like) and transformed plant cells and seeds comprising a glycosyltransferase or a glycosylation system of the invention. The invention provides kits comprising a glycosyltransferase or a glycosylation system of the invention. The kit can further comprise instructions for using the kit, e.g., instructions comprising how to use the methods and compositions of the invention. The invention provides methods of treating a plant, or, making a composition of the invention in a plant, comprising the following steps: (a) introducing nucleic acids encoding polypeptides comprising a glycosylation system of the invention, wherein the nucleic acids are operably linked to a promoter; and (b) expressing the polypeptides, thereby treating a plant, or, making a composition of the invention in a plant. In one aspect, the promoter is an inducible promoter, or, a constitutive promoter. The plant can be a monocot or a dicot. The monocot can be selected from the group consisting of maize, corn, sorghum and rice. In one aspect, the plant is a transgenic plant comprising the nucleic acid. In one aspect, the nucleic acid further comprises an expression vector, a recombinant virus and the like.
The invention provides methods for treating a cell, a plant, an organism, a food or a feed, or an object, comprising the following steps: providing a composition of the invention, and, contacting the composition with cell, plant, organism, food or feed, or object. The composition can be provided by treating, e.g., spraying, painting, dipping, etc., with a formulation comprising the composition. The composition can comprise a plant or a plant part, e.g., a seed, fruit, root, leaf, tuber and the like. In another aspect, the food or feed comprises an animal feed, feed supplement, an animal grain, a food or a food additive. The invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%,
73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary nucleic acid of the invention, e.g., SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31; SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142; SEQ ID NO:146. The sequence identities can be determined, e.g., by analysis with a sequence comparison algorithm or by a visual inspection. In one aspect, the sequence comparison algorithm is a BLAST version 2.2.2 algorithm where a filtering setting is set to blastall -p blastn -d "nr patnt" -F F, and all other options are set to default.
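As a rough illustration of what a percent-identity threshold means, the following Python sketch computes identity over a pairwise alignment that has already been produced by some other tool. The function names, the "-" gap character, and the use of alignment length as the denominator are assumptions made for illustration only; BLAST reports its own identity figure, and other conventions for the denominator exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: both inputs are rows of the same alignment, with '-'
    marking gaps. Columns where both rows are gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    # A column counts as a match only if both residues are equal and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if the alignment reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
    print(meets_threshold("ACGT", "TGCA"))
```

The default threshold of 50.0 mirrors the lowest value recited in the text; any of the other recited percentages could be passed instead.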
In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29, wherein nucleic acids having at least about 50% sequence identity to SEQ ID NO:1 and/or SEQ ID NO:3 encode a polypeptide (e.g., SEQ ID NO:2, SEQ ID NO:4) having a gtt, or nucleotidyl transferase activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:5 and/or SEQ ID NO:7 encode a polypeptide (e.g., SEQ ID NO:6, SEQ ID NO:8) having a gdh, or dNDP-4,6-dehydratase activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:9 and/or SEQ ID NO:11 encode a polypeptide (e.g., SEQ ID NO:10, SEQ ID NO:12) having an epi, or 3,5-epimerase activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:13 and/or SEQ ID NO:15 encode a polypeptide (e.g., SEQ ID NO:14, SEQ ID NO:16) having a kre, or 4-ketoreductase activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:17 encode a polypeptide (e.g., SEQ ID NO:18) having a tdh, or 2,3-dehydratase activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:19 encode a polypeptide (e.g., SEQ ID NO:20) having a tkr, or 3-ketoreductase activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:21 encode a polypeptide (e.g., SEQ ID NO:22) having a glycosyl transferase (spnG) activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:23 encode a polypeptide (e.g., SEQ ID NO:24) having a glycosyl transferase (spnP) activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:25 encode a polypeptide (e.g., SEQ ID NO:26) having a methyltransferase (spnS) activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:27 encode a polypeptide (e.g., SEQ ID NO:28) having an amino transferase (spnR) activity; nucleic acids having at least about 50% sequence identity to SEQ ID NO:29 encode a polypeptide (e.g., SEQ ID NO:30) having a 3,4 dehydratase (spnQ) activity; for example, as summarized in Table 1:
Table 1
Nucleic acid SEQ ID NO: | Amino acid SEQ ID NO: | Gene name | Initial gene source | Activity of polypeptide encoded by gene
1 | 2 | gtt | S. diversa | Gtt (nucleotidyl transferase)
3 | 4 | gtt | S. spinosa | Gtt (nucleotidyl transferase)
5 | 6 | gdh | S. diversa | Gdh (dNDP-4,6-dehydratase)
7 | 8 | gdh | S. spinosa | Gdh (dNDP-4,6-dehydratase)
9 | 10 | epi | S. diversa | Epi (3,5-epimerase)
11 | 12 | epi | S. spinosa | Epi (3,5-epimerase)
13 | 14 | kre | S. diversa | Kre (4-ketoreductase)
15 | 16 | kre | S. spinosa | Kre (4-ketoreductase)
17 | 18 | tdh | S. diversa | Tdh (2,3-dehydratase)
19 | 20 | tkr | S. diversa | Tkr (3-ketoreductase)
21 | 22 | G | S. spinosa | Glycosyl transferase (spnG)
23 | 24 | P | S. spinosa | Glycosyl transferase (spnP)
25 | 26 | S | S. spinosa | Methyltransferase (spnS)
27 | 28 | R | S. spinosa | Amino transferase (spnR)
29 | 30 | Q | S. spinosa | 3,4 dehydratase (spnQ)
 |  | O |  | 2,3 dehydratase
 |  | N |  | 3 keto-reductase
 |  | K |  | O-methyl transferase
 |  | I |  | O-methyl transferase
 |  | H |  | O-methyl transferase
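For readers who want to work with Table 1 programmatically, the sketch below encodes its numbered rows as a small Python lookup. The type name, field names, and helper function are hypothetical, the spn gene labels combine the single-letter gene names with the spn designations given in the text, and the final rows of the table that carry no SEQ ID NO are left out.

```python
from typing import List, NamedTuple


class Table1Row(NamedTuple):
    nucleic_acid_seq_id: int
    amino_acid_seq_id: int
    gene: str
    source: str
    activity: str


# Rows of Table 1 that carry SEQ ID NOs; the unnumbered rows are omitted.
TABLE_1 = (
    Table1Row(1, 2, "gtt", "S. diversa", "nucleotidyl transferase"),
    Table1Row(3, 4, "gtt", "S. spinosa", "nucleotidyl transferase"),
    Table1Row(5, 6, "gdh", "S. diversa", "dNDP-4,6-dehydratase"),
    Table1Row(7, 8, "gdh", "S. spinosa", "dNDP-4,6-dehydratase"),
    Table1Row(9, 10, "epi", "S. diversa", "3,5-epimerase"),
    Table1Row(11, 12, "epi", "S. spinosa", "3,5-epimerase"),
    Table1Row(13, 14, "kre", "S. diversa", "4-ketoreductase"),
    Table1Row(15, 16, "kre", "S. spinosa", "4-ketoreductase"),
    Table1Row(17, 18, "tdh", "S. diversa", "2,3-dehydratase"),
    Table1Row(19, 20, "tkr", "S. diversa", "3-ketoreductase"),
    Table1Row(21, 22, "spnG", "S. spinosa", "glycosyl transferase"),
    Table1Row(23, 24, "spnP", "S. spinosa", "glycosyl transferase"),
    Table1Row(25, 26, "spnS", "S. spinosa", "methyltransferase"),
    Table1Row(27, 28, "spnR", "S. spinosa", "amino transferase"),
    Table1Row(29, 30, "spnQ", "S. spinosa", "3,4 dehydratase"),
)

# Index by nucleic acid SEQ ID NO for direct lookup.
BY_NUCLEIC_ACID_ID = {row.nucleic_acid_seq_id: row for row in TABLE_1}


def rows_with_activity(activity: str) -> List[Table1Row]:
    """Return every Table 1 row whose encoded polypeptide has this activity."""
    return [row for row in TABLE_1 if row.activity == activity]


if __name__ == "__main__":
    print(BY_NUCLEIC_ACID_ID[5])
    print([row.source for row in rows_with_activity("3,5-epimerase")])
```

In this encoding each amino acid SEQ ID NO is the nucleic acid SEQ ID NO plus one, which matches the pairing used throughout the table.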
In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:31, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt gene and at least one gdh gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:32 and SEQ ID NO:33. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:34, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, and SEQ ID NO:38. See pathway 1, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:39, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, and SEQ ID NO:43. See pathway 2, cross-referenced in Table 2, below.
In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:44, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:45, SEQ ID NO:46, and SEQ ID NO:47. See pathway 2-epi, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:48, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt, gdh and epi gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:49, SEQ ID NO:50, and SEQ ID NO:51. See pathway 2-kre, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:52, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, and SEQ ID NO:56. See pathway 3, cross-referenced in Table 2, below. 
In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:57, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt, gdh and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:58, SEQ ID NO:59, and SEQ ID NO:60. See pathway 3-epi, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:61, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64 and SEQ ID NO:65. See pathway 4, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:66, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt, gdh and epi gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:67, SEQ ID NO:68 and SEQ ID NO:69. See pathway 4-kre, cross-referenced in Table 2, below.
In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:70, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73 and SEQ ID NO:74. See pathway 6, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:75, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, and SEQ ID NO:81. See pathway 9, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:89, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, and SEQ ID NO:95. See pathway 11, cross-referenced in Table 2, below. 
In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:96, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:97, SEQ ID NO:98, SEQ ID NO:99, SEQ ID NO:100, SEQ ID NO:101, and SEQ ID NO:102. See pathway 12, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:103, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:104, SEQ ID NO:105, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:108 and SEQ ID NO:109. See pathway 13, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:110, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, spnR, spnS, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:111, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:115, SEQ ID NO:116 and SEQ ID NO:117. See pathway 17, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:118, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, spnR, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123 and SEQ ID NO:124. See pathway 18, cross-referenced in Table 2, below.
In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:125, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:119, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130 and SEQ ID NO:131. See pathway 19, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:132, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, kre, tdh and tkr gene, e.g., in one aspect, encoding polypeptides having a sequence as set forth in SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137 and SEQ ID NO:138. See pathway 20, cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:139, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one nucleic acid encoding SEQ ID NO:140 and SEQ ID NO:141. See the nucleic acid of SEQ ID NO:139, designated the plasmid (vector) "pAT6", cross-referenced in Table 2, below. In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:142, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one nucleic acid encoding SEQ ID NO:143, SEQ ID NO:144, and SEQ ID NO:145. See the nucleic acid of SEQ ID NO:142, designated the plasmid (vector) "pUWL-spnP", cross-referenced in Table 2, below.
In one aspect, the invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50% (or more, as noted above), or complete (100%) sequence identity to SEQ ID NO:146, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one nucleic acid encoding SEQ ID NO:147, SEQ ID NO:148, and SEQ ID NO:149. See the nucleic acid of SEQ ID NO:146, designated the plasmid (vector) "pUWL-spnG", cross-referenced in Table 2, below. In one embodiment, the pathways herein comprise at least one gtt gene and at least one gdh gene. In another embodiment, all or a part of the pathway, preferably one, two, three or four genes of the pathway, is provided by endogenous genes of the host cell. In one example, the kre gene is provided by the host cell rather than by the heterologous pathway. In another embodiment, all or part of the endogenous pathway genes, preferably one, two, three or four endogenous host pathway genes, are overexpressed to at least two-fold above, or inactivated to at least two-fold below, wild-type levels. In one embodiment, the endogenous host pathway gene or genes are selected to complement any of the pathway activities provided herein. In one embodiment of the glycosylation system, the host cell or cell extract therefrom, particularly an actinomycete cell, comprises one or more inactivated, defective or inhibited endogenous glycosyltransferase and/or deoxysugar pathway genes or enzymes. The endogenous gene or enzyme can be inactivated by any number of means, including gene disruption, antisense, inhibitory RNA (e.g., iRNA), a regulatory mutation or a structural gene mutation (a mutation in the gene itself).
In one embodiment the host cell, e.g., actinomycete host cell, or cell extract therefrom comprises two, three or four inactivated, defective or inhibited endogenous glycosyltransferases and/or deoxysugar pathway enzymes. As is known in the art, the one or more endogenous biosynthetic pathway genes can be part of a macrolide biosynthetic pathway that produces the macrolide target of the invention. The inactivated, defective or inhibited endogenous glycosyltransferase and/or deoxysugar pathway gene or enzyme can result in the host cell providing an aglycone, pseudoaglycone, unnatural sugar or modified sugar compared to a wild-type cell or extract. Such aglycones, pseudoaglycones and/or modified sugars are substrates for the pathways and/or glycosyltransferases of the invention. In some embodiments the engineered host cell or cell extract comprises at least two heterologous glycosyltransferases and/or at least two heterologous deoxysugar pathways. In one embodiment at least one biosynthetic deoxysugar pathway gene or enzyme and/or a glycosyltransferase of a system is provided by and endogenous to the host cell. In another embodiment two, three or four pathway genes or enzymes are provided by and endogenous to the host cell. In another embodiment the one or more endogenous genes are activated, e.g., induced, constitutive or overexpressed, or are inactivated. In yet another embodiment the engineered host cell comprises one or more genes of a heterologous macrolide biosynthetic pathway, thereby producing the heterologous macrolide, or an aglycone or pseudoaglycone thereof. In one embodiment of the invention a combinatorial cell library comprising novel macrolides is provided that comprises at least two, pooled or separate, engineered host cells of the invention that are of different species, strains or mutants and contain the same heterologous biosynthetic deoxysugar pathway genes and/or heterologous glycosyltransferase.
By providing the same heterologous system of the invention in a plurality of different genetic backgrounds, a library is obtained that can provide a variety of modified or improved macrolides of interest. Of particular interest are host cells that differ in their ability to modify, i.e., decorate, a sugar moiety, for example by carbamylation, O-methylation, O-alkylation, N-methylation, C-methylation, nitrosylation, amination, or deoxygenation. The library can be screened for production of a modified macrolide of interest.
Table 2
Nucleic acid SEQ ID NO | Amino acid SEQ ID NO | Vector name (Appendix A) | Pathway name | AKA pathway name | Pathway produces | Pathway genes (initial gene source)
31 | 32, 33 | | gtt-gdh | | 6-deoxyhexose | gtt (S. spinosa), gdh (S. spinosa)
34 | 35-38 | | 1 | | 6-deoxysugar | gtt (S. spinosa), gdh (S. spinosa), epi (S. spinosa), kre (S. spinosa)
39 | 40-43 | | 2 | B | 6-deoxysugar | gtt (S. spinosa), gdh (S. spinosa), epi (S. spinosa), kre (S. diversa)
44 | 45-47 | | 2-epi | B-epi | 6-deoxyhexose | gtt (S. spinosa), gdh (S. spinosa), kre (S. diversa)
48 | 49-51 | | 2-kre | B-kre | 6-deoxyhexose | gtt (S. spinosa), gdh (S. spinosa), epi (S. spinosa)
52 | 53-56 | | 3 | C | 6-deoxysugar | gtt (S. spinosa), gdh (S. spinosa), epi (S. diversa), kre (S. spinosa)
57 | 58-60 | | 3-epi | C-epi | 6-deoxyhexose | gtt (S. spinosa), gdh (S. spinosa), kre (S. spinosa)
61 | 62-65 | | 4 | D | 6-deoxysugar | gtt (S. spinosa), gdh (S. spinosa), epi (S. diversa), kre (S. diversa)
66 | 67-69 | | 4-kre | D-kre | 6-deoxyhexose | gtt (S. spinosa), gdh (S. spinosa), epi (S. diversa)
70 | 71-74 | | 6 | | 6-deoxysugar | gtt (S. diversa), gdh (S. diversa), epi (S. spinosa), kre (S. diversa)
75 | 76-81 | | 9 | | 2,6 dideoxysugar | gtt (S. spinosa), gdh (S. spinosa), epi (S. spinosa), kre (S. spinosa), tdh (S. diversa), tkr (S. diversa)
82 | 83-88 | | 10 | F | 2,6 dideoxysugar | gtt (S. spinosa), gdh (S. spinosa), epi (S. spinosa), kre (S. diversa), tdh (S. diversa), tkr (S. diversa)
89 | 90-95 | | 11 | G | 2,6 dideoxysugar | gtt (S. spinosa), gdh (S. spinosa), epi (S. diversa), kre (S. spinosa), tdh (S. diversa), tkr (S. diversa)
96 | 97-102 | | 12 | H | 2,6 dideoxysugar | gtt (S. spinosa), gdh (S. spinosa), epi (S. diversa), kre (S. diversa), tdh (S. diversa), tkr (S. diversa)
103 | 104-109 | | 13 | | 2,6 dideoxysugar | gtt (S. diversa), gdh (S. diversa), epi (S. spinosa), kre (S. spinosa), tdh (S. diversa), tkr (S. diversa)
110 | 111-117 | | 17 | I | | gtt (S. spinosa), gdh (S. spinosa), spnQ (S. spinosa), R (S. spinosa), S (S. spinosa), tdh (S. diversa), tkr (S. diversa)
118 | 119-124 | | 18 | J | | gtt (S. spinosa), gdh (S. spinosa), spnQ (S. spinosa), R (S. spinosa), tdh (S. diversa), tkr (S. diversa)
125 | 126-131 | | 19 | K | | gtt (S. spinosa), gdh (S. spinosa), spnQ (S. spinosa), kre (S. spinosa), tdh (S. diversa), tkr (S. diversa)
132 | 133-138 | | 20 | L | | gtt (S. spinosa), gdh (S. spinosa), spnQ (S. spinosa), kre (S. diversa), tdh (S. diversa), tkr (S. diversa)
139 | 140-141 | pAT6 | | | |
142 | 143-145 | pUWL-spnP | | | |
146 | 147-149 | pUWL-spnG | | | |
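One consistency property of Table 2 can be checked mechanically: each pathway row lists a range of amino acid SEQ ID NOs and a list of genes, and the two counts would be expected to agree. The following is a minimal sketch of that check over a hand-transcribed subset of rows. The dictionary layout and the function names are illustrative assumptions and are not part of the specification; the abbreviations R and S are written out as spnR and spnS.

```python
# Subset of Table 2 rows, transcribed by hand for illustration only.
# Key: nucleic acid SEQ ID NO. Value: ((first, last) amino acid SEQ ID NO, gene list).
# This layout is an assumption made for the sketch, not a format used by the source.
TABLE_2_SUBSET = {
    31: ((32, 33), ["gtt", "gdh"]),
    34: ((35, 38), ["gtt", "gdh", "epi", "kre"]),
    44: ((45, 47), ["gtt", "gdh", "kre"]),
    75: ((76, 81), ["gtt", "gdh", "epi", "kre", "tdh", "tkr"]),
    110: ((111, 117), ["gtt", "gdh", "spnQ", "spnR", "spnS", "tdh", "tkr"]),
    118: ((119, 124), ["gtt", "gdh", "spnQ", "spnR", "tdh", "tkr"]),
}


def polypeptide_count(first: int, last: int) -> int:
    """Number of SEQ ID NOs in the inclusive range first..last."""
    return last - first + 1


def mismatched_rows(table: dict) -> list:
    """Return nucleic acid SEQ ID NOs whose polypeptide count differs from the gene count."""
    return [
        nucleic_id
        for nucleic_id, ((first, last), genes) in sorted(table.items())
        if polypeptide_count(first, last) != len(genes)
    ]


if __name__ == "__main__":
    print(mismatched_rows(TABLE_2_SUBSET))
```

The check only compares counts; it does not assume that the polypeptide SEQ ID NOs are listed in the same order as the genes.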
The deoxysugar biosynthetic pathways of the invention comprise:
Deoxysugar biosynthetic pathway 1 (SEQ ID NO:34)
General Description: DNA pathway 1. Entire molecule length: 3536 bp
Feature Map: CDS (4 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-SP Start: 1962 End: 2570 Original Location Description: 719..1328
kre-SP Start: 2600 End: 3517 Original Location Description: 1358..2275
Figure imgf000025_0001
No cuts: ClaI, NcoI
Deoxysugar biosynthetic pathway 2 (SEQ ID NO:39)
General Description: DNA pathway 2. Entire molecule length: 3633 bp
Feature Map: CDS (4 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-SP Start: 1962 End: 2570 Original Location Description: 719..1328
kre-DS Start: 2599 End: 3618 Original Location Description: 1357..2376
Restriction/Methylation Map
Figure imgf000026_0001
No cuts: ClaI, NcoI
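The feature maps in this section give each coding sequence (CDS) as a start and end position within the assembled molecule. As a minimal sketch, the coordinates listed for pathway 2 can be checked for three basic properties: each CDS length is a multiple of three, the CDSs do not overlap, and all lie within the stated molecule length. The sketch assumes the coordinates are 1-based and inclusive, which the text does not state; the class and function names are illustrative.

```python
from typing import List, NamedTuple


class CDS(NamedTuple):
    name: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


# Values copied from the pathway 2 feature map above.
MOLECULE_LENGTH = 3633
PATHWAY_2 = [
    CDS("gtt-SP", 30, 911),
    CDS("gdh-SP", 939, 1928),
    CDS("epi-SP", 1962, 2570),
    CDS("kre-DS", 2599, 3618),
]


def cds_length(cds: CDS) -> int:
    """Length in base pairs under the inclusive-coordinate assumption."""
    return cds.end - cds.start + 1


def validate(features: List[CDS], molecule_length: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    previous_end = 0
    for cds in sorted(features, key=lambda f: f.start):
        if cds_length(cds) % 3 != 0:
            problems.append(f"{cds.name}: length {cds_length(cds)} is not a multiple of 3")
        if cds.start <= previous_end:
            problems.append(f"{cds.name}: overlaps the preceding CDS")
        if cds.end > molecule_length:
            problems.append(f"{cds.name}: extends past the end of the molecule")
        previous_end = cds.end
    return problems


if __name__ == "__main__":
    for cds in PATHWAY_2:
        print(cds.name, cds_length(cds), "bp")
    print(validate(PATHWAY_2, MOLECULE_LENGTH))
```

The same check could be run against any of the other feature maps by substituting their coordinates and molecule length.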
Deoxysugar biosynthetic pathway 2-epi (SEQ ID NO:44)
General Description: DNA pathway 2-epi. Entire molecule length: 2996 bp
Feature Map: CDS (3 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
kre-DS Start: 1962 End: 2981 Original Location Description: 1357..2376
Figure imgf000027_0001
No cuts: ClaI, NcoI
Deoxysugar biosynthetic pathway 2-kre (SEQ ID NO:48)
General Description: DNA pathway 2-kre. Entire molecule length: 2593 bp
Feature Map: CDS (3 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-SP Start: 1962 End: 2570 Original Location Description: 719..1328
Restriction/Methylation Map
Figure imgf000027_0002
Figure imgf000028_0001
No cuts: ClaI, NcoI
Deoxysugar biosynthetic pathway 3 (SEQ ID NO:52)
General Description: DNA pathway 3. Entire molecule length: 3528 bp
Feature Map: CDS (4 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-DS Start: 1967 End: 2563 Original Location Description: 725..1321
kre-SP Start: 2592 End: 3509 Original Location Description: 1358..2275
Figure imgf000028_0002
No cuts: ClaI, NcoI, SmaI, XmaI
Deoxysugar biosynthetic pathway 3-epi (SEQ ID NO:57)
General Description: DNA pathway 3-epi. Entire molecule length: 2898 bp
Feature Map: CDS (3 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
kre-SP Start: 1963 End: 2880 Original Location Description: 1358..2275
Figure imgf000029_0001
No cuts: ClaI, NcoI, SmaI, XmaI
Deoxysugar biosynthetic pathway 4 (SEQ ID NO:61)
General Description: DNA pathway 4. Entire molecule length: 3625 bp
Feature Map: CDS (4 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-DS Start: 1967 End: 2563 Original Location Description: 725..1321
kre-DS Start: 2591 End: 3610 Original Location Description: 1349..2368
Restriction/Methylation Map
Figure imgf000030_0001
No cuts: ClaI, NcoI
Deoxysugar biosynthetic pathway 4-kre (SEQ ID NO:66)
General Description: DNA pathway 4-kre. Entire molecule length: 2583 bp
Feature Map: CDS (3 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-DS Start: 1967 End: 2563 Original Location Description: 725..1321
Figure imgf000030_0002
No cuts: ClaI, NcoI, SmaI, XmaI
Deoxysugar biosynthetic pathway 6 (SEQ ID NO:70)
General Description: DNA pathway 6. Entire molecule length: 3727 bp
Feature Map: CDS (4 total)
gtt-DS Start: 79 End: 948 Streptomyces diversa gtt gene Original Location Description: 73..942
gdh-DS Start: 1049 End: 2017 Streptomyces diversa gdh gene Original Location Description: 1043..2011
epi-SP Start: 2056 End: 2664 Original Location Description: 2049..2658
kre-DS Start: 2693 End: 3712 Original Location Description: 2687..3706
Figure imgf000031_0001
No cuts: ClaI, EcoRI, NcoI, PstI
Deoxysugar biosynthetic pathway 9 (SEQ ID NO:75)
General Description: DNA pathway 9. Entire molecule length: 6010 bp
Feature Map: CDS (6 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-SP Start: 1962 End: 2570 Original Location Description: 719..1328
kre-SP Start: 2600 End: 3517 Original Location Description: 1358..2275
tdh-DS Start: 3545 End: 4960 Original Location Description: 25..1440
tkr-DS Start: 4995 End: 5996 Original Location Description: 24..1025
Figure imgf000032_0001
Deoxysugar biosynthetic pathway 10 (SEQ ID NO:82)
General Description: DNA pathway 10. Entire molecule length: 6108 bp
Feature Map: CDS (6 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-SP Start: 1962 End: 2570 Original Location Description: 719..1328
kre-DS Start: 2599 End: 3618 Original Location Description: 1357..2376
tdh-DS Start: 3643 End: 5058 Original Location Description: 25..1440
tkr-DS Start: 5093 End: 6094 Original Location Description: 24..1025
Figure imgf000033_0001
No cuts: ClaI, NcoI
Deoxysugar biosynthetic pathway 11 (SEQ ID NO:89)
General Description: DNA pathway 11. Entire molecule length: 6002 bp
Feature Map: CDS (6 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-DS Start: 1967 End: 2563 Original Location Description: 725..1321
kre-SP Start: 2592 End: 3509 Original Location Description: 1350..2267
tdh-DS Start: 3537 End: 4952 Original Location Description: 25..1440
tkr-DS Start: 4987 End: 5988 Original Location Description: 24..1025
Figure imgf000034_0001
No cuts: ClaI, NcoI
Deoxysugar biosynthetic pathway 12 (SEQ ID NO:96)
General Description: DNA pathway 12. Entire molecule length: 6100 bp
Feature Map: CDS (6 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
epi-DS Start: 1967 End: 2563 Original Location Description: 725..1321
kre-DS Start: 2591 End: 3610 Original Location Description: 1349..2368
tdh-DS Start: 3635 End: 5050 Original Location Description: 25..1440
tkr-DS Start: 5085 End: 6086 Original Location Description: 24..1025
Restriction/Methylation Map
Figure imgf000035_0001
No cuts: ClaI, NcoI
Deoxysugar biosynthetic pathway 13 (SEQ ID NO:103)
General Description: DNA pathway 13. Entire molecule length: 6104 bp
Feature Map: CDS (6 total)
gtt-DS Start: 79 End: 948 Streptomyces diversa gtt gene Original Location Description: 72..941
gdh-DS Start: 1049 End: 2017 Streptomyces diversa gdh gene Original Location Description: 1042..2010
epi-SP Start: 2056 End: 2664 Original Location Description: 2048..2657
kre-SP Start: 2694 End: 3611 Original Location Description: 2687..3604
tdh-DS Start: 3639 End: 5054 Original Location Description: 3632..5047
tkr-DS Start: 5089 End: 6090 Original Location Description: 5082..6083
Restriction/Methylation Map Enzyme # of cuts Positions
Figure imgf000036_0001
Deoxysugar biosynthetic pathway 17 (SEQ ID NO:110)
General Description: DNA pathway 17. Entire molecule length: 7811 bp
Feature Map: CDS (7 total)
gtt-SP Start: 24 End: 905 Original Location Description: 712..1593
gdh-SP Start: 933 End: 1922 Original Location Description: 1621..2610
spnQ-SP Start: 1964 End: 3352 spnQ from Spinosyn cluster Original Location Description: complement(1988..3373)
spnR-SP Start: 3390 End: 4547 spnR from Spinosyn cluster Original Location Description: complement(793..1947)
spnS-SP Start: 4550 End: 5299 spnS from Spinosyn cluster Original Location Description: complement(1..793)
tdh-DS Start: 5346 End: 6761 Original Location Description: 25..1440
tkr-DS Start: 6796 End: 7797 Original Location Description: 24..1025
Restriction/Methylation Map
Figure imgf000037_0001
No cuts: ClaI, NcoI
Deoxysugar biosynthetic pathway 18 (SEQ ID NO:118)
General Description: DNA pathway 18. Entire molecule length: 7065 bp
Feature Map: CDS (6 total)
gtt-SP Start: 24 End: 905 Original Location Description: 712..1593
gdh-SP Start: 933 End: 1922 Original Location Description: 1621..2610
spnQ-SP Start: 1964 End: 3352 spnQ from Spinosyn cluster Original Location Description: complement(1242..2627)
spnR-SP Start: 3390 End: 4547 spnR from Spinosyn cluster Original Location Description: complement(47..1201)
tdh-DS Start: 4600 End: 6015 Original Location Description: 25..1440
tkr-DS Start: 6050 End: 7051 Original Location Description: 24..1025
Figure imgf000037_0002
Figure imgf000038_0001
Deoxysugar biosynthetic pathway 19 (SEQ ID NO:125)
General Description: DNA pathway 19. Entire molecule length: 6825 bp
Feature Map: CDS (6 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
spnQ-SP Start: 1970 End: 3355 spnQ from Spinosyn cluster Original Location Description: complement(44..1429)
kre-SP Start: 3415 End: 4332 Original Location Description: 1350..2267
tdh-DS Start: 4360 End: 5775 Original Location Description: 25..1440
tkr-DS Start: 5810 End: 6811 Original Location Description: 24..1025
Figure imgf000038_0002
XmaI 4 5470 5901 6321 6723
No cuts: ClaI, NcoI
Deoxysugar biosynthetic pathway 20 (SEQ ID NO:132)
General Description: DNA pathway 20. Entire molecule length: 6923 bp
Feature Map: CDS (6 total)
gtt-SP Start: 30 End: 911 Original Location Description: 712..1593
gdh-SP Start: 939 End: 1928 Original Location Description: 1621..2610
spnQ-SP Start: 1970 End: 3355 spnQ from Spinosyn cluster Original Location Description: complement(44..1429)
kre-DS Start: 3414 End: 4433 Original Location Description: 1349..2368
tdh-DS Start: 4458 End: 5873 Original Location Description: 25..1440
tkr-DS Start: 5908 End: 6909 Original Location Description: 24..1025
Figure imgf000039_0001
No cuts: ClaI, NcoI
Plasmid pAT6 (SEQ ID NO:139)
General Description: DNA pAT6. Entire molecule length: 6013 bp
Feature Map: CDS (2 total)
CDS(aac(3)IV)_2 Start: 2027 End: 2803 Original Location Description: 1729..2505 Qualifiers: /gene="aac(3)IV" /function="apramycin resistance" /codon_start=1 /transl_table=11 /product="apramycin resistance gene" /protein_id="CAC93947.1" /db_xref="GI:17974211"
CDS(int)_3 Start: 4152 End: 5993 Original Location Description: 3854..5695 Qualifiers: /gene="int" /function="phiC31 integrase" /codon_start=1 /transl_table=11 /product="integrase" /protein_id="CAC93948.1" /db_xref="GI:17974212"
Misc. Feature (1 total)
attP site Start: 4097 End: 4135 phiC31 attP site Original Location Description: 3799..3837
Promoter Prokaryotic (1 total)
ermEp* Start: 503 End: 548 (Complementary)
Replication Origin (1 total)
Rep_Origin_1 Start: 3638 End: 3749 RP4 origin of single-stranded DNA transfer Original Location Description: 3340..3451 Qualifiers: /direction=RIGHT
-10 Signal (1 total)
-10 Start: 515 End: 520 (Complementary)
-35 Signal (1 total)
-35 Start: 535 End: 540 (Complementary)
Variation (1 total)
Variation_1 Start: 806 End: 807 (Complementary) pUC18 Original Location Description: complement(508..509) Qualifiers: /replace="ggt"
Figure imgf000041_0001
Plasmid pUWL-SpnP (SEQ ID NO:142)
General Description: DNA pUWL-SpnP. Entire molecule length: 8215 bp
Feature Map: CDS (4 total)
tsr Start: 418 End: 1227
bla Start: 2282 End: 3142
CDS(spnP)_6 Start: 4206 End: 5525 (Complementary) involved in forosamine addition Original Location Description: 7083..8450 Qualifiers: /gene="spnP" /codon_start=1 /transl_table=11 /product="probable NDP-forosamyltransferase" /protein_id="AAG23277.1" /db_xref="GI:13162649"
rep-pIJ101 Start: 6020 End: 7390 (Complementary)
Promoter Prokaryotic (1 total)
ermEp* Start: 5583 End: 5852 (Complementary)
Replication Origin (2 total)
ColE1-ori Start: 3168 End: 3910 (Complementary)
ori-pIJ101 Start: 7412 End: 8172
Terminator (1 total)
fd-terminator Start: 1282 End: 1620
Restriction/Methylation Map
Figure imgf000042_0001
No cuts: BamHI
Plasmid pUWL-SpnG (SEQ ID NO:146)
General Description: DNA pUWL-SpnG. Entire molecule length: 8065 bp
Feature Map: CDS (4 total)
tsr Start: 418 End: 1227
bla Start: 2282 End: 3142
CDS(spnG)_15 Start: 4205 End: 5377 (Complementary) involved in rhamnose addition Original Location Description: complement(18541..19713) Qualifiers: /gene="spnG" /codon_start=1 /transl_table=11 /product="probable NDP-rhamnosyltransferase" /protein_id="AAG23268.1" /db_xref="GI:13162640"
rep-pIJ101 Start: 5870 End: 7240 (Complementary)
Promoter Prokaryotic (1 total)
ermEp* Start: 5433 End: 5702 (Complementary)
Replication Origin (2 total)
ColE1-ori Start: 3168 End: 3910 (Complementary)
ori-pIJ101 Start: 7262 End: 8022
Terminator (1 total)
fd-terminator Start: 1282 End: 1620
Restriction/Methylation Map
Figure imgf000043_0001
The invention also provides deoxysugar biosynthetic pathways comprising any combination of the biosynthetic pathways described herein. For example, the invention provides a glycosylation system or a biosynthetic pathway comprising any combination of nucleic acids of the invention, including nucleic acids encoding enzymes of the invention and/or deoxysugar biosynthetic pathways of the invention, such as the exemplary nucleic acids of the invention SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31; SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; and the plasmids SEQ ID NO:139; SEQ ID NO:142 and SEQ ID NO:146.
These nucleic acids can be expressed using one or any combination of designs, e.g., they can assembled together on one or more plasmids (or other expression construct), or, each can be inserted separately in a plasmid (or other expression construct). The nucleic acids of the invention (or, biosynthetic pathways of the invention) can be integrated into the chromosome together of separately, or, expressed episomally. In one aspect, the isolated or recombinant nucleic acid encodes a polypeptide having a thermostable activity. The polypeptide can retain a activity under conditions comprising a temperature range of between about 37°C to about 95°C; between about 55°C to about 85°C, between about 70°C to about 95°C, or, between about 90°C to about 95°C. In another aspect, the isolated or recombinant nucleic acid encodes a polypeptide which is thermotolerant. The polypeptide can retain activity after exposure to a temperature in the range from greater than 37°C to about 95°C or anywhere in the range from greater than 55°C to about 85°C; or, after exposure to a temperature in the range between about 1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, between about 37°C to about
95°C, between about 55°C to about 85°C, between about 70°C to about 75°C, or between about 90°C to about 95°C, or more; or after exposure to a temperature in the range from greater than 90°C to about 95°C at pH 4.5. In one embodiment the isolated or recombinant nucleic acid of claim encodes a polypeptide having improved expression in a host cell, improved enzymatic activity, or a different substrate specificity than wild type. The different substrate specificity can be a more promiscuous activity such that a wider range of acceptor and/or donors are accepted by the enzyme. Methods to improve enzyme or gene activity are well known in the art. Some such methods are described herein such as gene site saturation mutagenesis. The modified enzymes or genes are then screened for a desired property or activity as would be known in the art. In one aspect, the modified gene or enzyme has at least about 50%,
51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%,
66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%,
81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to an exemplary nucleic acid of the invention or to a wild-type or other starting pathway or glycosyltransferase gene or enzyme of interest. The invention provides isolated or recombinant nucleic acids comprising a sequence that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31; SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142; SEQ ID NO:146, or fragments or subsequences thereof, wherein the respective exemplary nucleic acids encode a polypeptide having an activity as set forth above (e.g., nucleic acids that hybridize under stringent conditions to SEQ ID NO:1 and/or SEQ ID NO:3 encode a polypeptide (e.g., SEQ ID NO:2, SEQ ID NO:4) having a gtt, or nucleotidyl transferase activity, etc.). The nucleic acid can be at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200 or more residues in length or the full length of the gene or transcript. In one aspect, the stringent conditions include a wash step comprising a wash in 0.2X SSC at a temperature of about 65°C for about 15 minutes.
The invention provides a nucleic acid probe for identifying a nucleic acid encoding a polypeptide having a desired activity (as set forth herein), wherein the probe comprises at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more, consecutive bases of a sequence comprising a sequence of the invention, or fragments or subsequences thereof, wherein the probe identifies the nucleic acid by binding or hybridization. The probe can comprise an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a sequence comprising a sequence of the invention, or fragments or subsequences thereof. The invention provides a nucleic acid probe for identifying a nucleic acid encoding a polypeptide having a desired activity (as set forth herein), wherein the probe comprises a nucleic acid comprising a sequence at least about 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more residues having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to a nucleic acid of the invention, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection. The probe can comprise an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a nucleic acid sequence of the invention, or a subsequence thereof. 
The invention provides an amplification primer pair for amplifying a nucleic acid encoding a polypeptide having a desired activity (as set forth herein), wherein the primer pair is capable of amplifying a nucleic acid comprising a sequence of the invention, or fragments or subsequences thereof. One or each member of the amplification primer sequence pair can comprise an oligonucleotide comprising at least about 10 to 50 consecutive bases of the sequence, or about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more consecutive bases of the sequence. The invention provides amplification primer pairs, wherein the primer pair comprises a first member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more residues of a nucleic acid of the invention, and a second member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more residues of the complementary strand of the first member. In one embodiment the first member sequence of a primer pair comprises a sequence selected from a polypeptide-encoding region, its complement, or their degenerate sequences of the invention. In one embodiment the amplification primer pair sequence is codon biased based on codon usage or on GC content of an Actinomyces or that of the target cell. The GC content of actinomycetes is about 65-75%, more preferably about 70%, and any from about 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%,
73%, 74%, to about 75%. The invention provides polypeptide-encoding nucleic acids generated by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. The invention provides enzymes generated by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. The invention provides methods of making a polypeptide by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. In one aspect, the amplification primer pair amplifies a nucleic acid from a library, e.g., a gene library, such as an environmental library. The invention provides methods of amplifying a nucleic acid encoding a polypeptide having a desired activity (as set forth herein) comprising amplification of a template nucleic acid with an amplification primer sequence pair capable of amplifying a nucleic acid sequence of the invention, or fragments or subsequences thereof. The invention provides expression cassettes comprising a nucleic acid of the invention or a subsequence thereof. In one aspect, the expression cassette can comprise the nucleic acid that is operably linked to a promoter. The promoter can be a viral, bacterial, mammalian or plant promoter. In one aspect, the plant promoter can be a potato, rice, corn, wheat, tobacco or barley promoter. The promoter can be a constitutive promoter. The constitutive promoter can comprise CaMV35S. In another aspect, the promoter can be an inducible promoter. In one aspect, the promoter can be a tissue-specific promoter or an environmentally regulated or a developmentally regulated promoter. Thus, the promoter can be, e.g., a seed-specific, a leaf-specific, a root-specific, a stem-specific or an abscission-induced promoter. In one aspect, the expression cassette can further comprise a plant or plant virus expression vector.
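The primer-pair definition given above (a first member taken from the first 5' residues of a nucleic acid, a second member taken from the first 5' residues of its complementary strand, and a GC content consideration) can be illustrated with a minimal Python sketch. The function names, the default primer length, and the template sequence are hypothetical and chosen only for illustration; the template is not a sequence of the invention.

```python
from typing import Tuple

# Illustrative sketch only: all names and the example template are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def primer_pair(template: str, length: int = 20) -> Tuple[str, str]:
    """Forward primer: first `length` 5' residues of the template.
    Reverse primer: first `length` 5' residues of the complementary strand."""
    if not 12 <= length <= len(template):
        raise ValueError("primer length must be >= 12 and <= template length")
    forward = template[:length].upper()
    reverse = reverse_complement(template)[:length]
    return forward, reverse


# Made-up, GC-rich template used purely as an example.
template = "ATGGCCGGCACCGACGTC"
fwd, rev = primer_pair(template, 12)
print(fwd, rev, round(gc_content(template), 2))
```

In practice primer choice would also weigh melting temperature, secondary structure and specificity, none of which this sketch attempts.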
The invention provides cloning vehicles comprising an expression cassette (e.g., a vector) of the invention or a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention). The cloning vehicle can be a viral vector, a plasmid, a phage, a phagemid, a cosmid, a fosmid, a bacteriophage or an artificial chromosome. The viral vector can comprise an adenovirus vector, a retroviral vector or an adeno-associated viral vector. The cloning vehicle can comprise a bacterial artificial chromosome (BAC), a plasmid, a bacteriophage P1-derived vector (PAC), a yeast artificial chromosome (YAC), or a mammalian artificial chromosome (MAC). The invention provides transformed cells comprising a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention), an expression cassette (e.g., a vector) of the invention, or a cloning vehicle of the invention. In one aspect, the transformed cell can be a bacterial cell (e.g., an actinomycete, including any organism from the order Actinomycetales), a mammalian cell, a fungal cell, a yeast cell, an insect cell or a plant cell. In one aspect, the plant cell can be a potato, wheat, rice, corn, tobacco or barley cell. The invention provides transgenic non-human animals comprising a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention) or an expression cassette (e.g., a vector) of the invention. In one aspect, the animal is a mouse. The invention provides transgenic plants comprising a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention) or an expression cassette (e.g., a vector) of the invention. The transgenic plant can be a corn plant, a potato plant, a tomato plant, a wheat plant, an oilseed plant, a rapeseed plant, a soybean plant, a rice plant, a barley plant or a tobacco plant.
The invention provides transgenic seeds comprising a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention) or an expression cassette (e.g., a vector) of the invention. The transgenic seed can be a corn seed, a wheat kernel, an oilseed, a rapeseed, a soybean seed, a palm kernel, a sunflower seed, a sesame seed, a peanut or a tobacco plant seed. The invention provides methods of making a transgenic plant comprising the following steps: (a) introducing a heterologous nucleic acid sequence into the cell, wherein the heterologous nucleic acid sequence comprises a nucleic acid sequence of the invention (comprising, in one aspect, a glycosylation pathway of the invention), thereby producing a transformed plant cell; and (b) producing a transgenic plant from the transformed cell. In one aspect, the step (a) can further comprise introducing the heterologous nucleic acid sequence by electroporation or microinjection of plant cell protoplasts. In another aspect, the step (a) can further comprise introducing the heterologous nucleic acid sequence directly to plant tissue by DNA particle bombardment. Alternatively, the step (a) can further comprise introducing the heterologous nucleic acid sequence into the plant cell DNA using an Agrobacterium tumefaciens host. In one aspect, the plant cell can be a potato, corn, rice, wheat, tobacco, or barley cell. The invention provides methods of expressing a heterologous nucleic acid sequence in a cell, e.g., a bacterial or plant cell, comprising the following steps: (a) transforming the cell with a heterologous nucleic acid sequence operably linked to a promoter, wherein the heterologous nucleic acid sequence comprises a nucleic acid of the invention (comprising, in one aspect, a glycosylation pathway of the invention); (b) growing the cell (e.g., bacteria or plant) under conditions wherein the heterologous nucleic acid sequence is expressed in the cell.
The invention provides methods of expressing a heterologous nucleic acid sequence in a cell (e.g., bacteria or plant) comprising the following steps: (a) transforming the cell (e.g., bacteria or plant) with a heterologous nucleic acid sequence operably linked to a promoter, wherein the heterologous nucleic acid sequence comprises a sequence of the invention; (b) growing the cell (e.g., plant) under conditions wherein the heterologous nucleic acid sequence is expressed in the cell. The invention provides an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention. The invention provides methods of inhibiting the translation of a message (e.g., a glycosyl transferase, a methyltransferase, an aminotransferase, a 3,4-dehydratase, a 3-ketoreductase, 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, or an O-methyl transferase message) in a cell comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention. In one aspect, the antisense oligonucleotide is between about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 bases in length, or any variation thereof. The invention provides methods of inhibiting the translation of a message in a cell comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention. The invention provides double-stranded inhibitory RNA (RNAi) molecules comprising a subsequence of a sequence of the invention. In one aspect, the RNAi is about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more duplex nucleotides in length.
The invention provides methods of inhibiting the expression of an enzyme (e.g., a glycosyl transferase, a methyltransferase, an aminotransferase, a 3,4-dehydratase, a 3-ketoreductase, 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, or an O-methyl transferase activity) in a cell comprising administering to the cell or expressing in the cell a double-stranded inhibitory RNA (iRNA), wherein the RNA comprises a subsequence of a sequence of the invention. The invention provides an isolated or recombinant polypeptide comprising an amino acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary polypeptide or peptide of the invention over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350 or more residues, or over the full length of the polypeptide, and the sequence identities are determined by analysis with a sequence comparison algorithm or by a visual inspection.
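As a rough illustration of what "percent sequence identity over a region of at least N residues" means, the following Python sketch counts identical positions between two sequences that are assumed to be already aligned and of equal length. It is not the BLAST procedure; a real determination would use a sequence comparison algorithm to produce the alignment first. The function names, the gap convention and the example peptides are assumptions made for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length
    sequences. Columns containing a gap ('-') count as mismatches."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def meets_identity(a: str, b: str, threshold: float = 50.0,
                   min_region: int = 10) -> bool:
    """True if the aligned region is at least `min_region` residues long
    and its identity is at least `threshold` percent."""
    return len(a) >= min_region and percent_identity(a, b) >= threshold


# Hypothetical 10-residue peptides differing at two positions.
ref = "MKTAYIAKQR"
var = "MKTAYLAKQK"
print(percent_identity(ref, var), meets_identity(ref, var, threshold=75.0))
```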
Exemplary polypeptides and peptides of the invention comprise sequences as set forth in SEQ ID NO:2; SEQ ID NO:4; SEQ ID NO:6; SEQ ID NO:8; SEQ ID NO:10; SEQ ID NO:12; SEQ ID NO:14; SEQ ID NO:16; SEQ ID NO:18; SEQ ID NO:20; SEQ ID NO:22; SEQ ID NO:24; SEQ ID NO:26; SEQ ID NO:28; SEQ ID NO:30; SEQ ID NO:32; SEQ ID NO:33; SEQ ID NO:35; SEQ ID NO:36; SEQ ID NO:37; SEQ ID NO:38; SEQ ID NO:40; SEQ ID NO:41; SEQ ID NO:42; SEQ ID NO:43; SEQ ID NO:45; SEQ ID NO:46; SEQ ID NO:47; SEQ ID NO:49; SEQ ID NO:50; SEQ ID NO:51; SEQ ID NO:53; SEQ ID NO:54; SEQ ID NO:55; SEQ ID NO:56; SEQ ID NO:58; SEQ ID NO:59; SEQ ID NO:60; SEQ ID NO:62; SEQ ID NO:63; SEQ ID NO:64; SEQ ID NO:65; SEQ ID NO:67; SEQ ID NO:68; SEQ ID NO:69; SEQ ID NO:71; SEQ ID NO:72; SEQ ID NO:73; SEQ ID NO:74; SEQ ID NO:76; SEQ ID NO:77; SEQ ID NO:78; SEQ ID NO:79; SEQ ID NO:80; SEQ ID NO:81; SEQ ID NO:83; SEQ ID NO:84; SEQ ID NO:85; SEQ ID NO:86; SEQ ID NO:87; SEQ ID NO:88; SEQ ID NO:90; SEQ ID NO:91; SEQ ID NO:92; SEQ ID NO:93; SEQ ID NO:94; SEQ ID NO:95; SEQ ID NO:97; SEQ ID NO:98; SEQ ID NO:99; SEQ ID NO:100; SEQ ID NO:101; SEQ ID NO:102; SEQ ID NO:104; SEQ ID NO:105; SEQ ID NO:106; SEQ ID NO:107; SEQ ID NO:108; SEQ ID NO:109; SEQ ID NO:111; SEQ ID NO:112; SEQ ID NO:113; SEQ ID NO:114; SEQ ID NO:115; SEQ ID NO:116; SEQ ID NO:117; SEQ ID NO:119; SEQ ID NO:120; SEQ ID NO:121; SEQ ID NO:122; SEQ ID NO:123; SEQ ID NO:124; SEQ ID NO:126; SEQ ID NO:127; SEQ ID NO:128; SEQ ID NO:129; SEQ ID NO:130; SEQ ID NO:131; SEQ ID NO:133; SEQ ID NO:134; SEQ ID NO:135; SEQ ID NO:136; SEQ ID NO:137; SEQ ID NO:138; SEQ ID NO:140; SEQ ID NO:141; SEQ ID NO:143; SEQ ID NO:144; SEQ ID NO:145; SEQ ID NO:147; SEQ ID NO:148; SEQ ID NO:149, and fragments and subsequences thereof. The sequence identities can be determined, e.g., by analysis with a sequence comparison algorithm or by visual inspection. In one aspect, the sequence comparison algorithm is a BLAST version 2.2.2 algorithm where a filtering setting is set to blastall -p blastp -d "nr pataa" -F F, and all other options are set to default. Exemplary polypeptides also include peptides of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600 or more residues of an exemplary sequence of the invention, or over the full length of an exemplary enzyme of the invention. The peptide can be, e.g., an immunogenic fragment, a motif (e.g., a binding site), or an active site. Exemplary polypeptide or peptide sequences of the invention include sequence encoded by a nucleic acid of the invention. Exemplary polypeptide or peptide sequences of the invention include polypeptides or peptides specifically bound by an antibody of the invention. In alternative aspects, a polypeptide of the invention can have a glycosyl transferase, a methyltransferase, an aminotransferase, a 3,4-dehydratase, a 3-ketoreductase, 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, or an O-methyl transferase activity, as described herein. In one aspect, the enzyme activity is thermostable.
The polypeptide can retain enzyme activity under conditions comprising a temperature range of between about 1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, between about 37°C to about 95°C, between about 55°C to about 85°C, between about 70°C to about 75°C, or between about 90°C to about 95°C, or more. In another aspect, the enzyme activity can be thermotolerant. The polypeptide can retain enzyme activity after exposure to a temperature in the range from greater than 37°C to about 95°C, or in the range from greater than 55°C to about 85°C, or in the range from greater than 90°C to about 95°C at pH 4.5. In one aspect, the isolated or recombinant polypeptide can comprise the polypeptide of the invention that lacks a signal sequence. In one aspect, the isolated or recombinant polypeptide can comprise the polypeptide of the invention comprising a heterologous signal sequence. In one aspect, the invention provides a signal sequence comprising a peptide comprising/consisting of a sequence as set forth in residues 1 to 12, 1 to 13, 1 to 14, 1 to 15, 1 to 16, 1 to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, 1 to 25, 1 to 26, 1 to 27, 1 to 28, 1 to 29, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to
36, 1 to 37, 1 to 38, 1 to 39, 1 to 40, 1 to 41, 1 to 42, 1 to 43, 1 to 44 (or a longer peptide) of a polypeptide of the invention. In one aspect, the invention provides chimeric proteins comprising a first domain comprising a signal sequence of the invention and at least a second domain. The protein can be a fusion protein. The second domain can comprise an enzyme. The enzyme can be an enzyme. The invention provides chimeric polypeptides comprising at least a first domain comprising a signal peptide (SP), a prepro sequence and/or a catalytic domain (CD) of the invention and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP), prepro sequence and/or catalytic domain (CD). In one aspect, the heterologous polypeptide or peptide is not derived from an enzyme of the invention. The heterologous polypeptide or peptide can be amino terminal to, carboxy terminal to or on both ends of the signal peptide (SP), prepro sequence and/or catalytic domain (CD). The invention provides isolated or recombinant nucleic acids encoding a chimeric polypeptide, wherein the chimeric polypeptide comprises at least a first domain comprising a signal peptide (SP), a prepro domain and/or a catalytic domain (CD) of the invention and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP), prepro domain and/or catalytic domain (CD). The invention provides the isolated or recombinant polypeptide of the invention, wherein the polypeptide comprises at least one glycosylation site. In one aspect, glycosylation can be an N-linked glycosylation. In one aspect, the polypeptide can be glycosylated after being expressed in a P. pastoris or an S. pombe.
In one aspect, the polypeptide can retain enzyme activity under conditions comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4. In another aspect, the polypeptide can retain enzyme activity under conditions comprising about pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11. In one aspect, the polypeptide can retain enzyme activity after exposure to conditions comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4. In another aspect, the polypeptide can retain enzyme activity after exposure to conditions comprising about pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11. The invention provides protein preparations comprising a polypeptide of the invention, wherein the protein preparation comprises a liquid, a solid or a gel. The invention provides heterodimers comprising a polypeptide of the invention and a second protein or domain. In one aspect, the second domain can be a polypeptide and the heterodimer can be a fusion protein. In one aspect, the second domain can be an epitope or a tag. In one aspect, the invention provides homodimers comprising a polypeptide of the invention. The invention provides immobilized polypeptides having enzyme activity, wherein the polypeptide comprises a polypeptide of the invention, a polypeptide encoded by a nucleic acid of the invention, or a polypeptide comprising a polypeptide of the invention and a second domain. In one aspect, the polypeptide can be immobilized on a cell, a metal, a resin, a polymer, a ceramic, a glass, a microelectrode, a graphitic particle, a bead, a gel, a plate, an array or a capillary tube. The invention provides arrays comprising an immobilized nucleic acid or polypeptide of the invention. The invention provides arrays comprising an antibody of the invention. The invention provides isolated or recombinant antibodies that specifically bind to a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the invention.
The antibody can be a monoclonal or a polyclonal antibody. The invention provides hybridomas comprising an antibody of the invention, e.g., an antibody that specifically binds to a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the invention. The invention provides methods of making an antibody comprising administering to a non-human animal a nucleic acid of the invention or a polypeptide of the invention or subsequences thereof in an amount sufficient to generate a humoral immune response. The invention provides methods of generating an immune response comprising administering to a non-human animal a nucleic acid of the invention or a polypeptide of the invention or subsequences thereof in an amount sufficient to generate an immune response. The invention provides methods of producing a recombinant polypeptide comprising the steps of: (a) providing a nucleic acid of the invention operably linked to a promoter; and (b) expressing the nucleic acid of step (a) under conditions that allow expression of the polypeptide, thereby producing a recombinant polypeptide. In one aspect, the method can further comprise transforming a host cell with the nucleic acid of step (a) followed by expressing the nucleic acid of step (a), thereby producing a recombinant polypeptide in a transformed cell. The invention provides methods for identifying a polypeptide having a desired activity comprising the following steps: (a) providing a polypeptide of the invention; or a polypeptide encoded by a nucleic acid of the invention; (b) providing an appropriate enzyme substrate; and (c) contacting the polypeptide or a fragment or variant thereof of step (a) with the substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of a reaction product, wherein a decrease in the amount of the substrate or an increase in the amount of the reaction product detects a polypeptide having the desired activity. 
The invention provides methods for identifying an enzyme substrate comprising the following steps: (a) providing a polypeptide of the invention; or a polypeptide encoded by a nucleic acid of the invention; (b) providing a test substrate; and (c) contacting the polypeptide of step (a) with the test substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of reaction product, wherein a decrease in the amount of the substrate or an increase in the amount of a reaction product identifies the test substrate as the appropriate substrate. The invention provides methods of determining whether a test compound specifically binds to a polypeptide comprising the following steps: (a) expressing a nucleic acid or a vector comprising the nucleic acid under conditions permissive for translation of the nucleic acid to a polypeptide, wherein the nucleic acid comprises a nucleic acid of the invention, or, providing a polypeptide of the invention; (b) providing a test compound; (c) contacting the polypeptide with the test compound; and (d) determining whether the test compound of step (b) specifically binds to the polypeptide. The invention provides methods for identifying a modulator of an enzyme's activity comprising the following steps: (a) providing a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention; (b) providing a test compound; (c) contacting the polypeptide of step (a) with the test compound of step (b) and measuring an activity of the enzyme, wherein a change in the enzyme activity measured in the presence of the test compound compared to the activity in the absence of the test compound provides a determination that the test compound modulates the enzyme's activity. 
In one aspect, the enzyme activity can be measured by providing an appropriate substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product, or, an increase in the amount of the substrate or a decrease in the amount of a reaction product. A decrease in the amount of the substrate or an increase in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an activator of enzyme activity. An increase in the amount of the substrate or a decrease in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an inhibitor of enzyme activity. The invention provides computer systems comprising a processor and a data storage device wherein said data storage device has stored thereon a polypeptide sequence or a nucleic acid sequence of the invention (e.g., a polypeptide encoded by a nucleic acid of the invention). In one aspect, the computer system can further comprise a sequence comparison algorithm and a data storage device having at least one reference sequence stored thereon. In another aspect, the sequence comparison algorithm comprises a computer program that indicates polymorphisms. In one aspect, the computer system can further comprise an identifier that identifies one or more features in said sequence. The invention provides computer readable media having stored thereon a polypeptide sequence or a nucleic acid sequence of the invention. 
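The activator/inhibitor rule stated above for modulator screening (more reaction product with the test compound indicates an activator, less indicates an inhibitor) can be sketched in Python as follows. The function name, the units, and the 5% tolerance used to call "no effect" are assumptions added for illustration; the text itself does not specify a threshold.

```python
def classify_modulator(product_without: float, product_with: float,
                       tolerance: float = 0.05) -> str:
    """Classify a test compound from the amount of reaction product formed
    without and with the compound (same units, same assay conditions).

    `tolerance` is a hypothetical relative-change cutoff below which the
    compound is reported as having no effect."""
    if product_without <= 0:
        raise ValueError("control reaction must form a positive amount of product")
    change = (product_with - product_without) / product_without
    if change > tolerance:
        return "activator"
    if change < -tolerance:
        return "inhibitor"
    return "no effect"


# Hypothetical readings in arbitrary units.
print(classify_modulator(10.0, 15.0))  # more product with the compound
print(classify_modulator(10.0, 5.0))   # less product with the compound
```

The same comparison could be made on substrate consumed instead of product formed, with the signs reversed.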
The invention provides methods for identifying a feature in a sequence comprising the steps of: (a) reading the sequence using a computer program which identifies one or more features in a sequence, wherein the sequence comprises a polypeptide sequence or a nucleic acid sequence of the invention; and (b) identifying one or more features in the sequence with the computer program. The invention provides methods for comparing a first sequence to a second sequence comprising the steps of: (a) reading the first sequence and the second sequence through use of a computer program which compares sequences, wherein the first sequence comprises a polypeptide sequence or a nucleic acid sequence of the invention; and (b) determining differences between the first sequence and the second sequence with the computer program. The step of determining differences between the first sequence and the second sequence can further comprise the step of identifying polymorphisms. In one aspect, the method can further comprise an identifier that identifies one or more features in a sequence. In another aspect, the method can comprise reading the first sequence using a computer program and identifying one or more features in the sequence. 
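The sequence comparison method described above (read two sequences, determine their differences, and optionally report polymorphisms) can be illustrated with a short Python sketch. It assumes the two sequences are already aligned and of equal length, which a real comparison program would establish itself; the function name and the example sequences are hypothetical.

```python
from typing import List, Tuple


def find_differences(first: str, second: str) -> List[Tuple[int, str, str]]:
    """Return (position, residue_in_first, residue_in_second) for every
    position at which two equal-length sequences differ.
    Positions are 1-based, as is conventional for sequence coordinates."""
    if len(first) != len(second):
        raise ValueError("this sketch expects pre-aligned, equal-length sequences")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(first.upper(), second.upper()), start=1)
        if a != b
    ]


# Hypothetical reference and test sequences with one substitution.
reference = "ATGCCGTTA"
sample = "ATGACGTTA"
for pos, ref_base, alt_base in find_differences(reference, sample):
    print(f"position {pos}: {ref_base} -> {alt_base}")
```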
The invention provides methods for isolating or recovering a nucleic acid encoding a polypeptide having a desired activity from an environmental sample comprising the steps of: (a) providing an amplification primer sequence pair for amplifying a nucleic acid encoding a polypeptide having the desired activity, wherein the primer pair is capable of amplifying a nucleic acid of the invention; (b) isolating a nucleic acid from the environmental sample or treating the environmental sample such that nucleic acid in the sample is accessible for hybridization to the amplification primer pair; and, (c) combining the nucleic acid of step (b) with the amplification primer pair of step (a) and amplifying nucleic acid from the environmental sample, thereby isolating or recovering a nucleic acid encoding a polypeptide having the desired activity from an environmental sample. One or each member of the amplification primer sequence pair can comprise an oligonucleotide comprising at least about 10 to 50 consecutive bases of a sequence of the invention. In one aspect, the amplification primer sequence pair is an amplification pair of the invention. The invention provides methods for isolating or recovering a nucleic acid encoding a polypeptide having enzyme activity from an environmental sample comprising the steps of: (a) providing a polynucleotide probe comprising a nucleic acid of the invention or a subsequence thereof; (b) isolating a nucleic acid from the environmental sample or treating the environmental sample such that nucleic acid in the sample is accessible for hybridization to a polynucleotide probe of step (a); (c) combining the isolated nucleic acid or the treated environmental sample of step (b) with the polynucleotide probe of step (a); and (d) isolating a nucleic acid that specifically hybridizes with the polynucleotide probe of step (a), thereby isolating or recovering a nucleic acid encoding a polypeptide having enzyme activity from an environmental sample. 
The environmental sample can comprise a water sample, a liquid sample, a soil sample, an air sample or a biological sample. In one aspect, the biological sample can be derived from a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a plant cell, a fungal cell or a mammalian cell. The invention provides methods of generating a variant of a nucleic acid encoding a polypeptide having enzyme (e.g., a glycosyl transferase, a methyltransferase, an aminotransferase, a 3,4-dehydratase, a 3-ketoreductase, 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, or an O-methyl transferase) activity comprising the steps of: (a) providing a template nucleic acid comprising a nucleic acid of the invention; and
(b) modifying, deleting or adding one or more nucleotides in the template sequence, or a combination thereof, to generate a variant of the template nucleic acid. In one aspect, the method can further comprise expressing the variant nucleic acid to generate a variant polypeptide. The modifications, additions or deletions can be introduced by a method comprising error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis™ (GSSM™), synthetic ligation reassembly (SLR) or a combination thereof. In another aspect, the modifications, additions or deletions are introduced by a method comprising recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof. In one aspect, the method can be iteratively repeated until an enzyme having an altered or different activity, an altered or different stability, an altered or different substrate specificity, or an altered or different level of expression in a host cell, from that of a polypeptide encoded by the template nucleic acid is produced. In one aspect, the variant enzyme is thermotolerant, and retains some activity after being exposed to an elevated temperature. In another aspect, the variant enzyme has increased glycosylation as compared to the enzyme encoded by a template nucleic acid. 
In another embodiment the enzyme has increased glycosyltransferase activity as compared to the enzyme encoded by the starting template. Alternatively, the variant enzyme polypeptide has enzyme activity under a high temperature, wherein the enzyme encoded by the template nucleic acid is not active under the high temperature. In one aspect, the method can be iteratively repeated until an enzyme coding sequence having an altered codon usage from that of the template nucleic acid is produced. In another aspect, the method can be iteratively repeated until an enzyme gene having higher or lower level of message expression or stability from that of the template nucleic acid is produced. The invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having an enzyme activity to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention encoding a polypeptide having an enzyme activity; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell. The invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having an enzyme activity; the method comprising the following steps: (a) providing a nucleic acid of the invention; and, (b) identifying a codon in the nucleic acid of step (a) and replacing it with a different codon encoding the same amino acid as the replaced codon, thereby modifying codons in a nucleic acid encoding an enzyme. 
The invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having an enzyme activity to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention encoding an enzyme; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell. The invention provides methods for modifying a codon in a nucleic acid encoding a polypeptide having an enzyme activity to decrease its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention; and (b) identifying at least one preferred codon in the nucleic acid of step (a) and replacing it with a non-preferred or less preferred codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in a host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to decrease its expression in a host cell. In one aspect, the host cell can be a bacterial cell, a fungal cell, an insect cell, a yeast cell, a plant cell or a mammalian cell.
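The codon replacement steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure taken from the specification: the usage table TOY_USAGE, its numbers, and the function names recode and translate are hypothetical, and a real application would use a measured codon usage table for the chosen host cell. The sketch replaces each codon with the most used synonymous codon (to increase expression) or the least used one (to decrease expression) and checks that the encoded amino acid sequence is unchanged.

```python
from typing import Dict

# HYPOTHETICAL toy usage table: amino acid -> {codon: relative usage in the host}.
# The numbers are invented for illustration and are not real host data.
TOY_USAGE: Dict[str, Dict[str, float]] = {
    "M": {"ATG": 1.00},
    "L": {"CTG": 0.50, "CTC": 0.10, "TTA": 0.05},
    "K": {"AAA": 0.75, "AAG": 0.25},
}


def codon_to_amino_acid(usage: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Invert the usage table into a codon -> amino acid lookup."""
    return {codon: aa for aa, codons in usage.items() for codon in codons}


def translate(cds: str, usage: Dict[str, Dict[str, float]]) -> str:
    """Translate a coding sequence using only the codons in the toy table."""
    lookup = codon_to_amino_acid(usage)
    cds = cds.upper()
    return "".join(lookup[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, usage: Dict[str, Dict[str, float]], increase: bool = True) -> str:
    """Swap each codon for the most (or least) used synonymous codon."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    lookup = codon_to_amino_acid(usage)
    pick = max if increase else min
    cds = cds.upper()
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = lookup.get(codon)
        if aa is None:
            recoded.append(codon)  # codon not in the toy table: leave unchanged
            continue
        synonyms = usage[aa]
        recoded.append(pick(synonyms, key=synonyms.get))
    return "".join(recoded)


if __name__ == "__main__":
    original = "ATGTTAAAG"                 # Met-Leu-Lys with less preferred codons
    optimized = recode(original, TOY_USAGE)
    print(optimized)                       # ATGCTGAAA
    # The protein sequence must be identical before and after recoding.
    print(translate(original, TOY_USAGE) == translate(optimized, TOY_USAGE))
```

Passing increase=False selects the least used synonymous codon instead, which corresponds to the expression-decreasing variant of the method.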
The invention provides methods for producing a library of nucleic acids encoding a plurality of modified enzyme active sites or substrate binding sites, wherein the modified active sites or substrate binding sites are derived from a first nucleic acid comprising a sequence encoding a first active site or a first substrate binding site the method comprising the following steps: (a) providing a first nucleic acid encoding a first active site or first substrate binding site, wherein the first nucleic acid sequence comprises a sequence that hybridizes under stringent conditions to a nucleic acid of the invention, and the nucleic acid encodes an enzyme active site or substrate binding site; (b) providing a set of mutagenic oligonucleotides that encode naturally-occurring amino acid variants at a plurality of targeted codons in the first nucleic acid; and, (c) using the set of mutagenic oligonucleotides to generate a set of active site-encoding or substrate binding site- encoding variant nucleic acids encoding a range of amino acid variations at each amino acid codon that was mutagenized, thereby producing a library of nucleic acids encoding a plurality of modified active sites or substrate binding sites. In one aspect, the method comprises mutagenizing the first nucleic acid of step (a) by a method comprising an optimized directed evolution system, Gene Site Saturation Mutagenesis™ (GSSM™), synthetic ligation reassembly (SLR), error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, and a combination thereof. 
In another aspect, the method comprises mutagenizing the first nucleic acid of step (a) or variants by a method comprising recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof. The invention provides methods for making a small molecule comprising the following steps: (a) providing a plurality of biosynthetic enzymes capable of synthesizing or modifying a small molecule, wherein one of the enzymes is encoded by a nucleic acid of the invention; (b) providing a substrate for at least one of the enzymes of step (a); and (c) reacting the substrate of step (b) with the enzymes under conditions that facilitate a plurality of biocatalytic reactions to generate a small molecule by a series of biocatalytic reactions. The invention provides methods for modifying a small molecule comprising the following steps: (a) providing an enzyme of the invention, or, a polypeptide encoded by a nucleic acid of the invention, or a subsequence thereof; (b) providing a small molecule; and (c) reacting the enzyme of step (a) with the small molecule of step (b) under conditions that facilitate an enzymatic reaction catalyzed by the enzyme, thereby modifying a small molecule by an enzymatic reaction. In one aspect, the method can comprise a plurality of small molecule substrates for the enzyme of step
(a), thereby generating a library of modified small molecules produced by at least one enzymatic reaction catalyzed by the enzyme. In one aspect, the method can comprise a plurality of additional enzymes under conditions that facilitate a plurality of biocatalytic reactions by the enzymes to form a library of modified small molecules produced by the plurality of enzymatic reactions. In another aspect, the method can further comprise the step of testing the library to determine if a particular modified small molecule which exhibits a desired activity is present within the library. The step of testing the library can further comprise the steps of systematically eliminating all but one of the biocatalytic reactions used to produce a portion of the plurality of the modified small molecules within the library by testing the portion of the modified small molecule for the presence or absence of the particular modified small molecule with a desired activity, and identifying at least one specific biocatalytic reaction that produces the particular modified small molecule of desired activity. The invention provides methods for determining a functional fragment of an enzyme comprising the steps of: (a) providing an enzyme of the invention, or a polypeptide encoded by a nucleic acid of the invention, or a subsequence thereof; and (b) deleting a plurality of amino acid residues from the sequence of step (a) and testing the remaining subsequence for activity, thereby determining a functional fragment of the enzyme. In one aspect, the activity is measured by providing an appropriate substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product. 
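The deletion-and-test approach to finding a functional fragment can be pictured as a search over contiguous subsequences. The following Python sketch is an illustration under stated assumptions only: the function name shortest_active_fragment is hypothetical, and the is_active callable stands in for a laboratory activity assay (for example, detecting a decrease in substrate or an increase in product), which cannot be computed from sequence alone.

```python
from typing import Callable, Optional


def shortest_active_fragment(
    sequence: str,
    is_active: Callable[[str], bool],
    min_length: int = 1,
) -> Optional[str]:
    """Return the shortest contiguous subsequence that the assay reports as active.

    Fragments are tried from shortest to longest, so the first hit is a
    minimal fragment; None is returned if no fragment is active.
    """
    for length in range(min_length, len(sequence) + 1):
        for start in range(0, len(sequence) - length + 1):
            fragment = sequence[start:start + length]
            if is_active(fragment):
                return fragment
    return None


if __name__ == "__main__":
    # HYPOTHETICAL stand-in assay: a fragment counts as "active" if it still
    # contains an invented motif. A real assay would be an enzymatic measurement.
    def mock_assay(fragment: str) -> bool:
        return "GHK" in fragment

    print(shortest_active_fragment("MAGHKLL", mock_assay))  # GHK
```

An exhaustive scan like this grows quadratically with sequence length, so in practice one would more likely test a limited set of N-terminal and C-terminal truncations.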
The invention provides methods for whole cell engineering of new or modified phenotypes by using metabolic flux analysis, the method comprising the following steps: (a) making a modified cell by modifying the genetic composition of a cell, wherein the genetic composition is modified by addition to the cell of a nucleic acid of the invention; (b) culturing the modified cell to generate a plurality of modified cells; (c) measuring at least one metabolic parameter of the cell by monitoring the cell culture of step (b), optionally in real time; and, (d) analyzing the data of step (c) to determine if the measured parameter differs from a comparable measurement in an unmodified cell under similar conditions, thereby identifying an engineered phenotype in the cell using real-time metabolic flux analysis. In one aspect, the genetic composition of the cell can be modified by a method comprising deletion of a sequence or modification of a sequence in the cell, or, knocking out the expression of a gene. In one aspect, the method can further comprise selecting a cell comprising a newly engineered phenotype. In another aspect, the method can comprise culturing the selected cell, thereby generating a new cell strain comprising a newly engineered phenotype. The invention provides methods of increasing thermotolerance or thermostability of an enzyme of the invention, the method comprising glycosylating a polypeptide, wherein the polypeptide comprises at least thirty contiguous amino acids of a polypeptide of the invention; or a polypeptide encoded by a nucleic acid sequence of the invention, thereby increasing the thermotolerance or thermostability of the enzyme of the invention. In one aspect, the enzyme specific activity can be thermostable or thermotolerant at a temperature in the range from greater than about 37°C to about 95°C.
The invention provides methods for overexpressing a recombinant polypeptide in a cell comprising expressing a vector comprising a nucleic acid comprising a nucleic acid of the invention or a nucleic acid sequence of the invention, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection, wherein overexpression is effected by use of a high activity promoter, a dicistronic vector or by gene amplification of the vector. The invention provides a kit comprising a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims. All publications, patents, patent applications, GenBank sequences and ATCC deposits cited herein are hereby expressly incorporated by reference for all purposes.
DESCRIPTION OF DRAWINGS
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Figure 1 is an illustration of natural products that can be glycosylated using the systems and methods of the invention.
Figure 2 illustrates exemplary spinosyns that are modified (glycosylated) using the systems and methods of the invention.
Figure 3 illustrates the structures of doxorubicin, MEN10755 and the general structure of anthracyclines that can be modified (glycosylated) using the systems and methods of the invention.
Figure 4 illustrates an exemplary doxorubicin biosynthesis pathway.
Figure 5 illustrates the structures of anthracyclines that can be modified (glycosylated) using the systems and methods of the invention.
Figure 6 illustrates the organization of the spinosyn gene cluster and location of isolated fosmid clones, as discussed in detail in Example 1, below.
Figure 7 illustrates substrates used in conversion experiments: aglycone, 9-pseudo-aglycone (9-PSA) and 17-pseudo-aglycone (17-PSA) in Figure 7a, and 17-keto aglycone in Figure 7b, as discussed in detail in Example 5, below.
Figure 8 illustrates structures of: Figure 8(a) M548, Figure 8(b) M548-II, and Figure 8(c) M532, as discussed in detail in Example 7, below.
Figure 9 illustrates L-rhamnose biosynthesis in Sac. spinosa and 6-deoxy-D-glucose biosynthesis in an exemplary engineered Streptomyces of the invention, as discussed in detail in Example 7, below.
Figure 10 illustrates the structure of spinosyn C, as discussed in detail in Example 1, below.
Figure 11 illustrates structures of novel spinosyn derivatives of the invention: M689 Figure 11(a), M689-II Figure 11(b) and M673 Figure 11(c), as discussed in detail in Example 8, below.
Figure 12 illustrates a general scheme for deoxysugar biosynthesis that can be used to practice the invention, as discussed in detail in Example 9, below. The 4,6-dideoxysugars are additional sugars derived from 4-keto-6-deoxy-dNDP-glucose.
Figure 13 illustrates the structure of viriplanin.
Figure 14 illustrates an exemplary solid phase gene reassembly technique used to generate enzymes used in the compositions and methods of the invention, as discussed in detail in Example 10, below.
Figure 15 illustrates a model for the construction of an exemplary deoxysugar pathway used in the compositions and methods of the invention, as discussed in detail in Example 10, below.
Figure 16 illustrates a scheme for constructing a library of pathways including various biosynthetic genes to be used in the compositions and methods of the invention, as discussed in detail in Example 10, below.
Figure 17 illustrates the vector pUWL201, as discussed in detail in Example 14, below.
Figure 18 illustrates the vector pAT6-gtt21-gdh11-epi12-kre15, into which is cloned a deoxysugar pathway, as discussed in detail in Example 14, below.
Figure 19a illustrates the structure of spinosyn A aglycone; and Figure 19b illustrates a novel compound of the invention, M548, or 6-deoxy-D-glucose-17-pseudoaglycone, as discussed in detail in Example 14, below.
Figure 20 illustrates a novel compound of the invention, Compound M548-II, or L-rhamnosyl-17-pseudo-aglycone, as discussed in detail in Example 15, below.
Figure 21 illustrates an engineered deoxysugar pathway of the invention, designated pathway #3 in vector pAT6-gtt21-gdh11-epi12-kre9, as discussed in detail in Example 15, below.
Figure 22 illustrates an engineered deoxysugar pathway of the invention, pathway #9, cloned into vector pAT6-gtt-gdh-e8-k9-tdh-tkr, as discussed in detail in Example 15, below.
Figure 23 illustrates an engineered deoxysugar pathway, pathway #12, as cloned into the vector pAT6-gtt-gdh-e12-k15-tdh-tkr, as discussed in detail in Example 16, below.
Figure 24 illustrates a novel compound of the invention, Compound M532, or L-digitoxosyl-17-pseudo-aglycone, as discussed in detail in Example 16, below.
Figure 25a illustrates the structure of spinosyn A 9-pseudo-aglycone (9-PSA); Figure 25b illustrates a novel compound of the invention, 9-6-deoxy-D-glucosyl-spinosyn A, as discussed in detail in Example 17, below.
Figure 26 illustrates a novel compound of the invention, Compound M689-II, 9-L-rhamnosyl-spinosyn A, as discussed in detail in Example 18, below.
Figure 27 illustrates a novel compound of the invention, Compound M673, 9-L-digitoxosyl-spinosyn A, as discussed in detail in Example 18, below.
Figure 28 illustrates a map of pathway #1 in plasmid pAT6, as discussed in detail in Example 21, below.
Figure 29 illustrates various inserts of complete and incomplete 6-deoxysugar pathways, as discussed in detail in Example 21, below.
Figure 30 illustrates the conversion of an aglycone with complete or incomplete pathways with SpnG by S. diversa, as discussed in detail in Example 21, below.
Figure 31 shows deoxy sugar compounds that can be synthesized and/or transferred to macrolides by embodiments of the invention. In particular these compounds are suitable as a deoxy sugar at the 9 position of a spinosyn or a spinosyn pseudoaglycone. In the figure R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl, or formyl.
Figure 32 shows deoxy sugar compounds that can be synthesized and/or transferred to macrolides by embodiments of the invention. In particular these compounds are suitable as a deoxy sugar at the 17 position of a spinosyn or a spinosyn pseudoaglycone. In the figure R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl; and R6 in formula M is C1-C6 alkyl, C1-C6 alkenyl, formyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl.
Figure 33 shows the formulas of additional deoxysugars and of spinosyn variants of the present invention. The deoxysugars can be used to glycosylate other aglycones or pseudoaglycones as discussed herein. Figure 34 is a schematic depicting exemplary combinatorial enzyme pathways for production, transfer and modification of deoxy sugars. Figure 35 depicts the basic ring structure of 21-butenyl spinosyns suitable for use in embodiments
Like reference symbols in the various drawings indicate like elements.
DETAILED DESCRIPTION
The invention provides in vivo glycosylation systems for the biosynthesis of novel glycosylated natural products for pharmaceutical and agrochemical applications. The glycosylated natural products made by the methods and in vivo glycosylation systems of the invention can be a source of valuable leads in pharmaceutical drug development and agrochemical applications. The novel glycosylation platform of the invention exploits the substrate flexibility of glycosyltransferases and the abundance of deoxysugar biosynthetic pathways found in Actinomycetales. As components of glycoconjugates, sugars contribute to a large repertoire of compounds with diverse biological activities (for example see Weymouth-Wilson, A. C. Nat. Prod. Rep. 1997, 14 (2), 99-110). This is particularly true of the deoxysugars found in natural products from actinomycetes, where their removal generally results in loss or reduction of bioactivity. Enzymatic and genetic studies have led to a good understanding of deoxysugar biosynthesis and of glycosylation in actinomycetes (Walsh, C. T.; Freel Meyers, C. L.; Losey, H. C. J. Med. Chem. 2003, and Trefzer, A.; Salas, J. A.; Bechthold, A. Nat. Prod. Rep. 1999, 16 (3), 283-299). These studies have revealed that the enzymes involved in sugar biosynthesis and transfer steps of antibiotic biosynthesis are flexible towards their substrates. As a result, these sugar moieties have been suggested as targets for modification by combinatorial biosynthesis (Mendez, C.; Salas, J. A. Trends Biotechnol. 2001, 19 (11), 449-456), where the structures and bioactivities of natural products are modulated by modification of biosynthetic genes.
Various alternative approaches have been attempted, including deletion of polyketide synthetase genes or their domains and heterologous glycosyltransferase (gt) expression, resulting in the production of differentially glycosylated derivatives of some compounds (see Walsh, C. T.; Freel Meyers, C. L.; Losey, H. C. J. Med. Chem. 2003; and Trefzer, A.; Salas, J. A.; Bechthold, A. Nat. Prod. Rep. 1999, 16 (3), 283-299; and Mendez, C.; Salas, J. A. Trends Biotechnol. 2001, 19 (11), 449-456). Methods for chemo-enzymatic synthesis and transfer of activated deoxysugars have been developed, enabling the in vitro generation of glycosylated derivatives using natural and unnatural sugars (Sparks, T. C.; Thompson, G. D.; Kirst, H. A.; Hertlein, M. B.; Mynderse, J. S.; Turner, J. R.; Worden, T. V. In Biopesticides: Use and Delivery, Hall, F. R., Menn, J. J., Eds.; Humana Press: Totowa, NJ, 1999; pp 171-188).
The glycosylated macrolide spinosyn A (1) produced by Saccharopolyspora spinosa is a commercially important insecticide. The spinosyn tetracyclic aglycone carries a per-methylated L-rhamnose moiety at C9 and a D-forosamine moiety at C17. Studies on the structure-activity relationship (SAR) have established the importance of both sugar moieties and the O-methylation of the L-rhamnose moiety for activity (Sparks, T. C.; Thompson, G. D.; Kirst, H. A.; Hertlein, M.
B.; Mynderse, J. S.; Turner, J. R.; Worden, T. V. In Biopesticides: Use and Delivery, Hall, F. R., Menn, J. J., Eds.; Humana Press: Totowa, NJ, 1999; pp 171-188). Both spinosyn biosynthesis and its gene cluster have been characterized (Madduri, K.; Waldron, C; Merlo, D. J. Journal Of Bacteriology 2001, 183 (19), 5632-5638; and Waldron, C; Matsushima, P.; Rosteck, P. R., Jr.; Broughton, M. C; Turner, J.; Madduri, K.; Crawford, K. P.; Merlo, D. J.; Baltz, R. H. Chem Biol 2001, 8 (5), 487-499). The aglycone is first assembled and cyclized, L-rhamnose is then attached and methylated, and finally D- forosamine is added. The invention provides a novel technology for glycosylation of natural products using novel genetically engineered strains of bacteria. These in vivo glycosylation systems of the invention express a heterologous glycosyltransferase and deoxysugar pathway that are capable of glycosylating a suitable substrate, which can be added to a culture broth. The invention also provides novel compounds glycosylated by the genetically engineered strains of the invention (the in vivo glycosylation systems of the invention), including novel peptides, a mixed polyketide-peptide, or polyketides, including novel macrolides (see Figure 1), e.g., glycosylated spinosyn derivatives, glycosylated derivatives of antibiotics such as erythromycin, tetracycline, rifampicin, glycosylated derivatives of anti-tumor drugs such as anthracyclines (e.g., doxorubicin, and second generation anthracyclines such as idarubicin and epirubicin) daunorubicin, mithramycin, derivatives of immunosuppressants such as rapamycin, FK520, FK506, glycosylated derivatives of anti-fungals such as amphotericin, glycosylated derivatives of antibacterials such as tylosin, glycosylated derivatives of antiparasitics such as avermectin, glycosylated derivatives of insecticides such as spinosyn, and methods for making these compounds using the in vivo glycosylation systems of the invention. 
In one aspect, these glycosylated derivatives of the invention are disaccharide derivatives. The natural products can be aglycones or pseudoaglycones. By "pseudoaglycone" is meant a compound that is the result of removing only one or more but not all sugars from the parent compound. Figure 1 illustrates exemplary natural products, polyketide derived pharmaceuticals from actinomycetes, that can be modified (glycosylated) using the systems and methods of the invention; sugar moieties shown in red. Figure 2 illustrates exemplary spinosyns that are modified (glycosylated) using the systems and methods of the invention. Spinosyns glycosylated using the systems and methods of the invention include those from S. spinosa containing D-forosamine and a tri-O-methyl L-rhamnose, and S. pogona that produces derivatives with variations in the aglycone and shows a broader range of sugar substituents at the 17-position. Glycosylated derivatives of the invention include both spinosyn A and the 17-pseudo-aglycone (17-PSA) generated by replacing the tri-O-methyl L-rhamnose with alternative sugars (see Examples, below). Additional spinosyns that can be modified using the systems and methods of the invention include the 21-butenyl spinosyns and their aglycones and pseudoaglycones as disclosed in WO02077004 published October 3, 2002, which is incorporated by reference herein for its disclosure of these starting compounds and sugars. Figure 35 shows two 21-butenyl spinosyn backbones (I and II) that can be glycosylated (at Sugar and Sugar 1 positions) via the methods of the invention. In formulas I and II of Figure 35, R1 is H, OH, OCH3, OR5, or =O; R2 is H or CH3; R3 and R4 are H or combine to form a double bond or combine to form an epoxide group; R10 in formula I is trans-1-butenyl, 1,4-butadienyl, n-butyl, 3-hydroxy-1-butenyl, n-propyl, 1-propenyl, 1,2-epoxy-1-butyl, 3-oxo-1-butenyl, CH3CH(OCH3)CH=CH-, CH3CH(OR5)CH=CH-, CH3CH=CHCH(CH2CO2Me)-, or CH3CH=CHCH(CH2CON(Me2))-; R10 in formula II is ethyl; R5 is C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl. Additional deoxy sugars that can be produced and/or transferred using the methods and systems of the invention include the deoxy sugars shown in Figures 31 and 32, which are also disclosed in WO02077004 published October 3, 2002, which is incorporated by reference for its disclosure of these deoxy sugars. Accordingly in one embodiment are 21-butenyl-spinosyns and pseudoaglycones wherein the Sugar and/or Sugar 2 positions are any of the deoxysugars produced or disclosed herein. In another embodiment are spinosyns or their pseudoaglycones produced by the heterologous expression methods of the present invention that have at their 9- and/or 17-positions any of the deoxysugars produced or disclosed herein. These butenyl-spinosyn starting materials can be prepared by culturing one of the following strains of Saccharopolyspora sp. that were deposited on the dates indicated in accordance with the terms of the Budapest treaty at the Midwest Area
Regional Research Center, Agricultural Research Service, United States Department of Agriculture, 815 North University Street, Peoria, IL 61604, as indicated in WO02077004 published October 3, 2002:
Deposit Number    Deposit Date
NRRL 30141        June 9, 1999
NRRL 30424        March 8, 2001
NRRL 30423        March 8, 2001
NRRL 30422        March 8, 2001
NRRL 30438        March 15, 2001
NRRL 30421        March 8, 2001
NRRL 30437        March 15, 2001
These strains are suitable as hosts for the in vivo heterologous expression methods of the present invention. Further these strains are suitable sources for extracts and enzymes for practicing the in vitro heterologous expression system of the present invention. In one embodiment is provided a method to identify and isolate the glycosyltransferases and deoxysugar biosynthetic pathways of these organisms using the polynucleotide sequences presented herein, as would be known by gene cloning and isolation methods in the art and as taught herein.
In one aspect, D-forosamine is replaced with the neutral sugars L-mycarose or D-glucose using the erythromycin producer S. erythraea as a host strain. The transfer of glucose was due to an endogenous activity of S. erythraea, while transfer of the L-mycarose occurred after expression of SpnP. SpnG and SpnP are remarkably flexible towards their deoxysugar substrates. Both enzymes are able to transfer D- and L-sugars, with SpnP catalyzing attachment of both amino and neutral sugars, and SpnG accepting 6-deoxysugars and 2,6-dideoxysugars as substrates. This broad substrate range makes them valuable tools for combinatorial biosynthesis. Glycosylated spinosyns of the invention are highly active on chewing insects (e.g. caterpillars) and/or sucking insects (e.g. aphids), and, in one aspect, act as broad-spectrum insecticides.
Anthracyclines, e.g., doxorubicin, and second generation anthracyclines such as idarubicin and epirubicin can also be modified (glycosylated) using the systems and methods of the invention.
In one aspect, the glycosylated derivatives of anthracyclines, e.g., doxorubicin, and second generation anthracyclines such as idarubicin and epirubicin of the invention are disaccharide derivatives. Doxorubicin is an anthracycline type polyketide widely used in anticancer chemotherapy. It consists of a tetracyclic aglycone that carries an aminosugar. Figure 3 illustrates the structures of doxorubicin, MEN10755 (see, e.g., Bos, et al. (2001) Cancer Chemotherapy and Pharmacol. 48(5):361-369) and the general structure of anthracyclines that can be modified (glycosylated) using the systems and methods of the invention; arrows indicate positions of glycosylation. Any anthracycline (see Figure 5), including aclacinomycin, nogalamycin, rhodomycin and doxorubicin, and/or any intermediate in the biosynthesis of an anthracycline can be modified (glycosylated) using the systems and methods of the invention, e.g., as illustrated in Figure 4. Anthracycline biosynthesis employs a type II PKS to generate the aglycone, which is then further modified by oxygenases, reductases and cyclases to form the final tetracyclic anthraquinone moiety. The invention also provides methods for the further modification of glycosyl residues of the novel compositions of the invention, including deoxygenation, methylation and amination. In one aspect, the in vivo glycosylation system of the invention further comprise heterologous enzymes for deoxygenation, methylation, carbamoylation and amination, e.g., spinosyn O-methyltransferases, and one, several or all of the enzymes listed in Figure 15 (see also Example 10, below) and Table 8 (see Example 11 , below). In one embodiment are sugars where the sugars described herein have at least one or more CH3 replaced by an alkyl selected from C2 to C6 alkyl. Such modified forms of sugars A, C and E of Figure 33 can also be used. 
In another embodiment are sugars where the sugars herein have at least one or more CH3 replaced by C2-C6 alkyl, C3-C6 branched alkyl, C3-C7 cycloalkyl, C1-C6 alkoxy-C1-C6 alkyl, C1-C6 alkylthio-C1-C6 alkyl, halo C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkylcarbonyl, or C3-C6 branched alkylcarbonyl, C3-C7-cycloalkylcarbonyl, C1-C6 alkoxy-C1-C6 alkylcarbonyl, halo C1-C6 alkylcarbonyl, C2-C6 alkenylcarbonyl, C2-C6 alkynylcarbonyl or formyl group. In this regard, alternative aspects comprise modified forms of sugars A, C, and E of Figure 33. Other alternative compounds are those where the spinosyn has at least one such modified sugar, particularly a modified sugar A, C, or
E of Figure 33. In one aspect, the position of genes in a pathway of an in vivo glycosylation system of the invention can follow the same order as their gene-product functions in biosynthesis, for example, early genes (dNDP-transferase, 4,6-dehydratase), followed by intermediate genes (e.g. epimerases, 2-deoxygenation, C-methylations), and with late genes (e.g. 4-ketoreductases, aminotransferases) at the end. In one aspect, in vivo glycosylation systems of the invention comprise smaller, specialized sub-libraries, such as 6-deoxysugar pathway libraries or amino-sugar libraries. In one aspect, the methods of the invention comprise providing a transgenic plant or non-human animal capable of constitutively or inducibly expressing an in vivo glycosylation system of the invention. In one aspect, a transgenic plant or non- human animal of the invention is used to generate a compound of the invention. These can be applied to a plant, plant part, cell, animal or any surface needing treatment. In one aspect, a natural product of the invention can be prophylactically applied to any plant, animal or surface as an anti-microbial or insecticidal agent. The invention includes in vitro or in vivo methods for making the novel compositions of the invention, e.g., using transgenic plants, genetically engineered cells and cell extracts, or other biocatalytic processes. The invention provides transgenic plants, genetically engineered cells and cell extracts comprising introduced nucleic acids encoding a glycosylation system of the invention. In one aspect, the nucleic acids encoding all or part of a glycosylation system of the invention is under the control of an inducible transcriptional control element, e.g., a promoter and/or enhancer or a constitutive transcriptional control element, e.g., a promoter and/or enhancer, e.g., a cauliflower mosaic virus (CaMV) 35S transcription initiation region, a 1'- or 2'- promoter derived from T-DNA of Agrobacterium tumefaciens. 
In one aspect, the introduced nucleic acid encoding all or part of a glycosylation system of the invention is cloned into an expression vehicle, e.g., a vector, a plasmid, a phagemid, a phage, a recombinant virus, vectors from Agrobacterium spp., and the like. The detailed knowledge of deoxysugar biosynthesis and of spinosyn biosynthesis, together with the teachings of the present invention, enables a heterologous system useful to generate novel differentially glycosylated macrolides, including spinosyns, that have various uses as described herein. The numerous novel structures allow construction of an annotated library that provides a structure-activity relationship (SAR) analysis for developing even more useful compounds. While the Examples exemplify embodiments in terms of novel systems for generating novel spinosyn compounds, the systems and methods are sufficiently versatile to provide a combinatorial glycosylation system for other natural products, such as macrolides. As further described in the Examples, Streptomyces diversa™, which is an actinomycete that is amenable to genetic manipulation, was chosen as the host strain. As described herein, in alternative aspects of the invention, other hosts, including any species of Streptomyces, are used in the methods and systems of the invention. Deoxysugar biosynthetic genes were sourced from the S. spinosa L-rhamnose and D-forosamine biosynthetic genes and from an L-digitoxose pathway from S. diversa (the tdh-tkr-gtt-gdh-epi-kre genes of this pathway make L-digitoxose, which is transferred to polyketides by an endogenous gt). Individual genes were cloned from chromosomal DNA using PCR. These were assembled into artificial operons, resulting in the construction of a total of 12 different hybrid pathways (A-L). See Figure 34 and related figures and text throughout this specification for the genes comprising the novel combinatorial biosynthetic pathways.
Based on the predicted or known function of the constituent genes included, pathways A-D were predicted to produce 6-deoxysugars, pathways E-H were predicted to produce 2,6-dideoxysugars, and pathways I-L were predicted to yield 2,3,6-trideoxysugars. The 6-deoxysugar pathways consisted of glucose-1-phosphate thymidylyltransferase, dTDP-glucose-4,6-dehydratase, epimerase and 4-ketoreductase; the 2,6-dideoxysugar pathways contained additional genes required for 2-deoxygenation; the 2,3,6-trideoxysugar pathways contained additional functions for 3-deoxygenation and either a 4-ketoreductase or a 4-aminotransferase. From the D-forosamine biosynthetic pathway, the gene functions are: spnN is 2,3-dehydratase, spnO is 3-ketoreductase, spnQ is 3,4-dehydratase, spnR is transaminase, and spnS is dimethyltransferase. From the L-rhamnose biosynthetic pathway, the gene functions are: gtt is NDP-glucose synthase, gdh is NDP-glucose-4,6-dehydratase, epi is 3',5'-epimerase, and kre is 4'-ketoreductase. From the L-digitoxose biosynthetic pathway, the gene functions are: gtt is NDP-glucose synthase, gdh is NDP-glucose-4,6-dehydratase, epi is 5'-epimerase, kre is 4'-ketoreductase, tdh is 2,3-dehydratase, and tkr is 3-ketoreductase. For expression in S. diversa, each pathway was cloned into an integrative actinomycete vector under control of the ermEp* promoter. The glycosyltransferase genes spnG and spnP were cloned into the vector pUWL201 (Doumith, M.; Weingarten, P.; Wehmeier, U. F.; Salah-Bey, K.; Benhamou, B.; Capdevila, C.; Michel, J. M.; Piepersberg, W.; Raynal, M. C. Mol. Gen. Genet. 2000, 264 (4), 477-485). Pathways and glycosyltransferases were then combined in S. diversa. As discussed in the Examples, the recombinant S. diversa strains were utilized in bioconversion experiments to assess their capabilities to generate glycosylated spinosyn derivatives.
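The gene-function assignments above can be summarized as a small lookup table. The following Python sketch is illustrative only: the dictionary layout, the function name `predict_sugar_class`, and the rule that 2-deoxygenation corresponds to a 2,3-dehydratase plus a 3-ketoreductase and 3-deoxygenation to a 3,4-dehydratase are assumptions made here to mirror the text, not part of the specification.

```python
# Gene functions as listed in the text, keyed by the source pathway.
GENE_FUNCTIONS = {
    "D-forosamine": {
        "spnN": "2,3-dehydratase",
        "spnO": "3-ketoreductase",
        "spnQ": "3,4-dehydratase",
        "spnR": "transaminase",
        "spnS": "dimethyltransferase",
    },
    "L-rhamnose": {
        "gtt": "NDP-glucose synthase",
        "gdh": "NDP-glucose-4,6-dehydratase",
        "epi": "3',5'-epimerase",
        "kre": "4'-ketoreductase",
    },
    "L-digitoxose": {
        "gtt": "NDP-glucose synthase",
        "gdh": "NDP-glucose-4,6-dehydratase",
        "epi": "5'-epimerase",
        "kre": "4'-ketoreductase",
        "tdh": "2,3-dehydratase",
        "tkr": "3-ketoreductase",
    },
}

# Early steps shared by every deoxysugar pathway described in the text.
CORE = {"NDP-glucose synthase", "NDP-glucose-4,6-dehydratase"}


def predict_sugar_class(functions):
    """Predict the deoxysugar class from enzyme functions (hypothetical rule)."""
    present = set(functions)
    if not CORE <= present:
        return "no deoxysugar predicted"
    # Assumption: 2-deoxygenation needs a 2,3-dehydratase and a 3-ketoreductase.
    two_deoxy = {"2,3-dehydratase", "3-ketoreductase"} <= present
    # Assumption: 3-deoxygenation is marked by a 3,4-dehydratase.
    three_deoxy = "3,4-dehydratase" in present
    if two_deoxy and three_deoxy:
        return "2,3,6-trideoxysugar"
    if two_deoxy:
        return "2,6-dideoxysugar"
    return "6-deoxysugar"
```

Under these assumptions, the L-rhamnose functions classify as a 6-deoxysugar pathway and the L-digitoxose functions as a 2,6-dideoxysugar pathway, in line with the predictions stated for pathways A-D and E-H.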
In spinosyn biosynthesis, SpnG transfers L-rhamnose to the aglycone (compound 2 in Figure 33) and then SpnP transfers D-forosamine to the 17-pseudo-aglycone (17-PSA) (compound 3 in Figure 33). Therefore compounds (2) and (3) were used as substrates for bioconversion experiments using strains containing SpnG and SpnP, respectively. Conversion was monitored by HPLC-UV and HPLC-MS. Sugars and spinosyn compounds produced by the combinatorial biosynthetic pathways are also shown in Figure 33. Pathways A-D, in combination with SpnG, yielded two products with an apparent molecular weight of 548.3, consistent with attachment of a 6-deoxysugar to the aglycone. Larger amounts of both compounds were generated and their structures were elucidated by NMR to be alpha-L-rhamnosyl-17-PSA (compound 4 in Figure 33) and beta-D-quinovosyl-17-PSA (compound 5 in Figure 33). The relative levels at which these two compounds were produced were dependent on the particular pathway. Alpha-L-rhamnose is the predicted end product of pathways A and B, and so production of alpha-L-rhamnosyl-17-PSA (compound 4 of Figure 33) by these strains confirms the overall viability of the approach described. However, alpha-L-rhamnose is not a product predicted for pathways C and D, and production of D-quinovose was not predicted for any of pathways A-D. Furthermore, in all bioconversion experiments, the 17-keto-aglycone was also isolated. These unexpected sugars indicate that the methods and systems of the invention provide a novel combinatorial approach to sugar biosynthetic pathway construction that can generate new sugars or sugars not produced by the parent pathways. Consequently, by providing combinatorial sugar biosynthetic pathways, optionally with a heterologous glycosyltransferase, in a host that produces an aglycone or pseudoaglycone form(s) of an interesting biomolecule, the methods and systems of the invention provide libraries of new glycosylated forms of those biomolecules.
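The statement that a mass of 548.3 is consistent with a 6-deoxysugar attached to the aglycone can be checked with simple arithmetic. The sketch below is a hedged illustration: it assumes the spinosyn aglycone has the formula C24H34O5 (a value not given in this text, derived here from the published formula of spinosyn A), that the sugar is a 6-deoxyhexose C6H12O5, and that the reported value is a monoisotopic mass. The helper names are hypothetical.

```python
# Monoisotopic masses of the elements involved (standard values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


AGLYCONE = {"C": 24, "H": 34, "O": 5}    # assumed formula of the spinosyn aglycone
DEOXYHEXOSE = {"C": 6, "H": 12, "O": 5}  # rhamnose or quinovose (isomeric 6-deoxyhexoses)
WATER = {"H": 2, "O": 1}                 # lost when the glycosidic bond forms


def glycoside_mass(acceptor, sugar):
    """Mass of the glycoside: acceptor + sugar - water."""
    return mass(acceptor) + mass(sugar) - mass(WATER)


print(round(glycoside_mass(AGLYCONE, DEOXYHEXOSE), 1))  # 548.3
```

Because L-rhamnose and D-quinovose are isomers, the calculation cannot distinguish compounds 4 and 5, which is consistent with both products appearing at the same apparent mass and requiring NMR for structure assignment.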
Of the pathways (E-H) designed to generate dideoxy sugars, pathway H resulted in a novel product. The bioconversion experiment using this pathway in combination with SpnG yielded alpha-L-digitoxosyl-17-PSA (compound 6 of Figure 33). In contrast, pathways E-G gave the same products in essentially the same ratios as pathways A-C. Pathways I-L were assayed both for conversion of the aglycone (compound 2 of Figure 33) in combination with SpnG and for conversion of 17-PSA (compound 3 of Figure 33) with SpnP. Pathways I and J are predicted to produce amino sugars. In combination with SpnP, pathway I yielded spinosyns A (compound 1 of Figure 33) and B (compound 7 of Figure 33), whereas pathway J (which lacks the N-methyltransferase spnS) gave spinosyn C (compound 8 of Figure 33) as expected. SpnG is not known to accept amino sugars as substrates and no glycosylated spinosyn derivatives were detected in these experiments. Pathways K and L with either SpnG or SpnP did not result in formation of any new products.
Previous studies with S. spinosa have shown that glycosylation occurs in sequence, with attachment of the L-rhamnose to C9 first. Therefore incubation of the 9-pseudo-aglycone (9-PSA) (compound 9 of Figure 33) with strains containing SpnG and producing D-quinovose (pathway D), L-rhamnose (pathway A), and L-digitoxose (pathway H) was not expected to give doubly-glycosylated spinosyn derivatives. Surprisingly, three new spinosyns (alpha-L-rhamnosyl-spinosyn A (compound 10 of
Figure 33), beta-D-quinovosyl-spinosyn A (compound 11 of Figure 33), and alpha-L-digitoxosyl-spinosyn A (compound 12 of Figure 33)) were observed. Similar conversion ratios to those with the aglycone were obtained. All three novel derivatives were produced in good yield (using 20 mg of compound 9 as a substrate, 7.5 mg of compound 10, 3.5 mg of compound 11, and 1 mg of compound 12 were obtained) and their structures were confirmed by NMR after purification. To investigate the unexpected production of the D-quinovose derivative, and to identify the genes required for production of either L-rhamnose or D-quinovose in S. diversa, we deleted individual genes from pathways A-D. Biosynthesis of the two sugars was monitored through production of their respective 17-PSA analogues, compounds 4 and 5 of Figure 33, after feeding the aglycone. Relative production of compounds 4 and 5 by further modified combinatorial 6-deoxysugar pathways (in which genes were deleted) was determined and compared to the full combinatorial pathway. Pathway A yielded relative levels of 70 and 6 for compounds 4 and 5, respectively, whereas the modified pathway A (gtt-gdh-kre (S. diversa)) yielded 0 and 37. Pathway B yielded relative levels of 8 and 82 for compounds 4 and 5, respectively, whereas the modified pathway B (gtt-gdh-kre (S. spinosa)) yielded 0 and 64. Pathway C yielded relative levels of 9 and 38 for compounds 4 and 5, respectively, whereas the modified pathway C (gtt-gdh-epi (S. diversa)) yielded 5 and 76. Pathway D yielded relative levels of 6 and 55 for compounds 4 and 5, respectively, whereas the modified pathway D (gtt-gdh-epi (S. spinosa)) yielded 0 and 38. In the absence of an added heterologous pathway, relative levels of compounds 4 and 5 were 0 and 0, whereas with modified pathway gtt-gdh levels were 0 and 38. From these data it was concluded that expression of a glucose-1-phosphate thymidylyltransferase gene (gtt) and a dTDP-glucose-4,6-dehydratase gene
(gdh) was sufficient for production of D-quinovose, while additional expression of an epimerase was required for production of L-rhamnose. The final 4-ketoreduction step in D-quinovose biosynthesis was catalyzed by an unidentified gene product present in S. diversa. These data demonstrate additional aspects of the invention. Further heterologous biosynthetic pathways can be obtained by mutagenesis of the novel heterologous pathways, whether by deleting, adding or replacing one or more genes, by mutagenesis of the pathway such as by shuffling, or by mutagenesis of one or more genes in the pathway. Also in another embodiment, the host cell can provide one or more genes that encode enzymes that can participate in the pathway or can modify compounds produced by the heterologous pathway. One example is the use of a host enzyme for the final 4-ketoreduction step in D-quinovose biosynthesis. Further, the host cell can provide endogenous genes and encoded enzymes that allow transfer of the sugar products of the biosynthetic pathways to biomolecules of interest. In one embodiment the host cell makes one or more aglycones or pseudoaglycones of a biomolecule of interest, which are the targets for glycosylation by the host using the combinatorial methods and systems of the invention. In another embodiment, the host cell is provided with one or more aglycones or pseudoaglycones which are the target for glycosylation by the host using the combinatorial methods and systems of the invention. As demonstrated herein, both glycosylation steps have been obtained heterologously in S. diversa. Reconstruction of deoxysugar pathways led to the production by the host of two sugars, D-forosamine and L-rhamnose, that are naturally present in spinosyn but not in the host (S. diversa in this case). However, sugars naturally present neither in the host nor in the macrolide (spinosyn in this case), namely L-digitoxose, D-quinovose and di-demethyl-D-forosamine, were also readily produced and transferred.
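The reasoning behind the gene-deletion conclusion can be traced with a short tabulation. The Python sketch below is illustrative: the relative levels are those reported above for compounds 4 (L-rhamnose derivative) and 5 (D-quinovose derivative), the full pathways A-D are taken to contain gtt, gdh, epi and kre as described for the 6-deoxysugar pathways, and the function name and data layout are assumptions made for this example.

```python
FULL = frozenset({"gtt", "gdh", "epi", "kre"})

# construct -> (genes present, relative level of compound 4, relative level of compound 5)
RESULTS = {
    "A": (FULL, 70, 6),
    "A modified": (frozenset({"gtt", "gdh", "kre"}), 0, 37),
    "B": (FULL, 8, 82),
    "B modified": (frozenset({"gtt", "gdh", "kre"}), 0, 64),
    "C": (FULL, 9, 38),
    "C modified": (frozenset({"gtt", "gdh", "epi"}), 5, 76),
    "D": (FULL, 6, 55),
    "D modified": (frozenset({"gtt", "gdh", "epi"}), 0, 38),
    "gtt-gdh": (frozenset({"gtt", "gdh"}), 0, 38),
    "no pathway": (frozenset(), 0, 0),
}


def genes_common_to_producers(results, column):
    """Genes shared by every construct that made the compound.

    column 0 selects compound 4, column 1 selects compound 5.
    """
    producers = [genes for genes, *levels in results.values() if levels[column] > 0]
    return frozenset.intersection(*producers) if producers else frozenset()


print(sorted(genes_common_to_producers(RESULTS, 0)))  # ['epi', 'gdh', 'gtt']
print(sorted(genes_common_to_producers(RESULTS, 1)))  # ['gdh', 'gtt']
```

The intersection shows which heterologous genes every producing construct had in common; it identifies necessary genes only. The modified pathway D contains epi yet gave no compound 4, so the presence of an epimerase alone does not guarantee L-rhamnose production in these data.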
Production and attachment of beta-D-quinovose was unexpected, since all 6-deoxysugar pathways contained an epi gene and were therefore predicted to produce L-sugars. D-quinovose has also been found in an engineered strain of the picromycin producer S. venezuelae that carries an inactivated desosamine pathway (Borisova, S. A.; Zhao, L.; Sherman, D. H.; Liu, H. W. Org. Lett. 1999, 1 (1), 133-136). Since gtt and gdh are sufficient for D-quinovose production in S. diversa, we propose dNDP-4-keto-6-deoxyhexose as the intermediate sugar and the presence of an unidentified endogenous 4-ketoreductase. Surprisingly, the 5-epimerase from L-digitoxose biosynthesis was able to substitute for the 3,5-epimerase from L-rhamnose biosynthesis, providing evidence for the flexibility of this enzyme. Attachment of D-quinovose in the beta anomeric configuration can be explained by the fact that SpnG treats the D-sugar as an L-sugar, forcing it into a less favorable conformation for the transfer (Gaisser, S.; Martin, C. J.; Wilkinson, B.; Sheridan, R. M.; Lill, R. E.; Weston, A. J.; Ready, S. J.; Waldron, C.; Crouse, G. D.; Leadlay, P. F.; Staunton, J. Chem. Commun. (Camb.) 2002, (6), 618-619). After transfer, a change back to its more favorable conformation will result in the observed beta-D-quinovose. Similar flexibility has been observed for a number of other glycosyltransferases, suggesting that this is a general feature of these remarkable enzymes (see for example Gaisser, S. et al. Chem. Commun. (Camb.) 2002, (6), 618-619 and Hoffmeister, D.; Drager, G.; Ichinose, K.; Rohr, J.; Bechthold, A. Journal of the American Chemical Society 2003, 125 (16), 4678-4679 and Rodriguez, L.; Aguirrezabalaga, I.; Allende, N.; Brana, A. F.; Mendez, C.; Salas, J. A. Chem. Biol. 2002, 9 (6), 721-729). The work presented here illustrates the considerable substrate flexibility of the glycosyltransferases involved in spinosyn biosynthesis: SpnP transfers both 4-aminosugars and neutral sugars (Gaisser, S. et al. Chem. Commun.
(Camb.) 2002, (6), 618-619), in both the D- and L-forms; SpnG transfers both 6-deoxyhexoses and 2,6-dideoxyhexoses as D- and L-sugars. Recently, a family of spinosyn derivatives was isolated and characterized from Saccharopolyspora pogona, whose structures contained both neutral hexoses and aminohexoses attached to C17 (Lewer, P.; Hahn, D. R.; Karr, L.; Graupner, P. R.; Gilbert, J. R.; Worden, T. V.; Yao, R. C. PCT Publication WO0119840 published 2001). Possibly all sugars were transferred by a single, promiscuous endogenous glycosyltransferase. In another embodiment is provided a method of increasing the spinosyn-producing ability of a spinosyn-producing microorganism comprising the steps of 1) transforming the host cell with a recombinant DNA vector or portion thereof that produces the biosynthetic pathway enzymes for production of a spinosyn or a spinosyn variant or a precursor thereof. Of particular interest is use of a vector or portion thereof comprising a DNA sequence that codes for the expression of an activity that is rate limiting in the pathway. The microorganism transformed with the vector is incubated under conditions suitable for cell growth and division, expression of said DNA sequence, and production of spinosyn, its aglycone or pseudoaglycone. Such cells are robust cells for the methods and systems of the invention that provide modified spinosyns. In one embodiment the operative spinosyn biosynthetic genes in the genome of the host cell have been modified so that duplicate copies of at least one of the spinosyn biosynthetic genes are present. In another embodiment a spinosyn, aglycone or pseudoaglycone is provided by the host cell having spinosyn biosynthetic genes in its genome, wherein at least one of the genes has been inactivated, the rest of the genes being operational to produce a spinosyn variant.
In one embodiment the host cell has been transformed so that its genome contains operative spinosyn biosynthetic genes or operative genes to produce the macrolide, macrolide aglycone or pseudoaglycone of interest as a substrate of the present methods and systems. The heterologous glycosylation system described herein has proved its versatility and robustness to combine deoxysugar pathways, glycosyltransferases, and acceptor substrates in a very efficient way.
General Methods
In one aspect, the invention provides novel in vivo glycosylation systems. The invention can be practiced in conjunction with any method or protocol known in the art, which are well described in the scientific and patent literature. The discussion of the general methods given herein is intended for illustrative purposes only. Other alternative methods and embodiments will be apparent to those of skill in the art upon review of this disclosure.
General Techniques
The nucleic acids used to practice this invention, including RNA, iRNA, antisense nucleic acid, cDNA, genomic DNA, vectors, viruses or hybrids thereof, may be isolated from a variety of sources, genetically engineered, amplified, and/or expressed/generated recombinantly. Recombinant polypeptides (e.g., heterologous glycosyltransferase and/or heterologous deoxysugar pathway enzymes) generated from these nucleic acids can be individually isolated or cloned and tested for a desired activity, e.g., in vivo glycosylation of a natural product. Any recombinant expression system can be used, including bacterial, mammalian, yeast, insect or plant cell expression systems. Alternatively, these nucleic acids can be synthesized in vitro by well-known chemical synthesis techniques, as described in, e.g., Adams (1983) J. Am. Chem. Soc. 105:661; Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol. 68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 22:1859; U.S. Patent No. 4,458,066. Techniques for the manipulation of nucleic acids, such as, e.g., subcloning, labeling probes (e.g., random-primer labeling using Klenow polymerase, nick translation, amplification), sequencing, hybridization and the like are well described in the scientific and patent literature, see, e.g., Sambrook, ed., MOLECULAR CLONING: A LABORATORY MANUAL (2ND ED.), Vols.
1-3, Cold Spring Harbor Laboratory, (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel, ed. John Wiley & Sons, Inc., New York (1997); LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY: HYBRIDIZATION WITH NUCLEIC ACID PROBES, Part I. Theory and Nucleic Acid Preparation, Tijssen, ed. Elsevier, N.Y. (1993). Nucleic acids and proteins can be detected, confirmed and quantified by any of a number of means well known to those of skill in the art. General methods for detecting both nucleic acids and corresponding proteins include analytic biochemical methods such as spectrophotometry, radiography, electrophoresis, capillary electrophoresis, high performance liquid chromatography (HPLC), thin layer chromatography (TLC), hyperdiffusion chromatography, and the like, and various immunological methods such as fluid or gel precipitin reactions, immunodiffusion (single or double), immunoelectrophoresis, radioimmunoassays (RIAs), enzyme-linked immunosorbent assays (ELISAs), immunofluorescent assays, and the like. The detection of nucleic acids and polypeptides can be by well known methods such as Southern analysis, northern analysis, gel electrophoresis, PCR, radiolabeling, scintillation counting, and affinity chromatography. Another useful means of obtaining and manipulating nucleic acids used to practice the methods of the invention is to clone from genomic samples, and, if desired, screen and re-clone inserts isolated or amplified from, e.g., genomic clones or cDNA clones. Sources of nucleic acid used in the methods of the invention include genomic or cDNA libraries contained in, e.g., mammalian artificial chromosomes (MACs), see, e.g., U.S. Patent Nos. 5,721,118; 6,025,155; human artificial chromosomes, see, e.g., Rosenfeld (1997) Nat. Genet.
15:333-335; yeast artificial chromosomes (YAC); bacterial artificial chromosomes (BAC); P1 artificial chromosomes, see, e.g., Woon (1998) Genomics 50:306-316; P1-derived vectors (PACs), see, e.g., Kern (1997) Biotechniques 23:120-124; cosmids, recombinant viruses, phages or plasmids. In one aspect, a nucleic acid encoding a polypeptide used to practice the invention is assembled in appropriate phase with a leader sequence capable of directing secretion of the translated polypeptide or fragment thereof. The novel in vivo glycosylation systems and methods of the invention can use fusion proteins and nucleic acids encoding them. A polypeptide used to practice the invention can be fused to a heterologous peptide or polypeptide, such as N-terminal identification peptides which impart desired characteristics, such as increased stability or simplified purification. Peptides and polypeptides used to practice the invention can also be synthesized and expressed as fusion proteins with one or more additional domains linked thereto for, e.g., producing a more immunogenic peptide, to more readily isolate a recombinantly synthesized peptide, to identify and isolate antibodies and antibody-expressing B cells, and the like. Detection and purification facilitating domains include, e.g., metal chelating peptides such as polyhistidine tracts and histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAG extension/affinity purification system (Immunex Corp, Seattle WA). A cleavable linker sequence such as Factor Xa or enterokinase (Invitrogen, San Diego CA) can be included between a purification domain and the motif-comprising peptide or polypeptide to facilitate purification.
For example, an expression vector can include an epitope-encoding nucleic acid sequence linked to six histidine residues followed by a thioredoxin and an enterokinase cleavage site (see, e.g., Williams (1995) Biochemistry 34:1787-1797; Dobeli (1998) Protein Expr. Purif. 12:404-414). The histidine residues facilitate detection and purification while the enterokinase cleavage site provides a means for purifying the epitope from the remainder of the fusion protein. Technology pertaining to vectors encoding fusion proteins and application of fusion proteins are well described in the scientific and patent literature, see, e.g., Kroll (1993) DNA Cell. Biol., 12:441-53.
Transcriptional and translational control sequences
The nucleic acid (e.g., DNA) sequences used to practice the invention can be operatively linked to expression (e.g., transcriptional or translational) control sequence(s), e.g., promoters or enhancers, to direct or modulate RNA synthesis/expression or their own replication. The expression control sequence can be in an expression vector. Exemplary bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Exemplary eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein I. Promoters suitable for expressing a polypeptide in bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda PR promoter, the lambda PL promoter, promoters from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter. Eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters, the early and late SV40 promoter, LTRs from retroviruses, and the mouse metallothionein-I promoter. Other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses may also be used.
Tissue-Specific Plant Promoters
In practicing the methods of the invention, expression cassettes can be used and expressed in a tissue-specific manner, e.g., expressing a heterologous deoxysugar pathway or glycosyltransferase gene in a tissue-specific manner. The invention also provides plants, plant cells, extracts or seeds that express in vivo glycosylation systems of the invention in a tissue-specific manner. The tissue-specificity can be seed specific, stem specific, leaf specific, root specific, fruit specific and the like.
In one aspect, a constitutive promoter such as the CaMV 35S promoter can be used for expression of enzymes (e.g., a heterologous glycosyltransferase and a heterologous deoxysugar pathway) in specific parts of the plant or seed or throughout the plant. For example, for overexpression, a plant promoter fragment can be employed which will direct expression of a nucleic acid in some or all tissues of a plant, e.g., a regenerated plant. Such promoters are referred to herein as "constitutive" promoters and are active under most environmental conditions and states of development or cell differentiation. Examples of constitutive promoters used to practice the invention include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill. Such genes include, e.g., ACT11 from Arabidopsis (Huang (1996) Plant Mol. Biol. 33:125-139); Cat3 from Arabidopsis (GenBank No. U43147, Zhong (1996) Mol. Gen. Genet. 251:196-203); the gene encoding stearoyl-acyl carrier protein desaturase from Brassica napus (GenBank No. X74782, Solocombe (1994) Plant Physiol. 104:1167-1176); GPc1 from maize (GenBank No. X15596; Martinez (1989) J. Mol. Biol. 208:551-565); the Gpc2 from maize (GenBank No. U45855, Manjunath (1997) Plant Mol. Biol. 33:97-112); plant promoters described in U.S. Patent Nos. 4,962,028; 5,633,440. The invention can use tissue-specific or constitutive promoters derived from viruses which can include, e.g., the tobamovirus subgenomic promoter (Kumagai (1995) Proc. Natl. Acad. Sci.
USA 92:1679-1683); the rice tungro bacilliform virus (RTBV), which replicates only in phloem cells in infected rice plants, with its promoter which drives strong phloem-specific reporter gene expression; the cassava vein mosaic virus (CVMV) promoter, with highest activity in vascular elements, in leaf mesophyll cells, and in root tips (Verdaguer (1996) Plant Mol. Biol. 31:1129-1139). Alternatively, the methods of the invention can use plant promoters to direct expression of enzyme-coding (e.g., a heterologous glycosyltransferase and a heterologous deoxysugar pathway) nucleic acid in a specific tissue, organ or cell type (i.e., tissue-specific promoters). The methods of the invention can use plant or other promoters under environmental or developmental control. Exemplary promoters include transcriptional control elements inducible under various environmental conditions, including anaerobic conditions, elevated temperature, the presence of light, chemicals and/or hormones. In one aspect, the plants are sprayed with chemicals and/or hormones to induce expression of a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention. For example, the methods of the invention can use the drought-inducible promoter of maize (Busk (1997) supra); or, the cold, drought, and high salt inducible promoter from potato (Kirch (1997) Plant Mol. Biol. 33:897-909). In one aspect, tissue-specific promoters used in the methods of the invention promote transcription only within a certain time frame of developmental stage within that tissue, including, e.g., promoters described in: Blazquez (1998) Plant Cell
10:791-800, characterizing the Arabidopsis LEAFY gene promoter; Cardon (1997) Plant J. 12:367-77, describing the transcription factor SPL3, which recognizes a conserved sequence motif in the promoter region of the A. thaliana floral meristem identity gene AP1; and Mandel (1995) Plant Molecular Biology, Vol. 29, pp 995-1004, describing the meristem promoter eIF4. Tissue specific promoters which are active throughout the life cycle of a particular tissue can be used. In one aspect, the nucleic acids used to practice the invention are operably linked to a promoter active primarily only in cotton fiber cells. In one aspect, the nucleic acids used to practice the invention are operably linked to a promoter active primarily during the stages of cotton fiber cell elongation, e.g., as described by Rinehart (1996) supra. The nucleic acids can be operably linked to the Fbl2A gene promoter to be preferentially expressed in cotton fiber cells (Ibid.). See also, John (1997) Proc. Natl. Acad. Sci. USA 89:5769-5773; John, et al., U.S. Patent Nos. 5,608,148 and 5,602,321, describing cotton fiber-specific promoters and methods for the construction of transgenic cotton plants. Root-specific promoters may also be used to express the nucleic acids used to practice the invention. Examples of root-specific promoters include the promoter from the alcohol dehydrogenase gene (DeLisle (1990) Int. Rev. Cytol. 123:39-60). Other promoters that can be used to express the nucleic acids used to practice the invention include, e.g., ovule-specific, embryo-specific, endosperm-specific, integument-specific, seed coat-specific promoters, or some combination thereof; a leaf-specific promoter (see, e.g., Busk (1997) Plant J. 11:1285-1295, describing a leaf-specific promoter in maize); the ORF 13 promoter from Agrobacterium rhizogenes (which exhibits high activity in roots, see, e.g., Hansen (1997) supra); a maize pollen specific promoter (see, e.g., Guerrero (1990) Mol. Gen. Genet. 224:161-168); a tomato promoter active during fruit ripening, senescence and abscission of leaves and, to a lesser extent, of flowers (see, e.g., Blume (1997) Plant J. 12:731-746); a pistil-specific promoter from the potato SK2 gene (see, e.g., Ficker (1997) Plant Mol. Biol. 35:425-431); the Blec4 gene from pea, which is active in epidermal tissue of vegetative and floral shoot apices of transgenic alfalfa, making it a useful tool to target the expression of foreign genes to the epidermal layer of actively growing shoots or fibers; the ovule-specific BEL1 gene (see, e.g., Reiser (1995) Cell 83:735-742, GenBank No. U39944); and/or, the promoter in Klee, U.S. Patent No. 5,589,583, describing a plant promoter region capable of conferring high levels of transcription in meristematic tissue and/or rapidly dividing cells. Alternatively, plant promoters which are inducible upon exposure to plant hormones, such as auxins, can be used to express nucleic acids used to practice the invention.
For example, the invention can use the auxin-response elements E1 promoter fragment (AuxREs) in the soybean (Glycine max L.) (Liu (1997) Plant Physiol. 115:397-407); the auxin-responsive Arabidopsis GST6 promoter (also responsive to salicylic acid and hydrogen peroxide) (Chen (1996) Plant J. 10:955-966); the auxin-inducible parC promoter from tobacco (Sakai (1996) 37:906-913); a plant biotin response element (Streit (1997) Mol. Plant Microbe Interact. 10:933-937); and, the promoter responsive to the stress hormone abscisic acid (Sheen (1996) Science 274:1900-1902). Nucleic acids used to practice the invention can also be operably linked to plant promoters which are inducible upon exposure to chemical reagents which can be applied to the plant, such as herbicides or antibiotics. For example, the maize In2-2 promoter, activated by benzenesulfonamide herbicide safeners, can be used (De Veylder (1997) Plant Cell Physiol. 38:568-577); application of different herbicide safeners induces distinct gene expression patterns, including expression in the root, hydathodes, and the shoot apical meristem. Coding sequence can be under the control of, e.g., a tetracycline-inducible promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. (oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or, a salicylic acid-responsive element (Stange (1997) Plant J. 11:1315-1324). Using chemically- (e.g., hormone- or pesticide-) induced promoters, i.e., promoters responsive to a chemical which can be applied to the transgenic plant in the field, expression of a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention can be induced at a particular stage of development of the plant. Thus, the invention also provides for transgenic plants containing an inducible gene encoding for a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention, where the plants' host range is limited to target plant species, such as corn, rice, barley, wheat, potato or other crops, inducible at any stage of development of the crop. In one aspect, a tissue-specific plant promoter drives expression of operably linked sequences in tissues other than the target tissue. Thus, a tissue-specific promoter is one that drives expression preferentially in the target tissue or cell type, but may also lead to some expression in other tissues as well. The nucleic acids used to practice the invention can also be operably linked to plant promoters which are inducible upon exposure to chemical reagents. These reagents include, e.g., herbicides, synthetic auxins, or antibiotics which can be applied, e.g., sprayed, onto transgenic plants.
Inducible expression of a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention will allow the grower to induce production in a plant, a plant cell, a seed, a fruit and the like. Inducible expression will also allow selection of plants with fungi-resistant properties. The development of toxin resistant plants, seeds, fruits, etc. can be controlled in this manner. In this way the invention provides the means to facilitate the growth, harvesting and storage of plants and plant parts by constitutive or inducible enzyme detoxification. For example, in various aspects, the maize In2-2 promoter, activated by benzenesulfonamide herbicide safeners, is used (De Veylder (1997) Plant Cell Physiol. 38:568-577); application of different herbicide safeners induces distinct gene expression patterns, including expression in the root, hydathodes, and the shoot apical meristem.
Coding sequences of nucleic acids also can be under the control of a tetracycline-inducible promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. (oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or, a salicylic acid-responsive element (Stange (1997) Plant J. 11:1315-1324). Expression vectors and cloning vehicles Expression vectors and cloning vehicles comprising nucleic acids encoding enzymes (e.g., glycosyltransferases or deoxysugar pathways) are used to practice the invention. Sequences encoding glycosyltransferases or deoxysugar pathways can be cloned into any expression vehicle. Expression vectors and cloning vehicles used to practice the invention can comprise viral particles, baculovirus, phage, plasmids, phagemids, cosmids, fosmids, bacterial artificial chromosomes, viral DNA (e.g., vaccinia, adenovirus, fowl pox virus, pseudorabies and derivatives of SV40), P1-based artificial chromosomes, yeast plasmids, yeast artificial chromosomes, and any other vectors specific for specific hosts of interest (such as Bacillus, Aspergillus and yeast). Vectors used to practice the invention can include chromosomal, non-chromosomal and synthetic DNA sequences. Large numbers of suitable vectors are known to those of skill in the art, and are commercially available. Exemplary vectors include: bacterial: pQE vectors (Qiagen), pBluescript plasmids, pNH vectors, lambda-ZAP vectors (Stratagene); ptrc99a, pKK223-3, pDR540, pRIT2T (Pharmacia); eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, pSVLSV40 (Pharmacia). However, any other plasmid or other vector may be used so long as it is replicable and viable in the host. Low copy number or high copy number vectors may be employed with the present invention. The expression vector can comprise a promoter, a ribosome binding site for translation initiation and a transcription terminator. 
The vector may also include appropriate sequences for amplifying expression. Mammalian expression vectors can comprise an origin of replication, any necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking non-transcribed sequences. In some aspects, DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required non-transcribed genetic elements. In one aspect, the expression vectors contain one or more selectable marker genes to permit selection of host cells containing the vector. Such selectable markers include genes encoding dihydrofolate reductase or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli, and the S. cerevisiae TRP1 gene. Promoter regions can be selected from any desired gene using chloramphenicol transferase (CAT) vectors or other vectors with selectable markers. Vectors for expressing a polypeptide or fragment thereof in eukaryotic cells can also contain enhancers to increase expression levels. Enhancers are cis-acting elements of DNA, usually from about 10 to about 300 bp in length that act on a promoter to increase its transcription. Examples include the SV40 enhancer on the late side of the replication origin bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and the adenovirus enhancers. A nucleic acid sequence can be inserted into a vector by a variety of procedures. In general, the sequence is ligated to the desired position in the vector following digestion of the insert and the vector with appropriate restriction endonucleases. Alternatively, blunt ends in both the insert and the vector may be ligated. A variety of cloning techniques are known in the art, e.g., as described in Ausubel and Sambrook. 
Such procedures and others are deemed to be within the scope of those skilled in the art. The vector can be in the form of a plasmid, a viral particle, or a phage. Other vectors include chromosomal, non-chromosomal and synthetic DNA sequences, derivatives of SV40; bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies. A variety of cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by, e.g., Sambrook. Particular bacterial vectors which can be used include the commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), GEM1 (Promega Biotec, Madison, WI, USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174, pBluescript II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia), pKK232-8 and pCM7. Particular eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, and pSVL (Pharmacia). However, any other vector may be used as long as it is replicable and viable in the host cell. The nucleic acids used to practice the invention can be expressed in expression cassettes, vectors or viruses and transiently or stably expressed in plant cells and seeds. One exemplary transient expression system uses episomal expression systems, e.g., cauliflower mosaic virus (CaMV) viral RNA generated in the nucleus by transcription of an episomal mini-chromosome containing supercoiled DNA, see, e.g., Covey (1990) Proc. Natl. Acad. Sci. USA 87:1633-1637. Alternatively, coding sequences, i.e., all or sub-fragments of sequences encoding polypeptides having a glycosyltransferase or deoxysugar pathway activity can be inserted into a plant host cell genome becoming an integral part of the host chromosomal DNA. 
Sense or antisense transcripts can be expressed in this manner. A vector comprising sequences (e.g., promoters or coding regions) from nucleic acids used to practice the invention can comprise a marker gene that confers a selectable phenotype on a plant cell or a seed. For example, the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or Basta. In one aspect, the expression vectors comprising nucleic acids encoding a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention are expressed (inducibly or constitutively) in plants and plant parts, including cells, seeds, fruits, leaves, roots, flowers and the like. Expression vectors capable of expressing nucleic acids and proteins in plants are well known in the art, and can include, e.g., vectors from Agrobacterium spp., potato virus X (see, e.g., Angell (1997) EMBO J. 16:3675-3684), tobacco mosaic virus (see, e.g., Casper (1996) Gene 173:69-73), tomato bushy stunt virus (see, e.g., Hillman (1989) Virology 169:42-50), tobacco etch virus (see, e.g., Dolja (1997) Virology 234:243-252), bean golden mosaic virus (see, e.g., Morinaga (1993) Microbiol Immunol. 37:471-476), cauliflower mosaic virus (see, e.g., Cecchini (1997) Mol. Plant Microbe Interact. 10:1094-1101), maize Ac/Ds transposable element (see, e.g., Rubin (1997) Mol. Cell. Biol. 17:6294-6302; Kunze (1996) Curr. Top. Microbiol. Immunol. 204:161-194), and the maize suppressor-mutator (Spm) transposable element (see, e.g., Schlappi (1996) Plant Mol. Biol. 32:717-725); and derivatives thereof. In one aspect, the expression vector can have two replication systems to allow it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification. 
Furthermore, for integrating expression vectors, the expression vector can contain at least one sequence homologous to the host cell genome. It can contain two homologous sequences which flank the expression construct. The integrating vector can be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art. Expression vectors used to practice the invention may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed, e.g., genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers can also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways. Host cells and transformed cells In one aspect, transformed cells comprising a nucleic acid sequence encoding glycosyltransferases or deoxysugar pathways, or capable of producing a novel compound of the invention, are used to practice the invention. In one aspect, a transformed cell is used to generate a polypeptide having glycosyltransferase or deoxysugar pathway activity, which can in turn be applied to or administered to an animal, a plant, a food, a feed, a patient, and the like, as described below. In another aspect, a transformed cell that generates (e.g., secretes) a glycosyltransferase or deoxysugar pathway activity, or a novel compound of the invention, is itself applied to a plant, a food, a feed, and the like. The host cell may be any of the host cells familiar to those skilled in the art, including prokaryotic cells, eukaryotic cells, such as bacterial cells, fungal cells, yeast cells, mammalian cells, insect cells, or plant cells. 
A host cell can be an actinomycete (i.e., any organism from the order Actinomycetales), e.g., a recombinantly engineered actinomycete, comprising a heterologous glycosyltransferase and a heterologous deoxysugar pathway. In one aspect, the actinomycete is a Streptomyces, such as a Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugaensis, Streptomyces lavendulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygroscopicus, Streptomyces lasaliensis, Streptomyces venezuelae, Streptomyces antibioticus, Streptomyces albus, Streptomyces tsukubaensis, Streptomyces galilaeus or Streptomyces diversa. In one aspect, the actinomycete is an actinomycete plant endophyte. In one aspect, the Actinomycetales is from the family Micromonosporaceae, or the genus Actinomyces, Actinomadura or Nocardia. Micromonosporaceae are preferably Micromonosporaceae, Actinoplanes, Dactylosporangium, Micromonospora or Verrucosispora. Alternative host cells include Pseudonocardineae, Actinosynnema, Lechevalieria, Saccharothrix, Actinoalloteichus, Actinopolyspora, Amycolatopsis, Kibdelosporangium, Pseudonocardia, Saccharomonospora, Saccharopolyspora, and Streptoalloteichus. Alternative host cells include Streptomycetaceae, including Kitasatospora and Streptomyces. Alternative host cells include Microbispora and Microtetraspora. Exemplary bacterial cells include E. coli, Streptomyces, Bacillus subtilis, Bacillus cereus, Salmonella typhimurium and various species within the genera Bacillus, Streptomyces, and Staphylococcus. Exemplary insect cells include Drosophila S2 and Spodoptera Sf9. 
Exemplary animal cells include CHO, COS or Bowes melanoma or any mouse or human cell line. The selection of an appropriate host is within the abilities of those skilled in the art. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, e.g., Weising (1988) Ann. Rev. Genet. 22:421-477; U.S. Patent No. 5,750,870. The vector can be introduced into the host cells using any of a variety of techniques, including transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Particular methods include calcium phosphate transfection, DEAE-Dextran mediated transfection, lipofection, or electroporation, see, e.g., Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986). In one aspect, the nucleic acids or vectors used to practice the invention are introduced into the cells for screening; thus, the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by the targeted cell type. Exemplary methods include CaPO4 precipitation, liposome fusion, lipofection (e.g., LIPOFECTIN™), electroporation, viral infection, etc. The candidate nucleic acids may stably integrate into the genome of the host cell (for example, with retroviral introduction) or may exist either transiently or stably in the cytoplasm (i.e., through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.). As many pharmaceutically important screens require human or model mammalian cell targets, retroviral vectors capable of transfecting such targets can also be used. 
Where appropriate, the engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes encoding a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention. Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter may be induced by appropriate means (e.g., temperature shift or chemical induction) and the cells may be cultured for an additional period to allow them to produce the desired polypeptide or fragment thereof. Cells can be harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract is retained for further purification. Microbial cells employed for expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to those skilled in the art. The expressed polypeptide or fragment thereof can be recovered and purified from recombinant cell cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography. Protein refolding steps can be used, as necessary, in completing configuration of the polypeptide. If desired, high performance liquid chromatography (HPLC) can be employed for final purification steps. Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts and other cell lines capable of expressing proteins from a compatible vector, such as the C127, 3T3, CHO, HeLa and BHK cell lines. 
The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Depending upon the host employed in a recombinant production procedure, the polypeptides produced by host cells containing the vector may be glycosylated or may be non-glycosylated.
Polypeptides used to practice the invention may or may not also include an initial methionine amino acid residue. In one embodiment are included enzymes that can modify the sugar moiety before or after transfer to the aglycone or pseudoaglycone. In further embodiments the new glycone produced can be subjected to further processing or derivatization by the host cell or post isolation. By "aglycone" is meant a molecule, such as a polyketide, a macrolide, a peptide or a mixed polyketide-peptide or macrolide-peptide that can be processed, for example, by a glycosyltransferase, to receive at least one carbohydrate (including a natural or unnatural sugar), e.g., at least one activated sugar moiety. In alternative aspects, sugar moieties of or added to the aglycone or pseudoaglycone can include forms modified by enzymes such as methyl transferases, hydroxylases, or epoxidases, before or after a glycosyltransferase step. Diversity in the glycones of the invention can be achieved by modifying one or more glycone- (or aglycone- or pseudoaglycone-) modifying endogenous genes in a host cell. Host cells can be engineered to inactivate or activate all or parts of an endogenous metabolic pathway, e.g. a deoxysugar biosynthetic pathway, a glycosyltransferase, a macrolide biosynthetic pathway, by means well known in the art. For example, an antisense molecule against an endogenous polypeptide or enzyme of interest in a metabolic pathway can be administered to the host cell or expressed in the host cell to modulate expression of that polypeptide. In one embodiment a pathway or pathways are modulated to allow production of an aglycone or pseudoaglycone of interest that is then a substrate for at least one enzyme and/or pathway of the present invention. 
Other methods are well known to inactivate a particular enzyme or pathway of interest, for example by gene disruption of the structural gene encoding the target enzyme or of a regulatory gene or cis region controlling expression of that enzyme or its pathway, e.g., operon. In another embodiment a particular active domain of an enzyme, e.g. a domain of a PKS, is inactivated to yield a host cell whose modified PKS produces a modified product that is then a substrate for use in the invention. Thus a host cell can either naturally produce a desired aglycone or pseudoaglycone or be modified to produce one. Additionally the host cell can be modified to further modify a glycone, aglycone or pseudoaglycone produced by the glycosyltransferase or sugar pathway of the invention. Cell-free translation systems can also be used to practice the invention.
Cell-free translation systems can use mRNAs transcribed from a DNA construct comprising a promoter operably linked to a nucleic acid encoding the polypeptide or fragment thereof. In some aspects, the DNA construct may be linearized prior to conducting an in vitro transcription reaction. The transcribed mRNA is then incubated with an appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to produce the desired polypeptide or fragment thereof. The expression vectors can contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.
Modification of Nucleic Acids The invention provides methods of generating variants of the in vivo glycosylation systems of the invention, including heterologous glycosyltransferases and heterologous deoxysugar pathways, using, e.g., evolution technologies, such as GSSM™ and GENEREASSEMBLY™. GSSM™ is used to create all possible codon substitutions at each position within a given gene. GENEREASSEMBLY™ is routinely used to create libraries of gene chimeras from functionally homologous parental genes, or to recombine desirable mutants identified by GSSM™. The gene fragments can be generated by PCR or from synthetic oligonucleotides. Gene chimeras are formed by ligation of pooled homologous fragments. All fragments in a pool have identical overhangs and therefore all combinations are formed with the same probability. The overhangs are directly engineered into the oligo fragments or generated by class IIS restriction sites included in the PCR primers. The complexity of the resulting library can be customized to fit the needs of the individual project. It can be used for the evolution of protein domains, entire enzymes, or even pathways. GENEREASSEMBLY™ works equally well with all parental DNA's independent of sequence homology. This allows for the recombination of proteins even at non-conserved amino acids. In one aspect, the genetic composition of a cell is altered by, e.g., modification of a homologous gene ex vivo, followed by its reinsertion into the cell. In practicing the methods of the invention, a nucleic acid can be altered by any means. For example, random or stochastic methods, or, non-stochastic, or "directed evolution," methods, see, e.g., U.S. Patent No. 6,361,974. Methods for random mutation of genes are well known in the art, see, e.g., U.S. Patent No. 5,830,696. For example, mutagens can be used to randomly mutate a gene. 
Mutagens include, e.g., ultraviolet light or gamma irradiation, or a chemical mutagen, e.g., mitomycin, nitrous acid, photoactivated psoralens, alone or in combination, to induce DNA breaks amenable to repair by recombination. Other chemical mutagens include, for example, sodium bisulfite, nitrous acid, hydroxylamine, hydrazine or formic acid. Other mutagens are analogues of nucleotide precursors, e.g., nitrosoguanidine, 5-bromouracil, 2-aminopurine, or acridine. These agents can be added to a PCR reaction in place of the nucleotide precursor thereby mutating the sequence. Intercalating agents such as proflavine, acriflavine, quinacrine and the like can also be used. Any technique in molecular biology can be used, e.g., random PCR mutagenesis, see, e.g., Rice (1992) Proc. Natl. Acad. Sci. USA 89:5467-5471; or, combinatorial multiple cassette mutagenesis, see, e.g., Crameri (1995) Biotechniques
18:194-196. Alternatively, nucleic acids, e.g., genes, can be reassembled after random, or "stochastic," fragmentation, see, e.g., U.S. Patent Nos. 6,291,242; 6,287,862; 6,287,861; 5,955,358; 5,830,721; 5,824,514; 5,811,238; 5,605,793. In alternative aspects, modifications, additions or deletions are introduced by error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis™ (GSSM™), synthetic ligation reassembly (SLR), recombination, recursive sequence recombination, phosphorothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation, and/or a combination of these and other methods. 
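As an in silico illustration only, the two library concepts described above (every codon substitution at each position of a gene, and chimeras formed by joining one fragment from each pool of homologous fragments) can be sketched as simple enumerations. This is a minimal sketch under stated assumptions, not the GSSM™ or GENEREASSEMBLY™ laboratory procedure: the function names `saturation_variants` and `reassembly_library`, and the toy sequences used with them, are hypothetical and do not appear in this disclosure.

```python
from itertools import product

BASES = "ACGT"
# All 64 possible codons (hypothetical helper table for this sketch).
CODONS = ["".join(c) for c in product(BASES, repeat=3)]


def saturation_variants(orf):
    """Yield (codon_index, new_codon, variant) for every single-codon
    substitution in an in-frame coding sequence.

    Assumption: `orf` is a DNA string whose length is a multiple of 3.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    for i in range(0, len(orf), 3):
        original = orf[i:i + 3]
        for codon in CODONS:
            if codon != original:
                # Replace exactly one codon, leave the rest untouched.
                yield i // 3, codon, orf[:i] + codon + orf[i + 3:]


def reassembly_library(fragment_pools):
    """Return every chimera obtained by taking one fragment from each
    ordered pool and joining them in pool order.

    Assumption: fragments within a pool are interchangeable, which stands
    in for the identical overhangs described in the text.
    """
    return ["".join(choice) for choice in product(*fragment_pools)]
```

For a gene of n codons, the first function enumerates 63 × n single-codon variants; for pools of sizes k1, k2, ..., the second returns k1 × k2 × ... chimeras, each represented once, mirroring the statement that all combinations form with the same probability.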
The following publications describe a variety of recursive recombination procedures and/or methods which can be incorporated into the methods of the invention: Stemmer (1999) "Molecular breeding of viruses for targeting and other clinical properties" Tumor Targeting 4:1-4; Ness (1999) Nature Biotechnology 17:893-896; Chang (1999) "Evolution of a cytokine using DNA family shuffling" Nature Biotechnology 17:793-797; Minshull (1999) "Protein evolution by molecular breeding" Current Opinion in Chemical Biology 3:284-290; Christians (1999) "Directed evolution of thymidine kinase for AZT phosphorylation using DNA family shuffling" Nature Biotechnology 17:259-264; Crameri (1998) "DNA shuffling of a family of genes from diverse species accelerates directed evolution" Nature 391:288-291; Crameri (1997) "Molecular evolution of an arsenate detoxification pathway by DNA shuffling," Nature
Biotechnology 15:436-438; Zhang (1997) "Directed evolution of an effective fucosidase from a galactosidase by DNA shuffling and screening" Proc. Natl. Acad. Sci. USA
94:4504-4509; Patten et al. (1997) "Applications of DNA Shuffling to Pharmaceuticals and Vaccines" Current Opinion in Biotechnology 8:724-733; Crameri et al. (1996) "Construction and evolution of antibody-phage libraries by DNA shuffling" Nature Medicine 2:100-103; Gates et al. (1996) "Affinity selective isolation of ligands from peptide libraries through display on a lac repressor 'headpiece dimer'" Journal of Molecular Biology 255:373-386; Stemmer (1996) "Sexual PCR and Assembly PCR" In: The Encyclopedia of Molecular Biology. VCH Publishers, New York, pp. 447-457; Crameri and Stemmer (1995) "Combinatorial multiple cassette mutagenesis creates all the permutations of mutant and wildtype cassettes" BioTechniques 18:194-195; Stemmer et al. (1995) "Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides" Gene 164:49-53; Stemmer (1995) "The Evolution of Molecular Computation" Science 270:1510; Stemmer (1995) "Searching Sequence Space" Bio/Technology 13:549-553; Stemmer (1994) "Rapid evolution of a protein in vitro by DNA shuffling" Nature 370:389-391; and Stemmer (1994) "DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution." Proc. Natl. Acad. Sci. USA 91:10747-10751. Mutational methods of generating diversity include, for example, site-directed mutagenesis (Ling et al. (1997) "Approaches to DNA mutagenesis: an overview" Anal Biochem. 254(2):157-178; Dale et al. (1996) "Oligonucleotide-directed random mutagenesis using the phosphorothioate method" Methods Mol. Biol. 57:369-374; Smith (1985) "In vitro mutagenesis" Ann. Rev. Genet. 19:423-462; Botstein & Shortle (1985) "Strategies and applications of in vitro mutagenesis" Science 229:1193-1201; Carter (1986) "Site-directed mutagenesis" Biochem. J. 237:1-7; and Kunkel (1987) "The
efficiency of oligonucleotide directed mutagenesis" in Nucleic Acids & Molecular Biology (Eckstein, F. and Lilley, D. M. J. eds., Springer Verlag, Berlin)); mutagenesis using uracil containing templates (Kunkel (1985) "Rapid and efficient site-specific mutagenesis without phenotypic selection" Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) "Rapid and efficient site-specific mutagenesis without phenotypic selection" Methods in Enzymol. 154:367-382; and Bass et al. (1988) "Mutant Trp repressors with new DNA-binding specificities" Science 242:240-245); oligonucleotide-directed mutagenesis (Methods in Enzymol. 100:468-500 (1983); Methods in Enzymol. 154:329-350 (1987); Zoller & Smith (1982) "Oligonucleotide-directed mutagenesis using M13-derived vectors: an efficient and general procedure for the production of point mutations in any DNA fragment" Nucleic Acids Res. 10:6487-6500; Zoller & Smith (1983) "Oligonucleotide-directed mutagenesis of DNA fragments cloned into M13 vectors" Methods in Enzymol. 100:468-500; and Zoller & Smith (1987) "Oligonucleotide-directed mutagenesis: a simple method using two oligonucleotide primers and a single-stranded DNA template" Methods in Enzymol. 154:329-350); phosphorothioate-modified DNA mutagenesis (Taylor et al. (1985) "The use of phosphorothioate-modified DNA in restriction enzyme reactions to prepare nicked DNA" Nucl. Acids Res. 13:8749-8764; Taylor et al. (1985) "The rapid generation of oligonucleotide-directed mutations at high frequency using phosphorothioate-modified DNA" Nucl. Acids Res. 13:8765-8787; Nakamaye (1986) "Inhibition of restriction endonuclease Nci I cleavage by phosphorothioate groups and its application to oligonucleotide-directed mutagenesis" Nucl. Acids Res. 14:9679-9698; Sayers et al. (1988) "5'-3' Exonucleases in phosphorothioate-based oligonucleotide-directed mutagenesis" Nucl. Acids Res. 16:791-802; and Sayers et al. 
(1988) "Strand specific cleavage of phosphorothioate-containing DNA by reaction with restriction endonucleases in the presence of ethidium bromide" Nucl. Acids Res. 16: 803-814); mutagenesis using gapped duplex DNA (Kramer et al. (1984) "The gapped duplex DNA approach to oligonucleotide-directed mutation construction" Nucl. Acids Res. 12: 9441-9456; Kramer & Fritz (1987) Methods in
Enzymol. "Oligonucleotide-directed construction of mutations via gapped duplex DNA" 154:350-367; Kramer et al. (1988) "Improved enzymatic in vitro reactions in the gapped duplex DNA approach to oligonucleotide-directed construction of mutations" Nucl. Acids Res. 16:7207; and Fritz et al. (1988) "Oligonucleotide-directed construction of mutations: a gapped duplex DNA procedure without enzymatic reactions in vitro" Nucl. Acids Res. 16:6987-6999). Additional protocols that can be used to practice the invention include point mismatch repair (Kramer (1984) "Point Mismatch Repair" Cell 38:879-887), mutagenesis using repair-deficient host strains (Carter et al. (1985) "Improved oligonucleotide site-directed mutagenesis using M13 vectors" Nucl. Acids Res. 13:4431-4443; and Carter (1987) "Improved oligonucleotide-directed mutagenesis using M13 vectors" Methods in Enzymol. 154:382-403), deletion mutagenesis (Eghtedarzadeh (1986) "Use of oligonucleotides to generate large deletions" Nucl. Acids Res. 14:5115), restriction-selection and restriction-purification (Wells et al. (1986) "Importance of hydrogen-bond formation in stabilizing the transition state of subtilisin" Phil. Trans. R. Soc. Lond. A 317:415-423), mutagenesis by total gene synthesis (Nambiar et al. (1984) "Total synthesis and cloning of a gene coding for the ribonuclease S protein" Science 223:1299-1301; Sakamar and Khorana (1988) "Total synthesis and expression of a gene for the α-subunit of bovine rod outer segment guanine nucleotide-binding protein (transducin)" Nucl. Acids Res. 14:6361-6372; Wells et al. (1985) "Cassette mutagenesis: an efficient method for generation of multiple mutations at defined sites" Gene 34:315-323; and Grundstrom et al. (1985) "Oligonucleotide-directed mutagenesis by microscale "shot-gun" gene synthesis" Nucl. Acids Res. 
13:3305-3316), double-strand break repair (Mandecki (1986) "Oligonucleotide-directed double-strand break repair in plasmids of Escherichia coli: a method for site-specific mutagenesis" Proc. Natl. Acad. Sci. USA 83:7177-7181; Arnold (1993) "Protein engineering for unusual environments" Current Opinion in Biotechnology 4:450-455). Additional details on many of the above methods can be found in Methods in Enzymology Volume 154, which also describes useful controls for trouble-shooting problems with various mutagenesis methods. Protocols that can be used to practice the invention are described, e.g., in U.S. Patent Nos. 5,605,793 to Stemmer (Feb. 25, 1997), "Methods for In Vitro Recombination;" U.S. Pat. No. 5,811,238 to Stemmer et al. (Sep. 22, 1998) "Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination;" U.S. Pat. No. 5,830,721 to Stemmer et al. (Nov. 3, 1998), "DNA Mutagenesis by Random Fragmentation and Reassembly;" U.S. Pat. No. 5,834,252 to Stemmer, et al. (Nov. 10, 1998) "End-Complementary Polymerase Reaction;" U.S. Pat. No. 5,837,458 to Minshull, et al. (Nov. 17, 1998), "Methods and Compositions for Cellular and Metabolic Engineering;" WO 95/22625, Stemmer and Crameri,
"Mutagenesis by Random Fragmentation and Reassembly;" WO 96/33207 by Stemmer and Lipschutz "End Complementary Polymerase Chain Reaction;" WO 97/20078 by Stemmer and Crameri "Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination;" WO 97/35966 by Minshull and Stemmer, "Methods and Compositions for Cellular and Metabolic Engineering;" WO 99/41402 by Punnonen et al. "Targeting of Genetic Vaccine Vectors;" WO 99/41383 by Punnonen et al. "Antigen Library Immunization;" WO 99/41369 by Punnonen et al. "Genetic Vaccine Vector Engineering;" WO 99/41368 by Punnonen et al. "Optimization of Immunomodulatory Properties of Genetic Vaccines;" EP 752008 by Stemmer and Crameri, "DNA Mutagenesis by Random Fragmentation and Reassembly;" EP 0932670 by Stemmer "Evolving Cellular DNA Uptake by Recursive Sequence Recombination;"
WO 99/23107 by Stemmer et al., "Modification of Virus Tropism and Host Range by
Viral Genome Shuffling;" WO 99/21979 by Apt et al., "Human Papillomavirus Vectors;"
WO 98/31837 by del Cardayre et al. "Evolution of Whole Cells and Organisms by Recursive Sequence Recombination;" WO 98/27230 by Patten and Stemmer, "Methods and Compositions for Polypeptide Engineering;" WO 98/27230 by Stemmer et al., "Methods for Optimization of Gene Therapy by Recursive Sequence Shuffling and Selection," WO 00/00632, "Methods for Generating Highly Diverse Libraries," WO 00/09679, "Methods for Obtaining in Vitro Recombined Polynucleotide Sequence Banks and Resulting Sequences," WO 98/42832 by Arnold et al., "Recombination of Polynucleotide Sequences Using Random or Defined Primers," WO 99/29902 by Arnold et al., "Method for Creating Polynucleotide and Polypeptide Sequences," WO 98/41653 by Vind, "An in Vitro Method for Construction of a DNA Library," WO 98/41622 by Borchert et al., "Method for Constructing a Library Using DNA Shuffling," and WO
98/42727 by Pati and Zarling, "Sequence Alterations using Homologous Recombination." Protocols that can be used to practice the invention (providing details regarding various diversity generating methods) are described, e.g., in U.S. Patent application serial no. (USSN) 09/407,800, "SHUFFLING OF CODON ALTERED GENES" by Patten et al. filed Sep. 28, 1999; "EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION" by del Cardayre et al., United States Patent No. 6,379,964; "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., United States Patent Nos. 6,319,714; 6,368,861; 6,376,246; 6,423,542; 6,426,224 and PCT/US00/01203; "USE OF CODON-VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING" by Welch et al., United States Patent No. 6,436,675; "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS" by Selifonov et al., filed Jan. 18, 2000, (PCT/US00/01202) and, e.g. "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED
CHARACTERISTICS" by Selifonov et al., filed Jul. 18, 2000 (U.S. Ser. No. 09/618,579); "METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS" by Selifonov and Stemmer, filed Jan. 18, 2000 (PCT/US00/01138); and "SINGLE-STRANDED NUCLEIC ACID TEMPLATE-MEDIATED RECOMBINATION AND NUCLEIC ACID FRAGMENT ISOLATION" by Affholter, filed Sep. 6, 2000 (U.S. Ser. No. 09/656,549); and United States Patent Nos.
6,177,263; 6,153,410.
Non-stochastic, or "directed evolution," methods, including, e.g., saturation mutagenesis (e.g., Gene Site Saturation Mutagenesis™ (GSSM™)), synthetic ligation reassembly (SLR), or a combination thereof, can be used to modify the nucleic acids encoding a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention. Polypeptides encoded by the modified nucleic acids can be screened for a new property, e.g., stability, before testing for a glycosyltransferase or a deoxysugar pathway or other activity. Any testing modality or protocol can be used, e.g., using a capillary array platform. See, e.g., U.S. Patent Nos. 6,361,974; 6,280,926; 5,939,250.
Saturation mutagenesis, or GSSM™
In one aspect, codon primers containing a degenerate N,N,G/T sequence are used to introduce point mutations into a polynucleotide, e.g., a glycosyltransferase or a deoxysugar pathway enzyme, so as to generate a set of progeny polypeptides in which a full range of single amino acid substitutions is represented at each amino acid position, e.g., an amino acid residue in an enzyme active site or ligand binding site targeted to be modified. These oligonucleotides can comprise a contiguous first homologous sequence, a degenerate N,N,G/T sequence, and, optionally, a second homologous sequence. The downstream progeny translational products from the use of such oligonucleotides include all possible amino acid changes at each amino acid site along the polypeptide, because the degeneracy of the N,N,G/T sequence includes codons for all 20 amino acids. In one aspect, one such degenerate oligonucleotide (comprised of, e.g., one degenerate N,N,G/T cassette) is used for subjecting each original codon in a parental polynucleotide template to a full range of codon substitutions.
In another aspect, at least two degenerate cassettes are used - either in the same oligonucleotide or not, for subjecting at least two original codons in a parental polynucleotide template to a full range of codon substitutions. For example, more than one N,N,G/T sequence can be contained in one oligonucleotide to introduce amino acid mutations at more than one site. This plurality of N,N,G/T sequences can be directly contiguous, or separated by one or more additional nucleotide sequence(s). In another aspect, oligonucleotides serviceable for introducing additions and deletions can be used either alone or in combination with the codons containing an N,N,G/T sequence, to introduce any combination or permutation of amino acid additions, deletions, and/or substitutions. In one aspect, simultaneous mutagenesis of two or more contiguous amino acid positions is done using an oligonucleotide that contains contiguous N,N,G/T triplets, i.e. a degenerate (N,N,G/T)n sequence. In another aspect, degenerate cassettes having less degeneracy than the N,N,G/T sequence are used. For example, it may be desirable in some instances to use (e.g. in an oligonucleotide) a degenerate triplet sequence comprised of only one N, where said N can be in the first second or third position of the triplet. Any other bases including any combinations and permutations thereof can be used in the remaining two positions of the triplet. Alternatively, it may be desirable in some instances to use (e.g. in an oligo) a degenerate N,N,N triplet sequence. In one aspect, use of degenerate triplets (e.g., N,N,G/T triplets) allows for systematic and easy generation of a full range of possible natural amino acids (for a total of 20 amino acids) into each and every amino acid position in a polypeptide (in alternative aspects, the methods also include generation of less than all possible substitutions per amino acid residue, or codon, position). 
For example, for a 100 amino acid polypeptide, 2000 distinct species (i.e. 20 possible amino acids per position X 100 amino acid positions) can be generated. Through the use of an oligonucleotide or set of oligonucleotides containing a degenerate N,N,G/T triplet, 32 individual sequences can code for all 20 possible natural amino acids. Thus, in a reaction vessel in which a parental polynucleotide sequence is subjected to saturation mutagenesis using at least one such oligonucleotide, there are generated 32 distinct progeny polynucleotides encoding 20 distinct polypeptides. In contrast, the use of a non-degenerate oligonucleotide in site- directed mutagenesis leads to only one progeny polypeptide product per reaction vessel. Nondegenerate oligonucleotides can optionally be used in combination with degenerate primers disclosed; for example, nondegenerate oligonucleotides can be used to generate specific point mutations in a working polynucleotide. This provides one means to generate specific silent point mutations, point mutations leading to corresponding amino acid changes, and point mutations that cause the generation of stop codons and the corresponding expression of polypeptide fragments. In one aspect, each saturation mutagenesis reaction vessel contains polynucleotides encoding at least 20 progeny polypeptide (e.g., a glycosyltransferase or a deoxysugar pathway enzyme) molecules such that all 20 natural amino acids are represented at the one specific amino acid position corresponding to the codon position mutagenized in the parental polynucleotide (other aspects use less than all 20 natural combinations). The 32-fold degenerate progeny polypeptides generated from each saturation mutagenesis reaction vessel can be subjected to clonal amplification (e.g. cloned into a suitable host, e.g., E. coli host, using, e.g., an expression vector) and subjected to expression screening. 
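The codon arithmetic above can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE` and `degenerate_codons` are illustrative and do not come from the specification. It enumerates the N,N,G/T triplets and confirms that the 32 codons cover all 20 natural amino acids. It also shows that one of the 32 codons (TAG) is a stop codon, a detail the text does not mention.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (third base cycles fastest through T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def degenerate_codons(first="ACGT", second="ACGT", third="GT"):
    """Expand a degenerate triplet; the defaults describe N,N,G/T."""
    return ["".join(c) for c in product(first, second, third)]


nnk = degenerate_codons()
encoded = {CODON_TABLE[c] for c in nnk}
amino_acids = sorted(encoded - {"*"})          # "*" marks a stop codon
stop_codons = [c for c in nnk if CODON_TABLE[c] == "*"]

# Single-substitution space for a 100-residue polypeptide, counted as in the text
polypeptide_length = 100
variants = len(amino_acids) * polypeptide_length

print(len(nnk), len(amino_acids), stop_codons, variants)
# prints: 32 20 ['TAG'] 2000
```

Calling `degenerate_codons(third="ACGT")` expands the fully degenerate N,N,N triplet instead, which yields 64 codons for the same 20 amino acids.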
When an individual progeny polypeptide is identified by screening to display a favorable change in property (when compared to the parental polypeptide, such as increased proteolytic activity under alkaline or acidic conditions), it can be sequenced to identify the correspondingly favorable amino acid substitution contained therein. In one aspect, upon mutagenizing each and every amino acid position in a parental polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid changes may be identified at more than one amino acid position. One or more new progeny molecules can be generated that contain a combination of all or part of these favorable amino acid substitutions. For example, if 2 specific favorable amino acid changes are identified in each of 3 amino acid positions in a polypeptide, the permutations include 3 possibilities at each position (no change from the original amino acid, and each of two favorable changes) and 3 positions. Thus, there are 3 x 3 x 3 or 27 total possibilities, including 7 that were previously examined - 6 single point mutations (i.e. 2 at each of three positions) and no change at any position. In another aspect, site-saturation mutagenesis can be used together with another stochastic or non-stochastic means to vary sequence, e.g., synthetic ligation reassembly (see below), shuffling, chimerization, recombination and other mutagenizing processes and mutagenizing agents. This invention provides for the use of any mutagenizing process(es), including saturation mutagenesis, in an iterative manner.
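The combination count in the example above (27 possibilities, 7 already examined) can be reproduced by direct enumeration. This is a minimal sketch; the residue positions and substitutions in `favorable` are hypothetical placeholders, not data from the specification.

```python
from itertools import product

WILD_TYPE = "wt"

# Hypothetical example: two favorable substitutions at each of three positions
favorable = {
    10: ["A", "G"],
    25: ["K", "R"],
    40: ["D", "E"],
}

# Each position can keep the original residue or take one favorable change
options = [[WILD_TYPE] + subs for subs in favorable.values()]
combinations = list(product(*options))


def n_changes(combo):
    """Number of positions that differ from the parental sequence."""
    return sum(choice != WILD_TYPE for choice in combo)


# Parent (0 changes) and single point mutants (1 change) were seen in the first screen
already_examined = [c for c in combinations if n_changes(c) <= 1]
new_candidates = [c for c in combinations if n_changes(c) >= 2]

print(len(combinations), len(already_examined), len(new_candidates))
# prints: 27 7 20
```

The 20 entries in `new_candidates` are the double and triple mutants that remain to be built and screened.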
Synthetic Ligation Reassembly (SLR)
The invention provides a non-stochastic gene modification system termed "synthetic ligation reassembly," or simply "SLR," a "directed evolution process," to generate polypeptides having a glycosyltransferase or a deoxysugar pathway activity with new or altered properties. SLR is a method of ligating oligonucleotide fragments together non-stochastically. This method differs from stochastic oligonucleotide shuffling in that the nucleic acid building blocks are not shuffled, concatenated or chimerized randomly, but rather are assembled non-stochastically. See, e.g., U.S. Patent Application Serial No. (USSN) 09/332,835 entitled "Synthetic Ligation Reassembly in Directed Evolution" and filed on June 14, 1999 ("USSN 09/332,835"). In one aspect, SLR comprises the following steps: (a) providing a template polynucleotide, wherein the template polynucleotide comprises sequence encoding a homologous gene; (b) providing a plurality of building block polynucleotides, wherein the building block polynucleotides are designed to cross-over reassemble with the template polynucleotide at a predetermined sequence, and a building block polynucleotide comprises a sequence that is a variant of the homologous gene and a sequence homologous to the template polynucleotide flanking the variant sequence; and (c) combining a building block polynucleotide with a template polynucleotide such that the building block polynucleotide cross-over reassembles with the template polynucleotide to generate polynucleotides comprising homologous gene sequence variations. SLR does not depend on the presence of high levels of homology between polynucleotides to be rearranged. Thus, this method can be used to non-stochastically generate libraries (or sets) of progeny molecules comprised of over 10^100 different chimeras. SLR can be used to generate libraries comprised of over 10^1000 different progeny chimeras.
Thus, aspects of the present invention include non-stochastic methods of producing a set of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. This method includes the steps of generating by design a plurality of specific nucleic acid building blocks having serviceable mutually compatible ligatable ends, and assembling these nucleic acid building blocks, such that a designed overall assembly order is achieved. The mutually compatible ligatable ends of the nucleic acid building blocks to be assembled are considered to be "serviceable" for this type of ordered assembly if they enable the building blocks to be coupled in predetermined orders. Thus, the overall assembly order in which the nucleic acid building blocks can be coupled is specified by the design of the ligatable ends. If more than one assembly step is to be used, then the overall assembly order in which the nucleic acid building blocks can be coupled is also specified by the sequential order of the assembly step(s). In one aspect, the annealed building pieces are treated with an enzyme, such as a ligase (e.g. T4 DNA ligase), to achieve covalent bonding of the building pieces. In one aspect, the design of the oligonucleotide building blocks is obtained by analyzing a set of progenitor nucleic acid sequence templates that serve as a basis for producing a progeny set of finalized chimeric polynucleotides. These parental oligonucleotide templates thus serve as a source of sequence information that aids in the design of the nucleic acid building blocks that are to be mutagenized, e.g., chimerized or shuffled. In one aspect of this method, the sequences of a plurality of parental nucleic acid templates are aligned in order to select one or more demarcation points. The demarcation points can be located at an area of homology, and are comprised of one or more nucleotides. These demarcation points are preferably shared by at least two of the progenitor templates.
The demarcation points can thereby be used to delineate the boundaries of oligonucleotide building blocks to be generated in order to rearrange the parental polynucleotides. The demarcation points identified and selected in the progenitor molecules serve as potential chimerization points in the assembly of the final chimeric progeny molecules. A demarcation point can be an area of homology (comprised of at least one homologous nucleotide base) shared by at least two parental polynucleotide sequences. Alternatively, a demarcation point can be an area of homology that is shared by at least half of the parental polynucleotide sequences, or it can be an area of homology that is shared by at least two thirds of the parental polynucleotide sequences. Even more preferably, a serviceable demarcation point is an area of homology that is shared by at least three fourths of the parental polynucleotide sequences, or it can be shared by almost all of the parental polynucleotide sequences. In one aspect, a demarcation point is an area of homology that is shared by all of the parental polynucleotide sequences. In one aspect, a ligation reassembly process is performed exhaustively in order to generate an exhaustive library of progeny chimeric polynucleotides. In other words, all possible ordered combinations of the nucleic acid building blocks are represented in the set of finalized chimeric nucleic acid molecules. At the same time, in another aspect, the assembly order (i.e. the order of assembly of each building block in the 5' to 3' sequence of each finalized chimeric nucleic acid) in each combination is by design (or non-stochastic) as described above. Because of the non-stochastic nature of this invention, the possibility of unwanted side products is greatly reduced. In another aspect, the ligation reassembly method is performed systematically.
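Selecting demarcation points and then assembling building blocks exhaustively in a fixed order can be illustrated computationally. The following is a minimal sketch under simplifying assumptions: the three parental sequences are short, invented, and already aligned without gaps, and the function names are illustrative rather than taken from the specification. A cut at index i places the block boundary immediately before column i.

```python
from itertools import product


def shared_columns(aligned, min_fraction=1.0):
    """Alignment columns where at least `min_fraction` of the parents
    carry the same base (candidate demarcation points)."""
    n_parents = len(aligned)
    hits = []
    for i, column in enumerate(zip(*aligned)):
        top = max(column.count(base) for base in set(column))
        if top / n_parents >= min_fraction:
            hits.append(i)
    return hits


def split_at(seq, cuts):
    """Cut one parent into building blocks at the chosen demarcation points."""
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def exhaustive_library(aligned, cuts):
    """Every ordered combination of building blocks, one block per slot,
    with the 5' to 3' slot order fixed by design."""
    blocks = [split_at(seq, cuts) for seq in aligned]
    slots = [sorted({parent[k] for parent in blocks}) for k in range(len(cuts) + 1)]
    return ["".join(combo) for combo in product(*slots)]


# Hypothetical, pre-aligned parental sequences (toy data)
parents = ["ATGGCTAAAGGT", "ATGGCAAAGGGT", "ATGTCTAAAGGA"]

candidates = shared_columns(parents)   # columns shared by all parents
cuts = [4, 7, 10]                      # chosen from the candidates
library = exhaustive_library(parents, cuts)

print(candidates)     # [0, 1, 2, 4, 6, 7, 9, 10]
print(len(library))   # 16
```

Passing a lower `min_fraction` (for example 0.5 or 0.75) corresponds to demarcation points shared by only some of the parents. Grouping `library` by the block used in the first slot would give one simple way to split the progeny into separately screened compartments.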
For example, the method is performed in order to generate a systematically compartmentalized library of progeny molecules, with compartments that can be screened systematically, e.g. one by one. In other words this invention provides that, through the selective and judicious use of specific nucleic acid building blocks, coupled with the selective and judicious use of sequentially stepped assembly reactions, a design can be achieved where specific sets of progeny products are made in each of several reaction vessels. This allows a systematic examination and screening procedure to be performed. Thus, these methods allow a potentially very large number of progeny molecules to be examined systematically in smaller groups. Because of its ability to perform chimerizations in a manner that is highly flexible yet exhaustive and systematic as well, particularly when there is a low level of homology among the progenitor molecules, these methods provide for the generation of a library (or set) comprised of a large number of progeny molecules. Because of the non-stochastic nature of the instant ligation reassembly invention, the progeny molecules generated preferably comprise a library of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. The saturation mutagenesis and optimized directed evolution methods also can be used to generate different progeny molecular species. It is appreciated that the invention provides freedom of choice and control regarding the selection of demarcation points, the size and number of the nucleic acid building blocks, and the size and design of the couplings. It is appreciated, furthermore, that the requirement for intermolecular homology is highly relaxed for the operability of this invention. In fact, demarcation points can even be chosen in areas of little or no intermolecular homology. For example, because of codon wobble, i.e. 
the degeneracy of codons, nucleotide substitutions can be introduced into nucleic acid building blocks without altering the amino acid originally encoded in the corresponding progenitor template. Alternatively, a codon can be altered such that the coding for an original amino acid is altered. This invention provides that such substitutions can be introduced into the nucleic acid building block in order to increase the incidence of intermolecular homologous demarcation points and thus to allow an increased number of couplings to be achieved among the building blocks, which in turn allows a greater number of progeny chimeric molecules to be generated. In another aspect, the synthetic nature of the step in which the building blocks are generated allows the design and introduction of nucleotides (e.g., one or more nucleotides, which may be, for example, codons or introns or regulatory sequences) that can later be optionally removed in an in vitro process (e.g. by mutagenesis) or in an in vivo process (e.g. by utilizing the gene splicing ability of a host organism). It is appreciated that in many instances the introduction of these nucleotides may also be desirable for many other reasons in addition to the potential benefit of creating a serviceable demarcation point. In one aspect, a nucleic acid building block is used to introduce an intron.
Thus, functional introns are introduced into a man-made gene manufactured according to the methods described herein. The artificially introduced intron(s) can be functional in a host cell for gene splicing much in the way that naturally-occurring introns serve functionally in gene splicing.
Optimized Directed Evolution System
The invention provides a non-stochastic gene modification system termed "optimized directed evolution system" to generate polypeptides having a glycosyltransferase or a deoxysugar pathway activity with new or altered properties. Optimized directed evolution is directed to the use of repeated cycles of reductive reassortment, recombination and selection that allow for the directed molecular evolution of nucleic acids through recombination. Optimized directed evolution allows generation of a large population of evolved chimeric sequences, wherein the generated population is significantly enriched for sequences that have a predetermined number of crossover events. A crossover event is a point in a chimeric sequence where a shift in sequence occurs from one parental variant to another parental variant. Such a point is normally at the juncture of where oligonucleotides from two parents are ligated together to form a single sequence. This method allows calculation of the correct concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched for the chosen number of crossover events. This provides more control over choosing chimeric variants having a predetermined number of crossover events. In addition, this method provides a convenient means for exploring a tremendous amount of the possible protein variant space in comparison to other systems. Previously, if one generated, for example, 10^13 chimeric molecules during a reaction, it would be extremely difficult to test such a high number of chimeric variants for a particular activity.
Moreover, a significant portion of the progeny population would have a very high number of crossover events which resulted in proteins that were less likely to have increased levels of a particular activity. By using these methods, the population of chimeric molecules can be enriched for those variants that have a particular number of crossover events. Thus, although one can still generate 10^13 chimeric molecules during a reaction, each of the molecules chosen for further analysis most likely has, for example, only three crossover events. Because the resulting progeny population can be skewed to have a predetermined number of crossover events, the boundaries on the functional variety between the chimeric molecules are reduced. This provides a more manageable number of variables when calculating which oligonucleotide from the original parental polynucleotides might be responsible for affecting a particular trait. One method for creating a chimeric progeny polynucleotide sequence is to create oligonucleotides corresponding to fragments or portions of each parental sequence.
Each oligonucleotide preferably includes a unique region of overlap so that mixing the oligonucleotides together results in a new variant that has each oligonucleotide fragment assembled in the correct order. Additional information can also be found, e.g., in USSN 09/332,835; U.S. Patent No. 6,361,974. The number of oligonucleotides generated for each parental variant bears a relationship to the total number of resulting crossovers in the chimeric molecule that is ultimately created. For example, three parental nucleotide sequence variants might be provided to undergo a ligation reaction in order to find a chimeric variant having, for example, greater activity at high temperature. As one example, a set of 50 oligonucleotide sequences can be generated corresponding to each portion of each parental variant. Accordingly, during the ligation reassembly process there could be up to 50 crossover events within each of the chimeric sequences. The probability that each of the generated chimeric polynucleotides will contain oligonucleotides from each parental variant in alternating order is very low. If each oligonucleotide fragment is present in the ligation reaction in the same molar quantity, it is likely that in some positions oligonucleotides from the same parental polynucleotide will ligate next to one another and thus not result in a crossover event. If the concentration of each oligonucleotide from each parent is kept constant during any ligation step in this example, there is a 1/3 chance (assuming 3 parents) that an oligonucleotide from the same parental variant will ligate within the chimeric sequence and produce no crossover.
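The 1/3 figure above leads directly to a distribution over crossover counts when all parents are present in equal amounts. The sketch below is illustrative only and rests on two modeling assumptions that are not stated in the specification: each position in the chimera is filled independently by a fragment from a uniformly chosen parent, and crossovers are counted only at the junctions between adjacent fragments, so 50 fragments give 49 junctions. Under those assumptions the crossover count follows a binomial distribution. The function name `equimolar_crossover_pdf` is hypothetical.

```python
from math import comb


def equimolar_crossover_pdf(n_fragments, n_parents):
    """Probability of k crossovers, k = 0..n_fragments-1, when every parent
    contributes fragments at the same molar concentration.

    The distribution is discrete; the name follows the text's use of "PDF".
    """
    junctions = n_fragments - 1
    p_cross = 1 - 1 / n_parents          # e.g. 2/3 for three parents
    return [
        comb(junctions, k) * p_cross**k * (1 - p_cross) ** (junctions - k)
        for k in range(junctions + 1)
    ]


pdf = equimolar_crossover_pdf(n_fragments=50, n_parents=3)
mean_crossovers = sum(k * p for k, p in enumerate(pdf))
p_crossover_everywhere = pdf[-1]         # a crossover at all 49 junctions

print(round(mean_crossovers, 2))         # about 32.67 crossovers on average
print(p_crossover_everywhere)            # (2/3)**49, a very small probability
```

In this model an unbiased mixture yields chimeras with roughly 30 or more crossovers on average, which is consistent with the text's point that concentrations must be adjusted to enrich for a small, chosen number of crossover events.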
Accordingly, a probability density function (PDF) can be determined to predict the population of crossover events that are likely to occur during each step in a ligation reaction given a set number of parental variants, a number of oligonucleotides corresponding to each variant, and the concentrations of each variant during each step in the ligation reaction. The statistics and mathematics behind determining the PDF is described below. By utilizing these methods, one can calculate such a probability density function, and thus enrich the chimeric progeny population for a predetermined number of crossover events resulting from a particular ligation reaction. Moreover, a target number of crossover events can be predetermined, and the system then programmed to calculate the starting quantities of each parental oligonucleotide during each step in the ligation reaction to result in a probability density function that centers on the predetermined number of crossover events. These methods are directed to the use of repeated cycles of reductive reassortment, recombination and selection that allow for the directed molecular evolution of a nucleic acid encoding a polypeptide through recombination. This system allows generation of a large population of evolved chimeric sequences, wherein the generated population is significantly enriched for sequences that have a predetermined number of crossover events. A crossover event is a point in a chimeric sequence where a shift in sequence occurs from one parental variant to another parental variant. Such a point is normally at the juncture of where oligonucleotides from two parents are ligated together to form a single sequence. The method allows calculation of the correct concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched for the chosen number of crossover events. This provides more control over choosing chimeric variants having a predetermined number of crossover events. 
In addition, these methods provide a convenient means for exploring a tremendous amount of the possible protein variant space in comparison to other systems. By using the methods described herein, the population of chimeric molecules can be enriched for those variants that have a particular number of crossover events. Thus, although one can still generate 10^13 chimeric molecules during a reaction, each of the molecules chosen for further analysis most likely has, for example, only three crossover events. Because the resulting progeny population can be skewed to have a predetermined number of crossover events, the boundaries on the functional variety between the chimeric molecules are reduced. This provides a more manageable number of variables when calculating which oligonucleotide from the original parental polynucleotides might be responsible for affecting a particular trait. In one aspect, the method creates a chimeric progeny polynucleotide sequence by creating oligonucleotides corresponding to fragments or portions of each parental sequence. Each oligonucleotide preferably includes a unique region of overlap so that mixing the oligonucleotides together results in a new variant that has each oligonucleotide fragment assembled in the correct order. See also USSN 09/332,835. The number of oligonucleotides generated for each parental variant bears a relationship to the total number of resulting crossovers in the chimeric molecule that is ultimately created. For example, three parental nucleotide sequence variants might be provided to undergo a ligation reaction in order to find a chimeric variant having, for example, greater activity at high temperature. As one example, a set of 50 oligonucleotide sequences can be generated corresponding to each portion of each parental variant. Accordingly, during the ligation reassembly process there could be up to
50 crossover events within each of the chimeric sequences. The probability that each of the generated chimeric polynucleotides will contain oligonucleotides from each parental variant in alternating order is very low. If each oligonucleotide fragment is present in the ligation reaction in the same molar quantity it is likely that in some positions oligonucleotides from the same parental polynucleotide will ligate next to one another and thus not result in a crossover event. If the concentration of each oligonucleotide from each parent is kept constant during any ligation step in this example, there is a 1/3 chance (assuming 3 parents) that an oligonucleotide from the same parental variant will ligate within the chimeric sequence and produce no crossover. Accordingly, a probability density function (PDF) can be determined to predict the population of crossover events that are likely to occur during each step in a ligation reaction given a set number of parental variants, a number of oligonucleotides corresponding to each variant, and the concentrations of each variant during each step in the ligation reaction. The statistics and mathematics behind determining the PDF is described below. One can calculate such a probability density function, and thus enrich the chimeric progeny population for a predetermined number of crossover events resulting from a particular ligation reaction. Moreover, a target number of crossover events can be predetermined, and the system then programmed to calculate the starting quantities of each parental oligonucleotide during each step in the ligation reaction to result in a probability density function that centers on the predetermined number of crossover events.
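The idea of computing a crossover PDF from oligonucleotide concentrations, and then choosing concentrations that center the PDF on a target, can be sketched as follows. This is not the algorithm of the specification, which is not reproduced in this passage. It is a simplified illustration with three stated assumptions: the same parental fractions are used at every ligation step, each fragment is drawn independently with probability equal to its parent's molar fraction, and crossovers are counted at the junctions between adjacent fragments. The function names are hypothetical.

```python
def crossover_pdf(n_fragments, fractions):
    """Probability of k crossovers (k = 0..n_fragments-1) for the given
    per-parent molar fractions, which are assumed to sum to 1."""
    m = len(fractions)
    # state[p][k]: probability that the chain so far ends with a fragment
    # from parent p and contains exactly k crossovers
    state = [[f] + [0.0] * (n_fragments - 1) for f in fractions]
    for _ in range(n_fragments - 1):
        new = [[0.0] * n_fragments for _ in range(m)]
        for prev in range(m):
            for k, prob in enumerate(state[prev]):
                if prob == 0.0:
                    continue
                for nxt in range(m):
                    new[nxt][k + (nxt != prev)] += prob * fractions[nxt]
        state = new
    return [sum(state[p][k] for p in range(m)) for k in range(n_fragments)]


def dominant_fraction_for_mean(target, n_fragments, n_parents, tol=1e-9):
    """Fraction x of one dominant parent (the others share 1 - x equally)
    that gives a mean of `target` crossovers; found by bisection."""
    def mean(x):
        rest = (1 - x) / (n_parents - 1)
        return (n_fragments - 1) * (1 - x * x - (n_parents - 1) * rest * rest)

    lo, hi = 1 / n_parents, 1.0
    if not 0 <= target <= mean(lo):
        raise ValueError("target mean is not reachable in this model")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) > target:   # still too many crossovers: skew further
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Skew a 3-parent, 50-fragment reassembly toward about three crossovers
x = dominant_fraction_for_mean(target=3, n_fragments=50, n_parents=3)
fractions = [x, (1 - x) / 2, (1 - x) / 2]
pdf = crossover_pdf(50, fractions)
mean_crossovers = sum(k * p for k, p in enumerate(pdf))
print(round(x, 3), round(mean_crossovers, 3))
```

The sketch centers the mean of the distribution rather than its mode, and it varies a single parameter. A fuller treatment could vary the fractions at each ligation step, as the text describes.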
Determining Crossover Events
Aspects of the invention include a system and software that receive a desired crossover probability density function (PDF), the number of parent genes to be reassembled, and the number of fragments in the reassembly as inputs. The output of this program is a "fragment PDF" that can be used to determine a recipe for producing reassembled genes, and the estimated crossover PDF of those genes. The processing can be performed, e.g., in MATLAB™ (The Mathworks, Natick, Massachusetts), a programming language and development environment for technical computing.
Iterative Processes
In practicing the invention, these processes can be iteratively repeated.
For example, a nucleic acid (or, the nucleic acid) responsible for an altered or new glycosyltransferase or a deoxysugar pathway enzyme phenotype is identified, re-isolated, again modified, and re-tested for activity. This process can be iteratively repeated until a desired phenotype is engineered. For example, an entire biochemical anabolic or catabolic pathway can be engineered into a cell, including, e.g., an in vivo glycosylation system of the invention. Similarly, if it is determined that a particular oligonucleotide has no effect at all on the desired trait (e.g., a new glycosyltransferase or deoxysugar pathway enzyme phenotype), it can be removed as a variable by synthesizing larger parental oligonucleotides that include the sequence to be removed. Since incorporating the sequence within a larger sequence prevents any crossover events, there will no longer be any variation of this sequence in the progeny polynucleotides. This iterative practice of determining which oligonucleotides are most related to the desired trait, and which are unrelated, allows more efficient exploration of all of the possible protein variants that might provide a particular trait or activity.
In vivo shuffling
In vivo shuffling of molecules is used in methods of the invention that provide variants of polypeptides having a glycosyltransferase or deoxysugar pathway activity. In vivo shuffling can be performed utilizing the natural property of cells to recombine multimers. While recombination in vivo has provided the major natural route to molecular diversity, genetic recombination remains a relatively complex process that involves 1) the recognition of homologies; 2) strand cleavage, strand invasion, and metabolic steps leading to the production of recombinant chiasma; and finally 3) the resolution of chiasma into discrete recombined molecules. The formation of the chiasma requires the recognition of homologous sequences.
In one aspect, the invention provides a method for producing a hybrid polynucleotide from at least a first polynucleotide (e.g., a glycosyltransferase or deoxysugar pathway enzyme) and a second polynucleotide (e.g., a polypeptide having a glycosyltransferase or deoxysugar pathway activity, or, a tag or an epitope). The hybrid polynucleotide can be made by introducing at least a first polynucleotide and a second polynucleotide which share at least one region of partial sequence homology into a suitable host cell. The regions of partial sequence homology promote processes which result in sequence reorganization producing a hybrid polynucleotide. Hybrid polynucleotides can result from intermolecular recombination events which promote sequence integration between DNA molecules. In addition, such hybrid polynucleotides can result from intramolecular reductive reassortment processes which utilize repeated sequences to alter a nucleotide sequence within a DNA molecule. Producing sequence variants The invention also provides additional methods for making sequence variants of the nucleic acids encoding polypeptides with glycosyltransferase or deoxysugar pathway activity. In one aspect, the invention provides variants of a glycosyltransferase or deoxysugar pathway enzyme coding sequence (e.g., a gene, cDNA or message). The variants can be generated by any means, including, e.g., random or stochastic methods, or, non-stochastic, or "directed evolution," methods, as described herein. The isolated variants may be naturally occurring. Variants can also be created in vitro. Variants may be created using genetic engineering techniques such as site directed mutagenesis, random chemical mutagenesis, Exonuclease III deletion procedures, and standard cloning techniques. Alternatively, such variants, fragments, analogs, or derivatives may be created using chemical synthesis or modification procedures. Other methods of making variants are also familiar to those skilled in the art.
These include procedures in which nucleic acid sequences obtained from natural isolates are modified to generate nucleic acids which encode polypeptides having characteristics which enhance their value in industrial or laboratory applications. In such procedures, a large number of variant sequences having one or more nucleotide differences with respect to the sequence obtained from the natural isolate are generated and characterized. These nucleotide differences can result in amino acid changes with respect to the polypeptides encoded by the nucleic acids from the natural isolates. For example, variants may be created using error prone PCR. In error prone PCR, PCR is performed under conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is obtained along the entire length of the PCR product. Error prone PCR is described, e.g., in Leung, D.W., et al., Technique, 1:11-15, 1989, and Caldwell, R. C. & Joyce G.F., PCR Methods Applic., 2:28-33, 1992. Briefly, in such procedures, nucleic acids to be mutagenized are mixed with PCR primers, reaction buffer, MgCl2, MnCl2, Taq polymerase and an appropriate concentration of dNTPs for achieving a high rate of point mutation along the entire length of the PCR product. For example, the reaction may be performed using 20 fmoles of nucleic acid to be mutagenized, 30 pmole of each PCR primer, a reaction buffer comprising 50 mM KCl, 10 mM Tris HCl (pH 8.3) and 0.01% gelatin, 7 mM MgCl2, 0.5 mM MnCl2, 5 units of Taq polymerase, 0.2 mM dGTP, 0.2 mM dATP, 1 mM dCTP, and 1 mM dTTP. PCR may be performed for 30 cycles of 94°C for 1 min, 45°C for 1 min, and 72°C for 1 min. However, it will be appreciated that these parameters may be varied as appropriate. The mutagenized nucleic acids are cloned into an appropriate vector and the activities of the polypeptides encoded by the mutagenized nucleic acids are evaluated.
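The effect of low-fidelity copying on a template can be illustrated in silico. The following is a minimal sketch only, not a procedure taken from the cited references: it assumes a single per-base error rate and uniform substitution among the three other bases, whereas the mutation spectrum of an actual error prone PCR is biased. The function names (mutate, make_library, count_differences), the template sequence and the error rate are hypothetical and chosen for illustration.

```python
import random

BASES = "ACGT"


def mutate(template, error_rate, rng):
    """Return a copy of template with independent random point substitutions.

    Assumption: every position mutates with the same probability and the
    replacement base is drawn uniformly from the three other bases.
    """
    product = []
    for base in template:
        if rng.random() < error_rate:
            product.append(rng.choice([b for b in BASES if b != base]))
        else:
            product.append(base)
    return "".join(product)


def make_library(template, error_rate, size, seed=0):
    """Generate `size` simulated variants of template (seeded for repeatability)."""
    rng = random.Random(seed)
    return [mutate(template, error_rate, rng) for _ in range(size)]


def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 300-nucleotide template and a 1% per-base error rate.
template = "ATGGCTAGCAAAGGTGAAGAACTGTTTACC" * 10
for variant in make_library(template, error_rate=0.01, size=5):
    print(count_differences(template, variant), "point mutations")
```

In such a model the expected number of substitutions per variant is simply the template length multiplied by the error rate, which gives a rough sense of how many clones must be screened to sample single and double mutants.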
Variants may also be created using oligonucleotide directed mutagenesis to generate site-specific mutations in any cloned DNA of interest. Oligonucleotide mutagenesis is described, e.g., in Reidhaar-Olson (1988) Science 241:53-57. Briefly, in such procedures a plurality of double stranded oligonucleotides bearing one or more mutations to be introduced into the cloned DNA are synthesized and inserted into the cloned DNA to be mutagenized. Clones containing the mutagenized DNA are recovered and the activities of the polypeptides they encode are assessed. Another method for generating variants is assembly PCR. Assembly PCR involves the assembly of a PCR product from a mixture of small DNA fragments. A large number of different PCR reactions occur in parallel in the same vial, with the products of one reaction priming the products of another reaction. Assembly PCR is described in, e.g., U.S. Patent No. 5,965,408. Still another method of generating variants is sexual PCR mutagenesis. In sexual PCR mutagenesis, forced homologous recombination occurs between DNA molecules of different but highly related DNA sequence in vitro, as a result of random fragmentation of the DNA molecule based on sequence homology, followed by fixation of the crossover by primer extension in a PCR reaction. Sexual PCR mutagenesis is described, e.g., in Stemmer (1994) Proc. Natl. Acad. Sci. USA 91:10747-10751. Briefly, in such procedures a plurality of nucleic acids to be recombined are digested with DNase to generate fragments having an average size of 50-200 nucleotides. Fragments of the desired average size are purified and resuspended in a PCR mixture. PCR is conducted under conditions which facilitate recombination between the nucleic acid fragments. For example, PCR may be performed by resuspending the purified fragments at a concentration of 10-30 ng/µl in a solution of 0.2 mM of each dNTP, 2.2 mM MgCl2, 50 mM
KCl, 10 mM Tris HCl, pH 9.0, and 0.1% Triton X-100. 2.5 units of Taq polymerase per
100 µl of reaction mixture is added and PCR is performed using the following regime:
94°C for 60 seconds, 94°C for 30 seconds, 50-55°C for 30 seconds, 72°C for 30 seconds
(30-45 times) and 72°C for 5 minutes. However, it will be appreciated that these parameters may be varied as appropriate. In some aspects, oligonucleotides may be included in the PCR reactions. In other aspects, the Klenow fragment of DNA polymerase I may be used in a first set of PCR reactions and Taq polymerase may be used in a subsequent set of PCR reactions. Recombinant sequences are isolated and the activities of the polypeptides they encode are assessed. Variants may also be created by in vivo mutagenesis. In some aspects, random mutations in a sequence of interest are generated by propagating the sequence of interest in a bacterial strain, such as an E. coli strain, which carries mutations in one or more of the DNA repair pathways. Such "mutator" strains have a higher random mutation rate than that of a wild-type parent. Propagating the DNA in one of these strains will eventually generate random mutations within the DNA. Mutator strains suitable for use for in vivo mutagenesis are described, e.g., in PCT Publication No. WO 91/16427. Variants may also be generated using cassette mutagenesis. In cassette mutagenesis a small region of a double stranded DNA molecule is replaced with a synthetic oligonucleotide "cassette" that differs from the native sequence. The oligonucleotide often contains completely and/or partially randomized native sequence. Recursive ensemble mutagenesis may also be used to generate variants. Recursive ensemble mutagenesis is an algorithm for protein engineering (protein mutagenesis) developed to produce diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis. Recursive ensemble mutagenesis is described, e.g., in Arkin (1992) Proc. Natl. Acad. Sci. USA 89:7811-7815. In some aspects, variants are created using exponential ensemble mutagenesis.
Exponential ensemble mutagenesis is a process for generating combinatorial libraries with a high percentage of unique and functional mutants, wherein small groups of residues are randomized in parallel to identify, at each altered position, amino acids which lead to functional proteins. Exponential ensemble mutagenesis is described, e.g, in Delegrave (1993) Biotechnology Res. 11:1548-1552. Random and site-directed mutagenesis are described, e.g, in Arnold (1993) Current Opinion in Biotechnology 4:450-455. In some aspects, the variants are created using shuffling procedures wherein portions of a plurality of nucleic acids which encode distinct polypeptides are fused together to create chimeric nucleic acid sequences which encode chimeric polypeptides as described in, e.g, U.S. Patent Nos. 5,965,408; 5,939,250 (see also discussion, above). The invention also provides variants of polypeptides having a glycosyltransferase or deoxysugar pathway activity comprising sequences in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (e.g, a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code. Conservative substitutions are those that substitute a given amino acid in a polypeptide by another amino acid of like characteristics. 
Thus, polypeptides having a glycosyltransferase or deoxysugar pathway activity include those with conservative substitutions of sequences of known polypeptides having a glycosyltransferase or deoxysugar pathway activity, including but not limited to the following replacements: replacements of an aliphatic amino acid such as Alanine, Valine, Leucine and Isoleucine with another aliphatic amino acid; replacement of a Serine with a Threonine or vice versa; replacement of an acidic residue such as Aspartic acid and Glutamic acid with another acidic residue; replacement of a residue bearing an amide group, such as Asparagine and Glutamine, with another residue bearing an amide group; exchange of a basic residue such as Lysine and Arginine with another basic residue; and replacement of an aromatic residue such as Phenylalanine, Tyrosine with another aromatic residue. Other variants are those in which one or more of the amino acid residues of the polypeptides includes a substituent group. Other variants within the scope of the invention are those in which the polypeptide is associated with another compound, such as a compound to increase the half-life of the polypeptide, for example, polyethylene glycol. Additional variants within the scope of the invention are those in which additional amino acids are fused to the polypeptide, such as a leader sequence, a secretory sequence, a proprotein sequence or a sequence which facilitates purification, enrichment, or stabilization of the polypeptide. In some aspects, the variants, fragments, derivatives and analogs of polypeptides having a glycosyltransferase or deoxysugar pathway activity retain the same biological function or activity as the exemplary polypeptides described herein. In other aspects, the variant, fragment, derivative, or analog includes a proprotein, such that the variant, fragment, derivative, or analog can be activated by cleavage of the proprotein portion to produce an active polypeptide. 
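The conservative replacement classes listed above lend themselves to a simple lookup. The following is a minimal sketch, assuming one-letter amino acid codes and equal-length, pre-aligned sequences; the group labels, function names and example sequences are hypothetical. The groups contain only the residues named in the text (for instance, the aromatic group holds phenylalanine and tyrosine only), so any other exchange is reported as non-conservative.

```python
# Residue classes taken from the replacements listed in the text.
# Group labels are illustrative names, not terms defined by the source.
CONSERVATIVE_GROUPS = {
    "aliphatic": set("AVLI"),   # Alanine, Valine, Leucine, Isoleucine
    "ser_thr": set("ST"),       # Serine <-> Threonine
    "acidic": set("DE"),        # Aspartic acid, Glutamic acid
    "amide": set("NQ"),         # Asparagine, Glutamine
    "basic": set("KR"),         # Lysine, Arginine
    "aromatic": set("FY"),      # Phenylalanine, Tyrosine
}


def is_conservative(original, replacement):
    """True if both residues differ and fall within one listed class."""
    if original == replacement:
        return False
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )


def classify_substitutions(reference, variant):
    """List (position, ref, var, kind) for every differing position.

    Assumes the two sequences are already aligned and of equal length.
    Positions are 1-based.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    results = []
    for position, (ref, var) in enumerate(zip(reference, variant), start=1):
        if ref != var:
            kind = "conservative" if is_conservative(ref, var) else "non-conservative"
            results.append((position, ref, var, kind))
    return results


# Hypothetical four-residue example.
print(classify_substitutions("MKDL", "MRDF"))
```

A table-driven check of this kind is easy to extend with further classes, but any extension would go beyond the replacements enumerated here.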
Optimizing codons to achieve high levels of protein expression in host cells The invention provides methods for modifying nucleic acids encoding glycosyltransferase or deoxysugar pathway enzymes by modifying codon usage. In one aspect, the invention provides methods for modifying codons in a nucleic acid encoding a glycosyltransferase or deoxysugar pathway enzyme to increase or decrease its expression in a host cell. The invention also provides nucleic acids encoding a glycosyltransferase or deoxysugar pathway enzyme modified to increase its expression in a host cell, and methods of making the modified glycosyltransferase or deoxysugar pathway enzymes. The method comprises identifying a "non-preferred" or a "less preferred" codon in a glycosyltransferase or deoxysugar pathway enzyme-encoding nucleic acid and replacing one or more of these non-preferred or less preferred codons with a "preferred codon" encoding the same amino acid as the replaced codon, such that at least one non-preferred or less preferred codon in the nucleic acid has been replaced by a preferred codon encoding the same amino acid. A preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell. Host cells for expressing nucleic acids encoding polypeptides having a glycosyltransferase or deoxysugar pathway activity, expression cassettes and vectors comprising these nucleic acids, include bacteria, yeast, fungi, plant cells, insect cells and mammalian cells. Thus, the invention provides methods for optimizing codon usage in all of these cells, codon-altered nucleic acids and polypeptides made by the codon-altered nucleic acids.
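The codon replacement step described above can be sketched as follows. This is an illustrative sketch only: it assumes the standard genetic code, an upper-case DNA coding sequence in frame, and a host codon usage table supplied by the user. The usage values shown are hypothetical placeholders rather than measured frequencies for any host, and the function names are not taken from any library. Codons for which the table holds no synonymous entry are left unchanged.

```python
from itertools import product

# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(coding_sequence):
    """Split an in-frame coding sequence into a list of codons."""
    if len(coding_sequence) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [coding_sequence[i:i + 3] for i in range(0, len(coding_sequence), 3)]


def translate(coding_sequence):
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(GENETIC_CODE[c] for c in split_codons(coding_sequence))


def preferred_codons(host_usage):
    """For each amino acid in the usage table, pick its most used codon."""
    best = {}
    for codon, frequency in host_usage.items():
        aa = GENETIC_CODE[codon]
        if aa not in best or frequency > host_usage[best[aa]]:
            best[aa] = codon
    return best


def optimize(coding_sequence, host_usage):
    """Replace each codon with the host-preferred synonymous codon, if known."""
    preferred = preferred_codons(host_usage)
    return "".join(
        preferred.get(GENETIC_CODE[codon], codon)
        for codon in split_codons(coding_sequence)
    )


# Hypothetical usage fractions for an imagined GC-rich host.
host_usage = {"CTG": 0.62, "TTA": 0.01, "GCC": 0.57, "GCT": 0.03,
              "AAG": 0.93, "AAA": 0.07}
original = "ATGTTAGCTAAATAA"
optimized = optimize(original, host_usage)
print(optimized, translate(original) == translate(optimized))
```

Because every replacement is synonymous, the encoded polypeptide is unchanged; only the nucleotide sequence is altered. Choosing the single most frequent codon is the simplest possible policy and is used here only to make the replacement rule concrete.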
Exemplary host cells include gram negative bacteria, such as Escherichia coli; gram positive bacteria, such as a Streptomyces, e.g, Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavenulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygroscopicus, Streptomyces lasaliensis, Streptomyces venezuelae,
Streptomyces antibioticus, Streptomyces albus, Streptomyces tsukubaensis, Streptomyces galilaeus or Streptomyces diversa, Lactobacillus gasseri, Lactococcus lactis, Lactococcus cremoris, Bacillus subtilis. Exemplary host cells also include eukaryotic organisms, e.g., various yeast, such as Saccharomyces sp., including Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, and Kluyveromyces lactis, Hansenula polymorpha, Aspergillus niger, and mammalian cells and cell lines and insect cells and cell lines. Thus, the invention also includes nucleic acids and polypeptides optimized for expression in these organisms and species. For example, the codons of a nucleic acid encoding a glycosyltransferase or deoxysugar pathway enzyme isolated from a bacterial cell are modified such that the nucleic acid is optimally expressed in a bacterial cell different from the bacteria from which the glycosyltransferase or deoxysugar pathway enzyme was derived, a yeast, a fungus, a plant cell, an insect cell or a mammalian cell. Methods for optimizing codons are well known in the art, see, e.g., U.S. Patent No. 5,795,737; Baca (2000) Int. J. Parasitol. 30:113-118; Hale (1998) Protein Expr. Purif. 12:185-188. See also Narum (2001) Infect. Immun. 69:7250-7253, describing optimizing codons in mouse systems; Outchkourov (2002) Protein Expr. Purif. 24:18-24, describing optimizing codons in yeast; Feng (2000) Biochemistry 39:15399-15409, describing optimizing codons in E. coli; Humphreys (2000) Protein Expr. Purif. 20:252-264, describing optimizing codon usage that affects secretion in E. coli.
Anti-microbial disinfectant, insecticidal activity In one aspect, the invention provides compositions having anti-microbial (e.g., as a disinfectant) and/or insecticidal activity and methods for using the compositions of the invention. Any or all of the steps of the methods of the invention can be carried out in vitro, in vivo in a whole cell process or in a transgenic plant or transformed plant cell. Compositions of the invention (e.g., glycosylated natural products of the invention) can be detected and quantified by any of a number of means well known to those of skill in the art, including, e.g., analytic methods such as spectrophotometry (e.g., mass spectrometry), radiography, electrophoresis, capillary electrophoresis, high performance liquid chromatography (HPLC), thin layer chromatography (TLC), hyperdiffusion chromatography, and the like, and various immunological methods such as fluid or gel precipitin reactions, immunodiffusion (single or double), immunoelectrophoresis, radioimmunoassays (RIAs), enzyme-linked immunosorbent assays (ELISAs), immunofluorescent assays, ion-specific electrodes, see, e.g., Fritsche (1991) Analytica Chimica Acta 244:179-182; West (1992) Analytical Chemistry 64:533-540; by gas chromatography, mass spectrometry or by other chromatographic methods. In one aspect, compositions of the invention (e.g., glycosylated natural products of the invention) in a composition, e.g., a food or feed, such as a grain, are detected and measured by fluorescence polarization. For example, a feed, e.g., a grain extract, is prepared by shaking a crushed sample with a solvent. A mixture is prepared by combining the extract with a tracer and with monoclonal antibodies specific to a natural product. The tracer is able to bind to the monoclonal antibodies to produce a detectable change in fluorescence polarization. The tracer is prepared by conjugating a glycosylated natural product to a suitable fluorophore.
The fluorescence polarization of the mixture is measured. The glycosylated natural product concentration of the mixture may be calculated using a standard curve obtained by measuring the fluorescence polarization of a series of solutions of known concentration. In one aspect, the methods of the invention comprise application of a glycosylated natural product of the invention directly to a plant or plant part, including processed plant parts, such as animal feeds, foods, and the like. The polypeptide can be applied to a crop area or a plant to be treated, simultaneously or in succession, with other compounds, such as fertilizers, nutrients or other preparations that influence plant growth, herbicides, insecticides, fungicides, bactericides, nematicides, molluscicides, or mixtures of these preparations. In practicing the methods of the invention, the application of a glycosylated natural product of the invention can be with an agriculturally acceptable carrier, a surfactant, and/or an adjuvant or formulation. The glycosylated natural products of the invention can be formulated as solids or liquids. They can be applied with natural or regenerated mineral substances, solvents, dispersants, wetting agents, tackifiers, binders, or fertilizers. A glycosylated natural product of the invention can be applied to the plant, plant part or any surface using any technique, for example, as a wash or spray, or in dried, lyophilized or powdered form. In one aspect, the glycosylated natural product of the invention is in a milled formulation. In one aspect, the glycosylated natural product of the invention is applied to foods and feeds, e.g., processed grains or silage to be used for animal feed. The glycosylated natural product of the invention can be applied in the form of an inoculant or probiotic additive. The glycosylated natural product of the invention can be useful during processing and/or in animal feed prior to its use.
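The standard-curve calculation mentioned for the fluorescence polarization measurement can be sketched as follows. This is a minimal illustration under stated assumptions: the curve is approximated by straight-line interpolation between neighbouring standards, whereas a fitted curve may be preferred in practice; the function name, the concentration units and the standard values are hypothetical and do not represent measured data.

```python
def concentration_from_polarization(standards, measured_mp):
    """Estimate concentration from a polarization reading.

    standards: list of (concentration, polarization) pairs measured on
    solutions of known concentration (at least two pairs).
    Assumption: linear interpolation between the two bracketing standards.
    """
    points = sorted(standards, key=lambda pair: pair[1])  # order by polarization
    lowest, highest = points[0][1], points[-1][1]
    if not lowest <= measured_mp <= highest:
        raise ValueError("reading lies outside the range of the standards")
    for (c0, p0), (c1, p1) in zip(points, points[1:]):
        if p0 <= measured_mp <= p1:
            if p1 == p0:
                return (c0 + c1) / 2
            fraction = (measured_mp - p0) / (p1 - p0)
            return c0 + fraction * (c1 - c0)


# Hypothetical standards: (concentration, polarization in mP). In this
# made-up competitive format, polarization falls as concentration rises.
standards = [(0.0, 250.0), (1.0, 200.0), (2.0, 150.0), (4.0, 100.0)]
print(concentration_from_polarization(standards, 175.0))
```

Readings outside the range spanned by the standards are rejected rather than extrapolated, since the curve is not characterized there.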
Biological activity of the novel 17-PSA analogues and spinosyn analogues was determined against beet armyworm larvae. Whereas none of the 17-PSA variants showed any activity, a subset of the spinosyn A analogues displayed good insecticidal activity, albeit less than spinosad. Consequently, the method and system of the invention provides libraries of novel glycosylated compound suitable for screening for a desirable property of interest. Transgenic non-human animals The methods of the invention can be practiced using transgenic non-human animals comprising a nucleic acid encoding a glycosyltransferase or a deoxysugar pathway enzyme. The invention also provides transgenic non-human animals comprising a nucleic acid or a polypeptide generated by a method of the invention. The invention provides transgenic non-human animals comprising an expression cassette or vector or a transfected or transformed cell comprising a nucleic acid encoding a glycosyltransferase or a deoxysugar pathway enzyme. The invention also provides methods of making and using these transgenic non-human animals. The transgenic non-human animals can be used as in vivo screening models for identifying glycosyltransferase or a deoxysugar pathway enzymes, e.g, enzymes modified by the methods of the invention. The transgenic non-human animals can be, e.g, goats, rabbits, sheep, pigs, cows, rats and mice, comprising the nucleic acids encoding glycosyltransferase or a deoxysugar pathway enzyme. These animals can be used, e.g, as in vivo models to screen for or to study glycosyltransferase or a deoxysugar pathway enzymes. The coding sequences for the polypeptides to be expressed in the transgenic non-human animals can be designed to be constitutive, or, under the control of tissue-specific, developmental- specific or inducible transcriptional regulatory factors. Transgenic non-human animals can be designed and generated using any method known in the art; see, e.g, U.S. Patent Nos. 
6,211,428; 6,187,992; 6,156,952; 6,118,044; 6,111,166; 6,107,541; 5,959,171; 5,922,854; 5,892,070; 5,880,327; 5,891,698; 5,639,940; 5,573,933; 5,387,742; 5,087,571, describing making and using transformed cells and eggs and transgenic mice, rats, rabbits, sheep, pigs and cows. See also, e.g. Pollock (1999) J. Immunol. Methods 231:147-157, describing the production of recombinant proteins in the milk of transgenic dairy animals; Baguisi (1999) Nat. Biotechnol. 17:456-461, demonstrating the production of transgenic goats. U.S. Patent No. 6,211,428, describes making and using transgenic non-human mammals which express in their brains a nucleic acid construct comprising a DNA sequence. U.S. Patent No. 5,387,742, describes injecting cloned recombinant or synthetic DNA sequences into fertilized mouse eggs, implanting the injected eggs in pseudo-pregnant females, and growing to term transgenic mice whose cells express proteins related to the pathology of Alzheimer's disease. U.S. Patent No. 6,187,992, describes making and using a transgenic mouse whose genome comprises a disruption of the gene encoding amyloid precursor protein (APP). "Knockout animals" can also be used to practice the methods of the invention or to screen for a glycosyltransferase or a deoxysugar pathway to generate a natural product of the invention. For example, in one aspect, the transgenic or modified animals comprise a "knockout animal," e.g, a "knockout mouse," engineered not to express an endogenous gene, which is replaced with a gene expressing a heterologous glycosyltransferase and/or a heterologous deoxysugar pathway.
Transgenic Plants and Plant Parts The invention can be practiced using transgenic plants and plant parts (e.g, individual cells, seeds, fruits, flowers, leaves, roots, tubers, etc.) comprising a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway. In one aspect, the methods comprise providing a transgenic plant capable of constitutively or inducibly expressing a glycosyltransferase and/or a deoxysugar pathway. In one aspect, a plant or plant cell is used to generate a glycosylated natural product of the invention, which is then applied to a plant, plant part, or any surface. The invention also provides transgenic plants and plant parts comprising a nucleic acid, a polypeptide, an expression cassette or vector or a transfected or transformed cell comprising a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway.
The term "transgenic plant" includes plants that comprise within their genome a heterologous polynucleotide, e.g., a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway. Generally, the heterologous polynucleotide is stably integrated within the genome such that the polynucleotide is passed on to successive generations. The heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant expression cassette. "Transgenic" is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of heterologous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic. The term "transgenic" as used herein does not encompass the alteration of the genome (chromosomal or extra-chromosomal) by conventional plant breeding methods or by naturally occurring events such as random cross-fertilization, nonrecombinant viral infection, nonrecombinant bacterial transformation, nonrecombinant transposition, or spontaneous mutation. The transgenic plant can be dicotyledonous (a dicot) or monocotyledonous (a monocot). The invention also provides methods of making and using transgenic plants and plant parts. The transgenic plants and plant parts can be constructed in accordance with any method known in the art. See, for example, U.S. Patent No. 6,309,872. A plant can be transformed with a nucleotide sequence (e.g., a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway) to, e.g., generate a composition of the invention, in the transformed plant or plant products.
Transgenic plants and plants transformed with a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway used to practice the invention include, for example, species from the genera Cucurbita, Rosa, Vitis, Juglans, Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Heterocallis, Nemesis, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Pisum, Phaseolus, Lolium, Oryza, Zea, Avena, Hordeum, Secale, Triticum, Sorghum, Picea, Caco, and Populus. Nucleic acids and expression constructs can be introduced into a plant cell by any means. For example, nucleic acids or expression constructs can be introduced into the genome of a desired plant host, or, the nucleic acids or expression constructs can be episomes. Introduction into the genome of a desired plant can be such that the host's glycosyltransferase and/or a deoxysugar production is regulated by endogenous transcriptional or translational control elements. The invention also provides "knockout plants" where insertion of gene sequence by, e.g., homologous recombination, has disrupted the expression of the endogenous gene. Means to generate "knockout" plants are well-known in the art, see, e.g., Strepp (1998) Proc Natl. Acad. Sci. USA 95:4368-4373; Miao (1995) Plant J 7:359-365. See discussion on transgenic plants, below. Transformation protocols as well as protocols for introducing nucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Suitable methods of introducing nucleotide sequences into plant cells and subsequent insertion into the plant genome include microinjection (Crossway et al.
(1986) Biotechniques 4:320-334), electroporation (Riggs et al. (1986) Proc. Natl. Acad. Sci. USA 83:5602-5606, Agrobacterium-mediated transformation (Townsend et al, U.S. Pat. No. 5,563,055), direct gene transfer (Paszkowski et al. (1984) EMBO J. 3:2717-2722), and ballistic particle acceleration (see, for example, Sanford et al, U.S. Pat. No. 4,945,050; Tomes et al. (1995) "Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment," in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); and McCabe et al. (1988) Biotechnology 6:923-926). See also Weissinger et al. (1988) Ann. Rev. Genet. 22:421-477; Sanford et al. (1987) Particulate Science and Technology 5:27-37 (onion); Christou et al. (1988) Plant Physiol. 87:671-674 (soybean); McCabe et al. (1988) Bio/Technology 6:923-926 (soybean); Finer and McMullen (1991) In vitro Cell Dev. Biol. 27PT75-182 (soybean); Singh et al. (1998) Theor. Appl. Genet. 96:319-324 (soybean); Datta et al. (1990) Biotechnology 8:736-740 (rice); Klein et al. (1988) Proc. Natl. Acad. Sci. USA 85:4305-4309 (maize); Klein et al. (1988) Biotechnology 6:559-563 (maize); Tomes, U.S. Pat. No. 5,240,855; Buising et al, U.S. Pat. Nos. 5,322,783 and 5,324,646; Tomes et al. (1995) "Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment," in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg (Springer-Verlag, Berlin) (maize); Klein et al. (1988) Plant Physiol. 91:440-444 (maize); Fromm et al. (1990) Biotechnology 8:833-839 (maize); Hooykaas-Van Slogteren et al. (1984) Nature (London) 311 :763-764; Bowen et al, U.S. Pat. No. 5,736,369 (cereals); Bytebier et al. (1987) Proc. Natl. Acad. Sci. USA 84:5345-5349 (Liliaceae); De Wet et al. (1985) in The Experimental Manipulation of Ovule Tissues, ed. Chapman et al. (Longman, NY.), pp. 197-209 (pollen); Kaeppler et al. (1990) Plant Cell Reports 9:415-418 and Kaeppler et al. (1992) Theor. Appl. 
Genet. 84:560-566 (whisker-mediated transformation); D'Halluin et al. (1992) Plant Cell 4:1495-1505 (electroporation); Li et al. (1993) Plant Cell Reports 12:250-255 and Christou and Ford (1995) Annals of Botany 75:407-413 (rice); Osjoda et al. (1996) Nature Biotechnology 14:745-750 (maize via Agrobacterium tumefaciens). In one aspect, the first step in production of a transgenic plant involves making an expression construct for expression in a plant cell. These techniques are well known in the art. They can include selecting and cloning a promoter, a coding sequence for facilitating efficient binding of ribosomes to mRNA and selecting the appropriate gene terminator sequences. One exemplary constitutive promoter is CaMV35S, from the cauliflower mosaic virus, which generally results in a high degree of expression in plants. Other promoters are more specific and respond to cues in the plant's internal or external environment. An exemplary light-inducible promoter is the promoter from the cab gene, encoding the major chlorophyll a/b binding protein. In one aspect, the nucleic acid is modified to achieve greater expression in a plant cell. For example, a sequence encoding a glycosyltransferase or a deoxysugar pathway enzyme may have a higher percentage of A-T nucleotide pairs compared to that seen in a plant, some of which prefer G-C nucleotide pairs. Therefore, A-T nucleotides in the coding sequence can be substituted with G-C nucleotides without significantly changing the amino acid sequence to enhance production of the gene product in plant cells. A selectable marker gene can be added to the gene construct in order to identify plant cells or tissues that have successfully integrated the transgene. This may be necessary because achieving incorporation and expression of genes in plant cells is a rare event, occurring in just a few percent of the targeted tissues or cells.
Selectable marker genes encode proteins that provide resistance to agents that are normally toxic to plants, such as antibiotics or herbicides. Only plant cells that have integrated the selectable marker gene will survive when grown on a medium containing the appropriate antibiotic or herbicide. As for other inserted genes, marker genes also require promoter and termination sequences for proper function. In one aspect, making transgenic plants or plant parts comprises incorporating sequences (e.g, a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway) and, optionally, marker genes into a target expression construct (e.g, a plasmid), along with positioning of the promoter and the terminator sequences. This can involve transferring the modified gene into the plant through a suitable method. For example, a construct may be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the constructs can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. For example, see, e.g, Christou (1997) Plant Mol. Biol. 35:197-203; Pawlowski (1996) Mol. Biotechnol. 6:17-30; Klein (1987) Nature 327:70- 73; Takumi (1997) Genes Genet. Syst. 72:63-69, discussing use of particle bombardment to introduce transgenes into wheat; and Adam (1997) supra, for use of particle bombardment to introduce YACs into plant cells. For example, Rinehart (1997) supra, used particle bombardment to generate transgenic cotton plants. Apparatus for accelerating particles is described U.S. Pat. No. 5,015,580; and, the commercially available BioRad (Biolistics) PDS-2000 particle acceleration instrument; see also, John, U.S. Patent No. 5,608,148; and Ellis, U.S. Patent No. 5, 681,730, describing particle- mediated transformation of gymnosperms. 
Plant signal sequences are useful in the invention, including, but not limited to: signal-peptide encoding DNA/RNA sequences which target proteins to the extracellular matrix of the plant cell (Dratewka-Kos et al. (1989) J. Biol. Chem. 264:4896-4900), such as that of the Nicotiana plumbaginifolia extensin gene (DeLoose et al. (1991) Gene 99:95-100); signal peptides which target proteins to the vacuole, like those of the sweet potato sporamin gene (Matsuka et al. (1991) PNAS 88:834) and the barley lectin gene (Wilkins et al. (1990) Plant Cell 2:301-313); signal peptides which cause proteins to be secreted, such as that of PRIb (Lind et al. (1992) Plant Mol. Biol. 18:47-53) or the barley alpha amylase (BAA) (Rahmatullah et al. (1989) Plant Mol. Biol. 12:119); the signal peptide from the ESP1 or BEST1 gene; and signal peptides which target proteins to the plastids, such as that of rapeseed enoyl-ACP reductase (Verwaert et al. (1994) Plant Mol. Biol. 26:189-202).
In one aspect, protoplasts can be immobilized and injected with a nucleic acid, e.g., an expression construct. Although plant regeneration from protoplasts is not easy with cereals, plant regeneration is possible in legumes using somatic embryogenesis from protoplast-derived callus. Organized tissues can be transformed with naked DNA using the gene gun technique, in which DNA is coated on tungsten microprojectiles about 1/100th the size of cells; these are shot into the tissue and carry the DNA deep into cells and organelles.
Transformed tissue is then induced to regenerate, usually by somatic embryogenesis. This technique has been successful in several cereal species including maize and rice. Nucleic acids, e.g, expression constructs, can also be introduced in to plant cells using recombinant viruses. Plant cells can be transformed using viral vectors, such as, e.g, tobacco mosaic virus derived vectors (Rouwendal (1997) Plant Mol. Biol. 33:989-999), see Porta (1996) "Use of viral replicons for the expression of genes in plants," Mol. Biotechnol. 5:209-221. Alternatively, nucleic acids, e.g, an expression construct, can be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria. Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary vectors, are well described in the scientific literature. See, e.g, Horsch (1984) Science 233:496-498; Fraley (1983) Proc. Natl. Acad. Sci. USA 80:4803 (1983); Gene Transfer to Plants, Potrykus, ed. (Springer-Verlag, Berlin 1995). The DNA in an A. tumefaciens cell is contained in the bacterial chromosome as well as in another structure known as a Ti (tumor-inducing) plasmid. The Ti plasmid contains a stretch of DNA termed T-DNA (-20 kb long) that is transferred to the plant cell in the infection process and a series of vir
(virulence) genes that direct the infection process. A. tumefaciens can only infect a plant through wounds: when a plant root or stem is wounded it gives off certain chemical signals, in response to which, the vir genes of A. tumefaciens become activated and direct a series of events necessary for the transfer of the T-DNA from the Ti plasmid to the plant's chromosome. The T-DNA then enters the plant cell through the wound. One speculation is that the T-DNA waits until the plant DNA is being replicated or transcribed, then inserts itself into the exposed plant DNA. In order to use A. tumefaciens as a transgene vector, the tumor-inducing section of T-DNA have to be removed, while retaining the T-DNA border regions and the vir genes. The transgene is then inserted between the T-DNA border regions, where it is transferred to the plant cell and becomes integrated into the plant's chromosomes. In practicing the invention, monocotyledonous plants can be transformed using a nucleic acid encoding a glycosyltransferase and/or a deoxysugar pathway. Monocotyledonous plants used to practice the invention include all cereals, see Hiei (1997) Plant Mol. Biol. 35:205-218. See also, e.g., Horsch, Science (1984) 233:496; Fraley (1983) Proc. Natl. Acad. Sci USA 80:4803; Thykjaer (1997) supra; Park (1996) Plant Mol. Biol. 32:1135-1148, discussing T-DNA integration into genomic DNA. See also D'Halluin, U.S. Patent No. 5,712,135, describing a process for the stable integration of a DNA comprising a gene that is functional in a cell of a cereal, or other monocotyledonous plant. In one aspect, the third step can involve selection and regeneration of whole plants capable of transmitting the incorporated target gene (e.g, heterologous glycosyltransferase or a heterologous deoxysugar pathway) to the next generation. 
Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker that has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al. Protoplasts Isolation and Culture,
Handbook of Plant Cell Culture, pp. 124-176, Macmillan Publishing Company, New
York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1985. Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee (1987) Ann. Rev. of Plant Phys. 38:467-486. To obtain whole plants from transgenic tissues such as immature embryos, they can be grown under controlled environmental conditions in a series of media containing nutrients and hormones, a process known as tissue culture. Once whole plants are generated and produce seed, evaluation of the progeny begins. After the expression cassette is stably incorporated in transgenic plants, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed. Since transgenic expression of nucleic acids encoding polypeptides having a glycosyltransferase or a deoxysugar pathway can lead to phenotypic changes, plants comprising the recombinant nucleic acids can be sexually crossed with a second plant to obtain a final product. In one aspect, a seed comprising a nucleic acid encoding a heterologous glycosyltransferase and a heterologous deoxysugar pathway is derived from a cross between two transgenic plants, or a cross between a plant comprising a nucleic acid encoding heterologous glycosyltransferase and a heterologous deoxysugar pathway and another plant. The desired effects, e.g, expression of a polypeptide having a heterologous glycosyltransferase and a heterologous deoxysugar pathway, can be enhanced when both parental plants express these polypeptides. The desired effects can be passed to future plant generations by standard propagation means. 
In practicing the methods of the invention, nucleic acids encoding a heterologous glycosyltransferase and a heterologous deoxysugar pathway can be expressed in or inserted in any plant or plant part, e.g, a seed, fruit, flower, a root, a tuber and the like. Transgenic plants can be dicotyledonous or monocotyledonous. Examples of monocot transgenic plants of the invention and used to practice the invention include grasses, such as meadow grass (blue grass, Pod), forage grass such as festuca, lolium, temperate grass, such as Agrostis, and cereals, e.g, wheat, oats, rye, barley, rice, sorghum, and maize (corn). Examples of dicot transgenic plants of the invention and used to practice the methods of the invention include tobacco, legumes, such as lupins, potato, sugar beet, pea, bean and soybean, and cruciferous plants (family Brassicaceae), such as cauliflower, rape seed, and the closely related model organism Arabidopsis thaliana. Thus, the transgenic plants and plant parts used to practice the invention include a broad range of plants, including, but not limited to, species from the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panieum, Pannisetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea. In alternative aspects, the nucleic acids encoding heterologous glycosyltransferases and/or heterologous deoxysugar pathways are expressed in plants which contain fiber cells, including, e.g., cotton, silk cotton tree (Kapok, Ceiba pentandra), desert willow, creosote bush, winterfat, balsa, ramie, kenaf, hemp, roselle, jute, sisal abaca and flax. 
In alternative aspect, transgenic plants used to practice the invention are members of the genus Gossypium, including members of any Gossypium species, such as G. arboreum;. G. herbaceum, G. barbadense, and G. hirsutum. The invention also provides for transgenic plants for producing large amounts of heterologous glycosyltransferases or heterologous deoxysugar pathway enzymes, resulting in generation of compositions of the invention. For example, see Palmgren (1997) Trends Genet. 13:348; Chong (1997) Transgenic Res. 6:289-296 (producing human milk protein beta-casein in transgenic potato plants using an auxin-inducible, bi-directional mannopine synthase (masl',2') promoter with Agrobacterium tumefaciens-mediated leaf disc transformation methods). The modified plant may be grown into plants in accordance with conventional ways. See, for example, McCormick et al. (1986) Plant Cell. Reports 5:81- 84. These plants may then be grown, and either pollinated with the same transformed strain or different strains, and the resulting hybrid having the desired phenotypic characteristic identified. Two or more generations may be grown to ensure that the subject phenotypic characteristic is stably maintained and inherited and then seeds harvested to ensure the desired phenotype or other property has been achieved. The heterologous glycosyltransferase and/or heterologous deoxysugar pathway enzymes can be fermented in a bacterial host and the resulting bacteria processed and used as a microbial spray. Any suitable microorganism can be used for this purpose. See, for example, Gaertner et al. (1993) in Advanced Engineered Pesticides, Kim (Ed.). The nucleic acids encoding a heterologous glycosyltransferase and/or heterologous deoxysugar pathway can be introduced into microorganisms that multiply on plants (epiphytes) to deliver the glycosylated natural products of the invention to potential target crops. Epiphytes can be gram-positive or gram-negative bacteria. 
In one aspect, the microorganisms that have been genetically altered to contain at least one nucleic acid encoding a heterologous glycosyltransferase and/or heterologous deoxysugar pathway are used for protecting agricultural crops and products. In one aspect, whole, i.e., unlysed, cells of transformed organisms of the invention are applied to the environment of a target plant. A nucleic acid encoding a heterologous glycosyltransferase and/or heterologous deoxysugar pathway can be introduced via a suitable vector into a microbial host, and said transformed host applied to the environment or plants or animals. The microorganism hosts will then produce a composition of the invention. In one aspect, microorganism hosts that are known to occupy the "phytosphere" (phylloplane, phyllosphere, rhizosphere, and/or rhizoplane) of one or more crops of interest are selected for transformation. These microorganisms are selected so as to be capable of successfully competing in the particular environment with the wild-type microorganisms, to provide for stable maintenance and expression of the gene expressing the pesticide/ insecticide of the invention. Exemplary microorganism hosts of the invention for producing a glycosylated natural product include bacteria, algae, and fungi, including bacteria such as Erwinia, Serratia, Klebsiella, Xanthomonas, Streptomyces, Rhizobium, Methylius, Agrobacterium, Acetobacter, Lactobacillus, Arthrobacter, Azotobacter, Leuconostoc, and Alcaligenes; fungi, particularly yeast, e.g, Saccharomyces, Pichia, Cryptococcus, Kluyveromyces, Sporobolomyces, Rhodotorula, and Aureobasidium. 
The Streptomyces includes Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavenulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygroscopicus, Streptomyces lasaliensis, Streptomyces venezuelae, Streptomyces antibioticus, Streptomyces albus, Streptomyces tsukubaensis, Streptomyces galilaeus or Streptomyces diversa. In one aspect, phytosphere bacterial species such as Serratia marcescens, Acetobacter xylinum,
Agrobacteria, Xanthomonas campestris, Rhizobium meliloti, Alcaligenes eutrophus, Clavibacter xyli, and Azotobacter vinelandii; and phytosphere yeast species such as Rhodotorula rubra, R. glutinis, R. marina, R. aurantiaca, Cryptococcus albidus, C. diffluens, C. laurentii, Saccharomyces rosei, S. pretoriensis, S. cerevisiae, Sporobolomyces roseus, S. odorus, Kluyveromyces veronae, and Aureobasidium pullulans are used. Exemplary prokaryote hosts of the invention for producing a glycosylated natural product, both Gram-negative and Gram-positive, include Enterobacteriaceae, such as Escherichia, Erwinia, Shigella, Salmonella, and Proteus; Bacillaceae; Rhizobiaceae, such as Rhizobium; Spirillaceae, such as Photobacterium, Zymomonas, Serratia, Aeromonas, Vibrio, Desulfovibrio, Spirillum; Lactobacillaceae; Acetobacter; Azotobacteraceae; and Nitrobacteraceae. Exemplary eukaryote hosts of the invention for producing a glycosylated natural product are fungi, such as Phycomycetes and Ascomycetes, which include yeast, such as Saccharomyces and Schizosaccharomyces; and Basidiomycetes yeast, such as Rhodotorula, Aureobasidium, Sporobolomyces, and the like. In various aspects, when selecting a host cell for producing a glycosylated natural product, characteristics including ease of introducing the nucleic acid into a host, availability of expression systems, efficiency of expression, stability of the protein in the host, and the presence of auxiliary genetic capabilities are considered. Other considerations include ease of formulation and handling, economics, storage stability, and the like. A number of ways are available for introducing a nucleic acid into a microorganism host under conditions that allow for stable maintenance and expression of the gene.
For example, expression cassettes can be constructed that include the DNA constructs operably linked with the transcriptional and translational regulatory signals for expression of the DNA constructs, and a DNA sequence homologous with a sequence in the host organism, whereby integration will occur, and/or a replication system that is functional in the host, whereby integration or stable maintenance will occur. Formulation and Administration Pharmaceuticals The invention provides pharmaceutical compositions comprising a glycosylated natural product of the invention. In one aspect, the pharmaceutical compositions are formulations that comprise a pharmacologically effective amount of a glycosylated natural product of the invention. In various aspects, the pharmaceutical compositions of the invention comprise antibiotics of the invention (e.g, erythromycin, tetracycline, rifampicin and the like glycosylated by an in vivo glycosylation system of the invention); anti-tumor drugs of the invention (daunorubicin, mithramycin glycosylated by an in vivo glycosylation system of the invention); immunosuppressants of the invention (rapamycin, FK520, FK506 glycosylated by an in vivo glycosylation system of the invention); anti-fungals of the invention (amphotericin glycosylated by an in vivo glycosylation system of the invention); antibacterials of the invention (tylosin glycosylated by an in vivo glycosylation system of the invention); antiparasitics of the invention (avermectin glycosylated by an in vivo glycosylation system of the invention); and, anti-tumor drugs of the invention such as anthracyclines glycosylated by an in vivo glycosylation system of the invention (e.g, doxorubicin, and second generation anthracyclines such as idarubicin and epirubicin) daunorubicin, mithramycin. These pharmaceuticals can be administered by any means in any appropriate formulation. 
Routine means to determine drug regimens and formulations to practice the invention are well described in the patent and scientific literature. For example, details on techniques for formulation, dosages, administration and the like are described in, e.g., the latest edition of Remington's Pharmaceutical Sciences, Maack Publishing Co, Easton PA. In practicing the invention, techniques for formulation, dosages, administration and the like for corresponding known natural products can be used, e.g, for doxorubicin, see, e.g, U.S. Patent No. 6,369,037. The formulations of the invention can include pharmaceutically acceptable carriers that can contain a physiologically acceptable compound that acts, e.g, to stabilize the composition or to increase or decrease the absorption of the pharmaceutical composition. Physiologically acceptable compounds can include, for example, carbohydrates, such as glucose, sucrose, or dextrans, antioxidants, such as ascorbic acid or glutathione, chelating agents, low molecular weight proteins, compositions that reduce the clearance or hydrolysis of any co-administered agents, or excipients or other stabilizers and/or buffers. Detergents can also used to stabilize the composition or to increase or decrease the absorption of the pharmaceutical composition. Other physiologically acceptable compounds include wetting agents, emulsifying agents, dispersing agents or preservatives that are particularly useful for preventing the growth or action of microorganisms. Various preservatives are well known, e.g, ascorbic acid.
One skilled in the art would appreciate that the choice of a pharmaceutically acceptable carrier, including a physiologically acceptable compound depends, e.g, on the route of administration and on the particular physio-chemical characteristics of any co- administered agent. In one aspect, the composition for administration comprises a pharmaceutically acceptable carrier, e.g, an aqueous carrier. A variety of carriers can be used, e.g, buffered saline and the like. These solutions are sterile and generally free of undesirable matter. These compositions may be sterilized by conventional, well-known sterilization techniques. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions such as pH adjusting and buffering agents, toxicity adjusting agents and the like, for example, sodium acetate, sodium chloride, potassium chloride, calcium chloride, sodium lactate and the like. The concentration of active agent in these formulations can vary widely, and will be selected primarily based on fluid volumes, viscosities, body weight and the like in accordance with the particular mode of administration and imaging modality selected. The pharmaceutical formulations of the invention can be administered in a variety of unit dosage forms, the general medical condition of each patient, the method of administration, and the like. Details on dosages are well described in the scientific and patent literature, see, e.g., the latest edition of Remington's Pharmaceutical Sciences. The exact amount and concentration of pharmaceutical of the invention and the amount of formulation in a given dose, or the "effective dose" can be routinely determined by, e.g, the clinician (see above discussion of "a pharmacologically effective amount of a glycosylated natural product"). The "dosing regimen," will depend upon a variety of factors, e.g, the general state of the patient's health, age and the like. 
Using guidelines describing alternative dosaging regimens, e.g, from the use of other imaging contrast agents, the skilled artisan can determine by routine trials optimal effective concentrations of pharmaceutical compositions of the invention. The invention is not limited by any particular dosage range. The pharmaceutical compositions of the invention can be delivered by any means known in the art systemically (e.g, intravenously), regionally, or locally (e.g, infra- or peri-tumoral or intracystic injection) by, e.g, intraarterial, intratumoral, intravenous (IV), parenteral, intra-pleural cavity, topical, oral, or local administration, as subcutaneous, intra-tracheal (e.g, by aerosol) or transmucosal (e.g, buccal, bladder, vaginal, uterine, rectal, nasal mucosa), intra-tumoral (e.g, fransdermal application or local injection). For example, intra-arterial injections can be used to have a "regional effect," e.g, to focus on a specific organ (e.g, brain, liver, spleen, lungs). Formulations suitable for oral administration can comprise liquid solutions, such as an effective amount of the compound dissolved in diluents, such as water, saline, or fruit juice; capsules, sachets or tablets, each containing a predetermined amount of the active ingredient, as solid, granules or freeze-dried cells; solutions or suspensions in an aqueous liquid; and oil-in-water emulsions or water-in-oil emulsions. Tablet forms can include one or more of lactose, mannitol, corn starch, potato starch, macrocrystalline cellulose, acacia, gelatin, colloidal silicon dioxide, croscarmellose sodium, talc, magnesium stearate, stearic acid, and other excipients, colorants, diluents, buffering agents, moistening agents, preservatives, flavoring agents, and pharmacologically compatible carriers. 
Suitable formulations for oral delivery can also be incorporated into synthetic and natural polymeric microspheres, or other means to protect the agents of the present invention from degradation within the gastrointestinal tract. See, for example, Wallace (1993) Science 260:912-915. The glycosylated natural products or conjugates thereof, alone or in combination with other similar acting compounds, can be made into aerosol formulations to be administered via inhalation. These aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen and the like. The glycosylated natural products or conjugates thereof, alone or in combinations with other similar acting compounds or absorption modulators, can be made into suitable formulations for fransdermal application and absorption. Transdermal electroporation or iontophoresis also can be used to promote and/or control the systemic delivery of a glycosylated natural product of the invention through the skin, e.g, see Theiss et al, Meth. Find. Exp. Clin. Pharmacol. 13, 353-359, 1991. Formulations suitable for topical administration of a glycosylated natural product of the invention can include lozenges comprising the active ingredient in a flavor, usually sucrose and acacia or tragacanth; pastilles comprising the active ingredient in an inert base, such as gelatin and glycerin, or sucrose and acacia; and mouthwashes comprising a natural product of the invention in a suitable liquid carrier; as well as creams, emulsions, gels and the like. Formulations for rectal administration can be presented as a suppository with a suitable base comprising, for example, cocoa butter or a salicylate. 
Formulations suitable for vaginal administration can be presented as pessaries, tampons, creams, gels, pastes, foams, or spray formulas containing, in addition to the active ingredient, such as, for example, freeze-dried bacteria genetically engineered to directly produce a glycosylated natural product of the invention, such carriers as are known in the art to be appropriate. Similarly, a natural product of the invention can be combined with a lubricant as a coating on a condom. A natural product of the invention can be applied to any contraceptive device, e.g, a condom, a diaphragm, a cervical cap, a vaginal ring or a sponge. Formulations suitable for parenteral administration can include aqueous and non-aqueous, isotonic sterile injection solutions, which can contain anti-oxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives. The pharmaceutical formulations of the invention can be presented in unit- dose or multi-dose sealed containers, such as ampoules and vials, and can be stored in a freeze-dried (lyophilized) condition requiring only the addition of the sterile liquid excipient, for example, water, for injections, immediately prior to use. Extemporaneous injection solutions and suspensions can be prepared from sterile powders, granules, and tablets. Therapeutic compositions can also be administered in a lipid formulation, e.g, complexed with liposomes or in lipid/nucleic acid complexes or encapsulated in liposomes, as in immuno-liposomes directed to specific cells. These lipid formulations can be administered topically, systemically, or delivered via aerosol. See, e.g, U.S. Patent Nos. 6,149,937; 6,146,659; 6,143,716; 6,133,243; 6,110,490; 6,083,530; 6,063,400; 6,013,278; 5,958,378; 5,552,157. 
In one aspect, the pharmaceutical composition of the invention comprises an altered glycosylated natural product of the invention, e.g, a glycosylated natural product of the invention with a modified molecular structure to, e.g, provide enhanced stability to the glycosylated natural product, as discussed above. The invention provides various delivery strategy systems and routes of administration for glycosylated natural products of the invention, all of which are well known in the art, see e.g, Epstein, CRC Crit. Rev. Therapeutic Drug Carrier Systems 5, 99-139, 1988; Siddiqui et al, CRC Crit. Rev. Therapeutic Drug Carrier Systems 3, 195- 208, 1987; Banga et al. Int. J. Pharmaceutics 48, 15-50, 1988; Sanders, Eur. J. Drug Metab. Pharmacokinetics 15, 95-102, 1990; and Verhoef, Eur. J. Drug Metab. Pharmacokinetics 15, 83-93, 1990. The appropriate delivery system for a given glycosylated natural product or conjugate thereof will depend upon its particular nature, the particular clinical application, and the site of drug action. In one aspect, a glycosylated natural product of the invention is used with an absorption-enhancing agent. Any absorption-enhancing agent can be used, e.g, those applied in combination with protein and peptide drugs for oral delivery and for delivery by other routes, see, e.g, van Hoogdalem, Pharmac. Ther. 44, 407-443, 1989; Davis, J. Pharm. Pharmacol. 44(Suppl. 1), 186-190, 1992. Enhancers used in the compositions and methods of the invention include, e.g, (a) chelators, such as EDTA, salicylates, and N- acyl derivatives of collagen, (b) surfactants, such as lauryl sulfate and polyoxyethylene-9- lauryl ether, (c) bile salts, such as glycolate and taurocholate, and derivatives, such as taurodihydrofusidate, (d) fatty acids, such as oleic acid and capric acid, and their derivatives, such as acylcamitines, monoglycerides and diglycerides, (e) non-surfactants, such as unsaturated cyclic ureas, (f) saponins, (g) cyclodextrins, and (h) phospholipids. 
Other approaches to enhancing oral delivery of natural product drugs are also used the practice the invention, e.g, chemical modifications to enhance stability to gastrointestinal enzymes and/or increased lipophilicity. Alternatively, or in addition, the glycosylated natural product can be administered in combination with other drugs or substances. Another alternative approach to prevent or delay gastrointestinal absorption of glycosylated natural product is to incorporate it into a delivery system that is designed to protect the natural product from contact with the proteolytic enzymes in the intestinal lumen and to release the natural product only upon reaching an area favorable for its absorption. In one aspect, a biodegradable microcapsules or microspheres is used with a glycosylated natural product of the invention, both to protect it from degradation, as well as to effect a prolonged release of active drug, see, e.g, Deasy, in Microencapsulation and Related Processes, Swarbrick, ed., Marcell Dekker, Inc.: New York, 1984, pp. 1-60, 88- 89, 208-211. Microcapsules also can provide a useful way to effect a prolonged delivery of a natural product drug after injection, see, e.g, Maulding, J. Controlled Release 6, 167- 176, 1987. The invention also provides a pharmaceutical composition comprising an isolated or purified glycosylated natural product or glycosylated natural product conjugate, a matrix-anchored glycosylated natural product or a matrix-anchored glycosylated natural product conjugate. The composition of the invention can further comprise a carrier, such as a pharmaceutically acceptable carrier. The composition of the invention can further comprise an antiviral compound, e.g, AZT, ddl, ddC, gancyclovir, fluorinated dideoxynucleosides, nevirapine, R82913, Ro 31-8959, BI-RJ-70, acyclovir, alpha-interferon, recombinant sCD4, michellamines, calanolides, nonoxynol-9, gossypol and derivatives thereof, and gramicidin. 
As discussed above, the glycosylated natural product of the invention used in the pharmaceutical composition of the invention can be isolated and purified from nature or genetically engineered. A glycosylated natural product conjugate of the invention can be genetically engineered or chemically coupled. Formulations of the invention comprising a glycosylated natural product or glycosylated natural product conjugate of the invention can be used for sterilization of inanimate objects, such as medical supplies or equipment, laboratory equipment and supplies, instruments, devices, and the like. Similarly, formulations of the invention are used for ex vivo sterilization from a sample, such as blood, blood products, sperm, or other bodily products, such as a fluid, cells, a tissue or an organ, or any other solution, suspension, emulsion, vaccine formulation, or any other material which can be administered to a patient in a medical procedure, can be selected or adapted as appropriate by one skilled in the art, from any of the aforementioned compositions or formulations. In one aspect, a glycosylated natural product of the invention is attached to a solid support matrix as an antiseptic or anti-microbial in a sample, e.g, a bodily product such as a fluid, cells, a tissue or an organ from an organism, in particular a mammal, such as a human, including, for example, blood, a component of blood, or sperm. In one aspect, a glycosylated natural product or the invention, comprises a condom, a diaphragm, a cervical cap, a vaginal ring or a sponge. The compositions of the invention can be used to treat objects or materials, such as medical equipment, supplies, or fluids, including biological fluids, such as blood, blood products and vaccine formulations, cells, tissues and organs.
EXAMPLES
Example 1: Cloning of a spinosyn biosynthetic gene cluster
To clone a deoxysugar biosynthetic pathway gene for use in the systems and methods of the invention, a fosmid library of the spinosyn producer Sac. spinosa was generated. Using the previously published sequence (Genbank accession AY007564, 48), a set of four probes was designed and used in a colony hybridization experiment to identify clones containing the sugar biosynthetic genes. Forty hybridizing clones were obtained and analyzed by PCR and restriction analysis. Subsequently, a subset of 10 overlapping fosmid clones containing the complete spinosyn biosynthetic pathway was identified, including two clones (spin-29, spin-33) that contained sequences encoding the genes for the biosynthesis of the associated deoxysugars, as illustrated in Figure 6 (showing the organization of the spinosyn gene cluster and location of isolated fosmid clones; green: PKS genes, red: L-rhamnose genes, blue: D-forosamine genes).
Example 2: Cloning of individual genes for expression in a Streptomyces
All GT, methyltransferase and deoxysugar biosynthetic genes were cloned by PCR using either chromosomal DNA or appropriate fosmid clones as templates. A total of 14 genes were cloned. Restriction sites to facilitate subsequent cloning steps were included in the PCR amplification primers. All PCR fragments were initially cloned in TOPO-vectors (Invitrogen, San Diego, CA) to verify their sequence. Clones having the predicted sequence were then used for further work. A list of genes cloned is shown in Table 1:
Figure imgf000131_0001
Figure imgf000131_0002
Example 3: Tools for the expression of deoxysugar pathways in Streptomyces
Host strains: The cloned and verified GT genes, spnG and spnP, were cloned into the expression vector pUWL201. These constructs, as well as a vector-only control, were then transferred to the Streptomyces by electroporation. After verifying the presence of the plasmids, spores were prepared that were used subsequently as recipients of the cloned deoxysugar pathways described below. Thus, three different strains were generated for the analysis of deoxysugar pathway expression.
Expression vector: To facilitate expression of deoxysugar pathways in the Streptomyces used in these studies, the chromosomal integration vector pAT6 was constructed by incorporation of the ermEp* promoter (49) adjacent to the cloning site of pSET152 (50). ermEp* is a strong constitutive promoter from Saccharopolyspora erythraea ((9) Bibb, M. J.; Janssen, G. R.; Ward, J. M. Gene 1985, 38 (1-3), 215-226). At the same time, restriction sites suitable for insertion of cloned deoxysugar pathways downstream of the promoter were added. Thus a versatile two-vector expression system was constructed, allowing for facile combination of deoxysugar pathways and glycosyltransferases.
Example 4: Construction of deoxysugar pathways
To generate deoxysugar pathways, individual genes were combined in a step-by-step process in the E. coli cloning vector pMUN3. Complete pathways were then transferred to the expression vector pAT6. This process was time consuming, reflecting the large number of sequential steps involved:
Figure imgf000132_0001
(Scheme labels: 4 different 6-deoxysugar pathways; 4 different 2,6-di-deoxysugar pathways; replace epi with forosamine genes; 4 different forosamine analogous pathways.)
Several constructs were unstable in E. coli, resulting in the deletion of genes from the recombinant pathways. This may reflect expression of genes or partial pathways in E. coli, and interference with lipopolysaccharide (LPS) biosynthesis. E. coli LPS is known to contain 6-deoxysugars such as L-rhamnose and L-fucose. This problem was partially overcome by growing strains at lower temperatures (30°C). Using the different epi and kre homologues from Streptomyces and Sac. spinosa, eight different pathways were constructed, see Table 2:
Figure imgf000132_0002
In Table 2, deoxysugar pathways were constructed as described herein. All pathways contained the gtt and gdh genes from Sac. spinosa, in addition to the genes listed (1-4: 6-deoxysugar pathways, 9-12: 2,6-di-deoxysugar pathways, 17-20: forosamine analogous pathways; red (or underlined): genes from Sac. spinosa, blue: genes from Streptomyces). These include reconstruction of the L-rhamnose and L-digitoxose pathways from Sac. spinosa and the Streptomyces used in these studies, respectively. Based on the same functionality of the two kre genes, these eight pathways should have resulted in the production of four different deoxysugars. An additional four pathways, including the D-forosamine pathway, were generated by the addition of either one or all of the tfd, amt, and nmt genes to a truncated 2,6-dideoxysugar pathway. These four pathways are predicted to generate three additional sugars.
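The combinatorial logic behind Table 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes that each class of four pathways corresponds to the 2 x 2 choice of source organism for the epi and kre genes, which the text implies but does not spell out, and the function name `enumerate_pathways` is hypothetical. It makes no attempt to reproduce the actual pathway numbering of Table 2.

```python
from itertools import product

# Source organisms named in the text for the epi and kre homologues.
SOURCES = ("Sac. spinosa", "Streptomyces")
# Genes stated to be present in every pathway (both from Sac. spinosa).
COMMON_GENES = ("gtt", "gdh")


def enumerate_pathways(variable_genes):
    """Return one dict per combination of source organism for the variable genes.

    Assumption (not from the source): every variable gene is drawn
    independently from either organism.
    """
    return [
        dict(zip(variable_genes, combo))
        for combo in product(SOURCES, repeat=len(variable_genes))
    ]


six_deoxy = enumerate_pathways(("epi", "kre"))
for pathway in six_deoxy:
    genes = ", ".join(f"{g} ({src})" for g, src in pathway.items())
    print(f"{' + '.join(COMMON_GENES)} + {genes}")
```

Under this assumption two variable genes with two sources each give four combinations per pathway class, which matches the four 6-deoxysugar and four 2,6-di-deoxysugar pathways described.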
Example 5: Culture conditions
Initially, the stability of the substrates and their influence on growth of the Streptomyces used in these studies were investigated under conditions suitable for secondary metabolite production. None of the substrates (aglycone, 17-pseudo-aglycone = 17-PSA, 9-pseudo-aglycone = 9-PSA, see Figure 7a) inhibited growth of the Streptomyces at concentrations up to 100mg/ml. 17-PSA and 9-PSA were stable when added to cultures of the Streptomyces used in these studies. In contrast, this Streptomyces converted the aglycone to a 17-keto derivative (see Figure 7b) with an efficiency of 10-30%. This product was purified and the structure confirmed by NMR. Based on these results, a substrate loading concentration of 100mg/ml was chosen for all subsequent experiments. Taking into account extraction efficiency and the detection level for HPLC-UV-VIS analysis, as little as ~1-2% conversion could be detected (see below). These initial conditions were further optimized after glycosylation products of the aglycone were detected. Various culture volumes were analyzed and glycosylation could be shown in volumes ranging from 1ml in a 96-well format to 50ml in 250-ml baffled Erlenmeyer flasks. A media to vessel volume ratio of 1:5 resulted in the highest conversion ratio (e.g. 100% conversion of the aglycone to the rhamnosyl derivative in 72 h). Despite this optimization, however, significant clone to clone variability was observed. Therefore, at least four independent clones for each pathway / host-strain / substrate combination were analyzed for their glycosylation capabilities.
Example 6: Chemical analysis and purification of spinosyn derivatives
Liquid / liquid extraction with ethyl acetate proved to be almost 100% efficient and delivered highly reproducible clean extracts. This method was therefore chosen. However, this method is not easily adaptable to high throughput formats, such as 96-well plates. Other methods can also be used.
Using standard compounds (aglycone, 17-PSA, 9-PSA, spinosyn A) obtained from DAS, thin-layer chromatography (TLC), HPLC-UV-VIS, and HPLC-MS methods for spinosyn derivatives were established. The analysis time and detection level are summarized in Table 3:
Figure imgf000134_0001
Based on throughput, time required for sample preparation and sensitivity, HPLC-UV-Vis was chosen for the initial analysis of samples. To confirm attachment of sugars to the substrate, HPLC-MS analysis was used to determine the molecular weights of novel peaks in selected samples.
Example 7: Analysis of deoxysugar pathway expression in Streptomyces
All pathways were analyzed in the Streptomyces used in these studies (pUWL201) (vector-only control, i.e. in the absence of a cloned GT gene) and, as expected, no product formation was detected. The influence of expression of the cloned pathways on the Streptomyces itself was investigated by analyzing samples of extracts without addition of any substrate to the cultures. None of the pathways had any apparent effect on endogenous compounds or on the growth of the Streptomyces used in these studies.
6-deoxysugar pathways: The 6-deoxysugar pathways were analyzed in the Streptomyces (pUWL-SpnG). All pathways resulted in conversion of the aglycone to two different glycosylated products, designated M548 and M548-II. Relative production levels varied with the pathway, as shown in Table 4:
Figure imgf000134_0002
Table 4 shows conversion ratios to M548 and M548-II from the aglycone; ratios were calculated as relative peak areas of M548 : M548-II : aglycone. After scale-up and purification the structure of each compound was elucidated by NMR, as illustrated in Figure 8 (structures of 8(a) M548, 8(b) M548-II, and 8(c) M532; sugars are shown in red). M548 contains 6-deoxy-D-glucose attached at the 9 position, whereas M548-II contains L-rhamnose. Both M548 and M548-II are novel compounds. While the attachment of L-rhamnose to the spinosyn aglycone (M548-II) was predicted, the presence of a 6-deoxy-D-glucose in M548 was not expected. The production of this sugar can be explained if only the gtt, gdh, and kre gene products form a functional pathway without any contribution from the epi gene product, as illustrated in Figure 9. There are no reports of such a pathway in any known antibiotic-producing organism. Figure 9 illustrates L-rhamnose biosynthesis in Sac. spinosa (top) and 6-deoxy-D-glucose biosynthesis engineered in the Streptomyces used in these studies.
The different levels of L-rhamnose and 6-deoxy-D-glucose production (measured as M548-II and M548 production ratios, see Table 4) obtained with different pathways may be explained by differential expression of cloned pathway genes or by contributions from the chromosomal background of the Streptomyces used in these studies. The Streptomyces used in these studies contains several other epi and kre homologs in addition to the L-digitoxose pathway genes used for construction of these pathways. The host strain also contains chromosomal copies of the L-digitoxose genes. Depending on which genes are over-expressed in the artificial pathways, the ratio of activated L-rhamnose and 6-deoxy-D-glucose as substrate for SpnG could change, resulting in different ratios of M548 : M548-II.
2,6-dideoxysugar pathways: The 2,6-dideoxysugar pathways were analyzed in the Streptomyces used in these studies (pUWL-SpnG). Strains containing these pathways also produced M548 and M548-II, the compounds that were identified using 6-deoxysugar pathways. In addition, the expression of the reconstructed L-digitoxose pathway (#12) resulted in production of a compound with a molecular weight of 532, as expected for attachment of a 2,6-dideoxyhexose. This compound, designated M532 (Figure 8(c)), was purified and attachment of L-digitoxose confirmed by NMR. This result confirms that M532 is also a novel compound.
D-forosamine and analogous pathways: These pathways were analyzed both in the Streptomyces (pUWL-SpnP) and the Streptomyces (pUWL-SpnG) for conversion of the 17-PSA or aglycone, respectively. As expected, the reconstructed D-forosamine pathway (#17) resulted in conversion of the 17-PSA to spinosyn A. In addition to spinosyn A, further products were also identified. These included spinosyn B, or 4"-demethyl-spinosyn A, as well as two additional, as yet unidentified products. These two products were found after expression of three forosamine analogous pathways.
Expression of the pathway (#18) lacking the N-methyltransferase resulted in production of a compound of molecular weight 703, consistent with the expected attachment of a di-demethyl derivative of D-forosamine resulting in formation of spinosyn C, as illustrated in Figure 10. Figure 10 shows the products obtained from pathways 17 and 18 co-expressed with spnP after conversion of the 17-PSA (pathway 17: spinosyn A (R1=R2=CH3) and spinosyn B (R1=H, R2=CH3); pathway 18: spinosyn C (R1=R2=H)). Expression of these pathways in the Streptomyces (pUWL-SpnG) did not result in any new products. Pathways 19 and 20 generated M548, indicating again that gtt, gdh, and kre present in these pathways form a minimal 6-deoxy-D-glucose pathway without any contribution from any of the other genes.
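The product designations M548 and M532 used in this example can be checked with simple mass arithmetic: a glycoside is the aglycone plus the sugar minus one water. The sketch below is an illustration only. It assumes an elemental composition of C24H34O5 for the spinosyn aglycone (derived here from the published formula of spinosyn A by removing both sugars; this composition is not stated in the present text) and uses generic hexose compositions; the function names are hypothetical.

```python
# Monoisotopic masses of the elements involved.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Assumption: spinosyn aglycone taken as C24H34O5 (not given in the text).
AGLYCONE = {"C": 24, "H": 34, "O": 5}
WATER = {"H": 2, "O": 1}

# Generic sugar compositions (free sugar, before condensation).
SUGARS = {
    "6-deoxyhexose": {"C": 6, "H": 12, "O": 5},      # e.g. L-rhamnose
    "2,6-dideoxyhexose": {"C": 6, "H": 12, "O": 4},  # e.g. L-digitoxose
}


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def glycoside_mass(aglycone, sugar):
    """Mass of the condensation product: aglycone + sugar - H2O."""
    return mono_mass(aglycone) + mono_mass(sugar) - mono_mass(WATER)


for name, sugar in SUGARS.items():
    print(f"{name}: {glycoside_mass(AGLYCONE, sugar):.3f}")
```

With these assumed compositions the integer parts of the two masses are 548 and 532, in line with the names given to the 6-deoxyhexose and 2,6-dideoxyhexose products.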
Example 8: Production of compounds for biological testing
Studies in Sac. spinosa have shown that L-rhamnose is normally attached prior to attachment of the D-forosamine. Despite this, conversion of the 9-PSA by strains containing SpnG and the L-rhamnose, 6-deoxy-D-glucose, or L-digitoxose biosynthetic pathways was attempted. In all cases, transfer of the sugar to the 9-PSA was demonstrated, but at slightly lower conversion ratios than for the aglycone (see Table 4). These spinosyn analogues were designated M689 (6-deoxy-D-glucosyl analogue), M689-II (L-rhamnosyl analogue) and M673 (L-digitoxosyl analogue), as illustrated in Figure 11. Figure 11 shows the structures of novel spinosyn derivatives of the invention: M689 11(a), M689-II 11(b) and M673 11(c); modified sugars are shown in red.
In conclusion: Using spinosyn as a model compound, a system in a Streptomyces was successfully developed that is suitable for glycosylation of natural products. Twelve recombinant deoxysugar pathways were constructed and expressed. A total of at least six novel products were identified.
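The conversion ratios referred to above are described as relative peak areas (M548 : M548-II : aglycone). A minimal sketch of that calculation follows. The peak areas are invented placeholder values, the function names are hypothetical, and the calculation assumes that substrate and products have comparable UV response, which the text does not state.

```python
def relative_peak_areas(areas):
    """Express each integrated peak area as a percentage of the summed areas."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("summed peak area must be positive")
    return {name: 100.0 * area / total for name, area in areas.items()}


def percent_conversion(areas, substrate="aglycone"):
    """Share of the total area that is no longer substrate, in percent."""
    return 100.0 - relative_peak_areas(areas)[substrate]


# Placeholder areas (arbitrary units), not data from the study.
example = {"M548": 30.0, "M548-II": 50.0, "aglycone": 20.0}
print(relative_peak_areas(example))
print(percent_conversion(example))
```

Reporting the three-way ratio rather than a single conversion figure keeps the M548 : M548-II split visible, which is the quantity that varies between pathways.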
Example 9: Cloning of deoxysugar biosynthetic genes
The in vivo glycosylation systems of the invention can use any of the many deoxysugar biosynthetic genes that have been identified in actinomycetes, related species or other organisms, including insects, mammals, etc., including the sequences of genes from the more than 30 deoxysugar biosynthetic pathways available in public databases. Table 5 lists some of the sugar biosynthetic pathways that are used to practice the invention, and their organisms of origin:
Figure imgf000137_0001
These pathways cover a great deal of the structural diversity found in deoxysugars from Actinomycetales, and encompass different stages of deoxygenation, alternate aminations, inclusion of nitro groups, and N-, O- and C-methylations. A general scheme for deoxysugar biosynthesis that can be used to practice the invention is outlined in Figure 12. Most of the genes in these pathways are available from public databases and can be cloned directly from chromosomal DNA of the respective organism. However, the gene cluster for viriplanin (whose structure is illustrated in Figure 13), which contains 5 different sugars, including an unusual nitro-sugar moiety, has not been reported. As this cluster contains interesting deoxysugar biosynthetic pathway genes, as well as genes encoding anthracycline GTs, it is cloned de novo. PCR-based methods targeting the type II PKS (53), dNDP-glucose-4,6-dehydratase (54), and dNDP-glucose-2,3-dehydratase (55) genes, all of which are expected to be part of the viriplanin cluster, are used to generate probes for screening a fosmid library of Ampullariella regularis ATCC31417, the producing organism, by colony hybridization. Positive clones are confirmed and characterized by restriction analysis, PCR and hybridization. One or two fosmid clones are expected to cover the complete pathway; these are sequenced by a shotgun approach. Genes and their functions can be identified using sequence analysis tools. In addition to the genes from the pathways listed in Table 5, single genes with interesting functions from other pathways are cloned, for example, the C5-C-methyltransferase novU from Streptomyces sphaeroides.
To identify other genes that can be used in the in vivo glycosylation systems and methods of the invention, environmental sequence archives can be screened for homologues of sugar biosynthetic genes and GTs. For example, using genes from the L-rhamnose pathway from Sac. spinosa and the L-digitoxose pathway from a Streptomyces, a BLAST analysis was performed against a sequence database. This revealed potential homologues. These genes can be analyzed by bioinformatics, and those of most interest can be incorporated into the in vivo glycosylation systems and methods of the invention. All genes of interest can be cloned by PCR with their ribosomal binding site (rbs). If necessary (e.g. in case of genes of unknown origin), the rbs sequence will be based on a consensus of highly expressed genes from a Streptomyces. At this stage an EcoRI site will be introduced upstream and a MunI site downstream of the ORF to facilitate pathway construction, e.g., by GENEREASSEMBLY™.
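Flanking each gene with an EcoRI site upstream and a MunI site downstream only works cleanly if the gene itself contains neither site. The sketch below shows one way such a check could be scripted before primers are ordered. It is an illustration, not the procedure used in these studies: the function names and the example sequence are hypothetical, and only the recognition sequences (EcoRI GAATTC, MunI CAATTG) are established facts.

```python
# Recognition sequences; both are palindromic, so one strand is sufficient.
SITES = {"EcoRI": "GAATTC", "MunI": "CAATTG"}


def internal_sites(fragment):
    """Return {enzyme: [0-based start positions]} of sites inside a fragment."""
    seq = fragment.upper()
    return {
        enzyme: [i for i in range(len(seq) - 5) if seq[i:i + 6] == site]
        for enzyme, site in SITES.items()
    }


def flank(fragment):
    """Add an EcoRI site upstream and a MunI site downstream of rbs + ORF.

    Raises ValueError if the fragment already carries either site, since an
    internal site would be cut during pathway assembly.
    """
    hits = {e: pos for e, pos in internal_sites(fragment).items() if pos}
    if hits:
        raise ValueError(f"internal restriction sites found: {hits}")
    return SITES["EcoRI"] + fragment.upper() + SITES["MunI"]


# Hypothetical toy fragment (rbs + start codon + stop), not a real gene.
print(flank("GGAGGTTTTATGAAATGA"))
```

A gene that fails the check would need a silent mutation or a different enzyme pair; that decision is outside the scope of this sketch.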
Example 10: Construction of hybrid deoxysugar pathways
An alternative cloning vector can be constructed that can efficiently maintain the sequence integrity of cloned DNA. Low copy number vectors can be used as an alternative to the pUC-based high-copy vectors pMUN3 and pAT6 discussed above. A suitable multi cloning site (mcs) for pathway construction can be incorporated in pACYC-derived vectors. pACYC contains the replicon of P15A, which is reported to have a copy number of 15-50 per chromosome in E. coli. In addition, an efficient transcriptional terminator, such as the fd-terminator, can be included upstream of the mcs to inhibit any expression from vector-borne promoters. To assess the utility of these vectors, the stability of pathways that proved to be problematic (e.g., pathways #1 and #4, Table 2, above) can be investigated.
To increase the rate at which hybrid deoxysugar pathways can be constructed, the traditional restriction enzyme based cloning methods can be replaced with an adaptation of GENEREASSEMBLY™. In one aspect, for assembly of pathways all genes will carry EcoRI and MunI sites, which generate compatible overhangs. However, when ligated to each other, the product cannot be cut with either enzyme.
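The statement that EcoRI and MunI ends are compatible, yet give an uncuttable junction, follows directly from the two recognition sequences (EcoRI G^AATTC, MunI C^AATTG). The short sketch below works through it on the top strand; the helper names are hypothetical and the model ignores everything except the six-base site.

```python
ECORI = "GAATTC"  # cut after the first base: G^AATTC
MUNI = "CAATTG"   # cut after the first base: C^AATTG


def overhang(site):
    """Four-base 5' overhang left after cutting one base into the site."""
    return site[1:5]


def junction(upstream_site, downstream_site):
    """Six-base sequence formed when the left half of one cut site is
    ligated to the right half of another."""
    return upstream_site[:1] + downstream_site[1:]


# Both enzymes leave the same AATT overhang, so the ends can anneal.
print(overhang(ECORI), overhang(MUNI))

# Mixed junctions match neither recognition sequence.
for left, right in ((ECORI, MUNI), (MUNI, ECORI)):
    hybrid = junction(left, right)
    print(hybrid, hybrid in (ECORI, MUNI))
```

Because only like-with-like ligation regenerates a cuttable site, digesting with both enzymes after each round selects for correctly oriented gene-to-gene joints.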
GENEREASSEMBLY™ uses a bead-based ligation procedure. The first fragment is coupled to magnetic beads by a biotin label. A pool of additional fragments is then added and allowed to ligate, and excess fragments are removed. The next pool of fragments is added and the cycle repeated, as illustrated in Figure 14. At the end, the product is cleaved from the beads, ligated and introduced into E. coli by transformation. To adapt this technology for the construction of deoxysugar pathways, the L-rhamnose / L-digitoxose analogous pathways (see Table 2, above) generated as described above can be used as model systems. Using GENEREASSEMBLY™, the same subset of 8 pathways will be constructed as a library. The scheme in Figure 15 describes the construction process. This library will be transferred to Streptomyces-SpnG. Occurrence and relative ratios of aglycone conversion products M548, M548-II, and M532 from this library can be predicted based on the results of these studies, as shown in Table 6:
Figure imgf000139_0001
Table 6 shows the expected production patterns and frequency of occurrence; see the description of pathways in Table 2, above. Analysis of the production profile allows assessment as to whether the process results in any bias in pathway construction or yields a large percentage of nonfunctional pathways. Distribution of genes within the pathways can be analyzed at the E. coli stage by performing PCR and restriction analysis. The use of primers specific for the different epi, kre and tdh / tkr genes allows assessment of whether the genes are evenly distributed, as expected. Restriction analysis can be performed to assess the number of full-length pathways, and to identify specific pathways based on their patterns.
After successful implementation of GENEREASSEMBLY™ with the model pathways, a library of pathways can be constructed that includes additional biosynthetic genes. At each step, 2 to 4 homologues for each gene-function can be included, as illustrated in Figure 16. The position of genes in the hybrid pathways can follow the same order as their gene-product functions in biosynthesis: early genes (dNDP-transferase, 4,6-dehydratase), followed by intermediate genes (e.g. epimerases, 2-deoxygenation, C-methylations), and with late genes (e.g. 4-ketoreductases, aminotransferases) at the end. Because the reconstructed sugar biosynthetic pathways can be ordered with respect to gene function, it will be possible to modulate the length of pathways and the type of deoxysugar pathways produced, as illustrated in Figure 16. This can allow for the generation of smaller, specialized sub-libraries, such as 6-deoxysugar pathway libraries or amino-sugar libraries. The availability of specialized libraries can allow matching GTs of known specificity with pathways more likely to provide suitable substrates, thus minimizing the number of samples that have to be analyzed.
The quality of the library can be confirmed by restriction analysis of a representative number of clones to determine both average pathway length and the distribution of the different genes within the library. Pathway length can be estimated by analysis of insert size. Distribution of genes can be estimated by analysis of repeats in restriction patterns using enzymes that cut actinomycete DNA at relatively high frequency, such as SacII or NarI.
In the event that GENEREASSEMBLY™ does not prove to be suitable, alternative approaches to generate a pathway library can be explored. One such alternative is the use of restriction independent cloning systems, for example, the XI-CLONE™ system from Gene Therapy Systems, San Diego, CA. This system takes advantage of homologous recombination for directional cloning of linear DNA fragments at any position in a vector, and allows for the efficient addition and/or replacement of genes in existing constructs. To target the fragment to a specific position of a vector, the fragment to be inserted must contain homologous regions of ~50 bp, which can be added by means of extended PCR-primers. By using multiple fragments in each step and following the scheme in Figure 16, it is possible to construct a library similar to that using GENEREASSEMBLY™. A combination of these methods can be used: GENEREASSEMBLY™ to generate partial library fragments and XI-CLONE™ to combine the fragments into functional pathways. Pathway clones that result in the production of a novel glycosylated derivative can be recovered in E. coli. Plasmids isolated from E. coli clones will then be analyzed by restriction analysis and sequencing to identify the genes present in these pathways.
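The size of a pathway library built this way grows multiplicatively with the number of assembly steps, which is why specialized sub-libraries reduce the screening load. The sketch below illustrates the arithmetic. The number of steps is a hypothetical parameter (the text gives only the 2 to 4 homologues per step), the function names are invented, and the coverage estimate assumes that all pathways are equally represented, which the bias analysis described above is meant to test.

```python
from math import ceil, log, prod


def library_size(homologues_per_step):
    """Number of distinct pathways when one homologue is chosen per step."""
    return prod(homologues_per_step)


def size_range(n_steps, low=2, high=4):
    """Smallest and largest library for n_steps with low..high homologues each."""
    return low ** n_steps, high ** n_steps


def clones_for_coverage(size, probability=0.95):
    """Clones to screen so that a given pathway is sampled at least once with
    the stated probability, assuming uniform representation (an idealization)."""
    if size == 1:
        return 1
    return ceil(log(1.0 - probability) / log(1.0 - 1.0 / size))


print(library_size([2, 2, 2]))   # three steps, two homologues each
print(size_range(5))             # hypothetical five-step pathway
print(clones_for_coverage(8))    # the 8-member model library
```

For the eight-pathway model library this estimate suggests screening a few dozen clones, whereas a five-step library with four homologues per step would already require thousands.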
Example 11: Construction of specific deoxysugar pathways
Several GT genes have been chosen to explore their ability to glycosylate either spinosyn or anthracycline type substrates (see Table 8 and Table 9, below). To enhance the likelihood of glycosylation, these GTs can initially be supplied with their natural sugar substrate. Therefore, the pathways for the associated deoxysugars of these enzymes, listed in Table 7, can be re-constructed. To expedite construction, a bead-based ligation procedure as described for GENEREASSEMBLY™ can be used, in which only those fragments necessary for a particular pathway are supplied. To minimize cloning efforts, common functions, such as 2-deoxygenation or 3-aminotransferases, can be chosen from a single source. Table 7 shows specific pathways that can be reconstructed and used in the in vivo glycosylation systems and methods of the invention:
Figure imgf000141_0001
Example 12: Conversion system optimization
The number of possible combinations of deoxysugar pathways and GTs that can be generated and used in the in vivo glycosylation systems and methods of the invention is extensive. Initial optimization can be done using the well-characterized spinosyn system of the invention. The optimized conditions can then be applied to glycosylated anthracycline products to further validate the system.
Transfer of constructs: Allowing for the need for sporulation of Streptomyces, the sequential transfer of two constructs into the strain required approximately 6 weeks. Various deoxysugar pathway/GT combinations can be used, and screened for efficacy, by simultaneously transferring both constructs by intergeneric mating from E. coli. To establish proof of principle, pathway 1 and SpnG can be used, selecting the plasmids with apramycin and thiostrepton, respectively. Resulting exconjugants can be verified by analysis of conversion of the aglycone to M548-II. Based on the conjugation frequency for single constructs (1 per 1000 to 10000 recipients), it is conceivable that about 10 double exconjugants could be obtained by such double transfers. Since different plasmids may conjugate with different efficiencies, and have differential effects on sporulation, it will be crucial to determine that this procedure does not introduce any bias that could compromise the quality of the deoxysugar pathway library in the Streptomyces. To investigate this, the mini-library of the 6-deoxysugar pathways (1-4) and 2,6-dideoxysugar pathways can be transferred to the Streptomyces, and after spores have been prepared, SpnG can be introduced. Analysis of the conversion products of the aglycone, and comparison to the predicted results (see Table 6) and results noted above (i.e. the library transferred to a strain containing SpnG), will demonstrate whether preparing a deoxysugar pathway library in the Streptomyces in this manner is a suitable approach. Alternatively, the deoxysugar pathway library can be prepared in E. coli and a host strain prepared containing each GT of interest. Spores of these individual host strains can be used for introduction of the library. This should result in less bias, as the exconjugants grow as individual colonies on plates and do not compete against each other.
Culture conditions: Experiments have shown that conversion can be achieved in a 96 deep-well (2.2ml / well) format with 1 ml of medium. However, conversion ratios were poor (10 to 20%) compared to those obtained from tube or flask cultures (30% to 100%). A possible cause could be the different media-volume to vessel-volume ratio (1:2 vs. 1:5) (see above), resulting in differential aeration of the cultures. For automation of the process, a plate-based format is desirable. Optimization of conditions in a plate format can be performed using the mini-library described above. Lower media volumes in a 96 well format can be analyzed, as well as longer periods of incubation and increased speed of shaking. For Streptomyces cultures, this is limited by evaporation to a minimum volume of 700 µl and a maximum incubation of 7 days. If the 96-well format cannot be improved significantly, plates with a larger volume, such as 48-well (5ml / well) and 24-well (10ml / well) plates, can be used.
Sample preparation and chromatographic analysis: The current liquid / liquid extraction is not adaptable to a truly high throughput process in a plate format. Other options, such as evaporation of the medium, direct sample preparation by centrifugation, filtration, or solid-phase extraction, can be optimized by routine screening of alternatives to obtain low variation between samples, good product recovery and a low background.
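For the plate formats discussed under culture conditions above, the medium volume that would reproduce the 1:5 media-to-vessel ratio of the flask cultures is a one-line calculation. The sketch below tabulates it for the vessel volumes named in the text. It is a planning aid under the assumption that the volume ratio alone governs aeration, which the text offers only as a possible cause; the function names are hypothetical.

```python
# Vessel working volumes in ml, as given in the text.
VESSELS_ML = {
    "96 deep-well": 2.2,
    "48-well": 5.0,
    "24-well": 10.0,
    "250-ml flask": 250.0,
}


def media_volume(vessel_ml, parts_vessel_per_part_media=5.0):
    """Medium volume (ml) giving a 1:N media-to-vessel volume ratio."""
    return vessel_ml / parts_vessel_per_part_media


def ratio_denominator(vessel_ml, media_ml):
    """N in the 1:N media-to-vessel ratio for a given fill volume."""
    return vessel_ml / media_ml


for name, volume in VESSELS_ML.items():
    print(f"{name}: {media_volume(volume):.2f} ml medium for a 1:5 ratio")

# The 1 ml fill used in the 96 deep-well plate corresponds to roughly 1:2.
print(f"current 96 deep-well ratio: 1:{ratio_denominator(2.2, 1.0):.1f}")
```

For the 250-ml flask this reproduces the 50 ml culture volume reported earlier, while the 96 deep-well plate would need well under 1 ml, so any evaporation-imposed minimum volume has to be weighed against the target ratio.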
The current HPLC-UV-vis method for analysis of spinosyn derivatives consists of a linear gradient of H2O and acetonitrile, with a total runtime of 30 min. This method is suitable for analysis of the full spectrum of substrates and products (aglycone, 17-PSA, 9-PSA, spinosyn A and analogues), but only allows a throughput of 48 samples per day on a single instrument. The goal is to cut analysis time to less than 5 min / sample without losing sensitivity and resolution. This may require development of separate methods for each substrate to be analyzed.
Any conversion detected with HPLC-UV-vis can be analyzed using HPLC-MS. HPLC-MS data can be used to build a database containing retention times, mass-spectra and UV-vis-spectra. This data can then be used to prioritize conversion products for further analysis. Products showing different characteristics than those in the database can be scaled up and isolated for structure elucidation by NMR and testing of their biological activity. The TLC method can be used and optimized by routine screening of alternatives. This may allow handling of a large number of samples simultaneously and can be easily automated. The influence of differential glycosylation on Rf-values can be analyzed using the novel compounds of the invention.
Example 13: Construction of a library of spinosyn derivatives
In one aspect, a library of modified glycosylated spinosyn derivatives is generated to identify compounds with increased or modified insecticidal activity. This is achieved by increasing the number of deoxysugar pathway / GT combinations used to generate the derivatives. Optimal activity of the spinosyns is achieved only after methylation of the sugar moieties. In one aspect, a series of O-methyltransferase genes from the spinosyn pathway is added to the system. In one aspect, to further increase the spectrum of derivatives, derivatives of spinosyn containing modified sugars at both the 9- and 17-positions are generated.
Additional glycosyltransferases
A second spinosyn producing strain, Sac. pogona, is available from the ATCC (26). This strain was shown to produce spinosyn derivatives that contain neutral sugars instead of the D-forosamine. No sequence information is available for the spinosyn cluster from Sac. pogona. However, due to the similarities to the spinosyns from Sac. spinosa, it is expected that the cluster will be conserved at both the level of individual gene sequence and overall pathway organization. In one aspect, to clone and sequence the deoxysugar portion from Sac. pogona, a fosmid library is prepared and probed with the GT genes (spnG, spnP), deoxysugar genes (e.g. spnO, spnS) and genes thought to be involved in cyclization of the polyketide (spnF, J, L, M). In one aspect, hybridizing clones are mapped by restriction analysis, and suitable clones sequenced using a shotgun approach. In one aspect, deoxysugar biosynthetic genes and GT genes are identified and used for preparing spinosyn analogs. SpnP has recently been shown to transfer neutral sugars to the 17-PSA (28). In one aspect, both SpnP and the homologue from Sac. pogona are screened for their ability to transfer neutral sugars to the 17-PSA.
A number of macrolide biosynthetic gene clusters have been sequenced. From these sequences, about 20 GTs have been identified that have the potential to attach sugars to macrolide derivatives. Several of these enzymes show flexibility towards their substrates. A set of sequences was retrieved from Genbank and aligned to build a phylogenetic tree. Based on this analysis, select genes will be cloned and used for expression in the Streptomyces glycosylation platform of the invention. In one aspect, the enzymes are analyzed initially for activity by co-expression with reconstructed pathways of their native deoxysugar. These strains can be analyzed for conversion of the aglycone, 9-PSA, 17-PSA and spinosyn A. Any enzyme showing activity on spinosyns can be studied further by varying the deoxysugar substrates supplied (see below).
Table 8 lists genes encoding macrolide-modifying GTs that can be used in the in vivo glycosylation systems of the invention:
Figure imgf000144_0001
The erythromycin producer Sac. erythraea is capable of transferring a glucose moiety to the 17-position of the 17-PSA (28). Sac. erythraea is also able to transfer a glucose moiety to tylactone and avermectin. While the gene responsible for this transfer has not been identified, it is conceivable that this activity is normally associated with self-resistance of the organism to erythromycin (in addition to the methylation of 23S rRNA conferred by the ermE gene). A similar resistance mechanism has been demonstrated in the oleandomycin producer Streptomyces antibioticus. This mechanism of resistance involves an intracellular glucosyltransferase that transfers glucose to the 2"-OH of desosamine, coupled with an extracellular glucosidase that removes the sugar after excretion. Homologues of these genes have been found in several Streptomyces species. In one aspect, genes from Streptomyces, e.g., the oleandomycin producer S. antibioticus, are cloned and investigated for their ability to glycosylate spinosyn derivatives.
Analysis of the deoxysugar pathway library
In one aspect, pathway libraries to be constructed in this project (see above) are tested in conjunction with both SpnP and SpnG for conversion of the 17-PSA and aglycone. Any other GT that shows glycosylation of a spinosyn substrate can be assayed together with the deoxysugar pathway library. In one aspect, for those enzymes that show conversion ratios at levels less than 40%, a sublibrary of pathways closely related to the cognate substrate will be chosen and analyzed.
Methyltransferases
It is known that spinosyn derivatives lacking the O-methyl groups on the L-rhamnose are in general less active than their methylated counterparts. The methyltransferase genes spnI, H, and K were isolated by PCR amplification from Sac. spinosa genomic DNA. In one aspect, these genes are cloned into plasmid pUWL201. GENEREASSEMBLY™ is used to efficiently generate these constructs.
After transfer of these constructs to a Streptomyces, the resulting strains can be investigated for glycosylation and methylation using pathways 4 (6-deoxy-D-glucose), 1 (L-rhamnose), and 12 (L-digitoxose). In one aspect, up to 17 different methylated derivatives of each of the 17-PSA analogues and spinosyn A are generated and screened for biological activity. Thus, in one aspect, the glycosylated spinosyns of the invention are methylated at one of the 17 different possible sites, or at several of the 17 different possible sites, or at all of the 17 different possible sites.
Combining sugar variations
In one aspect, the compositions (the glycosylated natural products of the invention) are modified at a single glycosylation site, or are modified in a single glycosylation step. In another aspect, diversity is added to the compositions of the invention by combining modifications at both the 9- and the 17-position within a single compound. To accomplish this, a sequential feeding strategy can be employed. For this purpose, the product of the first glycosylation step can be produced at a larger scale and partially purified. This partially purified material carrying one modified sugar can then be used for a second feeding using a strain that carries out the second glycosylation step. This product can then be purified for further characterization. The advantage of this method is that it does not require any further modification of the strains. However, this sequential strategy is time consuming and requires a large amount of substrate due to the loss of compound during purification. In another aspect, the method for making a compound of the invention is done as a one-step process. This can require combining all genes necessary for the 9- and 17-glycosylation in one strain of a Streptomyces.
SpnG and SpnP can be introduced into strains that contain pathways required for the synthesis of the sugar moieties found at the 9- and 17-position (L-rhamnose, L-digitoxose, or 6-deoxy-D-glucose at the 9-position, and D-forosamine at the 17-position). Conversion of the aglycone to spinosyn derivatives by these strains can be assessed by routine procedures.
Activity testing of novel spinosyn derivatives
All novel spinosyn derivatives can be tested for their insecticidal activity using routine screens known in the art. For example, the nematode Caenorhabditis elegans and the mosquito Aedes aegypti can be used as target organisms. Both assays are available in a 96-well format and have been shown to be suitable for analyzing activity of spinosyn derivatives. Initial testing may require about 2 mg of pure compound for each derivative. Based on results of these screens, compounds can be produced at a larger scale for extensive activity profiling.
Glycosylation of anthracyclines
The glycosylation platforms of the invention are used to generate novel anthracycline derivatives. Initially, this requires cloning and expression of anthracycline resistance and GT genes in a Streptomyces. Then the deoxysugar pathway library can be used to generate novel glycosylated doxorubicin derivatives. The focus can be on generating di-glycosylated derivatives that have shown promising profiles for anticancer therapy.
Doxorubicin resistance in a Streptomyces
In addition to their anticancer activity, anthracyclines also possess potent antibacterial activity. The doxorubicin producer S. peucetius contains two resistance mechanisms, conferred by DrrAB and DrrC. DrrAB form an ABC-type export system, and DrrC is a UvrA-like DNA repair enzyme. Both confer doxorubicin resistance in S. lividans. The Streptomyces strain used in these studies contains a close homologue of DrrC, which may confer resistance as well. Resistance of this Streptomyces against doxorubicin can be determined. In addition, in one aspect, drrAB and drrC are cloned. In one aspect, the level of resistance of this Streptomyces expressing either or both is determined. If this does not result in sufficient levels of resistance, spontaneous doxorubicin-resistant mutants of this Streptomyces can be selected. In one aspect, toxicity levels for all substrates are determined. The level of resistance can determine the substrate concentration that can be used in the glycosylation experiments.
Chemical analysis of doxorubicin
In one aspect, commercially available anthracyclines, such as daunorubicin, doxorubicin and aclarubicin, are used in routine screening to determine HPLC methods suitable for analysis of doxorubicin derivatives of the invention. Various HPLC methods have been described in the literature. A method with an overall analysis time of less than 5 min/sample can be established. Detection limits can be determined.
Based on detection limits and substrate loading, the minimal detectable conversion ratios can be calculated. Several TLC methods for anthracyclines are available. Since TLC can be easily automated and has a short analysis time per sample, this procedure can be used as a primary screen.
Substrates for glycosylation
In one aspect, one or both sugars within the disaccharide of a composition of the invention (e.g., a glycosylated anthracycline, such as doxorubicin) are modified.
Thus, both a mono-glycosylated and a non-glycosylated substrate can be used.
Doxorubicin (e.g., from Sigma, St. Louis, MO) can be used as a mono-glycosylated substrate. In one aspect, aglycones are used as non-glycosylated substrate; they can be generated from commercially available anthracyclines, such as doxorubicin, daunorubicin and/or aclarubicin, by mild-acid hydrolysis using published procedures. Using an aclarubicin aglycone (similar to the rhodomycin aglycone) as substrate may not result in a doxorubicin derivative. However, the aclarubicin aglycone is an intermediate in doxorubicin biosynthesis and final conversion to the doxorubicin aglycone occurs after glycosylation. The genes involved in these late steps in doxorubicin biosynthesis (dnrKP, doxA) are well characterized and, in one aspect, are included in the systems of the invention to generate doxorubicin-type compounds.
Glycosyltransferase (GT) Genes
Anthracyclines are a diverse group of glycosylated polyketides. Glycosylation occurs at various positions of the anthracycline aglycone, and sugar chains containing up to five moieties exist. Unusual connections between sugar moieties or sugars and aglycones, such as ether bridges and C-C connections, are also found. Accordingly, in one aspect, the invention provides anthracycline aglycones containing up to five glycosylation moieties. In one aspect, diglycosylated doxorubicin derivatives are constructed. Therefore, GTs expected to be able to attach sugars to the 7-position (red arrow in Figure 3c) and to form a disaccharide are used. Publicly available anthracycline GTs can be used; for example, sequences for anthracycline GTs, their deoxysugar substrates, and their sources are listed in Table 9:
[Table 9: Figure imgf000148_0001]
Additional GTs can be identified by cloning, e.g., cloning the viriplanin cluster from Ampullariella regularis ATCC31417. The exact functions of only a few of these genes (e.g., dnmS, dnrH, dauH) are known. Based on amino acid sequence alignments, the enzymes listed above are predicted to be involved in the glycosylation steps indicated by the arrows in Figure 5. DnmS, RdmH, RhoG, and SnogE are predicted to act on the 7-position of the aglycone, whereas DauH, DnmH, and AknK are predicted to attach a second sugar to the growing saccharide chain.
Glycosylated doxorubicin derivatives
In one aspect, both of the sugar moieties are modified independently of each other. In one aspect, the results are combined to generate a truly combinatorial library of diglycosylated doxorubicin derivatives. Accordingly, in one aspect, the invention provides libraries of novel diglycosylated doxorubicin derivatives (as are libraries of novel diglycosylated natural product derivatives, e.g., spinosyn derivatives). In one aspect, a twofold approach is taken:
1. Direct glycosylation of doxorubicin, resulting immediately in diglycosides (and providing insight into the functions and flexibility of the enzymes used).
2. Glycosylation of anthracycline aglycones to generate novel mono-glycosylated derivatives as substrates for a second glycosylation.
Initial analysis can be performed providing each GT with its native deoxysugar substrate by co-expression of the relevant pathway (see Table 7). Based on predictions from sequence analysis, GTs can be assessed using either aglycones or doxorubicin, and the appropriate deoxysugar pathways listed in Table 9. These can be specifically reconstructed (see Table 7). In one aspect, GTs showing activity on any of the substrates are used in conjunction with the deoxysugar pathway library in an analogous fashion to the experiments described above.
Any mono-glycosylated products from this effort can be produced on a larger scale to provide substrates for attachment of a second sugar moiety. In one aspect, combinations of the two different sugar moieties are carried out using a sequential feeding strategy analogous to that described above.
Activity of novel glycosylated doxorubicin analogues
Diglycosidic derivatives of doxorubicin are predicted to have both higher efficacy and lower toxicity compared to doxorubicin. The initial evaluation of the efficacy of the novel diglycosidic doxorubicins of the invention can be done using routine screening, e.g., by cell-based assays. In one aspect, for analysis of cytotoxicity in mammalian cells, a wide range of cell lines, including human cancer cell lines, are used, as shown in Table 10:
[Table 10: Figures imgf000149_0001 and imgf000150_0001]
Several cell viability and proliferation assays have been established in 96- or 384-well formats, including an LDH assay, which measures cell membrane integrity using lactate dehydrogenase (LDH) activity, as well as tetrazolium dye (MTT and MTS) conversion assays that measure cell metabolic activity. These assays can be used to determine efficacy and spectrum of activity of novel derivatives of the invention compared to doxorubicin. Diglycosidic derivatives of the invention have also been shown to have activity against doxorubicin-resistant cell lines. Therefore, the activity of the derivatives of the invention is analyzed towards doxorubicin-resistant isolates of the cell lines listed in Table 10. To determine general cytotoxicity, human hepatocytes and kidney cells can be used. Results from these cell-based screens can be used to prioritize compounds of the invention. Those compounds showing improved efficacy in the cell-based assay, as compared to doxorubicin, are further analyzed in animal models, e.g., using xenografted mice carrying human tumors. In one aspect, for the initial analysis, Daudi solid tumors can be established in CB-17 SCID mice and serially passaged 3 times before use in the study. Six- to eight-week-old mice are injected subcutaneously with 1 x 10^6 cells in both flanks. Mice can be monitored for tumor growth at 2 to 3 weeks post tumor inoculation and divided into groups of 10 mice. Mice are treated intravenously with novel doxorubicin analogs at both a high and a low dose (to be determined by IC50 values from in vitro cytotoxicity assays). A positive control group treated with doxorubicin can be included. Mice can be monitored for tumor growth and progression. To determine possible side effects, weight can be monitored as a general parameter of health. In one aspect, tumor growth inhibition and long-term survival are chosen as the end points for the study.
Compounds showing good activity can be further tested in xenograft tumor models established in additional cell lines, including doxorubicin-resistant tumor cell lines, to determine if the doxorubicin derivatives are effective in inhibiting drug-resistant tumor growth.
Example 14: 6-deoxy-D-glucose-17-pseudoaglycone, Compound M548
The invention provides a novel compound, 6-deoxy-D-glucose-17-pseudoaglycone. Compound M548, or 6-deoxy-D-glucose-17-pseudoaglycone, whose structure is illustrated in Figure 19b, was purified from an engineered Streptomyces strain (S. diversa) after adding the spinosyn aglycone to the culture broth. The strain expressed the rhamnosyl-transferase spnG from Saccharopolyspora spinosa ATCC49460 cloned under the control of the ermEp* promoter in the vector pUWL201, as illustrated in Figure 17. The strain also expressed a deoxysugar pathway, expressed by genes as cloned into vector pAT6 (pAT6-gtt21-gdh11-epi12-kre15), illustrated as pathway #4, shown in Figure 18, comprising the gdh and gtt genes from S. spinosa and the ORFs SMD08981 (epi) and SMD08982 (kre) from a Streptomyces under control of the ermEp* promoter in pAT6. This engineered Streptomyces strain (S. diversa) was first grown in MYM-tap with 12.5 mg/ml thiostrepton and 50 mg/ml apramycin with shaking (250 rpm) at 30°C. A 10% vol/vol inoculum was transferred to SCM with 12.5 mg/ml thiostrepton and 50 mg/ml apramycin to select for the vectors and 100 mg/ml spinosyn A aglycone (whose structure is illustrated in Figure 19a) as substrate for glycosylation. After 4 days of culturing at 30°C with shaking (250 rpm), the culture was extracted with ethyl acetate and analyzed by HPLC-UV-vis and HPLC-MS. Production of a peak with a molecular weight of 548.3 and the same UV-vis spectrum as the spinosyn A aglycone was detected. This compound was named M548 and purified by HPLC from a larger volume, and the structure was elucidated by NMR spectroscopy.
These data showed that the compound M548 contains a 6-deoxy-D-glucose moiety attached to the 9-position of the spinosyn aglycone, as illustrated in Figure 19b.
Example 15: L-rhamnosyl-17-pseudo-aglycone, Compound M548-II
The invention provides a novel compound, L-rhamnosyl-17-pseudoaglycone. Compound M548-II, or L-rhamnosyl-17-pseudo-aglycone, whose structure is illustrated in Figure 20, was initially discovered in an engineered Streptomyces strain (S. diversa) expressing the rhamnosyl-transferase SpnG from S. spinosa, as illustrated in Figure 17, and an engineered deoxysugar pathway, as cloned into the vector pAT6-gtt21-gdh11-epi12-kre9 (see pathway #3, Figure 21), after feeding of the spinosyn A aglycone under the same conditions as described above. It was later found at a higher level from a strain expressing pathway #9, cloned into vector pAT6-gtt-gdh-e8-k9-tdh-tkr (Figure 22), and subsequently purified from this strain. HPLC-UV-vis showed an identical spectrum to the spinosyn A aglycone and LC-MS showed a molecular weight of 548.3. This compound was named M548-II and the structure was elucidated by NMR (Figure 20).
Example 16: L-digitoxosyl-17-pseudo-aglycone, Compound M532
The invention provides a novel compound, L-digitoxosyl-17-pseudo-aglycone. Compound M532, or L-digitoxosyl-17-pseudo-aglycone, whose structure is illustrated in Figure 24, was discovered in an engineered Streptomyces strain (S. diversa) expressing the rhamnosyl-transferase SpnG (Figure 17) from S. spinosa and an engineered deoxysugar pathway, pathway #12, as cloned into the vector pAT6-gtt-gdh-e12-k15-tdh-tkr (Figure 23), after feeding of the spinosyn A aglycone under the same conditions as described above. HPLC-UV-vis showed an identical spectrum to the spinosyn A aglycone and LC-MS showed a molecular weight of 532.4. This compound was named M532. It was isolated from a larger scale culture and purified by HPLC, and the structure was elucidated by NMR (Figure 24).
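As a rough consistency check on the molecular weights reported in Examples 14 to 16, the following sketch estimates the monoisotopic mass of each glycoside as aglycone plus sugar minus one water, the water being lost when the glycosidic bond forms. The aglycone formula C24H34O5 is an assumption that is not stated in this document, and the function and variable names are illustrative only; the sugar formulas are those of a 6-deoxyhexose (C6H12O5) and a 2,6-dideoxyhexose (C6H12O4).

```python
# Monoisotopic masses of the most abundant isotopes, in unified atomic mass units.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


WATER = {"H": 2, "O": 1}

# ASSUMPTION: spinosyn A aglycone taken as C24H34O5 (not stated in the source).
AGLYCONE = {"C": 24, "H": 34, "O": 5}

SUGARS = {
    "6-deoxy-D-glucose": {"C": 6, "H": 12, "O": 5},  # 6-deoxyhexose
    "L-rhamnose": {"C": 6, "H": 12, "O": 5},         # 6-deoxyhexose (isomer)
    "L-digitoxose": {"C": 6, "H": 12, "O": 4},       # 2,6-dideoxyhexose
}


def glycoside_mass(sugar):
    """Mass of aglycone + sugar - H2O (one glycosidic bond formed)."""
    return mass(AGLYCONE) + mass(SUGARS[sugar]) - mass(WATER)


for name in SUGARS:
    print(f"{name}: {glycoside_mass(name):.1f}")
```

Under these assumptions the two 6-deoxyhexose glycosides come out at the same mass, in line with the shared value of 548.3 reported for M548 and M548-II, and the digitoxosyl product is lighter by one oxygen (about 16 units), close to the 532.4 reported for M532.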
Example 17: 9-6-deoxy-D-glucosyl-spinosyn A, Compound M689
The invention provides a novel compound, 9-6-deoxy-D-glucosyl-spinosyn A. The engineered Streptomyces strain (S. diversa) expressing SpnG and the 6-deoxy-D-glucose biosynthetic pathway (pathway #4, Figure 18, see above) was used to convert the spinosyn A 9-pseudo-aglycone (9-PSA, whose structure is illustrated in Figure 25a) to the 9-6-deoxy-D-glucosyl-spinosyn A of the invention, whose structure is illustrated in Figure 25b. For this purpose the 9-PSA was added to a culture of the strain, extracted and analyzed by HPLC and HPLC-MS. Production of a new compound with a molecular weight of 689.7 could be shown. This compound was designated M689.
Example 18: 9-L-rhamnosyl-spinosyn A, Compound M689-II
The invention provides a novel compound, 9-L-rhamnosyl-spinosyn A, Compound M689-II, whose structure is illustrated in Figure 26. The engineered Streptomyces strain (S. diversa) expressing SpnG and the L-rhamnose biosynthetic pathway (pathway #9, Figure 22, see above) was used to convert the 9-PSA (Figure 25a) to the 9-L-rhamnosyl-spinosyn A of the invention, whose structure is illustrated in Figure 26. For this purpose the 9-PSA was added to a culture of the strain, extracted and analyzed by HPLC and HPLC-MS. Production of a new compound with a molecular weight of 689.7 could be shown. This compound was designated M689-II. The structures of both M689 and M673 (see Example 19) were confirmed by NMR after purification of larger quantities from the strains described herein.
Example 19: 9-L-digitoxosyl-spinosyn A, Compound M673
The invention provides a novel compound, 9-L-digitoxosyl-spinosyn A, Compound M673, whose structure is illustrated in Figure 27. The engineered Streptomyces strain expressing SpnG and the L-digitoxose biosynthetic pathway (pathway #12, Figure 23, see above) was used to convert the 9-PSA (Figure 25a) to the 9-L-digitoxosyl-spinosyn A of the invention, whose structure is illustrated in Figure 27. For this purpose the 9-PSA was added to a culture of the strain, extracted and analyzed by HPLC and HPLC-MS. Production of a new compound with a molecular weight of 673.7 could be shown. This compound was designated M673.
Example 20: Activity of novel compounds
Compounds M548, M548-II, and M532 were inactive. Activity was tested by an injection assay using beet armyworms as the test organisms. Six larvae per test were all run at 10 µg/larva in 0.5 µl of DMSO. Unless otherwise indicated, data are reported as percent of injected larvae showing symptoms of intoxication.
Compound    1 hr    6 hr    24 hr   48 hr   48 hr mortality
DMSO        0%      0%      0%      0%      0%
M689        33%     17%     0%      0%      0%
M689-II     100%    100%    67%     33%     17%
M673        100%    100%    100%    100%    17%
For the best compound, M673, a direct comparison run-down assay with spinosad, a mixture of spinosyn A and D, was performed. Six larvae per test were all run at the indicated amount per larva in 0.5 µl of DMSO. Unless otherwise indicated, data are reported as percent of injected larvae showing symptoms of intoxication.
                    Symptoms
Compound            1 hr    6 hr    24 hr   48 hr   48 hr mortality
DMSO                0%      0%      0%      0%      0%
M673 10 µg          67%     83%     33%     0%      0%
M673 1 µg           17%     0%      0%      0%      0%
M673 0.1 µg         17%     0%      0%      0%      0%
Spinosad 10 µg      100%    100%    100%    100%    100%
Spinosad 1 µg       100%    100%    100%    100%    83%
Spinosad 0.1 µg     100%    100%    100%    67%     0%
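Because each test used six larvae, every percentage in the activity data above corresponds to a whole number of larvae out of six. A minimal sketch of that arithmetic follows; the function name and the rounding to the nearest whole percent are assumptions, since the source does not state how the values were rounded.

```python
def percent_affected(count, total=6):
    """Percent of injected larvae affected, rounded to a whole percent.

    `total` defaults to the six larvae per test described in the text.
    """
    if not 0 <= count <= total:
        raise ValueError("count must be between 0 and total")
    return round(100 * count / total)


# Every possible outcome for a group of six larvae.
for n in range(7):
    print(f"{n} of 6 larvae: {percent_affected(n)}%")
```

Read this way, a reported 17% is one larva of six, 33% is two, 67% is four, and 83% is five, which is worth keeping in mind when comparing rows that differ by a single animal.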
Example 21: Deoxysugar biosynthesis in S. diversa
Reconstructed L-rhamnose pathway
Production of M548-II and M689-II in S. diversa, after feeding the aglycone and the 9-pseudo-aglycone, respectively, was confirmed by HPLC and HPLC-MS analysis.
Investigation of 6-deoxysugar biosynthesis in S. diversa
To elucidate the minimal gene set required for biosynthesis of 6-deoxy-D-glucose, and the reasons for the various levels of 6-deoxysugar after expression of reconstructed 6-deoxysugar pathways, several incomplete pathways were constructed by removing genes from pathways #2, #3, and #4. These were expressed in S. diversa DS10 from plasmid pAT6 (see Figure 28, map of pathway #1 in pAT6) together with SpnG from pUWL201 and analyzed for conversion of the aglycone. The inserts of complete and incomplete 6-deoxysugar pathways are illustrated in Figure 29. Conversion of the aglycone by S. diversa expressing the complete or incomplete pathways together with SpnG was investigated. Production of the two different deoxysugars 6-deoxy-D-glucose and L-rhamnose was monitored as relative production of M548 and M548-II. The results are illustrated in Figure 30 as averages from 4 individual clones; relative conversion of aglycone (Missing to 100%: unconverted aglycone and M-2).
Vector construction
pAT6 was constructed by insertion of ermEp* from pUWL201 into pSET152 to generate a new vector suitable for over-expression of genes and pathways in Streptomyces. All pieces used are from publicly available plasmids.
These data demonstrate:
- Over-expression of gtt and gdh genes is sufficient for biosynthesis of 6-deoxy-D-glucose. S. diversa must contain a ketoreductase that is able to catalyze the reduction of 4-keto-6-deoxy-D-glucose to 6-deoxy-D-glucose.
- L-rhamnose biosynthesis requires over-expression of an epi gene. It appears that the 4-keto-6-deoxy-D-glucose-5-epimerase from S. diversa also exhibits some level of 4-keto-6-deoxy-D-glucose-3,5-epimerase activity, resulting in L-rhamnose production from pathways 2, 3, and 4.
- Co-expression of the L-rhamnose specific genes results in redirection of flux from 6-deoxy-D-glucose towards L-rhamnose. This is indicated by the quite different ratio of M548:M548-II production from pathway 1 compared to all other pathways.
A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.

Claims

WHAT IS CLAIMED IS: 1. A glycosylation system comprising (i) an in vivo glycosylation system comprising an engineered host cell comprising a heterologous glycosyltransferase and/or a heterologous deoxysugar pathway, or, (ii) an in vitro glycosylation system comprising a host cell extract comprising a heterologous glycosyltransferase and/or a heterologous deoxysugar pathway.
2. The glycosylation system of claim 1, wherein the host cell is an Actinomycetales.
3. The glycosylation system of claim 2, wherein the Actinomycetales is a Streptomyces.
4. The glycosylation system of claim 3, wherein the Streptomyces is selected from the group Streptomyces coelicolor, Streptomyces peucetius, Streptomyces avermitilis, Streptomyces aureofaciens, Streptomyces kasugensis, Streptomyces lavenulae, Streptomyces lipmanii, Streptomyces lividans, Streptomyces ambofaciens, Streptomyces violaceoniger, Streptomyces thermotolerans, Streptomyces rimosus, Streptomyces glaucescens, Streptomyces roseofulvus, Streptomyces cinnamonensis, Streptomyces curacoi, Streptomyces fradiae, Streptomyces griseus, Streptomyces griseofuscus, Streptomyces longisporoflavus, Streptomyces hygroscopicus, Streptomyces lasaliensis, Streptomyces venezuelae, Streptomyces antibioticus, Streptomyces albus, Streptomyces tsukubaensis, Streptomyces galilaeus or Streptomyces diversa.
5. The glycosylation system of claim 1, wherein the host cell is an actinomycetes selected from the family Micromonosporaceae, or the genus Actinomyces, Actinomadura or Nocardia.
6. The glycosylation system of claim 1, wherein the host cell or cell extract therefrom comprises an activated or overexpressed endogenous glycosyltransferase and/or deoxysugar pathway gene or enzyme.
7. The glycosylation system of claim 1, wherein the glycosyltransferase is a recombinant enzyme and/or the deoxysugar pathway comprises recombinant enzymes.
8. The glycosylation system of claim 1, wherein the glycosyltransferase is a rhamnosyl-transferase, an anthracycline glycosyltransferase, desosaminyltransferase, mycarosyltransferase, desosaminyltransferase, megosaminyltransferase, oleandrosyltransferase, olivosyl-transferase, mycaminosyltransferase, deoxyallose transferase, forosaminyltransferase, mannosyltransferase, daunosaminyltransferase, rhodinosyltransferase, quinovosyltransferase, or a macrolide glycosyltransferase.
9. The glycosylation system of claim 1, wherein the deoxysugar pathway comprises a rhamnose, a forosamine, mycarose, mycaminose, desosaminose, megosaminose, oleandrosose, olivosose, deoxyallose, mannose, daunosaminose, rhodinose, quinovose, and/or an L-digitoxose biosynthetic pathway.
10. The glycosylation system of claim 1 wherein the host cell or cell extract therefrom comprises an inactivated, defective or inhibited endogenous glycosyltransferase and/or deoxysugar pathway gene or enzyme.
11. The method of claim 10 wherein the endogenous gene is inactivated by gene disruption, antisense, inhibitory RNA (iRNA), a regulatory mutation or a structural gene mutation.
12. The method of claim 10 wherein the host cell or cell extract therefrom comprises two, three or four inactivated, defective or inhibited or activated endogenous glycosyltransferases and/or deoxysugar pathway enzymes.
13. The glycosylation system of claim 10 wherein the host or cell extract comprises an inactivated, defective or inhibited endogenous glycosyltransferase and/or deoxysugar pathway gene or enzyme, thereby providing an aglycone, pseudoaglycone, unnatural sugar or modified sugar compared to wild-type cell or extract.
14. The method of claim 1, wherein the engineered host cell or cell extract comprises at least two heterologous glycosyltransferases and/or at least two heterologous deoxysugar pathways.
15. The method of claim 1, wherein at least one pathway gene or enzyme is provided by and endogenous to the host cell.
16. The method of claim 15, wherein two, three or four pathway genes or enzymes are provided by and endogenous to the host cell.
17. The method of claim 1, wherein a glycosyltransferase activity is provided by and endogenous to the host cell.
18. The method of claim 1 wherein the glycosyltransferase transfers a deoxy-glucose, -rhamnose, -digitoxose, -forosamine, -mycarose, -mycaminose, -desosaminose, -megosaminose, -oleandrosose, -olivosose, -deoxyallose, -mannose, -daunosaminose, -rhodinose, -quinovose, and/or their D- or L-forms.
19. A method for making a glycosylated natural product comprising the following steps (a) providing an in vivo or in vitro glycosylation system comprising an engineered host or a cell extract therefrom comprising a heterologous glycosyltransferase and a heterologous deoxysugar pathway; (b) providing natural product; and (c) adding the natural product to the in vivo or in vitro glycosylation system, thereby glycosylating the natural product.
20. The method of claim 19, wherein the natural product is either added exogenously or provided in vivo or in vitro by expressing biosynthetic genes for the natural product.
21. The glycosylation system of claims 1 or 19, wherein the deoxysugar pathway comprises an L- or a D-sugar biosynthetic pathway.
22. The method of claim 19, wherein the natural product comprises an aglycone, a pseudoaglycone or a macrolide.
23. The method of claim 22, wherein the pseudoaglycone comprises a 9- or 17-pseudoaglycone.
24. The method of claim 19, wherein the natural product comprises an aglycone or pseudoaglycone of a polyketide, or a peptide or a mixed polyketide-peptide.
25. The method of claim 24, wherein the aglycone or pseudoaglycone comprises an aglycone or pseudoaglycone of a macrolide.
26. The method of claim 25, wherein the macrolide aglycone or pseudoaglycone comprises an aglycone or pseudoaglycone of a spinosyn, an erythromycin, a rifampicin, idarubicin, epirubicin, a daunorubicin, a mithramycin, a rapamycin, FK520, FK506, an amphotericin, a tylosin, oleandomycin, rifamycin, immunomycin, narbomycin, pikromycin, spiramycin, dirithromycin, clarithromycin, troleandomycin, azithromycin or an avermectin.
27. The method of claim 19, wherein the natural product is an aglycone or pseudoaglycone of a spinosyn.
28. The method of claim 19, wherein the natural product comprises a polyketide, peptide or a mixed polyketide-peptide.
29. The method of claim 28, wherein the polyketide comprises a macrolide.
30. The method of claim 29, wherein the macrolide comprises a spinosyn, an erythromycin, a rifampicin, idarubicin, epirubicin, a daunorubicin, a mithramycin, a rapamycin, FK520, FK506, an amphotericin, a tylosin, oleandomycin, rifamycin, immunomycin, narbomycin, pikromycin, spiramycin, dirithromycin, clarithromycin, troleandomycin, azithromycin or an avermectin.
31. A spinosyn compound having the formula of 9-6-deoxy-D-glucosyl-spinosyn, a 9-L-rhamnosyl-spinosyn, a 9-L-digitoxosyl-spinosyn, L-rhamnosyl-17-pseudoaglycone spinosyn, 6-deoxy-beta-D-glucose-17-pseudoaglycone spinosyn, D-quinovose-17-pseudoaglycone spinosyn, or L-digitoxosyl-17-pseudoaglycone spinosyn.
32. A compound of claim 31 wherein the spinosyn is spinosyn A, spinosyn D, or a 21-butenyl spinosyn.
33. A compound of claim 31 having the formula of 9-6-deoxy-D-glucosyl-spinosyn A, 9-L-rhamnosyl-spinosyn A, 9-L-digitoxosyl-spinosyn A, L-rhamnosyl-17-pseudoaglycone spinosyn A, 6-deoxy-beta-D-glucose-17-pseudoaglycone spinosyn A, D-quinovose-17-pseudoaglycone spinosyn A, or L-digitoxosyl-17-pseudoaglycone spinosyn A.
34. An insecticide comprising a compound having a formula comprising a 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudo-aglycone, 6-deoxy-D-glucose-17-pseudoaglycone, D-quinovose-17-pseudoaglycone spinosyn, or L-digitoxosyl-17-pseudoaglycone spinosyn or a combination thereof.
35. A disinfectant comprising a compound having a formula comprising a 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudo-aglycone, 6-deoxy-D-glucose-17-pseudoaglycone, D-quinovose-17-pseudoaglycone spinosyn, or L-digitoxosyl-17-pseudoaglycone spinosyn or a combination thereof.
36. A pharmaceutical composition comprising a compound having a formula comprising a 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudo-aglycone, 6-deoxy-D-glucose-17-pseudoaglycone, D-quinovose-17-pseudoaglycone spinosyn, or L-digitoxosyl-17-pseudoaglycone spinosyn or a combination thereof.
37. A method of preventing or treating infection in a cell comprising application of an effective amount of a composition comprising a 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudoaglycone, 6-deoxy-D-glucose-17-pseudoaglycone, D-quinovose-17-pseudoaglycone spinosyn, or L-digitoxosyl-17-pseudoaglycone spinosyn or a combination thereof.
38. The method of claim 37, wherein the cell is a plant cell or animal cell.
39. The method of claim 38, wherein the plant cell is from the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panieum, Pannesetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna or Zea.
40. The method of claim 38, wherein the plant is an angiosperm or a gymnosperm.
41. The method of claim 38, wherein the plant is a monocot or a dicot.
42. A kit comprising a 9-6-deoxy-D-glucosyl-spinosyn, 9-L-rhamnosyl-spinosyn, 9-L-digitoxosyl-spinosyn, L-digitoxosyl-17-pseudo-aglycone, 6-deoxy-D-glucose-17-pseudoaglycone, D-quinovose-17-pseudoaglycone spinosyn, or L-digitoxosyl-17-pseudoaglycone spinosyn or a combination thereof.
43. The method of claims 1 or 19, wherein the heterologous deoxysugar pathway and/or glycosyltransferase comprises an enzyme having a polypeptide sequence as set forth in SEQ ID NO:2; SEQ ID NO:4; SEQ ID NO:6; SEQ ID NO:8; SEQ ID NO:10; SEQ ID NO:12; SEQ ID NO:14; SEQ ID NO:16; SEQ ID NO:18; SEQ ID NO:20; SEQ ID NO:22; SEQ ID NO:24; SEQ ID NO:26; SEQ ID NO:28; SEQ ID NO:30; SEQ ID NO:32; SEQ ID NO:33; SEQ ID NO:35; SEQ ID NO:36; SEQ ID NO:37; SEQ ID NO:38; SEQ ID NO:40; SEQ ID NO:41; SEQ ID NO:42; SEQ ID NO:43; SEQ ID NO:45; SEQ ID NO:46; SEQ ID NO:47; SEQ ID NO:49; SEQ ID NO:50; SEQ ID NO:51; SEQ ID NO:53; SEQ ID NO:54; SEQ ID NO:55; SEQ ID NO:56; SEQ ID NO:58; SEQ ID NO:59; SEQ ID NO:60; SEQ ID NO:62; SEQ ID NO:63; SEQ ID NO:64; SEQ ID NO:65; SEQ ID NO:67; SEQ ID NO:68; SEQ ID NO:69; SEQ ID NO:71; SEQ ID NO:72; SEQ ID NO:73; SEQ ID NO:74; SEQ ID NO:76; SEQ ID NO:77; SEQ ID NO:78; SEQ ID NO:79; SEQ ID NO:80; SEQ ID NO:81; SEQ ID NO:83; SEQ ID NO:84; SEQ ID NO:85; SEQ ID NO:86; SEQ ID NO:87; SEQ ID NO:88; SEQ ID NO:90; SEQ ID NO:91; SEQ ID NO:92; SEQ ID NO:93; SEQ ID NO:94; SEQ ID NO:95; SEQ ID NO:97; SEQ ID NO:98; SEQ ID NO:99; SEQ ID NO:100; SEQ ID NO:101; SEQ ID NO:102; SEQ ID NO:104; SEQ ID NO:105; SEQ ID NO:106; SEQ ID NO:107; SEQ ID NO:108; SEQ ID NO:109; SEQ ID NO:111; SEQ ID NO:112; SEQ ID NO:113; SEQ ID NO:114; SEQ ID NO:115; SEQ ID NO:116; SEQ ID NO:117; SEQ ID NO:119; SEQ ID NO:120; SEQ ID NO:121; SEQ ID NO:122; SEQ ID NO:123; SEQ ID NO:124; SEQ ID NO:126; SEQ ID NO:127; SEQ ID NO:128; SEQ ID NO:129; SEQ ID NO:130; SEQ ID NO:131; SEQ ID NO:133; SEQ ID NO:134; SEQ ID NO:135; SEQ ID NO:136; SEQ ID NO:137; SEQ ID NO:138; SEQ ID NO:140; SEQ ID NO:141; SEQ ID NO:143; SEQ ID NO:144; SEQ ID NO:145; SEQ ID NO:147; SEQ ID NO:148 or SEQ ID NO:149 or a combination thereof.
44. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31; SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142 or SEQ ID NO:146.
45. The isolated or recombinant nucleic acid of claim 44, wherein the sequence identity is over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150 or more residues, or the full length of a gene or a transcript.
46. The isolated or recombinant nucleic acid of claim 44, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by a visual inspection.
47. The isolated or recombinant nucleic acid of claim 44, wherein the sequence comparison algorithm is a BLAST version 2.2.2 algorithm where a filtering setting is set to blastall -p blastn -d "nr patnt" -F F, and all other options are set to default.
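The identity limitations of claims 44 to 47 can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (for example by BLAST) with gaps written as '-'. The function names and the rule that a gap opposite a residue counts as a mismatch are assumptions made for illustration; they are not taken from the claims or from the BLAST algorithm.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where both sequences carry a gap are skipped; a gap opposite
    a residue is counted as a mismatch (an illustrative convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def best_window_identity(aligned_a: str, aligned_b: str, region: int) -> float:
    """Highest percent identity found in any window of `region` columns.

    Raises ValueError if a window consists only of gap-gap columns.
    """
    if region <= 0 or region > len(aligned_a):
        raise ValueError("region must be between 1 and the alignment length")
    return max(
        percent_identity(aligned_a[i:i + region], aligned_b[i:i + region])
        for i in range(len(aligned_a) - region + 1)
    )
```

A candidate would satisfy a "at least 50% identity over a region of at least about 100 residues" limitation, under these assumptions, when `best_window_identity(a, b, 100) >= 50.0`.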
48. The isolated or recombinant nucleic acid of claim 44, wherein the nucleic acid sequence encodes a polypeptide having a sequence as set forth in SEQ ID NO:2; SEQ ID NO:4; SEQ ID NO:6; SEQ ID NO:8; SEQ ID NO:10; SEQ ID NO:12; SEQ ID NO:14; SEQ ID NO:16; SEQ ID NO:18; SEQ ID NO:20; SEQ ID NO:22; SEQ ID NO:24; SEQ ID NO:26; SEQ ID NO:28; SEQ ID NO:30; SEQ ID NO:32; SEQ ID NO:33; SEQ ID NO:35; SEQ ID NO:36; SEQ ID NO:37; SEQ ID NO:38; SEQ ID NO:40; SEQ ID NO:41; SEQ ID NO:42; SEQ ID NO:43; SEQ ID NO:45; SEQ ID NO:46; SEQ ID NO:47; SEQ ID NO:49; SEQ ID NO:50; SEQ ID NO:51; SEQ ID NO:53; SEQ ID NO:54; SEQ ID NO:55; SEQ ID NO:56; SEQ ID NO:58; SEQ ID NO:59; SEQ ID NO:60; SEQ ID NO:62; SEQ ID NO:63; SEQ ID NO:64; SEQ ID NO:65; SEQ ID NO:67; SEQ ID NO:68; SEQ ID NO:69; SEQ ID NO:71; SEQ ID NO:72; SEQ ID NO:73; SEQ ID NO:74; SEQ ID NO:76; SEQ ID NO:77; SEQ ID NO:78; SEQ ID NO:79; SEQ ID NO:80; SEQ ID NO:81; SEQ ID NO:83; SEQ ID NO:84; SEQ ID NO:85; SEQ ID NO:86; SEQ ID NO:87; SEQ ID NO:88; SEQ ID NO:90; SEQ ID NO:91; SEQ ID NO:92; SEQ ID NO:93; SEQ ID NO:94; SEQ ID NO:95; SEQ ID NO:97; SEQ ID NO:98; SEQ ID NO:99; SEQ ID NO:100; SEQ ID NO:101; SEQ ID NO:102; SEQ ID NO:104; SEQ ID NO:105; SEQ ID NO:106; SEQ ID NO:107; SEQ ID NO:108; SEQ ID NO:109; SEQ ID NO:111; SEQ ID NO:112; SEQ ID NO:113; SEQ ID NO:114; SEQ ID NO:115; SEQ ID NO:116; SEQ ID NO:117; SEQ ID NO:119; SEQ ID NO:120; SEQ ID NO:121; SEQ ID NO:122; SEQ ID NO:123; SEQ ID NO:124; SEQ ID NO:126; SEQ ID NO:127; SEQ ID NO:128; SEQ ID NO:129; SEQ ID NO:130; SEQ ID NO:131; SEQ ID NO:133; SEQ ID NO:134; SEQ ID NO:135; SEQ ID NO:136; SEQ ID NO:137; SEQ ID NO:138; SEQ ID NO:140; SEQ ID NO:141; SEQ ID NO:143; SEQ ID NO:144; SEQ ID NO:145; SEQ ID NO:147; SEQ ID NO:148 or SEQ ID NO:149.
49. The isolated or recombinant nucleic acid of claim 44, wherein the nucleic acid sequence encodes a polypeptide having a glycosyl transferase, a methyltransferase, an aminotransferase, a 3,4-dehydratase, a 3-keto reductase, 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, or an O-methyl transferase activity.
50. The isolated or recombinant nucleic acid of claim 44, wherein the nucleic acid sequence encodes a polypeptide having a thermostable or a thermotolerant activity.
51. The isolated or recombinant nucleic acid of claim 50, wherein the polypeptide can retain an activity under conditions comprising a temperature range of between about 1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, between about 37°C to about 95°C, between about 55°C to about 85°C, between about 70°C to about 75°C, or between about 90°C to about 95°C, or more, or, retain an activity after exposure to conditions comprising a temperature range of between about 1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, between about 37°C to about 95°C, between about 55°C to about 85°C, between about 70°C to about 75°C, or between about 90°C to about 95°C, or more.
52. The isolated or recombinant nucleic acid of claim 44, wherein the nucleic acid sequence encodes a polypeptide having improved expression in a host cell, improved enzymatic activity, or a different substrate specificity than wild type.
53. An isolated or recombinant nucleic acid comprising a sequence that hybridizes under stringent conditions to a nucleic acid comprising SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142 or SEQ ID NO:146.
54. The isolated or recombinant nucleic acid of claim 53, wherein the nucleic acid is at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000 or more residues in length or the full length of the gene or transcript.
55. The isolated or recombinant nucleic acid of claim 53, wherein the stringent conditions include a wash step comprising a wash in 0.2X SSC at a temperature of about 65°C for about 15 minutes.
56. A nucleic acid probe for identifying a nucleic acid encoding a polypeptide wherein the probe comprises at least 10 consecutive bases of a sequence comprising SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142 or SEQ ID NO:146.
57. An amplification primer pair for amplifying a nucleic acid, wherein the primer pair is capable of amplifying a nucleic acid comprising a sequence as set forth in claim 44 or claim 56, or a subsequence thereof.
58. The amplification primer pair of claim 57, wherein a member of the amplification primer sequence pair comprises an oligonucleotide comprising at least about 10 to 50 consecutive bases of the sequence, or, about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more consecutive bases of the sequence.
59. An amplification primer pair, wherein the primer pair comprises a first member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more residues of SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142 or SEQ ID NO:146.
60. The amplification primer pair of claim 57 or claim 59, wherein the first member sequence comprises a sequence selected from the polypeptide-encoding region, its complement, or their degenerate sequences.
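As a rough illustration of claims 58 to 60, the sketch below derives a primer pair from a template. The first member is the first (5') residues of the template, as claim 59 recites. Taking the second member as the reverse complement of the last residues is an assumption added here for illustration; the claims as reproduced do not define the second member. The function names and the example template are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primer_pair(template: str, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers of `length` residues each.

    forward: the first (5') `length` residues of the template.
    reverse: reverse complement of the last `length` residues
             (an illustrative assumption, not recited in the claims).
    """
    if not 12 <= length <= len(template):
        raise ValueError("length must be at least 12 and fit the template")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


# Hypothetical 27-nt template, not one of the SEQ ID NO sequences.
fwd, rev = design_primer_pair("ATGGCCAAGCTTGGATCCGAATTCTGA", length=12)
```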
61. The amplification primer pair of claim 60 wherein the sequence is codon biased based on codon usage or on GC content of an Actinomyces.
62. The amplification primer pair of claim 61, wherein the GC content is about 65 to 75%.
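The GC-content limitation of claim 62 can be checked with a short calculation. This is a minimal sketch; the function names are hypothetical, and treating the "about 65 to 75%" range as inclusive bounds is an assumption.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def in_gc_window(seq: str, low: float = 65.0, high: float = 75.0) -> bool:
    """True if the GC content lies within [low, high] percent."""
    return low <= gc_content(seq) <= high
```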
63. A polypeptide-encoding nucleic acid generated by amplification of a polynucleotide using an amplification primer pair as set forth in claim 57 or claim 59.
64. The polypeptide-encoding nucleic acid of claim 63, wherein the amplification is by polymerase chain reaction (PCR).
65. The polypeptide-encoding nucleic acid of claim 63, wherein the nucleic acid is generated by amplification of a gene library.
66. The polypeptide-encoding nucleic acid of claim 63, wherein the gene library is an environmental library.
67. An isolated or recombinant polypeptide encoded by a nucleic acid as set forth in claim 63.
68. A method of amplifying a nucleic acid encoding a polypeptide comprising amplification of a template nucleic acid with an amplification primer sequence pair capable of amplifying a nucleic acid sequence as set forth in claim 44, claim 53, claim 56, claim 57, claim 58 or claim 59, or a subsequence thereof.
69. An expression cassette comprising a nucleic acid comprising a sequence as set forth in claim 44, claim 53, claim 56, claim 57, claim 58 or claim 59, or a subsequence thereof.
70. A vector comprising a nucleic acid comprising a sequence as set forth in claim 44, claim 53, claim 56, claim 57, claim 58 or claim 59, or a subsequence thereof.
71. A cloning vehicle comprising a nucleic acid comprising a sequence as set forth in claim 44, claim 53, claim 56, claim 57, claim 58 or claim 59, or a subsequence thereof, wherein the cloning vehicle comprises a viral vector, a plasmid, a phage, a phagemid, a cosmid, a fosmid, a bacteriophage or an artificial chromosome.
72. The cloning vehicle of claim 71, wherein the viral vector comprises an adenovirus vector, a retroviral vector or an adeno-associated viral vector.
73. The cloning vehicle of claim 71, comprising a bacterial artificial chromosome (BAC), a plasmid, a bacteriophage P1-derived vector (PAC), a yeast artificial chromosome (YAC), or a mammalian artificial chromosome (MAC).
74. A transformed cell comprising a nucleic acid comprising a sequence as set forth in claim 44 or claim 53 or a subsequence thereof.
75. A transformed cell comprising an expression cassette as set forth in claim 74.
76. The transformed cell of claim 75, wherein the cell is a bacterial cell, a mammalian cell, a fungal cell, a yeast cell, an insect cell or a plant cell.
77. The transformed cell of claim 76, wherein the bacterial cell belongs to the order Actinomycetales.
78. A transgenic non-human animal comprising a sequence as set forth in claim 44 or claim 53 or a subsequence thereof.
79. The transgenic non-human animal of claim 78, wherein the animal is a mouse.
80. A transgenic plant comprising a sequence as set forth in claim 44 or claim 53 or a subsequence thereof.
81. The transgenic plant of claim 80, wherein the plant is a corn plant, a sorghum plant, a potato plant, a tomato plant, a wheat plant, an oilseed plant, a rapeseed plant, a soybean plant, a rice plant, a barley plant, a grass, or a tobacco plant.
82. A transgenic seed comprising a sequence as set forth in claim 44 or claim 53 or a subsequence thereof.
83. The transgenic seed of claim 82, wherein the seed is a corn seed, a wheat kernel, an oilseed, a rapeseed, a soybean seed, a palm kernel, a sunflower seed, a sesame seed, a rice, a barley, a peanut or a tobacco plant seed.
84. An antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a sequence as set forth in claim 44 or claim 53 or a subsequence thereof.
85. The antisense oligonucleotide of claim 84, wherein the antisense oligonucleotide is between about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 bases in length.
86. A method of inhibiting the translation of a message in a cell comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a sequence as set forth in claim 44 or claim 53 or a subsequence thereof.
87. A double-stranded inhibitory RNA (RNAi) molecule comprising a subsequence of a sequence as set forth in claim 44 or claim 53.
88. The double-stranded inhibitory RNA (RNAi) molecule of claim 87, wherein the RNAi is about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more duplex nucleotides in length.
89. A method of inhibiting the expression of a polypeptide in a cell comprising administering to the cell or expressing in the cell a double-stranded inhibitory RNA (iRNA), wherein the RNA comprises a subsequence of a sequence as set forth in claim 44 or claim 53.
90. An isolated or recombinant polypeptide (i) having at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to SEQ ID NO:2; SEQ ID NO:4; SEQ ID NO:6; SEQ ID NO:8; SEQ ID NO:10; SEQ ID NO:12; SEQ ID NO:14; SEQ ID NO:16; SEQ ID NO:18; SEQ ID NO:20; SEQ ID NO:22; SEQ ID NO:24; SEQ ID NO:26; SEQ ID NO:28; SEQ ID NO:30; SEQ ID NO:32; SEQ ID NO:33; SEQ ID NO:35; SEQ ID NO:36; SEQ ID NO:37; SEQ ID NO:38; SEQ ID NO:40; SEQ ID NO:41; SEQ ID NO:42; SEQ ID NO:43; SEQ ID NO:45; SEQ ID NO:46; SEQ ID NO:47; SEQ ID NO:49; SEQ ID NO:50; SEQ ID NO:51; SEQ ID NO:53; SEQ ID NO:54; SEQ ID NO:55; SEQ ID NO:56; SEQ ID NO:58; SEQ ID NO:59; SEQ ID NO:60; SEQ ID NO:62; SEQ ID NO:63; SEQ ID NO:64; SEQ ID NO:65; SEQ ID NO:67; SEQ ID NO:68; SEQ ID NO:69; SEQ ID NO:71; SEQ ID NO:72; SEQ ID NO:73; SEQ ID NO:74; SEQ ID NO:76; SEQ ID NO:77; SEQ ID NO:78; SEQ ID NO:79; SEQ ID NO:80; SEQ ID NO:81; SEQ ID NO:83; SEQ ID NO:84; SEQ ID NO:85; SEQ ID NO:86; SEQ ID NO:87; SEQ ID NO:88; SEQ ID NO:90; SEQ ID NO:91; SEQ ID NO:92; SEQ ID NO:93; SEQ ID NO:94; SEQ ID NO:95; SEQ ID NO:97; SEQ ID NO:98; SEQ ID NO:99; SEQ ID NO:100; SEQ ID NO:101; SEQ ID NO:102; SEQ ID NO:104; SEQ ID NO:105; SEQ ID NO:106; SEQ ID NO:107; SEQ ID NO:108; SEQ ID NO:109; SEQ ID NO:111; SEQ ID NO:112; SEQ ID NO:113; SEQ ID NO:114; SEQ ID NO:115; SEQ ID NO:116; SEQ ID NO:117; SEQ ID NO:119; SEQ ID NO:120; SEQ ID NO:121; SEQ ID NO:122; SEQ ID NO:123; SEQ ID NO:124; SEQ ID NO:126; SEQ ID NO:127; SEQ ID NO:128; SEQ ID NO:129; SEQ ID NO:130; SEQ ID NO:131; SEQ ID NO:133; SEQ ID NO:134; SEQ ID NO:135; SEQ ID NO:136; SEQ ID NO:137; SEQ ID NO:138; SEQ ID NO:140; SEQ ID NO:141; SEQ ID NO:143; SEQ ID NO:144; SEQ ID NO:145; SEQ ID NO:147; SEQ ID NO:148 or SEQ ID NO:149, over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by a visual inspection, or, (ii) encoded by a nucleic acid having at least 50% sequence identity to a sequence as set forth in SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142 or SEQ ID NO:146, over a region of at least about 100 residues, and the sequence identities are determined by analysis with a sequence comparison algorithm or by a visual inspection, or encoded by a nucleic acid capable of hybridizing under stringent conditions to a sequence as set forth in SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142 or SEQ ID NO:146.
91. The isolated or recombinant polypeptide of claim 90, wherein the sequence identity is over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050 or more residues, or the full length of the polypeptide.
92. The isolated or recombinant polypeptide of claim 90, wherein the polypeptide has a thermostable or a thermotolerant activity or has improved expression in a host cell, improved enzymatic activity, or a different substrate specificity than wild type.
93. The isolated or recombinant polypeptide of claim 90, wherein the polypeptide can retain an activity under conditions comprising a temperature range of between about 1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, between about 37°C to about 95°C, between about 55°C to about 85°C, between about 70°C to about 75°C, or between about 90°C to about 95°C, or more, or, retain an activity after exposure to conditions comprising a temperature range of between about 1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, between about 37°C to about 95°C, between about 55°C to about 85°C, between about 70°C to about 75°C, or between about 90°C to about 95°C, or more.
94. An isolated or recombinant polypeptide comprising a polypeptide as set forth in claim 90 and having a heterologous signal sequence.
95. The isolated or recombinant polypeptide of claim 90, wherein the polypeptide retains activity under conditions comprising about pH 6.5, pH 6.0, pH 5.5, 5.0, pH 4.5 or 4.0, or, retains activity under conditions comprising about pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10 or pH 10.5.
96. A protein preparation comprising a polypeptide as set forth in claim 90.
97. An immobilized polypeptide, wherein the polypeptide comprises a sequence as set forth in claim 90, or a subsequence thereof.
98. The immobilized polypeptide of claim 90, wherein the polypeptide is immobilized on a cell, a metal, a resin, a polymer, a ceramic, a glass, a microelectrode, a graphitic particle, a bead, a gel, a plate, an array or a capillary tube.
99. An array comprising an immobilized polypeptide as set forth in claim 90.
100. An array comprising an immobilized nucleic acid as set forth in claim 44 or claim 53.
101. An isolated or recombinant antibody that specifically binds to a polypeptide as set forth in claim 90.
102. The isolated or recombinant antibody of claim 101, wherein the antibody is a monoclonal or a polyclonal antibody.
103. A hybridoma comprising an antibody that specifically binds to a polypeptide as set forth in claim 90.
104. A method of making an antibody comprising administering to a non-human animal a nucleic acid as set forth in claim 44 or claim 53 or a subsequence thereof in an amount sufficient to generate a humoral immune response, thereby making an antibody.
105. A method of making an antibody comprising administering to a non-human animal a polypeptide as set forth in claim 90 or a subsequence thereof in an amount sufficient to generate a humoral immune response, thereby making an antibody.
106. A method of producing a recombinant polypeptide comprising the steps of: (a) providing a nucleic acid operably linked to a promoter, wherein the nucleic acid comprises a sequence as set forth in claim 44 or claim 53 or a subsequence thereof; and (b) expressing the nucleic acid of step (a) under conditions that allow expression of the polypeptide, thereby producing a recombinant polypeptide.
107. The method of claim 106, further comprising transforming a host cell with the nucleic acid of step (a) followed by expressing the nucleic acid of step (a), thereby producing a recombinant polypeptide in a transformed cell.
108. A method for identifying a polypeptide having an activity comprising the following steps: (a) providing a polypeptide as set forth in claim 90; (b) providing an appropriate substrate; and (c) contacting the polypeptide with the substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of a reaction product, wherein a decrease in the amount of the substrate or an increase in the amount of the reaction product detects the active polypeptide.
109. A method for identifying a substrate comprising the following steps: (a) providing a polypeptide as set forth in claim 90; (b) providing a putative test substrate; and (c) contacting the polypeptide of step (a) with the test substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of reaction product, wherein a decrease in the amount of the substrate or an increase in the amount of a reaction product identifies the test substrate.
110. A method of determining whether a test compound specifically binds to a polypeptide comprising the following steps: (a) expressing a nucleic acid or a vector comprising the nucleic acid under conditions permissive for translation of the nucleic acid to a polypeptide, wherein the nucleic acid has a sequence as set forth in claim 44 or claim 53 or a subsequence thereof; (b) providing a test compound; (c) contacting the polypeptide with the test compound; and (d) determining whether the test compound of step (b) specifically binds to the polypeptide.
111. A method of determining whether a test compound specifically binds to a polypeptide comprising the following steps: (a) providing a polypeptide as set forth in claim 90; (b) providing a test compound; (c) contacting the polypeptide with the test compound; and (d) determining whether the test compound of step (b) specifically binds to the polypeptide.
112. A method for identifying a modulator of an activity comprising the following steps: (a) providing a polypeptide as set forth in claim 90; (b) providing an appropriate test compound; (c) contacting the polypeptide of step (a) with the test compound of step (b) and measuring an activity of the polypeptide, wherein a change in activity measured in the presence of the test compound compared to the activity in the absence of the test compound provides a determination that the test compound modulates the activity.
113. The method of claim 112, wherein the activity is measured by providing an appropriate substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product, or, an increase in the amount of the substrate or a decrease in the amount of a reaction product.
114. The method of claim 113, wherein a decrease in the amount of the substrate or an increase in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an activator of activity.
115. The method of claim 113, wherein an increase in the amount of the substrate or a decrease in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an inhibitor of activity.
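The decision rule of claims 112 to 115 can be sketched as a comparison of reaction product formed with and without the test compound. The function name, the use of product amounts rather than substrate amounts, and the 5% tolerance band that separates a real change from measurement noise are all assumptions made for illustration; the claims specify no threshold.

```python
def classify_modulator(product_without: float,
                       product_with: float,
                       tolerance: float = 0.05) -> str:
    """Classify a test compound from the reaction product formed.

    More product with the compound suggests an activator, less product
    suggests an inhibitor. `tolerance` is a hypothetical relative band
    inside which the change is treated as no effect.
    """
    if product_without <= 0:
        raise ValueError("control reaction must form a positive amount of product")
    change = (product_with - product_without) / product_without
    if change > tolerance:
        return "activator"
    if change < -tolerance:
        return "inhibitor"
    return "no effect"
```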
116. A computer system comprising a processor and a data storage device wherein said data storage device has stored thereon a polypeptide sequence or a nucleic acid sequence, wherein the polypeptide sequence comprises sequence as set forth in claim 90, a polypeptide encoded by a nucleic acid as set forth in claim 44 or claim 53 or a subsequence thereof.
117. The computer system of claim 116, further comprising a sequence comparison algorithm and a data storage device having at least one reference sequence stored thereon.
118. The computer system of claim 117, wherein the sequence comparison algorithm comprises a computer program that indicates polymorphisms.
119. The computer system of claim 116, further comprising an identifier that identifies one or more features in said sequence.
120. A computer readable medium having stored thereon a polypeptide sequence or a nucleic acid sequence, wherein the polypeptide sequence comprises a polypeptide as set forth in claim 90; a polypeptide encoded by a nucleic acid as set forth in claim 44 or claim 53 or a subsequence thereof.
121. A method for identifying a feature in a sequence comprising the steps of: (a) reading the sequence using a computer program which identifies one or more features in a sequence, wherein the sequence comprises a polypeptide sequence or a nucleic acid sequence, wherein the polypeptide sequence comprises a polypeptide as set forth in claim 90; a polypeptide encoded by a nucleic acid as set forth in claim 44 or claim 53 or a subsequence thereof; and (b) identifying one or more features in the sequence with the computer program.
122. A method for comparing a first sequence to a second sequence comprising the steps of: (a) reading the first sequence and the second sequence through use of a computer program which compares sequences, wherein the first sequence comprises a polypeptide sequence or a nucleic acid sequence, wherein the polypeptide sequence comprises a polypeptide as set forth in claim 90 or a polypeptide encoded by a nucleic acid as set forth in claim 44 or claim 53 or a subsequence thereof; and (b) determining differences between the first sequence and the second sequence with the computer program.
123. The method of claim 122, wherein the step of determining differences between the first sequence and the second sequence further comprises the step of identifying polymorphisms.
124. The method of claim 122, further comprising an identifier that identifies one or more features in a sequence.
125. The method of claim 123, comprising reading the first sequence using a computer program and identifying one or more features in the sequence.
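The comparison step of claims 122 and 123 can be illustrated with a minimal sketch that reports the positions at which two sequences differ. It assumes the sequences are already aligned and of equal length; the function name and the 1-based position convention are choices made here, not details from the claims.

```python
def find_differences(first: str, second: str) -> list[tuple[int, str, str]]:
    """List (position, residue_in_first, residue_in_second) for each mismatch.

    Positions are 1-based. Both sequences must be pre-aligned to equal length.
    """
    if len(first) != len(second):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(first, second), start=1)
        if a != b
    ]
```

Each reported tuple is a candidate single-residue polymorphism between the first sequence and the reference.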
126. A method for isolating or recovering a nucleic acid encoding a polypeptide from an environmental sample comprising the steps of: (a) providing an amplification primer sequence pair as set forth in claim 44 or claim 53 or a subsequence thereof; (b) isolating a nucleic acid from the environmental sample or treating the environmental sample such that nucleic acid in the sample is accessible for hybridization to the amplification primer pair; and, (c) combining the nucleic acid of step (b) with the amplification primer pair of step (a) and amplifying nucleic acid from the environmental sample, thereby isolating or recovering a nucleic acid encoding the polypeptide from an environmental sample.
127. The method of claim 126, wherein each member of the amplification primer sequence pair comprises an oligonucleotide comprising at least about 10 to 50 consecutive bases of a sequence as set forth in SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142 or SEQ ID NO:146.
128. The method of claim 126, wherein the environmental sample comprises a water sample, a liquid sample, a soil sample, an air sample or a biological sample.
129. The method of claim 128, wherein the biological sample is derived from a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a plant cell, a fungal cell or a mammalian cell.
130. A method of generating a variant of a nucleic acid encoding a polypeptide comprising the steps of: (a) providing a template nucleic acid comprising a sequence as set forth in claim 44 or claim 53 or a subsequence thereof; and (b) modifying, deleting or adding one or more nucleotides in the template sequence, or a combination thereof, to generate a variant of the template nucleic acid.
131. The method of claim 130, wherein the modifications, additions or deletions are introduced by a method comprising error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis™ (GSSM™), synthetic ligation reassembly (SLR) and a combination thereof.
132. The method of claim 130, wherein the modifications, additions or deletions are introduced by a method comprising recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof.
133. The method of claim 130, wherein the method is iteratively repeated until a polypeptide having an altered or different activity or an altered or different stability or altered or different expression or altered or different substrate specificity from that of a polypeptide encoded by the template nucleic acid is produced.
134. The method of claim 133, wherein the variant polypeptide is thermotolerant, and retains some activity after being exposed to an elevated temperature.
135. The method of claim 134, wherein the variant polypeptide has increased glycosyltransferase activity or has increased expression levels in the host cell as compared to the polypeptide encoded by a template nucleic acid.
136. The method of claim 132, wherein the method is iteratively repeated until a polypeptide coding sequence having an altered codon usage from that of the template nucleic acid is produced.
137. The method of claim 132, wherein the method is iteratively repeated until a gene having higher or lower level of message expression or stability from that of the template nucleic acid is produced.
138. A method for modifying codons in a nucleic acid encoding a polypeptide to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid encoding a polypeptide comprising a sequence as set forth in claim 43 or claim 53 or a subsequence thereof; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell.
139. A method for modifying codons in a nucleic acid encoding a glycosyltransferase or deoxysugar biosynthetic pathway polypeptide, the method comprising the following steps: (a) providing a nucleic acid encoding a polypeptide comprising a sequence as set forth in claim 43 or claim 53 or a subsequence thereof; and, (b) identifying a codon in the nucleic acid of step (a) and replacing it with a different codon encoding the same amino acid as the replaced codon, thereby modifying codons in the nucleic acid.
140. A method for modifying codons in a nucleic acid encoding a polypeptide to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid encoding a glycosyltransferase or deoxysugar biosynthetic pathway polypeptide comprising a sequence as set forth in claim 43 or claim 53 or a subsequence thereof; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell.
141. A method for modifying a codon in a nucleic acid encoding a polypeptide to decrease its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid encoding a glycosyltransferase or deoxysugar biosynthetic pathway polypeptide comprising a sequence as set forth in claim 43 or claim 53 or a subsequence thereof; and (b) identifying at least one preferred codon in the nucleic acid of step (a) and replacing it with a non-preferred or less preferred codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in a host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to decrease its expression in a host cell.
142. The method of claims 138 to 141, wherein the host cell is a bacterial cell, a fungal cell, an insect cell, a yeast cell, a plant cell or a mammalian cell.
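For illustration only, the codon replacement step recited in claims 138 and 140 can be sketched in Python as follows. This is a minimal sketch and not the applicants' implementation: the codon usage table `HOST_CODON_TABLE`, its frequency values, and the function names are all hypothetical, and a real table would be compiled from coding sequences of the chosen host cell.

```python
# Hypothetical codon usage table for an assumed host cell:
# codon -> (amino acid, relative frequency among synonymous codons).
# The numbers are illustrative placeholders, not measured values.
HOST_CODON_TABLE = {
    "CTG": ("L", 0.50), "TTA": ("L", 0.13), "TTG": ("L", 0.13),
    "CTT": ("L", 0.10), "CTC": ("L", 0.10), "CTA": ("L", 0.04),
    "AAA": ("K", 0.76), "AAG": ("K", 0.24),
    "ATG": ("M", 1.00),
}


def preferred_codons(table):
    """Return, for each amino acid, the most frequently used codon."""
    best = {}
    for codon, (aa, freq) in table.items():
        if aa not in best or freq > table[best[aa]][1]:
            best[aa] = codon
    return best


def split_codons(cds):
    """Split a coding sequence into consecutive three-base codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def optimize_codons(cds, table=HOST_CODON_TABLE):
    """Replace each codon with the preferred synonymous codon.

    Codons absent from the table are left unchanged.
    """
    best = preferred_codons(table)
    out = []
    for codon in split_codons(cds):
        if codon in table:
            out.append(best[table[codon][0]])
        else:
            out.append(codon)
    return "".join(out)


def translate(cds, table=HOST_CODON_TABLE):
    """Translate using the table; unknown codons are shown as 'X'."""
    return "".join(table[c][0] if c in table else "X" for c in split_codons(cds))


original = "ATGCTAAAG"                # Met-Leu-Lys with less preferred codons
optimized = optimize_codons(original)
print(optimized, translate(optimized))
```

Selecting the least frequent synonymous codon instead of the most frequent one would give the corresponding sketch for claim 141, in which expression is decreased.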
143. A method for producing a library of nucleic acids encoding a plurality of modified active sites or substrate binding sites, wherein the modified active sites or substrate binding sites are derived from a first nucleic acid comprising a sequence encoding a first active site or a first substrate binding site, the method comprising the following steps: (a) providing a first nucleic acid encoding a first active site or first substrate binding site, wherein the first nucleic acid sequence comprises a sequence that hybridizes under stringent conditions to a sequence as set forth in SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142 or SEQ ID NO:146, or a subsequence thereof, and the nucleic acid encodes an active site or a substrate binding site; (b) providing a set of mutagenic oligonucleotides that encode naturally-occurring amino acid variants at a plurality of targeted codons in the first nucleic acid; and, (c) using the set of mutagenic oligonucleotides to generate a set of active site-encoding or substrate binding site-encoding variant nucleic acids encoding a range of amino acid variations at each amino acid codon that was mutagenized, thereby producing a library of nucleic acids encoding a plurality of modified active sites or substrate binding sites.
144. The method of claim 143, comprising mutagenizing the first nucleic acid of step (a) by a method comprising an optimized directed evolution system, Gene Site Saturation Mutagenesis™ (GSSM™), or a synthetic ligation reassembly (SLR).
145. The method of claim 143, comprising mutagenizing the first nucleic acid of step (a) or variants by a method comprising error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis™ (GSSM™), synthetic ligation reassembly (SLR) and a combination thereof.
146. The method of claim 143, comprising mutagenizing the first nucleic acid of step (a) or variants by a method comprising recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof.
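As a non-authoritative illustration of step (c) of claim 143, the following Python sketch enumerates single-codon variants of a parent coding sequence at a set of targeted codons. It is a simplified in silico stand-in for a mutagenic oligonucleotide set, not the GSSM or SLR procedures named in claims 144 to 146. The table `REPRESENTATIVE_CODON` (one arbitrarily chosen codon per amino acid) and the function name `saturation_library` are assumptions made for this example.

```python
# One arbitrarily chosen codon for each of the 20 naturally occurring
# amino acids (hypothetical choice for illustration).
REPRESENTATIVE_CODON = {
    "A": "GCT", "C": "TGT", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGT", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAT", "P": "CCT", "Q": "CAA", "R": "CGT",
    "S": "TCT", "T": "ACT", "V": "GTT", "W": "TGG", "Y": "TAT",
}


def saturation_library(cds, targeted_codons):
    """Return {(codon_index, amino_acid): variant_sequence}.

    codon_index is 0-based. A variant identical to the parent
    sequence is skipped, so each targeted codon contributes at most
    20 variants.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    variants = {}
    for index in targeted_codons:
        start = 3 * index
        if index < 0 or start + 3 > len(cds):
            raise IndexError(f"codon index {index} is outside the sequence")
        parent_codon = cds[start:start + 3]
        for aa, codon in REPRESENTATIVE_CODON.items():
            if codon == parent_codon:
                continue  # same as the parent, not a variant
            variants[(index, aa)] = cds[:start] + codon + cds[start + 3:]
    return variants


library = saturation_library("ATGGCTAAA", targeted_codons=[1, 2])
print(len(library))
```

Each entry of the returned dictionary corresponds to one variant nucleic acid carrying a single amino acid substitution at one of the targeted codons.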
147. A method for making a small molecule comprising the following steps: (a) providing a plurality of biosynthetic enzymes capable of synthesizing or modifying a small molecule, wherein one of the enzymes comprises a glycosyltransferase or deoxysugar biosynthetic pathway enzyme encoded by a nucleic acid comprising a sequence as set forth in claim 43 or claim 53 or a subsequence thereof; (b) providing a substrate for at least one of the enzymes of step (a); and (c) reacting the substrate of step (b) with the enzymes under conditions that facilitate a plurality of biocatalytic reactions to generate a small molecule by a series of biocatalytic reactions.
148. A method for modifying a small molecule comprising the following steps: (a) providing an enzyme, wherein the enzyme comprises a polypeptide as set forth in claim 90, or a polypeptide encoded by a nucleic acid comprising a nucleic acid sequence as set forth in claim 43 or claim 53 or a subsequence thereof; (b) providing a small molecule; and (c) reacting the enzyme of step (a) with the small molecule of step (b) under conditions that facilitate an enzymatic reaction catalyzed by the enzyme, thereby modifying a small molecule by a glycosyltransferase or deoxysugar biosynthetic pathway polypeptide enzymatic reaction.
149. The method of claim 148, comprising a plurality of small molecule substrates for the enzyme of step (a), thereby generating a library of modified small molecules produced by at least one enzymatic reaction catalyzed by the glycosyltransferase or deoxysugar biosynthetic pathway polypeptide enzyme.
150. The method of claim 148, further comprising a plurality of additional enzymes under conditions that facilitate a plurality of biocatalytic reactions by the enzymes to form a library of modified small molecules produced by the plurality of enzymatic reactions.
151. The method of claim 150, further comprising the step of testing the library to determine if a particular modified small molecule which exhibits a desired activity is present within the library.
152. The method of claim 151, wherein the step of testing the library further comprises the steps of systematically eliminating all but one of the biocatalytic reactions used to produce a portion of the plurality of the modified small molecules within the library by testing the portion of the modified small molecule for the presence or absence of the particular modified small molecule with a desired activity, and identifying at least one specific biocatalytic reaction that produces the particular modified small molecule of desired activity.
153. A method for determining a functional fragment of an enzyme comprising the steps of: (a) providing an enzyme, wherein the enzyme comprises a polypeptide as set forth in claim 90, or a polypeptide encoded by a nucleic acid as set forth in claim 43 or claim 53 or a subsequence thereof; and (b) deleting a plurality of amino acid residues from the sequence of step (a) and testing the remaining subsequence for activity, thereby determining a functional fragment of the enzyme.
154. The method of claim 153, wherein the activity is measured by providing an appropriate substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product.
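The deletion-and-test procedure of claims 153 and 154 can be pictured with the following Python sketch, offered under stated assumptions only. The function `shortest_active_fragment` and the `is_active` callback are hypothetical names; in practice the callback would stand for a laboratory activity assay (for example, detecting a decrease in substrate or an increase in product), which this code cannot perform.

```python
from typing import Callable, Optional


def shortest_active_fragment(
    sequence: str,
    is_active: Callable[[str], bool],
    min_length: int = 1,
) -> Optional[str]:
    """Return the shortest contiguous subsequence that tests as active.

    Fragments are produced by deleting residues from the amino
    terminus, the carboxy terminus, or both, and are tested from
    shortest to longest. Returns None if no fragment is active.
    """
    for length in range(min_length, len(sequence) + 1):
        for start in range(0, len(sequence) - length + 1):
            fragment = sequence[start:start + length]
            if is_active(fragment):
                return fragment
    return None


# Toy stand-in for an assay: a fragment counts as "active" if it
# retains a hypothetical three-residue motif.
def toy_assay(fragment: str) -> bool:
    return "HGD" in fragment


print(shortest_active_fragment("MKTAHGDLLV", toy_assay, min_length=3))
```

Testing every contiguous subsequence is exhaustive and only practical in silico; a laboratory workflow would typically test a chosen subset of deletions.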
155. A method for whole cell engineering of new or modified phenotypes by using metabolic flux analysis, the method comprising the following steps: (a) making a modified cell by modifying the genetic composition of a cell, wherein the genetic composition is modified by addition to the cell of a nucleic acid comprising a sequence as set forth in claim 43 or claim 53 or a subsequence thereof; (b) culturing the modified cell to generate a plurality of modified cells; (c) measuring at least one metabolic parameter of the cell by monitoring the cell culture of step (b); and, (d) analyzing the data of step (c) to determine if the measured parameter differs from a comparable measurement in an unmodified cell under similar conditions, thereby identifying an engineered phenotype in the cell using real-time metabolic flux analysis.
156. The method of claim 155, wherein the cell culture is monitored in real time.
157. The method of claim 155, wherein the genetic composition of the cell is modified by a method comprising deletion of a sequence or modification of a sequence in the cell, or, knocking out the expression of a gene, or by activating or overexpressing.
158. The method of claim 155, further comprising selecting a cell comprising a newly engineered phenotype.
159. The method of claim 155, further comprising culturing the selected cell, thereby generating a new cell strain comprising a newly engineered phenotype.
160. An isolated or recombinant signal sequence consisting of (i) a sequence as set forth in residues 1 to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, 1 to 25, 1 to 26, 1 to 27, 1 to 28, 1 to 28, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to 36, 1 to 37, 1 to 38 or 1 to 39 of SEQ ID NO:1; SEQ ID NO:3; SEQ ID NO:5; SEQ ID NO:7; SEQ ID NO:9; SEQ ID NO:11; SEQ ID NO:13; SEQ ID NO:15; SEQ ID NO:17; SEQ ID NO:19; SEQ ID NO:21; SEQ ID NO:23; SEQ ID NO:25; SEQ ID NO:27; SEQ ID NO:29; SEQ ID NO:31, SEQ ID NO:34; SEQ ID NO:39; SEQ ID NO:44; SEQ ID NO:48; SEQ ID NO:52; SEQ ID NO:57; SEQ ID NO:61; SEQ ID NO:66; SEQ ID NO:70; SEQ ID NO:75; SEQ ID NO:82; SEQ ID NO:89; SEQ ID NO:96; SEQ ID NO:103; SEQ ID NO:110; SEQ ID NO:118; SEQ ID NO:125; SEQ ID NO:132; SEQ ID NO:139; SEQ ID NO:142 or SEQ ID NO:146.
161. A chimeric polypeptide comprising at least a first domain comprising signal peptide (SP) having a sequence as set forth in claim 160, and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP).
162. The chimeric polypeptide of claim 161, wherein the heterologous polypeptide or peptide is amino terminal to, carboxy terminal to or on both ends of the signal peptide (SP) or a glycosyltransferase or deoxysugar biosynthetic pathway polypeptide catalytic domain (CD).
163. An isolated or recombinant nucleic acid encoding a chimeric polypeptide, wherein the chimeric polypeptide comprises at least a first domain comprising signal peptide (SP) having a sequence as set forth in claim 160 and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP).
164. A method of increasing thermotolerance or thermostability of a polypeptide, the method comprising glycosylating the polypeptide, wherein the polypeptide comprises at least thirty contiguous amino acids of a polypeptide as set forth in claim 90, or a polypeptide encoded by a nucleic acid as set forth in claim 43 or claim 53 or a subsequence thereof, thereby increasing the thermotolerance or thermostability of the polypeptide.
165. A method for overexpressing a recombinant glycosyltransferase or deoxysugar biosynthetic pathway polypeptide in a cell comprising expressing a vector comprising a nucleic acid sequence as set forth in claim 43 or claim 53 or a subsequence thereof, wherein overexpression is effected by use of a high activity promoter, a dicistronic vector or by gene amplification of the vector.
166. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:1 and/or SEQ ID NO:3 and encoding a polypeptide having a gtt, or nucleotidyl transferase activity.
167. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:5 and/or SEQ ID NO:7 and encoding a polypeptide (e.g., SEQ ID NO:6, SEQ ID NO:8) having a gdh, or dNDP-4,6-dehydratase activity.
168. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:9 and/or SEQ ID NO:11 and encoding a polypeptide having an epi, or 3,5-epimerase activity.
169. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:13 and/or SEQ ID NO:15 and encoding a polypeptide having a kre, or 4-ketoreductase activity.
170. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:17 and encoding a polypeptide having a tdh, or 2,3-dehydratase activity.
171. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:19 and encoding a polypeptide having a tkr, or 3-ketoreductase activity.
172. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:21 encoding a polypeptide having a glycosyl transferase (spnG) activity.
173. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:23 and encoding a polypeptide having a glycosyl transferase (spnP) activity.
174. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:25 and encoding a polypeptide having a methyltransferase (spnS) activity.
175. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:27 and encoding a polypeptide having an amino transferase (spnR) activity.
176. An isolated or recombinant nucleic acid having at least about 50% sequence identity to SEQ ID NO:29 and encoding a polypeptide having a 3,4 dehydratase (spnQ) activity.
177. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:31, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt gene and at least one gdh gene.
178. The isolated or recombinant nucleic acid of claim 177, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:32 and/or SEQ ID NO:33.
179. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:34, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene.
180. The isolated or recombinant nucleic acid of claim 179, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, and SEQ ID NO:38.
181. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:39, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene.
182. The isolated or recombinant nucleic acid of claim 181, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, and SEQ ID NO:43.
183. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:44, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh and kre gene.
184. The isolated or recombinant nucleic acid of claim 183, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:45, SEQ ID NO:46, and SEQ ID NO:47.
185. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:48, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt, gdh and epi gene.
186. The isolated or recombinant nucleic acid of claim 185, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:49, SEQ ID NO:50, and SEQ ID NO:51.
187. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:52, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene.
188. The isolated or recombinant nucleic acid of claim 187, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, and SEQ ID NO:56.
189. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:57, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt, gdh and kre gene.
190. The isolated or recombinant nucleic acid of claim 189, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:58, SEQ ID NO:59, and SEQ ID NO:60.
191. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:61, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene.
192. The isolated or recombinant nucleic acid of claim 191, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64 and SEQ ID NO:65.
193. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:66, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxyhexose, and the pathway comprises at least one gtt, gdh and epi gene.
194. The isolated or recombinant nucleic acid of claim 193, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:67, SEQ ID NO:68 and SEQ ID NO:69.
195. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:70, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 6-deoxysugar, and the pathway comprises at least one gtt, gdh, epi and kre gene.
196. The isolated or recombinant nucleic acid of claim 195, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73 and SEQ ID NO:74.
197. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:75, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene.
198. The isolated or recombinant nucleic acid of claim 197, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, and SEQ ID NO:81.
199. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:89, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene.
200. The isolated or recombinant nucleic acid of claim 199, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, and SEQ ID NO:95.
201. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:96, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene.
202. The isolated or recombinant nucleic acid of claim 201, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:97, SEQ ID NO:98, SEQ ID NO:99, SEQ ID NO:100, SEQ ID NO:101, and SEQ ID NO:102.
203. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:103, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a 2,6 dideoxysugar, and the pathway comprises at least one gtt, gdh, epi, kre, tdh and tkr gene.
204. The isolated or recombinant nucleic acid of claim 203, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:104, SEQ ID NO:105, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:108 and SEQ ID NO:109.
205. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:110, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, R, S, tdh and tkr gene.
206. The isolated or recombinant nucleic acid of claim 205, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:111, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:115, SEQ ID NO:116 and SEQ ID NO:117.
207. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:118, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, R, tdh and tkr gene.
208. The isolated or recombinant nucleic acid of claim 207, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123 and SEQ ID NO:124.
209. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:125, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, kre, tdh and tkr gene.
210. The isolated or recombinant nucleic acid of claim 209, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:119, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130 and SEQ ID NO:131.
211. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:132, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one gtt, gdh, spnQ, kre, tdh and tkr gene.
212. The isolated or recombinant nucleic acid of claim 211, wherein the deoxysugar biosynthetic pathway encodes a polypeptide having a sequence as set forth in SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137 and SEQ ID NO:138.
213. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:139, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one nucleic acid encoding SEQ ID NO:140 and SEQ ID NO:141.
214. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:142, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one nucleic acid encoding SEQ ID NO:143, SEQ ID NO:144, and SEQ ID NO:145.
215. An isolated or recombinant nucleic acid comprising a nucleic acid sequence having at least about 50% sequence identity to SEQ ID NO:146, wherein the nucleic acid comprises a deoxysugar biosynthetic pathway that can generate a deoxysugar, and the pathway comprises at least one nucleic acid encoding SEQ ID NO:147, SEQ ID NO:148, and SEQ ID NO:149.
216. A deoxysugar biosynthetic pathway comprising one or more or any combination of deoxysugar biosynthetic pathways as set forth in claims 177 to 215.
217. A glycosylation system comprising one or more or any combination of deoxysugar biosynthetic pathways as set forth in claims 177 to 215.
218. The glycosylation system of claim 217, wherein the system is an in vivo glycosylation system, an in vitro glycosylation system or a combination thereof.
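Claims 166 to 215 recite nucleic acids having "at least about 50% sequence identity" to a reference sequence. As a rough illustration only, the Python sketch below computes percent identity from two sequences that have already been aligned, with `-` marking gaps. It assumes one common convention (identical positions divided by aligned columns, counting gap columns); other conventions and alignment parameters give different values, and this sketch does not perform the alignment itself. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored. A column
    with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


# Toy aligned sequences (not taken from the sequence listing).
identity = percent_identity("ATGC-A", "ATGCTA")
print(round(identity, 1), identity >= 50.0)
```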
PCT/US2004/025015 2003-08-04 2004-08-04 Glycosylation enzymes and systems and methods of making and using them WO2005044979A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US49278103P 2003-08-04 2003-08-04
US60/492,781 2003-08-04
US51595003P 2003-10-29 2003-10-29
US60/515,950 2003-10-29

Publications (2)

Publication Number Publication Date
WO2005044979A2 true WO2005044979A2 (en) 2005-05-19
WO2005044979A3 WO2005044979A3 (en) 2009-04-09

Family

ID=34576538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/025015 WO2005044979A2 (en) 2003-08-04 2004-08-04 Glycosylation enzymes and systems and methods of making and using them

Country Status (1)

Country Link
WO (1) WO2005044979A2 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6265202B1 (en) * 1998-06-26 2001-07-24 Regents Of The University Of Minnesota DNA encoding methymycin and pikromycin

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MIYAMOTO, Y.: 'Cloning and characterization of a glycosyltransferase gene involved in the biosynthesis of anthracycline antibiotic beta-rhodomycin from Streptomyces violaceus' FEMS MICROBIOL LETT. 10 December 2001, pages 163 - 168 *
RODRIGUEZ, L. ET AL.: 'Engineering Deoxysugar Biosynthetic Pathways from Antibiotic-Producing Microorganisms: A tool to produce Novel Glycosylated Bioactive Compounds' CHEM & BIOL vol. 9, June 2002, pages 721 - 729 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010028667A1 (en) * 2008-09-11 2010-03-18 W.C. Heraeus Gmbh Genetically modified strains for biotransformations in anthracycline production
CN103755723A (en) * 2014-02-07 2014-04-30 天津大学 Method for preparing rifampicin I crystal form
CN103755723B (en) * 2014-02-07 2016-04-20 天津大学 A kind of preparation method of rifampicin I crystal form
CN106188184A (en) * 2015-06-01 2016-12-07 中南大学 Pleocidin derivative application in terms of preparing antitumor drug and anti-KSHV virus drugs
CN106188184B (en) * 2015-06-01 2021-01-05 中南大学 Application of spinosyn derivative in preparation of anti-tumor drugs and anti-KSHV virus drugs
CN112386603A (en) * 2015-06-01 2021-02-23 中南大学 Application of spinosyn derivative in preparation of anti-tumor drugs and anti-KSHV virus drugs
CN112386603B (en) * 2015-06-01 2021-10-01 中南大学 Application of spinosyn derivative in preparation of anti-tumor drugs and anti-KSHV virus drugs
CN106006882A (en) * 2016-07-18 2016-10-12 青岛阳光动力生物医药技术有限公司 Photosensitive molecule system disinfectant combination used for aquaculture
CN114438004A (en) * 2021-12-14 2022-05-06 湖南师范大学 Saccharopolyspora whiskers engineering strain with doubled pII gene and construction method and application thereof
CN114438004B (en) * 2021-12-14 2023-08-22 湖南师范大学 Saccharopolyspora erythraea engineering strain with doubled pII gene, and construction method and application thereof
CN114686549A (en) * 2022-04-29 2022-07-01 陕西嘉禾生物科技股份有限公司 Method for preparing enzyme modified isoquercitrin by using rutin

Also Published As

Publication number Publication date
WO2005044979A3 (en) 2009-04-09

Similar Documents

Publication Publication Date Title
AU2022203048B2 (en) Recombinant production of steviol glycosides
Laudert et al. Cloning, molecular and functional characterization of Arabidopsis thaliana allene oxide synthase (CYP 74), the first enzyme of the octadecanoid pathway to jasmonates
KR101508249B1 (en) Aldolases, nucleic acids encoding them and methods for making and using them
Zhou et al. A novel multidomain polyketide synthase is essential for zeamine production and the virulence of Dickeya zeae
KR101802547B1 (en) Recombinant Production of Steviol Glycosides
Bender et al. Characterization of the genes controlling the biosynthesis of the polyketide phytotoxin coronatine including conjugation between coronafacic and coronamic acid
PL182136B1 (en) Genes for use in synthesising antipathogenic substances
US20060053510A1 (en) Transgenic plants incorporating traits of Zostera marina
RU2372404C2 (en) Metabolising herbicide protein, gene and application
CA2586048C (en) Protection against herbivores
MXPA05004869A (en) Xylose isomerases, nucleic acids encoding them and methods for making and using them.
WO2005044979A2 (en) Glycosylation enzymes and systems and methods of making and using them
JP2001509379A (en) Polyketide production in plants.
US7419812B2 (en) Sequences encoding PhzO and methods
RU2580015C2 (en) Spnk strains
US8188245B2 (en) Enduracidin biosynthetic gene cluster from streptomyces fungicidicus
WO1998049273A1 (en) Raffinose synthetase gene, process for producing raffinose, and transformed plant
JP7086984B2 (en) Compositions and Methods for Enhancing Enduracididine Production in Recombinant strains of Streptomyces fungicidicus
EP1381685B1 (en) Genes and proteins for the biosynthesis of polyketides
KR101753073B1 (en) Microorganisms and plants transformed by recombinant vector comprising plant derived genes coding COMT enzyme
JPH08205873A (en) Recombinant gibberellin dna and its application
KR100715372B1 (en) Protein with dual activities of uracil phosphoribosyltransferase and uridine kinase and gene coding the same
JPH11123080A (en) Gene for raffinose synthetase, production of raffinose and transformed plant
US6855867B1 (en) Plant glutamine amidotransferase homologs
US20040014032A1 (en) Nucleic acids encoding lettuce big-vein virus proteins and utilization thereof

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase