WO2009009139A2

WO2009009139A2 - Rna ligase polypeptides and methods of selection and use thereof

Info

Publication number: WO2009009139A2
Application number: PCT/US2008/008550
Authority: WO
Inventors: Burchard Seelig; Jack W. Szostak; Anthony D. Keefe; Glen S. Cho
Original assignee: The General Hospital Corporation
Priority date: 2007-07-11
Filing date: 2008-07-11
Publication date: 2009-01-15
Also published as: WO2009009139A3

Abstract

The present invention features polypeptides capable of catalyzing a ligation reaction between a first RNA molecule comprising, e.g., a hydroxyl group at the 3' position and a second RNA molecule comprising, e.g., a triphosphate group at the 5' position. The invention further features methods of selecting, optimizing, and using such polypeptides.

Description

RNA LIGASE POLYPEPTIDES AND METHODS OF SELECTION AND USE THEREOF

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims benefit of the filing date of U.S. provisional application serial number 60/959,084, filed July 11, 2007, the disclosure of which is incorporated herein in its entirety.

STATEMENT AS TO FEDERALLY SPONSORED RESEARCH

This invention was made with Government support under Grant Nos. R01GM53936, awarded by the National Institute of General Medical Sciences of the National Institutes of Health; U01HL66678, awarded by the National Heart, Lung and Blood Institute of the National Institutes of Health; NCC2-1069, awarded by the National Aeronautics and Space Administration (Exobiology); and NNA04CC12A, awarded by the National Aeronautics and Space Administration (Ames Research Center). The Government has certain rights in the invention.

BACKGROUND OF THE INVENTION

In general, this invention relates to RNA ligase polypeptides and methods of their selection.

Natural enzymes that ligate RNA or DNA (e.g., T4 DNA ligase or T4 RNA ligase) generally join a nucleic acid molecule having a phosphoryl group at the 5' position to a second nucleic acid molecule having a hydroxyl group at the 3' position. RNA is commonly synthesized by a transcription reaction that generates 5'-triphosphate ends that are not amenable to ligation by natural enzymes. Therefore, to produce 5'-phosphoryl ends suitable for ligation, RNA is typically dephosphorylated and then kinased, a cumbersome process that results in significant losses of valuable RNA. While artificial ribozymes and deoxyribozymes that catalyze the ligation of RNA having 5'-triphosphate ends exist, these nucleic acid enzymes must be designed and synthesized specifically for each new ligation reaction, as they need to match the sequences that are to be ligated. There is thus a need in the art for an RNA ligase enzyme that is capable of ligating diverse substrates having 5'-triphosphate ends without the need to tailor the ligase to the individual RNA substrates being ligated.

SUMMARY OF THE INVENTION

We have developed protein enzymes that are capable of catalyzing a ligation reaction between a first RNA molecule that includes a hydroxyl group at the 3' position and a second RNA molecule that includes a triphosphate group at the 5' position. More generally, we have found that new enzymatic activities, such as the RNA ligase activity described herein, can be created de novo without the need for prior mechanistic information by selection from a polypeptide library with product formation as the sole selection criterion.

Accordingly, the invention features a polypeptide capable of catalyzing a ligation reaction between a first RNA molecule that includes a hydroxyl group at the 3' position and a second RNA molecule that includes a triphosphate group at the 5' position, the ligation reaction occurring in the presence of an oligonucleotide that is complementary to at least a portion of the first RNA molecule and at least a portion of the second RNA molecule, wherein the ligation reaction results in the formation of a phosphodiester bond between the 3' position of the first RNA molecule and the 5' position of the second RNA molecule. The oligonucleotide (also referred to herein as "splint") is included in order to facilitate positioning of the first and second RNA molecules prior to ligation and can be complementary to any portions of the first and second RNA molecules, e.g., at least three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 consecutive bases. The ligation reaction can proceed, e.g., at least 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or even 10⁹ times faster than an uncatalyzed control reaction that includes the first and second RNA molecules and the oligonucleotide. The ligation step may be performed at 65°C.
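
The role of the splint described above can be illustrated with a minimal Python sketch. It assumes plain Watson-Crick pairing and a splint that is fully complementary to the region it covers; the function names and the toy sequences are hypothetical and are not taken from this disclosure. The check asks whether the splint pairs with at least a chosen number of consecutive bases on each side of the ligation junction.

```python
# Hypothetical helpers for illustration; Watson-Crick pairing only (no G-U wobble).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def splint_spans_junction(first_rna: str, second_rna: str, splint: str,
                          min_pairs: int = 3) -> bool:
    """True if the splint pairs with >= min_pairs consecutive bases at the 3' end
    of first_rna AND >= min_pairs consecutive bases at the 5' end of second_rna."""
    product = first_rna + second_rna      # sequence of the ligated product
    junction = len(first_rna)             # index of the first base of second_rna
    site = reverse_complement(splint)     # stretch of product the splint pairs with
    start = product.find(site)
    while start != -1:
        end = start + len(site)
        if junction - start >= min_pairs and end - junction >= min_pairs:
            return True
        start = product.find(site, start + 1)
    return False


# Toy substrates (assumed): the splint covers four bases on each side of the junction.
first, second = "GGCUAACG", "GGAGACUC"
print(splint_spans_junction(first, second, "CUCCCGUU"))
```

Raising `min_pairs` corresponds to requiring a longer complementary stretch on both substrates, in line with the range of three to more than 20 consecutive bases recited above.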

In some instances, the polypeptide includes an amino acid sequence that is substantially identical, e.g., at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or even 100% identical, to amino acids 12-79 of any one of SEQ ID NOs.: 1, 3-5, 8-15, or 17-18; amino acids 12-61 of SEQ ID NO.: 2; amino acids 12-62 of SEQ ID NOs.: 6 or 7; amino acids 11-78 of SEQ ID NO.: 16; amino acids 17-88 of SEQ ID NOs.: 20, 21, or 23; or amino acids 17-70 of SEQ ID NO.: 22. The polypeptide can include amino acids 12-90 of any one of SEQ ID NOs.: 1, 4, 8, 10, 11, 13-15, 17, or 18; amino acids 12-72 of SEQ ID NO.: 2; amino acids 12-79 of any one of SEQ ID NOs.: 3, 5, 9, or 12; amino acids 12-73 of SEQ ID NOs.: 6 or 7; amino acids 11-89 of SEQ ID NO.: 16; amino acids 17-88 of SEQ ID NOs.: 20, 21, or 23; or amino acids 17-70 of SEQ ID NO.: 22. The polypeptide can include any one of SEQ ID NOs.: 1-23. The polypeptide can include an amino acid sequence that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 1 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-29 or 42-50 of SEQ ID NO.: 2 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 3 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 4 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 5 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-30 or 43-51 of SEQ ID NO.: 6 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-30 or 43-51 of SEQ ID NO.: 7 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 8 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 9 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 10 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 11 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 12 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 13 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 14 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 15 by at least one, two, three, four, or five amino acids, or that differs from amino acids 20-31 or 59-67 of SEQ ID NO.: 16 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 17 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 18 by at least one, two, three, four, or five amino acids, or that differs from amino acids 26-37 or 64-73 of SEQ ID NO.: 20 by at least one, two, three, four, or five amino acids, or that differs from amino acids 26-37 or 64-73 of SEQ ID NO.: 21 by at least one, two, three, four, or five amino acids, or that differs from amino acids 26-37 or 64-73 of SEQ ID NO.: 23 by at least one, two, three, four, or five amino acids.

The invention further features a cell (e.g., a prokaryotic or a eukaryotic cell) that includes any of the polypeptides described herein.

The invention additionally features a kit that includes a polypeptide of the present invention and instructions for its use. The kit can also contain additional reagents, e.g., buffers, salts, or controls (e.g., negative and/or positive controls).

The invention further features a method for the selection of a polypeptide capable of catalyzing a ligation reaction between a first RNA molecule and a second RNA molecule, the method including the steps of: (a) providing a population of candidate RNA molecules, each of which includes a translation initiation sequence and a start codon operably linked to a candidate polypeptide coding sequence and each of which is covalently bonded to a peptide acceptor at the 3' end of the candidate polypeptide coding sequence, the peptide acceptor being a molecule that can be added to the C-terminus of a growing polypeptide chain by the catalytic activity of a ribosomal peptidyl transferase; (b) in vitro translating the candidate polypeptide coding sequences of the candidate RNA molecules to produce a population of candidate RNA-polypeptide fusions; and (c) selecting a desired RNA-polypeptide fusion based on RNA ligase activity, thereby selecting the polypeptide capable of catalyzing the ligation reaction. In some instances, the selecting in step (c) occurs at 65°C. In some instances, the selecting in step (c) occurs in the presence of an oligonucleotide that is complementary to at least a portion of the first RNA molecule and at least a portion of the second RNA molecule, e.g., at least three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 consecutive bases of either or both RNA molecules. In some instances, the first RNA molecule includes a hydroxyl group at the 3' position. In addition, in some instances, the second RNA molecule includes either a triphosphate group or a phosphoryl group at the 5' position. In some instances, the second RNA molecule includes an imidazolide, an imidazolide derivative, a phosphoramidate, a carboxylic anhydride, a mixed anhydride, or an activated phosphodiester at the 5' position.

By "capable of catalyzing" a reaction is meant capable of promoting or increasing the rate of the reaction in comparison with the rate of an uncatalyzed control reaction. For example, a polypeptide of the invention may be capable of increasing the rate of ligation of a first and second RNA molecule 10-fold, 100-fold, 1,000-fold, 10,000-fold, 100,000-fold, 1,000,000-fold, or even 10,000,000-fold in comparison with the rate of a control ligation reaction in the absence of the polypeptide. The rate of ligation and the degree of catalysis may be determined, e.g., by using a gel shift assay as described herein.

By "covalently bonded" is meant joined either directly through a covalent bond or indirectly through another covalently bonded sequence (for example, DNA corresponding to a pause site).

By "imidazolide derivative" is meant an imidazolide analog, e.g., containing a substitution at one or more positions of the imidazole ring. Imidazolide analogs are described, e.g., in U.S. Patent Nos. 3,717,655, 6,737,434, and 7,153,874.

By "ligation reaction" in the context of RNA is meant a chemical reaction that results in the formation of a phosphodiester bond between the 3' position of a first RNA molecule and the 5' position of a second RNA molecule. In some instances, an oligonucleotide that is complementary to a portion of the first RNA molecule and a portion of the second RNA molecule (e.g., at least three consecutive bases of each) is included in the ligation reaction to facilitate positioning of the first and second RNA molecules prior to ligation.

By "oligonucleotide" is meant a molecule, e.g., RNA or DNA, having a sequence of two or more covalently bonded, naturally occurring or modified nucleotides. The oligonucleotide may include modified or unmodified nucleotides, or mixtures or combinations thereof. Various salts, mixed salts, and free acid forms are also included.

The terms "peptide," "polypeptide," and "protein" are used interchangeably and refer to any chain of two or more natural or unnatural amino acids, regardless of post-translational modification (e.g., glycosylation or phosphorylation), constituting all or part of a naturally-occurring or non-naturally occurring polypeptide or peptide, as is described herein.

As used herein, a natural amino acid is a natural α-amino acid having the L-configuration, such as those normally occurring in natural polypeptides. Unnatural amino acid refers to an amino acid, which normally does not occur in polypeptides, e.g., an epimer of a natural α-amino acid having the L configuration, that is to say an amino acid having the unnatural D-configuration; or a (D,L)-isomeric mixture thereof; or a homologue of such an amino acid, for example, a β-amino acid, an α,α-disubstituted amino acid, or an α-amino acid wherein the amino acid side chain has been shortened by one or two methylene groups or lengthened to up to 10 carbon atoms, such as an α-amino alkanoic acid with 5 up to and including 10 carbon atoms in a linear chain, an unsubstituted or substituted aromatic (α-aryl or α-aryl lower alkyl), for example, a substituted phenylalanine or phenylglycine.

By "peptide acceptor" is meant any molecule capable of being added to the C-terminus of a growing polypeptide chain by the catalytic activity of the ribosomal peptidyl transferase function. Typically, such molecules contain (i) a nucleotide or nucleotide-like moiety (for example, adenosine or an adenosine analog (di-methylation at the N-6 amino position is acceptable)), (ii) an amino acid or amino acid-like moiety (for example, any of the 20 D- or L-amino acids or any amino acid analog thereof (for example, O-methyl tyrosine or any of the analogs described by Ellman et al., Meth. Enzymol. 202:301, 1991)), and (iii) a linkage between the two (for example, an ester, amide, or ketone linkage at the 3' position or, alternatively, the 2' position); preferably, this linkage does not significantly perturb the pucker of the ring from the natural ribonucleotide conformation. Peptide acceptors may also possess a nucleophile, which may be, without limitation, an amino group, a hydroxyl group, or a sulfhydryl group. In addition, peptide acceptors may be composed of nucleotide mimetics, amino acid mimetics, or mimetics of the combined nucleotide-amino acid structure.

By "peptide acceptor at the 3' position" of a polypeptide coding sequence is meant that the peptide acceptor molecule is positioned after the final codon of that polypeptide coding sequence. This term includes, without limitation, a peptide acceptor molecule that is positioned precisely at the 3' end of the polypeptide coding sequence as well as one which is separated from the final codon by intervening coding or non-coding sequence (for example, a sequence corresponding to a pause site). This term also includes constructs in which coding or non-coding sequences follow (that is, are 3' to) the peptide acceptor molecule. In addition, this term encompasses, without limitation, a peptide acceptor molecule that is covalently bonded (either directly or indirectly through intervening nucleic acid sequence) to the polypeptide coding sequence, as well as one that is joined to the polypeptide coding sequence by some non-covalent means, for example, through hybridization using a second nucleic acid sequence that binds at or near the 3' end of the polypeptide coding sequence and that itself is bound to a peptide acceptor molecule.

By "population" is meant more than one molecule (for example, more than one RNA, DNA, or RNA-polypeptide fusion molecule). RNA or DNA molecules within a population can include, e.g., a polypeptide coding sequence, and can be partially or completely randomized. Because the methods described herein facilitate selections which begin, if desired, with large numbers of candidate molecules, a population according to the invention can mean, e.g., more than 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, or even 10¹⁴ molecules.

By "RNA" is meant a molecule having a sequence of two or more covalently bonded, naturally occurring or modified ribonucleotides. The term includes, e.g., mRNA, miRNA, rRNA, siRNA (e.g., secondary siRNA), and tRNA. Examples of modified RNA included within this term are phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl-phosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates. Various salts, mixed salts, and free acid forms are also included.

By "selecting" is meant substantially partitioning a molecule from other molecules in a population. As used herein, a selecting step provides at least a 2-fold, 30-fold, 100-fold, or even 1,000-fold or greater enrichment of a desired molecule relative to undesired molecules in a population following the selection step. As indicated herein, a selection step may be repeated any number of times, and different types of selection steps may be combined in a given approach. For example, a selection step can select for molecules that have RNA ligase activity, e.g., as described herein.
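
To make the effect of repeated selecting steps concrete, the following Python sketch tracks the fraction of desired molecules across rounds. It is a simplified model that assumes the same fold-enrichment of desired over undesired molecules in every round and ignores amplification bias and mutagenesis; the function name and the input numbers are hypothetical.

```python
def fraction_after_rounds(initial_fraction: float, enrichment: float, rounds: int) -> float:
    """Fraction of desired molecules after `rounds` selecting steps, assuming each
    step enriches desired over undesired molecules by the same factor."""
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must lie strictly between 0 and 1")
    # Work with the ratio desired:undesired, which each round multiplies by `enrichment`.
    odds = initial_fraction / (1.0 - initial_fraction)
    odds *= enrichment ** rounds
    return odds / (1.0 + odds)


# Assumed example: one desired molecule per 10^9, 100-fold enrichment per round.
for n in range(0, 7):
    print(n, fraction_after_rounds(1e-9, 100.0, n))
```

Under these assumptions a rare active sequence comes to dominate the population only after several rounds, which is one reason a selection step may be repeated.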

By "start codon" is meant three bases which signal the beginning of a polypeptide coding sequence. Generally, these bases are AUG or ATG; however, any other base triplet capable of being utilized in this manner may be substituted.

By "substantially identical" is meant a polypeptide or nucleic acid exhibiting at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or even 100% identity to a reference amino acid or nucleic acid sequence over at least 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, or 70 contiguous residues or bases. Sequence identity is typically measured using a sequence analysis program (e.g., BLAST 2; Tatusova et al., FEMS Microbiol Lett. 174:247-250, 1999) with the default parameters specified therein. Conservative substitutions typically include substitutions within the following groups: glycine, alanine, valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine and tyrosine.
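
The identity and conservative-substitution definitions above can be sketched in Python as follows. This is not BLAST 2: it assumes two sequences that are already aligned without gaps and have equal length, and the function names and example peptides are hypothetical. The substitution groups are those listed in the preceding paragraph, written in one-letter code.

```python
# Conservative substitution groups from the definition above (one-letter codes).
CONSERVATIVE_GROUPS = [set("GAVIL"), set("DENQ"), set("ST"), set("KR"), set("FY")]


def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues in two pre-aligned, ungapped
    sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_conservative(x: str, y: str) -> bool:
    """True if replacing residue x with y stays within one of the listed groups."""
    return x != y and any(x in group and y in group for group in CONSERVATIVE_GROUPS)


# Toy peptides (assumed): a single leucine-to-valine change in ten residues.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(is_conservative("L", "V"))
```

A real comparison against the recited thresholds would use an alignment program over a window of at least 20 contiguous residues, as the definition requires.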

By "translation initiation sequence" is meant any sequence which is capable of providing a functional ribosome entry site. In bacterial systems, this region is sometimes referred to as a Shine-Dalgarno sequence.

By "uncatalyzed control reaction" is meant a reaction that includes all of the reagents included in a catalyzed reaction with the exception of the catalyzing agent. For example, to provide a basis of comparison for an RNA ligation reaction, a corresponding uncatalyzed control reaction would exclude the RNA ligase but include all of the other elements of the reaction. The rate of ligation in a catalyzed or uncatalyzed reaction may be determined, e.g., by using a gel shift assay as described herein.
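
The comparison between a catalyzed reaction and its uncatalyzed control reduces to a ratio of rates. The Python sketch below assumes that each rate is estimated from the fraction of substrate ligated at an early time point, where product formation is still roughly linear; the function names and all numerical values are hypothetical and are not measurements from this disclosure.

```python
def initial_rate(fraction_ligated: float, hours: float) -> float:
    """Approximate initial rate (fraction of substrate ligated per hour),
    assuming the time point lies in the early, roughly linear phase."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return fraction_ligated / hours


def rate_enhancement(rate_catalyzed: float, rate_uncatalyzed: float) -> float:
    """Fold increase of the catalyzed rate over the uncatalyzed control."""
    if rate_uncatalyzed <= 0:
        raise ValueError("the uncatalyzed rate must be positive to form a ratio")
    return rate_catalyzed / rate_uncatalyzed


# Assumed gel-shift readouts: 10% ligated in 1 h with ligase, 0.001% in 10 h without.
with_ligase = initial_rate(0.10, 1.0)
control = initial_rate(0.00001, 10.0)
print(rate_enhancement(with_ligase, control))
```

In practice the uncatalyzed rate may be too slow to measure directly, in which case only a lower bound on the enhancement can be stated.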

The polypeptides, kits, and methods described herein offer multiple advantages over previously available reagents and methods. In particular, by enabling the ligation of RNA with 5'-triphosphate ends, there is no longer a need to dephosphorylate and then kinase RNA prior to ligation, nor is there a need to purify the intermediates that would result from the extra processing steps, thereby saving substantial cost and time as well as increasing yield. Diverse substrates may be ligated without the need to tailor the ligase to the sequences of the particular substrates.

Exemplary uses of the polypeptides, kits, and methods described herein include, e.g., ligation of synthetic and/or enzymatically synthesized RNA; incorporation of nonnatural nucleotides, e.g., labels and cross-linkers, into RNA for functional studies; circularization of oligonucleotides; and creation of RNA molecules that are differentially labeled with stable isotopes for structural studies by nuclear magnetic resonance (NMR).

In addition, the methods described herein can be used to either discover new enzymes or to optimize existing enzymes by selecting for a chosen activity, e.g., RNA ligase activity, using single or multiple rounds of selection and amplification. In some applications, very large and complex libraries of candidate sequences may be used to discover or optimize polypeptides with desired activity, e.g., RNA ligase activity. This advantage is particularly important when selecting functional polypeptide sequences, considering, for example, that 10¹³ possible sequences exist for a peptide of only 10 amino acids in length. Large library size provides a significant advantage for directed evolution applications, in that sequence space can be explored to a greater depth around any given starting sequence.

Other features and advantages of the invention will be apparent from the detailed description and from the claims.
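
The sequence-space figure quoted in this section (about 10¹³ sequences for a 10-residue peptide) follows from 20 possible amino acids at each position. The short Python sketch below reproduces that arithmetic and, as an added illustration, compares a library of 10¹² members with the space spanned by 21 randomized positions (the 12- and 9-residue loop segments described for the scaffolded library). The function name is hypothetical, and the 20-letter alphabet is an assumption that ignores any codon bias.

```python
def sequence_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length


ten_mer = sequence_space(10)          # 20**10, on the order of 10**13
loops = sequence_space(12 + 9)        # 21 randomized loop positions
library = 10 ** 12                    # assumed library complexity

print(f"10-mer space: {ten_mer:.3e}")
print(f"21 random positions: {loops:.3e}")
print(f"fraction of loop space sampled by the library: {library / loops:.1e}")
```

Even a very large library samples only a minute fraction of the possible loop sequences, which is why library size matters for how deeply sequence space can be explored.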

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1a is a schematic illustration of a general selection scheme for enzymes for bond-forming reactions. A DNA library is transcribed into RNA, cross-linked to a 3'-puromycin oligonucleotide, and in vitro translated. The library of mRNA-displayed proteins is reverse transcribed with a primer bearing substrate A. Substrate B, which carries an anchor group, is added. Proteins that join A and B attach the anchor group to their encoding cDNA. Selected cDNA sequences are then amplified by PCR, and used as input for the next round. Figure 1b is a schematic illustration of the selection of enzymes that perform template-dependent ligation of a 5'-triphosphate-activated RNA ("PPP-substrate") to a second RNA ("HO-substrate"). The PPP-substrate is ligated to the primer and then used in the reverse transcription reaction. The cDNA of the catalytically active molecules is immobilized on streptavidin-coated beads via biotin, washed, and released by UV-irradiation of the photocleavable linker (PC). Figure 1c is a schematic illustration of the scaffolded library, which is based on a two zinc finger domain with two loop regions (light gray) that are replaced by segments of 12 or 9 random amino acids.

Figure 2 is a bar graph that shows the progress of polypeptide selection. The fraction of ³²P-labelled cDNA that bound to streptavidin agarose (SA) and eluted after photocleavage at each round of selection is shown. The input DNA into rounds 9*, 10*, and 11* was subjected to mutagenic PCR amplification and, in addition, a recombination procedure was performed before rounds 9* and 11*. The selection pressure was increased by decreasing the time of the reaction as indicated. Asterisks indicate selection rounds after mutagenesis and recombination.

Figure 3a is a diagram showing the sequences of the starting library and selected ligases. Loop regions are highlighted. The highlighted cysteines constitute the two pairs of CX_nC (n = 2 or 5) motifs that coordinate zinc ions in the original hRXR domain. Randomized amino acids in the library are shown as x. Dashes indicate amino acids that are the same as in the starting library, whereas highlighted periods symbolize deletions. The underlined flanking regions were not part of the hRXRα domain but were added to contain a Flag epitope tag, a hexahistidine tag, and a linker region. Figure 3b is a diagram showing the sequences of 18 selected ligases, including those shown in Figure 3a.

Figure 4a is a schematic illustration of the RNA ligation reaction. Figure 4b is an image showing the results of reactions catalyzed by ligase #4 after 1, 3, and 10 hours (lanes 1, 2, and 3, respectively). Lanes 4-7: 10 hours with no splint, 5'-monophosphate instead of PPP-substrate, 5'-hydroxyl instead of PPP-substrate, and wild-type hRXRα protein domain instead of ligase #4. Figure 4c is an image showing the release of inorganic pyrophosphate during ligation. Ligation reactions with γ-³²P GTP-labelled PPP-substrate were separated by thin-layer chromatography. A mixture of inorganic ³²P-phosphate (Pi), ³²P-pyrophosphate (PPi), and 5'-γ-³²P-labelled PPP-substrate was run for reference (Ref.). Figure 4d is an image showing the 3'-5' regiospecificity of ligation. Ligation of α-³²P GTP body-labelled PPP-substrate yielded product with ³²P at the indicated (*) positions. The product was digested to nucleoside monophosphates with ribonuclease T2 (which does not efficiently digest 2',5' RNA linkages) in the presence of a 22-nucleotide chemically synthesized RNA identical in sequence to the predicted ligation product but which contains a 2'-5' linkage at the ligation junction (5'-CUAACGUUCGC^2'p^5'GGAGACUCUUU). Digestion products were separated by two-dimensional thin-layer chromatography. Ultraviolet shadowing revealed the carrier RNA digestion products (Ap, Cp, Gp, Up), including the 2'-linked GpCp dinucleotide (encircled spots). Black spots represent the overlaid autoradiograph. The small dashed circle indicates the origin. Figure 4e is a graph demonstrating the occurrence of multiple turnover ligation. Substrate oligonucleotides and splint (each 20 μM) were incubated with ligase #4 (1 μM) for the indicated times and the ligation product was quantified. Error bars indicate standard deviation. Figure 4f is a graph showing the thermal unfolding of ligase #6 monitored by circular dichroism spectroscopy.

Figure 5 is a graph showing the far UV-CD spectrum of ligase #6.

Figure 6a is a graph showing the results of two-dimensional ¹H¹⁵N-HSQC NMR of ligase #6 using uniformly ¹⁵N-labelled protein. Figure 6b is a graph showing the results of two-dimensional ¹H¹⁵N-HSQC NMR of ligase #6 using selectively ¹⁵N-cysteine labelled protein.

Figure 7 is a graph showing the progress of the selection for ligases with enhanced stability at 65°C.

Figure 8 is an image showing the activity of original ligases #1, #4, and #7 and ligases selected at 65°C (A-6, B-7, C-10C, D-10H).

DETAILED DESCRIPTION OF THE INVENTION

The present invention features polypeptides that are capable of catalyzing a ligation reaction between a first RNA molecule that includes a hydroxyl group at the 3' position and a second RNA molecule that includes a triphosphate group at the 5' position. The invention further features methods of selecting, optimizing, and using such polypeptides.

We have used mRNA-display, in which polypeptides are covalently linked to their encoding mRNA, to select for functional polypeptides from an in vitro translated polypeptide library of high complexity (>10¹²), without the constraints imposed by any in vivo step. In particular, we have isolated novel RNA ligases from a library based on a zinc finger scaffold, followed by in vitro directed evolution to further optimize these enzymes. The resulting ligases exhibit multiple turnover with rate enhancements of more than two million-fold.

Our results represent the first use of mRNA-display to select for a novel enzyme activity. The methods described herein represent a broadly applicable route to the isolation of novel enzymatic activities that are otherwise difficult or impossible to generate without explicit knowledge of structure or mechanism.

RNA ligase polypeptides

Polypeptides of the present invention are capable of catalyzing a ligation reaction between a first RNA molecule that includes a hydroxyl group at the 3' position and a second RNA molecule that includes a triphosphate group at the 5' position. These polypeptides may be identified by any means, e.g., the selection methods described herein.

In some instances, the RNA ligase polypeptides are based on a zinc finger scaffold, the protein retinoid-X-receptor (hRXRα) domain, with variations in two loop regions that provide specificity for the RNA ligase activity (Figs. 1c and 3a) (see, e.g., Cho and Szostak, Chem. Biol. 13:139-147, 2006). In the example described in further detail herein, several such zinc finger variants with RNA ligase activity have been identified. The sequence of the starting library, SEQ ID NO.: 19, can include an N-terminal FLAG tag and a C-terminal His₆ tag for ease of selection and purification (Fig. 3a); alternatively, other flanking regions may be included in the RNA ligase polypeptide, or such flanking regions may be left out. The two loop regions (amino acids 21-32 and 60-68 of SEQ ID NO.: 19) are randomized in the polypeptide selection process, and mutations can also occur elsewhere in the sequence.

Exemplary polypeptides of the invention are shown in Figs. 3a-3b (SEQ ID NOs.: 1-18) and include A-6, B-7, C-10C, and D-10H (SEQ ID NOs.: 20-23) disclosed herein. The invention also encompasses polypeptides that include an amino acid sequence that is at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or even 100% identical to amino acids 12-79 of any one of SEQ ID NOs.: 1, 3-5, 8-15, or 17-18; amino acids 12-61 of SEQ ID NO.: 2; amino acids 12-62 of SEQ ID NOs.: 6 or 7; amino acids 11-78 of SEQ ID NO.: 16; amino acids 17-88 of any one of SEQ ID NOs.: 20, 21, or 23; or amino acids 17-70 of SEQ ID NO.: 22. Polypeptides of the invention can also include amino acids 12-90 of any one of SEQ ID NOs.: 1, 4, 8, 10, 11, 13-15, 17, or 18; amino acids 12-72 of SEQ ID NO.: 2; amino acids 12-79 of any one of SEQ ID NOs.: 3, 5, 9, or 12; amino acids 12-73 of SEQ ID NOs.: 6 or 7; amino acids 11-89 of SEQ ID NO.: 16; amino acids 17-88 of any one of SEQ ID NOs.: 20, 21, or 23; or amino acids 17-70 of SEQ ID NO.: 22. Furthermore, polypeptides can include any one of SEQ ID NOs.: 1-23. In addition, polypeptides of the invention can differ by one, two, three, four, five, or more amino acids from the loop regions of the sequences selected in SEQ ID NOs.: 1-23.
For example, polypeptides can include an amino acid sequence that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 1 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-29 or 42-50 of SEQ ID NO.: 2 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 3 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 4 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 5 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-30 or 43-51 of SEQ ID NO.: 6 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-30 or 43-51 of SEQ ID NO.: 7 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 8 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 9 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 10 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 11 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 12 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 13 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 14 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 15 by at least one, two, three, four, or five amino acids, or that differs from amino acids 20-31 or 59-67 of SEQ ID NO.: 16 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 17 by at least one, two, three, four, or five amino acids, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 18 by at least one, two, three, four, or five amino acids, or that differs from amino acids 26-37 or 64-73 of SEQ ID NO.: 20 by at least one, two, three, four, or five amino acids, or that differs from amino acids 26-37 or 64-73 of SEQ ID NO.: 21 by at least one, two, three, four, or five amino acids, or that differs from amino acids 26-37 or 64-73 of SEQ ID NO.: 23 by at least one, two, three, four, or five amino acids.
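
The loop-region comparisons recited above rely on 1-based, inclusive residue numbering (e.g., amino acids 21-32). The Python sketch below shows how such a region can be sliced from a sequence and how the number of differing positions between two equal-length regions can be counted. The function names are hypothetical, and the toy sequences are placeholders that do not correspond to any SEQ ID NO. of this disclosure.

```python
def region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end of a sequence, using 1-based inclusive numbering."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("region lies outside the sequence")
    return sequence[start - 1:end]


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length regions differ."""
    if len(a) != len(b):
        raise ValueError("regions must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Placeholder sequences (assumed), differing at two positions inside residues 3-8.
reference = "MKTAYIAKQR"
candidate = "MKTGYIVKQR"
loop_ref = region(reference, 3, 8)
loop_cand = region(candidate, 3, 8)
print(loop_ref, loop_cand, count_differences(loop_ref, loop_cand))
```

Regions of unequal length, such as loops carrying deletions, would first need an alignment; this sketch deliberately rejects them.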

Other scaffolds and selection techniques may also be used to generate RNA ligase polypeptides of the invention.

Methods of selecting polypeptides

Any selection method may be used to identify the polypeptides of the invention. For example, mRNA display may be employed. In one instance, the selection method includes the steps of: (a) providing a population of candidate RNA molecules, each of which includes a translation initiation sequence and a start codon operably linked to a candidate polypeptide coding sequence and each of which is covalently bonded to a peptide acceptor at the 3' end of the candidate polypeptide coding sequence, the peptide acceptor being a molecule that can be added to the C-terminus of a growing polypeptide chain by the catalytic activity of a ribosomal peptidyl transferase; (b) in vitro translating the candidate polypeptide coding sequences of the candidate RNA molecules to produce a population of candidate RNA-polypeptide fusions; and (c) selecting a desired RNA-polypeptide fusion based on RNA ligase activity, thereby selecting the polypeptide capable of catalyzing the ligation reaction. Any population of candidate RNA molecules may be employed, e.g., those encoding zinc finger scaffolds with randomized loop regions, as described, e.g., in Cho and Szostak, Chem. Biol. 13: 139-147, 2006, and as described further herein. Any peptide acceptor, e.g., puromycin, may be used to link the RNA and polypeptide chain. In vitro translation can be performed using any method known in the art, e.g., by employing cell-free expression systems such as reticulocyte lysates. Selection for the desired activity, e.g., ligation of a first RNA molecule having a hydroxyl group at the 3' position and a second RNA molecule having a triphosphate or phosphoryl group at the 5' position, can be performed, e.g., as described in further detail in the example herein. The rate of ligation may be determined, e.g., by using a gel shift assay as described herein. 
In some instances, a "splint" oligonucleotide having complementarity to at least a portion of both RNA molecules being ligated can be included in order to facilitate positioning of the two molecules prior to ligation. In addition to selecting for a desired activity, the methods of the invention can be employed to optimize a previously-identified polypeptide with desired activity, e.g., RNA ligase activity, in order to optimize its sequence, activity, specificity, solubility, or other characteristics. Selection methods may be performed in a single round or in multiple rounds.
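The role of the splint can be illustrated with a short check that a splint is the reverse complement of the junction formed by the 3' end of the hydroxyl-bearing RNA and the 5' end of the triphosphate-bearing RNA. The following is only a sketch: the sequences are copied from the oligonucleotide list given later in this document, while the function names and the perfect-pairing criterion are assumptions made for illustration.

```python
"""Check that a splint pairs with both substrates across the ligation junction."""

# RNA sequences (5'->3') taken from the oligonucleotide list in this document.
HO_SUBSTRATE = "UCACACUGUCUAACGUUCGC"  # HO-substrate (PC-biotin); ligates via its 3'-hydroxyl
PPP_SUBSTRATE = "GGAGACUCUUU"          # PPP-substrate; ligates via its 5'-triphosphate
SPLINT = "GAGUCUCCGCGAACGU"            # splint RNA

_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(rna))


def splint_junction(splint: str, ho_substrate: str, ppp_substrate: str):
    """Return (n_ho, n_ppp): nucleotides of each substrate paired to the splint.

    Hypothetical helper: it only accepts perfect pairing, with the splint
    spanning the 3' end of the HO-substrate and the 5' end of the PPP-substrate.
    Returns None if no such alignment exists.
    """
    target = reverse_complement(splint)
    for n_ho in range(1, len(target)):
        if ho_substrate.endswith(target[:n_ho]) and ppp_substrate.startswith(target[n_ho:]):
            return n_ho, len(target) - n_ho
    return None


if __name__ == "__main__":
    print(splint_junction(SPLINT, HO_SUBSTRATE, PPP_SUBSTRATE))
```

Under these assumptions, a tuple result indicates how many nucleotides on each side of the junction are held in register by the splint, and `None` flags a splint that does not align the two ends.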

Methods of mRNA display are described in further detail in the example herein, and in U.S. Patent Nos. 6,518,018; 6,281,344; 6,261,804; 6,258,558; 6,214,553; and 6,207,446; Roberts and Szostak, Proc. Natl. Acad. Sci. USA 94: 12297-12302, 1997; Keefe and Szostak, Nature 410:715-718, 2001; Wilson et al., Proc. Natl. Acad. Sci. USA 98:3750-3755, 2001; Baggio et al., J. Mol. Recognit. 15: 126-134, 2002; Cujec et al., Chem. Biol. 9:253-264, 2002; Raffler et al., Chem. Biol. 10:369-369, 2003; and Xu et al., Chem. Biol. 9:933-942, 2002.

EXAMPLE

The following example is provided for the purpose of illustrating the invention and is not meant to limit the invention in any way. We have devised a general scheme for the direct selection of enzymes catalyzing bond-forming reactions from mRNA-displayed protein libraries (Fig. 1a). To demonstrate that new protein catalysts can be created using this scheme, we chose, as a model reaction, the ligation of two RNA molecules aligned on a template, with one RNA activated with a 5'-triphosphate (Fig. 1b). For our selection we used a library in which two loops of the small, stable protein retinoid-X-receptor (hRXRα) domain were randomized (Fig. 1c). We transcribed and translated this synthetic DNA library to generate mRNA-displayed proteins (Fig. 1a), which we then reverse transcribed with a primer joined at its 5'-end to the triphosphorylated RNA (PPP-substrate, Fig. 1b). We incubated the library of 4x10¹² unique mRNA-displayed proteins with the biotinylated oligonucleotide (HO-substrate) and the complementary splint oligonucleotide that aligns the two substrate oligonucleotides. Proteins that catalyzed the ligation of the two substrates covalently attached the biotin moiety to their own cDNA, which we captured on streptavidin-coated agarose beads. After washing, we eluted the cDNA by cleaving the photocleavable linker between the HO-substrate and the biotin. We amplified the cDNA by PCR and used it as input for the next round of selection and amplification. Over 9 rounds, the fraction of the input library immobilized on the streptavidin beads and photoreleased increased from 0.01% to 0.3%, and after 12 rounds it increased to 2.3% (Fig. 2). To increase the activity of the selected ligases, we returned to the DNA library after round 8 and performed recombination and random mutagenesis (as described, e.g., in Cadwell and Joyce, PCR Methods Appl. 2:28-33, 1992, and Wilson and Keefe, Curr. Prot. Mol. Biol., edited by Ausubel et al. (Wiley, New York, 2000)) by restriction enzyme digestion and ligation of the DNA, and by subjecting the input DNA for rounds 9* through 11* to error-prone PCR amplification. We then continued the cycles of selection and amplification without further recombination or mutagenesis until round 17, while increasing the selection pressure by gradually decreasing the reaction time from overnight to five minutes (Fig. 2).
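The recombination step depends on a restriction site lying between the two randomized loops; the methods below name the enzyme as AvaII and describe its site as unique. As a hedged illustration, the sketch below scans the constant region between the N₃₆ and N₂₇ loops of the DNA library sequence for the AvaII recognition sequence GGWCC (W = A or T). The segment is joined from the wrapped library sequence printed later in this document; the function name and the restriction of the scan to this segment are assumptions.

```python
"""Locate AvaII recognition sites (GGWCC) between the two randomized loops."""
import re

# Constant DNA between the N36 and N27 loop cassettes, joined from the
# "DNA library sequence" entry of this document (coding strand, 5'->3').
INTER_LOOP_DNA = (
    "TCCTGTGAGGGCTGTAAAGGCTTCTTCAAGCGCACCGTGAGAAAGGACCTGACC"
    "TACACCTGTCGGGACAACAAGGATTGT"
)

# GGWCC is palindromic (GGACC on one strand reads GGTCC on the other),
# so scanning a single strand finds every site. The lookahead allows
# overlapping matches to be reported.
AVAII_SITE = re.compile(r"(?=(GG[AT]CC))")


def find_avaii_sites(dna: str):
    """Return a list of (0-based start index, matched site) tuples."""
    return [(m.start(), m.group(1)) for m in AVAII_SITE.finditer(dna.upper())]


if __name__ == "__main__":
    for start, site in find_avaii_sites(INTER_LOOP_DNA):
        print(f"AvaII site {site} at offset {start} of the inter-loop segment")
```

A single hit in this segment is what allows the two halves of different library members to be cut apart and religated in new combinations; a full check of uniqueness would also need to scan the flanking constant regions.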

The evolved pool of enzymes contained several families of closely related sequences as well as multiple unrelated single isolates. Of the nine amino acids in loop 2, four positions were absolutely conserved in all sequences, four other sites were conserved in 86-90% of the clones, and one position was conserved in 50% of the sequences. In contrast, we observed the motif "DYKXXD" at varying positions in the 12 originally randomized positions of loop 1 in 57% of the clones. This motif was likely enriched because it resembles the recognition site for the anti-Flag antibody M2 (as described, e.g., in Slootstra et al., Mol. Divers. 2: 156-164, 1997) that we used for purification of the mRNA-displayed proteins. These results suggest that the highly conserved loop 2 plays an essential role in ligase activity, while loop 1 is less critical. Analysis of the non-loop regions revealed a low conservation of specific cysteines of the original scaffold structure (Fig. 3a). After 17 rounds of selection, just 16% of the clones (8 of 49) retained the cysteine pattern as originally designed and were free of major deletions. The first and the fourth CX_nC sequences were highly conserved (47 and 48 clones out of 49, respectively), but the second and third CX_nC motifs were retained in only 24% and 20% of the clones, respectively. In addition, two deletions (of 17 and 13 amino acids) were frequently observed (Fig. 3a). Because of the mutation of up to half of the eight original zinc-coordinating cysteines and the frequent deletion of significant segments of the protein during the selection and evolution process, the majority of the proteins likely have undergone a substantial structural rearrangement in comparison to the original scaffold. We chose 18 clones from the final evolved library and screened them as mRNA-displayed proteins for ligation activity. All of the clones, including those with mutated cysteines or deleted regions, showed activity. We then expressed the seven most active ligases (#1-7, Fig. 3a) in Escherichia coli as C-terminal fusions with maltose binding protein (MBP) or without any fusion partner. All seven enzymes were soluble when fused to MBP (≥ 3 mg/ml for several weeks at 4°C). When expressed on their own, two of the ligases were soluble (#6 and #7), whereas the other five precipitated or aggregated. The sequences of all 18 clones that we screened for ligation activity are shown in Fig. 3b.
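The "DYKXXD" observation can be reproduced with a simple pattern search. The sketch below is illustrative only: the example sequence is the N-terminal portion of clone C4 from the round 17 listing in this document (extraction spaces removed), and the loop 1 window of residues 21-32 is an assumption based on the library design (20 constant residues preceding the 12 randomized positions). Because every clone begins with the Flag tag MDYKDDDDK, which itself matches the pattern, hits are filtered to the loop window.

```python
"""Find DYKXXD motifs and keep only those inside the randomized loop 1."""
import re

# D-Y-K followed by any two residues and D; lookahead reports overlapping hits.
MOTIF = re.compile(r"(?=(DYK..D))")

# Assumed 1-based, inclusive residue window of loop 1 in the unmodified scaffold.
LOOP1_WINDOW = (21, 32)

# N-terminal portion of clone C4 as listed in this document.
CLONE_C4_NTERM = "MDYKDDDDKSGKHVCAICGDVLYENDYKTSDNSGEGCK"


def find_motif(protein: str):
    """Return (1-based start position, matched residues) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein)]


def motif_in_loop1(protein: str, window=LOOP1_WINDOW):
    """Keep only hits that lie entirely inside the assumed loop 1 window."""
    first, last = window
    return [
        (pos, hit)
        for pos, hit in find_motif(protein)
        if first <= pos and pos + len(hit) - 1 <= last
    ]


if __name__ == "__main__":
    print("all hits:   ", find_motif(CLONE_C4_NTERM))
    print("loop 1 hits:", motif_in_loop1(CLONE_C4_NTERM))
```

For clones carrying deletions or insertions upstream of loop 1, the fixed window would need to be replaced by an alignment-based definition of the loop.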

We chose the MBP-fusion of the most active enzyme (ligase #4) for more detailed characterization (Fig. 4). Incubation of the purified MBP-fusion enzyme with the PPP-substrate, the HO-substrate, and the splint oligonucleotide yielded the desired ligation product (Figs. 4a and 4b) as well as the expected inorganic pyrophosphate by-product (Fig. 4c). We did not detect any product when we substituted the PPP-substrate with an oligonucleotide of identical sequence but with either a 5'-monophosphate or a 5'-hydroxyl instead of the 5'-triphosphate group (Fig. 4b). Preliminary experiments show that the enzyme catalyzes the ligation equally well for all four nucleobases at the 3'-terminal base of the HO-substrate as long as they are correctly base-paired to the splint oligonucleotide. A mismatch at this position reduces the ligation efficiency several-fold. Enzymatic digestion of the ligated product confirmed the 3'-5' regiospecificity of the ligase reaction (Fig. 4d).

Because the RXR-library was based on a zinc finger protein, we examined the role of zinc and other cations in catalysis. The reaction required Zn²⁺ and monovalent cations (K⁺ or Na⁺) with optima of 100 μM and 80 mM, respectively. The rate of the catalyzed reaction showed a strong pH dependence with an optimum at pH 7.6. The optimal ligation conditions with regard to Zn²⁺, monovalent cation, and pH coincide with the conditions used during the selection. In contrast to the enzymatic reaction described here, the non-enzymatic template-directed ligation is inhibited by Zn²⁺ and shows a linear increase in reaction rate with increasing pH. Incubating the ligase with chelating resin (Chelex 100) resulted in an almost complete loss of activity; activity could be restored by the addition of Zn²⁺, but not by the addition of Cu²⁺, Ni²⁺, Co²⁺, Mn²⁺, Cd²⁺, or Mg²⁺ ions. Elemental analysis by inductively coupled plasma optical emission spectroscopy revealed 2.6 ± 0.4 equivalents (± s.d.) of bound zinc per ligase molecule, whereas the original wild-type hRXRα protein contained 2.1 ± 0.1 equivalents of zinc (± s.d.). The strong zinc dependence of the ligase enzymes could be due to either the continued existence of structural zinc sites or the presence of a catalytic zinc in the molecule.

To quantify the rate acceleration achieved by the selected ligase, we determined the rates of the catalyzed as well as the uncatalyzed RNA-RNA ligation reactions. We could not detect any uncatalyzed formation of product in the absence of Mg²⁺, consistent with previous work on the Mg²⁺-dependence of the nonenzymatic ligation reaction. During the selection process, Mg²⁺ was present at a concentration of 5 mM, yet we found that the catalyzed reaction did not require magnesium ions, and indeed was faster in their absence. By quantifying the detection limit of our assay, we determined that the upper limit of the rate of the uncatalyzed reaction of the pseudo-intramolecular complex of two substrate oligonucleotides pre-aligned on the template oligonucleotide in the absence of Mg²⁺ was k_obs(uncatalyzed) < 3 x 10⁻⁷ h⁻¹. We measured the rate of the catalyzed ligation in the absence of Mg²⁺, at a subsaturating substrate concentration of 10 μM, as k_obs(catalyzed) = 0.65 ± 0.11 h⁻¹ (± s.d.), which is at least 2 x 10⁶-fold faster than the uncatalyzed reaction. For the wild-type hRXRα protein domain, we could not detect any ligated product (Fig. 4b).
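The stated lower bound on the rate acceleration follows directly from the two rate constants above. The short sketch below only restates that arithmetic; the constant and function names are chosen for illustration, and because the uncatalyzed rate is an upper limit, the ratio is a minimum rather than a measured value.

```python
"""Lower bound on rate enhancement from the reported rate constants."""

K_OBS_CATALYZED = 0.65         # h^-1, catalyzed ligation in the absence of Mg2+
K_OBS_CATALYZED_SD = 0.11      # h^-1, standard deviation reported with the value
K_OBS_UNCATALYZED_MAX = 3e-7   # h^-1, detection-limited upper bound


def minimum_rate_enhancement(k_catalyzed: float, k_uncatalyzed_upper: float) -> float:
    """Return k_catalyzed / k_uncatalyzed_upper.

    Since the denominator is only an upper bound on the true uncatalyzed
    rate, the true enhancement is at least this large.
    """
    if k_uncatalyzed_upper <= 0:
        raise ValueError("upper bound on the uncatalyzed rate must be positive")
    return k_catalyzed / k_uncatalyzed_upper


if __name__ == "__main__":
    central = minimum_rate_enhancement(K_OBS_CATALYZED, K_OBS_UNCATALYZED_MAX)
    low = minimum_rate_enhancement(
        K_OBS_CATALYZED - K_OBS_CATALYZED_SD, K_OBS_UNCATALYZED_MAX
    )
    print(f"minimum enhancement (central value): {central:.2e}")
    print(f"minimum enhancement (k_obs minus one s.d.): {low:.2e}")
```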

We found that the evolved enzyme catalyzed the ligation reaction with multiple turnover (Fig. 4e), although the selection scheme we employed utilized a single-turnover strategy that did not exert any selective pressure for product release. The intramolecular single-turnover design of the mRNA-display selection scheme used here facilitates the isolation of enzymes, even if the rate acceleration is low or the substrate affinity is weak.

Preliminary biophysical studies suggest that the ligase possesses a folded structure. We chose ligase #6 for the following experiments because of its superior solubility in the absence of a fusion protein partner. Circular dichroism (CD) spectroscopy revealed an α-helical component of the secondary structure (Fig. 5), and thermal denaturation indicated cooperative thermal unfolding (Fig. 4f). The two-dimensional ¹H¹⁵N heteronuclear single-quantum coherence (HSQC) NMR spectrum showed about 67 well-resolved peaks with a good chemical shift dispersion in the amide region of the spectrum, which indicates that a significant portion of the ligase protein is well folded (Fig. 6a). A similar HSQC experiment with selectively ¹⁵N-cysteine labelled protein suggests that all six cysteines of ligase #6 are well structured (Fig. 6b).

There are no known examples of naturally-occurring protein enzymes that catalyze the ligation of a 5'-triphosphorylated RNA oligonucleotide to the terminal 3'-hydroxyl group of a second RNA. An enzyme catalyzing a similar reaction, the T4 RNA ligase, joins a 3'-hydroxyl group to a 5'-monophosphorylated RNA with the concomitant conversion of ATP to AMP and inorganic pyrophosphate via a covalent AMP-ligase intermediate. The reaction catalyzed by the ligase described herein is more closely related to chain elongation by one nucleotide during RNA polymerization: in both cases, the growing strand and the triphosphate-containing substrate base pair to a template, the 3'-hydroxyl of the growing strand attacks the α-phosphate of a 5'-triphosphate, and a pyrophosphate is released in concert with the formation of a 3'-5' phosphodiester bond (Fig. 4a). RNA polymerases can be very fast; for example, T7 RNA polymerase catalyzes chain elongation at 240 nucleotides/second. Preliminary results with our selected ligase #4 did not show any polymerase activity with nucleoside triphosphates.

Ribozymes and deoxyribozymes previously selected from random oligonucleotide libraries (as described, e.g., in Bartel and Szostak, Science 261:1411-1418, 1993, and Purtha et al., J. Am. Chem. Soc. 127: 13124-13125, 2005) catalyze the same reaction as the protein ligases described herein. These ribozymes and deoxyribozymes have rate enhancements over the uncatalyzed background reaction of the same order of magnitude as the protein ligases of the invention, and in the case of the ribozymes, these rates were significantly increased by further in vitro evolution (up to 10⁹-fold rate acceleration). While the protein ligase described herein was dependent on Zn²⁺ and inhibited by Mg²⁺, the ribozyme-catalyzed ligation is strongly dependent on Mg²⁺ with an optimum at ~60 mM. The deoxyribozymes have been selected as Mg²⁺-dependent variants and also as Zn²⁺-dependent variants. The pH dependence of the ligase enzyme described herein indicates that the catalytic mechanism involves acid-base catalysis by amino acid residues of the enzyme; in contrast, the pH-dependence of the ribozyme and deoxyribozyme ligases is more consistent with a catalytic role for one or more bound metal ions.

To select for ligase enzymes with increased thermostability we continued the selection using the same selection protocol described herein except for the following changes.

For the reverse transcription reaction of the mRNA-displayed proteins the oligonucleotide BS-LigRT-70 was used instead of RT-primer. The mRNA-displayed proteins were incubated at 65°C for 10 minutes immediately before the addition of HO-substrate (BS-lig-bio20) and splint oligonucleotide (BS-LigSpl-40), and the incubation was continued for an additional hour at 65°C, followed by quenching with EDTA. The procedure then followed the original protocol again as described above. Figure 7 shows the progress of this selection. The pool after round 4 was cloned and sequenced. The sequences (A-6, B-7, C-10C, and D-10H; SEQ ID NOs.: 20-23) were assayed for ligation activity at 65°C overnight. All four ligases showed ligation activity at 65°C, whereas the original ligases #1, #4, and #7 assayed under the same conditions yielded no detectable ligation product (Fig. 8). All four heat-evolved ligases also showed an increased activity at room temperature compared to the original ligases #1, #4, and #7.

Further details follow on the materials and methods used in the above-described example.

Ligation activity assay of mRNA-displayed ligases by gel shift

Eighteen individual ligases were expressed separately as mRNA-displayed proteins, and incubated with HO-substrate and splint. After five hours, the ligation reaction mixture was quenched, mixed with excess streptavidin, and separated by denaturing PAGE. The substrate and gel-shifted product bands were quantified.

Expression of ligases in E. coli

All proteins were expressed in Rosetta BL21 (DE3) cells and purified on either an amylose resin column (MBP-fused proteins) or a Ni-NTA resin column.

Ligation activity assay of free ligases

20 μM PPP-substrate, 15 μM splint, and 10 μM radiolabeled HO-substrate were incubated with 5 μM ligase, separated by PAGE, and analyzed. The k_obs values were determined by fitting the ratio of product concentration divided by enzyme concentration against time to a linear equation, and are the average of three independent experiments measured at less than 10% product formation.
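The k_obs determination described above amounts to an initial-rate fit: the slope of ([product]/[enzyme]) plotted against time. The following is a minimal sketch of such a fit using ordinary least squares with a free intercept. The time course in the example is invented purely for illustration and is not data from this document; the function names are likewise hypothetical, and the original analysis software is not specified here.

```python
"""Initial-rate estimate of k_obs from a product time course (illustrative)."""
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_obs(times_h, product_uM, enzyme_uM):
    """Slope of ([product]/[enzyme]) versus time, in h^-1."""
    turnovers = [p / enzyme_uM for p in product_uM]
    slope, _intercept = fit_line(times_h, turnovers)
    return slope


def summarize(replicate_k_obs):
    """Mean and sample standard deviation over independent experiments."""
    return mean(replicate_k_obs), stdev(replicate_k_obs)


if __name__ == "__main__":
    # Hypothetical time course: 5 uM enzyme, product kept below 1 uM,
    # i.e. under 10% conversion of a 10 uM HO-substrate.
    times_h = [0.0, 0.1, 0.2, 0.3]
    product_uM = [0.0, 0.33, 0.64, 0.98]
    print(f"k_obs = {k_obs(times_h, product_uM, enzyme_uM=5.0):.3f} h^-1")
```

Restricting the fit to less than 10% product formation keeps the substrate concentration effectively constant, which is what justifies a linear rather than exponential model.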

Preparation of primer for reverse transcription (RT)

The RT-primer was a chimeric oligonucleotide made from a 5'-triphosphate RNA oligonucleotide and a DNA oligonucleotide at the 3'-end. The PPP-substrate (5'-pppGGAGACUCUUU) was synthesized by T7 RNA polymerase from a double stranded template of BS47 and BS48 and purified by denaturing polyacrylamide gel electrophoresis (PAGE). The PPP-substrate was then ligated to BS50 in the presence of BS56 as template by T4 DNA ligase, and the product was purified by denaturing PAGE to yield the RT-primer: 5'-pppGGAGACUCUUUTTTTTTTTTTTTTTTTTTCCCAGATCCAGACATTC.

In vitro selection and evolution

The DNA library (characterized in Cho and Szostak, Chem. Biol. 13: 139-147, 2006) was PCR-amplified with primers BS3long and BS24RXR2 to introduce a cross-link site at the 3'-end in order to use the psoralen-crosslinking protocol. RNA was produced from the DNA library with T7 RNA polymerase. After purification by denaturing polyacrylamide gel electrophoresis, the RNA was photo-crosslinked (as described, e.g., in Kurz et al., Nucl. Acids Res. 28:e83, 2000) with the XL-PSO oligonucleotide and ethanol-precipitated. The mRNA-displayed proteins were generated as previously described (see, e.g., Roberts and Szostak, Proc. Natl. Acad. Sci. USA 94: 12297-12302, 1997; Keefe and Szostak, Nature 410:715-718, 2001; Cho and Szostak, Chem. Biol. 13: 139-147, 2006; Cho et al., J. Mol. Biol. 297:309-319, 2000; and Liu et al., Methods Enzymol. 318:268-293, 2000) with the following modifications. In the first round of selection, a 10 ml translation was incubated at 30°C for one hour (200 nM psoralen cross-linked RNA template, Red Nova Rabbit Reticulocyte Lysate (Novagen, Madison, WI), used according to the manufacturer's instructions with an additional 100 mM KCl / 0.9 mM Mg(OAc)₂ and 69 nM ³⁵S-methionine). After addition of 600 mM KCl and 25 mM MgCl₂, the translation reaction was incubated at room temperature for five minutes and then diluted ten-fold into oligo(dT)cellulose binding buffer (10 mM EDTA, 1 M NaCl, 10 mM 2-mercaptoethanol, 20 mM Tris(hydroxymethyl)aminomethane, pH 8.0, 0.2% w/v Triton X-100), and this mixture was incubated with 10 mg/ml oligo(dT)cellulose (New England Biolabs, Beverly, MA) for fifteen minutes at 4°C with rotation. The oligo(dT)cellulose was washed on a chromatography column (Bio-Rad, Hercules, CA) with the same oligo(dT)cellulose binding buffer, then with oligo(dT)cellulose wash buffer (300 mM KCl, 5 mM 2-mercaptoethanol, 20 mM Tris(hydroxymethyl)aminomethane, pH 8.0) and then eluted with oligo(dT)cellulose elution buffer (5 mM 2-mercaptoethanol, 2 mM Tris(hydroxymethyl)aminomethane, pH 8.0) to yield 4x10¹³ mRNA-displayed proteins. The eluate was mixed with 10x Flag binding buffer (1x is 150 mM KCl, 5 mM 2-mercaptoethanol, 50 mM HEPES, pH 7.4, 0.01% w/v Triton X-100) and then incubated with 50 μl Anti-Flag M2-agarose affinity gel (Sigma, St. Louis, MO; prewashed with Flag clean buffer (100 mM glycine, pH 3.5, 0.25% w/v Triton X-100) and Flag binding buffer) for two hours at 4°C with rotation. The Anti-Flag M2-agarose affinity gel was then washed with Flag binding buffer and eluted with Flag binding buffer containing two equivalents of Flag peptide (Sigma, St. Louis, MO; one equivalent of Flag peptide saturates both antigen sites of the antibody resin) for twenty minutes at 4°C with rotation. The eluate was diluted two-fold with an additional 50 mM Tris(hydroxymethyl)aminomethane, pH 8.3, 3 mM MgCl₂, 10 mM 2-mercaptoethanol, 0.5 mM each of dCTP, dGTP, dTTP, 5 μM dATP, 50 nM α-³²P dATP and used for the reverse transcription of the mRNA-displayed proteins with the RT-primer and Superscript II (Gibco BRL, Rockville, MD) at 42°C for 30 minutes. This sample was then dialyzed twice against Flag binding buffer at a ratio of 1/1000 and then incubated with 100 μl Anti-Flag M2-agarose affinity gel and processed as described for the first Flag affinity purification above.

Zinc chloride and 5x selection buffer (1x is 400 mM KCl, 5 mM MgCl₂, 20 mM HEPES, pH 7.4, 0.01% w/v Triton X-100) were added to the Flag elution to give final concentrations of 100 μM and 1x, respectively. The mixture was incubated with 2 μM HO-substrate (PC-biotin) and 3 μM splint for the indicated times (Fig. 2) at room temperature. After quenching the reaction with 10 mM EDTA, the solution was incubated with 700 μl ImmunoPure immobilized streptavidin agarose (Pierce, Rockford, IL; prewashed with PBS buffer (138 mM NaCl, 2.7 mM KCl, 10 mM potassium phosphate, pH 7.4) including 2 mg/ml t-RNA (from baker's yeast, Sigma, St. Louis, MO)) and then washed with PBS alone at room temperature for twenty minutes with rotation. The streptavidin agarose was washed on a chromatography column (Bio-Rad, Hercules, CA) with SA binding buffer (1 M NaCl, 10 mM HEPES, pH 7.2, 5 mM EDTA), with SA urea wash buffer (8 M urea, 0.1 M Tris(hydroxymethyl)aminomethane, pH 7.4), with SA basic wash buffer (20 mM NaOH, 1 mM EDTA) and with water. For the first round of selection, the streptavidin agarose was used directly in the PCR amplification reaction (50 μl streptavidin agarose beads per 1 ml PCR). Every round was assayed by scintillation counting of the ³⁵S-methionine-labelled proteins (from translation to reverse transcription) or of the ³²P-labelled cDNA (after reverse transcription) to measure the efficiencies of the various steps. These data were then used to determine that the number of purified individual protein sequences introduced into the round 1 ligation reaction step (incubation with biotin-PC-RNA and splint RNA) was 4x10¹², based on the proportion of total methionine (translation) and total dATP (reverse transcription) incorporated into the mRNA-displayed proteins, and the efficiency of each of the subsequent purification steps.
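The molecule counts quoted for round 1 can be placed on a common scale with a short calculation. The sketch below is a rough bookkeeping aid, not a reconstruction of the authors' scintillation-counting analysis: it assumes that the 200 nM RNA template concentration applies to the full 10 ml translation volume, and it simply compares the resulting number of templates with the 4x10¹³ and 4x10¹² figures stated above. All names are illustrative.

```python
"""Round 1 yield bookkeeping from the concentrations and counts in the text."""

AVOGADRO = 6.02214076e23  # molecules per mole


def molecule_count(concentration_molar: float, volume_liters: float) -> float:
    """Number of molecules in a solution of given molarity and volume."""
    return concentration_molar * volume_liters * AVOGADRO


# Assumption: 200 nM cross-linked RNA template throughout the 10 ml translation.
RNA_TEMPLATES = molecule_count(200e-9, 10e-3)

# Counts stated in the text.
FUSIONS_AFTER_OLIGO_DT = 4e13   # mRNA-displayed proteins after oligo(dT) elution
FUSIONS_INTO_LIGATION = 4e12    # purified proteins entering the ligation step


def fraction(part: float, whole: float) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


if __name__ == "__main__":
    print(f"RNA templates in translation: {RNA_TEMPLATES:.2e}")
    print(
        "fusion yield after oligo(dT): "
        f"{fraction(FUSIONS_AFTER_OLIGO_DT, RNA_TEMPLATES):.1%}"
    )
    print(
        "recovery through Flag purifications and RT: "
        f"{fraction(FUSIONS_INTO_LIGATION, FUSIONS_AFTER_OLIGO_DT):.0%}"
    )
```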

This procedure was repeated for 17 rounds except for the following changes: in round 2 and in all subsequent rounds the translation reaction was 2 ml, only 400 μl of streptavidin agarose were used and, directly before the PCR amplification, the streptavidin agarose beads were aliquoted in a 50% PBS slurry to 100 μl open wells. The slurry was irradiated with a UV lamp (4 Watts) at 360 nm from a 1 cm distance for fifteen minutes while shaking in order to release the cDNA. The beads were filtered off and the solution was used for PCR amplification. Before rounds 9* and 11* the DNA was digested with restriction endonuclease AvaII, which recognizes a unique restriction site between the two zinc fingers, and then ligated back together with T4 DNA ligase to achieve a recombination of the two halves of the proteins. The input DNA in rounds 9*, 10* and 11* was further mutagenized by error-prone PCR at an average mutagenic rate of 3.8% at the amino acid level.

Cloning

Cloning was done as in Cho and Szostak, Chem. Biol. 13: 139-147, 2006, with some changes. In order to analyze the results of the selection, the cDNA of the respective round was cloned into the pCR®-TOPO vector (TOPO TA Cloning, Invitrogen, Carlsbad, CA) and the individual clones were sequenced. To express the proteins in E. coli, the ligase genes were amplified with primers BS63 and BS65, and the wild-type hRXRα motif was amplified with primers BS68 and BS70. The PCR products were digested with NdeI and XhoI and cloned into the pIADL14 vector (see, e.g., McCafferty et al., Biochemistry 36: 10498-10505, 1997) to yield the MBP-fusion proteins or into the pET24a vector (Novagen, Madison, WI) to yield the protein without any fusion partner.

Sequence analysis

For sequence alignments the following software was used: Seqlab of the GCG Wisconsin Package™, BioEdit, and MultAlin.

Ligation activity assay of mRNA-displayed ligases by gel shift

The sequences of 18 individual ligases were amplified from their respective pCR®-TOPO vector with primers BS3long / BS24RXR2 and separately subjected to one round of selection as described above. After the incubation with HO-substrate (PC-biotin) and splint for five hours, the ligation reaction mixture was quenched with 10 mM EDTA / 8 M urea and was then mixed with an excess of streptavidin (Pierce, Rockford, IL) and separated by denaturing PAGE. The gel was analyzed using a GE Healthcare (Amersham Bioscience, Piscataway, NJ) phosphorimager and ImageQuant software.

Expression of ligases and wild-type hRXRα in E. coli

All proteins were expressed in Rosetta BL21 (DE3) cells (Novagen, Madison, WI) containing the recombinant plasmids at 37°C in LB broth containing 50 μg/ml kanamycin. Cells were harvested, resuspended in lysis buffer (400 mM NaCl, 5 mM 2-mercaptoethanol, 20 mM HEPES, pH 7.5, 100 μM ZnCl₂, 10% glycerol) and sonicated. After centrifugation, the supernatant was applied to an amylose resin column (New England Biolabs, Beverly, MA) in the case of the MBP-fusion proteins. The immobilized protein was washed and then eluted with amylose elution buffer (150 mM NaCl, 5 mM 2-mercaptoethanol, 20 mM HEPES, pH 7.5, 100 μM ZnCl₂, 10 mM maltose) and stored at 4°C for further use.

In order to purify the proteins lacking the MBP-fusion the supernatant after centrifugation was applied to a Ni-NTA resin column (Qiagen, Hilden, Germany) instead. The immobilized protein was washed and then eluted with acidic Ni-NTA elution buffer (20 mM NaOAc, pH 4.5, 400 mM NaCl, 5 mM 2-mercaptoethanol, 100 μM ZnCl₂) directly into a 1 M HEPES, pH 7.5 solution to yield a final concentration of 100 mM HEPES. For use in CD and NMR spectroscopy experiments, the protein was further purified by FPLC (BioCAD Sprint Perfusion System) using a Sephadex-200 gel filtration column (Pharmacia Biotech, Uppsala, Sweden) with isocratic elution in 150 mM NaCl, 5 mM 2-mercaptoethanol, 20 mM HEPES, pH 7.4, 100 μM ZnCl₂ at 4°C. The proteins were stored at 4°C for further use. Protein concentration was determined by the Bradford method.

Ligation activity assay of free ligases

20 μM PPP-substrate (11mer), 15 μM splint and 10 μM 5'-³²P-labelled HO-substrate (11mer) were incubated with 5 μM ligase in reaction buffer (100 mM NaCl, 20 mM HEPES, pH 7.5, 100 μM ZnCl₂) for the indicated time and separated and analyzed as above.

The k_obs values were determined by fitting the ratio of product concentration divided by enzyme concentration against time to a linear equation, and are the average of three independent experiments measured at less than 10% product formation. The standard deviation is provided.

Detection of pyrophosphate

The MBP fusion of ligase #4 (purified on amylose column) was immobilized on Ni-NTA resin (Qiagen, Hilden, Germany), washed with buffer (150 mM KCl, 5 mM 2-mercaptoethanol, 50 mM HEPES, pH 7.4, 0.01% w/v Triton X-100, 100 μM ZnCl₂) and eluted in acidic elution buffer (50 mM NaOAc, pH 4.5, 150 mM NaCl, 5 mM 2-mercaptoethanol, 100 μM ZnCl₂). The ligase was then dialyzed against 150 mM NaCl, 5 mM 2-mercaptoethanol, 20 mM HEPES, pH 7.5, 100 μM ZnCl₂. The ligase (3 μM) was incubated with 6 μM γ-³²P-labelled PPP-substrate (11mer), 9 μM splint, and 12 μM HO-substrate (11mer). The reactions were separated by thin-layer chromatography on PEI cellulose plates, developed in 0.5 M KH₂PO₄ at pH 3.4.

Analysis of metal content

The MBP-fusion proteins of ligase #4 and wild-type hRXRα (purified on amylose column) were dialyzed three times against buffer (100 mM NaCl, 5 mM 2-mercaptoethanol, 20 mM HEPES at pH 7.5; pre-treated with Chelex 100 beads (BioRad) for three hours and filtered) at a ratio of 1/1000. The metal content of 4 μM samples was measured with an Inductively Coupled Plasma Emission Spectrometer (Jarrell-Ash 965 ICP, University of Georgia).

Circular dichroism spectroscopy

CD spectra were recorded on an Aviv CD Spectrometer Model 202. Wavelength scans were performed in 15 mM NaCl, 0.5 mM 2-mercaptoethanol, 2 mM HEPES at pH 7.5, 10 μM ZnCl₂ and 100 μM ligase #6 at 25°C in a 0.1 mm cuvette at 1 nm bandwidth in 1 nm increments with an averaging time of four seconds. Thermal denaturation of 324 μM ligase #6 in 150 mM NaCl, 5 mM 2-mercaptoethanol, 50 mM HEPES, pH 7.4, 100 μM ZnCl₂ was monitored at 222 nm from 5°C to 90°C in 4°C increments and an equilibration time of two minutes at each temperature step in a 1 mm cuvette at 1.5 nm bandwidth with an averaging time of 10 seconds.

NMR spectroscopy

¹H¹⁵N-NMR spectra were recorded on Bruker 500 MHz and 600 MHz NMR instruments with either uniformly ¹⁵N-labelled or selectively ¹⁵N-cysteine labelled protein (0.3 mM) in 10% D₂O, 150 mM NaCl, 5 mM 2-mercaptoethanol, 50 mM HEPES, pH 7.4, 100 μM ZnCl₂. Protein samples were prepared from minimal media cultures using ¹⁵N-labelled NH₄Cl as the sole source of nitrogen or ¹⁵N-labelled cysteine as sole source of cysteine, respectively.

Oligonucleotides (DNA unless stated otherwise)

The following oligonucleotides were used in the experiments described herein.

BS47, 5'-TTCTAATACGACTCACTATAGGAGACTCTTT

BS48, 5'-AAAGAGTCTCCTATAGTGAGTCGTATTAGAA

BS50, 5'-PTTTTTTTTTTTTTTTTTTCCCAGATCCAGACATTC

BS56, 5'-CATATGGGAATGTCTGGATCTGGGAAAAAAAAAAAAAAAAAAAAAAGAGTCTCCGCGAACGTTAGACAGTGTGACTTCGTCATGCTATTCA

BS3long, 5'-TCTAATACGACTCACTATAGGGACAATTACTATTTACAATTACAATGGACT

BS24RXR2, 5'-TTAATAGCCGGTGCCAGATCCAGACATTCCCATAGAACCGCCATGATGATG

XL-PSO, X(tagccggtg)AAA AAA AAA AAA AAA ZZ ACC P; X=psoralen C6, lower case=2'OMe, Z=spacer 9, P=puromycin (Glen Research, Sterling, VA); stretch of A's and ACC was DNA

HO-substrate (PC-biotin), 5'-PC-UCACACUGUCUAACGUUCGC; PC biotin = PC biotin phosphoramidite (Glen Research, Sterling, VA); all nucleotides are RNA

splint RNA, 5'-GAGUCUCCGCGAACGU; all nucleotides are RNA

BS63, 5'-AGGATTATAGCATATGGACTACAAGGACGACGACG

BS65, 5'-ATGTTCAGACCTCGAGTTAATAGCCGGTGCCAGAT

BS68, 5'-AGGATTATAGCATATGGACTACAAGGACGACGACGACAAGGGCGGAAAGCACATCTGC

BS70, 5'-TCATTCAGACCTCGAGTTAATAGCCGGTGCCAGATCCAGACATTCCCATAGAACCGCCATGATGATGGTGGTGGTGACTACCTACCTCCTCCTGCACGGCTTCC

HO-substrate (11mer), 5'-CUAACGUUCGC; all nucleotides are RNA

BS-LigRT-70 (RNA / DNA chimera), 5'-PPPGGAGAUUCACUAGCUGGUUUTGTACGATTCGATGACGA-HEG4-TTTTTTTTTTTTTTTCCCAGATCCAGACATTC. This oligonucleotide was prepared in a similar way as the RT-primer for the original selection, by ligation of a transcribed RNA (5'-PPPGGAGAUUCACUAGCUGGUUU) and a synthetic DNA oligonucleotide; "HEG4" stands for 4 units of hexaethylene glycol and provides a flexible linker.

BS-lig-bio20 (RNA), 5'-PC biotin-UCACACUGUCUAACGUUCGC

BS-LigSpl-40 (RNA), 5'-AAACCAGCUAGUGAAUCUCCGCGAACGUUAGACAGUGUGA

DNA library sequence

5'-TCTAATACGACTCACTATAGGGACAATTACTATTTACAATTACAATGGACTACAAGGACGACGACGACAAGGGCGGAAAGCACATCTGCGCCATCTGTGGAGAT N₃₆ TCCTGTGAGGGCTGTAAAGGCTTCTTCAAGCGCACCGTGAGAAAGGACCTGACCTACACCTGTCGGGACAACAAGGATTGT N₂₇ TGCCAGTACTGTAGGTACCAGAAGGCCCTCGCCATGGGCATGAAAAGGGAGGCCGTGCAGGAAGAGGTAGGTAGTCACCACCACCATCATCATGGCGGTTCTATGGGAATGTCTGGATCTGGCACCGGCTATTAA

Sequences of selected clones from round 17

B5, MDYKDDDDKSGKHICAICGDYIPEEDSHRDGDSCEGCKGFSKRTVRKDLTYTCRDYKNCESYHKCSDLCQYCRYQKALATGMKREAVQEGVGIHHQHHHGGSMGMSGSGTGY

F9, MDYKDDDDKGGRHICAICGDWTTADTKTQYDSCEGCKSFSKRTVRKDPTYTCRDYKNCESYHKCSDLCQYCRYQKALAMGTKRGAVQEEVGSHHQHHHGGSMGMSGSGTGY

E3, MDYKDDDDKGGKHICAICGDVVDTADAKTQYDSCGGCK GIPKRTERKELTYTCRDYKNCESYHKCSDLCLYCRYQLDLAIHHQHH HGGSMGMSGSGTGY

G9, MDYKDDDDKGGKHSCSICGDWATADTKFQYDSCEGCK GSSKRTVRKDLTYTCRDYKNCESYHKCSDLCQDRRNQKALAIHHQ HHHGGSMGMSGSGTGY

C4, MDYKDDDDKSGKHVCAICGDVLYENDYKTSDNSGEGCKGVYKRTVRKDMTYTHRDHRNCECYHLCINQCQYCRYQKALAKGMKREAVQEEAGSHHQHHHGGSMGMSGSGTGY

E5, MDYKDDDDKSGKHVCAICGDVLYENDYKTSDNSGEGCKGFYKRTERKDMTYTHRDHRNCECYHLCINQCQYCRYQKALAKGMKREAVQEEAGSHHQHHHGGSMGMSGSGTGY

H4, MDYKDDDDKSGKHVCAICGDVLYENDYKTSDNSCEGRKGFYKRSVRKDPTYTHRDHRNCECYHLCINQCQYCRYQKALAKGMKREAVQEEAGSHHHHHGGSMGMSGSGTGY

d1, MDYKDDDDKSGKHVCAICGDVLYENDYKTSDNSGEGCKGFYKRTVRKDMTYTHRDHRNCECYHLCINQCQYCRHQRALAKGTKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

E7, MDYKDDDDKGGKHICAICGDIIADTRDYKSGDSCEGCNS TFKRTVRRDLTYTSRDNKNCERYHLCINQCQYCRYQKALATGTKR EWQDEAGSHHQHHHGGSMGMSGSGTGY

B1, MDYKDDDDKGGKHICAICGDSLRDTHDYKRGDSCEGCKGFFKRTVRKDLTYTCRDYKYRESYHKCSDLCQYCRYQKALAIHHQHHHGGSMGMSGSGTGY

C5, MDYKDDDDKGGKHICAICGDQLPNDMNDKDYKSYEGSKGPFKRTARKDLTNTCRDYKYRESYHKCLDLCQYCRYQKALATHHHHHHGGSMGMSGSGTGY

D4, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDSREGRQ GFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYRRALAKGTK RGAVQEEVGIHHQHHHGGSMGMSGSGTGY

_G3, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

E8, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDSREGRQGFFKRTLRKDLTYPCRDYKYRESYHKCQDLCQYCRYQRALAKGTKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

B2, MDYKDNDDKGGKHICAIGGDILDDDYDYKQTDSREGRQ GFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTK REAVQEEAGSHHQHHHGGSMGMSGSGTGY

A3, MDYKDNDDKGGKHICAIGGDILDDDYDYKQTDSREGRQ GFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGT KREAVQEEAGSHHQHHHGGSMGMSGSGTGY

C1, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDSREGRQ GFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGT KREAVQEEVGIHHQHHHGGSMGMSGSGTGY

A7, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRHQRALAKGTKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

F8, MDYKDNDDKGGKHICAICGDILNDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

b9, MDYKDNDDKGGKHICAICGDILNDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

H11, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

_G1, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDSREGRQ GFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTK REAVQEEVGIHHQHHHGGSMGMSGSGTGY

H2, MDYKDNDDKGGRHICAICGDILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

G2, MDYKDNDDKGGRHICAICGDILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

H1, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDPREGRQ GFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTK REAVQEEVGIHHQHHHGGSMGMSGSGTGY

F4, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDSREGRQ GFFKRTLRKDLTYSCRDYKYRESYHKCLDLCQYCRYQRALAKGTK REAVQEEVGIHHQHHHGGSMGMSGSGTGY

H9, MDYKDNDDKGGKHICAICGEILDDDYDYKQTDSREGRQ GFFKRTLRKDLTYSCRDYKYRESYHECQDLCQYCRYQRALAKGTK REAVQEEVGVHHQHHHGGSMGMSGSGTGY

_G4, MDYKDNDDKGGKHICAICGDILDDDYDYKQTDSREGRQAFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTKREAMQEEVGIHHQHHHGGSMGMSGSGTGY

D2, MDYKDNDDKGGKHICAICGDTLDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCQDLCQYCRYQRALAKGTKGEAVQEEVGIHHQHHHGGSMGMSGSGTGY

D9, MDYKDDDDKGGMHICAICGDTLSDAKDYKIDDSSEGSKGFFKRAVRKDQTYTCRDYKYRESYHKCQDLCQCCRYQRALAKGTKREAVQEGVGIHHQHHHGGSMGMSGSGTGY

c8, MDYKDDDDRGRKHICAICGDYIDYKIEDHEDDSREDCKGFLKRTVRKDLTYSCRDYKYRESYHKCSDLCQYCRYQKALAMGKNKEAVQEEVGIHHQHHHGGSMGMSGSGTGY

AL0, MDYKDDDKGGKHICAICGDWATADTKTQYDSCEGCKG FSKRTVRRGLTYTCRDYKNCESYHKCSELCQYCRYQRALAMGLKR EAVQEEVGIHHLHHHGGSMGMSGSGTGY

E9, MDYKDDDKGGKHICAICGDWATADTKTQYDSCEGCKGFSKRTVRRGLTYTCRDYKNCESYHKCSELCQYCRYQRALAMGLKREAVQEEVGIHHLHHHGGSMGMSGSGTGY

D8, MDYKDDDDKGGKHICAICGDWSTTDTKTQYDSCEGCKGFSKRTVGKDLTYTCRDYKNCESYHKCSDLCQYCKYQKALAVGMKREAVQEEVGSHHQHHHGGSMGMSGSGTGY

B7, MDYKDDDDKGRKHICAICGEWTTADTKIQYDSCEGCRG LSKRTVRKDLTYTCRDYKNCESYHKCSDLCQYCRYQKALAKSTKR EAAQEGVGSHHQHHHGGSMGMSGSGTGY

CT, MDYKDDDDEGGRHICAICGDWATADTRTQYDSCEGCKGSSKRTVRKDLTYICQDYKNCESYHKCSDLCQYCRYQKALAMGMKREAVQEEVGIHHQHHHGGSMGMSGSGTGY

E4, MDYKDDDDKGGKHICAVCGDYISAVDTQSKGDSCEGCK GFFNRTEKKDLSYTCRDYKNCESYHRCQDLCQYCRYQKALAMGV EREAVQNWGIHHQHHHGGSMGMSGSGTGY

F7, MDYKDDDDKGGRHICAICGNNAEDYKHTDMDLTYTDRDYKNCESYHKCQDLCQYCRYQKALAMGIKREAVQEEVGSHHQHHHGGSMGMSGSGTGY

D11, MDYKDDDDKGGRHICAICGNNAEDYKHTDMDLTYTDRDYKNCESYHKCQDLCQYCRYQKALAMGIKREAVQEEVGSHHQHHHGGSMGMSGSGTGY

G7, MDYKDDDDKGGRHICAICGNNAEDYKHTDMDLTYTDRDYKNCESYHKCQDLCQYCRYQKALAMGIKREAVQEEVGSHHQHHHGGSMGMSGSGTGY

b10, MDYKDDDDKGGRHICAICGNNAEDYKHTDMDLTYTDRDYKNCESYHKCQDLCQYCRYQKALAMGIKREAVQEEVGSHHQHHHGGSMGMSGSGTGY

d10, MDYKDDDDKGGRHICAICGNNAEDYKHTDMDLTYTDRDYKNCESYHKCQDLCQYCRYQKALAMGIKREAVQEEVGSHHQHHHGGSMGMSGSGTGY

h1, MDYKDDDDKGGRHICAICGNNAEDYKHTDMDLTYTDRDYKNCESYHKCQDLCQYCRHQKALAMGIKREAVQEEVGSHHQHHHGGSMGMSGSGTGY

e10, MDYKDDDDKDGKHICAICGDTVTNTDYKTPDLTSTCRDYKNRESYHKCSDLCQYCRYQKALAMGTKREAAQEEVGSHHQHHHGGSMGMSGSGTGY

b8, MDYKDDDDKDGKHICAICGDTVTNTDYKTPDLTSTCRDYKNRESYHKCSDLCRYCRYQKALAMGTKREAAQEEVGSHHQHHHGGSMGMSGSGTGY

(17, MDYKDDDDKDGKHICAICGDTVTNTDYKTTDLTSTCRDYKNRESYHKCSDLCQYCRYQKALAMGTKREAAQEEVGSHHQHHHGGSMGMSGSGTGY

c2, MDYKDNDDKGGKHICAICGDFTNIDYKDEGQTYTCRDYKYRESYHKCSDLCQYCRYQKALAVGMNREAVRDEVGSHHQHHHGGSMGMSGSGTGY

c10, MDYKDNDDKGGRHICAICGNNAEDYKHTDMDLTYTDRDYKNCESYHKCSDLCQHCRYLKAPAMGMRGVAVQEEVGSHHQHHHGGSMGMSGSGTGY

c3, MDYKDDDDKDGEHICAICGDTVTNTDYKTPDPTSTCRDYKNRESYHKCSDLCQYCRYQKALAMGMKSEAAQEEIGAHHQHHHGGSMGMSGSGTGY
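The DNA library sequence given before these clones can be related to the clone sequences by translating its open reading frame. The sketch below is illustrative only and makes several assumptions: it uses the standard genetic code, it uses only a fragment of the library sequence as transcribed in this listing (from the ATG start codon to shortly after the N₂₇ stretch, with whitespace removed), and it writes each fully randomized codon as `X`. The names `translate` and `randomized_ranges` are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}

# Fragment of the library ORF as transcribed above (assumption: the OCR
# text is accurate and the reading frame starts at the first ATG).
CONSTANT_1 = "ATGGACTACAAGGACGACGACGACAAGGGCGGAAAGCACATCTGCGCCATCTGTGGAGAT"
CONSTANT_2 = (
    "TCCTGTGAGGGCTGTAAAGGCTTCTTCAAGCGCACCGTGAGAAAGGACCTG"
    "ACCTACACCTGTCGGGACAACAAGGATTGT"
)
CONSTANT_3_START = "TGCCAGTACTGTAGGTAC"
LIBRARY_ORF = CONSTANT_1 + "N" * 36 + CONSTANT_2 + "N" * 27 + CONSTANT_3_START


def translate(orf: str) -> str:
    """Translate an ORF codon by codon; codons containing N become 'X'."""
    protein = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[i:i + 3]
        protein.append("X" if "N" in codon else CODON_TABLE[codon])
    return "".join(protein)


def randomized_ranges(protein: str):
    """Return 1-based inclusive (start, end) ranges of consecutive X residues."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"X+", protein)]


if __name__ == "__main__":
    protein = translate(LIBRARY_ORF)
    print(protein)
    print(randomized_ranges(protein))  # [(21, 32), (60, 68)]
```

Under these assumptions the two randomized stretches correspond to 12 and 9 codons, at residues 21-32 and 60-68 of the translated library. These are the same residue ranges that the claims recite for several of the SEQ ID NOs.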

Sequences of ligases selected at 65°C

A-6 MGAPVPYPDPLEPRGGKHICAICGDILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCSDLCQYCRYQKALAIHHQHHHGGSMGMSGSGTGY* (SEQ ID NO.:20)

B-7 MGAPVPYPDPLEPRGGKHICAICGEILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCSDLCQYCRYQKALAIHHQHHHGGSMGMSGSGTGY* (SEQ ID NO.:21)

C-10C MGAPVPYPDPLEPRGGKHICAICGNNAEDYKHTDMDLTYTDRDYKNCESYHKCSDLCQYCRYQKDLAIHHQHHHGGSMGMSGSGTGY* (SEQ ID NO.:22)

D-10H MGAPVPYPDPLEPRGGKHICAICGDILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKYRESYHKCSDLCQSCRYQKALAIHHQHHHGGSMGMSGSGTGY* (SEQ ID NO.:23)
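The claims refer to sequences that are "at least 80% identical" to a stated residue range of a SEQ ID NO. As a rough illustration of how such a comparison might be made, the sketch below computes an ungapped, position-by-position percent identity over a 1-based inclusive range. This is an assumption made for illustration: the specification may define identity by an alignment program instead, and the function name `percent_identity` is hypothetical. The sequences are SEQ ID NOs. 20 and 21 as transcribed above, with the terminal asterisk omitted.

```python
# Sequences as transcribed in this listing (terminal '*' omitted).
SEQ_ID_20 = (
    "MGAPVPYPDPLEPRGG"
    "KHICAICGDILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKY"
    "RESYHKCSDLCQYCRYQKALAIHHQHHHGGSMGMSGSGTGY"
)
SEQ_ID_21 = (
    "MGAPVPYPDPLEPRGG"
    "KHICAICGEILDDDYDYKQTDSREGRQGFFKRTLRKDLTYSCRDYKY"
    "RESYHKCSDLCQYCRYQKALAIHHQHHHGGSMGMSGSGTGY"
)


def percent_identity(candidate: str, reference: str, start: int, end: int) -> float:
    """Ungapped percent identity over residues start..end (1-based, inclusive).

    Simplifying assumption: both sequences are already in register, so
    residue i of the candidate is compared with residue i of the reference.
    """
    ref_window = reference[start - 1:end]
    cand_window = candidate[start - 1:end]
    if not ref_window or len(ref_window) != len(cand_window):
        raise ValueError("range must lie within both sequences")
    matches = sum(a == b for a, b in zip(cand_window, ref_window))
    return 100.0 * matches / len(ref_window)


if __name__ == "__main__":
    identity = percent_identity(SEQ_ID_21, SEQ_ID_20, 17, 88)
    print(f"{identity:.1f}% identical over residues 17-88")
    print("meets 80% threshold:", identity >= 80.0)
```

A gapped alignment would be needed for candidates that carry insertions or deletions relative to the reference; this sketch does not handle that case.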

Other Embodiments

All publications, patents, and patent applications mentioned in the above specification are hereby incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the art are intended to be within the scope of the invention. Other embodiments are in the claims.

What is claimed is:

Claims

1. A polypeptide capable of catalyzing a ligation reaction between a first RNA molecule comprising a hydroxyl group at the 3' position and a second RNA molecule comprising a triphosphate group at the 5' position, said ligation reaction occurring in the presence of an oligonucleotide that is complementary to at least three consecutive bases of said first RNA molecule and at least three consecutive bases of said second RNA molecule, wherein said ligation reaction results in the formation of a phosphodiester bond between said 3' position of said first RNA molecule and said 5' position of said second RNA molecule.

2. The polypeptide of claim 1, wherein said ligation reaction proceeds at least 10,000 times faster than an uncatalyzed control reaction comprising said first and second RNA molecules and said oligonucleotide.

3. The polypeptide of claim 1, comprising an amino acid sequence that is at least 80% identical to amino acids 17-88 of any one of SEQ ID NOs.: 20, 21, or 23, or amino acids 17-70 of SEQ ID NO.: 22.

4. The polypeptide of claim 3, wherein said ligation reaction is performed at 65°C.

5. The polypeptide of claim 1, comprising an amino acid sequence that is at least 80% identical to amino acids 12-79 of any one of SEQ ID NOs.: 1, 3-5, 8-15, or 17-18; amino acids 12-61 of SEQ ID NO.: 2; amino acids 12-62 of SEQ ID NOs.: 6 or 7; or amino acids 11-78 of SEQ ID NO.: 16.

6. The polypeptide of claim 5, comprising an amino acid sequence that is at least 80% identical to amino acids 12-79 of any one of SEQ ID NOs.: 1 or 3-5; amino acids 12-61 of SEQ ID NO.: 2; or amino acids 12-62 of SEQ ID NOs.: 6 or 7.

7. The polypeptide of claim 5, comprising amino acids 17-88 of any one of SEQ ID NOs.: 20, 21, or 23, or amino acids 17-70 of SEQ ID NO.: 22.

8. The polypeptide of claim 5, comprising amino acids 12-79 of any one of SEQ ID NOs.: 1, 3-5, 8-15, or 17-18; amino acids 12-61 of SEQ ID NO.: 2; amino acids 12-62 of SEQ ID NOs.: 6 or 7; or amino acids 11-78 of SEQ ID NO.: 16.

9. The polypeptide of claim 8, comprising amino acids 12-79 of any one of SEQ ID NOs.: 1 or 3-5; amino acids 12-61 of SEQ ID NO.: 2; or amino acids 12-62 of SEQ ID NOs.: 6 or 7.

10. The polypeptide of claim 5, comprising amino acids 12-90 of any one of SEQ ID NOs.: 1, 4, 8, 10, 11, 13-15, 17, or 18; amino acids 12-72 of SEQ ID NO.: 2; amino acids 12-79 of any one of SEQ ID NOs.: 3, 5, 9, or 12; amino acids 12-73 of SEQ ID NOs.: 6 or 7; or amino acids 11-89 of SEQ ID NO.: 16.

11. The polypeptide of claim 10, comprising amino acids 12-90 of SEQ ID NOs.: 1 or 4; amino acids 12-72 of SEQ ID NO.: 2; amino acids 12-79 of SEQ ID NOs.: 3 or 5; or amino acids 12-73 of SEQ ID NOs.: 6 or 7.

12. The polypeptide of claim 3, comprising any one of SEQ ID NOs.: 20-23.

13. The polypeptide of claim 5, comprising any one of SEQ ID NOs.: 1-18.

14. The polypeptide of claim 13, comprising any one of SEQ ID NOs.: 1-7.

15. The polypeptide of claim 3, comprising an amino acid sequence that differs from amino acids 26-37 or 64-73 of any one of SEQ ID NOs.: 20, 21, or 23 by at least one amino acid.

16. The polypeptide of claim 5, comprising an amino acid sequence that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 1 by at least one amino acid, or that differs from amino acids 21-29 or 42-50 of SEQ ID NO.: 2 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 3 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 4 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 5 by at least one amino acid, or that differs from amino acids 21-30 or 43-51 of SEQ ID NO.: 6 by at least one amino acid, or that differs from amino acids 21-30 or 43-51 of SEQ ID NO.: 7 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 8 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 9 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 10 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 11 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 12 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 13 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 14 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 15 by at least one amino acid, or that differs from amino acids 20-31 or 59-67 of SEQ ID NO.: 16 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 17 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 18 by at least one amino acid.

17. The polypeptide of claim 16, comprising an amino acid sequence that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 1 by at least one amino acid, or that differs from amino acids 21-29 or 42-50 of SEQ ID NO.: 2 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 3 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 4 by at least one amino acid, or that differs from amino acids 21-32 or 60-68 of SEQ ID NO.: 5 by at least one amino acid, or that differs from amino acids 21-30 or 43-51 of SEQ ID NO.: 6 by at least one amino acid, or that differs from amino acids 21-30 or 43-51 of SEQ ID NO.: 7 by at least one amino acid.

18. A cell comprising the polypeptide of any one of claims 1-17.

19. A method for the selection of a polypeptide capable of catalyzing a ligation reaction between a first RNA molecule and a second RNA molecule, said method comprising the steps of:

(a) providing a population of candidate RNA molecules, each of which comprises a translation initiation sequence and a start codon operably linked to a candidate polypeptide coding sequence and each of which is covalently bonded to a peptide acceptor at the 3' end of said candidate polypeptide coding sequence, said peptide acceptor being a molecule that can be added to the C-terminus of a growing polypeptide chain by the catalytic activity of a ribosomal peptidyl transferase;

(b) in vitro translating said candidate polypeptide coding sequences of said candidate RNA molecules to produce a population of candidate RNA-polypeptide fusions; and

(c) selecting a desired RNA-polypeptide fusion based on RNA ligase activity, thereby selecting said polypeptide capable of catalyzing said ligation reaction.

20. The method of claim 19, wherein said selecting step (c) occurs at 65°C.

21. The method of claim 19, wherein said selecting in step (c) occurs in the presence of an oligonucleotide that is complementary to at least three consecutive bases of said first RNA molecule and at least three consecutive bases of said second RNA molecule.

22. The method of claim 19, wherein said first RNA molecule comprises a hydroxyl group at the 3' position.

23. The method of claim 22, wherein said second RNA molecule comprises a triphosphate group at the 5' position.

24. The method of claim 22, wherein said second RNA molecule comprises a phosphoryl group at the 5' position.

25. The method of claim 22, wherein said second RNA molecule comprises an imidazolide, an imidazolide derivative, a phosphoramidate, a carboxylic anhydride, a mixed anhydride, or an activated phosphodiester at the 5' position.