WO1993016178A2 - Sequences characteristic of human gene transcription product - Google Patents

Info

Publication number
WO1993016178A2
Authority
WO
WIPO (PCT)
Prior art keywords
seq
polynucleotide
sequences
sequence
est
Application number
PCT/US1993/001294
Other languages
French (fr)
Other versions
WO1993016178A3 (en)
Inventor
Craig J. Venter
Mark D. Adams
Ruben F. Moreno
Original Assignee
The United States Of America, As Represented By The Secretary, Department Of Health And Human Services
Application filed by The United States Of America, As Represented By The Secretary, Department Of Health And Human Services
Publication of WO1993016178A2
Publication of WO1993016178A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096 Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the present invention relates to newly identified polynucleotide sequences corresponding to transcription products of human genes, and to complete gene sequences associated therewith.
  • This invention relates to human genes. Identification and sequencing of human genes is a major goal of modern scientific research. The sequence of human genes is more than just a scientific curiosity. For example, by identifying genes and determining their sequences, scientists have been able to make large quantities of valuable human "gene products." These include human insulin, interferon, Factor VIII, tumor necrosis factor, human growth hormone, tissue plasminogen activator, and numerous other compounds. Additionally, knowledge of gene sequences can provide the key to treatment or cure of genetic diseases (such as muscular dystrophy and cystic fibrosis). The present invention represents a quantum leap forward in civilization's knowledge of human gene sequences.
  • the present invention is based on identification and characterization of gene segments.
  • Genes are the basic units of inheritance. Each gene is a string of connected bases called nucleotides. Most genes are formed of deoxyribonucleic acid, DNA. (Some viruses contain genes of ribonucleic acid, RNA.) The genetic information resides in the particular sequence in which the bases are arranged. A short sequence of nucleotides is often called a polynucleotide or an oligonucleotide.
  • polypeptides are built from long strings of individual units. These units are amino acids.
  • the nucleotide sequence of a gene tells the cell the sequence in which to arrange the amino acids to make the polypeptide encoded by that gene.
  • chains of up to about 200 amino acids are called polypeptides, while proteins are larger molecules made up of polypeptide subunits; both types of molecules are referred to generally herein as polypeptides.
  • a triplet of nucleotides (a codon) in DNA codes for each amino acid or signals the beginning or end of the message (start and stop codons).
  • the term codon is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the original DNA sequence is transcribed.
  • mRNA: messenger RNA
  • the mRNA in turn, can be translated into a polypeptide by the cell. This entire process is called gene expression, and the polypeptide is the gene product encoded by the gene.
  • cDNA: complementary DNA
  • genes include those which code for polypeptides, those which are transcribed into RNA but are not translated into polypeptides, and those whose functional significance does not demand that they be transcribed at all.
  • Most genes are found on large molecules of DNA located in chromosomes. Double stranded DNA carries all the information of a gene. Each base of the first strand is joined to a complementary base (hybridized) in the second strand.
  • the linear DNA molecules in chromosomes have thousands of genes distributed along their length. Chromosomes include both coding regions (coding for polypeptides) and noncoding regions; the coding regions represent only about three percent of the total chromosome sequence.
  • An individual gene has regulatory regions that include a promoter which directs expression of the gene, a coding region which can code for a polypeptide, and a termination signal.
  • the regulatory DNA sequence is usually a noncoding region that determines if, where, when, and at what level a particular gene is expressed.
  • the coding regions of many genes are discontinuous, with coding sequences (exons) alternating with noncoding regions (introns).
  • the final mRNA copy of the gene does not include these introns (which can be much longer than the coding region itself), although it does contain certain untranslated regions that usually do not code for the polypeptide gene product.
  • Untranslated sequences at the beginning and end of the mRNA are known as 5'- and 3'-untranslated regions, respectively. This nomenclature reflects the orientation of the nucleotide constituents of the mRNA.
  • a cDNA is a DNA copy of a messenger RNA, which contains all of the exons of a gene.
  • the cDNA can be thought of as having three parts: an untranslated 5' leader, an uninterrupted polypeptide-coding sequence, and a 3' untranslated region.
  • the untranslated leader and trailing sequences are important for initiation of translation, mRNA stability, and other functions.
  • the untranslated leader and trailing sequences are called 5'- and 3'-untranslated sequences, respectively.
  • the 3' untranslated sequence is usually longer than the 5' untranslated leader, and can be longer than the polypeptide-coding sequence.
  • the untranslated regions typically have many, randomly-distributed stop codons, and do not display the nonrandom base arrangements found in coding sequences.
  • the 5'-untranslated sequence is relatively short, generally between 20 and 200 bases.
  • the 3'-untranslated sequence is often many times longer, up to several thousand bases.
  • the translated or coding sequence begins with a translational start codon (AUG or GUG) and ends with a translational stop codon (UAA, UGA, or UAG).
  • translation begins at the first "start" codon on the mRNA and proceeds to the first in-frame "stop" codon. Coding sequences can be distinguished by their nonrandom distribution of bases; numerous computer algorithms have been developed to distinguish coding from noncoding regions in this way.
  • PCR: polymerase chain reaction
  • PCR uses two oligonucleotide primers that hybridize to opposite strands.
  • Primer extension proceeds inward across the region between the two primers, and the product of DNA synthesis of one primer serves as a template for the other primer. Repeated cycles of DNA denaturation, annealing of primers, and extension result in an exponential increase in the number of copies of the region bounded by the primers.
  • a labeled segment of single-stranded DNA can be hybridized to a longer DNA sequence, such as a chromosome, to mark a specific location on the longer sequence.
  • the Human Genome Project is an effort to sequence all human DNA (the human genome).
  • the human genome is estimated to comprise 50,000 - 100,000 genes, up to 30,000 of which might be expressed in the brain (Sutcliffe, Ann. Rev. Neurosci. 11:157 (1988)).
  • It was expected that, once dedicated human chromosome sequencing began in three to five years, 12-15 years would be required to complete the sequence of the genome (Report of the Ad Hoc Program Advisory Committee on Complex Genomes, Reston, Va., Feb. 1988, D. Baltimore Ed. (NIH, Bethesda, Md, 1988)).
  • the present invention can greatly accelerate the pace at which human genes can be identified and mapped.
  • GenBank listed the sequences of only a few thousand human genes and fewer than two hundred human brain mRNAs (GenBank Release 66.0, December, 1990).
  • Genomic sequencing proponents have pointed to the difficulty of finding every mRNA expressed in all tissues, cell types, and developmental states, and have argued that much valuable information from intronic and intergenic regions, including control and regulatory sequences, will be missed by cDNA sequencing (Report of the Committee on Mapping and Sequencing the Human Genome, National Research Council (National Academy Press, Washington, D.C. 1988)). Further, sequencing of transcribed regions of the genome using cDNA libraries has heretofore been considered impractical or unsatisfactory. Libraries of cDNA were believed to be dominated by repetitive elements, mitochondrial genes, ribosomal RNA genes, and other nuclear genes comprising common or housekeeping sequences.
  • cDNA libraries would provide few sequences corresponding to structural and regulatory polypeptides or peptides. See, for example, Putney, et al., Nature 302:718-721 (1983). Putney, et al. sequenced over 150 clones from a rabbit muscle cDNA library and identified clones for 13 of the 19 known muscle polypeptides, including one new isotype but no unknown coding sequences.
  • cDNA sequencing now provides a rapid method for obtaining enormous amounts of valuable genetic information and DNA products of great utility for the biotechnology and pharmaceutical industries. Not only can many distinct cDNAs be isolated and sequenced, even partial cDNAs can be used, with conventional, well-understood methods, to isolate entire genes, and to determine the chromosomal locations and biological functions of these genes. As is demonstrated here, fragments of only a few hundred bases are sufficient, in many cases, to identify the probable function of a new human gene if it is similar in structure to a gene from another animal, or from plants or bacteria.
  • fragments of untranslated regions of a cDNA can be used to: i) isolate the coding sequence of the cDNA; ii) isolate the complete gene; iii) determine the position of the gene on a human chromosome, and hence the potential of the gene to cause a human genetic disease; and iv) determine the function of the gene by means of experiments in which the function of the native gene is disrupted by the addition of a short DNA fragment to the cell, e.g., using triple helix or antisense probes.
  • Because coding regions comprise such a small portion of the human genome, identification and mapping of transcribed regions and coding regions of chromosomes is of significant interest.
  • human sequences are valuable for chromosome mapping, human identification, identification of tissue type and origin, forensic identification, and locating disease-associated genes (i.e., genes that are associated with an inherited human disease, whether through mutation, deletion, or faulty gene expression) on the chromosome.
  • sequences of the present invention were ascertained using a fast approach to cDNA characterization. This approach could facilitate the tagging of most expressed human genes within a few years at a fraction of the cost of complete genomic sequencing, provide new genetic markers, provide new DNA-based therapeutics and diagnostics, and provide other valuable nucleotide reagents.
  • ESTs: Expressed Sequence Tags
  • STSs: sequence tagged sites (random genomic DNA)
  • aspects of the present invention thus include the individual ESTs, corresponding partial and complete cDNA, genomic DNA, mRNA, antisense strands, triple helix probes, PCR primers, coding regions, and constructs. Also, where one skilled in the art is enabled by this specification to prepare expression vectors and polypeptide expression products, they are also within the scope of the present invention, along with antibodies, especially monoclonal antibodies, to such expression products.
  • the single drawing Figure schematically illustrates the progression from chromosome to gene to mRNA to cDNA.
  • the sequences of the present invention were isolated from commercially available and custom made cDNA libraries using a rapid screening and sequencing technique.
  • the method comprises applying conventional automated DNA sequencing technology to screening clones, advantageously randomly selected clones, from a cDNA library.
  • the library is initially "enriched" through removal of ribosomal sequences and other common sequences prior to clone selection.
  • ESTs are generated from partial DNA sequencing of the selected clones.
  • the ESTs of the present invention were generated using low redundancy of sequencing, typically a single sequencing reaction. While single sequencing reactions may have an accuracy as low as 97%, this nevertheless provides sufficient fidelity for identification of the sequence and design of PCR primers.
  • For some genes, transcripts will not be represented in cDNA libraries, so the gene will not be identifiable by EST sequencing.
  • a new method called "exon amplification" can be used to isolate and identify transcripts of such genes.
  • Exon amplification works by artificially expressing part or all of a gene that is contained in a cloned fragment of genomic DNA such as a cosmid or yeast artificial chromosome (YAC).
  • the gene is cloned into a special vector, designed at MIT, that uses control elements from virus genes to express the protein-coding exons of the human gene of interest.
  • Exon trapping shows considerable promise as a general technique for identifying those genes in the human genome that cannot be found by cDNA cloning and EST sequencing.
  • Exon amplification will also be useful for identifying the genes in regions of genomic DNA to which disease genes have been mapped.
  • the exon amplification method can be used directly with the cosmid and YAC clones from human chromosomes that are being obtained by both NIH and DOE supported human genome centers.
  • ESTs comprise DNA sequences corresponding to a portion of nuclear encoded messenger RNA.
  • An EST is of sufficient length to permit: (1) amplification of the specific sequence from a cDNA library, e.g., by polymerase chain reaction (PCR); (2) use of a synthetic polynucleotide corresponding to a partial or complete sequence of the EST as a hybridization probe of a cDNA library, generally having 30 - 50 base pairs; or (3) unique designation of the pure cDNA clone from which the EST was derived (the EST clone) for use as a hybridization probe of a cDNA library.
  • EST-derived primer pairs and sequences amplify or detectably hybridize to a sequence from a genomic library.
  • the ESTs disclosed herein are generally at least 150 base pairs in length.
  • the length of an EST is determined by the quality of sequencing data and the length of the cloned cDNA.
  • Raw data from the automated sequencers is edited to remove low quality sequence at the end of the sequencing run.
  • High quality sequences (usually a result of sequencing templates without excessive salt contamination) generally give about 400 bp of reliable sequence data; other sequences give fewer bases of reliable data.
  • a 150 bp EST is long enough to be translated into a 50 amino acid peptide sequence. This length is sufficient to observe similarities when they exist in a database search.
  • 150 bp is long enough to design PCR primers from each end of the sequence to amplify the complete EST. Sequences shorter than 150 bp are difficult to purify and use following PCR amplification. Furthermore, a 150 bp polynucleotide is likely to give a very strong signal with low background in a screen of a genomic library.
  • Cross-hybridization among related members of a gene family can be circumvented by using the 3'-untranslated part of the cDNA alone as a probe for the chromosomal location or for the full-length cDNA or gene.
  • the 3'-untranslated region is more likely to be unique within gene families, since there is no evolutionary pressure to conserve a coding function of this region of the mRNA.
  • ESTs can be used to map the expressed sequence to a particular chromosome.
  • ESTs can be expanded to provide the full coding regions, as detailed below. In this manner, previously unknown genes can be identified.
  • cDNA libraries can be used to obtain ESTs
  • human brain cDNA libraries are exemplified and represent a preferred embodiment.
  • Suitable cDNA libraries can be freshly prepared or obtained commercially, e.g., as shown in Examples 1, 2, and 11.
  • the cDNA libraries from the desired tissue are preferably preprocessed by conventional techniques to reduce repeated sequencing of high and intermediate abundance clones and to maximize the chances of finding rare messages from specific cell populations.
  • preprocessing includes the use of defined composition prescreening probes, e.g., cDNA corresponding to mitochondria, abundant sequences, ribosomes, actins, myelin basic polypeptides, or any other known high abundance peptide; these prescreening probes used for preprocessing are generally derived from known ESTs.
  • Other useful preprocessing techniques include subtraction, which preferentially reduces the population of certain sequences in the library (e.g., see A. Swaroop et al., Nucl. Acids Res. 19, 1954 (1991)), and normalization, which results in all sequences being represented in approximately equal proportions in the library (Patanjali et al., Proc. Natl. Acad. Sci. USA 88:1943 (1991)).
  • the cDNA libraries used in the present method will ideally use directional cloning methods so that either the 5' end of the cDNA (likely to contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be selectively obtained.
  • Libraries of cDNA can also be generated from recombinant expression of genomic DNA. After they are amplified, ESTs can be obtained and sequenced, e.g., as illustrated in Example 11.
  • sequences of the present invention include the specific sequences set forth in the Sequence Listing and designated SEQ ID NO: 1 - SEQ ID NO: 2412.
  • the invention relates to those sequences of SEQ ID NOS: 1 - 2412 that comprise the cDNA coding sequences for polypeptides having less than 95% identity with known amino acid sequences (see Table 2) and more preferably less than 90% or 85% identity.
  • the invention relates to those sequences of SEQ ID NOS: 1 - 2412 that encode polypeptides having no similarity to known amino acid sequences (see Examples that follow). Precisely because they do not contain coding regions and are therefore more unique in their sequence structures, those sequences which meet neither of the preceding criteria can be most useful and are generally preferred for mapping.
  • the ESTs of the present invention generally represent relatively small coding regions or untranslated regions of human genes. Although most of these sequences do not code for a complete gene product, the ESTs of the present invention are highly specific markers for the corresponding complete coding regions.
  • the ESTs are of sufficient length that they will hybridize, under stringent conditions, only with DNA for that gene to which they correspond.
  • Suitably stringent conditions comprise conditions, for example, where at least 95%, preferably at least 97% or 98% identity (base pairing), is required for hybridization. This property permits use of the EST to isolate the entire coding region and even the entire sequence. Therefore, only routine laboratory work is necessary to parlay the unique EST sequence into the corresponding unique complete gene sequence.
  • each EST "corresponds" to a particular unique human gene. Knowledge of the EST sequence permits routine isolation and sequencing of the complete coding sequence of the corresponding gene. The complete coding sequence is present in a full-length cDNA clone as well as in the gene carried on genomic clones. Therefore, each EST "corresponds" to a cDNA (from which the EST was derived), a complete genomic gene sequence, a polypeptide coding region (which can be obtained either from the cDNA or genomic DNA), and a polypeptide or amino acid sequence encoded by that region.
  • the first step in determining where an EST is located in the cDNA is to analyze the EST for the presence of coding sequence, e.g., as described in Example 14.
  • the CRM program predicts the extent and orientation of the coding region of a sequence. Based on this information, one can infer the presence of start or stop codons within a sequence and whether the sequence is completely coding or completely noncoding. If a start codon is present, the EST can cover part of the 5'-untranslated region as well as part of the coding sequence; if a stop codon is present, it can cover part of the coding sequence as well as part of the 3'-untranslated region. If no coding sequence is present, the EST is likely derived from the 3'-untranslated sequence, because that region is longer and most cDNA library construction methods are biased toward the 3' end of the mRNA.
  • Radiolabel the isolated insert DNA, e.g., with ³²P, preferably by nick translation or random primer labeling.
  • An EST is a specific tag for a messenger RNA molecule.
  • the complete sequence of that messenger RNA, in the form of cDNA, can be determined using the EST as a probe to identify a cDNA clone corresponding to a full-length transcript, followed by sequencing of that clone.
  • the EST or the full-length cDNA clone can also be used as a probe to identify a genomic clone or clones that contain the complete gene including regulatory and promoter regions, exons, and introns.
  • ESTs are used as probes to identify the cDNA clones from which an EST was derived.
  • ESTs, or portions thereof, can be nick-translated or end-labelled with ³²P using polynucleotide kinase, by labelling methods known to those with skill in the art (Basic Methods in Molecular Biology, L.G. Davis, M.D. Dibner, and J.F. Battey, ed., Elsevier Press, NY, 1986).
  • the lambda library can be directly screened with the labelled ESTs of interest or the library can be converted en masse to pBluescript (Stratagene, La Jolla, California) to facilitate bacterial colony screening. Both methods are well known in the art.
  • filters with bacterial colonies containing the library in pBluescript or bacterial lawns containing lambda plaques are denatured and the DNA is fixed to the filters.
  • the filters are hybridized with the labelled probe using hybridization conditions described by Davis et al.
  • the ESTs, cloned into lambda or pBluescript, can be used as positive controls to assess background binding and to adjust the hybridization and washing stringencies necessary for accurate clone identification.
  • the resulting autoradiograms are compared to duplicate plates of colonies or plaques; each exposed spot corresponds to a positive colony or plaque.
  • the colonies or plaques are selected, expanded and the DNA is isolated from the colonies for further analysis and sequencing.
  • the ESTs can additionally be used to screen Northern blots of mRNA obtained from various tissues or cell cultures, including the tissue of origin of the EST clone. Northern analysis will most often produce one to several positive bands. The bands can be selected for further study based on the predicted size of the mRNA.
  • Positive cDNA clones in phage lambda are analyzed to determine the amount of additional sequence they contain using PCR with one primer from the EST and the other primer from the vector.
  • Clones with a larger vector-insert PCR product than the original EST clone are analyzed by restriction digestion and DNA sequencing to determine whether they contain an insert of the same or similar size as the mRNA on a Northern blot.
  • the complete sequence of the clones can be determined.
  • the preferred method is to use exonuclease III digestion (McCombie, W.R., Kirkness, E., Fleming, J.T., Kerlavage, A.R., Iovannisci, D.M., and Martin-Gallardo, R., Methods 3:33-40 (1991)).
  • a series of deletion clones is generated, each of which is sequenced.
  • the resulting overlapping sequences are assembled into a single contiguous sequence of high redundancy (usually three to five overlapping sequences at each nucleotide position), resulting in a highly accurate final sequence.
  • a similar screening and clone selection approach can be applied to obtaining cosmid or lambda clones from a genomic DNA library that contains the complete gene from which the EST was derived (Kirkness, E.F., Kusiak, J.W., Menninger, J., Gocayne, J.D., Ward, D.C., and Venter, J.C., Genomics 10: 985-995 (1991)). Although the process is much more laborious, these genomic clones can also be sequenced in their entirety.
  • a shotgun approach is preferred for sequencing clones with inserts longer than 10 kb (genomic cosmid and lambda clones). In shotgun sequencing, the clone is randomly broken into many small pieces, each of which is partially sequenced.
  • sequence fragments are then aligned to produce the final contiguous sequence with high redundancy.
  • An intermediate approach is to sequence just the promoter region and the intron-exon boundaries and to estimate the size of the introns by restriction endonuclease digestion (ibid.).
  • the polynucleotides of the present invention can be derived from natural sources or synthesized using known methods.
  • the sequences falling within the scope of the present invention are not limited to the specific sequences described, but include human allelic and species variations thereof and portions thereof of at least 15-18 bases. (Sequences of at least 15-18 bases can be used, for example, as PCR primers or as DNA probes.)
  • the invention includes the entire coding sequence associated with the specific polynucleotide sequence of bases described in the Sequence Listing, as well as portions of the entire coding sequence of at least 15-18 bases and allelic and species variations thereof.
  • the invention includes sequences coding for the same amino acid sequences as do the specific sequences disclosed herein.
  • sequences, constructs, vectors, clones, and other materials comprising the present invention can advantageously be in enriched or isolated form.
  • enriched means that the concentration of the material is at least about 2, 5, 10, 100, or 1000 times its natural concentration (for example), advantageously 0.01% by weight, preferably at least about 0.1% by weight. Enriched preparations of about 0.5%, 1%, 5%, 10%, and 20% by weight are also contemplated. Further, removal of clones corresponding to ribosomal RNA and "housekeeping" genes and clones without human cDNA inserts results in a library that is "enriched" in the desired clones.
  • isolated requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring).
  • a naturally-occurring polynucleotide present in a living animal is not isolated, but the same polynucleotide, separated from some or all of the coexisting materials in the natural system, is isolated.
  • It is advantageous that the sequences be in purified form.
  • purified does not require absolute purity; rather, it is intended as a relative definition.
  • Individual EST clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA.
  • the cDNA clones are not naturally occurring as such, but rather are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA).
  • the conversion of mRNA into a cDNA library involves the creation of a synthetic substance (cDNA) and pure individual cDNA clones can be isolated from the synthetic library by clonal selection.
  • creating a cDNA library from messenger RNA and subsequently isolating individual clones from that library results in an approximately 10°-fold purification of the native message.
  • Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated.
  • In a cDNA library, many species of mRNA are represented. Each cDNA clone can be interesting in its own right, but must be isolated from the library before further experimentation can be completed. In order to sequence any specific cDNA, it must be removed and separated (i.e., isolated and purified) from all the other sequences. This can be accomplished by many techniques known to those of skill in the art. These procedures normally involve identification of a bacterial colony containing the cDNA of interest and further amplification of that bacterium. Once a cDNA is separated from the mixed clone library, it can be used as a template for further procedures such as nucleotide sequencing.
  • subgroupings of 50 ESTs are contemplated (e.g., SEQ ID NOS 1-50, 51-100, 101-150, etc.) as being within the scope of this invention, as are subgroupings of 5, 10, 25, 100, 200, and 500 ESTs and corresponding sequences.
  • the present invention also includes recombinant constructs comprising one or more of the sequences as broadly described above.
  • the constructs comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention has been inserted, in a sense or antisense orientation.
  • the construct further comprises regulatory sequences, including for example, a promoter, operably linked to the sequence.
  • Bacterial: pBs, phagescript, psiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia).
  • Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).
  • Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.
  • Two appropriate vectors are pKK232-8 and pCM7.
  • Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc.
  • Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
  • the present invention relates to host cells containing the above-described construct.
  • the host cell can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a procaryotic cell, such as a bacterial cell.
  • Introduction of the construct into the host cell can be effected by calcium phosphate transfection, DEAE dextran mediated transfection, or electroporation (Davis, L., Dibner, M., Battey, J., Basic Methods in Molecular Biology, (1986)).
  • the constructs in host cells can be used in a conventional manner to produce the gene product coded by the recombinant sequence.
  • the encoded polypeptide can be synthetically produced by conventional peptide synthesizers.
  • ESTs have already been preliminarily categorized by analogy to related sequences in other organisms (see Table 2).
  • Table 10 of Example 10 categorizes particular ESTs broadly as metabolic, regulatory, and structural sequences where known. Constructs comprising genes or coding sequences corresponding to each of these categories are, therefore, specifically and individually contemplated.
  • Table 11 more particularly separates 127 new ESTs into
  • Each of the cDNA sequences identified herein can be used in numerous ways as polynucleotide reagents.
  • the sequences can be used as diagnostic probes for the presence of a specific mRNA in a particular cell type.
  • these sequences can be used as diagnostic probes suitable for use in genetic linkage analysis (polymorphisms).
  • the sequences can be used as probes for locating gene regions associated with genetic disease, as explained in more detail below.
  • the EST and complete gene sequences of the present invention are also valuable for chromosome identification. Each sequence is specifically targeted to and can hybridize with a particular location on an individual human chromosome. Moreover, there is a current need for identifying particular sites on the chromosome. Few chromosome marking reagents based on actual sequence data (repeat polymorphisms) are presently available for marking chromosomal location. The present invention constitutes a major expansion of available chromosome markers. One hundred ESTs have already been mapped to chromosomes. Using the techniques described in Example 5 or 6, the remaining ESTs and the corresponding complete sequences can similarly be mapped to chromosomes. The mapping of ESTs and cDNAs to chromosomes according to the present invention is an important first step in correlating those sequences with genes associated with disease.
  • sequences can be mapped to chromosomes by preparing PCR primers (preferably 15-25 bp) from the ESTs. Computer analysis of the ESTs is used to rapidly select primers that do not span more than one exon in the genomic DNA, since an intervening intron would complicate the amplification process. These primers are then used for PCR screening of somatic cell hybrids containing individual human chromosomes. Only those hybrids containing the human gene corresponding to the EST will yield an amplified fragment.
  • PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular EST to a particular chromosome. Three or more clones can be assigned per day using a single thermal cycler. Using the present invention with the same oligonucleotide primers, sublocalization can be achieved with panels of fragments from specific chromosomes or pools of large genomic clones in an analogous manner.
  • Other mapping strategies that can similarly be used to map an EST to its chromosome include in situ hybridization, prescreening with labeled flow-sorted chromosomes and preselection by hybridization to construct chromosome specific cDNA libraries. Results of mapping ESTs to chromosomal segments are listed in Tables 3 and 4.
  • Fluorescence in situ hybridization (FISH) of a cDNA clone to a metaphase chromosomal spread can be used to provide a precise chromosomal location in one step.
  • This technique can be used with cDNA as short as 500 or 600 bases; however, clones larger than 2,000 bp have a higher likelihood of binding to a unique chromosomal location with sufficient signal intensity for simple detection.
  • FISH requires use of the clone from which the EST was derived, and the longer the clone the better: 2,000 bp is good, 4,000 bp is better, and more than 4,000 bp is probably not necessary to get good results a reasonable percentage of the time.
  • Reagents for chromosome mapping can be used individually (to mark a single chromosome or a single site on that chromosome) or as panels of reagents (for marking multiple sites and/or multiple chromosomes). Reagents corresponding to noncoding regions of the genes actually are preferred for mapping purposes. Coding sequences are more likely to be conserved within gene families, thus increasing the chance of cross hybridizations during chromosomal mapping (see Tables 8 and 9).
  • a cDNA precisely localized to a chromosomal region associated with the disease could be one of between 50 and 500 potential causative genes. (This assumes 1 megabase mapping resolution and one gene per 20 kb.)
  • Comparison of affected and unaffected individuals generally involves first looking for structural alterations in the chromosomes, such as deletions or translocations that are visible from chromosome spreads or detectable using PCR based on that cDNA sequence. Ultimately, complete sequencing of genes from several individuals is required to confirm the presence of a mutation and to distinguish mutations from polymorphisms.
  • sequences of the invention can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on binding of a polynucleotide sequence to DNA or RNA.
  • Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription (triple helix - see Lee et al, Nucl. Acids Res. 6: 3073 (1979); Cooney et al, Science 241: 456 (1988); and Dervan et al, Science 251: 1360 (1991)) or to the mRNA itself (antisense - Okano, J. Neurochem.
  • sequences of the present invention are also useful for identification of individuals from minute biological samples.
  • the United States military, for example, is considering the use of restriction fragment length polymorphism (RFLP) for identification of its personnel.
  • an individual's genomic DNA is digested with one or more restriction enzymes, and probed on a Southern blot to yield unique bands for identifying personnel.
  • This method does not suffer from the current limitations of "Dog Tags" which can be lost, switched, or stolen, making positive identification difficult.
  • the sequences of the present invention are useful as additional DNA markers for RFLP.
  • RFLP is a pattern based technique, which does not directly focus on the actual DNA sequence of the individual.
  • the sequences of the present invention can be used to provide an alternative technique that determines the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR primers for amplifying and isolating such selected DNA.
  • Panels of corresponding DNA sequences from individuals can provide unique individual identifications, as each individual will have a unique set of such DNA sequences, due to allelic differences.
  • the sequences of the present invention can be used to particular advantage to obtain such identification sequences from individuals and from tissue, as explained in Examples 12 - 14.
  • the EST sequences from Examples 1 and 2 and the complete sequences from Example 13 uniquely represent portions of the human genome. Allelic variation occurs to some degree in the coding regions of these sequences, and to a greater degree in the noncoding regions. It is estimated that allelic variation between individual humans occurs with a frequency of about once per each 500 bases.
  • Each of the ESTs or complete coding sequences comprising a part of the present invention can, to some degree, be used as a standard against which DNA from an individual can be compared for identification purposes. Because greater numbers of polymorphisms occur in the noncoding regions, fewer sequences are necessary to differentiate individuals.
  • the noncoding sequences of Table 9 could comfortably provide positive individual identification with a panel of perhaps 100 to 1,000 primers which each yield a noncoding amplified sequence of 100 bp. If predicted coding sequences, such as those from Table 6, are used, a more appropriate number of primers for positive individual identification would be 500-2,000.
  • once a panel of reagents from ESTs or complete sequences of this invention is used to generate a unique ID database for an individual, those same reagents can later be used to identify tissue from that individual. Positive identification of that individual, living or dead, can be made from extremely small tissue samples.
  • another use of DNA-based identification techniques is in forensic biology.
  • PCR technology can be used to amplify DNA sequences taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva, semen, etc.
  • gene sequences are amplified at specific loci known to contain a large number of allelic variations, for example the DQα class II HLA gene (Erlich, H., PCR Technology, Freeman and Co. (1992)). Once this specific area of the genome is amplified, it is digested with one or more restriction enzymes to yield an identifying set of bands on a Southern blot probed with DNA corresponding to the DQα class II HLA gene.
  • sequences of the present invention can be used to provide polynucleotide reagents specifically targeted to additional loci in the human genome, and can enhance the reliability of DNA-based forensic identifications. Those sequences targeted to noncoding regions (see, e.g., Tables 8 and 9) are particularly appropriate. As mentioned above, actual base sequence information can be used for identification as an accurate alternative to patterns formed by restriction enzyme generated fragments. Reagents for obtaining such sequence information are within the scope of the present invention. Such reagents can comprise complete ESTs or corresponding coding regions, or fragments of either of at least 15 bp, preferably at least 18 bp.
  • there is also a need for reagents capable of identifying the source of a particular tissue. Such need arises, for example, in forensics when presented with tissue of unknown origin.
  • Appropriate reagents can comprise, for example, DNA probes or primers specific to particular tissue prepared from the ESTs or complete sequences of the present invention. Panels of such reagents can identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue culture for contamination.
  • each EST corresponds not only to a coding region, but also to a polypeptide.
  • once the coding sequence is known, or the gene encoding the polypeptide is cloned, conventional techniques in molecular biology can be used to obtain the polypeptide.
  • the amino acid sequence encoded by the polynucleotide sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. (Fragments are useful, for example, in generating antibodies against the native polypeptide.)
  • the DNA encoding the desired polypeptide can be inserted into a host organism and expressed.
  • the organism can be a bacterium, yeast, cell line, or multicellular plant or animal.
  • the literature is replete with examples of suitable host organisms and expression techniques.
  • naked polynucleotide DNA or mRNA
  • This methodology can be used to deliver the polypeptide to the animal, or to generate an immune response against a foreign polypeptide.
  • the coding sequence can be inserted into a vector, which is then used to transfect a cell.
  • the cell which may or may not be part of a larger organism
  • Antibodies generated against the polypeptide corresponding to a sequence of the present invention can be obtained by direct injection of the naked polynucleotide into an animal (as above) or by administering the polypeptide to an animal, preferably a nonhuman. The antibody so obtained will then bind the polypeptide itself. In this manner, even a sequence encoding only a fragment of the polypeptide can be used to generate antibodies binding the whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from tissue expressing that polypeptide. Moreover, a panel of such antibodies, specific to a large number of polypeptides, can be used to identify and differentiate such tissue.
  • lambda ZAP libraries were converted en masse to pBluescript plasmids, transfected into E. coli XL1-Blue cells, and plated on X-gal/IPTG/ampicillin plates.
  • a total of 1058 clones were picked at random from three human brain cDNA libraries: fetal brain, two-year-old hippocampus, and two-year-old temporal cortex (Stratagene catalog #936206, 936205, 935, respectively.
  • Stratagene 11099 N. Torrey Pines Rd., La Jolla, CA 92037).
  • the EST sequences from this Example 1 are identified as SEQ ID NOs 1-315.
  • cDNA libraries were used as sources of clones for sequencing.
  • Human hippocampus and fetal brain libraries, plasmid template preparation, sequencing reactions, and automated sequencing were performed as described (Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B., Moreno, R.F., Kerlavage, A.R., McCombie, W.R., & Venter, J.C. Science, 252: 1651-56 (1991)).
  • a pooled probe consisting of inserts from 10 different EST clones with sequences that matched either mitochondrial genes or the 18S or 28S ribosomal RNAs was used to prescreen a gridded filter array of the hippocampus library; nonhybridizing clones are referred to as the "prescreened library”.
  • Another fetal brain library was constructed by and was a gift from Bento
  • BLAST: Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J., J. Mol. Biol. 215: 403-410 (1990).
  • BLAST output was parsed, and an interactive alignment editor was used to select which matches, if any, from each search to record in a relational EST database, which was developed to track sequencing, identification, tissue localization, physical mapping, and the public distribution of the clones, mapping and sequence data.
  • ESTs, including SEQ ID NOs 1-315, were analyzed as follows. Initially, the EST sequences were examined for similarities in the GenBank nucleic acid database (GenBank Release 65.0), the Protein Information Resource (PIR) Release 26.0, and ProSite Release 5.0 (MacPattern from the EMBL Data Library; Fuchs, R., Comput. Appl. Biosci. 7: 105 (1990)). BLAST was used to search GenBank and PIR (both maintained by the National Center for Biotechnology Information). ESTs without exact GenBank matches were translated in all six reading frames, and each translation was compared with the PIR protein sequence database and the ProSite protein motif database. Comparisons with the ProSite motif database were done by means of the program MacPattern from the EMBL Data Library.
  • GenBank and PIR searches were conducted with the "basic local alignment search tool" programs for nucleotide (BLASTN) and peptide (BLASTX) comparisons (Altschul et al, J. Mol. Biol. 215: 403 (1990)). PIR searches were run on the National Center for Biotechnology Information BLAST network service.
  • the BLAST programs contain a very rapid database-searching algorithm that searches for local areas of similarity between two sequences and then extends the alignments on the basis of defined match and mismatch criteria. The algorithm does not introduce gaps to improve the alignment, thus sacrificing some sensitivity for a 6- to 80-fold increase in speed over other database-searching programs such as FASTA.
  • ESTs matched previously sequenced human nuclear genes with more than 97% identity.
  • Four of these ESTs are from genes encoding enzymes involved in maintaining metabolic energy, including ADP/ATP translocase, aldolase C, hexokinase, and phosphoglycerate kinase.
  • Human homologs of genes for the bovine mitochondrial ATP synthase Fo β-subunit and porcine aconitase were also found (Table 2).
  • Brain-specific cDNAs included synaptophysin, glial fibrillary acidic protein (GFAP), and neurofilament light chain.
  • ESTs are from genes encoding proteins involved in signal transduction: 2',3'-cyclic nucleotide 3'-phosphodiesterase (2 ESTs), calmodulin, c-erbA-⁇-2, Gs⁇, and Na+/K+ ATPase ⁇-subunit.
  • Other ESTs were matches to genes for ubiquitous structural proteins: actins, tubulins, and fodrin (non-erythroid spectrin).
  • ESTs also document the presence in the hippocampus cDNA library of the ret protooncogene, the ras-related gene rhoB, and one of the chromosome 22 breakpoint cluster region transcripts.
  • ESTs are from genes known to be associated with genetic disorders (Online Mendelian Inheritance in Man). More than half of the human-matched ESTs from Example 1 have been mapped to chromosomes, indicating the bias of GenBank entries toward well-studied genes and proteins.
  • ESTs without significant GenBank matches were also compared to the ProSite database of recognized protein motifs. Not counting post-translational-modification signatures, fifty-four sequences contained motifs from the database. Some patterns, particularly the "leucine zipper", are found in scores or hundreds of proteins that do not share the functional property implied by the presence of the motif.
  • EST00257 shows strong nucleotide sequence similarity to the squid (67%) and Drosophila (70.4%) kinesin heavy chain. Kinesin was first described as a microtubule-associated motor protein involved in organelle transport in the squid giant axon (Vale et al, Cell 42: 39 (1985)). Six oncogene-related sequences were also among the cDNA clones sequenced.
  • EST00299 SEQ ID NO: 180
  • EST00283 SEQ ID NO: 271
  • EST00248 SEQ ID NO: 102
  • Similarities with an S. cerevisiae RNA polymerase subunit and Torpedo electromotor neuron-associated protein were also observed.
  • Two ESTs may represent new members of known human gene families: EST00270 matched the three β-tubulin genes with 88-91% identity and EST00271 (SEQ ID NO:248) matched α-actinin with 85% identity at the nucleotide level.
  • Enhancer of split protein interacts with a membrane protein that is the product of the Notch gene to convert a developmental signal into an altered pattern of gene expression (id. J. Mol. Biol. 215: 403 (1990)).
  • EST00256 (SEQ ID NO:188) matches near the 5' end of the Enhancer of split coding sequence, away from the mammalian G protein β subunit- and yeast cdc4-like elements (Hartley et al, Cell 55: 785 (1988); Klambt et al. EMBO J. 8: 203 (1989)).
  • Seven genes were represented by more than one EST.
  • Example 2 The ESTs of Example 2, including SEQ ID NOs 316-2407, were screened against known sequences listed in GenBank and other databases, as in Example 3. The results are reported in Table 2. The quality of the match is given as percent identity and length in base pairs for nucleotide matches and amino acid residues for peptide matches. In many cases ESTs match multiple domains on several related proteins; for example, EST00825 matches two transmembrane domains on both GABA and Norepinephrine transporters. Nucleotide databases are: GenBank (GB), and EMBL (E); peptide databases are: GenPept (GPU), Swiss-Prot (SP), and PIR.
  • Example 2 The great majority (83%) of the partial cDNA sequences reported in Example 2 are unrelated to any sequences previously described in the literature. Based on database matches to known genes from humans as well as from such evolutionarily distant organisms as E. coli, yeast, C. elegans, Drosophila, barley, Arabidopsis, rice, and green algae, we have preliminarily identified the functional type of a number of the ESTs (Table 2). These include a novel gene similar to Notch/Tan-1 (Adams et al., supra), a new neurotransmitter transporter gene, and a new member of the multi-drug resistance gene family.
  • By matching ESTs to known database sequences, a phenotypic characterization of the tissue begins to emerge. Protein superfamilies matched by ESTs were grouped into three broad functional categories to assess the biological spectrum represented by these randomly selected cDNA clones. Structural and metabolic classes comprised about 30% of the ESTs with database matches. Twenty-five percent were involved in regulatory pathways and the remainder were not classifiable. Eleven of the eighteen enzymes of glycolysis and the citric acid cycle are represented by at least one subunit or isozyme.
  • osteopontin Young, M., Kerr, J., Termine, J., Wewer, U., Wang, M., McBride, W. & Fisher, L. Genomics 7:491-502
  • Oligonucleotide primer pairs were designed from EST sequences to minimize the chance of amplifying through an intron.
  • the oligonucleotides were 18-23 bp in length and designed for PCR amplification using the computer program INTRON (National Institute of Mental Health, Bethesda, MD). The program is based on the assumptions that: 1) introns are genomic sequences that interrupt the coding and noncoding sequences of genes (Smith, J. Mol. Evol. 27:45-55 (1988)); 2) there are consensus sequences for splice junctions (Shapiro, et al., Nucl. Acids Res.
  • the program evaluates the likelihood that a given GG or CC dinucleotide represents a former exon-intron
  • every input strand is processed by the INTRON program twice, first evaluating the sense mRNA strand, and then processing the complementary or anti-sense strand.
  • the program evaluates each sequence by finding all GG or CC pairs (possible former splice sites), searching for STOP codons in all three reading frames, and analyzing the GG or CC pairs surrounded by stop codons. All regions of the EST that are unlikely to contain splice junctions based on CC content, GG content, and stop codon frequency are then marked by the program in uppercase.
  • Methods for designing PCR primers from known sequences are well known to those with skill in the art. For a review of PCR technology see Erlich, H.A., PCR Technology, Principles and Applications for DNA Amplification. 1992. W.H. Freeman and Co., New York. ESTs were examined for the presence of stop codons in each reading frame and for consensus splice junctions. The presence of stop codons and absence of splice junction sequences are more characteristic of 3' untranslated sequences than of introns. The untranslated sequences are unique to a given gene; thus, primers from these regions are less likely to prime other members of a gene family or pseudogenes.
  • oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 µCi of 32P-labeled deoxycytidine triphosphate.
  • the PCR was performed in a microplate thermocycler (Techne) under the following conditions: 30 cycles of 94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min; with a final extension at 72°C for 10 min.
  • the amplified products were analyzed on a 6% polyacrylamide sequencing gel and visualized by
  • Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden, NJ).
  • PCR was used to screen a series of somatic cell hybrid cell lines containing defined sets of human chromosomes for the presence of a given EST. DNA was isolated from the somatic hybrids and used as starting templates for PCR reactions using the primer pairs from EST sequences.
  • the single human chromosome present in all cell hybrids that give rise to an amplified fragment represents the chromosome containing that EST.
  • the assignment of 100 ESTs and corresponding genes to chromosomes by PCR is shown in Table 3.
  • Somatic cell hybrids were prepared that contained defined subsets of chromosomes 6 and X. Methods for preparing and selecting somatic cell hybrids are known in the art. For a review of an exemplary procedure to generate somatic cell hybrids containing the short arm of human chromosome 6, see Zoghbi, et al., Genomics 9(4):713-720 (1991). For a general review of somatic cell hybridization see Ledbetter et al. (supra). The hybrids were processed to obtain DNA and analyzed by PCR and by fluorescence in situ
  • Example 5 The procedure of Example 5 is repeated for all of the ESTs from Examples 1 and 2 not previously mapped to human chromosomes. Data are generated corresponding to the data in Table 3 for all of the unmapped ESTs. As previously mentioned, virtually all of the ESTs will map to a unique chromosomal location. The inability of any ESTs to
  • This technique was used to map an EST to a particular location on a given chromosome.
  • Cell cultures, tissue, or whole blood were used to obtain chromosomes.
  • 0.5 ml of whole blood was added to RPMI 1640 and incubated for 96 hours in a 37°C incubator with 5% CO2.
  • 0.05 µg/ml Colcemid was added to the culture one hour before harvest.
  • Cells were collected and washed in PBS.
  • the suspension was incubated with a hypotonic solution of KCl added dropwise to reach a final volume of 5 ml.
  • the cells were spun down and fixed by resuspending the cells in methanol and glacial acetic acid (3:1). The cell suspension was dropped onto glass slides and dried.
  • the slides were treated with RNase A and washed, then dehydrated in a series of increasing concentrations of ethanol.
  • the EST to be localized was nick-translated using fluorescently labeled nucleotide (Korenberg, J.R., et al., Cell 53(3):391-400 (1988)). Following nick translation, unincorporated label was removed by spin dialysis through Sepharose. The probe was further extracted with phenol-chloroform to remove additional protein. The chromosomes were denatured in formamide using techniques known in the art and the denatured probe was added to the slides. Following hybridization, the cells were washed. The slides were studied under a fluorescent microscope. In addition, the chromosomes can be stained for G-banding or Q-banding using techniques known in the art.
  • the resulting metaphase chromosomes had fluorescent tags localized to those regions of the chromosome that were homologous to the EST. Thus, a particular EST was localized to a particular region on a given chromosome.
  • SEQ ID NOs 396, 485, 506, 1880 and 1894 were mapped using fluorescent in situ hybridization to locations on chromosomes 17, 7, 10 and 1 respectively (See Table 4B below).
  • the ESTs of the present invention were statistically evaluated using the coding-region prediction program CRM via the GRAIL server (Uberbacher, E. & Mural, R. Proc.
  • the CRM program uses a neural network to combine results from several different coding regions by looking at different 6 bp sequences found in coding exons and in introns. The program additionally conducts reading frame searches and assesses randomness at the third position of codons. This protocol categorizes sequences as having an excellent, good, marginal, or poor probability of containing coding regions. The results are reported in Tables 6-9. There were 219 ESTs categorized as "excellent" (Table 6); 120 categorized as "good" (Table 7); 113 categorized as
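The individual-identification passages above rely on two quoted figures: allelic variation of about once per 500 bases, and primer panels that each yield a 100 bp noncoding product. The following is a back-of-envelope sketch of what those figures imply. It assumes that variants are spread uniformly and independently, which is a simplification, because the text notes that noncoding regions vary more than coding regions. The function names are illustrative only.

```python
def expected_variable_sites(n_amplicons, amplicon_bp=100, bases_per_variant=500):
    """Expected number of allelic-variant positions covered by a primer panel,
    assuming variants occur uniformly at one per `bases_per_variant` bases."""
    return n_amplicons * amplicon_bp / bases_per_variant


def prob_amplicon_has_variant(amplicon_bp=100, bases_per_variant=500):
    """Chance that a single amplicon covers at least one variable position,
    assuming each base varies independently with probability 1/bases_per_variant."""
    return 1 - (1 - 1 / bases_per_variant) ** amplicon_bp


# Panel sizes at the two ends of the 100 to 1,000 primer range quoted for noncoding sequences.
for n in (100, 1000):
    print(n, "amplicons ->", expected_variable_sites(n), "expected variable positions")
print("P(a 100 bp amplicon covers a variable position) ~", round(prob_amplicon_has_variant(), 3))
```

Under these assumptions, roughly one 100 bp amplicon in five carries a variable position. That is consistent with the text's point that identification needs a panel of many primers rather than a few.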
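In the EST analysis above, ESTs without exact GenBank matches were translated in all six reading frames before comparison with PIR and ProSite. The sketch below shows what a six-frame translation produces. It assumes the standard genetic code and A/C/G/T input. It is not the software used in that analysis, and the function names are hypothetical.

```python
# Standard genetic code; codons enumerated with bases in TCAG order,
# first base varying slowest and third base fastest ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped and
    codons containing non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Frames +1..+3 read the given strand from offsets 0..2;
    frames -1..-3 read the reverse complement the same way."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA"))
```

Each of the six peptide strings would then be submitted to a protein database search in place of the nucleotide sequence.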
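The INTRON program is described above only in outline. The following sketch gathers the features that description says the program examines: every GG or CC dinucleotide and the stop codons in all three reading frames, on both the sense and antisense strands. The "bracketed" test flags a GG or CC site lying between two stop codons of the same frame. That test is an assumed stand-in for the program's actual analysis, and all names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_sites(seq):
    """0-based start of every GG or CC dinucleotide, overlapping runs included."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in ("GG", "CC")]


def stop_codons_by_frame(seq):
    """0-based start of each stop codon, keyed by reading frame 0, 1 or 2."""
    return {
        frame: [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]
        for frame in range(3)
    }


def scan_strand(seq):
    sites = dinucleotide_sites(seq)
    stops = stop_codons_by_frame(seq)
    # Assumed criterion: in at least one frame, a stop codon ends before the site
    # begins and another stop codon starts after the dinucleotide ends.
    bracketed = [
        s for s in sites
        if any(
            any(p + 3 <= s for p in frame_stops) and any(p >= s + 2 for p in frame_stops)
            for frame_stops in stops.values()
        )
    ]
    return {"sites": sites, "stops": stops, "bracketed": bracketed}


def scan_both_strands(seq):
    """Process the sense strand, then its reverse complement."""
    seq = seq.upper()
    return {"sense": scan_strand(seq), "antisense": scan_strand(reverse_complement(seq))}


result = scan_both_strands("taaggctaa")
print(result["sense"])      # GG at position 3 lies between in-frame TAA codons at 0 and 6
print(result["antisense"])
```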
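The somatic cell hybrid assignment described above reduces to set logic: the chromosome carrying the EST must be retained by every hybrid that yields an amplified fragment. This sketch also removes chromosomes retained by hybrids that gave no product. That extra step is an added assumption, but it is consistent with the same reasoning. The panel below is invented for illustration and does not reflect the NIGMS panel.

```python
def assign_chromosome(panel, positive_hybrids):
    """Return the human chromosome(s) consistent with a PCR screen of a hybrid panel.

    panel: dict mapping hybrid name -> set of human chromosomes it retains.
    positive_hybrids: names of hybrids that yielded an amplified fragment.
    """
    positives = [chroms for name, chroms in panel.items() if name in positive_hybrids]
    negatives = [chroms for name, chroms in panel.items() if name not in positive_hybrids]
    if not positives:
        return set()  # nothing amplified, so no assignment is possible
    # The EST's chromosome must be present in every hybrid that amplified...
    candidates = set.intersection(*positives)
    # ...and (added assumption) absent from every hybrid that did not amplify.
    for chroms in negatives:
        candidates -= chroms
    return candidates


# Hypothetical three-hybrid panel; real mapping panels contain many more lines.
panel = {"H1": {1, 7, 17}, "H2": {7, 10, 17}, "H3": {3, 7, 21}}
print(assign_chromosome(panel, {"H1", "H2"}))  # {17}
```

Here H1 and H2 share chromosomes 7 and 17. The negative result for H3, which retains chromosome 7, leaves chromosome 17 as the only consistent assignment.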

Abstract

Partial and complete human cDNA and genomic sequences corresponding to particular expressed sequence tags (ESTs). The ESTs are cDNA sequences that are generally between 150 and 500 base pairs in length, are derived from human brain cDNA libraries, correspond to genes transcribed in human brain, and have base sequences identified herein as SEQ ID NOS: 1-2421.

Description

SEQUENCES CHARACTERISTIC OF HUMAN GENE TRANSCRIPTION PRODUCT
Technical Field
The present invention relates to newly identified polynucleotide sequences corresponding to transcription products of human genes, and to complete gene sequences associated therewith.
Background
This invention relates to human genes. Identification and sequencing of human genes is a major goal of modern scientific research. The sequence of human genes is more than just a scientific curiosity. For example, by identifying genes and determining their sequences, scientists have been able to make large quantities of valuable human "gene products." These include human insulin, interferon, Factor VIII, tumor necrosis factor, human growth hormone, tissue plasminogen activator, and numerous other compounds. Additionally, knowledge of gene sequences can provide the key to treatment or cure of genetic diseases (such as muscular dystrophy and cystic fibrosis). The present invention represents a quantum leap forward in mankind's knowledge of human gene sequences.
There are several basic concepts of molecular biology which figure prominently in the invention. A brief explanation of those concepts follows. Additional background information and definitions for scientific terms can be found in the literature. See, for example, "Glossary of Genetics, Classical and Molecular" by R. Rieger, A. Michaelis, and M.M. Green (Fifth Edition, Springer-Verlag, New York (1991)). The contents of this and other publications cited in the specification are incorporated by reference herein.
At an initial level, the present invention is based on identification and characterization of gene segments. Genes are the basic units of inheritance. Each gene is a string of connected bases called nucleotides. Most genes are formed of deoxyribonucleic acid, DNA. (Some viruses contain genes of ribonucleic acid, RNA.) The genetic information resides in the particular sequence in which the bases are arranged. A short sequence of nucleotides is often called a polynucleotide or an oligonucleotide.
Like genes, polypeptides are built from long strings of individual units. These units are amino acids. The nucleotide sequence of a gene tells the cell the sequence in which to arrange the amino acids to make the polypeptide encoded by that gene. In general, chains of up to about 200 amino acids are called polypeptides, while proteins are larger molecules made up of polypeptide subunits; both types of molecules are referred to generally herein as polypeptides. A triplet of nucleotides (codon) in DNA codes for each amino acid or signals the beginning or end of the message (start and stop codons). The term codon is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the original DNA sequence is transcribed.
Generally, enzymes in the cell transcribe the permanent DNA of the gene into a temporary RNA copy, called messenger RNA or mRNA. The mRNA, in turn, can be translated into a polypeptide by the cell. This entire process is called gene expression, and the polypeptide is the gene product encoded by the gene.
Scientists have previously discovered how to reverse the transcription process and copy mRNA back into DNA using an enzyme called reverse transcriptase. The resulting DNA is called complementary DNA, or cDNA. This is schematically shown in the single Figure. When substantially all of the mRNA from one cell or tissue is converted to cDNA at once and cloned into multiple copies of a recombinant vector to allow replication and manipulation in the laboratory, the result is called a cDNA library.
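The base-pairing rule underlying reverse transcription can be illustrated with a short sketch. It models only the sequence relationship, not the enzymology (priming, removal of the RNA template, and second-strand synthesis are not represented): the first cDNA strand is complementary and antiparallel to the mRNA, and its own complement, the second strand, reads like the mRNA with T in place of U. The function names are hypothetical.

```python
# RNA base -> pairing DNA base on the newly synthesized strand.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna: str) -> str:
    """DNA strand complementary to an mRNA template, written 5'->3'.

    Because the new strand is antiparallel, each base is complemented and
    the order is reversed.
    """
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(mrna.upper()))

def second_strand(first: str) -> str:
    """Complement of the first cDNA strand; matches the mRNA with T for U."""
    return "".join(DNA_PAIR[base] for base in reversed(first.upper()))

cdna1 = first_strand_cdna("AUGGCUUGGUAA")
print(cdna1)                 # TTACCAAGCCAT
print(second_strand(cdna1))  # ATGGCTTGGTAA
```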
The various types of genes include those which code for polypeptides, those which are transcribed into RNA but are not translated into polypeptides, and those whose functional significance does not demand that they be transcribed at all. Most genes are found on large molecules of DNA located in chromosomes. Double-stranded DNA carries all the information of a gene. Each base of the first strand is joined to a complementary base (hybridized) in the second strand. The linear DNA molecules in chromosomes have thousands of genes distributed along their length. Chromosomes include both coding regions (coding for polypeptides) and noncoding regions; the coding regions represent only about three percent of the total chromosome sequence.
An individual gene includes a promoter, a regulatory region that directs expression of the gene; a coding region, which can code for a polypeptide; and a termination signal. The regulatory DNA sequence is usually a noncoding region that determines if, where, when, and at what level a particular gene is expressed.
The coding regions of many genes are discontinuous, with coding sequences (exons) alternating with noncoding regions (introns). The final mRNA copy of the gene does not include these introns (which can be much longer than the coding region itself), although it does contain certain untranslated regions that usually do not code for the polypeptide gene product. Untranslated sequences at the beginning and end of the mRNA are known as 5'- and 3'-untranslated regions, respectively. This nomenclature reflects the orientation of the nucleotide constituents of the mRNA.
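The removal of introns can be pictured with a toy example. In this sketch the exon coordinates are simply supplied, as hypothetical 0-based, half-open intervals; the code does not locate splice sites, although the toy intron does begin with GT and end with AG, as most introns do.

```python
from typing import List, Tuple

def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon segments of a primary transcript, discarding the introns.

    Exons are (start, end) intervals; they are sorted so input order does not matter.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

# Toy transcript: exons in upper case, a single intron in lower case.
transcript = "ATGGCC" + "gtaagtcttttcag" + "TGGTAA"
print(splice(transcript, [(0, 6), (20, 26)]))  # ATGGCCTGGTAA
```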
A cDNA is a DNA copy of a messenger RNA, which contains all of the exons of a gene. The cDNA can be thought of as having three parts: an untranslated 5' leader, an uninterrupted polypeptide-coding sequence, and a 3' untranslated region. The untranslated leader and trailing sequences are important for initiation of translation, mRNA stability, and other functions. The untranslated leader and trailing sequences are called 5'- and 3'-untranslated sequences, respectively. The 3' untranslated sequence is usually longer than the 5' untranslated leader, and can be longer than the polypeptide-coding sequence. The untranslated regions typically have many, randomly-distributed stop codons, and do not display the nonrandom base arrangements found in coding sequences. The 5'-untranslated sequence is relatively short, generally between 20 and 200 bases. The 3'-untranslated sequence is often many times longer, up to several thousand bases.
The translated or coding sequence begins with a translational start codon (AUG or GUG) and ends with a translational stop codon (UAA, UGA, or UAG). Generally, translation begins at the first "start" codon on the mRNA and proceeds to the first "stop" codon. Coding sequences can be distinguished by their nonrandom distribution of bases; numerous computer algorithms have been developed to distinguish coding from noncoding regions in this way.
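The rule that translation begins at the first start codon and proceeds to the first in-frame stop codon can be sketched as follows. This is a simplification for illustration: only AUG (ATG) is treated as a start codon, the sequence context around the start codon is ignored, and the input is assumed to be a sense-strand cDNA or mRNA sequence.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def split_mrna(seq: str) -> Optional[Tuple[str, str, str]]:
    """Split a sense-strand sequence into (5' untranslated, coding, 3' untranslated).

    The coding part runs from the first ATG through the first in-frame stop codon.
    Returns None if there is no ATG or no in-frame stop after it.
    """
    seq = seq.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return None
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            end = pos + 3  # include the stop codon in the coding part
            return seq[:start], seq[start:end], seq[end:]
    return None

five_utr, cds, three_utr = split_mrna("GCCACCAUGGCUUGGUAAAGCAAA")
print(five_utr, cds, three_utr)  # GCCACC ATGGCTTGGTAA AGCAAA
```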
Human DNA differs from person to person. No two persons (except perhaps identical twins) have identical DNA. While the differences, called allelic variations or polymorphisms, are slight on a molecular level, they account for most of the physical and other observable differences between individuals. It has been estimated that approximately 14 million sequence polymorphism differences exist between individuals.
The ability of one strand of DNA to attach or hybridize to a complementary strand has already been exploited for several purposes. For example, small pieces of DNA (15 to 25 nucleotides long) can be made which will hybridize to longer strands of DNA that have a complementary sequence. These short "primers" can be selected such that they hybridize to a specific, unique location on the longer strand. Once the primers have hybridized to their target on the DNA, the polymerase chain reaction (PCR) can be employed to generate millions of copies of (or amplify) the particular segment of DNA between the locations to which two primers are bound. Briefly, this technique allows amplification of a DNA region situated between two convergent oligonucleotide primers that hybridize to opposite strands. Primer extension proceeds inward across the region between the two primers, and the product of DNA synthesis from one primer serves as a template for the other primer. Repeated cycles of DNA denaturation, annealing of primers, and extension result in an exponential increase in the number of copies of the region bounded by the primers.
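The requirement that the two primers be convergent, and the arithmetic of exponential amplification, can be illustrated with a short sketch. It finds the segment bounded by a forward primer and the reverse complement of a reverse primer on a template written 5' to 3', and computes the idealized copy number N0 × 2^n after n cycles. Exact primer matching and 100% efficiency per cycle are simplifying assumptions; real reactions tolerate some mismatches and eventually plateau. The primer and template sequences are hypothetical.

```python
from typing import Optional

def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq.upper()))

def pcr_product(template: str, forward: str, reverse: str) -> Optional[str]:
    """Segment bounded by two convergent primers on the top strand, or None.

    `forward` matches the top strand as written. `reverse` (written 5'->3')
    anneals to the top strand, so its reverse complement is searched for
    downstream of the forward primer. Only exact matches are considered.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]

def ideal_copies(initial: int, cycles: int) -> int:
    """Copies after `cycles` rounds if every molecule is duplicated each cycle."""
    return initial * 2 ** cycles

template = "GGG" + "ACGTTGCA" + "TTTTTT" + "CCATGGAA" + "GGG"
print(pcr_product(template, "ACGTTGCA", "TTCCATGG"))  # ACGTTGCATTTTTTCCATGGAA
print(ideal_copies(1, 20))                            # 1048576
```

At 100% efficiency, 20 cycles turn a single template molecule into 2^20 = 1,048,576 copies, consistent with the "millions of copies" described above.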
Similarly, a labeled segment of single-stranded DNA can be hybridized to a longer DNA sequence, such as a chromosome, to mark a specific location on the longer sequence. Segments of DNA 50 bases long or longer that hybridize to a unique DNA location in the human genome are extremely unlikely to hybridize elsewhere in the human genome.
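Why longer segments are unlikely to hybridize elsewhere can be estimated with a back-of-the-envelope calculation. Assume a genome of about 3 × 10^9 bases (a commonly cited figure for the haploid human genome, not one given here) in which the four bases occur independently and with equal frequency. The expected number of chance exact matches to a given sequence of length k, counting both strands, is then about 2G/4^k. Real genomes contain repeated and compositionally biased sequences, and hybridization tolerates some mismatches, so these figures indicate only orders of magnitude.

```python
def expected_random_matches(k: int, genome_length: float = 3e9) -> float:
    """Expected chance exact occurrences of one specific k-base sequence.

    Assumes a random genome (independent, equally frequent bases) and counts
    both strands, giving about 2 * (G - k + 1) candidate positions.
    """
    positions = 2 * (genome_length - k + 1)
    return positions / 4 ** k

for k in (15, 18, 50):
    print(f"{k}-mer: {expected_random_matches(k):.3g} expected chance matches")
```

Under these assumptions a 15-base sequence is expected to occur by chance roughly 5 to 6 times, an 18-base sequence about 0.09 times, and a 50-base sequence about 5 × 10^-21 times.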
The Human Genome Project is an effort to sequence all human DNA (the human genome). The human genome is estimated to comprise 50,000 - 100,000 genes, up to 30,000 of which might be expressed in the brain (Sutcliffe, Ann. Rev. Neurosci. 11:157 (1988)). It has been estimated that, once dedicated human chromosome sequencing begins in three to five years, 12-15 years will be required to complete the sequence of the genome (Report of the Ad Hoc Program Advisory Committee on Complex Genomes, Reston, Va., Feb. 1988, D. Baltimore Ed. (NIH, Bethesda, Md, 1988)). At that rate, the majority of human genes would remain unknown for at least the next decade. The present invention can greatly accelerate the pace at which human genes can be identified and mapped. Most gene researchers, in conjunction with publication of their results in this field, submit sequence data to the GenBank database. Prior to the present invention, GenBank listed the sequences of only a few thousand human genes and fewer than two hundred human brain mRNAs (GenBank Release 66.0, December, 1990).
The role of sequencing complementary DNA (cDNA), reverse transcribed from mRNA, as a part of the human genome project has been vigorously debated since the idea of determining the complete nucleotide sequence of humans first surfaced. The coding sequence of all human genes represents most of the information content of the genome, but only 3-5% of the total DNA. In contrast, one-half to three-fourths of a cDNA (which is made only from the transcription products of active genes) is meaningful genetic information, the remainder being 5'- and 3'-untranslated sequence. Thus, some have argued that cDNA sequencing should take precedence over genomic sequencing (Brenner, CIBA Found. Symp. 149:6 (1990)). However, until now, such arguments have not been heeded.
Genomic sequencing proponents have pointed to the difficulty of finding every mRNA expressed in all tissues, cell types, and developmental states, and have argued that much valuable information from intronic and intergenic regions, including control and regulatory sequences, will be missed by cDNA sequencing. (Report of the Committee on Mapping and Sequencing the Human Genome, National Research Council (National Academy Press, Washington, D.C. 1988)). Further, sequencing of transcribed regions of the genome using cDNA libraries has heretofore been considered impractical or unsatisfactory. Libraries of cDNA were believed to be dominated by repetitive elements, mitochondrial genes, ribosomal RNA genes, and other nuclear genes comprising common or housekeeping sequences. It was believed that cDNA libraries would provide few sequences corresponding to structural and regulatory polypeptides or peptides. See, for example, Putney, et al., Nature 302:718-721 (1983). Putney, et al. sequenced over 150 clones from a rabbit muscle cDNA library and identified clones for 13 of the 19 known muscle polypeptides, including one new isotype but no unknown coding sequences.
Another perceived drawback of cDNA sequencing was that some mRNAs are abundant, and some are rare. The cellular quantities of mRNA from various genes can vary by several orders of magnitude. This led critics to believe that most information obtained from cDNA sequencing would be repetitious and useless.
The present invention demonstrates that, despite such skepticism, cDNA sequencing now provides a rapid method for obtaining enormous amounts of valuable genetic information and DNA products of great utility for the biotechnology and pharmaceutical industries. Not only can many distinct cDNAs be isolated and sequenced, even partial cDNAs can be used, with conventional, well-understood methods, to isolate entire genes, and to determine the chromosomal locations and biological functions of these genes. As is demonstrated here, fragments of only a few hundred bases are sufficient, in many cases, to identify the probable function of a new human gene if it is similar in structure to a gene from another animal, or from plants or bacteria. Similarly, even fragments of untranslated regions of a cDNA can be used to: i) isolate the coding sequence of the cDNA; ii) isolate the complete gene; iii) determine the position of the gene on a human chromosome, and hence the potential of the gene to cause a human genetic disease; and iv) determine the function of the gene by means of experiments in which the function of the native gene is disrupted by the addition of a short DNA fragment to the cell, e.g., using triple helix or antisense probes.
Because coding regions comprise such a small portion of the human genome, identification and mapping of transcribed regions and coding regions of chromosomes is of significant interest. There is a corresponding need for reagents for identifying and marking coding regions and transcribed regions of chromosomes. Furthermore, such human sequences are valuable for chromosome mapping, human identification, identification of tissue type and origin, forensic identification, and locating disease-associated genes (i.e., genes that are associated with an inherited human disease, whether through mutation, deletion, or faulty gene expression) on the chromosome.
SUMMARY OF THE INVENTION
Contrary to the expectations of the scientific community, cDNA screening and sequencing techniques have now been used to discover a large number of heretofore unknown human genes. Disclosed herein are over 2,400 new human polynucleotide sequences. These sequences could represent up to 5% of all human genes. The novelty of these sequences has been established through comparison to both nucleotide sequence databases and amino acid sequence databases. Surprisingly, over 80% of the sequences generated were unrelated to any sequences previously described in the literature.
The sequences of the present invention were ascertained using a fast approach to cDNA characterization. This approach could facilitate the tagging of most expressed human genes within a few years at a fraction of the cost of complete genomic sequencing, provide new genetic markers, provide new DNA-based therapeutics and diagnostics, and provide other valuable nucleotide reagents.
The sequences disclosed herein, styled Expressed Sequence Tags ("ESTs"), are markers for human genes actually transcribed in vivo. Techniques are disclosed for using these ESTs to obtain the full coding region of the corresponding gene. The use of ESTs, complete coding sequences, or fragments thereof for marking chromosomes, for mapping locations of expressed genes on chromosomes, for individual or forensic identification, for mapping locations of disease-associated genes, for identification of tissue type, and for preparation of antisense sequences, probes, and constructs is discussed in detail below. Unlike the random genomic DNA sequence tagged sites (STSs) (Olson et al., Science 245:1434 (1989)), ESTs point directly to expressed genes.
Various aspects of the present invention thus include the individual ESTs, corresponding partial and complete cDNA, genomic DNA, mRNA, antisense strands, triple helix probes, PCR primers, coding regions, and constructs. Also, where one skilled in the art is enabled by this specification to prepare expression vectors and polypeptide expression products, they are also within the scope of the present invention, along with antibodies, especially monoclonal antibodies, to such expression products.
BRIEF DESCRIPTION OF THE DRAWING
The single drawing Figure schematically illustrates the progression from chromosome to gene to mRNA to cDNA.
DETAILED DESCRIPTION OF THE INVENTION
The detailed description that follows provides not only the actual sequence of each new EST, but also explains how the ESTs were obtained, how to obtain the corresponding complete cDNA sequence and the corresponding genomic DNA sequence, how to make DNA constructs from the ESTs and corresponding sequences, how to use those sequences as reagents in molecular biology and other fields, how to produce gene products from the ESTs and corresponding sequences and antibodies to those gene products, and the functional categories of many ESTs and corresponding genes. Furthermore, numerous actual working examples and predictive examples are provided to demonstrate and exemplify numerous aspects of the invention.
I. ESTs from cDNA Libraries
The sequences of the present invention were isolated from commercially available and custom made cDNA libraries using a rapid screening and sequencing technique. In general, the method comprises applying conventional automated DNA sequencing technology to screening clones, advantageously randomly selected clones, from a cDNA library. Preferably, the library is initially "enriched" through removal of ribosomal sequences and other common sequences prior to clone selection. According to the present method, ESTs are generated from partial DNA sequencing of the selected clones. The ESTs of the present invention were generated using low redundancy of sequencing, typically a single sequencing reaction. While single sequencing reactions may have an accuracy as low as 97%, this nevertheless provides sufficient fidelity for identification of the sequence and design of PCR primers.
Most human genes can be identified by EST sequencing from libraries of cDNA copies of messenger RNAs. However, some genes are expressed only at specific times during embryonic development, or only in small amounts in a few specific cell types. Other genes have mRNAs that are degraded very quickly by the cell in which they are expressed. In any of these cases, transcripts of the gene will not be represented in cDNA libraries, so the gene will not be identifiable by EST sequencing. A new method called "exon amplification", however, can be used to isolate and identify transcripts of such genes.
Exon amplification (also called exon trapping) works by artificially expressing part or all of a gene that is contained in a cloned fragment of genomic DNA such as a cosmid or yeast artificial chromosome (YAC). The gene is cloned into a special vector, designed at MIT, that uses control elements from virus genes to express the protein-coding exons of the human gene of interest. Exon trapping shows considerable promise as a general technique for identifying those genes in the human genome that cannot be found by cDNA cloning and EST sequencing. Exon amplification will also be useful for identifying the genes in regions of genomic DNA to which disease genes have been mapped. The exon amplification method can be used directly with the cosmid and YAC clones from human chromosomes that are being obtained by both NIH- and DOE-supported human genome centers.
ESTs comprise DNA sequences corresponding to a portion of nuclear-encoded messenger RNA. An EST is of sufficient length to permit: (1) amplification of the specific sequence from a cDNA library, e.g., by polymerase chain reaction (PCR); (2) use of a synthetic polynucleotide corresponding to a partial or complete sequence of the EST, generally 30-50 bases long, as a hybridization probe of a cDNA library; or (3) unique designation of the pure cDNA clone from which the EST was derived (the EST clone) for use as a hybridization probe of a cDNA library. Preferably, EST-derived primer pairs and sequences amplify or detectably hybridize to a sequence from a genomic library.
It has been found that sufficient information is contained in the 150-400 base ESTs from one sequencing run to effect preliminary identification and exact chromosome mapping. Accordingly, the ESTs disclosed herein are generally at least 150 base pairs in length. The length of an EST is determined by the quality of sequencing data and the length of the cloned cDNA. Raw data from the automated sequencers is edited to remove low quality sequence at the end of the sequencing run. High quality sequences (usually a result of sequencing templates without excessive salt contamination) generally give about 400 bp of reliable sequence data; other sequences give fewer bases of reliable data. A 150 bp EST is long enough to be translated into a 50 amino acid peptide sequence. This length is sufficient to observe similarities when they exist in a database search. Furthermore, 150 bp is long enough to design PCR primers from each end of the sequence to amplify the complete EST. Sequences shorter than 150 bp are difficult to purify and use following PCR amplification. Furthermore, a 150 bp polynucleotide is likely to give a very strong signal with low background in a screen of a genomic library.
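The editing and length criteria described above can be sketched as follows. The per-base quality values, the threshold of 20, and the 10-base sliding window are illustrative assumptions, not the editing rules actually applied to the ESTs disclosed here; only the 150-base minimum and the three-bases-per-codon relationship come from the text.

```python
from typing import List, Optional

MIN_EST_LENGTH = 150  # minimum EST length given in the text

def trim_low_quality_end(seq: str, quals: List[int], min_q: float = 20, window: int = 10) -> str:
    """Cut the read at the first sliding window whose mean quality falls below min_q.

    Quality values, threshold, and window size are hypothetical.
    """
    if len(seq) != len(quals):
        raise ValueError("one quality value is needed per base")
    for i in range(len(seq) - window + 1):
        if sum(quals[i:i + window]) / window < min_q:
            return seq[:i]
    return seq

def usable_est(seq: str, quals: List[int]) -> Optional[str]:
    """Return the trimmed read if it still meets the 150-base minimum, else None."""
    trimmed = trim_low_quality_end(seq, quals)
    return trimmed if len(trimmed) >= MIN_EST_LENGTH else None

read = "ACG" * 100                  # hypothetical 300-base read
quals = [40] * 200 + [5] * 100      # quality collapses after base 200
est = usable_est(read, quals)
print(len(est), len(est) // 3)      # bases kept, codons they could encode
```

For the hypothetical read shown, 196 bases survive trimming, enough to encode a peptide of 65 amino acids; at the 150-base minimum the corresponding figure is 50 amino acids.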
Finally, it is highly unlikely that a sequence of the same 150 bp exists in any genes in the genome besides the one tagged by the EST. Some closely related gene family members have very similar nucleotide sequences, but no examples of pairs of human genes with long segments of identical sequence have been reported to date. For instance, there are three known β-tubulin genes in humans. Several ESTs were found that matched one or another of these tubulin genes, but several new members of this gene family were also found and could be clearly distinguished from the three known members. ESTs that match perfectly to several different genes can be detected by hybridizing to chromosomes: if many chromosomal loci are observed, the sequence (or a close variant) is present in more than one gene. This problem can be circumvented by using the 3'-untranslated part of the cDNA alone as a probe for the chromosomal location or for the full-length cDNA or gene. The 3'-untranslated region is more likely to be unique within gene families, since there is no evolutionary pressure to conserve a coding function of this region of the mRNA.
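Whether an EST shares long identical segments with known members of its gene family can also be checked computationally before it is used as a probe. The sketch below counts exact length-k matches against both strands of each comparison sequence. The segment length of 30 bases and the function names are arbitrary, hypothetical choices, and because hybridization tolerates mismatches, a count of zero does not by itself rule out cross-hybridization.

```python
from typing import Dict, Set

def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq))

def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_segments(est: str, genes: Dict[str, str], k: int = 30) -> Dict[str, int]:
    """Count distinct length-k segments the EST shares exactly with each gene.

    Both strands of each gene are checked. Zero counts for every gene other
    than the source gene suggest the EST is a unique tag at this length.
    """
    est_kmers = kmers(est.upper(), k)
    counts = {}
    for name, seq in genes.items():
        seq = seq.upper()
        gene_kmers = kmers(seq, k) | kmers(reverse_complement(seq), k)
        counts[name] = len(est_kmers & gene_kmers)
    return counts
```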
As demonstrated in the Examples that follow, ESTs can be used to map the expressed sequence to a particular chromosome. In addition, ESTs can be expanded to provide the full coding regions, as detailed below. In this manner, previously unknown genes can be identified.
While a variety of cDNA libraries can be used to obtain ESTs, human brain cDNA libraries are exemplified and represent a preferred embodiment. Suitable cDNA libraries can be freshly prepared or obtained commercially, e.g., as shown in Examples 1, 2, and 11. The cDNA libraries from the desired tissue are preferably preprocessed by conventional techniques to reduce repeated sequencing of high and intermediate abundance clones and to maximize the chances of finding rare messages from specific cell populations. Preferably, preprocessing includes the use of defined composition prescreening probes, e.g., cDNA corresponding to mitochondria, abundant sequences, ribosomes, actins, myelin basic polypeptides, or any other known high abundance peptide; these prescreening probes used for preprocessing are generally derived from known ESTs. Other useful preprocessing techniques include subtraction, which preferentially reduces the population of certain sequences in the library (e.g., see A. Swaroop et al., Nucl. Acids Res. 19, 1954 (1991)), and normalization, which results in all sequences being represented in approximately equal proportions in the library (Patanjali et al., Proc. Natl. Acad. Sci. USA 88:1943 (1991)).
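The benefit of normalization can be estimated with a simple sampling model. If clones are picked at random from a large library, the expected number of distinct species seen after n picks is the sum over species of 1 - (1 - p_i)^n, where p_i is the fraction of clones derived from species i. Treating picks as independent draws is an approximation for a large library, and the two libraries below are hypothetical, chosen only to contrast a skewed abundance distribution with a normalized one.

```python
from typing import List

def expected_distinct(weights: List[float], picks: int) -> float:
    """Expected number of distinct mRNA species seen after `picks` random clone picks.

    `weights` are relative abundances of each species in the library; species i
    is missed on every pick with probability (1 - p_i) ** picks.
    """
    total = sum(weights)
    return sum(1 - (1 - w / total) ** picks for w in weights)

# Hypothetical libraries of 10,000 species each.
skewed = [0.05] * 10 + [0.5 / 9990] * 9990   # 10 species make up half the clones
normalized = [1.0] * 10000                    # every species equally represented
print(round(expected_distinct(skewed, 1000)), round(expected_distinct(normalized, 1000)))
```

With these made-up numbers, 1,000 random picks are expected to reveal roughly 500 distinct species from the skewed library but about 950 from the normalized one, because many picks from the skewed library return the same few abundant clones.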
The cDNA libraries used in the present method will ideally use directional cloning methods so that either the 5' end of the cDNA (likely to contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be selectively obtained.
Libraries of cDNA can also be generated from recombinant expression of genomic DNA. After they are amplified, ESTs can be obtained and sequenced, e.g., as illustrated in Example 11.
The sequences of the present invention include the specific sequences set forth in the Sequence Listing and designated SEQ ID NO: 1 - SEQ ID NO: 2412. In one aspect of this embodiment, the invention relates to those sequences of SEQ ID NOS: 1 - 2412 that comprise the cDNA coding sequences for polypeptides having less than 95% identity with known amino acid sequences (see Table 2) and more preferably less than 90% or 85% identity. In a second aspect, the invention relates to those sequences of SEQ ID NOS: 1 - 2412 that encode polypeptides having no similarity to known amino acid sequences (see Examples that follow). Precisely because they do not contain coding regions and are therefore more unique in their sequence structures, those sequences which meet neither of the preceding criteria can be most useful and are generally preferred for mapping.
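Percent identity, the measure behind the 95%, 90%, and 85% thresholds, can be computed as in the sketch below once two sequences have been aligned. The sketch assumes an ungapped alignment of equal-length sequences has already been produced; an actual database comparison would use an alignment program that handles gaps and local similarity. The tier labels are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in an ungapped alignment of equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("expects two aligned, non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)

def identity_tier(identity: float) -> str:
    """Place an identity value relative to the 85%, 90%, and 95% thresholds."""
    if identity < 85:
        return "below 85%"
    if identity < 90:
        return "below 90%"
    if identity < 95:
        return "below 95%"
    return "95% or higher"

print(identity_tier(percent_identity("MAWKLLSR", "MAWKVLSR")))  # below 90%
```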
Consistent with the NIH mission and its responsibilities to disseminate knowledge and share the tangible fruits of its research, the present inventors have taken a number of steps to facilitate sequence data and clone availability. All EST sequences have been submitted to GenBank (representing an addition equivalent to 7% of the human nucleotides in Release 69 of GenBank, September 1991). The corresponding cDNA clones have been submitted to the American Type Culture Collection and information on clones and sequences has been submitted to the Genome Data Base (Pearson, P. Nucl. Acids Res. 19 (Suppl.): 2237-9 (1991)).
II. Complete Coding Sequences from ESTs
The ESTs of the present invention generally represent relatively small coding regions or untranslated regions of human genes. Although most of these sequences do not code for a complete gene product, the ESTs of the present invention are highly specific markers for the corresponding complete coding regions. The ESTs are of sufficient length that they will hybridize, under stringent conditions, only with DNA for that gene to which they correspond. Suitably stringent conditions comprise conditions, for example, where at least 95%, preferably at least 97% or 98% identity (base pairing), is required for hybridization. This property permits use of the EST to isolate the entire coding region and even the entire sequence. Therefore, only routine laboratory work is necessary to parlay the unique EST sequence into the corresponding unique complete gene sequence.
Thus, each of the ESTs of the present invention "corresponds" to a particular unique human gene. Knowledge of the EST sequence permits routine isolation and sequencing of the complete coding sequence of the corresponding gene. The complete coding sequence is present in a full-length cDNA clone as well as in the gene carried on genomic clones. Therefore, each EST "corresponds" to a cDNA (from which the EST was derived), a complete genomic gene sequence, a polypeptide coding region (which can be obtained either from the cDNA or genomic DNA), and a polypeptide or amino acid sequence encoded by that region.
The first step in determining where an EST is located in the cDNA is to analyze the EST for the presence of coding sequence, e.g., as described in Example 14. The CRM program predicts the extent and orientation of the coding region of a sequence. Based on this information, one can infer the presence of start or stop codons within a sequence and whether the sequence is completely coding or completely noncoding. If a start codon is present, the EST can span part of the 5'-untranslated region as well as part of the coding sequence; if a stop codon is present, it can span part of the coding sequence and part of the 3'-untranslated region. If no coding sequence is present, the EST is most likely derived from the 3'-untranslated sequence, both because that region is longer and because most cDNA library construction methods are biased toward the 3' end of the mRNA.
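The CRM program itself is not reproduced here. As a much cruder, illustrative stand-in, the sketch below exploits the observation that untranslated regions tend to contain many stop codons: it reports the longest stop-free run of codons in any of the six reading frames (three on each strand). A long open run is consistent with, but does not prove, the presence of coding sequence. The function names are hypothetical.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq))

def longest_open_stretch(seq: str) -> Tuple[int, str, int]:
    """Return (codons, strand, frame) for the longest stop-free run in six frames.

    Frames 0-2 are counted from the 5' end of the strand examined ("+" for the
    input, "-" for its reverse complement). Ties keep the first run found.
    """
    seq = seq.upper().replace("U", "T")
    best = (0, "+", 0)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run = 0
            for pos in range(frame, len(s) - 2, 3):
                if s[pos:pos + 3] in STOP_CODONS:
                    run = 0  # a stop codon ends the current open run
                else:
                    run += 1
                    if run > best[0]:
                        best = (run, strand, frame)
    return best

print(longest_open_stretch("ATGAAATAA"))  # (3, '-', 0)
```

In this short example the longest stop-free run lies on the reverse strand rather than in the ATG-initiated forward frame, which illustrates why stop-codon counting alone cannot establish orientation for short sequences.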
One general procedure for obtaining complete sequences from ESTs is as follows:
1. Purify selected human DNA from an EST clone (the cDNA clone that was sequenced to give the EST), e.g., by endonuclease digestion using EcoRI, gel electrophoresis, and recovery of the insert from a low-melting-point agarose gel.
2. Radiolabel the isolated insert DNA, e.g., with ³²P, preferably by nick translation or random primer labeling.
3. Use the labeled EST insert as a probe to screen a lambda phage cDNA library or a plasmid cDNA library.
4. Identify colonies containing clones related to the probe cDNA and purify them by known purification methods.
5. Sequence the ends of the newly purified clones to identify full-length sequences.
6. Perform complete sequencing of full length clones by Exonuclease III digestion or primer walking. Northern blots of the mRNA from various tissues using at least part of the EST clone as a probe can optionally be performed to check the size of the mRNA against that of the purported full length cDNA.
An EST is a specific tag for a messenger RNA molecule. The complete sequence of that messenger RNA, in the form of cDNA, can be determined using the EST as a probe to identify a cDNA clone corresponding to a full-length transcript, followed by sequencing of that clone. The EST or the full-length cDNA clone can also be used as a probe to identify a genomic clone or clones that contain the complete gene including regulatory and promoter regions, exons, and introns.
ESTs are used as probes to identify the cDNA clones from which an EST was derived. ESTs, or portions thereof, can be labelled with ³²P by nick translation or by end-labelling with polynucleotide kinase, using methods known to those with skill in the art (Basic Methods in Molecular Biology, L.G. Davis, M.D. Dibner, and J.F. Battey, ed., Elsevier Press, NY, 1986). The lambda library can be directly screened with the labelled ESTs of interest or the library can be converted en masse to pBluescript (Stratagene, La Jolla, California) to facilitate bacterial colony screening. Both methods are well known in the art. Briefly, filters with bacterial colonies containing the library in pBluescript or bacterial lawns containing lambda plaques are denatured and the DNA is fixed to the filters. The filters are hybridized with the labelled probe using hybridization conditions described by Davis et al. The ESTs, cloned into lambda or pBluescript, can be used as positive controls to assess background binding and to adjust the hybridization and washing stringencies necessary for accurate clone identification. The resulting autoradiograms are compared to duplicate plates of colonies or plaques; each exposed spot corresponds to a positive colony or plaque. The colonies or plaques are selected, expanded and the DNA is isolated from the colonies for further analysis and sequencing.
The ESTs can additionally be used to screen Northern blots of mRNA obtained from various tissues or cell cultures, including the tissue of origin of the EST clone. Northern analysis will most often produce one to several positive bands. The bands can be selected for further study based on the predicted size of the mRNA.
Positive cDNA clones in phage lambda are analyzed by PCR, with one primer from the EST and the other from the vector, to determine how much additional sequence they contain. Clones that give a larger vector-insert PCR product than the original EST clone are analyzed by restriction digestion and DNA sequencing to determine whether they contain an insert of the same or similar size as the mRNA detected on a Northern blot.
Once one or more overlapping cDNA clones are identified, the complete sequence of the clones can be determined. The preferred method is to use exonuclease III digestion (McCombie, W.R., Kirkness, E., Fleming, J.T., Kerlavage, A.R., Iovannisci, D.M., and Martin-Gallardo, R., Methods 3: 33-40, 1991). A series of deletion clones is generated, each of which is sequenced. The resulting overlapping sequences are assembled into a single contiguous sequence of high redundancy (usually three to five overlapping sequences at each nucleotide position), resulting in a highly accurate final sequence.
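The assembly of overlapping deletion-clone sequences into a single contiguous sequence can be illustrated with a simple greedy merge: repeatedly join the two sequences with the longest exact suffix-to-prefix overlap. This is only a sketch under strong assumptions (error-free reads, all in the same orientation, exact overlaps of at least a few bases); practical assembly must also tolerate sequencing errors and build a consensus from the redundant reads. The reads in the usage example are hypothetical.

```python
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if below min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge reads by repeatedly joining the pair with the longest exact overlap."""
    unique = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads wholly contained in another read add no new sequence.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

print(greedy_assemble(["ATGGCTTGG", "CTTGGTAAAG", "TAAAGCAAACCC"]))
```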
A similar screening and clone selection approach can be applied to obtaining cosmid or lambda clones from a genomic DNA library that contain the complete gene from which the EST was derived (Kirkness, E.F., Kusiak, J.W., Menninger, J., Gocayne, J.D., Ward, D.C., and Venter, J.C., Genomics 10: 985-995 (1991)). Although the process is much more laborious, these genomic clones can also be sequenced in their entirety. A shotgun approach is preferred for sequencing clones with inserts longer than 10 kb (genomic cosmid and lambda clones). In shotgun sequencing, the clone is randomly broken into many small pieces, each of which is partially sequenced. The sequence fragments are then aligned to produce the final contiguous sequence with high redundancy. An intermediate approach is to sequence just the promoter region and the intron-exon boundaries and to estimate the size of the introns by restriction endonuclease digestion (ibid.).
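The redundancy needed in shotgun sequencing can be estimated with the Lander-Waterman idealization, in which reads start at independent, uniformly random positions. Coverage at a base is then approximately Poisson-distributed with mean c = NL/G (N reads of length L over an insert of length G), and the expected fraction of bases covered by no read is e^-c. The read number, the read length of about 400 bases (in line with the read lengths discussed above), and the 40 kb cosmid insert are assumptions chosen for illustration.

```python
import math
from typing import Tuple

def shotgun_coverage(num_reads: int, read_length: int, insert_length: int) -> Tuple[float, float]:
    """Return (mean coverage c, expected fraction of insert bases covered by no read).

    Lander-Waterman idealization: read start positions are independent and
    uniformly distributed, so per-base coverage is approximately Poisson(c).
    """
    c = num_reads * read_length / insert_length
    return c, math.exp(-c)

# Assumed example: 500 reads of ~400 bases from a 40 kb cosmid insert.
c, uncovered = shotgun_coverage(500, 400, 40_000)
print(f"coverage {c:.1f}x, about {uncovered * 40_000:.0f} bases expected uncovered")
```

Under these assumptions, 500 reads give fivefold coverage and are expected to leave about 0.7% of the insert, roughly 270 bases, unsequenced.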
Using the sequence information provided herein, the polynucleotides of the present invention can be derived from natural sources or synthesized using known methods. The sequences falling within the scope of the present invention are not limited to the specific sequences described, but include human allelic and species variations thereof and portions thereof of at least 15-18 bases. (Sequences of at least 15-18 bases can be used, for example, as PCR primers or as DNA probes.) In addition, the invention includes the entire coding sequence associated with the specific polynucleotide sequence of bases described in the Sequence Listing, as well as portions of the entire coding sequence of at least 15-18 bases and allelic and species variations thereof. Furthermore, to accommodate codon variability, the invention includes sequences coding for the same amino acid sequences as do the specific sequences disclosed herein. Finally, although the error rate in the automated sequencing used in the present invention is small, there remains some chance of error. Therefore, claims to particular sequences should not be so narrowly construed as to require inclusion of erroneously identified bases or to exclude corrections.
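Because most amino acids are specified by more than one codon, many distinct nucleotide sequences encode the same amino acid sequence. The sketch below counts them for a short peptide using the standard genetic code (NCBI translation table 1, assumed here); the function name is hypothetical.

```python
from collections import Counter

# Standard genetic code (NCBI table 1), codons ordered T, C, A, G; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Number of codons that specify each amino acid (and the stop signal).
CODONS_PER_AA = Counter(AMINO_ACIDS)

def count_encoding_sequences(peptide: str) -> int:
    """Number of distinct DNA sequences (stop codon excluded) that encode `peptide`."""
    total = 1
    for aa in peptide.upper():
        total *= CODONS_PER_AA[aa]  # an unknown letter contributes 0
    return total

print(count_encoding_sequences("MAWL"))  # 24
```

For example, the tetrapeptide Met-Ala-Trp-Leu can be encoded by 1 × 4 × 1 × 6 = 24 different 12-base sequences.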
Any specific sequence disclosed herein can be readily screened for errors by resequencing each EST in both directions (i.e., sequence both strands of cDNA). The sequences, constructs, vectors, clones, and other materials comprising the present invention can advantageously be in enriched or isolated form. As used herein, "enriched" means that the concentration of the material is at least about 2, 5, 10, 100, or 1000 times its natural concentration (for example), advantageously 0.01%, by weight, preferably at least about 0.1% by weight. Enriched preparations of about 0.5%, 1%, 5%, 10%, and 20% by weight are also contemplated. Further, removal of clones corresponding to ribosomal RNA and "housekeeping" genes and clones without human cDNA inserts results in a library that is "enriched" in the desired clones.
The term "isolated" requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide present in a living animal is not isolated, but the same polynucleotide, separated from some or all of the coexisting materials in the natural system, is isolated.
It is also advantageous that the sequences be in purified form. The term "purified" does not require absolute purity; rather, it is intended as a relative definition. Individual EST clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA.
The cDNA clones are not naturally occurring as such, but rather are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). The conversion of mRNA into a cDNA library involves the creation of a synthetic substance (cDNA) and pure individual cDNA clones can be isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from messenger RNA and subsequently isolating individual clones from that library results in an approximately 10°-fold purification of the native message. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated.
In a cDNA library there are many species of mRNA represented. Each cDNA clone can be interesting in its own right, but must be isolated from the library before further experimentation can be completed. In order to sequence any specific cDNA, it must be removed and separated (i.e., isolated and purified) from all the other sequences. This can be accomplished by many techniques known to those of skill in the art. These procedures normally involve identification of a bacterial colony containing the cDNA of interest and further amplification of that bacterium. Once a cDNA is separated from the mixed clone library, it can be used as a template for further procedures such as nucleotide sequencing.
Although claims to large numbers of ESTs and corresponding sequences are presented herein, the invention is not limited to these particular groupings of sequences. Thus, individual sequences are considered as applicants' discoveries or inventions, as are subgroupings of sequences. All of the functional subgroupings set forth in the tables define groupings for which separate claims are contemplated as being within the scope of this invention. Moreover, in addition to claims to individual clones, it is intended that the present disclosure also support claims to numerical subgroupings. Thus, subgroupings of 50 ESTs (and corresponding sequences) are contemplated (e.g., SEQ ID NOS 1-50, 51-100, 101-150, etc.) as being within the scope of this invention, as are subgroupings of 5, 10, 25, 100, 200, and 500 ESTs and corresponding sequences.
III. DNA Constructs
The present invention also includes recombinant constructs comprising one or more of the sequences as broadly described above. The constructs comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention has been inserted, in a sense or antisense orientation. In a preferred aspect of this embodiment, the construct further comprises regulatory sequences, including, for example, a promoter, operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available. The following vectors are provided by way of example. Bacterial: pBs, phagescript, φX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia).
Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).
Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retroviruses, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
In a further embodiment, the present invention relates to host cells containing the above-described construct. The host cell can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the construct into the host cell can be effected by calcium phosphate transfection, DEAE-dextran-mediated transfection, or electroporation (Davis, L., Dibner, M., Battey, J., Basic Methods in Molecular Biology (1986)).
The constructs in host cells can be used in a conventional manner to produce the gene product coded by the recombinant sequence. Alternatively, the encoded polypeptide can be synthetically produced by conventional peptide synthesizers.
Certain ESTs have already been preliminarily categorized by analogy to related sequences in other organisms (see Table 2). Table 10 of Example 10 categorizes particular ESTs broadly as metabolic, regulatory, and structural sequences where known. Constructs comprising genes or coding sequences corresponding to each of these categories are, therefore, specifically and individually contemplated.
Table 11 more particularly separates 127 new ESTs into 13 categories using a different criterion. These are genes related to cell surface; developmental control; energy metabolism; kinase and phosphatase; oncogenes; other metabolism-related polypeptides; peptidases and peptidase inhibitors; receptors; structural and cytoskeletal; signal transduction; transporters; transcription, translation, and subcellular localization; and transcription factors. Table 11 further identifies each EST by the particular gene product for which it apparently codes. Each of these categories individually comprises a preferred category of EST, and preferred constructs and resulting polypeptides can be prepared from those ESTs or the corresponding complete gene sequences.
IV. ESTs and Corresponding Sequences as Reagents
Each of the cDNA sequences identified herein (and the corresponding complete gene sequences) can be used in numerous ways as polynucleotide reagents. The sequences can be used as diagnostic probes for the presence of a specific mRNA in a particular cell type. In addition, these sequences can be used as diagnostic probes suitable for use in genetic linkage analysis (polymorphisms). Further, the sequences can be used as probes for locating gene regions associated with genetic disease, as explained in more detail below.
The EST and complete gene sequences of the present invention are also valuable for chromosome identification. Each sequence is specifically targeted to, and can hybridize with, a particular location on an individual human chromosome. Moreover, there is a current need for identifying particular sites on the chromosome. Few chromosome marking reagents based on actual sequence data (repeat polymorphisms) are presently available for marking chromosomal location. The present invention constitutes a major expansion of available chromosome markers. One hundred ESTs have already been mapped to chromosomes. Using the techniques described in Example 5 or 6, the remaining ESTs and the corresponding complete sequences can similarly be mapped to chromosomes. The mapping of ESTs and cDNAs to chromosomes according to the present invention is an important first step in correlating those sequences with genes associated with disease.
Briefly, sequences can be mapped to chromosomes by preparing PCR primers (preferably 15-25 bp) from the ESTs. Computer analysis of the ESTs is used to rapidly select primers that do not span more than one exon in the genomic DNA, because an intervening intron would complicate the amplification process. These primers are then used for PCR screening of somatic cell hybrids containing individual human chromosomes. Only those hybrids containing the human gene corresponding to the EST will yield an amplified fragment.
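The interpretation of such a hybrid screen can be sketched as a concordance check: a chromosome is a candidate only if every hybrid that retains it gives a PCR product and every hybrid that lacks it gives none. The Python sketch below is illustrative only; the panel layout (a mapping from a hypothetical hybrid name to the set of human chromosomes it retains) and the example data are assumptions, not results from this disclosure.

```python
from typing import Dict, List, Set


def concordant_chromosomes(
    panel: Dict[str, Set[str]],
    pcr_positive: Dict[str, bool],
) -> List[str]:
    """Return chromosomes whose presence/absence matches the PCR result
    in every hybrid of the panel (perfect concordance)."""
    all_chromosomes = set().union(*panel.values())
    hits = []
    for chrom in sorted(all_chromosomes):
        # Concordant: PCR is positive exactly in the hybrids retaining chrom.
        if all((chrom in retained) == pcr_positive[hybrid]
               for hybrid, retained in panel.items()):
            hits.append(chrom)
    return hits


# Hypothetical three-hybrid panel; real panels contain many more hybrids.
panel = {
    "hybrid_A": {"1", "7", "X"},
    "hybrid_B": {"7", "12"},
    "hybrid_C": {"1", "12", "X"},
}
pcr = {"hybrid_A": True, "hybrid_B": True, "hybrid_C": False}
print(concordant_chromosomes(panel, pcr))  # ['7']
```

With monochromosomal hybrids the check reduces to reading off the single positive hybrid; with multichromosomal panels, more hybrids are needed before only one chromosome remains concordant.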
PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular EST to a particular chromosome. Three or more clones can be assigned per day using a single thermal cycler. Using the present invention with the same oligonucleotide primers, sublocalization can be achieved in an analogous manner with panels of fragments from specific chromosomes or pools of large genomic clones. Other mapping strategies that can similarly be used to map an EST to its chromosome include in situ hybridization, prescreening with labeled flow-sorted chromosomes, and preselection by hybridization to construct chromosome-specific cDNA libraries. Results of mapping ESTs to chromosomal segments are listed in Tables 3 and 4. Fluorescence in situ hybridization (FISH) of a cDNA clone to a metaphase chromosomal spread can be used to provide a precise chromosomal location in one step. This technique can be used with cDNA as short as 500 or 600 bases; however, clones larger than 2,000 bp have a higher likelihood of binding to a unique chromosomal location with sufficient signal intensity for simple detection. FISH requires the clone from which the EST was derived, and longer clones are better: 2,000 bp is good, 4,000 bp is better, and more than 4,000 bp is probably not necessary to obtain good results a reasonable percentage of the time. For a review of this technique, see Verma et al., Human Chromosomes: a Manual of Basic Techniques, Pergamon Press, New York (1988).
Reagents for chromosome mapping can be used individually (to mark a single chromosome or a single site on that chromosome) or as panels of reagents (for marking multiple sites and/or multiple chromosomes). Reagents corresponding to noncoding regions of the genes actually are preferred for mapping purposes. Coding sequences are more likely to be conserved within gene families, thus increasing the chance of cross hybridizations during chromosomal mapping (see Tables 8 and 9).
Once a sequence has been mapped to a precise chromosomal location, the physical position of the sequence on the chromosome can be correlated with genetic map data. (Such data are found, for example, in V. McKusick, Mendelian Inheritance in Man (available on line through Johns Hopkins University Welch Medical Library).) The relationship between genes and diseases that have been mapped to the same chromosomal region is then identified through linkage analysis (coinheritance of physically adjacent genes).
Next, it is necessary to determine the differences in the cDNA or genomic sequence between affected and unaffected individuals. If a mutation is observed in some or all of the affected individuals but not in any normal individuals, then the mutation is likely to be the causative agent of the disease.
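The comparison rule stated above (a change seen in affected individuals but in no unaffected individual) can be written as a simple set difference. In this minimal Python sketch, each individual is represented by a set of observed sequence differences; the variant labels and individuals are hypothetical placeholders, and real studies must also weigh sample size and linkage evidence.

```python
from typing import Dict, Set


def candidate_mutations(
    affected: Dict[str, Set[str]],
    unaffected: Dict[str, Set[str]],
) -> Set[str]:
    """Variants present in at least one affected individual and absent
    from every unaffected individual."""
    seen_in_affected = set().union(*affected.values())
    seen_in_unaffected = set().union(*unaffected.values())
    return seen_in_affected - seen_in_unaffected


# Hypothetical variant labels for illustration only.
affected = {"patient_1": {"c.215G>A", "c.400C>T"}, "patient_2": {"c.215G>A"}}
unaffected = {"control_1": {"c.400C>T"}, "control_2": set()}
print(candidate_mutations(affected, unaffected))  # {'c.215G>A'}
```

A variant shared with unaffected individuals (here `c.400C>T`) is filtered out as a likely polymorphism rather than a causative mutation.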
With current resolution of physical mapping and genetic mapping techniques, a cDNA precisely localized to a chromosomal region associated with the disease could be one of between 50 and 500 potential causative genes. (This assumes 1 megabase mapping resolution and one gene per 20 kb.)
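As a rough arithmetic check on these figures: the stated assumptions (1 megabase resolution, one gene per 20 kb) give the lower bound directly, while the upper figure of 500 corresponds, at the same gene density, to a coarser resolution of about 10 megabases. The second expression is only an illustration of that arithmetic, not a statement from the original text.

```latex
\frac{1\ \text{Mb}}{20\ \text{kb/gene}}
  = \frac{10^{6}\ \text{bp}}{2\times 10^{4}\ \text{bp/gene}}
  = 50\ \text{genes},
\qquad
\frac{10\ \text{Mb}}{20\ \text{kb/gene}} = 500\ \text{genes}.
```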
Comparison of affected and unaffected individuals generally involves first looking for structural alterations in the chromosomes, such as deletions or translocations that are visible from chromosome spreads or detectable using PCR based on that cDNA sequence. Ultimately, complete sequencing of genes from several individuals is required to confirm the presence of a mutation and to distinguish mutations from polymorphisms.
In addition to the foregoing, the sequences of the invention, as broadly described, can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on binding of a polynucleotide sequence to DNA or RNA. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription (triple helix; see Lee et al, Nucl. Acids Res. 6: 3073 (1979); Cooney et al, Science 241: 456 (1988); and Dervan et al, Science 251: 1360 (1991)) or to the mRNA itself (antisense; Okano, J. Neurochem. 56: 560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be efficient in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide. The present invention is also a useful tool in gene therapy, which requires isolation of the disease-associated gene in question as a prerequisite to the insertion of a normal gene into an organism to correct a genetic defect. The high specificity of the cDNA probes according to this invention holds promise for targeting such gene locations in a highly accurate manner.
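For the antisense case, the basic design step is to take a 20 to 40 base window of the target mRNA and write its antiparallel complement as a DNA oligonucleotide. The Python sketch below shows only that step; the function name and example sequence are hypothetical, and it does not address triple helix design (which has its own sequence requirements) or practical concerns such as target accessibility and oligonucleotide chemistry.

```python
# Complement table for DNA output; U in an RNA target pairs with A.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligonucleotide complementary and antiparallel to
    mrna[start:start + length], written 5'->3'."""
    if not 20 <= length <= 40:
        raise ValueError("length should be 20-40 bases")
    target = mrna.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("target window runs past the end of the sequence")
    # Reverse the target and complement each base to get the 5'->3' oligo.
    return "".join(COMPLEMENT[base] for base in reversed(target))


# Hypothetical mRNA fragment; the oligo targets its first 20 bases.
mrna = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"
print(antisense_oligo(mrna, 0))  # CGGCCCATTACAATGGCCAT
```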
The sequences of the present invention, as broadly defined, are also useful for identification of individuals from minute biological samples. The United States military, for example, is considering the use of restriction fragment length polymorphism (RFLP) for identification of its personnel. In this technique, an individual's genomic DNA is digested with one or more restriction enzymes, and probed on a Southern blot to yield unique bands for identifying personnel. This method does not suffer from the current limitations of "Dog Tags" which can be lost, switched, or stolen, making positive identification difficult. The sequences of the present invention are useful as additional DNA markers for RFLP.
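Why a single base change can alter an RFLP pattern is easy to see with a simulated digest. This minimal Python sketch cuts a linear sequence at every EcoRI site (G^AATTC) and reports fragment lengths; the function name and the two short "alleles" are hypothetical. It assumes a palindromic recognition site, so only the top strand is scanned, and it ignores the fact that on a Southern blot only fragments hybridizing to the probe are visible.

```python
from typing import List


def digest_fragments(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Lengths of linear fragments after cutting at every occurrence of `site`.

    cut_offset is the top-strand cut position within the site
    (EcoRI cuts G^AATTC, i.e. offset 1)."""
    dna = dna.upper()
    cuts = []
    pos = dna.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = dna.find(site, pos + 1)
    edges = [0] + cuts + [len(dna)]
    return [end - begin for begin, end in zip(edges, edges[1:])]


# Two hypothetical alleles differing by one base in the second EcoRI site.
allele_1 = "AAAAGAATTCAAAAAAAAAAGAATTCAAAA"
allele_2 = "AAAAGAATTCAAAAAAAAAAGACTTCAAAA"
print(digest_fragments(allele_1))  # [5, 16, 9]
print(digest_fragments(allele_2))  # [5, 25]
```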
However, RFLP is a pattern-based technique, which does not directly focus on the actual DNA sequence of the individual. The sequences of the present invention can be used to provide an alternative technique that determines the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR primers for amplifying and isolating such selected DNA. One can, for example, take an EST of the invention and prepare two PCR primers from the 5' and 3' ends of the EST. These are used to amplify the portion of an individual's DNA corresponding to the EST. The amplified DNA is then sequenced.
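Preparing primers from the two ends of an EST can be sketched as follows: the forward primer copies the first bases of the EST, and the reverse primer is the reverse complement of the last bases. This Python sketch uses a hypothetical helper name and a default primer length of 20 (within the 15-25 bp range mentioned above); the Wallace-rule melting temperature (2 °C per A/T, 4 °C per G/C) is a rough estimate for short oligos, added here as an assumption rather than a method from this disclosure. Exon-boundary screening and other primer checks are not shown.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primers_from_est_ends(est: str, length: int = 20) -> Tuple[str, str]:
    """Forward primer from the EST 5' end, reverse primer from its 3' end."""
    est = est.upper()
    if len(est) < 2 * length:
        raise ValueError("EST is too short for two non-overlapping primers")
    forward = est[:length]                       # same sense as the EST
    reverse = reverse_complement(est[-length:])  # anneals to the EST strand
    return forward, reverse
```

Both primers are written 5' to 3', so they can be ordered and used directly in a standard PCR.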
Panels of corresponding DNA sequences from individuals, made this way, can provide unique individual identifications, as each individual will have a unique set of such DNA sequences, due to allelic differences. The sequences of the present invention can be used to particular advantage to obtain such identification sequences from individuals and from tissue, as explained in Examples 12 - 14.
The EST sequences from Examples 1 and 2 and the complete sequences from Example 13 uniquely represent portions of the human genome. Allelic variation occurs to some degree in the coding regions of these sequences, and to a greater degree in the noncoding regions. It is estimated that allelic variation between individual humans occurs with a frequency of about once every 500 bases. Each of the ESTs or complete coding sequences comprising a part of the present invention can, to some degree, be used as a standard against which DNA from an individual can be compared for identification purposes. Because greater numbers of polymorphisms occur in the noncoding regions, fewer sequences are necessary to differentiate individuals. The noncoding sequences of Table 9, for example, could comfortably provide positive individual identification with a panel of perhaps 100 to 1,000 primers, each of which yields a noncoding amplified sequence of 100 bp. If predicted coding sequences, such as those from Table 6, are used, a more appropriate number of primers for positive individual identification would be 500-2,000.
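The panel sizes above can be related to the quoted variation rate with simple arithmetic. This Python sketch assumes, for illustration only, that variable sites are spread uniformly at the genome-wide average of one per 500 bases; because noncoding regions vary more than average, the real yield from noncoding amplicons would likely be higher.

```python
def expected_variable_sites(n_primers: int, amplicon_bp: int,
                            bases_per_variant: float = 500.0) -> float:
    """Expected number of polymorphic positions covered by a primer panel,
    assuming uniformly spread variation (a simplifying assumption)."""
    return n_primers * amplicon_bp / bases_per_variant


for n in (100, 1000):
    print(n, expected_variable_sites(n, 100))
# 100 20.0
# 1000 200.0
```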
If a panel of reagents from ESTs or complete sequences of this invention is used to generate a unique ID database for an individual, those same reagents can later be used to identify tissue from that individual. Positive identification of that individual, living or dead, can be made from extremely small tissue samples.
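Matching a tissue sample against such an ID database can be sketched as a lookup over shared loci. In this Python sketch, each profile is a hypothetical mapping from a locus name (one per primer pair) to the sequence obtained there. It assumes one sequence per locus, so it ignores heterozygosity and sequencing error, and it reports a match only when exactly one database entry agrees at every shared locus.

```python
from typing import Dict, List, Optional

Profile = Dict[str, str]  # hypothetical: locus name -> sequence at that locus


def identify(sample: Profile, database: Dict[str, Profile],
             min_loci: int = 1) -> Optional[str]:
    """Return the single entry agreeing with the sample at every shared
    locus, or None if no entry or several entries match."""
    matches: List[str] = []
    for person, reference in database.items():
        shared = sample.keys() & reference.keys()
        if len(shared) >= min_loci and all(
                sample[locus] == reference[locus] for locus in shared):
            matches.append(person)
    return matches[0] if len(matches) == 1 else None


# Hypothetical two-person database.
database = {
    "person_1": {"locus_A": "ACGT", "locus_B": "GGTA"},
    "person_2": {"locus_A": "ACGA", "locus_B": "GGTA"},
}
print(identify({"locus_A": "ACGA"}, database))  # person_2
print(identify({"locus_B": "GGTA"}, database))  # None (ambiguous)
```

The ambiguous second query illustrates why larger panels are needed: a locus at which two individuals share the same sequence cannot distinguish them.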
Another use for DNA-based identification techniques is in forensic biology. PCR technology can be used to amplify DNA sequences taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva, semen, etc. In one prior art technique, gene sequences are amplified at specific loci known to contain a large number of allelic variations, for example the DQα class II HLA gene (Erlich, H., PCR Technology, Freeman and Co. (1992)). Once this specific area of the genome is amplified, it is digested with one or more restriction enzymes to yield an identifying set of bands on a Southern blot probed with DNA corresponding to the DQα class II HLA gene.
The sequences of the present invention can be used to provide polynucleotide reagents specifically targeted to additional loci in the human genome, and can enhance the reliability of DNA-based forensic identifications. Those sequences targeted to noncoding regions (see, e.g., Tables 8 and 9) are particularly appropriate. As mentioned above, actual base sequence information can be used for identification as an accurate alternative to patterns formed by restriction enzyme generated fragments. Reagents for obtaining such sequence information are within the scope of the present invention. Such reagents can comprise complete ESTs or corresponding coding regions, or fragments of either of at least 15 bp, preferably at least 18 bp.
There is also a need for reagents capable of identifying the source of a particular tissue. Such need arises, for example, in forensics when presented with tissue of unknown origin. Appropriate reagents can comprise, for example, DNA probes or primers specific to particular tissue prepared from the ESTs or complete sequences of the present invention. Panels of such reagents can identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue culture for contamination.
V. Production of Polypeptide Corresponding to ESTs
As previously explained, each EST corresponds not only to a coding region, but also to a polypeptide. Once the coding sequence is known, or the gene is cloned which encodes the polypeptide, conventional techniques in molecular biology can be used to obtain the polypeptide.
At the simplest level, the amino acid sequence encoded by the polynucleotide sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. (Fragments are useful, for example, in generating antibodies against the native polypeptide.)
Alternatively, the DNA encoding the desired polypeptide can be inserted into a host organism and expressed. The organism can be a bacterium, yeast, cell line, or multicellular plant or animal. The literature is replete with examples of suitable host organisms and expression techniques. For example, naked polynucleotide (DNA or mRNA) can be injected directly into muscle tissue of mammals, where it is expressed. This methodology can be used to deliver the polypeptide to the animal, or to generate an immune response against a foreign polypeptide. Wolff, et al., Science 247:1465 (1990); Felgner, et al., Nature 349:351 (1991). Alternatively, the coding sequence, together with appropriate regulatory regions (i.e., a construct), can be inserted into a vector, which is then used to transfect a cell. The cell (which may or may not be part of a larger organism) then expresses the polypeptide. (See Example 25.)
Antibodies generated against the polypeptide corresponding to a sequence of the present invention can be obtained by direct injection of the naked polypeptide into an animal (as above) or by administering the polypeptide to an animal, preferably a nonhuman. The antibody so obtained will then bind the polypeptide itself. In this manner, even a sequence encoding only a fragment of the polypeptide can be used to generate antibodies binding the whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from tissue expressing that polypeptide. Moreover, a panel of such antibodies, specific to a large number of polypeptides, can be used to identify and differentiate such tissue.
VI. Examples
Certain aspects of the present invention are described in greater detail in the non-limiting Examples that follow.
EXAMPLE 1
cDNA Sequences Determined by Random Clone Selection: First set
METHODOLOGY:
With reference to the data presented in Table 1, lambda ZAP libraries were converted en masse to pBluescript plasmids, transfected into E. coli XL1-Blue cells, and plated on X-gal/IPTG/ampicillin plates. A total of 1058 clones were picked at random from three human brain cDNA libraries: fetal brain, two-year-old hippocampus, and two-year-old temporal cortex (Stratagene catalog #936206, 936205, 935, respectively; Stratagene, 11099 N. Torrey Pines Rd., La Jolla, CA 92037). An analysis of these clones is summarized in Table 1 (see below). In addition, clones selected from the hippocampus library were also analyzed after subtractive hybridization with the fibroblast library; these results are listed in the "Hippocampus Subtracted" column of Table 1. Templates for DNA sequencing were PCR products or plasmids prepared by the alkaline lysis method. About half of the templates prepared by PCR failed to yield an amplified fragment suitable for sequencing. This was primarily due to the use of PCR conditions that minimized the need for further purification of the product but also selected against amplification of long inserts (5 μl fresh or frozen overnight culture of E. coli carrying the pBluescript plasmid, 7.5 μM each dNTP, and 0.1 μM each primer for 35 cycles: 94°C, 40 sec; 55°C, 40 sec; 72°C, 90 sec). A further percentage of the PCR-generated templates failed to sequence, largely due to primer-dimer or other amplification artifacts. Qiagen™ columns improved the quality of plasmid templates, increasing the yield of usable sequence from about 60% with a standard alkaline lysis protocol to over 90%. Overall, 117 PCR-generated templates and 497 plasmid templates resulted in usable sequence. Dideoxy chain termination sequencing reactions were performed with fluorescent dye-labeled M13 universal or reverse primers. After a cycle sequencing protocol carried out in a Perkin-Elmer thermal cycler, sequencing reactions were run on an Applied Biosystems, Inc. (Foster City, CA) 373A automated DNA sequencer. (Cycle sequencing was performed in a Perkin-Elmer thermal cycler for 15 cycles of 95°C, 30 sec; 60°C, 1 sec; 70°C, 60 sec and 15 cycles of 95°C, 30 sec; 70°C, 60 sec with the Applied Biosystems, Inc. Taq Dye Primer Cycle Sequencing Core Kit protocol.) Some sequencing reactions were performed on an ABI robotic workstation (Cathcart, Nature 347: 310 (1990), hereby incorporated by reference).
RESULTS:
Single-run DNA sequence data were obtained from 609 randomly chosen cDNA clones. The number of clones sequenced from each library is summarized in Table 1. Double-stranded cDNA clones in the pBluescript vector were sequenced by a cycle sequencing protocol with dye-labeled primers and an Applied Biosystems, Inc. 373A DNA sequencer. The average length of usable sequence was 397 bases, with a standard deviation of 99 bases.
Subtractive hybridization has been used successfully to reduce the population of highly represented sequences in a cDNA library by selectively removing sequences shared by another library (Schmid and Girou, Neurochem. 48: 307 (1987); Fargnoli et al, Anal. Biochem. 187: 364 (1990); Duguid and Dinauer, Nucl. Acids Res. 18: 2789 (1990); Schweinfest et al, Genet. Anal. Techn. Appl. 7: 64 (1990); Travis and Sutcliffe, Proc. Natl. Acad. Sci. USA 85: 1696 (1988); Kato, Eur. J. Neurosci. 2: 704 (1990)). Subtractive hybridization was therefore tested as a way of enhancing the number of brain-specific clones in the hippocampus library by hybridizing the hippocampus library with a WI38 human lung fibroblast cell line cDNA library and removing the common sequences (Schweinfest et al, Genet. Anal. Techn. Appl. 7: 64 (1990); Sive and St. John, Nucl. Acids Res. 16: 10937 (1988)). Clones from this subtraction are listed in the column "Hippocampus Subtracted" in Table 1.
The EST sequences from this Example 1 are identified as SEQ ID NOs 1-315.
TABLE 1. cDNA Library Composition Determined by Random Clone Sequencing
(No. = number of clones; % = percent of the clones sequenced from that cDNA library)
EST Category                    Hippocampus    Hippocampus Subtracted    Fetal Brain    Temporal Cortex
                                No.    %       No.    %                  No.    %       No.    %
Database match, human:
  Mitochondrial genes           48     12.8    10     8.6                3      7.9     6      7.5
  Repeats: Alu, LINE-1, etc.    39     10.4    14     12.2               6      15.8    0      0
  Ribosomal RNA                 10     2.7     7      6.0                0      0       11     13.8
  Other nuclear genes           32     8.6     7      6.0                4      10.5    0      0
Database match, other           32     8.6     7      6.0                5      13.2    4      5.0
No database match               160    42.8    44     37.9               20     52.6    6      7.5
PolyA insert                    53     14.1    24     20.7               0      0       27     33.7
No insert                       1      0.3     3      2.6                0      0       26     32.5
EXAMPLE 2
Sequencing of Additional ESTs: Second set
Over 2600 additional cDNA clones have been isolated, partially sequenced and screened. The clones were isolated from four human brain cDNA libraries. The new sequences thus discovered, together with the 315 brain ESTs from Example 1, correspond to over 2400 new human genes. These data represent an approximate doubling of the number of human genes identified by DNA sequencing.
Specifically, four cDNA libraries were used as sources of clones for sequencing. Human hippocampus and fetal brain libraries, plasmid template preparation, sequencing reactions, and automated sequencing were performed as described (Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B., Moreno, R.F., Kerlavage, A.R., McCombie, W.R., & Venter, J.C. Science, 252: 1651-56 (1991)). A pooled probe consisting of inserts from 10 different EST clones with sequences that matched either mitochondrial genes or the 18S or 28S ribosomal RNAs was used to prescreen a gridded filter array of the hippocampus library; nonhybridizing clones are referred to as the "prescreened library". Another fetal brain library was constructed by, and was a gift from, Bento Soares (Columbia University). A directionally-cloned library was prepared using the method of Rubenstein, et al. (Rubenstein, J., Elizabeth, A., Brice, A., Ciaranello, R., Denney, D., Porteus, M. & Usdin, T. Nucl. Acids Res. 18: 4833-4842 (1990)) using human adult brain mRNA purchased from Clontech (Palo Alto, CA; Catalogue # 6516-1). Of 482 clones analyzed by restriction enzyme digestion, 33% contained inserts at least 1500 base pairs in length. Stratagene hippocampus and fetal brain library totals include data from Adams et al Science 252: 1651.
Sequences of nuclear-encoded cDNAs that did not include interspersed repeats (Schmid, C. W. & Jelinek, W. R. Science 216: 1065-1070 (1982); Paulson, K. E., Deka, N., Schmid, C. W., Misra, R., Schlinder, C. W., Rush, M. G., Kadyk, L., & Leinwand, L. Nature 316: 359-361 (1985); Fanning, T. G. & Singer, M. F. Biochim. Biophys. Acta 910: 203-212 (1987)) were searched against all of GenBank and, in 6-frame translation, against a comprehensive, non-redundant peptide database using the network BLAST server (Altschul, S. F., Gish, W., Miller, W., Myers, E.W., & Lipman, D. J. J. Mol. Biol. 215: 403-410 (1990)) at the National Center for Biotechnology Information. BLAST output was parsed, and an interactive alignment editor was used to select which matches, if any, from each search to record in a relational EST database, which was developed to track sequencing, identification, tissue localization, physical mapping, and the public distribution of the clones, mapping data, and sequence data. For significant similarities, a putative gene name and a Protein Information Resource (PIR) gene family identification (Barker, W., George, D., Hunt, L., & Garavelli, J. Nucl. Acids Res. 19 (Suppl): 2231-2236 (1991)) were assigned to the EST. ESTs without significant matches using BLAST were searched in translation against PIR using FASTA; ten additional marginal matches were found. A total of 2300 new EST sequences comprising 765,505 nucleotides from the current data set have been submitted to GenBank and assigned accession numbers M77851-M79278 and M85308-M86179. All ESTs except those multiply representing actin, tubulin, and myelin basic protein clones were submitted. ATCC accession numbers of cDNA clones from which ESTs were derived are 77501-78999 and 81000-81756. The Genome Data Base expressed D-segment numbers for these clones are DOS1E - DOS2422E. The ESTs from this Example are identified herein as SEQ ID NOs 316-2407.
EXAMPLE 3
EST Characterization: First Set
ESTs including SEQ ID NOs 1-315 were analyzed as follows. Initially, the EST sequences were examined for similarities in the GenBank nucleic acid database (GenBank Release 65.0), the Protein Information Resource (PIR) Release 26.0, and ProSite (Release 5.0, searched with MacPattern from the EMBL Data Library; Fuchs, R. Comput. Appl. Biosci. 7: 105 (1990)). BLAST was used to search GenBank and PIR (both maintained by the National Center for Biotechnology Information). ESTs without exact GenBank matches were translated in all six reading frames, and each translation was compared with the protein sequence database PIR and the ProSite protein motif database. Comparisons with the ProSite motif database were done by means of the program MacPattern from the EMBL Data Library. GenBank and PIR searches were conducted with the "basic local alignment search tool" programs for nucleotide (BLASTN) and peptide (BLASTX) comparisons (Altschul et al, J. Mol. Biol. 215: 403 (1990)). PIR searches were run on the National Center for Biotechnology Information BLAST network service. The BLAST programs contain a very rapid database-searching algorithm that searches for local areas of similarity between two sequences and then extends the alignments on the basis of defined match and mismatch criteria. The algorithm does not consider potential gaps to improve the alignment, thus sacrificing some sensitivity for a 6- to 80-fold increase in speed over other database-searching programs such as FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85: 2444 (1988)).
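The six-frame translation step applied to unmatched ESTs can be sketched in a few lines of Python. This sketch uses the standard genetic code (NCBI translation table 1, codons enumerated in TCAG order) and marks stop codons with `*` and codons containing ambiguous bases with `X`; the function names are illustrative, and the subsequent BLASTX, PIR, and ProSite comparisons are not reproduced.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI translation table 1), codons ordered TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate whole codons; '*' marks stops, 'X' ambiguous codons."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")["+1"]` gives `"MA*"`; each of the six strings would then be searched against the protein and motif databases.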
Sequence similarities identified by the BLAST programs were considered statistically significant with a Poisson P-value less than 0.01. The Poisson P-value is the probability of as high a score occurring by chance, given the number of residues in the query sequence and the database. After the BLASTN search, 30 unmatched ESTs were compared against GenBank by FASTA to determine whether significant matches were missed due to the use of BLASTN for the database search. No additional statistically significant matches were found. Statistical significance does not necessarily mean functional similarity; some of the reported matches may indicate the presence of a conserved domain or motif, or simply a common protein structure pattern. Those ESTs identified as fully corresponding to known human genes or proteins are not included in this disclosure. Statistically significant matches are reported in Table 2, together with the length and percent identity or similarity of each alignment.
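The significance threshold can be illustrated with the Karlin-Altschul statistics underlying BLAST. For a single high-scoring segment pair, the expected number of chance hits with score at least S is E = K·m·n·e^(-λS), and treating the count of such hits as Poisson-distributed gives P = 1 - e^(-E). The Python sketch below is illustrative: K and λ are placeholder values (the real ones depend on the scoring system), and BLAST's sum statistics for multiple segment pairs are not modeled.

```python
import math


def expected_hits(score: float, m: int, n: int, k: float, lam: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S) for a
    query of length m searched against a database of n residues."""
    return k * m * n * math.exp(-lam * score)


def poisson_p_value(e_value: float) -> float:
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    return 1.0 - math.exp(-e_value)


def is_significant(score, m, n, k, lam, threshold=0.01) -> bool:
    return poisson_p_value(expected_hits(score, m, n, k, lam)) < threshold


# Placeholder parameters for illustration; not values used in this work.
K, LAMBDA = 0.1, 0.3
print(is_significant(100, m=400, n=10_000_000, k=K, lam=LAMBDA))  # True
print(is_significant(50, m=400, n=10_000_000, k=K, lam=LAMBDA))   # False
```

For small E the two quantities nearly coincide (P ≈ E), which is why the P < 0.01 cut-off behaves much like an expectation threshold of 0.01.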
On the basis of database searches, 609 EST sequences were classified into eight groups as shown in Table 1 (see Example 1 above). Four groups, with 197 or 32% of the sequences, consist of matches to human sequences: repetitive elements, mitochondrial genes, ribosomal RNA genes, and other nuclear genes. Forty-eight (8%) of the sequences matched non-human entries in GenBank or PIR while 230 (38%) had no significant matches. The remaining 134 (22%) sequences contained no insert or consisted entirely of polyA between the EcoRI cloning sites.
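The four summary groups can be recomputed directly from the rows of Table 1 as a consistency check. In this Python sketch, the row and group names are paraphrases chosen for illustration; the counts are copied from Table 1, in the order hippocampus, hippocampus subtracted, fetal brain, temporal cortex.

```python
from typing import Dict, List, Tuple

# Table 1 rows: (hippocampus, hippocampus subtracted, fetal brain, temporal cortex)
TABLE_1: Dict[str, Tuple[int, int, int, int]] = {
    "mitochondrial genes": (48, 10, 3, 6),
    "repeats": (39, 14, 6, 0),
    "ribosomal RNA": (10, 7, 0, 11),
    "other nuclear genes": (32, 7, 4, 0),
    "database match, other": (32, 7, 5, 4),
    "no database match": (160, 44, 20, 6),
    "polyA insert": (53, 24, 0, 27),
    "no insert": (1, 3, 0, 26),
}

# The four summary groups used in the text, as lists of Table 1 rows.
GROUPS: Dict[str, List[str]] = {
    "human sequence match": ["mitochondrial genes", "repeats",
                             "ribosomal RNA", "other nuclear genes"],
    "non-human match": ["database match, other"],
    "no significant match": ["no database match"],
    "no insert or polyA only": ["polyA insert", "no insert"],
}


def group_totals(table, groups) -> Dict[str, int]:
    return {name: sum(sum(table[row]) for row in rows)
            for name, rows in groups.items()}


totals = group_totals(TABLE_1, GROUPS)
grand_total = sum(totals.values())
for name, count in totals.items():
    print(f"{name}: {count} ({100 * count / grand_total:.0f}%)")
# human sequence match: 197 (32%)
# non-human match: 48 (8%)
# no significant match: 230 (38%)
# no insert or polyA only: 134 (22%)
```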
Thirty-six ESTs matched previously sequenced human nuclear genes with more than 97% identity. Four of these ESTs are from genes encoding enzymes involved in energy metabolism: ADP/ATP translocase, aldolase C, hexokinase, and phosphoglycerate kinase. Human homologs of the genes for the bovine mitochondrial ATP synthase Fo β-subunit and porcine aconitase were also found (Table 2). Brain-specific cDNAs included synaptophysin, glial fibrillary acidic protein (GFAP), and neurofilament light chain. At least six ESTs are from genes encoding proteins involved in signal transduction: 2',3'-cyclic nucleotide 3'-phosphodiesterase (2 ESTs), calmodulin, c-erbA-α-2, Gsα, and Na+/K+ ATPase α-subunit. Other ESTs matched genes for ubiquitous structural proteins: actins, tubulins, and fodrin (non-erythroid spectrin). ESTs also document the presence in the hippocampus cDNA library of the ret proto-oncogene, the ras-related gene rhoB, and one of the chromosome 22 breakpoint cluster region transcripts. Eight ESTs are from genes known to be associated with genetic disorders (Online Mendelian Inheritance in Man). More than half of the human genes matched by ESTs from Example 1 had already been mapped to chromosomes, indicating the bias of GenBank entries toward well-studied genes and proteins.
ESTs without significant GenBank matches were also compared to the ProSite database of recognized protein motifs. Not counting post-translational-modification signatures, fifty-four sequences contained motifs from the database. Some patterns, particularly the "leucine zipper", are found in scores or hundreds of proteins that do not share the functional property implied by the presence of the motif.
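The six-frame translation and motif scan described above can be sketched in a few lines. This is not MacPattern; it is a minimal illustration, assuming the standard genetic code and supporting only a subset of PROSITE pattern syntax (x, x(n), x(n,m), [..], {..}, < and >). The example pattern is the leucine-zipper pattern L-x(6)-L-x(6)-L-x(6)-L. Treating x as "any residue except a stop" is a design choice made here so that a motif cannot span a stop codon in a translated EST.

```python
import re

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ..., GGG).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + n]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for n, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(CODONS.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translations of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    return {f"{sign}{f + 1}": translate(s[f:])
            for sign, s in (("+", dna), ("-", rev)) for f in range(3)}


# One PROSITE element: optional '<', residue / x / [..] / {..}, optional (n) or (n,m), optional '>'.
ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern):
    """Convert a subset of PROSITE pattern syntax to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            core = "[^*]"                    # any residue, but never across a stop codon
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "*]"  # excluded residues (and stops)
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


def scan_six_frames(dna, pattern):
    """Return (frame, protein offset, matched peptide) for every motif hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(frame, m.start(), m.group())
            for frame, protein in six_frames(dna).items()
            for m in regex.finditer(protein)]
```

A pattern as permissive as this leucine-zipper pattern will also hit many unrelated translations, which is why the text cautions that a motif match alone does not imply shared function.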
Similarities to sequences from other organisms were also detected in the BLAST searches of GenBank and PIR (Table 2).
Several ESTs displayed similarity to "housekeeping" genes, including the genes for rat ribosomal proteins S10 and L30 and the glycolytic enzymes noted above. EST00257 (SEQ ID NO: 77) shows strong nucleotide sequence similarity to the squid (67%) and Drosophila (70.4%) kinesin heavy chain. Kinesin was first described as a microtubule-associated motor protein involved in organelle transport in the squid giant axon (Vale et al., Cell 42: 39 (1985)). Six oncogene-related sequences were also among the cDNA clones sequenced. EST00299 (SEQ ID NO: 180) and EST00283 (SEQ ID NO: 271) show similarity to several ras-related genes, and EST00248 (SEQ ID NO: 102) matched the 3' untranslated region of the bovine substrate of botulinum toxin ADP-ribosyltransferase. Similarities with an S. cerevisiae RNA polymerase subunit and a Torpedo electromotor neuron-associated protein were also observed. Two ESTs may represent new members of known human gene families: EST00270 matched the three β-tubulin genes with 88-91% identity, and EST00271 (SEQ ID NO: 248) matched α-actinin with 85% identity at the nucleotide level.
Among the most interesting of the primary sequence relationships was the similarity of ESTs to the Drosophila genes Notch and Enhancer of split. Nucleotide and peptide alignments of EST00256 (SEQ ID NO: 188) and EST00259 (SEQ ID NO: 227) with the Drosophila genes have been demonstrated. Both genes are part of a signal cascade encoded by the "neurogenic" genes, which are involved in the differentiation of neuronal and epidermal cell lineages in the neuroectoderm of the developing Drosophila embryo (Campos-Ortega, Trends in Neuro. Sci. 11: 400 (1988)). It has been proposed that the Enhancer of split protein interacts with a membrane protein, the product of the Notch gene, to convert a developmental signal into an altered pattern of gene expression (id. J. Mol. Biol. 215: 403 (1990)). EST00256 (SEQ ID NO: 188) matches near the 5' end of the Enhancer of split coding sequence, away from the elements resembling the mammalian G protein β subunit and yeast cdc4 (Hartley et al., Cell 55: 785 (1988); Klambt et al., EMBO J. 8: 203 (1989)). Part of the EST00259 (SEQ ID NO: 227) match to Notch lies in the cdc10/SWI6 region, which is similar to three cell-cycle control genes in yeast and is tightly conserved in the Xenopus Notch homolog, Xotch. In Drosophila, Enhancer of split is absolutely required for formation of epidermal tissue. Notch contains several epidermal growth factor-like repeats and appears to play a general role in cell-cell communication during development (Banerjee and Zipursky, Neuron 4: 177 (1990)).
Seven genes were represented by more than one EST.
Comparisons of all the ESTs against one another revealed two overlaps of unknown ESTs: EST00233 (SEQ ID NO: 32) and EST00234 (SEQ ID NO: 8) match in opposite orientations, and EST00235 (SEQ ID NO: 204) and EST00236 (SEQ ID NO: 148) match in the same orientation beginning at the same nucleotide. Five human genes were represented by more than one EST: β-actin (3), γ-actin (2), α-tubulin (2), α-2-macroglobulin (2), and 2',3'-cyclic nucleotide 3'-phosphodiesterase (2). The few instances in which two or more ESTs represent different portions of a single cDNA can be readily ascertained when the sequence of the full cDNA insert is determined in accordance with Example 13.
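The all-against-all comparison, and the distinction between overlaps in the same and in opposite orientations, can be sketched by counting shared k-mers between one EST and either the other EST or its reverse complement. This is an assumed, simplified method for illustration; the text does not say which program was used, and the k-mer length and minimum shared count are hypothetical parameters.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlap_orientation(a, b, k=20, min_shared=5):
    """Return ('same' | 'opposite', shared k-mer count), or (None, 0) if no overlap."""
    ka = kmers(a, k)
    same = len(ka & kmers(b, k))
    opposite = len(ka & kmers(revcomp(b), k))
    if max(same, opposite) < min_shared:
        return None, 0
    return ("same", same) if same >= opposite else ("opposite", opposite)


def all_pairwise_overlaps(ests, **kwargs):
    """Compare every EST with every other one; ests maps name -> sequence."""
    names = sorted(ests)
    found = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            orientation, shared = overlap_orientation(ests[x], ests[y], **kwargs)
            if orientation:
                found.append((x, y, orientation, shared))
    return found
```

An "opposite" result corresponds to the EST00233/EST00234 case (one clone read from the other strand), while a "same" result corresponds to the EST00235/EST00236 case.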
Example 4
EST Sequences Characterization: Second Set
The ESTs of Example 2, including SEQ ID NOs 316-2407, were screened against known sequences listed in GenBank and other databases, as in Example 3. The results are reported in Table 2. The quality of the match is given as percent identity and length, in base pairs for nucleotide matches and in amino acid residues for peptide matches. In many cases ESTs match multiple domains on several related proteins; for example, EST00825 matches two transmembrane domains on both the GABA and norepinephrine transporters. Nucleotide databases are GenBank (GB) and EMBL (E); peptide databases are GenPept (GPU), Swiss-Prot (SP), and PIR.
The great majority (83%) of the partial cDNA sequences reported in Example 2 are unrelated to any sequences previously described in the literature. Based on database matches to known genes from humans as well as from such evolutionarily distant organisms as E. coli, yeast, C. elegans, Drosophila, barley, Arabidopsis, rice, and green algae, we have preliminarily identified the functional type of a number of the ESTs (Table 2). These include a novel gene similar to Notch/Tan-1 (Adams et al., supra), a new neurotransmitter transporter gene, and a new member of the multi-drug resistance gene family. Several genes involved in development or cell differentiation in Drosophila are represented by similar human ESTs, including seven in absentia (Carthew, R. & Rubin, G. Cell 63: 561-577 (1990)), big brain (bib) (Rao, Y., Jan, L., & Jan, Y. Nature 345: 163-167 (1990)), the discs-large tumor suppressor (Woods, D. & Bryant, P. Cell 66: 1-20 (1991)), and the homeotic gene orthodenticle (Finkelstein, R., Smouse, D., Capaci, T., Spradling, A. & Perrimon, N. Genes Dev. 4: 1516-1527 (1990)). New members of gene families previously known in humans include a Ca2+-transporting ATPase, an ADP-ribosylation factor, and a new neural cell adhesion molecule gene.
The 1971 ESTs without a putative identification were analyzed using the coding-region prediction program CRM via the GRAIL server (Uberbacher, E. & Mural, R. Proc. Natl. Acad. Sci. USA 88: 11261-5 (1991)). Fifteen percent of the unknown ESTs scored an excellent probability of containing protein-coding sequence. Fifty percent of the ESTs matching known human genes contain protein-coding sequences; therefore, at most half of the unknown ESTs are likely to contain coding sequences. We have found no evidence that genomic DNA or cDNA to unspliced precursor RNA is a major contaminant of either the hippocampus or the fetal brain library.
Table 2: ESTs Identified by Database Matches
SEQ ID EST# Putative Identification Accession DB Len %ID
----- -------- --------------------------------------------------------- -------------- --- ---- -----
208 EST00250 60K filarial antigen A28209 PIR 108 56.9
2320 EST01784 60K filarial antigen A28209 PIR 88 50.6
969 EST01982 ADP-ribosylation factor 1 B33283 PIR 84 41.2
1834 EST01620 AMP deaminase, brain A37056 PIR 57 100.0
97 EST00289 Aconitase A35544 PIR 105 90.6
251 EST00370 Actin, other S10021 PIR 44 51.1
248 EST00271 Actinin, alpha HUMACTAR GB 271 85.3
891 EST01891 Actinin, alpha HUMACTAR GB 315 81.6
1500 EST02538 Actinin, alpha HUMACTAR GB 271 75.0
132 EST00110 Agrin RATAGR GB 269 82.2
1852 EST01625 Agrin RATAGR GB 103 84.6
1094 EST02113 Ala HUMALA GB 92 82.8
691 EST00675 Alcohol dehydrogenase RICGOS2G_1 GPU 38 59.0
2408 EST00244 Amyloid A4 HUMAFPA4 GB 135 91.9
1965 EST01664 Amyloid A4 A29030 PIR 52 54.7
2068 EST01694 Amyloid A4 QRHUA4 PIR 83 69.0
2092 EST01700 Anion exchanger homolog AE3 A33638 PIR 95 97.9
1880 EST01634 Axonal glycoprotein TAG-1 A34695 PIR 69 87.1
1492 EST02530 B cell-specific Mo-MLV integration site 1 (bmi-1) MUSBMl1A GB 111 87.5
1277 EST02306 Bib protein S09699 PIR 57 53.4
13 EST00255 Cadherins CADN$HUMAN SP 41 45.2
1348 EST02378 cAMP-dependent protein kinase inhibitor MUSPKI GB 234 91.5
1931 EST01041 cAMP-regulated phosphoprotein B35308 PIR 21 86.4
1413 EST02447 cAMP-specific phosphodiesterase HUMPDEAA GB 363 69.0
396 EST01443 CDP-diacylglycerol-serine O-phosphatidyltransferase JH0368 PIR 33 41.2
1956 EST01663 Ca2+-transporting ATPase 2 B28065 PIR 125 88.9
1126 EST02146 Calbindin D28 RATCALBD28 GB 81 87.8
1039 EST02055 Calcium channel S05054 PIR 33 67.6
1910 EST01645 Calmodulin RATRCM1 GB 120 90.1
485 EST01466 Calmodulin-dependent protein kinase, type II, beta A26464 PIR 93 98.9
913 EST01913 Clathrin coat assembly protein AP50 homolog YSCYAP54_1 GPU 62 63.5
2004 EST01676 Cofilin PIGCOFIL GB 132 89.5
2400 EST01824 Cysteine-rich intestinal protein GYRTI PIR 56 66.7
1588 EST02633 D22Z3 repetitive DNA HUMREP GB 160 76.4
2192 EST01257 Diacylglycerol kinase, lymphocyte S09156 PIR 44 42.2
1441 EST02477 Diamine acetyltransferase ATDA$HUMAN SP 74 45.3
650 EST00642 Dilute (myosin heavy chain) MUSDILUTE_1 GPU 27 100.0
2302 EST01779 Discs-large tumor suppressor DRODLGA_1 GPU 53 63.0
188 EST00256 Enhancer of split A30047 PIR 86 58.6
2289 EST01325 Fatty acid synthase RATFAS GB 98 79.8
310 EST00377 Fo ATPase beta subunit mitochondrial BOVMTASB GB 293 85.4
1332 EST02362 GA binding protein, beta subunit MUSGAC_1 GPU 86 90.8
1667 EST00825 Gamma-aminobutyric acid transporter A35918 PIR 26 59.3
2217 EST01738 Gelation factor ABP-280 A37098 PIR 74 80.0
1412 EST02446 Glutamate-aspartate carrier protein JV0092 PIR 57 37.9
1020 EST02034 Glutaminase GLS$RAT SP 34 74.3
1885 EST01639 Histocompatibility antigen modifier 1 A37779 PIR 63 75.0
1495 EST02533 Hypothetical 435K protein JU0319 PIR 43 52.3
2326 EST01791 Inositol 1,4,5-trisphosphate 3-kinase JN0129 PIR 65 68.2
724 EST01529 Interferon-induced 54K protein INI4$HUMAN SP 76 70.1
1035 EST02051 J1 protein MUSJ1PRO GB 362 85.7
1229 EST02258 KUP protein HUMKUPMR_1 GPU 54 36.4
993 EST02007 Kinase 5 protein CHKCEK5_1 GPU 68 94.2
77 EST00257 Kinesin A35075 PIR 57 86.2
78 EST00258 Kinesin A35075 PIR 62 47.6
2245 EST01748 Kinesin A35075 PIR 98 52.5
2282 EST01764 Lamin B receptor A36427 PIR 76 71.4
2173 EST01724 Lon protease JQ0901 PIR 103 41.3
1427 EST02463 Long-chain-fatty-acid-CoA ligase A36275 PIR 36 62.2
313 EST00276 Lysosomal membrane glycoprotein 1 (LAMP-1) A31959 PIR 53 46.3
161 EST00247 MARCKS (myristoylated alanine-rich protein kinase C substrate) BOVMARCKS GB 139 83.6
1386 EST02418 MARCKS homolog MMF52 EU 237 92.4
769 EST00734 MARCKS homolog S08341 PIR 61 40.3
43 EST00371 Maternal G10 protein S05955 PIR 38 92.3
1468 EST02505 Matrin 3 RATMATRIN3 GB 137 93.5
639 EST00632 Membrane transport superfamily (GTP-dependent) A24400 PIR 63 39.1
1894 EST01643 Membrane transport superfamily (GTP-dependent) A24400 PIR 71 50.0
824 EST01865 Microtubule-associated protein 1B RATNEU GB 293 86.4
223 EST00368 Microtubule-associated protein 1B A33645 PIR 30 54.8
2032 EST01683 Microtubule-associated protein 1B A33645 PIR 49 62.0
2017 EST01678 Milk fat globule membrane protein A36479 PIR 48 61.2
1704 EST01580 Myeloid differentiation primary response gene MyD118 MUSMYD118_1 GPU 76 88.3
2226 EST01744 NAD(P)+ transhydrogenase (B-specific) DEBOXM PIR 86 93.1
1567 EST02610 Neural cell adhesion molecule L1 S05479 PIR 82 43.4
506 EST01471 Neuraxin S06017 PIR 120 84.3
1566 EST02609 Neutrophil oxidase factor A34855 PIR 43 47.7
952 EST01961 Notch/Xotch HUMTAN1_1 GPU 85 57.0
227 EST00259 Notch/Xotch A35844 PIR 74 85.3
1395 EST02429 Nuclear factor 1-like protein (NF1) HAMNF1A GB 111 92.0
1681 EST01573 Nucleoside diphosphate kinase A33386 PIR 71 52.8
346 EST01828 Otd homeotic protein A35912 PIR 35 52.8
2254 EST01751 Phosphatidylinositol-4,5-bisphosphate phosphodiesterase A28807 PIR 40 90.2
1869 EST00992 Polymyxin B resistance A32714 PIR 20 76.2
93 EST00287 Processing enhancing protein S03968 PIR 96 58.8
2353 EST01806 Prohibitin RATPROHlBJ_1 GPU 120 97.5
2297 EST01775 Prohormone cleavage enzyme MUSMPC1A_1 GPU 91 93.5
9 EST00376 Prolyl endopeptidase PIGPREP GB 223 83.9
1069 EST02087 Protein kinase C, zeta HUMPKCL GB 382 58.7
1933 EST01650 Protein phosphatase 2A beta subunit HUMPROP2AB GB 288 76.8
202 EST00298 Protein-tyrosine phosphatase LRP LRP$MOUSE SP 62 44.4
1654 EST01572 Protochlorophyllide reductase S04783 PIR 34 57.1
38 EST00374 RNA polymerase II 6th subunit (RP026) A36352 PIR 72 75.3
1478 EST02515 Rab5 F34323 PIR 91 82.6
2368 EST01389 Radial spoke protein 3 S05962 PIR 58 52.5
37 EST00038 ras p21-like small GTP-binding protein (smg GDS) BOVSMGGDS GB 131 89.4
180 EST00299 ras-related proteins S10493 PIR 51 46.1
1700 EST01579 Retrovirus-related gag polyprotein FOHUE2 PIR 95 77.1
1511 EST02550 Retrovirus-related pol polyprotein GNLJGL PIR 50 54.9
102 EST00248 rho H12/ARH12 BOVBGBRH GB 195 79.6
1715 EST01583 Ribosomal protein L18a R5RT18 PIR 68 95.7
1856 EST01627 Ribosomal protein L1a A24579 PIR 75 63.1
1974 EST01667 Ribosomal protein L3 JQ0771 PIR 74 80.0
301 EST00300 Ribosomal protein L30 R6RT30 PIR 57 96.5
22 EST00301 Ribosomal protein S10 R3RT10 PIR 66 97.0
2402 EST01826 Ribosomal protein S10 R3YM10 PIR 36 51.4
463 EST01459 Ribosomal protein YL10 S11581 PIR 40 68.3
1408 EST02442 Seven in absentia A36195 PIR 46 80.8
299 EST00249 smg p25A GDP dissociation inhibitor A35652 PIR 97 77.5
951 EST01960 Spectrin, beta HUMSPTB GB 268 67.7
2089 EST01699 Sperm membrane protein A35981 PIR 52 58.5
2073 EST01697 Succinate dehydrogenase flavoprotein BOVSDHFP1_1 GPU 44 100.0
2138 EST01715 Succinate dehydrogenase flavoprotein BOVSDHFP1_1 GPU 49 92.0
430 EST00472 Synaptotagmin (p65) SY65$HUMAN SP 27 53.6
1371 EST02402 Talin MUSTALINR_1 GPU 79 81.2
1771 EST01601 Thiosulfate sulfurtransferase (rhodanese) ROBO PIR 65 81.8
300 EST00232 Transforming protein (dbl) TVHUDB PIR 25 65.4
189 EST00282 trkB A35104 PIR 33 67.6
653 EST01512 Tubulin, alpha HUMTUBAG GB 223 75.0
594 EST01490 Tubulin, beta HUMTBB5 GB 298 93.6
757 EST01542 Tubulin, beta HUMTUBBM GB 217 90.4
1245 EST02274 Tubulin, beta A26561 PIR 105 88.7
1147 EST021C9 Tyrosine kinase HUMECK GB 384 74.3
1701 EST00853 Unc-104 JN0114 NR 36 45.0
2121 EST01711 Valine-tRNA ligase A29871 PIR 56 57.9
187 EST00152 Wilms' tumor-related protein HUMQM GB 228 99.6
1726 EST01588 XPR2 alkaline extracellular protease B26955 PIR 88 46.1
249 EST00275 Zinc finger proteins S06551 PIR 25 57.7
413 EST01446 Zinc finger proteins S00754 PIR 45 60.9
469 EST01460 Zinc finger proteins C32891 PIR 34 54.3
833 EST01560 Zinc finger proteins S00754 PIR 105 67.0
1230 EST02259 Zinc finger proteins S00754 PIR 71 62.5
1496 EST02534 Zinc finger proteins A34612 PIR 50 45.1
2324 EST01352 Zinc finger proteins S10397 PIR 29 56.7
There is little redundancy in EST sequencing according to the present invention. Of the nuclear-encoded messenger RNAs, the most common ESTs were to the β-actin (0.6% of the EST clones) and myelin basic protein genes (MBP, 0.5% of the clones). MBP, a highly expressed structural component of nerve tissue (Kamholtz, J., de Ferra, F., Puckett, C., & Lazzarini, R. Proc. Natl. Acad. Sci., USA 83: 4962-4966 (1986)), displays four alternative splicing forms, of which at least two are present among the ESTs reported here.
Other common ESTs were Gs-alpha, gamma-actin, and both α- and β-tubulin.
By matching ESTs to known database sequences, a phenotypic characterization of the tissue begins to emerge. Protein superfamilies matched by ESTs were grouped into three broad functional categories to assess the biological spectrum represented by these randomly selected cDNA clones. Structural and metabolic classes comprised about 30% of the ESTs with database matches, 25% were involved in regulatory pathways, and the remainder were not classifiable. Eleven of the eighteen enzymes of glycolysis and the citric acid cycle are represented by at least one subunit or isozyme. In addition, several genes not previously known to be expressed in the brain were matched, including spermine/spermidine acetyltransferase (Casero, R., Celano, P., Ervin, S., Applegren, N., Wiest, L. & Pegg, A. J. Biol. Chem. 266: 810-814 (1991)) and osteopontin (Young, M., Kerr, J., Termine, J., Wewer, U., Wang, M., McBride, W. & Fisher, L. Genomics 7: 491-502 (1990)).
EXAMPLE 5
Mapping of ESTs to Human Chromosomes
Randomly selected ESTs, listed by SEQ ID NO in Table 3, were assigned to chromosomes via PCR.
Oligonucleotide primer pairs were designed from EST sequences to minimize the chance of amplifying through an intron. The oligonucleotides were 18-23 bp in length and were designed for PCR amplification using the computer program INTRON (National Institute of Mental Health, Bethesda, MD). The program is based on three assumptions: 1) introns are genomic sequences that interrupt the coding and noncoding sequences of genes (Smith, J. Mol. Evol. 27: 45-55 (1988)); 2) there are consensus sequences for splice junctions (Shapiro et al., Nucl. Acids Res. 15: 7155-7174 (1987)); and 3) in 90% of the human genes studied, the 3' untranslated region of the mRNA is not interrupted by introns in the genomic DNA (Hawkins, Nucl. Acids Res. 16: 9893-9908 (1988)).
The program evaluates the likelihood that a given GG or CC dinucleotide represents a former exon-intron boundary. Specifically, every input strand is processed by the INTRON program twice, first evaluating the sense mRNA strand and then the complementary (antisense) strand. The program evaluates each sequence by finding all GG or CC pairs (possible former splice sites), searching for stop codons in all three reading frames, and analyzing the GG or CC pairs surrounded by stop codons. All regions of the EST that are unlikely to contain splice junctions, based on CC content, GG content, and stop codon frequency, are then marked by the program in uppercase.
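The INTRON scoring rules themselves are not given here, so the following is only a hypothetical sketch of the kind of scan described: slide a window along the EST, count GG/CC dinucleotides, check for stop codons in all three frames on both the sense and antisense strands, and uppercase windows that look like untranslated, intron-free sequence. The window size, the GG/CC threshold, and the requirement of stops on both strands are assumptions chosen for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def stop_frames(seq):
    """Reading frames (0, 1, 2) in which seq contains at least one stop codon."""
    return {i % 3 for i in range(len(seq) - 2) if seq[i:i + 3] in STOPS}


def gg_cc_count(seq):
    """Count GG and CC dinucleotides (candidate former splice sites)."""
    return sum(seq[i:i + 2] in ("GG", "CC") for i in range(len(seq) - 1))


def mark_low_splice_risk(est, window=40, max_gg_cc=4):
    """Uppercase windows judged unlikely to span a former splice junction.

    A window qualifies if it has stop codons in all three frames on both the
    sense and antisense strands and few GG/CC pairs. The thresholds are
    hypothetical; the actual INTRON scoring rules are not given in the text.
    """
    seq = est.upper()
    out = list(est.lower())
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        antisense = w.translate(COMPLEMENT)[::-1]
        if (gg_cc_count(w) <= max_gg_cc
                and len(stop_frames(w)) == 3
                and len(stop_frames(antisense)) == 3):
            out[start:start + window] = w  # replace lowercase chars with uppercase window
    return "".join(out)
```

Primers would then be chosen from the uppercase stretches, in line with the rationale that stop-codon-rich, splice-site-poor sequence is more characteristic of 3' untranslated regions than of exon-intron boundaries.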
The creation of PCR primers from known sequences is well known to those with skill in the art. For a review of PCR technology see Erlich, H.A., PCR Technology, Principles and Applications for DNA Amplification. 1992. W.H. Freeman and Co., New York. ESTs were examined for the presence of stop codons in each reading frame and for consensus splice junctions. The presence of stop codons and absence of splice junction sequences are more characteristic of 3' untranslated sequences than of introns. The untranslated sequences are unique to a given gene; thus, primers from these regions are less likely to prime other members of a gene family or pseudogenes.
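As a small illustration of routine primer checks, the sketch below verifies that a primer falls in the 18-23 nucleotide range used here and reports its GC fraction and a rough melting temperature. The Wallace rule (2 degrees C per A/T plus 4 degrees C per G/C) is a common rule of thumb assumed for this example; the text does not say how melting temperatures were estimated. The handling of the "AL2-" prefix reflects the Amino-Link-2 tag shown in Table 3, which is not part of the primer sequence.

```python
def gc_fraction(primer):
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rule-of-thumb melting temperature: 2 degC per A/T plus 4 degC per G/C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer(primer, min_len=18, max_len=23):
    """Summarize a primer; an 'AL2-' tag prefix (as in Table 3) is not a base."""
    seq = primer.upper().replace(" ", "")
    if seq.startswith("AL2-"):
        seq = seq[4:]
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"not a plain DNA primer: {primer!r}")
    return {
        "length": len(seq),
        "length_ok": min_len <= len(seq) <= max_len,
        "gc_fraction": round(gc_fraction(seq), 2),
        "tm_wallace_C": wallace_tm(seq),
    }


if __name__ == "__main__":
    # Primer #1 for EST00012 from Table 3.
    print(check_primer("TCCAGGCAATCCCAGAATAG"))
```

For the EST00012 primer this gives a 20-mer with 50% GC and a Wallace estimate of 60 degrees C, comfortably above the 55 degrees C annealing step used in the PCR program.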
The primers were used in polymerase chain reactions (PCR) to amplify templates from total human genomic DNA. PCR conditions were as follows: 60 ng of genomic DNA was used as a template for PCR with 80 ng of each oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 µCi of 32P-labeled deoxycytidine triphosphate. The PCR was performed in a microplate thermocycler (Techne) under the following conditions: 30 cycles of 94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min; with a final extension at 72°C for 10 min. The amplified products were analyzed on a 6% polyacrylamide sequencing gel and visualized by autoradiography. If the size of the resulting product matched the size expected from the EST from which the primers were derived, the PCR reaction was repeated with DNA templates from two panels of human-rodent somatic cell hybrids: BIOS PCRable DNA (BIOS Corporation) and NIGMS Human-Rodent Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden, NJ).
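The cycling program above amounts to a simple time budget. The sketch below totals the hold times only; ramp times between temperatures are ignored, which is an assumption, since they depend on the instrument.

```python
# Hold times from the cycling program in the text, in seconds (ramp times ignored).
CYCLES = 30
CYCLE_STEPS = [("denature", 94, 84), ("anneal", 55, 120), ("extend", 72, 120)]  # 1.4 min = 84 s
FINAL_EXTENSION_S = 600  # 72 degC for 10 min


def total_hold_seconds(cycles, steps, final_s):
    per_cycle = sum(seconds for _name, _temp_c, seconds in steps)
    return cycles * per_cycle + final_s


total = total_hold_seconds(CYCLES, CYCLE_STEPS, FINAL_EXTENSION_S)
print(f"{total} s = {total // 60} min of hold time")  # 10320 s = 172 min of hold time
```

Each cycle holds for 324 s (5.4 min), so the 30 cycles plus the final extension take just under three hours before ramping is added.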
PCR was used to screen a series of somatic cell hybrid cell lines containing defined sets of human chromosomes for the presence of a given EST. DNA was isolated from the somatic hybrids and used as starting templates for PCR reactions using the primer pairs from the EST sequences selected above. Only those somatic cell hybrids with chromosomes containing the human gene corresponding to the EST will yield an amplified fragment. ESTs were assigned to a chromosome by analysis of the segregation pattern of PCR products from hybrid DNA templates. For a review of techniques and analysis of results from somatic cell gene mapping experiments, see Ledbetter et al., Genomics 6: 475-481 (1990). The single human chromosome present in all cell hybrids that give rise to an amplified fragment represents the chromosome containing that EST. The assignment of 100 ESTs and corresponding genes to chromosomes by PCR is shown in Table 3.
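The segregation analysis can be expressed as a concordance test: a chromosome is a candidate if it is present in every PCR-positive hybrid and absent from every PCR-negative one. The sketch below uses a hypothetical four-hybrid panel; real panels, including those named above, have their own chromosome content. Requiring absence from negative hybrids is slightly stricter than the wording in the text and is an assumption made here; the discordance fraction is included because real data are rarely perfectly concordant.

```python
def concordant_chromosomes(panel, pcr_positive):
    """Chromosomes whose presence/absence matches the PCR result in every hybrid.

    panel: hybrid name -> set of human chromosomes retained by that hybrid.
    pcr_positive: hybrid name -> True if the EST primers gave the expected product.
    """
    chromosomes = set().union(*panel.values())
    return sorted(c for c in chromosomes
                  if all((c in panel[h]) == pcr_positive[h] for h in panel))


def discordance(panel, pcr_positive):
    """Fraction of hybrids in which each chromosome disagrees with the PCR result."""
    chromosomes = set().union(*panel.values())
    return {c: sum((c in panel[h]) != pcr_positive[h] for h in panel) / len(panel)
            for c in chromosomes}


if __name__ == "__main__":
    # Hypothetical hybrid panel and PCR results, for illustration only.
    panel = {"H1": {"1", "6", "X"}, "H2": {"6", "12"}, "H3": {"1", "12"}, "H4": {"12", "X"}}
    pcr = {"H1": True, "H2": True, "H3": False, "H4": False}
    print(concordant_chromosomes(panel, pcr))  # ['6']
```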
Table 3: Assignment of ESTs to Chromosomes by PCR
SEQ ID EST# Chr PRIMER #1 PRIMER #2
5 EST00012 1 TCCAGGCAATCCCAGAATAG CTAATTGAGCTCACTGGCCC
57 EST00058 1 CTGTTTGCAAGTTTCAAAGC GCCATTTCTAACAACCAGAG
64 EST00066 1 GCCATTGTGCTGAATAGAGT GTTAGTGTTTCCTTAGCAAG
83 EST00079 1 CAGCTAATTGACCTGGGCTA CAACATGCTCTGAGCTTTAG
83 EST00079 1 GGCAGAGCATAATGAGTATA CATATGCATATGGTCCCTAT
91 EST00086 1 AGTTTAGATGGAGGGCTGTC TCTGCCCTAATGCGCAGGCT
105 EST00365 1 CTTAATCACCTCCCTTTTGT CCTTAGTTGGAGATAAGGTC
109 EST00095 1 AGTCTAATCCTGTACACTTG CGGGCTTTCTCTGAATTGGT
116 EST00100 1 TTAGAAGTGCCCATGGGAGG TTTTAAGGCTCTGGAGTGTT
141 EST00118 1 CTCAGAGAAACTTAGGTGAA CTACAGAATCATTTCACCAG
220 EST00372 1 AAGTTGCACATTGCCCAAGG ATAGTACTGCAAGGTTATTC
237 EST00187 1 TTACAAATTTCTCTTGACGC CTGAAGGAGCACAGTTTCTC
242 EST00192 1 GGATCAGATAATCAAACAGG GCTTAGGATATGAATGCATA
259 EST00202 1 GCATCACAGTTTAACTGAGG CTACATATTTGTGCCTCCTT
269 EST00293 1 CTGTTGCTGTGCAGTAGCTT CTTTTGACCCAGTGAAACTT
299 EST00249 1 GATCATGCAGACGTAGATAT CCAACTCCTGCCAGATCATT
1651 EST00810 1 TAGTCGCTGTAAGTTGATTC GCTTTGCTGGATGCTTCATT
16 EST00021 2 CAGGCAAGTTTCTTCCAGGA TCAGACCCATGGTCAGCTT
1898 EST01013 2 GGCTGAGAACGGTTAGCATA CCCTCAGCTTAGGGGAATG
8 EST00234 2 TAGAAGGCAAACTATGTCCC GGTTGAGGATTGGCTTTTAC
36 EST00037 2 AGCCAGAAGGCTGCTTAAAG GCAGTGAACCAGTACTCCTA
123 EST00106 2 GTCTAATTTGTAACCTTCAG GATAGATTGTATAAGAAGCC
192 EST00155 2 GATTTATGTCTGGGAACTAA GCAGCATGTGAAAGAATGAT
200 EST00162 2 TTTAATGGGTGGTGGGAGCT CGATGCACATCCTTCTCCAT
284 EST00216 2 CCTAAGAATTCGTTTGGCTC GTCTGGCACATAATAGATTTG
102 EST00248 3 ATACTACATCTAGTCTGG TTACAGTTCTGTGGTTTC
167 EST00138 3 AAACAGCTGCGGAGTACA AAAGGATCCTCCACTCCAGA
12 EST00274 3 CCTAGCAAACTCATACACAC CATAAGTGAATGGACACAGG
60 EST00062 3 ACACATTAACGGTGCTGCAG GGAATCAGCCCTTGAGGACT
77 EST00257 3 AAGCTCACAACGCAGATCTG CTGGAACAGCTTACAAAGGT
107 EST00093 3 ATTGAACTCTGTCAACAGTG TGTAAAACAAAGGCCAAACT
108 EST00094 3 AL2-GCAGGATGTCAGTCTTTTGAG AGCACACATTATCTACCACGGC
1706 EST00857 3 AL2-GCAGGATGTCAGTCTTTTGAG CCAGCACACATTATCTACCACG
37 EST00038 4 AACTTCGCAGTCATGAGAAC TGTATCGGGCAGTTCTCAG
6 EST00013 4 CACATGTTCTCCCTCTTTCA GCATTTTGGAGCTCTTCCGT
37 EST00038 4 AL2-GGAAGTACAGGATTTGGC TTAGAGATGGGATGATGCCG
31 EST00033 5 TGGGTACCCTAAGGTGTTTG GACTAATCTAAGGTCTAGG
28 EST00030 5 AGATAAGTTAGGAAGCTGGT ACTCACTGCTAGTATCATCC
59 EST00061 5 AAAGTTTCTTAGCACCCCCC CAGACTTTGACAAAAGAATC
74 EST00073 5 ATCAGACACGTGGCAGGGTT AAGTCCCTGAGGGTGCAGAA
121 EST00104 5 TGAAGGCAGCTGCTAAATCT GGATGTATTGATCTGACTCA
149 EST00123 5 ATACTGTCAACGGAGGGTGA GTCTGCAGGTTTCTCCTTGA
235 EST00185 5 TTACTGTCCCATCAGATATC TACACTCTTAAGAAGGTATG
1643 EST00803 5 GAGCGTTTAAAAGAGATTCT TACAGACAGCCATGTTCCAA
1677 EST00835 5 AL2-TCTCCAACACAGTCATGC CGGATGCCATCATATACC
23 EST00026 5 CCTGCAGTGACACTTAACAT CTGCTCACCTGAAATTGATAC
121 EST00104 5 AL2-CAGATCAATACATCCTCTGGG CTGTGCAGTGGTGAGTAAAAGG
1 EST00007 6 TAGTTGATGGTCTGGGTTAT GAAATCCCAGGGAGACAATG
19 EST00023 6 CAACTTACATTAGGGGTTTG GACCTCATTAGAAGAGCCCA
155 EST00129 6 GGAAGCTGCCATATAAGCTC TCAGTGTCGTACAATCTACC
224 EST00356 6 GCTGTATGTTAACCCTTTGT TGGAACCCTCAAACACTGCT
288 EST00219 6 ACTTTCATGTTGAGAAGTAT ATCTAGCTGAAACATTGCTG
1638 EST00798 6 CTTCATCTGTTAACTGTTGA TGAAAATGAGTCACAGGCAG
1675 EST00833 6 AL2-ACCCAGTTCTCAAAGACC GGTTTACCATTCAGAGGC
22 EST00301 6 CTCCGTGATTACCTTCATCT TTGTAGGTATCTCTGTCAGCT
207 EST00167 7 GGTGCTACTTTGTGAATGCT AGCAATGTGATTTTGTAGG
137 EST00272 7 AGTGGTCACTATCTACATGG GATTCAGAATTACTAAGCCG
1659 EST00817 7 TGTATAGGCTCTACATAAAG CTTAATCATGGATTCTTCGT
1680 EST00838 7 AL2-GTTCTTTCCCAGGTATGC TTGTTGGTACTGAGGAAGTOCG
292 EST00223 8 TGCAGCAGTGACCATGAGAA ATCATCTTTCCACGCGGCTT
134 EST00375 9 TCTGGGCTTCTGTGGTTCAA CTGGCTGCTCAGCAACTCAT
1906 EST01021 9 GGATGTTTTCTATGTGACGA TTCCAGTGCCCCTTTTGTCC
1645 EST00804 10 CTCCTTTGGGACAAACAACT CCAACCCAAACATATTCTA
20 EST00024 10 AGCTGTTCCTGAGAGATGCA CCTTGTGAAGAAAGACTTTC
157 EST00131 10 TCAGCAACAGGTCACTTTGG CTAAGCATCTGCATGTCCAG
172 EST00142 10 TACTAGCATTTCTTACTCTC TATGCTGATTGTTTGCACTC
250 EST00197 10 GGTGATTAGAGAGTCTGTTG GAACTCTGTAGTGTTCTAAA
133 EST00111 11 GGAAATTAGGCTTAGCTCAC GTGCAGAATACTTAGAGTCC
178 EST00294 11 GTTTGAAGGAAGTGATTTCC TAGGGCCACCTCCAGTTCAT
10 EST00016 11 GTCTTTGGATTCTACGTAGA CGATAATGACATTTCTTCTGG
126 EST00109 11 AL2-CTAACCACAACCCACACATTG CCTCAGCACAAGAGAAGAATGG
7 EST00014 12 AACTTGCAACATAAATACTAG GAGCAATGATTTCTAACAGT
254 EST00200 13 TTGTGTACTGTCTGATAGAC TAAGCCATGGGCATCTATAA
2409 EST00273 13 GCAAGATGATGGAACATCCC TTCCTTCTGGAGGCTCTACA
170 EST00295 14 GGTGCTTAAGGCCACTTTTG CTTAGAGGATCATAGGTCTG
255 EST00201 14 CCAGGAGAGTAAGAAGATCA GCAGAGTTGAATATGAACCT
290 EST00221 14 GTGCCAAGATGGCTCATGTA GTATAGCTTTAAGCCAGTTC
293 EST00224 14 AATGCATTATGCCTGGTCTT GGAAAAGTCTAGAACTTAGT
1664 EST00822 14 GGGTCAGAATTAAGAGGTCT GTTCATCTCTAACTCCTTTC
315 EST00008 14 AAGCTGGCTGGGAAATGTTC GTCATGCTAGTAAACTTACAC
1689 EST00845 14 AL2-AGGAGGAAGCTGAAATCC GGAAGTCCATAAGAGACTCACC
95 EST00088 15 GTGACAGACCATGTCTATTG AAGTGAGCGATTGCACCTTC
205 EST00165 15 AGGATGACCTGAGTGAGCTG CCATGGCAGCAAGGAACTCT
33 EST00034 16 TGTGTGAAAGGGAGTCTTGT CCATTTTGACTGTTCCATAG
247 EST00279 16 TGGCTAGGGCAGGCCTTAAA GAGAAGAATATCAAATGGGG
18 EST00373 16 CCATCTGTGTCCCAATTAAGC AGGGAAGAAGTCTAGAGCGA
68 EST00068 17 CAAAGACGGGAGACGAATGA AGTGGAACGCGTGGCCTATG
1652 EST00811 17 GAGCTGCATGTTGATAAGTA TTGACTTAAGCTGACCTTAA
1702 EST00854 17 AL2-TTGCTGTGGAATCCATGAGAG GGCAAGTGATCTGTTCTTGG
84 EST00080 19 AGAGATGTCAGTCCATTATC CTATTCCACCTTACTCAAGG
223 EST00368 19 CATCATGTCGGAGACGCATT TGGATGACCTGAGTCTGCAG
21 EST00025 20 AGTTCTGGAGGCTAGGAGTT ATGTAAGGACCCCTAGATGG
210 EST00168 20 TGTCAACTTCCCTTTGGCCT GAAGCTTGCTCATTCAGGAA
136 EST00113 20 AL2-TCGGAGAAGTTGCAGTTTCTG GTTAAAAGCTGTTAGACGGGGC
120 EST00103 22 CACTGACTGACTCCTCTTTA GGAACCGTAACTCTCCATAG
313 EST00276 X ATTGACCTTCAATGTAATAA TTGGATTGGGCAAAATAG
162 EST00133 X ATGTGAGCATCTATACCTGC AATGAAGGCATGAGAATAGG
1669 EST00827 X CGGACAACTAGGATAAATGC TACGCGTTTGAATGGCTTGA
1917 EST01029 X GAATAGCATTATTAGCCAGT GGACCTATTGGAGATCTACT
1708 EST00858 X AL2-AAGGCGAGGATTATGTGC TTCTACTGGGTACACTTCGACC
Abbreviations: AL2, Amino-Link-2 fluorescent tag; Chr, chromosome.
The foregoing techniques have been used to further localize 9 ESTs and their associated genes to precise locations on chromosome 6 or chromosome X, as reflected in Table 4A (in Example 7 below), using sublocalization techniques that employ somatic cell hybrids. ESTs were also used as hybridization probes and mapped to other chromosomes using the techniques disclosed in Example 7.
Somatic cell hybrids were prepared that contained defined subsets of chromosomes 6 and X. Methods for preparing and selecting somatic cell hybrids are known in the art. For a review of an exemplary procedure to generate somatic cell hybrids containing the short arm of human chromosome 6, see Zoghbi et al., Genomics 9(4): 713-720 (1991). For a general review of somatic cell hybridization, see Ledbetter et al. (supra). The hybrids were processed to obtain DNA and analyzed by PCR and by fluorescence in situ hybridization. Using somatic cell hybrids, SEQ ID NOs 19, 22, 1, 224 and 288 were mapped to chromosome 6, and SEQ ID NOs 162, 1917, 1669 and 1899 were mapped to chromosome X.
EXAMPLE 6
Mapping of All ESTs to Human Chromosomes
The procedure of Example 5 is repeated for all of the ESTs from Examples 1 and 2 not previously mapped to human chromosomes. Data corresponding to those in Table 3 are generated for all of the unmapped ESTs. As previously mentioned, virtually all of the ESTs will map to a unique chromosomal location; any EST that fails to localize to a unique location will be readily identified during the mapping process.
Physical mapping of the type reported in Table 4 on all the EST clones reported here would provide human chromosome markers spaced on average every 1.2 megabases and would roughly double the number of expressed sequences that have been localized to chromosomes (McKusick, V. FASEB J. 5: 12-20 (1991)). Mapped ESTs are also a new resource to identify candidates for the estimated 5000 single-locus disease-associated genes (Id.).
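As a rough plausibility check of the 1.2-megabase figure (the source does not show this calculation), assume a haploid human genome of about G = 3 × 10^9 bp and N of roughly 2,400 ESTs (SEQ ID NOs 1-2407) spread evenly along it:

```latex
\bar{d} \approx \frac{G}{N} = \frac{3 \times 10^{9}\ \text{bp}}{2.4 \times 10^{3}} \approx 1.25 \times 10^{6}\ \text{bp} \approx 1.25\ \text{Mb}
```

Actual spacing would be uneven, since expressed genes are concentrated in gene-rich regions of the genome.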
EXAMPLE 7
Alternative Technique for Mapping to Chromosomes
Mapping of ESTs to chromosomes using fluorescence in situ hybridization
This technique was used to map an EST to a particular location on a given chromosome. Cell cultures, tissue, or whole blood were used to obtain chromosomes.
Whole blood (0.5 ml) was added to RPMI 1640 and incubated for 96 hours at 37°C in a 5% CO2 incubator. Colcemid (0.05 µg/ml) was added to the culture one hour before harvest. Cells were collected and washed in PBS. The suspension was incubated with a hypotonic KCl solution, added dropwise to a final volume of 5 ml. The cells were spun down and fixed by resuspending them in methanol and glacial acetic acid (3:1). The cell suspension was dropped onto glass slides and dried.
The slides were treated with RNase A, washed, and then dehydrated in a series of increasing concentrations of ethanol.
The EST to be localized was nick-translated using a fluorescently labeled nucleotide (Korenberg, J.R., et al., Cell 53(3): 391-400 (1988)). Following nick translation, unincorporated label was removed by spin dialysis through Sepharose. The probe was further extracted with phenol-chloroform to remove residual protein. The chromosomes were denatured in formamide using techniques known in the art, and the denatured probe was added to the slides. Following hybridization, the cells were washed. The slides were examined under a fluorescence microscope. In addition, the chromosomes can be stained for G-banding or Q-banding using techniques known in the art. The resulting metaphase chromosomes had fluorescent tags localized to those regions of the chromosome that were homologous to the EST, so that a particular EST was localized to a particular region of a given chromosome. In this manner, SEQ ID NOs 396, 485, 506 and 1880 were mapped by fluorescence in situ hybridization to locations on chromosomes 17, 7, 10 and 1, respectively (see Table 4B below), and SEQ ID NO 1894 was mapped to 6p21 (Table 4A). For a review of the technique, see Verma et al., Human Chromosomes: A Manual of Basic Techniques, Pergamon Press, NY (1988), which is hereby incorporated by reference.
Table 4: Precise Chromosomal Localization of ESTs
SEQ ID   EST#       Map Location
-------------------------------------
A.
19       EST00023   6p
22       EST00301   6p
1894     EST01643   6p21
1        EST00007   6q
224      EST00356   6q
288      EST00219   6q
162      EST00133   Xp11.21-Xp21.2
1917     EST01029   Xp11.21-Xp21.2
1669     EST00827   Xq26-Xq27.1
1899     EST01014   Xq28
B.
1880     EST01634   1q32
485      EST01466   7p13
506      EST01471   10q11.2
396      EST01443   17q25
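The map locations in Table 4 are written in cytogenetic band notation: chromosome, then arm (p or q), then an optional band number, with a hyphen for a range. As an illustration only, the Python sketch below parses those strings and groups the SEQ IDs by chromosome. The pattern `BAND` and the helper names are hypothetical and cover only the simple forms that occur in this table, not the full nomenclature.

```python
import re
from collections import defaultdict

# (SEQ ID, EST#, map location) transcribed from Table 4.
TABLE_4 = [
    (19, "EST00023", "6p"), (22, "EST00301", "6p"), (1894, "EST01643", "6p21"),
    (1, "EST00007", "6q"), (224, "EST00356", "6q"), (288, "EST00219", "6q"),
    (162, "EST00133", "Xp11.21-Xp21.2"), (1917, "EST01029", "Xp11.21-Xp21.2"),
    (1669, "EST00827", "Xq26-Xq27.1"), (1899, "EST01014", "Xq28"),
    (1880, "EST01634", "1q32"), (485, "EST01466", "7p13"),
    (506, "EST01471", "10q11.2"), (396, "EST01443", "17q25"),
]

# Hypothetical pattern: chromosome (1-22, X or Y), arm (p/q), optional band such as 11.21.
BAND = re.compile(r"^(?P<chrom>[0-9]{1,2}|X|Y)(?P<arm>[pq])(?P<band>[0-9]+(?:\.[0-9]+)?)?$")


def parse_location(location):
    """Return one (chromosome, arm, band) tuple per endpoint of a location string."""
    endpoints = []
    for part in location.split("-"):
        match = BAND.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognized band notation: {part!r}")
        endpoints.append((match["chrom"], match["arm"], match["band"]))
    return endpoints


def group_by_chromosome(rows):
    """Map each chromosome to the SEQ IDs localized on it, in table order."""
    groups = defaultdict(list)
    for seq_id, _est, location in rows:
        chromosome = parse_location(location)[0][0]  # first endpoint's chromosome
        groups[chromosome].append(seq_id)
    return dict(groups)
```

For example, `group_by_chromosome(TABLE_4)["X"]` collects the four X-linked entries, SEQ IDs 162, 1917, 1669 and 1899, and `parse_location("6q")` returns a band of `None` because only the arm is given.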
EXAMPLE 8
Automated DNA Sequencing Accuracy
ESTs that match human sequences in GenBank are excellent tools for analyzing the accuracy of double-stranded automated DNA sequencing. Ninety EST/GenBank matches were examined for the number of nucleotide mismatches and gaps required to achieve optimal alignment with the Genetics Computer Group (GCG) program BESTFIT (Devereux et al., Nucleic Acids Research 12:387 (1984)). The numbers of mismatches, insertions and deletions were counted for each hundred bases of sequence (Table 5). As expected, sequence quality was best closest to the primer and decreased rapidly after about 400 bases. Beyond 400 bases, the number of insertions and deletions relative to the GenBank reference sequence increased five- to ten-fold, while the number of mismatches roughly doubled. The average accuracy of individual double-stranded sequencing runs was 97.7% through base 400.
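The per-hundred-base tabulation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes the BESTFIT alignment has already been exported as two equal-length gapped rows (EST over GenBank reference, with "-" or "." marking gaps), and the function names and the `est_start` offset are hypothetical. Following the note to Table 5, an ambiguous call such as N opposite a reference base is counted as a mismatch.

```python
from collections import defaultdict

GAP_CHARS = {"-", "."}  # assumed gap symbols in the exported alignment


def tally_errors(est_row, ref_row, est_start=1, bin_size=100):
    """Count mismatches, insertions and deletions per bin of EST read position.

    est_row and ref_row are equal-length gapped alignment rows (EST over
    reference). est_start is the read position of the first EST base shown.
    Bins are keyed by their first position (1, 101, 201, ... for bin_size=100).
    """
    if len(est_row) != len(ref_row):
        raise ValueError("alignment rows must have equal length")
    bins = defaultdict(lambda: {"columns": 0, "mismatch": 0, "insertion": 0, "deletion": 0})
    pos = est_start  # read position of the next EST base
    for e, r in zip(est_row.upper(), ref_row.upper()):
        e_gap, r_gap = e in GAP_CHARS, r in GAP_CHARS
        if e_gap and r_gap:
            continue  # a gap in both rows carries no information
        counts = bins[((pos - 1) // bin_size) * bin_size + 1]
        counts["columns"] += 1
        if e_gap:
            counts["deletion"] += 1  # reference base absent from the EST
            continue  # a deletion does not advance the EST position
        if r_gap:
            counts["insertion"] += 1  # extra base in the EST
        elif e != r:
            counts["mismatch"] += 1  # includes ambiguous calls such as N
        pos += 1
    return dict(bins)


def per_hundred(counts):
    """Express each error type per hundred alignment columns in one bin."""
    n = counts["columns"]
    return {k: 100.0 * counts[k] / n for k in ("mismatch", "insertion", "deletion")}
```

Attributing a deletion to the position of the next EST base is a choice made for this sketch; other conventions would shift a few events between neighboring bins.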
TABLE 5. Accuracy of Single-Run Double-Stranded Automated Sequencing
Bases from   Mismatches/     Insertions+   Deletions+   Percent    Aligned
Primer       Ambiguities+                               Accurate   Bases
101-200      1.45            0.18          0.19         98.2       8,800
201-300      1.72            0.25          0.11         97.9       8,130
301-400      2.07            0.98          0.37         96.6       5,404
>400         3.53            2.63          1.06         92.8       3,197
ESTs statistically identical to known human sequences, and those matching mitochondrial and ribosomal genes, were aligned with sequences from GenBank using the GCG program BESTFIT. The first 85 nucleotides were polylinker sequence, which was not aligned with the pBluescript SK reference sequence. Tabulation of errors began 15 bases into the BESTFIT alignment (85 + 15 = 100 bases) and is therefore reported beginning with bases 101-200. Error rates (columns marked +) are reported as the number of mismatches, insertions, or deletions per hundred aligned bases; "mismatches" include ambiguous base calls.
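As a consistency check on Table 5, the Percent Accurate column equals 100 minus the summed mismatch, insertion and deletion rates of each bin (to rounding), and weighting the first three bins by their aligned bases reproduces the 97.7% figure quoted above. The sketch below only redoes that arithmetic from the tabulated values; treating every tabulated error as one lost base per hundred is an assumption about how the column was derived.

```python
# Rows of Table 5: (bases from primer, mismatches, insertions, deletions,
# reported percent accurate, aligned bases); error rates are per 100 aligned bases.
TABLE_5 = [
    ("101-200", 1.45, 0.18, 0.19, 98.2, 8800),
    ("201-300", 1.72, 0.25, 0.11, 97.9, 8130),
    ("301-400", 2.07, 0.98, 0.37, 96.6, 5404),
    (">400", 3.53, 2.63, 1.06, 92.8, 3197),
]


def percent_accurate(mismatches, insertions, deletions):
    """Accuracy assuming each tabulated error costs one base per hundred."""
    return 100.0 - (mismatches + insertions + deletions)


def weighted_accuracy(rows):
    """Mean accuracy over bins, weighted by the aligned bases in each bin."""
    total_bases = sum(row[5] for row in rows)
    weighted = sum(percent_accurate(*row[1:4]) * row[5] for row in rows)
    return weighted / total_bases


if __name__ == "__main__":
    for label, mm, ins, dele, reported, _bases in TABLE_5:
        print(f"{label:>8}: computed {percent_accurate(mm, ins, dele):.1f}, reported {reported}")
    print(f"through base 400: {weighted_accuracy(TABLE_5[:3]):.1f}%")
```

Run as a script, it prints computed values of 98.2, 97.9, 96.6 and 92.8 next to the reported ones, and 97.7% for the weighted average through base 400.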
EXAMPLE 9
Probability of ESTs Containing Coding Sequences
The ESTs of the present invention were statistically evaluated with the coding-region prediction program CRM via the GRAIL server (Uberbacher, E. & Mural, R., Proc. Natl. Acad. Sci. USA 88:11261-5 (1991)). CRM uses a neural network to combine the results of several different coding-region measures, including the frequencies of 6-bp sequences found in coding exons and in introns. The program additionally performs reading-frame searches and assesses randomness at the third position of codons. This protocol classifies each sequence as having an excellent, good, marginal, or poor probability of containing coding regions. The results are reported in Tables 6-9: 219 ESTs were classified as "excellent" (Table 6), 120 as "good" (Table 7), 113 as "marginal" (Table 8), and 1743 as "poor" (Table 9). Because 1743 of the 2195 ESTs evaluated (about 79%) fell into the "poor" category, these results indicate that most ESTs of the present invention comprise noncoding regions.
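CRM/GRAIL itself is a trained neural network and is not reproduced here. To illustrate the kind of 6-bp statistic it draws on, the sketch below substitutes a much simpler, hypothetical scorer: a frame-dependent hexamer log-likelihood ratio that compares hexamer frequencies learned from coding training sequences with those learned from noncoding training sequences, and reports the best of the three reading frames. The training data, the pseudocount and all function names are assumptions for illustration; a positive score favors coding and a negative score favors noncoding.

```python
import math
from collections import Counter

K = 6  # hexamers, echoing the 6-bp statistics described in the text


def hexamers(seq, frame):
    """Hexamers starting every 3 bases from `frame` (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + K] for i in range(frame, len(seq) - K + 1, 3)]


def train(seqs, frames=(0,)):
    """Count hexamers from training sequences in the given frames."""
    counts = Counter()
    for seq in seqs:
        for frame in frames:
            counts.update(hexamers(seq, frame))
    return counts


def coding_score(seq, coding, noncoding, pseudo=1.0):
    """Best-frame sum of log(P_coding / P_noncoding) over in-frame hexamers."""
    vocab = 4 ** K  # number of possible hexamers, used for pseudocount smoothing
    c_total = sum(coding.values()) + pseudo * vocab
    n_total = sum(noncoding.values()) + pseudo * vocab
    best = -math.inf
    for frame in range(3):
        score = 0.0
        for h in hexamers(seq, frame):
            p_c = (coding[h] + pseudo) / c_total
            p_n = (noncoding[h] + pseudo) / n_total
            score += math.log(p_c / p_n)
        best = max(best, score)
    return best
```

A toy usage: `coding = train(["ATG" + "GCT" * 8])` and `noncoding = train(["AT" * 13], frames=(0, 1, 2))` make `coding_score("GCT" * 5, coding, noncoding)` positive and `coding_score("AT" * 8, coding, noncoding)` negative. Real use would need large curated coding and noncoding training sets and calibrated thresholds for categories like those in Tables 6-9.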
Table 6: ESTs with Excellent Probability of Containing Coding Sequence
SEQ ID# EST# 973 EST01987 1807 EST00941 2373 EST01393
979 EST01993 1809 EST00943 2374 EST01394
7 EST00014 980 EST01994 1820 EST00951 2393 EST01417
15 EST00020 986 EST02000 1829 EST00958 2394 EST01418
48 EST00291 1000 EST02014 1849 EST00975 2396 EST01420
62 EST00064 1004 EST02018 1860 EST00983
66 EST00067 1007 EST02021 1866 EST00989
75 EST00074 1018 EST02032 1871 EST00994
98 EST00260 1021 EST02035 1888 EST01005
106 EST00092 1034 EST02050 1890 EST01007
108 EST00094 1047 EST02063 1892 EST01009
114 EST00098 1090 EST02109 1903 EST01018
115 EST00099 1096 EST02115 1904 EST01019
124 EST00107 1115 EST02135 1914 EST01026
128 EST00252 1118 EST02138 1930 EST01040
156 EST00130 1129 EST02149 1944 EST01050
164 EST00135 1133 EST02153 1949 EST01054
166 EST00137 1141 EST02163 1962 EST01062
174 EST00296 1163 EST02187 1973 EST01071
179 EST00145 1183 EST02208 1977 EST01075
183 EST00148 1243 EST02272 1982 EST01080
201 EST00163 1264 EST02293 1991 EST01088
205 EST00165 1265 EST02294 1993 EST01090
215 EST00172 1266 EST02295 2000 EST01097
230 EST00181 1287 EST02317 2001 EST01098
253 EST00199 1308 EST02338 2012 EST01106
263 EST00203 1324 EST02354 2013 EST01107
268 EST00369 1344 EST02374 2024 EST01117
270 EST00207 1356 EST02386 2043 EST01131
271 EST00283 1365 EST02396 2051 EST01138
273 EST00208 1383 EST02415 2056 EST01142
276 EST00211 1399 EST02433 2058 EST01144
281 EST00214 1401 EST02435 2059 EST01145
285 EST00286 1405 EST02439 2064 EST01149
333 EST00394 1417 EST02452 2090 EST01167
336 EST00397 1451 EST02487 2094 EST01171
339 EST00400 1457 EST02493 2116 EST01192
362 EST00418 1463 EST02500 2117 EST01193
389 EST00440 1473 EST02510 2128 EST01202
441 EST00481 1479 EST02516 2131 EST01205
454 EST00493 1516 EST02555 2134 EST01208
476 EST00509 1528 EST02569 2144 EST01216
493 EST00522 1531 EST02572 2145 EST01217
504 EST00529 1544 EST02586 2150 EST01222
516 EST00538 1551 EST02593 2155 EST01227
518 EST00540 1558 EST02601 2161 EST01231
551 EST01482 1561 EST02604 2168 EST01238
552 EST00565 1581 EST02625 2174 EST01242
559 EST00570 1586 EST02631 2176 EST01244
582 EST00592 1591 EST02636 2189 EST01255
602 EST00606 1616 EST02661 2214 EST01272
606 EST00609 1624 EST02670 2225 EST01278
608 EST00611 1630 EST02676 2227 EST01279
621 EST00620 1637 EST00796 2233 EST01284
635 EST00629 1639 EST00799 2235 EST01286
642 EST00634 1649 EST00808 2236 EST01287
644 EST00636 1651 EST00810 2255 EST01302
687 EST00671 1677 EST00835 2259 EST01304
700 EST00683 1682 EST00839 2263 EST01307
743 EST00714 1694 EST00849
753 EST00721 1706 EST00857
760 EST00726 1708 EST00858 2267 EST01756
764 EST00729 1710 EST00860 2281 EST01321
808 EST00761 1716 EST00865 2283 EST01322
823 EST01864 2300 EST01333
834 EST00771 2303 EST01335
886 EST01886 1718 EST00867 2303 EST01335
919 EST01921 1731 EST00879 2314 EST01345
930 EST01933 1742 EST00887 2334 EST01358
1746 EST00891 2339 EST01362
1760 EST00903 2342 EST01365
936 EST01939 1767 EST00907 2348 EST01371
948 EST01957 1769 EST00909 2358 EST01379
965 EST01978 1777 EST00913 2367 EST01388
Table 7: ESTs with Good Probability of Containing Coding Sequence
SEQ ID# EST# 1041 EST02057 2362 EST01383
1083 EST02102 2378 EST01397
20 EST00024 1099 EST02118 2399 EST01423
72 EST00071 1105 EST02124 2407 EST02714
82 EST00078 1113 EST02133
88 EST00084 1139 EST02161
137 EST00272 1146 EST02168
177 EST00328 1196 EST02221
193 EST00156 1210 EST02238
200 EST00162 1233 EST02262
218 EST00175 1285 EST02314
228 EST00179 1331 EST02361
247 EST00279 1388 EST02421
264 EST00204 1418 EST02453
267 EST00297 1439 EST02475
296 EST00228 1502 EST02540
371 EST00426 1537 EST02578
385 EST00436 1563 EST02606
392 EST00442 1599 EST02644
414 EST00460 1602 EST02647
433 EST00474 1693 EST00848
453 EST00492 1695 EST00850
471 EST00505 1729 EST00877
496 EST00525 1730 EST00878
524 EST00544 1738 EST00883
526 EST00546 1739 EST00885
529 EST00549 1743 EST00888
549 EST00563 1768 EST00908
557 EST00569 1780 EST00916
578 EST00588 1804 EST00938
596 EST00602 1805 EST00939
607 EST00610 1811 EST00945
619 EST00619 1819 EST00950
657 EST00646 1826 EST00956
660 EST00649 1830 EST00959
689 EST00673 1845 EST00971
695 EST00679 1848 EST00974
699 EST00682 1853 EST00977
729 EST00703 1967 EST01066
742 EST00713 1992 EST01089
747 EST00717 1994 EST01091
755 EST00723
759 EST00725
776 EST00738 1997 EST01094
778 EST00740 2046 EST01134
782 EST01551 2101 EST01177
829 EST00768 2102 EST01178
835 EST00772 2105 EST01181
836 EST00773 2106 EST01182
862 EST01872 2141 EST01213
881 EST01881 2184 EST01251
2196 EST01260
2203 EST01264
884 EST01884 2232 EST01283
924 EST01926 2308 EST01339
929 EST01932 2345 EST01368
938 EST01941 2346 EST01369
971 EST01985 2351 EST01373
995 EST02009 2354 EST01375
996 EST02010 2355 EST01376
1031 EST02046 2359 EST01380
Table 8: ESTs with Marginal Probability of Containing Coding Sequence
SEQ ID# EST# 1222 EST02251
1224 EST02253
11 EST00018 1228 EST02257
12 EST00274 1267 EST02296
24 EST00027 1301 EST02331
45 EST00364 1397 EST02431
79 EST00076 1448 EST02484
90 EST00302 1480 EST02517
110 EST00096 1493 EST02531
144 EST00120 1499 EST02537
145 EST00121 1503 EST02541
192 EST00155 1527 EST02568
222 EST00177 1536 EST02577
234 EST00184 1548 EST02590
277 EST00212 1562 EST02605
319 EST00381 1572 EST02615
368 EST00423 1575 EST02618
370 EST00425 1595 EST02640
387 EST00438 1608 EST02653
402 EST00451 1610 EST02655
415 EST00461 1621 EST02667
418 EST00464 1627 EST02674
426 EST00470 1629 EST02677
503 EST00528 1631 EST02678
517 EST00539 1683 EST00840
522 EST00543 1692 EST00847
532 EST00551 1751 EST00895
540 EST00557 1756 EST00900
570 EST00580 1764 EST02690
573 EST00583 1770 EST00910
576 EST00586 1793 EST00929
613 EST00615 1847 EST00973
617 EST00617 1877 EST00998
626 EST00622 1897 EST01012
681 EST00665 1900 EST01015
726 EST00700 1939 EST01655
727 EST00701 1940 EST01046
738 EST00711 1954 EST01058
745 EST00715
752 EST00720
791 EST00746 1990 EST01087
795 EST00749 2008 EST01103
803 EST00756 2031 EST01123
845 EST00777 2041 EST01130
852 EST00782 2044 EST01132
854 EST00784 2060 EST01146
907 EST01907 2100 EST01176
912 EST01912 2136 EST01210
935 EST01938 2153 EST01225
2204 EST01265
2212 EST01270
968 EST01981 2248 EST01297
985 EST01999 2250 EST01299
988 EST02002 2266 EST01310
1043 EST02059 2309 EST01340
1081 EST02100 2347 EST01370
1089 EST02108 2388 EST01406
1116 EST02136 2398 EST01422
1134 EST02154 2405 EST01427
1205 EST02233
Table 9: ESTs with Poor Coding Probability
SEQ ID# EST# 103 EST00317 204 EST00235 309 EST00174 404 EST00453
104 EST00354 206 EST00166 315 EST00008 405 EST00454
1 EST00007 105 EST00365 207 EST00167 316 EST00378 406 EST00455
2 EST00009 107 EST00093 209 EST00331 317 EST00379 407 EST00456
3 EST00010 109 EST00095 210 EST00168 318 EST00380 408 EST00457
4 EST00011 111 EST00281 211 EST00332 320 EST00382 409 EST01444
5 EST00012 112 EST00318 212 EST00169 321 EST00383 410 EST00458
6 EST00013 113 EST00097 213 EST00170 322 EST00384 411 EST00459
8 EST00234 116 EST00100 214 EST00171 323 EST00385 412 EST01445
10 EST00016 117 EST00319 216 EST00173 325 EST00386 416 EST00462
14 EST00019 118 EST00101 219 EST00176 326 EST00387 417 EST00463
16 EST00021 119 EST00102 220 EST00372 327 EST00388 419 EST00465
17 EST00022 120 EST00103 221 EST00359 328 EST00389 420 EST00466
18 EST00373 121 EST00104 224 EST00356 329 EST00390 421 EST00467
19 EST00023 122 EST00105 225 EST00178 330 EST00391 422 EST01447
21 EST00025 123 EST00106 226 EST00333 331 EST00392 423 EST00468
23 EST00026 125 EST00108 229 EST00180 332 EST00393 424 EST01448
25 EST00028 126 EST00109 231 EST00334 334 EST00395 425 EST00469
27 EST00029 127 EST00320 232 EST00182 335 EST00396 427 EST01449
28 EST00030 129 EST00321 233 EST00183 337 EST00398 428 EST01451
29 EST00031 130 EST00355 235 EST00185 340 EST00402 429 EST00471
30 EST00032 131 EST00322 236 EST00186 341 EST00403 431 EST00473
31 EST00033 133 EST00111 237 EST00187 342 EST00404 432 EST01452
32 EST00233 134 EST00375 238 EST00188 344 EST00405 434 EST00475
33 EST00034 135 EST00112 239 EST00189 345 EST00406 435 EST00476
34 EST00035 136 EST00113 240 EST00335 347 EST01829 436 EST00477
35 EST00036 138 EST00114 241 EST00191 348 EST01830 437 EST00478
36 EST00037 139 EST00116 242 EST00192 349 EST01831 438 EST00479
39 EST00039 140 EST00117 243 EST00193 350 EST00407 439 EST00480
40 EST00040 141 EST00118 244 EST00194 351 EST00408 440 EST01454
41 EST00041 142 EST00323 245 EST00347 352 EST00409 442 EST01456
42 EST00042 143 EST00119 246 EST00196 353 EST00410 443 EST00482
46 EST00044 146 EST00122 250 EST00197 354 EST01433 444 EST00483
47 EST00046 147 EST00292 252 EST00198 355 EST00411 446 EST00485
49 EST00047 148 EST00236 254 EST00200 356 EST00412 447 EST00486
50 EST00048 149 EST00123 255 EST00201 357 EST00413 448 EST00487
51 EST00049 150 EST00124 256 EST00345 358 EST00414 449 EST00488
52 EST00052 151 EST00125 257 EST00337 359 EST00415 450 EST00489
53 EST00054 152 EST00126 259 EST00202 360 EST00416 451 EST00490
54 EST00055 153 EST00127 260 EST00357 361 EST00417 452 EST00491
55 EST00056 154 EST00128 261 EST00338 363 EST00419 455 EST00494
56 EST00057 155 EST00129 262 EST00339 364 EST00420 457 EST00495
57 EST00058 157 EST00131 265 EST00205 365 EST01434 458 EST00496
58 EST00059 158 EST00132 266 EST00206 366 EST00421 459 EST00497
59 EST00061 159 EST00325 272 EST00340 367 EST00422 460 EST01457
60 EST00062 160 EST00326 274 EST00268 369 EST00424 461 EST01836
63 EST00065 162 EST00133 275 EST00209 372 EST00427 462 EST00498
64 EST00066 163 EST00134 278 EST00342 373 EST01832 464 EST00499
67 EST00351 165 EST00136 279 EST00213 374 EST00428 465 EST00500
68 EST00068 167 EST00138 280 EST00343 375 EST00429 466 EST00501
69 EST00360 168 EST00140 283 EST00215 376 EST01436 467 EST00502
71 EST00070 169 EST00141 284 EST00216 377 EST00430 468 EST00503
73 EST00072 170 EST00295 286 EST00217 378 EST00431 470 EST00504
74 EST00073 171 EST00327 287 EST00218 379 EST00432
76 EST00075 172 EST00142 288 EST00219 380 EST01439
80 EST00077 173 EST00143 289 EST00220 381 EST00433 473 EST00506
81 EST00315 175 EST00144 290 EST00221 382 EST00434 474 EST00507
83 EST00079 178 EST00294 291 EST00222 477 EST01463
84 EST00080 182 EST00329 292 EST00223 478 EST00510
85 EST00081 184 EST00149 293 EST00224 383 EST00435 479 EST00511
86 EST00082 185 EST00150 294 EST00225 384 EST01440 480 EST01464
87 EST00083 186 EST00151 386 EST00437 481 EST00512
89 EST00085 190 EST00153 388 EST00439 482 EST01465
91 EST00086 191 EST00154 295 EST00226 390 EST01442 483 EST00513
92 EST00087 194 EST00157 297 EST00230 391 EST00441 484 EST00514
94 EST00353 298 EST00231 393 EST00443 487 EST00516
95 EST00088 302 EST00303 395 EST00445 488 EST00517
96 EST00089 195 EST00158 303 EST00348 397 EST00446 489 EST00518
99 EST00316 196 EST00159 304 EST00307 398 EST00447 490 EST00519
197 EST00160 305 EST00308 399 EST00448 491 EST00520
198 EST00161 306 EST00309 400 EST00449 492 EST00521
100 EST00090 199 EST00277 307 EST00312 401 EST00450 495 EST00524
101 EST00091 203 EST00164 308 EST00314 403 EST00452 497 EST00526
498 EST01467 600 EST01492 697 EST00680 799 EST00752 894 EST01894
499 EST01468 601 EST01493 698 EST00681 800 EST00753 895 EST01895
500 EST00527 603 EST01494 701 EST01522 801 EST00754 896 EST01896
501 EST02715 604 EST00607 702 EST00684 804 EST00757 897 EST01897
502 EST01469 605 EST00608 703 EST00685 805 EST00758 898 EST01898
507 EST00530 609 EST01496 704 EST00686 806 EST00759 899 EST01899
508 EST00531 610 EST00612 705 EST00687 807 EST00760 900 EST01900
509 EST01472 611 EST00613 706 EST00688 809 EST00762 901 EST01901
510 EST00532 612 EST00614 708 EST00689 810 EST00763 902 EST01902
511 EST00533 615 EST00616 709 EST00690 811 EST00764 903 EST01903
512 EST00534 616 EST01497 710 EST00691 813 EST00765 904 EST01904
513 EST00535 618 EST01498 711 EST00692 814 EST00766 905 EST01905
514 EST00536 620 EST01499 712 EST00693 815 EST01855 906 EST01906
515 EST00537 622 EST01843 713 EST00694 816 EST01856 908 EST01908
519 EST00541 623 EST00621 714 EST00695 817 EST01857 909 EST01909
520 EST00542 624 EST01500 715 EST01523 818 EST01858 910 EST01910
521 EST01474 625 EST01844 716 EST01524 819 EST01859 911 EST01911
523 EST01838 627 EST00623 717 EST01525 820 EST01860 914 EST01914
525 EST00545 628 EST01503 718 EST00696 822 EST01863 915 EST01915
527 EST00547 629 EST00624 719 EST01526 825 EST01866 916 EST01917
528 EST00548 630 EST01505 720 EST00697 826 EST01867 917 EST01919
530 EST01477 631 EST00625 721 EST01527 827 EST01558 918 EST01920
531 EST00550 632 EST00626 722 EST01528 828 EST00767 920 EST01922
533 EST00552 633 EST00627 723 EST00698 830 EST01559 921 EST01923
534 EST01478 634 EST00628 725 EST00699 831 EST00769 922 EST01924
535 EST00553 636 EST01507 728 EST00702 832 EST00770 923 EST01925
536 EST01479 637 EST00630 730 EST00704 837 EST01561 925 EST01927
537 EST00554 638 EST00631 731 EST00705 838 EST00774 926 EST01929
538 EST00555 640 EST01509 732 EST00706 839 EST01562 927 EST01930
539 EST00556 641 EST00633 733 EST00707 840 EST00775 928 EST01931
541 EST00558 643 EST00635 734 EST00708 841 EST00776 931 EST01934
542 EST01480 645 EST00637 735 EST00709 842 EST01563 932 EST01935
543 EST00559 646 EST00638 736 EST01532 843 EST01564 933 EST01936
544 EST00560 647 EST00639 737 EST00710 844 EST01565 934 EST01937
545 EST01481 648 EST00640 739 EST01534 846 EST00778 937 EST01940
547 EST00561 649 EST00641 740 EST01535 847 EST00779 939 EST01943
548 EST00562 651 EST00643 741 EST00712 848 EST01566
550 EST00564 652 EST01510 744 EST01537 849 EST01567
553 EST00566 654 EST00644 746 EST00716 850 EST00780 940 EST01944
555 EST01483 655 EST00645 748 EST01850 851 EST00781 941 EST01945
556 EST00568 656 EST01513 749 EST00719 942 EST01947
558 EST01484 658 EST00647 750 EST01539 943 EST01948
560 EST01485 659 EST00648 751 EST01540 853 EST00783 944 EST01949
561 EST00571 661 EST00650 754 EST00722 855 EST00785 945 EST01950
562 EST00572 662 EST00651 856 EST01568 946 EST01953
563 EST00573 663 EST00652 857 EST01868 947 EST01954
564 EST00574 664 EST00653 756 EST01541 858 EST01869 949 EST01958
565 EST00575 665 EST00654 758 EST00724 859 EST01870 950 EST01959
566 EST00576 761 EST01544 860 EST00786 953 EST01962
567 EST00577 762 EST00727 861 EST01871 954 EST01963
568 EST00578 666 EST01514 763 EST00728 863 EST01873 956 EST01968
569 EST00579 667 EST00655 765 EST00730 864 EST00787 957 EST01969
668 EST00656 766 EST00731 865 EST01569 958 EST01970
669 EST00657 767 EST00732 866 EST01874 959 EST01972
571 EST00581 670 EST00658 768 EST00733 867 EST01875 960 EST01973
572 EST00582 671 EST00659 770 EST00735 868 EST01876 961 EST01974
574 EST00584 672 EST00660 771 EST01546 869 EST00788 962 EST01975
575 EST00585 673 EST01515 772 EST00736 870 EST00789 963 EST01976
577 EST00587 674 EST01516 774 EST01548 871 EST00790 964 EST01977
580 EST00590 675 EST00661 775 EST00737 872 EST00791 966 EST01979
581 EST00591 676 EST00662 777 EST00739 873 EST00792 967 EST01980
583 EST00593 677 EST00663 779 EST00741 874 EST00793 970 EST01983
584 EST00594 678 EST01517 780 EST01549 875 EST00794 972 EST01986
585 EST00595 679 EST01518 781 EST01550 876 EST00795 974 EST01988
586 EST00596 680 EST00664 783 EST01552 877 EST01877 975 EST01989
587 EST01488 682 EST00666 785 EST01553 878 EST01878 976 EST01990
588 EST00597 683 EST00667 786 EST00742 879 EST01879 977 EST01991
589 EST00598 684 EST00668 787 EST00743 880 EST01880 978 EST01992
590 EST00599 685 EST00669 788 EST00744 882 EST01882 981 EST01995
591 EST01489 686 EST00670 789 EST00745 883 EST01883 982 EST01996
592 EST00600 688 EST00672 790 EST01554 885 EST01885 983 EST01997
593 EST00601 690 EST00674 792 EST00747 887 EST01887 984 EST01998
595 EST01840 692 EST00676 793 EST00748 889 EST01889 987 EST02001
597 EST00603 693 EST00677 794 EST01555 890 EST01890 989 EST02003
598 EST00604 694 EST00678 796 EST00750 892 EST01892 990 EST02004
599 EST00605 696 EST01521 797 EST00751 893 EST01893 991 EST02005
992 EST02006 1086 EST02105 1184 EST02209 1274 EST02303 1363 EST02394
994 EST02008 1087 EST02106 1185 EST02210 1275 EST02304 1364 EST02395
997 EST02011 1088 EST02107 1186 EST02211 1276 EST02305 1366 EST02397
999 EST02013 1091 EST02110 1187 EST02212 1278 EST02307 1367 EST02398
1001 EST02015 1093 EST02112 1188 EST02213 1279 EST02308 1368 EST02399
1002 EST02016 1095 EST02114 1189 EST02214 1280 EST02309 1370 EST02401
1003 EST02017 1097 EST02116 1190 EST02215 1281 EST02310 1372 EST02403
1005 EST02019 1098 EST02117 1191 EST02216 1282 EST02311 1373 EST02404
1006 EST02020 1100 EST02119 1192 EST02217 1283 EST02312 1375 EST02406
1008 EST02022 1101 EST02120 1193 EST02218 1284 EST02313 1376 EST02407
1009 EST02023 1102 EST02121 1194 EST02219 1286 EST02316 1377 EST02408
1010 EST02024 1104 EST02123 1195 EST02220 1288 EST02318 1378 EST02409
1011 EST02025 1106 EST02125 1197 EST02222 1289 EST02319 1379 EST02410
1012 EST02026 1107 EST02126 1198 EST02223 1290 EST02320 1380 EST02411
1013 EST02027 1108 EST02127 1199 EST02224 1291 EST02321 1381 EST02413
1014 EST02028 1109 EST02128 1200 EST02226 1292 EST02322 1382 EST02414
1015 EST02029 1110 EST02129 1201 EST02228 1293 EST02323
1016 EST02030 1111 EST02131 1202 EST02229 1294 EST02324
1017 EST02031 1112 EST02132 1203 EST02230 1295 EST02325
1019 EST02033 1114 EST02134 1204 EST02232 1296 EST02326
1022 EST02036 1117 EST02137 1206 EST02234
1023 EST02037 1119 EST02139 1207 EST02235
1024 EST02038 1120 EST02140 1208 EST02236 1298 EST02328
1025 EST02040 1121 EST02141 1209 EST02237 1299 EST02329
1026 EST02041 1122 EST02142 1300 EST02330
1027 EST02042 1123 EST02143 1302 EST02332
1028 EST02043 1124 EST02144 1211 EST02239 1303 EST02333
1029 EST02044 1125 EST02145 1212 EST02240 1304 EST02334
1030 EST02045 1213 EST02241 1305 EST02335
1032 EST02048 1214 EST02242 1306 EST02336
1033 EST02049 1127 EST02147 1215 EST02244 1307 EST02337
1036 EST02052 1128 EST02148 1216 EST02245 1309 EST02339
1130 EST02150 1217 EST02246 1310 EST02340
1131 EST02151 1218 EST02247 1311 EST02341
1037 EST02053 1132 EST02152 1219 EST02248 1313 EST02343
1038 EST02054 1135 EST02155 1220 EST02249 1314 EST02344
1040 EST02056 1136 EST02156 1221 EST02250 1315 EST02345
1042 EST02058 1137 EST02157 1223 EST02252 1316 EST02346
1044 EST02060 1138 EST02159 1225 EST02254 1317 EST02347
1045 EST02061 1140 EST02162 1226 EST02255 1318 EST02348
1046 EST02062 1142 EST02164 1227 EST02256 1319 EST02349
1048 EST02064 1143 EST02165 1232 EST02261 1320 EST02350
1049 EST02065 1144 EST02166 1234 EST02263 1321 EST02351
1050 EST02066 1145 EST02167 1235 EST02264 1322 EST02352
1051 EST02067 1148 EST02170 1236 EST02265 1323 EST02353
1052 EST02068 1149 EST02171 1237 EST02266 1325 EST02355
1053 EST02069 1150 EST02172 1238 EST02267 1326 EST02356
1054 EST02070 1152 EST02174 1239 EST02268 1327 EST02357
1055 EST02071 1153 EST02175 1240 EST02269 1328 EST02358
1056 EST02072 1154 EST02176 1241 EST02270 1329 EST02359
1057 EST02073 1155 EST02177 1242 EST02271 1330 EST02360
1058 EST02074 1156 EST02178 1244 EST02273 1333 EST02363
1059 EST02075 1157 EST02180 1246 EST02275 1334 EST02364
1060 EST02076 1158 EST02181 1247 EST02276 1335 EST02365
1061 EST02078 1159 EST02182 1248 EST02277 1336 EST02366
1062 EST02079 1160 EST02183 1249 EST02278 1337 EST02367
1063 EST02081 1161 EST02184 1250 EST02279 1338 EST02368
1064 EST02082 1162 EST02185 1251 EST02280 1339 EST02369
1065 EST02083 1164 EST02188 1252 EST02281 1342 EST02372
1066 EST02084 1165 EST02189 1253 EST02282 1343 EST02373
1067 EST02085 1166 EST02190 1254 EST02283 1345 EST02375
1068 EST02086 1167 EST02191 1255 EST02284 1346 EST02376
1070 EST02088 1168 EST02193 1256 EST02285 1347 EST02377
1071 EST02089 1169 EST02194 1257 EST02286 1349 EST02379
1072 EST02090 1170 EST02195 1258 EST02287 1350 EST02380
1073 EST02091 1171 EST02196 1259 EST02288 1351 EST02381
1074 EST02092 1172 EST02197 1260 EST02289 1352 EST02382
1075 EST02093 1173 EST02198 1261 EST02290 1353 EST02383
1076 EST02094 1174 EST02199 1262 EST02291 1354 EST02384
1077 EST02096 1175 EST02200 1263 EST02292 1355 EST02385
1078 EST02097 1176 EST02201 1268 EST02297 1357 EST02387
1079 EST02098 1177 EST02202 1269 EST02298 1358 EST02388
1080 EST02099 1178 EST02203 1270 EST02299 1359 EST02390
1082 EST02101 1179 EST02204 1271 EST02300 1360 EST02391
1084 EST02103 1180 EST02205 1272 EST02301 1361 EST02392
1085 EST02104 1182 EST02207 1273 EST02302 1362 EST02393
1485 EST02522 1592 EST02637 1689 EST00845 1799 EST00934
1486 EST02523 1593 EST02638 1690 EST00846 1800 EST00935
1384 EST02416 1487 EST02524 1594 EST02639 1691 EST01577 1801 EST00936
1387 EST02419 1488 EST02525 1596 EST02641 1696 EST00851 1802 EST00937
1389 EST02422 1489 EST02526 1597 EST02642 1697 EST00852 1803 EST01613
1390 EST02423 1490 EST02527 1598 EST02643 1702 EST00854 1806 EST00940
1391 EST02424 1491 EST02529 1600 EST02645 1703 EST00855 1808 EST00942
1392 EST02425 1494 EST02532 1601 EST02646 1705 EST00856 1810 EST00944
1393 EST02426 1497 EST02535 1603 EST02648 1707 EST01581 1812 EST02693
1394 EST02427 1498 EST02536 1604 EST02649 1709 EST00859 1813 EST00946
1396 EST02430 1501 EST02539 1605 EST02650 1711 EST00861 1814 EST00947
1398 EST02432 1504 EST02542 1606 EST02651 1712 EST00862 1815 EST01615
1400 EST02434 1506 EST02545 1607 EST02652 1713 EST00863 1816 EST00948
1402 EST02436 1507 EST02546 1609 EST02654 1714 EST00864 1817 EST00949
1403 EST02437 1508 EST02547 1611 EST02656 1717 EST00866 1818 EST01616
1404 EST02438 1509 EST02548 1612 EST02657 1719 EST00868 1821 EST00952
1406 EST02440 1510 EST02549 1613 EST02658 1720 EST00869 1822 EST00953
1407 EST02441 1512 EST02551 1614 EST02659 1721 EST00870 1823 EST00954
1410 EST02444 1513 EST02552 1615 EST02660 1722 EST00871 1824 EST01617
1411 EST02445 1514 EST02553 1617 EST02662 1723 EST00872 1825 EST00955
1414 EST02448 1515 EST02554 1618 EST02663 1724 EST00873 1827 EST01618
1415 EST02449 1517 EST02558 1619 EST02665 1725 EST00874 1828 EST00957
1416 EST02450 1518 EST02559 1620 EST02666 1727 EST00875 1831 EST01619
1419 EST02454 1519 EST02560 1622 EST02668 1728 EST00876 1832 EST00960
1420 EST02456 1520 EST02561 1623 EST02669 1732 EST01590 1833 EST00961
1421 EST02457 1521 EST02562 1625 EST02672 1733 EST01591 1835 EST00962
1422 EST02458 1522 EST02563 1626 EST02673 1734 EST00880 1836 EST01622
1423 EST02459 1523 EST02564 1628 EST02675 1735 EST00881 1837 EST00963
1424 EST02460 1524 EST02565 1632 EST02679 1736 EST01592 1838 EST00964
1425 EST02461 1525 EST02566 1633 EST02680 1737 EST00882 1839 EST00965
1426 EST02462 1526 EST02567 1634 EST02681 1740 EST02687 1840 EST00966
1428 EST02464 1529 EST02570 1635 EST02682 1741 EST00886 1841 EST00967
1429 EST02465 1530 EST02571 1636 EST02684 1744 EST00889 1842 EST00968
1431 EST02467 1532 EST02573 1638 EST00798 1745 EST00890 1843 EST00969
1432 EST02468 1533 EST02574 1640 EST00800 1747 EST00892 1844 EST00970
1433 EST02469 1534 EST02575 1641 EST00801 1748 EST00893 1846 EST00972
1434 EST02470 1535 EST02576 1642 EST00802 1749 EST01593 1850 EST01624
1435 EST02471 1538 EST02579 1643 EST00803 1750 EST00894 1851 EST00976
1436 EST02472 1539 EST02580 1645 EST00804 1752 EST00896 1854 EST00978
1437 EST02473 1540 EST02581 1646 EST00805 1753 EST00897 1855 EST00979
1438 EST02474 1541 EST02582 1647 EST00806 1754 EST00898 1857 EST00980
1440 EST02476 1542 EST02583 1648 EST00807 1755 EST00899 1858 EST00981
1442 EST02478 1545 EST02587 1650 EST00809 1757 EST01594 1859 EST00982
1443 EST02479 1546 EST02588 1652 EST00811 1758 EST00901 1861 EST00984
1444 EST02480 1547 EST02589 1653 EST00812 1759 EST00902 1862 EST00985
1445 EST02481 1549 EST02591 1655 EST00813 1761 EST01598 1863 EST00986
1446 EST02482 1550 EST02592 1656 EST00814 1762 EST00904 1864 EST00987
1447 EST02483 1552 EST02594 1657 EST00815 1763 EST00905 1865 EST00988
1450 EST02486 1553 EST02595 1658 EST00816 1765 EST01600 1867 EST00990
1452 EST02488 1554 EST02597 1659 EST00817 1766 EST00906 1868 EST00991
1453 EST02489 1555 EST02598 1660 EST00818 1772 EST02691 1870 EST00993
1454 EST02490 1556 EST02599 1661 EST00819 1773 EST00911 1872 EST00995
1455 EST02491 1557 EST02600 1662 EST00820 1774 EST00912 1873 EST01630
1456 EST02492 1559 EST02602 1663 EST00821 1775 EST02692 1874 EST00996
1458 EST02495 1560 EST02603 1664 EST00822 1776 EST01603 1875 EST01631
1459 EST02496 1564 EST02607 1665 EST00823 1778 EST00914 1876 EST00997
1460 EST02497 1565 EST02608 1666 EST00824 1779 EST00915
1461 EST02498 1568 EST02611 1668 EST00826 1781 EST00917
1462 EST02499 1569 EST02612 1669 EST00827 1782 EST00918 1878 EST00999
1464 EST02501 1570 EST02613 1670 EST00828 1783 EST00919 1879 EST01633
1466 EST02503 1571 EST02614 1671 EST00829 1881 EST01000
1467 EST02504 1573 EST02616 1672 EST00830 1882 EST01638
1469 EST02506 1574 EST02617 1673 EST00831 1784 EST00920 1883 EST01001
1470 EST02507 1576 EST02619 1674 EST00832 1785 EST00921 1884 EST01002
1471 EST02508 1577 EST02620 1786 EST00922 1886 EST01003
1472 EST02509 1578 EST02621 1787 EST00923 1887 EST01004
1474 EST02511 1579 EST02622 1675 EST00833 1788 EST00924 1889 EST01006
1475 EST02512 1580 EST02623 1676 EST00834 1789 EST00925 1891 EST01008
1476 EST02513 1678 EST00836 1790 EST00926 1893 EST01642
1477 EST02514 1679 EST00837 1791 EST00927 1895 EST01010
1481 EST02518 1582 EST02626 1680 EST00838 1792 EST00928 1898 EST01013
1482 EST02519 1583 EST02628 1684 EST00841 1794 EST01607 1899 EST01014
1584 EST02629 1685 EST00842 1795 EST00930 1901 EST01016
1585 EST02630 1686 EST01574 1796 EST00931 1902 EST01017
1483 EST02520 1587 EST02632 1687 EST00843 1797 EST00932 1905 EST01020
1484 EST02521 1590 EST02635 1688 EST00844 1798 EST00933 1906 EST01021
1907 EST01022 2016 EST01110 2118 EST01194 2223 EST01742 2332 EST01794
1908 EST01023 2018 EST01111 2119 EST01195 2224 EST01277 2333 EST01357
1909 EST01024 2019 EST01112 2122 EST01197 2228 EST01280 2335 EST01359
1911 EST02694 2020 EST01113 2123 EST01713 2229 EST01281 2336 EST01360
1912 EST01025 2021 EST01114 2124 EST01198 2231 EST01746 2337 EST01361
1913 EST01646 2022 EST01115 2125 EST01199 2237 EST01288 2340 EST01802
1915 EST01027 2023 EST01116 2126 EST01200 2238 EST01289 2341 EST01364
1916 EST01028 2025 EST01118 2127 EST01201 2239 EST01290 2343 EST01366
1917 EST01029 2026 EST01119 2129 EST01203 2240 EST01291 2344 EST01367
1918 EST02695 2027 EST01120 2130 EST01204 2241 EST01747 2349 EST01372
1919 EST01030 2028 EST01121 2132 EST01206 2242 EST01292 2350 EST02708
1920 EST01031 2029 EST01682 2133 EST01207 2243 EST01293 2352 EST01374
1921 EST01647 2030 EST01122 2135 EST01209 2244 EST01294 2356 EST01377
1922 EST01032 2033 EST01684 2137 EST01211 2246 EST01295 2357 EST01378
1923 EST01033 2034 EST01124 2139 EST01716 2247 EST01296 2360 EST01381
1924 EST01034 2035 EST01125 2140 EST01212 2249 EST01298 2361 EST01382
1925 EST01035 2036 EST01126 2142 EST01214 2251 EST01300 2363 EST01384
1926 EST01036 2037 EST01686 2143 EST01215 2252 EST01750 2364 EST01385
1927 EST01037 2038 EST01127 2147 EST01219 2253 EST01301 2365 EST01386
1929 EST01039 2039 EST01128 2148 EST01220 2256 EST02718 2366 EST01387
1932 EST01042 2040 EST01129 2151 EST01223 2257 EST01303 2369 EST01811
1934 EST01043 2042 EST01688 2152 EST01224 2258 EST01754 2370 EST01390
1935 EST01044 2045 EST01133 2154 EST01226 2260 EST01305 2371 EST01391
1936 EST01045 2047 EST01135 2156 EST01718 2261 EST01755 2372 EST01392
1937 EST01652 2048 EST01136 2157 EST01719 2262 EST01306 2375 EST01815
1938 EST01654 2049 EST01689 2158 EST01228 2264 EST01308 2376 EST01395
1941 EST01047 2050 EST01137 2159 EST01229 2265 EST01309 2377 EST01396
1942 EST01048 2052 EST01139 2160 EST01230 2268 EST01311 2379 EST01398
1943 EST01049 2053 EST01140 2162 EST01232 2269 EST01312 2380 EST01399
1945 EST01051 2054 EST01141 2163 EST01233 2270 EST01313 2381 EST01400
1946 EST02696 2055 EST01690 2164 EST01234 2271 EST01314 2382 EST01401
1947 EST01052 2057 EST01143 2165 EST01720 2272 EST01762 2383 EST01402
1948 EST01053 2061 EST01147 2166 EST01236 2273 EST01315 2384 EST01403
1950 EST01055 2062 EST02701 2167 EST01237 2275 EST01316 2385 EST01816
1951 EST01056 2063 EST01148 2169 EST01722 2276 EST01317 2386 EST01404
1952 EST01057 2065 EST01691 2170 EST01239 2277 EST01318 2387 EST01405
1955 EST01662 2066 EST01692 2171 EST01240 2278 EST01319
1957 EST01059 2067 EST01693 2172 EST01241 2279 EST01320
1958 EST01060 2069 EST01150 2175 EST01243 2280 EST01763
1959 EST01061 2070 EST01151 2177 EST01245 2284 EST01323
1963 EST01063 2072 EST01152 2178 EST01726
1964 EST01064 2074 EST01698 2179 EST01246
1966 EST01065 2075 EST01153 2180 EST01247 2285 EST01768
1968 EST01067 2076 EST02702 2181 EST01248 2287 EST01770
1969 EST01068 2077 EST01154 2288 EST01324
1970 EST01666 2078 EST01155 2290 EST01772
1971 EST01069 2079 EST01156 2182 EST01249 2291 EST01773
1972 EST01070 2080 EST01157 2183 EST01250 2292 EST01326
1975 EST01073 2185 EST01252 2293 EST01327
1976 EST01074 2186 EST01253 2294 EST01328
1978 EST01076 2081 EST01158 2187 EST01727 2295 EST01329
1979 EST01077 2082 EST01159 2188 EST01254 2296 EST01330
2083 EST01160 2190 EST01728 2298 EST01331
2084 EST01161 2191 EST01256 2299 EST01332
1980 EST01078 2085 EST01162 2193 EST01258 2301 EST01334
1981 EST01079 2086 EST01163 2194 EST01729 2304 EST01780
1983 EST01081 2087 EST01164 2195 EST01259 2305 EST01336
1984 EST01082 2088 EST01166 2197 EST01261 2306 EST01337
1985 EST01083 2091 EST01168 2198 EST01730 2310 EST01341
1986 EST01084 2093 EST01170 2199 EST01262 2311 EST01342
1988 EST01085 2095 EST01701 2200 EST01731 2312 EST01343
1989 EST01086 2096 EST01172 2201 EST01263 2313 EST01344
1995 EST01092 2097 EST01173 2202 EST01732 2315 EST01346
1996 EST01093 2098 EST01174 2205 EST01735 2316 EST01782
1998 EST01095 2099 EST01175 2206 EST01736 2317 EST01347
1999 EST01096 2103 EST01179 2208 EST01267 2318 EST01348
2002 EST01099 2104 EST01180 2209 EST02717 2319 EST01349
2003 EST01675 2107 EST01183 2210 EST01268 2321 EST01350
2005 EST01100 2108 EST01184 2211 EST01269 2322 EST01351
2006 EST01101 2109 EST01185 2213 EST01271 2323 EST01789
2007 EST01102 2110 EST01186 2215 EST01273 2325 EST01353
2009 EST01677 2111 EST01187 2218 EST01274 2327 EST01354
2010 EST01104 2112 EST01188 2219 EST01275 2328 EST01355
2011 EST01105 2113 EST01189 2220 EST01740 2329 EST01792
2014 EST01108 2114 EST01190 2221 EST01741 2330 EST01793
2015 EST01109 2115 EST01191 2222 EST01276 2331 EST01356
2389 EST01407
2391 EST01415
2392 EST01416 2395 EST01419 2397 EST01421 2401 EST01424
2403 EST01425
2404 EST01426 2406 EST02713 2409 EST00273
EXAMPLE 10
Functional Groupings of ESTs and Corresponding Genes
The putative function of the gene corresponding to a new human EST can be inferred by matching the EST to known sequences from other species. The data generated in Examples 3 and 4 have been used to categorize 127 of the ESTs of the present invention, and their corresponding genes, into predicted functional groups. These 127 are the ESTs that matched database sequences from other species whose function was known. Two different grouping schemes have been used.
The first scheme separates the sequences into three broad categories: metabolic; regulatory; and structural. These groupings are set out in Table 10.
The second grouping scheme separates the sequences into 13 specific categories:
- cell surface proteins;
- developmental control;
- energy metabolism;
- kinases and phosphatases;
- oncogenes;
- other metabolism-related polypeptides;
- peptidases and peptidase inhibitors;
- receptors;
- structural and cytoskeletal;
- signal transduction;
- transporters;
- transcription, translation, and subcellular localization;
- transcription factors.

These groupings are set out in Table 11.
Table 10: Three-Class Functional Groupings of ESTs
SEQ ID EST# Group Putative Identification
1834 EST01620 M AMP deaminase, brain
97 EST00289 M Aconitase
691 EST00675 M Alcohol dehydrogenase
2092 EST01700 M Anion exchanger homolog AE3
396 EST01443 M CDPdiacylglycerol-serine O-phosphatidyltransferase
1956 EST01663 M Ca2+-transporting ATPase 2
1039 EST02055 M Calcium channel
2192 EST01257 M Diacylglycerol kinase, lymphocyte
1441 EST02477 M Diamine acetyltransferase
2289 EST01325 M Fatty acid synthase
310 EST00377 M Fo ATPase beta subunit, mitochondrial
1667 EST00825 M Gamma-aminobutyric acid transporter
1412 EST02446 M Glutamate-aspartate carrier protein
1020 EST02034 M Glutaminase
2326 EST01791 M Inositol-1,4,5-trisphosphate 3-kinase
2173 EST01724 M Lon protease
1427 EST02463 M Long-chain-fatty-acid-CoA ligase
2226 EST01744 M NAD(P)+ transhydrogenase (B-specific)
1566 EST02609 M Neutrophil oxidase factor
1681 EST01573 M Nucleoside diphosphate kinase
2254 EST01751 M Phosphatidylinositol-4,5-bisphosphate phosphodiesterase
93 EST00287 M Processing enhancing protein
2297 EST01775 M Prohormone cleavage enzyme
9 EST00376 M Prolyl endopeptidase
1654 EST01572 M Protochlorophyllide reductase
38 EST00374 M RNA polymerase II 6th subunit (RPO26)
1715 EST01583 M Ribosomal protein L18a
1856 EST01627 M Ribosomal protein L1a
1974 EST01667 M Ribosomal protein L3
301 EST00300 M Ribosomal protein L30
22 EST00301 M Ribosomal protein S10
2402 EST01826 M Ribosomal protein S10
463 EST01459 M Ribosomal protein YL10
2073 EST01697 M Succinate dehydrogenase flavoprotein
2138 EST01715 M Succinate dehydrogenase flavoprotein
1771 EST01601 M Thiosulfate sulfurtransferase (rhodanese)
2121 EST01711 M Valine-tRNA ligase
1726 EST01588 M XPR2 alkaline extracellular protease
913 EST01913 M Clathrin coat assembly protein AP50 homolog
1035 EST02051 M J1 protein
969 EST01982 R ADP-ribosylation factor 1
1126 EST02146 R Calbindin D28
1910 EST01645 R Calmodulin
485 EST01466 R Calmodulin-dependent protein kinase, type II, beta
2302 EST01779 R Discs-large tumor suppressor
188 EST00256 R Enhancer of split
1229 EST02258 R KUP protein
993 EST02007 R Kinase 5 protein
2282 EST01764 R Lamin B receptor
161 EST00247 R MARCKS (myristoylated alanine-rich protein kinase C substrate)
769 EST00734 R MARCKS homolog
1386 EST02418 R MARCKS homolog
227 EST00259 R Notch/Xotch
952 EST01961 R Notch/Xotch
1395 EST02429 R Nuclear factor 1-like protein (NF1)
2353 EST01806 R Prohibitin
1069 EST02087 R Protein kinase zeta
1933 EST01650 R Protein phosphatase 2A beta subunit
202 EST00298 R Protein-tyrosine phosphatase LRP
1478 EST02515 R Rab5
1408 EST02442 R Seven in absentia
300 EST00232 R Transforming protein (dbl)
1147 EST02169 R Tyrosine kinase
1348 EST02378 R cAMP-dependent protein kinase inhibitor
1931 EST01041 R cAMP-regulated phosphoprotein
1413 EST02447 R cAMP-specific phosphodiesterase
37 EST00038 R ras p21-like small GTP-binding protein (smg GDS)
102 EST00248 R rho H12/ ARH12
299 EST00249 R smg p25A GDP dissociation inhibitor
189 EST00282 R trkB
1332 EST02362 R GA binding protein, beta subunit
1277 EST02306 R Bib protein
43 EST00371 R Maternal G10 protein
1704 EST01580 R Myeloid differentiation primary response gene My
346 EST01828 R Otd homeotic protein
187 EST00152 R Wilms' tumor-related protein
249 EST00275 R Zinc Finger Proteins
413 EST01446 R Zinc Finger Proteins
469 EST01460 R Zinc Finger Proteins
833 EST01560 R Zinc Finger Proteins
1230 EST02259 R Zinc finger proteins
1496 EST02534 R Zinc finger proteins
2324 EST01352 R Zinc Finger Proteins
208 EST00250 S 60K filarial antigen
2320 EST01784 S 60K filarial antigen
251 EST00370 S Actin, other
2146 EST01218 S Actin, other
248 EST00271 S Actinin, alpha
891 EST01891 S Actinin, alpha
1500 EST02538 S Actinin, alpha
132 EST00110 S Agrin
1852 EST01625 S Agrin
1965 EST01664 S Amyloid A4
2068 EST01694 S Amyloid A4
2408 EST00244 S Amyloid A4
1880 EST01634 S Axonal glycoprotein TAG-1
2004 EST01676 S Cofilin
650 EST00642 S Dilute (myosin heavy chain)
2217 EST01738 S Gelation factor ABP-280
1885 EST01639 S Histocompatibility antigen modifier 1
77 EST00257 S Kinesin
78 EST00258 S Kinesin
2245 EST01748 S Kinesin
313 EST00276 S Lysosomal membrane glycoprotein 1 (LAMP-1)
223 EST00368 S Microtubule-associated protein 1B
824 EST01865 S Microtubule-associated protein 1B
2032 EST01683 S Microtubule-associated protein 1B
2017 EST01678 S Milk fat globule membrane protein
1567 EST02610 S Neural cell adhesion molecule L1
506 EST01471 S Neuraxin
2368 EST01389 S Radial spoke protein 3
951 EST01960 S Spectrin, beta
2089 EST01699 S Sperm membrane protein
653 EST01512 S Tubulin, alpha
311 EST00270 S Tubulin, beta
594 EST01490 S Tubulin, beta
757 EST01542 S Tubulin, beta
1245 EST02274 S Tubulin, beta
1589 EST02634 S Tubulin, beta
1468 EST02505 S Matrin 3
1371 EST02402 S Talin
1701 EST00853 S Unc-104
Group Key: M: Metabolic, R: Regulatory, S: Structural
Table 11: Thirteen-Class Functional Groupings of ESTs
SEQ ID EST# Group Putative Identification
208 EST00250 CS 60K filarial antigen
2320 EST01784 CS 60K filarial antigen
1965 EST01664 CS Amyloid A4
2068 EST01694 CS Amyloid A4
2408 EST00244 CS Amyloid A4
1880 EST01634 CS Axonal glycoprotein TAG-1
1885 EST01639 CS Histocompatibility antigen modifier 1
313 EST00276 CS Lysosomal membrane glycoprotein 1 (LAMP-1)
2017 EST01678 CS Milk fat globule membrane protein
1567 EST02610 CS Neural cell adhesion molecule L1
2368 EST01389 CS Radial spoke protein 3
2089 EST01699 CS Sperm membrane protein
1277 EST02306 DC Bib protein
188 EST00256 DC Enhancer of split
43 EST00371 DC Maternal G10 protein
1704 EST01580 DC Myeloid differentiation primary response gene MyD1
227 EST00259 DC Notch/Xotch
952 EST01961 DC Notch/Xotch
346 EST01828 DC Orthodenticle homeotic protein
1408 EST02442 DC Seven in absentia
97 EST00289 EM Aconitase
310 EST00377 EM Fo ATPase beta subunit, mitochondrial
485 EST01466 KP Calmodulin-dependent protein kinase, type II, beta
993 EST02007 KP Kinase 5 protein
1069 EST02087 KP Protein kinase C, zeta
1933 EST01650 KP Protein phosphatase 2A beta subunit
202 EST00298 KP Protein-tyrosine phosphatase LRP
1348 EST02378 KP cAMP-dependent protein kinase inhibitor
2302 EST01779 OG Discs-large tumor suppressor
2353 EST01806 OG Prohibitin
1478 EST02515 OG Rab5
300 EST00232 OG Transforming protein (dbl)
37 EST00038 OG ras p21-like small GTP-binding protein (smg GDS)
102 EST00248 OG rho H12/ ARH12
1834 EST01620 OM AMP deaminase, brain
691 EST00675 OM Alcohol dehydrogenase
396 EST01443 OM CDPdiacylglycerol-serine O-phosphatidyltransferase
2192 EST01257 OM Diacylglycerol kinase, lymphocyte
1441 EST02477 OM Diamine acetyltransferase
2289 EST01325 OM Fatty acid synthase
1020 EST02034 OM Glutaminase
2326 EST01791 OM Inositol-1,4,5-trisphosphate 3-kinase
1427 EST02463 OM Long-chain-fatty-acid-CoA ligase
2226 EST01744 OM NAD(P)+ transhydrogenase (B-specific)
1566 EST02609 OM Neutrophil oxidase factor
1681 EST01573 OM Nucleoside diphosphate kinase
2254 EST01751 OM Phosphatidylinositol-4,5-bisphosphate phosphodiesterase
1654 EST01572 OM Protochlorophyllide reductase
2073 EST01697 OM Succinate dehydrogenase flavoprotein
2138 EST01715 OM Succinate dehydrogenase flavoprotein
1771 EST01601 OM Thiosulfate sulfurtransferase (rhodanese)
2173 EST01724 PI Lon protease
2297 EST01775 PI Prohormone cleavage enzyme
9 EST00376 PI Prolyl endopeptidase
1726 EST01588 PI XPR2 alkaline extracellular protease
1147 EST02169 PP Tyrosine kinase
2282 EST01764 RT Lamin B receptor
189 EST00282 RT trkB
251 EST00370 SC Actin, other
2146 EST01218 SC Actin, other
248 EST00271 SC Actinin, alpha
891 EST01891 SC Actinin, alpha
1500 EST02538 SC Actinin, alpha
132 EST00110 SC Agrin
1852 EST01625 SC Agrin
2004 EST01676 SC Cofilin
650 EST00642 SC Dilute (myosin heavy chain)
2217 EST01738 SC Gelation factor ABP-280
77 EST00257 SC Kinesin
78 EST00258 SC Kinesin
2245 EST01748 SC Kinesin
1468 EST02505 SC Matrin 3
223 EST00368 SC Microtubule-associated protein 1B
824 EST01865 SC Microtubule-associated protein 1B
2032 EST01683 SC Microtubule-associated protein 1B
506 EST01471 SC Neuraxin
951 EST01960 SC Spectrin, beta
1371 EST02402 SC Talin
653 EST01512 SC Tubulin, alpha
311 EST00270 SC Tubulin, beta
594 EST01490 SC Tubulin, beta
757 EST01542 SC Tubulin, beta
1245 EST02274 SC Tubulin, beta
1589 EST02634 SC Tubulin, beta
1701 EST00853 SC Unc-104
969 EST01982 ST ADP-ribosylation factor 1
1126 EST02146 ST Calbindin D28
1910 EST01645 ST Calmodulin
161 EST00247 ST MARCKS (myristoylated alanine-rich protein kinase C substrate)
769 EST00734 ST MARCKS homolog
1386 EST02418 ST MARCKS homolog
1931 EST01041 ST cAMP-regulated phosphoprotein
1413 EST02447 ST cAMP-specific phosphodiesterase
299 EST00249 ST smg p25A GDP dissociation inhibitor
2092 EST01700 TP Anion exchanger homolog AE3
1956 EST01663 TP Ca2+-transporting ATPase 2
1039 EST02055 TP Calcium channel
1667 EST00825 TP Gamma-aminobutyric acid transporter
1412 EST02446 TP Glutamate-aspartate carrier protein
913 EST01913 TT Clathrin coat assembly protein AP50 homolog
1035 EST02051 TT J1 protein
93 EST00287 TT Processing enhancing protein
38 EST00374 TT RNA polymerase II 6th subunit (RPO26)
1715 EST01583 TT Ribosomal protein L18a
1856 EST01627 TT Ribosomal protein L1a
1974 EST01667 TT Ribosomal protein L3
301 EST00300 TT Ribosomal protein L30
22 EST00301 TT Ribosomal protein S10
2402 EST01826 TT Ribosomal protein S10
463 EST01459 TT Ribosomal protein YL10
2121 EST01711 TT Valine-tRNA ligase
1332 EST02362 TX GA binding protein, beta subunit
1229 EST02258 TX KUP protein
1395 EST02429 TX Nuclear factor 1-like protein (NF1)
187 EST00152 TX Wilms' tumor-related protein
249 EST00275 TX Zinc Finger Proteins
413 EST01446 TX Zinc Finger Proteins
469 EST01460 TX Zinc Finger Proteins
833 EST01560 TX Zinc Finger Proteins
1230 EST02259 TX Zinc finger proteins
1496 EST02534 TX Zinc finger proteins
2324 EST01352 TX Zinc Finger Proteins
Group Key: CS: Cell Surface, DC: Developmental Control, EM: Energy Metabolism, KP: Kinases and Phosphatases, OG: Oncogenes, OM: Other Metabolism, PI: Peptidases and Peptidase Inhibitors, RT: Receptors, SC: Structural and Cytoskeletal, ST: Signal Transduction, TP: Transporters, TT: Transcription, Translation, and Subcellular Localization, TX: Transcription Factors.
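Tables 10 and 11 classify the same ESTs. In the rows reproduced here, every thirteen-class code falls under exactly one three-class group. The sketch below records that correspondence and uses it to collapse Table 11-style rows into the three-class scheme.

Two caveats apply to this hypothetical helper:
- The mapping was inferred by comparing the two tables row by row. The source does not state it.
- The code PP, used for the tyrosine kinase row (SEQ ID 1147), does not appear in the group key. It is therefore left out, and such a row would raise a KeyError.

```python
# Hypothetical mapping, inferred by comparing rows of Tables 10 and 11;
# the patent itself does not state this correspondence.
THIRTEEN_TO_THREE = {
    "CS": "S",  # Cell Surface -> Structural
    "DC": "R",  # Developmental Control -> Regulatory
    "EM": "M",  # Energy Metabolism -> Metabolic
    "KP": "R",  # Kinases and Phosphatases -> Regulatory
    "OG": "R",  # Oncogenes -> Regulatory
    "OM": "M",  # Other Metabolism -> Metabolic
    "PI": "M",  # Peptidases and Peptidase Inhibitors -> Metabolic
    "RT": "R",  # Receptors -> Regulatory
    "SC": "S",  # Structural and Cytoskeletal -> Structural
    "ST": "R",  # Signal Transduction -> Regulatory
    "TP": "M",  # Transporters -> Metabolic
    "TT": "M",  # Transcription, Translation, Subcellular Localization -> Metabolic
    "TX": "R",  # Transcription Factors -> Regulatory
}


def parse_row(line):
    """Split a row such as '97 EST00289 EM Aconitase' into its four fields."""
    seq_id, est, group, name = line.split(maxsplit=3)
    return int(seq_id), est, group, name


def collapse(rows):
    """Map each EST# in Table 11-style rows to its three-class group."""
    result = {}
    for _, est, group, _ in map(parse_row, rows):
        result[est] = THIRTEEN_TO_THREE[group]  # KeyError for codes not in the key
    return result
```

For example, `collapse(["97 EST00289 EM Aconitase", "1910 EST01645 ST Calmodulin"])` returns `{"EST00289": "M", "EST01645": "R"}`.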
EXAMPLE 11
cDNA Libraries Generated From Specific Genomic DNA by Exon Expression & Amplification
Exon amplification was used to express potential exons from genomic DNA in a recombinant vector that contains some of the signals necessary for splicing. If an exon is present in the proper orientation in the vector, that exon will be spliced in a mammalian cell and will become part of that cell's mRNA. The exon splice-product can be purified from other mRNA in the cell in two steps: the mRNA is converted to cDNA, and the recombinant splice-product cDNAs are then selectively amplified.

Cosmid DNA from human chromosome 19q13.3 was digested with BamHI, or with BamHI and BglII. The resulting fragments were collected and size-specifically cloned into an expression vector (Buckler, et al., Proc. Nat'l. Acad. Sci. USA, 88:4005-4009 (1991)). These constructs were introduced into COS cells by electroporation. RNA transcripts were then generated from the SV40 early promoter and an SV40-derived polyadenylation signal, both present in the expression vector. When a fragment of genomic DNA contains an entire exon with flanking intron sequence in the sense orientation, the exon should be retained in the mature poly(A)+ cytoplasmic RNA.

The mRNA was therefore used as a template for cDNA synthesis with reverse transcriptase and a vector-specific primer. The cDNAs were then amplified by PCR using vector-specific primers. A fraction of this first PCR product was reamplified using internal vector primers containing terminal cloning sites. These products were end-repaired with T4 DNA polymerase, digested with the appropriate restriction enzymes, gel purified, and cloned into pBluescript vectors. The constructs were transformed into XL1-Blue competent cells and plated on LB/X-gal/IPTG/ampicillin plates. White colonies were selected and expanded to prepare DNA templates as described in Example 2. When multiple cosmids or YAC clones were used as the source DNA, a pool of specific expressed exons was obtained as a cDNA library.
The EST/cDNAs sequenced from this specific library are disclosed herein as SEQ ID NOS: 2412-2417.
EXAMPLE 12
PCR Amplification from Predicted Exons
Computational analyses can be applied to genomic DNA sequences to predict protein-coding regions. The coding-region prediction program CRM (E. Uberbacher and R. Mural, Proc. Natl. Acad. Sci. USA 88:11261-5 (1991)) finds open reading frames and classifies them by their probability of being coding regions. These regions are then examined with the GM program (C. Fields and C. Soderlund, Comp. Applic. Biosci. 6:263 (1990)), which predicts intron-exon structure. PCR primers are then designed to amplify the predicted exons. A PCR assay with these primers tests human cDNA libraries (for example, fetal brain or placental libraries) for the presence of the putative exons.
This strategy has been successfully applied in two large-scale genomic sequencing projects: the Huntington's disease locus on human chromosome 4p16.3 (McCombie, et al., submitted) and the human chromosome locus 19q13.3 (Martin-Gallardo, et al., submitted). Sequences from eleven predicted exons from chromosome 4 were present in the cDNA libraries tested. This indicates that the region contains at least two, and probably three, expressed genes.

In one case, the method produced an amplification product that spanned two predicted exons (SEQ ID NO: 2411). When sequenced, this PCR product contained the two exons from which the primers were originally chosen. It also contained an intervening exon that the CRM program had predicted, but none of the intervening intronic genomic sequence. Similarly, sequencing of PCR products confirmed the presence of the two genes predicted in the chromosome 19 sequence. SEQ ID NO: 2410 includes a partial exon of one of these genes.
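Example 15 gives two criteria for primers such as those designed against predicted exons here:
- Primers should be at least 15, and preferably at least 18, bases long.
- The two primers of a pair should have similar G/C content, so that their melting temperatures are similar.

The minimal sketch below checks a candidate pair using the Wallace rule (Tm of about 2 °C per A or T plus 4 °C per G or C). This is a well-known but crude estimate, meant only for short oligonucleotides. The 4 °C tolerance and all function names are illustrative assumptions, not values from the source.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C) (Wallace rule)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def check_pair(forward, reverse, min_len=18, max_tm_diff=4):
    """Return a list of problems with a primer pair (empty list = acceptable).

    min_len follows the 'more preferably at least 18 bases' guidance;
    max_tm_diff is an assumed tolerance, not a value from the source.
    """
    problems = []
    for name, primer in (("forward", forward), ("reverse", reverse)):
        if len(primer) < min_len:
            problems.append(f"{name} primer shorter than {min_len} nt")
    diff = abs(wallace_tm(forward) - wallace_tm(reverse))
    if diff > max_tm_diff:
        problems.append(f"Tm difference of {diff} C exceeds {max_tm_diff} C")
    return problems
```

A pair with matched composition, such as `check_pair("ACGT" * 5, "TGCA" * 5)`, returns an empty list. Pairing an all-A primer with an all-G primer is flagged for its 36 °C Tm difference.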
EXAMPLE 13
Complete Sequence of EST Clone Inserts
Those with skill in the art of molecular biology know a number of methods for obtaining sequence information from the cDNAs corresponding to the EST sequences. Procedures for these methods are provided in Basic Methods in Molecular Biology (Davis et al., supra). One way to learn more about the cDNA from which an EST was derived is to sequence the remainder of the cDNA clone. The complete insert sequences of four EST clones (representing SEQ ID NOs 188, 189, 223, and 227) were determined using Exonuclease III deletions.

Briefly, EST clones were digested with the restriction enzymes SalI and KpnI, or with PstI and BamHI, for deletions from the Forward-primer and Reverse-primer ends of the insert, respectively. KpnI and PstI leave 3' overhangs, and Exonuclease III cannot initiate digestion at these ends. Deletion therefore proceeds unidirectionally into the cDNA insert and leaves the vector sequence undisturbed. After Exonuclease III was added to the Forward and Reverse deletion reactions, aliquots were removed at defined time intervals and the reaction was stopped to prevent further deletion. S1 nuclease and Klenow DNA polymerase were then added to create blunt-ended fragments suitable for ligation.
Samples from each time point were purified by agarose gel electrophoresis and religated. Two to four representative clones from each time point in each direction were sequenced, giving between 200 and 400 base pairs of sequence data each. Careful selection of deletion conditions and time points produces a deletion series in which consecutive time points differ in length by approximately 100-200 base pairs. The sequence fragments were reassembled into a redundant contiguous sequence using the INHERIT software from Applied Biosystems, Inc. (Foster City, CA). In this way, the complete inserts of these four cDNA clones were sequenced on both strands to an average redundancy of three to four; that is, each base was sequenced between three and four times, on average. These complete insert sequences are disclosed herein as SEQ ID NOs 2418, 2419, 2420, and 2421. They correspond to the original ESTs with SEQ ID NOs 223, 189, 227, and 188, respectively.
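The "average redundancy" quoted above is the mean number of reads covering each base of the insert. The minimal sketch below computes per-base depth, the mean redundancy, and any uncovered positions. It assumes each read has already been placed on the insert as a 0-based, half-open interval; in practice the assembly software would supply that placement. The function and the read coordinates in the usage note are hypothetical, and strand is ignored.

```python
def coverage(insert_length, reads):
    """Per-base depth, mean redundancy, and uncovered positions.

    reads: iterable of (start, end) half-open intervals on the insert,
    0-based; intervals are clipped to the insert boundaries.
    """
    depth = [0] * insert_length
    for start, end in reads:
        for i in range(max(0, start), min(end, insert_length)):
            depth[i] += 1
    mean = sum(depth) / insert_length
    gaps = [i for i, d in enumerate(depth) if d == 0]  # bases no read covers
    return depth, mean, gaps
```

For a 10-base insert covered by reads (0, 6) and (4, 10), the mean redundancy is 1.2 and there are no gaps. A deletion series whose steps (100-200 bp) are shorter than the reads (200-400 bp) keeps consecutive reads overlapping, which is what avoids gaps.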
EXAMPLE 14
Determining Reading Frame, Orientation, and Coding Regions: ESTs and Complete cDNA Sequences
Once the complete cDNA sequence has been determined in accordance with Example 13, the reading frame, orientation, and coding regions are determined by computer techniques. (The complete coding region is considered to be the longest open reading frame that begins with a methionine codon and ends at a stop codon.)
Specifically, the CRM program on the GRAIL server is used as explained in Example 9 to determine probable coding regions. This information is supplemented by location of start and stop codons. Where possible, the results of the CRM analysis are validated by comparison of the cDNA sequence to known sequences using database matching, in accordance with Examples 3 and 4. If a match of 50% (or even less) is found in any particular reading frame and orientation, this serves to verify corresponding CRM results. Alternatively, database matches can be used to determine reading frame and orientation without use of the CRM program. Of course, if the cDNA is derived from a directional library, the probable orientation is already known.
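The coding-region definition used in this Example (the longest open reading frame from a methionine codon to a stop codon) can be applied directly to a cDNA sequence. The minimal sketch below scans all six frames, three on each strand. It is a simple longest-ORF scan, not the CRM/GRAIL method, and the function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return (strand, start, end) of the longest ATG...stop ORF, or None.

    Coordinates are 0-based, half-open, measured on the sequence of the
    reported strand ('-' means the reverse complement); end includes the stop.
    """
    seq = seq.upper()
    best = None  # (length, strand, start, end)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first methionine opens the frame
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if best is None or length > best[0]:
                        best = (length, strand, start, i + 3)
                    start = None  # look for the next ATG in this frame
    return None if best is None else best[1:]
```

For example, `longest_orf("CCATGAAATAGCC")` finds the forward-strand ORF ATG AAA TAG at positions 2 to 11. For `"CTATTTCAT"`, the ORF is found on the reverse strand, which illustrates how orientation falls out of the same scan.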
EXAMPLE 15
Preparation of PCR Primers and Amplification of DNA
In accordance with the present invention, the EST sequences and the corresponding cDNA and genomic sequences may be used to prepare PCR primers for a variety of applications. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases, in length. The procedure of Example 5 is repeated using the desired EST, or using the corresponding cDNA or genomic DNA sequence from Example 13. The two primers of a pair preferably have approximately the same G/C content, so that their melting temperatures are approximately the same. When screening cDNA, introns are of no concern. When screening genomic DNA, however, primers should be chosen so that the amplified region does not span introns, which are usually too large to amplify. The PCR primers and amplified DNA of this Example are used in the Examples that follow.
EXAMPLE 16
Forensic Matching by DNA Sequencing
In one exemplary method, DNA samples are isolated by conventional methods from forensic specimens such as hair, semen, blood, or skin cells. A panel of PCR primers derived from a number of the sequences of Examples 1, 2, 11, 12, and/or 13 is then used in accordance with Example 12 to obtain DNA of approximately 100-200 bases in length from the forensic specimen. Corresponding sequences are obtained from a suspect. Each of these identification DNAs is then sequenced, and a simple database comparison determines the differences, if any, between the suspect's sequences and the sample's.

Statistically significant differences between the suspect's DNA sequences and those from the sample conclusively prove a lack of identity, and a single sequence can be enough to prove it. Identity, by contrast, should be demonstrated with a large number of sequences, all matching. Preferably, a minimum of 50 statistically identical sequences, each 100 bases in length, are used to prove identity between the suspect and the sample.
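The decision logic in this Example is asymmetric: one differing sequence can exclude a suspect, but identity needs many matching sequences. The minimal sketch below makes that logic concrete. It assumes the suspect and specimen sequences have already been aligned to equal length and keyed by (hypothetical) primer-pair names.

The `tolerance` parameter is an assumed, crude stand-in for the statistical test of sequencing error that the text implies. The default of 50 loci follows the preferred minimum given above.

```python
def mismatches(a, b):
    """Positions at which two pre-aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


def compare_panels(suspect, specimen, min_matches=50, tolerance=0):
    """Compare two panels {locus_name: sequence} and return (verdict, details).

    A locus counts as differing only if it has more than `tolerance`
    mismatches (an assumed allowance for sequencing error).
    """
    shared = sorted(suspect.keys() & specimen.keys())
    differing = {}
    for locus in shared:
        diffs = mismatches(suspect[locus], specimen[locus])
        if len(diffs) > tolerance:
            differing[locus] = diffs
    if differing:
        return "exclusion", differing      # one differing locus is enough
    if len(shared) >= min_matches:
        return "match", {}                 # many loci, all identical
    return "inconclusive", {}              # identical so far, too few loci
```

With only one identical locus, the default threshold yields "inconclusive". A single locus with a base difference yields "exclusion".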
EXAMPLE 17
Positive Identification by DNA Sequencing
The technique outlined in the previous example may also be used on a larger scale to provide a unique fingerprint-type identification of any individual. In this technique, primers are prepared from a large number of sequences from Examples 1, 2, 11, 12, and/or 13. Preferably, 20 to 50 different primers are used. These primers are used, in accordance with Example 15, to obtain a corresponding number of PCR-generated DNA segments from the individual in question. Each of these DNA segments is sequenced using the methods set forth in Example 1. The database of sequences generated through this procedure uniquely identifies the individual from whom the sequences were obtained. The same panel of primers may then be used at any later time to absolutely correlate tissue or other biological specimens with that individual.
EXAMPLE 18
Southern Blot Forensic Identification
The procedure of Example 17 is repeated to obtain a panel of 10 to 2000 amplified sequences from an individual and from a specimen. This PCR-generated DNA is then digested with one restriction enzyme or a combination of restriction enzymes, preferably enzymes with four-base recognition sites. Such enzymes are commercially available and known to those of skill in the art. After digestion, the resulting fragments are size-separated in multiple duplicate wells on an agarose gel. They are then transferred to nitrocellulose by Southern blotting techniques well known to those with skill in the art. For a review of Southern blotting, see Davis et al. (Basic Methods in Molecular Biology, 1986, Elsevier Press, pp. 62-65).
A panel of ESTs or complete cDNA sequences from Examples 1, 2, and/or 13, or fragments of them at least 15 bases long, is labeled radioactively or colorimetrically using methods known in the art, for example as end-labeled oligonucleotides derived from the ESTs or as nick-translated sequences. The labeled probes are hybridized to the Southern blot using techniques known in the art (Davis et al., supra). Preferably, at least 5 to 10 of these labeled probes are used, and more preferably at least about 20 or 30, to provide a unique pattern. The bands produced by hybridization with a large sample of ESTs serve as a unique identifier. Because restriction enzyme cleavage will differ between individuals, the band pattern on the Southern blot will also be unique. Increasing the number of EST probes provides a statistically higher level of confidence in the identification, because more sets of bands are available for comparison.
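Why more probes give higher confidence can be shown with the product rule. If each probe's band pattern is shared by some fraction of unrelated people, and the probes behave independently, then the chance of a coincidental match at every probe is the product of those fractions.

The minimal sketch below assumes that independence and uses made-up frequencies. The source gives no population frequencies, so the numbers are purely illustrative.

```python
from math import prod


def random_match_probability(pattern_frequencies):
    """Chance that an unrelated person matches at every probe,
    assuming the probes' band patterns are statistically independent."""
    return prod(pattern_frequencies)


def probes_needed(frequency, target):
    """Smallest number of probes, each with the same (hypothetical) pattern
    frequency, that brings the random-match probability to `target` or below."""
    if not 0 < frequency < 1 or not 0 < target < 1:
        raise ValueError("frequency and target must lie strictly between 0 and 1")
    n, p = 0, 1.0
    while p > target:
        p *= frequency
        n += 1
    return n
```

For instance, if every probe's pattern were shared by half the population, ten probes would bring the random-match probability down to 1/1024. Real frequencies, and any correlation between probes, would change these numbers.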
EXAMPLE 19
Dot Blot Identification Procedure
Another technique for identifying individuals using the sequences disclosed herein uses dot blot hybridization.
Genomic DNA is isolated from the nuclei of the subject to be identified. Oligonucleotide probes approximately 30 bp in length, corresponding to sequences from the ESTs, are synthesized. The probes are hybridized to the genomic DNA under conditions known to those in the art. The oligonucleotides are end-labeled with 32P using polynucleotide kinase (Pharmacia).

Dot blots are created by spotting about 50 ng of cDNA onto nitrocellulose or the like using a vacuum dot blot manifold (BioRad, Richmond, California). The cDNA represents at least 10, and preferably at least 50, sequences corresponding to a variety of the SEQ ID NOs provided in Table 7. The EST clone DNA is fixed to the nitrocellulose filter by baking or UV cross-linking. The filter is then prehybridized and hybridized with labeled probe using techniques known in the art (Davis et al., supra). The 32P-labeled DNA fragments are hybridized sequentially under successively more stringent conditions to detect minimal differences between the 30 bp sequence and the DNA. Tetramethylammonium chloride is useful for identifying clones containing small numbers of nucleotide mismatches (Wood et al., Proc. Natl. Acad. Sci. USA 82(6):1585-1588 (1985), which is hereby incorporated by reference). A unique pattern of dots distinguishes one individual from another.
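A dot blot read-out can be treated as a vector of yes/no hybridization calls, one per spotted sequence. Two individuals can then be compared by counting the dots whose calls differ.

The minimal sketch below assumes that signal intensities have already been quantified from the film or phosphorimager. The intensity values and the threshold are hypothetical, and a real assay would calibrate the threshold against controls.

```python
def dot_pattern(signals, threshold):
    """Convert per-dot signal intensities into a tuple of hybridization calls."""
    return tuple(s >= threshold for s in signals)


def pattern_distance(p, q):
    """Number of dots whose hybridization call differs between two patterns."""
    if len(p) != len(q):
        raise ValueError("patterns must cover the same set of dots")
    return sum(a != b for a, b in zip(p, q))
```

A distance of zero means the two patterns are indistinguishable with this panel. Any nonzero distance indicates that the patterns differ at those dots.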
EXAMPLE 20
Alternative "Fingerprint" Identification Technique
EST sequences and the corresponding complete cDNA sequences can be used to create a unique fingerprint for an individual. Thus, pools of EST sequences can be used in forensics, paternity suits, or the like to differentiate one individual from another.
Entire EST sequences can be used; similarly, oligonucleotides can be prepared from EST sequences. In this example, 20-mer oligonucleotides are prepared from 200 EST sequences using commercially available oligonucleotide services such as Oligos Etc. (Wilsonville, OR). DNA is extracted from patient cell samples using techniques well known to those with skill in the art. The DNA is digested with the restriction enzymes EcoRI and XbaI, and the digested samples are applied to wells for electrophoresis. As is known in the art, the procedure may be modified to use polyacrylamide electrophoresis. In this example, however, samples containing 5 µg of DNA are loaded into wells and separated on 0.8% agarose gels. The DNA is then transferred from the gels onto nitrocellulose by Southern blotting.
Ten nanograms of each oligonucleotide are pooled and end-labeled with 32P. The nitrocellulose is prehybridized with blocking solution and hybridized with the labeled probes. After hybridization and washing, the nitrocellulose filter is exposed to X-Omat AR X-ray film. The resulting hybridization pattern will be unique for each individual.
It is additionally contemplated within this example that the representative number of EST sequences can be varied for additional accuracy or clarity.
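The individual-specific pattern in this Example comes from differences in restriction fragment lengths. A single base change that destroys an EcoRI (G^AATTC) or XbaI (T^CTAGA) site merges two fragments into one.

The minimal in-silico digest below illustrates the effect on short, made-up sequences. It cuts only the top strand of a linear sequence and ignores partial digestion. The function name and example sequences are hypothetical.

```python
import re

# Recognition sequence and top-strand cut offset for the two enzymes named above.
ENZYMES = {"EcoRI": ("GAATTC", 1), "XbaI": ("TCTAGA", 1)}


def digest(seq, enzymes=("EcoRI", "XbaI")):
    """Fragment lengths from a complete digest of a linear sequence."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # Lookahead so that overlapping occurrences of a site are all found.
        for m in re.finditer(f"(?={site})", seq):
            cuts.add(m.start() + offset)
    bounds = [0, *sorted(cuts), len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical alleles differing by one base inside the EcoRI site.
allele_1 = "AAAA" + "GAATTC" + "CCCC" + "TCTAGA" + "GG"
allele_2 = "AAAA" + "GAGTTC" + "CCCC" + "TCTAGA" + "GG"
```

Here `digest(allele_1)` gives fragments of 5, 10, and 7 bases, whereas `digest(allele_2)` gives 15 and 7. A probe covering the first part of the region would therefore detect a band of a different size in each individual.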
EXAMPLE 21
Identification of genes associated with hereditary diseases
This example illustrates an approach useful for the association of EST sequences with particular phenotypic characteristics. In this example, a particular EST is used as a test probe to associate that EST with a particular phenotypic characteristic.
An EST clone corresponding to EST01643 (SEQ ID NO: 1894) maps to a gene-rich region of chromosome 6. Dr. Julie Korenberg of UCLA/Cedars-Sinai Hospital used FISH to map EST clone HHCMH89, from which EST01643 was derived, to chromosome 6p21. A search of Mendelian Inheritance in Man (supra) showed that 6p21 is a very gene-rich region. It contains several known genes and several diseases whose genes have not been identified. The gene corresponding to EST clone HHCMH89 is thus an immediate candidate for each of these genetic diseases.
Cells from patients with these diseases are isolated and expanded in culture. PCR primers derived from the EST sequences are used to screen genomic DNA and RNA or cDNA from the patients. ESTs that are not amplified in the patients can be positively associated with a particular disease by further analysis.
EXAMPLE 22
Identification of a gene associated with Angelman's disease
Angelman's disease (AD) is characterized by deletions on the long arm of chromosome 15 (15q11-q13) (Williams et al., Am. J. Med. Genet. 32:339-345 (1989), hereby incorporated by reference). The symptoms of the disease include developmental delay, seizures, inappropriate laughter, and ataxic movements, which suggest that the disorder is a neurologic deficiency. This prophetic example illustrates how ESTs, preferably obtained from a human brain cDNA library, may be used to identify the defective gene or genes associated with Angelman's disease. The example is based on analogous work that used genomic DNA, rather than cDNA and ESTs, to identify the genetic defect associated with Angelman's disease. It also illustrates more generally how EST sequences may be used to identify gene sequences associated with an inherited disease that has been mapped to a chromosome location.
ESTs are screened against chromosome 15 from normal individuals, using the techniques described in Examples 5 and 7. The screen identifies ESTs that localize to the long arm of chromosome 15, and preferably to bands 15q11-q13. ESTs that bind to the long arm of chromosome 15 are then hybridized to chromosome 15 from AD patients. These studies are preferably performed using either fluorescence in situ hybridization or somatic cell hybrids that contain fragments of the long arm of chromosome 15 from AD patients.

Chromosome 15-specific ESTs that do not map to chromosome 15 from AD patients are associated with the chromosome deletions present in Angelman's disease. These ESTs are useful as markers for the disease and can be incorporated into diagnostics for genetic screening. Providing gene or other therapies for AD patients requires two further steps: identifying the genes associated with these AD-negative ESTs, and analyzing the polypeptides that those genes encode in normal individuals.
Genetic diseases are not always accompanied by gene deletions. It is therefore also important to use the ESTs that bind to bands 15q11-q13 in AD patients as tools to identify polymorphisms present within the disease population. Restriction fragment length polymorphism (RFLP) analysis can be performed on cells from AD patients or on somatic cell hybrids created using the long arm of chromosome 15. For a review of RFLP techniques, see Donis-Keller et al. (Cell 51:319-337 (1987), hereby incorporated by reference).

DNA is isolated from the somatic cell lines or from cells of AD patients. The DNA is digested with one or more restriction enzymes according to the techniques of Donis-Keller et al. The resulting fragments are separated by gel electrophoresis, denatured, transferred to nitrocellulose, and hybridized with the selected radiolabeled ESTs that localize to the region of interest. The autoradiographic patterns from a number of AD patients are compared with those from normal individuals. Patterns of EST hybridization common to AD patients but absent from normal individuals indicate that the genes associated with these ESTs are candidate genes affected in AD.
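The comparison described above looks for hybridization bands that are common to the AD patients but absent from normal individuals. Expressed with set operations, it reduces to an intersection across patients minus a union across normals.

In the minimal sketch below, each band is represented as an (EST name, fragment size in kb) tuple. That representation, the EST names, and the sizes are hypothetical, chosen only for illustration.

```python
def candidate_bands(patient_patterns, normal_patterns):
    """Bands present in every patient's pattern and in no normal pattern.

    Each pattern is an iterable of bands; a band is represented here,
    hypothetically, as an (EST name, fragment size in kb) tuple.
    """
    if not patient_patterns:
        raise ValueError("at least one patient pattern is required")
    shared = set.intersection(*(set(p) for p in patient_patterns))
    seen_in_normals = set().union(*(set(p) for p in normal_patterns))
    return shared - seen_in_normals
```

A band that appears in only some patients is dropped by the intersection. A requirement this strict suits a single shared defect, and a looser rule (for example, a band present in most patients) could be substituted where the disease is heterogeneous.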
cDNA libraries are prepared from the somatic cell hybrids derived from AD patients, using Lambda ZAP II library kits (Stratagene, La Jolla, California) or other commercially available library kits. The ESTs of interest are used as probes to identify the bacterial colonies carrying genes that correspond to the EST probes. Positive clones are sequenced, and their sequences are compared with the homologous gene sequences derived from normal individuals.
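One way to compare a patient clone with the corresponding normal sequence is a global pairwise alignment followed by a walk along the aligned columns. The sketch below uses a plain Needleman-Wunsch alignment with arbitrary illustrative scores (match +1, mismatch -1, gap -2); the scores and function names are assumptions, not part of the described method, and real work would use an established aligner and account for ambiguous gap placement in repeated sequence.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 1, -1, -2  # illustrative scores, not from the source

def _pair(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH

def global_align(a: str, b: str) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment of a and b; '-' marks gaps."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * GAP
    for j in range(1, m + 1):
        s[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]),
                          s[i - 1][j] + GAP,
                          s[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]):
            x, y, i, j = a[i - 1], b[j - 1], i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + GAP:
            x, y, i = a[i - 1], "-", i - 1
        else:
            x, y, j = "-", b[j - 1], j - 1
        out_a.append(x)
        out_b.append(y)
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def call_alterations(normal: str, patient: str) -> List[Tuple[str, int, str, str]]:
    """(kind, 1-based position in the normal sequence, normal base, patient base)."""
    ref, alt = global_align(normal.upper(), patient.upper())
    calls, pos = [], 0
    for r, p in zip(ref, alt):
        if r != "-":
            pos += 1
        if r == p:
            continue
        if p == "-":
            calls.append(("deletion", pos, r, "-"))
        elif r == "-":
            calls.append(("insertion", pos, "-", p))  # inserted after position pos
        else:
            calls.append(("substitution", pos, r, p))
    return calls

print(call_alterations("ATGGCCATTGTAATG", "ATGGTCATTGTAATG"))
```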
Alterations within gene sequences associated with bands 15q11-q13, including deletions and substitutions, are thus positively identified and associated with AD. Wagstaff et al. were able to identify deletions and substitutions in sequences encoding the GABAA receptor protein subunit from patients with Angelman's disease (Am. J. Hum. Genet. 49:330-337 (1991)). It is likely that other genes will also be found to be associated with the disease.
EXAMPLE 23
Preparation and Use of Antisense Oligonucleotides
Antisense RNA molecules are known to be useful for regulating translation within the cell. Antisense RNA molecules can be produced from EST sequences or from the corresponding gene sequences. These antisense molecules can be used as diagnostic probes to determine whether or not a particular gene is expressed in a cell. Similarly, the antisense molecules can be used as a therapeutic to regulate gene expression once the EST is associated with a particular disease (see Example 22).
The antisense molecules are obtained from a nucleotide sequence by reversing the orientation of the coding region with regard to the promoter. Thus, the antisense RNA is complementary to the corresponding mRNA. For a review of antisense design see Green et al., Ann. Rev. Biochem. 55:569-597 (1986), which is hereby incorporated by reference. The antisense sequences can contain modified sugar phosphate backbones to increase stability and make them less sensitive to RNase activity. Examples of the modifications are described by Rossi et al., Pharmacol. Ther. 50(2):245-254, (1991).
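Because the antisense RNA is complementary to the mRNA, its sequence read 5'→3' is the reverse complement of the sense strand. A minimal sketch, assuming an unmodified sense sequence supplied 5'→3' in DNA or RNA letters; the function name is illustrative, and the backbone modifications mentioned above are not modeled.

```python
def antisense(sense: str, rna: bool = True) -> str:
    """Return the antisense strand (5'->3') for a sense sequence given 5'->3'.

    Accepts DNA (T) or RNA (U) input; emits U if rna is True, otherwise T.
    """
    pair = {"A": "U" if rna else "T", "T": "A", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(sense.upper()))

print(antisense("ATGGCC"))             # GGCCAU
print(antisense("AUGGCC", rna=False))  # GGCCAT
```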
Antisense molecules are introduced into cultured cells that express the gene corresponding to the EST of interest. In a preferred application of this invention, the polypeptide encoded by the gene is identified first, so that the effectiveness of antisense inhibition of translation can be monitored using techniques that include, but are not limited to, antibody-mediated tests such as RIA and ELISA, functional assays, or radiolabeling. The antisense molecule is introduced into the cells by diffusion or by transfection procedures known in the art. The molecules are added to cell samples at a number of different concentrations, preferably between 1×10⁻¹⁰ M and 1×10⁻⁴ M. Once the minimum concentration that adequately controls translation is identified, the optimized dose is translated into a dosage suitable for use in vivo. For example, an inhibiting concentration in culture of 1×10⁻⁷ M translates into a dose of approximately 0.6 mg/kg body weight. Levels of oligonucleotide approaching 100 mg/kg body weight or higher may be possible after the toxicity of the oligonucleotide has been tested in laboratory animals.
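The conversion from a culture concentration to a body-weight dose is not spelled out above. One set of assumptions that reproduces the stated 0.6 mg/kg figure is an oligonucleotide molecular weight of about 6,000 g/mol (roughly a 20-mer) and a distribution volume of about 1 L/kg; both values are assumptions for illustration, not parameters given in the source. The sketch also generates a titration series with one concentration per decade across the preferred 10⁻¹⁰ M to 10⁻⁴ M range.

```python
def dose_mg_per_kg(conc_molar: float, mw_g_per_mol: float = 6000.0,
                   vd_l_per_kg: float = 1.0) -> float:
    """Dose (mg/kg) giving conc_molar if the oligonucleotide distributes evenly
    through vd_l_per_kg litres per kg body weight (assumed default values)."""
    grams_per_litre = conc_molar * mw_g_per_mol
    return grams_per_litre * vd_l_per_kg * 1000.0  # g -> mg

def titration_series(low_exp: int = -10, high_exp: int = -4) -> list:
    """Test concentrations (M), one per decade from 10**low_exp to 10**high_exp."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]

print(round(dose_mg_per_kg(1e-7), 3))  # 0.6
print(len(titration_series()))         # 7
```

Changing either assumed parameter scales the dose proportionally, so the 0.6 mg/kg figure should be read as an order-of-magnitude estimate.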
The antisense can be introduced into the body as a bare or naked oligonucleotide, oligonucleotide encapsulated in lipid, oligonucleotide sequence encapsidated by viral protein, or as oligonucleotide contained in an expression vector such as those described in Example 25. The antisense oligonucleotide is preferably introduced into the vertebrate by injection. It is additionally contemplated that cells from the vertebrate are removed, treated with the antisense oligonucleotide, and reintroduced into the vertebrate. It is further contemplated that the antisense oligonucleotide sequence is incorporated into a ribozyme sequence to enable the antisense to bind and cleave its target. For technical applications of ribozyme and antisense oligonucleotides see Rossi et al.
EXAMPLE 24
Preparation and use of Triple Helix Probes
Triple helix oligonucleotides are used to inhibit transcription from a genome. They are particularly useful for studying alterations in cell activity associated with a particular gene. The EST sequences or complete sequences of the present invention, or more preferably portions of those sequences, can be used to inhibit gene expression in individuals having diseases associated with a particular gene. Similarly, a portion of the EST or corresponding gene sequence can be used to study the effect of inhibiting transcription of a particular gene within a cell. Traditionally, homopurine sequences were considered the most useful; however, homopyrimidine sequences can also inhibit gene expression. Both types of sequences, from either the EST or the gene corresponding to the EST, are therefore contemplated within the scope of this invention. Homopyrimidine oligonucleotides bind in the major groove at homopurine:homopyrimidine sequences. As an example, 10-mer to 20-mer homopyrimidine sequences from the ESTs can be used to inhibit expression from homopurine sequences; SEQ ID NOs such as 282, 888, 719, 670, 994, 240, 873 and 761 contain homopyrimidine 15-mers. Moreover, the natural (beta) anomers of the oligonucleotide units can be replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an intercalating agent such as ethidium bromide can be attached to the 3' end of the alpha oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides suitable for triple helix formation see Griffin et al. (Science 245:967-971 (1989)), which is hereby incorporated by reference.
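Candidate triple helix targets, such as the homopyrimidine 15-mers noted above, can be located by scanning a sequence for uninterrupted runs of C and T. A minimal sketch; the 15-nt default mirrors the 15-mers cited above, and the function name is hypothetical. A run of A and G on one strand corresponds to a homopyrimidine run on the complementary strand, so the same scan with `purine=True` finds those as well.

```python
import re
from typing import List, Tuple

def homo_runs(seq: str, min_len: int = 15, purine: bool = False) -> List[Tuple[int, str]]:
    """Maximal homopyrimidine (C/T) or homopurine (A/G) runs of at least min_len.

    Returns (0-based start, run) pairs on the strand given.
    """
    letters = "AG" if purine else "CT"
    pattern = re.compile("[%s]{%d,}" % (letters, min_len))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]

print(homo_runs("GGATCTTCCTCTTTCCCTTCAAG"))  # [(3, 'TCTTCCTCTTTCCCTTC')]
```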
The oligonucleotides may be prepared on an oligonucleotide synthesizer or purchased from a company specializing in custom oligonucleotide synthesis. The sequences are introduced into cells in culture using techniques known in the art that include, but are not limited to, calcium phosphate precipitation, DEAE-dextran, electroporation, liposome-mediated transfection, or native uptake. Treated cells are monitored for altered cell function. The functions to monitor are predicted from homology between the gene corresponding to the EST from which the oligonucleotide was derived and known gene sequences that have been associated with a particular function. The cell functions can also be predicted from abnormal physiologies in cells derived from individuals with a particular inherited disease, particularly when the EST has been associated with the disease using the techniques described in Example 22.
EXAMPLE 25
Gene expression from DNA Sequences Corresponding to ESTs
A gene sequence of the present invention coding for all or part of a human gene product is introduced into an expression vector using conventional technology. (Techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art.)
Vectors and expression systems are commercially available from a variety of suppliers, including Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767, incorporated herein by this reference.
The following is provided as one exemplary method to generate polypeptide from cloned cDNA sequences. The cDNA from the EST of interest is sequenced to identify the methionine initiation codon and the poly A sequence of the gene. If the cDNA lacks a poly A sequence, one can be added to the construct, for example, by excising the poly A sequence from pSG5 (Stratagene) with the BglI and SalI restriction endonucleases and incorporating it into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney murine leukemia virus; the position of the LTRs in the construct allows efficient stable transfection. The vector also includes the herpes simplex thymidine kinase promoter and the selectable neomycin resistance gene. The cDNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the cDNA, with a PstI restriction site incorporated into the 5' primer and a BglII site at the 5' end of the corresponding 3' primer, taking care to ensure that the cDNA is positioned in frame with the poly A sequence. The purified fragment obtained from the PCR reaction is digested with PstI, blunt-ended with an exonuclease, digested with BglII, purified, and ligated to pXT1 that now contains a poly A sequence and has been digested with BglII.
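The primer design step can be sketched as follows. This is an illustration under stated assumptions, not the inventors' protocol: the ORF start index is assumed to be known from sequencing, the annealing length of 18 nt and the two-base 5' "clamp" (extra bases to help the enzymes cut near the end of the product) are hypothetical choices, and the function names are illustrative. The check for internal PstI (CTGCAG) and BglII (AGATCT) sites matters because such sites would cut the insert during the digestion steps.

```python
PST_I = "CTGCAG"   # PstI recognition site
BGL_II = "AGATCT"  # BglII recognition site
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]

def cloning_primers(cdna: str, orf_start: int, anneal_len: int = 18,
                    clamp: str = "GC") -> tuple:
    """Return (5' primer, 3' primer) for amplifying cdna[orf_start:].

    The 5' primer carries a PstI site; the 3' primer carries a BglII site at
    its 5' end. `clamp` is an assumed short extension, not from the source.
    """
    insert = cdna.upper()[orf_start:]
    if insert[:3] != "ATG":
        raise ValueError("orf_start does not point at an ATG initiation codon")
    for name, site in (("PstI", PST_I), ("BglII", BGL_II)):
        if site in insert:
            raise ValueError(f"insert contains an internal {name} site")
    forward = clamp + PST_I + insert[:anneal_len]
    reverse = clamp + BGL_II + reverse_complement(insert[-anneal_len:])
    return forward, reverse

fwd, rev = cloning_primers("GGG" + "ATGAAACCCGGGTTTAAACCCGGGTTTTAG", orf_start=3)
print(fwd)  # GCCTGCAGATGAAACCCGGGTTTAAA
print(rev)  # GCAGATCTCTAAAACCCGGGTTTAAA
```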
The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under the conditions outlined in the product specification. Positive transfectants are selected by growing the transfected cells in 600 μg/ml G418 (Sigma, St. Louis, Missouri). The protein is preferably released into the supernatant. However, if the protein has membrane-binding domains, it may also be retained within the cell, or its expression may be restricted to the cell surface.
Because it may be necessary to purify and locate the expressed product, synthetic 15-mer peptides based on the amino acid sequence predicted from the cDNA are injected into mice to generate antibodies to the polypeptide encoded by the cDNA.
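The synthetic peptides are chosen from the amino acid sequence predicted from the cDNA. A minimal sketch, assuming the open reading frame is already known (starting at the initiation codon) and using the standard genetic code; the overlapping 15-mer tiling with a step of 10 residues is an illustrative choice, not one stated in the source, and the function names are hypothetical.

```python
from itertools import product
from typing import List

# Standard genetic code, codons enumerated with the first base varying slowest (TCAG order).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

def translate(orf: str) -> str:
    """Translate an ORF (5'->3', starting at ATG) up to the first stop codon."""
    orf = orf.upper().replace("U", "T")
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]  # raises KeyError on ambiguous bases such as N
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def tile_peptides(protein: str, length: int = 15, step: int = 10) -> List[str]:
    """Fixed-length peptides along the protein; step < length gives overlaps."""
    return [protein[i:i + length] for i in range(0, len(protein) - length + 1, step)]

print(translate("ATGTGGTGA"))  # MW
```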
If antibody production is not possible, the cDNA sequence can instead be incorporated into eukaryotic expression vectors and expressed as a chimera with, for example, β-globin. Antibody to β-globin is used to purify the chimera. Protease cleavage sites engineered between the β-globin gene and the cDNA are then used to separate the two polypeptide fragments from one another after translation. One useful expression vector for generating β-globin chimeras is pSG5 (Stratagene), which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al., and many of the methods are available from the technical assistance representatives of Stratagene, Life Technologies, Inc., or Promega. Polypeptide may also be produced from either construct using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).
EXAMPLE 26
Production of an Antibody to a Human Protein
Substantially pure protein or polypeptide is isolated from the transfected or transformed cells as described in Example 25. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows:
A. Monoclonal Antibody Production by Hybridoma Fusion
Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975), or methods derived from it. Briefly, a mouse is repeatedly inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated. The spleen cells are fused with mouse myeloma cells by means of polyethylene glycol, and the excess unfused cells are destroyed by growing the system on selective medium comprising aminopterin (HAT medium). The successfully fused cells are diluted, and aliquots of the dilution are placed in wells of a microtiter plate, where growth of the culture is continued. Antibody-producing clones are identified by detecting antibody in the supernatant fluid of the wells with immunoassay procedures such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and methods derived from it. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2.
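The dilution step works because most growing wells should have been seeded with a single cell. Under the common assumption that cells are distributed among wells according to a Poisson distribution (an assumption, not stated in the source), the probability that a growing well is monoclonal at a mean of λ cells per well is λe^(-λ)/(1 - e^(-λ)). A quick calculation, with an illustrative function name:

```python
import math

def fraction_monoclonal(mean_cells_per_well: float) -> float:
    """P(exactly one cell | at least one cell) under a Poisson seeding model."""
    lam = mean_cells_per_well
    p_zero = math.exp(-lam)
    return lam * p_zero / (1.0 - p_zero)

for lam in (0.3, 1.0, 3.0):
    print(lam, round(fraction_monoclonal(lam), 2))
```

Lower seeding densities raise the monoclonal fraction at the cost of more empty wells.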
B. Polyclonal Antibody Production by Immunization
Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and to the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to the site of inoculation and dose, with either inadequate or excessive doses of antigen resulting in low-titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).
Booster injections can be given at regular intervals, and antiserum is harvested when its antibody titer, determined semi-quantitatively, for example by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, D. Weir (ed.), Blackwell (1973). The plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (roughly 0.7 to 1.3 μM, taking IgG as about 150 kDa). The affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and Friedman, eds.), Amer. Soc. for Microbiol., Washington, D.C. (1980).
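Converting a plateau antibody concentration from mg/ml to molar units requires a molecular weight. The sketch below assumes IgG of about 150 kDa (a typical value; other immunoglobulin classes differ), and the function name is illustrative.

```python
def mg_per_ml_to_micromolar(mg_per_ml: float, mw_da: float = 150_000.0) -> float:
    """Convert a protein concentration in mg/ml to µM (1 mg/ml equals 1 g/L)."""
    molar = mg_per_ml / mw_da  # g/L divided by g/mol gives mol/L
    return molar * 1e6         # mol/L -> µmol/L

for c in (0.1, 0.2):
    print(c, "mg/ml ->", round(mg_per_ml_to_micromolar(c), 2), "uM")
```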
Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample.
EXAMPLE 27
Identification of Tissue Types or Cell Species by Means of Labeled Tissue Specific Antibodies
Identification of specific tissues is accomplished by visualizing tissue specific antigens with antibody preparations according to Example 26 that are conjugated, directly or indirectly, to a detectable marker. Selected labeled antibody species bind to their specific antigen binding partners in tissue sections, cell suspensions, or extracts of soluble proteins from a tissue sample to provide a pattern for qualitative or semi-quantitative interpretation.
Antisera for these procedures must have a potency exceeding that of the native preparation; for that reason, antibodies are concentrated to a mg/ml level by isolating the gamma globulin fraction, for example by ion-exchange chromatography or by ammonium sulfate fractionation. Also, to provide the most specific antisera, unwanted antibodies, for example those to common proteins, must be removed from the gamma globulin fraction, for example by means of insoluble immunoadsorbents, before the antibodies are labeled with the marker. Either monoclonal or heterologous antisera are suitable for either procedure.
A. Immunohistochemical Techniques
Purified, high-titer antibodies, prepared as described above, are conjugated to a detectable marker, as described, for example, by Fudenberg, H., Chap. 26 in: Basic & Clinical Immunology, 3rd Ed. Lange, Los Altos, California (1980) or Rose, N. et al., Chap. 12 in: Methods in Immunodiagnosis, 2d Ed. John Wiley & Sons, New York (1980).
A fluorescent marker, either fluorescein or rhodamine, is preferred, but antibodies can also be labeled with an enzyme, such as horseradish peroxidase, that supports a color-producing reaction with a substrate. Markers can be added to tissue-bound antibody in a second step, as described below. Alternatively, the specific antitissue antibodies can be labeled with ferritin or other electron-dense particles, and the ferritin-coupled antigen-antibody complexes localized by electron microscopy. In yet another approach, the antibodies are radiolabeled, for example with ¹²⁵I, and detected by overlaying the antibody-treated preparation with photographic emulsion.
Preparations for carrying out the procedures can comprise monoclonal or polyclonal antibodies to a single gene product or protein identified as specific to a tissue type, for example brain tissue, or antibody preparations to several antigenically distinct tissue specific antigens can be used in panels, independently or in mixtures, as required.
Tissue sections and cell suspensions are prepared for immunohistochemical examination according to common histological techniques. Multiple cryostat sections (about 4 μm, unfixed) of the unknown tissue and a known control are mounted, and each slide is covered with a different dilution of the antibody preparation. Sections of known and unknown tissues should also be treated with preparations that provide a positive control, a negative control (for example, preimmune serum), and a control for non-specific staining (for example, buffer). Treated sections are incubated in a humid chamber for 30 min at room temperature, rinsed, then washed in buffer for 30-45 min. Excess fluid is blotted away, and the marker is developed.
If the tissue specific antibody was not labeled in the first incubation, it can be labeled at this time in a second antibody-antibody reaction, for example, by adding fluorescein- or enzyme-conjugated antibody against the immunoglobulin class of the antiserum-producing species, for example, fluorescein labeled antibody to mouse IgG. Such labeled sera are commercially available.
The antigen found in the tissues by the above procedure can be quantified by measuring the intensity of color or fluorescence on the tissue section, and calibrating that signal using appropriate standards.
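Calibrating the signal against standards amounts to fitting a standard curve and reading unknowns off it. The sketch below assumes the signal responds linearly to antigen amount over the range of the standards (fluorescence can saturate, so this should be checked); the standard amounts, intensities, and function names are hypothetical.

```python
from typing import Sequence, Tuple

def fit_standard_curve(amounts: Sequence[float],
                       signals: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line: signal = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_amount(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve; valid only inside the range of the standards."""
    return (signal - intercept) / slope

standards = [0, 1, 2, 4, 8]       # hypothetical antigen amounts per section
intensity = [10, 15, 20, 30, 50]  # hypothetical measured fluorescence
slope, intercept = fit_standard_curve(standards, intensity)
print(estimate_amount(35, slope, intercept))  # 5.0
```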
B. Identification of Tissue Specific Soluble Proteins
The visualization of tissue specific proteins, and the identification of unknown tissues from that procedure, is carried out using the labeled antibody reagents and detection strategy described for immunohistochemistry; however, the sample is prepared by an electrophoretic technique that distributes the proteins extracted from the tissue in an orderly array on the basis of molecular weight for detection.
A tissue sample is homogenized using a Virtis apparatus; cell suspensions are disrupted by Dounce homogenization or osmotic lysis, using detergents in either case as required to disrupt cell membranes, as is the practice in the art. Insoluble cell components such as nuclei, microsomes, and membrane fragments are removed by ultracentrifugation, and the soluble protein-containing fraction is concentrated if necessary and reserved for analysis.
A sample of the soluble protein solution is resolved into individual protein species by conventional SDS polyacrylamide gel electrophoresis as described, for example, by Davis, L. et al., Section 19-2 in: Basic Methods in Molecular Biology (P. Leder, ed.), Elsevier, New York (1986), using a set of gels with a range of polyacrylamide concentrations to resolve the entire molecular weight range of proteins to be detected in the sample. A size marker is run in parallel to estimate the molecular weights of the constituent proteins. The sample for analysis is a convenient volume of 5-50 μl containing about 1 to 100 μg of protein. An aliquot of each of the resolved proteins is transferred by blotting to a nitrocellulose filter, a process that maintains the pattern of resolution. Multiple copies are prepared. This procedure, known as Western blot analysis, is well described in Davis, L. et al. (above), Section 19-3. One set of nitrocellulose blots is stained with Coomassie Blue dye to visualize the entire set of proteins for comparison with the antibody-bound proteins. The remaining nitrocellulose filters are then incubated with a solution of one or more specific antisera to tissue specific proteins prepared as described in Example 26. In this procedure, as in procedure A above, appropriate positive and negative sample and reagent controls are run.
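The size marker run in parallel is conventionally used by plotting the logarithm of molecular weight against relative mobility (Rf, the band's migration distance divided by the dye-front distance), which is approximately linear over each gel's resolving range. A minimal sketch of that standard curve; the marker values below are hypothetical and the function name is illustrative.

```python
import math
from typing import Sequence

def estimate_mw(marker_rf: Sequence[float], marker_kda: Sequence[float],
                sample_rf: float) -> float:
    """Estimate molecular weight (kDa) from Rf using a least-squares line of
    log10(MW) against Rf fitted to the size markers."""
    ys = [math.log10(m) for m in marker_kda]
    n = len(marker_rf)
    mean_x = sum(marker_rf) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in marker_rf)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(marker_rf, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** (intercept + slope * sample_rf)

# Hypothetical marker lane: Rf values and known sizes (kDa).
markers_rf = [0.15, 0.30, 0.50, 0.70, 0.85]
markers_kda = [97.4, 66.2, 45.0, 31.0, 21.5]
print(round(estimate_mw(markers_rf, markers_kda, 0.60), 1))
```

Estimates outside the span of the markers are extrapolations, which is one reason the text calls for gels of several polyacrylamide concentrations.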
In either procedure A or B, a detectable label can be attached to the primary tissue antigen-primary antibody complex according to various strategies and permutations thereof. In a straightforward approach, the primary specific antibody can be labeled; alternatively, the unlabeled complex can be bound by a labeled secondary anti-IgG antibody. In other approaches, either the primary or secondary antibody is conjugated to a biotin molecule, which can, in a subsequent step, bind an avidin conjugated marker. According to yet another strategy, enzyme labeled or radioactive protein A, which has the property of binding to any IgG, is bound in a final step to either the primary or secondary antibody.
Binding of one or more tissue specific antibodies, prepared from the gene sequences identified from EST sequences, at levels above those seen in control tissues can identify tissues of unknown origin, for example forensic samples, or differentiated tumor tissue that has metastasized to foreign bodily sites.
The entire contents of all references cited above are hereby incorporated by reference.
While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.
VII. Correlation of EST and Clone Identifiers
The EST sequences of the present invention are identified herein by SEQ ID NO; in the GenBank database by a different (accession) number; in the inventors' laboratory (and upcoming publications) by EST number; and, for the clones submitted to the American Type Culture Collection (Rockville, Maryland USA), by clone name. Table 12 cross-references these identifiers for the ESTs from cDNA, SEQ ID NOS 1-2409.
Certain Sequence ID NOS are excluded from some claims based on their homology to known non-human sequences (See Table 2).
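Table 12 is printed with up to three records per line, so a small parser helps when the cross-references are needed programmatically. The sketch below is illustrative (the names `CrossRef` and `parse_row` are hypothetical); it treats the GenBank accession as optional because at least one entry lacks one, and any entry that deviates from the SEQ ID / EST# / GB# / clone pattern will simply not match and should be checked by hand.

```python
import re
from typing import List, NamedTuple, Optional

class CrossRef(NamedTuple):
    seq_id: int
    est: str
    genbank: Optional[str]  # None where the table gives no GenBank number
    clone: str

# SEQ ID, EST number, optional GenBank accession, clone name.
_GROUP = re.compile(r"(\d+)\s+(EST\d+)\s+(?:(M\d+)\s+)?(H[A-Z0-9]+)")

def parse_row(line: str) -> List[CrossRef]:
    """Split one printed row of Table 12 into its (up to three) records."""
    return [CrossRef(int(m.group(1)), m.group(2), m.group(3), m.group(4))
            for m in _GROUP.finditer(line)]

row = "1 EST00007 M61959 HFBA01 64 EST00066 M62010 HHCC13 128 EST00252 M62191 HHCG57"
for rec in parse_row(row):
    print(rec)
```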
Table 12. SEQ ID NO Cross References
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
1 EST00007 M61959 HFBA01 64 EST00066 M62010 HHCC13 128 EST00252 M62191 HHCG57
2 EST00009 M61953 HFBA05 66 EST00067 M62011 HHCC18 129 EST00321 M62254 HHCG60
3 EST00010 M61961 HFBA07 67 EST00351 M62280 HHCC21 130 EST00355 M62283 HHCG61
4 EST00011 M61962 HFBA08 68 EST00068 M62012 HHCC22 131 EST00322 M62255 HHCG63
5 EST00012 M61963 HFBA10 69 EST00360 M62287 HHCC23 132 EST00110 M62054 HHCG65
6 EST00013 M61964 HFBA11 71 EST00070 M62014 HHCC27 133 EST00111 M62055 HHCG66
7 EST00014 M61965 HFBA11 72 EST00071 M62015 HHCC29 134 EST00375 M62186 HHCG67
8 EST00234 M62172 HFBA26 73 EST00072 M62016 HHCC31 135 EST00112 M62056 HHCG68
9 EST00376 M61966 HFBA20 74 EST00073 M62017 HHCC37 136 EST00113 M62057 HHCG70
10 EST00016 M61967 HFBA23 75 EST00074 M62018 HHCC40 137 EST00272 M62210 HHCG72
11 EST00018 M61968 HFBA36 76 EST00075 M62019 HHCC42 138 EST00114 M62058 HHCG85
12 EST00274 M62212 HFBA51 77 EST00257 M62196 HHCC53 139 EST00116 M62059 HHCH01
13 EST00255 M62194 HFBA65 78 EST00258 M62197 HHCC53 140 EST00117 M62060 HHCH09
14 EST00019 M61969 HFBA66 79 EST00076 M62020 HHCC64 141 EST00118 M62061 HHCH11
15 EST00020 M61970 HFBA69 80 EST00077 M62021 HHCC67 142 EST00323 M62256 HHCH15
16 EST00021 M61971 HFBA71 81 EST00315 M62248 HHCC70 143 EST00119 M62062 HHCH36
17 EST00022 M61972 HFBA77 82 EST00078 M62022 HHCC72 144 EST00120 M62063 HHCH41
18 EST00373 M62299 HFBA84 83 EST00079 M62023 HHCC74 145 EST00121 M62064 HHCH49
19 EST00023 M61973 HFBA86 84 EST00080 M62024 HHCC76 146 EST00122 M62065 HHCH50
20 EST00024 M61974 HFBA87 85 EST00081 M62025 HHCC77 147 EST00292 M62230 HHCH51
21 EST00025 M61975 HFBA89 86 EST00082 M62026 HHCC78 148 EST00236 M62174 HHCH52
22 EST00301 M62239 HFBA90 87 EST00083 M62027 HHCC80 149 EST00123 M62066 HHCH53
23 EST00026 M61976 HHCA08 88 EST00084 M62028 HHCD01 150 EST00124 M62067 HHCH57
24 EST00027 M61977 HHCA05 89 EST00085 M62029 HHCD02 151 EST00125 M62068 HHCH59
25 EST00028 M61978 HHCA08 90 EST00302 M62240 HHCD03 152 EST00126 M62069 HHCM60
26 EST00310 M62245 HHCA02 91 EST00086 M62030 HHCD05 153 EST00127 M62070 HHCM62
27 EST00029 M61979 HHCA08 92 EST00087 M62031 HHCD06 154 EST00128 M62071 HHCM63
28 EST00030 M61980 HHCA09 93 EST00287 M62225 HHCD08 155 EST00129 M62072 HHCM64
29 EST00031 M61981 HHCA104 94 EST00353 M62281 HHCD10 156 EST00130 M62073 HHCM66
30 EST00032 M61982 HHCA10 95 EST00088 M62032 HHCD12 157 EST00131 M62074 HHCM68
31 EST00033 M61983 HHCA11 96 EST00089 M62033 HHCD14 158 EST00132 M62075 HHCH71
32 EST00233 M62171 HHCA11 97 EST00289 M62227 HHCD17 159 EST00325 M62257 HHCH72
33 EST00034 M61984 HHCA13 98 EST00260 M62199 HHCD21 160 EST00326 M62258 HHCH73
34 EST00035 M61985 HHCA14 99 EST00316 M62249 HHCD30 161 EST00247 M62185 HHCH76
35 EST00036 M61986 HHCA18 100 EST00090 M62034 HHCE04 162 EST00133 M62076 HHCH77
36 EST00037 M61987 HHCA21 101 EST00091 M62035 HHCE05 163 EST00134 M62077 HHCH79
37 EST00038 M61988 HHCA23 102 EST00248 M62187 HHCE06 164 EST00135 M62078 HHCM83
38 EST00374 M62300 HHCA53 103 EST00317 M62250 HHCE07 165 EST00136 M62079 HHCH84
39 EST00039 M61989 HHCA53 104 EST00354 M62282 HHCE08 166 EST00137 M62080 HHCH87
40 EST00040 M61990 HHCA54 105 EST00365 M62291 HHCE10 167 EST00138 M62081 HHCH91
41 EST00041 M61991 HHCA54 106 EST00092 M62036 HHCE12 168 EST00140 M62082 HHCH96
42 EST00042 M61992 HHCA55 107 EST00093 M62037 HHCE13 169 EST00141 M62083 HHCI05
43 EST00371 M62297 HHCA57 108 EST00094 M62038 HHCE15 170 EST00295 M62233 HHCI06
45 EST00364 M62290 HHCA68 109 EST00095 M62039 HHCE17 171 EST00327 M62259 HHCI10
46 EST00044 M61994 HHCA69 110 EST00096 M62040 HHCE20 172 EST00142 M62084 HHCI17
47 EST00046 M61995 HHCA71 111 EST00281 M62218 HHCG01 173 EST00143 M62085 HHCI26
48 EST00291 M62229 HHCA73 112 EST00318 M62251 HHCG03 174 EST00296 M62234 HHCI27
49 EST00047 M61996 HHCA79 113 EST00097 M62041 HHCG07 175 EST00144 M62086 HHCI33
50 EST00048 M61997 HHCB04 114 EST00098 M62042 HHCG10 177 EST00328 M62260 HHCI36
51 EST00049 M61998 HHCB08 115 EST00099 M62043 HHCG14 178 EST00294 M62232 HHCI42
52 EST00052 M61999 HHCB16 116 EST00100 M62044 HHCG18 179 EST00145 M62087 HHCI47
53 EST00054 M62000 HHCB35 117 EST00319 M62252 HHCG25 180 EST00299 M62237 HHCI54
54 EST00055 M62001 HHCB51 118 EST00101 M62045 HHCG29 181 EST00147 M62088 HHCI55
55 EST00056 M62002 HHCB52 119 EST00102 M62046 HHCG31
56 EST00057 M62003 HHCB53 120 EST00103 M62047 HHCG33
57 EST00058 M62004 HHCB55 121 EST00104 M62048 HHCG36
58 EST00059 M62005 HHCB96 122 EST00105 M62049 HHCG37
59 EST00061 M62006 HHCC04 123 EST00106 M62050 HHCG38
60 EST00062 M62007 HHCC05 124 EST00107 M62051 HHCG40
61 EST00290 M62228 HHCC07 125 EST00108 M62052 HHCG41
62 EST00064 M62008 HHCC09 126 EST00109 M62053 HHCG44
63 EST00065 M62009 HHCC10 127 EST00320 M62253 HHCG51
182 EST00329 M62261 HHCI59 249 EST00275 M62213 HHCSA86 317 EST00379 M78231 HEFBA01
183 EST00148 M62089 HHCI61 250 EST00197 M62136 HHCSA87 318 EST00380 M78232 HEFBA04
184 EST00149 M62090 HHCI62 251 EST00370 M62296 HHCSA92 319 EST00381 M78233 HEFBA04
185 EST00150 M62091 HHCI73 252 EST00198 M62137 HHCSA95 320 EST00382 M78234 HEFBA07
186 EST00151 M62092 HHCI75 253 EST00199 M62138 HHCSB05 321 EST00383 M78235 HEFBA07
187 EST00152 M62093 HHCI79 254 EST00200 M62139 HHCSB07 321 EST00383 M78235 HEFBA07
188 EST00256 M62195 HHCI84 255 EST00201 M62140 HHCSB08 323 EST00385 M78237 HEFBA10
189 EST00282 M62219 HHCI85 256 EST00345 M62267 HHCSB09 324 EST01827 M85319 HEFBA11
190 EST00153 M62094 HHCI86 257 EST00337 M62268 HHCSB12 325 EST00386 M78238 HEFBA13
191 EST00154 M62095 HHCI88 258 EST00346 M62276 HHCSB13 326 EST00387 M78239 HEFBA13
192 EST00155 M62096 HHCI90 259 EST00202 M62141 HHCSB19 327 EST00388 M78240 HEFBA18
193 EST00156 M62097 HHCI92 260 EST00357 M62285 HHCSB20 328 EST00389 M78241 HEFBA18
194 EST00157 M62098 HHCI93 261 EST00338 M62269 HHCSB21 329 EST00390 M78242 HEFBA21
195 EST00158 M62099 HHCI94 262 EST00339 M62270 HHCSB22 330 EST00391 M78243 HEFBA23
196 EST00159 M62100 HHCJ05 263 EST00203 M62142 HHCSB23 331 EST00392 M78244 HEFBB04
197 EST00160 M62101 HHCJ07 264 EST00204 M62143 HHCSB25 332 EST00393 M78245 HEFBB04
198 EST00161 M62102 HHCJ09 265 EST00205 M62144 HHCSB27 333 EST00394 M78246 HEFBB09
199 EST00277 M62214 HHCJ13 266 EST00206 M62145 HHCSB29 334 EST00395 M78247 HEFBB21
200 EST00162 M62103 HHCJ15 267 EST00297 M62235 HHCSB30 335 EST00396 M78248 HEFBB24
201 EST00163 M62104 HHCJ17 268 EST00369 M62295 HHCSB31 336 EST00397 M78249 HEFBB30
202 EST00298 M62236 HHCJ29 269 EST00293 M62231 HHCSB32 337 EST00398 M78250 HEFBC02
203 EST00164 M62105 HHCJ30 270 EST00207 M62146 HHCSB34 338 EST00399 M78251 HEFBC06
204 EST00235 M62173 HHCJ34 271 EST00283 M62220 HHCSB37 339 EST00400 M78252 HEFBC11
205 EST00165 M62106 HHCJ35 272 EST00340 M62271 HHCSB43 340 EST00402 M78254 HEFBC17
206 EST00166 M62107 HHCJ36 273 EST00208 M62147 HHCSB44 341 EST00403 M78255 HEFBC21
207 EST00167 M62108 HHCJ37 274 EST00268 M62206 HHCSB45 342 EST00404 M78256 HEFBD03
208 EST00250 M62189 HHCJ42 275 EST00209 M62148 HHCSB46 343 EST01428 M79273 HEFBD03
209 EST00331 M62262 HHCJ43 276 EST00211 M62149 HHCSB48 344 EST00405 M78257 HEFBD05
210 EST00168 M62109 HHCJ47 277 EST00212 M62150 HHCSB50 345 EST00406 M78258 HEFBD09
211 EST00332 M62263 HHCJ48 278 EST00342 M62272 HHCSB53 346 EST01828 M85320 HEFBE18
212 EST00169 M62110 HHCJ50 279 EST00213 M62151 HHCSB55 347 EST01829 M85321 HEFBF14
213 EST00170 M62111 HHCJ51 280 EST00343 M62273 HHCSB56 348 EST01830 M85322 HFBA86
214 EST00171 M62112 HHCJ59 281 EST00214 M62152 HHCSB57 349 EST01831 M85323 HFBA90
215 EST00172 M62113 HHCJ60 282 EST00344 M62274 HHCSB58 350 EST00407 M78259 HFBBA01
216 EST00173 M62114 HHCJ61 283 EST00215 M62153 HHCSB62 351 EST00408 M78260 HFBBA02
218 EST00175 M62116 HHCJ67 284 EST00216 M62154 HHCSB68 352 EST00409 M78261 HFBBA03
219 EST00176 M62117 HHCJ73 285 EST00286 M62224 HHCSB69 353 EST00410 M78262 HFBBA04
220 EST00372 M62298 HHCJ74 286 EST00217 M62155 HHCSB70 354 EST01433 M79278 HFBBA05
221 EST00359 M62286 HHCJ78 287 EST00218 M62156 HHCSB73 355 EST00411 M78263 HFBBA06
222 EST00177 M62118 HHCJ79 288 EST00219 M62157 HHCSB77 356 EST00412 M78264 HFBBA07
223 EST00368 M62294 HHCJ80 289 EST00220 M62158 HHCSB78 357 EST00413 M78265 HFBBA08
224 EST00356 M62284 HHCJ85 290 EST00221 M62159 HHCSB79 358 EST00414 M78266 HFBBA09
225 EST00178 M62119 HHCJ84 291 EST00222 M62160 HHCSB81 359 EST00415 M78267 HFBBA10
226 EST00333 M62264 HHCJ85 292 EST00223 M62161 HHCSB83 360 EST00416 M78268 HFBBA11
227 EST00259 M62198 HHCJ86 293 EST00224 M62162 HHCSB84 361 EST00417 M78269 HFBBA14
228 EST00179 M62120 HHCJ91 294 EST00225 M62163 HHCSB91 362 EST00418 M78270 HFBBA15
229 EST00180 M62121 HHCJ89 295 EST00226 M62164 HHCSB92 363 EST00419 M78271 HFBBA16
230 EST00181 M62122 HHCJ92 296 EST00228 M62166 HTCA08 364 EST00420 M78272 HFBBA17
231 EST00334 M62265 HHCJ93 297 EST00230 M62168 HTCA27 365 EST01434 M85318 HFBBA18
232 EST00182 M62123 HHCJ96 298 EST00231 M62169 HTCA27 366 EST00421 M78273 HFBBA20
233 EST00183 M62124 HHCSA26 299 EST00249 M62188 HTCA36 367 EST00422 M78274 HFBBA21
234 EST00184 M62125 HHCSA28 300 EST00232 M62170 HTCB03 368 EST00423 M78275 HFBBA22
235 EST00185 M62126 HHCSA29 301 EST00300 M62238 HTCB05 369 EST00424 M78276 HFBBA23
236 EST00186 M62127 HHCSA30 302 EST00303 M62241 HFBA22 370 EST00425 M78277 HFBBA24
237 EST00187 M62128 HHCSA31 303 EST00348 M62277 HFBA45 371 EST00426 M78278 HFBBA25
238 EST00188 M62129 HHCSA49 304 EST00307 M62242 HFBA78 372 EST00427 M78279 HFBBA26
239 EST00189 M62130 HHCSA53 305 EST00308 M62243 HHCA05 373 EST01832 M85324 HFBBA28
240 EST00335 M62266 HHCSA59 306 EST00309 M62244 HHCA06
241 EST00191 M62131 HHCSA64 307 EST00312 M62246 HHCA57
242 EST00192 M62132 HHCSA70 308 EST00314 M62247 HHCA63
243 EST00193 M62133 HHCSA75 309 EST00174 M62115 HHCJ67
244 EST00194 M62134 HHCSA77 310 EST00377 HHCG05
245 EST00347 M62275 HHCSA79 311 EST00270 M62208 HFBA18
246 EST00196 M62135 HHCSA81 313 EST00276 M62215 HHCJ13
247 EST00279 M62217 HHCSA82 315 EST00008 M61960 HFBA04
248 EST00271 M62209 HHCSA85 316 EST00378 M78230 HEFBA01
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
374 EST00428 M78280 HFBBA30 440 EST01454 M77870 HFBCA22 507 EST00530 M78382 HFBCB13
375 EST00429 M78281 HFBBA32 441 EST00481 M78333 HFBCA24 508 EST00531 M78383 HFBCB14
376 EST01436 M77852 HFBBA33 442 EST01456 M77872 HFBCA26 509 EST01472 M77888 HFBCB15
377 EST00430 M78282 HFBBA34 443 EST00482 M78334 HFBCA27 510 EST00532 M78384 HFBCB16
378 EST00431 M78283 HFBBA37 444 EST00483 M78335 HFBCA28 511 EST00533 M78385 HFBCB19
379 EST00432 M78284 HFBBA38 445 EST00484 M78336 HFBCA30 512 EST00534 M78386 HFBCB20
380 EST01439 M77855 HFBBA40 446 EST00485 M78337 HFBCA31 513 EST00535 M78387 HFBCB21
381 EST00433 M78285 HFBBA41 447 EST00486 M78338 HFBCA32 514 EST00536 M78388 HFBCB22
382 EST00434 M78286 HFBBA42 448 EST00487 M78339 HFBCA33 515 EST00537 M78389 HFBCB23
383 EST00435 M78287 HFBBA43 449 EST00488 M78340 HFBCA34 516 EST00538 M78390 HFBCB24
384 EST01440 M77856 HFBBA44 450 EST00489 M78341 HFBCA35 517 EST00539 M78391 HFBCB26
385 EST00436 M78288 HFBBA45 451 EST00490 M78342 HFBCA36 518 EST00540 M78392 HFBCB28
386 EST00437 M78289 HFBBA47 452 EST00491 M78343 HFBCA37 519 EST00541 M78393 HFBCB31
387 EST00438 M78290 HFBBA48 453 EST00492 M78344 HFBCA38 520 EST00542 M78394 HFBCB33
388 EST00439 M78291 HFBBA49 454 EST00493 M78345 HFBCA39 521 EST01474 M77890 HFBCB34
389 EST00440 M78292 HFBBA51 455 EST00494 M78346 HFBCA40 522 EST00543 M78395 HFBCB35
390 EST01442 M77858 HFBBA52 456 EST01835 M85326 HFBCA41 523 EST01838 M85329 HFBCB36
391 EST00441 M78293 HFBBA53 457 EST00495 M78347 HFBCA42 524 EST00544 M78396 HFBCB38
392 EST00442 M78294 HFBBA54 458 EST00496 M78348 HFBCA43 525 EST00545 M78397 HFBCB40
393 EST00443 M78295 HFBBA56 459 EST00497 M78349 HFBCA44 526 EST00546 M78398 HFBCB42
394 EST00444 M78296 HFBBA57 460 EST01457 M77873 HFBCA45 527 EST00547 M78399 HFBCB43
395 EST00445 M78297 HFBBA58 461 EST01836 M85327 HFBCA46 528 EST00548 M78400 HFBCB46
396 EST01443 M77859 HFBBA59 462 EST00498 M78350 HFBCA47 529 EST00549 M78401 HFBCB48
397 EST00446 M78298 HFBBA60 463 EST01459 M77875 HFBCA49 530 EST01477 M77893 HFBCB49
398 EST00447 M78299 HFBBA61 464 EST00499 M78351 HFBCA50 531 EST00550 M78402 HFBCB51
399 EST00448 M78300 HFBBA62 465 EST00500 M78352 HFBCA51 532 EST00551 M78403 HFBCB52
400 EST00449 M78301 HFBBA63 466 EST00501 M78353 HFBCA52 533 EST00552 M78404 HFBCB53
401 EST00450 M78302 HFBBA64 467 EST00502 M78354 HFBCA54 534 EST01478 M77894 HFBCB54
402 EST00451 M78303 HFBBA65 468 EST00503 M78355 HFBCA56 535 EST00553 M78405 HFBCB55
403 EST00452 M78304 HFBBA66 469 EST01460 M77876 HFBCA58 536 EST01479 M77895 HFBCB56
404 EST00453 M78305 HFBBA67 470 EST00504 M78356 HFBCA59 537 EST00554 M78406 HFBCB57
405 EST00454 M78306 HFBBA68 471 EST00505 M78357 HFBCA60 538 EST00555 M78407 HFBCB59
406 EST00455 M78307 HFBBA69 473 EST00506 M78358 HFBCA63 539 EST00556 M78408 HFBCB60
407 EST00456 M78308 HFBBA72 474 EST00507 M78359 HFBCA64 540 EST00557 M78409 HFBCB61
408 EST00457 M78309 HFBBA73 475 EST00508 M78360 HFBCA65 541 EST00558 M78410 HFBCB62
409 EST01444 M77860 HFBBA74 476 EST00509 M78361 HFBCA70 542 EST01480 M77896 HFBCB63
410 EST00458 M78310 HFBBA76 477 EST01463 M77879 HFBCA71 543 EST00559 M78411 HFBCB64
411 EST00459 M78311 HFBBA77 478 EST00510 M78362 HFBCA72 544 EST00560 M78412 HFBCB65
412 EST01445 M77861 HFBBA80 479 EST00511 M78363 HFBCA73 545 EST01481 M77897 HFBCB66
413 EST01446 M77862 HFBBA81 480 EST01464 M77880 HFBCA74 546 EST01839 M85330 HFBCB67
414 EST00460 M78312 HFBBA82 481 EST00512 M78364 HFBCA76 547 EST00561 M78413 HFBCB69
415 EST00461 M78313 HFBBA83 482 EST01465 M77881 HFBCA77 548 EST00562 M78414 HFBCB71
416 EST00462 M78314 HFBBA84 483 EST00513 M78365 HFBCA79 549 EST00563 M78415 HFBCB73
417 EST00463 M78315 HFBBA85 484 EST00514 M78366 HFBCA80 550 EST00564 M78416 HFBCB74
418 EST00464 M78316 HFBBA87 485 EST01466 M77882 HFBCA82 551 EST01482 M77898 HFBCB75
419 EST00465 M78317 HFBBA89 486 EST00515 M78367 HFBCA83 552 EST00565 M78417 HFBCB76
420 EST00466 M78318 HFBBA90 487 EST00516 M78368 HFBCA84 553 EST00566 M78418 HFBCB77
421 EST00467 M78319 HFBBA91 488 EST00517 M78369 HFBCA85 554 EST00567 M78419 HFBCB79
422 EST01447 M77863 HFBBA92 489 EST00518 M78370 HFBCA86 555 EST01483 M77899 HFBCB80
423 EST00468 M78320 HFBBA93 490 EST00519 M78371 HFBCA87 556 EST00568 M78420 HFBCB81
424 EST01448 M77864 HFBBA96 491 EST00520 M78372 HFBCA88 557 EST00569 M78421 HFBCB82
425 EST00469 M78321 HFBCA01 492 EST00521 M78373 HFBCA90 558 EST01484 M77900 HFBCB83
426 EST00470 M78322 HFBCA02 493 EST00522 M78374 HFBCA91 559 EST00570 M78422 HFBCB84
427 EST01449 M77865 HFBCA04 494 EST00523 M78375 HFBCA92 560 EST01485 M77901 HFBCB85
428 EST01451 M77867 HFBCA06 495 EST00524 M78376 HFBCA94 561 EST00571 M78423 HFBCB87
429 EST00471 M78323 HFBCA07 496 EST00525 M78377 HFBCA95 562 EST00572 M78424 HFBCB89
430 EST00472 M78324 HFBCA08 497 EST00526 M78378 HFBCA96 563 EST00573 M78425 HFBCB91
431 EST00473 M78325 HFBCA09 498 EST01467 M77883 HFBCB01
432 EST01452 M77868 HFBCA11 499 EST01468 M77884 HFBCB02
433 EST00474 M78326 HFBCA12 500 EST00527 M78379 HFBCB03
434 EST00475 M78327 HFBCA13 501 EST02715 HFBCB05
435 EST00476 M78328 HFBCA15 502 EST01469 M77885 HFBCB06
436 EST00477 M78329 HFBCA16 503 EST00528 M78380 HFBCB08
437 EST00478 M78330 HFBCA17 504 EST00529 M78381 HFBCB09
438 EST00479 M78331 HFBCA18 505 EST01837 M85328 HFBCB10
439 EST00480 M78332 HFBCA21 506 EST01471 M77887 HFBCB12
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
564 EST00574 M78426 HFBCB92 630 EST01505 M77921 HFBCC85 696 EST01521 M77937 HFBCD87
565 EST00575 M78427 HFBCB93 631 EST00625 M78477 HFBCC89 697 EST00680 M78532 HFBCD88
566 EST00576 M78428 HFBCB94 632 EST00626 M78478 HFBCC90 698 EST00681 M78533 HFBCD89
567 EST00577 M78429 HFBCB96 633 EST00627 M78479 HFBCC91 699 EST00682 M78534 HFBCD90
568 EST00578 M78430 HFBCC01 634 EST00628 M78480 HFBCC92 700 EST00683 M78535 HFBCD91
569 EST00579 M78431 HFBCC02 635 EST00629 M78481 HFBCC93 701 EST01522 M77938 HFBCD92
570 EST00580 M78432 HFBCC03 636 EST01507 M77923 HFBCC94 702 EST00684 M78536 HFBCD94
571 EST00581 M78433 HFBCC04 637 EST00630 M78482 HFBCC95 703 EST00685 M78537 HFBCD95
572 EST00582 M78434 HFBCC06 638 EST00631 M78483 HFBCD03 704 EST00686 M78538 HFBCD96
573 EST00583 M78435 HFBCC09 639 EST00632 M78484 HFBCD04 705 EST00687 M78539 HFBCE01
574 EST00584 M78436 HFBCC11 640 EST01509 M77925 HFBCD06 706 EST00688 M78540 HFBCE02
575 EST00585 M78437 HFBCC13 641 EST00633 M78485 HFBCD07 707 EST01847 M85335 HFBCE03
576 EST00586 M78438 HFBCC14 642 EST00634 M78486 HFBCD08 708 EST00689 M78541 HFBCE05
577 EST00587 M78439 HFBCC15 643 EST00635 M78487 HFBCD09 709 EST00690 M78542 HFBCE06
578 EST00588 M78440 HFBCC16 644 EST00636 M78488 HFBCD10 710 EST00691 M78543 HFBCE07
579 EST00589 M78441 HFBCC17 645 EST00637 M78489 HFBCD11 711 EST00692 M78544 HFBCE08
580 EST00590 M78442 HFBCC18 646 EST00638 M78490 HFBCD12 712 EST00693 M78545 HFBCE11
581 EST00591 M78443 HFBCC20 647 EST00639 M78491 HFBCD16 713 EST00694 M78546 HFBCE12
582 EST00592 M78444 HFBCC21 648 EST00640 M78492 HFBCD17 714 EST00695 M78547 HFBCE13
583 EST00593 M78445 HFBCC22 649 EST00641 M78493 HFBCD18 715 EST01523 M77939 HFBCE14
584 EST00594 M78446 HFBCC23 650 EST00642 M78494 HFBCD19 716 EST01524 M77940 HFBCE15
585 EST00595 M78447 HFBCC25 651 EST00643 M78495 HFBCD20 717 EST01525 M77941 HFBCE16
586 EST00596 M78448 HFBCC26 652 EST01510 M77926 HFBCD21 718 EST00696 M78548 HFBCE17
587 EST01488 M77904 HFBCC27 653 EST01512 M77928 HFBCD26 719 EST01526 M77942 HFBCE18
588 EST00597 M78449 HFBCC28 654 EST00644 M78496 HFBCD27 720 EST00697 M78549 HFBCE19
589 EST00598 M78450 HFBCC29 655 EST00645 M78497 HFBCD28 721 EST01527 M77943 HFBCE21
590 EST00599 M78451 HFBCC30 656 EST01513 M77929 HFBCD29 722 EST01528 M77944 HFBCE22
591 EST01489 M77905 HFBCC31 657 EST00646 M78498 HFBCD31 723 EST00698 M78550 HFBCE23
592 EST00600 M78452 HFBCC32 658 EST00647 M78499 HFBCD32 724 EST01529 M77945 HFBCE24
593 EST00601 M78453 HFBCC33 659 EST00648 M78500 HFBCD33 725 EST00699 M78551 HFBCE26
594 EST01490 M77906 HFBCC34 660 EST00649 M78501 HFBCD34 726 EST00700 M78552 HFBCE27
595 EST01840 M85331 HFBCC34 661 EST00650 M78502 HFBCD35 727 EST00701 M78553 HFBCE28
596 EST00602 M78454 HFBCC35 662 EST00651 M78503 HFBCD36 728 EST00702 M78554 HFBCE29
597 EST00603 M78455 HFBCC36 663 EST00652 M78504 HFBCD37 729 EST00703 M78555 HFBCE30
598 EST00604 M78456 HFBCC38 664 EST00653 M78505 HFBCD38 730 EST00704 M78556 HFBCE31
599 EST00605 M78457 HFBCC39 665 EST00654 M78506 HFBCD39 731 EST00705 M78557 HFBCE32
600 EST01492 M77908 HFBCC40 666 EST01514 M77930 HFBCD40 732 EST00706 M78558 HFBCE34
601 EST01493 M77909 HFBCC41 667 EST00655 M78507 HFBCD41 733 EST00707 M78559 HFBCE36
602 EST00606 M78458 HFBCC42 668 EST00656 M78508 HFBCD42 734 EST00708 M78560 HFBCE37
603 EST01494 M77910 HFBCC43 669 EST00657 M78509 HFBCD43 735 EST00709 M78561 HFBCE38
604 EST00607 M78459 HFBCC44 670 EST00658 M78510 HFBCD44 736 EST01532 M77948 HFBCE40
605 EST00608 M78460 HFBCC46 671 EST00659 M78511 HFBCD45 737 EST00710 M78562 HFBCE41
606 EST00609 M78461 HFBCC47 672 EST00660 M78512 HFBCD46 738 EST00711 M78563 HFBCE42
607 EST00610 M78462 HFBCC50 673 EST01515 M77931 HFBCD47 739 EST01534 M77950 HFBCE46
608 EST00611 M78463 HFBCC51 674 EST01516 M77932 HFBCD49 740 EST01535 M77951 HFBCE48
609 EST01496 M77912 HFBCC52 675 EST00661 M78513 HFBCD51 741 EST00712 M78564 HFBCE49
610 EST00612 M78464 HFBCC53 676 EST00662 M78514 HFBCD54 742 EST00713 M78565 HFBCE50
611 EST00613 M78465 HFBCC54 677 EST00663 M78515 HFBCD57 743 EST00714 M78566 HFBCE52
612 EST00614 M78466 HFBCC55 678 EST01517 M77933 HFBCD61 744 EST01537 M77953 HFBCE54
613 EST00615 M78467 HFBCC56 679 EST01518 M77934 HFBCD62 745 EST00715 M78567 HFBCE55
614 EST01842 M85332 HFBCC57 680 EST00664 M78516 HFBCD63 746 EST00716 M78568 HFBCE56
615 EST00616 M78468 HFBCC63 681 EST00665 M78517 HFBCD64 747 EST00717 M78569 HFBCE58
616 EST01497 M77913 HFBCC65 682 EST00666 M78518 HFBCD65 748 EST01850 M85337 HFBCE59
617 EST00617 M78469 HFBCC66 683 EST00667 M78519 HFBCD68 749 EST00719 M78571 HFBCE61
618 EST01498 M77914 HFBCC67 684 EST00668 M78520 HFBCD69 750 EST01539 M77955 HFBCE63
619 EST00619 M78471 HFBCC69 685 EST00669 M78521 HFBCD70 751 EST01540 M77956 HFBCE64
620 EST01499 M77915 HFBCC70 686 EST00670 M78522 HFBCD72 752 EST00720 M78572 HFBCE65
621 EST00620 M78472 HFBCC71 687 EST00671 M78523 HFBCD74
622 EST01843 M85333 HFBCC72 688 EST00672 M78524 HFBCD76
623 EST00621 M78473 HFBCC73 689 EST00673 M78525 HFBCD79
624 EST01500 M77916 HFBCC75 690 EST00674 M78526 HFBCD80
625 EST01844 M85334 HFBCC77 691 EST00675 M78527 HFBCD81
626 EST00622 M78474 HFBCC80 692 EST00676 M78528 HFBCD82
627 EST00623 M78475 HFBCC81 693 EST00677 M78529 HFBCD83
628 EST01503 M77919 HFBCC82 694 EST00678 M78530 HFBCD84
629 EST00624 M78476 HFBCC84 695 EST00679 M78531 HFBCD86
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
753 EST00721 M78573 HFBCE66 819 EST01859 M85345 HFBCG01 885 EST01885 M85371 HFBCH12
754 EST00722 M78574 HFBCE69 820 EST01860 M85346 HFBCG02 886 EST01886 M85372 HFBCH14
755 EST00723 M78575 HFBCE72 821 EST01862 M85348 HFBCG09 887 EST01887 M85373 HFBCH15
756 EST01541 M77957 HFBCE73 822 EST01863 M85349 HFBCG10 888 EST01888 M85374 HFBCH16
757 EST01542 M77958 HFBCE74 823 EST01864 M85350 HFBCG11 889 EST01889 M85375 HFBCH17
758 EST00724 M78576 HFBCE77 824 EST01865 M85351 HFBCG12 890 EST01890 M85376 HFBCH18
759 EST00725 M78577 HFBCE78 825 EST01866 M85352 HFBCG13 891 EST01891 M85377 HFBCH20
760 EST00726 M78578 HFBCE80 826 EST01867 M85353 HFBCG15 892 EST01892 M85378 HFBCH21
761 EST01544 M77960 HFBCE82 827 EST01558 M77974 HFBCG17 893 EST01893 M85379 HFBCH22
762 EST00727 M78579 HFBCE83 828 EST00767 M78619 HFBCG19 894 EST01894 M85380 HFBCH23
763 EST00728 M78580 HFBCE84 829 EST00768 M78620 HFBCG20 895 EST01895 M85381 HFBCH24
764 EST00729 M78581 HFBCE85 830 EST01559 M77975 HFBCG21 896 EST01896 M85382 HFBCH31
765 EST00730 M78582 HFBCE86 831 EST00769 M78621 HFBCG22 897 EST01897 M85383 HFBCH33
766 EST00731 M78583 HFBCE87 832 EST00770 M78622 HFBCG23 898 EST01898 M85384 HFBCH34
767 EST00732 M78584 HFBCE88 833 EST01560. M77976 HFBCG24 899 EST01899 M85385 HFBCH35
768 EST00733 M78585 HFBCE90 834 EST00771 M78623 HFBCG25 900 EST01900 M85386 HFBCH36
769 EST00734 M78586 HFBCE91 835 EST00772 M78624 HFBCG26 901 EST01901 M85387 HFBCH37
770 EST00735 M78587 HFBCE93 836 EST00773 M78625 HFBCG27 902 EST01902 M85388 HFBCH38
771 EST01546 M77962 HFBCE94 837 EST01561 M77977 HFBCG29 903 EST01903 M85389 HFBCH39
772 EST00736 M78588 HFBCE95 838 EST00774 M78626 HFBCG30 904 EST01904 M85390 HFBCH42
773 EST01547 M77963 HFBCE96 839 EST01562 M77978 HFBCG31 905 EST01905 M85391 HFBCH43
774 EST01548 M77964 HFBCF01 840 EST00775 M78627 HFBCG32 906 EST01906 M85392 HFBCH45
775 EST00737 M78589 HFBCF03 841 EST00776 M78628 HFBCG33 907 EST01907 M85393 HFBCH46
776 EST00738 M78590 HFBCF07 842 EST01563 M77979 HFBCG34 908 EST01908 M85394 HFBCH50
777 EST00739 M78591 HFBCF09 843 EST01564 M77980 HFBCG35 909 EST01909 M85395 HFBCH56
778 EST00740 M78592 HFBCF10 844 EST01565 M77981 HFBCG37 910 EST01910 M85396 HFBCH57
779 EST00741 M78593 HFBCF11 845 EST00777 M78629 HFBCG38 911 EST01911 M85397 HFBCH58
780 EST01549 M77965 HFBCF13 846 EST00778 M78630 HFBCG40 912 EST01912 M85398 HFBCH60
781 EST01550 M77966 HFBCF14 847 EST00779 M78631 HFBCG43 913 EST01913 M85399 HFBCH61
782 EST01551 M77967 HFBCF16 848 EST01566 M77982 HFBCG44 914 EST01914 M85400 HFBCH62
783 EST01552 M77968 HFBCF23 849 EST01567 M77983 HFBCG45 915 EST01915 M85401 HFBCH63
784 EST01852 M85338 HFBCF41 850 EST00780 M78632 HFBCG47 916 EST01917 M85402 HFBCH65
785 EST01553 M77969 HFBCF42 851 EST00781 M78633 HFBCG49 917 EST01919 M85404 HFBCH68
786 EST00742 M78594 HFBCF43 852 EST00782 M78634 HFBCG51 918 EST01920 M85405 HFBCH70
787 EST00743 M78595 HFBCF44 853 EST00783 M78635 HFBCG53 919 EST01921 M85406 HFBCH71
788 EST00744 M78596 HFBCF45 854 EST00784 M78636 HFBCG57 920 EST01922 M85407 HFBCH72
789 EST00745 M78597 HFBCF46 855 EST00785 M78637 HFBCG61 921 EST01923 M85408 HFBCH73
790 EST01554 M77970 HFBCF47 856 EST01568 M77984 HFBCG62 922 EST01924 M85409 HFBCH74
791 EST00746 M78598 HFBCF48 857 EST01868 M85354 HFBCG69 923 EST01925 M85410 HFBCH76
792 EST00747 M78599 HFBCF49 858 EST01869 M85355 HFBCG72 924 EST01926 M85411 HFBCH77
793 EST00748 M78600 HFBCF50 859 EST01870 M85356 HFBCG73 925 EST01927 M85412 HFBCH78
794 EST01555 M77971 HFBCF51 860 EST00786 M78638 HFBCG74 926 EST01929 M85414 HFBCH81
795 EST00749 M78601 HFBCF52 861 EST01871 M85357 HFBCG76 927 EST01930 M85415 HFBCH82
796 EST00750 M78602 HFBCF53 862 EST01872 M85358 HFBCG77 928 EST01931 M85416 HFBCH84
797 EST00751 M78603 HFBCF54 863 EST01873 M85359 HFBCG78 929 EST01932 M85417 HFBCH86
798 EST01853 M85339 HFBCF56 864 EST00787 M78639 HFBCG79 930 EST01933 M85418 HFBCH87
799 EST00752 M78604 HFBCF57 865 EST01569 M77985 HFBCG80 931 EST01934 M85419 HFBCH90
800 EST00753 M78605 HFBCF58 866 EST01874 M85360 HFBCG81 932 EST01935 M85420 HFBCH92
801 EST00754 M78606 HFBCF60 867 EST01875 M85361 HFBCG83 933 EST01936 M85421 HFBCH93
802 EST00755 M78607 HFBCF61 868 EST01876 M85362 HFBCG84 934 EST01937 M85422 HFBCH94
803 EST00756 M78608 HFBCF63 869 EST00788 M78640 HFBCG85 935 EST01938 M85423 HFBCH95
804 EST00757 M78609 HFBCF68 870 EST00789 M78641 HFBCG88 936 EST01939 M85424 HFBCH96
805 EST00758 M78610 HFBCF73 871 EST00790 M78642 HFBCG89 937 EST01940 M85425 HFBCI01
806 EST00759 M78611 HFBCF74 872 EST00791 M78643 HFBCG90 938 EST01941 M85426 HFBCI02
807 EST00760 M78612 HFBCF75 873 EST00792 M78644 HFBCG92 939 EST01943 M85427 HFBCI05
808 EST00761 M78613 HFBCF79 874 EST00793 M78645 HFBCG93 940 EST01944 M85428 HFBCI06
809 EST00762 M78614 HFBCF81 875 EST00794 M78646 HFBCG94 941 EST01945 M85429 HFBCI08
810 EST00763 M78615 HFBCF84 876 EST00795 M78647 HFBCG96
811 EST00764 M78616 HFBCF85 877 EST01877 M85363 HFBCH01
812 EST01854 M85340 HFBCF86 878 EST01878 M85364 HFBCH02
813 EST00765 M78617 HFBCF87 879 EST01879 M85365 HFBCH03
814 EST00766 M78618 HFBCF89 880 EST01880 M85366 HFBCH05
815 EST01855 M85341 HFBCF90 881 EST01881 M85367 HFBCH06
816 EST01856 M85342 HFBCF91 882 EST01882 M85368 HFBCH07
817 EST01857 M85343 HFBCF93 883 EST01883 M85369 HFBCH08
818 EST01858 M85344 HFBCF94 884 EST01884 M85370 HFBCH10
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
942 EST01947 M85431 HFBCI11 1008 EST02022 M85506 HFBCJ12 1074 EST02092 M85576 HFBCK29
943 EST01948 M85432 HFBCI12 1009 EST02023 M85507 HFBCJ17 1075 EST02093 M85577 HFBCK30
944 EST01949 M85433 HFBCI13 1010 EST02024 M85508 HFBCJ18 1076 EST02094 M85578 HFBCK31
945 EST01950 M85434 HFBCI14 1011 EST02025 M85509 HFBCJ20 1077 EST02096 M85580 HFBCK36
946 EST01953 M85437 HFBCI17 1012 EST02026 M85510 HFBCJ34 1078 EST02097 M85581 HFBCK37
947 EST01954 M85438 HFBCI18 1013 EST02027 M85511 HFBCJ35 1079 EST02098 M85582 HFBCK38
948 EST01957 M85441 HFBCI28 1014 EST02028 M85512 HFBCJ39 1080 EST02099 M85583 HFBCK39
949 EST01958 M85442 HFBCI28 1015 EST02029 M85513 HFBCJ40 1081 EST02100 M85584 HFBCK40
950 EST01959 M85443 HFBCI27 1016 EST02030 M85514 HFBCJ41 1082 EST02101 M85585 HFBCK42
951 EST01960 M85444 HFBCI28 1017 EST02031 M85515 HFBCJ42 1083 EST02102 M85586 HFBCK43
952 EST01961 M85445 HFBCI29 1018 EST02032 M85516 HFBCJ49 1084 EST02103 M85587 HFBCK44
953 EST01962 M85446 HFBCI31 1019 EST02033 M85517 HFBCJ50 1085 EST02104 M85588 HFBCK45
954 EST01963 M85447 HFBCI32 1020 EST02034 M85518 HFBCJ51 1086 EST02093 M85577 HFBCK30
955 EST01966 M85450 HFBCI36 1021 EST02035 M85519 HFBCJ52 1087 EST02106 M85590 HFBCK49
956 EST01968 M85452 HFBCI40 1022 EST02036 M85520 HFBCJ54 1088 EST02107 M85591 HFBCK54
957 EST01969 M85453 HFBCI41 1023 EST02037 M85521 HFBCJ55 1089 EST02108 M85592 HFBCK55
958 EST01970 M85454 HFBCI42 1024 EST02038 M85522 HFBCJ56 1090 EST02109 M85593 HFBCK56
959 EST01972 M85456 HFBCI44 1025 EST02040 M85524 HFBCJ59 1091 EST02110 M85594 HFBCK57
960 EST01973 M85457 HFBCI45 1026 EST02041 M85525 HFBCJ60 1092 EST02111 M85595 HFBCK58
961 EST01974 M85458 HFBCI46 1027 EST02042 M85526 HFBCJ61 1093 EST02112 M85596 HFBCK59
962 EST01975 M85459 HFBCI47 1028 EST02043 M85527 HFBCJ62 1094 EST02113 M85597 HFBCK60
963 EST01976 M85460 HFBCI48 1029 EST02044 M85528 HFBCJ63 1095 EST02114 M85598 HFBCK61
964 EST01977 M85461 HFBCI50 1030 EST02045 M85529 HFBCJ64 1096 EST02115 M85599 HFBCK62
965 EST01978 M85462 HFBCI51 1031 EST02046 M85530 HFBCJ65 1097 EST02116 M85600 HFBCK63
966 EST01979 M85463 HFBCI52 1032 EST02048 M85532 HFBCJ67 1098 EST02117 M85601 HFBCK65
967 EST01980 M85464 HFBCI53 1033 EST02049 M85533 HFBCJ68 1099 EST02118 M85602 HFBCK68
968 EST01981 M85465 HFBCI54 1034 EST02050 M85534 HFBCJ69 1100 EST02119 M85603 HFBCK69
969 EST01982 M85466 HFBCI57 1035 EST02051 M85535 HFBCJ70 1101 EST02120 M85604 HFBCK70
970 EST01983 M85467 HFBCI58 1036 EST02052 M85536 HFBCJ72 1102 EST02121 M85605 HFBCK71
971 EST01985 M85469 HFBCI60 1037 EST02053 M85537 HFBCJ73 1103 EST02122 M85606 HFBCK72
972 EST01986 M85470 HFBCI62 1038 EST02054 M85538 HFBCJ74 1104 EST02123 M85607 HFBCK74
973 EST01987 M85471 HFBCI65 1039 EST02055 M85539 HFBCJ75 1105 EST02124 M85608 HFBCK75
974 EST01988 M85472 HFBCI66 1040 EST02056 M85540 HFBCJ77 1106 EST02125 M85609 HFBCK76
975 EST01989 M85473 HFBCI67 1041 EST02057 M85541 HFBCJ78 1107 EST02126 M85610 HFBCK77
976 EST01990 M85474 HFBCI68 1042 EST02058 M85542 HFBCJ79 1108 EST02127 M85611 HFBCK81
977 EST01991 M85475 HFBCI70 1043 EST02059 M85543 HFBCJ80 1109 EST02128 M85612 HFBCK83
978 EST01992 M85476 HFBCI71 1044 EST02060 M85544 HFBCJ84 1110 EST02129 M85613 HFBCK84
979 EST01993 M85477 HFBCI72 1045 EST02061 M85545 HFBCJ85 1111 EST02131 M85614 HFBCK86
980 EST01994 M85478 HFBCI73 1046 EST02062 M85546 HFBCJ86 1112 EST02132 M85615 HFBCK87
981 EST01995 M85479 HFBCI75 1047 EST02063 M85547 HFBCJ87 1113 EST02133 M85616 HFBCK89
982 EST01996 M85480 HFBCI76 1048 EST02064 M85548 HFBCJ90 1114 EST02134 M85617 HFBCK90
983 EST01997 M85481 HFBCI77 1049 EST02065 M85549 HFBCJ91 1115 EST02135 M85618 HFBCK91
984 EST01998 M85482 HFBCI78 1050 EST02066 M85550 HFBCJ92 1116 EST02136 M85619 HFBCK92
985 EST01999 M85483 HFBCI80 1051 EST02067 M85551 HFBCJ94 1117 EST02137 M85620 HFBCK93
986 EST02000 M85484 HFBCI81 1052 EST02068 M85552 HFBCJ95 1118 EST02138 M85621 HFBCK94
987 EST02001 M85485 HFBCI82 1053 EST02069 M85553 HFBCK01 1119 EST02139 M85622 HFBCK95
988 EST02002 M85486 HFBCI83 1054 EST02070 M85554 HFBCK02 1120 EST02140 M85623 HFBCK96
990 EST02004 M85488 HFBCI85 1056 EST02072 M85556 HFBCK04 1122 EST02142 M85625 HFBCL02
991 EST02005 M85489 HFBCI86 1057 EST02073 M85557 HFBCK05 1123 EST02143 M85626 HFBCL05
992 EST02006 M85490 HFBCI87 1058 EST02074 M85558 HFBCK06 1124 EST02144 M85627 HFBCL07
993 EST02007 M85491 HFBCI88 1059 EST02075 M85559 HFBCK07 1125 EST02145 M85628 HFBCL08
994 EST02008 M85492 HFBCI89 1060 EST02076 M85560 HFBCK08 1126 EST02146 M85629 HFBCL09
995 EST02009 M85493 HFBCI90 1061 EST02078 M85562 HFBCK11 1127 EST02147 M85630 HFBCL10
996 EST02010 M85494 HFBCI91 1062 EST02079 M85563 HFBCK12 1128 EST02148 M85631 HFBCL11
997 EST02011 M85495 HFBCI92 1063 EST02081 M85565 HFBCK16 1129 EST02149 M85632 HFBCL12
998 EST02012 M85496 HFBCI93 1064 EST02082 M85566 HFBCK17 1130 EST02150 M85633 HFBCL13
999 EST02013 M85497 HFBCI95 1065 EST02083 M85567 HFBCK18
1000 EST02014 M85498 HFBCI96 1066 EST02084 M85568 HFBCK19
1001 EST02015 M85499 HFBCJ01 1067 EST02085 M85569 HFBCK20
1002 EST02016 M85500 HFBCJ03 1068 EST02086 M85570 HFBCK21
1003 EST02017 M85501 HFBCJ04 1069 EST02087 M85571 HFBCK22
1004 EST02018 M85502 HFBCJ05 1070 EST02088 M85572 HFBCK23
1005 EST02019 M85503 HFBCJ06 1071 EST02089 M85573 HFBCK25
1006 EST02020 M85504 HFBCJ09 1072 EST02090 M85574 HFBCK26
1007 EST02021 M85505 HFBCJ11 1073 EST02091 M85575 HFBCK28
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
1314 EST02344 M85822 HFBCO38 1380 EST02411 M85887 HFBCP19 1446 EST02482 M85958 HFBCR57
1315 EST02345 M85311 HFBCO38 1381 EST02413 M85889 HFBCP22 1447 EST02483 M85959 HFBCT01
1316 EST02346 M85823 HFBCO39 1382 EST02414 M85890 HFBCP23 1448 EST02484 M85960 HFBCT02
1317 EST02347 M85312 HFBCO39 1383 EST02415 M85891 HFBCP49 1449 EST02485 M85961 HFBCT03
1318 EST02348 M85824 HFBCO41 1384 EST02416 M85892 HFBCP50 1450 EST02486 M85962 HFBCT04
1319 EST02349 M85825 HFBCO42 1385 EST02417 M85893 HFBCP51 1451 EST02487 M85963 HFBCT05
1320 EST02350 M85826 HFBCO43 1386 EST02418 M85894 HFBCP52 1452 EST02488 M85964 HFBCT06
1321 EST02351 M85827 HFBCO44 1387 EST02419 M85895 HFBCP54 1453 EST02489 M85965 HFBCT08
1322 EST02352 M85828 HFBCO45 1388 EST02421 M85897 HFBCP56 1454 EST02490 M85966 HFBCT09
1323 EST02353 M85829 HFBCO46 1389 EST02422 M85898 HFBCP57 1455 EST02491 M85967 HFBCT10
1324 EST02354 M85830 HFBCO47 1390 EST02423 M85899 HFBCP59 1456 EST02492 M85968 HFBCT12
1325 EST02355 M85831 HFBCO48 1391 EST02424 M85900 HFBCP60 1457 EST02493 M85969 HFBCU02
1326 EST02356 M85832 HFBCO49 1392 EST02425 M85901 HFBCP61 1458 EST02495 M85970 HFBCU04
1327 EST02357 M85833 HFBCO50 1393 EST02426 M85902 HFBCP62 1459 EST02496 M85971 HFBCU06
1328 EST02358 M85834 HFBCO51 1394 EST02427 M85903 HFBCP63 1460 EST02497 M85972 HFBCU07
1329 EST02359 M85835 HFBCO53 1395 EST02429 M85905 HFBCP66 1461 EST02498 M85973 HFBCU08
1330 EST02360 M85836 HFBCO54 1396 EST02430 M85906 HFBCP67 1462 EST02499 M85974 HFBCU09
1331 EST02361 M85837 HFBCO58 1397 EST02431 M85907 HFBCP69 1463 EST02500 M85975 HFBCU11
1332 EST02362 M85838 HFBCO59 1398 EST02432 M85908 HFBCP71 1464 EST02501 M85976 HFBCU12
1333 EST02363 M85839 HFBCO60 1399 EST02433 M85909 HFBCP72 1465 EST02502 M85977 HFBCV02
1334 EST02364 M85840 HFBCO61 1400 EST02434 M85910 HFBCP73 1466 EST02503 M85978 HFBCV03
1335 EST02365 M85841 HFBCO63 1401 EST02435 M85911 HFBCP74 1467 EST02504 M85979 HFBCV04
1336 EST02366 M85842 HFBCO64 1402 EST02436 M85912 HFBCP75 1468 EST02505 M85980 HFBCV05
1337 EST02367 M85843 HFBCO65 1403 EST02437 M85913 HFBCP76 1469 EST02506 M85981 HFBCV06
1338 EST02368 M85844 HFBCO66 1404 EST02438 M85914 HFBCP77 1470 EST02507 M85982 HFBCV07
1339 EST02369 M85845 HFBCO67 1405 EST02439 M85915 HFBCP78 1471 EST02508 M85983 HFBCV08
1340 EST02370 M85846 HFBCO69 1406 EST02440 M85916 HFBCP79 1472 EST02509 M85984 HFBCV09
1341 EST02371 M85847 HFBCO70 1407 EST02441 M85917 HFBCP81 1473 EST02510 M85985 HFBCV10
1342 EST02372 M85848 HFBCO71 1408 EST02442 M85918 HFBCP82 1474 EST02511 M85986 HFBCV11
1343 EST02373 M85849 HFBCO73 1409 EST02443 M85919 HFBCP83 1475 EST02512 M85987 HFBCX02
1344 EST02374 M85850 HFBCO74 1410 EST02444 M85920 HFBCP84 1476 EST02513 M85988 HFBCX05
1345 EST02375 M85851 HFBCO75 1411 EST02445 M85921 HFBCP85 1477 EST02514 M85989 HFBCX06
1346 EST02376 M85852 HFBCO76 1412 EST02446 M85922 HFBCP86 1478 EST02515 M85990 HFBCX08
1347 EST02377 M85853 HFBCO77 1413 EST02447 M85923 HFBCP87 1479 EST02516 M85991 HFBCX10
1348 EST02378 M85854 HFBCO78 1414 EST02448 M85924 HFBCP88 1480 EST02517 M85992 HFBCY02
1349 EST02379 M85855 HFBCO79 1415 EST02449 M85925 HFBCP89 1481 EST02518 M85993 HFBCY03
1350 EST02380 M85856 HFBCO80 1416 EST02450 M85926 HFBCP90 1482 EST02519 M85994 HFBCY04
1351 EST02381 M85857 HFBCO81 1417 EST02452 M85928 HFBCP92 1483 EST02520 M85995 HFBCY05
1352 EST02382 M85858 HFBCO82 1418 EST02453 M85929 HFBCP93 1484 EST02521 M85996 HFBCY06
1353 EST02383 M85859 HFBCO83 1419 EST02454 M85930 HFBCP94 1485 EST02522 M85997 HFBCY07
1354 EST02384 M85860 HFBCO84 1420 EST02456 M85932 HFBCQ14 1486 EST02523 M85998 HFBCY08
1355 EST02385 M85861 HFBCO85 1421 EST02457 M85933 HFBCQ18 1487 EST02524 M85999 HFBCY09
1356 EST02386 M85862 HFBCO87 1422 EST02458 M85934 HFBCR01 1488 EST02525 M86000 HFBCY10
1357 EST02387 M85863 HFBCO88 1423 EST02459 M85935 HFBCR02 1489 EST02526 M86001 HFBCY11
1358 EST02388 M85864 HFBCO89 1424 EST02460 M85936 HFBCR04 1490 EST02527 M86002 HFBCY12
1359 EST02390 M85866 HFBCO91 1425 EST02461 M85937 HFBCR05 1491 EST02529 M86004 HFBCY15
1360 EST02391 M85867 HFBCO93 1426 EST02462 M85938 HFBCR06 1492 EST02530 M86005 HFBCY16
1361 EST02392 M85868 HFBCO94 1427 EST02463 M85939 HFBCR07 1493 EST02531 M86006 HFBCY17
1362 EST02393 M85869 HFBCO95 1428 EST02464 M85940 HFBCR08 1494 EST02532 M86007 HFBCY18
1363 EST02394 M85870 HFBCO96 1429 EST02465 M85941 HFBCR09 1495 EST02533 M86008 HFBCY19
1364 EST02395 M85871 HFBCP02 1430 EST02466 M85942 HFBCR10 1496 EST02534 M86009 HFBCY20
1365 EST02396 M85872 HFBCP03 1431 EST02467 M85943 HFBCR11 1497 EST02535 M86010 HFBCY21
1366 EST02397 M85873 HFBCP04 1432 EST02468 M85944 HFBCR36 1498 EST02536 M86011 HFBCY22
1367 EST02398 M85874 HFBCP06 1433 EST02469 M85945 HFBCR37 1499 EST02537 M86012 HFBCY24
1368 EST02399 M85875 HFBCP07 1434 EST02470 M85946 HFBCR38 1500 EST02538 M86013 HFBCY25
1369 EST02400 M85876 HFBCP08 1435 EST02471 M85947 HFBCR39 1501 EST02539 M86014 HFBCY26
1370 EST02401 M85877 HFBCP09 1436 EST02472 M85948 HFBCR40 1502 EST02540 M86015 HFBCY27
1371 EST02402 M85878 HFBCP10 1437 EST02473 M85949 HFBCR42
1372 EST02403 M85879 HFBCP11 1438 EST02474 M85950 HFBCR46
1373 EST02404 M85880 HFBCP12 1439 EST02475 M85951 HFBCR48
1374 EST02405 M85881 HFBCP13 1440 EST02476 M85952 HFBCR49
1375 EST02406 M85882 HFBCP14 1441 EST02477 M85953 HFBCR51
1376 EST02407 M85883 HFBCP15 1442 EST02478 M85954 HFBCR53
1377 EST02408 M85884 HFBCP16 1443 EST02479 M85955 HFBCR54
1378 EST02409 M85885 HFBCP17 1444 EST02480 M85956 HFBCR55
1379 EST02410 M85886 HFBCP18 1445 EST02481 M85957 HFBCR56
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
1503 EST02541 M86016 HFBCY28 1569 EST02612 M86087 HFBDJ16 1635 EST02682 M85316 HHCF46
1504 EST02542 M86017 HFBCY30 1570 EST02613 M86088 HFBDJ17 1636 EST02684 M85317 HHCF93
1505 EST02543 M86018 HFBCY31 1571 EST02614 M86089 HFBDJ19 1637 EST00796 M78648 HHCMA16
1506 EST02545 M86020 HFBCY33 1572 EST02615 M86090 HFBDJ20 1638 EST00798 M78650 HHCMA26
1507 EST02546 M86021 HFBCY34 1573 EST02616 M86091 HFBDJ21 1639 EST00799 M78651 HHCMA28
1508 EST02547 M86022 HFBCY35 1574 EST02617 M86092 HFBDJ22 1640 EST00800 M78652 HHCMA29
1509 EST02548 M86023 HFBCY36 1575 EST02618 M86093 HFBDJ24 1641 EST00801 M78653 HHCMA32
1510 EST02549 M86024 HFBCY37 1576 EST02619 M86094 HFBDJ25 1642 EST00802 M78654 HHCMA34
1511 EST02550 M86025 HFBCY38 1577 EST02620 M86095 HFBDJ29 1643 EST00803 M78655 HHCMA36
1512 EST02551 M86026 HFBCY40 1578 EST02621 M86096 HFBDJ31 1644 EST01571 M77986 HHCMA37
1513 EST02552 M86027 HFBCY41 1579 EST02622 M86097 HFBDJ32 1645 EST00804 M78656 HHCMA40
1514 EST02553 M86028 HFBCY43 1580 EST02623 M86098 HFBDJ35 1646 EST00805 M78657 HHCMA41
1515 EST02554 M86029 HFBCY44 1581 EST02625 M86099 HFBDJ38 1647 EST00806 M78658 HHCMA45
1516 EST02555 M86030 HFBCY45 1582 EST02626 M86100 HFBDJ40 1648 EST00807 M78659 HHCMA45
1517 EST02558 M86033 HFBCY50 1583 EST02628 M86102 HFBDJ42 1649 EST00808 M78660 HHCMA48
1518 EST02559 M86034 HFBCY51 1584 EST02629 M86103 HFBDJ43 1650 EST00809 M78661 HHCMA49
1519 EST02560 M86035 HFBCY52 1585 EST02630 M86104 HFBDJ44 1651 EST00810 M78662 HHCMA52
1520 EST02561 M86036 HFBCY53 1586 EST02631 M86105 HFBDJ45 1652 EST00811 M78663 HHCMA53
1521 EST02562 M86037 HFBCY54 1587 EST02632 M86106 HFBDJ46 1653 EST00812 M78664 HHCMA55
1522 EST02563 M86038 HFBCY55 1588 EST02633 M86107 HFBDJ47 1654 EST01572 M77987 HHCMA56
1523 EST02564 M86039 HFBCZ01 1589 EST02634 M86108 HFBDJ49 1655 EST00813 M78665 HHCMA57
1524 EST02565 M86040 HFBCZ05 1590 EST02635 M86109 HFBDJ50 1656 EST00814 M78666 HHCMA58
1525 EST02566 M86041 HFBCZ06 1591 EST02636 M86110 HFBDJ51 1657 EST00815 M78667 HHCMA60
1526 EST02567 M86042 HFBCZ07 1592 EST02637 M86111 HFBDJ52 1658 EST00816 M78668 HHCMA62
1527 EST02568 M86043 HFBCZ08 1593 EST02638 M86112 HFBDJ53 1659 EST00817 M78669 HHCMA65
1528 EST02569 M86044 HFBCZ09 1594 EST02639 M86113 HFBDJ54 1660 EST00818 M78670 HHCMA68
1529 EST02570 M86045 HFBCZ10 1595 EST02640 M86114 HFBDJ55 1661 EST00819 M78671 HHCMA72
1530 EST02571 M86046 HFBCZ11 1596 EST02641 M86115 HFBDJ57 1662 EST00820 M78672 HHCMA73
1531 EST02572 M86047 HFBCZ12 1597 EST02642 M86116 HFBDJ61 1663 EST00821 M78673 HHCMA78
1532 EST02573 M86048 HFBDE01 1598 EST02643 M86117 HFBDJ62 1664 EST00822 M78674 HHCMA80
1533 EST02574 M86049 HFBDE03 1599 EST02644 M86118 HFBDJ63 1665 EST00823 M78675 HHCMA81
1534 EST02575 M86050 HFBDE04 1600 EST02645 M86119 HFBDJ64 1666 EST00824 M78676 HHCMA82
1535 EST02576 M86051 HFBDE05 1601 EST02646 M86120 HFBDJ65 1667 EST00825 M78677 HHCMA83
1536 EST02577 M86052 HFBDE06 1602 EST02647 M86121 HFBDJ67 1668 EST00826 M78678 HHCMA84
1537 EST02578 M86053 HFBDE07 1603 EST02648 M86122 HFBDJ69 1669 EST00827 M78679 HHCMB03
1538 EST02579 M86054 HFBDE09 1604 EST02649 M86123 HFBDJ70 1670 EST00828 M78680 HHCMB12
1539 EST02580 M86055 HFBDE12 1605 EST02650 M86124 HFBDJ71 1671 EST00829 M78681 HHCMB12
1540 EST02581 M86056 HFBDF01 1606 EST02651 M86125 HFBDJ72 1672 EST00830 M78682 HHCMC04
1541 EST02582 M86057 HFBDF02 1607 EST02652 M86126 HFBDJ74 1673 EST00831 M78683 HHCMC05
1542 EST02583 M86058 HFBDF03 1608 EST02653 M86127 HFBDJ75 1674 EST00832 M78684 HHCMC06
1543 EST02584 M86059 HFBDF04 1609 EST02654 M86128 HFBDJ76 1675 EST00833 M78685 HHCMC07
1544 EST02586 M86061 HFBDF06 1610 EST02655 M86129 HFBDJ77 1676 EST00834 M78686 HHCMC09
1545 EST02587 M86062 HFBDF10 1611 EST02656 M86130 HFBDJ79 1677 EST00835 M78687 HHCMC10
1546 EST02588 M86063 HFBDF11 1612 EST02657 M86131 HFBDJ80 1678 EST00836 M78688 HHCMC11
1547 EST02589 M86064 HFBDI01 1613 EST02658 M86132 HFBDJ86 1679 EST00837 M78689 HHCMC12
1548 EST02590 M86065 HFBDI02 1614 EST02659 M86133 HFBDJ87 1680 EST00838 M78690 HHCMC13
1549 EST02591 M86066 HFBDI03 1615 EST02660 M86134 HFBDJ89 1681 EST01573 M77988 HHCMC14
1550 EST02592 M86067 HFBDI04 1616 EST02661 M86135 HFBDJ91 1682 EST00839 M78691 HHCMC15
1551 EST02593 M86068 HFBDI05 1617 EST02662 M86136 HFBDJ92 1683 EST00840 M78692 HHCMC16
1552 EST02594 M86069 HFBDI06 1618 EST02663 M86137 HFBDJ93 1684 EST00841 M78693 HHCMC17
1553 EST02595 M86070 HFBDI07 1619 EST02665 M86139 HFBDK02 1685 EST00842 M78694 HHCMC18
1554 EST02597 M86072 HFBDI10 1620 EST02666 M86140 HFBDK11 1686 EST01574 M77989 HHCMC21
1555 EST02598 M86073 HFBDI11 1621 EST02667 M86141 HHCE25 1687 EST00843 M78695 HHCMC28
1556 EST02599 M86074 HFBDI12 1622 EST02668 M86142 HHCE27 1688 EST00844 M78696 HHCMC36
1557 EST02600 M86075 HFBDJ01 1623 EST02669 M86143 HHCE30 1689 EST00845 M78697 HHCMC37
1558 EST02601 M86076 HFBDJ02 1624 EST02670 M86144 HHCE32 1690 EST00846 M78698 HHCMC38
1559 EST02602 M86077 HFBDJ04 1625 EST02672 M86146 HHCE36 1691 EST01577 M77992 HHCMC39
1560 EST02603 M86078 HFBDJ06 1626 EST02673 M86147 HHCE37
1561 EST02604 M86079 HFBDJ07 1627 EST02674 M86148 HHCF07
1562 EST02605 M86080 HFBDJ09 1628 EST02675 M85313 HHCF07
1563 EST02606 M86081 HFBDJ10 1629 EST02677 M85314 HHCF19
1564 EST02607 M86082 HFBDJ11 1630 EST02676 M86149 HHCF19
1565 EST02608 M86083 HFBDJ12 1631 EST02678 M86150 HHCF40
1566 EST02609 M86084 HFBDJ13 1632 EST02679 M86151 HHCF44
1567 EST02610 M86085 HFBDJ14 1633 EST02680 M85315 HHCF44
1568 EST02611 M86086 HFBDJ15 1634 EST02681 M86152 HHCF46
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
1881 EST01000 M78852 HHCMH48 1947 EST01052 M78904 HHCPC05 2013 EST01107 M78959 HHCPE80
1882 EST01638 M78051 HHCMH53 1948 EST01053 M78905 HHCPC05 2014 EST01108 M78960 HHCPE81
1883 EST01001 M78853 HHCMH54 1949 EST01054 M78906 HHCPC08 2015 EST01109 M78961 HHCPE83
1884 EST01002 M78854 HHCMH55 1950 EST01055 M78907 HHCPC08 2016 EST01110 M78962 HHCPE85
1885 EST01639 M78052 HHCMH61 1951 EST01056 M78908 HHCPC10 2017 EST01678 M78088 HHCPE89
1886 EST01003 M78855 HHCMH69 1952 EST01057 M78909 HHCPC11 2018 EST01111 M78963 HHCPE94
1887 EST01004 M78856 HHCMH72 1953 EST01661 M78073 HHCPC11 2019 EST01112 M78964 HHCPE96
1888 EST01005 M78857 HHCMH73 1954 EST01058 M78910 HHCPC14 2020 EST01113 M78965 HHCPF04
1889 EST01006 M78858 HHCMH73 1955 EST01662 M78074 HHCPC16 2021 EST01114 M78966 HHCPF10
1890 EST01007 M78859 HHCMH74 1956 EST01663 M78075 HHCPC18 2022 EST01115 M78967 HHCPF11
1891 EST01008 M78860 HHCMH75 1957 EST01059 M78911 HHCPC29 2023 EST01116 M78968 HHCPF12
1892 EST01009 M78861 HHCMH77 1958 EST01060 M78912 HHCPC37 2024 EST01117 M78969 HHCPF16
1893 EST01642 M78055 HHCMH81 1959 EST01061 M78913 HHCPC40 2025 EST01118 M78970 HHCPF24
1894 EST01643 M78056 HHCMH89 1960 EST02697 M86165 HHCPC42 2026 EST01119 M78971 HHCPF29
1895 EST01010 M78862 HHCMH89 1961 EST02698 M86166 HHCPC44 2027 EST01120 M78972 HHCPF33
1896 EST01011 M78863 HHCMH91 1962 EST01062 M78914 HHCPC45 2028 EST01121 M78973 HHCPF43
1897 EST01012 M78864 HHCPA01 1963 EST01063 M78915 HHCPC49 2029 EST01682 M78092 HHCPF44
1898 EST01013 M78865 HHCPA05 1964 EST01064 M78916 HHCPC52 2030 EST01122 M78974 HHCPF50
1899 EST01014 M78866 HHCPA14 1965 EST01664 M78076 HHCPC53 2031 EST01123 M78975 HHCPF52
1900 EST01015 M78867 HHCPA16 1966 EST01065 M78917 HHCPC55 2032 EST01683 M78093 HHCPF54
1901 EST01016 M78868 HHCPA17 1967 EST01066 M78918 HHCPC58 2033 EST01684 M78094 HHCPF56
1902 EST01017 M78869 HHCPA18 1968 EST01067 M78919 HHCPC60 2034 EST01124 M78976 HHCPF58
1903 EST01018 M78870 HHCPA25 1969 EST01068 M78920 HHCPC62 2035 EST01125 M78977 HHCPF61
1904 EST01019 M78871 HHCPA26 1970 EST01666 M78078 HHCPC65 2036 EST01126 M78978 HHCPF62
1905 EST01020 M78872 HHCPA28 1971 EST01069 M78921 HHCPC73 2037 EST01686 M78096 HHCPF66
1906 EST01021 M78873 HHCPA29 1972 EST01070 M78922 HHCPC78 2038 EST01127 M78979 HHCPF72
1907 EST01022 M78874 HHCPA42 1973 EST01071 M78923 HHCPC79 2039 EST01128 M78980 HHCPF79
1908 EST01023 M78875 HHCPA49 1974 EST01667 M78079 HHCPC92 2040 EST01129 M78981 HHCPF81
1909 EST01024 M78876 HHCPA52 1975 EST01073 M78925 HHCPC95 2041 EST01130 M78982 HHCPF84
1910 EST01645 M78058 HHCPA59 1976 EST01074 M78926 HHCPD06 2042 EST01688 M78098 HHCPF86
1911 EST02694 M86162 HHCPA59 1977 EST01075 M78927 HHCPD20 2043 EST01131 M78983 HHCPF87
1912 EST01025 M78877 HHCPA60 1978 EST01076 M78928 HHCPD26 2044 EST01132 M78984 HHCPF91
1913 EST01646 M78059 HHCPA67 1979 EST01077 M78929 HHCPD27 2045 EST01133 M78985 HHCPF92
1914 EST01026 M78878 HHCPA76 1980 EST01078 M78930 HHCPD28 2046 EST01134 M78986 HHCPG03
1915 EST01027 M78879 HHCPA78 1981 EST01079 M78931 HHCPD34 2047 EST01135 M78987 HHCPG05
1916 EST01028 M78880 HHCPA81 1982 EST01080 M78932 HHCPD43 2048 EST01136 M78988 HHCPG13
1917 EST01029 M78881 HHCPA83 1983 EST01081 M78933 HHCPD48 2049 EST01689 M78099 HHCPG14
1918 EST02695 M86163 HHCPA83 1984 EST01082 M78934 HHCPD49 2050 EST01137 M78989 HHCPG35
1919 EST01030 M78882 HHCPA88 1985 EST01083 M78935 HHCPD50 2051 EST01138 M78990 HHCPG37
1920 EST01031 M78883 HHCPA89 1986 EST01084 M78936 HHCPD56 2052 EST01139 M78991 HHCPG39
1921 EST01647 M78060 HHCPA90 1987 EST02700 M86168 HHCPD59 2053 EST01140 M78992 HHCPG42
1922 EST01032 M78884 HHCPB01 1988 EST01085 M78937 HHCPD70 2054 EST01141 M78993 HHCPG45
1923 EST01033 M78885 HHCPB10 1989 EST01086 M78938 HHCPD76 2055 EST01690 M78100 HHCPG47
1924 EST01034 M78886 HHCPB11 1990 EST01087 M78939 HHCPD80 2056 EST01142 M78994 HHCPG48
1925 EST01035 M78887 HHCPB15 1991 EST01088 M78940 HHCPE09 2057 EST01143 M78995 HHCPG62
1926 EST01036 M78888 HHCPB19 1992 EST01089 M78941 HHCPE12 2058 EST01144 M78996 HHCPG68
1927 EST01037 M78889 HHCPB21 1993 EST01090 M78942 HHCPE14 2059 EST01145 M78997 HHCPG70
1928 EST01038 M78890 HHCPB22 1994 EST01091 M78943 HHCPE16 2060 EST01146 M78998 HHCPG72
1929 EST01039 M78891 HHCPB27 1995 EST01092 M78944 HHCPE17 2061 EST01147 M78999 HHCPG76
1930 EST01040 M78892 HHCPB28 1996 EST01093 M78945 HHCPE26 2062 EST02701 M86169 HHCPG77
1931 EST01041 M78893 HHCPB34 1997 EST01094 M78946 HHCPE27 2063 EST01148 M79000 HHCPG82
1932 EST01042 M78894 HHCPB34 1998 EST01095 M78947 HHCPE29 2064 EST01149 M79001 HHCPG83
1933 EST01650 M78063 HHCPB41 1999 EST01096 M78948 HHCPE30 2065 EST01691 M78101 HHCPG88
1934 EST01043 M78895 HHCPB49 2000 EST01097 M78949 HHCPE33 2066 EST01692 M78102 HHCPG96
1935 EST01044 M78896 HHCPB57 2001 EST01098 M78950 HHCPE37 2067 EST01693 M78103 HHCPH13
1936 EST01045 M78897 HHCPB58 2002 EST01099 M78951 HHCPE38 2068 EST01694 M78104 HHCPH18
1937 EST01652 M78065 HHCPB60 2003 EST01675 M78085 HHCPE69 2069 EST01150 M79002 HHCPH19
1938 EST01654 M78067 HHCPB63 2004 EST01676 M78086 HHCPE70
1939 EST01655 M78068 HHCPB66 2005 EST01100 M78952 HHCPE71
1940 EST01046 M78898 HHCPB69 2006 EST01101 M78953 HHCPE72
1941 EST01047 M78899 HHCPB82 2007 EST01102 M78954 HHCPE73
1942 EST01048 M78900 HHCPB85 2008 EST01103 M78955 HHCPE74
1943 EST01049 M78901 HHCPB90 2009 EST01677 M78087 HHCPE75
1944 EST01050 M78902 HHCPB96 2010 EST01104 M78956 HHCPE76
1945 EST01051 M78903 HHCPC02 2011 EST01105 M78957 HHCPE78
1946 EST02696 M86164 HHCPC03 2012 EST01106 M78958 HHCPE79
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
2070 EST01151 M79003 HHCPH24 2136 EST01210 M79062 HHCPJ33 2202 EST01732 M78141 HHCPL63
2071 EST01696 M78106 HHCPH32 2137 EST01211 M79063 HHCPJ40 2203 EST01264 M79116 HHCPL65
2072 EST01152 M79004 HHCPH33 2138 EST01715 M78124 HHCPJ42 2204 EST01265 M79117 HHCPL76
2073 EST01697 M78107 HHCPH35 2139 EST01716 M78125 HHCPJ44 2205 EST01735 M78144 HHCPL77
2074 EST01698 M78108 HHCPH39 2140 EST01212 M79064 HHCPJ50 2206 EST01736 M78145 HHCPL79
2075 EST01153 M79005 HHCPH40 2141 EST01213 M79065 HHCPJ54 2207 EST01266 M79118 HHCPL81
2076 EST02702 M86170 HHCPH42 2142 EST01214 M79066 HHCPJ60 2208 EST01267 M79119 HHCPL82
2077 EST01154 M79006 HHCPH43 2143 EST01215 M79067 HHCPJ62 2209 EST02717 HHCPL84
2078 EST01155 M79007 HHCPH44 2144 EST01216 M79068 HHCPJ65 2210 EST01268 M79120 HHCPL87
2079 EST01156 M79008 HHCPH45 2145 EST01217 M79069 HHCPJ69 2211 EST01269 M79121 HHCPL93
2080 EST01157 M79009 HHCPH56 2146 EST01218 M79070 HHCPJ76 2212 EST01270 M79122 HHCPL96
2081 EST01158 M79010 HHCPH59 2147 EST01219 M79071 HHCPJ77 2213 EST01271 M79123 HHCPM02
2082 EST01159 M79011 HHCPH60 2148 EST01220 M79072 HHCPJ80 2214 EST01272 M79124 HHCPM03
2083 EST01160 M79012 HHCPH61 2149 EST01221 M79073 HHCPJ81 2215 EST01273 M79125 HHCPM06
2084 EST01161 M79013 HHCPH63 2150 EST01222 M79074 HHCPJ82 2216 EST01737 M78146 HHCPM10
2085 EST01162 M79014 HHCPH68 2151 EST01223 M79075 HHCPJ83 2217 EST01738 M78147 HHCPM17
2086 EST01163 M79015 HHCPH70 2152 EST01224 M79076 HHCPJ91 2218 EST01274 M79126 HHCPM22
2087 EST01164 M79016 HHCPH72 2153 EST01225 M79077 HHCPJ96 2219 EST01275 M79127 HHCPM24
2088 EST01166 M79018 HHCPH75 2154 EST01226 M79078 HHCPK02 2220 EST01740 M78148 HHCPM28
2089 EST01699 M78109 HHCPH79 2155 EST01227 M79079 HHCPK05 2221 EST01741 M78149 HHCPM30
2090 EST01167 M79019 HHCPH80 2156 EST01718 M78127 HHCPK09 2222 EST01276 M79128 HHCPM31
2091 EST01168 M79020 HHCPH84 2157 EST01719 M78128 HHCPK10 2223 EST01742 M78150 HHCPM37
2092 EST01700 M78110 HHCPH85 2158 EST01228 M79080 HHCPK17 2224 EST01277 M79129 HHCPM42
2093 EST01170 M79022 HHCPI01 2159 EST01229 M79081 HHCPK20 2225 EST01278 M79130 HHCPM44
2094 EST01171 M79023 HHCPI03 2160 EST01230 M79082 HHCPK21 2226 EST01744 M78152 HHCPM47
2095 EST01701 M78111 HHCPI05 2161 EST01231 M79083 HHCPK26 2227 EST01279 M79131 HHCPM49
2096 EST01172 M79024 HHCPI08 2162 EST01232 M79084 HHCPK27 2228 EST01280 M79132 HHCPM50
2097 EST01173 M79025 HHCPI09 2163 EST01233 M79085 HHCPK31 2229 EST01281 M79133 HHCPM58
2098 EST01174 M79026 HHCPI20 2164 EST01234 M79086 HHCPK38 2230 EST01282 M79134 HHCPM59
2099 EST01175 M79027 HHCPI28 2165 EST01720 M78129 HHCPK40 2231 EST01746 M78154 HHCPM60
2100 EST01176 M79028 HHCPI34 2166 EST01236 M79088 HHCPK42 2232 EST01283 M79135 HHCPM61
2101 EST01177 M79029 HHCPI38 2167 EST01237 M79089 HHCPK45 2233 EST01284 M79136 HHCPM65
2102 EST01178 M79030 HHCPI38 2168 EST01238 M79090 HHCPK49 2234 EST01285 M79137 HHCPM66
2103 EST01179 M79031 HHCPI40 2169 EST01722 M78131 HHCPK55 2235 EST01286 M79138 HHCPM70
2104 EST01180 M79032 HHCPI40 2170 EST01239 M79091 HHCPK66 2236 EST01287 M79139 HHCPM72
2105 EST01181 M79033 HHCPI41 2171 EST01240 M79092 HHCPK67 2237 EST01288 M79140 HHCPM75
2106 EST01182 M79034 HHCPI41 2172 EST01241 M79093 HHCPK68 2238 EST01289 M79141 HHCPM77
2107 EST01183 M79035 HHCPI50 2173 EST01724 M78133 HHCPK77 2239 EST01290 M79142 HHCPM81
2108 EST01184 M79036 HHCPI56 2174 EST01242 M79094 HHCPK77 2240 EST01291 M79143 HHCPM82
2109 EST01185 M79037 HHCPI56 2175 EST01243 M79095 HHCPK82 2241 EST01747 M78155 HHCPM83
2110 EST01186 M79038 HHCPI58 2176 EST01244 M79096 HHCPK85 2242 EST01292 M79144 HHCPM84
2111 EST01187 M79039 HHCPI58 2177 EST01245 M79097 HHCPK88 2243 EST01293 M79145 HHCPN06
2112 EST01188 M79040 HHCPI63 2178 EST01726 M78135 HHCPK91 2244 EST01294 M79146 HHCPN09
2113 EST01189 M79041 HHCPI67 2179 EST01246 M79098 HHCPL02 2245 EST01748 M78156 HHCPN14
2114 EST01190 M79042 HHCPI68 2180 EST01247 M79099 HHCPL03 2246 EST01295 M79147 HHCPN15
2115 EST01191 M79043 HHCPI70 2181 EST01248 M79100 HHCPL05 2247 EST01296 M79148 HHCPN18
2116 EST01192 M79044 HHCPI74 2182 EST01249 M79101 HHCPL06 2248 EST01297 M79149 HHCPN24
2117 EST01193 M79045 HHCPI77 2183 EST01250 M79102 HHCPL07 2249 EST01298 M79150 HHCPN25
2118 EST01194 M79046 HHCPI84 2184 EST01251 M79103 HHCPL08 2250 EST01299 M79151 HHCPN26
2119 EST01195 M79047 HHCPI85 2185 EST01252 M79104 HHCPL17 2251 EST01300 M79152 HHCPN30
2120 EST01196 M79048 HHCPI88 2186 EST01253 M79105 HHCPL19 2252 EST01750 M78158 HHCPN31
2121 EST01711 M78120 HHCPI90 2187 EST01727 M78136 HHCPL20 2253 EST01301 M79153 HHCPN32
2122 EST01197 M79049 HHCPI94 2188 EST01254 M79106 HHCPL21 2254 EST01751 M78159 HHCPN36
2123 EST01713 M78122 HHCPI96 2189 EST01255 M79107 HHCPL25 2255 EST01302 M79154 HHCPN38
2124 EST01198 M79050 HHCPJ03 2190 EST01728 M78137 HHCPL27 2256 EST02718 HHCPN46
2125 EST01199 M79051 HHCPJ06 2191 EST01256 M79108 HHCPL32 2257 EST01303 M79155 HHCPN48
2126 EST01200 M79052 HHCPJ09 2192 EST01257 M79109 HHCPL36 2258 EST01754 M78161 HHCPN50
2127 EST01201 M79053 HHCPJ12 2193 EST01258 M79110 HHCPL37
2128 EST01202 M79054 HHCPJ16 2194 EST01729 M78138 HHCPL39
2129 EST01203 M79055 HHCPJ20 2195 EST01259 M79111 HHCPL48
2130 EST01204 M79056 HHCPJ21 2196 EST01260 M79112 HHCPL51
2131 EST01205 M79057 HHCPJ22 2197 EST01261 M79113 HHCPL52
2132 EST01206 M79058 HHCPJ23 2198 EST01730 M78139 HHCPL53
2133 EST01207 M79059 HHCPJ27 2199 EST01262 M79114 HHCPL54
2134 EST01208 M79060 HHCPJ29 2200 EST01731 M78140 HHCPL58
2135 EST01209 M79061 HHCPJ32 2201 EST01263 M79115 HHCPL60
SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone SEQ ID EST# GB# Clone
2259 EST01304 M79156 HHCPN52 2326 EST01791 M78198 HHCPQ06 2392 EST01416 M79261 HRBAA06
2260 EST01305 M79157 HHCPN54 2327 EST01354 M79206 HHCPQ07 2393 EST01417 M79262 HRBAA07
2261 EST01755 M78162 HHCPN60 2328 EST01355 M79207 HHCPQ12 2394 EST01418 M79263 HRBAA20
2262 EST01306 M79158 HHCPN63 2329 EST01792 M78199 HHCPQ13 2395 EST01419 M79264 HRBAA26
2263 EST01307 M79159 HHCPN64 2330 EST01793 M78200 HHCPQ16 2396 EST01420 M79265 HRBAA27
2264 EST01308 M79160 HHCPN65 2331 EST01356 M79208 HHCPQ22 2397 EST01421 M79266 HRBBA04
2265 EST01309 M79161 HHCPN65 2332 EST01794 M78201 HHCPQ23 2398 EST01422 M79267 HRBBA06
2266 EST01310 M79162 HHCPN67 2333 EST01357 M79209 HHCPQ29 2399 EST01423 M79268 HRBBA08
2267 EST01756 M78163 HHCPN70 2334 EST01358 M79210 HHCPQ50 2400 EST01824 M78227 HRBBA11
2268 EST01311 M79163 HHCPN76 2335 EST01359 M79211 HHCPQ60 2401 EST01424 M79269 HRBBA14
2269 EST01312 M79164 HHCPN92 2336 EST01360 M79212 HHCPQ62 2402 EST01826 M78229 HRBBA21
2270 EST01313 M79165 HHCPN96 2337 EST01361 M79213 HHCPQ68 2403 EST01425 M79270 HRBBA22
2271 EST01314 M79166 HHCPO01 2338 EST02706 M86174 HHCPQ70 2404 EST01426 M79271 HRBBA23
2272 EST01762 M78169 HHCPO03 2339 EST01362 M79214 HHCPQ75 2405 EST01427 M79272 HRBBA24
2273 EST01315 M79167 HHCPO05 2340 EST01802 M78209 HHCPQ88 2406 EST02713 M86178 HRHAA22
2274 EST02704 M86172 HHCPO06 2341 EST01364 M79216 HHCPQ89 2407 EST02714 M86179 HRHAA23
2275 EST01316 M79168 HHCPO07 2342 EST01365 M79217 HHCPQ90 2408 EST00244 M62182 HHCG50
2276 EST01317 M79169 HHCPO08 2342 EST01365 M79217 HHCPQ90 2409 EST00273 M62211 HFBA76
2277 EST01318 M79170 HHCPO10 2344 EST01367 M79219 HHCPQ92
2278 EST01319 M79171 HHCPO19 2345 EST01368 M79220 HHCPQ94
2279 EST01320 M79172 HHCPO21 2346 EST01369 M79221 HHCPQ95
2280 EST01763 M78170 HHCPO22 2347 EST01370 M79222 HHCPR02
2281 EST01321 M79173 HHCPO24 2348 EST01371 M79223 HHCPR10
2282 EST01764 M78171 HHCPO25 2349 EST01372 M79224 HHCPR11
2283 EST01322 M79174 HHCPO45 2350 EST02708 M86176 HHCPR13
2284 EST01323 M79175 HHCPO51 2351 EST01373 M79225 HHCPR15
2285 EST01768 M78175 HHCPO57 2352 EST01374 M79226 HHCPR16
2287 EST01770 M78177 HHCPO64 2353 EST01806 M78213 HHCPR19
2288 EST01324 M79176 HHCPO65 2354 EST01375 M79227 HHCPR22
2289 EST01325 M79177 HHCPO72 2355 EST01376 M79228 HHCPR23
2290 EST01772 M78179 HHCPO74 2356 EST01377 M79229 HHCPR28
2291 EST01773 M78180 HHCPO75 2357 EST01378 M79230 HHCPR49
2292 EST01326 M79178 HHCPO76 2358 EST01379 M79231 HHCPR51
2293 EST01327 M79179 HHCPO81 2359 EST01380 M79232 HHCPR52
2294 EST01328 M79180 HHCPO82 2360 EST01381 M79233 HHCPR53
2295 EST01329 M79181 HHCPO83 2361 EST01382 M79234 HHCPR62
2296 EST01330 M79182 HHCPO87 2362 EST01383 M79235 HHCPR66
2297 EST01775 M78182 HHCPO88 2363 EST01384 M79236 HHCPR68
2298 EST01331 M79183 HHCPO92 2364 EST01385 M79237 HHCPR75
2299 EST01332 M79184 HHCPP03 2365 EST01386 M79238 HHCPR78
2300 EST01333 M79185 HHCPP04 2366 EST01387 M79239 HHCPR86
2301 EST01334 M79186 HHCPP05 2367 EST01388 M79240 HHCPR90
2302 EST01779 M78186 HHCPP06 2368 EST01389 M79241 HHCPR95
2303 EST01335 M79187 HHCPP07 2369 EST01811 M78218 HHCPR96
2304 EST01780 M78187 HHCPP12 2370 EST01390 M79242 HHCPS02
2305 EST01336 M79188 HHCPP13 2371 EST01391 M79243 HHCPS06
2306 EST01337 M79189 HHCPP15 2372 EST01392 M79244 HHCPS12
2307 EST02705 M86173 HHCPP21 2373 EST01393 M79245 HHCPS17
2308 EST01339 M79191 HHCPP22 2374 EST01394 M79246 HHCPS18
2309 EST01340 M79192 HHCPP27 2375 EST01815 M78222 HHCPS29
2310 EST01341 M79193 HHCPP37 2376 EST01395 M79247 HHCPS30
2311 EST01342 M79194 HHCPP39 2377 EST01396 M79248 HHCPS35
2312 EST01343 M79195 HHCPP41 2378 EST01397 M79249 HHCPS36
2313 EST01344 M79196 HHCPP45 2379 EST01398 M79250 HHCPS39
2314 EST01345 M79197 HHCPP47 2380 EST01399 M79251 HHCPS40
2315 EST01346 M79198 HHCPP51 2381 EST01400 M79252 HHCPS43
2316 EST01782 M78189 HHCPP55 2382 EST01401 M79253 HHCPS44
2317 EST01347 M79199 HHCPP56 2383 EST01402 M79254 HHCPS51
2318 EST01348 M79200 HHCPP64 2384 EST01403 M79255 HHCPS56
2319 EST01349 M79201 HHCPP66 2385 EST01816 M78223 HHCPS60
2320 EST01784 M78191 HHCPP69 2386 EST01404 M79256 HHCPS62
2321 EST01350 M79202 HHCPP76 2387 EST01405 M79257 HHCPS63
2322 EST01351 M79203 HHCPP77 2388 EST01406 M79258 HHCPS71
2323 EST01789 M78196 HHCPP91 2389 EST01407 M79259 HHCPS78
2324 EST01352 M79204 HHCPP96 2390 EST02712 M86177 HHCPS95
2325 EST01353 M79205 HHCPQ05 2391 EST01415 M79260 HRBAA06
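Each row of the table above places up to three records (SEQ ID, EST#, GB#, Clone) side by side. A few records, such as SEQ ID NO: 2209, have no GB# entry. The Python sketch below is not part of the patent. It shows one way to split a flattened row back into individual records. It assumes whitespace-separated fields and treats a GenBank accession as one capital letter followed by five digits. The names `EstRecord` and `parse_row` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class EstRecord(NamedTuple):
    seq_id: int
    est: str
    genbank: Optional[str]  # None where the GB# column is blank in the table
    clone: str


# Assumption: accessions in this table look like M86044 (one letter + five digits).
ACCESSION = re.compile(r"[A-Z]\d{5}")


def parse_row(line: str) -> list[EstRecord]:
    """Split one flattened table row into its side-by-side records."""
    tokens = line.split()
    records: list[EstRecord] = []
    i = 0
    while i < len(tokens):
        seq_id, est = int(tokens[i]), tokens[i + 1]
        if ACCESSION.fullmatch(tokens[i + 2]):
            genbank, clone = tokens[i + 2], tokens[i + 3]
            i += 4
        else:  # accession missing: the third token is already the clone name
            genbank, clone = None, tokens[i + 2]
            i += 3
        records.append(EstRecord(seq_id, est, genbank, clone))
    return records


if __name__ == "__main__":
    row = "2077 EST01154 M79006 HHCPH43 2143 EST01215 M79067 HHCPJ62 2209 EST02717 HHCPL84"
    for rec in parse_row(row):
        print(rec)
```

Because a record is recognized by its field pattern rather than by its column position, the same function also works on rows that hold fewer than three records, such as the short rows at the end of each page.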
NOTE REGARDING SEQUENCE LISTINGS: The listings of SEQ ID NOS: 1-2421 are in numerical order. However, an occasional number (for example, SEQ ID NO: 44) does not appear in the list; in all, 9 SEQ ID NOS are unused. Nevertheless, a range such as "1-2421" refers to all the SEQ ID NOS in the following list, and a range such as "1-315" refers to all the listed sequences from SEQ ID NO: 1 through SEQ ID NO: 315.
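Under the convention stated in the note, a range covers only the SEQ ID NOS that are actually listed, not every integer in the range. Here is a minimal Python sketch of that reading. The note names only SEQ ID NO: 44 as unused, so the set of unused numbers below is an illustrative assumption and not the full list of nine. The function `ids_in_range` is hypothetical.

```python
def ids_in_range(used_ids: set[int], start: int, end: int) -> list[int]:
    """Resolve a range such as "1-315" to the SEQ ID NOS actually present."""
    return sorted(i for i in used_ids if start <= i <= end)


# Illustration only: SEQ ID NO: 44 is the one unused number the note names;
# the other eight unused numbers are not listed there, so they are not removed here.
used = set(range(1, 2422)) - {44}
print(ids_in_range(used, 40, 48))  # [40, 41, 42, 43, 45, 46, 47, 48]
```

With all nine unused numbers removed, "1-2421" resolves to 2421 - 9 = 2412 sequences. This matches the NUMBER OF SEQUENCES entry in the listing header.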
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT: Venter, J. Craig
Adams, Mark D.
Moreno, Ruben F.
(ii) TITLE OF INVENTION: Sequences Characteristic of Human Gene
Transcription Product
(iii) NUMBER OF SEQUENCES: 2412 (1-2421, with 9 SEQ ID NOS unused)
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Knobbe, Martens, Olson, and Bear
(B) STREET: 620 Newport Center Dr. Sixteenth Floor
(C) CITY: Newport Beach
(D) STATE: CA
(E) COUNTRY: USA
(F) ZIP: 92660
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: 07/837,195
(B) FILING DATE: 12-FEB-1992
(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 07/716,831
(B) FILING DATE: 20-JUN-1991
(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Israelsen, Ned A.
(B) REGISTRATION NUMBER: 29,655
(C) REFERENCE/DOCKET NUMBER: NIH004.004CP1
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 619-235-8550
(B) TELEFAX: 619-235-0176
SEQ ID NO:1: (Length of Sequence = 362 Nucleotides)
CTTCCCTTTT GTTCCCCTCA GTGTCCCTTT TAATTCCTTC CCTCCATTTT CCTTAGCAGC ATCCTAGTTG ATGGTCIGGG
TTATCAGAGG AGCAAAAACA TTTAAGTGTC AAATAATGCT CATTTTTCTCC CTGGGATTTC TAAACAGAAA AAATGAAGAA

Claims

WHAT IS CLAIMED IS:
1. A purified polynucleotide having a sequence designated as one of:
SEQ ID NO: 316 - 2421, except SEQ ID NOS 650, 1834, and 2073;
or having a sequence complementary thereto.
2. A purified polynucleotide having a sequence designated as one of:
SEQ ID NO: 316 - 2421, except SEQ ID NOS: 485, 650, 1834, 2073, 2092, and 2353;
or complementary sequence thereto or, for those sequences over 150 nucleotides long, a portion thereof at least 150 nucleotides in length.
3. An isolated polynucleotide that includes a sequence designated as one of:
SEQ ID NO: 316 - 2421, except SEQ ID NOS: 485, 650, 1834, 2073, 2092, and 2353;
or complementary sequence thereto or, for those sequences over 150 nucleotides long, a portion thereof at least 150 nucleotides in length.
4. An isolated polynucleotide operably coding for a native human polypeptide or protein, which includes a region coding for the same amino acid sequence as a native human coding region corresponding to a sequence designated as one of:
SEQ ID NO: 316 - 2421.
5. The polynucleotide of Claim 4, wherein said SEQ ID NO is listed in Table 6 and is one of SEQ ID NOS: 316-2421.
6. The polynucleotide of Claim 4, wherein said SEQ ID NO is listed in Table 7 and is one of SEQ ID NOS: 316-2421.
7. The polynucleotide of Claim 4, wherein said SEQ ID NO is identified in Table 10 in a metabolic functional grouping and is one of SEQ ID NOS: 316-2421.
8. The polynucleotide of Claim 4, wherein said SEQ ID NO is identified in Table 10 in a structural functional grouping and is one of SEQ ID NOS: 316-2421.
9. The polynucleotide of Claim 4, wherein said SEQ ID NO is identified in Table 11 in a developmental control grouping and is one of SEQ ID NOS: 316-2421.
10. An isolated polynucleotide coding for a human protein or polypeptide, which includes a coding region corresponding to the EST identified as:
SEQ ID NO: 316 - 2421;
or a polynucleotide complementary thereto.
11. The polynucleotide of Claim 10, wherein the SEQ ID NO is 316-1000.
12. The polynucleotide of Claim 10, wherein the SEQ ID NO is 1001-1500.
13. The polynucleotide of Claim 10, wherein the SEQ ID NO is 1501-2000.
14. The polynucleotide of Claim 10, wherein the SEQ ID NO is 2001-2421.
15. The polynucleotide of Claim 10, wherein said polynucleotide further includes the entire sequence designated as any one of SEQ ID NOS: 316-2421.
16. An isolated polynucleotide comprising at least 150 bp of a sequence of Claim 10 and wherein said SEQ ID NO excludes NOS 485, 650, 1834, 2073, 2092, and 2353.
17. An isolated polynucleotide sequence, which hybridizes to a sequence designated as any one of SEQ ID NOS 316-2421, except SEQ ID NOS 485, 650, 1834, 2073, 2092, and 2353, or to a sequence complementary thereto, under hybridization conditions sufficiently stringent to require at least 97% base pairing.
18. A polynucleotide according to any one of Claims 4-17, in substantially purified form.
19. A construct in isolated form comprising a vector and a polynucleotide according to any one of Claims 1-17.
20. The construct according to Claim 19, further comprising a promoter operably linked to said polynucleotide.
21. A panel of at least 100 isolated polynucleotides having the sequences of Claim 3 or Claim 16.
22. An antisense oligonucleotide capable of blocking expression of any one of the polynucleotide-encoding sequences of Claim 10.
23. A triple helix probe capable of blocking expression of any one of the polynucleotide-encoding sequences of Claim 10 having at least a 10-base homopurine or homopyrimidine sequence, said probe comprising single-stranded DNA having at least a 10-base homopurine or homopyrimidine sequence and being adapted to bind to the major groove of double stranded DNA which includes said polynucleotide-encoding sequence.
25. The polynucleotide of Claim 1, wherein said SEQ ID NO is 913.
26. The polynucleotide of Claim 1, wherein said SEQ ID NO is 1039.
27. The polynucleotide of Claim 1, wherein said SEQ ID NO is 1395.
28. The polynucleotide of Claim 1, wherein said SEQ ID NO is 1567.
29. The polynucleotide of Claim 1, wherein said SEQ ID NO is 1667.
30. The polynucleotide of Claim 1, wherein said SEQ ID NO is 1704.
31. The polynucleotide of Claim 1, wherein said SEQ ID NO is 2089.
32. The polynucleotide of Claim 1, wherein said SEQ ID NO is 2297.
33. The polynucleotide of Claim 1, wherein said SEQ ID NO is 2302.
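Claim 23 depends on whether a sequence contains a homopurine stretch (A and G only) or a homopyrimidine stretch (C and T only) of at least 10 bases. The Python sketch below is an illustration only and is not the applicants' method. The hypothetical function `homo_runs` scans a DNA string for maximal runs of either kind. It assumes that any character other than A, C, G, or T (for example N) breaks a run.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def homo_runs(seq: str, min_len: int = 10) -> list[tuple[int, int, str]]:
    """Return (start, end, kind) for each maximal homopurine or homopyrimidine
    run of at least min_len bases; end is exclusive, positions are 0-based."""
    seq = seq.upper()
    runs = []
    i, n = 0, len(seq)
    while i < n:
        if seq[i] in PURINES:
            kind, alphabet = "homopurine", PURINES
        elif seq[i] in PYRIMIDINES:
            kind, alphabet = "homopyrimidine", PYRIMIDINES
        else:  # ambiguous or non-nucleotide character: skip it
            i += 1
            continue
        j = i
        while j < n and seq[j] in alphabet:
            j += 1
        if j - i >= min_len:
            runs.append((i, j, kind))
        i = j  # the next run starts where this one ended
    return runs


if __name__ == "__main__":
    # First 20 bases printed for SEQ ID NO:1 in the listing above.
    print(homo_runs("CTTCCCTTTTGTTCCCCTCA"))  # [(0, 10, 'homopyrimidine')]
```

The scan runs in a single pass. Each maximal run is measured once and then skipped, so a run of 12 pyrimidines is reported as one hit rather than as three overlapping 10-base windows.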
PCT/US1993/001294 1992-02-12 1993-02-12 Sequences characteristic of human gene transcription product WO1993016178A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US83719592A 1992-02-12 1992-02-12
US07/837,195 1992-02-12

Publications (2)

Publication Number Publication Date
WO1993016178A2 true WO1993016178A2 (en) 1993-08-19
WO1993016178A3 WO1993016178A3 (en) 1993-11-25

Family

ID=25273788

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/001294 WO1993016178A2 (en) 1992-02-12 1993-02-12 Sequences characteristic of human gene transcription product

Country Status (2)

Country Link
AU (1) AU3665893A (en)
WO (1) WO1993016178A2 (en)

Cited By (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995018226A1 (en) * 1993-12-24 1995-07-06 University Of Wales College Of Medicine Tuberous sclerosis 2 gene and uses thereof
EP0679716A1 (en) * 1993-11-12 1995-11-02 Kenichi Matsubara Gene signature
WO1996013514A1 (en) * 1994-10-27 1996-05-09 Thomas Jefferson University Tcl-1 gene and protein and related methods and compositions
WO1997002280A1 (en) * 1995-06-30 1997-01-23 Human Genome Sciences, Inc. Breast specific genes and proteins
US5605797A (en) * 1994-09-15 1997-02-25 Board Of Trustees Operating Michigan State University Bovine β-mannosidase gene and methods of use
US5695937A (en) * 1995-09-12 1997-12-09 The Johns Hopkins University School Of Medicine Method for serial analysis of gene expression
WO1998037197A1 (en) * 1997-02-24 1998-08-27 Incyte Pharmaceuticals, Inc. Novel microtubule-associated protein
WO1998038209A2 (en) * 1997-02-26 1998-09-03 Genetics Institute, Inc. Secreted proteins and polynucleotides encoding them
US5840559A (en) * 1996-10-30 1998-11-24 Incyte Pharmaceuticals, Inc. Human spermidine/spermine N1-acetyltransferase
WO1999000518A1 (en) * 1997-06-26 1999-01-07 Abbott Laboratories Member of the tnf family useful for treatment and diagnosis of disease
US5866330A (en) * 1995-09-12 1999-02-02 The Johns Hopkins University School Of Medicine Method for serial analysis of gene expression
US5874285A (en) * 1996-09-13 1999-02-23 Incyte Pharmaceuticals, Inc. Polynucleotide encoding a novel human nm23-like protein
WO1999009158A1 (en) * 1997-08-13 1999-02-25 Chugai Research Institute For Molecular Medicine, Inc. PROTEIN HAVING Zn FINGER-LIKE MOTIF
US5889170A (en) * 1997-01-31 1999-03-30 Incyte Pharmaceuticals, Inc. Human integral membrane protein
WO1999021997A1 (en) * 1997-10-28 1999-05-06 Immunex Corporation Viral encoded semaphorin protein receptor dna and polypeptides
EP0915904A1 (en) * 1996-05-06 1999-05-19 Chiron Corporation Mammalian sex comb on midleg (mammalian scm) acts as a tumor suppressor
WO1999024610A1 (en) * 1997-11-06 1999-05-20 Millennium Pharmaceuticals, Inc. Novel genes encoding transporter-like molecules
EP0922053A1 (en) * 1996-05-08 1999-06-16 Chiron Corporation MAMMALIAN ADDITIONAL SEX COMBS (MAMMALIAN Asx) ACTS AS A TUMOR SUPPRESSOR
US5916753A (en) * 1997-11-13 1999-06-29 Incyte Pharmaceuticals, Inc. SH3-containing proteins
US5917028A (en) * 1996-10-29 1999-06-29 Incyte Pharmaceuticals, Inc. Human phosphoprotein
US5948619A (en) * 1997-07-31 1999-09-07 Incyte Pharmaceuticals, Inc. Human zygin-1
EP0710721A3 (en) * 1994-11-02 1999-09-15 Takeda Chemical Industries, Ltd. Method for probing the function of a protein
EP0973794A1 (en) * 1997-02-19 2000-01-26 The Regents of the University of California Netrin receptors
WO2000022126A1 (en) * 1998-10-15 2000-04-20 Zymogenetics, Inc. Follistatin-related protein zfsta2
US6066451A (en) * 1994-10-03 2000-05-23 Beth Israel Deaconess Medical Center, Inc. Neural cell protein marker RR/B and DNA encoding the same
WO2000037639A2 (en) * 1998-12-18 2000-06-29 Incyte Pharmaceuticals, Inc. Lymphocytic membrane proteins
WO2000040719A2 (en) * 1999-01-06 2000-07-13 University Of Leeds Tissue repair protein involved in orofacial clefting and uses thereof
US6096873A (en) * 1996-07-12 2000-08-01 Genentech, Inc. Gamma-heregulin
WO2000050451A2 (en) * 1999-02-26 2000-08-31 Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts Protein (tp) that is involved in the development of the nervous system
WO2000053734A2 (en) * 1999-03-09 2000-09-14 Schering Aktiengesellschaft Human angiogenesis relevant nucleic acid and protein sequences obtained from endothelial cells
US6130068A (en) * 1998-10-26 2000-10-10 Immunex Corporation Viral encoded semaphorin protein receptor DNA and polypeptides
WO2001030845A1 (en) * 1999-10-22 2001-05-03 American Home Products Corporation Pablo, a polypeptide that interacts with bcl-xl, and uses related thereto
EP1103604A1 (en) * 1998-07-29 2001-05-30 Kyowa Hakko Kogyo Co., Ltd. Novel polypeptide
WO2001055300A2 (en) * 2000-01-31 2001-08-02 Human Genome Sciences, Inc. Nucleic acids, proteins, and antibodies
WO2000039160A3 (en) * 1998-12-24 2001-08-23 Yeda Res & Dev
WO2001081361A1 (en) * 2000-04-11 2001-11-01 Cogent Neuroscience, Inc. Compositions and method for diagnosing and treating conditions, disorders, or diseases involving cell death
US6355788B1 (en) 1998-10-15 2002-03-12 Zymogenetics, Inc. Follistatin-related protein zfsta2
EP1209229A1 (en) * 2000-11-21 2002-05-29 Klaus-Peter Department of Psychiatry and Psychotherapy University of Würzburg Lesch Gene involved in schizophrenia
US6468766B1 (en) * 1996-03-15 2002-10-22 President And Fellows Of Harvard College Aortic carboxypeptidase-like polypeptide
US6472517B1 (en) * 1998-10-09 2002-10-29 Genset S.A. Nucleic acids encoding human CIDE-B protein and polymorphic markers thereof
US6482922B2 (en) 1995-11-02 2002-11-19 Human Genome Sciences, Inc. Mammary transforming protein
US6506882B2 (en) 1996-03-14 2003-01-14 Human Genome Sciences, Inc. Antibodies that bind tumor necrosis factor delta
US6541224B2 (en) 1996-03-14 2003-04-01 Human Genome Sciences, Inc. Tumor necrosis factor delta polypeptides
US6558910B2 (en) * 1999-09-10 2003-05-06 The Regents Of The University Of California SF, a novel family of taste receptors
US6562949B1 (en) 1997-10-28 2003-05-13 Immunex Corporation Antibodies to viral encoded semaphorin protein receptor polypeptides
EP1354948A1 (en) * 2000-12-22 2003-10-22 Kazusa DNA Research Institute Foundation Novel cancer-associated genes
US7115727B2 (en) 2002-08-16 2006-10-03 Agensys, Inc. Nucleic acids and corresponding proteins entitled 282P1G3 useful in treatment and detection of cancer
US7175995B1 (en) 1994-10-27 2007-02-13 Thomas Jefferson University TCL-1 protein and related methods
US7189820B2 (en) 2001-05-24 2007-03-13 Human Genome Sciences, Inc. Antibodies against tumor necrosis factor delta (APRIL)
US7214497B2 (en) 1997-10-28 2007-05-08 Immunex Corporation Viral encoded semaphorin protein receptor DNA and polypeptides
US7217788B2 (en) 1996-03-14 2007-05-15 Human Genome Sciences, Inc. Human tumor necrosis factor delta polypeptides
US7465550B2 (en) 1999-09-10 2008-12-16 The Regents Of The University Of California Method for screening taste-modulating compounds
US7601514B2 (en) * 2000-01-20 2009-10-13 Genentech, Inc. Nucleic acid encoding PRO10268 polypeptides
US7628989B2 (en) 2001-04-10 2009-12-08 Agensys, Inc. Methods of inducing an immune response
US7927597B2 (en) 2001-04-10 2011-04-19 Agensys, Inc. Methods to inhibit cell growth
US9173960B2 (en) 2011-11-04 2015-11-03 Novartis Ag Methods of treating cancer with low density lipoprotein-related protein 6 (LRP6)—half life extender constructs
US9290573B2 (en) 2010-05-06 2016-03-22 Novartis Ag Therapeutic low density lipoprotein-related protein 6 (LRP6) multivalent antibodies
US9428583B2 (en) 2010-05-06 2016-08-30 Novartis Ag Compositions and methods of use for therapeutic low density lipoprotein-related protein 6 (LRP6) multivalent antibodies

Non-Patent Citations (2)

Title
NATURE vol. 355, 13 February 1992, LONDON, UNITED KINGDOM pages 632 - 634 M.D. ADAMS ET AL. 'Sequence Identification of 2,375 Human Brain Genes' *
SCIENCE vol. 252, 21 June 1991, WASHINGTON, DC, USA pages 1651 - 1656 M.D. ADAMS ET AL. 'Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project' *

Cited By (116)

Publication number Priority date Publication date Assignee Title
EP0679716A1 (en) * 1993-11-12 1995-11-02 Kenichi Matsubara Gene signature
EP0679716A4 (en) * 1993-11-12 1999-06-09 Kenichi Matsubara Gene signature.
US6232452B1 (en) 1993-12-24 2001-05-15 Julian R. Sampson Tuberous sclerosis 2 gene and uses thereof
US6207374B1 (en) 1993-12-24 2001-03-27 Medical Research Council Tuberous sclerosis 2 gene and uses thereof
US6485960B1 (en) 1993-12-24 2002-11-26 Medical Research Council Polycystic kidney disease 1 gene and uses thereof
WO1995018226A1 (en) * 1993-12-24 1995-07-06 University Of Wales College Of Medicine Tuberous sclerosis 2 gene and uses thereof
US5605797A (en) * 1994-09-15 1997-02-25 Board Of Trustees Operating Michigan State University Bovine β-mannosidase gene and methods of use
US5837836A (en) * 1994-09-15 1998-11-17 Board Of Trustees Operating Michigan State University Bovine β-mannosidase nucleic acid sequence
US6066451A (en) * 1994-10-03 2000-05-23 Beth Israel Deaconess Medical Center, Inc. Neural cell protein marker RR/B and DNA encoding the same
WO1996013514A1 (en) * 1994-10-27 1996-05-09 Thomas Jefferson University Tcl-1 gene and protein and related methods and compositions
US7175995B1 (en) 1994-10-27 2007-02-13 Thomas Jefferson University TCL-1 protein and related methods
US7749715B2 (en) 1994-10-27 2010-07-06 Thomas Jefferson University TCL-1 gene and protein and related methods and compositions
EP0710721A3 (en) * 1994-11-02 1999-09-15 Takeda Chemical Industries, Ltd. Method for probing the function of a protein
WO1997002280A1 (en) * 1995-06-30 1997-01-23 Human Genome Sciences, Inc. Breast specific genes and proteins
US5866330A (en) * 1995-09-12 1999-02-02 The Johns Hopkins University School Of Medicine Method for serial analysis of gene expression
US5695937A (en) * 1995-09-12 1997-12-09 The Johns Hopkins University School Of Medicine Method for serial analysis of gene expression
US6746845B2 (en) 1995-09-12 2004-06-08 The Johns Hopkins University Method for serial analysis of gene expression
US6383743B1 (en) 1995-09-12 2002-05-07 The John Hopkins University School Of Medicine Method for serial analysis of gene expression
US6482922B2 (en) 1995-11-02 2002-11-19 Human Genome Sciences, Inc. Mammary transforming protein
US6506882B2 (en) 1996-03-14 2003-01-14 Human Genome Sciences, Inc. Antibodies that bind tumor necrosis factor delta
US6509170B1 (en) 1996-03-14 2003-01-21 Human Genome Sciences, Inc. Polynucleotides encoding human tumor necrosis factor delta
US6541224B2 (en) 1996-03-14 2003-04-01 Human Genome Sciences, Inc. Tumor necrosis factor delta polypeptides
US7217788B2 (en) 1996-03-14 2007-05-15 Human Genome Sciences, Inc. Human tumor necrosis factor delta polypeptides
US7094878B2 (en) 1996-03-15 2006-08-22 President And Fellows Of Harvard College Aortic carboxypeptidase-like polypeptide
US6468766B1 (en) * 1996-03-15 2002-10-22 President And Fellows Of Harvard College Aortic carboxypeptidase-like polypeptide
EP0960120A4 (en) * 1996-05-06 2001-05-02 Chiron Corp MAMMALIAN SEX COMB ON MIDLEG (MAMMALIAN Scm) ACTS AS AN ONCOGENE
EP0960120A1 (en) * 1996-05-06 1999-12-01 Chiron Corporation MAMMALIAN SEX COMB ON MIDLEG (MAMMALIAN Scm) ACTS AS AN ONCOGENE
EP0915904A1 (en) * 1996-05-06 1999-05-19 Chiron Corporation Mammalian sex comb on midleg (mammalian scm) acts as a tumor suppressor
EP0915904A4 (en) * 1996-05-06 2001-05-02 Chiron Corp Mammalian sex comb on midleg (mammalian scm) acts as a tumor suppressor
EP0960115A1 (en) * 1996-05-08 1999-12-01 Chiron Corporation MAMMALIAN ADDITIONAL SEX COMBS (MAMMALIAN Asx) ACTS AS AN ONCOGENE
EP0922053A4 (en) * 1996-05-08 2001-04-25 Chiron Corp MAMMALIAN ADDITIONAL SEX COMBS (MAMMALIAN Asx) ACTS AS A TUMOR SUPPRESSOR
EP0960115A4 (en) * 1996-05-08 2001-04-25 Chiron Corp MAMMALIAN ADDITIONAL SEX COMBS (MAMMALIAN Asx) ACTS AS AN ONCOGENE
EP0922053A1 (en) * 1996-05-08 1999-06-16 Chiron Corporation MAMMALIAN ADDITIONAL SEX COMBS (MAMMALIAN Asx) ACTS AS A TUMOR SUPPRESSOR
US6096873A (en) * 1996-07-12 2000-08-01 Genentech, Inc. Gamma-heregulin
US7585673B2 (en) 1996-07-12 2009-09-08 Genentech, Inc. γ-heregulin
US6916624B2 (en) 1996-07-12 2005-07-12 Genentech, Inc. Antibodies that bind gamma-heregulin
US5874285A (en) * 1996-09-13 1999-02-23 Incyte Pharmaceuticals, Inc. Polynucleotide encoding a novel human nm23-like protein
US6087125A (en) * 1996-09-13 2000-07-11 Incyte Pharmaceuticals, Inc. Polynucleotide encoding a novel human nm23-like protein
US5917028A (en) * 1996-10-29 1999-06-29 Incyte Pharmaceuticals, Inc. Human phosphoprotein
US5840559A (en) * 1996-10-30 1998-11-24 Incyte Pharmaceuticals, Inc. Human spermidine/spermine N1-acetyltransferase
US5889170A (en) * 1997-01-31 1999-03-30 Incyte Pharmaceuticals, Inc. Human integral membrane protein
US7919588B2 (en) 1997-02-19 2011-04-05 The Regents Of The University Of California Netrin receptors
EP0973794A1 (en) * 1997-02-19 2000-01-26 The Regents of the University of California Netrin receptors
EP0973794A4 (en) * 1997-02-19 2000-12-06 Univ California Netrin receptors
US7041806B2 (en) 1997-02-19 2006-05-09 The Regents Of The University Of California Netrin receptors
WO1998037197A1 (en) * 1997-02-24 1998-08-27 Incyte Pharmaceuticals, Inc. Novel microtubule-associated protein
WO1998038209A2 (en) * 1997-02-26 1998-09-03 Genetics Institute, Inc. Secreted proteins and polynucleotides encoding them
WO1998038209A3 (en) * 1997-02-26 1998-12-17 Genetics Inst Secreted proteins and polynucleotides encoding them
US6171787B1 (en) 1997-06-26 2001-01-09 Abbott Laboratories Member of the TNF family useful for treatment and diagnosis of disease
WO1999000518A1 (en) * 1997-06-26 1999-01-07 Abbott Laboratories Member of the tnf family useful for treatment and diagnosis of disease
US5948619A (en) * 1997-07-31 1999-09-07 Incyte Pharmaceuticals, Inc. Human zygin-1
WO1999009158A1 (en) * 1997-08-13 1999-02-25 Chugai Research Institute For Molecular Medicine, Inc. PROTEIN HAVING Zn FINGER-LIKE MOTIF
US6187909B1 (en) 1997-10-28 2001-02-13 Immunex Corporation Viral encoded semaphorin protein receptor polypeptides
WO1999021997A1 (en) * 1997-10-28 1999-05-06 Immunex Corporation Viral encoded semaphorin protein receptor dna and polypeptides
US6562949B1 (en) 1997-10-28 2003-05-13 Immunex Corporation Antibodies to viral encoded semaphorin protein receptor polypeptides
US7214497B2 (en) 1997-10-28 2007-05-08 Immunex Corporation Viral encoded semaphorin protein receptor DNA and polypeptides
WO1999024610A1 (en) * 1997-11-06 1999-05-20 Millennium Pharmaceuticals, Inc. Novel genes encoding transporter-like molecules
US6277565B1 (en) * 1997-11-06 2001-08-21 Millennium Pharmaceuticals, Inc. OCT-3 gene encoding transporter-like molecules
US5916753A (en) * 1997-11-13 1999-06-29 Incyte Pharmaceuticals, Inc. SH3-containing proteins
EP1103604A1 (en) * 1998-07-29 2001-05-30 Kyowa Hakko Kogyo Co., Ltd. Novel polypeptide
US7262039B1 (en) 1998-07-29 2007-08-28 Kyowa Hakko Kogyo Co., Ltd. Polypeptide
EP1103604A4 (en) * 1998-07-29 2002-10-31 Kyowa Hakko Kogyo Kk Novel polypeptide
US7081515B2 (en) 1998-10-09 2006-07-25 Serono Genetics Institute S.A. CIDE-B polypeptides
US6472517B1 (en) * 1998-10-09 2002-10-29 Genset S.A. Nucleic acids encoding human CIDE-B protein and polymorphic markers thereof
US6355788B1 (en) 1998-10-15 2002-03-12 Zymogenetics, Inc. Follistatin-related protein zfsta2
WO2000022126A1 (en) * 1998-10-15 2000-04-20 Zymogenetics, Inc. Follistatin-related protein zfsta2
US6130068A (en) * 1998-10-26 2000-10-10 Immunex Corporation Viral encoded semaphorin protein receptor DNA and polypeptides
US6174689B1 (en) 1998-10-26 2001-01-16 Immunex Corporation Viral encoded semaphorin protein receptor DNA and polypeptides
WO2000037639A2 (en) * 1998-12-18 2000-06-29 Incyte Pharmaceuticals, Inc. Lymphocytic membrane proteins
WO2000037639A3 (en) * 1998-12-18 2000-11-16 Incyte Pharma Inc Lymphocytic membrane proteins
US7339047B2 (en) 1998-12-24 2008-03-04 Yeda Research And Development Company Ltd. Caspase-8 interacting proteins
WO2000039160A3 (en) * 1998-12-24 2001-08-23 Yeda Res & Dev
WO2000040719A3 (en) * 1999-01-06 2000-10-26 Univ Leeds Tissue repair protein involved in orofacial clefting and uses thereof
WO2000040719A2 (en) * 1999-01-06 2000-07-13 University Of Leeds Tissue repair protein involved in orofacial clefting and uses thereof
WO2000050451A3 (en) * 1999-02-26 2001-08-02 Deutsches Krebsforsch Protein (tp) that is involved in the development of the nervous system
WO2000050451A2 (en) * 1999-02-26 2000-08-31 Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts Protein (tp) that is involved in the development of the nervous system
WO2000053734A2 (en) * 1999-03-09 2000-09-14 Schering Aktiengesellschaft Human angiogenesis relevant nucleic acid and protein sequences obtained from endothelial cells
WO2000053734A3 (en) * 1999-03-09 2001-04-26 Schering Ag Human angiogenesis relevant nucleic acid and protein sequences obtained from endothelial cells
US7868150B2 (en) 1999-09-10 2011-01-11 The Regents Of The University Of California Nucleic acids encoding T2R taste receptors
US7452694B2 (en) 1999-09-10 2008-11-18 The Regents Of The University Of California Nucleic acids encoding T2R of taste receptors
US9817000B2 (en) 1999-09-10 2017-11-14 The Regents Of The University Of California Method for identifying compounds that modulate a T2R taste receptor
US9063124B2 (en) 1999-09-10 2015-06-23 The Regents Of The University Of California Method for identifying compounds that modulate a T2R taste receptor
US7745601B2 (en) 1999-09-10 2010-06-29 The Regents Of The University Of California Nucleic acids encoding T2R, a novel family of taste receptors
US8624012B2 (en) 1999-09-10 2014-01-07 The Regents Of The University Of California Nucleic acids encoding T2R bitter taste receptors
US8580527B2 (en) 1999-09-10 2013-11-12 The Regents Of The University Of California Methods for identifying compounds which modulate T2R bitter taste receptors
US7888045B2 (en) 1999-09-10 2011-02-15 The Regents Of The University Of California Methods for identifying modulators of SF taste receptor signaling
US6558910B2 (en) * 1999-09-10 2003-05-06 The Regents Of The University Of California SF, a novel family of taste receptors
US7595166B2 (en) 1999-09-10 2009-09-29 The Regents Of The University Of California Methods of screening modulators of T2R taste receptors
US8329885B2 (en) 1999-09-10 2012-12-11 The Regents Of The University Of California Nucleic acid encoding a T2R taste receptor
US7479373B2 (en) 1999-09-10 2009-01-20 The Regents Of The University Of California Method for identifying compounds modulating taste transduction
US7465550B2 (en) 1999-09-10 2008-12-16 The Regents Of The University Of California Method for screening taste-modulating compounds
WO2001030845A1 (en) * 1999-10-22 2001-05-03 American Home Products Corporation Pablo, a polypeptide that interacts with bcl-xl, and uses related thereto
US6664068B2 (en) 1999-10-22 2003-12-16 Wyeth Pablo, a polypeptide that interacts with Bcl-xL, and uses related thereto
US7601514B2 (en) * 2000-01-20 2009-10-13 Genentech, Inc. Nucleic acid encoding PRO10268 polypeptides
WO2001055300A3 (en) * 2000-01-31 2002-01-03 Human Genome Sciences Inc Nucleic acids, proteins, and antibodies
WO2001055300A2 (en) * 2000-01-31 2001-08-02 Human Genome Sciences, Inc. Nucleic acids, proteins, and antibodies
WO2001081361A1 (en) * 2000-04-11 2001-11-01 Cogent Neuroscience, Inc. Compositions and method for diagnosing and treating conditions, disorders, or diseases involving cell death
WO2002042454A3 (en) * 2000-11-21 2003-03-13 Klaus-Peter Lesch Gene involved in schizophrenia
EP1209229A1 (en) * 2000-11-21 2002-05-29 Klaus-Peter Department of Psychiatry and Psychotherapy University of Würzburg Lesch Gene involved in schizophrenia
WO2002042454A2 (en) * 2000-11-21 2002-05-30 Lesch Klaus Peter Gene involved in schizophrenia
US8008437B2 (en) 2000-12-22 2011-08-30 Kazusa Dna Research Institute Foundation Cancer-associated genes
US7375199B2 (en) 2000-12-22 2008-05-20 Kazusa Dna Research Institute Foundation Cancer-associated genes
EP1354948A4 (en) * 2000-12-22 2004-06-30 Kazusa Dna Res Inst Foundation Novel cancer-associated genes
EP1354948A1 (en) * 2000-12-22 2003-10-22 Kazusa DNA Research Institute Foundation Novel cancer-associated genes
US7927597B2 (en) 2001-04-10 2011-04-19 Agensys, Inc. Methods to inhibit cell growth
US7951375B2 (en) 2001-04-10 2011-05-31 Agensys, Inc. Methods of inducing an immune response
US7736654B2 (en) 2001-04-10 2010-06-15 Agensys, Inc. Nucleic acids and corresponding proteins useful in the detection and treatment of various cancers
US7628989B2 (en) 2001-04-10 2009-12-08 Agensys, Inc. Methods of inducing an immune response
US7641905B2 (en) 2001-04-10 2010-01-05 Agensys, Inc. Methods of inducing an immune response
US7189820B2 (en) 2001-05-24 2007-03-13 Human Genome Sciences, Inc. Antibodies against tumor necrosis factor delta (APRIL)
US7612172B2 (en) 2002-08-16 2009-11-03 Agensys, Inc. Nucleic acids and corresponding proteins entitled 282P1G3 useful in treatment and detection of cancer
US7115727B2 (en) 2002-08-16 2006-10-03 Agensys, Inc. Nucleic acids and corresponding proteins entitled 282P1G3 useful in treatment and detection of cancer
US9290573B2 (en) 2010-05-06 2016-03-22 Novartis Ag Therapeutic low density lipoprotein-related protein 6 (LRP6) multivalent antibodies
US9428583B2 (en) 2010-05-06 2016-08-30 Novartis Ag Compositions and methods of use for therapeutic low density lipoprotein-related protein 6 (LRP6) multivalent antibodies
US9173960B2 (en) 2011-11-04 2015-11-03 Novartis Ag Methods of treating cancer with low density lipoprotein-related protein 6 (LRP6)—half life extender constructs
USRE47860E1 (en) 2011-11-04 2020-02-18 Novartis Ag Methods of treating cancer with low density lipoprotein-related protein 6 (LRP6)—half life extender constructs

Also Published As

Publication number Publication date
AU3665893A (en) 1993-09-03
WO1993016178A3 (en) 1993-11-25

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AU CA JP

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

AK Designated states

Kind code of ref document: A3

Designated state(s): AU CA JP

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase in:

Ref country code: CA

122 Ep: pct application non-entry in european phase