WO1999031964A1

WO1999031964A1 - Nucleotide polymorphisms in soybean

Info

Publication number: WO1999031964A1
Application number: PCT/US1998/026935
Authority: WO
Inventors: Holly Jessen; David Webb; Paul Keim; Jim Schupp; Virginia Coryell
Original assignee: Pioneer Hi-Bred International, Inc.
Priority date: 1997-12-19
Filing date: 1998-12-18
Publication date: 1999-07-01
Also published as: BR9813805A; CA2314992A1; AU1927699A; AR017917A1

Abstract

Sequence polymorphisms at 63 loci throughout the soybean genome are described. These polymorphisms are used to fingerprint and map genes and QTL in selective breeding experiments and to identify flanking nucleic acids that map near genes and QTL.

Description

NUCLEOTIDE POLYMORPHISMS IN SOYBEAN

FIELD OF THE INVENTION

The invention is in the field of agricultural technology, particularly marker-assisted selection of soybean.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of USSN 60/068,185, "NUCLEOTIDE POLYMORPHISMS IN SOYBEAN," filed December 19, 1997 by Jessen et al., which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

Genetic mutations are antecedent to evolution and genetic diversity. Along with mutations that change the expression of genes, and consequently the function and structure of organisms, comparative sequencing within a species reveals a plethora of neutral mutations in non-coding regions of genomic DNA. Whether or not mutations affect organisms, they constitute diversity in the nucleotide sequence and, if known, can be exploited as genetic markers on chromosomes.

To be genetically stable, a mutation in a base in one DNA strand typically requires a complementary base change in the opposite strand. Thus, the polymorphism, or change, between a wildtype and a mutant includes both strands of the DNA molecule. If a change allows the organism to be viable and fecund, any base can substitute for any other base at one or more nucleotide positions in a DNA sequence, or any length of DNA bases can be inserted or deleted. Mutations that are selectively neutral or provide an advantage in a particular environment may then proliferate within a population.

Genetic markers represent (mark the location of) specific loci in the genome of a species or closely related species, and sampling of different genotypes at these marker loci reveals genetic variation. The genetic variation at marker loci can then be described and applied to genetic studies, commercial breeding, diagnostics, cladistic analysis of variance, or genotyping of samples. Genetic markers have the greatest utility when they are highly heritable, multi-allelic, and numerous. Most genetic markers are highly heritable because their alleles are determined by the nucleotide sequence of DNA, which is highly conserved from one generation to the next, and the detection of their alleles is unaffected by the natural environment. Markers have multiple alleles because, in the evolutionary process, rare, genetically-stable mutations in DNA sequences defining marker loci arose and were disseminated through the generations along with other existing alleles. The highly conserved nature of DNA combined with the rare occurrence of stable mutations allows genetic markers to be both predictable and discerning of different genotypes. The repertoire of genetic-marker technologies today allows multiple technologies to be used simultaneously in the same project. The invention of each new genetic-marker technology and each new DNA polymorphism adds utility to genetic markers.

Many genetic-marker technologies exist, including restriction-fragment-length polymorphism (RFLP) Botstein et al. (1980) Am J Hum Genet 32:314-331; single strand conformation polymorphism (SSCP) Fischer et al. (1983) Proc Natl Acad Sci USA 80:1579-1583, Orita et al. (1989) Genomics 5:874-879; amplified fragment-length polymorphism (AFLP) Vos et al. (1995) Nucleic Acids Res 23:4407-4414; microsatellite or simple-sequence repeat (SSR) Weber JL and May PE (1989) Am J Hum Genet 44:388-396; random-amplified polymorphic DNA (RAPD) Williams et al. (1990) Nucleic Acids Res 18:6531-6535; sequence tagged site (STS) Olson et al. (1989) Science 245:1434-1435; genetic-bit analysis (GBA) Nikiforov et al. (1994) Nucleic Acids Res 22:4167-4175; allele-specific polymerase chain reaction (ASPCR) Gibbs et al. (1989) Nucleic Acids Res 17:2437-2448, Newton et al. (1989) Nucleic Acids Res 17:2503-2516; nick-translation PCR (e.g., TaqMan™) Lee et al. (1993) Nucleic Acids Res 21:3161-3166; and allele-specific hybridization (ASH) Wallace et al. (1979) Nucleic Acids Res 6:3543-3557, Sheldon et al. (1993) Clinical Chemistry 39(4):718-719, among others, with each technology having its own particular basis for detecting polymorphisms in DNA sequence.

The development of polymorphic genetic markers has made it possible for quantitative and molecular geneticists to investigate what Edwards, et al., in Genetics 115:113 (1987) referred to as "quantitative trait loci" (QTL), as well as their numbers, magnitudes and distributions. QTL include genes that control, to some degree, numerically representable phenotypic traits (disease resistance, crop yield, resistance to environmental extremes, etc.) that are distributed within a family of individuals as well as within a population of families of individuals. An experimental paradigm has been developed to identify and analyze QTL. This paradigm involves crossing two inbred lines, genotyping multiple marker loci, and evaluating one to several quantitative phenotypic traits among the progeny of the cross. QTL are then identified and ultimately selected for based on significant statistical associations between the genotypic values determined by genetic marker technology and the phenotypic variability among the segregating progeny. Unfortunately, complete sets of genetic markers are not available for a variety of important crops, making it difficult to quickly assess the genotype of any particular individual.
For example, although soybeans are a major cash crop which provide most of the world's protein and vegetable oils, complete sets of genetic markers which span the soybean genome are not available. Accordingly, there exists a need to develop genetic markers for genotyping, marker assisted selection, positional cloning of nucleic acids and the like, e.g., in soybean. This invention provides these and many other features.

SUMMARY OF THE INVENTION

New sequence polymorphisms at 63 different loci are described. Identification of these alleles provides compositions and methods for rapidly determining the complete genotype of a soybean plant. This ability to determine, accurately and quickly, the genotype of a soybean plant provides for improved methods of marker assisted selection in plant breeding and in analysis of transgenic soybean cells and plants. Example technologies which may be used to detect the loci include allele-specific hybridization (ASH), the polymerase chain reaction (PCR), random-amplified polymorphic DNA (RAPD), restriction-fragment-length polymorphism (RFLP), single strand conformation polymorphism (SSCP), allele-specific polymerase chain reaction (ASPCR), genetic-bit analysis (GBA), nick-translation PCR (TaqMan®), hybridization to solid phase arrays (e.g., very large scale immobilized polymer arrays (VLSIPS arrays)), and the like.

In one embodiment, methods of detecting one or more genetic nucleotide polymorphisms in a biological sample from a soybean plant are provided by hybridizing a probe nucleic acid to one of the loci described herein. For example, a biological sample derived from a soybean plant is provided, and a probe nucleic acid is hybridized to a target nucleic acid including a nucleotide polymorphism from the locus. Preferred loci include pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, SOYBPSP, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, and php10078A. Particularly preferred "php" loci include php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A, and php10078A. In certain embodiments where more than one locus is detected, at least one of the detected loci will typically be a locus with a "php" designation. One newly discovered advantage for all of the loci noted above is that probes which specifically hybridize to the selected locus do not specifically hybridize to additional loci in the soybean genome because the loci are all unique in the soybean genome. In preferred aspects, the loci and included polymorphic nucleotides are in linkage disequilibrium with a Quantitative Trait Locus (QTL) such as a QTL for resistance to soybean cyst nematode, brown stem rot, phytophthora rot or the like. Accordingly, the presence or absence of the detected locus corresponds to the presence or absence of a particular QTL. A variety of probe nucleic acids which hybridize to the loci are provided by the invention. The probes include the amplicons, PCR primers, and the like, described herein, which are used to identify and detect the loci.

Hybridization of a probe to a locus is detected to confirm the presence of the locus and typically to determine whether a particular polymorphic nucleotide is present at the locus. This detection is performed directly or indirectly. Direct methods of detecting hybridization include Southern analysis, northern analysis, array-dependent nucleic acid hybridization on a nucleic acid polymer array, in situ hybridization, or other methods which directly monitor the hybridization. Indirect detection includes, e.g., detection of an amplification product which is dependent on hybridization of the probe to the target nucleic acid. For example, the polymerase chain reaction (PCR) and/or the ligase chain reaction (LCR) are used to monitor hybridization, e.g., by detecting formation of an amplicon which is synthesized only if a probe (e.g., a PCR primer) hybridizes to the target. Similarly, the probe or target is optionally amplified prior to detection. Preferred amplification methods include PCR, LCR, and cloning of the target nucleic acid.

In several embodiments, it will be desirable to detect more than one locus. For example, detection of multiple loci from a biological sample provides an overall genotype of the biological sample. Thus, in one embodiment, a second probe nucleic acid is hybridized to a second target nucleic acid linked to a second nucleotide polymorphism in a second locus selected from the group consisting of the loci noted above. Similarly, a plurality of probes (a third, fourth, fifth... nth probe) are hybridized to a plurality of (a third, fourth, fifth... nth) polymorphic nucleotides in the loci noted above. In one embodiment, a majority of the noted loci are detected. In another preferred embodiment, all of the loci noted above are detected, thereby providing a comprehensive genotype. Similarly, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or any intermediate percentage thereof of the loci are detected in alternate embodiments. The methods are applicable to detection of loci and genotyping in a variety of biological samples, including a soybean plant, a soybean plant extract, an isolated soybean plant tissue, an isolated plant tissue extract, a soybean plant cell culture, a soybean plant cell culture extract, a recombinant cell comprising a nucleic acid derived from a soybean plant, a soybean plant seed, and an extract of a recombinant cell comprising a nucleic acid derived from a soybean plant.

The target nucleic acid which is detected can include the first polymorphic nucleotide, or it may be proximal to the polymorphic nucleotide. In typical embodiments, the target nucleic acid includes the polymorphic nucleotide to be detected. However, depending on how the target nucleic acid is detected, it can also be convenient to detect nucleotides proximal to the polymorphic nucleotide. For example, when LCR is used, the presence or absence of a polymorphic nucleotide is detected by amplifying nucleotide regions flanking the polymorphic nucleotide.

In one aspect, the present invention includes marker-assisted selection of soybean plants, e.g., by detecting any of the loci noted above and selecting a plant based upon the presence or absence of one or more desired polymorphic nucleotides.

In another aspect, nucleic acids corresponding to nucleotides proximal to or including the marker nucleic acids are cloned. In a particularly preferred aspect, a nucleic acid flanked by two nucleic acid loci is cloned. Typically, this cloned nucleic acid includes a coding sequence. The cloned nucleic acid is optionally transduced into cells or plants, e.g., to make transgenic plants (e.g., soybean) expressing the coding sequence.

In one aspect, nucleotide polymorphisms proximal to the selected loci are identified and mapped, e.g., by genetic mapping or nucleotide sequencing of nucleic acid regions genetically linked to the selected locus from genetically diverse strains of soybean. The identification of these additional polymorphisms provides additional marker regions which are used to identify the source of a soybean nucleic acid.

Similar to the detection methods above, nucleotide polymorphisms are also detected by separating nucleic acids having the polymorphisms by size and/or charge. For example, single-strand conformation polymorphism can be performed on two or more nucleic acids on a polyacrylamide gel.

Amplification methods and compositions for detecting nucleic acids linked to loci are also provided. Typical amplification methods of the present invention include PCR, asymmetric PCR, and LCR. For example, methods of hybridizing a first primer nucleic acid to a template nucleic acid and amplifying a portion of the template nucleic acid with a template-dependent polymerase enzyme or a ligase enzyme are provided. The primer hybridizes under stringent conditions to a locus nucleic acid from one of the loci described above. Typical amplification primer lengths are less than 100 nucleotides, although they may be longer or shorter, e.g., between about 10 and 50 nucleotides, typically between about 15 and 25 nucleotides, or as long as or longer than 100-200 nucleotides, or the like. Where the primer is a PCR primer, the primer provides a polymerase extendible substrate and the primer-dependent polymerase extends the primer. In one aspect, the primer is an allele-specific primer. In typical LCR amplification methods, the first primer hybridizes adjacent to a second primer on the template nucleic acid and the first and second primers are ligated with a ligase enzyme, thereby amplifying the portion of the template hybridized to the first and second primers. In PCR methods, the method includes hybridizing a second primer to the template, wherein the first and second primer hybridize to complementary strands of the template nucleic acid.

Amplification mixtures for practicing the amplification methods are also provided. For example, a PCR reaction mixture is provided having, e.g., a polymerase enzyme, deoxynucleotides, a template nucleic acid comprising a polymorphic nucleotide which hybridizes under stringent conditions to a locus as above, and primers which specifically hybridize to the template nucleic acid. Primers include the PCR primers described herein, and additional primers selected to amplify portions of the amplicons described herein. As noted, the primers are optionally allele-specific primers to facilitate quantitative PCR.

PCR amplicons are also provided, including nucleic acids having a polymorphic nucleotide. The amplicon hybridizes under stringent conditions to a locus selected from a group of loci consisting of those set forth above. Exemplary amplicons are described herein. Particularly preferred amplicons include php11138 and php11627.

A variety of additional compositions are also provided by the present invention. One class of compositions has a first recombinant nucleic acid which differentially hybridizes under allele-specific hybridization conditions to a first allele from a locus in the soybean genome selected from the above loci, where the first recombinant nucleic acid shows decreased hybridization affinity for a second allele from the selected locus. The composition optionally includes one or more additional recombinant nucleic acids (i.e., additional probes) which differentially hybridize under allele-specific hybridization conditions to a second allele from a selected locus, wherein the second nucleic acid shows decreased hybridization affinity for the first allele from the selected locus. For example, multi-color nucleic acid probe hybridization techniques such as comparative genomic hybridization (CGH) or fluorescence in situ hybridization (FISH) can be used to detect different alleles on different chromosomes.

In another aspect, a composition including a recombinant nucleic acid which specifically hybridizes to a first allele-specific probe and a second allele-specific probe is provided. The recombinant nucleic acid can be a probe, target nucleic acid, chromosomal nucleic acid, recombinant nucleic acid or the like. The first and second allele-specific probes hybridize under allele-specific hybridization conditions to a first haplotype of a locus in the soybean genome noted above. The composition optionally comprises additional materials such as allele-specific probes for the detection of the nucleic acid, or the like.

Sets of nucleic acid probes are also provided, including sets of nucleic acid probes having a plurality of probe nucleic acids which specifically hybridize to a plurality of target nucleic acids which hybridize under stringent conditions to a plurality of the loci noted above. The sets may be in any of a variety of physical arrangements, including arrays, containers, or the like. In a particularly preferred embodiment, the set is in kit form, i.e., having the set of nucleic acids, and optionally comprising one or more additional components such as a container, instructional materials, one or more control target nucleic acids, and recombinant cells comprising one or more target nucleic acids.

Transgenic plants are provided. In particular, a transgenic plant having a recombinant nucleic acid which hybridizes under stringent conditions to a target nucleic acid is provided. The target nucleic acid is genetically linked to (and preferably comprises) a nucleotide polymorphism from a locus selected from the group of loci noted above. In a preferred embodiment, the recombinant nucleic acid comprises a coding sequence encoded by a gene in linkage disequilibrium with a Quantitative Trait Locus (QTL). Example QTL include a QTL for resistance to soybean cyst nematode, a QTL for resistance to brown stem rot, and a QTL for resistance to phytophthora rot.

Definitions

A "polymorphism" is a change or difference between two related nucleic acids. A "nucleotide polymorphism" refers to a nucleotide which is different in one sequence when compared to a related sequence when the two nucleic acids are aligned for maximal correspondence. A "genetic nucleotide polymorphism" refers to a nucleotide which is different in one sequence when compared to a related sequence when the two nucleic acids are aligned for maximal correspondence, where the two nucleic acids are genetically related, i.e., homologous, e.g., where the nucleic acids are isolated from different strains of a soybean plant, or from different alleles of a single strain, or the like.
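The definition of a nucleotide polymorphism as a position that differs between two aligned, related sequences can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned for maximal correspondence by some other tool (with "-" marking gaps); the function name and the example fragments are hypothetical and are not sequences from this disclosure.

```python
def polymorphic_positions(seq_a, seq_b):
    """Return 1-based positions at which two pre-aligned sequences differ.

    Assumption: the inputs are already aligned to equal length, with '-'
    used as the gap character. A gap opposite a base counts as a difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return [
        i + 1
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


# Hypothetical aligned fragments from two soybean strains.
strain_1 = "ATGCCGTA-GCT"
strain_2 = "ATGCTGTAAGCT"
print(polymorphic_positions(strain_1, strain_2))  # [5, 9]
```

Here position 5 is a base substitution and position 9 is an insertion/deletion, the two kinds of change discussed in the Background.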

A "biological sample" is a portion of material isolated from a biological source such as a plant, isolated plant tissue, or plant cell, or a portion of material made from such a source such as a cell extract or the like.

A "probe nucleic acid" is an RNA or DNA or analogue thereof. The probe may be of any length. Typical probes include PCR primers, PCR amplicons, cloned genomic nucleic acids encoding a genetic locus of interest, and the like.

"Marker assisted selection" refers to the process of selecting a desired trait or desired traits in a plant or plants by detecting one or more nucleic acids from the plant, where the nucleic acid is associated with the desired trait.

A "locus" is a nucleic acid region where a polymorphic nucleic acid resides.

A "genetic marker" is a region on a genomic nucleic acid mapped by a marker nucleic acid. A "marker nucleic acid" is a nucleic acid which is an indicator for the presence of a marker locus. The marker can be either a probe nucleic acid which identifies a target nucleic acid genetically linked to the locus, or a sequence hybridized by the probe, i.e., a genomic nucleic acid linked to the locus. Typically, a probe will be used to hybridize to or amplify the locus. Example markers include isolated nucleic acids from the locus, cloned nucleic acids comprising the locus, PCR primers for amplifying the locus, and the like.

Two nucleic acid sequences are "genetically linked" when the sequences are in linkage disequilibrium.

A "vector" is a carrier composition which assists in transducing, transforming or infecting a cell with a nucleic acid, thereby causing the cell to express vector associated nucleic acids and, optionally, proteins other than those native to the cell, or in a manner not native to the cell. The term vector includes nucleic acid (ordinarily RNA or DNA) to be expressed by the cell (a "vector nucleic acid"). A vector optionally includes materials to aid in achieving entry of the nucleic acid into the cell, such as a retroviral particle, liposome, protein coating or the like. A "promoter" is an array of nucleic acid control sequences which directs transcription of a nucleic acid. As used herein, a promoter includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of a polymerase II type promoter, a TATA element. A promoter also optionally includes distal enhancer or repressor elements which can be located as much as several thousand base pairs from the start site of transcription. A "constitutive" promoter is a promoter which is active in a selected organism under most environmental and developmental conditions. An "inducible" promoter is a promoter which is under environmental or developmental regulation in a selected organism.

The terms "isolated" or "biologically pure" refer to material which is substantially or essentially free from components which normally accompany it as found in its native state.

DETAILED DISCUSSION OF THE INVENTION

The present invention resides in part in the identification of new marker loci for soybean plants. The loci include polymorphic nucleotides which vary depending on the particular strain of soybean considered. These loci are used in plant breeding projects, e.g., for marker-assisted selection, for positional cloning of linked nucleic acid regions of soybean chromosomal nucleic acids, and the like. Because the sequences of the loci and surrounding regions are provided, it is possible to easily select appropriate probes for detection of particular polymorphic nucleotides, e.g., by allele-specific hybridization, PCR amplification, or the like. The polymorphic nucleotides are detected directly, or by detecting nucleic acids in linkage disequilibrium with the loci.

The described loci which are prefixed by "php" were previously completely unknown, with no information being publicly available regarding the loci. Some of the other loci (those not prefaced by "php") were previously identified by binding to RFLP probes; however, no sequence information regarding the loci was available and it was, therefore, not possible to design probes which hybridize specifically to polymorphic nucleotides. The locus SOYBPSP was previously sequenced in an intron of a gene, but no polymorphism was previously identified at the locus. Clones comprising loci not prefaced by "php", i.e., comprising publicly available RFLP probes, are publicly available from Biogenetic Services, Inc. (Brookings, SD) and PE AgGen, Inc. (formerly Linkage Genetics) (Salt Lake City, UT).

Uses for Nucleotide Polymorphisms

The nucleotide polymorphisms described here are used, e.g., for DNA-fingerprinting soybean varieties, genetic-linkage mapping of the soybean genome, marker association with specific genes or quantitative-trait loci (QTL) affecting phenotypic traits, marker-assisted selection for preferred genotypes in soybean breeding, positional cloning of genes from the soybean genome and other purposes that will be apparent upon complete review of this disclosure.

DNA Fingerprinting

DNA fingerprinting is the application of multiple genetic markers to DNA extracted from an individual or pooled group of related individuals, such as an inbred line or variety, so that the cumulative marker allele profile provides a description of the variety's overall genotype. Comparisons of these marker allele profiles can be made among samples of different varieties, or among different samples of the same variety, and estimates can be made regarding the genetic relationships among these samples. These estimates can be obtained without knowing the pedigrees of the fingerprinted varieties.
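As a rough illustration of comparing marker allele profiles, the following Python sketch scores two profiles by the fraction of shared loci at which they carry the same allele (a simple matching coefficient). This particular measure is an assumption chosen for illustration; the disclosure does not prescribe a similarity statistic. The locus names are taken from the list above, but the allele labels ("a1", "a2", ...) and variety names are hypothetical.

```python
def profile_similarity(profile_a, profile_b):
    """Fraction of loci scored in both profiles that carry the same allele.

    Each profile maps a locus name to an allele label. Loci scored in only
    one profile are ignored. This simple matching coefficient is an
    illustrative assumption, not a statistic named in the source text.
    """
    shared = [locus for locus in profile_a if locus in profile_b]
    if not shared:
        raise ValueError("no loci scored in both profiles")
    matches = sum(profile_a[locus] == profile_b[locus] for locus in shared)
    return matches / len(shared)


# Hypothetical allele calls for two varieties at four of the described loci.
variety_x = {"pA060A": "a1", "php02301A": "a2", "pK069A": "a1", "pT155A": "a3"}
variety_y = {"pA060A": "a1", "php02301A": "a1", "pK069A": "a1", "pT155A": "a3"}
print(profile_similarity(variety_x, variety_y))  # 0.75
```

A value near 1.0 across many well-distributed loci would suggest that two samples are genetically alike; the same comparison against a reference profile could be repeated seed by seed to estimate the purity of a seedlot.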

DNA fingerprints are used to determine whether a soybean variety is described and used within guidelines established, e.g., by Plant Variety Protection and patent laws. Seed of one variety may be sold by two different vendors using different variety names, or a variety may have been genetically bred from another variety by repeated cycles of backcrossing and selection to the extent that the new variety was essentially derived from the original variety. In these situations, questions about ownership may need to be answered. DNA-fingerprint profiles can be obtained from each variety or seed source and compared. If done properly, the data will show with a high probability whether or not the different samples are genetically alike. The polymorphisms at the 63 polymorphic loci described herein provide a basis for quickly and reliably comparing DNA-fingerprint profiles in soybean. These 63 loci are well distributed around the soybean genome and provide an adequate number of loci to make reasonable conclusions regarding variety or seed-source identities and relationships.

DNA fingerprinting is used by soybean breeders to identify diverse breeding parents, to create novel recombinant populations, to better understand the structure of the soybean genome, and to better understand the history of pedigree breeding. The 63 example loci herein provide sufficient polymorphisms to estimate genetic relationships among soybean varieties.

DNA fingerprinting is used by soybean seed companies to estimate genetic purity of a seedlot for quality control and labeling of their product, and to identify the variety source of any contamination found among fingerprinted samples. These 63 loci provide sufficient polymorphisms to estimate genetic purity within a seedlot.

Genetic Mapping of the Soybean Genome

Genetic mapping is done by finding polymorphic markers that are genetically linked to each other (in linkage groups) or linked to genes or QTL affecting phenotypic traits of interest within a segregating population. The alignment of markers into linkage groups is useful as a reference for future use of the markers and for accurately positioning genes or QTL relative to the markers. The nucleotide polymorphisms described here for 63 exemplar loci provide a means to utilize these loci in genetic mapping studies in soybean. Many of these loci have multiple sub-loci and haplotypes across the sub-loci. Each haplotype provides a different allele composition within a locus, thereby expanding the utility of these marker loci to more soybean mapping studies than possible with only two alleles per locus. Many of these 63 loci were intentionally selected for polymorphism development because they were widely dispersed among soybean genetic linkage groups and would therefore collectively maximize their utility for mapping the soybean genome. Because each of these 63 loci was designed to be a discrete individual locus, these loci cannot be confused with duplicate loci having similar sequence elsewhere in the genome. They therefore are excellent reference loci on any soybean genetic map and can be used to reliably align the same linkage groups from different maps.
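To make the idea of linkage between two markers concrete, the following Python sketch estimates a recombination fraction from progeny scores and converts it to a map distance. It rests on assumptions that are not part of this disclosure: a backcross-type population in which each progeny is scored at each marker as carrying the allele of parent "A" or parent "B" ("-" for a missing score), and the Haldane mapping function for the conversion to centimorgans. The score strings are hypothetical.

```python
import math


def recombination_fraction(marker_1, marker_2):
    """Fraction of progeny whose parental origin differs between two markers.

    Assumption: a backcross-type population, one character per progeny,
    'A' or 'B' for parental origin and '-' for a missing score.
    """
    if len(marker_1) != len(marker_2):
        raise ValueError("both markers must be scored on the same progeny")
    pairs = [(a, b) for a, b in zip(marker_1, marker_2) if "-" not in (a, b)]
    if not pairs:
        raise ValueError("no progeny scored at both markers")
    return sum(a != b for a, b in pairs) / len(pairs)


def haldane_cm(r):
    """Haldane map distance in centimorgans for recombination fraction r."""
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical scores for ten progeny at two markers.
scores_1 = "AABBABAABB"
scores_2 = "AABBBBAABA"
r = recombination_fraction(scores_1, scores_2)
print(r, round(haldane_cm(r), 2))  # 0.2 25.54
```

Markers with a recombination fraction well below 0.5 would be placed in the same linkage group; other population types (for example recombinant inbred lines) need a different estimator.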

Many of these marker loci were selected to develop additional nucleotide polymorphisms because they were found to be genetically linked to important QTL for disease resistance. The loci php05219A, php07659A, php10355B, and pK069A were found to cluster around a resistance QTL on group G; the loci pT155A, pBLT24A, and pBLT65A were clustered around a resistance QTL on group A; and the locus php02301A was near a resistance QTL on group M. These three QTL provide resistance to soybean cyst nematode (Heterodera glycines Ichinohe) (see also Webb et al. (1995) Theor Appl Genet 91:574-581). The loci php02636A on group C, php08584A on group S, and pK079A on group L26 were all linked to additional QTL for resistance to soybean cyst nematode. The locus pB032B on group J was near Rbs₃ for resistance to brown stem rot. The loci pK418A and pA280A on group N were near Rps₁, the locus pR045A on group F was linked to Rps₃, the loci pA378A and pL183A on group G were near Rps₄, and the locus pT005A on group G was near Rps₅, all providing resistance to phytophthora rot.

Marker-Assisted Selection in Soybean Improvement

After genes or a QTL and a marker or markers are mapped together and found to be in linkage disequilibrium, it is possible to use those markers to select for the desired alleles of those genes or QTL - a process called marker-assisted selection (MAS). In brief, a nucleic acid corresponding to the marker nucleic acid is detected in a biological sample from a plant to be selected. This detection can take the form of hybridization of a probe nucleic acid to a marker, e.g., using allele-specific hybridization, Southern analysis, northern analysis, in situ hybridization, hybridization of primers followed by PCR amplification of a region of the marker or the like. A variety of procedures for detecting markers are described herein. After the presence (or absence) of a particular marker in the biological sample is verified, the plant is selected, i.e., used to make progeny plants by selective breeding.
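The selection step can be sketched in Python as a simple filter over genotype calls. The sketch assumes each plant has already been genotyped and that its calls are stored as a mapping from locus name to allele label; the plant IDs and the labels "R" (allele linked to resistance) and "S" (allele linked to susceptibility) are hypothetical. The two loci used are among those described above as clustering around a resistance QTL on group G.

```python
def select_plants(genotypes, desired):
    """Return IDs of plants carrying the desired allele at every listed marker.

    genotypes: plant ID -> {locus name: allele label}
    desired:   {locus name: allele label to select for}
    A plant with a missing call at any desired locus is not selected.
    """
    return [
        plant_id
        for plant_id, calls in genotypes.items()
        if all(calls.get(locus) == allele for locus, allele in desired.items())
    ]


# Hypothetical genotype calls for three progeny plants.
genotypes = {
    "plant_1": {"php05219A": "R", "pK069A": "R"},
    "plant_2": {"php05219A": "S", "pK069A": "R"},
    "plant_3": {"php05219A": "R", "pK069A": "R"},
}
desired = {"php05219A": "R", "pK069A": "R"}
print(select_plants(genotypes, desired))  # ['plant_1', 'plant_3']
```

Requiring the desired allele at more than one linked marker is one possible design choice: it reduces the chance of selecting a plant in which recombination has separated a single marker from the QTL.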

Nucleotide polymorphisms were developed at markers near numerous resistance loci in soybean that are effective against soybean cyst nematode, Phytophthora sojae (phytophthora rot), and Phialophora gregata (brown stem rot). These are among the most damaging pathogens to soybeans in North America.

Soybean breeders need to combine disease resistance loci with genes for high yield and other desirable traits to develop improved soybean varieties. Disease screening for large numbers of samples can be expensive, time consuming, and unreliable. Use of the nucleotide polymorphisms described here and genetically-linked nucleotides as genetic markers for disease resistance loci is an effective method of selecting resistant varieties in breeding programs. When a population is segregating for multiple loci affecting multiple diseases, the efficiency of MAS compared to phenotypic screening becomes even greater because all the loci can be processed in the lab together from a single sample of DNA. Another advantage over field evaluations for disease reaction is that MAS can be done at any time of year regardless of the growing season. Moreover, environmental effects are irrelevant to marker-assisted selection.

Another use of MAS in plant and animal breeding is to assist the recovery of the recurrent parent genotype by backcross breeding. Backcross breeding is the process of crossing a progeny back to one of its parents. Backcrossing is usually done for the purpose of introgressing one or a few loci from a donor parent into an otherwise desirable genetic background from the recurrent parent. The more cycles of backcrossing that are done, the greater the genetic contribution of the recurrent parent to the resulting variety. This is often necessary, because resistant plants may be otherwise undesirable, e.g., due to low yield, low fecundity, or the like. In contrast, strains which are the result of intensive breeding programs may have excellent yield, fecundity or the like, merely being deficient in one desired trait such as resistance to a particular pathogen. The 53 marker loci described in the Examples below are distributed around the soybean genome and are used to select for the recurrent-parent genotype. MAS for the recurrent-parent genotype can be combined with MAS for the disease resistance loci using these markers. Accordingly, it is possible to use the markers of the invention to introduce disease resistance QTL into plant varieties having an otherwise desirable genetic background, selecting both for the QTL and for the otherwise desirable background.

Positional Cloning in Soybean

Positional gene cloning uses the proximity of a mapped gene and its linked markers to physically define a cloned chromosomal fragment that contains a desired gene. If two or more markers flanking the gene are physically close to each other, they may hybridize to the same DNA fragment, thereby identifying a clone on which the gene is located. If flanking markers are more distant from each other, a fragment containing the gene may be identified by constructing a contig of overlapping clones.

Recently, BAC (bacterial artificial chromosome) and YAC (yeast artificial chromosome) libraries containing large fragments of soybean DNA have been constructed (Funke RP and Kolchinsky A (1994) CRC Press, Boca Raton, FL, pp. 125-308; Marek LF and Shoemaker RC (1996) Soybean Genet Newsl 23:126-129; Danish et al. (1997) Soybean Genet Newsl 24:196-198). These libraries and advances in genetic mapping make positional cloning of soybean genes feasible using the markers identified herein.

A marker is ideally locus-specific to reliably identify a clone from the targeted chromosomal region. The soybean genome (Glycine subgenus soja) is highly duplicated (Shoemaker et al. (1996) Genetics 144:329-338), but each nucleotide polymorphism described here, together with its PCR primers, is specific to a single locus in the soybean genome and therefore correctly identifies soybean clones that hybridize to a probe DNA sequence corresponding to a particular target genomic location. Some of these marker loci are closely linked to agronomically important genes, such as genes for resistance to soybean cyst nematode and fungal pathogens, and are used as locus-specific reference points in positional cloning efforts for these genes.

Making and Using Markers for Detection of Polymorphic Nucleic Acids

The ability to characterize an individual by its genome is due to the inherent variability of genetic information. Although DNA sequences which code for necessary proteins are well conserved across a species, there are regions of DNA which are non-coding or code for portions of proteins which do not have critical functions and therefore, absolute conservation of nucleic acid sequence is not strongly selected for. These variable regions are identified by genetic markers. Typically, genetic markers are bound by probes such as oligonucleotides or amplicons which bind to variable regions of the genome. In some instances, the presence or absence of binding to a genetic marker identifies individuals by their unique nucleic acid sequence. In other instances, a marker binds to nucleic acid sequences of all individuals but the individual is identified by the position in the genome bound by a marker probe.

The major causes of genetic variability are addition, deletion, or point mutations, recombination and transposable elements within the genome of individuals in a plant population.

Point mutations are typically the result of inaccuracy in DNA replication. During meiosis in the creation of germ cells or in mitosis to create clones, DNA polymerase "switches" bases, either transitionally (i.e., a purine for a purine and a pyrimidine for a pyrimidine) or transversionally (i.e., purine to pyrimidine and vice versa). The base switch is maintained if the exonuclease function of DNA polymerase does not correct the mismatch. At germination, or the next cell division (in clonal cells), the DNA strand with the point mutation becomes the template for a complementary strand and the base switch is incorporated into the genome. Transposable elements are sequences of DNA which have the ability to move or to jump to new locations within a genome and several examples of transposons are known in the art.
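The transition/transversion distinction drawn above can be illustrated with a minimal Python sketch. The function name `classify_substitution` and the restriction to the four DNA bases are assumptions made for illustration only; they are not part of the disclosure.

```python
# Illustrative sketch: classify a single-base substitution as a transition
# (purine <-> purine or pyrimidine <-> pyrimidine) or a transversion
# (purine <-> pyrimidine). Names here are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition', 'transversion', or 'none' for a base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        return "none"
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For example, an A-to-G change is reported as a transition and an A-to-C change as a transversion.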

Given the sequences herein, one of skill can generate probe nucleic acids for detecting markers, including probes which are PCR primers, allele-specific probes, PCR amplicons and the like for the detection of polymorphic nucleotides at the loci disclosed herein, as well as genetically-linked sequences.

Cloning methodologies for replicating nucleic acids and sequencing methods to verify the sequence of nucleic acids are well known in the art. Examples of appropriate cloning and sequencing techniques, and instructions sufficient to direct persons of skill through many cloning exercises are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY (Sambrook); and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (through and including the 1997 Supplement) (Ausubel). A catalogue of Bacteria and Bacteriophages useful for cloning is provided, e.g., by the ATCC, e.g., The ATCC Catalogue of Bacteria and Bacteriophage (1992) Gherna et al. (eds) published by the ATCC. Additional basic procedures for sequencing, cloning and other aspects of molecular biology and underlying theoretical considerations are also found in Lewin (1995) Genes V, Oxford University Press Inc., NY (Lewin); and Watson et al. (1992) Recombinant DNA, Second Edition, Scientific American Books, NY. Product information from manufacturers of biological reagents and experimental equipment also provides information useful in known biological methods. Such manufacturers include the Sigma Chemical Company (Saint Louis, MO); New England Biolabs (Beverly, MA); R&D systems (Minneapolis, MN); Pharmacia LKB Biotechnology (Piscataway, NJ); CLONTECH Laboratories, Inc. (Palo Alto, CA); ChemGenes Corp. (Waltham, MA); Aldrich Chemical Company (Milwaukee, WI); Glen Research, Inc. (Sterling, VA); GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD); Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland); Invitrogen (San Diego, CA); Perkin Elmer (Foster City, CA); and Stratagene; as well as many other commercial sources known to one of skill. As described previously, genetic markers and some RFLP probes are available from Biogenetic Services, Inc. (Brookings, SD) and Linkage Genetics (Salt Lake City, UT, a subsidiary of Perkin Elmer, Branchburg, NJ).

The nucleic acid compositions of this invention, whether DNA, RNA, cDNA, genomic DNA, or analogues thereof, or a hybrid of these molecules, are isolated from biological sources or synthesized in vitro. The nucleic acids of the invention are present in transfected whole cells, in transfected cell lysates, in transgenic plants (especially soybean) or in partially purified or substantially pure form.

In vitro amplification techniques suitable for amplifying sequences for use as molecular probes or generating nucleic acid fragments for subsequent subcloning are known. Examples of techniques sufficient to direct persons of skill through such in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA) are found in Berger, Sambrook, and Ausubel, as well as Mullis et al. (1987) U.S. Patent No. 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc., San Diego, CA (1990) (Innis); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomell et al. (1989) J. Clin. Chem 35, 1826; Landegren et al. (1988) Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace (1989) Gene 4, 560; Barringer et al. (1990) Gene 89, 117; and Sooknanan and Malek (1995) Biotechnology 13: 563-564. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, Ausubel, Sambrook and Berger, all supra.

Oligonucleotides for use as probes, e.g., in in vitro amplification methods, for use as gene probes, or as inhibitor components (e.g., ribozymes) are typically synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981), Tetrahedron Letts., 22(20): 1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res., 12:6159-6168. Oligonucleotides can also be custom made and ordered from a variety of commercial sources known to persons of skill. Purification of oligonucleotides, where necessary, is typically performed by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier (1983) J. Chrom. 255: 137-149. The sequence of the synthetic oligonucleotides can be verified using the chemical degradation method of Maxam and Gilbert (1980) in Grossman and Moldave (eds.) Academic Press, New York, Methods in Enzymology 65:499-560.

Providing Large Nucleic Acid Templates

In certain applications it is advantageous to make or clone large nucleic acids which encompass multiple loci, or to detect, clone, or isolate nucleic acids linked to polymorphic nucleotides. For example, as described supra, in one embodiment, positional cloning is used to isolate nucleic acids proximal to polymorphic nucleotides, e.g., at more than one locus. These nucleic acids are in linkage disequilibrium with the polymorphic nucleotides, i.e., they are genetically linked to the polymorphic nucleotides on a chromosomal nucleic acid. It will be appreciated that a nucleic acid genetically linked to a polymorphic nucleotide optionally resides up to about 50 centimorgans from the polymorphic nucleic acid, although the precise physical distance will vary depending on the cross-over frequency of the particular chromosomal region. Typical distances from a polymorphic nucleotide are in the range of 1-50 centimorgans, for example, less than 1-5, about 1-5, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 centimorgans, etc.

Many methods of making large recombinant RNA and DNA nucleic acids, including recombinant plasmids, recombinant lambda phage, cosmids, yeast artificial chromosomes (YACs), P1 artificial chromosomes, Bacterial Artificial Chromosomes (BACs), and the like are known. A general introduction to YACs, BACs, PACs and MACs as artificial chromosomes is described in Monaco and Larin (1994) Trends Biotechnol 12(7): 280-286. Examples of appropriate cloning techniques for making large nucleic acids, and instructions sufficient to direct persons of skill through many cloning exercises are found in Berger and Kimmel, Sambrook, and Ausubel, all supra.

In one aspect, nucleic acids hybridizing to the polymorphic nucleic acids disclosed herein (or linked to such nucleic acids) are cloned into large nucleic acids such as YACs, or are detected in YAC genomic libraries cloned from soybean. The construction of YACs and YAC libraries is known. See, Berger, supra, and Burke et al. (1987) Science 236:806-812. Gridded libraries of YACs are described in Anand et al. (1989) Nucleic Acids Res. 17:3425-3433, and Anand et al. (1990) Nucleic Acids Res. 18:1951-1956. Riley (1990) Nucleic Acids Res. 18(10):2887-2890 and the references therein describe cloning of YACs and related technologies. YAC libraries containing large fragments of soybean DNA have been constructed. See, Funke and Kolchinsky (1994) CRC Press, Boca Raton, FL, pp. 125-308; Marek and Shoemaker (1996) Soybean Genet Newsl 23:126-129; Danish et al. (1997) Soybean Genet Newsl 24:196-198. See also, Ausubel, chapter 13 for a description of procedures for making YAC libraries.

Similarly, cosmids or other molecular vectors such as BAC and P1 constructs are also useful for isolating or cloning nucleic acids linked to polymorphic nucleic acids. Cosmid cloning is also known. See, e.g., Ausubel, chapter 1.10.11 (supplement 13) and the references therein. See also, Ish-Horowitz and Burke (1981) Nucleic Acids Res. 9:2989-2998; Murray (1983) Phage Lambda and Molecular Cloning in Lambda II (Hendrix et al., eds) 395-432, Cold Spring Harbor Laboratory, NY; Frischauf et al. (1983) J. Mol. Biol. 170:827-842; and Dunn and Blattner (1987) Nucleic Acids Res. 15:2677-2698, and the references cited therein. Construction of BAC and P1 libraries is known; see, e.g., Ashworth et al. (1995) Anal Biochem 224(2):564-571; Wang et al. (1994) Genomics 24(3):527-534; Kim et al. (1994) Genomics 22(2):336-9; Rouquier et al. (1994) Anal Biochem 217(2):205-9; Shizuya et al. (1992) Proc Natl Acad Sci U S A 89(18):8794-7; Woo et al. (1994) Nucleic Acids Res 22(23):4922-31; Wang et al. (1995) Plant (3):525-33; Cai (1995) Genomics 29(2):413-25; Schmitt et al. (1996) Genomics 33(1):9-20; Kim et al. (1996) Genomics 34(2):213-8; Kim et al. (1996) Proc Natl Acad Sci U S A (13):6297-301; Pusch et al. (1996) Gene 183(1-2):29-33; and Wang et al. (1996) Genome Res 6(7):612-9.

Improved methods of in vitro amplification to amplify large nucleic acids linked to the polymorphic nucleic acids herein are summarized in Cheng et al. (1994) Nature 369:684-685 and the references therein.

In addition, any of the cloning or amplification strategies described above are useful for creating contigs of overlapping clones, thereby providing overlapping nucleic acids which show the physical relationship at the molecular level for genetically linked nucleic acids. A common example of this strategy is found in whole organism sequencing projects, in which overlapping clones are sequenced to provide the entire sequence of a chromosome. In this procedure, a library of the organism's cDNA or genomic DNA is made according to standard procedures described, e.g., in the references above. Individual clones are isolated and sequenced, and overlapping sequence information is ordered to provide the sequence of the organism. See also, Tomb et al. (1997) Nature 539-547 describing the whole genome random sequencing and assembly of the complete genomic sequence of Helicobacter pylori; Fleischmann et al. (1995) Science 269:496-512 describing whole genome random sequencing and assembly of the complete Haemophilus influenzae genome; Fraser et al. (1995) Science 270:397-403 describing whole genome random sequencing and assembly of the complete Mycoplasma genitalium genome; and Bult et al. (1996) Science 273:1058-1073 describing whole genome random sequencing and assembly of the complete Methanococcus jannaschii genome. Recently, Hagiwara and Curtis (1996) Nucleic Acids Research 24(12):2460-2461 developed a "long distance sequencer" PCR protocol for generating overlapping nucleic acids from very large clones to facilitate sequencing, and methods of amplifying and tagging the overlapping nucleic acids into suitable sequencing templates. The methods can be used in conjunction with shotgun sequencing techniques to improve the efficiency of shotgun methods typically used in whole organism sequencing projects. As applied to the present invention, the techniques are useful for identifying and sequencing genomic nucleic acids genetically linked to the loci described.
Hybridization Strategies

In a preferred aspect, a labeled probe nucleic acid is specifically hybridized to a marker nucleic acid from a biological sample and the label is detected, thereby determining that the marker nucleic acid is present in the sample. For example, a marker comprising a polymorphic nucleic acid can be detected by allele-specific hybridization of a probe to the region of the marker comprising the polymorphic nucleic acid. Similarly, a marker can be detected by Southern analysis, northern analysis, in situ analysis, or the like.

Two single-stranded nucleic acids "hybridize" when they form a double-stranded duplex. The region of double-strandedness can include the full-length of one or both of the single-stranded nucleic acids, or all of one single stranded nucleic acid and a subsequence of the other single stranded nucleic acid, or the region of double-strandedness can include a subsequence of each nucleic acid. "Stringent hybridization conditions" in the context of nucleic acid hybridization are sequence dependent and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993), id. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (T_m) for the specific sequence at a defined ionic strength and pH. The T_m is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Highly stringent conditions are selected to be equal to the T_m point for a particular probe. Sometimes the term "T_d" is used to define the temperature at which at least half of the probe dissociates from a perfectly matched target nucleic acid. In any case, a variety of estimation techniques for estimating the T_m or T_d are available, and generally described in Tijssen, id. Typically, G-C base pairs in a duplex are estimated to contribute about 3°C to the T_m, while A-T base pairs are estimated to contribute about 2°C, up to a theoretical maximum of about 80-100°C. However, more sophisticated models of T_m and T_d are available and appropriate in which G-C stacking interactions, solvent effects, the desired assay temperature and the like are taken into account. In one example, PCR primers were designed to have a dissociation temperature (T_d) of approximately 60°C, using the formula: T_d = (((((3 x #GC) + (2 x #AT)) x 37) - 562) / #bp) - 5; where #GC, #AT, and #bp are the number of guanine-cytosine base pairs, the number of adenine-thymine base pairs, and the number of total base pairs, respectively, involved in the annealing of the primer to the template DNA.
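The T_d formula given above can be evaluated directly. The following Python sketch is illustrative only; the function name `dissociation_temperature` is hypothetical, and the sketch assumes that every base of the primer anneals to the template, so that #bp equals the primer length.

```python
# Illustrative sketch of the T_d formula stated in the text:
#   T_d = (((((3 x #GC) + (2 x #AT)) x 37) - 562) / #bp) - 5
# Assumption: the whole primer anneals, so #bp = len(primer).
def dissociation_temperature(primer: str) -> float:
    """Estimate T_d in degrees C for a primer given as a DNA string."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")  # number of G-C base pairs
    at = seq.count("A") + seq.count("T")  # number of A-T base pairs
    bp = gc + at
    if bp == 0 or bp != len(seq):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return ((((3 * gc) + (2 * at)) * 37 - 562) / bp) - 5
```

Under these assumptions, a 20-base primer with ten G/C and ten A/T positions gives (50 x 37 - 562) / 20 - 5, or about 59.4°C, which is consistent with the approximately 60°C design target mentioned above.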

An example of stringent hybridization conditions for hybridization of complementary nucleic acids that have more than 100 complementary residues on a filter in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the hybridization being carried out overnight. An example of stringent wash conditions for a Southern blot of such nucleic acids is a 0.2x SSC wash at 65°C for 15 minutes (see, Sambrook, supra for a description of SSC buffer). Often the high stringency wash is preceded by a low stringency wash to remove background probe signal. An example low stringency wash is 2x SSC at 40°C for 15 minutes.

In general, a signal to noise ratio of 2x (or higher) than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. For highly specific hybridization strategies such as allele-specific hybridization, an allele-specific probe is usually hybridized to a marker nucleic acid (e.g., a genomic nucleic acid, an amplicon, or the like) comprising a polymorphic nucleotide under highly stringent conditions.

Allele-Specific Hybridization (ASH)

One especially preferred example of a hybridization technology for detecting marker nucleic acids is allele-specific hybridization, or "ASH." This technology is based on the stable annealing of a short, single-stranded oligonucleotide probe to a single-stranded target nucleic acid only when base pairing is completely complementary. The hybridization can then be detected from a radioactive or non-radioactive label on the probe (methods of labeling probes and other nucleic acids are set forth in detail below).

ASH markers are polymorphic when their base composition at one or a few nucleotide positions in a segment of DNA is different among different genotypes. For each polymorphism, two or more different ASH probes are designed to have identical DNA sequences except at the polymorphic nucleotide(s). Each probe will have exact homology with one allele sequence so that the complement of probes can distinguish all the alternative allele sequences. Each probe is hybridized against the target DNA. With appropriate probe design and stringency conditions, a single-base mismatch between the probe and target DNA will prevent hybridization and the unbound probe will wash away. In this manner, only one of the alternative probes will hybridize to a target sample that is homozygous or homogeneous for an allele (an allele is defined by the DNA homology between the probe and target). Samples that are heterozygous or heterogeneous for two alleles will hybridize to both of two alternative probes. Having a probe for each allele allows the polymorphism to be genetically co-dominant, which is useful in determining zygosity. In addition, a co-dominant ASH system is useful when hybridization does not occur for either one of two alternative probes, so that control experiments can be directed towards verifying insufficient target DNA or the occurrence of a new allele. ASH markers are used as dominant markers where the presence or absence of only one allele is determined from hybridization or lack of hybridization by only one probe. The alternative allele may be inferred from the lack of hybridization. Heterogeneous target nucleic acids (i.e., chromosomal DNA from a multiallelic plant) are detected by monitoring simultaneous hybridization of two or more probes comprising different polymorphic nucleotides to a genomic nucleic acid.
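The co-dominant scoring logic described above, with one probe per allele, can be sketched in Python as follows. This is a minimal illustration and not a procedure taken from the Examples; the function name `call_genotype`, the numeric signal values, and the single `threshold` cutoff are all hypothetical assumptions.

```python
# Illustrative sketch of co-dominant ASH scoring with two allele-specific
# probes. The signal scale and the threshold are assumed for illustration.
def call_genotype(signal_a: float, signal_b: float, threshold: float) -> str:
    """Score a sample from the hybridization signals of probes A and B."""
    a_bound = signal_a >= threshold  # probe for allele A hybridized
    b_bound = signal_b >= threshold  # probe for allele B hybridized
    if a_bound and b_bound:
        return "heterozygous A/B"
    if a_bound:
        return "homozygous A"
    if b_bound:
        return "homozygous B"
    # Neither probe bound: per the text, follow up with controls for
    # insufficient target DNA or the occurrence of a new allele.
    return "no call"
```

A sample in which only the first probe gives a signal above the cutoff is scored as homozygous for that allele, while a sample with neither signal is flagged for control experiments rather than assigned a genotype.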

Allele-specific hybridization was first described by Wallace et al. (1979), who showed that the hybrid between an oligonucleotide probe and bacteriophage target DNA dissociated at a temperature about 10°C lower when the probe and target sequences had a single base-pair mismatch than when the probe and target DNA had perfect homology. This difference in thermal stability allowed ASH probes to discriminate the two alleles determined by a single-nucleotide polymorphism between the wildtype sequence and a point mutation in the am-3 bacteriophage.

Later it was shown that a mixture of ASH probes, designed from the possible degenerate DNA sequences coding for a known amino acid sequence, could be used to identify clones containing the rabbit β-globin DNA that coded for that protein (Wallace et al. (1981) Nucleic Acids Res 9:879-894). They also showed that the only probe that hybridized to the clones had exact homology to the clone, whereas three probes that did not hybridize to the clones had a single base-pair mismatch with the target DNA. ASH markers have been developed to diagnose susceptibility to human diseases caused by point mutations in DNA sequence. Examples are for the β^S-globin allele that can cause sickle-cell anemia (Conner et al. (1983) Proc Natl Acad Sci USA 80:278-282), the β⁰-thalassemia allele that can cause β-thalassemia (Pirastu et al. (1983) New England J Med 309:284-287), the α1-antitrypsin allele that can cause liver cirrhosis and pulmonary emphysema (Kidd (1983) Nature 304:230-234), the HLA-DR haplotypes associated with immune response (Angelini et al. (1986) Proc Natl Acad Sci USA 83:4489-4493), and the A985G allele that can cause medium-chain acyl-CoA dehydrogenase deficiency (Iitia A et al. (1994) BioTechniques 17:566-571).

ASH markers have also been developed to identify strains of fungi resistant to the fungicide benzimidazole because of specific point mutations in the β-tubulin gene in Venturia inaequalis (Koenraadt and Jones (1992) Phytopathology 82:1354-1358) and Rhynchosporium secalis (Wheeler et al. (1995) Pestic Sci 43:201-209). An ASH probe is designed to form a stable duplex with a nucleic acid target only when base pairing is completely complementary. One or more base-pair mismatches between the probe and target prevents stable hybridization. This holds true for numerous variations of the process. The probe and target molecules are optionally either RNA or denatured DNA; the target molecule(s) is/are any length of nucleotides beyond the sequence that is complementary to the probe; the probe is designed to hybridize with either strand of a DNA target; the probe ranges in size to conform to variously stringent hybridization conditions, etc.

The polymerase chain reaction (PCR) (see, e.g., Mullis KB and Faloona F (1987) Methods Enzymol 155:335-350 and references supra) allows the target sequence for ASH to be amplified from low concentrations of nucleic acid in relatively small volumes (Koenraadt H and Jones AR (1992) Phytopathology 82:1354-1358; Iitia et al. (1994) BioTechniques 17:566-571). Otherwise, the target sequence from genomic DNA is digested with a restriction endonuclease and size separated by gel electrophoresis (Conner et al. 1983). Hybridizations typically occur with the target sequence bound to the surface of a membrane or, as described in U.S. Patent 5,468,613, the ASH probe sequence may be bound to a membrane.

Utilizing nucleotide alleles and polymorphisms described here, ASH data were obtained by amplifying nucleic acid fragments (amplicons) from genomic DNA using PCR, transferring the amplicon target DNA to a membrane in a dot-blot format, hybridizing a labeled oligonucleotide probe to the amplicon target, and observing the hybridization dots by autoradiography. These genetic markers have utility in the improvement of soybean, an important crop plant that supplies much of the world's oil and protein.

Solid-Phase Arrays

In one variant, ASH technologies are adapted to solid phase arrays for the rapid and specific detection of multiple polymorphic nucleotides. Typically, an ASH probe is linked to a solid support and a target nucleic acid (e.g., a genomic nucleic acid, or an amplicon) is hybridized to the probe. Either the probe, or the target, or both, can be labeled, typically with a fluorophore. Where the target is labeled, hybridization is detected by detecting bound fluorescence. Where the probe is labeled, hybridization is typically detected by quenching of the label. Where both the probe and the target are labeled, detection of hybridization is typically performed by monitoring a color shift resulting from proximity of the two bound labels. A variety of labeling strategies, labels, and the like, particularly for fluorescent based applications are described, supra.

In one embodiment, an array of ASH probes is synthesized on a solid support. Using chip masking technologies and photoprotective chemistry it is possible to generate ordered arrays of nucleic acid probes. These arrays, which are known, e.g., as "DNA chips," or as very large scale immobilized polymer arrays ("VLSIPS™" arrays) can include millions of defined probe regions on a substrate having an area of about 1 cm² to several cm².

The construction and use of solid phase nucleic acid arrays to detect target nucleic acids is well described in the literature. See, Fodor et al. (1991) Science 251:767-777; Sheldon et al. (1993) Clinical Chemistry 39(4):718-719; Kozal et al. (1996) Nature Medicine 2(7):753-759 and Hubbell U.S. Pat. No. 5,571,639. See also, Pinkel et al. PCT/US95/16155 (WO 96/17958). In brief, a combinatorial strategy allows for the synthesis of arrays containing a large number of probes using a minimal number of synthetic steps. For instance, it is possible to synthesize and attach all possible DNA 8mer oligonucleotides (4⁸, or 65,536 possible combinations) using only 32 chemical synthetic steps. In general, VLSIPS™ procedures provide a method of producing 4ⁿ different oligonucleotide probes on an array using only 4n synthetic steps.
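The arithmetic behind the combinatorial strategy above (4ⁿ probes from 4n synthetic steps) can be checked with a short Python sketch. The function name `array_capacity` is hypothetical and is used only for illustration.

```python
# Illustrative arithmetic for the combinatorial synthesis described in the
# text: all 4**n oligonucleotides of length n from 4*n synthetic steps
# (one step per base per position).
def array_capacity(n: int) -> tuple:
    """Return (number of distinct n-mer probes, number of synthetic steps)."""
    if n < 1:
        raise ValueError("probe length must be at least 1")
    return 4 ** n, 4 * n


probes, steps = array_capacity(8)
print(f"8-mers: {probes} probes from {steps} steps")
```

For n = 8 this reproduces the figures in the text: 65,536 probes from 32 steps.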

Light-directed combinatorial synthesis of oligonucleotide arrays on a glass surface is performed with automated phosphoramidite chemistry and chip masking techniques similar to photoresist technologies in the computer chip industry. Typically, a glass surface is derivatized with a silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting group. Photolysis through a photolithographic mask is used selectively to expose functional groups which are then ready to react with incoming 5'-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with those sites which are illuminated (and thus exposed by removal of the photolabile blocking group). Thus, the phosphoramidites only add to those areas selectively exposed from the preceding step. These steps are repeated until the desired array of sequences has been synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide analogues at different locations on the array is determined by the pattern of illumination during synthesis and the order of addition of coupling reagents. Monitoring of hybridization of target nucleic acids to the array is typically performed with fluorescence microscopes or laser scanning microscopes. In addition to being able to design, build and use probe arrays using available techniques, one of skill is also able to order custom-made arrays and array-reading devices from manufacturers specializing in array manufacture. For example, Affymetrix Corp. in Santa Clara, CA manufactures DNA VLSIPS™ arrays.

It will be appreciated that probe design is influenced by the intended application. For example, where several allele-specific probe-target interactions are to be detected in a single assay, e.g., on a single DNA chip, it is desirable to have similar melting temperatures for all of the probes. Accordingly, the lengths of the probes are adjusted so that the melting temperatures for all of the probes on the array are closely similar (it will be appreciated that different lengths for different probes may be needed to achieve a particular T_m where different probes have different GC contents). Although melting temperature is a primary consideration in probe design, other factors are optionally used to further adjust probe construction.
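The length adjustment described above can be sketched in Python using the rough estimate given earlier in this description (about 3°C per G-C pair and about 2°C per A-T pair). The function names `estimate_tm` and `shortest_probe`, the `min_len` default, and the choice to extend the probe from the first base of the template are hypothetical simplifications; in practice the polymorphic nucleotide would be positioned within the probe and a more sophisticated T_m model may be preferred.

```python
# Illustrative sketch: choose probe lengths so that probes of differing GC
# content reach a similar estimated T_m, using the rough per-base-pair
# contributions stated in the text (3 C per G-C, 2 C per A-T).
from typing import Optional


def estimate_tm(seq: str) -> int:
    """Rough T_m estimate in degrees C from base composition."""
    s = seq.upper()
    return 3 * (s.count("G") + s.count("C")) + 2 * (s.count("A") + s.count("T"))


def shortest_probe(template: str, target_tm: int, min_len: int = 10) -> Optional[str]:
    """Return the shortest prefix of template whose estimated T_m reaches
    target_tm, or None if the template is too short."""
    for length in range(min_len, len(template) + 1):
        probe = template[:length]
        if estimate_tm(probe) >= target_tm:
            return probe
    return None
```

With a target of 40°C, an A-T-only template needs a 20-base probe while a G-C-only template needs 14 bases, which illustrates why probes of different GC content end up with different lengths.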

Chromosome Painting Technologies--In Situ Hybridization

In one aspect, a marker is used as a chromosome probe to cytogenetically detect the presence of a polymorphic nucleic acid or region linked to the nucleic acid. This can be especially useful because cytogenetic identification of a chromosomal region provides a way of determining the physical location of the region hybridized by the probe, i.e., in reference to other known markers.

Typically, a probe which hybridizes to a polymorphic nucleotide or a linked nucleic acid is chemically linked to a colorimetric label, or fluorophore. The probe is used to paint the chromosome with the color label, thereby identifying regions which are hybridized by the label. Chromosome painting refers to the staining of specific metaphase or prophase chromosomes or regions of chromosomes with probe mixtures, e.g., probes hybridizing to the polymorphic nucleic acids of the invention, and optionally, additional probes hybridizing to additional regions. The painting signal is preferably obtained by fluorescence in situ hybridization (FISH) of such mixtures with the target genome. A variety of staining technologies for the detection of chromosomal differences (typically abnormalities) are known. See, Jauch et al., Hum. Genet., 85:145-150 (1990); Wier et al., Chromosoma, 100:371-376 (1991); Van-den-Engh et al., Cytometry 6:92-100 (1988) and Kaltoft et al., Arch. Dermatol. Res., 279:293-298 (1987); Sealey et al., Nucleic Acids Res. 13:1905 (1985); Landegent et al., Hum. Genet., 77:366 (1987); Nisson et al., BRL Focus, 13:42 (1991). Comparative genomic hybridization (CGH) is also a known approach for identifying the presence and localization of sequences in a genome compared to a reference genome. See, Kallioniemi et al. (1992) Science 258:818. CGH can provide a quantitative estimate of copy number and also provides information regarding the localization of amplified or deleted sequences in a normal chromosome. Many in situ detection techniques are known and can be adapted to the present invention.
These include fluorescent in situ hybridization (FISH), reverse chromosome painting, FISH on DAPI-stained chromosomes, generation of alphoid DNA probes for FISH using PCR, PRINS labeling of DNA, free chromatin mapping, spectral karyotyping and a variety of other techniques described, e.g., in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, parts I and II, Elsevier, New York, and Choo (ed.) (1994) Methods in Molecular Biology Volume 33: In Situ Hybridization Protocols, Humana Press Inc., New Jersey (see also other books in the Methods in Molecular Biology series).

These color-labeling strategies are useful for distinguishing the presence or absence of a chromosomal nucleic acid. They are also useful for the detection of multiple probes with multiple labels. In particular, chromosomes are optionally stained with multiple probes, optionally having multiple color labels. In this way, it is possible to quickly provide a genetic map of a sample at the molecular level. Furthermore, it is possible to determine whether two polymorphic nucleotides from the same locus are present. For example, if two allele-specific probes with different color labels are hybridized to a chromosomal sample under allele-specific hybridization conditions, it is possible specifically to detect both polymorphic nucleotides. For example, where a first probe has a "blue" label, and a second probe has a "yellow" label, a sample which is homozygous for the polymorphic nucleotide specifically bound by the first probe will look "blue" to an observer, a sample which is homozygous for the polymorphic nucleotide specifically bound by the second probe will look "yellow" to an observer, while a sample which is heterozygous and binds both probes will appear "green" to an observer. It will be appreciated that many color combinations are possible. For example, where the first fluorophore emits a "blue" light and a second fluorophore emits a "yellow" light, the effect to the observer is that a "green" signal is observed. It will be appreciated that a wide variety of emission characteristics can be monitored; indeed, even when the fluorophores emit a non-visible wavelength of light, a combination color can be assigned to a ratio between any two (or more, e.g., where more than two probes are used in an assay) wavelengths of light.

Amplification Detection Strategies

In a preferred embodiment, a polymorphic nucleotide is detected by amplifying the polymorphic nucleotide and detecting the resulting amplicon. Several variations on this strategy are used to detect polymorphic nucleic acids, depending on the materials available, and the like.

(1) PCR

In one embodiment, nucleic acid primers which hybridize to regions of a genomic nucleic acid that flank a polymorphic nucleotide to be detected are used in PCR or LCR reactions to generate an amplicon comprising the polymorphic nucleotide. A variety of PCR and LCR strategies are known in the art and are found in Berger, Sambrook, Ausubel, and Innis, all supra. See also Mullis et al. (1987) U.S. Patent No. 4,683,202. In brief, a nucleic acid having a polymorphic nucleic acid to be detected (a genomic DNA, a genomic clone, a genomic amplicon or the like) is hybridized to primers which flank the polymorphic nucleotide to be detected (e.g., nucleotide polymorphisms at a locus such as pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A and/or SOYBPSP, all described supra). Example primers which amplify the polymorphic nucleic acids are provided in the examples section below. The primers are extended in a PCR reaction (typically including a thermostable polymerase enzyme such as Taq, deoxynucleotides, Mg²⁺ and the like; see Ausubel, Innis, Berger or Sambrook for typical PCR conditions). The resulting PCR amplicons comprise the polymorphic nucleic acid to be detected.
Exemplar amplicons include pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395. Methods of detecting PCR amplicons are known, and can easily be adapted to detecting the amplicons of the invention. Detection is typically performed by running PCR reaction products out on an acrylamide or agarose gel and detecting the size of the reaction products; alternatively, the products can be detected by allele-specific hybridization, by allele-specific hybridization to a polymer array as described supra, or by sequencing the PCR amplicons (using standard Sanger dideoxy or Maxam-Gilbert methods). The polymorphic nucleotides in amplicons are optionally detected by cleaving the amplicon with a restriction enzyme that recognizes the polymorphic nucleic acid, in an adaptation of standard RFLP analysis.
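The restriction-enzyme adaptation of RFLP analysis described above can be illustrated with a minimal sketch. The two 30-base amplicon sequences, the choice of the EcoRI recognition site (GAATTC, cut after the first G) and the function names are assumptions made purely for illustration; they are not sequences or primers of the invention.

```python
def cut_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def fragment_lengths(seq, site, cut_offset):
    """Fragment lengths after cutting `cut_offset` bases into each site.

    The site used here is palindromic, so scanning one strand is sufficient.
    """
    cuts = [p + cut_offset for p in cut_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical amplicons that differ at a single nucleotide (A versus G).
ALLELE_1 = "ATGCCGTAGCTAGAATTCTTGACCGATAGC"  # contains GAATTC
ALLELE_2 = "ATGCCGTAGCTAGAGTTCTTGACCGATAGC"  # site destroyed by the A->G change

pattern_1 = fragment_lengths(ALLELE_1, "GAATTC", cut_offset=1)
pattern_2 = fragment_lengths(ALLELE_2, "GAATTC", cut_offset=1)
```

Under these assumptions the first allele yields two fragments and the second remains uncut, which is the size difference that would be read from a gel.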

In addition to the example primer nucleic acids in the examples section below, one of skill is easily able to select a variety of other primers which can be used in PCR amplification of nucleic acids comprising or proximal to polymorphic nucleotides. In particular, methods of amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369:684-685 and the references therein, in which PCR amplicons of up to 40 kb are generated. More typically, standard PCR is used to create amplicons of between about 100 and about 5,000 nucleotides in length, e.g., using the techniques described in Ausubel and Innis, supra. In any case, primers that hybridize to essentially any region of an amplicon made using the primers of the invention are designed by reference to the sequence of the amplicon. The sequences of the primers are selected to hybridize to regions of the amplicon.

Amplicons are sequenced by any of a variety of protocols. Most DNA sequencing today is carried out by chain termination methods, the most popular of which are variants of the dideoxynucleotide-mediated chain termination method of Sanger. See Sanger et al. (1977) Proc. Natl. Acad. Sci. USA 74:5463-5467. For a simple introduction to dideoxy sequencing, see Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (Supplement 37, current through 1997) (Ausubel), Chapter 7. Thousands of laboratories employ dideoxynucleotide chain termination techniques. Commercial kits containing the reagents most typically used for these methods of DNA sequencing are available and widely used.

In addition to the Sanger methods of chain termination, new PCR exonuclease digestion methods are available for DNA sequencing of PCR amplicons. Direct sequencing of PCR-generated amplicons by selectively incorporating boronated nuclease-resistant nucleotides into the amplicons during PCR and digestion of the amplicons with a nuclease to produce sized template fragments has been developed (Porter et al. (1997) Nucleic Acids Research 25(8):1611-1617). In the methods, 4 PCR reactions on a template are performed, in each of which one of the nucleotide triphosphates in the PCR reaction mixture is partially substituted with a 2'-deoxynucleoside 5'-α-[P-borano]-triphosphate. The boronated nucleotide is stochastically incorporated into PCR products at varying positions along the PCR amplicon. An exonuclease which is blocked by incorporated boronated nucleotides is used to cleave the PCR amplicons. The cleaved amplicons are then separated by size using polyacrylamide gel electrophoresis, providing the sequence of the amplicon. An advantage of this method is that it requires fewer biochemical manipulations for sequencing an amplicon than performing standard Sanger-style sequencing of PCR amplicons.
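The read-out logic of the four-reaction exonuclease method can be sketched as follows. This is an idealized illustration, not the published protocol: it assumes that across the population of molecules in each reaction every occurrence of the substituted base is boronated in at least some strands, that the exonuclease digests 3' to 5' and halts exactly at a boronated nucleotide, and that the primer region is ignored. The template sequence and function names are hypothetical.

```python
def lane_fragments(template, base):
    """Fragment lengths expected in the reaction where `base` is partly boronated.

    Idealized assumption: digestion from the 3' end stops at a boronated
    nucleotide, so one fragment class ends at every position of `base`.
    """
    return [i + 1 for i, b in enumerate(template.upper()) if b == base]


def read_sequence(lanes):
    """Rebuild the sequence by ordering fragment lengths across the four lanes."""
    by_length = {length: base
                 for base, lengths in lanes.items()
                 for length in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


TEMPLATE = "GATTACAGGC"  # hypothetical amplicon strand, 5' to 3'
LANES = {base: lane_fragments(TEMPLATE, base) for base in "ACGT"}
RECOVERED = read_sequence(LANES)
```

Each fragment length appears in exactly one lane, so sorting the lengths and reading off the lane labels reproduces the strand, which is the same ladder-reading step used with Sanger gels.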

Once an amplicon is sequenced, the sequence is optionally used to select primers complementary to the amplicon, i.e., primers which will hybridize to the amplicon. It is expected that one of skill is thoroughly familiar with the theory and practice of nucleic acid hybridization and primer selection. Gait, ed. Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford (1984); W.H.A. Kuijpers Nucleic Acids Research 18(17), 5197 (1994); K.L. Dueholm J. Org. Chem. 59, 5767-5773 (1994); S. Agrawal (ed.) Methods in Molecular Biology, volume 20; and Tijssen (1993) Laboratory Techniques in biochemistry and molecular biology-hybridization with nucleic acid probes, e.g., part I chapter 2 "overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York provide a basic guide to nucleic acid hybridization. Innis, supra, provides an overview of primer selection.

One of skill will recognize that the 3' end of an amplification primer is more important for PCR than the 5' end. Investigators have reported PCR products where only a few nucleotides at the 3' end of an amplification primer were complementary to a DNA to be amplified. In this regard, nucleotides at the 5' end of a primer can incorporate structural features unrelated to the target nucleic acid; for instance, in one embodiment, a sequencing primer hybridization site (or a complement to such a primer, depending on the application) is incorporated into the amplification primer, where the sequencing primer is derived from a primer used in a standard sequencing kit, such as one using a biotinylated or dye-labeled universal M13 or SP6 primer. The primers are typically selected so that there is no complementarity between any known target sequence and any constant primer region. One of skill will appreciate that constant regions in primer sequences are optional.
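A minimal sketch of building such a tailed primer is given below. The 17-base tail is the commonly cited M13 forward (-20) universal primer sequence and should be checked against the sequencing kit actually used; the target sequence, the 8-base screening window and the function names are assumptions for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def shares_kmer(tail, target, k=8):
    """True if any k-mer of `tail` occurs in `target` or its reverse complement."""
    both = target.upper() + "N" + revcomp(target)
    tail = tail.upper()
    return any(tail[i:i + k] in both for i in range(len(tail) - k + 1))


def tailed_primer(tail, specific, target, k=8):
    """Join a constant 5' tail to a target-specific 3' portion.

    Raises ValueError when the constant region resembles the target.
    """
    if shares_kmer(tail, target, k):
        raise ValueError("constant region resembles the target sequence")
    return tail.upper() + specific.upper()


M13_FORWARD = "GTAAAACGACGGCCAGT"                # commonly cited M13 (-20) primer
TARGET = "ATGCCGTAGCTAGAATTCTTGACCGATAGC"        # hypothetical target strand
SPECIFIC = TARGET[:12]                           # hypothetical 3' specific portion
PRIMER = tailed_primer(M13_FORWARD, SPECIFIC, TARGET)
```

The 3' portion carries the specificity, while the tail only supplies a site for the universal sequencing primer; the k-mer screen is a crude stand-in for a full complementarity check.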

Typically, all primer sequences are selected to hybridize only to a perfectly complementary DNA, with the nearest mismatch hybridization possibility from known DNA sequence typically having at least about 50 to 70% hybridization mismatches, and preferably 100% mismatches for the terminal 5 nucleotides at the 3' end of the primer.
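The mismatch criterion above can be expressed as a short screening routine. This is a sketch under stated assumptions: alignments are ungapped, the background is written in the same sense as the primer (so a perfect binding site reads identically to the primer), the 50% threshold is taken from the lower end of the range given above, and all sequences and function names are hypothetical.

```python
def mismatch_profile(primer, site):
    """Return (overall mismatch fraction, mismatches in the 3'-terminal 5 bases)."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    flags = [a != b for a, b in zip(primer.upper(), site.upper())]
    return sum(flags) / len(flags), sum(flags[-5:])


def nearest_off_target(primer, background):
    """Profile of the best-matching ungapped window of `background`."""
    n = len(primer)
    windows = (background[i:i + n] for i in range(len(background) - n + 1))
    return min((mismatch_profile(primer, w) for w in windows),
               key=lambda profile: profile[0])


def passes(primer, background, min_fraction=0.5):
    """Apply both criteria: overall mismatch fraction and a fully mismatched 3' end."""
    fraction, three_prime = nearest_off_target(primer, background)
    return fraction >= min_fraction and three_prime == 5


CANDIDATE = "ACGTACGGAC"                           # hypothetical primer
UNRELATED = "T" * 15                               # hypothetical background
CONTAINS_SITE = "GG" + CANDIDATE + "TT"            # background with a perfect site
```

A primer is accepted only when even its closest off-target window is heavily mismatched and none of the five 3'-terminal bases can pair.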

The primers are selected so that no secondary structure forms within the primer. Self-complementary primers have poor hybridization properties, because the complementary portions of the primers self-hybridize (i.e., form hairpin structures). Primers are selected to have minimal cross-hybridization, thereby preventing competition between individual primers and a template nucleic acid and preventing duplex formation of the primers in solution, and possible concatenation of the primers during PCR. If there is more than one constant region in the primer, the constant regions of the primer are selected so that they do not self-hybridize or form hairpin structures.

One of skill will recognize that there are a variety of possible ways of performing the above selection steps, and that variations on the steps are appropriate. Most typically, selection steps are performed using simple computer programs to perform the selection as outlined above; however, all of the steps are optionally performed manually. One available computer program for primer selection is the MacVector™ program from Kodak. In addition to programs for primer selection, one of skill can easily design simple programs for any or all of the preferred selection steps.
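As an illustration of the kind of simple program contemplated above, the sketch below screens primers for hairpin potential and for cross-hybridization between two primers. The stem length, loop length and example sequences are hypothetical defaults chosen for illustration; a practical tool would also weigh melting temperature and 3'-end stability.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_hairpin(primer, stem=4, min_loop=3):
    """True if a `stem`-base arm can pair with a downstream segment of the primer."""
    p = primer.upper()
    for i in range(len(p) - 2 * stem - min_loop + 1):
        arm = p[i:i + stem]
        if revcomp(arm) in p[i + stem + min_loop:]:
            return True
    return False


def longest_complementary_run(a, b):
    """Longest contiguous stretch of `a` able to base-pair antiparallel with `b`."""
    a, rb = a.upper(), revcomp(b)
    best = 0
    for i in range(len(a)):
        for j in range(len(rb)):
            k = 0
            while i + k < len(a) and j + k < len(rb) and a[i + k] == rb[j + k]:
                k += 1
            best = max(best, k)
    return best


def cross_hybridizes(p1, p2, max_run=3):
    """Flag primer pairs whose complementary run exceeds `max_run` bases."""
    return longest_complementary_run(p1, p2) > max_run
```

Calling `has_hairpin` on each primer and `cross_hybridizes` on every primer pair gives a first-pass filter before any manual review.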

One of skill will recognize that a wide variety of amplicons are provided by the present invention. In particular, amplicons are generated with the primers described herein. The amplicons can be generated by exponential amplification as described in the examples herein, or by linear amplification using a single specific primer, or by using one of the example primers below in conjunction with a set of random primers.

It will be appreciated that the amplicons are characterized by a variety of physicochemical properties, including, but not limited to the following. First, the amplicons of the invention are produced in an amplification reaction using the primers as described above, with genomic soybean nucleic acid as a template (or a derivative thereof, such as a cloned or in vitro amplified genomic nucleic acid). Second, single stranded forms of the amplicons (e.g., denatured amplicons) hybridize under stringent conditions to the template nucleic acid. Conditions for specific hybridization of nucleic acids, including amplicon nucleic acids, are described above. A third physicochemical property of amplicons of the invention is that they specifically hybridize to one or more of the primers in the examples section below. In particular, the primers used to make the amplicon will hybridize to the amplicon; indeed, in PCR amplification strategies, hybridization of the primers to the amplicon is usually required for amplification. Additional physicochemical properties of the amplicons are described in the examples section, where example amplicons are described with reference, e.g., to size and hybridization to particular primers.

(2) LCR

In another embodiment, LCR is used to amplify specifically a polymorphic nucleic acid. By detecting the amplification product, presence of the polymorphic nucleotide is confirmed. Detection is typically performed by running LCR reaction products out on an acrylamide or agarose gel and detecting the size of the reaction products; alternatively, the products can be detected by allele-specific hybridization, by allele-specific hybridization to a polymer array as described supra, or by sequencing the LCR amplicons (using standard Sanger dideoxy or Maxam-Gilbert methods). Detection techniques such as PCR amplification or other in vitro amplification methods are also used to detect LCR products.

The ligation chain reaction (LCR; sometimes denoted the "ligation amplification reaction" or "LAR") and related techniques are used as diagnostic methods for detecting single nucleotide variations in target nucleic acids. LCR provides a mechanism for linear or exponential amplification of a target nucleic acid via ligation of complementary oligonucleotides hybridized to a target. This amplification is performed to distinguish target nucleic acids that differ by a single nucleotide, providing a powerful tool for the analysis of genetic variation in the present invention, i.e., for distinguishing polymorphic nucleotides.

The principle underlying LCR is straightforward: oligonucleotides which are complementary to adjacent segments of a target nucleic acid are brought into proximity by hybridization to the target, and ligated using a ligase. To achieve linear amplification of the nucleic acid, a single pair of oligonucleotides which hybridize to adjoining areas of the target sequence is employed: the oligonucleotides are ligated, denatured from the template and the reaction is repeated. To achieve exponential amplification of the target nucleic acid, two pairs of oligonucleotides (or more) are used, each pair hybridizing to complementary sequences on, e.g., a double-stranded target polynucleotide. After ligation and denaturation, the target and each of the ligated oligonucleotide pairs serve as templates for hybridization of the complementary oligonucleotides to achieve ligation. The ligase enzyme used in performing LCR is typically thermostable, allowing for repeated denaturation of the template and ligated oligonucleotide complex by heating the ligation reaction. LCR is useful as a diagnostic tool in the detection of genetic variation.
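The difference between the linear and exponential modes described above can be made concrete with an idealized count of ligated products per cycle. The model below is a sketch only: it assumes a fixed per-cycle ligation efficiency, unlimited oligonucleotides and no template loss, and the function and parameter names are hypothetical.

```python
def ligated_products(cycles, exponential, templates=1, efficiency=1.0):
    """Expected number of ligated product molecules after `cycles` rounds.

    Linear mode: only the original target strands serve as templates.
    Exponential mode: each ligated product also templates ligation of the
    complementary oligonucleotide pair in later cycles.
    """
    active = float(templates)
    products = 0.0
    for _ in range(cycles):
        new = active * efficiency
        products += new
        if exponential:
            active += new
    return products


LINEAR_20 = ligated_products(20, exponential=False)
EXPONENTIAL_20 = ligated_products(20, exponential=True)
```

With perfect efficiency and one starting template, n cycles give n products in linear mode and 2^n - 1 in exponential mode, which is why two oligonucleotide pairs are preferred when sensitivity matters.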

Using LCR methods, it is possible to distinguish between target polynucleotides which differ by a single nucleotide at the site of ligation. Ligation occurs only between oligonucleotides hybridized to a target polynucleotide where the complementarity between the oligonucleotides and the target is perfect, enabling differentiation between allelic variants of a gene or other chromosomal sequence. The specificity of ligation during LCR can be increased by substituting the more specific NAD⁺-dependent ligases such as E. coli ligase and (thermostable) Taq ligase for the less specific T4 DNA ligase. The use of NAD analogues in the ligation reaction further increases specificity of the ligation reaction. See U.S. Pat. No. 5,508,179 to Wallace et al.
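The requirement for perfect complementarity at the ligation site can be sketched as a simple allele call. The model is deliberately strict (any mismatch anywhere in the joined oligonucleotides blocks ligation), and the sequences, oligonucleotides and function names are hypothetical examples rather than reagents of the invention.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ligates(upstream, downstream, target):
    """True if the two oligos anneal adjacently and perfectly on `target`.

    `target` is the strand the oligos hybridize to, written 5' to 3'.
    """
    return revcomp(upstream + downstream) in target.upper()


def call_alleles(target, upstream_by_allele, downstream):
    """Alleles whose allele-specific upstream oligo can be ligated on `target`."""
    return sorted(allele for allele, up in upstream_by_allele.items()
                  if ligates(up, downstream, target))


# Hypothetical sense strands differing at one position (A versus G).
SENSE_A = "ATGCCGTAGCTAGAATTCTTGACCGATAGC"
SENSE_G = "ATGCCGTAGCTAGAGTTCTTGACCGATAGC"
# The allele-specific base is the 3'-terminal base of the upstream oligo.
UPSTREAM = {"A": "CCGTAGCTAGAA", "G": "CCGTAGCTAGAG"}
DOWNSTREAM = "TTCTTGACCG"

CALL_A = call_alleles(revcomp(SENSE_A), UPSTREAM, DOWNSTREAM)
CALL_G = call_alleles(revcomp(SENSE_G), UPSTREAM, DOWNSTREAM)
```

Placing the polymorphic base at the junction means a single mismatch removes the ligation product entirely, so presence or absence of product identifies the allele.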

Finally, multiple LCR reactions can be run simultaneously in a single reaction, or in parallel reactions for simultaneous detection of any or all of the nucleotide polymorphisms described herein.

(3) TAS, 3SR and Qβ amplification

Nucleotide polymorphisms are also detected using other in vitro detection methods, including TAS, 3SR and Qβ amplification. The transcription-based amplification system (TAS), the self-sustained sequence replication system (3SR) and the Qβ replicase amplification system (Qβ) are reviewed in The Journal of NIH Research (1991) 3, 81-94. The present invention may be practiced in conjunction with TAS (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173) or the related 3SR (Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874) for detecting single-base alterations in target nucleic acids by transcribing the target, annealing oligonucleotide primers to the transcript and ligating the annealed primers. Qβ replication (Lomell et al. (1989) J. Clin. Chem 35, 1826) may also be used in conjunction with the ligation methods of the present invention to detect mismatches by performing Qβ amplification on DNA ligated by the methods of the present invention.

Labeling and Detecting Probes

A probe for use in an in situ detection procedure, an in vitro amplification procedure (PCR, LCR, NASBA, etc.), hybridization techniques (allele-specific hybridization, in situ analysis, Southern analysis, northern analysis, etc.) or any other detection procedure herein can be labeled with any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Useful labels in the present invention include spectral labels such as fluorescent dyes (e.g., fluorescein isothiocyanate, Texas red, rhodamine, digoxigenin, biotin, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, ³²P, ³³P, etc.), enzymes (e.g., horse-radish peroxidase, alkaline phosphatase, etc.), and spectral colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. The label may be coupled directly or indirectly to a component of the detection assay (e.g., a probe, primer, amplicon, YAC, BAC or the like) according to methods well known in the art. As indicated above, a wide variety of labels may be used, with the choice of label depending on sensitivity required, ease of conjugation with the compound, stability requirements, available instrumentation, and disposal provisions. In general, a detector which monitors a probe-target nucleic acid hybridization is adapted to the particular label which is used. Typical detectors include spectrophotometers, phototubes and photodiodes, microscopes, scintillation counters, cameras, film and the like, as well as combinations thereof. Examples of suitable detectors are widely available from a variety of commercial sources known to persons of skill. Commonly, an optical image of a substrate comprising a nucleic acid array with a particular set of probes bound to the array is digitized for subsequent computer analysis.
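As one example of the computer analysis of digitized signals mentioned above, a two-label assay with one allele-specific probe per label can be scored from the ratio of the two channel intensities. The sketch below is illustrative only: the background subtraction, the ratio cutoff of 3 and the function name are assumptions, and real thresholds would be calibrated against control samples of known genotype.

```python
def call_genotype(signal_1, signal_2, background=0.0, ratio_cutoff=3.0):
    """Classify a sample from two label intensities (arbitrary units).

    `signal_1` is the channel of the probe specific for allele 1 and
    `signal_2` the channel of the probe specific for allele 2.
    """
    s1 = max(signal_1 - background, 0.0)
    s2 = max(signal_2 - background, 0.0)
    if s1 == 0.0 and s2 == 0.0:
        return "no call"
    if s2 == 0.0 or s1 / s2 >= ratio_cutoff:
        return "homozygous allele 1"
    if s1 == 0.0 or s2 / s1 >= ratio_cutoff:
        return "homozygous allele 2"
    return "heterozygous"
```

For instance, `call_genotype(900, 50)` would be scored as homozygous for allele 1, while comparable signals in both channels would be scored as heterozygous.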

Because incorporation of radiolabeled nucleotides into nucleic acids is straightforward, radiolabeling represents a preferred labeling strategy. Exemplar technologies for incorporating radiolabels include end-labeling with a kinase or phosphatase enzyme, nick translation, incorporation of radioactive nucleotides with a polymerase and many other well known strategies.

Fluorescent labels are also preferred labels, having the advantage of requiring fewer precautions in handling. Preferred labels are typically characterized by one or more of the following: high sensitivity, high stability, low background, low environmental sensitivity and high specificity in labeling. Fluorescent moieties, which are incorporated into the labels of the invention, are generally known, including Texas red, digoxigenin, biotin, 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bis-benzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts, hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen, 7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins, triarylmethanes and flavin. Individual fluorescent compounds which have functionalities for linking to an element desirably detected in an apparatus or assay of the invention, or which can be modified to incorporate such functionalities include, e.g., dansyl chloride; fluoresceins such as 3,6-dihydroxy-9-phenylxanthydrol; rhodamineisothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl 2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2'-disulfonic acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl-N-methyl-2-aminonaphthalene-6-sulfonate; ethidium bromide; stebrine; auromine-0,2-(9'-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N'-dioctadecyl oxacarbocyanine; N,N'-dihexyl oxacarbocyanine; merocyanine, 4-(3'-pyrenyl)stearate; d-3-aminodesoxy-equilenin; 12-(9'-anthroyl)stearate; 2-methylanthracene; 9-vinylanthracene; 2,2'(vinylene-p-phenylene)bisbenzoxazole; p-bis(2-(4-methyl-5-phenyl-oxazolyl))benzene; 6-dimethylamino-1,2-benzophenazin; retinol; bis(3'-aminopyridinium) 1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellebrigenin; chlorotetracycline; N-(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; N-(p-(2-benzimidazolyl)-phenyl)maleimide; N-(4-fluoranthyl)maleimide; bis(homovanillic acid); resazurin; 4-chloro-7-nitro-2,1,3-benzooxadiazole; merocyanine 540; resorufin; rose bengal; and 2,4-diphenyl-3(2H)-furanone. Many fluorescent tags are commercially available from SIGMA Chemical Company (Saint Louis, MO), Molecular Probes, R&D Systems (Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway, NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes Corp., Aldrich Chemical Company (Milwaukee, WI), Glen Research, Inc., GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland), and Applied Biosystems (Foster City, CA) as well as other commercial sources known to one of skill.

In one embodiment, nucleic acids are labeled by culturing recombinant cells which encode the nucleic acid in a growth medium containing fluorescent or radioactive nucleotide analogues, resulting in the production of labeled nucleic acids. Similarly, labeled nucleic acids are synthesized in vitro using a primer and a DNA polymerase such as Taq. For example, Hawkins et al. U.S. Pat. No. 5,525,711 describes pteridine nucleotide analogs for use in fluorescent DNA probes, including PCR amplicons.

The label is coupled directly or indirectly to a molecule to be detected (a product, substrate, enzyme, or the like) according to methods well known in the art. As indicated above, a wide variety of labels are used, with the choice of label depending on the sensitivity required, ease of conjugation of the compound, stability requirements, available instrumentation, and disposal provisions. Non-radioactive labels are often attached by indirect means. Generally, a ligand molecule (e.g., biotin) is covalently bound to a nucleic acid such as a probe, primer, amplicon, YAC, BAC or the like. The ligand then binds to an anti-ligand (e.g., streptavidin) molecule which is either inherently detectable or covalently bound to a signal system, such as a detectable enzyme, a fluorescent compound, or a chemiluminescent compound. A number of ligands and anti-ligands can be used. Where a ligand has a natural anti-ligand, for example, biotin, thyroxine, and cortisol, it can be used in conjunction with labeled anti-ligands. Alternatively, any haptenic or antigenic compound can be used in combination with an antibody. Labels can also be conjugated directly to signal generating compounds, e.g., by conjugation with an enzyme or fluorophore or chromophore. Enzymes of interest as labels will primarily be hydrolases, particularly phosphatases, esterases and glycosidases, or oxidoreductases, particularly peroxidases. Fluorescent compounds include fluorescein and its derivatives, rhodamine and its derivatives, dansyl, umbelliferone, etc. Chemiluminescent compounds include luciferin and 2,3-dihydrophthalazinediones, e.g., luminol. Means of detecting labels are well known to those of skill in the art. Thus, for example, where the label is a radioactive label, means for detection include a scintillation counter or photographic film as in autoradiography.

Making Transgenic Plants With Nucleic Acids Linked to Selected Loci

Nucleic acids which are genetically linked to the loci described herein are optionally cloned and transduced into cells, especially to make transgenic plants. In particular, nucleic acids linked to the loci pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A or SOYBPSP are cloned and transduced into plants. The cloned sequences are useful as molecular tags for selected plant strains, and are further useful for encoding polypeptides. Often, these polypeptides are encoded by a QTL and are responsible for the phenotypic effects of the QTL.

The nucleic acids linked to a selected locus or selected loci are introduced into plant cells, either in culture or in organs of a plant, e.g., leaves, stems, fruit, seed, etc. The expression of natural or synthetic nucleic acids encoded by nucleic acids linked to polymorphic nucleic acids can be achieved by operably linking a nucleic acid of interest to a promoter, incorporating the construct into an expression vector, and introducing the vector into a suitable host cell. Alternatively, an endogenous promoter linked to the nucleic acids can be used.

Typical vectors contain transcription and translation terminators, transcription and translation initiation sequences, and promoters useful for regulation of the expression of the particular nucleic acid. The vectors optionally comprise generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the cassette in eukaryotes, prokaryotes, or both (e.g., shuttle vectors) and selection markers for both prokaryotic and eukaryotic systems. Vectors are suitable for replication and integration in prokaryotes, eukaryotes, or preferably both. See Giliman & Smith, Gene 8:81 (1979); Roberts et al., Nature, 328:731 (1987); Schneider, B., et al., Protein Expr. Purif. 6435:10 (1995); Berger, Sambrook, Ausubel (all supra).

Cloning of QTL Linked Sequences into Bacterial Hosts

There are several well-known methods of introducing nucleic acids into bacterial cells, any of which may be used in the present invention. These include fusion of the recipient cells with bacterial protoplasts containing the DNA, electroporation, projectile bombardment, and infection with viral vectors. Bacterial cells are often used to amplify the number of plasmids containing DNA constructs of this invention. The bacteria are grown to log phase and the plasmids within the bacteria can be isolated by a variety of methods known in the art (see, for instance, Sambrook). In addition, a plethora of kits are commercially available for the purification of plasmids from bacteria; these are used according to the manufacturer's instructions (see, for example, Easy Prep™ and FlexiPrep™, both from Pharmacia Biotech; StrataClean™, from Stratagene; and QIAexpress Expression System, Qiagen). The isolated and purified plasmids are then further manipulated to produce other plasmids, used to transfect plant cells, or incorporated into Agrobacterium tumefaciens to infect plants.

The in vitro delivery of nucleic acids into bacterial hosts can be to any cell grown in culture. Contact between the cells and the genetically engineered nucleic acid constructs, when carried out in vitro, takes place in a biologically compatible medium. The concentration of nucleic acid varies widely depending on the particular application, but is generally between about 1 μM and about 10 mM. Treatment of the cells with the nucleic acid is generally carried out at physiological temperatures (about 37 °C) for periods of time of from about 1 to 48 hours. Alternatively, a nucleic acid operably linked to a promoter to form a fusion gene is expressed in bacteria such as E. coli and its gene product isolated and purified.

Transfecting Plant Cells

To use isolated sequences in the above techniques, recombinant DNA vectors suitable for transformation of plant cells are prepared. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, for example, Weising et al., Ann. Rev. Genet. 22:421-477 (1988). A DNA sequence coding for the desired mRNA, polypeptide, or non-expressed tagging sequence is transduced into the plant. Where the sequence is expressed, the sequence is optionally combined with transcriptional and translational initiation regulatory sequences which will direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.

Promoters in nucleic acids linked to the above loci are identified, e.g., by analyzing the 5' sequences upstream of a coding sequence in linkage disequilibrium with the loci. Optionally, such nucleic acids will be associated with a QTL. Sequences characteristic of promoter sequences can be used to identify the promoter. Sequences controlling eukaryotic gene expression have been extensively studied. For instance, promoter sequence elements include the TATA box consensus sequence (TATAAT), which is usually 20 to 30 base pairs upstream of a transcription start site. In most instances the TATA box aids in accurate transcription initiation. In plants, further upstream from the TATA box, at positions -80 to -100, there is typically a promoter element with a series of adenines surrounding the trinucleotide G (or T) N G. See, e.g., J. Messing et al., in GENETIC ENGINEERING IN PLANTS, pp. 221-227 (Kosage, Meredith and Hollaender, eds. (1983)).
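A search for the TATA box consensus at the stated distance from a transcription start site can be sketched as follows. The sketch assumes an exact match to the consensus given above (TATAAT), measures distance from the first base of the motif to the start site, and uses a hypothetical test sequence; real promoter analysis would tolerate degenerate matches and would also examine the upstream element at -80 to -100.

```python
import re

# Lookahead so that overlapping occurrences of the motif are all reported.
TATA_BOX = re.compile(r"(?=TATAAT)")


def find_tata(seq, tss, min_dist=20, max_dist=30):
    """Distances (bp upstream of `tss`) at which a TATA box consensus begins.

    `tss` is the 0-based index of the transcription start site in `seq`.
    """
    upstream = seq.upper()[:tss]
    distances = [tss - m.start() for m in TATA_BOX.finditer(upstream)]
    return [d for d in distances if min_dist <= d <= max_dist]


# Hypothetical sequence: motif starts at index 50, start site at index 75.
EXAMPLE = "G" * 50 + "TATAAT" + "C" * 19 + "A" + "G" * 20
HITS = find_tata(EXAMPLE, tss=75)
```

A hit inside the window supports, but does not prove, the presence of a promoter; the absence of a hit in a sequence with a known start site may simply reflect a variant TATA element.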

A number of methods are known to those of skill in the art for identifying and characterizing promoter regions in plant genomic DNA. See, e.g., Jordano, et al, Plant Cell 1:855-866 (1989); Bustos, et al, Plant Cell 1:839-854 (1989); Green, et al, EMBO J. 7:4035-4044 (1988); Meier, et al, Plant Cell 3:309-316 (1991); and Zhang, et al, Plant Physiology 110: 1069-1079 (1996).

In construction of recombinant expression cassettes of the invention, a plant promoter fragment is optionally employed which directs expression of the gene in all tissues of a regenerated plant. Such promoters are referred to herein as "constitutive" promoters and are active under most environmental conditions and states of development or cell differentiation. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill.

Alternatively, the plant promoter may direct expression of the polynucleotide of the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental control (inducible promoters). Examples of tissue-specific promoters under developmental control include promoters that initiate transcription only in certain tissues, such as fruit, seeds, or flowers.

If polypeptide expression is desired, a polyadenylation region at the 3'-end of the coding region is typically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes, or from T-DNA.

The vector comprising the sequences (e.g., promoters or coding regions) from genes of the invention will typically comprise a marker gene which confers a selectable phenotype on plant cells. For example, the marker can encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorsulfuron or Basta.

Introduction of the Nucleic Acids into Plant Cells

The DNA constructs of the invention are introduced into plant cells, either in culture or in the organs of a plant, by a variety of conventional techniques. For example, the DNA construct can be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant cells using ballistic methods, such as DNA particle bombardment. Alternatively, the DNA constructs are combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host direct the insertion of the construct and adjacent markers into the plant cell DNA when the cell is infected by the bacteria.

Microinjection techniques are known in the art and well described in the scientific and patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski, et al, EMBO J. 3:2717 (1984). Electroporation techniques are described in Fromm, et al, Proc. Nat'l. Acad. Sci. USA 82:5824 (1985). Ballistic transformation techniques are described in Klein, et al, Nature 327:70-73 (1987).

Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary vectors, are also well described in the scientific literature. See, for example Horsch, et al, Science 233:496-498 (1984), and Fraley, et al, Proc. Nat'l. Acad. Sci. USA 80:4803 (1983). Agrobacterium-mediated transformation is a preferred method of transformation of dicots.

Generation of Transgenic Plants

Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant which possesses the transformed genotype and thus the desired phenotype. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans, et al., Protoplasts Isolation and Culture. Handbook of Plant Cell Culture, pp. 124-176, Macmillan Publishing Company, New York, (1983); and Binding, REGENERATION OF PLANTS, PLANT PROTOPLASTS, pp. 21-73, CRC Press, Boca Raton, (1985). Regeneration can also be obtained from plant callus, explants, somatic embryos (Dandekar, et al, J. Tissue Cult. Meth. 12:145 (1989); McGranahan, et al, Plant Cell Rep. 8:512 (1990)), organs, or parts thereof. Such regeneration techniques are described generally in Klee, et al, Ann. Rev. of Plant Phys. 38:467-486 (1987).

One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.

Discussion of the Accompanying Sequence Listing

The accompanying sequence listing provides complete or partial sequences for a number of amplicons comprising the marker loci herein and for various primers and probe sequences useful in allele-specific hybridization, PCR and the like. The information is presented in DNA sequences. One of skill will readily understand that the sequence also fully describes the complementary strand of the provided DNA, i.e., by using standard base-pairing rules, the sequence of complementary DNA is provided and can be written out by any competent practitioner in the art. RNAs having the same sequence are provided by substituting "T" residues with "U" residues, and RNAs corresponding to the complementary strand are similarly provided. A variety of conservatively modified variations of the sequences are also fully provided. For example, coding regions denoted by open reading frames, beginning with the start codon "ATG" coding for methionine and optionally ending with a stop codon (TAA, TAG or TGA) can generally be modified by substituting codons which equivalently code for the same amino acid. The genetic code is well known, and is found in essentially all modern textbooks on Molecular Biology. See, e.g., Lewin (1995) Genes V Oxford University Press Inc., NY (Lewin); and Watson et al. (1992) Recombinant DNA Second Edition Scientific American Books, NY. Accordingly, although the given sequences are preferred because of their hybridization characteristics (i.e., because the given sequences hybridize to genomic soybean DNA at polymorphic loci) one of skill will recognize that, for coding purposes, any coding sequence can be equivalently represented by any sequence having equivalent codons, and that recitation of a single sequence provides all of these coding sequences. In the interest of not providing clearly redundant information, all possible coding sequences are not written out separately. 
However, one of skill can easily do so with the information provided, by simple reference to the genetic code. Simple computer programs can also be used to list any or all such nucleic acids, given the provided sequence. For example, coding regions where the nucleotides TTT (coding for phenylalanine) appear are optionally substituted with TTC, and vice-versa. The codons TTA, TTG, CTT, CTC, CTA, CTG (coding for leucine) are optionally substituted for one another, in any combination. Coding regions where ATT, ATC or ATA appear (all coding for isoleucine) are optionally substituted for one another. The codons GTT, GTC, GTA and GTG (all coding for valine) are all optionally substituted for one another. The codons TCT, TCC, TCA, AGT, TCG and AGC (all coding for serine) are optionally substituted for one another. The codons CCT, CCC, CCA and CCG (all coding for proline) are optionally substituted for one another. The codons ACT, ACC, ACA, and ACG (all coding for threonine) are optionally substituted for one another. The codons GCT, GCC, GCA and GCG (all coding for alanine) are optionally substituted for one another. The codons TAT and TAC (all coding for tyrosine) are optionally substituted for one another. The codons TAA, TAG and TGA (all coding for stop codons) are optionally substituted for one another. The codons CAT and CAC (all coding for histidine) are optionally substituted for one another. The codons CAA and CAG (all coding for glutamine) are optionally substituted for one another. The codons AAT and AAC (all coding for asparagine) are optionally substituted for one another. The codons AAA and AAG (coding for lysine) are optionally substituted for one another. The codons GAA and GAG (coding for glutamic acid) are optionally substituted for one another. The codons TGT and TGC (coding for cysteine) are optionally substituted for one another. The codons CGT, CGC, CGA and CGG (coding for arginine) are optionally substituted for one another. The codons GGT, GGC, GGA and GGG (coding for glycine) are optionally substituted for one another.
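As one illustration of the kind of simple computer program referred to above, the following sketch lists the codons that code for the same amino acid as a given codon and enumerates the equivalent coding sequences for a short open reading frame. It assumes the standard genetic code; the function names are chosen only for this sketch and do not appear in the sequence listing.

```python
from itertools import product

# Standard genetic code, laid out with the third base varying fastest
# over the base order T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon):
    """Return, sorted, every codon that encodes the same residue as `codon`."""
    aa = CODON_TABLE[codon.upper()]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def split_codons(cds):
    """Split a coding sequence into codons; its length must be a multiple of 3."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]


def count_equivalent_sequences(cds):
    """Number of sequences encoding the same residues (grows very quickly)."""
    total = 1
    for codon in split_codons(cds):
        total *= len(synonymous_codons(codon))
    return total


def equivalent_sequences(cds):
    """Lazily yield every sequence built from synonymous codons."""
    choices = [synonymous_codons(c) for c in split_codons(cds)]
    for combo in product(*choices):
        yield "".join(combo)


if __name__ == "__main__":
    print(synonymous_codons("GGT"))
    print(list(equivalent_sequences("ATGTTT")))
```

Because the number of equivalent sequences is the product of the per-codon choices, full enumeration is practical only for short reading frames; `count_equivalent_sequences` can be used first to check the size.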

Additional conservative substitutions are also provided by the given sequences. With respect to particular nucleic acid sequences, conservatively modified variants are those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences.

As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a "conservatively modified variant" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. The following six groups each contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and 6) Phenylalanine (F), Tyrosine (Y), and Tryptophan (W).
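The six groups listed above lend themselves to a simple lookup. The following is a minimal sketch, using one-letter amino acid codes, of how a substitution might be checked against those groups; the helper name `is_conservative` and the choice to treat an unchanged residue as conservative are assumptions of the sketch.

```python
# The six conservative-substitution groups recited in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("AST"),   # Alanine, Serine, Threonine
    set("DE"),    # Aspartic acid, Glutamic acid
    set("NQ"),    # Asparagine, Glutamine
    set("RK"),    # Arginine, Lysine
    set("ILMV"),  # Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original, replacement):
    """True if both residues are identical or fall in the same listed group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return True
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


if __name__ == "__main__":
    print(is_conservative("I", "V"))  # same group
    print(is_conservative("A", "D"))  # different groups
```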

Regarding nucleic acids, the amplicons and probes in the sequence listing are typically used in hybridization experiments, e.g., for marker-assisted selection. Accordingly, amplicons and probes which are substantially similar or identical, which can be used in the methods herein (e.g., intra-specific alleles, genetically engineered nucleic acids made, e.g., by modification of the provided sequences, and the like) are provided by the accompanying sequences. In particular, bases may be added, deleted or changed without substantially altering the hybridization properties of the nucleic acid. For example, bases which do not hybridize to a given probe can be modified without altering the hybridization properties of the probe to the given sequence. Modifications in such non-hybridizing regions (e.g., flanking regions) result in a nucleic acid which is essentially the same (has the same desired physicochemical properties, i.e., hybridizes to the same probe) as the written sequence. For example, it will be appreciated that the amplicons are optionally larger than probes which hybridize to them. Accordingly, the regions which are not involved in hybridization are not essential for hybridization to a probe. Thus, where the nucleotide to be detected is a polymorphic nucleotide, the regions of an amplicon flanking the polymorphic nucleotide which do not hybridize to a probe (e.g., an allele-specific probe, as described, supra) are not critical for hybridization to the probe. One of skill will recognize that these regions are optionally modified.

One of skill will further recognize that the sequences in the sequence listing are optionally part of larger sequences, e.g., the nucleic acids can be cloned into vectors known in the art. See, Sambrook, Ausubel, Berger and Innis, all supra. Furthermore, subsequences of the given sequences are easily constructed, either by synthetically or recombinantly joining nucleotides to yield the subsequence. Typical subsequences are at least about 10 nucleotides, often at least about 20 nucleotides, generally at least about 30 nucleotides, and optionally any length, e.g., 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900 or the like and can, of course, be full-length. Similarly, a subsequence can include, e.g., 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100% or any percentage between those listed of a particular full-length nucleic acid.

The subsequences are characterized by the ability to specifically hybridize to the complement of the full-length sequence, and by sequence identity with a sequence in the sequence listing over a selected comparison window. A "comparison window", as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 10 to 1000, usually about 20 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith & Waterman, Adv. Appl Math. 2:482 (1981); by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol 48:443 (1970); by the search for similarity method of Pearson & Lipman, Proc. Natl Acad. Sci. USA 85:2444 (1988); by computerized implementations of these algorithms (including, but not limited to CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, California, GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wisconsin, USA); the CLUSTAL program is well described by Higgins & Sharp, Gene, 73:237-244 (1988) and Higgins & Sharp CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Research 16:10881-10890 (1988); Huang et al. Computer Applications in the Biosciences 8:155-165 (1992); and Pearson et al., Methods in Molecular Biology 24:307-331 (1994). Alignment is also often performed by inspection and manual alignment. One of skill can easily select a variety of nucleic acids which are identical to the given nucleic acids over any selected comparison window size. 
It will be recognized that where the window includes a nucleotide region to be detected (e.g., a locus), the remainder of the nucleic acid may be deleted or modified.
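For illustration, sequence identity over a comparison window can be sketched as follows for two sequences that have already been optimally aligned by one of the methods cited above. The function name `window_identity`, the use of "-" as the gap character, and the decision to count a gap as a mismatch are assumptions of the sketch rather than a definition given herein.

```python
def window_identity(aligned_a, aligned_b, start, size):
    """Fraction of identical positions within one comparison window.

    aligned_a, aligned_b -- sequences of equal length after alignment
    start                -- 0-based index where the window begins
    size                 -- number of contiguous positions in the window
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if size <= 0 or start < 0 or start + size > len(aligned_a):
        raise ValueError("window must lie inside the alignment")
    window_a = aligned_a[start:start + size].upper()
    window_b = aligned_b[start:start + size].upper()
    # A gap character never counts as a match (assumption of this sketch).
    matches = sum(
        1 for x, y in zip(window_a, window_b) if x == y and x != "-"
    )
    return matches / size


if __name__ == "__main__":
    # Invented toy alignment with a single mismatch at index 4.
    print(window_identity("ACGTACGTAC", "ACGTTCGTAC", start=0, size=10))
```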

One of skill will also understand that sequencing technology is imperfect. Typical error rates for DNA sequencing are on the order of 1-5%, depending on the particular technology used for sequencing. Accordingly, one of skill will recognize that the amplicons of the sequence listing are most preferably obtained by amplifying a genomic nucleic acid (e.g., a genomic clone from a library or genomic nucleic acid isolated from a plant) using the primers described for the particular amplicon. However, the amplicons are also obtained by other means, including synthetic creation of a nucleic acid having the indicated sequence, or any other method described herein.

Modifying Nucleic Acids in the Sequence Listing

One of skill will appreciate that many conservative variations of the nucleic acids in the sequence listing can be made using common techniques. For example, due to the degeneracy of the genetic code, "silent substitutions" (i.e., substitutions of a nucleic acid sequence which do not result in an alteration in an encoded polypeptide) are an implied feature of every nucleic acid sequence which encodes an amino acid. Similarly, "conservative amino acid substitutions," in which one or a few amino acids in an amino acid sequence of a construct are substituted with different amino acids with highly similar properties (see above), are also readily identified as being highly similar to a disclosed construct. Such conservatively substituted variations of each explicitly disclosed sequence are a feature of the present invention. Substantially identical nucleotides such as allelic variants or recombinantly engineered nucleic acids comprising all or a portion of a given sequence are also made using these standard techniques.

One of skill will recognize many ways of generating alterations in a given nucleic acid construct. Such well-known methods include site-directed mutagenesis, PCR amplification using degenerate oligonucleotides, exposure of cells containing the nucleic acid to mutagenic agents or radiation, chemical synthesis of a desired oligonucleotide (e.g., in conjunction with ligation and/or cloning to generate large nucleic acids) and other well-known techniques. See, Gillam and Smith (1979) Gene 8:81-97, Roberts et al. (1987) Nature 328:731-734 and Sambrook, Innis, Ausubel, Berger, Needham VanDevanter and Mullis (all supra). See also, Kilbey et al. (eds) (1984) Handbook of Mutagenicity Test Procedures. Second Edition Elsevier, New York.

One of skill can select a desired nucleic acid of the invention based upon the sequences provided and upon knowledge in the art regarding retroviruses generally. The specific effects of many mutations on hybridization, for instance, can be determined with virtual certainty, even in the absence of experimental information on a hybridization. Moreover, general knowledge regarding the nature of proteins and nucleic acids allows one of skill to select appropriate sequences with activity similar or equivalent to the nucleic acids in the sequence listings herein. Finally, most modifications to nucleic acids are evaluated by routine screening techniques in suitable assays for the desired characteristic. For instance, changes in the immunological character of encoded polypeptides can be detected by an appropriate immunological assay. Modifications of other properties such as nucleic acid hybridization to a complementary nucleic acid, redox or thermal stability of encoded proteins, hydrophobicity, susceptibility to proteolysis, or the tendency to aggregate are all assayed according to standard techniques, all of which are well-suited to high throughput selection.

EXAMPLES

The following examples are offered by way of illustration, and are not intended to be limiting. One of skill will immediately recognize a variety of alternate procedures, compositions, reagents and the like which can be substituted for those exemplified below.

Example 1: Identification of Marker Loci

Marker loci must have multiple alleles to be used for genetic studies, genetic selection, and genetic identification within a species. We compared the DNA sequences among different soybean varieties and found new sequence polymorphisms at 63 loci throughout the soybean genome. The alleles for these polymorphisms are described in this example using allele-specific hybridization (ASH) as an example marker technology. We designed locus-specific oligonucleotide primers to amplify each locus by PCR, and designed allele-specific oligonucleotide probes to hybridize with and distinguish each allele. These polymorphisms provide new opportunities to develop genetic-marker technology and to expand genetic-marker applications for the improvement of soybean.

A. Materials and Methods

Identification of Sequence Polymorphisms

We selected soybean markers for conversion to ASH based on their genetic map locations. The objective was to have ASH markers well distributed around the soybean genome to maximize their general utility as markers and to have ASH markers near to genes of interest for marker-assisted selection and positional cloning of genetically linked genes, or other nucleic acids of interest. Map locations of RFLP markers were identified on Pioneer Hi-Bred International proprietary soybean marker maps and the USDA/ISU public soybean marker map (Shoemaker RC, Olson TC (1993) (Glycine max L. Merr). P. 6131-6138. In O'Brien SJ (ed.) Genetic maps: Locus maps of complex genomes. Cold Spring Harbor Laboratory Press, New York). The linkage groups on the Pioneer Hi-Bred maps were named according to the USDA/ISU public map by cross referencing markers in common between the maps. Some marker loci selected for ASH development were homologous to cloned genomic soybean DNA. For these loci, fragments of genomic DNA were ligated into the PstI restriction site of the commonly available pBS+ vector, transformed into E. coli strain DH5α using established protocols (Keim P and Shoemaker RC (1988) Soybean Genet Newsl 15:147-148), and mapped as RFLP markers prior to selection for use as ASH probes. For a few other loci, genomic DNA fragments were ligated into the Lambda ZAP II vector, packaged, plated on E. coli strain XL1-Blue MRF', selected based on homology to a DNA probe and excised in pBluescript SK (-) phagemid in E. coli strain SOLR™ using ExAssist™ helper phage (Stratagene, La Jolla, California 92037 U.S.A.). Alternatively, selected DNA fragments were ligated into the pCR™ vector and transformed into E. coli according to the TA Cloning® Kit V2.1 (Invitrogen, San Diego, California 92121 U.S.A.) or cloned into LIC vector (PharMingen, San Diego, California 92121 U.S.A.) and transformed into E. coli strain DH5α. Plasmid DNA was isolated using the Magic™ or Wizard™ Miniprep systems (Promega, Madison, Wisconsin 53711 U.S.A.) or Nucleobond Ax kits (Macherey-Nagel, P.O. Box 10 13 52, D-52313 Düren, Germany). Minipreps were precipitated in 0.1 vol 7.5 M NH₄Ac and 2.0 vols EtOH at -20° C for 20 min, spun in a microcentrifuge for 20 min, washed in 70% EtOH, dried, and dissolved in 10 mM Tris/HCl pH 8.5. Insert DNA was sequenced using the Taq DyeDeoxy terminator cycle sequencing reaction and a Perkin Elmer ABI 373 or 377 DNA sequencer. Sequencing primers were designed from the vector.

DNA sequence from each locus was then used to design forward and reverse PCR primers to amplify the locus from different soybean varieties. The primers were designed to have a dissociation temperature (T_d) of approximately 60° C, using the formula:

T_d = (((((3 x #GC) + (2 x #AT)) x 37) - 562) / #bp) - 5, where #GC, #AT, and #bp are the number of guanine-cytosine base pairs, the number of adenine-thymine base pairs, and the number of total base pairs, respectively, involved in the annealing of the primer to the template DNA. They were synthesized with a Perkin Elmer ABI 394 DNA/RNA Synthesizer using cyanoethyl phosphoramidite chemistry with the dimethoxytrityl protecting group removed, and were purified by desalting over a Pharmacia NAP10 column.
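The dissociation-temperature formula above is straightforward to evaluate by program. The sketch below assumes that every base of the primer anneals to the template, so that #bp equals the primer length; the function name `dissociation_temperature` and the example primer are invented for illustration and are not taken from the sequence listing.

```python
def dissociation_temperature(primer):
    """Evaluate T_d = ((((3*#GC + 2*#AT) * 37) - 562) / #bp) - 5 in degrees C.

    Assumes the whole primer anneals, so #bp is the primer length.
    """
    primer = primer.upper()
    n_gc = sum(primer.count(base) for base in "GC")
    n_at = sum(primer.count(base) for base in "AT")
    n_bp = n_gc + n_at
    if n_bp == 0 or n_bp != len(primer):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return ((((3 * n_gc) + (2 * n_at)) * 37) - 562) / n_bp - 5


if __name__ == "__main__":
    # Invented 20-base primer with 10 G/C and 10 A/T bases.
    print(round(dissociation_temperature("ACGTACGTACGTACGTACGT"), 1))
```

For the invented 20-mer the terms work out to (3 x 10 + 2 x 10) x 37 = 1850, then (1850 - 562) / 20 = 64.4, and finally 64.4 - 5 = 59.4° C, close to the 60° C design target.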

The PCR reaction mixture to amplify each desired locus consisted of 1.0 μM of the forward primer and 1.0 μM of the reverse primer, 1X buffer (20 mM tripotassium citrate, 20 mM MgSO₄, 40 mM Tris base, 10 mM glycine, 5 mM L-histidine, and 0.01% Triton X-100), 240 μM of each dNTP, 0.4 U μl⁻¹ of Hot Tub DNA polymerase (Amersham Life Science, Arlington Heights, IL 60005 U.S.A.), and 1.0 ng μl⁻¹ of template DNA, all diluted in HPLC-grade H₂O. The PCR reaction was done in 33 cycles with an initial denaturing for 2 min at 94° C, subsequent denaturing for 30 seconds at 94° C, annealing for 2 min at 58° C, and extension for 2.5 min at 70° C. A final extension was done for 3 min at 70° C. An aliquot of each PCR product was run on a 1-2% agarose gel and stained with EtBr for viewing under ultraviolet light.
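As a small worked example, the cycling program described above can be written out as data and its total programmed hold time computed. The sketch assumes that the 2 min initial denaturation is a separate step preceding the 33 cycles and ignores the time the thermal cycler spends ramping between temperatures; the variable and function names are illustrative only.

```python
# Cycling program transcribed from the text: (step, temperature in C, minutes).
INITIAL_DENATURATION = ("initial denaturation", 94, 2.0)
CYCLE_STEPS = [
    ("denaturation", 94, 0.5),
    ("annealing", 58, 2.0),
    ("extension", 70, 2.5),
]
FINAL_EXTENSION = ("final extension", 70, 3.0)
CYCLES = 33


def total_hold_minutes(cycles=CYCLES):
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(minutes for _, _, minutes in CYCLE_STEPS)
    return INITIAL_DENATURATION[2] + cycles * per_cycle + FINAL_EXTENSION[2]


if __name__ == "__main__":
    print(total_hold_minutes())
```

Each cycle holds for 0.5 + 2 + 2.5 = 5 min, so under these assumptions the program holds for 2 + 33 x 5 + 3 = 170 min in total.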

Each amplicon was purified for sequencing using a QIAquick-spin PCR Purification Kit using the manufacturer's protocol (Qiagen, Chatsworth, California 91311 U.S.A.). If multiple bands were amplified from one PCR, all were individually extracted from the gel and purified using the QIAquick Gel Extraction Kit (Qiagen, Chatsworth, California 91311 U.S.A.). These products were then sequenced using the Taq DyeDeoxy terminator cycle sequencing reaction on a Perkin Elmer ABI 373 or 377 DNA sequencer. Each oligonucleotide primer used to amplify a DNA fragment by PCR was also used to prime the sequencing of that fragment from one end. When a DNA fragment was too large to sequence in one pass from each side, additional sequencing primers were designed from the initial amplicon sequence and used to obtain more sequence further inside the fragment. When multiple bands were produced from an original set of PCR primers, the sequences among these different fragments were compared and locus-specific primers were designed to amplify only the desired locus.

Eight soybean varieties (Table 1) used to compare DNA sequences and identify single-nucleotide polymorphisms were progenitors to many modern North American soybean varieties (Gizlice et al. (1994) Crop Sci 34:1143-1151) and should, therefore, represent most alleles existing for each locus. Two varieties, BSR101 and PI437.654, were also sequenced for each locus because they were the parents to a population of recombinant-inbred soybean lines used to map many of the markers. These 10 varieties represent a broad range of maturity types and were considered genetically diverse by RFLP fingerprinting analysis (unpublished data). The sequences of these 10 varieties were aligned and compared for differences at each locus.

Table 1. Soybean varieties used to compare DNA sequence, their estimated contributions to modern North American varieties, and their relative maturity groups. BSR101 and PI437.654 were included to identify marker alleles that could be genetically mapped in a recombinant-inbred population.

Variety              Percentage Contribution†    Maturity Group
Mandarin (Ottawa)    12.2                        I
Lincoln              17.9                        III
A. K. Harrow         4.9                         III
S100                 7.5                         V
Ogden                4.9                         VI
CNS                  9.4                         VII
Tokyo                3.8                         VII
Jackson              3.3                         VII
BSR101               -                           III
PI437.654            -                           II

† Percentages from Gizlice et al. 1994.
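The comparison of aligned sequences for differences at each locus can be illustrated with a short sketch that reports every alignment column at which the varieties do not all share the same base. The variety labels and sequences in the example are invented placeholders, and the function name `polymorphic_positions` is an assumption of the sketch; it presumes the sequences have already been aligned to equal length.

```python
def polymorphic_positions(aligned):
    """Return [(position, {base: [variety, ...]})] for non-identical columns.

    aligned -- dict mapping a variety label to its aligned sequence
    """
    names = list(aligned)
    if not names:
        return []
    length = len(aligned[names[0]])
    if any(len(aligned[name]) != length for name in names):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        alleles = {}
        for name in names:
            base = aligned[name][pos].upper()
            alleles.setdefault(base, []).append(name)
        # A column is polymorphic when more than one base is observed.
        if len(alleles) > 1:
            sites.append((pos, alleles))
    return sites


if __name__ == "__main__":
    # Invented placeholder data, not sequences from the sequence listing.
    toy = {
        "variety_1": "ACGTACGT",
        "variety_2": "ACGTACGT",
        "variety_3": "ACGAACGT",
    }
    for position, alleles in polymorphic_positions(toy):
        print(position, alleles)
```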

Design and Testing of ASH Markers

When single-nucleotide polymorphisms or other polymorphisms were found among the DNA sequences at each locus, several ASH oligonucleotide probes were designed, synthesized, and tested for each allele. The ASH probes were designed to have a dissociation temperature of about 37° C. They were synthesized with a Perkin Elmer ABI 394 DNA/RNA Synthesizer using β-cyanoethyl phosphoramidite chemistry with the dimethoxytrityl protecting group removed, and were purified by desalting over a Pharmacia NAP10 column. The probes were tested against the 10 sequenced soybean varieties, 12 additional North American ancestor varieties, and 72 recombinant-inbred lines of the BSR101 X PI437.654 mapping population. Probes were also tested for signal (hybridization of the correct probe) to noise (hybridization of the incorrect probe) ratios for each locus using the amplified target DNA from one variety in a dilution series. The dilution series were made using about 100 ng, 10 ng, and 1 ng of PCR-product (target) DNA.

Target DNA was attached to a Hybond N+ nylon membrane (Amersham Life Science, Arlington Heights, IL 60005 U.S.A.) in the following manner. The membrane was soaked briefly in water and the excess water was removed from the membrane by blotting. Two microliters of the PCR product or diluted PCR product was pipetted onto the moist membrane. The membrane was placed on a blotter paper saturated with a DNA denaturing solution (0.6 M NaCl, 0.4 M NaOH), DNA side up, for two minutes. The membrane was then transferred to a blotter paper saturated with a neutralizing solution (0.5 M Tris pH 7.5, 1.5 M NaCl) for 10 minutes. The membrane was then baked at 85-90° C for 1-2 hours and UV-crosslinked at 20,000 μJ. Prior to hybridization, each membrane was soaked in a hybridization-washing buffer (0.75 M Na, 0.5 M PO₄, 1.0 mM disodium EDTA, and 1% sarkosyl) for 30 min at 65° C, then in fresh buffer for 30 min to overnight at room temperature.

Each ASH probe was end-labeled with ³²P transferred from the γ position of ATP by the T4 polynucleotide kinase reaction according to the kinase manufacturer's protocol (New England Biolabs, Beverly, Massachusetts 01915 U.S.A.). Hybridization of the probe and target DNA was done for at least one hour while shaking at 60 rpm at room temperature. Afterwards, the hybridization solution was discarded and the membrane was sequentially washed, each time in fresh solution, once for 2 min, once for 15 min, and twice for 30 min while shaking at 60 rpm. The hybridized membrane was placed against X-ray film for 30 minutes to 18 hr at -80° C. The X-ray film was developed and the probe was evaluated for its hybridization characteristics. If necessary, the probe was redesigned and tested to increase the signal-to-noise ratio or signal strength.

Genetic Mapping of Soybean ASH Markers

Two segregating soybean populations were used to confirm the map location of each ASH marker relative to other markers in linkage groups. These two populations each consisted of about 300 recombinant-inbred lines from the crosses PI437.654 X BSR101 and Bell X YB17E, respectively. The mapping procedure used and the population history for PI437.654 X BSR101 were as already described (Webb et al (1995) Theor Appl Genet 91:574-581; Keim et al (1997) Crop Sci 37:537-543). The population from Bell X YB17E consisted of F_{5:6} lines. When possible, the linkage groups were identified with and named according to the USDA/Iowa State University public soybean map (Shoemaker and Olson 1993).

Results

We identified DNA-sequence polymorphisms at 63 independent loci in the soybean genome by comparing the DNA sequences among different soybean genotypes. These 63 loci were named (Table 2) according to pre-existing nomenclature. All were named as RFLP marker loci, except the loci php08320E, php08584A, php10078A and php10355B, which were originally mapped as AFLP marker loci, the locus php12105A, which was only mapped as an ASH marker, and the locus SOYBPSP, which was a soybean gene sequence in GENBANK (accession M13759) for a 7S seed storage protein (Doyle et al. (1986) J Biol Chem 261:9228-9238). Of these 63 loci, only SOYBPSP was a known gene sequence and its nucleotide polymorphism (which was previously unknown and identified by our sequencing) was found in an intron.

We designed forward and reverse PCR primers (Table 3) that amplified a region (the amplified product of the PCR reaction is an "amplicon") of each locus containing at least one nucleotide polymorphism. Most of these loci had multiple polymorphic nucleotide positions separated by monomorphic nucleotides. When the distance between two polymorphic nucleotides was greater than the size of a typical oligonucleotide hybridization probe, we considered each polymorphic nucleotide position to be an independent sub-locus (Table 2). When the distance between two polymorphic nucleotide positions was within the size of one oligonucleotide probe, we considered those polymorphisms to be dependent and part of one sub-locus. We named each amplicon using the prefix 'pha' as an acronym for Pioneer Hi-Bred amplicon, followed by a unique identification number (Tables 2 and 3). Locus or amplicon names can both be used to refer to a DNA region containing specific polymorphic sub-loci. The particular primer pairs described here produced amplicons ranging in length between 86 and 1880 base pairs. About half the amplicon sizes were estimated from bands viewed on agarose gels and about half were obtained by sequencing the entire amplicon region. Estimates from gels were accurate to within about 10% of the insert size. The loci php05219A (amplicon pha11138) and php10355B (amplicon pha11627) each produced two fragment sizes because each locus had an insertion-deletion event that varied among different soybean genotypes (Table 4).
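The sub-locus rule described above can be sketched as a grouping of polymorphic positions: positions that fit together within the length of one probe form a single sub-locus, and more distant positions start a new one. The probe length used below (17 bases) and the positions are invented for illustration, since the text does not fix a single probe size, and the function name `group_sub_loci` is an assumption of the sketch.

```python
def group_sub_loci(positions, probe_length=17):
    """Group polymorphic positions into sub-loci.

    Positions join the current sub-locus only while they still fit, together
    with its first position, inside one probe of `probe_length` bases.
    probe_length=17 is an illustrative assumption, not a value from the text.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][0] < probe_length:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


if __name__ == "__main__":
    # Invented positions (in bases from the start of an amplicon).
    print(group_sub_loci([10, 14, 40, 120, 125, 131]))
```

Anchoring each group to its first position, rather than chaining neighbor to neighbor, keeps every sub-locus short enough to be covered by a single probe of the assumed length.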

One locus, pK079A, had two non-overlapping amplicons, pha11074 and pha11075. Two amplicons were needed for this locus because the DNA sequences flanking its two sub-loci were conserved at a second pK079 locus. A region of DNA between the two sub-loci was found to be different from the second pK079 locus so PCR primers specific to pK079A were designed for two amplicons, one for each sub-locus.

The polymorphic nucleotides for each allele of each sub-locus were designated by upper-case letters within the probe sequences shown in Table 2. Each oligonucleotide probe distinguishes one allele of a sub-locus and locus. A public soybean variety representative of each allele is listed in Table 2 for the purpose of example.

The primer pairs and probes presented here enable the use of allele-specific hybridization (ASH) and many other techniques for detection of polymorphisms at these loci. ASH was used here as an example marker technology for each polymorphism. Other genetic marker technologies are equally effective as a means to exploit these polymorphisms in soybean improvement programs. In addition, DNA sequences other than those specifically shown here are used for primers and probes to detect these polymorphisms, e.g., as described supra. Any DNA sequences flanking these polymorphic nucleotides and within the size range amplifiable by PCR (up to 40 kb using long-distance PCR methods, although more typically about 2 kb or less using standard PCR methods) are used as PCR primers to amplify the polymorphic DNA, and the forward and reverse primers are designed from either DNA strand. Any DNA sequence containing the polymorphic nucleotides and within the discriminatory capability of DNA hybridization conditions is used as a hybridization probe to detect the polymorphism. The probes are designed from either DNA strand.
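As an illustration of how an allele-specific probe designed from either strand discriminates two alleles, the following Python sketch tests whether a probe, or its reverse complement, occurs in a target sequence. It is a simplification under stated assumptions: exact string matching stands in for stringent hybridization, and the function names and the short target fragments (taken from the pha08230 region of Table 4 with each allele of the [G/A] polymorphism written out) are chosen only for this example.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_matches(probe, target):
    """True if the probe or its reverse complement occurs exactly in target.

    Assumption: a perfect match models hybridization; a single mismatch
    models failure to hybridize under stringent conditions.
    """
    p = probe.upper()
    t = target.upper()
    return p in t or reverse_complement(p) in t


# Fragments around the pha08230 polymorphism, one per allele.
G_ALLELE = "TTTATAAAATTGTTGGTTTTGTTTTT"
A_ALLELE = "TTTATAAAATTATTGGTTTTGTTTTT"

# Probe listed in Table 2 for the G allele of SOYBPSP.
G_PROBE = "taaaattGttggtt"

print(probe_matches(G_PROBE, G_ALLELE))
print(probe_matches(G_PROBE, A_ALLELE))
print(probe_matches(reverse_complement(G_PROBE), G_ALLELE))
```

Real hybridization depends on temperature, salt concentration and probe length, so a thermodynamic model would be needed to predict actual discrimination; the sketch only shows the strand symmetry and the single-base specificity of the probe design.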

The genetic map locations by linkage group for 43 of these 63 loci were established by mapping them as ASH markers in segregating populations (Table 2). Mapping showed the alleles of each locus to be heritable and to segregate normally in a recombinant-inbred population. When possible, the linkage-group names used here were the same as those used in the reference map for soybean (Shoemaker and Olson, 1993, supra.). Thirty-eight of these 63 loci mapped as ASH markers to 17 linkage groups that correspond to the public reference map, and 5 loci mapped to linkage groups on three other genetic maps that have not yet been identified with linkage groups on the public reference map. These latter groups were named using a combination of letter (B, L or Z) and number. Twenty markers, pA059A, pA064A, pA077A, pA593A, pBLT15A, pK401A, pR153A, php02340B, php02396A, php05264A, pA343A, pA748B, pA858A, pB132A, pG17.3A, php02371A, php05290A, php02329A, php02376A, and php08320E, were not mapped as ASH markers, but sixteen of these loci have been mapped as either RFLP or AFLP markers. Four loci, php02396A, php05264A, php02329A and php02376A, have not been mapped as markers of any kind. Regardless of their map status, the 63 loci described here reside on most, and possibly all, of the 20 chromosome pairs in soybean.

Table 2. Soybean marker loci: their genetic linkage group, sub-loci, and polymorphic nucleotides, probe sequences for allele-specific hybridizations, and a representative soybean variety for each allele.

Locus Name    Linkage Group†    Amplicon Name    Sub-Locus    Probe Sequence‡    Soybean Variety    SEQ ID NO:

pA059A C pha12392 ttgtgaTcaatata AKHarrow 1

(4§) ttgtgaCcaatat CNS 2

pA060A J pha10634 actaaatTtatacc AKHarrow 3

(2) ggtataCatttag BSR101 4

pA064A D pha12393 gttgcTtgggt S100 5

(1§) ttgcCtgggtt AKHarrow 6

2 ttttcTTttgttag Mandarin 7

2 ttttttcttgttag AKHarrow 8

3 tttccAaaggtg Mandarin 9

3 ttttccGaaggt AKHarrow 10

pA077A 16 pha10623 1 ggattACatacta BSR101 11

(3§) 1 tggattTTatacta Ogden 12

2 atcatgatttcag PI437.654 13

2 tgaATCTGAtttc CNS 14

3 gCAAGTATCAtg CNS 15

3 tttcagTTtgattt PI437.654 16

4 atgttCgggga BSRIOI 17

4 aatgttTgggg Mandarin 18

5 agaacaCggaat BSR101 19

5 gaacaTggaatg Mandarin 20

6 tgcaaCggcat BSRIOI 21

6 tgcaaTggcatt Ogden 22

pA086A E pha10624 1 tttgaaGctttat BSR101 23

(1) 1 tttgaaTctttatc PI437.654 24

pA169A L pha10649 1 tcatcGaatcac BSR101 25

(1) 1 ttcatcAaatcac Jackson 26

2 GatgaCgatttg BSRIOI 27

2 aCatgaTgatttg PI437.654 28

3 tcatCtGtgataa BSRIOI 29

3 tcatCtCtgataa Jackson 30

3 tcatGtGtgataa PI437.654 31

pA280A N pha11135 1 tgcactTtaaatta BSR101 32

(1,2) 1 ttgcacttaaatta Lincoln 33

2 ttccttcTtttttt BSRIOI 34

2 tccttcCtttttt PI437.654 35

3 agagaGatactc BSRIOI 36

3 agagaCatactc Columbus 37

4 gaccCgctc BSRIOI 38

4 gaccTgctcc Lincoln 39

5 gaaatcCcaaaaa BSRIOI 40

5 gaaatcAcaaaaaa Lincoln 41

6 tttatcAtttttgg PI437.654 42

6 atttatcTtttttg BSR101 43

pA343A D pha13072 1 cagcaAtgaaag CNS 44

(1§) 1 tcagcatgaaag AKHarrow 45

2 ttaactTgccag AKHarrow 46

2 ttaactAgccag CNS 47

3 accctCaatatg AKHarrow 48

3 accctAaatatg CNS 49

pA378A G pha10792 1 ttggaaTAtatact Manchu 50

(1) 1 ttggaaGAtatac PI437.654 51

1 ttggaaGTtatac BSRIOI 52

2 agcacAagtgg PI437.654 53

2 agcacTagtgg PI86050 54

pA505A A pha10635 1 attaggggcag BSR101 55

(1,2) 1 attaggGggca PI437.654 56

2 atatgtaAcaaaag PI437.654 57

2 tatgtaCcaaaag BSRIOI 58

pA519A B pha10638 1 tcttttAcataatg BSR101 59

(1,2) 1 cattatgCaaaag CNS 60

2 gtactaTtatttg PI437.654 61

2 gtactaGtatttg BSRIOI 62

3 ttgagGatttag BSRIOI 63

3 ttgagCatttag AKHarrow 64

4 aaggaGgttgc BSRIOI 65

4 taaggaAgttgc AKHarrow 66

5 ttagttGagagg AKHarrow 67

5 ttagttAagagga BSRIOI 68

pA593A B pha12395 1 gtttttTttTataaat AKHarrow 69

(1§) 1 gtttttAttATTATTa CNS 70

2 aaatatAtatatatata AKHarrow 71

2 aatatGtataAtatat CNS 72

3 atatatTtaataaatat CNS 73

3 tatatatAtaTATATAT AKHarrow 74

4 aaaaTaaaaAtaaaag AKHarrow 75

4 aaaAaaaaCtaaaag CNS 76

5 atctttGatgagt CNS 77

5 atctttCatgagt AKHarrow 78

6 tTtttttCtttttac CNS 79

6 ttAtttttAtttttac AKHarrow 80

pA588A B pha10648 1 agaaattAgtaagt BSR101 81

(1) 1 agaaattTgtaagt PI437.654 82

2 aatcttTtttaaag BSRIOI 83

2 aatcttCtttaaag PI437.654 84

pA748B C pha13073 1 tctatTctgaag Mandarin 85

(1§) 1 tctatActgaag CNS 86

2 aatgatAatttagt CNS 87

2 aatgatCatttag Mandarin 88

pA858A H pha13158 1 ctacattTtttttg CNS 89

(1§) 1 tacattGtttttg AKHarrow 90

2 tttttgTtagaga CNS 91

2 tttttgCtagag AKHarrow 92

3 acactGcttac CNS 93

3 tacactActtac AKHarrow 94

pA882A O pha12396 1 taaggtaatgttg S100 95

(1) 1 aggtGGTaatgt CNS 96

2 ggcttAtgcatt CNS 97

2 ggcttCtgcat AKHarrow 98

pA947B W pha10621 1 agggCctctg PI437.654 99

(1,2) 1 tagggTctctg BSRIOI 100

2 atacttgtactct Ogden 101

2 tacttCgtactc BSRIOI 102

3 tatggagtaattg Ogden 103

3 ATCTTTTCAGGTT BSR101 104

pB032A K pha11071 1 ttgaatTcccct BSR101 105

(1) 1 ttgaatCcccc PI437.654 106

2 atgtttCgaagc BSRIOI 107

2 atgtttTgaagca PI437.654 108

3 cggttTtattag PI437.654 109

3 cggttCtattag BSRIOI 110

4 attccTgcccc BSRIOI 111

4 attccAgcccc Tokyo 112

pB032B J pha11073 1 accgAgcaac BSR101 113

(1) 1 ttgcCcggtg PI437.654 114

2 tgtaaTgcGtg BSR101 115

2 tgtaaCgcGtg PI437.654 116

2 tgtaaCgcAtgt Cook 117

3 tcaccTggatc BSRIOI 118

3 atcaccCggat PI437.654 119

pB039A I pha10640 1 ttcagtCaaacc BSR101 120

(1) 1 ttcagtTaaacca PI437.654 121

pB132A B pha14257 1 tcttaaTaggct Tokyo 122

(1§) tcttaaAaggct CNS 123

pBLT15A L27 pha12394 1 gatcaaAcccaa AKHarrow 124

(1§) 1 atcaaGcccaa CNS 125

pBLT24A A pha11076 1 cacatTccacaa PI437.654 126

(1) 1 cacatAccacaa BSRIOI 127

2 attattAttttcac PI437.654 128

2 attattTttttcac BSRIOI 129

3 aggagtAgtaatt PI437.654 130

3 ggagtGgtaatt BSR101 131

pBLT65 A pha10653 1 ttggacTattaata PI437.654 132

(1) 1 ttggacAattaata BSRIOI 133

2 taatatcTtatgca BSRIOI 134

2 aatatcGtatgca PI437.654 135

pG17.3A L pha13561 1 ttctcgGgcc AKHarrow 136

(4§) ttctcgCgcc CNS 137

2 ttctgataaaaaaa CNS 138

2 tctgatGaaaaaa AKHarrow 139

php02265A F pha10598 gaatgaCtttga BSR101 140

(1) gaatgaTtttgac Mandarin 141

php02301A M pha10615 tcattcAttcatg PI437.654 142

(1) tcattcTttcatg BSR101 143

php02329A ? pha13070 catataGtagtag Mandarin 144

acatataAtagtag CNS 145

php02340B A pha12390 aaaaaaaatgagg BSR101 146

(1§) aaaaaaaTatgagg CNS 147

php02361A G pha10646 tgtgacaaccga BSR101 148

(1) gtgacACaacc PI437.654 149

php02370C K pha10618 tctccAgaaaca BSR101 150

(1) gtttcGggag PI437.654 151

2 aagaagAtgatg BSRIOI 152

2 agaagCtgatg PI437.654 153

php02371A W pha13071 ggcaaAttttCc AKHarrow 154

(1§) ggcaaGttttAc CNS 155

2 ttttCcGgtgc AKHarrow 156

2 GttttAcAgtgc CNS 157

3 aataaaCaagagg AKHarrow 158

3 aataaaTaagagga CNS 159

4 tcgtttGcaatc AKHarrow 160

4 tcgtttCcaatc CNS 161

5 aaatgatTccttg AKHarrow 162

5 aatgatCccttg CNS 163

6 tattatgTttttgat AKHarrow 164

6 tattatgCttttga CNS 165

7 ggaaagGattttt CNS 166

7 ggaaagAattttta AKHarrow 167

8 ttttgtATctgtat CNS 168

8 tttgtGGctgta AKHarrow 169

php02376A ? pha13560 1 attaaaCcccag Mandarin 170

1 attaaaTcccagt CNS 171

2 ttttgtTagagag Mandarin 172

2 ttttgtCagaga CNS 173

3 ttttctTctgtca Mandarin 174

3 ttttctCctgtc CNS 175

4 acaacTAAtaagg Mandarin 176

4 tacaactaaggta CNS 177

php02387A L18 pha10620 1 atcacgAatcac PI437.654 178

(1) 1 gtgatCcgtg BSR101 179

php02388A L23 pha10782 1 cgcacaAatatg PI437.654 180

(1) 1 gcacaGatatg BSRIOI 181

2 atactgAtttctg PI437.654 182

2 tactgGtttctg BSRIOI 183

3 gttagaAtgttag PI437.654 184

3 ttagaGtgttagt BSRIOI 185

4 aatataaAtaaggg PI437.654 186

4 atataaGtaaggg BSRIOI 187

5 aaactcagttattt PI437.654 188

5 aactcAAagttatt BSRIOI 189

6 tttcttTgttattc PI437.654 190

6 atttcttCgttatt BSRIOI 191

7 attctgCtattatt PI437.654 192

7 tattctgTtattatt BSRIOI 193

8 tgcacaTattact PI437.654 194

8 gcacaCattact BSRIOI 195

9 atcccatGttca PI437.654 196

9 tcccatAttcag BSR101 197

php02393A C pha11131 1 tttaagGagagtt PI437.654 198

(1,2) 1 gtttaagAagagt BSRIOI 199

2 tgctttGatttg BSRIOI 200

2 tgctttAatttgg Mandarin 201

3 atctctaTaaaca PI437.654 202

3 tctctaCaaacaa BSRIOI 203

4 tctcaaCttgga PI437.654 204

4 tctcaaTttggaa BSRIOI 205

5 tttgcAtgcaac PI437.654 206

5 tttgcTtgcaac BSRIOI 207

6 cacatCcatttg Ogden 208

6 cacatAcatttg PI437.654 209

php02396A ? pha11132 1 tacaaaaAaaggtt AKHarrow 210

1 acaaaaCaaggtt BSRIOI 211

2 ggcAtgTgagt AKHarrow 212

2 ggcTtgCgag BSRIOI 213

3 tgatatATTcttca AKHarrow 214

3 tgatatGcttcaa BSRIOI 215

4 agttaacAtgaag AKHarrow 216

4 gttaacGtgaag BSRIOI 217

5 agcagtGaagta AKHarrow 218

5 gcagtAaagtac BSRIOI 219

6 aatatatctcttttttt AKHarrow 220

6 atatctcCtttttt CNS 221

7 aaaaaaGtagctaa AKHarrow 222

7 aaaaaaCtagctaa CNS 223

8 tatttgCattagg AKHarrow 224

8 ttatttgTattagg CNS 225

9 tcttacaTctttg AKHarrow 226

9 cttacaGctttg CNS 227

10 cggtAgagatt AKHarrow 228

10 cggtGgagatt CNS 229

11 gggcacAaaag AKHarrow 230

11 ggcacGaaagt CNS 231

php02636A C pha10650 1 ctatatTttggtg BSR101 232

(1) 1 ctatatAttggtg PI437.654 233

2 aataatGattgtg PI437.654 234

2 taataatAattgtgt BSRIOI 235

3 ttattatCttttgt Williams 236

3 ttattatTttttgt CNS 237

4 ttaAaatcTagtag CNS 238

4 attaAaatcCagta PI437.654 239

4 attaTaatcCagta Williams 240

5 agattaAcaggc CNS 241

5 tagattaCcagg BSR101 242

php03522A K pha10651 1 aagagaAggcta BSR101 243

(1) 1 aagagaTggcta PI437.654 244

2 tggaaAccttaat BSRIOI 245

2 tggaaGccttaa PI437.654 246

3 tacctcAagtgt BSRIOI 247

3 acctcGagtgt PI437.654 248

4 actaagAattttg PI437.654 249

4 actaagCattttg BSR101 250

php05219A G pha11138 1 ttatagAcacttg BSR101 251

(1,2) 1 tatagGcacttg Mukden 252

2 tggtaCgttatg Mukden 253

2 ttggtaAgttatg BSRIOI 254

3 ataaaccAtatatg PI437.654 255

3 ataaaccTtatatg BSRIOI 256

4 agtgtgtTTTttt BSRIOI 257

4 gtgtgtAAGtttt PI437.654 258

5 ttggtCcaagg PI437.654 259

5 ttggtTcaaggt BSR101 260

php05233A L21 pha10637 1 ataggGaaaagg BSR101 261

(1) 1 aataggTaaaagg PI437.654 262

2 aaccttTctgtc BSRIOI 263

2 accttGctgtc PI437.654 264

php05264A ? pha12391 1 ctacatAatgaag BSR101 265

1 tacatGatgaag AKHarrow 266

2 aatcttGctgtg BSRIOI 267

2 aatcttActgtga AKHarrow 268

3 atgtgGcattga BSRIOI 269

3 catgtgAcattg AKHarrow 270

4 tatacaaTatctaaa BSRIOI 271

4 atacaaCatctaaa AKHarrow 272

5 tcaatgAtggata BSRIOI 273

5 tcaatgTtggata AKHarrow 274

php05278A M pha11078 1 aaaatCttggatc BSR101 275

(1) 1 aaaaatAttggat PI437.654 276

php05290A E pha13074 1 gaatgaatttttc AKHarrow 277

(1§) 1 aatgaaCtttttc CNS 278

2 aaggagGgaaaa AKHarrow 279

2 aaggagAgaaaaa CNS 280

3 aaatgaaCaaaaaaa CNS 281

3 aaatgaaAaaaaaaa AKHarrow 282

php05342A A pha11079 1 tttgttAtttatga PI437.654 283

(1) 1 tttgttCtttatga BSRIOI 284

2 atctatGtatatta BSRIOI 285

2 tatctatAtatatta PI437.654 286

3 gttgcCaaatca BSRIOI 287

3 tgttgcAaaatca PI437.654 288

php07659A G pha11139 1 agtgcGAtgaaa BSR101 289

(1,2) 1 agtgcACtgaaa PI437.654 290

2 gaggaGatgtag BSRIOI 291

2 gaggaAatgtag PI437.654 292

3 gatgaTtttagc BSRIOI 293

3 gatgaGtttagc PI437.654 294

4 taggGatttgg BSRIOI 295

4 aataggAatttgg PI437.654 296

5 ccatgTttggtt BSRIOI 297

5 ccatgCttggt PI437.654 298

php08320E J pha12436 1 aggaGtatCcc BSR101 299

(1§) 1 aggaAtatTccc PI437.654 300

2 agaaAAtcgcttt BSRIOI 301

2 gaaGGtcgGTC PI437.654 302

3 gttttttGactttt BSRIOI 303

3 gttttttTacttttg PI437.654 304

4 tatgatGtttcct BSRIOI 305

4 atatgatAtttcct PI437.654 306

php08584A S pha10655 1 ggtacCggag BSR101 307

(1,2) 1 tggtacTggag PI437.654 308

2 tgcatAGcaaga BSR101 309

2 tgcatGAcaaga PI437.654 310

3 aactcGttgatg BSRIOI 311

3 aaactcAttgatg PI437.654 312

4 agtctGagtttg PI437.654 313

4 aagtctAagtttg BSRIOI 314

5 gtgtaaAcggg PI437.654 315

5 tgtaaGcggga BSRIOI 316

6 tgaaGaaaaaTatg BSRIOI 317

6 gaaCaaaaaGatg PI437.654 318

7 atgggAcgttg BSRIOI 319

7 atgggGcgtt PI437.654 320

8 gctttgttgttg BSRIOI 321

8 gctttgGTAttg PI437.654 322

9 ttagtGgacagt CNS 323

9 gttagtTgacag PI437.654 324

10 gccaaaGcaaata CNS 325

10 gccaaaTcaaataa PI437.654 326

11 agtgaTactgg CNS 327

11 agtgaGactgg PI437.654 328

S pha11701 12 gtcacTAgagaa BSR101 329

(1) 12 gtcacGTgaga PI437.654 330

13 taatccAgggaa BSRIOI 331

13 taatccTgggaa PI437.654 332

14 attaattTagaagg BSRIOI 333

14 ttaattGagaagg PI437.654 334

15 gtctgCatatga CNS 335

15 tgtctgTatatga PI437.654 336

16 atgtgCtttggt CNS 337

16 atgtgGtttggt PI437.654 338

17 cggTaagAtgaa BSRIOI 339

17 cggCaagGtg CNS 340

18 ctcgcAccttc BSRIOI 341

18 tcgcGccttc CNS 342

php10078A I pha14395 1 ttttttaTagaaaag Bell 343

(2) 1 tttttaGagaaag Mandarin 344

2 aaaaaaaTttaagac Bell 345

2 aaaaaaaAttaagac Mandarin 346

3 ttttttttgtagtg PI437.654 347

3 tttttttGtgtagt Bell 348

4 ataatatATgaaattt PI437.654 349

4 ataatatgaatttc Bell 350

5 aatttcGtatgta Bell 351

5 aatttcAtatgtac Mandarin 352

6 acatttGgattaa Mandarin 353

6 acatttAgattaaa Bell 354

7 taaaaaaAtattact Bell 355

7 taaaaaaGtattact PI437.654 356

8 gaacaaTttgtaat Mandarin 357

8 gaacaaCttgtaa Bell 358

9 cttcaCgaagg Mandarin 359

9 tcttcaTgaagg Bell 360

php10355B G pha11627 1 gtttctcttatGac PI437.654 361

(1,2) 1 tcTTATCttatGac BSRIOI 362

1 tcTTATCttatAac S100 363

2 gtttcTgataac PI437.654 364

2 gtttcAgataac BSR101 365

php12105A S pha12105 1 ttgggAatgatg BSR101 366

(1) 1 tgggGatgatg PI437.654 367

pK069A G pha10633 1 atctaaCatttagt PI437.654 368

(1,2) 1 aatctaaAatttagt BSR101 369

pK079A L26 pha11074 1 acttcAagtgga BSR101 370

(1) 1 cttcGagtgga PI437.654 371

L26 pha11075 2 gacaatCtaaaaa BSR101 372

(1) 2 gacaatTtaaaaa PI437.654 373

pK401A K pha10632 1 aagaatCttccta Tokyo 374

(4§) 1 aaagaatAttccta BSRIOI 375

2 atgtgtttggttt Tokyo 376

2 tgtgtGTttggt BSRIOI 377

3 tatttttaaatcg Tokyo 378

3 tatttttTaaatcg BSR101 379

pK418A N pha11628 1 ttaattaCcttaag BSR101 380

(1,2) 1 ttaattaTcttaag Archer 381

2 tgcaaaaaaataaag BSRIOI 382

2 tgcaaaAaaaataaa Archer 383

2 actttatCtttttg PI437.654 384

3 aagtCcaGcac Bell 385

3 aagtCcaAcact PI437.654 386

3 aagtTcaGcact BSRIOI 387

4 atgCcCttttgt PI437.654 388

4 gatgCcAttttg BSRIOI 389

4 ggatgTcAtttt PI340.046 390

5 tttgctTtgtatg PI437.654 391

5 ttgctCtgtatg BSRIOI 392

6 agaattCgcatat PI437.654 393

6 gaattTgcatatc BSRIOI 394

7 tcatttAcccaaa PI437.654 395

7 tcatttGcccaa BSRIOI 396

8 tgtgctTatgag PI437.654 397

8 gtgctGatgag BSRIOI 398

9 taattTAgcttaag PI437.654 399

9 ttaattCGgcttaa BSRIOI 400

10 ctgaaTgaggg PI437.654 401

10 ctgaaCgaggg BSRIOI 402

11 aattgtaAtcattg PI437.654 403

11 attgtaGtcattg BSR101 404

12 gataacCactca BSRIOI 405

12 gataacactcatt Sanga 406

13 TCTGAACGAAGA BSRIOI 407

13 agaataatatgcg PI103.091 408

14 attatatCacatgt PI340.046 409

14 attatatTacatgta BSRIOI 410

15 tatataGggctg PI437.654 411

15 atatataTggctg Sanga 412

I pha11133 1 gtataaaAaaagg PI437.654 413

(2) 1 gtataaaaaaggg Lincoln 414

2 atccaCtaaatg PI437.654 415

2 atccaGtaaatg Lincoln 416

3 tcgataActtattt Lincoln 417

3 tcgataGcttatt PI437.654 418

4 atacGTTAGaaga Ogden 419

4 tatacTCaagact PI437.654 420

4 acTCTAGaagac Lincoln 421

4 tacTCTGGaaga CNS 422

5 aaatagTtaagatt PI437.654 423

5 aaatagCtaagatt Ogden 424

6 taacttaaAataaaa PI437.654 425

6 aacttaaGataaaa CNS 426

7 ttTcAtattaacc PI437.654 427

7 ttGcGtCCTCA Ogden 428

8 aatcttCtataatc PI437.654 429

8 taatcttTtataatc Ogden 430

9 ttacaaGtttgag Ogden 431

9 ttacaaAtttgagt PI437.654 432

10 aaagaCtacttaa PI437.654 433

10 aaagaTtacttaaa Ogden 434

11 agattcAtatgttt PI437.654 435

11 attcGTATGtatg Ogden 436

12 atacatACaaataa PI437.654 437

12 atacatATaaataag Ogden 438

12 atacatGCaaataa Lincoln 439

13 tttgttTGagaaaa Ogden 440

13 ttttgttCAagaaa PI437.654 441

13 tttgttTAagaaaat Lincoln 442

14 ttatttCtttAttatt PI437.654 443

14 ttatttTtttAttatt Lincoln 444

14 tatttCtttGttatt Ogden 445

15 aaatggCaaattg PI437.654 446

15 aaatggTaaattgt Lincoln 447

16 agttgGtctttg PI437.654 448

16 tagttgAtctttg Lincoln 449

17 attaacaGtaaagt PI437.654 450

17 tattaacataaaAgt Lincoln 451

18 ataaaagaatatat PI437.654 452

18 ataaaaAaatatat Lincoln 453

19 cttctttCattttt PI437.654 454

19 cttctttAattttta Lincoln 455

20 tttactatGaaaga PI437.654 456

20 tttactatAaaaga Lincoln 457

21 aacattCactataa Lincoln 458

21 gaacattAactata PI437.654 459

22 taacattTgcataa Lincoln 460

22 aacattCgcataa PI437.654 461

23 ttatataAtacataa PI437.654 462

23 ttatataTtacataa Lincoln 463

24 cctcaTctaatg PI437.654 464

24 cctcaActaatg Lincoln 465

25 atccttGtttttg Lincoln 466

25 aatccttTtttttg PI437.654 467

26 aatccCcagaaa Lincoln 468

26 taatccTcagaaa PI437.654 469

27 atggaAgcgtc PI437.654 470

27 tggaGgcgtc Ogden 471

28 ggttgTggcg Ogden 472

28 ggttgAggcg PI437.654 473

pL058A B36 pha10641 1 actgcTtataaca BSR101 474

(2) 1 actgcCtataac AKHarrow 475

2 aaaattGccacg BSRIOI 476

2 aaaattAccacgt AKHarrow 477

3 ttctttTgtgaca BSRIOI 478

3 ttctttGgtgac AKHarrow 479

pL183A ϋ pha11136 1 ggtgaGaaaaag PI437.654 480

(1) 1 ggtgaAaaaaagt Resnik 481

2 agtcaCattattc PI437.654 482

2 tagtcaTattattc Resnik 483

3 taggtaCaaagtt PI437.654 484

3 aggtaTaaagttc Resnik 485

4 ttctctGtgttg PI437.654 486

4 ttctctCtgttg BSRIOI 487

5 ttgtgaAcataca PI437.654 488

5 tgtgaGcataca BSRIOI 489

6 tgataGatcttca Lincoln 490

6 gtgataAatcttc PI437.654 491

7 tacctactatgat PI437.654 492

7 tacctaTActatg BSRIOI 493

8 agaattAtgttgtt PI437.654 494

8 agaattGtgttgt BSRIOI 495

9 agtatttAtaccaa BSRIOI 496

9 agtatttTtaccaa PI437.654 497

10 ataaaaaGgatgaa BSRIOI 498

10 taaaaaTgatgaag PI437.654 499

11 tatattAgaggat Resnik 500

11 tatattTgaggat PI86050 501

12 atttatAcgagga PI437.654 502

12 atttatGcgagg Resnik 503

13 ttgtattATcaaatt Lincoln 504

13 tgtattTACcaaat PI437.654 505

14 gtttcaAacaca Resnik 506

14 gtttcaCacaca PI86050 507

15 gatccGtatcc Resnik 508

15 agatccAtatcc PI86050 509

16 tatccGaccca PI86050 510

16 tatccAacccat Resnik 511

F pha10658 1 ggaacGttacc BSR101 512

(1) 1 tggaacAttacc PI340.046 513

2 atgcttAactaac Burlison 514

2 tgcttGactaac BSRIOI 515

3 aaacacATaaaatg BSRIOI 516

3 aaaacacaaaatga Burlison 517

4 tcctgtAttttag BSRIOI 518

4 cctgtGttttag PI88788 519

5 cgttttAAaaatttt PI88788 520

5 cgttttTTaaatttt BSRIOI 521

6 tgataaaGttattta BSRIOI 522

6 tgataaaAttattta PI88788 523

7 tatttacTggttt BSRIOI 524

7 tatttacAggttt PI88788 525

8 aattttaTatttatg PI88788 526

8 aattttaGatttatg BSRIOI 527

9 aatttaGcacttc PI88788 528

9 aatttaAcacttc BSRIOI 529

10 gtgcGTtaTgc Burlison 530

10 gtgcTAtaAgct Proto 531

10 gtgcTTtaAgct BSRIOI 532

11 atttttTGgtatg PI88788 533

11 atttttTAgtatgg Burlison 534

11 aatttttGAgtatg BSRIOI 535

12 gaaaaatCaaggt BSRIOI 536

12 gaaaaatAaaggta PI88788 537

13 ggtttTgccga PI88788 538

13 ggtttGgccg BSRIOI 539

14 tcaattCtcttag BSRIOI 540

14 tcaattTtcttagt Proto 541

15 tttatttTTaaaaaaa BSRIOI 542

15 tttatttTAaaaaaaa Proto 543

15 tttatttATaaaaaaa Burlison 544

15 tttatttAAaaaaaaa PI88788 545

16 atttttGaaaattc BSRIOI 546

16 catttttTaaaattc PI88788 547

17 tttcaatTtgtcat PI88788 548

17 ttcaatGtgtcat BSRIOI 549

18 acagaaCtcaac PI88788 550

18 acagaaTtcaaca BSR101 551

19 ctatTgaAggttt PI88788 552

19 tatGgaGggttt BSRIOI 553

20 gactaaAgtgag Burlison 554

20 gactaaTgtgag BSRIOI 555

21 taacacTtaacac Burlison 556

21 taacacAtaacac BSRIOI 557

22 gtaaagttcaag Burlison 558

22 taaAGTTagttcaa BSRIOI 559

23 agaaaaCaactgt BSRIOI 560

23 tagaaaaTaactgt Burlison 561

24 gccAAAACAGTt BSRIOI 562

24 caagcctaatag PI340.046 563

pR153A D pha10636 1 actataGttcgc PI437.654 564

(1§) 1 actataCttcgc Lincoln 565

pT005A G pha10783 1 atgccAgcatg BSR101 566

(2) 1 atgccTgcatg AKHarrow 567

2 gtgtatGgttgt BSRIOI 568

2 tgtgtatAgttgt AKHarrow 569

3 gcatGtcgac BSRIOI 570

3 gcatCtcgac AKHarrow 571

4 aaaaagTatgagg BSRIOI 572

4 aaaaaagCatgag AKHarrow 573

5 gtgggcattag Williams 574

5 gtgggCcatta Mandarin 575

pT155A A pha10647 1 tgtgaaTattagc PI437.654 576

(1) 1 gtgaaCattagc BSRIOI 577

2 tgtatcGtaaatc BSRIOI 578

2 tgtatctaaatctt PI437.654 579

SOYBPSP O pha08230 1 taaaattGttggtt BSR101 580

(1) 1 taaaattAttggtt PI437.654 581

† Soybean populations used to confirm the map location of each marker locus by ASH were (1) PI437.654 X BSR101, (2) Bell X YB17E, (3) B152 X Century 84, and (4) Shoemaker and Olson 1993.

§ indicates that the marker's map location was determined as an RFLP or AFLP marker and was not confirmed by ASH.

‡ Upper-case letters show polymorphic nucleotides in soybean.
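The upper-case convention used for the probe sequences in Table 2 can be read programmatically. The short Python sketch below is an illustration only; the function name `polymorphic_bases` is an assumption and not part of the original disclosure. It returns the zero-based position and identity of each upper-case (polymorphic) nucleotide in a probe, using probes listed in Table 2 as input.

```python
def polymorphic_bases(probe):
    """Return (index, base) pairs for upper-case nucleotides in a probe.

    Assumption: the probe follows the Table 2 convention in which
    upper-case letters mark polymorphic nucleotides and lower-case
    letters mark monomorphic flanking nucleotides.
    """
    return [(i, base) for i, base in enumerate(probe) if base.isupper()]


# Probes for the two SOYBPSP alleles and one two-site probe from Table 2.
print(polymorphic_bases("taaaattGttggtt"))
print(polymorphic_bases("taaaattAttggtt"))
print(polymorphic_bases("ggcaaAttttCc"))
```

Probes written entirely in lower case, such as those for alleles that carry a deletion, return an empty list, so the allele must then be inferred by comparison with the other probes of the same sub-locus.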

Table 3. Forward and reverse PCR primer sequences for soybean marker loci.

Table 4: Sequences of nucleotide polymorphism regions:

pha08230 (SEQ ID NO:712)

TTGAAGGTTG TAAGAGTCTC GGTCGTCGT TGTTCACCAA GGTAAGAATG GCAGTCCCT GTAATTAACA AAACAAATTC AATAACATAA TTTTATGAGA AGTAAAAGCA TGTGATGATG TTTATTAATT AATTAAATAA GTGTCATAGT AAGATTAGTT TTTATAAAAT T[G/A]TTGG TTTTGTTTTT ATTTATTTAA TAATTTTTCT

TAATCCGCAT AAAATAATGA CATTTATTTT ATTTTGAGAG GAAAGGAATA CTAACCGTTA AGGATAACGA TGAGGTAATC AGCGTCAGCA TGGTGGGGGA GAAGAAGGGT GTTGGGTTTG GAGTTGAACT CCAAAATGCG GTAGTCTCG GAGATTCTGA AGCTGTTGGG AGCGTTTGT TGAACCTCTG GAGGACGCGA ACGTGGCCAT ATTGGTTTTT GAAGAGAGTT TGGAACCTTT TAGAGTTGAA

GTGAAAAGGG TTCTTATTCT TATGTCTTCG TGGTTCTCTT TGAGACTCAG AACCTTCACT TTCTTGGCTC TCTTTGTCTT GCTCCTCATC CTCGTCTTGG TCTTCTTCTT CTTCTTCACT TTCCTTTCCT TGGTGCTTTT CCTGCTTGTG TTGCCATTCG TGCTTTTCCT CTTCCTTTTG ATGAGGTTGG TGTG

pha10598 (SEQ ID NO:713)

CTGCAGCTCC TTCTGCATCT TCCACCTCAA TAGTGATGGN TAACTCTATG GAAGATGCAG TGGTAGTGTG GTGACCAGAT GTTCCTCCAC AATAATGCTC CATTTCTTTG ACCTCAGGGA TTCCAGTGAT CTCCATGTCT TGACCTCTGC CCACATTCCA TTGTCTTGAC CTCCTCCCAC ATTCCATTGT CTTGACCTCC

TCCCCCTCTT GTGATGTTAT CAAAGGTGGC TCCTCGTTTG TCCAAGAATA AACAGCTACA TACATCACTG CATTCTGAAT CTGAAACACT CTCCTCTACT CTTGTTCACA TGGTCTATTG AATGGGAAT CAACTTCTTC GTGAGTGGGT TGTTCATTT TTGACCTTGT CCCCAAACTT GATGCCTTAA TTTTGCATCC ACCTTTCCAC TCACTGCATG ATTTCTTCAA TTCCCCTGAC CTTGATATAG

AGGGTTGTGC CTCTGCCCCT TCTCAGGAAT GA[C/T]TTT GACTCTTCAA TAGCTTGTGC TGCATTTGCT TCAACTCTTA AATGTTTACC TTGTCAGAA CAAGGCTTCA ACCCACAAGA ATTGGCTAG CTCCTTCAGG GACTCTGAAC TCCGTTGGGG TCACTGCTCG ATTCTTGTTT GGAACTAAGC ATTAGCACTG CAG

pha10615 (SEQ ID NO:714, 715)

CTGCAGAGAG GAGTGGTGTT CATGCTTTCC CTGCTGGGGT GTGCCCACTG TGGGTGCTG GTGGCCACTT CCTGGTGGT GGCTATGGAA ATCTCATGTG TGGATAATA TCATTGATGC AAGATTGGTT GATGTAAACG GTAACATACT TGACAGAAAG TCAATGGGGG AAGATCAGTT TTGGGCCATA AGAGGAGGTG GTGGGGGAAG TTTTGGTGTC ATTC[A/T]T TCATGGAAGA TCAAGTTTGT TTTTGTGACT CCAAAAGTGA CTGTTTTCAA AGTGATGAGA AACTTGGAGT TGGAAGATGG TGCAAAGGGT CTTGTTTACA AGTGGCAATT GATTGCAACA AAATTGCATG AAGATCTTTT CATAAGAGTG ATGCATGATG TGGTTGATGG CACTCAAAAT GCCAATAAGA AGACCATTCA GGTTACTTTT ATTGGTTTGT

TCTTGGCAA GGTGATCAAA TGTTGGAGTT TGGTAAATGA GAGTNTCCT GAATTGGGTT TGGAGCAAAG TGACTGCATT GAAATGCCAT GGATCAACTC CACCCTTTAT TGGTTCAATT ACCCAATTGG GACCCCCATT GTTTGATGTG CCCAAAGAGC CCCTTTCACA TAGCTTCAAA ACCATGTCAG ATTATGTGAA GAGGCCCATT AGAGAAACTG CTCTTAAGTC CATTATGATT AAAAGTGAGA GAGAGTGTGA GGATGGAATG GAATCCTTAT GGTGGAAAGA TGCATGAGAT TTCACCATCA GAAACTCCAT TTCCTCATAG AGCAGGGAAC TTGTTCTTGA TTGAGTACTT AACATCTTGG GGGCAAGATG GGTTTGGATG CAGGTAATCC GTTACCTAAA CATTTCAAGG TCA...( - 1300 bp)...AACATTAAGA CTATATTTAG ATTTTTTTTG AAAGACAAAA GAATATTATT GATGATGGTT TGGCCCGCAA GGTGTACAAA AACATGAGGA TAACAGTTTT TCCTCCCAAA ACAAAAAGCT CGAGCACCAT ATGACTGAAA AGTGAAAACA AGACATATAC AATGCAAGTT AAATTTAGAG ATATATCTCT AAATGCTGCA ACTAAGCCCG GAAAGTCAAA CAAAAGCTAA CATATATCCA TCTGCAG

pha10618 (SEQ ID NO:716, 717)

CTGCAGCACT CGAAAGTGAC CTGCATCAGC CTTCAACCTT CACCCTTATG CAGCAACAAC CAAAACCATA ATTTAAAACC TATTTTGAAA CATATCCATG TACATTTTCC CGCTATCAAG CCTGTTCTTT TAACAATAAT CATTATTCTC TTGTCCAAAA ACTGGAAGCA ACTGAGATCA AGACAAATTG AGTGCAAAAT CTACACAATC TATGTTCAAA ATATTGAGTG CAGGTCTCTT TCTTGCAATT CCAAGTCGTA TCTATTGGGG TCACACACAA ATGGAATATC ACTAAAGCTA TCGAAGAA GATTCCACCC AGATCATCAT CAGTAGCTA TTTGTGTCTG TTGATCCTTT GTGGCTTCCC CATCA[T/G] CTTCTTCTCC [A/C]GAAAC ATTTGAATTC TTTTCAACT AATTCCCTAA ATAACGA ( ~ 410 bp) TGATGGTATG TGGAGATTTA ACTTACTGTA GGTTCCAAGG TAGAGATATT TGTTTCCCAA AAACCCTTC

CTATTCTTGC TTCCCATCTA CCAGTGTGG TGGTGCCTGC AAAACATTCA CCATGTGTTA GTGATTCATC TTCGCAAGCA ACAATTCAGA AGCCAATTCG AGTCGTATTC ATCAGAAATA TCCACCTCGC AACACCCCTA TACTTTGATA CACCTCTTGA AAAGCCACTG CTCTTTCTGA TCAATTTCAA AATAACCAAA GTCAAAAAAT CATATATAAT ATCAAATTCT GTACAACTTT ATATGCTGCA

AACACTACTA CTATAGTGCT ATTGCTATCT CTTATAAAAT ATAAAACTGC AG

pha10620 (SEQ ID NO:718)

ACTGATGCTG GGCGTGTTCA TAACCTTGCT CCCATACTAA CAGAGCATCT GTTTTCCTTC CCAATGCAGA AAGGGCGTGA CCTAGATGAC AGAATCAACC

AGTGCTAAGT TCAAAAGACA AGGCATTTGC CTCTATGACT CTTTCTATGG TAAAATTTAA ACAAAACTGT CGCGTGATGA TAATAATCCA TAAACAGAAC GAATAAATGA TAAATCCATC AAAATATAAA TTCCAAGTTC CAACACGGTT CTTAACTTTT TATTCATCAA CCAGTAACCA CCTCTGTTAT TCTTAAAAGA ACTTAAATCG CACTTCCCAG GGTACAAAAA GTTAAAAAGA TTCATCAAGA

ATTGGATGAA TTTAACAAGC AAGAGACCTT TGAGAATGTA GGCTTGAAGT CGCGAGGGGT CAAGCTGAAG CGCTTTGTTG CAATCCTTAA TTACGTGCT TGTGCAGCTC CAATCGGCTG TAACAGAAG GCCCTGTTAC TGAAAGCAAG CATTGAACAT CACATTAGCA TTCACGGAAG AGAATTAAAC TTAATCGTGA T[C/T]CGTG ATGAAAAAGA AACAGAATCG GAGAGGAATG GAGGTCCGAA CCAGATGTCT TGGATTGCA CCAGACTGA GAAACGAGCG AATCGAGGAC TCGAATGGCC TTGGACCAGT CCTTGGAG

pha10621 (SEQ ID NO:719)

CAGCTAAACC TTACAAGGAT GATTGGTCAA GAAAAAGGA ATTCCCAGAA

ACGAGGAACT CACTCAGGCT ATGATAGTGA TGATAACATG TGTGAACAAC GTTCTCTTTC AAAGAATAAC ATAAATGCAA TTCGTCATTC ACAAGTAGGG [T/C]CTCT GAATCTATTT TCTTCATACT T[C]GTACTC TATGATAATC ATACTTTCC TGTTCCCTAT CTTTCACAAC ATAAAATCTT GAGGATTAAG TGGGGAGAT TTTGGCTACT ATTTATATAT CTAATAATC ACTATGAATA TTGAATATGC

AGTCTGGTGT GTGACACACA TTTGGGCTAA AACCAAACTC ATATTTAGTT GAAAAAAATA AGACTCGTGG ATCTTCTATT TATAATATGG GCATACATTC TTATATGGAG TA[TAATCTT TTCAGGTTCC ACGAGTA]AT TGATACATT ATGGCAACTT CAGTCCAGGG ATATGCTGC CAGCTAT

pha10623

AATT GAAATCTATT ATCAATGGTG AATTCCCTG CTGGTTGGGA GAAAGCACTT CCGGTGAGT AAATTCTAAA CTCACAGGTT TTTCTTACAT GTTTGTAACT TTTCAAGGTT TGGATT[AC/TT]ATACTAT CCAAAGTTGA TCATGA[ATC TGA]TTTCAG [TT/CAAGTA TCA]TGATTT GGTAGTATGG GGCTTCAAAA AGTTATTCCA AATATTAAAA

CCTGGCTTCC AATTTGATTT CAGACATACA CTCCAGAGAG CCCAGCGGAT GCCACCAGAA ACCTGTCTCA AACAAACCTT AATGCCCTTG CAAAGGTTCT TCCCGGTCTG CTTGGTGGCA GTGCAGATCT TGCTTCTTCC AACATGACCT TGCTCAAAAT GTT[T/C]GG GGACTTCCAA AAGGATACTC CAGCAGAGCG TAATGTTAGA TTCGGTGTTA GAGAACA[C/T]GGAATGGG AGCTATCTGC AA[C/T]GG

CATTGCTCTT CACAGCCCTG GACTGATTC CATATTGTGC AACCTTCTTT GG

pha10624 (SEQ ID NO:721, 722)

CTGCAGGTAC TTTTTATGAA TCAACTGCTT CTATTTCACA GCTTGTTTTC ATTTATCTCT TCTAAAAGGG GAAGAGGGAA AAATTGTATT ATTTGCTAAA

ATATGATTTT CTTTTATCTA GAGGGACTA TGTGAAATGG AGAGGAGTG CTATTTCTTC _GGTTCCTTTT GTCACTTGTT GACGAATCAG AAAAGATAAG GCAGTTAGCG GATTTTCTCT TTGGAAATAT TTTGAAAGGT TTGAA[G/T] CTTTATCTTG ATAGTTATAC TTGCCATTAT ATATAAATTG GAAAATATGT TTGTCATTGG ACTTTAATTA TTGTGTTTTT CTCACCAGT CAAGTCTCCT

CTTTTAGCAT ACAATAGTT TTGTTGAGGC TGTTTTTGTT CTGAACGACT GTCATGTCCA TAATGGGCAT CGTGAGTCTC AAGGATCACG AA...(~ 970 bp)...AAATCTTGAA AGAATTAATA ATAAAAAAAC TATTTTATGG GGACAAAATA CTTATTTAAG CCTTCTAATT ATAACATTAA TTGTGGGATC AGGATGCTTT TCAAATTCTC GGCTGTAAAG AGATACGCAT TTCATCCACT CGTGCATCAT

CTGAGTCAGC AGATGTAGAG GAGGAAGGGG GAGA

pha10632 (SEQ ID NO:723)

ACAAGCAGAG TACACCACGT TAGCAATCAG CAATCAATGT CAGAGCACAA GAAACTGAAT CTAGAAAGAC CCTATCAACT TTCCTGTTTC CATTGCATGG

CCACAGATTG TGTTTTAATA AAGTTCAGTA ATTGATCACA CTCATTGTTT GTACAATCTA TTATTCAAAA CTACTAGGCC TACAAATCAT GTTTAGCAGA ATATACACTA AATAACCCAT CTGACAGACT CAGAATGCAA CTAAATGAT GAAGGAACCA ACACTTGCAT AACAATTTG ATCACACGGT GTTTCAAATT TCACATTGTA AAGGTAACCA AAAAAGAAGA AAAAGAAT[A/C]TTCCTAG

ATGAAATTTA CAAGCAAGTG TAAAACAACA CATGGTTAAA TGTGT[GT]T TGGTTTTTCT TTGTTATTAT TATTATTTTT [T]AAATCG GCAAATGAGT GAGGAGTGCT GACA[A/G] CACACTTATT ACTTGTTAAA ATTTATGGAA AACTAAAAAA ATCATAAGAT TTAGTGAGAC TAACAAAATT TAGTCAATAA AAAAAA

pha10633 (SEQ ID NO:724)

CTGCAGTGGC ATTAGGCTCA TNATCACAAA ACAAATCCAT AACAGAAAAT TACCGAGTCT GAGAACAAG GACAAATCAA TAGGTGAGAC GAAGAAAGN AAGGACAGGC AATTNATGAT TTTTAAAAAG GAAAAGCAAA GATAGATGTT AAATAAATTC CAGTGTTGTG GCCTCGANAG ANAAATTGCT AAGATAAAAT

CTAA[A/C]A TTTAGTACTA TAGCAAGAAA CACATCCCCA ACATACGTTT GTTGAATATC TATGATATAC TTCTTACATT GATCAGTTAC TGTCTTCCAT TGATTTGGGG TCCATTAAAA GCAGTATTC TGGTGATCAG GAGCACTCCT GTTAAGTGG GGAAAAAAGT ATCAATTCTT TAAATGCTGA

pha10634 (SEQ ID NO:725)

TTCTATCATT AGCTCCGGAA TTGTTAACTC TTGATAACAA GTTTTCTGTA

CTTACAGGTG CAGCCCTCTT GCTGTCAATG CTGCTCAAAG TTATCTCTTC CCTTGGTCCC AGTTGAGTAT TTATCAGTTT TTCCTGAATA AAAAGCAATC AAGTATTCAA TATCACAGAA TACCTATGAA ATGTCACAGT ATAAAGGAAT ATTATGGGAA CAGTACCCTC AATGCAGAAT TTTCCATCC TCATCTGCTC AGAACCTTCA GTCAGTCGA GTTATTTCTG ACTTCAGAGA CACATTCTCA GCAGTTAACA TCTCAACTTT TCGTGCCAAT TCTTCAGTCT CGGCCTAAAC

AGAGGAAATA TGAATATTAG TTATTAACTT TCCTCTTGTC AGAACAATCC TTTTCCTTT TTAGTTATTT TATTTTATTT TTTGAGGCTA CATATGTGTA CATGCAACAA GTGAATAAAC ATGTGATAGT AATAAAGAAA CTAAACAATT TGTCTTTTTA TCAGACAAGA CACGATGGAA GGTTTGCAAG TCAACAAACT AAAT[T/G]T ATACCTGCTT CCTCAGCCTG GACCTTCTAG CAGATTCACG

GTTAGATTGT TTTCTCCTCT CCCGTTTCAG CTCACGCTCA TTCTAAGGAA TTAAAAATAC ACACAGGAAT TAATTTACTG TCAGTAAAAG ATGGACGGTA TTGCATAATT GTTCTAATGT GTTTCACAA GCATCTCTAA TGTACTAGAC ATGATCCGA GACATTAGAA CAAACCATTA TACCTGTAAC CAAGTTTCAT TACGCACAAC TGCACAAGGT TGTGCGGCAC TTGTGGAATT TGCCTTAGAA

TGAACAGTCG AAGGGTTCCT CAGCTCCAGT GCTGTGGCCA TACCTGAAGA AACTACAGGT CCAACTAATG TTCCTGCAAC ACTAGCTGGG TAGCTGACAC AATCTTTTGG AAGATGCAGT CTCTTGGAAG CCCGGACCAT TTGTAGCTCA GTTTTCCCTT CTGCATCTGC TTGTACAAGA TTGATATAAT ATATAACCCT CAAAACACAT ATTATGTTGA CAACTTAATT CCTAAAATTA ATAGCTGAAG

TTACTTAAAT ATCACATTCT GCATGCAAGT ATGACATTTT CTTGATAATT GCATACAAAG AACACCAAAA GCTTGACTGT ATTAAGTCTG AAAACCAGGT AAAAACCCAT TCAAGTATCC CATTTGTCAA ACTAATCAGA AAGTTAAAAC TTGGGTAATT TCATTAACTG TGCATAAGTA ATAGGATAAT ATCAACTTAT CAAGACCATA GTACCATACT CAATGATCTA GTTGACAACA TTGCACAGTC

ATGACACCAA AAGTATAACT GTAATCCAAC ACAACTCTTT TGTTCATCCA GATGCAACAG TGAATAGGAG TCATACCAGT GATCGGTGTT CCTTCTCGGC TTCTTTTCCT TTTTGTTTGA TTAGCCTGAG CATAAAATTA ACAAATAACA TAATTTAGTG CTTCAATTTG TCAGATCATT TAAATAANAT AGACTGCTAG TGCACACAAA CAACAGATGT TAATGGTGAG AAAGTTCTCA CCCTGCAG

pha10635 (SEQ ID NO:726)

CTGCAGTTTT TCCCAGAAAA AAAATGGTTA CGATTTAGAA CTAATGATGT

AGCACAAATA ATTGTTCAAA TCATTATGCT TCCAGCAACA AAAACTCAGA TGCAACAATG GAGATGCCCA TGGCGAATCA AAATGCACTG GAAAATAGAT

TGGCACGATC ATAAAAGATC TTCTACACT TCTAATGCCT ATTTAGGTGT GCTTGATAA AGATGAATTA GG[G]GGCAG GGATAGTTTT AAGAAATGAA AAATGTTAAC TTGTTACTAT ATGTA[A/C] CAAAAGTAGA AACATCATCA TTTCATTACA CCAAATGTTT TCATCTCTAT TTTAAGATGA CAATGAAGTG AACATATAAT AAAAGAGCTC AATTTCGATA CACTTGGTAT AAAAGTTTTT

ATATTGACA ACTGGTTAGA AATCTCCCTA GATATGACT AAAATAATTT TTACAAAAGT CAACAATTTT TAATTATGTG ACAATTTATG ATTGGATAAC AGTGTAAAAA GAACATTTGT ATTGTTTGTG TATTTCCTAT TAAATTCATA AAATAACATT TCTTTCATGC TTATTTTTAT GTTTTTCTTG GAAATACTTT TCACCTATCC AACCATGCAC ATTTTTAACT AAAACCAAAA ATAGAAGCAA

TAACATACTT GAATTGATTG ATTCCATCAG CAAGAGCCGA ACCAAAAAGG ACCATAATCT TGTCTGAGAT GCCAAAGGCA ATATTTTTCC TTGACACTAC AGCAATCTGC AG

pha10636 (SEQ ID NO:727, 728)

CTGCAGATTC ACTTCAGGAA TGGAACAAAA ACCACCCAAC CAAGCACTGC TCACAAAAGG AATTTCATCT GTAAGTCGTC TAAGTCCACA TTCTCATTTT TTATTTTTTT GCTAATTATT TTCAACATCT TCAAGGTTCT TTTGTTTTTA ACTAGTAATA GCGCCACTAA CTACCATTTT TTAAGGGGAA AAAAATTGGT TACTGTGTGT TTTTTCTTCT GTGGTCTTGA TCCAGACTAG TTTACATGTT TTGGCGTTTT GGGTGTTGCA A[T]TTTTTT TTTGGAAAGG TTGTTATAGT GGACTAGTGG CCATGGATGG TATTATGAGA GTTGATTAAG TAGGGGAAAA AAAGGGTTTT TGTGGTTTTA AAAACTAGG CAGCGTGACA AACTGAGCAT

AGGGTTCCA TTATTTTTAA CTTTTTTTAT TGAGAAAAAA AATAATATTT TTGAGCCGTT TGGTTCCACT ATA[C/G]TT CGCCATCATG AAAAGCCGTG TCCTTTGTGG CTTTCTTACA ATTCTACTTT ACTCTTACTC TACTCAAGAG TTAAGATTCC TTTTTCAAAG CAACTTCAT TTACCCATCG GACATCAACC CTATTTATT AACTATGTTT CCATGTCGAC CTTTGATGTA CAACACAAAA

ACAAACCAT AGCCATGCCC TTGTATTCTT GGCCAAAACA AGAGAGAGAA AGAGAGAGAG TAAAAATCTA TGCTTTTCCT CCTCGGGAGT ATCAAAATGT TTGTCGGACT CTCCCAAAAG TCATCAGTCA CCTTTTAAAC TTTTAAGATA TACATGATAT TATCCACAAC AACGGTAAAA TTATGCATTG TATGTCGGGT TACTCCACTT AACTATATTT TATTAACTTC TAAAAAACAC TATTTACACT

CGTCACTTCT TTACATCGAA AATCTGTAGT ATTATACATA CATTATAGTA ACAAATTTCA CATTAAAAAT AAGAGTATAA TATAAGAAAG TATGCACCTT TAATAACGCG TAAACGTGAA TTCTTTTTTG GCTTTTGAAG CGTTGAGCCA ACCTTTAAGC AAAATATTCA AATTTAACTC AAGTCATTGT CGGTAGTTAG CTTATTTTGG TTATGATATG ATAGCCAGTT TAACGACAAT TCAAAATGCC

GACAGCATGA AA TCC AGTTTGGAAT AGTGACCAAG CCNAATTATA

ACTTTACGGA TCATACCAAN CCAAAGAACA TTGAGGTTTA ATATATATAT ATATATAGCT TATGTCACTT TAATCAAAGC AGCAATTGCT GAAATTGCTG GAGACAAGAA GTTGTTTAAT GTGGTGGCTT TTTGAGAAAT AACAGACAAC CCCAATCTAA AAAAGTCCAG GAAGATATTG TTTACGTGTT GGGATTGAGA

TTGGAAGAAG AAGGTGAGAA TGTAAGAGCT GATACAGATT AGACTTGAAT AGGTTGGGGA TTCCACTTGA TATATGGTGA TTATAAGGGC TNCNAAATTT TGCTAACTTC AAGAAATAAA AACGTATTAA CTGATAAAAT GGAAGTTAAG TCAACTTTNC TCTGCAG

pha10637 (SEQ ID NO:729)

CTGCAGTTAC GAGTAATTGG GTAGGTTACT GCACTAATA CTTTAGTTGA CTTTTGAGGT GGTTGAGAT ACTAGCTATC TATTAATCTG TTCTATAAAA TACAGGCATA GCTGTTCTCT TGAACTATAG AATTTACATA GCCTTTT[C/A]CCTATTAT TAATAACACC AAGACACACA TATACTGCTA GACACTTCTG

TATATGGGAT CTCTCAAAGA TCCTTTCATC TTTCTACATT TTATTTGGAT ATGGGAATCA GGTGATCAAC CCTAACAGTT AGGGTTTGGA TGTGTTCTCC AACCACCCAT GTTTGGTTGT GTCCTATAGC TCACACATTT TGAGACCTTT TATCCTTTGA AGTTTGATTG ATTCCAACAT TCAGATTGTC TTTCCTCGAT CAATTGGCCT ACTCAACCTT [T/G]CTGT CATCTACCTT TTCATATACT

TCTACCACAT AGC

pha10638 (SEQ ID NO:730, 731)

CTGCAGCAAC AACCCCCAAA CCCCGTTGCC AGAAACAAGA CAATTTGTCA AATGCAGAGC ATATATAAAA AATAGAGAGT AGGATCATCA CCTTTGCTGG

TTACTGGAAT ATTTTCTAAG TCCATTGAAC TCATTTTCTC TGATGCAGAG CTCAAAGACC TGAAAAGCCA TTGTTTGTAA TAAGAACAGT AGACCATATA TTTACTACCA TCAAGGTCTA AAAACACAAA ACAAAGTTCA CCTGCTAGCA CCCCAAACAT TTGAGTTGCT TGATGAAAAA TATCATCTAC AATTGAATA TACTAGCTTG ATGCCTATTT GTTTCTAAAC CCATATCCA AATA[C/A]T

AGTACCTCTC ACAATAGCTT TAAAGTAAAC ATGGAAATTC ATCAACCCCT AAAT[G/C]C TCAACCTTAA TTGCCTCAGA GGCCATTTTA TTAATATAGA TGGAAACCAG AAACATGTCC AACTATTACC TTGAAAAACT ATCATATTAG CAACTAAGGA [G/A]GTTGC TGTATTTATC CTCT[T/C]A ACTAACCTGG TTTGTGAACA AAGAAAATTC CAATTTTCTT TCAACAATAA AAATTCAGTT TGGTTCCTAA TCACATTATG [T/C]AAAAG AGTAAACAAA ATGAGTTACA AATAAAAGGT CAAATTTTAT CAAAAAATT CTTCACCACA ATAAAGAGCA TTGTATGACT GCTCTTTTA AATTAATTTT CAATCCTTTT AAGTTGCAAC

AATGATAAAG TATCAATTAC CTAAGAGTTC CAGGGAAAAT TAATAAGAGC AAATAATTTA AGGATAATAG TTTATATGTC ACATACAACT AAAGCAGTGA TTCTTAAGGA GGTTTTCCAA GATTGCACTA TGGTTTTACA GCTAAAGAAG AGGTTATGAA TTTATTAAAC AGAAGCTGGG ACATCAATGG ATAAAATAAA TGGATCCAGA AAGGTACAAA CCCCATTATC ATCTGTTTGG AAAAATTTGA

AAGCAAAATC ...(~50 bp)...CCAACC ACATATAGAC TGCTTATTAA NACCCATCCA AGATTTCTAC TGAACCAAAA CGAATATGAT CCTAAATATN CATTGAATTA ATCTTACCCA GATGGGGGGT CATTGATTGT AGTCCCCTTT CCAAAAACCG GGGGGTTCCT AAATAAATTG AAAATAAAAA AGATTAGGAA GCTAACAGAA ATTTGAATAT AACGCTCCAA AGTTCACACC ATATCAGTCC

ATATTCCCCT TGAAATAGAA AACTAATACA ATAGAAAGTG TCACACTACC TCAAATTCCT CAAATTCTTT GTCCTCAATT GTTTCAACAC TTGGCATGGT ACTAGGTGGT GGATGAAGCA AGTCTTGTGC TTCTTCCCAT GTAATTCTCA ACTCCACAGC ATCTTCATTA TGTGTCAGCA ACCTCTTACT CTTTGGTGCA ATATTTAGAG TCTTCTCAGA AACTGAAATT GTTTGCTGC AG

pha10640 (SEQ ID NO:732, 733)

CTGCAGA ATTTTTCTCA TAATACAAGT TTAGAATTAGTTTCTCAAAA

AAACAAAAT TAAGAAAAGA AAAGTCAACA ATAAAGAAGT TTGATCCAAC CTCTTTTATC ATGGACCCAT CTGCAAGCAT GTCCAATGGA AGCATCATTA

AATCACTTAA AGCATTAAGG AATTGGAAAG GTTTGAAGGA TGATTCACAT TTGGATTCAT TGTTTTCATT GCTAACTTCA CGAGAGTCAC TATCATCAAT GCTAAATAAA TCTGAAAGC CATCTTGACC AGTCACCAAT CTGAGTACAG ATAATAAAA AGTTACTCAA GAGAACGAAC ATCATATATG ACATATATAT CCTTCATGCA ATAGCAACCA AACTAATTTT ATTAGGTAAG GTTGGCTACA

TGGATGTATC TACGACATTA AGCTCTATCA AGCACCAAAT TTTCAGT[C/T]AAACCATT TAGGGTAAGG TCTTACTAGA TGNTTTCTCA AGTTTTCTTA GGTCTTTCTC CACCTTTTTT GATTGAACTA ACTTCCATT TGATCCACCT TTCTCACTGA AGCCTCTAC TGGTCTTCTT CTCACATGAT CAAACTAAGC GAGTTTCCAT CATCTTCTCT TCAGTTGATG CTATTATAAC ACTGATGGAT

ACAT...( ~610 bp)...ACATCCTAAT GCATATTGCT GTGGGTGCAA TGGAAGTAAG ACTTACAGTG AAAAATATAT AAAGATTAAG ATTAATATTG TGTTATTTAA AAGAAGAAAG AATGCATTAA ACAACTTCAG TTTCATAACT CACAGCATTC TTCAGTTGTG CACCAGCCCC AAAGCCTGAT TTTCCAGCTG GAATAGGAAG AACCATAGAA TCACTAATGG GATCTGATAT AGGATCCATG GGCATCTCTT

CAGCGGATTC ACGAAGAATA GCATTGAACA TTGCCACATC AAGTCTACTA ACCAACTGCT CCATTACCTA CAACCACAAC AACAGTCCTA TTTCAATCAG GCTGCAG

pha10641 (SEQ ID NO:734)

CTGCAGTCAA TCATAATATT CAACCATGAA ATCGAAGTAG TATCTAACCT GTCTGCTCTC TTTTGCATTT CTGATGAGTC CATCAAGACA AGTATTGATG ACCTTTAGGT CATCCTGAAA CTTTCTTTGC CTTGGGATTA TCCACCTTGC CAATGGAATT TTCCAATAT GGAATGTAGA AAGTGGATCT ATGTTCAGC TTCAAAAAGA GTGCCATAGA CTGC[C/T]T ATAACAGACT TAAAAGCAGA

GGATATATGA GCTCTCCTAA CCAAAGGTAA ACAGAACAAA TAAAAAAGTA AGGTCTTGAA AATCATATTC CCAGCATGGC CTAAAATTTT GAGAATTAAC TAAGGTGTAT TATGTACTTA TATTGATGAT GATAAGTCCT TTTTTACCCA CACAATTGTT TGTAGAAAAC TAGTAATTAT CTTCAACGTG TAAAATCAAG AGATTGAGCT TCTGGAGTAC ATATTAATAA AGAAGTACAC TAAATAGGTA GCTAGACACT TTTAAGGAGG CCAAATATAA GAAAGAAAAT T[A/G]CCAC GTCTACTTTC ATTTACAATC AATACATCAA TATACACTTG AAATGCATAA CCTTTAAGTC AAACCTTAAT AACTGGAGAT TCTTT[T/G] GTGACAGAA

CCAAAGTCAT AGTTGAACAC ACCGATCCC AATAATATCA AGAGCCAAAC TAGAAAACTC TGCCTCAAGA TCCAATTCAA TTGAGTTAGG TCCATCATAA CCCTCTCCTT CAAGAAGCTT ATTAAACTTC AATATTGTTC GTTCTGATTA AGTTGTGAAT ATTTTGACCA TAGCTTCCAA GTATGAGTTA TGGAAAGCCA GAGCAATGAC TGCATATTTT ATAAATATCA AGTAAAAGAA AAAATAATAT

TAGGTTACAT TCATAAAACA CTTGACATTA TTAGCACAGG AAAGACAAGC TATTGAAACC ACTGATTATT CTTTAAAAGT TATTTAGACT CCTAAGTTGT AACAGAATGC TGGTATTAAT TTAGTTAATT ACTGCAG

pha10646 (SEQ ID NO:616, 735)

CC CCCAACAAAA CTAAAAATAG AACCCTCAAC AACC...(~45 bp)... TCACACCAC ACAAAACCCA TATAATGGAG AAATTATTAC ATGGTCCAAG ACACTTGTC TATAATTCTA ATTCTATGTG AC[AC]AACC GATTTTCTTG ATGTATCAT GTGAAGATT GTTCAAGATC ATAATGTTCT TGGACCATG TAAATAA

pha10647 (SEQ ID NO:736, 737)

CTGCAGAAGAA GAAGATAACG TACAAGCATC AATCAAGCT CTCACTGCTA

TTCCCCAAA ATATTTCAAA AAACAAAACA AGAAGAAGAA AAAACTGCAA TGTATGGAA AAGTGGTTC TCTCGTTCAT CTTCGTGCTC TTGCTCGTCT

TCTCTTACTC AATTTTCATT GGCACCCTCG ACATTCGATC TTATTTCTTC CCTCGCCTAA AGTTACCCGC GGCTGCACCT GCTCCCTGTG CACCCGAGCC TCCTCTTCGG GTTTTCATGT ACGATCTCCC TCGCCGATTC AACGTCGGCA TGATTGACCG CCGGAGCGCG TCGGAGACGC CGTCACGTTG AGGACTGGCC GGCGTGG... ( ~ 255 bp). ..AACTCGCC CAAGCGTTCT TCGTGCCGTT

CTTCTCGNCG CTCAGCTTCA ACACGCACGG CCACACCATG AAGGATCCCG CCACGCAGAT TGATCGCCAA TTGCAGGTGC GTCGCTAGGT TTCTCGATTG TTCCGAATTG ATTGGTGAAT CANTCAAATG CTATGCTATG AGAGTTATTT TCACAGCAGG TTTTGCATGT TTAGCCTTAG AACGCGATTA GTATGATCAC TTGAAGATTT A[C]GATACA GGATGCTTGT TGAACGACAT TAGATGCTNA

AAATCATAAG AAATTTGATG AATGCTAAT[A/G]TTCACA CCAGTCTAGA AGGTAGGAAT ATCCTAATG GAGTTTTCTC AGGATCGCAC GAGTTGAAC TCTGCTTATG TACCTGCAG

pha10648 (SEQ ID NO:738, 739)

CTGCAGCTTT GTAAAAATGT TTATGGCCA TTG GGCATG ATTC TTGAA TAGCCTTTTT ACCAAGAAA CTGCATCATC ATTCTAGAAA TT[A/T]GTA AGTTTTGTCT GATCACTTCA CAAGAGACCT TTAAA[A/G] AAGATTTATA TGGCCCAAGA ATTTTGGAAA TGCATGCGCA AACCCATAAC TGAATGTAAA GAAATTTACT AGTTTAATTG TAGTTCTCGG CTGAGTTTGT AGCCCTGCCT

TTGTTTTGTT TTGTTTTAGA CCGTTTTGTT CGGGCCTCTT TTGTTACTCA AATGTTCACC GTTATTAATG GATTTTATTA TTTGTAGTTT AATTTCTCTG ATAACCATTA TTGTTATTAT TTGTTCTTAC CATTATTAAT GGATTTTATT TTCAATAAAA CAAAAAAAAG TTGAAATGTG TGTGTGCGCT TGTGTATTAT ATAATGTGTC TGAGCTAT...( - 15 bp)... AATACAG GACACGTTTA

AATCACAAGA TATTTATCGG GAAGGTATA GATGTGTTGA TGCTCTAGCC AACCATGCA CATTAGATGA ACATAGACTT TATGATGTTT GAGCAACCTC CTACTATGTT GAGCCATTTT GATTGCTGAT TTAATAAATT TGGACTTGTG TCCATCCTGG TAACAAGCAC TCCAAAATTC CTTCTTCTCT TCCCTATTGG GGCACTCCCT CATAAGTCAT GGAAATTTCC AAATACACCC CAAACTTTTA CTTTTGCACC ACCTCTCTTA TTTTTAAATT GTTATTACAA AACTACCCTC AATTTCTTTA ACCTGTCTAT TTTGCAGACC TGCAG

pha10649 (SEQ ID NO:740)

TGATTTTCTT TTGGATCCTG AGAACAAGCC CCCCAGAATT TCACTCATG CATGTGCCAG AGTTCAGGAA TAGTCTTCCA AGGGAAACAA AGAAGCCCT CATCTTGTTT CTCAAATTCA TCTTCATTTG GAATCAAAGG AGGCCTGAC TGACTTTGTA GTAGGAGGAG GCCTTTGCAT TGCTGTGCTA CAAATGGTG

CTCCTTTCTT CCTTGAAGGA GCTCTTGGAT CCTGAAAATG AAGTAAAAT TATTATATTT TCTAAGGGCT ATAAGGTCG GCTTGGTGGT TAAAGG CAG AAGGGGGGA GGGAGAGGTC CGTGGTTTTG AGTTTGGAAT TTCTCTT AC AAACAAACTA ACAACTAGTA TTGACCGATA AAAAAAAACT GATATTTC C AGTTTATATA CTAGATGTTT AATAGGCATC ACAAGTCAGA TGCAAAGTT

TTTATATTT GTTTACTAAT TAGAAATCAT TCTTAGTATA ACTTTTAAGG TAATTGTTA TAAAAGTCAA CAAACTTATC A[C/G]A[G/C]ATGA[C/T]GATTTGTGAT T[C/T]GATG AACAGTGCAA AACACTTTTT AACTCTCATT ACATGTATTAT TTATTCATAT ACTATAACAT GTTTGATAAA CATATATTCT GTTCTGTGA TCAAGAATGC AGAAAACTAT TCAACTTTAG

GATATACAGG TTATTTAGGAC TTAATTAACT TAGACCAAGG AAGAAATTTA TACCTCAACC AAATGTTTC TTTGAAATTT GTAGGTTTTT AAACTTGTTA CAATGGTTCA GGACAACATGC TACGGAAGGA AATAATTGAC AAGTGTTAC TCACATCTGG AGAAGAAAAC ATAGCCCTC ACCCTCCA

pha10650 (SEQ ID NO:741)

CGCAATTAAC CCTCACTAAA GGGAACAAAA GCTTGCATGC CTGCAGCAAT ATAACCAGGA TTCAGAATTA ATCTAGTTAG TATATCATAC AATGCAGGGC CAGTTACAAT ACATACATA CGCATAACCA AAACAGTAAC ATCAATGGAA CAGTAATAG GACACAAT[T/C]ATTATTA T[T/C]TTTT TGTTAAGGAA

ATTTCTAAGA AAAACACAAC CATTTGTACA AAAAAGGTAT TAATACATAG CTACATGGAA GAAACCTACA TTA[A/T]AA TC[C/T]AGT AGTGAGAAAA GATGGGGGCA ATTATGATAA TCTCGGAAAG CCTCTGCCAA GGGTCAGCAT TCAAAATTGA GTTCCTTAGC CTGCTGTCTG CATATGCTTA TCCACAAGGA ATATTGTCTC CGTGAGGATT AACCAAAAGC ATACCTCAAT GGGTCCAGAT

ATCCTGAAGA TAGCGCCCAA TTTGCTGAGC ACCAA[A/T] ATATAGGGCA TCGGCAACGA GAAAGACTCC AATCTACGCC ACAAATGTCA AACTTGTGAA TGTCAAGGTT AAGAAATAAG ATTTACAATT GAAGGTCTAC AGCAGATAGA TTA[C/A]CA GGCGTGCAGA AAACACTAA TCTATATCCC CAACCTTCCT TGGCTGCAG GTCGACTCTA GAGGATCC

C CGGGTACCGA GCTCGAATTC GCCCTATAGT GAGTCGTATT ACACCCTATA GTGAGTCGTA TTACGCCCTA TAGTGAGTCG TATTAC

pha10651 (SEQ ID NO:742, 743)

AGCTTTCCAT GTTTATTCAC GGCGCCATTG ACAGCTTGTT TCTTATTTTT

TGTGTTCTGG CCATCTTCTT CATCTTCACT GTACTCACTC TCATAGTCAT CATCATACTC TTCATCCTCG TTGTCCTCAT CCTCCTCAAC TCCATCCTCC ATGATAGGTC CTTCATCAGC CAAAACAGCA CCCCTGGTGC CAGCCCGAGT TCGTTTCAAA ACTTTGAGCT TTAAATTTGT CCTTCTGTCA CAGCCCAACC AAGAATCACA CAAACAATAG CAAGTCAAAT TGTAGTTACC CTCTGATGGG GCCTGGAACT TGCCCAATAC TAATCTAGAG CCCCCTTTCA CCTTCTCAAC TGCTTCTGCA ACTACCTTAC TGGTCTCCTT CACATTNGNT CCTGACCCCT CCATAGACTC CTCAATTGCC TTAGATGCAG CAGTTACAGC AGCAGCTTCA TCCATGAAA CTAACCTTCT GAGAAAACCA CACGTTGTT TGAAACAGAA TCCGCAAGCA AAAACCAGTA GTTTTCTTCC TTATGAAATG GGTAGTAGGG GGCATGTGGA AGAGCACCAA TCAGGCTATT ACCCCTTTTA ACATTTATCC AAGCATGGAG AGTCACAATG TCCCCCTCTT GTTTACCCTC TTCACCTTCA GTCTCACAAG TTACCTC[G/A]AGTGTCAA GGAAGGCATC ATGTCCAGTA

CCGTCTCAAT GTCTTCCACT TCAGCGGAAG ATAACCCACC TGTTTGAATA AGAAGGTCAG CTCGCTCCTG AGAGTCCATG TCATGAAGTT CCTGAAATGT TCTCACTTTC TGCATTATCA AGAAGACAAA AT[T/G]CTT AGTTCCAACA CTCAATGTCA TGGGATCAAT TTGGAAATTC AATGACCAAG TAAAAAAATT CAACATTTGA TTTTCAAATA GCCAATAGTC ACTATACAGT ACATACAACT

TTGGGTTAAC TGTGCATAAA AGAGA[T/A] GGCTATGCAG TTTACGAAAG AAAAAAAA.. .

( ~ 40 bp )...CAAAAT CAGACATTAA GG[T/C]TTC CACTTTCAAA CTCTAACCTA AATTAAAAGG GGGGAAAAAG ACCAATCAAC ATATCATAGA CGCATTCTTT GTTTCTAGAT TTTCTTAATT ATTTATGCAT TTGTCAATCT TGAGGACACT

TAGCAGAAAT AAAAATTCCA TTAAATTGGA TCCAACAAGT GGCTAATTAA CTAGTTAGA TCCCCAGGAA AAATGTGCAT AAAAGGCTT GTTGACATTT TATCTGAGGT AAATCTTAAA GACCTCTAGA CCATGTAACT ACAGGTAAAC AGCATGGACA TGAAAATCCA GATCCAGACC AGGGAAATTT AATCATTTAC AATTGGATGG CAAGGTCATG AATAATGCTG ACATACTTGC GGGCCACCTT

CTTAATAAT

pha10653 (SEQ ID NO:744, 745)

GGAATGACTG CAACCTGAGA AAGACGCCCA TTATCCAAAG CTTGACGAAA TCTAGATTGC GATGCCTCAG CAACAGCTTT TACTTCTTTC TCGGGCACAG

CAAAGCATAC AGAATGCTCA CTACTAGCCT ACATAAATAC TGTTAATGAT TAATGCCATT TCTTATATAT CAGCGTGGAC AACTAGAAAA ATTGAAAAAA GTTATAAGTG CACCTGAGAT ATCATGATAA CATTAGCTCC AACATCTTTT ACTGCACCA AAAATAGCAC TGGCAGTACC TGGAACACC AGCCATTCCA GTTCTGCAAA AAAGCATCAA AGAAAANTTT ATTGGAATCT ACAACTTGGA

C[T/A]ATTA ATATTGGTTA AAGAAAACCT TA...(~ 155 bp)...TGGGAATGTT CCTTATCATA ATGGGTATGC CATATCGCAT CACAGGAATA ATTGTGCGGG GATGCAAGAC ATTGGCACCC AAATAAGACT GTACAACAGA CAATTTGCAA GTTAATCTCC TTAATTTTAC AAGCAGAAGG AGCATTGAGG CTTTCCAGCA TCTAACTCAC CATTTCCCAA GCCTCTTGAT AAGACAGTGT CTTCCAAATC

ACAGCCTCAC TAACTGCATA [C/A]GATAT TATTGTTTAT ATTTAATCCA AACATCATGT TATGGCAGTC AAACACAACA CAAAAGAATC ATCAATATGT CAGAGCTAGA ACTCCCTCTG GTTACTAAAA ACCAATTCTC ATGATCCAGT CCACTCATTG TTTAACTCA GAGACAGAGT ACGAAGCATA ACAAACCTTT TCTAGGATC TGCACTATAC ACACCATCAA CATCTGTCCA AATTGTGACC TGACGAGCCT TAAATAGAG CACCCATAAT TGCTGCCGAG AAGTCACTTC CATCTCTCTT CAGTGTGGTA GGAATGTTTT GAGGTGTGCT TGCAATGAAT CCAGTGGCAA TGATTACCTT ACATGGATTC AAAGAGTACC ATTTCTCAAG TCTCTTCTCA GATTCCA

pha10655 (SEQ ID NO:746)

AATAATATCA CAGTAAGAAA AAGACAACAG CTGTGGATGT CAGGAGGGAT TGATCCGCAC ATGCATATGG ATCCAATCGA TAAGGTGAAG TTTCTCTTGG CTTGCCTCGC ACCTTCTCAA TTAATATATA TATGTATGTG GTTTGGTGTC TGTATATGA CATATGTGCT CAGTGCCTCC GCCAAA[G/T]CAAATAAA

ATGGATACTG AAATGGTTGA GAGCATTGAT GTCACAGATG ATGCAAGGAG TTTTAGCTGG GCAGTGGACA GCGCCATAAG GAGTGA[T/G ]ACTGGGTCC ATTTTTGGAG AATCTCGTGT CCGTAGCTTT GCTTCGTTAG T[G/T]GACA GTGCCATAAG GAGTAGTAGT GCTACTGAAG CAGATTTGGA ATCATCCCTT GTCCAGGCAG AAGACAGGGC GATGAGGACT GTTGCAGCAC GATTAACAAA AGCCGTGTCA AATGCAAAAA GACATTTTGC TAAAGGATTA TTTACCCGGG CTGAGCTTAT ACCAGTCACC CAGGCTGAGC TTGAACCAGC CACCAACAAT TTCCCAGGAT TATTATTTAC CCTGGATGAG CTTAAAGCAG CCACCAATAA

TTTCTCATTT GACAACAAGA TTGGTACPV CJGGAGGCTT TG[GTA]TTG TTGAGTACAG AGGCAAACTC [A/G]TTGAT GGTCGTGAGG TTGCAATCAA GAGGGGTAAA ACCTGGTCAA ACTCATTTGG TAAGTCT[G/A]AGTTTGCC TTGTTCTCCC G[T/C]TTAC ACCACAAGAA TTTGGTTGGG CTGGTTGGAT TTTGTGAA[G/C]AAAAA[G/T]ATGAAAG GCTCTTGGTG TATGAGTACA

TGAAGAATGG G[G/A]CGTT GTATCATCAT TTGCAT[GA/AG]CAAGAAG GGTAGCAGT GTGTTGAATT GGTA

pha10658 (SEQ ID NO:747)

CTGCAGGGTT TGATCGCTA CTATGTATAA CTATTTGAC TTAGAGGAAC

TCATCTTAAT TAATGTCTTC TACACCGTCT GCCTGCAACA GAGTATGTAT GTATATATAA ACAACATTCC AGGGCCAGTG TCATACAAGT TACAGCCAAT TTTCAAAGTA ATTAAGTGAT GCTAAACTAA ATCAAGACTG AGTTTAAATT TTCAAAACTA ACTCATTTCA CATATATTTT GAGTTATTTA GGCCAACTCT TGCTAACCAA AGTTGGGTGA ATATTCTAAC TGGTCCCCAA AACTGTGAAG

CGTAGTCACT TTAGTCTCTG AAATAATAAA ATTCAAAAAA GTTTCTAAAA TTACAATCCA TATGTACTCA C[A/T]TTAG TCCAATATTC AAAAACTTGT GTGATTTAAA TGAAAATATT AACATACTTT CAGGACAAAA GTGACAATAA AATGACAACA TAAAGACCTA TTCTGAATCT TCTTACTTTC AGGGACAAAT GGGATAACAC [A/T]TAACA CTTTTAGGAC TAAAATGGTA GTTTATCTAA

TCAAAGTTTT TATGGTAA[AGTT]AGTTCA AGCCCAACAA TAACACACTT GAAATTCACA TAGAAAA[C/T]AACTGTTC AAGTACAAAA TAGGAGGAAT CCAACAAATT AGCAAGCC[AAAACAGT]TA ATAGGTACCT GATTTTCTGG TAGAACTTTC ATCATCCATC CTCATCCCAC AGGTTGGAGC AAAAACTGCC CATTGTACCT TTGTCCCACA ACATTGTGGA AATATCTCTA ACATTGCATT

CAAACACTTC CTGCACTCAT CACTTGGAAG ATCCCTATCA CACTGCACCC ATCCATACAA TGTCTCATTG TCACCCCAAT CGAACTCCTC CACTGCCCAA AACTGCTTGG TCTCTATAGT TGCTTTTGTA ATCAAACC[C/T]TC[C/A]ATAGCATC CTCAATT[C/T]TCTTAGTT TCTGTTGA[A/G]TTCTGTG TTTGGACCAC GATCTTTGTG AGACAAATTC AATTTTTAAG TGACTTTAAA

ATATGACA[C/A]ATTGAAA TGACCTGTTT GGATATTAAA TTTAAAGAAT TTT[C/A]AA AAATGATGAG AAATTTATTG GAATTTTTTT T[T/A][A/T]AAATAAAAT AGAATTACAA AACGCTGACA AGGTGTTTTG GTAGGGAGAG ATTCTTTTCA ATTTCTTAGA AATCTTGGAA GTG[T/C]TA AATTCCTATA TTTGATATAA CTATTTTAAT GATCATTTTC ATAAAT[C/A]TAAAATTCA

CAAGAATCAT TTTTGAATAA GTCTTCTACT AAACGTGGTT CGGTGGAGAG ACTCTACAAA ATGAGGTCAG ACATCGTAGG ATGTTAGT[C/T]AAGCATC GGC[C/A]AA ACC[A/T]GT AAATACTTCA TATCATATCA ATCATATGAT GAT[A]AAAA AAACGCTTTC TTAAGACACC GTGAACTCTA GAAAACAQA T]ATAAAATG AAATCTGCAC AAGC[T/A]T A[A/T][A/C]GCACATGAC

TAAATAA[C/T]TTTATCAA AATAAAAAAC TAAAA[T/C] ACAGGAAAGT TGAATTGCTT TATCCAAATA AAATTT[AA/TT]AAAACGA AAGAAGTTTA ATTTGCAAAT AGCTTGAATT TTTCAAATAC CATAC[TC/TA/CA]AAAAA TTACCTT[G/T]ATTTTTCT GGGTCCGGTA A[C/T]GTTC CACGTTGGGC TTAAACTAAC TTTGCCATGG AAACTCTGA TTGGAGTAAC GGAGGATGC

ACACATCATA CCATAGGATA GCGGTAACAC TGTCAGGACA CAGCCGAGAG ATTTCATTGA CAGCAGTGGT GAGACAGAAC TGGCAGAAGT ATCCTGTGAT GTCGTATCTG CAG

pha10782 (SEQ ID NO:748)

CTGCAGGTTA GTCGAACTCT TGTCATTTTT TCTTCAAGTG TTGACTGATT AGCTGTTACA AATTTTTAGA GATTACATT AATTACCGCT ATGACTATAT CTTGGGACT ATATTGATTA ATAGTTGTCT TTGAAACTTT TGGAAAGTCT AAAGATCTAT TATGTTGGAG AATAAACCTT TGTGCACTCA CGCACA[A/G]

ATATGAACA GAGAATATGT TTCAATGAAG CTTGTGTGTA TTTTCTTATC AAGCACTTAC ACATATGCAC CTTTATATAG TGCTTTTATC AAAAACTGCT ACAGCAGAAA ACAGAAA[T/C]CAGTATAA AACTAACA[T/C]TCTAACC AACTACTACT AATAATACCC TTA[T/C]TT ATATTTGAAT TAAACTC[AA]AGTTATTTC TT[T/C]GTT ATTCTG[C/T]TATTATTAA AGTTATTTCA

TTTGTTGCTT CTTCTTGTTT AATGCACA[T/C]ATTACTT TGTCTTGTAA AGGTCGAGGA TATGATTCTG TTATTCCCTC TATTCCTGAT GATGTAAGTG GCGAATCCCA T[G/A]TTCA GACCAAAATA TTTGCTGATA AAGTTGTGGC ATATATAGA CCAATACATA GAAGCAATGG AGAAGGTAGC CTATCCCCC TTTTGAAACA ATATGGCATT GTAATTACTA GACAGTTTGA TGACATTTCT

TTTTGCAATT TGTTAAGGTT AAACTTAAGC AAGGATTGAA AACTGCAATG AGCATATCCA GCGAAGGAAA TGCAG

pha10783 (SEQ ID NO:749)

CTGCAGAGT CTCGCTAATG GGAGTGAAAC CTCAGTTTC TACTTTCTGA

AGTGTAGTAG ATTGCATGC[T/A]GGCATG AGCACATGCT TATGAATTAT TGATAGATAC TGTTCAATAA GGGCATCACA TACATGAAAA CAACTACACT TGTGACCTTT TTCTCTTAGT CTTGAAAAAC ATCGTTGAAA ACAAATTAAG GAAATGCTAC CTATGTGAAA TTTTGATCAA GTTATGTGTG CAACTGTGTA T[G/A]GTTG TGGATCGTT TATTCTTTCA AGAAGCCTAG ATTCACCTAC

ATGCATCTCA AGCATCAATA TTGAGTTGGA TCAAATGGCA AAGTTGATTT CTTCTTTAGC TTGTTTTTAA ATGGATTTAT GATGAAGGGG CCACATTTCT GGTTCGCAT[G/C]TCGACC TCAT[A/G]C TTTTTTCTTT CTTGCAAAAT ATTTGATTGG TGAACTGGTT CAAATGATTC ACGAGGCAGT GTAATTTACT CTAATAAGTT TTAGTAGCTC AATGTTTGTG GGCC[C]ATT AGGTTCAATG

TAGGGCCAGG ATTGTGGTGG AAGAGAGCTG TTGAGAAATG TTGCAGCAGG AGGAAGAGGA TGGAAGAAAA TTTTGAAATT TGAAGGAGTC ACGTTGCCTT GCCAAAAACA TGATAGACAA GTAGTGTACC GTTGCAGTAG AAATTACCTG AAATGCTATG TGTCCGTTGC ATTCTTCCCC TACATAAGGA TTGCTTCCTC CACTCTAGTT TTTCAATTTC ATTTCAATTC TATTTTTGAG TTCTTTGCAA

AAATAAAAAC CTGCATCCCT TTTTTGGGGT GTATGGGTTG ATGGAAGTGA GTGAAAGTGA TAGATACGGT AGATGACGTG ATGAAATTGG ATGCTTGAAT AATAGATCGA AGAATCCTT TTCCTTACAC CCTAGTTAG GCCCTACTAT TTATGCTCTC AATTCCATTC CTCAATTTCC AGAAATCTAT GTCTAAAGTC ATGTCTGATG TCTGTCACTA TATATTAGTT AATTCATATA TGTTTATGAA

TATTGCATTG ACCCTCGTTA TATATGGATC AATCAAGCCC ATTGCTGCAG

pha10792 (SEQ ID NO:750, 751)

CTGCAGCAAC CGGTCTTCTA TCAAAACTAA CACATTTGAA TCTGTCTTGG AATGATCTTC ATTCACAGAT GCCTCCAGAA TTTGGCCTCC TTCAGAACCT

GGCAGTTTTG GATCTTCGCA ATAGTGCCTT GCACGGTTCA ATTCCAGCAG ATATATGTGA CTCAGGCAAT TTAGCTGTCC TCCAACTTGA TGGAAATTCA TTTGAAGGGA ATATTCCGTC CGAGATTGG AAATTGTAGC TCTCTTTACT TGCTGTATG CATCCTATTT CTCAAGCTCT AGTGTCATTT CTCTTAACAC ATCTTTTTGG AA[GA/GT/TA]TATACTAA TTCCTATCTA TTTTATGCAG

GAGTTTGTCT CACAATAATT TGACTGGTTC AATTCCAAAG TCCATGTCAA AGCTAAACAA GCTCAAAATC CTCAAGCTGG AATTCAATGA ACTAAGTGG AGAGATACCA ATGGAGCTTG GAATGCTTCA GAGTCTTCTT GCTGTAAACA TATCATACAA CAGGCTCACA GGAAGGCTTC CTACAAGTAG CATATTTCA GAACTTGGAC AAAAGTTCCT TGGAAGGAAA CCTGGGTCTT TGTTCACCCT TGTTGAAGGG TCCATGTAAG ATGAATGTCC CCAAACCACT [A/T]GTGCT TGACCCAAA TGCCTATAAC AACCAAATAA GTCCTCAAAG GCAAACAAA CGAATCATCT GAGTCTGGCC CAGTCCATCG CCACAGGTTC CTTAGTGTAT

CTGCTATTGT AGCAATATCT GCATCCTTTG TCATTGTATT AGGAGTGATT GCTGTTAGCC TACTTAATGT TTCTGTAAGG AGAAGCTAAC ATTTTTGGAT AATGCTTTGG AAAGCATGTG CTCGAGCTCT TCAAGATCGG GAAGTCCAGC CACAGGAAAG CTTATCCTGT TTGA...( ~ 175bp)...CTTCGGGAAA GCAAGGCACC CAAATCTAAT AGCATTGAAA GGATACTATT GGATTCTTCA ATTACAACTT

TTAGTGACTG AGTTTGCACC AAATGGTAGC TTGCAAGCCA AGCTACATGA AAGGCTTCCT TCAAGTCCTC CTCTTTCTTG GGCTATAAGG TTCAAAATCT TGCTTGGAAC AGCAAAGGGG CTTGCTCATT TGCACCACTC TTTCCGTCCA CCGATCATCC ACTACAACAT AAAGCCAAGT AACATTTTGC TTGACGAAAA TTACAACGCC AAGATCTCAG ATTTTGGGTT GGCTCGGCTT CTGACAAAGC

TGGACCGGCA TGTGATGAGC AACAGGTTCC AGAGTGCACT AGGATATGTG GCACCAGAAT TAGCATGCCA GAGCTTAAGG GTCAATGAGA AATGTGATGT GTATGGTTTT GGGGTGATGA TCCTTGAGCT GGTGACAGGT AGGAGACCAG TGGAGTATGG AGAAGACAAT GTGCTGATAC TGAATGACCA TGTGAGGGTG CTGCTTGAGC AAGGGAATGT GTTGGAGTGT GTGGATCAAA GCATGAGTGA

GTATCCTGAA GATGAGGTAT TGCCTGTTCT GAAGCTAGCA ATGGTATGCA CCTCTCAAAT TCCTTCTAGC AGGCCTACTA TGGCTGAAGT GGTGCAAATA CTGCAG

pha11071 (SEQ ID NO:752)

AGA ATCTGTGCAG CAAACGACAC CTTCATCCCC TGACGTTCCA TTTGCTCAG TTGTTGGCAT CTTCACTGG ACCGGGCTCG TAAAAGTAAT GGGAATCAT AAGTTTCCA TTATACAATT ATGAATTTCA TCCTTATCA ACAATATCCT GGAAGCCCA GGTGGCCAGC TCATATCACC GGGATCAGC ATTTTCAACT TCTGGTACTT CAACCCCATT CCCTGATAGA CCCCCTACTC TTGAAT[T/C]

CCCCTTTCC CAAAGGGGAA ACACCAAAGA TCTTGGGTTT TGAACACTTC TCCACTCGAA GATGGGGTTC AAGACTAGGA TCTGGGTCGT TGACGCCAGA CGGTGCATGG CAAGGTTCAA GACTAGGCTC GGGATCGTTG ACTCCTGATG GTATTGGGCT TGCTTCACGA TTAGGCTCTG GGTGTGTGAC ACCTGATGGT CTGGGGC[A/T]GGAATCCA GGTTAGGTTC TGGCTGTTTG ACACCTGACA

GTGCTGGGCC AATCAATCAA AACAACATCT CTGTGCAAAA CCAGATATCT AAGGAAGCAA CTCTTGCAGA TACGGACAAT GGACATTCAA GTAATGCAAC ATTGATTGAT CACAGAGTTT CATTTGAATT AACCGGGGAA GATGTTGCCC GCTGTCTTGC TAATA[G/A] AACCGGGGTA TTGCTTC[G/A]AAACATGT CAGGGTCTTC ACAAGGTATA CTTTCCAAAG ACCCTGTTGA CAGAGAAAGG

GTGCAAAAAG ACACCGATAC ATGTACAGAG AAAACCGAT GATAAGCCTG ACAATTCTGT AGGAGGAGA GCAATGCCTT CACAAGCAAA ATTCTGTAAA TTCTTCCAAA GAATTCAATT TTGACAACAG GAAAGATGAT GTTTCTGTTA CTGCTGGCAG TGGCT

pha11073 (SEQ ID NO:753)

CTGCAGAGAA ATGGAAAATC CTTTTTGNTT TTATACCATA CAGGTTAAGT CATGTTGCAA TACACTAAAA CCTCAATTCA TTTCTGACTG TAACATTGGG AAGAAAGCCC AGCTGTTGGC TGATCTACCT TCCTTCCCAG CAACCTTCCT GTTAGTCCAC CACTCATAGC CACTGCCAGT AGTAACAGAA ACATCACCTT TCCTGTTGTC AAAATTGAAT TCTTTGGAAG AATTTACAGA ATTTTGCTTG TGAAGGCATT GCTCTCCTT TTCCTACAGG ATTGTCAGGC TTATCGTCA GTTTTCTCTG TACA[C/T]G C[G/A]TTAC AGCTACTATT GGTGTCTATT TGCACCCCTT TCTCTTGTCA ACAGGGTCTT TGGTCAGTAT ACCTTGTGA AGACCCTGAC ATGTTTCGAA GCAATACCCC AGTTTTATTT GCAAGACACC G[G/A]GCAA CATCTTCCCC GGTTAATTCA AATGAAACTC TGTGATCAAC CAATGTTGCA TTGCTTGGAT GTCCATTGTC TGAATCTGCA AGAGTTGCTT CCTTAGAAAT CTGGTTTTGC ACAGAGATGT TGTTTTGATT GGTTGGCCCA

GCACTGTCAG GTGTCAAACA GCCAGAACCT AACCTGGATT CCTGCCCCAG ACCATCAGGT GTCACACACC CAGAGCCTAA TCGCGAAGCA AGCCCAACAC CATCAGGAGT CAAGGATCC CGAACCTAGT CTTGAACCTT GCCATGCACT GTCTGGCGTC AATGATCCAG ATCCCAGTCT AGAACCCCAT CTTCGGGTGG AGNAGTGTTC AACACCCAAG ATCTTTGGTG TTTCCCCTTT GGGGAATTCA AGAGTAGGGG GTCTATCAGG GAATGGGGTT GAGGTGCCAG AAGTTGAAAA TGCTGATCC[G/A]GGTGAT ATGAACTGGC CACCTGGGCT TCCAGGATAT TGTTGATAAG GATGAAATTC ATAATTGTAT AATGGAAACT TCTGATGCC CATTACACTT ACGAGCCCG GTCCAGTGAA GATGCCAACA ATTGAGCAAA TGGAACGTCA GGGGACGAAG GTGTCGTTTG CTGCACAGAT TCAGGTCGGG

GAG

pha11074, pha11075 (SEQ ID NO:754)

CTGCAGCTAT TTGCTTATGA TACTGTTAAT AAGAACCTCT CGCCAA AGC CAGGGGAGCA GCCCAAACTC CCTATTCCAG CATCATTGAT AGCAGG TGC

TTGTGCTGGA GTTTGTTCAA CTATATGCAC ATATCCTCTA GAGTTGCTAA AGACTCGACT AACTATCCA GGTGTGTACA TATAAAACCA AACAGGCTC ATGTTTCCCA CATGAGTTTA ATTTCTATTA TTGTCAGTAT AAAAGTTTTT ACATTATTAA TCAAGAAAAA ATCATGTTAG CTATGACTTT TAAAGAGTTA TTTTAAAAGT CAACGAACAT ACTATGCATG ATGATTTCTG ATTAGTTGAC

AAT[T/C]TA AAAACTCTT TATAGTGT CGGTGCGTAG ACTCTGTTAC TCTTCCC CTATGTATCT TGTATCATCA TACCAAAAAT AAAAGCAAAA TTTAAGTTGA ATTAGTAATT TTGCAGAGGG GTGTTTATGA TGGTCTACTA GATGCATTCC TGAAAATAGT TAGAGAAGAG GGTGCAGGAG AACTTTACAG AGGTCTTACT CCGAGTCTGA TTGGAGTAAT TCCATATTCT GCCACCAATT ACTTTGCCTA

TGACACCTTG AGGAAAGCAT ACAGAAAAAT TTTCAAAAAA GAGAAGATTG GCAACATTGA AACCCTTTTG ATAGGATCAG CAGCTGGTGC ATTTTCAAGT AGTGCTACCT TTCCACT[C/T]GAAGTGGC TCGCAAACAC ATGCAAGTGG GAGCCCTCAG TGGAAGGCAA GTTTACAAAA ATGTGATTCA TGCCCTTGCA AGCATTCTTG AGCAAGAAGG GATCCAAGGA TTATATAAAG GGTTGGGAC

CTAGCTGCAT GAAGTTGGTG CCAGCTGCA G

pha11076 (SEQ ID NO:755)

TTATGGGTTT CCTTGTGTTG CTTCTTTTCT CCCTCTTAG GTCTCTCTTC TAGTTCCAGC ATATCAACTC ATCGTTCCAT ATTGGACCTT GACCTAACCA

AGTTTACCAC ACAGAAACAG GTGTCTTCAC TGTTCCAACT ATGGAAGAGT GAGCATGGAC GTGTCTACCA TAACCACGAA GAAGAGGCAA AGAGACTTGA GATTTTCAAG AATAACTCGA ACTATATCAG GGACATGAAT GCAAACAGAA AATCACCCCA TTCTCATCGT TTAGGATTGA ACAAGTTTGC TGACATCACT CCTCAAGAGT TCAGCAAAAA GTACTTGCAA GCTCCCAAGG ATGTGTCGCA

GCAAATCAAA ATGGCCAACA AGAAAATGAA GAAGGAACAA TATTCTTGTG ACCATCCACC TGCATCATGG GATTGGAGGA AAAAAGGTGT CATCACCCAA GTAAAGTACC AAGGGGGCTG TGGTATGTGA AACCATTAAT TGTTTTACCA CGTAATTAC[T/C]ACTCCT TCTATCTTAA AATAAACTGT TGTCTTAAAT TGTTTTATAT AAATTAAAAA TAAAATAAAA TAAAATAATA TTTTTACCAA ACTAACCTTA TTAGTGAAAG AGATTACCAT TAATACTAGA TTTAAAAACT AAAAATATAT TTATTAGAGT TATTGGTGAA AA[T/A]AAT AATTAATTTT ACGTTGAAAA GTTAAATGTG ATATTTATTT TGCTACAATT TTTTTTAGTG TGACACTTAT TTTGGAATGG GAAAATAATT ATTAACACTA ACTTTTTTTC TTTGTCTTGT GG[T/A]AT GTGAAATCAT TTGATATAT AGGAAGGGGT TGGGCGTTT TCTGCCACGG GAGCCATAGA CCAGCACATG CAATAGCAAC AGGAGACCTT GTTAGCCTTT CTGAACAAGA ACTCGTAGAC TGTGTGGAAG AAAGCGAAGG TTGTTACAAT GGATGGCACT ATCAATCGTT CGAATGGGTT

TTAGAACATG GTGGTATTGC CACTGATGAT GATTATCCTT ACAGAGCTAA AGAGGGTAGA TGCAAAGCCA ATAAGATACA AGACAAGGTT ACAATTGACG GATATGAAAC TGTAATAATG TCAGATGAGA GTACAGAATC AGAGACAGAG CAAGCGTTCT TAAGCGCCAT CCTTGAGCAA CCAATTAGTG TCTCAATTGA TGCAAAAGAT TTTCATTTAT ACACCGGGGG AATTTATGAT GGAGAAAACT

GTACAAGTCC GTATGGGATT AATCACTTTG TTTTACTTGT GGGTTATGGT TCAGCGGATG GTGTAGATTA CTGGATAGCG AAAAATTCAT GGGGAGAAGA TTGGGGAGAA GATGGTTACA TTTGGATCCA AAGAAACACG GGTAATTTAT TAGGAGTGTG TGGGATGAAT TATTTCGCTT CATACCCAAC CAAAGAGGAA TCAGAAACAC TGGTGTCTG CTCGCGTTAA AGGTCATCGA AGAGTTGATC

ACTCTCCTCT TTGAAGCCGT AAAGGTTCAA TACAACGAGT GCTTGTTTTC TTAGGGACAA GCATTGTACT TATGTATGAT TCTGTGTAAC CATGAGTCTC CACGTTGTAC TAATGTGAAG GGCAAAAATA AAACACACAA CAAGTTCGTT TTTCTCAAT

pha11078 (SEQ ID NO:756-758)

CTGCAGAAAC TGTCGGTTTT ATTAGTAATT CAGCACAAAT TCTTCGGTGG AGTTTTTTTT TTNTTCTGTT TTTGGCTGCG TTTACGTGTG TTTGTTTAGT TTCCTGGAAA TTGTTCAAAT TGGTTAAAGG ATCTTTATAA GTGGACATTT CAGTTGGAGT TAATTGCGCA ATTGAAAGTT AAGGCATAAA TGTCTCTATT

TAGGGAAGTA TTTGAATGCA ATGCTGCTCA TTTATGTTCT GTTAAATCTG TATATTGATA TGCATTGATC CTTCTTATTT ATGTTATTGG CCATTATTTT TTACATTTAA GTGTGGTTCA TAGCTTATAT ATGACAAAAA TATATGATTT ATACTTCCTA TCGTGTTGGG ATGCAGATGG TCATGGGAGG TTTCCTATAT TGCCAATTGG TGAATGTATT GTGCCTTTGG TCCAT...( ~ 210 bp)...

TGTCTTGTGA CGAAAGTACT ATAGTGTAG TGCAATCTCA TACTGGCTTT TCACGATGAT AGGTTAAAA ATGTTTTCTA GCAAAGCACA TCAGTAATAA TGATCCAA[G/T]ATTTTTA ATGCCTAAGC TTGATAATTT TCTAAATTCT TACTCGTTCC ATGAAGTGTG ...(~50 bp)...TTTTTG GAAGCAACTA TGACCAATCT TATCTTGTTT CATTTAAAAA TAATTCTAAA TCCTCTTGTA

TCATATTCTT ACTAATTCTC CTATTTACTA AACTAAAAAA GTATTTAGTG TTTTTTATGT TTTCAAGATA TTAAATTCAT TTACATTTTC CTTCCTTCAT ATTTTAAAGC ATTGTTGTTG TTTACAGAGT TTGGCCATT AGGATGTTGA CAAAGGGCAG AGCTCCTTC ACGTATTACT AAGGAAATGC AAGTAAAGAT TGGAAAATGC TCAGAGTC

pha11079 (SEQ ID NO:759)

CTGCAGGTAA TTTCTTAATG AATGGGTTGT TCTTTTGCTC CTCTGCAAAA

TCAAGTTATC TTTGATTAAG CGACTGATTG AAGACCTTTT TTTAATTTAT TCGGGAAGA AGAAGAACAC TCGGTACAGT AGCAGGAAA ACATGGTGTA

GCCTATATGT GATAACTTAT TAACATGTGT CAATGCCATA TGGAATAAAA AAATCTTCTT CTTCAGATAC ATTATTGAAC TTTTTGACGT TTAATTTCTT GTCTTCACAT GTCCTACTGA GTAGCCCAAT TTGTT[C/A] TTTATGATGT GGCTGAAGAT TGAAGGGCCA CTCAGTTGTA TCTAT[G/A] TATATTAATA TATGTATCAT TATTTCATAT ACCATATCGT GCATTTTCAC GGCTTATAAT ATGTAAATTG GCTTCAAATA TTGCCAGATG CTACAAGAAA ACATGATTGA TTATATGGAT ATGTGAATGC TGAATAGTTT CACAATAGCA ACTGAGATGT CACAAACTTT TTACCTTCTT TACATTAAAT AAACTCCAGT CTCTTAAGAT GTACCCCACC AATAAAGTGA TTGCATAAAT TATTCTAGCA AAAATGTATA AGGTTAGCAT AGAGGTAAGG GTATGAAAGT TAAGCTTATG CTTTTTTTTT TTACTATTAT TATTATTAGC ATATCAAAAA CAACATTAAG TATTCTGGTG CCTGCTTAAA ACTTCAAATT TTGAAAACTC TTCTCAGTAT GCTCCCCTCA TATTCTTTTA GCTTCAAAAA TCATTTACAC GTGAGTTCTG AATATATGAT

TT[G/T]GCA ACAGATAGAT TGGTGTTTTC TCAACACAA GTCGATAATT TGAGTGTGGA TTTCCTAGCT TCAAACAAT ATCTATACAT GAACGTCCTT GATAGATCAA CTGCAG

pha11131 (SEQ ID NO:760, 761)

CTGCAGTGCC ACCATCAGCA GTATCAGCAG ATGACACTGC AAATCAAGCT GTACACTGTC CATCTCTATT GTTGATGCTC ATAGCCTGC TTCTTAATCT TGTTATTCTG ATATGTTTC TATACACAAC TTCATTCTTT TTCTCTCAAT ATTGATGTAT AATCAGTTAC CATTTTTTTA TATGTGGTGT GTGAATTACT TTTTGCGCAA AAACAAAAAA TTGGCTATGT GCCTGGTAGT TTAAG[G/A]

AGAGTTCTAT TGATCAATTT ATGTAATGAA ATTACGTTTG CCAAGTGAGC CTTAGTAAAT GGCTAACTTC TTACTGCTTC ATCATATGCT TT[G/A]ATT TGGTTTGACT GAAATCTTCA TACCCTATTG CACAAAGATC CATAAAGCAT GCCTTGCGTG CATTTTTTT GTTGCCATTC GTTGCGGGGG TGCTTCTGTC ATTCATCCAT TCTTACTTAA TGTTGCATAC GAATGAATGT GTAAAACATT

TCTCAGAATC TCTA[T/C]A AACAAGTGCT TGCTTATAAT CTCGAAAAAC CAACCACTGA ACATGAATGA GTGTGCATAT TAATTCTCAA [C/T]TTGGA ACGTGAATCA GTGAGTGGTC TCTTTGC[A/T]TGCAACTT GAGATGGTTG TGAAACTCCT GAA ACA A ATA GTGTTGCAAA TG[T/G]ATG TGCCGACTAT TTTTCTCTAC ATTATTGATT AAAAAATTTC CTTTTTTATG AGAAGTCATG

TACTAAATAA CGTTTTTTAT ATGTTATTC TTTGTAGAAC TCATCATATT GTTCGTTAA CGAACTCAAA TCTAAGT CGT TTATTTCAAT TTACTTTTTG CACTAAT... ( - 1200 bp) ...ACTAAAT TTGAGTATGT ATCACAGGTA GTATGACTCA CATTGCCAAC AATCTTTGCT CCAAACCATG GAGTATCAGC ACCATATGGA GGAGGTATGA CTCTAATTCC ATTGGACACG TAGGGAGGAA

GTAAGGCATG GAGTTCCTTC TCTATCCTTT CTGTACCAGA ATGGTCCCAA AAGTAGTAGA ATTAGAAGTA GCCATTACCC CACTACAGGT GTTTAATTTC ATTTCATCAT TTCGCAAAGA TATAAAATGT CATGTCACAA ATAAAAAAGA ATATTTACCA GCCAAACCAG GTAAGCATGC AGTCCCCCCT GATAAGACTA TAGTCTTGTA CCAATCACTG TCACATGCTA AATCTGCAG

pha11132 (SEQ ID NO:762)

CTGCAGCGTG TAAATGAGCA TTGTGGGTTC TAATTTCCTG CTTTTTCCTC

TCCTTTTGAT CTTTCAACCA CCTACCCATT GTCCTACCTC TTAGCAAGCT CCTGTAAAGC TGGTGAGCAA TAATTTGCAC CAATAATAAA ATAAATGAAA

AAAAATACTT AGTAATAGAT TGAGTTGGTA TTTGAGCATA ACTGATGAAG ATAATCATGA GGTTGATTGC AAGATACCCC ATTTCTGAGC AAGTTTTGAC TGGAAAGGAA TTCTGGATTG AGTGCTTGGT GGAGTAAAAG TAAATCCTGT ACATGACATA TTGCTATGTT ATTAATTTAT ACACACTAA GAATTCGCTC GCTGTACAA AA[A/C]AAG GTTTGGTAAT GGTGGC[A/T ]TG[T/C]GA

GTGTTTGTCA TTGATGGTTC TCAGCTATTG GCATGAGGAA ATTATCAACT TCACAACAAA AAGCAGTTAG TCATGAAAGC AGGCTAAGGC CAAATGATAT [ATT/G]TTC TTCAAATTTG AAACTGAAAA ATGGGCCAAA TTGTTTCTTC TGCATATGAT CATATTTAT CACTATATAA GTTAAC[A/G ]TGAAGCAGT . [A/G]AAGTA CCTTCTTTTC ATTACTGTCT TTTGGCAAAA ATGGAGGGAA

ACACCCGTTA GATAACTGCA CAAAAATCTT GCATCAGAAG TTCAGAACCA ATATTAACTT ACAAAAGCAA TAGACACACA CAGCAATAAT AGATGCTTCT TTATATTGAA GAATCCTAGG ACCAGAACAA AAGGGGTCTA GGTTATGGAA AAATTCTTTT CCGTGAAACT CCTTTGGAAG GCTTTGAATA TCTC[C]TTT TTTTTCTATT TCCCCAATGC AGGTGTGTGT GT[GTGT]GG CATAGAAATA GAATATCTTT TAAGAGAACT AACATAGCTT TTGCATGATT GTCAGTTTAC ATGCTGCATT GCCAAGTAAG AAAAAA[G/C]TAGCTAAAA GAAGTGCTAA TGACCTAAT[G/A]CAAATA ACTCTTAAAT TAAACACAGC CAATGTTCTT

ACA[T/G]CT TTGGAAGCAG TGGAACTCTT GGTATCAAAT TGGTGGCCTG AAGGGCAC[A/G]AAAGTGG CATCTCTATA CCGGT[A/G] GAGATTGTAA CGTTGGTGCT GTGCAGTGCT TTGGAGAGTT CCATGGCTGA TAGACTCCAT GATCTAGCAA GGAACTCCAT AGATTCAGTT GGAGTATCA GGAGGGGGAC AGGATGATG CAGGCCAATC TGCAG

pha11133 (SEQ ID NO:763, 764)

CTGCAGAGAC CATTCAAGAT CGACTGAAGA ATTTGATTGC CTCGTCCCCT

GTGATGCTGT TCATGAAGGG TACCCCAGAT GCACCAAGAT GTGGTTTTAG TTCCAGAGTT GCTGATGCCC TTCGACAAGA GGGCTTGAAT TTTGGGTCCT

TTGATATATT GACTGATGAG GAAGTGAGAC AGGGATTGAA GGTATACTCA AATTGGCCAA CCTATCCTCA ACTCTACTAC AAAAGTGAGC TGATTGGTGG TCATGATATT GTGATGGAGC TGCGAAATAA TGGGGAGCTG AAGTCGACTT TATCTGAGTA GGATTATTAT TATTCCTTCA AATAACATGT GTTATGTCCT AGAAGCCATT TTGGGAGGTT GTGTTTGATG TTCATTAAT GAACTACATG

GTTATTTTAT _ATGCTACCG TCAGTGATTT TGAAAATTGT TAGTGTGGAG CCTCA[T/A] CTAATGGTA TACTGAACAT GCATGAATTC CAATCAGATT AGAAATTGTT ATATA[T/A ]TACATAATT ATTTGGTGGA AACCTCTCAT TGGTACTGCC AAACAAATT AATTCAACAC GTGTGAGCCC CCTCGAAGTT GCTTGCCCTT CAAAGTTTAT GTTTTGGACG TTATGCCAAT CCTT[T/G]T

TTTTGGTCTA GACTTTCACT CAGACAAGGA ACATT[C/A] ACTATAAAAT TAGATTCTGA AACTTCGATA AAAAGAATGC TTTTTAATAT ATAACGAACA GTTTAAATTT TATAATTAAA AAAATCTTGA AATATAGTTT AGATTAAAAA AATCTTGAAA TATGGTTTAG AGATTAAAAA AAAATGG[C/T]AAATTGTG ACTATTAACA GAATGATACA AAGA[C/T]C AACTATACAA ATAAAAGAAT

GATAATTTAA AATGATTGAG AAGTTTTTTT GGTAGATATT AACA[G]TAA A[A]GTATAA A[A]AAAGGG AGCAATTAGT TACAAATTAG GTCTGTAGTA TTTTGAATTC TCATGATGAT TTATTAAATG CATCAATCTA ATCATATCAT GTATTTCAAA TTTAAATCAT CCA[C/G]TA AATGCATCAA TCTAATCATA TCATATATTT CAGATTTAAA TCATCC

[AGTAAATGCATCAATCTAATCATATCATATATTTCAGATTTAAATCATCC]AC TCGATA[G/A]CTTATTT[GT/AT/GC]AT GTATCATGTA TGCATGCTGT ATCTT[T/G] C[A/G]T[CCTCATAGAATA]ATTAACCTA ATCTT[C/T] TATAATCTTT TATTT[C/T] TTT[A/G]TT ATTAGAAAAA GACTAAATAG GAAAGGATAG ATCATATATA C[TCTAG/TCTGG/TC/GTTAG]AAGACTC AAA[T/C]TT GTAACTTAA[A/G]

ATAAAA [G/A]AATAT ATAAAAATAA ATTAACCTGA TCTTCTTT[C/A]ATTTTTA ATTTTTT[T] ACTAT[G/A] AAAGA[C/T] TACTTAAATG TATTTTGTT[CA/TG/TA]A GAAAATAG[T/C]TAAGATT C[A/GTATG] TATGTTTGAA TTAAATTTTC TTTTATATAA CATT[T/C] GCATAAATTT TTTTAACCAA AAATAAGGAT AGAATTACTA TCAGCTGTT ACAACTTAGA AACATTCTAC

CAGATTCTCT TCCTTCTCCT TTTGCACTGT TCTTCTGAG TGTAATCC[T/C]CAGAAAA GATGGAAGTA ATTCTTGAGATTCTG GGAGGAACT...(~185 bp)...CCTCCTAGTC GCCCCCACCT GATGACGCAA ACCTTGCGCC GCC[T/A]C AACCGACGCT GAGACTCGCC ATGGCACACA CGTCCGTTCG GGATGCCGGA GTTTCCACC GTCGTTGCAT CTCAATATA ATACCACACG TTTACGATC

CGTTTGGATA TGGAATGCCA AATATACCGGAG TTTTCATCGT TAACGGAGTT GAGGCCGATGC CACAGATGCC GCAACAGCTTC CGTCATCATC AACGGAGGAT ATGCTGGTGC CGTCAAATAG CAATGAGTT TGAAGAACCGC TGCCTGACA TGATGGA[A/G]GCGTCACC GCCGTCAGAG ACGGCGGGGA GAGTGACGTT GAACGTGAAA GTGCATGAGA TCCAAGATCG CATCCCTATA GAAATGGAGT TTGAAGACAC CGTTCTGAAG GTGAAAGAGA AGATAGTTGC CCGCGAAGAC ATGCGAGGT GTGCCATTGG AGAGGATCG CGTTGCAGTC GCATTCAGCG GGTGTGGAAT TGCTTGACCA TCAGGTTCTG CAG

pha11135 (SEQ ID NO:765)

CTGCAGCTG ATGGGAAGAT TTACTTCTGG TAGACTATT TTAAAATCTT

GTAGTTTTCG TTTGGTTCCA AATTCATTCC AGAACTTTAT TTGCAATTTT GCATAAATTT TTAAAATGTT TCTTCCGTGA AC ATG ATG A A AAGTAATTTG

CTTGTCAGCT TCCATTTGAC ACTTGTTCAA AATTAACTAA AAGTGATGTG ATTATGGCCA TTACTATTTA ATTTTCCTGT GTTATTTGGA TGAAAAACAT AGTTCCAAGA AAATTTCCAG ATTATACTTA TTCTATGGGG AGTGGTTTTT CTTATTTCTA AGTCTACCCT GTCTTTCTTT TGTAAAATTT TCATTCTGTT TGGAGCTTTA TCCCGCATCC CAATCTTTAT TATTTTGTCT TCTAGAATAT

GTTCAAGGAA GTCGATGGGG TTGACTTAGA AATGGCATAT ACTAGTACAG AATATTGATA GAAACCCAGA TTTTGAATAC AGGAGAGTGT GAATAATTCC CAGTTGAAAA ACTTATCTCA GAAGAGCAAA TAAGGATGTA TTTATCCTTG AATTTTAGTG GGTTAAGATT ACGTTGCACT [T]TAAATTA AATCAAGATG GACTTGCCCT TACTTGGAAC ATTTAAGAGA [G/C]ATACT CACTTTATAG

GAGCCTTGCC CAATTAGAAA AAGAATAATG GAAAGGAAAC ACAAGGAAAA ATTTACGGAA GTAGACTCTC CCATAACAAA TAGAAAGATA ATGGAAGCTC TCAATATGTA AGAGAAGAAA TGATAGCACA TATCATTCAA GAGATTTATA AATCTAATCT TATCCTGCAC CCTGTTCTTA CTTGTCTCTC CTCTCCACCC TGTCTTTTTT ATTGCTCCCA AGTTACTGAG TCTGCCCTTC TATTCCTCCT

GCCATGTGAG CTGTGTGGAC C[C/T]GCTC CTTTCCCCTG TCATCAAACA TATTCTTTCA ATGGTACATG TGTTTTCTTA ATAGGATTAT TAGGAAGAGA AATGTTAGAG ACATGCTCTT AGACATTATC TTTCTAACAC TCTGTGATTG GCTGAATTTT GTTAAAAATC ACCAAGTTTG GGGGTCCCAC TTATCATTTA ATGATTTTAT CTCCTAGTTT AGATGTGGGG TCCACCAAAA TTAGTGATTT

TCAATAAATT TCATTCAATC ACAATGAGAG TATTAGAGAG AGAGAGTTGT TAGCATTCCT CCAGTAGGA ATTGTTTGTG AAAAACTGGT TTGGTGCCTG AAGAACATTG ATTCTTCTTT GAAGGAAAAA AAAAA[A/G] GAAGGAAAGC ATTTTTTCCC ACTCTTTTCA TACTTAGTAT CAAATTAAAA TTGTCCTCAT TATGATATTG ATCATTTCTT TGTTTGGAGT ATGCTTGAGG AATTGGTGCA

TGTTTTAATA GTGTCATATA TTAAGTACTG TAGGAGTAGA ATGTACTCTG TTTTCATTGA TATATGTATT TATC[T/A]T TTTTG[G/T] GATTTCAATA ACAGGGGATG TTAAGAAACT CCCTTGTCA ATAAAACTGG TTTAGAGCC TGCAG

pha11136 (SEQ ID NO:766, 767)

CTGCAGGGTG TCAGGGCAGA GCTTTAACAG ACTCTATTTG ATGAAGGCT

TTGCATGAGG ATTGATAAGG CAATTTTTG TTTGTCATTC TGATGGTTTG ATCCAATTT CTTTTCCATA TAGGATGAGA ATTTTATGGC GAATTTGGTA GTATATT[A/ T]GAGGATCA CGCTTGTAAG GCGACGTAAT TTTTGTTTAT

TTTATGAAAC CCATCCAATT GTATTGAGTA TAATCTGTTA TTGAATATTT AGAGATACAT GCAGTGCCAC TTGCTGCTTT TTATCTATGT TACAAAATTT GGCATCCAGT TATATTAACC AACTACTTTA GTTAATTTCT GTGGAAAGTA TTT[A/T]TA CCAATCACAA TGCGAATAGT GAAACAATCA CAACTAGAAG TTCTCGCTTC ATC[C/A]TT TTTATTCGGA GCATCGTTTC AAACACGTTA

GTTTTGCATT TTGCAATTTT TTTTTT[T]A AATCAATAGG CTCTGTTCTG CTGAGGAATA CGTGTTTAAG GAATAACTAC CCGAAAACAA GCAAATCCCA AACAACTTGT ATTAACTTCT GAATATTTTA CGGGATGGTT TTTTTAAATG TAAATTTTGA GCATTCCTTG AAA...( ~ 200bp)...TGGGT[C/T] GGATA[T/C] GGATCTTGAA AAGAAGTTCT TTTATTTCAT GGATTGCACT TATCTGTGT[ G/T]TGAAAC GCGACTTGGT GAGTCGTGCC TAAAACATGG TGGATAATAC TCATCCCAGC TTGTGTATCT TATGTATG[T /C]TCACAAT CTCTCTTGAT GAAGAT[T/C ]TATCACTGC TTCCAACA[C /G]AGAGAAC TTT[G/A]TA

CCTA[TA]CT ATGATCGGAA AAGAAAAAGA TTTTGTGCTG ATTTTTTAAT ATAAAATAAT ATTGATAAAA CAATTTCTTG AATTCGATTT TGCTCACATG AAAACTTTCC TTCCCTAAAA CTTTTT[C/T ]TCACCCAA AAAATAATAA CAACA[T/C] AATTCTGTAT TCATGCCTAA GATTTTTAGA TTTGTTCATT CCAATGAATA AT[G/A]TGA CTACTTTTAA TCATGTGGCT ACATTGCTAT

AGTTTGTAAG TTGTAATATT ATTAAAATTT G[AT/GTA]A ATACAATCGA TAAACTGAAA TCAGTAAAAG AGAAAACAAG AGTTGCGAAT AACGGATTTA TATGATAGTG ATTTCCATTT AT[A/G]CGA GGAAAATCTA CTTGGACATA AAACAAGAGT CTACATAAAC CAAATAGCCA AGGAAATTAA AAAGGATACA CAACATAAAG GAAAAAAGAA AAGAAAAAAA TCTCATCAG TCACAGATTT

TCTATATGTC AAGCTAGTT CAACAGAGAA GCATATCTCA TCAAAATACT TCCATCCTTT GAGGAACTGG GTTTTATGAC CTGCAG

pha11138 (SEQ ID NO:768)

CTGGATTTGA GATCACCAT CCCTTATTTA ATTAGTGCTT AAAAGATTCT

TCTGGAAAA GCCAAATATA GTTGGT[C/T ]CAAGGTTTT GCCATTAATA TCCGATTAGT TAAGGGAGAG ACCATCATAC TTAAGGGGAG AAATTTATAG ACTATTTTTT ATTTCTTGAA GTAGCTAAAT TAGCAAACCA ATAGTCTAAA TGATATAAAA AGACACAAAA CAAAATCTAT TTCAGGACAA TAATAAGTGA TAGAAATAGT TCAAGTTTGG CAAGGGAAGG AATCCTTACC CCAAATGCTA

GCTTTGCTGG AAGGCAATGG GCGAACCAAT AAGTTAGCAG CATGTGGATC ACAGTGTACA AACCCATGCT TGAACATCAT TTCAGCAAAA GTTTGACTAA CCTGCAAAAT GTTTATACAT TTTACTAAAT TACTCCATAA ATGAAAGACA GCCTATGTGT AACACTTAAA AGTTATCACC ATGTGACCAT GAAGTCACAG GTTTGAGTTG TGGAATCAGC CTCTTGAAAA TGCAAGGAAA GGCTGTCTAC

TATACAGTAT AAACCCACAC AGAGATCTGA CCGTTCTGTG GGCAGGAGGT TTATGAGTTT ATGGTACATG CCTCCCTTTA TTTTGTTGAT AAACC[A/T] TATATGGTAC TAATAGCATG AGGGAATCAA AATGGTCATC ATTAACTTGC TTAACTTCAT TGACGTATGT TATCTTGTCA AAGATATCAT GAGGCATATA ATAA[AAACT TCGTACCCCA TTGCCCAGAG GCTCTTCGCT ATGCGAAGGT

ATGGGGGAGG GACGTTGTAC GCAGCCTTAC CCTTGNATAT GCAAAGAGGC TGTTTCCGGA TTCGAACCCA TGACCAACAA GTCACCAAGG AACAACTTTA CCGCTGCACA GGGCTCGCCC ATCATGAGGC ATATAATAA] TGTTATAAAA AA[CTT/AAA ]ACACACTAT CAACACCTAT TGGTGACCAT TGTACAAGTG [C/T]CTATA ATAAAGCCTG TAAACATAA C[G/T]TACC AATGTTGAAA

GTTCATGCAG GTTGATCCCA AGTTTCCGA ATGGTCTTTA CATCATTTAC ATAAGCACCC TCC

pha11139 (SEQ ID NO:769)

GAATTCCATT GAAGTTGTAG AAGTCAAACT TAATTGTAGG TTTAGGACAG

CCTAGTTTGG TATTGGTGTT TGTAATTTTT TTGTATATTG TGTAGAAGTT ACTGTTCTGA ACCGCTTGCT CTTTTAACCG GAATGAAAGA ACAGAGTTGA CTTTGTTTCA CATGCGCAGT GGGATTGTGT TGCTTTATGA AAATGGAAC TGCTTTTGGA CTCTGTTGGG ATAAACTTCT CTATAAATA CTTATAGGAG AAGAAAACAA TAAGGAAAAA TTAAATAACG TTCTTCCATA GACTAAAATT

AGCTTATGTA TAAGTTAAAA TCATCTTTTG GAGAAGCTAA ATGAGAAAAC CTTTGCAAAT TAACTTGTGC ATAAGCTAAT TTTAGTGAAG CTAATTTTAT TTTTGCTTCT TATCTTATGG AGAAGTTTGT TTAAATAGG[ G/A]ATTTGG TGTAAATAGT TGTGCTGTTG TTAACTGGGT CATTTTTCAG TTTTAACCTT GTCTGAAACC AA[A/G]CAT GGCTAACAGA ATGTTAACAT TTTCCAAAAT TATATTCACA TCAGAAATTT TTTTAGAAGT TTTTCCGGAT GA[T/G]TTT AGCCCAAGAA TTTGTTTGAA TTTCCAAGAT CCATGCTATT TGCTTCAATT GTAAATTTAA ATCCAACGTC TTTCTTGTTA TAAATGCAAC CAAACATCAA

ATTGAGAATT TTGTTTTTCT CTGTGAATTA TTAGTTAAGT GTGTTTTCTT TGTTTAAAAT AGATCTAATT TGTTAAGCTT TATTTACCAG AGTTACCAAA CCATCTGTGA GAAATATCCT TCATTCCGTG AAAGATCTGA AAATGTTGAT CTCGTGGTGG AAATTTCTCT GCAACCATGG CATGTTTTTA AGCCCGATGG AGTAAGATAA CTCTCTGGCA CTCTTTAATT TATTTATGCT TTTTCAGGCA

TTTTTCA[TC /GT]GCACTT TTTGTGATAT GGTTCTGATT TTGTTGCCAT TAACAATTTG CTTGGTACTT ATTCTGTGAC AGGTGATTTT ATTCTCAGAC ATTCTTACCC CACTTTCTGG AATGAATATA CCCTTTGATA TTGTGAAGGG TAAGGGTCCT GTTATATTTG ATCCTATTCA CACAGCTGCC CAGGTTGATC AAGTGAGGGA GTTTATTCCT GAAGAATCAG TTCCATATGT TGGTGAAGCA

CTGACAATTT TGAGGAAAG AGGTCAGACT GATCCTATGT TCTGTTGCAT TTCTGCTTTT GTTCATTTTA CCCCCCACTT CCTTCTTATA GCAGAAGCAT GTCTGTTTAG TCAAGTTCCA TAAGACAGGA AATTCTTTAT GTTTGTTTGA CTTAAAAAGA ACTCTGTATT TTATGGATGC TCATTGCTTG TGAATCATCA TTGGGGGGTT GTAATAGCAT GAAATGATGG AGAACAGGTG ATTGAAGGCA

TATCTCATGT GGTACATGGG GGAACATATA CCTTGAAAAC TCACAGAACT ATATATTTTA TTGAACTGTC TATCCTGTAG GTGGCTATTA AGATGCTGAG TTAATCTACA T[C/T]TCCT CTTTGTATTA AAAAGTTGT TTCATGTACT AGCAAATGGT GGAGCTTCAT GGAAACATA CCCGACATCT AAATTTCTTA TTATTGTTGC TTTATCATTA TTTTCTGTGG CCCAATTGAT GTATTATGCC

CCTATGTTTT ACTATCTTAT TGAATTC

pha11627 (SEQ ID NO:770)

CAATTCATGG TTTCTCTTTA TClTTAT[G /A]ACATTGT TGCCAAGTAA TACTACTAT ATAAATTCAG ATTTGGGTTT C[A/T]GAT AACCGTGGTC GTTAC

pha11628 (SEQ ID NO:771)

CTGCAGTGTT GTCTCTCGG AGTTGCTTCA ATTGCTCATA CTCTTTGGG

ATAAC[C]AC TCATTTCAAA GATGTACTAG TTTAAAACAT GCAAA[A]AA [G/A]ATAAA GTTAATGTGT ATTTTGTATG TTGTAGGGAA GCACAAAGTA

TCTTGATTGA ATTAGGAAGA TTACACGAGC CGTATGCATC AGAATAA[AT GGTTTGTGGG AGGTTAGATT TTCTGAACGA AGATGAAGA] TATGC[A/G] AATTCTTTTC AAATTAATTT TGGG[C/T]A AATGATGAAG CTAGACTGAT AATTGATTAA TTTTGGGCAA ATAATATTAT AT[T/C]ACA TGTATGAGAT TGATTTTAAG TGTATATGCA TACATGAAGC AATAGACTTA ATTTAATTA[

C/T]CTTAAG GAGTG[C/T] TG[G/A]ACT TTTGAGGATG [C/T]C[C/A ]TTTTGTGCT [G/T]ATGAG CCCTCCATGG TTGACATACA [A/G]AGCAA ATTGCAGGGT GTCTTTAGCT GAGGTTTTTG CTGCTTCGAA GTGGCAATT GAATCAGCTC CGTTGGACAG TGACATGGTG ATGGTGGTGA TAATTAATT[ CG/TA]GCTT AAGGGTAAGT ACAACTTCTT AGCTCTGTAA GCAAAGGATG CCTTGTGGAG

TTGGTTCATC TAATCCACGT ATATATA[G/ T]GGCTGAA[ C/T]GAGGGA ACAAGAGTTT TCAATCAATG A[T/C]TACA ATTCCACAC TCTCGCCTCT AAAGTGCAT CCCTCACATT GAAGCATCCT CCAAATCCCA AAATATTATT ATTACCACTT AAAGCTATTA CAAATCAGAA AACACTGCAG

pha11701 (SEQ ID NO:772)

TTCTCTTGG CTTGCCTCGC ACCTTCTCA ATAATATCAC AGTAAGAAAA

AGAAGACAGC TGTGGATGTT AGGAGTGATT GACCCGCACA TGCATATGGA TCCAATCGG[ T/C]AAG[A/ G]TGAAGTTT CTCTTGGCTT GCCTCGC[A/ G]CCTTCT[A /C]AATTAAT ATAT[ATATA T]ATATGTAT GTG[G/C]TT TGGTGTCTG[ T/C]ATATGA CATATGTGCT CAGTGCCTCC GCCAAAGCAA ATAAAATGGA TACTGAAATG GTTGAGAGCA TTGATGTCAC AGATGATGCA AGGAGTTTTA GCTGGGCAGT GGACAGCGCC ATAAGGAGTG AGACTGGGTC

CATTTTTGGA GAATCTCGTG TCCGTAGTTT TGATTCGTTA GTGGACAGTG CCATAAGGAG TAGTAGTGCT ACTGAAGAAG ATTTGGGATC ATCCCCGGTC CATGTTTTTG CTATGGCAGA AGACAGGGCG ATGAGGAGTA GTAGTGGAGC TGATTTGGGA ACATTCCCTT TTCATGTTTT TGCTACGGCT GCGGATATGA GGACTATTGC AGCACGATTA ACAAAAGCCG TGTCGAATGC AAAAAGACAT

TTTGCTGAAG GATTATTTAC CCGGGCTGAG CTTATACCAG TCACCCAGGC TGAGCTTGAA CCAGCCACCA ACAATTTCCC [A/T]GGATT ATTATTTACC CTGGCTGAGC TTAAAGCAGC CACCAATAAT TTCTC[AC/T A]GTGACAAC ATGATTGATT TTTATGTGTA CAGAGGCAAA CTCGTTGATG GTCGTGAGGT TGCAATCAAA AAGAGGATAG CAACCAGGGG AGACTCGTTT GGTAAGTCTG

AATTTGCCAT CTTTTCCCGT TTACATCACC GGAACTTGGT TGGGCTGGTT GGGTTCTGCA AAAACAGAAA TGAAAGGCTG CTGGTGTATG AGTACATGAA GAATGGGTCG TTGCATGATC ATTTGCATGA CAAGAACAA TGTGGAGAAG GCTAGCAGT GTGTTGAATT

pha12105 (SEQ ID NO:773)

ACGTGGCACA AATCCAAGGA CGTGGCGCGG AGAATCCTC GATTCGGTAG AGTACCAGAT GATGAGGTG CACGTACACC TTAGGTTTAG GAGAGGCGAA TCTCGCGGGT AAGAAGATAT TCCTTTACGA CGCCGTTTGC AGGCCGAGCG AGATTCACTC GTTGGAGACG ACGCCGTTTG ATTACGTGGG GAACTGCGAG

AACAAGACGC TGCACGCGAC GCAGCAGATC GCGGAGTGTT GGACGCGCGC GGTGAGGAAG CTGCTGGAGA GAGTGGCGGA GTCGGTGGAG AGAAAAACGT TGGAGAAGGC GGCGAGGGAG TGTCACGCGG TGGAGCGGAT CTGGAAGTTG TTAACGGAGG TTGAGGACGT GCACGTGATG ATGGATCCGG AGGATTTCTT GAGGTTGAAG AAGGAGTTGG G[G/A]ATGA TGAGAAATTG CGGGGAAAT

GGTGGCGTTT TGCTTCAGG TCGAGGGAGC YCGTGGAGGT GGCGAGGATG TGTAGGGATC TGAGGCAGAA GGTGCCGGAG ATATTGG

pha12390 (SEQ ID NO:774)

CTGCAGTCTT TAGTTGGCCA AAGCCCAGAT TTGATTGTTT TTCTATCGCT

AGTTGAGAGA TGTTGGCAG GATCTTTGTG ACCAAATGT TGATGATTCG TGGTATAGCC TTGTTTCTGG ATAGGACGTA TGTGAAACAA ACAACAAATG TACAGTCATT ATGGGACATG GGTTTGCAAC TTTTCTGCAA ATATCTTTCT CTATCTCCAG AAGTAGAACA TAAAACTGTT ACTGGTCTTC TTCGTATGAT CGGAAGTGAA AGGTAATTTA TATTTTGCAA TCTCAGTTAT GAAAATGACC

AGTACAGTGT ATGTGCAGGT TGTTCTTCAT GTGTTCTGTA TATGCATTCT AGAATTCATG TGATGGGAAC CAGTTGTTAC TGATAAAACC AACGAAGACT AGTTCTTTAC AAGAACATAA GTGCAGAATA AAAGATAAAT CTAGAATTTT GCAAAATGCA GATATACTGA ACCTCTAGCC CAAGCCCATT CCCTTTTTGC CCTCACTTCA TGCCTTCTAT GCACTACATA TCTTTCCCTC AT[A]TTTTT

TTTTTGTTTT TCCCTAATTA TTTTCCACCT GCGGGACCTC TCACTATTCC TTGTCAGTCC AGCTCTTGGG TTATGCAGAA AACTTAACAA TGTGGATATT CCCTTTATTT TTTATTCGGT CCTCCTCACC CATGTGTGTT TACGCTTTTT ACTTCCCATT CCCTTCCTTG TAGCAATTAC CTTATTGGCC ATCCAATTTT CTTAATATTC C

AAGCAGAG TGACCATTTG TTTGGGGAAA CCTTAAGTTG CCACTCTGCT TTTTATTCTG TAAAATCAGG AACCATCACT CTGACATGAG GGAACTAGAA TTGAGATTTC TGAATGGGTT TGGAATGGAG TTCTTGTTAG AGATGGGCTT TTTCATAATT ATTTTTGCTG ATCTTATTGT TTTTGAAAGA AATAATCCTT GAACTACCTG CAAAACCATC TTCTACCAAT TGCCTACTGT TGTAGTCAAT CATATATCCA CTTTTTTTAT TGAATATGTT GGGCAAGATG GAGAGACTTG GTACAATGTT AAAGGTGCTG CCTTGTGACC TGAAAATCTA TATCTTGCCA AAAAATGGGT CCATCAGTGT ATAATTAAAG TAATAATTTC AGAATAGTGA

TATATAATAA ACACCATATA GAGATTCCTA TGGTGATAAT TGTTGGAAAT GGGAGTATTT TAGGATATTA GAGCTTCTTT AATGTTTGTT TTTCATTGTG GCCTAGATAC TTGGTATAGC TAGGTTCAAT GCATTTTAGG AAGTGGTAGT AGATCTGAAT TGCTCTTTTG TAAATTGTTT TAGCATTAAT TATCTTTGTA ATACTCTTGA ATATAAATAG GCCATTCTGC TGGGTAGAA AAGACAATCC

AGCATATTA AGAAAAATTT TGTTTCTCAT CTTCCACAAT AATTATGTGA CTTAGCCAGT AATTTTCAAT CTTACTGCAG

pha12391 (SEQ ID NO:775)

CTGCAGTGCC ACTTGTACAA GCTCTGCTG CCAGGTTAAG TGTTTCTAA

TAAATTGTGC TTGGTTTTGG TATCAGATTG ACTACAT[A/ G]ATGAAGCC ATTCTGACAT CATTCTGAAA TAATGAAATT TGGAATTGGA AATCTT[G/A ]CTGTGATCT TTTTTCCCCC TATTTGAGGG AAGCAAATAC TAGTTTGGCT TAATTACACT TTTAGTTCCT TTATTTTAGC CTATGCACGA TTTTGGTTCC CCTAGTTCTA ATTGCTTGCA TTTAGTCTTT GTAGTTACTC AATTGTTAAA

ATTAAGTCCC TCACCTCATA TTTTGTCCAA TATTTAGTGG AAATCTCACA AGGTGTGATG TAAGTCATGT G[G/A]CATT GATTTGTGTA GATTGTTGTA GGCAAAATAT ATACAA[T/C ]ATCTAAATA AAATATTTCA TATATTTGTT TCCTTTGTTA ATCTATACAA AAGTATAAAT CAATG[A/T] TGGATAAATT CTGGCTGTCT TTTTCAATGA ATAATGTCTG GCCTTGCAGC TTTCTCAAAC

ACACTAGCCC CAATTTGTTC TAGTCATAGC ATGGCCATGT TCTCTCTAGT ATGTCCAGCT GCTTCAAATT CCATTGTGTC AATCTCTATG TGTCAAATCT CCAACTATAT CAACTTCCA GTCGTGTCAA TCTGCAG

pha12392 (SEQ ID NO:776)

CTGCAGGAC ATTCACAGTC ATTGCCGCA GTGGAAGGAA TTAACAAAGA TAGTAACAG GCAATCTAAC TGCACTGTCA GACTGTGAT A GAACAGTC AA TGCAGGA TG TAGTGGT GGACTTGCGG GACTATGCCT TAGAGTTCAT TATCAACAAT GGTGGCATTG ACACCGAAGA GGATTACCCC TTTCAAGGTG CTGTTGGTAT TTGTGA[T/C ]CAATATAAG GTTAGTTTTA CCTTTGATTC

TTTGATAAAT TAGTAAATGT TTCTAATAGT TTCATTATAA TACTTATATA ATTTTTTTCT TATCTATAGA TAAATGCAGT TGATGGTTAC GAACGTGTTC CTGCCTATGA TGAATTAGCC TTGAAAAAGG CGGTAGCAAA TCAACCAGTG AGCGTTGCCA TATATTGAAG CATATGGCAA AGAGTTTCAA TTATATGAAT CAGTAAGTTT TTTAATCAAC TTTACTTGAA AAGTAAAGAA CTAAAGGAAC

CTACAATGTG AGTAGAGAGA CGGACTTAAA GTTAGCAATG CATTACATTT AAACTAATCT AATCTAAAGA GATACTCTCT CCAAATTTAA ATATAAACAA ATTTAACTAT CAAACATATG AATTAAAAAA TTTAGTTAAT AATATTTAAT TCAAAAATCT TCCTCAAATA TTTTTCATAA TGATGCTTGT AAAATAAAAA TAAATGATTT ATGAGAAATA AAATTTTATT AAATGAATCA CATAATAAGA

ATATTATCAT TAAATAAAAT AAAAATTAAT TAGTATTTAA TTTATTTATA TTTAGATCTA ATTTTTTTTT TATTGCTTAT ACTTAAGTCC AGAGAGAATA CATTAAAAAT TGTTCAATTC TCAAATATAA ATTACAATAA CTTTACTTCT TATTTACTGA CTTAATCTAG TATATTTTAT TTGTCCTAAT TTTCCAGGGT ATATTCACAG GAAAATGTGG CACGTCAATA GACCATGGTG TTACAGCTGT

TGGGTACGGA ACAGAAAATG GAATTGATTA TTGGATTGTT AAGAATTCAT GGGGTGAGAA TTGGGGTGAG GCAGGTTAT GTAAGAATGG AACGTAATAC AGCAGAAGA CACTGCAG

pha12393 (SEQ ID NO:777)

CTGCAGGGT GGGATGTCCA ATGAATTAT ACAACACTAT TG CCCGTGT AAC TGATGG AATTTATGAA GGTACTGCCC AGGTTTATTT TTCTGTATTT CTTAATGATG GTTTTGAATT ATATTGTAAT ATCCTGTTTT GCTACAGGCA TTGCCATTGG TGGAGATGTT TTCCCAGGTT CCACACTTTC TGACCATGTT

TTGCGGTTTA ACAACATACC ACAGGTAAAA TTTCACTTTT ATCTTGGGTA CTGTATATAA TATCTGCAAC TCATTAAGAA CTGCAAATGA TTTGGAGTAT GTCCTTTTCT TGTGAATGAT CAGTTTGTAT TTACATTAAA CCTCAGGTGA AAATGATGGT AGTACTTGGG GAACTTGGTG GGCGTGATGA GTATTCTCTA GTGGAAGCCC TAAAACAAGG GAAAGTGACT AAACCAGTTG TTGC[T/C]T

GGGTTAGCGG AACCTGTGCA CGACTCTTCA AATCTGAAGT ACAATTTGGT CATGCTGTAT GTCAGTGGCC CAATATTTTT TTACTAATTC TCAGTTATAG CATTTATAAT AAGGGAGCCA AATTCCTGAT GTTCAGTGGC ACTTGTATAA ATTAGGGAGC TAAAAGTGGT GGTGAGATGG AGTCTGCTCA AGCAAAGAAT CAGGCACTAA AAGAAGCTGG AGCTGTTGTT CCCACTTCAT TTGAAGCTTT

TGAAGACGCA ATAAAGGAAA CATTTGACAA ATTGGTTAGT TTATCTTGAA ATTTTTCATC TCTCTAACTG ATAATTTTAC CTTGTCCCCA ACTCCCCATC TCCCAAAGGG AGCAATGTTT TAAAGAAGTG TCTTTAGTTT TTAGTTGCCT AGATTAGCTT GAATATCATA GTTGTTTTTT C[TT]TTGTT AGGTTCAAGA AGGGAACATC ACACCTTTTA AAGAGTTTAC TGCACCGCCA ATCCCTGAGG

ACCTTAACAC AGCAATTAGG AGTGGAAAAG TACGTGCTCC AACTCACATT ATTTCCACCA TCTCTGATGA CAGAGGTATG TTGGGATCCT TCGAATTTAT AAGTTGTAAC ATGGAACTGA GGTGTCCTAG ACACATTTGA TAGCTAAATT AAACTTTTCT TCCATTATTT TTCC[G/A]A AGGTGAGGAG CCATGCTATG CTGGTGTACC AATGTCTACC ATTATTGAAA ATGGTTATGG TGTGGGTGAT

GTAATCTCTC TTTTGTGGTT CAAACGCAGC CTTCCCCGTT ACTGTACTCA ATTTATTGA GGTAGATTAT TCATACT CTA GTCTGCAG

pha12394 (SEQ ID NO:778)

CTGCAGCCTC AAAGCCTGGC TGCATTATCT GATGGAAATG CAAGGATTG

GTGCTCATC ATGTTCCCCC ACAAGATGCT GTTGTCGACA AGGGAAAGAA ACCTATATCA CCTCAAGTTA CTCCCAGAGG GAGAAGGTCC CTTTCTGAA CCACTTAAAG AGTCAACAG TTGAAGGCCG AGCTGCTCTG TTGGCAAATA ACAAAATGCC TCATCCATTT ATTTTGATCA A[G/A]CCCA AGGATGAGCC TGTTGATGAT ATACCAGATT ATGAGATTCC CCTTGCAGTG ATTCCTCCTG

GTATTCAACT CACTATGCCT TGTTGAGAAG CTTCTTTCTT TCTGTCCATT TTTTGTTGTT GTTGTTGTGG CATCTTGGTG TTGACTTTTA CTTTCATGGT CCACTTTTAT TGAATTCAAT GATGAGTACT GTATTTTGTT GGTAGTATAA TGCTGCGGAG TAGGTGGGTT ATTCTTCGGA ATATATAGGA AAAGTCTTTA ATTCCAACAT ACCTCAAAAG AAGAAAAGAA TTTAATGCAC AATTTTGGCT

GAGCTGAGAA ATAATTTTAG TTGAATAGAA CCTGATTAAT TTTCCCTCCC TGGAGAAAGG ATGACATTTC ACCAGAGCCT ACCATGTGCT TCCATTTTGC ACACACAACA AGTCTGTAAT TGCATACAGT AGTAACAGTA TAACTTTTGC TTTCCCTGAA AATATATGAT CATCTCTGTT TGCTGTTGCT TCTTTATGTG GAAGATCCCT GGTTCCTGGA ATCAGAGATT CAGAAATGCC CTTGTATTTT

AGACTTGAGC CAAAGAAATG TTACTTCTGT GCCATTTTAT TGTTTTTGTT ATTCTGTTTC TAACCCCACA TTAAACTCCA TTGATGAGTT CTATTTTGAT TTATTTTAGA CTCTCCAATG GGTGCAGTTG AAAAGCAAGA TGTCCATGAC ACTGTTGTAT CACAATGCAG AGATGAAGAC GTTGAACATG AAGATGTTTT TCCTTCCTCA AATGAAGAAG CAACTTCTAA TGTATATGTA GCTTTGTCA

TCTATGGGAG AGGTAAAAA TTTCTCTGAG CTGCAG

pha12395 (SEQ ID NO:779)

CTGCAGTCAA TGCTGATGCC ATTTTCAGA ATTTGATCAC TCAGAAAAGG GTGATGCAT TATATGGTAA GTTTTT[T/A ]TT[T/A][T TATT]ATAAA TAT[A/G]TA TA[A]TATAT AT[A/T]TA[ 14(TA)/15( TA)/16(TA) ]ATAAATATT GAACTTTTA[ T/G]TTTT[A /T]TTTTT[A /C]TTTTTAC CTTGGTTTGC TCATCTTT[C /G]ATGAGTG

TTGTTTTGTC TATTATTCGT GAATTTGAAT GGTTTCTAT GATTATACTT ACTA TTTCT GTCTCGCCCT ATTTCGCAGC AATGGAACTT GCATTATCTT TGGAAAGGCT GAATAATGAG AAGCTTCTAA ATTTACACAG CGTACGTCAC TTCTATAGTT CTATTTTCTC AGTACTTTTG TGAATTCAAC TCAAACCTTT TTAATTCCAC TACCTCCCAA TGGAATCTTA CTATGTTGTT GAATGCCAAT

AACTAGTTTT TATCACTATA TATGCAGTTA GCAAATGAAA ACAATGATGT GCAATTTGTT GACTTTCTTG AAAGCGAGTT TTTGGTTGGT CAGGTAAATC AATGTCCAGT AGTGTGATTC TTTCTATTGC TGTCAAGAGT AGTTGAGTAG CATGGGAGAA ACATTGTTCT CATTAATTTC TTTTTGTTGC TTAAGGTGGA AGACATTAAA AAGATCTCAG AATATGTGGC CCAGTTAAGA AGAATGGGAA

AAGGACATGG TATAATATAT GTTAACTACT TGCATCTTAT AAACCAAAC GGTTGATTCA TATTCATATA CTTTGTGGT GAAATTAAAG TATTTTATTA TTGGTGGANC TGCAG

pha12396 (SEQ ID NO:780, 781)

CTGCAGGATT TGGCATCTTC TGTTTAATCA AGCATAAATG ATATGGTTGA TTAATTAGTG ATATATCATG TTGGGATGCT GCTAACTGA ACGGTTGACC ACTACACAG TGTGATGTTT TAATTAACTT TGCTTCAACA TT[ACC]ACC TTATATTTAA AGAATTAGGA ATAGATTTAC CAGTAGGCCC TTATGATAAG AAAAATAAAA TAAATTTACT TTCTCCTCAA TTAAAATCGG CTT[C/A]TG

CATTTCATTT TTTAAGAGGT TAAATGAGAG AATTTCTCTA TATAAAAATA GATAAGTTAA TCACAAAATT TCAATTTCTT TTTCTTTTCT TCTTTTATAA TTCTGTTTTT ATTTAAACAT TATTCTTCTG ATTAAGTGGG TTTGTGAATA TGGATTGCCC AACCTAATTT ACCTAAAAAT TTGGAATAAA CTAAGTAAAC TTATTTTGTC ATCATTAATT AGTTTAATGT GCATTACTTT GATTACTTGA GGATAAAAGT GTGTTTTACA TATCTTTATC ATATAGGGAC CATTCGATGG GCAAAATAAG ATAAGGTTTA ATAGGATAAA TTATTATATT CTTTTTCAGT CAATTTTTGA TACACTATTA AAATATAATT AGTTTATTTT ACCATCTATT ATATCATGCT GTATCAGTGT AGCTTTTATC CCACCCATCT TAAGAAAAAA TATATGTTGG AGAAAAAAGA AAAATATAAG AAAAAGTAAT GACAGAAGTG

ATAAATTTTG AGAAAAGTTC TTTGATGTTA ATGAAGAAAT TTGTTT...( ~ 550 bp)...TCCACGCT CGCCTCATCG TGCACTCGGT TACACCCGAT AACTTGTTCG TCTCCAAACT CATCCTCTTC TACTCCAAAT CTAACCACGC GCACTTCGCG CGCAAGGTGT TGGACGCGAC CCCCAACAGA AACACCTTCA CGTCATGTTC CGCCACGCGC TCAACCTCTT CGCGTCATTC ACTTTTTCCA CAACCCCCAA

CGCCTCCCCC GATAACTTC ACCATATCCT GCGTCTTGA AAGCCTTAGC TTCGTCTTTT TGCAGCCCCC ATTGGCGAAA GAGGTTCACT GTTTAATCCT TCGACGCGGG CGGATTGGAC TCTGATATAT TTGTTCTCAA CACGTTGATC ACGTGTTACT GCAG

pha12436 (SEQ ID NO:782)

CAATTCAAGA AAAAAGACA ACATAAAAGA GTCTATAAGA CCTCC AAG GA[A/G]TAT [C/T]CCCTA GAA[AA/GG] TCG[GTCG]C TTTCACAAAA GT[C/A]AAA AAACATATGA T[A/G]TTT CCTGCACTTC ATATAAATAC TCGTCATTTAC

P13070 (SEQ ID NO:784)

CTGCAGAATG CTGATATAAG TTGGCAAAGT CGTTTGCATT TTGGCCAGAA

GTTTGCATGC TTGAATAGCC ATTTTCAGCA ATAAAATATA ATCCCATGTG TCTCTTTTTG CTTCCATTCT CAGCCTTACT TTTTACCATT TGAAATTTCT TTTGACATGT ATAAGCATTT ACATTCGAAC AAAATCTTGA AACTCAAGTT TTAATAATTG ATTATGAGAT GGACTTTGAC TCTAACCTGA ACTTATTGTT GACCAGTAAC TGTATTTTTC ACTTAGAATC AGAACATAGC TAGAATTATC GTGAGTATAC TCTTT...( ~ 650 bp)...GGG GCTTTTTTTT CATGATCATT

CATGTATCAT GTTACATATT TGACATAGCA GTAGGCGAGT TCTGAAATTG TTCTTTGGTC AGGTTGTCAG GCACAAAAGG GATGTCAATC GATCATGTTC TTAGAAGCTC AAGCAGGTAT AAAAAGGCCA GAACAACGGC CTCTCAATTG GAAGAGACTC AAAGCCAGCC CAAATTTGTT CCAGACAGCC TCGCCGAGTA ATCACACATT TCAGCAGAAG AAAGGAAAAA GGACACGTAA CCACAACAGT

GGTATTTTTG TTCATTTTTG TGTCCTTACA GAGCTCCGTC TCTCTGTTTT TGTTGCCCTT TTACTTGTAG GTTTATCTCG TTCTTAACAT TCCAAAATCC AAACAGAAAA CTATTTTTG CCGTGTAA GCGTGTTTAC CAATCTAGTT GGTGATTTA TGTCACGAAT CTAGTAAAGG AGAACAATGT GATATATTCT ACTA[C/T]T ATATGTATGT TACACTAAAT TACTTATATC CCTAGTGTGA

ATTGTGTGAA GTATCTTTCA ATTCCATAGA GGAATAATC TCTTGACTGA TAACTTGCAG TAGCGTTTT CTTTTTTAAC CACATTTTTT TCTATCTACC AGAATGAAAC TTGAAACTTC CTTTAAGGAA TATGGGCCTA GTCAATAAGA GAGATAATAT TTATATTTAG AGAGAGAGAG AGCAATAGGT ACAAGTTACA AGCAAGAAGT GGTACAAGAA AAAGGCATTG GAAGCGGGTG CACAAAATTC

CATTCCCTTA GCATTAGCAT AGCAAAGTCA CTACTCTGAT CTCATTCTGT TAAGCTGTTG AGTGCCGGTT GCAAAGTACA TAACCCAAAC AATCGCAGGT GTAACCACAA TAAGCAAAAG CTGTCCCCTG TTGTCACTGG CGCAGCATCA GCAATCACAG CCTCACTAGC CGATGCATCT GAGCCAAAAA GCATGCCCGA TGCGCCCAGC CCAGCCCAAG CCCAATGATG ACCCCTTGCT GCTACGCATG

CGGTTCAGTT GGTTCAGCGC TGGCTGCAG

P13071 (SEQ ID NO:785, 786)

CTGCAGTCCT TAGGACAGTA TTTGTTTGGT TCAACCAGCA ACCATCCTAG TC TACAAA TCTTTGTTCT AACTATGGAT AATGGATGCC TTCAACCACA ATTAGCA

AAGCCAGGA T AGCTACTT GGTTCAAGCT TGTTGATTAC TGCATAATCC TGGT AATT TTGACAGATG AGTCTGAATG AAATGTTTCA GAATCACAAC CTGTTGAG TCTTGAGACT GGTATATTCT AGCTCTGCAT AGTAGACAGA AACAAAAAGT CACTCCGG TAAAGACCAG AAACTGGATA TCTCAATAGT CAATTGTGTC AAAGTATAAA TTATACTTAG TTTTCCAGAG AAAAATCTTT GGACCATCTG

AATCCAAATT TACTAACTAA TAAATGGAAC CTCCAATACA ACTACATCAT TAAAGAATTC TGCTGGACAA TATTCTGTAA AAATCCATTA ACTTAATTGA TATACTCACT CATAATTGTG CTATCTTGTT CGATGTTTGT TAATTTTTTC AAATTGGGAT GTTGGTTTCC ATTATATCTT TTGCAAAAAA TTTGAGCATT ACAAAATAAA ATTATAGAAT AAGATAAGCA GTTGGGGATA GAGAATCTCA

TTTTTCTGCA TTTCGTTAGA GATATTACGT TATATTAGAT ACTTCAGTGT GGTTATGACA AGAAACTCTA TACTTTCTGG CATCTGAACC AAAGTTGTGG TAGACCCACA CTAACATCAT ACCAAAGTG CTCATTCTTC ACAGAGTTAG GAAAAACTA TACTTAACTA TAAAAACTCT GGCTACTCAC AAGAATGCTA AGCTAGTTTT CTAATATTAA TTCAAAATAC AGAAAATTAT GTTTAATCTT

ACACTGGAAA TCAACGAAAA GAATGGAAAA AATAAAAATG AGGAAGAGCG TATTGGATT GTTCACTTTA ACATCAACTT ATATCTCAGG AATTGTAAAG CATCCCTGCA TCTCTATATG AGATTTTAAA CTAATCCAAA CAAGCTAACC CTTCCTAGTC CTAGAACTCT TATCACATG CTGAAAGGTA TACACTCACT ACTAACTAAA TAACATGAAA ATGATGTACA CTTTGATTAC AAACATTTTA

AGTTTTAACT GGTTTAAAAA TAGTTCATAA AAGAGAATAA TATATAGGTT CAACAGTTTT CCAGCTAGTC CGCACACCTA TTTGCCACTT TGAAGCTGTG TGTCTATTTC CGCTTTGAAG CTGTGTTATG AGATAAGACA ACACTCCAAC TTTTATTCTA CAAATGCTGA TTTTCTTAAC AAAATCTTGT TTCTACTCTT ATGTTGCTTG CTAATTAATA CCTTACCATG TCAACATCTA ACTTGAGAAG ATCTATTTTT CTAAATAAAT TATGTTTCTT CTTTATATAA TCGGCCTTCG CTAATTAAAC TTTATTGTCC TTCCTTTTTA NTGCTGTTCT ACTAAGGAGC AAGGGTAAAA TTGCGGTTCA TATGCACCAA CTA ( ~ 300 bp)... . GGCAGTAT

TATTGGCTTG GGAGGCCCGC CCATGAGAGT TCTGCAAAGT AAGCAGCGTC AACTCACGGC CAGCCTATGA CCCCTCCAAA AACTTCTTGC ACATCACAAT GAGTGCCGCA AGCAGCTCCC ACTGCTTGCT TTGTTTTCCT TACTCATTAG TTTAAGGCAA [A/G]TTTT[ C/A]C[G/A] GTGCTGCTTT CCTCTT[G/A ]TTTATTGAC TAGATGTTGG GTCGATACCT CGTTT[G/C] CAATCGATGT GGGGACTCTT

TATTCGAAGT TTGCTGAACT TGTGCTTTGC TATCTAAGCC TTGCACATCA GCTTTTGTTA AATGAT[T/C ]CCTTGCACC TTGAATCATG ATCACATTTT AGATTCTGAT ATTAGTATTA TG[C/T]TTT TGATGGAATG CAAGGAAAG[ G/A]ATTTTT AATACAG[CC /AT]ACAAAA TGATACCAAA AGGCTTTATC AATTGATTAA AAAAAGTGCA AGTGTTATTA TGCATGCACT TGAATTTGGT

TACAACTTGA TTTTTTACAC AGCTTGGAAT TTACTGTTTG TAACAACTT TCTCCCAACA CAAGTTTCG CTCAGTTTTG AATCATAAGG TTAGTAATCA TTCAGCTGCT TTCAGTTATT TGGATCCTTC TTTGGCATGA ATGACTTTTG GTTTTGGACA TACTTACAGG AATAATAATA TAAGCTGGGA TGCCAAAGAT AATTGGAAAA GTACTCTTCT GCCTTCATGG TACTAGGCTT TTTTTTCTGT

GAAAAATAGT GACAGTACAA GAGACTTTTA AATTGTATCA AACTGAAAAA TCTTTGAGGC GATTTCTATT ACCTTCAGCT TCTTTAAGCT CAACCAACTG CAG

P13072 (SEQ ID NO:787)

CTGCAGCACC CCTAGCAAGA CCTGCTTATA TACAGAAGCA AATTTAATCT

ATGAGGCTCT GATTTTTCTT TCTTTCTCAT ATTTATGTGA ACATCAAGCT GAAAATAATC CTCATGTAA CCAACTCTCT ATGAAGTTTG AGATCCAAA CCTGTTTTTC TGAAAGATGG AAAAGAAAAT GAAAGTTAAA AAGATAGAAT CAAACATGTA GCTATATTCA ATTGTAAGTG CATAATTCAG CA[A]TGAAA GGTGATACAA TCAAAATTAC CCACCTCGGG TACAAGGGAA CATAGTCTCC

ATGTTGATCT GGC[A/T]AG TTAAATAAAA TTGAAACGTT AGACAAATAC ATAAAATGTT ATCCCAAACA CTCAGAAAAA TAAATGAGTT ATTTGAACCC T[C/A]AATA TG CTGATAA TACCAAAAGG AAAATAACTT AGAGGCATTG CTCTCTATTC AGAATTGCAG GTTCATCTTG ATAATTCATG TTATAAAGAT TTGCATTAAA

TATTAGAAAT TTAGAATAG CTTACCGAAG TGAAACACCA AATCCGATTA GAGAACAAC ATCAAAAACA CCTAAGGAGT TGTATCTAAG TTTTGTCCAC ATCAAAAAAT ACATTGATGA TGCGAAAAGG AGAGCATACC TCTTCCTGGA ATGAAAACAG TCCCACTCTC TTTGCTGAAA TTGACACCAG GATTGTAACG CTGCAG

P13073 (SEQ ID NO:788)

CTGCAGCATG GGCACTATAC AAGGCTCAAG AAGAGCTCAT AAAGGTTGCA

AAAGAGTTTG GTGTTAAGCT CACAATGTTC CATGGCAGAG GAGGAACTGT GGGAAGAGGA GGAGGTCCCA CTCATCTTGC TATATTATCT CAGCCACCGG

ATACTATTCA TGGCTCACTT CGGGTAACAG TGCAAGGTGA AGTTATTGAA CAGTCATTTG GAGAGGAGCA CTTGTGCTTC AGAACACTTC AGCGCTTCAC TGCTGCTACA CTTGAGCATG GAATGCACCC TCCTGTGGCA CCAAAACCAG AGTGGCGTGC CCTCATGGAT GAGATGGCT GTCATTGCTA CAGAGGAGTA TCGCTCCAT TGTTTTCCAG GAACCCCGTT TCGTTGAGTA CTTCCGATGT

GTAAGTATTG TTGAATACTT CAG[T/A]AT AGAAAGATGT CCTTGAAAAT CTAGCAGTTT AAGTGGCATA TTTACAAAAA TGAT[A/C]A TTTAGTTAGC ATGATTAACT AAAATGCAAT TGTTTCCAAT CAAGACAAAA TTCCTTTAGC ATTATTGATG TTAAAATAAA TCGTTAATAA TGTTTTACCA TTTT[T]TTT TCTTCCCCAA TCTTGTGAAT ATATTATTAT CAGTTGCAAA AATTCTGATT CAACTGGAAT ACAATTATAT TTCTGATGA TTTAAGAAAC ATTTCTCTTT CCTTTGGAGT CACAACTAT TTATCTTGCA ATTTATTTCC TTATTTTCTT TCCTTGCTTT AATGCTGAAT ATTGTAAACT GCAG

P13074 (SEQ ID NO:789)

CTGCAGGATA TGGAAAGTGG AAAACTAGAA AACGGAAAGC CTTTTAG TG

ATAGTGTGAT ATACTACTAT ACGTAAGTTA CATTCATTCA CCAACAAAAA AAAACGTAAG TTACATGCAT TAGTTTTCCT TCTTTAAGGG ATAAAGTGAT

CTTTGAGGTA ACGATGGCCC AATCAAATGA GGTAACGATT GAGCTAAAAT GCAAATGACA CAAATGCAAA TAGTAGTAAT TGCTTGAGAT TTAGGGGATT AGTTTAGCAA GCATAGTGAT CATCATTTTA TAAAATTAAT AATGATATAT TGAGGGACTT TTTAAAATCA TAAAACGTTT TAAATTCCAA CAGAAATGG ATAGCAAGTC AATTTCATGC CTTGTGATAG AAAAAGAAA AGTCGTAGTA

AATTATATTT TGCTTTGTAT ATGCTCAAAT CACATTTTTT ATTTTCTTTT CTTTTAAAAA TTTAAAATTT AAAATATACT GTGTAATAGT ATGCTACACA TTCATCATTT TAGAAAAGTC AAATGAATTA TTTGATTTTT TATTATATTT TTTTATTCAA TTTGATTATT TATCTTTTAA AAAGTTTAAT TTGATTTTTT ATCTTATTTT TTGGTTTAAT TTAATTCTTT ATCTTTTAAA AAAATTAGTT

ATTTATCTTT TTTTAAAGTT TTATCATTCG ATATTTAAGA TTGACGTCAT TAATCATTT AAGAATGAA[ C]TTTTTCAG TTAAAAATGA AGTGGAAAGG AG[A/G]GAA AAAATGAA[A /C]AAAAAAA ATTTATTTGT TAACGATGTC AATTTTAAAT AGATTAAAAG ATAAAAAAAA ATCAAATAAA TCATTTAGTC TTTTAGAAAT GAATGAGTTA TATTCAAATT TTTTAATATA GATGAACAAA

TGAATGAGTT ATACTTAAAT TTTTTAATAT AAAAGAACA TTTTTAAGGC AGATGAAACT TAGGCCCTTT TCTGAAAACA GTTTATTAAC CCTTTTCTGA AAACATTCT TCACATTCAC TAAGTACATC TTCATGTCCT GCAG

P13158 (SEQ ID NO:790, 791)

CTGCAGCAAC ATCAACAGTG TCCCGAATCG AAGCGTCTCG ATCCGGGTTC CAACGCTCGC CGGCCTGGAC ACCGGAGATC GGTATCCGCC GGAGCTCCGT TAATCTACTC CGGCGGAGCC ACCTTACTCC CAAGCGGGAA CATTTGTCCG TCCGGGAAGA TCCTCAAACC GGGCTTGCCC TCGCGCGGGT CGAACCGGAC TGATGTGTTG GGCTCCGGCA CCGTGAAACT ACGGCCGGGG CAGCATAGTG

CGAGGCGTCT CGGGCAATAT TCCGGTGCCC GTGGGCGCAC TGCCGCCTAC GGTGAAGCGC GCGCTCAGCG GCTCCGATCC CGAGGAGTTG AAGAGGGCTG GGAATGAGTT GTATAGAGGC GGGAACTTTG CGGAGGCGCT GGCATTGTAC GATCGCGCCG TCGCCATCTC GCCGGGAAAC GCCGCATGCC GAAGCAACCG CGCGGCGGCG CTTACGGCGC TCGGGAGGCT CGCCGAGGCC GCGAGGGAGT

GCCTCGAGGC GGTGAAGCTG GACCTTGCTT ATGCCAGAGC GCACAAGAGA CTTGCTTCTC TTTATCTAAG GTAATGTATT AATGGAAAAA TTTGGATTTG GATTTGCATT TGAATCTGAG TTTGAGTTTA GTTTTGTTGA GATTGGATTG GAACCAAGAA ACTTGAGTTT AGAGCTAGTC AAACTTGATT ATGGCTTTGG NC A ACGTGTT TGGTACTCCC TGTTA ACGTG ATTAGTGGAG...( ~ 50 bp)

...TGACT CTGTGTTGAA TGTTGATGCT ACTTTCAGTT GCTTCTGTAT CCAAAGAAAC GTGACTCGTG ATATATCACT TTTGTGCAGG TTTGGACAGG TTGAGAATTC GCGGCAGCAC CTGTGTCTCT CTGGGGTTCA AGAGGATAAG TCTGAGGAGC AGAAGCTGGT GTTGTTGGAG AAGCATTTGA ATCGGTGCGC TGATGCGCGG AAAGTTGGTG ACTGGAAGA GGGTGCTTAG GGAATCTGA

GGCTGCCATT GCTGTTGGAG CAGATTTTTC GCCTCAGGTA GTTTTGAATT GAAATTTCTG ATGTTACCAT TGTCTACATT [G/T]TTTTT G[C/T]TAGA GATGTCATAT GAAATTATTA GCGTGTCTTT GGTTAATAGC ATAAAGTTTT AGGATAGCTA GGGTGTGATT GGTTTCTGTT TTCAAATAAC TGTTTTTAGT TTCCAAAATA CAACTAAACA AGGTTGCTCT GTTGGCTGGT GCAGAGTGTA GTGCAAGGGA CAGAAAAGTG TGGAATTGAT GGGGAAATTT AACAGGTTTT AATAATTTCT TTTGTTTGTT TTTTTGGTTT TTGGTTTTTA GAATACTTTT [T]TTTTGAA ACAATTTTTA GAACTTAATA GATTTTGGAT GGTAAATTGA

TTGGTAAG[C /T]AGTGTAA AATTATTTTG GGAAACTGTT TTTAAAATCA AAAAGTGAGG AGAATTAATG AGGTCCTTA GTTCGTGGTA TGGTGGTAGA CTAGATTCTC TACAATCAA GTATAAAATC ATCCTCGGGT TTTTACTTGA TAGTTTAAGC TTTTAGGATA ATTGGTTTGT GACAATTTGT ATTGGGCTAA CTTGTTGGAT TGCTACTTAC AGA

TTGTTG CTTGCAAGGT GGAAGCCTAT TTAAAACTGC ATCAACTTGA AGATGCTGAA TCAAGTCTCT CAAATGTTCC GAAGTTGGAA GGTTGTCCTC CAGAGTGCTC TCAGACCAAG TTCTTTGGTA TGGTTGGTGA AGCCTATGTT CCTTTTGTGT GTGCACAGGT TGAGATGGCC TTGGGGAGGT AAACACTAAA AACCTTAGGC TTGAAATCCA AAGCTAAGTA AAACTTTTGA GTGAGGAACA

AGTAATGGAT TGTTGCAGGT TTGAGAATGC TGTT

P13560 (SEQ ID NO:792)

CTGCAGAACT GGTGGTGGTA CTCAGTCGC ATTGCATTAA CATCTATAC TATCTGTTCC TGCTGCACTG ATTTCAGTAA AGGATCTAAA AGCTTTGAGA

CTGGG[A/G] TTTAATATGG AGCTTATAGC TATTGGATGT TCAGTAAGAA TAGACATGAT ACTATTTCCT ATGTCAACAT AATTCTTCTT CTTCTGACTT GGAAAGTTAA CATGATCTTC TTCTGAATGC AGGCAATTTT TGTCTTATCC TTTCGAGGTG TTATCCATAT ATGGATCATG GGGAAGAGGG GCCCTGTCTA TGTTGCAATG TTTAAGCCAC TCGAAATTGT CTTCGCAGTC ATCTTGGGGG

TTACTTTTCT TGGGGACTCT CTTTATATTG GAAGGTATAA CTCAGTGTTT TGT[C/T]AG AGAGTTATTT TCTTCTTACA TACTTCACAT TATTTTGTTA AAATCCATTT TCT[C/T]CT GTCACTCTAT ATCACACTTT TCAGCTTATT TTATACTTCT TTCTTTTCTC CATTTATGTC TACCTTA[TT A]GTTGTATA AGAAGTTGTA AAAACCGTAG GTAGGAATAT AATTTCTCTT ATGTTAACAT

GTTCAATTTA AAACATTTGT TTCGGGTCTA ATTACAAGGG TCCAACTTTA TGTACAGTGT GATCGGAGCT GCCATAATAG TTGTTGGTTT TTATGCTGTT ATTTGGGGGA AAAGTCAAG AGAAGGTGGA GGAAGATTGT ACAGTCTGC AG

P13561 (SEQ ID NO:793)

GAATTCTTAC AATCTCTTGA TTCATGTAAC TGACTTTATC AGAATAGTTC AGTATACATT TTGATAACTT CACAATCTAA AGGATTCATC ATATAAGCAT ATCAAAGAAA AGGTATGAAG GTAAAGTGGT TAAAGATAAT AAATATCTGA CCTCAGGTAG TCTGGAAGCA AAATATTTTA TATAATCCCG TCCAATGTTC TTCACAATAC TGTCTAAGAG ATAAAGAGAT GGCAGTTTTT GATCACTCGG

AACCTGGAAA GTCATGAACT CATTTCAATT AAACAAGACC TTTTTTCCAT AATACAAAGA CGCCATGAGA GAAATAAGAT TTCCCTGACA TATGAAAACA AGGGAAACAG CTGCAATCAA CGATATCTAA TCCAATTAAA AGTTAGTATC AAATTTCACC AAAAGTGGAC TTTAGCACTG ATTGAGATTG GAAACTTTTA GGGTTCACTT CTCTCTTGTA AATGGAAAAA TCCATGTTTA CCAGGATTCT

CAATCCCCTG TGCATACCAA AACCACTCCC CAATGGTGCT AATTACCAGT TACACACGTA GTCAAAATAC CAAGCTACCG TAATTTGGCC AAAGGACAGT ATTTGTTTT TCTTTGTAAC TCTGATTGGA AAACTAAATA TGTTTTTCGT TCTTTTAAGG TGCAGGGCAG AGTGTACCAC TTAAGAACCA ATCTACATCT ACAGGGCCAA TGAAATGAGA TGAAATTCAA TTTGAACAAA CTATGAATGC

CCTTTTTCTG ACTAGCATTC ATTTCCATCT AAGTACAACA CTAACCCAAT TCATCAACGC ATAGCATGAC AAACAATAAA ACTAATTGAA TTGCTCTTTC TTCATTCCAA ATCCATTACT ATTCAATTCA ATAAACTTGA CATAAACAAC TGCATATTTT GTTCACCTCT ATAATATTAG CACAAACGGT GGCAGCAATT GCCTTGGCAG CAGACAAGTT CTCTCCAGCA ATAATAGTCA AGTTGGTAAT TATTGGCTTC GAATTGAAAG TGAGCTCAGC AAGCGCGGTC TTGTACTGAA TCACAAGCTC TTGGTGCGGC GGAGGCTGCG GCTGATACCC TCCGCCGCCG CCGGAGTCTC TGTCATATGC TCGGAACCTC GTAGACGGCA ACGTAGTAAC

TGCCGCAGGG CGTAGAGGTA ATTGTCGGGC GCTGAGTTCC TCGATTAATC GAGGCTTCTT GGGACCGGGT TCTCTCGATC TGTCCAACG ATCTCTCCAT GTTCATTCG AATCAAATGC AAACGAATTT AAAACCCTAA TCCTAACCTT TATTGGTTCT CG[C/G]GCC GGTTTTATTG GGGAGGGGGA ATGAATCGAA GAGAAGCTCG GATTTAAAAT TGTAGAGGGC GAATTGAGAT AAACCCTAAT

CCTAATTTAC ATGAATTAAT AAAAAATAAA AAATAAAAGA GAGAGATGGA AGAAGGTGGA AGGAGGTGGA GTGTTTATAA GTAGGCTGTG ATCTTGGTTG GAAAAAAAAA [A]CGAAGAA GAAAGAGTTC AAAAACTTAT GATGGAGTTC TTATATTTTT TAACGGTTTC CTAAAGCTTG ATTTTAAACG ATAAAATTTT ATTAAAGTAA AATAAACATT TTCTGAT[G] AAAAAAAAAT ATCACTTTTT

TGTGTAAAGA ATATATCACG TTTAAATGTT TAAAATTAAC CTTAAAATTA AATATTTTTA AAAATCTTTT TAGATTAGGA GTGTTTGGAT AGGATTTAAG ATGAAATAAA ATTCAAATCA ATCGTATTTA AATAAATTAA TTTAGTGTTT TAAAGTATCT TCAGTCGATC TAAATCAAAT CAATGTAATA ATATTAATAT TATGTTGTTT TGATACTTTA ATTTCCAAC ATAACATATT ACTTACATCC

GACTTAAAT ATAATTAATA ATAAGTTGTT TTCATGCTAG GAATTTTATT ACAGTATCAA AACACTAACA TGATTTAAAT ATCCTTATGA GATAAGGCCA AAAAATTCCA ACGTGTAAGT GATACCCAAC TTCTTATTCG TGATGTCTTT GTTGATGGCC ATTGGCGTTG GAAATATTTT TTCCTCCATT ATTCTAATTG ATGTTAAGCA TAAGATGATG AACTTAATTC TTGATGAAAA TAGTCATGGT

GTGATCATTT CGGGGCATAT CCAATCTGGT ATCTACACTG CCAAGTCCAC CCATGCAAAT GATTGATTCA CGAGTCTGCT AATCATAGTT CACTTACTGA GAGATTGTAG CAAGGAAACT CAAATATGGA GTCTTATAAA TATGGATTCC AAGTGTTGAC TTCTTCATTG TGAATTAAGA CATTTGGTTC GCAACTCAAC TGAAACGACA TAGAAATCTT TTTGCTTGTC TTGCTGGGTG GTTGATTTGG

ATATCAAGGT ACGTGGAGAT CTTTGAGAAA CGCATTTGGT CTACATGG

P14257 (SEQ ID NO:794)

CTGCAGAGGA TGATTGATGA GGGCACCTCT GAACATGTGC TAATAGTGTT

CAATGGAGTC ATAGATGATG TTGTTGAGCT TATGGTGGAC CCTTTTGGCA ACTACCTTG TGCAGAAGTT GCTTGATGTG GGCGGAGATG ATGAAAGGTT G CAGGTTGT GTCAATGTTG ACAAAAGAAC CAGGGCAGCT AATCAAAACC TCTTTGAATA TACACGGGT ATATGCATCC TTCTGTTGA ACTGAAAGAT TGTTCTTTTT TTCTTTTTCT ATATCATAAT GAACATGTTT TCTTTTTCTT

TTTGACATGC TGAATTTGAT AGTGTTGCTG TATCAGGACT CGGGTGGTTC AGAAGCTGAT CACGACTGTC GACTCTAGAA AACAAATTGC AATGCTTATG TCTGCTATTC AATCTGGTTT TCTTGCTCTT ATTAAGGATC TAAATGGGAA TCATGTCATA CAGCGTTGCT TGCAATACTT TAGCTGTAAA GATAATGAGG TATAACACTT ATCTCTTTTG CTTGCAATTT TATTGTAGGT TTATTTTCCC

ATTATTCACT AACTTCAACA GTACATAGCC ATATAGTTAT TCCATACTAT TCAGTTAAAT TTATAAGAGA ACAAGAACAA CCATTAGCTG TCCACTTTAC AATTTTGTTC TAAGCTGTAT GAAGGTATCA TTACACATGA TGACACAAAA TGGTCTTTAG TCAACACTGC TTAGAGCAGA GGGCATTCCT GTTTTATTAA TGTTTCATAA ATTGGTTTAT TCATGGTTTT GTGAAGCGTT GGCTTCTTAT

AGTGTGAGCT CTCCTGCCTC TTTTGGTCAT TTGGTGTCAA CCAGATGAAA TATATTTCAC AAAAGGGTCT TAAATCTTAC ACCTTGTTGA AACTTTTTTA ATTGCATGGT AATCTCCCAG TCTATTTTAT GCACACTACT TCAGAATATC CTGACTTTTC AAACTAATTT CATTATTCAG CAGTTTCTAA ATTTGCTGGA TACTTCATCT CTGGGATTTT CATATTTTCA AGCCTATGTT TATCTCTGGT CACTTCAGAA ATTTCTAAAA GGTAAGCCAA CTGTGGAAGG AACAAAATTC ACCATCTATT GAGGGATACT ATCTTATGAC CATAATGGTG AAGCATTCTA TATTTGGCCA GTTAGCCTGA ATATGTTCCT TATTTTCTTT CCCAATTAGG

AATTCTTAA[T/A]AGGCTA ACTCTCTACT ATATAAGAAG CAATGGAAAT AAATGTTCTG GTTGAGAATG TAATGATTAT TGACCATTGA CATGGGACTG AAAAAGTTAG TAATAAATTC GGGTGATCCT TTTCATTTTG TGCCAGCCTC TAATAAATTA AATTATTTG ACTCATTGCC ACTAAAATTT CTCTGTTTC ATTTATTACT TACCACGATA AATATTAAAT ATAACGTAGG TACTGCAG

P14395 (SEQ ID NO:795)

AAAGATTGA GCCCTACAAT GACTAAATTA CGTGTGTAC CGATTAATAT

GTTTTAGAAA AATAATCACT TTTCT[A/C] TAAAAAAA[A/T]TTAAGAC AGTTTCGGAA AAACAAATAT TTACTCAAA TTTATTAATT TCCAAACATA

TATGTTTCGT TTATATATAG TTCACAAAGA AAACACCGAA GTTTGAAAGC AAACACTACA [C]AAAAAAA AAAAAAAAAA AAAGGAGCGA AGTGCTAGGA CAAACCCTAC AAAATAATTA ACCAGGAAGG GTAGGTGGTT ACGTTGTTAA CTCCAAAGGA TGAAGTTTCA ATAATTGATC ATTTCCTTTT TGCCCAATTG GCATAAAAAG TAAATTTTAT GACAAATATC TAACGAGGAT ATGGCTCGGT

GATTGGAGCC ATAATTGATG ATTCTTGAGC AGATTGACTT CCTGAATAAC CTGACCTTGT TACCTCTGAA TGTTGCCCAC CAGACAGCAT AGCCGTAGGC AAAGAACTCT TATCACGCAT TAAAAATGCA GGTTCAGAAG GTTTAGCAAG TGGGAAAGAG TCACTGTTAA GCATCAGTAA AACGGTATTC ATAGTTGGTC TATCAGCTAT ATCTTCCTGT ACACACAGTA ATCCAATGTG AATGCATCTC

CTTATTTCAT TCCAAGAATA ATCCTTTAAT GTGTCATCTA CAATATTTGA AACTGTCCCT CCCCTCCAAT TTTTCCATGC CTGCAAAAAA ATTGCTCAAC TGAAAATAAA ACGTTTATAA TAT[AT]GAA ATTTCTTAAA TGTGGAAAAT TTC[A/G]TA TGTACAACTA TAAGCTTCCG AACTTCAAAT AAATAGGGAA ATTAACTATT AACTAATAGT CGTGAAAAGG TATCACACTT ACAAAGCTTA

ATAGATCTTG TGCATTTTCC TCGCTACCAC GAATCTCACT GTTTCTTTGT CCGCATACAA TTTCCAGAAT CATTACGCCA AAACTAAAGA CATCTGACTT GACTGAAAAC TGTCCATATT TAATGTACTC AGGAGCCATA TATCCACTGT ATATCACAAT AAGCAAGCAC ATTT[A/G]G ATTAAAAAA[A/G]TATTAC TTCACCATTC ATGTTTCACC TATTGTTTTT TTAAGGGCCT ATTGGCTTTT

GTTCTAAAGA AAATCACTAA TTGAAAATAT GGAATATCTT ATTAACTCTC AAGTGTTGTT ATACTTACAA GGTCCCGACA ATTGTATTTG TACTGGCTTG AGTTTGATTG ATCTCAAATA ATCTTGCCAT GCCAAAATCT GATATTTTAG GGTTCAACTC TTCATCTAAC AAAATGTTAC TTGTTTTGAG ATCACGATGA ACAA[C/T]T TGTAATCGAG AATCTTCA[C/T]GAAGGTA AAGAAGACCT

CGAGCAATAC CCCTTATAAT ATTATAGCGT CTTTCCCAAT TCAAATTCAC ACGATTGTTT GGATCTACAT AAATACCCAA GCCACCATAG ACATGTATAG TCTTTCACAA TTTAATAAAT TGTTCAATAA TATCTCTTTA TATCATCAAA GGTAAAGTCA AAGCAAAAAT TTGTAACTTA CCAAATATGA AATAATCAAG GCTTTTATTG GGAACCAATT CATATATCAA TAACCTTTC TCTTCTTGAA

AAACAAAAGC CAAGCAGTC TAACTAAGTT TCGGTGTTGA AGCTTCCCTG TTAC

In the above, bases that are ambiguous are represented by an N. All bases noted within brackets are polymorphic in some manner, i.e., they differ between genotypes (but the sequences are known and represented). Bracketed sequence notations that do not contain slashes represent insertion/deletion events, i.e., the bracketed sequences occur in some genotypes but are deleted in others. Bracketed bases before and after slashes are substituted in different genotypes, e.g., the [AC/TT] notation indicates that some genotypes exhibit an AC sequence at this position and others exhibit a TT. Similarly, [TCTAG/TCTGG/TC/GTTAG] indicates that genotypes exhibited one of these four different sequences at this position, i.e., the nomenclature indicates that the site displays a combination of insertion events and single base polymorphisms.
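By way of illustration only, the bracket nomenclature described above can be read mechanically. The following Python sketch is not part of the disclosed methods; the function name `parse_polymorphisms`, the 0-based offset convention, and the use of an empty string for the deleted allele are assumptions made for this example.

```python
import re

# Hypothetical helper for illustration; not taken from the specification.
# Matches one bracketed polymorphic site such as [A/C], [AT] or [TCTAG/TCTGG/TC/GTTAG].
BRACKET = re.compile(r"\[([^\[\]]*)\]")


def parse_polymorphisms(annotated):
    """Return a list of (offset, alleles) tuples for each bracketed site.

    offset  -- 0-based count of unbracketed bases that precede the site,
               after whitespace has been removed from the listing.
    alleles -- list of alternative sequences; a bracket without a slash is
               treated as an insertion/deletion, i.e. [sequence, ""].
    """
    compact = "".join(annotated.split())  # drop the spaces between 10-mers
    sites = []
    offset = 0   # unbracketed bases seen so far
    last = 0     # index just past the previous bracket
    for match in BRACKET.finditer(compact):
        offset += match.start() - last
        body = match.group(1)
        if "/" in body:
            alleles = body.split("/")      # substitution or mixed site
        else:
            alleles = [body, ""]           # present in some genotypes, absent in others
        sites.append((offset, alleles))
        last = match.end()
    return sites


if __name__ == "__main__":
    print(parse_polymorphisms("TTTCT[A/C] TAAAAAAA[A/T]TTAAGAC"))
    print(parse_polymorphisms("TAT[AT]GAA"))
```

Under these assumptions, a site written without a slash yields two alleles (the bracketed sequence and an empty string), while a site such as [TCTAG/TCTGG/TC/GTTAG] yields all four alternatives in the order listed.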

The nucleotide polymorphisms described by this invention can reduce the expense and time required to exploit genetic markers in soybean improvement, seed production, and the protection of proprietary rights. We describe the use of allele-specific hybridization (ASH) as one technology that can be used to detect these polymorphisms. Using these polymorphisms with a technology such as ASH is less expensive and less time-consuming than other genetic marker methods. The polymorphisms described here have the genetic advantages of being co-dominant and locus specific, and have the operational advantages of being obtained by PCR amplification from small quantities of DNA template and being detected without the use of gel electrophoresis.

Accordingly, the disclosures and descriptions herein are intended to be illustrative, but not limiting, of the scope of the invention which is set forth in the following claims. One of skill will recognize many modifications which fall within the scope of the following claims. For example, all of the methods and compositions herein may be used in different combinations to achieve results selected by one of skill. All publications and patent applications cited herein are incorporated by reference in their entirety for all purposes, as if each were specifically indicated to be incorporated by reference.

Claims

WHAT IS CLAIMED IS:

1. A method of selecting a first plant by marker-assisted selection of a nucleotide polymorphism comprising the steps of: (i) detecting a first marker nucleic acid from the first plant which is genetically linked to a locus selected from the group consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, php12105A, php02340B, php05264A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A and SOYBPSP, which loci comprise the nucleotide polymorphism; and, (ii) selecting the first plant comprising the marker nucleic acid, thereby selecting for the polymorphic nucleotide.

2. The method of claim 1, wherein the locus is selected from the group of loci consisting of php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A, and php10078A.

3. The method of claim 1, wherein at least one additional marker nucleic acid linked to a locus from the group consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP is detected and the plant is selected for the at least one additional locus.

4. The method of claim 3, wherein a majority of the additional loci are detected.

5. The method of claim 1, wherein the marker is detected by amplifying a nucleic acid comprising the selected locus.

6. The method of claim 5, wherein the amplified nucleic acid is an amplicon selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.

7. The method of claim 1, wherein the marker nucleic acid comprises the polymorphic nucleotide.

8. A method of detecting a genetic nucleotide polymorphism in a biological sample from a soybean plant, comprising the steps of: (i) providing the biological sample; (ii) providing a first probe nucleic acid which hybridizes to a first target nucleic acid linked to a first nucleotide polymorphism in a first locus selected from the first group of loci consisting of php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A, and php10078A; (iii) contacting the first probe to the first target nucleic acid; and, (iv) detecting hybridization of the first probe and the first target nucleic acid, thereby detecting the first nucleotide polymorphism in the biological sample.

9. The method of claim 8, further comprising detecting a second target nucleic acid linked to a second nucleotide polymorphism in a second locus selected from the second group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP.

10. The method of claim 9, wherein the method further comprises detecting marker polymorphic nucleotides at a majority of the first and second groups of loci, thereby providing a comprehensive genotype of the biological tissue.

11. The method of claim 8, wherein the first probe specifically hybridizes to a single locus of a soybean genome.

12. The method of claim 8, wherein step (iv) consists of indirect detection of the hybridization of the first probe to the first target nucleic acid.

13. The method of claim 12, wherein indirect detection of the hybridization comprises detecting a hybridization dependent PCR amplicon selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.

14. The method of claim 13, wherein the PCR amplicon is made using the first probe as a primer for polymerase-dependent amplification, wherein the first nucleic acid is a template for the polymerase dependent amplification, and wherein detection of hybridization between the first probe and the first nucleic acid is performed by detecting the PCR amplicon.

15. The method of claim 8, wherein the method comprises amplification of the first target nucleic acid, or amplification of the first probe.

16. The method of claim 8, wherein step (iv) comprises direct detection of the hybridization of the first probe to the first target nucleic acid.

17. The method of claim 16, wherein hybridization is detected using a technique selected from the group consisting of Southern blotting, northern blotting and array-dependent nucleic acid hybridization on a nucleic acid polymer array.

18. The method of claim 8, the method further comprising detection of a plurality of target nucleic acids linked to polymorphic nucleotides in a plurality of loci selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP.

19. The method of claim 8, wherein the biological sample is selected from the group consisting of a soybean plant, a soybean plant extract, an isolated soybean plant tissue, an isolated plant tissue extract, a soybean plant cell culture, a soybean plant cell culture extract, a recombinant cell comprising a nucleic acid derived from a soybean plant, a soybean plant seed, and an extract of a recombinant cell comprising a nucleic acid derived from a soybean plant.

20. The method of claim 8, wherein the first target nucleic acid comprises the first polymorphic nucleotide.

21. The method of claim 8, wherein the target nucleic acid is amplified prior to detection by an amplification technique selected from the group consisting of: PCR, LCR, and cloning of the target nucleic acid.

22. The method of claim 8, further comprising marker-assisted selection of a soybean plant comprising the detected nucleotide polymorphism.

23. The method of claim 8, further comprising cloning a nucleic acid proximal to a locus selected from the group consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP.

24. The method of claim 8, wherein a second polymorphic nucleotide is detected, the first and second polymorphic nucleotides corresponding to different polynucleotide positions, the method further comprising positional cloning of a clonable nucleic acid which hybridizes under stringent conditions to a genomic nucleic acid located between the two polymorphic nucleotides.

25. The method of claim 24, wherein the clonable nucleic acid encodes a polypeptide expressed in a soybean plant.

26. The method of claim 24, wherein the clonable nucleic acid is operably linked to a heterologous promoter.

27. The method of claim 24, further comprising transducing the clonable nucleic acid into a cell.

28. The method of claim 8, further comprising map identification of a second nucleotide polymorphism proximal to the selected loci, optionally by sequencing a nucleic acid comprising the second nucleotide polymorphism.

29. The method of claim 8, further comprising (v) transducing a nucleic acid in linkage disequilibrium with a polymorphic nucleotide from the selected loci into a soybean plant, thereby providing a transgenic soybean plant.

30. The method of claim 29, wherein step (iv) is performed after step (v).

31. The method of claim 8, wherein the first probe comprises a nucleic acid selected from the group of nucleic acids consisting of (PROBE SEQS).

32. The method of claim 8, wherein the polymorphic nucleotide is in linkage disequilibrium with a Quantitative Trait Locus (QTL) selected from the group consisting of a QTL for resistance to soybean cyst nematode, a QTL for resistance to brown stem rot, and a QTL for Phytophthora rot.

33. A method of separating two nucleic acids comprising an allele-specific nucleotide polymorphism on a locus selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP; said method consisting of separating the two nucleic acids by size or charge and detecting the two separated nucleic acids.

34. The method of claim 33, wherein the two nucleic acids are separated by single-strand conformation polymorphism on a polyacrylamide gel.

35. The method of claim 33, wherein the loci are selected from the group consisting of php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A, and php10078A.

36. A method of amplifying a nucleic acid, comprising providing a first primer nucleic acid which hybridizes under stringent conditions to a locus nucleic acid from a locus selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP; providing a template nucleic acid which hybridizes to the selected locus under stringent conditions; hybridizing the primer to the template; and, amplifying a portion of the template nucleic acid with a template-dependent polymerase enzyme or a ligase enzyme.

37. The method of claim 36, wherein the primer is less than 100 nt in length and provides a polymerase extendible substrate and the primer-dependent polymerase extends the primer.

38. The method of claim 36, wherein the primer is an allele-specific primer.

39. The method of claim 36, wherein the first primer hybridizes adjacent to a second primer on the template nucleic acid and the first and second primers are ligated with a ligase enzyme, thereby amplifying the portion of the template hybridized to the first and second primers.

40. The method of claim 36, wherein the polymerase is a thermostable polymerase and the portion of the template is amplified by PCR using the first primer as a PCR primer.

41. The method of claim 40, wherein the method further comprises hybridizing a second primer to the template, wherein the first and second primer hybridize to complementary strands of the template nucleic acid, and wherein the first and second primers are PCR primers for the PCR.

42. The method of claim 36, wherein the template is amplified by a technique selected from the group consisting of PCR, asymmetric PCR, and LCR.

43. A composition comprising a first recombinant nucleic acid which differentially hybridizes under allele-specific hybridization conditions to a first allele from a locus in the soybean genome selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP; wherein the first recombinant nucleic acid shows decreased hybridization affinity for a second allele from the selected locus.

44. The composition of claim 43, wherein the isolated nucleic acid is an oligonucleotide probe selected from the group of probes consisting of (PROBES).

45. The composition of claim 43, further comprising a second recombinant nucleic acid which differentially hybridizes under allele-specific hybridization conditions to the second allele from the selected locus, wherein the second nucleic acid shows decreased hybridization affinity for the first allele from the selected locus.

46. A composition comprising a recombinant nucleic acid which specifically hybridizes to a first allele-specific probe and a second allele-specific probe, which first and second allele-specific probes hybridize under allele-specific hybridization conditions to a first haplotype of a locus in the soybean genome selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP.

47. The composition of claim 46, the composition further comprising an allele-specific probe which hybridizes to the recombinant nucleic acid.

48. The composition of claim 46, the composition comprising an allele-specific probe which hybridizes to a locus selected from the group consisting of php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php02329A, php02371A, php05290A, php02376A, php10078A and php10355B.

49. A PCR reaction mixture comprising: a polymerase enzyme; deoxynucleotides; a template nucleic acid comprising a polymorphic nucleotide, which template nucleic acid hybridizes under stringent conditions to a locus selected from a group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, pA343A, pA748B, pA858A, pG17.3, pB132A, php02329A, php02371A, php05290A, php02376A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP; and, primers which specifically hybridize to the template nucleic acid, which primers are extendible by the polymerase under selected PCR reaction conditions.

50. The PCR reaction mixture of claim 49, wherein the primers are selected from the group of primers consisting of (PCR PRIMERS).

51. The PCR reaction mixture of claim 49, wherein the primers are allele-specific primers.

52. A PCR amplicon, which amplicon comprises a nucleic acid comprising a polymorphic nucleotide, which amplicon hybridizes under stringent conditions to a locus selected from a group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP.

53. The PCR amplicon of claim 52, wherein the locus is selected from the group consisting of php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php02329A, php02371A, php05290A, php02376A, php10078A and php10355B.

54. The PCR amplicon of claim 52, wherein the amplicon is selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.

55. The PCR amplicon of claim 52, wherein the amplicon is selected from the group consisting of php11138 and php11627.

56. The PCR amplicon of claim 52, wherein the amplicon is cloned into a vector.

57. An isolated or recombinant nucleic acid which hybridizes under stringent conditions to the PCR amplicon of claim 52.

58. The nucleic acid of claim 57, which nucleic acid is a DNA clone.

59. A set of nucleic acid probes comprising a plurality of probe nucleic acids which specifically hybridize to a plurality of target nucleic acids which hybridize under stringent conditions to a plurality of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP, or to an amplicon selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.

60. The set of claim 59, wherein the set hybridizes to a locus selected from the group of loci consisting of php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php02329A, php02371A, php05290A, php02376A, php10078A and php10355B.

61. The set of claim 59, wherein the probe nucleic acids are arranged in an array.

62. The set of claim 59, wherein the set of nucleic acids is in kit form, said kit optionally comprising one or more component selected from the components consisting of a container, instructional materials, one or more control target nucleic acids, and recombinant cells comprising one or more target nucleic acids.

63. A recombinant plant comprising a recombinant nucleic acid which hybridizes under stringent conditions to a target nucleic acid comprising a nucleotide polymorphism from a locus selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A and SOYBPSP, or to an amplicon selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.

64. The recombinant plant of claim 63, wherein the recombinant nucleic acid comprises a coding sequence encoded by a gene in linkage disequilibrium with a Quantitative Trait Locus (QTL).

65. The recombinant plant of claim 64, wherein the QTL is selected from the group consisting of a QTL for resistance to soybean cyst nematode, a QTL for resistance to brown stem rot, and a QTL for resistance to Phytophthora rot.