WO1998020165A2 - Biallelic markers - Google Patents

Biallelic markers

Info

Publication number
WO1998020165A2
Authority
WO
WIPO (PCT)
Prior art keywords
polymorphic
segment
allele
column
nucleic acid
Application number
PCT/US1997/020313
Other languages
French (fr)
Other versions
WO1998020165A3 (en)
Inventor
Eric S. Lander
David Wang
Thomas Hudson
Original Assignee
Whitehead Institute For Biomedical Research
Application filed by Whitehead Institute For Biomedical Research
Priority to EP97946582A (EP0941366A2)
Publication of WO1998020165A2
Publication of WO1998020165A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Definitions

  • the variant form may confer an evolutionary advantage or disadvantage relative to a progenitor form or may be neutral.
  • a variant form confers a lethal disadvantage and is not transmitted to subsequent generations of the organism.
  • a variant form confers an evolutionary advantage to the species and is eventually incorporated into the DNA of many or most members of the species and effectively becomes the progenitor form.
  • a restriction fragment length polymorphism is a variation in DNA sequence that alters the length of a restriction fragment (Botstein et al., Am. J. Hum. Genet. 32, 314-331 (1980)).
  • the restriction fragment length polymorphism may create or delete a restriction site, thus changing the length of the restriction fragment.
  • RFLPs have been widely used in human and animal genetic analyses (see WO 90/13668; WO 90/11369; Donis-Keller, Cell 51, 319-337 (1987); Lander et al., Genetics 121, 85-99 (1989)).
  • the presence of the RFLP in an individual can be used to predict the likelihood that the animal will also exhibit the trait.
  • VNTR: variable number tandem repeat
  • STRs: short tandem repeats
  • VNTRs have been used in identity and paternity analysis (US 5,075,217; Armour et al., FEBS Lett. 307, 113-115 (1992); Horn et al., WO 91/14003; Jeffreys, EP 370,719), and in a large number of genetic mapping studies.
  • Other polymorphisms take the form of single nucleotide variations between individuals of the same species.
  • Such polymorphisms are far more frequent than RFLPs, STRs and VNTRs.
  • Some single nucleotide polymorphisms occur in protein-coding sequences, in which case one of the polymorphic forms may give rise to the expression of a defective or other variant protein and, potentially, a genetic disease. Examples of genes in which polymorphisms within coding sequences give rise to genetic disease include β-globin (sickle cell anemia) and CFTR (cystic fibrosis).
  • Other single nucleotide polymorphisms occur in noncoding regions. Some of these polymorphisms may also result in defective protein expression (e.g., as a result of defective splicing). Other single nucleotide polymorphisms have no phenotypic effects.
  • Single nucleotide polymorphisms can be used in the same manner as RFLPs and VNTRs, but offer several advantages. Single nucleotide polymorphisms occur with greater frequency and are spaced more uniformly throughout the genome than other forms of polymorphism. The greater frequency and uniformity of single nucleotide polymorphisms mean that there is a greater probability that such a polymorphism will be found in close proximity to a genetic locus of interest than would be the case for other polymorphisms. The different forms of characterized single nucleotide polymorphisms are often easier to distinguish than other types of polymorphism (e.g., by use of assays employing allele-specific hybridization probes or primers).
  • the invention provides nucleic acid sequences comprising nucleic acid segments of from about 10 to about 200 bases as shown in the Table, column 7, including a polymorphic site. Complements of these segments are also included.
  • the segments can be DNA or RNA, and can be double- or single-stranded. Segments can be, for example, 10-20, 10-50 or 10-100 bases long. Preferred segments include a biallelic polymorphic site. The base occupying the polymorphic site in the segments can be the reference
  • the invention further provides allele-specific oligonucleotides that hybridize to a segment of a fragment shown in the Table, column 7, or its complement. These oligonucleotides can be probes or primers. Also provided are isolated nucleic acids comprising a sequence shown in the Table, column 7, or the complement thereto, in which the polymorphic site within the sequence is occupied by a base other than the reference base shown in the Table, column 3.
  • the invention further provides a method of analyzing a nucleic acid from an individual.
  • the method determines which base is present at any one of the polymorphic sites shown in the Table.
  • a set of bases occupying a set of the polymorphic sites shown in the Table is determined. This type of analysis can be performed on a number of individuals, who are tested for the presence of a disease phenotype. The presence or absence of disease phenotype is then correlated with a base or set of bases present at the polymorphic sites in the individuals tested.
  • An oligonucleotide can be DNA or RNA, and single- or double-stranded. Oligonucleotides can be naturally occurring or synthetic, but are typically prepared by synthetic means.
  • the oligonucleotides of the present invention can comprise all of an oligonucleotide sequence presented in column 7 of the Table or a segment of such an oligonucleotide which includes a polymorphic site.
  • Oligonucleotides can be all of a nucleic acid segment as represented in column 7 of the Table; a nucleic acid sequence which comprises a nucleic acid segment represented in column 7 of the Table and additional nucleic acids (present at either or both ends of a nucleic acid segment of column 7); or a portion (fragment) of a nucleic acid segment represented in column 7 of the Table which includes a polymorphic site.
  • Preferred oligonucleotides of the invention include segments of DNA, or their complements, which include any one of the polymorphic sites shown in the Table. The segments can be between 5 and 250 bases, and, in specific embodiments, are between 5-10, 5-20, 10-20, 10-50, 20-50 or 10-100 bases.
  • the polymorphic site can occur within any position of the segment.
  • the segments can be from any of the allelic forms of DNA shown in the Table.
  • Hybridization probes are oligonucleotides which bind in a base-specific manner to a complementary strand of nucleic acid. Such probes include peptide nucleic acids, as described in Nielsen et al., Science 254, 1497-1500 (1991).
  • primer refers to a single-stranded oligonucleotide which acts as a point of initiation of template-directed DNA synthesis under appropriate conditions (e.g.
  • primer site refers to the area of the target DNA to which a primer hybridizes.
  • primer pair refers to a set of primers including a 5' (upstream) primer that hybridizes with the 5' end of the DNA sequence to be amplified and a 3' (downstream) primer that hybridizes with the complement of the 3' end of the sequence to be amplified.
  • linkage describes the tendency of genes, alleles, loci or genetic markers to be inherited together as a result of their location on the same chromosome. It can be measured by percent recombination between the two genes, alleles, loci or genetic markers.
  • polymorphism refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population.
  • a polymorphic marker or site is the locus at which divergence occurs. Preferred markers have at least two alleles, each occurring at a frequency of greater than 1%, and more preferably greater than 10% or 20%, of a selected population.
  • a polymorphic locus may be as small as one base pair.
  • Polymorphic markers include restriction fragment length polymorphisms, variable number of tandem repeats (VNTRs), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence repeats, and insertion elements such as Alu.
  • allelic form is arbitrarily designated as the reference form and other allelic forms are designated as alternative or variant alleles.
  • allelic form occurring most frequently in a selected population is sometimes referred to as the wildtype form. Diploid organisms may be homozygous or heterozygous for allelic forms.
  • a diallelic or biallelic polymorphism has two forms.
  • a triallelic polymorphism has three forms.
  • a single nucleotide polymorphism occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations).
  • a single nucleotide polymorphism usually arises due to substitution of one nucleotide for another at the polymorphic site.
  • a transition is the replacement of one purine by another purine or one pyrimidine by another pyrimidine.
  • a transversion is the replacement of a purine by a pyrimidine or vice versa.
  • Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele.
  • the polymorphic site is occupied by a base other than the reference base. For example, where the reference allele contains the base "T" at the polymorphic site, the altered allele can contain a "C", "G" or "A" at the polymorphic site.
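The transition/transversion distinction defined above can be illustrated with a minimal Python sketch. The function name `classify_substitution` is a hypothetical helper introduced here for illustration only; it simply checks whether the reference and alternative bases belong to the same chemical class (purine or pyrimidine).

```python
# Illustrative helper (not part of the patent): classify a single-base
# substitution using the purine/pyrimidine definitions given above.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a ref -> alt substitution."""
    ref, alt = ref.upper(), alt.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    # Same class (purine<->purine or pyrimidine<->pyrimidine) is a transition;
    # a change of class is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For the example in the text, a reference "T" replaced by "C" is a transition, while "T" replaced by "G" or "A" is a transversion.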
  • Hybridizations are usually performed under stringent conditions, for example, at a salt concentration of no more than 1 M and a temperature of at least 25 °C.
  • For example, conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C, or equivalent conditions, are suitable for allele-specific probe hybridizations.
  • Equivalent conditions can be determined by varying one or more of the parameters given as an example, as known in the art, while maintaining a similar degree of identity or similarity between the target nucleotide sequence and the primer or probe used.
  • an isolated nucleic acid of the invention may be substantially isolated with respect to the complex cellular milieu in which it naturally occurs.
  • the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix.
  • the material may be purified to essential homogeneity, for example as determined by PAGE or column chromatography such as HPLC.
  • an isolated nucleic acid comprises at least about 50, 80 or 90 percent (on a molar basis) of all macromolecular species present.
  • the novel polymorphisms of the invention are listed in the Table.
  • the first column of the Table lists the names assigned to the fragments in which the polymorphisms occur.
  • the fragments are all human genomic fragments.
  • the sequence of one allelic form of each of the fragments (arbitrarily referred to as the prototypical or reference form) has been previously published. These sequences are listed at http://www-genome.wi.mit.edu/ (all STS's (sequence tag sites)); http://shgc.stanford.edu (Stanford STS's); and http://ww.tigr.org/ (TIGR STS's).
  • the Web sites also list primers for amplification of the fragments, and the genomic location of fragments. Some fragments are expressed sequence tags, and some are random genomic fragments. All information in the websites concerning the fragments listed in the Table is incorporated by reference in its entirety for all purposes.
  • the second column lists the position in the fragment in which a polymorphic site has been found. Positions are numbered consecutively with the first base of the fragment sequence as listed in one of the above databases being assigned the number one.
  • the third column lists the base occupying the polymorphic site in the sequence in the database. This base is arbitrarily designated the reference or prototypical form, but it is not necessarily the most frequently occurring form.
  • the fourth column in the Table lists the alternative base(s) at the polymorphic site.
  • the fifth column of the Table lists a 5' (upstream or forward) primer that hybridizes with the 5' end of the DNA sequence to be amplified.
  • the sixth column of the Table lists a 3' (downstream or reverse) primer that hybridizes with the complement of the 3' end of the sequence to be amplified.
  • the seventh column of the Table lists a number of bases of sequence on either side of the polymorphic site in each fragment.
  • the indicated sequences can be either DNA or RNA. In the latter, the T's shown in the Table are replaced by U's.
  • the base occupying the polymorphic site is indicated in IUPAC-IUB ambiguity code.
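As a rough illustration of how a column 7 entry might be read, the sketch below expands a flanking sequence containing one ambiguity code into its allelic forms and converts a DNA sequence to its RNA equivalent. The mapping is the standard IUPAC nucleotide code; the function names and the example sequence used afterwards are hypothetical and do not come from the Table.

```python
# Standard IUPAC nucleotide ambiguity codes mapped to the bases they cover.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_alleles(flank):
    """Return one plain sequence per allele for a flank with one ambiguity code."""
    flank = flank.upper()
    sites = [i for i, base in enumerate(flank) if len(IUPAC[base]) > 1]
    if len(sites) != 1:
        raise ValueError("expected exactly one ambiguity code in the sequence")
    i = sites[0]
    # Substitute each base covered by the ambiguity code at the polymorphic site.
    return [flank[:i] + base + flank[i + 1:] for base in IUPAC[flank[i]]]


def to_rna(seq):
    """Replace T with U, as described for RNA forms of the listed sequences."""
    return seq.upper().replace("T", "U")
```

With the made-up sequence "ACGYTTA", `expand_alleles` yields the two allelic forms "ACGCTTA" and "ACGTTTA", since "Y" stands for C or T.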
  • tissue samples include whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair.
  • For assay of cDNA or mRNA, the tissue sample must be obtained from an organ in which the target nucleic acid is expressed. For example, if the target nucleic acid is a cytochrome P450, the liver is a suitable source.
  • PCR DNA Amplification
  • PCR Protocols: A Guide to Methods and Applications (eds. Innis et al., Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Patent 4,683,202.
  • LCR: ligase chain reaction
  • NASBA: nucleic acid based sequence amplification
  • the latter two amplification methods involve isothermal reactions based on isothermal transcription, which produce both single stranded RNA (ssRNA) and double stranded DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1, respectively.
  • ssRNA: single stranded RNA
  • dsDNA: double stranded DNA
  • the first type of analysis is carried out to identify polymorphic sites not previously characterized (i.e., to identify new polymorphisms).
  • This analysis compares target sequences in different individuals to identify points of variation, i.e., polymorphic sites.
  • By analyzing groups of individuals representing the greatest ethnic diversity among humans and greatest breed and species variety in plants and animals, patterns characteristic of the most common alleles/haplotypes of the locus can be identified, and the frequencies of such alleles/haplotypes in the population can be determined. Additional allelic frequencies can be determined for subpopulations characterized by criteria such as geography, race, or gender.
  • the de novo identification of polymorphisms of the invention is described in the Examples section.
  • the second type of analysis determines which form(s) of a characterized (known) polymorphism are present in individuals under test. There are a variety of suitable procedures, which are discussed in turn.
  • The use of allele-specific probes for analyzing polymorphisms is described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726; Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the respective segments from the two individuals. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles.
  • Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position (e.g., in a 15-mer at the 7 position; in a 16-mer, at either the 8 or 9 position) of the probe. This design of probe achieves good discrimination in hybridization between different allelic forms.
  • Allele-specific probes are often used in pairs, one member of a pair showing a perfect match to a reference form of a target sequence and the other member showing a perfect match to a variant form. Several pairs of probes can then be immobilized on the same support for simultaneous analysis of multiple polymorphisms within the same target sequence.
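The paired, centrally aligned probe design described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `probe_pair`, its parameters `length` and `probe_pos`, and the example fragment are hypothetical; the polymorphic site is given as a 1-based position in the fragment (matching the numbering convention of the Table, column 2), and the probes are written in the sense of the given strand, so they would hybridize to its complement.

```python
def probe_pair(fragment, site, alt, length=15, probe_pos=7):
    """Return (reference_probe, variant_probe) for one polymorphic site.

    fragment  -- reference sequence of the fragment (5' to 3')
    site      -- 1-based position of the polymorphic site in the fragment
    alt       -- alternative base at the polymorphic site
    length    -- probe length (15 in the example from the text)
    probe_pos -- 1-based position within the probe that aligns with the site
    """
    fragment, alt = fragment.upper(), alt.upper()
    start = site - probe_pos          # 0-based start of the probe window
    end = start + length
    if start < 0 or end > len(fragment):
        raise ValueError("probe window falls outside the fragment")
    ref_probe = fragment[start:end]
    i = probe_pos - 1                 # 0-based index of the site in the probe
    if ref_probe[i] == alt:
        raise ValueError("alternative base equals the reference base")
    var_probe = ref_probe[:i] + alt + ref_probe[i + 1:]
    return ref_probe, var_probe
```

For a made-up fragment "ACGTACGTACTGACGTACGTAGCA" with a T-to-C polymorphism at position 11, `probe_pair(fragment, 11, "C")` gives the 15-mers "ACGTACTGACGTACG" (reference) and "ACGTACCGACGTACG" (variant), differing only at probe position 7.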
  • the polymorphisms can also be identified by hybridization to nucleic acid arrays, some examples of which are described in WO 95/11995.
  • One form of such arrays is described in the Examples section in connection with de novo identification of polymorphisms.
  • the same array or a different array can be used for analysis of characterized polymorphisms.
  • WO 95/11995 also describes subarrays that are optimized for detection of a variant form of a precharacterized polymorphism.
  • Such a subarray contains probes designed to be complementary to a second reference sequence, which is an allelic variant of the first reference sequence.
  • the second group of probes is designed by the same principles as described in the Examples, except that the probes exhibit complementarity to the second reference sequence.
  • a second group can be particularly useful for analyzing short subsequences of the primary reference sequence in which multiple mutations are expected to occur within a short distance commensurate with the length of the probes (e.g., two or more mutations within 9 to 21 bases).
  • Allele-Specific Primers: An allele-specific primer hybridizes to a site on target DNA overlapping a polymorphism and only primes amplification of an allelic form to which the primer exhibits perfect complementarity. See Gibbs, Nucleic Acid Res. 17, 2427-2448 (1989). This primer is used in conjunction with a second primer which hybridizes at a distal site. Amplification proceeds from the two primers, resulting in a detectable product which indicates the particular allelic form is present. A control is usually performed with a second pair of primers, one of which shows a single base mismatch at the polymorphic site and the other of which exhibits perfect complementarity to a distal site.
  • the single-base mismatch prevents amplification and no detectable product is formed.
  • the method works best when the mismatch is included in the 3'-most position of the oligonucleotide aligned with the polymorphism because this position is most destabilizing to elongation from the primer (see, e.g., WO 93/22456).
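A minimal sketch of the 3'-terminal placement described above: the hypothetical helper `allele_specific_primers` builds one forward primer per allele, each ending exactly at the polymorphic site so that the allele-discriminating base is the 3'-most base. The function name, the `length` parameter and the example fragment are assumptions for illustration; real primer design would also weigh melting temperature and secondary structure, which are not modeled here.

```python
def allele_specific_primers(fragment, site, alleles, length=20):
    """Return {allele: primer} with the polymorphic base at the primer's 3' end.

    fragment -- reference sequence of the fragment (5' to 3')
    site     -- 1-based position of the polymorphic site in the fragment
    alleles  -- iterable of bases that can occupy the site
    length   -- primer length, including the 3'-terminal polymorphic base
    """
    fragment = fragment.upper()
    start = site - length             # 0-based start of the primer window
    if start < 0 or site > len(fragment):
        raise ValueError("primer window falls outside the fragment")
    upstream = fragment[start:site - 1]   # bases 5' of the polymorphic site
    # Each primer shares the upstream bases and differs only in its last base.
    return {a.upper(): upstream + a.upper() for a in alleles}
```

Using the made-up fragment "ACGTACGTACTGACGTACGTAGCA" with alleles T and C at position 11 and `length=8`, the two primers are "TACGTACT" and "TACGTACC".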
  • the direct analysis of the sequence of polymorphisms of the present invention can be accomplished using either the dideoxy chain termination method or the Maxam Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual (Acad. Press, 1988)).
  • 5. Denaturing Gradient Gel Electrophoresis: Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification (W.H. Freeman and Co, New York, 1992), Chapter 7.
  • Alleles of target sequences can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single stranded PCR products, as described in Orita et al., Proc. Nat. Acad. Sci. 86,
  • Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single stranded amplification products.
  • Single-stranded nucleic acids may refold or form secondary structures which are partially dependent on the base sequence.
  • the different electrophoretic mobilities of single-stranded amplification products can be related to base-sequence differences between alleles of target sequences.
  • polymorphisms of the invention are often used in conjunction with polymorphisms in distal genes.
  • Preferred polymorphisms for use in forensics are biallelic because the population frequencies of two polymorphic forms can usually be determined with greater accuracy than those of multiple polymorphic forms at multi-allelic loci.
  • the capacity to identify a distinguishing or unique set of forensic markers in an individual is useful for forensic analysis. For example, one can determine whether a blood sample from a suspect matches a blood or other tissue sample from a crime scene by determining whether the set of polymorphic forms occupying selected polymorphic sites is the same in the suspect and the sample. If the set of polymorphic markers does not match between a suspect and a sample, it can be concluded (barring experimental error) that the suspect was not the source of the sample. If the set of markers does match, one can conclude that the DNA from the suspect is consistent with that found at the crime scene.
  • If frequencies of the polymorphic forms at the loci tested have been determined (e.g., by analysis of a suitable population of individuals), one can perform a statistical analysis to determine the probability that a match of suspect and crime scene sample would occur by chance.
  • p(ID) is the probability that two random individuals have the same polymorphic or allelic form at a given polymorphic site. In biallelic loci, four genotypes are possible: AA, AB, BA, and BB. If alleles A and B occur in a haploid genome of the organism with frequencies x and y, the probability of each genotype in a diploid organism is
  • the cumulative probability of identity (cum p(ID)) for each of multiple unlinked loci is determined by multiplying the probabilities provided by each locus.
  • $\text{cum } p(ID) = p(ID_1)\,p(ID_2)\,p(ID_3) \cdots$
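The probability-of-identity calculation can be sketched numerically. The genotype formula is cut off in the text above, so the sketch below assumes the standard Hardy-Weinberg proportions (x² for AA, 2xy for the pooled AB/BA heterozygotes, y² for BB, with y = 1 − x) and takes p(ID) as the sum of the squared genotype probabilities; this is a common textbook form, not necessarily the exact expression of the patent. The function names are hypothetical.

```python
from math import prod


def genotype_probs(x):
    """Hardy-Weinberg genotype probabilities for allele A at frequency x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    y = 1.0 - x
    # AB and BA are indistinguishable, so they are pooled as one class (2xy).
    return {"AA": x * x, "AB": 2 * x * y, "BB": y * y}


def p_id(x):
    """Probability that two random individuals share a genotype at one locus."""
    return sum(p * p for p in genotype_probs(x).values())


def cum_p_id(freqs):
    """Cumulative p(ID) over unlinked loci: the product of per-locus values."""
    return prod(p_id(x) for x in freqs)
```

Under these assumptions a biallelic locus with equal allele frequencies gives p(ID) = 0.375, and two such unlinked loci give cum p(ID) = 0.375 × 0.375 = 0.140625; each additional locus multiplies the chance of a coincidental match by a further factor below one.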
  • the object of paternity testing is usually to determine whether a male is the father of a child. In most cases, the mother of the child is known and thus, the mother's contribution to the child's genotype can be traced. Paternity testing investigates whether the part of the child's genotype not attributable to the mother is consistent with that of the putative father. Paternity testing can be performed by analyzing sets of polymorphisms in the putative father and the child. If the set of polymorphisms in the child attributable to the father does not match the set of polymorphisms of the putative father, it can be concluded, barring experimental error, that the putative father is not the real father. If the set of polymorphisms in the child attributable to the father does match the set of polymorphisms of the putative father, a statistical calculation can be performed to determine the probability of coincidental match.
  • polymorphisms of the invention may contribute to the phenotype of an organism in different ways. Some polymorphisms occur within a protein coding sequence and contribute to phenotype by affecting protein structure.
  • the effect may be neutral, beneficial or detrimental, or both beneficial and detrimental, depending on the circumstances.
  • a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal.
  • Other polymorphisms occur in noncoding regions but may exert phenotypic effects indirectly via influence on replication, transcription, and translation.
  • a single polymorphism may affect more than one phenotypic trait.
  • a single phenotypic trait may be affected by polymorphisms in different genes. Further, some polymorphisms predispose an individual to a distinct mutation that is causally related to a certain phenotype.
  • Phenotypic traits include diseases that have known but hitherto unmapped genetic components (e.g., agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy, Wiskott-Aldrich syndrome,
  • Phenotypic traits also include symptoms of, or susceptibility to, multifactorial diseases of which a component is or may be genetic, such as autoimmune diseases, inflammation, cancer, diseases of the nervous system, and infection by pathogenic microorganisms.
  • autoimmune diseases include rheumatoid arthritis, multiple sclerosis, diabetes (insulin-dependent and non-insulin-dependent), systemic lupus erythematosus and Graves disease.
  • cancers include cancers of the bladder, brain, breast, colon, esophagus, kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach and uterus.
  • Phenotypic traits also include characteristics such as longevity, appearance (e.g., baldness, obesity), strength, speed, endurance, fertility, and susceptibility or receptivity to particular drugs or therapeutic treatments.
  • Correlation is performed for a population of individuals who have been tested for the presence or absence of a phenotypic trait of interest and for polymorphic marker sets.
  • a set of polymorphisms (i.e., a polymorphic set)
  • the alleles of each polymorphism of the set are then reviewed to determine whether the presence or absence of a particular allele is associated with the trait of interest.
  • Correlation can be performed by standard statistical methods such as a chi-squared test, and statistically significant correlations between polymorphic form(s) and phenotypic characteristics are noted.
  • allele Al at polymorphism A correlates with heart disease.
  • allele Bl at polymorphism B correlates with increased milk production of a farm animal.
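A chi-squared test of association between an allele and a trait can be sketched for the simplest case, a 2×2 table of counts (allele present/absent against trait present/absent). The function name `chi_squared_2x2` and the counts in the usage note are hypothetical; the statistic is the standard Pearson chi-squared without continuity correction, which has one degree of freedom for a 2×2 table.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    a -- allele present, trait present     b -- allele present, trait absent
    c -- allele absent,  trait present     d -- allele absent,  trait absent
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    # Closed form for a 2x2 table: n * (ad - bc)^2 / (product of the margins).
    return n * (a * d - b * c) ** 2 / (
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    )
```

With invented counts of 30, 10, 20 and 40, the statistic is about 16.67, well above 3.841, the 5% critical value for one degree of freedom, so the allele and trait would be judged associated in that hypothetical sample.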
  • Such correlations can be exploited in several ways.
  • detection of the polymorphic form set in a human or animal patient may justify immediate administration of treatment, or at least the institution of regular monitoring of the patient.
  • Detection of a polymorphic form correlated with serious disease in a couple contemplating a family may also be valuable to the couple in their reproductive decisions.
  • the female partner might elect to undergo in vitro fertilization to avoid the possibility of transmitting such a polymorphism from her husband to her offspring.
  • immediate therapeutic intervention or monitoring may not be justified.
  • the patient can be motivated to begin simple life-style changes (e.g., diet, exercise) that can be accomplished at little cost to the patient but confer potential benefits in reducing the risk of conditions to which the patient may have increased susceptibility by virtue of variant alleles.
  • Identification of a polymorphic set in a patient correlated with enhanced receptiveness to one of several treatment regimes for a disease indicates that this treatment regime should be followed.
  • $Y_{ijkpn} = \mu + YS_i + P_j + X_k + \beta_1 + \ldots + \beta_{17} + PE_n + a_n + e_p$
  • $Y_{ijkpn}$ is the milk, fat, fat percentage, SNF, SNF percentage, energy concentration, or lactation energy record
  • $\mu$ is an overall mean
  • $YS_i$ is the effect common to all cows calving in year-season
  • $X_k$ is the effect common to cows in either the high or average selection line
  • $\beta_1$ to $\beta_{17}$ are the binomial regressions of production record on mtDNA D-loop sequence polymorphisms
  • $PE_n$ is permanent environmental effect common to all records of cow n
  • $a_n$ is effect of animal n and is composed of the additive genetic contribution of sire and dam breeding values and a Mendelian sampling effect
  • $e_p$ is a random residual. It was found that eleven of seventeen polymorphisms tested influenced at least one production trait. Bovines having the best
  • D. Genetic Mapping of Phenotypic Traits: The previous section concerns identifying correlations between phenotypic traits and polymorphisms that directly or indirectly contribute to those traits.
  • the present section describes identification of a physical linkage between a genetic locus associated with a trait of interest and polymorphic markers that are not associated with the trait, but are in physical proximity with the genetic locus responsible for the trait and co-segregate with it.
  • Such analysis is useful for mapping a genetic locus associated with a phenotypic trait to a chromosomal position, and thereby cloning gene(s) responsible for the trait. See Lander et al., Proc. Natl. Acad. Sci.
  • Linkage studies are typically performed on members of a family. Available members of the family are characterized for the presence or absence of a phenotypic trait and for a set of polymorphic markers. The distribution of polymorphic markers in an informative meiosis is then analyzed to determine which polymorphic markers co-segregate with a phenotypic trait. See, e.g., Kerem et al., Science 245, 1073-1080 (1989); Monaco et al., Nature 316, 842 (1985); Yamoka et al., Neurology 40, 222-226 (1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).
  • LOD: log of the odds
  • the likelihood at a given value of θ is the ratio of the probability of the data if the loci are linked at θ to the probability of the data if the loci are unlinked.
  • the computed likelihoods are usually expressed as the log10 of this ratio (i.e., a lod score).
  • a lod score of 3 indicates 1000:1 odds against an apparent observed linkage being a coincidence.
  • the use of logarithms allows data collected from different families to be combined by simple addition. Computer programs are available for the calculation of lod scores for differing values of θ (e.g., LIPED, MLINK (Lathrop, Proc. Nat. Acad. Sci. (USA) 81, 3443-3446 (1984))).
  • a recombination fraction may be determined from mathematical tables. See Smith et al., Mathematical tables for research workers in human genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 (1968). The value of θ at which the lod score is the highest is considered to be the best estimate of the recombination fraction.
  • Positive lod score values suggest that the two loci are linked, whereas negative values suggest that linkage is less likely (at that value of θ) than the possibility that the two loci are unlinked.
  • a combined lod score of +3 or greater is considered definitive evidence that two loci are linked.
  • a negative lod score of -2 or less is taken as definitive evidence against linkage of the two loci being compared.
  • Negative linkage data are useful in excluding a chromosome or a segment thereof from consideration. The search focuses on the remaining non-excluded chromosomal locations.
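The lod-score reasoning above can be sketched in a few lines. This is a minimal illustration only, assuming phase-known meioses that can each be scored as recombinant or non-recombinant; the function names and the family counts are hypothetical, and this is not the algorithm implemented by LIPED or MLINK.

```python
import math


def lod_score(theta, recombinants, nonrecombinants):
    """Log10 odds of linkage at recombination fraction theta versus no linkage (theta = 0.5).

    Assumes phase-known meioses, each scored as recombinant or non-recombinant.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    n = recombinants + nonrecombinants
    log_linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1.0 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


def combined_lod(theta, families):
    """Sum per-family lod scores; logarithms make the combination a simple addition."""
    return sum(lod_score(theta, r, nr) for r, nr in families)


def best_theta(families, grid=None):
    """Return the grid value of theta with the highest combined lod score."""
    grid = grid or [i / 100 for i in range(1, 51)]
    return max(grid, key=lambda t: combined_lod(t, families))


# Hypothetical data: (recombinants, non-recombinants) observed in three families.
families = [(1, 9), (0, 8), (2, 10)]
theta_hat = best_theta(families)
peak_lod = combined_lod(theta_hat, families)
```

Under the thresholds stated above, a `peak_lod` of +3 or greater would be read as evidence for linkage, and a value of -2 or less at a given θ as evidence against it.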
  • the invention further provides variant forms of nucleic acids and corresponding proteins.
  • the nucleic acids comprise one of the sequences described in the Table, column 8, in which the polymorphic position is occupied by one of the alternative bases for that position. Some nucleic acids encode full-length variant forms of proteins.
  • variant proteins have the prototypical amino acid sequences encoded by nucleic acid sequences shown in the Table, column 8, (read so as to be in-frame with the full-length coding sequence of which it is a component) except at an amino acid encoded by a codon including one of the polymorphic positions shown in the Table. That position is occupied by the amino acid coded by the corresponding codon in any of the alternative forms shown in the Table.
  • Variant genes can be expressed in an expression vector in which a variant gene is operably linked to a native or other promoter.
  • the promoter is a eukaryotic promoter for expression in a mammalian cell.
  • the transcription regulation sequences typically include a heterologous promoter and optionally an enhancer which is recognized by the host.
  • the selection of an appropriate promoter for example trp, lac, phage promoters, glycolytic enzyme promoters and tRNA promoters, depends on the host selected.
  • Commercially available expression vectors can be used. Vectors can include host-recognized replication systems, amplifiable genes, selectable markers, host sequences useful for insertion into the host genome, and the like.
  • the means of introducing the expression construct into a host cell varies depending upon the particular construction and the target host. Suitable means include fusion, conjugation, transfection, transduction, electroporation or injection, as described in Sambrook, supra.
  • a wide variety of host cells can be employed for expression of the variant gene, both prokaryotic and eukaryotic. Suitable host cells include bacteria such as E. coli, yeast, filamentous fungi, insect cells, mammalian cells, typically immortalized, e.g., mouse, CHO, human and monkey cell lines and derivatives thereof. Preferred host cells are able to process the variant gene product to produce an appropriate mature polypeptide.
  • the protein may be isolated by conventional means of protein biochemistry and purification to obtain a substantially pure product, i.e., 80, 95 or 99% free of cell component contaminants, as described in Jacoby, Methods in Enzymology Volume 104, Academic Press, New York (1984); Scopes, Protein Purification, Principles and Practice, 2nd Edition, Springer-Verlag, New York (1987); and Deutscher (ed.), Guide to Protein Purification, Methods in Enzymology, Vol. 182 (1990). If the protein is secreted, it can be isolated from the supernatant in which the host cell is grown. If not secreted, the protein can be isolated from a lysate of the host cells.
  • the invention further provides transgenic nonhuman animals capable of expressing an exogenous variant gene and/or having one or both alleles of an endogenous variant gene inactivated.
  • Expression of an exogenous variant gene is usually achieved by operably linking the gene to a promoter and optionally an enhancer, and microinjecting the construct into a zygote.
  • Inactivation of endogenous variant genes can be achieved by forming a transgene in which a cloned variant gene is inactivated by insertion of a positive selection marker. See Capecchi, Science 244, 1288-1292 (1989).
  • the transgene is then introduced into an embryonic stem cell, where it undergoes homologous recombination with an endogenous variant gene. Mice and other rodents are preferred animals. Such animals provide useful drug screening systems.
  • the present invention includes biologically active fragments of the polypeptides, or analogs thereof, including organic molecules which simulate the interactions of the peptides.
  • Biologically active fragments include any portion of the full-length polypeptide which confers a biological function on the variant gene product, including ligand binding, and antibody binding.
  • Ligand binding includes binding by nucleic acids, proteins or polypeptides, small biologically active molecules, or large cellular structures.
  • Antibodies that specifically bind to variant gene products but not to corresponding prototypical gene products are also provided.
  • Antibodies can be made by injecting mice or other animals with the variant gene product or synthetic peptide fragments thereof. Monoclonal antibodies are screened as described, for example, in Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1988); Goding, Monoclonal antibodies, Principles and Practice (2d ed.) Academic Press, New York (1986). Monoclonal antibodies are tested for specific immunoreactivity with a variant gene product and lack of immunoreactivity to the corresponding prototypical gene product. These antibodies are useful in diagnostic assays for detection of the variant form, or as an active ingredient in a pharmaceutical composition.
  • kits comprising at least one allele-specific oligonucleotide as described above. Often, the kits contain one or more pairs of allele-specific oligonucleotides hybridizing to different forms of a polymorphism. In some kits, the allele-specific oligonucleotides are provided immobilized to a substrate.
  • the same substrate can comprise allele-specific oligonucleotide probes for detecting at least 10, 100 or all of the polymorphisms shown in the Table.
  • kits include, for example, restriction enzymes, reverse-transcriptase or polymerase, the substrate nucleoside triphosphates, means used to label (for example, an avidin-enzyme conjugate and enzyme substrate and chromogen if the label is biotin), and the appropriate buffers for reverse transcription, PCR, or hybridization reactions.
  • the kit also contains instructions for carrying out the methods.
  • the polymorphisms shown in the Table were identified by resequencing of target sequences from three to ten unrelated individuals of diverse ethnic and geographic backgrounds by hybridization to probes immobilized to microfabricated arrays or conventional sequencing.
  • the strategy and principles for design and use of such arrays are generally described in WO 95/11995.
  • the strategy provides arrays of probes for analysis of target sequences showing a high degree of sequence identity to the reference sequences of the fragments shown in the Table, column 1.
  • the reference sequences were sequence-tagged sites (STSs) developed in the course of the Human Genome Project (see, e.g., Science 270, 1945-1954 (1995); Nature 380, 152-154 (1996)).
  • a typical probe array used in this analysis has two groups of four sets of probes that respectively tile both strands of a reference sequence.
  • a first probe set comprises a plurality of probes exhibiting perfect complementarity with one of the reference sequences.
  • Each probe in the first probe set has an interrogation position that corresponds to a nucleotide in the reference sequence. That is, the interrogation position is aligned with the corresponding nucleotide in the reference sequence, when the probe and reference sequence are aligned to maximize complementarity between the two.
  • For each probe in the first set there are three corresponding probes from three additional probe sets. Thus, there are four probes corresponding to each nucleotide in the reference sequence.
  • probes from the three additional probe sets are identical to the corresponding probe from the first probe set except at the interrogation position, which occurs in the same position in each of the four corresponding probes from the four probe sets, and is occupied by a different nucleotide in the four probe sets.
  • probes were 25 nucleotides long. Arrays tiled for multiple different reference sequences were included on the same substrate.
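The four-probe tiling described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names and the reference sequence are hypothetical, the interrogation position is assumed to be the central base of each 25-mer (the text above does not fix its location), and probes are written 5' to 3' as the reverse complement of the reference window. Only one strand is tiled here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_probes(reference, probe_len=25):
    """Build four probes for every reference position that a full probe can cover.

    Returns {reference_index: {interrogation_base: probe_sequence}}.
    Assumption: the interrogation position is the central base of the probe.
    """
    if probe_len % 2 == 0:
        raise ValueError("this sketch assumes an odd probe length")
    half = probe_len // 2
    tiles = {}
    for i in range(half, len(reference) - half):
        window = reference[i - half : i + half + 1]
        probes = {}
        for target_base in "ACGT":
            # Substitute each possible base at the centre of the reference window,
            # then take the reverse complement to obtain the probe.
            variant = window[:half] + target_base + window[half + 1 :]
            probes[COMPLEMENT[target_base]] = reverse_complement(variant)
        tiles[i] = probes
    return tiles


# Hypothetical 30-base reference sequence, not taken from the Table.
reference = "ACGTTGCAAGGCTTACCGATAGGCTTAACG"
tiles = tile_probes(reference)
```

For each covered position, the probe keyed by the complement of the reference base is the perfectly complementary member of the first probe set; the other three differ from it only at the interrogation position.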
  • target sequences from an individual were amplified from human genomic DNA using primers for the fragments indicated in the listed Web sites.
  • the amplified target sequences were fluorescently labelled during or after PCR.
  • the labelled target sequences were hybridized with a substrate bearing immobilized arrays of probes. The amount of label bound to probes was measured. Analysis of the pattern of label revealed the nature and position of differences between the target and reference sequence. For example, comparison of the intensities of four corresponding probes reveals the identity of a corresponding nucleotide in the target sequences aligned with the interrogation position of the probes.
  • the corresponding nucleotide is the complement of the nucleotide occupying the interrogation position of the probe showing the highest intensity (see WO 95/11995).
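The base-calling rule stated above (the target base is the complement of the interrogation base of the brightest of the four corresponding probes) can be sketched as follows. The function name, the intensity values, and the `min_ratio` no-call threshold are assumptions added for illustration; the text above does not specify how ambiguous signals are handled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities, min_ratio=1.2):
    """Call the target base from the intensities of four corresponding probes.

    intensities maps each probe's interrogation base to its measured signal.
    min_ratio is a hypothetical threshold: if the brightest probe is not at
    least min_ratio times brighter than the runner-up, "N" (no call) is returned.
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if best <= 0:
        return "N"
    if second > 0 and best / second < min_ratio:
        return "N"
    return COMPLEMENT[best_base]


# Hypothetical intensities for one reference position.
clear_call = call_base({"A": 120.0, "C": 95.0, "G": 2400.0, "T": 110.0})
ambiguous_call = call_base({"A": 1000.0, "C": 950.0, "G": 10.0, "T": 5.0})
```

An ambiguous result of this kind at one position in some individuals but not others is one way a heterozygous or variant site might show up, which is why the sketch returns a no-call rather than forcing a base.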
  • the existence of a polymorphism is also manifested by differences in normalized hybridization intensities of probes flanking the polymorphism when the probes are hybridized to corresponding targets from different individuals. For example, relative loss of hybridization intensity in a "footprint" of probes flanking a polymorphism signals a difference between the target and reference (i.e., a polymorphism) (see EP 717,113).
  • hybridization intensities for corresponding targets from different individuals can be classified into groups or clusters suggested by the data, not defined a priori, such that isolates in a given cluster tend to be similar and isolates in different clusters tend to be dissimilar. Hybridizations to samples from different individuals were performed separately. The Table summarizes the data obtained for target sequences in comparison with a reference sequence for the individuals tested.
  • the invention includes a number of general uses that can be expressed concisely as follows.
  • the invention provides for the use of any of the nucleic acid segments described above in the diagnosis or monitoring of diseases, such as cancer, inflammation, heart disease, diseases of the CNS, and susceptibility to infection by microorganisms.
  • the invention further provides for the use of any of the nucleic acid segments in the manufacture of a medicament for the treatment or prophylaxis of such diseases.
  • the invention further provides for the use of any of the DNA segments as a pharmaceutical.
  • Wl-7718b 248 AGGAACAAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATT[A/G1AT
  • ATrGCACTG GTTTTTGAAATACCTTTGTAGTTACTCAAGC[A/C, ⁇ GTTACTCCCTACACTGATGC AAGGATTACAGAAACTGATGCCAAGGGGCTGAGTGAGTTCAACTACATGTTCTGGGGGCCCGGAGAT AGATGACTTTGCAGATGGAMGAGGTGAAAATGAAGAAGGAAGCTGTGTTGAAACAGAAAAATAAG
  • Wl-7718a 42 TCAAAAGGAACAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATTA
  • WI-7227C 291 TTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATGCAAT
  • Wl-7227b 93 GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG
  • Wl-7227a 24 G GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG j CCACAATGCCTCTCCCACGATGTCAAGGACTCCTGTCTGTCCTGGAGGTGGGAGACAAGGAACCTCCG
  • Wl-1 95b 1 30 AGTGAGCTGGGGAAGGCAGGATTT
  • Wl-1126b 230 AAAATGCAAATCCAGCTGT CTTTTT[T/C
  • Wl-3429b 64 TCCTGACTGTTAACAAGCACTCCAGGCAATTCTTAAGACCAAGCACGGAGC
  • Wl-3429a 62 TCCTGACTGTTAACAAGCACTCCAGGCAATTCTTAAGACCAAGCACGGAGC
  • Wl-6786b 1 1 1 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT
  • Wl-6786a 1 06 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT
  • AAAAGGACAG TTTCCATCTTA CCAGATATCA TTTCATTTCTG CAACATTTATCAAACATGGTAGGGAAMGTTCTCACTCTGCACTATAAAAAGGACAGCCAGATATCA
  • CAGAAMTCA ATGAGACCCTGCTTTGMCGTTAMCGTTTTGGMTMTGGAAMGGAGCTAGGACMTTCTTGCTT
  • AAAAATTAAC CAGGGTCTTGCTCTGTCTCCCAGGCTAGAGTGAGGTGACACMTCMGACTCACAGTAGCCTCMCCT
  • WI-7079 293 TTTTACAGCTCTTGGCAT ⁇ TCCTCGCCTAGGCCTGTGAGGTMCTGGGAT
  • Wl-7104b 249 GTGAGGCCTTGCACCAGGTGGGGGCCACAGCACCAGCAGCATCTTTG[CtFJF
  • WI-9161 61 1 CCTGGC GGM CTGTCTAGTCTCTCCTGTMGCCAMGMATGMCATTCCA
  • Wl-7023b 206 A[C/A]ACACACATTCTTGCTCTACCCAMGCTCTGGCTGGCAGCACTM
  • WI-7093 54 GGGAGAGCTCTTGTTATFATTMTATTGTTGCCGCTGTTGTGTTGTTGTTA
  • ACTTCTCCC TCTGACCTAGG MAGMCTACAGAGGACGATGTCCAAMCMAAMTGGCATCACCTGTCAAAMTGGAGTTCCACT
  • WI-205C 1 46 ATCTTACTTTGTTTAAMMCTGCATATGCCTTTA I I I I I GTTTTAGTTCCC
  • Wl-205b 1 46 ATCTTACTTTGTTTAAAAMCTGCATATGCCTTTATTTTTGTTTTAGTTCCC
  • WI-1943C 1 65 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
  • Wl-1943b 1 65 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
  • Wi-6336b 234 GTACCCCAGTGCATTATGTCTTGGTAGAGCC[C/T]TGAGGACACTGACAGT
  • Wl-6564b 54 GTTCCTTGGCAGGAGMCATGCATATGACTTTAAMTMAGACCMCA
  • Wl-6817 1 45 MGATGTTGGACACCTTGTGTTCAMTCTTGGTTCAGGTGCGGCCTGTGCAG
  • Wl-6826b 1 54 TMGCTGMTTGCAMTTATGGCMCACACACTGGACTGGGGTATACGTTG
  • WI-6826 1 54 TMGCTGMTTGCAMTTATGGCMCACACACTGGACTGGGGTATACGTTG
  • Wl-7056b 1 8
  • WI-7136 58 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTAGCTTTCTATATATG
  • WI-7146C 21 0 MCGC[A/G]GTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
  • WI-7146 202 ICCMCGCAGTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
  • WI-7153 1 61 AGTACCTATCTTTAMGTATAGTACATTTTACATATGTAAATGGTATGTTT
  • Wl-7169b 1 61 TTTCMGTCATCTTAGCAGCTAGGATTCTCAMTGGMGTGTTATATATA
  • Wl-7464b 1 68 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCMCGTTCACCMCMTTAT
  • Wl-7464a 1 03 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCAACGTTCACCMCMTTAT
  • Wl-7506b 1 1 8 GMGMMTATTTTAAMTATTGGACCACTCTTGTTCTACCATCCCTACCCACT
  • Wl-7534b 1 43 AGAGTGCTGCTAAM ⁇ GGATTGGTGTGATCTTTTTGGTAGTTGTMTTT
  • WI-7534 1 35 AGAGTGCTGCTAAMTTGGATTGGTGTGATCTTTTTGGTAGTTGTMTTT
  • Wl-7543b 1 62 CTCTGCAGCCCTCAGATFATTTTTCCTCTGGCTCCTTGGATGTAGTCAGTTA
  • Wl-7577g 1 57 ATTGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
  • WI-7743C 1 06 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
  • Wl-7743 1 06 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
  • Wl-7765b 1 26 ACTCAMCCAMTCACTGMCTTTGCTGAGCCTGTAMATAAMGGTCGGA
  • Wl-7774b 1 70 ATGATTGAAMTMTGCTGTCCTTTAGTAGCMGTAAMTGTGTCTTGCT
  • Wl-7785b 1 65 TAATTIATTTTGTCCATTGATGTATTTATTTTGTAMTGTATCTTGGTGCTGC
  • WI-7789C 84 GCCCTCCTGGTGACTCGGGGGCTGTCTCAGACGACTAGCCCAGGACCCATCT _
  • Wl-7830d 1 50 T AGGTTGATCGTTGTGTTGTTRTGCTGCACTTTTTACI I I I I IGCGTGTGGA
  • WI-7830C 54 AGGTTGATCGTTGTGTTGTTTFGCTGCACTTTTTAC I I I I I GCGTGTGGA
  • Wl-7830b 1 34 AGGTTGATCGTTGTGTTGTTFTGCTGCACTTTTTAC I I I I I GCGTGTGGA
  • Wl-7900d 1 28 TATGATGTATTTCTGAGCTAAMCTCMCTATAGMGACATTAAAAGAAATC
  • WI-7900C 84 TATGATGTATTTCTGAGCTAAMCTCMCTATAGMGACATTAMAGAAATC
  • WI-7900 84 TATGATGTATTTCTGAGCTAAAACTCAACTATAGAAGACATTAAAAGAAATC
  • WI-8024C 206 TTCCC[A/G]CTCTAGMCAGCTGGCCCTGGTCGTCAGTACACMGGAMGAGC
  • Wl-8024b 206 TTCCC[A/G]CTCTAGMCAGCTGGCCCTGGTCGTCAGTACACMGGMAGAGC
  • WI-8321 1 78 TTTTGCTATGGTTCTAGTTFATCMCCTACTTTATTAGCTGMCTGTTGGC
  • WI-8321 1 78 TTTFGCTATGGTTCTAGTTTATCMCCTACTTTATTAGCTGMCTGTTGGC
  • Wl-8332b 123 AGGTGGAGGGTNTCCGGGGMGCAGTTAGATGAGTTMGTGTGATGCACA
  • Wl-8378b 31 1 MCTGCCCCCATGATCCMTCACCTNTCACCAGGCCCCTCCTCCMCACGTGGGG
  • WI-8378 308 MCTGCCCCCATGATCCMTCACCTNTCACCAGGCCCCTCCTCCMCACGTGGGG
  • WI-8426 1 84 G AGGCTGGGAGTATGGANGGNCCCGGGGCCCTTGGCNATNGNATFCAGTGAG
  • Wl-9676h 1 34 AGGCCAGGGTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
  • Wl-9676d 1 34 AGGCCAGGGTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
  • WI-9832 1 1 6 A TTTGTMGTGGACTAMGTTTGAGGACCAGACATGGMGGTTGGCTTTGGC
  • AAAGCATGAC CGCTTATGTTA AATAAAATGA ATAGTMTTCC CMGTGAATATTGATACATGGCTGACMAGCATGACMTMMTGMCAC[A/G]TACGGGMTTAC

Abstract

The invention provides nucleic acid segments of the human genome including polymorphic sites. Allele-specific primers and probes hybridizing to regions flanking these sites are also provided. The nucleic acids, primers and probes are used in applications such as forensics, paternity testing, medicine and genetic analysis.

Description

BIALLELIC MARKERS
RELATED APPLICATIONS
This application claims priority to U.S. provisional application Serial No. 60/030,455, filed November 6, 1996, the entire teachings of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
The genomes of all organisms undergo spontaneous mutation in the course of their continuing evolution, generating variant forms of progenitor sequences (Gusella, Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form may confer an evolutionary advantage or disadvantage relative to a progenitor form or may be neutral. In some instances, a variant form confers a lethal disadvantage and is not transmitted to subsequent generations of the organism. In other instances, a variant form confers an evolutionary advantage to the species and is eventually incorporated into the DNA of many or most members of the species and effectively becomes the progenitor form. In many instances, both progenitor and variant form(s) survive and co-exist in a species population. The coexistence of multiple forms of a sequence gives rise to polymorphisms.
Several different types of polymorphism have been reported. A restriction fragment length polymorphism (RFLP) is a variation in DNA sequence that alters the length of a restriction fragment (Botstein et al., Am. J. Hum. Genet. 32, 314-331 (1980)). The restriction fragment length polymorphism may create or delete a restriction site, thus changing the length of the restriction fragment. RFLPs have been widely used in human and animal genetic analyses (see WO 90/13668; WO 90/11369; Donis-Keller, Cell 51, 319-337 (1987); Lander et al., Genetics 121, 85-99 (1989)). When a heritable trait can be linked to a particular RFLP, the presence of the RFLP in an individual can be used to predict the likelihood that the animal will also exhibit the trait.
Other polymorphisms take the form of short tandem repeats (STRs) that include tandem di-, tri- and tetra-nucleotide repeated motifs. These tandem repeats are also referred to as variable number tandem repeat (VNTR) polymorphisms. VNTRs have been used in identity and paternity analysis (US 5,075,217; Armour et al., FEBS Lett. 307, 113-115 (1992); Horn et al., WO 91/14003; Jeffreys, EP 370,719), and in a large number of genetic mapping studies.
Other polymorphisms take the form of single nucleotide variations between individuals of the same species. Such polymorphisms are far more frequent than RFLPs, STRs and VNTRs. Some single nucleotide polymorphisms occur in protein-coding sequences, in which case one of the polymorphic forms may give rise to the expression of a defective or other variant protein and, potentially, a genetic disease. Examples of genes in which polymorphisms within coding sequences give rise to genetic disease include β-globin (sickle cell anemia) and CFTR (cystic fibrosis). Other single nucleotide polymorphisms occur in noncoding regions. Some of these polymorphisms may also result in defective protein expression (e.g., as a result of defective splicing). Other single nucleotide polymorphisms have no phenotypic effects.
Single nucleotide polymorphisms can be used in the same manner as RFLPs and VNTRs, but offer several advantages. Single nucleotide polymorphisms occur with greater frequency and are spaced more uniformly throughout the genome than other forms of polymorphism. The greater frequency and uniformity of single nucleotide polymorphisms means that there is a greater probability that such a polymorphism will be found in close proximity to a genetic locus of interest than would be the case for other polymorphisms. The different forms of characterized single nucleotide polymorphisms are often easier to distinguish than other types of polymorphism (e.g., by use of assays employing allele-specific hybridization probes or primers).
Only a small percentage of the total repository of polymorphisms in humans and other organisms has been identified. The limited number of polymorphisms identified to date is due to the large amount of work required for their detection by conventional methods. For example, a conventional approach to identifying polymorphisms might be to sequence the same stretch of DNA in a population of individuals by dideoxy sequencing. In this type of approach, the amount of work increases in proportion to both the length of sequence and the number of individuals in a population and becomes impractical for large stretches of DNA or large numbers of persons.
SUMMARY OF THE INVENTION
The invention provides nucleic acid sequences comprising nucleic acid segments of from about 10 to about 200 bases as shown in the Table, column 7, including a polymorphic site. Complements of these segments are also included. The segments can be DNA or RNA, and can be double- or single-stranded. Segments can be, for example, 10-20, 10-50 or 10-100 bases long. Preferred segments include a biallelic polymorphic site. The base occupying the polymorphic site in the segments can be the reference (Table, column 3) or an alternative base (Table, column 4).
The invention further provides allele-specific oligonucleotides that hybridize to a segment of a fragment shown in the Table, column 7, or its complement. These oligonucleotides can be probes or primers. Also provided are isolated nucleic acids comprising a sequence shown in the Table, column 7, or the complement thereto, in which the polymorphic site within the sequence is occupied by a base other than the reference base shown in the Table, column 3.
The invention further provides a method of analyzing a nucleic acid from an individual. The method determines which base is present at any one of the polymorphic sites shown in the Table. Optionally, a set of bases occupying a set of the polymorphic sites shown in the Table is determined. This type of analysis can be performed on a number of individuals, who are tested for the presence of a disease phenotype. The presence or absence of disease phenotype is then correlated with a base or set of bases present at the polymorphic sites in the individuals tested.
DETAILED DESCRIPTION OF THE INVENTION
DEFINITIONS
An oligonucleotide can be DNA or RNA, and single- or double-stranded. Oligonucleotides can be naturally occurring or synthetic, but are typically prepared by synthetic means. The oligonucleotides of the present invention can comprise all of an oligonucleotide sequence presented in column 7 of the Table or a segment of such an oligonucleotide which includes a polymorphic site. Oligonucleotides can be all of a nucleic acid segment as represented in column 7 of the Table; a nucleic acid sequence which comprises a nucleic acid segment represented in column 7 of the Table and additional nucleic acids (present at either or both ends of a nucleic acid segment of column 7); or a portion (fragment) of a nucleic acid segment represented in column 7 of the Table which includes a polymorphic site. Preferred oligonucleotides of the invention include segments of DNA, or their complements, which include any one of the polymorphic sites shown in the Table. The segments can be between 5 and 250 bases, and, in specific embodiments, are between 5-10, 5-20, 10-20, 10-50, 20-50 or 10-100 bases. The polymorphic site can occur within any position of the segment. The segments can be from any of the allelic forms of DNA shown in the Table.
Hybridization probes are oligonucleotides which bind in a base-specific manner to a complementary strand of nucleic acid. Such probes include peptide nucleic acids, as described in Nielsen et al., Science 254, 1497-1500 (1991).
As used herein, the term primer refers to a single-stranded oligonucleotide which acts as a point of initiation of template-directed DNA synthesis under appropriate conditions (e.g., in the presence of four different nucleoside triphosphates and an agent for polymerization, such as DNA or RNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. The appropriate length of a primer depends on the intended use of the primer, but typically ranges from 15 to 30 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact sequence of the template, but must be sufficiently complementary to hybridize with a template. The term primer site refers to the area of the target DNA to which a primer hybridizes. The term primer pair refers to a set of primers including a 5' (upstream) primer that hybridizes with the 5' end of the DNA sequence to be amplified and a 3' (downstream) primer that hybridizes with the complement of the 3' end of the sequence to be amplified.
As used herein, linkage describes the tendency of genes, alleles, loci or genetic markers to be inherited together as a result of their location on the same chromosome. It can be measured by percent recombination between the two genes, alleles, loci or genetic markers.
As used herein, polymorphism refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. A polymorphic marker or site is the locus at which divergence occurs. Preferred markers have at least two alleles, each occurring at a frequency of greater than 1%, and more preferably greater than 10% or 20% of a selected population. A polymorphic locus may be as small as one base pair. Polymorphic markers include restriction fragment length polymorphisms, variable number of tandem repeats (VNTRs), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence repeats, and insertion elements such as Alu. The first identified allelic form is arbitrarily designated as the reference form and other allelic forms are designated as alternative or variant alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the wildtype form. Diploid organisms may be homozygous or heterozygous for allelic forms. A diallelic or biallelic polymorphism has two forms. A triallelic polymorphism has three forms. A single nucleotide polymorphism occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations).
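The frequency criterion for preferred markers can be made concrete with a small sketch. It assumes diploid genotypes recorded as pairs of alleles; the function names and the sample genotypes are hypothetical and serve only to illustrate the definition above.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from an iterable of diploid genotypes (allele pairs)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_preferred_marker(genotypes, min_freq=0.01):
    """True if at least two alleles each occur at a frequency greater than min_freq."""
    freqs = allele_frequencies(genotypes)
    return sum(1 for f in freqs.values() if f > min_freq) >= 2


# Hypothetical sample of ten individuals typed at one biallelic site.
sample = [("A", "A")] * 6 + [("A", "G")] * 3 + [("G", "G")]
freqs = allele_frequencies(sample)
```

With this sample, the site passes the 1%, 10% and 20% thresholds mentioned above, but would fail a stricter cutoff above the minor-allele frequency.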
A single nucleotide polymorphism usually arises due to substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or vice versa. Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele. Typically the polymorphic site is occupied by a base other than the reference base. For example, where the reference allele contains the base "T" at the polymorphic site, the altered allele can contain a "C", "G" or "A" at the polymorphic site.
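The distinction between transitions and transversions can be expressed directly in code. The following is a minimal sketch with a hypothetical function name; it covers substitutions only, not the single-base insertions or deletions also mentioned above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(reference_base, alternative_base):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = reference_base.upper(), alternative_base.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    # Any purine <-> pyrimidine change is a transversion.
    return "transversion"
```

For the example in the text, a reference "T" replaced by "C" is a transition, while replacement by "G" or "A" is a transversion.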
Hybridizations are usually performed under stringent conditions, for example, at a salt concentration of no more than 1 M and a temperature of at least 25 °C. For example, conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C, or equivalent conditions, are suitable for allele-specific probe hybridizations. Equivalent conditions can be determined by varying one or more of the parameters given as an example, as known in the art, while maintaining a similar degree of identity or similarity between the target nucleotide sequence and the primer or probe used.
The term "isolated" is used herein to indicate that the material in question exists in a physical milieu distinct from that in which it occurs in nature. For example, an isolated nucleic acid of the invention may be substantially isolated with respect to the complex cellular milieu in which it naturally occurs. In some instances, the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix. In other circumstances, the material may be purified to essential homogeneity, for example as determined by PAGE or column chromatography such as HPLC. Preferably, an isolated nucleic acid comprises at least about 50, 80 or 90 percent (on a molar basis) of all macromolecular species present.
I. Novel Polymorphisms of the Invention
The novel polymorphisms of the invention are listed in the Table. The first column of the Table lists the names assigned to the fragments in which the polymorphisms occur. The fragments are all human genomic fragments. The sequence of one allelic form of each of the fragments (arbitrarily referred to as the prototypical or reference form) has been previously published. These sequences are listed at http://www-genome.wi.mit.edu/ (all STS's (sequence tag sites)); http://shgc.stanford.edu (Stanford STS's); and http://ww.tigr.org/ (TIGR STS's). The Web sites also list primers for amplification of the fragments, and the genomic location of fragments. Some fragments are expressed sequence tags, and some are random genomic fragments. All information in the websites concerning the fragments listed in the Table is incorporated by reference in its entirety for all purposes.
The second column lists the position in the fragment in which a polymorphic site has been found. Positions are numbered consecutively with the first base of the fragment sequence as listed in one of the above databases being assigned the number one. The third column lists the base occupying the polymorphic site in the sequence in the database. This base is arbitrarily designated the reference or prototypical form, but it is not necessarily the most frequently occurring form. The fourth column in the Table lists the alternative base(s) at the polymorphic site. The fifth column of the Table lists a 5' (upstream or forward) primer that hybridizes with the 5' end of the DNA sequence to be amplified. The sixth column of the Table lists a 3' (downstream or reverse) primer that hybridizes with the complement of the 3' end of the sequence to be amplified.
The seventh column of the Table lists a number of bases of sequence on either side of the polymorphic site in each fragment. The indicated sequences can be either DNA or RNA. In the latter, the T's shown in the Table are replaced by U's. The base occupying the polymorphic site is indicated in IUPAC-IUB ambiguity code.
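As an illustration of how an ambiguity-coded sequence can be read, the sketch below expands a segment into its allelic forms using the standard IUPAC-IUB nucleotide codes and converts DNA to the corresponding RNA. The function names and the example segment are hypothetical; the segment is not taken from the Table.

```python
from itertools import product

# Standard IUPAC-IUB nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_alleles(segment):
    """Return every unambiguous DNA sequence encoded by an ambiguity-coded segment."""
    choices = [IUPAC[base] for base in segment.upper()]
    return ["".join(bases) for bases in product(*choices)]


def to_rna(dna):
    """Replace each T with U to give the corresponding RNA sequence."""
    return dna.upper().replace("T", "U")


# Hypothetical segment with one biallelic site coded as R (A or G).
alleles = expand_alleles("ACRTG")
rna_alleles = [to_rna(allele) for allele in alleles]
```

Because every ambiguity code in the segment is expanded, a sequence containing many ambiguous positions produces a combinatorially large list; the sketch is intended for segments with a single polymorphic site.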
II. Analysis of Polymorphisms
A. Preparation of Samples
Polymorphisms are detected in a target nucleic acid from an individual being analyzed. For assay of genomic DNA, virtually any biological sample (other than pure red blood cells) is suitable. For example, convenient tissue samples include whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair. For assay of cDNA or mRNA, the tissue sample must be obtained from an organ in which the target nucleic acid is expressed. For example, if the target nucleic acid is a cytochrome P450, the liver is a suitable source.
Many of the methods described below require amplification of DNA from target samples. This can be accomplished by, e.g., PCR. See generally PCR Technology: Principles and Applications for DNA Amplification (ed. H.A. Erlich, Freeman Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Patent 4,683,202.
Other suitable amplification methods include the ligase chain reaction (LCR) (see Wu and Wallace, Genomics 4, 560 (1989), Landegren et al., Science 241, 1077 (1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989)), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87, 1874 (1990)), and nucleic acid based sequence amplification (NASBA). The latter two amplification methods involve isothermal reactions based on isothermal transcription, which produce both single stranded RNA (ssRNA) and double stranded DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1, respectively.
B. Detection of Polymorphisms in Target DNA
There are two distinct types of analysis of target DNA for detecting polymorphisms. The first type of analysis, sometimes referred to as de novo characterization, is carried out to identify polymorphic sites not previously characterized (i.e., to identify new polymorphisms). This analysis compares target sequences in different individuals to identify points of variation, i.e., polymorphic sites. By analyzing groups of individuals representing the greatest ethnic diversity among humans and greatest breed and species variety in plants and animals, patterns characteristic of the most common alleles/haplotypes of the locus can be identified, and the frequencies of such alleles/haplotypes in the population can be determined. Additional allelic frequencies can be determined for subpopulations characterized by criteria such as geography, race, or gender. The de novo identification of polymorphisms of the invention is described in the Examples section. The second type of analysis determines which form(s) of a characterized (known) polymorphism are present in individuals under test. There are a variety of suitable procedures, which are discussed in turn.
1. Allele-Specific Probes
The design and use of allele-specific probes for analyzing polymorphisms is described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726; Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the respective segments from the two individuals. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position (e.g., in a 15-mer at the 7 position; in a 16-mer, at either the 8 or 9 position) of the probe. This design of probe achieves good discrimination in hybridization between different allelic forms.
Allele-specific probes are often used in pairs, one member of a pair showing a perfect match to a reference form of a target sequence and the other member showing a perfect match to a variant form. Several pairs of probes can then be immobilized on the same support for simultaneous analysis of multiple polymorphisms within the same target sequence.
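The probe-pair design described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the `[X/Y]` input notation (borrowed from the Table), and the choice to place the polymorphic base at the exact center of an odd-length probe are hypothetical and are not taken from the cited references.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def allele_specific_probes(flank_seq, probe_len=15):
    """Build one probe per allele for a site written like 'ACGT[C/G]TGCA'.

    Each probe is the reverse complement of the target segment, so that it
    hybridizes to the strand shown, with the polymorphic base placed at
    (or next to) the center of the probe. Centering is an assumption.
    """
    match = re.search(r"\[([ACGT])/([ACGT])\]", flank_seq)
    if match is None:
        raise ValueError("no [X/Y] polymorphic site found")
    left, right = flank_seq[:match.start()], flank_seq[match.end():]
    n_left = (probe_len - 1) // 2       # target bases 5' of the site
    n_right = probe_len - 1 - n_left    # target bases 3' of the site
    if len(left) < n_left or len(right) < n_right:
        raise ValueError("not enough flanking sequence for this probe length")
    probes = {}
    for allele in match.groups():
        target = left[len(left) - n_left:] + allele + right[:n_right]
        probes[allele] = reverse_complement(target)
    return probes


if __name__ == "__main__":
    # Hypothetical flanking sequence, used only to show the call.
    for allele, probe in allele_specific_probes("AAAAAAA[C/G]TTTTTTT").items():
        print(allele, probe)
```

The two probes returned differ only at the base opposite the polymorphic site, which is the property that lets a stringent hybridization distinguish the alleles.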
2. Tiling Arrays
The polymorphisms can also be identified by hybridization to nucleic acid arrays, some examples of which are described in WO 95/11995. One form of such arrays is described in the Examples section in connection with de novo identification of polymorphisms. The same array or a different array can be used for analysis of characterized polymorphisms. WO 95/11995 also describes subarrays that are optimized for detection of a variant form of a precharacterized polymorphism. Such a subarray contains probes designed to be complementary to a second reference sequence, which is an allelic variant of the first reference sequence. The second group of probes is designed by the same principles as described in the Examples, except that the probes exhibit complementarity to the second reference sequence. The inclusion of a second group (or further groups) can be particularly useful for analyzing short subsequences of the primary reference sequence in which multiple mutations are expected to occur within a short distance commensurate with the length of the probes (e.g., two or more mutations within 9 to 21 bases) .
3. Allele-Specific Primers
An allele-specific primer hybridizes to a site on target DNA overlapping a polymorphism and only primes amplification of an allelic form to which the primer exhibits perfect complementarity. See Gibbs, Nucleic Acid Res. 17, 2427-2448 (1989). This primer is used in conjunction with a second primer which hybridizes at a distal site. Amplification proceeds from the two primers, resulting in a detectable product which indicates the particular allelic form is present. A control is usually performed with a second pair of primers, one of which shows a single base mismatch at the polymorphic site and the other of which exhibits perfect complementarity to a distal site. The single-base mismatch prevents amplification and no detectable product is formed. The method works best when the mismatch is included in the 3'-most position of the oligonucleotide aligned with the polymorphism because this position is most destabilizing to elongation from the primer (see, e.g., WO 93/22456).
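As a minimal sketch of the 3'-terminal design described above, the following builds one forward primer per allele so that the polymorphic base is the 3'-most base of the primer. The function name, the default primer length, and the example sequence are hypothetical; real primer design would also weigh melting temperature and secondary structure, which this sketch ignores.

```python
def allele_specific_primers(left_flank, alleles, primer_len=20):
    """Return {allele: primer} with the polymorphic base at the 3' end.

    left_flank is the sequence immediately 5' of the polymorphic site on
    the strand being read; alleles is an iterable of alternative bases.
    """
    if primer_len < 2:
        raise ValueError("primer_len must be at least 2")
    if len(left_flank) < primer_len - 1:
        raise ValueError("not enough 5' flanking sequence")
    stem = left_flank[-(primer_len - 1):]   # shared 5' portion of every primer
    return {allele: stem + allele for allele in alleles}


if __name__ == "__main__":
    # Hypothetical flank; each primer matches one allele at its 3' end.
    print(allele_specific_primers("ACGTACGTAC", "CG", primer_len=5))
```

Each primer is paired with the same distal primer; a product is expected only from the primer whose 3' base matches the allele present.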
4. Direct-Sequencing
The direct analysis of the sequence of polymorphisms of the present invention can be accomplished using either the dideoxy chain termination method or the Maxam Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual, (Acad. Press, 1988)).
5. Denaturing Gradient Gel Electrophoresis
Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, (W.H. Freeman and Co, New York, 1992), Chapter 7.
6. Single-Strand Conformation Polymorphism Analysis
Alleles of target sequences can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single stranded PCR products, as described in Orita et al., Proc. Natl. Acad. Sci. 86, 2766-2770 (1989). Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single stranded amplification products. Single-stranded nucleic acids may refold or form secondary structures which are partially dependent on the base sequence. The different electrophoretic mobilities of single-stranded amplification products can be related to base-sequence differences between alleles of target sequences.
III. Methods of Use
After determining polymorphic form(s) present in an individual at one or more polymorphic sites, this information can be used in a number of methods.
A. Forensics
Determination of which polymorphic forms occupy a set of polymorphic sites in an individual identifies a set of polymorphic forms that distinguishes the individual. See generally National Research Council, The Evaluation of Forensic DNA Evidence (Eds. Pollard et al., National Academy Press, DC, 1996). The more sites that are analyzed, the lower the probability that the set of polymorphic forms in one individual is the same as that in an unrelated individual. Preferably, if multiple sites are analyzed, the sites are unlinked. Thus, polymorphisms of the invention are often used in conjunction with polymorphisms in distal genes. Preferred polymorphisms for use in forensics are biallelic because the population frequencies of two polymorphic forms can usually be determined with greater accuracy than those of multiple polymorphic forms at multi-allelic loci.
The capacity to identify a distinguishing or unique set of forensic markers in an individual is useful for forensic analysis. For example, one can determine whether a blood sample from a suspect matches a blood or other tissue sample from a crime scene by determining whether the set of polymorphic forms occupying selected polymorphic sites is the same in the suspect and the sample. If the set of polymorphic markers does not match between a suspect and a sample, it can be concluded (barring experimental error) that the suspect was not the source of the sample. If the set of markers does match, one can conclude that the DNA from the suspect is consistent with that found at the crime scene. If frequencies of the polymorphic forms at the loci tested have been determined (e.g., by analysis of a suitable population of individuals), one can perform a statistical analysis to determine the probability that a match of suspect and crime scene sample would occur by chance.
p(ID) is the probability that two random individuals have the same polymorphic or allelic form at a given polymorphic site. In biallelic loci, four genotypes are possible: AA, AB, BA, and BB. If alleles A and B occur in a haploid genome of the organism with frequencies x and y, the probability of each genotype in a diploid organism is
(see WO 95/12607):
Homozygote: $p(AA) = x^2$
Homozygote: $p(BB) = y^2 = (1-x)^2$
Single heterozygote: $p(AB) = p(BA) = xy = x(1-x)$
Both heterozygotes: $p(AB+BA) = 2xy = 2x(1-x)$
The probability of identity at one locus (i.e., the probability that two individuals, picked at random from a population, will have identical polymorphic forms at a given locus) is given by the equation: $p(\text{ID}) = (x^2)^2 + (2xy)^2 + (y^2)^2$.
These calculations can be extended for any number of polymorphic forms at a given locus. For example, the probability of identity p(ID) for a 3-allele system where the alleles have the frequencies in the population of x, y and z, respectively, is equal to the sum of the squares of the genotype frequencies: $p(\text{ID}) = x^4 + (2xy)^2 + (2yz)^2 + (2xz)^2 + z^4 + y^4$
In a locus of n alleles, the appropriate binomial expansion is used to calculate p(ID) and p(exc) .
The cumulative probability of identity (cum p(ID)) for each of multiple unlinked loci is determined by multiplying the probabilities provided by each locus: $\text{cum } p(\text{ID}) = p(\text{ID}_1)\, p(\text{ID}_2)\, p(\text{ID}_3) \cdots p(\text{ID}_n)$. The cumulative probability of non-identity for n loci (i.e., the probability that two random individuals will be different at 1 or more loci) is given by the equation: $\text{cum } p(\text{nonID}) = 1 - \text{cum } p(\text{ID})$. If several polymorphic loci are tested, the cumulative probability of non-identity for random individuals becomes very high (e.g., one billion to one). Such probabilities can be taken into account together with other evidence in determining the guilt or innocence of the suspect.
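The identity calculations above can be checked numerically with a small sketch. The function names are hypothetical, and the genotype frequencies follow the forms given in the text (x^2, 2xy, y^2), which presuppose random mating; the allele frequencies in the usage lines are made-up values, not data from the Table.

```python
from itertools import combinations
from math import prod


def p_identity(allele_freqs):
    """Probability that two random individuals share a genotype at one locus.

    Computed as the sum of squared genotype frequencies: f**2 for each
    homozygote and 2*f*g for each unordered heterozygote.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygotes = sum((f ** 2) ** 2 for f in allele_freqs)
    heterozygotes = sum((2 * f * g) ** 2 for f, g in combinations(allele_freqs, 2))
    return homozygotes + heterozygotes


def cumulative_p_identity(loci):
    """Product of per-locus p(ID) values; valid only for unlinked loci."""
    return prod(p_identity(freqs) for freqs in loci)


def cumulative_p_non_identity(loci):
    """1 - cum p(ID): chance two random individuals differ at one or more loci."""
    return 1.0 - cumulative_p_identity(loci)


if __name__ == "__main__":
    print(p_identity([0.5, 0.5]))                   # one biallelic locus
    print(cumulative_p_identity([[0.5, 0.5]] * 20)) # twenty such loci
```

For a biallelic locus, p(ID) is smallest when both alleles have frequency 0.5, so loci with balanced allele frequencies contribute the most discriminating power per site.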
B. Paternity Testing
The object of paternity testing is usually to determine whether a male is the father of a child. In most cases, the mother of the child is known and thus, the mother's contribution to the child's genotype can be traced. Paternity testing investigates whether the part of the child's genotype not attributable to the mother is consistent with that of the putative father. Paternity testing can be performed by analyzing sets of polymorphisms in the putative father and the child. If the set of polymorphisms in the child attributable to the father does not match the set of polymorphisms of the putative father, it can be concluded, barring experimental error, that the putative father is not the real father. If the set of polymorphisms in the child attributable to the father does match the set of polymorphisms of the putative father, a statistical calculation can be performed to determine the probability of coincidental match.
The probability of parentage exclusion (representing the probability that a random male will have a polymorphic form at a given polymorphic site that makes him incompatible as the father) is given by the equation (see WO 95/12607): $p(\text{exc}) = xy(1-xy)$, where x and y are the population frequencies of alleles A and B of a biallelic polymorphic site.
(At a triallelic site, $p(\text{exc}) = xy(1-xy) + yz(1-yz) + xz(1-xz) + 3xyz(1-xyz)$, where x, y and z are the respective population frequencies of alleles A, B and C.)
The probability of non-exclusion is $p(\text{non-exc}) = 1 - p(\text{exc})$.
The cumulative probability of non-exclusion (representing the value obtained when n loci are used) is thus: $\text{cum } p(\text{non-exc}) = p(\text{non-exc}_1)\, p(\text{non-exc}_2)\, p(\text{non-exc}_3) \cdots p(\text{non-exc}_n)$
The cumulative probability of exclusion for n loci (representing the probability that a random male will be excluded) is: $\text{cum } p(\text{exc}) = 1 - \text{cum } p(\text{non-exc})$. If several polymorphic loci are included in the analysis, the cumulative probability of exclusion of a random male is very high. This probability can be taken into account in assessing the liability of a putative father whose polymorphic marker set matches the child's polymorphic marker set attributable to his/her father.
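The exclusion formulas above lend themselves to a short numerical sketch for biallelic sites. The function names and the allele frequencies in the usage lines are hypothetical; the per-locus formula is the one quoted in the text, and the multiplication across loci assumes the loci are unlinked.

```python
def p_exclusion_biallelic(x):
    """p(exc) = xy(1 - xy) for a biallelic site with allele frequencies x and y = 1 - x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    xy = x * (1.0 - x)
    return xy * (1.0 - xy)


def cumulative_p_exclusion(freqs):
    """cum p(exc) = 1 - product of per-locus p(non-exc) over unlinked loci."""
    cum_non_exclusion = 1.0
    for x in freqs:
        cum_non_exclusion *= 1.0 - p_exclusion_biallelic(x)
    return 1.0 - cum_non_exclusion


if __name__ == "__main__":
    print(p_exclusion_biallelic(0.5))           # single locus
    print(cumulative_p_exclusion([0.5] * 30))   # thirty loci with x = 0.5
```

Because a single biallelic site excludes a random male with probability of at most 0.1875, many sites must be combined before the cumulative exclusion probability approaches 1.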
C. Correlation of Polymorphisms with Phenotypic Traits
The polymorphisms of the invention may contribute to the phenotype of an organism in different ways. Some polymorphisms occur within a protein coding sequence and contribute to phenotype by affecting protein structure. The effect may be neutral, beneficial or detrimental, or both beneficial and detrimental, depending on the circumstances. For example, a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal. Other polymorphisms occur in noncoding regions but may exert phenotypic effects indirectly via influence on replication, transcription, and translation. A single polymorphism may affect more than one phenotypic trait. Likewise, a single phenotypic trait may be affected by polymorphisms in different genes. Further, some polymorphisms predispose an individual to a distinct mutation that is causally related to a certain phenotype.
Phenotypic traits include diseases that have known but hitherto unmapped genetic components (e.g., agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease, familial hypercholesterolemia, polycystic kidney disease, hereditary spherocytosis, von Willebrand's disease, tuberous sclerosis, hereditary hemorrhagic telangiectasia, familial colonic polyposis, Ehlers-Danlos syndrome, osteogenesis imperfecta, and acute intermittent porphyria). Phenotypic traits also include symptoms of, or susceptibility to, multifactorial diseases of which a component is or may be genetic, such as autoimmune diseases, inflammation, cancer, diseases of the nervous system, and infection by pathogenic microorganisms. Some examples of autoimmune diseases include rheumatoid arthritis, multiple sclerosis, diabetes (insulin-dependent and non-dependent), systemic lupus erythematosus and Graves' disease. Some examples of cancers include cancers of the bladder, brain, breast, colon, esophagus, kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach and uterus. Phenotypic traits also include characteristics such as longevity, appearance (e.g., baldness, obesity), strength, speed, endurance, fertility, and susceptibility or receptivity to particular drugs or therapeutic treatments.
Correlation is performed for a population of individuals who have been tested for the presence or absence of a phenotypic trait of interest and for polymorphic marker sets. To perform such analysis, the presence or absence of a set of polymorphisms (i.e., a polymorphic set) is determined for a set of the individuals, some of whom exhibit a particular trait, and some of whom lack the trait. The alleles of each polymorphism of the set are then reviewed to determine whether the presence or absence of a particular allele is associated with the trait of interest. Correlation can be performed by standard statistical methods such as a chi-squared test, and statistically significant correlations between polymorphic form(s) and phenotypic characteristics are noted. For example, it might be found that the presence of allele A1 at polymorphism A correlates with heart disease. As a further example, it might be found that the combined presence of allele A1 at polymorphism A and allele B1 at polymorphism B correlates with increased milk production of a farm animal.
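To make the chi-squared test mentioned above concrete, the following sketch tests one allele against one trait using a 2x2 contingency table. The function name, the table layout, and the counts in the usage line are hypothetical illustrations; the test is the standard Pearson chi-squared test with one degree of freedom and no continuity correction, which is one reasonable choice among several.

```python
from math import erfc, sqrt


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for a 2x2 allele-by-trait table.

    a: trait present, allele present    b: trait present, allele absent
    c: trait absent,  allele present    d: trait absent,  allele absent
    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain observations")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom the upper tail is erfc(sqrt(statistic / 2)).
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Made-up counts: 30 of 50 affected and 10 of 50 unaffected carry the allele.
    print(chi_squared_2x2(30, 20, 10, 40))
```

When many polymorphisms are screened against the same trait, the significance threshold would normally need adjusting for multiple comparisons; that step is outside this sketch.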
Such correlations can be exploited in several ways. In the case of a strong correlation between a set of one or more polymorphic forms and a disease for which treatment is available, detection of the polymorphic form set in a human or animal patient may justify immediate administration of treatment, or at least the institution of regular monitoring of the patient. Detection of a polymorphic form correlated with serious disease in a couple contemplating a family may also be valuable to the couple in their reproductive decisions. For example, the female partner might elect to undergo in vitro fertilization to avoid the possibility of transmitting such a polymorphism from her husband to her offspring. In the case of a weaker, but still statistically significant correlation between a polymorphic set and human disease, immediate therapeutic intervention or monitoring may not be justified. Nevertheless, the patient can be motivated to begin simple life-style changes (e.g., diet, exercise) that can be accomplished at little cost to the patient but confer potential benefits in reducing the risk of conditions to which the patient may have increased susceptibility by virtue of variant alleles. Identification of a polymorphic set in a patient correlated with enhanced receptiveness to one of several treatment regimes for a disease indicates that this treatment regime should be followed.
For animals and plants, correlations between characteristics and phenotype are useful for breeding for desired characteristics. For example, Beitz et al., US 5,292,639 discuss use of bovine mitochondrial polymorphisms in a breeding program to improve milk production in cows. To evaluate the effect of mtDNA D-loop sequence polymorphism on milk production, each cow was assigned a value of 1 if variant or 0 if wildtype with respect to a prototypical mitochondrial DNA sequence at each of 17 locations considered. Each production trait was analyzed individually with the following animal model:
$Y_{ijkpn} = \mu + YS_i + P_j + X_k + \beta_1 + \dots + \beta_{17} + PE_n + a_n + e_p$
where $Y_{ijkpn}$ is the milk, fat, fat percentage, SNF, SNF percentage, energy concentration, or lactation energy record; $\mu$ is an overall mean; $YS_i$ is the effect common to all cows calving in year-season; $X_k$ is the effect common to cows in either the high or average selection line; $\beta_1$ to $\beta_{17}$ are the binomial regressions of production record on mtDNA D-loop sequence polymorphisms; $PE_n$ is permanent environmental effect common to all records of cow n; $a_n$ is effect of animal n and is composed of the additive genetic contribution of sire and dam breeding values and a Mendelian sampling effect; and $e_p$ is a random residual. It was found that eleven of seventeen polymorphisms tested influenced at least one production trait. Bovines having the best polymorphic forms for milk production at these eleven loci are used as parents for breeding the next generation of the herd.
D. Genetic Mapping of Phenotypic Traits
The previous section concerns identifying correlations between phenotypic traits and polymorphisms that directly or indirectly contribute to those traits. The present section describes identification of a physical linkage between a genetic locus associated with a trait of interest and polymorphic markers that are not associated with the trait, but are in physical proximity with the genetic locus responsible for the trait and co-segregate with it. Such analysis is useful for mapping a genetic locus associated with a phenotypic trait to a chromosomal position, and thereby cloning gene(s) responsible for the trait. See Lander et al., Proc. Natl. Acad. Sci. (USA) 83, 7353-7357 (1986); Lander et al., Proc. Natl. Acad. Sci. (USA) 84, 2363-2367 (1987); Donis-Keller et al., Cell 51, 319-337 (1987); Lander et al., Genetics 121, 185-199 (1989). Genes localized by linkage can be cloned by a process known as positional cloning. See Wainwright, Med. J. Australia 159, 170-174 (1993); Collins, Nature Genetics 1, 3-6 (1992).
Linkage studies are typically performed on members of a family. Available members of the family are characterized for the presence or absence of a phenotypic trait and for a set of polymorphic markers. The distribution of polymorphic markers in an informative meiosis is then analyzed to determine which polymorphic markers co-segregate with a phenotypic trait. See, e.g., Kerem et al., Science 245, 1073-1080 (1989); Monaco et al., Nature 316, 842 (1985); Yamoka et al., Neurology 40, 222-226 (1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).
Linkage is analyzed by calculation of lod (log of the odds) values. A lod value is the relative likelihood of obtaining observed segregation data for a marker and a genetic locus when the two are located at a recombination fraction θ, versus the situation in which the two are not linked, and thus segregating independently (Thompson & Thompson, Genetics in Medicine (5th ed, W.B. Saunders Company, Philadelphia, 1991); Strachan, "Mapping the human genome" in The Human Genome (BIOS Scientific Publishers Ltd, Oxford), Chapter 4). A series of likelihood ratios are calculated at various recombination fractions (θ), ranging from θ = 0.0 (coincident loci) to θ = 0.50 (unlinked). Thus, the likelihood at a given value of θ is: probability of data if loci linked at θ to probability of data if loci unlinked. The computed likelihoods are usually expressed as the log10 of this ratio (i.e., a lod score). For example, a lod score of 3 indicates 1000:1 odds against an apparent observed linkage being a coincidence. The use of logarithms allows data collected from different families to be combined by simple addition. Computer programs are available for the calculation of lod scores for differing values of θ (e.g., LIPED, MLINK (Lathrop, Proc. Natl. Acad. Sci. (USA) 81, 3443-3446 (1984))). For any particular lod score, a recombination fraction may be determined from mathematical tables. See Smith et al., Mathematical tables for research workers in human genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 (1968). The value of θ at which the lod score is the highest is considered to be the best estimate of the recombination fraction.
Positive lod score values suggest that the two loci are linked, whereas negative values suggest that linkage is less likely (at that value of θ ) than the possibility that the two loci are unlinked. By convention, a combined lod score of +3 or greater (equivalent to greater than 1000:1 odds in favor of linkage) is considered definitive evidence that two loci are linked. Similarly, by convention, a negative lod score of -2 or less is taken as definitive evidence against linkage of the two loci being compared. Negative linkage data are useful in excluding a chromosome or a segment thereof from consideration. The search focuses on the remaining non-excluded chromosomal locations .
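The lod calculation and the conventional thresholds described above can be sketched for the simplest case. This is an illustration only: it assumes phase-known, fully informative meioses that can each be scored as recombinant or non-recombinant, which is far simpler than what pedigree programs such as LIPED or MLINK handle. The function names and the counts in the usage lines are hypothetical.

```python
from math import log10


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for k recombinants among n scored meioses."""
    k, n = recombinants, meioses
    if not 0 <= k <= n:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0:
        # Any observed recombinant is impossible when theta is exactly 0.
        return n * log10(2.0) if k == 0 else float("-inf")
    return k * log10(theta) + (n - k) * log10(1.0 - theta) - n * log10(0.5)


def best_theta(recombinants, meioses, step=0.01):
    """Grid search for the theta in [0, 0.5] that maximizes the lod score."""
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


def combined_lod(families, theta):
    """Sum lod scores over families, each given as (recombinants, meioses)."""
    return sum(lod_score(k, n, theta) for k, n in families)


if __name__ == "__main__":
    theta_hat = best_theta(2, 20)
    print(theta_hat, lod_score(2, 20, theta_hat))
    print(combined_lod([(1, 10), (0, 8), (2, 12)], 0.1) >= 3.0)
```

Summing per-family scores at a common θ mirrors the additive property of logarithms noted in the text; the combined score is then compared with the +3 and -2 conventions.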
IV. Modified Polypeptides and Gene Sequences
The invention further provides variant forms of nucleic acids and corresponding proteins. The nucleic acids comprise one of the sequences described in the Table, column 8, in which the polymorphic position is occupied by one of the alternative bases for that position. Some nucleic acids encode full-length variant forms of proteins. Similarly, variant proteins have the prototypical amino acid sequences encoded by nucleic acid sequences shown in the Table, column 8, (read so as to be in-frame with the full-length coding sequence of which it is a component) except at an amino acid encoded by a codon including one of the polymorphic positions shown in the Table. That position is occupied by the amino acid coded by the corresponding codon in any of the alternative forms shown in the Table.
Variant genes can be expressed in an expression vector in which a variant gene is operably linked to a native or other promoter. Usually, the promoter is a eukaryotic promoter for expression in a mammalian cell. The transcription regulation sequences typically include a heterologous promoter and optionally an enhancer which is recognized by the host. The selection of an appropriate promoter, for example trp, lac, phage promoters, glycolytic enzyme promoters and tRNA promoters, depends on the host selected. Commercially available expression vectors can be used. Vectors can include host-recognized replication systems, amplifiable genes, selectable markers, host sequences useful for insertion into the host genome, and the like.
The means of introducing the expression construct into a host cell varies depending upon the particular construction and the target host. Suitable means include fusion, conjugation, transfection, transduction, electroporation or injection, as described in Sambrook, supra. A wide variety of host cells can be employed for expression of the variant gene, both prokaryotic and eukaryotic. Suitable host cells include bacteria such as E. coli, yeast, filamentous fungi, insect cells, mammalian cells, typically immortalized, e.g., mouse, CHO, human and monkey cell lines and derivatives thereof. Preferred host cells are able to process the variant gene product to produce an appropriate mature polypeptide. Processing includes glycosylation, ubiquitination, disulfide bond formation, general post-translational modification, and the like. The protein may be isolated by conventional means of protein biochemistry and purification to obtain a substantially pure product, i.e., 80, 95 or 99% free of cell component contaminants, as described in Jacoby, Methods in Enzymology Volume 104, Academic Press, New York (1984); Scopes, Protein Purification, Principles and Practice, 2nd Edition, Springer-Verlag, New York (1987); and Deutscher (ed), Guide to Protein Purification, Methods in Enzymology, Vol. 182 (1990). If the protein is secreted, it can be isolated from the supernatant in which the host cell is grown. If not secreted, the protein can be isolated from a lysate of the host cells.
The invention further provides transgenic nonhuman animals capable of expressing an exogenous variant gene and/or having one or both alleles of an endogenous variant gene inactivated. Expression of an exogenous variant gene is usually achieved by operably linking the gene to a promoter and optionally an enhancer, and microinjecting the construct into a zygote. See Hogan et al., "Manipulating the Mouse Embryo, A Laboratory Manual," Cold Spring Harbor Laboratory. Inactivation of endogenous variant genes can be achieved by forming a transgene in which a cloned variant gene is inactivated by insertion of a positive selection marker. See Capecchi, Science 244, 1288-1292 (1989). The transgene is then introduced into an embryonic stem cell, where it undergoes homologous recombination with an endogenous variant gene. Mice and other rodents are preferred animals. Such animals provide useful drug screening systems.
In addition to substantially full-length polypeptides expressed by variant genes, the present invention includes biologically active fragments of the polypeptides, or analogs thereof, including organic molecules which simulate the interactions of the peptides. Biologically active fragments include any portion of the full-length polypeptide which confers a biological function on the variant gene product, including ligand binding, and antibody binding. Ligand binding includes binding by nucleic acids, proteins or polypeptides, small biologically active molecules, or large cellular structures.
Polyclonal and/or monoclonal antibodies that specifically bind to variant gene products but not to corresponding prototypical gene products are also provided. Antibodies can be made by injecting mice or other animals with the variant gene product or synthetic peptide fragments thereof. Monoclonal antibodies are screened as described, for example, in Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1988); Goding, Monoclonal antibodies, Principles and Practice (2d ed.) Academic Press, New York (1986). Monoclonal antibodies are tested for specific immunoreactivity with a variant gene product and lack of immunoreactivity to the corresponding prototypical gene product. These antibodies are useful in diagnostic assays for detection of the variant form, or as an active ingredient in a pharmaceutical composition.
V. Kits
The invention further provides kits comprising at least one allele-specific oligonucleotide as described above. Often, the kits contain one or more pairs of allele-specific oligonucleotides hybridizing to different forms of a polymorphism. In some kits, the allele-specific oligonucleotides are provided immobilized to a substrate. For example, the same substrate can comprise allele-specific oligonucleotide probes for detecting at least 10, 100 or all of the polymorphisms shown in the Table. Optional additional components of the kit include, for example, restriction enzymes, reverse-transcriptase or polymerase, the substrate nucleoside triphosphates, means used to label (for example, an avidin-enzyme conjugate and enzyme substrate and chromogen if the label is biotin), and the appropriate buffers for reverse transcription, PCR, or hybridization reactions. Usually, the kit also contains instructions for carrying out the methods.
The following Examples are offered for the purpose of illustrating the present invention and are not to be construed to limit the scope of this invention. The teachings of all references cited herein are hereby incorporated herein by reference.
EXAMPLES
The polymorphisms shown in the Table were identified by resequencing of target sequences from three to ten unrelated individuals of diverse ethnic and geographic backgrounds by hybridization to probes immobilized to microfabricated arrays or conventional sequencing. The strategy and principles for design and use of such arrays are generally described in WO 95/11995. The strategy provides arrays of probes for analysis of target sequences showing a high degree of sequence identity to the reference sequences of the fragments shown in the Table, column 1. The reference sequences were sequence-tagged sites (STSs) developed in the course of the Human Genome Project (see, e.g., Science 270, 1945-1954 (1995); Nature 380, 152-154 (1996)). Most STSs ranged from 100 bp to 300 bp in size.
A typical probe array used in this analysis has two groups of four sets of probes that respectively tile both strands of a reference sequence. A first probe set comprises a plurality of probes exhibiting perfect complementarity with one of the reference sequences. Each probe in the first probe set has an interrogation position that corresponds to a nucleotide in the reference sequence. That is, the interrogation position is aligned with the corresponding nucleotide in the reference sequence, when the probe and reference sequence are aligned to maximize complementarity between the two. For each probe in the first set, there are three corresponding probes from three additional probe sets. Thus, there are four probes corresponding to each nucleotide in the reference sequence. The probes from the three additional probe sets are identical to the corresponding probe from the first probe set except at the interrogation position, which occurs in the same position in each of the four corresponding probes from the four probe sets, and is occupied by a different nucleotide in the four probe sets. In the present analysis, probes were 25 nucleotides long.
Arrays tiled for multiple different reference sequences were included on the same substrate.
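The four-probe-set tiling described above can be sketched for one strand of a reference sequence. The sketch assumes that the interrogation position sits at the center of an odd-length probe; the text states only that probes were 25 nucleotides long, so the centering, the function name, and the example sequence are all hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tile_probes(reference, probe_len=25):
    """Return {reference_index: {base: probe}} for one strand of a reference.

    For each reference position with enough flanking sequence, four probes
    are produced. One is perfectly complementary to the reference window;
    the other three differ from it only at the central interrogation position.
    """
    if probe_len % 2 == 0:
        raise ValueError("this sketch expects an odd probe length")
    half = probe_len // 2
    tiles = {}
    for i in range(half, len(reference) - half):
        window = reference[i - half:i + half + 1]
        perfect = window.translate(_COMPLEMENT)[::-1]   # reverse complement
        tiles[i] = {
            base: perfect[:half] + base + perfect[half + 1:]
            for base in "ACGT"
        }
    return tiles


if __name__ == "__main__":
    # Short made-up reference and probe length, chosen only for readability.
    for index, probes in tile_probes("ACGTACGTA", probe_len=5).items():
        print(index, probes)
```

Tiling the opposite strand would repeat the same call on the reverse complement of the reference, giving the second group of four probe sets.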
Multiple target sequences from an individual were amplified from human genomic DNA using primers for the fragments indicated in the listed Web sites. The amplified target sequences were fluorescently labelled during or after PCR. The labelled target sequences were hybridized with a substrate bearing immobilized arrays of probes. The amount of label bound to probes was measured. Analysis of the pattern of label revealed the nature and position of differences between the target and reference sequence. For example, comparison of the intensities of four corresponding probes reveals the identity of a corresponding nucleotide in the target sequences aligned with the interrogation position of the probes. The corresponding nucleotide is the complement of the nucleotide occupying the interrogation position of the probe showing the highest intensity (see WO 95/11995). The existence of a polymorphism is also manifested by differences in normalized hybridization intensities of probes flanking the polymorphism when the probes hybridized to corresponding targets from different individuals. For example, relative loss of hybridization intensity in a "footprint" of probes flanking a polymorphism signals a difference between the target and reference (i.e., a polymorphism) (see EP 717,113). Additionally, hybridization intensities for corresponding targets from different individuals can be classified into groups or clusters suggested by the data, not defined a priori, such that isolates in a given cluster tend to be similar and isolates in different clusters tend to be dissimilar. Hybridizations to samples from different individuals were performed separately. The Table summarizes the data obtained for target sequences in comparison with a reference sequence for the individuals tested.
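The base-calling rule stated above (the target nucleotide is the complement of the interrogation base on the brightest of four corresponding probes) can be written as a short sketch. The function name, the `min_ratio` ambiguity threshold, and the intensity values are hypothetical additions for illustration and are not taken from WO 95/11995.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities, min_ratio=1.2):
    """Call the target nucleotide from four corresponding probe intensities.

    intensities maps each probe's interrogation base to its measured
    intensity. Returns 'N' when the brightest probe is not at least
    min_ratio times brighter than the runner-up (an assumed threshold).
    """
    if set(intensities) != set("ACGT"):
        raise ValueError("expected intensities for exactly A, C, G and T")
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if intensities[second] > 0 and intensities[best] / intensities[second] < min_ratio:
        return "N"
    return _COMPLEMENT[best]


if __name__ == "__main__":
    # Made-up intensities: the probe with G at the interrogation position wins.
    print(call_base({"A": 120.0, "C": 95.0, "G": 2300.0, "T": 110.0}))
```

A heterozygous site would typically show two bright probes of similar intensity, which this sketch reports as 'N' rather than attempting to resolve.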
From the foregoing, it is apparent that the invention includes a number of general uses that can be expressed concisely as follows. The invention provides for the use of any of the nucleic acid segments described above in the diagnosis or monitoring of diseases, such as cancer, inflammation, heart disease, diseases of the CNS, and susceptibility to infection by microorganisms. The invention further provides for the use of any of the nucleic acid segments in the manufacture of a medicament for the treatment or prophylaxis of such diseases. The invention further provides for the use of any of the DNA segments as a pharmaceutical. All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference.
Figure imgf000034_0001
Figure imgf000035_0001
Figure imgf000036_0001
Figure imgf000037_0001
Figure imgf000038_0001
Figure imgf000039_0001
Figure imgf000039_0002
ATTGCACTGAAG I I I I I GAAATACCTTTGTAGTTACTCAAGCAGTTACTCCCTACACTGATGCAAGGA TTACAGAAACTGATGCCAAGGGGtC/G]TGAGTGAGTTCAACTACATGTTCTGGGGGCCCGGAGATAG ATGACTTTGCAGATGGAAAGAGGTGAAAATGAAGAAGGAAGCTGTGTTGAAACAGAAAAATAAGTC
WI-7718C 91 G AAAAGGAACAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATTAAT
ATTGCACTGAAG I I I I I GAAATACCTTTGTAGTTACTCAAGCAGTTACTCCCTACACTGATGCAAGGA TTACAGAAACTGATGCCAAGGGGCTGAGTGAGTTCAACTACATGTTCTGGGGGCCCGGAGATAGATG ACTTTGCAGATGGAAAGAGGTGAAAATGAAGAAGGAAGCTGTGTTGAAACAGAAAAATAAGTCAAA
Wl-7718b 248 AGGAACAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATT[A/G1AT
ATrGCACTG GTTTTTGAAATACCTTTGTAGTTACTCAAGC[A/C,ηGTTACTCCCTACACTGATGC AAGGATTACAGAAACTGATGCCAAGGGGCTGAGTGAGTTCAACTACATGTTCTGGGGGCCCGGAGAT AGATGACTTTGCAGATGGAMGAGGTGAAAATGAAGAAGGAAGCTGTGTTGAAACAGAAAAATAAG
Wl-7718a 42 TCAAAAGGAACAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATTA
AGGGAATTGTGTTGCTCCTGGAGGAAGCCCAGGCATCATTAAACAAGCCAGTAGGTCACCTGGCTTC CGTGGACCAATTCATCTTTCAGACAAGCTTTA[G/C]AGAAATGGACTCAGGGAAGAGACTCACATGC TTTGGTTAGTATCTGTGTTTCCGGTGGGTGTAATAGGGGATTAGCCCCAGAAGGGACTGAGCTAAACA
Wl-7227d 99 GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG Ul
AGGGAATTGTGTTGCTCCTGGAGGAAGCCCAGGCATCATTAAACAAGCCAGTAGGTCACCTGGCTTC CGTGGACCAATTCATCTTTCAGACAAGCTTTAGAGAAATGGACTCAGGGAAGAGACTCACATGCTTT GGTTAGTATCTGTGTTTCCGGTGGGTGTAATAGGGGATTAGCCCCAGAAGGGACTGAGCTAAACAGTG
WI-7227C 291 TTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATGCAAT
AGGGAATTGTGTTGCTCCTGGAGGAAGCCCAGGCATCATTAAACAAGCCAGTAGGTCACCTGGCTTC CGTGGACCAATTCATCTTTCAGACAA[G/ηCTTTAGAGAAATGGACTCAGGGAAGAGACTCACATGC TTTGGTTAGTATCTGTGTTTCCGGTGGGTGTAATAGGGGATTAGCCCCAGAAGGGACTGAGCTAAACA
Wl-7227b 93 GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG
AGGGAATTGTGTTGCTCCTGGAGG[A G]AGCCCAGGCATCATTAAACAAGCCAGTAGGTCACCTGGC TTCCGTGGACCAATTCATCTTTCAGACAAGCTTTAGAGAAATGGACTCAGGGAAGAGACTCACATGC TTTGGTTAGTATCTGTGTTTCCGGTGGGTGTAATAGGGGATTAGCCCCAGAAGGGACTGAGCTAAACA
Wl-7227a 24 G GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG j CCACAATGCCTCTCCCACGATGTCAAGGACTCCTGTCTGTCCTGGAGGTGGGAGACAAGGAACCTCCG | AAGAGGAAGCAAGAAAGCCGTACTGTCTATGTTGTGATCCTTCATCGAACAAACTGATGCGAAAACT |TGAATCTGTTACTGAAATGAGGAGAGAAGGACATGTGCTATTGAACTGAGCCAAACACACTGTAAAT
Wl-7310b 234>A ATCCACAGACTCCCTCCCCTGCCCCCATCCCAfA/CIATGATCTTGAGATTTC
Figure imgf000041_0001
GAAGCAACCAGAAAGTATCTTTATCCCCATCTAGATTATGTCTGGGTTCTTCCAGACTCCTACGATTA AATTGTATGCATGTGAACAACTGATGAGGTACTTAGATCTCAGTGCTTTGCAGAAAGAAAAG[T/C]C GTCTACCATTTTCACCAAATTTCGTAGTACAATTTAAGTATCTCTTGTTATCTCCCCTAGGAGTCTAA
Wl-1 95b 1 30 AGTGAGCTGGGGAAGGCAGGATTT
GAAGCAACCAGAAAGTATCTTTATCCCCATCTAGATTATGTCTGGGT[T/C]CTTCCAGACTCCTACGA TTAAATTGTATGCATGTGAACAACTGATGAGGTACTTAGATCTCAGTGCTTTGCAGAAAGAAAAGTC GTCTACCATTTTCACCAAATTTCGTAGTACAATTTAAGTATCTCTTGTTATCTCCCCTAGGAGTCTAA
Wl-1795a 47 AGTGAGCTGGGGAAGGCAGGATTT
CACACAATTTGCAAACACTTCAAAGTGAACGCCCGACATCATCAGCCCGTTAACGTCCAGGCCATGT CCCACATAGAGAACGCTTTACTTCCACGTCTCTCCATACGTAGGTCCTGGTCTCCTATCACATTGCCA C[G/A]TAGCCCTCCCTTCCCTTCCCCCTACAGGCCCTCTTCAGGGCCCCAGTCCCCCTCTGAGACTCCC
1 36 ATGGATCATTCCTGTTTCTGTATCAGGCAGTGATTTAACTCC I I I I I I GT
CACACAATTTGCAAACACTTCAAAGTGAACGCCCGACATCATCAGCCCGTTAACGTCCAGGCCATGT CCCACATAGAGAACGCTTTACTTCCACGTCTCTCCATACGTAGGTCCTGGTCTCCTATCACATTGCCA C[G/A]TAGCCCTCCCTTCCCTTCCCCCTACAGGCCCTCTTCAGGGCCCCAGTCCCCCTCTGAGACTCCC
1 36 ATGGATCATTCCTGTTTCTGTATCAGGCAGTGATTTAACTCC I I I I I I GT
CACACAATTTGCAAACACTTCAAAGTGAACGCCCGACATCATCAGCCCGTTAACGTCCAGGCCATGT CCCACATAGAGAACGCTTTACTTCCACGTCTCTCCATACGTAGGTCCTGGTCTCCTATCACATTGCCA CGTAGC[C/ηCTCCCTTCCCTTCCCCCTACAGGCCCTCTTCAGGGCCCCAGTCCCCCTCTGAGACTCCC
1 41 ATGGATCATTCCTGTTTCTGTATCAGGCAGTGATTTAACTCC I I I I I I GT
CACACAATTTGCAAACACTTCAAAGTGAACGCCCGACATCATCAGCCCGTTAACGTCCAGGCCATGT CCCACATAGAGAACGCTTTACTTCCACGTCTCTCCATACGTAGGTCCTG[G/CJTCTCCTATCACATTG CCACGTAGCCCTCCCTTCCCTTCCCCCTACAGGCCCTCTTCAGGGCCCCAGTCCCCCTCTGAGACTCCC
Figure imgf000042_0001
1 1 6 ATGGATCATTCCTGTTTCTGTATCAGGCAGTGATTTAACTCC I I I I I I GT
CTCTTATTTCTCTGGGCACTGCTTTCTTTGGGGGCAAACTTCCAGTATCACT[G/A]ATACTAATATAA AAACCCTGT GTCTGCTTGCATTTTCAAGATTCAATATATATCCAGATTGTTTTCCCAGCAAAGAA TTTTATTTCTCAAGATATAAAAAATMATATTTAATTTCAGTTTCCTCAAAAGGAATATGAAATT
WI-1126C 52 G TGTTAAAATGCAAATCCAGCTGTAAC I I I I I I GGACTTGTCTTTTATTTCTT
CTCTTATTTCTCTGGGCACTGCTTTCTTTGGGGGCAAACTTCCAGTATCACTGATACTAATATAAAAA CCCTGTMGTCTGCTTGCATTTTCAAGATTCAATATATATCCAGATTGTTTTCCCAGCAAAGAAAATT TTATTTCTCAAGATATAAMMTMATATTTMTTTCAGTTTCCTCMAAGGAATATGMATTTGTT
Wl-1126b 230 AAAATGCAAATCCAGCTGT CTTTTT[T/C|GGACTTGTCTTTTATTTCTT
Figure imgf000043_0001
Figure imgf000044_0001
CGAGCTTGGGATAAAGCAAGGGGACCTTGGC[G/A]CTCTCAGCTTTCCCTGCCACATCCAGCTTGTTG TCCCAATGAAATACTGAGATGCTGGGCTGTCTCTCCCTTCCAGGAATGCTGGGCCCCCAGCCTGGCCA GACMGMGACTGTCAGGMGGGTCGGAGTCTGTAAMCCAGCATACAGTTTGGCTTTTTTCACATT
Wl-7038a 31 G . GATCA I I I I l ATATGAAATAAAAAGATCCTGCATTTATGGTGTAGTTCTGA
ATACGCTTTCTGTCTGTCCCACAGTGGAACCAGCACCCAGGTGGCCAGGGTCGGGCTCCACACA[G η CCCTCAGCCCCTTCAGCTTTGCATGTGTCCATCGGTGACTCAGCACAGAGTTTTCCAACCTCATGTGA CAAAAATACAGATTCCCAGTCTCCTCTCCTGGATTTGGATCTAGCAAGACCAGAGACGGTCCTAGAA
Wl-3429b 64 TCCTGACTGTTAACAAGCACTCCAGGCAATTCTTAAGACCAAGCACGGAGC
ATACGCTTTCTGTCTGTCCCACAGTGGAACCAGCACCCAGGTGGCCAGGGTCGGGCTCCACA[C/ηAG CCCTCAGCCCCTTCAGCTTTGCATGTGTCCATCGGTGACTCAGCACAGAGTTTTCCAACCTCATGTGA CAAAAATACAGATTCCCAGTCTCCTCTCCTGGATTTGGATCTAGCAAGACCAGAGACGGTCCTAGAA
Wl-3429a 62 TCCTGACTGTTAACAAGCACTCCAGGCAATTCTTAAGACCAAGCACGGAGC
ATTTTAGGACAGTGAAAAAAAGGGATTTATAAATAAAATCTATGCCATCCAGGAGGTATGTGTCAGT GTCCAGAACATCCTAGATGAAGTGGCTTCCTTTGGCGAAAGGATAAAGAAGTGAGTGACGGTGACCT GTGAGCCCCATTCTTCT[G/A]TGGGATAAGGTGTCCATTTGTTTCTTGGAGGGTGAAATGCCACATTC
WI-6786C 1 51 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT 4-.
ATTTTAGGACAGTGAAAAAMGGGATTTATAAATAAAATCTATGCCATCCAGGAGGTATGTGTCAGT GTCCAGAACATCCTAGATGAAGTGGCTTCCTTTGGCGAAAGGAT[A/ηAAGAAGTGAGTGACGGTGA CCTGTGAGCCCCATTCTTCTGTGGGATAAGGTGTCCATTTGTTTCTTGGAGGGTGAAATGCCACATTC
Wl-6786b 1 1 1 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT
ATTTTAGGACAGTGAAAAAAAGGGATTTATAAATAAAATCTATGCCATCCAGGAGGTATGTGTCAGT GTCCAGAACATCCTAGATGAAGTGGCTTCCTTTGGCGAA[A/ηGGATAAAGAAGTGAGTGACGGTGA CCTGTGAGCCCCATTCTTCTGTGGGATAAGGTGTCCATTTGTTTCTTGGAGGGTGAAATGCCACATTC
Wl-6786a 1 06 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT
GGCTATTTGTAAATGCTTGGTTATTTGACTCCAAAATTGAATAAGTATTGGGGAAGAATCCCTCACCT ACTTCCAMTCCCTTACATATCMTTTTACACAAAGCCCCTAAACCTTCAGTTCCAATCACTCTGAAT TTCATATACCTCCATTATTAAATTCAATACATCATTGCAGAGAAAAGACAACGGTGCCAACTGGGTT
Wl-671 1 b 226 T TGGTTGGTGCCTGCACACCCACA[G ηTGGCAACTAAGTGTAATCTCTAAA
GGCTATTTGTA TGCTTGGTTATTTGACTCCAAAA[T/C]TGAATAAGTATTGGGGAAGAATCCCTC
ACCTACTTCCA TCCCTTACATATC TTTTACACAAAGCCCCTAAACCTTCAGTTCCAATCACTCT GAATTTCATATACCTCCATTATTAAATTCAATACATCATTGCAGAGAAAAGACAACGGTGCCAACTG
Wl-6711 a 36 T Ci - GGTTTGGTTGGTGCCTGCACACCCACAGTGGCAACTAAGTGTAATCTCTAAA
Figure imgf000046_0001
Figure imgf000046_0002
Figure imgf000047_0001
Figure imgf000048_0001
Figure imgf000049_0001
Figure imgf000050_0001
Figure imgf000051_0001
Figure imgf000052_0001
Figure imgf000053_0001
Figure imgf000054_0001
CCTCCTCTGAG GCAGTTCTCTGAAAGACMTGGATTGTGGAGCATACTGMGACTATTCCTAMTGGCTATTTGTGTTG
TTTGTGTTGGG ATTTTCTGAAT GGTGGTCMG[A G]CTATTCAGAAMTCTCAGAGGAGGACAMTGATAGTGCACTGCAGCCAGCTCG
WI-11909 78 G TGGTCMG_ AG GACTGGCTTGCAAGAGTC
TCCTGTAMGC
CATGAAGAGT CAATTTTATAT AAAAATACCATTTAGCATCMTTGCCCCMGTTTGGCAGGCATGMGAGTGGGCAGTTCAΓT/G]GTT
WI-1 1806 60 GGGCAGTTCA ACTAATAA TTATTAGTATATMAATTGGCTTTACAGGMGCATTATGG
CCCTAGTGMTACMCCTTTGTCCTGGAGAC[C/A]CCAGCTAGTCTMGAAMCTTCCTAGGCTGAG
WI-11946 31 CTCTCTTGGGMTCTMGATAMGMCTGAGATCCTGGGMGMGGGM
TGMGATCAG
ATCTCTGGTTT CAGCTGTGGTG ACAAAATTCACMGTACAACACTGCTTATTTTCTTGCTTGMGATCAGATCTCTGGTTTATTTM[T/
WI-11965 65 G ATTT MTGTTGAT G]ATCMCATTCACCACAGCTGMGGAAATTAMCTGMCCT
TGCCCTACTAC TGAGGAAATGT ACCTATTTTGMACTGCAGAAAGGGCAGGACAAMCAAATCACTTCATAGATTTTTCTGGGAMTAT
GCTTTTAAAA GTTACAGTATT TGCCCTACTACGCTTTTAAMAA[T/A]AATAAAAATACTGTAACACATTTCCTCATTTCTCTTACGA
WI-11027 90 A TTTATT ATACTTTC I I I I I GATATTGCAMTTCTATGGCATACACAGAGGCACCTCCTCMTGCCCTG
TTCTGCTGAAGATCACAAMCMTTTCMCCTCTGTGGTTCAAMTMTTTMGGATCTTGTACCTTT
GTGTTTATTTTCTGTTTCAACTAAGGA[C/ηAGACTTCAGMGGCATAGCTTCCCTTGTAACGTTTTT
WI-1 1049 95 AAACATCTTTTTCATTTGTAGGAAGGMCATTTCAAAAGCCCAA CΛ )
AAAAGGACAG TTTCCATCTTA CCAGATATCA TTTCATTTCTG CAACATTTATCAAACATGGTAGGGAAMGTTCTCACTCTGCACTATAAAAAGGACAGCCAGATATCA
WI-15488 69 AC TMC AC[CtT]GTTACAGAAATGAAATAAGATGGAAAATm AACAAATTG
MCAGTTAAT GAMCACATC GGCTGGTGAM TGCTCAATTTMTGTGATAATCTCCMCAGTTMTGAAACACATCCGTA[A/G]GTATGACATCATTT
WI-13654 49 CGT TGATGTCAT CACCAGCCAGCTACTTCATGTGGCAGAAMGGTMCCTTTTCCCCATTTTACAGACAAMCCAGT _
ATGAGACCCTGCTTTGMCGTTAMCGTTTTGGMTMTGGAAAAGGAGCTAGGACMTTCTTGCTT
Wl- TCMGTAAMTTGTGACTGAGCAGAAMTCAGCCAGCTATCTTGGGTGCAGAGAGGTACTCCMGTA 1 1 070b 1 3_5j C| C[C, ]GTGGGGGTTCTGATGACTTCCACGGTCACTGGGGATCCMCAGMGGGM
CAGAAMTCA ATGAGACCCTGCTTTGMCGTTAMCGTTTTGGMTMTGGAAMGGAGCTAGGACMTTCTTGCTT
Wl- GCCAGCTATCT TFGGAGTACCT TCMGTAAMTTGTGACTGAGCAGAAMTCAGCCAGCTATCTT[G/ηGGTGCAGAGAGGTACTCCM 1 1070a 1 1 0 | G T T CTCTGCACC GTACCGTGGGGGTTCTGATGACTTCCACGGTCACTGGGGATCCMCAGMGGGM
MTCTTTTATATTTCCAGCTGTTGAGACAGTATTTTTGAGGGCTGATGTTACCTCTAGCGGCGAMCC AGAGCCAGCTATTMGCAGCCAGMAGCTACAGTMTTGMTACATGACCATT[T/C]CTCTTTTAGC
WI-12020 1 21 T C -- ACGTTCTTTGTTCTCCTC
Figure imgf000056_0001
Figure imgf000057_0001
Figure imgf000058_0001
Figure imgf000059_0001
Figure imgf000060_0001
Figure imgf000061_0001
Figure imgf000062_0001
Figure imgf000063_0001
Figure imgf000063_0002
Figure imgf000064_0001
Figure imgf000065_0001
Figure imgf000065_0002
Figure imgf000066_0001
Figure imgf000067_0001
Figure imgf000067_0002
Figure imgf000068_0001
Figure imgf000069_0001
Figure imgf000070_0001
Figure imgf000071_0001
Figure imgf000072_0001
Figure imgf000073_0001
Figure imgf000074_0001
TGCTC I I I I I ATTTCACGTTTCACMCACACGCCGTG[G/ηTGGCACAGTCTACCAMGTGCCCGCAG CGCCACGCTTGGGCCGGMGGTCTCATTCTGTTCGTCTCTATGGACTGATTGMTTTGGGATGGCCAG CTCCAGMTGTTCCACGTGGGGGCACTCTGTGGGCAGAGAGGCTGAGCCCTTGCCCACACTGGCACCA
WI-9617 37 G MGAGGTTGCACGATGCAGCTTGCAGTGGGTCCMGCCGGGTGTGCTGTG
MTGCTGGAGAAMCATCMCATTGAGTTGACATTTGTTTTGCTGMGTATAGCTACCATCCACTAT CATGAATTTTTGTTTCATTACAMTGATAGAAMGCCAGATTCTCAAAATAAAG[T/G]ATMTTCTT TGTATTAAATAMTGTTTATAAATGTTTATGAAGCTCATTACATTATC I I I I I I AAMAAGTAAAAA
WI-9657 1 21 TTTTAGMCATATGACGCTTTTCATMTTMTGCTTTTGATATAGATTTGAGG
AAAAATTAAC CAGGGTCTTGCTCTGTCTCCCAGGCTAGAGTGAGGTGACACMTCMGACTCACAGTAGCCTCMCCT
Wl- CCTCCCMGTA CAGGTGTGGTG CCTATGCTCMGCCAGCCTCCCMGTAGCTGGGACTACAGGCATGT[G/C]ACACCACACCTGGTTM 131 19b 1 14ι G ' GCTGGGA T I I I l I I IM I I I I I l GTMAGATAGGGTCTCACTATGTTGCCCCGTCTCAAAAMCMACCMCTMC
CAGGGTCTTGCTCTGTCTCCCAGGCTAGAGTGAGGTGACACMTCMGACT[C/G]ACAGTAGCCTCA ACCTCCTATGCTCMGCCAGCCTCCCMGTAGCTGGGACTACAGGCATGTGACACCACACCTGGTTA
Wl- A l I I I I I I MI I I I I I GTAMGATAGGGTCTCACTATGTTGCCCCGTCTCAAAMACAMCCAACTAA 131 19a 51 C _
ACAGGAATCTGAAAGTTACCMGGCAATTTTCCCTTTTAGGATCATAMGACTACAGACTTMGCTT 3
TCATAMGAC TTAGAAATTTT TTTT[C/T]C I I I I I CCATATMTACACAAAATTTCTAAATATCCTTAAAAAAGAAAATATAAATAGT TACAGACTTA GTGTATTATAT TTCAGTATGTTATGTAGAGTCACATACTATGGCAAMATATTTTATTMTTGAGGGMTAGGCCMT
WI-13112 71 AGC I I I I I GGAAAMG
TGTTAACATTTTTATTGGTACGTGCTCTCAGTACM[C/A]AMCAGCATCAGTAGTGTACACTTTGAT
CAMGTGTACA MAMGGMTTTTTAGCTTAGTAGMMGMAGCCCAMGGTCAGAAGTATMTGMTATGTACAT
TGGTACGTGCT CTACTGATGCT CTTTATGGAMCTGTTTGTGTGACCATCTTTATCTTCCCCTGTGGATGAGATGTATGCACACACMGT
WI-12988 36 CTCAGTACM GTTT AAA
TGCTATTCATGACAGACACGTGAGACAMTATTCTTATTTTACAGATGGAMTAGACCCAGACATTA
CTMTAGTGG TTCAGTACTTTMCCACTAATAGTGGMCCCTGAGACTTTA[G/A]ATCTGCAMGGGGTTTAATAAT
Wl- MCCCTGAGA CATTATTAAAC GCMATATCACATATATTTCCA I I I I I AACACCATATTTAAGTTTTCCATTTTCTTMTAGAMATGA 13020a 1 08 GL CTTT CCCTTTGCAGA TAMAAATGTTTTCCCCMTAT
TGTATAAAMATCCMCTTGTTCCACMGTACATATGTCCTATGATTTTATGCATACATCCATATAC
CCATATACAT ATATATCAAGGTMAGTCCA[A/G]TACAMMMCAGCATTTCCTATGGCCAGTGTFCTACAGAAGT ATATCMGGT GCCATAGGM MGACTGTGCAMCTTTATCGTATAGTCAMTGAGATTGCACACTMGGCAGGATGAGGCAGMGCA
WI-12837 87' MAGTCCA ATGCTG I I I I I AGTTGTGTCCA
GTCCTCAGGCCCTTCTCTGGCTGCAGAGCCGTCTTCTCAGGTTGCCTGTC[G/C]TCTCCTGGCCTCTAG TCTTCCCTGCTCTCCGAGGTAGAGCTGGGTATGGATGCTTAGTGCCCTCACTTCTCTCTGTCTATACCT GCCCCATCTGAGCACCCATTGCTCACCATCAGATCMCCTTTGATΠTACATCATMTGTATTCACCA
L4261 1 b 50 CTGGAGCTTCACTTTGTTAC
GTCCTCAGGCCCTTCTCTGGCTGCAGAGCCGTCT[T/C]CTCAGGTTGCCTGTCGTCTCCTGGCCTCTAG TCTTCCCTGCTCTCCGAGGTAGAGCTGGGTATGGATGCTTAGTGCCCTCACTTCTCTCTGTCTATACCT GCCCCATCTGAGCACCCATTGCTCACCATCAGATCMCCTTTGATTTTACATCATMTGTATTCACCA
L4261 1 34 T CTGGAGCTTCACTTTGTTAC
TGAACGTGTGGTTAAAACTAGGCMTTGGTTMMATCMTTTMMAACAGGCCTAGAMCAGTG
TGMGAAATG ACCACACCTCMGCAATGATTATCCCTAGCACTCAGATTATGTTCTTGAMTACCATTTTCTGCTTTC GCTGATACCA ATGTGCATTTT AAMGAAAGACATGAGGGCTTCTTGMGAMTGGCTGATACCMG[CtηCTGCAGTGAAAMTGCA
Wl-1172b 1 79 A TCACTGCAG CATGATGAGCCTGGMCATGTTGT
TGAACGTGTGGTTMAA[C/A]TAGGCAATTGGTTAAAAATCAATTTAAMAACAGGCCTAGAAACA GTGACCACACCTCMGCMTGATTATCCCTAGCACTCAGATTATGTFCTTGAMTACCATTTTCTGCT TTCAAMGMAGACATGAGGGCTTCTTGMGMATGGCTGATACCMGCCTGCAGTGAAMATGCA
Wl-1 172a 1 7 CATGATGAGCCTGGMCATGTTGT
AGAGGCAGATTGGAAGTGTGAAAAAAATGAAAGM[G/C]MGMAAAAAGAGTCTAAATATTCAG 4-.
GCAGATTGGA CACTTACATTT MATGTMGTGCTGCCCTCMCTGTTCTTTACCCACTTMTTCTGCMTTTTGAAAACTAGATTGMT AGTGTGAAM CTGAATATTTA TCCTTTGCAAMCCCTTGCATCATGGATACCCGAGTTAMCCGTTMTTAAMGACATTAMCATGG
WI-1 177 35 G O A GACTCTTT CCTGGTG
TCCATGGTTTGGTTGCTACTGACTTTGTTAGCCTTACTGCCCACTATGCATTGGMCATTCCCATATTC CMCTMGCAGGAGTGTTCACMTM.ACMCATAGGCTCTTTATTCTCCTTCTTTCATTMTTTTCTT TCAC[GyA]TTATTCCCTCACCCTGMCGCCCTTCTTCCTTCGTAGTGACATTTTAAMTCCACTTTAC
Wl-1231 b 1 41 I G ACATTCGGACC
TCCATGGTTTGGTTGCTACTGACTTTGTTAGCCTTACTGCCCACTATGCATTGGMCATTCCCATATTC
GGCTCTTTATT CMCTAAGCAGGAGTGTTCACMTMACMCATAGGCTCTTFATTCTCCTTCTTTCA[T/C]TMTTTT
CTCCTTCTTTC CGTTCAGGGTG CTTTCACGTTATTCCCTCACCCTGMCGCCCTTCTTCCTTCGTAGTGACATTTTAAMTCCACTTTACA
Wl-1231 a 1 26 T!C A_ __ _ AGGGAATM CATTCGGACC
ACATACATAT GMGGCAGGACTGTGTTTTGGAGGACMMAGTAAMTC I I I I l ATATCTTTA I I I I I I MTTTTATT [CCATTATACA GACCTTTCTTT TTTTTTCAGGCATATAGACATACATATCCATTATACMCAGMMG[G/C]GGGCTGGAAAAGMAG
WI-472 1 1 4 G C ACAGAMAG TCCAGCCC GTCMGTGAGATTTCAGATATTCTTAAATGCMGGCTGACAMTTTGGGCTTGATT
Figure imgf000076_0001
Figure imgf000077_0001
Figure imgf000078_0001
Figure imgf000079_0001
Figure imgf000080_0001
Figure imgf000081_0001
Figure imgf000082_0001
Figure imgf000083_0001
Figure imgf000084_0001
Figure imgf000085_0001
Figure imgf000086_0001
Figure imgf000087_0001
Figure imgf000088_0001
Figure imgf000089_0001
Figure imgf000090_0001
Figure imgf000091_0001
Figure imgf000092_0001
Figure imgf000093_0001
Figure imgf000093_0002
Figure imgf000094_0001
Figure imgf000095_0001
Figure imgf000096_0001
Figure imgf000097_0001
Figure imgf000098_0001
TΓTTTGTTTΌCTCTGGACACCCACTGCTCCCAGGATGAMGGAGAG[G/A]MTGAGATCAGTTTTGGA
WI-7593 46 CACTTCCTCTTGAMTATAMGMTCMCMGTFACAGTCATGTTGGGGACTTCTTCTCTCTCCM
AGTGCATCTTGGGGGAMGGGCTCCAGTGTTATCTGGACCAGTTCCTTCATΠTCAGGTGGGACTCTT GATCCAGAGA[A G]GACAMGCTCCTCAGTGAGCTGGTGTATMTCCMGACAGMCCCMGTCTCC TGACTCCTGGCCTTCTATGCCCTCTATCCTATCATAGATAACATTCTCCACAGCCTCACTTCATTCCAC
WI-6962 78 CTATTCTCTGAAMTATTCCCTGAGAGAGMCAGAGAGATTTAGATMGA
GCAGAGMGAGMCCATGCCAGGGGAGMGGCACCCAGCCATC[C/G]TGACCCAGCGAGGAGCCM
MGGCACCCA GCTCCTCGCTG CTATCCCAMTATACCTGGGTGMATATACCAAATTCTGCATCTCCAGAGGMMTMGAMTMA
WI-7059 43 GCCATC GGTCA GATGAATTGTTGCAACTCTTAAAAAM
CACTTCACTGA MGACACCAT TCTACTTTCTG AGCAGCCATCACATGATCTGTTTTTCACCACTTCACTGAMGACACCATTTAT[A/C]TACCCMGGG
WI-9063 53 CCCTTGGGT CAGAMGTAGMCTTACTATTCATTAMTGTTTGACACMTTGGMTTGTC
MGGGGCATTGAGACTATAMGCAGTAGACAATCCCCACATACCATCTGTAGAGTTGGMCTGCATT CTTTTMAGTTΓΓATATGCATATATTTΓAGGGCTGCTAGACTTACTTTCCTATTTTC'FTTTCCATTGC TATTCTTGAGCACAMATGATMTCMTTATTACATTTATACATCACC I I I I I GACTTTTCCMGCCC
WI-7079 293 TTTTACAGCTCTTGGCATΠTCCTCGCCTAGGCCTGTGAGGTMCTGGGAT
GGTAAMGTT GACAGAI I I I I o CTTTTTGCTCT GACCTAGTTCC TGGATGCCGAGGTAAMGTTCTTTTFGCTCTAAAAGM[A/G]AAGGMCTAGGTCAAAMTCTGTCC 1
WI-9074 38 AAAAG TT GTGACCTATCAGTTATTM I I I I I MGGATGTTGCCACTGGCAMTGTMCTGT
GGAGTTTGCCCCTTCCTMGGGMGGAGATCTTTATCTTTCTGGTTGGCTTGACCAGTCACGTTGGGA GMGAGAGAGAGTGCCAGGAGACCCTGAGGGCAGCCGGTTCCTACTTTGGACTGAGAGMGGGAGCC CCAGGCTGGAGCAGCATGAGGCCCAGCMGMGGGCTTGGGTTCTGAGGMGCAGATGTTTCATGCT
Wl-7104b 249 GTGAGGCCTTGCACCAGGTGGGGGCCACAGCACCAGCAGCATCTTTG[CtFJF
GGAGTTTGCCCCTTCCTMGGGMGGAGATCTTTATCTTTCTGGTTGGCTTGACCAGTCACGTTGGGA GMGAGAGAGAGTGCCAGGAGACCCTGAGGGCAGCCGGTTCCTACTTTGGACTGAGAGMGGGAGCC CCAGGCTGGAGCAGCATGAGGC[C/A]CAGCMGMGGGCTTGGGTTCTGAGGMGCAGATGTTTCAT
WI-7104 1 57 GCTGTGAGGCCTTGCACCAGGTGGGGGCCACAGCACCAGCAGCATCTTTGCT
CCTGAGCCCTC TGTAGGGCTGA CATACMTGAGAGCCCTGAGCCCTCMGMCTCA[CtηGCCAGCTCAGCCCTACACCAGTTTCCACC
WI-8974 34 AAGMCTCA GCTGGC TGGAGTTCATGCMGGGCMMGGCAGTGCCATGCMGCTGTTTM
GCTTACAGGAG
CCTMGCATTG AGACTAGACA CTGTGAGGGTGACGTTAGCATTACCCCCMCCTCATTTTAGTTGCCTMGCATTGCCTGGC[Cπ TC
WI-9161 61 1 CCTGGC GGM CTGTCTAGTCTCTCCTGTMGCCAMGMATGMCATTCCA
CCCTGTTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCCTGTTCCAGAGAGGTGGGGCTGGAT
WI-9014C 93 lT Cl- GTCTCCATCTCTGTCTCMCTTTAΓF/CIGTGCACTGAGCTGCMCTTCT
CCCTGTTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTqCtFJFGTTCCAGAGAGGTGGGGCTG
Wl-9014b 44 GATGTCTCCATCTCTGTCTCMCTTTATGTGCACTGAGCTGCMCTTCT
TCTGAGAGAMTGACTTGTGGGAGACACCCTGCAGATCCTCATGGGTTTGTGACAGACCCTGCGTGCT CAGTGCCCTTTMGTGCATCCCGCTGTGCTGACTTTGAGTGGGATCMCATCTGTCCTACGGGTCCCC TCI I I I I IGGCCCCAGTATTCATGGCAGGGTTTGTTGGACACCTACTAGCTTCCCTTCCCATTCMCAC
Wl-7023b 206 A[C/A]ACACACATTCTTGCTCTACCCAMGCTCTGGCTGGCAGCACTM
TCTGAGAGAMTGACTTGTGGGAGACACCCTGCAGATCCTCATGGGTTTGTGACAG[A/C]CCCTGCGT GCTCAGTGCCCTTTMGTGCATCCCGCTGTGCTGACTTTGAGTGGGATCMCATCTGTCCTACGGGTC CCCTC I M IT IGGCCCCAGTATTCATGGCAGGGTTTGTTGGACACCTACTAGCTTCCCTTCCCATTCM
Wl-7023a 56 CACACACACACATTCTTGCTCTACCCAMGCTCTGGCTGGCAGCACTM
CTGAMTCCCCCTCTCTGCCCTGGCTGGATCCGGGGACCCCTTTGCCCTTCCCT[CT]GGCTCCCAGCC CTACAGACTTGCTGTGTGACCTCAGGCCAGTGTGCCGACCTCTCTGGGCCTCAGTTTTCCCAGCTATG AAAACAGCTATCTCACAMGTTGTGTGMGCAGMGAGAAMGCTGGAGGMGGCCGTGGGCCMT
WI-7093 54 GGGAGAGCTCTTGTTATFATTMTATTGTTGCCGCTGTTGTGTTGTTGTTA
ACATATCTGAAAAATGTTGAMGCCTMGCCAGGAATAMAGAAMGTAGAGATMTAATCA[G/A]
WI-9171 62 TTCTTTACMCCGATGGTMTTMGCTTGTATTCACMGACTTCATGC
CTAGGACCCC TCTAGAGGGTA vo 00 ATTCTCCTATT TATAGGACAGG GTGTGAGACCATCATGGTGCCAGTCTAGGACCCCATTCTCCTATTTAΓT/C]CAGTCCTGTCCTATATA
WI-9174 47 T ACTG_ CCCTCTAGAMCAGAMGCMTTTTTAGGCAGCTATGGTCAMTTGAG _ _
CAGAGGTCTTG MGGCCAGATGCACATCCCTGGMGGACATCCATGTTCCGAGMGMCAGAT[A/G]ATCCCTGTATT
CCATGTTCCGA AMTACAGGG TCMGACCTCTGTGCACTTATTTATGMCCTGCCCTGCTCCCACAGMCACAGCMTTCCTCAGGCTA
WI-7753 52 GMGMCAGA A AGCTGCCGGTTCTTAMTCCATCCTGCTMGTTMTGTTGGGTAGM
AMGGGMAG
CCACTTCTCCC TCTGACCTAGG MAGMCTACAGAGGACGATGTCCAAMCMAAMTGGCATCACCTGTCAAAMTGGAGTTCCACT
WI-9186 76 CGCA T TCTCCCCGCA[G/A]ACCTAGGTCAGACTTTCCCTTTCATCTT
AGMTATTGT
CTGCCTTAMG GGTGTGTGTGG TTGGACAMCCTAGMTTTTCTCCCTFTATGTATCTCTATCGATTGTGTAGCMTTGACAGAGMTM
WI-9193 94 G, CA TAGGGGG CTCACMTATTGTCTGCCTTAMGCA[G/A]TACCCCCCTACCACACACACCCCTGTCCTC
TTTGGATTGATATCGTGAMTCCTCAGCCGAGAMTTGGGCTGGATTG[CtF]GCTTTGGTTMTACAT
WI-9015 48 CTTTCCCTMAGMGATAMCACAAMTCCATTCCAGGTAGCTCGGCACCMCTMGM
GGAGCCAGGAGACAGCAGGGTCTGAGAGAGGAGCCAC[A/G]GTCCCTMTGACACCCACTCCTAGCC
GGTCTGAGAG GGAGTGGGTGT CTGAGGCTCGTGCCCCTCAGACTGGGGMGAGTCCMGGMGGGAGGGAGCAGCCACTCCTCMTGC
WI-7254 37 AGGAGCCAC CATFAGGGA TCMTGGCTCCCCTGMATCMGACAGG
Figure imgf000101_0001
Figure imgf000102_0001
CMGAGAGAG TGCAAAGAAA CCAGGAGCACTAGAGAGGGAGGGGGMGAGCAGMGTTAGAGAAAAAMGCCACCGGAGGAMGG AGAGGAMGA GMTGAAAGTT AAAAAACATCGGCCAACCTAGAMCGTTTTCATTCGTCATTCCMGAGAGAGAGAGGAMGMAM
WI-7424 1 31 AAAA G [T/A]ACAACTTTCATTCTTTCTTTGCACGTTCATAMCATTCTACATA
TCCTGCMGMGTTCTCMGCC I I I I I GATTTTTGTGCMTMAGTACAGCTTTGCATMGAGTGAM TTGGGCTAGCTTAMTGGATCCATAMCTTTCTTCTMTTTTMGTGAGA[A/C]TCTTTTAMCACCT GTTMATTTMTGTAGCAGTCTGAGAATCTAAMTTATGTACCACTCGTTTATFTGTTCATTCATCCA
X86400 1 1 8 TCCCTTTTCCCATGMTATTTCA
GTGGCCACTACATGTTATAGAMCCATCATCTTGTCACACAGCACAGTCTATGMTMMGGCTGAG TTATCACTMGCAGGAGAAAMGCATTAAAMGTGTCCCATTMMGGGACTTTTMTCMCCTAA TMACTCTMTTCTGCTGACTTTTTAMGATCTMGGTCATTFTMTACATGCTGAAMGGGTCACA
WI-8053 242 T ATTAATTCTTTGATCTTTTTTACTCACTGTTAACTTATATAA[T/A]TTCAGMC
TACACMTGMTTGCTTTTATTTCGGTATGCATCCACATTTCAGCATTTAGTGGTCCTGMCAGCMG TGGMAGACGCAGCMTTTGCCAGGAGGTCMGCCCACCMTTTCGGGGATCTGCTGTGCACACCGG GTTCCTTCTTMTCCCTGCTGAGGATCTTG[G/A]GMGCAGCAGCAGCACCAMACCMGGCATGCA
WI-6190 ' 1 65 CCGGATTCMGGTTCTTTTTGTTCCAGTTGTCAGATTCCAMCTAGACCCCA
MCAGTCACCACCMCCACATGACMCTCGCCAGGCMGGCCTTGCTTCCCTCCCTCCTTTGCGTCCC ATGTGCCTAGTCAGCMGGTCGGGGAGGCACCGATGTTAGCTTCGCCCAMGGGAGTATTACAGAGA GAGGCTTGGGAM[G C]GGMGGMACCTGGACAGGCTTTTCAGCACTGAGAMTCACTTAMACTG
WI-6275 I 1 48 G Cl ATTTGCTTTCAGTMCTGGTATGTCTGM
ACCMGAGATCAGCTGTCTAMCAGCAGCT rGATTGT[G/ηGGGCTTCCTGAMGMACCTTGC
TGACAGCTTCTCACTGACCTGCAGGACGGMCCGTACCTGAGAGGGGATGGGGGCTCTCTCACAAM GMTATTTGGGGCAGMCCCTGGMCTGGCCACCAGGGACATCCCAMTATCCCCTCCTCCTCAGGG
WI-6421 41 CTCACCCCGACATCCTCAGCCAMTGMGGCTCTGM
GGGTGAGACGGGTTTATTGTGCACATFTACACAGCGTCACAGCGTCTGGGCTGGCAGCGGCCATGCTC CTGTGGTCGGGCTGCTCTACMGGGCGTTCACTTTTCTTCACCACACTATGTACAGTCAGTGCTCCM GGTGATGGGCTACAGTGCTGCATCAGTGAGTCTGTACACACATTTTTACATAMTTACACACGACTC
WI-6905 21 51 T j A ATACATGMAAA[T/A]AGAGCCTAAGGGCCTGTATTTTAATGAGAAMAAA
MCTTGTTTACAAMTAGGCTTTGCAMCTTCATTACTGMTTGTMAGTCMTGACTGTGTTGTTTT TAAAATATGTACCMGGAMTACAMTTGGATMTGATCATTTTTCATGCTCAGGAGAGMCAGCAC AGAAATAMGGATACTGCACMGGTGCMGGAAACCGGMCCCATTGTGTACACTGTCTTCACACAG
WI-9420 202 G A' — [G/A]GCATTCTTTCTCACCTTMCTGCAGCTGTGCMGATGCCTCAGTGTG
Figure imgf000104_0001
Figure imgf000105_0001
Figure imgf000106_0001
Figure imgf000107_0001
TTTCTAGGCTGTACAGTCTGATGCATGA I I I I I I I ATAMTATTTCATACTCTTGTGMTTTGGATCTT TTTACTTTGAGCATATATTTTAGMTATGTGT[A/G]TGTTAMGGATCTCCACMTGTCTGCAGTGTG MGGCAGGTTCATTGTGGMTAGTTTMCAGTCAGGMGGCTAMCTGGTCAGTATTMTGTGTAGC
WI-7805 1 01 G CCTACCAMAATAGCCAGTAGTATCTGAAMTGMAAATAMTGMGTAT
GGCCAGGAGATTAGCMCMGGATTCATTCTGTTACTTACTTGCCCCTTTTTATCTTTCCCTCTTGCCC CAGTCCCTTCTCTCCAGCTTCATGTGMGCTCTGCACAGACMGACACTCAGTGTCCTTGGCAGTGCT [G/T]CTACTCCTCAGGTGCAGCATACATMCCAGTMGAGACTMATCTGCMTATATMAGAGCTC
WI-7416 1 37 CTACAMTCAGTAACATGMGMCACTCMMATTGGCAMTGTCATCAG
ATTTGMGATTTGGAGGGCTTTGCAGAGGAAMTAGATTTCMTTGGATCCCCAAACTATMTGACA AG I I I I I MTTAGGTGTGATCMGGCTFCTMMGTGAMTGCMGTTGTTACCAGTAMGTTTATA TCTTCCATTCAGCCCAGCTCATTTGCCAGAAMTTCAGGTGAGTGGATTGGCCAGACTATCTGGCMG
WI-140 252 GATGAAAATTTTAGTTTAMMJGTGTCATTTGTCTGTAπGGCAπ
GAGGTCTTTCAGCMCATGGMGCCCTACTGCTTCMCCCCGAGTTCCCCGGATCMGTGCTGGCACC CATGATGGAAACTCTTGCCATGGTTTTAGTACCCTGGACCMGTAGTCATTCCATCCTGACTTTAAM TTCTMACAGCCTTTGATGGGACMTCTCTGCTMAGACTMCCACTTCCTTATCTTATCTTCAGCTA
WI-198 21 8 CCTGCTTCCCTTTC[C/ηGTTTAACMAGCATAGMTATTCTGMCMCT
TTCATGGTCCCAAGACAGATTTTAMGAAAGAAMTAAGCCTCATCTCCTMCTATGACTTGGTCGG o ON MGCCMGAACCTACTTCAACATTTGACCCATMCCTTCTCTTGAGATGATGGGCTGACTTTTTCMT GCATGAGTTTGΓT/C]CCAAAGGCTTGATGGGAAMTCTCMCATTTGTTACCTMGMAGAGGATGT
WI-205C 1 46 ATCTTACTTTGTTTAAMMCTGCATATGCCTTTA I I I I I GTTTTAGTTCCC
TTCATGGTCCCMGACAGATTTTAMGMAGAAMTMGCCTCATCTCCTMCTATGACTTGGTCGG MGCCMGMCCTACTTCMCATTTGACCCATMCCTTCTCTTGAGATGATGGGCTGACTTTTTCMT GCATGAGTTTGΓT/C]CCAMGGCTTGATGGGAAMTCTCMCATTTGTTACCTMGMAGAGGATGT
Wl-205b 1 46 ATCTTACTTTGTTTAAAAMCTGCATATGCCTTTATTTTTGTTTTAGTTCCC
GMGACTGAGTTTCCAGGAGGTTGCAGCCGTTTCTCTCGGGCCATATGGCTMTMGGAGCTTGAGCA GGGATFCMCCTGTTTGCMCCCMGTNCTTTCCMGAGGTCTCAGACTACCTCCTCCATCTCCCCCT CTCCCCCACMCACACAMTACAGAGATT[G/C]MTTCAGGAGCCAGTTTCTAGGTGGGCTTTGAGC
WI-234 1 65 MTCATACACAGTMTCTCTTGGTGCTTTAGTTTTCTCAMTGGGMATGG
AGCTTTTGAMTCCAAMACCACAT[A/G]CTTGACTCTCTTATCCTCCTCTTGTTGTMCATCTATCC CTGAGGCAGAAMTACAGMCACCCTGTGGCTGCCTGMCGGAGGMGGATGGGGGCGGGGAGACAT CGGTCMTGTATCAMGCATCTCTCTGCCTGAMGACCTCTCCTGAMGACATGAGCTATTAGGAGC
Wl-276b 25 A G — TCTGGCMGGGCTTTGTCTTATCCTCCTTGCTATCCCTGATGACTGGGCAM
Figure imgf000109_0001
Figure imgf000110_0001
Figure imgf000111_0001
Figure imgf000112_0001
Figure imgf000113_0001
Figure imgf000113_0002
Figure imgf000114_0001
Figure imgf000115_0001
Figure imgf000116_0001
TGTTCTCTGGTCCAGGCACCGGGCTMGTCTTGTCTGCATMTGGMTMTCMCTGGACMCCCCNG CTNAGGTAGGNTACCTNGGCMTTAGCCCCATCTTACAGCTGCAMAGAGG[C/T]GCTCTGAGAGGT AMGTGCCCTGCCCCMCGCGCACMCTAGAGAGCAGCCAMCAGGTGTTTGMCCCAGCTCTGCCT
WI-1900 1 1 9 GACTTCAGATCTGTGTGCTTMCTGCCATGAGAMCCACTTTTCTTTGCTCC
ATTCCAGTTTCACAGTGGGCACAGGAGTCAGATTAGGGCTMGTTGGGGGGACAGGATGCACAGCGT GTTGGCTCAGGATCTCTGGGAGGTGGCACCTGTGACCTGGGCTMNCATGCTACTTTCAGAGTCMGC AGCMGCCMTGGGTAGGGAMGACCAGCC[C/T]CTCTGMNCTGGGTCCCACGTGGAGATAGTGM
WI-1943C 1 65 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
ATTCCAGTTTCACAGTGGGCACAGGAGTCAGATTAGGGCTMGTTGGGGGGACAGGATGCACAGCGT GTTGGCTCAGGATCTCTGGGAGGTGGCACCTGTGACCTGGGCTMNCATGCTACTTTCAGAGTCMGC AGCMGCCMTGGGTAGGGAMGACCAGCC[C/T]CTCTGMNCTGGGTCCCACGTGGAGATAGTGM
Wl-1943b 1 65 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
ATFCCAGTTTCACAGTGGGCACAGGAGTCAGATTAGGGCTMGTTGGGGGGACAGGATGCACAGCGT GTTGGCTCAGGATCTCTGGGAGGTGGCACCTGTGACCTGGGCTMNCATGCTACTTTCAGAGTCMGC AGCMGCCMTGGGTAGGGAMGACCAGC[C/T]CCTCTGMNCTGGGTCCCACGTGGAGATAGTGM
WI-1943 1 64 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
CCAGGTGAGGCTGAMGMGGMGGAGGCMTTGCTGTTGGAGTGAGGGATTCTGGAGMGCACCCT GCAGAGCTTCATTCTGTTTTCAAMGTGTGCCATGCANGGTCNTCTGGGTTGTGAGCTCATNGCTGAG TTATCACAGCTCCTGATGACAGATCATGAAAMTAGGTACTTCCCMGCTCTGACTAGACCTTGGCA
WI-1960C 270 GTTGCMTTMATCCGTGGTGTCTGAAMCTTAAAMTGCACCTCCCMCTTT
CCAGGTGAGGCTGAMGMGGMGGAGGCMTTGCTGTTGGAGTGAGGGATTCTGGAGMGCACCCT GCAGAGCTTCATTCTGTTTTCAAMGTGTGCCATGCANGGTCNTCTGGGTTGTGAGCTCATNGCTGAG TTATCACAGCTCCTGATGACAGATCATGAAAMTAGGTACTTCCCMGCTCTGACTAGACCTTGGCA
Wl-1960b 270 GTTGCMTTAMTCCGTGGTGTCTGMMCTFAAMATGCACCTCCCMCTTT
CTGATGCCMGTGCAGCTTAGAGTNAGGMTCCAGAGAMGTNTTTGGATCTGGTMGTAGGAGTCA TTCTGGGCATTTCTTCATAGAGTNTTG I I I I lAGTCTCGTMTMTACTGTTGCCCTAGGMGGTTGTT
TTTCCTACTGCGTCTGTGAMGCCTTTCCCCATCGAGTGATACAGTACTTTCCAGTTATGGAGATTT[
WI-1977 203 /C]TMCMTCMACACTGGCTGAGGCTGTTGG
AMTTCTAGMGCCAGMGTCAGCTCACGATTTATAMGTTGMGTMATGCATTGTAGTTTCATGT TTTCTCTTAATTCTGCACAMACTAGCTAAAMTC[T/C]TTTMATCAGTTACCAGAGGCAATACCT GGGTTMTGTMGCACTCAMAGTTATGTAGAGTAGCTGTCTCTGAGTCAC I I I I I I CTACTCTCATT
WI-2012 1 02 | GGCTTCACCMTGCTTCCACTGGATC
Figure imgf000118_0001
Figure imgf000119_0001
Figure imgf000120_0001
Figure imgf000121_0001
Figure imgf000122_0001
Figure imgf000123_0001
Figure imgf000124_0001
Figure imgf000125_0001
TTGGTTGGCATTTTAGCCTCATAACMCTATTTACMTCATAATTGTTACTCTTATTTTACMACMG MAMTGAGGCTTMCATCACACTTCTGCTTAGTCGCAGAGCCMGATTTGMCCCAGGMTCCATT CACCGGTAC[A/G]TGCTACCTGGGTAAMAATGTTTAATTAAMTCTATGGCATTAGATTTCAMGA
WI-4584 144 GTCCTMTGTGGTTTTGMMTAGGTGTGCTTTMTTTGTFTATCAGTATGC
TTTCTGCATTTGMTGTGTATGGTCAGACTTCAGAGGMCCCAGGMTCTCATTTATTCAGTACMTA TGGTGGCCAGGTGCTCAGGCCCTATTATCAGAGAGATCTCAGTTTMCTTTCCMTTCCACCATTTAC TGACCATATGACTTGGGGMCATTATCTCACCTATCTGAGTCTGTATCC[CtηCATCTTTAMTTGTA
WI-4639 1 85 AATTTTAAGGACACCTATCATAGTAATATTGTGAGGATAAAATGAAATAA
AMTGMTCCGCTFTAGAGCAMTACCAGTMGGGCTGGTGCAGGATGGTGGTGGCTGAGAGA[A/- ]GATTACTCATAAAAGCATATTAATTTTATAAATATGGAMATTFAACTAGATAATTAAATGTGAAT TGAGTTTGMGGTTGCATGAGAGTAGGGAGGAGGTAGTTTCTACTTATAGGGTTTATATMGTNTGCT
WI-5327 63 A TCMTAGMTGGCTCTTTCGGATGACAATGATGAACTGTTCTMGCAGACAG
GCTTTTGAGMTGMMGGGGAGCCTGGACCATTGCAGGGCTTCTTCATCTCTGATTATTTTGTGTAT TTATTGTTCACTTATTTAT[C/T]GTCTGTCTCCCCTTCTGGTATGCTFGTGTCATGAMCMT GMTTC CCCAGTGCCTGGCCCGATTCGTGGCTCCTAGAGGTGTCCAGAAAAAMGTTTCGGTGMTAGMTTG
WI-5390 87 ACGMTGGGTTCAGMTTGMACCTGTGMTCTATGGMGACAMCGAAT SI
CCTTGCCTGCTTTATGCATMTGAGMTAGAGTTGACTCTCCTGTCMGMATCMTFATTMGCAGT GCAAACATTATTTTAATTT[G/A]AAAGAAACTTGTTTCTGAAACTTTGTACTCTTGTAGTNAAATTG MTCTTTCCTTCTCAGCAGTTTCCATGGTCGTGMTCCACCCCATCTCTTTTCACCAGTAGCMGATT
Wl-5404b 87 GCTACTTATATGGAAGGGTTTTAGAGTTCATMCM
CCTTGCCTGCTTTATGCATMTGAGMTAGAGTTGACTCTCCTGTCMGMATCMTTATFMGCAGT GCAAACATTATTTTAATTT[G/A]AAAGAAACTTGTTTCTGAMCTTTGTACTCTTGTAGTNAAATTG MTCTTTCCTTCTCAGCAGTTTCCATGGTCGTGMTCCACCCCATCTCTTTTCACCAGTAGCMGATT
WI-5404 I 87 GCTACTTATATGGMGGGTTTTAGAGTTCATMCM
TAGGAMGGGGATGGTGATGGCCTCTGAGACATFTAMTCTATTCTTTCACCACTCACACTGCCGCCA TATCTCCTC[A/C]CCMCACCTCTGTTTTCTGACAGCCMGTTTCCATCAGTTGATATGGGACTATTT GTTGCAAMCMTTGTTMMGATTTGGCTGACTTTGGCTGMTTTGCTACMCTCCMMAGANTC
Wl-5545b 77 GAGATACACCATGMTTTTATTTTCATTTCA
TAGGAMGGGGATGGTGATGGCCTCTGAGACATTTAMTCTATTCTTTCACCACTCACACTGCCGCCA TATCTCCTC[A/C]CCMCACCTCTGTTTTCTGACAGCCMGTTTCCATCAGTTGATATGGGACTATTT GTTGCAAMCMTTGTTMMGATTTGGCTGACTTTGGCTGMTTTGCTACMCTCCAAAMGANTC
WI-5545 77 A' C GAGATACACCATGMTFTTATTTTCATTTCA
Figure imgf000127_0001
Figure imgf000127_0002
TMTTGCACAACTTACATATCAGGGTTTCTGATTGAMGGMGAGMTATTCCTTTCTTTTAGTGATT GCTTAATATTMTTCATAATMGTGCACCATCTCTΓF/CJGCTCCTTATAAATGTGTTTAGMGMGG MATTGAGTGTTGGGMTTMGCMCCAGGAGACATTTTTATATACTCCTACAGTGGGGGMGACTT
WI-6244 1 03 CCTATTTTCTΓTCCCMGGATGGATACATTTCTAC
CTGGCCTTATMTCCMGTFTAGGATTMTCTTACCCCMCTTMTAGACTFCCAGACAGTTGCAGTT GTCTACMGATTTCCTCCTAGTAGGGCTTTGGGTGTTGGCACCGTTTGGCTCATTC[C/ΗACTCTCCCT GGGTCTTATTGACTTTCAGGGAGCCTAGMGAGCTGGACMMCCTGCTTCTTTGCAGAMGAGTCG
WI-6268 1 24 GGGTTCCAMGATTTCGTTACGAI M I M A
AGGTGCCATTTMTCCATTCAMTTTGGMGCTACATCTTCMGGGTCTGAGAGAGCTCACTCCCCCC ATATATTCCCCCTTFACATGTTTFCTFATMGACATACAGTTTAATCMTTAACAMCTMACAGCTT ATATACTGGCMTATATTACAGATGGGTTTATGTCAGAGTAATAGATCACATGAMTGGACCATGTG
Wi-6336b 234 GTACCCCAGTGCATTATGTCTTGGTAGAGCC[C/T]TGAGGACACTGACAGT
AGGTGCCATTTMTCCATTCAMTTTGGMGCTACATCTTCMGGGTCTGAGAGAGCTCACTCCCCCC ATATATTCCCCCTTTACATGTTTTCTTATMGACATACAGTTTMTCMTTAACAAACTAMCAGCTT ATATACTGGCAATATATTACAGATGGGTTTATGTCAGAGTAATAGATCACATGAAATGGACCATGTG
WI-6336 234 T GTACCCCAGTGCATTATGTCTTGGTAGAGCC[CtTTFGAGGACACTGACAGT
TTGGATACAAAAATTCAGTTACACAATCAGTAGCATTCMMTTAGTTATGAGTATTTATACAATTA ON CAAAAATGGNTTCATGTrTTMCM[C/A]GTATTTTAMAGCTCAAACATTTTAAAACAGGCACAAT ATTCTMNGGCATATGCATTCACCATGGGCTTTTGMTGTCCTCACTCCCMCTTCACMTCAAMTC
WI-6381 92 TACAGANGCGGCAAMGATCAGAGTTCAG
GGTTGAGGCATTGGGAMGGCAGAMTTGAGGCAGTAGAMATGGACATTTTAGGAAMGAGMGT TCAGAGGCAAAGTCATGACAGACAGGAMTACMGGCTTAGGMGACAGTAGTCTCTGTGGTTGM ATTTTGGTGTCATMTMGMGTTTAGACTTTGGTGGTTGTAGTAGTTGTAGTAGTAGGTAGCGTT[C/
WI-6436 1 98 G]ATTGGGTGTATTCCACAGACMGGTGATGTTCTMGATTTGATATTTATTGT
GAGGCCTCTTTGCTTTTCCTCAGTCMGGCTGTATCCAGGGTTGATATCTAGCCTATATGCCATATGT GTATGGCTAGTGTTTGTTCTGATTGGTTGGTGCTCACACTGCCCAGATTGTTAMTATTTTGAAMTC GTATCTGGTTCTATTCATCTGCATTCTCTGATCTTATGTCTGGCTCTATT[C/ηATCCCTATTCTCTGA
WI-6449 1 86 TCTTATGTCAGACCTGMGTTCCTCTMI I I I I CTGTGGTGTATTTATA
GAGGCCTCTTTGCTTTTCCTCAGTCMGGCTGTATCCAGGGTTGATATCTAGCCTATATGCCATATGT GTATGGCTAGTGTTTGTTCTGATTGGTTGGTGCTCACACTGCCCAGATTGTTAMTATTTTGAAMTC
Figure imgf000128_0001
GTATCTGGTTCTATTCATCTGCATTCTCTGATCTTATGTCTGGCTCTATT[CA]ATCCCTATTCTCTGA
WI-6449 1 86 C T — TCTTATGTCAGACCTGMGTTCCTCTM I I I I I CTGTGGTGTATTTATA
GCTGGAGAGAAMGACCTCCAAMGMGAMCTMATCAGAGTCTCTTGAGCMGAGGMTTGMA AGAACA[T/C]TGAAAAMATTAAAGTAGAACTCAMGAGCCMAAAGTCCCCMTTGTGTCCATTA TMGMATATTFTGAATGGAMTCTTMGAATGATTTTATTGATCAGTTAMTGTTCTTCCTCTCCTC
WI-6463 72 CAGTCCCATTTATATGACATTCCGCATGCTG
MGCAGTAMTCTTCCATCATGCCATGGATGCCAGTGGGTAMTGTTATAGAMCTTCAGAGGANAC AGAGGCAM[C/T]GTTGGTTATAGCAGTCMCGACATCATCMTGMGACATGACTTGCTTAGAGCC MGMMAGTAGGATTTTGAMGGCACAGAGAAMGGGGTGTACTAGAGGAGMCTATGTMGCAG
Wl-6474b 76 T AGGTATAGAGGMCTMAGTATAAMGAGTGAGCCATMCTTAGGGTACCATAA
MGCAGTAMTCTTCCATCATGCCATGGATGCCAGTGGGTAMTGTTATAGAMCTTCAGAGGANAC AGAGGCAM[GT]GTTGGTTATAGCAGTCMCGACATCATCMTGMGACATGACTTGCTTAGAGCC MGAAAMGTAGGATTTTGAMGGCACAGAGAAMGGGGTGTACTAGAGGAGMCTATGTMGCAG
WI-6474 76 AGGTATAGAGGMCTMAGTATAAMGAGTGAGCCATMCTTAGGGTACCATM
GMCTCMTTMCTTTGCMCACTGAGAAMTCGGATTTGGAGATCTGCAMGCTGAGGTTGAGATT TTGGACCTTGGTGATCCAMTGGGGMTGCCACGCTTCGAGGCCTGTCTATATGCTTTATTTTTGTGA CACTGTCTATTTACCCTCCCCCMTAGTGGAGMTCAGAG[T/A]GCTCCTTGTCAGTGTTGCTACAGA
Wl-6478b 175 GMGATATACAGGATGGMGGACAGCTCCTCGTAGGACCTAGACACMCTG
GMCTCAATTMCTTTGCMCACTGAGAAMTCGGATFTGGAGATCTGCAMGCTGAGGTTGAGATT s
-4 TTGGACCTTGGTGATCCAMTGGGGMTGCCACGCTTCGAGGCCTGTCTATATGCTTTATΠTFGTGA CACTGTCTATTTACCCTCCCCCMTAGTGGAGMTCAGAGΓF/A]GCTCCTTGTCAGTGTTGCTACAGA
WI-6478 1 75 GAAGATATACAGGATGGMGGACAGCTCCTCGTAGGACCTAGACACMCTG
CACATTTTGMTGCMCTGAGAMNTGGTTΓTNTAGGCCTACCTTTTATTTMGAGTACATCTGGCTC CMTGTTACCCCAMCATGCAMACATMGGCMCMTTCTGATCATTTTATAGGNTCCCMGCCCA
TTAGCAATATCTTA[G/A]TCAAATTTTAMAAGAGMCAGGAAATMGGAAGGCCTMCAGAGGAG
WI-6559 149 TTAAATMTTGTGCAAMCTTATCAGTTCTTC
TTCTTTATTGGTCCTACCMTGTGACTCTTTACCCAGGCCCACTGTTCCTATGC[G/A]CACTGGCTTTG TAGGCATTCACATCATATGTCTGTGTCCTGAAMTCTCMTTMTTTCTCCTNCCTATFCCTTTTCCATl GCTCTGCCTCATTTNCTCAGAMTTGMGGCATTTGATTATNA I I I I I I I GTTTGGGTCTGTGTAMG
WI-6564b 54 GTTCCTTGGCAGGAGMCATGCATATGACTTTAAMTMAGACCMCA
TTCTTTATTGGTCCTACCMTGTGACTCTTTACCCAGGCCCACTGTTCCTATGC[G/A]CACTGGCTTTG
TAGGCATTCACATCATATGTCTGTGTCCTGAMATCTCMTTMTTTCTCCTNCCTATTCC' FCCATl
GCTCTGCCTCATTTNCTCAGAMTTGMGGCATTTGATTATNATTT1 GTTTGGGTCTGTGTAMG
WI-6564 54 G! GTTCCTTGGCAGGAGMCATGCATATGACTTTAAMTMAGACCMCA
Figure imgf000130_0001
Figure imgf000131_0001
GCATGATTAMCCAGTGCAGAMMTACCMGTACATTGGGTGMCGATGAGCTAGCTGTTCTAGTA TTTGCTTTTTGTMTCCAGTTMGACCATCAGCATATACAACATCATCACTAACTCMCMTGTAGCT GCAGGGTMC[C/A]TGTGGATACCCTGTGTGCTCTACTNGCCTCCAMGGCATTCAGGGGATCATCA
WI-6817 145 MGATGTTGGACACCTTGTGTTCAMTCTTGGTTCAGGTGCGGCCTGTGCAG
GATGGMAGCCATTTTA I I I I I CTCTMATTTTAAAATAGMGACTTTAATGGAAMCATTTAGTAC CATCATGTCACCCTGMTGCCAGCMTACCTCGACTTTTACACACGCAGGMGCCTAGTAAMGCCC CGTCAGTAGTACACATTTCTCTATGGTCCTTCMCAGTT TTTTGGCCAATTAATTAACCAAAAMMTTTTTCTGCTATTT
WI-6819b 221 CTTTAGCAMCAGCMTMCT TTGTGTTTCCTATATGACACCTAATATCCAG
GATGGAAAGCCATTTTA I I I I I CTCTAAATTTTAAAATAGMGACTTTAATGGAAMCATTTAGTAC CATCATGTCACCCTGMTGCCAGCMTACCTCGACTTTTACACACGCAGGMGCCTAGTAMAGCCC CGTCAGTAGTACACATTTCTCTATGGTCCTTCMCAGTTTT[G/T]CATATACAMATTTTCTGCTATT
WI-6819a 175 TTGCTTTAGCMACAGCMTMCTTTTGTGTTTCCTATATGACACCTMTAT
GCAAAMGCTTTATTGGCTCCMCMATTATCCCTTTTAAMCTCCTCTTCTTCTTCTGGTCTCAGTG GAACAACACATTTGMTTTCAGATTTGCAGTTTATAGCA I I I I I I I I CCCTAAGMCCATATAMTAC ATGCAAAACCTTGTACAT[A/G]GAGCTTAAATMTATCAAAATGCAAATATAGATTGGGTGCACTGT
WI-6826b 154 TMGCTGMTTGCAMTTATGGCMCACACACTGGACTGGGGTATACGTTG
GCAAAMGCTTTATTGGCTCCMCAMTTATCCCTTTTAAAACTCCTCTTCTTCTTCTGGTCTCAGTG
GAACAACACATTTGAATTTCAGATTTGCAGTTTATAGCA I I I I I I I I CCCTMGAACCATATAMTAC ATGCAAAACCTTGTACAT[A/G]GAGCTTAAATMTATCAAAATGCMATATAGATTGGGTGCACTGT
WI-6826 154 TMGCTGMTTGCAMTTATGGCMCACACACTGGACTGGGGTATACGTTG
AGTGCAMCTATTTTGMCAAMGTAMCTATGAGTCACAGCATTCAGCMGACATCAGACACGGA AGAGTGMCMTATTCACTMGTMAATACAGCAGATGAGATGTCTCTCACATGTA[T/C]ATTTMT TATFCATGC I I I I I CMTAGTCTCTTAGTCMCTTTCAGTGTMTTTCCACAMTATATAGCAGCTCA
WI-6857a 122 MCACMATGCAGGAGCACMTGGCMAGTTTGGCMCTGTTTTGGGCTMTT
TTATAGMTACTTATGGGGCATACGNGTAMTGAACTGTCMCCTTMAATCTAMCMACAGCTTG TTTGTGGTTCGTCCTGMATCCTCCCTGCTCACAAMCAGCCAGCTACTNGGTTTTCTAAMGACGTA ATTTTGCAGGCAMCTTC[G/A]TAGAGCCATTCTGTGCAGMGMGGGMGGGAGMGCTGTTTGTT
WI-6865 153 G A TTACCTGTAGTATGMGATATTCTTTGCGCTGTTAGMCTGAGCTCATFM
ATTGAMACTGGTTAGCMCAGATAMTTACMTAGAGCCTGGATATAMMTGAGAGMGMTGC AGACTTA[C/T]MGCTTATAGAGAMGTCAAAMGGAGCMGTTTTTGMATCAGATTTTATGATAC GGAAAAAAMTTFCCTTFTTTTGCCMCAGGATTATTTCGMTAATAMTCTGCCAGTGCCMTCAG
WI-6909 73 C T AMCACCATTTCCACMTATTTGCATGCCCCTAGTTGCCTATTTTATACATATC
Figure imgf000133_0001
ACTTCTAGTGCCTCTGTTACCACCACCTCTMTGCCTCTGGTCGCCGCACTTCTGATGTCCGTAGGCCT TMATCTGCCTGGCGTCCCCTCCCTCTGTCTTCAGCACCCAGAGGAGGAGAGAGCCGGCAGTTCCCTG CAGGAGAGAGGAGGGGCTGCTGGACCCMGGCTCAGTCCCTCTGCTCTCAGGACCCCCTGTCCTGACT
WI-6996b 242 CTCTCCTGATGGTGGGCCCTCTGTGCTCTTCTCTTCC[G/T]GTCGGATC
ACTTCTAGTGCCTCTGTTACCACCACCTCTMTGCCTCTGGTCGCCGCACTTCTGATGTCCGTAGGCCT TAMTCTGCCTGGCGTCCCCTCCCTCTGTCTTCAGCACCCAGAGGAGGAGAGAGCCGGCAGTTCCCTG CAGGAGAGAGGAGGGGCTGCTGGACCCMGGCTCAGTCCCTCTGCTCTCAGGACCCCCTGTCCTGACT
WI-6996 228 CTCTCCTGATGGTGGGCCCTCTG[T/G]GCTCTTCTCTTCCGGTCGGATC
TGGGGAGGACAGGGAGATGCTGCAGTTCCAAMGAGMGGTFTCTTCCAGAGTCATCTACCTGAGTC CTGMGCTCCCTGTCCTGAMGCCACAGACMTATGGTCCCAMT[G/A]CCCGACTGCACCTTCTGTG CTTCAGCTCTTCTFGACATCMGGCTCTTCCGTTCCACATCCACACAGCCMTCCMTTMTCMACC
WI-7021b 112 ACTGTTATTMCAGATAATAGCAACTTGGGAMTGCTTATGTTACAGGTTA
TGGGGAGGACAGGGAGATGCTGCAGTTCCAAMGAGMGGTTTCTTCCAGAGTCATCTACCTGAGTC CTGMGCTCCCTGTCCTGMAGCCACAGACMTATGGTCCC[A/G]MTGCCCGACTGCACCTTCTGTG CTTCAGCTCTTCTTGACATCMGGCTCTTCCGTTCCACATCCACACAGCCMTCCMTTMTCMACC
WI-7021 108 ACTGTTATTMCAGATAATAGCAACTTGGGAMTGCTTATGTTACAGGTTA
GGCAGTAGGACCACCAGTGTGGGGTTCTGCTGGGACCTTGGAGAGCCTGCATCCCAGGATGCGGGTGG CCCTGCAGCCTCCTCCACCTCACCTCCATGACAGCGCTAMCGTTGGTGA[C/T]GGTTGGGAGCCTCT GGGGCTGTTGMGTCACCTTGTGTGTTCCMGTTTCCMACMCAGAMGTCATTCCTTCTTTTTAM
WI-7056C 118 ATGGTGCTTMGTTCCAGCAGATGCCACATMGGGGTTTGCCATTTGATA
GGCAGTAGGACCACCAGTGTGGGGTTCTGCTGGGACCTTGGAGAGCCTGCATCCCAGGATGCGGGTGG CCCTGCAGCCTCCTCCACCTCACCTCCATGACAGCGCTAMCGTTGGTGA[C/T]GGTTGGGAGCCTCT GGGGCTGTTGMGTCACCTTGTGTGTTCCMGTTTCCMACMCAGAMGTCATTCCTTCTTTTTAM
WI-7056b 118 ATGGTGCTTMGTTCCAGCAGATGCCACATMGGGGTTTGCCATTTGATA
MTTCGCTGMAAAGGMCTACCTATCCTTACATTTCACCTACTMTGTCTCTTCTMCATCTTAGAG GTCCATGGAGMGGCATATGGAGMCATGTTTTATACTGCTCTATAMTAGTATTCCMTCACTGTG CTTAATTTAAATAGCATT[A/C]TCTTATCATTFATCAGCCTTTTATGTATTTTCCAAGTAAAATATTA
WI-7091b 153 ACATATTATTTCATFGGTCTTC I I M I I ATCTGGTTCTATATGMTGCTAT
MTTCGCTGMMAGGMCTACCTATCCTTACATTTCACCTACTMTGTCTCTTCTMCATCTTAGAG GTCCATGGAGMGGCATATGGAGMCATGTTTTATACTGCTCTATAMTAGTATTCCMTCACTGTG CTTAATTTAAATAGCAT [A/C]TCTTATCATTTATCAGCCTTTTATGTATTTTCCMGTAAAATATTA
WI-7091 153 ACATATTATTTCATFGGTCTTC I I I I I I ATCTGGTTCTATATGMTGCTAT
TGTGMGCCACATTTTCCMCATGAGCCTCATGMGCCAACTMGTGTTATTGMCTG[T/C]MTTC TCTCMTMCTCAGTGTAGCACTTTAMGTCTGMGGACAGCMCATGAAMGAGCATATCMTGTG
GTGGAGAMGGGMGGGGTTGGC I I I I I MTTTAT TTTTCTTCATCTTTTATMCMGMAGNNNNN
WI-7136 58 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTAGCTTTCTATATATG
GGGACGCCTGTTGTT1TGGCTCAATTTGGGTTTGTTGGTCACATGGAGCTCTTCCATTTCGTTTAGCTG MTMTGAGTTGTTCCTAGAGGAGACAGCCTGTCTCTCCTFGTTGCCCCCAMGCCCATGCCCTGCCG TGGTGGCAGCTGGGGCTGTGGATGGGAGGGGTCCCCMCATGGATGTGTTGCCCCTCCTCCGCATGCC
WI-7146C 210 MCGC[A/G]GTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
GGGACGCCTGTTGTTTTGGCTCMTTTGGGTTTGTTGGTCACATGGAGCTCTTCCATTTCGTTTAGCTG MTMTGAGTTGTTCCTAGAGGAGACAGCCTGTCTCTCCTTGTTGCCCCCAMGCCCATGCCCTGCCG TGGTGGCAGCTGGGGCTGTGGATGGGAGGGGTCCCCMCATGGATGTGTTGCCCCTCCTCCGCATGCC
WI-7146b 210 MCGC[A/G]GTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
GGGACGCCTGTTGTTTTGGCTCMTTTGGGTTTGTTGGTCACATGGAGCTCTTCCATTTCGTTTAGCTG MTMTGAGTTGTTCCTAGAGGAGACAGCCTGTCTCTCCTTGTTGCCCCCAMGCCCATGCCCTGCCG TGGTGGCAGCTGGGGCTGTGGATGGGAGGGGTCCCCMCATGGATGTGTTGCCCCTCCTCCGCAT[G/A]
WI-7146 202 CCMCGCAGTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
ATATTACMCTTGC I I I I I AGCTGATCTTCCATCCTCMATGACTC I I I I I I CTTTATATGTTMCATA I TATAAMTGGCMCTGATAGTCMTTTTGAI I I I lATTCAGGMCTATCTGAMTCTGCTCAGAGCCT ATGTGCATAGATGAAACNNNNNNNtA/T]AAAAAAAGTTATTTAACAGTAATCTATTTACTAATTAT
WI-7153 161 AGTACCTATCTTTAMGTATAGTACATTTTACATATGTAAATGGTATGTTT
TAGMTAGATGCGGTCATATTCTTCTTTGGCTTCTGGTTCTTCCAGCCCTCATGGTTGGCATCACATAT GCCTGCATGCCATTMCACCAGCTGGCCCTACCCCTATMTGATCCTGTGTCCTAMTTMTATACAC CAGTGGTTCCTCCTCCCTG[T/G]TAAAGACTMTGCTCAGATGCTGTTTACGGATATTTATATTCTAG
WI-7155 156 TCTCACTCTCTTGTCCCACCCTTCTTCTCTTCCCCATTCCCMCTCCAG
AGCTCCACCAGATGCAGATTTGTGTTΓTGTTTTCTTGTTATCACTGTCACACAGCTTATMCATGTAT
GCTTTTCAGMTACAGTTGTCTAGCCMGCCATCMGTGTCTGMATTCMTATTGGTTTATGCAMT
ACAGCAAACTTTTATTTAAGTAGAT[A/G]GGAGAATATGTTTAAAATATTAGGAATCCTAGACCATA
WI-7169b 161 TTTCMGTCATCTTAGCAGCTAGGATTCTCAMTGGMGTGTTATATATA
CTCCTAGACTAGTGCTTTACCTTTATTMTGMCTGTGACAGGMGCCCMGGCAGTGTTCCTCACCA ATMCTTCAGAGMGTCAGTTGGAGAAMTGMGMMAGGCTGGCTGAAMTCACTATMCCATC AGTTACTGGTTTCAGTTGACAAMTATATMTGGTTTACTGCTGTCATTGTCCATGCCTA[C/T]AGAT
WI-7175b 194 MTTTATTTTGTA I I I I I GMTMAMACATTTGTACATTCCTGATACTGGG
Figure imgf000136_0001
Figure imgf000137_0001
Figure imgf000138_0001
Figure imgf000139_0001
Figure imgf000140_0001
Figure imgf000141_0001
Figure imgf000142_0001
Figure imgf000143_0001
TGAMTCCTGGGTCTCTTGGCCTGTCCTGTAGCTGGTTTATTTTTTACTTTGCCCCCTCCCCAC I I I I I I TGAGATCCATCCTTTATCAAGAAG[T/A]CTGAAGCGACTATAMGGTTTTTGMTTCAGATTTAAM ACCMCTTATAMGCATΓGCMCMGGTTACCTCTATTTTGCCACMGCGTCTCGGGATTGTGTTTGA
WI-7388 94 CTTGTGTCTGTCCMGMCTTTTCCCCCAAAGATGTGTATAGTTATTGG
TTAGATTTTMTFGGCMCCAGCMCTCACTGCCACCATTCCACTGCAGATCTNCTATTCCTGG[A/G] GTTGATATGACMGGMACCCTATTGGMCCMGTCTTCAGATTGTNCCATGTGCAGACAGGCTCCT TGTCTGTAGGTGTAGTAGCATGTACACTGTACTGTTCACTGTMCATAGTTTGTNCTGGTATTTGTTA
WI-7438 64 TTGGAMTGMTATCGCTTCCACTGACTTTTACCA
CCATGATCCCCTCCTCTTGCCAMTGGAGGMGCCTGTGGATGGTACCMCAMCMGCCCCAMCC CAGTACAMCTGAGMTGAGAGMCCCTGATAGCACTGTCTGMTTGCCAGGAGCCTCCMGGCTM TCCTACCCCTGGATTTCT[T/C]TGTTGTTTAAGTTATTTCTAGCCACCACAMGAGGGTACTGCCCM
WI-7454b 152 CAGACTCATCCTTAAAMATCCCATTTGTCTACTTCTCAMTG I I I I I GACA
CCATGATCCCCTCCTCTTGCCAMTGGAGGMGCCTGTGGATGGTACCMCAMCMGCCCCAMCC CAGTACAMCTGAGMTGAGAGMCCCTGATAGCACTGTCTGMTTGCCAGGAGCCTCCMGGCTM TCCTACCCCTGGATTTCT[T/C]TGTTGTTTMGTTATTTCTAGCCACCACAMGAGGGTACTGCCCM
WI-7454 152 CAGACTCATCCTTAAAAAATCCCATTTGTCTACTTCTCAMTG I I I I I GACA
AATTTGAAAATCTGAAAAAAAGTGCATAAGCAGAGAMTGACACTTATTCCAAATAAATAAATTGT CCA I I I I I CACTCAGTCCATCTTAACCATGTACAATGCACTAMTTACTATTTATMTTTCCTATGTA CMCAGAGCCACAGCACMGAGGGTGGGCATMGCAGTTGCCA[G/C]CCAGMGAGCTTTCACTCAT
WI-7464c 177 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCMCGTTCACCMCMTTAT
AATTTGAAMTCTGAMAAMGTGCATAAGCAGAGAMTGACACTTATTCCAAATAMTMATTGT CCA I I I I I CACTCAGTCCATCTTMCCATGTACAATGCACTAMTTACTATTTATMTTTCCTATGTA CMCAGAGCCACAGCACMGAGGGTGGGCATMG[C/A]AGTTGCCAGCCAGMGAGCTTTCACTCAT
WI-7464b 168 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCMCGTTCACCMCMTTAT
MTTTGMMTCTGAMMMGTGCATMGCAGAGAMTGACACTTATTCCAMTMATAMTTGT CCA I I I I I CACTCAGTCCATCTTMCCATGTACAATG[C/A]ACTAAATTACTATTTATMTTTCCTAT GTACMCAGAGCCACAGCACMGAGGGTGGGCATMGCAGTTGCCAGCCAGMGAGCTTTCACTCAT
WI-7464a 103 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCAACGTTCACCMCMTTAT
CMTTCTCMTCCMCCTAGTCTGTNTGCCTAMCCATTCCAGACAMCTTCCACTTCGMGGTTTTA MTGCATMGTCAGATAGCMTCCTTCAGTTGCCCCAGAGGCACATCACGTTCTTTGMTGCTTCA[T
/G]TATAGTCCTCTTCATTTAGCMTCAGTGAGGCMTACACTGGCATCATGATCCCI I I I I I IAGGA
WI-7499b 134 G! ACTCTGTACAAMTTCCCTTTGMMTATAMTTTTGGAAATGAGTGATGA
CMTTCTCAATCCMCCTAGTCTGTNTGCCTM[A/G]CCATTCCAGACAMCTTCCACTTCGAAGGTT TTAMTGCATMGTCAGATAGCAATCCTTCAGTTGCCCCAGAGGCACATCACGTTCTTFGMTGCTTC ATTATAGTCCTCTTCATTTAGCMTCAGTGAGGCMTACACTGGCATCATGATCCCTTTTTTTAGGM
WI-7499a 33 CTCTGTACAAMTTCCCTTTGAMATATAMTTTTGGMATGAGTGATGA
TGGGMTAGTMGAGAMGATGGGAAAGGTGACCAMAACMTATAGAGGCAGAGGCCMGTGMT GCATCCCAGCAGCAGACCACTTNAAMGTAGTCCTGGTGCTGATFGCCTAGC[A/C]GGAGAGTTGAG TGCCACAGGTAAGAATGAGTGMGAGGAAAAMTCATGATGTCATGTATGCAGTMTTACTATGTCA
WI-7506b 118 GMGMMTATTTTAAMTATTGGACCACTCTTGTTCTACCATCCCTACCCACT
TGGGMTAGTMGAGAMGATGGGAMGGTGACCAMMCMTATAGAGGCAGAGGCCMGTGMT GCATCCCAGCAGCAGACCACTTNAAMGTAGTCCTGGTGCTGATTGCCTAGC[A/C]GGAGAGTTGAG TGCCACAGGTMGMTGAGTGMGAGGAAMMTCATGATGTCATGTATGCAGTMTTACTATGTCA
WI-7506 118 GMGAAMTATTTTAAMTATTGGACCACTCTTGTTCTACCATCCCTACCCACT
TGTGMTTCTTAGCTCTGGMGGTGTTTATGCCTTTGCGGGTTTCTTGATGTGTTCGCAGTGTCACCCA AGAGTCAGMCTGTACACATCCCAAMTTTGGTGGCCGTGGMCACATTCCCGGTGATAGMTTGCT AMTTGT[C/T]GTGAAATAGGTTAGM I I I I I CTTTAAATTATGGTTTTCTTATTCGTGAAAATTCGG
WI-7534b 143 AGAGTGCTGCTAAMTTGGATTGGTGTGATCTTTTTGGTAGTTGTMTTT
TGTGMTTCTTAGCTCTGGMGGTGTTTATGCCTTTGCGGGTTTCTTGATGTGTTCGCAGTGTCACCCA AGAGTCAGMCTGTACACATCCCAAMTTTGGTGGCCGTGGMCACATTCCCGGTGATAGMTTGC[T/C]MATTGTCGTGAAATAGGTTAGM I I I I I CTTTAMTTATGGTTTTCTTATTCGTGAAAATTCGG
WI-7534 135 AGAGTGCTGCTAAMTTGGATTGGTGTGATCTTTTTGGTAGTTGTMTTT
GGGMAGMTAAMTFAGCTTGAGCMCCTGGCTMGATAGAGGGGCTCTGGGAGACTTTGMGACC AGTCCTGTTTGCAGGGMGCCCCACTTGMGGMGMGTCTMGAGTGMGTAGGTGTGACTTGMC TAGATTGCATGCTTCCTCCTTTGCTCTT[G/A]GGMGACCAGCTTTGCAGTGACAGCTTGAGTGGGTT
WI-7543b 162 CTCTGCAGCCCTCAGATTATTTTTCCTCTGGCTCCTTGGATGTAGTCAGTTA
GGGAMGMTMMTTAGCTTGAGCMCCTGGCTMGATAGAGGGGCTCTGGGAGACTTTGMGACC AGTCCTGTTTGCAGGGMGCCCCACTTGMGGMGMGTCTMGAGTGMGTAGGTGTGACTTGMC TAGATTGCATGCTTCCTCCTTTGCTCTT[G/A]GGMGACCAGCTTTGCAGTGACAGCTTGAGTGGGTT
WI-7543 162 G A CTCTGCAGCCCTCAGATTATTTTTCCTCTGGCTCCTTGGATGTAGTCAGTTA
GGTGATCMGATCTGTTCCACAGGGCTMTGCCACCATCTCCCCTCAMATTTGTAGAGG[T/C]TCTA MMGAMGTGGTATGTTGTGTGATGATCAGCACTMGTCCTGCATTCCTGTTAMGCCACTTGGGTC ATMGMGGGMGTAAAAMTGAAGTCTGACTAGAMTTCTATTGCAGAGGCCMGTACATTTAGT
WI-7555C 60 T C ATGGCATTGAGTTGTGATATAGTTTTCATTTGATGTGCATTTTGMTTTCAG
Figure imgf000146_0001
Figure imgf000147_0001
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCAAMCCCMCATMGTGTΓTGCTTTCCTTTM AMTATGCATCAMTCGTCTCTCATTACTTTTCTCTGAGGGTΠTAGTA[A/G]ACAGTAGGAGTTMT AMGMGTFCATΠTGGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
WI-7577J 117 TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCMMCCCAACATMGTGTTTGCTTTCCTTTM AMTATGCA[T/C]CAAATCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMT MAGAAGTTCATTTTGGTTTACACGTAGGAMGAAGAGMGCATCAMGTGGAGATATGTTMCTAT
WI-7577J 77 TGTATAATGTGGCCTGTTATACATGACACTCTTCTGAATTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCMMCCCMCATM[G/C]TGTTTGCTTTCCTT TMMATATGCATCAMTCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMT AMGMGTTCATTTTGGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
WI-7577 50 TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCAAMCCCMCATAAGTGTTTGCTTTCCTTTM MATATGCATCAAATCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMTM AGMGTTCATTTTGGTTTACAC[G/A]TAGGAMGMGAGMGCATCMAGTGGAGATATGTTMCT
WI-7577g 157 ATTGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAAATAATCAAAACCCAACAT[A/G]AGTGTTTGCTTTCCTT
TMAAATATGCATCAMTCGTCTCTCATTACTTTTCTCTGAGGGTTT FAGTAMCAGTAGGAGTTMT
AMGMGTTCAT GGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
WI-7577f 48 G TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
AACCATGTTCCCTTCTTCTTAGCACCACAMTMTCMAACCCMCATMGTGTTTGCTTTCCTTTM MATATGCATCAAATC[G/A]TCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMT MAGMGTTCATTTTGGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
WI-7577e 84 TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCMMCCCMCATMGTGTTTGCTTTCCTTTM MATATGCATCMATCGTCTCTCAT[T/C]ACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTAAT AMGMGTTCATTTTGGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
WI-7577d 93 TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
Figure imgf000148_0001
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCMMCCCMCATMGTGTTTGCTTTCCTTTM MATATGCATCAMTCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMTM AGMGTTCATTTTGGTTTA[C/A]ACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCT
WI-7577C 154 C A ATTGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
Figure imgf000149_0001
Figure imgf000150_0001
Figure imgf000151_0001
Figure imgf000151_0002
Figure imgf000152_0001
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGACCCCAGGAGTCCCTGGTMTMGTACTGTG TACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGAGAG
WI-7743d 275 T GGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCCMCG
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGAC[C/A]CCAGGAGTCCCTGGTMTMGTACT GTGTACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGA
WI-7743e 106 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGACCCCAGGAGTCCCTGGTMTMGTACTGTG TACAGAATFCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGAGAG
WI-7743d 275 T GGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCCMCG
TTMATGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGAC[C/A]CCAGGAGTCCCTGGTMTMGTACT GTGTACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGA
WI-7743C 106 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
TTAMTGAGTGTGTFTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGACCCCAGGAGTCCCTGGTMTMGTACTGTG TACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGAGAG
WI-7743b 275 GGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCCMCG
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGAC[C/A]CCAGGAGTCCCTGGTMTMGTACT GTGTACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGA
WI-7743 106 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGACCCCAGGAGTCCCTGGTMTMGTACTGTG TACAGA TTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGAGAG
WI-7743 275 GGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCCMCG
TGACATTTATTCAMGTTMMGCAMCACTTACAGMTTATGAAGAGGTATCTGTTTAACATTTCC TCAGTCMGTTCAGAGTCTTCAGAGACTTCGTMTTMAGGMCAGAGTGAGAGACATCATCMGTG GAGAGAAATC[A/G]TAGTTTAAACTGCATTATAAATTTTATAACAGAATTAAAGTAGATTTTAAAA
WI-7758 144 GATMMTGTGTMTTTTGTTTATATTTTCCCATTTGGACTGTMCTGACTGCC
ACAGGGCCTTTGGCAGGTGCAGCCCCCACTGCCTTTGACCTGCCTCCCTTCATGCATGGAMTTCCCT TCATCTGGMCCATCAGAMCACCCTCACACTGGGACTTGCAAAMGGGTCAGTATGG[G/C]TTAGG GMMCATTCCATCCTFGAGTCAMAMTCTCMTTCTTCCCTATCTTTGCCACCCTCATGCTGTGTG
WI-7765b 126 ACTCAMCCAMTCACTGMCTTTGCTGAGCCTGTAMATAAMGGTCGGA
TTMTTTACTGATTCCAGCMGACCAMTCATTGTATCAGATTΛ I I I I I MGTTTTATCCGTAGTTT GATAAMGATTTTCCTATTCCTTGGTTCTGTCAGAGMCCTMTMGTGCTACTTTGCCATTMGGCA
GACTAGGGTTCATGTC I I I I LACCCTTTNNNNNNNNNTTGTAAMGTCTAGTTACCTACTTTTTCTTT
WI-7773b 237 G GATTTFCGACGTTTGACTAGCCATCTCMGCM[C/G]TTTCGACGTTTGA
TGCMCCTCTTTTCGTGATGGGCAGCCTGCTGGTCAGCACTCCAGTAGCGAGAGACGGCACCCAGMT CAGATCCCAGCTTCGGCATTTGATCAGACCAMCAGTGCTGTTTCCCGGGGAGGAMCACTTTTTTM TTACCCTTTTGCAGGCACCACCTTTAATCTGTTT[T/C]ATACCTTGCTTATTAMTGAGCGACTTMA
WI-7774b 170 ATGATTGAAMTMTGCTGTCCTTTAGTAGCMGTAAMTGTGTCTTGCT
GCAGAGACCTTCCMGGACATATTGCAGGATTCTGTMTAGTGMCATATGGAMGTATTAGAMTA TTTATTGTCTGTAAATACTGTAMTGCATTGGMTAAMCTGTCTCCCCCATTGCTCTATGAMCTGC ACATTGGTCATTGTGMTANNNNNNNNNNNGCCMGGCTMTCCMTTATTATTATCACATTTACCA
WI-7785C 165 TMTTTATTTTGTCCATTGATGTATTTATTTTGTAAATGTATCTTGGTGCTGC
GCAGAGACCTTCCMGGACATATTGCAGGATTCTGTMTAGTGMCATATGGAMGTATTAGAMTA TTTATTGTCTGTAAATACTGTAMTGCATTGGMTAAMCTGTCTCCCCCATTGCTCTATGAMCTGC ACATTGGTCATTGTGMTANNNNNNNNNNNGCCAAGGCTMTCCAATTATTATTATCACATTTACCA
WI-7785b 165 TAATTTATTTTGTCCATTGATGTATTTATTTTGTAMTGTATCTTGGTGCTGC
GCAGAGACCTTCCMGGACATATTGCAGGATTCTGTMTAGTGMCATATGGAMGTATTAGAMTA TTTATTGTCTGTAMTACTGTAMTGCATTGGMTMMCTGTCTCCCCCATTGCTCTATGAMCTGC ACATTGGTCATTGTGMTANN[-
/T]NNNNNNNNGCCAAGGCTMTCCMTTA'TTATTATCACATTTACCATMTTTATTTTGTCCATTGA
WI-7785 156 TGTATTTATTTTGTAMTGTATCTTGGTG
TCTCCCCCTCATCCMCTCCGAMGTCTGMTCFCCCMGGAGGGCACCATCTTACAGAGACTCTCCC TGACGGTGGMTTTM[G/A]TTTAGGGTCCCTAAMGCATTTGACACACAGTTGTTGMTGACTGAC CCAAMTGTGMTGMGCTMTGTGMTGTGAGTGMGCTCCCTTCAGGCCCGCTGCCCTAGGATAT
WI-7789C 84 GCCCTCCTGGTGACTCGGGGGCTGTCTCAGACGACTAGCCCAGGACCCATCT
TCTCCCCCTCATCCMCTCCGMAGTCTGMTCTCCCMGGAGGGCACCATCTTACAGAGACTCTCCC TGACGGTGGMTTTM[G/A]TTTAGGGTCCCTAAMGCATTTGACACACAGTTGTTGMTGACTGAC CCAAMTGTGMTGMGCTMTGTGMTGTGAGTGMGCTCCCTTCAGGCCCGCTGCCCTAGGATAT
WI-7789b 84 G A GCCCTCCTGGTGACTCGGGGGCTGTCTCAGACGACTAGCCCAGGACCCATCT
Figure imgf000155_0001
GCAGGAAATAGTCACTCATCCCACTCCACATMGGGGTTTAGTMGAGMGTCTGTCTGTCTGATGA TGGATAGGGGGCAMTC I I I I I CCCCTTTCTGTTMTAGTCATCACATTTCTATGCCAMCAGGMCG
ATCCATMCTTTAGT[C/T]TTAATGTACACATTGCATTTTGATAMATTMTTTTGTTGTTTCCTTTG
WI-7830d 150 T AGGTTGATCGTTGTGTTGTTTTGCTGCACTTTTTACTTTTTTGCGTGTGGA
GCAGGAMTAGTCACTCATCCCACTCCACATMGGGGTTTAGTMGAGMGTCT[G/A]TCTGTCTGA TGATGGATAGGGGGCAMTCTTTTTCCCCTTTCTGTTMTAGTCATCACATTTCTATGCCAMCAGGA ACGATCCATMCTTTAGTCTTAATGTACACATTGCATTTTGATAAMTTMTTTTGTTGTTTCCTTTG
WI-7830C 54 AGGTTGATCGTTGTGTTGTTTTGCTGCACTTTTTACTTTTTTGCGTGTGGA
GCAGGAMTAGTCACTCATCCCACTCCACATMGGGGTTTAGTMGAGMGTCTGTCTGTCTGATGA TGGATAGGGGGCAMTCTTTTTCCCCTTTCTGTTMTAGTCATCACATTTCTATGCCAMCAGGMC[G/A]ATCCATMCTTTAGTCTTAATGTACACATTGCATTTTGATAMATTMTTTTGTTGTTTCCTTTG
WI-7830b 134 AGGTTGATCGTTGTGTTGTTTTGCTGCACTTTTTACTTTTTTGCGTGTGGA
GCAGGAAATAGTCACTCATCCCACTCCACATMGGGGTTTAGTA[A/G]GAGMGTCTGTCTGTCTGA
TGATGGATAGGGGGCAMTCTTTTTCCCCTTTCTGTTMTAGTCATCACATTTCTATGCCAMCAGGA ACGATCCATMCTTTAGTCTTAATGTACACATTGCATTTTGATAAMTTMTTTTGTTGTTTCCTTTG
WI-7830 44 AGGTTGATCGTTGTGTTGTTTTGCTGCACTTTTTACTTTTTTGCGTGTGGA
CCACTTCCTATCTGATTTTTCCCAG[C/T]AAATGAGGCAGGCAATTCTAGTCTTCCACAAAACATCTA GCCATCTAAMTGGAGAGATGAATCATTCTACCTATACAMCMGCTAGCTATTAGAGGGTGGTTGG GGTATGCTACTCATMGATTTCAGGGTGTCTTCCMCTGMATCTCMTGTTCTCAGTACGAAAMC
WI-7865e 25 CTGAAATCACATGCCTATGTAAGGAMGTGCTATTCACCCAGTAMCCCAM
CCACTTCCTATCTGATTTTTCCCAGCAMTGAGGCAGGCMTTCTAGTCTTCCACAAMCATCTAGCC ATCTAAMTGGAGAGATGAATCATTCTACCTATACAMCMGCTAGCTATTAGAGGGTGGTTGGGGT ATGCTACTCATAAGATTTCAGGGTGTCTTCCMCTGAAATCTCMTGTTCTCAGTA[C/T]GAAMAC
WI-7865d 191 CTGAMTCACATGCCTATGTMGGMAGTGCTATTCACCCAGTAMCCCMA
CCACTTCCTATCTGATTTTTCCCAG[C/T]AAATGAGGCAGGCMTTCTAGTCTTCCACAAAACATCTA GCCATCTAAMTGGAGAGATGMTCATTCTACCTATACAMCMGCTAGCTATTAGAGGGTGGTTGG GGTATGCTACTCATMGATTTCAGGGTGTCTTCCMCTGMATCTCMTGTTCTCAGTACGAAAMC
WI-7865C 25 C CTGAMTCACATGCCTATGTMGGMAGTGCTATTCACCCAGTAMCCCMA
CCACTTCCTATCTGATTTTTCCCAGCAMTGAGGCAGGCMTTCTAGTCTTCCACAAMCATCTAGCC ATCTAAMTGGAGAGATGMTCATTCTACCTATACAMCMGCTAGCTATTAGAGGGTGGTTGGGGT ATGCTACTCATAAGATTTCAGGGTGTCTTCCMCTGMATCTCMTGTTCTCAGTA[C/T]GAAMAC
WI-7865b 191 C T CTGAMTCACATGCCTATGTMGGMAGTGCTATTCACCCAGTAMCCCAM
Figure imgf000157_0001
Figure imgf000158_0001
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATGCCCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGA[C/T]ACA AAMTGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
WI-7900d 128 T TATGATGTATTTCTGAGCTAAAACTCAACTATAGAAGACATTAAAAGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATG[C/T]CCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGACACA AAAATGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
WI-7900e 84 TATGATGTATTTCTGAGCTAAMCTCAACTATAGMGACATTAAMGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATGCCCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGA[C/T]ACA AAMTGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
WI-7900d 128 TATGATGTATTTCTGAGCTAAMCTCMCTATAGMGACATTAAAAGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATG[C/T]CCTGCCATTGAMCAGTGATTMGTTTGATCAAGCCATGGTGACACA AAMTGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
WI-7900C 84 TATGATGTATTTCTGAGCTAAMCTCMCTATAGMGACATTAMAGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATGCCCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGA[C/T]ACA AAAATGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
WI-7900b 128 T TATGATGTATTTCTGAGCTAAAACTCAACTATAGAAGACATTMAAGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATG[C/T]CCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGACACA AAMTGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
WI-7900 84 TATGATGTATTTCTGAGCTAAAACTCAACTATAGAAGACATTAAAAGAAATC
AGACTTAGGTACAATTGCTCCCCTTTTTATATA[C/T]AGACACACACAGGACACATATATTAMCAG ATTGTTTCATCATTGCATCTATTTTCCATATAGTCATCMGAGACCATTFTATAAMCATGGTMGAC CCTTTTTAAMCAMCTCCAGGCCCTTGGTTGCGGGTCGCTGGGTTATTGGGGCAGCGCCGTGGTCGT
WI-7901C 33 CACTCAGTCGCTCTGCATGCTCTCTGTCATACAGACAGGTMCCTAGTTCT
AGACTTAGGTACAATTGCTCCCCTTTTTATATA[C/T]AGACACACACAGGACACATATATFAMCAG ATTGTTTCATCATTGCATCTATTTTCCATATAGTCATCAAGAGACCATTTTATAAMCATGGTMGAC CCTTTTFAAMCAMCTCCAGGCCCTTGGTTGCGGGTCGCTGGGTTATTGGGGCAGCGCCGTGGTCGT
WI-7901b 33 CACTCAGTCGCTCTGCATGCTCTCTGTCATACAGACAGGTMCCTAGTTCT
Figure imgf000160_0001
Figure imgf000161_0001
ACMTCTCAGMGGACTGTGCMGTCMTGAGTCGCTTGTGMTTCTCATCTGGAM[C/T]GATCCC ACGTCTTAGMCCTTCACCACMGGAG I I I I I CTTGTAGTGATTCTCAMGTCTTGGTAGGCATTCGA ACTGGTCCTTTCACTTTGAGATTCTTTTCTTTTGCGCCTCTTATCMGTCAGCACACACCTTTTCCMG
WI-8021b 57 GATTTTACGTTGCGGCTTGTTAGGGGTGATTCGMTTCGGTGMTTGCCA
ACMTCTCAGMGGACTGTGCMGTCMTGAGTCGCTTGTGMTTCTCATCTGGAM[C/T]GATCCC ACGTCTTAGMCCTTCACCACMGGAGTTTTTCTTGTAGTGATTCTCAMGTCTTGGTAGGCATTCGA ACTGGTCCTTTCACTTTGAGATTCTTTTCTTTTGCGCCTCTTATCMGTCAGCACACACCTTTTCCMG
WI-8021 57 GATTTTACGTTGCGGCTTGTTAGGGGTGATTCGMTTCGGTGMTTGCCA
CTGAAMTTTACTATGCTCTCCACMCMGAGCTCCCATTTTCCACAGACACAGTCMTGTCAGTCA GCTTGTATTCAGGAGGACAGGGCAGAGGGATCCCAGTGGCACTTCCCATGGGMGACAGMGAGAGT GGGCCCCAGAGATGGMGGACCCCAGTGTCATCACCAMCMCCATTTCAGCCGCTCTAGCCTCTM
WI-8024C 206 TTCCC[A/G]CTCTAGMCAGCTGGCCCTGGTCGTCAGTACACMGGAMGAGC
CTGAAAATTTACTATGCTCTCCACAACMGAGCTCCCATTTTCCACAGACACAGTCMTGTCAGTCA GCTTGTATFCAGGAGGACAGGGCAGAGGGATCCCAGTGGCACTTCCCATGGGMGACAGMGAGAGT GGGCCCCAGAGATGGMGGACCCCAGTGTCATCACCAMCMCCATTTCAGCCGCTCTAGCCTCTM
WI-8024b 206 TTCCC[A/G]CTCTAGMCAGCTGGCCCTGGTCGTCAGTACACMGGMAGAGC
GMTGAGCCTTCCTAGCGCCGAGGGACCTGCTGCTGTTGTTGGCCTGCACATGCATTCTATGGMTGC TTTTTGGCCMGCGGGGGCACTGAGGACTMGCTCTGANNNNNNNNNATCTCGCCCAMCTCCTTTCT MGGAGTCTGGGGTGTCATGCCCTACAMCC[A/G]TAAATTCTCATCAGATGGATTTTATTTMCGTT
WI-8077 167 GTGTATTGTGACTTACTTTCCAATCTGACTCTGGCATMCMGGGMAAA
TCTAGGTTTMTCMAGCMTTTGCANTTTGGATTTTGGMTGACCACTCCCTTGCTMGGMGCTAT GTACTTCATGCTGTGGAMCTGGCAMTACAGMTGTAGCTTGTTT[G/C]TTTTCTTAGCCTTGMGA TGACCAGGTAGAGAGACAGAGTGAGACCMCAGTTTTTCTGATTTCCCTGCTCCTCCTATTCCTTCCT
WI-8118f 114 AAAMTCAGACTCATTGTGACCAGTAGTCTTGAGGACTCMGCTGMTGA
TCTAGGTTFMTCMAGCMTTTGCANTTTGGATTTTGGA[A/G]TGACCACTCCCTTGCTMGGMGC TATGTACTTCATGCTGTGGAMCTGGCAMTACAGMTGTAGCTTGTTTGTTTTCTTAGCCTTGMGA TGACCAGGTAGAGAGACAGAGTGAGACCMCAGTTFTTCTGATTTCCCTGCTCCTCCTATTCCTTCCT
WI-8118e 40 A G AAAAATCAGACTCATTGTGACCAGTAGTCTTGAGGACTCMGCTGMTGA
TCTAGGTTTMTCMAGCMTTTGCANTTTGGATTTTGGMTGACCACTCCCTTGCTMGGMGCTAT GTACTTCATGCTGTGGAMCTGGCMATACAGMTGTAGCTTGTTTGTTT[T/G]CTTAGCCTTGMGA TGACCAGGTAGAGAGACAGAGTGAGACCMCAGTTTTTCTGATTTCCCTGCTCCTCCTATTCCTTCCT
WI-8118d 118 T G AAAMTCAGACTCATTGTGACCAGTAGTCTTGAGGACTCMGCTGMTGA
Figure imgf000163_0001
TTFTTAMTATGCCCGTTTAGAGCAGACACAGTCACMTMMGTTMAMGTTACMTGTGTCCAG TGTATATACCCAGGNMTCCATTCTTGGTACTTTTCMGAGCTGCTGTTATACTGAGTCTCTGAGMG TCCCCTTAGATMTAGCTGCCACTTTTCAGTATGGTTCAGMT[G/A]AGTATCTTAGTATTCTTTCTA
WI-8321 178 TTTTGCTATGGTTCTAGTTTATCMCCTACTTTATTAGCTGMCTGTTGGC
TTTTTMATATGCCCGTTTAGAGCAGACACAGTCACMTMMGTTMAMGTTACMTGTGTCCAG TGTATATACCCAGGNMTCCATTCTTGGTACTTTTCMGAGCTGCTGTTATACTGAGTCTCTGAGMG TCCCCTTAGATAATAGCTGCCACTTTTCAGTATGGTTCAGAAT[G/A]AGTATCTTAGTATTCTTTCTA
WI-8321 178 TTTTGCTATGGTTCTAGTTTATCMCCTACTTTATTAGCTGMCTGTTGGC
TATGTACTCACTTTCAGTTACCCCCGTGCCTCCAGMTCGCATGTTGCTCCACCTGGGGGCGGATATA MTTACCTCTAGATTGTCCAMGCCCAGTCTTTCCCTTCCCTGTGCAGCCTTAGA[A/C]ACTMGTAG CAGTACTGTTTGGTGTGTGTTTGTTTCTTCCCCAGCMTGCCTACTGCAGCTACTTAGTMCMCTAG
WI-8332b 123 AGGTGGAGGGTNTCCGGGGMGCAGTTAGATGAGTTMGTGTGATGCACA
TATGTACTCACTTTCAGTTACCCCCGTGCCTCCAGMTCGCATGTTGCTCCACCTGGGGGCGGATATA MTTACCTCTAGATTGTCCAMGCCCAGTCTTTCCCTTCCCTGTGC[A/C]GCCTTAGAMCTMGTAG CAGTACTGTTTGGTGTGTGTTTGTTTCTTCCCCAGCMTGCCTACTGCAGCTACTTAGTMCMCTAG
WI-8332 114 AGGTGGAGGGTNTCCGGGGMGCAGTTAGATGAGTTMGTGTGATGCACA
TGCGGGCTTMCAGGMGCATGACTGGGAGGCCTCAGGMGCTTATMTCATGGCAGMGGCGMGG GGMGCMGGACCTTCTTCACATGGCAGCAGGAGAMGAGMGMGGGAGMGTCTACACACTTTT AMCMCCAGATCTCATGAGANTTCCATCGGGAGACAGCACTAGGGGGATGGCACTAMCCATTAGA
WI-8378b 311 MCTGCCCCCATGATCCMTCACCTNTCACCAGGCCCCTCCTCCMCACGTGGGG
TGCGGGCTTMCAGGMGCATGACTGGGAGGCCTCAGGMGCTTATMTCATGGCAGMGGCGMGG GGMGCMGGACCTTCTTCACATGGCAGCAGGAGAMGAGMGMGGGAGMGTCTACACACTTTT MACMCCAGATCTCATGAGANTFCCATCGGGAGACAGCACTAGGGGGATGGCACTAMCCATTAGA
WI-8378 308 MCTGCCCCCATGATCCMTCACCTNTCACCAGGCCCCTCCTCCMCACGTGGGG
AGCACATATTTAGCATTMGCCTCMACGATACAGCMTATGTTACATTCTCTTGTGAAMCAG TTGTTGTAGACTGTTMNNNNNNNNMATGTMCTCCGACTTGTGCCTMTAGGATTTGACCNTTAA GAGGNTTCTTTTGCTGTGGANGGGGTGGCTTTGCTTGMCTTCCATTCTG[T/G]GCCTTGTAGCTGGTG
WI-8426 184 G AGGCTGGGAGTATGGANGGNCCCGGGGCCCTTGGCNATNGNATFCAGTGAG
TTGAGCCTCCACAMTMTGCAACCMGTTTTACATTTTTMCAGCCCTTCTACATACACT[C/A]CA TCTTCTCTATCTTAGTTCCMGTTTTAGTTTTCMTCCCMπATACCMTTCCATTGTTATTTTMGA MAMCCTTCCCAGTTATTGTCAGAMCTATGATTTAGCTTACCCCCTCCACTACCCAGCAMCTAC
WI-8450h 61 C A AGAGAGGATGGGAGTGTMTATGAGCAGTACAGAGTCTTMTGCMTTCAT
Figure imgf000165_0001
Figure imgf000166_0001
Figure imgf000167_0001
Figure imgf000168_0001
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGMMTTGGCMTCT TTA
GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCCCC[C/A]ATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTG
WI-9676h 134 AGGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAAAGTCTGTCACAGTCCTCCATATGGCAMGATGAAGAAAATTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCAAGATGTGGCTTTCCTGCCCCC ATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTGAGG[C/T]
WI-9676g 202 CAGGGTCTCTCAGCTTTAMGCCTTGGAATCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGMMTTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCCCCC ATTTCACCTCMGGCATCTTCAGCAACCCCACATGGCTTCCCTCTGTGC[G/T]CATGAMTAACTTGA
WI-9676f 184 GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGAAGAAMTTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCCCCC ATTTCACCTCMGGCATCTTCAGCAACCCCACATGGCT[T/C]CCCTCTGTGCGCATGAMTMCTTGA
WI-9676e 173 T GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGMMTTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCCCC[C/A]ATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTG
WI-9676d 134 AGGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGAAMTTGGCMTCT TA
GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAAATGCAGAGCA[A/G]GATGTGGCTTTCCTGCC CCCATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTGA
WI-9676C 114 GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAAAGTCTGTCACAGTCCTCCATATGGCAMGATGMGAAMTTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTA[C/T]GGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCC CCCATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTGA
WI-9676b 92 GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGMMTTGGCMTCTTTTTA GGGGTACCMGGNTCTG[A/C]GTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCC CCCATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTGA
WI-9676a 84 A C GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
TGGACCAMCACAGACAGATGTATTCCTGGTGCCTGTGTA[C/A]ATTACMCTCATTGATCACATGC AGCMCATCMCATCTCMGGAGTCCATTTGTTCAAAACACAGTAMTGACTCCACATTTTCCCTTT GAGTCMCAAMGACTCTGCTTGTCACCTTGCCTGGAGCGGGGTGGTTTTTCACTATGTGAGTATCTA
WI-9738b 40 TCTTTTTATTTCTGTCCCTTATGTTGGTGGGCACATGTCTGTATTGCTGTCC
TGGACCAMCACAGACAGATGTATTCCTGGTGCCTGTGTA[C/A]ATTACMCTCATTGATCACATGC AGCMCATCMCATCTCMGGAGTCCATTTGTTCAAMCACAGTAAATGACTCCACATTTTCCCTTT GAGTCMCAAMGACTCTGCTTGTCACCTTGCCTGGAGCGGGGTGGTTTTTCACTATGTGAGTATCTA
WI-9738 40 TCTTTTTATTTCTGTCCCTTATGTTGGTGGGCACATGTCTGTATTGCTGTCC
ACTGAMTGTAMTGGCCMGGCACCCAGGACCTTAAAMTCATMGMGTTMTCTGTGGGMM GAGTMCTACAAMGCATCTAMCMGAGCAGGATGTGATGTAATGTGTCCCCTTATCACTTTAGTC AGTAAAGATMGMAGCCCTGGTGAGTATCCACTTCCACAMCACACAGMTATACACTTTTGGMG
WI-9756 47 ATΓTCCACTTMCCACTTGATTCTTCAC I I I I I I ATGATTTAAAACTCTCCGTGG
GATGGTCCCTTMGGATTTGCATTGGTTMTGGGCAGACTGGTGCAAMGAGGCTGMTTGMTMT TAGGMACTGGGAGMTTCAATTCAMGMGMTTCTTGTTCGCMGGTCMTTTTTATACTATTTA A[A/G]TAAAATAACTCTGGTAGGTTCTATAGCAMTGCTAAGTAAAGTAACCGCTGGTTTCTAAATT
WI-9758 135 A G ATTACG
ATTTAAATCCAGGCAGCGGGGAAMTGGATACTTTCATATGTCTCTGTACCCMCTATAMCTTTTG GTTCTCATGCACCATTTTCATTTTGCCTTCTCACTCCMGTACCACTGATTTTACCMTT[G/A]CTCTC ATMTTGACTTTGCTACTGGMGAMCTCTTAGMTGTFGGMTTTCTCTATTACACACTTTGCCTCA
WI-9778 127 MGMTGTGTCAGTCAGGACTAMGGCMTAGTCTCAGGGCAGACAGCC
TCTCCCCTTTGCCTCCTCATGCCCACTCCCTCAGCCTGCACAGAGCGTTTCTCCAGTGTAGTCTCTGGT CCATCTGCATCAAMTCACCTGCAGGACTTGCTGACMTGCAGTTTC[C/A]TGGATCCCACCCAGGA CTCAAAAAMCTAGGMTTGGGAGMGAGGGACCTGGMTCGGTGTTGCTAGCMGCCCCCAGGTGG
WI-9832 116 A TTTGTMGTGGACTAMGTTTGAGGACCAGACATGGMGGTTGGCTTTGGC
TGGAMMTAGC" TTTATCAATCTCTGATATGCTACATATGTCATGGAGAMTGCAGMTGGCATGA TATGAAATTCCA" TTTTGAATGAATAAAATATAC[A/G]TGTGTATGTATATATACTTATFAACACTT
AGGATTATATACACACMTMAACGTCTGTMGGATAMCTMGGTTCTATCAGTGGGAMTGAGA
WI-9841 101 G TTGAAMGAGGGGGATGTGTTACTTGATATGCTGTTG
GMCTMCACCTTTCTTGCATGGA I I I I I CTTGATTATTGGCAGTTAACMTMMTGTTATTAGATC ACTGGTGCTTCTGTGTGGGGTTGAG I I I I I l ATGATATCTCCTGTTAGACCCATMGGGAGGCTGTGA GTTGTTTTCTACATCCTTGGACTATATAAGATCCTCTTTTMMTTATATTTTATATMGCACATGAA
WI-9880C 222 G A AATGGAATGAAATAATGA[G/A]TTGACATAGGAATTACCTACATATTTTG
Figure imgf000171_0001
Figure imgf000172_0001
Figure imgf000172_0002
Figure imgf000173_0001
Figure imgf000174_0001
Figure imgf000174_0002
Figure imgf000175_0001
Figure imgf000175_0002
Figure imgf000176_0001
Figure imgf000177_0001
AGTATACAMCATTTMGCTGTGGTCMGGCTACAGATGTGCTGACMGGCACTTCATGTAMGTGT CAGMGGAGCTACAAMCCTACCCTCA[A/G]TGAGCATGGTACTTGGCCTTTGGAGGMCMTCGGC TGCATTGMGATCCAGCTGCCTATTGATTTMGCTTTCCTGTTGMTGACAMGTATGTGGTTTTGTA
DWU-252 94 AT
GMCATTCCTCTGCAGCACTTCACTACCAMTGAGCATTAGCTACTTTTCAGMTTGMGGAGAAM TGCATTATGTGGACTGAA[C/T]CGACTTTTCTAAAGCTCTGMCAAAAGCTTTTCTTFCCTTTTGCM CMGACAMGCMAGCCACATTTTGCATTAGACAGATGACGGCTGCTCGMGMCMTGTCAGAM
DWU-330 85 CTCGATGMTGTGTTGATTTGAGAMTTTTACTGACAGAAATGCMTCTCCCT
GAAAATGTTAATTGGGCAGGTGAAMGGGTACAGATGTGCTGTAGCAGACCTTTGGTTTTAAMGAG
MGCATCATTTCCCCMCAGGGCMCTGTAGMGGCCAGCTGMGAGTAMGGAAMGGTCTGAGG
ACTGAGCCTGTGGCTGGCTGGAAAMGGTGMTGTTGAGGGCCCTTCACTTCCATCACMGAMGTC
DWU-370 231 ATTAGACGGTACCMTTCAGTGTCTGTTCCT[A/G]GCATCTATTTCCTCTGTGC
CTCTTAACTTCAGTTCCCTCATCTATAAGMTAAGGGATTCAGTTGTGATCACATAGCTCAGGTAATC
DWU- CAGGACCAGAMCCCAGGAGC[A GJTGGGACCTGATCCACAGCTAGAGGATGGGGGACTCTGTAGCT 1537b 89 ACAGCATTTTCCTGMCACACMGMATCCAGTMGCAGCACACACTGGCTGA
CTCTTMCTTCAGTTCCCTCATCTATMGMTMGGGATTCAGTTGTGATCA[C/T]ATAGCTCAGGTA! -4
DWU- ATCCAGGACCAGAMCCCAGGAGCATGGGACCTGATCCACAGCTAGAGGATGGGGGACTCTGTAGCT 1537a 52 T ACAGCATTTTCCTGMCACACMGAMTCCAGTMGCAGCACACACTGGCTGA
ACCATCTTATACTATGGCAGGTMGTCCATACAGMGAGCCCTCTCTCCCTGGGATTTGAGTGGGGTC CCCAGCTCCACCCAGAGGCCCCTGGGGMTTCCAGGGTCACTGTTCCTTCCTGTCTCCCTGTGGGMT
ESTD- CMGCCAGCTCCAGGCCAGMGTGGGACTGTGAGGACATGGAGGCCTCGGCACTGAGCTG[C/G]AGA ADAb 1 96 CCCGCAGACCMCTCCTGAGCTTTCTGGGCCTCTGAGTCTTGTCCTC
ACCATCTTATACTATGGCAGGTMGTCCATACAGMGAGCCCTCTCTCCCTGGGATTTGAGTGGGGTC CCCAGCTCCACCCAGAGGCCCCTGGGGAATTCCAGGGTCACTGTTCCTTCCTGTCTCCCTGTGGGMT
ESTD- CMGCCAGCTCCAGGCCAGMGTGGGACTGTGAGGACATGGAGGCCTC[G/A]GCACTGAGCTGCAGA ADAa 1 84 CCCGCAGACCMCTCCTGAGCTTTCTGGGCCTCTGAGTCTTGTCCTC
TCTCCTGTCATTCCTACTCCATTAGTTCMGGTCAGTGMGMCTGGGGCMTTMCCMGTMTTCA
ESTD- TGGACTGCCCMCTGCGMACMGMGGGCGCAGTGGAGCAGGAGTATTATGCTACGCGGTTACCTT ANT1 1 60 Tl TTTTTATGGAGGACCGMCTGAGGCrr/qGAGCTCAGATGATCCTGT
TGCCTGGGGTGGCMGGCTGCAMCMGGAGGCMCCCAGGAGGCTTTTATGMGCGGGCCATGGTA
EST10398I AGATGCTGCCACCTCTTATCTACTTGATGATGTTCACATTTGGGGCTTGACTTTCCMCACGGAGMG 2b 1 68; A' G - CATTGTTTTCTTCGGGCCMGMGGTATCTACCrA/GIATAGTGTCTATTAGGCATTTG
Figures imgf000179_0001 through imgf000210_0001
AAAGCATGAC CGCTTATGTTA AATAAAATGA ATAGTMTTCC CMGTGAATATTGATACATGGCTGACMAGCATGACMTMMTGMCAC[A/G]TACGGGMTTAC
WI-17904 50 G ACAC 03 TATTAACATMGCGATAACATCAAAACATCTGGTAMATGCAGTTAAAACMCAACACAMTGA
TGCCAMTAC MCTACTAGCG G I I I M I CTTTGAGTGACACMGCTTGTTCA I I I I I GAGAAMTGTGTGCCMATACTCMGTGTGM
EST34149 TCMGTGTGA AGMCAACTA T[A G]GATTTTATTAGTTGTTCTCGCTAGTAGT ΓGGTATTCTATGMAMMGCAGCTAGTTCAGC 5 69 AT ATAAAATC TT ACAAATCACACAAGT
TGGGAAMCATMGTTMCTCMGMTATATFCCAGTCTTTATGTTACTMMCATTGTMTAGTGT
EST34343 TTTTATCAATGATGCCGAGGTCACTGCT[C/A]TACAMGATTAMGMACTTACCATCAAACACTTC 8 95 CAGTGCATCM
GGACCATATG CAGAMTTATG GGTACACAATTTTMTGGMGGMCCACAGGTATGTTGAMGMCATCAGTACAGCTGGAGACAGG ATATATAACT TGATAATAACT GAGGGACCATATGATATATMCTCCTAMAGC[C ]GGMGGAGTTATTATCACATMTTTCTGGGC
WI-17982 98 CCTAAMGC CCTTCC GCTACAGMG I I I I I CATCA
CTCAGTMCTCCGGTGTATMTCTGCCATTTATTGATTTATTTATGATAAMCMCCTCTCATTGTGA AAMCAGCTMGGGTGACATCTCCAGACCCMCCACTGTCCCTGTMTGT[A/C]CTGCTGAGAGTCC
WI-17993 1 1 8 ACATTTTGGAAATCCAAT
CCCATCCAGAMCCCCAGTGTGATGGTGGMGCAGCATGAAMCMCATCTCCCCAGGCCTCGCAGT
GTAGAGGCGA AGGCACATGGG AGAGGCGAAGGGMCAG[A/G]GCTGCCCATGTGCCTGTCTCTAMGACGCCACCCTCAGGTTGATGT
Figure imgf000211_0001
WI-17996 84 G AGGGMCAG CAGC CACCTGTGGGAGACCGGGT
ATTCTTTATAAAAACACCATGTCCCTAAAATGT[C/G]ATTCMCATATATGCACACCTTCGATGTAT
WI-17136 33 AGGACACTGATCAAMMGACAGAGAMTGTGTCCCT
GCCACTGAAAAMGGTGCTCTTCC[A/C]GTTTCTMCTCCCTGGACTCCCTCATTGGMCTGMGCTC ACAGATGTTTCAGCTGGACTAGT1TAGACTTTGCTGTATTTTAMAGGCAGTGTTGATGCTCCAGGAT
WI-18041 24 TCAAATACTTAATCA
EST35164 CACAGCCCTGC CCCTCTGGATT TTGMCCMGGCCCTMCAGATGACTCAGCAGGGCCTTCMGCACAGCCCTGCCCCC[AG]TCTTGA 8a 57 G CCCC CTGMTCTCM GATTCAGMTCCAGAGGGTGCTCAGTCCTTGGTTTAGGTGCTTCTGTGACATTTCCTCTTG
AGCGMTGMMTGCTACATAGGCTCCCTGAGTTCTTTCATGTACGMTCTTGGTTACACATCTTAGl
Wl- AGJACAGCAGAGCTGCCTGAGGGAGGGTTGTGTTTMTGTCGTATGCATGCTCAGCACAGTGCTGGC 18052b 67 ATGGCCCATCCATGCTTT
CCTGAGTTCTT AGCGMTGMMTGCTACATAGGCTCCCTGAGTTCTTTCATGTACGMTC[T/C]TGGTTACACATCTT
Wl- TCATGTACGA CTCAGGCAGCT AGMCAGCAGAGCTGCCTGAGGGAGGGTTGTGTTTMTGTCGTATGCATGCTCAGCACAGTGCTGGC 18052a 50 ATC CTGCTGT ATGGCCCATCCATGCTTT
GGGAGTGGGG CGTCACCCTGC CTGTTGTGCTGAGMCAGMGGGGTCMGGGAGTGGGGGAGTAAAA[G/A]TGGMGCAGGGTGACG
WI-18054 46 GAGTAAAA TTCCA CATGCAGGAGTCCAGACAAMGACGGGTGATTTTGCTCAGGTTGGTAGCMCAGAGGTMTG
Figures imgf000212_0001 through imgf000216_0001
TCATCTGAGA CATTATAGGTA TCC I I I I I ATTCATGATTTGTTTCATCTGAGAATAMCTTCCTGTCTMTTTTCCAA[C/G]ACTATGTT
EST39236 ATAMCTTCCTj CTGAGTCATAC TAATGTATGACTCAGTACCTATMTGAGACTGGAMTATATTACCTGGCAMTGMTGAGGTGTCTC Ob 57 GTCT ATTAAACA
GCACMTTAA CAMCAGACCTTTGGTTTGAGCTCACCTGGTGACAGGAGACTCCTACCTGAMCAGGGATGCC[G/η
EST39294
Figure imgf000217_0001
CCTGAMCAG ACATAGTACCG TTCTCGGTACTATGTTTMTTGTGCTGAGCCAGCMCCCTCGAGTTACCCGGCCTTTTACCCCACGCC 4 63 GGATGCC AGAA AGCTCTGCTTGTCTGCAT
AGMAACATTCTGTCTGATCAGAGGMGATGTATGTAGAAMTCAGMTCTGACTGMTTCCTMA
EST39366 ATCTAT[T/C]ACACTGAGAGGAAAATGGAAAAGAAMTGTTTGCATAMGCTTTTCCCTGACTCTCA 2 GAGGGGTTCAGA
TGATTTGAGAC AAAMGCTGTAGCTGGCMGTCAMGTTTATTTTATGTGTGTAMTTCCCAGTTGAGCATTFTTTCAT
EST39371 CATTTGGATTA ATTTCACATTT TTGGATTAGCGTGAGAGG[A/G]AAAAATGTGAMTGTCTCMATCAAATGCTTCCTTCTMAGATTA 9 86 GCGTGAGAGG TT GACATTGCCCMCCCTGC _
ACMGTGACATATCCMCCMCC[A/G]TCCATCCCCACCTGTGCCCTATTCTTTCCTTGTGTTTCTTT AGAGCCTTTTCAGCTATTTCCTGTGMGCAMCTGCACGMGGCCTCCCCCGTACTCCTCCCCTGGM
WI-17177 23 G _
AGGTTCCTGGTTGCTCCCCACMTTTTGATTIC/TJGGTGGCTTCATAAGGGACCCAGGATFCTGCATT S
EST39428 GCTCCCCACA GGTCCCTTATG TTCTGGGTGGGGCCTAGGTMTTCTGTTGCCTTTGGTCCACAGAGCACMTTAMGMGATCAGGTCT CΛ 8 31 T ATTTTGATT AAGCCACC GGCTGTTGC _
GGCAGAGGM
EST39430 TMCTGATGTT CAGGGGTCGGG MTTTAGCAGAMCMTGMGTTGGCAGAGGMTAACTGATGTTC[A/C]CAATACCCCGACCCCTGA 2 45 C GTATTG CCCAGTACCTTTCCCTCAGGCCCAGGCTCCGGTGGAGGATGTCCTGGG
CTACTGACAT AAAGCCCTGTAMCTGMGCTAGACMCGTCMCTTTGGMGMAATMCAGGMCCTATTTATAT
EST39446 AGGGACTTCA TCCTGGAAMC ACGTAMTCACTTTCATACCTGCCTACTGACATAGGGACTTCAGAGTMTA[CAF]GGTTTATGTCAGT 7b 117. GAGTM TGACATAMCC TTTCCAGGATTGTTCTCCC
EST39465 MTGCAGGAG CMTCTCGGCC ATGGTGTCATTAGAGGGCCACAGGGGATGGGGGAGTAAMAATMCATAMCGMCTGMCAGAM 2 80 GGTGGC CCTCT TGCAGGAGGGTGGC[A/G]AGAGGGGCCGAGATFGGGTGTTCAGGGCAGAGAGGTGGMGACCAG
AMGATTCCT
EST39501 GTAGACATCT CACTTGCMTT TGCTTACMCCCATMCCATAGGCCATGTGTTCAGACATTCTTGACCMGCCTMAGATTCCTGTAG 0 81 MCATTAG CTGMGGCT ACATCTMCATTAG A/GJTAGCCTTCAGMTTGCMGTGCMGTTCAAGTCMACCMTTC
CACAAMTGGGACTGCTGMGAGTGGACAGTTGGACCTTACTTTGGTGACCCCATACATTTGTGGTCAI
WI- CATGCTTTAGCCATA[A/C]CATGGTMCATTGACTATGGAGTCTTGTGAMGTGTMTGTGCGATG 1 8387b 84 A C - GCTATGTAGACATAAAGA
Figures imgf000218_0001 through imgf000224_0002
CAGGCAGGACTTCAGTGTCAGTATCCCTGCCTTCAGTCTTCTTTAGAMTCACATCTGTGTTCMTCC ATTGTTTAGAGGGAGTGTA M i l l CCTGTTCCA[C/ηGMGAGGACTTTTTGTTCACMTTGGATCAC
D63807 1 01 T MTGCAGAGGAGTCTGTTCCTCCCCCGTCGGCTTCTCGGTGCTGGGAGGGTGACCTGTCCCAGATGAC
TGGGMCATGCGTGTGACCTC[T/C]ACAGCTACCTCTTCTATGGACTGGTTATTGCCAMCAGCCACA CTGTGGGACTCTTCTTAACTTAAATTTTAATTTATTTATACTATTTAG I I I I I ATAATTTATTTTTGAT TTCACAGTGTGTTTGTGATTGTTTGCTCTGAGAGTTCCCCCTGTCCCCTCCACCTTCCCTCACAGTGTG
D90145 21 C - TCTGGTG
EST14035 ATTATCACTCTCMAAATTTTGGTGTGTGTGTTTMGTACTTTCTTATTTATGAGCCCC[T/C]GAGGA 1 a 59 c - CCAGACATGTTATTATCMGCCCCTTATATACCATCTMT
EST16668 GCATTTTAAAATTCACATTGMTCATTATTTACTATTTATGATGTTTACATAACMTTCAGTATCATT 5 71 T ATG[C/T]TGTAGATTTCAGATGTAGGTCGTCAATACTGAGCACTTATCT
EST16904 ACAGACTATCGCCAACTTATMTGCTTAAACTTTATGATCMTAGTAATAMTTACA[C/T]GAGATA 7 5_7 T TTCACACTTTATTATAAAATAGGGTTTGTGTMGATGA I I I I I CCCMCTGTAGGTTMCAT
EST21863 TTTTTMGTACCAGAGGCACTGCTGGMCAGGATGAAMCTGATACACC[A/G]GTTACTACTTACTC 9 49 G - TTCACTCTTCAMCTGATTCCCCTAMGACTTCTACTTAGCAM
EST21885 GGCTGTMGTAGAATCAMGGTTMGMCATTTTATGCACTTATTCCACAAACATTTACTGAGCATA
6 80 A CTAGGTGCTGGGA[G/A]TGTGACAGTGAGCAAAMACACM
EST22623 ATTTTAGTGCAAATGACAMGCCCAA[A/G]AGMCAGAGGATCAMTAAGATTGAAATGTATTACC 8a 26 G_ TTCTCATMGTATACGAAGTTTAACACMGTATGGGAGT
EST22644 AAAATGATTGMTTCAGCAAGTACATTTATGATCTATCTACATTGTTAAAACAGCACTAAAAATAA 2 98 G MA M I L L AAAATGATTATCCATTATTTACAG A/GJAAATGTGGAAAAGATGGCTTTTAAACCC
EST23587 | CCTCATTTATTTAAAAAGACGGACATAAAAA[T/A]TATACAACAAAMACCCAAGTCACATTTCAG 1 31 A GAGGTAAMACTAAAMGTCTGATATGAMATATGGTGG
MAGATCTGGCATTATTCACATCATTCTMATATTTTGTAATTAC I I I I I CCATGAGTATTTTTTTCA
EST24246 TGTCCMGCATTTTMCTATCATTTTAGCGTAMTACCΓF/C]GMTAACCCATAGTTACAGAATTGG
7 1 06 GTCTGTGTMCCTCMTT
TAGTTTMTTTTCTGMCCTTTGGCTTATAM I I I I I CTCAACTT[A/G]CATTTAAAMTGTATCAAT 45 GCACCTTCTTCAGTAGTACCACATGAAMTATAMCCTCGTTC
EST24435 CTTGMCTTCTGGTCTCMGTGGTACGTCCGTCTCMCCTCCCAAMTGATGCGATTACAGGCATMG
(3 73 CAGCCLGYALTGCCTGACCCACATTTFCTTTATCCGATCTGTTGATGGACATTCAGGTTGTTTC
EST25089 TATTGTTGCATTATCAAAATGGTTAΓF/C]AGTTTTCMTTAAAACTGTAATTGATTTCTATGTATAAA 6 25 ACAGCTFTGMGTTGTMATGTAGTTTCCMTCGTTAGTTMTGCTACATT
Figures imgf000226_0001 through imgf000233_0001
AGTTGCCAGCTCCCATGTACCAGCAGCTGGMTCTGMGGCGTGAGTCTTCATCTTAGGGCATCGCTC CTCCTCAC[G/A]CCACAMTCTGGTGCCTCTCTCTTGCTTACAMTGTCTAGGTCCCCACTGCCTGCT GGAMGMMCACACTCCTTTGCTTAGCCCACAGTTCTCCATTTCACTTGACCCCTGCCCACCTCTCC
U31416C 76 MCCTMCTGGCTTACTTCCT
AGTTGCCAGCTCCCATGTACCAGCAGCTGGMTCTGMGGCGTGAGTCTTCATCTTAGGGCATCGCTC [C/T]TCCTCACGCCACAMTCTGGTGCCTCTCTCTTGCTTACAMTGTCTAGGTCCCCACTGCCTGCTG GAMGMMCACACTCCTTTGCTTAGCCCACAGTTCTCCATTTCACTTGACCCCTGCCCACCTCTCCA
U31416b 68 ACCTMCTGGCTTACTTCCT
ACGGGTCACACAGAGAMCCTGAGTCTAGCCATGAGGGGCTTATGCTCCCMCTCACATTGTTCCTCC AGACCGCAGG[CTJTCCCCCAGCCTCAGGTTGCTGGAGCTGTCACATGACTGCATCCTGCCTGCCAGG GCTGCAMGCMGGTCTTGCTTCTATCTGGGGGACGCTGCTCGAGAGAGGCCGAGAGGCCGCAGMC
U37519a 78 ATGCCAGGTGTCC
GACCACGCTGAMCCCACCCACCCGCTGTGCTGACCATGGGCCCTGAGCGTCCT[A/G]CCCCGMTTC ACGAGGCTGAGGCATCCGGGAGCTGGCGTMTGCCTGGCCGCAGTGTGTGTGTATCCCATACCCCACT
U37690 54 A! G CTGGMGGMCCATCCAGTAMGGTCTTT
TGAAACCGTTTCMCATGGAAATGATCTGTATTGACTM[T/C]ACACCAGTCCACACTTCTATGACT ) UI TCTGCCATTTCAMGACTCATTTCTCCTATMCCACCGCATGAGTTGMTCAAMTTTTCAGATCTTT SI TCAGGAGTGTMGGMACATCATGTTTACCTGTGCAGGCACTAGTCCTTTACAGATGACCATGCTGAT
V00540 39 A
TCMGMGGTGACTGCCCTTGTATGATGGGATGGGMGATGAATGACTGGTTTTTACTGGGGTGTM AACCACTCTGAGCCTCTCTGAGACCATGTGGTTTTAAM[A/ ATCCATMGGGMGGTACCCACAC CAGTATCTGAGTTCCAGTAGCTMGACCCTAGMTTTGGATTCATCTCTG I I I I I I CATGTCTCTCCTT
X15943 1 06 GTMCCCTGAGATCATCAG _
AGGMGATCCCACCGACCCTTCCTGGCCTMTCCTTTAGATFAGGTCACATTACATTMCATTTAGGA ACCCAGACCGAMAGTTGCTGAMGGGMGGAGACACATTCACAMGAAMGTTGCGMMTTGCG MATCTGTTGTGCA[C/ηGCTCAMTGMAACGCCTTTCGGCTTTGGGCTTTTA I I I I I I I GGMCTG
X5201 1 b 1 48 T CGAGTGGCTTAGGTCTAGCCT
AGGMGATCCCACCGACCCTTCCTGGCCTMTCCTTTAGATTAGGTCACATTACATTMCATTTAGGA ACCCAGACCGAMAGTTGCTGMAGGGMGGAGACACATTCACAMGAM[A/C]GTTGCGAMATT
I GCGAMTCTGTTGTGCACGCTCAMTGMMCGCCTTTCGGCTTTGGGCTTTFA I I I I I I I GGMCTG X5201 1 a ' 1 1 8 C - CGAGTGGCTTAGGTCTAGCCT
Figure imgf000235_0001
CATCCCMGGCACTGGTGGTGACTCTGCTTCCTG[C/T]ACTGACCCAGAGCCTCTGCCTGTGCACTGC MGCTGTGTCTACTCAGGCCCCMGGGGACTCTCTGTTTCCATTCTCCCCCCACAGACCTGTCMGAG
X87344 34 T MGCATGACAMCMMTCATTTACCGACTTTAGTGCTTTTTT
GGTGGGCTGGTATCTCAGAMGTGCCTGACACACTMCCMGCTGAGTTTCCTATGGGMCMTTGA AGTAMCTTTTTGTTCTGGTCCTTTTTGGTCGAGGAGTMCMTACAMTGGATTTTGGGAGTGACTC MGMGTGAAGMTGCACMGMTGGATCACMGATGGMTTTA[GtηCAMCCCTAGCCTTGCTT
X87838 1 79 G GTJAAAATT
GTTCTGCTGCCTCTACACAGGGGCCCTGTACAGTGMTGGTGCCATTTTCGMGGAGCAGCAGTGTGA CCTCCTGTGACCC[A/G]TGMTGTGCCTCCMGCGGCCCTGTGTGTTTGACATGTGMGCTATTTGAT ATGCACCAGGTCTCMGGTTCTCATTTCTCAGGTGACGTGATTCTMGGCAGGATTTGAGAGTTCACA
Z14138 81 GMGGAT
TMTCCTCACCATTCCTCAGGTATMGTTCTATAMCAGGCTTGGMTCTGGGTMTTMAMCAGA AMTTATAGTCMTATACCATGACATGMGMTGAATCCATTCTTTGGAGATGGAGTATACATGACT GCMCTGTATTTCATACGTTCTTTTCAMGTGGGATAGCTATTGCAGCTTAMGAGC[A/C]CAGGTTC
Z18859 1 91 CAGTACTGGTTTTCCM
AGMCCTGACCAGATGTGGCTCGGAGGGGMTCCAGACCCGCTGCTGTCTTGCTCTCCCTCCCCTCCC UI CACTCCTCCTCTCTTCTTCCTCTTCTCTCTCACTGCCACGCCTTCCTTTCCCTCCTCCTCCCCCTCTCCG CTCTGTGCTCTTCATTCTCAC[GA]GGCCCGCMCCCCTCCTCTCTCTGTCCCCGCCCGTCTCTGGMA
Z23091 1 59 G CTGAGCTTGACGTTTG
GTTGGCATTGTTAGTAAMCTTCATAGGTGMGAGGAGGATCAGTGAGATTMGTTATTTTATCAM GTGTGGTTTTCTGCMGGGCAGGTTTGAMCCTGACCCTAGTTGTGCTCCAGGACCTA[A/G]GCGTGC TCACTCTACCTTGTCTTTGTGTTGAMGGAGTGGTTTCCCATGACTGTTTMGTGACMGTGCCATGG
1 1595b 1 25 ATATCTACACCGTCACCAGACTAGATTGTCTCMTGTCCTTGGCTTGCGAC
GTTGGCATTGTTAGTAMACTTCATAGGTGMGAGGAGGATCAGTGAGATTMGTTATTTTATCAM GTGTGGTTTTCTGCMGGGCAGGTTTGAMCCTGACCCTAGTTGTGCTCCAGGACCTA[A/G]GCGTGC TCACTCTACCTTGTCTTTGTGTTGAMGGAGTGGTTTCCCATGACTGTTTMGTGACMGTGCCATGG
1 1 595 1 25 ATATCTACACCGTCACCAGACTAGATTGTCTCMTGTCCTTGGCTTGCGAC
TATATCACATTAGTATGTCACTGCCATGGTMGGACTTTGATCACTAGGAAATMGMCACTTTGM TGGTCTTGTCCTTTCMTMMAGAGTGACATGATTGMCATGTGTTTTAGATMAGGGCACTT[GtF ]GCAGGAGTGTTTAGGATGMGAGAGMGAGATTMGGMGATCAGGMGMMGTAGCMTGGGA
1241 1 31 G T ATGAMATAGGAGGCCCTGAGATCCACTGGATMTCTMAAMCCMGAGAMG
GTGCGATCACCACTACAGTCTMTTTCAGATGFTTTCATTACCCCTAMAGAMTCTTGTACCCATTA GCMTTATTCCTCATTCCTGCCCTCACCCCCAGGCCCTACTCTTTATCGCTATAGATTTGCC[C/ηACT TGACATATCATACACATGGAGCCATACATATGTGTGCCCTTCATGATTGGCTTCTTTCACTGAGMTA
1 282 1 30 ATGTTTTCAAGGT
AGTATCACACATACTTAATATATTAGATATACACMTMTMAATCACTCCCTACCTTGAAMCTTT A[C/ηAGAAGCATTTTTAATTTTACAACACAMGCTCAAACGMCCTACAATMGTCTAGTAGTCTG TTTACGTGCCAAGGGATAAGGCTGAACMTAMTTMCCCTTTAAAAATGTCTATGMCAAGTACAA
681 0 68 T TTTTC I I I I I GAGTTCTGCAGAGCAATGACCACTMGMATA I I I I I AMGGC
CCMGTACATTGGGTGMCGATGAGCTAGCTGTTCTAGTATTTGC I I I I I GTMTCCAGTTMGACCA TCAGCATATACAACATCATCACTMCTCMCMTGTAGCTGCAGGGTMC[A/C]TGTGGATACCCTG TGTGCTCTACTGGCCTCCAMGGCATTCAGGGGATCATCAMGATGTTGGACACCTTGTGTTCAMTC
681 7 1 1 8 πGGTTCAGGTGCGGCCTGTGCAGΛTCGGCTTTTTGGTTTGGTTGTCTTAG
CCATTTTA I I I I I CTCTAAATTTTAAAATAGMGACTTTMTGGAAAACATTFAGTACCATCATGTCA CCCTGMTGCCAGCMTACCTCGACTTTTACACACGCAGGMGCCTAGTAAMGCCCCGTCAGTAGT ACACATTTCTCTATGGTCCTTCMCAGTTTTGCATATACAAMTTTTCTGCTATTΓTGCTTTAGCAAA
6819b 21 2 CAGCMTMCTTTTGTGTTTCCTATATGACACCTAATATCCA SI u>
CCATTTTA 1 1 ΓI FCTCTAAATTTTAAAATAGAAGACTTTAATGGAAAACATTTAGTACCATCATGTCA CCCTGMTGCCAGCMTACCTCGACTTTTACACACGCAGGMGCCTAGTAMAGCCCCGTCAGTAGT
ACACATTFCTCTATGGTCCTTCAACAGTTTT[G ηCATATACAAMTTTTCTGCTATTTTGCTTTAGC
6819a 1 66 UT AMCAGCMTMCTTTTGTGTTTCCTATATGACACCTMTATCCA
CTGGTATGTCATAAGCMTCCATAATTGTTATAGCTATT[A/G]TTATACTATGGCACCATTTGGGACA
CAGATTATATATGTCAGACACCACGMTGTCCTTTMGATATGCAGCMGCACAMTCTGTCATGGT
681 xx 39 TTAACAAMGAAATGMCGTCTAGG
AGGATTCCCTCTTTTTCTATTGATTGGMTAGTTTCAGMGGMTGGTACCAGTTCCTCCTTGTACCT CTGGTAGMTTCGGCTGTGMTCCATCTGGTCCTGGACTCTTTTTGGTTGGTAMCTATTGATTATTGC CACMTTTCAGA[G/T]CCTGTTATTGGTCTATTCAGAGATTCMCTTCTTCCTGGTTTAGTCTTGGGA
6972b 1 49 GAGTGTATGTGTCGAGGMT
AGGATTCCCTCTTTTTCTATTGATTGGMTAGTTTCAGMGGMTGGTACCAGTTCCTCCTTGTACCT CTGGTAGMTTCGGCTGTGMTCCATCTGGTCCTGGACTCTTTTTGGTTGGTM[A/G]CTATTGATTA TTGCCACMTTTCAGAGCCTGTTATTGGTCTATTCAGAGATTCMCTTCTTCCTGGTTTAGTCTTGGGAl
6972a 1 22 ' © GAGTGTATGTGTCGAGGMT
Figures imgf000238_0001 through imgf000306_0001
EQUIVALENTS
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described specifically herein. Such equivalents are intended to be encompassed in the scope of the claims.

Claims

WE CLAIM:
1. A nucleic acid segment shown in column 7 of the Table, or a portion thereof which includes a polymorphic site, or the complement of the segment or portion thereof.
2. The nucleic acid segment of claim 1 that is DNA.
3. The nucleic acid segment of claim 1 that is RNA.
4. The segment of claim 1 that is less than 100 bases.
5. The segment of claim 1 that is less than 50 bases.
6. The segment of claim 1 that is less than 20 bases.
7. The segment of claim 1, wherein the polymorphic site is biallelic.
8. The segment of claim 1, wherein the polymorphic form occupying the polymorphic site is the reference base for the fragment listed in the Table, column 3.
9. The segment of claim 1, wherein the polymorphic form occupying the polymorphic site is an alternative form for the fragment listed in the Table, column 4.
10. An allele-specific oligonucleotide that hybridizes to a segment of a fragment shown in the Table, column 7 or its complement.
11. The allele-specific oligonucleotide of claim 10 that is a probe.
12. The allele-specific oligonucleotide of claim 10, wherein a central position of the probe aligns with the polymorphic site of the fragment.
13. The allele-specific oligonucleotide of claim 10 that is a primer.
14. The allele-specific oligonucleotide of claim 13, wherein the 3' end of the primer aligns with the polymorphic site of the fragment.
15. The allele-specific oligonucleotide of claim 10, which is selected from the group consisting of the nucleotide sequences of the Table, column 5.
16. The allele-specific oligonucleotide of claim 10, which is selected from the group consisting of the nucleotide sequences of the Table, column 6.
17. An isolated nucleic acid comprising a sequence of the Table, column 7 or the complement thereof, wherein the polymorphic site within the sequence or complement is occupied by a base other than the reference base shown in the Table, column 3.
18. A method of analyzing a nucleic acid, comprising obtaining the nucleic acid from an individual; and determining a base occupying any one of the polymorphic sites shown in the Table.
19. The method of claim 18, wherein the determining comprises determining a set of bases occupying a set of the polymorphic sites shown in the Table.
20. The method of claim 18, wherein the nucleic acid is obtained from a plurality of individuals, and a base occupying one of the polymorphic positions is determined in each of the individuals, and the method further comprising testing each individual for the presence of a disease phenotype, and correlating the presence of the disease phenotype with the base.
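
By way of illustration only, the geometry recited in claims 12 and 14 (a probe whose central position aligns with the polymorphic site, and a primer whose 3' end aligns with it) can be sketched in Python. The sketch assumes fragments written in the bracketed form used in the Table, for example LEFT[A/G]RIGHT. The function names, the example sequence, and the default lengths are hypothetical and do not come from the specification. The code returns same-strand sequences only and does not model hybridization conditions or complementary strands.

```python
import re
from typing import Dict, Tuple

# Matches a biallelic site written as [X/Y], as in the Table's fragment column.
SITE = re.compile(r"\[([ACGT])/([ACGT])\]")


def parse_marker(seq: str) -> Tuple[str, str, str, str]:
    """Split 'LEFT[X/Y]RIGHT' into (left flank, allele 1, allele 2, right flank)."""
    match = SITE.search(seq)
    if match is None:
        raise ValueError("no [X/Y] polymorphic site found")
    return seq[:match.start()], match.group(1), match.group(2), seq[match.end():]


def centered_probes(seq: str, flank: int = 7) -> Dict[str, str]:
    """One probe per allele, with the polymorphic base at the central position."""
    left, a1, a2, right = parse_marker(seq)
    if flank < 1 or len(left) < flank or len(right) < flank:
        raise ValueError("flanking sequence too short for requested probe")
    return {a: left[-flank:] + a + right[:flank] for a in (a1, a2)}


def three_prime_primers(seq: str, length: int = 10) -> Dict[str, str]:
    """One primer per allele, with the polymorphic base as the 3'-terminal base."""
    left, a1, a2, _ = parse_marker(seq)
    if length < 2 or len(left) < length - 1:
        raise ValueError("left flank too short for requested primer")
    return {a: left[-(length - 1):] + a for a in (a1, a2)}


if __name__ == "__main__":
    example = "GATTACAGGC[A/G]TTCAGGATCC"  # hypothetical fragment, not from the Table
    print(centered_probes(example, flank=7))
    print(three_prime_primers(example, length=10))
```

With a flank of 7 bases on each side, each probe is 15 bases long and the polymorphic base sits at index 7, the middle of the probe. In each primer the polymorphic base is the last (3'-terminal) base.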
PCT/US1997/020313 1996-11-06 1997-11-05 Biallelic markers WO1998020165A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP97946582A EP0941366A2 (en) 1996-11-06 1997-11-05 Biallelic markers

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US3045596P 1996-11-06 1996-11-06
US60/030,455 1996-11-06

Publications (2)

Publication Number Publication Date
WO1998020165A2 true WO1998020165A2 (en) 1998-05-14
WO1998020165A3 WO1998020165A3 (en) 1998-11-12

Family

ID=21854280

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/020313 WO1998020165A2 (en) 1996-11-06 1997-11-05 Biallelic markers

Country Status (2)

Country Link
EP (1) EP0941366A2 (en)
WO (1) WO1998020165A2 (en)

Cited By (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0887407A2 (en) * 1997-06-11 1998-12-30 Smithkline Beecham Corporation cDNA clone HAPO167 that encodes a human 7-transmembrane receptor
WO1999050454A2 (en) * 1998-04-01 1999-10-07 Whitehead Institute For Biomedical Research Coding sequence polymorphisms in vascular pathology genes
WO1999053095A2 (en) * 1998-04-09 1999-10-21 Whitehead Institute For Biomedical Research Biallelic markers
WO1999054500A2 (en) * 1998-04-21 1999-10-28 Genset Biallelic markers for use in constructing a high density disequilibrium map of the human genome
WO1999061659A1 (en) * 1998-05-26 1999-12-02 Procrea Biosciences Inc. A novel str marker system for dna fingerprinting
WO1999064590A1 (en) * 1998-06-05 1999-12-16 Genset Polymorphic markers of prostate carcinoma tumor antigen-1 (pcta-1)
WO2000008209A2 (en) * 1998-08-07 2000-02-17 Genset Nucleic acids encoding human tbc-1 protein and polymorphic markers thereof
WO2000018960A2 (en) * 1998-09-25 2000-04-06 Massachusetts Institute Of Technology Methods and products related to genotyping and dna analysis
WO2000028080A2 (en) * 1998-11-10 2000-05-18 Genset Methods, software and apparati for identifying genomic regions harboring a gene associated with a detectable trait
WO2000029623A2 (en) * 1998-11-17 2000-05-25 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2000042165A2 (en) * 1999-01-15 2000-07-20 Human Genome Sciences, Inc. Bone marrow-specific protein
WO2000055375A1 (en) * 1999-03-17 2000-09-21 Alphagene, Inc. Secreted proteins and polynucleotides encoding them
WO2000056924A2 (en) * 1999-03-24 2000-09-28 Genset Genomic sequence of the purh gene and purh-related biallelic markers
WO2000058519A2 (en) * 1999-03-31 2000-10-05 Whitehead Institute For Biomedical Research Charaterization of single nucleotide polymorphisms in coding regions of human genes
WO2000058510A2 (en) * 1999-03-30 2000-10-05 Genset Schizophrenia associated genes, proteins and biallelic markers
WO2000066728A1 (en) * 1999-05-03 2000-11-09 Compugen Ltd. StAR HOMOLOGUES
WO2000071710A2 (en) * 1999-05-25 2000-11-30 Aventis Pharma S.A. Expression products of genes involved in diseases related to cholesterol metabolism
FR2794131A1 (en) * 1999-05-25 2000-12-01 Aventis Pharma Sa New nucleic acid derived from human chromosome 9, used e.g. for diagnosis and drug screening, derived from genes implicated in disorders of lipoprotein metabolism
WO2001000669A2 (en) * 1999-06-25 2001-01-04 Genset A bap28 gene and protein
EP1088900A1 (en) * 1999-09-10 2001-04-04 Epidauros Biotechnologie AG Polymorphisms in the human CYP3A4, CYP3A7 and hPXR genes and their use in diagnostic and therapeutic applications
WO2001038586A2 (en) * 1999-11-24 2001-05-31 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001042511A2 (en) * 1999-12-10 2001-06-14 Whitehead Institute For Biomedical Research Ibd-related polymorphisms
WO2001048245A2 (en) * 1999-12-27 2001-07-05 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001047942A2 (en) * 1999-12-27 2001-07-05 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001047944A2 (en) * 1999-12-28 2001-07-05 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001051670A2 (en) * 2000-01-07 2001-07-19 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001066800A2 (en) * 2000-03-07 2001-09-13 Whitehead Institute For Biomedical Research Human single nucleotide polymorphisms
WO2001079504A1 (en) * 2000-04-13 2001-10-25 Millennium Pharmaceuticals, Inc. 23155 NOVEL PROTEIN HUMAN 5-α REDUCTASES AND USES THEREFOR
WO2002002812A2 (en) * 2000-06-30 2002-01-10 University College London Method for determining the susceptibility to stoke by analysing the ins gene
US6344441B1 (en) 1997-08-06 2002-02-05 Genset Lipoprotein-regulating medicaments
WO2002066641A1 (en) * 2001-02-20 2002-08-29 Genset S.A. Pg-3 and biallelic markers thereof
US6476208B1 (en) 1998-10-13 2002-11-05 Genset Schizophrenia associated genes, proteins and biallelic markers
US6479238B1 (en) 1999-02-10 2002-11-12 Marta Blumenfeld Polymorphic markers of the LSR gene
US6482923B1 (en) 1997-09-17 2002-11-19 Human Genome Sciences, Inc. Interleukin 17-like receptor protein
WO2002092612A2 (en) * 2001-05-11 2002-11-21 Noxxon Pharma Ag Nucleic acids that bind to enterotoxin b
WO2003000896A2 (en) * 2001-05-03 2003-01-03 Genodyssee POLYNUCLEOTIDES AND POLYPEPTIDES OF THE IFNα-5 GENE
US6534293B1 (en) 1999-01-06 2003-03-18 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
US6537751B1 (en) 1998-04-21 2003-03-25 Genset S.A. Biallelic markers for use in constructing a high density disequilibrium map of the human genome
US6555316B1 (en) 1999-10-12 2003-04-29 Genset S.A. Schizophrenia associated gene, proteins and biallelic markers
US6566498B1 (en) 1998-02-06 2003-05-20 Human Genome Sciences, Inc. Human serine protease and serpin polypeptides
US6566332B2 (en) 2000-01-14 2003-05-20 Genset S.A. OBG3 globular head and uses thereof for decreasing body mass
US6582909B1 (en) 1998-11-04 2003-06-24 Genset, S.A. APM1 biallelic markers and uses thereof
US6635443B1 (en) 1997-09-17 2003-10-21 Human Genome Sciences, Inc. Polynucleotides encoding a novel interleukin receptor termed interleukin-17 receptor-like protein
US6703228B1 (en) 1998-09-25 2004-03-09 Massachusetts Institute Of Technology Methods and products related to genotyping and DNA analysis
US6759192B1 (en) 1998-06-05 2004-07-06 Genset S.A. Polymorphic markers of prostate carcinoma tumor antigen-1(PCTA-1)
US6759515B1 (en) 1997-02-25 2004-07-06 Corixa Corporation Compositions and methods for the therapy and diagnosis of prostate cancer
US6825004B1 (en) 1998-08-07 2004-11-30 Genset S.A. Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof
US6849719B2 (en) 1997-09-17 2005-02-01 Human Genome Sciences, Inc. Antibody to an IL-17 receptor like protein
US6869762B1 (en) 1999-12-10 2005-03-22 Whitehead Institute For Biomedical Research Crohn's disease-related polymorphisms
US6902892B1 (en) 1998-10-19 2005-06-07 Diadexus, Inc. Method of diagnosing, monitoring, staging, imaging and treating prostate cancer
US6902890B1 (en) 1999-11-04 2005-06-07 Diadexus, Inc. Method of diagnosing monitoring, staging, imaging and treating cancer
US6967091B2 (en) 2000-01-14 2005-11-22 Genset, S.A. OBG3 globular head and uses thereof for decreasing body mass
US6989367B2 (en) 2000-01-14 2006-01-24 Genset S.A. OBG3 globular head and uses thereof
US7067627B2 (en) 1999-03-30 2006-06-27 Serono Genetics Institute S.A. Schizophrenia associated genes, proteins and biallelic markers
US7105353B2 (en) 1997-07-18 2006-09-12 Serono Genetics Institute S.A. Methods of identifying individuals for inclusion in drug studies
WO2006136033A1 (en) * 2005-06-23 2006-12-28 The University Of British Columbia Coagulation factor iii polymorphisms associated with prediction of subject outcome and response to therapy
US7193045B2 (en) 1998-05-15 2007-03-20 Genetech, Inc. Polypeptides that induce cell proliferation
US7267966B2 (en) 1998-10-27 2007-09-11 Affymetrix, Inc. Complexity management and analysis of genomic DNA
US7323307B2 (en) 1996-09-19 2008-01-29 Affymetrix, Inc. Identification of molecular sequence signatures and methods involving the same
US7338787B2 (en) 2000-01-14 2008-03-04 Serono Genetics Institute S.A. Nucleic acids encoding OBG3 globular head and uses thereof
US7501239B2 (en) * 2003-04-18 2009-03-10 Arkray, Inc. Method of detecting β3 adrenaline receptor mutant gene and nucleic acid probe and kit therefor
WO2009143576A1 (en) * 2008-05-27 2009-12-03 Adelaide Research & Innovation Pty Ltd Polymorphisms associated with pregnancy complications
US8044183B2 (en) 1998-02-05 2011-10-25 Glaxosmithkline Biologicals S.A. Process for the production of immunogenic compositions
US8063191B2 (en) * 2005-09-16 2011-11-22 Mayo Foundation For Medical Education And Research Polynucleotides encoding for fusion proteins with natriuresis activity
WO2011151405A1 (en) 2010-06-04 2011-12-08 Institut National De La Sante Et De La Recherche Medicale (Inserm) Constitutively active prolactin receptor variants as prognostic markers and therapeutic targets to prevent progression of hormone-dependent cancers towards hormone-independence
US8133734B2 (en) 1999-03-16 2012-03-13 Human Genome Sciences, Inc. Kit comprising an antibody to interleukin 17 receptor-like protein
US8367322B2 (en) 1999-01-06 2013-02-05 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
US9074244B2 (en) 2008-03-11 2015-07-07 Affymetrix, Inc. Array-based translocation and rearrangement assays
US9193777B2 (en) 2009-07-09 2015-11-24 Mayo Foundation For Medical Education And Research Method of treating cardiac arrhythmia with long acting atrial natriuretic peptide(LA-ANP)
US9388457B2 (en) 2007-09-14 2016-07-12 Affymetrix, Inc. Locus specific amplification using array probes
US9611305B2 (en) 2012-01-06 2017-04-04 Mayo Foundation For Medical Education And Research Treating cardiovascular or renal diseases
US10344068B2 (en) 2011-08-30 2019-07-09 Mayo Foundation For Medical Education And Research Natriuretic polypeptides
CN111139301A (en) * 2020-03-10 2020-05-12 无锡市第五人民医院 Breast cancer related gene ERBB2 site g.39397319C > A mutant and application thereof

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995012607A1 (en) * 1993-11-03 1995-05-11 Molecular Tool, Inc. Single nucleotide polymorphisms and their use in genetic analysis
FR2722295A1 (en) * 1994-07-07 1996-01-12 Roussy Inst Gustave Single strand conformation polymorphism analysis of DNA

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
DATABASE EMEST10 embl Accession number: hst27766, 12 January 1995 ADAMS M D ET AL.: "Initial assessment of human gene diversity and expression patterns based upon 52 million basepairs of cDNA sequence" XP002067789 *
GROMPE M: "THE RAPID DETECTION OF UNKNOWN MUTATIONS IN NUCLEIC ACIDS" NATURE GENETICS, vol. 5, no. 2, October 1993, pages 111-117, XP000615290 *
HRUBAN R H ET AL: "K-RAS ONCOGENE ACTIVATION IN ADENOCARCINOMA OF THE HUMAN PANCREAS A STUDY OF 82 CARCINOMAS USING A COMBINATION OF MUTANT-ENRICHED POLYMERASE CHAIN RACTION ANALYSIS AND ALLELE-SPECIFIC OLIGONUCLEOTIDE HYBRIDIZATION" AMERICAN JOURNAL OF PATHOLOGY, vol. 143, no. 2, 1 August 1993, pages 545-554, XP000572114 *
NIKIFOROV T T ET AL: "GENETIC BIT ANALYSIS: A SOLID PHASE METHOD FOR TYPING SINGLE NUCLEOTIDE POLYMORPHISMS" NUCLEIC ACIDS RESEARCH, vol. 22, no. 20, October 1994, pages 4167-4175, XP002015765 *
SYVANEN A -CH ET AL: "IDENTIFICATION OF INDIVIDUALS BY ANALYSIS OF BIALLELIC DNA MARKERS,USING PCR AND SOLID-PHASE MINISEQUENCING" AMERICAN JOURNAL OF HUMAN GENETICS, vol. 52, no. 1, January 1993, pages 46-59, XP002050638 *
WANG D ET AL: "TOWARD A THIRD GENERATION GENETIC MAP OF THE HUMAN GENOME BASED ON BI-ALLELIC POLYMORPHISMS" AMERICAN JOURNAL OF HUMAN GENETICS, vol. 59, no. 4, October 1996, page A03 XP002050641 *

Cited By (126)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7323307B2 (en) 1996-09-19 2008-01-29 Affymetrix, Inc. Identification of molecular sequence signatures and methods involving the same
US6759515B1 (en) 1997-02-25 2004-07-06 Corixa Corporation Compositions and methods for the therapy and diagnosis of prostate cancer
EP0887407A3 (en) * 1997-06-11 2000-07-12 Smithkline Beecham Corporation cDNA clone HAPO167 that encodes a human 7-transmembrane receptor
EP0887407A2 (en) * 1997-06-11 1998-12-30 Smithkline Beecham Corporation cDNA clone HAPO167 that encodes a human 7-transmembrane receptor
US7105353B2 (en) 1997-07-18 2006-09-12 Serono Genetics Institute S.A. Methods of identifying individuals for inclusion in drug studies
US7919091B2 (en) 1997-08-06 2011-04-05 Serono Genetics Institute S.A. Lipolysis stimulated receptor (LSR) specific antibodies
US7220722B2 (en) 1997-08-06 2007-05-22 Serono Genetics Institute S.A. Lipoprotein-regulating medicants
US7291709B2 (en) 1997-08-06 2007-11-06 Serono Genetics Institute S.A. LSR receptor, its activity, its cloning, and its applications to the diagnosis, prevention and/or treatment of obesity and related risks or complications
US6946444B2 (en) 1997-08-06 2005-09-20 Genset Lipoprotein-regulating medicaments
US6344441B1 (en) 1997-08-06 2002-02-05 Genset Lipoprotein-regulating medicaments
US6635431B1 (en) 1997-08-06 2003-10-21 Genset, S.A. LSR receptor, activity, cloning, and uses for diagnosing, preventing and/or treating obesity and related risks or complications
US6849719B2 (en) 1997-09-17 2005-02-01 Human Genome Sciences, Inc. Antibody to an IL-17 receptor like protein
US6635443B1 (en) 1997-09-17 2003-10-21 Human Genome Sciences, Inc. Polynucleotides encoding a novel interleukin receptor termed interleukin-17 receptor-like protein
US7638603B2 (en) 1997-09-17 2009-12-29 Human Genome Sciences, Inc. Antibodies against interleukin 17 receptor-like protein
US6482923B1 (en) 1997-09-17 2002-11-19 Human Genome Sciences, Inc. Interleukin 17-like receptor protein
US8097257B2 (en) 1998-02-05 2012-01-17 Glaxosmithkline Biologicals S.A. MAGE3 polypeptides
US8044183B2 (en) 1998-02-05 2011-10-25 Glaxosmithkline Biologicals S.A. Process for the production of immunogenic compositions
US8597656B2 (en) 1998-02-05 2013-12-03 Glaxosmithkline Biologicals S.A. Process for the production of immunogenic compositions
US6566498B1 (en) 1998-02-06 2003-05-20 Human Genome Sciences, Inc. Human serine protease and serpin polypeptides
WO1999050454A2 (en) * 1998-04-01 1999-10-07 Whitehead Institute For Biomedical Research Coding sequence polymorphisms in vascular pathology genes
US6692909B1 (en) 1998-04-01 2004-02-17 Whitehead Institute For Biomedical Research Coding sequence polymorphisms in vascular pathology genes
WO1999050454A3 (en) * 1998-04-01 2000-04-13 Whitehead Biomedical Inst Coding sequence polymorphisms in vascular pathology genes
WO1999053095A3 (en) * 1998-04-09 2000-03-16 Whitehead Biomedical Inst Biallelic markers
WO1999053095A2 (en) * 1998-04-09 1999-10-21 Whitehead Institute For Biomedical Research Biallelic markers
US6537751B1 (en) 1998-04-21 2003-03-25 Genset S.A. Biallelic markers for use in constructing a high density disequilibrium map of the human genome
WO1999054500A3 (en) * 1998-04-21 2000-03-16 Genset Sa Biallelic markers for use in constructing a high density disequilibrium map of the human genome
WO1999054500A2 (en) * 1998-04-21 1999-10-28 Genset Biallelic markers for use in constructing a high density disequilibrium map of the human genome
US7193045B2 (en) 1998-05-15 2007-03-20 Genentech, Inc. Polypeptides that induce cell proliferation
WO1999061659A1 (en) * 1998-05-26 1999-12-02 Procrea Biosciences Inc. A novel str marker system for dna fingerprinting
WO1999064590A1 (en) * 1998-06-05 1999-12-16 Genset Polymorphic markers of prostate carcinoma tumor antigen-1 (pcta-1)
US6759192B1 (en) 1998-06-05 2004-07-06 Genset S.A. Polymorphic markers of prostate carcinoma tumor antigen-1(PCTA-1)
US7547771B2 (en) 1998-06-05 2009-06-16 Serono Genetics Institute S.A. Polymorphic markers of prostate carcinoma tumor antigen -1(PCTA-1)
US6825004B1 (en) 1998-08-07 2004-11-30 Genset S.A. Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof
WO2000008209A3 (en) * 1998-08-07 2000-11-09 Genset Sa Nucleic acids encoding human tbc-1 protein and polymorphic markers thereof
WO2000008209A2 (en) * 1998-08-07 2000-02-17 Genset Nucleic acids encoding human tbc-1 protein and polymorphic markers thereof
WO2000018960A2 (en) * 1998-09-25 2000-04-06 Massachusetts Institute Of Technology Methods and products related to genotyping and dna analysis
JP2002525127A (en) * 1998-09-25 2002-08-13 マサチューセッツ インスティテュート オブ テクノロジー Methods and products for genotyping and DNA analysis
US6703228B1 (en) 1998-09-25 2004-03-09 Massachusetts Institute Of Technology Methods and products related to genotyping and DNA analysis
WO2000018960A3 (en) * 1998-09-25 2000-09-08 Massachusetts Inst Technology Methods and products related to genotyping and dna analysis
US7371811B2 (en) 1998-10-13 2008-05-13 Serono Genetics Institute S.A. Schizophrenia associated genes, proteins and biallelic markers
US6476208B1 (en) 1998-10-13 2002-11-05 Genset Schizophrenia associated genes, proteins and biallelic markers
US7432064B2 (en) 1998-10-19 2008-10-07 Diadexus, Inc. Method of diagnosing, monitoring, staging, imaging and treating prostate cancer
US6902892B1 (en) 1998-10-19 2005-06-07 Diadexus, Inc. Method of diagnosing, monitoring, staging, imaging and treating prostate cancer
US7267966B2 (en) 1998-10-27 2007-09-11 Affymetrix, Inc. Complexity management and analysis of genomic DNA
US6582909B1 (en) 1998-11-04 2003-06-24 Genset, S.A. APM1 biallelic markers and uses thereof
WO2000028080A3 (en) * 1998-11-10 2000-08-17 Genset Sa Methods, software and apparati for identifying genomic regions harboring a gene associated with a detectable trait
AU771187B2 (en) * 1998-11-10 2004-03-18 Genset Methods, software and apparati for identifying genomic regions harboring a gene associated with a detectable trait
WO2000028080A2 (en) * 1998-11-10 2000-05-18 Genset Methods, software and apparati for identifying genomic regions harboring a gene associated with a detectable trait
US6291182B1 (en) 1998-11-10 2001-09-18 Genset Methods, software and apparati for identifying genomic regions harboring a gene associated with a detectable trait
WO2000029623A3 (en) * 1998-11-17 2001-04-19 Curagen Corp Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2000029623A2 (en) * 1998-11-17 2000-05-25 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
US6670464B1 (en) 1998-11-17 2003-12-30 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
US6534293B1 (en) 1999-01-06 2003-03-18 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
US8367322B2 (en) 1999-01-06 2013-02-05 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
WO2000042165A2 (en) * 1999-01-15 2000-07-20 Human Genome Sciences, Inc. Bone marrow-specific protein
WO2000042165A3 (en) * 1999-01-15 2001-05-31 Human Genome Sciences Inc Bone marrow-specific protein
US6534485B1 (en) 1999-01-15 2003-03-18 Human Genome Sciences, Inc. Bone marrow-specific protein
US6479238B1 (en) 1999-02-10 2002-11-12 Marta Blumenfeld Polymorphic markers of the LSR gene
US7125667B2 (en) 1999-02-10 2006-10-24 Serono Genetics Institute S.A. Polymorphic markers of the LSR gene
US8133734B2 (en) 1999-03-16 2012-03-13 Human Genome Sciences, Inc. Kit comprising an antibody to interleukin 17 receptor-like protein
WO2000055375A1 (en) * 1999-03-17 2000-09-21 Alphagene, Inc. Secreted proteins and polynucleotides encoding them
US6544737B1 (en) 1999-03-24 2003-04-08 Genset S.A. Genomic sequence of the purH gene and purH-related biallelic markers
WO2000056924A2 (en) * 1999-03-24 2000-09-28 Genset Genomic sequence of the purh gene and purh-related biallelic markers
WO2000056924A3 (en) * 1999-03-24 2001-08-09 Genset Sa Genomic sequence of the purh gene and purh-related biallelic markers
US7427482B2 (en) 1999-03-24 2008-09-23 Serono Genetics Institute S.A. Methods of assessing the risk for the development of sporadic prostate cancer
US7041454B2 (en) 1999-03-24 2006-05-09 Serono Genetics Institute, S.A. Genomic sequence of the purH gene and purH-related biallelic markers
WO2000058510A2 (en) * 1999-03-30 2000-10-05 Genset Schizophrenia associated genes, proteins and biallelic markers
US7067627B2 (en) 1999-03-30 2006-06-27 Serono Genetics Institute S.A. Schizophrenia associated genes, proteins and biallelic markers
WO2000058510A3 (en) * 1999-03-30 2001-08-09 Genset Sa Schizophrenia associated genes, proteins and biallelic markers
WO2000058519A2 (en) * 1999-03-31 2000-10-05 Whitehead Institute For Biomedical Research Characterization of single nucleotide polymorphisms in coding regions of human genes
WO2000058519A3 (en) * 1999-03-31 2001-08-23 Whitehead Biomedical Inst Characterization of single nucleotide polymorphisms in coding regions of human genes
WO2000066728A1 (en) * 1999-05-03 2000-11-09 Compugen Ltd. StAR HOMOLOGUES
WO2000071710A3 (en) * 1999-05-25 2001-05-17 Aventis Pharma Sa Expression products of genes involved in diseases related to cholesterol metabolism
WO2000071710A2 (en) * 1999-05-25 2000-11-30 Aventis Pharma S.A. Expression products of genes involved in diseases related to cholesterol metabolism
FR2794131A1 (en) * 1999-05-25 2000-12-01 Aventis Pharma Sa New nucleic acid derived from human chromosome 9, used e.g. for diagnosis and drug screening, derived from genes implicated in disorders of lipoprotein metabolism
WO2001000669A3 (en) * 1999-06-25 2002-01-17 Genset Sa A bap28 gene and protein
WO2001000669A2 (en) * 1999-06-25 2001-01-04 Genset A bap28 gene and protein
EP1088900A1 (en) * 1999-09-10 2001-04-04 Epidauros Biotechnologie AG Polymorphisms in the human CYP3A4, CYP3A7 and hPXR genes and their use in diagnostic and therapeutic applications
US6555316B1 (en) 1999-10-12 2003-04-29 Genset S.A. Schizophrenia associated gene, proteins and biallelic markers
US7326402B2 (en) 1999-11-04 2008-02-05 Diadexus, Inc. Method of diagnosing, monitoring, staging, imaging and treating cancer
US6902890B1 (en) 1999-11-04 2005-06-07 Diadexus, Inc. Method of diagnosing monitoring, staging, imaging and treating cancer
WO2001038586A3 (en) * 1999-11-24 2002-04-25 Richard A Shimkets Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001038586A2 (en) * 1999-11-24 2001-05-31 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001042511A3 (en) * 1999-12-10 2002-11-14 Whitehead Biomedical Inst Ibd-related polymorphisms
WO2001042511A2 (en) * 1999-12-10 2001-06-14 Whitehead Institute For Biomedical Research Ibd-related polymorphisms
US6869762B1 (en) 1999-12-10 2005-03-22 Whitehead Institute For Biomedical Research Crohn's disease-related polymorphisms
WO2001048245A3 (en) * 1999-12-27 2002-11-28 Curagen Corp Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001047942A3 (en) * 1999-12-27 2002-12-12 Curagen Corp Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001048245A2 (en) * 1999-12-27 2001-07-05 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001047942A2 (en) * 1999-12-27 2001-07-05 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001047944A2 (en) * 1999-12-28 2001-07-05 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001047944A3 (en) * 1999-12-28 2003-02-20 Curagen Corp Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001051670A2 (en) * 2000-01-07 2001-07-19 Curagen Corporation Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
WO2001051670A3 (en) * 2000-01-07 2002-02-28 Curagen Corp Nucleic acids containing single nucleotide polymorphisms and methods of use thereof
US6967091B2 (en) 2000-01-14 2005-11-22 Genset, S.A. OBG3 globular head and uses thereof for decreasing body mass
US7671024B2 (en) 2000-01-14 2010-03-02 Serono Genetics Institute S.A. OBG3 globular head and uses thereof
US6989367B2 (en) 2000-01-14 2006-01-24 Genset S.A. OBG3 globular head and uses thereof
US7338787B2 (en) 2000-01-14 2008-03-04 Serono Genetics Institute S.A. Nucleic acids encoding OBG3 globular head and uses thereof
US7393661B2 (en) 2000-01-14 2008-07-01 Serono Genetics Institute S.A. Nucleic acids encoding OBG3 polypeptides
US6566332B2 (en) 2000-01-14 2003-05-20 Genset S.A. OBG3 globular head and uses thereof for decreasing body mass
WO2001066800A2 (en) * 2000-03-07 2001-09-13 Whitehead Institute For Biomedical Research Human single nucleotide polymorphisms
WO2001066800A3 (en) * 2000-03-07 2003-06-05 Whitehead Biomedical Inst Human single nucleotide polymorphisms
WO2001079504A1 (en) * 2000-04-13 2001-10-25 Millennium Pharmaceuticals, Inc. 23155 NOVEL PROTEIN HUMAN 5-α REDUCTASES AND USES THEREFOR
WO2002002812A3 (en) * 2000-06-30 2004-02-26 Univ London Method for determining the susceptibility to stroke by analysing the ins gene
WO2002002812A2 (en) * 2000-06-30 2002-01-10 University College London Method for determining the susceptibility to stroke by analysing the ins gene
WO2002066641A1 (en) * 2001-02-20 2002-08-29 Genset S.A. Pg-3 and biallelic markers thereof
WO2003000896A2 (en) * 2001-05-03 2003-01-03 Genodyssee POLYNUCLEOTIDES AND POLYPEPTIDES OF THE IFNα-5 GENE
WO2003000896A3 (en) * 2001-05-03 2003-07-17 Genodyssee POLYNUCLEOTIDES AND POLYPEPTIDES OF THE IFNα-5 GENE
WO2002092612A3 (en) * 2001-05-11 2003-07-10 Noxxon Pharma Ag Nucleic acids that bind to enterotoxin b
WO2002092612A2 (en) * 2001-05-11 2002-11-21 Noxxon Pharma Ag Nucleic acids that bind to enterotoxin b
US7501239B2 (en) * 2003-04-18 2009-03-10 Arkray, Inc. Method of detecting β3 adrenaline receptor mutant gene and nucleic acid probe and kit therefor
WO2006136033A1 (en) * 2005-06-23 2006-12-28 The University Of British Columbia Coagulation factor iii polymorphisms associated with prediction of subject outcome and response to therapy
US8063191B2 (en) * 2005-09-16 2011-11-22 Mayo Foundation For Medical Education And Research Polynucleotides encoding for fusion proteins with natriuresis activity
US9388457B2 (en) 2007-09-14 2016-07-12 Affymetrix, Inc. Locus specific amplification using array probes
US10329600B2 (en) 2007-09-14 2019-06-25 Affymetrix, Inc. Locus specific amplification using array probes
US11408094B2 (en) 2007-09-14 2022-08-09 Affymetrix, Inc. Locus specific amplification using array probes
US9074244B2 (en) 2008-03-11 2015-07-07 Affymetrix, Inc. Array-based translocation and rearrangement assays
US9932636B2 (en) 2008-03-11 2018-04-03 Affymetrix, Inc. Array-based translocation and rearrangement assays
WO2009143576A1 (en) * 2008-05-27 2009-12-03 Adelaide Research & Innovation Pty Ltd Polymorphisms associated with pregnancy complications
US9193777B2 (en) 2009-07-09 2015-11-24 Mayo Foundation For Medical Education And Research Method of treating cardiac arrhythmia with long acting atrial natriuretic peptide(LA-ANP)
WO2011151405A1 (en) 2010-06-04 2011-12-08 Institut National De La Sante Et De La Recherche Medicale (Inserm) Constitutively active prolactin receptor variants as prognostic markers and therapeutic targets to prevent progression of hormone-dependent cancers towards hormone-independence
US10344068B2 (en) 2011-08-30 2019-07-09 Mayo Foundation For Medical Education And Research Natriuretic polypeptides
US9611305B2 (en) 2012-01-06 2017-04-04 Mayo Foundation For Medical Education And Research Treating cardiovascular or renal diseases
US9987331B2 (en) 2012-01-06 2018-06-05 Mayo Foundation For Medical Education And Research Treating cardiovascular or renal diseases
US10092628B2 (en) 2012-01-06 2018-10-09 Mayo Foundation For Medical Education And Research Treating cardiovascular or renal diseases
CN111139301A (en) * 2020-03-10 2020-05-12 无锡市第五人民医院 Breast cancer related gene ERBB2 site g.39397319C > A mutant and application thereof

Also Published As

Publication number Publication date
EP0941366A2 (en) 1999-09-15
WO1998020165A3 (en) 1998-11-12

Similar Documents

Publication Publication Date Title
US5856104A (en) Polymorphisms in the glucose-6 phosphate dehydrogenase locus
EP0941366A2 (en) Biallelic markers
US6525185B1 (en) Polymorphisms associated with hypertension
US20060263807A1 (en) Methods for polymorphism identification and profiling
US6869762B1 (en) Crohn's disease-related polymorphisms
WO1998038846A2 (en) Genetic compositions and methods
US20020037508A1 (en) Human single nucleotide polymorphisms
US20060188875A1 (en) Human genomic polymorphisms
WO2001066800A2 (en) Human single nucleotide polymorphisms
EP0812922A2 (en) Polymorphisms in human mitochondrial nucleic acid
EP1068353A2 (en) Coding sequence polymorphisms in vascular pathology genes
EP1240354A2 (en) Single nucleotide polymorphisms in genes
EP1068354A2 (en) Biallelic markers
WO1998058529A2 (en) Genetic compositions and methods
US20030039973A1 (en) Human single nucleotide polymorphisms
US20030054381A1 (en) Genetic polymorphisms in the human neurokinin 1 receptor gene and their uses in diagnosis and treatment of diseases
EP1024200A2 (en) Genetic compositions and methods
WO2001038576A2 (en) Human single nucleotide polymorphisms
EP1276899A2 (en) Ibd-related polymorphisms
WO1999014228A1 (en) Genetic compositions and methods
WO2000058519A2 (en) Characterization of single nucleotide polymorphisms in coding regions of human genes
WO2001034840A2 (en) Genetic compositions and methods
WO2005072150A2 (en) Ldlr genetic markers associated with age of onset of alzheimer's disease
US20020155446A1 (en) Very low density lipoprotein receptor polymorphisms and uses therefor
US20030008301A1 (en) Association between schizophrenia and a two-marker haplotype near PILB gene

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): JP US

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): JP US

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

WWE Wipo information: entry into national phase

Ref document number: 1997946582

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1997946582

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1997946582

Country of ref document: EP