WO2001037191A2 - Method for manipulating protein or dna sequence data (in order to generate complementary peptide ligands) - Google Patents
Method for manipulating protein or DNA sequence data (in order to generate complementary peptide ligands)
- Publication number
- WO2001037191A2 WO2001037191A2 PCT/GB2000/004418 GB0004418W WO0137191A2 WO 2001037191 A2 WO2001037191 A2 WO 2001037191A2 GB 0004418 W GB0004418 W GB 0004418W WO 0137191 A2 WO0137191 A2 WO 0137191A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- frame
- frames
- complementary
- protein
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Definitions
- Proteins are made up of strings of amino acids and each amino acid in a string is coded for by a triplet of nucleotides present in DNA sequences.
- the linear sequence of DNA code is read and translated by a cell's synthetic machinery to produce a linear sequence of amino acids, which then folds to form a complex three-dimensional protein.
- The mechanisms which govern protein folding are multi-factorial and are the summation of a series of interactions between biophysical phenomena and other protein molecules. Virtually all molecules signal by non-covalent attachment to another molecule ("binding"). Despite the conceptual simplicity and tremendous importance of molecular recognition, the forces and energetics that govern it are poorly understood. This is because the two primary binding forces (electrostatics and van der Waals interactions) are weak and of roughly the same order of magnitude. Moreover, binding at any interface is complicated by the presence of solvent (water), solutes (metal ions and salt molecules), and dynamics within the protein, all of which can inhibit or enhance the binding reaction.
- a typical growth factor has a molecular weight of 15,000 to 30,000 daltons, whereas a typical small molecule drug has a molecular weight of 300-700.
- X-ray crystal structures of small molecule-protein complexes, such as biotin-avidin and enzyme-substrate complexes, show that they usually bind in crevices, not to flat areas of the protein.
- protein-protein targets are non-traditional and the pharmaceutical community has had very limited success in developing drugs that bind to them using currently available approaches to lead discovery.
- High throughput screening technologies, in which large (combinatorial) libraries of synthetic compounds are screened against one or more target proteins, have failed to produce a significant number of lead compounds.
- Protein-protein interactions are distinct from the interaction of substrates to enzymes or small molecule ligands to seven-transmembrane receptors. Protein-protein interactions occur over relatively large surface areas, as opposed to the interactions of small molecule ligands with serpentine receptors, or enzymes with their substrates, which usually occur in focused "pockets" or "clefts".
- the problem therefore is to define the small subset of regions that define the binding or functionality of the protein.
- the invention described herein provides a method and a software tool for processing sequence data and a method and a software tool for protein structure analysis, and the data forming the product of each method, as defined in the appended independent claims, to which reference should be made. Preferred or advantageous features of the invention are set out in dependent subclaims.
- the invention provides a method and a software tool for use in analysing and manipulating sequence data (e.g. both DNA and protein) such as is found in large databases (see EXAMPLE 1).
- This technology may advantageously have significant applications in the application of informatics to sequence databases in order to identify lead molecules for important pharmaceutical targets.
- DNA is composed of two helical strands of nucleotides (see FIG. 11).
- the concepts governing the genetic code and the fact that DNA codes for protein sequences are well known.
- the 'sense' strand codes for the protein and, as such, attracts all the attention of molecular biologists and protein chemists alike.
- the purpose of the other 'anti-sense' strand is more elusive. To most, its function is relegated to that of a molecular 'support' for the 'sense' strand, which is used when DNA is replicated but is of little immediate functional significance for the day-to-day activities of cellular processes.
- Mekler's original theory was supported by studies on antigen processing pathways. Specifically, an antibody-synthesizing RNA complex was found to bind to its antigen with high affinity (Fishman and Adler, 1967). Mekler contended that these results demonstrated the ability of a protein antigen to regulate its own synthesis by binding to the mRNA encoding the antibody (Mekler, 1969). As the binding between the active centre of the antibody and the antigenic determinant is well known to be based on associations of polypeptide chains, he purported that two interacting polypeptides may be encoded in complementary strands of DNA (FIG. 11).
- Mekler also analysed the proposed interacting regions of pancreatic ribonuclease A and recorded that reading the complementary RNA of one of the interacting chains in the 5'-3' direction yielded the sequence of the other interactant. From these observations he suggested that there existed a specific code of interaction between amino acid side chains encoded by complementary codons at the RNA level (EXAMPLE 2).
- hydropathic character of an amino acid residue is related to the identity of the middle letter of the triplet codon from which it is transcribed.
- a triplet codon with thymine (T) as its middle base codes for a hydrophobic residue whilst adenine (A) codes for a hydrophilic residue.
- a triplet codon with middle bases cytosine (C) or guanine (G) encode residues which are relatively neutral and with similar hydropathy scores. Hydropathy is an index of the affinity of an amino acid for a polar environment, hydrophilic residues yielding a more negative score, whilst hydrophobic residues exhibit more positive scores.
- Blalock suggested that it is the linear pattern of amino acid hydropathy scores in a sequence (rather than the combination of specific residue identities) that defines the secondary structure environment. Furthermore, he suggested that sequences with inverted hydropathic profiles are complementary in shape by virtue of inverse forces determining their steric relationships.
- the generation of a complementary peptide is straightforward in cases where the DNA sequence information is available.
- the complementary base sequence is read in either the 5'-3' or 3'-5' direction and translated to the peptide sequence according to the genetic code.
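The derivation just described can be sketched in a few lines of Python. This is a minimal illustration, not the patented software: the function names (`translate`, `complementary_peptides`) are hypothetical, the input is assumed to be a sense-strand DNA sequence written 5' to 3' whose length is a multiple of three, and the standard genetic code is used with `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon ('*' marks a stop codon)."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def complementary_peptides(sense):
    """Return the peptides read from the complementary strand in both directions.

    `sense` is assumed to be written 5'->3'. The base-paired strand is then
    written 3'->5'; reversing it gives the 5'->3' reading.
    """
    paired = sense.upper().translate(COMPLEMENT)
    return {"5'-3'": translate(paired[::-1]), "3'-5'": translate(paired)}
```

For instance, the sense sequence `CTGAAA` (Leu-Lys) pairs with `GACTTT`; read 3'-5' this gives Asp-Phe, and read 5'-3' (as `TTTCAG`) it gives Phe-Gln.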
- the possible complementary amino acids for a leucine residue are glutamine (2 possible codons), stop (2 possible codons), glutamic acid (1 possible codon) and lysine (1 possible codon).
- glutamine would be chosen on the basis of statistical weight.
- Information such as this, along with the use of codon usage tables leads to a consensus approach to limiting the number of possible combinations of complementary sequences. Bost and Blalock (1989) and Shai et al . (1989) have employed methods of this type.
- Higher affinity peptides may serve a purpose in the development of therapeutics; for example, a complementary peptide to a coat protein of a virus may interfere with the virus-host interaction at the molecular level, thus providing a strategy to manage this type of disorder.
- This invention provides significant benefits for those interested in:
- FIG. 1 shows a block diagram illustrating one embodiment of a method of the present invention;
- FIG. 2 shows a block diagram illustrating one embodiment for carrying out Step 4 in FIG. 1;
- FIG. 3 shows a block diagram illustrating one embodiment for carrying out Step 5 in FIG. 1;
- FIG. 4 shows a block diagram illustrating one embodiment for carrying out Step 8 in FIG. 2 and 3;
- FIG. 5 shows a block diagram illustrating one embodiment for carrying out Step 8 in FIG. 2 and 3;
- FIG. 6 shows a block diagram illustrating one embodiment for carrying out Step 6 in FIG. 1;
- FIG. 7 shows a block diagram illustrating a second embodiment of a method of the present invention;
- FIG. 8 shows a block diagram illustrating one embodiment for carrying out Step 29 in FIG. 7;
- FIG. 9 shows a block diagram illustrating one embodiment for carrying out Step 30 in FIG. 7;
- FIG. 10 shows a diagram illustrating one embodiment of software design required to implement the ALS program;
- FIG. 11 shows a diagram illustrating the principle of complementary peptide derivation;
- FIG. 12 shows a diagram to illustrate antisense amino acid pairings inherent in the genetic code;
- FIG. 13 shows a representation of the Molecular Recognition Theory;
- FIG. 14 shows a graph and text illustrating biological data as an example of the utility of the ALS program.
- ALS antisense ligand searcher
- 'Antisense' refers to relationships between amino acids specified in EXAMPLES 2 and 4 (both 5'->3' derived and 3'->5' derived coding schemes).
- Diagrams describing the algorithms involved in this software are shown in FIG. 1-6.
- the present invention is directed toward a computer-based process, a computer-based system and/or a computer program product for analysing antisense relationships between protein or DNA sequences.
- a scheme of software architecture of a preferred embodiment is shown in FIG. 10.
- the method of the embodiment provides a tool for the analysis of protein or DNA sequences for antisense relationships.
- This embodiment covers analysis of DNA or protein sequences for intramolecular (within the same sequence) antisense relationships or inter-molecular (between 2 different sequences) antisense relationships. This principle applies whether the sequence contains amino acid information (protein) or DNA information, since the former may be derived from the latter.
- the overall process of the invention is to facilitate the batch analysis of an entire genome (collection of genes and/or protein sequences) for every possible antisense relationship of both inter- and intra-molecular nature.
- a protein sequence database, SWISS-PROT (Bairoch and Apweiler, 1999), may be analysed by the methods described.
- SWISS-PROT contains a list of protein sequences.
- the current invention does not specify in what format the input sequences are held; for this example we used a relational database to allow access to this data.
- the program runs in two modes.
- the first mode is to select the first protein sequence in SWISS-PROT and then analyse the antisense relationships between this sequence and all other protein sequences, one at a time.
- the program selects the second sequence and repeats this process. This continues until all of the possible relationships have been analysed.
- the second mode is where each protein sequence is analysed for antisense relationships within the same protein; each sequence is thus loaded from the database and analysed in turn for these properties.
- Both operational modes use the same core algorithms for their processes.
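The two operating modes amount to two ways of enumerating which sequences are compared. The sketch below is an assumption-laden illustration: the function names and the use of plain identifier lists are hypothetical, and the first mode is assumed to visit each unordered pair once (the text does not say whether a pair is revisited when the second sequence becomes the query).

```python
from itertools import combinations


def intermolecular_jobs(sequence_ids):
    """Mode 1: every sequence against every other sequence, each pair once."""
    return list(combinations(sequence_ids, 2))


def intramolecular_jobs(sequence_ids):
    """Mode 2: every sequence against itself."""
    return [(seq_id, seq_id) for seq_id in sequence_ids]
```

Each returned pair would then be handed to the same core comparison routine, which is what allows both modes to share one set of algorithms.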
- the core algorithms are described in detail below.
- An example of the output from this process is shown in EXAMPLES 5 and 6.
- EXAMPLE 5 shows a list of proteins in the SWISS-PROT database that contain highly improbable numbers of intramolecular antisense frames of size 10 (a frame is a section of the main sequence; it is described in more detail below). In EXAMPLE 5 the total number of antisense frames is shown. Another way of representing this data is to list the actual sequence information itself, as shown in EXAMPLE 6 and in the Sequence Listing (Seq ID Nos 1-32). An example of the biological relevance of peptides derived from this method is shown in FIG. 14. The embodiment can output the data in either of these formats as well as many others.
- protein sequence 1 is ATRGRDSRDERSDERTD and protein sequence 2 is GTFRTSREDSTYSGDTDFDE (universal 1 letter amino acid codes used).
- In step 1 a protein sequence, Sequence 1, is loaded.
- the protein sequence consists of an array of universally recognised amino acid one letter codes, e.g. 'ADTRGSRD'.
- the source of this sequence can be a database, or any other file type.
- Step 2 is the same operation as for step 1, except Sequence 2 is loaded.
- Decision step 3 involves comparing the two sequences and determining whether they are identical, or whether they differ. If they differ, processing continues to step 4, described in FIG. 2, otherwise processing continues to step 5, described in FIG. 3.
- Step 6 analyses the data resulting from either step 4 or step 5, and involves an algorithm described in FIG. 6.
- Description of parameters used in FIG. 2:
- a 'frame' is selected for each of the proteins selected in steps 1 and 2.
- a 'frame' is a specific section of a protein sequence. For example, for sequence 1, the first frame of length '5' would correspond to the characters 'ATRGR'.
- the user of the program decides the frame length as an input value. This value corresponds to parameter (n) in FIG. 2.
- a frame is selected from each of the protein sequences (sequence 1 and sequence 2). Each pair of frames that are selected are aligned and frame position parameter (f) is set to 0.
- the first pair of amino acids are 'compared' using the algorithm shown in FIG.
- FIG. 3 shows a block diagram of the algorithmic process that is carried out in the conditions described in FIG. 1.
- Step 12 is the only difference between the algorithms in FIG. 2 and FIG. 3.
- the value of (ip2) (the position of the frame in sequence 2) is set to at least the value of (ip1) at all times: as Sequence 1 and Sequence 2 are identical, if (ip2) is less than (ip1) then the same sequences are being searched twice.
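The frame selection and the restriction on the second frame position can be sketched as a pair of nested loops. This is only an illustration under stated assumptions: `frames` and `frame_pairs` are hypothetical names, positions are zero-based, and `ip1` and `ip2` stand for the frame positions in Sequence 1 and Sequence 2.

```python
def frames(sequence, n):
    """All contiguous frames of length n, in order of position."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def frame_pairs(seq1, seq2, n):
    """Yield (ip1, ip2, frame1, frame2) for every pair of frames to compare.

    When the two sequences are identical, ip2 starts at ip1 so that no pair
    of frames is examined twice.
    """
    identical = seq1 == seq2
    for ip1 in range(len(seq1) - n + 1):
        first_ip2 = ip1 if identical else 0
        for ip2 in range(first_ip2, len(seq2) - n + 1):
            yield ip1, ip2, seq1[ip1:ip1 + n], seq2[ip2:ip2 + n]
```

With the example sequence 1 above and a frame length of 5, the first frame returned by `frames` is `ATRGR`.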
- FIG. 4 and 5 describe the process in which a pair of amino acids (FIG. 4) or a pair of triplet codons (FIG. 5) is assessed for an antisense relationship.
- the antisense relationships are listed in EXAMPLES 2 and 4.
- In step 13 the currently selected amino acid from the current frame of Sequence 1 and the currently selected amino acid from the current frame of Sequence 2 (determined by parameter (f) in FIG. 2 and 3) are selected.
- the first amino acid from the first frame of Sequence 1 would be 'A' and the first amino acid from the first frame of Sequence 2 would be 'G'.
- In step 14 the ASCII character codes for the selected single uppercase characters are determined and multiplied and, in step 15, the product is compared with a list of pre-calculated scores, which represent the antisense relationships in EXAMPLES 2 and 4. If the amino acids are deemed to fulfil the criteria for an antisense relationship (the product matches a value in the pre-calculated list) then an output parameter (T) is set to 1, otherwise the output parameter is set to 0 (see FIG. 4).
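A minimal sketch of the ASCII-product lookup follows. The full pairing tables of EXAMPLES 2 and 4 are not reproduced here, so the sketch uses only the three pairs that the text elsewhere names as complementary partners (A-S, L-K and V-D); `ILLUSTRATIVE_PAIRS` and `is_antisense` are hypothetical names.

```python
# Illustrative subset only; the real list would come from EXAMPLES 2 and 4.
ILLUSTRATIVE_PAIRS = [("A", "S"), ("L", "K"), ("V", "D")]

# Pre-calculated products of the ASCII codes of each antisense pair.
ANTISENSE_PRODUCTS = {ord(a) * ord(b) for a, b in ILLUSTRATIVE_PAIRS}


def is_antisense(aa1, aa2):
    """Return T = 1 if the product of the ASCII codes is in the list, else 0."""
    return 1 if ord(aa1) * ord(aa2) in ANTISENSE_PRODUCTS else 0
```

Because multiplication is commutative, one stored product covers both orderings of a pair, which is presumably why a product rather than a concatenation is used.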
- Steps 16-21 relate to the case where the input sequences are DNA/RNA code rather than protein sequence.
- Sequence 1 could be AAATTTAGCATG and Sequence 2 could be TTTAAAGCATGC.
- the domain of the current invention includes both of these types of information as input values, since the protein sequence can be decoded from the DNA sequence, in accordance with the genetic code.
- Steps 16-21 determine antisense relationships for a given triplet codon.
- the currently selected triplet codon for both sequences is 'read'.
- the first triplet codon of the first frame would be 'AAA'.
- For Sequence 2 this would be 'TTT'.
- the second character of each of these strings is selected.
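The text here does not state what is done with the two middle bases once selected. One plausible reading, given the earlier observation that the middle base of a codon tracks hydropathy (T hydrophobic, A hydrophilic), is that the two middle bases are tested for complementarity. The sketch below encodes that reading as an explicit assumption; `codons` and `middle_bases_complementary` are hypothetical names.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def codons(sequence):
    """Split a nucleotide sequence into successive triplet codons."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def middle_bases_complementary(codon1, codon2):
    """Return 1 if the second bases of the two codons pair with each other.

    ASSUMPTION: this is one interpretation of steps 16-21, not a confirmed
    description of them. RNA input is handled by treating U as T.
    """
    m1 = codon1[1].upper().replace("U", "T")
    m2 = codon2[1].upper().replace("U", "T")
    return 1 if DNA_COMPLEMENT[m1] == m2 else 0
```

For the example sequences, the first codons are `AAA` and `TTT`, whose middle bases A and T are complementary.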
- FIG. 6 illustrates the process of rationalising the results after the comparison of 2 protein or 2 DNA sequences.
- In step 22 the first 'result' is selected.
- a result consists of information on a pair of frames that were deemed 'antisense' in FIG. 2 or 3. This information includes location, length, score (i.e. the sum of scores for a frame) and frame type (forward or reverse, depending on orientation of sequences with respect to one another).
- the frame size, the score values and the length of the parent sequence are then used to calculate the probability of that frame existing.
- the statistics which govern the probability of any frame existing are described in the next section and refer to equations 1-4. If the probability is less than a user chosen value (p), then the frame details are 'stored' for inclusion in the final result set (step 24).
- the number of complementary frames in a protein sequence can be predicted from appropriate use of statistical theory.
- This value (p) is calculated as 2.98.
- For a single 'frame' of size (n) the probability (C) of pairing a number of complementary amino acids (r) can be described by the binomial distribution (equation 2):
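The equation itself did not survive in this copy of the text. As a hedged illustration, the standard binomial probability mass function is sketched below; `p_pair` (the chance that one aligned pair of residues is complementary) is a hypothetical input, since the text does not show how it is obtained from the quantities defined above.

```python
from math import comb


def frame_probability(n, r, p_pair):
    """Binomial probability of exactly r complementary pairs in a frame of size n."""
    return comb(n, r) * p_pair ** r * (1 - p_pair) ** (n - r)


def frame_probability_at_least(n, r, p_pair):
    """Probability of r or more complementary pairs in a frame of size n."""
    return sum(frame_probability(n, k, p_pair) for k in range(r, n + 1))
```

A frame whose probability falls below the user-chosen threshold would then be retained as a candidate result.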
- a region of protein may be complementary to itself.
- A-S, L-K and V-D are complementary partners.
- a six amino acid wide frame would thus be reported (in reverse orientation).
- a frame of this type is only specified by half of the residues in the frame. Such a frame is called a reverse turn.
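A reverse turn can be tested by pairing each residue in the first half of a frame with its mirror-image position in the second half. The sketch below is illustrative only: it uses the three partner pairs named above, and the six-residue frame `ALVDKS` is a hypothetical example constructed from those pairs, not a sequence taken from the source.

```python
# Partner pairs named in the text; a full table would come from EXAMPLES 2 and 4.
PARTNERS = {frozenset("AS"), frozenset("LK"), frozenset("VD")}


def is_reverse_turn(frame):
    """True if residue i is complementary to residue (n - 1 - i) for the whole frame."""
    n = len(frame)
    if n % 2 != 0:
        return False
    return all(frozenset((frame[i], frame[n - 1 - i])) in PARTNERS
               for i in range(n // 2))
```

Only the first half of the frame needs to be inspected, which reflects the remark that such a frame is specified by half of its residues.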
- In this scenario, once half of the frame length has been selected with complementary partners, there is a finite probability that those partners are the sequential neighbouring amino acids to those already selected. The probability of this occurring in any protein of any sequence is (equation 6):
- the software of the embodiment incorporates all of the statistical models reported above such that it may assess whether a frame qualifies as a forward frame, reverse frame, or reverse turn.
- the current invention provides a novel method for aiding the determination of three-dimensional structure.
- This software performs the following tasks:
- AXRA overcomes previous limitations of analysing protein sequences for antisense interactions by recognising for the first time that antisense pairings also exist in discontinuous regions of proteins, and thus antisense sequence searching can be expanded to 3-dimensional structures.
- User options allow control over searching parameters such as frame length, minimum distance for partner and number of neighbouring residues from the same chain to exclude from analysis.
- In step 25 the program reads a file containing the cartesian x, y, z co-ordinates of a protein structure and these are stored by conventional programmatic means (step 26).
- the protein sequence (1 letter amino acid codes) is also read from this file and stored in memory as an array of characters.
- In step 27 the distances between each alpha-carbon atom (as denoted in Brookhaven databank format, CA) and all other carbon atoms that make up each amino acid (CB, C1, C2, Cn) are calculated by vector mathematics from the cartesian co-ordinates.
- the program user chooses (through the UI) which atom types (e.g. CB, C1 etc.) are used in the calculation of the distances between two amino acids.
- the (x) closest amino acids for each residue are stored for further analysis.
- the value (x) is provided by the user from a suitable user interface (UI).
- the default maximum distance in this process is 15 angstroms; if fewer than (x) amino acids fall within this distance then only those within this distance will be stored.
- the user may change this value through the UI. This is known as the Nearest Neighbour Sphere (NNS).
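The construction of the Nearest Neighbour Sphere can be sketched as follows. This is a simplified illustration under stated assumptions: one representative atom position per residue is supplied as an `(x, y, z)` tuple, `nearest_neighbour_sphere` and its parameter names are hypothetical, and the `exclude` parameter models the user option to skip sequence neighbours from the same chain.

```python
from math import dist


def nearest_neighbour_sphere(coords, x, max_dist=15.0, exclude=0):
    """For each residue, list the indexes of its (at most x) closest residues.

    coords   -- one (x, y, z) tuple per residue, in sequence order
    x        -- maximum number of neighbours to keep per residue
    max_dist -- radius of the sphere in angstroms (15 by default)
    exclude  -- residues within this many sequence positions are skipped
    """
    nns = []
    for i, ci in enumerate(coords):
        candidates = sorted(
            (dist(ci, cj), j)
            for j, cj in enumerate(coords)
            if abs(i - j) > exclude
        )
        nns.append([j for d, j in candidates if d <= max_dist][:x])
    return nns
```

The result is a list of arrays, one per sequence position, holding neighbour indexes ordered from nearest to farthest.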
- the program flow follows the user's choice (input through the UI) as to whether the analysis should be based on hydropathy (step 29) or whether the analysis should be based on antisense relationships (step 30)
- In step 31 the antisense relationships between the first amino acid in the protein sequence (stored in step 25) and the list of amino acids stored as the nearest neighbour sphere (NNS) are determined.
- NNS is a list of arrays - one array for each position in the protein sequence
- each amino acid in the sequence is selected in turn and compared with each member of its NNS (stored in step 27) using the algorithm depicted in FIG.
- Decision step 32 routes the user's selection (from the UI) of whether to find regions of antisense relationships between 2 continuous parts of the same sequence (step 33), 1 continuous and 1 discontinuous part of the same sequence (step 34) or 2 discontinuous parts of the same sequence (step 35).
- the first 'frame' of length (n) of the protein sequence is selected.
- the frame is a section of the total sequence, and the length of this frame (n) is chosen by the user through the UI.
- a Score Threshold (ST) parameter is chosen through the UI.
- the first frame (of length (n)) is selected from the protein sequence.
- the NNS is analysed. If any continuous combinations of antisense relationships within the NNS are found where the aggregate Score (S) is greater than the user chosen Score Threshold (ST) then the amino acid sequence locations are stored as a 'hit' frame. This is repeated for each frame in the protein sequence. When the process has finished the 'hit frame' results are then listed in an appropriate UI format.
- the first 'frame' of length (n) of the protein sequence is selected.
- the frame is a section of the total sequence, and the length of this frame (n) is chosen by the user through the UI.
- a Score Threshold (ST) parameter is chosen through the UI.
- the first frame (of length (n)) is selected from the protein sequence.
- For each amino acid in each frame the NNS is analysed. If any discontinuous combinations of antisense relationships within the NNS are found where the aggregate score (S) is greater than the user chosen Score Threshold (ST) then the amino acid sequence locations are stored as a 'hit' frame. This is repeated for each frame of the protein sequence.
- In step 40 the hydropathic comparison scores between the first amino acid in the protein sequence (stored in step 25) and the list of amino acids stored as the nearest neighbour sphere (NNS) are determined using the following equation:
- (a1) and (a2) are the hydropathy scores of the amino acids selected, as scored on the Kyte and Doolittle scale (Kyte and Doolittle, 1982). This equation is evaluated for each pair of amino acids specified by the currently selected amino acid and its partners in the NNS and the resulting (H) values are scored.
- the user may specify input values determining the maximum (maxd) and minimum (mind) distances that relationships must fall within to be processed further. This process is repeated for all amino acids in the protein sequence.
- the overall process here is to define the hydropathic relationships between proximal amino acids. Programmatically, we end up with a list of arrays where each array contains a list of hydropathic scores for amino acids neighbouring the amino acid specified by the index in the main list. This list of arrays (LR) is then used for steps 37, 38 or 39.
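The scoring equation is missing from this copy of the text, so the sketch below substitutes an explicitly assumed form: H is taken as the absolute value of the sum of the two Kyte and Doolittle hydropathy scores, which is zero for perfectly inverted hydropathy and grows as the residues become more alike. This is consistent with the later statement that lower scores mean greater complementarity, but it is an assumption, not the source's equation. The function names are hypothetical.

```python
# Kyte and Doolittle (1982) hydropathy index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathic_score(aa1, aa2):
    """ASSUMED form of H: |a1 + a2|, lowest for inverted hydropathy."""
    return abs(KYTE_DOOLITTLE[aa1] + KYTE_DOOLITTLE[aa2])


def hydropathy_lists(sequence, nns):
    """Build the list of arrays: one array of H values per sequence position.

    nns[i] holds the sequence indexes of the neighbours of residue i.
    """
    return [[hydropathic_score(sequence[i], sequence[j]) for j in neighbours]
            for i, neighbours in enumerate(nns)]
```

Under this assumed form, isoleucine (4.5) paired with arginine (-4.5) scores 0, the most complementary value possible.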
- Decision step 36 routes the user's selection (from the UI) of whether to find regions of complementary hydropathy between 2 continuous parts of the same sequence (step 37), 1 continuous and 1 discontinuous part of the same sequence (step 38) or 2 discontinuous parts of the same sequence (step 39).
- the frame is a section of the total sequence, and the length of this frame (n) is chosen by the user through the UI. Also chosen through the UI is a Hydropathy Score Threshold (HST) parameter.
- the first 'frame' of length (n) of the protein sequence is selected. In this first frame the first amino acid is selected.
- the LOWEST value of the list of hydropathy scores formed in step 40 is taken and written to a Result Frame (RF). (The sequence indexes of the amino acids that are responsible for the lowest scores are written to another list (SL) such that a link between amino acid location and hydropathy is created.)
- (H) is defined in the equation above
- (L) is the frame length, denoting the length of the amino acid sequence that is used for the comparison. The lower the score, the greater the degree of hydropathic complementarity for the defined region.
- the sequence indexes of the amino acids that were responsible for the hydropathy values used in equation 10 are analysed for continuity (i.e. are these amino acids continuous, such as position 10, position 11, position 12 etc) . If continuity is found, the frame is stored for further analysis.
- the frame is a section of the total sequence, and the length of this frame (n) is chosen by the user through the UI. Also chosen through the UI is a Hydropathy Score Threshold (HST) parameter.
- the first 'frame' of length (n) of the protein sequence is selected.
- the LOWEST value of the list of hydropathy scores formed in step 40 is taken and written to a Result Frame (RF).
- the sequence indexes of the amino acids that are responsible for the lowest scores are written to another list (SL) such that a link between amino acid location and hydropathy is created.
- This is repeated for each amino acid in the frame until we have a completed Result Frame (RF) that contains a list of the lowest hydropathy scores available for the specified amino acids.
- the average hydropathy for this frame is then determined by the following equation 8.
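The Result Frame procedure can be sketched as below. Several points are assumptions: the averaging equation is not reproduced in this copy, so the frame score is taken as the mean of the lowest H values over the frame length; continuity is taken to mean that the partner indexes run consecutively in one direction; and all function and parameter names are hypothetical.

```python
def result_frame(lr, partners, start, n):
    """Collect the lowest score (RF) and its partner index (SL) for each residue.

    lr[i]       -- hydropathy scores for the neighbours of residue i
    partners[i] -- sequence indexes of those same neighbours, in the same order
    Returns (rf, sl), or None if any residue in the frame has no neighbours.
    """
    rf, sl = [], []
    for i in range(start, start + n):
        if not lr[i]:
            return None
        best = min(range(len(lr[i])), key=lr[i].__getitem__)
        rf.append(lr[i][best])
        sl.append(partners[i][best])
    return rf, sl


def average_hydropathy(rf):
    """ASSUMED frame score: the mean of the lowest scores over the frame length."""
    return sum(rf) / len(rf)


def is_continuous(sl):
    """True if the partner indexes are consecutive, ascending or descending."""
    steps = [b - a for a, b in zip(sl, sl[1:])]
    return all(s == 1 for s in steps) or all(s == -1 for s in steps)
```

A frame would be kept when its average falls below the Hydropathy Score Threshold and, for the continuous case, when `is_continuous` holds for its partner list.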
- In step 38 all hydropathic relationships (equation 8) between each amino acid and its NNS counterparts are written out to a display for further analysis.
- the program flow is illustrated in FIG. 7.
- the software was used to select regions of complementary hydropathy within the IL-1β/IL-1R crystal structure.
- the program was run on the X-ray file (pdb2itb) and selected the most complementary region between the ligand and receptor as consisting of residues 47-54 of IL-1β (sequence QGEESND) and residues 245, 244, 303, 298, 242, 249, 253 of the receptor (sequence W, S, V, I, G, Y, I).
- This demonstrates two things. Firstly, it shows that the software functions properly, in that it can locate regions of hydropathic complementarity between a receptor-ligand pair.
- the region of IL-1β which has the closest residues of greatest hydropathic inversion to the IL-1 type I receptor is the trigger loop region of IL-1β, to which we have previously designed antisense peptides.
- the receptor-ligand contact pairs analysed by the software as displaying the largest difference in hydropathic indices are illustrated below.
- Region of complementary hydropathy within the X-ray crystal structure of IL-1β complexed with its type I receptor (pdb file 2itb)
- C-alpha traces of the proteins are displayed with regions picked out by the 3D-hydropathy map tool highlighted in white.
- This invention presents a novel informatics technology that greatly accelerates the pace for initial identification and subsequent optimization of small peptides that bind to protein-protein targets. Using this technology an operator can systematically produce large numbers or 'catalogues' of small peptides that are very useful and specific agonists/antagonists of protein-protein interactions.
- peptides are ideally suited for use in drug discovery programs as biological tools for probing gene function, or as a basis for configuring drug discovery screens or as a molecular scaffold for medicinal chemistry.
- peptides with a high affinity for a protein could form drugs in their own right.
- MOUSE Q99020 CARG-BINDING FACTOR-A 285 6 9 3 15 0.000223
- DROME Q24563 DOPAMINE RECEPTOR 2 539 8 0 0 8 0.000796
- the antisense homology box a new motif within proteins that encodes biologically active peptides. Nature Medicine. 1:894-901.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00979742A EP1230615A2 (en) | 1999-11-19 | 2000-11-20 | Method for manipulating protein or dna sequence data in order to generate complementary peptide ligands |
AU17134/01A AU1713401A (en) | 1999-11-19 | 2000-11-20 | Method for manipulating protein or dna sequence data (in order to generate complementary peptide ligands) |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9927485.4 | 1999-11-19 | ||
GB9927485A GB2356401A (en) | 1999-11-19 | 1999-11-19 | Method for manipulating protein or DNA sequence data |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001037191A2 (en) | 2001-05-25 |
WO2001037191A3 (en) | 2002-03-21 |
Family
ID=10864860
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/GB2000/004418 WO2001037191A2 (en) | 1999-11-19 | 2000-11-20 | Method for manipulating protein or dna sequence data (in order to generate complementary peptide ligands) |
Country Status (5)
Country | Link |
---|---|
US (1) | US6721663B1 (en) |
EP (1) | EP1230615A2 (en) |
AU (1) | AU1713401A (en) |
GB (1) | GB2356401A (en) |
WO (1) | WO2001037191A2 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB0119890D0 (en) * | 2001-08-15 | 2001-10-10 | Proteom Ltd | Apparatus and method for predicting rules of protein sequence interactions |
BR0315054A (en) * | 2002-11-01 | 2005-08-16 | Boys Town Nat Res Hospital | Inductively linking to alpha 1 beta 1 integrin and jobs |
AU2003902621A0 (en) * | 2003-05-27 | 2003-06-12 | Commonwealth Scientific And Industrial Research Organisation | Ecdysone receiptor ligand-binding domain structure |
WO2007035406A1 (en) * | 2005-09-16 | 2007-03-29 | Orthologic Corp. | Antibodies to complementary peptides of thrombin or portions thereof |
US9268903B2 (en) | 2010-07-06 | 2016-02-23 | Life Technologies Corporation | Systems and methods for sequence data alignment quality assessment |
US10327812B2 (en) | 2015-11-04 | 2019-06-25 | Rainbow Medical Ltd. | Pericardial access device |
US10667842B2 (en) | 2017-11-24 | 2020-06-02 | Rainbow Medical Ltd. | Pericardial needle mechanism |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5081584A (en) * | 1989-03-13 | 1992-01-14 | United States Of America | Computer-assisted design of anti-peptides based on the amino acid sequence of a target peptide |
US5212072A (en) * | 1985-03-01 | 1993-05-18 | Board Of Regents, The University Of Texas System | Polypeptides complementary to peptides or proteins having an amino acid sequence or nucleotide coding sequence at least partially known and methods of design therefor |
US5523208A (en) * | 1994-11-30 | 1996-06-04 | The Board Of Trustees Of The University Of Kentucky | Method to discover genetic coding regions for complementary interacting proteins by scanning DNA sequence data banks |
WO1998037242A1 (en) * | 1997-02-24 | 1998-08-27 | Tm Technologies, Inc. | Process for selecting anti-sense oligonucleotides |
WO1999042621A2 (en) * | 1998-02-21 | 1999-08-26 | Tm Technologies, Inc. | Methods for identifying or characterising a site based on the thermodynamic properties of nucleic acids |
WO1999055911A1 (en) * | 1998-04-24 | 1999-11-04 | Fang Fang | Identifying peptide ligands of target proteins with target complementary library technology (tclt) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5884230A (en) * | 1993-04-28 | 1999-03-16 | Immunex Corporation | Method and system for protein modeling |
Date | Country | Application | Publication | Status |
---|---|---|---|---|
1999-11-19 | GB | GB9927485A | GB2356401A (en) | Not active: Withdrawn |
2000-05-16 | US | US09/571,854 | US6721663B1 (en) | Not active: Expired - Fee Related |
2000-11-20 | WO | PCT/GB2000/004418 | WO2001037191A2 (en) | Not active: Application Discontinuation |
2000-11-20 | AU | AU17134/01A | AU1713401A (en) | Not active: Abandoned |
2000-11-20 | EP | EP00979742A | EP1230615A2 (en) | Not active: Withdrawn |
Non-Patent Citations (1)
Title |
---|
BARANYI L ET AL: "THE ANTISENSE HOMOLOGY BOX: A NEW MOTIF WITHIN PROTEINS THAT ENCODES BIOLOGICALLY ACTIVE PEPTIDES", NATURE MEDICINE, NATURE PUBLISHING CO, US, vol. 1, no. 9, September 1995 (1995-09), pages 894-901, XP000984564, ISSN: 1078-8956 * |
Also Published As
Publication number | Publication date |
---|---|
US6721663B1 (en) | 2004-04-13 |
EP1230615A2 (en) | 2002-08-14 |
WO2001037191A3 (en) | 2002-03-21 |
GB2356401A (en) | 2001-05-23 |
GB9927485D0 (en) | 2000-01-19 |
AU1713401A (en) | 2001-05-30 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Janin et al. | Protein–protein interaction and quaternary structure | |
Wan et al. | Large scale statistical prediction of protein-protein interaction by potentially interacting domain (PID) pair | |
Rufino et al. | Predicting the conformational class of short and medium size loops connecting regular secondary structures: application to comparative modelling | |
Šali et al. | Derivation of rules for comparative protein modeling from a database of protein structure alignments | |
DK1987178T3 (en) | Process for the construction and screening of peptide structure libraries | |
US6804611B2 (en) | Apparatus and method for automated protein design | |
Chiusano et al. | Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code | |
Dandekar et al. | Identifying the tertiary fold of small proteins with different topologies from sequence and secondary structure using the genetic algorithm and extended criteria specific for strand regions | |
US20060160138A1 (en) | Compositions and methods for protein design | |
US20030130797A1 (en) | Protein modeling tools | |
WO2007008951A1 (en) | Compositions and methods for design of non-immunogenic proteins | |
EP1237907A2 (en) | Complementary peptide ligands generated from the human genome | |
Bazan | Helical fold prediction for the cyclin box | |
WO2001037191A2 (en) | Method for manipulating protein or dna sequence data (in order to generate complementary peptide ligands) | |
Brylinski et al. | Early-stage folding in proteins (in silico) sequence-to-structure relation | |
Kumar et al. | Automated protein design: Landmarks and operational principles | |
Hodgman | The elucidation of protein function from its amino acid sequence | |
Brylinski et al. | SPI–Structure Predictability Index for Protein Sequences. | |
López-Romero et al. | Prediction of functional sites in proteins by evolutionary methods | |
CA2548482A1 (en) | Protein engineering with analogous contact environments | |
CN116092573A (en) | Design method of protein interaction inhibitory peptide | |
US20040214206A1 (en) | Method of designing multifunctional base sequence | |
Heringa | 1. AMINO ACID SEQUENCE COMPARISON | |
Jiang | Computational studies of molecular recognition and protein folding patterns | |
Parmeggiani | Design of armadillo repeat protein scaffolds |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
AK | Designated states | Kind code of ref document: A3; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A3; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
WWE | WIPO information: entry into national phase | Ref document number: 2000979742; Country of ref document: EP |
WWP | WIPO information: published in national office | Ref document number: 2000979742; Country of ref document: EP |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
WWW | WIPO information: withdrawn in national office | Ref document number: 2000979742; Country of ref document: EP |