CBU - Computational Biology Unit
 

Send an email to services@cbu.uib.no

RegExpMaker 1.0
Submit Clustal alignment file:


Options:
Statistics (1)
First position in alignment is the N-terminus
Last position in alignment is the C-terminus

(1) Statistics: The calculated regular expressions are searched against all eukaryotic and bacterial sequences in the SwissProt database. For regular expressions with more than 15 positions, statistics are switched off automatically because the search becomes too time-consuming.

Links:
  • ELM - Eukaryotic Linear Motif resource for predicting functional sites in proteins

  • Pratt - Pattern discovery
  • eMOTIF - Motif discovery and searching
  • Teiresias - Sequence pattern discovery
  • FPAT - Regular expression searches of sequence databases
  • SIRW - Combines the ability to search protein/nucleotide databases with keywords and a sequence motif



  • About

    Many important functional aspects of proteins can be attributed to short linear motifs, including post-translational modification sites and protein-protein interaction peptides (see ELM - http://elm.eu.org). Regular expressions can describe such motifs and can be used to detect or predict them. However, because linear motifs are so short, regular expressions describing them often massively overpredict, and information beyond sequence is needed to improve predictions (ELM).

    RegExpMaker takes a multiple alignment (Clustal format) and uses the amino acid groups adopted from WR Taylor, J Theor Biol. 1986 Mar 21;119(2):205-18 (see figure below). If the amino acids in a position of the alignment fall within one of the predefined groups, the position is represented as a list of the observed residues within brackets (e.g. [VIL]). If the amino acids fall within more than one of the predefined groups, the position is represented in Regular expression 1 by a '.', meaning that all amino acids are allowed in this position. In Regular expression 2, such a position is instead represented by the amino acids that are not allowed (e.g. [^DE]), provided that groups can be identified that have no amino acid present in that position.
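
    The per-position rule described above can be sketched in Python as follows. This is a minimal illustration, not RegExpMaker's actual code: the GROUPS table is a small made-up subset chosen for the example (the real tool uses the groups adopted from Taylor 1986), the function name column_tokens is hypothetical, and writing a fully conserved position as a bare letter is an assumption.

```python
# Illustrative groups only (assumption); not the table used by RegExpMaker.
GROUPS = {
    "aliphatic": set("ILV"),
    "aromatic": set("FWY"),
    "positive": set("HKR"),
    "negative": set("DE"),
    "tiny": set("AGS"),
}


def column_tokens(residues):
    """Return (token for expression 1, token for expression 2) for one
    gap-free alignment column given as a string of residues."""
    observed = set(residues)
    listed = "".join(sorted(observed))

    # All observed residues fit inside a single group: list them.
    if any(observed <= members for members in GROUPS.values()):
        token = listed if len(observed) == 1 else "[" + listed + "]"
        return token, token

    # Residues span several groups: expression 1 allows anything.
    # Expression 2 forbids the members of every group that has no
    # residue present in this column.
    excluded = set()
    for members in GROUPS.values():
        if not (observed & members):
            excluded |= members
    token2 = "[^" + "".join(sorted(excluded)) + "]" if excluded else "."
    return ".", token2
```

    With these illustrative groups, column_tokens("VIL") gives "[ILV]" for both expressions, while column_tokens("VK") gives "." for expression 1 and "[^ADEFGSWY]" for expression 2.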

    Additionally, gaps in the alignment are interpreted as meaning that the position is optional. The user can also specify whether the first position of the submitted alignment is the N-terminus and whether the last position is the C-terminus; this information is built into the resulting regular expression.
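
    How gaps and terminus information might end up in the expression can be sketched as below. The assumptions are stated plainly: an optional position is written with '?', the termini are written with the POSIX anchors '^' and '$', and each position simply lists its observed residues (the grouping step is left out). The name build_regex and its parameters are hypothetical.

```python
import re


def build_regex(columns, n_terminal=False, c_terminal=False):
    """Join alignment columns (strings, '-' marks a gap) into one regex."""
    parts = []
    for column in columns:
        observed = sorted(set(column) - {"-"})
        if not observed:
            continue  # a column consisting only of gaps contributes nothing
        if len(observed) == 1:
            token = observed[0]
        else:
            token = "[" + "".join(observed) + "]"
        if "-" in column:
            token += "?"  # a gap makes the position optional (assumed notation)
        parts.append(token)
    prefix = "^" if n_terminal else ""
    suffix = "$" if c_terminal else ""
    return prefix + "".join(parts) + suffix


pattern = build_regex(["MM", "A-", "KR"], n_terminal=True)
print(pattern)                             # ^MA?[KR]
print(bool(re.search(pattern, "MKTLLV")))  # True: the optional A is absent
print(bool(re.search(pattern, "GMAKTL")))  # False: motif is not at the N-terminus
```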

    RegExpMaker uses the POSIX syntax for regular expressions (for details on syntax and examples: ELM), in contrast to Prosite, where a slightly different syntax is used (more information here).
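
    To illustrate the difference between the two notations, the sketch below rewrites a simple Prosite-style pattern in POSIX-style syntax. It is not part of RegExpMaker; the function name prosite_to_regex is hypothetical, and the sketch only covers the basic Prosite elements ('-' separators, 'x' for any residue, '{...}' for forbidden residues, '(n)' or '(n,m)' for repetition, '<' and '>' for the termini). Anchors placed inside brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic Prosite-style pattern into a POSIX-style regex."""
    out = pattern.rstrip(".")          # Prosite patterns end with a period
    out = out.replace("-", "")         # element separator
    out = out.replace("x", ".")        # any residue
    # Braces must be rewritten before parentheses, because parentheses
    # are turned into braces in the next step.
    out = out.replace("{", "[^").replace("}", "]")   # forbidden residues
    out = out.replace("(", "{").replace(")", "}")    # repetition counts
    out = out.replace("<", "^").replace(">", "$")    # N- and C-terminus
    return out


regex = prosite_to_regex("<A-x(2,3)-[ST]-{P}-K.")
print(regex)                                  # ^A.{2,3}[ST][^P]K
print(bool(re.search(regex, "AGGSLKWW")))     # True
```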



    Pål Puntervoll (C) 2003

    This page is maintained by webmaster@bccs.uib.no. Last updated: Tuesday 12 February, 2008