CBU - Computational Biology Unit
 
The Pratt program is able to discover patterns conserved in sets of unaligned protein sequences. The output is in the form of Prosite patterns.
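For readers unfamiliar with the Prosite pattern notation, the following is a minimal sketch of how such a pattern can be translated into a regular expression and tested against a sequence. It is not part of Pratt. The function name prosite_to_regex and the example pattern and sequence are hypothetical, chosen only for illustration, and the sketch handles only the common notation elements (x, [..], {..}, repeat counts, and the < and > anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression.

    Illustrative helper only (an assumption, not a Pratt function).
    """
    pattern = pattern.rstrip(".")  # Prosite database entries end with a period
    parts = []
    for element in pattern.split("-"):
        # {P} means "any residue except P"  ->  [^P]
        element = element.replace("{", "[^").replace("}", "]")
        # x(2,4) means "2 to 4 of any residue"  ->  x{2,4}
        element = element.replace("(", "{").replace(")", "}")
        # lowercase x is the wildcard position  ->  .
        element = element.replace("x", ".")
        # < and > anchor the pattern to the sequence termini
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


# Hypothetical pattern and sequence, for illustration only.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")
match = re.search(regex, "MKCAAACDEFL")
print(regex, match is not None)
```

Each pattern element maps to one regular-expression fragment, so the pattern's position-by-position structure is preserved in the translation.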


S-option

The S option controls how patterns are scored. Five scoring schemes are available:

info
patterns are scored by their information content as defined in (Jonassen et al., 1995). Note that a pattern's score is independent of which sequences it matches.

mdl
patterns are scored by a scheme derived from the Minimum Description Length (MDL) principle. It is related to the info scheme, but penalises patterns that match few sequences relative to patterns that match many. Parameters Z0-Z3 appear when this scoring scheme is used.

tree
a pattern is scored higher if it contains more information and/or if it matches more diverse sequences. The sequence diversity is calculated from a dendrogram, which must be supplied as input.

dist
similar to the tree scoring, except that a matrix of pairwise similarities between all input sequences is used instead of the tree. The matrix must be supplied as input.

ppv
a measure of Positive Predictive Value - it is assumed that the input sequences constitute a family, and are all contained in the Swiss-Prot database. PPV measures how certain one can be that a sequence belongs to the family given that it matches the pattern.


For the last three scoring schemes (tree, dist and ppv), an input file is needed, and the SF option appears, allowing the user to specify the file name.
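As an illustration of the ppv description above, the following minimal sketch computes a Positive Predictive Value using the standard definition: true positives divided by all database sequences matching the pattern. How Pratt itself computes this score is not specified here. The function name, argument names and sequence identifiers are hypothetical.

```python
def positive_predictive_value(matched_ids, family_ids):
    """Return the fraction of pattern matches that belong to the family.

    matched_ids: identifiers of all database sequences matching the pattern
    family_ids:  identifiers of the sequences that constitute the family
    Both arguments are assumptions made for this illustration.
    """
    matched = set(matched_ids)
    family = set(family_ids)
    if not matched:
        raise ValueError("the pattern matches no sequences; PPV is undefined")
    true_positives = len(matched & family)  # matches that are family members
    return true_positives / len(matched)


# Hypothetical identifiers: three of the four matches are family members.
ppv = positive_predictive_value(
    matched_ids={"P1", "P2", "P3", "Q9"},
    family_ids={"P1", "P2", "P3", "P4"},
)
print(ppv)  # 0.75
```

A value close to 1 means that nearly every sequence matching the pattern is a family member. A lower value means that the pattern also matches many sequences outside the family.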

This page is maintained by webmaster@bccs.uib.no. Last updated: Tuesday 12 February, 2008