Phyrex is a program for reconstructing ancestral gene expression
profiles and sequence data on a given phylogenetic tree. The program
uses a Minimum Evolution algorithm to reconstruct the gene expression
profiles and a combination of BaseML and ClustalW to reconstruct the
ancestral sequences. It also provides a framework for correlating these
analyses and for further analysing candidates of interest.
Phyrex can be downloaded from this site. The latest version of the
program is found under the "Software" heading below. Any
download of this program is subject to the terms given in the section
"Copyrights" and to the terms that apply to PAML and ClustalW.
As genomes evolve under selective pressure, not only do protein coding
sequences evolve, but the gene regulatory sites affecting expression
profiles and alternative splice site usage also evolve. To understand
the evolutionary history of gene and genome functionality and the
selective pressures that have affected the evolution of genomes, it is
desirable to reconstruct the ancestral state of gene expression and
splice site usage. As a first step, we have developed a Minimum
Evolution approach based on the use of gene expression profiles. The
approach is implemented to work with large-scale datasets such as
microarrays, e.g. the Affymetrix genechip technology. Where the
information is available, its results are correlated with a
reconstruction of the actual regulatory sequences done by similar methods.
Our objective with this research is to highlight the changes made by
selective pressure by using Minimum Evolution methods to reconstruct
continuous data at the ancestral states in a phylogenetic tree.
To reconstruct the ancestral states of the continuous data traits I
have developed a brute force algorithm that constructs an interval of
allowed values on each internal node in a phylogenetic tree (Schreiber
format), and chooses the best value to represent each node. The
algorithm runs through the tree twice, visiting each of the n nodes
once per pass, so the time complexity is linear, O(n). In the first
pass the intervals of allowed values are constructed, and in the second
pass the intervals are narrowed and the representative value is chosen.
This is done for every gene represented in our large-scale data set.
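The two passes can be illustrated with a minimal sketch. The exact rules Phyrex uses to build and narrow the intervals are not stated here, so the sketch assumes the interval rules of Wagner (linear) parsimony on a binary tree: on the way up, a node gets the overlap of its children's intervals, or the gap between them if they do not overlap; on the way down, each node takes the value in its interval closest to its parent's value. The `Node` class, the function names and the midpoint choice at the root are hypothetical and are not taken from the Phyrex source.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    value: Optional[float] = None                  # observed at leaves, chosen at internal nodes
    interval: Optional[Tuple[float, float]] = None


def first_pass(node: Node) -> Tuple[float, float]:
    """Post-order pass: build the interval of allowed values for every node."""
    if not node.children:
        # A leaf is fixed at its observed expression value.
        node.interval = (node.value, node.value)
        return node.interval
    left, right = node.children                    # assumption: strictly binary tree
    l_lo, l_hi = first_pass(left)
    r_lo, r_hi = first_pass(right)
    if l_lo <= r_hi and r_lo <= l_hi:
        # Child intervals overlap: keep their intersection.
        node.interval = (max(l_lo, r_lo), min(l_hi, r_hi))
    else:
        # Child intervals are disjoint: keep the gap between them.
        node.interval = (min(l_hi, r_hi), max(l_lo, r_lo))
    return node.interval


def second_pass(node: Node, parent_value: Optional[float] = None) -> None:
    """Pre-order pass: narrow each interval to one representative value."""
    lo, hi = node.interval
    if node.children:
        if parent_value is None:
            # Root: any value in the interval costs the same; the midpoint
            # is an arbitrary choice made for this sketch.
            node.value = (lo + hi) / 2
        else:
            # Internal node: value in the interval closest to the parent.
            node.value = min(max(parent_value, lo), hi)
    for child in node.children:
        second_pass(child, node.value)
```

For leaves A = 1, B = 3 and C = 10 on the tree ((A,B),C), the first pass gives the interval [1, 3] for the ancestor of A and B and [3, 10] for the root. The second pass then picks 6.5 for the root and 3 for the ancestor of A and B.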
By accumulating substitutions, a lineage can diverge into two closely
related organisms. By comparing sequences from a set of homologous
genes from different species it is possible to infer the sequence of
their closest common ancestor. Algorithms that perform such calculations
on sequences have already been developed. In this thesis ClustalW and
BaseML are used. ClustalW is used to align the sequences found in the
leaf nodes of the tree, and BaseML is used to calculate the sequences at
the ancestral nodes of the phylogenetic tree based on the alignment done
by ClustalW. The sequences used are the upstream regions of the genes
collected from Ensembl. Our work has also produced an algorithm and
methods for constructing ancestral gene expression profiles, and a
framework that can be used to display and compare them to the sequence
calculation done as described above.
Simultaneous reconstruction of regulatory sequences and expression
profiles improves the signal-to-noise ratio by using long-branched
significant classes to correlate substitutions with functional effects.
By comparing the two calculations mentioned above and by isolating
candidate genes where a clear change in gene expression is correlated
with a high number of point mutations in the upstream region, the
noise is reduced.
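The candidate isolation step can be sketched as a simple filter over the branches of the tree. This is only an illustration under stated assumptions, not the Phyrex implementation: the `Branch` record, the function names and both thresholds are hypothetical, and the sequences are assumed to be already aligned, with "-" marking gaps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Branch:
    gene: str
    parent_seq: str      # aligned upstream sequence at the ancestral node
    child_seq: str       # aligned upstream sequence at the descendant node
    parent_expr: float   # reconstructed expression value at the ancestor
    child_expr: float    # expression value at the descendant


def point_mutations(a: str, b: str) -> int:
    """Count mismatching alignment columns, skipping columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and "-" not in (x, y)
    )


def candidates(branches: List[Branch],
               min_expr_change: float,
               min_mutations: int) -> List[str]:
    """Return genes whose branch shows both a clear expression change
    and many upstream point mutations (thresholds are hypothetical)."""
    return [
        b.gene
        for b in branches
        if abs(b.child_expr - b.parent_expr) >= min_expr_change
        and point_mutations(b.parent_seq, b.child_seq) >= min_mutations
    ]
```

Requiring both conditions on the same branch is what filters out noise in this sketch: a large expression change without upstream substitutions, or many substitutions without an expression change, does not produce a candidate.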
© Copyright 2003 - 2004 by Roald Rossnes / University of Bergen / BCCS-CBU. The software package
is provided "as is" without warranty of any kind. In no event shall the
author or his "employer" be held responsible for any damage resulting
from the use of this software, including but not limited to the
frustration that you may experience in using the package. The program
package, including source codes, example data sets, executables, and
this documentation, is distributed free of charge for academic use
only. Permission is granted to copy and use programs in the package
provided no fee is charged for it and provided that this copyright
notice is not removed.