[Biojava-l] Algorithm to compare protein sequences
Thasso Griebel
thasso.griebel at uni-jena.de
Sat Nov 21 11:25:34 UTC 2009
Hi,
If I understand you correctly, you want to do three things:
1. Create a multiple sequence alignment.
2. Compute a pairwise distance matrix from the alignment.
3. Use a distance-based tree construction method (agglomerative clustering such as UPGMA or WPGMA, or Neighbor Joining) to build a tree. The tree can then be printed as a Newick string.
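For step 3, here is a minimal Java sketch of UPGMA that turns a symmetric distance matrix into a Newick string. It is not BioJava code: the class Upgma and its buildNewick method are hypothetical names. It assumes the distance matrix has already been computed (step 2) and that labels contain no Newick special characters.

```java
import java.util.Locale;

// Hypothetical UPGMA sketch (not a BioJava class): distance matrix -> Newick string.
public class Upgma {

    // A cluster: its subtree in Newick form, number of leaves, and node height.
    private static final class Cluster {
        final String newick;
        final int size;
        final double height;

        Cluster(String newick, int size, double height) {
            this.newick = newick;
            this.size = size;
            this.height = height;
        }
    }

    private static String fmt(double x) {
        return String.format(Locale.ROOT, "%.4f", x);
    }

    public static String buildNewick(String[] names, double[][] dist) {
        int n = names.length;
        if (n == 0) throw new IllegalArgumentException("no taxa");
        double[][] d = new double[n][];
        for (int i = 0; i < n; i++) d[i] = dist[i].clone(); // work on a copy
        Cluster[] c = new Cluster[n];
        boolean[] active = new boolean[n];
        for (int i = 0; i < n; i++) {
            c[i] = new Cluster(names[i], 1, 0.0);
            active[i] = true;
        }
        for (int remaining = n; remaining > 1; remaining--) {
            // Find the closest pair of active clusters (first pair wins on ties).
            int bi = -1, bj = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < n; i++) {
                if (!active[i]) continue;
                for (int j = i + 1; j < n; j++) {
                    if (active[j] && d[i][j] < best) {
                        best = d[i][j];
                        bi = i;
                        bj = j;
                    }
                }
            }
            Cluster a = c[bi], b = c[bj];
            double h = best / 2.0; // UPGMA puts the new node at half the merge distance
            String newick = "(" + a.newick + ":" + fmt(h - a.height) + ","
                    + b.newick + ":" + fmt(h - b.height) + ")";
            // Size-weighted average distance from the merged cluster to every other cluster.
            for (int k = 0; k < n; k++) {
                if (!active[k] || k == bi || k == bj) continue;
                double nd = (a.size * d[bi][k] + b.size * d[bj][k]) / (a.size + b.size);
                d[bi][k] = nd;
                d[k][bi] = nd;
            }
            c[bi] = new Cluster(newick, a.size + b.size, h); // merged cluster reuses slot bi
            active[bj] = false;
        }
        for (int i = 0; i < n; i++) {
            if (active[i]) return c[i].newick + ";";
        }
        throw new IllegalStateException("unreachable");
    }
}
```

Replacing the size-weighted average with a plain (d[bi][k] + d[bj][k]) / 2 gives WPGMA instead. Because UPGMA places each new node at half the merge distance, it implicitly assumes a molecular clock; Neighbor Joining does not.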
I don't know whether all of this is possible with BioJava. If not, I could at least provide code to compute the pairwise distance matrix (including Jukes-Cantor (JC) and Kimura corrections) and code for the clustering algorithms. But weren't NJ and agglomerative clustering already implemented? I couldn't find the classes in the 1.7 API.
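For step 2, a minimal Java sketch of a protein distance matrix with optional corrections. It assumes the sequences are already aligned (equal length), that '-' marks a gap, and that gapped columns are skipped pairwise; the class ProteinDistances and its methods are hypothetical, not BioJava API. The corrections use the usual amino-acid forms: Jukes-Cantor with 20 states, d = -(19/20) ln(1 - 20p/19), and Kimura's protein approximation, d = -ln(1 - p - 0.2p^2), where p is the proportion of differing sites.

```java
// Hypothetical sketch (not a BioJava class): pairwise distances for aligned protein sequences.
public class ProteinDistances {

    public enum Correction { NONE, JUKES_CANTOR, KIMURA }

    // Proportion of differing sites, skipping columns where either sequence has a gap.
    public static double pDistance(String a, String b) {
        if (a.length() != b.length())
            throw new IllegalArgumentException("sequences must be aligned (equal length)");
        int compared = 0, diffs = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i), y = b.charAt(i);
            if (x == '-' || y == '-') continue;
            compared++;
            if (Character.toUpperCase(x) != Character.toUpperCase(y)) diffs++;
        }
        if (compared == 0) throw new IllegalArgumentException("no gap-free columns to compare");
        return (double) diffs / compared;
    }

    // Applies the chosen correction. No saturation handling: for very divergent pairs the
    // log argument reaches zero or goes negative, giving Infinity or NaN.
    public static double correct(double p, Correction c) {
        switch (c) {
            case JUKES_CANTOR:
                // Jukes-Cantor for 20 equally likely amino acids
                return -(19.0 / 20.0) * Math.log(1.0 - 20.0 * p / 19.0);
            case KIMURA:
                // Kimura's empirical approximation for protein distances
                return -Math.log(1.0 - p - 0.2 * p * p);
            default:
                return p;
        }
    }

    // Symmetric n x n distance matrix with zeros on the diagonal.
    public static double[][] matrix(String[] aligned, Correction c) {
        int n = aligned.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                d[i][j] = d[j][i] = correct(pDistance(aligned[i], aligned[j]), c);
            }
        }
        return d;
    }
}
```

Skipping gaps pairwise (rather than removing every column that contains a gap anywhere) keeps more data per pair, but means different pairs are compared over different column sets.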
If you don't need to do the computations programmatically, you can also try
http://bio.informatik.uni-jena.de/epos/
though with the currently released version you have to do the alignment externally. The next release will also provide a way to do multiple sequence alignments directly.
Another alternative is
http://gi.cebitec.uni-bielefeld.de/qalign
QAlign can create the alignment (using ClustalW, T-Coffee or DIALIGN) and build an NJ or agglomerative tree in one step. The nice thing is that you can manipulate the alignment (e.g. insert gaps) and the tree is updated continuously.
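If you would rather compute an NJ tree programmatically than in QAlign, here is a minimal neighbor-joining sketch in Java, using the usual Q-criterion form of Saitou and Nei's method. NeighborJoining.buildNewick is a hypothetical name, not BioJava or QAlign API. It returns an unrooted tree written with a trifurcation at the top level, and breaks ties in Q by taking the first pair found.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical neighbor-joining sketch (not BioJava or QAlign code).
public class NeighborJoining {

    private static String fmt(double x) {
        return String.format(Locale.ROOT, "%.4f", x);
    }

    public static String buildNewick(String[] names, double[][] dist) {
        int n = names.length;
        if (n < 3) throw new IllegalArgumentException("need at least three taxa");
        double[][] d = new double[n][];
        for (int i = 0; i < n; i++) d[i] = dist[i].clone(); // work on a copy
        String[] label = names.clone();
        boolean[] active = new boolean[n];
        Arrays.fill(active, true);

        for (int r = n; r > 3; r--) {
            // Row sums S_i over the active nodes.
            double[] s = new double[n];
            for (int i = 0; i < n; i++) {
                if (!active[i]) continue;
                for (int k = 0; k < n; k++) if (active[k] && k != i) s[i] += d[i][k];
            }
            // Pick the pair minimising Q(i,j) = (r - 2) d(i,j) - S_i - S_j.
            int bi = -1, bj = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < n; i++) {
                if (!active[i]) continue;
                for (int j = i + 1; j < n; j++) {
                    if (!active[j]) continue;
                    double q = (r - 2) * d[i][j] - s[i] - s[j];
                    if (q < best) { best = q; bi = i; bj = j; }
                }
            }
            // Branch lengths from the two joined nodes to the new internal node.
            double li = d[bi][bj] / 2.0 + (s[bi] - s[bj]) / (2.0 * (r - 2));
            double lj = d[bi][bj] - li;
            label[bi] = "(" + label[bi] + ":" + fmt(li) + "," + label[bj] + ":" + fmt(lj) + ")";
            // Distances from the new node (stored in slot bi) to the remaining nodes.
            for (int k = 0; k < n; k++) {
                if (!active[k] || k == bi || k == bj) continue;
                double nd = (d[bi][k] + d[bj][k] - d[bi][bj]) / 2.0;
                d[bi][k] = nd;
                d[k][bi] = nd;
            }
            active[bj] = false;
        }
        // Three nodes left: join them at an unrooted trifurcation (three-point formula).
        int[] t = new int[3];
        int m = 0;
        for (int i = 0; i < n; i++) if (active[i]) t[m++] = i;
        int a = t[0], b = t[1], c = t[2];
        double la = (d[a][b] + d[a][c] - d[b][c]) / 2.0;
        double lb = (d[a][b] + d[b][c] - d[a][c]) / 2.0;
        double lc = (d[a][c] + d[b][c] - d[a][b]) / 2.0;
        return "(" + label[a] + ":" + fmt(la) + "," + label[b] + ":" + fmt(lb) + ","
                + label[c] + ":" + fmt(lc) + ");";
    }
}
```

With non-additive distances NJ can yield negative branch lengths; this sketch reports them unchanged rather than clamping them to zero.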
cheers,
thasso
On Nov 21, 2009, at 09:35 , Andreas Dräger wrote:
> Hi Mara,
>
> At the moment there are two alignment algorithms available:
> Smith-Waterman for local and Needleman-Wunsch for global alignment. In
> addition to that there is a package for hidden Markov models that is
> also able to perform sequence alignments (see the BioJava cookbook for
> examples). However, currently both approaches will write the alignment
> similar to the BLAST output and not in this Newick format (I am actually
> not familiar with that). I hope that helps.
>
> Cheers
> Andreas
>
> --
> Dipl.-Bioinform. Andreas Dräger
> Eberhard Karls University Tübingen
> Center for Bioinformatics (ZBIT)
> Sand 1
> 72076 Tübingen
> Germany
>
> Phone: +49-7071-29-70436
> Fax: +49-7071-29-5091
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany