[Bioperl-l] Building consensus sequence from ESTs

Fernan Aguero fernan@iib.unsam.edu.ar
Tue, 17 Sep 2002 10:10:43 -0300


+----[ Thus spoke Andrew Macgregor (andrew@anatomy.otago.ac.nz):
|
| Hi all,
| 
| I apologise if this question is a bit basic. Can anyone point me towards
| tools within Bioperl that can take multiple ESTs and create a consensus
| sequence, something like what can be done with Gelmerge using GCG.
| 
|
+----]

I don't know about GCG, and I doubt there is anything within Bioperl
that does this directly, but here are the options I know of (please
enlighten me with other variations):

We work with ESTs and we have used different approaches to EST
clustering and the production of consensus sequences. 

i) phrap or any other sequence assembler will cluster ESTs and also
produce consensus sequences. There are many parameters to play with.

ii) STACKpack, from the South African National Bioinformatics Institute
and Electric Genetics. Free for academic use. It is a full package, with
a MySQL db, web and command-line interfaces, etc. It uses several other
tools (cross_match/RepeatMasker, phrap), but what sets it apart from
other clustering approaches is its first step: a loose comparison tool,
d2_cluster, forms initial clusters that are then assembled using phrap.
It produces consensus sequences and has a tool to analyze them and
suggest alternative forms (splicing?).
http://www.sanbi.ac.za/CODES/

iii) compare ESTs all vs all using BLASTN and group sequences using these
similarity relationships. Then align each group using clustalw and derive
a consensus from each alignment. This approach would benefit from Bioperl
(running BLASTs and parsing results, generating the groups, running
clustalw on them and so on).
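
The grouping and consensus steps of iii) can be sketched roughly as
follows. This is only an illustration, written in Python rather than
Perl to keep it self-contained. It assumes the BLASTN hits have already
been parsed and filtered into (query, subject) ID pairs, and that each
group has already been aligned (e.g. by clustalw) into equal-length
gapped strings. The names cluster_by_hits and majority_consensus are
made up for this example; they are not Bioperl calls. Grouping here is
plain single-linkage, and the consensus is a simple per-column majority
vote that ignores gaps, which may well be cruder than what an assembler
does.

```python
from collections import Counter, defaultdict


def cluster_by_hits(ids, hits):
    """Single-linkage grouping: sequences connected by any chain of hits
    end up in the same cluster. `hits` is an iterable of (id_a, id_b)
    pairs that are assumed to have passed a similarity cutoff already."""
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        # Walk up to the root of x's set, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for seq_id in ids:
        clusters[find(seq_id)].append(seq_id)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


def majority_consensus(aligned, gap="-"):
    """Column-wise majority vote over equal-length aligned sequences.
    Gaps do not vote; a column containing only gaps is dropped."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != gap)
        if counts:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical IDs and hits, purely for illustration.
    ids = ["est1", "est2", "est3", "est4"]
    hits = [("est1", "est2"), ("est2", "est3")]
    print(cluster_by_hits(ids, hits))

    # Hypothetical clustalw-style alignment of one group.
    print(majority_consensus(["ACGT-A", "ACGTTA", "AC-TTA"]))
```

Single-linkage grouping is the simplest choice, but a single chimeric
EST or a shared repeat can chain unrelated sequences together, so a
strict hit cutoff (or masking repeats first) is probably needed in
practice.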

iv) perhaps EMBOSS has an app similar to the GCG Gelmerge you
mentioned? If so, you can use the Bioperl interface to EMBOSS to
automate the process.
http://www.emboss.org/

Regards, 

Fernan

-- 
F e r n a n   A g u e r o
http://genoma.unsam.edu.ar/~fernan