[Bioperl-l] Annotation Pipeline

Elia Stupka elia@fugu-sg.org
Fri, 1 Nov 2002 15:39:38 +0800 (SGT)


> Working on an EST project and I would like to improve our annotation
> pipeline.

Hi Charles,

   we've been working on similar projects for a while. There are a couple
of things that you should take into account with ESTs:

1-clustering. At the moment we are using StackPack, which is free for
academia. You can use something else, but you definitely need to take
all ESTs and cluster them to find out which have strong overlaps, and
build consensus sequences from those.

2-if you want to annotate their possible relationship to genes you might
want to use something like ESTscan (an ORF finder that allows for poor
quality in the sequence) to predict which of your consensus sequences
will actually give rise to ORFs.

3-Having the ORFs you can then translate them into proteins. Once you have
the proteins you can scan them for domains (such as Pfam, PROSITE,
PRINTS), link protein domains to GO and you can of course blast the
translations to protein datasets.
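
To make steps 2 and 3 a bit more concrete, here is a minimal sketch in
plain Python of finding an ORF in a consensus sequence and translating
it. It is only an illustration and not what ESTscan does: it assumes the
standard genetic code, simply takes the longest stretch from an ATG to
the next stop codon (or to the end of the sequence) in any of the six
frames, and has no tolerance for frameshift errors. The function names
are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate frame 0 of seq; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Longest protein starting at M, up to a stop or the sequence end,
    over all six reading frames (hypothetical helper, naive approach)."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Stop codons translate to '*', so splitting on '*' yields
            # the stretches between stops in this frame.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGGCTTGGTAAGG"))
```

The resulting protein string is what you would then feed to the domain
scans and the protein blast mentioned in step 3.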

In terms of blasting, protein domain prediction, etc. you can use BioPipe
(www.biopipe.org), which we've been using for a while now to annotate
sequences, blast datasets, etc. StackPack is pretty much a stand-alone
program, but it is very easy to use and has a friendly web interface.

Elia

********************************
* http://www.fugu-sg.org/~elia *
* tel:    +65 6874 1467        *
* mobile: +65 9030 7613        *
* fax:    +65 6779 1117        *
********************************