[Bioperl-l] Aggressive aggregation?

Aaron J. Mackey amackey at pcbi.upenn.edu
Wed Mar 9 14:00:40 EST 2005


> My problem is partly based on a peculiarity of Blast and partly based on
> the behavior of the aggregators in GBrowse and I'm wondering if someone
> else has seen this.

Welcome to the party ;)

> My problem presents itself when an EST hits at more than one place on a
> chromosome.

Besides Jason's recommendation to use a splicing-aware tool (exonerate 
is one, but Spidey is also a good one, and is based on BLASTN already), 
you have another issue: your GFF Targets need to be uniquely named.  
This is a well-known drawback of GFF prior to GFF3, and a continuing 
issue with GBrowse when using the current Bio::DB::GFF (which is not 
yet GFF3-savvy).

> chr1 aafcest     HSP   1     75    .     +     .     Target "Sequence:chad1" 1 75
> chr1 aafcest     HSP   100   150   .     +     .     Target "Sequence:chad1" 100 150
> chr1 aafcest     match 1     150   .     +     .     Target "Sequence:chad1" 1 150
>
> chr1 aafcest     HSP   200   275   .     -     .     Target "Sequence:chad1" 200 275
> chr1 aafcest     HSP   300   450   .     -     .     Target "Sequence:chad1" 300 450
> chr1 aafcest     match 200   450   .     -     .     Target "Sequence:chad1" 200 450


These need to be Target "Sequence:chad1-1" for the first match and its 
HSPs and "Sequence:chad1-2" for the second, or some such.  This also 
means that if you're saving the ESTs in the database (for sequence 
alignment display), you'll have to save them redundantly under chad1-1, 
chad1-2, etc.  The same problem arises with BLASTX searches against 
protein databases.
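
As a rough illustration of the renaming, here is a minimal Python sketch 
(not part of BioPerl or GBrowse).  It assumes tab-separated GFF2 in which 
each "match" line follows the HSP lines that belong to it, as in the 
quoted example; the function name uniquify_targets is hypothetical.

```python
import re
from collections import defaultdict

# Matches the quoted target name in a GFF2 group column,
# e.g. Target "Sequence:chad1" 1 75
TARGET_RE = re.compile(r'Target\s+"([^"]+)"')


def uniquify_targets(lines):
    """Give each match, and the HSP lines that precede it, its own Target name.

    Assumption: tab-separated GFF2 in which every "match" line comes after
    the HSP lines that belong to it.
    """
    copies = defaultdict(int)    # target name -> number of matches seen so far
    pending = defaultdict(list)  # target name -> indexes of lines awaiting a match
    out = []

    def rename(line, suffix):
        # Insert the suffix just before the closing quote of the target name.
        cols = line.split("\t", 8)
        found = TARGET_RE.search(cols[8])
        cols[8] = cols[8][:found.end(1)] + suffix + cols[8][found.end(1):]
        return "\t".join(cols)

    for raw in lines:
        line = raw.rstrip("\n")
        cols = line.split("\t", 8)
        found = TARGET_RE.search(cols[8]) if len(cols) >= 9 else None
        out.append(line)
        if found is None:
            continue  # blank lines, comments, features without a Target
        name = found.group(1)
        if cols[2] == "match":
            copies[name] += 1
            suffix = "-%d" % copies[name]
            out[-1] = rename(line, suffix)
            # Rename the buffered lines that share this target name.
            for i in pending.pop(name, []):
                out[i] = rename(out[i], suffix)
        else:
            pending[name].append(len(out) - 1)
    return out
```

Lines that never see a following "match" line are passed through 
unchanged, so the output should be checked before loading.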

Now, you could write a custom aggregator that de-aggregated multiple 
chad1 "match" features, assigning the contained HSPs to each, but there 
is no such "default" behavior.  Let me know if there's general interest 
in this ...
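
The assignment step such an aggregator would need can be sketched 
outside of Bio::DB::GFF.  The Python below is only an illustration of 
the idea, not an aggregator implementation: it assumes tab-separated 
GFF2, and the names Feature, parse and deaggregate are hypothetical.  
Each HSP goes to the first "match" with the same reference, strand and 
Target name whose coordinates contain it.

```python
import re
from collections import namedtuple

Feature = namedtuple("Feature", "ref method start end strand target")
TARGET_RE = re.compile(r'Target\s+"([^"]+)"')


def parse(line):
    """Return a Feature for a GFF2 line with a Target, else None."""
    cols = line.rstrip("\n").split("\t", 8)
    if len(cols) < 9:
        return None
    found = TARGET_RE.search(cols[8])
    if found is None:
        return None
    return Feature(cols[0], cols[2], int(cols[3]), int(cols[4]),
                   cols[6], found.group(1))


def deaggregate(lines):
    """Map each "match" feature to the list of HSPs it contains."""
    feats = [f for f in map(parse, lines) if f is not None]
    matches = [f for f in feats if f.method == "match"]
    groups = {m: [] for m in matches}
    for hsp in feats:
        if hsp.method != "HSP":
            continue
        for m in matches:
            same_hit = (m.ref, m.strand, m.target) == (hsp.ref, hsp.strand, hsp.target)
            if same_hit and m.start <= hsp.start and hsp.end <= m.end:
                groups[m].append(hsp)
                break  # first containing match wins
    return groups
```

If two matches for the same EST overlap on the same strand, containment 
alone is ambiguous and this sketch simply takes the first one.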

Anxiously awaiting GFF3 support in Bio::DB::GFF,
-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey at pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697


