[Bioperl-l] Aggressive aggregation?
Aaron J. Mackey
amackey at pcbi.upenn.edu
Wed Mar 9 14:00:40 EST 2005
> My problem is partly based on a peculiarity of Blast and partly based on
> the behavior of the aggregators in GBrowse and I'm wondering if someone
> else has seen this.
Welcome to the party ;)
> My problem presents itself when an EST hits at more than one place on a
> chromosome.
Besides Jason's recommendation to use a splicing-aware tool (exonerate
is one; Spidey is also a good one, and is already based on BLASTN),
you have another issue: your GFF Targets need to be
uniquely named. This is a well-known drawback of GFF prior to GFF3,
and a continuing issue with GBrowse when using the current Bio::DB::GFF
(which is not yet GFF3-savvy).
> chr1 aafcest HSP 1 75 . + . Target "Sequence:chad1" 1 75
> chr1 aafcest HSP 100 150 . + . Target "Sequence:chad1" 100 150
> chr1 aafcest match 1 150 . + . Target "Sequence:chad1" 1 150
>
> chr1 aafcest HSP 200 275 . - . Target "Sequence:chad1" 200 275
> chr1 aafcest HSP 300 450 . - . Target "Sequence:chad1" 300 450
> chr1 aafcest match 200 450 . - . Target "Sequence:chad1" 200 450
These need to be Target "Sequence:chad1-1" and "Sequence:chad1-2" or
some such. This also means that if you're saving the ESTs in the
database (for sequence alignment display), you'll have to save them
redundantly under chad1-1, chad1-2, etc. The same problem arises with
BLASTX searches against protein databases.
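A minimal sketch of the renaming, written in Python purely for illustration. It assumes the BLAST hits have already been grouped into one entry per genomic location; the input layout and the uniquify_targets name are hypothetical, not part of BioPerl or GBrowse. Each hit gets its own Target name by appending a per-EST counter, and each match line spans its HSPs:

```python
from collections import defaultdict

# Hypothetical input: one entry per BLAST hit (a group of HSPs that belong
# together), as (chromosome, target_name, strand, [(start, end), ...]).
# Target coordinates are simply copied from the genomic ones, as in the
# quoted example.
def uniquify_targets(hits, source="aafcest"):
    """Yield GFF2-style lines in which every hit gets its own Target name
    (chad1-1, chad1-2, ...) so separate hits cannot be merged by name."""
    counts = defaultdict(int)
    for chrom, target, strand, hsps in hits:
        counts[target] += 1
        unique = f"{target}-{counts[target]}"
        for start, end in hsps:
            yield "\t".join([chrom, source, "HSP", str(start), str(end), ".",
                             strand, ".",
                             f'Target "Sequence:{unique}" {start} {end}'])
        # The match feature covers the full extent of its HSPs.
        lo = min(s for s, _ in hsps)
        hi = max(e for _, e in hsps)
        yield "\t".join([chrom, source, "match", str(lo), str(hi), ".",
                         strand, ".",
                         f'Target "Sequence:{unique}" {lo} {hi}'])
```

With the two chad1 hits from the example, this emits Targets "Sequence:chad1-1" and "Sequence:chad1-2". The EST sequence itself would still have to be stored under each of those names for alignment display.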
Now, you could write a custom aggregator that de-aggregates multiple
chad1 "match" features, assigning the contained HSPs to each, but there
is no such "default" behavior. Let me know if there's general interest
for this ...
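The grouping such a de-aggregating aggregator would need can be sketched as follows. This is Python for illustration only, not a Bio::DB::GFF::Aggregator subclass; the Feature class and deaggregate function are hypothetical. It assumes each HSP lies entirely within the span of its own match feature, and attaches it to the first match with the same chromosome, strand and target that contains it:

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    chrom: str
    method: str          # "match" or "HSP", as in the GFF example
    start: int
    end: int
    strand: str
    target: str          # e.g. "chad1", shared by both hits
    parts: list = field(default_factory=list)  # HSPs assigned to a match

def deaggregate(features):
    """Attach each HSP to the first match with the same chromosome, strand
    and target whose span contains it. Returns (matches, orphans), where
    orphans are HSPs that fit no match."""
    matches = [f for f in features if f.method == "match"]
    orphans = []
    for hsp in (f for f in features if f.method == "HSP"):
        owner = next((m for m in matches
                      if m.chrom == hsp.chrom and m.strand == hsp.strand
                      and m.target == hsp.target
                      and m.start <= hsp.start and hsp.end <= m.end), None)
        if owner is None:
            orphans.append(hsp)
        else:
            owner.parts.append(hsp)
    return matches, orphans
```

If two hits of the same EST overlap on the same strand, containment alone is ambiguous and this sketch simply picks the first match; a real aggregator would need more information (e.g. from the BLAST report) to break the tie.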
Anxiously awaiting GFF3-support in Bio::DB::GFF,
-Aaron
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania email: amackey at pcbi.upenn.edu
415 S. University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697