[Bioperl-l] Bio::DB::SeqFeature::Store::memory -> filter_by_type very slow

Jelle Scholtalbers j.scholtalbers at gmail.com
Fri Feb 5 15:36:50 UTC 2010


Hi,
A different issue I encountered today with
Bio::DB::SeqFeature::Store::memory concerns the BINSIZE that this module
sets:

use constant BINSIZE => 10_000;

It would be nice to be able to set this dynamically, since different GFF
files call for different indexing granularity. I ran into this with a file
whose positions had been multiplied by 1000: with those coordinates the
program ran fairly quickly (3-4 min). After dividing the positions by 1000
(the desired scale), the program took ~30 min to finish. The slowdown was
traceable to Bio::DB::SeqFeature::Store::memory::filter_by_location
(presumably because, with the smaller coordinates, many more features fall
into each 10,000-unit bin and have to be checked individually). Setting
BINSIZE to 1 solved the issue; however, for another GFF file that size is
far too low.
Is this already possible and I missed it, or would this be an option worth
adding?
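To make the trade-off concrete, here is a minimal sketch of a bin-based location index in plain Perl, assuming the general scheme of registering each feature in every bin it overlaps. It is not the actual Bio::DB::SeqFeature::Store::memory code: the BinIndex package and its bin_size constructor argument are hypothetical, and features are plain hashrefs with start/end keys. A range query gathers candidates from the overlapping bins and then checks each one, so a bin that is large relative to the coordinates holds many candidates, while a very small bin makes long features occupy many bins.

```perl
use strict;
use warnings;

# Hypothetical bin-based location index (illustration only, not BioPerl's code).
package BinIndex;

sub new {
    my ($class, %args) = @_;
    # Bin size is a constructor argument instead of a compile-time constant.
    my $self = { bin_size => $args{bin_size} || 10_000, bins => {} };
    return bless $self, $class;
}

# Register a feature (a hashref with start/end) in every bin it overlaps.
sub add {
    my ($self, $f) = @_;
    my $bs = $self->{bin_size};
    for my $bin (int($f->{start} / $bs) .. int($f->{end} / $bs)) {
        push @{ $self->{bins}{$bin} }, $f;
    }
}

# Return features overlapping [$start, $end], plus how many candidates had
# to be checked; that count is what grows when bins are too coarse.
sub query {
    my ($self, $start, $end) = @_;
    my $bs = $self->{bin_size};
    my (%seen, @hits);
    my $checked = 0;
    for my $bin (int($start / $bs) .. int($end / $bs)) {
        for my $f (@{ $self->{bins}{$bin} || [] }) {
            next if $seen{$f}++;    # a feature may sit in several bins
            $checked++;
            push @hits, $f if $f->{start} <= $end && $f->{end} >= $start;
        }
    }
    return (\@hits, $checked);
}

package main;

# 1,000 short, non-overlapping features on small coordinates (1..9,995).
my @features = map { +{ start => $_ * 10 + 1, end => $_ * 10 + 5 } } 0 .. 999;

for my $bs (10_000, 10) {
    my $idx = BinIndex->new(bin_size => $bs);
    $idx->add($_) for @features;
    my ($hits, $checked) = $idx->query(101, 105);
    printf "bin_size=%-6d hits=%d candidates checked=%d\n",
        $bs, scalar @$hits, $checked;
}
```

Both runs find the same single hit, but with a bin size of 10,000 the query has to check all 1,000 features, while with a bin size of 10 it checks only one. The best value depends on feature lengths and coordinate scale, which is the argument for a per-store setting.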

Cheers,
Jelle


2010/2/1 Chris Fields <cjfields at illinois.edu>

> Jelle,
>
> Seems reasonable, but Lincoln and Scott know that code better and are
> better suited to comment on it. Lincoln, Scott?
>
> chris
>
> On Feb 1, 2010, at 6:24 AM, Jelle Scholtalbers wrote:
>
> > Hi,
> > I used the Bio::DB::SeqFeature::Store::memory module to load a GFF3 file,
> > which I could then query from my script. To retrieve features I used, for
> > example:
> >       $db->features(-type => 'BAC:FPC', -seq_id => 'chromosome0')
> > However, profiling my script showed that 60% of the running time went
> > into filter_by_type from Bio::DB::SeqFeature::Store::memory.
> > Replacing this call with
> >    my @features = grep { $_->type eq 'BAC:FPC' }
> >        $db->features(-seq_id => 'chromosome0');
> > gave me the same results in a fraction of the earlier run time.
> > My script went from 60 min to 4 min for the same result, changing only
> > this call (which is made often).
> > Can/should this be fixed, or is this just the faster way to do it?
> >
> > Cheers,
> > Jelle
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
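Because the type lookup is repeated often, an approach in the same spirit as the grep workaround is to group the features by type once and reuse that grouping for every later lookup. The sketch below is a hypothetical plain-Perl illustration: index_by_type is an invented helper, not part of BioPerl, and the features are stand-in hashrefs with a type key. With the real feature objects, the grouping key would come from $_->type, as in the grep call quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: group features by type once, so repeated
# type lookups become hash lookups instead of repeated filtering.
sub index_by_type {
    my (@features) = @_;
    my %by_type;
    push @{ $by_type{ $_->{type} } }, $_ for @features;
    return \%by_type;
}

# Stand-in features (hashrefs); real code would group the objects returned by
# $db->features(-seq_id => 'chromosome0') and key on $_->type instead.
my @features = (
    { type => 'BAC:FPC',  name => 'bac1' },
    { type => 'gene:FPC', name => 'g1' },
    { type => 'BAC:FPC',  name => 'bac2' },
);

my $by_type = index_by_type(@features);

# Every later lookup for a type is a single hash access.
my @bacs = @{ $by_type->{'BAC:FPC'} || [] };
print join(',', map { $_->{name} } @bacs), "\n";    # prints "bac1,bac2"
```

The grouping costs one pass over the features for a given seq_id; after that, each lookup avoids rescanning the list, which matters most when the same types are requested many times.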


