[Bioperl-l] bp_bulk_load_gff.pl speed

Aaron J. Mackey amackey at pcbi.upenn.edu
Wed Jul 14 19:10:39 EDT 2004


Aha, I'm *not* crazy!  I've experienced exactly this same behavior (I 
ended up "solving" it by loading in batches of 500, which 
worked fine until my database got so big that the initial group 
loading became too slow).
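
For concreteness, the batching workaround mentioned above might look roughly like the sketch below. It is an illustration only, not the script that was actually used: the function names (gff_batches, write_batches) and the output file naming are hypothetical, and it assumes a line-oriented GFF file in which blank lines and lines starting with '#' can be skipped. It also splits purely by line count, so features belonging to one group may land in different batches. How each batch file is then handed to a loader is deliberately left out.

```python
import itertools
from pathlib import Path
from typing import Iterable, Iterator, List


def gff_batches(lines: Iterable[str], size: int = 500) -> Iterator[List[str]]:
    """Yield lists of at most `size` feature lines.

    Assumption: blank lines and '#' comment/directive lines carry no
    features and are dropped rather than copied into each batch.
    """
    features = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    while True:
        batch = list(itertools.islice(features, size))
        if not batch:
            return
        yield batch


def write_batches(gff_path: str, out_dir: str, size: int = 500) -> List[Path]:
    """Write each batch to its own file (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(gff_path) as handle:
        for i, batch in enumerate(gff_batches(handle, size)):
            target = out / f"batch_{i:05d}.gff"
            target.write_text("".join(batch))
            written.append(target)
    return written
```

Each resulting file could then be loaded separately, at the cost of repeating whatever per-load setup the loader performs, which is the step that eventually became too slow in the case described above.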

What's your mysql version, perl version (usemymalloc?), and OS?  I 
think this is a perl hash/memory issue, but I'd love to solve it now 
that I know it's not just something stupid I'm doing wrong.

-Aaron

On Jul 14, 2004, at 6:22 PM, Dustin Cram wrote:

> I recently started using Bio::DB::GFF, beginning by using
> bp_bulk_load_gff.pl to load a simple but large gff2 file.  This file
> consisted only of transcripts and their subfeatures, so the group
> class of all features was "transcript".  The files loaded with no
> problem and I was able to write a few successful test scripts.
>
> Now I have added new features (genes) to the gff file, and I
> attempted to load the new file exactly as before with
> bp_bulk_load_gff.pl, but now it takes _much_ longer to load, and takes
> more time the more features are added (the first 5K features take
> about 30 seconds, the next 5K features take nearly 2 minutes, and so
> on).  It took over an hour to load 50K features, at which point I stopped
> it.
>
> I've played around with the gff file a bit and found that anything
> that doesn't have a group class of "transcript" has this problem, for
> example if I 'sed s/transcript/foo/g' the original file it's slow,
> and if I 'sed s/gene/transcript/g' the new file it's fast.  I have
> manually verified that the MySQL database is empty before each attempt
> and even wiped the tmp directory before each attempt.
>
> Any ideas why non-transcript features take so long?
>
> Thanks,
>
> Dustin Cram
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey at pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697


