[BioSQL-l] Timing importing GenBank files into BioSQL

Nick Loman n.j.loman at bham.ac.uk
Mon Aug 18 16:33:17 UTC 2008


Peter wrote:

> I'm wondering if the BioPerl time is typical (I hope not), and if
> there are any computationally intensive or otherwise slow things it
> does that BioPython might be skipping (checksums? fetching taxonomy?)

I also found that BioPython was faster than BioPerl at importing the 
same GenBank file.

There are some differences in the handling of certain tables; the dbxref 
table springs to mind. It is worth dumping the database after importing 
the same file with each of the two methods and comparing the results. 
The differences may not be significant for you, depending on your 
application.
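
As a rough illustration of that comparison (a sketch only, not part of 
either toolkit), the snippet below takes the lines of two SQL dumps and 
reports the statements that appear in only one of them, ignoring row 
order. The function name compare_dumps and the sample INSERT statements 
are hypothetical.

```python
def compare_dumps(lines_a, lines_b):
    """Return (only_in_a, only_in_b) for two iterables of dump lines.

    Lines are stripped and blank lines dropped, so differing row order
    or trailing whitespace does not count as a difference.
    """
    rows_a = {line.strip() for line in lines_a if line.strip()}
    rows_b = {line.strip() for line in lines_b if line.strip()}
    return sorted(rows_a - rows_b), sorted(rows_b - rows_a)


if __name__ == "__main__":
    # Hypothetical dump fragments; real dumps would be read from files.
    bioperl_dump = [
        "INSERT INTO dbxref VALUES ('GI','123');\n",
        "INSERT INTO dbxref VALUES ('taxon','562');\n",
    ]
    biopython_dump = [
        "INSERT INTO dbxref VALUES ('taxon','562');\n",
    ]
    only_a, only_b = compare_dumps(bioperl_dump, biopython_dump)
    print("Only in first dump:", only_a)
    print("Only in second dump:", only_b)
```

Auto-generated primary keys can differ between two loads of the same 
file, so real dumps would probably need those columns stripped or 
normalised before a comparison like this is meaningful.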

I suspect the difference in speed you find is related to the number of 
object lookups done in BioPerl, which is significantly higher than in 
BioPython. You can specify --flatlookup to load_seqdatabase.pl, which 
reduces the number of lookups.

You could also set the DBI_TRACE environment variable to get a log of 
the SQL statements that BioPerl issues.

For my purposes, I found both BioPerl and BioPython to be a bit slow, so 
I devised a batch import script which speeds things up quite dramatically 
by eliminating most object lookups and applying the foreign-key 
constraints after the import.
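
The general idea of loading in bulk and checking foreign keys afterwards 
can be sketched as follows. This is not the script mentioned above; it is 
a minimal illustration that assumes SQLite (via Python's sqlite3 module) 
and a much simplified, hypothetical two-table schema loosely modelled on 
bioentry and biosequence. On MySQL or PostgreSQL the equivalent step 
would be to drop or disable the constraints before loading and re-create 
them afterwards.

```python
import sqlite3


def make_schema(conn):
    """Create a simplified, hypothetical two-table schema."""
    conn.executescript(
        """
        CREATE TABLE bioentry (
            bioentry_id INTEGER PRIMARY KEY,
            accession   TEXT NOT NULL
        );
        CREATE TABLE biosequence (
            bioentry_id INTEGER PRIMARY KEY
                        REFERENCES bioentry (bioentry_id),
            seq         TEXT
        );
        """
    )


def batch_import(conn, entries, sequences):
    """Bulk-load rows with FK enforcement off, then validate afterwards.

    Returns the rows reported by PRAGMA foreign_key_check, i.e. an empty
    list if every biosequence row points at an existing bioentry row.
    """
    # Turn off per-row foreign-key checks for the duration of the load.
    conn.execute("PRAGMA foreign_keys = OFF")
    with conn:  # a single transaction for the whole batch
        conn.executemany(
            "INSERT INTO bioentry (bioentry_id, accession) VALUES (?, ?)",
            entries,
        )
        conn.executemany(
            "INSERT INTO biosequence (bioentry_id, seq) VALUES (?, ?)",
            sequences,
        )
    # Check all foreign keys once, after the import has finished.
    violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    conn.execute("PRAGMA foreign_keys = ON")
    return violations


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_schema(conn)
    # Made-up example rows; IDs are assigned by the caller, so no
    # lookups against the database are needed during the load.
    entries = [(1, "ACC0001"), (2, "ACC0002")]
    sequences = [(1, "ACGT"), (2, "GGCC")]
    print("FK violations:", batch_import(conn, entries, sequences))
```

Assigning primary keys in the loader rather than looking them up row by 
row is what removes most of the per-record queries in this sketch; the 
trade-off is that integrity problems only surface at the end of the run.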

Regards,

Nick.


