[BioSQL-l] Timing importing GenBank files into BioSQL
Nick Loman
n.j.loman at bham.ac.uk
Mon Aug 18 16:33:17 UTC 2008
Peter wrote:
> I'm wondering if the BioPerl time is typical (I hope not), and if
> there are any computationally intensive or otherwise slow things it
> does that BioPython might be skipping (checksums? fetching taxonomy?)
I also found that BioPython was faster than BioPerl at importing the
same GenBank file.
There are some differences in how certain tables are handled; the dbxref
table springs to mind. It is worth dumping the database after importing
the same file with each of the two methods and comparing the results.
Depending on your application, the differences may not matter to you.
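As a rough first pass at that comparison, per-table row counts already
show where the two loaders diverge (for example in dbxref). The sketch
below assumes each load went into its own SQLite database file; BioSQL
is just as often on MySQL or PostgreSQL, where you would query the
catalog tables instead. The function names are hypothetical.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # Names come from sqlite_master itself, so quoting them is enough here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


def diff_row_counts(
    a: sqlite3.Connection, b: sqlite3.Connection
) -> dict[str, tuple[int, int]]:
    """Tables whose row counts differ between two loads: {table: (count_a, count_b)}."""
    counts_a, counts_b = table_row_counts(a), table_row_counts(b)
    return {
        name: (counts_a.get(name, 0), counts_b.get(name, 0))
        for name in sorted(set(counts_a) | set(counts_b))
        if counts_a.get(name, 0) != counts_b.get(name, 0)
    }
```

Usage would be something like
diff_row_counts(sqlite3.connect("bioperl.db"), sqlite3.connect("biopython.db"))
(file names hypothetical). For a row-level comparison, sqlite3's
Connection.iterdump() gives a full SQL dump, though surrogate IDs will
usually differ between the two loads and make a plain text diff noisy.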
I suspect the difference in speed you found is related to the number of
object lookups done by BioPerl, which is significantly greater than in
BioPython. You can pass --flatlookup to load_seqdatabase.pl, which
reduces the number of lookups.
You could set the DBI_TRACE environment variable to get a log of the SQL
statements BioPerl issues.
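On the Python side, a rough counterpart to DBI_TRACE for the standard
sqlite3 module is Connection.set_trace_callback(), which makes it easy
to count how many lookups a loader performs. This is only an
illustration with a hypothetical lookup-then-insert helper; it is not
how Biopython's BioSQL code is structured.

```python
import sqlite3
from collections import Counter


def trace_statements(conn: sqlite3.Connection) -> Counter:
    """Tally every SQL statement executed on conn by its leading keyword."""
    counts: Counter = Counter()

    def on_statement(sql: str) -> None:
        words = sql.split(None, 1)
        counts[words[0].upper() if words else ""] += 1

    conn.set_trace_callback(on_statement)
    return counts


def get_or_create_dbxref(conn: sqlite3.Connection, dbname: str, accession: str) -> int:
    """Hypothetical helper: one SELECT per call, plus an INSERT if the row is new."""
    row = conn.execute(
        "SELECT dbxref_id FROM dbxref WHERE dbname = ? AND accession = ?",
        (dbname, accession),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO dbxref (dbname, accession) VALUES (?, ?)", (dbname, accession)
    )
    return cur.lastrowid
```

Every record that goes through a helper like this costs at least one
SELECT round trip, which is why cutting lookups (as --flatlookup does
for BioPerl) can make such a difference to load time.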
For my purposes, I found both BioPerl and BioPython a bit slow, so I
devised a batch import script that speeds things up quite dramatically
by eliminating most object lookups and applying the foreign-key
constraints after importing.
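The general idea of loading first and enforcing foreign keys afterwards
can be sketched with SQLite, under some loud assumptions: the schema is
a hypothetical, heavily simplified subset of BioSQL, and the function
names are made up. SQLite cannot add a foreign-key constraint to an
existing table, so this approximates "applying constraints afterwards"
by switching enforcement off during the load and running PRAGMA
foreign_key_check once at the end. On PostgreSQL or MySQL you would
instead drop the constraints and re-create them with ALTER TABLE.

```python
import sqlite3

# Hypothetical, heavily simplified subset of the BioSQL schema.
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase (biodatabase_id),
    accession TEXT NOT NULL
);
"""


class ForeignKeyViolation(Exception):
    """Raised when the post-import check finds dangling references."""


def batch_import(conn: sqlite3.Connection, biodatabase_id: int, accessions: list[str]) -> int:
    """Bulk-insert bioentry rows, checking foreign keys once at the end.

    The caller supplies biodatabase_id up front, so no per-record lookup is done.
    """
    conn.commit()  # PRAGMA foreign_keys has no effect inside an open transaction
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        with conn:  # one transaction: commits on success, rolls back on exception
            conn.executemany(
                "INSERT INTO bioentry (biodatabase_id, accession) VALUES (?, ?)",
                [(biodatabase_id, acc) for acc in accessions],
            )
            # Validate every foreign key in a single pass after loading.
            violations = conn.execute("PRAGMA foreign_key_check").fetchall()
            if violations:
                raise ForeignKeyViolation(violations)
    finally:
        # Re-enable enforcement for any later writes on this connection.
        conn.execute("PRAGMA foreign_keys = ON")
    return len(accessions)
```

Doing the whole batch in one transaction and validating once avoids a
per-row constraint check and a per-row commit; if the check fails, the
batch is rolled back rather than leaving dangling rows behind.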
Regards,
Nick.