[Bioperl-l] New to Bioperl

Niels Larsen nel at birc.dk
Sun Jun 15 16:48:00 EDT 2003


I am exploring BioPerl to get an idea of its advantages and disadvantages; I
hope to use it and contribute to it. As a first test I am trying to load the
latest full EMBL release into MySQL 4.0.12 using Bio::SeqIO. Parsing a typical
.dat file (with ~100,000 entries) takes a full 5-6 minutes, whereas zcat'ing
it and reading each line in Perl takes 5-6 seconds. Loading the entries one at
a time (using bioperl-db/scripts/biosq/load_seqdatabase.pl), however, takes
1-2 hours per file (I didn't time it exactly), which means 300-600 hours for
the whole release. The parsing time I could live with. Is there a supported way
of loading faster? I could write something that creates loading-ready tables
for each .dat file, and then each file would take 1-2 minutes, I think. But
working out which accessors to use to do that is hard reading for me. Do you
have advice about 1) how to avoid loading release entries one at a time, and
2) how to get a quick overview of which methods apply to any given object?
(I have fiddled with Class::Inspector a bit.) Apologies if the answers are in
the documentation; just point me there.

Niels L


Niels Larsen, Associate Professor
Bioinformatics Research Center (BIRC)
Aarhus University
Hoegh Guldbergsgade 10
DK 8000 Aarhus C

Electronic mail: nel at birc.dk

Telephone: +45-8942-3153
Telefax: +45-8942-3077

