[Bioperl-l] New to Bioperl
Niels Larsen
nel at birc.dk
Sun Jun 15 16:48:00 EDT 2003
Greetings,
I am exploring BioPerl to get an idea of its advantages and disadvantages, and I
hope to use it and contribute to it. As a first test I am trying to load the latest full EMBL
release into MySQL 4.0.12 using Bio::SeqIO. Parsing a typical .dat
entry file (with ~100,000 entries) takes a full 5-6 minutes, whereas zcat'ing
it and reading each line in Perl takes 5-6 seconds. Loading one entry at
a time (using bioperl-db/scripts/biosql/load_seqdatabase.pl), however,
takes 1-2 hours per file (I didn't time it exactly), which means 300-600 hours for the
release. The parsing time I could live with. Is there a supported way of
loading faster? I could write something that creates load-ready tables
for each .dat file, and then each file would take 1-2 minutes, I think. But working out
which accessors to use for that is hard going for me. Do you
have advice about 1) how to avoid loading release entries one at a time
and 2) how to get a quick overview of which methods apply to any
given object? (I have fiddled with Class::Inspector a bit.) Apologies if the
answers are given in the documentation; if so, just point me there.
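
Regarding question 2, one way to get such an overview without extra modules
is to walk the symbol table of an object's class and of every class in its
@ISA chain. The following is a minimal sketch in plain Perl, not a BioPerl
facility; the names methods_of and print_methods are hypothetical. It lists
every defined subroutine it finds, so imported helper functions show up
alongside real methods, and methods inherited from UNIVERSAL (such as can
and isa) are not included.

```perl
use strict;
use warnings;

# Hypothetical helper: map each subroutine name visible on $thing
# (an object or a class name) to the package that defines it.
sub methods_of {
    my ($thing) = @_;
    my $class = ref($thing) || $thing;
    my (%seen, %methods);
    my @stack = ($class);
    while (defined(my $pkg = shift @stack)) {
        next if $seen{$pkg}++;
        no strict 'refs';    # symbolic lookups into the symbol table
        for my $name (keys %{"${pkg}::"}) {
            next unless defined &{"${pkg}::${name}"};
            # the first package found wins, so overrides hide parent versions
            $methods{$name} = $pkg unless exists $methods{$name};
        }
        # visit parents before siblings (depth-first, left to right)
        unshift @stack, @{"${pkg}::ISA"};
    }
    return \%methods;
}

# Print a sorted report: subroutine name, then the defining package.
sub print_methods {
    my ($thing) = @_;
    my $methods = methods_of($thing);
    printf "%-30s %s\n", $_, $methods->{$_} for sort keys %$methods;
}
```

After "use Bio::Seq;", calling print_methods(Bio::Seq->new) would list the
subroutines reachable from a sequence object together with the class each
one comes from. For a single method, $obj->can('name') remains the quickest
check.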
Niels L
------------------------------------------------------------------------
Niels Larsen, Associate Professor
Bioinformatics Research Center (BIRC)
Aarhus University
Hoegh Guldbergsgade 10
DK 8000 Aarhus C
Denmark
Electronic mail: nel at birc.dk
Telephone: +45-8942-3153
Telefax: +45-8942-3077
------------------------------------------------------------------------