[Bioperl-l] BioSQL: loading large sequence records, and taxon parsing
Hilmar Lapp
hlapp at gnf.org
Wed Jun 18 18:21:24 EDT 2003
On Tuesday, June 17, 2003, at 05:46 PM, Xiaoying Lin wrote:
> Hi, I have two questions related to bioSQL (with latest CVS co, and
> bioperl 1.2.1)
>
> 1. I am wondering if anyone has tried to load a large sequence (like a
> whole chromosome with annotation). It took me overnight to load in a
> 20Mb sequence with some 4000 genes-worth annotation, on a laptop of
> P-III, 750 MHz, and 250Mb mem.
> Is there any way to make this faster? besides buying a faster machine
> ;-)
>
Try to find out where the bottleneck is first. If you supply --verbose
(you're talking about load_seqdatabase.pl, right?), you'll see
essentially every query as it is executed. Watching this for a short
while should tell you whether just about every query takes a long time,
or whether it's a specific one. Also, monitor the CPU load. How is the
CPU time split between the perl process and the RDBMS process?
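One way to make "watching the queries" less of an eyeball exercise is to timestamp each line of the verbose output and list the longest waits. The following is only a minimal sketch under assumptions: it assumes each query is printed on its own line before it runs and that the output arrives unbuffered, and the `slowest_gaps` helper and the script name used below are hypothetical, not part of load_seqdatabase.pl or BioPerl.

```python
import sys
import time


def slowest_gaps(timed_lines, top=5):
    """Return the `top` lines that were followed by the longest wait.

    `timed_lines` is a list of (timestamp_seconds, text) pairs in arrival
    order. The gap attributed to a line is the time until the next line
    arrived, i.e. roughly how long the statement printed on that line took
    (assumption: a query is printed just before it is executed).
    """
    gaps = []
    for (t0, text), (t1, _next_text) in zip(timed_lines, timed_lines[1:]):
        gaps.append((t1 - t0, text))
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


def main():
    # Record the arrival time of every line piped in on standard input.
    timed = []
    for line in sys.stdin:
        timed.append((time.monotonic(), line.rstrip("\n")))
    # Report the lines that were followed by the longest pauses.
    for gap, text in slowest_gaps(timed):
        print(f"{gap:8.3f}s  {text[:100]}")


# Only read standard input when something is actually piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Saved as, say, query_gaps.py (a made-up name), it could be fed the verbose output of a short test load through a pipe, with stderr redirected into stdout in case the messages go there. It keeps every line in memory, so it is meant for a run of a few minutes, not the whole overnight load. If the gaps are spread evenly, just about every query is slow; if a handful of lines dominate, it is a specific one.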
> 2. In the taxon table, there is a column 'mito_genetic_code'
> Have people thought about genetic code for plastid genome, such as
> chloroplast?
>
The columns come straight from the NCBI taxon download ... there is no
plastid_genetic_code in there, and I can't explain why. Mind asking NCBI?
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------