[Bioperl-l] BioSQL: loading large sequence records, and taxon parsing

Hilmar Lapp hlapp at gnf.org
Wed Jun 18 18:21:24 EDT 2003


On Tuesday, June 17, 2003, at 05:46  PM, Xiaoying Lin wrote:

> Hi, I have two questions related to bioSQL (with latest CVS co, and
> bioperl 1.2.1)
>
>
> 1. I am wondering if anyone has tried to load a large sequence (like a
> whole chromosome with annotation). It took me overnight to load in a
> 20Mb sequence with some 4000 genes-worth annotation, on a laptop of
> P-III, 750 MHz, and 250Mb mem.
> Is there any way to make this faster? besides buying a faster machine
> ;-)
>

First try to find out where the bottleneck is. If you supply --verbose 
(you're talking about load_seqdatabase.pl, right?), you'll see 
essentially every query as it is executed. Watching this for a short 
while should tell you whether just about every query takes a long time, 
or whether it's one specific query that is slow. Also, monitor the CPU 
load: how is the CPU time split between the perl process and the RDBMS 
process?
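
If eyeballing the --verbose output is too tedious, something like the 
following rough Python sketch could rank the output lines by how long 
the loader paused after printing each one. It makes several assumptions 
that are not checked here: that each statement is printed just before 
it is executed, that the output reaches the pipe unbuffered (Perl 
block-buffers STDOUT when it is piped, so the timings may be coarse), 
and the function names (stamp, slowest_gaps, report) are made up for 
this illustration and are not part of bioperl or BioSQL.

```python
import sys
import time
from typing import Iterable, Iterator, List, Tuple


def stamp(lines: Iterable[str], clock=time.monotonic) -> Iterator[Tuple[float, str]]:
    """Pair each line with the time at which it was read."""
    for line in lines:
        yield clock(), line.rstrip("\n")


def slowest_gaps(stamped: Iterable[Tuple[float, str]], top: int = 5) -> List[Tuple[float, str]]:
    """Return the `top` lines that were followed by the longest wait.

    The wait after a line is charged to that line, on the assumption
    that a statement is printed just before it is executed.
    """
    gaps = []
    prev = None
    for t, line in stamped:
        if prev is not None:
            gaps.append((t - prev[0], prev[1]))
        prev = (t, line)
    # Longest wait first; ties keep their original order.
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


def report(stream=sys.stdin, top: int = 10) -> None:
    """Print the slowest lines seen on `stream` (hypothetical helper)."""
    for gap, line in slowest_gaps(stamp(stream), top=top):
        print(f"{gap:8.3f}s  {line[:100]}")
```

To use it, add a final line calling report() and pipe the loader's 
output (stdout and stderr together, e.g. with 2>&1) into the script. If 
one kind of statement dominates the list, that is the specific query to 
look at; if the waits are spread evenly, the overhead is more likely 
general.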

> 2. In the taxon table, there is a column 'mito_genetic_code'
> Have people thought about genetic code for plastid genome, such as
> chloroplast?
>

The columns come straight from the NCBI taxon download. There is no 
plastid_genetic_code in there, and I can't explain why. Mind asking NCBI?

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


