[Bioperl-l] BioSQL: loading large sequence records, and taxon parsing

Juguang Xiao juguang at tll.org.sg
Fri Jun 20 13:38:31 EDT 2003


>
>
> 1. I am wondering if anyone has tried to load a large sequence (like a
> whole chromosome with annotation). It took me overnight to load in a 20Mb
> sequence with some 4000 genes-worth annotation, on a laptop of P-III,
> 750 MHz, and 250Mb mem.
> Is there any way to make this faster? besides buying a faster machine ;-)
>

I have loaded the whole Swiss-Prot, EMBL and TrEMBL datasets into BioSQL 
on MySQL. It was not as fast as we expected, but it was tolerable. :-)
>

>
> 3. The problem I encountered that may be related to how the taxon_name
> table is populated by the load_seqdatabase.pl (or modules called by). I
> loaded the database with 2 organelle genomes the mito and the chloroplast
> with following two records in that order.  Though both records show up in
> the bioentry table, it seems only the info from the first record got
> populated into the taxon_name table:
>
>  taxon_id |                name                |   name_class
> ----------+------------------------------------+-----------------
>         1 | Eukaryota                          | scientific name
>         2 | Viridiplantae                      | scientific name
> .......... extra lines removed ...................
>        13 | Brassicaceae                       | scientific name
>        14 | Arabidopsis                        | scientific name
>        15 | Mitochondrion                      | scientific name
>        16 | Mitochondrion Arabidopsis          | scientific name
>        17 | Mitochondrion Arabidopsis thaliana | scientific name
>        17 | thale cress                        | common name
> (18 rows)

To be honest, I do not worry about this, as long as you can fetch the 
results correctly. I have actually run into this case before. One way to 
solve it is to run load_ncbi_taxonomy before loading your sequences. 
(That may be unnecessary in your case.)
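
The ordering suggested above (taxonomy first, sequences second) could be 
scripted roughly as follows. This is only a sketch under assumptions: the 
script file names, the --dbname/--driver/--dbuser/--format/--download 
options, the database name "biosql" and the two input file names are 
placeholders that are not taken from this thread, so check each script's 
own usage message before relying on them.

```python
import shlex


def build_load_commands(dbname, driver, dbuser, seq_format, seq_files):
    """Return the two load commands, taxonomy first and sequences second.

    All option names below are assumptions made for illustration; verify
    them against the usage text of the scripts you actually have.
    """
    common = ["--dbname", dbname, "--driver", driver, "--dbuser", dbuser]
    # Step 1: populate the taxon tables from the NCBI taxonomy dump.
    taxonomy_cmd = ["perl", "load_ncbi_taxonomy.pl", *common,
                    "--download", "true"]
    # Step 2: load the sequence records, which can then refer to taxa
    # that are already present.
    sequence_cmd = ["perl", "load_seqdatabase.pl", *common,
                    "--format", seq_format, *seq_files]
    return [taxonomy_cmd, sequence_cmd]


if __name__ == "__main__":
    # Hypothetical database name, user and input files.
    commands = build_load_commands(
        "biosql", "mysql", "root", "genbank",
        ["mito.gb", "chloroplast.gb"],
    )
    for cmd in commands:
        # Print only; pass each list to subprocess.run(cmd, check=True)
        # to execute the commands for real.
        print(shlex.join(cmd))
```

The sketch only prints the commands, so the options can be reviewed 
before anything touches the database.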

Just one user talking to another. :-)

Juguang

------------ATGCCGAGCTTNNNNCT--------------
Juguang Xiao
Temasek Life Sciences Laboratory, National University of Singapore
1 Research Link,  Singapore 117604
juguang at tll.org.sg
