[Bioperl-l] BioSQL: loading large sequence records, and taxon parsing

Elia Stupka elia at tll.org.sg
Fri Jun 20 21:32:24 EDT 2003


>> Is there any way to make this faster, besides buying a faster
>> machine? ;-)
>
> I have loaded the whole Swiss-Prot, EMBL and TrEMBL datasets into
> BioSQL in MySQL. It is not as fast as we expected, but it is
> bearable. :-)

The only way we have found to increase the loading speed is to split 
the dataset and fire off multiple loading scripts on different 
machines... but you need different machines to do that ;)
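
As a rough illustration of the splitting step, here is a minimal Python
sketch. It assumes the input is a Swiss-Prot/EMBL-style flat file in which
every record ends with a line containing only "//". The function names and
the chunk_NNN.dat file naming are made up for this example and are not part
of BioSQL or BioPerl.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one record (a list of lines) at a time.

    Assumption: a record ends at a line that is exactly '//',
    as in Swiss-Prot and EMBL flat files.
    """
    record: List[str] = []
    for line in lines:
        record.append(line)
        if line.rstrip() == "//":
            yield record
            record = []
    # Keep a trailing record that lacks a terminator, but skip blank leftovers.
    if any(l.strip() for l in record):
        yield record


def split_flatfile(src: str, out_dir: str, n_chunks: int) -> List[Path]:
    """Distribute the records of src round-robin over n_chunks files.

    The chunk file names (chunk_000.dat, ...) are hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = [out / f"chunk_{i:03d}.dat" for i in range(n_chunks)]
    handles = [p.open("w") for p in paths]
    try:
        with open(src) as fh:
            for i, record in enumerate(iter_records(fh)):
                handles[i % n_chunks].writelines(record)
    finally:
        for h in handles:
            h.close()
    return paths
```

Each chunk file can then be given to its own loader run on a separate
machine. Round-robin assignment keeps the chunks roughly equal in record
count without reading the whole file into memory first.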

We will try to make our full BioSQL dumps available soon, let me know 
if you want to have them.

Elia


>
>>
>> 3. The problem I encountered may be related to how the taxon_name
>> table is populated by load_seqdatabase.pl (or the modules it calls).
>> I loaded the database with 2 organelle genomes, the mito and the
>> chloroplast, with the following two records in that order. Though
>> both records show up in the bioentry table, it seems only the info
>> from the first record got populated into the taxon_name table:
>>
>>  taxon_id |                name                |   name_class
>> ----------+------------------------------------+-----------------
>>         1 | Eukaryota                          | scientific name
>>         2 | Viridiplantae                      | scientific name
>> .......... extra lines removed ...................
>>        13 | Brassicaceae                       | scientific name
>>        14 | Arabidopsis                        | scientific name
>>        15 | Mitochondrion                      | scientific name
>>        16 | Mitochondrion Arabidopsis          | scientific name
>>        17 | Mitochondrion Arabidopsis thaliana | scientific name
>>        17 | thale cress                        | common name
>> (18 rows)
>
> To be honest, I would not worry about it, as long as you can fetch the 
> results out correctly. I have actually met such a case before. One way 
> to solve it is to run load_ncbi_taxonomy before loading your sequences. 
> (That may be unnecessary in your case.)
>
> A user-to-user talk. :-)
>
> Juguang
>
> ------------ATGCCGAGCTTNNNNCT--------------
> Juguang Xiao
> Temasek Life Sciences Laboratory, National University of Singapore
> 1 Research Link,  Singapore 117604
> juguang at tll.org.sg
>
---
Bioinformatics Program Manager
Temasek Life Sciences Laboratory
1, Research Link
Singapore 117604
Tel. +65 6874 4945
Fax. +65 6872 7007


