[Bioperl-l] BioSQL: loading large sequence records, and taxon parsing
Juguang Xiao
juguang at tll.org.sg
Fri Jun 20 13:38:31 EDT 2003
>
>
> 1. I am wondering if anyone has tried to load a large sequence (like a
> whole chromosome with annotation). It took me overnight to load in a 20Mb
> sequence with some 4000 genes-worth annotation, on a laptop of P-III,
> 750 MHz, and 250Mb mem.
> Is there any way to make this faster? besides buying a faster machine ;-)
>
I have loaded the whole Swiss-Prot, EMBL and TrEMBL datasets into BioSQL
on MySQL. It was not as fast as we expected, but it was bearable. :-)
>
>
> 3. The problem I encountered that may be related to how the taxon_name
> table is populated by the load_seqdatabase.pl (or modules called by). I
> loaded the database with 2 organelle genomes the mito and the chloroplast
> with following two records in that order. Though both records show up in
> the bioentry table, it seems only the info from the first record got
> populated into the taxon_name table:
>
>  taxon_id | name                               | name_class
> ----------+------------------------------------+-----------------
>         1 | Eukaryota                          | scientific name
>         2 | Viridiplantae                      | scientific name
> .......... extra lines removed ...................
>        13 | Brassicaceae                       | scientific name
>        14 | Arabidopsis                        | scientific name
>        15 | Mitochondrion                      | scientific name
>        16 | Mitochondrion Arabidopsis          | scientific name
>        17 | Mitochondrion Arabidopsis thaliana | scientific name
>        17 | thale cress                        | common name
> (18 rows)
To be honest, I would not worry about it, as long as you can fetch the
records back out correctly. I have actually run into this case before. One
way to solve it is to run load_ncbi_taxonomy before loading your sequences.
(That may be unnecessary in your case.)
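
One way to check whether records still come back with their taxon is to
look for bioentry rows whose taxon has no scientific name. The sketch below
is only an illustration: it uses Python's sqlite3 with a cut-down,
hypothetical stand-in for the bioentry and taxon_name tables (the real
BioSQL schema has more columns), and the accessions and the missing taxon
link are invented demo data, not what actually happened in the case quoted
above.

```python
import sqlite3


def entries_missing_taxon_name(conn):
    """Return accessions of bioentries that have no 'scientific name' row."""
    rows = conn.execute(
        """
        SELECT b.accession
        FROM bioentry AS b
        LEFT JOIN taxon_name AS t
               ON t.taxon_id = b.taxon_id
              AND t.name_class = 'scientific name'
        WHERE t.taxon_id IS NULL
        ORDER BY b.accession
        """
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_db():
    """Build an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE bioentry (
            bioentry_id INTEGER PRIMARY KEY,
            accession   TEXT,
            taxon_id    INTEGER
        );
        CREATE TABLE taxon_name (
            taxon_id   INTEGER,
            name       TEXT,
            name_class TEXT
        );
        """
    )
    # Invented demo rows: one entry linked to taxon 17, one with no taxon.
    conn.executemany(
        "INSERT INTO bioentry (accession, taxon_id) VALUES (?, ?)",
        [("MITO_DEMO", 17), ("CHLORO_DEMO", None)],
    )
    conn.executemany(
        "INSERT INTO taxon_name VALUES (?, ?, ?)",
        [
            (17, "Mitochondrion Arabidopsis thaliana", "scientific name"),
            (17, "thale cress", "common name"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(entries_missing_taxon_name(demo))
```

An empty list would mean every entry resolves to a named taxon. Against a
real BioSQL instance the same query idea would apply, but the connection
and column names would need to be checked against the installed schema.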
Just one user talking to another. :-)
Juguang
------------ATGCCGAGCTTNNNNCT--------------
Juguang Xiao
Temasek Life Sciences Laboratory, National University of Singapore
1 Research Link, Singapore 117604
juguang at tll.org.sg