[BioSQL-l] Quick question about tools using BioSQL

Amedeo, Paolo pamedeo at jcvi.org
Mon Mar 1 19:59:50 UTC 2010


I'm evaluating the possibility of using BioSQL for a new genome
annotation project that also involves loading annotated genomes from
GenBank.

So far, I have successfully deployed a working copy of the database
under MySQL on my machine and loaded a couple of genomes from GenBank
files, using the script load_seqdatabase.pl found in the scripts
directory of BioPerl-db-1.6.0.

Browsing the database, however, I have noticed a few things that concern
me a little bit.

First, the seqfeature_relationship table is completely empty, and I
couldn't find any obvious way to determine parent/child relationships
between the entries stored in seqfeature. For example, with overlapping
genes, or genes embedded in the introns of other genes, how can one
determine which gene a given CDS belongs to?
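If seqfeature_relationship stays empty, one possible fallback for
GenBank-derived data is to pair each CDS with the gene that carries the
same /locus_tag qualifier, which GenBank records commonly put on both
features. Unlike coordinate overlap, this pairing is not confused by
overlapping or nested genes. The following is a minimal sketch under
stated assumptions: it uses Python's sqlite3 and a heavily simplified,
hypothetical stand-in for BioSQL's seqfeature and
seqfeature_qualifier_value tables (real BioSQL stores the feature type
and the qualifier name as term_id references into the term table, not as
text columns), and all IDs and locus tags are made up.

```python
import sqlite3

# Simplified, hypothetical stand-in for BioSQL's seqfeature and
# seqfeature_qualifier_value tables. Real BioSQL uses term_id foreign keys
# for the feature type and qualifier name; text columns keep this short.
SCHEMA = """
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id   INTEGER NOT NULL,
    type_name     TEXT NOT NULL        -- e.g. 'gene', 'CDS'
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id  INTEGER NOT NULL,
    qualifier_name TEXT NOT NULL,      -- e.g. 'locus_tag'
    value          TEXT NOT NULL
);
"""

def cds_to_gene(conn):
    """Return (cds_id, gene_id, locus_tag) for CDS/gene pairs sharing a locus_tag."""
    query = """
    SELECT cds.seqfeature_id, gene.seqfeature_id, cq.value
    FROM seqfeature AS cds
    JOIN seqfeature_qualifier_value AS cq
      ON cq.seqfeature_id = cds.seqfeature_id AND cq.qualifier_name = 'locus_tag'
    JOIN seqfeature_qualifier_value AS gq
      ON gq.qualifier_name = 'locus_tag' AND gq.value = cq.value
    JOIN seqfeature AS gene
      ON gene.seqfeature_id = gq.seqfeature_id
     AND gene.type_name = 'gene'
     AND gene.bioentry_id = cds.bioentry_id   -- only pair within one bioentry
    WHERE cds.type_name = 'CDS'
    ORDER BY cds.seqfeature_id
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Two hypothetical genes on the same bioentry, each with one CDS.
# Coordinates are omitted because the pairing does not use them.
conn.executemany("INSERT INTO seqfeature VALUES (?, ?, ?)",
                 [(1, 10, "gene"), (2, 10, "CDS"), (3, 10, "gene"), (4, 10, "CDS")])
conn.executemany("INSERT INTO seqfeature_qualifier_value VALUES (?, ?, ?)",
                 [(1, "locus_tag", "ABC_0001"), (2, "locus_tag", "ABC_0001"),
                  (3, "locus_tag", "ABC_0002"), (4, "locus_tag", "ABC_0002")])

for cds_id, gene_id, tag in cds_to_gene(conn):
    print(f"CDS {cds_id} -> gene {gene_id} ({tag})")
```

This only helps when the source records actually carry locus_tag (or a
similar shared qualifier such as /gene) on both features; without one,
coordinates are the only remaining evidence and the ambiguity above
remains.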

Second, I was unable to find a dedicated script to populate the ontology
table, and I was somewhat surprised that the table was nonetheless
populated with the keywords present in the GenBank files.

Third, I once loaded a genome without first populating the taxon table,
and as a result the values assigned to taxon.left_value and
taxon.right_value described a narrow interval that did not include the
taxon_id of the loaded genome at all.
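For context, the BioSQL taxon table stores the taxonomy tree as a nested
set: left_value and right_value are counters from a depth-first walk of
the tree, so each interval is a position in the tree rather than a range
of taxon_id values. Taxon A is an ancestor of taxon B when
A.left_value < B.left_value and B.right_value < A.right_value. The
following minimal sketch, using Python's sqlite3 and a hypothetical
five-node tree with made-up taxon_id values, assigns the two values and
runs that ancestor test; it does not reproduce how the BioPerl loaders
compute them.

```python
import sqlite3

def assign_nested_set(children, root):
    """Number a tree depth-first; return {taxon_id: (left_value, right_value)}."""
    values = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter                  # assigned on the way down
        counter += 1
        for child in children.get(node, []):
            visit(child)
        values[node] = (left, counter)  # assigned on the way back up
        counter += 1

    visit(root)
    return values

# Hypothetical tree: 10 is the root; 20 -> 30 -> 40 is one lineage,
# and 50 is a sibling of 20.
children = {10: [20, 50], 20: [30], 30: [40]}
parent = {c: p for p, cs in children.items() for c in cs}
values = assign_nested_set(children, 10)

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE taxon (
    taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER,
    left_value INTEGER, right_value INTEGER)""")
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)",
                 [(t, parent.get(t), l, r) for t, (l, r) in values.items()])

def ancestors(conn, taxon_id):
    """Ancestors are the nodes whose interval strictly encloses this node's interval."""
    rows = conn.execute("""
        SELECT a.taxon_id FROM taxon AS a JOIN taxon AS t
          ON a.left_value < t.left_value AND t.right_value < a.right_value
        WHERE t.taxon_id = ? ORDER BY a.left_value""", (taxon_id,))
    return [r[0] for r in rows]

print(values[50])           # (8, 9): the interval does not contain 50
print(ancestors(conn, 40))  # [10, 20, 30]
```

A quick sanity check on a loaded database along these lines is whether
the ancestor query returns the expected lineage for the genome's taxon;
an interval that fails to enclose its parent's children would point to
incomplete or stale nested-set values.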

I then tried to use the script bioentry2flat.pl to write the genome I
had loaded back out to a gbf file. Unfortunately, I couldn't find any
documentation for this script, so for its various parameters I used the
same values I had used with load_seqdatabase.pl. I had to edit the code
to remove some hard-coded values, but I still couldn't get the script to
run successfully. I suspect there is a problem with matching the
accession correctly.

Obviously, I'm doing one or more things wrong, or I'm not using the
right set of tools for what I need to do.

I would really appreciate it if somebody could point me to a set of
tools that would let me load gbf files into the database and extract
individual accessions in both gbf and ASN.1 (sqn) format, or could teach
me how to use these two scripts correctly so that the
bioentry_relationship table is populated correctly.


Thanks for your consideration!

Paolo Amedeo
Senior Bioinformatics Engineer
J. Craig Venter Institute
9704 Medical Center Dr.
Rockville, MD 20850