[BioSQL-l] Problem loading GO.
Hilmar Lapp
hlapp at gmx.net
Tue Apr 17 04:00:55 UTC 2007
Hi Leighton, please see below:
On Apr 16, 2007, at 11:55 AM, Leighton Pritchard wrote:
> Hi,
>
> I've been trying to upload the GO into a clean BioSQL (MySQL, 1.4.1)
> schema using the BioPerl bp_load_ontology.pl script, with the OBOv1.0,
> OBOv1.2, and the most recent flatfiles from
> http://www.geneontology.org/GO.downloads.ontology.shtml - none of my
> attempts have been successful. The errors below are from a Linux
> installation, but the same errors are thrown on OS X, too. I am using
> the most recent versions of BioPerl and bioperl-db, installed via
> CPAN:
>
> [lpritc at lplinuxdev sequence_data]$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.005002102
>
> and bioperl-db 1.5.2.
>
> I have attached the traceback below (running with --safe throws a
> number of equivalent errors),
Using --safe will throw the same errors but continue loading, i.e.,
you'd lose the one term but keep everything else. I do realize that,
especially for a graph, losing an internal node can be quite
detrimental.
> [...]
> ########
>
> [lpritc at lplinuxdev sequence_data]$ bp_load_ontology.pl --host localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc
> --dbpass ******** --format obo ~/Downloads/gene_ontology_edit.obo
> Loading ontology gene_ontology:
> ... terms
> ... relationships
> Done with gene_ontology.
> Loading ontology biological_process:
> ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------
This would point to a problem in the BioPerl OBO parser. According to
the message, both the database name and the accession of the db_xref
for the term are (surely erroneously) empty. Apparently the parser
fails to parse the database and accession out of this db_xref of term
GO:0018901.
If you can edit the OBO file, you can try deleting the db_xref(s) for
that term that look odd (or delete all of them if you don't need
them). I'd have to debug the OBO parser to see exactly where it goes
wrong.
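To find the db_xrefs that look odd before editing the file, a small scan can help. Below is a minimal Python sketch, not part of BioPerl; the helper names (suspicious_xrefs and friends) are hypothetical. It assumes db_xrefs appear in DB:ACCESSION form either on xref: lines (plus xref_analog: lines, an assumption about OBO 1.0 naming) or in the bracketed list at the end of a def: line, and it flags every one whose database or accession part is empty.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: "xref" is the OBO 1.2 tag and "xref_analog" the OBO 1.0 one; adjust as needed.
XREF_TAGS = ("xref", "xref_analog")


def _split_db_acc(token: str) -> Tuple[str, str]:
    """Split 'DB:ACCESSION' on the first colon; a missing part comes back empty."""
    db, _sep, acc = token.partition(":")
    return db.strip(), acc.strip()


def _def_dbxrefs(value: str) -> List[str]:
    """Return the dbxrefs from a def value like '"text" [A:1, B:2]' (list after the last '[')."""
    start, end = value.rfind("["), value.rfind("]")
    if start == -1 or end < start:
        return []
    inner = value[start + 1:end]
    # Commas inside accessions are escaped as '\,', so split only on unescaped commas.
    return [x.strip() for x in re.split(r"(?<!\\),", inner) if x.strip()]


def suspicious_xrefs(lines: Iterable[str]) -> List[Tuple[Optional[str], str]]:
    """Return (term id, xref) pairs whose database or accession part is empty."""
    found: List[Tuple[Optional[str], str]] = []
    term_id: Optional[str] = None
    in_term = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):  # stanza header such as [Term] or [Typedef]
            in_term = line == "[Term]"
            term_id = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            term_id = value
            continue
        if tag in XREF_TAGS:
            value = value.split(" !", 1)[0].strip()  # drop a trailing "! comment"
            candidates = [value.split()[0] if value else ""]
        elif tag == "def":
            candidates = _def_dbxrefs(value)
        else:
            continue
        for xref in candidates:
            db, acc = _split_db_acc(xref)
            if not db or not acc:
                found.append((term_id, xref))
    return found
```

For example, opening gene_ontology_edit.obo and passing the file handle to suspicious_xrefs lists the offending entries; any hit under GO:0018901 would be a candidate for deletion.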
> Could not store term GO:0018901, name '2,4-dichlorophenoxyacetic acid
> metabolic process':
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> [...]
> [lpritc at lplinuxdev sequence_data]$ bp_load_ontology.pl --host localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc
> --dbpass ******** --format goflat --fmtargs ~/Downloads/GO.defs
Note that the argument to --fmtargs here should read
"-defs_file,/path/to/Downloads/GO.defs" (within the quotes there is no
tilde expansion, so spell out the full path).
> ~/Downloads/function.ontology
> Loading ontology Gene Ontology:
> ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("MetaCyc","2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN","0","") FKs ()
> Duplicate entry '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0' for key 2
> ---------------------------------------------------
This is one of the reasons you've got to love MySQL (am I correct in
inferring that you're using MySQL?). The dbxref.accession column,
which holds the second value in the parentheses, is 40 chars wide.
The accession being loaded, "2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN",
is 41 chars long, which when loaded should have resulted in an
exception. Instead, MySQL simply and silently truncates it to 40
chars, "2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX". The duplicate entry
in the message ("2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0",
50 chars) is that truncated accession joined with the database name
and version, i.e., the unique key of a dbxref row that is already
stored.
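The collision can be reproduced in a few lines. This is a minimal Python sketch, assuming MySQL runs without strict SQL mode (so an over-long VARCHAR value is truncated rather than rejected) and that the unique key on dbxref is (accession, dbname, version), which is what the three hyphen-joined parts of the reported duplicate suggest. mysql_truncate and duplicate_entry are hypothetical helpers, not BioSQL code.

```python
ACCESSION_WIDTH = 40  # dbxref.accession width as described above


def mysql_truncate(value: str, width: int = ACCESSION_WIDTH) -> str:
    """Mimic non-strict MySQL: silently keep only the first `width` characters."""
    return value[:width]


def duplicate_entry(accession: str, dbname: str, version: int) -> str:
    """Render the assumed (accession, dbname, version) key as the warning shows it, joined by '-'."""
    return "-".join([mysql_truncate(accession), dbname, str(version)])


full = r"2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN"
print(len(full))                            # 41
print(mysql_truncate(full))                 # 2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX
print(duplicate_entry(full, "MetaCyc", 0))  # 2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0
```

The last line matches the duplicate entry in the warning character for character, which is consistent with the truncation explanation.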
It may be necessary to widen the dbxref.accession column here, for
example to 80 chars. Let me know if you need help with the DDL
command to do this.
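Before choosing a new width, it may help to check how long the accessions in the file actually get. This is a minimal Python sketch, assuming the OBO file carries its db_xrefs on xref: lines in DB:ACCESSION form (the goflat files would need a different extraction step); xref_accessions and overlong are hypothetical helpers, and 40 is the current width quoted above.

```python
from typing import Iterable, Iterator, List, Tuple

ACCESSION_WIDTH = 40  # current dbxref.accession width, as stated above


def xref_accessions(lines: Iterable[str]) -> Iterator[str]:
    """Yield the DB:ACCESSION token from each 'xref:' line (assumed OBO layout)."""
    for raw in lines:
        line = raw.strip()
        if line.startswith("xref:"):
            value = line[len("xref:"):].split(" !", 1)[0].strip()  # drop "! comment"
            if value:
                yield value.split()[0]


def overlong(xrefs: Iterable[str], width: int = ACCESSION_WIDTH) -> Tuple[List[str], int]:
    """Return the xrefs whose accession exceeds `width` chars and the longest accession length."""
    too_long: List[str] = []
    longest = 0
    for xref in xrefs:
        _db, _sep, acc = xref.partition(":")
        longest = max(longest, len(acc))
        if len(acc) > width:
            too_long.append(xref)
    return too_long, longest
```

For example, passing an open handle on the OBO file through xref_accessions and then overlong reports every accession that would be truncated; if the longest one stays under 80, the suggested width would cover that release of the file.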
Let me know how far this gets you.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================