[BioSQL-l] [Bioperl-l] Problem loading GO.

Chris Mungall cjm at fruitfly.org
Tue Apr 17 16:54:51 UTC 2007


Is there any reason you're loading GO.defs? This is a legacy format;
all the information is subsumed in the obo file.

I didn't see your message to the GO folks re formatting errors - who  
did you send it to & what was the subject? I'll see it gets seen to.

>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values
>> were
>> ("GO:0006905","vesicle transport","OBSOLETE (was not defined before
>> being made obsolete).","X","") FKs (1)
>> Duplicate entry 'vesicle transport-1-X' for key 3
>> ---------------------------------------------------
>> Could not store term GO:0006905, name 'vesicle transport':
>> [...]
>> There are duplicate terms, identical in the term table except for
>> GOID: GO:0006905 and GO:0005480.  They are both "vesicle transport",
>> and obsoleted:
>
> That violates the uniqueness constraint, and this sounds more like a
> bug in the GO file. I'm also not sure what motivated them to create
> the same term multiple times only to obsolete it immediately.

These things happen - the schema should be able to deal with it. It's
a pain, I know. In Chado we have a hacky solution for this (I
believe it concatenates the ID onto the name of obsolete terms).

I think that it's actually wrong to include obsoletes and actual terms
in the same table. However, it's obviously astoundingly useful to be
able to do this, but it requires the hack to get out of the uniqueness
violation.
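
To make the hack concrete, here is a minimal sketch of the idea, not the
actual Chado code. It assumes terms arrive as plain dicts with "id", "name"
and "is_obsolete" keys, and the "(obsolete <ID>)" suffix format is purely
illustrative; the function name is hypothetical too.

```python
def disambiguate_obsolete_names(terms):
    """Return copies of the terms with the ID appended to obsolete names.

    Each term is assumed to be a dict with 'id', 'name' and 'is_obsolete'
    keys (a hypothetical layout, not a BioSQL or Chado structure).
    """
    result = []
    for term in terms:
        name = term["name"]
        if term["is_obsolete"]:
            # The ID is unique, so the combined name is unique as well.
            name = f"{name} (obsolete {term['id']})"
        result.append({**term, "name": name})
    return result


terms = [
    {"id": "GO:0006905", "name": "vesicle transport", "is_obsolete": True},
    {"id": "GO:0005480", "name": "vesicle transport", "is_obsolete": True},
]
for term in disambiguate_obsolete_names(terms):
    print(term["id"], "->", term["name"])
```

With names rewritten like this, the two obsolete "vesicle transport" terms
no longer collide on a key that includes the name.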

The EBI loads all of OBO into BioSQL regularly - I wonder how they  
handle this?

On Apr 17, 2007, at 8:09 AM, Hilmar Lapp wrote:

>
> On Apr 17, 2007, at 9:35 AM, Leighton Pritchard wrote:
>
>> Hi Hilmar,
>>
>> Thanks for the very quick response.  Apologies for the long reply, but
>> I thought it might be useful if anyone else happens across the same
>> problems that I did.
>
> Thanks for reporting all these.
>
>> [...]
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
>> were ("","","0","") FKs ()
>> Column 'dbname' cannot be null
>> ---------------------------------------------------
>> Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate
>> lactonase activity':
>> [...]
>> I tracked this down to an apparently poor formatting of the GO.defs
>> file (note that the first and third definition_lines appear to be two
>> halves of the same entry):
>>
>> term: 2-pyrone-4,6-dicarboxylate lactonase activity
>> goid: GO:0047554
>> definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate +
>> H2O
>> = 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
>> definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
>
> I wonder whether this is the line that throws the parser off. It
> looks like the database part of the reference is missing - bad.
>
>> definition_reference: EC:3.1.1.57
>> definition_reference: MetaCyc:2-PYRONE-4
>>
>> I found 43 similar errors for other GOIDs, and it appears to result
>> from the occurrence of the string "\," in a dbxref - mostly MetaCyc
>> entries, but also some UM-BBD_pathwayID entries.
>
> I'm not sure - although the string "\," might indeed trip up the
> parser, I would have to investigate to confirm. Could it be a
> coincidence with definition_references that lack the database part
> before the colon?
>
>>
>> These errors appear to have followed through into the generation of
>> the OBO format files in each case, e.g.:
>>
>> def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O =
>> 4-carboxy-2-hydroxyhexa-2,4-dienedioate." [:6-DICARBOXYLATE-
>> LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]
>
> Again, the first db_xref lacks the database in front of the colon. I
> can also see why "\," will trip up the parser in this format.
>
>>
>> and so is something for the GO guys to fix, I guess.
>
> The lack of a database for certain xrefs surely is. If the escaped
> comma does throw off the BioPerl parser then that part is for BioPerl
> to fix. It does seem to extract the parts correctly, if the error
> message is any indication, though you may argue that it should remove
> the escaping backslashes (and I'd certainly agree with that).
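
For what it's worth, here is a minimal sketch of splitting a dbxref list on
unescaped commas only and then dropping the escaping backslashes. This is
not the BioPerl parser. The input string is my guess at what the intact
entry looked like before it was split, and the function name is
hypothetical. It also ignores edge cases such as an escaped backslash
directly before a comma.

```python
import re


def split_dbxrefs(xref_list):
    """Split the text inside [...] into (database, accession) pairs.

    Commas preceded by a backslash are treated as part of the accession.
    """
    pairs = []
    # Negative lookbehind: split on a comma only if it is not escaped.
    for part in re.split(r"(?<!\\),", xref_list):
        # Drop the backslash from escapes such as "\," once split.
        text = re.sub(r"\\(.)", r"\1", part.strip())
        database, sep, accession = text.partition(":")
        if not sep:
            # No colon at all: treat the whole string as the accession.
            database, accession = "", text
        pairs.append((database, accession))
    return pairs


# Assumed original form of the entry (a reconstruction, not from the file).
raw = r"MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57"
for database, accession in split_dbxrefs(raw):
    print(repr(database), repr(accession))
```

An empty database in the returned pairs, as in the broken
":6-DICARBOXYLATE-LACTONASE-RXN" entry, is then easy to flag before
anything is written to the dbxref table.
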
>
>>
>>
>> Another error is thrown after fixing the above, though (with the same
>> command as before):
>>
>> Loading ontology Gene Ontology:
>>         ... terms
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values
>> were
>> ("GO:0006905","vesicle transport","OBSOLETE (was not defined before
>> being made obsolete).","X","") FKs (1)
>> Duplicate entry 'vesicle transport-1-X' for key 3
>> ---------------------------------------------------
>> Could not store term GO:0006905, name 'vesicle transport':
>> [...]
>> There are duplicate terms, identical in the term table except for
>> GOID: GO:0006905 and GO:0005480.  They are both "vesicle transport",
>> and obsoleted:
>
> That violates the uniqueness constraint, and this sounds more like a
> bug in the GO file. I'm also not sure what motivated them to create
> the same term multiple times only to obsolete it immediately.
>
>> [...]
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
>> were ("PMID","","0","") FKs ()
>> Column 'accession' cannot be null
>> ---------------------------------------------------
>> Could not store term GO:0032933, name 'SREBP-mediated signaling
>> pathway':
>> [...]
>> with the offending entry being
>>
>> term: SREBP-mediated signaling pathway
>> goid: GO:0032933
>> definition: A series of molecular signals from the endoplasmic
>> reticulum
>> to the nucleus generated as a consequence of altered levels of one or
>> more lipids, and resulting in the activation of transcription by
>> SREBP.
>> definition_reference: GOC:mah
>> definition_reference: PMID:0
>>
>> I commented out the definition_reference for PMID:0, which seemed to
>> fix matters.
>
> Right, it seems to be a bogus reference.
>
>>
>> The process.ontology and component.ontology files then went into the
>> database without a hitch.  Thanks again for your help,
>
> Fantastic you got it all loaded!
>
> Note that you also have the --computetc switch which will compute the
> transitive closure for you automatically.
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>


