[Bioperl-l] load_ontology and GO - progress!
Dave Howorth
dhoworth at mrc-lmb.cam.ac.uk
Fri Apr 23 09:30:45 EDT 2004
Sean Davis wrote:
> I think you can load with --noobsolete (see perldoc for
> load_ontology.pl). You may also want to use --safe so that if
> there does happen to be a term already loaded, the entire load
> does not fail (again, see perldoc).
Thanks Sean, that workaround was successful and has let me load the
database.
Hilmar Lapp wrote:
> Did you read the load_ontology.pl POD, in particular the documentation
> for the options that deal with obsolete terms?
Hi Hilmar,
Yes I had read the POD. Specifically, I used the options shown in the
synopsis of that document "for loading the Gene Ontology". I had
expected that to be a working example. If different options are needed,
I would have expected them to be used, or at least mentioned, in the
synopsis. Nor does the description of the --noobsolete option indicate
that it is necessary when loading GO, as opposed to being something I
might consider for reasons that aren't explained.
Remember, this is the first time I have used this database and loader
and I am specifically using them now with the aim of learning what the
issues are and how best to deal with them. Unless the documentation
describes an issue, I'm not going to be aware of it until I trip over it :(
> Obsolete terms is not a trivial thing to deal with, and in the end you
> need to make some decisions for yourself. load_ontology.pl offers
> several choices but it's up to you what works best for you. There have
> been prior threads on this; e.g. reading
> http://bioperl.org/pipermail/bioperl-l/2004-February/014846.html may
> give you some additional information.
I agree terms that become obsolete are complex to deal with.
I think there are two different issues associated with the two examples
I gave (elastin and collagen).
Collagen appears to be an example of the case you discuss in the thread
to which you refer. Incidentally, adding the obsolete flag to the key
will only ensure uniqueness through one obsolescence event. If a term
with the same name were made obsolete a second time, the key would
collide and the load would fail. Hopefully that is an unlikely
scenario :) But if I wanted to handle that case, I would probably look
to adding some form of versioning, rather than a boolean flag.
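To illustrate what I mean, here is a minimal sketch in Python. It is a
toy in-memory model, not the biosql schema: the TermTable class, the
field names and the "some_ontology" value are all made up for the
example, and I am assuming the key is (name, ontology, obsolete flag)
as discussed in that thread.

```python
# Toy model of a unique key. Hypothetical names throughout; this is not
# the actual biosql schema or loader code.
class DuplicateKey(Exception):
    pass


class TermTable:
    def __init__(self, key_fields):
        self.key_fields = key_fields  # fields that make up the unique key
        self.rows = {}

    def insert(self, **row):
        key = tuple(row[f] for f in self.key_fields)
        if key in self.rows:
            raise DuplicateKey(key)
        self.rows[key] = row


def count_failures(table, rows):
    """Insert every row and count how many violate the unique key."""
    failures = 0
    for row in rows:
        try:
            table.insert(**row)
        except DuplicateKey:
            failures += 1
    return failures


# One name that was made obsolete, re-created, made obsolete again,
# and then re-created once more as a live term.
history = [
    {"name": "collagen", "ontology": "some_ontology", "obsolete": True, "version": 1},
    {"name": "collagen", "ontology": "some_ontology", "obsolete": True, "version": 2},
    {"name": "collagen", "ontology": "some_ontology", "obsolete": False, "version": 3},
]

flag_failures = count_failures(
    TermTable(("name", "ontology", "obsolete")), history)
version_failures = count_failures(
    TermTable(("name", "ontology", "version")), history)
```

With the boolean flag in the key, the second obsolete row collides with
the first; with a version number in the key, all three rows are
distinct.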
But the issue with elastin is not about terms that become obsolete; it's
about two terms with the same name. The obsolescence appears to me to be
incidental. The GO files use two different terms (different GO IDs,
different ontologies) with the same name. They happen to be obsolete but
it looks like they both existed at the same time (the GO IDs differ only
by 1 and the terms are used in separate ontology files), not that a term
was made obsolete and another independent term was later created that
happened to use the same name.
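A check along the following lines could find such cases before a load
is attempted. This is only a sketch in Python: the function name, the
(identifier, name, ontology) tuple layout and the placeholder
identifiers are my own assumptions, not anything taken from the GO
files or from load_ontology.pl.

```python
from collections import defaultdict


def find_name_clashes(terms):
    """Return names that are used by more than one term.

    `terms` is an iterable of (identifier, name, ontology) tuples;
    this layout is assumed purely for the example.
    """
    by_name = defaultdict(set)
    for identifier, name, ontology in terms:
        by_name[name].add((identifier, ontology))
    return {name: sorted(entries)
            for name, entries in by_name.items()
            if len(entries) > 1}


# Placeholder data: two terms with adjacent identifiers, different
# ontologies and the same name, plus one unrelated term.
terms = [
    ("ID:0000001", "elastin", "ontology_a"),
    ("ID:0000002", "elastin", "ontology_b"),
    ("ID:0000003", "some other term", "ontology_a"),
]

clashes = find_name_clashes(terms)
```

Any name that comes back from such a check is one that a key relying
on the term name would have to disambiguate by some other field.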
If that scenario ever occurs again, it will break the schema. I
surmise from the low-numbered GO IDs that perhaps this was something
that happened early in the history of GO and will not be permitted to
happen again? But you appeared to think it is possible when you said in Feb:
"This is not atypical for annotation being a work in progress"
<http://bioperl.org/pipermail/bioperl-l/2004-February/014908.html>.
Hence my interest in whether such an event would constitute a data error
to report to GO or a schema error in biosql and my consequent curiosity
about the exact rules for GO.
Cheers, Dave
--
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960