[BioSQL-l] gene ontology questions (bug)
Hilmar Lapp
hlapp at gnf.org
Mon Jun 2 13:41:02 EDT 2003
On Sunday, June 1, 2003, at 01:46 PM, Marc Colosimo wrote:
> I have several questions about the GO structure.
>
> First, the old ER diagram only has ontology_term.
That diagram is dated; there is no current ERD. I need to redo it,
sorry about that. I will get to it today or tomorrow.
> But the version I have has ontology and ontology_term.
Then you have mixed versions. There should be no ontology_term table,
and indeed the ontology table is new. The ontology_term table was
renamed to term. Are you running a recent CVS checkout? Did you try to
instantiate it over a previous schema?
> I want to add my own terms, but with so
> many constraints, I have no clue where to start. Here is a sample
> autogenerated list of what I want to add:
>
> Term ID   Term Name      Frequency
> 3         reproduction   248
> 8         thioredoxin    39
>
> (This was made by dChip, with Affymetrix CSV files, for those
> interested).
Right now there is no term_qualifier_value table. There is
term_relationship, but it has no value column. However, I guess the
frequency is the value to be associated with a chip target, i.e., either
a bioentry or a seqfeature. For both there are *_qualifier_value tables.
If you want better help, you need to be more specific about what you
want to represent and for which purpose.
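A minimal sketch of turning a list like the one above into something loadable, in Perl like the rest of this thread. It assumes the dChip "Term ID" is the numeric part of a GO accession (3 becomes GO:0000003, which is GO's "reproduction" term, consistent with the sample) and that rows are whitespace-separated with a one-line header. The helper name parse_dchip_terms is made up for illustration. Each row becomes a GO identifier plus a frequency; the frequency is what you would then store as the value in bioentry_qualifier_value or seqfeature_qualifier_value for the chip target.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a dChip GO frequency table.
# ASSUMPTION: the numeric "Term ID" is the GO accession without the
# "GO:" prefix and zero padding, e.g. 3 -> GO:0000003.
sub parse_dchip_terms {
    my @lines = @_;
    my @rows;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;           # skip blank lines
        next if $line =~ /^\s*Term ID/i;    # skip the header line
        # term names may contain spaces, so anchor on the first and last fields
        my ($id, $name, $freq) = $line =~ /^\s*(\d+)\s+(.*?)\s+(\d+)\s*$/
            or next;                        # ignore lines that don't match
        push @rows, {
            go_id     => sprintf('GO:%07d', $id),
            name      => $name,
            frequency => $freq,
        };
    }
    return @rows;
}

my @rows = parse_dchip_terms(
    "Term ID Term Name Frequency\n",
    "3 reproduction 248\n",
    "8 thioredoxin 39\n",
);

# one tab-separated line per term: GO id, name, frequency
printf "%s\t%s\t%d\n", @{$_}{qw(go_id name frequency)} for @rows;
```

Matching on the GO identifier (term.identifier) rather than on the dChip term name means a row can only attach to a term that is already loaded.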
>
> Second, I tried to load in stuff (function.ontology) from geneontology.org
> using the script load_ontology.pl.
>
> here is the error I get:
> perl load_ontology.pl --dbname bioseqdb --dbuser mcolosim --driver Pg ~/Affymetrix/function.ontology
> Parsing input ...
> Loading ontology Gene Ontology:
> ... terms
> DBD::Pg::st execute failed: ERROR: Relation "term" does not exist at
You need to upgrade to the latest schema version. The table ontology_term
was renamed to term in the Singapore version (which is essentially the
CVS HEAD).
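A minimal sketch for checking which schema you actually have, assuming only the rule stated above: the Singapore schema has term and ontology, older ones have ontology_term, and a database with ontology_term next to ontology is a mix. The helper names schema_version and table_names are hypothetical; table_names uses the standard DBI table_info call, and the connection values in the usage comment are simply the ones from the command line quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a BioSQL schema by its table names.
# Singapore schema: "term" and "ontology" exist, "ontology_term" does not.
# Older schema: "ontology_term" instead of "term" (and no "ontology").
sub schema_version {
    my %has = map { lc($_) => 1 } @_;
    return 'mixed'
        if $has{ontology_term} && ($has{term} || $has{ontology});
    return 'singapore'     if $has{term} && $has{ontology};
    return 'pre-singapore' if $has{ontology_term};
    return 'unknown';
}

# Hypothetical helper: plain table names of a live database via DBI.
# table_info returns one row per table; column 2 is TABLE_NAME.
sub table_names {
    my ($dbh) = @_;
    my $sth = $dbh->table_info(undef, undef, '%', 'TABLE');
    return map { $_->[2] } @{ $sth->fetchall_arrayref };
}

# Usage against a live database (requires DBI and DBD::Pg):
#   use DBI;
#   my $dbh = DBI->connect('dbi:Pg:dbname=bioseqdb', 'mcolosim', '',
#                          { RaiseError => 1 });
#   print schema_version(table_names($dbh)), "\n";

# Offline check with the two tables described in the question:
print schema_version(qw(ontology ontology_term)), "\n";
```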
> Finally, I was wondering if anyone has written a script to parse the gene
> association file type found at
> <http://www.geneontology.org/doc/GO.annotation.html#file>
> and for the files at <http://www.geneontology.org/#ontologies>?
>
As for the latter, the bioperl Bio::OntologyIO parser handles that format
(--format goflat). Read the POD documentation of load_ontology.pl; it
tells you how to load this. Be sure to look at the --fmtargs option,
which is how you supply the definitions file (the example shows it).
To load GO without hassle, you want the latest bioperl CVS update from
either the HEAD or the stable branch (tag branch-1-2) if you have
bioperl 1.2.1 installed. There were several bugs in 1.2.1 that I had to
fix. We'll release bioperl 1.2.2 pretty soon, which will contain the
fixes too.
As for the association file, no. It shouldn't be too hard, though, to
write a quick converter that outputs SQL INSERT statements into the
bioentry_qualifier_value table, which you then feed to your SQL shell
(psql, for instance):
INSERT INTO bioentry_qualifier_value (bioentry_id, term_id, rank, value)
SELECT e.bioentry_id, t.term_id, 1, <the evidence code or something may go here>
FROM bioentry e, term t, biodatabase db
WHERE e.biodatabase_id = db.biodatabase_id
  AND db.name = <the sequence namespace goes here>
  AND e.accession = <the parsed out accession goes here>
  AND t.identifier = <the GO id goes here>;
You don't necessarily need the namespace but you get the idea.
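A minimal sketch of such a converter in Perl. It assumes the tab-delimited gene association format, where lines starting with '!' are comments, column 2 holds the database object ID, column 5 the GO ID and column 7 the evidence code. It further assumes that the object ID is what ended up in bioentry.accession when you loaded your sequences, which you should check. Following the template above, it stores the evidence code as the value. The helper names sql_quote and assoc_to_sql are made up for illustration.

```perl
use strict;
use warnings;

# Quote a string as an SQL literal by doubling embedded single quotes.
sub sql_quote {
    my ($s) = @_;
    $s =~ s/'/''/g;
    return "'$s'";
}

# Turn one gene association line into an INSERT ... SELECT statement;
# returns undef for comments, blank lines and lines with too few columns.
# ASSUMPTIONS: tab-delimited columns; col 2 = object ID (used as the
# bioentry accession), col 5 = GO ID, col 7 = evidence code.
sub assoc_to_sql {
    my ($line, $namespace) = @_;
    chomp $line;
    return if $line =~ /^!/ || $line =~ /^\s*$/;
    my @col = split /\t/, $line;
    return if @col < 7;
    my ($acc, $goid, $evidence) = @col[1, 4, 6];
    return join "\n",
        "INSERT INTO bioentry_qualifier_value (bioentry_id, term_id, rank, value)",
        "SELECT e.bioentry_id, t.term_id, 1, " . sql_quote($evidence),
        "FROM bioentry e, term t, biodatabase db",
        "WHERE e.biodatabase_id = db.biodatabase_id",
        "  AND db.name = " . sql_quote($namespace),
        "  AND e.accession = " . sql_quote($acc),
        "  AND t.identifier = " . sql_quote($goid) . ";";
}

# Command-line use: first argument is the sequence namespace, the rest
# are association files (or STDIN if none follow). Does nothing otherwise.
if (@ARGV >= 1) {
    my $namespace = shift @ARGV;
    while (my $line = <>) {
        my $sql = assoc_to_sql($line, $namespace);
        print "$sql\n" if defined $sql;
    }
}
```

Saved under a hypothetical name such as assoc2sql.pl, you would run it as `perl assoc2sql.pl my_namespace your_association_file > inserts.sql` and feed inserts.sql to psql. Because rank is fixed at 1, a second association of the same GO term with the same entry (for example with different evidence) will likely collide with the table's unique key; numbering the rank per duplicate is one way around that.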
Alternatively, get the sequence objects and, for each associated term,
attach an ontology term:
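use Bio::DB::BioDB;
use Bio::Seq::RichSeq;
use Bio::Ontology::Term;

# connect to BioSQL (same settings as the load_ontology.pl run above)
my $db = Bio::DB::BioDB->new(-database => 'biosql',
                             -dbname   => 'bioseqdb',
                             -driver   => 'Pg',
                             -user     => 'mcolosim');
my $adp = $db->get_object_adaptor("SeqI");
my $ns = ...; # namespace for sequences
while(<>) {
    # <parse out accession and GO id>
    my $acc = ...;
    my $goid = ...;
    # find the sequence by its unique key (accession + namespace)
    my $seq = Bio::Seq::RichSeq->new(-accession_number => $acc,
                                     -namespace        => $ns);
    $seq = $adp->find_by_unique_key($seq);
    if(!$seq) { warn "couldn't find seq with accession $acc"; next; }
    # create the GO term; identifier and ontology identify the loaded term
    my $term = Bio::Ontology::Term->new(-identifier => $goid,
                                        -ontology   => 'Gene Ontology');
    # then attach it to the sequence
    $seq->annotation->add_Annotation($term);
    # and write the change back to the database
    $seq->store();
}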
Again, you get the idea. BTW, both examples will only work if you have
loaded GO beforehand.
Note also that evidence isn't really handled yet. You can try to kludge
it into BioSQL (e.g., as the value of a *_qualifier_value association),
but in bioperl you are out of luck.
-hilmar
> Thanks,
>
> marc
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------