[BioSQL-l] gene ontology questions (bug)

Hilmar Lapp hlapp at gnf.org
Mon Jun 2 13:41:02 EDT 2003


On Sunday, June 1, 2003, at 01:46  PM, Marc Colosimo wrote:

> I have several questions about the GO structure.
>
> First, the old ER diagram only has ontology_term.

That diagram is dated; there is no current ERD. I need to regenerate it, 
sorry about that. I will get to it today or tomorrow.

>  But the version I have,
> has  ontology and ontology_term.

Then you have mixed versions. There should be no ontology_term table, 
and indeed the ontology table is new. The name of ontology_term was 
changed to term. Do you run a recent CVS checkout? Did you try to 
instantiate over a previous schema?

> I want to add my own terms, but with so
> many constraints, I have no clue where to start. Here is a sample
> autogenerated list of what I want to add:
>
> Term ID Term Name       Frequency
> 3       reproduction    248
> 8       thioredoxin     39
>
> (This was made by dChip, with Affymetrix cvs files, for those 
> interested).

Right now there is no term_qualifier_value table. There is 
term_relationship, but it does not carry a value. However, I guess the 
frequency is the value to be associated with a chip target, that is, 
with either a bioentry or a seqfeature. For both there are 
*_qualifier_value tables. If you want better help, you need to be more 
specific about what you want to represent and for which purpose.

>
> Second, I tried to load in stuff (function.ontology) from 
> geneontology.org
> using the script load_ontology.pl.
>
> here is the error I get:
> perl load_ontology.pl --dbname bioseqdb --dbuser mcolosim --driver Pg
> ~/Affymetrix/function.ontology
> Parsing input ...
> Loading ontology Gene Ontology:
>         ... terms
> DBD::Pg::st execute failed: ERROR:  Relation "term" does not exist at

You need to upgrade to the latest schema version. Ontology_term was 
renamed to term in the Singapore-version (which is essentially the CVS 
head).

> Finally, I was wondering if anyone has written a script to parse the 
> gene
> association file type found at
> <http://www.geneontology.org/doc/GO.annotation.html#file>
> and  for the files at <http://www.geneontology.org/#ontologies>?
>

As for the latter, the bioperl OntologyIO parser parses that format 
(--format goflat). Read the POD documentation of load_ontology.pl; it 
tells you how to load this. Be sure to look at the --fmtargs option for 
how to supply the definitions file (the example there shows it).

To load GO without hassle, if you have bioperl 1.2.1 installed, you 
want to update to the latest bioperl from CVS, either the HEAD or the 
stable branch (tag branch-1-2). There were several bugs in 1.2.1 that I 
had to fix. We'll release bioperl 1.2.2 pretty soon, which will contain 
the fixes too.

As for the association file, no. Shouldn't be too hard though to write 
a quick converter that outputs SQL INSERT statements into the 
bioentry_qualifier_value table, which you then feed to your SQL shell 
(psql for instance):

	INSERT INTO bioentry_qualifier_value (bioentry_id, term_id, rank, value)
	SELECT e.bioentry_id, t.term_id, 1, <the evidence code or something may go here>
	FROM bioentry e, term t, biodatabase db
	WHERE e.biodatabase_id = db.biodatabase_id
	AND db.name = <the sequence namespace goes here>
	AND e.accession = <the parsed out accession goes here>
	AND t.identifier = <the GO id goes here>

You don't necessarily need the namespace but you get the idea.
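
A minimal sketch of such a converter might look like the following. It 
assumes the tab-delimited gene association layout with the accession in 
column 2, the GO id in column 5 and the evidence code in column 7, and 
that lines starting with '!' are comments; check this against the format 
description before relying on it. The function names and the convention 
of passing the sequence namespace as the first argument are made up for 
illustration. Putting the evidence code into the value column is a 
kludge, not an established convention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote a string as an SQL literal by doubling embedded single quotes.
sub sql_quote {
    my ($s) = @_;
    $s =~ s/'/''/g;
    return "'$s'";
}

# Turn one tab-delimited association line into an INSERT statement that
# follows the template above. Returns undef for comment lines (leading
# '!'), blank lines, and lines without accession, GO id or evidence code.
sub association_to_sql {
    my ($line, $namespace) = @_;
    $line =~ s/[\r\n]+\z//;
    return if $line =~ /^!/ || $line !~ /\S/;
    # Assumed columns: 2 = accession, 5 = GO id, 7 = evidence code.
    my @col = split /\t/, $line;
    my ($acc, $goid, $evidence) = @col[1, 4, 6];
    return unless defined($evidence) && length($acc) && length($goid);
    return join("\n",
        "INSERT INTO bioentry_qualifier_value (bioentry_id, term_id, rank, value)",
        "SELECT e.bioentry_id, t.term_id, 1, " . sql_quote($evidence),
        "FROM bioentry e, term t, biodatabase db",
        "WHERE e.biodatabase_id = db.biodatabase_id",
        "AND db.name = " . sql_quote($namespace),
        "AND e.accession = " . sql_quote($acc),
        "AND t.identifier = " . sql_quote($goid) . ";",
    ) . "\n";
}

# Convert only when arguments are given: first the sequence namespace,
# then the association file(s); with no file named, STDIN is read.
if (@ARGV) {
    my $namespace = shift @ARGV;
    while (my $line = <>) {
        my $sql = association_to_sql($line, $namespace);
        print $sql if defined $sql;
    }
}
```

Saved under a name of your choice, you would run it with the namespace 
and the association file as arguments and pipe the output into psql. If 
an accession or GO id is not in the database, the SELECT returns no rows 
and nothing is inserted for that line.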

Alternatively, get the sequence objects, and for each associated term 
attach an ontology term:

	use Bio::Seq::RichSeq;
	use Bio::Ontology::Term;

	# $db is assumed to be an already opened Bio::DB::BioDB adaptor factory
	my $adp = $db->get_object_adaptor("SeqI");
	my $ns = ...; # namespace for sequences
	while(<>) {
	   # <parse out accession and GO id>
	   my $acc = ...;
	   my $goid = ...;
	   # build a template object and look up the stored sequence by its key
	   my $seq = Bio::Seq::RichSeq->new(-accession_number => $acc,
	                                    -namespace => $ns);
	   $seq = $adp->find_by_unique_key($seq);
	   if(!$seq) { warn "couldn't find seq with accession $acc"; next; }
	   # create GO term
	   my $term = Bio::Ontology::Term->new(-identifier => $goid,
	                                       -ontology => 'Gene Ontology');
	   # then attach it to sequence
	   $seq->annotation->add_Annotation($term);
	   # and write the new association back to the database
	   $seq->store();
	}

Again, you get the idea. BTW both examples will only work if you have 
loaded GO beforehand.

Note also that evidence isn't really handled yet. You can try to 
kludge it into biosql (for instance as the value of a *_qualifier_value 
association), but in bioperl you are out of luck.


	-hilmar

> Thanks,
>
> marc
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


