[Bioperl-l] Entrez Gene and bioperl-db
Hilmar Lapp
hlapp at gmx.net
Wed Dec 15 03:24:35 EST 2004
On Tuesday, December 14, 2004, at 08:41 AM, Sean Davis wrote:
> If you look in the CDS section of some of the refseq entries
> (http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=56550106
> as an example), you will see the gene ontology information there. I
> honestly don't know how this is handled by bioperl-db...
Well, if you don't do anything about it, it will sit there in
seqfeature_qualifier_value rows, where it is relatively useless (but,
hey, it arrives in the feature table in semi-mangled form, and hence
comes in a more or less useless format already ...).
So what I did is write a custom SequenceProcessorI (by deriving from
Bio::Seq::BaseSeqProcessor) that for every sequence parses this out of
the annotation (tags) of the CDS feature, creates Bio::Ontology::Term
instances with name and identifier set, and re-attaches the term
objects to the sequence object's annotation using
Bio::Annotation::OntologyTerm as the adaptor (so that the term is-a
Bio::AnnotationI). When the result of this gets serialized to the
database through bioperl-db, you get rows in the term table for the
terms, and the association with the sequence (bioentry) will be in
bioentry_qualifier_value. You hook your SequenceProcessorI into the
system by using the --pipeline argument to load_seqdatabase.pl.
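
A minimal sketch of such a processor might look like the following. It
is an illustration under assumptions, not the actual code: the package
name GOTermProcessor, the 'note' tag, the regular expression for the GO
text, and the 'go_term' annotation key are all hypothetical, so check
them against what your RefSeq entries really contain.

```perl
package GOTermProcessor;    # hypothetical name, pick your own

use strict;
use warnings;
use base qw(Bio::Seq::BaseSeqProcessor);

use Bio::Ontology::Term;
use Bio::Annotation::OntologyTerm;

# Called once per sequence in the pipeline; must return the list of
# sequence objects that result from the processing.
sub process_seq {
    my ($self, $seq) = @_;
    my $ac = $seq->annotation;

    foreach my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';

        # ASSUMPTION: the GO text sits in the /note qualifier and looks
        # like "go_function: DNA binding [goid 0003677] [evidence TAS]".
        next unless $feat->has_tag('note');

        foreach my $note ($feat->get_tag_values('note')) {
            while ($note =~ /go_(?:component|function|process):\s*
                             ([^\[;]+?)\s*\[goid\s+(?:GO:)?(\d+)\]/gx) {
                my ($name, $id) = ($1, $2);

                # Term with name and identifier set
                my $term = Bio::Ontology::Term->new(
                    -name       => $name,
                    -identifier => "GO:$id",
                );

                # Wrap the term so that it is-a Bio::AnnotationI and
                # attach it to the sequence's annotation collection.
                my $ann = Bio::Annotation::OntologyTerm->new(-term => $term);
                $ac->add_Annotation('go_term', $ann);   # key is arbitrary
            }
        }
    }
    return ($seq);
}

1;
```

With the module somewhere in @INC, the call would then be something
like load_seqdatabase.pl --format genbank --pipeline "GOTermProcessor"
followed by your usual database connection options and input files.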
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------