[Bioperl-l] Entrez Gene and bioperl-db

Hilmar Lapp hlapp at gmx.net
Wed Dec 15 03:24:35 EST 2004

On Tuesday, December 14, 2004, at 08:41  AM, Sean Davis wrote:

> If you look in the CDS section of some of the refseq entries
> (http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=56550106
> as an example), you will see the gene ontology information there.  I
> honestly don't know how this is handled by bioperl-db...

Well, if you don't do anything about it, it will sit there in
seqfeature_qualifier_value rows, where it is relatively useless (but,
hey, it's in the feature table in semi-mangled form, and hence arrives
in a more or less useless format already ...).

So what I did was write a custom SequenceProcessorI (by deriving from
Bio::Seq::BaseSeqProcessor) that, for every sequence, parses this out
of the annotation (tags) of the CDS feature, creates
Bio::Ontology::Term instances with name and identifier set, and
re-attaches the term objects to the sequence object's annotation, using
Bio::Annotation::OntologyTerm as the adaptor (so that the term is-a
Bio::AnnotationI).

When the result of this gets serialized to the database through
bioperl-db, you get rows in the term table for the terms, and the
association with the sequence (bioentry) will be in
bioentry_qualifier_value. You hook your SequenceProcessorI into the
system by using the --pipeline argument to load_seqdatabase.pl.
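
To make the parsing step concrete, here is a minimal sketch of pulling
term names and identifiers out of a CDS qualifier value. It is written
in Python only to keep it self-contained; it is not the BioPerl
processor itself. The qualifier layout shown ("go_function: name [goid
NNNNNNN] [evidence XXX]", entries separated by semicolons) is an
assumption about how the RefSeq entries looked, and GoTerm and
parse_go_terms are hypothetical names made up for illustration.

```python
import re
from typing import List, NamedTuple


class GoTerm(NamedTuple):
    """Hypothetical container for what ends up in an ontology term."""
    category: str    # e.g. "go_function"
    name: str        # e.g. "protein binding"
    identifier: str  # e.g. "GO:0005515"


# ASSUMED entry layout (not confirmed by the post):
#   go_function: protein binding [goid 0005515] [evidence IPI]
# The name is matched lazily up to the first "[goid ...]" bracket, so
# this sketch assumes every entry carries a goid.
_GO_ENTRY = re.compile(
    r"(go_(?:component|function|process)):\s*(.+?)\s*\[goid\s+(?:GO:)?(\d+)\]"
)


def parse_go_terms(qualifier_value: str) -> List[GoTerm]:
    """Return one GoTerm per GO entry found in a CDS qualifier value."""
    return [
        # Normalize the identifier to the usual 7-digit "GO:" form.
        GoTerm(category, name, "GO:" + digits.zfill(7))
        for category, name, digits in _GO_ENTRY.findall(qualifier_value)
    ]


if __name__ == "__main__":
    example = (
        "go_component: nucleus [goid 0005634] [evidence IEA]; "
        "go_function: protein binding [goid 0005515] [evidence IPI]"
    )
    for term in parse_go_terms(example):
        print(term.identifier, term.category, term.name)
```

In the processor described above, each name/identifier pair found this
way would be used to set up a Bio::Ontology::Term, which is then
wrapped in a Bio::Annotation::OntologyTerm and added to the sequence's
annotation.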

Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
