[Biopython-dev] A modification to BioSQL

Brian Osborne bosborne11 at verizon.net
Mon Jun 22 21:22:21 UTC 2015


Peter,

Interesting exchange, was not aware of it.

Yes, in BioPerl there’s a source_tag() method.

Thanks again,

Brian O.

> On Jun 22, 2015, at 4:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> 
> Hi Brian,
> 
> Are you familiar with the logic BioPerl uses to set this field?
> See also https://github.com/biopython/biopython/pull/366
> 
> Peter
> 
> On Mon, Jun 22, 2015 at 9:11 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>> All,
>> 
>> I’ve been using the BioSQL schema with BioPerl and would like to start doing
>> the same with Biopython, but there’s a limitation I’d like to fix. Here’s
>> the relevant table in the BioSQL schema, seqfeature:
>> 
>>      Column     |         Type          |                        Modifiers                        | Storage  | Stats target | Description
>> ----------------+-----------------------+---------------------------------------------------------+----------+--------------+-------------
>>  seqfeature_id  | integer               | not null default nextval('seqfeature_pk_seq'::regclass) | plain    |              |
>>  bioentry_id    | integer               | not null                                                | plain    |              |
>>  type_term_id   | integer               | not null                                                | plain    |              |
>>  source_term_id | integer               | not null                                                | plain    |              |
>>  display_name   | character varying(64) |                                                         | extended |              |
>>  rank           | integer               | not null default 0                                      | plain    |              |
>> 
>> Note the required field, source_term_id. In the work I’ve been doing with
>> BioPerl I’ve been setting this “source term” to different values (e.g.
>> “NCBI”) depending on where the tag/value data in the feature comes from.
>> 
>> But here’s the code that makes a persistent feature, from BioSQL/Loader.py:
>> 
>>    def _load_seqfeature_basic(self, feature_type, feature_rank, bioentry_id):
>>        """Load the first tables of a seqfeature and returns the id (PRIVATE).
>>
>>        This loads the "key" of the seqfeature (ie. CDS, gene) and
>>        the basic seqfeature table itself.
>>        """
>>        ontology_id = self._get_ontology_id('SeqFeature Keys')
>>        seqfeature_key_id = self._get_term_id(feature_type,
>>                                              ontology_id=ontology_id)
>>        # XXX source is always EMBL/GenBank/SwissProt here; it should depend on
>>        # the record (how?)
>>        source_cat_id = self._get_ontology_id('SeqFeature Sources')
>>        source_term_id = self._get_term_id('EMBL/GenBank/SwissProt',
>>                                           ontology_id=source_cat_id)
>>
>>        sql = r"INSERT INTO seqfeature (bioentry_id, type_term_id, " \
>>              r"source_term_id, rank) VALUES (%s, %s, %s, %s)"
>>        self.adaptor.execute(sql, (bioentry_id, seqfeature_key_id,
>>                                   source_term_id, feature_rank + 1))
>>        seqfeature_id = self.adaptor.last_id('seqfeature')
>>
>>        return seqfeature_id
>> 
>> This code always sets the source term to “EMBL/GenBank/SwissProt”, and it
>> cannot be set to anything else. A better idea is to have a method to set
>> and get this, e.g. source(), just as you can set the “type” of the feature.
>> The way to do this is to subclass SeqFeature to make DBSeqFeature, just as
>> Seq is subclassed to make DBSeq and SeqRecord is subclassed to make
>> DBSeqRecord in BioSQL/Seq.py.
>> 
>> So I propose to fork, code, and send a pull request for this. What do you
>> think?
>> 
>> Thanks again,
>> 
>> Brian O.
>> 
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
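
For illustration, the idea in the original proposal (a feature that carries
its own source, with the loader looking that source up instead of
hard-coding it) might look roughly like the sketch below. This is not the
Biopython code and not what any pull request implements: SourcedFeature,
make_schema, get_ontology_id, get_term_id and load_seqfeature_basic are
hypothetical names, the tables are a heavily simplified stand-in for the
BioSQL schema, and sqlite3 is used only so the example runs on its own.

```python
import sqlite3

# The value BioSQL/Loader.py hard-codes in the snippet quoted above.
DEFAULT_SOURCE = "EMBL/GenBank/SwissProt"


class SourcedFeature:
    """Hypothetical stand-in for a SeqFeature subclass that carries a source."""

    def __init__(self, feature_type, source=DEFAULT_SOURCE):
        self.type = feature_type
        self.source = source


def make_schema(conn):
    # Simplified versions of three BioSQL tables, for illustration only.
    conn.executescript(
        """
        CREATE TABLE ontology (ontology_id INTEGER PRIMARY KEY,
                               name TEXT UNIQUE);
        CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT,
                           ontology_id INTEGER, UNIQUE (name, ontology_id));
        CREATE TABLE seqfeature (seqfeature_id INTEGER PRIMARY KEY,
                                 bioentry_id INTEGER NOT NULL,
                                 type_term_id INTEGER NOT NULL,
                                 source_term_id INTEGER NOT NULL,
                                 rank INTEGER NOT NULL DEFAULT 0);
        """
    )


def get_ontology_id(conn, name):
    # Return the id of the named ontology, creating the row if needed.
    row = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO ontology (name) VALUES (?)", (name,)
    ).lastrowid


def get_term_id(conn, name, ontology_id):
    # Return the id of the named term in an ontology, creating it if needed.
    row = conn.execute(
        "SELECT term_id FROM term WHERE name = ? AND ontology_id = ?",
        (name, ontology_id),
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO term (name, ontology_id) VALUES (?, ?)",
        (name, ontology_id),
    ).lastrowid


def load_seqfeature_basic(conn, feature, feature_rank, bioentry_id):
    # Same shape as the quoted loader, but the source term comes from the
    # feature itself rather than from a fixed string.
    type_id = get_term_id(
        conn, feature.type, get_ontology_id(conn, "SeqFeature Keys")
    )
    source_id = get_term_id(
        conn, feature.source, get_ontology_id(conn, "SeqFeature Sources")
    )
    cur = conn.execute(
        "INSERT INTO seqfeature (bioentry_id, type_term_id, "
        "source_term_id, rank) VALUES (?, ?, ?, ?)",
        (bioentry_id, type_id, source_id, feature_rank + 1),
    )
    return cur.lastrowid
```

Defaulting source to the current hard-coded string would keep existing
behaviour for features that never set it, while a caller could pass
source="NCBI" (as in the BioPerl usage described above) to record where the
feature data came from.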



