[BioSQL-l] Question regarding BioPerl / BioSQL - InterPro Optional IDs

Sat Jul 4 13:18:32 UTC 2009

The problem here is that Bioperl-db (the persistence mapper between  
BioSQL and BioPerl) loses the optional_id property of  
Bio::Annotation::DBLink objects.

Moreover, the dbxref table in BioSQL doesn't provide a way to store
two identifiers (or accessions) for one db_xref. Storing this bit of
information is therefore not as straightforward as one might wish,
because it would need to go into the dbxref_qualifier_value table. I
would not be surprised if the other Bio* projects with a mapping to
BioSQL don't store or retrieve this either (though it'd be good to
hear if anyone does).

Here are a couple of ideas for how this issue might be addressed.

- Write a Bio::Seq::BaseSeqProcessor-derived object that for every  
incoming sequence massages all InterPro links to either substitute the
primary_id with the optional_id, or to add a second DBLink annotation  
with the optional_id of the original one as its primary_id. (pros:  
relatively easy, entirely under your control; cons: you either lose  
the primary_id now, or have two dbxref annotations for each of the  
original ones.)

- Add a column to the dbxref table, and code to Bioperl-db that stores
and de/serializes the extra ID. (pros: not losing or duplicating any
data; cons: change is significant in terms of schema stability,  
requires new release, depends on implementation in Bioperl-db,  
necessitates update of all other Bio* language bindings)

- De/serialize the optional_id as an entry in the
dbxref_qualifier_value table. (pros: technically it's the Right Way as
that's what the table was intended for; cons: implementing in
Bioperl-db is more involved as we now need to transform an object
property to a child object and back)
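
To make the third idea a bit more concrete, here is a rough sketch of
the round trip, written in Python against an in-memory SQLite database
only so that it is self-contained. It is not Bioperl-db code. The
tables are a simplified subset of the BioSQL schema (the real term
table, for instance, also requires an ontology_id), and the term name
"optional_id", the helper names, and the example accession are all
assumptions made up for illustration.

```python
import sqlite3

# Simplified subset of the BioSQL tables. The real schema has more
# columns and constraints (e.g. term.ontology_id), omitted here.
SCHEMA = """
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    dbname    TEXT NOT NULL,
    accession TEXT NOT NULL,
    version   INTEGER NOT NULL DEFAULT 0,
    UNIQUE (accession, dbname, version)
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE dbxref_qualifier_value (
    dbxref_id INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    term_id   INTEGER NOT NULL REFERENCES term (term_id),
    "rank"    INTEGER NOT NULL DEFAULT 0,
    value     TEXT,
    PRIMARY KEY (dbxref_id, term_id, "rank")
);
"""

# Hypothetical qualifier term name; not an established BioSQL convention.
OPTIONAL_ID_TERM = "optional_id"


def open_demo_db():
    """Create an in-memory database holding the simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def _get_or_create(conn, select_sql, insert_sql, params):
    """Return the key of a matching row, inserting the row if it is missing."""
    row = conn.execute(select_sql, params).fetchone()
    if row:
        return row[0]
    return conn.execute(insert_sql, params).lastrowid


def store_dblink(conn, dbname, accession, optional_id=None):
    """Store a db_xref; a given optional_id becomes a qualifier value row."""
    dbxref_id = _get_or_create(
        conn,
        "SELECT dbxref_id FROM dbxref WHERE dbname = ? AND accession = ?",
        "INSERT INTO dbxref (dbname, accession) VALUES (?, ?)",
        (dbname, accession),
    )
    if optional_id is not None:
        term_id = _get_or_create(
            conn,
            "SELECT term_id FROM term WHERE name = ?",
            "INSERT INTO term (name) VALUES (?)",
            (OPTIONAL_ID_TERM,),
        )
        conn.execute(
            'INSERT OR REPLACE INTO dbxref_qualifier_value '
            '(dbxref_id, term_id, "rank", value) VALUES (?, ?, 0, ?)',
            (dbxref_id, term_id, optional_id),
        )
    return dbxref_id


def load_dblink(conn, dbxref_id):
    """Rebuild the link, turning the qualifier row back into a property."""
    row = conn.execute(
        """
        SELECT x.dbname, x.accession,
               (SELECT q.value
                  FROM dbxref_qualifier_value q
                  JOIN term t ON t.term_id = q.term_id
                 WHERE q.dbxref_id = x.dbxref_id AND t.name = ?
                 ORDER BY q."rank" LIMIT 1)
          FROM dbxref x
         WHERE x.dbxref_id = ?
        """,
        (OPTIONAL_ID_TERM, dbxref_id),
    ).fetchone()
    if row is None:
        return None
    return {"database": row[0], "primary_id": row[1], "optional_id": row[2]}


if __name__ == "__main__":
    conn = open_demo_db()
    xref_id = store_dblink(conn, "InterPro", "IPR000001", "Kringle")
    print(load_dblink(conn, xref_id))
```

The "more involved" part mentioned above is visible even in this toy
version: the property has to be turned into a term plus a child row on
the way in, and folded back into a single property on the way out.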

So I'd say this is a bug in Bioperl-db in that the  
dbxref_qualifier_value table isn't utilized here. Would you mind  
filing it? In the meantime, if you just need something that works, you  
could try the first of the above ideas.
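
For that first idea, the per-link logic might look roughly like the
sketch below. It is plain Python so that it runs on its own; the
DBLink class here is a hypothetical stand-in for
Bio::Annotation::DBLink, and massage_interpro_links is a made-up
name. In BioPerl the same logic would go into your
Bio::Seq::BaseSeqProcessor-derived class and be applied to each
incoming sequence's DBLink annotations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DBLink:
    """Hypothetical stand-in for a Bio::Annotation::DBLink object."""
    database: str
    primary_id: str
    optional_id: Optional[str] = None


def massage_interpro_links(links: Iterable[DBLink],
                           substitute: bool = False) -> List[DBLink]:
    """Promote the optional_id of InterPro links to a primary_id.

    substitute=True  replaces each InterPro link (the original
                     primary_id is lost).
    substitute=False keeps the original link and adds a second one
                     (two db_xrefs per original link).
    Links from other databases, or without an optional_id, pass
    through unchanged.
    """
    result: List[DBLink] = []
    for link in links:
        is_interpro = link.database.lower() == "interpro"
        if not (is_interpro and link.optional_id):
            result.append(link)
            continue
        promoted = DBLink(link.database, link.optional_id)
        if substitute:
            result.append(promoted)
        else:
            result.extend([link, promoted])
    return result


if __name__ == "__main__":
    # Example values for illustration only.
    links = [
        DBLink("InterPro", "IPR000001", "Kringle"),
        DBLink("Pfam", "PF00051"),
    ]
    for link in massage_interpro_links(links):
        print(link)
```

Either mode ends up with only primary IDs that need storing, which
Bioperl-db already persists, at the cost described in the pros and
cons for that idea.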

	-hilmar

On Jul 3, 2009, at 7:17 PM, John LaCava wrote:

> Hi all,
>
> Tried this on the BioPerl-l but seemed to make sense to try here as  
> well.
>
> I am trying to use the BioPerl-db script:
>
> "load_seqdatabase.pl" to parse a SwissProt ".dat" file (Yeast.dat,  
> this is the yeast proteome with annotations etc.).
>
> The particular entry I am interested in is the InterPro optional ID,
> which is the domain name.
>
> I have put a short stub up which displays the 4 pieces of info I  
> want to parse into my database.
> That can be found here:
>
> http://github.com/johnraekwon/BioPerl---BioSQL---InterPro-Optional-IDs/tree/master
>
> You can see that near the bottom, we get the optional ID:
> $protein_ids->{interpro_domain} = $dblink->{optional_id};
>
> I do not think the bioperl script load_seqdatabase.pl retrieves this  
> information.  At least, I cannot find it in the db built from  
> parsing a test .dat file.
> I would like some help figuring out:
> 1) WHY doesn't it retrieve this information, since it seems to be  
> parsing "all" annotations...
> 2) HOW might I edit the script to include this particular annotation  
> of interest in the info it passes to my db (biosql)
>
> I am a bit out of my depth on this, and so, any help is appreciated.
>
> Cheers,
> John

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================