[BioSQL-l] Question regarding BioPerl / BioSQL - InterPro Optional IDs
Hilmar Lapp
hlapp at gmx.net
Sat Jul 4 13:18:32 UTC 2009
The problem here is that Bioperl-db (the persistence mapper between
BioSQL and BioPerl) loses the optional_id property of
Bio::Annotation::DBLink objects.
Moreover, the dbxref table in BioSQL has no place to store two
identifiers (or accessions) for one db_xref, so storing this bit of
information is not as straightforward as one might wish: it would need
to go into the dbxref_qualifier_value table. I would not be surprised
if the other Bio* projects with a mapping to BioSQL don't store or
retrieve this either (though it'd be good to hear if anyone does).
Here are a couple of ideas for how this issue might be addressed.
- Write a Bio::Seq::BaseSeqProcessor-derived object that for every
incoming sequence massages all InterPro links to either substitute the
primary_id with the optional_id, or to add a second DBLink annotation
with the optional_id of the original one as its primary_id. (pros:
relatively easy, entirely under your control; cons: you either lose
the primary_id now, or have two dbxref annotations for each of the
original ones.)
- Add a column to the dbxref table, and code to Bioperl-db, to store
and de/serialize the extra ID. (pros: not losing or duplicating any
data; cons: change is significant in terms of schema stability,
requires a new release, depends on implementation in Bioperl-db,
necessitates an update of all other Bio* language bindings)
- De/serialize the optional_id as an entry in the
dbxref_qualifier_value table. (pros: technically it's the Right Way as
that's what the table was intended for; cons: implementing in Bioperl-
db is more involved as we now need to transform an object property to
a child object and back)
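To make the third idea a bit more concrete, here is a minimal sketch of
what round-tripping the optional ID through dbxref_qualifier_value could
look like. It is Python against an in-memory SQLite database, not
Bioperl-db code. The schema is a simplified subset of the BioSQL tables,
and the term name "optional_id", the helper functions
store_optional_id/load_optional_id, and the example values are
assumptions made for illustration, not existing Bioperl-db or BioSQL
conventions.

```python
import sqlite3

# Simplified subset of the BioSQL tables involved. The real schema has
# more columns and constraints (for example, term belongs to an ontology).
SCHEMA = """
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    dbname    TEXT NOT NULL,
    accession TEXT NOT NULL,
    version   INTEGER NOT NULL DEFAULT 0,
    UNIQUE (accession, dbname, version)
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE dbxref_qualifier_value (
    dbxref_id INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    term_id   INTEGER NOT NULL REFERENCES term (term_id),
    "rank"    INTEGER NOT NULL DEFAULT 0,
    value     TEXT,
    PRIMARY KEY (dbxref_id, term_id, "rank")
);
"""


def _lookup_id(conn, sql, params):
    """Return the first column of the first matching row."""
    return conn.execute(sql, params).fetchone()[0]


def store_optional_id(conn, dbname, accession, optional_id,
                      term_name="optional_id"):
    """Serialize an object property as a qualifier row (hypothetical helper)."""
    conn.execute(
        "INSERT OR IGNORE INTO dbxref (dbname, accession) VALUES (?, ?)",
        (dbname, accession),
    )
    dbxref_id = _lookup_id(
        conn,
        "SELECT dbxref_id FROM dbxref "
        "WHERE dbname = ? AND accession = ? AND version = 0",
        (dbname, accession),
    )
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (term_name,))
    term_id = _lookup_id(
        conn, "SELECT term_id FROM term WHERE name = ?", (term_name,)
    )
    conn.execute(
        'INSERT OR REPLACE INTO dbxref_qualifier_value '
        '(dbxref_id, term_id, "rank", value) VALUES (?, ?, 0, ?)',
        (dbxref_id, term_id, optional_id),
    )
    conn.commit()


def load_optional_id(conn, dbname, accession, term_name="optional_id"):
    """Deserialize the qualifier row back into a plain value, or None."""
    row = conn.execute(
        "SELECT q.value FROM dbxref_qualifier_value q "
        "JOIN dbxref d ON d.dbxref_id = q.dbxref_id "
        "JOIN term t ON t.term_id = q.term_id "
        "WHERE d.dbname = ? AND d.accession = ? AND t.name = ?",
        (dbname, accession, term_name),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative values only.
    store_optional_id(conn, "InterPro", "IPR000719", "Prot_kinase")
    print(load_optional_id(conn, "InterPro", "IPR000719"))
```

The extra work mentioned in the cons shows up here as the term lookup
and the join: the property is no longer a column on the dbxref row but a
child row that has to be created on store and folded back in on
retrieval.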
So I'd say this is a bug in Bioperl-db in that the
dbxref_qualifier_value table isn't utilized here. Would you mind
filing it? In the meantime, if you just need something that works, you
could try the first of the above ideas.
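
The logic of that first idea, sketched in Python purely as a
language-neutral model: a real implementation would be a Perl class
derived from Bio::Seq::BaseSeqProcessor that applies the same
transformation to each sequence's DBLink annotations. The DBLink
dataclass below is a hypothetical stand-in for Bio::Annotation::DBLink,
and massage_interpro_links with its two mode names is made up for this
example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DBLink:
    """Hypothetical stand-in for Bio::Annotation::DBLink."""
    database: str
    primary_id: str
    optional_id: Optional[str] = None


def massage_interpro_links(links: List[DBLink], mode: str = "add") -> List[DBLink]:
    """Make the optional_id of InterPro links survive as a primary_id.

    mode="substitute": replace each InterPro link with one whose
        primary_id is the original optional_id (the original ID is lost).
    mode="add": keep the original link and append a second one whose
        primary_id is the original optional_id (two links per original).
    """
    if mode not in ("substitute", "add"):
        raise ValueError(f"unknown mode: {mode!r}")

    result = []
    for link in links:
        is_interpro = link.database.lower() == "interpro" and bool(link.optional_id)
        if not is_interpro:
            result.append(link)  # leave all other links untouched
            continue
        promoted = DBLink(database=link.database, primary_id=link.optional_id)
        if mode == "add":
            result.append(link)
        result.append(promoted)
    return result


if __name__ == "__main__":
    # Illustrative values only.
    links = [
        DBLink("InterPro", "IPR000719", "Prot_kinase"),
        DBLink("EMBL", "X12345"),
    ]
    for link in massage_interpro_links(links, mode="add"):
        print(link.database, link.primary_id)
```

The two modes correspond to the two cons listed for this idea:
"substitute" loses the original primary_id, and "add" leaves two dbxref
annotations for each original InterPro link.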
-hilmar
On Jul 3, 2009, at 7:17 PM, John LaCava wrote:
> Hi all,
>
> Tried this on the BioPerl-l but seemed to make sense to try here as
> well.
>
> I am trying to use the BioPerl-db script:
>
> "load_seqdatabase.pl" to parse a SwissProt ".dat" file (Yeast.dat,
> this is the yeast proteome with annotations etc.).
>
> The particular entry I am interested in is the InterPro optional ID,
> which is the domain name.
>
> I have put a short stub up which displays the 4 pieces of info I
> want to parse into my data base.
> That can be found here:
>
> http://github.com/johnraekwon/BioPerl---BioSQL---InterPro-Optional-IDs/tree/master
>
> You can see that near the bottom, we get the optional ID:
> $protein_ids->{interpro_domain} = $dblink->{optional_id};
>
> I do not think the bioperl script load_seqdatabase.pl retrieves this
> information. At least, I cannot find it in the db built from
> parsing a test .dat file.
> I would like some help figuring out:
> 1) WHY doesn't it retrieve this information, since it seems to be
> parsing "all" annotations...
> 2) HOW might I edit the script to include this particular annotation
> of interest in the info it passes to my db (biosql)
>
> I am a bit out of my depth on this, and so, any help is appreciated.
>
> Cheers,
> John
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================