[Bioperl-l] added -type for Bio::Annotation::DBLink

Dave Messina David.Messina at sbc.su.se
Thu Jul 8 18:27:13 UTC 2010


Hi everybody,

While working on representing sequence metadata*, I've found it useful to track
the type of information a database cross-reference points to. I'm adding an
optional -type property to Bio::Annotation::DBLink to support this cleanly in
BioPerl.

My rationale follows. I'm not tied to doing this in B:A::DBLink if there's a
better way; it just seems the best route to me at the moment.

--

So, what do I mean by tracking the type of information in a DB
cross-reference?

Right now, a standard DBLink contains:

database => RefSeq
ID => NM_12345

along with a few other optional properties. See the docs for details:
http://doc.bioperl.org/bioperl-live/Bio/Annotation/DBLink.html


I want to be able to say

database => RefSeq
ID => NM_12345
type => RNA

Why?

Two reasons:

1. A single database can store more than one type of information.

RefSeq, for example, stores both RNA and protein records. RefSeq's accessions
do encode their type (NM_xxx for transcripts, NP_xxx for proteins), but not
every database follows such a convention, and not everybody knows the ID
prefixes.


For example, here are three database-ID pairs:

Genbank: AK291692.1
Genbank: CH471055.1
Genbank: AAH14616.1

Those are three different record types (mRNA, genomic DNA, protein) from the
same database.


2. A record can have multiple cross-references covering multiple types of
information.

Several source databases may provide the same type of cross-reference, and a
single record may carry several different types of cross-reference.

Take this example:

Genbank: AAA81779.1
EMBL: AK291692
Ensembl: ENST00000308775
HPA: CAB001960

Two of these are mRNA records, one is a protein record, and one is something
else entirely. If I wanted to take one mRNA xref and one protein xref from
this set, I couldn't do it using only the information above.

If I had type information, though, it'd be easy.
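A minimal sketch of that selection in plain Perl, with each cross-reference modeled as a hash carrying an optional type field. The type labels are an assumption for illustration (my reading of the usual accession conventions: AK for cDNA, ENST for Ensembl transcripts, AAA for GenBank proteins), and first_of_each_type is a hypothetical helper, not part of BioPerl.

```perl
use strict;
use warnings;

# Cross-references from the example above, as plain hashes.
# ASSUMPTION: the 'type' values are illustrative labels inferred from
# accession conventions; they are not stored in the original xrefs.
my @xrefs = (
    { database => 'Genbank', primary_id => 'AAA81779.1',      type => 'protein' },
    { database => 'EMBL',    primary_id => 'AK291692',        type => 'mRNA'    },
    { database => 'Ensembl', primary_id => 'ENST00000308775', type => 'mRNA'    },
    { database => 'HPA',     primary_id => 'CAB001960',       type => 'other'   },
);

# Hypothetical helper: return the first xref seen for each requested type,
# in the order the types were requested (undef if a type is absent).
sub first_of_each_type {
    my ($xrefs, @wanted) = @_;
    my %picked;
    for my $x (@$xrefs) {
        my $t = $x->{type};
        next unless defined $t;      # untyped xrefs cannot be selected
        $picked{$t} //= $x;          # keep only the first of each type
    }
    return map { $picked{$_} } @wanted;
}

my ($rna, $protein) = first_of_each_type(\@xrefs, 'mRNA', 'protein');
printf "%s: %s\n", $_->{database}, $_->{primary_id} for $rna, $protein;
```

Without the type field, the same helper has nothing to key on, which is exactly the problem described above.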

And since -type is an optional parameter, it is fully backwards-compatible.
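To illustrate the backwards-compatibility point, here is a minimal sketch of a DBLink-like class taking BioPerl-style named arguments. My::DBLink is a hypothetical stand-in, not the actual Bio::Annotation::DBLink code; it only shows that an optional -type leaves existing constructor calls working unchanged.

```perl
use strict;
use warnings;

package My::DBLink;    # hypothetical stand-in, NOT Bio::Annotation::DBLink

# Named arguments in the BioPerl style. -type is optional: calls that
# never pass it still work and simply get an undefined type.
sub new {
    my ($class, %args) = @_;
    my $self = {
        database   => $args{-database},
        primary_id => $args{-primary_id},
        type       => $args{-type},    # undef when not supplied
    };
    return bless $self, $class;
}

# Combined get/set accessors.
sub database   { my $s = shift; $s->{database}   = shift if @_; $s->{database} }
sub primary_id { my $s = shift; $s->{primary_id} = shift if @_; $s->{primary_id} }
sub type       { my $s = shift; $s->{type}       = shift if @_; $s->{type} }

package main;

# An existing-style call (no -type) and a new-style call (with -type).
my $old = My::DBLink->new(-database => 'RefSeq', -primary_id => 'NM_12345');
my $new = My::DBLink->new(-database => 'RefSeq', -primary_id => 'NM_12345',
                          -type     => 'RNA');

print defined $old->type ? $old->type : '(no type)', "\n";   # (no type)
print $new->type, "\n";                                      # RNA
```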

Any thoughts or comments?


Dave


* specifically, in SeqXML. See http://seqxml.org and
http://doc.bioperl.org/bioperl-live/Bio/SeqIO/seqxml.html
