[Bioperl-l] biosql and relationships

Wed Aug 4 05:19:17 EDT 2004

On Wednesday, August 4, 2004, at 03:40  AM, Robson Francisco de Souza 
{S} wrote:

> So, is it reasonable to use bioentry_relationship to store different
> identifiers for the same bioentries, but from different or even the
> same database? I'm asking that because for many genes you will find
> several names/identifiers like GI numbers, gene names or locus tags
> and accession numbers from different databases like KEGG, SWISSPROT,
> etc.

You can represent that using bioentry_relationship, but I wouldn't do 
so when the split into separate entries is artificial.

I.e., I wouldn't dissect a GenBank entry into GI#, locus tag, gene name, 
etc., and make them all different bioentries that you would then need to 
link back together.

If, however, you want to represent that two entries from different 
databases are, for instance, synonymous to each other, then loading 
both followed by establishing an entry in the bioentry_relationship 
table is how I would do that, using SQL or some hand-crafted script.
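
A minimal sketch of what such a hand-crafted script might look like, 
written in Python against an in-memory SQLite database rather than a 
real BioSQL instance. The table and column names follow BioSQL but are 
cut down to the columns needed here. The accessions, the 'synonym' term 
name, and the helpers entry_id and link_entries are made up for 
illustration.

```python
import sqlite3

# Simplified subset of the BioSQL tables (assumption: the real schema
# has more columns and constraints than shown here).
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase,
    accession TEXT NOT NULL,
    UNIQUE (biodatabase_id, accession)
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry_relationship (
    bioentry_relationship_id INTEGER PRIMARY KEY,
    object_bioentry_id INTEGER NOT NULL REFERENCES bioentry,
    subject_bioentry_id INTEGER NOT NULL REFERENCES bioentry,
    term_id INTEGER NOT NULL REFERENCES term,
    UNIQUE (object_bioentry_id, subject_bioentry_id, term_id)
);
"""


def entry_id(conn, dbname, accession):
    """Look up a bioentry by its (database name, accession) key."""
    row = conn.execute(
        "SELECT e.bioentry_id FROM bioentry e "
        "JOIN biodatabase d ON d.biodatabase_id = e.biodatabase_id "
        "WHERE d.name = ? AND e.accession = ?",
        (dbname, accession),
    ).fetchone()
    if row is None:
        raise KeyError(f"no bioentry {dbname}:{accession}")
    return row[0]


def link_entries(conn, obj, subj, term_name):
    """Relate two already-loaded entries; running it twice adds nothing."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (term_name,))
    term_id = conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (term_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO bioentry_relationship "
        "(object_bioentry_id, subject_bioentry_id, term_id) VALUES (?, ?, ?)",
        (entry_id(conn, *obj), entry_id(conn, *subj), term_id),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Step 1: load both entries (placeholder accessions).
for dbname, accession in [("swissprot", "P12345"), ("genbank", "AB000001")]:
    conn.execute("INSERT OR IGNORE INTO biodatabase (name) VALUES (?)", (dbname,))
    conn.execute(
        "INSERT INTO bioentry (biodatabase_id, accession) "
        "SELECT biodatabase_id, ? FROM biodatabase WHERE name = ?",
        (accession, dbname),
    )

# Step 2: record that the two entries are synonymous.
link_entries(conn, ("swissprot", "P12345"), ("genbank", "AB000001"), "synonym")
link_entries(conn, ("swissprot", "P12345"), ("genbank", "AB000001"), "synonym")
conn.commit()
```

The uniqueness constraint on the relationship table is what makes it 
safe to re-run such a script on a regular basis.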

>
> Is it possible to load such identifiers for the sequences into
> bioentry_relationship without actually loading the sequence data?

Having a row in biosequence for a bioentry is optional, and bioperl-db 
supports this. So, yes, the schema is aware that certain bioentries may 
not even have the concept of a sequence (e.g., a LocusLink (LL) entry), 
and if you pass a SeqI object that lacks a sequence to bioperl-db you 
won't get a row in biosequence.

> How could I do so if my sequence relationships are a set of
> alternative identifiers for proteins in a tab-separated file
> (non-redundant sequence identifiers from PIR-NREF)?

If it's just a list of identifiers for a certain entry then I'd add 
those identifiers as annotation (e.g., using 
Bio::Annotation::SimpleValue) to the existing entry.

E.g., to process your tab-delimited file, look up the sequence object 
from the database with a unique-key query (e.g., by accession#), 
instantiate a Bio::Annotation::SimpleValue object for each additional 
identifier, and add it to the annotation bundle 
($seq->annotation->add_Annotation()). When you're done, re-serialize 
through $seq->store().
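
The file-handling half of that loop can be sketched independently of 
bioperl-db. The Python below only groups the alternative identifiers by 
primary accession, so that each entry needs to be looked up and stored 
once. The column layout (primary accession first, any number of 
alternative identifiers after it), the sample identifiers, and the 
function name read_aliases are assumptions, not the actual PIR-NREF 
format.

```python
import csv
import io
from collections import defaultdict


def read_aliases(handle):
    """Map each primary accession to its alternative identifiers.

    Assumed layout: tab-delimited, first column is the primary
    accession, remaining columns are alternative identifiers.
    Blank lines and lines starting with '#' are skipped; duplicate
    identifiers are kept only once, in order of first appearance.
    """
    aliases = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        accession, *others = row
        for ident in others:
            ident = ident.strip()
            if ident and ident not in aliases[accession]:
                aliases[accession].append(ident)
    return dict(aliases)


# Made-up sample data standing in for the real file.
sample = io.StringIO(
    "# accession\talternative identifiers\n"
    "P12345\tgi|1000001\tABC1\n"
    "P12345\tABC1\tb0001\n"
    "\n"
    "Q99999\tgi|1000002\n"
)
aliases = read_aliases(sample)
```

Each value in the resulting lists would then become one 
Bio::Annotation::SimpleValue on the entry found for that accession.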

If the file is instead a cross-referencing table, then I'd create 
bare-bones sequence objects for each entry and serialize them.

>
> Also, what I need is a database of sequence aliases, that would make
> it easy to find identifiers for sequences in different resources
> (UniProt, Kegg/COG/KOG orthologous classifications, etc.).
> Do you have any suggestion on how to implement such a database using
> BioSQL/bioperl-db?

See above. You can see those in action at symatlas.gnf.org (but there's 
no KEGG yet; also, InterPro is the only protein domain annotation at 
present).

	-hilmar

>
> Robson
>
>> Since bioperl doesn't have a straightforward corresponding class,
>> however, it is not supported through the bioperl object layer except
>> for Bio::ClusterI objects (e.g., Bio::Cluster::Unigene).
>>
>> There are different courses you could pursue. What I'm doing in my
>> instance is making heavy use of SQL scripts that I run against the
>> database on a regular basis, and which synthesize those
>> relationships based on, for instance, dbxref-to-bioentry matches.
>> Instead of running SQL directly, you could also wrap a Perl script
>> around it for processing input files or things like
>> iterative/recursive queries. Or you could amend bioperl (and then
>> bioperl-db) to support this in a better fashion.
>>
>> 	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------