[Bioperl-l] biosql and relationships
Hilmar Lapp
hlapp at gmx.net
Wed Aug 4 05:19:17 EDT 2004
On Wednesday, August 4, 2004, at 03:40 AM, Robson Francisco de Souza wrote:
> So, is it reasonable to use bioentry_relationship to store different
> identifiers for the same bioentries, but from different or even the same
> database? I'm asking that because for many genes you will find several
> names/identifiers like GI numbers, gene names or locus tags and accession
> numbers from different databases like KEGG, SWISSPROT, etc.
You can represent that using bioentry_relationship, but I wouldn't do
so when it's artificial.
I.e., I wouldn't dissect a GenBank entry into GI number, locus tag, gene
name, etc., and make them all different bioentries that you would then
need to link back together.
If, however, you want to represent that two entries from different
databases are, for instance, synonymous to each other, then loading
both followed by establishing an entry in the bioentry_relationship
table is how I would do that, using SQL or some hand-crafted script.
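As a rough illustration of the SQL route, here is a minimal sketch that links two already-loaded entries with one row in bioentry_relationship. It runs against an in-memory SQLite database with a heavily simplified subset of the BioSQL tables (the real schema has more columns and constraints). The accessions, the "synonym_of" term name, and the helper functions entry_id() and link_entries() are hypothetical and only serve the example.

```python
import sqlite3

# Simplified subset of the BioSQL tables (assumption: the real schema has
# additional columns such as rank, version, ontology_id, and so on).
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    accession TEXT NOT NULL,
    UNIQUE (biodatabase_id, accession)
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE bioentry_relationship (
    bioentry_relationship_id INTEGER PRIMARY KEY,
    object_bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    subject_bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    term_id INTEGER NOT NULL REFERENCES term(term_id),
    UNIQUE (object_bioentry_id, subject_bioentry_id, term_id)
);
"""


def entry_id(conn, db_name, accession):
    """Look up a bioentry by its (database name, accession) pair."""
    row = conn.execute(
        "SELECT e.bioentry_id FROM bioentry e "
        "JOIN biodatabase d ON d.biodatabase_id = e.biodatabase_id "
        "WHERE d.name = ? AND e.accession = ?",
        (db_name, accession),
    ).fetchone()
    if row is None:
        raise LookupError(f"no bioentry {db_name}:{accession}")
    return row[0]


def link_entries(conn, subject, obj, term_name):
    """Record 'subject <term_name> obj'; repeated calls add no duplicates."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (term_name,))
    term_id = conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (term_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO bioentry_relationship "
        "(object_bioentry_id, subject_bioentry_id, term_id) VALUES (?, ?, ?)",
        (entry_id(conn, *obj), entry_id(conn, *subject), term_id),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO biodatabase (name) VALUES (?)", [("swissprot",), ("kegg",)]
)
# Placeholder accessions; in practice both entries were loaded beforehand.
conn.executemany(
    "INSERT INTO bioentry (biodatabase_id, accession) "
    "SELECT biodatabase_id, ? FROM biodatabase WHERE name = ?",
    [("SP_EXAMPLE", "swissprot"), ("KEGG_EXAMPLE", "kegg")],
)

link_entries(conn, ("swissprot", "SP_EXAMPLE"), ("kegg", "KEGG_EXAMPLE"), "synonym_of")
relationship_count = conn.execute(
    "SELECT COUNT(*) FROM bioentry_relationship"
).fetchone()[0]
```

The uniqueness constraint plus INSERT OR IGNORE (SQLite syntax) makes the script safe to re-run on a regular basis; other database engines need their own equivalent.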
>
> Is it possible to load such identifiers for the sequences into
> bioentry_relationship without actually loading the sequence data?
Having a row in biosequence for a bioentry is optional, and bioperl-db
supports this. So, yes, the schema is aware that certain bioentries may
not even have the concept of a sequence (e.g., a LocusLink (LL) entry),
and if you pass a SeqI object to bioperl-db that lacks a sequence you
won't get a row in biosequence.
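At the table level, "optional" means a bioentry row can simply have no matching biosequence row. The following sketch shows this with an in-memory SQLite database and a trimmed-down pair of tables (an assumption: the real BioSQL tables carry more columns); the accessions are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession TEXT NOT NULL UNIQUE
);
-- At most one biosequence row per bioentry, and possibly none at all.
CREATE TABLE biosequence (
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    length INTEGER,
    seq TEXT
);
""")

# One entry with sequence data, one identifier-only entry without any.
conn.execute("INSERT INTO bioentry (bioentry_id, accession) VALUES (1, 'WITH_SEQ')")
conn.execute("INSERT INTO bioentry (bioentry_id, accession) VALUES (2, 'NO_SEQ')")
conn.execute("INSERT INTO biosequence (bioentry_id, length, seq) VALUES (1, 4, 'ACGT')")

# A LEFT JOIN keeps the sequence-less entry in the result set.
rows = conn.execute("""
    SELECT e.accession, s.bioentry_id IS NOT NULL
    FROM bioentry e
    LEFT JOIN biosequence s ON s.bioentry_id = e.bioentry_id
    ORDER BY e.bioentry_id
""").fetchall()
has_sequence = {accession: bool(flag) for accession, flag in rows}
```

Queries that use an inner join on biosequence would silently drop the identifier-only entries, so a LEFT JOIN is the safer default when such entries are present.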
> How could I do so if my sequence relationships are a set of
> alternative identifiers for proteins in a tab-separated file
> (non-redundant sequence identifiers from PIR-NREF)?
If it's just a list of identifiers for a certain entry then I'd add
those identifiers as annotation (e.g., using
Bio::Annotation::SimpleValue) to the existing entry.
E.g., to process your tab-delimited file, look up the sequence object
from the database using a unique key query (e.g., by accession number),
instantiate a Bio::Annotation::SimpleValue object for each additional
identifier, and add it to the annotation bundle
($seq->annotation->add_Annotation()). When you're done, re-serialize
through $seq->store().
If the file is rather a cross-referencing table, then I'd create
bare-bones sequence objects for each entry and serialize them.
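The file-reading half of the tab-delimited processing described above could be sketched as follows. This is only an illustration in Python: it assumes a hypothetical layout with the primary accession in the first column and the alternative identifiers in the remaining columns (the actual PIR-NREF layout may differ), and read_aliases() is a made-up helper name. Each (accession, alias) pair it yields would correspond to one Bio::Annotation::SimpleValue added to the entry looked up by that accession; the BioPerl calls themselves are not shown.

```python
import csv
import io
from collections import defaultdict


def read_aliases(handle):
    """Group alternative identifiers by primary accession.

    Assumed layout (hypothetical): column 1 is the primary accession,
    all further columns are alternative identifiers for that entry.
    """
    aliases = defaultdict(list)
    for fields in csv.reader(handle, delimiter="\t"):
        fields = [f.strip() for f in fields if f.strip()]
        if len(fields) < 2:
            continue  # blank line, or an accession without any aliases
        primary, others = fields[0], fields[1:]
        for alias in others:
            # Skip self-references and identifiers already recorded.
            if alias != primary and alias not in aliases[primary]:
                aliases[primary].append(alias)
    return dict(aliases)


# Placeholder data standing in for the real file.
sample = io.StringIO(
    "ACC_1\tALIAS_A\tALIAS_B\n"
    "ACC_2\tALIAS_C\n"
    "\n"
    "ACC_1\tALIAS_A\tALIAS_D\n"
)
alias_map = read_aliases(sample)
```

Collecting all identifiers per accession first means each entry needs to be looked up and stored only once, rather than once per line of the file.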
>
> Also, what I need is a database of sequence aliases, that would make
> it easy to find identifiers for sequences in different resources
> (UniProt, Kegg/COG/KOG orthologous classifications, etc.).
> Do you have any suggestion on how to implement such a database using
> BioSQL/bioperl-db?
See above. You can see those in action at symatlas.gnf.org (but there's
no KEGG yet; also, InterPro is the only protein domain annotation at
present).
-hilmar
>
> Robson
>
>> Since bioperl doesn't have a straightforward corresponding class,
>> however, it is not supported through the bioperl object layer except
>> for Bio::ClusterI objects (e.g., Bio::Cluster::Unigene).
>>
>> There are different courses you could pursue. What I'm doing in my
>> instance is making heavy use of SQL scripts that I run against the
>> database on a regular basis, and which synthesize those
>> relationships based on, for instance, dbxref-to-bioentry matches.
>> Instead of running SQL directly, you could also wrap a perl script
>> around it for processing input files or things like
>> iterative/recursive queries. Or you could amend bioperl (and then
>> bioperl-db) to support this in a better fashion.
>>
>> -hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------