[Bioperl-l] Entrez Gene ASN
Hilmar Lapp
hlapp at gmx.net
Sat Mar 12 19:55:44 EST 2005
On Friday, March 11, 2005, at 11:02 AM, Stefan Kirov wrote:
>
>
> Hilmar Lapp wrote:
>
>> Gene shouldn't be fundamentally different from LocusLink, and
>> LocusLink was represented as an annotated SeqI within bioperl.
>
> It is not, you are right.
>
>>
>> If at all possible I'd still like it to remain that way for Gene in
>> order to allow for a smooth transition from LL to Gene for code
>> that's been using the former.
>>
> hmmmm, back compatibility is good thing, but sometimes it may be hard
> to achieve.
Well, now you contradict yourself. Above you agree that Gene and
LocusLink are fundamentally the same, and here you say representing
them in a compatible fashion may be hard to achieve ...
There are problems indeed though, read on ...
>
>> If you want to emphasize the fact that it's a container for
>> sequences, then that sounds like a ClusterI to me, which can be
>> richly annotated too.
>
> Let me disagree here. Cluster is designed for independent sequences,
> where Gene should deal with sequences, that have hierarchical
> relationship among themselves.
Two notes here. First, ClusterI is not designed for independent
sequences. It is just meant as a container for sequences, be those
related to each other or not.
Second, the ability to represent hierarchical relationships between
sequences is basically absent from bioperl, not just from ClusterI
(aside from ClusterI representing a relationship between the containing
seq and the contained seqs).
We should think seriously before we add that capability. Most of the
people and effort in the field devoted to hierarchical relationships
between biological entities with sequence are concentrated in the domain
of feature hierarchies, *not* sequence hierarchies. See GFF3, SO, GBrowse,
Chado, and related efforts.
The only place I know of where sequence hierarchies are extensively used
is in our local adaptation of Biosql, and we do all of this in SQL (as
bioperl, and therefore bioperl-db, has zero support for it).
It's possible, but I'm not sure it's also wise, to duplicate the support
for feature hierarchies for sequences ... Wouldn't it in the end benefit
more people if you were able to tie Gene into the Unflattener that
Chris wrote?
> This is one of the issues I think Seq object is not designed to deal
> with. What we need is:
> genome--(Bio::Seq)-
>    |--transcript(Bio::Seq)
>    |     |--protein(Bio::Seq)
>    |--transcript(Bio::Seq)
>          |--protein(Bio::Seq)
Well, yeah, if you replace Bio::Seq with Bio::SeqFeatureI you are
pretty close to GFF3 and a growing wealth of support for it.
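
To make the feature-hierarchy idea concrete, here is a minimal sketch of how
GFF3 expresses a gene/transcript/coding-region tree through ID and Parent
attributes. It is plain Python, not bioperl code, and every ID, coordinate,
and function name in it is invented for illustration. In GFF3 the protein
level is usually represented by CDS features hanging off an mRNA, so that is
what the sketch assumes.

```python
from collections import defaultdict

# Hypothetical GFF3-style records; all IDs and coordinates are made up.
GFF3_LINES = (
    "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene1\n"
    "chr1\texample\tmRNA\t1000\t9000\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "chr1\texample\tCDS\t1200\t8800\t.\t+\t0\tID=cds1;Parent=mrna1\n"
    "chr1\texample\tmRNA\t1000\t7000\t.\t+\t.\tID=mrna2;Parent=gene1\n"
    "chr1\texample\tCDS\t1200\t6800\t.\t+\t0\tID=cds2;Parent=mrna2\n"
)


def parse_attributes(column9):
    """Split the ninth GFF3 column ('key=value;key=value') into a dict."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def build_hierarchy(text):
    """Return (features by ID, child IDs by parent ID) for GFF3-style text."""
    features = {}
    children = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        fid = attrs["ID"]
        features[fid] = {
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
        }
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].append(fid)
    return features, children


def render(fid, features, children, depth=0):
    """Return the subtree rooted at fid as a list of indented lines."""
    lines = ["  " * depth + f"{features[fid]['type']} {fid}"]
    for child in children.get(fid, []):
        lines.extend(render(child, features, children, depth + 1))
    return lines


if __name__ == "__main__":
    features, children = build_hierarchy(GFF3_LINES)
    print("\n".join(render("gene1", features, children)))
```

The point of the sketch is only that the tree lives in the features'
Parent links on a single reference sequence, rather than in a set of nested
sequence objects.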
>
> Another significant concern I have is that if we store everything as
> SeqFeature or the overhead may become huge (some records have hundreds
> of different features)
Have you talked to Lincoln about this? I believe GBrowse is dealing
pretty well with this huge overhead but I may be missing something here.
> [...] and any user of the parser will have to do quite of a data
> mining to find the relevant feature. One approach would be to add more
> Bio::Annotation:: objects (for example Bio::Annotation::STS,
> Bio::Annotation::GRIF, etc).
Possibly. Bio::Annotation objects were in fact what I was primarily
referring to when I spoke about annotation.
> We may decide to create a simplified (Bio::Seq, no relationships) or
> more complex object (Gene), based on the user request.
Just as an aside, I guess you know that there is a Gene object already,
but it's feature based.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------