[Bioperl-l] Entrez Gene ASN
Stefan Kirov
skirov at utk.edu
Mon Mar 14 11:28:40 EST 2005
Hilmar,
Hilmar Lapp wrote:
>
> On Friday, March 11, 2005, at 11:02 AM, Stefan Kirov wrote:
>
>>
>>
>> Hilmar Lapp wrote:
>>
>>> Gene shouldn't be fundamentally different from LocusLink, and
>>> LocusLink was represented as an annotated SeqI within bioperl.
>>
>>
>> It is not, you are right.
>>
>>>
>>> If at all possible I'd still like it to remain that way for Gene in
>>> order to allow for a smooth transition from LL to Gene for code
>>> that's been using the former.
>>>
>> Hmm, backward compatibility is a good thing, but sometimes it may be hard
>> to achieve.
>
>
> Well, now you contradict yourself. Above you agree that Gene and
> LocusLink are fundamentally the same, and here you say representing
> them in a compatible fashion may be hard to achieve ...
Not really. They are fairly similar, but not identical. Moreover, I
believe the LocusLink parser doesn't deal with hierarchies. It just puts
everything in Annotation objects, thus losing the relationships
(correct me if I am wrong here). Same with homologs.
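
To make the point about lost relationships concrete, here is a minimal sketch in Python (not BioPerl code). The accessions, the `flat` list, and the `Node` class are hypothetical and made up for illustration. It contrasts a flat bag of annotations with a nested record that keeps the transcript-to-protein links:

```python
from dataclasses import dataclass, field
from typing import List

# Flat representation: every item is an independent (tag, value) pair,
# so nothing records which protein belongs to which transcript.
flat = [
    ("transcript", "T1"),
    ("protein", "P1"),
    ("transcript", "T2"),
    ("protein", "P2"),
]


@dataclass
class Node:
    """Hypothetical hierarchical record: one sequence plus its children."""
    kind: str
    accession: str
    children: List["Node"] = field(default_factory=list)


# Nested representation: each protein hangs off the transcript it comes from.
gene = Node("gene", "G1", [
    Node("transcript", "T1", [Node("protein", "P1")]),
    Node("transcript", "T2", [Node("protein", "P2")]),
])


def products(node: Node, transcript_acc: str) -> List[str]:
    """Return accessions of the proteins attached to the given transcript."""
    for child in node.children:
        if child.kind == "transcript" and child.accession == transcript_acc:
            return [p.accession for p in child.children if p.kind == "protein"]
    return []
```

With the nested form, `products(gene, "T2")` can be answered directly. With `flat`, the same question can only be guessed from the order of the entries.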
>
> There are problems indeed though, read on ...
>
>>
>>> If you want to emphasize the fact that it's a container for
>>> sequences, then that sounds like a ClusterI to me, which can be
>>> richly annotated too.
>>
>>
>> Let me disagree here. Cluster is designed for independent sequences,
>> where Gene should deal with sequences, that have hierarchical
>> relationship among themselves.
>
>
> Two notes here. First, ClusterI is not designed for independent
> sequences. It is just meant as a container for sequences, be those
> related to each other or not.
OK, I meant independent as in "I don't know what your relationship is".
My point is that it is not fit to describe the hierarchy here.
>
> Second, the ability to represent hierarchical relationships between
> sequences is basically absent from bioperl, not just from ClusterI
> (aside from ClusterI representing a relationship between the
> containing seq and the contained seqs).
>
> We should think seriously before we add that capability. Most of the
> people and effort in the field towards hierarchical relationships
> between biological entities with sequence takes place in the domain of
> feature hierarchies, *not* sequence hierarchies. See GFF3, SO,
> GBrowse, Chado, and related efforts.
I believe it is reasonable to have this functionality. Anyway, I see
sequence vs. sequence feature hierarchy more as a philosophical question
with little practical value (unless I am missing something important).
By the way, isn't GBrowse MySQL-based?
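
For reference, the feature-hierarchy approach mentioned above works in GFF3 through the `ID` and `Parent` attributes in the ninth column. The following is a minimal Python sketch of rebuilding the hierarchy from those attributes. The three records and all function names are made up for illustration, and this is not how BioPerl or GBrowse implement it:

```python
from collections import defaultdict

# Three made-up GFF3-style records (tab-separated, nine columns).
GFF3 = "\n".join([
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tdemo\tCDS\t150\t800\t.\t+\t0\tID=cds1;Parent=mrna1",
])


def parse_attributes(column9):
    """Split 'key=value;key=value' into a dict."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def build_children(gff3_text):
    """Map each parent ID to the list of IDs that name it as Parent."""
    children = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        attrs = parse_attributes(line.split("\t")[8])
        # A Parent attribute may hold several comma-separated IDs.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].append(attrs.get("ID"))
    return dict(children)


def descendants(children, root):
    """Depth-first list of everything below root."""
    out = []
    for child in children.get(root, []):
        out.append(child)
        out.extend(descendants(children, child))
    return out
```

Here `descendants(build_children(GFF3), "gene1")` walks gene, then mRNA, then CDS. This is the same shape as the genome/transcript/protein tree, only expressed with features instead of sequences.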
>
> The only place I know where sequence hierarchies are extensively used
> is in our local adaptation of Biosql, and we do all of this in SQL (as
> bioperl and therefore bioperl-db has zero support for it).
>
> It's possible, but I'm not sure it is also wise, to duplicate the support for
> feature hierarchies to sequences ... Wouldn't it in the end benefit
> more people if you were able to tie in Gene into the Unflattener that
> Chris wrote?
>
>> This is one of the issues I think Seq object is not designed to
>> deal with. What we need is:
>> genome (Bio::Seq)
>>   |--transcript (Bio::Seq)
>>   |    |--protein (Bio::Seq)
>>   |--transcript (Bio::Seq)
>>        |--protein (Bio::Seq)
>
>
> Well, yeah, if you replace Bio::Seq with Bio::SeqFeatureI you are
> pretty close to GFF3 and a growing wealth of support for it.
>
>>
>> Another significant concern I have is that if we store everything as
>> SeqFeature objects, the overhead may become huge (some records have
>> hundreds of different features)
>
>
> Have you talked to Lincoln about this? I believe GBrowse is dealing
> pretty well with this huge overhead but I may be missing something here.
>
No, I have not, I guess I should...
>
>> [...] and any user of the parser will have to do quite a bit of data
>> mining to find the relevant feature. One approach would be to add
>> more Bio::Annotation:: objects (for example Bio::Annotation::STS,
>> Bio::Annotation::GRIF, etc).
>
>
> Possibly. Bio::Annotation objects were in fact what I was primarily
> referring to when I spoke about annotation.
>
So do we agree that Bio::Annotation needs some expansion? What do other
people think?
>> We may decide to create a simplified (Bio::Seq, no relationships) or
>> more complex object (Gene), based on the user request.
>
>
> Just as an aside, I guess you know that there is a Gene object
> already, but it's feature based.
Yes, but actually Bio::LiveSeq::Gene (vs Bio::SeqFeature::Gene) is more
like what I had in mind (it lacks documentation and relationships, I
think, but is a good start). But still, what about phylogeny?
>
> -hilmar