[Bioperl-l] Entrez Gene ASN

Mon Mar 14 11:28:40 EST 2005

Hilmar,

Hilmar Lapp wrote:

>
> On Friday, March 11, 2005, at 11:02  AM, Stefan Kirov wrote:
>
>>
>>
>> Hilmar Lapp wrote:
>>
>>> Gene shouldn't be fundamentally different from LocusLink, and 
>>> LocusLink was represented as an annotated SeqI within bioperl.
>>
>>
>> It is not, you are right.
>>
>>>
>>> If at all possible I'd still like it to remain that way for Gene in 
>>> order to allow for a smooth transition from LL to Gene for code 
>>> that's been using the former.
>>>
>> Hmmm, backward compatibility is a good thing, but sometimes it may be hard 
>> to achieve.
>
>
> Well, now you contradict yourself. Above you agree that Gene and 
> LocusLink are fundamentally the same, and here you say representing 
> them in a compatible fashion may be hard to achieve ...

Not really. They are fairly similar, but not completely, and moreover I 
believe the LocusLink parser doesn't deal with hierarchies. It just puts 
everything in Annotation objects, thus losing the relationships 
(correct me if I am wrong here). The same goes for homologs.

>
> There are problems indeed though, read on ...
>
>>
>>> If you want to emphasize the fact that it's a container for 
>>> sequences, then that sounds like a ClusterI to me, which can be 
>>> richly annotated too.
>>
>>
>> Let me disagree here. Cluster is designed for independent sequences, 
>> whereas Gene should deal with sequences that have a hierarchical 
>> relationship among themselves.
>
>
> Two notes here. First, ClusterI is not designed for independent 
> sequences. It is just meant as a container for sequences, be those 
> related to each other or not.

OK, I meant independent as in "I don't know what your relationship is". 
My point is that it is not fit to describe the hierarchy here.

>
> Second, the ability to represent hierarchical relationships between 
> sequences is basically absent from bioperl, not just from ClusterI 
> (aside from ClusterI representing a relationship between the 
> containing seq and the contained seqs).
>
> We should think seriously before we add that capability. Most of the 
> people and effort in the field towards hierarchical relationships 
> between biological entities with sequence takes place in the domain of 
> feature hierarchies, *not* sequence hierarchies. See GFF3, SO, 
> GBrowse, Chado, and related efforts.

I believe it is reasonable to have this functionality. Anyway, I see 
sequence vs. sequence feature hierarchy more as a philosophical question 
with little practical value (unless I am missing something important). 
By the way, isn't GBrowse MySQL-based?
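For comparison, a minimal sketch of how GFF3 expresses a gene → transcript → coding-region hierarchy purely as a feature hierarchy, through `ID`/`Parent` attributes in column 9. It is written in Python rather than BioPerl, and the records, IDs, coordinates, and helper functions (`parse_attributes`, `build_hierarchy`, `walk`) are hypothetical illustrations, not part of BioPerl or any GFF3 library. It ignores GFF3 escaping, directives, and features split across several lines.

```python
from collections import defaultdict

# Hypothetical GFF3 records (made-up IDs and coordinates, illustration only):
# one gene with two transcripts, each carrying one coding region.
GFF3 = """\
chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene1
chr1\texample\tmRNA\t1000\t9000\t.\t+\t.\tID=tx1;Parent=gene1
chr1\texample\tCDS\t1200\t8800\t.\t+\t0\tID=cds1;Parent=tx1
chr1\texample\tmRNA\t1000\t7000\t.\t+\t.\tID=tx2;Parent=gene1
chr1\texample\tCDS\t1200\t6800\t.\t+\t0\tID=cds2;Parent=tx2
"""


def parse_attributes(field):
    """Split a column-9 string like 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for pair in field.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def build_hierarchy(text):
    """Return (features by ID, child IDs by parent ID) from GFF3 lines."""
    features = {}
    children = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        fid = attrs["ID"]
        features[fid] = {"type": cols[2], "start": int(cols[3]), "end": int(cols[4])}
        # In GFF3 a feature may name several comma-separated parents.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].append(fid)
    return features, children


def walk(fid, features, children, depth=0):
    """Yield (depth, ID, type) in depth-first order starting at fid."""
    yield depth, fid, features[fid]["type"]
    for child in children[fid]:
        yield from walk(child, features, children, depth + 1)


features, children = build_hierarchy(GFF3)
for depth, fid, ftype in walk("gene1", features, children):
    print("  " * depth + f"{ftype} {fid}")
```

The relationships live entirely in the feature records, so a single flat list of features is enough to recover the tree; no sequence object has to contain another.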

>
> The only place I know where sequence hierarchies are extensively used 
> is in our local adaptation of Biosql, and we do all of this in SQL (as 
> bioperl and therefore bioperl-db has zero support for it).
>
> It's possible, but I'm not sure it's also wise to duplicate the support for 
> feature hierarchies for sequences ... Wouldn't it in the end benefit 
> more people if you were able to tie in Gene into the Unflattener that 
> Chris wrote?
>
>> This is one of the issues I think the Seq object is not designed to 
>> deal with. What we need is:
>> genome (Bio::Seq)
>>   |--transcript (Bio::Seq)
>>   |    |--protein (Bio::Seq)
>>   |--transcript (Bio::Seq)
>>        |--protein (Bio::Seq)
>
>
> Well, yeah, if you replace Bio::Seq with Bio::SeqFeatureI you are 
> pretty close to GFF3 and a growing wealth of support for it.
>
>>
>> Another significant concern I have is that if we store everything as 
>> SeqFeatures, the overhead may become huge (some records have 
>> hundreds of different features)
>
>
> Have you talked to Lincoln about this? I believe GBrowse is dealing 
> pretty well with this huge overhead but I may be missing something here.
>
No, I have not; I guess I should.

>
>> [...] and any user of the parser will have to do quite a bit of data 
>> mining to find the relevant feature. One approach would be to add 
>> more Bio::Annotation:: objects (for example Bio::Annotation::STS, 
>> Bio::Annotation::GRIF, etc).
>
>
> Possibly. Bio::Annotation objects were in fact what I was primarily 
> referring to when I spoke about annotation.
>
So do we agree that Bio::Annotation needs some expansion? What do other 
people think?

>> We may decide to create a simplified (Bio::Seq, no relationships) or 
>> more complex object (Gene), based on the user request.
>
>
> Just as an aside, I guess you know that there is a Gene object 
> already, but it's feature based.

Yes, but actually Bio::LiveSeq::Gene (vs. Bio::SeqFeature::Gene) is more 
like what I had in mind (I think it lacks documentation and relationships, 
but it is a good start). Still, what about phylogeny?

>
>     -hilmar