[Bioperl-l] Hilmar and Ewan debate SeqFeatures some more...

Mark Wilkinson mwilkinson@gene.pbi.nrc.ca
Fri, 19 Jan 2001 14:49:30 -0600


Hi all!

> However, there are people who do want to do meaningful stuff with
> the coordinates. One of these is our group in Vienna (yes, we draw
> features, and yes, that adds to my concern). The other I know of
> is David with his GUI, which is why I put him on cc.

Hey!  Don't forget the primary author of the SeqCanvas GUI :-)

If it's okay I have $0.02 to contribute too...


> I agree completely here. I even think $feature->start() can stay
> there forever.
> >snip<
> And I think Location (where) and Feature (what) are not redundant.

This, to me, is the crux of the argument, and I have to side with Hilmar on
this.  From a biological perspective, location and feature are absolutely *not*
redundant.  We are arguing about how to represent something computationally
that has not been universally agreed upon even by the geneticists/MolBiologists
themselves:  What is a gene?  I personally think that Hilmar's view is more
"biologically correct" (tm), that a gene, or more generally a feature, is best
described as it was described to me as a first year undergraduate many years
ago, "a functional unit of DNA".   These "functional units" may be overlapping,
even extensively, but if they do not have *exactly* the same function then they
should probably be considered entirely different features, rather than a single
feature with multiple compositions... (I hope I am not over-interpreting your
views, Hilmar...).  This single-feature-multiple-function approach is an
absolute nightmare for annotators!!

So, in my world view, $Feature->start should only be ambiguous if that *unique
functional unit* has a bona fide ambiguous start.  In such a case, I would then
side with Ewan in his proposal that there should, nevertheless, be a default
$Feature->start value for these fuzzy features (NO EXCEPTION THROWING!!), but
that they are somehow "flagged" such that smarter clients will be able to
easily query these features for their fuzziness and display this fuzziness if
they have the ability (interestingly, we just initiated a research project
with several CompSci students to investigate how to best visualize exactly
these kinds of "fuzzy" or ambiguous situations!!).  This was not my primary
consideration when I was writing SeqCanvas, but I have already noticed that
this module, as it stands, is nowhere near sufficient to represent "reality",
and will need to be thought out from scratch over the next few months as our
group trips over these kinds of problems more and more often.  (Stay tuned!  I
intend to re-focus my energies on this code as soon as other more pressing
issues are out of the way!)
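
To make that idea concrete, here is a minimal sketch of what such a "flagged"
default might look like.  FuzzyFeature, is_fuzzy and start_range are
hypothetical names made up for this illustration; none of this is BioPerl
code, and picking the lowest possible coordinate as the default start is only
one assumption among several reasonable ones.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; not part of BioPerl.
package FuzzyFeature;

sub new {
    my ($class, %args) = @_;
    my $self = {
        min_start => $args{min_start},
        # With no max_start given, the start is treated as exact.
        max_start => defined $args{max_start} ? $args{max_start}
                                              : $args{min_start},
    };
    return bless $self, $class;
}

# Always returns a usable default coordinate and never throws.
# Assumption: the lowest possible start is the default.
sub start {
    my $self = shift;
    return $self->{min_start};
}

# The "flag" that smarter clients can query.
sub is_fuzzy {
    my $self = shift;
    return $self->{min_start} != $self->{max_start} ? 1 : 0;
}

# The full range of possible start coordinates.
sub start_range {
    my $self = shift;
    return ($self->{min_start}, $self->{max_start});
}

package main;

my $feature = FuzzyFeature->new(min_start => 100, max_start => 120);

# A simple client just uses the default.
print "start: ", $feature->start, "\n";

# A smarter client asks about the fuzziness and shows it.
if ($feature->is_fuzzy) {
    my ($min, $max) = $feature->start_range;
    print "start lies somewhere in $min..$max\n";
}
```

Existing clients that only call start keep working unchanged, while a drawing
client can check is_fuzzy and render the range however it likes.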

So, w.r.t. SeqCanvas and other GUIs that already exist, I would hope that
these are not an issue in this debate!  My personal opinion is that BioPerl
should make the capturing of biological reality its primary concern and, within
reason, leave the problem of parsing and displaying this data to the client;
"it's an S.E.P.".  If it is generally agreed upon by the community that
$Feature->start is no longer an adequate representation of "reality",  then it
should be dumped, regardless of what parsers may already exist.
$Feature->start is not the holy grail; the biological data is.  (Personally, I
can't imagine a scenario where $Feature->start would no longer be useful... but
you probably understand what I am getting at...)


> It does exist. It may not be the most frequent case, but it is a
> use case for us. And probably for everyone who draws features.

Indeed, it does exist!  And it looks like it will only get worse as we learn
more...

Anyway, for what it's worth, that's my two bits :-)

Cheers all!

M

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada