[Bioperl-l] Bio::Location::{Simple,Fuzzy} and "IN-BETWEEN"

Heikki Lehvaslaiho heikki.lehvaslaiho at gmail.com
Fri Mar 13 13:24:16 UTC 2009


George,

Chris is right.

You are not supposed to use fuzzy, ever! It was introduced only because
in the olden times sequencing was difficult and you often knew that a
sequence feature started before your actual sequence. The early
EMBL/GenBank design decision was to mark that with something like
"CDS <1..2344" when you knew that your sequence did not start at the
start of the coding region.

You always annotate something in relation to the reference sequence.
If there is something like the insertion in Chris' example, you use
IN-BETWEEN notation, where the start and end have to be adjacent
residues. There is nothing fuzzy in that location, so do not try to
add fuzziness.
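
As a rough illustration of the adjacency rule, here is a minimal Python
sketch. It is not BioPerl code: the class InBetween and its method
to_ft_string are made up for this example, and it ignores the
circular-molecule case (n^1) that the feature table format also allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InBetween:
    """Hypothetical stand-in for an exact IN-BETWEEN location (not BioPerl)."""

    start: int
    end: int

    def __post_init__(self):
        # The site lies between two neighbouring residues of the reference
        # sequence, so the two coordinates must differ by exactly one.
        if self.end - self.start != 1:
            raise ValueError(
                f"IN-BETWEEN needs adjacent residues, got {self.start}^{self.end}"
            )

    def to_ft_string(self) -> str:
        # Feature-table style rendering of the location, e.g. "8^9".
        return f"{self.start}^{self.end}"


# The insertion site from Chris' example on 'foo'.
print(InBetween(8, 9).to_ft_string())
```

A pair such as 6 and 9 is rejected with a ValueError, because those
coordinates are not adjacent on the reference.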

Yours,

  -Heikki

2009/3/13 Chris Fields <cjfields at illinois.edu>:
> (responding to get the discussion going, maybe one of the LocationI
> designers can respond...)
>
> On Mar 10, 2009, at 7:31 PM, George Hartzell wrote:
>
>> I just tripped over the $self->throw() in B::L::Fuzzy->start where it
>> won't let me use a Fuzzy if the start and end are adjacent.
>>
>> I think that this is going to be one of those More Than One Way To Do
>> It kind of things, but I don't understand the restriction.
>>
>> I have some data for insertions that get a location between two
>> adjacent bases, e.g. s^e, an IN-BETWEEN that starts at s and ends at
>> e.
>>
>> Then I map that location via an alignment to a second sequence and it
>> maps into a gap on the second sequence.  In this case the sequences
>> are no longer adjacent, e.g. the left edge of the gap is l and the
>> right edge is r.  At this point I know less than I did, and am trying
>> to represent it as >l^<r, e.g. an IN-BETWEEN that starts somewhere
>> after l and ends somewhere before r.  This seems to work.
>
> Something like the following?
>
> foo/1-17       aataaataaaagggcca
> bar/1-14       aataaa---aagggcaa
>                      ^
> So that would be 8^9 for 'foo' (excuse the arrow if you don't have
> fixed-width text, it's pointing at pos 8 on 'foo').  I would argue that
> there is no similar feature there at all for 'bar'.  The relevant sequence
> is missing for the insertion, so no feature can be reliably assigned.  That
> would be interesting by itself (at least to me).
>
> If one had to mark it I would guess 'bar' is 6^7, not >6^7< as the ends are
> both known and present, (no '<' or '>') but the sequence maps between the
> two coordinates.
>
>> If I have something on the second sequence in the gap region and am
>> mapping it back to the first then it's going to end up with adjacent
>> start and end.
>
> I don't think the converse works for the reasons stated above; the position
> of interest lies in a gap, so it's lossy.   If I understand this correctly,
> we wouldn't know exactly which gap position the insertion would be in; the
> feature would map back as somewhere within 7-9 (or 7.9).  BTW, I believe
> that latter WITHIN designation is deprecated in the Feature Table
> definition.
>
> At least at this point, the only way I can think of to reliably translate
> the position back is if the start/end is referring to the alignment column
> position, not the sequence.  SimpleAlign does allow features but I'm not
> sure if they point to the alignment position or to individual sequences
> within the alignment.
>
>> It seems like it'd be useful for me to just use Bio::Location::Fuzzy's
>> everywhere and use exact info when I have it.  Unfortunately several
>> methods in Bio::Location::Fuzzy check for the first case and throw an
>> exception.
>>
>> I'm hoping that a history lesson or other insight might help me
>> understand why those checks are there.  There don't seem to be any
>> other checks that prevent one from specifying something exact in a
>> Fuzzy and there doesn't seem to be any restriction about specifying an
>> IN-BETWEEN Fuzzy....
>>
>> g.
>
> Anyone else have thoughts?  I had some issues with Location a while back,
> dealing with (I think) how split locations deal with strandedness, but I've
> slept since then...
>
> chris



-- 
    -Heikki
Heikki Lehvaslaiho - heikki lehvaslaiho gmail com