[Bioperl-l] Bio::Location::{Simple,Fuzzy} and "IN-BETWEEN"
Chris Fields
cjfields at illinois.edu
Fri Mar 13 04:11:10 UTC 2009
(Responding to get the discussion going; maybe one of the LocationI
designers can weigh in...)
On Mar 10, 2009, at 7:31 PM, George Hartzell wrote:
> I just tripped over the $self->throw() in B::L::Fuzzy->start where it
> won't let me use a Fuzzy if the start and end are adjacent.
>
> I think that this is going to be one of those More Than One Way To Do
> It kind of things, but I don't understand the restriction.
>
> I have some data for insertions that get a location between two
> adjacent bases, e.g. s^e, an IN-BETWEEN that starts at s and ends at
> e.
>
> Then I map that location via an alignment to a second sequence, and it
> maps into a gap on the second sequence. In this case the flanking
> positions are no longer adjacent, e.g. the left edge of the gap is l and
> the right edge is r. At this point I know less than I did, and am trying
> to represent it as >l^<r, i.e. an IN-BETWEEN that starts somewhere
> after l and ends somewhere before r. This seems to work.
Something like the following?
foo/1-17 aataaataaaagggcca
bar/1-14 aataaa---aagggcaa
                ^
So that would be 8^9 for 'foo' (excuse the arrow if you don't have
fixed-width text; it's pointing at pos 8 on 'foo'). I would argue
that there is no similar feature there at all for 'bar'. The relevant
sequence is missing for the insertion, so no feature can be reliably
assigned. That would be interesting by itself (at least to me).
If one had to mark it, I would guess 'bar' is 6^7, not >6^7<, since both
ends are known and present (no '<' or '>'); the sequence simply maps
between the two coordinates.
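A minimal Python sketch of the mapping described above, assuming the two gapped strings are exactly the rows shown and that positions are 1-based residue coordinates. `residue_columns` and `map_in_between` are hypothetical helpers for illustration, not BioPerl methods. An IN-BETWEEN site left^(left+1) on one row is projected onto the other row by taking the nearest target residues on either side of the site's alignment columns.

```python
def residue_columns(gapped):
    """1-based alignment column of each residue; index 0 holds residue 1."""
    return [col for col, ch in enumerate(gapped, start=1) if ch != "-"]


def map_in_between(src, dst, left):
    """Project the IN-BETWEEN site left^(left+1) on src onto dst (hypothetical).

    Returns (start, end): the last dst residue at or before the site's left
    column and the first dst residue at or after its right column.
    """
    src_cols = residue_columns(src)
    dst_cols = residue_columns(dst)
    left_col = src_cols[left - 1]   # column of src residue `left`
    right_col = src_cols[left]      # column of src residue `left + 1`
    start = max((i for i, c in enumerate(dst_cols, start=1) if c <= left_col),
                default=0)
    end = min((i for i, c in enumerate(dst_cols, start=1) if c >= right_col),
              default=len(dst_cols) + 1)
    return start, end


foo = "aataaataaaagggcca"
bar = "aataaa---aagggcaa"
start, end = map_in_between(foo, bar, 8)
print(f"foo 8^9 -> bar {start}^{end}")  # foo 8^9 -> bar 6^7
```

Because bar residues 6 and 7 are adjacent in bar's own coordinates, the result is an exact 6^7, with no '<' or '>' needed.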
> If I have something on the second sequence in the gap region and am
> mapping it back to the first then it's going to end up with adjacent
> start and end.
I don't think the converse works for the reasons stated above; the
position of interest lies in a gap, so it's lossy. If I understand
this correctly, we wouldn't know exactly which gap position the
insertion would be in; the feature would map back as somewhere within
7-9 (or 7.9). BTW, I believe that latter WITHIN designation is
deprecated in the Feature Table definition.
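Running the same kind of projection in reverse shows where the information goes. This is a hypothetical sketch under the same assumptions as before (1-based residues, the two rows from the example); `map_back` is an illustrative name, not a BioPerl call. bar 6^7 only tells us that the foo site lies between foo residues 6 and 10, so every IN-BETWEEN site spanning the gapped residues 7-9 is equally consistent with the alignment.

```python
def residue_columns(gapped):
    """1-based alignment column of each residue; index 0 holds residue 1."""
    return [col for col, ch in enumerate(gapped, start=1) if ch != "-"]


def map_back(src, dst, left):
    """Project src site left^(left+1) onto dst and list every candidate site.

    When src has a gap opposite dst residues, several dst sites fit the
    alignment equally well, so the mapping is lossy (hypothetical helper).
    """
    src_cols = residue_columns(src)
    dst_cols = residue_columns(dst)
    left_col, right_col = src_cols[left - 1], src_cols[left]
    start = max((i for i, c in enumerate(dst_cols, start=1) if c <= left_col),
                default=0)
    end = min((i for i, c in enumerate(dst_cols, start=1) if c >= right_col),
              default=len(dst_cols) + 1)
    # Any IN-BETWEEN site from start^(start+1) up to (end-1)^end is possible.
    candidates = [(i, i + 1) for i in range(start, end)]
    return (start, end), candidates


foo = "aataaataaaagggcca"
bar = "aataaa---aagggcaa"
bounds, sites = map_back(bar, foo, 6)
print(bounds)  # (6, 10)
print(sites)   # [(6, 7), (7, 8), (8, 9), (9, 10)]
```

Going from foo to bar collapses to a single site, but coming back from bar yields four candidates, which is the lossiness described above.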
At least at this point, the only way I can think of to reliably
translate the position back is if the start/end is referring to the
alignment column position, not the sequence. SimpleAlign does allow
features, but I'm not sure whether they point to the alignment position
or to individual sequences within the alignment.
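The column-based idea can be sketched in Python as follows. It stores the insertion once, as a site between alignment columns (8^9 here), and derives each sequence's coordinates only when asked. This is an assumption-laden illustration: `column_to_residue_bounds` is a hypothetical helper, and nothing here describes how SimpleAlign actually anchors its features. Because the column site itself never changes, projecting it onto either row does not lose anything about where the insertion is in the alignment.

```python
def column_to_residue_bounds(gapped, col_left, col_right):
    """Project the column site col_left^col_right onto one gapped row.

    Returns (start, end): the last residue at or before col_left and the
    first residue at or after col_right, in 1-based residue coordinates.
    """
    start, end = 0, None  # 0 means "before the first residue"
    residue = 0
    for col, ch in enumerate(gapped, start=1):
        if ch == "-":
            continue  # gap characters carry no residue number
        residue += 1
        if col <= col_left:
            start = residue
        elif col >= col_right and end is None:
            end = residue
    if end is None:
        end = residue + 1  # site lies after the last residue
    return start, end


foo = "aataaataaaagggcca"
bar = "aataaa---aagggcaa"
site = (8, 9)  # hypothetical: insertion between alignment columns 8 and 9
for name, row in (("foo", foo), ("bar", bar)):
    s, e = column_to_residue_bounds(row, *site)
    print(f"{name}: {s}^{e}")
# foo: 8^9
# bar: 6^7
```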
> It seems like it'd be useful for me to just use Bio::Location::Fuzzy
> objects everywhere and use exact info when I have it. Unfortunately,
> several methods in Bio::Location::Fuzzy check for the adjacent
> start/end case and throw an exception.
>
> I'm hoping that a history lesson or other insight might help me
> understand why those checks are there. There don't seem to be any
> other checks that prevent one from specifying something exact in a
> Fuzzy, and there doesn't seem to be any restriction on specifying an
> IN-BETWEEN Fuzzy....
>
> g.
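George's suggestion, one location type for both exact and fuzzy IN-BETWEEN data, can be modelled as a small hypothetical Python value class. This is not how Bio::Location::Fuzzy is implemented; the class and field names are assumptions made for illustration. Exact ends must be adjacent (s^e), '>'/'<' qualifiers allow the ends to be farther apart (>l^<r), and adjacent fuzzy ends are accepted rather than rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InBetween:
    """Hypothetical IN-BETWEEN location, not BioPerl's Bio::Location API.

    start_fuzzy '>' means "somewhere after start"; end_fuzzy '<' means
    "somewhere before end".
    """
    start: int
    end: int
    start_fuzzy: str = ""  # "" (exact) or ">"
    end_fuzzy: str = ""    # "" (exact) or "<"

    def __post_init__(self):
        if self.start_fuzzy not in ("", ">") or self.end_fuzzy not in ("", "<"):
            raise ValueError("unsupported fuzzy qualifier")
        if self.end <= self.start:
            raise ValueError("end must be greater than start")
        exact = not self.start_fuzzy and not self.end_fuzzy
        # Only fully exact sites must be adjacent; fuzzy ends may be adjacent too.
        if exact and self.end != self.start + 1:
            raise ValueError("exact IN-BETWEEN ends must be adjacent")

    def __str__(self):
        return f"{self.start_fuzzy}{self.start}^{self.end_fuzzy}{self.end}"


print(InBetween(8, 9))             # 8^9
print(InBetween(6, 10, ">", "<"))  # >6^<10
print(InBetween(6, 7, ">", "<"))   # >6^<7
```

In this design the adjacency rule applies only when both ends are exact, so the same type can carry precise data when it is available and degrade to fuzzy bounds after a lossy mapping.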
Anyone else have thoughts? I had some issues with Location a while
back, involving (I think) how split locations handle strandedness,
but I've slept since then...
chris