[Bioperl-l] Question about Bio::Coordinate::Pair
Dan Bolser
dan.bolser at gmail.com
Fri Apr 29 09:53:33 UTC 2011
This issue (see below) got raised again recently and, with retrospect,
I think I understand better now.
In the example case here:
http://www.bioperl.org/wiki/Module_Discussion:Bio::Coordinate::Pair
what we see is that, that part of the feature to 'map' that extends
beyond the range of the '-in' sequence is (arbitrarily?) mapped to a
'gap' (a Bio::Coordinate::Result::Gap) on *that* sequence in the
result. This gap has the seq_id and strand of *that* sequence, and
that's that.
i.e. The Bio::Coordinate::Pair defines the position of (part of) a
contig in a scaffold, and we wish to 'map' some features that are in
'contig coordinates' onto the scaffold to get the same features in
'scaffold coordinates'. Any feature falling within the contig (or
within that part of the contig that we specify when creating the
Bio::Coordinate::Pair) is 'mapped' as a single 'Match'
(Bio::Coordinate::Result::Match) within the result.
If, however, feature (in 'contig coordinates') lies outside or across
the part of the contig specified when creating the
Bio::Coordinate::Pair, that (part of) the feature is 'mapped' as a
single 'Gap' (Bio::Coordinate::Result::Gap) which is given in terms of
the contig (has the contig seq_id and the contig strand).
I'm really not sure of the semantics of this... It makes sense that,
if we put half of one contig into one scaffold, that we don't want
features from the other half to be mapped onto the scaffold, but I
don't know why we would want those features to turn up as gaps, but
there you have it.
The changes I made to Bio::Coordinate::Pair are now defunct, but I'll
still request that my test script gets pulled into master [1], because
more tests can't hurt right?
Cheers,
Dan.
[1] https://github.com/bioperl/bioperl-live/pull/14
On 8 October 2010 16:57, Dan Bolser <dan.bolser at gmail.com> wrote:
> Actually, I changed the Strand of the gap in a few more places [1],
> and now there are a total of 4 failed tests, however, they all appear
> to be of the same form as the one reported below:
>
>
> not ok 84
> # Failed test at t/Coordinate/CoordinateMapper.t line 229.
> # got: '1'
> # expected: '-1'
>
> not ok 93
> # Failed test at t/Coordinate/CoordinateMapper.t line 246.
> # got: '1'
> # expected: '-1'
>
> not ok 99
> # Failed test at t/Coordinate/CoordinateMapper.t line 262.
> # got: '1'
> # expected: '-1'
>
> not ok 102
> # Failed test at t/Coordinate/CoordinateMapper.t line 265.
> # got: '1'
> # expected: '-1'
>
>
> So now you know.
>
>
> Dan.
>
> [1] http://github.com/dbolser/bioperl-live/tree/dbolser_bio_coordinate_pair_tests
>
> On 8 October 2010 16:04, Dan Bolser <dan.bolser at gmail.com> wrote:
>> Well... I tried to toggle the strand of the gap sublocation to match
>> that of the match sublocation, and overall I ended up failing one test
>> within t/Coordinate/CoordinateMapper.t, test number 55 (line 119)...
>>
>> This test actually tests for the strandedness of the gap sublocation
>> that I'm specifically changing because it leads to the 'unexpected'
>> (or at least, 'inconsistent') behaviour that I'm calling a bug (test
>> 55 is at the end, with enough preceding context to reproduce it):
>>
>> # propepide
>> my $match1 = Bio::Location::Simple->new
>> (-seq_id => 'propeptide', -start => 21, -end => 40, -strand=>1 );
>> # peptide
>> my $match2 = Bio::Location::Simple->new
>> (-seq_id => 'peptide', -start => 1, -end => 20, -strand=>1 );
>>
>> ok my $pair = Bio::Coordinate::Pair->new(-in => $match1,
>> -out => $match2,
>> -negative => 0, # false, default
>> );
>>
>> ...
>>
>> #
>> # partial match = gap & match
>> #
>> $pos2 = Bio::Location::Simple->new
>> (-start => 20, -end => 22, -strand=> -1 );
>>
>> ok $res = $pair->map($pos2);
>>
>> ...
>>
>> is $res->gap->strand, -1; # TEST 55. Fails when I 'fix' Bio::Coordinate::Pair
>>
>>
>>
>> In the absence of any other information, can I take this to mean that
>> the strand of the gap sublocations are not used for anything
>> significant?
>>
>>
>> Cheers,
>> Dan.
>>
>>
>>
>> On 5 October 2010 11:36, Dan Bolser <dan.bolser at gmail.com> wrote:
>>> Hi,
>>>
>>> Can someone describe in a bit more detail the purpose of the Gap
>>> sublocations that are sometimes returned by Bio::Coordinate::Pair [1]?
>>>
>>> I found that, according to Bio::Location::Split, if the Match and Gap
>>> sublocations have a different strand, the strand method (called via
>>> Bio::Coordinate::Result) returns undef. This is inconsistent with the
>>> way Bio::Coordinate::Result tends to behave. See the test script and
>>> results below, also pasted here [2].
>>>
>>> The question is, can I just toggle the strand of the Gap sublocation
>>> to match that of the Match sublocation? Or does the strand of the Gap
>>> sublocation encode some important but as yet undocumented information?
>>> If the strand of the Gap and Match sublocations are made to match
>>> (within Bio::Coordinate::Pair) this will simplify code that uses
>>> Bio::Coordinate::Pair, making it more consistent, and perhaps help
>>> with some other bugs [3].
>>>
>>>
>>> Cheers,
>>> Dan.
>>>
>>> [1] http://www.bioperl.org/wiki/Module:Bio::Coordinate::Pair
>>> [2] http://www.bioperl.org/wiki/Module_Discussion:Bio::Coordinate::Pair
>>> [3] http://tinyurl.com/36na2cp
>>>
>>>
>>> #!/usr/bin/perl -w
>>>
>>> ## Stress test Bio::Coordinate::Pair
>>>
>>> use strict;
>>> use Data::Dumper;
>>>
>>> use Bio::Location::Simple;
>>> use Bio::Coordinate::Pair;
>>>
>>> ## A contig
>>> my $ctg = Bio::Location::Simple->
>>> new( -seq_id => 'ctg',
>>> -start => 1,
>>> -end => 1001,
>>> -strand => +1,
>>> );
>>>
>>> ## The contigs position on a chromosome (forward)
>>> my $ctg_on_chr_f = Bio::Location::Simple->
>>> new( -seq_id => 'ctg on chr r',
>>> -start => 5001,
>>> -end => 6001,
>>> -strand => +1,
>>> );
>>>
>>> ## The contigs position on a chromosome (reverse)
>>> my $ctg_on_chr_r = Bio::Location::Simple->
>>> new( -seq_id => 'ctg on chr r',
>>> -start => 5001,
>>> -end => 6001,
>>> -strand => -1,
>>> );
>>>
>>> ## Coordinate mapping (forward)
>>> my $agp_f = Bio::Coordinate::Pair->
>>> new( -in => $ctg,
>>> -out => $ctg_on_chr_f
>>> );
>>>
>>> ## Coordinate mapping (reverse)
>>> my $agp_r = Bio::Coordinate::Pair->
>>> new( -in => $ctg,
>>> -out => $ctg_on_chr_r
>>> );
>>>
>>>
>>>
>>> ## A match, in contig coordinates...
>>> my $match_on_ctg_4 = Bio::Location::Simple->
>>> new( -seq_id => 'hit 4',
>>> -start => 925,
>>> -end => 1125,
>>> -strand => -1,
>>> );
>>>
>>> ## Map it into chromosome coordinates (forward)
>>> my $match_on_chr_4_f =
>>> $agp_f->map( $match_on_ctg_4 );
>>>
>>> print Dumper $match_on_chr_4_f, "\n";
>>>
>>> ## Map it into chromosome coordinates (reverse)
>>> my $match_on_chr_4_r =
>>> $agp_r->map( $match_on_ctg_4 );
>>>
>>> print Dumper $match_on_chr_4_r, "\n";
>>>
>>> __END__
>>>
>>> $VAR1 = bless( {
>>> '_sublocations' => [
>>> bless( {
>>> '_strand' => -1,
>>> '_seqid' => 'ctg on chr r',
>>> '_start' => 5925,
>>> '_location_type' => 'EXACT',
>>> '_end' => 6001
>>> },
>>> 'Bio::Coordinate::Result::Match' ),
>>> bless( {
>>> '_strand' => -1,
>>> '_seqid' => 'ctg',
>>> '_location_type' => 'EXACT',
>>> '_start' => 1002,
>>> '_end' => 1125
>>> }, 'Bio::Coordinate::Result::Gap' )
>>> ],
>>> '_gap' => $VAR1->{'_sublocations'}[1],
>>> 'strand' => -1,
>>> '_match' => $VAR1->{'_sublocations'}[0],
>>> '_splittype' => 'JOIN'
>>> }, 'Bio::Coordinate::Result' );
>>> $VAR2 = '
>>> ';
>>> $VAR1 = bless( {
>>> '_sublocations' => [
>>> bless( {
>>> '_strand' => 1,
>>> '_seqid' => 'ctg on chr r',
>>> '_start' => 5001,
>>> '_location_type' => 'EXACT',
>>> '_end' => 5077
>>> },
>>> 'Bio::Coordinate::Result::Match' ),
>>> bless( {
>>> '_strand' => -1,
>>> '_seqid' => 'ctg',
>>> '_location_type' => 'EXACT',
>>> '_start' => 1002,
>>> '_end' => 1125
>>> }, 'Bio::Coordinate::Result::Gap' )
>>> ],
>>> '_gap' => $VAR1->{'_sublocations'}[1],
>>> 'strand' => 1,
>>> '_match' => $VAR1->{'_sublocations'}[0],
>>> '_splittype' => 'JOIN'
>>> }, 'Bio::Coordinate::Result' );
>>> $VAR2 = '
>>> ';
>>>
>>
>
More information about the Bioperl-l
mailing list