[Bioperl-l] Odd behaviour in Bio::SeqFeature::Gene::Transcript

Thu May 12 13:50:08 EDT 2005

On May 12, 2005, at 7:30 AM, Oliver Burren wrote:

> Dear Developers,
>
> I'm working with Bio::SeqFeature::Gene::Transcript and I am getting  
> some
> odd behaviour. In a nutshell I'm getting an error depending on which
> order i add exons to a transcript when I try and dump introns.
>
> Best illustrated with a script which I attach (test_intron.pl).
>
> Here is the output that I get :
> 229 machine /home/xxx % perl test_intron.pl
> Building transcript on -ve strand
> Exon Order is utr3prime,exon3,exon2,exon1,utr5prime
> Strand is set to -1
> SEQ             intron  31      30      .       -       .
> SEQ             intron  21      25      .       -       .
> SEQ             intron  11      15      .       -       .
> SEQ             intron  6       5       .       -       .
> Exon Order is utr5prime,exon1,exon2,exon3,utr3prime
> Strand is set to -1
>
> ------------- EXCEPTION  -------------
> MSG: Intron gap begins after '10' and ends before '1'
> STACK Bio::SeqFeature::Gene::Intron::location /home/xxxxx/bioperl-
> live/Bio/SeqFeature/Gene/Intron.pm:288
> STACK Bio::SeqFeature::Generic::strand /home/xxxxx/bioperl-
> live/Bio/SeqFeature/Generic.pm:356
> STACK Bio::Tools::GFF::_gff2_string /home/xxxxx/bioperl-
> live/Bio/Tools/GFF.pm:777
> STACK Bio::Tools::GFF::gff_string /home/xxxxx/bioperl-
> live/Bio/Tools/GFF.pm:680STACK
> Bio::SeqFeature::Generic::gff_string /home/xxxxx/bioperl-
> live/Bio/SeqFeature/Generic.pm:762
> STACK toplevel test_intron.pl:56
>
> --------------------------------------
>
>
>
> Is this the expected behaviour or a feature or more likely have I  
> made a
> mistake somwhere ?
>
>
> I can fix it by the following. However this is probably the wrong  
> thing
> to do. It can also be fixed by sorting exons before addition to
> transcript.
>

Sorting needs to take into account strandedness - it should be  
handled by the transcript object I would think.   I am assuming we  
won't ever have exons in the same transcript which are actually  
represented on different contigs.  This would then make the sorting  
not work - we run into the same problem in Bio::SeqFeatureI- 
 >spliced_seq function too.

So the sort needs to have a strand as part of the equation.  I think  
this takes care of it.  Multiply start by the strand in the  
schwartzian transformation.

$ cvs diff Bio/SeqFeature/Gene/Transcript.pm
Index: Bio/SeqFeature/Gene/Transcript.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Gene/ 
Transcript.pm,v
retrieving revision 1.33
diff -r1.33 Transcript.pm
294c294,295
<         @exons = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map  
{ [ $_, $_->start()] } @exons;
---
 >         @exons = map { $_->[0] } sort { $a->[1] <=> $b->[1] }
 >                map { [ $_, $_->start * ($_->strand || 1)] } @exons;

I can check this in if it works - seems to for your example.

--jason

>
> cvs diff Transcript.pm
> Index: Transcript.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-
> live/Bio/SeqFeature/Gene/Transcript.pm,v
> retrieving revision 1.33
> diff -r1.33 Transcript.pm
> 287c287
> <     $rev_order = ($exons[0]->end() < $exons[1]->start() ? 0 : 1);
> ---
>
>>     #$rev_order = ($exons[0]->end() < $exons[1]->start() ? 0 : 1);
>>
> 291c291,292
> <     if((! defined($strand)) || ($strand != -1) || (! $rev_order)) {
> ---
>
>>     #if((! defined($strand)) || ($strand != -1) || (! $rev_order)) {
>>     if((! defined($strand)) || ($strand != -1)) {
>>
>
>
> Thanks
>
> Olly Burren
> JDRF/WT DIL
>
> <test_intron.pl>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l