[Bioperl-l] Unusual behaviour of SeqIO::tigr

Morten Lindow morten at binf.ku.dk
Mon Mar 8 05:48:51 EST 2004


Thanks Josh, that helped.

In case somebody else can use it, I am pasting the complete code below. It
prints the global coordinates of every location, one per line.

Another thing: when parsing the rice pseudochromosome files I get a few
warnings like these (from around 20-30 sequences per chromosome):

 From chr01.xml

-------------------- WARNING ---------------------
MSG: Seq [8350.t06547]: Not using a valid terminator codon!
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Seq [8350.t06547]: Terminator codon inside CDS!
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Seq [8350.t06547]: Not using a valid initiator codon!
---------------------------------------------------

Since the problem only affects a small fraction of the TUs, I guess it is
not a bioperl problem but rather a TIGR problem. Any comments?


 - Morten

---- BEGIN code to get a feature table from tigrxml ----

#!/usr/bin/perl

use warnings;
use strict;
use lib "$ENV{HOME}/bioperl/bioperl-live";
use Bio::SeqIO;


my $file = shift;

my $tigrin = Bio::SeqIO->new( -format => 'tigr', -file => $file);
 
while (my $seq = $tigrin->next_seq){
   
    # Get the global 5' and 3' coordinates of this sequence (TU)
    my ($source) = grep { $_->primary_tag() eq "source" } $seq->get_SeqFeatures();
    my ($end5)   = $source->get_tag_values('end5');
    my ($end3)   = $source->get_tag_values('end3');
    my ($strand) = $end3 <=> $end5;    # 1 if end3 > end5, -1 if end3 < end5

    foreach my $feat ($seq->get_all_SeqFeatures) {
        next if $feat->primary_tag eq 'source';
        foreach my $location ($feat->location->each_Location) {
            # Map the TU-relative location onto chromosome coordinates
            my $start = $end5 + ($location->start() - 1) * $strand;
            my $end   = $end5 + ($location->end()   - 1) * $strand;
            # One tab-separated line per location
            print join("\t", $feat->primary_tag, $seq->id, $start, $end, $strand), "\n";
        }
    }
#    if ($seq->id eq '8350.t06547'){
#    print $seq->seq;
#    die $!
#    }
}
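
To sanity-check the coordinate arithmetic without a TIGR XML file at hand,
here is a small stand-alone sketch. It does not use bioperl. The helper name
tu_to_global and the example coordinates are made up for illustration, and it
assumes that locations inside a TU are 1-based and counted from the TU's 5'
end, which is what the formula above implies. Unlike the script above, it also
swaps start and end for TUs on the reverse strand, so that start <= end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): map a TU-relative location
# onto chromosome coordinates, given the TU's end5/end3 tag values.
sub tu_to_global {
    my ($end5, $end3, $loc_start, $loc_end) = @_;

    # 1 when end3 > end5 (forward strand), -1 when end3 < end5 (reverse)
    my $strand = $end3 <=> $end5;

    my $start = $end5 + ($loc_start - 1) * $strand;
    my $end   = $end5 + ($loc_end   - 1) * $strand;

    # On the reverse strand the mapped start is larger than the mapped end;
    # swap them so that start <= end.
    ($start, $end) = ($end, $start) if $start > $end;

    return ($start, $end, $strand);
}

# Made-up TU on the forward strand (end5=1000, end3=2000), location 1..100
print join("\t", tu_to_global(1000, 2000, 1, 100)), "\n";   # 1000 1099 1

# The same location in a TU on the reverse strand (end5=2000, end3=1000)
print join("\t", tu_to_global(2000, 1000, 1, 100)), "\n";   # 1901 2000 -1
```

Whether you want the swap depends on what reads the table afterwards. Without
it, a start larger than the end tells you the feature is on the reverse strand
even if the strand column is dropped.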




Josh Lauricha wrote:

>On Thu 03/04/04 16:57, Morten Lindow wrote:
>  
>
>>I am trying to build a table of genomic features from the tigrxml-format 
>>of a rice (pseudo)chromosome.
>>
>>However the tigr-parser seems to behave differently from genbank/embl: 
>>tigr.pm considers every TU/etc an individual sequence, and hence resets 
>>its coordinate system every time it starts on a new TU.
>>
>>My question is:
>>Is there a bioperl-way to get to the global coordinates, like when I am 
>>parsing a genbankfile of a whole chromosome?
>>    
>>
>
>Sure:
>
>my $tigrin = Bio::SeqIO->new( -format => 'tigr', -file => 'chr01.xml');
> 
>while (my $seq = $tigrin->next_seq){
>    my ($source) = grep { $_->primary_tag() eq "source" }
>        $seq->get_SeqFeatures();
>
>    # These are the 5' and 3' ends of each TU
>    my($end5) = $source->get_tag_values('end5');
>    my($end3) = $source->get_tag_values('end3');
>    my($strand) = $end3 <=> $end5;
>
>    # Then foreach location just do:
>    my $loc = get_some_location....
>    $start = $end5 + ($loc->start() - 1)*$strand;
>    $end   = $end5 + ($loc->end()   - 1)*$strand;
>    ...
>}
>
>This bit of code isn't tested, but I've been using these a lot, so it is
>probably correct.
>
>  
>



