[Bioperl-l] Unusual behaviour of SeqIO::tigr
Morten Lindow
morten at binf.ku.dk
Mon Mar 8 05:48:51 EST 2004
Thanks Josh, that helped.
In case somebody else can use it, I am pasting the complete code below; it
prints the global coordinates of every location, one location per line.
But another thing: when parsing the rice pseudochromosome files I get a few
warnings like these (from around 20-30 sequences per chromosome):
From chr01.xml:
-------------------- WARNING ---------------------
MSG: Seq [8350.t06547]: Not using a valid terminator codon!
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: Seq [8350.t06547]: Terminator codon inside CDS!
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: Seq [8350.t06547]: Not using a valid initiator codon!
---------------------------------------------------
Since the problem affects only a small fraction of the TUs, I guess it is
not a BioPerl problem but rather a problem in the TIGR data? Any comments?
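For anyone who wants to check whether the sequences themselves are at fault, here is a rough plain-Perl re-check of what those three warnings describe. It is a sketch under stated assumptions, not BioPerl's own check: it assumes the standard genetic code (stops TAA, TAG, TGA) and treats ATG as the only valid initiator, whereas BioPerl works from its codon tables and may accept other start codons. The sub name `cds_problems` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stop codons of the standard genetic code (assumption: nuclear genes;
# other codon tables would need a different set).
my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Hypothetical helper: return a list of problems found in a CDS given
# as a DNA string, mirroring the three warnings (bad initiator,
# internal terminator, missing terminator). Empty list = no problems.
sub cds_problems {
    my ($cds) = @_;
    $cds = uc $cds;
    my @problems;
    push @problems, 'length is not a multiple of 3' if length($cds) % 3;

    # Split into complete codons; a trailing partial codon is ignored.
    my @codons = $cds =~ /(.{3})/g;
    return (@problems, 'no complete codons') unless @codons;

    # Assumption: ATG is the only accepted initiator here.
    push @problems, 'no ATG initiator' unless $codons[0] eq 'ATG';
    push @problems, 'no terminator at end' unless $STOP{ $codons[-1] };

    # Any stop codon before the last codon is an internal terminator.
    for my $i (0 .. $#codons - 1) {
        if ($STOP{ $codons[$i] }) {
            push @problems, 'internal stop at codon ' . ($i + 1);
            last;    # report only the first one
        }
    }
    return @problems;
}

# Usage with a made-up sequence: ATG AAA TAA GGG TGA
print "$_\n" for cds_problems('ATGAAATAAGGGTGA');
# prints: internal stop at codon 3
```

To run it on the TIGR data you would need the spliced CDS sequence of each flagged gene model (for example 8350.t06547); how best to get that depends on the features the tigr parser creates, so that step is left out here.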
- Morten
----BEGIN code to get a feature table from TIGR XML
#!/usr/bin/perl
use warnings;
use strict;
# Point this at your bioperl-live checkout, or drop it if BioPerl is
# installed system-wide.
use lib "$ENV{HOME}/bioperl/bioperl-live";
use Bio::SeqIO;

my $file = shift or die "Usage: $0 chrNN.xml\n";
my $tigrin = Bio::SeqIO->new( -format => 'tigr', -file => $file );

while (my $seq = $tigrin->next_seq) {
    # Get global start and end coordinates for this sequence (TU)
    my ($source) = grep { $_->primary_tag() eq "source" }
        $seq->get_SeqFeatures();
    next unless defined $source;    # skip sequences without a source feature
    my ($end5) = $source->get_tag_values('end5');
    my ($end3) = $source->get_tag_values('end3');
    # +1 if the TU runs forward on the chromosome, -1 if reverse
    my $strand = $end3 <=> $end5;

    foreach my $feat ($seq->get_all_SeqFeatures) {
        next if $feat->primary_tag eq 'source';
        # Split locations (e.g. exons) give one output line each
        foreach my $location ($feat->location->each_Location) {
            my $start = $end5 + ($location->start() - 1) * $strand;
            my $end   = $end5 + ($location->end()   - 1) * $strand;
            # tag, TU id, global start, global end, strand
            print join("\t", $feat->primary_tag, $seq->id,
                             $start, $end, $strand), "\n";
        }
    }
    # Debugging aid: dump one problem sequence and stop
    # if ($seq->id eq '8350.t06547') {
    #     print $seq->seq;
    #     exit;
    # }
}
----END code
Josh Lauricha wrote:
>On Thu 03/04/04 16:57, Morten Lindow wrote:
>
>
>>I am trying to build a table of genomic features from the TIGR XML format
>>of a rice (pseudo)chromosome.
>>
>>However, the tigr parser seems to behave differently from the GenBank/EMBL
>>parsers: tigr.pm considers every TU etc. an individual sequence, and hence
>>resets its coordinate system every time it starts on a new TU.
>>
>>My question is:
>>Is there a BioPerl way to get the global coordinates, as when I am
>>parsing a GenBank file of a whole chromosome?
>>
>>
>
>Sure:
>
>my $tigrin = Bio::SeqIO->new( -format => 'tigr', -file => 'chr01.xml');
>
>while (my $seq = $tigrin->next_seq){
>    my ($source) = grep { $_->primary_tag() eq "source" }
>        $seq->get_SeqFeatures();
>
>    # These are the 5' and 3' ends of each TU
>    my ($end5) = $source->get_tag_values('end5');
>    my ($end3) = $source->get_tag_values('end3');
>    my $strand = $end3 <=> $end5;
>
>    # Then for each location of each feature, just do:
>    foreach my $feat ($seq->get_all_SeqFeatures) {
>        foreach my $loc ($feat->location->each_Location) {
>            my $start = $end5 + ($loc->start() - 1)*$strand;
>            my $end   = $end5 + ($loc->end() - 1)*$strand;
>            # ... do something with $start and $end
>        }
>    }
>}
>
>This bit of code isn't tested, but I've been using these a lot, so it is
>probably correct.
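Josh's coordinate formula can be pulled out into a small stand-alone sub and tested without BioPerl. This is a hypothetical sketch (the names `local_to_global` and `local_range_to_global` are made up); it assumes end5/end3 are the 1-based chromosome coordinates of the TU's first and last base, and that local position 1 is the base at end5. Unlike the code above, it treats a one-base TU (end3 == end5, where `<=>` returns 0) as forward strand, and it swaps reverse-strand pairs so that start <= end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a 1-based position within a TU onto the pseudochromosome, given
# the TU's end5/end3 values (read from its "source" feature).
sub local_to_global {
    my ($end5, $end3, $pos) = @_;
    # +1 if the TU runs forward (end3 > end5), -1 if reverse.
    # A one-base TU makes <=> return 0; treat it as forward.
    my $strand = ($end3 <=> $end5) || 1;
    return $end5 + ($pos - 1) * $strand;
}

# Convert a local [start, end] pair and return (start, end, strand)
# with start <= end, which is easier to sort or load into a database
# than reversed coordinates.
sub local_range_to_global {
    my ($end5, $end3, $start, $end) = @_;
    my $strand = ($end3 <=> $end5) || 1;
    my $gs = local_to_global($end5, $end3, $start);
    my $ge = local_to_global($end5, $end3, $end);
    ($gs, $ge) = ($ge, $gs) if $gs > $ge;
    return ($gs, $ge, $strand);
}

# Usage with made-up numbers: a reverse-strand TU spanning 5000..4001.
my ($s, $e, $str) = local_range_to_global(5000, 4001, 10, 20);
print join("\t", $s, $e, $str), "\n";
# prints 4981, 4991 and -1, tab-separated
```

Keeping the strand in its own column, rather than encoding it as start > end, means downstream tools can sort features by start without special cases for the reverse strand.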