[Bioperl-l] Unusual behaviour of SeqIO::tigr

Josh Lauricha laurichj at bioinfo.ucr.edu
Fri Mar 5 13:45:10 EST 2004


On Thu 03/04/04 16:57, Morten Lindow wrote:
> I am trying to build a table of genomic features from the tigrxml-format 
> of a rice (pseudo)chromosome.
> 
> However, the tigr parser seems to behave differently from genbank/embl: 
> tigr.pm considers every TU/etc. an individual sequence, and hence resets 
> its coordinate system every time it starts on a new TU.
> 
> My question is:
> Is there a BioPerl way to get to the global coordinates, like when I am 
> parsing a GenBank file of a whole chromosome?

Sure:

use Bio::SeqIO;

my $tigrin = Bio::SeqIO->new( -format => 'tigr', -file => 'chr01.xml');

while (my $seq = $tigrin->next_seq) {
    my ($source) = grep { $_->primary_tag() eq "source" }
        $seq->get_SeqFeatures();

    # These are the 5' and 3' ends of each TU
    my ($end5) = $source->get_tag_values('end5');
    my ($end3) = $source->get_tag_values('end3');

    # +1 if the TU is on the forward strand, -1 if it is on the reverse strand
    my $strand = $end3 <=> $end5;

    # Then foreach location just do:
    my $loc = get_some_location....
    my $start = $end5 + ($loc->start() - 1)*$strand;
    my $end   = $end5 + ($loc->end()   - 1)*$strand;
    # Note: for a reverse-strand TU, $start comes out greater than $end
    ...
}

This bit of code isn't tested, but I've been using this approach a lot,
so it is probably correct.
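
To make the arithmetic above concrete, here is a small standalone sketch
that needs no BioPerl at all. It assumes, as the formula above does, that
positions inside a TU are 1-based and counted from end5. The helper names
tu_to_global and tu_range_to_global are made up for this example and are
not part of BioPerl, and the end5/end3 values are invented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): map a 1-based position that is
# local to a TU onto the coordinate system of the whole assembly.
sub tu_to_global {
    my ($end5, $end3, $pos) = @_;
    my $strand = $end3 <=> $end5;    # +1 forward, -1 reverse
    return $end5 + ($pos - 1) * $strand;
}

# Hypothetical helper: map a local range and return (start, end, strand)
# with start <= end. The strand returned is that of the TU itself.
sub tu_range_to_global {
    my ($end5, $end3, $start, $end) = @_;
    my ($s, $e) = sort { $a <=> $b }
        map { tu_to_global($end5, $end3, $_) } ($start, $end);
    return ($s, $e, $end3 <=> $end5);
}

# Forward TU with end5 = 1000, end3 = 2000 (invented values)
print join(' ', tu_range_to_global(1000, 2000, 1, 101)), "\n";  # 1000 1100 1

# Reverse TU with end5 = 2000, end3 = 1000 (invented values)
print join(' ', tu_range_to_global(2000, 1000, 1, 101)), "\n";  # 1900 2000 -1
```

Sorting the two mapped ends is only a convenience for building a table
where start <= end; if you want to keep the 5'-to-3' order, use
tu_to_global on each end directly.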

-- 

------------------------------------------------------
| Josh Lauricha            | Ford, your turning into |
| laurichj at bioinfo.ucr.edu | a penguin. Stop it.     |
| Bioinformatics, UCR      |                         |
|----------------------------------------------------|
| OpenPG:                                            |
|  5A0D 92D3 D093 79DE F724 1137 6DF1 B5EB D9CE AAA8 |
|----------------------------------------------------|

