[Bioperl-l] longest ORF

Mon, 29 Jul 2002 20:26:49 -0400 (EDT)

Something like this:

#!/usr/bin/perl -w

use strict;

# get a $seq Bio::Seq object somehow ...
my $dna = $seq->seq;

my @starts;
my @stops;

while ($dna =~ m/atg/gi) {
    push @starts, pos($dna) - 2;
}

while ($dna =~ m/(taa) | (tga) | (tag)/gi) {
    push @ends, pos($dna) - 2;
}

my $best = 0;
my ($bests, $beste, $beststrand) = (-1) x 3;
for my $s (@starts) {
    for my $e (@ends) {
        if($e - $b + 1 > $best) {
            $best = $e - $b + 1;
            ($bests, $beste, $beststrand) = ($s, $e, 1);
        }
    }
}

# repeat for reverse strand; left as an exercise for the reader.
# report $bests, $beste, $beststrand, perhaps build a new Seq.pm object

As always, this is all good fodder for a Tools/ORFCaller.pm or somesuch.
Like Tools::IUPAC, Tools::ORFs could return a stream of Seq's representing
each orf (with parameters for desired strandedness, minimum length,
genetic code (to use with Tools::CodonTable, etc).

-Aaron

On Tue, 30 Jul 2002, Dan Kortschak wrote:

> Hi All, I'm looking to see if anyone has a small script to take a Bio::Seq
> object and find the longest ORF ((+) strand or both strands).
> Altenatively, is there an easy way to do this using Bio::Seq (translate
> doesn't seem to take a frame for translation so I'm guessing that it
> translates in frame 1 - is this right?)
>
> thanks
> Dan Kortschak
>
>

-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey@virginia.edu