[Bioperl-l] longest ORF
Aaron J Mackey
Aaron J. Mackey" <amackey@virginia.edu
Mon, 29 Jul 2002 20:26:49 -0400 (EDT)
Something like this:
#!/usr/bin/perl -w
use strict;
# get a $seq Bio::Seq object somehow ...
my $dna = $seq->seq;
my @starts;
my @stops;
while ($dna =~ m/atg/gi) {
push @starts, pos($dna) - 2;
}
while ($dna =~ m/(taa) | (tga) | (tag)/gi) {
push @ends, pos($dna) - 2;
}
my $best = 0;
my ($bests, $beste, $beststrand) = (-1) x 3;
for my $s (@starts) {
for my $e (@ends) {
if($e - $b + 1 > $best) {
$best = $e - $b + 1;
($bests, $beste, $beststrand) = ($s, $e, 1);
}
}
}
# repeat for reverse strand; left as an exercise for the reader.
# report $bests, $beste, $beststrand, perhaps build a new Seq.pm object
As always, this is all good fodder for a Tools/ORFCaller.pm or somesuch.
Like Tools::IUPAC, Tools::ORFs could return a stream of Seq's representing
each orf (with parameters for desired strandedness, minimum length,
genetic code (to use with Tools::CodonTable, etc).
-Aaron
On Tue, 30 Jul 2002, Dan Kortschak wrote:
> Hi All, I'm looking to see if anyone has a small script to take a Bio::Seq
> object and find the longest ORF ((+) strand or both strands).
> Altenatively, is there an easy way to do this using Bio::Seq (translate
> doesn't seem to take a frame for translation so I'm guessing that it
> translates in frame 1 - is this right?)
>
> thanks
> Dan Kortschak
>
>
--
Aaron J Mackey
Pearson Laboratory
University of Virginia
(434) 924-2821
amackey@virginia.edu