[Bioperl-l] longest ORF
Dan Kortschak
Dan Kortschak <kortschak@rsbs.anu.edu.au>
Tue, 30 Jul 2002 17:49:50 +1000 (EST)
Hi Arron, I've fiddled some with the barebones perl you posted (there were
issues with codons being in frame or not as well and some BioPerl syntax
things). Thanks BTW. As it stands (with exception of the following), it
works, but only on real ORFs (which sometimes aren't that useful).
What I'm trying to do with the m/(atg|^!taa|^!tga|^!tag)/gi and
m/(taa|tga|tag|...$)/gi is to let the ORF start at the beginning of the
dna so long as it is not a stop and allow it to run-off at the end.
However my tests show that the orfs always start at an atg. Can anyone
help here?
thanks
Dan Kortschak
#!/usr/bin/perl
use Getopt::Long;
use Bio::SeqIO;
my ($sequencefile,$sequenceformat,$help) = (undef, 'fasta',undef);
&GetOptions('input|i=s' => \$sequencefile,
'format|f=s' => \$sequenceformat,
'help|h' => \$help,
);
if ($help) {
exec('perldoc', $0); # when I can be bothered
die;
}
sub longestORF {
my $best=0;
my ($bests, $beste, $beststrand) = (-1,-1,0);
my $dna=Bio::Seq->new(-seq => $_[0]);
my %strand=('+'=>$dna->seq,
'-'=>$dna->revcom->seq);
foreach $direction (keys %strand) {
my @starts;
my @stops;
while ($strand{$direction}=~m/(atg|^!taa|^!tga|^!tag)/gi) {
push @starts,pos($strand{$direction})-2;
}
while ($strand{$direction}=~m/(taa|tga|tag|...$)/gi) {
push @ends,pos($strand{$direction})-2;
}
for my $s (@starts) {
for my $e (@ends) {
if ($e%3==$s%3 and $e>$s) {
if ($e-$s>$best) {
$best=$e-$s;
($bests,$beste,$beststrand) = ($s, $e, $direction);
$bestorf=Bio::Seq->new(-seq=>$strand{$direction})->subseq($s,$e);
}
last
} else {
next
}
}
}
@starts=0;
@ends=0;
}
return ($best, $bests, $beste, $beststrand, $bestorf);
}
my $seqio = new Bio::SeqIO('-format' => $sequenceformat,
'-file' => $sequencefile );
while (my $dna = $seqio->next_seq) {
print $dna->display_id," ",$dna->desc,": ";
$seq=$dna->seq;
($length,$start,$end,$direction,$sequence)=longestORF($seq);
print $length," ",Bio::Seq->new(-seq=>$sequence)->translate->seq,"\n";
}
On Mon, 29 Jul 2002, Aaron J Mackey wrote:
>
> Something like this:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> # get a $seq Bio::Seq object somehow ...
> my $dna = $seq->seq;
>
> my @starts;
> my @stops;
>
> while ($dna =~ m/atg/gi) {
> push @starts, pos($dna) - 2;
> }
>
> while ($dna =~ m/(taa) | (tga) | (tag)/gi) {
> push @ends, pos($dna) - 2;
> }
>
> my $best = 0;
> my ($bests, $beste, $beststrand) = (-1) x 3;
> for my $s (@starts) {
> for my $e (@ends) {
> if($e - $b + 1 > $best) {
> $best = $e - $b + 1;
> ($bests, $beste, $beststrand) = ($s, $e, 1);
> }
> }
> }
>
> # repeat for reverse strand; left as an exercise for the reader.
> # report $bests, $beste, $beststrand, perhaps build a new Seq.pm object
>
> As always, this is all good fodder for a Tools/ORFCaller.pm or somesuch.
> Like Tools::IUPAC, Tools::ORFs could return a stream of Seq's representing
> each orf (with parameters for desired strandedness, minimum length,
> genetic code (to use with Tools::CodonTable, etc).
>
> -Aaron
>
>
> On Tue, 30 Jul 2002, Dan Kortschak wrote:
>
> > Hi All, I'm looking to see if anyone has a small script to take a Bio::Seq
> > object and find the longest ORF ((+) strand or both strands).
> > Altenatively, is there an easy way to do this using Bio::Seq (translate
> > doesn't seem to take a frame for translation so I'm guessing that it
> > translates in frame 1 - is this right?)
> >
> > thanks
> > Dan Kortschak
> >
> >
>
>
--
_____________________________________________________________ .`.`o
o| ,\__ `./`r
Dan Kortschak kortschak@rsbs.anu.spanner.edu.au <\/ \_O> O
"|`...'.\
Before you criticise a man, try to walk a mile in his ` :\
shoes. Then, if he doesn't like what you have to say, : \
you'll be a mile away, and you'll have his shoes. : \
The address above will not work, remove the spanner from the works.
By replying to this email you implicitly accept that your response may
be forwarded to other recipients.
Permission is granted for fair use reproduction.