[Bioperl-l] longest ORF

Dan Kortschak Dan Kortschak <kortschak@rsbs.anu.edu.au>
Wed, 31 Jul 2002 10:32:06 +1000 (EST)


Yeah, I realised that on the way home - I was too stuck in doing the thing
with REs and didn't think about the problem. I don't have the scripts dir
in my distro (debian woody uses bioperl 1.0.1 and I haven't cpaned the
latest version).

BTW You need to push all 3 frames (the frame is crucial to this problem)
on otherwise it won't necessarily find the appropriate ORF (also you need
to push all final run-offs on.

This leaves the script as follows (if its worth putting in as a demo
script or including - after OOing and making it accept strict it a bit -
please let me know):

#!/usr/bin/perl

use Getopt::Long;
use Bio::SeqIO;

my ($sequencefile,$sequenceformat,$help) = (undef, 'fasta',undef);

&GetOptions('input|i=s'              => \$sequencefile,
            'format|f=s'             => \$sequenceformat,
            'help|h'                 => \$help,
            );

if ($help) {
   exec('perldoc', $0); # when I can be bothered
   die;
}

sub longestORF {
   my $best=0;
   my ($bests, $beste, $beststrand) = (-1,-1,0);

   $relaxed=$_[1];
   my $dna=Bio::Seq->new(-seq => $_[0]);
   my %strand=('+'=>$dna->seq,
               '-'=>$dna->revcom->seq);

   foreach $direction (keys %strand) {
      my @starts;
      my @stops;
      push @starts,(1,2,3) if $relaxed;
      while ($strand{$direction}=~m/(atg)/gi) {
         push @starts,pos($strand{$direction})-2;
      }

      while ($strand{$direction}=~m/(taa|tga|tag)/gi) {
         push @ends,pos($strand{$direction})-2;
      }
      push @ends,($dna->length-2,$dna->length-1,$dna->length);

      for my $s (@starts) {
         for my $e (@ends) {
            if ($e%3==$s%3 and $e>$s) {
               if ($e-$s>$best) {
                  $best=$e-$s;
                  ($bests,$beste,$beststrand) = ($s, $e, $direction);
                  $bestorf=Bio::Seq->new(-seq=>$strand{$direction})->subseq($s,$e);
               }
            last
            } else {
               next
            }
         }
      }
      @starts=0;
      @ends=0;
   }
   return ($best, $bests, $beste, $beststrand, $bestorf);
}


my $seqio = new Bio::SeqIO('-format' => $sequenceformat,
                           '-file'   => $sequencefile );

while (my $dna = $seqio->next_seq) {
   print $dna->display_id," ",$dna->desc,": ";
   $seq=$dna->seq;
   print longestORF($seq,true),"\n";
}

On Tue, 30 Jul 2002, Navdeep Jaitly wrote:

> its your regular expressions.. The ! does not work as you might expect..
> try this instead.
> Deep
>
>
> foreach $direction (keys %strand) {
>   my @starts;
>   my @stops;
>   while ($strand{$direction}=~m/(^(?!taa)|^(?!tga)|^(?!tag)|atg)/gi)
>   {
>      my $pos = pos($strand{$direction})- 2  ;
>      if ($pos <= 0)
>      {
>        $pos = 1 ;
>      }
>      push @starts,$pos;
>   }
>
>   while ($strand{$direction}=~m/(taa|tga|tag|...$)/gi)
>   {
>      my $pos = pos($strand{$direction})- 2  ;
>      if ($pos <= 0)
>      {
>         $pos = 1 ;
>      }
>      push @ends,$pos;
>   }

-- 
_____________________________________________________________   .`.`o
                                                         o| ,\__ `./`r
  Dan Kortschak    kortschak@rsbs.anu.spanner.edu.au     <\/    \_O> O
                                                          "|`...'.\
  Before you criticise a man, try to walk a mile in his    `      :\
  shoes. Then, if he doesn't like what you have to say,           : \
  you'll be a mile away, and you'll have his shoes.               :  \

  The address above will not work, remove the spanner from the works.

By replying to this email you implicitly accept that your response may
be forwarded to other recipients.
Permission is granted for fair use reproduction.