[Bioperl-l] longest ORF

Brian Osborne brian_osborne@cognia.com
Wed, 31 Jul 2002 09:01:08 -0400


Dan,

>> (if its worth putting in as a demo
>> script or including - after OOing and making it accept strict it a bit -
>> please let me know):

Yes, it's a neat example script and one that's oft-requested. Make it strict
and I or we will put it into the examples/ directory under your name.

Brian O.


-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]On
Behalf Of Dan Kortschak
Sent: Tuesday, July 30, 2002 8:32 PM
To: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] longest ORF

Yeah, I realised that on the way home - I was too stuck in doing the thing
with REs and didn't think about the problem. I don't have the scripts dir
in my distro (debian woody uses bioperl 1.0.1 and I haven't cpaned the
latest version).

BTW You need to push all 3 frames (the frame is crucial to this problem)
on otherwise it won't necessarily find the appropriate ORF (also you need
to push all final run-offs on.

This leaves the script as follows (if its worth putting in as a demo
script or including - after OOing and making it accept strict it a bit -
please let me know):

#!/usr/bin/perl

use Getopt::Long;
use Bio::SeqIO;

my ($sequencefile,$sequenceformat,$help) = (undef, 'fasta',undef);

&GetOptions('input|i=s'              => \$sequencefile,
            'format|f=s'             => \$sequenceformat,
            'help|h'                 => \$help,
            );

if ($help) {
   exec('perldoc', $0); # when I can be bothered
   die;
}

sub longestORF {
   my $best=0;
   my ($bests, $beste, $beststrand) = (-1,-1,0);

   $relaxed=$_[1];
   my $dna=Bio::Seq->new(-seq => $_[0]);
   my %strand=('+'=>$dna->seq,
               '-'=>$dna->revcom->seq);

   foreach $direction (keys %strand) {
      my @starts;
      my @stops;
      push @starts,(1,2,3) if $relaxed;
      while ($strand{$direction}=~m/(atg)/gi) {
         push @starts,pos($strand{$direction})-2;
      }

      while ($strand{$direction}=~m/(taa|tga|tag)/gi) {
         push @ends,pos($strand{$direction})-2;
      }
      push @ends,($dna->length-2,$dna->length-1,$dna->length);

      for my $s (@starts) {
         for my $e (@ends) {
            if ($e%3==$s%3 and $e>$s) {
               if ($e-$s>$best) {
                  $best=$e-$s;
                  ($bests,$beste,$beststrand) = ($s, $e, $direction);

$bestorf=Bio::Seq->new(-seq=>$strand{$direction})->subseq($s,$e);
               }
            last
            } else {
               next
            }
         }
      }
      @starts=0;
      @ends=0;
   }
   return ($best, $bests, $beste, $beststrand, $bestorf);
}


my $seqio = new Bio::SeqIO('-format' => $sequenceformat,
                           '-file'   => $sequencefile );

while (my $dna = $seqio->next_seq) {
   print $dna->display_id," ",$dna->desc,": ";
   $seq=$dna->seq;
   print longestORF($seq,true),"\n";
}

On Tue, 30 Jul 2002, Navdeep Jaitly wrote:

> its your regular expressions.. The ! does not work as you might expect..
> try this instead.
> Deep
>
>
> foreach $direction (keys %strand) {
>   my @starts;
>   my @stops;
>   while ($strand{$direction}=~m/(^(?!taa)|^(?!tga)|^(?!tag)|atg)/gi)
>   {
>      my $pos = pos($strand{$direction})- 2  ;
>      if ($pos <= 0)
>      {
>        $pos = 1 ;
>      }
>      push @starts,$pos;
>   }
>
>   while ($strand{$direction}=~m/(taa|tga|tag|...$)/gi)
>   {
>      my $pos = pos($strand{$direction})- 2  ;
>      if ($pos <= 0)
>      {
>         $pos = 1 ;
>      }
>      push @ends,$pos;
>   }

--
_____________________________________________________________   .`.`o
                                                         o| ,\__ `./`r
  Dan Kortschak    kortschak@rsbs.anu.spanner.edu.au     <\/    \_O> O
                                                          "|`...'.\
  Before you criticise a man, try to walk a mile in his    `      :\
  shoes. Then, if he doesn't like what you have to say,           : \
  you'll be a mile away, and you'll have his shoes.               :  \

  The address above will not work, remove the spanner from the works.

By replying to this email you implicitly accept that your response may
be forwarded to other recipients.
Permission is granted for fair use reproduction.


_______________________________________________
Bioperl-l mailing list
Bioperl-l@bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-l