[Bioperl-l] longest ORF
Navdeep Jaitly
ndjaitly@hotmail.com
Tue, 30 Jul 2002 12:41:57 -0400
its your regular expressions.. The ! does not work as you might expect..
try this instead.
Deep
foreach $direction (keys %strand) {
my @starts;
my @stops;
while ($strand{$direction}=~m/(^(?!taa)|^(?!tga)|^(?!tag)|atg)/gi)
{
my $pos = pos($strand{$direction})- 2 ;
if ($pos <= 0)
{
$pos = 1 ;
}
push @starts,$pos;
}
while ($strand{$direction}=~m/(taa|tga|tag|...$)/gi)
{
my $pos = pos($strand{$direction})- 2 ;
if ($pos <= 0)
{
$pos = 1 ;
}
push @ends,$pos;
}
>From: Dan Kortschak <kortschak@rsbs.anu.edu.au>
>Reply-To: Dan Kortschak <kortschak@rsbs.anu.edu.au>
>To: bioperl-l@bioperl.org
>Subject: Re: [Bioperl-l] longest ORF
>Date: Tue, 30 Jul 2002 17:49:50 +1000 (EST)
>MIME-Version: 1.0
>Received: from mc1-f21.law16.hotmail.com ([65.54.236.28]) by
>mc1-s21.law16.hotmail.com with Microsoft SMTPSVC(5.0.2195.4905); Tue, 30
>Jul 2002 01:01:06 -0700
>Received: from pw600a.bioperl.org ([199.93.107.70]) by
>mc1-f21.law16.hotmail.com with Microsoft SMTPSVC(5.0.2195.4905); Tue, 30
>Jul 2002 00:54:02 -0700
>Received: from pw600a.bioperl.org (localhost [127.0.0.1])by
>pw600a.bioperl.org (8.12.2/8.12.2) with ESMTP id g6U7qNJD028574;Tue, 30 Jul
>2002 03:52:23 -0400
>Received: from rsbsimail.anu.edu.au (rsbsimail.anu.edu.au
>[150.203.38.39])by pw600a.bioperl.org (8.12.2/8.12.2) with ESMTP id
>g6U7p1JD028541for <bioperl-l@bioperl.org>; Tue, 30 Jul 2002 03:51:02 -0400
>Received: from rsbs37112.anu.edu.au (not verified[150.203.37.112]) by
>rsbsimail.anu.edu.au with MailMarshal (4,2,5,0) id <B0000628e1>; Tue, 30
>Jul 2002 17:50:59 +1000
>X-X-Sender: dkortsch@sunya.rsbs.anu.edu.au
>In-Reply-To:
><Pine.OSF.4.33.0207292002001.18773-100000@alpha10.bioch.virginia.edu>
>Message-ID:
><Pine.LNX.4.44.0207301742080.17429-100000@sunya.rsbs.anu.edu.au>
>Sender: bioperl-l-admin@bioperl.org
>Errors-To: bioperl-l-admin@bioperl.org
>X-BeenThere: bioperl-l@bioperl.org
>X-Mailman-Version: 2.0.12
>Precedence: bulk
>List-Help: <mailto:bioperl-l-request@bioperl.org?subject=help>
>List-Post: <mailto:bioperl-l@bioperl.org>
>List-Subscribe:
><http://bioperl.org/mailman/listinfo/bioperl-l>,<mailto:bioperl-l-request@bioperl.org?subject=subscribe>
>List-Id: Bioperl Project Discussion List <bioperl-l.bioperl.org>
>List-Unsubscribe:
><http://bioperl.org/mailman/listinfo/bioperl-l>,<mailto:bioperl-l-request@bioperl.org?subject=unsubscribe>
>List-Archive: <http://bioperl.org/pipermail/bioperl-l/>
>Return-Path: bioperl-l-admin@bioperl.org
>X-OriginalArrivalTime: 30 Jul 2002 07:54:03.0578 (UTC)
>FILETIME=[45D591A0:01C2379E]
>
>Hi Arron, I've fiddled some with the barebones perl you posted (there were
>issues with codons being in frame or not as well and some BioPerl syntax
>things). Thanks BTW. As it stands (with exception of the following), it
>works, but only on real ORFs (which sometimes aren't that useful).
>
>What I'm trying to do with the m/(atg|^!taa|^!tga|^!tag)/gi and
>m/(taa|tga|tag|...$)/gi is to let the ORF start at the beginning of the
>dna so long as it is not a stop and allow it to run-off at the end.
>However my tests show that the orfs always start at an atg. Can anyone
>help here?
>
>thanks
>Dan Kortschak
>
>#!/usr/bin/perl
>
>use Getopt::Long;
>use Bio::SeqIO;
>
>my ($sequencefile,$sequenceformat,$help) = (undef, 'fasta',undef);
>
>&GetOptions('input|i=s' => \$sequencefile,
> 'format|f=s' => \$sequenceformat,
> 'help|h' => \$help,
> );
>
>if ($help) {
> exec('perldoc', $0); # when I can be bothered
> die;
>}
>
>sub longestORF {
> my $best=0;
> my ($bests, $beste, $beststrand) = (-1,-1,0);
>
> my $dna=Bio::Seq->new(-seq => $_[0]);
> my %strand=('+'=>$dna->seq,
> '-'=>$dna->revcom->seq);
>
> foreach $direction (keys %strand) {
> my @starts;
> my @stops;
> while ($strand{$direction}=~m/(atg|^!taa|^!tga|^!tag)/gi) {
> push @starts,pos($strand{$direction})-2;
> }
>
> while ($strand{$direction}=~m/(taa|tga|tag|...$)/gi) {
> push @ends,pos($strand{$direction})-2;
> }
>
> for my $s (@starts) {
> for my $e (@ends) {
> if ($e%3==$s%3 and $e>$s) {
> if ($e-$s>$best) {
> $best=$e-$s;
> ($bests,$beste,$beststrand) = ($s, $e, $direction);
>
>$bestorf=Bio::Seq->new(-seq=>$strand{$direction})->subseq($s,$e);
> }
> last
> } else {
> next
> }
> }
> }
> @starts=0;
> @ends=0;
> }
> return ($best, $bests, $beste, $beststrand, $bestorf);
>}
>
>
>my $seqio = new Bio::SeqIO('-format' => $sequenceformat,
> '-file' => $sequencefile );
>
>while (my $dna = $seqio->next_seq) {
>
> print $dna->display_id," ",$dna->desc,": ";
> $seq=$dna->seq;
> ($length,$start,$end,$direction,$sequence)=longestORF($seq);
> print $length," ",Bio::Seq->new(-seq=>$sequence)->translate->seq,"\n";
>}
>
>
>On Mon, 29 Jul 2002, Aaron J Mackey wrote:
>
> >
> > Something like this:
> >
> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > # get a $seq Bio::Seq object somehow ...
> > my $dna = $seq->seq;
> >
> > my @starts;
> > my @stops;
> >
> > while ($dna =~ m/atg/gi) {
> > push @starts, pos($dna) - 2;
> > }
> >
> > while ($dna =~ m/(taa) | (tga) | (tag)/gi) {
> > push @ends, pos($dna) - 2;
> > }
> >
> > my $best = 0;
> > my ($bests, $beste, $beststrand) = (-1) x 3;
> > for my $s (@starts) {
> > for my $e (@ends) {
> > if($e - $b + 1 > $best) {
> > $best = $e - $b + 1;
> > ($bests, $beste, $beststrand) = ($s, $e, 1);
> > }
> > }
> > }
> >
> > # repeat for reverse strand; left as an exercise for the reader.
> > # report $bests, $beste, $beststrand, perhaps build a new Seq.pm object
> >
> > As always, this is all good fodder for a Tools/ORFCaller.pm or somesuch.
> > Like Tools::IUPAC, Tools::ORFs could return a stream of Seq's
>representing
> > each orf (with parameters for desired strandedness, minimum length,
> > genetic code (to use with Tools::CodonTable, etc).
> >
> > -Aaron
> >
> >
> > On Tue, 30 Jul 2002, Dan Kortschak wrote:
> >
> > > Hi All, I'm looking to see if anyone has a small script to take a
>Bio::Seq
> > > object and find the longest ORF ((+) strand or both strands).
> > > Altenatively, is there an easy way to do this using Bio::Seq
>(translate
> > > doesn't seem to take a frame for translation so I'm guessing that it
> > > translates in frame 1 - is this right?)
> > >
> > > thanks
> > > Dan Kortschak
> > >
> > >
> >
> >
>
>--
>_____________________________________________________________ .`.`o
> o| ,\__ `./`r
> Dan Kortschak kortschak@rsbs.anu.spanner.edu.au <\/ \_O> O
> "|`...'.\
> Before you criticise a man, try to walk a mile in his ` :\
> shoes. Then, if he doesn't like what you have to say, : \
> you'll be a mile away, and you'll have his shoes. : \
>
> The address above will not work, remove the spanner from the works.
>
>By replying to this email you implicitly accept that your response may
>be forwarded to other recipients.
>Permission is granted for fair use reproduction.
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@bioperl.org
>http://bioperl.org/mailman/listinfo/bioperl-l
_________________________________________________________________
Send and receive Hotmail on your mobile device: http://mobile.msn.com