[Bioperl-l] Bio::Tools::ESTScan and ESTScan v2.0 beta

Samuel Thoraval samuel.thoraval at librophyt.com
Wed May 18 06:42:52 EDT 2005


Hi,

I have modified Bio::Tools::ESTScan to make it compatible with the new
ESTScan version 2.0 beta.

I am not familiar with the previous version and don't know all the changes,
but one of them concerns the generated output. Instead of 5 numbers following
the sequence id, there are now only 3: the score, the start position and the
end position, in that order. The score can be negative.
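As a rough illustration of the new layout, here is a small Perl sketch that splits a v2 header into its fields. It assumes the header looks like ">id score start end [free text]", which is inferred from the description above and from the regular expressions in the patch. The name parse_estscan_v2_header is hypothetical and is not part of Bio::Tools::ESTScan.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::ESTScan: split an ESTScan v2
# FASTA header, assumed to look like ">id score start end [free text]",
# into its fields. Returns nothing (undef in scalar context) on mismatch.
sub parse_estscan_v2_header {
    my ($line) = @_;
    # The score may carry a leading minus sign (same pattern as the patch);
    # start and end positions are plain integers.
    my ($id, $score, $start, $end, $rest) =
        $line =~ /^>(\S+)\s+(-?[\d.]+)\s+(\d+)\s+(\d+)\s*(.*)$/
        or return;
    return {
        id    => $id,
        score => $score,
        start => $start,
        end   => $end,
        desc  => $rest,
    };
}

# Example: a negative score is accepted.
my $rec = parse_estscan_v2_header('>EST0001 -12.5 3 278 some text');
print "$rec->{score} $rec->{start} $rec->{end}\n";   # prints "-12.5 3 278"
```

The optional minus sign in the score pattern is the key difference from the old ^([\d.]+) pattern, which would reject negative scores.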

I also wanted ESTScan.pm to be able to parse the protein FASTA file produced
by ESTScan (version 2.0 only) with the -t option.
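The patch recognises the protein file by a description that ends in "translated". The following Perl sketch mirrors that check in isolation. The exact header wording ("; translated" at the end) is inferred from the regular expressions in the patch rather than from ESTScan documentation, and split_translated_flag is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patch: the protein file written with -t
# is recognised by a description ending in "translated". The marker and a
# preceding "; " are stripped, and the alphabet is reported as "protein";
# anything else is treated as nucleotide ("dna") output.
sub split_translated_flag {
    my ($desc) = @_;
    if ($desc =~ /(.*)translated$/) {
        (my $clean = $1) =~ s/;\s+$//;
        return ($clean, 'protein');
    }
    return ($desc, 'dna');
}

my ($desc, $alphabet) = split_translated_flag('3 278 some text; translated');
print "$alphabet: $desc\n";   # prints "protein: 3 278 some text"
```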

I haven't added support for the 'all-in-one' format of ESTScan v2.

Below is the diff of ESTScan.pm version 1.11 against version 1.10 (lines
starting with '<' come from 1.11, lines starting with '>' from 1.10):

~~~~~~~~~~~~~~~~~~~~~~~~
1c1
< # $Id: ESTScan.pm,v 1.11 2005/05/18 07:38:45 lapp Exp $
---
> # $Id: ESTScan.pm,v 1.10 2002/10/22 07:38:45 lapp Exp $
172d171
<     my $alphabet;
182c181
<     $seq->desc() =~ /^(\-?[\d.]+)\s*(.*)/ or
---
>     $seq->desc() =~ /^([\d.]+)\s*(.*)/ or
187,195d185
<     # translated may end the description
<     if($seq->desc() =~ /(.*)translated$/) {
<       my $desc = $1;
<       $desc =~ s/;\s+$//;
<       $seq->desc($desc);
<       $alphabet = "protein";
<     } else {
<       $alphabet = "dna";
<     }
230,264d219
<     } elsif ($seq->desc() =~ /^(\d+)\s+(\d+)\s*(.*)/) {
<       # default ESTSCAN v2 format
<       $seq->desc($3);
<       $predobj = Bio::Tools::Prediction::Exon->new('-source' => "ESTScan",
<                                                    '-start' => $1,
<                                                    '-end' => $2);
<       $predobj->strand($gene->strand());
<       $predobj->score($gene->score()); # FIXME or $1, or $2 ?
<       $predobj->primary_tag("InternalExon");
<       $predobj->seq_id($seq->display_id());
<       # add to gene structure object
<       $gene->add_exon($predobj);
<       if ($alphabet eq "dna") {
<               # add predicted CDS
<               $cds = $seq->seq();
<               $cds =~ s/[a-z]//g; # remove the deletions, but keep the insertions
<               $cds = Bio::PrimarySeq->new('-seq' => $cds,
<                                       '-display_id' => $seq->display_id(),
<                                       '-desc' => $seq->desc(),
<                                       '-alphabet' => "dna");
<               $gene->predicted_cds($cds);
<               $predobj->predicted_cds($cds);
<               if($gene->strand() == -1) {
<               $self->warn("reverse strand ORF, but unable to reverse coordinates!");
<               }
<       } elsif ($alphabet eq "protein") {
<               # add predicted Protein
<               $cds = $seq->seq();
<               $cds = Bio::PrimarySeq->new('-seq' => $cds,
<                                       '-display_id' => $seq->display_id(),
<                                       '-desc' => $seq->desc(),
<                                       '-alphabet' => "protein");
<               $gene->predicted_protein($cds);
<               $predobj->predicted_protein($cds);
<       }
~~~~~~~~~~~~~~~~~~~~~~~~
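In the DNA branch, the predicted CDS is obtained by deleting every lowercase letter from the record's sequence. Here is a standalone Perl sketch of just that step. It assumes, following the comment in the patch, that lowercase letters mark deletions. The name estscan_cds is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the DNA branch of the patch: lowercase
# letters are taken to mark deletions (per the patch comment), so they are
# removed; everything else, including the insertions, is kept.
sub estscan_cds {
    my ($seq) = @_;
    (my $cds = $seq) =~ s/[a-z]//g;
    return $cds;
}

print estscan_cds('ATGaaCCGTAG'), "\n";   # prints "ATGCCGTAG"
```

The patch then wraps this string in a Bio::PrimarySeq with -alphabet => "dna". That step is left out here so the sketch runs without BioPerl installed.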


Regards,

-- 
Samuel Thoraval



