[Bioperl-l] Bio::Tools::ESTScan and ESTScan v2.0 beta
Samuel Thoraval
samuel.thoraval at librophyt.com
Wed May 18 06:42:52 EDT 2005
Hi,
I have modified Bio::Tools::ESTScan to make it
compatible with the new ESTScan version 2.0 beta.
I am not familiar with the former version and don't know all the
changes, but one of them concerns the generated output.
Instead of five numbers following the sequence id, there are now only three:
the score, the start position and the end position, in that order. The score
can be negative.
I also wanted ESTScan.pm to be able to parse the ESTScan (version 2.0 only)
protein fasta file (which can be generated with option -t).
I haven't added any support for the 'all-in-one' format for ESTScan v2.
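To illustrate the header format described above, here is a minimal standalone sketch, not the module code itself. It assumes the description following the sequence id looks like "score start end", optionally ending in "; translated" for a protein (-t) entry. The helper name parse_v2_desc and the sample strings are hypothetical; only the regular expressions follow the ones in the diff.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::ESTScan): split an
# ESTScan v2-style description into score, start, end and alphabet.
sub parse_v2_desc {
    my ($desc) = @_;
    my $alphabet = "dna";

    # "translated" may end the description of a protein entry
    if ($desc =~ /(.*)translated$/) {
        $desc = $1;
        $desc =~ s/;\s+$//;    # drop the trailing "; " separator
        $alphabet = "protein";
    }

    # score (possibly negative), then start and end positions
    $desc =~ /^(-?[\d.]+)\s+(\d+)\s+(\d+)\s*(.*)/ or return;

    return {
        score    => $1,
        start    => $2,
        end      => $3,
        rest     => $4,
        alphabet => $alphabet,
    };
}

# Sample description strings are made up for illustration.
for my $desc ("-12 5 310", "42.5 1 99; translated") {
    my $r = parse_v2_desc($desc);
    print join(" ", @{$r}{qw(score start end alphabet)}), "\n";
}
```

The helper returns nothing when the description does not start with a score followed by two integers, so a caller can fall back to other handling in that case.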
Below is the diff of ESTScan.pm version 1.11 against ESTScan.pm version 1.10:
~~~~~~~~~~~~~~~~~~~~~~~~
1c1
< # $Id: ESTScan.pm,v 1.11 2005/05/18 07:38:45 lapp Exp $
---
> # $Id: ESTScan.pm,v 1.10 2002/10/22 07:38:45 lapp Exp $
172d171
< my $alphabet;
182c181
< $seq->desc() =~ /^(\-?[\d.]+)\s*(.*)/ or
---
> $seq->desc() =~ /^([\d.]+)\s*(.*)/ or
187,195d185
< # translated may end the description
< if($seq->desc() =~ /(.*)translated$/) {
< my $desc = $1;
< $desc =~ s/;\s+$//;
< $seq->desc($desc);
< $alphabet = "protein";
< } else {
< $alphabet = "dna";
< }
230,264d219
< } elsif ($seq->desc() =~ /^(\d+)\s+(\d+)\s*(.*)/) {
< # default ESTSCAN v2 format
< $seq->desc($3);
< $predobj = Bio::Tools::Prediction::Exon->new('-source' => "ESTScan",
< '-start' => $1,
< '-end' => $2);
< $predobj->strand($gene->strand());
< $predobj->score($gene->score()); # FIXME or $1, or $2 ?
< $predobj->primary_tag("InternalExon");
< $predobj->seq_id($seq->display_id());
< # add to gene structure object
< $gene->add_exon($predobj);
< if ($alphabet eq "dna") {
< # add predicted CDS
< $cds = $seq->seq();
< $cds =~ s/[a-z]//g; # remove the deletions, but keep the insertions
< $cds = Bio::PrimarySeq->new('-seq' => $cds,
< '-display_id' => $seq->display_id(),
< '-desc' => $seq->desc(),
< '-alphabet' => "dna");
< $gene->predicted_cds($cds);
< $predobj->predicted_cds($cds);
< if($gene->strand() == -1) {
< $self->warn("reverse strand ORF, but unable to reverse coordinates!");
< }
< } elsif ($alphabet eq "protein") {
< # add predicted Protein
< $cds = $seq->seq();
< $cds = Bio::PrimarySeq->new('-seq' => $cds,
< '-display_id' => $seq->display_id(),
< '-desc' => $seq->desc(),
< '-alphabet' => "protein");
< $gene->predicted_protein($cds);
< $predobj->predicted_protein($cds);
< }
~~~~~~~~~~~~~~~~~~~~~~~~
Regards,
--
Samuel Thoraval