[Bioperl-l] fasta36 bug report

Fields, Christopher J cjfields at illinois.edu
Fri Aug 8 01:46:28 UTC 2014


Looks as if there have been major changes to fasta36, see here:

    http://faculty.virginia.edu/wrpearson/fasta/fasta36/

Note in particular point 2:


  *   Display of all significant alignments between query and library sequence. BLAST has always displayed multiple high-scoring alignments (HSPs) between the query and library sequence; previous versions of the FASTA programs displayed only the best alignment, even when other high-scoring alignments were present. This is the major change in FASTA36. For most programs (fasta36, ssearch36, [t]fast[xy]36), if the library sequence contains additional significant alignments, they will be displayed with the alignment output, and as part of -m 9 output (the initial list of high scores).

By default, the statistical threshold for alternate alignments (HSPs) is the E()-threshold / 10.0. For proteins, the default expect threshold is E() < 10.0, so the secondary threshold for showing alternate alignments is E() < 1.0. For translated comparisons, the E()-thresholds are 5.0/0.5; for DNA:DNA, 2.0/0.2.

Both the primary and secondary E()-thresholds are set with the -E "prim sec" command line option. If the secondary value is between zero and 1.0, it is taken as the actual threshold. If it is > 1.0, it is taken as a divisor for the primary threshold. If it is negative, alternate alignments are disabled and only the best alignment is shown.
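The rules quoted above for deriving the secondary (alternate-HSP) threshold can be summarized in a short sketch. This is Python written only for illustration, not FASTA's own code. The function name `secondary_threshold` is hypothetical, and treating the boundary values 0 and 1.0 as "actual threshold" is an assumption, since the notes do not say which side they fall on.

```python
from typing import Optional

# Default primary E()-thresholds quoted in the FASTA36 notes above.
DEFAULT_PRIMARY = {"protein": 10.0, "translated": 5.0, "dna": 2.0}


def secondary_threshold(primary: float, secondary: Optional[float] = None) -> Optional[float]:
    """Return the E()-threshold for alternate alignments (HSPs).

    Hypothetical helper; returns None when alternate alignments are disabled.
    """
    if secondary is None:
        # No secondary value given: default is the primary threshold / 10.0.
        return primary / 10.0
    if secondary < 0:
        # Negative: alternate alignments disabled, only the best alignment shown.
        return None
    if secondary > 1.0:
        # Greater than 1.0: used as a divisor of the primary threshold.
        return primary / secondary
    # Between 0 and 1.0 (boundaries assumed inclusive here): used as-is.
    return secondary


for kind, prim in DEFAULT_PRIMARY.items():
    print(kind, prim, secondary_threshold(prim))
```

With no secondary value, this reproduces the quoted default pairs 10.0/1.0, 5.0/0.5 and 2.0/0.2.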

I suggest submitting this as a bug report on GitHub along with the examples (use a gist, not pastebin, as the latter are not around forever). We could add support for this (it's not very high on our priority list of fixes), but having example files helps tremendously along that path. We will also accept any help/patches that get this working again :)

chris

On Aug 7, 2014, at 7:47 PM, Antony03 <antony.vincent.1 at ulaval.ca> wrote:

Hello,

There is a problem when I try to parse a fasta36 report. I got this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Unrecognized alignment line (3) '>--'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::SearchIO::fasta::next_result /usr/share/perl5/Bio/SearchIO/fasta.pm:1148
STACK: ./Auto_Annot.pl:123
-----------------------------------------------------------

There is a '>--' after each alignment in fasta36 but not in fasta35.
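Until the parser handles these lines, one way to see what it has to cope with is a small preprocessing sketch that splits the alignment lines of one library hit into separate HSP blocks. This is hypothetical Python, not BioPerl's `Bio::SearchIO::fasta` code. It assumes that the first HSP follows the hit header and that each additional HSP starts with a line beginning with `>--`, as the error above suggests. The name `split_hsps` and the placeholder input lines are illustrative only.

```python
from typing import List


def split_hsps(hit_lines: List[str]) -> List[List[str]]:
    """Split the alignment lines of one library hit into HSP blocks.

    Hypothetical helper: assumes each additional (alternate) alignment
    starts with a line beginning with '>--'.
    """
    blocks: List[List[str]] = [[]]
    for line in hit_lines:
        if line.startswith(">--"):
            # fasta36 separator: begin a new alternate alignment block.
            blocks.append([])
            continue
        blocks[-1].append(line)
    # Drop empty blocks, e.g. a separator with no lines after it.
    return [b for b in blocks if b]


# Placeholder lines standing in for real fasta36 output.
example = [">>libseq", "hsp1 line", ">--", "hsp2 line", ">--", "hsp3 line"]
print(len(split_hsps(example)))
```

A parser built this way would report one hit with several HSPs, which matches the three alignments fasta36 shows where fasta35 shows one.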

Consequently, I tried to parse a fasta35 alignment, and BioPerl handles it without any problem. However, the results of the two programs (fasta35 and fasta36) are clearly not the same.

fasta35 gives only 1 alignment: http://pastebin.com/f4NdYJCt

while fasta36 gives 3 alignments: http://pastebin.com/ADeKJ4GC

What am I doing wrong with fasta35? It is probably not normal that it misses two of the three almost perfect alignments.

Thank you!




--
View this message in context: http://bioperl.996286.n3.nabble.com/fasta36-bug-report-tp17620.html
Sent from the Bioperl-L mailing list archive at Nabble.com.
