[EMBOSS] stretcher issues

David Mathog mathog at caltech.edu
Tue Feb 5 22:58:09 UTC 2019


EMBOSS 6.6.0 on Centos 6.9.

I am trying to align a set of 2-20kbp contigs against a 175kbp BAC with 
stretcher, and some odd results are falling out.  I think these point to a 
problem in how that program handles end gaps.  It is invoked like 
this:

   stretcher -aseq BAC.fasta -bseq contig.fasta \
           -outfile pairs.fasta -aformat3 fasta -auto


1.  If a ~25kbp contig aligns so that its final 12kbp overlaps (nearly 
exactly, about 99.9% identity) the first 12kbp of the BAC, the 
alignment produced is a total mess.  It seems that stretcher cannot 
handle end gaps in this context at all: it forces the 12kbp which should 
be dangling unpaired off the left end of the BAC into alignment 
internally.  needle cannot be used in this situation either, since both of 
these commands segfault (on a nearly idle machine with 512Gb of RAM):

    needle BAC.fasta contig.fasta -outfile pairs.fasta -aformat3 fasta \
       -auto
    needle BAC.fasta contig.fasta -outfile pairs.fasta -aformat3 fasta \
       -endweight T -endextend 0 -endopen 1 -auto

The ssw_test program from SSWlib handles this correctly.  (Unfortunately 
it does a local alignment and so cannot replace needle in this context.)
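
To illustrate why end-gap handling matters for an overlap like the one in 
item 1, here is a minimal Python sketch.  It is not EMBOSS code and makes no 
claim about how stretcher or needle are implemented.  It scores a toy pair 
with a plain Needleman-Wunsch recurrence and a linear gap cost, once with 
end gaps penalized and once with them free.  The function name align_score, 
the scoring values, and the toy sequences are all assumptions made up for 
this illustration.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, free_end_gaps=False):
    """Best alignment score of a vs b with a linear gap cost.

    With free_end_gaps=True, leading and trailing gaps in either
    sequence cost nothing (overlap alignment); otherwise every gap
    column is charged (classic global alignment).
    """
    m = len(b)
    # Row 0: b's prefix aligned against leading gaps in a.
    prev = [0 if free_end_gaps else j * gap for j in range(m + 1)]
    best_last_col = prev[m]
    for i in range(1, len(a) + 1):
        # Column 0: a's prefix aligned against leading gaps in b.
        cur = [0 if free_end_gaps else i * gap]
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
        best_last_col = max(best_last_col, cur[m])
    if free_end_gaps:
        # The alignment may end anywhere in the last row or last column.
        return max(best_last_col, max(prev))
    return prev[m]


# Toy stand-ins: the tail of `contig` matches the head of `bac`.
bac = "ACGTACGTTTTT"
contig = "GGGGGGACGTACG"
print(align_score(bac, contig))                      # end gaps penalized
print(align_score(bac, contig, free_end_gaps=True))  # end gaps free
```

In this toy case the free-end-gap score comes from the overlap itself, with 
the unmatched ends left unpaired, while the fully penalized version has to 
pay for every overhanging column.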

2.  This happens a lot:

    AATTC(lots of sequence)ATGAC...  (BAC)
    A--------(...)----------TGAC...  (contig)

stretcher will also split off two A's this way, but not three, or split 
off AAT, and so forth.  needle doesn't do this when it runs the same 
alignment (with the 2nd command from the pair above), but needle is also 
much slower than stretcher.
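
One way to screen many alignments for the pattern in item 2 is a small 
check on the contig's aligned row.  The sketch below is only an 
illustration: it assumes the aligned row has already been read into a 
Python string (parsing pairs.fasta is not shown), and the function name 
stray_prefix and its thresholds max_prefix and min_gap are hypothetical 
choices, not anything provided by EMBOSS.

```python
import re


def stray_prefix(aligned_row, max_prefix=3, min_gap=50):
    """Return (prefix, gap_length) if the row starts with 1..max_prefix
    residues followed directly by at least min_gap gap characters,
    otherwise None."""
    pattern = r"([A-Za-z]{1,%d})(-{%d,})" % (max_prefix, min_gap)
    m = re.match(pattern, aligned_row)
    if m is None:
        return None
    return m.group(1), len(m.group(2))
```

A row such as "A" followed by a long run of "-" and then "TGAC..." would be 
reported with its one-base prefix and the length of the gap run, while a row 
that begins with ordinary ungapped sequence returns None.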

The contigs have been flipped if necessary so that they are all in the 
same direction as the BAC.
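
For reference, "flipped" here means reverse-complemented.  A minimal Python 
sketch of that operation follows; it is not the tool that was actually used 
for these contigs, and the name reverse_complement is an assumption for 
illustration.  Characters other than A, C, G and T (for example N) are left 
unchanged.

```python
# Translation table for the four standard bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]
```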

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

