[EMBOSS] stretcher issues

Peter Rice ricepeterm at yahoo.co.uk
Wed Feb 6 08:19:57 UTC 2019


Hi David,

Hmmm, interesting....

Can you send me example sequences for these cases? I will take a look.

All the best,

Peter Rice
EMBOSS Team

On 05/02/2019 22:58, David Mathog wrote:
> EMBOSS 6.6.0 on Centos 6.9.
> 
> Trying to align a bunch of 2-20kbp contigs against a 175kbp BAC with 
> stretcher and some odd things are falling out.  I think these point to a 
> problem in the handling of end gaps in that program.  It is invoked like 
> this:
> 
>    stretcher -aseq BAC.fasta -bseq contig.fasta \
>            -outfile pairs.fasta -aformat3 fasta -auto
> 
> 
> 1.  If a ~25kbp contig aligns so that its final 12kbp overlaps (nearly 
> exactly, like 99.9% identity) with the first 12kbp of the BAC, the 
> alignment produced is a total mess.  It seems like stretcher cannot 
> handle end gaps in this context at all, forcing the 12kbp which should 
> be dangling unpaired off the left end of the BAC into alignment 
> internally.  needle doesn't work in this situation either, since both of 
> these commands segfault (on a nearly idle machine with 512Gb of RAM):
> 
>     needle BAC.fasta contig.fasta -outfile pairs.fasta -aformat3 fasta \
>        -auto
>     needle BAC.fasta contig.fasta -outfile pairs.fasta -aformat3 fasta \
>        -endweight T -endextend 0 -endopen 1 -auto
> 
> The ssw_test program from SSWlib handles this correctly.  (Unfortunately 
> it does a local alignment and so cannot replace needle in this context.)
> 
> 2.  This happens a lot:
> 
>     AATTC(lots of sequence)ATGAC...  (BAC)
>     A--------(...)----------TGAC...  (contig)
> 
> It will also shift two A's, but not three, or shift AAT, and so forth. 
> Needle doesn't do this when it runs the same alignment (with the 2nd 
> command from the pair above), but Needle is also much slower than 
> stretcher.
> 
> The contigs have been flipped if necessary so that they are all in the 
> same direction as the BAC.
> 
> Regards,
> 
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/emboss
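
For readers following the thread, the end-gap question in case 1 can be illustrated with a toy example. This is only a sketch under stated assumptions: it uses a simple linear gap penalty and made-up short sequences, and it is not the algorithm used by stretcher or needle. The function name `align` and its parameters (including `free_end_gaps`) are hypothetical, chosen for illustration. It compares an ordinary global alignment, where overhanging ends are charged as gaps, with one where gaps at either end of either sequence cost nothing.

```python
def align(a, b, match=1, mismatch=-1, gap=-2, free_end_gaps=False):
    """Align a and b end to end using a linear gap penalty (toy sketch).

    With free_end_gaps=True, unpaired residues hanging off either end of
    either sequence are not penalised (an "overlap" style alignment).
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not free_end_gaps:
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    if free_end_gaps:
        # Trailing overhang is free: take the best cell in the last row
        # or the last column.
        cells = [(score[n][j], n, j) for j in range(m + 1)]
        cells += [(score[i][m], i, m) for i in range(n + 1)]
        best, i, j = max(cells)
    else:
        best, i, j = score[n][m], n, m

    # The alignment is built right to left, then reversed at the end.
    out_a, out_b = [], []
    for k in range(n, i, -1):          # trailing overhang of a
        out_a.append(a[k - 1]); out_b.append("-")
    for k in range(m, j, -1):          # trailing overhang of b
        out_a.append("-"); out_b.append(b[k - 1])
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    while i > 0:                       # leading overhang of a
        out_a.append(a[i - 1]); out_b.append("-"); i -= 1
    while j > 0:                       # leading overhang of b
        out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Made-up sequences: the last 8 bases of the "contig" match the first
# 8 bases of the "BAC", mimicking the overlap described in case 1.
bac = "ACGTTGCAGGATCC"
contig = "TTTTCCCCACGTTGCA"

for free in (False, True):
    s, x, y = align(bac, contig, free_end_gaps=free)
    print("free_end_gaps =", free, "score =", s)
    print("  BAC   ", x)
    print("  contig", y)
```

In this toy case the penalised version has to pay for every unpaired base at the ends, so it scores worse than the version with free end gaps, which can leave the contig's unmatched prefix dangling off the left end of the BAC. Whether this is what goes wrong inside stretcher is not established in this thread.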
