[EMBOSS] stretcher issues
David Mathog
mathog at caltech.edu
Tue Feb 5 22:58:09 UTC 2019
EMBOSS 6.6.0 on CentOS 6.9.
I am trying to align a bunch of 2-20 kbp contigs against a 175 kbp BAC
with stretcher, and some odd things are falling out. I think they point
to a problem in that program's handling of end gaps. It is invoked like
this:
stretcher -aseq BAC.fasta -bseq contig.fasta \
-outfile pairs.fasta -aformat3 fasta -auto
1. If a ~25 kbp contig aligns so that its final 12 kbp overlaps (nearly
exactly, ~99.9% identity) the first 12 kbp of the BAC, the alignment
produced is a total mess. stretcher seems unable to handle end gaps in
this context at all: it forces the part of the contig that should dangle
unpaired off the left end of the BAC into alignment internally. needle
doesn't work in this situation either, since both of these commands
segfault (on a nearly idle machine with 512 GB of RAM):
needle BAC.fasta contig.fasta -outfile pairs.fasta -aformat3 fasta \
-auto
needle BAC.fasta contig.fasta -outfile pairs.fasta -aformat3 fasta \
-endweight T -endextend 0 -endopen 1 -auto
The ssw_test program from SSWlib handles this correctly. (Unfortunately
it does a local alignment and so cannot replace needle in this context.)
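To make the end-gap point concrete, here is a toy Python sketch (not EMBOSS or SSW code; the linear gap penalty and the match/mismatch/gap values are arbitrary assumptions) contrasting an overlap, or semi-global, alignment, where leading and trailing gaps are free, with an ordinary global alignment, where they are penalized like any other gap. With free ends, a contig whose suffix matches the BAC's prefix can hang off the left end instead of being pulled into the interior.

```python
def overlap_align(a, b, match=1, mismatch=-1, gap=-2, free_ends=True):
    """Toy DNA aligner with a linear gap penalty (illustrative, not EMBOSS code).

    free_ends=True: overlap (semi-global) alignment; leading and trailing gaps
    in either sequence cost nothing, so one sequence may dangle off an end.
    free_ends=False: ordinary global alignment; end gaps are penalized.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not free_ends:
        # Leading gaps cost the same as internal ones.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # With free trailing gaps the alignment may end anywhere in the last row
    # or column; otherwise it must end at the corner (n, m).
    best_i, best_j = n, m
    if free_ends:
        for i in range(n + 1):
            if score[i][m] > score[best_i][best_j]:
                best_i, best_j = i, m
        for j in range(m + 1):
            if score[n][j] > score[best_i][best_j]:
                best_i, best_j = n, j

    # Build both rows backwards, starting with any trailing overhang.
    top = list(reversed(a[best_i:])) + ["-"] * (m - best_j)
    bot = ["-"] * (n - best_i) + list(reversed(b[best_j:]))
    i, j = best_i, best_j
    while i > 0 and j > 0:
        s = score[i][j]
        if s == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1])
            bot.append(b[j - 1])
            i, j = i - 1, j - 1
        elif s == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bot.append("-")
            i -= 1
        else:
            top.append("-")
            bot.append(b[j - 1])
            j -= 1
    # Leading overhang: free when free_ends, already scored otherwise.
    while i > 0:
        top.append(a[i - 1])
        bot.append("-")
        i -= 1
    while j > 0:
        top.append("-")
        bot.append(b[j - 1])
        j -= 1
    return score[best_i][best_j], "".join(reversed(top)), "".join(reversed(bot))


if __name__ == "__main__":
    bac = "ACGGACCAGCAGGACA"       # made-up "BAC" with no T in it
    contig = "TTTTTT" + bac[:10]   # last 10 bases match the BAC's first 10
    for free in (True, False):
        s, top, bot = overlap_align(bac, contig, free_ends=free)
        print(f"free_ends={free} score={s}\n{top}\n{bot}")
```

With free_ends=True the sketch reports score 10 with the six T's as a left overhang (`------ACGGACCAGCAGGACA` over `TTTTTTACGGACCAGC------`); the global version must pay for those end gaps and scores lower.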
2. This happens a lot:
AATTC(lots of sequence)ATGAC... (BAC)
A--------(...)----------TGAC... (contig)
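That is, the contig's leading A is pulled left to pair with the first A
of the BAC, leaving a long internal gap before TGAC. It will also shift
two A's (but not three), or AAT, and so forth.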
needle doesn't do this when it runs the same alignment (using the
second of the two commands above), though it is much slower than
stretcher.
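One plausible reading of item 2 is a scoring tie. Assuming end gaps are scored like internal gaps, pairing the contig's leading A with the BAC's first A (plus one long internal gap) scores exactly the same as keeping that A with the rest of the contig (plus one equally long end gap), so which one is reported comes down to tie-breaking in the traceback. Making end gaps cheap, as the `-endweight T -endopen 1 -endextend 0` needle command does, would break that tie, which would be consistent with needle not showing the problem. The sketch below checks this on a made-up fragment; the scoring values and the `open + (L-1)*extend` gap convention are assumptions, not necessarily what stretcher uses internally.

```python
import re


def score_alignment(top, bot, match=5, mismatch=-4, gap_open=16, gap_extend=4,
                    free_end_gaps=False):
    """Score a pairwise DNA alignment with affine gaps (hypothetical values).

    A gap run of length L costs gap_open + (L - 1) * gap_extend. When
    free_end_gaps is True, gap runs touching either end of a row cost nothing.
    """
    if len(top) != len(bot):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(top, bot):
        if x != "-" and y != "-":
            score += match if x == y else mismatch
    for row in (top, bot):
        for run in re.finditer(r"-+", row):
            is_end_gap = run.start() == 0 or run.end() == len(row)
            if free_end_gaps and is_end_gap:
                continue
            score -= gap_open + (len(run.group()) - 1) * gap_extend
    return score


bac = "AATTCGGC" + "ATGAC"          # made-up BAC fragment
shifted = "A" + "-" * 8 + "TGAC"    # leading A pulled left, as in item 2
expected = "-" * 8 + "ATGAC"        # contig kept together, end gap in front

for free in (False, True):
    print(f"free_end_gaps={free}:",
          score_alignment(bac, shifted, free_end_gaps=free),
          score_alignment(bac, expected, free_end_gaps=free))
# free_end_gaps=False: -19 -19
# free_end_gaps=True: -19 25
```

With end gaps penalized, both placements score -19, so either can come out; only when end gaps are free does the expected placement win outright.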
The contigs have been flipped if necessary so that they are all in the
same direction as the BAC.
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech