[Biopython-dev] Newbler ACE file to SAM?

Kevin Jacobs <jacobs@bioinformed.com> bioinformed at gmail.com
Wed Aug 4 20:41:06 UTC 2010


On Wed, Aug 4, 2010 at 12:32 PM, Nick Loman <n.j.loman at bham.ac.uk> wrote:

> I'm pretty sure the ACE files contain the individual reads (or at the
> least, the trimmed, aligned portions of them) because this is the file one
> uses in Consed/Tablet to view an assembly. We may of course be talking at
> cross-purposes!
>
>
Hi Nick,

I've reviewed the Newbler ACE files and rediscovered the reason why they
weren't ideal in the first place: the alignment records in Newbler's output
are gapped based on a pseudo-multiple-alignment of all of the reads to the
reference, not a standard pairwise alignment.  So there is no easy way to
tell which gaps in each read were introduced by its own pairwise alignment
and which are artifacts of the multi-way alignment.  This means I'd need to
re-compute each read's alignment to the reference.  That should be
relatively easy with a round of the standard Smith-Waterman algorithm,
since the aligned start position is already known.
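
A minimal sketch of that re-alignment step, to make the idea concrete (this
is not the actual conversion script).  It runs a plain Smith-Waterman local
alignment of one ungapped read against a stretch of reference and reports a
SAM-style CIGAR.  The function name, the scoring values and the linear gap
penalty are all assumptions chosen for illustration; in practice the `ref`
argument would be a small window around the known start position rather
than the whole reference.

```python
from itertools import groupby


def smith_waterman_cigar(ref, read, match=2, mismatch=-1, gap=-2):
    """Locally align read to ref; return (score, 0-based ref start, CIGAR).

    Hypothetical helper: scoring values are illustrative, not Newbler's.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix, remembering the highest-scoring cell.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match / mismatch
                          H[i - 1][j] + gap,     # base only in the read
                          H[i][j - 1] + gap)     # base only in the reference
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the local score drops to zero.
    i, j = best_pos
    ops = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ops.append("M")
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            ops.append("I")
            i -= 1
        else:
            ops.append("D")
            j -= 1
    ops.reverse()

    # Read bases outside the local alignment become soft clips.
    lead, trail = i, len(read) - best_pos[0]
    cigar = "".join(f"{len(list(g))}{op}" for op, g in groupby(ops))
    if lead:
        cigar = f"{lead}S" + cigar
    if trail:
        cigar += f"{trail}S"
    return best, j, cigar
```

For example, `smith_waterman_cigar("AAAACCCCGGGGTTTT", "CCCCAGGGG")` places
the read at reference offset 4 with one inserted base in the middle.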

In other words, it is technically possible to use Newbler's ACE files, but
it really is simpler and easier to use the 454PairAlign.txt results, all
the more so because the 454PairAlign.txt files are often vastly smaller
than the 454Contig.ace files.

On the other hand, it should be easy to adapt my scripts to convert
non-Newbler ACE files to SAM/BAM provided that the reads are gapped for
pairwise alignment.  It has been so long since I've used consed/phred/phrap
that I don't remember if this is how it is normally done.
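
To illustrate what "gapped for pairwise alignment" buys you, here is a
rough sketch of turning one padded read into a CIGAR string.  It assumes
`*` as the pad character (as ACE files use) and, more importantly, assumes
a reference string padded in the same coordinates is available, which an
ACE file may not give you directly.  The function name is made up for the
example.

```python
from itertools import groupby


def padded_pair_to_cigar(ref_aln, read_aln, pad="*"):
    """Convert two equal-length padded strings into a CIGAR string.

    Hypothetical helper: assumes ref_aln and read_aln share one coordinate
    system and that pad marks a gap in either sequence.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("padded sequences must have the same length")

    ops = []
    for r, q in zip(ref_aln, read_aln):
        if r == pad and q == pad:
            continue            # column padded in both: multi-way artifact
        elif r == pad:
            ops.append("I")     # base present only in the read
        elif q == pad:
            ops.append("D")     # base present only in the reference
        else:
            ops.append("M")     # aligned base (match or mismatch)

    # Run-length encode the per-column operations.
    return "".join(f"{len(list(g))}{op}" for op, g in groupby(ops))
```

Columns that are padded in both strings are simply skipped, which is the
easy case; the trouble with the Newbler output described above is that
this information is not readily available per read.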

-Kevin



