[Biojava-l] Sanger sequencing trace files support

Jonas Dehairs jonas.dehairs at gmail.com
Tue Jul 12 15:26:28 UTC 2016


The 4.2 API currently has no methods for importing and handling
Sanger sequencing trace files (ABI, SCF). I am resorting to the
legacy classes in 1.9.1 (ChromatogramFactory and Chromatogram).

ChromatogramFactory only supports Sanger trace files with standard
ATGCN characters. It throws an
UnsupportedChromatogramFormatException when reading Sanger files that
contain IUPAC Ambiguity Codes (for example M = A or C). Even if I just
want to access the traces and ignore the base calls, that is
impossible with the current implementation, because the file cannot
be opened at all if it contains Ambiguity Codes.
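
To make the codes in question concrete, here is a minimal Java sketch
of the IUPAC nucleotide table and a check for calls outside ATGCN.
IupacCodes and its methods are hypothetical names used for
illustration; they are not part of BioJava and do not show how
ChromatogramFactory parses a file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; this class is NOT part of BioJava.
final class IupacCodes {
    // IUPAC nucleotide code -> the plain bases it can stand for.
    private static final Map<Character, String> EXPANSIONS = new HashMap<>();
    static {
        String[] table = {
            "A=A", "C=C", "G=G", "T=T",
            "R=AG", "Y=CT", "S=CG", "W=AT", "K=GT", "M=AC",
            "B=CGT", "D=AGT", "H=ACT", "V=ACG", "N=ACGT"
        };
        for (String entry : table) {
            EXPANSIONS.put(entry.charAt(0), entry.substring(2));
        }
    }

    // Returns the bases a code stands for, e.g. 'M' -> "AC".
    static String expand(char code) {
        String bases = EXPANSIONS.get(Character.toUpperCase(code));
        if (bases == null) {
            throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + code);
        }
        return bases;
    }

    // True for the characters the legacy reader is described as accepting.
    static boolean isStandard(char code) {
        return "ACGTN".indexOf(Character.toUpperCase(code)) >= 0;
    }

    // Index of the first base call outside ATGCN, or -1 if there is none.
    static int firstNonStandardCall(String baseCalls) {
        for (int i = 0; i < baseCalls.length(); i++) {
            if (!isStandard(baseCalls.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("M stands for: " + expand('M'));
        System.out.println("First non-standard call in ACMGT: "
                + firstNonStandardCall("ACMGT"));
    }
}
```

A reader that tolerated these codes could keep the call as-is and still
expose the four traces, which is all I need in my case.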

On a side note, I have been getting more and more questions from users
asking why they can't open their Sanger sequencing files (in my
program that uses BioJava). I think the popularity of CRISPR and the
characterization of CRISPR KO clones (which is likely to result in
heterozygous base calls) is increasing the number of people who have
these IUPAC Ambiguity Sanger files.

For now, I tell people to go back to the Sanger sequencing software
that exports the ABI or SCF files and disable IUPAC Ambiguity in the
export options. The base calling algorithm then just picks the
strongest signal wherever a call is ambiguous and sticks to standard
ATGCN characters.
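
As a rough illustration of that workaround, the following Java sketch
replaces every call outside ATGCN with the base whose channel has the
highest intensity at that peak. This is an assumption about the
general idea, not the actual algorithm of any sequencing software, and
StrongestSignalCaller, the A, C, G, T channel order, and the input
layout are all hypothetical.

```java
// Hypothetical sketch; not taken from BioJava or any base-calling software.
final class StrongestSignalCaller {
    // Assumed channel order for each peak's intensity array.
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    // Returns the base with the highest intensity; on a tie the earlier channel wins.
    static char strongest(int[] intensities) {
        if (intensities.length != BASES.length) {
            throw new IllegalArgumentException("Expected 4 intensities (A, C, G, T)");
        }
        int best = 0;
        for (int i = 1; i < intensities.length; i++) {
            if (intensities[i] > intensities[best]) {
                best = i;
            }
        }
        return BASES[best];
    }

    // Keeps A, C, G, T and N calls; replaces any other call with the strongest base.
    static String collapse(String baseCalls, int[][] peakIntensities) {
        if (baseCalls.length() != peakIntensities.length) {
            throw new IllegalArgumentException("One intensity array is needed per base call");
        }
        StringBuilder out = new StringBuilder(baseCalls.length());
        for (int i = 0; i < baseCalls.length(); i++) {
            char call = Character.toUpperCase(baseCalls.charAt(i));
            boolean standard = "ACGTN".indexOf(call) >= 0;
            out.append(standard ? call : strongest(peakIntensities[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[][] peaks = {
            {900, 10, 5, 8},    // clear A
            {400, 650, 12, 9},  // mixed A/C peak, called M
            {3, 7, 11, 800}     // clear T
        };
        System.out.println(collapse("AMT", peaks));
    }
}
```

The drawback is that the heterozygous position is then hidden in the
base calls, which is exactly the information people characterizing
CRISPR clones are looking for.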

Anyway, I am requesting the addition of the Chromatogram classes to
the new API, including support for opening files that contain IUPAC
Ambiguity Codes.

Thank you for this useful API,

