[Biojava-l] Sanger sequencing trace files support

Peter Cock p.j.a.cock at googlemail.com
Wed Jul 13 11:10:56 UTC 2016


Hi Jonas,

Thanks for emailing me that example with an M in the sequence.
Biopython parses it fine. Checking our existing sample test files,
I also found that this one has K, R and Y bases:

https://github.com/biopython/biopython/blob/master/Tests/Abi/3730.ab1

BioJava would be welcome to use that (double check with
Bow, CC'd, if you need it explicitly under a different licence).
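For readers less familiar with the notation: M, K, R and Y are IUPAC nucleotide ambiguity codes (M = A or C, K = G or T, R = A or G, Y = C or T). Here is a minimal Python sketch of how a trace parser could accept such calls instead of rejecting everything outside ATGCN. The table is the standard IUPAC one, but `expand_call`, `is_ambiguous` and `ambiguous_positions` are hypothetical helpers, not part of the Biopython or BioJava API.

```python
from __future__ import annotations

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_call(base: str) -> str:
    """Return the bases a single call may represent (case-insensitive).

    Raises ValueError only for characters that are not IUPAC codes at all,
    rather than for anything outside A, C, G, T, N.
    """
    try:
        return IUPAC_DNA[base.upper()]
    except KeyError:
        raise ValueError(f"not an IUPAC nucleotide code: {base!r}") from None


def is_ambiguous(base: str) -> bool:
    """True for any code standing for more than one base (including N)."""
    return len(expand_call(base)) > 1


def ambiguous_positions(calls: str) -> list[int]:
    """0-based positions of ambiguous calls, e.g. possible heterozygous sites."""
    return [i for i, b in enumerate(calls) if is_ambiguous(b)]
```

For example, `ambiguous_positions("ACMGTKNRY")` returns `[2, 5, 6, 7, 8]`. A parser built along these lines would report the M at position 2 rather than throwing an exception.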

Regards,

Peter


On Tue, Jul 12, 2016 at 5:41 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi Jonas,
>
> Are you happy to share sample file(s) using IUPAC ambiguity
> codes like M = A or C which could be freely used by BioJava
> and other projects as a test case?
>
> (I'm specifically asking for Biopython as I'm not sure if anyone
> has tried this with our ABI parser)
>
> Thanks,
>
> Peter
>
> On Tue, Jul 12, 2016 at 4:26 PM, Jonas Dehairs <jonas.dehairs at gmail.com> wrote:
>> The 4.2 API currently does not have methods for importing and
>> handling Sanger sequencing files (ABI, SCF). I'm currently resorting
>> to the legacy classes in 1.9.1 (ChromatogramFactory and Chromatogram).
>>
>> ChromatogramFactory only supports Sanger trace files with standard
>> ATGCN characters. It throws a
>> UnsupportedChromatogramFormatException upon reading Sanger files with
>> IUPAC ambiguity codes (for example M = A or C). Even if I just
>> want to access the traces and ignore the base calls, this is
>> impossible with the current implementation, since the file cannot
>> even be opened if it contains ambiguity codes.
>>
>> On a side note, I have been getting more and more questions from users
>> asking why they can't open their Sanger sequencing files (in my program
>> that uses BioJava). I think the popularity of CRISPR and the
>> characterization of CRISPR KO clones (which is likely to result in
>> heterozygous base calls) is increasing the number of people who have
>> Sanger files containing IUPAC ambiguity codes.
>>
>> For now, I tell people to go back to the Sanger sequencing software
>> that exports the ABI or SCF files and disable IUPAC ambiguity codes in
>> the export options. In that case the base-calling algorithm just picks
>> the strongest signal at ambiguous positions and sticks to the standard
>> ATGCN characters.
>>
>> Anyway, I am requesting the addition of the Chromatogram classes to
>> the new API, with support for opening files that contain IUPAC
>> ambiguity codes.
>>
>> Thank you for this useful API,
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biojava-l
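The point that traces should be readable regardless of the base calls follows from how ABI (ABIF) files are laid out: trace channels and the base-call string are stored under separate directory tags, so a reader can decode one without interpreting the other. Below is a minimal Python sketch, assuming the published ABIF layout: a 6-byte header (the `ABIF` magic plus a version number), a 28-byte root directory entry, and 28-byte big-endian directory entries whose data is stored inline when it is 4 bytes or less. `read_abif_directory` and `trace` are hypothetical names. This is not how BioJava's ChromatogramFactory or Biopython's ABI parser is actually implemented.

```python
import struct

# One ABIF directory entry: 28 bytes, big-endian. Fields: tag name (4s),
# tag number (I), element type (H), element size (H), element count (I),
# data size in bytes (I), data offset (I), data handle (I, unused here).
DIR_ENTRY = struct.Struct(">4sIHHIIII")
HEADER_SIZE = 6  # b"ABIF" magic + 2-byte version number


def read_abif_directory(blob: bytes) -> dict:
    """Map (tag name, tag number) -> raw data bytes for every directory entry.

    Only the directory is walked. No tag is interpreted, so unusual base
    calls (e.g. IUPAC ambiguity codes in PBAS) cannot make this fail.
    """
    if blob[:4] != b"ABIF":
        raise ValueError("not an ABIF file")
    # The root entry after the header gives the entry count and directory offset.
    root = DIR_ENTRY.unpack_from(blob, HEADER_SIZE)
    n_entries, dir_offset = root[4], root[6]

    entries = {}
    for k in range(n_entries):
        start = dir_offset + k * DIR_ENTRY.size
        name, number, _etype, _esize, _count, size, offset, _ = DIR_ENTRY.unpack_from(blob, start)
        if size <= 4:
            # Data of 4 bytes or less is stored inline in the offset field
            # itself, which sits 20 bytes into the entry.
            raw = blob[start + 20:start + 20 + size]
        else:
            raw = blob[offset:offset + size]
        entries[(name.decode("latin-1"), number)] = raw
    return entries


def trace(entries: dict, channel: int) -> list:
    """Decode one DATA channel as big-endian signed 16-bit samples."""
    raw = entries[("DATA", channel)]
    return list(struct.unpack(f">{len(raw) // 2}h", raw))
```

With a real file one would call `read_abif_directory(open(path, "rb").read())`. The analyzed traces are typically stored as DATA9 to DATA12, with the channel-to-base order given by the FWO_ tag and the base calls under PBAS, which this sketch never decodes.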

