[Biojava-l] Sanger sequencing trace files support

Andreas Prlic andreas at sdsc.edu
Wed Jul 13 12:58:11 UTC 2016


Thanks,

I filed this as a feature request for the BioJava 4 series on GitHub.

Andreas

On Wed, Jul 13, 2016 at 4:10 AM, Peter Cock <p.j.a.cock at googlemail.com>
wrote:

> Hi Jonas,
>
> Thanks for emailing me that example with an M in the sequence.
> Biopython could parse it fine, and having checked our existing
> sample test files, this one has K, R and Y bases:
>
> https://github.com/biopython/biopython/blob/master/Tests/Abi/3730.ab1
>
> BioJava would be welcome to use that (double check with
> Bow, CC'd, if you need it explicitly under a different licence).
>
> Regards,
>
> Peter
>
>
> On Tue, Jul 12, 2016 at 5:41 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> > Hi Jonas,
> >
> > Are you happy to share sample file(s) using IUPAC ambiguity
> > codes like M = A or C which could be freely used by BioJava
> > and other projects as a test case?
> >
> > (I'm specifically asking for Biopython as I'm not sure if anyone
> > has tried this with our ABI parser)
> >
> > Thanks,
> >
> > Peter
> >
> > On Tue, Jul 12, 2016 at 4:26 PM, Jonas Dehairs <jonas.dehairs at gmail.com>
> wrote:
> >> The 4.2 API currently does not have methods for importing and
> >> handling Sanger sequencing files (ABI, SCF). I'm currently resorting
> >> to the legacy classes in 1.9.1 (ChromatogramFactory and Chromatogram).
> >>
> >> ChromatogramFactory only supports Sanger trace files with standard
> >> ATGCN characters. It throws an
> >> UnsupportedChromatogramFormatException upon reading Sanger files with
> >> IUPAC Ambiguity Codes (for example M = A or C). Even if I would just
> >> like to access the traces and ignore the base calls, this is
> >> impossible with the current implementation since we can't even open
> >> the file if it contains Ambiguity codes.
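[Editorial illustration, not part of the original message.] To make the ambiguity codes mentioned above concrete, here is a minimal Java sketch of the standard IUPAC nucleotide code table. It is not a BioJava class; the names `IupacCodes`, `expand`, and `hasAmbiguityCodes` are hypothetical and chosen only for this example. It assumes Java 9 or later for `Map.ofEntries`.

```java
import java.util.Map;

/** Hypothetical helper (not part of BioJava): IUPAC nucleotide ambiguity codes. */
final class IupacCodes {
    // Standard IUPAC nucleotide codes and the bases each one can stand for.
    private static final Map<Character, String> CODES = Map.ofEntries(
        Map.entry('A', "A"), Map.entry('C', "C"),
        Map.entry('G', "G"), Map.entry('T', "T"),
        Map.entry('R', "AG"), Map.entry('Y', "CT"),
        Map.entry('S', "CG"), Map.entry('W', "AT"),
        Map.entry('K', "GT"), Map.entry('M', "AC"),
        Map.entry('B', "CGT"), Map.entry('D', "AGT"),
        Map.entry('H', "ACT"), Map.entry('V', "ACG"),
        Map.entry('N', "ACGT"));

    /** Returns the bases a code can represent, e.g. 'M' gives "AC". */
    static String expand(char code) {
        String bases = CODES.get(Character.toUpperCase(code));
        if (bases == null) {
            throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + code);
        }
        return bases;
    }

    /** True if the base calls contain a multi-base code other than N (R, Y, M, ...). */
    static boolean hasAmbiguityCodes(String baseCalls) {
        for (char c : baseCalls.toUpperCase().toCharArray()) {
            String bases = CODES.get(c);
            if (bases != null && bases.length() > 1 && c != 'N') {
                return true;
            }
        }
        return false;
    }
}
```

A caller could use `hasAmbiguityCodes` on the base-call string to decide whether a file is likely to hit the limitation described above, without this implying anything about how the legacy parser itself detects such files.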
> >>
> >> On a side note, I have been getting more and more questions from users
> >> why they can't open their Sanger sequencing files (in my program that
> >> uses BioJava). I think the popularity of CRISPR and the
> >> characterization of CRISPR KO clones (which is likely to result in
> >> heterozygous base calls) is increasing the number of people that have
> >> these IUPAC Ambiguity Sanger files.
> >>
> >> For now, I tell people to go back to the Sanger sequencing software
> >> that exports the ABI or SCF files and disable IUPAC Ambiguity in the
> >> export options. In that case the base calling algorithm just picks the
> >> strongest signals in case of ambiguity and sticks to standard ATGCN
> >> characters.
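[Editorial illustration, not part of the original message.] The fallback described above, where the strongest signal wins, can be sketched as follows. This is an assumption about the general idea only, not the algorithm of any particular sequencing software; `AmbiguityResolver` and `strongest` are hypothetical names, and the intensities are assumed to be non-negative trace values sampled at the peak position.

```java
/** Hypothetical sketch: collapse an ambiguous call to the base with the strongest signal. */
final class AmbiguityResolver {
    /**
     * @param candidates bases allowed by the ambiguity code, e.g. "AC" for M
     * @param a intensity of the A trace at the peak (assumed non-negative)
     * @param c intensity of the C trace at the peak
     * @param g intensity of the G trace at the peak
     * @param t intensity of the T trace at the peak
     * @return the candidate with the highest intensity; on a tie the earlier
     *         candidate wins; 'N' if candidates is empty
     */
    static char strongest(String candidates, int a, int c, int g, int t) {
        char best = 'N';
        int bestSignal = -1;
        for (char base : candidates.toCharArray()) {
            int signal;
            switch (base) {
                case 'A': signal = a; break;
                case 'C': signal = c; break;
                case 'G': signal = g; break;
                case 'T': signal = t; break;
                default:
                    throw new IllegalArgumentException("Unexpected base: " + base);
            }
            // Strict comparison keeps the first candidate when signals are equal.
            if (signal > bestSignal) {
                bestSignal = signal;
                best = base;
            }
        }
        return best;
    }
}
```

For an M call, `strongest("AC", ...)` compares only the A and C traces, so a strong but unrelated G or T signal at the same position does not affect the result.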
> >>
> >> Anyway, I am requesting the addition of the Chromatogram classes to
> >> the new API with support for opening files if they contain IUPAC
> >> Ambiguity Codes.
> >>
> >> Thank you for this useful API,
> >> _______________________________________________
> >> Biojava-l mailing list  -  Biojava-l at mailman.open-bio.org
> >> http://mailman.open-bio.org/mailman/listinfo/biojava-l