[Biopython-dev] SeqIO Abi Parser

Wibowo Arindrarto w.arindrarto at gmail.com
Tue Jul 26 09:08:38 UTC 2011


Hi everyone,

A few weeks ago I wrote about my interest in making Biopython able to parse
Abi trace files. I've finished writing the SeqIO plugin and some tests. I
think this would be useful to a number of people, so I was wondering what
the next step should be: how will my code be reviewed, and should I just
open a pull request? Of course, there may be things I missed when writing
the plugin, so feel free to criticize/comment :)!

Here's the SeqIO plugin:
https://github.com/bow/biopython/blob/seqio-abi/Bio/SeqIO/AbiIO.py

Looking forward to your replies,
---
Wibowo Arindrarto (bow)
http://bow.web.id



On Thu, Jul 7, 2011 at 03:16, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:

> Hi everyone,
>
> This is my first post in the dev mailing list, so greetings :).
>
> I've been using Biopython for a few months in total now (spread over a
> period of ~1.5 years) and before that Python for ~0.5 years. Most of the
> time, I'm working with Sanger sequencing results and at one point I was a
> bit disappointed that I couldn't find any (Bio)python module for reading
> .ab1 files. That compelled me to write my first Python module, which reads
> those files and extracts the useful information from them. In the process I
> became more interested in Python itself and finally thought it might be neat
> if Biopython could do this out of the box.
>
> So I forked the main repo and adapted my module into a parser for the
> SeqIO submodule that reads Abi files. It's not 100% done yet, but if
> anyone is interested in seeing/commenting on/criticizing the code, I'd
> appreciate that very much. Here's the link:
> https://github.com/bow/biopython/blob/seqio-abif/Bio/SeqIO/AbiIO.py
>
> Some features to note:
> - I've included a method to trim the sequence based on its quality scores
> - the parser does not extract all of the metadata in the trace files, only
> the fields I consider important for further analysis/annotation. Of course,
> this could be changed if the community thinks some other data should be
> included/excluded
> - For those of you already familiar with the Abi format, I deliberately
> chose the 'PBAS2' tag for the sequence information, which holds the
> unedited bases as called by the sequencing program's base caller.
>
> Some things that I'm doing right now:
> - writing unit tests
> - making sure it's compatible with Python 3 (thanks Peter :)! )
> - completing the docs
> - making sure it's compatible with most Abi format versions. Currently I've
> only tested it with files from the 310, 3100, and 3700 machines. Does anyone
> have files from other machines that I could test this with?
>
> As I understand it, this is not the only Sanger sequencing trace format
> out there (SCF is another, for example). I would be glad to learn more and
> write a parser for the SCF format as well. The problem is, I'm not sure this
> would be useful in the long run, as I've personally never seen anyone use an
> SCF file and so I've never had a chance to play around with one. If anyone
> has SCF files lying around and thinks SCF support would be beneficial, I'd
> be happy to receive them :).
>
> I guess that's all for now. Thanks for reading!
>
> ---
> Wibowo Arindrarto (bow)
> http://bow.web.id
>
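
The quality-based trimming mentioned in the quoted message can be illustrated
with a small sketch. The message does not say which algorithm the plugin uses,
so this is only an assumption: it follows the general idea of Richard Mott's
trimming approach, scoring each base as (cutoff - error probability) and
keeping the highest-scoring contiguous segment. The function name
trim_by_quality and the default cutoff of 0.05 are hypothetical and are not
taken from AbiIO.py.

```python
def trim_by_quality(seq, quals, cutoff=0.05):
    """Return the best-scoring segment of seq and its quality scores.

    Hypothetical helper, not the plugin's actual code. quals holds one
    Phred score per base; cutoff is the highest tolerated error
    probability (0.05 is an assumed default).
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")

    best_sum = 0.0
    best_start = best_end = 0
    running = 0.0
    start = 0
    for i, q in enumerate(quals):
        # Convert the Phred score to an error probability, then score the
        # base: positive when it is better than the cutoff, negative when worse.
        running += cutoff - 10 ** (q / -10.0)
        if running <= 0:
            # A segment ending here is not worth keeping; restart after it.
            running = 0.0
            start = i + 1
        elif running > best_sum:
            best_sum = running
            best_start, best_end = start, i + 1

    return seq[best_start:best_end], quals[best_start:best_end]
```

With Phred scores of 2, 3, 40, 40, 40, 40, 3, 2 for the bases ACGTACGT, the
two low-quality bases at each end are dropped and only the middle four bases
are kept. A read with no base better than the cutoff comes back empty.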


