[Biopython] Support for Xdna, SnapGene and GCK formats

Damien Goutte-Gattat dgouttegattat at incenp.org
Tue Jul 30 18:06:04 UTC 2019


[Resending this mail, as a broken DKIM signature may have led 
DMARC-compliant spam filters to outright reject the message. Apologies 
to those who did receive it at the first attempt.]

Hi Biopython folks,

Last December I wrote to this mailing list [1] to present new parsers for 
potential addition to Biopython's SeqIO module. That mail may have gone 
unnoticed, so allow me to present those parsers again.

There were initially two parsers, one for the "Xdna" format (used by DNA 
Strider and Serial Cloner [2]) and one for the SnapGene format (used by, well, 
SnapGene [3]). Since then, I have added a third parser for the "GCK" format 
(used by Gene Construction Kit [4]).

Those parsers are currently available in a Python module called 
"incenp.binseqs". You can find the source code of that module on my forge [5], 
and the module can also be installed from PyPI (`pip install incenp.binseqs`).

If you want to test them, all you have to do after installing the module is 
import the `incenp.bio.seqio` package after importing Biopython's SeqIO. You 
can then use SeqIO's standard API:

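 from Bio import SeqIO
 # Imported for its side effect: makes the 'xdna', 'snapgene' and 'gck'
 # formats available to SeqIO's standard functions.
 import incenp.bio.seqio

 xdna_record = SeqIO.read('serialcloner_file.xdna', 'xdna')
 snap_record = SeqIO.read('snapgene_file.dna', 'snapgene')
 gck_record  = SeqIO.read('gck_file.gck', 'gck')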

I wrote the SnapGene parser using a (partial) specification from the vendor; I 
had no such specifications for the Xdna and GCK formats, so I wrote the 
corresponding parsers after reverse-engineering some sample files I had. 
Obviously I can make no guarantees about their correctness.
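To give an idea of what such a binary format looks like, here is a minimal sketch of reading a SnapGene-style file. As we understand the layout, the file is a sequence of packets, each made of a 1-byte type, a 4-byte big-endian length and a payload; the first packet (type 0x09) starts with the magic string `SnapGene`, and a type 0x00 packet holds the DNA, preceded by a flag byte whose lowest bit marks a circular molecule. These layout details are assumptions, not the parser from `incenp.binseqs`; `iter_packets`, `read_sequence` and `build_packet` are hypothetical helpers, and the example file is synthetic.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_packets(handle: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (packet_type, payload) pairs.

    Assumed layout: 1-byte type, 4-byte big-endian length, then payload.
    """
    while True:
        header = handle.read(5)
        if not header:
            return  # clean end of file
        if len(header) < 5:
            raise ValueError("Truncated packet header")
        packet_type, length = struct.unpack(">BI", header)
        payload = handle.read(length)
        if len(payload) != length:
            raise ValueError("Truncated packet payload")
        yield packet_type, payload


def read_sequence(handle: BinaryIO) -> Tuple[str, str]:
    """Return (sequence, topology) from the first DNA packet.

    Assumptions: first packet is a type 0x09 'cookie' starting with
    b"SnapGene"; DNA is in a type 0x00 packet whose first byte is a
    flag field (bit 0 set = circular), followed by ASCII bases.
    """
    packets = iter_packets(handle)
    first = next(packets, None)
    if first is None or first[0] != 0x09 or not first[1].startswith(b"SnapGene"):
        raise ValueError("Not a SnapGene file (missing cookie packet)")
    for packet_type, payload in packets:
        if packet_type == 0x00:
            if not payload:
                raise ValueError("Empty DNA packet")
            topology = "circular" if payload[0] & 0x01 else "linear"
            return payload[1:].decode("ascii"), topology
    raise ValueError("No DNA packet found")


def build_packet(packet_type: int, payload: bytes) -> bytes:
    """Encode one packet in the assumed layout (for synthetic test data)."""
    return struct.pack(">BI", packet_type, len(payload)) + payload


# Synthetic file: cookie (magic + 6 zero placeholder bytes), then DNA packet.
example = build_packet(0x09, b"SnapGene" + bytes(6)) + build_packet(0x00, b"\x01ACGT")
print(read_sequence(io.BytesIO(example)))  # ('ACGT', 'circular')
```

A real parser would also decode the other packet types (features, primers, notes) and build a full `SeqRecord`; this sketch stops at the sequence and its topology.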

I would like to propose those parsers for inclusion into Biopython. There 
seems to be some interest [6], so I plan to change the namespace from my own 
`incenp.bio.seqio` to Biopython's `Bio.SeqIO`, update the license to match 
Biopython's current licensing terms, and then prepare a pull request against 
the latest code base.

Any comments, suggestions, or criticism on this proposal are welcome.

Regards,

- Damien

[1] https://lists.open-bio.org/pipermail/biopython/2018-December/016574.html

[2] http://serialbasics.free.fr/Serial_Cloner.html

[3] https://www.snapgene.com/

[4] http://www.textco.com/gene-construction-kit.php

[5] https://git.incenp.org/damien/binseqs

[6] https://twitter.com/pjacock/status/1155932488813797378