[Biopython] Support for Xdna, SnapGene and GCK formats

Damien Goutte-Gattat dgouttegattat at incenp.org
Tue Jul 30 16:41:17 UTC 2019


Hi Biopython folks,

Last December I wrote to this mailing list [1] to present new parsers 
for potential addition to Biopython's SeqIO module. That mail may have 
gone unnoticed, so allow me to present those parsers again.

There were initially two parsers, one for the "Xdna" format (used by DNA 
Strider and Serial Cloner [2]) and one for the SnapGene format (used by, 
well, SnapGene [3]). Since then I added a third parser for the "GCK" 
format (used by Gene Construction Kit [4]).

Those parsers are for now available in a Python module called 
"incenp.binseqs". You can find the source code of that module on my 
forge [5]; the module can also be installed from PyPI (`pip install 
incenp.binseqs`).

If you want to test them, all you have to do after installing the module 
is import the `incenp.bio.seqio` package after importing Biopython's 
SeqIO; you may then use SeqIO's standard API:

  from Bio import SeqIO
  import incenp.bio.seqio

  xdna_record = SeqIO.read('serialcloner_file.xdna', 'xdna')
  snap_record = SeqIO.read('snapgene_file.dna', 'snapgene')
  gck_record  = SeqIO.read('gck_file.gck', 'gck')
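When handling a mix of such files, the format name could be chosen from the file extension. The following is only a minimal sketch: the helper names `guess_format` and `read_record` are hypothetical (they are not part of `incenp.binseqs` or Biopython), and the extension mapping is simply taken from the file names used in the example above.

```python
from pathlib import Path

# Extension-to-format mapping, assumed from the example file names above;
# files of these formats may well use other extensions.
FORMAT_BY_EXTENSION = {
    '.xdna': 'xdna',
    '.dna': 'snapgene',
    '.gck': 'gck',
}


def guess_format(filename):
    """Return the SeqIO format name matching a file's extension."""
    extension = Path(filename).suffix.lower()
    if extension not in FORMAT_BY_EXTENSION:
        raise ValueError('No known format for extension %r' % extension)
    return FORMAT_BY_EXTENSION[extension]


def read_record(filename):
    """Read the single record of a file with the parser for its extension."""
    # Imported here so that guess_format() stays usable without Biopython.
    from Bio import SeqIO
    # Imported after SeqIO, as required to make the new formats available.
    import incenp.bio.seqio  # noqa: F401

    return SeqIO.read(filename, guess_format(filename))
```

With the module installed, `read_record('snapgene_file.dna')` would then be equivalent to the explicit `SeqIO.read('snapgene_file.dna', 'snapgene')` call.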

I wrote the SnapGene parser using a (partial) specification from the 
vendor; I had no such specifications for the Xdna and GCK formats, so I 
wrote the corresponding parsers after reverse-engineering some sample 
files I had. Obviously I can make no guarantees about their correctness.

I would like to propose those parsers for inclusion into Biopython.  
There seems to be some interest [6], so I plan to change the namespace 
from my own `incenp.bio.seqio` to Biopython's `Bio.SeqIO`, update the 
license to match Biopython's current licensing terms, and then 
prepare a pull request against the latest code base.

Any comments, suggestions, or criticism of that proposal are welcome.

Regards,

- Damien

[1] 
https://lists.open-bio.org/pipermail/biopython/2018-December/016574.html

[2] http://serialbasics.free.fr/Serial_Cloner.html

[3] https://www.snapgene.com/

[4] http://www.textco.com/gene-construction-kit.php

[5] https://git.incenp.org/damien/binseqs

[6] https://twitter.com/pjacock/status/1155932488813797378