[Bioperl-l] FW: BioPerl SeqIO-like system in BioPython

Chris Fields cjfields at uiuc.edu
Tue Sep 19 17:34:00 UTC 2006

The following is a request from Peter, one of the Biopython developers, for
suggestions from us Bioperlers (Bioperlites?).  They are trying to implement
a SeqIO-like system for BioPython.  Any suggestions/hints/help would be
greatly appreciated.


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

Forwarded message:

Chris Fields wrote:
> I know that BioPython is trying to get a SeqIO-like system set up.  
> Let us know if you need any help/advice.
> Cheers!

I've thought of a couple of things; if you want to pass this on to the
appropriate BioPerl people, please do so.

Cheers, Peter

Internal names for formats
I want to use simple strings to describe the different file formats for use
as function arguments (e.g. "fasta", "genbank"), and ideally use the same
names as BioPerl:


Is the webpage authoritative? I would guess the names match the module names
under Bio/SeqIO/*.pm and Bio/AlignIO/*.pm.

The comments from Bio/AlignIO.pm list a few more names (not listed under

For the moment, my intention is to also include multiple alignments as part
of our sequence reading support.
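One way to picture "simple strings as function arguments" is a lookup table from lowercase format names to parser functions. The sketch below is hypothetical: `read_records`, `PARSERS` and the toy FASTA parser are illustrative names, not BioPerl or Biopython API, and only the "fasta" entry is implemented. Alignment formats would be registered in the same table.

```python
import io
from typing import Callable, Dict, Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, str]  # hypothetical minimal record: (identifier, sequence)


def parse_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (id, sequence) pairs from a FASTA-formatted text handle."""
    title: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            # Use the first word of the title line as the identifier
            words = line[1:].split()
            title = words[0] if words else ""
            chunks = []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


# Hypothetical registry: keys mimic the lowercase style of the
# Bio/SeqIO/*.pm module names (e.g. "fasta", "genbank").
PARSERS: Dict[str, Callable[[TextIO], Iterator[Record]]] = {
    "fasta": parse_fasta,
    # further sequence and alignment formats would be registered here
}


def read_records(handle: TextIO, fmt: str) -> Iterator[Record]:
    """Dispatch on a format-name string, case-insensitively."""
    try:
        parser = PARSERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"Unknown format: {fmt!r}") from None
    return parser(handle)


if __name__ == "__main__":
    demo = io.StringIO(">seq1 demo\nACGT\nAC\n>seq2\nGG-A\n")
    print(list(read_records(demo, "FASTA")))
```

Because the lookup happens before the generator is created, an unknown format name fails immediately rather than on first iteration.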

How do you cope with assorted gap characters (typically dot/period and dash,
'.' and '-'), and with the different ways file formats treat them?

For example, multiple alignments in Fasta format probably use either,
depending on the source of the file.

Clustal and Phylip seem to use '-' as a gap; MSF uses '.'.

Phylip apparently treats '.' as meaning "same character as the previous
sequence", which is asking for trouble.
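If '.' really does mean "same character as the previous sequence", a reader has to resolve it before any gap handling, because the result depends on record order. A minimal sketch assuming exactly that interpretation (the function name is hypothetical, and the PHYLIP documentation should be checked for which sequence is the reference):

```python
from typing import List


def expand_dot_differencing(seqs: List[str]) -> List[str]:
    """Replace '.' with the character in the same column of the previous
    (already expanded) sequence. Assumes the interpretation described above
    and that all sequences are aligned to the same length."""
    expanded: List[str] = []
    for i, seq in enumerate(seqs):
        if i == 0:
            # The first sequence has nothing to copy from
            if "." in seq:
                raise ValueError("first sequence cannot use '.'")
            expanded.append(seq)
            continue
        prev = expanded[-1]
        if len(seq) != len(prev):
            raise ValueError("sequences must be aligned to equal length")
        expanded.append("".join(p if c == "." else c for c, p in zip(seq, prev)))
    return expanded
```

For example, `["ACGT-A", "..C..A", "T....."]` expands to `["ACGT-A", "ACCT-A", "TCCT-A"]`. Running a '.'-to-'-' gap conversion first would instead turn every "same as above" dot into a gap, which is why the order matters.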

Does BioPerl make any efforts to convert everything into an internal
standard (say '-') when loading files, and convert as appropriate when
writing them?
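A minimal sketch of the "internal standard" approach asked about here: map every gap character to '-' on input, and back to the target format's preferred character on output. The per-format table reflects only the observations above (Clustal and Phylip '-', MSF '.'), and the names `GAP_CHAR`, `to_internal` and `to_output` are hypothetical, not BioPerl behaviour.

```python
from typing import Dict

INTERNAL_GAP = "-"

# Hypothetical table of preferred gap characters when writing each format,
# based only on the observations in this message.
GAP_CHAR: Dict[str, str] = {
    "clustal": "-",
    "phylip": "-",
    "msf": ".",
}


def to_internal(seq: str, gap_chars: str = ".-") -> str:
    """Replace every recognised gap character with the internal gap '-'."""
    return seq.translate(str.maketrans({c: INTERNAL_GAP for c in gap_chars}))


def to_output(seq: str, fmt: str) -> str:
    """Convert internal '-' gaps to the gap character preferred by fmt;
    formats not in the table (e.g. FASTA) keep '-'."""
    gap = GAP_CHAR.get(fmt.lower(), INTERNAL_GAP)
    return seq.replace(INTERNAL_GAP, gap)
```

This round trip loses information whenever a file deliberately distinguishes '.' from '-', so it would have to run after any format-specific expansion such as Phylip's dot convention.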

This old thread suggests it is (was) left in the end user's hands:

