[Open-bio-l] Agreed name for UniProt XML file format

Peter biopython at maubp.freeserve.co.uk
Tue Jul 27 15:31:38 UTC 2010


Dear all,

BioPerl, Biopython and EMBOSS all have a set of sequence file format
names (as strings), used as arguments to their SeqIO modules or as
command line arguments in EMBOSS. I understand that in BioRuby
and BioJava you have named classes instead(?). We currently have
reasonably consistent existing names. For the FASTQ file formats
we managed to agree consistent naming for the Sanger, Solexa and
Illumina 1.3+ variants. Now for the next "new" format...

Andrea Pierleoni (CC'd) has been working on parsing the UniProt
XML file format in Biopython - this is essentially an XML replacement
for the old SwissProt plain text file format which is called "swiss" in
BioPerl, Biopython and EMBOSS (although EMBOSS also allows
"sw" and "swissprot" as well).
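To illustrate the alias situation, here is a minimal Python sketch of
how a toolkit might map alternative names onto one shared format name.
The table and the function name canonical_format are hypothetical (not
taken from any of the projects), the only aliases listed are the ones
quoted above, and case-insensitive matching is an assumption.

```python
# Hypothetical alias table: only the names quoted in this email are used.
FORMAT_ALIASES = {
    "sw": "swiss",
    "swissprot": "swiss",
}


def canonical_format(name):
    """Return the shared format name for a user-supplied format string.

    Matching is case-insensitive here, which is an assumption rather
    than documented behaviour of any toolkit.
    """
    key = name.strip().lower()
    # Unknown names pass through unchanged so new formats need no entry.
    return FORMAT_ALIASES.get(key, key)
```

With this sketch, "sw", "swissprot" and "swiss" all resolve to "swiss",
while a name with no alias entry is returned as given (lower-cased).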

We were originally suggesting calling this new format "uniprot",
http://bioperl.org/pipermail/open-bio-l/2010-January/000609.html

Andrea has since pointed out that in the EBI REST services the file
format is referred to as "uniprot-xml", which is also less ambiguous
(after all the old "swiss" plain text format might equally be referred
to as the plain text UniProt format).

So, what do people feel about standardising on "uniprot" and/or
"uniprot-xml" as the format name in Biopython, BioPerl & EMBOSS?

Thanks,

Peter

P.S. Chris, am I right in thinking that if BioPerl were to support this
file format under the name "uniprot-xml" this would be equivalent
to accepting format="uniprot" and variant="xml"? And furthermore,
assuming you regard this as the default/only variant, Bio::SeqIO
would also just accept format="uniprot"?
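
To make the question concrete, this is roughly the behaviour I have in
mind, sketched in Python rather than Perl. It is not BioPerl's actual
code: the function name split_format and the DEFAULT_VARIANT table are
hypothetical, and treating "xml" as the default variant of "uniprot"
is exactly the assumption being asked about.

```python
# Assumption under discussion: "xml" is the default (only) variant.
DEFAULT_VARIANT = {"uniprot": "xml"}


def split_format(name):
    """Split a combined name such as "uniprot-xml" into (format, variant).

    A name without a hyphen gets the format's default variant if one is
    registered in DEFAULT_VARIANT, otherwise None.
    """
    fmt, sep, variant = name.lower().partition("-")
    if not sep:
        variant = DEFAULT_VARIANT.get(fmt)
    return fmt, variant
```

Under these assumptions "uniprot-xml" and "uniprot" would both resolve
to the same (format, variant) pair, while "swiss" would have no variant.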


