[Open-bio-l] Agreed name for UniProt XML file format

Chris Fields cjfields at illinois.edu
Tue Jul 27 16:14:41 UTC 2010


On Jul 27, 2010, at 10:31 AM, Peter wrote:

> Dear all,
> 
> BioPerl, Biopython and EMBOSS all have a set of sequence file format
> names (as strings), used as arguments to their SeqIO modules or as
> command line arguments in EMBOSS. I understand that in BioRuby
> and BioJava you have named classes instead(?). We currently have
> reasonably consistent existing names. For the FASTQ file formats
> we managed to agree consistent naming for the Sanger, Solexa and
> Illumina 1.3+ variants. Now for the next "new" format...
> 
> Andrea Pierleoni (CC'd) has been working on parsing the UniProt
> XML file format in Biopython - this is essentially an XML replacement
> for the old SwissProt plain text file format which is called "swiss" in
> BioPerl, Biopython and EMBOSS (although EMBOSS also allows
> "sw" and "swissprot" as well).
> 
> We were originally suggesting calling this new format "uniprot",
> http://bioperl.org/pipermail/open-bio-l/2010-January/000609.html
> 
> Andrea has since pointed out that in the EBI REST services the file
> format is referred to as "uniprot-xml", which is also less ambiguous
> (after all the old "swiss" plain text format might equally be referred
> to as the plain text UniProt format).
> 
> So, what do people feel about standardising on "uniprot" and/or
> "uniprot-xml" as the format name in Biopython, BioPerl & EMBOSS?

Agree with Hilmar: 'uniprot-xml'.

> Thanks,
> 
> Peter
> 
> P.S. Chris, am I right in thinking that if BioPerl were to support this
> file format under the name "uniprot-xml" this would be equivalent
> to accepting format="uniprot" and variant="xml"? And furthermore
> and assuming you regard this as the default/only variant, Bio::SeqIO
> would also just accept format="uniprot"?

If only 'uniprot' is passed, we could handle it either way: delegate to 'swiss' unless the 'xml' variant is specified (so 'uniprot' is just an alias of 'swiss'), or always use the XML handler for 'uniprot' and ignore the variant argument. Either way is fine with us.
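The two options above can be sketched roughly as follows. This is a hypothetical Python illustration, not actual BioPerl or Biopython code: the helper names split_format and resolve_handler, the policy strings "xml" and "swiss-alias", and the returned handler keys are all made up for the example. It assumes a name like "uniprot-xml" is read as format "uniprot" plus variant "xml".

```python
# Hypothetical sketch only: not the real BioPerl/Biopython format registry.
SWISS = "swiss"              # handler key for the plain-text SwissProt format
UNIPROT_XML = "uniprot-xml"  # handler key for the UniProt XML format


def split_format(name, variant=None):
    """Split 'format-variant' into (format, variant); an explicit variant wins."""
    name = name.lower()
    if "-" in name:
        base, _, suffix = name.partition("-")
        return base, variant or suffix
    return name, variant


def resolve_handler(name, variant=None, policy="xml"):
    """Map a format name (plus optional variant) to a handler key.

    policy="xml":         'uniprot' always means XML; any variant is ignored.
    policy="swiss-alias": 'uniprot' means XML only with the 'xml' variant,
                          otherwise it is treated as an alias of 'swiss'.
    """
    base, var = split_format(name, variant)
    if base == "uniprot":
        if policy == "xml":
            return UNIPROT_XML
        if policy == "swiss-alias":
            return UNIPROT_XML if var == "xml" else SWISS
        raise ValueError(f"unknown policy: {policy!r}")
    if base in ("swiss", "sw", "swissprot"):  # EMBOSS-style aliases
        return SWISS
    raise ValueError(f"unknown format: {name!r}")


print(resolve_handler("uniprot-xml"))                     # uniprot-xml
print(resolve_handler("uniprot", policy="swiss-alias"))   # swiss
```

Under either policy, "uniprot-xml" and format="uniprot" with variant="xml" resolve to the same handler; the policies differ only in what a bare "uniprot" means.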

chris