[EMBOSS] Unknown output format 'refseqp' and 'genpept'

Peter biopython at maubp.freeserve.co.uk
Tue Dec 8 13:53:13 UTC 2009


On Tue, Dec 8, 2009 at 1:32 PM, Peter Rice <pmr at ebi.ac.uk> wrote:
>
> Peter wrote:
>>
>> Hi,
>>
>> I have a protein IntelliGenetics file used in the Biopython test suite:
>> http://biopython.org/SRC/biopython/Tests/IntelliGenetics/VIF_mase-pro.txt

It probably doesn't matter what the input file is here; the fact that
it was an (obsolete) format like IntelliGenetics was just chance, as
I was working on a Biopython unit test.

>> I am using EMBOSS 6.1.0 (patch level 2 I think), and I am trying
>> to turn this into a "GenBank Protein File", or GenPept file, using
>> EMBOSS seqret.
>>
>> Doesn't EMBOSS seqret support genpept/refseqp as an output format?
>
> Oddly enough you are the first to ask for it.

That surprises me a little bit.

Could I suggest you treat known input formats which are not supported
as output formats a little differently and instead of this:

unknown output format 'genpept'

give something like:

format 'genpept' is not supported for output (only input)

This would help the user rule out having a typo etc.
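
As a rough sketch of the idea (in Python rather than EMBOSS's C, and with
made-up format tables and a hypothetical function name, not the real
EMBOSS lists), the check could look something like this:

```python
# Hypothetical format tables for illustration only; these are NOT the
# real EMBOSS input/output format lists.
INPUT_FORMATS = {"fasta", "embl", "genbank", "genpept", "refseqp"}
OUTPUT_FORMATS = {"fasta", "embl", "genbank"}


def output_format_error(name):
    """Return an error message for an unusable output format, else None."""
    key = name.lower()
    if key in OUTPUT_FORMATS:
        return None
    if key in INPUT_FORMATS:
        # Known format, but read-only: tell the user it is not a typo.
        return "format '%s' is not supported for output (only input)" % name
    return "unknown output format '%s'" % name
```

With this, a misspelling such as 'genpetp' would still get the "unknown
output format" message, while 'genpept' would get the more specific one.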

> Does biopython have a definition of the fields it expects to write out in a
> GenPept or RefseqP format file? We would be able to allow GenBank as an
> alias for, presumably, genpept.

Not explicitly, no. I was hoping to use EMBOSS for cross validation ;)

With hindsight this may have been a mistake, but we use "genbank"
format to mean either nucleotides or proteins. On parsing we just
look at the units of length in the LOCUS line (bp or aa). We also
try to cope with both the current NCBI files and some older variants
we have in our unit tests (different offsets in the LOCUS line).
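
To illustrate the idea only (this is a simplified sketch, not the actual
Biopython parser code, and the function name and example LOCUS lines are
invented), splitting on whitespace avoids depending on column offsets:

```python
def locus_units(locus_line):
    """Guess 'nucleotide' or 'protein' from the length units on a LOCUS line."""
    tokens = locus_line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    # Look for "<number> bp" or "<number> aa" anywhere after the keyword,
    # so the exact column positions of the fields do not matter.
    for previous, token in zip(tokens[1:], tokens[2:]):
        if previous.isdigit() and token in ("bp", "aa"):
            return "nucleotide" if token == "bp" else "protein"
    raise ValueError("no bp/aa length units found")


# Invented example lines, for illustration only.
protein_line = "LOCUS       EXAMPLE_P                192 aa            linear   PRI 01-JAN-2000"
dna_line = "LOCUS       EXAMPLE_N               5028 bp    DNA     linear   PLN 01-JAN-2000"
print(locus_units(protein_line))
print(locus_units(dna_line))
```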

> Might be a good time to merge the format names and details from biopython
> and emboss. Where can I find the biopython ones?

There are two tables on the wiki which include version information:
http://biopython.org/wiki/SeqIO
http://biopython.org/wiki/AlignIO

You can also consult the built-in documentation, also available online:
http://biopython.org/DIST/docs/api/Bio.SeqIO-module.html
http://biopython.org/DIST/docs/api/Bio.AlignIO-module.html

For a long time I avoided having aliases (multiple names for the same
thing). However, we now treat "gb" as an alias for "genbank" (since
this is what the NCBI use in Entrez). We also treat "fastq-sanger" and
"fastq" the same.
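
Conceptually the alias handling is nothing more than a lookup before the
format name is used. A minimal sketch (the dictionary and function names
are invented for illustration, and which name counts as the canonical one
for FASTQ is my assumption here, not how the Biopython code is organised):

```python
# Illustrative alias table: alias -> canonical format name.
FORMAT_ALIASES = {
    "gb": "genbank",
    "fastq-sanger": "fastq",
}


def canonical_format(name):
    """Map an alias to its canonical format name; other names pass through."""
    return FORMAT_ALIASES.get(name, name)
```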

Peter C (the one at Biopython)


