[EMBOSS] Unknown output format 'refseqp' and 'genpept'

Peter Rice pmr at ebi.ac.uk
Tue Dec 8 14:11:59 UTC 2009


Peter C. wrote:
> Could I suggest you treat known input formats which are not supported
> as output formats a little differently and instead of this:
> 
> unknown output format 'genpept'
> 
> Perhaps give,
> 
> format 'genpept' is not supported for output (only input)
> 
> This would help the user rule out having a typo etc.

A useful suggestion. We can apply that to feature formats too. I'll see 
what I can do.
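
As a rough illustration of the suggested behaviour, here is a minimal Python sketch that separates "known for input only" from "not known at all". It is not EMBOSS code (EMBOSS is written in C); the two format tables and the function name output_format_error are hypothetical and exist only for this example.

```python
# Illustrative tables only; these are NOT the real EMBOSS format lists.
INPUT_FORMATS = {"fasta", "embl", "genbank", "genpept", "refseqp"}
OUTPUT_FORMATS = {"fasta", "embl", "genbank"}


def output_format_error(name):
    """Return an error message if 'name' cannot be used for output, else None."""
    key = name.lower()
    if key in OUTPUT_FORMATS:
        return None
    if key in INPUT_FORMATS:
        # Known format, but only readable: tells the user it is not a typo.
        return f"format '{name}' is not supported for output (only input)"
    return f"unknown output format '{name}'"


if __name__ == "__main__":
    for fmt in ("fasta", "genpept", "genpep"):
        print(fmt, "->", output_format_error(fmt))
```

Checking the output table first and the input table second keeps the common case cheap and reserves the "unknown" message for names that match nothing.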

It may also be worth tidying up what we do with formats that are only valid 
for nucleotide or protein, though that is a little tricky as we 
currently try to let some fail over to an equivalent format.

>> Does biopython have a definition of the fields it expects to write out in a
>> GenPept or RefseqP format file? We would be able to allow GenBank as an
>> alias for, presumably, genpept.
> 
> Not explicitly, no. I was hoping to use EMBOSS for cross validation ;)

No problem. We'll go first then and try to define standard formats.

> With hindsight this may have been a mistake, but we use "genbank"
> format to mean either nucleotides or proteins. On parsing we just
> look at the units of length in the LOCUS line (bp or aa). We also
> try to cope with both the current NCBI files and some older variants
> we have in our unit tests (different offsets in the LOCUS line).

We try that too on input, but for output we have to be explicit so the 
user can pick just one of the choices.
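
The input-side check described above (looking at the length units in the LOCUS line) could be sketched in Python roughly as follows. This is an assumption-laden illustration, not the Biopython or EMBOSS parser: the function name locus_molecule_kind and the sample lines are made up, and the sketch splits on whitespace rather than using fixed columns so that it tolerates different offsets.

```python
def locus_molecule_kind(locus_line):
    """Guess 'nucleotide' or 'protein' from the length units in a LOCUS line."""
    tokens = locus_line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    # Look for an integer length followed directly by its units (bp or aa).
    for previous, token in zip(tokens[1:], tokens[2:]):
        if previous.isdigit() and token in ("bp", "aa"):
            return "nucleotide" if token == "bp" else "protein"
    raise ValueError("no length units (bp or aa) found in LOCUS line")


if __name__ == "__main__":
    # Made-up sample lines with different column offsets.
    print(locus_molecule_kind("LOCUS       DEMO_NUC     5028 bp    DNA     PLN 21-JUN-1999"))
    print(locus_molecule_kind("LOCUS  DEMO_PROT  330 aa  linear  PRI 01-JAN-2000"))
```

On output there is no such line to inspect, which is why the writer needs the user to name one format explicitly.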

regards,

Peter R.


