[emboss-dev] Regression in GenBank/GenPept parsing?

Peter Rice pmr at ebi.ac.uk
Tue Jul 21 13:30:17 UTC 2009


Peter wrote:
> On Tue, Jul 21, 2009 at 11:40 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>> GenPept format expects to find 9 fields on the LOCUS line.
>> RefseqP format expects only 8.
>>
>> The difference is GenPept format including the original GenPept locus name.
> 
> Which 8 or 9 fields?

'LOCUS'
identifier
GenBank-locus-name (GenPept format only)
seqlen             (numeric)
'aa'
molecule-type      (controlled vocabulary - we ignore the protein ones
                   for now)
'circular' or 'linear'
division           (expecting 'UNC' for unclassified)
date               (last modified date)
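
As a rough illustration of the field list above (this is not the EMBOSS
C code, just a sketch), the two layouts could be told apart by counting
whitespace-separated tokens on the LOCUS line. The function name, the
field names and the sample lines are all made up for the example, and it
assumes no field contains embedded spaces.

```python
# Hypothetical sketch: classify a protein LOCUS line by its field count.
# Field names are illustrative labels, not identifiers from EMBOSS.
from typing import Dict

FIELDS_REFSEQP = [
    "keyword",        # literal 'LOCUS'
    "identifier",
    "seqlen",         # numeric
    "units",          # literal 'aa'
    "molecule_type",  # controlled vocabulary, not checked here
    "topology",       # 'circular' or 'linear'
    "division",       # e.g. 'UNC' for unclassified
    "date",           # last modified date
]

# GenPept adds the original locus name after the identifier.
FIELDS_GENPEPT = (
    FIELDS_REFSEQP[:2] + ["genbank_locus_name"] + FIELDS_REFSEQP[2:]
)


def parse_locus(line: str) -> Dict[str, str]:
    """Split a LOCUS line and label its fields (assumes no embedded spaces)."""
    tokens = line.split()
    if len(tokens) == len(FIELDS_GENPEPT):
        names, fmt = FIELDS_GENPEPT, "genpept"
    elif len(tokens) == len(FIELDS_REFSEQP):
        names, fmt = FIELDS_REFSEQP, "refseqp"
    else:
        raise ValueError(f"expected 8 or 9 fields, found {len(tokens)}")

    record = dict(zip(names, tokens))
    if record["keyword"] != "LOCUS":
        raise ValueError("line does not start with LOCUS")
    if not record["seqlen"].isdigit():
        raise ValueError("sequence length is not numeric")
    if record["units"] != "aa":
        raise ValueError("expected 'aa' after the sequence length")
    if record["topology"] not in ("circular", "linear"):
        raise ValueError("expected 'circular' or 'linear'")

    record["format"] = fmt
    return record


if __name__ == "__main__":
    # Made-up sample lines, not taken from real database entries.
    nine = "LOCUS ID0001 LOCNAME1 120 aa protein linear UNC 21-JUL-2009"
    eight = "LOCUS ID0001 120 aa protein linear UNC 21-JUL-2009"
    print(parse_locus(nine)["format"])
    print(parse_locus(eight)["format"])
```

A line with any other number of fields raises ValueError rather than
being guessed at.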

> Grand. Will there be an EMBOSS 6.1.1 in a week or so then (addressing
> this, the FASTQ @ problem, and any other minor issues)?

There will be a patch file in the
ftp://emboss.open-bio.org/pub/EMBOSS/patches/ directory.

For those (like me) who prefer to update manually, there will also be
replacement file(s) in the fixes directory.

> http://biopython.open-bio.org/SRC/biopython/ is just a dump from
> our repository (hourly or something). If you just download the latest
> Biopython source code, this will have all the unit test files etc:
> http://biopython.org/DIST/biopython-1.51b.tar.gz

Super, thanks.

> Ask if you need clarification on what any of the test data files are
> for. In some cases searching the Tests/test_*.py files may have
> informative comments.

Thanks. The plan is to include them in the EMBOSS QA tests so I will
take a look at the inputs and what you check for in the outputs. At
first glance it looks straightforward.

regards,

Peter


