[emboss-dev] Regression in GenBank/GenPept parsing?

Peter Rice pmr at ebi.ac.uk
Tue Jul 21 10:40:39 UTC 2009


Peter C. wrote:
>> My next task (once I've made sure your bugs are fixed) is to regenerate
>> all the tables of formats.
> 
> Great. This may save you having to answer my next question,
> which was could you expand on what EMBOSS considers to be
> the differences between "genbank", "genpept" and "refseqp" as
> file formats? Of course, I may come up with further questions ;)

Oh, further questions please! We love answering them.

GenPept format expects to find 9 fields on the LOCUS line.

RefseqP format expects only 8.

The extra field in GenPept format is the original GenPept locus name.
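A minimal sketch of telling the two formats apart by counting LOCUS-line fields. It assumes a "field" is a whitespace-separated token and that the LOCUS keyword itself is counted. The post states only the totals (9 and 8), not how EMBOSS splits the line, so the real parser may count differently. The function name `guess_protein_format` is hypothetical, and the test lines use synthetic placeholder tokens rather than real records.

```python
from typing import Optional

# Field counts stated in the post. ASSUMPTION: fields are whitespace-
# separated tokens and the "LOCUS" keyword is counted as one of them.
GENPEPT_FIELDS = 9
REFSEQP_FIELDS = 8


def guess_protein_format(locus_line: str) -> Optional[str]:
    """Guess 'genpept' or 'refseqp' from the LOCUS-line field count.

    Hypothetical helper: returns None for non-LOCUS lines or any
    other field count.
    """
    fields = locus_line.split()
    if not fields or fields[0] != "LOCUS":
        return None
    if len(fields) == GENPEPT_FIELDS:
        # The extra field is the original GenPept locus name.
        return "genpept"
    if len(fields) == REFSEQP_FIELDS:
        return "refseqp"
    return None
```

Because only the count differs, a single parser that treats the extra locus name as optional could serve both format names.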

We may try to merge them one day. If we do, we would keep the format
names but use one parser.

Your GenPept (refseqp) format problem will be fixed in a patch. The parser
was fine for one sequence, but it needed to rebuffer the input file to work
with multiple input sequences.
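To illustrate the general idea of rebuffering (not EMBOSS's own C code, which the post does not show), here is a minimal Python sketch of a pushback line reader. A format test reads ahead at the start of each record and then pushes the line back so the record parser still sees it. If that replay only worked for the first record, a one-sequence file would parse and a multi-sequence file would not. `LineBuffer` and `read_records` are hypothetical names, and records are assumed to end with a `//` line as in GenBank-style flat files.

```python
from typing import Iterable, Iterator, List, Optional


class LineBuffer:
    """Hypothetical line reader that can push lines back ("rebuffer")."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines: Iterator[str] = iter(lines)
        self._pushed: List[str] = []  # stack: unread lines in reverse order

    def readline(self) -> Optional[str]:
        # Serve pushed-back lines first, then continue with the input.
        if self._pushed:
            return self._pushed.pop()
        return next(self._lines, None)

    def unread(self, line: str) -> None:
        self._pushed.append(line)


def read_records(buf: LineBuffer) -> Iterator[List[str]]:
    """Yield each '//'-terminated record as a list of lines."""
    while True:
        # Read ahead, as a format test would, skipping blank lines.
        first = buf.readline()
        while first is not None and not first.strip():
            first = buf.readline()
        if first is None:
            return
        # Push the line back so the parser sees it again, for EVERY record.
        buf.unread(first)
        record: List[str] = []
        while (line := buf.readline()) is not None:
            record.append(line)
            if line.startswith("//"):
                break
        yield record
```

For example, `list(read_records(LineBuffer(["LOCUS a", "//", "LOCUS b", "//"])))` gives two records, each starting with its LOCUS line.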

Meanwhile, could you tar up the Biopython test data and scripts
(http://biopython.open-bio.org/SRC/biopython/Tests/)? I will try
running the same data through EMBOSS to see what issues we can find.

regards,

Peter


