[Bioperl-l] Writing genbank files
Andrew Dalke <dalke@dalkescientific.com>
Sat, 19 Jan 2002 10:37:31 -0700
Ewan:
>BTW - Andrew's suggestion of writing out the header portion is almost
>certainly a bad idea. A format definition is a mixture of the precise
>definition and its common usage. Also the old adage rings true "be
>permissive in what you accept but strict in what you output".
Well, I didn't say that it was a good idea. I said that the GenBank
format specification from NCBI requires a header, so what bioperl,
biojava, biopython, etc. accept and generate as "genbank" isn't
strictly the GenBank format. I also said that:
] In reality the software should
] respond to a request for something in GenBank format by saying it
] can't generate a GenBank file but can generate something which is
] GenBank-like, and somehow describe the differences. Documenting
] this nuance and those differences is itself hard and tedious,
] which is why it is rarely done outside of pointers to source code.)
] [It] then happens that there's a de facto consensus among those
] in the know about what constitutes a "correct" GenBank file. This
] consensus isn't documented and is learned mostly from experience.
The consensus is that the header is optional. In my view as well,
sticking in a fake header with false data to meet the strict format
spec is a bad thing. But there's apparently no consensus as to
what is strict/permissive enough for a LOCUS line, since neither
biojava's nor biopython's parsers will accept bioperl's stock output.
That's easy to fix, once it's pointed out -- make the output stricter
or make the consensus more permissive.
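
To make the strict/permissive distinction concrete, here is a rough
Python sketch. Everything in it is an assumption made for illustration:
the function names are hypothetical, the sample record name is made up,
and the only layout rule checked (a 12-column keyword field, so the
locus name starts in column 13) is a simplification. It is not the
actual parser of bioperl, biojava, or biopython.

```python
def parse_locus_permissive(line):
    """Accept any whitespace layout: 'LOCUS', a name, then '<length> bp'."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    name, length, unit = tokens[1], tokens[2], tokens[3]
    if unit not in ("bp", "aa") or not length.isdigit():
        raise ValueError("expected '<length> bp' after the locus name")
    return name, int(length)


def parse_locus_strict(line):
    """Same fields, but also enforce the assumed column layout."""
    # Assumption: the keyword field is 12 columns wide, so the line must
    # start with 'LOCUS' plus 7 spaces and the name must begin in column 13.
    if line[:12] != "LOCUS" + " " * 7 or line[12:13].strip() == "":
        raise ValueError("locus name must start in column 13")
    return parse_locus_permissive(line)


# Hypothetical sample lines, not taken from any real record.
loose = "LOCUS AB000001 1234 bp DNA"
fixed = "LOCUS" + " " * 7 + "AB000001   1234 bp    DNA"

print(parse_locus_permissive(loose))
print(parse_locus_strict(fixed))
```

A writer that emits only lines like `fixed` is readable by both
functions, while `loose` is readable only by the permissive one. That
is the mismatch described above, and either side can close it.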
Andrew
dalke@dalkescientific.com