[Bioperl-l] Writing genbank files

Andrew Dalke <dalke@dalkescientific.com>
Sat, 19 Jan 2002 10:37:31 -0700


Ewan:
>BTW - Andrew's suggestion of writing out the header portion is almost
>certainly a bad idea. A format definition is a mixture of the precise
>definition and its common usage. Also the old adage rings true "be
>permissive in what you accept but strict in what you output".

Well, I didn't say that it was a good idea - I said that the GenBank
format specification from NCBI requires a header, so what bioperl,
biojava, biopython, etc. accept and generate as "genbank" isn't
strictly the GenBank format.  I also said that:

] In reality the software should
] respond to a request for something in GenBank format by saying it
] can't generate a GenBank file but can generate something which is
] GenBank-like, and somehow describe the differences.  Documenting
] this nuance and those differences is itself hard and tedious,
] which is why it is rarely done outside of pointers to source code.)

] [It] then happens that there's a de facto consensus among those
] in the know about what constitutes a "correct" GenBank file.  This
] consensus isn't documented and is learned mostly from experience.

The consensus is that the header is optional.  In my view as well,
sticking in a fake header with false data to meet the strict format
spec is a bad thing.  But there's apparently no consensus on how
strict or permissive a LOCUS line should be, because neither biojava's
nor biopython's parsers will accept bioperl's stock output.  That's
easy to fix once it's pointed out -- make the output stricter or agree
on a more permissive consensus.
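
As a rough sketch of those two options (a reader that tolerates loose
spacing, a writer that always emits one layout), something like the
following Python could serve as a starting point.  The function names,
the dictionary keys and the column widths are all made up for
illustration; they are not taken from the NCBI spec or from any of the
bio* parsers.

```python
# Illustrative only: the names and column widths here are assumptions,
# not the NCBI GenBank specification or any toolkit's actual parser.

def parse_locus_permissive(line):
    """Accept a LOCUS line with any amount of whitespace between fields."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % line)
    name, length, unit = tokens[1], tokens[2], tokens[3]
    if not length.isdigit() or unit not in ("bp", "aa"):
        raise ValueError("cannot find a sequence length in: %r" % line)
    # Anything after the length is kept as-is, without interpretation.
    return {"name": name, "length": int(length), "unit": unit,
            "rest": tokens[4:]}


def format_locus_strict(record):
    """Always emit the same fixed-width layout, whatever the input looked like."""
    head = "LOCUS       %-10s %7d %s" % (
        record["name"], record["length"], record["unit"])
    tail = " ".join(record["rest"])
    return (head + "    " + tail).rstrip()


if __name__ == "__main__":
    loose = "LOCUS  AB000001\t1234 bp   DNA  PRI 19-JAN-2002"
    print(format_locus_strict(parse_locus_permissive(loose)))
```

The point is only that the reader and the writer can have different
tolerances: differently spaced inputs all come out in one layout, and
that layout can then be tightened until the other parsers accept it.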

                    Andrew
                    dalke@dalkescientific.com