[Bioperl-l] what are the key features of genbank file recognized by SeqIO?

Torsten Seemann torsten.seemann at infotech.monash.edu.au
Wed Oct 17 01:31:03 UTC 2007


On 17/10/2007, Sheri Simmons <sheris at berkeley.edu> wrote:
> I'm hoping someone can help me with figuring out how to generate genbank files
> from scratch that can be interpreted with SeqIO. The problem is that genbank
> files generated via the artemis program are not recognized by SeqIO, so I am
> attempting to generate SeqIO-compatible genbank files so they can be
> converted to other formats later.
> I produced a file which looks by eye exactly like standard genbank files, but
> which is not recognized by SeqIO. Could anyone tell me or refer me to a
> source that explains the exact format that SeqIO::genbank requires?

Here is NCBI's documentation on the feature table format.
http://www.ncbi.nlm.nih.gov/collab/FT/

Regarding Artemis: if you load an EMBL file and then save it as EMBL,
it seems to keep the header. If you save it as GenBank, though, it
discards the header and outputs only the feature table and sequence.
Also, many of Artemis's Save options write only the feature table.
BioPerl can't read either of these header-less outputs. At one stage
I had some code to whack in a fake header at the top and an empty
sequence at the end to get it to load.
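
I no longer have that code, but the idea can be sketched roughly as
follows. This is a minimal Python sketch, not BioPerl code, and it
rests on assumptions: the function name wrap_feature_table, the
placeholder LOCUS values (name, zero length, dummy date) and the exact
column spacing are all made up for illustration, and I have not
verified that Bio::SeqIO accepts the result.

```python
def wrap_feature_table(text, name="UNKNOWN"):
    """Wrap header-less feature-table text in a minimal GenBank-style record.

    Hypothetical helper: the LOCUS values below are placeholders, not real
    metadata, and the spacing is only an approximation of the flat-file layout.
    """
    lines = text.splitlines()

    # Drop trailing blank lines and any existing "//" terminator so that
    # ORIGIN and "//" can be appended in the right order.
    while lines and lines[-1].strip() in ("", "//"):
        lines.pop()

    out = []
    if not any(l.startswith("LOCUS") for l in lines):
        # Fake header: zero length and a dummy date (assumed values).
        out.append(f"LOCUS       {name:<16} 0 bp    DNA     linear   UNK 01-JAN-1980")
        if not any(l.startswith("FEATURES") for l in lines):
            out.append("FEATURES             Location/Qualifiers")
    out.extend(lines)

    # Empty sequence section, then the record terminator.
    if not any(l.startswith("ORIGIN") for l in lines):
        out.append("ORIGIN")
    out.append("//")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    table = '     gene            1..10\n                     /gene="abc"\n'
    print(wrap_feature_table(table, name="example"), end="")
```

Running the wrapper on an already wrapped record leaves it unchanged,
so it should be safe to apply to a mixed batch of files.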

Now, regarding your example output. The first line of a GenBank file
should be LOCUS (ID is the EMBL equivalent). You also don't have a
date at the end of the first line, but maybe it is optional. You are
also missing the "//" on the final line. Finally, GenBank is strict
about correct spacing etc., but I'm sure the BioPerl parser is written
relatively defensively here, so that might not be the problem.
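
The three points above (LOCUS first line, date at the end of it, "//"
terminator) are easy to check mechanically before handing a file to
SeqIO. Here is a small Python sketch of such a check. The function
name check_genbank_text and the DD-MON-YYYY date pattern are my own
assumptions, and it only covers the points in this message, not the
full format.

```python
import re

# Assumed date layout at the end of the LOCUS line, e.g. 17-OCT-2007.
DATE_RE = re.compile(r"\d{2}-[A-Z]{3}-\d{4}\s*$")


def check_genbank_text(text):
    """Return a list of problems found; an empty list means none of the
    three checks failed (it does not prove the file is valid GenBank)."""
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines:
        return ["file is empty"]

    problems = []
    if not lines[0].startswith("LOCUS"):
        problems.append("first line does not start with LOCUS")
    elif not DATE_RE.search(lines[0]):
        problems.append("LOCUS line has no DD-MON-YYYY date at the end")
    if lines[-1].strip() != "//":
        problems.append('record is not terminated by "//"')
    return problems


if __name__ == "__main__":
    sample = "FEATURES             Location/Qualifiers\n     gene            1..10\n"
    for problem in check_genbank_text(sample):
        print(problem)
```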

Hope this helps,

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
--Tel +61 3 9905 9010


