[Bioperl-l] Bug in SeqIO genbank output

Wes Barris wes.barris at csiro.au
Mon Dec 15 23:38:56 EST 2003


Hi,

I have just succeeded in tracking down a bug that prevents GenBank files
written by BioPerl from being imported properly into StackPack
(clustering software).  The problem is due to a subtle difference between
a GenBank entry downloaded from NCBI and one produced by
genbank.pm.  If you use "od -c" to look at a GenBank record from NCBI,
you will notice that the word "ORIGIN" is followed by six space characters.

ORIGIN
         1 cggccgcgtc gacttttttt ttaggtattt ttctcttatt atttctaaaa tataaatttt
        61 ggacattcaa aagtgcaaca ngttaatgtg cctgtgggga atatcacagt taaaaaaata

If I process this file with BioPerl and then write out a new GenBank-format
file, the word "ORIGIN" is followed immediately by a newline character, with
no spaces in between.
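To check for this without reading "od -c" output by hand, a small Perl sketch along these lines reports how many spaces follow the keyword on each ORIGIN line. The helper name origin_padding is hypothetical (it is not part of BioPerl), and the sketch assumes, as in the NCBI record above, that "ORIGIN" starts in column 1. Run it as, for example, "perl check_origin.pl file.gb" (the script name is also just an illustration).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given one line of a GenBank
# file, return how many space characters follow the ORIGIN keyword, or
# undef if the line is not an ORIGIN line. The trailing newline (and a
# carriage return, if present) is not counted.
sub origin_padding {
    my ($line) = @_;
    return unless defined $line;
    (my $copy = $line) =~ s/\r?\n\z//;
    # (?![^ ]) rejects words such as "ORIGINAL" that merely start with ORIGIN.
    return unless $copy =~ /^ORIGIN(?![^ ])( *)/;
    return length $1;
}

# When file names are given, report the padding of every ORIGIN line,
# much as "od -c" would reveal it.
if (@ARGV) {
    while (my $line = <>) {
        my $pad = origin_padding($line);
        if (defined $pad) {
            printf "%s line %d: ORIGIN followed by %d space(s)\n",
                $ARGV, $., $pad;
        }
    }
    continue {
        close ARGV if eof;    # reset $. for the next file
    }
}
```

An NCBI record should report 6 spaces; a file written by the current genbank.pm should report 0.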

It seems silly to me that spaces should be required after the word "ORIGIN",
but they do exist in files downloaded from NCBI, and StackPack seems to
require them in order to import a GenBank file.  Is there an official
specification for the GenBank format?  I have sent a bug report to the
makers of StackPack too.

In the meantime, I have modified my installed copy of Bio/SeqIO/genbank.pm
by changing this line:

         $self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));

to this:

         $self->_print(sprintf("%-12s%s\n",'ORIGIN      ',$o ? $o->value : ''));
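The effect of the two format strings can be seen in isolation with a minimal sketch like the one below. It reproduces only the sprintf call, not BioPerl's _print; origin_line is a hypothetical helper, and its $value argument stands in for $o->value. Because %-12s already left-justifies "ORIGIN" in a twelve-character field, the trailing spaces inside the literal 'ORIGIN      ' in the patched line are redundant, though harmless.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the sprintf call in the two versions of
# the line quoted above: $width is the keyword field width (6 in the
# original, 12 in the patch) and $value stands in for $o->value.
sub origin_line {
    my ($width, $value) = @_;
    $value = '' unless defined $value;
    return sprintf("%-*s%s\n", $width, 'ORIGIN', $value);
}

my $old = origin_line(6,  '');    # "ORIGIN\n"       (no padding)
my $new = origin_line(12, '');    # "ORIGIN      \n" (six spaces)

# Make the spaces and the newline visible, in the spirit of "od -c".
for my $line ($old, $new) {
    (my $shown = $line) =~ s/ /./g;
    $shown =~ s/\n/\\n/;
    print "$shown\n";
}
# prints:
#   ORIGIN\n
#   ORIGIN......\n
```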

-- 
Wes Barris
E-Mail: Wes.Barris at csiro.au
