[Bioperl-l] Bug in SeqIO genbank output

Jason Stajich jason at cgt.duhs.duke.edu
Thu Jan 1 20:16:17 EST 2004


The reason Heikki was not seeing the problem is probably that he was
roundtripping a genbank file. If you start with embl or fasta input, you
see that the trailing 6 spaces never show up in the output. This is
because of this code in the genbank parser:
!	  if(defined($_) && s/^ORIGIN//) {
	      chomp;
	      if( $annotation && length($_) > 0 ) {
		  $annotation->add_Annotation('origin',
					       Bio::Annotation::SimpleValue->new(-value => $_));
	      }
I am changing this to
!         if(defined($_) && s/^ORIGIN\s+//) {

So the $o value in the ORIGIN writer was only getting set to the 6 spaces
when the input was a genbank file.  Whitespace is a silly thing to store
as an annotation.
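
To illustrate the parsing side, here is a minimal standalone sketch. The
helper origin_value is hypothetical and not part of genbank.pm; it only
mimics the substitute, chomp and length test from the snippet above, and
assumes the input line is "ORIGIN" followed by six spaces and a newline.

```perl
use strict;
use warnings;

# Hypothetical helper (not in genbank.pm): returns the text that would be
# stored as the 'origin' annotation for one line, or undef if none is stored.
sub origin_value {
    my ($line, $pattern) = @_;
    return undef unless $line =~ s/$pattern//;   # strip the ORIGIN keyword
    chomp $line;
    return length($line) > 0 ? $line : undef;    # empty remainder: no annotation
}

# Assumed input: ORIGIN followed by six spaces, as in NCBI files.
my $line = "ORIGIN      \n";

my $old = origin_value($line, qr/^ORIGIN/);      # pattern before the fix
my $new = origin_value($line, qr/^ORIGIN\s+/);   # pattern after the fix

printf "old pattern stores: %s\n", defined $old ? "'$old'" : 'nothing';
printf "new pattern stores: %s\n", defined $new ? "'$new'" : 'nothing';
```

With the old pattern the six spaces survive as the annotation value; with
the new pattern nothing is left over, so no annotation is added.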

So, the fix for the ORIGIN problem:
Index: Bio/SeqIO/genbank.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/genbank.pm,v
retrieving revision 1.99
diff -r1.99 genbank.pm
543c543
< 	  if(defined($_) && s/^ORIGIN//) {
---
> 	  if(defined($_) && s/^ORIGIN\s+//) {
819c819,820
< 	$self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));
---
> 	$self->_print(sprintf("%-12s%s\n",
> 			      'ORIGIN', $o ? $o->value : ''));
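
The writer side can be checked in isolation too. This is only a sketch:
origin_line is a hypothetical helper that mirrors the sprintf call in the
diff, with the field width passed in as a parameter so both versions can
be compared.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the sprintf call in the writer; $width is
# the left-justified field width for the ORIGIN keyword.
sub origin_line {
    my ($width, $value) = @_;
    return sprintf("%-*s%s\n", $width, 'ORIGIN', $value);
}

my $old_empty  = origin_line(6,  '');        # old width, no annotation
my $new_empty  = origin_line(12, '');        # new width, no annotation
my $old_spaces = origin_line(6,  ' ' x 6);   # old width, six-space annotation

printf "width  6, empty value:     %d characters before the newline\n",
    length($old_empty) - 1;
printf "width 12, empty value:     %d characters before the newline\n",
    length($new_empty) - 1;
printf "width  6, six-space value: %d characters before the newline\n",
    length($old_spaces) - 1;
```

The third case is the genbank roundtrip: the old width only produced the
padding when the six spaces had been stored as the annotation on input.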



I also exposed an embl parsing bug that occurs when there is no feature
table. I fixed it on the main trunk and also merged the fix onto the branch.


Happy New Year.
--jason
On Fri, 2 Jan 2004, Wes Barris wrote:

> Heikki Lehvaslaiho wrote:
>
> > Wes,
> >
> > You did not say which version of bioperl you are using. For some reason
>
> I am using bioperl-1.2.3
>
> > which I
> > can not quite understand, the current code:
> >           $self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));
> >
> > does print out the required six spaces after the word ORIGIN. This was
> > recently
>
> Really?  How?  In the above line "%-6s" left justifies 'ORIGIN' (which is
> already 6 characters).  The '6' needs to be changed to '12' to get six
> extra spaces.  See below.
>
>
> > fixed. Now, why doesn't it work for you? Could you check that you do not
> > have multiple copies of bioperl on your computer and the older one gets
> > accidentally executed?
> >
> > Sorry, I can not come up with any better explanation,
> >
> >         -Heikki
> >
> > On Tuesday 16 Dec 2003 4:38 am, Wes Barris wrote:
> >  > Hi,
> >  >
> >  > I have just succeeded in tracking down a bug that prevents genbank files
> >  > written from bioperl from being properly imported into StackPack
> >  > (clustering software).  The problem is due to a subtle difference in
> >  > a genbank entry downloaded from NCBI and a genbank entry produced using
> >  > genbank.pm.  If you use "od -c" to look at a genbank record from NCBI,
> >  > you will notice that the word "ORIGIN" is followed by six space
> >  > characters.
> >  >
> >  > ORIGIN
> >  >          1 cggccgcgtc gacttttttt ttaggtattt ttctcttatt atttctaaaa tataaatttt
> >  >         61 ggacattcaa aagtgcaaca ngttaatgtg cctgtgggga atatcacagt taaaaaaata
> >  >
> >  > If I process this file using bioperl and then write out a new genbank
> >  > format file, the word "ORIGIN" is followed immediately by a carriage
> >  > return (newline) character.
> >  >
> >  > It seems silly to me that spaces should be required after the word
> >  > "ORIGIN", but they do exist in files downloaded from NCBI and StackPack
> >  > seems to require these space characters in order to import a genbank
> >  > file.  Is there an official specification for the genbank format?  I
> >  > have sent a bug report to the makers of StackPack too.
> >  >
> >  > In the meantime, I have modified my installed copy of
> >  > Bio/SeqIO/genbank.pm, changing this line:
> >  >
> >  >          $self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));
> >  >
> >  > to this:
> >  >
> >  >          $self->_print(sprintf("%-12s%s\n",'ORIGIN      ',$o ? $o->value : ''));
> >
> > --
> > ______ _/      _/_____________________________________________________
> >       _/      _/                      http://www.ebi.ac.uk/mutations/
> >      _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
> >     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
> >    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
> >   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
> >      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
> > ___ _/_/_/_/_/________________________________________________________
> >
>
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

