[Bioperl-l] Bug in SeqIO genbank output
Heikki Lehvaslaiho
heikki at nildram.co.uk
Fri Jan 2 13:13:00 EST 2004
On Friday 02 Jan 2004 1:16 am, Jason Stajich wrote:
> The reason Heikki was not seeing the problems is probably because he was
> doing roundtripping with a genbank file - if you start with embl or fasta
> you see that the trailing 6 spaces aren't coming in. This is because of
So that's what it was! I was using a genbank file. Thanks, Jason,
-Heikki
> this code
> (parsing of genbank)
> ! if(defined($_) && s/^ORIGIN//) {
> chomp;
> if( $annotation && length($_) > 0 ) {
> $annotation->add_Annotation('origin',
> Bio::Annotation::SimpleValue->new(-value => $_));
> }
> changing this to
> ! if(defined($_) && s/^ORIGIN\s+//) {
>
> So the $o value in the ORIGIN writer was getting set with the 6 spaces
> when inputting the genbank file. This is a silly thing to store as an
> annotation.
>
> So fixing the ORIGIN problem
> Index: Bio/SeqIO/genbank.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/genbank.pm,v
> retrieving revision 1.99
> diff -r1.99 genbank.pm
> 543c543
> < if(defined($_) && s/^ORIGIN//) {
> ---
> > if(defined($_) && s/^ORIGIN\s+//) {
> 819c819,820
> < $self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));
> ---
> > $self->_print(sprintf("%-12s%s\n",
> > 'ORIGIN', $o ? $o->value : ''));
>
> I also exposed an EMBL parsing bug that occurs when there is no feature
> table. I fixed it on the main trunk and also merged the fix onto the branch.
>
>
> Happy New Year.
> --jason
>
> On Fri, 2 Jan 2004, Wes Barris wrote:
> > Heikki Lehvaslaiho wrote:
> > > Wes,
> > >
> > > You did not say which version of bioperl you are using. For some reason
> >
> > I am using bioperl-1.2.3
> >
> > > which I can not quite understand, the current code:
> > > $self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));
> > >
> > > does print out the required six spaces after the word ORIGIN. This was
> > > recently
> >
> > Really? How? In the above line "%-6s" left justifies 'ORIGIN' (which is
> > already 6 characters). The '6' needs to be changed to '12' to get six
> > extra spaces. See below.
> >
> > > fixed. Now, why doesn't it work for you? Could you check that you do
> > > not have multiple copies of bioperl on your computer, with the older
> > > one being executed accidentally?
> > >
> > > Sorry, I can not come up with any better explanation,
> > >
> > > -Heikki
> > >
> > > On Tuesday 16 Dec 2003 4:38 am, Wes Barris wrote:
> > > > Hi,
> > > >
> > > > I have just succeeded in tracking down a bug that prevents genbank
> > > > files written from bioperl from being properly imported into
> > > > StackPack (clustering software). The problem is due to a subtle
> > > > difference in a genbank entry downloaded from NCBI and a genbank
> > > > entry produced using genbank.pm. If you use "od -c" to look at a
> > > > genbank record from NCBI, you will notice that the word "ORIGIN" is
> > > > followed by six space characters.
> > >
> > > > ORIGIN
> > > >         1 cggccgcgtc gacttttttt ttaggtattt ttctcttatt atttctaaaa tataaatttt
> > > >        61 ggacattcaa aagtgcaaca ngttaatgtg cctgtgggga atatcacagt taaaaaaata
> > > >
> > > > If I process this file using bioperl and then write out a new
> > > > genbank format file, the word "ORIGIN" is followed immediately by a
> > > > carriage return (newline) character.
> > > >
> > > > It seems silly to me that spaces should be required after the word
> > > > "ORIGIN", but they do exist in files downloaded from NCBI and
> > > > StackPack seems to require these space characters in order to import
> > > > a genbank file.
> > >
> > > > Is there an official specification for the genbank format? I have
> > > > sent a bug report to the makers of StackPack too.
> > > >
> > > > In the meantime, I have modified my installed copy of
> > > > Bio/SeqIO/genbank.pm changing this line:
> > > >
> > > > $self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));
> > > >
> > > > to this:
> > > >
> > > > $self->_print(sprintf("%-12s%s\n",'ORIGIN ',$o ? $o->value : ''));
> > >
> > > --
> > > ______ _/ _/_____________________________________________________
> > > _/ _/ http://www.ebi.ac.uk/mutations/
> > > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
> > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
> > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
> > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom
> > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
> > > ___ _/_/_/_/_/________________________________________________________
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
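
For anyone following along, here is a small standalone sketch of why the
round trip from a genbank file hid the problem. It only mimics the two lines
touched in the diff above; the helper names origin_annotation and origin_line
are made up for this illustration and are not part of BioPerl, and the
"defined" test is a simplification of the "$o ? $o->value : ''" check in the
real writer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text that would be stored as the
# 'origin' annotation for one input line, or nothing (undef in scalar
# context) if the line does not match or nothing is left after stripping.
sub origin_annotation {
    my ($line, $regex) = @_;
    my $text = $line;
    return unless $text =~ s/$regex//;
    chomp $text;
    return length($text) > 0 ? $text : undef;
}

# Hypothetical helper: formats the ORIGIN line with a given field width,
# appending the stored annotation value if there is one.
sub origin_line {
    my ($width, $value) = @_;
    return sprintf("%-*s%s\n", $width, 'ORIGIN', defined $value ? $value : '');
}

my $ncbi_line = "ORIGIN      \n";    # keyword followed by six spaces

# Old parser: only the keyword is removed, so the six spaces are kept
# and later written back out, which masks the short "%-6s" field.
my $old_value = origin_annotation($ncbi_line, qr/^ORIGIN/);

# New parser: the trailing whitespace is consumed, nothing is stored.
my $new_value = origin_annotation($ncbi_line, qr/^ORIGIN\s+/);

printf "old parser stored %d characters\n", length($old_value);
printf "new parser stored %s\n", defined $new_value ? 'a value' : 'nothing';

# Line lengths include the newline.
printf "width  6, no annotation (embl/fasta input): %d chars\n",
    length(origin_line(6, undef));
printf "width  6, six-space annotation (genbank):   %d chars\n",
    length(origin_line(6, $old_value));
printf "width 12, no annotation (after the fix):    %d chars\n",
    length(origin_line(12, undef));
```

With the wider field the padding no longer depends on what the parser happened
to store, so the output should be the same whichever format the sequence was
read from.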