[Bioperl-l] Invalid EMBL files generated in rare circumstances; line wrapping
Adam Sjøgren
adsj at novozymes.com
Mon Sep 29 15:17:31 UTC 2014
Hi.
If you craft a tag value on a feature carefully (or if you are unlucky),
Bio::SeqIO will write invalid EMBL in which the "/" is separated from the
qualifier name:
ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 4 BP.
XX
AC   unknown;
XX
XX
XX
FH   Key             Location/Qualifiers
FH
FT   CDS             1..4
FT                   /
FT                   note="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
FT                   X"
XX
SQ   Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
     actg                                                                      4
//
In this example, "/" and "note" end up on separate lines, which is wrong; at
the very least, BioPerl itself does not accept it.
Here is a script that produces the above output (using BioPerl 1.6.901):
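#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq::RichSeq;
use Bio::SeqFeature::Generic;
use IO::String;
use Bio::SeqIO;

# A 4 bp sequence with a single CDS feature spanning all of it.
my $seq = Bio::Seq::RichSeq->new(-display_id => 'TEST', -seq => 'actg');
my $cds = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS', -start => 1, -end => 4);

# The note value is a long run of X characters, one space, and a final X;
# the position of that space is what triggers the bad wrapping.
$cds->add_tag_value(note => 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX X');
$seq->add_SeqFeature($cds);

# Write the record as EMBL into an in-memory string and print it.
my $string;
my $str = IO::String->new($string);
my $io  = Bio::SeqIO->new(-fh => $str, -format => 'embl');
$io->write_seq($seq);
print $string;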
Whether this happens depends on the position of the space in the note value.
Maybe there is a bug lurking in the line wrapping/formatting code
somewhere...
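Until that is tracked down, generated output could be scanned for the
symptom shown above. This is only a minimal sketch in plain Perl, assuming
the only symptom is an FT line whose content is a lone "/";
lone_slash_lines is a hypothetical helper, not part of BioPerl, and it says
nothing about why the wrapping goes wrong:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the 1-based line numbers
# of EMBL feature-table lines whose content is a lone "/", i.e. where the
# slash has been separated from its qualifier name.
sub lone_slash_lines {
    my ($embl) = @_;
    my @hits;
    my $n = 0;
    for my $line (split /\n/, $embl) {
        $n++;
        # "FT", then only whitespace, a single "/", and nothing else.
        push @hits, $n if $line =~ /^FT\s+\/\s*$/;
    }
    return @hits;
}

# Shortened copy of the feature table from the report above.
my $embl = join "\n",
    'FT   CDS             1..4',
    'FT                   /',
    'FT                   note="XXXX',
    'FT                   X"',
    '';

print "lone '/' on line $_\n" for lone_slash_lines($embl);
```

Run on the shortened feature table above, this prints "lone '/' on line 2".
In the reproduction script, the same check could be applied to $string
before printing it.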
Does this sound like a bug to anyone else?
Best regards,
Adam
--
Adam Sjøgren
adsj at novozymes.com