[Bioperl-l] GenBank files CONTIG line messes up sequences

jockoblom at gmail.com jockoblom at gmail.com
Wed May 14 13:41:23 UTC 2014


Hi!

I'm using BioPerl to work with GenBank files, mostly public files from the 
NCBI. For GenBank files with a CONTIG line like the following
"CONTIG      join(BA000030.3:1..9025608)"
the GenBank parser does not work properly.

As a simple example:

#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

my $file = shift;

# create reader for the input
my $seq_in = Bio::SeqIO->new(
    -file   => $file,
    -format => 'genbank'
);

my @entries;
while (my $contig = $seq_in->next_seq()) {
    ## do stuff
    push @entries, $contig;
}

my $outfile = Bio::SeqIO->new(
    -file   => ">out.gbk",
    -format => 'genbank'
);

foreach my $contig (@entries) {
    $outfile->write_seq($contig);
}

This script creates a GenBank file, but the genome sequence is missing. 
This is independent of what I do in the "do stuff" part; even this empty 
version does not export a sequence.
The problem seems to be that the sequence is not accessible, e.g., the 
following lines will create empty sequences in $feature_seq.

foreach my $feat ($contig->get_SeqFeatures()) {
    if ($feat->primary_tag() eq 'CDS') {
        $feature_seq = $feat->seq();
        ...

Even the simplest (and most useless) script example fails to produce a 
GenBank file with sequence:

my $input  = $ARGV[0];
my $output = $ARGV[1];

my $in = Bio::SeqIO->new(
    -file   => "$input",
    -format => 'genbank'
);
my $out = Bio::SeqIO->new(
    -file   => ">$output",
    -format => 'genbank'
);

while ( my $seq = $in->next_seq() ) {
    $out->write_seq($seq);
}

The same script converting GenBank to EMBL works just fine. EMBL to GenBank 
fails as well if the EMBL file has the line corresponding to the CONTIG line:
"CO   join(BA000030.3:1..9025608)"

I found this post
https://groups.google.com/forum/#!msg/bioperl-l/ur9ZQIXyoj0/xT8izNWb-8kJ
saying that the problem is fixed in BioPerl 1.6.922, but I still encounter 
the problems with sequence retrieval.

Using Perl 5.18.2
Bio::Root::Version::VERSION: 1.006923
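
To narrow down which records in an input file are affected, something like the following might help. It is only a rough sketch in plain Perl (no BioPerl involved): it lists each record's LOCUS name and whether the record has a CONTIG line and an ORIGIN (sequence) section. The name scan_genbank is a made-up helper for illustration, not part of BioPerl, and the line patterns assume the standard GenBank flat-file layout.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# scan_genbank() is a hypothetical helper, not part of BioPerl.
# It reads GenBank flat-file text from a filehandle and returns one
# hash per record with the LOCUS name and two flags: whether a CONTIG
# line and whether an ORIGIN section were seen in that record.
sub scan_genbank {
    my ($fh) = @_;
    my @records;
    my $rec;
    while (my $line = <$fh>) {
        # A LOCUS line starts a new record
        if ($line =~ /^LOCUS\s+(\S+)/) {
            $rec = { locus => $1, contig => 0, origin => 0 };
        }
        next unless $rec;
        $rec->{contig} = 1 if $line =~ /^CONTIG\s/;
        $rec->{origin} = 1 if $line =~ /^ORIGIN/;
        # A line starting with "//" terminates the record
        if ($line =~ m{^//}) {
            push @records, $rec;
            undef $rec;
        }
    }
    return @records;
}

# Report on the file given as first argument, if any
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    for my $r (scan_genbank($fh)) {
        printf "%s\tCONTIG=%s\tORIGIN=%s\n",
            $r->{locus},
            $r->{contig} ? 'yes' : 'no',
            $r->{origin} ? 'yes' : 'no';
    }
    close $fh;
}
```

Run it with the GenBank file as the only argument. It does not test the parser itself; it only shows which records carry a CONTIG line, so they can be compared against the records that lose their sequence on output.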

Anyone with the same problems or any helpful ideas?

Best regards,

Jochen