[Bioperl-l] Bio::SeqIO doesn't write all gbk sequences from Bio::DB::GenBank

Veronica A. armendarez77 at hotmail.com
Wed May 8 00:32:22 UTC 2013


Hello,
I'm currently running BioPerl 1.6.1 on Ubuntu 12.04.2 LTS and have noticed a problem with Bio::SeqIO when writing GenBank files using the write_seq() method: some of the files do not include the 'ORIGIN' tag or the sequence.

I am using GI numbers (50 at a time, every 2 minutes) to retrieve GenBank files via Bio::DB::GenBank.

----------------------------------------START CODE----------------------------------
my $gb = Bio::DB::GenBank->new(-verbose=>-1);
my $seqout = Bio::SeqIO->new(-file=>">$fileName", '-format'=>'Genbank', -alphabet=>'dna', -flush=>0, -verbose=>-1);
while(@ids){
     my @batchArray = splice(@ids, 0, 50);
     my $batchArrayRef = \@batchArray;
     my $streamObj;
     my $pid = fork();
     if($pid == 0){
          eval{
               $streamObj = $gb->get_Stream_by_id($batchArrayRef);
          };
          if($@){
               print "Error: ".$@."\n";
          }
          else{
               while(my $seqObj = $streamObj->next_seq()){
                    unless($seqObj->accession_number() =~ /N[A-Z]\_/){
                         #print "ID: ".$seqObj->id()."\n";
                         #print "Seq:\n".$seqObj->seq()."\n";
                         $seqout->write_seq($seqObj);
                    }
               }
          }
          exit 0;
     }
}
waitpid($pid,0);
sleep(120);
----------------------------------------END CODE----------------------------------
Most of the GenBank files written to the output file have sequences, but a small portion do not, even though they should. For example, the NCBI record for JX287367 includes an 'ORIGIN' tag and sequence, and when I print the sequence before writing to file, it appears on STDOUT, but the 'ORIGIN' tag and sequence are not written to the output gbk file. The following is found in the final output file:
----------------------------------START GBK-----------------------------------------
LOCUS       JX287367                 588 bp    DNA     linear   BCT 19-DEC-2012
DEFINITION  Chlamydia trachomatis strain UW-5/CX pyruvoyl-dependent arginine
            decarboxylase (aaxB) gene, complete cds.
ACCESSION   JX287367
VERSION     JX287367.1  GI:404351720
KEYWORDS    .
SOURCE      Chlamydia trachomatis
  ORGANISM  Chlamydia trachomatis
            Bacteria; Chlamydiae; Chlamydiales; Chlamydiaceae;
            Chlamydia/Chlamydophila group; Chlamydia.
REFERENCE   1  (bases 1 to 588)
  AUTHORS   Bliven,K.A., Fisher,D.J. and Maurelli,A.T.
  TITLE     Characterization of the activity and expression of arginine
            decarboxylase in human and animal Chlamydia pathogens
  JOURNAL   FEMS Microbiol. Lett. 337 (2), 140-146 (2012)
   PUBMED   23043454
REFERENCE   2  (bases 1 to 588)
  AUTHORS   Bliven,K.A., Fisher,D.J. and Maurelli,A.T.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-JUL-2012) Department of Microbiology and Immunology,
            F. Edward Hebert School of Medicine, Uniformed Services University
            of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD
            20814, USA
FEATURES             Location/Qualifiers
     source          1..588
                     /mol_type="genomic DNA"
                     /db_xref="taxon:813"
                     /strain="UW-5/CX"
                     /organism="Chlamydia trachomatis"
                     /serovar="E"
     gene            1..588
                     /gene="aaxB"
     CDS             1..588
                     /protein_id="AFR60849.1"
                     /gene="aaxB"
                     /transl_table=11
                     /note="AaxB"
                     /db_xref="GI:404351721"
                     /codon_start=1
                     /product="pyruvoyl-dependent arginine decarboxylase"
                     /translation="MPYGTRYPTLAFHTGGVGESDDGMPPQPFETFCYDSALLQAKIE
                     NFNIVPYTSVLPKELFGNILPVDQCTKFFKHGAVLEVIMAGRGATVTDGTQAIATGVG
                     ICWGKDKNGELIGGWAAEYVEFFPTWIDDEIAESHAKMWLKKSLQHELDLRSVSKHSE
                     FQYFHNYINIRKKFGFCLTALGFLNFENVAPAVIQ"
//
----------------------------------END GBK-----------------------------------------
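To see how many records in an output file are affected, a small standalone check can list the records that have no ORIGIN line. This is only a rough sketch and not part of the script above: the subroutine name records_missing_origin and the script name check_origin.pl are made up for illustration, and it assumes each record ends with a line containing only "//" and that the whole file fits in memory. It uses plain Perl and no BioPerl modules.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given the text of a GenBank
# flat file, return the LOCUS names of records that have no ORIGIN line.
sub records_missing_origin {
    my ($text) = @_;
    my @missing;
    # Assumption: every record is terminated by a line holding only "//".
    for my $record (split m{^//[ \t]*\r?\n?}m, $text) {
        # Skip fragments that do not start a record (e.g. trailing blanks).
        next unless $record =~ /^LOCUS\s+(\S+)/m;
        my $locus = $1;
        push @missing, $locus unless $record =~ /^ORIGIN\b/m;
    }
    return @missing;
}

# Usage: perl check_origin.pl output.gbk
if (@ARGV) {
    local $/;    # slurp the whole file at once
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = <$fh>;
    close $fh;
    print "$_\n" for records_missing_origin($text);
}
```

Run against the output file, this prints one LOCUS name per record that lacks an ORIGIN line, which makes it easier to compare the affected accessions between runs.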
Can anyone tell me what I am missing or why this is happening? I don't know whether this happened in earlier BioPerl versions because, until now, I usually downloaded sequences straight from NCBI, but that became too time-consuming... though this seems to be as well :S
Thank you in advance for any help,
Veronica
 		 	   		  


