[Bioperl-l] Bio::SeqIO doesn't write all gbk sequences from Bio::DB::GenBank

Fields, Christopher J cjfields at illinois.edu
Wed May 8 02:17:43 UTC 2013


Veronica,

Your mail may have garbled the script and example file.  Can you paste these in a gist?

https://gist.github.com/

chris

On May 7, 2013, at 7:32 PM, Veronica A. <armendarez77 at hotmail.com> wrote:

> Hello,
> I'm currently running BioPerl 1.6.1 on Ubuntu 12.04.2 LTS and have noticed a problem with Bio::SeqIO when writing GenBank files with the write_seq() method: some of the records do not include an 'ORIGIN' tag or the sequence.
> 
> I am using GI numbers (50 at a time, every 2 minutes) to retrieve GenBank records via Bio::DB::GenBank.
> 
> ----------------------------------------START CODE----------------------------------
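> use Bio::DB::GenBank;
> use Bio::SeqIO;
> 
> my $gb = Bio::DB::GenBank->new(-verbose => -1);
> my $seqout = Bio::SeqIO->new(-file => ">$fileName", '-format' => 'Genbank',
>                              -alphabet => 'dna', -flush => 0, -verbose => -1);
> while (@ids) {
>     # take the next batch of up to 50 GI numbers
>     my @batchArray = splice(@ids, 0, 50);
>     my $batchArrayRef = \@batchArray;
>     my $streamObj;
>     my $pid = fork();
>     if ($pid == 0) {
>         # child: fetch this batch and write it through the shared $seqout handle
>         eval {
>             $streamObj = $gb->get_Stream_by_id($batchArrayRef);
>         };
>         if ($@) {
>             print "Error: " . $@ . "\n";
>         }
>         else {
>             while (my $seqObj = $streamObj->next_seq()) {
>                 # skip RefSeq-style accessions (NC_, NM_, NP_, ...)
>                 unless ($seqObj->accession_number() =~ /N[A-Z]\_/) {
>                     #print "ID: ".$seqObj->id()."\n";
>                     #print "Seq:\n".$seqObj->seq()."\n";
>                     $seqout->write_seq($seqObj);
>                 }
>             }
>         }
>         exit 0;
>     }
> }
> # as written, these run once after the while loop, outside the scope of the loop's "my $pid"
> waitpid($pid, 0);
> sleep(120);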
> ----------------------------------------END CODE----------------------------------
> Most of the GenBank records written to the output file have sequences, but a small portion do not, even though they should. For example, the NCBI entry for JX287367 includes an 'ORIGIN' tag and sequence, and when I print the sequence before writing to file, it appears on STDOUT, but the 'ORIGIN' tag and sequence are not written to the output gbk file. The following is found in the final output file:
> ----------------------------------START GBK-----------------------------------------
> LOCUS       JX287367                 588 bp    DNA     linear   BCT 19-DEC-2012
> DEFINITION  Chlamydia trachomatis strain UW-5/CX pyruvoyl-dependent arginine
>             decarboxylase (aaxB) gene, complete cds.
> ACCESSION   JX287367
> VERSION     JX287367.1  GI:404351720
> KEYWORDS    .
> SOURCE      Chlamydia trachomatis
>   ORGANISM  Chlamydia trachomatis
>             Bacteria; Chlamydiae; Chlamydiales; Chlamydiaceae;
>             Chlamydia/Chlamydophila group; Chlamydia.
> REFERENCE   1  (bases 1 to 588)
>   AUTHORS   Bliven,K.A., Fisher,D.J. and Maurelli,A.T.
>   TITLE     Characterization of the activity and expression of arginine
>             decarboxylase in human and animal Chlamydia pathogens
>   JOURNAL   FEMS Microbiol. Lett. 337 (2), 140-146 (2012)
>    PUBMED   23043454
> REFERENCE   2  (bases 1 to 588)
>   AUTHORS   Bliven,K.A., Fisher,D.J. and Maurelli,A.T.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (06-JUL-2012) Department of Microbiology and Immunology,
>             F. Edward Hebert School of Medicine, Uniformed Services University
>             of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD
>             20814, USA
> FEATURES             Location/Qualifiers
>      source          1..588
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:813"
>                      /strain="UW-5/CX"
>                      /organism="Chlamydia trachomatis"
>                      /serovar="E"
>      gene            1..588
>                      /gene="aaxB"
>      CDS             1..588
>                      /protein_id="AFR60849.1"
>                      /gene="aaxB"
>                      /transl_table=11
>                      /note="AaxB"
>                      /db_xref="GI:404351721"
>                      /codon_start=1
>                      /product="pyruvoyl-dependent arginine decarboxylase"
>                      /translation="MPYGTRYPTLAFHTGGVGESDDGMPPQPFETFCYDSALLQAKIE
>                      NFNIVPYTSVLPKELFGNILPVDQCTKFFKHGAVLEVIMAGRGATVTDGTQAIATGVG
>                      ICWGKDKNGELIGGWAAEYVEFFPTWIDDEIAESHAKMWLKKSLQHELDLRSVSKHSE
>                      FQYFHNYINIRKKFGFCLTALGFLNFENVAPAVIQ"
> //
> ----------------------------------END GBK-----------------------------------------
> Can anyone tell what I am missing or why this is happening? I don't know whether this also happened in earlier BioPerl versions, since until now I usually downloaded sequences straight from NCBI, but that became too time-consuming... though this approach seems to be as well :S
> Thank you in advance for any help,
> Veronica
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
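One thing in the script above that may be worth ruling out (a guess, not a confirmed diagnosis) is that every forked child writes through the same $seqout handle, and, as written, waitpid() and sleep() sit after the while loop, so several children can be writing to that one file at the same time. A common way to avoid sharing an output handle between processes is to give each child its own output file and let the parent concatenate the parts, in batch order, once all children have finished. The sketch below assumes that pattern; write_batch(), the IDs and the batch size are hypothetical, and plain-text records stand in for the get_Stream_by_id()/write_seq() calls so it runs with core Perl only, without BioPerl or network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical stand-in for the per-batch work. In the real script this is where
# get_Stream_by_id() would run and write_seq() would go to a Bio::SeqIO opened on
# $path, e.g. Bio::SeqIO->new(-file => ">$path", -format => 'genbank').
sub write_batch {
    my ($path, @ids) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    print {$fh} "record $_\n//\n" for @ids;
    close $fh or die "Cannot close $path: $!";
}

my @ids        = (1 .. 7);    # hypothetical IDs
my $batch_size = 3;
my $dir        = tempdir(CLEANUP => 1);
my (@parts, @pids);           # one part file and one child per batch, in order

while (@ids) {
    my @batch = splice(@ids, 0, $batch_size);
    my $part  = File::Spec->catfile($dir, sprintf('batch_%03d.gb', scalar @parts));
    push @parts, $part;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: write only to its own file, never to a handle shared with siblings.
        my $ok = eval { write_batch($part, @batch); 1 };
        warn $@ unless $ok;
        # _exit skips END blocks and destructors inherited from the parent;
        # the part file has already been closed explicitly in write_batch().
        POSIX::_exit($ok ? 0 : 1);
    }
    push @pids, $pid;
    # A pause such as the original sleep(120) could go here to pace requests.
}

# Parent: wait for every child and stop if any batch failed.
for my $pid (@pids) {
    waitpid($pid, 0);
    die "child $pid failed" if $? != 0;
}

# Concatenate the per-batch files into one output, in batch order.
my $out_path = File::Spec->catfile($dir, 'all.gb');
open my $out, '>', $out_path or die "Cannot open $out_path: $!";
for my $part (@parts) {
    open my $in, '<', $part or die "Cannot open $part: $!";
    print {$out} $_ while <$in>;
    close $in;
}
close $out or die "Cannot close $out_path: $!";
print "merged ", scalar(@parts), " batch files into $out_path\n";
```

If the batches are meant to run strictly one after another (50 IDs every 2 minutes), moving waitpid($pid, 0) and sleep(120) inside the while loop, or dropping fork() and writing from the parent, would also leave only one process writing to the output at a time.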




