[Bioperl-l] Bio::SeqIO doesn't write all gbk sequences from Bio::DB::GenBank
Veronica A.
armendarez77 at hotmail.com
Wed May 8 00:32:22 UTC 2013
Hello,
I'm currently running BioPerl 1.6.1 on Ubuntu 12.04.2 LTS and have noticed a problem with Bio::SeqIO when writing GenBank files with the write_seq() method: some of the records written do not include an 'ORIGIN' tag or the sequence.
I retrieve the GenBank records via Bio::DB::GenBank using GI numbers, 50 at a time every 2 minutes:
----------------------------------------START CODE----------------------------------
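use Bio::DB::GenBank;
use Bio::SeqIO;

# @ids holds the GI numbers to fetch; $fileName is the output .gbk path.
my $gb     = Bio::DB::GenBank->new(-verbose => -1);
my $seqout = Bio::SeqIO->new(-file     => ">$fileName",
                             -format   => 'Genbank',
                             -alphabet => 'dna',
                             -flush    => 0,
                             -verbose  => -1);

while (@ids) {
    # Take the next batch of up to 50 GI numbers.
    my @batchArray    = splice(@ids, 0, 50);
    my $batchArrayRef = \@batchArray;
    my $streamObj;

    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child process: fetch this batch and write it to the output file.
        eval { $streamObj = $gb->get_Stream_by_id($batchArrayRef); };
        if ($@) {
            print "Error: " . $@ . "\n";
        }
        else {
            while (my $seqObj = $streamObj->next_seq()) {
                # Skip accessions containing N?_ (e.g. RefSeq NC_, NM_).
                unless ($seqObj->accession_number() =~ /N[A-Z]\_/) {
                    #print "ID: ".$seqObj->id()."\n";
                    #print "Seq:\n".$seqObj->seq()."\n";
                    $seqout->write_seq($seqObj);
                }
            }
        }
        exit 0;
    }

    # Parent: wait for this batch's child to finish, then pause 2 minutes.
    waitpid($pid, 0);
    sleep(120);
}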
----------------------------------------END CODE----------------------------------
Most of the GenBank records written to the output file have sequences, but a small portion do not, even though they should. For example, JX287367 in NCBI includes an 'ORIGIN' tag and sequence, and when I print the sequence just before writing it, the sequence appears on STDOUT; however, the 'ORIGIN' tag and sequence are not written to the output .gbk file. This is what ends up in the final output file:
----------------------------------START GBK-----------------------------------------
LOCUS       JX287367                 588 bp    DNA     linear   BCT 19-DEC-2012
DEFINITION  Chlamydia trachomatis strain UW-5/CX pyruvoyl-dependent arginine
            decarboxylase (aaxB) gene, complete cds.
ACCESSION   JX287367
VERSION     JX287367.1  GI:404351720
KEYWORDS    .
SOURCE      Chlamydia trachomatis
  ORGANISM  Chlamydia trachomatis
            Bacteria; Chlamydiae; Chlamydiales; Chlamydiaceae;
            Chlamydia/Chlamydophila group; Chlamydia.
REFERENCE   1  (bases 1 to 588)
  AUTHORS   Bliven,K.A., Fisher,D.J. and Maurelli,A.T.
  TITLE     Characterization of the activity and expression of arginine
            decarboxylase in human and animal Chlamydia pathogens
  JOURNAL   FEMS Microbiol. Lett. 337 (2), 140-146 (2012)
   PUBMED   23043454
REFERENCE   2  (bases 1 to 588)
  AUTHORS   Bliven,K.A., Fisher,D.J. and Maurelli,A.T.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-JUL-2012) Department of Microbiology and Immunology,
            F. Edward Hebert School of Medicine, Uniformed Services University
            of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD 20814,
            USA
FEATURES             Location/Qualifiers
     source          1..588
                     /mol_type="genomic DNA"
                     /db_xref="taxon:813"
                     /strain="UW-5/CX"
                     /organism="Chlamydia trachomatis"
                     /serovar="E"
     gene            1..588
                     /gene="aaxB"
     CDS             1..588
                     /protein_id="AFR60849.1"
                     /gene="aaxB"
                     /transl_table=11
                     /note="AaxB"
                     /db_xref="GI:404351721"
                     /codon_start=1
                     /product="pyruvoyl-dependent arginine decarboxylase"
                     /translation="MPYGTRYPTLAFHTGGVGESDDGMPPQPFETFCYDSALLQAKIE
                     NFNIVPYTSVLPKELFGNILPVDQCTKFFKHGAVLEVIMAGRGATVTDGTQAIATGVG
                     ICWGKDKNGELIGGWAAEYVEFFPTWIDDEIAESHAKMWLKKSLQHELDLRSVSKHSE
                     FQYFHNYINIRKKFGFCLTALGFLNFENVAPAVIQ"
//
----------------------------------END GBK-----------------------------------------
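To list exactly which records in the output file are affected, a small standalone check may help. This is a minimal sketch in plain Perl (no BioPerl needed); `records_missing_origin` is a hypothetical helper, not a BioPerl method, and it assumes every record ends with a `//` line and that the ORIGIN keyword starts in column 1, as in standard GenBank flatfiles like the one above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given the full text of a
# GenBank flatfile, return the LOCUS names of records with no ORIGIN line.
sub records_missing_origin {
    my ($text) = @_;
    my @missing;
    # Each GenBank record is terminated by a line containing only "//".
    for my $record (split /^\/\/[ \t]*\r?\n?/m, $text) {
        # Skip chunks without a LOCUS line (e.g. trailing blank lines).
        next unless $record =~ /^LOCUS\s+(\S+)/m;
        my $locus = $1;
        push @missing, $locus unless $record =~ /^ORIGIN\b/m;
    }
    return @missing;
}

# Usage: perl check_origin.pl output.gbk
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my @missing = records_missing_origin($text);
    print scalar(@missing), " record(s) without ORIGIN\n";
    print "$_\n" for @missing;
}
```

Comparing that list against the GI numbers in each batch would show whether the missing sequences cluster in particular batches.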
Can anyone tell me what I am missing, or why this is happening? I don't know whether this also happens in earlier BioPerl versions: until now I usually downloaded sequences straight from NCBI, but that became too time-consuming... and this approach seems to be as well :S
Thank you in advance for any help,
Veronica