[Bioperl-l] Bio::SeqIO doesn't write all gbk sequences from Bio::DB::GenBank
Fields, Christopher J
cjfields at illinois.edu
Wed May 8 02:17:43 UTC 2013
Veronica,
Your mail may have garbled the script and example file. Can you paste these in a gist?
https://gist.github.com/
chris
On May 7, 2013, at 7:32 PM, Veronica A. <armendarez77 at hotmail.com> wrote:
> Hello,
> I'm currently running BioPerl 1.6.1 on Ubuntu 12.04.2 LTS and have noticed a problem with Bio::SeqIO when writing GenBank files with the write_seq() method: some of the files do not include an 'ORIGIN' tag or the sequence.
>
> I am using GI#s (50 at a time every 2 minutes) to retrieve genbank files via Bio::DB::GenBank.
>
> ----------------------------------------START CODE----------------------------------
> my $gb = Bio::DB::GenBank->new(-verbose=>-1);
> my $seqout = Bio::SeqIO->new(-file=>">$fileName", '-format'=>'Genbank', -alphabet=>'dna', -flush=>0, -verbose=>-1);
> while(@ids){
>     my @batchArray = splice(@ids, 0, 50);
>     my $batchArrayRef = \@batchArray;
>     my $streamObj;
>     my $pid = fork();
>     if($pid == 0){
>         eval{
>             $streamObj = $gb->get_Stream_by_id($batchArrayRef);
>         };
>         if($@){
>             print "Error: ".$@."\n";
>         }
>         else{
>             while(my $seqObj = $streamObj->next_seq()){
>                 unless($seqObj->accession_number() =~ /N[A-Z]\_/){
>                     #print "ID: ".$seqObj->id()."\n";
>                     #print "Seq:\n".$seqObj->seq()."\n";
>                     $seqout->write_seq($seqObj);
>                 }
>             }
>         }
>         exit 0;
>     }
> }
> waitpid($pid,0);
> sleep(120);
> ----------------------------------------END CODE----------------------------------
> Most of the GenBank records written to the output file have sequences, but a small portion do not, even though they should. For example, JX287367 in NCBI includes an 'ORIGIN' tag and sequence, and when I use the print function before writing to file, the sequence is printed to STDOUT, but the 'ORIGIN' tag and sequence are not written to the output gbk file. The following is found in the final output file:
> ----------------------------------START GBK-----------------------------------------
> LOCUS       JX287367                 588 bp    DNA     linear   BCT 19-DEC-2012
> DEFINITION  Chlamydia trachomatis strain UW-5/CX pyruvoyl-dependent arginine
>             decarboxylase (aaxB) gene, complete cds.
> ACCESSION   JX287367
> VERSION     JX287367.1  GI:404351720
> KEYWORDS    .
> SOURCE      Chlamydia trachomatis
>   ORGANISM  Chlamydia trachomatis
>             Bacteria; Chlamydiae; Chlamydiales; Chlamydiaceae;
>             Chlamydia/Chlamydophila group; Chlamydia.
> REFERENCE   1  (bases 1 to 588)
>   AUTHORS   Bliven,K.A., Fisher,D.J. and Maurelli,A.T.
>   TITLE     Characterization of the activity and expression of arginine
>             decarboxylase in human and animal Chlamydia pathogens
>   JOURNAL   FEMS Microbiol. Lett. 337 (2), 140-146 (2012)
>    PUBMED   23043454
> REFERENCE   2  (bases 1 to 588)
>   AUTHORS   Bliven,K.A., Fisher,D.J. and Maurelli,A.T.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (06-JUL-2012) Department of Microbiology and Immunology,
>             F. Edward Hebert School of Medicine, Uniformed Services
>             University of the Health Sciences, 4301 Jones Bridge Road,
>             Bethesda, MD 20814, USA
> FEATURES             Location/Qualifiers
>      source          1..588
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:813"
>                      /strain="UW-5/CX"
>                      /organism="Chlamydia trachomatis"
>                      /serovar="E"
>      gene            1..588
>                      /gene="aaxB"
>      CDS             1..588
>                      /protein_id="AFR60849.1"
>                      /gene="aaxB"
>                      /transl_table=11
>                      /note="AaxB"
>                      /db_xref="GI:404351721"
>                      /codon_start=1
>                      /product="pyruvoyl-dependent arginine decarboxylase"
>                      /translation="MPYGTRYPTLAFHTGGVGESDDGMPPQPFETFCYDSALLQAKIE
>                      NFNIVPYTSVLPKELFGNILPVDQCTKFFKHGAVLEVIMAGRGATVTDGTQAIATGVG
>                      ICWGKDKNGELIGGWAAEYVEFFPTWIDDEIAESHAKMWLKKSLQHELDLRSVSKHSE
>                      FQYFHNYINIRKKFGFCLTALGFLNFENVAPAVIQ"
> //
> ----------------------------------END GBK-----------------------------------------
> Can anyone tell what I am missing or why this is happening? I don't know if this has happened in earlier BioPerl versions because, up until now, I usually downloaded sequences straight from NCBI, but that became too time-consuming... but this seems to be as well :S
> Thank you in advance for any help,
> Veronica
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
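
For anyone trying to narrow this down, here is a minimal diagnostic sketch of the same batch download. It does not fork and leaves Bio::SeqIO flushing at its default, on the unconfirmed assumption that the fork() plus -flush=>0 combination might be involved. Nothing in this thread establishes that as the cause. The helper names batches() and fetch_and_write() are hypothetical. The BioPerl calls are the ones used in the quoted script, plus close() on the output stream.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into array refs of at most
# $size elements each.
sub batches {
    my ($size, @ids) = @_;
    my @out;
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Hypothetical helper: fetch every batch in the current process (no fork)
# and write all records through a single Bio::SeqIO handle, warning about
# any record that arrives without sequence data.
sub fetch_and_write {
    my ($out_file, @ids) = @_;

    # Loaded at run time so batches() can be tried without BioPerl installed.
    require Bio::DB::GenBank;
    require Bio::SeqIO;

    my $gb     = Bio::DB::GenBank->new();
    my $seqout = Bio::SeqIO->new(-file => ">$out_file", -format => 'genbank');

    my @todo = batches(50, @ids);
    while (my $batch = shift @todo) {
        my $stream = eval { $gb->get_Stream_by_id($batch) };
        if (!$stream) {
            warn "Fetch failed: $@\n";
            next;
        }
        while (my $seq = $stream->next_seq) {
            # Same accession filter as the quoted script.
            next if $seq->accession_number =~ /N[A-Z]_/;

            my $residues = $seq->seq;
            warn $seq->accession_number, " has no sequence data\n"
                unless defined $residues && length $residues;

            $seqout->write_seq($seq);
        }
        sleep 120 if @todo;    # pause between batches, as in the quoted script
    }
    $seqout->close;
    return;
}

# Runs only when given an output file followed by one or more IDs, e.g.
#   perl fetch_gbk.pl out.gbk 404351720
if (@ARGV >= 2) {
    my ($out_file, @ids) = @ARGV;
    fetch_and_write($out_file, @ids);
}
```

If the warning never fires but records in the output file still lack ORIGIN, the sequence is being lost at write time rather than at fetch time, which would be worth including in a follow-up report.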