[Bioperl-l] parsing GenBank file
shalabh sharma
shalabh.sharma7 at gmail.com
Tue May 4 18:18:02 UTC 2010
Hi All,
I have a huge GenBank file (downloaded from RDP, containing all
bacterial 16S sequences). I just want to parse the RDP ID (in LOCUS) and the organism's lineage
(in ORGANISM).
I wrote a simple script for this:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# Read the file named on the command line as GenBank format.
my $seqio_object = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
while (my $seq_object = $seqio_object->next_seq) {
    # Record ID taken from the LOCUS line.
    my $id = $seq_object->id;
    print "$id\t";
    # Lineage from ORGANISM, ordered from species up to root.
    my @classification = $seq_object->species->classification;
    foreach my $val (@classification) { print "$val\t"; }
    print "\n";
}
I am getting output like:
S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae
Holophagales Holophagae "Acidobacteria" Bacteria Root
S000148973 uncultured Geothrix sp. Geothrix Holophagaceae Holophagales
Holophagae "Acidobacteria" Bacteria Root
S000431649 uncultured Acidobacteria bacterium Geothrix Holophagaceae
Holophagales Holophagae "Acidobacteria" Bacteria Root
..
..
This is exactly the output I want, but I am missing a lot of records (they are
in the GenBank file but not in my output).
I also got a warning during parsing:
--------------------- WARNING ---------------------
MSG: Unbalanced quote in:
/db_xref="taxon:35783" /germline"
/mol_type="genomic DNA"
/organism="Enterococcus sp."
/strain="LMG12316"No further qualifiers will be added for this feature
---------------------------------------------------
So I was wondering: is this warning causing the problem, or
am I doing something wrong?
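To put a number on how many records are missing, a minimal sketch like the one below could count the LOCUS lines in the file, for comparison with the number of lines the script above prints. It uses only core Perl, not BioPerl. The sub name count_locus_records is made up for illustration, and the count assumes that every record begins with exactly one LOCUS line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count GenBank records by counting lines that begin with "LOCUS".
# Assumption: every record in the file starts with exactly one LOCUS line.
# count_locus_records is a hypothetical helper name, not a BioPerl function.
sub count_locus_records {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /^LOCUS\s/;
    }
    close($fh);
    return $count;
}

# Only run when a file name is given on the command line.
if (@ARGV) {
    my $n = count_locus_records($ARGV[0]);
    print "$n LOCUS lines in $ARGV[0]\n";
}
```

If the number printed here is larger than the number of output lines from the BioPerl script, the difference is the number of records that were dropped or never reached during parsing.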
Thanks
Shalabh