[Bioperl-l] can't parse GenBank correctly (SeqIO or included
modules)
t-nakazato at muj.biglobe.ne.jp
Fri Sep 30 07:53:18 EDT 2005
Hi,
I'd like to report a case where BioPerl does not parse a GenBank
file correctly.
BioPerl (SeqIO or one of the modules it uses) merges the PUBMED line
into the preceding REMARK line of a GenBank reference.
I'm running BioPerl 1.5 on Red Hat 9.
I wrote the following script to retrieve PMIDs from a GenBank file.
-----
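#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# GenBank file to read, given as the first command-line argument
my $file_in = shift;
my $in_obj  = Bio::SeqIO->new( -file   => $file_in,
                               -format => "GenBank" );
while ( my $each_obj = $in_obj->next_seq ) {
    # One annotation object per REFERENCE block in the record
    my @ref_objarray = $each_obj->annotation->get_Annotations("reference");
    foreach my $ref_obj (@ref_objarray) {
        my $pmid = $ref_obj->pubmed();
        print "$pmid\n" if defined $pmid;
    }
}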
-----
Most GenBank files are parsed correctly, but I can't get the PMID
from AC091629 or AC002397 (GenBank accession numbers); nothing is
printed for them.
The relevant part of the original GenBank file (for AC091629) is
as follows.
-----
...
REFERENCE   1  (bases 1 to 161334)
  AUTHORS   Poorkaj,P., Kas,A., D'Souza,I., Zhou,Y., Pham,Q., Stone,M.,
            Olson,M.V. and Schellenberg,G.D.
  TITLE     A genomic sequence analysis of the mouse and human
            microtubule-associated protein tau
  JOURNAL   Mamm. Genome 12 (9), 700-712 (2001)
  REMARK    Contact: Gerald D. Schellenberg (zachdad at u.washington.edu)
   PUBMED   11641718
...
-----
So I tried "print $ref_obj->comment();" instead, which prints:
-----
Contact: Gerald D. Schellenberg (zachdad at u.washington.edu) PUBMED 11641718
-----
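Using Data::Dumper on $each_obj (the object returned by next_seq in
my script), I confirmed that BioPerl stores the PUBMED line in the
reference's "comment" field instead of its "pubmed" field.
Ver. 1.4 confused the JOURNAL and PUBMED lines.
That problem was fixed in 1.5, but a similar one seems to remain
for the REMARK line.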
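As a stopgap, the PMID could probably be recovered from the comment text. The following is only a minimal sketch, assuming the swallowed text always appears as "PUBMED <digits>" at the very end of the string returned by comment(). The helper name pmid_from_comment is hypothetical and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): pull a PMID out of a
# comment string that has absorbed the PUBMED line.
# Assumption: the stray text has the form "PUBMED <digits>" at the end.
sub pmid_from_comment {
    my ($comment) = @_;
    return undef unless defined $comment;
    return $comment =~ /\bPUBMED\s+(\d+)\s*$/ ? $1 : undef;
}

# The comment string reported above for AC091629
my $comment = 'Contact: Gerald D. Schellenberg '
            . '(zachdad at u.washington.edu) PUBMED 11641718';

my $pmid = pmid_from_comment($comment);
print defined $pmid ? "$pmid\n" : "no PMID found\n";
```

Inside the loop of the script above, this could be used as a fallback, for example "my $pmid = $ref_obj->pubmed() || pmid_from_comment($ref_obj->comment());", so that correctly parsed records are unaffected.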
Best regards,
Takeru