[Bioperl-l] Parsing entrezgene with Bio::SeqIO
Liisa Koski
koski at cenix-bioscience.com
Thu Mar 16 15:14:24 UTC 2006
Hi,
I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_OLD/Mammalia/Homo_sapiens.gz).
I'm using bioperl-1.5.1.
I want to extract the KEGG annotations.
See code below.
use strict;
use warnings;
use Bio::SeqIO;
use Bio::ASN1::EntrezGene;   # parser used internally by the 'entrezgene' format

# Read the text ASN.1 EntrezGene file one gene at a time
my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                            -file   => 'Homo_sapiens');
while (my $gene = $seqio->next_seq) {
    print "\n", $gene->id, "\t", $gene->accession_number, "\n";
    my $ann = $gene->annotation();
    # Dump every annotation (including 'dblink') as key = text
    foreach my $key ( $ann->get_all_annotation_keys() ) {
        my @values = $ann->get_Annotations($key);
        foreach my $value ( @values ) {
            print $key, "\t", "=", "\t", $value->as_text, "\n";
        }
    }
}
Unfortunately, the only KEGG annotation I see in the output looks like this:
dblink = Direct database link to in database KEGG
(Note that nothing appears between 'to' and 'in', where I'd expect the KEGG identifier.)
Does anyone have an idea how to get at the actual KEGG annotations?
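The as_text() summary hides which DBLink fields are actually empty. A minimal diagnostic sketch is to print the raw database, primary_id and optional_id of every KEGG link instead. It assumes the links are Bio::Annotation::DBLink objects stored under the 'dblink' key, as the output above suggests; kegg_links() is a hypothetical helper. If all three fields come back empty, the identifier is probably not being stored in the DBLink object at all.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return [database, primary_id, optional_id] for each
# KEGG DBLink in one annotation collection, rather than relying on as_text().
sub kegg_links {
    my ($ann) = @_;
    my @rows;
    for my $link ($ann->get_Annotations('dblink')) {
        # Only DBLink objects provide database()/primary_id()/optional_id()
        next unless $link->isa('Bio::Annotation::DBLink');
        my $db = defined $link->database ? $link->database : '';
        next unless $db eq 'KEGG';
        my $pid = defined $link->primary_id  ? $link->primary_id  : '';
        my $oid = defined $link->optional_id ? $link->optional_id : '';
        push @rows, [ $db, $pid, $oid ];
    }
    return @rows;
}

# Only parse a file when one is given on the command line
if (my $file = shift @ARGV) {
    my $seqio = Bio::SeqIO->new(-format => 'entrezgene', -file => $file);
    while (my $gene = $seqio->next_seq) {
        for my $row (kegg_links($gene->annotation)) {
            # Empty fields print as empty tab-separated columns
            print join("\t", $gene->accession_number, @$row), "\n";
        }
    }
}
```

Run it with the EntrezGene file name as the only argument (e.g. `perl kegg_links.pl Homo_sapiens`; the script name is hypothetical).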
Note: I also tried parsing the binary file
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
but got the following error:
./entrez_gene_seqio.pl Homo_sapiens.ags
Data Error: none conforming data found on line 1 in Homo_sapiens.ags!
first 20 (or till end of input) characters including the non-conforming data:
00
 at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm line 138
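The error says no conforming data was found on line 1, which fits a binary file being handed to Bio::ASN1::EntrezGene, a parser for the text ASN.1 form. A quick heuristic check of which form a file is in, before passing it to Bio::SeqIO, might look like the sketch below. looks_like_text() is a hypothetical helper, and both the 512-byte sample size and the "printable ASCII only" rule are arbitrary assumptions, so treat the result as a hint rather than a format detector.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the first $nbytes of $path contain only
# tab, LF, CR and printable ASCII (as text ASN.1 would), 0 otherwise.
sub looks_like_text {
    my ($path, $nbytes) = @_;
    $nbytes ||= 512;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $got = read($fh, my $buf, $nbytes);
    close $fh;
    die "Cannot read $path: $!" unless defined $got;
    # Any byte outside the allowed set is treated as evidence of binary data
    return $buf =~ /[^\t\n\r\x20-\x7E]/ ? 0 : 1;
}

if (my $file = shift @ARGV) {
    print "$file: ", (looks_like_text($file) ? "text" : "binary"), "\n";
}
```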
Thanks,
Liisa