[Bioperl-l] Parsing entrezgene with Bio::SeqIO

Liisa Koski koski at cenix-bioscience.com
Thu Mar 16 15:14:24 UTC 2006


Hi,
I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from 
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_OLD/Mammalia/Homo_sapiens.gz).

I'm using bioperl-1.5.1.

I want to extract the KEGG annotations.
See code below.

use strict;
use warnings;
use Bio::SeqIO;
use Bio::ASN1::EntrezGene;

my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                            -file   => 'Homo_sapiens');
while (my $gene = $seqio->next_seq) {
    print "\n", $gene->id, "\t", $gene->accession_number, "\n";
    # Dump every annotation attached to the gene, one key/value pair per line.
    my $ann = $gene->annotation();
    foreach my $key ( $ann->get_all_annotation_keys() ) {
        my @values = $ann->get_Annotations($key);
        foreach my $value ( @values ) {
            print $key, "\t", "=", "\t", $value->as_text, "\n";
        }
    }
}

Unfortunately the only KEGG annotation I see in the results looks like:
dblink  =       Direct database link to  in database KEGG 
(Notice the double space between 'to' and 'in', where I would expect an identifier.)
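
One way to narrow this down is to print the fields of each dblink object separately instead of relying on as_text. The sketch below assumes the dblink values are Bio::Annotation::DBLink objects and uses that class's database, primary_id, optional_id and comment accessors. As far as I can tell, as_text builds its string from primary_id and database, so the gap would mean primary_id is empty for the KEGG links. Whether the KEGG identifier ends up in one of the other fields is not something I have confirmed.

```perl
use strict;
use warnings;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                            -file   => 'Homo_sapiens');

while (my $gene = $seqio->next_seq) {
    # Look only at the 'dblink' annotations rather than at every key.
    foreach my $link ($gene->annotation->get_Annotations('dblink')) {
        # Assumption: dblink values are Bio::Annotation::DBLink objects.
        next unless $link->isa('Bio::Annotation::DBLink');
        next unless defined $link->database && $link->database eq 'KEGG';

        # Print each field in brackets so that empty strings are visible;
        # undefined fields are shown as '<undef>'.
        print join("\t",
            $gene->id,
            map { defined $_ ? "[$_]" : '<undef>' }
                $link->database, $link->primary_id,
                $link->optional_id, $link->comment,
        ), "\n";
    }
}
```

If every bracketed field except the database name comes out empty, the identifier is presumably being dropped during parsing rather than at print time.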

Does anyone have any ideas on how to get the KEGG annotation results?

Note: I also tried parsing the binary file 
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
but I got the error below:

./entrez_gene_seqio.pl Homo_sapiens.ags
Data Error: none conforming data found on line 1 in Homo_sapiens.ags!
first 20 (or till end of input) characters including the non-conforming data:
00
 at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm line 138


Thanks,
Liisa



