[Bioperl-l] Parsing entrezgene with Bio::SeqIO

Thu Mar 16 15:59:32 UTC 2006

Liisa Koski wrote:

> Unfortunately the only KEGG annotation I see in the results looks like:
> dblink  =       Direct database link to  in database KEGG 
> (Notice the space between 'to  in')
> 
> Anyone have any ideas how to get the KEGG annotation results?

Stefan is the person maintaining the Bio::SeqIO::entrezgene module, so 
he'd be able to answer this part of your question.

> 
> Note: I also tried parsing the file 
> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
> but I got the below error:
> 
> ./entrez_gene_seqio.pl Homo_sapiens.ags
> Data Error: none conforming data found on line 1 in Homo_sapiens.ags!
> first 20 (or till end of input) characters including the non-conforming data:
> 00
>  at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm 
> line 138
> 
The error was thrown by my Bio::ASN1::EntrezGene module because it 
expects a text file, while you fed it a binary file.  To use the 
gzipped binary ASN.1 files from NCBI, download NCBI's gene2xml tool 
(ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml),
then use this syntax to run my parser on the binary file:

my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | ");
# Homo_sapiens.ags.gz is the gzipped binary file downloaded directly from NCBI

The same syntax should be used when you go through Bio::SeqIO (and thus 
Bio::SeqIO::entrezgene): pass the piped gene2xml command as the file argument.
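
For the SeqIO route, a minimal, untested sketch might look like the script below. It assumes gene2xml is on your PATH, that Homo_sapiens.ags.gz is in the current directory, and that your BioPerl version hands a -file value ending in "|" to Perl's open as a piped command. The loop and the printed fields are only illustrative choices, not something taken from your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Assumption: gene2xml is installed and on PATH, and the gzipped binary
# file from NCBI sits in the current directory.
# The trailing "|" asks Perl to run the command and read its output, so
# the parser sees text rather than the binary file.
my $in = Bio::SeqIO->new(
    -format => 'entrezgene',
    -file   => "gene2xml -i Homo_sapiens.ags.gz -c -x -b |",
);

while (1) {
    # In list context next_seq returns more than the sequence object;
    # only the first element (a Bio::Seq) is used here.
    my ($gene) = $in->next_seq;
    last unless defined $gene;

    # Illustrative output: identifiers as filled in by the entrezgene parser.
    print join("\t", $gene->accession_number, $gene->display_id), "\n";
}
```

If the piped form does not work with your BioPerl version, running gene2xml separately to write a text file and giving that file name to -file is the fallback.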

Best,

Mingyi