[Bioperl-l] Parsing entrezgene with Bio::SeqIO
Mingyi Liu
mingyi.liu at gpc-biotech.com
Thu Mar 16 15:59:32 UTC 2006
Liisa Koski wrote:
> Unfortunately the only KEGG annotation I see in the results looks like:
> dblink = Direct database link to in database KEGG
> (Notice the space between 'to in')
>
> Anyone have any ideas how to get the KEGG annotation results?
Stefan is the person maintaining the SeqIO::entrezgene module, so he'd be
able to answer this part of your question.
>
> Note: I also tried parsing the file
> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
> but I got the below error:
>
> ./entrez_gene_seqio.pl Homo_sapiens.ags
> Data Error: none conforming data found on line 1 in Homo_sapiens.ags!
> first 20 (or till end of input) characters including the non-conforming data:
> 00
> at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm
> line 138
>
The error was thrown by my Bio::ASN1::EntrezGene module because it
expects a text file, while you fed it a binary file. To use the
gzipped ASN.1 binary file from NCBI, download NCBI's gene2xml
(ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml),
then use this syntax to run my parser on the binary file:

    # Homo_sapiens.ags.gz is the gzipped binary file downloaded directly from NCBI
    my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | ");

The trailing "|" makes the parser read gene2xml's text output rather than
the binary file itself. The same syntax should be used when you're using
SeqIO (and thus SeqIO::entrezgene).
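
For anyone who wants to see the whole loop, here is a minimal sketch built
around the parser call above. It assumes gene2xml is on your PATH and that
Bio::ASN1::EntrezGene is installed. gene2xml_pipe is a hypothetical helper
name used only for illustration; it is not part of either package. The
next_seq iterator is the one shown in the module's own synopsis.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): build the "command |"
# string that makes the parser read gene2xml's text output instead of
# the gzipped binary file. The flags are the ones given in this message.
# Assumes the file name contains no spaces or shell metacharacters.
sub gene2xml_pipe {
    my ($ags_gz) = @_;
    return "gene2xml -i $ags_gz -c -x -b |";
}

# Only run the parser when a file name is given on the command line.
if (@ARGV) {
    require Bio::ASN1::EntrezGene;
    my $parser = Bio::ASN1::EntrezGene->new('file' => gene2xml_pipe($ARGV[0]));

    my $count = 0;
    while (my $record = $parser->next_seq) {
        $count++;    # $record holds the parsed data for one gene record
    }
    print "parsed $count Entrez Gene records\n";
}
```

Run it with the downloaded file as the only argument, e.g. with
Homo_sapiens.ags.gz. Without an argument the script does nothing, so it
can be syntax-checked on a machine that has neither gene2xml nor the
module installed.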
Best,
Mingyi