[Bioperl-l] Re: entrezgene binary ASN
Mingyi Liu
mingyi.liu at gpc-biotech.com
Fri Sep 30 12:23:34 EDT 2005
I was halfway through adding pipe support to
Bio::ASN1::EntrezGene before I realized that this is not a good
solution. The problem I have with the pipe approach is that it merely
adds trouble without really saving anything.
I mean, one superficial advantage of using a pipe directly would be that
you don't need to launch gene2xml first. But:
1. Nobody needs to launch gene2xml manually. In any shell/perl script
that automatically downloads the NCBI binary ASN files, just add a line
to launch gene2xml right after the download.
2. Having the EntrezGene module handle it transparently would force it
to deal with multiple failure possibilities (no gene2xml installed?
gene2xml choked? ...), let alone the hassle of changing the syntax of
input_file.
Simply put, it's not worth it.
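
As a rough illustration of point 2, the checks a wrapper would need
around the conversion step might look like the sketch below. The
function name convert_checked and its return codes are hypothetical and
not part of Bioperl or the NCBI tools. The converter command line is
passed in as arguments rather than hard-coded, and the command is
assumed to write the output file itself.

```shell
#!/bin/sh
# convert_checked OUTPUT COMMAND [ARGS...]
# Hypothetical helper (not part of Bioperl or the NCBI tools).
# Runs COMMAND with ARGS and checks the failure cases named above.
# COMMAND is assumed to write its result to OUTPUT itself.
convert_checked() {
    out=$1
    shift
    # Case 1: converter is not installed or not on PATH.
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "converter not found: $1" >&2
        return 127
    fi
    # Case 2: converter started but exited with an error.
    if ! "$@"; then
        echo "converter failed: $*" >&2
        rm -f "$out"   # do not leave a partial file for the parser
        return 1
    fi
    # Case 3: converter reported success but wrote nothing.
    if [ ! -s "$out" ]; then
        echo "converter produced no output: $out" >&2
        return 2
    fi
    return 0
}
```

In a download script this would be called with the output file name
followed by the gene2xml command line. The gene2xml options depend on
the installed version and are deliberately not spelled out here.
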
Another proposed advantage is saving disk I/O. In a sense it does (the
gzipped binary files are much smaller), but that does not necessarily
lead to shorter processing time, since the time gene2xml spends
converting on the fly has to be counted as well. Not to mention what
happens if gene2xml chokes for whatever reason.
A major disadvantage of using a pipe shows up with any sort of seek
operation on the input: a pipe cannot be repositioned, so the
performance would be abysmal. For indexing and indexed entry
retrieval, one simply has to pre-convert those binary gzipped files.
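
To make the seeking point concrete, here is a minimal sketch of
offset-based retrieval from an already converted text file. It is only
an illustration and not how the Bioperl indexing modules work
internally. The function names are hypothetical, and the record marker
"Entrezgene ::= {" is an assumption about the converted text format
that should be checked against the actual file. It also relies on
grep -b, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
# Assumed record marker; adjust it to match the actual converted file.
MARKER='Entrezgene ::= {'

# list_offsets FILE (hypothetical helper)
# Prints the byte offset of every line that starts with MARKER.
list_offsets() {
    grep -b "^$MARKER" "$1" | cut -d: -f1
}

# fetch_record FILE OFFSET (hypothetical helper)
# Prints the record that starts at byte OFFSET, stopping before the
# next line that starts with MARKER.
# tail -c +N starts output at byte N (1-based), hence OFFSET + 1.
fetch_record() {
    tail -c "+$(( $2 + 1 ))" "$1" |
        awk -v m="$MARKER" 'NR > 1 && index($0, m) == 1 { exit } { print }'
}
```

With a regular file the offsets from list_offsets can be stored once
and reused for any number of lookups. With a pipe there is nothing to
jump to: every lookup would have to rerun the conversion and read
through everything before the wanted record.
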
As such, I feel there are compelling reasons to first convert the
binary gzip files to text files, then use the existing Bioperl modules
to parse, index, and retrieve. Any further input/discussion on the
matter is welcome!
Thanks,
Mingyi
Michael Seewald wrote:
>Hi Stefan,
>
>There are ways to capture these errors. Perl exception handling might
>be a way to do it.
>
>On the other hand: Wouldn't incomplete .gz downloads throw an error
>right away? I have to check (but can't right now).
>
>Michael
>