[Bioperl-l] Re: entrezgene binary ASN
Mingyi Liu
mingyi.liu at gpc-biotech.com
Fri Sep 30 15:16:05 EDT 2005
Michael Seewald wrote:
>
> Hi Mingyi,
>
> I am not sure, what you mean. The piping (in my example) did already
> work nicely, not for you?
I was gonna add support such that a gzipped file would automatically be
considered a binary ASN file and trigger the launch of gene2xml. This way
the user does not have to worry about the syntax of launching gene2xml. At
the same time, the user can still call the module with the pipe syntax you
used. I never said it wouldn't work. But I thought you wanted the
parsers to handle the pipe thing (after reading your previous mail, I
found I had missed the part where you said "no need to rewrite the
parsers"). So I guess my justifications for not adding gene2xml-based
binary ASN support in EntrezGene.pm are moot. But the academic
discussion continues ... :)
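
To make the gzipped-file idea concrete, here is a rough sketch (in Python
rather than Perl, and not how EntrezGene.pm does anything). It assumes a
".gz" extension means binary ASN, decompresses with "gzip -dc", and pipes
the result through whatever converter command the caller supplies. The
names is_assumed_binary_asn, iter_text_lines and converter_cmd are made up
for illustration, and the actual gene2xml flags are deliberately left out.
The converter's STDERR is inherited, so its messages are passed on, and a
non-zero exit status is turned into an error.

```python
import subprocess
from pathlib import Path
from typing import Iterator, Sequence


def is_assumed_binary_asn(path: str) -> bool:
    """Heuristic from the thread: treat any gzipped file as binary ASN."""
    return Path(path).suffix.lower() == ".gz"


def iter_text_lines(path: str, converter_cmd: Sequence[str]) -> Iterator[str]:
    """Yield text lines, converting on the fly when the file is gzipped.

    converter_cmd is a hypothetical argv list for a tool that reads binary
    ASN on stdin and writes text on stdout (the role gene2xml plays here).
    """
    if not is_assumed_binary_asn(path):
        # Plain text file: read it directly, no external process needed.
        with open(path, "r") as handle:
            yield from handle
        return

    # Equivalent of the shell pipeline: gzip -dc FILE | CONVERTER
    gunzip = subprocess.Popen(["gzip", "-dc", path], stdout=subprocess.PIPE)
    convert = subprocess.Popen(
        list(converter_cmd),
        stdin=gunzip.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    gunzip.stdout.close()  # so gzip sees a broken pipe if the converter dies
    try:
        yield from convert.stdout
    finally:
        convert.stdout.close()
        convert_rc = convert.wait()
        gunzip_rc = gunzip.wait()
    # Stderr of both tools was inherited; here we only surface the failure.
    if gunzip_rc != 0 or convert_rc != 0:
        raise RuntimeError(
            f"pipeline failed (gzip={gunzip_rc}, converter={convert_rc})"
        )
```

A caller would pass the real gene2xml command line as converter_cmd; for a
plain text file the converter is never started.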
>
> With respect to gene2xml failures: This is nothing the module has to
> care about. It *might* check for correct ASN1 syntax, but this is as
> much as I would go. Otherwise, I would just try to make sure, that any
> errors gene2xml throws are caught and passed on. It is the duty of the
> module and/or the person running the script to watch STDERR output!
Sure, and that's of course already built in.
>
> With respect to the indexing: Again I do not think this would break
> anything. Both gunzipping and transforming with gene2xml are
> transparent to the module. The index must not care about it! The
> indexer should recognize, however, if the index has to be rebuilt.
> (This is something that some bioperl modules have problems with AFAICR.)
>
I didn't say indexing would break, but the performance of retrieval
would be horrible. That's why in most situations there's no need to use
a pipe - after all, anyone who needs to use index & ID-based retrieval
would convert the binary ASN to a text file anyway (using a script,
hopefully).
> With respect to disc i/o: This is definitely a time-saver as more and
> more of us are running multi-CPU machines.
There would be some negligible savings. The disk I/O itself takes very
little time compared to the parsing (I've run a benchmark before using the
human entrezgene file). So unless one needs to save disk space and only
needs to run entrezgene once (ever), I'd say converting to a text file
first would save more time.
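
The trade-off can be put as back-of-the-envelope arithmetic. This is a
simplified model, not the benchmark mentioned above: it assumes the piped
route pays the conversion cost on every run with no overlap between
conversion and parsing, while the pre-converted route pays it once plus a
small cost per run to read the text file. All function and parameter names
are hypothetical, and any numbers fed in are placeholders.

```python
def total_seconds_piped(runs: int, convert_s: float, parse_s: float) -> float:
    """Every run decompresses/converts through the pipe, then parses."""
    return runs * (convert_s + parse_s)


def total_seconds_preconverted(
    runs: int, convert_s: float, parse_s: float, read_text_s: float
) -> float:
    """Convert once up front; each run reads the text file and parses it."""
    return convert_s + runs * (parse_s + read_text_s)
```

Under this model the two routes cost about the same for a single run, and
the pre-converted file wins from the second run on whenever reading the text
file is cheaper than converting.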
Mingyi