[Bioperl-l] Re: entrezgene binary ASN

Fri Sep 30 15:16:05 EDT 2005

Michael Seewald wrote:

>
> Hi Mingyi,
>
> I am not sure what you mean. The piping (in my example) already
> worked nicely. Did it not work for you?

I was going to add support so that a gzipped file would automatically be
treated as a binary ASN file and trigger the launch of gene2xml.  This
way the user does not have to worry about the syntax for launching
gene2xml.  At the same time, the user can still call the module with the
pipe syntax you used.  I never said it wouldn't work.  But I thought you
wanted the parsers to handle the pipe thing (after rereading your
previous mail, I found I had missed the part where you said "no need to
rewrite the parsers").  So I guess my justifications for not adding
gene2xml-based binary ASN support in EntrezGene.pm are moot.  But the
academic discussion continues ... :)
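
To make the auto-detection idea concrete, here is a minimal sketch in
Python (not the module's Perl code).  It assumes the rule proposed
above, that gzipped input means binary ASN, and recognizes gzip by its
two magic bytes rather than by file extension.  The names looks_gzipped
and open_input are hypothetical, and the gene2xml call itself is left
out because its flags should be taken from the tool's own help output.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_input(path):
    """Open a file for binary reading, decompressing on the fly if gzipped."""
    if looks_gzipped(path):
        return gzip.open(path, "rb")
    return open(path, "rb")
```

A caller could use looks_gzipped to decide whether the stream from
open_input must go through the converter first or can be handed to the
text parser directly.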

>
> With respect to gene2xml failures: This is nothing the module has to
> care about. It *might* check for correct ASN.1 syntax, but this is as
> far as I would go. Otherwise, I would just try to make sure that any
> errors gene2xml throws are caught and passed on. It is the duty of the
> module and/or the person running the script to watch STDERR output!

Sure, and that's of course already built in.
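
For readers wondering what "caught and passed on" could look like, here
is a minimal sketch in Python.  It is an illustration only, not the
code in EntrezGene.pm: run_converter and ConverterError are hypothetical
names, and the command is whatever converter invocation the caller
supplies.

```python
import subprocess


class ConverterError(RuntimeError):
    """Raised when the external converter exits with a non-zero status."""


def run_converter(command, data):
    """Feed bytes to an external command; return its stdout or raise with its stderr."""
    result = subprocess.run(command, input=data, capture_output=True, check=False)
    if result.returncode != 0:
        # Pass the tool's own complaint on to the caller instead of hiding it.
        message = result.stderr.decode(errors="replace").strip()
        raise ConverterError(
            f"{command[0]} exited with status {result.returncode}: {message}"
        )
    return result.stdout
```

The exit status and the STDERR text both end up in the exception, so
the script author sees the converter's message without having to watch
the terminal.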

>
> With respect to the indexing: Again, I do not think this would break
> anything. Both gunzipping and transforming with gene2xml are
> transparent to the module. The index need not care about it! The
> indexer should recognize, however, if the index has to be rebuilt.
> (This is something that some bioperl modules have problems with AFAICR.)
>
I didn't say indexing would break, but the performance of retrieval
would be horrible.  That's why in most situations there's no need to use
a pipe - after all, anyone who needs index- and ID-based retrieval
would convert the binary ASN to a text file anyway (using a script,
hopefully).
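
A small sketch in Python of why an index wants a real file: an offset
index stores byte positions and retrieval seeks straight to them, which
a pipe cannot do.  The record layout here (one "ID<TAB>payload" line per
record) is a made-up stand-in, not the Entrezgene ASN format, and all
function names are hypothetical.  The last function shows one simple
way to decide that an index has to be rebuilt, by comparing
modification times.

```python
import os


def build_index(data_path):
    """Map each record ID to the byte offset where its line starts."""
    offsets = {}
    with open(data_path, "rb") as handle:
        while True:
            position = handle.tell()
            line = handle.readline()
            if not line:
                break
            record_id = line.split(b"\t", 1)[0].decode()
            offsets[record_id] = position
    return offsets


def fetch(data_path, offsets, record_id):
    """Jump straight to one record; this needs a seekable file, not a pipe."""
    with open(data_path, "rb") as handle:
        handle.seek(offsets[record_id])
        return handle.readline().rstrip(b"\n").decode()


def index_is_stale(data_path, index_path):
    """Return True if the index is missing or older than the data file."""
    if not os.path.exists(index_path):
        return True
    return os.path.getmtime(index_path) < os.path.getmtime(data_path)
```

With piped input there is no offset to seek to, so each lookup would
have to rerun the conversion and read forward to the record.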

> With respect to disk I/O: This is definitely a time-saver as more and
> more of us are running multi-CPU machines.

The savings would be negligible.  The disk I/O itself takes very
little time compared to the parsing (I've run benchmarks before using
the human entrezgene file).  So unless one needs to save disk space and
only needs to run entrezgene once (ever), I'd say converting to a text
file first would save more time.

Mingyi