[Bioperl-l] Entrez Gene ASN parsers
Hilmar Lapp
hlapp at gmx.net
Sat Mar 12 23:54:14 EST 2005
On Saturday, March 12, 2005, at 08:12 PM, Liu, Mingyi wrote:
>
> My parser does to NCBI's ASN.1 EntrezGene file what an XML parser does
> to a not-yet-existing XML-formatted EntrezGene file (or better than it,
> if NCBI decides to code Entrez Gene in the XML format that Eutils
> provide).
This is apparently what they will be doing, or at least that is my
understanding. The discomforting thing is how long it has already taken
them to come up with that supposedly small tool. In fact, their not yet
being able to provide the offline tool is apparently the reason they are
still maintaining the LocusLink download. That's what they told me in
response to an inquiry. However, starting Monday they will remove
C. elegans and fruit fly from LL_tmpl. Not good.
> And it performs better than XML parsers.
Actually, even an expat-based XML parser would be orders of magnitude
slower than your regexp-based one.
The question is how safe your regexps are against unexpected input,
such as escaped quotes, a curly brace that is part of a string rather
than the end of an entity, or anything else that might confuse them.
Maybe in ASN.1 this isn't a big deal? I just know too little about
ASN.1 to make any judgment here.
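To illustrate the concern, here is a small sketch in Python (not the Perl of the parser under discussion). It assumes ASN.1 value notation, where a double quote inside a string is written as two consecutive double quotes; the sample record and the function names `regex_block` and `scan_block` are hypothetical. A non-greedy regexp stops at the first closing brace it sees, even one inside a string. A scanner that tracks string state and nesting depth does not.

```python
import re

# Hypothetical fragment in ASN.1 value notation: the quoted string contains
# a literal '}' and an embedded quote written as "" (doubled).
SAMPLE = 'comment { text "uses } and ""quotes""", refs { pmid 42 } } next'


def regex_block(text: str) -> str:
    """Naive approach: non-greedy regexp from the first '{' to the next '}'."""
    m = re.search(r"\{.*?\}", text, re.S)
    return m.group(0) if m else ""


def scan_block(text: str) -> str:
    """Quote-aware approach: track nesting depth, ignoring braces in strings."""
    start = text.index("{")
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Inside a string, "" is an escaped quote, not the end of the string.
            if in_string and text[i + 1 : i + 2] == '"':
                i += 2
                continue
            in_string = not in_string
        elif not in_string:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start : i + 1]
        i += 1
    raise ValueError("unbalanced braces")


print(regex_block(SAMPLE))  # truncated at the '}' inside the string
print(scan_block(SAMPLE))   # the whole outer block, nested braces included
```

A regexp-based parser can still be safe if its patterns only ever match outside quoted strings. The sketch simply shows which case needs explicit handling.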
>
> So I really don't think there's any need for XML file from NCBI.
Yeah, I have actually started to change my mind about waiting for the
XML format.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------