[Bioperl-l] Error reporting/Validation implemented

Mingyi Liu mingyi.liu at gpc-biotech.com
Mon Mar 14 16:15:32 EST 2005


Hi, there,

I just implemented basic error reporting and validation in my Entrez 
Gene parser in Perl (the regex version).  Validation catches all 
non-conforming data, and each error report gives the line number, the 
error type, and the first 20 characters (customizable) of the offending 
data.  The line number can be incorrect if the malformed data caused an 
exception; that case is hard to deal with for ASN.1-formatted data, 
although easy for XML parsers. 

The parser did slow down, of course, but I'd say it would still beat 
most parsers hands down.  The full human genome now takes a bit over 12 
minutes to process instead of 11 minutes on one Intel Xeon 2.4 GHz CPU, 
so validation adds only about a minute.  I don't think the parser's 
speed depends much on whether validation is performed.

I also communicated with Stefan Kirov, and it turns out the dead entries 
and 0-sized (should be 1-sized) arrays were simply related to data 
trimming options.  So far, so good.

If anyone is interested, check it out at 
http://www.sourceforge.net/projects/egparser/.

Regards,

Mingyi



