[Bioperl-l] Error reporting/Validation implemented
Mingyi Liu
mingyi.liu at gpc-biotech.com
Mon Mar 14 16:15:32 EST 2005
Hi there,
I just implemented basic error reporting and validation in my Entrez
Gene parser in Perl (the regex version). Validation catches all
non-conforming data, and error reporting gives the line number, the
error type, and the first 20 characters (customizable) of the offending
data. Note that the line number can be incorrect if the format problem
resulted in an exception; this is hard to deal with for ASN.1-formatted
data, although it is easy for XML parsers.
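
To give a rough idea of what such an error report contains, here is a
minimal sketch in Python (chosen for brevity; the parser itself is Perl
and its real code differs). The names FIELD_RE, ParseError, and validate
are hypothetical and invented for illustration, and the sketch assumes
one field per line, which real Entrez Gene ASN.1 data does not
guarantee.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for illustration only: accepts lines such as
# "geneid 1234," (a lowercase name, whitespace, an integer, optional comma).
FIELD_RE = re.compile(r"^\s*[a-z][\w-]*\s+\d+\s*,?\s*$")


@dataclass
class ParseError:
    line_no: int      # 1-based line number of the offending line
    error_type: str   # short description of what went wrong
    snippet: str      # first `snippet_len` characters of the offending data


def validate(text, snippet_len=20):
    """Return a list of ParseError objects for lines that fail FIELD_RE."""
    errors = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        if not FIELD_RE.match(line):
            errors.append(
                ParseError(line_no, "non-conforming data",
                           line.strip()[:snippet_len])
            )
    return errors


if __name__ == "__main__":
    sample = "geneid 1234,\n  status 0,\ngeneid = oops this is not valid\n"
    for err in validate(sample):
        print(f"line {err.line_no}: {err.error_type}: {err.snippet!r}")
```

Because this sketch checks each line independently, its line numbers are
always exact. A parser that matches multi-line structures may only
detect the problem some way past the line that caused it, which is
where the line-number caveat above comes from.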
The parser is of course slower now, but I'd say it would still beat
most parsers hands down. The full human genome now takes a bit over 12
minutes to process instead of 11 minutes on one Intel Xeon 2.4 GHz CPU.
So I don't think my parser's speed has much to do with whether or not
it performs validation.
I also communicated with Stefan Kirov, and it turns out that the dead
entries and the 0-sized (should be 1-sized) arrays were simply related
to data trimming options. So far, so good.
If anyone is interested, check it out at
http://www.sourceforge.net/projects/egparser/.
Regards,
Mingyi