[Bioperl-l] EMBL ID line parsing error

Pierre Rioux pierre_rioux at yahoo.com
Wed Jul 13 13:51:56 EDT 2005


Hi,

> I noticed that one BioFetch test was failing. It was caused by an EMBL entry 
> object not having a display ID. The failure was caused by regular expression 
> in the EMBL parser not allowing spaces in the molecule substring of the ID 
> line:
> 
> 
> ID   BUM        standard; genomic RNA; VRL; 200 BP.
>                    was:   (\S+);
>                    fix:   ([\S ]+);     now in bioperl-live

Because regex quantifiers are greedy, and because \S (and
therefore [\S ]) also matches the semicolon ";", I think a
better fix would be

                            ([^;]+);

That way, if the EMBL line format ever gets extended to include
more semicolon-separated fields, it will still work.
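
To make the difference concrete, here is a minimal sketch in Python (the character classes involved behave the same way as in Perl). The surrounding pattern (PREFIX and SUFFIX) is an assumption made up for illustration, not the actual regex in the BioPerl EMBL parser, and the "extended" ID line with an extra field is hypothetical.

```python
import re

# Assumed skeleton of an ID-line pattern; only the molecule field varies.
# This is NOT the real BioPerl regex, just an illustration.
PREFIX = r"^ID\s+(\S+)\s+\S+;\s+"
SUFFIX = r";\s+(\S+);"

def parse_id_line(field_pattern, line):
    """Return (display_id, molecule, division), or None if there is no match."""
    m = re.match(PREFIX + field_pattern + SUFFIX, line)
    return m.groups() if m else None

# The ID line quoted above.
current = "ID   BUM        standard; genomic RNA; VRL; 200 BP."
# Hypothetical future format with one extra semicolon-separated field.
extended = "ID   BUM        standard; genomic RNA; VRL; XXX; 200 BP."

patterns = {
    "was": r"(\S+)",      # no spaces allowed in the molecule field
    "fix": r"([\S ]+)",   # allows spaces, but also semicolons
    "safe": r"([^;]+)",   # anything except the delimiter
}

results = {
    name: (parse_id_line(pat, current), parse_id_line(pat, extended))
    for name, pat in patterns.items()
}

for name, (on_current, on_extended) in results.items():
    print(name, on_current, on_extended)
```

With these assumed patterns, "was" fails on both lines, "fix" parses the current line but captures "genomic RNA; VRL" as the molecule on the extended line, and "safe" captures "genomic RNA" on both.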

(Personally, when I write regexes, I always try to make sure
that the character used as the delimiter cannot be matched by
the parenthesized regex for a field. Otherwise you are putting
too much trust in the NUMBER of fields in the line for the
whole line-matching regex to succeed as planned.)
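
A small sketch of that rule of thumb, again in Python and with made-up records: the same leading field is followed by a growing number of semicolon-separated fields. The names (records, first_field) are hypothetical and only serve the illustration.

```python
import re

# Hypothetical records: same first field, growing number of fields.
records = [
    "genomic RNA; VRL; 200 BP.",
    "genomic RNA; VRL; extra; 200 BP.",
    "genomic RNA; VRL; extra; more; 200 BP.",
]

greedy = re.compile(r"^(.+);")       # field pattern can match the delimiter
bounded = re.compile(r"^([^;]+);")   # field pattern excludes the delimiter

def first_field(pattern, text):
    """Return the first captured field, or None if there is no match."""
    m = pattern.match(text)
    return m.group(1) if m else None

greedy_results = [first_field(greedy, r) for r in records]
bounded_results = [first_field(bounded, r) for r in records]

for rec, g, b in zip(records, greedy_results, bounded_results):
    print(repr(rec), "greedy:", repr(g), "bounded:", repr(b))
```

The capture from the greedy pattern changes with the number of fields, because it runs up to the last semicolon on the line, while the bounded pattern stops at the first delimiter every time.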

Pierre


		
 

