[Biojava-l] ParseException when using interleaved Nexus file

Richard Holland holland at eaglegenomics.com
Tue Aug 11 10:50:50 UTC 2009


It should already be on CruiseControl.

Standards in bioinformatics are a pain: people write them to describe  
the format of the files their software outputs, and then the very same  
people produce files that break those standards without any additional  
documentation or explanation. (GenBank is one of the biggest  
offenders!) This makes it very hard to write parsers. If you stick to  
the official spec, there will always be files that don't parse but that  
people insist are still valid, yet without documented examples of such  
files in advance, it is impossible to write a parser that caters for  
them. :)

cheers,
Richard

On 11 Aug 2009, at 11:12, David Johnson wrote:

> Hi Richard,
>
> OK that's good to know... I suppose that's the problem with  
> specifications - people don't always follow them!
>
> But I get the impression that either some people think using  
> interleave=yes/no is standard practice, or it is being generated  
> by other phylogenetics software (e.g. PAUP or similar tools).
>
> I had a talk with my supervisor and he can't actually identify the  
> specific programs that have been putting that in, but looking at a  
> range of his Nexus files, quite a few of them include the yes/no  
> bits, among them some files he received from other researchers.
>
> Are the modifications available in the latest automated build (on  
> CruiseControl)?
>
> Cheers,
> -David
>
> 2009/8/11 Richard Holland <holland at eaglegenomics.com>
> I've found the problem - "interleave=yes" is not valid according to  
> the official NEXUS format spec which the parser was written against.  
> (Maddison et al., 1997)
>
> Interleaved files are supposed to include only the word "interleave"  
> - it takes no parameters. Non-interleaved files shouldn't mention it  
> at all.
>
> I've modified the parser to tolerate this, but I'd be interested to  
> know where the invalid token came from - was it added manually, or  
> by an existing piece of publicly available software?
>
> The modification has been made in the trunk of the biojava-live  
> subversion repository.
>
> cheers,
> Richard
>
>
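The rule quoted above (a bare "interleave" keyword means interleaved, its absence means sequential) plus tolerance of the non-standard "interleave=yes/no" form can be sketched in Java as follows. This is a hypothetical illustration only, not the actual BioJava parser change: the class name InterleaveSketch and method isInterleaved are invented here, and it assumes the FORMAT command has already been isolated as a single string. Rejecting any other value is also an assumption.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (NOT BioJava's real parser code): decide whether a
 * NEXUS FORMAT command declares an interleaved matrix.
 */
public class InterleaveSketch {

    static boolean isInterleaved(String formatCommand) {
        // Collapse optional whitespace around '=' so "interleave = yes" becomes
        // one token, drop the terminating semicolon, and lower-case everything
        // because NEXUS keywords are case-insensitive.
        String normalized = formatCommand
                .replaceAll("\\s*=\\s*", "=")
                .replace(";", " ")
                .trim()
                .toLowerCase(Locale.ROOT);
        for (String token : normalized.split("\\s+")) {
            if (token.equals("interleave")) {
                return true; // spec form: bare keyword, no parameters
            }
            if (token.startsWith("interleave=")) {
                // Non-standard form seen in the wild: tolerate yes/no.
                String value = token.substring("interleave=".length());
                if (value.equals("yes")) return true;
                if (value.equals("no")) return false;
                throw new IllegalArgumentException("Unrecognised interleave value: " + value);
            }
        }
        return false; // keyword absent: sequential (non-interleaved) matrix
    }

    public static void main(String[] args) {
        System.out.println(isInterleaved("FORMAT DATATYPE=DNA MISSING=? GAP=- INTERLEAVE;")); // true
        System.out.println(isInterleaved("FORMAT DATATYPE=DNA INTERLEAVE=YES;"));             // true
        System.out.println(isInterleaved("format datatype=dna interleave = no;"));             // false
        System.out.println(isInterleaved("FORMAT DATATYPE=DNA;"));                              // false
    }
}
```

Accepting the extra form while still returning the same boolean keeps the rest of a parser unchanged: only the FORMAT-command check needs to know that both spellings exist.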

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
