[Biojava-l] ParseException when using interleaved Nexus file
David Johnson
d.johnson at reading.ac.uk
Wed Sep 23 10:24:47 UTC 2009
Hi Richard,
Forgot to say after your last mail (ages ago now), thanks for all your
help! The stuff I'm using the Nexus parser in works great now.
Cheers,
-David
2009/8/11 Richard Holland <holland at eaglegenomics.com>:
> It should already be on CruiseControl.
>
> Standards in bioinformatics are a pain - people write them to describe the
> format of files their software outputs, and then the very same people
> produce files that break those standards without any additional
> documentation or explanation. (Genbank are one of the biggest offenders!) It
> makes it very hard to write parsers: if you stick to the official spec,
> there will always be files that don't work but that people insist are still
> valid, and without prior documentation of the invalid files that are
> considered to be valid, it is impossible to write a parser to cater for
> them. :)
>
> cheers,
> Richard
>
> On 11 Aug 2009, at 11:12, David Johnson wrote:
>
>> Hi Richard,
>>
>> OK that's good to know... I suppose that's the problem with specifications
>> - people don't always follow them!
>>
>> But I get the impression either some people think that using
>> interleave=yes/no is standard practice, or it could be being generated by
>> some other phylo software (e.g. maybe PAUP or some other tools).
>>
>> I had a talk with my supervisor and he actually can't find the specific
>> programs that have been putting that in, but looking at a range of his Nexus
>> files, there are quite a few that seem to put in the yes/no bits, some of
>> them files he received from other researchers.
>>
>> Are the modifications available in the latest automated build (on
>> CruiseControl)?
>>
>> Cheers,
>> -David
>>
>> 2009/8/11 Richard Holland <holland at eaglegenomics.com>
>> I've found the problem - "interleave=yes" is not valid according to the
>> official NEXUS format spec which the parser was written against. (Maddison
>> et al., 1997)
>>
>> Interleaved files are supposed to only include the word "interleave" - it
>> takes no parameters. Non-interleaved files shouldn't mention it at all.
>>
>> I've modified the parser to tolerate this but I'd be interested to know
>> where the invalid token came from - was it added manually, or by an existing
>> piece of publicly available software?
>>
>> The modification has been made in the trunk of the biojava-live subversion
>> repository.
>>
>> cheers,
>> Richard
>>
>>
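
For readers of the archive, the tolerant behaviour described in the thread can be sketched roughly as follows. This is not the actual change made in the biojava-live trunk; the class and method names are hypothetical, and it assumes the token arrives as a single string and that keyword matching is case-insensitive. The bare "interleave" keyword is treated as the spec-conformant form, and the non-standard "interleave=yes" / "interleave=no" forms are accepted leniently.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava: interprets the interleave
// token found in the FORMAT command of a NEXUS data block.
class InterleaveToken {

    private static final String KEYWORD = "interleave";

    /**
     * Returns true if the token marks the matrix as interleaved.
     * Accepts the bare keyword (the form in the spec) and, leniently,
     * the non-standard "interleave=yes" and "interleave=no" forms.
     */
    static boolean isInterleaved(String token) {
        // Assumption: keywords are matched case-insensitively.
        String t = token.trim().toLowerCase(Locale.ROOT);
        if (t.equals(KEYWORD)) {
            return true; // spec-conformant form, takes no parameter
        }
        if (t.startsWith(KEYWORD + "=")) {
            String value = t.substring(KEYWORD.length() + 1);
            if (value.equals("yes")) {
                return true;  // tolerated non-standard form
            }
            if (value.equals("no")) {
                return false; // tolerated non-standard form
            }
        }
        throw new IllegalArgumentException(
                "Not a recognised interleave token: " + token);
    }
}
```

With this sketch, `InterleaveToken.isInterleaved("interleave")` and `InterleaveToken.isInterleaved("interleave=yes")` both return true, `InterleaveToken.isInterleaved("interleave=no")` returns false, and anything else is rejected rather than silently guessed at.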