[Bioperl-l] Root::IO handle Mac and Win32 LF
Jason Stajich
jason at cgt.duhs.duke.edu
Tue Dec 16 11:44:18 EST 2003
On Tue, 16 Dec 2003, Dave Howorth wrote:
> Ah, now that's interesting. In this specific case the application,
> newick.pm, has explicitly opted out of Perl's end-of-line handling by
> redefining $/ so it can slurp the whole tree at once:
>
> local $/ = ";\n";
> return unless $_ = $self->_readline;
>
> Which, IMHO, makes it its problem to deal with line breaks.
Hmmm - SeqIO::fasta does this sort of thing as well.
This has nothing to do with the individual fields though - it only defines
how much to slurp in. If it weren't working we'd get two trees mooshed
together as one record. It doesn't affect the multi-line reports since
they only have a ; at the end.
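For anyone following along, here is a rough sketch of what that record
separator does. It is not the newick.pm code: the read_records helper and
the sample trees are made up for illustration, and it reads from an
in-memory handle rather than going through _readline.

```perl
use strict;
use warnings;

# Two multi-line trees in one input; the data is invented for this sketch.
my $input = "(A:0.1,\nB:0.2);\n(C:0.3,\nD:0.4);\n";

# Hypothetical helper: split the text into records that end in ";\n".
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = ";\n";        # one tree per read instead of one line per read
    my @records = <$fh>;     # each record keeps its internal line breaks
    close $fh;
    return @records;
}

my @trees = read_records($input);
printf "%d records\n", scalar @trees;
```

Each record still contains the line breaks that fall inside the tree, so
the separator only decides where one tree stops and the next one starts.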
In the end this had nothing to do with Windows LF problems once I had
Valentin's test file in front of me.
Adding this to newick.pm after the record is slurped in takes
care of the problem:
s/[\n\r]+//g
Any sort of newline needs to be stripped out, since that is what is
getting converted to spaces. It really wasn't a Windows problem but
a problem with Allen's changes to the newick parsing code, which replace
whitespace with _ but don't handle line breaks separately.
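A minimal sketch of that substitution applied to a slurped record, assuming
a hypothetical strip_line_breaks helper and invented sample trees (neither
is in newick.pm):

```perl
use strict;
use warnings;

# Hypothetical helper: remove every run of LF and/or CR from one record.
sub strip_line_breaks {
    my ($record) = @_;
    $record =~ s/[\n\r]+//g;
    return $record;
}

# The same made-up tree with Unix, Windows and old Mac line endings.
my $unix = strip_line_breaks("(A:0.1,\nB:0.2);\n");
my $dos  = strip_line_breaks("(A:0.1,\r\nB:0.2);\r\n");
my $mac  = strip_line_breaks("(A:0.1,\rB:0.2);\r");

print "$unix\n$dos\n$mac\n";
```

All three come out as the same one-line tree, so the later whitespace
handling never sees a line break inside a record.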
From the log:
revision 1.22
date: 2003/08/15 17:07:27; author: allenday; state: Exp; lines: +3 -2
removed unnecessary escap char in space removing regex. added regex to
remove quotes and leading/trailing spaces
from node labels as necessary.
----------------------------
revision 1.21
date: 2003/08/15 08:31:46; author: allenday; state: Exp; lines: +5 -2
fixing over-zealous whitespace removal from node labels. we do this by
not tampering with " quoted strings. i'm not sure if newick allows " to
be escaped within these labels... if so, there may be a bug here.
----------------------------
My original code stripped all whitespace and thus we never had this
problem because there shouldn't be any in the node names in Newick
http://evolution.genetics.washington.edu/phylip/newicktree.html
"A name can be any string of printable characters except --->blanks<---,
colons, semicolons, parentheses, and square brackets."
but apparently he wants to support this for his purposes.
I think my small change above takes care of the bug.
-jason
>
> So, unless the problem also occurs in regular code using Perl's default
> line break handling, I'd say the bug should be fixed by adding whatever
> code is required in the newick module, not by adding complexity in
> Root::IO for that special case.
>
> Cheers, Dave
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu