[Bioperl-l] Root::IO handle Mac and Win32 LF

Jason Stajich jason at cgt.duhs.duke.edu
Tue Dec 16 11:44:18 EST 2003

On Tue, 16 Dec 2003, Dave Howorth wrote:

> Ah, now that's interesting. In this specific case the application,
> newick.pm, has explicitly opted out of Perl's end-of-line handling by
> redefining $/ so it can slurp the whole tree at once:
>     local $/ = ";\n";
>     return unless $_ = $self->_readline;
> Which, IMHO, makes it its problem to deal with line breaks.
Hmmm - SeqIO::fasta does this sort of thing as well.

This has nothing to do with the individual fields, though - it only defines
how much to slurp in. If it weren't working we'd get two trees mooshed
together as one record. It doesn't affect the multi-line reports, since
they only have a ; at the end.
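
To illustrate the point about $/ only controlling how much is read per record, here is a minimal, self-contained sketch. It reads from an in-memory filehandle rather than through Root::IO's _readline, and the sample trees and variable names are made up for the example.

```perl
use strict;
use warnings;

# Two made-up Newick trees; the first one spans several lines.
my $data = "(A:0.1,\n B:0.2,\n (C:0.3,D:0.4):0.5);\n(E:0.1,F:0.2);\n";

# Read from an in-memory filehandle so the sketch needs no file on disk.
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";

my @records;
{
    # A record now ends at ";\n" instead of at every "\n".
    local $/ = ";\n";
    while (defined(my $record = <$fh>)) {
        push @records, $record;
    }
}
close $fh;

printf "read %d records\n", scalar @records;
```

Each tree comes back as one record with its internal line breaks still embedded, so anything that later treats whitespace inside the record as significant will see those newlines too.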

In the end this had nothing to do with Windows LF problems once I had
Valentin's test file in front of me.

Adding this to newick.pm after the record is slurped in takes
care of the problem:

Any sort of newline needs to be stripped out, since that is what is
getting converted to spaces.  It really wasn't a Windows problem but
a problem with Allen's changes to the newick parsing code, which replace
whitespace with _ but don't handle line feeds separately.
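
The line that was added is not reproduced in this archived copy of the message. As a rough sketch of the kind of change described (not the actual newick.pm code), the idea is to drop every CR and LF from the slurped record before any other whitespace handling runs. The helper name clean_record and the whitespace-to-underscore step are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from newick.pm.
sub clean_record {
    my ($record) = @_;

    # Remove every carriage return and line feed first. This covers
    # Unix (LF), Windows (CRLF), and old Mac (CR) line endings alike.
    $record =~ s/[\r\n]//g;

    # Only then map the remaining whitespace to underscores. Here it is
    # assumed that any whitespace left over sits inside a node label.
    $record =~ s/\s/_/g;

    return $record;
}

my $record = "(Homo sapiens:0.1,\r\nMus musculus:0.2);\n";
print clean_record($record), "\n";   # prints "(Homo_sapiens:0.1,Mus_musculus:0.2);"
```

Doing the newline removal first is what keeps line breaks between nodes from being turned into label characters by the later substitution.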

From the log:

revision 1.22
date: 2003/08/15 17:07:27;  author: allenday;  state: Exp;  lines: +3 -2
removed unnecessary escap char in space removing regex.  added regex to
remove quotes and leading/trailing spaces
from node labels as necessary.
revision 1.21
date: 2003/08/15 08:31:46;  author: allenday;  state: Exp;  lines: +5 -2
fixing over-zealous whitespace removal from node labels.  we do this by
not tampering with " quoted strings.  i'm not sure if newick allows " to
be escaped within these labels... if so, there may be a bug here.

My original code stripped all whitespace, so we never had this
problem, because there shouldn't be any whitespace in the node names in Newick:
 "A name can be any string of printable characters except --->blanks<---,
 colons, semicolons, parentheses, and square brackets."

but apparently he wants to support this for his purposes.
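
For reference, the quoted rule can be written as a small check. This is a sketch of that one sentence only, assuming "printable characters" means printable ASCII; the function name is_plain_newick_name is hypothetical, and quoted labels are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $name satisfies the quoted rule, else 0.
sub is_plain_newick_name {
    my ($name) = @_;

    # Must be one or more printable ASCII characters, which already
    # excludes the blank (0x20), tabs, and line breaks ...
    return 0 unless $name =~ /\A[\x21-\x7e]+\z/;

    # ... and must not contain colons, semicolons, parentheses,
    # or square brackets.
    return 0 if $name =~ /[:;()\[\]]/;

    return 1;
}

for my $name ('Homo_sapiens', 'Homo sapiens', 'A:B') {
    printf "%-14s %s\n", $name, is_plain_newick_name($name) ? 'ok' : 'not allowed';
}
```

Under this rule a label containing a blank is rejected outright, which is why stripping all whitespace was safe before quoted labels had to be supported.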

I think my small change above takes care of the bug.


> So, unless the problem also occurs in regular code using Perl's default
> line break handling, I'd say the bug should be fixed by adding whatever
> code is required in the newick module, not by adding complexity in
> Root::IO for that special case.
> Cheers, Dave

Jason Stajich
Duke University
jason at cgt.mc.duke.edu
