[Bioperl-l] Root::IO handle Mac and Win32 LF

Jason Stajich jason at cgt.duhs.duke.edu
Tue Dec 16 11:44:18 EST 2003


On Tue, 16 Dec 2003, Dave Howorth wrote:

> Ah, now that's interesting. In this specific case the application,
> newick.pm, has explicitly opted out of Perl's end-of-line handling by
> redefining $/ so it can slurp the whole tree at once:
>
>     local $/ = ";\n";
>     return unless $_ = $self->_readline;
>
> Which, IMHO, makes it its problem to deal with line breaks.
Hmmm - SeqIO::fasta does this sort of thing as well.

This has nothing to do with the individual fields, though; it only defines
how much to slurp in. If it weren't working, we'd get two trees mooshed
together as one record. It also doesn't affect multi-line records, since
they only have a ; at the end.
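A minimal sketch of what the `local $/ = ";\n"` idiom does, assuming two made-up Newick trees held in an in-memory string (the tree strings and variable names are illustrative only, not taken from newick.pm). Each readline returns one whole tree, terminator included, and any line break inside a tree stays in the record for the parser to deal with:

```perl
use strict;
use warnings;

# Two hypothetical Newick trees; the first one spans two lines.
my $data = "((A,B),\n(C,D));\n(E,(F,G));\n";

# Open a filehandle on the string itself (core Perl since 5.8).
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my @trees;
{
    # Read up to and including ";\n" instead of up to a single "\n".
    local $/ = ";\n";
    while (my $record = <$fh>) {
        push @trees, $record;    # record still contains its inner "\n"
    }
}
close $fh;

print scalar(@trees), " records\n";
```

The first record comes back as `"((A,B),\n(C,D));\n"`: the separator decides where a record ends, but it does not remove the newline in the middle.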

In the end this had nothing to do with Windows LF problems once I had
Valentin's test file in front of me.

Adding this to newick.pm after the record is slurped in takes
care of the problem:
 s/[\n\r]+//g

Any sort of newline needs to be stripped out, since that is what is
getting converted to spaces. It really wasn't a Windows problem but
a problem with Allen's changes to the newick parsing code, which replace
whitespace with _ but don't handle line feeds separately.
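A rough sketch of the bug and the fix, assuming a simplified, hypothetical `whitespace_to_underscore` helper that stands in for the label cleanup described above (it is not the actual newick.pm code, which also leaves quoted strings alone). A line break inside the record turns into a stray `_`, while stripping CR and LF first with the same `s/[\n\r]+//g` avoids it:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the label cleanup: runs of whitespace
# (which includes \r and \n) become a single underscore.
sub whitespace_to_underscore {
    my ($record) = @_;
    $record =~ s/\s+/_/g;
    return $record;
}

# A hypothetical record containing Windows-style CRLF line breaks.
my $raw = "((A,B),\r\n(C,D));\r\n";

# Without the fix, the line breaks end up as underscores in the tree.
my $broken = whitespace_to_underscore($raw);

# With the fix: strip every CR and LF first, then handle real blanks.
(my $stripped = $raw) =~ s/[\n\r]+//g;
my $fixed = whitespace_to_underscore($stripped);

print "$broken\n$fixed\n";
```

Here `$broken` is `((A,B),_(C,D));_` and `$fixed` is `((A,B),(C,D));`. Plain Unix `\n` breaks go wrong in exactly the same way, which is why this is not really a Windows-specific problem.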

From the log:

revision 1.22
date: 2003/08/15 17:07:27;  author: allenday;  state: Exp;  lines: +3 -2
removed unnecessary escap char in space removing regex.  added regex to
remove quotes and leading/trailing spaces
from node labels as necessary.
----------------------------
revision 1.21
date: 2003/08/15 08:31:46;  author: allenday;  state: Exp;  lines: +5 -2
fixing over-zealous whitespace removal from node labels.  we do this by
not tampering with " quoted strings.  i'm not sure if newick allows " to
be escaped within these labels... if so, there may be a bug here.
----------------------------

My original code stripped all whitespace, so we never had this
problem: there shouldn't be any whitespace in node names in Newick
http://evolution.genetics.washington.edu/phylip/newicktree.html
 "A name can be any string of printable characters except --->blanks<---,
 colons, semicolons, parentheses, and square brackets."

but apparently he wants to support this for his purposes.
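For reference, a small sketch of the naming rule as quoted above, using a hypothetical `is_plain_newick_name` helper (not part of BioPerl). It only checks unquoted names against the listed characters and ignores any quoting convention:

```perl
use strict;
use warnings;

# Hypothetical check of the quoted rule: a non-empty string of printable
# characters with no blanks, colons, semicolons, parentheses or brackets.
sub is_plain_newick_name {
    my ($name) = @_;
    return 0 unless $name =~ /\A[[:print:]]+\z/;   # printable only
    return 0 if $name =~ /[ :;()\[\]]/;            # forbidden characters
    return 1;
}

for my $name ('Homo_sapiens', 'Homo sapiens', 'A:0.1') {
    printf "%-14s %s\n", $name, is_plain_newick_name($name) ? 'ok' : 'not allowed';
}
```

Under this rule a name like `Homo sapiens` is not allowed unquoted, which is why stripping all whitespace never caused trouble before.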

I think my small change above takes care of the bug.

-jason


>
> So, unless the problem also occurs in regular code using Perl's default
> line break handling, I'd say the bug should be fixed by adding whatever
> code is required in the newick module, not by adding complexity in
> Root::IO for that special case.
>
> Cheers, Dave
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

