[Bioperl-l] Root::IO handle Mac and Win32 LF

Dave Howorth dhoworth at mrc-lmb.cam.ac.uk
Tue Dec 16 06:05:35 EST 2003


I'm a bit confused by this discussion.  I think it's best to go back to 
basics and then probably approach it slightly differently.

Q1:  What byte sequence in the data do you want to change to what?
Q2:  What operating system is the code running on?

The problem is that the meaning of \r and \n differs between operating 
systems, as do the bytes actually stored in files. From 
http://www.perldoc.com/perl5.6.1/pod/perlop.html:

'All systems use the virtual "\n" to represent a line terminator, called 
a "newline". There is no such thing as an unvarying, physical newline 
character. It is only an illusion that the operating system, device 
drivers, C libraries, and Perl all conspire to preserve. Not all systems 
read "\r" as ASCII CR and "\n" as ASCII LF. For example, on a Mac, these 
are reversed, and on systems without line terminator, printing "\n" may 
emit no actual data. In general, use "\n" when you mean a "newline" for 
your system, but use the literal ASCII when you need an exact character. 
For example, most networking protocols expect and prefer a CR+LF 
("\015\012" or "\cM\cJ") for line terminators, and although they often 
accept just "\012", they seldom tolerate just "\015". If you get in the 
habit of using "\n" for networking, you may be burned some day.'


I.e., Windows terminates lines with CR+LF (\015\012), which most systems 
see as \r\n but a Mac perversely reads as \n\r.
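
To make that concrete, here is a small sketch that reports which byte 
values the running perl assigns to each escape. The helper name 
escape_values is made up for illustration. On Unix and Windows I'd expect 
\r and \n to match \015 and \012; per the perlop text above they should 
come out swapped under MacPerl, which I have not checked myself.

```perl
use strict;
use warnings;

# Hypothetical helper: report the byte value perl assigns to each escape
# on the platform it is running on. The octal escapes always name the
# same byte value; "\r" and "\n" are platform-defined.
sub escape_values {
    return (
        '\r'   => ord("\r"),
        '\n'   => ord("\n"),
        '\015' => ord("\015"),   # always 13
        '\012' => ord("\012"),   # always 10
    );
}

my %v = escape_values();
printf "%-5s = %d\n", $_, $v{$_} for sort keys %v;
```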

I think for portable code it's better to write the regexps using the 
octal values: \015 (CR) instead of \r and \012 (LF) instead of \n. Plus, 
as Aaron says, a two-character pattern can be split across reads when $/ 
is "\n". (This goes back to Q1 - is there ever any reason to preserve a 
CR? Or for that matter an LF?)
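
A minimal sketch of what I mean, assuming the answer to Q1 is "turn 
CR+LF, a lone CR, or a lone LF into whatever newline the local platform 
uses". normalize_eol is a made-up name, not anything in Bio::Root::IO, 
and it is meant to be run on a whole buffer or on a line that has 
already been read, so it does not by itself address terminators split 
across reads.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Root::IO.
# Replaces CR+LF, a lone CR, or a lone LF with the platform newline,
# matching on octal byte values rather than on "\r" and "\n".
sub normalize_eol {
    my ($text) = @_;
    return $text unless defined $text;
    # The two-byte CR+LF alternative comes first so that a Windows
    # terminator collapses to one newline instead of two.
    $text =~ s/\015\012|\015|\012/\n/g;
    return $text;
}

my $mixed = "seq1\015\012seq2\015seq3\012";
print normalize_eol($mixed);
```

Whether a lone CR should really become a newline (rather than just be 
deleted) depends on the answer to Q1.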

Then test on all architectures bioperl is supported on :)

Cheers, Dave


Allen Day wrote:
> Are you sure you want to do that?  Maybe
> 
> $line =~ s/^\r// if $^O eq 'MacOS'; #or whatever $^O is for Mac.
> 
> is better.
> 
> -Allen
> 
> 
> On Mon, 15 Dec 2003, Aaron J. Mackey wrote:
> 
> 
>>If $/ = "\n", then your second regexp won't happen (the \r is at the 
>>beginning of the next line), right?
>>
>>So how about instead, simply:
>>
>>$line =~ s/\r//g; # strip any carriage returns, regardless of position
>>
>>-Aaron
>>
>>On Dec 15, 2003, at 1:02 PM, Jason Stajich wrote:
>>
>>
>>>We currently have this code in Bio::Root::IO to handle stripping 
>>>linefeeds
>>>$line =~ s/\r\n/\n/g if( (!$param{-raw}) && (defined $line) );
>>>
>>>This only matches Mac LF, to handle windows we need to also strip \n\r
>>>so I am going to change it to the following:
>>>
>>> $line =~ s/\r\n/\n/g, $line =~ s/\n\r/\n/g
>>>  if( (!$param{-raw}) && (defined $line) );
>>>
>>>Since this is a core critical module wanted to just post it to see if
>>>anyone has objections/suggestions.
>>>
>>>-jason
>>>--
>>>Jason Stajich
>>>Duke University
>>>jason at cgt.mc.duke.edu
-- 
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960


