[Bioperl-l] Root::IO handle Mac and Win32 LF

Jason Stajich jason at cgt.duhs.duke.edu
Tue Dec 16 09:52:25 EST 2003


It stems from this report
 http://bugzilla.bioperl.org/show_bug.cgi?id=1570

I don't know whether he is running clustalw on Windows and then trying to
run perl on the file on Unix, or what.  If that is the case, I think the
files should be unix-ified when they are moved over, and that this is not
up to bioperl.

We already had code in Root::IO like this:
 $line =~ s/\r\n/\n/ if( (! $param{-raw}) && (defined $line) );

I have no recollection of when it was added or by whom.  I could be the
guilty party, but I really don't remember.

So I don't have the answers to your questions:
Q1:  What byte sequence in the data do you want to change to what?
Q2:  What operating system is the code running on?
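
For anyone who wants to answer Q1 for a particular file without od -c,
here is a minimal sketch in Perl.  The sub name count_line_endings is
made up for illustration and is not part of Root::IO.  It uses the octal
escapes Dave suggested, so the result does not depend on what \r and \n
mean on the platform running the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): count each line-terminator
# style in a string of raw bytes.  \015 is CR, \012 is LF.
sub count_line_endings {
    my ($data) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    # Try the two-byte CRLF form first so it is not counted as CR plus LF.
    while ($data =~ /(\015\012|\015|\012)/g) {
        if    ($1 eq "\015\012") { $count{crlf}++ }
        elsif ($1 eq "\015")     { $count{cr}++ }
        else                     { $count{lf}++ }
    }
    return \%count;
}

# Optional command-line use: perl count_eol.pl file.dnd
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode($fh);                      # no CRLF translation on Win32
    my $data = do { local $/; <$fh> }; # slurp the whole file
    close($fh);
    $data = '' unless defined $data;
    my $c = count_line_endings($data);
    printf "CRLF=%d CR=%d LF=%d\n", @{$c}{qw(crlf cr lf)};
}
```

A file that reports only CR counts would match what Aaron describes for
the EndNote export; one with only CRLF counts would look like a file
carried over from Windows.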

I think the intention here was that if
 perl -i -pe 's/\r\n/\n/g' file.dnd
cleaned up the problem, why shouldn't that be part of the IO input
automatically?
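
To make the idea concrete, here is a rough sketch of doing that cleanup on
input.  It is only an illustration, not a proposed patch: normalize_eol is
a made-up name, and the only thing borrowed from Root::IO is the -raw
opt-out.  It assumes the goal is to turn CRLF (\015\012) and lone CR (\015)
into a single LF (\012).

```perl
use strict;
use warnings;

# Hypothetical helper (not the Root::IO API): convert CRLF and lone CR
# to LF unless the caller passes -raw => 1.
sub normalize_eol {
    my ($text, %param) = @_;
    return $text if !defined $text || $param{-raw};
    # CRLF is listed first so the pair collapses to one LF, not two.
    $text =~ s/\015\012|\015/\012/g;
    return $text;
}

# Example: a Windows-style line and a Mac-style line.
my $win = normalize_eol("CLUSTAL W\015\012");
my $mac = normalize_eol("CLUSTAL W\015");
my $raw = normalize_eol("CLUSTAL W\015\012", -raw => 1);  # left untouched
```

One caveat: this only helps once a "line" has already been read.  With the
default input record separator, a file that contains only \015 terminators
comes back from a single read as one long record, so the caller would
still have to split the normalized text on \012 afterwards.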

I'm going to pass on fixing this bug for now.  Hopefully someone else will
get inspired from this discussion and test and propose THE RIGHT solution.

-jason

On Tue, 16 Dec 2003, Aaron J. Mackey wrote:

>
> I meant that when I examine a text file created by a Mac application
> (in this case, Endnote) using the unix tool "od -c" I see only "\r".
>
> I agree it's all very confusing; I apologize if I've only added to the
> uproar.
>
> -Aaron
>
> On Dec 16, 2003, at 7:04 AM, Dave Howorth wrote:
>
> > Aaron J. Mackey wrote:
> > >>> I.e. Windows terminates lines with \r\n but a Mac perversely reads
> >>> them as \n\r.
> >> Actually, it seems that there are some Mac-derived files with only
> >> \r, and no \n at all (as a recent example, EndNote 6 exported
> >> bibliographies have no \n's, only \r's by od -c's reckoning).
> >
> > Now you've confused me again.  What do you mean by \r?  Are you saying
> > there are some Mac files with only \012 or with only \015?  That is,
> > are you speaking as a Unix/Linux/Windows user or as a Mac user?
> >
> > This is why it's better not to use \r and \n at all in this context.
> >
> >>> I think for portable code it's better to write the regexps using the
> >>> octal values: \015 instead of CR and \012 instead of LF.
> >> We don't have issues writing files, only reading one-line-at-a-time
> >> and canonicalizing it (why do we need to canonicalize it again,
> >> Jason?)
> >
> > I wasn't talking about writing files, I was talking about writing the
> > regexps that are used for reading files. (But as the section I quoted
> > from Perldoc points out, there *are* issues with writing files if you
> > want to use them with some network protocols :)
> >
> > Cheers, Dave
> > --
> > Dave Howorth
> > MRC Centre for Protein Engineering
> > Hills Road, Cambridge, CB2 2QH
> > 01223 252960
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

