[Biojava-l] Request for help!

Wed Jul 4 12:55:28 UTC 2007

In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g,
though I'm not sure whether this would incur too much overhead in Java.
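
For comparison, here is a minimal sketch of what such a replacement might look like in Java. The class and method names (EolNormalizer, toUnix) are made up for illustration and are not part of BioJava; the sketch assumes the whole text is already in memory as a String and rewrites both "\r\n" and a lone "\r" to "\n" in a single regex pass.

```java
import java.util.regex.Pattern;

public class EolNormalizer {
    // Hypothetical helper, not a BioJava class.
    // Matches a CR optionally followed by an LF: DOS/Win "\r\n" or old Mac "\r".
    private static final Pattern CR_EOL = Pattern.compile("\r\n?");

    // Returns the text with every DOS/Win or Mac line ending rewritten to "\n".
    // Existing Unix "\n" line endings are left untouched.
    public static String toUnix(String text) {
        return CR_EOL.matcher(text).replaceAll("\n");
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd";
        // Compare against the expected Unix-style text.
        System.out.println(toUnix(mixed).equals("a\nb\nc\nd"));
    }
}
```

The pattern is compiled once and reused, so the per-call cost is a single scan over the text; whether that is acceptable for large files would need measuring.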

You can certainly detect the eol character(s) by line.indexOf('\r');
if found and the following character is '\n' you have DOS/Win-style
line endings, and otherwise if found it is Mac-style.
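
A rough Java sketch of that check, looking at the character right after the first '\r'. The names EolDetector, Eol and detect are hypothetical and only serve the illustration; the sketch classifies the first line ending it finds and assumes the file uses one style throughout.

```java
public class EolDetector {
    // Hypothetical names for illustration; not part of BioJava.
    public enum Eol { UNIX, DOS, MAC, NONE }

    // Classifies the first line ending found in the text.
    public static Eol detect(String text) {
        int cr = text.indexOf('\r');
        int lf = text.indexOf('\n');
        if (cr < 0) {
            // No CR at all: Unix-style if there is an LF, otherwise no line ending.
            return lf < 0 ? Eol.NONE : Eol.UNIX;
        }
        if (lf >= 0 && lf < cr) {
            // An LF shows up before any CR, so the first line ending is Unix-style.
            return Eol.UNIX;
        }
        // CR comes first: DOS/Win if an LF follows immediately, otherwise Mac.
        boolean followedByLf = cr + 1 < text.length() && text.charAt(cr + 1) == '\n';
        return followedByLf ? Eol.DOS : Eol.MAC;
    }

    public static void main(String[] args) {
        System.out.println(detect("line one\r\nline two\r\n"));
    }
}
```

Note that java.io.BufferedReader.readLine() already treats "\n", "\r" and "\r\n" as line terminators, so a check like this only matters when the raw text is split by hand.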

However, this all seems like a lot of trouble to go through if all  
that one would need to ask of people is to make sure that the file  
matches the native eol style of the platform, which is really trivial  
to achieve.

For example, to convert Win-style line endings to Unix:

	$ perl -pi -e 's/\r//g;' <your-files-here>

and from Mac to Unix:

	$ perl -pi -e 's/\r/\n/g;' <your-files-here>

I have these and other simple conversions defined as aliases in  
my .profile, and don't really ever worry about writing lots of code  
to accommodate arbitrary line endings :-)

-hilmar

On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi guys.
>
> I need help with a programming question!
>
> In Java, you can find out the line-end symbol that the JRE is using by
> calling:
>
>    System.getProperty("line.separator");
>
> On *nix this returns "\n", for instance.
>
> Our file parsers all rely on this to return the symbol to break lines
> at when parsing files. This usually works fine.
>
> BUT... on Windows machines, for certain files, it does not appear to
> work! I suspect that these text files were generated on a *nix machine
> then transferred by copying files across file systems using native
> copy commands, or using binary FTP so that the system retained the
> *nix line-end symbols instead of replacing them for the local line-end
> symbols as it would have done if they were transferred in text mode
> via FTP.
>
> I don't have access to a Windows machine I can test on, but I suspect
> that the fix is quite a simple one and boils down to replacing the
> System() call with something more intelligent.
>
> Is there any regex or similar thing we can use to spot _all_ kinds of
> line-end symbols in text files regardless of the platform the file was
> created on or the platform the parser is being run on?
>
> (For information, the only two users who have reported problems like
> this are both using Nexus files - I'm not sure what tool generated
> them though. The Nexus parser uses the same rules as all the other
> parsers in BioJava so I don't think there's anything specifically
> wrong with it as opposed to say the GenBank or FASTA parsers.)
>
> cheers,
> Richard
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh
> 3ppr3WRdJcQgzIAJdUoIX0U=
> =Cboa
> -----END PGP SIGNATURE-----
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================