[Biojava-l] Request for help!
Mark Schreiber
markjschreiber at gmail.com
Wed Jul 4 14:10:12 UTC 2007
BufferedWriter provides a newLine() method that writes a line
separator but I'm not sure if that gives you a different result or
not.
This may be a JVM bug that needs to be submitted to Sun.
As a very ugly workaround, it is possible to determine the OS from the
System object as well.
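
For reference, a minimal sketch of both ideas. The class and method names
(SeparatorDemo, writeOneLine, isWindows) are made up for illustration.
newLine() writes the platform's line.separator value, so it only affects
output and would not change how an existing file is parsed. The OS check
assumes the standard "os.name" system property.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only.
final class SeparatorDemo {
    // Writes one line through BufferedWriter.newLine() and returns the raw text.
    static String writeOneLine(String text) {
        StringWriter out = new StringWriter();
        try (BufferedWriter writer = new BufferedWriter(out)) {
            writer.write(text);
            writer.newLine(); // appends the platform line separator
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Crude OS check based on the standard "os.name" system property.
    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase().startsWith("windows");
    }

    public static void main(String[] args) {
        String written = writeOneLine("ACGT");
        System.out.println("ends with line.separator: "
                + written.endsWith(System.getProperty("line.separator")));
        System.out.println("running on Windows: " + isWindows());
    }
}
```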
- Mark
On 7/4/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g
> though I'm not sure this wouldn't incur too much overhead in Java.
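
On the overhead question, a rough Java equivalent needs no regex at all.
This is only a sketch: the names EolNormalizer and toUnix are invented
here, and it assumes the whole text is already in memory as a String.
The "\r\n" pairs are replaced first so that the second step only sees
lone "\r" characters.

```java
// Hypothetical helper class, for illustration only.
final class EolNormalizer {
    // Converts Windows ("\r\n") and classic Mac ("\r") line endings to "\n".
    static String toUnix(String text) {
        return text
                .replace("\r\n", "\n") // literal replacement, not a regex
                .replace('\r', '\n');  // any "\r" left over is a lone one
    }
}
```

For example, EolNormalizer.toUnix("a\r\nb\rc") gives "a\nb\nc".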
>
> You can certainly detect the eol character(s) by line.indexOf('\r');
> if found and the following character is '\n' you have DOS/Win-style
> line endings, and otherwise if found it is Mac-style.
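
A sketch of that detection in Java, looking only at the first line
terminator in the text. Windows files end lines with "\r\n" (carriage
return first), classic Mac OS files with a lone "\r", and Unix files with
a lone "\n". The names EolDetector, Style and detect are invented for
this example, and a file with mixed endings would need more than this.

```java
// Hypothetical helper class, for illustration only.
final class EolDetector {
    enum Style { UNIX, WINDOWS, CLASSIC_MAC, NONE }

    // Classifies the text by the first line terminator it contains.
    static Style detect(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return Style.UNIX;
            }
            if (c == '\r') {
                boolean followedByLf =
                        i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return followedByLf ? Style.WINDOWS : Style.CLASSIC_MAC;
            }
        }
        return Style.NONE; // no line terminator found at all
    }
}
```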
>
> However, this all seems like a lot of trouble to go through if all
> that one would need to ask of people is to make sure that the file
> matches the native eol style of the platform, which is really trivial
> to achieve.
>
> For example, to convert Win-style line endings to Unix:
>
> $ perl -pi -e 's/\r//g;' <your-files-here>
>
> and from Mac to Unix:
>
> $ perl -pi -e 's/\r/\n/g;' <your-files-here>
>
> I have these and other simple conversions defined as aliases in
> my .profile, and don't really ever worry about writing lots of code
> to accommodate arbitrary line endings :-)
>
> -hilmar
>
> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi guys.
> >
> > I need help with a programming question!
> >
> > In Java, you can find out the line-end symbol that the JRE is using by
> > calling:
> >
> > System.getProperty("line.separator");
> >
> > On *nix this returns "\n", for instance.
> >
> > Our file parsers all rely on this to return the symbol to break lines
> > at when parsing files. This usually works fine.
> >
> > BUT... on Windows machines, for certain files, it does not appear to
> > work! I suspect that these text files were generated on a *nix machine
> > then transferred by copying files across file systems using native
> > copy commands, or using binary FTP, so that the system retained the
> > *nix line-end symbols instead of replacing them with the local
> > line-end symbols as it would have done if they were transferred in
> > text mode via FTP.
> >
> > I don't have access to a Windows machine I can test on, but I suspect
> > that the fix is quite a simple one and boils down to replacing the
> > System.getProperty() call with something more intelligent.
> >
> > Is there any regex or similar thing we can use to spot _all_ kinds of
> > line-end symbols in text files regardless of the platform the file was
> > created on or the platform the parser is being run on?
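
One possible answer, as a sketch rather than a tested patch for the
BioJava parsers: the regex "\r\n|\r|\n" matches all three common line
endings (the two-character alternative has to come first), and
BufferedReader.readLine() already treats "\n", "\r" and "\r\n" as line
terminators regardless of platform. The names AnyEolSplitter,
splitWithRegex and splitWithReader are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class AnyEolSplitter {
    // "\r\n" is listed first so a Windows ending is consumed as one terminator.
    private static final Pattern ANY_EOL = Pattern.compile("\r\n|\r|\n");

    // Splits on any line ending; trailing empty strings are dropped by split().
    static String[] splitWithRegex(String text) {
        return ANY_EOL.split(text);
    }

    // Same idea without a regex: readLine() accepts "\n", "\r" and "\r\n".
    static List<String> splitWithReader(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

Neither variant consults line.separator, so the result does not depend
on the platform the parser runs on.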
> >
> > (For information, the only two users who have reported problems like
> > this are both using Nexus files - I'm not sure what tool generated
> > them though. The Nexus parser uses the same rules as all the other
> > parsers in BioJava so I don't think there's anything specifically
> > wrong with it as opposed to, say, the GenBank or FASTA parsers.)
> >
> > cheers,
> > Richard
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.2.2 (GNU/Linux)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> >
> > iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh
> > 3ppr3WRdJcQgzIAJdUoIX0U=
> > =Cboa
> > -----END PGP SIGNATURE-----
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================