[Biojava-l] [Biojava-dev] Request for help!

Andy Yates ayates at ebi.ac.uk
Wed Jul 4 14:33:28 UTC 2007


BufferedWriter.newLine() will always use the value of 
System.getProperty("line.separator"); however, BufferedReader knows that 
an end of line can be \r\n, \r or \n, so in Java land it is perfectly legal 
to read files with any common line terminator & still write files in an 
OS-specific manner.
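
To make the writer side concrete, here is a minimal sketch. The class and 
method names (NewLineDemo, writeTwoLines) are invented for illustration 
and are not part of BioJava; the only claim being shown is that newLine() 
emits whatever line.separator holds on the running platform.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class NewLineDemo {
    // Hypothetical helper: writes two lines with newLine() into an
    // in-memory sink and returns the raw characters that were produced.
    static String writeTwoLines() {
        StringWriter sink = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(sink)) {
            out.write("first");
            out.newLine();   // writes the platform line separator
            out.write("second");
            out.newLine();
        } catch (IOException e) {
            // Not expected for an in-memory writer; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        String sep = System.getProperty("line.separator");
        String expected = "first" + sep + "second" + sep;
        // True on any platform: "\n" on *nix, "\r\n" on Windows.
        System.out.println(writeTwoLines().equals(expected));
    }
}
```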

I sent a regex to Rich, which he improved on; the net result is that it 
extracts the EOL regardless of which one it is.
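
The regex itself is not reproduced in this message, so the following is 
only a rough sketch of the general idea, not the version that was 
exchanged. It assumes an alternation that tries \r\n before the single 
characters; EolSniffer, firstEol and lines are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EolSniffer {
    // \r\n is listed first so that a Windows terminator is matched as a
    // whole rather than as a lone \r followed by a separate \n.
    private static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    // Returns the first line terminator found in the text, or null if
    // the text contains none.
    static String firstEol(CharSequence text) {
        Matcher m = EOL.matcher(text);
        return m.find() ? m.group() : null;
    }

    // Splits the text on any of the three terminators, whatever the
    // platform the code is running on.
    static String[] lines(CharSequence text) {
        return EOL.split(text);
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd";
        System.out.println("\r\n".equals(firstEol(mixed)));
        System.out.println(lines(mixed).length);
    }
}
```

Pattern.split() drops trailing empty strings, so a file that ends with a 
terminator does not produce a spurious empty last line.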

I'm not 100% sure where the problem lies. So long as the parsers use 
BufferedReader for their text file reading (which they all seem to do) 
this shouldn't have been a problem. In fact, this is the description of 
BufferedReader.readLine() from the JDK docs:

"Read a line of text. A line is considered to be terminated by any one 
of a line feed ('\n'), a carriage return ('\r'), or a carriage return 
followed immediately by a linefeed."
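
A small sketch of that documented behaviour, reading one string that 
mixes all three terminators. ReadLineDemo and readAll are invented names 
for this example; only BufferedReader.readLine() itself comes from the 
JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Hypothetical helper: reads every line from the text with
    // readLine(), which strips the terminator whichever kind it is.
    static List<String> readAll(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // \n, \r\n and \r all end a line; \r\n counts as one terminator.
        System.out.println(readAll("unix\nwindows\r\nold mac\rlast"));
        // prints [unix, windows, old mac, last]
    }
}
```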

Very, very strange, but the regex sounds like it was a pragmatic solution.

Andy

Mark Schreiber wrote:
> BufferedWriter provides a newLine() method that writes a line
> separator but I'm not sure if that gives you a different result or
> not.
> 
> This may be a JVM bug that needs to be submitted to Sun.
> 
> As a very ugly work around it is possible to determine the OS from the
> System object as well.
> 
> - Mark
> 
> On 7/4/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>> In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g
>> though I'm not sure this wouldn't incur too much overhead in Java.
>>
>> You can certainly detect the eol character(s) by line.indexOf('\r');
>> if found and the following character is '\n' you have DOS/Win-style
>> line endings, and otherwise if found it is Mac-style.
>>
>> However, this all seems like a lot of trouble to go through if all
>> that one would need to ask of people is to make sure that the file
>> matches the native eol style of the platform, which is really trivial
>> to achieve.
>>
>> For example, to convert Win-style line endings to Unix:
>>
>>         $ perl -pi -e 's/\r//g;' <your-files-here>
>>
>> and from Mac to Unix:
>>
>>         $ perl -pi -e 's/\r/\n/g;' <your-files-here>
>>
>> I have these and other simple conversions defined as aliases in
>> my .profile, and don't really ever worry about writing lots of code
>> to accommodate arbitrary line endings :-)
>>
>> -hilmar
>>
>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi guys.
>>>
>>> I need help with a programming question!
>>>
>>> In Java, you can find out the line-end symbol that the JRE is using by
>>> calling:
>>>
>>>    System.getProperty("line.separator");
>>>
>>> On *nix this returns "\n", for instance.
>>>
>>> Our file parsers all rely on this to return the symbol to break lines at
>>> when parsing files. This usually works fine.
>>>
>>> BUT... on Windows machines, for certain files, it does not appear to
>>> work! I suspect that these text files were generated on a *nix machine
>>> then transferred by copying files across file systems using native copy
>>> commands, or using binary FTP so that the system retained the *nix
>>> line-end symbols instead of replacing them with the local line-end
>>> symbols as it would have done if they were transferred in text mode via
>>> FTP.
>>>
>>> I don't have access to a Windows machine I can test on, but I suspect
>>> that the fix is quite a simple one and boils down to replacing the
>>> System.getProperty() call with something more intelligent.
>>>
>>> Is there any regex or similar thing we can use to spot _all_ kinds of
>>> line-end symbols in text files regardless of the platform the file was
>>> created on or the platform the parser is being run on?
>>>
>>> (For information, the only two users who have reported problems like
>>> this are both using Nexus files - I'm not sure what tool generated them
>>> though. The Nexus parser uses the same rules as all the other parsers in
>>> BioJava so I don't think there's anything specifically wrong with it as
>>> opposed to say the GenBank or FASTA parsers.)
>>>
>>> cheers,
>>> Richard
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>
>>> iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh
>>> 3ppr3WRdJcQgzIAJdUoIX0U=
>>> =Cboa
>>> -----END PGP SIGNATURE-----
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


