[EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines

Peter biopython at maubp.freeserve.co.uk
Mon Jul 20 16:30:45 UTC 2009


On Mon, Jul 20, 2009 at 5:16 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>> Hi all,
>>
>> I've just updated my Mac to EMBOSS 6.1.0, and have found an
>> issue with seqret conversion of IntelliGenetics files. After some
>> digging, I think this problem relates to having DOS new lines in
>> a file on Unix (in my case, Mac OS X).
>
> we have an application "noreturn" to fix things like this.

So that's basically an EMBOSS equivalent of existing Unix command
line tools like unix2dos and dos2unix?

I'm more interested in having all the EMBOSS tools handle either
new line format themselves automatically. These days I am mostly
working on Unix (including Mac OS X), but I do have to cope with
Windows style text files quite often.

> If you send me your file I will try to take a look at whether we should
> be catching the funny newline characters.

For this bug report I was using:
http://emboss.sourceforge.net/docs/themes/seqformats/ig

There are another three example files used in the Biopython unit
tests here:
http://biopython.open-bio.org/SRC/biopython/Tests/IntelliGenetics/

>> P.S. Should I have reported this possible bug via sourceforge?
>
> The emboss-bug at emboss.open-bio.org list is the best way to get
> our attention

Great, another mailing list to sign up to... but if that is your
preferred route, that's fine.

>> P.P.S. Back in 2006, I reported a similar issue with a data
>> corruption reading stockholm/pfam with DOS newlines
>> (Sourceforge Bug #1588956, long since fixed). It seems to
>> me that EMBOSS would benefit from explicit testing of all
>> the file formats using DOS/Windows newlines when run on
>> Unix, and vice versa. Does that sound feasible, or just
>> hopelessly ambitious?
>
> We can try ... how well does Biopython handle these? (i.e. do we need
> such examples for perl, python etc or is this an EMBOSS-specific issue?)

I think this is an EMBOSS-specific issue. I don't know enough about
how all the different EMBOSS parsers work, but is there a single
place where you could add automatic handling of either new line
convention when reading in text?
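
As a rough illustration of what I mean by a single place, here is a
sketch in Python (not EMBOSS's own C code). The helper name
read_lines_any_newline is hypothetical and the sample record is made
up; the point is only that one low-level line reader can strip either
line ending before any format parser sees the text:

```python
import io


def read_lines_any_newline(handle):
    """Yield lines from a binary handle with line endings removed.

    Hypothetical helper for illustration; not part of EMBOSS or Biopython.
    """
    for raw in handle:
        # Iterating a binary handle splits on b"\n" only, so a DOS line
        # still ends in b"\r\n". Strip both characters here, once, so no
        # parser further up ever sees a stray carriage return.
        yield raw.rstrip(b"\r\n").decode("ascii")


# The same made-up record with DOS and with Unix line endings.
dos = io.BytesIO(b";comment\r\nSEQ1\r\nACGT1\r\n")
unix = io.BytesIO(b";comment\nSEQ1\nACGT1\n")

assert list(read_lines_any_newline(dos)) == list(read_lines_any_newline(unix))
```

This sketch does not cope with old Mac style files that use a bare
carriage return, since those would never be split into lines at all.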

For reference, in Python, you can explicitly open text files in "universal
newlines" mode, which takes care of this. I don't know about Perl.
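
A minimal sketch of that, assuming Python 3, where universal newlines
is the default text mode behaviour (newline=None). In the Python 2
releases current at the time of writing, the equivalent was the "rU"
mode. The file contents below are made up for the example:

```python
import os
import tempfile

# Write a small file with DOS (CRLF) line endings. Binary mode is used
# so that nothing is translated on the way out.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b";comment\r\nSEQ1\r\nACGT1\r\n")
    path = tmp.name

# newline=None (the default) turns on universal newlines when reading:
# "\r\n", "\r" and "\n" are all handed back to the caller as "\n".
with open(path, "r", newline=None) as handle:
    lines = handle.readlines()

os.remove(path)

print(lines)  # [';comment\n', 'SEQ1\n', 'ACGT1\n']
```

The parser then only ever has to deal with "\n", whichever platform
wrote the file.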

Peter C.


