[BioPython] Whitespace in sequences
Iddo Friedberg
idoerg at pines2.ljcrf.edu
Tue Feb 18 03:39:45 EST 2003
Hi,
I guess you were using Biopython on a Mac/Windows box, where '\r' or
'\r\n' is the newline. Also, it looks like you were using the Bio.Fasta
package to read the file. The bug shouldn't occur within
Bio.SeqIO.FASTA.FastaReader (although it will within
SeqIO.FASTA.FastaWriter!)
Basically, all occurrences of the Linux/Unix-centric '\n' should be
replaced with os.linesep, in all modules.
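In the meantime, here is a rough sketch of a reader-side workaround in
plain Python. It is not the Biopython fix and does not use the Bio.Fasta
API. The names clean_sequence and read_fasta_text are made up for
illustration. It assumes the whole FASTA file is already in memory as a
string, and relies on str.splitlines() accepting '\n', '\r\n' and '\r'
alike.

```python
import re

# Matches any run of whitespace, including stray '\r' characters.
_WHITESPACE = re.compile(r"\s+")


def clean_sequence(raw):
    """Return raw with every whitespace character removed (hypothetical helper)."""
    return _WHITESPACE.sub("", raw)


def read_fasta_text(text):
    """Yield (title, sequence) pairs from FASTA-formatted text (hypothetical helper)."""
    title, chunks = None, []
    # splitlines() splits on '\n', '\r\n' and '\r', so the platform that
    # wrote the file does not matter here.
    for line in text.splitlines():
        if line.startswith(">"):
            if title is not None:
                yield title, clean_sequence("".join(chunks))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, clean_sequence("".join(chunks))


# Mixed Windows ('\r\n') and old Mac ('\r') line endings in one input.
records = list(read_fasta_text(">seq1 demo\r\nACGT\r\nTTGA\r\n>seq2\rGG CC\r"))
```

With the whitespace gone, len() of each sequence matches the number of
residues, so slicing out subsequences lands on the intended regions.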
(a few minutes later)
Hmmm... sorry, but I can't seem to commit the bugfix, probably something
to do with snow in Boston, or a Hackathon in Singapore. Take your pick. :)
I'll recheck this in the morning (Pacific time).
Best,
Iddo
--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://bioinformatics.ljcrf.edu/~iddo
On Tue, 18 Feb 2003, Paul-Michael Agapow wrote:
>
> Possibly a known bug or even a behaviour that makes sense but ...
>
> While recently writing a biopython script to extract subsequences from
> a fasta file, I was surprised to find that whitespace was retained
> within the sequence after it was read into a SeqRecord. Specifically,
> carriage returns ('\r') were left embedded in the sequence, which then
> made the sequence lengths inaccurate and meant I extracted the wrong
> regions.
>
> So, any ideas about this behaviour? I solved it with a simple re to
> remove whitespace, but I can't think of any format in which whitespace
> is significant within a sequence, so surely it should all be cleaned up.
>
> --
> Dr Paul-Michael Agapow (p.agapow at ucl.ac.uk)
> Dept. Biology, University College London
>
> _______________________________________________
> BioPython mailing list - BioPython at biopython.org
> http://biopython.org/mailman/listinfo/biopython
>