[Biopython] Strange Gaps when writing Multi-Fasta

Peter Cock p.j.a.cock at googlemail.com
Thu Feb 24 22:39:28 UTC 2011


I got it - but a bit big for the mailing list maybe?

On Thu, Feb 24, 2011 at 9:50 PM, Brett Bowman <bnbowman at gmail.com> wrote:
> Done.  The script itself is still ~100 lines, but you can safely
> ignore the top 90 which parse the blast file.  It's the bottom 20 lines,
> which output everything, that are confusing me.
>
> Simply download the script and the raw data file and run the following:
> python psi_parser_simple.py simple_test.txt
>
> This will output the sequence data in 3 ways:
> out_verA.txt - The Id printed in Fasta style, then the entire sequence
> printed on the next line.  No problems here.

There is no line wrapping, so some text editors may soft-wrap the long
lines in odd places at the gap characters (treating them as hyphens),
but the file itself seems fine.

> out_verB.txt - Regular Fasta format, written by me.  Here CBI21345 has
> the weird blank line in the middle that I can't get rid of.

I don't see any blank line in the CBI21345 record in out_verB.txt.

> out_verC.txt - Regular Fasta format, written by Biopython's SeqIO.
> Here it is YP_002749131 that has the weird blank line that I also
> can't get rid of.

I don't see any blank line in the YP_002749131 record in out_verC.txt.

> How is it possible to get the same weird artifact or not, and in
> different places, when all of the data is processed with the same
> For-loop?

Odd behaviour with new lines is often due to OS differences in the
line ending (CR + LF on Windows, LF only on Unix). That is unlikely
to be the issue here.
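
If you want to rule line endings out anyway, a minimal sketch along
these lines would count each kind of line ending in a file. The function
names are hypothetical (they are not part of Biopython or of the script
under discussion), and the file is read in binary mode so that Python
does not translate the line endings first.

```python
def count_line_endings(data):
    """Count CRLF, bare LF and bare CR line endings in a bytes string."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair also contains one LF and one CR, so subtract the
    # pairs to get the stand-alone characters.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def count_line_endings_in_file(path):
    """Hypothetical helper: read a file as raw bytes and count its endings."""
    with open(path, "rb") as handle:
        return count_line_endings(handle.read())
```

Calling count_line_endings_in_file("out_verB.txt") on a file written on
Unix should report only LF endings; a mixture of kinds would be worth a
closer look.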

How are you looking at the output files? I just used gedit on Linux,
and double checked at the terminal, e.g.

grep -C 30 CBI21345 out_verB.txt

grep -C 20 YP_002749131 out_verC.txt
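
The same check can be done without relying on what an editor shows.
As a rough sketch (the function name is hypothetical, and "blank" is
assumed here to mean empty or whitespace only), this reports the line
numbers of any blank lines in a file:

```python
def find_blank_lines(lines):
    """Return the 1-based numbers of lines that are empty or whitespace only."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if not line.strip()
    ]


# Example use on one of the output files (an open file handle iterates
# over its lines, so it can be passed in directly):
#
#     with open("out_verC.txt") as handle:
#         print(find_blank_lines(handle))
```

An empty list means the file has no blank lines, in which case any gap
seen on screen comes from how the file is being displayed.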

I'll send you the three output files (off the mailing list) so you can
compare them to what you get on your machine.

Peter



