[BioPython] Fasta parser, minor (bug/feature?)

Peter Wilkinson pwilkinson_m at xbioinformatics.org
Wed Aug 24 16:43:15 EDT 2005


It seems that the fasta parser retains the OS-specific line endings when it 
stores the title and sequence in the Record object. So when I read a file I 
worked on in Windows (eeeeek) and then display it in a true text editor like 
Context, I have to write out something like this:

file_out.write(str(cur_record).replace('\r', ''))

... because all the line endings are '\r\n', which the text editor displays 
as two returns, so the text comes out double-spaced instead of single-spaced 
when written to file:

>gi|272209|gb|M61959.1| EST00007 Fetal brain, Stratagene (cat#936206) ...

CTTCCCTTTTGTTCCCCTCAGTGTCCCTTTTAATTGCTTCCCTCCATTTTCCTTAGCAGC

ATCCTAGTTGATGGTCTGGGTTATCAGAGGAGCAAAAACATTTAAGTGTCAAATAATGCT

CATTGTCTCCCTGGGATTTCTAAACAGAAAAAATGAAGAAAGAGGCAGAGAAGAGCTTCA




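A minimal sketch of a reader that avoids the problem at parse time, assuming that is where the fix belongs: every line has any trailing '\r' and '\n' removed before the title or sequence is stored. `parse_fasta` is a hypothetical helper written for illustration, not the Biopython Fasta parser. As an aside, in Python 3 a file opened in text mode with the default newline=None already translates '\r\n' to '\n' on read; the explicit strip also covers files read in binary mode or data from other sources.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs with all line-ending characters removed.

    Hypothetical helper for illustration; not Biopython's Fasta parser.
    """
    title: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        # rstrip('\r\n') drops a trailing '\n', '\r\n' or stray '\r'
        line = line.rstrip('\r\n')
        if line.startswith('>'):
            # A new header closes the previous record, if any
            if title is not None:
                yield title, ''.join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None and line:
            # Sequence lines are joined with no separators at all
            chunks.append(line.strip())
    if title is not None:
        yield title, ''.join(chunks)


# Demo on Windows-style input; splitlines keeps '\r\n' as one line ending
data = ">seq1 test\r\nACGT\r\nTTGA\r\n>seq2\r\nGG\r\n"
records = list(parse_fasta(data.splitlines(keepends=True)))
```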

Should the parser handle both single ('\n') and OS-specific line endings, or 
should it just use '\n'?

I realise that the Record __str__() method uses os.linesep, but when working 
with FASTA files in a true text editor on Windows, only the \n is needed. 
Also, I generally work in a mixed environment, where \r\n should be 
avoided.

I am unsure why os.linesep is used here. My vote is to just end each line 
with a plain '\n'.
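The '\n'-only approach voted for above could look something like this sketch. `write_fasta` is a hypothetical helper, not Biopython's Record.__str__(); it hard-codes '\n' instead of os.linesep, and the 60-column wrap width is an assumption.

```python
import io
from typing import TextIO


def write_fasta(handle: TextIO, title: str, seq: str, width: int = 60) -> None:
    """Write one FASTA record, ending every line with a plain '\n'.

    Hypothetical helper for illustration; not Biopython's Record.__str__.
    """
    handle.write('>' + title + '\n')
    # Wrap the sequence into lines of at most `width` characters
    for i in range(0, len(seq), width):
        handle.write(seq[i:i + width] + '\n')


# Demo: StringIO performs no newline translation, so the buffer holds
# exactly what write_fasta emitted
buf = io.StringIO()
write_fasta(buf, 'seq1 example', 'ACGT' * 20)
```

When writing to a real file in Python 3, text mode with the default newline=None translates every '\n' to os.linesep on output, so on Windows you would get '\r\n' back. Opening the file with open(path, 'w', newline='\n') disables that translation.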

Peter
