[Biopython-dev] Wrapping sequences in Fasta output

Peter biopython-dev at maubp.freeserve.co.uk
Thu Aug 9 08:10:22 UTC 2007


Michiel De Hoon wrote:
> Sebastian Bassi wrote:
>> On 8/7/07, Michiel De Hoon <mdehoon at c2b2.columbia.edu> wrote:
>>> I was wondering why we go through the Fasta reader/writer instead of
>>> reading/writing the file contents directly, as in
>>>         for filename, input_file in zip(pair, input_files):
>>>             input_file.close()
>>>             file(input_file.name, "w").write(file(filename).read())
>> The old Fasta writer used to write a 70-column formatted fasta file.
>> Your method (and I think also the new SeqIO) writes the fasta data as
>> one big line.

Maybe Wise doesn't like its input as one long line?

> Peter, can we change the behavior of SeqIO.write so that it writes the fasta
> data in some fixed column format? For comparison, Bioperl appears to use a
> column width of 60 characters:
> 
> http://www.bioperl.org/wiki/FASTA_sequence_format
> 
> --Michiel.

That would be easy, and might improve compatibility with some tools, 
which recommend that lines be at most 80 characters long. A width of 60 
does seem to be the common default.

My personal preference is for no line breaks, partly because I tend to 
work more with domain sequences (usually shorter than 100 characters). 
No wrapping also means that when viewing a file in a text editor I can 
simply halve the line number (rounding up) to get the record number.
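With unwrapped output, each record is exactly two lines (a ">" title line followed by the whole sequence), so line numbers map to record numbers by halving and rounding up. A minimal sketch of that arithmetic; `record_number` is a hypothetical helper for illustration, not part of Biopython, and it assumes no blank lines and no wrapping:

```python
def record_number(line_number: int) -> int:
    """Map a 1-based line number to a 1-based record number.

    Assumes (hypothetically) an unwrapped FASTA file in which every
    record is exactly two lines: a '>' title line, then one sequence line.
    """
    if line_number < 1:
        raise ValueError("line numbers start at 1")
    # Lines 1-2 -> record 1, lines 3-4 -> record 2, ...: halve, rounding up.
    return (line_number + 1) // 2


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, record_number(n))
```

Once sequences are wrapped, records span a variable number of lines and this shortcut no longer holds.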

Any other views? Otherwise I'll change Bio.SeqIO to write FASTA files 
with a maximum sequence line length of 60 characters.
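The two layouts under discussion, one long line per sequence versus fixed-width lines, can be sketched in plain Python. This is not the Bio.SeqIO implementation or API: `write_fasta` is a hypothetical helper that takes (title, sequence) pairs, where `width=60` wraps sequence lines at 60 letters and `width=None` writes each sequence on a single line.

```python
import io
from typing import Iterable, Optional, TextIO, Tuple


def write_fasta(
    records: Iterable[Tuple[str, str]],
    handle: TextIO,
    width: Optional[int] = 60,
) -> int:
    """Write (title, sequence) pairs as FASTA and return the record count.

    Hypothetical helper: width=None puts each sequence on one line;
    otherwise every sequence line is at most `width` characters long.
    """
    if width is not None and width <= 0:
        raise ValueError("width must be a positive integer or None")
    count = 0
    for title, seq in records:
        handle.write(f">{title}\n")
        if width is None:
            # Unwrapped style: the whole sequence on a single line.
            handle.write(seq + "\n")
        else:
            # Wrapped style: slice the sequence into chunks of `width` letters.
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_fasta([("example", "ACGT" * 40)], buf, width=60)
    print(buf.getvalue(), end="")
```

Because every chunk is at most `width` letters, passing `width=80` would likewise satisfy tools that want lines no longer than 80 characters.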

Peter
