[Biopython-dev] [BioPython] a sequence set object in biopython?

Michiel de Hoon mjldehoon at yahoo.com
Thu Nov 13 10:27:57 UTC 2008


Adding new classes to Biopython should be done very carefully ... once they're in, it's difficult to remove them again. In the past, removing classes that turned out to be less than ideal was a real headache. Right now I don't see a clear need for a sequence set object ... read on.

--- On Wed, 11/12/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> > OK, then use a dict of SeqRecords for this, as shown
> > in the tutorial chapter for Bio.SeqIO and the wiki.
> >  We even have a helper function
> > Bio.SeqIO.to_dict() to do this and check for duplicate
> > keys.
> 
> I would prefer a SeqRecordSet object with a to_dict method

> Wouldn't it be easier:
> >>> seqs = Bio.SeqIO.parse(filehandler, 'fasta')
> >>> record_dict = seqs.to_dict()
> 
> than invoking SeqIO twice?

Maybe, yes, but it's just a matter of typing and I don't think that by itself it is a good enough reason for a SeqRecordSet class.
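
For reference, the dict-building step under discussion can be sketched without Biopython installed. The snippet below is a stand-in, not Biopython's source: the to_dict function and the Record tuple are hypothetical names that only mimic the behaviour described above (build a dict keyed by record id and refuse duplicate keys). It is written for Python 3.

```python
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord: just an id and a sequence string.
Record = namedtuple("Record", ["id", "seq"])


def to_dict(records, key_function=lambda rec: rec.id):
    """Build {key: record} from any iterable, rejecting duplicate keys."""
    result = {}
    for rec in records:
        key = key_function(rec)
        if key in result:
            raise ValueError("Duplicate key %r" % (key,))
        result[key] = rec
    return result


# An iterator plays the role of the object returned by a parser.
records = iter([Record("seq1", "cacacac"), Record("seq2", "gtgtgtg")])
record_dict = to_dict(records)
```

Because the helper accepts any iterable, it works on a parser's output, a list, or a generator alike, which is one argument for keeping it a plain function rather than a method on a new container class.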

> Let's see it from another point of view.
> In biopython, if you want to print a set of sequences in fasta format,
> you have to do the following:
> >>> s1 = SeqRecord(Seq('cacacac'))
> >>> s2 = SeqRecord(Seq('cacacac'))
> >>> seqs = s1, s2
> >>> out = ''
> >>> for seq in seqs:
>         # a "print seq.format('fasta')" statement won't work
>         # properly here, because of blank lines
>         out += seq.format('fasta')
> >>> print out

I don't quite understand why "print seq.format('fasta')" won't work.
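
One plausible reading of the blank-line complaint: the text returned by format('fasta') already ends with a newline, and print then adds one of its own, so each record is followed by an empty line. The sketch below shows this with plain strings standing in for the formatted records (their exact content is an assumption) and uses Python 3's print function. The printed helper is a hypothetical name used only to capture the output.

```python
import io
from contextlib import redirect_stdout

# Assumed shape of the text returned by seq.format('fasta'):
# a header line, a sequence line, and a trailing newline.
fasta_records = [">s1\ncacacac\n", ">s2\ncacacac\n"]


def printed(texts, **print_kwargs):
    """Return what print() would write to the screen for each text in turn."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        for text in texts:
            print(text, **print_kwargs)
    return buffer.getvalue()


# print appends its own newline after the one already in the text.
with_blank_lines = printed(fasta_records)

# Suppressing print's newline leaves the records back to back.
without_blank_lines = printed(fasta_records, end="")
```

With the Python 2 print statement used in this thread, calling sys.stdout.write(seq.format('fasta')) avoids the extra newline in the same way.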

> Take for example this code you wrote for me before:
> 
> > class SeqRecordList(list) :
> >    def format(self, format) :
> >        from Bio import SeqIO
> >        from StringIO import StringIO
> >        handle = StringIO()
> >        SeqIO.write(self, handle, format)
> >        handle.seek(0)
> >        return handle.read()
> 
> It's very useful, but I don't think a python/biopython newbie would be
> able to write it.

I agree that this is too complicated. What if we redefine SeqIO.write as

def write(self, handle=sys.stdout, format='fasta'):
...

So by default SeqIO.write prints to the screen. Then you can do

SeqIO.write(records)

where records is a list of SeqRecords.
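
A minimal sketch of that idea, assuming a simplified record type and FASTA output only. This is not Biopython's SeqIO.write: the Record tuple, the formatting, and the returned count are all assumptions made for illustration, and the code is written for Python 3.

```python
import io
import sys
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord: just an id and a sequence string.
Record = namedtuple("Record", ["id", "seq"])


def write(records, handle=None, format="fasta"):
    """Write records to handle (standard output by default); return the count."""
    if format != "fasta":
        raise ValueError("This sketch only knows the 'fasta' format")
    if handle is None:
        # Looked up at call time, so a redirected sys.stdout is respected.
        handle = sys.stdout
    count = 0
    for rec in records:
        handle.write(">%s\n%s\n" % (rec.id, rec.seq))
        count += 1
    return count


records = [Record("s1", "cacacac"), Record("s2", "cacacac")]

# Passing a handle explicitly still works as before.
buffer = io.StringIO()
written = write(records, buffer)

# With no handle, the records go to the screen.
write(records)
```

The sketch uses handle=None instead of handle=sys.stdout in the signature because a default argument is evaluated once, when the function is defined, so later changes to sys.stdout would otherwise be ignored.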

--Michiel.


      


