[Biopython-dev] Bio.GenBank (was: Bio.File)

Michiel de Hoon mjldehoon at yahoo.com
Mon Sep 12 12:49:35 UTC 2011



--- On Sun, 9/11/11, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> So currently none of Bio.GenBank can really be
> deprecated.

OK.

> Maybe we can represent the WGS records as
> SeqRecord objects without a sequence, but I
> don't like that idea really. Such files are NOT
> sequence files at all.

I agree.
> 
> >
> > Also we'd need some documentation for Bio.GenBank.
> >
> 
> In general it would be a good idea to have a
> worked example parsing a (small) GenBank
> file and showing where in the SeqRecord
> each bit of annotation goes.

That would be good, but we also need some documentation for Bio.GenBank itself, to clarify how users are meant to use it (and also to clarify that Bio.SeqIO produces SeqRecord objects, whereas Bio.GenBank produces its own GenBank-specific records).

> > Also I think that the RecordParser should
> > raise an Exception if it cannot find a record
> > when parsing.
> 
> I disagree (or at least, when exposed via
> Bio.SeqIO I disagree).

After reading your comments, I realized that my mail was confusing. I think we actually agree. This is what I meant to say:

Compare the following:

>>> from Bio import SeqIO
>>> from StringIO import StringIO
>>> handle = StringIO("no record here")
>>> SeqIO.read(handle, 'genbank')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 617, in read
    raise ValueError("No records found in handle")
ValueError: No records found in handle

That's fine - the read function says it will raise an exception if there is not exactly one record.

With SeqIO.parse, we don't get an Exception:

>>> handle = StringIO("no record here")
>>> records = SeqIO.parse(handle, 'genbank')
>>> for record in records: print record.id
... 
>>> 

This is also OK. SeqIO.parse expects zero, one, or multiple records.

Now for Bio.GenBank:

>>> from Bio import GenBank
>>> parser = GenBank.RecordParser()
>>> handle = StringIO("no record here")
>>> parser.parse(handle)
>>> # no error raised

This, I think, is not OK. GenBank.RecordParser().parse expects exactly one record; it should raise an Exception if it does not find one. Likewise, the parser does not raise an Exception if there are multiple records in the handle, and I think it should.

And for Bio.GenBank.Iterator:

>>> from Bio.GenBank import Iterator
>>> from Bio.GenBank import RecordParser
>>> from StringIO import StringIO
>>> handle = StringIO("no record here")
>>> parser = RecordParser()
>>> records = Iterator(handle, parser)
>>> for record in records: print record.locus
... 
>>> 

This is the same behavior as Bio.SeqIO.parse, which I think is OK.

Assuming that the RecordParser and the Iterator are the only two classes that are intended for the end-user, it's probably better to add a Bio.GenBank.read and a Bio.GenBank.parse function to be consistent with the other Biopython modules.
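To make the proposal concrete, here is a rough sketch of how such a read/parse pair could behave. It is only an illustration, not existing Bio.GenBank code: toy_records is a hypothetical stand-in for a real record iterator (it just yields the name on each LOCUS line), and the record_iterator argument is an assumption made so that the example runs without Biopython installed. The error messages mirror the one raised by SeqIO.read above.

```python
import io

_MISSING = object()  # sentinel marking "iterator exhausted"


def toy_records(handle):
    """Hypothetical stand-in for a GenBank record iterator.

    Yields the name found on each LOCUS line; a real iterator would
    yield full GenBank-specific record objects instead.
    """
    for line in handle:
        if line.startswith("LOCUS"):
            yield line.split()[1]


def parse(handle, record_iterator=toy_records):
    """Iterate over zero, one, or multiple records in the handle."""
    return iter(record_iterator(handle))


def read(handle, record_iterator=toy_records):
    """Return exactly one record; raise ValueError otherwise."""
    records = parse(handle, record_iterator)
    first = next(records, _MISSING)
    if first is _MISSING:
        raise ValueError("No records found in handle")
    if next(records, _MISSING) is not _MISSING:
        raise ValueError("More than one record found in handle")
    return first


# parse accepts a handle without records; read does not.
empty = list(parse(io.StringIO("no record here")))
single = read(io.StringIO("LOCUS       AB000001\n//\n"))
```

With this split, parse keeps the lenient behavior of the Iterator, while read gives the strict "exactly one record" check that RecordParser().parse currently lacks.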

Sorry for the confusion!

--Michiel.



