[Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO

Michiel De Hoon mdehoon at c2b2.columbia.edu
Wed Dec 5 01:13:01 UTC 2007


> One idea I had been thinking about was adding a new function
> Bio.SeqIO.fetch(...) or Bio.SeqIO.online_fetch(...) which would act as
> a proxy to all our supported online sequence databases, and either
> return a handle to the requested record(s), or perhaps return
> SeqRecord(s).

I believe that Bio.db already offers such functionality, but I don't think it is
used much.
In any case, Biopython currently has too many functions for accessing databases,
not too few, so I think we should not add any new ones.

> > *If* we decide that ExPASyDictionary should return handles, *then* actually
> > we don't really need an ExPASyDictionary, as its behavior is then largely the
> > same as Bio.WWW.ExPASy.get_sprot_raw. So in short, in my opinion
> > Bio.SwissProt.SProt.ExPASyDictionary does not add much beyond what
> > Bio.WWW.ExPASy.get_sprot_raw already offers.
>
> Can ExPASyDictionary return anything that get_sprot_raw can't?
> Otherwise, from the user's point of view it's just a coding style issue
> (dictionary versus function).

ExPASyDictionary is just a wrapper around get_sprot_raw, so get_sprot_raw can
return any record that ExPASyDictionary can return.
There are two differences between the two:
1) ExPASyDictionary behaves as a dictionary, get_sprot_raw as a function. As
you write, this is just a coding style issue.
2) When creating an ExPASyDictionary, users can pass a parser to parse the
records before returning them. This, too, is in essence only a coding style issue.
In particular, do we want:
   >>> from Bio.SwissProt import SProt
   >>> sprot_parser = SProt.RecordParser()
   >>> dictionary = SProt.ExPASyDictionary(parser=sprot_parser)
   >>> record = dictionary["O12345"]
   or
   >>> from Bio.SwissProt import SProt
   >>> from Bio import ExPASy
   >>> handle = ExPASy.get_sprot_raw("O12345")
   >>> record = SProt.parse(handle)
To get SeqRecords, the ExPASyDictionary approach would use a different parser,
while the get_sprot_raw approach would call SeqIO.parse instead of SProt.parse.
For plain-text output, the ExPASyDictionary approach passes no parser, while the
get_sprot_raw approach calls read() on the handle directly.
To get a handle, the ExPASyDictionary approach can use StringIO to convert the
text output back into a handle; the get_sprot_raw approach already returns one.
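
To make the comparison concrete, here is a minimal, self-contained sketch of
the two styles. Network access is replaced by an in-memory stand-in, and
fetch_raw, FetchDictionary and toy_parse are hypothetical names used only for
illustration, not Biopython's API (the real ExPASyDictionary takes a parser
object rather than a plain function). The point it shows is that the
dictionary's __getitem__ merely delegates to the function, and that all three
output forms (parsed record, plain text, handle) are reachable either way.

```python
from io import StringIO

# Hypothetical in-memory stand-in for the ExPASy server (no network access).
_FAKE_SERVER = {
    "O12345": "ID   TEST_HUMAN\nAC   O12345;\n//\n",
}


def fetch_raw(accession):
    """Hypothetical analogue of get_sprot_raw: return a handle to the raw record."""
    return StringIO(_FAKE_SERVER[accession])


def toy_parse(handle):
    """Hypothetical parser: return the entry name from the ID line."""
    for line in handle:
        if line.startswith("ID   "):
            return line.split()[1]
    raise ValueError("no ID line found")


class FetchDictionary:
    """Hypothetical analogue of ExPASyDictionary: a thin wrapper around fetch_raw."""

    def __init__(self, parser=None):
        self.parser = parser

    def __getitem__(self, accession):
        handle = fetch_raw(accession)  # all the real work happens in the function
        if self.parser is None:
            return handle.read()  # plain text when no parser is given
        return self.parser(handle)


# Function style: the caller decides what to do with the handle.
record_f = toy_parse(fetch_raw("O12345"))
text_f = fetch_raw("O12345").read()
handle_f = fetch_raw("O12345")

# Dictionary style: the parser is fixed when the dictionary is created.
record_d = FetchDictionary(parser=toy_parse)["O12345"]
text_d = FetchDictionary()["O12345"]
handle_d = StringIO(FetchDictionary()["O12345"])  # text converted back to a handle
```

Both styles produce the same three results; the dictionary only moves the
choice of parser from the call site to the constructor.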

In my opinion, both 1) and 2) are just coding style issues. Maintaining both
ExPASyDictionary and get_sprot_raw is a burden for the developers and confuses
users. So I suggest we focus on one of these and deprecate the other.
The ExPASy.get_sprot_raw approach, which returns a handle that is then passed to
a parser, seems closer to how Bio.SeqIO is organized, and therefore has my
preference.
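
If the dictionary were deprecated, one possible way to do it (a sketch, not a
decided plan) is to keep it as a thin shim that emits a DeprecationWarning and
delegates to the function, so there is only one code path to maintain during
the deprecation period. As before, fetch_raw, FetchDictionary and the in-memory
server are hypothetical stand-ins, not Biopython code.

```python
import warnings
from io import StringIO

# Hypothetical in-memory stand-in for the ExPASy server (no network access).
_FAKE_SERVER = {"O12345": "ID   TEST_HUMAN\nAC   O12345;\n//\n"}


def fetch_raw(accession):
    """Hypothetical preferred function: return a handle to the raw record."""
    return StringIO(_FAKE_SERVER[accession])


class FetchDictionary:
    """Hypothetical deprecated wrapper, kept only for backwards compatibility."""

    def __init__(self, parser=None):
        # stacklevel=2 makes the warning point at the caller's line.
        warnings.warn(
            "FetchDictionary is deprecated; call fetch_raw() and parse the handle instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.parser = parser

    def __getitem__(self, accession):
        # Delegate to the preferred function instead of duplicating its logic.
        handle = fetch_raw(accession)
        return handle.read() if self.parser is None else self.parser(handle)
```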

Two more issues:
1) I am not sure why the SwissProt code is kept in a separate SProt submodule
of Bio.SwissProt. Bio/SwissProt/__init__.py is currently empty, so we could
save ourselves some typing by moving all the SwissProt code there from
SProt.py.
2) A SwissProt.parse function currently doesn't exist. Right now, parsing a
record from a handle is a three-step process:
   >>> from Bio.SwissProt import SProt
   >>> s_parser = SProt.RecordParser()
   >>> s_iterator = SProt.Iterator(handle, s_parser)
   >>> record = s_iterator.next()
A SwissProt.parse function would just contain these three steps, or perhaps
only the first two (returning the iterator).
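
A minimal sketch of what such a function might look like, assuming the input is
SwissProt flat-file text in which each record ends with a "//" line.
ToyRecordParser, iterate_records and parse are hypothetical stand-ins for
SProt.RecordParser, SProt.Iterator and the proposed SwissProt.parse; the real
parser extracts far more fields than the ID and AC lines handled here. Because
parse is a generator, it covers the "first two steps" variant: the caller can
take a single record with next() or loop over all of them.

```python
from io import StringIO


class ToyRecordParser:
    """Hypothetical stand-in for SProt.RecordParser: map one record's text to a dict."""

    def parse(self, text):
        record = {}
        for line in text.splitlines():
            if line.startswith("ID   "):
                record["entry_name"] = line.split()[1]
            elif line.startswith("AC   "):
                # Accession lines look like "AC   Q22222; Q33333;"
                record.setdefault("accessions", []).extend(
                    acc.strip() for acc in line[5:].split(";") if acc.strip()
                )
        return record


def iterate_records(handle):
    """Hypothetical stand-in for SProt.Iterator: yield the raw text of each record."""
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip("\n") == "//":  # end-of-record marker
            yield "".join(lines)
            lines = []


def parse(handle):
    """Hypothetical SwissProt.parse: the three steps bundled into one generator."""
    parser = ToyRecordParser()            # step 1: create the parser
    for text in iterate_records(handle):  # step 2: iterate over the records
        yield parser.parse(text)          # step 3: parse each record


handle = StringIO(
    "ID   FIRST_HUMAN\nAC   P11111;\n//\n"
    "ID   SECOND_MOUSE\nAC   Q22222; Q33333;\n//\n"
)
records = list(parse(handle))
```

Returning an iterator, as Bio.SeqIO.parse does, lets the same function serve
both single-record and multi-record input.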

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032