[Biopython-dev] ScanProsite

Sun Mar 1 12:17:28 UTC 2009

ScanProsite is a web tool to scan protein sequences against the PROSITE database (see http://www.expasy.org/tools/scanprosite/). Biopython contains code in Bio.Prosite to interact with ScanProsite. However, this code needs to be updated, as it does not work with the current ScanProsite web pages: Neither accessing ScanProsite nor extracting the hits from the HTML page works.

This problem is relatively easy to solve, since ExPASy nowadays allows programmatic access to ScanProsite (see http://www.expasy.org/tools/scanprosite/ScanPrositeREST.html). This REST interface returns the Prosite hits in XML format, which is easy to parse in Python.
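As an illustration of how little code the parsing takes, here is a minimal sketch using only the standard library's xml.etree.ElementTree. The element names (matchset, match, sequence_ac, start, stop, signature_ac) and the sample document are assumptions made up for this example, not real ScanProsite output. They should be checked against the REST documentation linked above.

```python
import xml.etree.ElementTree as ET

# Hypothetical ScanProsite-style XML reply. The element names are assumptions
# and the hits are invented purely for illustration.
SAMPLE_XML = b"""<?xml version="1.0"?>
<matchset n_match="2" n_seq="1">
  <match>
    <sequence_ac>USERSEQ1</sequence_ac>
    <start>10</start>
    <stop>15</stop>
    <signature_ac>PS00001</signature_ac>
  </match>
  <match>
    <sequence_ac>USERSEQ1</sequence_ac>
    <start>42</start>
    <stop>48</stop>
    <signature_ac>PS00005</signature_ac>
  </match>
</matchset>
"""


def parse_hits(xml_bytes):
    """Turn a ScanProsite-style XML document into a list of dicts, one per hit."""
    root = ET.fromstring(xml_bytes)
    hits = []
    for match in root.iter("match"):
        # Each child element of <match> becomes a key in the hit dictionary.
        hit = {child.tag: child.text for child in match}
        # Convert the match positions from text to integers.
        for key in ("start", "stop"):
            if key in hit:
                hit[key] = int(hit[key])
        hits.append(hit)
    return hits


hits = parse_hits(SAMPLE_XML)
for hit in hits:
    print(hit["signature_ac"], hit["start"], hit["stop"])
```

Returning plain dictionaries keeps the parser independent of how the results are later exposed to the user.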

The only issue now is how this should be presented to the user. The current (broken) way to access Prosite looks like this:

>>> from Bio import ExPASy
>>> handle = ExPASy.scanprosite1(seq=mysequence)
to get a handle to the raw HTML output, and

>>> from Bio import Prosite
>>> hits = Prosite.scan_sequence_expasy(seq=mysequence)
which returns the hits as a Python list.

One possibility is to have a ScanProsite module under Bio.Prosite or Bio.ExPASy for interaction with ScanProsite. Something like this:
>>> from Bio.ExPASy import ScanProsite
>>> handle = ScanProsite.search(seq=mysequence)
>>> hits = ScanProsite.read(handle)

Another option is to have a scan function in the Bio.Prosite module that accesses the ScanProsite web tool and parses the results:
>>> from Bio import Prosite
>>> hits = Prosite.scan(seq=mysequence)
This is more straightforward, but on the other hand people may want to save the search results to an XML file, and for that purpose we'd need a function that only does the parsing.
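To make the trade-off concrete, here is a rough sketch of how a scan function could sit on top of a search/read pair. In this arrangement the parsing-only function stays available for saved files. The function names search, read and scan come from the proposals above. Everything else is an assumption: the endpoint URL, the output="xml" parameter, the hypothetical helper build_query, the extra **params pass-through and the element names in the sample XML. None of it is an existing Biopython API.

```python
import io
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint and parameter names; verify them against the
# ScanProsite REST documentation before relying on them.
SCANPROSITE_URL = "https://prosite.expasy.org/cgi-bin/prosite/PSScan.cgi"


def build_query(seq, **params):
    """Hypothetical helper: return the GET URL for a query, requesting XML output."""
    query = dict(params, seq=seq, output="xml")
    return SCANPROSITE_URL + "?" + urllib.parse.urlencode(query)


def search(seq, **params):
    """Run the query and return a handle to the raw XML (needs network access)."""
    return urllib.request.urlopen(build_query(seq, **params))


def read(handle):
    """Parse XML from any binary file-like object: a live reply or a saved file."""
    root = ET.parse(handle).getroot()
    return [{child.tag: child.text for child in match}
            for match in root.iter("match")]


def scan(seq, **params):
    """One-step convenience wrapper: search, then parse, then close the handle."""
    handle = search(seq, **params)
    try:
        return read(handle)
    finally:
        handle.close()


# Offline demonstration: read() accepts any binary file-like object, so an
# in-memory stand-in for a saved result file works too (the XML is made up).
saved = io.BytesIO(b"<matchset><match><signature_ac>PS00001</signature_ac>"
                   b"<start>10</start><stop>15</stop></match></matchset>")
offline_hits = read(saved)
print(offline_hits)
```

With this layout, scan(mysequence) would cover the one-step case, and XML saved earlier could still be parsed by opening the file in binary mode and passing the handle to read().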

Any opinions?

--Michiel