[BioPython] Prosite / Prorule

Mon Nov 19 09:24:24 UTC 2007

Hello Peter and Michiel,

* Peter wrote:
> holger.dinkel at gmail.com wrote:
> 
> Could you file a bug and attach a small recent Prosite file which has this problem?

I have created a bug report (#2403) and attached two files: a script that
reproduces the error and a Prosite file, 'prosite_test.dat'.

> Note that the order in _scan_fns does matter.

Are you sure about that? The definition of the '_scan_fns' list, which holds all
the callbacks for Prosite entries, shows some 'redundancy'. This makes me think
that the entries are handled sequentially:

----------------------------------------------------------------------------------------------------
    _scan_fns = [ _scan_id, _scan_ac, _scan_dt, _scan_de, _scan_pa, _scan_ma, _scan_ru, _scan_nr, _scan_cc,

        # This is a really dirty hack, and should be fixed properly at
        # some point.  ZN2_CY6_FUNGAL_2, DNAJ_2 in Rel 15 and PS50309
        # in Rel 17 have lines out of order.  Thus, I have to rescan
        # these, which decreases performance.

        _scan_ma, _scan_nr, _scan_cc, _scan_dr, _scan_3d, _scan_do, _scan_terminator ]
----------------------------------------------------------------------------------------------------

And while scanning Prosite records, the function '_scan_record' simply iterates over the entries of _scan_fns:
----------------------------------------------------------------------------------------------------
    def _scan_record(self, uhandle, consumer):
        consumer.start_record()
        for fn in self._scan_fns:
            fn(self, uhandle, consumer)
----------------------------------------------------------------------------------------------------
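
To illustrate why I think the repeated entries are there, here is a toy model
of such a sequential scanner list. This is NOT the Bio.Prosite code: the helper
names, the record content and the shortened prefix lists are my own
assumptions. Each scanner only consumes the lines at the current position that
start with its prefix, so a line that is out of order is left over unless its
scanner appears a second time in the list:

```python
# Toy model of a sequential scanner list; NOT the actual Bio.Prosite code.
def make_scanner(prefix):
    """Return a scanner that consumes consecutive lines starting with prefix."""
    def scan(lines, pos, seen):
        while pos < len(lines) and lines[pos].startswith(prefix):
            seen.append(lines[pos])
            pos += 1
        return pos
    return scan


def scan_record(lines, prefixes):
    """Run one scanner per prefix, in list order; return (consumed, leftover)."""
    pos, seen = 0, []
    for prefix in prefixes:
        pos = make_scanner(prefix)(lines, pos, seen)
    return seen, lines[pos:]


# Hypothetical record whose CC line comes before its NR line.
record = ["ID   TEST", "MA   matrix", "CC   comment", "NR   numbers", "//"]

single_pass = ["ID", "MA", "NR", "CC", "//"]
with_rescan = ["ID", "MA", "NR", "CC", "MA", "NR", "CC", "//"]

# Print the lines each variant leaves unconsumed.
print(scan_record(record, single_pass)[1])
print(scan_record(record, with_rescan)[1])
```

With the single pass the NR line and the terminator are never consumed; with
the repeated scanners the whole record is read.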

> 
> Not that I am aware of, however the SwissProt parser looks very similar, so we should be able to fix this without too much hassle.
> 
> Thanks
> 
> Peter

* Michiel De Hoon wrote:
> 
> The Prosite parser was written about five years ago, and it may very well be
> that none of the currently active Biopython developers really know how this
> parser works. In that case, one option may be to write a new Prosite parser
> from scratch. That could even be an easier solution than trying to fix the
> existing parser. If you decide to go that way, it would be a good idea to
> discuss the Prosite parser design beforehand on the development mailing list
> (biopython-dev at biopython.org).
> 
> --Michiel

Rewriting the parser might be the best choice here. Unfortunately, I do not have
much experience in writing parsers, and I also had quite a hard time trying to
understand what is going on in the Prosite RecordParser... 8-/

The way I THINK this should be done is with some event-driven mechanism, where
the first letters of each scanned line determine what kind of information
follows, as opposed to iterating over a list (as the current _scan_fns does)
and trying to match each entry against the line...
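
A minimal sketch of what I have in mind, under some assumptions: each line
starts with a two-letter code followed by three spaces, and a record ends with
'//'. The handler names, the dictionary fields and the example entry are
hypothetical and are not taken from the existing parser:

```python
# Sketch of prefix-driven dispatch; all names here are hypothetical.
def _on_id(record, value):
    record["id"] = value


def _on_ac(record, value):
    record["accession"] = value.rstrip(";")


def _on_cc(record, value):
    record["comments"].append(value)


# Map each two-letter line code to the function that handles it.
HANDLERS = {"ID": _on_id, "AC": _on_ac, "CC": _on_cc}


def parse_record(lines):
    """Parse the lines of one record, in whatever order they appear."""
    record = {"id": None, "accession": None, "comments": [], "unhandled": []}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):      # assumed record terminator
            break
        code, value = line[:2], line[5:]
        handler = HANDLERS.get(code)
        if handler is None:
            record["unhandled"].append(line)   # keep unknown lines, do not fail
        else:
            handler(record, value)
    return record


# Made-up entry with the CC line deliberately out of order.
example = ["CC   comment before the ID line", "ID   TEST_ENTRY; PATTERN.",
           "AC   PS00000;", "//"]
result = parse_record(example)
print(result)
```

Since the line itself selects the handler, the order of the lines within a
record no longer matters and nothing has to be scanned twice.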

Could you point me to a parser implementation that could serve as a 'template'
of good parser design? Maybe I can merge it with the existing Prosite parser...

Thanks to all of you,

Holger