[BioPython] Prosite / Prorule
holger.dinkel at gmail.com
Mon Nov 19 09:24:24 UTC 2007
Hello Peter and Michiel,
* Peter wrote:
> holger.dinkel at gmail.com wrote:
>
> Could you file a bug and attach a small recent Prosite file which has this problem?
I have created a bug report (#2403) and attached two files: a script that
reproduces the error and a Prosite file, 'prosite_test.dat'.
> Note that the order in _scan_fns does matter.
Are you sure about that? The definition of the '_scan_fns' list, which holds the
scanner callbacks for the different Prosite line types, shows some 'redundancy'.
This makes me think that the entries are handled sequentially:
----------------------------------------------------------------------------------------------------
_scan_fns = [
    _scan_id, _scan_ac, _scan_dt, _scan_de, _scan_pa, _scan_ma,
    _scan_ru, _scan_nr, _scan_cc,
    # This is a really dirty hack, and should be fixed properly at
    # some point. ZN2_CY6_FUNGAL_2, DNAJ_2 in Rel 15 and PS50309
    # in Rel 17 have lines out of order. Thus, I have to rescan
    # these, which decreases performance.
    _scan_ma, _scan_nr, _scan_cc, _scan_dr, _scan_3d, _scan_do,
    _scan_terminator,
]
----------------------------------------------------------------------------------------------------
While scanning a Prosite record, the method '_scan_record' simply iterates over the _scan_fns entries, in list order:
----------------------------------------------------------------------------------------------------
def _scan_record(self, uhandle, consumer):
    consumer.start_record()
    for fn in self._scan_fns:
        fn(self, uhandle, consumer)
----------------------------------------------------------------------------------------------------
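A simplified, hypothetical model of this sequential scanning (not the actual Biopython code: it works on a list of lines instead of 'uhandle', and 'make_scanner'/'scan_record' are made-up names) shows why the order matters and why out-of-order lines need the "rescan" hack:

```python
# Simplified, hypothetical model of sequential scanning in the style of
# _scan_fns. It works on a list of lines instead of Biopython's uhandle.


def make_scanner(code):
    """Return a scanner that consumes consecutive lines starting with `code`."""
    def scan(lines, pos, seen):
        while pos < len(lines) and lines[pos].startswith(code):
            seen.append(lines[pos])
            pos += 1
        return pos  # position of the first line this scanner did not take
    return scan


def scan_record(lines, scan_fns):
    """Run every scanner once, strictly in list order."""
    pos = 0
    seen = []
    for fn in scan_fns:
        pos = fn(lines, pos, seen)
    return seen, lines[pos:]  # (consumed lines, lines nobody consumed)


# Toy record (made-up values) with an MA line out of order, after NR.
record = ["ID   X", "AC   PS00001;", "MA   /A", "NR   /B", "MA   /C", "//"]

# Without rescanning, the NR scanner stops at the second MA line and
# none of the remaining scanners consumes it.
seen, leftover = scan_record(record, [make_scanner(c) for c in
                                      ["ID", "AC", "MA", "NR", "//"]])
print(leftover)  # ['MA   /C', '//']

# Repeating the MA/NR scanners, as the "dirty hack" does, picks it up.
seen, leftover = scan_record(record, [make_scanner(c) for c in
                                      ["ID", "AC", "MA", "NR", "MA", "NR", "//"]])
print(leftover)  # []
```

In this model every possible out-of-order position needs its own extra scanner in the list, which is exactly the cost the comment in _scan_fns complains about.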
>
> Not that I am aware of; however, the SwissProt parser looks very similar, so we should be able to fix this without too much hassle.
>
> Thanks
>
> Peter
* Michiel De Hoon wrote:
>
> The Prosite parser was written about five years ago, and it may very well be
> that none of the currently active Biopython developers really know how this
> parser works. In that case, one option may be to write a new Prosite parser
> from scratch. That could even be an easier solution than trying to fix the
> existing parser. If you decide to go that way, it would be a good idea to
> discuss the Prosite parser design beforehand on the development mailing list
> (biopython-dev at biopython.org).
>
> --Michiel
Rewriting the parser might be the best choice here. Unfortunately, I don't have
much experience writing parsers, and I also had quite a hard time understanding
what was going on in the Prosite RecordParser... 8-/
The way I THINK this should be done is an event-driven mechanism, where the
first letters of each scanned line (the two-letter line code, such as 'ID', 'AC'
or 'MA') determine what kind of information follows, instead of iterating over a
list (like the current _scan_fns) and trying to match each entry against the line...
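A minimal sketch of that idea, under stated assumptions: the names 'scan', 'RecordBuilder' and 'KNOWN_CODES' are hypothetical (not Biopython API), the set of codes is taken only from the _scan_fns list quoted above (a real parser would need the full list from the Prosite documentation), and '//' is treated as the record terminator. Each line is routed by its own code to a consumer, loosely mirroring the scanner/consumer split of the existing parser:

```python
# Hypothetical sketch: dispatch on the two-letter line code instead of
# running a fixed sequence of scanner functions.

# Line codes taken from the _scan_fns list quoted above ('//' ends a record).
KNOWN_CODES = {"ID", "AC", "DT", "DE", "PA", "MA", "RU", "NR",
               "CC", "DR", "3D", "DO"}


class RecordBuilder:
    """Hypothetical consumer that collects the values of each line code."""

    def __init__(self):
        self.records = []
        self.current = {}

    def line(self, code, value):
        # Lines with the same code are accumulated in file order.
        self.current.setdefault(code, []).append(value)

    def end_record(self):
        self.records.append(self.current)
        self.current = {}


def scan(lines, consumer):
    """Send one event per line to `consumer`, chosen by the line code."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        code = line[:2]
        if code == "//":
            consumer.end_record()
        elif code in KNOWN_CODES:
            consumer.line(code, line[2:].strip())
        else:
            # Fail loudly on codes this sketch does not know about.
            raise ValueError("unknown line code %r in line %r" % (code, line))


builder = RecordBuilder()
scan(["ID   X", "MA   /A", "NR   /B", "MA   /C", "//"], builder)
print(builder.records[0]["MA"])  # ['/A', '/C']
```

Because each line is dispatched on its own code, an MA line that follows an NR line is still stored under 'MA', so no rescanning pass is needed; the trade-off is that enforcing the expected line order, if desired, becomes a separate validation step.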
Could you point me to a parser implementation that could serve as a 'template' of
good parser design? Maybe I can merge it with the existing Prosite parser...
Thanks to all of you,
Holger