[Biopython-dev] Pfam24/HMMER3 (and GO terms...)

Mon Oct 19 17:46:20 UTC 2009

I started with a close rewrite of the original PfamScan script to
make sure the Python script works equivalently to the original.  Now
that it works (for basic tests), I will begin putting in better data
interfaces.  The Bio.Pfam.HMM module should work by itself as a
HMMER3 module, but it needs some examples, and probably some work on
making the interface cleaner.  We could also move the code to
Bio.HMMER, rather than having it as a submodule of Bio.Pfam.

This was primarily motivated by the dependency hell associated with
trying to get pfam_scan.pl to work on a cluster.  pfam_scan.pl relies
on BioPerl and Moose.  From the readme: 'Moose itself has quite a few
dependencies, so don't worry if it looks like you're installing half
of CPAN !'.  The code I've produced works within the Biopython
framework with no additional dependencies.  pfam_scan.pl just does
format parsing and table linking.  The heavy work is done in HMMER.
The dependency cost of pfam_scan.pl is just too great, considering its
functionality can be easily replicated in Biopython.
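
To give a rough idea of what "format parsing and table linking" means
here, the sketch below parses a whitespace-delimited hit table and joins
each hit to family metadata by accession.  This is not the Bio.Pfam
code: the five-column layout, the function names parse_hits and
link_families, and the metadata dictionary are all assumptions made for
illustration, and the real HMMER3 and Pfam file formats have more
fields.

```python
# Illustrative sketch only: the column layout and all names below are
# assumptions, not the actual HMMER3 output format or the Bio.Pfam API.


def parse_hits(lines):
    """Parse hit lines of the form: seq_id accession start end evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        seq_id, acc, start, end, evalue = line.split()[:5]
        hits.append({
            "seq_id": seq_id,
            "acc": acc,
            "start": int(start),
            "end": int(end),
            "evalue": float(evalue),
        })
    return hits


def link_families(hits, families):
    """Attach family metadata, looked up by accession, to each hit."""
    linked = []
    for hit in hits:
        info = families.get(hit["acc"], {})  # unknown accession: no extra fields
        linked.append(dict(hit, **info))
    return linked


# Made-up example input standing in for the search program's output.
example_output = [
    "# seq_id  acc      start end evalue",
    "seq1      PF00001  10    250 1.2e-30",
    "seq2      PF00002  5     120 3.4e-05",
]

# Hypothetical in-memory metadata table keyed by family accession.
families = {
    "PF00001": {"name": "7tm_1"},
    "PF00002": {"name": "7tm_2"},
}

annotated = link_families(parse_hits(example_output), families)
for row in annotated:
    print(row["seq_id"], row["acc"], row["name"], row["start"], row["end"])
```

Nothing in this step needs more than the standard library, which is the
point: the expensive part is the search itself, not the bookkeeping
around it.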

> Perhaps I have misunderstood you (and I have not looked at
> the code yet), but have you just re-written the PFAM perl script
> pfam_scan.pl in Python? If so, what is the aim? OK, it might be
> a bit faster - but you would be duplicating the work of the PFAM
> team and creating a long term maintenance burden.
>
> I can see the value of having an HMMER3 output parser, and
> a command line wrapper for calling it. This will be useful for
> things outside of PFAM.
>
> I can see the value of having a pfam_scan.pl output parser (XML,
> CSV, or the possible JSON), and a command line wrapper for
> calling it.
>
> Peter
>