[Biopython-dev] results of applying Clone Digger to the sources of BioPython project

Peter biopython at maubp.freeserve.co.uk
Sat Mar 22 11:35:39 UTC 2008


> Hello.
>
>  The Clone Digger project aims to find software clones (duplicate code) in
>  Python and Java programs.
>
>  I have applied it to the source of BioPython and discovered several clone
>  candidates.
>
>  There are a lot of false positives caused by similar code in
>  nlmmedline_*_format.py files, but maybe other clone candidates will be
>  interesting for you.
>
>  The results can be seen here:
>  http://clonedigger.sourceforge.net/examples.html

Interesting.  Does your tool know to ignore deprecated modules?  For
example, there are cases where we have essentially copied a file from one
location to another and deprecated the original.

Some of these are from scanner/consumer parsers where there are two
alternative consumers turning the data into different object
representations.
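A minimal Python sketch of that pattern, for illustration only. All class names here (RecordScanner, DictConsumer, EntryConsumer, Entry) and the toy "KEY value" format with a "//" record terminator are hypothetical, not the actual Biopython scanner/consumer API. One scanner emits events, and each consumer builds a different representation from the same events. That structure is why two consumers can end up looking like clones of each other.

```python
from dataclasses import dataclass


class RecordScanner:
    """Hypothetical scanner: turns 'KEY value' lines into consumer events."""

    def feed(self, lines, consumer):
        for line in lines:
            line = line.rstrip("\n")
            if line == "//":  # end-of-record marker (assumed format)
                consumer.end_record()
            elif line:
                key, _, value = line.partition(" ")
                consumer.field(key, value.strip())


class DictConsumer:
    """Builds one plain dict per record."""

    def __init__(self):
        self.records = []
        self._current = {}

    def field(self, key, value):
        self._current[key] = value

    def end_record(self):
        self.records.append(self._current)
        self._current = {}


@dataclass
class Entry:
    id: str = ""
    description: str = ""


class EntryConsumer:
    """Builds one Entry object per record from the same events."""

    def __init__(self):
        self.records = []
        self._current = Entry()

    def field(self, key, value):
        if key == "ID":
            self._current.id = value
        elif key == "DE":
            self._current.description = value

    def end_record(self):
        self.records.append(self._current)
        self._current = Entry()


# Usage: the same scanner drives both consumers.
data = ["ID abc1", "DE first entry", "//", "ID xyz2", "DE second entry", "//"]
as_dicts = DictConsumer()
RecordScanner().feed(data, as_dicts)
as_entries = EntryConsumer()
RecordScanner().feed(data, as_entries)
```

The scanning logic exists only once. The consumers repeat the same event-handling shape (field/end_record) with different bodies, which a clone detector may report even though the duplication is intentional.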

Other code, such as the classes providing dictionary-like objects, seems
to reuse a lot of boilerplate and could probably be rationalised into a
base class and subclasses, e.g. in Bio/SwissProt/SProt.py,
Bio/PubMed.py, Bio/GenBank/__init__.py and Bio/Prosite/__init__.py
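One way that rationalisation might look, as a rough sketch only. RecordDict and ExampleDict are hypothetical names, not existing Biopython classes, and the "//"-terminated "KEY value" format is an assumption. The shared base class derives from collections.abc.Mapping, so subclasses only say how to find a record's key and how to parse it. Methods like keys(), items(), get() and `in` come from Mapping instead of being re-implemented in every module.

```python
import io
from collections.abc import Mapping


class RecordDict(Mapping):
    """Hypothetical shared base class: read-only dict of flat-file records.

    Subclasses override get_key() and, optionally, parse().
    """

    terminator = "//"  # assumed end-of-record line

    def __init__(self, handle):
        self._raw = {}  # key -> raw record text
        lines = []
        for line in handle:
            if line.rstrip("\n") == self.terminator:
                text = "".join(lines)
                self._raw[self.get_key(text)] = text
                lines = []
            else:
                lines.append(line)
        # Note: a trailing record without a terminator is ignored here.

    def get_key(self, text):
        raise NotImplementedError("subclasses must define get_key()")

    def parse(self, text):
        # Default: hand back the raw record text unchanged.
        return text

    # The three methods Mapping requires; the rest of the dict API is inherited.
    def __getitem__(self, key):
        return self.parse(self._raw[key])

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


class ExampleDict(RecordDict):
    """Hypothetical subclass: key is the word after 'ID' on the first line."""

    def get_key(self, text):
        return text.splitlines()[0].split()[1]

    def parse(self, text):
        # Map each two-letter line code to the rest of that line.
        return {line[:2]: line[3:].strip() for line in text.splitlines()}


# Usage with an in-memory file.
handle = io.StringIO("ID abc1\nDE first entry\n//\nID xyz2\nDE second entry\n//\n")
records = ExampleDict(handle)
```

With this split, each format-specific module would keep only its own get_key() and parse(). The real modules may also need on-disk indexing or lazy parsing, which this sketch leaves out.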

Cases like Blunt(AbstractCut) and Ov3(AbstractCut), which share
apparently identical catalyse() methods, may fall into the same category.
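A sketch of how a duplicated method could live in one place, under loudly stated assumptions. AbstractCut below is a tiny stand-in, not the real Bio.Restriction class. The search() and catalyse() bodies are simplified, top-strand-only, linear-sequence versions that are purely illustrative. SharedCatalyseMixin and cut_offset are hypothetical names. A mixin (or an intermediate base class) holds the single copy. That way, other AbstractCut subclasses that might need a different catalyse() are not forced to inherit it.

```python
class AbstractCut:
    """Stand-in for the real base class; only what this sketch needs."""

    site = ""
    cut_offset = 0  # hypothetical: top-strand cut position relative to site start

    @classmethod
    def search(cls, sequence):
        """Return 0-based top-strand cut positions for each site occurrence."""
        if not cls.site:
            return []
        positions = []
        start = sequence.find(cls.site)
        while start != -1:
            positions.append(start + cls.cut_offset)
            start = sequence.find(cls.site, start + 1)
        return positions


class SharedCatalyseMixin:
    """Hypothetical: the single copy of the formerly duplicated method."""

    @classmethod
    def catalyse(cls, sequence):
        """Split a linear sequence at every cut position."""
        fragments = []
        previous = 0
        for pos in cls.search(sequence):
            fragments.append(sequence[previous:pos])
            previous = pos
        fragments.append(sequence[previous:])
        return tuple(fragments)


class Blunt(SharedCatalyseMixin, AbstractCut):
    pass


class Ov3(SharedCatalyseMixin, AbstractCut):
    pass


class EcoRV(Blunt):
    site = "GATATC"  # cuts GAT^ATC, leaving blunt ends
    cut_offset = 3


class PstI(Ov3):
    site = "CTGCAG"  # cuts CTGCA^G on the top strand, leaving a 3' overhang
    cut_offset = 5
```

For example, EcoRV.catalyse("AAGATATCTT") splits into "AAGAT" and "ATCTT". Neither Blunt nor Ov3 defines catalyse() itself any more.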

Peter
