[Biopython-dev] test_AlignIO to python 3

Michiel de Hoon mjldehoon at yahoo.com
Mon Jul 5 11:47:21 UTC 2010


--- On Mon, 7/5/10, Tiago Antão <tiagoantao at gmail.com> wrote:
> >> 3. The big one: No sgmllib in p3.
> > A lot of the things using sgmllib are already deprecated (e.g.
> > Bio.NetCatch and Bio.Prosite). I think that leaves just
> > Bio.UniGene and Bio.InterPro - which isn't such a big issue.
> I know very little about those parts of the code, but there
> was an import required for sgmllib in test_AlignIO.

In Bio.UniGene and Bio.InterPro, sgmllib is used for parsing HTML pages, which tends to break easily anyway because the HTML format keeps changing. As a case in point, the parser in Bio.InterPro doesn't seem to work with current HTML pages from InterPro. I haven't tried Bio.UniGene, but since it can also parse UniGene flat files, I doubt that there is a real need to parse UniGene HTML files.

In test_AlignIO, the sgmllib import comes from the SGMLStripper class in Bio.File, which is imported by Bio.ParserSupport, which is imported by Bio.GenBank, which in turn is imported by Bio.SeqIO. But Bio.SeqIO doesn't actually use SGMLStripper, which has been deprecated.

So I suggest that instead of fixing the modules that depend on sgmllib, we replace the relevant pieces of code with code that raises NotImplementedError, and see if anybody complains.
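
As a rough sketch of what such a replacement could look like (the class body, method name and error message below are only illustrative assumptions, not the actual code in Bio.File):

```python
# Illustrative stub only: this is NOT the real Bio.File.SGMLStripper.
# The method name "strip" and the message text are assumptions.
# Note that the module no longer needs "import sgmllib" at the top,
# so importing it works on Python 3.


class SGMLStripper(object):
    """Placeholder for a class that used to rely on sgmllib."""

    def strip(self, text):
        # Fail loudly at call time instead of at import time.
        raise NotImplementedError(
            "SGMLStripper relied on sgmllib, which is not available "
            "in Python 3."
        )
```

With something along these lines, modules that merely import the class (such as the chain leading to test_AlignIO) would keep working, and only code that actually calls it would see the error.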

For the longer term, it would be nice if the code in Bio.GenBank could be moved to Bio.SeqIO, and made independent of Bio.ParserSupport.

--Michiel.


      



