[Biopython-dev] Bioformat module

Andrew Dalke adalke at mindspring.com
Fri Jan 4 05:37:51 EST 2002


Brad:
>I'll second the "Wow! That's cool" from Jeff :-).

Thanks to both of you!  And I guess you're running
Python 2.2, since I have some 'yield' statements
in there.  :)

> After some
>small modifications to the GenBank format, I got GenBank minimally
>working with it.

There are going to be a few more changes.  I've been working on
standard tag names for things like identifiers, cross-references,
sequence, and features (with qualifiers).  It seems to work
well with SWISS-PROT and EMBL.  The idea is to do

   Std.dbid(UntilSep(delimiter = ";"), {"type": "accession"})

and it puts in the correct tags.

(BTW, I'm going to change "delimiter" to "sep".)
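
As a rough illustration of the idea only: the sketch below is plain
Python, not the real Martel or Bioformat API.  The helpers `until_sep`
and `dbid`, the "dbid" element name, and the sample line are all
hypothetical stand-ins.  It shows a wrapper attaching a standard tag
and attributes to whatever the inner expression matched.

```python
from xml.sax.saxutils import escape, quoteattr


def until_sep(text, sep):
    """Hypothetical helper: return the text up to (not including) sep."""
    head, found, _rest = text.partition(sep)
    if not found:
        raise ValueError("separator %r not found" % sep)
    return head


def dbid(value, attrs):
    """Hypothetical helper: wrap value in a standard 'dbid' element."""
    attr_text = "".join(
        " %s=%s" % (name, quoteattr(val)) for name, val in sorted(attrs.items())
    )
    return "<dbid%s>%s</dbid>" % (attr_text, escape(value))


# Made-up sample line with two accession numbers separated by ";".
line = "P12345; Q67890;"
tagged = dbid(until_sep(line, sep=";"), {"type": "accession"})
print(tagged)
```

The point is that the format author only says "this field is an
accession identifier" and the wrapper decides what the tag looks like.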

>Attached is the format registration stuff, that
>goes in Bioformats/formats/genbank.py for anyone who is interested
>in duplicating this.

Wasn't attached.

>>>> infile.seek(0)

Shouldn't need that.  The identification code should always
reseek the file to the beginning after it's finished.
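
A minimal sketch of that behaviour, assuming a seekable text handle.
The `identify` function and the `SIGNATURES` table are hypothetical
and much simpler than real format identification; the only point is
that the handle is rewound whether or not a format is recognised.

```python
import io

# Hypothetical signature table: format name -> expected start of file.
SIGNATURES = {
    "genbank": "LOCUS",
    "embl": "ID   ",
}


def identify(infile, signatures=SIGNATURES, peek=64):
    """Return the name of the first matching format, or None.

    The handle is always put back at the beginning, so the caller
    does not need its own infile.seek(0).
    """
    try:
        head = infile.read(peek)
    finally:
        infile.seek(0)  # rewind even if the read raised
    for name, prefix in signatures.items():
        if head.startswith(prefix):
            return name
    return None


handle = io.StringIO("LOCUS       EXAMPLE  100 bp  DNA\n")
format_name = identify(handle)
position_after = handle.tell()
print(format_name, position_after)
```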

>I'm definitely +1 on checking this into CVS. It seems along the
>same spirit as what Thomas was working on in Bio/SeqIO/generic, but
>integrates well with Martel.

It was.  I looked through the mailings to make sure I read
his (and others') discussions.  It's also (IMNSHO) much better
than the Bioperl and BioJava code because it can handle
non-sequence formats, like BLAST results, as well.

Should it be under Bio (Bio.Bioformats) or parallel to it?
Unlike Martel, I don't see it as being distributed outside
of Biopython, so I would think under.  And I think the
Biopython code will have hooks to it as well.  Okay, so under
it is.

> I'm not sure if I really have the full
>picture of everything yet, but from what I see it looks good!

I'm giving a short talk Friday morning.  I think I know what
I'm doing well enough now that tomorrow evening I should be
able to write an overview level description of the project.

BTW, for me it was even harder to figure out the full picture.
I had to do one piece at a time until it finally started to
come together.

>I'm excited about the mixin stuff as well -- it seems like it'll
>really simplify a lot of repetitive coding for adding new formats. 
>Too bad I already did all the repetitive coding for GenBank :-).

That was part of the small pieces -- see what works well then
try to abstract from there.

Mixins, however, turned out to be a dead end.  There was a problem
when multiple mixins wanted the same events.  There was also the
annoyance of having to prefix all object variables with __ in the
hopes of not getting conflicts with other classes.  So I used a different
approach which actually makes things easier to understand, I hope.
Like I said, tomorrow evening... Hopefully.
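
To make the mixin problem concrete, here is a small sketch.  The class
and method names are made up for illustration and are not the actual
Bioformat code.  Both mixins want the same event, but only the first
one in the method resolution order receives it, and each has to use a
__ prefix so that their attributes do not collide.

```python
class FeatureMixin:
    def __init__(self):
        self.__seen = []  # name-mangled to _FeatureMixin__seen

    def start_element(self, name):
        self.__seen.append(("feature", name))


class XrefMixin:
    def __init__(self):
        self.__seen = []  # name-mangled to _XrefMixin__seen

    def start_element(self, name):
        self.__seen.append(("xref", name))


class Handler(FeatureMixin, XrefMixin):
    def __init__(self):
        FeatureMixin.__init__(self)
        XrefMixin.__init__(self)


handler = Handler()
handler.start_element("entry")

# Only FeatureMixin.start_element ran; XrefMixin never saw the event.
feature_events = handler._FeatureMixin__seen
xref_events = handler._XrefMixin__seen
print(feature_events)
print(xref_events)
```

Without the __ prefix both mixins would share a single `seen` list,
which is the conflict the double underscores are meant to avoid.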

                    Andrew
                    dalke at dalkescientific.com




