[Biopython-dev] [Biopython] Google Summer of Code Project: SearchIO in Biopython

Peter Cock p.j.a.cock at googlemail.com
Mon Apr 30 09:49:27 UTC 2012


On Sun, Apr 29, 2012 at 5:42 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
>
> I think I got the gist of it (please correct me if I'm wrong). Some
> information about the search, such as the sequence-wide e-value, may
> not be present at the HSP level. Ignoring it could let us focus on a
> perhaps simpler and more flexible implementation with better
> performance, but at the cost of the usefulness of the data itself, since
> we are throwing away information.

Yes.

> What I have in mind now is actually closer to iteration on the
> query+subject level. To be clear first, the hierarchy of the objects
> that I propose is this:
>
> * Search object, to represent the entire search session.
> * Result object, to represent a search with one query against the
> database. Depending on the number of queries, we could have one to
> several Result objects contained in a Search.
> * Hit object, to represent a sequence hit. Depending on the search, we
> could also have multiple Hits in one Result object.
> * and finally, HSP object, to represent individual alignments.
>
> Iteration is done at the Results level, so the information is parsed
> per search query rather than per single HSP (I also wrote a very
> short description of what I'm planning the objects to be here:
> http://bit.ly/searchio-terms). I suppose if we aim for maximum
> information parsing over performance and simplicity of the
> format-specific parsers, this is the way to go. There are other
> formats, too, that contain sequence-level search information not
> present in the alignment (e.g. HMMER text output). What do you think
> about this?

That sounds good.
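
For concreteness, here is a rough sketch of how I read the proposed
Result/Hit/HSP hierarchy with iteration at the Result level. All class,
attribute and function names are placeholders of mine, not existing
Biopython API, and the flat tuples stand in for whatever a real
format-specific parser would produce:

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class HSP:
    """One individual alignment between the query and a hit."""
    evalue: float


@dataclass
class Hit:
    """One database sequence matched by the query; holds its HSPs."""
    id: str
    # Sequence-level value that may not be repeated on each HSP.
    evalue: Optional[float] = None
    hsps: List[HSP] = field(default_factory=list)


@dataclass
class Result:
    """One query searched against the database; holds its Hits."""
    query_id: str
    hits: List[Hit] = field(default_factory=list)


def iter_results(records) -> Iterator[Result]:
    """Yield one Result per query (hypothetical helper).

    `records` is assumed to be an iterable of
    (query_id, hit_id, hit_evalue, hsp_evalue) tuples, already
    grouped by query and then by hit.
    """
    current = None
    for query_id, hit_id, hit_evalue, hsp_evalue in records:
        # A new query ID closes the previous Result and starts a new one.
        if current is None or current.query_id != query_id:
            if current is not None:
                yield current
            current = Result(query_id)
        # A new hit ID within the same query starts a new Hit.
        if not current.hits or current.hits[-1].id != hit_id:
            current.hits.append(Hit(hit_id, hit_evalue))
        current.hits[-1].hsps.append(HSP(hsp_evalue))
    if current is not None:
        yield current


# Made-up example data: two queries, the first with two hits.
records = [
    ("query1", "subjA", 1e-30, 1e-20),
    ("query1", "subjA", 1e-30, 1e-5),
    ("query1", "subjB", 0.01, 0.01),
    ("query2", "subjA", 1e-3, 1e-3),
]
results = list(iter_results(records))
```

The point of the sketch is only that the hit-level e-value is stored
once on the Hit rather than on each HSP, and that the caller gets one
Result per query. The Search object is deliberately left out.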

If iteration is done on the Results level, when/how would your
Search object be used?

Peter



