[Bioperl-l] porting from BlastXXX to GenericXXX

Steve Chervitz sac@bioperl.org
Thu, 20 Jun 2002 01:24:47 -0700 (PDT)


--- Jason Stajich <jason@cgt.mc.duke.edu> wrote:
> Steve - I'm slowly getting this going again - I've ported seq_inds to the
> GenericHSP object.  This will work with FASTA HSPs too as the objects
> seem to be sufficiently generic (a couple of places where I do
> if($hsp->algorithm =~ /FAST/ ) for fasta specific things).
>
> I've also added the convenience methods
> GenericHit->hsps()
> 
> GenericResult->hits()
> 
> to allow retrieval of the contained objects in a list manner in addition
> to the next_XX() / rewind() iterator methods.
> 
> My priorities in this are to get Bio::SearchIO::Writer::XXTableWriters
> to work for the blast objects and see about adding parser for a couple of
> other pairwise alignment outputs.  If all is well I'd like to see about
> moving the psiblast code into the blast.pm if possible.   This would mean
> adding an iteration method to the Hit objects and what else?  

There's also the found_again() method to indicate if the hit was found in a
previous iteration. The new() method also needs to accept -iteration and
-found_again parameters.

> Should it
> be peeled off into a separate 'psiblast' parsing module after all?

Probably not, since psiblast reports are so similar to non-psiblast reports. If
we wanted to do fancy tricks like in BPpsilite.pm which allows you to jump
right to a specific iteration, then maybe. But I don't know how important this
ability is.

There's also the possibility of having a special psiblast hit object. But I don't
think we need to go there. For non-psiblast hits, iteration() should always
return 1 and found_again() false.
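
A minimal sketch of those defaults, assuming a simple hash-based hit class.
Sketch::Hit and its -name parameter are hypothetical stand-ins for
illustration, not the actual GenericHit code; only -iteration, -found_again,
iteration() and found_again() come from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a hit class (an assumption, not the real API).
package Sketch::Hit;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{name} = $args{-name};
    # Non-psiblast parsers simply omit these two arguments, so the
    # defaults are iteration 1 and "not found again".
    $self->{iteration}   = defined $args{-iteration} ? $args{-iteration} : 1;
    $self->{found_again} = $args{-found_again} ? 1 : 0;
    return $self;
}

sub name        { $_[0]->{name} }
sub iteration   { $_[0]->{iteration} }
sub found_again { $_[0]->{found_again} }

package main;

# A plain blast hit and a psiblast hit seen again in round 2.
my $blast_hit = Sketch::Hit->new(-name => 'hitA');
my $psi_hit   = Sketch::Hit->new(-name => 'hitA',
                                 -iteration   => 2,
                                 -found_again => 1);

printf "%s: iteration=%d found_again=%d\n",
    $_->name, $_->iteration, $_->found_again
    for $blast_hit, $psi_hit;
```

With this arrangement, code that consumes hits can call iteration() and
found_again() unconditionally, whichever kind of report was parsed.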

> At any rate we'll have to agree on some common methods that need to be
> established in the interfaces first.  There is some variation in what
> we've each been using: evalue/expect, signif/significance, etc.

I don't see a problem with including all variant method names for the sake of
backward compatibility. If you want to get strict, you could deprecate certain
ones, but I don't think that's necessary.
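
One way to carry the variant names without duplicating code is to alias them
to a single implementation. A rough sketch, assuming a hypothetical
Sketch::HSP class; which name counts as canonical in each pair is an arbitrary
choice made here for illustration:

```perl
use strict;
use warnings;

# Hypothetical class for illustration; not the real HSP interface.
package Sketch::HSP;

sub new {
    my ($class, %args) = @_;
    return bless {
        evalue       => $args{-evalue},
        significance => $args{-significance},
    }, $class;
}

# Canonical accessors (the choice of canonical name is an assumption).
sub evalue       { $_[0]->{evalue} }
sub significance { $_[0]->{significance} }

# Variant names are typeglob aliases of the canonical accessors, so
# older callers keep working and there is one implementation each.
{
    no warnings 'once';
    *expect = \&evalue;
    *signif = \&significance;
}

package main;

my $hsp = Sketch::HSP->new(-evalue => 1e-20, -significance => 1e-18);
print "$_ => ", $hsp->$_(), "\n"
    for qw(evalue expect significance signif);
```

Deprecating a name later would only mean replacing its alias with a small
wrapper that warns and then calls the canonical method.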

> I think I'll go ahead and port/write the SearchIO::emboss parser which
> will produce Search objects from EMBOSS water/needle alignments - we
> already parse these in AlignIO but they produce Align::AlignI objects and
> someone may want to have Search::HSP objects.  I'm still undecided as to
> how we should merge pairwise and MSA objects into a common framework -
> right now we can convert from HSP to Align::AlignI but not in the reverse
> for obvious reasons.  Perhaps this is good enough?

I haven't thought about this problem that much, but one way to unite the
pairwise and MSA worlds would be to say that an MSA is a collection of pairwise
alignments, where one sequence (the reference) is common across all pairwise
alignments in the set. You'd need some way to flag which is the reference seq,
perhaps by just storing the id of the ref sequence in a slot within the MSA
object.
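
To make the idea concrete, here is a rough sketch, assuming each pairwise
alignment is reduced to a hash with two sequence ids. Sketch::RefMSA, add_pair,
member_ids and the id1/id2 keys are all hypothetical names, not existing
Align classes; only the "ref id stored in a slot" part is from the paragraph
above.

```perl
use strict;
use warnings;

# Hypothetical container: an MSA as a set of pairwise alignments
# that all share one reference sequence.
package Sketch::RefMSA;

sub new {
    my ($class, %args) = @_;
    # The reference is flagged by storing its id in a slot.
    return bless { ref_id => $args{-ref_id}, pairs => [] }, $class;
}

sub ref_id { $_[0]->{ref_id} }

# Accept a pairwise alignment only if one member is the reference.
sub add_pair {
    my ($self, $pair) = @_;
    my ($id1, $id2) = ($pair->{id1}, $pair->{id2});
    die "pair lacks reference sequence " . $self->ref_id . "\n"
        unless $id1 eq $self->ref_id || $id2 eq $self->ref_id;
    push @{ $self->{pairs} }, $pair;
    return $self;
}

# Unique sequence ids in the MSA, reference first.
sub member_ids {
    my ($self) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
        $self->ref_id(),
        map { ($_->{id1}, $_->{id2}) } @{ $self->{pairs} };
}

package main;

my $msa = Sketch::RefMSA->new(-ref_id => 'seqR');
$msa->add_pair({ id1 => 'seqR', id2 => 'seqA' });
$msa->add_pair({ id1 => 'seqB', id2 => 'seqR' });

my @ids = $msa->member_ids;
print join(',', @ids), "\n";

# A pair that does not involve the reference is refused.
my $ok = eval { $msa->add_pair({ id1 => 'seqA', id2 => 'seqB' }); 1 };
print "rejected pair without reference\n" unless $ok;
```

The sketch leaves out the harder part, which is reconciling gap positions in
the reference across the individual pairwise alignments.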

Steve
 
> -jason
> -- 
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu



=====
Steve Chervitz
sac@bioperl.org
