[Bioperl-l] experimental Bio::Search::Tiling implementation
Mark A. Jensen
maj at fortinbras.us
Tue May 19 23:16:55 UTC 2009
Thanks Steve--great idea. It would be great if users who have had
any issues with Bioperl tiling (or any other algorithm, for that matter)
on particular datasets would send them along. I will enter an enhancement
bug report for this purpose; folks can attach their problem data to it.
(P.S. to all; there are also some rudimentary run tests at
----- Original Message -----
From: "Steve Chervitz" <sac at bioperl.org>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Tuesday, May 19, 2009 6:21 PM
Subject: Re: [Bioperl-l] experimental Bio::Search::Tiling implementation
Great work. My SearchUtils tiling function has been lingering for far
too long (at least a decade).
Your comment about BLASTP is fitting. I was working almost exclusively
with BLASTP when developing the original tiling function and it seems
like the trouble ensued when using it with other blast flavors. There
was insufficient exploration of blast alignment edge cases. It would
be good to come up with a comprehensive collection of blast reports to
stress test your tiling impl. The set currently in t/data is a good
start, but may not be sufficient.
On Tue, May 19, 2009 at 12:31 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> Hi All-
> With the frequent posts concerning HSP tilings, I thought it was time
> to create the sought-after Bio::Search::Tiling namespace, and attempt
> to provide a robust and exact tiling algorithm. I think it's timely,
> too, since Jason's usual remarks involve the use of wu-blast with
> the --links option, and wu-blast has recently turned commercial and
> is evidently costly to obtain.
> The namespace includes an abstract interface B:S:Tiling::TilingI, and
> a concrete class called B:S:Tiling::MapTiling. The object is
> constructed like so
> $tiling = Bio::Search::Tiling::MapTiling->new($your_blast_hit);
> and provides methods for identities(), conserved(), and length();
> other stats could also be provided. Identities and conserved sites are
> correctly estimated, accounting for multiple overlapping HSPs. There
> is also a method next_tiling($type), where $type is 'hit', 'subject'
> (alias for 'hit'), or 'query', which is an iterator stepping through all
> minimal sets of HSPs that completely cover the 'hit' or 'query'
> sequence. One feature is that the individual tilings do not need to be
> generated to estimate the statistics; next_tiling provides the individual
> tilings only if you want/need them.
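
To make the overlap issue concrete, here is a minimal sketch in plain Perl. It is not the MapTiling code, and the HSP coordinates and the helper names naive_length and covered_length are made up for illustration. It only shows why summing per-HSP lengths overcounts when HSPs overlap, and how merging the intervals first gives the length actually covered.

```perl
use strict;
use warnings;

# Hypothetical HSP coordinates on the query, as [start, end]
# (1-based, inclusive). Not taken from any real BLAST report.
my @hsps = ( [1, 100], [80, 150], [300, 350] );

# Naive total: sums each HSP length, so overlapped positions count twice.
sub naive_length {
    my @iv  = @_;
    my $sum = 0;
    $sum += $_->[1] - $_->[0] + 1 for @iv;
    return $sum;
}

# Covered length: merge overlapping intervals, then sum the merged spans.
sub covered_length {
    my @iv = sort { $a->[0] <=> $b->[0] } @_;
    my ($total, $cur_start, $cur_end) = (0);
    for my $iv (@iv) {
        my ($s, $e) = @$iv;
        if (!defined $cur_start) {
            ($cur_start, $cur_end) = ($s, $e);
        }
        elsif ($s <= $cur_end) {
            # Overlaps the current span: extend it if needed.
            $cur_end = $e if $e > $cur_end;
        }
        else {
            # Gap: close the current span and start a new one.
            $total += $cur_end - $cur_start + 1;
            ($cur_start, $cur_end) = ($s, $e);
        }
    }
    $total += $cur_end - $cur_start + 1 if defined $cur_start;
    return $total;
}

printf "naive: %d, covered: %d\n",
    naive_length(@hsps), covered_length(@hsps);
# prints "naive: 222, covered: 201"
```

The same double counting affects identities and conserved sites, which is presumably why those stats need a tiling rather than a simple sum over HSPs.
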
> I've made it available in a pre-alpha state on bioperl-dev. It's
> working and workable with plenty of pod: see the synopses. It would
> be excellent if interested folks would try it out on their favorite
> data. Some niceties are not yet implemented, so BLASTP data is your
> best bet for success. Check it out via svn into a separate working
> directory, and let me know if there are any questions.
> Below is a table of comparison numbers using the current SearchUtils
> tiling implementation and some of the new methods, on some test data
> in t/data. Please see pod for many more details.
> Comparison of methods with (patched) Bio::Search::SearchUtils
> using test data t/data/dcr1_sp.WUBLASTP
> SU: SearchUtils
> MT: MapTiling, using methods 'exact', 'est', 'max'
> so MT(q:x) is MapTiling, stats calculated on the query, with the exact
> method, etc.
> Hit SU MT(q:x) MT(q:e) MT(q:m) MT(s:x) MT(s:e) MT(s:m)
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org