[Bioperl-l] arbitrary hashes, blast, statistics, parameters, and java interoperability

Aaron J. Mackey amackey at pcbi.upenn.edu
Fri May 14 07:57:42 EDT 2004


I'd be happy to see the statistics and parameters turn into full 
objects, and could even imagine some useful functions that a 
Bio::Search::Statistics::BLAST object might provide:

my $stats = $result->statistics;

# use the report's database size to get a bit score threshold
# that corresponds to a given expectation threshold:
my $bitscore_threshold = $stats->E_to_bits(1e-6);

# vice versa:
my $expect_threshold = $stats->bits_to_E(32.0);

# calculate a bitscore or expectation for a given comparison:
my $bitscore = $stats->bitscore($rawscore, $querylen, $liblen);
my $exp = $stats->expect($rawscore, $querylen, $liblen);

# Make Warren Gish happy:
my $nats = $stats->bits_to_nats($bitscore);
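
A minimal sketch of how such conversions might be implemented, assuming the standard Karlin-Altschul relations: bit score S' = (lambda * S - ln K) / ln 2, and E = N * 2^(-S'), where N is the effective search space. The package name My::BlastStats, its constructor arguments, and the raw_to_bits method are hypothetical and not existing BioPerl API. The per-comparison length arguments shown above are folded into a single search_space value here, and the numbers are illustrative only.

```perl
use strict;
use warnings;

package My::BlastStats;    # hypothetical name, not a real BioPerl module

sub new {
    my ($class, %args) = @_;
    # lambda and K are the Karlin-Altschul parameters from the report;
    # search_space is the effective query length times database length.
    my $self = {
        lambda       => $args{lambda},
        K            => $args{K},
        search_space => $args{search_space},
    };
    return bless $self, $class;
}

# raw score -> bit score: S' = (lambda * S - ln K) / ln 2
sub raw_to_bits {
    my ($self, $raw) = @_;
    return ($self->{lambda} * $raw - log($self->{K})) / log(2);
}

# bit score -> expectation: E = N * 2^(-S')
sub bits_to_E {
    my ($self, $bits) = @_;
    return $self->{search_space} * 2 ** (-$bits);
}

# expectation -> bit score: S' = log2(N / E)
sub E_to_bits {
    my ($self, $expect) = @_;
    return log($self->{search_space} / $expect) / log(2);
}

# bits -> nats: multiply by ln 2
sub bits_to_nats {
    my ($self, $bits) = @_;
    return $bits * log(2);
}

package main;

# Illustrative values only; real ones would come from the parsed report.
my $stats = My::BlastStats->new(
    lambda       => 0.267,
    K            => 0.041,
    search_space => 1e9,
);
printf "bit score threshold for E=1e-6: %.2f\n", $stats->E_to_bits(1e-6);
printf "expectation for 32 bits: %.3g\n",        $stats->bits_to_E(32.0);
```

Because E_to_bits and bits_to_E are inverses for a fixed search space, a threshold converted one way and back should return the original value.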

I realize you (and 99.9% of the world) only care about BLAST statistics 
and parameters, but I really do think you should subclass these things 
so that we can plug in others when/if necessary.  I would think that 
all an interface should guarantee are generic data access methods 
(get_param, set_param, etc).

$stats->set_param( Lambda => 0.123 );
$stats->set_param( K      => 0.002 );

Specific subclasses might include direct parameter access:

   $blaststats->lambda(0.123);
   $blaststats->K(0.002);

But we shouldn't try to agree on "universal" statistical parameters, 
because they really don't exist.
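
To make the split concrete, here is a rough sketch of a generic base class with a subclass that layers named accessors on top. All package names (My::Search::Statistics, My::Search::Statistics::BLAST) and the param_names helper are hypothetical; this is one possible shape, not an agreed design.

```perl
use strict;
use warnings;

# Generic container: only promises get_param/set_param (hypothetical name).
package My::Search::Statistics;

sub new {
    my ($class) = @_;
    return bless { _params => {} }, $class;
}

sub set_param {
    my ($self, $name, $value) = @_;
    $self->{_params}{$name} = $value;
    return $value;
}

sub get_param {
    my ($self, $name) = @_;
    return $self->{_params}{$name};
}

# Sorted list of the parameter names currently stored.
sub param_names {
    my ($self) = @_;
    return sort keys %{ $self->{_params} };
}

# BLAST-specific subclass: adds direct accessors on top of the generic store.
package My::Search::Statistics::BLAST;
our @ISA = ('My::Search::Statistics');

sub lambda {
    my $self = shift;
    $self->set_param(Lambda => shift) if @_;
    return $self->get_param('Lambda');
}

sub K {
    my $self = shift;
    $self->set_param(K => shift) if @_;
    return $self->get_param('K');
}

package main;

my $blaststats = My::Search::Statistics::BLAST->new;
$blaststats->lambda(0.123);             # direct accessor
$blaststats->set_param(K => 0.002);     # generic access, same storage
print join(", ", $blaststats->param_names), "\n";
```

Code that only knows the generic interface keeps working with any subclass, while BLAST-aware code gets the convenience accessors.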

In terms of run-time parameters, I would guess that a 
Bio::Tools::Run::ParameterI kinda thing would be appropriate; that way, 
you could build a runtime parameter object, pass it off to the 
runnable, and get a result object back that included the (possibly 
modified) parameter object.
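
A rough sketch of that round trip, under stated assumptions: My::RunParameters and My::Runnable are hypothetical stand-ins (no Bio::Tools::Run::ParameterI exists in this form), the "run" does no real work, and the default of expect => 10 is only illustrative.

```perl
use strict;
use warnings;

# Hypothetical run-time parameter object.
package My::RunParameters;

sub new {
    my ($class, %params) = @_;
    return bless { %params }, $class;
}

sub get_param { my ($self, $name) = @_; return $self->{$name}; }

sub set_param {
    my ($self, $name, $value) = @_;
    $self->{$name} = $value;
    return $value;
}

# Shallow copy, so the runnable never alters the caller's object.
sub clone {
    my ($self) = @_;
    return bless { %$self }, ref $self;
}

# Hypothetical runnable: accepts parameters, returns a result that
# carries the parameters that were actually in effect.
package My::Runnable;

sub run {
    my ($class, $params) = @_;
    my $effective = $params->clone;
    # Fill in a default the caller did not specify (illustrative value).
    $effective->set_param(expect => 10)
        unless defined $effective->get_param('expect');
    # A real implementation would invoke the program here.
    return { hits => [], parameters => $effective };
}

package main;

my $params = My::RunParameters->new(program => 'blastp');
my $result = My::Runnable->run($params);
print "effective expect: ", $result->{parameters}->get_param('expect'), "\n";
```

Returning a copy rather than mutating the caller's object is a design choice made for this sketch; it keeps the requested and effective parameters distinguishable.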

-Aaron

On May 14, 2004, at 12:52 AM, Chad Matsalla wrote:

>
> Greetings all,
>
> I am writing a web service that provides Bio::Search::Result objects to
> a Java client. Yes, this does work and yes, it is very kewl.
>
> I created UML models for all of the components required to produce a
> Bio::Search::Result (Bio::Seq, Bio::HitI, etc) and used a code
> generation system to create Java classes that match. Would you like me
> to contribute this UML model (XMI format) to the project? I notice that
> the UML for Bioperl is a bit... dated.
>
> Anyway...
>
> I tell a Java client to ask for a Bio::Search::Result from a SOAP::Lite
> service. This works, until...
>
> The _statistics and _parameters attributes of a Bio::Search::Result
> object are hashes.  Although Java has a corresponding Hashtable class,
> it is not smart enough to deserialize a perl hash in an efficient,
> hack-free manner.
>
> I propose creating a SearchStatistics module that would hold these
> statistics and a SearchParameters object that would hold the
> parameters.
>
> I understand that hashes are used when you need an arbitrary data
> structure. At least in the case of Blast we know what the keys in a
> statistics and parameters hashtable are going to be so why not have
> objects?
>
> At this time, I really only care about Blast results. Does anybody see
> why I should not change those two parameters to refer to objects rather
> than hashes in the Blast parts of the SearchIO subsystem?
>
> In the case that I create, for example, a SearchStatistics object I
> think that code based on the fact that _statistics is a hash would not
> break because _statistics is still a hash- it is just an object hash.
>
> Can anybody suggest what package these modules should belong to?
>
> I'm very eager to do this so unless there are reasonable objections I
> will do it this weekend. If it suddenly breaks tests or something I can
> undo it.
>
> I have invested significant time in Java<->BioPerl interoperability
> over web services and if anybody is interested in my work just give
> me a shout (ISMB/BOSC?).
>
> Thanks!
>
> Chad Matsalla
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey at pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697
