[Bioperl-l] Bioperl-run: Testing alignments generated externally

Thu Oct 26 22:01:08 UTC 2006

I have been running into similar issues with the EUtilities tests.  Since the
data on the server is constantly updated, I have to try to future-proof the
tests so they don't constantly fail.

I have been using Test::More and like/unlike or cmp_ok to get around some of
those 'fuzzy data' issues.  If some methods consistently return a particular
type of value, such as an integer, you could use:

like($foo->get_value, qr{^\d+$}, 'value test'); #integer

or similar.
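
To make that a bit more concrete, here is a rough sketch of the kind of
'fuzzy' checks I mean. My::MockResult and its accessors are made up purely
for illustration (they are not part of BioPerl or EUtilities); the only real
calls are the Test::More functions like, unlike and cmp_ok.

```perl
use strict;
use warnings;
use Test::More tests => 4;

# Hypothetical stand-in for an object whose values come from a remote
# server and drift over time. Not a real BioPerl class.
package My::MockResult {
    sub new       { my ($class, %args) = @_; return bless {%args}, $class }
    sub get_count { $_[0]{count} }
    sub get_db    { $_[0]{db} }
    sub get_error { $_[0]{error} // '' }
}

my $res = My::MockResult->new(count => 1532, db => 'pubmed');

# Check the shape of the value rather than the exact number,
# since the exact number changes as the server data is updated.
like($res->get_count, qr{^\d+$}, 'count is an integer');
cmp_ok($res->get_count, '>', 0, 'count is positive');

# Same idea for strings: match a pattern instead of a fixed value.
like($res->get_db, qr{^\w+$}, 'db name is a single word');
unlike($res->get_error, qr{error}i, 'no error message reported');
```

The trade-off is that these tests only catch gross breakage (wrong type,
empty result, error text), not subtle changes in the data itself.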

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> Nathan -
> 
> I agree - the values tend to change with different versions of the
> applications unfortunately.  It would make sense to just test that
> you get out sequences that are in valid alignment format and perhaps
> have as many ending sequences as you started with.   The more
> restrictive tests probably aren't reliable with mixing and matching
> versions.
> 
> One thing we do for PAML is condition tests on the version used - but
> of course when a new version comes out we have to add more stuff to
> the tests (or just have some code that skips those tests).
> 
> -jason
> On Oct 26, 2006, at 3:33 AM, Nathan Haigh wrote:
> 
> > Remo Sanges wrote:
> >> Nathan Haigh wrote:
> >>> Sendu Bala wrote:
> >>>
> >>>> Nathan Haigh wrote:
> >>>>
> >>>>> I'm thinking that it's not wise to test for things like
> >>>>> overall_percentage_identity etc in alignments that are generated by
> >>>>> external software like T-Coffee, Clustalw etc. Changes to software
> >>>>> algorithms/efficiency, bug fixes etc may well alter the quality of the
> >>>>> alignment produced in different versions and thus affect the value
> >>>>> returned by such methods. Therefore, I think these methods should only
> >>>>> be tested from alignments loaded directly from t/data.
> >>>>>
> >>>> Did you discover some specific problem cases?
> >>>>
> >>> My messages seem to be taking a while to come through, but, yes. It may
> >>> be due to the software changing default parameters, but it makes testing
> >>> the output for specific details pretty difficult and inconsistent. For
> >>> example, running T-Coffee, the following command from t/TCoffee.t
> >>> results in a slightly different alignment:
> >>> $aln = $factory->run('-type' => 'profile',
> >>>                      '-profile' => $aln1,
> >>>                      '-seq'  =>
> >>>                          Bio::Root::IO->catfile("t","data","cysprot1b.fa"));
> >>>
> >>> Of particular note are the gaps on the last line of the sequences. In
> >>> 4.45, there are two gaps in CATH_RAT/1-133 ('gk-nm---cg') whereas in
> >>> <v4.45 this is ('gkn----mcg').
> >>>
> >> I'm not a T-Coffee user but usually you can come across
> >> these problems when you use different scoring parameters
> >> when aligning sequences.
> >>
> >> Could it be possible that they have simply changed the
> >> default parameters for gap penalties and that kind of
> >> stuff? Is it possible to set them?
> >>
> >> If so you can just run the test by defining
> >> the scores in the param hash without using the default.
> >>
> >> HTH
> >>
> >> Remo
> > That is true, but it depends on whether the wrapper is complete
> > enough to be able to set all the parameters provided by the software.
> >
> > Nath
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich, PhD
> Miller Research Fellow
> University of California
> Dept of Plant and Microbial Biology
> 321 Koshland Hall #3102
> Berkeley, CA 94720-3102
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html