[Bioperl-l] About to tag the last RC...

Chris Fields cjfields at illinois.edu
Thu Jan 15 05:17:31 UTC 2009

On Jan 14, 2009, at 5:45 PM, Scott Markel wrote:

> Chris,
> We've been testing 1.6 RC2 with our set of nightly Pipeline Pilot  
> regressions and have noticed a few issues.  Sorry we couldn't get  
> this feedback to you sooner.
> 1) There is a problem with the output filename for bl2seq on  
> Windows.  In response to bug 2707, quotemeta was used when building  
> the parameter string at line 507 in  
> Bio::Tools::Run::StandAloneNCBIBlast (1.5.9_2).  This causes a  
> problem with the path to the output file on Windows.  For example,  
> "C:\DOCUME~1\outfile" becomes "C\:\\DOCUME\~1\\outfile".  bl2seq  
> can't open the output file and fails.

I've added an OS check for that so this isn't used with Windows (I  
wondered whether quotemeta would bite me there).  I'm seriously  
considering ripping out that code altogether, though.  I'm not sure we  
want to wade into attempting to accurately escape shell chars simply  
based on OS differences.
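
For illustration only, a minimal sketch of that kind of OS check.  The
helper name escape_for_shell is hypothetical and this is not the actual
code in Bio::Tools::Run::StandAloneNCBIBlast; it assumes $^O reports
'MSWin32' on native Windows Perl builds.

```perl
use strict;
use warnings;

# Hypothetical helper: apply quotemeta only on non-Windows systems.
# $os defaults to Perl's $^O ('MSWin32' on native Windows builds).
sub escape_for_shell {
    my ($value, $os) = @_;
    $os = $^O unless defined $os;
    return $value if $os eq 'MSWin32';   # leave Windows paths untouched
    return quotemeta($value);            # backslash-escape every non-word char
}

my $path = 'C:\DOCUME~1\outfile';
print escape_for_shell($path, 'MSWin32'), "\n";  # C:\DOCUME~1\outfile
print escape_for_shell($path, 'linux'),   "\n";  # C\:\\DOCUME\~1\\outfile
```

The second line of output is the mangled form from Scott's report, which
is why the path has to be passed through unchanged on Windows.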

> 2) Parsing megablast output (format 2) with Bio::SearchIO::blast.pm  
> now returns an algorithm name of "BLASTN" instead of "MEGABLAST".   
> This change seems to have been introduced in revision 11579 of  
> blast.pm when a couple regex changes were made (lines 452 and 1201  
> of blast.pm in 1.5.9_2).  Subbing in the old regular expression for  
> megablast in line 452 returned the correct "MEGABLAST" algorithm name.

I worked out why that regex isn't working (it doesn't match MEGABLAST  
at all).  I fixed it and added a test to the MEGABLAST test suite that  
checks the algorithm name; it seems to work now.
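
As a rough sketch of the idea (not the regex actually used in
Bio::SearchIO::blast; the helper name algorithm_from_header and the
sample header lines are made up for illustration), a pattern that
lists MEGABLAST explicitly alongside the other program names could
look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: take the program name from the start of a
# report header line; returns undef if no known name is found.
sub algorithm_from_header {
    my ($line) = @_;
    # MEGABLAST is its own alternative, so it is not reported as BLASTN.
    return $line =~ /^(MEGABLAST|T?BLAST[NPX])\b/ ? $1 : undef;
}

# Illustrative header lines, not taken from a real report.
for my $header ('MEGABLAST 2.2.18 [Mar-02-2008]', 'BLASTN 2.2.18 [Mar-02-2008]') {
    my $algo = algorithm_from_header($header);
    print defined $algo ? $algo : 'unknown', "\n";
}
```

A test along these lines only needs to compare the returned name with
the expected string for each report type.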

> We also see some minor differences that we can live with, e.g.,  
> BLAST hit scores changing from 40 to 40.1 and e-values having  
> trailing zeros.  We'll just update our baselines.

Okay, but let me know if that becomes pressing.  The e-value issue is  
a bit odd and may be worth looking into.

> The change to using Bio::Annotation::TagTree for SwissProt sequence  
> gene names broke a number of our tests but we'll fix that by  
> modifying the adapters we use between our internal representation  
> and BioPerl's.

That would be from the switchover from StructureValue (which wasn't  
really designed for the purposes of storing such data).  A layered  
Bio::Annotation::Collection was the other option (this is almost a  
light version of that).

> One thing we haven't tracked down yet is a change in tag type, e.g.,  
> b:integervalue to b:stringvalue, in the XML representations of our  
> Pipeline Pilot data records.  We're only seeing this for programs in  
> NCBI's BLAST suite.  At this point we don't know what's changed on  
> the BioPerl side to trigger the change in our code.  We'll continue  
> to investigate this.

Again, if you find it's on our side let us know.

> Scott
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
> http://www.linkedin.com/in/smarkel
> Board of Directors: International Society for Computational Biology
> Co-chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics

Thanks Scott!  Let us know if you have any other problems.  I've been  
busier than expected but should get RC3 out soon.

