[Bioperl-l] Problems parsing blast reports
Dave Messina
David.Messina at sbc.su.se
Sat May 14 16:15:08 UTC 2011
Hi Lorenzo,
Hmm, I tried your code, and it worked for me.
I set your input variables set as follows:
my ( $filename, $format, $minimumbitsscore, $thresholdevalue,
$minimumidentity,
$maximumredundancy, $minimumalnlength )
= ('2008.blasttable', 'blasttable', 300, '1e-100', 10, 0, 10);
where the file is t/data/2008.blasttable that comes with the BioPerl distro.
With those settings, the top hit goes into %besthits and the third hit goes
into %diverged.
I also added at the top:
use strict;
use warnings;
use Bio::SearchIO;
You *are* using strict and warnings, aren't you? :)
One thing that may be an issue is that you're doing this at the hit level,
and remember that in -m8 format each line represents an HSP.
I'd need your input — really, a neat little test case and the corresponding
erroneous output — to help further.
Dave
On Sat, May 14, 2011 at 17:29, Lorenzo Carretero <lcpaulet at googlemail.com>wrote:
> Hi all,
> I'm trying to parse blasttable '-m 8 format' reports from whole
> intra-genome
> comparisons of all vs all to get the best non-self hit and diverged best
> non-self hit as separate hashes (key=query=>value=non_self_hit). The
> following script seems to run ok but it returns unexpected results (i.e. it
> doesn't catch the best hit but a random hit apparently). I assume hits are
> iterated over the while loop (while (my $hit = $result->next_hit) as
> returned in the blast results (i.e. sorted by hits bit scores).
> Any help would be much appreciated.
> Cheers,
> Lorenzo
>
>
> my ( $filename, $format, $minimumbitsscore, $thresholdevalue,
> > $minimumidentity, $maximumredundancy, $minimumalnlength ) = @_;
> > my ( %besthits, %diverged, %redundant ,%genefusion ) = ();
> > my ( $refbh, $refdiv, $cb, $cd, $refred, $reffus, $cr, $cf);
> > my $total = 0;
> > my $in = new Bio::SearchIO ( -file => $filename,
> > -format => $format,
> > #-verbose => -1,
> > )
> > or die "No $filename BLAST file with
> > $format found";
> > while( my $result = $in->next_result) {
> > $total++;
> > while (my $hit = $result->next_hit) {
> > my $query = $result->query_name();
> > my $hitname = $hit->name();
> > my $bits = $hit->bits();
> > my $evalue = $hit->significance();
> > if ($query ne $hitname and $bits >= $minimumbitsscore and
> > $evalue <= $thresholdevalue ) {
> > $besthits{ $query } = $hitname;
> > }
> > elsif ( $bits <= $minimumbitsscore and $evalue >=
> > $thresholdevalue){
> > $diverged { $query } = $hitname;
> > }
> >
> > # while( my $hsp = $hit->next_hsp ) {
> > # my $querylen = $hsp->length( 'query' );
> > # my $hitlen = $hsp->length( 'hit' );
> > # my $alnlen = $hsp->length( 'total' );
> > # my $identity = $hsp->percent_identity();
> > # if ( $identity > $maximumredundancy ) {
> > # $redundant{ $query } = $hitname;
> > # }
> > # elsif ( ($querylen <= ($alnlen / 2) ) || ($hitlen <=
> > ($alnlen / 2) ) || ($alnlen <= $minimumalnlength) ) {
> > # $genefusion{ $query } = $hitname;
> > # }
> > # elsif ( ($evalue >= $thresholdevalue) || ($identity
> <=
> > $minimumidentity) || ($bits <= $minimumbitsscore) ) {
> > # $diverged{ $query } = $hitname;
> > # }
> > # else {
> > # $besthits{ $query } = $hitname;
> > # }
> > # }
> > }
> > }
> > $refbh = \%besthits;
> > $refdiv = \%diverged;
> > $refred = \%redundant;
> > $reffus = \%genefusion;
> > $cb = keys ( %besthits );
> > $cd = keys( %diverged );
> > $cr = keys( %redundant );
> > $cf = keys( %genefusion );
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
More information about the Bioperl-l
mailing list