Bioperl: BPlite.pm

Lincoln Stein lstein@cshl.org
Wed, 22 Dec 1999 11:46:05 -0500 (EST)


The Blast parser in Boulder is also lightweight -- at least in terms
of memory consumption -- and works with NCBI and WU-BLAST.  Please
feel free to borrow bits of code from that.

Lincoln

Ewan Birney writes:
 > On Tue, 21 Dec 1999, Ian Korf wrote:
 > 
 > > I've been getting requests recently for old BLAST parsers.
 > > Seems as though some people are looking for a lightweight
 > > parser. At http://sapiens.wustl.edu/~ikorf/BPlite.pm you
 > > can find my version of such a module. It parses both NCBI-
 > > and WU-BLAST, and works well in pipes since it reads one
 > > subject and one alignment at a time.
 > 
 > I'd really like to see a lighter blast parser with less embedded
 > functionality in bioperl, ideally with the main features of steve's
 > blast parser. If I can persuade someone to look at this Ian, is it
 > ok to bring it inside bioperl? (any chance of you wanting to do that? I
 > guess not...)
 > 
 > Steve - we *do* need to think of upgrading the blast parser - only
 > you know the code, and the largest set of bugs are found in it.
 > 
 > 
 > > 
 > > The pod2text version of the documentation follows.
 > > 
 > > -Ian Korf
 > > 
 > > 
 > > NAME
 > >     BPlite - Lightweight BLAST parser
 > > 
 > > SYNOPSIS
 > >      use BPlite;
 > >      my $report = new BPlite(\*STDIN);
 > >      $report->query;
 > >      $report->database;
 > >      while(my $sbjct = $report->nextSbjct) {
 > >          $sbjct->name;
 > >          while (my $hsp = $sbjct->nextHSP) {
 > >              $hsp->score;
 > >              $hsp->bits;
 > >              $hsp->percent;
 > >              $hsp->P;
 > >              $hsp->queryBegin;
 > >              $hsp->queryEnd;
 > >              $hsp->sbjctBegin;
 > >              $hsp->sbjctEnd;
 > >              $hsp->queryAlignment;
 > >              $hsp->sbjctAlignment;
 > >          }
 > >      }
 > > 
 > > DESCRIPTION
 > >     BPlite is a package for parsing BLAST reports. The BLAST
 > >     programs are a family of widely used algorithms for sequence
 > >     database searches. The reports are non-trivial to parse, and
 > >     there are differences in the formats of the various flavors of
 > >     BLAST. BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, and
 > >     TBLASTX reports from both the high-performance WU-BLAST and the
 > >     more generic NCBI-BLAST.
 > > 
 > >     Many people have developed BLAST parsers (I myself have made at
 > >     least three). BPlite is for those people who would rather not
 > >     have a giant object specification, but rather a simple handle to
 > >     a BLAST report that works well in pipes.
 > > 
 > >   Object
 > > 
 > >     BPlite has three kinds of objects, the report, the subject, and
 > >     the HSP. To create a new report, you pass a filehandle reference
 > >     to the BPlite constructor.
 > > 
 > >      my $report = new BPlite(\*STDIN); # or any other filehandle
 > > 
 > >     The report has two attributes (query and database), and one
 > >     method (nextSbjct).
 > > 
 > >      $report->query;     # access to the query name
 > >      $report->database;  # access to the database name
 > >      $report->nextSbjct; # gets the next subject
 > >      while(my $sbjct = $report->nextSbjct) {
 > >          # canonical form of use is in a while loop
 > >      }
 > > 
 > >     A subject is a BLAST hit, which should not be confused with an
 > >     HSP (below). A BLAST hit may have several alignments associated
 > >     with it. A useful way of thinking about it is that a subject is
 > >     a gene and HSPs are the exons. Subjects have one attribute
 > >     (name) and one method (nextHSP).
 > > 
 > >      $sbjct->name;    # access to the subject name
 > >      "$sbjct";        # overloaded to return name
 > >      $sbjct->nextHSP; # gets the next HSP from the sbjct
 > >      while(my $hsp = $sbjct->nextHSP) {
 > >          # canonical form is again a while loop
 > >      }
 > > 
 > >     An HSP is a high scoring pair, or simply an alignment. HSP
 > >     objects do not have any methods, just attributes (score, bits,
 > >     percent, P, queryBegin, queryEnd, sbjctBegin, sbjctEnd,
 > >     queryAlignment, sbjctAlignment) that should be familiar to
 > >     anyone who has seen a blast report. For lazy/efficient coders,
 > >     two-letter abbreviations are available for the attributes with
 > >     long names (qb, qe, sb, se, qa, sa).
 > > 
 > >      $hsp->score;
 > >      $hsp->bits;
 > >      $hsp->percent;
 > >      $hsp->P;
 > >      $hsp->queryBegin;     $hsp->qb;
 > >      $hsp->queryEnd;       $hsp->qe;
 > >      $hsp->sbjctBegin;     $hsp->sb;
 > >      $hsp->sbjctEnd;       $hsp->se;
 > >      $hsp->queryAlignment; $hsp->qa;
 > >      $hsp->sbjctAlignment; $hsp->sa;
 > >      "$hsp"; # overloaded for begin..end bits
 > > 
 > >     I've included a little bit of overloading for double quote
 > >     variable interpolation convenience. A subject will return its
 > >     name and an HSP will return its queryBegin, queryEnd, and bits
 > >     in the alignment. Feel free to modify this to whatever is most
 > >     frequently used by you.
 > > 
 > >     So a very simple look into a BLAST report might look like this.
 > > 
 > >      my $report = new BPlite(\*STDIN);
 > >      while(my $sbjct = $report->nextSbjct) {
 > >          print "$sbjct\n";
 > >          while(my $hsp = $sbjct->nextHSP) {
 > >              print "\t$hsp\n";
 > >          }
 > >      }
 > > 
 > >     The output of such code might look like this:
 > > 
 > >      >foo
 > >          100..155 29.5
 > >          268..300 20.1
 > >      >bar
 > >          100..153 28.5
 > >          265..290 22.1
 > > 
 > > AUTHOR
 > >     Ian Korf (ikorf@sapiens.wustl.edu,
 > >     http://sapiens.wustl.edu/~ikorf)
 > > 
 > > ACKNOWLEDGEMENTS
 > >     This software was developed at the Genome Sequencing Center at
 > >     Washington University, St. Louis, MO.
 > > 
 > > COPYRIGHT
 > >     Copyright (C) 1999 Ian Korf. All Rights Reserved.
 > > 
 > > DISCLAIMER
 > >     This software is provided "as is" without warranty of any kind.
 > > 
 > > =========== Bioperl Project Mailing List Message Footer =======
 > > Project URL: http://bio.perl.org/
 > > For info about how to (un)subscribe, where messages are archived, etc:
 > > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
 > > ====================================================================
 > > 
 > 
 > -----------------------------------------------------------------
 > Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230
 > <birney@sanger.ac.uk>
 > http://www.sanger.ac.uk/Users/birney/
 > -----------------------------------------------------------------
 > 
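
The double-quote overloading and the nextHSP loop described in the
BPlite documentation quoted above can be sketched in a few lines.
This is only an illustration: MiniHSP and MiniSbjct are hypothetical
stand-ins written for this example, not BPlite's actual code, and
nothing here parses a real BLAST report. The values are taken from
the sample output in the documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an HSP object (not BPlite's implementation).
package MiniHSP;
# Interpolating the object in a string calls as_string.
use overload '""' => sub { $_[0]->as_string };

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub queryBegin { $_[0]{queryBegin} }
sub queryEnd   { $_[0]{queryEnd} }
sub bits       { $_[0]{bits} }
sub as_string {
    my $self = shift;
    return $self->queryBegin . '..' . $self->queryEnd . ' ' . $self->bits;
}

# Hypothetical stand-in for a subject object holding a list of HSPs.
package MiniSbjct;
# A subject stringifies to its name.
use overload '""' => sub { $_[0]{name} };

sub new {
    my ($class, $name, @hsps) = @_;
    return bless { name => $name, hsps => [@hsps] }, $class;
}
sub name { $_[0]{name} }
# Returns one HSP per call and undef once the list is exhausted,
# which is what makes the while loop below terminate.
sub nextHSP { shift @{ $_[0]{hsps} } }

package main;

my $sbjct = MiniSbjct->new('foo',
    MiniHSP->new(queryBegin => 100, queryEnd => 155, bits => 29.5),
    MiniHSP->new(queryBegin => 268, queryEnd => 300, bits => 20.1),
);

print "$sbjct\n";
while (my $hsp = $sbjct->nextHSP) {
    print "\t$hsp\n";
}
```

Run as is, this prints "foo" followed by two tab-indented lines,
"100..155 29.5" and "268..300 20.1". Changing what "$hsp" returns
only means editing as_string, which is presumably the kind of
modification the documentation invites.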

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org			                  Cold Spring Harbor, NY
========================================================================