Bioperl: BPlite.pm

Jeffrey Chang jchang@SMI.Stanford.EDU
Wed, 22 Dec 1999 08:30:47 -0800 (PST)


Hi Everybody,

Just popping in from biopython!  I thought I'd mention that over there,
we're using an event-oriented design for our parsers, which is described
in a mail:
http://www.biopython.org/pipermail/biopython/1999-December/000149.html

The way it works is that a Scanner object chews through a data file and
generates events when it runs across information.  The events are then
handled by a Consumer.
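
To make the idea concrete, here is a minimal Python sketch of the
Scanner/Consumer split.  It is an illustration only, not the biopython
code: the class names, the two events (subject and hsp), and the toy
input format (a ">name" line followed by "begin..end bits" lines) are
all assumptions made for the example.

```python
import io


class Consumer:
    """Base class: every event handler is a no-op by default."""

    def subject(self, name):
        pass

    def hsp(self, begin, end, bits):
        pass


class Scanner:
    """Walks the input line by line and fires one event per item found."""

    def feed(self, handle, consumer):
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A ">name" line announces a new subject (hypothetical format).
                consumer.subject(line[1:])
            else:
                # Any other line is assumed to look like "begin..end bits".
                span, bits = line.split()
                begin, end = span.split("..")
                consumer.hsp(int(begin), int(end), float(bits))


class EventLog(Consumer):
    """Consumer that simply records the events it receives, in order."""

    def __init__(self):
        self.events = []

    def subject(self, name):
        self.events.append(("subject", name))

    def hsp(self, begin, end, bits):
        self.events.append(("hsp", begin, end, bits))


SAMPLE = (
    ">foo\n"
    "    100..155 29.5\n"
    "    268..300 20.1\n"
    ">bar\n"
    "    100..153 28.5\n"
)

log = EventLog()
Scanner().feed(io.StringIO(SAMPLE), log)
```

After the run, log.events holds the event stream in the order the
Scanner produced it; the Scanner itself never decides how the data is
stored.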

This design is nice because it decouples a lot of the parsing work from
the final representation, and makes it easy to accommodate parsers of
varying complexity.  You can create Consumers to handle as much or as
little of the data as you want.  The plan for biopython is to distribute
Scanners, and a Consumer that shoves all the information into some data
structure.  Advanced users, however, will have the option of using the
Scanner but building their own high-performance Consumer tailored
specifically for their own purposes.
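
A rough sketch of that choice, again in Python and with hypothetical
names throughout (scan, ReportBuilder, BestBits, and the toy ">name" /
"begin..end bits" input are assumptions for illustration, not biopython
code): the same scanner drives one consumer that keeps everything and
another that keeps only the best bit score.

```python
import io


def scan(handle, consumer):
    """Compact stand-in for a Scanner: fires one event per input line."""
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            consumer.subject(line[1:])
        else:
            span, bits = line.split()
            begin, end = span.split("..")
            consumer.hsp(int(begin), int(end), float(bits))


class ReportBuilder:
    """Keeps everything: {subject name: [(begin, end, bits), ...]}.

    Assumes the input starts with a subject line before any HSP line.
    """

    def __init__(self):
        self.hits = {}
        self._current = None

    def subject(self, name):
        self._current = self.hits.setdefault(name, [])

    def hsp(self, begin, end, bits):
        self._current.append((begin, end, bits))


class BestBits:
    """Tailored consumer: ignores coordinates, tracks only the top bit score."""

    def __init__(self):
        self.best = None  # becomes (bits, subject name) once an HSP is seen
        self._name = None

    def subject(self, name):
        self._name = name

    def hsp(self, begin, end, bits):
        if self.best is None or bits > self.best[0]:
            self.best = (bits, self._name)


SAMPLE = (
    ">foo\n"
    "    100..155 29.5\n"
    "    268..300 20.1\n"
    ">bar\n"
    "    100..153 28.5\n"
    "    265..290 22.1\n"
)

builder = ReportBuilder()
scan(io.StringIO(SAMPLE), builder)

best = BestBits()
scan(io.StringIO(SAMPLE), best)
```

Neither consumer requires any change to the scanning code, and
BestBits never allocates a per-hit structure at all.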

The code for this is sitting on my local drive now, and will be in the
biopython CVS repository soon.

Jeff



On Wed, 22 Dec 1999, Ewan Birney wrote:

> On Tue, 21 Dec 1999, Ian Korf wrote:
> 
> > I've been getting requests recently for old BLAST parsers.
> > Seems as though some people are looking for a lightweight
> > parser. At http://sapiens.wustl.edu/~ikorf/BPlite.pm you
> > can find my version of such a module. It parses both NCBI-
> > and WU-BLAST, and works well in pipes since it reads one
> > subject and one alignment at a time.
> 
> I'd really like to see a lighter blast parser with less embedded
> functionality in bioperl, ideally with the main features of Steve's
> blast parser. If I can persuade someone to look at this Ian, is it
> ok to bring it inside bioperl? (any chance of you wanting to do that? I
> guess not...)
> 
> Steve - we *do* need to think of upgrading the blast parser - only
> you know the code, and the largest set of bugs is found in it.
> 
> 
> > 
> > The pod2text version of the documentation follows.
> > 
> > -Ian Korf
> > 
> > 
> > NAME
> >     BPlite - Lightweight BLAST parser
> > 
> > SYNOPSIS
> >      use BPlite;
> >      my $report = new BPlite(\*STDIN);
> >      $report->query;
> >      $report->database;
> >      while(my $sbjct = $report->nextSbjct) {
> >          $sbjct->name;
> >          while (my $hsp = $sbjct->nextHSP) {
> >              $hsp->score;
> >              $hsp->bits;
> >              $hsp->percent;
> >              $hsp->P;
> >              $hsp->queryBegin;
> >              $hsp->queryEnd;
> >              $hsp->sbjctBegin;
> >              $hsp->sbjctEnd;
> >              $hsp->queryAlignment;
> >              $hsp->sbjctAlignment;
> >          }
> >      }
> > 
> > DESCRIPTION
> >     BPlite is a package for parsing BLAST reports. The BLAST
> >     programs are a family of widely used algorithms for sequence
> >     database searches. The reports are non-trivial to parse, and
> >     there are differences in the formats of the various flavors of
> >     BLAST. BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, and
> >     TBLASTX reports from both the high performance WU-BLAST, and the
> >     more generic NCBI-BLAST.
> > 
> >     Many people have developed BLAST parsers (I myself have made at
> >     least three). BPlite is for those people who would rather not
> >     have a giant object specification, but rather a simple handle to
> >     a BLAST report that works well in pipes.
> > 
> >   Object
> > 
> >     BPlite has three kinds of objects: the report, the subject, and
> >     the HSP. To create a new report, you pass a filehandle reference
> >     to the BPlite constructor.
> > 
> >      my $report = new BPlite(\*STDIN); # or any other filehandle
> > 
> >     The report has two attributes (query and database), and one
> >     method (nextSbjct).
> > 
> >      $report->query;     # access to the query name
> >      $report->database;  # access to the database name
> >      $report->nextSbjct; # gets the next subject
> >      while(my $sbjct = $report->nextSbjct) {
> >          # canonical form of use is in a while loop
> >      }
> > 
> >     A subject is a BLAST hit, which should not be confused with an
> >     HSP (below). A BLAST hit may have several alignments associated
> >     with it. A useful way of thinking about it is that a subject is
> >     a gene and HSPs are the exons. Subjects have one attribute
> >     (name) and one method (nextHSP).
> > 
> >      $sbjct->name;    # access to the subject name
> >      "$sbjct";        # overloaded to return name
> >      $sbjct->nextHSP; # gets the next HSP from the sbjct
> >      while(my $hsp = $sbjct->nextHSP) {
> >          # canonical form is again a while loop
> >      }
> > 
> >     An HSP is a high scoring pair, or simply an alignment. HSP
> >     objects do not have any methods, just attributes (score, bits,
> >     percent, P, queryBegin, queryEnd, sbjctBegin, sbjctEnd,
> >     queryAlignment, sbjctAlignment) that should be familiar to
> >     anyone who has seen a blast report. For lazy/efficient coders,
> >     two-letter abbreviations are available for the attributes with
> >     long names (qb, qe, sb, se, qa, sa).
> > 
> >      $hsp->score;
> >      $hsp->bits;
> >      $hsp->percent;
> >      $hsp->P;
> >      $hsp->queryBegin;     $hsp->qb;
> >      $hsp->queryEnd;       $hsp->qe;
> >      $hsp->sbjctBegin;     $hsp->sb;
> >      $hsp->sbjctEnd;       $hsp->se;
> >      $hsp->queryAlignment; $hsp->qa;
> >      $hsp->sbjctAlignment; $hsp->sa;
> >      "$hsp"; # overloaded for begin..end bits
> > 
> >     I've included a little bit of overloading for double quote
> >     variable interpolation convenience. A subject will return its
> >     name and an HSP will return its queryBegin, queryEnd, and bits
> >     in the alignment. Feel free to modify this to whatever is most
> >     frequently used by you.
> > 
> >     So a very simple look into a BLAST report might look like this.
> > 
> >      my $report = new BPlite(\*STDIN);
> >      while(my $sbjct = $report->nextSbjct) {
> >          print "$sbjct\n";
> >          while(my $hsp = $sbjct->nextHSP) {
> >              print "\t$hsp\n";
> >          }
> >      }
> > 
> >     The output of such code might look like this:
> > 
> >      >foo
> >          100..155 29.5
> >          268..300 20.1
> >      >bar
> >          100..153 28.5
> >          265..290 22.1
> > 
> > AUTHOR
> >     Ian Korf (ikorf@sapiens.wustl.edu,
> >     http://sapiens.wustl.edu/~ikorf)
> > 
> > ACKNOWLEDGEMENTS
> >     This software was developed at the Genome Sequencing Center at
> >     Washington University, St. Louis, MO.
> > 
> > COPYRIGHT
> >     Copyright (C) 1999 Ian Korf. All Rights Reserved.
> > 
> > DISCLAIMER
> >     This software is provided "as is" without warranty of any kind.
> > 
> > =========== Bioperl Project Mailing List Message Footer =======
> > Project URL: http://bio.perl.org/
> > For info about how to (un)subscribe, where messages are archived, etc:
> > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> > ====================================================================
> > 
> 
> -----------------------------------------------------------------
> Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230
> <birney@sanger.ac.uk>
> http://www.sanger.ac.uk/Users/birney/
> -----------------------------------------------------------------
> 
