Bioperl: BPlite.pm

Ian Korf ikorf@sapiens.wustl.edu
Tue, 21 Dec 1999 10:30:12 -0600 (CST)


I've been getting requests recently for old BLAST parsers.
Seems as though some people are looking for a lightweight
parser. At http://sapiens.wustl.edu/~ikorf/BPlite.pm you
can find my version of such a module. It parses both NCBI-
and WU-BLAST, and works well in pipes since it reads one
subject and one alignment at a time.

The pod2text version of the documentation follows.

-Ian Korf


NAME
    BPlite - Lightweight BLAST parser

SYNOPSIS
     use BPlite;
     my $report = new BPlite(\*STDIN);
     $report->query;
     $report->database;
     while(my $sbjct = $report->nextSbjct) {
         $sbjct->name;
         while (my $hsp = $sbjct->nextHSP) {
             $hsp->score;
             $hsp->bits;
             $hsp->percent;
             $hsp->P;
             $hsp->queryBegin;
             $hsp->queryEnd;
             $hsp->sbjctBegin;
             $hsp->sbjctEnd;
             $hsp->queryAlignment;
             $hsp->sbjctAlignment;
         }
     }

DESCRIPTION
    BPlite is a package for parsing BLAST reports. The BLAST
    programs are a family of widely used algorithms for sequence
    database searches. The reports are non-trivial to parse, and
    there are differences in the formats of the various flavors of
    BLAST. BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, and
    TBLASTX reports from both the high performance WU-BLAST, and the
    more generic NCBI-BLAST.

    Many people have developed BLAST parsers (I myself have made at
    least three). BPlite is for those people who would rather not
    have a giant object specification, but rather a simple handle to
    a BLAST report that works well in pipes.

  Object

    BPlite has three kinds of objects, the report, the subject, and
    the HSP. To create a new report, you pass a filehandle reference
    to the BPlite constructor.

     my $report = new BPlite(\*STDIN); # or any other filehandle

    The report has two attributes (query and database), and one
    method (nextSbjct).

     $report->query;     # access to the query name
     $report->database;  # access to the database name
     $report->nextSbjct; # gets the next subject
     while(my $sbjct = $report->nextSbjct) {
         # canonical form of use is in a while loop
     }

    A subject is a BLAST hit, which should not be confused with an
    HSP (below). A BLAST hit may have several alignments associated
    with it. A useful way of thinking about it is that a subject is
    a gene and HSPs are the exons. Subjects have one attribute
    (name) and one method (nextHSP).

     $sbjct->name;    # access to the subject name
     "$sbjct";        # overloaded to return name
     $sbjct->nextHSP; # gets the next HSP from the sbjct
     while(my $hsp = $sbjct->nextHSP) {
         # canonical form is again a while loop
     }

    An HSP is a high scoring pair, or simply an alignment. HSP
    objects do not have any methods, just attributes (score, bits,
    percent, P, queryBegin, queryEnd, sbjctBegin, sbjctEnd,
    queryAlignment, sbjctAlignment) that should be familiar to
    anyone who has seen a blast report. For lazy/efficient coders,
    two-letter abbreviations are available for the attributes with
    long names (qb, qe, sb, se, qa, sa).

     $hsp->score;
     $hsp->bits;
     $hsp->percent;
     $hsp->P;
     $hsp->queryBegin;     $hsp->qb;
     $hsp->queryEnd;       $hsp->qe;
     $hsp->sbjctBegin;     $hsp->sb;
     $hsp->sbjctEnd;       $hsp->se;
     $hsp->queryAlignment; $hsp->qa;
     $hsp->sbjctAlignment; $hsp->sa;
     "$hsp"; # overloaded for begin..end bits
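
    One way such short names could be provided is by aliasing each
    abbreviation to its long accessor through a typeglob. The sketch
    below is only an illustration: the AliasHSP class and its
    hash-based storage are hypothetical, and nothing here is taken
    from BPlite's own source.

```perl
use strict;
use warnings;

# AliasHSP is a hypothetical stand-in class, not BPlite's own HSP code.
package AliasHSP;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

# Long-named read-only accessors.
sub queryBegin     { return $_[0]{queryBegin} }
sub queryEnd       { return $_[0]{queryEnd} }
sub queryAlignment { return $_[0]{queryAlignment} }

# Point each short name at the same code as its long counterpart.
{
    no strict 'refs';
    my %alias = (qb => 'queryBegin', qe => 'queryEnd', qa => 'queryAlignment');
    for my $short (keys %alias) {
        *{"AliasHSP::$short"} = \&{"AliasHSP::$alias{$short}"};
    }
}

package main;

my $hsp = AliasHSP->new(queryBegin => 100, queryEnd => 155, queryAlignment => 'ACGT');
print $hsp->qb, '..', $hsp->qe, "\n";    # same values as queryBegin and queryEnd
```

    Because both names refer to the same subroutine, the short and
    long forms cannot drift apart.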

    I've included a little bit of overloading for convenience when
    interpolating objects into double-quoted strings. A subject
    returns its name, and an HSP returns its queryBegin and queryEnd
    joined by "..", followed by its bits. Feel free to change this
    to whatever you use most often.
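
    For anyone wanting to change what interpolation returns, here is
    a minimal sketch of string overloading using Perl's overload
    pragma. The StringHSP class is a hypothetical stand-in and is not
    BPlite's actual implementation; only the "begin..end bits" format
    is taken from the description above.

```perl
use strict;
use warnings;

# StringHSP is a hypothetical stand-in class, not BPlite's own HSP code.
package StringHSP;

# The '""' key names the handler used whenever the object is stringified.
use overload '""' => sub {
    my ($self) = @_;
    return $self->queryBegin . '..' . $self->queryEnd . ' ' . $self->bits;
};

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

sub queryBegin { return $_[0]{queryBegin} }
sub queryEnd   { return $_[0]{queryEnd} }
sub bits       { return $_[0]{bits} }

package main;

my $hsp = StringHSP->new(queryBegin => 100, queryEnd => 155, bits => 29.5);
print "$hsp\n";    # prints 100..155 29.5
```

    Editing the anonymous sub is all it takes to report a different
    set of attributes.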

    So a very simple look into a BLAST report might look like this.

     my $report = new BPlite(\*STDIN);
     while(my $sbjct = $report->nextSbjct) {
         print "$sbjct\n";
         while(my $hsp = $sbjct->nextHSP) {
             print "\t$hsp\n";
         }
     }

    The output of such code might look like this:

     >foo
         100..155 29.5
         268..300 20.1
     >bar
         100..153 28.5
         265..290 22.1
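
    The same pair of loops can do more than print. As a sketch, the
    helper below keeps only the highest bit score seen for each
    subject. The name best_bits_per_subject is made up for this
    example; it relies only on the nextSbjct, name, nextHSP, and bits
    calls documented above, and assumes bits returns a plain number.

```perl
use strict;
use warnings;

# Return a list of [subject name, best bit score] pairs, reading the
# report one subject and one HSP at a time.
sub best_bits_per_subject {
    my ($report) = @_;
    my @best;
    while (my $sbjct = $report->nextSbjct) {
        my $top;
        while (my $hsp = $sbjct->nextHSP) {
            my $bits = $hsp->bits;
            $top = $bits if !defined $top || $bits > $top;
        }
        # Subjects that yielded no HSPs are skipped.
        push @best, [$sbjct->name, $top] if defined $top;
    }
    return @best;
}
```

    With BPlite loaded, it would be called as
    best_bits_per_subject(new BPlite(\*STDIN)). Only one subject and
    one HSP are held at a time, so it still works in a pipe.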

AUTHOR
    Ian Korf (ikorf@sapiens.wustl.edu,
    http://sapiens.wustl.edu/~ikorf)

ACKNOWLEDGEMENTS
    This software was developed at the Genome Sequencing Center at
    Washington University, St. Louis, MO.

COPYRIGHT
    Copyright (C) 1999 Ian Korf. All Rights Reserved.

DISCLAIMER
    This software is provided "as is" without warranty of any kind.
