Bioperl: BPlite.pm
Lincoln Stein
lstein@cshl.org
Wed, 22 Dec 1999 11:46:05 -0500 (EST)
The Blast parser in Boulder is also lightweight -- at least in terms
of memory consumption -- and works with NCBI and WU-BLAST. Please
feel free to borrow bits of code from that.
Lincoln
Ewan Birney writes:
> On Tue, 21 Dec 1999, Ian Korf wrote:
>
> > I've been getting requests recently for old BLAST parsers.
> > Seems as though some people are looking for a lightweight
> > parser. At http://sapiens.wustl.edu/~ikorf/BPlite.pm you
> > can find my version of such a module. It parses both NCBI-
> > and WU-BLAST, and works well in pipes since it reads one
> > subject and one alignment at a time.
>
> I'd really like to see a lighter blast parser with less embedded
> functionality in bioperl, ideally with the main features of Steve's
> blast parser. If I can persuade someone to look at this Ian, is it
> ok to bring it inside bioperl? (any chance of you wanting to do that? I
> guess not...)
>
> Steve - we *do* need to think of upgrading the blast parser - only
> you know the code, and the largest set of bugs are found in it.
>
>
> >
> > The pod2text version of the documentation follows.
> >
> > -Ian Korf
> >
> >
> > NAME
> > BPlite - Lightweight BLAST parser
> >
> > SYNOPSIS
> > use BPlite;
> > my $report = new BPlite(\*STDIN);
> > $report->query;
> > $report->database;
> > while(my $sbjct = $report->nextSbjct) {
> >     $sbjct->name;
> >     while (my $hsp = $sbjct->nextHSP) {
> >         $hsp->score;
> >         $hsp->bits;
> >         $hsp->percent;
> >         $hsp->P;
> >         $hsp->queryBegin;
> >         $hsp->queryEnd;
> >         $hsp->sbjctBegin;
> >         $hsp->sbjctEnd;
> >         $hsp->queryAlignment;
> >         $hsp->sbjctAlignment;
> >     }
> > }
> >
> > DESCRIPTION
> > BPlite is a package for parsing BLAST reports. The BLAST
> > programs are a family of widely used algorithms for sequence
> > database searches. The reports are non-trivial to parse, and
> > there are differences in the formats of the various flavors of
> > BLAST. BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, and
> > TBLASTX reports from both the high performance WU-BLAST, and the
> > more generic NCBI-BLAST.
> >
> > Many people have developed BLAST parsers (I myself have made at
> > least three). BPlite is for those people who would rather not
> > have a giant object specification, but rather a simple handle to
> > a BLAST report that works well in pipes.
> >
> > Object
> >
> > BPlite has three kinds of objects, the report, the subject, and
> > the HSP. To create a new report, you pass a filehandle reference
> > to the BPlite constructor.
> >
> > my $report = new BPlite(\*STDIN); # or any other filehandle
> >
> > The report has two attributes (query and database), and one
> > method (nextSbjct).
> >
> > $report->query; # access to the query name
> > $report->database; # access to the database name
> > $report->nextSbjct; # gets the next subject
> > while(my $sbjct = $report->nextSbjct) {
> >     # canonical form of use is in a while loop
> > }
> >
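The "one subject at a time" reading style described above is what lets a parser sit in a pipe without holding the whole report in memory. Below is a minimal sketch of that pull-style pattern in plain Perl. It is not BPlite's code: the RecordStream package and its nextRecord method are hypothetical names invented for illustration, and the "records" are just non-empty lines read from an in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical package for illustration only; not part of BPlite.
package RecordStream;

sub new {
    my ($class, $fh) = @_;
    my $self = { fh => $fh };
    return bless $self, $class;
}

# Return the next non-empty line, or undef once the input is exhausted.
# Only one line is held in memory at a time.
sub nextRecord {
    my ($self) = @_;
    my $fh = $self->{fh};
    while (defined(my $line = <$fh>)) {
        chomp $line;
        return $line if length $line;
    }
    return;
}

package main;

# An in-memory filehandle stands in for STDIN or a pipe.
my $text = "foo\n\nbar\n";
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";

my $stream = RecordStream->new($fh);
my @seen;
# defined() guards against a record such as "0", which is false in Perl.
while (defined(my $rec = $stream->nextRecord)) {
    push @seen, $rec;
}
print join(',', @seen), "\n";
```

Because the records here are plain strings, the loop tests with defined(). A method that returns objects, as nextSbjct is documented to do, can be used directly as the while condition.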
> > A subject is a BLAST hit, which should not be confused with an
> > HSP (below). A BLAST hit may have several alignments associated
> > with it. A useful way of thinking about it is that a subject is
> > a gene and HSPs are the exons. Subjects have one attribute
> > (name) and one method (nextHSP).
> >
> > $sbjct->name; # access to the subject name
> > "$sbjct"; # overloaded to return name
> > $sbjct->nextHSP; # gets the next HSP from the sbjct
> > while(my $hsp = $sbjct->nextHSP) {
> >     # canonical form is again a while loop
> > }
> >
> > An HSP is a high scoring pair, or simply an alignment. HSP
> > objects do not have any methods, just attributes (score, bits,
> > percent, P, queryBegin, queryEnd, sbjctBegin, sbjctEnd,
> > queryAlignment, sbjctAlignment) that should be familiar to
> > anyone who has seen a blast report. For lazy/efficient coders,
> > two-letter abbreviations are available for the attributes with
> > long names (qb, qe, sb, se, qa, sa).
> >
> > $hsp->score;
> > $hsp->bits;
> > $hsp->percent;
> > $hsp->P;
> > $hsp->queryBegin; $hsp->qb;
> > $hsp->queryEnd; $hsp->qe;
> > $hsp->sbjctBegin; $hsp->sb;
> > $hsp->sbjctEnd; $hsp->se;
> > $hsp->queryAlignment; $hsp->qa;
> > $hsp->sbjctAlignment; $hsp->sa;
> > "$hsp"; # overloaded for begin..end bits
> >
> > I've included a little bit of overloading for double quote
> > variable interpolation convenience. A subject will return its
> > name and an HSP will return its queryBegin, queryEnd, and bits
> > in the alignment. Feel free to modify this to whatever is most
> > frequently used by you.
> >
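For anyone who wants to change what an object prints as, the mechanism is Perl's overload pragma with the '""' key. Below is a minimal sketch, assuming a hypothetical DemoHSP class whose fields are named after the attributes described above. It is not taken from BPlite.pm, and the asString method name is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; not BPlite's HSP object.
package DemoHSP;

# Call asString whenever the object is used as a string.
use overload '""' => \&asString;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

sub queryBegin { $_[0]{queryBegin} }
sub queryEnd   { $_[0]{queryEnd} }
sub bits       { $_[0]{bits} }

# Build the "begin..end bits" form; edit this to print other fields.
sub asString {
    my ($self) = @_;
    return $self->queryBegin . '..' . $self->queryEnd . ' ' . $self->bits;
}

package main;

my $hsp  = DemoHSP->new(queryBegin => 100, queryEnd => 155, bits => 29.5);
my $line = "$hsp";    # interpolation triggers the overloaded conversion
print "$line\n";
```

Changing the body of asString is all it takes to print other fields.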
> > So a very simple look into a BLAST report might look like this.
> >
> > my $report = new BPlite(\*STDIN);
> > while(my $sbjct = $report->nextSbjct) {
> >     print "$sbjct\n";
> >     while(my $hsp = $sbjct->nextHSP) {
> >         print "\t$hsp\n";
> >     }
> > }
> >
> > The output of such code might look like this:
> >
> > >foo
> > 100..155 29.5
> > 268..300 20.1
> > >bar
> > 100..153 28.5
> > 265..290 22.1
> >
> > AUTHOR
> > Ian Korf (ikorf@sapiens.wustl.edu,
> > http://sapiens.wustl.edu/~ikorf)
> >
> > ACKNOWLEDGEMENTS
> > This software was developed at the Genome Sequencing Center at
> > Washington University, St. Louis, MO.
> >
> > COPYRIGHT
> > Copyright (C) 1999 Ian Korf. All Rights Reserved.
> >
> > DISCLAIMER
> > This software is provided "as is" without warranty of any kind.
> >
> > =========== Bioperl Project Mailing List Message Footer =======
> > Project URL: http://bio.perl.org/
> > For info about how to (un)subscribe, where messages are archived, etc:
> > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> > ====================================================================
> >
>
> -----------------------------------------------------------------
> Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230
> <birney@sanger.ac.uk>
> http://www.sanger.ac.uk/Users/birney/
> -----------------------------------------------------------------
>
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================