Bioperl: BPlite.pm
Ewan Birney
birney@sanger.ac.uk
Wed, 22 Dec 1999 09:16:28 +0000 (GMT)
On Tue, 21 Dec 1999, Ian Korf wrote:
> I've been getting requests recently for old BLAST parsers.
> Seems as though some people are looking for a lightweight
> parser. At http://sapiens.wustl.edu/~ikorf/BPlite.pm you
> can find my version of such a module. It parses both NCBI-
> and WU-BLAST, and works well in pipes since it reads one
> subject and one alignment at a time.
I'd really like to see a lighter BLAST parser with less embedded
functionality in bioperl, ideally with the main features of Steve's
BLAST parser. If I can persuade someone to look at this, Ian, is it
OK to bring it inside bioperl? (Any chance of you wanting to do that
yourself? I guess not...)
Steve - we *do* need to think of upgrading the BLAST parser - only
you know the code, and the largest set of bugs is found in it.
>
> The pod2text version of the documentation follows.
>
> -Ian Korf
>
>
> NAME
> BPlite - Lightweight BLAST parser
>
> SYNOPSIS
> use BPlite;
> my $report = new BPlite(\*STDIN);
> $report->query;
> $report->database;
> while(my $sbjct = $report->nextSbjct) {
>     $sbjct->name;
>     while (my $hsp = $sbjct->nextHSP) {
>         $hsp->score;
>         $hsp->bits;
>         $hsp->percent;
>         $hsp->P;
>         $hsp->queryBegin;
>         $hsp->queryEnd;
>         $hsp->sbjctBegin;
>         $hsp->sbjctEnd;
>         $hsp->queryAlignment;
>         $hsp->sbjctAlignment;
>     }
> }
>
> DESCRIPTION
> BPlite is a package for parsing BLAST reports. The BLAST
> programs are a family of widely used algorithms for sequence
> database searches. The reports are non-trivial to parse, and
> there are differences in the formats of the various flavors of
> BLAST. BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, and
> TBLASTX reports from both the high performance WU-BLAST, and the
> more generic NCBI-BLAST.
>
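To make the format difference concrete, here is a minimal Perl sketch of a hypothetical parse_score_line helper (not part of BPlite) that reads the raw score and the bit score from either flavor. It assumes the typical Score-line layouts shown in its comments; other program versions may lay these lines out differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BPlite): extract the raw score and
# the bit score from an HSP header line. The two layouts are assumed
# typical examples, not a full description of either format:
#   NCBI-BLAST: " Score = 29.5 bits (65), Expect = 0.001"
#   WU-BLAST:   " Score = 65 (29.5 bits), Expect = 0.001, P = 0.001"
sub parse_score_line {
    my ($line) = @_;
    # NCBI puts the bit score first and the raw score in parentheses.
    if ($line =~ /Score\s*=\s*([\d.]+)\s+bits\s+\((\d+)\)/) {
        return (score => $2, bits => $1);
    }
    # WU puts the raw score first and the bit score in parentheses.
    if ($line =~ /Score\s*=\s*(\d+)\s+\(([\d.]+)\s+bits\)/) {
        return (score => $1, bits => $2);
    }
    return;    # not a Score line in either assumed layout
}

my %ncbi = parse_score_line(' Score = 29.5 bits (65), Expect = 0.001');
my %wu   = parse_score_line(' Score = 65 (29.5 bits), Expect = 0.001, P = 0.001');
print "NCBI: $ncbi{score} raw, $ncbi{bits} bits\n";
print "WU:   $wu{score} raw, $wu{bits} bits\n";
```

Both calls yield a raw score of 65 and 29.5 bits, so code downstream of the helper does not need to know which flavor produced the line.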
> Many people have developed BLAST parsers (I myself have made at
> least three). BPlite is for those people who would rather not
> have a giant object specification, but rather a simple handle to
> a BLAST report that works well in pipes.
>
> Object
>
> BPlite has three kinds of objects: the report, the subject, and
> the HSP. To create a new report, you pass a filehandle reference
> to the BPlite constructor.
>
> my $report = new BPlite(\*STDIN); # or any other filehandle
>
> The report has two attributes (query and database), and one
> method (nextSbjct).
>
> $report->query; # access to the query name
> $report->database; # access to the database name
> $report->nextSbjct; # gets the next subject
> while(my $sbjct = $report->nextSbjct) {
> # canonical form of use is in a while loop
> }
>
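The pipe-friendly behavior comes from reading lazily. Below is a minimal sketch of that idea, assuming each subject section starts with a line beginning with '>'; the MiniReport class is hypothetical and is not BPlite's actual code. nextSbjct reads only as far as the next '>' line and holds that line back for the following call.

```perl
use strict;
use warnings;

# Hypothetical class (not BPlite's code) showing the streaming idea:
# each call to nextSbjct reads only up to the next line starting
# with '>', so the whole report is never held in memory.
package MiniReport;

sub new {
    my ($class, $fh) = @_;
    my $self = bless { fh => $fh, pending => undef }, $class;
    # Skip the report header up to the first subject line.
    while (defined(my $line = <$fh>)) {
        if ($line =~ /^>/) { $self->{pending} = $line; last }
    }
    return $self;
}

sub nextSbjct {
    my ($self) = @_;
    return unless defined $self->{pending};    # no more subjects
    my @block = ($self->{pending});
    $self->{pending} = undef;
    my $fh = $self->{fh};
    while (defined(my $line = <$fh>)) {
        # A new '>' line ends this subject; keep it for the next call.
        if ($line =~ /^>/) { $self->{pending} = $line; last }
        push @block, $line;
    }
    return join '', @block;
}

package main;

# An in-memory filehandle stands in for STDIN or a pipe here.
my $text = "BLASTP 2.0\nQuery= foo\n>hit1 first\n score line\n>hit2 second\n score line\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $report = MiniReport->new($fh);
my @names;
while (my $block = $report->nextSbjct) {
    my ($name) = $block =~ /^>(\S+)/;
    push @names, $name;
}
print "@names\n";
```

In this sketch anything after the last alignment (such as trailing statistics) would end up in the final block; a real parser has to recognize where the alignments end.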
> A subject is a BLAST hit, which should not be confused with an
> HSP (below). A BLAST hit may have several alignments associated
> with it. A useful way of thinking about it is that a subject is
> a gene and HSPs are the exons. Subjects have one attribute
> (name) and one method (nextHSP).
>
> $sbjct->name; # access to the subject name
> "$sbjct"; # overloaded to return name
> $sbjct->nextHSP; # gets the next HSP from the sbjct
> while(my $hsp = $sbjct->nextHSP) {
> # canonical form is again a while loop
> }
>
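Within a subject, HSPs can be separated in a similar way. A minimal sketch, assuming each HSP begins on a line starting with " Score =" (split_subject is a hypothetical helper, not part of BPlite):

```perl
use strict;
use warnings;

# Hypothetical helper (not BPlite's code): split one subject block into
# its name and its HSP chunks. Assumes every HSP starts on a line that
# begins with " Score ="; real reports may need more care.
sub split_subject {
    my ($block) = @_;
    my ($name) = $block =~ /^>(\S+)/;
    # Zero-width split before each " Score =" line; the first chunk is
    # the subject header (name, length) and is discarded.
    my (undef, @hsps) = split /^(?= Score =)/m, $block;
    return ($name, @hsps);
}

my $block = ">hit1 first protein\n Length = 120\n\n"
          . " Score = 29.5 bits (65), Expect = 0.001\n Query: 100 MKV 102\n\n"
          . " Score = 20.1 bits (41), Expect = 1.2\n Query: 268 LLA 270\n";
my ($name, @hsps) = split_subject($block);
printf "%s has %d HSPs\n", $name, scalar @hsps;
```

With this sample block it reports two HSPs for hit1, matching the gene/exons picture: one subject, several alignments.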
> An HSP is a high scoring pair, or simply an alignment. HSP
> objects do not have any methods, just attributes (score, bits,
> percent, P, queryBegin, queryEnd, sbjctBegin, sbjctEnd,
> queryAlignment, sbjctAlignment) that should be familiar to
> anyone who has seen a blast report. For lazy/efficient coders,
> two-letter abbreviations are available for the attributes with
> long names (qb, qe, sb, se, qa, sa).
>
> $hsp->score;
> $hsp->bits;
> $hsp->percent;
> $hsp->P;
> $hsp->queryBegin; $hsp->qb;
> $hsp->queryEnd; $hsp->qe;
> $hsp->sbjctBegin; $hsp->sb;
> $hsp->sbjctEnd; $hsp->se;
> $hsp->queryAlignment; $hsp->qa;
> $hsp->sbjctAlignment; $hsp->sa;
> "$hsp"; # overloaded for begin..end bits
>
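One way to provide the two-letter abbreviations without writing every accessor twice is to generate the accessors and alias the short names through the symbol table. This is a sketch with a hypothetical MiniHSP class, not how BPlite itself is written:

```perl
use strict;
use warnings;

# Hypothetical class (not BPlite's HSP) offering both the long
# attribute names and the two-letter abbreviations.
package MiniHSP;

my %short = (
    queryBegin     => 'qb', queryEnd       => 'qe',
    sbjctBegin     => 'sb', sbjctEnd       => 'se',
    queryAlignment => 'qa', sbjctAlignment => 'sa',
);

sub new { my ($class, %attr) = @_; return bless {%attr}, $class }

{
    no strict 'refs';
    # One read-only accessor per attribute, named by its long name.
    for my $attr (qw(score bits percent P), keys %short) {
        *{__PACKAGE__ . "::$attr"} = sub { $_[0]{$attr} };
    }
    # Each abbreviation points at the same code as its long name.
    for my $long (keys %short) {
        *{__PACKAGE__ . "::$short{$long}"} = \&{__PACKAGE__ . "::$long"};
    }
}

package main;

my $hsp = MiniHSP->new(score => 65, bits => 29.5, queryBegin => 100, queryEnd => 155);
printf "%d..%d %s\n", $hsp->qb, $hsp->qe, $hsp->bits;
```

Because the short names are aliases of the same subroutines rather than copies, any later change to an accessor applies to both spellings.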
> I've included a little bit of overloading for double quote
> variable interpolation convenience. A subject will return its
> name, and an HSP will return its queryBegin, queryEnd, and the
> bit score of the alignment. Feel free to modify this to suit
> whatever you use most frequently.
>
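The overloading itself takes only a few lines with Perl's overload pragma. A minimal sketch using hypothetical MiniSbjct and MiniHSP classes (the field names are assumptions, chosen to mirror the behavior just described):

```perl
use strict;
use warnings;

# Hypothetical classes (not BPlite's code) showing how "" overloading
# gives the double-quote interpolation behavior.
package MiniSbjct;
use overload '""' => sub { $_[0]{name} }, fallback => 1;
sub new { my ($class, %attr) = @_; return bless {%attr}, $class }

package MiniHSP;
use overload
    '""'     => sub { my $h = shift; "$h->{qb}..$h->{qe} $h->{bits}" },
    fallback => 1;
sub new { my ($class, %attr) = @_; return bless {%attr}, $class }

package main;

my $sbjct = MiniSbjct->new(name => '>foo');
my $hsp   = MiniHSP->new(qb => 100, qe => 155, bits => 29.5);
print "$sbjct\n\t$hsp\n";
```

Setting fallback => 1 lets Perl derive related operations, such as string comparison with eq, from the stringification.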
> So a very simple pass through a BLAST report might look like this:
>
> my $report = new BPlite(\*STDIN);
> while(my $sbjct = $report->nextSbjct) {
>     print "$sbjct\n";
>     while(my $hsp = $sbjct->nextHSP) {
>         print "\t$hsp\n";
>     }
> }
>
> The output of such code might look like this:
>
> >foo
>     100..155 29.5
>     268..300 20.1
> >bar
>     100..153 28.5
>     265..290 22.1
>
> AUTHOR
> Ian Korf (ikorf@sapiens.wustl.edu,
> http://sapiens.wustl.edu/~ikorf)
>
> ACKNOWLEDGEMENTS
> This software was developed at the Genome Sequencing Center at
> Washington University, St. Louis, MO.
>
> COPYRIGHT
> Copyright (C) 1999 Ian Korf. All Rights Reserved.
>
> DISCLAIMER
> This software is provided "as is" without warranty of any kind.
>
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
>
-----------------------------------------------------------------
Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230
<birney@sanger.ac.uk>
http://www.sanger.ac.uk/Users/birney/
-----------------------------------------------------------------