Bioperl: Bio::Tools::Blast vs. Bio::GSC::Tool::Blast
Nigel Brown
brown@ebi.ac.uk
Mon, 8 Jun 1998 19:16:49 +0100 (BST)
A while ago I mentioned the MView software for displaying search results or
pre-computed multiple alignments in an HTML page. The underlying parsers
might also be of interest in this discussion...
Background
----------
The idea here was to have some boiler-plate code for (recursively) breaking
down a (nested) flatfile into records (and sub-records) *on demand*. An
initial pass finds and indexes the top-level record locations, one
intention being that these could be saved for subsequent random access to an
old BLAST run or an EMBL feature table, or whatever. New parsers are
subclassed from these and are relatively easy to build and test, since most
work is reusing the boiler-plate and then just embedding the necessary
regexps for detailed parsing.
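
To make the indexing idea concrete, here is a minimal sketch of the two
steps: one pass that records where each top-level record starts, then
fetching a single record on demand by seeking to its offset. This is not
the code in the Parse library. The names index_records and fetch_record,
the start-of-record regexp and the sample data are all made up for
illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Pass 1: record the byte offset at which each top-level record starts.
# $start_re is a hypothetical pattern matching the first line of a record.
sub index_records {
    my ($fh, $start_re) = @_;
    my @offsets;
    seek($fh, 0, 0) or die "can't rewind: $!";
    my $pos = tell($fh);
    while (defined(my $line = <$fh>)) {
        push @offsets, $pos if $line =~ $start_re;
        $pos = tell($fh);
    }
    return \@offsets;
}

# On demand: seek to record $i and read up to the start of the next
# record, or to end of file for the last record.
sub fetch_record {
    my ($fh, $offsets, $i) = @_;
    my $begin = $offsets->[$i];
    defined $begin or die "no record $i\n";
    seek($fh, $begin, 0) or die "can't seek: $!";
    if ($i < $#$offsets) {
        my $len = $offsets->[$i + 1] - $begin;
        my $n = read($fh, my $text, $len);
        defined $n or die "read failed: $!";
        return $text;
    }
    local $/;    # slurp the rest of the file
    my $text = <$fh>;
    return defined $text ? $text : '';
}

# Demo on an in-memory file holding two made-up records.
my $data = "BLASTP one\nhit a\nhit b\nBLASTP two\nhit c\n";
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
my $index = index_records($fh, qr/^BLASTP/);
print scalar(@$index), " records indexed\n";
print fetch_record($fh, $index, 1);
close $fh;
```

The offsets array is the part that could be saved to disk and reloaded
later, so that an old run can be reopened without rescanning it.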
There is NO attempt at providing any kind of higher-level behaviour such as
initiating runs or embedding HTML - that's for the caller. If the caller
wants some datum, they access it explicitly as an object attribute (ouch!),
i.e., there are no nice access methods.
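
For anyone unsure what the difference looks like in practice, here is a
small sketch. It assumes the parsed objects are blessed hashes, which is an
assumption made for this example only. The class Demo::Hit and its fields
are hypothetical and are not part of the Parse library.

```perl
use strict;
use warnings;

package Demo::Hit;    # hypothetical class, for illustration only

sub new {
    my ($class, %fields) = @_;
    return bless { %fields }, $class;
}

# What a "nice" access method would look like (the parsers have none).
sub score { $_[0]->{score} }

package main;

# Made-up values.
my $hit = Demo::Hit->new(id => 'hypothetical_id', score => 87);

# Direct attribute access: the caller reaches into the object itself.
print "direct attribute: $hit->{score}\n";

# Accessor method: the caller is shielded from the internal layout.
print "via accessor:     ", $hit->score, "\n";
```

Direct access is less typing for the parser author, but callers break if
the internal layout of the object ever changes.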
Parsers already there
---------------------
The ftp archive (see below) contains the various test datafiles I use for
MView production, plus simple test scripts for loading them, together with
the expected output.
examples/blastn:
    blastn_1.4.9.big.dat
    blastn_1.4.9.dat
    blastn_2.0a19MP-WashU.dat
examples/blastp:
    blast2_2.0.4.dat
    blast2_2.0a13MP-WashU.dat
    blastp_1.4.7.dat
    blastp_1.4.9+hist.dat
    psi-blast_2.0.2.dat
    psi-blast_2.0.4.dat
examples/blastx:
    blastx_1.4.9.dat
examples/fasta:
    fasta_1.6c24.dat
    fasta_2.0u.dat
    fasta_2.0u.dna.dat
    fasta_3.0t76.dat
    tfastx_2.0u.dat
    tfastx_3.0t.dat
examples/hssp:
    9wga.hssp
examples/multi:
    clu_1.51.dat
    clu_1.60.dat
    clu_1.70.dat
    msf.1.dat
    msf.2.dat
    msf.3.dat
    pear.dat
I also have parsers for EMBL/GenBank, concentrating on the Feature Table,
but these are less complete, since I have no need of them right now - this
stuff is driven by my needs rather than untainted altruism.
Todo if I ever find time and this doesn't become obsolete
---------------------------------------------------------
Write Pods.
Write a meta-programming tool for defining regexps and actions (a la
Icarus, I suppose, but simpler?) that would be used to synthesize the actual
parser subclass from the boiler-plate stuff.
Some kind of self-documenting meta-level description of any format (ASN.1?).
Systematize the BLAST and FASTA class hierarchies better.
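
As a rough illustration of the regexps-and-actions item above, here is one
way such a tool might start out: a table pairing each regexp with an
action, and a function that builds a parser from the table. This is only a
sketch of the idea, not existing code. The rule table, make_parser and the
sample lines are invented for the example, and the sample lines are
simplified rather than real BLAST output.

```perl
use strict;
use warnings;

# Hypothetical rule table: each rule pairs a regexp with an action that
# stores the captured values in the record being built.
my @rules = (
    [ qr/^Query=\s*(\S+)/,       sub { $_[0]{query}  = $_[1] } ],
    [ qr/^Length\s*=\s*(\d+)/,   sub { $_[0]{length} = $_[1] } ],
    [ qr/^Score\s*=\s*([\d.]+)/, sub { push @{ $_[0]{scores} }, $_[1] } ],
);

# Synthesize a parser (a closure) from a rule table.
sub make_parser {
    my @table = @_;
    return sub {
        my ($text) = @_;
        my %record;
        LINE: for my $line (split /\n/, $text) {
            for my $rule (@table) {
                my ($re, $action) = @$rule;
                if (my @captures = $line =~ $re) {
                    $action->(\%record, @captures);
                    next LINE;    # first matching rule wins
                }
            }
        }
        return \%record;
    };
}

my $parse = make_parser(@rules);
my $rec = $parse->("Query= seq1\nLength = 120\nScore = 55.1\nScore = 40\n");
print "$rec->{query} $rec->{length} @{ $rec->{scores} }\n";
```

A new format would then need only a new rule table, which is roughly the
role the embedded regexps play in the hand-written subclasses.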
The code
--------
MView:
http://columba.ebi.ac.uk:8765/mview/
Parser stuff (anon ftp):
www.sander.ebi.ac.uk:/pub/nige/Parse.tar.gz
Example test script
-------------------
This test script parses PSI-BLAST output, whose subrecords are named like this:
HEADER
RANKING
SEARCH
    PROTEIN
        PSUM
        PHIT
        PHIT
        PHIT
        ...
    PROTEIN
        PSUM
        PHIT
        PHIT
        ...
SEARCH
    ...
PARAMETERS
WARNINGS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#!/usr/bin/env perl5
$^W = 1;
use strict;
use lib '/people/nbrown/work/perl/lib';
use Universal;
use Parse::Search::BLAST2;

# Default input file; any command-line arguments override it.
my @datfiles = qw(test.bp2);
@datfiles = @ARGV if @ARGV;

my ($file, $entry, $ob, $search, $frag);

foreach $file (@datfiles) {
    open(DATA, "< $file") or die "can't open $file: $!\n";

    # Loop over the top-level entries in the file.
    while ($entry = Parse::Search::BLAST2::scan_entry($file, *DATA)) {
        $entry->print; print "\n";
        print $entry->string('Header'); print "\n";
        print "COUNTS: ", join(", ", $entry->count), "\n";

        # Subrecords are only parsed when asked for.
        if ($entry->count('WARNING')) {
            foreach $ob ($entry->parse('WARNING')) {
                $ob->print; print "\n";
            }
        }
        foreach $ob ($entry->parse('HEADER')) {
            $ob->print; print "\n";
            #print $ob->string; print "\n";
        }

        # Each SEARCH holds a RANKING and a list of PROTEIN records.
        foreach $search ($entry->parse('SEARCH')) {
            $search->print; print "\n";
            foreach $ob ($search->parse('RANKING')) {
                #print $ob->string; print "\n";
                $ob->print; print "\n";
            }
            # Each PROTEIN holds a PSUM and one or more PHIT records.
            foreach $ob ($search->parse('PROTEIN')) {
                $ob->print; print "\n";
                foreach $frag ($ob->parse('PSUM')) {
                    $frag->print; print "\n";
                }
                foreach $frag ($ob->parse('PHIT')) {
                    $frag->print; print "\n";
                }
            }
        }
        $entry->free;
    }
    close DATA;
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Enjoy!
n
--
---------------------------------------------------------------------------
Nigel P. Brown, Ph.D. Nigel.Brown@ebi.ac.uk
http://www.sander.ebi.ac.uk/~brown/ Tel: +44 (0)1223 494 451 FAX: 468
European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
---------------------------------------------------------------------------