[Bioperl-l] hacking BlastTable.pm to support blast -m 8
Chris Fields
cjfields at illinois.edu
Tue Oct 27 19:50:24 UTC 2009
On Oct 27, 2009, at 2:31 PM, Tristan Lefebure wrote:
> Hello,
> As far as I understand, Bio::Index::BlastTable only supports
> the -m 9 blast format. Another popular and more compact
> format is -m 8, the main difference being that the blast
> program, the query and database, and the different field
> names are not reported between each search, i.e. you get
> a much cleaner table (which looks much easier to parse).
>
> By looking at BlastTable.pm, it looks like the main
> hack would be in the sub _index_file. Right now it is:
>
> sub _index_file {
> my( $self,
> $file, # File name
> $i, # Index-number of file being indexed
> ) = @_;
>
> my( $begin, # Offset from start of file of the start
> # of the last found record.
> );
>
> open(my $BLAST, '<', $file) or $self->throw("cannot open file
> $file\n");
> my $indexpoint = 0;
> my $lastline = 0;
> while( <$BLAST> ) {
> if(m{^#\s+T?BLAST[PNX]} ) {
> my $len = length $_;
> $indexpoint = tell($BLAST)-$len;
> }
> if(m{^#\s+Query:\s+([^\n]+)}) {
> foreach my $id ($self->id_parser()->($1)) {
> $self->debug("id is $id, begin is
> $indexpoint\n");
> $self->add_record($id, $i,
> $indexpoint);
> }
> }
> }
> }
>
> Using the -m 8 format, is it me or this could be
> done by getting the query name from the first row
> of the blast table, find when the hits for this query
> starts and stop, and give this to add_record()?
>
> I'm kind of not sure to get all the details
> regarding the $i and $indexpoint... so well, if an
> expert eye could give me some advice or hack the code
> that would be nice ;)
>
> --Tristan
That should be feasible, yes, and you are correct. The main thing to
make sure of is to retain the '#' for -m9, so the parser catches the
BLAST executable and other info.
I'll go ahead and do this based on your suggestion, unless you have a
patch ready. Also, it looks like the module is missing tests, so I
can work on adding those for both -m8 and -m9 output.
chris
More information about the Bioperl-l
mailing list