[Bioperl-l] Parsing FASTA m10 output
Ioannis Kirmitzoglou
ioanniskirmitzoglou at gmail.com
Thu Apr 19 14:06:06 UTC 2007
I have reported it as a bug on the bugzilla but due to bugzilla problems I
was not able to attach my code and/or sample m10 files.
Nevertheless here is the code that converts an m10 fasta output to an m8
BLAST output which is parseable by the vast majority of software.
<----------- CODE BEGINS HERE ------------------->
#!/usr/bin/perl -w
=head1 NAME
fastam10_to_table - turn FASTA -m 10 output into NCBI -m 8 tabular output
=head1 SYNOPSIS
fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
=head1 DESCRIPTION
Command line options:
--header -- boolean flag to print column header
-o/--out -- optional outputfile to write data,
otherwise will write to STDOUT
-h/--help -- show this documentation
Not technically a SearchIO script as this doesn't use any Bioperl
components but is a useful and fast. The output is tabular output
with the standard NCBI -m8 columns.
queryname
hit name
percent identity
alignment length
number mismatches
number gaps
query start (if on rev-strand start > end)
query end
hit start (if on rev-strand start > end)
hit end
evalue
bit score
Additionally 4 more columns are provided
percent similar
query length
hit length
query gaps
hit gaps
=head1 AUTHOR - Ioannis Kirmitzoglou
Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
=head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
Headers as well as portions of code were taken
from fastam9_to_table.pl by Jason Stajich
=head1 DISCLAIMER
Copyright (c) <2007> <Ioannis Kirmitzolgou>
Permission to use, copy, modify, merge, publish and distribute
this software and its documentation, with or without modification,
for any purpose, and without fee or royalty to the copyright holder(s)
is hereby granted with no restictions and/or prerequisites.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
=cut
use strict;
use Getopt::Long;
my %data=();
my $outfile=''; my $header='';
GetOptions(
'header' => \$header,
'o|out|outfile:s' => \$outfile,
'h|help' => sub { exec('perldoc',$0); exit; }
);
my $outfh;
if( $outfile ) {
open($outfh, ">$outfile") || die("$outfile: $!");
} else {
$outfh = \*STDOUT;
}
$/="\n>>>";
my @fields = qw(qname hname percid alen mmcount gapcount
qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);
print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)), "\n" if
$header;
while (<>) {
chomp;
if ($_=~/^>/ || $_=~/^\#/) {next;}
my @hits = split(/\d+>>/, $_);
@hits= split("\n>>", $hits[0]);
my $hit = shift @hits;
($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
foreach my $align (@hits) {
my @details= split ("\n>", $align);
my $detail = shift @details;
($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
$detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
$data{'bits'}=$1;
$detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
$data{'evalue'}=$1;
my $term = quotemeta("; sw_score");
$term =~ s/\\ /\\s/;
$detail=~ /$term\s+(\S+)/;
$data{'score'}=$1;
$term = quotemeta("; sw_ident:");
$term =~ s/\\ /\\s/;
$detail=~ /$term\s+(\S+)/;
$data{'percid'}=$1;
$term = quotemeta("; sw_sim:");
$term =~ s/\\ /\\s/;
$detail=~ /$term\s+(\S+)/;
$data{'percsim'}=$1;
$term = quotemeta("; sw_overlap:");
$term =~ s/\\ /\\s/;
$detail=~ /$term\s+(\S+)/;
$data{'alen'}=$1;
$detail = shift @details;
$term = quotemeta("; al_start:");
$term =~ s/\\ /\\s/;
$detail=~ /$term\s+(\S+)/;
$data{'qstart'}=$1;
$term = quotemeta("; al_stop:");
$term =~ s/\\ /\\s/;
$detail=~ /$term\s+(\S+)/;
$data{'qend'}=$1;
$term = quotemeta("; al_display_start:");
$term =~ s/\\ /\\s/;
my $lakis ='';
$detail=~ /$term.+\s\-*([\-\w\s]+)/g;
$data{'qgap'}=($1 =~ tr/\-//);
$detail = shift @details;
$term = quotemeta("; sq_len:");
$term =~ s/\\ /\\s/;
$detail=~ /$term\s+(\S+)/;
$data{'hlen'}=$1;
$term = quotemeta("; al_start:");
$term =~ s/\\ /\\s/;
$detail=~ /$term\s+(\S+)/;
$data{'hstart'}=$1;
$term = quotemeta("; al_stop:");
$term =~ s/\\ /\\s/;
$detail=~ /$term\s+(\S+)/;
$data{'hend'}=$1;
$term = quotemeta("; al_display_start:");
$term =~ s/\\ /\\s/;
$detail=~ /$term.+\s\-*([\-\w\s]+)/g;
$data{'hgap'}=($1 =~ tr/-//);
$data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
$data{'mmcount'} = $data{'alen'} - ( int($data{'percid'} *
$data{'alen'}) + $data{'gapcount'});
for ( $data{'percid'}, $data{'percsim'} ) {
$_ = sprintf("%.2f",$_*100);
}
print $outfh join( "\t",map { $data{$_} } @fields),"\n"
}
}
<----------------- CODE ENDS HERE ---------------------->
--
*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus
More information about the Bioperl-l
mailing list