[Bioperl-l] Parsing FASTA m10 output

Sat Apr 21 17:14:10 UTC 2007

I haven't kept track of this - did this go anywhere? Do we not have  
an -m10 fasta output parser in SearchIO? (I.e., my first thought  
would be that that would be the desired solution; am I misled in this?)

	-hilmar

On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:

> I have reported it as a bug on the bugzilla but due to bugzilla  
> problems I
> was not able to attach my code and/or sample m10 files.
> Nevertheless here is the code that converts an m10 fasta output to  
> an m8
> BLAST output which is parseable by the vast majority of software.
>
> <----------- CODE BEGINS HERE ------------------->
>
> #!/usr/bin/perl -w
>
> =head1 NAME
>
> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular  
> output
>
> =head1 SYNOPSIS
>
>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>
> =head1 DESCRIPTION
>
> Command line options:
>   --header                -- boolean flag to print column header
>   -o/--out                -- optional outputfile to write data,
>                              otherwise will write to STDOUT
>   -h/--help               -- show this documentation
>
> Not technically a SearchIO script as this doesn't use any Bioperl
> components but is a useful and fast.  The output is tabular output
> with the standard NCBI -m8 columns.
>
>  queryname
>  hit name
>  percent identity
>  alignment length
>  number mismatches
>  number gaps
>  query start  (if on rev-strand start > end)
>  query end
>  hit start (if on rev-strand start > end)
>  hit end
>  evalue
>  bit score
>
> Additionally 4 more columns are provided
>  percent similar
>  query length
>  hit length
>  query gaps
>  hit gaps
>
> =head1 AUTHOR - Ioannis Kirmitzoglou
>
> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>
> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>
> Headers as well as portions of code were taken
>> from fastam9_to_table.pl by Jason Stajich
>
> =head1 DISCLAIMER
>
> Copyright (c) <2007> <Ioannis Kirmitzolgou>
>
> Permission to use, copy, modify, merge, publish and distribute
> this software and its documentation, with or without modification,
> for any purpose, and without fee or royalty to the copyright holder(s)
> is hereby granted with no restictions and/or prerequisites.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>
> =cut
>
> use strict;
> use Getopt::Long;
>
> my %data=();
>
> my $outfile=''; my $header='';
> GetOptions(
>     'header'              => \$header,
>     'o|out|outfile:s'     => \$outfile,
>     'h|help'              => sub { exec('perldoc',$0); exit; }
>        );
>
> my $outfh;
> if( $outfile ) {
>     open($outfh, ">$outfile") || die("$outfile: $!");
> } else {
>     $outfh = \*STDOUT;
> }
>
>
> $/="\n>>>";
>
> my @fields = qw(qname hname percid alen mmcount gapcount
>         qstart qend hstart hend evalue bits percsim qlen hlen qgap  
> hgap);
>
> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),  
> "\n" if
> $header;
>
> while (<>) {
>
>         chomp;
>         if ($_=~/^>/ || $_=~/^\#/) {next;}
>         my @hits = split(/\d+>>/, $_);
>         @hits= split("\n>>", $hits[0]);
>
>         my $hit = shift @hits;
>
>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d 
> +)/));
>
>         foreach my $align (@hits) {
>
>             my @details= split ("\n>", $align);
>            my $detail = shift @details;
>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>             $data{'bits'}=$1;
>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>             $data{'evalue'}=$1;
>
>             my $term = quotemeta("; sw_score");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'score'}=$1;
>
>             $term = quotemeta("; sw_ident:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'percid'}=$1;
>
>             $term = quotemeta("; sw_sim:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'percsim'}=$1;
>
>             $term = quotemeta("; sw_overlap:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'alen'}=$1;
>
>             $detail = shift @details;
>
>             $term = quotemeta("; al_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'qstart'}=$1;
>
>             $term = quotemeta("; al_stop:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'qend'}=$1;
>
>             $term = quotemeta("; al_display_start:");
>             $term =~ s/\\ /\\s/;
>             my $lakis ='';
>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>
>             $data{'qgap'}=($1 =~ tr/\-//);
>
>             $detail = shift @details;
>
>             $term = quotemeta("; sq_len:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hlen'}=$1;
>
>             $term = quotemeta("; al_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hstart'}=$1;
>
>             $term = quotemeta("; al_stop:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hend'}=$1;
>
>             $term = quotemeta("; al_display_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>             $data{'hgap'}=($1 =~ tr/-//);
>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>             $data{'mmcount'} = $data{'alen'} - ( int($data{'percid'} *
> $data{'alen'}) + $data{'gapcount'});
>
> for ( $data{'percid'}, $data{'percsim'} ) {
>     $_ = sprintf("%.2f",$_*100);
> }
>
>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
>         }
>
> }
>
> <----------------- CODE ENDS HERE ---------------------->
>
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================