[Bioperl-l] Parsing FASTA m10 output
Hilmar Lapp
hlapp at gmx.net
Sat Apr 21 17:14:10 UTC 2007
I haven't kept track of this - did this go anywhere? Do we not have
an -m10 fasta output parser in SearchIO? (I.e., my first thought
would be that that would be the desired solution; am I misled in this?)
-hilmar
On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
> I have reported it as a bug on the bugzilla but due to bugzilla
> problems I
> was not able to attach my code and/or sample m10 files.
> Nevertheless here is the code that converts an m10 fasta output to
> an m8
> BLAST output which is parseable by the vast majority of software.
>
> <----------- CODE BEGINS HERE ------------------->
>
> #!/usr/bin/perl -w
>
> =head1 NAME
>
> fastam10_to_table - turn FASTA -m 10 output into NCBI -m 8 tabular
> output
>
> =head1 SYNOPSIS
>
> fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>
> =head1 DESCRIPTION
>
> Command line options:
> --header -- boolean flag to print column header
> -o/--out -- optional outputfile to write data,
> otherwise will write to STDOUT
> -h/--help -- show this documentation
>
> Not technically a SearchIO script as this doesn't use any Bioperl
> components but is a useful and fast. The output is tabular output
> with the standard NCBI -m8 columns.
>
> queryname
> hit name
> percent identity
> alignment length
> number mismatches
> number gaps
> query start (if on rev-strand start > end)
> query end
> hit start (if on rev-strand start > end)
> hit end
> evalue
> bit score
>
> Additionally 4 more columns are provided
> percent similar
> query length
> hit length
> query gaps
> hit gaps
>
> =head1 AUTHOR - Ioannis Kirmitzoglou
>
> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>
> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>
> Headers as well as portions of code were taken
>> from fastam9_to_table.pl by Jason Stajich
>
> =head1 DISCLAIMER
>
> Copyright (c) <2007> <Ioannis Kirmitzolgou>
>
> Permission to use, copy, modify, merge, publish and distribute
> this software and its documentation, with or without modification,
> for any purpose, and without fee or royalty to the copyright holder(s)
> is hereby granted with no restictions and/or prerequisites.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>
> =cut
>
> use strict;
> use Getopt::Long;
>
> my %data=();
>
> my $outfile=''; my $header='';
> GetOptions(
> 'header' => \$header,
> 'o|out|outfile:s' => \$outfile,
> 'h|help' => sub { exec('perldoc',$0); exit; }
> );
>
> my $outfh;
> if( $outfile ) {
> open($outfh, ">$outfile") || die("$outfile: $!");
> } else {
> $outfh = \*STDOUT;
> }
>
>
> $/="\n>>>";
>
> my @fields = qw(qname hname percid alen mmcount gapcount
> qstart qend hstart hend evalue bits percsim qlen hlen qgap
> hgap);
>
> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),
> "\n" if
> $header;
>
> while (<>) {
>
> chomp;
> if ($_=~/^>/ || $_=~/^\#/) {next;}
> my @hits = split(/\d+>>/, $_);
> @hits= split("\n>>", $hits[0]);
>
> my $hit = shift @hits;
>
> ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d
> +)/));
>
> foreach my $align (@hits) {
>
> my @details= split ("\n>", $align);
> my $detail = shift @details;
> ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
> $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
> $data{'bits'}=$1;
> $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
> $data{'evalue'}=$1;
>
> my $term = quotemeta("; sw_score");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term\s+(\S+)/;
> $data{'score'}=$1;
>
> $term = quotemeta("; sw_ident:");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term\s+(\S+)/;
> $data{'percid'}=$1;
>
> $term = quotemeta("; sw_sim:");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term\s+(\S+)/;
> $data{'percsim'}=$1;
>
> $term = quotemeta("; sw_overlap:");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term\s+(\S+)/;
> $data{'alen'}=$1;
>
> $detail = shift @details;
>
> $term = quotemeta("; al_start:");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term\s+(\S+)/;
> $data{'qstart'}=$1;
>
> $term = quotemeta("; al_stop:");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term\s+(\S+)/;
> $data{'qend'}=$1;
>
> $term = quotemeta("; al_display_start:");
> $term =~ s/\\ /\\s/;
> my $lakis ='';
> $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>
> $data{'qgap'}=($1 =~ tr/\-//);
>
> $detail = shift @details;
>
> $term = quotemeta("; sq_len:");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term\s+(\S+)/;
> $data{'hlen'}=$1;
>
> $term = quotemeta("; al_start:");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term\s+(\S+)/;
> $data{'hstart'}=$1;
>
> $term = quotemeta("; al_stop:");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term\s+(\S+)/;
> $data{'hend'}=$1;
>
> $term = quotemeta("; al_display_start:");
> $term =~ s/\\ /\\s/;
> $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
> $data{'hgap'}=($1 =~ tr/-//);
> $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
> $data{'mmcount'} = $data{'alen'} - ( int($data{'percid'} *
> $data{'alen'}) + $data{'gapcount'});
>
> for ( $data{'percid'}, $data{'percsim'} ) {
> $_ = sprintf("%.2f",$_*100);
> }
>
> print $outfh join( "\t",map { $data{$_} } @fields),"\n"
> }
>
> }
>
> <----------------- CODE ENDS HERE ---------------------->
>
> --
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
More information about the Bioperl-l
mailing list