[Bioperl-l] Parsing FASTA m10 output
Jason Stajich
jason at bioperl.org
Sat Apr 21 17:44:00 UTC 2007
We don't have one yet. This is a new format introduced in the most
recent release of FASTA. Hopefully someone can make some time to add
some code to SearchIO::fasta for it.
I do find that I when I need a fast FASTA to TAB converter that the
simple script (fastam9_to_table) is more efficient that SearchIO
framework so Ioannis is making a parallel one for the new m10
output. So I think having both is useful.
-jason
On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
> I haven't kept track of this - did this go anywhere? Do we not have
> an -m10 fasta output parser in SearchIO? (I.e., my first thought
> would be that that would be the desired solution; am I misled in
> this?)
>
> -hilmar
>
> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
>
>> I have reported it as a bug on the bugzilla but due to bugzilla
>> problems I
>> was not able to attach my code and/or sample m10 files.
>> Nevertheless here is the code that converts an m10 fasta output to
>> an m8
>> BLAST output which is parseable by the vast majority of software.
>>
>> <----------- CODE BEGINS HERE ------------------->
>>
>> #!/usr/bin/perl -w
>>
>> =head1 NAME
>>
>> fastam10_to_table - turn FASTA -m 10 output into NCBI -m 8 tabular
>> output
>>
>> =head1 SYNOPSIS
>>
>> fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>>
>> =head1 DESCRIPTION
>>
>> Command line options:
>> --header -- boolean flag to print column header
>> -o/--out -- optional outputfile to write data,
>> otherwise will write to STDOUT
>> -h/--help -- show this documentation
>>
>> Not technically a SearchIO script as this doesn't use any Bioperl
>> components but is a useful and fast. The output is tabular output
>> with the standard NCBI -m8 columns.
>>
>> queryname
>> hit name
>> percent identity
>> alignment length
>> number mismatches
>> number gaps
>> query start (if on rev-strand start > end)
>> query end
>> hit start (if on rev-strand start > end)
>> hit end
>> evalue
>> bit score
>>
>> Additionally 4 more columns are provided
>> percent similar
>> query length
>> hit length
>> query gaps
>> hit gaps
>>
>> =head1 AUTHOR - Ioannis Kirmitzoglou
>>
>> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>>
>> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>>
>> Headers as well as portions of code were taken
>>> from fastam9_to_table.pl by Jason Stajich
>>
>> =head1 DISCLAIMER
>>
>> Copyright (c) <2007> <Ioannis Kirmitzolgou>
>>
>> Permission to use, copy, modify, merge, publish and distribute
>> this software and its documentation, with or without modification,
>> for any purpose, and without fee or royalty to the copyright holder
>> (s)
>> is hereby granted with no restictions and/or prerequisites.
>>
>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> NONINFRINGEMENT.
>> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
>> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
>> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
>> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>>
>> =cut
>>
>> use strict;
>> use Getopt::Long;
>>
>> my %data=();
>>
>> my $outfile=''; my $header='';
>> GetOptions(
>> 'header' => \$header,
>> 'o|out|outfile:s' => \$outfile,
>> 'h|help' => sub { exec('perldoc',$0); exit; }
>> );
>>
>> my $outfh;
>> if( $outfile ) {
>> open($outfh, ">$outfile") || die("$outfile: $!");
>> } else {
>> $outfh = \*STDOUT;
>> }
>>
>>
>> $/="\n>>>";
>>
>> my @fields = qw(qname hname percid alen mmcount gapcount
>> qstart qend hstart hend evalue bits percsim qlen hlen qgap
>> hgap);
>>
>> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),
>> "\n" if
>> $header;
>>
>> while (<>) {
>>
>> chomp;
>> if ($_=~/^>/ || $_=~/^\#/) {next;}
>> my @hits = split(/\d+>>/, $_);
>> @hits= split("\n>>", $hits[0]);
>>
>> my $hit = shift @hits;
>>
>> ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d
>> +)/));
>>
>> foreach my $align (@hits) {
>>
>> my @details= split ("\n>", $align);
>> my $detail = shift @details;
>> ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>> $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>> $data{'bits'}=$1;
>> $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>> $data{'evalue'}=$1;
>>
>> my $term = quotemeta("; sw_score");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term\s+(\S+)/;
>> $data{'score'}=$1;
>>
>> $term = quotemeta("; sw_ident:");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term\s+(\S+)/;
>> $data{'percid'}=$1;
>>
>> $term = quotemeta("; sw_sim:");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term\s+(\S+)/;
>> $data{'percsim'}=$1;
>>
>> $term = quotemeta("; sw_overlap:");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term\s+(\S+)/;
>> $data{'alen'}=$1;
>>
>> $detail = shift @details;
>>
>> $term = quotemeta("; al_start:");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term\s+(\S+)/;
>> $data{'qstart'}=$1;
>>
>> $term = quotemeta("; al_stop:");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term\s+(\S+)/;
>> $data{'qend'}=$1;
>>
>> $term = quotemeta("; al_display_start:");
>> $term =~ s/\\ /\\s/;
>> my $lakis ='';
>> $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>>
>> $data{'qgap'}=($1 =~ tr/\-//);
>>
>> $detail = shift @details;
>>
>> $term = quotemeta("; sq_len:");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term\s+(\S+)/;
>> $data{'hlen'}=$1;
>>
>> $term = quotemeta("; al_start:");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term\s+(\S+)/;
>> $data{'hstart'}=$1;
>>
>> $term = quotemeta("; al_stop:");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term\s+(\S+)/;
>> $data{'hend'}=$1;
>>
>> $term = quotemeta("; al_display_start:");
>> $term =~ s/\\ /\\s/;
>> $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>> $data{'hgap'}=($1 =~ tr/-//);
>> $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>> $data{'mmcount'} = $data{'alen'} - ( int($data
>> {'percid'} *
>> $data{'alen'}) + $data{'gapcount'});
>>
>> for ( $data{'percid'}, $data{'percsim'} ) {
>> $_ = sprintf("%.2f",$_*100);
>> }
>>
>> print $outfh join( "\t",map { $data{$_} } @fields),"\n"
>> }
>>
>> }
>>
>> <----------------- CODE ENDS HERE ---------------------->
>>
>> --
>>
>> *Ioannis Kirmitzoglou*, MSc
>> PhD. Student,
>> Bioinformatics Research Laboratory
>> Department of Biological Sciences
>> University of Cyprus
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/
More information about the Bioperl-l
mailing list