[Bioperl-l] Bio::SearchIO::Fasta problem

Martin A. Hansen maasha at image.dk
Mon Aug 25 10:14:39 EDT 2003


hi

I'm trying to parse FASTA search reports with Bio::SearchIO, but I get this
warning message:

maasha at homer:~/bin$ parse_fasta btg1.fasta 

-------------------- WARNING ---------------------
MSG: unrecognized FASTA Family report file!
---------------------------------------------------

this suggests that something is wrong with the fasta report file, but I'm not
sure what. am I supposed to run a particular version of fasta, with a
particular set of options? for example, I have noticed that running fasta from
the Wisconsin Package (GCG) outputs a double dot (..) between the intro text
and the data:

The best scores are:                    init1 initn   opt    z-sc E(7402)..

whereas running "normal" (non-GCG) fasta does not produce the double dot.
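
a rough way to check which of the two styles a given report uses is to look at
that header line directly. the sketch below is plain Perl and does not go
through Bio::SearchIO. the sub name has_gcg_double_dot is made up for
illustration, it assumes the header always starts with "The best scores are:"
as in the line quoted above, and the second test line is just that quoted
header with the dots removed, not a line copied from a real report:

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical helper (not part of Bioperl): returns 1 if the
# "The best scores are:" header line ends in a double dot, 0 if the
# header is present without it, and undef if no header line is found.
sub has_gcg_double_dot
{
    my ( $lines,   # reference to a list of report lines
       ) = @_;

    foreach my $line ( @{ $lines } )
    {
        next if $line !~ /^\s*The best scores are:/;

        return $line =~ /\.\.\s*$/ ? 1 : 0;
    }

    return;
}

# header as quoted above (GCG style), and the same header with the dots stripped
my @gcg    = ( "The best scores are:                    init1 initn   opt    z-sc E(7402).." );
my @normal = ( "The best scores are:                    init1 initn   opt    z-sc E(7402)" );

print "gcg style: ",    has_gcg_double_dot( \@gcg ),    "\n";   # prints "gcg style: 1"
print "normal style: ", has_gcg_double_dot( \@normal ), "\n";   # prints "normal style: 0"
```

this only tells the two header styles apart; it says nothing about whether
Bio::SearchIO accepts either of them.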

and to make matters worse, I can't work out how to tell the different fasta
versions apart :/

anyway, here is the code I'm using to parse the report:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;

my ( $script, $usage, $file );

$script = ( split "/", $0 )[ -1 ];

$usage = qq(

$script by Martin A. Hansen, August 2003.

$script parses a FASTA report file

Usage: $script [file]
               [file]       - file with fasta report

);

print $usage and exit if not @ARGV;

$file = shift @ARGV;


# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MAIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


my ( $lines );

$lines = &parse_fasta( $file );

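# each element is a hash reference, so print its fields rather than the reference itself
foreach my $line ( @{ $lines } )
{
    print join( "\t", $line->{ "QUERY_NAME" },   $line->{ "QUERY_STRING" } ),   "\n";
    print join( "\t", $line->{ "SUBJECT_NAME" }, $line->{ "SUBJECT_STRING" } ), "\n";
}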

exit;


# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SUBROUTINES <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


sub parse_fasta
{
    # Martin A. Hansen, August 2003.

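    # parses FASTA reports using Bioperl

    my ( $file,   # file with FASTA report
       ) = @_;

    # returns a list of hashes (or a reference to that list in scalar context),
    # one per hit, holding the query and subject names and alignment strings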

    my ( $result, $hit, $hit_name, $searchio, $white_space, $query_beg, $hsp, $hit_string, @lines, $query_string, $query_name );

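    $searchio = Bio::SearchIO->new( -format => 'fasta', -file => $file );
    $result   = $searchio->next_result;

    # bail out with an empty list if no result could be parsed; otherwise
    # the loop below would die calling next_hit on an undefined value
    return wantarray ? () : [] unless defined $result;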

    while ( $hit = $result->next_hit )
    {
        $query_name   = $result->query_name;
        $hit_name     = $hit->name;
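        $hsp          = $hit->next_hsp;   # only the first HSP of each hit is used

        next if not defined $hsp;         # skip hits without an alignment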

        $query_string = $hsp->query_string;
        $query_beg    = $hsp->query->start;
        $hit_string   = $hsp->hit_string;

        $white_space  = ' ' x ( $query_beg - 1 );

        push @lines, {
                       "QUERY_NAME"     => $query_name,
                       "QUERY_STRING"   => $white_space . $query_string, 
                       "SUBJECT_NAME"   => $hit_name,
                       "SUBJECT_STRING" => $white_space . $hit_string,
        }
    }

    return wantarray ? @lines : \@lines;
}




# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


__END__
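
the sub above only looks at the first result in the file and the first HSP of
each hit. for comparison, here is a rough sketch of the same loop walking every
result, hit and HSP, using only the Bio::SearchIO calls already used above. the
sub name collect_hsps is made up, the whitespace padding is left out to keep it
short, and Bio::SearchIO is only loaded when a file is given on the command
line:

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical helper: takes an object with a next_result method
# (e.g. a Bio::SearchIO object) and returns a reference to a list
# of hashes, one per HSP.
sub collect_hsps
{
    my ( $searchio,   # object providing next_result
       ) = @_;

    my ( @rows );

    # each next_* call returns undef when exhausted, which ends its loop
    while ( my $result = $searchio->next_result )
    {
        while ( my $hit = $result->next_hit )
        {
            while ( my $hsp = $hit->next_hsp )
            {
                push @rows, {
                              "QUERY_NAME"     => $result->query_name,
                              "QUERY_STRING"   => $hsp->query_string,
                              "SUBJECT_NAME"   => $hit->name,
                              "SUBJECT_STRING" => $hsp->hit_string,
                };
            }
        }
    }

    return \@rows;
}

if ( @ARGV )
{
    require Bio::SearchIO;

    my $searchio = Bio::SearchIO->new( -format => 'fasta', -file => $ARGV[ 0 ] );
    my $rows     = collect_hsps( $searchio );

    warn "no HSPs parsed from $ARGV[ 0 ]\n" if not @{ $rows };

    foreach my $row ( @{ $rows } )
    {
        print join( "\t", $row->{ "QUERY_NAME" }, $row->{ "SUBJECT_NAME" } ), "\n";
    }
}
```

if the report is not recognized, the outer loop simply never runs and the list
comes back empty, so the caller can tell "nothing parsed" from a crash.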



any suggestions?


martin

