[Bioperl-l] Bio::SearchIO::Fasta problem
Martin A. Hansen
maasha at image.dk
Mon Aug 25 10:14:39 EDT 2003
Hi,
I'm trying to parse FASTA search reports with Bio::SearchIO. However, I get this
warning message:
maasha@homer:~/bin$ parse_fasta btg1.fasta
-------------------- WARNING ---------------------
MSG: unrecognized FASTA Family report file!
---------------------------------------------------
This suggests that something is wrong with the FASTA report file,
but I'm not sure what that could be. Am I supposed to run a certain version of
FASTA, and with a certain set of options? For example, I have noticed that running
FASTA from the Wisconsin Package (GCG) outputs a double dot (..) between the
intro text and the data:
The best scores are: init1 initn opt z-sc E(7402)..
whereas running "normal" FASTA does not produce the double dot.
To make matters worse, I am unable to tell the different FASTA
versions apart :/
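One way to check for the double dot before handing a report to Bio::SearchIO is sketched below. It is only a heuristic based on the single header line quoted above: guess_fasta_flavour is a hypothetical helper, not part of BioPerl, and the labels 'gcg', 'plain' and 'unknown' are made up for illustration. It may well misclassify reports from other FASTA versions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): classify a report by its
# "The best scores are:" line. Assumption: GCG output ends that line
# with a double dot, as observed above; nothing else is checked.
sub guess_fasta_flavour {
    my ($lines) = @_;    # array reference of report lines
    for my $line ( @{$lines} ) {
        next unless $line =~ /^The best scores are:/;
        return $line =~ /\.\.\s*$/ ? 'gcg' : 'plain';
    }
    return 'unknown';    # no score-table header found
}

# Sample header lines modelled on the one quoted above.
my @gcg_like   = ('The best scores are: init1 initn opt z-sc E(7402)..');
my @plain_like = ('The best scores are: init1 initn opt z-sc E(7402)');

print guess_fasta_flavour( \@gcg_like ),   "\n";
print guess_fasta_flavour( \@plain_like ), "\n";
```

For a real report, the lines could be read from the file into an array first and passed in by reference; trailing newlines are tolerated by the pattern.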
Anyway, here is the snippet of code I'm using to parse:
#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;
my ( $script, $usage, $file );
$script = ( split "/", $0 )[ -1 ];
$usage = qq(
$script by Martin A. Hansen, August 2003.
$script parses a FASTA report file
Usage: $script [file]
[file] - file with fasta report
);
print $usage and exit if not @ARGV;
$file = shift @ARGV;
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MAIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
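my ( $lines, $line );
$lines = parse_fasta( $file );
# parse_fasta returns hash references, so print the fields of each one
foreach $line ( @{ $lines } )
{
    print $line->{ "QUERY_NAME" }, "\t", $line->{ "QUERY_STRING" }, "\n";
    print $line->{ "SUBJECT_NAME" }, "\t", $line->{ "SUBJECT_STRING" }, "\n";
}
exit;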
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SUBROUTINES <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
sub parse_fasta
{
    # Martin A. Hansen, August 2003.
    # parses FASTA search reports using Bioperl
    my ( $file,   # file with FASTA report
       ) = @_;
    # returns list of hashes with names and padded alignment strings
    my ( $result, $hit, $hit_name, $searchio, $white_space, $query_beg, $hsp, $hit_string, @lines, $query_string, $query_name );
    $searchio = Bio::SearchIO->new( -format => 'fasta', -file => $file );
    $result = $searchio->next_result;
    # guard against an undefined result, e.g. when the report was not recognized
    return wantarray ? () : [] if not defined $result;
    while ( $hit = $result->next_hit )
    {
        $query_name = $result->query_name;
        $hit_name = $hit->name;
        # only the first HSP of each hit is used
        $hsp = $hit->next_hsp;
        next if not defined $hsp;
        $query_string = $hsp->query_string;
        $query_beg = $hsp->query->start;
        $hit_string = $hsp->hit_string;
        # pad both strings so the alignment starts at the query start position
        $white_space = ' ' x ( $query_beg - 1 );
        push @lines, {
            "QUERY_NAME" => $query_name,
            "QUERY_STRING" => $white_space . $query_string,
            "SUBJECT_NAME" => $hit_name,
            "SUBJECT_STRING" => $white_space . $hit_string,
        };
    }
    return wantarray ? @lines : \@lines;
}
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
__END__
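The padding step in parse_fasta can be tried in isolation, without Bioperl. The sketch below assumes a 1-based query start, as the script does. pad_to_query_start and the sample alignment strings are made up for illustration and do not come from a real report.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix a string with ( start - 1 ) spaces so that
# its first character lands in column "start" (1-based).
sub pad_to_query_start {
    my ( $query_beg, $string ) = @_;
    return ( ' ' x ( $query_beg - 1 ) ) . $string;
}

# Made-up HSP: the alignment begins at query position 4.
my $query_beg = 4;
my %record = (
    QUERY_STRING   => pad_to_query_start( $query_beg, 'ACGT-ACGT' ),
    SUBJECT_STRING => pad_to_query_start( $query_beg, 'ACGTTACGT' ),
);

# Both strings get the same offset, so they stay aligned with each other.
print "$record{QUERY_STRING}\n";
print "$record{SUBJECT_STRING}\n";
```

Because the subject string is padded with the query offset rather than its own start, the two lines keep their column-by-column correspondence when printed one above the other.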
Any suggestions?
Martin