[Bioperl-l] Bio::SearchIO::fasta problem

Douglas Joubert djoubert at mail.mcg.edu
Mon Aug 25 16:13:49 EDT 2003


Greetings,

I too received the "MSG: unrecognized FASTA Family report file!" error when I was "attempting" to demonstrate a snippet of code that I lifted from one of Jason's PowerPoint presentations (GenomeInformatics2002).

I am a librarian, not a programmer, so I assumed I had installed BioPerl incorrectly.

My FASTA file was BLAST output that I had saved in FASTA format. Is this not the correct way to use this module?

My text file started with >gi|14625690|emb|AL591499.7|, so I thought I was OK.

My question is this: what exactly does the link Jason provided install?
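The link Jason gives points to Bill Pearson's FASTA search programs, not to a BioPerl component. The `fasta` format of Bio::SearchIO reads the *reports* those programs write; a file of sequences whose first line is a header such as `>gi|...` is FASTA *sequence* format, which BioPerl reads with Bio::SeqIO (`-format => 'fasta'`) instead. The rough sketch below tells the two kinds of file apart. It is a hypothetical heuristic (the sub name `guess_fasta_kind` is invented for illustration) that assumes a search report contains the `The best scores are:` line seen in the report quoted further down; report layouts vary by version, so treat it as a quick sanity check only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: guess whether a chunk of text is a FASTA-format
# sequence file or a report from the FASTA search programs.
# Heuristic only: assumes reports contain a "The best scores are:" line.
sub guess_fasta_kind {
    my ($text) = @_;

    # Search reports list their hits under this header line.
    return 'search_report' if $text =~ /^\s*The best scores are:/m;

    # Sequence files start (after optional whitespace) with a ">" header.
    return 'sequence_file' if $text =~ /\A\s*>/;

    return 'unknown';
}

# Usage: perl guess_fasta_kind.pl some_file
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print guess_fasta_kind($text), "\n";
}
```

If this prints `sequence_file`, the file should go to Bio::SeqIO rather than Bio::SearchIO.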

Cheers

DJJ


Douglas Joubert, M.L.I.S.
Instructor and Digital Information Librarian
Robert B. Greenblatt M.D. Library
Medical College of Georgia
Augusta, GA 30912-4400

>>> Jason Stajich <jason at cgt.duhs.duke.edu> 8/25/2003 1:52:42 PM >>>
Martin - it's tested on FASTA 3.4 and some versions of 3.3. It can parse
the -m 9 tabular output as well as the standard default output (with or
without histograms).

Personally I would just use the latest distribution:
ftp://ftp.virginia.edu/pub/fasta/fasta3.shar.Z 

It has not been tested with the GCG-ized FASTA and, as you report, it
doesn't seem to work. I took the liberty of posting a bug report for you
with an example report, as this is the type of information someone needs
to diagnose a problem.
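To check whether a report carries the GCG-style `..` that Martin describes in his message below, something like the following rough Perl sketch could be run before handing the file to Bio::SearchIO. The sub name `looks_gcg_fasta` is hypothetical, and the check relies only on that one trailing `..` on the `The best scores are:` line; it is not how Bio::SearchIO itself recognizes formats, and GCG output may differ from standard FASTA output in other ways, so a negative result does not guarantee the file will parse.

```perl
use strict;
use warnings;

# Hypothetical check: returns 1 if the "The best scores are:" header
# ends in "..", the GCG-style marker Martin observed; 0 otherwise.
sub looks_gcg_fasta {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*The best scores are:/;
        # Allow trailing whitespace (including a stray \r) after the dots.
        return $line =~ /\.\.\s*$/ ? 1 : 0;
    }
    return 0;    # no scores header found at all
}
```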

I don't know that fixing this will be a priority, given that it is pretty
easy to install and run FASTA directly from Bill's distro, and we can parse
that output just fine.

-jason

On Mon, 25 Aug 2003, Martin A. Hansen wrote:

> hi
>
> I'm trying to parse FASTA search reports with Bio::SearchIO. However, I get this
> warning message:
>
> maasha at homer:~/bin$ parse_fasta btg1.fasta
>
> -------------------- WARNING ---------------------
> MSG: unrecognized FASTA Family report file!
> ---------------------------------------------------
>
> This indicates that there might be something wrong with the FASTA report file,
> but I'm not sure what that could be. Am I supposed to run a certain version of
> FASTA, with a certain set of options? E.g. I have noticed that running
> FASTA from the Wisconsin Package (GCG) outputs a double dot (..) between the
> intro text and the data:
>
> The best scores are:                    init1 initn   opt    z-sc E(7402)..
>
> whereas running "normal" fasta does not produce the double dot?
>
> And to really twist the fork, I am failing to identify the different FASTA
> versions :/
>
> Anyway, here is the snippet of code I'm using to parse:
>
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SearchIO;
>
> my ( $script, $usage, $file );
>
> $script = ( split "/", $0 )[ -1 ];
>
> $usage = qq(
>
> $script by Martin A. Hansen, August 2003.
>
> $script parses a FASTA report file
>
> Usage: $script [file]
>                [file]       - file with fasta report
>
> );
>
> print $usage and exit if not @ARGV;
>
> $file = shift @ARGV;
>
>
> # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MAIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>
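> my ( $lines );
>
> $lines = parse_fasta( $file );
>
> # each element is a hashref, so print its fields rather than the reference
> foreach my $line ( @{ $lines } )
> {
>     print "$line->{QUERY_NAME}\t$line->{QUERY_STRING}\n";
>     print "$line->{SUBJECT_NAME}\t$line->{SUBJECT_STRING}\n\n";
> }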
>
> exit;
>
>
> # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SUBROUTINES <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>
> sub parse_fasta
> {
>     # Martin A. Hansen, August 2003.
>
>     # parses FASTA search reports using Bioperl
>
>     my ( $file,   # file with FASTA report
>        ) = @_;
>
>     # returns list of hashrefs, one per hit (first HSP only)
>
>     my ( $result, $hit, $hit_name, $searchio, $white_space, $query_beg, $hsp, $hit_string, @lines, $query_string, $query_name );
>
>     $searchio = Bio::SearchIO->new( -format => 'fasta', -file => $file );
>     $result   = $searchio->next_result
>         or die "No FASTA search result found in $file\n";
>
>     while ( $hit = $result->next_hit )
>     {
>         $query_name   = $result->query_name;
>         $hit_name     = $hit->name;
>         $hsp          = $hit->next_hsp or next;    # skip hits without an HSP
>
>         $query_string = $hsp->query_string;
>         $query_beg    = $hsp->query->start;
>         $hit_string   = $hsp->hit_string;
>
>         $white_space  = ' ' x ( $query_beg - 1 );
>
>         push @lines, {
>                        "QUERY_NAME"     => $query_name,
>                        "QUERY_STRING"   => $white_space . $query_string,
>                        "SUBJECT_NAME"   => $hit_name,
>                        "SUBJECT_STRING" => $white_space . $hit_string,
>         };
>     }
>
>     return wantarray ? @lines : \@lines;
> }
>
>
>
>
> # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>
> __END__
>
>
>
> any suggestions?
>
>
> martin
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org 
> http://portal.open-bio.org/mailman/listinfo/bioperl-l 
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu