[Bioperl-l] problems running Bio::SearchIO on the FASTA results
Jason Stajich
jason.stajich at duke.edu
Tue Dec 28 14:18:01 EST 2004
Christie - this has to do with how the FASTA output format has changed in
the latest releases. The parser has been updated to handle the changed
format. Update your Bio/SearchIO/fasta.pm file from CVS, or grab it here:
http://bioperl.org/SRC/
I did not put these changes on the 1.4 branch as I didn't think we'd be
releasing off that branch, but I can merge the changes there as well if
it will help people.
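If it is unclear whether Perl is picking up the updated parser or an older
installed copy, a quick check along these lines may help. This is only a
sketch: the helper name locate_module is made up for illustration (it is not
part of BioPerl), and it relies on nothing beyond core Perl's require and %INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): try to load a module given its
# relative path, and return the file Perl actually loaded, or undef on failure.
sub locate_module {
    my ($relpath) = @_;
    return eval { require $relpath; 1 } ? $INC{$relpath} : undef;
}

my $module = 'Bio/SearchIO/fasta.pm';
my $path   = locate_module($module);

# Report which copy of the parser is first in @INC, if any.
print defined $path
    ? "Loaded $module from $path\n"
    : "Could not load $module\n";
```

If the path printed is still the old copy under site_perl (as in the stack
trace below), one option would be a "use lib" line pointing at the directory
that holds the updated file, so that it is found first.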
-jason
> Hi folks,
>
> I'm wondering if anybody here is currently parsing the results of
> the FASTA program with Bio::SearchIO. I'm running into a problem very
> early on in the process, right at the moment of trying to parse a
> result.
> Here is a pared-down example program:
>
> >>>>>>
>
> use Bio::SearchIO;
>
> my $fastaFile = 'chWnt3_hg_Gnomon_prots_E0.001.out';
> my $searchIO = Bio::SearchIO->new(-format => 'fasta',
>                                   -file   => $fastaFile);
>
> my $result = $searchIO->next_result;
>
> <<<<<<<
>
> This program dies on the call to $searchIO->next_result() with this
> message:
>
> >>>>>>>
>
> 1039 cpr at napa:~/fastaTest > ./bioperlFastaParseTest.pl
> Use of uninitialized value in concatenation (.) or string at
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/HSP/GenericHSP.pm line 231,
> <GEN1> line 131.
>
> ------------- EXCEPTION -------------
> MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm
> FASTP -hit_seq
> CRNYIEIMPSVAEGVKLGIQECQHQFRGRRWNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTR
> SCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADARENRPDARSAMNKHNNEAGRTTILD
> HMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDFLKDKYDSASEMVVEKHRESRGWVETLRAKYSLFKPPTE
> RDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSC
> QECIRIYDVHTCK
> -hit_length 297 -query_length 297 -query_frame 0 -rank 1 -hit_name
> hmm6623
> -query_name gi|18091804|gb|AAL58093.1| -evalue 0 -score 4361.0
> -hit_frame
> 0 -hsp_length 297 -swscore 3215 -query_seq
> WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCS
> EDADFGVLVSREFADARENRPDARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAI
> GDYLKDKYDSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCN
> VTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK
> -homology_seq
> :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> ::::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::::::::
> ::.::::::::::::::::::::::::::::::
> :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> ::::::::::::.:::::::
> -bits 815.4 (qs='
> STACK Bio::Search::HSP::GenericHSP::new
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/HSP/GenericHSP.pm:231
> STACK Bio::Search::HSP::FastaHSP::new
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/HSP/FastaHSP.pm:97
> STACK Bio::Factory::ObjectFactory::create_object
> /usr/lib/perl5/site_perl/5.8.0/Bio/Factory/ObjectFactory.pm:150
> STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp
> /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/
> SearchResultEventBuilder.pm:275
> STACK Bio::SearchIO::fasta::end_element
> /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/fasta.pm:872
> STACK Bio::SearchIO::fasta::next_result
> /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/fasta.pm:403
> STACK toplevel ./bioperlFastaParseTest.pl:9
>
> --------------------------------------
> 1040 cpr at napa:~/fastaTest >
>
> <<<<<<<
>
> Apparently, Bio::Search::HSP::GenericHSP.pm expects Query End and Query
> Begin to be set, and isn't getting them. Out of curiosity, I commented
> out the die line (231) in GenericHSP.pm, and then the module dies on the
> next line, looking for Hit Begin and Hit End. Did the FASTA output
> format get out of sync with SearchIO? Am I missing something?
>
> I am attaching my output file.
>
> Thanks for any help!
>
> Christie
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~
> Christie P Robertson, PhD
> Research Associate
> Geospiza, Inc.
>
> cpr at geospiza.com
> (206)633-4403
> ~~~~~~~~~~~~~~~~~~~~~~~~~
> -------------- next part --------------
> # fasta chWnt3.fasta /usr/local/data/hg_Gnomon_prots.fsa 1 -E 0.001 -Q -s P20
> FASTA searches a protein or DNA sequence data bank
> version 3.4t24 July 21, 2004
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/