[Bioperl-l] Problems with Bio::DB::Fasta
Justin Chu
justinchu1989 at gmail.com
Fri May 27 19:07:39 UTC 2011
Hi Florent:
Thanks for your reply. I think something is wrong with my installation
because I keep getting an error when running your script. I have already
tried reinstalling with a version from CPAN to make sure my problem is not
due to missing dependencies, but I still get the following error:
Can't locate Test/Exception.pm in @INC (@INC contains: t/lib
/home/justin/workspace/.metadata/.plugins/org.epic.debug
/home/justin/workspace/LocalTools/Testing /etc/perl
/usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5
/usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10
/usr/local/lib/site_perl .) at (eval 46) line 2.
BEGIN failed--compilation aborted at (eval 46) line 2.
BEGIN failed--compilation aborted at
/usr/local/share/perl/5.10.1/Bio/Root/Test.pm line 152.
Compilation failed in require at /home/justin/workspace/LocalTools/Testing/
test.pl line 6.
BEGIN failed--compilation aborted at
/home/justin/workspace/LocalTools/Testing/test.pl line 6.
However, I posted my problem elsewhere and found that other people did get
errors when trying to build an index from my files. The odd thing is that I
could build index files, but lines without sequence caused my sequence
retrieval to be offset by one position per empty line. Removing all the
empty lines fixed the retrieval, but this still does not explain the lack of
error messages.
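For anyone hitting the same thing, here is a minimal sketch of the cleanup
described above: it copies a FASTA file and drops every empty or
whitespace-only line, so the index can be rebuilt from the cleaned copy. The
sub name strip_blank_lines and the two command-line arguments are
hypothetical, made up for this illustration; nothing here is part of
Bio::DB::Fasta itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): copy lines from $in_fh to
# $out_fh, skipping lines that are empty or contain only whitespace.
# Returns the number of lines that were dropped.
sub strip_blank_lines {
    my ($in_fh, $out_fh) = @_;
    my $removed = 0;
    while (my $line = <$in_fh>) {
        if ($line =~ /\S/) {
            print {$out_fh} $line;    # keep header and sequence lines
        }
        else {
            $removed++;               # drop blank line
        }
    }
    return $removed;
}

# Assumed usage: perl strip_blank.pl input.fasta cleaned.fasta
if (@ARGV == 2) {
    my ($in_path, $out_path) = @ARGV;
    open my $in_fh,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out_fh, '>', $out_path or die "Cannot write $out_path: $!";
    my $removed = strip_blank_lines($in_fh, $out_fh);
    close $out_fh or die "Cannot close $out_path: $!";
    close $in_fh;
    print STDERR "Removed $removed blank line(s)\n";
}
```

Any index file built from the original FASTA file would presumably need to
be deleted and rebuilt from the cleaned copy before retrieving sequences
again.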
Thanks for your help,
Justin
On Thu, May 26, 2011 at 8:55 PM, Florent Angly <florent.angly at gmail.com> wrote:
> Hi Justin,
>
> I have been trying to reproduce your issue. A problem I ran into was that there
> were some extra empty lines in your FASTA files. Then I made a test script
> that gets the subsequences you mentioned using three different methods:
> Bio::SeqIO+Bio::Seq, Bio::DB::Fasta, and your InMemoryFastaAccess. These
> three methods return the same answer, so, I see no problem there.
>
> My system is pretty similar to yours:
> Bioperl-live from the BioPerl GitHub master branch from 27/5/11
> Perl 5.12.3
> Linux 2.6.38-2-amd64 (Linux Mint Debian Edition)
>
> Can you run the attached script on the attached FASTA files and see if all
> tests pass?
>
> Thanks,
>
> Florent
>
>
>
>
> On 21/05/11 05:51, Justin Chu wrote:
>
> Hello:
>
> I'm having trouble with Bio::DB::Fasta. The problem sometimes occurs when I
> use large FASTA files and retrieve sequence from a bit past the start of the
> file. I think some characters are being ignored, or a rounding error is
> occurring, when the offset is used to retrieve entries from the index file.
> I have attached the FASTA files I have been using, just in case my problem
> is due to improper formatting of my files.
>
> For example:
>
> my $refDB = Bio::DB::Fasta->new('Test2.Fasta');
> my $queryDB = Bio::DB::Fasta->new('Test1.Fasta');
>
> print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788
> )."\n";
> print $queryDB->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";
>
> output:
> GGTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCAG...
> GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGATAG...
>
> my $refDB2 = InMemoryFastaAccess->new('Test2.Fasta');
> my $queryDB2 = InMemoryFastaAccess->new('Test1.Fasta');
>
> print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788
> )."\n";
> print $queryDB2->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";
>
> I get:
>
> output:
> GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
> GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGAT...
>
> Basically, sometimes the retrieved sequences are correct, but other times
> they are offset by a few base pairs. Interestingly, the offset seems to get
> worse as you retrieve sequence chunks further and further down the
> sequence.
>
> print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 1514858,
> 1515579)."\n";
>
> output:
> CCCTGGTAGTCCACGCCGTAAACGATGAATGCCAGTCGT...
>
> when it should be:
>
> print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 1514858,
> 1515579)."\n";
>
> output:
> GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
>
> This module is still way faster than what I have, so I want to keep using
> it. Do you think there is something I'm overlooking that could be the
> problem, or do you see a way to fix this?
>
> I am currently running:
> Bioperl-live from the BioPerl GitHub master branch from 19/5/11
> Perl 5.10.1
> Debian 6.0.1
>
> If you need any other information please let me know.
>
> Thanks,
>
> Justin Chu
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>