[Bioperl-l] Problems with Bio::DB::Fasta
Florent Angly
florent.angly at gmail.com
Mon May 30 21:53:04 UTC 2011
Hi Justin,
Please "reply all" so that our emails stay on the BioPerl mailing list.
Weirdness regarding newlines is often indicative of a file that has
traveled between different operating systems (which have different ways
of representing newlines). You may try to follow these instructions if
that's the case:
http://www.cyberciti.biz/faq/howto-unix-linux-convert-dos-newlines-cr-lf-unix-text-format/
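
To check whether a given file is affected before converting it, a minimal
sketch along these lines may help. It reads the file in raw mode and reports
which lines contain carriage returns and which lines are blank (empty or
whitespace only). The helper name check_fasta is made up for illustration and
is not part of BioPerl; the script only inspects the file and does not modify
it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# check_fasta() is a hypothetical helper, not a BioPerl function.
# It returns the line numbers that contain a carriage return (a sign of
# DOS or old Mac line endings) and the line numbers that are blank.
sub check_fasta {
    my ($path) = @_;

    # :raw prevents any newline translation, so "\r" stays visible.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    my (@cr_lines, @blank_lines);
    while (my $line = <$fh>) {
        push @cr_lines, $. if $line =~ /\r/;

        # Strip the line terminator, then test what is left.
        (my $content = $line) =~ s/[\r\n]+\z//;
        push @blank_lines, $. if $content =~ /^\s*$/;
    }
    close $fh;

    return { cr => \@cr_lines, blank => \@blank_lines };
}

# Usage: perl check_fasta.pl Test1.Fasta Test2.Fasta
for my $file (@ARGV) {
    my $report = check_fasta($file);
    printf "%s: %d line(s) with carriage returns, %d blank line(s)\n",
        $file, scalar @{ $report->{cr} }, scalar @{ $report->{blank} };
    print "  blank lines at: @{ $report->{blank} }\n"
        if @{ $report->{blank} };
}
```

If the two test files give different reports, that difference is a likely
place to start looking.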
Florent
On 31/05/11 04:28, Justin Chu wrote:
> Hi Florent:
>
> It seems that it does not detect the spaces in my files at times for
> some reason and will proceed to run the script with no problem.
> Strangely, empty lines I insert myself seem to be detected in
> Test1.Fasta, but not in Test2.Fasta.
>
> Justin
>
> On Fri, May 27, 2011 at 5:33 PM, Florent Angly
> <florent.angly at gmail.com> wrote:
>
>
>
> On 28/05/11 05:07, Justin Chu wrote:
>> Thanks for your reply. I think something is wrong with my
>> installation because I keep getting an error when running your
>> script. I have already tried reinstalling with a version from
>> CPAN to make sure my problem is not due to missing dependencies,
>> but I still get the following error:
>>
>> Can't locate Test/Exception.pm in @INC (@INC contains: t/lib
>> /home/justin/workspace/.metadata/.plugins/org.epic.debug
>> /home/justin/workspace/LocalTools/Testing /etc/perl
>> /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1
>> /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10
>> /usr/share/perl/5.10 /usr/local/lib/site_perl .) at (eval 46) line 2.
>> BEGIN failed--compilation aborted at (eval 46) line 2.
>>
>> BEGIN failed--compilation aborted at
>> /usr/local/share/perl/5.10.1/Bio/Root/Test.pm line 152.
>> Compilation failed in require at
>> /home/justin/workspace/LocalTools/Testing/test.pl line 6.
>> BEGIN failed--compilation aborted at
>> /home/justin/workspace/LocalTools/Testing/test.pl line 6.
>
> Hi Justin,
> Install the Test::Exception module this way (for Debian-like
> systems): sudo apt-get install libtest-exception-perl
> Once it is installed, you should get error messages for the
> blank lines of your FASTA file when running the script. If you
> don't get errors for the blank lines, and the script continues
> happily, then that's very likely the reason why you get the wrong
> subsequences.
> Florent
>
>
>
>
>>
>> However, I did post my problem somewhere else and found that other
>> people did get errors when trying to make an index with my files.
>> The weird thing is that I could make index files, but lines without
>> sequence would cause my sequence retrieval to be offset by one
>> sequence position for each empty line. I found that removing all
>> the spaces fixed the retrieval, but this still does not explain
>> the lack of error messages.
>>
>> Thanks for your help,
>>
>> Justin
>>
>> On Thu, May 26, 2011 at 8:55 PM, Florent Angly
>> <florent.angly at gmail.com> wrote:
>>
>> Hi Justin,
>>
>> I have been trying to reproduce your issue. A problem I ran into
>> was that there were some extra empty lines in your FASTA
>> files. Then I made a test script that gets the subsequences
>> you mentioned using three different methods:
>> Bio::SeqIO+Bio::Seq, Bio::DB::Fasta, and your
>> InMemoryFastaAccess. These three methods return the same
>> answer, so, I see no problem there.
>>
>> My system is pretty similar to yours:
>> Bioperl-live from the BioPerl GitHub master branch from 27/5/11
>> Perl 5.12.3
>> Linux 2.6.38-2-amd64 (Linux Mint Debian Edition)
>>
>> Can you run the attached script on the attached FASTA files
>> and see if all tests pass?
>>
>> Thanks,
>>
>> Florent
>>
>>
>>
>>
>> On 21/05/11 05:51, Justin Chu wrote:
>>> Hello:
>>>
>>> I'm having trouble with Bio::DB::Fasta. It sometimes occurs when I use large
>>> fasta files and retrieve sequence from a bit past the start of the file. I
>>> think some characters are being ignored or a rounding error is occurring or
>>> something when using the offset to retrieve entries from the index file. I
>>> have attached the Fasta files I have been using, just in case my problem is
>>> due to improper formatting of my files.
>>>
>>> For example:
>>>
>>> my $refDB = Bio::DB::Fasta->new('Test2.Fasta');
>>> my $queryDB = Bio::DB::Fasta->new('Test1.Fasta');
>>>
>>> print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788
>>> )."\n";
>>> print $queryDB->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";
>>>
>>> output:
>>> GGTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCAG...
>>> GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGATAG...
>>>
>>> my $refDB2 = InMemoryFastaAccess->new('Test2.Fasta');
>>> my $queryDB2 = InMemoryFastaAccess->new('Test1.Fasta');
>>>
>>> print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788
>>> )."\n";
>>> print $queryDB2->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";
>>>
>>> I get:
>>>
>>> output:
>>> GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
>>> GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGAT...
>>>
>>> Basically, sometimes the sequences retrieved are correct but other times it
>>> is offset slightly by a few base pairs. Interestingly it seems that the
>>> offset problem gets worse as you retrieve sequence chunks further and
>>> further down the sequence.
>>>
>>> print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 1514858,
>>> 1515579)."\n";
>>>
>>> output:
>>> CCCTGGTAGTCCACGCCGTAAACGATGAATGCCAGTCGT...
>>>
>>> when it should be:
>>>
>>> print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 1514858,
>>> 1515579)."\n";
>>>
>>> output:
>>> GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
>>>
>>> This module is still way faster than what I have, so I want to keep using
>>> it. Do you think there is something I'm overlooking that could be the problem
>>> or do you see a way to fix this?
>>>
>>> I am currently running:
>>> Bioperl-live from the BioPerl GitHub master branch from 19/5/11
>>> Perl 5.10.1
>>> Debian 6.0.1
>>>
>>> If you need any other information please let me know.
>>>
>>> Thanks,
>>>
>>> Justin Chu
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>