[Bioperl-l] Problems with Bio::DB::Fasta
Justin Chu
justinchu1989 at gmail.com
Fri May 20 19:51:25 UTC 2011
Hello:
I'm having trouble with Bio::DB::Fasta. The problem sometimes occurs when I
use large FASTA files and retrieve sequence from some distance past the start
of the file. I suspect that some characters are being ignored, or that a
rounding error occurs, when the offset from the index file is used to retrieve
entries. I have attached the FASTA files I have been using, just in case my
problem is due to improper formatting of my files.
For example:
my $refDB = Bio::DB::Fasta->new('Test2.Fasta');
my $queryDB = Bio::DB::Fasta->new('Test1.Fasta');
print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788 )."\n";
print $queryDB->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";
output:
GGTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCAG...
GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGATAG...
Using my own in-memory module (the attached InMemoryFastaAccess.pm) instead:
my $refDB2 = InMemoryFastaAccess->new('Test2.Fasta');
my $queryDB2 = InMemoryFastaAccess->new('Test1.Fasta');
print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788 )."\n";
print $queryDB2->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";
output:
GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGAT...
In short, the sequences retrieved are sometimes correct, but at other times
they are shifted by a few base pairs. Interestingly, the shift seems to get
worse the further down the sequence the retrieved chunk lies.
print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 1514858, 1515579 )."\n";
output:
CCCTGGTAGTCCACGCCGTAAACGATGAATGCCAGTCGT...
when it should be:
print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 1514858, 1515579 )."\n";
output:
GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
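To put a number on the shift, the two outputs can be lined up against each other. The following is only a minimal sketch: find_shift is a hypothetical helper (it is not part of BioPerl), and the strings are the truncated outputs quoted above, so it assumes the first 20 bases of the expected sequence are distinctive enough to locate.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl.
# Returns how many bases $observed is shifted relative to $expected:
#   positive = $observed starts that many bases too early,
#   negative = $observed starts that many bases too late,
#   empty list / undef = no alignment found.
sub find_shift {
    my ($observed, $expected, $probe_len) = @_;
    $probe_len //= 20;

    # Does the start of the expected string occur inside the observed one?
    my $pos = index($observed, substr($expected, 0, $probe_len));
    return $pos if $pos >= 0;

    # Otherwise look for the start of the observed string inside the expected one.
    $pos = index($expected, substr($observed, 0, $probe_len));
    return -$pos if $pos >= 0;

    return;    # no alignment found
}

# Truncated outputs copied from the examples above.
my $expected = 'GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA';
printf "shift at 161067:  %d\n",
    find_shift('GGTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCAG', $expected);
printf "shift at 1514858: %d\n",
    find_shift('CCCTGGTAGTCCACGCCGTAAACGATGAATGCCAGTCGT', $expected);
```

On the outputs quoted above this reports a shift of 1 base at position 161067 and 5 bases at position 1514858, which matches the impression that the error grows with distance into the sequence.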
This module is still far faster than my own, so I want to keep using it. Do
you think there is something I'm overlooking that could be causing the
problem, or do you see a way to fix it?
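One formatting issue that could be ruled out is inconsistent line length. As I understand it, Bio::DB::Fasta works out offsets from the line length of each record, so it expects every sequence line in a record to have the same length (apart from the last one). Whether that is the cause here is only an assumption. The sketch below uses a hypothetical helper, check_fasta_lines (not part of BioPerl), to report the first irregular line in each record and the first carriage return in the file.

```perl
use strict;
use warnings;

# Hypothetical diagnostic, not part of BioPerl.
# Reads FASTA text from a filehandle and returns a list of messages:
#   - the first carriage return seen anywhere in the file, and
#   - per record, the first sequence line that is longer than the record's
#     first sequence line, or that follows a shorter line.
sub check_fasta_lines {
    my ($fh) = @_;
    my (@problems, $id, $width, $short_seen, $flagged, $cr_seen);

    while (my $line = <$fh>) {
        my $lineno = $.;
        $line =~ s/\n\z//;    # strip only "\n" so a stray "\r" stays visible

        if (!$cr_seen && $line =~ /\r/) {
            push @problems, "carriage return first seen at line $lineno";
            $cr_seen = 1;
        }
        if ($line =~ /^>(\S*)/) {    # header line: start a new record
            ($id, $width, $short_seen, $flagged) = ($1, undef, 0, 0);
            next;
        }
        next if !defined $id || $flagged;

        my $len = length $line;
        if (!defined $width) { $width = $len; next; }

        if ($short_seen) {
            push @problems, "$id: line $lineno follows a shorter line";
            $flagged = 1;
        }
        elsif ($len > $width) {
            push @problems, "$id: line $lineno is longer than the first line";
            $flagged = 1;
        }
        elsif ($len < $width) {
            $short_seen = 1;    # only acceptable as the record's last line
        }
    }
    return @problems;
}

# Usage sketch: perl check_fasta.pl Test2.Fasta
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for check_fasta_lines($fh);
}
```

If this prints nothing for Test2.Fasta, line length and line endings can probably be ruled out and the problem lies elsewhere.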
I am currently running:
Bioperl-live from the BioPerl GitHub master branch from 19/5/11
Perl 5.10.1
Debian 6.0.1
If you need any other information please let me know.
Thanks,
Justin Chu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test2.Fasta
Type: application/octet-stream
Size: 3798623 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20110520/317cce7f/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test1.Fasta
Type: application/octet-stream
Size: 839 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20110520/317cce7f/attachment-0009.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: InMemoryFastaAccess.pm
Type: application/x-perl
Size: 1111 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20110520/317cce7f/attachment.pl>