[Bioperl-l] Problem with Bio::DB::Fasta and Colon in FASTA Header
Florent Angly
florent.angly at gmail.com
Tue Dec 4 21:52:41 UTC 2012
Hi Jason,
See the documentation for seq() at
http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/DB/Fasta.pm#OBJECT_METHODS.
When you call seq() with a single argument, e.g.
$db->seq('C7047455:0-100'), Bio::DB::Fasta interprets it as a compound
ID and looks for positions 0 to 100 of a sequence called C7047455. This
feature has been in Bio::DB::Fasta since the dawn of time. In this form,
seq() treats the colon as the separator between the ID and the
coordinates, which is a problem because your sequence ID itself contains
a colon.
I think that when you call $db->seq($id,$start,$end), Bio::DB::Fasta
does not attempt to parse your ID. This is why your code works with this
form. Note that if you want the entirety of a sequence called
'C7047455:0-100', the easiest approach when your sequence names contain
colons is to use $db->get_Seq_by_id('C7047455:0-100'), since
get_Seq_by_id() only takes a regular ID (not a compound one).
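A minimal pure-Perl sketch of the two seq() call forms described above, showing why a colon in the ID gets in the way. The helper parse_compound_id() is hypothetical and is NOT BioPerl's code: its regex is an assumption that approximates the name:start-end, name:start,end and name:start..end forms, and, following the reading above, it only splits the ID when no explicit start and end are passed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): approximates how a compound ID
# such as "name:start-end" could be split into ID and coordinates.
# Assumption: splitting only happens in the single-argument form.
sub parse_compound_id {
    my ($id, $start, $stop) = @_;
    if (!defined $start && !defined $stop
        && $id =~ /^(.+):(\d+)(?:-|,|\.\.)(\d+)$/) {
        # Everything before the last "name:" colon becomes the ID
        ($id, $start, $stop) = ($1, $2, $3);
    }
    return ($id, $start, $stop);
}

# Compare the single-argument form with the explicit (id, start, end) form
for my $args (['C7047455:0-100'], ['C7047455:0-100', 1, 10]) {
    my ($id, $start, $stop) = parse_compound_id(@$args);
    print "seq(", join(', ', map { "'$_'" } @$args), ") -> ",
          "id=$id start=", ($start // 'undef'),
          " stop=", ($stop // 'undef'), "\n";
}
```

With the single argument 'C7047455:0-100' the sketch yields the ID C7047455 and the range 0 to 100, so a lookup in a file whose header is literally C7047455:0-100 finds nothing; with explicit coordinates the full ID, colon included, is kept.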
Florent
On 05/12/12 06:23, Jason Gallant wrote:
> Hello,
>
> I'm trying to retrieve FASTA sequences that contain a colon in their
> header. However, I cannot get my BioPerl script to do this!!
>
> It works as expected when the header does not contain the colon, but
> doesn't return anything when it does. Weirdly, when I ask it to return the
> parsed IDs (see below), it returns the appropriate IDs, which include the
> colon! Very confusing; I would appreciate any help!!
>
> Many Thanks,
> Jason Gallant
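>
> use strict;
> use warnings;
> use Bio::DB::Fasta;
>
> my ($file, $id, $start, $end) =
>     ("secondround_merged_expanded.fasta", "C7047455:0-100", 1, 10);
>
> # -reindex => 1 forces the index to be rebuilt from the FASTA file
> my $db  = Bio::DB::Fasta->new($file, -reindex => 1);
> my $seq = $db->seq($id, $start, $end);
>
> # ids() returns a list of all IDs in the file; print one per line
> print join("\n", $db->ids), "\n";
>
> print $seq, "\n";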