[Bioperl-l] Bio::DB::Fasta

Brian Osborne brian_osborne at cognia.com
Mon Jun 30 12:07:07 EDT 2003


Mike,

>Is the large size of the database a problem for returning seq objects?
>Do I need to go back to LargeSeq?

If you can shorten the sequence in that one database and then successfully
use get_Seq_by_id on it, that would strongly suggest that the length is the
problem and not, say, some oddity in the header. The case of the sequence
makes no difference, as you say. I would conclude, as you seem to be doing,
that LargeSeq is one solution.
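
One way to run the shortening test is sketched below. It is a rough sketch
only, not anything from the BioPerl distribution: write_truncated_fasta and
the file names in the commented call are hypothetical. It copies the header
of the first record plus a limited number of whole sequence lines, so the
line lengths in the copy stay uniform, which Bio::DB::Fasta expects.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the header of the FIRST record in $in_path plus
# at most $max_lines of its sequence lines to $out_path. Returns the number
# of sequence lines written.
sub write_truncated_fasta {
    my ($in_path, $out_path, $max_lines) = @_;

    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    my $seen_header = 0;
    my $kept        = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            last if $seen_header;       # a second record starts: stop
            $seen_header = 1;
            print {$out} $line;
            next;
        }
        next unless $seen_header;       # skip anything before the header
        last if $kept >= $max_lines;    # enough sequence copied
        print {$out} $line;
        $kept++;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $kept;
}

# Example call (file names are placeholders):
# write_truncated_fasta('genome.fa', 'genome_short.fa', 1000);
```

If get_Seq_by_id works on an index built from the shortened file but not on
the original, the length is the likely culprit. If it fails on both, look at
the header line and the id being passed in.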

Brian O.


-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Michael R Olson
Sent: Friday, June 27, 2003 4:50 PM
To: Brian Osborne
Cc: Bioperl Mailing List
Subject: RE: [Bioperl-l] Bio::DB::Fasta

I have 2 followup questions, the first of which is small:
You sent me three links that had documentation on them before.
Was this on any of them or a different one, because I couldn't find it.

Second:
I have one database that just consists of one Fasta ID line followed by
the entire genome.  When I say

my $db     = Bio::DB::Fasta->new($filename);
my $bigseq = $db->get_Seq_by_id($id);

it crashes with this error:

Odd number of elements in anonymous hash at
/usr/local/bio/www/cgi-bin/BPPNew/Bio/DB/Fasta.pm line 969.

It does THAT because of the call in Fasta.pm that goes:

  return bless { db    => $db,
                 id    => $id,
                 start => $start || 1,
                 stop  => $stop  || $db->length($id)
               },$class;

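because $stop is undefined and $db->length($id) evidently returns nothing
(an empty list in list context, not a scalar undef), which leaves the
anonymous hash with an odd number of elements.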
Since $db comes from the Bio::DB::Fasta constructor, I tend to assume
that that's where the problem is, but that's where the code gets hard to
understand and my investigation ended, so now I'm asking you folks.
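
The mechanism can be reproduced without BioPerl. The sketch below assumes
that the length lookup fails with a bare "return"; I have not confirmed that
this is what Fasta.pm does, and lookup_length, build_unsafe and build_safe
are hypothetical names. The right-hand side of || is evaluated in the context
of the surrounding expression, so inside a list or { ... } a bare return
yields an empty list and the key is left without a value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a lookup that fails with a bare "return":
# undef in scalar context, but an EMPTY LIST in list context.
sub lookup_length { return }

# Mirrors the quoted constructor: the right operand of || inherits the
# list context, so a failed lookup contributes no element at all.
sub build_unsafe {
    my ($stop) = @_;
    my @pairs = ( stop => $stop || lookup_length() );
    return @pairs;
}

# scalar() forces a single value (possibly undef), so the key always
# has a partner and the element count stays even.
sub build_safe {
    my ($stop) = @_;
    my @pairs = ( stop => scalar( $stop || lookup_length() ) );
    return @pairs;
}

my @unsafe = build_unsafe(undef);
my @safe   = build_safe(undef);
print "unsafe: ", scalar(@unsafe), " element(s)\n";   # 1: key without value
print "safe:   ", scalar(@safe),   " element(s)\n";   # 2: key plus undef
```

Forcing scalar context only silences the warning. The underlying question of
why the length lookup fails for this id is still open.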

It works when I use a database that's broken up into chunks.

The only other (obvious) difference between the working version and the
non-working version is that the working version is uppercase and the
non-working version is lowercase, and I don't expect this should be an
issue.

Is the large size of the database a problem for returning seq objects?
Do I need to go back to LargeSeq?

Thanks,
Mike

On Fri, 2003-06-27 at 07:38, Brian Osborne wrote:
> Michael,
>
> This comes directly from the module's documentation:
>
> use Bio::DB::Fasta;
>
> # Bio::SeqIO-style access
> my $stream  = Bio::DB::Fasta->new('test.fa')->get_PrimarySeq_stream;
>
> while ( my $seq = $stream->next_seq ) {
>    print $seq->seq;
> }
>
> You also asked about "customizing" the indexing so you can use ids you
> have in hand, which partially match the ids in the file. See the
> bptutorial section on "Indexing or accessing..." or FAQ question 2.5.
>
>
> Brian O.

