[Bioperl-l] Strange BioPerl/Perl problem!

Jason Stajich jason@cgt.mc.duke.edu
Thu, 24 Jan 2002 15:16:52 -0500 (EST)


Not sure, since I don't know what exceptions you are getting or which
versions of Perl and BioPerl you are using.  I'm guessing you are running
out of memory?

We have functionality for dealing with large sequences that is more
memory efficient than our string-based storage of sequences.  The relevant
classes are Bio::Seq::LargePrimarySeq and Bio::Seq::LargeSeq (analogous to
Bio::PrimarySeq and Bio::Seq).

If your data is in FASTA format you can read it in with Bio::SeqIO using
the following code:
use Bio::SeqIO;
my $seqio = Bio::SeqIO->new(-format => 'largefasta',
                            -file   => "filename");

my $seq = $seqio->next_seq;

If you'd rather keep using your system call and are pulling the sequence
in chunks, you can use the add_sequence_as_string($str) method to append
each chunk to the end of the sequence, like the following:
use Bio::Seq::LargeSeq;
my $largeseq = Bio::Seq::LargeSeq->new(-id => "largeseqtest");
# getfromsrc() stands in for whatever returns your next chunk
while( my $data = getfromsrc() ) {
    $largeseq->add_sequence_as_string($data);
}

It is imperative that when you have the LargeSeq object you do not call
$seq->seq(), as that will cause Perl to bring the whole sequence into
memory and crash if the sequence is too big.  We probably should just throw
a warning when $seq->seq is called on a Bio::Seq::LargePrimarySeq ...
Anyway, you should use the subseq method to pull out the pieces you are
interested in.
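
As a rough sketch of what I mean by sticking to subseq, something like the
following walks each sequence in fixed-size windows without ever calling
seq().  The window_coords helper, the 500 bp window size and taking the
file name from the command line are just assumptions for illustration, not
anything built into BioPerl; adapt them to your own script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return [start, end] pairs,
# 1-based and inclusive as subseq expects, that tile a sequence of
# $length bases in windows of $window bases ($window must be positive).
sub window_coords {
    my ($length, $window) = @_;
    my @coords;
    for (my $start = 1; $start <= $length; $start += $window) {
        my $end = $start + $window - 1;
        $end = $length if $end > $length;   # last window may be shorter
        push @coords, [$start, $end];
    }
    return @coords;
}

# Only load BioPerl when a FASTA file name is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    my $seqio = Bio::SeqIO->new(-format => 'largefasta',
                                -file   => $ARGV[0]);
    while (my $seq = $seqio->next_seq) {
        # length() and subseq() are used here; seq() is never called.
        for my $c (window_coords($seq->length, 500)) {
            my ($start, $end) = @$c;
            my $piece = $seq->subseq($start, $end);
            print join("\t", $seq->display_id, $start, $end,
                       length($piece)), "\n";
        }
    }
}
```

Only one window's worth of sequence is held as a Perl string at any time,
which is the point of using the Large* classes in the first place.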

-jason

On Thu, 24 Jan 2002, Lynn Stevens wrote:

> Hi All,
>
> I am new to Perl and BioPerl and I have encountered a strange problem
> with a script that I have written.
>
> The script uses the system command to access a large DNA contig (2
> million or so bases long).  I place this contig into a BioPerl seq
> object and then manipulate it in various ways (using seq->subseq and
> seq->length).  In the end I print out a much smaller subseq of the
> original contig.  My script performs this function on 1000s of contigs
> in a while loop ... or alternatively, I have a slightly different
> script that just performs the function above once (and then I just
> invoke it 1000's of times by using a shell script).
>
> Either way, the script runs perfectly for the 1000's of sequences,
> giving me the correct output (which is generally a 500 bp sequence for
> each call).
>
> However, if I start running the script more than once (several
> processes going at the same time), I get lots of exceptions ... of
> various sorts.  They are different every time.  This happens even when
> I am running similar scripts with different names.  The only thing
> that seems to make logical sense to me is that the variables in the
> BioPerl modules are getting confused.  However, I can't understand why
> this should occur.
>
> Any thoughts or suggestions would be appreciated!
>
> Lynn

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu