[Bioperl-l] Fwd: Parse problem of a big EMBL entry

Jason Stajich jason at bioperl.org
Wed Apr 29 16:41:02 UTC 2009


Brian - please always CC the mailing list on replies.

Not sure what is causing the seg fault so I can't really help here -
if you file it as a bug in Bugzilla with instructions on how to
reproduce it, it will hopefully get looked at.

-jason
Begin forwarded message:

> From: brian li <brianli.cas at gmail.com>
> Date: April 29, 2009 1:23:32 AM PDT
> To: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] Parse problem of a big EMBL entry
>
> Hi Jason,
>
>> Without memory leaks it should only take up as much memory as the
>> current sequence you have parsed.  If you mean you have a sequence
>> record with > 1M lines, I'm not sure how much memory that would take
>> up; it depends on whether this is lots of features or what.
>
>     Lots of features.
>
>> There are ways to tell BioPerl to throw away
>> things you don't want to parse out from the record. See
>> http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
>
>     Thanks. I think this would help.
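
A minimal sketch of that SeqBuilder approach, following the HOWTO
linked above (the input file name here is only an example; the entry
is assumed to be in EMBL format):

    use strict;
    use warnings;
    use Bio::SeqIO;

    my $seqio = Bio::SeqIO->new(-file   => 'big_entry.embl',
                                -format => 'embl');

    # Skip everything except a few cheap slots, so the very large
    # feature table is never built in memory.
    my $builder = $seqio->sequence_builder();
    $builder->want_none();
    $builder->add_wanted_slot('display_id', 'desc', 'seq');

    while (my $seq = $seqio->next_seq) {
        printf "%s: %d bp\n", $seq->display_id, $seq->length;
    }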
>
>> Perl will use as much memory as is available on your machine. Have
>> you monitored the memory use of the running perl process to ensure
>> it is reaching the 32Gb limit and that this is in fact what is
>> killing the program?
>
>     I monitored the memory usage in my last run. The amount of free
> memory didn't change much and stayed at around 20GB (buffers
> included), so my assumption was wrong. Thanks again for your hint.
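
A minimal sketch for watching the perl process itself rather than
overall free memory (assumes Linux, where /proc/$$/status is
available):

    # Print this perl process's own memory figures from /proc.
    open my $status, '<', "/proc/$$/status"
        or die "Cannot open /proc/$$/status: $!";
    print grep { /^Vm(Peak|Size|RSS):/ } <$status>;
    close $status;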
>
>    BTW: the message I get when I parse the big million-line entry is
> "Segmentation fault". I'm not familiar with this and am trying to get
> a clue.
>
> Brian

Jason Stajich
jason at bioperl.org





