[Bioperl-l] SeqFeatureCollection issue

Wiepert, Mathieu Wiepert.Mathieu at mayo.edu
Thu Jul 15 08:45:24 EDT 2004


Hi,

I did try that actually; it was the last thing I was doing as I left last night.  I thought it was going to work, but it didn't get far before I got "Out of memory!" again.  It seems a contig with a file size of 38,914,775 bytes, which has 619 features of type mRNA, CDS, or gene, creates a temp file of 18,520,702,976 bytes.  So that's 38 MB to 18 GB.  Wow!  Pulling a range out of that collection does take a bit of time too.  Perhaps there is a better way to do this...
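
For a feature set this small, one alternative is to skip the on-disk index and filter the features with a plain linear scan. The following is only a minimal sketch, not the Collection module's own method: the hashes and the features_overlapping helper are hypothetical stand-ins, and with real BioPerl feature objects the comparison would presumably use $f->start and $f->end instead of hash keys.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for features: plain hashes with start/end coordinates.
# With real BioPerl features the same test would call $f->start and $f->end.
my @features = (
    { type => 'gene', start => 100,  end => 900  },
    { type => 'mRNA', start => 100,  end => 900  },
    { type => 'CDS',  start => 250,  end => 700  },
    { type => 'gene', start => 5000, end => 7500 },
);

# Return every feature that overlaps the closed interval [$from, $to].
# Two intervals overlap when each one starts before the other one ends.
sub features_overlapping {
    my ($feats, $from, $to) = @_;
    return grep { $_->{start} <= $to && $_->{end} >= $from } @$feats;
}

my @hits = features_overlapping(\@features, 800, 6000);
printf "%s %d..%d\n", @{$_}{qw(type start end)} for @hits;
```

Scanning a few hundred features per query is cheap, so this avoids both the temp file and the binning overhead. It would scale poorly to millions of features, which is where an index becomes worth its cost.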

I am just not sure where all the memory is getting eaten up; if you have an idea (the large seq, something to do with that?), let me know.  I made the temp file get created in a place that I know can hold it at least, and it is working (though I also have a 100 MB file, and I am afraid of what that one will do).
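
A quick back-of-the-envelope calculation on the sizes reported above may help narrow this down. The sketch below only divides the reported numbers; the variable names are made up for illustration. The index comes out at roughly 476 times the input file, or about 29.9 MB per feature. A per-feature cost that is close to the size of the contig itself is consistent with the "large seq" guess (each stored feature somehow carrying the sequence it is attached to), but that is an unverified assumption, not something confirmed in this thread.

```perl
use strict;
use warnings;

# Figures reported in the message above.
my $contig_bytes = 38_914_775;       # GenBank file on disk
my $index_bytes  = 18_520_702_976;   # temp file created by the collection
my $n_features   = 619;              # mRNA, CDS, and gene features added

# How much larger is the index than the input, and how much does
# each feature cost on average?
my $blowup      = $index_bytes / $contig_bytes;
my $per_feature = $index_bytes / $n_features;

printf "index is about %.0fx the input file\n", $blowup;
printf "about %.1f MB of index per feature\n", $per_feature / 1e6;
```

If the same ratio held for the 100 MB file, the temp file would be far larger still, so it seems worth checking free space before running that one.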

Thanks for the input though,
 
-mat

> -----Original Message-----
> From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu]
> Sent: Wednesday, July 14, 2004 8:58 PM
> To: Wiepert, Mathieu
> Cc: bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] SeqFeatureCollection issue
> 
> Did you try passing in a filename with -file => '/tmp/myfile.idx'?
> 
>  Title   : new
>  Usage   : my $obj = new Bio::SeqFeature::Collection();
>  Function: Builds a new Bio::SeqFeature::Collection object
>  Returns : Bio::SeqFeature::Collection
>  Args    :
> 
>            -minbin        minimum value to use for binning
>                           (default is 1,000)
>            -maxbin        maximum value to use for binning
>                           (default is 100,000,000)
>            -file          filename to store/read the
>                           BTREE from rather than an in-memory structure
>                           (default is false and in-memory).
>            -keep          boolean, will not remove index file on
>                           object destruction.
>            -features      Array ref of features to add initially
> 
> No idea where the /var/tmp is going...
> 
> This *should* work but I haven't done much with it/used it for quite a
> while so I don't know if there are things that don't work...
> 
> If it is really not working you can always convert the features to GFF
> and load them into Bio::DB::GFF using the in-memory adaptor.  I wanted to
> merge the interfaces so that SeqFeature::Collection used the same method
> names but never got around to it.  If someone is using the module, that
> would be a nice thing to have...
> 
> -jason
> 
> 
> On Wed, 14 Jul 2004, Wiepert, Mathieu wrote:
> 
> > Hi,
> >
> >
> >
> > I was trying to use the seqfeature collection to pull out features in a
> > range I was interested in.  I have two problems (maybe because I am
> > loading features from a contig?)
> >
> >
> >
> > In the first case, I ended up running out of space on /var/tmp.  We have
> > about 0.5 GB there.  Code is like
> >
> > my $in1  = Bio::SeqIO->new('-file' => $contig.'.gb' , '-format' => 'Genbank');
> >
> > while (my $seq = $in1->next_seq) {
> >             my @feat_ary = $seq->get_SeqFeatures();
> >             my $col = new Bio::SeqFeature::Collection();
> >             # add these features to the object
> >             my $totaladded = $col->add_features(\@feat_ary);
> > }
> >
> >
> >
> > I end up filling /var/tmp to 100%, as I said.
> >
> >
> >
> > So I tried to initialize the collection like
> >
> > my $col = new Bio::SeqFeature::Collection(-features => \@feat_ary);
> >
> >
> >
> > but that gave an error:
> >
> >
> >
> > "Can't call method "put" on an undefined value at
> > /usr/local/biotools/perl/5.8.2/lib/site_perl/5.8.2/Bio/SeqFeature/Collection.pm
> > line 225, <GEN0> line 95373."
> >
> >
> >
> > That looked like the _btree wasn't set, but not sure.
> >
> >
> >
> > I am told we have plenty of room in /tmp, so I should change my tmp dir,
> > but the docs said that it was all in memory by default, is that not the
> > case?  I tried to export a new tmp dir, but that didn't fix the problem...
> >
> >
> >
> >
> >
> > -mat
> >
> >
> >
> >
> >
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu

