[Bioperl-l] DB_File and assembly IO
Chris Fields
cjfields at illinois.edu
Fri Aug 29 14:30:49 UTC 2008
This is a known problem with Bio::Assembly and stems from having a
DB_File tied (opened) for each Bio::Assembly::Contig (via a retained
Bio::SeqFeature::Collection). You can raise the maximum number of open
filehandles on UNIX'y flavors using ulimit (see the following link), but
I'm not sure about Win32.
http://bugzilla.open-bio.org/show_bug.cgi?id=2320
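If you want to check the limit from inside a script instead of the shell, here is a minimal sketch that uses the core POSIX module. It assumes a POSIX system; nothing here was tried on Win32. `open_file_limit` and `would_exceed_limit` are hypothetical helper names. The one-descriptor-per-contig arithmetic is an assumption drawn from the one-DB_File-per-contig explanation above, not a measured figure.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_OPEN_MAX);

# Hypothetical helper: the per-process open-file limit as reported by
# sysconf(3); returns undef if the system reports it as indeterminate.
sub open_file_limit {
    return sysconf(_SC_OPEN_MAX);
}

# Hypothetical helper: would keeping $wanted extra descriptors open, on top
# of $already_open, go past $limit? ASSUMES one descriptor per retained
# contig, which is an assumption, not something measured here.
sub would_exceed_limit {
    my ($wanted, $already_open, $limit) = @_;
    return 0 unless defined $limit;    # unknown limit: cannot tell
    return $wanted + $already_open > $limit ? 1 : 0;
}

my $limit = open_file_limit();
printf "open-file limit: %s\n", defined $limit ? $limit : 'unknown';

# 10000 contigs, plus STDIN/STDOUT/STDERR already open
printf "10000 contigs would exceed it: %s\n",
    would_exceed_limit(10000, 3, $limit) ? 'yes' : 'no';
```

On many Unix systems `sysconf(_SC_OPEN_MAX)` reflects the same soft limit that `ulimit -n` shows. Confirm this on your own platform before relying on it.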
The general bug is reproducible using the following simple script. If
needed, adjust the range end in the for loop to exceed the ulimit (check
it with 'ulimit -n'); on Mac OS X 10.5 it is set to 2560.
---------------------------
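use strict;
use warnings;
use Bio::Assembly::Contig;
# each Contig keeps its own tied DB_File index, so retaining them all exhausts the limit
my @contigs;
push @contigs, Bio::Assembly::Contig->new() for (1..10000);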
---------------------------
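You can watch descriptors pile up while running a reproduction like the one above without an external tool, because the process can count its own open descriptors. This is a minimal sketch and assumes a Unix-like system. `count_open_fds` is a hypothetical helper and the 1024 probe bound is arbitrary. A plain `open` on the null device stands in for the per-contig DB_File tie.

```perl
use strict;
use warnings;
use POSIX ();
use File::Spec;

# Hypothetical helper: counts the descriptors this process has open by
# probing 0 .. $max-1 with dup(2). dup() fails on a closed descriptor; a
# successful duplicate is closed again immediately. $max defaults to 1024,
# an arbitrary bound; raise it if your limit is higher. If the process is
# already at its limit, dup() itself fails and the count comes out low.
sub count_open_fds {
    my ($max) = @_;
    $max = 1024 unless defined $max;
    my $open = 0;
    for my $fd (0 .. $max - 1) {
        my $dup = POSIX::dup($fd);
        next unless defined $dup;
        POSIX::close($dup);
        $open++;
    }
    return $open;
}

# Usage: retain a few handles, the way each Contig retains its index,
# and watch the count grow.
my $before = count_open_fds();
my @kept;
for (1 .. 5) {
    open my $fh, '<', File::Spec->devnull() or die "Cannot open null device: $!";
    push @kept, $fh;
}
printf "open descriptors: %d before, %d after\n", $before, count_open_fds();
```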
I'll open a bug report on this for tracking (for release 1.7, along
with any other Bio::Assembly issues). That doesn't mean it won't get
fixed sooner, just that we aren't under pressure with the next
release, which already has a full plate. IMO there doesn't need to be
one SF::Collection per contig; one instance should do for the entire
assembly, using the same SF::Collection passed in to each contig and
distinguishing the contigs by SeqFeature seq_id. It would also be nice
if we could change that to allow other SeqFeature::CollectionI
implementations (e.g. Bio::DB::SeqFeature::Store and the like).
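Here is a minimal pure-Perl sketch of the single-collection idea, for illustration only. `SharedFeatureStore` and `SketchContig` are hypothetical stand-ins, not BioPerl classes. A plain in-memory hash replaces the DB_File index, so nothing here reflects the real Bio::SeqFeature::Collection API. The point is ownership: the assembly creates one store and hands it to every contig, and each contig's features are filed under its own seq_id.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one assembly-wide feature collection.
# A plain in-memory hash replaces the DB_File BTREE index here.
package SharedFeatureStore;

sub new {
    my ($class) = @_;
    return bless { by_seq_id => {} }, $class;
}

# File a feature under the seq_id of the contig it belongs to.
sub add_feature {
    my ($self, $seq_id, $feature) = @_;
    push @{ $self->{by_seq_id}{$seq_id} }, $feature;
    return;
}

# Return only the features belonging to one contig.
sub features_for {
    my ($self, $seq_id) = @_;
    return @{ $self->{by_seq_id}{$seq_id} || [] };
}

# Hypothetical contig that keeps a reference to the shared store
# instead of building its own collection.
package SketchContig;

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, store => $args{store} }, $class;
}

sub add_feature {
    my ($self, $feature) = @_;
    $self->{store}->add_feature($self->{id}, $feature);
    return;
}

sub features {
    my ($self) = @_;
    return $self->{store}->features_for($self->{id});
}

package main;

# One store for the whole assembly, passed in to every contig.
my $store   = SharedFeatureStore->new();
my @contigs = map { SketchContig->new(id => "Contig$_", store => $store) } 1 .. 10000;

$contigs[0]->add_feature({ start => 1,  end => 50 });
$contigs[0]->add_feature({ start => 60, end => 90 });
$contigs[1]->add_feature({ start => 5,  end => 20 });

printf "Contig1 has %d features, Contig2 has %d\n",
    scalar($contigs[0]->features), scalar($contigs[1]->features);
```

However many contigs are created, there is only one store. The number of indexes, and with a file-backed store the number of open handles, no longer grows with the number of contigs.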
chris
On Aug 29, 2008, at 3:40 AM, Florent Angly wrote:
> Hi Joshua,
>
> I don't know the specifics of DB_File, but the 'Cannot open file
> tree: Too many open files' message is pretty explicit.
> If you're on Unix/Linux, you can check the files that are open by
> your program by typing:
> lsof | grep name_of_program
> There is probably a filehandle that is not closed somewhere in your
> code or in the BioPerl code.
> Best,
>
> Florent
>
>
>
> Joshua Udall wrote:
>> Bioperl -
>>
>> I'm trying to read/parse a single cap3 ace file with several thousand
>> contigs. I get a DB_File error at Contig247. Here's the error:
>>
>> ------------- EXCEPTION -------------
>> MSG: Unable to tie DB_File handle
>> STACK Bio::SeqFeature::Collection::new /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
>> STACK Bio::Assembly::Contig::new /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
>> STACK Bio::Assembly::IO::ace::next_assembly /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
>> STACK toplevel /Users/jaudall/bin/read_ace.pl:214
>> -------------------------------------
>>
>> Looking at the Collection::new, the error is on the middle line:
>>
>> $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
>>     $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE; # or die "Cannot open file: $!\n";
>> $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
>> return $self;
>>
>> If I uncomment the $! die statement that I inserted, I get this:
>>
>> 'Cannot open file tree: Too many open files'
>>
>> Apparently the Collection constructor is creating a new index file for
>> each contig, and the handles for each are sticking around? That confuses
>> me because, from reading more about Collection.pm and DB_File, it appeared
>> to me that no files are written by default (which is how I'm running it);
>> rather, the Collection objects are all stored in memory. I'm pretty sure
>> the error is not a permission error, and if it is not the open
>> filehandles, what else should I look for?
>>
>>
>> If I 'warn' the error instead of throwing it, I get:
>>
>> Can't call method "get_dup" on an undefined value at
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360
>>
>> This kind of makes sense because the index appears not to be created, and
>> it can't look anything up in an undefined tied hash. I'm stuck.
>>
>> Thanks for any help and suggestions.
>>
>> OSX, perl 5.8.8, bioperl-live (svn last week)
>>
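The quoted message asks how to tell a permission problem apart from running out of descriptors. After a failed `open` or `tie`, Perl's `%!` hash (from the core Errno module) can be checked by errno name, as long as nothing has overwritten `$!` in between. This is a minimal sketch. `describe_failure` is a hypothetical helper. It assumes the failing call leaves errno set, which the 'Too many open files' output quoted above suggests DB_File's tie does.

```perl
use strict;
use warnings;
use Errno ();    # provides %! for checking errno by name
use File::Spec;

# Hypothetical helper: names the likely cause of the most recent failed
# system call. Call it straight after the failing open/tie, before
# anything else can overwrite $!.
sub describe_failure {
    return 'too many open files in this process (see ulimit -n)' if $!{EMFILE};
    return 'too many open files system-wide'                     if $!{ENFILE};
    return 'permission denied'                                   if $!{EACCES};
    return 'no such file or directory'                           if $!{ENOENT};
    return "other error: $!";
}

# Usage: a deliberately failing open, reported by cause.
my $missing = File::Spec->catfile(File::Spec->tmpdir(), 'no-such-dir', 'index.db');
unless (open my $fh, '<', $missing) {
    my $why = describe_failure();
    print "open failed: $why\n";
}
```

Collection.pm already throws as soon as the tie fails, and that is better than warning and carrying on. As the quoted report shows, continuing leaves the object without an index, and that is what produces the later get_dup error.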