[Bioperl-l] Bio::Assembly::IO problems reading .ace files
Chris Fields
cjfields at uiuc.edu
Tue Dec 4 05:10:57 UTC 2007
Yes, it's possible this would cause memory issues, as each
Bio::Assembly::Contig instance would have a
Bio::SeqFeature::Collection attached (each Collection having a tied DB
hash, which would be an open filehandle). So if you had over 1000
contigs open at any one time (in a parsed scaffold, for instance), you
would have over 1000 open filehandles. Not very efficient.
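A quick way to gauge whether a given .ace file is likely to run into this limit is to count its contigs before parsing it. The following is a minimal sketch in plain Perl. It assumes the usual consed-style ACE layout, in which every contig begins with a `CO` line. `count_ace_contigs` is a hypothetical helper and is not part of BioPerl.

```perl
use strict;
use warnings;

# Count the contigs in an ACE stream by counting its CO records.
# Assumes each contig starts with a line of the form
# "CO <name> <bases> <reads> <segments> <U|C>".
sub count_ace_contigs {
    my ($fh) = @_;
    my $contigs = 0;
    while (my $line = <$fh>) {
        $contigs++ if $line =~ /^CO\s/;
    }
    return $contigs;
}

# Hypothetical command-line wrapper: pass the .ace file as the first argument.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $n = count_ace_contigs($fh);
    close $fh;
    print "$file: $n contigs (about $n tied DB_File hashes if all are loaded at once)\n";
}
```

If the count is close to the per-process limit reported by `ulimit -n`, the one-handle-per-contig behaviour described above is a plausible explanation for the failure.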
My thought was to have each Bio::Assembly::Scaffold instance carry a
single Bio::SeqFeature::CollectionI (it could be a
Bio::SeqFeature::Collection, Bio::DB::SeqFeature::Store, or any other
CollectionI, whichever is easiest). Each Contig would be passed (and
would store) a reference to the Scaffold's SF::Collection and pull its
features from there; I just haven't had time to work on it. I don't think
anyone's tackling it, so feel free to code away!
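The shared-collection idea can be sketched in plain Perl. This is a minimal illustration only and assumes nothing about the real Bio::Assembly internals. `SharedCollection`, `SketchContig` and `SketchScaffold` are hypothetical classes. The collection here is an in-memory hash rather than a Bio::SeqFeature::CollectionI or Bio::DB::SeqFeature::Store. The sketch only shows that each contig keeps a reference to the scaffold's single store, so the number of stores (and open handles) no longer grows with the number of contigs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one feature collection shared by a whole
# scaffold: an in-memory hash of arrays keyed by contig id.
package SharedCollection;
sub new { my ($class) = @_; return bless { by_contig => {} }, $class }
sub add_features {
    my ($self, $contig_id, @features) = @_;
    push @{ $self->{by_contig}{$contig_id} }, @features;
    return scalar @features;
}
sub features_for {
    my ($self, $contig_id) = @_;
    return @{ $self->{by_contig}{$contig_id} || [] };
}

# Hypothetical contig: stores only a reference to the scaffold's collection
# and reads/writes its own features through it.
package SketchContig;
sub new {
    my ($class, %args) = @_;
    return bless { id => $args{-id}, collection => $args{-collection} }, $class;
}
sub id { $_[0]{id} }
sub add_feature {
    my ($self, @features) = @_;
    return $self->{collection}->add_features($self->{id}, @features);
}
sub get_features {
    my ($self) = @_;
    return $self->{collection}->features_for($self->{id});
}

# Hypothetical scaffold: owns the single collection and hands it to each contig.
package SketchScaffold;
sub new {
    my ($class) = @_;
    return bless { collection => SharedCollection->new, contigs => [] }, $class;
}
sub new_contig {
    my ($self, $id) = @_;
    my $contig = SketchContig->new(-id => $id, -collection => $self->{collection});
    push @{ $self->{contigs} }, $contig;
    return $contig;
}
sub collection { $_[0]{collection} }

package main;

my $scaffold = SketchScaffold->new;
my $c1 = $scaffold->new_contig('Contig1');
my $c2 = $scaffold->new_contig('Contig2');
$c1->add_feature('f1', 'f2');
$c2->add_feature('f3');
for my $contig ($c1, $c2) {
    my @features = $contig->get_features;
    printf "%s has %d feature(s)\n", $contig->id, scalar @features;
}
```

Replacing the hash with a single DB-backed store should keep one handle per scaffold, regardless of how many contigs it holds.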
chris
On Dec 3, 2007, at 8:25 PM, Florent Angly wrote:
> Could this issue cause excessive memory usage? I was seeing high
> memory usage when parsing some TIGR Assembler files and was
> wondering whether the TIGR parser or the parent assembly IO module
> was responsible.
> I'd definitely be interested in a fix to the Bio::Assembly
> implementation if it's the assembly IO module's fault...
> Florent
>
> Chris Fields wrote:
>> This seems similar to the 'too many open filehandles' issue
>> documented here:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
>>
>> Unfortunately, it is due to having an open DB_File for every
>> contig, and is a problem with the Bio::Assembly implementation
>> that isn't easily fixed. Raising the open filehandle limit with
>> ulimit is the only known workaround:
>>
>> ulimit -n 10000
>>
>> chris
>>
>> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
>>
>>
>>> Hi all,
>>>
>>> I'm trying to read .ace files but keep getting an error, and I don't
>>> know what's causing it.
>>> Really basic example code:
>>>
>>> #!/usr/local/bin/perl -w
>>>
>>> use strict;
>>> use lib "/data/home/smithiesr/bioperl-live";
>>> use Bio::Assembly::IO;
>>> use Data::Dumper;
>>>
>>> my $ace = "CLP0001001240-cE15_20030319.ace";
>>>
>>> # Parse the first assembly in the file and dump each contig object
>>> my $io = Bio::Assembly::IO->new(-file => $ace, -format => "ace");
>>> my $assembly = $io->next_assembly;
>>>
>>> foreach my $contig ($assembly->all_contigs) {
>>>     print Dumper $contig;
>>> }
>>>
>>> This gives the following error:
>>> [smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
>>> Can't call method "get_consensus_sequence" on an undefined value
>>> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
>>> <GEN0> line 42.
>>>
>>> Which relates to this bit in ace.pm:
>>> # Loading contig qualities... (Base Quality field)
>>> /^BQ/ && do {
>>> my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>>>
>>> Is this caused by a dud ace file or a problem with
>>> Bio::Assembly::IO::ace, or is the Contig object not getting created?
>>> Any ideas?
>>>
>>> Thanx,
>>>
>>> Russell Smithies
>>>
>>> Bioinformatics Software Developer
>>> T +64 3 489 9085
>>> E russell.smithies at agresearch.co.nz
>>>
>>> Invermay Research Centre
>>> Puddle Alley,
>>> Mosgiel,
>>> New Zealand
>>> T +64 3 489 3809
>>> F +64 3 489 9174
>>> www.agresearch.co.nz
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>