[Bioperl-l] Proposal for bio-perl updates: ACE assembly file
Jordan Swanson
jswanson at iastate.edu
Mon Feb 14 13:05:52 EST 2005
Hi,
I am new to bioperl, but I have a proposal for updating bioperl with some of
the code I have been using.
Bioperl packages currently exist that open ACE assembly files (output by
phrap, CAP3, and other assembly programs). However, the current code reads
the entire file in one call:
use Bio::Assembly::IO;

my $assembly_in = Bio::Assembly::IO->new(-file   => "input.ace",
                                         -format => 'ace');
my $assembly = $assembly_in->next_assembly;
I am working on a large EST assembly project (roughly 150K), and our assembly
files have been around 200 MB in size. For many of our applications we only
need to process one contig at a time, and reading the entire assembly at once
requires a large amount of memory and/or disk space.
I have developed some code that reads in contigs one at a time, so it only
uses the memory needed for a single contig object. A brief synopsis:
my $contig_in = ContigIO->new(-file => $filename, -format => 'ace');
while (my $contig = $contig_in->next_contig) {
    do_stuff_with_contig($contig);
}
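The streaming idea can be illustrated without BioPerl at all. The following minimal Perl sketch is not the ContigIO code proposed here; the package name AceContigStream, its -file/-fh arguments and the hash fields it returns are all hypothetical. It assumes the usual phrap ACE layout (a CO line, padded consensus lines ending at a blank line, then AF read-placement lines) and buffers the next contig's CO line between calls, so only one contig is held in memory at a time.

```perl
use strict;
use warnings;

# Hypothetical stand-alone reader (NOT a BioPerl module): each call to
# next_contig returns one contig, so memory use is bounded by one contig.
package AceContigStream;

sub new {
    my ($class, %args) = @_;
    my $fh = $args{-fh};
    if (!$fh) {
        my $file = $args{-file};
        open $fh, '<', $file or die "Cannot open $file: $!";
    }
    return bless { fh => $fh, pending => undef }, $class;
}

sub next_contig {
    my ($self) = @_;
    my $fh = $self->{fh};

    # Start from a CO line buffered by the previous call, or scan forward to one.
    my $line = delete $self->{pending};
    until (defined $line && $line =~ /^CO\s/) {
        $line = <$fh>;
        return unless defined $line;    # end of file: no more contigs
    }

    # CO <name> <padded bases> <reads> <base segments> <U|C>
    my (undef, $name, $nbases, $nreads) = split ' ', $line;
    my %contig = (name => $name, length => $nbases, n_reads => $nreads,
                  consensus => '', reads => []);

    # Padded consensus lines follow the CO line up to the first blank line.
    while (defined(my $l = <$fh>)) {
        last if $l =~ /^\s*$/;
        chomp $l;
        $contig{consensus} .= $l;
    }

    # Keep AF (read placement) records; stop at the next contig's CO line
    # and buffer it. BQ, BS, RD and QA data are skipped in this sketch.
    while (defined(my $l = <$fh>)) {
        if ($l =~ /^CO\s/) {
            $self->{pending} = $l;
            last;
        }
        push @{ $contig{reads} }, { name => $1, strand => $2, start => $3 }
            if $l =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/;
    }
    return \%contig;
}

package main;

my $ace = <<'ACE';
AS 2 3

CO Contig1 8 2 1 U
ACGT*ACG

BQ
20 20 20 20 20 20 20

AF r1 U 1
AF r2 C 3
BS 1 8 r1

RD r1 8 0 0
ACGT*ACG

CO Contig2 4 1 1 U
TTGA

AF r3 U 1
ACE

open my $fh, '<', \$ace or die $!;
my $stream = AceContigStream->new(-fh => $fh);
while (my $contig = $stream->next_contig) {
    printf "%s\t%s\t%d read(s)\n", $contig->{name}, $contig->{consensus},
        scalar @{ $contig->{reads} };
}
```

On the small two-contig sample it reports Contig1 (consensus ACGT*ACG, 2 reads) and then Contig2 (consensus TTGA, 1 read). A full parser would attach the skipped BQ, BS, RD and QA records to the contig object in the same single pass.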
Furthermore, there is currently no code that writes out ACE files or reverses
a contig's orientation. I have developed code that implements both, and if
you are interested, I would like to submit it. I have been converting this
code to a more bioperl-friendly format (inheriting from bioseq objects, using
the bioperl IO system, bioperl-style warnings, and so forth).
I would appreciate some advice on how to proceed, specifically on inheriting
from the correct classes and avoiding duplication of code. My initial
thoughts:
* Pull the parsing code out of Assembly::IO::ace.pm and into a new
ContigIO::ace.pm (possibly inheriting from AlignIO, since the contig object
is an AssemblyI)
* Alter Assembly::IO::ace.pm to use ContigIO.pm to load each contig into the
assembly, and to write the assembly out
* Incorporate my reverse_contig function somewhere (it is like revcom for
Bio::SeqI, so possibly in the ContigI.pm file)
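The coordinate arithmetic behind a reverse_contig-style operation can be shown with a small hypothetical sketch; it is not the function described above. It assumes a contig stored as a plain hash with a padded consensus and, for each read, a 1-based padded start, a U/C orientation and the padded read sequence as aligned to the contig. A read covering positions s to s+n-1 of a contig of padded length L starts at L - s - n + 2 after reversal, and its orientation flips.

```perl
use strict;
use warnings;

# Reverse-complement a padded sequence. IUPAC codes are complemented;
# '*' pads and any other symbols (N, S, W, ...) pass through unchanged.
sub revcom_padded {
    my ($seq) = @_;
    my $rc = reverse $seq;
    $rc =~ tr/ACGTRYKMBDHVacgtrykmbdhv/TGCAYRMKVHDBtgcayrmkvhdb/;
    return $rc;
}

# Hypothetical reverse_contig: flips a contig hash in place.
#   consensus => padded consensus string
#   reads     => [ { name, strand ('U'|'C'), start (1-based padded), seq } ]
sub reverse_contig {
    my ($contig) = @_;
    my $L = length $contig->{consensus};
    $contig->{consensus} = revcom_padded($contig->{consensus});
    for my $read (@{ $contig->{reads} }) {
        my $n = length $read->{seq};
        # Read covered s .. s+n-1; position p maps to L - p + 1,
        # so the new leftmost position is L - (s + n - 1) + 1.
        $read->{start}  = $L - $read->{start} - $n + 2;
        $read->{strand} = $read->{strand} eq 'U' ? 'C' : 'U';
        $read->{seq}    = revcom_padded($read->{seq});
    }
    return $contig;
}

my $contig = {
    consensus => 'ACGT*ACG',
    reads     => [
        { name => 'r1', strand => 'U', start => 1, seq => 'ACGT*A' },
        { name => 'r2', strand => 'C', start => 4, seq => 'T*ACG' },
    ],
};
reverse_contig($contig);
print "$contig->{consensus}\n";
printf "%s %s %d %s\n", @{$_}{qw(name strand start seq)} for @{ $contig->{reads} };
```

Here the reversed consensus is CGT*ACGT, r1 moves to position 3 on the C strand and r2 to position 1 on the U strand; applying the function twice restores the original. Base qualities (BQ), base segments (BS) and quality clipping ranges (QA) would also need to be reversed or remapped, which this sketch leaves out.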
Thoughts?
---
Jordan M Swanson
Department of Ecology, Evolution, and Organismal Biology
431 Bessey Hall
Iowa State University
Ames, IA 50011
Lab 515 294-7098
FAX: 515-294-1337