[Bioperl-l] Proposal for bio-perl updates: ACE assembly file

Elia Stupka elia at tigem.it
Tue Mar 1 06:51:16 EST 2005


Hi Jordan,

I have been doing some work on Bio::Assembly::Contig myself recently, and 
have also been in touch with the author (Robson) about it. Perhaps the 
best thing would be for the three of us to have a chat about this 
object, try to revamp it a little with our improvements, and then 
Robson or I can check it in?

regards,

Elia

On Feb 28, 2005, at 5:05 PM, Jordan Swanson wrote:

> On Monday 14 February 2005 12:05 pm, Jordan Swanson wrote:
>> Hi,
>> I am new to bioperl, but I have a proposal for updating bioperl with some
>> of the code I have been using.
>>
>> Bioperl packages currently exist that open ACE assembly files (output by
>> phrap/cap3 and other assembly programs).  However, the current code brings
>> in the entire file in one call:
>>
>> my $assembly_in = Bio::Assembly::IO->new(-file   => "input.ace",
>>                                          -format => 'ace');
>>
>> my $assembly = $assembly_in->next_assembly;
>>
>> I am working on a large EST assembly project (roughly 150K) and our assembly
>> files have been around 200 MB in size.  For many of our applications, we
>> only need to process one contig at a time, not to mention that reading the
>> entire assembly at once requires a large amount of memory and/or disk
>> space.
>>
>> I have developed some code that reads in contigs one at a time, therefore
>> using only the amount of space needed for one contig object. A brief
>> synopsis:
>>
>> my $contig_in = ContigIO->new(-file => $filename, -format => 'ace');
>> while (my $contig = $contig_in->next_contig) {
>>     # only the current contig is held in memory
>>     do_stuff_with_contig($contig);
>> }
>>
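
A minimal sketch of the contig-at-a-time reading described above, written in plain Perl so it runs without BioPerl. The function name make_contig_iterator and the hash layout are assumptions made for illustration; this is not the ContigIO code from this thread. It relies only on the ACE convention that each contig starts with a "CO" line, is followed by its consensus up to a blank line, and lists its reads on "AF" lines.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl API: wraps an open ACE filehandle in a
# closure that returns one contig (as a plain hash ref) per call.
sub make_contig_iterator {
    my ($fh) = @_;
    my $pending;    # "CO" header read ahead while scanning the previous contig

    return sub {
        my $line = defined $pending ? $pending : <$fh>;
        undef $pending;

        # Skip the "AS" header and anything else before the next contig.
        $line = <$fh> while defined $line && $line !~ /^CO\s/;
        return unless defined $line;    # no more contigs

        my (undef, $name, $n_bases, $n_reads) = split ' ', $line;
        my %contig = (
            name      => $name,
            n_bases   => $n_bases,
            n_reads   => $n_reads,
            consensus => '',
            reads     => [],
        );

        # The padded consensus follows the header and ends at a blank line.
        while (defined($line = <$fh>)) {
            last if $line =~ /^\s*$/;
            chomp $line;
            $contig{consensus} .= $line;
        }

        # Collect read placements (AF lines) until the next contig or EOF.
        while (defined($line = <$fh>)) {
            if ($line =~ /^CO\s/) { $pending = $line; last; }
            if ($line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
                push @{ $contig{reads} },
                    { name => $1, strand => $2, start => $3 };
            }
        }
        return \%contig;
    };
}

# Usage (file name is a placeholder):
#   open my $fh, '<', 'input.ace' or die "Cannot open input.ace: $!";
#   my $next_contig = make_contig_iterator($fh);
#   while (my $contig = $next_contig->()) {
#       print "$contig->{name}: ", scalar @{ $contig->{reads} }, " reads\n";
#   }
```

Because the iterator keeps only one line of look-ahead, memory use is bounded by the largest single contig rather than by the size of the file.
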
>> Furthermore, there is currently no code that writes out ACE files or
>> reverses a contig's orientation.  I have developed some code that
>> implements both, and if you would have it, I would like to submit this
>> code.  I have been working on converting this code to a more
>> bioperl-friendly format (inheriting from bioseq objects, using the bioperl
>> IO system, bioperl-style warnings and so forth).
>>
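
To illustrate the writing side mentioned above, here is a small, deliberately incomplete sketch in plain Perl. The function name format_contig_start, the hash keys, and the 50-column wrap are all assumptions for illustration and are not taken from the code discussed in this thread. It emits only the "CO" header, the wrapped consensus and the "AF" lines; a complete ACE contig record also carries BQ, BS, RD and QA sections.

```perl
use strict;
use warnings;

# Hypothetical formatter, not a BioPerl API. Emits only the start of an ACE
# contig record: CO header, wrapped consensus, blank line, AF lines.
sub format_contig_start {
    my ($contig, $width) = @_;
    $width ||= 50;    # line width is an arbitrary choice for this sketch

    my @reads = @{ $contig->{reads} || [] };

    # CO <name> <padded bases> <reads> <base segments> <U|C>
    my $out = sprintf "CO %s %d %d %d %s\n",
        $contig->{name},
        length($contig->{consensus}),
        scalar(@reads),
        $contig->{n_segments} || 0,
        $contig->{strand}     || 'U';

    # Wrap the padded consensus into fixed-width lines.
    $out .= "$_\n" for unpack "(A${width})*", $contig->{consensus};
    $out .= "\n";

    # One AF line per read: name, orientation, padded start position.
    $out .= sprintf "AF %s %s %d\n", @{$_}{qw(name strand start)} for @reads;
    return $out;
}
```

Building the record as a string keeps the formatter independent of the output handle, so the same routine could feed a file, a pipe, or a test.
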
>> I would appreciate some advice on how to proceed, specifically on
>> inheriting from the correct classes and avoiding duplication of code. My
>> initial thoughts:
>>
>> * Pull out the parsing code from Assembly::IO::ace.pm and into a new
>>   ContigIO::ace.pm (possibly inherited from AlignIO, since the contig object
>>   is an AssemblyI)
>> * Alter Assembly::IO::ace.pm to use ContigIO.pm to load each contig and to
>>   output the assembly
>> * Incorporate somewhere my reverse_contig function (which is like revcom
>>   for Bio::SeqI, so possibly in the ContigI.pm file)
>>
>> Thoughts?
>
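
The reverse_contig idea in the list above can be sketched in plain Perl as follows. This is an illustration under stated assumptions, not the submitted code: the function names, the hash layout, and the use of 1-based padded start positions are all hypothetical. Reversing a contig means reverse-complementing the consensus and every read, flipping each read's orientation, and recomputing each read's start so that its old last base becomes its new first base.

```perl
use strict;
use warnings;

# Reverse-complement a padded sequence; pads ('*') and N are left unchanged.
sub revcom_seq {
    my ($seq) = @_;
    my $rc = reverse $seq;    # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical contig layout:
#   { consensus => '...', reads => [ { name, strand, start, seq }, ... ] }
# where start is the 1-based padded position of the read's first base.
sub revcom_contig {
    my ($contig) = @_;
    my $len = length $contig->{consensus};

    my @reads;
    for my $read (@{ $contig->{reads} }) {
        my $read_len = length $read->{seq};
        my $old_end  = $read->{start} + $read_len - 1;
        push @reads, {
            name   => $read->{name},
            strand => $read->{strand} eq 'U' ? 'C' : 'U',
            start  => $len - $old_end + 1,    # mirror the old end position
            seq    => revcom_seq($read->{seq}),
        };
    }

    # Copy the remaining fields and replace the ones that changed.
    my %rc = (
        %$contig,
        consensus => revcom_seq($contig->{consensus}),
        reads     => \@reads,
    );
    return \%rc;
}
```

Returning a new hash rather than modifying the argument makes the operation easy to check: applying revcom_contig twice should give back the original coordinates and sequences.
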
> I've gone ahead and incorporated my changes into bioperl compliant objects.
>
> *Bio/Assembly/ContigIO.pm created
> *Bio/Assembly/ContigIO directory created
> *Bio/Assembly/ContigIO/ace.pm created
> *Bio/Assembly/IO/ace.pm modified to use Bio::Assembly::Contig
> *Bio/Assembly/Contig.pm modified to allow base segments and to add a
>  revcom method
> *t/ContigIO.t created
>
> How does one submit their code for inspection/review/incorporation?  I used
> cvs to check out the code I've been using, but "cvs add" is not working at my
> permission level.
>
>
>
>
> -- 
> Jordan M Swanson
> Department of Ecology, Evolution, and Organismal Biology
> Iowa State University
>
---
Telethon Institute of Genetics and Medicine
Via Pietro Castellino, 111
80131 Napoli

Tel. +39 081 6132 335
Fax. +39 081 560 98 77
