[Bioperl-l] Proposal for bio-perl updates: ACE assembly file
Elia Stupka
elia at tigem.it
Tue Mar 1 06:51:16 EST 2005
Hi Jordan,
I have been doing some work on Bio::Assembly::Contig myself recently, and
have also been in touch with the author (Robson) about it. Perhaps the
best thing would be for the three of us to have a chat about this
object, try to revamp it a little with our improvements, and then
Robson or I can check it in?
regards,
Elia
On Feb 28, 2005, at 5:05 PM, Jordan Swanson wrote:
> On Monday 14 February 2005 12:05 pm, Jordan Swanson wrote:
>> Hi,
>> I am new to bioperl, but I have a proposal for updating bioperl with
>> some of the code I have been using.
>>
>> Bioperl packages currently exist that open ACE assembly files (output
>> by phrap, CAP3, and other assembly programs). However, the current code
>> reads in the entire file in a single call:
>>
>> use Bio::Assembly::IO;
>>
>> # next_assembly parses every contig in the file before returning
>> my $assembly_in = Bio::Assembly::IO->new(-file   => "input.ace",
>>                                          -format => 'ace');
>> my $assembly = $assembly_in->next_assembly;
>>
>> I am working on a large EST assembly project (roughly 150K ESTs), and
>> our assembly files have been around 200 MB in size. For many of our
>> applications we only need to process one contig at a time; moreover,
>> reading the entire assembly at once requires a large amount of memory
>> and/or disk space.
>>
>> I have developed some code that reads in contigs one at a time, and
>> therefore uses only the memory needed for a single contig object. A
>> brief synopsis:
>>
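>> use Bio::Assembly::ContigIO;
>>
>> my $contig_in = Bio::Assembly::ContigIO->new(-file   => $filename,
>>                                              -format => 'ace');
>> while (my $contig = $contig_in->next_contig) {
>>     # only the current contig is held in memory
>>     do_stuff_with_contig($contig);
>> }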
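A minimal, BioPerl-independent sketch of the one-contig-at-a-time idea, assuming only the phrap-style ACE layout in which each contig begins with a "CO <name> <padded bases> <reads> <segments> <U|C>" line followed by its padded consensus and a blank line. The function make_contig_iterator and its hash keys are hypothetical illustrations, not the API of the proposed ContigIO module; only the CO header and consensus are parsed, and read records are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields one contig (a hash ref)
# per call from an open ACE filehandle, or undef once the file is exhausted.
# Only the CO line and the padded consensus after it are parsed here;
# BQ, AF, BS, RD and QA records are simply skipped.
sub make_contig_iterator {
    my ($fh) = @_;
    return sub {
        # Skip ahead to the next "CO <name> <bases> <reads> <segments> <U|C>" line
        my $co;
        while (!defined $co) {
            my $line = <$fh>;
            return unless defined $line;    # end of file: no more contigs
            $co = $line if $line =~ /^CO\s/;
        }
        my (undef, $name, $nbases, $nreads, $nsegs, $dir) = split ' ', $co;

        # Consensus lines run until the first blank line
        my $consensus = '';
        while (my $line = <$fh>) {
            last if $line =~ /^\s*$/;
            chomp $line;
            $consensus .= $line;
        }
        return {
            name       => $name,
            length     => $nbases,      # padded length stated on the CO line
            num_reads  => $nreads,
            complement => $dir eq 'C',
            consensus  => $consensus,   # padded; '*' marks a pad
        };
    };
}

# Demo on a tiny in-memory ACE fragment (made-up data for illustration)
my $ace = <<'ACE';
AS 2 2

CO Contig1 9 1 1 U
ACG*TACGT

BQ
20 20 20 20 20 20 20 20

AF read1 U 1
BS 1 9 read1

RD read1 9 0 0
ACG*TACGT

QA 1 9 1 9

CO Contig2 4 1 1 C
GGCC

BQ
30 30 30 30

AF read2 C 1
BS 1 4 read2

RD read2 4 0 0
GGCC

QA 1 4 1 4
ACE

open my $fh, '<', \$ace or die "Cannot open in-memory ACE: $!";
my $next_contig = make_contig_iterator($fh);
my @seen;
while (my $contig = $next_contig->()) {
    push @seen, $contig;    # real code would process and discard each contig
    printf "%s\t%d bases\t%d reads\n", @{$contig}{qw(name length num_reads)};
}
close $fh;
```

For a real file, open it with open my $fh, '<', 'input.ace' and do the work inside the loop instead of collecting contigs, so that only one contig is in memory at a time.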
>>
>> Furthermore, no existing code writes out ACE files or reverses a
>> contig's orientation. I have developed code that implements both, and
>> if you are interested, I would like to submit it. I have been
>> converting this code to a more bioperl-friendly form (inheriting from
>> bioperl sequence objects, using the bioperl IO system, bioperl-style
>> warnings, and so forth).
>>
>> I would appreciate some advice on how to proceed, specifically on
>> inheriting from the correct classes and avoiding duplication of code.
>> My initial thoughts:
>>
>> * Pull the parsing code out of Assembly::IO::ace.pm and into a new
>>   ContigIO::ace.pm (possibly inheriting from AlignIO, since the contig
>>   object implements Bio::Align::AlignI)
>> * Alter Assembly::IO::ace.pm to use ContigIO.pm to load each contig
>>   into the assembly, and to write the assembly back out
>> * Incorporate my reverse_contig function somewhere (it is analogous to
>>   revcom for Bio::SeqI, so it could go in the ContigI.pm file)
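The reverse_contig idea can be sketched on plain Perl data structures. Reversing a contig means reverse-complementing the padded consensus (pads stay attached to their mirrored positions), reversing the per-base quality list, flipping each read's U/C orientation, and moving each read so it ends where it used to begin: a read covering padded positions s..s+l-1 in a contig of padded length L ends up covering L-s-l+2..L-s+1. This is a minimal sketch, assuming 1-based padded AF start positions and reads stored under hypothetical hash keys; revcom_contig is an illustrative name, not Jordan's function or a BioPerl method, and base segments (BS lines) would need the same mirroring but are omitted.

```perl
use strict;
use warnings;

# Reverse-complement a padded sequence; '*' pads are not translated,
# only their position is mirrored.
sub revcom_padded {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar context: reverses the string
    $rc =~ tr/ACGTNacgtn/TGCANtgcan/;
    return $rc;
}

# Hypothetical contig layout assumed by this sketch:
#   consensus => padded consensus string
#   quality   => array ref of BQ values (one per unpadded base)
#   reads     => { name => { start      => padded AF start (1-based),
#                            seq        => padded read sequence,
#                            complement => 0 or 1 } }
sub revcom_contig {
    my ($contig) = @_;
    my $len = length $contig->{consensus};    # padded contig length L
    $contig->{consensus} = revcom_padded($contig->{consensus});
    $contig->{quality}   = [ reverse @{ $contig->{quality} } ];
    for my $read (values %{ $contig->{reads} }) {
        my $rlen = length $read->{seq};
        # s..s+l-1 maps to L-(s+l-1)+1 .. L-s+1, so the new start is L-s-l+2
        $read->{start}      = $len - $read->{start} - $rlen + 2;
        $read->{seq}        = revcom_padded($read->{seq});
        $read->{complement} = $read->{complement} ? 0 : 1;
    }
    return $contig;
}

# Demo with made-up data
my %contig = (
    consensus => 'AACG*T',
    quality   => [10, 20, 30, 40, 50],
    reads     => {
        read1 => { start => 1, seq => 'AACG', complement => 0 },
        read2 => { start => 3, seq => 'CG*T', complement => 1 },
    },
);
revcom_contig(\%contig);
print "$contig{consensus}\n";    # prints A*CGTT
for my $name (sort keys %{ $contig{reads} }) {
    my $r = $contig{reads}{$name};
    print "$name start=$r->{start} seq=$r->{seq} complement=$r->{complement}\n";
}
```

Because each read is mirrored with the same coordinate map as the consensus, applying the function twice restores the original contig.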
>>
>> Thoughts?
>
> I've gone ahead and incorporated my changes into bioperl-compliant
> objects:
>
> * Bio/Assembly/ContigIO.pm created
> * Bio/Assembly/ContigIO directory created
> * Bio/Assembly/ContigIO/ace.pm created
> * Bio/Assembly/IO/ace.pm modified to use Bio::Assembly::Contig
> * Bio/Assembly/Contig.pm modified to allow base segments and to add a
>   revcom method
> * t/ContigIO.t created
>
> How does one submit code for inspection, review, and incorporation? I
> used CVS to check out the code I've been using, but "cvs add" does not
> work at my permission level.
>
> --
> Jordan M Swanson
> Department of Ecology, Evolution, and Organismal Biology
> Iowa State University
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
---
Telethon Institute of Genetics and Medicine
Via Pietro Castellino, 111
80131 Napoli
Tel. +39 081 6132 335
Fax. +39 081 560 98 77