[Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl

Chris Fields cjfields at illinois.edu
Tue Jan 6 22:13:24 UTC 2009


How about re-implementing the Bio::Assembly classes so they simply map to
Bio::DB::SeqFeature::Store (or similar) methods?  A Scaffold could just
be a wrapper around a Bio::DB::SeqFeature::Store (which can be backed by
BDB, MySQL, PostgreSQL or memory) and return Contigs.

Similarly, the IO classes could probably act as specialized
Bio::DB::SeqFeature::Store::Loader classes for the database and just
return the Scaffold instance.

chris

On Jan 6, 2009, at 3:31 PM, Smithies, Russell wrote:

> I agree with the need for a faster parser.
> Although the current version does a great job, it is slow and memory  
> intensive as it loads everything into Bio::Assembly::Scaffold  
> objects composed of Bio::Assembly::Contig objects.
> I'm not sure exactly what the best solution would be; perhaps a new
> constructor with a named contig would simplify things?
>
>    $io = Bio::Assembly::IO->new(-file => "454_assy.ace", -format => "ace");
>
>    $contig = $io->next_assembly_with_contig(-contig => "Contig000100")->select_contig;
>
> Or do we even need a next_assembly method?
> Can there be more than one assembly in an .ace file?
>
> --Russell
>
>
>
> From: Abhishek Pratap [mailto:abhishek.vit at gmail.com]
> Sent: Wednesday, 7 January 2009 10:07 a.m.
> To: Chris Fields
> Cc: Smithies, Russell; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl
>
> OK, sure. If we do end up writing something (which eventually I will
> have to :) ), I will forward it.
>
> @Russell:
>
> I feel that for getting info on a specific contig, the current method
> is very slow, as it tries to store the info for all contigs in memory.
> This can be very memory intensive, especially with the next-gen data
> coming from 454 sequencers. I think we should grep for the contig(s)
> of interest and then create a record only for those. Please correct
> me if I am wrong.
>
> Thanks,
> -Abhi
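A minimal pure-Perl sketch of the grep-first idea, assuming the usual consed ACE layout: a contig starts at a line "CO <name> <num_bases> <num_reads> <num_base_segments> <U|C>", its padded consensus follows up to the first blank line, and its record runs until the next CO line, a trailing CT{/RT{/WA{ tag block (assumed to come after all contigs), or end of file. The helper name extract_contig is hypothetical and not part of BioPerl. It streams the file and keeps only the wanted contig's lines, so memory use scales with that one contig rather than the whole assembly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): stream an ACE file and keep
# only the record of one named contig. Returns a hash reference with the
# CO header fields, the padded consensus and the raw record lines, or undef
# if the contig is not found.
sub extract_contig {
    my ($fh, $wanted) = @_;
    my ($in_contig, %contig);
    my $in_consensus = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/) {
            last if $in_contig;             # next contig: ours is complete
            next unless $1 eq $wanted;      # skip other contigs' headers
            $in_contig = 1;
            %contig = (name => $1, num_bases => $2, num_reads => $3,
                       consensus => '', lines => [$line]);
            $in_consensus = 1;              # consensus lines follow CO
            next;
        }
        next unless $in_contig;             # not inside the wanted contig
        last if $line =~ /^(?:CT|RT|WA)\{/; # assumed: tag blocks come last
        push @{ $contig{lines} }, $line;
        if ($in_consensus) {
            if ($line =~ /^\s*$/) { $in_consensus = 0 }  # blank line ends it
            else { chomp(my $s = $line); $contig{consensus} .= $s }
        }
    }
    return $in_contig ? \%contig : undef;
}
```

Typical use would be: open my $fh, '<', '454Contigs.ace' or die $!; my $c = extract_contig($fh, 'Contig100') or die "not found\n"; The scan still reads the file sequentially up to the contig, so for many lookups on the same file it may be worth recording the byte offset of each CO line once (with tell) and jumping back to it later (with seek).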
> On Tue, Jan 6, 2009 at 3:52 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Not at this time (write_assembly is not implemented).  If you come  
> up with code to do so let us know (patches are always welcome).
>
> chris
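Since write_assembly is not implemented, one possible workaround when only a single contig is needed is to copy that contig's record lines verbatim into a new file behind a rebuilt AS header ("AS <number of contigs> <number of reads>"). A minimal sketch, assuming the caller already holds the record lines starting at the CO line; write_single_contig_ace is a hypothetical name, not a BioPerl method, and any CT{/RT{/WA{ tag blocks belonging to the contig are not carried over.

```perl
use strict;
use warnings;

# Hypothetical helper: write a one-contig ACE file. @record holds the
# contig's lines exactly as read from the source file, starting with its
# CO line; the AS header is rebuilt to count 1 contig and that contig's reads.
sub write_single_contig_ace {
    my ($out, @record) = @_;
    die "empty contig record\n" unless @record;
    my ($num_reads) = $record[0] =~ /^CO\s+\S+\s+\d+\s+(\d+)/
        or die "record must start with a CO line\n";
    print {$out} "AS 1 $num_reads\n\n", @record
        or die "write failed: $!\n";
    my $last = $record[-1];
    print {$out} "\n" unless $last =~ /\n\z/;   # terminate an unterminated final line
    print {$out} "\n" unless $last =~ /^\s*$/;  # then follow it with a blank line
    return;
}
```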
>
>
> On Jan 6, 2009, at 2:43 PM, Abhishek Pratap wrote:
> Thanks that helped.
>
> Is there any method to write ACE files?
>
> Thanks,
> -Abhi
>
> On Tue, Jan 6, 2009 at 3:36 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
> Here's how I've been doing it:
>
>
> use strict;
> use warnings;
> use Bio::Assembly::IO;
>
> my $infile = "454Contigs.ace";
> # new() throws an exception itself if the file cannot be opened
> my $parser = Bio::Assembly::IO->new(-file => $infile, -format => "ace");
> my $assembly = $parser->next_assembly;
>
> # to work with a named contig
> my @wanted_id = ("Contig100");
> my ($contig) = $assembly->select_contigs(@wanted_id)
>     or die "Contig @wanted_id not found in $infile\n";
>
> # get the consensus (a Bio::LocatableSeq)
> my $consensus = $contig->get_consensus_sequence();
>
> # get the consensus qualities (qual() returns an array reference)
> my @quality_values = @{ $contig->get_consensus_quality()->qual() };
>
> hope this helps,
>
> Russell
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
> Sent: Tuesday, 6 January 2009 6:43 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl
>
> Hi All
>
> I am looking for some code to parse the ACE file format. I have big
> ACE files which I would like to trim based on a user-defined contig
> name and a specific region, and then write the output to a fresh ACE
> file.
>
> For now I am trying to tweak Bio::Assembly::IO, but it is kind of
> slow. Any other alternatives or suggestions?
>
> Thanks All,
> -Abhi
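For the region part of this request, a minimal sketch of trimming just the consensus, assuming the consed ACE conventions that the consensus is padded with '*' and that BQ holds one quality value per unpadded base (pads have no quality). The function name trim_consensus is hypothetical. It maps a 1-based region on the unpadded consensus onto the padded string and slices the matching qualities. Trimming the reads (AF offsets, RD sequences, QA clipping) to the same window would need extra coordinate bookkeeping that this sketch does not attempt.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a region out of a padded ACE consensus.
# $start and $end are 1-based positions on the UNPADDED consensus ('*' pads
# are not counted), which is also how the BQ quality values are indexed.
# Returns the padded sub-sequence and a reference to its quality values.
sub trim_consensus {
    my ($padded, $quals, $start, $end) = @_;
    die "bad region $start-$end\n" if $start < 1 || $end < $start;
    die "region $start-$end is past the quality values\n" if $end > @$quals;
    my ($pos, $from, $to) = (0);
    for my $i (0 .. length($padded) - 1) {
        next if substr($padded, $i, 1) eq '*';   # pads have no quality
        $pos++;                                  # unpadded position
        $from = $i if $pos == $start;
        if ($pos == $end) { $to = $i; last }
    }
    die "region $start-$end is outside the consensus\n" unless defined $to;
    my $sub = substr($padded, $from, $to - $from + 1);
    my @q = @{$quals}[ $start - 1 .. $end - 1 ];
    return ($sub, \@q);
}
```

For example, trim_consensus('AC*GT*A', [10, 20, 30, 40, 50], 2, 4) returns the padded slice 'C*GT' together with the qualities 20, 30 and 40; pads inside the window are kept so the slice stays in padded coordinates.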
>
> --
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
>
> Chair
> RSG-Worldwide
> ISCB-Student Council
> http://iscbsc.org/rsg
>
> www.bioinfosolutions.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> ======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> ======================================================================
