[Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl

Tue Jan 6 21:34:11 UTC 2009

Just a note:

One of the major problems with the current Bio::Assembly
implementation is that all sequence features are stored in multiple
Bio::SeqFeature::Collection instances (one per contig IIRC), so one
can easily hit the ulimit for open file handles.  This bodes ill for
454/Solexa.

Switching to a single Bio::SeqFeature::CollectionI (per contig) and  
storing based on both unique seq and feature IDs would probably help  
tremendously, particularly if the database is something like  
Bio::DB::SeqFeature::Store (which stores the sequence data as well).   
It's an open project for someone to work on if they are interested,  
though Florent Angly may be tackling this.

chris
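
As an illustration of the "grep to the contig(s) of interest" idea quoted
below, here is a minimal sketch in plain Perl that streams through an ACE
file and keeps only the raw text of one named contig, so the other contigs
are never held in memory. It is not part of BioPerl: the helper name
extract_contig_block is hypothetical, and the sketch assumes that each
contig record starts with a "CO <name> ..." line and runs until the next
"CO" line. Anything that follows the last contig (for example trailing tag
blocks) will be included in that last contig's text.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the raw text of a
# single contig record, reading the file handle line by line.
# Assumption: a record starts at "CO <name> ..." and ends just before
# the next "CO" line or at end of file.
sub extract_contig_block {
    my ($fh, $wanted) = @_;
    my @block;
    my $in_contig = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {
            last if $in_contig;            # next contig begins, we are done
            $in_contig = ($1 eq $wanted);  # start collecting at our contig
        }
        push @block, $line if $in_contig;
    }
    return join '', @block;                # empty string if not found
}

# Command-line use: perl extract_contig.pl 454Contigs.ace Contig100
if (@ARGV == 2) {
    my ($file, $name) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print extract_contig_block($fh, $name);
    close $fh;
}
```

The extracted text is not a complete ACE file on its own (it has no "AS"
header line), so it would need a header added before being handed to
another tool.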

On Jan 6, 2009, at 3:07 PM, Abhishek Pratap wrote:

> OK, sure. In case we do write something (which eventually I will
> have to :) ), I will forward it.
>
> @Russell:
>
> I feel that getting info for a specific contig with the current method
> is very slow, as it tries to store the info for all contigs in memory.
> That could be memory intensive, especially with the next-gen data
> coming from 454 sequencers. I think we should grep to the contig(s) of
> interest and then create a record for just those. Please correct me if
> I am wrong.
>
> Thanks,
> -Abhi
>
> On Tue, Jan 6, 2009 at 3:52 PM, Chris Fields <cjfields at illinois.edu>  
> wrote:
>
>> Not at this time (write_assembly is not implemented).  If you come up
>> with code to do so, let us know (patches are always welcome).
>>
>> chris
>>
>>
>> On Jan 6, 2009, at 2:43 PM, Abhishek Pratap wrote:
>>
>>> Thanks, that helped.
>>>
>>> Any method to write Ace files ?
>>>
>>> Thanks,
>>> -Abhi
>>>
>>> On Tue, Jan 6, 2009 at 3:36 PM, Smithies, Russell <
>>> Russell.Smithies at agresearch.co.nz> wrote:
>>>
>>>> Here's how I've been doing it:
>>>>
>>>>
>>>> use strict;
>>>> use warnings;
>>>> use Bio::Assembly::IO;
>>>>
>>>> my $infile = "454Contigs.ace";
>>>> my $parser = Bio::Assembly::IO->new(-file => $infile, -format => "ace")
>>>>     or die "Could not open $infile: $!";
>>>> my $assembly = $parser->next_assembly;
>>>>
>>>> # to work with a named contig
>>>> my @wanted_id = ("Contig100");
>>>> my ($contig) = $assembly->select_contigs(@wanted_id)
>>>>     or die "Contig not found: @wanted_id";
>>>>
>>>> # get the consensus
>>>> my $consensus = $contig->get_consensus_sequence();
>>>>
>>>> # get the consensus qualities (qual() returns an array reference)
>>>> my @quality_values = @{ $contig->get_consensus_quality()->qual() };
>>>>
>>>> hope this helps,
>>>>
>>>> Russell
>>>>
>>>>
>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
>>>>> Sent: Tuesday, 6 January 2009 6:43 p.m.
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] Parser: Ace file (Sequence Assembly) in  
>>>>> Bioperl
>>>>>
>>>>> Hi All
>>>>>
>>>>> I am looking for some code to parse the ACE file format. I have big
>>>>> ACE files which I would like to trim based on a user-defined contig
>>>>> name and specific region, and write the output to a fresh ACE file.
>>>>>
>>>>> For now I am trying to tweak Bio::Assembly::IO, but it is kind of
>>>>> slow. Any other alternatives or suggestions?
>>>>>
>>>>> Thanks All,
>>>>> -Abhi
>>>>>
>>>>> --
>>>>> -----------------------------
>>>>> Abhishek Pratap
>>>>> Bioinformatics Software Engineer
>>>>> Institute for Genome Sciences
>>>>> School of Medicine, Univ of Maryland
>>>>> 801, W. Baltimore Street, Baltimore, MD 21209
>>>>> Ph: (+1)-410-706-2296
>>>>> www.igs.umaryland.edu/
>>>>>
>>>>> Chair
>>>>> RSG-Worldwide
>>>>> ISCB-Student Council
>>>>> http://iscbsc.org/rsg
>>>>>
>>>>> www.bioinfosolutions.com
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>