[Bioperl-l] Re (3): Status of assembly modules

Lee Katz lskatz at gmail.com
Fri Dec 24 03:06:24 UTC 2010


Done

On Sun, Dec 19, 2010 at 11:08 PM, Florent Angly <florent.angly at gmail.com> wrote:

> Hi Lee,
>
> I was able to fix the bug you reported regarding the contig IDs in the
> development version of Bioperl.
>
> For the other bug, please file a bug report at
> http://bugzilla.open-bio.org/ and give me the URL. Provide the file that
> you used so that we can reproduce the bug. Also, tell me how much memory you
> have on your machine, as I suspect that you may be running out of memory
> because of the size of your dataset and the way the Bioperl assembly modules
> deal with contigs and scaffolds.
>
> Thank you,
>
> Florent
>
>
>
> On 16/12/10 10:15, Chris Fields wrote:
>
>> Lee,
>>
>> You are more than welcome to look at the code to optimize it; it might be
>> worth looking at the way scaffolds, contigs, etc. are defined within one
>> another.  I believe Florent Angly has been actively working on these
>> modules; Florent may have thoughts on this.
>>
>> chris
>>
>> On Dec 10, 2010, at 5:53 PM, Lee Katz wrote:
>>
>>> I am wondering if there is a way to optimize the BioPerl code for Assembly
>>> IO.  Specifically, when I convert a 2.2 MB genome (~200 contigs) from a 454
>>> ace file to a regular ace file, it takes a few hours to get through 30
>>> contigs using the code below (I estimate more than a day to get through all
>>> of it).
>>>
>>> Is there a way to optimize it?  Converting a sequence file to another format
>>> takes a minute at most, so converting an ace file on the order of hours or
>>> days is too much.  I wish I understood BioPerl better, but I think the best
>>> I can do is issue a challenge or a feature request: who can speed up
>>> Assembly::IO::ace?
>>>
>>> # convert a Newbler ace to a standard ace
>>> sub _newblerAceToAce {
>>>   my($self,$args)=@_;
>>>   # input: Newbler (454) ace
>>>   my $ace454=Bio::Assembly::IO->new(-file=>$$args{ace454Path},-format=>"ace",-variant=>'454');
>>>   # output: standard ace
>>>   my $ace=Bio::Assembly::IO->new(-file=>">$$args{acePath}",-format=>"ace");
>>>   my $numContigs=`grep -c ^CO $$args{ace454Path}`+0;
>>>   logmsg "Converting $$args{ace454Path} (454-ace) to $$args{acePath} (ace). $numContigs contigs total.";
>>>   while(my $contig=$ace454->next_contig){
>>>     logmsg "Finished with ".$contig->id ." out of $numContigs";
>>>     $ace->write_contig($contig);
>>>   }
>>>   return $$args{acePath};
>>> }
>>>
>>>
>>> Message: 3
>>> Date: Mon, 22 Nov 2010 15:18:10 -0500
>>> From: Lee Katz <lskatz at gatech.edu>
>>> Subject: [Bioperl-l] Re(2): Status of assembly modules
>>> To: bioperl-l at lists.open-bio.org
>>> Message-ID:
>>>       <AANLkTi=JShCLsHDxHK4eeWD3Da=vWmRkGN2rkuLwCjxn at mail.gmail.com>
>>> Content-Type: text/plain; charset=UTF-8
>>>
>>> I figured it out (I haven't tested much though).
>>>
>>> To whoever works on Assembly::IO::ace.pm:
>>>
>>> I changed a regular expression on line 231 because the contig object was
>>> not initializing properly.  For some reason the 454 ace file had adopted
>>> the reference assembly's ID and therefore there was a GI number followed
>>> by a pipe.  The pipe was not captured with \w+.  I think that the regex
>>> will be safe with \s(\S+)\s.
>>>
>>> if (/^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts!
>>> #if (/^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts!
>>>
>>>
>>> On Thu, Nov 18, 2010 at 12:04 PM, <bioperl-l-request at lists.open-bio.org> wrote:
>>>
>>>> Message: 3
>>>> Date: Wed, 17 Nov 2010 22:20:03 -0500
>>>> From: Lee Katz <lskatz at gatech.edu>
>>>> Subject: Re: [Bioperl-l] Status of assembly modules
>>>> To: bioperl-l at lists.open-bio.org
>>>> Message-ID:
>>>>       <AANLkTi=aFAnYrEXj3D4joZeYwxRT971M_ZYR0uFJOrxc at mail.gmail.com>
>>>> Content-Type: text/plain; charset=UTF-8
>>>>
>>>> I have read on the BioPerl site that a 454 ace is not standardized due to
>>>> its coordinate system.  How can I convert it to a standard ace file?
>>>> When I run this code, using either contig or assembly objects, I get an
>>>> error:
>>>>
>>>> Can't call method "get_consensus_sequence" on an undefined value at
>>>> Bio/Assembly/IO/ace.pm line 280, <GEN0> line 93349.
>>>>
>>>>   sub _newblerAceToAce {
>>>>     my($self,$args)=@_;
>>>>     my $ace454=Bio::Assembly::IO->new(-file=>$$args{ace454Path},-format=>"ace",-variant=>'454');
>>>>     my $ace=Bio::Assembly::IO->new(-file=>">$$args{acePath}",-format=>"ace");
>>>>     #while(my $contig=$ace454->next_contig){
>>>>     while(my $scaffold=$ace454->next_assembly){
>>>>       print Dumper $scaffold;
>>>>     }
>>>>     return $$args{acePath};
>>>>   }
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
>
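
The regex change quoted above (the Nov 22 message) can be checked on its own, outside BioPerl. This is only a minimal sketch: the CO line and the contig name in it are made up for illustration and do not come from a real ace file, and the variable names are hypothetical. It compares the original \w+ pattern with the patched \S+ pattern on a contig name that contains pipes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical CO line: a contig named after a reference GI identifier.
# Fields after the name: bases, reads, base segments, orientation (U or C).
my $co_line = "CO gi|12345678|ref|NC_000000.1| 2200000 5000 120 U\n";

# Original pattern: \w+ cannot match the pipe characters in the name.
my $old = qr/^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms;
# Patched pattern: \S+ accepts any run of non-whitespace characters.
my $new = qr/^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms;

# 1 if the original pattern matches the line, 0 otherwise.
my $old_matches = $co_line =~ $old ? 1 : 0;
# First capture group of the patched pattern (the contig name).
my ($name) = $co_line =~ $new;

print "old pattern matches: $old_matches\n";
print "new pattern captures: ", (defined $name ? $name : "(nothing)"), "\n";
```

With a name like this, the original pattern should fail to match at all, which would be consistent with the contig object not being initialized, while the patched pattern captures the whole name up to the first whitespace.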


-- 
Lee Katz
http://leeskatz.com


