[Bioperl-l] Re (3): Status of assembly modules

Thu Dec 16 00:15:11 UTC 2010

Lee,

You are more than welcome to look at the code to optimize it; might be worth looking athe they way scaffolds, contigs, etc are defined within one aonther.  I believe Florent Angly has been actively working on these modules; Florent may have thoughts on this.

chris

On Dec 10, 2010, at 5:53 PM, Lee Katz wrote:

> I am wondering if there is a way to optimize the BioPerl code for Assembly
> IO.  Specifically, when I convert a 2.2 MB genome (~200 contigs) from a 454
> ace file to a regular ace file, it takes a few hours to get through 30
> contigs using the code below (I estimate more than a day to get through all
> of it).
> 
> Is there a way to optimize it?  To convert a sequence file to another format
> at most would take a minute and therefore converting an ace on the magnitude
> of hours or days is too much.  I wish I understood bioperl better but I
> think the best I can do is issue a challenge or a feature request:  who can
> speed up Assembly::IO::ace?
> 
> # convert a Newbler ace to a standard ace
> sub _newblerAceToAce($args){
>  my($self,$args)=@_;
>  my
> $ace454=Bio::Assembly::IO->new(-file=>$$args{ace454Path},-format=>"ace",-variant=>'454');
>  my $ace=Bio::Assembly::IO->new(-file=>">$$args{acePath}",-format=>"ace");
> #output ace
>  my $numContigs=`grep -c ^CO $$args{ace454Path}`+0;
>  logmsg "Converting $$args{ace454Path} (454-ace) to $$args{acePath} (ace).
> $numContigs contigs total.";
>  while(my $contig=$ace454->next_contig){
>    logmsg "Finished with ".$contig->id ." out of $numContigs";
>    $ace->write_contig($contig);
>  }
>  return $$args{acePath};
> }
> 
> 
> Message: 3
> 
> Date: Mon, 22 Nov 2010 15:18:10 -0500
> 
> From: Lee Katz <lskatz at gatech.edu>
> 
> Subject: [Bioperl-l] Re(2): Status of assembly modules
> 
> To: bioperl-l at lists.open-bio.org
> 
> Message-ID:
> 
>       <AANLkTi=JShCLsHDxHK4eeWD3Da=vWmRkGN2rkuLwCjxn at mail.gmail.com>
> 
> Content-Type: text/plain; charset=UTF-8
> 
> 
> I figured it out (I haven't tested much though).
> 
> 
> To whoever works on Assembly::IO::ace.pm:
> 
> I changed a regular expression on line 231 because the contig object was not
> 
> initializing properly.  For some reason the 454 ace file had adopted the
> 
> reference assembly's ID and therefore there was a GI number followed by a
> 
> pipe.  The pipe was not captured with \w+.  I think that the regex will be
> 
> safe with \s(\S+)\s.
> 
> 
> if (/^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts!
> 
> #if (/^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts!
> 
> 
> On Thu, Nov 18, 2010 at 12:04 PM, <bioperl-l-request at lists.open-bio.org
>> wrote:
> 
> 
>> Message: 3
> 
>> Date: Wed, 17 Nov 2010 22:20:03 -0500
> 
>> From: Lee Katz <lskatz at gatech.edu>
> 
>> Subject: Re: [Bioperl-l] Status of assembly modules
> 
>> To: bioperl-l at lists.open-bio.org
> 
>> Message-ID:
> 
>>       <AANLkTi=aFAnYrEXj3D4joZeYwxRT971M_ZYR0uFJOrxc at mail.gmail.com>
> 
>> Content-Type: text/plain; charset=UTF-8
> 
>> 
> 
>> I have read on the BioPerl site that a 454 ace is not standardized due to
> 
>> its coordinate system.  How can I convert it to the standard ace file?
> 
>> 
> 
>> When I run this code either by using contig or assembly objects, I get an
> 
>> error.
> 
>> Can't call method "get_consensus_sequence" on an undefined value at
> 
>> Bio/Assembly/IO/ace.pm line 280, <GEN0> line 93349.
> 
>> 
> 
>>   sub _newblerAceToAce($args){
> 
>>     my($self,$args)=@_;
> 
>>     my
> 
>> 
> 
>> $ace454=Bio::Assembly::IO->new(-file=>$args{ace454Path},-format=>"ace
> ",-variant=>'454');
> 
>>     my
> 
>> $ace=Bio::Assembly::IO->new(-file=>">$args{acePath}",-format=>"ace");
> 
>>     #while(my $contig=$ace454->next_contig){
> 
>>     while(my $scaffold=$ace454->next_assembly){
> 
>>       print Dumper $scaffold;
> 
>>     }
> 
>>     return $args{acePath};
> 
>>   }
> 
>> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l