[Bioperl-l] ace.pm

Mon Feb 16 22:35:49 EST 2004

Jason Stajich wrote:

> 
> People write code and modules to support the work they are doing,
> sometimes for a specific data set - so I suspect Robson wrote this to
> support the phrap ace format, which by convention names contigs ContigXX.
> 
> You are welcome to make changes to the code on your local system to get it
> working and then post the diffs so they can be incorporated back in.  Why
> not try changing the code as you have suggested and see if it works?  It
> is a collaborative project and these modules are newish, so try
> fixing things and then get feedback on your fixes.

I have modified one line in Bio/Assembly/IO/ace.pm as shown below:

         # Loading contig sequence (COntig sequence field)
#       (/^CO Contig(\d+) (\d+) (\d+) (\d+) (\w+)/) && do { # New contig found!
         (/^CO (\w+) (\d+) (\d+) (\d+) (\w+)/) && do { # New contig found!

The change will cause the contigID to be whatever the second field of
this line is (CO CL15Contig1 794 4 0 U).  In this case, it would be
set to "CL15Contig1".
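
For anyone who wants to check the effect of the change outside of
BioPerl, here is a minimal standalone sketch. It only compares the two
patterns against the sample line from this thread; it is not the actual
ace.pm code. The variable names ($contig_id, $num_bases, and so on) are
my own labels for the captured fields and are assumptions, not names
used by the module.

```perl
use strict;
use warnings;

# Sample CO line from a tgicl-generated ACE file (taken from this thread).
my $line = 'CO CL15Contig1 794 4 0 U';

# Original pattern: only accepts phrap-style names of the form "Contig<digits>".
my $old_re = qr/^CO Contig(\d+) (\d+) (\d+) (\d+) (\w+)/;

# Relaxed pattern: accepts any run of word characters as the contig ID.
my $new_re = qr/^CO (\w+) (\d+) (\d+) (\d+) (\w+)/;

# Record whether the original pattern recognises the tgicl line at all.
my $old_matches = ($line =~ $old_re) ? 1 : 0;

# Hypothetical field labels; the module itself may name these differently.
my ($contig_id, $num_bases, $num_reads, $num_segments, $orientation);
if ($line =~ $new_re) {
    ($contig_id, $num_bases, $num_reads, $num_segments, $orientation)
        = ($1, $2, $3, $4, $5);
}

print "old pattern matches: $old_matches\n";
print "contig ID: $contig_id\n" if defined $contig_id;
```

Two things to keep in mind with the relaxed pattern: a phrap-style line
such as "CO Contig12 ..." would now yield the ID "Contig12" rather than
"12", and \w+ will not match names containing characters such as "-" or
".", so other assemblers' naming schemes may still need a looser pattern.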

> 
> -jason
> 
> On Tue, 17 Feb 2004, Wes Barris wrote:
> 
>  > Hi,
>  >
>  > ACE files generated by an application called tgicl have "CO"
>  > lines of the form:
>  >
>  > CO CL15Contig2 794 4 0 U
>  >
>  > This line is not parsed properly by the ace.pm bioperl module.
>  > Notice this line from Bio/Assembly/IO/ace.pm .
>  >
>  >          (/^CO Contig(\d+) (\d+) (\d+) (\d+) (\w+)/) && do { # New contig found!
>  >
>  > Bioperl expects the second "word" in the line to be "Contig\d+" where
>  > the number is used as the "contigID".  Is there a reason why
>  > "contigID" must be a number?  Why can't it be the whole second
>  > "word" of the "CO" line?
>  >
> 
> -- 
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> 

-- 
Wes Barris
E-Mail: Wes.Barris at csiro.au