[Bioperl-l] Getting read position information from an ACE file?

Mon Sep 21 12:01:03 UTC 2009

Dan Bolser wrote:
> 2009/9/18 Mark A. Jensen <maj at fortinbras.us>:
>   
>> Dan -- I don't know much about Assembly, so can't help there. But can I
>> encourage you and perhaps one or two others (steganographic content:
>> fangly) to create a HOWTO stub out of this? Would be excellent.
>>     
>
> I'd love to. ACE files are pretty ubiquitous, so any additional info on
> how to work with them using BioPerl should help a lot of people.
>   
> The problem is that I'm one of those people ;-)
>
>
> I'm working on an 'ace2tab.plx' script that should encompass this
> info. I'm finding that some 'read ids' have the .range format, e.g.
> "read123455.23-239". However, some do not, e.g. "read123456". Not sure
> where this ID comes from, but I think it's telling me something about
> partially aligned reads.

I think you are right. I have heard that Newbler (the 454 assembler) 
does this insane thing, where it will rip reads apart into segments and 
cluster parts of reads in different contigs.
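
For what it's worth, here is a minimal sketch of telling the two ID forms
apart. It is in Python rather than Perl, and it assumes the suffix is always
".start-end" with plain integers, which is only a guess based on the one
example above. The function name split_read_id is made up for illustration;
it is not part of BioPerl or of any assembler's tooling.

```python
import re

# Assumed ID layout: "<name>" or "<name>.<start>-<end>".
# This pattern is inferred from the single example in this thread,
# not from any format specification.
READ_ID = re.compile(r"^(?P<name>.+?)(?:\.(?P<start>\d+)-(?P<end>\d+))?$")


def split_read_id(read_id):
    """Return (name, start, end); start and end are None without a range suffix."""
    match = READ_ID.match(read_id)
    if match is None:
        raise ValueError("unrecognised read id: %r" % read_id)
    if match.group("start") is None:
        # Plain ID with no ".start-end" suffix.
        return read_id, None, None
    return match.group("name"), int(match.group("start")), int(match.group("end"))


if __name__ == "__main__":
    for rid in ("read123455.23-239", "read123456"):
        print(rid, split_read_id(rid))
```

Grouping the results by name would then show which reads were split into
more than one segment, if the suffix really does mark a segment.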

> The problem is that the coordinates I'm
> seeing don't reflect that (they are just the start and the end point
> of the full read).
>   

That sounds similar to how phrap/consed handle "chimeric" reads. But my 
experience is that phrap is pretty parsimonious with the number of 
chimeric reads it will allow.  (That isn't entirely fair to Newbler -- I've 
never been able to get phrap to consistently assemble ESTs. Phrap seems 
tuned to assemble BAC shotgun reads. ESTs seem to drive it a little 
crazy. It will create contigs from a set of reads that have essentially 
no similarity to each other, nor to the consensus sequence phrap creates 
for them.)

-- 
Phillip