[Bioperl-l] need help with phrap parser
Phillip San Miguel
pmiguel at purdue.edu
Fri Dec 8 14:17:02 UTC 2006
neeti somaiya wrote:
> Can anyone point me to a Phrap parser which parses the ace file to extract
> what reads make up each contig (eg. read_a and read_b make contig1; read_d
> read_e and read_z make contig2, and other information of the reads (like
> whether the read is complemented or not with respect to the contig, what
> region of the contig does each read contribute etc), basically the AF and BS
> lines of the ACE output.
>
>
neeti,
To find the reads that went into each contig, you do *not* want the BS tagged records. My understanding is that BS is just what consed uses to populate its consensus line from the ace file.
I write this because of an email sent me by David Gordon in 2001 included here
without his permission:
> > Phrap writes BS lines which
> > indicate, for each consensus position, which read phrap uses at that
> > position to become the consensus. These BS ("base segments") are
> > manipulated by Consed when there are changes to the assembly, such as
> > joins, tears, removing reads, or changing the consensus.
>
The simplest way is:
egrep '^CO|AF|RD' acefilename
if you are on a unix system. Or with perl
while (<>) {
print if (/^CO|AF|RD/);
}
But then you would need to parse the fields of interest. You get the
position/strand in the contig from AF, then you get the length of the
read from RD.
There does look like there is a part of bioperl that meant to perform
this task--including Bio::Assembly::IO::ace but it looks like it was
started, but never completed.
More information about the Bioperl-l
mailing list