[Bioperl-l] Help needed urgently
Phillip San Miguel
pmiguel at purdue.edu
Wed Nov 15 17:00:50 UTC 2006
Sayali wrote:
> I wish to fetch consensus sequence and the names of the trace (chromat)
> files used in the assembly from the .ace file
>
> For this purpose, I am using Bio::Assembly::IO. But I am unable to find the
> appropriate methods which would enable me to fetch this information.
>
>
>
> Note:
>
> The BS line (base segment) in the .ace file indicates which read phrap has
> chosen to be the consensus at a particular position.
>
> For example:
>
> BS 1 515 K26-572c gives
>
> BS <padded start consensus position> <padded end consensus position> <read
> name> respectively.
>
>
>
> How do I retrieve this information contig wise?
>
> Kindly help.
>
> Regards,
>
> Sayali D Salodkar
>
Hi Sayali,
I don't think there is a pre-rolled bioperl method to do what you
ask. (But it seems like one is under construction?)
You can extract the contig sequences from a phrap .ace file with the
following plain perl code:
my $print_it = 0;
while (<>){
$print_it = /^CO / ? 1
: /^$/ ? 0
: $print_it
;
s/^CO />/;
s/\*//g; #removes "pads"--optional
print if $print_it;
}
To find the reads that went into each contig, you do *not* want the
BS tagged records. My understanding is that BS is just what consed uses
to populate its consensus line from the ace file.
The simplest way is:
egrep '^CO|AF' acefilename
if you are on a unix system. Or with perl
while (<>) {
print if (/^CO|AF/);
}
--
Phillip
Purdue Genomics Core Facility
More information about the Bioperl-l
mailing list