[Bioperl-l] Help needed urgently

Phillip San Miguel pmiguel at purdue.edu
Wed Nov 15 17:00:50 UTC 2006


Sayali wrote:
> I wish to fetch consensus sequence and the names of the trace (chromat)
> files used in the assembly from the .ace file
>
> For this purpose, I am using Bio::Assembly::IO. But I am unable to find the
> appropriate methods which would enable me to fetch this information.
>
>  
>
> Note: 
>
> The BS line (base segment) in the .ace file indicates which read phrap has
> chosen to be the consensus at a particular position.
>
> For example: 
>
> BS 1 515 K26-572c gives 
>
> BS <padded start consensus position> <padded end consensus position> <read
> name> respectively.
>
>       
>
> How do I retrieve this information contig wise?
>
> Kindly help.
>
> Regards,
>
> Sayali D Salodkar
>   
Hi Sayali,
    I don't think there is a pre-rolled bioperl method to do what you 
ask. (But it seems like one is under construction?)

    You can extract the contig sequences from a phrap .ace file with the 
following plain perl code:

my $print_it    =   0;
while (<>){
    $print_it =     /^CO / ?   1
                 :  /^$/   ?   0
                 :             $print_it
                 ;
    s/^CO />/;
    s/\*//g;     #removes "pads"--optional
    print if $print_it;
}


    To find the reads that went into each contig, you do *not* want the 
BS tagged records. My understanding is that BS is just what consed uses 
to populate its consensus line from the ace file.
    The simplest way is:

egrep '^CO|AF' acefilename

if you are on a unix system. Or with perl

while (<>) {
    print if (/^CO|AF/);
}

-- 
Phillip
Purdue Genomics Core Facility
   



More information about the Bioperl-l mailing list