[Bioperl-l] parsing result of CAP3 (ACE file)
Jean-Marc FRIGERIO
Frigerio at pierroton.inra.fr
Tue Sep 9 08:45:19 UTC 2008
> > -- Hi,
> >
> > Is somebody have a piece of code to parse result of CAP3 assembly program
> > which
> > format is ACE ?
> > I need to retrieve the alignment from this file.
> >
> > thank you,
> > Laurent --
> >
> >
> >
> >
> > +---------------------------------------------+
> > Laurent Manchon
> > Email: lmanchon at univ-montp2.fr
> > +---------------------------------------------+
>
> Laurent -
>
> I have modified modules that will do it as I recently ran into problems
> with the DB_FILE module in Assembly::IO. In addition, the current version
> of cap3 seems to put a contig length where a pad length is expected (based
> on the Ace format description). The modules I have will parse the ace file
> contig-by-contig rather than having the entire assembly slurped into memory
> (or a tied hash) all at once. You are welcome to them if you are
> interested and I'd like to get them in Bioperl at some point. Bascially,
> there are three files - a modified Contig.pm, ContigIO.pm, and a modified
> ace.pm (in a ContigIO directory).
>
> Josh
Hi,
Here are a 2 pieces of code running on an ace file (output of phrap is that
the same as cap3 ?)
----------------------------- 1 -----------------------------------------
my $assembly = Bio::Assembly::IO->new('-file' => $file,
'-format' => 'ace')->next_assembly;
for my $contig ($assembly->all_contigs)
{
my $ct_seq = $contig->get_consensus_sequence;
(my $ref_seq = uc $ct_seq->seq) =~ s/-//g;
my $debut = $pos - 100 > 0 ? $pos - 100 : 1;
my $fin = $pos + 100 <= length $ref_seq ?
$pos + 100 : length $ref_seq;
my $coll = $contig->get_features_collection;
my @coll = $coll->features_in_range('-start' => $debut, '-end' => $fin);
for my $tag (@coll)
{
next unless $tag->primary_tag eq 'comment';
#print "TAG: ",$tag->start,"\n";
my $tag_pos = $contig->change_coord('gapped consensus','ungapped
consensus',$tag->start);
#print "TAG POS: $tag_pos\n";
next if $pos == $tag_pos;
substr($ref_seq,$tag_pos-1,1,'N');
}
}
------------------------------------ 2 -------------------
my $assembly = Bio::Assembly::IO->new(
'-file' => $file,
'-format' => 'ace')->next_assembly;
for my $contig ($assembly->all_contigs)
{
for my $seq ($contig->each_seq)
{
my $id = $seq->id;
my $s = $seq->seq;
my ($start,$end) =
($contig->change_coord("aligned $id","ungapped consensus",
$seq->start),
$contig->change_coord("aligned $id","ungapped consensus",$seq->end));
my $dir = $seq->strand < 0 ? 'R' : 'F';
......
}
-- Jean-Marc
More information about the Bioperl-l
mailing list