[Bioperl-l] Bioperl and ACE files
Wes Barris
wes.barris at csiro.au
Mon Feb 16 17:42:54 EST 2004
Jason Stajich wrote:
> As always, more code and information as to how you got here makes it
> easier for someone to answer.
I have attached the perl script that I am using.
The sample ACE file is available here:
http://www.livestockgenomics.csiro.au/junk.ace
You might run this script like this:
acetest.pl junk.ace outdir
>
> Not really sure how you are getting to the point where you have created
> Bio::LocatableSeq objects - presumably you are trying to do an assembly
> so I'll guess you got there from Bio::Assembly::IO.
>
> You may need to get help from Robson about what the format is supposed to
> support. A start of 0 is not really proper in Bioperl -
> sequences/features start at 1 in our system, so the assembly code needs to
> adjust for that. presumably those numbers are offsets not actual start
> positions so the parsing code may need some looking at.
>
> -jason
>
>
> On Mon, 16 Feb 2004, Wes Barris wrote:
>
> > Hi,
> >
> > I have an ACE file that I am trying to process with bioperl. A portion
> > of the ACE file looks like this:
> >
> > AF CB429506 U 2
> > AF CB428704 U 6
> > AF CB430643 U 1
> > AF CB431187 U 0
> > AF CB430639 U -7
> > AF CB430480 C 24
> > AF CB430055 U 10
> >
> > Notice the line in the middle that shows a starting position of '0'
> > (zero)? When bioperl tries to process this sequence, an error is
> > thrown. I have found the port of the bioperl code that throws the
> > error:
> > Bio/LocatableSeq.pm:
> > sub get_nse{
> > my ($self,$char1,$char2) = @_;
> >
> > $char1 ||= "/";
> > $char2 ||= "-";
> >
> > $self->throw("Attribute id not set") unless $self->id();
> > $self->throw("Attribute start not set") unless $self->start();<-----
> > $self->throw("Attribute end not set") unless $self->end();
> >
> > return $self->id() . $char1 . $self->start . $char2 . $self->end ;
> >
> > }
> >
> > Notice how "$self->start()" is tested. When it encounters a sequence
> > whose start is set to zero, an error is thrown.
> >
> > I don't know much about the ACE file format. Do I have a questionable
> > ACE file or is this test incomplete?
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>
--
Wes Barris
E-Mail: Wes.Barris at csiro.au
-------------- next part --------------
#!/usr/local/bin/perl -w
#
#
use strict;
use Bio::Assembly::IO;
use Bio::AlignIO;
use Bio::SeqIO;
#
my $usage = "Usage: $0 <infile.ace> <outdir>\n";
my $infile = shift or die $usage;
my $outdir = shift or die $usage;
my $prefix = 'cn';
my $ext = 'msf';
mkdir $outdir, 0755 if (! -d $outdir);
my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace');
my $assembly = $io->next_assembly; # Bio::Assembly::ScaffoldI
foreach my $contig ($assembly->all_contigs()) { # Bio::Assembly::Contig
my $contigName = $prefix.($contig->id);
#
# Write the consensus to a file.
#
my $consensusSeq = new Bio::Seq(
-seq=>$contig->get_consensus_sequence->seq,
-id=>$contigName);
my $seqout = new Bio::SeqIO(-file=>">$outdir/$contigName.fa", -format=>'fasta');
$seqout->write_seq($consensusSeq);
#
# Make the consensus the first sequence of the simple align object.
#
my $aln = new Bio::SimpleAlign();
$aln->id('alignment.msf');
$contig->get_consensus_sequence->id($contigName);
$aln->add_seq($contig->get_consensus_sequence);
#
# Loop through each sequence in the contig adding it to the new alignment.
#
foreach my $seq ($contig->each_seq) {
my $id;
if ($seq->display_id =~ /\|/) {
my @junk = split(/[\|\.]/, $seq->display_id);
$id = $junk[3];
}
else {
$id = $seq->display_id;
}
my $lseq = new Bio::LocatableSeq(
-seq=>$seq->seq,
-id=>$id,
-start=>$contig->get_seq_coord($seq)->start,
-end=>$contig->get_seq_coord($seq)->end,
);
&alignSeq($lseq,$contig->get_consensus_sequence->length);
$aln->add_seq($lseq);
}
$aln->set_displayname_flat;
my $outstream = new Bio::AlignIO(-format=>'msf', -file=>">$outdir/$contigName.$ext");
$outstream->write_aln($aln);
undef $outstream;
}
exit;
sub alignSeq {
my ($lseq, $cnlength) = @_;
#
# Clip any sequence that begins before the consensus.
#
if ($lseq->start <= 0) {
my $offset = -$lseq->start + 2;
$lseq->seq($lseq->subseq($offset,$lseq->length));
print($lseq->display_id," was clipped at the beginning by $offset\n");
}
#
# Pad each sequence so it aligns with the consensus.
#
my $before = $lseq->start - 1;
my $after = $cnlength - $lseq->end;
my $alignedSequence = '-' x $before . $lseq->seq . '-' x $after;
#
# Trim any sequence that extends beyond the consensus.
#
if (length($alignedSequence) > $cnlength) {
$alignedSequence = substr($alignedSequence, 0 ,$cnlength);
print($lseq->display_id," was clipped at the end\n");
}
$lseq->seq($alignedSequence);
return;
}
More information about the Bioperl-l
mailing list