Bioperl and matcher
Vilanova,David,LAUSANNE,NRC/BS
david.vilanova at rdls.nestle.com
Tue Nov 26 15:58:32 UTC 2002
Hello,
I am having trouble retrieving alignments from EMBOSS output.
The program below reads two FASTA files and runs matcher on every pair of
sequences (all against all). matcher writes its alignment in MSF format, and
I then try to parse that alignment with Bio::AlignIO.
However, I get the following exception:
Processing sequence 1..vs..3...done
------------- EXCEPTION -------------
MSG: 1 exists as an alignment line but not in the header. Not confident of what is going on!
STACK Bio::AlignIO::msf::next_aln
/usr/local/lib/perl5/site_perl/5.8.0/Bio/AlignIO/msf.pm:106
STACK toplevel Run_Emboss.pl:50
--------------------------------------
Here is the output from matcher:
!!NA_MULTIPLE_ALIGNMENT 1.0
out MSF: 5 Type: N 26/11/02 CompCheck: 2090 ..
Name: EMBOSS_001 Len: 5 Check: 1045 Weight: 1.00
Name: EMBOSS_002 Len: 5 Check: 1045 Weight: 1.00
//
1 5
EMBOSS_001 CGGCG
EMBOSS_002 CGGCG
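Judging from the exception text, msf.pm seems to read the position ruler line "1 5" below "//" as an alignment row for a sequence named "1", and that name is not among the "Name:" entries in the header. Assuming that is the cause, one possible workaround is to drop purely numeric lines from the alignment block before parsing. The sketch below is plain Perl with no BioPerl calls; strip_msf_rulers and clean_msf_file are hypothetical helper names (not part of BioPerl or EMBOSS), and the approach has not been checked against every MSF variant.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): returns a copy of MSF text
# with purely numeric position-ruler lines (e.g. "   1   5") removed from
# the alignment block that follows the '//' separator. Header lines before
# '//' are copied unchanged.
sub strip_msf_rulers {
    my ($text) = @_;
    $text = '' unless defined $text;
    my @kept;
    my $in_alignment = 0;    # set once the '//' separator has been seen
    for my $line (split /\n/, $text, -1) {
        if (!$in_alignment) {
            $in_alignment = 1 if $line =~ m{^\s*//\s*$};
            push @kept, $line;
            next;
        }
        # Assumption: a line made only of integers is a ruler, never a row.
        next if $line =~ /^\s*\d+(?:\s+\d+)*\s*$/;
        push @kept, $line;
    }
    return join "\n", @kept;
}

# Hypothetical helper: cleans the MSF file $in, writes the result to $out
# and returns $out so it can be handed straight to Bio::AlignIO.
sub clean_msf_file {
    my ($in, $out) = @_;
    open my $ifh, '<', $in or die "Cannot read $in: $!";
    my $text = do { local $/; <$ifh> };
    close $ifh;
    open my $ofh, '>', $out or die "Cannot write $out: $!";
    print {$ofh} strip_msf_rulers($text);
    close $ofh or die "Cannot close $out: $!";
    return $out;
}
```

In the script, the cleaned copy would then be parsed instead of the raw file, for example Bio::AlignIO->new(-format => 'msf', -file => clean_msf_file($outfile, "$outfile.clean")).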
###########################################################
It doesn't work with the FASTA format in my script either (see output below):
Processing sequence 1..vs..3...done
Use of uninitialized value in sprintf at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 257, <GEN2> line 4.
Use of uninitialized value in hash element at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, <GEN2> line 4.
Use of uninitialized value in hash element at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, <GEN2> line 4.
Use of uninitialized value in hash element at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 270, <GEN2> line 4.
#########################
#Script
#!/usr/bin/perl -w
use strict;
use Bio::Factory::EMBOSS;
use Bio::SeqIO;
use Bio::AlignIO;

die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV == 3;

# Read command-line arguments
my ($seqfileA, $seqfileB, $outfile) = @ARGV;

# Create the EMBOSS factory and select the program to run
my $emboss      = Bio::Factory::EMBOSS->new;
my $application = $emboss->program('matcher');

# Loop over every sequence in seqfileA
my $seqA = Bio::SeqIO->new(-file   => $seqfileA,
                           -format => 'fasta');
while (my $seqinA = $seqA->next_seq) {
    my $inseqA = "asis::" . $seqinA->seq;   # format like asis::ATGCGA (required for EMBOSS)
    my $seqidA = $seqinA->id;
    print "####$seqidA\n";

    # Re-open seqfileB for every sequence of seqfileA
    my $seqB = Bio::SeqIO->new(-file   => $seqfileB,
                               -format => 'fasta');
    while (my $seqinB = $seqB->next_seq) {
        my $inseqB = "asis::" . $seqinB->seq;   # format like asis::ATGCGA (required for EMBOSS)
        my $seqidB = $seqinB->id;
        print "Processing sequence $seqidA..vs..$seqidB...";

        # Define program parameters and run matcher
        $application->run({
            -sequencea => $inseqA,
            -sequenceb => $inseqB,
            -aformat   => 'msf',
            -outfile   => $outfile });
        print "done\n";

        # Parse the alignment that matcher just wrote
        my $alnin = Bio::AlignIO->new(-format => 'msf',
                                      -file   => $outfile);
        while (my $aln = $alnin->next_aln) {
            print $aln->no_residues, "\n";
            #print $aln->consensus_string, "\n";
        }
    }
}
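The script above re-opens seqfileB with Bio::SeqIO for every sequence in seqfileA. If seqfileB is small enough to keep in memory, an alternative is to read each file once and build all the pairs up front. The sketch below shows only that pairing step in plain Perl; all_vs_all_jobs is a hypothetical helper, and it assumes each record has already been reduced to an [id, sequence] pair (for example from $seq->id and $seq->seq).

```perl
use strict;
use warnings;

# Hypothetical helper: given two array refs of [id, sequence] records,
# return one job per A-versus-B pair (A in the outer loop, B in the inner
# loop), with both sequences wrapped in the EMBOSS "asis::" form.
sub all_vs_all_jobs {
    my ($seqs_a, $seqs_b) = @_;
    my @jobs;
    for my $a_rec (@$seqs_a) {
        for my $b_rec (@$seqs_b) {
            push @jobs, {
                id_a      => $a_rec->[0],
                id_b      => $b_rec->[0],
                sequencea => 'asis::' . $a_rec->[1],
                sequenceb => 'asis::' . $b_rec->[1],
            };
        }
    }
    return @jobs;
}
```

Each job's sequencea and sequenceb values could then be passed as -sequencea and -sequenceb to $application->run, in the same way as in the inner loop of the script.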