[Bioperl-l] Extracting matching subsequence from pairwise alignment
WoA
witch.of.agnessi at gmail.com
Wed May 8 19:24:53 UTC 2013
Hello All,
I've a pairwise global alignemnet of two DNA sequences generated by the
program NEEDLE of EMBOSS package. I wish to extract the sub-sequence that
matches/aligns to a given region of the other sequence. In this alignment
<http://pastebin.com/wrhEDzJU> (Pastebin Link) the given region (actually
the CDS) falls between base number 24:485 in the original sequence with ID
'XM_001005073.' I wish to extract the sub-sequence in the sequence ID
'Homolog' that aligns with that 24:485 region of the other sequence.
I'm using Bioperl to parse the alignment. I find out the the alignment
column numbers corresponding to 24:485 region in the particular sequence,
using 'column_from_residue_number'. Then I extract the sub-sequence from the
'aligned' other sequence(containing gaps) using the corresponding column
numbers. Finally I remove the gap characters.
Am I doing this thing correctly and are there any pitfalls ? Is there any
better way to do it by (Bio)Perl/Python? The code goes here:
use strict;
use warnings;
use Bio::AlignIO;
# read in an alignment generated by the EMBOSS program Needle
my $in = new Bio::AlignIO(-format => 'emboss',
-file => 'test_needle.aln');
while( my $aln = $in->next_aln ) {
#Seqnames: 'XM_001005073.'(CDS:24-485),'Homolog'
my ($cds_start,$cds_end)=(24,485);#
my $col_cdsstart = $aln->column_from_residue_number(
'XM_001005073.', $cds_start);
my $col_cdsend= $aln->column_from_residue_number( 'XM_001005073.',
$cds_end);
foreach my $seq ($aln->each_seq) {
if($seq->id() eq 'Homolog'){
my
$homolog_cds=$seq->subseq($col_cdsstart,$col_cdsend);
$homolog_cds=~s/\-//g;
print $homolog_cds,"\n";
}
}
}
--
View this message in context: http://bioperl.996286.n3.nabble.com/Extracting-matching-subsequence-from-pairwise-alignment-tp16935.html
Sent from the Bioperl-L mailing list archive at Nabble.com.
More information about the Bioperl-l
mailing list