[Bioperl-l] Motifs and aligned sequences

Antoni Fernàndez-Guerra genomewalker at gmail.com
Sat Aug 19 07:24:31 UTC 2006


Thank you for your help. I've found a temporary solution to my problem. 
I'm new to Perl and BioPerl, and I got some help from the book Beginning 
Perl for Bioinformatics. Here is part of the code:

I use two arrays: @inter stores the DNA sequence without dashes, and @num 
stores each base's position in the sequence with dashes:

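my $pos = 0;    # 1-based position in the aligned (gapped) sequence
# Assumes @filename holds the aligned sequence, one character per element.
foreach my $seq (@filename) {
	++$pos;                  # count every column, dashes included
	next if $seq eq '-';     # dashes are counted but not stored
	push (@inter, $seq);     # ungapped base
	push (@num, $pos);       # its position in the aligned sequence
}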
After asking for the motif, I search for it in the joined @inter. That gives 
the beginning and the end of the motif in the ungapped sequence. With these 
positions I can index into @num and @inter to obtain the aligned positions:

my $nucleotide = join( '', @inter );
while ( $nucleotide =~ /$motif/g ) {
	my $position = pos($nucleotide);    # 1-based end of the match (ungapped)
	my $init     = $-[0] + 1;           # 1-based start; avoids the slower $&
	push( @locations, $position );
	push( @initial,   $init );
	my $position1 = $position - 1;      # convert to 0-based array indices
	my $init1     = $init - 1;
	print "Start: $inter[$init1] -- $num[$init1]\n";
	print "End:   $inter[$position1] -- $num[$position1]\n\n";
}
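
For reference, the two steps above can be combined into one self-contained
sketch. The subroutine name motif_positions_in_alignment and its variable
names are hypothetical, chosen only for this illustration, and the sketch
assumes a non-empty motif and non-overlapping matches:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [start, end] pairs (1-based, in aligned
# coordinates) for each non-overlapping match of $motif in $aligned.
sub motif_positions_in_alignment {
    my ($aligned, $motif) = @_;

    # Step 1: record each base and the aligned column it came from.
    my (@bases, @aligned_pos);
    my $column = 0;
    for my $char (split //, $aligned) {
        ++$column;
        next if $char eq '-';
        push @bases,       $char;
        push @aligned_pos, $column;
    }

    # Step 2: search the ungapped sequence and map offsets back.
    my $ungapped = join '', @bases;
    my @hits;
    while ($ungapped =~ /$motif/g) {
        # $-[0] is the 0-based start offset of the match;
        # $+[0] is the offset just past its last character.
        push @hits, [ $aligned_pos[ $-[0] ], $aligned_pos[ $+[0] - 1 ] ];
    }
    return @hits;
}

for my $hit (motif_positions_in_alignment('-G---ATT---AT--ATA', 'ATTATA')) {
    print "Start: $hit->[0]  End: $hit->[1]\n";
}
```

For the example from the original question this reports an end position of
16, the value asked for.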
      
I don't know if it is very elegant, but it seems to work.
Thanks again 
Antonio
On Saturday 19 August 2006 05:17, Seiji Kumagai wrote:
> Hi,
>
> How about this?
>
> my $str = q/-G---ATT---AT--ATA/;
> my $motif = q/A\-*T\-*T\-*A\-*T\-*A/;
> while ($str =~ /$motif/g) {
>      print $+[0], qq/\n/;
> }
>
> The above code prints the position of the last base of each motif match. It is
> only valid for *non-overlapping* motifs. For overlapping motifs, you can replace
> /$motif/ with /(?=$motif)/. However, if you do so, $+[0] no longer gives the
> position of the last base. Instead, it gives the position immediately before
> the first base of the motif. But I think you can easily find the position of
> the last base once you know that position. Finally, you can find the
> explanation in perlre.
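
A minimal sketch of the overlapping case described above, assuming the motif
is plain bases. The helper names gapped_pattern and gapped_hits are
hypothetical. Putting a capture group inside the lookahead is one way to keep
the end position available through $+[1]; it is a variation on the suggestion
in this message, not part of it:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a plain motif such as 'ATA' into a pattern
# that tolerates any number of '-' between consecutive bases.
sub gapped_pattern {
    my ($motif) = @_;
    return join '-*', map { quotemeta($_) } split //, $motif;
}

# Hypothetical helper: returns [start, end] pairs (1-based positions in
# the aligned string), allowing matches to overlap.
sub gapped_hits {
    my ($str, $motif) = @_;
    my $pattern = gapped_pattern($motif);
    my @hits;
    # The lookahead itself is zero-width, so matches may overlap, while
    # the capture group inside it still records where each hit ends.
    while ($str =~ /(?=($pattern))/g) {
        # $-[1] is the 0-based offset of the first base;
        # $+[1] is the offset just past the last base.
        push @hits, [ $-[1] + 1, $+[1] ];
    }
    return @hits;
}

for my $hit (gapped_hits('-G---ATT---AT--ATA', 'ATA')) {
    print "Start: $hit->[0]  End: $hit->[1]\n";
}
```

With the motif ATA the two hits share the base at aligned position 16, which
a plain /$motif/g loop would report only once.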
>
> On Sat, 19 Aug 2006, Antoni [iso-8859-1] Fernàndez-Guerra wrote:
> > Thanks for your answer, Brian, but I've already done that. The problem is
> > that if I remove the dashes I lose the positions in the aligned
> > sequence, e.g. s/-//g --->> GATTATATA. Then if I want to know the last
> > position of the motif, it will be 7 instead of 16. I want to keep track of
> > the positions of the dashes too, but I don't have a good idea yet. I will
> > keep working on it. Thanks again
> > Antonio
> >
> >> On Friday 18 August 2006 23:52, you wrote:
> >>> Antonio,
> >>>
> >>> First remove the dashes from the consensus, s/-//g.
> >>>
> >>> Brian O.
> >>>
> >>> On 8/18/06 2:05 PM, "Antonio" <genomewalker at gmail.com> wrote:
> >>>> Hello all,
> >>>> I am trying to solve the following problem. I've tried several
> >>>> options without success. I want to find a motif in an aligned sequence, e.g.:
> >>>> Aligned Sequence: -G---ATT---AT--ATA
> >>>> Motif: ATTATA
> >>>> I want to find the motif inside this sequence and return the last
> >>>> position of the motif in the aligned sequence, in this case 16. I
> >>>> don't know how to handle the '-' characters. Any suggestions?
> >>>> Thanks in advance!
> >>>> Antonio