[Bioperl-l] finding position of every instance of a pattern in a sequence

George Hartzell hartzell at kestrel.alerce.com
Sat Sep 3 19:55:36 EDT 2005


If the 'contig boundary tags' are something that you can match
exactly, you can probably touch this up to do what you want. $seq and
$oligo are simple sequence strings (in this case I'm looking for exact
matches to a short oligo sequence), and you end up with an array of
GFF strings, one per generic seqfeature.


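    use Bio::SeqFeature::Generic;

    # $seq and $oligo are plain strings; $name labels the oligo and
    # $seqobj is the sequence object that $seq was taken from.
    my $pos     = 0;     # index() offsets are 0-based
    my $results = [];    # collects one GFF line per hit
    while (($pos = index($seq, $oligo, $pos)) != -1) {
      my $feat = Bio::SeqFeature::Generic->new(
          -start        => $pos + 1,              # features are 1-based
          -end          => $pos + length($oligo),
          -strand       => +1,
          -primary      => 'hit',
          -source_tag   => 'oligo_search',
          -display_name => $name,
          -seq_id       => $seqobj->id(),
          -tag          => {
              name      => $name,
              oligo_seq => $oligo,
          },
      );
      push @$results, $feat->gff_string;
      $pos++;            # step by one so overlapping hits are found too
    }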
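
If all you need is the list of locations, the same index() loop works
without any BioPerl objects. Below is a minimal sketch, assuming the
tag can be matched exactly; the sub name find_all_positions and the
example sequence are made up for illustration.

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) for every exact
# occurrence of $tag in $seq, including overlapping occurrences.
sub find_all_positions {
    my ($seq, $tag) = @_;
    my @hits;
    return @hits if !length $tag;    # an empty tag would loop forever
    my $pos = 0;
    while (($pos = index($seq, $tag, $pos)) != -1) {
        push @hits, [ $pos + 1, $pos + length($tag) ];
        $pos++;    # step by one so overlapping hits are found too
    }
    return @hits;
}

# Hypothetical example: 'NNNN' stands in for a contig boundary tag.
my @hits = find_all_positions('ACGTNNNNACGTNNNNAC', 'NNNN');
print join(', ', map { "$_->[0]..$_->[1]" } @hits), "\n";
# prints: 5..8, 13..16
```

If overlapping hits make no sense for your tags, advance $pos by
length($tag) instead of by one.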

g.


Marc Logghe writes:
 > Hi Andrew,
 > One way you could do it is to use the external EMBOSS application
 > fuzznuc or dreg. There is an API available in bioperl-run
 > (Bio::Factory::EMBOSS). Nice thing is that you can give the EMBOSS
 > option -rformat2 gff so that your output is in GFF. The latter can
 > easily be turned into a feature object:
 > my $tag = Bio::SeqFeature::Generic->new(-gff_string => $gff); 
 > 
 > HTH,
 > Marc
 > 
 > > -----Original Message-----
 > > From: bioperl-l-bounces at portal.open-bio.org 
 > > [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of 
 > > Andrew Stewart
 > > Sent: Friday, September 02, 2005 10:06 PM
 > > To: bioperl-l at portal.open-bio.org
 > > Subject: [Bioperl-l] finding position of every instance of a 
 > > pattern in a sequence
 > > 
 > > I have a problem I am trying to solve.  I know I can no doubt 
 > > rig up some regular perl code to solve it, but I'm wondering 
 > > if there is some BioPerl module that might make the task much simpler.
 > > 
 > > I have a single sequence composed of a concatenation of 
 > > several contigs, with special 'contig boundary tags' marking 
 > > the transition between every pair of neighboring contigs. 
 > > 
 > > I wrote a script that reads in the sequence, as well as 
 > > glimmer output from the sequence in order to create a series 
 > > of features (for output into a genbank file).  Because any of 
 > > these features spanning across these contig boundaries 
 > > probably isn't real, I also want to create a miscellaneous 
 > > feature wherever there is a 'contig boundary tag'.
 > > 
 > > Basically what I need is a function that will search the 
 > > entire sequence for my tag sequence, and return a list of the 
 > > locations for every instance of it found in the sequence. 
 > > 
 > > Can anyone direct me to a module that handles this sort of 
 > > thing, or do I need to rig it up outside of bioperl?
 > > 
 > > 
 > > Thanks,
 > > -Andrew Stewart
 > > _______________________________________________
 > > Bioperl-l mailing list
 > > Bioperl-l at portal.open-bio.org
 > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
 > > 