[Bioperl-l] finding position of every instance of a pattern in a sequence
George Hartzell
hartzell at kestrel.alerce.com
Sat Sep 3 19:55:36 EDT 2005
If the 'contig boundary tags' are something that you can match
exactly, you can probably touch this up to do what you want. $seq and
$oligo are simple sequence strings (in this case I'm looking for exact
matches to a short oligo sequence), and you end up with an array holding
the GFF string of one generic seqfeature per hit (push $feat itself if
you'd rather keep the objects).
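use Bio::SeqFeature::Generic;

my $pos = 0;    # index() is 0-based; start searching at the beginning
while (($pos = index($seq, $oligo, $pos)) != -1) {
    my $feat = Bio::SeqFeature::Generic->new(
        -start        => $pos + 1,               # features are 1-based
        -end          => $pos + length($oligo),
        -strand       => +1,
        -primary      => 'hit',
        -source_tag   => 'oligo_search',
        -display_name => $name,
        -seq_id       => $seqobj->id(),
        -tag          => {
            name      => $name,
            oligo_seq => $oligo,
        },
    );
    push @$results, $feat->gff_string;    # push $feat instead to keep the objects
    $pos++;    # step past this hit's start so overlapping hits are found too
}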
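
If all you need is the list of locations, the same index() loop works without any BioPerl objects. Here is a minimal sketch in core Perl; find_all_positions is a name I made up for illustration (it is not a BioPerl function), and the sequence and tag are toy data. It returns 1-based [start, end] pairs and reports overlapping hits too.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return a list of 1-based
# [start, end] pairs, one per exact occurrence of $pattern in $seq.
sub find_all_positions {
    my ($seq, $pattern) = @_;
    my @hits;
    return @hits unless length $pattern;    # nothing sensible to report for an empty pattern
    my $pos = 0;                            # index() works with 0-based offsets
    while (($pos = index($seq, $pattern, $pos)) != -1) {
        push @hits, [ $pos + 1, $pos + length($pattern) ];
        $pos++;                             # advance one base so overlapping hits are found
    }
    return @hits;
}

# Toy data: 'XX' stands in for a contig boundary tag.
my @hits = find_all_positions('AAXXAAAXXA', 'XX');
printf "%d..%d\n", @$_ for @hits;
```

Each pair can be passed straight to -start and -end when building a feature. If the tags can never overlap each other, you could advance $pos by length($pattern) instead of 1.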
g.
Marc Logghe writes:
> Hi Andrew,
> One way you could do it is to use the external EMBOSS application
> fuzznuc or dreg. There is an API available in bioperl-run
> (Bio::Factory::EMBOSS). Nice thing is that you can give the EMBOSS
> option -rformat2 gff so that your output is in GFF. The latter can
> easily be turned into a feature object:
> my $tag = Bio::SeqFeature::Generic->new(-gff_string => $gff);
>
> HTH,
> Marc
>
> > -----Original Message-----
> > From: bioperl-l-bounces at portal.open-bio.org
> > [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of
> > Andrew Stewart
> > Sent: Friday, September 02, 2005 10:06 PM
> > To: bioperl-l at portal.open-bio.org
> > Subject: [Bioperl-l] finding position of every instance of a
> > pattern in a sequence
> >
> > I have a problem I am trying to solve. I know I can no doubt
> > rig up some regular perl code to solve it, but I'm wondering
> > if there is some BioPerl module that might make the task much simpler.
> >
> > I have a single sequence composed of a concatenation of
> > several contigs, with special 'contig boundary tags' marking
> > the transition between each pair of neighboring contigs.
> >
> > I wrote a script that reads in the sequence, as well as
> > glimmer output from the sequence in order to create a series
> > of features (for output into a genbank file). Because any of
> > these features spanning across these contig boundaries
> > probably isn't real, I also want to create a miscellaneous
> > feature wherever there is a 'contig boundary tag'.
> >
> > Basically what I need is a function that will search the
> > entire sequence for my tag sequence, and return a list of the
> > locations for every instance of it found in the sequence.
> >
> > Can anyone direct me to a module that handles this sort of
> > thing, or do I need to rig it up outside of bioperl?
> >
> >
> > Thanks,
> > -Andrew Stewart
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >