[Bioperl-l] seq_word and pattern counts
Torsten Seemann
torsten.seemann at infotech.monash.edu.au
Tue Feb 28 21:45:16 UTC 2006
Nick
> Does anyone know if Bio::Tools::SeqWords
> *count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like exist in some restriction enzyme pattern definitions,
> e.g. GGnnCC*
Examination of the code
http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4
suggests that all it does is count N-mers of any set of letters,
and does so in a case-insensitive way, i.e. CAT, Cat and cat are counted as
the same N-mer.
So no, it does not handle ambiguity symbols in any special manner.
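
To make that behaviour concrete, here is a minimal sketch of case-insensitive
N-mer counting. It is written in Python purely for illustration and is not the
Bio::Tools::SeqWords code; the function name count_nmers and its overlap
parameter are hypothetical. Note that an "N" in the sequence is treated as just
another letter.

```python
from collections import Counter

def count_nmers(seq, n, overlap=True):
    """Count N-mers in seq, ignoring case (hypothetical helper)."""
    seq = seq.upper()                 # CAT, Cat and cat become the same word
    step = 1 if overlap else n        # slide by 1, or jump a whole word at a time
    return Counter(seq[i:i + n] for i in range(0, len(seq) - n + 1, step))

overlapping = count_nmers("CATcatCat", 3)
non_overlapping = count_nmers("CATcatCat", 3, overlap=False)
literal_n = count_nmers("GGnnCC", 6)  # the N's are counted as plain letters
```

With overlapping windows, "CATcatCat" gives CAT three times plus the shifted
words ATC and TCA; with non-overlapping windows it gives only CAT.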
What would you like it to do?
If an N-mer has one "N" in it, does it count towards each of the 4
unambiguous N-mers it could be?
If it has two "N"s in it, does it count towards all 16 possible
unambiguous N-mers?
And so on?
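
One possible answer to those questions, sketched in Python for illustration
only (this is not BioPerl code): expand each ambiguous N-mer into every
unambiguous N-mer it could represent, and add one count to each. The names
expand_nmer and count_expanded are hypothetical, and giving each expansion a
full count rather than a fractional one is an assumption. The table uses the
standard IUPAC nucleotide codes.

```python
from collections import Counter
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_nmer(nmer):
    """Return every unambiguous N-mer an ambiguous N-mer could stand for."""
    choices = [IUPAC[base] for base in nmer.upper()]
    return ["".join(p) for p in product(*choices)]

def count_expanded(seq, n):
    """Count overlapping N-mers, crediting each expansion with one count
    (an assumed policy; fractional credit would be another option)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - n + 1):
        for word in expand_nmer(seq[i:i + n]):
            counts[word] += 1
    return counts

one_n = expand_nmer("CAN")       # one N: 4 possibilities
two_n = expand_nmer("GGnnCC")    # two N's: 16 possibilities
```

Under this policy a single window such as CAN adds one count each to CAA, CAC,
CAG and CAT, so totals no longer sum to the number of windows in the sequence.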
--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010