[Bioperl-l] seq_word and pattern counts
Staffa, Nick (NIH/NIEHS) [C]
staffa at niehs.nih.gov
Tue Feb 28 21:46:30 UTC 2006
Yes.
N matches any of the four bases (A, C, G or T).
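
One way to get that behaviour is to translate the pattern into a regular expression and count matches. The following is a minimal sketch in Python, not BioPerl code: the names `IUPAC`, `pattern_to_regex` and `count_pattern` are made up for illustration, and the table is the standard IUPAC nucleotide ambiguity code set. It assumes that only the pattern carries ambiguity symbols; an N in the sequence itself will not match anything.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def pattern_to_regex(pattern):
    """Translate a pattern such as GGnnCC into a compiled regex (hypothetical helper)."""
    body = "".join(IUPAC[ch] for ch in pattern.upper())
    # Wrapping the body in a lookahead makes each match zero-width,
    # so overlapping occurrences are all reported.
    return re.compile("(?=" + body + ")", re.IGNORECASE)

def count_pattern(seq, pattern):
    """Count occurrences of the pattern in seq, overlapping ones included."""
    return sum(1 for _ in pattern_to_regex(pattern).finditer(seq))
```

For example, `count_pattern("ggatccGGTACC", "GGnnCC")` counts both the GGATCC and the GGTACC site, regardless of case.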
Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina
-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
Sent: Tuesday, February 28, 2006 4:45 PM
To: Staffa, Nick (NIH/NIEHS) [C]
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] seq_word and pattern counts
Nick
> Does anyone know if Bio::Tools::SeqWords
> count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like those in some restriction enzyme pattern definitions,
> e.g. GGnnCC?
Examination of the code
http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4
suggests that all it does is count N-mers over any set of letters,
and does so case-insensitively, i.e. CAT, Cat and cat are counted as
the same N-mer.
So no, it does not handle ambiguity symbols in any special manner.
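
The behaviour described above (plain, case-insensitive N-mer counting) can be sketched roughly as follows. This is a Python illustration of the idea only, not the BioPerl implementation; the function name `count_kmers` and its `overlap` flag are hypothetical.

```python
from collections import Counter

def count_kmers(seq, k, overlap=False):
    """Count words of length k in seq, ignoring case.

    With overlap=False the sequence is cut into consecutive,
    non-overlapping words; with overlap=True every start position
    is used. Any letter is accepted, so an N is just another letter.
    """
    seq = seq.upper()
    step = 1 if overlap else k
    return Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))
```

Here `count_kmers("CATCatcat", 3)` puts all three words under the single key `CAT`, and a word such as `GNC` would simply be counted under `GNC`.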
What would you like it to do?
If an N-mer has one "N" in it, does it count towards each of the 4 possible
N-mers it could be?
If it has two "N"s in it, does it count towards all 16 possible
non-ambiguous N-mers?
And so on?
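
If the answer to those questions is yes, the counting could look something like the sketch below. It is written in Python for illustration and is not part of BioPerl; `EXPANSIONS`, `expand` and `count_expanded` are hypothetical names. For brevity only N is expanded (other ambiguity codes would raise a `KeyError`), and each ambiguous word adds a full count of 1 to every N-mer it could stand for. Adding a fraction such as 1/4 or 1/16 instead would be an equally defensible choice.

```python
from collections import Counter
from itertools import product

# Which concrete bases each symbol may stand for (only N is ambiguous here).
EXPANSIONS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

def expand(word):
    """List every non-ambiguous word that the given word could represent."""
    choices = (EXPANSIONS[ch] for ch in word.upper())
    return ["".join(p) for p in product(*choices)]

def count_expanded(seq, k):
    """Count overlapping k-mers, crediting each ambiguous k-mer
    to all of the concrete k-mers it could be."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        counts.update(expand(seq[i:i + k]))
    return counts
```

A word with one N expands to 4 entries and a word with two Ns to 16, so `expand("GGNNCC")` yields 16 concrete 6-mers.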
--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010