Bioperl: content.pm

Gatherer, D. (Derek) D.Gatherer@organon.nhe.akzonobel.nl
Wed, 19 Jan 2000 09:41:35 +0100


I'm posting this to both lists since it contains a general and a technical
question.

General question:
What functions do people think a sequence content module should have?
Some of the obvious things that occurred to me are:
1) word frequencies, with the user specifying the word length, e.g. triplets
(perhaps the most useful) and certainly up to hexamers (i.e. dicodons, which
also have their uses), etc.
2) the above both overlapping and frame-specific
3) the above on both strands
4) some more esoteric things, e.g. Shannon entropy and redundancy, R-Y
(purine-pyrimidine) and S-W (strong-weak) content
5) if an ORF can be detected, some properties of the ORF, e.g. hydrophobicity
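
To make items 1-4 concrete, here is a minimal sketch of what such functions
might look like. It is written in Python purely for brevity and is not
existing Bioperl code; every name in it (word_counts, both_strands,
shannon_entropy, redundancy) is hypothetical. Redundancy is assumed here to
mean 1 - H/Hmax, with Hmax = 2k bits for DNA words of length k.

```python
import math
from collections import Counter

# Complement table for unambiguous DNA only (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def word_counts(seq, k, frame=None):
    """Count words of length k.

    frame=None counts overlapping words (window moves by 1);
    frame=0, 1, 2, ... counts non-overlapping words starting at that
    offset (window moves by k).
    """
    seq = seq.upper()
    start, step = (0, 1) if frame is None else (frame, k)
    return Counter(seq[i:i + k] for i in range(start, len(seq) - k + 1, step))

def both_strands(seq, k, frame=None):
    """Sum the word counts of the given strand and its reverse complement."""
    return word_counts(seq, k, frame) + word_counts(reverse_complement(seq), k, frame)

def shannon_entropy(counts):
    """Shannon entropy in bits of the observed word distribution."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def redundancy(counts, k):
    """1 - H/Hmax, assuming a 4-letter alphabet so Hmax = 2k bits."""
    return 1.0 - shannon_entropy(counts) / (2 * k)
```

For example, word_counts("ATGATG", 3) counts the overlapping triplets ATG,
TGA, GAT, ATG, while word_counts("ATGATG", 3, frame=0) counts only the two
in-frame ATG codons.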

Technical question:

What is the most efficient means of scanning a sequence object for content?
Is it:
a) treat the sequence as an array and use a sliding window of user-specified
size, or....
b) treat the sequence as a string and use regexps again with user-specified
parameters, or....
c) is there another way? (as Tony Blair would say, a Third Way)
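
To show what options (a) and (b) amount to, here is a minimal sketch, again
in Python rather than Perl, with hypothetical function names. The window
version slices the sequence at each offset (the Perl analogue would be
substr in a loop); the regexp version uses a zero-width lookahead so that
overlapping words are all captured. Both produce the same counts; which is
faster for a given sequence length and word size would need to be
benchmarked rather than assumed.

```python
import re
from collections import Counter

def count_by_window(seq, k):
    """Option (a): slide a window of size k along the sequence, one base at a time."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def count_by_regex(seq, k):
    """Option (b): regexp scan; the lookahead consumes nothing, so matches overlap."""
    pattern = re.compile(r"(?=(.{%d}))" % k)
    return Counter(m.group(1) for m in pattern.finditer(seq))

if __name__ == "__main__":
    seq = "ATGATGC"
    # Both approaches should agree on every word count.
    print(count_by_window(seq, 3) == count_by_regex(seq, 3))
```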

If this isn't a clear question, I'll post some code (to the guts list only).

Best wishes
Derek

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================