[Biojava-l] Languages

Thu, 18 Oct 2001 21:57:16 +0200

Hi all of you,

I think I have now found something that is really missing in biojava: the concept of
a language. That is why I have always had trouble trying to define a cytogenetic
locus alphabet in biojava.
Since cytogenetic loci are sequences of DNA, they are words of the language DNA^*
(that is, {empty word} u DNA u (DNAxDNA) u (DNAxDNAxDNA) u ... ad infinitum, as I
remember).
To be precise, they are elements of a subset of DNA^*, and that subset is a
particular finite language: the cytogenetic locus language. Of course, many other
languages could be carved out of DNA^*, such as a language containing only words
that have stop codons in them, or whatever.
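
To make the idea concrete, here is a minimal Java sketch of a language as a
membership test on words of DNA^*. Everything in it is hypothetical and not part of
the biojava API: the class name DnaLanguageSketch, the use of plain strings over
"ACGT" in place of real symbols, and the choice to count a stop codon (TAA, TAG,
TGA) found anywhere in the word, regardless of reading frame.

```java
import java.util.function.Predicate;

// Hypothetical sketch, not part of the biojava API.
final class DnaLanguageSketch {
    // The DNA alphabet as plain characters (an assumption of this sketch).
    static final String DNA = "ACGT";

    // A word belongs to DNA^* if every symbol is in the alphabet.
    // The empty word passes, since DNA^* contains it.
    static boolean isDnaWord(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (DNA.indexOf(word.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    // A language over DNA is a subset of DNA^*, given here as a membership test:
    // the word must be in DNA^* and must also satisfy the extra rule.
    static Predicate<String> language(Predicate<String> rule) {
        return word -> isDnaWord(word) && rule.test(word);
    }

    // Words containing TAA, TAG or TGA anywhere, in any reading frame.
    static final Predicate<String> HAS_STOP_CODON = language(
        w -> w.contains("TAA") || w.contains("TAG") || w.contains("TGA"));

    public static void main(String[] args) {
        System.out.println(HAS_STOP_CODON.test("ATGTAA"));  // true
        System.out.println(HAS_STOP_CODON.test("ATGCCC"));  // false
        System.out.println(HAS_STOP_CODON.test("ATGNTAA")); // false, N is not in DNA
    }
}
```

The point of the design is that the language is never listed out; it only answers
"is this word in the set?", which works the same for finite and infinite languages.
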
That we do not know what sequence the cytogenetic loci have is another problem.
At least each has a dedicated name that is totally independent of its sequence.
Alphabets and languages have something in common:
- They are sets, so it might be possible to check whether 'things' are contained
in them or not.
- They can be finite or infinite.
- If infinite, they might at least be enumerable, recursively enumerable or
whatever.
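
The three points above can be sketched in Java as follows. This is only an
illustration under my own assumptions, not biojava code: the class name
EnumerationSketch, the arbitrary example words in FINITE, and the helper nthWord
are all made up. A finite language can list its words in a set, while the infinite
language DNA^* cannot be listed but can still be enumerated: nthWord gives every
word a finite index by counting through the words shortest first.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not part of the biojava API.
final class EnumerationSketch {
    static final char[] DNA = {'A', 'C', 'G', 'T'};

    // A finite language can simply list its words and check membership directly.
    // These three words are arbitrary examples, not real loci.
    static final Set<String> FINITE =
        new HashSet<>(Arrays.asList("AC", "GT", "TTA"));

    // Enumerates DNA^* by length, then alphabetically within each length:
    // index 0 is the empty word, 1..4 are A, C, G, T, 5 is AA, and so on.
    // Every word is reached at some finite index, so DNA^* is enumerable.
    static String nthWord(long n) {
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            n--;                            // shift to a 0-based digit
            sb.append(DNA[(int) (n % 4)]);  // last symbol of the word comes first
            n /= 4;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(FINITE.contains("GT")); // true
        for (long i = 0; i < 7; i++) {
            System.out.println(i + ": \"" + nthWord(i) + "\"");
        }
    }
}
```
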
It's all theoretical computer science. Not my profession at all.
Maybe it is the profession of some of the people here who like to code and say they
have only minor biological experience.

Ah, and maybe I am wrong.

Regards,

Armin