[Dynamite] new test xml: protein-smith-waterman.xml
Ian Holmes
ihh@fruitfly.org
Fri, 28 Jul 2000 09:58:49 -0700 (PDT)
On Fri, 28 Jul 2000, Guy Slater wrote:
>
> I've started telegraph coding in earnest, but I won't be checking
> stuff in until I have automake/autoconf working properly
> (probably monday). Automake/autoconf are a pain to get working
> well across platforms, but I think in the long run it will help
> portability and save a lot of hassle.
>
> Anyway, I've just checked in an initial attempt for another
> test telegraph model: protein-smith-waterman.xml
>
> You can see it on cvs web in the test directory:
>
> http://dev.ensembl.org/cgi-bin/cvsweb_telegraph/cvsweb.cgi/telegraph/Telegraph/test/xml/
The web view doesn't seem to be working. Is this because sshd was just
upgraded on adnah?
> I'm pretty unsure about a lot of this, so if you
> could both have a look over this it would be good.
>
> Some questions and comments:
> ---------------------------
>
> o (telegraph == Moore) && (dynamite == Mealy) ??
> (or vice-versa - I can't remember)
> The advance and param tags are with the transitions,
> but the way I've put them, they repeat on a per-state basis,
> which seems pretty pointless and verbose.
> How should this bit be done properly ?
> What are the pros and cons of Moore vs Mealy ?
> (I thought we'd discussed this but couldn't find it in the archive)
>
> o Is the way I've done gap_open and gap_extend correct ?
> Why are these vectors not scalars ? It looks really silly.
As regards both these points:
It is not meant to be hand-generated XML (except for these
low-level tests), nor is it meant to look pretty. Do not worry if it looks
wasteful. Remember that we plan to have a higher-level XML (and object
model!) at some point to abstract some of these things, in particular
'calc' expressions (i.e. scalar gap_open etc); don't try to anticipate
these too early.
Moore vs Mealy: entirely interconvertible, but since there are more
transitions than states, you have more degrees of freedom our way (with
the tags attached to the transitions).
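to make the interconvertibility concrete, here is a rough python sketch.
it is purely illustrative: the function names and the toy models are made
up for this mail and nothing like it exists in the telegraph code. labels
stand in for whatever we hang on a state (Moore) or a transition (Mealy).

```python
def moore_to_mealy(transitions, state_label):
    """Copy each state's label onto every transition that enters it."""
    return {(src, dst): state_label[dst] for (src, dst) in transitions}


def mealy_to_moore(transition_label):
    """Split each state into one copy per distinct label arriving at it.

    Returns (transitions, state_label); the new states are
    (old_state, label) pairs.
    """
    copies = {}
    for (src, dst), label in transition_label.items():
        copies.setdefault(dst, set()).add(label)
    # A state with no incoming transition gets a single unlabelled copy.
    for (src, dst) in transition_label:
        copies.setdefault(src, {None})
    transitions = [
        ((src, src_label), (dst, label))
        for (src, dst), label in transition_label.items()
        for src_label in sorted(copies[src], key=repr)
    ]
    state_label = {(s, lab): lab for s, labs in copies.items() for lab in labs}
    return transitions, state_label


# Hypothetical two-state model with labels on states (Moore).
moore_transitions = [("M", "M"), ("M", "I"), ("I", "M"), ("I", "I")]
moore_labels = {"M": "match", "I": "insert"}
mealy = moore_to_mealy(moore_transitions, moore_labels)

# Hypothetical model whose labels differ per transition (Mealy).
mealy_only = {("A", "B"): "x", ("B", "B"): "y", ("B", "A"): "x"}
split_transitions, split_labels = mealy_to_moore(mealy_only)
```

going from states to transitions just copies labels (two state labels
become four transition labels here); going the other way has to split
state B in two, which is the "more degrees of freedom" point.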
> o I still don't like the tag name "scores" being used in the
> parameter assignments. It is only marginally less vague than
> using "numbers" or "data". Alternatives ? "populate" ?
yes i agree. "populate" is good. i had thought of "calc" but i prefer
"populate"
Chris says he thinks we should also have a more XML-like list format for
within the "assign" blocks. e.g.

  <populate table="gap_extend">     [note "param" --> "table"; see below]
    <x>12</x> <x>12</x> <x>12</x> ...
  </populate>

it looks awful but we _do_ need to be able to write a DTD for this XML and
i'm not sure the comma-separated list can be DTD'd. if anyone can come up
with a better way....
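for what it's worth, a reader can accept both forms cheaply. the python
below is only a sketch under assumptions: read_populate is a made-up
name, and the comma-separated form is assumed to be plain text inside the
element. a DTD could declare the element form as a sequence of x
children, but could only describe the comma form as character data.

```python
import xml.etree.ElementTree as ET


def read_populate(xml_text):
    """Return (table name, list of floats) from one <populate> element."""
    elem = ET.fromstring(xml_text)
    items = elem.findall("x")
    if items:
        # XML-like list form: one <x> child per value.
        values = [float(x.text) for x in items]
    else:
        # Assumed comma-separated form: values as text inside the element.
        text = elem.text or ""
        values = [float(tok) for tok in text.split(",") if tok.strip()]
    return elem.get("table"), values


as_elements = '<populate table="gap_extend"><x>12</x><x>12</x><x>12</x></populate>'
as_commas = '<populate table="gap_extend">12, 12, 12</populate>'
```

both strings above should come back as the same table name and values,
so switching list formats need not touch anything downstream of the
parser.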
i think we should definitely not regard the tagnames as set in stone yet
(so don't embed them in your code (this should go without saying ;-)))
> o Similarly, I don't like the use of char and chars.
> Are we limiting alphabet sizes to 256 ?
> Maybe 'character' or 'symbol' ?
"symbol" would be good, i think.
if we want to not restrict the alphabet size we should change the
declaration syntax to e.g.

  <alphabet name="protein">
    <symbol>A</symbol>
    <symbol>R</symbol>
    <symbol>N</symbol>
    ...
  </alphabet>

other than that i think it's good.
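a sketch of how the symbol-per-element declaration might be read
(python, illustrative only; read_alphabet and the "codon" example are
invented here). since each symbol is an arbitrary string mapped to an
integer index, nothing ties the alphabet to single characters or to 256
entries.

```python
import xml.etree.ElementTree as ET


def read_alphabet(xml_text):
    """Return (alphabet name, {symbol: index}) from an <alphabet> element."""
    elem = ET.fromstring(xml_text)
    symbols = [(s.text or "").strip() for s in elem.findall("symbol")]
    if len(set(symbols)) != len(symbols):
        raise ValueError("duplicate symbol in alphabet")
    # Index symbols in declaration order.
    return elem.get("name"), {sym: i for i, sym in enumerate(symbols)}


protein = (
    '<alphabet name="protein">'
    "<symbol>A</symbol><symbol>R</symbol><symbol>N</symbol>"
    "</alphabet>"
)
# Hypothetical alphabet with multi-character symbols.
codon = '<alphabet name="codon"><symbol>ATG</symbol><symbol>TAA</symbol></alphabet>'
```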
here are some other tagname changes i think would be good:
(at top level) "index" --> "table"
(within "populate") "param" --> "table"
(within "transition") "<param name=''>" --> "<index table=''>" [see below]
("table" is a Haskell-ish name for a multidimensional array)
also i would like to change the transition block around a little. this is
the only non-cosmetic change. what i would like is to separate out the
lookback from the table indexing. this means specifying it (the lookback)
twice, but i think it will avoid confusion in the long run, especially
when we start to use polymer HMMs.
i have committed some changes into dna-edit.xml rather than spell them all
out in detail here (we can always rewind) --- please tell me what you
think
ian
>
> (disclaimer: ian - I know we discussed most of these when you
> were over here, but if they're still bothering me,
> they *must* be wrong ;)
>
> Anyway, looking forward to hearing how I *should* have written this,
>
> Guy.
>
> --
> %!PS % <------ Guy St.C. Slater ------> http://www.ebi.ac.uk/~guy/ <------
> 210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 12/d{exch moveto}
> a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 f}a/h{setlinewidth newpath dup
> g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 0 108 arc d e
> 18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage
>