[Dynamite] new test xml: protein-smith-waterman.xml

Ian Holmes ihh@fruitfly.org
Fri, 28 Jul 2000 09:58:49 -0700 (PDT)


On Fri, 28 Jul 2000, Guy Slater wrote:

> 
> I've started telegraph coding in earnest, but I won't be checking
> stuff in until I have automake/autoconf working properly
> (probably monday).  Automake/autoconf are a pain to get working
> well across platforms, but I think in the long run it will help
> portability and save a lot of hassle.
> 
> Anyway, I've just checked in an initial attempt for another
> test telegraph model: protein-smith-waterman.xml
> 
> You can see it on cvs web in the test directory:
> 
> http://dev.ensembl.org/cgi-bin/cvsweb_telegraph/cvsweb.cgi/telegraph/Telegraph/test/xml/

The web view doesn't seem to be working. Is this because sshd was just
upgraded on adnah?

> I'm pretty unsure about a lot of this, so if you
> could both have a look over this it would be good.
> 
> Some questions and comments:
> ---------------------------
> 
>     o (telegraph == Moore) && (dynamite == Mealy) ??
>       (or vice-versa - I can't remember)
>       The advance and param tags are with the transitions,
>       but the way I've put them, they repeat on a per-state basis,
>       which seems pretty pointless and verbose.
>       How should this bit be done properly ?
>       What are the pros and cons of Moore vs Mealy ?
>       (I thought we'd discussed this but couldn't find it in the archive)
> 
>     o Is the way I've done gap_open and gap_extend correct ?
>       Why are these vectors not scalars ?  It looks really silly.

As regards both these points:
This is not meant to be hand-generated XML (except for these
low-level tests), nor is it meant to look pretty. Do not worry if it looks
wasteful. Remember that we plan to have a higher-level XML (and object
model!) at some point to abstract some of these things, in particular
'calc' expressions (e.g. a scalar gap_open). Don't try to anticipate
these too early.

Moore vs Mealy: entirely interconvertible, but since there are more
transitions than states, you have more degrees of freedom our way
(attaching parameters to transitions rather than to states).
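
To make the interconvertibility concrete, here is a minimal Python sketch of the textbook constructions (not Telegraph or Dynamite code). It assumes each state (Moore) or transition (Mealy) carries an opaque string label standing in for the advance/param data; the state names and labels are hypothetical. Going Moore to Mealy just copies each state's label onto its incoming transitions; going back means splitting a state into one copy per distinct incoming label, which is where the extra degrees of freedom show up.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional, Tuple

Transition = Tuple[Hashable, Hashable]  # (source state, destination state)


def moore_to_mealy(transitions: List[Transition],
                   state_label: Dict[Hashable, Optional[str]]) -> Dict[Transition, Optional[str]]:
    """Moore -> Mealy: put each state's label on every transition entering it."""
    return {(s, d): state_label[d] for (s, d) in transitions}


def mealy_to_moore(transitions: List[Transition],
                   trans_label: Dict[Transition, str]):
    """Mealy -> Moore: split each state into one copy per distinct incoming label."""
    incoming: Dict[Hashable, set] = {}
    for (s, d) in transitions:
        incoming.setdefault(s, set())
        incoming.setdefault(d, set()).add(trans_label[(s, d)])
    # A state with no incoming transitions keeps one unlabelled copy.
    copies: Dict[Hashable, List[Optional[str]]] = {
        q: sorted(labels) or [None] for q, labels in incoming.items()
    }
    # Every copy of a source state gets the original outgoing transition,
    # aimed at the destination copy that carries that transition's label.
    moore_trans = [((s, ls), (d, trans_label[(s, d)]))
                   for (s, d) in transitions for ls in copies[s]]
    moore_label = {(q, l): l for q, ls in copies.items() for l in ls}
    return moore_trans, moore_label


if __name__ == "__main__":
    # Hypothetical Mealy machine: entering B emits "x" from A but "y" from B.
    trans = [("A", "B"), ("B", "B")]
    labels = {("A", "B"): "x", ("B", "B"): "y"}
    m_trans, m_label = mealy_to_moore(trans, labels)
    print(len(m_label))               # 3
    # Converting back reproduces the label on every transition.
    back = moore_to_mealy(m_trans, m_label)
    print(sorted(back.values()))      # ['x', 'y', 'y']
```

In the demo, two Mealy states need three Moore states, because the two transitions into B carry different labels.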

>     o I still don't like the tag name "scores" being used in the
>       parameter assignments.  It's only marginally less vague than
>       using "numbers" or "data".  Alternatives ?  "populate" ?

Yes, I agree. "populate" is good. I had thought of "calc" but I prefer
"populate".

Chris says he thinks we should also have a more XML-like list format for
the contents of the "assign" blocks, e.g.

  <populate table="gap_extend">           [note "param" --> "table"; see below]
   <x>12</x> <x>12</x> <x>12</x> ...
  </populate>

It looks awful, but we _do_ need to be able to write a DTD for this XML, and
I'm not sure the comma-separated list can be DTD'd. If anyone can come up
with a better way...
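
As a rough illustration of the trade-off, this Python sketch (standard library xml.etree.ElementTree only) reads both forms. The comma-separated fragment is an assumed rendering of the current syntax, the function names are hypothetical, and only three entries are used. With one element per entry, a DTD can declare the structure (e.g. <!ELEMENT populate (x*)> and <!ELEMENT x (#PCDATA)>); with a comma-separated list, a DTD can only say the content is #PCDATA, and the splitting has to happen in application code.

```python
import xml.etree.ElementTree as ET
from typing import List

# Element-per-entry form, as in the example above (three entries only).
ELEMENT_FORM = """<populate table="gap_extend">
  <x>12</x> <x>12</x> <x>12</x>
</populate>"""

# Assumed rendering of the comma-separated form it would replace.
CSV_FORM = '<populate table="gap_extend">12, 12, 12</populate>'


def values_from_elements(xml_text: str) -> List[float]:
    """One <x> child per entry; a DTD can declare this as (x*)."""
    root = ET.fromstring(xml_text)
    return [float(x.text) for x in root.findall("x")]


def values_from_csv(xml_text: str) -> List[float]:
    """The whole list is one text node; a DTD can only call it #PCDATA."""
    root = ET.fromstring(xml_text)
    return [float(v) for v in (root.text or "").split(",") if v.strip()]


if __name__ == "__main__":
    print(ET.fromstring(ELEMENT_FORM).get("table"))  # gap_extend
    print(values_from_elements(ELEMENT_FORM))        # [12.0, 12.0, 12.0]
    print(values_from_csv(CSV_FORM))                 # [12.0, 12.0, 12.0]
```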

I think we should definitely not regard the tag names as set in stone yet
(so don't embed them in your code; this should go without saying ;-))

>     o Similarly, I don't like the use of char and chars.
>       Are we limiting alphabet sizes to 256 ?
>       Maybe 'character' or 'symbol' ?

"symbol" would be good, I think.
If we don't want to restrict the alphabet size, we should change the
declaration syntax to e.g.

 <alphabet name="protein">
  <symbol>A</symbol>
  <symbol>R</symbol>
  <symbol>N</symbol>
...
 </alphabet>
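
A minimal Python sketch of reading such a declaration, assuming the symbol-per-element syntax above; the function name and the 300-symbol test alphabet are hypothetical. Each symbol is mapped to a dense integer index, so neither the alphabet size nor the symbol length is tied to a single byte. The sample uses only the three symbols shown; the rest of the protein alphabet is elided in the original.

```python
import xml.etree.ElementTree as ET
from typing import Dict

ALPHABET_XML = """<alphabet name="protein">
  <symbol>A</symbol>
  <symbol>R</symbol>
  <symbol>N</symbol>
</alphabet>"""


def read_alphabet(xml_text: str) -> Dict[str, int]:
    """Map each declared symbol to an integer index, in declaration order."""
    root = ET.fromstring(xml_text)
    index: Dict[str, int] = {}
    for elem in root.findall("symbol"):
        sym = (elem.text or "").strip()
        if not sym:
            raise ValueError("empty <symbol> element")
        if sym in index:
            raise ValueError(f"duplicate symbol {sym!r}")
        index[sym] = len(index)
    return index


if __name__ == "__main__":
    print(read_alphabet(ALPHABET_XML))  # {'A': 0, 'R': 1, 'N': 2}
    # Nothing limits the size to 256, and symbols need not be single chars.
    big = "<alphabet name='big_test'>" + "".join(
        f"<symbol>s{i}</symbol>" for i in range(300)) + "</alphabet>"
    print(len(read_alphabet(big)))      # 300
```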

Other than that, I think it's good.

Here are some other tag name changes I think would be good:

(at top level) "index" --> "table"
(within "populate") "param" --> "table"
(within "transition") "<param name=''>" --> "<index table=''>"    [see below]

("table" is a Haskell-ish name for a multidimensional array)
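
If existing test files needed converting, the three renames could be applied mechanically. This is a hypothetical Python sketch using xml.etree.ElementTree: the root tag, the sample document and its attribute layout are assumptions, not the real test XML. All old and new names sit in constants at the top rather than being scattered through the code, since the tag names are not final.

```python
import xml.etree.ElementTree as ET

# Every tag/attribute name lives here, because the names are not final.
POPULATE, TRANSITION = "populate", "transition"
TOP_LEVEL_TAG = ("index", "table")    # element directly under the root
POPULATE_ATTR = ("param", "table")    # attribute on <populate>
TRANSITION_TAG = ("param", "index")   # element inside <transition>
TRANSITION_ATTR = ("name", "table")   # that element's attribute

# Hypothetical old-style document; the real test XML may differ.
OLD_DOC = """<model>
  <index name="gap_extend"/>
  <populate param="gap_extend">12</populate>
  <transition><param name="gap_extend"/></transition>
</model>"""


def rename_attr(elem: ET.Element, old: str, new: str) -> None:
    """Move attribute `old` to `new`, if present."""
    if old in elem.attrib:
        elem.set(new, elem.attrib.pop(old))


def migrate(root: ET.Element) -> ET.Element:
    """Apply the renames in place and return the root."""
    old, new = TOP_LEVEL_TAG
    # Only direct children of the root count as "top level". This runs first,
    # so the <index> elements created inside transitions below are untouched.
    for child in root:
        if child.tag == old:
            child.tag = new
    for pop in root.iter(POPULATE):
        rename_attr(pop, *POPULATE_ATTR)
    old_tag, new_tag = TRANSITION_TAG
    for trans in root.iter(TRANSITION):
        for p in trans.findall(old_tag):
            p.tag = new_tag
            rename_attr(p, *TRANSITION_ATTR)
    return root


if __name__ == "__main__":
    root = migrate(ET.fromstring(OLD_DOC))
    print(ET.tostring(root, encoding="unicode"))
```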

Also, I would like to change the transition block around a little; this is
the only non-cosmetic change. What I would like is to separate the
lookback from the table indexing. This means specifying the lookback
twice, but I think it will avoid confusion in the long run, especially
when we start to use polymer HMMs.

I have committed some changes to dna-edit.xml rather than spell them all
out in detail here (we can always rewind). Please tell me what you
think.

ian

> 
> (disclaimer: ian - I know we discussed most of these when you
>              were over here, but if they're still bothering me,
>              they *must* be wrong ;)
> 
> Anyway, looking forward to hearing how I *should* have written this,
> 
> Guy.
> 
> --
> %!PS % <------ Guy St.C. Slater ------> http://www.ebi.ac.uk/~guy/  <------
> 210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 12/d{exch moveto}
> a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 f}a/h{setlinewidth newpath dup
> g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 0 108 arc d e
> 18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage
> 
> _______________________________________________
> Dynamite mailing list  -  Dynamite@bioperl.org
> http://www.bioperl.org/mailman/listinfo/dynamite
>