New program: makeseq

Henrikki Almusa henrikki.almusa at helsinki.fi
Mon Sep 27 08:33:16 UTC 2004


On Friday 24 September 2004 18:02, Dr J.C. Ison wrote:
> > 1. Acd handling
> The cleanest way to do this is to use a "Toggle" ACD data item.
> e.g.
<snip example>

OK, this part seems to work. I added a toggle for both the data file and the 
insert info.

> You don't need to prompt the user for sequence type though, because
> "sequence" data items have attributes:
>
> sequence: sequence
> [
>   parameter: "Y"
>   type: protein
> ]
>
> sequence.begin (start residue, i.e. -sbegin value)
<snip sequence infolist>
>
> You access them in ACD by e.g. $(sequence.begin) etc.
> e.g. to ensure your insert isn't past the end of the sequence use
> maximum: $(sequence.end)

Well, I don't have a sequence in there anywhere. The problem also comes from 
the fact that the data file can determine the type as well. The type is now 
queried only if the data file is not given. And since the insert is counted 
within the sequence length, the maximum place to start the insert is length - 
insert.length. That calculation doesn't seem to work in ACD either, so I'm 
checking it inside the code.
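
A minimal sketch of such an in-code check, in plain C rather than AJAX calls. 
It assumes the start position is 1-based (as minimum: "1" in the ACD suggests) 
and that the insert overwrites residues so the total length stays the same; 
under those assumptions the largest valid start is length - insert.length + 1. 
The function names max_insert_start and place_insert are hypothetical and are 
not taken from makeseq.c.

```c
#include <stddef.h>
#include <string.h>

/* Largest valid 1-based start for an insert that must fit inside a
 * sequence of seq_len residues. Returns 0 if the insert is empty or
 * longer than the sequence (i.e. there is no valid start). */
size_t max_insert_start(size_t seq_len, size_t insert_len)
{
    if (insert_len == 0 || insert_len > seq_len)
        return 0;
    return seq_len - insert_len + 1;
}

/* Overwrite part of seq with insert, beginning at 1-based position
 * start. The sequence length is unchanged.
 * Returns 1 on success, 0 if the insert would not fit at start. */
int place_insert(char *seq, const char *insert, size_t start)
{
    size_t seq_len = strlen(seq);
    size_t insert_len = strlen(insert);
    size_t max_start = max_insert_start(seq_len, insert_len);

    if (start < 1 || start > max_start)
        return 0;

    /* Convert the 1-based position to a 0-based array offset. */
    memcpy(seq + (start - 1), insert, insert_len);
    return 1;
}
```

With a sequence of length 10 and an insert of length 3, the last start this 
check accepts is 8; a start of 9 is rejected because the insert would run past 
the end.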

> > 2. Segfaults
> If you really can't fix it get back in touch and I can run it through
> Purify.

That would be nice. I honestly can't figure this one out. I checked that the 
insert gets there (the insert's AjPStr can be printed with ajFmtPrint() before 
the test).

> > 3. Uniformity
> I'm presuming the 10 and 40 are size of your two arrays.  If you want
> to treat them as strings you have to leave space for your terminating
> NULL, so 41 and 11 would do it.  All arbitrary limits really should be
> avoided though, use e.g.
>
> AjPStr seqCharProtPure=NULL;
> seqCharProtPure=ajStrNewC("ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy");
>
> and ajStrChar to return a single character from a string at a given
> position.

I use the length to tell me the size of the char array that exists. Then, when 
creating a random sequence, I can just ask for a random number between 0 and 
length to get a character for the sequence. There is one abstraction layer 
between that char array and the final one used in the randomised selection, but 
that's because of cusp. There are no arbitrary limits as such. The above char 
arrays are used in the makeseq_default_chars function.
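
The selection described above can be sketched in plain C as follows. This is 
only an illustration, not the code in makeseq.c: the names protein_chars and 
random_sequence are hypothetical, and the standard rand() stands in for 
whatever random number routine the program really uses. The alphabet size is 
taken from the string itself with strlen(), so no array size is hard-coded, 
and the index is drawn from 0 up to but not including that size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Example alphabet: the 20 standard amino acid codes, upper case. */
static const char protein_chars[] = "ACDEFGHIKLMNPQRSTVWY";

/* Fill out[0..length-1] with characters drawn at random from alphabet
 * and NUL-terminate the result. out must have room for length + 1
 * bytes. rand() % n has a slight modulo bias, which is ignored here. */
void random_sequence(char *out, size_t length, const char *alphabet)
{
    size_t n = strlen(alphabet);   /* alphabet size, not hard-coded */
    size_t i;

    assert(n > 0);
    for (i = 0; i < length; i++)
        out[i] = alphabet[(size_t)rand() % n];   /* index in 0..n-1 */
    out[length] = '\0';
}
```

Calling random_sequence(buf, 50, protein_chars) with a 51-byte buffer yields a 
50-character string made up only of the characters in protein_chars.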

> > Is there a way to use something more generic, so that if
> > emboss changes these things, they would be applied to this program as
> > well?
>
> There might (perhaps should!) be - Alan Bleasby
> (ableasby at rfcgr.mrc.ac.uk) is the best man to ask about that.

OK. I'll post to emboss-dev about this later.

> I've attached the template I use for the DOMAINATRIX documentation, e.g.
>
> http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/domainatrix/rocon.html
> With this template, I document stuff by hand. The only external program
> I use is "acdtable" to get the ACD stuff. This is slightly different
> from the format used for EMBOSS apps though.

So there is no script to run to get the basic info from the ACD file into an 
HTML file. Then it's just the manual labour of copying and writing the HTML 
file :).

> Hope this helps and thanks for the interest
>
> Cheers
>
> Jon

Thanks for the help. I attached the new versions of the .c and .acd files.
-- 
Henrikki Almusa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makeseq.c
Type: text/x-csrc
Size: 8999 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040927/eea2d167/attachment-0001.bin>
-------------- next part --------------
application: makeseq [
  documentation: "Creates random sequences"
  groups: "Edit"
]

section: required [
  information: "Required section"
  type: "page"
]

  integer: amount  [
    standard: "Y"
    default: "100"
    minimum: "1"
    information: "Number of sequences"
  ]

  integer: length  [
    standard: "Y"
    default: "100"
    minimum: "1"
    information: "Length of single sequence"
  ]

  toggle: useinsert [
    standard: "Y"
    information: "Do you want to make an insert"
    default: "N"
  ]

  string: insert  [
    standard: "$(useinsert)"
    information: "Inserted string"
    help: "String that is inserted into sequence"
    # nullok: "Y"
    knowntype: "sequence"
  ]

  integer: start  [
    standard: "$(useinsert)"
    information: "Start point of inserted sequence"
    minimum: "1"
    default: "1"
    # maximum: "@($(length) - @($(insert) ? $(insert.length)-1 : 0))"
  ]

  toggle: usedata [
    standard: "Y"
    information: "Do you want to use distribution file"
    default: "N"
  ]

endsection: required

section: input [
  information: "Input section"
  type: "page"
]

  infile: data [
    standard: "$(usedata)"
    information: "Distribution file"
    help: "This file should be a pepstats output file to create protein
           sequences, or cusp output to create nucleotide sequences.
           Nucleotide sequences will be created as triplets, with the end
           trimmed to the correct length."
    nullok: "Y"
  ]

endsection: input


section: advanced [
  information: "Advanced section"
  type: "page"
]

  boolean: protein  [
    standard: "@($(usedata) ? N : Y)"
    default: "N"
    information: "Make protein sequences"
  ]

endsection: advanced

section: output [
  information: "Output section"
  type: "page"
]

  seqoutall: outseq  [
    parameter: "Y"
    type: "any"
    name: "makeseq"
  ]

endsection: output

