New program: makeseq

Henrikki Almusa henrikki.almusa at
Fri Sep 24 07:23:33 UTC 2004


I have written a new program for EMBOSS. The code is written against 
emboss-2.9.0 and the program is called 'makeseq'. It creates random sequences, 
but since the biological world isn't quite that random, it can use either 
pepstats output (for proteins) or cusp output (for nucleotides) to create a 
distribution. This should give users the ability to create random sequences 
biased according to their own codon (triplet) or amino acid distributions. 
The program can also insert a given sequence (insert) into the created 
sequence. However, I've run into a few problems where I need help.

1. Acd handling
I've tried to make the program ask for some values only depending on other 
selections. The sequence type should be asked for if there is no distribution 
file, and the start point of the insertion should be asked for if an insert 
has been given. I can't make it query these properly.

2. Segfaults
The program segfaults when asked to make a nucleotide sequence with a given 
insert. This is caused by the type check on the insert sequence. The stack 
trace is:

#0  0x40103531 in cvt_s () from /work/hena/emboss-2.9.0/lib/
#1  0x4010487a in ajFmtVfmt () from /work/hena/emboss-2.9.0/lib/
#2  0x4010445c in ajFmtVfmtStrCL ()
   from /work/hena/emboss-2.9.0/lib/
#3  0x40104367 in ajFmtPrintS () from /work/hena/emboss-2.9.0/lib/
#4  0x40145998 in seqTypeCharDnaGap ()
   from /work/hena/emboss-2.9.0/lib/
#5  0x40144e70 in ajSeqTypeDnaS ()
   from /work/hena/emboss-2.9.0/lib/
#6  0x08049234 in main ()
#7  0x40466a67 in __libc_start_main () from /lib/i686/

Protein typechecking works ok.

3. Uniformity
This problem appears when making purely random sequences. I tried to use these 
lines from 'ajax/seqtype.c'

char seqCharProtPure[]  = "ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy";
char seqCharNucPure[]   = "ACGTUacgtu";

with the following additions to the file

int  seqCharProtPureLength = 40;
int  seqCharNucPureLength = 10;

This did not work. Therefore, I just copied them into the program and it 
worked fine. However, I don't think this is the proper way to do it, since the 
program then doesn't use EMBOSS's own settings for which characters are valid 
and which are not. Is there a more generic way, so that if EMBOSS changes 
these definitions, the changes apply to this program as well?

Any help is much appreciated. I would like to submit the program once these 
things are fixed. 'makeseq.c' and 'makeseq.acd' are attached. Basic help on 
creating a help page for makeseq would also be appreciated.

Henrikki Almusa
-------------- next part --------------
application: makeseq [
  documentation: "Creates random sequences"
  groups: "Edit"
]

section: input [
  information: "Input section"
  type: "page"
]

  infile: data [
    information: "Distribution file"
    help: "This file should be pepstats output file to create protein
           sequences or cusp output to create nucleotide sequence. Nucleotide
           sequences will be created as triplets with end trimmed to be
           correct length."
    additional: "Y"
    nullok: "Y"
  ]

endsection: input

section: required [
  information: "Required section"
  type: "page"
]

  integer: amount  [
    standard: "Y"
    default: "100"
    minimum: "1"
    information: "Number of sequences"
  ]

  integer: length  [
    standard: "Y"
    default: "100"
    minimum: "1"
    information: "Length of single sequence"
  ]

endsection: required

section: advanced [
  information: "Advanced section"
  type: "page"
]

  # this should only be queried if there is no data file
  boolean: protein  [
    standard: "@(!$(data) > 0 ? Y : N)"
    default: "N"
    additional: "Y"
    information: "Make protein sequences"
  ]

  string: insert  [
    information: "Inserted string"
    help: "String that is inserted into sequence"
    additional: "Y"
    nullok: "Y"
    knowntype: "sequence"
  ]

  # this isn't always queried even when an insert is given
  integer: start  [
    standard: "@($(insert) ? Y : N)"
    information: "Start point of inserted sequence"
    minimum: "1"
    default: "1"
    # maximum: "@($(length) - @($(insert) ? $(insert.length)-1 : 0))"
  ]

endsection: advanced

section: output [
  information: "Output section"
  type: "page"
]

  seqoutall: outseq  [
    parameter: "Y"
    type: "any"
    name: "makeseq"
  ]

endsection: output
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makeseq.c
Type: text/x-csrc
Size: 8999 bytes

More information about the emboss-dev mailing list