New program: makeseq
Henrikki Almusa
henrikki.almusa at helsinki.fi
Fri Sep 24 07:23:33 UTC 2004
Hello,
I have written a new program for emboss. The code is made against emboss-2.9.0
and the program is called 'makeseq'. It creates random sequences, but since
the biological world isn't quite that random, it can use either pepstats
output (for proteins) or cusp output (for nucleotides) to create a
distribution. This should give users the ability to create random sequences
biased according to their own sequence triplet or amino acid distributions.
The program also allows inserting a given sequence (insert) within the
created sequence. However, I've encountered a few problems where I need help.
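To illustrate the biased sampling described above, here is a minimal sketch in plain C. It is not taken from the attached makeseq.c: `pick_weighted` and `make_biased_seq` are hypothetical names, and the weights are assumed to have already been read from a pepstats or cusp file (the parsing is not shown).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of AJAX: return an index in [0, n) with
 * probability proportional to weights[i]. 'u' is a uniform random number
 * in [0, 1). */
static size_t pick_weighted(const double *weights, size_t n, double u)
{
    double total = 0.0;
    double acc = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        total += weights[i];
    assert(total > 0.0);

    /* Walk the cumulative sum until it passes the scaled random number. */
    for (i = 0; i < n; i++) {
        acc += weights[i];
        if (u * total < acc)
            return i;
    }
    return n - 1; /* guard against rounding when u is very close to 1 */
}

/* Fill 'out' with 'len' residues drawn from 'alphabet' according to
 * 'weights' and terminate it. 'out' must have room for len + 1 chars. */
static void make_biased_seq(char *out, size_t len, const char *alphabet,
                            const double *weights, size_t n)
{
    size_t i;

    for (i = 0; i < len; i++) {
        double u = rand() / (RAND_MAX + 1.0);
        out[i] = alphabet[pick_weighted(weights, n, u)];
    }
    out[len] = '\0';
}
```

For nucleotides the same idea would apply to codons rather than single characters: the alphabet becomes a table of 64 triplets and the result is trimmed to the requested length.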
1. Acd handling
I've tried to make the program prompt for certain values depending on other
selections. The sequence type should be asked for if there is no distribution
file, and the start point of the insertion should be asked for if an insert
has been given. I can't make it query these properly.
2. Segfaults
The program segfaults when asked to make a nucleotide sequence with a given
insert. This is caused by the insert's sequence type check. The stack trace
is:
#0 0x40103531 in cvt_s () from /work/hena/emboss-2.9.0/lib/libajax.so.0
#1 0x4010487a in ajFmtVfmt () from /work/hena/emboss-2.9.0/lib/libajax.so.0
#2 0x4010445c in ajFmtVfmtStrCL ()
from /work/hena/emboss-2.9.0/lib/libajax.so.0
#3 0x40104367 in ajFmtPrintS () from /work/hena/emboss-2.9.0/lib/libajax.so.0
#4 0x40145998 in seqTypeCharDnaGap ()
from /work/hena/emboss-2.9.0/lib/libajax.so.0
#5 0x40144e70 in ajSeqTypeDnaS ()
from /work/hena/emboss-2.9.0/lib/libajax.so.0
#6 0x08049234 in main ()
#7 0x40466a67 in __libc_start_main () from /lib/i686/libc.so.6
Protein typechecking works ok.
3. Uniformity
This problem appears when making a purely random sequence. I tried to use
these lines from 'ajax/seqtype.c':
char seqCharProtPure[] = "ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy";
char seqCharNucPure[] = "ACGTUacgtu";
with the following additions to the file:
int seqCharProtPureLength = 40;
int seqCharNucPureLength = 10;
Now, this did not work. Therefore, I just copied them into the program and
it worked fine. However, I don't think this is the proper way to do it, since
the program then doesn't use EMBOSS's own settings for what is a valid
character and what is not. Is there a way to use something more generic, so
that if emboss changes these things, the changes would apply to this program
as well?
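For reference, the copy-into-the-program workaround can at least avoid the hard-coded lengths. This is a minimal sketch in plain C, not the attached makeseq.c: `prot_pure`, `nuc_pure`, `ALPHABET_LEN` and `make_uniform_seq` are hypothetical names, and the arrays are local copies of the strings quoted above, so they would still need updating by hand if EMBOSS changed its definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local copies of the alphabets quoted from ajax/seqtype.c (assumption:
 * kept in the program itself rather than shared with the library). */
static const char prot_pure[] = "ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy";
static const char nuc_pure[]  = "ACGTUacgtu";

/* Length of a string literal array without its terminating NUL, computed
 * at compile time so it cannot drift out of step with the string. Only
 * valid for real arrays, not for pointers. */
#define ALPHABET_LEN(a) (sizeof(a) - 1)

/* Fill 'out' with 'len' characters drawn uniformly from the first 'n'
 * characters of 'alphabet'. 'out' must have room for len + 1 chars. */
static void make_uniform_seq(char *out, size_t len,
                             const char *alphabet, size_t n)
{
    size_t i;

    assert(n > 0);
    for (i = 0; i < len; i++) {
        /* rand() / (RAND_MAX + 1.0) lies in [0, 1), so the index is < n */
        size_t idx = (size_t)(rand() / (RAND_MAX + 1.0) * (double)n);
        out[i] = alphabet[idx];
    }
    out[len] = '\0';
}
```

A call such as `make_uniform_seq(buf, 100, nuc_pure, ALPHABET_LEN(nuc_pure))` would then produce a 100-character nucleotide string. Note that both alphabets contain upper- and lower-case letters, so the output is mixed case unless only the first half of each array is used.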
Any help is most appreciated. I would like to submit this once these things
are fixed. 'makeseq.c' and 'makeseq.acd' are attached. Basic help on
creating a help page for makeseq would also be appreciated.
Thanks,
--
Henrikki Almusa
-------------- next part --------------
application: makeseq [
documentation: "Creates random sequences"
groups: "Edit"
]
section: input [
information: "Input section"
type: "page"
]
infile: data [
information: "Distribution file"
help: "This file should be a pepstats output file to create protein
sequences or a cusp output file to create nucleotide sequences.
Nucleotide sequences will be created as triplets, with the end
trimmed to the correct length."
additional: "Y"
nullok: "Y"
]
endsection: input
section: required [
information: "Required section"
type: "page"
]
integer: amount [
standard: "Y"
default: "100"
minimum: "1"
information: "Number of sequences"
]
integer: length [
standard: "Y"
default: "100"
minimum: "1"
information: "Length of single sequence"
]
endsection: required
section: advanced [
information: "Advanced section"
type: "page"
]
# this should be queried if there is no data file
boolean: protein [
standard: "@(!$(data) > 0 ? Y : N)"
default: "N"
additional: "Y"
information: "Make protein sequences"
]
string: insert [
information: "Inserted string"
help: "String that is inserted into sequence"
additional: "Y"
nullok: "Y"
knowntype: "sequence"
]
# this isn't always queried even when an insert is given
integer: start [
standard: "@($(insert) ? Y : N)"
information: "Start point of inserted sequence"
minimum: "1"
default: "1"
# maximum: "@($(length) - @($(insert) ? $(insert.length)-1 : 0))"
]
endsection: advanced
section: output [
information: "Output section"
type: "page"
]
seqoutall: outseq [
parameter: "Y"
type: "any"
name: "makeseq"
]
endsection: output
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makeseq.c
Type: text/x-csrc
Size: 8999 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040924/3a8b4444/attachment-0001.bin>