More than one output ... detail
Peter Rice
pmr at ebi.ac.uk
Mon Mar 24 12:21:50 UTC 2003
Some detailed proposals (and questions) on the text, feature and
sequence outputs (graphics to follow).
Comments please...
The applications with multiple output files (now or in the near future) are:
1. checktrans: outfile, outseq, featout
We can replace the output with a report file
featout is obsolete (a report file can write features).
If we add "-rformat fasta" we can write the sequences in FASTA format
for any report (with annotation from the report features).
Does anyone really need all 3 outputs? (or more than one of them from
one run?)
2. cpgplot: outfile, featout
outfile could become a report with the outfile notes in the header, and
a table output (gff is a -rformat option to write the featout file)
3. cpgreport: outfile, featout
outfile could become a report in rformat table ... see cpgplot.
4. diffseq: report featout featout
diffseq reports differences between 2 sequences, as a special report.
It also writes the differences to feature files in GFF format - we can
make this a separate -rformat with both sequences annotated in one GFF file.
Does anyone need the separate featout files from diffseq?
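
As a rough illustration of the single-file idea (a sketch only, not diffseq code): each difference region is written once per sequence, so both sequences end up annotated in one GFF file. The function names, the source name "diffseq", the feature type "diff" and the Note attribute are all made up for illustration; only the nine tab-separated columns follow the general GFF layout.

```python
def gff_line(seqname, start, end, note, source="diffseq", ftype="diff"):
    """Build one nine-column, tab-separated GFF line.

    source and ftype are hypothetical placeholder values.
    """
    attributes = f'Note "{note}"'
    fields = [seqname, source, ftype, str(start), str(end),
              ".", "+", ".", attributes]
    return "\t".join(fields)


def merged_gff(name_a, name_b, diffs):
    """Annotate both sequences in one list of GFF lines.

    diffs holds ((start_a, end_a), (start_b, end_b)) pairs,
    1-based inclusive, one pair per difference region.
    """
    lines = ["##gff-version 2"]
    for (a_start, a_end), (b_start, b_end) in diffs:
        # Each region is written twice, once per sequence, with a
        # note pointing at the matching region in the other sequence.
        lines.append(gff_line(name_a, a_start, a_end,
                              f"differs from {name_b} {b_start}..{b_end}"))
        lines.append(gff_line(name_b, b_start, b_end,
                              f"differs from {name_a} {a_start}..{a_end}"))
    return lines


if __name__ == "__main__":
    print("\n".join(merged_gff("seqA", "seqB", [((10, 12), (10, 11))])))
```

Anyone who still wants per-sequence files could filter the merged file on the first column, which is one argument for dropping the separate featout outputs.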
5. einverted (and palindrome): outfile (report, align)
The output file should be an alignment showing the inverted repeat(s).
This is complicated by being a pairwise alignment of one sequence with
itself.
Can this be an alignment format, to avoid the need for a separate report
file? If so, will the alignment routines know there is only one sequence
so GFF output can be merged?
Palindrome has the same problems.
6. emma: seqoutset outfile
emma produces a sequence file with aligned sequences (should this be an
alignment file instead?) and a text file which is a copy of clustalw's
dendrogram output. Should the dendrogram be a special output file type?
Should emma be simplified to remove the dendrogram option, and make a
separate application to generate it?
Does anyone use the .dnd output file from emma?
Should we rewrite the emma interface to make it a lot simpler?
Note: emma uses a "string" ACD type for the old dendrogram (-dendfile)
input filename, and for other optional input files. This should be
changed to use infile ... to help wrappers, and to validate the filenames.
7. equicktandem: report, outfile
For those who parsed the old format, equicktandem still produces a
second outfile. Does anyone still use it? Can we flag it as obsolete so
interfaces can safely ignore it? (it has nullok set)
8. est2genome: outfile (report, align)
est2genome needs to be converted to produce report and alignment output.
The alignment output is optional.
Many est2genome users depend on the old text output, so this may need to
be preserved - but preferably as an obsolete output (see etandem) or a
renamed "oldest2genome" application that the rest of us can ignore.
I would prefer to have 2 outputs (report and align) with the align
optional. The align ACD definition can set "nullok:Y" and depend on the
-align option. This also needs a new alignment format (est2genome
alignments are not simple :-)
9. etandem: report, outfile
For those who parsed the old format, etandem still produces a second
outfile. Does anyone still use it? Can we flag it as obsolete so
interfaces can safely ignore it? (it has nullok set).
10. megamerger: outfile, seqout
I guess we need both outputs. Can the outfile be a report in the style
of diffseq? It reports what happens to each mismatch region.
Or ... can the output be a sequence with features, and use a report
format as the default feature format? It may mean "merging" some featout
and report qualifiers and attributes.
Any output sequence from EMBOSS can have a feature table. We usually do
not define output feature tables (feature: "Y") for output sequence
types in ACD. They are not shown in the "-help" output.
11. merger: align seqout
Do we need both outputs? Maybe we do. We could make a
"-aformat=consensus" alignment option to get the sequence.
Or we could make one or both outputs optional (with the nullok attribute
in ACD) so wrappers can turn them off.
12. notseq: seqoutall seqoutall
Writes the sequences remaining, and possibly the sequences excluded.
Can we have a simple switch that writes one or the other set, to a
single output file?
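
A minimal sketch of what such a switch could look like, assuming sequences are excluded by a wildcard match on their names. The function name and the `write_excluded` flag are hypothetical; this is not how notseq is implemented.

```python
import fnmatch


def select_sequences(records, pattern, write_excluded=False):
    """Return the one set of sequences that goes to the single output.

    records is a list of (name, sequence) tuples; pattern is a
    wildcard naming the sequences to exclude. write_excluded is a
    hypothetical switch: False gives the remaining sequences,
    True gives the excluded ones.
    """
    selected = []
    for name, sequence in records:
        is_excluded = fnmatch.fnmatchcase(name, pattern)
        # Keep the record only if it belongs to the requested set.
        if is_excluded == write_excluded:
            selected.append((name, sequence))
    return selected


if __name__ == "__main__":
    seqs = [("clone1", "ACGT"), ("vector1", "TTTT"), ("clone2", "GGCC")]
    print(select_sequences(seqs, "vector*"))
    print(select_sequences(seqs, "vector*", write_excluded=True))
```

The cost of this design is that a user who wants both sets has to run the program twice.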
13. sirna: report, seqoutall
Can the seqoutall output be a report format? Maybe not, as the sequence
reported is not the same as the original sequence.
Can we use a sequence with a feature report? (see megamerger)
14. sixpack: outfile, seqoutall
Tricky, because both files are used by some interfaces (SRS for example
- although it cheats by merging them)
The output file is like "remap -translation".
It does not work as a report format - too many command line options to
change the appearance of the translation.
I think we must keep the existing 2 output files in this case.
15. supermatcher: align, outfile
The outfile is only an error report. Can we simply use standard error
and let it be redirected? Or do we really need it? There is only a "no
start point" message written to it. Standard error, and a verbose option
for this message would probably be good enough.
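
Roughly what the standard error approach would mean, as a sketch: the message goes to standard error only when a verbose option is set, and no outfile is needed. The function name, the `verbose` flag and the exact message wording are assumptions for illustration.

```python
import sys


def report_no_start(name_a, name_b, verbose=False, stream=None):
    """Write the "no start point" diagnostic if verbose is set.

    Returns True if a message was written. stream defaults to
    standard error, so the user can redirect it as usual.
    """
    if not verbose:
        return False
    if stream is None:
        stream = sys.stderr
    # Hypothetical wording; the real message text may differ.
    print(f"No start point found for {name_a} vs {name_b}", file=stream)
    return True


if __name__ == "__main__":
    report_no_start("seqA", "seqB", verbose=True)
```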
16. vectorstrip: outfile, seqoutall
Would be better to have a report file. Can this have the sequence output
as an extra format or file?
Not easy to match the report file to the seqoutall (the report file
shows sequences excluded, the seqoutall file shows sequences that
remain). Should be possible to tag the features in the report to make
this work.
Can we use standard error as the output file, with a verbose option to
report vectors (see supermatcher)?
17. wordmatch: align featout featout
Do we need the featout files? They report the matched regions between
the two input sequences. This could be simply a new alignment format (a
GFF file with each of the aligned sequences represented).
regards,
Peter