More than one output ... detail

Peter Rice pmr at ebi.ac.uk
Mon Mar 24 12:21:50 UTC 2003


Some detailed proposals (and questions) on the text, feature and 
sequence outputs (graphics to follow).

Comments please...

The applications with multiple output files (now or in the near future) are:

1. checktrans: outfile, outseq, featout

We can replace the output with a report file.
featout is obsolete (a report file can write features).
If we add "-rformat fasta" we can write the sequences in FASTA format 
for any report (with annotation from the report features).

Does anyone really need all 3 outputs? (or more than one of them from 
one run?)

2. cpgplot: outfile, featout

outfile could become a report with the outfile notes in the header, and 
a table output (gff is a -rformat option to write the featout file)

3. cpgreport: outfile, featout

outfile could become a report in rformat table ... see cpgplot.

4. diffseq: report featout featout

diffseq reports differences between 2 sequences, as a special report.
It also writes the differences to feature files in GFF format - we can 
make this a separate -rformat with both sequences annotated in one GFF file.

Does anyone need the separate featout files from diffseq?
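
To make the single-file idea concrete, here is a rough Python sketch of what "both sequences annotated in one GFF file" could look like. It is only an illustration, not diffseq code: the function names, the "conflict" feature type and the note attribute are my assumptions, and the only thing taken as given is the usual 9-column, tab-separated GFF layout.

```python
def gff_line(seqid, start, end, note, source="diffseq", ftype="conflict"):
    """Format one 9-column, tab-separated GFF feature line.

    The source and feature type defaults are hypothetical placeholders.
    """
    fields = [seqid, source, ftype, str(start), str(end),
              ".", "+", ".", f'note "{note}"']
    return "\t".join(fields)


def merged_gff(differences):
    """Write the differences for both sequences into one GFF text.

    differences: iterable of (seqid, start, end, note) tuples, where seqid
    names whichever of the two input sequences the feature belongs to.
    """
    lines = ["##gff-version 2"]
    # Sorting groups the features by sequence name, then by start position.
    for seqid, start, end, note in sorted(differences):
        lines.append(gff_line(seqid, start, end, note))
    return "\n".join(lines) + "\n"
```

The first column carries the sequence name, so a parser can still split the merged file back into per-sequence feature sets if it needs to.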

5. einverted (and palindrome): outfile (report, align)

The output file should be an alignment showing the inverted repeat(s). 
This is complicated by being a pairwise alignment of one sequence with 
itself.

Can this be an alignment format, to avoid the need for a separate report 
file? If so, will the alignment routines know there is only one sequence 
so GFF output can be merged?

Palindrome has the same problems.

6. emma: seqoutset outfile

emma produces a sequence file with aligned sequences (should this be an 
alignment file instead?) and a text file which is a copy of clustalw's 
dendrogram output. Should the dendrogram be a special output file type? 
Should emma be simplified to remove the dendrogram option, and make a 
separate application to generate it?

Does anyone use the .dnd output file from emma?

Should we rewrite the emma interface to make it a lot simpler?

Note: emma uses a "string" ACD type for the old dendrogram (-dendfile) 
input filename, and for other optional input files. This should be 
changed to use infile ... to help wrappers, and to validate the filenames.

7. equicktandem: report, outfile

For those who parsed the old format, equicktandem still produces a 
second outfile. Does anyone still use it? Can we flag it as obsolete so 
interfaces can safely ignore it? (it has nullok set)

8. est2genome: outfile (report, align)

est2genome needs to be converted to produce report and alignment output. 
The alignment output is optional.

Many est2genome users depend on the old text output, so this may need to 
be preserved - but preferably as an obsolete output (see etandem) or a 
renamed "oldest2genome" application that the rest of us can ignore.

I would prefer to have 2 outputs (report and align) with the align 
optional. The align ACD definition can set "nullok:Y" and depend on the
-align option. This also needs a new alignment format (est2genome 
alignments are not simple :-)

9. etandem: report, outfile

For those who parsed the old format, etandem still produces a second 
outfile. Does anyone still use it? Can we flag it as obsolete so 
interfaces can safely ignore it? (it has nullok set).

10. megamerger: outfile, seqout

I guess we need both outputs. Can the outfile be a report in the style 
of diffseq? It reports what happens to each mismatch region.

Or ... can the output be a sequence with features, and use a report 
format as the default feature format? It may mean "merging" some featout 
and report qualifiers and attributes.

Any output sequence from EMBOSS can have a feature table. We usually do 
not define output feature tables (feature: "Y") for output sequence 
types in ACD. They are not shown in the "-help" output.

11. merger: align seqout

Do we need both outputs? Maybe we do. We could make a 
"-aformat=consensus" alignment option to get the sequence.

Or we could make one or both outputs optional (with the nullok attribute 
in ACD) so wrappers can turn them off.

12. notseq: seqoutall seqoutall

Writes the sequences remaining, and possibly the sequences excluded.

Can we have a simple switch that writes one or the other set, to a 
single output file?
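
As a sketch of the switch I have in mind (Python, purely illustrative): one boolean decides which set goes to the single output. The function and parameter names are hypothetical, and the exact-name matching is a simplification of whatever matching notseq really does.

```python
def select_sequences(records, excluded_names, write_excluded=False):
    """Return the one set of sequences to write to the single output.

    records: iterable of (name, sequence) pairs.
    excluded_names: names the user asked to exclude (exact match only,
    which is an assumption made for this sketch).
    write_excluded: False writes the sequences remaining, True writes
    the sequences excluded.
    """
    excluded = set(excluded_names)
    return [(name, seq) for name, seq in records
            if (name in excluded) == write_excluded]
```

A wrapper then only has to deal with one output file and one extra boolean option.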

13. sirna: report, seqoutall

Can the seqoutall output be a report format? Maybe not, as the sequence 
reported is not the same as the original sequence.

Can we use a sequence with a feature report? (see megamerger)

14. sixpack: outfile, seqoutall

Tricky, because both files are used by some interfaces (SRS for example 
- although it cheats by merging them)

The output file is like "remap -translation".

It does not work as a report format - too many command line options to 
change the appearance of the translation.

I think we must keep the existing 2 output files in this case.

15. supermatcher: align, outfile

The outfile is only an error report. Can we simply use standard error 
and let it be redirected? Or do we really need it? There is only a "no 
start point" message written to it. Standard error, and a verbose option 
for this message would probably be good enough.
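
Roughly what I mean, as a Python sketch rather than anything from the supermatcher source: the message goes to standard error, and only when a verbose flag is set. The function name, the flag and the message wording are assumptions.

```python
import sys


def warn_no_start(seq_name, verbose=False, stream=None):
    """Report a "no start point" condition on standard error.

    Nothing is written unless verbose is set. Returns True if a
    message was written, False otherwise.
    """
    if not verbose:
        return False
    # Resolve the default late so a redirected sys.stderr is respected.
    out = sys.stderr if stream is None else stream
    print(f"No start point found for {seq_name}", file=out)
    return True
```

The user can still capture the message by redirecting standard error, and interfaces have one less output file to handle.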

16. vectorstrip: outfile, seqoutall

Would be better to have a report file. Can this have the sequence output 
as an extra format or file?

Not easy to match the report file to the seqoutall (the report file 
shows sequences excluded, the seqoutall file shows sequences that 
remain). Should be possible to tag the features in the report to make 
this work.

Can we use standard error as the output file, with a verbose option to 
report vectors (see supermatcher)?

17. wordmatch: align featout featout

Do we need the featout files? They report the matched regions between 
the two input sequences. This could be simply a new alignment format (a 
GFF file with each of the aligned sequences represented).

regards,

Peter










