needle -filter
David Mathog
mathog at mendel.bio.caltech.edu
Mon Jun 9 18:55:25 UTC 2003
> Peter Rice wrote:
> > simon andrews (BI) wrote:
> > In the manual for needle it suggests that it too can accept -filter
> > as a qualifier, but I can't get it to work.
> >
> > cat seq1.txt seq2.txt | needle -filter -sformat1 fasta -sformat2 fasta
> >
Hmm. So presumably the two sequences are coming out of some other
program together in one stream, and for some reason it's inconvenient
to write them to files. OK.
> You are trying to read 2 inputs from stdin. needle will accept one
> sequence from stdin and another from "somewhere else".
>
> But you can do this:
>
> needle "cat seq1.txt|" "cat seq2.txt|" -sformat1 fasta -sformat2 fasta
That's got to be one of the ugliest syntaxes for reading
in two files I've ever seen! Plus I don't understand how
it differs from:
needle seq1.txt seq2.txt
It might be possible on some platforms to come up with a
"firstfasta" filter program that would emit just the
first FASTA entry from the stream. It would have to read its input
character by character and be able to push the ">" of the
second entry back into the input stream, and I don't
think that's guaranteed to work everywhere. It would probably
work on Unix, though, so you could perhaps do something
like this:
needle "firstfasta|" "firstfasta|"
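One way to read the "firstfasta" idea, sketched in Python (the names read_first_fasta and main are hypothetical, not part of EMBOSS): read unbuffered, one byte at a time, and when the ">" of the second entry turns up at the start of a line, move the file offset back one byte with os.lseek. That is a real push-back only when standard input is a seekable regular file (e.g. `< twoseqs.fasta`), because child processes started from the same redirected stdin share one file offset. On a pipe, os.lseek fails with ESPIPE, which is exactly the portability problem mentioned above. The sketch also assumes needle does not read stdin itself and starts the second command only after it has finished reading the first.

```python
import os
import sys


def read_first_fasta(fd: int) -> bytes:
    """Hypothetical 'firstfasta': return the first FASTA entry readable from fd.

    Reads one byte at a time (os.read bypasses Python's buffering) so nothing
    past the current entry is consumed. When the '>' opening the next entry is
    seen at the start of a line, the offset is moved back one byte so the next
    reader starts at that '>'. This works only for seekable input (a regular
    file); on a pipe os.lseek raises OSError (ESPIPE).
    """
    record = bytearray()
    at_line_start = True
    seen_header = False
    while True:
        b = os.read(fd, 1)
        if not b:  # end of input
            break
        if b == b">" and at_line_start:
            if seen_header:
                os.lseek(fd, -1, os.SEEK_CUR)  # "push back" the next entry's '>'
                break
            seen_header = True
        record += b
        at_line_start = b == b"\n"
    return bytes(record)


def main() -> None:
    # Filter usage: copy only the first entry of standard input to standard output.
    sys.stdout.buffer.write(read_first_fasta(sys.stdin.fileno()))
    sys.stdout.flush()
```

A wrapper script that calls main() could then be tried as `needle "firstfasta|" "firstfasta|" < twoseqs.fasta`; this has not been tested against EMBOSS.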
What Simon needs, and what EMBOSS doesn't have, is a built-in
splitter for multisequence files that would allow the individual
sequences to be directed to specific inputs of a program like
needle. Failing that, one could create two FIFOs, use an external
splitter to direct the pieces into the FIFOs, and run needle
with the FIFOs as the input file names.
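The FIFO workaround can be sketched in Python for Unix (os.mkfifo). The helpers split_fasta and run_with_fifos are hypothetical names, not EMBOSS features. Each entry gets its own FIFO and its own writer thread: opening a FIFO for writing blocks until a reader opens it, and one thread per FIFO means it does not matter in which order the program opens its inputs.

```python
import os
import subprocess
import tempfile
import threading
from typing import Callable


def split_fasta(text: str) -> list[str]:
    """Hypothetical splitter: cut multi-FASTA text into entries, each starting at a '>' line."""
    records: list[str] = []
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith(">") and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    return records


def _feed(path: str, data: str) -> None:
    # Opening a FIFO for writing blocks until the consumer opens it for reading.
    with open(path, "w") as fifo:
        fifo.write(data)


def run_with_fifos(records: list[str],
                   build_cmd: Callable[[list[str]], list[str]]) -> subprocess.CompletedProcess:
    """Hypothetical helper: send records[i] through FIFO i and run build_cmd(fifo_paths)."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"seq{i + 1}.fifo") for i in range(len(records))]
        for p in paths:
            os.mkfifo(p)  # Unix only
        # One writer thread per FIFO, so the program may open its inputs in any order.
        writers = [threading.Thread(target=_feed, args=(p, r), daemon=True)
                   for p, r in zip(paths, records)]
        for t in writers:
            t.start()
        result = subprocess.run(build_cmd(paths), capture_output=True, text=True)
        for t in writers:
            # A FIFO the program never opened leaves its writer blocked; don't wait forever.
            t.join(timeout=5)
        return result
```

For needle the call might look like `run_with_fifos(split_fasta(sys.stdin.read()), lambda p: ["needle", p[0], p[1], "-sformat1", "fasta", "-sformat2", "fasta", "-auto"])`, with -auto suppressing prompts; this is untested against a real EMBOSS install.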
Probably better to build the splitter into EMBOSS though.
Something like:
cat twosequences.fasta | program -filter -route 1:infile1,2:infile2
where infile1/infile2 are the command-line qualifier names of inputs
that are typically called "-sequence" and the like. The problem
with needle (and water) is that the sequences typically
go on the command line unadorned, like:
needle seq1 seq2
for which the syntax might be:
-route 1:1,2:2
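Since -route is only a proposal, not an existing EMBOSS qualifier, here is a hypothetical sketch of how its value might be parsed. parse_route is an invented name; it returns a list of (entry, target) pairs rather than a dict so that one entry can go to several inputs, and it handles single entry numbers only.

```python
def parse_route(spec: str) -> list[tuple[int, str]]:
    """Parse a hypothetical -route value such as '1:infile1,2:infile2' or '1:1,2:2'.

    Each item maps a 1-based stream entry number to a target input: either a
    qualifier name ('infile1') or a position ('2') for unadorned parameters,
    as in 'needle seq1 seq2'.
    """
    pairs: list[tuple[int, str]] = []
    for item in spec.split(","):
        entry, sep, target = item.partition(":")
        entry, target = entry.strip(), target.strip()
        if not sep or not entry.isdigit() or int(entry) < 1 or not target:
            raise ValueError(f"bad -route item: {item!r}")
        pairs.append((int(entry), target))
    return pairs
```

For example, `parse_route("1:1,2:2")` gives `[(1, "1"), (2, "2")]`.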
Using -route without -filter would be an error. Because a stream
can only be read forward, something like this would be awkward:
-route 1:1,1:2
This is a problem if the program loads input 1 and then input 2:
there is no way to back up the stream so that input 2 could also
load entry 1. The splitter/router could handle this, but in general
only by saving the contents of the first streamed sequence somewhere
for reuse.
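A hypothetical sketch of that saving step: the router serves the program's inputs in the order the program reads them (read_order, an assumed list of (input, entry) pairs) and pulls entries from a forward-only stream, keeping in memory only those entries some later input still needs. route_entries is an invented name.

```python
from typing import Iterable, Iterator


def route_entries(stream: Iterable[str],
                  read_order: list[tuple[str, int]]) -> Iterator[tuple[str, str]]:
    """Hypothetical router: yield (input, record) in the program's read order.

    read_order lists (input, entry) pairs, e.g. [("1", 1), ("2", 1)] for
    '-route 1:1,1:2'. The stream cannot be rewound, so any entry still wanted
    by a later input is cached instead of being re-read.
    """
    it = iter(stream)
    wanted = [entry for _, entry in read_order]
    cache: dict[int, str] = {}
    consumed = 0  # number of stream entries read so far
    for pos, (target, entry) in enumerate(read_order):
        while consumed < entry:
            record = next(it, None)
            if record is None:
                raise ValueError(f"stream ended before entry {entry}")
            consumed += 1
            if consumed in wanted[pos:]:  # keep only entries still to be delivered
                cache[consumed] = record
        yield target, cache[entry]
        if entry not in wanted[pos + 1:]:
            del cache[entry]  # no later input needs it; free the memory
```

With `read_order=[("1", 1), ("2", 1)]`, entry 1 is read once from the stream and handed to both inputs.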
For a program that compares one sequence to many, there could also be:
-route 1:1,2-END:2
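A hypothetical sketch of the entry-range part of that syntax: "2-END" becomes (2, None), where None means "until the stream runs out", and the matching entries are taken lazily, so the router never needs to know in advance how many entries the stream holds. parse_entry_range and select_entries are invented names.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def parse_entry_range(text: str) -> tuple[int, Optional[int]]:
    """'3' -> (3, 3); '2-5' -> (2, 5); '2-END' -> (2, None), None meaning end of stream."""
    first, sep, last = text.partition("-")
    start = int(first)
    if not sep:
        stop: Optional[int] = start
    elif last.strip().upper() == "END":
        stop = None
    else:
        stop = int(last)
    if start < 1 or (stop is not None and stop < start):
        raise ValueError(f"bad entry range: {text!r}")
    return start, stop


def select_entries(stream: Iterable[str], first: int, last: Optional[int]) -> Iterator[str]:
    """Lazily yield entries first..last (1-based, inclusive) from a forward-only stream."""
    return islice(stream, first - 1, last)
```

For example, `list(select_entries(iter(["a", "b", "c"]), *parse_entry_range("2-END")))` yields `["b", "c"]`.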
In theory this splitter/router shouldn't be too hard to implement.
In practice, the program's various inputs would need to read their
data in the order specified by -route, and short of reading each
program's source code there would be no way of knowing what that order is.
That suggests a companion option:
program -listroutes
which would emit the order in which the program reads its inputs,
for -route to use later.
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech