[EMBOSS] seqret options
Derek Gatherer
d.gatherer at vir.gla.ac.uk
Wed Jun 15 10:31:33 UTC 2005
Dear EMBOSSers
I'm trying to write a pipeline that takes a set of paired, aligned homologues
from 2 species and submits them one at a time to the yn00 application from
the well-known PAML package. PAML's applications all take PHYLIP-format
input. I can easily produce this by looping over:
seqret -auto -osformat phylip infile -out outfile
However, PAML requires that the flag "I" be placed on the top line of the
PHYLIP format to indicate interleaved, e.g.:
2 663 I
c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC
CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT
rather than the standard PHYLIP format produced by seqret:
2 663
c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC
CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT
I could write a script to open each seqret output file and add this
character to the top line of each, but before I dive into this, I'd like to
know if there is any flag I can add to seqret to get the "I" added
automatically.
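For reference, the fallback script described above could be quite small. The following is only a sketch, assuming Python is available in the pipeline; the function name `add_header_flag` is made up for illustration and is not part of EMBOSS or PAML.

```python
def add_header_flag(phylip_text: str, flag: str = "I") -> str:
    """Return the alignment text with `flag` appended to its first line.

    The first line is assumed to hold the sequence count and the
    alignment length, as in the seqret output shown above.
    """
    header, newline, rest = phylip_text.partition("\n")
    fields = header.split()
    if len(fields) < 2:
        raise ValueError("first line should hold the sequence count and length")
    if flag in fields[2:]:
        return phylip_text  # flag already present, leave the text untouched
    return header.rstrip() + " " + flag + newline + rest
```

Applied to the text of each seqret output file, this turns a first line of `2 663` into `2 663 I` and leaves the sequence lines unchanged.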
Failing that, PAML reads the other, non-interleaved PHYLIP format
("sequential") by default, and that would not require any flag
insertion. seqret can also produce this (using -osformat phylip3):
1 663 YF
c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA
but then PAML won't read it because it doesn't like the YF flags inserted
by seqret!!
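The flag-stripping alternative is similarly short. This is again only a sketch, assuming Python; `strip_header_flags` is a hypothetical name, and the code assumes that everything after the first two numbers on the top line is a flag PAML does not want. It changes nothing else in the file.

```python
def strip_header_flags(phylip_text: str) -> str:
    """Return the alignment text with only the two counts left on line one."""
    header, newline, rest = phylip_text.partition("\n")
    fields = header.split()
    if len(fields) < 2:
        raise ValueError("first line should hold the sequence count and length")
    # Keep any leading spaces so the column layout of the header is preserved.
    indent = header[: len(header) - len(header.lstrip())]
    return indent + " ".join(fields[:2]) + newline + rest
```

With the phylip3 output shown above, a first line of `1 663 YF` becomes `1 663`.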
So I either have to write a script to remove the flags from the sequential
output or to insert one into the interleaved output, unless seqret has a
solution.
All assistance gratefully appreciated
Derek