[EMBOSS] How to apply the einverted and etandom to a fasta file - C
Guy Bottu
gbottu at ben.vub.ac.be
Mon Oct 30 15:33:13 UTC 2006
On Sun, Oct 29, 2006 at 11:39:35AM -0600, yun zheng wrote:
> I am a new user of emboss. I am trying to find repeat sequences in a
> nucleotide sequence file that have many sequences.
>
> Can anybody tell me how to use einverted and etandem to analyze all the
> sequences in a fasta file?
einverted searches for inverted repeats (palindromes) rather than direct
repeats. It operates without problems on a multiple-sequence FASTA file.
If the output file is empty, it is probably because einverted did not find
any good palindrome. You could try experimenting with the parameters.
etandem operates on only one sequence at a time. You can tell because
etandem -help shows that it takes as input an object of type "sequence"
rather than "seqall". If you want to process many sequences at once, you
will need to put them in separate files. If necessary, you can run
seqret -ossingle on your file to split it. Under the Tc shell (tcsh),
provided your files are all called something.fasta, you can then do:
    foreach FASTAFILE (`ls *.fasta`)
        etandem $FASTAFILE -minrepeat=10 -maxrepeat=10 -threshold=20 -auto
    end
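If you use a Bourne-type shell (sh, bash) instead of tcsh, the same loop
could be sketched roughly as follows. The function name run_etandem_all and
the DRY_RUN variable are made up for this example and are not part of
EMBOSS; the etandem options are simply the ones from the tcsh loop.

```shell
# Run etandem on every *.fasta file in a directory (default: current one).
# run_etandem_all and DRY_RUN are illustrative names, not EMBOSS features.
run_etandem_all() {
    dir=${1:-.}
    for fastafile in "$dir"/*.fasta; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$fastafile" ] || continue
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: only print the command that would be executed.
            echo "etandem $fastafile -minrepeat=10 -maxrepeat=10 -threshold=20 -auto"
        else
            etandem "$fastafile" -minrepeat=10 -maxrepeat=10 -threshold=20 -auto
        fi
    done
}
```

Setting DRY_RUN=1 before calling run_etandem_all lets you check which
commands would be run before actually starting etandem on each file.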
The problem is that etandem works well only if you provide appropriate
values for minrepeat, maxrepeat and threshold. You can use equicktandem to
get an idea (look in the 4th column of the output for a repeat size).
Processing all sequences in one run will of course only go well if they
all contain repeats of similar size and quality.
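To get a quick overview of the repeat sizes that equicktandem reports, a
small helper along these lines might be useful. It is only a sketch: the
name repeat_sizes is invented here, and it assumes that the result rows are
whitespace-separated numbers with the repeat size in the 4th column, as
described above. Other output or report formats may need a different
column.

```shell
# repeat_sizes is an illustrative helper name, not an EMBOSS program.
# Reads equicktandem output on stdin and prints "count size" for each
# distinct value in the 4th column, most frequent first.
# Assumption: data rows start with a number and have the size in column 4.
repeat_sizes() {
    awk 'NF >= 4 && $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { n[$4]++ }
         END { for (s in n) print n[s], s }' |
        sort -k1,1nr -k2,2n
}
```

For example, after running equicktandem on one sequence with
-outfile one.qtan, you could call "repeat_sizes < one.qtan" and use the
most frequent size as a starting value for minrepeat and maxrepeat.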
I hope this helps.
Guy Bottu,
Belgian EMBnet Node