[EMBOSS] How to apply the einverted and etandom to a fasta file - C

Guy Bottu gbottu at ben.vub.ac.be
Mon Oct 30 15:33:13 UTC 2006


On Sun, Oct 29, 2006 at 11:39:35AM -0600, yun zheng wrote:
> I am a new user of emboss. I am trying to find repeat sequences in a
> nucleotide sequence file that have many sequences.
> 
> Can anybody tell me how to use einverted and etandem to analyze all the
> sequences in a fasta file?

einverted searches for inverted repeats (palindromes) rather than direct 
repeats. It operates without problems on a multiple-sequence FASTA file. 
If the output file is empty, that is probably because it did not find any 
good palindrome. You can try experimenting with the parameters.
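
As a rough illustration of "experimenting with the parameters", the Python 
sketch below only builds einverted command lines for a few score thresholds, 
so you can compare how many hits each run reports. The helper name, the 
output file names and the threshold values are made up for illustration; 
the -sequence, -threshold, -outfile and -auto qualifiers are assumed to be 
as listed by einverted -help on your installation.

```python
def einverted_commands(fasta_path, thresholds=(50, 40, 30)):
    """Build one einverted command line per threshold (illustrative values)."""
    commands = []
    for threshold in thresholds:
        commands.append([
            "einverted",
            "-sequence", str(fasta_path),
            "-threshold", str(threshold),
            # Hypothetical output name, one report per threshold tried.
            "-outfile", f"inverted_t{threshold}.inv",
            "-auto",
        ])
    return commands
```

Each list can be passed to subprocess.run() once you are happy with the 
values; a lower threshold should report weaker palindromes as well.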

etandem operates on only one sequence at a time. You can tell because 
etandem -help shows that it takes as input an object of type "sequence" 
rather than "seqall". If you want to process many sequences at once, you 
will need to put them in separate files; if necessary, you can run 
seqret -ossingle on your file to split it. Then, under the Tc shell (tcsh), 
provided your files are all called something.fasta, you can do:

foreach FASTAFILE (*.fasta)
    etandem $FASTAFILE -minrepeat=10 -maxrepeat=10 -threshold=20 -auto
end
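
If you are not working under tcsh, the same split-then-loop idea can be 
sketched in Python. This is only a minimal sketch under some assumptions: 
the function names are made up, the splitting is done by hand rather than 
by seqret, sequence IDs are assumed to be unique, and the etandem options 
are simply copied from the loop above.

```python
import re
import subprocess
from pathlib import Path


def split_fasta(text):
    """Split multi-FASTA text into (sequence_id, record_text) pairs."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            # The ID is the first word after '>'; fall back to a counter.
            words = line[1:].split()
            seq_id = words[0] if words else f"seq{len(records) + 1}"
            records.append((seq_id, [line]))
        elif records and line.strip():
            records[-1][1].append(line)
    return [(seq_id, "\n".join(lines) + "\n") for seq_id, lines in records]


def etandem_command(fasta_path, repeat_size=10, threshold=20):
    """Same options as the tcsh loop: minrepeat and maxrepeat are equal."""
    return [
        "etandem", str(fasta_path),
        f"-minrepeat={repeat_size}",
        f"-maxrepeat={repeat_size}",
        f"-threshold={threshold}",
        "-auto",
    ]


def run_etandem_on_all(multi_fasta_path, out_dir, repeat_size=10, threshold=20):
    """Write one FASTA file per sequence, then run etandem on each file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    text = Path(multi_fasta_path).read_text()
    for seq_id, record in split_fasta(text):
        # Keep file names safe: IDs may contain '|' or '/'.
        safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)
        fasta_path = out_dir / f"{safe_name}.fasta"
        fasta_path.write_text(record)
        subprocess.run(etandem_command(fasta_path, repeat_size, threshold),
                       check=True)
```

For example, run_etandem_on_all("many.fasta", "split") (both names are 
placeholders) would write split/<id>.fasta for every sequence and call 
etandem on each one; etandem must be on your PATH for this to work.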

The problem is that etandem works well only if you provide appropriate 
values for minrepeat, maxrepeat and threshold. You can use equicktandem to 
get an idea (look in the 4th column of the output for a repeat size). 
Processing all sequences in one run will of course go well only if they 
all contain repeats of similar size and quality.
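
To pick a repeat size from an equicktandem report without reading it by 
eye, something like the Python sketch below could be used. It assumes the 
report is in a whitespace-separated table layout with a header row that 
contains a column named "Size"; the column is looked up by name because its 
position may differ between EMBOSS versions. The function names are 
hypothetical, so check the sketch against a real report before relying on it.

```python
from collections import Counter


def repeat_sizes(report_text):
    """Collect the values of the 'Size' column from a table-style report."""
    sizes = []
    size_col = None
    for line in report_text.splitlines():
        is_comment = line.lstrip().startswith("#")
        fields = line.replace("#", " ").split()
        if "Size" in fields:
            # Header row: remember where the Size column is.
            size_col = fields.index("Size")
            continue
        if is_comment or size_col is None:
            continue
        if len(fields) > size_col and fields[size_col].isdigit():
            sizes.append(int(fields[size_col]))
    return sizes


def most_common_size(sizes):
    """Return the most frequent repeat size, or None if there are no hits."""
    if not sizes:
        return None
    return Counter(sizes).most_common(1)[0][0]
```

The most frequent size is only a starting point for minrepeat and 
maxrepeat; if the sizes vary widely between sequences, a single etandem 
setting for the whole file is unlikely to work well.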

I hope this helps.

	Guy Bottu,
	Belgian EMBnet Node



