counting fasta files

Florence Servant flo at ebi.ac.uk
Wed Sep 11 17:32:52 UTC 2002


Fernan Aguero wrote:

> +----[ Thus spoke Ted Chiang (tchiang at bioinfo.sickkids.on.ca):
> |
> | I have a file that contains several hundred FASTA sequences.  Is there
> | a function/program that will count the number of sequences in this file
> | and report it?
> |
> +----]
>
> Here's another one
>
> cat file.fasta | grep -c \>
>

Hi all,
    I would suggest adding one constraint: the line must start with '>'.
        cat file.fasta | grep -c ^\>

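    To be really sure it works for every FASTA file you may encounter, you
also have to take into account that a single sequence can have several
comment lines starting with >. With GNU grep (ggrep here), -A 1 prints each
header line plus the line after it; removing the "--" group separators and
the header lines leaves one line per sequence, which wc -l counts: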

ggrep -A 1 ^\> file.fasta | grep -v ^-- | grep -v ^\> | wc -l
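For systems without GNU grep's -A option, the same count can be sketched as a single awk pass. This is a minimal sketch, assuming (as above) that consecutive lines starting with > all belong to one sequence; the function name count_fasta is hypothetical. It counts each run of header lines once, so it differs from the grep pipeline only when a file ends with a header block that has no sequence line after it.

```shell
# count_fasta (hypothetical name): print the number of sequences in the
# given FASTA files, or in stdin if no files are given.
# A run of consecutive '>' lines is treated as the header of ONE sequence.
count_fasta() {
  awk '
    FNR == 1 { in_hdr = 0 }          # do not merge headers across files
    /^>/ {
      if (!in_hdr) n++               # first header line of a new sequence
      in_hdr = 1
      next
    }
    { in_hdr = 0 }                   # any other line ends the header block
    END { print n + 0 }              # "+ 0" prints 0 for empty input
  ' "$@"
}
```

Usage: count_fasta file.fasta prints the total; with several files it prints one combined count.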

Flo

--
Florence SERVANT
EBI - European Bioinformatics Institute - Room A2-40
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
Tel : (+44) 01223 494 686

More information about the EMBOSS mailing list