counting fasta files

David Mathog mathog at mendel.bio.caltech.edu
Wed Sep 11 16:05:20 UTC 2002


> I have a file that contains several hundreds of fasta sequences.  Is there
> a function/program that will count the number of sequences in this file
> and report it?

Try this:

ftp://saf.bio.caltech.edu/pub/software/molbio/fastaproperties.c

It reads a fasta file (filename taken from arg 1, or stdin if that is "-")
and emits one status line to stdout:

N M TYPE MINLEN MAXLEN AVELEN

where
  N is the number of sequences in the file
  M is the total number of bp/aa in the file (over all sequences)
  TYPE is P or N, the best guess for protein or nucleic acid. If
     it can't tell it will emit P.
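  MINLEN is the length of the shortest sequence
  MAXLEN is the length of the longest sequence
  AVELEN is the average sequence length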
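
For anyone who just wants the idea rather than the full program, here is
a rough sketch of how the counts on that status line could be gathered.
It is not the code from fastaproperties.c: the names fasta_stats and
fasta_scan are made up for illustration, the P/N guess is left out, and
it assumes a record starts at any line beginning with '>' and that
whitespace inside a sequence does not count toward its length.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical summary struct; fields mirror N, M, MINLEN, MAXLEN. */
struct fasta_stats {
    long long n;       /* number of sequences ('>' header lines)   */
    long long m;       /* total residues over all sequences        */
    long long minlen;  /* shortest sequence (0 if the file is empty) */
    long long maxlen;  /* longest sequence                         */
};

/* Fold one finished sequence of length len into the summary. */
static void close_record(struct fasta_stats *s, long long len)
{
    if (s->n == 0 || len < s->minlen) s->minlen = len;
    if (len > s->maxlen) s->maxlen = len;
    s->m += len;
    s->n++;
}

/* Scan a FASTA stream one character at a time and fill in *s.
 * Counters are long long so totals beyond the 32-bit range still fit. */
void fasta_scan(FILE *in, struct fasta_stats *s)
{
    int c;
    int at_line_start = 1;  /* next character begins a line   */
    int in_header = 0;      /* currently inside a '>' line    */
    int in_record = 0;      /* at least one header was seen   */
    long long len = 0;      /* residues in the current record */

    s->n = s->m = s->minlen = s->maxlen = 0;
    while ((c = getc(in)) != EOF) {
        if (at_line_start && c == '>') {
            if (in_record) close_record(s, len);  /* previous record ends */
            in_record = 1;
            in_header = 1;
            len = 0;
        } else if (c == '\n') {
            in_header = 0;
        } else if (in_record && !in_header && !isspace(c)) {
            len++;  /* sequence character; text before the first '>' is ignored */
        }
        at_line_start = (c == '\n');
    }
    if (in_record) close_record(s, len);  /* last record ends at EOF */
}
```

A caller would open the file named in arg 1 (or use stdin for "-"),
call fasta_scan(), and print n, m, minlen and maxlen with %lld, with
the average computed as (double)m / n when n is nonzero.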

1.0.2 11-JUL-2002, DRM.  Added the statistics at the end of the line.
1.0.1 22-MAY-2002, DRM.  Revised count of bases so that it doesn't mess
  up on counts > int4 range.  Use of long long is an extension to
  older ANSI C standards.


Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


