counting fasta files
mathog at mendel.bio.caltech.edu
Wed Sep 11 16:05:20 UTC 2002
> I have a file that contains several hundred fasta sequences. Is there
> a function/program that will count the number of sequences in this file
> and report it?
Reads a fasta file (filename from arg 1, or stdin if that is "-")
and emits one status line to stdout, which is:
N M TYPE MINLEN MAXLEN AVELEN
N is the number of sequences in the file.
M is the total number of bp/aa in the file (over all sequences).
TYPE is P or N, the best guess for protein or nucleic acid. If
it can't tell, it will emit P.
MINLEN, MAXLEN and AVELEN are the minimum, maximum and average
sequence lengths.
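
For illustration only, here is a minimal sketch of the counting step described above. It is not the original program: the names fasta_stats and fasta_count are hypothetical, it assumes a record starts at any line whose first character is '>', and it counts every non-whitespace character on the other lines as one residue. The TYPE guess and the AVELEN column are left out, since how the original computes or formats them is not shown here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result record; the fields mirror the status-line columns. */
struct fasta_stats {
    long long nseq;    /* N: number of sequences */
    long long total;   /* M: total residues over all sequences */
    long long minlen;  /* MINLEN: shortest sequence */
    long long maxlen;  /* MAXLEN: longest sequence */
};

/* Fold the length of a finished record into the running statistics.
   The record being closed is the first one exactly when nseq == 1. */
static void close_record(struct fasta_stats *st, long long len)
{
    if (st->nseq == 1 || len < st->minlen)
        st->minlen = len;
    if (len > st->maxlen)
        st->maxlen = len;
    st->total += len;
}

/* Scan a FASTA stream one character at a time.
   Returns 0 on success, -1 on a read error. */
int fasta_count(FILE *fp, struct fasta_stats *st)
{
    int c;
    int at_line_start = 1;  /* true when the next char begins a line */
    int in_header = 0;      /* true while inside a '>' header line */
    long long curlen = 0;   /* residues seen in the current record */

    assert(fp != NULL && st != NULL);
    st->nseq = st->total = st->minlen = st->maxlen = 0;

    while ((c = getc(fp)) != EOF) {
        if (at_line_start && c == '>') {
            if (st->nseq > 0)
                close_record(st, curlen);  /* previous record ends here */
            st->nseq++;
            curlen = 0;
            in_header = 1;
        } else if (!in_header && st->nseq > 0 && !isspace(c)) {
            curlen++;  /* long long, so counts beyond the int range are fine */
        }
        if (c == '\n') {
            in_header = 0;
            at_line_start = 1;
        } else {
            at_line_start = 0;
        }
    }
    if (st->nseq > 0)
        close_record(st, curlen);  /* last record has no following '>' */
    return ferror(fp) ? -1 : 0;
}
```

A caller would open the file (or pass stdin), call fasta_count(fp, &st), and print the fields with printf's %lld conversion. The average length can then be derived as total divided by nseq when nseq is nonzero.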
1.0.2 11-JUL-2002, DRM. Added the statistics at the end of the line.
1.0.1 22-MAY-2002, DRM. Revised count of bases so that it doesn't
overflow on counts beyond the int4 (32-bit integer) range. Use of
long long is an extension to older ANSI C standards (it is standard
as of C99).
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech