[EMBOSS] Counting the number of sequences in a file

Peter Rice pmr at ebi.ac.uk
Tue Jul 20 17:02:12 UTC 2010


On 20/07/10 17:27, Peter C. wrote:
> Hi all,
> 
> Is there a tool in EMBOSS to just count the number of sequences in a file?
>
> Right now I could handle this by using seqret to convert the file into FASTA
> and then pipe that through grep to count the records. But an EMBOSS tool
> would be more elegant, e.g.
> 
> $ countseq -sformat=genbank gbvrt1.seq
> 31065
> 
> For the implementation you might offer the choice between using the normal
> EMBOSS parsing (as in seqret) versus file format specific regular expression
> searches which just look for marker lines (without checking validity) which
> should be really fast.
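
For illustration, the marker-line idea in the quoted message could be sketched roughly as below. This is not EMBOSS code. The names RECORD_MARKERS, count_records and count_records_in_file are hypothetical, and the marker patterns (">" for FASTA, "LOCUS" for GenBank, "ID" for EMBL) are assumptions about where a record starts in each format. Nothing is validated, so a malformed file will still produce a count.

```python
import re

# Assumed record-start markers per format. These are illustrative only
# and are not taken from the EMBOSS parsers.
RECORD_MARKERS = {
    "fasta": re.compile(r"^>"),
    "genbank": re.compile(r"^LOCUS\s"),
    "embl": re.compile(r"^ID\s"),
}


def count_records(lines, sformat):
    """Count lines that match the record-start marker for sformat.

    No validity checking is done: a line is counted purely because it
    starts with the marker.
    """
    marker = RECORD_MARKERS[sformat]
    return sum(1 for line in lines if marker.match(line))


def count_records_in_file(path, sformat):
    """Stream a file line by line and count its record-start markers."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return count_records(handle, sformat)
```

Calling count_records_in_file("gbvrt1.seq", "genbank") would then play the role of the proposed countseq command. The count is only as reliable as the marker assumption, which is the trade-off against the normal EMBOSS parsing.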

Very easy to write ... you could do it yourself for practice (we will
help of course).

Just use seqret as the basis, don't write any sequences out, but add an
outfile for the results.

We will add countseq to the next release.

regards,

Peter Rice




