fasta splitter

Tue Oct 8 18:38:50 UTC 2002

----- Original Message -----
From: "Peter Rice" <peter.rice at uk.lionbioscience.com>
To: "Tony Cox" <avc at sanger.ac.uk>
Cc: "January Weiner 3" <jweiner1 at ix.urz.uni-heidelberg.de>;
<pise at pasteur.fr>; <emboss at embnet.org>
Sent: Tuesday, October 08, 2002 5:42 PM
Subject: Re: fasta splitter

> Hi Tony
>
> > that sounds excellent - does this mean it really will make it in to the
> > EMBOSS release? (any idea when? ;)
>
> I already have the first part of the code ... a modified "seqret" to split
> into 10 sequences per file.
>
> Working copy is called "tenco" :-)
>
> What did you have in mind as a naming convention for the output files? The
> existing code names each file after the first sequence, I guess you want
> "outfile.1" "outfile.2" and so on, possibly with leading zeroes
> "outfile.001" etc.

Hi Peter,

This sounds great to me. Personally, I'd prefer not to have the leading
zeros - just an incrementing ".[integer]" appended to the filename supplied.
Makes shell manipulation easier.

I guess the ideal would be to supply either the number of chunks to split
the file into, or else a maximum size (either in bytes or FASTA
entries) for each chunk.
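
For illustration only, here is a minimal Python sketch of the "maximum
number of FASTA entries per chunk" variant, with output files named by
appending an unpadded ".[integer]" to the supplied name. This is not the
EMBOSS code; the names read_fasta_records, split_fasta, and max_entries
are hypothetical, and it assumes every entry starts with a ">" header line.

```python
from typing import Iterable, Iterator, List


def read_fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA entry (header plus sequence lines) as one string."""
    record: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">") and record:
            # A new header closes the entry collected so far.
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def split_fasta(in_path: str, out_prefix: str, max_entries: int = 10) -> List[str]:
    """Write at most max_entries entries per file to out_prefix.1, out_prefix.2, ...

    Returns the list of file names written (hypothetical helper, not EMBOSS).
    """
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")

    written: List[str] = []
    chunk: List[str] = []

    def flush() -> None:
        # Write the pending entries, if any, to the next numbered file.
        if not chunk:
            return
        name = f"{out_prefix}.{len(written) + 1}"  # no leading zeros
        with open(name, "w") as out:
            out.writelines(chunk)
        written.append(name)
        chunk.clear()

    with open(in_path) as handle:
        for record in read_fasta_records(handle):
            chunk.append(record)
            if len(chunk) == max_entries:
                flush()
    flush()  # last, possibly shorter, chunk
    return written
```

For example, split_fasta("input.fasta", "outfile", max_entries=10) on a
file of 25 entries would write outfile.1, outfile.2, and outfile.3. The
"number of chunks" and "maximum size in bytes" variants are not covered
by this sketch.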

cheers

Tony