fasta splitter

Peter Rice peter.rice at uk.lionbioscience.com
Tue Oct 8 16:37:36 UTC 2002


Tony Cox wrote:
> On Tue, 8 Oct 2002, January Weiner 3 wrote:
> 
> Thanks to all that responded. I did, in the end, write a 12-line bioperl script
> to split my fasta file. My request seems, however, to highlight a small blind
> spot on the EMBOSS radar. It appears that there are a number of implementations
> out there - perhaps one of them can be donated to the EMBOSS project as the
> basis of a new software tool?

Nobody suggested hacking "seqret" to do what you want...

One problem with doing this in EMBOSS is the need to generate filenames for
your split files - but maybe a base filename would be enough to generate
names from. Then all you need to do is count sequences in a modified seqret.c
and change the output file each time the count is reached. You can add a
command line option for the number of sequences per output file. Cleaning up
output files for a rerun is an exercise for the user (unless you want to
invent a new ACD type that does it :-)

It needs a modified version of the seqFileReopen function to handle the file
naming, but nothing complicated is involved.
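
To make the counting and naming logic concrete, here is a rough standalone
sketch in Python. It is not EMBOSS code and does not touch seqret.c; the
function name split_fasta, the per_file parameter and the base_N.fasta naming
scheme are all assumptions made up for illustration.

```python
import os


def split_fasta(in_path, base, per_file, out_dir="."):
    """Split a FASTA file into files named base_1.fasta, base_2.fasta, ...

    Each output file receives at most per_file sequences. The naming scheme
    is a hypothetical choice, not anything EMBOSS defines. Returns the list
    of output paths written, in order.
    """
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    paths = []
    out = None
    count = 0  # sequences written to the current output file
    try:
        with open(in_path) as src:
            for line in src:
                if line.startswith(">"):
                    # A header starts a new sequence: move on to the next
                    # output file if the current one is already full.
                    if out is None or count == per_file:
                        if out is not None:
                            out.close()
                        name = f"{base}_{len(paths) + 1}.fasta"
                        path = os.path.join(out_dir, name)
                        out = open(path, "w")
                        paths.append(path)
                        count = 0
                    count += 1
                if out is not None:  # ignore anything before the first header
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Calling split_fasta("all.fasta", "part", 100) would write part_1.fasta,
part_2.fasta and so on into the current directory. Like the seqret idea
above, it overwrites files of the same name and leaves cleaning up stale
output from an earlier run to the user.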

Regards,

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723



