fasta splitter

David Mathog mathog at mendel.bio.caltech.edu
Tue Oct 8 20:33:56 UTC 2002



> > What did you have in mind as a naming convention for the output
> > files? The existing code names each file after the first sequence.
> > I guess you want "outfile.1", "outfile.2" and so on, possibly with
> > leading zeroes: "outfile.001" etc.
> 
> My $.02: I think that outfile.[number_here] is not a good convention,
> since the extension (whatever you put after the dot) means the file
> type, and here the file type is always the same (ASCII text). I think
> it should be something like:
> outfile_[number].txt
> It should look like this:
> outfile_1.txt


I agree.  Also the numeric range should be displayed
in a fixed column width.  Ideally something like:

  % esplit \
     -sequence=ncbi_nr.nfa \
     -fmask='nr_frag_####.nfa' \
     -splitn=20 \
     -splitmode=cycle \
     -numberfrom=0

would produce

nr_frag_0000.nfa
...
nr_frag_0019.nfa

Keeping the names fixed width prevents all sorts of
text alignment problems which can show up otherwise.
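
To make the mask idea concrete, here is a rough sketch in Python of how a
'#' mask could be expanded into fixed-width names. This is only an
illustration of the proposal, not existing EMBOSS code: the function names
expand_mask and fragment_names are made up for this example, and it assumes
the mask holds one run of '#' characters whose length sets the column width.

```python
import re


def expand_mask(mask: str, index: int) -> str:
    """Replace the first run of '#' in mask with index, zero-padded to the run's width.

    Hypothetical helper: assumes the width is given by the number of '#' characters.
    """
    match = re.search(r"#+", mask)
    if match is None:
        raise ValueError("mask contains no '#' placeholder")
    if index < 0:
        raise ValueError("index must not be negative")
    width = len(match.group())
    number = str(index).zfill(width)
    if len(number) > width:
        # Refuse to silently widen the name, which would break the fixed width.
        raise ValueError(f"index {index} does not fit in {width} columns")
    return mask[:match.start()] + number + mask[match.end():]


def fragment_names(mask: str, count: int, number_from: int = 0) -> list[str]:
    """Return count output names, numbered consecutively starting at number_from."""
    return [expand_mask(mask, number_from + i) for i in range(count)]


# Mirrors the example above: 20 fragments, numbered from 0.
names = fragment_names("nr_frag_####.nfa", 20, number_from=0)
print(names[0])
print(names[-1])
```

With names built this way, every file name has the same length, so a plain
alphabetical listing of the directory is also in numeric order.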

Regards,


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


