Bioperl: Re: Bio::Tools::Blast

Steven E. Brenner brenner@hyper.stanford.edu
Fri, 28 Aug 1998 14:39:30 -0700


> It's news to me that the FASTA format is defined in the FASTA package.
> I've searched the docs several times in the past, and I've searched it
> again just now and still can't find it.  Steve, can you point me to
> the correct file?  .c and .h files DON'T count!

My recollection is that the definition is listed under "Pearson file
format" in one of the funny ".me" or ".ne" (I forget) files.  (These files
are formatted for some text-processing program.)

> There is some documentation on FASTA at GenBank's site, but it is
> totally underspecified.  It basically says what characters are valid
> in DNA and Peptide sequences.
> 
> In any case, what annoys me is that many people are using the
> description field to encapsulate meta-data, but nobody is doing it in
> the same way.  Even at the NCBI, I see different conventions.  For
> example, Greg Schuler has a simple tag=value notation, but FASTA files
> produced by other NCBI scientists use the | symbol to delineate
> positional parameters.  A real mess.

It is absolutely the case that there are no broadly-used standards about
how to represent data other than "identifier", "description", and
"sequence".  However, I don't really see how creating a new format which
nobody else uses will improve the situation.


Personally, I have found it very convenient to use FASTA-format files,
where I embed additional information in the "description" in a way which
is specific to my applications.  I can run my sequences through a huge
number of different programs and my additional data are passed along
unharmed.
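
To make that concrete, here is a rough sketch in Python of the kind of
thing I mean.  It assumes the usual convention that the identifier is the
first whitespace-delimited word after ">" and the rest of the line is the
description.  The tag=value tokens (src=, score=) and the function names
are made up purely for illustration; they are not any standard, and other
people's conventions will differ.

```python
def parse_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA record."""
    ident, desc, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if ident is not None:
                yield ident, desc, "".join(chunks)
            header = line[1:].split(None, 1)
            ident = header[0] if header else ""
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        elif ident is not None:
            # Sequence lines are simply concatenated.
            chunks.append(line)
    if ident is not None:
        yield ident, desc, "".join(chunks)


def description_tags(desc):
    """Collect application-specific tag=value tokens (hypothetical convention)."""
    return dict(tok.split("=", 1) for tok in desc.split() if "=" in tok)


# Hypothetical record: free text plus two private tags in the description.
example = ">seq1 kinase domain src=lab42 score=0.93\nMKVLA\nGGHT\n"
records = list(parse_fasta(example))
tags = description_tags(records[0][1])
```

A program that knows nothing about the private tags still sees an ordinary
identifier, description, and sequence, which is why the extra data ride
along untouched.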

The same thing could be accomplished with, for example, Boulder format.
(My "personal" tags would just go along for the ride).  But the relevant
point is that virtually every program already accepts the FASTA format,
and not Boulder or anything else that we're going to create in the near
future.

Steve

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================