[Biojava-l] FASTA parsing bug ?

Mark Schreiber markjschreiber at gmail.com
Wed Apr 29 14:33:27 UTC 2009


I can understand a bench scientist wanting FASTA, but a computational
biologist? They should be ashamed! With some of the friendly XPath
implementations in common scripting languages there really is no excuse.
It's easier to parse XML than FASTA in Groovy, Perl, Python and Ruby.
Probably Java and C as well.
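
To make the comparison concrete, here is a minimal sketch in Python that pulls the same information out of an XML record and out of a FASTA file with metadata packed into the header. The record layout, the element and attribute names, and the "key=value;" header convention are all invented for illustration; they are not any database's actual format. The standard library's ElementTree supports only a subset of XPath, which is enough here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML records; element and attribute names are invented.
XML_DOC = """
<sequences>
  <sequence id="seq1" type="gene" species="Dmel">
    <location arm="2L" start="100" end="250"/>
    <residues>ACGTACGT</residues>
  </sequence>
  <sequence id="seq2" type="mRNA" species="Dmel">
    <location arm="3R" start="5" end="42"/>
    <residues>GGCCTTAA</residues>
  </sequence>
</sequences>
"""

# Hypothetical FASTA records carrying the same metadata in the header line.
FASTA_DOC = """>seq1 type=gene; species=Dmel; loc=2L:100..250
ACGT
ACGT
>seq2 type=mRNA; species=Dmel; loc=3R:5..42
GGCCTTAA
"""


def genes_from_xml(text):
    """Map gene id -> (chromosome arm, residues) using one XPath-style query."""
    root = ET.fromstring(text)
    return {
        seq.get("id"): (seq.find("location").get("arm"), seq.findtext("residues"))
        for seq in root.findall(".//sequence[@type='gene']")
    }


def genes_from_fasta(text):
    """Same result from FASTA; the header convention must be known in advance."""
    records = {}
    current = None  # id of the gene record being read, or None to skip lines
    for line in text.splitlines():
        if line.startswith(">"):
            seq_id, _, rest = line[1:].partition(" ")
            # Split the ad hoc "key=value; key=value" header into a dict.
            fields = dict(
                f.strip().split("=", 1) for f in rest.split(";") if f.strip()
            )
            current = None
            if fields.get("type") == "gene":
                arm = fields["loc"].split(":", 1)[0]
                current = seq_id
                records[current] = [arm, ""]
        elif current is not None:
            # Sequence lines may wrap, so concatenate them.
            records[current][1] += line.strip()
    return {seq_id: tuple(value) for seq_id, value in records.items()}


print(genes_from_xml(XML_DOC))
print(genes_from_fasta(FASTA_DOC))
```

Both functions return the same dictionary for this toy input. The difference is that the XML version addresses fields by name, while the FASTA version hard-codes assumptions about delimiters that each data provider is free to change.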

The state of bioinformatics data formats is cringeworthy. Let's try to
enter the 21st century!

OK, I'm ranting again. Maybe I'll go join Twitter.

- Mark

On 29 Apr 2009, 10:04 PM, "Josh Goodman" <jogoodma at indiana.edu> wrote:


Hi Mark,

I couldn't agree with you more, which is why we also provide this data in
GFF and Chado XML formats, Chado PostgreSQL dumps, and a public read-only
Chado database. However, no matter how much we try to encourage use of
the other formats, users still flock to the good old FASTA files. There
are a variety of reasons, but the most common case involves bench
scientists and/or programmers who run at the sight of anything more
complex than a FASTA file.

I've toyed with the idea of reducing the data we cram into the headers to
gently encourage use of the other, more sensible formats. However,
at the end of the day we (FlyBase) serve at the behest of our user
community and this is what they want to see.

Cheers,
Josh

On Wed, 29 Apr 2009, Mark Schreiber wrote:
> People who know me will know I am not a big fan of F...


