[Bioperl-l] fasta format

Paul Gordon gordonp@cbr.nrc.ca
Fri, 23 Aug 2002 18:53:59 -0300 (ADT)


> my ($id,$fulldesc) = $top =~ /^\s*(\S+)\s*(.*)/

I guess the tradeoffs are between:

1. people who put a description, but no identifier at all, for whom the
current code does not work nicely

2. people who have a space between the > and the identifier.  

So, which is more likely to occur?  If you wanted to get really fancy, you
could check, when there is a leading space, whether the next word looks
like an identifier, i.e. contains at least one character that is neither a
letter nor a hyphen (e.g. /[^A-Z\-]/i).  Even Swiss-Prot IDs usually have
numbers or underscores.  It won't work all the time (e.g. descriptions
that start with something like "16S"), but perhaps it's better than
assuming the user isn't providing an identifier at all?  And it would be
mostly backward compatible.
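
A rough sketch of that check, in case it helps the discussion.  The sub
name split_fasta_header is made up for illustration and is not part of
Bioperl, and I am reading the pattern as "contains at least one character
that is neither a letter nor a hyphen" (i.e. unanchored).  $top is assumed
to be the header line with the leading ">" already stripped.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl function): split a FASTA header,
# with the '>' already removed, into (identifier, description).
sub split_fasta_header {
    my ($top) = @_;

    # Empty or whitespace-only header: no identifier, no description.
    return (undef, '') unless $top =~ /^(\s*)(\S+)\s*(.*)/;
    my ($lead, $word, $rest) = ($1, $2, $3);

    # No space after '>': the first word is the identifier, as usual.
    return ($word, $rest) if $lead eq '';

    # Leading space: accept the first word as an identifier only if it
    # contains a character that is neither a letter nor a hyphen
    # (digits, underscores, '|', ...).
    return ($word, $rest) if $word =~ /[^A-Z\-]/i;

    # Otherwise assume the whole line is a description with no identifier.
    # Note: this joins with a single space, so the original spacing
    # between the first word and the rest is not preserved.
    return (undef, $rest eq '' ? $word : "$word $rest");
}
```

With this, " P12345 some protein" gives the identifier "P12345", while
" hypothetical protein" gives no identifier and the whole line as the
description.  The known weak spot is still there: " 16S ribosomal RNA"
comes back with "16S" as the identifier.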