[Bioperl-l] fasta format
Matthew Pocock
matthew_pocock@yahoo.co.uk
Sat, 24 Aug 2002 17:25:34 +0100
Hi. There is no one-size-fits-all solution for FASTA description lines.
Perhaps an optional callback on the FASTA parser object that takes all
the text following ">", including any whitespace, and returns a list,
(id, description)? You could write a handful of default callbacks with
obvious names, and in really mad situations (SCOP FASTA may be a candidate
for this) the user can provide their own. Apologies if the BioPerl
FASTA parser already has this functionality.
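
A minimal sketch of the callback idea, in plain Perl. Everything named
here (parse_header, first_word, description_only) is hypothetical and
made up for illustration; none of it is existing BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical default callbacks. Each one receives everything after ">"
# (whitespace included) and returns the list (id, description).
my %header_parsers = (
    # First whitespace-delimited word is the id, the rest is the description.
    first_word => sub {
        my ($top) = @_;
        my ($id, $desc) = $top =~ /^\s*(\S+)\s*(.*)/;
        return ($id, $desc);
    },
    # No id at all: the whole line is the description.
    description_only => sub {
        my ($top) = @_;
        (my $desc = $top) =~ s/^\s+|\s+$//g;
        return (undef, $desc);
    },
);

# Hypothetical entry point: parse one header line using either the name
# of a default callback or a user-supplied code reference.
sub parse_header {
    my ($line, $parser) = @_;
    $parser = 'first_word' unless defined $parser;
    my $callback = ref($parser) eq 'CODE' ? $parser : $header_parsers{$parser};
    die "unknown header parser '$parser'\n" unless $callback;
    (my $top = $line) =~ s/^>//;    # drop the leading ">"
    $top =~ s/\r?\n\z//;            # drop the line ending
    return $callback->($top);
}
```

With this, parse_header($line) keeps the current first-word behaviour,
parse_header($line, 'description_only') handles files with no ids, and
parse_header($line, sub { ... }) lets the user handle the odd formats
themselves.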
Matthew
Paul Gordon wrote:
>>my ($id,$fulldesc) = $top =~ /^\s*(\S+)\s*(.*)/
>
>
> I guess the tradeoffs are between:
>
> 1. people who put a description, but no identifier at all, for whom the
> current code does not work nicely
>
> 2. people who have a space between the > and the identifier.
>
> So, which is more likely to occur? If you wanted to get really fancy, you
> might check, if there is a leading space, if the next word looks like an
> identifier (e.g. /^[^A-Z\-]$/i). Even swissprot ids usually have
> numbers or underscores. It may not work all the time (e.g. 16S kind of
> descriptors), but perhaps it's better than assuming the user isn't
> providing an identifier at all? And it would be mostly backward
> compatible?
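
The check described above could be sketched roughly as follows. The
function name split_header is hypothetical, and the unanchored pattern
/[^A-Z\-]/i ("contains at least one character that is not a letter or
a hyphen") is one reading of the suggestion, not the regex as posted.

```perl
use strict;
use warnings;

# Takes the text after ">" (line ending already removed) and returns
# the list (id, description). The id is undef when none is recognised.
sub split_header {
    my ($top) = @_;
    return (undef, '') unless $top =~ /^(\s*)(\S+)\s*(.*)/;
    my ($lead, $word, $rest) = ($1, $2, $3);

    # No leading space: keep the current behaviour, first word is the id.
    return ($word, $rest) if $lead eq '';

    # Leading space: accept the word as an id only if it contains
    # something other than letters and hyphens (digit, underscore, "|").
    return ($word, $rest) if $word =~ /[^A-Z\-]/i;

    # Otherwise assume there is no id; the whole line is the description.
    return (undef, $rest eq '' ? $word : "$word $rest");
}
```

So " AB_123 kinase" gives the id AB_123, while " hypothetical protein"
gives no id and keeps the whole text as the description. As noted, a
header such as " 16S ribosomal RNA" is still misread, with 16S taken as
the id.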
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk