[Bioperl-l] Bio::*Taxonomy* changes
Hilmar Lapp
hlapp at gmx.net
Tue Jul 25 03:31:41 UTC 2006
On Jul 24, 2006, at 10:29 PM, Chris Fields wrote:
> [...]
> We could go back and forth on what Jason really intended. [...] The
> reality is he's not here and you're willing to do the job.
Right. And, knowing Jason, I think he'd be perfectly fine with seeing
his original idea develop in a possibly different direction, provided
it will all work nicely in the end. I'm willing to take the beating
on me if that doesn't turn out to be true ...
>
> There is one thing I will make perfectly clear here: there should
> never, ever be enforced lookups for SeqIO (even using caches),
You certainly don't want taxonomy lookups during the parsing stage,
nor when a client requests properties of a species that was parsed
with high confidence, i.e., genus and species for a straightforward
binomial like 'Homo sapiens'.
Writing sequences, IMHO, doesn't have to be as fast. It may be better
to emit strictly formatted output a bit more slowly than sloppy
output a bit faster.
On parsing, one idea would be for the flat-file parser to set a
dirty bit on the parsed-out species if the parsed text didn't follow
strict binomial conventions. In that case the parser may have made a
mistake, so if a client requests the information, it is better to
look up the correct values in a taxonomy database. That is, you could
first try a strict regex; a match implies a high-confidence result.
If it fails, you don't give up but instead mark the result as
untrustworthy.
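A minimal sketch of the dirty-bit idea, written in Python for
illustration only. It assumes a simplified model: `ParsedSpecies`,
`parse_species`, `resolve`, the `STRICT_BINOMIAL` pattern and the
`lookup` callable are all hypothetical names, not BioPerl's actual
API. The parser never contacts a taxonomy database. Only a client
asking for a result flagged as dirty triggers a lookup.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical strict binomial: capitalized genus, lowercase epithet
# (optionally hyphenated), and nothing else.
STRICT_BINOMIAL = re.compile(r"([A-Z][a-z]+) ([a-z]+(?:-[a-z]+)?)")


@dataclass
class ParsedSpecies:
    raw: str
    genus: Optional[str] = None
    species: Optional[str] = None
    dirty: bool = False  # True when the parse may be wrong


def parse_species(raw: str) -> ParsedSpecies:
    """Parse without any taxonomy lookup; flag non-strict names as dirty."""
    text = raw.strip()
    m = STRICT_BINOMIAL.fullmatch(text)
    if m:
        # High-confidence result: no lookup will ever be needed.
        return ParsedSpecies(raw, m.group(1), m.group(2), dirty=False)
    # Don't give up: keep a best-effort guess (first two tokens),
    # but mark it as untrustworthy.
    parts = text.split()
    genus = parts[0] if parts else None
    species = parts[1] if len(parts) > 1 else None
    return ParsedSpecies(raw, genus, species, dirty=True)


def resolve(
    sp: ParsedSpecies,
    lookup: Callable[[str], Tuple[str, str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (genus, species), consulting `lookup` only for dirty results."""
    if not sp.dirty:
        return sp.genus, sp.species
    # `lookup` stands in for a hypothetical taxonomy-database query.
    genus, species = lookup(sp.raw)
    sp.genus, sp.species, sp.dirty = genus, species, False
    return genus, species
```

With this design, 'Homo sapiens' parses cleanly and is never looked
up, while a name like 'Escherichia coli K-12' or 'Bacillus sp.' is
still parsed but flagged, so the cost of a lookup is paid only if a
client actually asks for it.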
> [...]
> This would have been MUCH easier if all three of us could have gone
> to the local bar for a beer and discussed it. We should just take
> the time out to videoconference next time.
You're not honestly suggesting that a videoconference is better than
having beer together?
Enjoy your trip, and thanks for hanging in there in the discussion, I
appreciate it.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================