[Bioperl-l] Bio::*Taxonomy* changes

Hilmar Lapp hlapp at gmx.net
Tue Jul 25 03:31:41 UTC 2006


On Jul 24, 2006, at 10:29 PM, Chris Fields wrote:

> [...]
> We could go back and forth on what Jason really intended. [...] The  
> reality is he's not here and you're willing to do the job.

Right. And knowing Jason, I think he'd be perfectly fine with seeing
his original idea develop in a possibly different direction, provided
it all works nicely in the end. I'm willing to take the beating
if that doesn't turn out to be true ...

>
> There is one thing I will make perfectly clear here: there should
> never, ever be enforced lookups for SeqIO (even using caches),

You certainly don't want taxonomy lookups during the parsing stage,
nor when the client requests properties of species that have been
parsed with high confidence, i.e., genus and species for a
straightforward binomial like 'Homo sapiens'.

Writing sequences, IMHO, doesn't have to be as fast. It may be better
to emit strictly formatted output a bit more slowly than sloppy output
a bit faster.

Upon parsing, one idea could be for the flat file parser to set a
dirty bit on the parsed-out species if the parsed text didn't follow
strict binomial conventions. In that case the parser may have made a
mistake, and if a client requests the information it is better to
look up the correct values from a taxonomy database. I.e., you could
first try a strict regex whose match implies a high-confidence result.
If that fails, you don't give up but mark the result as untrustworthy.
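To make the dirty-bit idea concrete, here is a minimal sketch in Python rather than Perl. All names are hypothetical (`ParsedSpecies`, `parse_species`, the `lookup` callback) and are not part of the Bio::Species or Bio::Taxonomy API. Parsing never performs a lookup. A strict binomial match is trusted as-is, and anything else is kept but flagged. A lookup happens lazily, only when a client asks for genus or species on a flagged result.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Strict binomial: capitalized genus, one space, lower-case epithet, nothing else.
STRICT_BINOMIAL = re.compile(r"^([A-Z][a-z]+) ([a-z]+)$")

# Hypothetical taxonomy lookup: maps raw organism text to (genus, species).
Lookup = Callable[[str], Tuple[str, str]]


@dataclass
class ParsedSpecies:
    raw: str                    # organism text as it appeared in the flat file
    genus: Optional[str]        # best-guess genus from parsing
    species: Optional[str]      # best-guess epithet from parsing
    dirty: bool                 # True if the strict regex did not match
    lookup: Optional[Lookup] = None

    def _resolve(self) -> None:
        # Consult the taxonomy source only for untrustworthy parses, and only
        # once: a successful lookup clears the dirty bit.
        if self.dirty and self.lookup is not None:
            self.genus, self.species = self.lookup(self.raw)
            self.dirty = False

    def get_genus(self) -> Optional[str]:
        self._resolve()
        return self.genus

    def get_species(self) -> Optional[str]:
        self._resolve()
        return self.species


def parse_species(raw: str, lookup: Optional[Lookup] = None) -> ParsedSpecies:
    """Parse organism text without any lookup; flag non-binomial input."""
    text = raw.strip()
    m = STRICT_BINOMIAL.match(text)
    if m:
        return ParsedSpecies(text, m.group(1), m.group(2), dirty=False, lookup=lookup)
    # Fallback: naive whitespace split, kept but marked as untrustworthy.
    parts = text.split()
    genus = parts[0] if parts else None
    species = " ".join(parts[1:]) or None
    return ParsedSpecies(text, genus, species, dirty=True, lookup=lookup)
```

With this design, 'Homo sapiens' never triggers a lookup. Something like 'Escherichia coli K-12' is parsed on a best-effort basis, and the lookup cost is paid only if a client actually requests its genus or species.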


> [...]
> This would have been MUCH easier if all three of us could have gone
> to the local bar for a beer and discussed it. We should just take
> the time out to videoconference next time.

You're not honestly suggesting that a videoconference is better than  
having beer together?

Enjoy your trip, and thanks for hanging in there in the discussion, I  
appreciate it.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================