[Bioperl-l] Bio::*Taxonomy* changes
Chris Fields
cjfields at uiuc.edu
Tue Jul 25 14:58:29 UTC 2006
Agreed. I fully support the addition of an optional lookup; it gives SeqIO much
more flexibility, as in your previous examples of screening sequence streams
for sequences that are primate, mitochondrial, etc. The key word I want to
emphasize is 'optional', not 'enforced'.
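To make 'optional, not enforced' concrete, here is a minimal Python sketch of screening a stream for primate sequences. It is not BioPerl code. `Record`, `screen` and the `lookup` callable are hypothetical names, and the lineages are abbreviated for illustration. The sketch uses whatever lineage the flat file already carried and calls a taxonomy lookup only when the caller passes one in. With no lookup, nothing beyond parsing ever runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Record:
    """Hypothetical parsed record: organism name plus any lineage found in the file."""
    id: str
    organism: str
    lineage: List[str] = field(default_factory=list)


# Hypothetical lookup signature: organism name -> lineage names.
LineageLookup = Callable[[str], List[str]]


def screen(records: Iterable[Record], clade: str,
           lookup: Optional[LineageLookup] = None) -> Iterator[Record]:
    """Yield records whose lineage contains `clade`.

    The lookup is consulted only for records that carry no lineage of their
    own, and only if the caller supplied one; by default nothing is looked up.
    """
    for rec in records:
        lineage = rec.lineage
        if not lineage and lookup is not None:
            lineage = lookup(rec.organism)
        if clade in lineage:
            yield rec


# Example stream (lineages abbreviated); r3 has no lineage in the file.
records = [
    Record("r1", "Homo sapiens", ["Eukaryota", "Mammalia", "Primates", "Hominidae"]),
    Record("r2", "Mus musculus", ["Eukaryota", "Mammalia", "Rodentia", "Muridae"]),
    Record("r3", "Pan troglodytes"),
]
```

Without a lookup, `screen(records, "Primates")` yields only `r1`, because `r3` cannot be classified from the file alone. Passing a lookup backed by a local dump or a remote service would also pick up `r3`. Keeping the lookup as a plain callable leaves that speed/completeness trade-off to the caller.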
I appreciate what Sendu is trying to do; I really do. But I think carrying
over an object named 'Bio::Species' into Taxonomy is too confusing (your
'contagion' analogy, as it were). The 'species' concept (biologically
speaking here, not the Bioperl class) is a taxonomic rank, i.e. part of a
taxonomy. I'm trying to take a biologist's point of view here. What is a
'species'? Or, if we stick strictly to NCBI definitions, what is a
'species'?
The NCBI definition of 'species' is simply a rank in a lineage, so it is (in
Bioperl terms) a Node. Following that line of reasoning, why should a
Species object also represent a Taxonomy? It's way too confusing.
Sendu's repeated statement that "a Species is a Taxonomy" makes some sense
only in the Bioperl world, as we're speaking about a class that has been
around for a long time, one that acted as a container of sorts for sequence
data. And I understand what he intends to do.
Conceptually speaking, though, the way it is laid out, a Bio::Species object
can hold a Node that represents a 'species' rank, as well as a 'genus' Node,
a 'family' Node, and so on. That's not a 'species', that's a taxonomy. So just
call it a Taxonomy.
The object itself (Bio::Species) never truly represented a 'species' anyway,
biologically speaking, even when it held sequence data. It could be a
subspecies, strain, plasmid, an unclassified rank ('no rank'), an
environmental sample, or unknown. It really held a fancier representation of
a node, based on the TaxID.
My final point is that saying "a species is a taxonomy" doesn't make sense to
the rest of the biological world. Maybe it makes sense to you, me, and Sendu
in our little Bioperl world. But to the thousands of users out there who
don't completely grok the Bioperl class structure, it's just confusing.
If I were to get an object back that was labeled Bio::Species, as a
biologist I would expect it to be part of a taxonomy, not the actual
Taxonomy itself. So, why not cut to the chase: if we are to fundamentally
change the concept of what Bio::Species is by making it a Taxonomy/TaxonomyI
or whatever, why not just use a Taxonomy object altogether and not bother
with Bio::Species at all? Deprecate it.
BTW, I'll be at UConn in Connecticut for five days, so I hope to escape the
heat for a bit. Thanks for listening to my side of things.
Chris
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Tuesday, July 25, 2006 8:54 AM
> To: Chris Fields
> Cc: Sendu Bala; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Bio::*Taxonomy* changes
>
> We intend on having everyone who wants correct taxonomy parsing
> results for the entire kingdom of life to define his/her
> authoritative taxonomy database, be it local or not, be it HTTP or
> SQL queried.
>
> If you don't care about the correctness of the taxonomy parse, or if
> the taxonomy information in the flat file is trivially parseable
> because it conforms to standard binomial convention, then whatever is
> to be put in place needs to work fine regardless of whether a
> taxonomy database is defined or not.
>
> -hilmar
>
> On Jul 25, 2006, at 1:53 AM, Chris Fields wrote:
>
> > So do we intend on having everyone who installs bioperl have a local
> > copy of the taxonomy dumpfile? Or perform a remote lookup via
> > Entrez? Seems a bit extreme.
> >
> > I would like the option of not having the lookup run; as I mentioned
> > to Sendu, one of the biggest complaints about bioperl is speed.
> > Additional lookups won't help on that end.
> >
> > Chris
> >
> > On Jul 24, 2006, at 10:31 PM, Hilmar Lapp wrote:
> >
> >>
> >> On Jul 24, 2006, at 10:29 PM, Chris Fields wrote:
> >>
> >>> [...]
> >>> We could go back and forth on what Jason really intended. [...] The
> >>> reality is he's not here and you're willing to do the job.
> >>
> >> Right. And, knowing Jason, I think he'd be perfectly fine with seeing
> >> his original idea develop in a possibly different direction, provided
> >> it will all work nicely in the end. I'm willing to take the beating
> >> on me if that doesn't turn out to be true ...
> >>
> >>>
> >>> There is one thing I will make perfectly clear here: there should
> >>> never, ever be enforced lookups for SeqIO (even using caches),
> >>
> >> You certainly don't want taxonomy lookups during the parsing stage,
> >> and also not for the client requesting properties of the species that
> >> have been parsed with high confidence, i.e., genus and species for a
> >> straightforward binomial like 'Homo sapiens'.
> >>
> >> Writing sequences, IMHO, doesn't have to be as fast. It may be better
> >> to emit strict format a bit slower rather than sloppy format a bit
> >> faster.
> >>
> >> Upon parsing, one idea could be for the flat file parser to set a
> >> dirty bit in the parsed-out species if the parsed text didn't follow
> >> strict binomial conventions, meaning the parser may have made a
> >> mistake; if a client then requests the information, it is better to
> >> look up the correct values from a taxonomy database. I.e., you could
> >> try a strict regex first, which would imply a high-confidence result.
> >> If that fails, you don't give up but mark the result as untrustworthy.
> >>
> >>
> >>> [...]
> >>> This would have been MUCH easier if all three of us could have gone
> >>> to the local bar for a beer and discussed it. We should just take
> >>> the time out to videoconference next time.
> >>
> >> You're not honestly suggesting that a videoconference is better than
> >> having beer together?
> >>
> >> Enjoy your trip, and thanks for hanging in there in the discussion, I
> >> appreciate it.
> >>
> >> -hilmar
> >> --
> >> ===========================================================
> >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> >> ===========================================================
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
>
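Hilmar's 'dirty bit' proposal in the quoted thread works like this: try a strict binomial regex first, and if it fails, keep the best-effort parse but flag it, then consult a taxonomy database only when a client asks for the values. The Python sketch below illustrates this under stated assumptions. `parse_organism`, `genus_and_species` and the `lookup` callable are hypothetical names, not BioPerl's API. The `lookup` stands in for whatever authoritative taxonomy database is configured, if any. The regex is deliberately narrow.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Deliberately strict: capitalized genus, one lowercase epithet, nothing else.
BINOMIAL = re.compile(r"^([A-Z][a-z]+) ([a-z]+)$")

# Hypothetical lookup signature: raw organism text -> (genus, species).
Lookup = Callable[[str], Tuple[str, str]]


@dataclass
class ParsedSpecies:
    raw: str
    genus: Optional[str]
    species: Optional[str]
    dirty: bool  # True when the text did not match the strict pattern


def parse_organism(text: str) -> ParsedSpecies:
    """Parse during SeqIO-style reading; never performs a lookup."""
    text = text.strip()
    m = BINOMIAL.match(text)
    if m:
        return ParsedSpecies(text, m.group(1), m.group(2), dirty=False)
    # Best-effort fallback (first two words), flagged as untrustworthy.
    words = text.split()
    genus = words[0] if words else None
    species = words[1] if len(words) > 1 else None
    return ParsedSpecies(text, genus, species, dirty=True)


def genus_and_species(parsed: ParsedSpecies,
                      lookup: Optional[Lookup] = None
                      ) -> Tuple[Optional[str], Optional[str]]:
    """Answer a client request; look up only if dirty AND a lookup is configured."""
    if parsed.dirty and lookup is not None:
        return lookup(parsed.raw)
    return parsed.genus, parsed.species
```

A clean binomial such as "Homo sapiens" is answered straight from the parse and never triggers a lookup. A name like "Escherichia coli K-12" is flagged dirty. It is resolved through the lookup when one is configured and otherwise falls back to the best-effort parse, so everything still works with no taxonomy database defined.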