Bioperl: Bio::Species.pm: fixed bug #226
Ewan Birney
birney@ebi.ac.uk
Sun, 7 May 2000 14:25:11 +0100 (BST)
On Sat, 6 May 2000, Hilmar Lapp wrote:
> Hello all,
>
> as encouraged by Jason, Ewan, and James, I've corrected the bug in Species.pm
> mentioned in report #226, so far only on branch-06.
>
> The fix involved a couple of changes, most noticeably a change of what
> Bio::Species->classification() is expecting now when it is passed an argument.
> The reason is that now all methods consistently access the same array, except
> for common_name(). Bio::SeqIO::genbank.pm and Bio::SeqIO::embl.pm already
> behaved as if Species.pm worked the way it does now, which triggered the bug.
Hmmm. I am worried that, for species that do not have a subspecies
identifier, we will falsely classify the subspecies as a species.
Can you reassure me that this won't happen?
>
> See the doc quote at the end.
>
> I'm wondering whether there's a point in fixing the main branch as well
> immediately. A diff of Species.pm to the main-branch version showed a few
> other, but minor, differences, which I think were not introduced by me. So it
> seems to me that the two branches are not necessarily kept sync'ed even if the
> changes do not introduce API changes or new features etc. Could someone let me
> know whether I shall fix the main branch as well.
>
<wince> I left the main trunk somewhat out of sync with the branch when I
made the 0.6 release. (dirty secret. Hands slapped all around). Can you
propagate the changes you think are required across (?).
> Regarding design, I have the impression that Bio::Species rather encapsulates
> the source of a sequence (therefore the organelle() method) than a species
> only, and that's exactly the point of interest with respect to seqs (like
> organ, tissue, library etc). So, intuitively I'd call it something like
> Bio::SeqSource or Bio::IsolationSource. Does anyone see a point in this? I
> have no idea how much attention the list community pays to such issues, so
> maybe someone can give me an idea whether this is just stupid and you leave
> such things to the OO/Corba/Java domain.
>
I think the species object should handle biological species *only*.
Issues like organelle are halfway split between species issues and source
issues - I could argue it either way. In my view, a good design is
something like
Bio::IsolationSource has-a Bio::Species object
and also has additional methods for tissue etc.
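
A minimal sketch of that has-a arrangement, under stated assumptions:
Sketch::Species and Sketch::IsolationSource are hypothetical stand-ins
(not existing bioperl modules), and the accessor names tissue() and
organelle() are only illustrative.

```perl
use strict;
use warnings;

# Stand-in for Bio::Species so the sketch runs without bioperl installed;
# it models only the biological classification.
package Sketch::Species;

sub new {
    my ($class, @classification) = @_;
    return bless { classification => [@classification] }, $class;
}

sub classification { return @{ $_[0]{classification} } }

# Assumes the list starts with the species (no subspecies element).
sub binomial {
    my ($species, $genus) = $_[0]->classification;
    return "$genus $species";
}

# Hypothetical isolation-source class: it has-a species object and keeps
# the source-specific attributes (tissue, organelle) itself.
package Sketch::IsolationSource;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

sub species   { return $_[0]{species} }
sub tissue    { return $_[0]{tissue} }
sub organelle { return $_[0]{organelle} }

package main;

my $source = Sketch::IsolationSource->new(
    species   => Sketch::Species->new(qw(sapiens Homo Hominidae)),
    tissue    => 'liver',
    organelle => 'mitochondrion',
);

printf "%s, %s, %s\n",
    $source->species->binomial, $source->tissue, $source->organelle;
```

The species object stays purely taxonomic here; anything about where
the sequence came from lives on the containing object.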
Beware: EMBL/GenBank format spreads this information out across a number
of different places, including feature table (wait for it - Chimeric
clones make life **very** difficult). We should keep our heads and design
what we feel are good objects, making sure that a large percentage of
EMBL/GenBank will fit nicely.
I refuse to let the bioperl object model be dominated by EMBL/GenBank
parsing issues. It is an evolutionary dead-end in my view.
> In addition, Jason advised me to post these things (bug-fixes, discussion
> stuff) here, but maybe I've misunderstood him and I'm too technical for this
> list. If so, please let me know, I'm new to all this.
>
This is bang on track. Keep on posting on this...
> Cheers,
>
> Hilmar
>
> Quoted from the updated documentation:
>
> =head2 classification
>
>  Title   : classification
>  Usage   : $self->classification(@class_array);
>            @classification = $self->classification();
>  Function: Fills or returns the classification list in
>            the object. The array provided must be in
>            the order SUBSPECIES, SPECIES, GENUS ---> KINGDOM.
>            The first and second element of the array, the subspecies and
>            species, must be in lower case, and the rest in title
>            case. Only species must be present.
>
>            Note that the format convention given above has changed after
>            release 0.60. Formerly, SUBSPECIES was not necessary. In order to
>            break as few scripts as possible, the method tries to recognize
>            whether or not the subspecies is provided, given that the rest
>            is given in correct case. This is the reason that the example given
>            below is still valid.
>  Example : $obj->classification(qw( sapiens Homo Hominidae
>                                     Catarrhini Primates Eutheria Mammalia Vertebrata
>                                     Chordata Metazoa Eukaryota));
>  Returns : Classification array
>  Args    : Classification array
>
> =cut
>
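
One way the case-based recognition described in the quoted documentation
could work, as a sketch only: split_classification() is a hypothetical
helper, not the code in Species.pm, and it assumes the caller respects
the case convention (subspecies and species in lower case, genus and
above in title case).

```perl
use strict;
use warnings;

# Hypothetical helper, not the actual Bio::Species code: decide whether the
# first element of a classification list is a subspecies or a species,
# relying only on the case convention from the quoted documentation.
sub split_classification {
    my @class = @_;
    die "classification needs at least a species\n" unless @class;

    my $subspecies;
    # If the second element starts in lower case it must be the species,
    # so the first element is the subspecies. If it starts in upper case
    # it is the genus, and no subspecies was given.
    if (@class > 1 && $class[1] =~ /^[a-z]/) {
        $subspecies = shift @class;
    }

    my ($species, @higher) = @class;
    die "species '$species' must be lower case\n"
        unless $species =~ /^[a-z]/;

    return ($subspecies, $species, @higher);
}

my ($sub, $sp, @rest) = split_classification(qw(sapiens Homo Hominidae Primates));
printf "subspecies=%s species=%s genus=%s\n",
    (defined $sub ? $sub : '(none)'), $sp, $rest[0];
```

Under that assumption a list without a subspecies keeps its first element
as the species; a genus written in lower case would defeat the check.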
> --
> -----------------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> NFI Vienna, IFD/Bioinformatics phone: +43 1 86634 631
> A-1235 Vienna fax: +43 1 86634 727
> ROI: Bioinformatics (arrays, expression, seqs), Programming, Databases,
> Mountain Biking (hard tail, hard fork: feel the trail)
> -----------------------------------------------------------------------
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
>
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------