[Bioperl-l] Bio::Species validate_species_name
Hilmar Lapp
hlapp at gmx.net
Wed Sep 21 03:40:04 EDT 2005
Sure, this is even likely once Entrez Gene comprises as many species
as e.g. UniProt does - which is not imminent AFAIK but may be the case
one day.
I like to take the position that throwing an exception is always better
than silently making a mistake, but in stream parsers an exception
typically isn't helpful for most users because 1) they can't fix the
data source, 2) often they can't fix the responsible Bioperl code
either, and 3) most often they don't care about the fringe species and
would have screened out that record anyway.
You could throw a warning if the species name doesn't look like it fits
your expected pattern, but even detecting that isn't easy.
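To make the warn-instead-of-throw idea concrete, here is a minimal sketch,
written in Python rather than Perl and not taken from Bioperl. The pattern
and the names check_species and parse_stream are made up for illustration;
a real expected pattern would be harder to get right, as noted above.

```python
import re
import warnings

# Hypothetical expected pattern: a lowercase epithet of letters and hyphens.
EXPECTED_SPECIES = re.compile(r"^[a-z][a-z-]*$")


def check_species(name, strict=False):
    """Return True if name fits the expected pattern.

    On a mismatch, raise ValueError when strict is set; otherwise
    emit a warning and return False so the caller can keep going.
    """
    if EXPECTED_SPECIES.match(name):
        return True
    message = f"unexpected species name: {name!r}"
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return False


def parse_stream(records):
    """Yield every (genus, species) record, warning on odd species names."""
    for genus, species in records:
        check_species(species)  # non-strict: warn, never abort the stream
        yield genus, species
```

With this shape a record such as ("Rhizophidium", "sp. 136") still comes
through the parser, and the user sees a warning instead of losing the
whole stream to an exception.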
-hilmar
On Sep 20, 2005, at 10:06 AM, Stefan Kirov wrote:
> Hilmar,
> Before I modify entrezgene to override the classification verification
> I have a question - what's the reason to do that? What about the
> danger of inconsistencies down the road? One can imagine a situation
> where the parser, having passed a bad classification, creates an invalid
> binomial name, which then breaks the user's code (for example database
> queries, file structure, etc.).
> Thanks!
> Stefan
>
> Hilmar Lapp wrote:
>
>> If you set the classification array and pass a second argument that
>> evaluates to true, validation is turned off. See
>> Bio::Species::classification. This is what the SeqIO parsers do (or
>> should do).
>>
>> -hilmar
>>
>> On Sep 19, 2005, at 6:56 AM, Stefan Kirov wrote:
>>
>>> NCBI and Bio::Species have very different views on the format of a
>>> species field. Is it possible to relax the species regular expression
>>> a bit? For example, 'Rhizophidium sp. 136' is a valid NCBI species
>>> name, but not in bioperl (the species name in this case starts with
>>> a number). There are other collisions as well, such as special
>>> characters found in the species name: ()'/
>>> Any thoughts?
>>> Stefan
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------