[Bioperl-l] Bio::Species validate_species_name

Hilmar Lapp hlapp at gmx.net
Wed Sep 21 03:40:04 EDT 2005


Sure, this is even likely once Entrez Gene comprises as many species 
as e.g. UniProt does - which is not imminent AFAIK but may be the case 
one day.

I like to take the position that throwing an exception is always better 
than silently making a mistake. In the case of stream parsers, however, 
an exception typically isn't helpful for most users because 1) they 
can't fix the datasource, 2) often they can't fix the responsible 
Bioperl code either, and 3) most often they don't care about the fringe 
species and would have screened out that record anyway.

You could throw a warning if the species name doesn't look like it fits 
your expected pattern, but even detecting that isn't easy.
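
To illustrate the warn-instead-of-throw idea above, here is a minimal, 
language-agnostic sketch in Python. It is not Bioperl code: the pattern 
EXPECTED_EPITHET, the functions check_species_name and parse_records, and 
the (genus, species) record layout are all assumptions made up for this 
example, and the pattern is not the regular expression Bio::Species uses.

```python
import re
import warnings

# Assumed pattern, for illustration only: a lowercase epithet, optionally
# hyphenated. This is NOT the regular expression used by Bio::Species.
EXPECTED_EPITHET = re.compile(r"^[a-z]+(?:-[a-z]+)*$")


def check_species_name(name, strict=False):
    """Return True if `name` fits the assumed pattern, False otherwise.

    With strict=True a mismatch raises ValueError. With strict=False it
    only emits a warning, so a stream parser can keep going.
    """
    if EXPECTED_EPITHET.match(name):
        return True
    message = f"species name {name!r} does not fit the expected pattern"
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return False


def parse_records(records, strict=False):
    """Yield every (genus, species) pair, checking each species name.

    `records` is a hypothetical stand-in for a parsed data stream.
    """
    for genus, species in records:
        check_species_name(species, strict=strict)
        yield genus, species
```

In the lenient mode an odd name such as the hypothetical pair 
("Rhizophidium", "sp. 136") produces a warning and the record is still 
returned, so the user can decide whether to screen it out. The hard part 
mentioned above remains: choosing a pattern that flags real problems 
without flagging valid NCBI names.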

	-hilmar

On Sep 20, 2005, at 10:06 AM, Stefan Kirov wrote:

> Hilmar,
> Before I modify entrezgene to override the classification verification 
> I have a question - what's the reason to do that? What about the 
> danger of inconsistencies down the road? One can imagine a situation 
> where the parser, having passed a bad classification, creates an invalid 
> binomial name, which then breaks the user's code (for example database 
> queries, file structure, etc.).
> Thanks!
> Stefan
>
> Hilmar Lapp wrote:
>
>> If you set the classification array and pass a second argument that 
>> evaluates to true, validation is turned off. See 
>> Bio::Species::classification. This is what the SeqIO parsers do (or 
>> should do).
>>
>>     -hilmar
>>
>> On Sep 19, 2005, at 6:56 AM, Stefan Kirov wrote:
>>
>>> NCBI and Bio::Species have very different views on what the 
>>> format of a species field is. Is it possible to relax the species 
>>> regular expression a bit? For example, 'Rhizophidium sp. 136' is a valid 
>>> NCBI species name, but this is not the case in bioperl (the species name 
>>> in this case starts with a number). There are other collisions 
>>> as well, such as special characters found in the species name: ()'/
>>> Any thoughts?
>>> Stefan
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



