[Bioperl-l] Bio::Species validate_species_name
Stefan Kirov
skirov at utk.edu
Wed Sep 21 08:46:05 EDT 2005
Thanks for the great answer, Hilmar!
I would still prefer to have some kind of check available if the user
wants one. For example, the Entrezgene file contains HTML tags in the
species names of some entries, which is good to know about.
I will put an option -validate_species in the constructor to turn the
check on and off. Maybe a species filter would be of some use as well,
though you can just select the correct file from the NCBI site.
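
A rough sketch of what such a switch could look like, written in Python
purely for illustration and not as Bioperl code. The function name, the
`validate` flag and both patterns are assumptions; the relaxed pattern
is only a guess at what would cover names like 'Rhizophidium sp. 136'
and the special characters ()'/ mentioned earlier in the thread. It
warns instead of throwing, so a stream parser can keep going.

```python
import re
import warnings

# Hypothetical relaxed pattern: a leading letter, then letters, digits,
# spaces and the punctuation seen in NCBI names. Not the Bio::Species regex.
_RELAXED_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9 .()'/-]*$")

# Anything that looks like an HTML tag, e.g. <i> or </i>.
_HTML_TAG = re.compile(r"<[^>]+>")


def check_species_name(name, validate=True):
    """Return True if the name looks acceptable, False otherwise.

    Problems are reported as warnings rather than exceptions.
    With validate=False the check is skipped entirely.
    """
    if not validate:
        return True
    if _HTML_TAG.search(name):
        warnings.warn(f"species name contains HTML markup: {name!r}")
        return False
    if not _RELAXED_NAME.match(name):
        warnings.warn(f"unexpected species name: {name!r}")
        return False
    return True
```

A caller could then skip or log a record when the function returns
False, and pass validate=False to get the current permissive behaviour.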
Thanks again!
Stefan
Hilmar Lapp wrote:
> Sure, this is even likely once Entrez Gene comprises as many species
> as e.g. UniProt does - which is not imminent AFAIK but may be the case
> one day.
>
> I like to take the position that throwing an exception is always
> better than silently making a mistake, but in the case of stream
> parsers an exception typically isn't helpful for most users because 1)
> they can't fix the datasource, 2) often they can't fix the responsible
> Bioperl code either, and 3) most often they don't care about the
> fringe species and would have screened out that record anyway.
>
> You could throw a warning if the species name doesn't look like it
> fits your expected pattern, but even detecting that isn't easy.
>
> -hilmar
>
> On Sep 20, 2005, at 10:06 AM, Stefan Kirov wrote:
>
>> Hilmar,
>> Before I modify entrezgene to override the classification
>> verification I have a question - what's the reason to do that? What
>> about the danger of inconsistencies down the road? One can imagine a
>> situation where the parser, having passed a bad classification,
>> creates an invalid binomial name, which then breaks the user's code
>> (for example database queries, file structure, etc.).
>> Thanks!
>> Stefan
>>
>> Hilmar Lapp wrote:
>>
>>> If you set the classification array and pass a second argument that
>>> evaluates to true, validation is turned off. See
>>> Bio::Species::classification. This is what the SeqIO parsers do (or
>>> should do).
>>>
>>> -hilmar
>>>
>>> On Sep 19, 2005, at 6:56 AM, Stefan Kirov wrote:
>>>
>>>> NCBI and Bio::Species have very different views on what the
>>>> format of a species field is. Is it possible to relax the species
>>>> regular expression a bit? For example, 'Rhizophidium sp. 136' is
>>>> a valid NCBI species name, but this is not the case in bioperl
>>>> (the species name in this case starts with a number). There are
>>>> other collisions as well, such as special characters found in the
>>>> species name: ()'/
>>>> Any thoughts?
>>>> Stefan
--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov at utk.edu
sao at ornl.gov
"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"