[Bioperl-l] Species name validation problem
Hilmar Lapp
hlapp at gmx.net
Mon Mar 27 18:29:40 UTC 2006
I agree. Can you file this on Bugzilla as a feature request, basically
copying and pasting your email below?
On Mar 27, 2006, at 10:24 AM, David Waner wrote:
> Yes, I meant to type Bio::Species, not Bio::Seq. Sorry for the
> confusion.
>
> My problem is that I am not calling $species->classification()
> directly; I am calling Bio::Species->new(), which in turn calls
> classification(), which calls validate_species_name(), which then
> throws an exception on some species names. As far as I can see, there
> is no way to turn off this (over-aggressive) validation in the Species
> constructor.
>
> I guess that instead of this:
>
> $species = Bio::Species->new(-classification => \@classificationArray);
>
> I could do this:
>
> $species = Bio::Species->new();
> $species->classification(\@classificationArray, 'no validation');
>
> but it would make a nicer interface to have a validation option in the
> Species constructor.
>
> - David
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Friday, March 24, 2006 9:42 PM
> To: David Waner
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Species name validation problem
>
>
> The option would be in Bio::Species, not Bio::Seq. You can circumvent
> the name validation by passing an array ref to
> $species->classification() and anything that evaluates to true as the
> second argument. This is for instance what the genbank parser does
> (which doesn't mean that it is always correct); supposedly the
> swissprot parser ought to do the same.
>
> -hilmar
>
> On 3/24/06, David Waner <dwaner at scitegic.com> wrote:
>> I have found that Bio::Seq->new() throws exceptions on some "species"
>> names containing special characters, or consisting of a single letter,
>> e.g.:
>>
>> SwissProt: POLN_ONNVG O'nyong-nyong virus
>> SwissProt: FIBP_ADE1H Human adenovirus 15/H9
>> SwissProt: POLG_FMDVZ Foot-and-mouth disease virus (strain A22/550 Azerbaijan 65)
>> SwissProt: RIR1_BHV1C Bovine herpesvirus 1.1
>> SwissProt: SODF_METJ Methylomonas J
>> GenBank: AJ416726 Stylosanthes aff. calcicola
>>
>> It seems that the regex in validate_species_name() is too restrictive,
>> but I can't find a way to turn off validation without editing bioperl
>> modules. There has been some recent discussion of this issue on the
>> mailing list (see below). Does anyone know if or when a
>> -validate_species option to Bio::Seq->new() will be added? Or should I
>> just propose the code change?
>>
>> Thanks,
>> David Waner
>>
>>
>>> Stefan Kirov skirov at utk.edu
>>> Wed Sep 21 08:46:05 EDT 2005
>>>
>>>
>>> Thanks for the great answer Hilmar!
>>> I would prefer to have some kind of a check if the user wishes so.
>>> For example, the Entrezgene file contains some HTML tags in some
>>> entries' species names, which is good to know.
>>> I will put an option -validate_species in the constructor to turn
>>> the check on and off. Maybe a species filter can be of some use as
>>> well, though you can just select the correct file from the NCBI
>>> site.... Thanks again!
>>> Stefan
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------