[Bioperl-l] Base symbols recognized by Bio::Restriction::Enzyme and Bio::Tools::IUPAC

Mon Nov 13 00:15:37 UTC 2006

On Nov 12, 2006, at 4:21 PM, Conrad Halling wrote:

> A followup:
>
> Bio::Tools::IUPAC::iupac_iub() also returns 'U' as a valid base. My
> idea of getting an authoritative set of base symbols from
> Bio::Tools::IUPAC does not work well.
>
> I will leave the valid bases in Bio::Restriction::Enzyme set to what
> is already used in the module ('ABCDGHKMNRSTVWY').
>
> -- Conrad

Conrad,

You could grab a hash of DNA/RNA codes from Bio::Tools::IUPAC using  
iupac_iub(), then modify that for internal use in Bio::Restriction  
modules by adding/deleting what you want.  Or add a method to  
Bio::Tools::IUPAC that adds/deletes key-value pairs in the object to  
your specifications.
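
As a rough sketch of the first option: copy the hash, drop the symbols
you do not want, and build the valid-symbol string from what is left.
The hash below is only a stand-in for what
Bio::Tools::IUPAC::iupac_iub() returns. Its keys match the symbols in
the error message quoted further down, but the values and the helper
name unrecognized() are assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for the hash returned by Bio::Tools::IUPAC::iupac_iub().
# Only the keys matter here; the values are an assumed approximation.
my %iupac_iub = (
    A => [qw(A)],     C => [qw(C)],     G => [qw(G)],
    T => [qw(T)],     U => [qw(U)],
    M => [qw(A C)],   R => [qw(A G)],   W => [qw(A T)],
    S => [qw(C G)],   Y => [qw(C T)],   K => [qw(G T)],
    V => [qw(A C G)], H => [qw(A C T)], D => [qw(A G T)],
    B => [qw(C G T)],
    X => [qw(G A T C)], N => [qw(G A T C)],
);

# Work on a copy so the shared table is left untouched.
my %site_symbols = %iupac_iub;
delete @site_symbols{qw(U X)};    # drop the RNA base and the non-IUPAC 'X'

my $valid = join '', sort keys %site_symbols;
print "$valid\n";                 # ABCDGHKMNRSTVWY

# Hypothetical helper: return the characters of a site that are not in
# the valid set. '^' is allowed because it marks the cut position.
sub unrecognized {
    my ($site) = @_;
    (my $bad = uc $site) =~ s/[\Q$valid\E^]//g;
    return $bad;
}

print "ok\n" if unrecognized('G^AATTC') eq '';
```

The result is the same set Bio::Restriction::Enzyme already uses, but
derived from one table instead of being typed in twice.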

If there are non-IUPAC symbols present in the module, we should
consider why they are there.  Were they added as a quick fix, or for
other reasons?  Strictly speaking, I would say a module named
Bio::Tools::IUPAC should not contain non-IUPAC symbols, and
modifications should be made on a case-by-case basis (in objects)
rather than universally (in classes).  You could always remove any
non-standard symbols and see what breaks.
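
To make the object-versus-class distinction concrete, here is a minimal
sketch. My::SymbolSet and its add/remove/string methods are
hypothetical and not part of BioPerl. The point is only that the
default set stays fixed at the class level, while each object carries
its own copy that can be adjusted.

```perl
use strict;
use warnings;

package My::SymbolSet;    # hypothetical helper, not part of BioPerl

# Class-level default, shared by all objects and never edited in place.
my @DEFAULT = split //, 'ABCDGHKMNRSTVWY';

sub new {
    my ($class) = @_;
    # Each object gets its own copy of the defaults.
    my $self = { symbols => { map { $_ => 1 } @DEFAULT } };
    return bless $self, $class;
}

# Add or remove symbols for this object only; both return the object.
sub add    { my ($self, @s) = @_; $self->{symbols}{uc $_} = 1 for @s; return $self }
sub remove { my ($self, @s) = @_; delete $self->{symbols}{uc $_} for @s; return $self }

# Sorted string of the symbols this object accepts.
sub string { my ($self) = @_; return join '', sort keys %{ $self->{symbols} } }

package main;

my $strict  = My::SymbolSet->new;
my $lenient = My::SymbolSet->new->add('X');

print $strict->string,  "\n";    # ABCDGHKMNRSTVWY
print $lenient->string, "\n";    # ABCDGHKMNRSTVWXY
```

Code that needs 'X' opts in on its own object, and everything else
keeps the stricter default.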

By the way, I think I can speak for many here by saying that we are  
happy you will take up the Bio::Restriction classes.  They definitely  
need some work!

Chris

> Conrad Halling wrote:
>> Quick summary:
>>
>> 'X' is recognized as a valid base symbol by Bio::Tools::IUPAC but
>> not by Bio::Restriction::Enzyme. Should 'X' be removed from
>> Bio::Tools::IUPAC or should it be added to Bio::Restriction::Enzyme?
>>
>> Detailed explanation:
>>
>> I tried to use the Bio::Restriction modules to perform a simple
>> restriction analysis of some sequences I'm using at work, and I found
>> the documentation and code confusing. So I'm volunteering to overhaul
>> and redocument these modules. As part of this effort, I am also
>> volunteering to fix the Bio::Restriction::IO::bairoch module.
>>
>> I have begun writing a test suite, RestrictionEnzyme.t, for the
>> Bio::Restriction::Enzyme module. For one of the tests, I created a
>> Bio::Restriction::Enzyme object with a recognition sequence that
>> includes all of the IUPAC base symbols along with the caret ('^')
>> symbol. A code excerpt is:
>>
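>> use Bio::Tools::IUPAC;
>> use Bio::Restriction::Enzyme;
>>
>> # Build a site from every symbol Bio::Tools::IUPAC knows,
>> # prefixed with the cut marker '^'.
>> my %iupac_iub = Bio::Tools::IUPAC::iupac_iub();
>> my $site = join( '', '^', sort( keys( %iupac_iub ) ) );
>>
>> # new() throws if the site contains a symbol it does not accept.
>> ok $enzyme =
>>     Bio::Restriction::Enzyme->new(
>>         -name   => 'IUPAC-IUB',
>>         -site   => $site );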
>>
>> This test fails because the Bio::Tools::IUPAC module includes 'X' as
>> a valid base symbol, whereas Bio::Restriction::Enzyme does not.
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Unrecognized characters in site: [^ABCDGHKMNRSTUVWXY]
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:359
>> STACK: Bio::Restriction::Enzyme::site Bio/Restriction/Enzyme.pm:441
>> STACK: Bio::Restriction::Enzyme::new Bio/Restriction/Enzyme.pm:337
>> STACK: t/RestrictionEnzyme.t:184
>> -----------------------------------------------------------
>>
>> The symbols recognized by Bio::Restriction::Enzyme and
>> Bio::Tools::IUPAC need to be synchronized. Since 'X' is not
>> recommended by "Nomenclature for Incompletely Specified Bases in
>> Nucleic Acid Sequences" (see
>> http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html) (largely because it
>> stands for xanthine), I am in favor of removing it as a valid symbol.
>>
>> But I have a feeling that if 'X' is removed as a valid symbol from
>> Bio::Tools::IUPAC, this will break a lot of existing code. So the
>> simplest solution seems to be to add 'X' to the symbols recognized by
>> Bio::Restriction::Enzyme.
>>
>> Does anyone have a recommendation?
>>
>>
>
> -- 
> Conrad Halling
> chhalling at verizon.net
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign