[Bioperl-l] Base symbols recognized by Bio::Restriction::Enzyme and Bio::Tools::IUPAC

Conrad Halling chhalling at verizon.net
Sun Nov 12 22:21:06 UTC 2006


A followup:

Bio::Tools::IUPAC::iupac_iub() also returns 'U' as a valid base, so my 
idea of getting an authoritative set of base symbols from 
Bio::Tools::IUPAC does not work well.

I will leave the valid bases in Bio::Restriction::Enzyme set to what is 
already used in the module ('ABCDGHKMNRSTVWY').
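
For reference, here is a minimal sketch of how the two symbol sets 
differ. It hard-codes both sets rather than calling Bio::Tools::IUPAC 
(the IUPAC set is copied from the exception message quoted below), and 
the variable names are only illustrative:

```perl
use strict;
use warnings;

# Keys of iupac_iub(), as they appear in the exception message (assumed complete).
my @iupac_symbols = split //, 'ABCDGHKMNRSTUVWXY';

# Symbols currently accepted by Bio::Restriction::Enzyme, as stated above.
my %enzyme_symbols = map { $_ => 1 } split //, 'ABCDGHKMNRSTVWY';

# Symbols that Bio::Tools::IUPAC knows but Bio::Restriction::Enzyme rejects.
my @extra = grep { !$enzyme_symbols{$_} } @iupac_symbols;

print "Only in Bio::Tools::IUPAC: @extra\n";    # prints: Only in Bio::Tools::IUPAC: U X
```

Any test that builds a site from the iupac_iub() keys would need to 
drop these two symbols first.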

-- Conrad

Conrad Halling wrote:
> Quick summary:
>
> 'X' is recognized as a valid base symbol by Bio::Tools::IUPAC but not by 
> Bio::Restriction::Enzyme. Should 'X' be removed from Bio::Tools::IUPAC 
> or should it be added to Bio::Restriction::Enzyme?
>
> Detailed explanation:
>
> I tried to use the Bio::Restriction modules to perform a simple 
> restriction analysis of some sequences I'm using at work, and I found 
> the documentation and code confusing. So I'm volunteering to overhaul 
> and redocument these modules. As part of this effort, I am also 
> volunteering to fix the Bio::Restriction::IO::bairoch module.
>
> I have begun writing a test suite, RestrictionEnzyme.t, for the 
> Bio::Restriction::Enzyme module. For one of the tests, I created a 
> Bio::Restriction::Enzyme object with a recognition sequence that 
> includes all of the IUPAC base symbols along with the caret ('^') 
> symbol. A code excerpt is:
>
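> use Bio::Tools::IUPAC;
> use Bio::Restriction::Enzyme;
>
> # Build a site from every symbol Bio::Tools::IUPAC knows, prefixed by
> # the cut marker ('^').
> my %iupac_iub = Bio::Tools::IUPAC::iupac_iub();
> my $site = join( '', '^', sort( keys( %iupac_iub ) ) );
> ok $enzyme =
>     Bio::Restriction::Enzyme->new(
>         -name   => 'IUPAC-IUB',
>         -site   => $site );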
>
> This test fails because the Bio::Tools::IUPAC module includes 'X' as a 
> valid base symbol, whereas Bio::Restriction::Enzyme does not.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Unrecognized characters in site: [^ABCDGHKMNRSTUVWXY]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:359
> STACK: Bio::Restriction::Enzyme::site Bio/Restriction/Enzyme.pm:441
> STACK: Bio::Restriction::Enzyme::new Bio/Restriction/Enzyme.pm:337
> STACK: t/RestrictionEnzyme.t:184
> -----------------------------------------------------------
>
> The symbols recognized by Bio::Restriction::Enzyme and Bio::Tools::IUPAC 
> need to be synchronized. 'X' is not recommended by "Nomenclature for 
> Incompletely Specified Bases in Nucleic Acid Sequences" (see 
> http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html), largely because it 
> stands for xanthine, so I am in favor of removing it as a valid symbol.
>
> But I have a feeling that if 'X' is removed as a valid symbol from 
> Bio::Tools::IUPAC, this will break a lot of existing code. So the 
> simplest solution seems to be to add 'X' to the symbols recognized by 
> Bio::Restriction::Enzyme.
>
> Does anyone have a recommendation?

-- 
Conrad Halling
chhalling at verizon.net



