[Bioperl-l] Base symbols recognized by Bio::Restriction::Enzyme and Bio::Tools::IUPAC
Conrad Halling
chhalling at verizon.net
Sun Nov 12 18:11:54 UTC 2006
Quick summary:
'X' is recognized as a valid base symbol by Bio::Tools::IUPAC but not by
Bio::Restriction::Enzyme. Should 'X' be removed from Bio::Tools::IUPAC
or should it be added to Bio::Restriction::Enzyme?
Detailed explanation:
I tried to use the Bio::Restriction modules to perform a simple
restriction analysis of some sequences I'm using at work, and I found
the documentation and code confusing. So I'm volunteering to overhaul
and redocument these modules. As part of this effort, I am also
volunteering to fix the Bio::Restriction::IO::bairoch module.
I have begun writing a test suite, RestrictionEnzyme.t, for the
Bio::Restriction::Enzyme module. For one of the tests, I created a
Bio::Restriction::Enzyme object with a recognition sequence that
includes all of the IUPAC base symbols along with the caret ('^')
symbol. A code excerpt is:
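    use Bio::Restriction::Enzyme;
    use Bio::Tools::IUPAC;

    # Build a site from every symbol Bio::Tools::IUPAC knows, preceded by
    # the caret that marks the cut position.
    my %iupac_iub = Bio::Tools::IUPAC::iupac_iub();
    my $site = join( '', '^', sort( keys( %iupac_iub ) ) );

    # Constructing the enzyme should succeed if the two modules agree.
    ok $enzyme =
        Bio::Restriction::Enzyme->new(
            -name => 'IUPAC-IUB',
            -site => $site );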
This test fails because the Bio::Tools::IUPAC module includes 'X' as a
valid base symbol, whereas Bio::Restriction::Enzyme does not:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Unrecognized characters in site: [^ABCDGHKMNRSTUVWXY]
STACK: Error::throw
STACK: Bio::Root::Root::throw Bio/Root/Root.pm:359
STACK: Bio::Restriction::Enzyme::site Bio/Restriction/Enzyme.pm:441
STACK: Bio::Restriction::Enzyme::new Bio/Restriction/Enzyme.pm:337
STACK: t/RestrictionEnzyme.t:184
-----------------------------------------------------------
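The exception prints the whole site string rather than the offending
characters, so it takes a moment to spot the 'X'. Here is a minimal,
self-contained sketch of how the offending symbols could be isolated.
The helper unrecognized_symbols() and the $allowed string are
hypothetical and are not taken from Enzyme.pm; $allowed is simply the
site string above with 'X' removed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the distinct characters
# in $site that do not occur in the string of allowed symbols.
sub unrecognized_symbols {
    my ( $site, $allowed ) = @_;
    my @bad = grep { index( $allowed, uc($_) ) < 0 } split //, $site;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @bad;
    return sort @unique;
}

# Site string copied from the exception message above.
my $site = '^ABCDGHKMNRSTUVWXY';

# ASSUMPTION: the same symbols minus 'X', standing in for whatever
# Bio::Restriction::Enzyme actually accepts.
my $allowed = '^ABCDGHKMNRSTUVWY';

my @offending = unrecognized_symbols( $site, $allowed );
print "Unrecognized: @offending\n";    # prints "Unrecognized: X"
```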
The symbols recognized by Bio::Restriction::Enzyme and Bio::Tools::IUPAC
need to be synchronized. 'X' is not recommended by "Nomenclature for
Incompletely Specified Bases in Nucleic Acid Sequences" (see
http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html), largely because it
stands for xanthine, so I am in favor of removing it as a valid symbol.
But I have a feeling that if 'X' is removed as a valid symbol from
Bio::Tools::IUPAC, this will break a lot of existing code. So the
simplest solution seems to be to add 'X' to the symbols recognized by
Bio::Restriction::Enzyme.
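One way to keep the two in step would be to build the site check from a
single shared symbol list instead of a hand-written character class.
The following is only a sketch under that assumption: @base_symbols,
$invalid_re and site_is_valid() are hypothetical names, and this is not
how Enzyme.pm is currently written.

```perl
use strict;
use warnings;

# Shared symbol list, copied from the site string in the exception message
# in this post; 'X' is kept so that both consumers agree.
my @base_symbols = split //, 'ABCDGHKMNRSTUVWXY';

# Negated character class built from the shared list plus the cut marker '^'.
my $class      = join '', map { quotemeta($_) } @base_symbols, '^';
my $invalid_re = qr/[^$class]/i;

# Hypothetical check: true (1) if every character in $site is a known symbol
# or the caret, false (0) otherwise.
sub site_is_valid {
    my ($site) = @_;
    return $site !~ $invalid_re ? 1 : 0;
}

for my $site ( '^ABCDGHKMNRSTUVWXY', 'GAAT^TC', 'GA-TC' ) {
    printf "%-20s %s\n", $site, site_is_valid($site) ? 'ok' : 'rejected';
}
```

With this arrangement, adding or removing 'X' is a one-line change to the
shared list.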
Does anyone have a recommendation?
--
Conrad Halling
chhalling at verizon.net