[Biojava-l] RestrictionEnzyme can't handle double sites
mark.schreiber at novartis.com
Wed Jun 22 21:01:12 EDT 2005
What would be your recommended solution to this problem?
"Jesse" <jesse-t at chello.nl>
Sent by: biojava-l-bounces at portal.open-bio.org
06/22/2005 11:05 PM
To: <biojava-l at biojava.org>
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] RestrictionEnzyme can't handle double sites
Another problem.
Some restriction enzymes have more than one recognition site. Usually this
can be written with ambiguity symbols, but for some restriction enzymes
this is not possible, because the ambiguity symbols depend on each other.
Usually a site with ambiguity symbols looks something like this:
ANNC
The first "N" is independent of the second "N". For example, it can match
with:
AAAC
AACC
AAGC
AATC
....
....
ATTC
16 possibilities. The ambiguous symbols are independent of each other.
But for some restriction enzymes the ambiguity symbols depend on each
other, so a sequence like
ANNC
would then match only:
AAAC
ACCC
AGGC
ATTC
Only 4 possibilities. The ambiguity symbols depend on each other.
This happens with these enzymes:
TaqII
M.PhiBssHII (unknown cutlocation)
M.Phi3TI (unknown cutlocation)
M.Rho11sI (unknown cutlocation)
M.SPBetaI (unknown cutlocation)
M.SPRI (unknown cutlocation)
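The difference between independent and dependent ambiguity symbols described above can be sketched with plain java.util.regex. This is only an illustration: the class and method names are hypothetical and not part of BioJava, and whether RestrictionEnzyme could be made to accept a backreference pattern is an open question, not something this example establishes.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of BioJava.
class DependentAmbiguityDemo {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    // Counts how many of the 16 sequences of the form A??C the regex matches in full.
    static int countMatches(String regex) {
        Pattern p = Pattern.compile(regex);
        int count = 0;
        for (char x : BASES) {
            for (char y : BASES) {
                String seq = "A" + x + y + "C";
                if (p.matcher(seq).matches()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Independent positions: each N may be any base.
        System.out.println(countMatches("A[ACGT][ACGT]C"));
        // Dependent positions: the second N must repeat the first (backreference \1).
        System.out.println(countMatches("A([ACGT])\\1C"));
    }
}
```

The first pattern accepts all 16 combinations, the second only AAAC, ACCC, AGGC and ATTC. In the same spirit, the two TaqII sites GACCGA and CACCCA could be written as ([GC])ACC\1A.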
<1>TaqII
<2>
<3>GACCGA(11/9),CACCCA(11/9)
<4>
<5>Thermus aquaticus YTI
<6>J.I. Harris
<7>X
<8>Barker, D., Hoff, M., Oliphant, A., White, R., (1984) Nucleic Acids
Res., vol. 12, pp. 5567-5581.
Myers, P.A., Roberts, R.J., Unpublished observations.
Rutkowska, S.M., Jaworowska, I., Skowron, P.M., Unpublished observations.
RestrictionEnzymeManager keeps only the last recognition site in this
example; it skips GACCGA.
Name: TaqII
RecognitionSite:caccca
ForwardRegex: cac{3}a
ReverseRegex: tg{3}tg
CutType: 0
DownStreamEndType: 0
IsPalindromic: false
DownstreamCut: 17, 15,
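One possible way to avoid losing GACCGA would be to split field <3> on commas and keep every site. The following is a minimal sketch under that assumption; MultiSiteParser and Site are hypothetical names, not BioJava classes, and the "(11/9)" numbers are stored exactly as they appear in the REBASE entry without further interpretation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: splits REBASE field <3> into its sites.
class MultiSiteParser {
    // One recognition site with its optional "(top/bottom)" cut annotation.
    static final class Site {
        final String sequence;
        final Integer topCut;     // null when no "(x/y)" annotation is present
        final Integer bottomCut;  // null when no "(x/y)" annotation is present

        Site(String sequence, Integer topCut, Integer bottomCut) {
            this.sequence = sequence;
            this.topCut = topCut;
            this.bottomCut = bottomCut;
        }
    }

    // A site (letters, optionally with a '^' cut marker), then an optional "(x/y)".
    private static final Pattern SITE =
        Pattern.compile("([A-Za-z^]+)(?:\\((-?\\d+)/(-?\\d+)\\))?");

    static List<Site> parse(String field3) {
        List<Site> sites = new ArrayList<>();
        for (String part : field3.split(",")) {
            Matcher m = SITE.matcher(part.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("Unrecognised site: " + part);
            }
            Integer top = m.group(2) == null ? null : Integer.valueOf(m.group(2));
            Integer bottom = m.group(3) == null ? null : Integer.valueOf(m.group(3));
            sites.add(new Site(m.group(1), top, bottom));
        }
        return sites;
    }

    public static void main(String[] args) {
        for (Site s : parse("GACCGA(11/9),CACCCA(11/9)")) {
            System.out.println(s.sequence + " " + s.topCut + "/" + s.bottomCut);
        }
    }
}
```

With the TaqII entry above, this yields two sites instead of one, so a caller could build one enzyme object (or one regex alternative) per site.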
- Jesse
-----Original message-----
From: biojava-l-bounces at portal.open-bio.org
[mailto:biojava-l-bounces at portal.open-bio.org] On behalf of Jesse
Sent: Wednesday, 22 June 2005 12:09
To: biojava-l at biojava.org
Subject: RE: [Biojava-l] RestrictionEnzymeManager can't correctly handle
incomplete enzymes
(I'm not an expert on restriction enzymes.)
I was talking about AacI, of which BamHI is an isoschizomer. The cut
location of AacI is unknown, but the one for BamHI is known. Maybe
RestrictionEnzymeManager uses the cut location of BamHI when asked for the
unknown cut location of AacI.
http://rebase.neb.com/rebase/enz/AacI.html
That might also be the reason why RestrictionEnzymeManager requires links
between restriction enzymes. If a restriction enzyme entry is removed from
the REBASE file, RestrictionEnzymeManager fails to read the file in some cases.
But I think using the cut location of an isoschizomer is wrong, because of this:
REBASE says: "A isoschizomers is a restriction enzymes that recognize the
same DNA sequence. The cut sites may or may not be identical."
So the cut site may differ from one isoschizomer to another.
I searched for examples in the REBASE file, and found them:
<1>BspKT6I
<2>MboI,AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscF
I,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,
Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105
I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI
,BspJ64I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstMBI,
BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1786
I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,CcoP
31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338I,C
paI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,FnuAII
,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I,Ls
p1109II,Mel3JI,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,Mk
rAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI,N
deII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,PfaI
,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI,
SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu247
9I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I,R
2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I,
TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Uba
1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I
<3>GAT^C
<4>2(6)
<5>Bacillus species KT6
<6>N.I. Matvienko
<7>
<8>Shapovalova, N.I., Zheleznaja, L.A., Matvienko, N.I., (1993) Nucleic
Acids Res., vol. 21, pp. 5794.
Shapovalova, N.I., Zheleznaya, L.A., Matvienko, N.I., (1994) Biokhimiia,
vol. 59, pp. 1730-1738.
<1>MboI
<2>AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscFI,Bsm
XII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,Bsp60
I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105I,Bsp
122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI,BspJ
64I,BspKT6I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstM
BI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1
786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,C
coP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338
I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,Fnu
AII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I
,Lsp1109II,Mel3JI,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII
,MkrAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciA
I,NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,P
faI,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,Sau
EI,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu
2479I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074
I,R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth36
8I,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,
Uba1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I
<3>^GATC
<4>2(6)
<5>Moraxella bovis
<6>ATCC 10900
<7>ACFGKNQRUVX
<8>Anton, B.P., Brooks, J.E., Unpublished observations.
Gelinas, R.E., Myers, P.A., Roberts, R.J., (1977) J. Mol. Biol., vol. 114,
pp. 169-179.
Huang, L.-H., Farnet, C.M., Ehrlich, K.C., Ehrlich, M., (1982) Nucleic
Acids Res., vol. 10, pp. 1579-1591.
Ueno, T., Ito, H., Kimizuka, F., Kotani, H., Nakajima, K., (1993) Nucleic
Acids Res., vol. 21, pp. 2309-2313.
Ueno, T., Ito, H., Kotani, H., Nakajima, K., Japanese Patent Office, 1993.
<1>Mel3JI
<2>MboI,AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscF
I,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,
Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105
I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI
,BspJ64I,BspKT6I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI
,BstMBI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I
,Bth1786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,C
acI,CcoP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,Cj
eP338I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHC
I,FnuAII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,Ll
aKR2I,Lsp1109II,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,M
krAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI,
NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,Pfa
I,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI
,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu24
79I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I,
R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I
,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Ub
a1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I
<3>GATC
<4>
<5>Megasphaera elsedenii 3J
<6>P. Pristas
<7>
<8>Piknova, M., Filova, M., Javorsky, P., Pristas, P., (2004) FEMS
Microbiol. Lett., vol. 236, pp. 91-95.
Piknova, M., Pristas, P., Javorsky, P., (2004) Folia Microbiol. (Praha),
vol. 49, pp. 191-193.
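To make the comparison between the three entries above concrete, here is a small sketch that reads the cut position from the '^' marker in field <3>. CaretCutDemo is a hypothetical name, not a BioJava class, and the code only assumes that a missing '^' means no cut position is given in that field.

```java
import java.util.OptionalInt;

// Hypothetical demo class, not part of BioJava.
class CaretCutDemo {
    // Returns the number of bases before the '^' cut marker,
    // or an empty OptionalInt when the field has no marker.
    static OptionalInt cutOffset(String field3) {
        int idx = field3.indexOf('^');
        return idx < 0 ? OptionalInt.empty() : OptionalInt.of(idx);
    }

    public static void main(String[] args) {
        System.out.println("BspKT6I " + cutOffset("GAT^C"));
        System.out.println("MboI    " + cutOffset("^GATC"));
        System.out.println("Mel3JI  " + cutOffset("GATC"));
    }
}
```

BspKT6I cuts after the third base, MboI before the first, and Mel3JI has no cut position at all, although all three recognise GATC. Copying the cut position from one isoschizomer to another would therefore give a wrong result for at least one of them.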
-----Original message-----
From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
Sent: Wednesday, 22 June 2005 11:25
To: Jesse
CC: biojava-l at biojava.org; biojava-l-bounces at portal.open-bio.org
Subject: Re: [Biojava-l] RestrictionEnzymeManager can't correctly handle
incomplete enzymes
I take your point, but I notice that BamHI is an isoschizomer. Is the
cleavage site of BamHI really unknown?
- Mark
"Jesse" <jesse-t at chello.nl>
Sent by: biojava-l-bounces at portal.open-bio.org
06/22/2005 04:15 PM
To: <biojava-l at biojava.org>
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] RestrictionEnzymeManager can't correctly handle
incomplete enzymes
RestrictionEnzymeManager can't correctly handle incomplete enzymes and
gives wrong data.
(Correct me if I'm wrong.)
I'm not sure whether this has already been discussed.
I think RestrictionEnzymeManager cannot handle incomplete restriction
enzymes.
BioJava 1.4Pre2 knows two types of RestrictionEnzymes:
-RestrictionEnzyme.CUT_SIMPLE
-RestrictionEnzyme.CUT_COMPOUND
But in REBASE, there are also other restriction enzyme entries:
-Unknown recognition sites. For example "<3>?". RestrictionEnzymeManager
skips these (which is OK).
-Unknown cut location. For example AacI "<3>GGATCC".
The problem with RestrictionEnzymeManager lies with those REBASE entries
that have an unknown cut location. RestrictionEnzymeManager will actually
report a cut location, even though it is unknown in the REBASE file.
For example:
http://rebase.neb.com/rebase/link_withrefm
--------- REBASE ENTRY -----------
<1>AacI
<2>BamHI,AaeI,AcaII,AccEBI,AinII,AliI,Ali12257I,Ali12258I,ApaCI,AsiI,AspTII,
Atu1II,BamFI,BamKI,BamNI,Bca1259I,Bce751I,Bco10278I,BnaI,BsaDI,Bsp30I,Bsp46I
,Bsp90II,Bsp98I,Bsp130I,Bsp131I,Bsp144I,Bsp4009I,BspAAIII,BstI,Bst1126I,Bst2
464I,Bst2902I,BstQI,Bsu90I,Bsu8565I,Bsu8646I,BsuB519I,BsuB763I,CelI,DdsI,Gdo
I,GinI,GoxI,GseIII,GstI,MleI,Mlu23I,NasBI,Nsp29132II,NspSAIV,OkrAI,Pac1110I,
Pae177I,Pfl8I,Psp56I,RhsI,Rlu4I,RspLKII,SolI,SpvI,SurI,Uba19I,Uba31I,Uba38I,
Uba51I,Uba88I,Uba1098I,Uba1163I,Uba1167I,Uba1172I,Uba1173I,Uba1205I,Uba1224I
,Uba1242I,Uba1250I,Uba1258I,Uba1297I,Uba1302I,Uba1324I,Uba1325I,Uba1334I,Uba
1339I,Uba1346I,Uba1383I,Uba1398I,Uba1402I,Uba1414I,Uba4009I
<3>GGATCC
<4>
<5>Acetobacter aceti sub. liquefaciens
<6>IFO 12388
<7>
<8>Seurinck, J., van Montagu, M., Unpublished observations.
----------------------------------
--------- RestrictionEnzyme values --------
Name: AacI
RecognitionSite:ggatcc
ForwardRegex: g{2}atc{2}
ReverseRegex: g{2}atc{2}
CutType: 0 (RestrictionEnzyme.CUT_SIMPLE)
DownStreamEndType: 2
IsPalindromic: true
DownstreamCut: 1, 1,
-------------------------------------------
As you can see, AacI is treated as RestrictionEnzyme.CUT_SIMPLE and it has a
cut location, while the REBASE entry says that the cut location is unknown;
only the recognition site is known. So RestrictionEnzymeManager should also
filter out those with an unknown cut location, otherwise it gives wrong
data.
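The filtering suggested here could look roughly like the sketch below. It is not how RestrictionEnzymeManager is implemented; IncompleteEnzymeFilter and its methods are hypothetical names. It assumes that field <3> carries the cut location either as a '^' marker or as a "(x/y)" annotation, and that a field with neither has an unknown cut location. Treating an empty field like "?" is a further assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava.
class IncompleteEnzymeFilter {
    enum Status { UNKNOWN_SITE, UNKNOWN_CUT, COMPLETE }

    // Classifies the content of REBASE field <3>.
    static Status classify(String field3) {
        String s = field3.trim();
        if (s.isEmpty() || s.equals("?")) {
            return Status.UNKNOWN_SITE;  // assumption: empty is treated like "?"
        }
        boolean hasCut = s.indexOf('^') >= 0 || s.indexOf('(') >= 0;
        return hasCut ? Status.COMPLETE : Status.UNKNOWN_CUT;
    }

    // Keeps only the names of enzymes whose site and cut location are both given.
    static List<String> completeOnly(Map<String, String> entries) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            if (classify(e.getValue()) == Status.COMPLETE) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("AacI", "GGATCC");
        entries.put("MboI", "^GATC");
        entries.put("TaqII", "GACCGA(11/9),CACCCA(11/9)");
        System.out.println(completeOnly(entries)); // prints [MboI, TaqII]
    }
}
```

Instead of dropping entries such as AacI, a reader could also keep them and expose the UNKNOWN_CUT status, so that callers can still search for the recognition site without being given a cut location that REBASE does not state.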
- Jesse
[Biojava-l] RestrictionEnzymeManager REBASE reader bug?
mark.schreiber at novartis.com
Tue Jun 21 22:22:52 EDT 2005
Hello -
This is now checked in. All tests pass (no surprise, as checking for null
never hurt anyone). This will make it into biojava1.4. If you want to add a
test to the JUnit suite to ensure this stays fixed, it would be most appreciated.
I also remember some discussion a while back about the behaviour of certain
enzymes with respect to their cleavage points, which may or may not
have been a bug. Was this ever resolved? If so, does anything need fixing?
Thanks.
- Mark
_______________________________________________
Biojava-l mailing list - Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l