[Bioperl-l] program that searches databases

Aaron J Mackey ajm6q@virginia.edu
Tue, 3 Oct 2000 22:28:05 -0400 (EDT)


Just to be explicitly clear to the onlookers: Peter's code uses the second
approach I mentioned, iteratively "searching" (by bisection) for the pH
where the net charge equals 0.  The C code I sent along solves for pI
explicitly via the Sillero algorithm.  Your mileage may vary.

-Aaron
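
For onlookers comparing the two search strategies: the attached pI.pl bisects the interval from pH 0 to pH 14, while the "slide" approach described in the quoted 2 Oct message steps through that range until the net charge changes sign. Below is a minimal Perl sketch of the slide, under a simplified charge model. The side-chain pKa values are copied from the table in pI.pl, but the terminal pKa values are fixed at 7.50 and 3.55 and every residue's side chain is counted, so its results will differ somewhat from pI.pl's. The function names `net_charge` and `slide_pI`, the 0.01 default step, and the final linear-interpolation step (one simple way to refine the crossing, not necessarily the trick Aaron had in mind) are all illustrative assumptions.

```perl
use strict;
use warnings;

# Side-chain pKa values copied from the %pKa table in pI.pl.
my %ACID = (D => 4.05, E => 4.45, C => 9.00, Y => 10.00);
my %BASE = (K => 10.00, R => 12.00, H => 5.98);
# Assumption: one fixed pKa per terminus (pI.pl varies these by residue).
my ($PKA_NTERM, $PKA_CTERM) = (7.50, 3.55);

# Net charge of a peptide at a given pH (Henderson-Hasselbalch fractions).
# Assumption: every residue's side chain is counted, termini included.
sub net_charge {
    my ($seq, $pH) = @_;
    my $z = 1 / (1 + 10**($pH - $PKA_NTERM))     # protonated N-terminal amine
          - 1 / (1 + 10**($PKA_CTERM - $pH));    # deprotonated C-terminal carboxyl
    for my $aa (split //, uc $seq) {
        $z += 1 / (1 + 10**($pH - $BASE{$aa})) if exists $BASE{$aa};
        $z -= 1 / (1 + 10**($ACID{$aa} - $pH)) if exists $ACID{$aa};
    }
    return $z;
}

# Slide upward from pH 0 in fixed steps until the net charge is no longer
# positive, then estimate the zero crossing by linear interpolation
# between the two bracketing pH values.
sub slide_pI {
    my ($seq, $step) = @_;
    $step ||= 0.01;    # hypothetical default step size
    my ($pH, $z) = (0, net_charge($seq, 0));
    while ($pH < 14) {
        my $next   = $pH + $step;
        my $z_next = net_charge($seq, $next);
        if ($z_next <= 0) {
            # Here $z > 0 >= $z_next, so the fraction lies in (0, 1].
            return $pH + $step * $z / ($z - $z_next);
        }
        ($pH, $z) = ($next, $z_next);
    }
    return;    # no sign change found between pH 0 and pH 14
}

printf "pI ~ %.2f\n", slide_pI('ACDEFGHIKLMNPQRSTVWY');
```

A coarser step (for example 0.1) needs fewer charge evaluations, and the interpolation keeps the estimate from snapping to the grid.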

On Tue, 3 Oct 2000, Peter J Ulintz wrote:

> 
> Thanks, Aaron, for the C routine. I've attached a Perl version of a
> subroutine to calculate pI that sounds similar to what you described.
> The routine takes a string representing an amino acid (AA) sequence. If
> anyone is interested in a Python version, let me know.
> 
> --Pete
> 
> On Tue, 3 Oct 2000, Aaron J Mackey wrote:
> 
> > 
> > A few people have asked for this info, so I'm copying the list.  Excuse
> > the slightly off-topic conversation.
> > 
> > 
> > On Mon, 2 Oct 2000, Aaron J Mackey wrote:
> > 
> > > 
> > > I once wrote a pI solver in Perl using a method to explicitly solve for
> > > pI.  But it turns out it's much better to simply 'slide' from pH 0 to pH
> > > 14 and see where the net charge function goes from positive to
> > > negative.  While less elegant, it's much faster (and you can play tricks
> > > with numerical derivatives near zero to get an accurate value, if you'd
> > > like).
> > > 
> > > I'll send you the references and my code, if you're interested.
> > > 
> > > -Aaron
> > > 
> > > On Mon, 2 Oct 2000, Tsvika wrote:
> > > 
> > > > Hi all,
> > > > 
> > > > I am writing a Perl program that searches a protein database and
> > > > finds matches by comparing the predicted molecular weight (MW)
> > > > against the MW measured by mass spectrometry. The program also takes
> > > > a short amino acid sequence (tag) and looks for it at the N-terminus
> > > > or the C-terminus, as you choose. It supports an MW tolerance
> > > > (+/-X%) and accepts a regexp as the tag. The program is fully
> > > > functional, and I have used it successfully on C. elegans wormpep25.
> > > > Now I want to add an isoelectric point (pI) option, so that, as with
> > > > MW, one can find matches by comparing the pI measured in the first
> > > > dimension of a 2D gel against the predicted pI.
> > > > I would appreciate any help with pI calculations and algorithms.
> > > > Also, do you think this could be a start for proteomics applications
> > > > in bioperl?
> > > > 
> > > > Tsvika.
> > > > 
> > > 
> > > 
> > 
> > -- 
> >  o ~   ~   ~   ~   ~   ~  o
> > / Aaron J Mackey           \
> > \  Dr. Pearson Laboratory  / 
> >  \ University of Virginia  \     
> >  /  (804) 924-2821          \
> >  \  amackey@virginia.edu    /
> >   o ~   ~   ~   ~   ~   ~  o
> > 
> > 
> 
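
Tsvika's search, as described above, combines three filters: predicted MW within +/-X% of the observed MW, a tag regexp anchored at the N- or C-terminus, and (the requested addition) predicted pI within some window of the gel pI. A minimal Perl sketch of that filter follows, assuming each protein has already been loaded as a hash with precomputed `mw` and `pI` fields. The function name `match_proteins`, its option names, the default pI window of 0.5 units, and the three toy records (made-up numbers, not real wormpep entries) are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical toy records with made-up MW and pI values; a real run would
# read sequences from a database file and predict MW and pI for each one.
my @db = (
    { id => 'toy1', seq => 'MKWVTFISLLK', mw => 1300, pI => 9.7 },
    { id => 'toy2', seq => 'MADEEKLPPGW', mw => 1250, pI => 4.0 },
    { id => 'toy3', seq => 'GSHMKWVTFAK', mw => 2600, pI => 9.9 },
);

# Keep records whose predicted MW is within +/- tol_pct percent of the
# observed MW, whose sequence matches the tag regexp at the chosen
# terminus, and (only when a pI is given) whose predicted pI lies within
# +/- pI_tol of the observed pI.
sub match_proteins {
    my %opt    = @_;
    my $lo     = $opt{mw} * (1 - $opt{tol_pct} / 100);
    my $hi     = $opt{mw} * (1 + $opt{tol_pct} / 100);
    my $tag    = $opt{tag};
    my $end    = $opt{end}    // 'N';    # 'N' or 'C'
    my $pI_tol = $opt{pI_tol} // 0.5;    # hypothetical default window
    # Anchor the user's regexp at the requested terminus.
    my $tag_re = $end eq 'C' ? qr/(?:$tag)\z/ : qr/\A(?:$tag)/;

    return grep {
        $_->{mw} >= $lo && $_->{mw} <= $hi
          && $_->{seq} =~ $tag_re
          && (!defined $opt{pI} || abs($_->{pI} - $opt{pI}) <= $pI_tol)
    } @{ $opt{db} };
}

my @hits = match_proteins(db => \@db, mw => 1280, tol_pct => 5,
                          tag => 'MK[WF]', end => 'N',
                          pI => 9.5, pI_tol => 0.5);
print join(',', map { $_->{id} } @hits), "\n";    # prints "toy1"
```

Because the tag is interpolated into a regexp, user input such as `K[RH]` works as a pattern; wrap it in `\Q...\E` if the tag should be matched literally instead.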



Attachment: pI.pl

sub calc_pI {
   # Upper-case the input so lower-case sequences match the table keys.
   my $seq = uc $_[0];
   my %counts = ("A"=>0, "R"=>0, "N"=>0, "D"=>0, "C"=>0,
                 "Q"=>0, "E"=>0, "G"=>0, "H"=>0, "I"=>0,
                 "L"=>0, "K"=>0, "M"=>0, "F"=>0, "P"=>0,
                 "S"=>0, "T"=>0, "W"=>0, "Y"=>0, "V"=>0);

   ## Hash of pKa values used in the calculation:
   #  pKa[0] = internal residue (side-chain) pKa
   #  pKa[1] = C-terminal residue pKa
   #  pKa[2] = N-terminal residue pKa
   my %pKa = ("A"=>[0,     3.55, 7.59],
              "R"=>[12.0,  3.55, 7.50],
              "N"=>[0,     3.55, 7.50],
              "D"=>[4.05,  4.55, 7.50],
              "C"=>[9.00,  3.55, 7.50],
              "Q"=>[0,     3.55, 7.50],
              "E"=>[4.45,  4.75, 7.70],
              "G"=>[0,     3.55, 7.50],
              "H"=>[5.98,  3.55, 7.50],
              "I"=>[0,     3.55, 7.50],
              "L"=>[0,     3.55, 7.50],
              "K"=>[10.00, 3.55, 7.50],
              "M"=>[0,     3.55, 7.00],
              "F"=>[0,     3.55, 7.50],
              "P"=>[0,     3.55, 8.36],
              "S"=>[0,     3.55, 6.93],
              "T"=>[0,     3.55, 6.82],
              "W"=>[0,     3.55, 7.50],
              "Y"=>[10.00, 3.55, 7.50],
              "V"=>[0,     3.55, 7.44]);

   # Count internal residues only; the first and last residues enter the
   # calculation through their terminal-group pKa values, so their side
   # chains are not counted.
   for (my $x = 1; $x < (length($seq) - 1); $x++) {
       my $key = substr $seq, $x, 1;
       $counts{$key} += 1;
   }

   my $Nterm = substr $seq, 0, 1;
   my $Cterm = substr $seq, -1, 1;

   my $Dnum = $counts{"D"}; my $Rnum = $counts{"R"};
   my $Hnum = $counts{"H"}; my $Knum = $counts{"K"};
   my $Enum = $counts{"E"}; my $Cnum = $counts{"C"};
   my $Ynum = $counts{"Y"};

   # Bisection on [0, 14]: the net charge falls as pH rises, so a
   # positive charge means the pI lies above the current pH.
   my $pHmin = 0.00;
   my $pHmax = 14.00;

   my $pH = 7.0;
   my $negative = my $positive = 0.0;
   for (my $i = 0; $i < 1000; $i++) {
      $pH = ($pHmin + ($pHmax - $pHmin)/2.0);
      # Deprotonated (negative) fraction of each acidic group.
      $negative = ( -$Dnum*(1/(1 + 10**(-$pH + $pKa{"D"}[0])))
                    - $Enum*(1/(1 + 10**(-$pH + $pKa{"E"}[0])))
                    - $Cnum*(1/(1 + 10**(-$pH + $pKa{"C"}[0])))
                    - $Ynum*(1/(1 + 10**(-$pH + $pKa{"Y"}[0])))
                    - (1/(1 + 10**(-$pH + $pKa{$Cterm}[1]))));

      # Protonated (positive) fraction of each basic group.
      $positive = ( $Hnum*(1/(1 + 10**(-$pKa{"H"}[0] + $pH)))
                    + $Knum*(1/(1 + 10**(-$pKa{"K"}[0] + $pH)))
                    + $Rnum*(1/(1 + 10**(-$pKa{"R"}[0] + $pH)))
                    + (1/(1 + 10**(-$pKa{$Nterm}[2] + $pH))));
      my $total_charge = $negative + $positive;
      if (abs($total_charge) < 0.01) {
          return $pH;
      }
      if ($total_charge > 0.0) {
          $pHmin = $pH;
      } else {
          $pHmax = $pH;
      }
   }
   # Shouldn't get here, but if it does, warn and return undef so the
   # caller can tell that no pI was found.
   warn "Didn't converge in 1000 iterations\n";
   return undef;
}
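
For reference, the net charge that pI.pl evaluates at each bisection step can be written as below, using Henderson-Hasselbalch fractions for each ionizable group. Here n_X counts residue X in the interior of the sequence, pK_X is its side-chain value (pKa[0] in the table), pK_N is the N-terminal value of the first residue (pKa[2]) and pK_C is the C-terminal value of the last residue (pKa[1]). The pI is the pH at which Z = 0; the symbol names are chosen here for illustration and do not appear in the code.

```latex
Z(\mathrm{pH}) =
    \frac{1}{1 + 10^{\mathrm{pH} - pK_N}}
  + \sum_{X \in \{H,K,R\}} \frac{n_X}{1 + 10^{\mathrm{pH} - pK_X}}
  - \frac{1}{1 + 10^{pK_C - \mathrm{pH}}}
  - \sum_{X \in \{D,E,C,Y\}} \frac{n_X}{1 + 10^{pK_X - \mathrm{pH}}},
\qquad Z(\mathrm{pI}) = 0
```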