[Biojava-l] Mass Search Results

William.Old@UCHSC.edu William.Old@UCHSC.edu
Thu, 3 Jan 2002 11:03:11 -0700


Dr. Jones,

I think the time is ripe for biojava interfaces to peptide-mass
fingerprinting search algorithms, and I am glad to see some interest
brewing. Let me express my interest, and relate to the list some of the
reasons. Based on what I have worked with from your previous JPAT library
and the digestion classes in biojava, I am anxious to see what you come up
with. 

I am currently working on an implementation of the ProFound algorithm,
eventually to be used for in house peptide mass fingerprinting. The
performance of my current implementation parallels the web-accessible
version of ProFound at
http://129.85.19.192/profound_bin/WebProFound.exe?FORM=1 in terms of speed
and discriminability. I am designing it to allow the user to search over all
possible post-translational modifications, with user-defined parameters that
reduce the false-positive and false-negative rates which confound
modification searching. Additional parameters include error tolerance, pI,
protein MW, etc. Because of the size of the search space and the complexity
of the Bayesian calculations, it is written entirely in ANSI C on Solaris,
with only a command-line interface and textual output. Eventually I would be
interested in collaborating to design interfaces to the algorithm, but I am
not sure how robust C/Java interfaces are.

I haven't studied the SeqSimilaritySearchResult interface very much, but my
first impression is that you may want to create a new interface to encompass
the large amount of data returned from a typical search. In the web version
of ProFound (as well as the others, like MSFit), the top hits are returned
with MW, peptide hits, errors, peptide hit sequences, as well as the
normalized probability score. The advantage of interfacing with an algorithm,
as opposed to a results parser, would be the opportunity to capture more
information. The scores from ProFound are displayed as normalized Bayesian
probabilities which all add up to 1; however, in many cases, it is also
useful to know the likelihood prior to normalization, as well as a number of
other functions used in the probability calculation which are very useful in
the identification process. Obtaining the likelihood for each hit allows one
to compare scores across different searches, as well as doing statistical
testing to estimate false-positive rates.  All in all, a clean interface to
such an algorithm would be immensely useful, not just for doing single
searches, but also for automation of thousands at a time, and for automated
statistical testing as in: 

J. Eriksson, B.T. Chait, and D. Fenyö, "A Statistical Basis for Testing the
Significance of Mass Spectrometric Protein Identification Results",
Analytical Chemistry 72 (2000) 999-1005. 
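
The difference between a raw likelihood and a normalized score, as described
above, can be illustrated with a minimal Java sketch. It is not ProFound's
actual calculation: the class name ScoreNormalizer and the method normalize
are hypothetical, and the only assumption is that each candidate protein has
a non-negative raw likelihood. The sketch simply divides each likelihood by
the sum over all candidates.

```java
/** Hypothetical helper, not part of ProFound or BioJava. */
final class ScoreNormalizer {
    private ScoreNormalizer() {}

    /**
     * Divides each raw likelihood by the sum of all likelihoods so that
     * the returned scores add up to 1. The input array is not modified,
     * so the raw values remain available for cross-search comparison.
     */
    static double[] normalize(double[] likelihoods) {
        double sum = 0.0;
        for (double l : likelihoods) {
            if (Double.isNaN(l) || l < 0.0) {
                throw new IllegalArgumentException("likelihoods must be non-negative");
            }
            sum += l;
        }
        if (sum <= 0.0) {
            throw new IllegalArgumentException("at least one likelihood must be positive");
        }
        double[] normalized = new double[likelihoods.length];
        for (int i = 0; i < likelihoods.length; i++) {
            normalized[i] = likelihoods[i] / sum;
        }
        return normalized;
    }
}
```

Two searches whose raw likelihoods differ by many orders of magnitude can
yield the same normalized scores, which is why a result interface that only
exposes the normalized value loses information needed to compare searches.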

I'm planning to publish the work soon, and subsequently would be interested
in speaking with you about the design of such interfaces.


-- 
Will Old, Ph.D.
Research Associate
Center for Computational Pharmacology
http://compbio.uchsc.edu/
Univ. Colo. Health Sci. Center
303-315-1102
William.Old@UCHSC.edu



-----Original Message-----
From: Michael Jones [mailto:mjones@mpi.com]
Sent: Wednesday, December 26, 2001 9:36 AM
To: biojava-l@biojava.org
Subject: [Biojava-l] Mass Search Results


I am thinking about creating some biojava interfaces and implementations 
for peptide-mass fingerprint and peptide fragment mass searches of sequence 
databases. I would like to make it general enough so that it could be used 
to wrap some of the popular search tools. So I need to abstract out things 
like Scoring schemes.

In general the input would be a set of masses (Protein and peptide or 
Fragments and Parent peptide), error tolerance and other filters. The 
output would be a set of proteins or nucleotide sequences along with their 
associated scores and possibly with the matches as features annotated onto 
the returned sequences.
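
As a rough illustration of the input side described above (a set of masses
plus an error tolerance), here is a minimal Java sketch. The class
MassMatcher and its methods are hypothetical, the tolerance is assumed to be
given in parts per million relative to the theoretical mass, and counting
matched masses stands in for a real scoring scheme, which an actual search
tool would replace with something more sophisticated.

```java
/** Hypothetical helper, not part of BioJava or any existing search tool. */
final class MassMatcher {
    private MassMatcher() {}

    /** True if the observed mass lies within tolerancePpm of the theoretical mass. */
    static boolean matches(double observed, double theoretical, double tolerancePpm) {
        double allowedError = theoretical * tolerancePpm * 1e-6;
        return Math.abs(observed - theoretical) <= allowedError;
    }

    /**
     * Counts the observed masses that match at least one theoretical
     * peptide mass of a candidate protein. This count is only a
     * placeholder for a real score.
     */
    static int countMatches(double[] observed, double[] theoretical, double tolerancePpm) {
        int count = 0;
        for (double obs : observed) {
            for (double theo : theoretical) {
                if (matches(obs, theo, tolerancePpm)) {
                    count++;
                    break; // count each observed mass at most once
                }
            }
        }
        return count;
    }
}
```

For example, with a 20 ppm tolerance an observed mass of 1000.01 Da matches a
theoretical mass of 1000.0 Da (allowed error 0.02 Da), while 1000.05 Da does
not.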

I have been looking at some of the Interfaces used for FastA searches but I 
am not sure that they are appropriate for the problem above. For example, 
the SearchBuilder has as one of its methods SeqSimilaritySearchResult 
makeSearchResult(). A SeqSimilaritySearchResult has a method 
getQuerySequence() that is not appropriate for the mass search problem. 
What do people think? Should I go ahead and use them and just ignore 
getQuerySequence(), or should I create new interfaces? Perhaps I could just 
extend SeqSimilaritySearchResult and add a getQueryMassSet method, or just 
use the same interface and put the masses into the SearchParameters
Map.
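
The two options mentioned above can be sketched side by side in Java. To keep
the example self-contained it does not use the real BioJava interface:
ResultStandIn is a simplified stand-in with assumed method signatures, and
MassSearchResult, SimpleMassSearchResult and the map key "queryMasses" are
all hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for a sequence-search result; NOT the BioJava interface. */
interface ResultStandIn {
    Object getQuerySequence();
    Map<String, Object> getSearchParameters();
}

/** Option 1 (hypothetical): a sub-interface that exposes the query masses. */
interface MassSearchResult extends ResultStandIn {
    double[] getQueryMassSet();
}

/** Minimal implementation that demonstrates both options at once. */
final class SimpleMassSearchResult implements MassSearchResult {
    private final double[] masses;
    private final Map<String, Object> parameters;

    SimpleMassSearchResult(double[] masses) {
        this.masses = masses.clone();
        Map<String, Object> p = new HashMap<>();
        // Option 2: store the masses in the parameter map under an assumed key.
        p.put("queryMasses", this.masses.clone());
        this.parameters = Collections.unmodifiableMap(p);
    }

    /** A mass search has no query sequence, so this sketch returns null. */
    @Override
    public Object getQuerySequence() {
        return null;
    }

    /** Returns a defensive copy of the query masses. */
    @Override
    public double[] getQueryMassSet() {
        return masses.clone();
    }

    @Override
    public Map<String, Object> getSearchParameters() {
        return parameters;
    }
}
```

Option 1 gives callers a typed accessor but leaves getQuerySequence() as a
method with no meaningful value; option 2 needs no new interface but forces
callers to know the key and cast the value.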

Also, according to the documentation, these interfaces seem to be designed 
for parsing results rather than for algorithm implementations. Are there 
other interfaces that may be more appropriate for implementing search 
algorithms?

_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l