[Bioperl-l] PSI-BLAST Matrix Parser?
Stefan Kirov
skirov at utk.edu
Wed Sep 8 14:07:21 EDT 2004
This seems reasonable to me. The one thing you need to consider is the
structure that should contain the matrix. The current design of
Bio::Matrix::PSM::Psm and Bio::Matrix::PSM::SiteMatrix does not allow this, as SiteMatrix is a DNA-only object.
There are two ways to go:
Either change SiteMatrix to accept protein matrix data, or add a protein matrix class to Bio::Matrix::PSM (say Bio::Matrix::PSM::ProtMatrix) that will hold the data, and make Bio::Matrix::PSM::Psm inherit from that class so it can contain the object (Psm is actually a container right now).
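To make the second option a bit more concrete: the main difference from SiteMatrix is that each position holds 20 amino-acid scores instead of 4 base values. The Perl below is only a hypothetical, stand-alone sketch (the class ProtMatrixSketch and its methods add_position, score, num_positions and consensus are made-up names); it does not use BioPerl and is not the interface of any existing Bio::Matrix::PSM module. It assumes the residue order PSI-BLAST prints in its matrix header.

```perl
use strict;
use warnings;

# HYPOTHETICAL class, for illustration only: not the interface of
# Bio::Matrix::PSM::SiteMatrix or of any released BioPerl module.
package ProtMatrixSketch;

# The 20 amino acids, in the column order assumed for the PSI-BLAST -Q header.
our @RESIDUES = qw(A R N D C Q E G H I L K M F P S T W Y V);

sub new {
    my ($class) = @_;
    # scores->[i]{aa} holds the score of residue aa at position i+1
    return bless { scores => [] }, $class;
}

# Append one position, given as a hash ref of residue => score.
sub add_position {
    my ($self, $column) = @_;
    push @{ $self->{scores} }, { %$column };
    return $self->num_positions;
}

sub num_positions {
    my ($self) = @_;
    return scalar @{ $self->{scores} };
}

# Score of residue $aa at 1-based position $pos (undef if out of range or unknown).
sub score {
    my ($self, $pos, $aa) = @_;
    return if $pos < 1 || $pos > $self->num_positions;
    return $self->{scores}[$pos - 1]{$aa};
}

# Highest-scoring residue at each position; ties go to the residue that
# comes first in @RESIDUES, and positions with no scores become 'X'.
sub consensus {
    my ($self) = @_;
    my $seq = '';
    for my $col (@{ $self->{scores} }) {
        my $best;
        for my $aa (@RESIDUES) {
            next unless defined $col->{$aa};
            $best = $aa if !defined $best || $col->{$aa} > $col->{$best};
        }
        $seq .= defined $best ? $best : 'X';
    }
    return $seq;
}

package main;

my $m = ProtMatrixSketch->new;
$m->add_position({ A => -1, L => 2, M => 6 });
$m->add_position({ A => 4, G => 0 });
print $m->consensus, "\n";    # prints "MA"
```

Keeping one hash per position makes lookups by residue letter simple; a real protein matrix class would presumably also need the percentages and per-position information values from the matrix file.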
So you will have something like:
use Bio::Matrix::PSM::IO;
my $psmIO = Bio::Matrix::PSM::IO->new(-file => $file, -format => 'psiblast'); # this will call the actual parser (Bio::Matrix::PSM::IO::psiblast)
my $header = $psmIO->....  # I guess there will be some header data
while (my $psm = $psmIO->next_psm) {
    my $psimatrix = $psm->protmatrix;  # this will be a Bio::Matrix::PSM::ProtMatrix object
    $psimatrix->.....;  # now process the parsed data through the object's methods...
}
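Most of the work in such a parser is splitting each row of the -Q matrix into its parts. Here is a rough stand-alone Perl sketch, assuming the row layout James describes (position, query residue, 20 scores, 20 weighted observed percentages, information per position, relative weight of gapless real matches to pseudocounts) and a header line that starts with the residue letters A R N D. parse_psiblast_matrix is a hypothetical helper, not BioPerl code, and the layout should be checked against real -Q output.

```perl
use strict;
use warnings;

# Sketch of a reader for the ASCII matrix written by PSI-BLAST's -Q option.
# ASSUMED row layout (check against your BLAST version):
#   position, query residue, 20 scores, 20 weighted observed percentages,
#   information per position, relative weight of gapless real matches
# parse_psiblast_matrix is a hypothetical name, not a BioPerl method.
sub parse_psiblast_matrix {
    my ($fh) = @_;
    my (@residues, @rows);
    while (my $line = <$fh>) {
        my @f = split ' ', $line;
        # Header line: residue letters; the first 20 label the score columns.
        if (!@residues && $line =~ /^\s+A\s+R\s+N\s+D\b/) {
            @residues = @f[0 .. 19];
            next;
        }
        # Data rows: a number, a one-letter residue, then 42 more fields.
        next unless @residues && @f == 44
            && $f[0] =~ /^\d+$/ && $f[1] =~ /^[A-Z]$/;
        my %row = (
            pos     => $f[0],
            residue => $f[1],
            info    => $f[42],
            weight  => $f[43],
        );
        @{ $row{scores} }{@residues}  = @f[2 .. 21];
        @{ $row{percent} }{@residues} = @f[22 .. 41];
        push @rows, \%row;
    }
    return \@rows;
}

# Usage: perl parse_matrix.pl atp1.matrix
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $rows = parse_psiblast_matrix($fh);
    close $fh;
    for my $r (@$rows) {
        print join("\t", $r->{pos}, $r->{residue}, $r->{info}, $r->{weight}), "\n";
    }
}
```

Each parsed row is a plain hash, so a Bio::Matrix::PSM::IO::psiblast module could turn the rows into whichever matrix object is chosen. Lines that don't look like data rows (blank lines, trailing summary lines) are simply skipped.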
If you do this, maybe you should get an account and commit it yourself?
Does this make sense to you?
Stefan
James Thompson wrote:
>Stefan,
>
>Thanks for the response. To read the actual alignment, I would use
>Bio::AlignIO on the PSI-BLAST output, as it's just another alignment file,
>but the matrix file that I'm talking about is slightly different. Now that
>I've perused CVS more and learned more about how the Bio::Matrix::PSM modules
>work, I think I have a clearer picture of what I'd like to do.
>
>If you run PSI-BLAST with the -Q option, it will take the matrix that it
>used for the position-specific search and output it to a file. I've put
>one of my matrix files up here if you'd like to look at it:
>
>http://bioinformatics.rit.edu/~tex/atp1.matrix
>
>Basically I'd like to make some Bio::Matrix::PSM::Psm objects (or at least
>a PsmI-compliant object), and I think that the correct way to do this would
>be to add a file format parser to Bio::Matrix::PSM::IO. Currently in Bioperl
>there are three format parsers:
> - mast
> - meme
> - transfac
>
>None of these work with the PSI-BLAST matrix files. I'd like to write a new
>matrix file parser (perhaps called psi-blast?) in the spirit of the three other
>parsers.
>
>If I were to write this, could someone commit it for me?
>
>James Thompson
>
>On Tue, 7 Sep 2004, Stefan A Kirov wrote:
>
>>I am not sure what object you are going to store your data in... Are you
>>going to develop your own class to hold the data or use an existing one?
>>Also, is there any reason not to use Bio::AlignIO (it reads PSI-BLAST output,
>>as far as I know)?
>>Stefan
>>
>>On Tue, 7 Sep 2004, James Thompson wrote:
>>
>>>Dear Bioperl-ers,
>>>
>>>I'd like to parse the matrix files written by PSI-BLAST, and I was wondering whether
>>>there is a Bioperl way of parsing them. If not, I'd like to make my
>>>code general enough to be committed, and I'd like some advice on where exactly
>>>to put such a module. From my cursory knowledge of Bioperl, I think that adding
>>>another format parser to Bio::Matrix::PSM::IO would be a good way to go.
>>>
>>>I have a couple of questions:
>>>- Does anyone know what the PSI-BLAST matrix format is called?
>>>- Is this the correct place in which to put code for parsing this type of file?
>>>
>>>The file format represents a position-specific scoring matrix with some added
>>>statistical information. Here's a general overview of the information available
>>>from the matrix file:
>>>
>>>Last position-specific scoring matrix computed, weighted observed percentages
>>>rounded down, information per position, and relative weight of gapless real
>>>matches to pseudocounts.
>>>
>>>Any help is greatly appreciated.
>>>
>>>James Thompson
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at portal.open-bio.org
>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
1060 Commerce Park, Oak Ridge
TN 37830-8026
USA
tel +865 576 5120
fax +865 241 1965
e-mail: skirov at utk.edu
sao at ornl.gov