[Bioperl-l] Protein alignment CD excision module
Stephen Gordon Lenk
slenk at emich.edu
Wed Aug 31 10:12:24 EDT 2005
I am converting a module that takes a ClustalW alignment, data mines
the conserved domains (CDs) from NCBI, then selectively replaces the CDs
with IUPAC 'X' and writes a ClustalW file back out. We have several
uses for this module's functions.
I am converting this to be a Bioperl module to take advantage of
AlignIO capabilities to read/write multiple alignment file types.
There is a .pm package excise_cd.pm, which I have placed in Align
(along with clustalw.pm etc). It is @ISA Bio::Root::Root. I have not
yet written an I file for it, but recognise the necessity of doing so
for optimum compatibility with Bioperl.
Only one method from excise_cd is used outside the module - excise(),
which takes a SimpleAlign object made with AlignIO in the calling
program and a hash of options. The excise method extracts
the sequence data from the SimpleAlign object, data mines the CD
information and uses the options to guide the overwriting of residues
with 'X'. excise() (will) then create an AlignIO output object of the
requested format with the excised alignment. This is then returned to
the caller, which can write out the excised alignment in the desired
format.
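
To make the masking step concrete, here is a minimal sketch of the idea,
written in plain Python rather than Perl so that it has no dependencies.
It is not the code in excise_cd.pm and does not use any Bioperl API. The
function name excise_domains, the dictionary layout, and the assumption
that CD ranges are 1-based, inclusive positions in the ungapped sequence
are all hypothetical choices made for illustration.

```python
def excise_domains(aligned_seqs, domains, mask_char="X", gap_chars="-."):
    """Return a copy of the alignment with conserved-domain residues masked.

    aligned_seqs: dict mapping sequence id -> gapped sequence string
    domains:      dict mapping sequence id -> list of (start, end) ranges,
                  assumed 1-based, inclusive, in ungapped residue coordinates
    """
    masked = {}
    for seq_id, gapped in aligned_seqs.items():
        ranges = domains.get(seq_id, [])
        out = []
        residue_pos = 0  # number of non-gap residues seen so far
        for ch in gapped:
            if ch in gap_chars:
                out.append(ch)  # gap columns are copied through untouched
                continue
            residue_pos += 1
            in_domain = any(start <= residue_pos <= end for start, end in ranges)
            out.append(mask_char if in_domain else ch)
        masked[seq_id] = "".join(out)
    return masked


alignment = {"seq1": "MK-TAYIAK", "seq2": "MKQTA-IAK"}
cd_ranges = {"seq1": [(3, 5)]}  # made-up range, not real NCBI data
print(excise_domains(alignment, cd_ranges))
```

The point of the sketch is that domain coordinates refer to residues,
not alignment columns, so gaps have to be skipped when counting. Keeping
the gaps in place means the masked sequences stay the same length and
the alignment remains valid for output in any format.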
I think of this from an external perspective as a CD excising (Xing
out) and data converting filter for alignment files.
Is this a reasonable approach? Would this be an appropriate module and
script for me to donate to Bioperl when properly done?
Another question - I data mine from NCBI using only gi identifiers for
the proteins. I have written my own code to do this. Is there a Bioperl
way to get CD data for a protein, and can this way allow me to
obtain CD regions for PFAM or other identifiers as well?
Thanks,
Steve Lenk
slenk at emich.edu