[Bioperl-l] Protein alignment CD excision module

Wed Aug 31 10:12:24 EDT 2005

I have a module that takes a ClustalW alignment, data mines the 
conserved domains (CDs) from NCBI, then selectively replaces the CDs 
with IUPAC 'X' and writes a ClustalW file back out. We have several 
uses for this module's functions.

I am converting this to be a Bioperl module to take advantage of 
AlignIO capabilities to read/write multiple alignment file types.

There is a .pm package excise_cd.pm, which I have placed in Align 
(along with clustalw.pm etc). It is @ISA Bio::Root::Root. I have not 
yet written an I file for it, but recognise the necessity of doing so 
for optimum compatibility with Bioperl.

Only one method from excise_cd is used outside the module - excise(), 
which takes a SimpleAlign object made with AlignIO in the calling 
program and a hash of options. The excise method extracts 
the sequence data from the SimpleAlign object, data mines the CD 
information and uses the options to guide the overwriting of residues 
with 'X'. excise() (will) then create an AlignIO output object of the 
requested format with the excised alignment. This is then returned to 
the caller, which can write out the excised alignment in the desired 
format.
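
To make the overwriting step concrete, here is a minimal sketch of the 
masking logic. It is written in Python rather than Perl and is an 
illustration only, not the code in excise_cd.pm. The names mask_domains 
and excise_alignment, the dictionary layout, and the assumption that CD 
ranges are 1-based inclusive positions in the ungapped sequence are all 
hypothetical.

```python
def mask_domains(aligned_seq, domains, mask_char="X", gap_chars="-."):
    """Return aligned_seq with residues inside any domain replaced by mask_char.

    domains: list of (start, end) pairs, 1-based inclusive, counted on the
    ungapped sequence (an assumption about how the CD hits are reported).
    """
    out = []
    residue_no = 0  # running position in the ungapped sequence
    for ch in aligned_seq:
        if ch in gap_chars:
            out.append(ch)  # gaps are kept so the alignment columns do not shift
            continue
        residue_no += 1
        in_domain = any(start <= residue_no <= end for start, end in domains)
        out.append(mask_char if in_domain else ch)
    return "".join(out)


def excise_alignment(alignment, domains_by_id, mask_char="X"):
    """Mask every sequence in a {seq_id: gapped_sequence} mapping."""
    return {
        seq_id: mask_domains(seq, domains_by_id.get(seq_id, []), mask_char)
        for seq_id, seq in alignment.items()
    }


# Toy data, made up for the example (not real CD hits).
alignment = {
    "seq1": "MK-TAYIAKQR",
    "seq2": "MKLTA--AKQR",
}
domains_by_id = {"seq1": [(3, 6)], "seq2": [(2, 4)]}

masked = excise_alignment(alignment, domains_by_id)
for seq_id, seq in masked.items():
    print(seq_id, seq)
```

The point of counting residues separately from columns is that the CD 
coordinates refer to the unaligned protein, while the 'X' characters 
have to land in the gapped alignment without changing its length.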

I think of this from an external perspective as a CD excising (Xing 
out) and data converting filter for alignment files. 

Is this a reasonable approach? Would this be an appropriate module and 
script for me to donate to Bioperl when properly done?

Another question - I data mine from NCBI using only gi identifiers for 
the proteins. I have written my own code to do this. Is there a Bioperl 
way to get CD data for a protein, and can this way allow me to 
obtain CD regions for PFAM or other identifiers as well?

Thanks,
Steve Lenk
slenk at emich.edu