[Bioperl-l] Protein alignment CD excision module
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Wed Aug 31 12:36:14 EDT 2005
Steve,
I can see the usefulness of what you are doing, but BioPerl is a library and
needs to stay modular so that other users can easily modify and reuse it. What
you are describing is best implemented as a script that uses several modules.
That example script could be stored in BioPerl separately.
On Wednesday 31 August 2005 15:12, Stephen Gordon Lenk wrote:
> I am converting a module that takes a ClustalW alignment, data mines
> the conserved domains from NCBI, then selectively replaces the CDs
> with IUPAC 'X' and writes a ClustalW file back out. We have several
> uses for this module's functions.
Reading and writing an alignment is already handled by Bio::AlignIO. If you
hardcode the format in a module, you lose flexibility. So this belongs in a
script.
"data mines the conserved domains from NCBI"
This needs to be done separately by writing, e.g., a Bio::DB or a
Bio::Tools::Analysis module for accessing the data. Then you need a storage
object to hold the conserved residues. You could use Bio::Seq::Meta derived
objects for that, store them as sequence features (Bio::SeqFeature::Generic),
or roll your own. The main question is whether you need to store
residue-based information or a few large regions.
"then selectively replaces the CDs with IUPAC 'X'"
This could be implemented as a method that takes the alignment and the storage
object(s) from your analysis and returns the new alignment.
Bio::Align::Utilities could store that.
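
As a rough sketch of the shape such a method could take (this is not existing
BioPerl code): plain hashes stand in here for the Bio::SimpleAlign object and
for the storage objects, and the name mask_regions and its arguments are made
up for illustration. Regions are assumed to be 1-based, inclusive residue
positions that ignore gap characters.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl.
# $aln     : hash ref, sequence id => gapped sequence string
# $regions : hash ref, sequence id => array ref of [start, end] pairs
#            (1-based, inclusive, counted on ungapped residues)
# Returns a new hash ref; the input alignment is left unmodified.
sub mask_regions {
    my ($aln, $regions, $mask_char) = @_;
    $mask_char = 'X' unless defined $mask_char;

    my %masked;
    for my $id (keys %$aln) {
        my @chars = split //, $aln->{$id};
        my $pos   = 0;    # residue counter, gaps excluded
        for my $c (@chars) {    # $c aliases the array element
            next if $c eq '-' || $c eq '.';
            $pos++;
            for my $r (@{ $regions->{$id} || [] }) {
                if ($pos >= $r->[0] && $pos <= $r->[1]) {
                    $c = $mask_char;
                    last;
                }
            }
        }
        $masked{$id} = join '', @chars;
    }
    return \%masked;
}

my $aln = { seqA => 'MK-TAYIA', seqB => 'MKQTA-IA' };
my $cds = { seqA => [ [2, 4] ] };    # residues 2..4 of seqA
my $out = mask_regions($aln, $cds);
print "$_ $out->{$_}\n" for sort keys %$out;
# prints:
# seqA MX-XXYIA
# seqB MKQTA-IA
```

Keeping the function free of any I/O is the point: the calling script picks
the input and output formats through Bio::AlignIO, and the masking step only
sees an alignment and a set of regions.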
> I am converting this to be a Bioperl module to take advantage of
> AlignIO capabilities to read/write multiple alignment file types.
Good idea.
> There is a .pm package excise_cd.pm, which I have placed in Align
> (along with clustalw.pm etc). It is @ISA Bio::Root::Root. I have not
clustalw.pm is in Bio::AlignIO. Only modules that are subclasses of
Bio::AlignIO should go there.
> yet written an I file for it, but recognise the necessity of doing so
> for optimum compatibility with Bioperl.
An I file is needed only if you expect that there will be several
implementations of the interface.
> Only one method from excise_cd is used outside the module - excise(),
> which takes a SimpleAlign object made with AlignIO in the calling
> program and a hash function with options. The excise method extracts
For modularity, the hash storing all the options needs to be turned into
reusable objects.
> the sequence data from the SimpleAlign object, data mines the CD
> information and uses the options to guide the overwriting of residues
> with 'X'. excise() (will) then create an AlignIO output object of the
> requested format with the excised alignment. This is then returned to
> the caller, which can write out the excised alignment in the desired
> format.
> I think of this from an external perspective as a CD excising (Xing
> out) and data converting filter for alignment files.
From your earlier description, CD finding was the problem.
Bio::SimpleAlign::slice does the slicing. On the other hand, from the
description, I am not sure it is necessary to work with the alignment as a
whole: it might be best to treat each sequence separately. Of
course, that depends on the reliability of the alignment and what you have
actually aligned!
> Is this a reasonable approach?
> Would this be an appropriate module and
> script for me to donate to Bioperl when properly done?
Yes, please.
-Heikki
> Another question - I data mine from NCBI using only gi identifiers for
> the proteins. I have written my own code to do this. Is there a Bioperl
> way to get CD data for a protein, and can this way allow me to
> obtain CD regions for PFAM or other identifiers as well?
>
> Thanks,
> Steve Lenk
> slenk at emich.edu
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________