[Bioperl-l] Finding seqs of given domain architecture

Wed Apr 16 17:52:59 UTC 2008

You can try CDART:

http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps

There are probably other tools out there as well.

If you want to roll your own, bioperl has wrappers for the search tools
involved (Bio::Tools::Run::StandAloneBlast is in bioperl-live,
Bio::Tools::Run::Hmmer in bioperl-run).  You can tweak the parameters as
you see fit, and either parse the results as the tools run or save the
output to a file and parse it later with Bio::SearchIO.  Personally, I
wouldn't go with your option (2) unless you are absolutely sure the
domains are found only once per sequence, are spatially conserved, and
don't overlap.  For instance, many proteins have a domain structure like
dom1-dom2, dom2-dom1, dom1-dom1-dom2, etc.
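
To make the ordering problem concrete, here is a rough sketch of how you
might turn per-sequence domain hits into an architecture string and compare
it with the one you want.  The hit records (name, start, end), the sequence
ids and the sub names are all made up for illustration; in practice the
coordinates would come from whatever search output you parse.

```perl
use strict;
use warnings;

# Each hit is [domain_name, start, end] on one sequence.
# Build an architecture string by ordering hits by start position.
sub architecture {
    my @hits = sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } @_;
    return join '-', map { $_->[0] } @hits;
}

# True if any two hits overlap. After sorting by start position,
# checking neighbouring hits is enough to detect any overlap.
sub has_overlap {
    my @hits = sort { $a->[1] <=> $b->[1] } @_;
    for my $i (1 .. $#hits) {
        return 1 if $hits[$i][1] <= $hits[$i - 1][2];
    }
    return 0;
}

# Hypothetical hits, keyed by sequence id (not real data)
my %hits_for = (
    seqA => [ [ 'dom1', 10, 120 ], [ 'dom2', 150, 300 ] ],
    seqB => [ [ 'dom2', 5,  160 ], [ 'dom1', 200, 310 ] ],
    seqC => [ [ 'dom1', 1,  100 ], [ 'dom2', 250, 400 ], [ 'dom1', 110, 220 ] ],
);

my $wanted = 'dom1-dom2';
for my $id (sort keys %hits_for) {
    my @hits = @{ $hits_for{$id} };
    my $arch = architecture(@hits);
    my $flag = $arch eq $wanted ? 'match' : 'no match';
    my $note = has_overlap(@hits) ? ' (overlapping hits)' : '';
    print "$id\t$arch\t$flag$note\n";
}
```

Comparing whole architecture strings is strict on purpose: a sequence with
the domains in the other order, or with a repeated domain, does not count
as a match.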

If you just want accessions from Pfam's Stockholm format (which are  
UniProt, I believe) you can get at accessions using  
Bio::AlignIO::stockholm (using perl 5.10):

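use strict;
use warnings;
use feature 'say';    # needs perl 5.10 or later
use Bio::AlignIO;

# Stockholm alignment file, e.g. one downloaded from Pfam
my $file = shift || die "Must pass file as argument\n";

my $in = Bio::AlignIO->new(-format => 'stockholm',
                           -file   => $file);

# Print the accessions of each alignment as one comma-separated line
while (my $aln = $in->next_aln) {
    my @accs;
    for my $seq ($aln->each_seq) {
        push @accs, $seq->accession_number;
    }
    say join(',', @accs);
}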
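
If all you need is the sequences that contain both domains, one option is
to collect the accessions from each family's Stockholm file and keep the
ones found in both.  This is only a sketch: the sub names and the script
name in the usage comment are made up, and it assumes both files use the
same accession scheme.  It says nothing about domain order.

```perl
use strict;
use warnings;

# Collect the accession of every sequence in a Stockholm file.
# Bio::AlignIO is loaded only when this sub is actually called.
sub accessions_from_stockholm {
    my ($file) = @_;
    require Bio::AlignIO;
    my $in = Bio::AlignIO->new(-format => 'stockholm', -file => $file);
    my %seen;
    while (my $aln = $in->next_aln) {
        for my $seq ($aln->each_seq) {
            my $acc = $seq->accession_number;
            # skip sequences that have no accession set
            next if !defined $acc || $acc eq 'unknown';
            $seen{$acc} = 1;
        }
    }
    return keys %seen;
}

# Return the sorted, de-duplicated accessions present in both array refs.
sub common_accessions {
    my ($first, $second) = @_;
    my %in_first = map { ($_ => 1) } @$first;
    my %common   = map { ($_ => 1) } grep { $in_first{$_} } @$second;
    my @sorted   = sort keys %common;
    return @sorted;
}

# Usage (hypothetical script name): perl common_accs.pl dom1.sto dom2.sto
if (@ARGV == 2) {
    my @first  = accessions_from_stockholm($ARGV[0]);
    my @second = accessions_from_stockholm($ARGV[1]);
    print "$_\n" for common_accessions(\@first, \@second);
}
```

A family alignment can list the same protein more than once (one row per
domain copy), which is why the accessions are de-duplicated before
comparing.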

chris

On Apr 16, 2008, at 11:12 AM, Jacob Keller wrote:

> Hello All,
>
> I am new to this list, so am not totally sure this is the right  
> forum, so please forgive if this is not the right place to ask the
> following question: I am seeking to get all sequences that have a  
> given domain architecture, or at least that contain two given  
> domains. I have thought of a few ways to do this.
>
> 1. Blast/Psi-blast for each domain, then compare the results for  
> common sequences between the two lists, and fetch those. I would  
> need to write a (simple) script to do this, but would prefer not to  
> re-invent the wheel.
>
> 2. Search with a paradigm sequence of desired architecture/domain  
> composition, somehow tweaking the psiblast parameters to find only  
> matches over the whole search sequence, thereby finding both desired  
> domains. I am not sure how to tweak blast to do this, though.
>
> 3. Pfam has this capability, i.e. to show all domains with a given  
> architecture, but it is difficult to get at the actual sequences or  
> even a list of accession numbers.
>
> Does anybody have any suggestions as to how optimally to get these  
> seq's?
>
> Thanks for your consideration,
>
> Jacob
>
> *******************************************
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> Dallos Laboratory
> F. Searle 1-240
> 2240 Campus Drive
> Evanston IL 60208
> lab: 847.491.2438
> cel: 773.608.9185
> email: j-keller2 at northwestern.edu
> *******************************************
>
> ----- Original Message ----- From: "Heikki Lehvaslaiho" <heikki at sanbi.ac.za>
> To: <bioperl-l at lists.open-bio.org>
> Cc: <allenday at ucla.edu>; "Chris Fields" <cjfields at uiuc.edu>; "Jay  
> Hannah" <jay at jays.net>; <bioperl-l at bioperl.org>
> Sent: Wednesday, April 16, 2008 6:36 AM
> Subject: Re: [Bioperl-l] bioperl-microarray: status?
>
>
>> FYI,
>>
>> Christopher Jones has just published
>> [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an
>> article in Bioinformatics] about his
>> [http://search.cpan.org/perldoc?Microarray Microarray perl module]
>> in CPAN.
>>
>> (The text added into BioPerl wiki.)
>>
>> -Heikki
>>
>>
>> On Friday 26 January 2007 16:05:01 Chris Fields wrote:
>>> Don't know if it's worth it, but could the microarray package be
>>> modified so that it deals with data generated from or interacts
>>> directly with Bioconductor (i.e. maybe including some specialized
>>> bioperl-run set of classes to run Bioconductor tasks, return
>>> lightweight bioperl microarray classes)?  Allen pointed out in a
>>> previous post that Bioconductor is the best pick for certain tasks,
>>> while Perl excels at others:
>>>
>>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993
>>>
>>> Might be nice if we could merge both strengths together in some way.
>>>
>>> chris
>>>
>>> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote:
>>> > On Jan 25, 2007, at 2:30 AM, Allen Day wrote:
>>> >> Eh, there is some discussion activity on the list, but not much.
>>> >> You are really better off moving to Bioconductor.
>>> >
>>> > Ok, thanks. I added that to the wiki page:
>>> >
>>> >     http://www.bioperl.org/wiki/Microarray_package
>>> >
>>> > j
>>> > seqlab.net
>>> > http://www.bioperl.org/wiki/User:Jhannah
>>> >
>>> > _______________________________________________
>>> > Bioperl-l mailing list
>>> > Bioperl-l at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>
>>
>>
>> -- 
>> ______ _/      _/ _____________________________________________________
>>     _/      _/
>>    _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>>   _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>>  _/  _/  _/  SANBI, South African National Bioinformatics Institute
>> _/  _/  _/  University of Western Cape, South Africa
>>    _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
>> ___ _/_/_/_/_/ ________________________________________________________

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign