[Bioperl-l] Finding seqs of given domain architecture
Chris Fields
cjfields at uiuc.edu
Wed Apr 16 17:52:59 UTC 2008
You can try CDART:
http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps
There are probably other tools out there as well.
If you want to roll your own, you can use the BioPerl wrappers for
BLAST/PSI-BLAST and HMMER (Bio::Tools::Run::StandAloneBlast is in
bioperl-live, Bio::Tools::Run::Hmmer in bioperl-run). Tweak the
parameters as you see fit, and either parse the output while the
searches run or save the reports and parse them later with
Bio::SearchIO. Personally, I wouldn't go with your option (2) unless
you are absolutely sure the domains occur only once per sequence, are
spatially conserved, and don't overlap. Many proteins have domain
architectures like dom1-dom2, dom2-dom1, dom1-dom1-dom2, etc.
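Whichever search you use, you can tell those architectures apart by ordering each protein's domain hits along the sequence. Below is a minimal sketch. It assumes you have already pulled per-protein hits out of your reports (for example with Bio::SearchIO). The `[name, start, end]` hit layout and the `architecture`/`has_overlap` helpers are hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Each hit is [domain_name, start, end] on a single protein. This layout is
# an assumption for illustration; fill it from your own parsed reports.

# Domain names ordered N- to C-terminal by start coordinate, joined with '-'.
sub architecture {
    my @hits = sort { $a->[1] <=> $b->[1] } @_;
    return join '-', map { $_->[0] } @hits;
}

# Returns 1 if any two hits overlap on the sequence, else 0.
# After sorting by start, comparing neighbours is enough to detect this.
sub has_overlap {
    my @hits = sort { $a->[1] <=> $b->[1] } @_;
    for my $i (1 .. $#hits) {
        return 1 if $hits[$i][1] <= $hits[ $i - 1 ][2];
    }
    return 0;
}

# Example protein with a repeated domain (made-up coordinates)
my @protein = (['dom2', 150, 260], ['dom1', 10, 120], ['dom1', 300, 410]);
print architecture(@protein), "\n";                           # dom1-dom2-dom1
print has_overlap(@protein) ? "overlap\n" : "no overlap\n";   # no overlap
```

A protein then matches if `architecture(...)` equals the string you want, e.g. 'dom1-dom2'. With hmmpfam reports, the hit name should be the Pfam model and the HSP query coordinates the position on the protein. Check that against your own output before relying on it.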
If you just want the accessions from Pfam's Stockholm-format alignments
(which are UniProt accessions, I believe), you can get them with
Bio::AlignIO::stockholm (this example needs perl 5.10 for 'say'):
use strict;
use warnings;
use feature 'say';    # requires perl 5.10+
use Bio::AlignIO;

my $file = shift || die "Must pass file as argument\n";
my $in = Bio::AlignIO->new(-format => 'stockholm',
                           -file   => $file);

# print one comma-separated line of accessions per alignment
while (my $aln = $in->next_aln) {
    my @accs;
    for my $seq ($aln->each_seq) {
        push @accs, $seq->accession_number;
    }
    say join(',', @accs);
}
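To find sequences carrying both domains, you could also run a loop like this over two Pfam family alignments and intersect the accession lists. This is option (1) using Pfam alignments instead of BLAST hits. Below is a rough sketch. `common_accessions` and `stockholm_accessions` are made-up helper names. BioPerl is loaded only when two files are given on the command line. Note that this tests co-occurrence only, not domain order.

```perl
use strict;
use warnings;

# Accessions present in every list, in first-list order, without duplicates.
sub common_accessions {
    my ($first, @rest) = @_;
    my %count;    # number of the other lists each accession appears in
    for my $list (@rest) {
        my %seen = map { $_ => 1 } @$list;
        $count{$_}++ for keys %seen;
    }
    my %uniq;
    return grep { !$uniq{$_}++ && ($count{$_} || 0) == @rest } @$first;
}

# Accession numbers from every alignment in one Stockholm file.
sub stockholm_accessions {
    my ($file) = @_;
    require Bio::AlignIO;    # loaded at run time, only when files are given
    my $in = Bio::AlignIO->new(-format => 'stockholm', -file => $file);
    my @accs;
    while (my $aln = $in->next_aln) {
        push @accs, map { $_->accession_number } $aln->each_seq;
    }
    return \@accs;
}

# Usage: perl common_accs.pl dom1.sto dom2.sto  (file names are placeholders)
if (@ARGV == 2) {
    print "$_\n" for common_accessions(map { stockholm_accessions($_) } @ARGV);
}
```

For domain order, you would still need coordinates. Pfam alignments usually name each sequence as name/start-end, so you could derive the ordering from those ranges.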
chris
On Apr 16, 2008, at 11:12 AM, Jacob Keller wrote:
> Hello All,
>
> I am new to this list and am not sure this is the right forum, so
> please forgive me if this is the wrong place to ask the following
> question: I am seeking to get all sequences that have a given domain
> architecture, or at least that contain two given domains. I have
> thought of a few ways to do this.
>
> 1. Blast/Psi-blast for each domain, then compare the results for
> common sequences between the two lists, and fetch those. I would
> need to write a (simple) script to do this, but would prefer not to
> re-invent the wheel.
>
> 2. Search with a paradigm sequence of desired architecture/domain
> composition, somehow tweaking the psiblast parameters to find only
> matches over the whole search sequence, thereby finding both desired
> domains. I am not sure how to tweak blast to do this, though.
>
> 3. Pfam has this capability, i.e. it can show all proteins with a
> given architecture, but it is difficult to get at the actual
> sequences or even a list of accession numbers.
>
> Does anybody have any suggestions on how best to get these
> sequences?
>
> Thanks for your consideration,
>
> Jacob
>
> *******************************************
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> Dallos Laboratory
> F. Searle 1-240
> 2240 Campus Drive
> Evanston IL 60208
> lab: 847.491.2438
> cel: 773.608.9185
> email: j-keller2 at northwestern.edu
> *******************************************
>
> ----- Original Message -----
> From: "Heikki Lehvaslaiho" <heikki at sanbi.ac.za>
> To: <bioperl-l at lists.open-bio.org>
> Cc: <allenday at ucla.edu>; "Chris Fields" <cjfields at uiuc.edu>; "Jay Hannah" <jay at jays.net>; <bioperl-l at bioperl.org>
> Sent: Wednesday, April 16, 2008 6:36 AM
> Subject: Re: [Bioperl-l] bioperl-microarray: status?
>
>
>> FYI,
>>
>> Christopher Jones has just published
>> [http://bioinformatics.oxfordjournals.org/cgi/content/short/24/8/1102 an article in Bioinformatics]
>> about his
>> [http://search.cpan.org/perldoc?Microarray Microarray perl module]
>> in CPAN.
>>
>> (The text added into BioPerl wiki.)
>>
>> -Heikki
>>
>>
>> On Friday 26 January 2007 16:05:01 Chris Fields wrote:
>>> Don't know if it's worth it, but could the microarray package be
>>> modified so that it deals with data generated from or interacts
>>> directly with Bioconductor (i.e. maybe including some specialized
>>> bioperl-run set of classes to run Bioconductor tasks, return
>>> lightweight bioperl microarray classes)? Allen pointed out in a
>>> previous post that Bioconductor is the best pick for certain tasks,
>>> while Perl excels at others:
>>>
>>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993
>>>
>>> Might be nice if we could merge both strengths together in some way.
>>>
>>> chris
>>>
>>> On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote:
>>> > On Jan 25, 2007, at 2:30 AM, Allen Day wrote:
>>> >> Eh, there is some discussion activity on the list, but not
>>> >> much. You are really better off moving to Bioconductor.
>>> >
>>> > Ok, thanks. I added that to the wiki page:
>>> >
>>> > http://www.bioperl.org/wiki/Microarray_package
>>> >
>>> > j
>>> > seqlab.net
>>> > http://www.bioperl.org/wiki/User:Jhannah
>>> >
>>> > _______________________________________________
>>> > Bioperl-l mailing list
>>> > Bioperl-l at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>
>>
>>
>> --
>> Heikki Lehvaslaiho heikki at_sanbi _ac _za
>> Senior Scientist skype: heikki_lehvaslaiho
>> SANBI, South African National Bioinformatics Institute
>> University of Western Cape, South Africa
>> Phone: +27 21 959 2096 FAX: +27 21 959 2512