[Bioperl-l] Re: grouping sequences by DNA-binding domains --elaboration

Wed Oct 19 18:21:27 EDT 2005

Olena-

If all you want is the description from the CDD ID, then grepping or
hashing or otherwise working with this file will take care of your
needs.

ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddid.tbl
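
A minimal sketch of the hashing approach in Perl, in case it helps. It
assumes cddid.tbl is tab-delimited with the columns PSSM-Id, accession,
short name, description, PSSM length; please check that against the file
you actually download. The helper names load_cdd_table and
looks_dna_binding are made up for illustration, and the regex is only a
crude text filter on the description, not a real classification.

```perl
use strict;
use warnings;

# Build a lookup from CDD identifier to its record.
# ASSUMPTION: each line is tab-delimited as
#   PSSM-Id, accession, short name, description, PSSM length
sub load_cdd_table {
    my ($fh) = @_;
    my %cdd;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;              # skip blank lines
        my ($pssm_id, $acc, $name, $desc) = split /\t/, $line;
        next unless defined $desc;              # skip malformed lines
        my $rec = { accession => $acc, name => $name, description => $desc };
        $cdd{$pssm_id} = $rec;                  # look up by numeric id ...
        $cdd{$acc}     = $rec;                  # ... or by accession
    }
    return \%cdd;
}

# Crude filter: does the free-text description mention DNA binding?
sub looks_dna_binding {
    my ($rec) = @_;
    return $rec->{description} =~ /DNA[-\s]?binding/i ? 1 : 0;
}

# Usage: perl cdd_lookup.pl cddid.tbl ID [ID ...]
if (@ARGV >= 2) {
    my ($table, @ids) = @ARGV;
    open my $fh, '<', $table or die "Cannot open $table: $!";
    my $cdd = load_cdd_table($fh);
    close $fh;
    for my $id (@ids) {
        my $rec = $cdd->{$id};
        if ($rec) {
            my $flag = looks_dna_binding($rec) ? 'DNA-binding?' : '-';
            print join("\t", $id, $rec->{name}, $flag, $rec->{description}), "\n";
        } else {
            print "$id\tnot found\n";
        }
    }
}
```

Keying the hash by both the numeric id and the accession means either
form coming out of SeqHound can be looked up directly.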

Barry

> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Stefan Kirov
> Sent: Tuesday, October 18, 2005 3:28 PM
> To: Brian Osborne
> Cc: bioperl-l; Olena Morozova
> Subject: Re: [Bioperl-l] Re: grouping sequences by DNA-binding domains
> --elaboration
> 
> Certainly you are right, Brian: there is no explicit domain type, as
> there would be in a controlled vocabulary. One can grep for the
> descriptions that mention both DNA and binding, which is not perfect...
> Anyway, I had the feeling Olena needs to know the CDD description
> for a given CDD identifier, which is possible using the
> parser (though it is not the most efficient way).
> Stefan
> 
> Brian Osborne wrote:
> 
> >Stefan,
> >
> >Yes, the hyperlinks are in the text just like they were in our old friend
> >LocusLink. But it seems that Olena wanted information about the domains,
> >like whether or not the domain was DNA-binding - is this in the ASN?
> >
> >In my too-brief response I was attempting to say that starting with a list
> >of domains, or domain ids, and finding out whether they were DNA-binding
> >domains or not seems to imply working with an ontology.
> >
> >Brian O.
> >
> >
> >On 10/18/05 3:33 PM, "Stefan Kirov" <skirov at utk.edu> wrote:
> >
> >
> >
> >>Actually Brian, Bio::SeqIO::entrezgene will extract this data from the
> >>ASN1 file:
> >>
> >>use Bio::SeqIO;
> >># $file (ASN.1 input), $gid (gene id) and $fs (field separator) are
> >># assumed to be set earlier in the script.
> >>my $eio = Bio::SeqIO->new(-file => $file, -format => 'entrezgene',
> >>                          -debug => 'off', -service_record => 'no');
> >>my ($seq, $struct, $uncapt) = $eio->next_seq;
> >>my @contigs = $struct->get_members();  #(-authority=>'genomic');
> >>foreach my $contig (@contigs) {
> >>    if ($contig->authority eq 'Product') {
> >>        foreach my $sf ($contig->get_SeqFeatures) {
> >>            foreach my $dblink ($sf->annotation->get_Annotations('dblink')) {
> >>                my $key = $dblink->{_anchor} ? $dblink->{_anchor}
> >>                                             : $dblink->optional_id;
> >>                my $db = $dblink->database;
> >>                # keep only CDD links or conserved-domain features
> >>                next unless (($db =~ /cdd/i) || ($sf->primary_tag =~ /conserved/i));
> >>                my $desc = '';
> >>                if ($key =~ /:/) {
> >>                    ($key, $desc) = split(/:/, $key);
> >>                }
> >>                print join($fs, $gid, $contig->id, $desc, $key, $sf->score,
> >>                           '', '', $db, $sf->start, $sf->end), "\n";
> >>            }
> >>        }
> >>    }
> >>}
> >>
> >>I guess it is really a good time to write these docs :-)
> >>Stefan
> >>
> >>Brian Osborne wrote:
> >>
> >>
> >>
> >>>Olena,
> >>>
> >>>I'm pretty sure that there's no code in Bioperl that accesses or parses CDD,
> >>>hopefully I'm corrected if I'm wrong.
> >>>
> >>>Brian O.
> >>>
> >>>
> >>>On 10/18/05 2:26 PM, "Olena Morozova" <olenka.m at gmail.com> wrote:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>Hi Brian,
> >>>>
> >>>>Thank you for your reply. It is the CDD (Conserved Domain Database) on
> >>>>the NCBI web site.
> >>>>Olena
> >>>>
> >>>>On 10/18/05, Brian Osborne <brian_osborne at cognia.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>Olena,
> >>>>>
> >>>>>What database contains the information you're looking for?
> >>>>>
> >>>>>Brian O.
> >>>>>
> >>>>>
> >>>>>On 10/16/05 8:17 PM, "Olena Morozova" <olenka.m at gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>Hi again,
> >>>>>>
> >>>>>>I just figured out how to obtain a list of conserved domains for a
> >>>>>>given sequence using the SeqHound.pm module available at
> >>>>>>http://www.blueprint.org/seqhound/apifunctslist.html
> >>>>>>
> >>>>>>Now I have a list of conserved domains for a given sequence and I need
> >>>>>>to extract information as to what these domains are and which ones are
> >>>>>>DNA-binding. Any help on this will be greatly appreciated.
> >>>>>>
> >>>>>>Thanks again,
> >>>>>>Olena
> >>>>>>
> >>>>>>
> >>>>>>On 10/16/05, Olena Morozova <olenka.m at gmail.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>I have a list of transcription factor sequences, and I need to group
> >>>>>>>them according to the DNA-binding domains based on the classification
> >>>>>>>by TRANSFAC or any other database. Basically, I just need to extract
> >>>>>>>the DNA-binding domain information for a particular TF from a database
> >>>>>>>like TRANSFAC (I don't know what other databases would have this
> >>>>>>>information, but any will do). Does anyone have any idea how to do this?
> >>>>>>>Thank you very much for your help and time.
> >>>>>>>
> >>>>>>>Olena
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>_______________________________________________
> >>>>>>Bioperl-l mailing list
> >>>>>>Bioperl-l at portal.open-bio.org
> >>>>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>