[Bioperl-l] Taxonomy hierarchy extraction
Jason Stajich
jason at bioperl.org
Tue Jun 19 00:17:34 UTC 2007
All the children are in this array.
You get to decide what you want to do with them. In the following
example I print the id, rank, and scientific name out to the screen.
Because this is a taxonomy db query, you are getting back
Bio::Taxonomy::Taxon objects, so read the documentation for this
module to see what you can do with the objects.
I would also suggest spending a little time with the Getting started
and HOWTO:Trees documentation on the website to get familiar with the
objects and nomenclature.
my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
for my $child ( @extant_children ) {
    print "id is ", $child->id, "\n";                            # NCBI taxon id
    print "rank is ", $child->rank, "\n";                        # e.g. species
    print "scientific name is ", $child->scientific_name, "\n";  # scientific name
}
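On the question of where the ids end up: as far as I know, the -directory argument (/tmp in the scripts quoted below) is only where the flatfile backend keeps its index files. The descendants exist only in @extant_children until you write them somewhere. Here is a minimal sketch for saving them to a tab-delimited file instead of printing to the screen. It assumes only that each element has the id, rank and scientific_name methods used above. The helper name write_taxa_report and the output filename are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): write one tab-delimited line per
# taxon. Works with any object that has id, rank and scientific_name methods,
# such as the taxon objects collected in @extant_children above.
sub write_taxa_report {
    my ($path, @taxa) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} join("\t", 'taxid', 'rank', 'scientific_name'), "\n";
    for my $taxon (@taxa) {
        # Call each accessor by name; print an empty field if it returns undef.
        my @fields = map { my $v = $taxon->$_; defined $v ? $v : '' }
                     qw(id rank scientific_name);
        print {$out} join("\t", @fields), "\n";
    }
    close $out or die "Cannot close $path: $!";
    return scalar @taxa;
}

# Usage, assuming @extant_children from the loop above:
# my $count = write_taxa_report('descendants_33090.tsv', @extant_children);
# print "wrote $count taxa\n";
```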
On Jun 18, 2007, at 5:04 PM, George Heller wrote:
> OK, I installed the latest Scalar::Util and the script seems to
> be working. But I am confused about where exactly I need to look for the
> descendant taxon ids once the script is run. I did look in the
> /tmp/ directory, but I couldn't make much sense of it.
>
> Sorry to bother you; I really appreciate your patience.
>
> Thanks.
> George
>
> Jason Stajich <jason at bioperl.org> wrote:
> Try installing the latest Scalar::Util
> On Jun 18, 2007, at 4:05 PM, George Heller wrote:
>
> This is the output of /usr/bin/perl -V
>
>
> Summary of my perl5 (revision 5 version 8 subversion 5)
> configuration:
> Platform:
> osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
> archname=i386-linux-thread-multi
> uname='linux hs20-bc1-4.build.redhat.com
> 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
> i686 i386 gnulinux '
> config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
> -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
> -Dperladmin=root at localhost -Dcc=gcc -Dcf_by=Red Hat, Inc.
> -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
> -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
> -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
> -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
> -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less
> -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1 5.8.0'
> hint=recommended, useposix=true, d_sigaction=define
> usethreads=define use5005threads=undef useithreads=define
> usemultiplicity=define
> useperlio=define d_sfio=undef uselargefiles=define
> usesocks=undef
> use64bitint=undef use64bitall=undef uselongdouble=undef
> usemymalloc=n, bincompat5005=undef
> Compiler:
> cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
> optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
> cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing
> -pipe -I/usr/local/include -I/usr/include/gdbm'
> ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)',
> gccosandvers=''
> intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
> d_longlong=define, longlongsize=8, d_longdbl=define,
> longdblsize=12
> ivtype='long', ivsize=4, nvtype='double', nvsize=8,
> Off_t='off_t', lseeksize=8
> alignbytes=4, prototype=define
> Linker and Libraries:
> ld='gcc', ldflags =' -L/usr/local/lib'
> libpth=/usr/local/lib /lib /usr/lib
> libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil
> -lpthread -lc
> perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
> libc=/lib/libc-2.3.4.so, so=so, useshrplib=true,
> libperl=libperl.so
> gnulibc_version='2.3.4'
> Dynamic Linking:
> dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef,
> ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
> cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
>
>
> Characteristics of this binary (from libperl):
> Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
> USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
> Built under linux
> Compiled at Jul 24 2006 18:28:10
> @INC:
> /usr/lib/perl5/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/5.8.5
> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.5
> /usr/lib/perl5/site_perl/5.8.4
> /usr/lib/perl5/site_perl/5.8.3
> /usr/lib/perl5/site_perl/5.8.2
> /usr/lib/perl5/site_perl/5.8.1
> /usr/lib/perl5/site_perl/5.8.0
> /usr/lib/perl5/site_perl
> /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.5
> /usr/lib/perl5/vendor_perl/5.8.4
> /usr/lib/perl5/vendor_perl/5.8.3
> /usr/lib/perl5/vendor_perl/5.8.2
> /usr/lib/perl5/vendor_perl/5.8.1
> /usr/lib/perl5/vendor_perl/5.8.0
> /usr/lib/perl5/vendor_perl
>
>
> Thanks.
> George
>
>
> Hilmar Lapp <hlapp at gmx.net> wrote:
> The perl version appears to be 5.8.5, though, so something strange
> seems to be going on as well.
>
> George, can you please post the output of
>
> $ /usr/bin/perl -V
>
> -hilmar
>
> On Jun 18, 2007, at 6:33 PM, Chris Fields wrote:
>
> As the error implies, your local version of perl doesn't seem to
> support weak references, which means it doesn't have Scalar::Util
> (which was added to core after perl 5.6.1, I think). Try installing
> Scalar::Util to see what happens.
>
> chris
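Here is a quick, BioPerl-free way to check whether the installed Scalar::Util can actually create weak references. Bio/Tree/Node.pm presumably imports weaken at the line named in the error. As far as I know, older Scalar::Util releases produce exactly this "Weak references are not implemented" message when their compiled (XS) part is missing and the pure-Perl fallback refuses to export weaken. This is a minimal sketch using only Scalar::Util's documented weaken and isweak. The helper name weak_refs_supported is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (1, $version) if Scalar::Util can create weak
# references, or (0, $error) if loading it, importing weaken or weakening fails.
sub weak_refs_supported {
    my $ok = eval {
        require Scalar::Util;
        # Older non-XS installs are expected to croak right here with
        # "Weak references are not implemented in the version of perl".
        Scalar::Util->import(qw(weaken isweak));
        my $target = {};
        my $ref    = $target;
        Scalar::Util::weaken($ref);
        Scalar::Util::isweak($ref) or die "weaken() had no effect\n";
        1;
    };
    return $ok ? (1, $Scalar::Util::VERSION) : (0, $@);
}

my ($supported, $detail) = weak_refs_supported();
printf "perl %s: %s\n", $],
    $supported ? "weak references OK (Scalar::Util $detail)"
               : "weak references NOT available: $detail";
```

Running it with the same /usr/bin/perl that runs the BioPerl script shows whether reinstalling Scalar::Util actually took effect for that interpreter.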
>
> On Jun 18, 2007, at 5:18 PM, George Heller wrote:
>
> I tried running the script below and I get the following error:
>
> Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76
> BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76.
> Compilation failed in require at my.pl line 7.
> BEGIN failed--compilation aborted at my.pl line 7.
>
> My script looks something like this:
>
> #!/usr/bin/perl
> use strict;
> #use warnings;
> use DBI;
> use Bio::Tree::Node;
> use Bio::DB::Taxonomy;
> use Bio::DB::Taxonomy::flatfile;
> my $idx_dir = '/tmp';
>
> my ($nodesfile, $namesfile) = ('nodes.dmp', 'names.dmp');
> my $db = new Bio::DB::Taxonomy(-source    => 'flatfile',
>                                -nodesfile => $nodesfile,
>                                -namesfile => $namesfile,
>                                -directory => $idx_dir);
> my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>
> foreach my $field (@extant_children) {
>     print "$field";
>     print "|";
>     print "\n";
> }
>
> And I am running the script using the command
>
> perl myscript.pl -v --names names.dmp --nodes nodes.dmp
>
> and I have the nodes.dmp and names.dmp files in the current
> directory.
>
>
> Thanks,
> George
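As posted, the script never reads @ARGV. The names nodes.dmp and names.dmp are hard-coded, so the -v, --names and --nodes flags on the command line above are silently ignored. It works only because the files happen to be in the current directory. Here is a minimal sketch of wiring those flags up with the core Getopt::Long module. The helper parse_args is hypothetical, and the option names simply mirror the command line George used.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse -v, --nodes FILE and --names FILE from a list
# of arguments. Defaults match the file names the posted script hard-codes.
sub parse_args {
    my @args = @_;
    my %opt  = (nodes => 'nodes.dmp', names => 'names.dmp', verbose => 0);
    GetOptionsFromArray(
        \@args,
        'nodes=s'   => \$opt{nodes},
        'names=s'   => \$opt{names},
        'v|verbose' => \$opt{verbose},
    ) or die "usage: $0 [-v] [--nodes FILE] [--names FILE]\n";
    return \%opt;
}

my $opt = parse_args(@ARGV);
warn "using nodes=$opt->{nodes} names=$opt->{names}\n" if $opt->{verbose};

# Then pass the parsed values on instead of the hard-coded names:
# my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
#                                 -nodesfile => $opt->{nodes},
#                                 -namesfile => $opt->{names},
#                                 -directory => '/tmp');
```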
>
>
>
>
> Jason Stajich wrote:
> It is implemented in the implementing class; DB::Taxonomy is
> just the base class. For example, see the flatfile implementation,
> Bio::DB::Taxonomy::flatfile.
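Here is why the throw_not_implemented stub in the base class does not get in the way. As I understand it, Bio::DB::Taxonomy->new(-source => 'flatfile', ...) hands back an object of the implementing class, so method calls resolve to that subclass first. The sketch below reproduces that base-class/implementation pattern in plain Perl. The package names My::Taxonomy and My::Taxonomy::flatfile and the toy -children data are hypothetical stand-ins, not BioPerl code.

```perl
use strict;
use warnings;

package My::Taxonomy;            # hypothetical stand-in for the base class
sub new {
    my ($class, %args) = @_;
    # Called on the base class with -source: hand off to that implementation.
    if ($class eq __PACKAGE__ and my $source = delete $args{-source}) {
        return "My::Taxonomy::$source"->new(%args);
    }
    return bless { %args }, $class;
}
# Interface method: the base class only declares it.
sub each_Descendent { die ref(shift) . " does not implement each_Descendent\n" }

package My::Taxonomy::flatfile;  # hypothetical implementing class
our @ISA = ('My::Taxonomy');
sub each_Descendent {
    my ($self, $id) = @_;
    return @{ $self->{-children}{$id} || [] };   # toy data instead of real indexes
}

package main;
my $db = My::Taxonomy->new(-source => 'flatfile', -children => { 1 => [2, 3] });
print ref($db), ": ", join(',', $db->each_Descendent(1)), "\n";
```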
>
> See scripts/taxa/local_taxonomydb_query.PLS for an example of using
> it; the nodes and names files come from the NCBI taxonomy database.
>
> Here is an un-debugged copy+paste for your question that *should*
> work.
>
> use strict;
> use warnings;
> use Bio::DB::Taxonomy;
>
> my $idx_dir = '/tmp';
> my ($nodesfile, $namesfile) = ('nodes.dmp', 'names.dmp');
> my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
>                                 -nodesfile => $nodesfile,
>                                 -namesfile => $namesfile,
>                                 -directory => $idx_dir);
> my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>
> -jason
>
>
> On Jun 18, 2007, at 10:07 AM, George Heller wrote:
>
>
> What exactly is the "node n" in the query below? When I issue
> this query, it says,
>
> relation "node" does not exist.
>
> I tried to use the get_all_Descendents method, but it looks like
> in order to do a recursive call it calls the method
> each_Descendent. This method is not implemented in
> Bio::DB::Taxonomy. It just has a single line,
>
> shift->throw_not_implemented();
>
> Thanks.
> George.
>
> Hilmar Lapp wrote:
> I'm a bit confused - it sounds like you have set up a local BioSQL
> database and loaded the NCBI taxonomy into the database. You can now
> use simple SQL to retrieve all descendants of a node in the tree
> given its NCBI taxonID, such as
>
> SELECT tn.*, tnm.name FROM taxon tn, taxon_name tnm, node n
> WHERE
> n.ncbi_taxon_id = :taxonID
> AND tn.left_value > n.left_value
> AND tn.right_value < n.right_value
> AND tn.taxon_id = tnm.taxon_id
> AND tnm.name_class = 'scientific name'
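The query depends on the nested-set numbering stored in BioSQL's taxon table (left_value/right_value, which as I understand it load_ncbi_taxonomy.pl fills in). A node's descendants are exactly the rows whose left_value is greater than the node's own and whose right_value is smaller. Here is a minimal pure-Perl sketch of that numbering on a small made-up tree, applying the same range test the SQL uses. The taxon ids and the parent map are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical toy tree: child taxid => parent taxid (root is 1).
my %parent = (2 => 1, 3 => 1, 4 => 2, 5 => 2, 6 => 3, 7 => 5);

# Build parent => [children], sorted so the numbering is deterministic.
my %children;
push @{ $children{ $parent{$_} } }, $_ for sort { $a <=> $b } keys %parent;

# Depth-first walk with one increasing counter: left value on entry,
# right value on exit. Every subtree ends up inside its ancestor's range.
my (%left, %right);
my $counter = 0;
sub number_nodes {
    my ($id) = @_;
    $left{$id} = ++$counter;
    number_nodes($_) for @{ $children{$id} || [] };
    $right{$id} = ++$counter;
}
number_nodes(1);

# Same test as the SQL: tn.left_value > n.left_value AND tn.right_value < n.right_value
sub descendants_by_range {
    my ($n) = @_;
    return sort { $a <=> $b }
           grep { $left{$_} > $left{$n} && $right{$_} < $right{$n} } keys %left;
}

print join(',', descendants_by_range(2)), "\n";   # prints 4,5,7
```

Once the values are stored, finding all descendants is a single range comparison, with no recursion at query time.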
>
> BioPerl doesn't have a Taxonomy::biosql module yet (though this would
> seem like a worthwhile thing to add), so you can't use the
> Bio::DB::Taxonomy interface to do this against a BioSQL instance.
>
> However, BioPerl does have support for the flat-file download of the
> NCBI taxonomy database and indexes it, so you can simply use
> Taxonomy::{get_taxon,get_all_Descendents} with the flat-file download
> to achieve what you want in fewer than 5 lines of perl.
>
> Although the recursive implementation of
> Taxonomy::get_all_Descendents() won't be lightning fast, it may still
> be perfectly fine for your application - are you sure it is not?
>
> -hilmar
>
> On Jun 18, 2007, at 12:21 AM, George Heller wrote:
>
> Thanks. And how can I set $node in the code below so that it
> refers to a particular taxon id record? I want to retrieve all the
> descendants in the taxonomy hierarchy of a given taxon id.
>
> I have a local db set up, into which I have loaded data using the
> load_ncbi_taxonomy.pl script.
>
> Thanks.
> George
>
> Jason Stajich wrote:
> I assume you already figured out how to set up a local taxonomy db?
>
> You just want the extant species/leaves of the tree:
>
> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>
> -jason
> On Jun 17, 2007, at 11:41 AM, George Heller wrote:
>
> Hi all,
>
> Can anyone point me to an example that uses the
> get_all_Descendents method from Bio::DB::Taxonomy? I am a newbie at
> this, and I am not quite sure how to use it.
>
> Thanks.
> George
>
> Sendu Bala wrote:
> George Heller wrote:
> Hi all,
>
> I am looking at extracting the taxonomy hierarchy for some taxon ids.
> What I plan to do is, for a given taxon id, say 33090, extract all
> taxon ids that are children of this taxon. I do not just want the
> immediate children, but the children's children and so on.
>
> Any ideas on how I can go about doing this?
>
> Well, you'll use Bio::DB::Taxonomy presumably, and each_Descendent in
> some kind of looping structure. Most easily a recursing sub.
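The recursing-sub idea can also be sketched without BioPerl, working directly from the NCBI nodes.dmp file. In that file, fields are separated by "\t|\t" and the first two are tax_id and parent tax_id. This is a minimal pure-Perl sketch, not the Bio::DB::Taxonomy code path. The sub names are hypothetical, and the tiny in-memory sample stands in for the real file. Descendants with no children of their own correspond to the is_Leaf filter used elsewhere in this thread.

```perl
use strict;
use warnings;

# Read NCBI nodes.dmp lines ("tax_id\t|\tparent\t|\trank\t|...") from a
# filehandle and return a hashref: parent tax_id => [child tax_ids].
sub read_children {
    my ($fh) = @_;
    my %children;
    while (my $line = <$fh>) {
        my ($id, $parent) = split /\t\|\t/, $line;
        next unless defined $parent;
        next if $id == $parent;          # the root (tax_id 1) is its own parent
        push @{ $children{$parent} }, $id;
    }
    return \%children;
}

# Recursively collect every descendant: children, grandchildren, and so on.
sub all_descendants {
    my ($children, $id) = @_;
    return map { ($_, all_descendants($children, $_)) } @{ $children->{$id} || [] };
}

# Descendants that have no children of their own, i.e. the leaves.
sub leaf_descendants {
    my ($children, $id) = @_;
    return grep { !$children->{$_} } all_descendants($children, $id);
}

# Usage with a tiny in-memory sample (hypothetical ids) instead of the real file;
# with the real dump you would open nodes.dmp and ask for taxon 33090.
my $sample = join '', map { join("\t|\t", @$_) . "\t|\n" }
    [1, 1, 'no rank'], [10, 1, 'genus'], [11, 10, 'species'],
    [12, 10, 'species'], [13, 12, 'subspecies'];
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my $children = read_children($fh);
print join(',', sort { $a <=> $b } all_descendants($children, 10)), "\n";   # 11,12,13
print join(',', sort { $a <=> $b } leaf_descendants($children, 10)), "\n";  # 11,13
```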
>
> If you happen to code up something neat and efficient, why not share it
> with us? We could add it to the Taxonomy module(s).
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/