[Bioperl-l] Taxonomy hierarchy extraction
Jason Stajich
jason at bioperl.org
Tue Jun 19 01:05:43 UTC 2007
The files are indexes: the flatfile is indexed on the first run, which
speeds up lookups, so the second time you run the script it doesn't
have to index again.
You don't need to look at the files; they won't make sense to a human!
The reason it isn't printing anything is that the implementation wasn't
written quite right. This code was overhauled by Sendu before the last
release, and I guess something didn't quite get connected.
I checked in code that now has Bio::Taxon delegating to a DB handle
for the each_Descendent call.
You can either patch your code or just use the code listed here:
http://bioperl.org/wiki/Module:Bio::DB::Taxonomy
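If it helps to see what the descendant walk amounts to without any
BioPerl involved, here is a minimal sketch in plain Perl. It assumes the
NCBI nodes.dmp layout (tax_id, parent tax_id, rank, ... separated by
tab-pipe-tab); the sub names build_tree and all_descendants and the toy
records are made up for illustration, and this is not the
Bio::DB::Taxonomy implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a parent -> [children] map and an id -> rank map from
# nodes.dmp-style lines (assumed layout: tax_id, parent tax_id, rank, ...).
sub build_tree {
    my (@lines) = @_;
    my (%children, %rank);
    for my $line (@lines) {
        # Drop the trailing "\t|" terminator before splitting the fields.
        (my $clean = $line) =~ s/\t\|\s*$//;
        my ($id, $parent, $rank) = split /\t\|\t/, $clean;
        $rank{$id} = $rank;
        # The root lists itself as its own parent; skip that self-link.
        push @{ $children{$parent} }, $id unless $id == $parent;
    }
    return (\%children, \%rank);
}

# Recursively collect every descendant id below $id, depth-first.
sub all_descendants {
    my ($children, $id) = @_;
    my @found;
    for my $child (@{ $children->{$id} || [] }) {
        push @found, $child, all_descendants($children, $child);
    }
    return @found;
}

# Hypothetical toy records, NOT real NCBI data.
my @lines = (
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tkingdom\t|",
    "20\t|\t10\t|\tgenus\t|",
    "21\t|\t20\t|\tspecies\t|",
    "22\t|\t20\t|\tspecies\t|",
    "30\t|\t10\t|\tspecies\t|",
);

my ($children, $rank) = build_tree(@lines);
my @descendants = all_descendants($children, 10);

# Leaves are the descendants that have no children of their own.
my @leaves = grep { !exists $children->{$_} } @descendants;

print "id is $_, rank is $rank->{$_}\n" for @leaves;
```

The flatfile indexes in /tmp play roughly the role of the %children and
%rank hashes here, just stored on disk so they survive between runs.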
On Jun 18, 2007, at 5:29 PM, George Heller wrote:
> But the problem is that I don't really get any output on the
> screen. In the /tmp directory I get 4 files, namely parents, nodes,
> id2names and names2id, but I don't know what to make of them. This
> is what my script looks like:
>
> #!/usr/bin/perl
> use strict;
> #use warnings;
> use DBI;
> use Bio::Tree::Node;
> use Bio::DB::Taxonomy;
> use Bio::DB::Taxonomy::flatfile;
> my $idx_dir = '/tmp';
> my $nodefile;
> my $namesfile;
>
> my ($nodefile,$namesfile) = ('nodes.dmp','names.dmp');
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
> -nodesfile => $nodefile,
> -namesfile => $namesfile,
> -directory => $idx_dir);
> my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>
> for my $child ( @extant_children ) {
> print "id is ", $child->id, "\n"; # NCBI taxa id
> print "rank is ", $child->rank, "\n"; # e.g. species
> print "scientific name is ", $child->scientific_name, "\n"; # scientific name
> }
>
> Thanks.
> George
>
> Jason Stajich <jason at bioperl.org> wrote:
> All the children are in this array.
>
> You get to decide what you want to do with them. In the following
> example I print the id, rank, and scientific name out to the screen.
> Because this is a taxonomy db query you are getting back
> Bio::Taxonomy::Taxon objects so read the documentation for this
> module to see what you can do with the object.
> I would also suggest spending a little time with the Getting
> started and HOWTO:Trees documentation on the website to get
> familiar with the objects and nomenclature.
>
> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>
> for my $child ( @extant_children ) {
> print "id is ", $child->id, "\n"; # NCBI taxa id
> print "rank is ", $child->rank, "\n"; # e.g. species
> print "scientific name is ", $child->scientific_name, "\n"; # scientific name
> }
>
> On Jun 18, 2007, at 5:04 PM, George Heller wrote:
>
> Ok, I installed the latest of Scalar::Util and the script seems
> to be working. But I am confused where exactly I need to look for
> the descendent taxon ids once the script is run. I did look into
> the /tmp/ directory, but I couldn't understand much.
>
> Sorry to be bothering, really appreciate your patience.
>
> Thanks.
> George
>
> Jason Stajich <jason at bioperl.org> wrote:
> Try installing the latest Scalar::Util
> On Jun 18, 2007, at 4:05 PM, George Heller wrote:
>
> This is the output of /usr/bin/perl -V
>
> Summary of my perl5 (revision 5 version 8 subversion 5)
> configuration:
> Platform:
> osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
> archname=i386-linux-thread-multi
> uname='linux hs20-bc1-4.build.redhat.com
> 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
> i686 i386 gnulinux '
> config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386 -
> mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost -
> Dperladmin=root at localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -
> Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -
> Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -
> Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -
> Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -
> Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/
> less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1 5.8.0'
> hint=recommended, useposix=true, d_sigaction=define
> usethreads=define use5005threads=undef useithreads=define
> usemultiplicity=define
> useperlio=define d_sfio=undef uselargefiles=define
> usesocks=undef
> use64bitint=undef use64bitall=undef uselongdouble=undef
> usemymalloc=n, bincompat5005=undef
> Compiler:
> cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -
> fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -
> D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
> optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
> cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-
> strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
> ccversion='', gccversion='3.4.6 20060404 (Red Hat
> 3.4.6-2)', gccosandvers=''
> intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
> d_longlong=define, longlongsize=8, d_longdbl=define,
> longdblsize=12
> ivtype='long', ivsize=4, nvtype='double', nvsize=8,
> Off_t='off_t', lseeksize=8
> alignbytes=4, prototype=define
> Linker and Libraries:
> ld='gcc', ldflags =' -L/usr/local/lib'
> libpth=/usr/local/lib /lib /usr/lib
> libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -
> lpthread -lc
> perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
> libc=/lib/libc-2.3.4.so, so=so, useshrplib=true,
> libperl=libperl.so
> gnulibc_version='2.3.4'
> Dynamic Linking:
> dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-
> Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
> cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
>
> Characteristics of this binary (from libperl):
> Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
> USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
> Built under linux
> Compiled at Jul 24 2006 18:28:10
> @INC:
> /usr/lib/perl5/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/5.8.5
> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.5
> /usr/lib/perl5/site_perl/5.8.4
> /usr/lib/perl5/site_perl/5.8.3
> /usr/lib/perl5/site_perl/5.8.2
> /usr/lib/perl5/site_perl/5.8.1
> /usr/lib/perl5/site_perl/5.8.0
> /usr/lib/perl5/site_perl
> /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.5
> /usr/lib/perl5/vendor_perl/5.8.4
> /usr/lib/perl5/vendor_perl/5.8.3
> /usr/lib/perl5/vendor_perl/5.8.2
> /usr/lib/perl5/vendor_perl/5.8.1
> /usr/lib/perl5/vendor_perl/5.8.0
> /usr/lib/perl5/vendor_perl
>
> Thanks.
> George
>
> Hilmar Lapp <hlapp at gmx.net> wrote:
> The perl version appears to be 5.8.5 though, so something strange
> appears to be going on too.
>
> George, can you please post the output of
>
> $ /usr/bin/perl -V
>
> -hilmar
>
> On Jun 18, 2007, at 6:33 PM, Chris Fields wrote:
>
> As the error implies, your local version of perl doesn't seem to
> support weak references, which means it doesn't have Scalar::Util
> (which was added to core after perl 5.6.1, I think). Try installing
> Scalar::Util to see what happens.
>
> chris
>
> On Jun 18, 2007, at 5:18 PM, George Heller wrote:
>
> I tried running the below mentioned script and I seem to be getting
> the following error:
>
> Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76
> BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76.
> Compilation failed in require at my.pl line 7.
> BEGIN failed--compilation aborted at my.pl line 7.
>
> My script looks something like,
>
> #!/usr/bin/perl
> use strict;
> #use warnings;
> use DBI;
> use Bio::Tree::Node;
> use Bio::DB::Taxonomy;
> use Bio::DB::Taxonomy::flatfile;
> my $idx_dir = '/tmp';
>
> my ($nodefile,$namesfile) = ('nodes.dmp','names.dmp');
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
> -nodesfile => $nodefile,
> -namesfile => $namesfile,
> -directory => $idx_dir);
> my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>
> foreach my $field (@extant_children) {
> print "$field";
> print "|";
> print "\n";
> }
>
> And I am running the script using the command,
>
> perl myscript.pl -v --names names.dmp --nodes nodes.dmp
>
> and I have the nodes.dmp and names.dmp files in the current
> directory.
>
> Thanks,
> George
>
> Jason Stajich wrote:
> It is implemented in the implementing class - Bio::DB::Taxonomy is
> just the base class. For example, see the flatfile implementation,
> Bio::DB::Taxonomy::flatfile.
>
> See scripts/taxa/local_taxonomydb_query.PLS for an example of using
> it; the nodes and names files are from the NCBI taxonomy database.
>
> Here is an un-debugged copy+paste for your question that *should*
> work.
>
> use Bio::DB::Taxonomy;
> my $idx_dir = '/tmp';
>
> my ($nodefile,$namesfile) = ('nodes.dmp','names.dmp');
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
> -nodesfile => $nodefile,
> -namesfile => $namesfile,
> -directory => $idx_dir);
> my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>
> -jason
>
> On Jun 18, 2007, at 10:07 AM, George Heller wrote:
>
> What exactly is the "node n" in the query below? When I issue
> this query, it says,
>
> relation "node" does not exist.
>
> I tried to use the get_all_Descendents method but it looks like
> in order to do a recursive call it calls the method
> each_Descendent. This method is not implemented in
> Bio::DB::Taxonomy. It just has a single line,
>
> shift->throw_not_implemented();
>
> Thanks.
> George.
>
> Hilmar Lapp wrote:
> I'm a bit confused - it sounds like you have set up a local BioSQL
> database and loaded the NCBI taxonomy into the database. You can now
> use simple SQL to retrieve all descendants of a node in the tree
> given its NCBI taxonID such as
>
> SELECT tn.*, tnm.name FROM taxon tn, taxon_name tnm, node n
> WHERE
> n.ncbi_taxon_id = :taxonID
> AND tn.left_value > n.left_value
> AND tn.right_value < n.right_value
> AND tn.taxon_id = tnm.taxon_id
> AND tn.name_class = 'scientific_name'
>
> BioPerl doesn't have a Taxonomy::biosql module yet (though this would
> seem like a worthwhile thing to add), so you can't use the
> Bio::DB::Taxonomy interface to do this against a BioSQL instance.
>
> However, BioPerl does have support for the flat-file download of the
> NCBI taxonomy database and indexes it, so you can simply use
> Taxonomy::{get_taxon,get_all_Descendents} using the flatfile download
> to achieve what you wanted to do in less than 5 lines of perl.
>
> Although the recursive implementation of Taxonomy::get_all_Descendents()
> won't be lightning fast, it may still be perfectly fine for your
> application - are you sure it is not?
>
> -hilmar
>
> On Jun 18, 2007, at 12:21 AM, George Heller wrote:
>
> Thanks. And how can I assign the $node here in the below code, such
> that I can reference it to a particular taxon id record? I want to
> retrieve all the descendents from the taxonomy hierarchy, given a
> particular taxon id.
>
> I have a local db setup, in which I have uploaded data using the
> load_ncbi_taxonomy.pl script.
>
> Thanks.
> George
>
> Jason Stajich wrote:
> I assume you already figured out how to set up a local taxonomy db?
>
> You just want the extant species/leaves of the tree:
>
> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>
> -jason
> On Jun 17, 2007, at 11:41 AM, George Heller wrote:
>
> Hi all,
>
> Can anyone point me to some example that uses the
> get_all_Descendents method from Bio::DB::Taxonomy? I am a newbie at
> this, and I am not quite sure how to implement it.
>
> Thanks.
> George
>
> Sendu Bala wrote:
> George Heller wrote:
> Hi all,
>
> I am looking at extracting the taxonomy hierarchy for some taxon ids.
> What I plan to do is, for a given taxon id, say 33090, I want to
> extract all taxon ids that are children of this species. I do not
> just want the immediate children, but the children's children and so
> on.
>
> Any ideas on the way I can go about doing this?
>
> Well, you'll use Bio::DB::Taxonomy presumably, and each_Descendent in
> some kind of looping structure. Most easily a recursing sub.
>
> If you happen to code up something neat and efficient, why not share it
> with us and we could add it to the Taxonomy module(s).
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/