[Bioperl-l] Taxonomy hierarchy extraction
George Heller
george.heller at yahoo.com
Tue Jun 19 01:16:10 UTC 2007
Works perfectly. Thanks so much Jason, Hilmar, Chris. You've been a great help!
Thanks.
George
Jason Stajich <jason at bioperl.org> wrote:
The files are indexes: because you are indexing a flat file, they speed up the lookup, so the second time you run the script it doesn't have to index again. You don't need to look at them; they won't make sense to a human!
The reason it isn't printing anything is that someone didn't write the implementation quite right. This code was overhauled by Sendu before the last release, and I guess something didn't quite get connected.
I checked in code so that Bio::Taxon now delegates the each_Descendent call to a DB handle.
You can either patch your code or just use the code listed here:
http://bioperl.org/wiki/Module:Bio::DB::Taxonomy
On Jun 18, 2007, at 5:29 PM, George Heller wrote:
But the problem is that I don't really get any output on the screen. In the /tmp directory I get 4 files, namely parents, nodes, id2names and names2id, but I don't know what to make of them. This is what my script looks like:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tree::Node;
use Bio::DB::Taxonomy;
use Bio::DB::Taxonomy::flatfile;

# The flatfile backend builds its index files in $idx_dir
my $idx_dir = '/tmp';
my ($nodefile, $namesfile) = ('nodes.dmp', 'names.dmp');
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                 -nodesfile => $nodefile,
                                 -namesfile => $namesfile,
                                 -directory => $idx_dir);
my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
for my $child ( @extant_children ) {
    print "id is ", $child->id, "\n";                            # NCBI taxa id
    print "rank is ", $child->rank, "\n";                        # e.g. species
    print "scientific name is ", $child->scientific_name, "\n";  # scientific name
}
Thanks.
George
Jason Stajich <jason at bioperl.org> wrote:
All the children are in this array.
You get to decide what you want to do with them. In the following example I print the id, rank, and scientific name out to the screen.
Because this is a taxonomy db query, you are getting back Bio::Taxonomy::Taxon objects, so read the documentation for this module to see what you can do with the object.
I would also suggest spending a little time with the Getting started and HOWTO:Trees documentation on the website to get familiar with the objects and nomenclature.
my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
for my $child ( @extant_children ) {
print "id is ", $child->id, "\n"; # NCBI taxa id
print "rank is ", $child->rank, "\n"; # e.g. species
print "scientific name is ", $child->scientific_name, "\n"; # scientific name
}
On Jun 18, 2007, at 5:04 PM, George Heller wrote:
OK, I installed the latest Scalar::Util and the script seems to be working. But I am confused about where exactly I need to look for the descendant taxon ids once the script has run. I did look in the /tmp/ directory, but I couldn't understand much.
Sorry to bother you; I really appreciate your patience.
Thanks.
George
Jason Stajich <jason at bioperl.org> wrote:
Try installing the latest Scalar::Util
On Jun 18, 2007, at 4:05 PM, George Heller wrote:
This is the output of /usr/bin/perl -V
Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
Platform:
osname=linux, osvers=2.6.9-22.18.bz155725.elsmp, archname=i386-linux-thread-multi
uname='linux hs20-bc1-4.build.redhat.com 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686 i686 i386 gnulinux '
config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost -Dperladmin=root at localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1 5.8.0'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.3.4'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Characteristics of this binary (from libperl):
Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
Built under linux
Compiled at Jul 24 2006 18:28:10
@INC:
/usr/lib/perl5/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/5.8.5
/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.5
/usr/lib/perl5/site_perl/5.8.4
/usr/lib/perl5/site_perl/5.8.3
/usr/lib/perl5/site_perl/5.8.2
/usr/lib/perl5/site_perl/5.8.1
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.5
/usr/lib/perl5/vendor_perl/5.8.4
/usr/lib/perl5/vendor_perl/5.8.3
/usr/lib/perl5/vendor_perl/5.8.2
/usr/lib/perl5/vendor_perl/5.8.1
/usr/lib/perl5/vendor_perl/5.8.0
/usr/lib/perl5/vendor_perl
Thanks.
George
Hilmar Lapp <hlapp at gmx.net> wrote:
The perl version appears to be 5.8.5, though, so something strange seems to be going on as well.
George, can you please post the output of
$ /usr/bin/perl -V
-hilmar
On Jun 18, 2007, at 6:33 PM, Chris Fields wrote:
As the error implies, your local version of perl doesn't seem to support weak references, which suggests it doesn't have a working Scalar::Util (which was added to core after perl 5.6.1, I think). Try installing Scalar::Util to see what happens.
chris
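A quick way to check whether a given perl can create weak references (the feature that Bio/Tree/Node.pm line 76 fails on in the error reported below) is to call Scalar::Util's weaken inside an eval. This is a minimal sketch, not part of BioPerl; the helper name weaken_available is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or Scalar::Util): returns 1 if
# this perl's Scalar::Util can actually weaken a reference, 0 otherwise.
sub weaken_available {
    my $ok = eval {
        require Scalar::Util;
        my $target = {};
        my $ref    = $target;
        Scalar::Util::weaken($ref);    # dies if weak refs are unsupported
        Scalar::Util::isweak($ref) or die "reference was not weakened\n";
        1;
    };
    return $ok ? 1 : 0;
}

my $version = eval { require Scalar::Util; Scalar::Util->VERSION } || 'not installed';
print "Scalar::Util version: $version\n";

my $msg = weaken_available()
    ? "Weak references are supported.\n"
    : "Weak references are NOT supported; try upgrading Scalar::Util.\n";
print $msg;
```

If the check reports no support, upgrading Scalar::Util (as suggested in this thread) is the first thing to try.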
On Jun 18, 2007, at 5:18 PM, George Heller wrote:
I tried running the script mentioned below and I seem to be getting the following error:
Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76.
Compilation failed in require at my.pl line 7.
BEGIN failed--compilation aborted at my.pl line 7.
My script looks something like,
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tree::Node;
use Bio::DB::Taxonomy;
use Bio::DB::Taxonomy::flatfile;

my $idx_dir = '/tmp';
my ($nodefile, $namesfile) = ('nodes.dmp', 'names.dmp');
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                 -nodesfile => $nodefile,
                                 -namesfile => $namesfile,
                                 -directory => $idx_dir);
my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
foreach my $field (@extant_children) {
    # Print each leaf's NCBI taxon id rather than the object reference
    print $field->id, "|\n";
}
And I am running the script using the command,
perl myscript.pl -v --names names.dmp --nodes nodes.dmp
and I have the nodes.dmp and names.dmp files in the current directory.
Thanks,
George
Jason Stajich wrote:
It is implemented in the implementing class; DB::Taxonomy is just the base class. For example, see the flatfile implementation, Bio::DB::Taxonomy::flatfile.
See scripts/taxa/local_taxonomydb_query.PLS for an example of using it.
The nodes and names files are from the NCBI taxonomy database.
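For orientation, each line of nodes.dmp begins with a taxon id, its parent's taxon id and its rank, with fields separated by tab-pipe-tab and a trailing tab-pipe. The sketch below is not BioPerl's own indexing code, and the sample rows are made up; it only shows how those three fields can be turned into parent and children lookups, which is the information a descendant search walks over.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up rows: [tax_id, parent tax_id, rank]. They are joined into the
# nodes.dmp layout ("\t|\t" between fields, "\t|" at the end of each line).
my @rows = (
    [1,  1,  'no rank'],
    [10, 1,  'kingdom'],
    [20, 10, 'genus'],
    [30, 20, 'species'],
);
my $sample = '';
$sample .= join("\t|\t", @$_) . "\t|\n" for @rows;

my (%parent_of, %rank_of, %children_of);
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/\t\|$//;    # drop the trailing "\t|"
    my ($taxid, $parent, $rank) = split /\t\|\t/, $line;
    $parent_of{$taxid} = $parent;
    $rank_of{$taxid}   = $rank;
    # The root (tax_id 1) is recorded as its own parent; skip that self-link.
    push @{ $children_of{$parent} }, $taxid unless $taxid == $parent;
}
close $fh;

print "children of 10: @{ $children_of{10} }\n";
print "rank of 30: $rank_of{30}\n";
```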
Here is an un-debugged copy+paste for your question that *should* work.
use strict;
use warnings;
use Bio::DB::Taxonomy;

my $idx_dir = '/tmp';
my ($nodefile, $namesfile) = ('nodes.dmp', 'names.dmp');
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                 -nodesfile => $nodefile,
                                 -namesfile => $namesfile,
                                 -directory => $idx_dir);
my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
# Keep only the leaves (extant taxa) among all descendants
my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
-jason
On Jun 18, 2007, at 10:07 AM, George Heller wrote:
What exactly is the "node n" in the query below? When I issue this query, it says:
relation "node" does not exist.
I tried to use the get_all_Descendents method, but it looks like in order to do a recursive call it calls the method each_Descendent. This method is not implemented in Bio::DB::Taxonomy. It just has a single line,
shift->throw_not_implemented();
Thanks.
George.
Hilmar Lapp wrote:
I'm a bit confused - it sounds like you have set up a local BioSQL database and loaded the NCBI taxonomy into the database. You can now use simple SQL to retrieve all descendants of a node in the tree given its NCBI taxon ID, such as:
SELECT tn.*, tnm.name FROM taxon tn, taxon_name tnm, taxon n
WHERE
n.ncbi_taxon_id = :taxonID
AND tn.left_value > n.left_value
AND tn.right_value < n.right_value
AND tn.taxon_id = tnm.taxon_id
AND tnm.name_class = 'scientific name'
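The query relies on the nested-set columns left_value and right_value of the taxon table: a taxon lies under n exactly when its (left, right) interval falls strictly inside n's interval. A minimal in-memory sketch of the same idea, assuming a made-up toy tree and hypothetical helper names (number_nodes, descendants_of), shows how the values can be assigned by a depth-first walk and how the interval test then selects descendants without any recursion at query time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up toy tree: parent id => child ids
my %children_of = (1 => [2, 3], 2 => [4, 5], 3 => [6]);

# Assign nested-set (left, right) values with a depth-first walk:
# left on the way down, right on the way back up.
my (%left, %right);
my $counter = 0;
sub number_nodes {
    my ($id) = @_;
    $left{$id} = ++$counter;
    number_nodes($_) for @{ $children_of{$id} || [] };
    $right{$id} = ++$counter;
}
number_nodes(1);

# Same test as the SQL: tn.left_value > n.left_value
#                   AND tn.right_value < n.right_value
sub descendants_of {
    my ($id) = @_;
    return sort { $left{$a} <=> $left{$b} }
           grep { $left{$_} > $left{$id} && $right{$_} < $right{$id} } keys %left;
}

print join(',', descendants_of(2)), "\n";   # 4,5
print join(',', descendants_of(1)), "\n";   # 2,4,5,3,6
```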
BioPerl doesn't have a Taxonomy::biosql module yet (though this would seem like a worthwhile thing to add), so you can't use the Bio::DB::Taxonomy interface to do this against a BioSQL instance. However, BioPerl does support the flat-file download of the NCBI taxonomy database and indexes it, so you can simply use Taxonomy::{get_taxon,get_all_Descendents} with the flat-file download to achieve what you want in fewer than 5 lines of perl.
Although the recursive implementation of Taxonomy::get_all_Descendents() won't be lightning fast, it may still be perfectly fine for your application - are you sure it is not?
-hilmar
On Jun 18, 2007, at 12:21 AM, George Heller wrote:
Thanks. And how can I assign the $node in the code below, such that it refers to a particular taxon id record? I want to retrieve all the descendants from the taxonomy hierarchy, given a particular taxon id.
I have a local db set up, into which I have uploaded data using the load_ncbi_taxonomy.pl script.
Thanks.
George
Jason Stajich wrote:
I assume you already figured out how to set up a local taxonomy db? You just want the extant species/leaves of the tree:
my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
-jason
On Jun 17, 2007, at 11:41 AM, George Heller wrote:
Hi all,
Can anyone point me to an example that uses the get_all_Descendents method from Bio::DB::Taxonomy? I am a newbie at this, and I am not quite sure how to use it.
Thanks.
George
Sendu Bala wrote:
George Heller wrote:
Hi all,
I am looking at extracting the taxonomy hierarchy for some taxon ids. What I plan to do is, for a given taxon id, say 33090, extract all taxon ids that are children of this species. I do not just want the immediate children, but the children's children and so on.
Any ideas on the way I can go about doing this?
Well, you'll use Bio::DB::Taxonomy presumably, and each_Descendent in some kind of looping structure. Most easily a recursing sub.
If you happen to code up something neat and efficient, why not share it with us and we could add it to the Taxonomy module(s).
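The recursing sub suggested here can be sketched without BioPerl on a toy parent-to-children hash; in a real script the child list for each taxon would come from each_Descendent rather than from the hash. The tree and the names all_descendants and leaf_descendants are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up toy taxonomy: parent taxon id => child taxon ids.
# In BioPerl the children would come from each_Descendent instead.
my %children_of = (
    1 => [2, 3],
    2 => [4, 5],
    3 => [6],
    5 => [7],
);

# Recursively collect every descendant (children, grandchildren, ...)
# of $id, depth-first, in the order they are visited.
sub all_descendants {
    my ($id, $children_of) = @_;
    my @found;
    for my $child (@{ $children_of->{$id} || [] }) {
        push @found, $child, all_descendants($child, $children_of);
    }
    return @found;
}

# Leaves ("extant" tips) are descendants with no children of their own.
sub leaf_descendants {
    my ($id, $children_of) = @_;
    return grep { !@{ $children_of->{$_} || [] } } all_descendants($id, $children_of);
}

print join(',', all_descendants(1, \%children_of)), "\n";    # 2,4,5,7,3,6
print join(',', leaf_descendants(1, \%children_of)), "\n";   # 4,7,6
```

Recursion depth equals tree depth, which is usually modest for a taxonomy, so plain recursion is typically fine; an explicit work queue would avoid deep recursion if that ever became a concern.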
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign