[Bioperl-l] Taxonomy hierarchy extraction

Jason Stajich jason at bioperl.org
Mon Jun 18 23:22:08 UTC 2007


Try installing the latest Scalar::Util
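To check whether the Scalar::Util that perl actually loads can provide weak references (the import that fails in the error quoted below), a sketch like this can be run with the same /usr/bin/perl. It uses only core Perl; `weak_refs_available` is a made-up helper name, not part of BioPerl or Scalar::Util.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if Scalar::Util can export and use weaken().
# Builds of Scalar::Util without weak-reference support die at import time
# with "Weak references are not implemented in the version of perl".
sub weak_refs_available {
    my $ok = eval {
        require Scalar::Util;
        Scalar::Util->import(qw(weaken isweak));   # the import that fails above
        my $target = {};
        my $ref    = $target;
        Scalar::Util::weaken($ref);                # make $ref a weak reference
        Scalar::Util::isweak($ref) ? 1 : 0;        # confirm it really is weak
    };
    return $ok ? 1 : 0;
}

my $version = eval { require Scalar::Util; Scalar::Util->VERSION } || 'unknown';
printf "Scalar::Util %s: weak references %s\n",
    $version, weak_refs_available() ? 'available' : 'NOT available';
```

If it reports NOT available, reinstalling the Scalar-List-Utils distribution from CPAN, so that its compiled part is built for this perl, is the usual next step.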

On Jun 18, 2007, at 4:05 PM, George Heller wrote:

> This is the output of /usr/bin/perl -V
>
> Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
>   Platform:
>     osname=linux, osvers=2.6.9-22.18.bz155725.elsmp, archname=i386-linux-thread-multi
>     uname='linux hs20-bc1-4.build.redhat.com 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686 i686 i386 gnulinux '
>     config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost -Dperladmin=root at localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1 5.8.0'
>     hint=recommended, useposix=true, d_sigaction=define
>     usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
>     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
>     use64bitint=undef use64bitall=undef uselongdouble=undef
>     usemymalloc=n, bincompat5005=undef
>   Compiler:
>     cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
>     optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
>     cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
>     ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
>     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
>     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
>     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
>     alignbytes=4, prototype=define
>   Linker and Libraries:
>     ld='gcc', ldflags =' -L/usr/local/lib'
>     libpth=/usr/local/lib /lib /usr/lib
>     libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
>     perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
>     libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
>     gnulibc_version='2.3.4'
>   Dynamic Linking:
>     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
>     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
>
> Characteristics of this binary (from libperl):
>   Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
>   Built under linux
>   Compiled at Jul 24 2006 18:28:10
>   @INC:
>     /usr/lib/perl5/5.8.5/i386-linux-thread-multi
>     /usr/lib/perl5/5.8.5
>     /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.5
>     /usr/lib/perl5/site_perl/5.8.4
>     /usr/lib/perl5/site_perl/5.8.3
>     /usr/lib/perl5/site_perl/5.8.2
>     /usr/lib/perl5/site_perl/5.8.1
>     /usr/lib/perl5/site_perl/5.8.0
>     /usr/lib/perl5/site_perl
>     /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.5
>     /usr/lib/perl5/vendor_perl/5.8.4
>     /usr/lib/perl5/vendor_perl/5.8.3
>     /usr/lib/perl5/vendor_perl/5.8.2
>     /usr/lib/perl5/vendor_perl/5.8.1
>     /usr/lib/perl5/vendor_perl/5.8.0
>     /usr/lib/perl5/vendor_perl
>
>   Thanks.
>   George
>
> Hilmar Lapp <hlapp at gmx.net> wrote:
>   The perl version appears to be 5.8.5, though, so something else
> strange seems to be going on.
>
> George, can you please post the output of
>
> $ /usr/bin/perl -V
>
> -hilmar
>
> On Jun 18, 2007, at 6:33 PM, Chris Fields wrote:
>
>> As the error implies, your local version of perl doesn't seem to
>> support weak references, which suggests it lacks a working (compiled)
>> Scalar::Util (which was added to core after perl 5.6.1, I think). Try
>> installing Scalar::Util to see what happens.
>>
>> chris
>>
>> On Jun 18, 2007, at 5:18 PM, George Heller wrote:
>>
>>> I tried running the below mentioned script and I seem to be getting
>>> the following error:
>>>
>>> Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76
>>> BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76.
>>> Compilation failed in require at my.pl line 7.
>>> BEGIN failed--compilation aborted at my.pl line 7.
>>>
>>> My script looks something like,
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use DBI;
>>> use Bio::Tree::Node;
>>> use Bio::DB::Taxonomy;
>>> use Bio::DB::Taxonomy::flatfile;
>>> my $idx_dir = '/tmp';
>>>
>>> # Use the same variable names in the list assignment and in new()
>>> my ($nodesfile, $namesfile) = ('nodes.dmp', 'names.dmp');
>>> my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
>>>                                 -nodesfile => $nodesfile,
>>>                                 -namesfile => $namesfile,
>>>                                 -directory => $idx_dir);
>>> my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
>>> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>>>
>>> # Print each leaf's taxon ID rather than the stringified object reference
>>> foreach my $field (@extant_children) {
>>>     print $field->id, "|\n";
>>> }
>>>
>>> And I am running the script using the command,
>>>
>>> perl myscript.pl -v --names names.dmp --nodes nodes.dmp
>>>
>>> and I have the nodes.dmp and names.dmp files in the current
>>> directory.
>>>
>>> Thanks,
>>> George
>>>
>>>
>>> Jason Stajich wrote:
>>> It is implemented in the implementing class; Bio::DB::Taxonomy is
>>> just the base class. For an example, see the flatfile implementation,
>>> Bio::DB::Taxonomy::flatfile.
>>>
>>> See scripts/taxa/local_taxonomydb_query.PLS for an example of using
>>> it; the nodes and names files come from the NCBI taxonomy database.
>>>
>>>
>>> Here is an un-debugged copy+paste for your question that *should*
>>> work.
>>>
>>>
>>> use Bio::DB::Taxonomy;
>>> my $idx_dir = '/tmp';
>>>
>>> my ($nodesfile, $namesfile) = ('nodes.dmp', 'names.dmp');
>>> my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
>>>                                 -nodesfile => $nodesfile,
>>>                                 -namesfile => $namesfile,
>>>                                 -directory => $idx_dir);
>>> my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
>>> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>>>
>>> -jason
>>>
>>> On Jun 18, 2007, at 10:07 AM, George Heller wrote:
>>>
>>> What exactly is the "node n" in the query below? When I issue
>>> this query, it says,
>>>
>>>
>>> relation "node" does not exist.
>>>
>>>
>>> I tried to use the get_all_Descendents method but it looks like
>>> in order to do a recursive call it calls the method
>>> each_Descendent. This method is not implemented in
>>> Bio::DB::Taxonomy. It just has a single line,
>>>
>>>
>>> shift->throw_not_implemented();
>>>
>>>
>>> Thanks.
>>> George.
>>>
>>>
>>> Hilmar Lapp wrote:
>>> I'm a bit confused - it sounds like you have set up a local BioSQL
>>> database and loaded the NCBI taxonomy into it. You can now use simple
>>> SQL to retrieve all descendants of a node in the tree, given its NCBI
>>> taxon ID, for example:
>>>
>>>
>>> SELECT tn.*, tnm.name FROM taxon tn, taxon_name tnm, taxon n
>>> WHERE n.ncbi_taxon_id = :taxonID
>>> AND tn.left_value > n.left_value
>>> AND tn.right_value < n.right_value
>>> AND tn.taxon_id = tnm.taxon_id
>>> AND tnm.name_class = 'scientific name'
>>>
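The query relies on BioSQL's nested-set columns: each taxon gets a left_value/right_value pair from a depth-first walk, and a node's descendants are exactly the rows whose interval lies strictly inside its own. A minimal pure-Perl sketch of that idea on a made-up toy tree (the IDs are not real NCBI taxa, and this is not BioSQL's or load_ncbi_taxonomy.pl's code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy tree: parent => [children]
my %children = (
    1 => [2, 3],
    2 => [4, 5],
    3 => [6],
);

# Depth-first walk: assign left on the way down, right on the way back up.
my (%left, %right);
my $counter = 0;
sub number_nodes {
    my ($id) = @_;
    $left{$id} = ++$counter;
    number_nodes($_) for @{ $children{$id} || [] };
    $right{$id} = ++$counter;
}
number_nodes(1);

# Descendants of $id are the nodes whose interval nests strictly inside its own,
# the same test as "tn.left_value > n.left_value AND tn.right_value < n.right_value".
sub descendants {
    my ($id) = @_;
    return sort { $a <=> $b }
           grep { $left{$_} > $left{$id} && $right{$_} < $right{$id} } keys %left;
}

print join(',', descendants(2)), "\n";   # prints 4,5
```

The point of the layout is that the whole subtree comes back from one range comparison, with no recursion at query time.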
>>>
>>> BioPerl doesn't have a Taxonomy::biosql module yet (though this would
>>> seem like a worthwhile thing to add), so you can't use the
>>> Bio::DB::Taxonomy interface to do this against a BioSQL instance.
>>>
>>>
>>> However, BioPerl does support the flat-file download of the NCBI
>>> taxonomy database and indexes it, so you can simply use
>>> Taxonomy::{get_taxon,get_all_Descendents} with the flat-file download
>>> to achieve what you want in fewer than 5 lines of perl.
>>>
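For orientation, nodes.dmp from the NCBI download separates fields with "\t|\t", ends each line with "\t|", and starts each line with the taxon ID followed by its parent's taxon ID. A rough sketch of turning such lines into a parent-to-children map in plain Perl; it is not BioPerl's flatfile indexer, `read_children` is an invented name, and the sample lines are invented and cut down to three fields:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build parent => [children] from nodes.dmp-style lines read from $fh.
sub read_children {
    my ($fh) = @_;
    my %children;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                      # drop the trailing "\t|"
        my ($id, $parent) = split /\t\|\t/, $line;
        next unless defined $parent;
        next if $id == $parent;                  # the root (taxid 1) is its own parent
        push @{ $children{$parent} }, $id;
    }
    return \%children;
}

# Invented sample in the same layout (not real taxonomy data).
my $sample = join '', map { "$_\n" }
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tgenus\t|",
    "11\t|\t10\t|\tspecies\t|",
    "12\t|\t10\t|\tspecies\t|";

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $children = read_children($fh);
close $fh;

for my $parent (sort { $a <=> $b } keys %$children) {
    print "$parent: @{ $children->{$parent} }\n";
}
```

With the real file you would open nodes.dmp instead of the in-memory string; skipping the root's self-reference keeps later traversals from looping.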
>>>
>>> Although the recursive implementation of
>>> Taxonomy::get_all_Descendents() won't be lightning fast, it may still
>>> be perfectly fine for your application - are you sure it is not?
>>>
>>>
>>> -hilmar
>>>
>>>
>>> On Jun 18, 2007, at 12:21 AM, George Heller wrote:
>>>
>>>
>>> Thanks. And how can I assign the $node in the code below so that it
>>> refers to a particular taxon id record? I want to retrieve all the
>>> descendants in the taxonomy hierarchy, given a particular taxon id.
>>>
>>>
>>> I have a local db setup, in which I have uploaded data using the
>>> load_ncbi_taxonomy.pl script.
>>>
>>>
>>> Thanks.
>>> George
>>>
>>>
>>> Jason Stajich wrote:
>>> I assume you already figured out how to setup a local taxonomydb?
>>>
>>> You just want the extant species (the leaves of the tree):
>>>
>>> my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
>>>
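The same "keep only the leaves" filter, shown outside BioPerl on a hypothetical parent-to-children hash: gather every descendant with an explicit stack and keep the ones that have no children of their own. `leaf_descendants` and the toy IDs are invented for illustration; in BioPerl the `grep { $_->is_Leaf }` line above does this job.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy tree: parent => [children]
my %children = (
    100 => [101, 102],
    101 => [103, 104],
    102 => [105],
    105 => [106],
);

# Return all leaf descendants of $root, walking the tree with an explicit stack.
sub leaf_descendants {
    my ($tree, $root) = @_;
    my @stack = @{ $tree->{$root} || [] };
    my @leaves;
    while (@stack) {
        my $id   = pop @stack;
        my $kids = $tree->{$id} || [];
        if (@$kids) { push @stack, @$kids }     # internal node: keep descending
        else        { push @leaves, $id }       # no children: an extant leaf
    }
    return sort { $a <=> $b } @leaves;
}

print join(' ', leaf_descendants(\%children, 100)), "\n";   # prints 103 104 106
```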
>>> -jason
>>> On Jun 17, 2007, at 11:41 AM, George Heller wrote:
>>>
>>>
>>> Hi all,
>>>
>>> Can anyone point me to some example that uses the
>>> get_all_Descendents method from Bio::DB::Taxonomy? I am a newbie at
>>> this, and I am not quite sure how to implement it.
>>>
>>> Thanks.
>>> George
>>>
>>> Sendu Bala wrote:
>>> George Heller wrote:
>>> Hi all,
>>>
>>> I am looking at extracting the taxonomy hierarchy for some taxon ids.
>>> What I plan to do is, for a given taxon id, say 33090, extract all
>>> taxon ids that are children of this taxon. I do not just want the
>>> immediate children, but the children's children and so on.
>>>
>>> Any ideas on the way I can go about doing this?
>>>
>>> Well, you'll use Bio::DB::Taxonomy presumably, and each_Descendent in
>>> some kind of looping structure, most easily a recursive sub.
>>>
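A minimal sketch of the recursive sub described here, written against a hypothetical in-memory parent-to-children hash rather than a taxonomy database (with Bio::DB::Taxonomy the child lookup would go through each_Descendent instead). `all_descendants` and the toy IDs are invented:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy tree: parent => [children]
my %children = (
    200 => [201, 202],
    201 => [203],
    203 => [204, 205],
);

# Recursively collect children, grandchildren and so on (depth-first, pre-order).
sub all_descendants {
    my ($tree, $id) = @_;
    my @found;
    for my $child (@{ $tree->{$id} || [] }) {
        push @found, $child, all_descendants($tree, $child);
    }
    return @found;
}

print join(' ', all_descendants(\%children, 200)), "\n";   # prints 201 203 204 205 202
```

Under `use warnings`, Perl emits a "Deep recursion" warning once a sub recurses 100 levels deep; it is only a warning, and an explicit stack avoids it if it ever matters.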
>>> If you happen to code up something neat and efficient, why not share
>>> it with us? We could add it to the Taxonomy module(s).
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> Jason Stajich
>>> jason at bioperl.org
>>> http://jason.open-bio.org/
>>>
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>> ===========================================================
>>>
>>> --
>>> Jason Stajich
>>> jason at bioperl.org
>>> http://jason.open-bio.org/
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/
