[Bioperl-l] Handling hierarchical phylogeny based data in bio-/perl

Abhishek Pratap abhishek.vit at gmail.com
Wed Apr 27 18:32:58 UTC 2011


Hi Guys

I have lineages for many contigs that were BLASTed against the nt
database. The goal is to arrange them in a hierarchical data structure,
something like a hash of hashes, and also to store ancillary data for
each bin, such as contig names and coverage.

For example, if my input is a TSV file with the lineage in one column
and other columns such as contig name and coverage:

Eukaryota Viridiplantae Streptophyta Embryophyta Tracheophyta
    Spermatophyta Magnoliophyta eudicotyledons core_eudicotyledons
Eukaryota Viridiplantae Streptophyta Streptophytina Charophyceae
    Charales Characeae Chara
Eukaryota Viridiplantae Streptophyta Streptophytina Embryophyta

then I would like to store the data as follows:

Eukaryota->count = 3
Eukaryota->coverage = 6.3
Eukaryota->Viridiplantae->count = 3
Eukaryota->Viridiplantae->coverage = 4.3
Eukaryota->Viridiplantae->Streptophyta->count = 3
Eukaryota->Viridiplantae->Streptophyta->coverage = 2.3
... etc.

I could create such a hash explicitly, but that is tedious: as the
number of words on each line (lineage) increases, I have to keep
extending my data structure by hand. Also, not all lines (lineages)
will have the same number of words.
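
A minimal sketch of how the variable-depth nesting could be built without
writing out each level by hand: keep a reference to the current node and
walk it down one taxon at a time, letting Perl create missing levels. The
column order (contig name, coverage, lineage), the coverage values, the
node layout and the choice to sum coverage per bin are all assumptions
made for illustration, not something fixed by the data above.

```perl
use strict;
use warnings;

# Hypothetical input rows: contig name, coverage, lineage (tab-separated).
# Column order and coverage values are assumptions for illustration.
my @rows = (
    "contig1\t2.0\tEukaryota Viridiplantae Streptophyta Embryophyta",
    "contig2\t1.5\tEukaryota Viridiplantae Streptophyta Streptophytina Charophyceae",
    "contig3\t3.0\tEukaryota Viridiplantae Streptophyta Streptophytina Embryophyta",
);

# Assumed node layout: { count, coverage, contigs => [...], children => {...} }.
my %tree = ( children => {} );

for my $row (@rows) {
    my ( $contig, $coverage, $lineage ) = split /\t/, $row;
    my $node = \%tree;
    for my $taxon ( split ' ', $lineage ) {
        # Create the child node the first time this taxon is seen at this
        # level, so lineages of any depth need no special handling.
        $node = $node->{children}{$taxon} //= { children => {} };
        $node->{count}++;
        $node->{coverage} += $coverage;    # summed per bin (an assumption)
        push @{ $node->{contigs} }, $contig;
    }
}

my $strep = $tree{children}{Eukaryota}{children}{Viridiplantae}{children}{Streptophyta};
printf "Streptophyta count=%d coverage=%.1f\n", $strep->{count}, $strep->{coverage};
```

Keeping the children under a separate "children" key avoids a clash
between a taxon name and the "count" or "coverage" keys of the same node.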

I would also like to print such a tree with the count/coverage
information associated with each bin.
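
Printing could be sketched as a recursive walk that emits one indented
line per bin. The tree below is hand-built with made-up values, and the
node layout ({ count, coverage, children }) and the helper name
tree_lines are hypothetical, chosen only to illustrate the idea.

```perl
use strict;
use warnings;

# A small hand-built tree in the assumed node layout; values are illustrative.
my %tree = (
    children => {
        Eukaryota => {
            count    => 3,
            coverage => 6.5,
            children => {
                Viridiplantae => {
                    count    => 3,
                    coverage => 6.5,
                    children => {
                        Streptophyta =>
                          { count => 3, coverage => 6.5, children => {} },
                    },
                },
            },
        },
    },
);

# Recursively collect one indented line per bin, children sorted by name.
sub tree_lines {
    my ( $node, $depth ) = @_;
    my @lines;
    for my $taxon ( sort keys %{ $node->{children} } ) {
        my $child = $node->{children}{$taxon};
        push @lines,
          sprintf( "%s%s count=%d coverage=%.1f",
            '  ' x $depth, $taxon, $child->{count}, $child->{coverage} );
        push @lines, tree_lines( $child, $depth + 1 );
    }
    return @lines;
}

my @lines = tree_lines( \%tree, 0 );
print "$_\n" for @lines;
```

Returning the lines instead of printing inside the recursion makes it
easy to send the same output to a file or to filter it first.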

I am wondering whether I can use some built-in tree capability of
Perl/BioPerl to do this. I did have a look at
http://bioperl.org/wiki/HOWTO:Trees but I could not find an example
that reads from a TSV file and creates a data structure that also
stores count/coverage for each bin.

Any pointers will help.

Best,
-Abhi


