[Bioperl-l] indexing conservation scores

Maxim deeepersound at googlemail.com
Thu Dec 23 00:00:25 UTC 2010


Hi,

Bio::DB::Fasta is a beautiful tool for fast access to sequences in large
flat-text (FASTA) files, and I really love it. Now I'd like to speed up
the retrieval of data from large files that store conservation scores. The
files that I was able to find at UCSC are in fixed-step wiggle format, like

fixedStep chrom=chrYHet start=1 step=1
0.117
0.092
0.092
0.085
0.071
0.051
0.021
0.010
0.008
0.010
0.019
0.023
0.023
0.019
........

Does anyone see a way to use the indexing mechanism of Bio::DB::Fasta to
retrieve floating-point numbers? I could reformat the wiggle file into a
simple space-, tab-, or comma-separated list of scores per chromosome.
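
To make the reformatting idea concrete, here is a minimal sketch in Python
(not BioPerl, and not how Bio::DB::Fasta itself is implemented). It assumes a
file with a single fixedStep block and step=1, and it packs every score into
a 4-byte float so that the byte offset of any position can be computed
directly instead of scanning lines. The function names pack_fixed_step and
fetch_scores are made up for illustration.

```python
import os
import struct
import tempfile

# One 4-byte little-endian float per base: every record has the same width,
# so position -> byte offset is simple arithmetic.
RECORD = struct.Struct("<f")


def pack_fixed_step(wig_path, bin_path):
    """Convert a single-block fixedStep wiggle file (step=1) to packed floats.

    Returns the 1-based start coordinate taken from the header line.
    A second header line would fail in float(), so multi-block files are
    rejected rather than silently misread.
    """
    with open(wig_path) as wig, open(bin_path, "wb") as out:
        header = wig.readline().split()
        if not header or header[0] != "fixedStep":
            raise ValueError("expected a fixedStep header line")
        fields = dict(item.split("=", 1) for item in header[1:])
        if int(fields.get("step", 1)) != 1:
            raise ValueError("this sketch only handles step=1")
        start = int(fields["start"])
        for line in wig:
            line = line.strip()
            if line:
                out.write(RECORD.pack(float(line)))
    return start


def fetch_scores(bin_path, block_start, first, last):
    """Return the scores for 1-based inclusive positions first..last."""
    if first < block_start or last < first:
        raise ValueError("requested range is outside the block")
    with open(bin_path, "rb") as fh:
        # Jump straight to the first requested record; no line scanning.
        fh.seek((first - block_start) * RECORD.size)
        data = fh.read((last - first + 1) * RECORD.size)
    # The input has three decimals; rounding hides float32 representation noise.
    return [round(value[0], 3) for value in RECORD.iter_unpack(data)]


if __name__ == "__main__":
    sample = "fixedStep chrom=chrYHet start=1 step=1\n0.117\n0.092\n0.092\n0.085\n0.071\n0.051\n"
    with tempfile.TemporaryDirectory() as tmp:
        wig = os.path.join(tmp, "chrYHet.wig")
        packed = os.path.join(tmp, "chrYHet.bin")
        with open(wig, "w") as fh:
            fh.write(sample)
        start = pack_fixed_step(wig, packed)
        print(fetch_scores(packed, start, 3, 6))
```

The conversion is a one-time linear pass; after that, each lookup costs one
seek and one read regardless of where in the chromosome the range lies. A
file with several fixedStep blocks would additionally need a small table of
(start, length, file offset) per block, which this sketch leaves out.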

Are there any suggestions? Or is there already a module that takes care of
my problem and I have just overlooked it?
Or would such an approach not be considerably faster than standard Unix
commands like:
sed -n '2,5001p' chrYHet.pp
to retrieve the scores?


Maxim


