[Bioperl-l] indexing conservation scores
Maxim
deeepersound at googlemail.com
Thu Dec 23 00:00:25 UTC 2010
Hi,
Bio::DB::Fasta is a beautiful tool for fast access to sequences present in
large flat text (fasta) files and I really love it. Now I'd like to speed up
the retrieval of data from large files that store conservation scores. The
files that I was able to find at UCSC are in fixedStep wiggle format, like:
fixedStep chrom=chrYHet start=1 step=1
0.117
0.092
0.092
0.085
0.071
0.051
0.021
0.010
0.008
0.010
0.019
0.023
0.023
0.019
........
Does anyone see a way to use the indexing mechanism of Bio::DB::Fasta to
retrieve floating-point numbers? I could reformat the wiggle file into a
simple space-, tab- or comma-separated list of scores per chromosome.
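
To make the idea concrete, here is a minimal sketch of the kind of
random access I have in mind. It is written in Python purely for
illustration and is not Bio::DB::Fasta's actual code. It assumes a file
with a single fixedStep block and step=1, and it stores each score as a
4-byte float so that the byte offset of any position can be computed
directly. The names pack_fixed_step and fetch_scores are made up for
this example.

```python
import struct

# Assumption: one 4-byte little-endian float per position.
RECORD = struct.Struct("<f")


def pack_fixed_step(wig_path, bin_path):
    """Convert a single-block fixedStep file (step=1) to packed floats.

    Returns the start coordinate declared in the fixedStep header.
    """
    start = None
    with open(wig_path) as wig, open(bin_path, "wb") as out:
        for line in wig:
            line = line.strip()
            if not line:
                continue
            if line.startswith("fixedStep"):
                if start is not None:
                    raise ValueError("sketch handles only one fixedStep block")
                # Header fields look like chrom=chrYHet start=1 step=1
                fields = dict(f.split("=", 1) for f in line.split()[1:])
                if fields.get("step", "1") != "1":
                    raise ValueError("sketch handles only step=1")
                start = int(fields["start"])
                continue
            if start is None:
                raise ValueError("score line found before fixedStep header")
            out.write(RECORD.pack(float(line)))
    if start is None:
        raise ValueError("no fixedStep header found")
    return start


def fetch_scores(bin_path, block_start, first, last):
    """Return the scores for 1-based, inclusive positions first..last."""
    if first < block_start or last < first:
        raise ValueError("requested range is outside the block")
    count = last - first + 1
    with open(bin_path, "rb") as fh:
        # Jump straight to the first requested record; nothing before it is read.
        fh.seek((first - block_start) * RECORD.size)
        data = fh.read(count * RECORD.size)
    # Round to the three decimals used in the wiggle file.
    return [round(value[0], 3) for value in RECORD.iter_unpack(data)]
```

With the packed file in place, fetching positions 1 to 5000 would be one
seek plus one read of 20000 bytes, instead of scanning lines from the
top of the text file.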
Are there any suggestions? Or is there already a module that takes care of
my problem and I have just overlooked it?
Or would such an approach not be considerably faster than ordinary Unix
commands like:
sed -n '2,5001p' chrYHet.pp
to retrieve the scores?
Maxim