[Bioperl-l] BioPerl for indexing quality score files

Gregory Jordan greg at ebi.ac.uk
Wed May 12 18:16:53 UTC 2010


Hi all,

I'm wondering if anyone has tried using BioPerl to index sequence quality
score files? The files I'm looking at look like FASTA files, but with
space-separated numbers (between 0 and 99) instead of sequence strings.
Something like:
---
>chr1
 0 20 20 20 50 99 99 99 99 30 30 20 20 10 10 0 0 0 0
---
(An example for Chimpanzee can be found here, as the file
'panTro2.quals.fa.gz':
http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ )

I'm currently using a home-brewed file indexing system to access subsets of
these quality scores, but it's kind of slow and (probably) buggy. I'd much
rather use something like Bio::DB::Fasta, but (without having actually tried
it) I expect it wouldn't be too happy with these not-quite-FASTA quality
files.
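
For illustration, the kind of home-brewed index described above could look
roughly like the sketch below. It is written in Python rather than BioPerl and
is not the code referred to in this message. It assumes an uncompressed file
(the UCSC download would need to be gunzipped first), records made of a '>'
header followed by lines of whitespace-separated integers, and 0-based
half-open ranges. The names build_index and fetch are hypothetical.

```python
import bisect
import io


def build_index(handle):
    """Scan a binary handle once and record, for every data line of every
    record, the index of its first score and its byte offset."""
    index = {}
    name = None
    count = 0
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            # Header line: the record name is the first token after '>'.
            name = line[1:].split()[0].decode()
            index[name] = ([], [])
            count = 0
        elif name is not None and line.strip():
            starts, offsets = index[name]
            starts.append(count)
            offsets.append(offset)
            count += len(line.split())
    return index


def fetch(handle, index, name, start, end):
    """Return the scores in [start, end) for record `name`."""
    if start < 0:
        raise ValueError("start must be non-negative")
    starts, offsets = index[name]
    if not starts or start >= end:
        return []
    # Find the last data line whose first score is at or before `start`.
    i = bisect.bisect_right(starts, start) - 1
    handle.seek(offsets[i])
    skip = start - starts[i]
    need = end - start
    scores = []
    while len(scores) < need:
        line = handle.readline()
        if not line or line.startswith(b">"):
            break  # end of file or start of the next record
        values = [int(v) for v in line.split()]
        scores.extend(values[skip:])
        skip = 0
    return scores[:need]


# Small in-memory demo; a real file would be opened with open(path, "rb").
SAMPLE = (
    b">chr1\n"
    b" 0 20 20 20 50 99 99 99 99 30\n"
    b" 30 20 20 10 10 0 0 0 0\n"
    b">chr2\n"
    b" 5 6 7\n"
)
demo_handle = io.BytesIO(SAMPLE)
demo_index = build_index(demo_handle)
demo_scores = fetch(demo_handle, demo_index, "chr1", 3, 12)
```

Storing one offset per data line keeps each lookup to a binary search plus a
short forward read, at the cost of an index that grows with the number of
lines. The index here lives only in memory; persisting it is left out of the
sketch.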

Has anyone run into a similar situation and found a solution using BioPerl
(or something else)?

I'd be happy to hack around a bit to get this to work, if necessary; if
anyone could provide pointers on where to start, I'd be much obliged.

Cheers,
 Greg

PS - it's great to see the GitHub migration moving along so swiftly! I'll be
*much* more likely to start bug-hunting and patch-submitting with the code
living there now. :)


