[Bioperl-l] BioPerl for indexing quality score files

Wed May 12 18:26:26 UTC 2010

Ok, I need to shame myself with a huge "RTFM" for this one --
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/DB/Qual.pm
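
For anyone wondering what an index over this kind of file boils down to, here is a minimal sketch in Python. It is not how Bio::DB::Qual is implemented; the names build_index and fetch_scores are made up for illustration, and the demo data is just the example record from my original mail split over two lines, plus an invented second record. The idea is to remember the byte offset where each record's scores begin, then seek straight to it instead of rescanning the whole file.

```python
import io


def build_index(handle):
    """Map each record ID to the byte offset of its first score line.

    `handle` must be opened in binary mode so tell()/seek() are byte-exact.
    """
    index = {}
    while True:
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                # Offset just past the header line, i.e. where scores start.
                index[fields[0].decode()] = handle.tell()
    return index


def fetch_scores(handle, index, seq_id, start, stop):
    """Return scores start..stop (1-based, inclusive) for one record."""
    handle.seek(index[seq_id])
    scores = []
    while len(scores) < stop:
        line = handle.readline()
        if not line or line.startswith(b">"):
            break  # end of file or start of the next record
        scores.extend(int(tok) for tok in line.split())
    return scores[start - 1:stop]


# Made-up demo data in the FASTA-like quality format.
demo = io.BytesIO(
    b">chr1\n"
    b" 0 20 20 20 50 99 99 99 99 30\n"
    b" 30 20 20 10 10 0 0 0 0\n"
    b">chr2\n"
    b" 5 5 5\n"
)
index = build_index(demo)
print(fetch_scores(demo, index, "chr1", 5, 8))
```

Note that this only jumps to the start of a record and then reads forward, so fetching a range deep inside a chromosome-sized record is still slow. Because the scores have variable width, a practical index would also need to store extra offsets within each record.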

Sorry for the spam. Still happy about the GitHub migration, though!

greg

On 12 May 2010 19:16, Gregory Jordan <greg at ebi.ac.uk> wrote:

> Hi all,
>
> I'm wondering if anyone has tried using BioPerl to index sequence quality
> score files? The files I'm looking at tend to look like FASTA files, but
> with numbers (between 0 and 99) and spaces instead of sequence strings.
> Something like:
> ---
> >chr1
>  0 20 20 20 50 99 99 99 99 30 30 20 20 10 10 0 0 0 0
> ---
> (An example for Chimpanzee can be found here, as the file
> 'panTro2.quals.fa.gz':
> http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ )
>
> I'm currently using a home-brewed file indexing system to access subsets of
> these quality scores, but it's kind of slow and (probably) buggy. I'd much
> rather use something like Bio::DB::Fasta, but (without having actually tried
> it) I expect it wouldn't be too happy with these not-quite-FASTA format
> quality files.
>
> Has anyone run into a similar situation and found a solution using BioPerl
> (or something else)?
>
> I'd be happy to hack around a bit to get this to work, if necessary; if
> anyone could provide pointers on where to start, I'd be much obliged.
>
> Cheers,
>  Greg
>
> PS - it's great to see the GitHub migration moving along so swiftly! I'll
> be *much* more likely to start bug-hunting and patch-submitting with the
> code living there now. :)
>