[BioSQL-l] BioSQL sequence quality tables

Peter biopython at maubp.freeserve.co.uk
Tue Nov 23 09:00:39 UTC 2010


On Tue, Nov 23, 2010 at 6:07 AM, Dan Kortschak wrote:
>
> Hi,
>
> What is the consensus about storing sequence qualities in the BioSQL
> schema? There is no specific table for this, so I was wondering what
> others do.
>
> thanks
> Dan

For Biopython we decided not to store the quality, and documented this
as a known limitation. As I recall, there was some discussion about
storing it via the existing BioSQL feature annotations, and a (Sanger)
FASTQ encoded string was suggested, but there was no consensus.
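
For reference, the Sanger FASTQ encoding mentioned above maps each
Phred score q (0 to 93) to the ASCII character chr(q + 33). A minimal
sketch of the round trip follows; the function names are illustrative
only and are not Biopython or BioSQL API.

```python
SANGER_OFFSET = 33  # Sanger FASTQ stores Phred score q as chr(q + 33)


def encode_sanger(quals):
    """Turn a list of Phred scores (0-93) into a Sanger FASTQ string."""
    if any(q < 0 or q > 93 for q in quals):
        raise ValueError("Phred scores must be in the range 0-93")
    return "".join(chr(q + SANGER_OFFSET) for q in quals)


def decode_sanger(text):
    """Turn a Sanger FASTQ quality string back into Phred scores."""
    return [ord(c) - SANGER_OFFSET for c in text]


quals = [40, 40, 30, 20, 2]
encoded = encode_sanger(quals)  # one character per base: "II?5#"
decoded = decode_sanger(encoded)
```

The encoded string has the same length as the sequence, so it could in
principle be held in any text field, at the cost of being opaque to
SQL queries on the scores themselves.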

Is there actually a need for this? You aren't thinking of storing raw
reads in BioSQL, are you? (I think you'd be disappointed with the
performance.) It is perhaps reasonable for contigs, though.

I was also interested in other per-letter annotations, like secondary
structure predictions (which can be stored as a string with the same
length as the sequence) or more general things like atomic coordinates.
In principle, new tables could be introduced to BioSQL just for
per-letter annotation, designed to work well when extracting a
subsequence together with the relevant subset of its per-letter
annotation.
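
To make the idea concrete, here is a rough sketch of what such a table
might look like, using an in-memory SQLite database. The table name
letter_annotation, its columns, and the helper fetch_slice are all
hypothetical; nothing like this exists in the BioSQL schema. It only
covers string-valued annotation with one character per letter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table, NOT part of the BioSQL schema.
    CREATE TABLE letter_annotation (
        bioentry_id INTEGER NOT NULL,
        name        TEXT    NOT NULL,
        value       TEXT    NOT NULL,  -- one character per sequence letter
        PRIMARY KEY (bioentry_id, name)
    );
    """
)
conn.execute(
    "INSERT INTO letter_annotation VALUES (?, ?, ?)",
    (1, "secondary_structure", "CCHHHHEEEC"),
)


def fetch_slice(conn, bioentry_id, name, start, end):
    """Return the annotation for the 0-based half-open range [start, end)."""
    row = conn.execute(
        "SELECT substr(value, ?, ?) FROM letter_annotation "
        "WHERE bioentry_id = ? AND name = ?",
        # SQL substr() takes a 1-based start position and a length.
        (start + 1, end - start, bioentry_id, name),
    ).fetchone()
    return row[0] if row else None


helix = fetch_slice(conn, 1, "secondary_structure", 2, 6)  # "HHHH"
```

Slicing in the database means only the requested region is transferred,
which is the property that matters for long contigs.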

Peter


