[BioSQL-l] FW: SeqWithQuality and biosql

Richard HOLLAND hollandr at gis.a-star.edu.sg
Tue Jul 5 02:33:07 EDT 2005


I'd think storing it in BioSQL as 2-byte pairs would be good. First byte
is the base (an ASCII character), second byte is the quality (an 8-bit
integer). Sure it wastes a few bits but so does normal DNA...
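
A minimal sketch of that 2-byte-pair layout, in Python. The function
names encode_pairs and decode_pairs are hypothetical (they are not part
of BioSQL or any Bio* toolkit), and the sketch assumes ASCII bases and
quality values in the range 0-255:

```python
def encode_pairs(bases: str, quals: list[int]) -> bytes:
    """Interleave each base (one ASCII byte) with its quality (one byte)."""
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    base_bytes = bases.encode("ascii")  # fails loudly on non-ASCII symbols
    out = bytearray()
    for base, qual in zip(base_bytes, quals):
        if not 0 <= qual <= 255:
            raise ValueError(f"quality {qual} does not fit in one byte")
        out.append(base)  # first byte of the pair: the base
        out.append(qual)  # second byte of the pair: the quality
    return bytes(out)


def decode_pairs(data: bytes) -> tuple[str, list[int]]:
    """Split the interleaved bytes back into a base string and qualities."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    bases = data[0::2].decode("ascii")  # even offsets hold the bases
    quals = list(data[1::2])            # odd offsets hold the qualities
    return bases, quals


packed = encode_pairs("ACGT", [23, 40, 34, 22])
print(len(packed))           # two bytes per position
print(decode_pairs(packed))
```

The quality bytes are raw values, so a quality of 0 becomes a NUL byte;
a character column may not accept that, in which case a binary-safe
column or an offset on the quality value would presumably be needed.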


Richard Holland
Bioinformatics Specialist
GIS extension 8199


> -----Original Message-----
> From: biosql-l-bounces at portal.open-bio.org 
> [mailto:biosql-l-bounces at portal.open-bio.org] On Behalf Of 
> mark.schreiber at novartis.com
> Sent: Tuesday, July 05, 2005 1:44 PM
> To: Marc Logghe
> Cc: biosql-l-bounces at portal.open-bio.org; biosql-l at open-bio.org
> Subject: Re: [BioSQL-l] FW: SeqWithQuality and biosql
> 
> 
> Hello -
> 
> I was wondering about similar issues with BioJava. As you may (or may
> not) know, BioJava can make sequences from symbols in any alphabet; two
> examples are DNA and the integer alphabet (a collection of Symbols that
> are integers). BioJava can also make compound alphabets; one such
> example is the Phred alphabet, which is the cross product DNA x Integer
> (technically a subset of Integer from 0 to 99).
> 
> Because sequence in BioSQL is stored in a CLOB, if you can encode your
> SeqWithQuality as a String of characters you can store it. With the case
> above (which is probably similar to yours) you would need 400 distinct
> characters (4 bases x 100 quality values) to encode it, which is too
> many for ASCII but could be done in Unicode. The downside is that your
> persistence layer needs to know how to encode and decode your
> SeqWithQuality. I'm not familiar with how BioPerl would do this. BioJava
> would need to implement a SymbolTokenizer for the alphabet, and then
> persistence would happen automatically (assuming your DB is OK with
> Unicode). An alternative would be to make a tokenizer that uses more
> than single-character tokens for encoding (e.g. A23 G40 T34 C22 etc.).
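
Both encodings described in the paragraph above can be sketched in
Python. This is only an illustration, not BioJava's tokenizer API: the
function names, the ACGT ordering, and the code point offset 0x100 are
all assumptions chosen for the example.

```python
BASES = "ACGT"    # assumed symbol order; 4 bases
NUM_QUALS = 100   # quality values 0..99, as in the Phred alphabet
OFFSET = 0x100    # hypothetical first code point; 400 symbols end at U+028F


def _check(bases: str, quals: list[int]) -> None:
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    for qual in quals:
        if not 0 <= qual < NUM_QUALS:
            raise ValueError(f"quality {qual} is outside 0..99")


def to_unicode(bases: str, quals: list[int]) -> str:
    """One Unicode character per (base, quality) pair: 4 x 100 = 400 symbols."""
    _check(bases, quals)
    return "".join(
        chr(OFFSET + BASES.index(base) * NUM_QUALS + qual)
        for base, qual in zip(bases, quals)
    )


def from_unicode(text: str) -> tuple[str, list[int]]:
    bases, quals = [], []
    for char in text:
        index, qual = divmod(ord(char) - OFFSET, NUM_QUALS)
        if not 0 <= index < len(BASES):
            raise ValueError(f"unexpected character {char!r}")
        bases.append(BASES[index])
        quals.append(qual)
    return "".join(bases), quals


def to_tokens(bases: str, quals: list[int]) -> str:
    """Multi-character tokens such as 'A23 G40 T34 C22'."""
    _check(bases, quals)
    return " ".join(f"{base}{qual}" for base, qual in zip(bases, quals))


def from_tokens(text: str) -> tuple[str, list[int]]:
    tokens = text.split()
    return "".join(t[0] for t in tokens), [int(t[1:]) for t in tokens]


print(to_tokens("AGTC", [23, 40, 34, 22]))
print(len(to_unicode("AGTC", [23, 40, 34, 22])))
```

The single-character form keeps one stored character per sequence
position, while the token form stays readable in plain ASCII at the cost
of variable-length entries.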
> 
> The alternative you suggest of storing two sequences with a relationship
> is also nice (because you can retrieve each part separately) but also
> requires your persistence layer to know about it. However, it has a big
> disadvantage: the two sequences are not strongly tied to each other. If
> you manipulate one you might invalidate the other. Also, if you delete
> one, the other will probably not be deleted in a cascade.
> 
> Not sure if any of this helps, but a consensus on how to store this kind
> of information would be good so the bio* projects do it the same way.
> Consensus in this case will probably mean whatever the first
> implementation is.
> 
> - Mark
> 
> 
> 
> 
> 
> "Marc Logghe" <Marc.Logghe at devgen.com>
> Sent by: biosql-l-bounces at portal.open-bio.org
> 07/04/2005 05:56 PM
> 
>  
>         To:     <biosql-l at open-bio.org>
>         cc:     (bcc: Mark Schreiber/GP/Novartis)
>         Subject:        [BioSQL-l] FW: SeqWithQuality and biosql
> 
> 
> Apologies for cross-posting, I had picked the wrong mail address :-(
> 
> -----Original Message-----
> From: Marc Logghe 
> Sent: Monday, July 04, 2005 11:43 AM
> To: bioperl-l at portal.open-bio.org
> Subject: SeqWithQuality and biosql
> 
> Hi all,
> I am currently exploring the possibility of storing a
> Bio::Seq::SeqWithQuality object in biosql.
> Has anyone ever tried this?
> One possibility would be to
> 1) split up the Bio::Seq::SeqWithQuality object into a plain
> Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
> 2) store them separately in biosql; different namespaces
> 3) link them with a relation term.
> 4) make a custom adaptor to fetch the persistent objects from biosql and
> reconstruct the Bio::Seq::SeqWithQuality
> 
> Does that make sense? Any other suggestions/possibilities?
> As a test I tried to load a Bio::Seq::PrimaryQual into biosql using
> load_seqdatabase.pl, but it fails because Bio::Seq::PrimaryQual does not
> have a namespace method.
> I hope I'm wrong but I have the impression there is a long way to go ;-)
> 
> Marc
> 
> 
> 
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
> 


