[BioSQL-l] FW: SeqWithQuality and biosql

mark.schreiber at novartis.com mark.schreiber at novartis.com
Tue Jul 5 01:44:10 EDT 2005


Hello -

I was wondering about similar issues with BioJava. As you may (or may not) 
know, BioJava can make sequences from symbols in any alphabet; two examples 
are DNA and the integer alphabet (a collection of Symbols that are 
integers). BioJava can also make compound alphabets. One such example is 
the Phred alphabet, which is the cross product DNA x Integer 
(technically DNA x the subset of Integer from 0 to 99).

Because sequence in BioSQL is stored in a CLOB, if you can encode your 
SeqWithQuality as a String of characters you can store it. With the case 
above (which is probably similar to yours) you would need 400 distinct 
characters (4 bases x 100 quality values) to encode it, which is too many 
for ASCII but could be done in Unicode. The downside is that your 
persistence layer needs to know how to encode and decode your 
SeqWithQuality. I'm not familiar with how BioPerl would do this. BioJava 
would need to implement a SymbolTokenizer for the alphabet, and then 
persistence would happen automatically (assuming your DB is OK with 
Unicode). An alternative would be to make a tokenizer that uses more than 
single-character tokens for encoding (e.g. A23 G40 T34 C22 etc.).
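
As a rough illustration of the two encodings described above, here is a 
minimal Python sketch. It does not use the BioJava SymbolTokenizer API; the 
function names and the code-point offset are assumptions made only for this 
example.

```python
BASES = "ACGT"
MAX_QUALITY = 99   # quality values run from 0 to 99, as in the Phred alphabet
OFFSET = 0x100     # assumed starting code point, chosen to skip control characters


def encode_packed(bases, quals):
    """Encode each (base, quality) pair as one Unicode character."""
    if len(bases) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    chars = []
    for base, qual in zip(bases, quals):
        if not 0 <= qual <= MAX_QUALITY:
            raise ValueError(f"quality out of range: {qual}")
        # 4 bases x 100 qualities gives an index from 0 to 399
        index = BASES.index(base) * (MAX_QUALITY + 1) + qual
        chars.append(chr(OFFSET + index))
    return "".join(chars)


def decode_packed(text):
    """Invert encode_packed, returning (bases, quals)."""
    bases, quals = [], []
    for ch in text:
        base_index, qual = divmod(ord(ch) - OFFSET, MAX_QUALITY + 1)
        bases.append(BASES[base_index])
        quals.append(qual)
    return "".join(bases), quals


def encode_tokens(bases, quals):
    """Encode as space-separated multi-character tokens such as 'A23'."""
    if len(bases) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    return " ".join(f"{b}{q}" for b, q in zip(bases, quals))


def decode_tokens(text):
    """Invert encode_tokens, returning (bases, quals)."""
    tokens = text.split()
    return "".join(t[0] for t in tokens), [int(t[1:]) for t in tokens]
```

The packed form keeps one character per position, so the stored string has 
the same length as the sequence, but it needs a Unicode-capable column. The 
token form stays within ASCII at the cost of a longer string.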

The alternative you suggest of storing two sequences with a relationship 
is also nice (because you can retrieve each part separately), but it also 
requires your persistence layer to know about it. However, it has a big 
disadvantage: the two sequences are not strongly tied to each other. If 
you manipulate one you might invalidate the other. Also, if you delete one 
the other will probably not be deleted in a cascade.
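
The cascade problem can be seen in a small sketch using Python's sqlite3 
module. The two tables below are a simplified, hypothetical schema invented 
for this example, not the actual BioSQL tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE entry (
    entry_id  INTEGER PRIMARY KEY,
    namespace TEXT NOT NULL,
    name      TEXT NOT NULL,
    data      TEXT NOT NULL
);
CREATE TABLE entry_relationship (
    subject_entry_id INTEGER NOT NULL REFERENCES entry(entry_id) ON DELETE CASCADE,
    object_entry_id  INTEGER NOT NULL REFERENCES entry(entry_id) ON DELETE CASCADE,
    term             TEXT NOT NULL
);
""")

# Store the sequence and its quality values as two separate entries,
# linked by a relationship row.
conn.execute("INSERT INTO entry VALUES (1, 'sequence', 'read1', 'AGTC')")
conn.execute("INSERT INTO entry VALUES (2, 'quality', 'read1', '23 40 34 22')")
conn.execute("INSERT INTO entry_relationship VALUES (2, 1, 'quality_of')")

# Delete only the sequence entry.
conn.execute("DELETE FROM entry WHERE entry_id = 1")

links = conn.execute("SELECT COUNT(*) FROM entry_relationship").fetchone()[0]
remaining = conn.execute("SELECT namespace, name FROM entry").fetchall()
print(links, remaining)
```

In this sketch the cascade removes the relationship row, but the quality 
entry survives without its sequence. Cleaning it up would be the job of the 
persistence layer.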

Not sure if any of this helps, but a consensus on how to store this kind of 
information would be good, so that the bio* projects do it the same way. 
Consensus in this case will probably mean whatever the first 
implementation is.

- Mark





"Marc Logghe" <Marc.Logghe at devgen.com>
Sent by: biosql-l-bounces at portal.open-bio.org
07/04/2005 05:56 PM

 
        To:     <biosql-l at open-bio.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [BioSQL-l] FW: SeqWithQuality and biosql


Apologies for cross-posting, I had picked the wrong mail address :-(

-----Original Message-----
From: Marc Logghe 
Sent: Monday, July 04, 2005 11:43 AM
To: bioperl-l at portal.open-bio.org
Subject: SeqWithQuality and biosql

Hi all,
I am currently exploring the possibility of storing a
Bio::Seq::SeqWithQuality object in biosql.
Has anyone ever tried this?
One possibility would be to
1) split up the Bio::Seq::SeqWithQuality object into a plain
Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
2) store them separately in biosql; different namespaces
3) link them with a relation term.
4) make a custom adaptor to fetch the persistent objects from biosql and
reconstruct the Bio::Seq::SeqWithQuality
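
The four steps above can be sketched in plain Python as follows. This is 
only an illustration of the split/link/reconstruct idea and is not BioPerl 
or bioperl-db code; the class, the in-memory store, the namespace names and 
the "quality_of" term are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SeqWithQuality:
    name: str
    seq: str
    qual: list


store = {}      # (namespace, name) -> stored string, standing in for the database
relations = []  # (subject_key, term, object_key) triples


def save(swq, seq_ns="reads", qual_ns="reads_qual"):
    """Steps 1-3: split the object, store both parts, and link them."""
    seq_key, qual_key = (seq_ns, swq.name), (qual_ns, swq.name)
    store[seq_key] = swq.seq
    store[qual_key] = " ".join(str(q) for q in swq.qual)
    relations.append((qual_key, "quality_of", seq_key))


def fetch(name, seq_ns="reads"):
    """Step 4: follow the relation and rebuild the combined object."""
    seq_key = (seq_ns, name)
    for subject, term, obj in relations:
        if term == "quality_of" and obj == seq_key:
            seq = store[seq_key]
            qual = [int(q) for q in store[subject].split()]
            # The two parts are stored independently, so check they still agree.
            if len(seq) != len(qual):
                raise ValueError("sequence and quality lengths differ")
            return SeqWithQuality(name, seq, qual)
    raise KeyError(name)
```

For example, after save(SeqWithQuality("read1", "AGTC", [23, 40, 34, 22])), 
a call to fetch("read1") rebuilds an equal object from the two stored parts.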

Does that make sense? Any other suggestions/possibilities?
As a test I tried to load a Bio::Seq::PrimaryQual into biosql using
load_seqdatabase.pl, but it fails because Bio::Seq::PrimaryQual does not
have a namespace method.
I hope I'm wrong, but I have the impression there is a long way to go ;-)

Marc




_______________________________________________
BioSQL-l mailing list
BioSQL-l at open-bio.org
http://open-bio.org/mailman/listinfo/biosql-l




