[BioSQL-l] FW: SeqWithQuality and biosql

Marc Logghe Marc.Logghe at devgen.com
Tue Jul 5 03:39:28 EDT 2005


Thanks for the feedback.
Good to know I am not alone in this ;-)
I totally agree with Mark that there should be a kind of consensus on
how to store this in Bio*.
Yesterday I mistakenly posted my original mail to the bioperl list.
Heikki responded to that; it might be a good starting point but I am not
familiar with it:
http://portal.open-bio.org/pipermail/bioperl-l/2005-July/019271.html
So much for the long-term solution.
In the short term, to have at least something that works, I'll experiment a
little with storing separate objects. I remember one of the
presentations of Hilmar, where he gave the example of making an adaptor
and storing 2 sequence objects that interacted with each other as a
result of a Two Hybrid experiment in yeast.
Cheers,
Marc


> 
> I'd think storing it in BioSQL as 2-byte pairs would be good. 
> First byte is the base (an ASCII character), second byte is 
> the quality (an 8-bit integer). Sure it wastes a few bits but 
> so does normal DNA...
> 
> 
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> 
> 
> > -----Original Message-----
> > From: biosql-l-bounces at portal.open-bio.org
> > [mailto:biosql-l-bounces at portal.open-bio.org] On Behalf Of 
> > mark.schreiber at novartis.com
> > Sent: Tuesday, July 05, 2005 1:44 PM
> > To: Marc Logghe
> > Cc: biosql-l-bounces at portal.open-bio.org; biosql-l at open-bio.org
> > Subject: Re: [BioSQL-l] FW: SeqWithQuality and biosql
> > 
> > 
> > Hello -
> > 
> > I was wondering about similar issues with biojava. As you may (or may
> > not) know, biojava can make sequences from symbols in any alphabet; two
> > examples are DNA and the integer alphabet (a collection of Symbols
> > that are integers). Biojava can also make compound alphabets; one such
> > example is the Phred alphabet, which is the multiplication of DNA x
> > Integer (technically a subset of Integer from 0 to 99).
> > 
> > Because sequence in BioSQL is stored in a CLOB, if you can encode your
> > SeqWithQuality as a String of characters you can store it. With the
> > case above (which is probably similar to yours) you would need 400
> > distinct characters (4 bases x 100 quality values) to store it, which
> > is too many for ASCII but could be done in Unicode. The downside is
> > that your persistence layer needs to know how to encode and decode
> > your SeqWithQuality. I'm not familiar with how BioPerl would do this.
> > BioJava would need to implement a SymbolTokenizer for the alphabet and
> > then persistence would happen automatically (assuming your DB is OK
> > with Unicode). An alternative would be to make a tokenizer that uses
> > more than single-character tokens for encoding (e.g. A23 G40 T34 C22
> > etc.).
> > 
> > The alternative you suggest of storing two sequences with a
> > relationship is also nice (because you can retrieve each part
> > separately) but also requires your persistence layer to know about it.
> > However, it has big disadvantages because they are not strongly tied
> > to each other. If you manipulate one you might invalidate the other.
> > Also, if you delete one the other will probably not be deleted in a
> > cascade.
> > 
> > Not sure if any of this helps, but a consensus on how to store this
> > kind of information would be good so the bio* projects do it the same
> > way.
> > Consensus in this case will probably mean whatever the first
> > implementation is.
> > 
> > - Mark
> > 
> > 
> > 
> > 
> > 
> > "Marc Logghe" <Marc.Logghe at devgen.com> Sent by: 
> > biosql-l-bounces at portal.open-bio.org
> > 07/04/2005 05:56 PM
> > 
> >  
> >         To:     <biosql-l at open-bio.org>
> >         cc:     (bcc: Mark Schreiber/GP/Novartis)
> >         Subject:        [BioSQL-l] FW: SeqWithQuality and biosql
> > 
> > 
> > Apologies for cross posting, I had picked the wrong mail address :-(
> > 
> > -----Original Message-----
> > From: Marc Logghe
> > Sent: Monday, July 04, 2005 11:43 AM
> > To: bioperl-l at portal.open-bio.org
> > Subject: SeqWithQuality and biosql
> > 
> > Hi all,
> > I am currently exploring the possibility to store a 
> > Bio::Seq::SeqWithQuality object in biosql.
> > Has anyone ever tried this ?
> > One possibility would be to
> > 1) split up the Bio::Seq::SeqWithQuality object into a plain 
> > Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
> > 2) store them separately in biosql; different namespaces
> > 3) link them with a relation term.
> > 4) make a custom adaptor to fetch the persistent objects from biosql
> >    and reconstruct the Bio::Seq::SeqWithQuality
> > 
> > Does that make sense ? Any other suggestions/possibilities ?
> > As a test I tried to load a Bio::Seq::PrimaryQual in biosql using the
> > load_seqdatabase.pl script, but it fails because Bio::Seq::PrimaryQual
> > does not have a namespace method.
> > I hope I'm wrong but I have the impression there is a long way to go
> > ;-)
> > 
> > Marc
> > 
> > 
> > 
> > 
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at open-bio.org
> > http://open-bio.org/mailman/listinfo/biosql-l
> > 
> 
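The two string encodings suggested in the quoted messages (2-byte
base/quality pairs, and multi-character tokens such as A23 G40 T34 C22)
can be illustrated with a minimal Python sketch. This is not code from
the thread, nor part of BioPerl, BioJava or BioSQL: the function names
are hypothetical, and it assumes bases are ASCII characters and quality
values fit in 0 to 255. Note that the byte-pair form can contain
non-printable bytes, so whether a given database's CLOB column accepts
it unchanged is an open assumption.

```python
from typing import List, Tuple


def encode_pairs(bases: str, quals: List[int]) -> bytes:
    """Interleave each base (one ASCII byte) with its quality (one byte)."""
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    out = bytearray()
    for base, qual in zip(bases.encode("ascii"), quals):
        if not 0 <= qual <= 255:
            raise ValueError("quality must fit in one byte (0 to 255)")
        out.append(base)   # first byte: the base
        out.append(qual)   # second byte: the quality
    return bytes(out)


def decode_pairs(data: bytes) -> Tuple[str, List[int]]:
    """Split interleaved base/quality bytes back into two parts."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    return data[0::2].decode("ascii"), list(data[1::2])


def encode_tokens(bases: str, quals: List[int]) -> str:
    """Write space-separated tokens: one base letter plus its quality."""
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    return " ".join(f"{base}{qual}" for base, qual in zip(bases, quals))


def decode_tokens(text: str) -> Tuple[str, List[int]]:
    """Parse tokens of the form <base letter><integer quality>."""
    bases = []
    quals = []
    for token in text.split():
        bases.append(token[0])
        quals.append(int(token[1:]))
    return "".join(bases), quals


if __name__ == "__main__":
    packed = encode_pairs("AGTC", [23, 40, 34, 22])
    print(len(packed))
    print(encode_tokens("AGTC", [23, 40, 34, 22]))
```

The byte-pair form always costs two bytes per position; the token form
stays printable ASCII but varies in length with the quality values.
Either way the persistence layer has to know which encoding was used,
as noted in the quoted message.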


