[BioSQL-l] FW: SeqWithQuality and biosql
mark.schreiber at novartis.com
Tue Jul 5 01:44:10 EDT 2005
Hello -
I was wondering about similar issues with BioJava. As you may (or may not)
know, BioJava can make sequences from symbols in any alphabet; two examples
are DNA and the integer alphabet (a collection of Symbols that are
integers). BioJava can also make compound alphabets. One such example is
the Phred alphabet, which is the cross product DNA x Integer
(technically a subset of Integer from 0 to 99).
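To make the size of such a compound alphabet concrete, here is a minimal
Python sketch. It is not BioJava code, and the names DNA, QUALITIES and
PHRED_LIKE_ALPHABET are made up for illustration. It simply enumerates the
cross product of the four DNA symbols with the integers 0 to 99:

```python
from itertools import product

DNA = "ACGT"              # the four unambiguous DNA symbols
QUALITIES = range(100)    # the integer subset 0..99 described above

# Each compound symbol is a (base, quality) pair (hypothetical representation).
PHRED_LIKE_ALPHABET = list(product(DNA, QUALITIES))

print(len(PHRED_LIKE_ALPHABET))  # 4 * 100 = 400
```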
Because sequence in BioSQL is stored in a CLOB, if you can encode your
SeqWithQuality as a String of characters you can store it. In the case
above (which is probably similar to yours) you would need 400 distinct
characters (4 bases x 100 quality values) to encode it, which is too many
for ASCII but could be done in Unicode. The downside is that your
persistence layer needs to know how to encode and decode your
SeqWithQuality. I'm not familiar with how BioPerl would do this. BioJava
would need to implement a SymbolTokenizer for the alphabet and then
persistence would happen automatically (assuming your DB is OK with
Unicode). An alternative would be to make a tokenizer that uses more than
single character tokens for encoding (e.g. A23 G40 T34 C22 etc.).
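The two encodings mentioned above could look roughly like the following
Python sketch. This is only an illustration under stated assumptions, not
how BioJava or BioPerl actually do it: the function names are hypothetical,
and the starting code point 0x0100 is an arbitrary choice made so that the
400 symbols map to ordinary printable Unicode characters.

```python
DNA = "ACGT"
MAX_QUALITY = 99
BASE_CODE_POINT = 0x0100  # assumption: arbitrary offset above Latin-1


def encode_single_char(pairs):
    """Map each (base, quality) pair to exactly one Unicode character."""
    chars = []
    for base, qual in pairs:
        if base not in DNA or not 0 <= qual <= MAX_QUALITY:
            raise ValueError(f"cannot encode {base!r}/{qual!r}")
        index = DNA.index(base) * (MAX_QUALITY + 1) + qual  # 0..399
        chars.append(chr(BASE_CODE_POINT + index))
    return "".join(chars)


def decode_single_char(text):
    """Inverse of encode_single_char."""
    pairs = []
    for ch in text:
        index = ord(ch) - BASE_CODE_POINT
        if not 0 <= index < len(DNA) * (MAX_QUALITY + 1):
            raise ValueError(f"cannot decode {ch!r}")
        base_index, qual = divmod(index, MAX_QUALITY + 1)
        pairs.append((DNA[base_index], qual))
    return pairs


def encode_tokens(pairs):
    """Multi-character tokens, e.g. 'A23 G40': plain ASCII, but longer."""
    return " ".join(f"{base}{qual}" for base, qual in pairs)


def decode_tokens(text):
    """Inverse of encode_tokens."""
    return [(token[0], int(token[1:])) for token in text.split()]


pairs = [("A", 23), ("G", 40), ("T", 34), ("C", 22)]
print(encode_tokens(pairs))  # A23 G40 T34 C22
```

The single-character form keeps one character per residue, so positions in
the stored string still line up with positions in the sequence. The token
form stays within ASCII but loses that one-to-one correspondence.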
The alternative you suggest of storing two sequences with a relationship
is also nice (because you can retrieve each part separately) but also
requires your persistence layer to know about it. However, it has big
disadvantages because the two parts are not strongly tied to each other.
If you manipulate one you might invalidate the other. Also, if you delete
one, the other will probably not be deleted in a cascade.
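A small Python sketch of the invalidation problem, using plain strings and
lists rather than any Bio* objects (the helper names are hypothetical).
Trimming only the sequence leaves the quality values out of step, and
nothing in the storage itself would notice:

```python
def trim_sequence(seq, start, end):
    """Trim only the sequence part, as an unaware client might do."""
    return seq[start:end]


def is_consistent(seq, qual):
    """The two parts only make sense together if their lengths match."""
    return len(seq) == len(qual)


seq = "ACGTACGT"
qual = [30, 31, 32, 33, 34, 35, 36, 37]

trimmed = trim_sequence(seq, 2, 6)
print(is_consistent(trimmed, qual))       # False: qualities were not trimmed
print(is_consistent(trimmed, qual[2:6]))  # True: both parts trimmed together
```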
Not sure if any of this helps but a consensus on how to store this kind of
information would be good so the bio* projects do it the same way.
Consensus in this case will probably mean whatever the first
implementation is.
- Mark
"Marc Logghe" <Marc.Logghe at devgen.com>
Sent by: biosql-l-bounces at portal.open-bio.org
07/04/2005 05:56 PM
To: <biosql-l at open-bio.org>
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [BioSQL-l] FW: SeqWithQuality and biosql
Apologies for cross-posting, I had picked the wrong mail address :-(
-----Original Message-----
From: Marc Logghe
Sent: Monday, July 04, 2005 11:43 AM
To: bioperl-l at portal.open-bio.org
Subject: SeqWithQuality and biosql
Hi all,
I am currently exploring the possibility of storing a
Bio::Seq::SeqWithQuality object in biosql.
Has anyone ever tried this ?
One possibility would be to
1) split up the Bio::Seq::SeqWithQuality object into a plain
Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
2) store them separately in biosql; different namespaces
3) link them with a relation term.
4) make a custom adaptor to fetch the persistent objects from biosql and
reconstruct the Bio::Seq::SeqWithQuality
Does that make sense ? Any other suggestions/possibilities ?
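The four steps above can be sketched in Python as follows. This is only an
outline of the idea, not BioPerl or bioperl-db code: the dictionaries stand
in for biosql tables, and the namespace names "seq" and "qual", the relation
term "has_quality" and both function names are assumptions made up for the
example.

```python
store = {}      # (namespace, accession) -> payload; stand-in for biosql
relations = []  # (subject_key, term, object_key) triples


def save_split(accession, seq, qual):
    """Steps 1-3: split, store under two namespaces, link with a term."""
    seq_key = ("seq", accession)
    qual_key = ("qual", accession)
    store[seq_key] = seq
    store[qual_key] = list(qual)
    relations.append((seq_key, "has_quality", qual_key))
    return seq_key


def fetch_joined(seq_key):
    """Step 4: follow the relation and rebuild the combined object."""
    for subject, term, obj in relations:
        if subject == seq_key and term == "has_quality":
            seq, qual = store[seq_key], store[obj]
            if len(seq) != len(qual):
                raise ValueError("sequence and quality lengths differ")
            return seq, list(qual)
    raise KeyError(seq_key)


key = save_split("X1", "ACGT", [23, 40, 34, 22])
print(fetch_joined(key))  # ('ACGT', [23, 40, 34, 22])
```

The length check in fetch_joined is the custom adaptor's only protection
against the two parts having drifted apart after they were stored.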
As a test I tried to load a Bio::Seq::PrimaryQual in biosql using the
load_seqdatabase.pl but it fails because Bio::Seq::PrimaryQual does not
have a namespace method.
I hope I'm wrong but I have the impression there is a long way to go ;-)
Marc