[Biojava-l] differences between read in sequence and stored sequence in database
gabrielle_doan at gmx.net
Tue Oct 28 14:26:47 UTC 2008
Concerning the problem described below, I have found out that this
problem also occurred in BioRuby and was fixed in 2004.
Unfortunately I'm clueless about BioRuby. Does anybody recognize this
problem or understand how it was solved in BioRuby?
I am grateful for any hints.
-------- Original Message --------
Subject: [Biojava-l] differences between read in sequence and stored
sequence in database
Date: Mon, 27 Oct 2008 13:57:03 +0100
From: Gabrielle Doan <gabrielle_doan at gmx.net>
To: biojava-l at biojava.org
I have a BioSQL database which contains all human chromosomes. For my
recent project I have to query for a part of a sequence.
As far as I know I can get the whole sequence from the entry
Biosequence.Seq in the BioSQL schema. So I've made this query:
SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;
But this query did not yield the desired string, because the stored length
of this biosequence is only 100,000,020 bp. I am very confused about why I
get such a discrepancy. I added all chromosomes to the database with the
built-in BioJava method addRichSequence(RichSequence seq).
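
A side note on the query above: in SQL, the third argument of SUBSTRING is a length, not an end position. If 131615042 and 131626262 are meant as start and end coordinates (an assumption, the message does not say), the length to pass would be end - start + 1. The following is a minimal sketch of that conversion. It uses Python's sqlite3 module and a tiny made-up sequence in place of the real BioSQL database; the helper names are hypothetical. SQLite's substr takes the same 1-based start and length arguments.

```python
import sqlite3


def region_to_substr_args(start, end):
    """Convert 1-based inclusive coordinates into (start, length)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start, end - start + 1


def fetch_region(conn, bioentry_id, start, end):
    """Return the subsequence start..end of one biosequence row, or None."""
    pos, length = region_to_substr_args(start, end)
    row = conn.execute(
        "SELECT substr(seq, ?, ?) FROM biosequence WHERE bioentry_id = ?",
        (pos, length, bioentry_id),
    ).fetchone()
    return row[0] if row else None


# Toy stand-in for the BioSQL biosequence table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biosequence "
    "(bioentry_id INTEGER PRIMARY KEY, length INTEGER, seq TEXT)"
)
conn.execute("INSERT INTO biosequence VALUES (1, 10, 'ACGTACGTAC')")

print(fetch_region(conn, 1, 3, 6))  # GTAC
print(region_to_substr_args(131615042, 131626262))  # (131615042, 11221)
```

Filtering on bioentry_id also keeps the query from returning one row per chromosome. None of this explains the missing bases; it only concerns how the region is requested.
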
From my raw data I know that this sequence should have a length of
140,279,252 bp. So where is the remaining part of my sequence? I have
observed these discrepancies on all chromosomes which are longer than
Here is an excerpt of my database:
bioentry_id  description                                                         length
2            Homo sapiens mitochondrion, complete genome.                        16571
3            Homo sapiens chromosome Y, reference assembly, complete sequence.
4            Homo sapiens chromosome X, reference assembly, complete sequence.
5            Homo sapiens chromosome 22, reference assembly, complete sequence.
6            Homo sapiens chromosome 21, reference assembly, complete sequence.
7            Homo sapiens chromosome 20, reference assembly, complete sequence.
8            Homo sapiens chromosome 9, reference assembly, complete sequence.
9            Homo sapiens chromosome 7, reference assembly, complete sequence.
Sequences smaller than 100,000,020 bp are correctly stored under
I would be grateful for any hints that explain the behaviour of my database.
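
One way to narrow down where the bases go missing is to compare, for every entry, the length recorded in the biosequence table with the number of characters actually stored in seq. The sketch below assumes that the length column holds the value reported when the sequence was loaded. It runs against an in-memory SQLite table with made-up rows rather than the real database, and find_truncated is a hypothetical helper name. On MySQL or PostgreSQL the character count would be CHAR_LENGTH(seq) instead of SQLite's length(seq).

```python
import sqlite3


def find_truncated(conn):
    """List (bioentry_id, declared_length, stored_length) where they differ."""
    return conn.execute(
        "SELECT bs.bioentry_id, bs.length, length(bs.seq) "
        "FROM biosequence AS bs "
        "WHERE bs.length <> length(bs.seq) "
        "ORDER BY bs.bioentry_id"
    ).fetchall()


# Toy stand-in for the BioSQL biosequence table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biosequence "
    "(bioentry_id INTEGER PRIMARY KEY, length INTEGER, seq TEXT)"
)
conn.executemany(
    "INSERT INTO biosequence VALUES (?, ?, ?)",
    [
        (1, 8, "ACGTACGT"),   # stored completely
        (2, 12, "ACGTACGT"),  # declared 12 bp, only 8 stored
    ],
)

for bioentry_id, declared, stored in find_truncated(conn):
    print(bioentry_id, declared, stored)  # 2 12 8
```

If the declared and stored lengths agree for the long chromosomes, the sequence was already short when it reached the database; if they differ, the truncation happened while writing or storing seq.
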
Biojava-l mailing list - Biojava-l at lists.open-bio.org