[Biojava-l] differences between read in sequence and stored sequence in database

Mark Schreiber markjschreiber at gmail.com
Fri Oct 31 07:26:35 UTC 2008


Could this be a database implementation issue? Is there a limit on how
long a field can be in your DB?

- Mark
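
One quick way to probe for such a limit, using only the lengths reported in the quoted message below: if several different chromosomes share exactly the same maximum length, that value is more likely a storage cap than a real sequence length. This is a minimal sketch, not part of BioJava or BioSQL; the class and method names (LengthCapCheck, suspectedCap, reportedLengths) are made up for illustration, and the numbers are copied from the table in the original post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

public class LengthCapCheck {

    /**
     * Returns the largest stored length if more than one entry shares it,
     * which would hint at a storage cap; otherwise returns empty.
     */
    public static OptionalLong suspectedCap(Map<Integer, Long> lengthsById) {
        long max = Long.MIN_VALUE;
        int count = 0;
        for (long len : lengthsById.values()) {
            if (len > max) {
                max = len;   // new maximum, restart the count
                count = 1;
            } else if (len == max) {
                count++;     // another entry with the same maximum
            }
        }
        return count > 1 ? OptionalLong.of(max) : OptionalLong.empty();
    }

    /** bioentry_id -> length, as listed in the original post. */
    public static Map<Integer, Long> reportedLengths() {
        Map<Integer, Long> m = new LinkedHashMap<>();
        m.put(2, 16571L);
        m.put(3, 57772954L);
        m.put(4, 100000020L);
        m.put(5, 49691432L);
        m.put(6, 46944323L);
        m.put(7, 25960004L);
        m.put(8, 100000020L);
        m.put(9, 100000020L);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(suspectedCap(reportedLengths()));
    }
}
```

A shared maximum only suggests that a limit exists. It does not say whether the limit comes from the column type, a server setting, or the code that loads the sequences.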

On Mon, Oct 27, 2008 at 8:57 PM, Gabrielle Doan <gabrielle_doan at gmx.net> wrote:
>
> Hi all,
>
> I have a BioSQL database which contains all human chromosomes. For my current project I have to query for a part of a sequence.
> As far as I know I can get the whole sequence from the entry Biosequence.Seq in the BioSQL schema. So I've made this query:
>
> SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;
>
> But this query didn't yield the desired string, because the stored biosequence is only 100,000,020 bp long. I am confused by this discrepancy. I added all chromosomes to the database with BioJava's built-in method addRichSequence(RichSequence seq). From my raw data I know that this sequence should have a length of 140,279,252 bp. So where is the remaining part of my sequence? I have observed this discrepancy on all chromosomes that are longer than 100,000,020 bp.
>
> Here is an excerpt from my database:
> bioentry_id  description                                                         length
> 2            Homo sapiens mitochondrion, complete genome.                        16571
> 3            Homo sapiens chromosome Y, reference assembly, complete sequence.   57772954
> 4            Homo sapiens chromosome X, reference assembly, complete sequence.   100000020
> 5            Homo sapiens chromosome 22, reference assembly, complete sequence.  49691432
> 6            Homo sapiens chromosome 21, reference assembly, complete sequence.  46944323
> 7            Homo sapiens chromosome 20, reference assembly, complete sequence.  25960004
> 8            Homo sapiens chromosome 9, reference assembly, complete sequence.   100000020
> 9            Homo sapiens chromosome 7, reference assembly, complete sequence.   100000020
>
> Sequences smaller than 100,000,020 bp are correctly stored under Biosequence.seq.
>
> I would be grateful for any hints that explain the behaviour of my database.
>
> Cheers,
>
> Gabrielle
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
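
A side note on the query in the quoted message, separate from the truncation itself: in the common three-argument form of SUBSTRING (as in MySQL and PostgreSQL) the third argument is a length, not an end position. The sketch below assumes the two numbers in the query were meant as 1-based, inclusive start and end coordinates. That is a guess about the intent, not something stated in the post. The class and method names (SubstringArgs, lengthFor, regionQuery) are invented for illustration.

```java
public class SubstringArgs {

    /** Length of a 1-based, inclusive region [start, end]. */
    public static long lengthFor(long start, long end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("need 1 <= start <= end");
        }
        return end - start + 1;
    }

    /** Builds the query text with (start, length) instead of (start, end). */
    public static String regionQuery(long start, long end) {
        return "SELECT SUBSTRING(bs.seq, " + start + ", "
                + lengthFor(start, end) + ") FROM biosequence bs;";
    }

    public static void main(String[] args) {
        // Coordinates taken from the query in the original post.
        System.out.println(regionQuery(131615042L, 131626262L));
    }
}
```

With the coordinates from the post this gives a length of 11221. The region still cannot be retrieved while the stored sequence ends at 100,000,020 bp, so this does not replace checking for a size limit.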


