[BioSQL-l] how to quickly retrieve feature sequences

Gang Wu gwu at molbio.mgh.harvard.edu
Tue Jun 15 13:31:28 EDT 2004


SQL again:

SELECT t1.seqfeature_id,t1.bioentry_id,t2.start_pos, t2.end_pos, t2.strand,
t4.value locus_tag,
substring(t6.seq, t2.start_pos,t2.end_pos) seq
FROM `seqfeature` t1 inner join location t2 on
t1.seqfeature_id=t2.seqfeature_id
inner join term t3 on t1.type_term_id=t3.term_id
inner join seqfeature_qualifier_value t4 on
t1.seqfeature_id=t4.seqfeature_id
inner join term t5 on t4.term_id=t5.term_id
inner join biosequence t6 on t1.bioentry_id=t6.bioentry_id
where t3.name='gene' and t5.name='locus_tag'
limit 2

Gang

-----Original Message-----
From: biosql-l-bounces at portal.open-bio.org
[mailto:biosql-l-bounces at portal.open-bio.org]On Behalf Of Gang Wu
Sent: Tuesday, June 15, 2004 12:38 PM
To: biosql-l at open-bio.org
Subject: [BioSQL-l] how to quickly retrieve feature sequences


Hi,

I just loaded the 5 Arabidopsis thalian Genbank genome files into my
sequence database(BioSQL 1.38). My question is: How can I efficiently
retrieve all gene sequences from the database? I tried to do that by joining
seqfeature, seqfeature_qualifier_value, location, term and biosequence
tables, but it turned out to be extremely slow(See the attached SQL, 2
records take about 20 seconds on my Dell PowerEdge 2650 with dual 2.6G
Xeons). Does anyone have a better way to do it?

All I can imagine to do this faster is(by Java or other languages): Pull all
gene location info; Pull erlated sequence from biosequence table; rotate
through the gene location list and retrieve the substring of the sequence.
But this does not seem attractive for me since for different applications, I
have to write code to pull the sequences by myself. Is it possible to
extend/modify the BioSQL schema to serve this purpose better?

My understanding is that a lot subsequent applications would be only
interested in certain pieces of the whole genome sequences and there must be
an efficient way to do that. If everyone has to invent their method, the
BioSQL might be a little bit too limited. Any idea on this?

Gang

_______________________________________________
BioSQL-l mailing list
BioSQL-l at open-bio.org
http://open-bio.org/mailman/listinfo/biosql-l



More information about the BioSQL-l mailing list