[BioSQL-l] What should source_term_id in table seqfeature refer to?

Richard Holland holland at eaglegenomics.com
Sat Aug 15 20:00:39 UTC 2009


Ok, cool. So we can now rephrase the original question as: How  
should provenance information be stored in BioSQL?

:)

cheers,
Richard

On 15 Aug 2009, at 20:31, Hilmar Lapp wrote:

>
> On Aug 15, 2009, at 12:32 PM, Richard Holland wrote:
>
>> [...]
>> Case study:
>
> Great, now we're getting somewhere :-)
>
>> I download some seqs from Genbank. (Which then need to be annotated  
>> as having come from Genbank, at the sequence level).
>
> Note, as you say, *at the sequence level*. I.e., you would record  
> this either using the bioentry's namespace (biodatabase), or a  
> bioentry_qualifier_value annotation. I would choose the former,  
> though since a bioentry can only be in one namespace, it may not  
> satisfy your needs.
>
>> They already have some features on them (which need to be annotated  
>> as having come from Genbank, at the feature level, but of an  
>> unknown algorithm, as Genbank usually doesn't specify how they  
>> were generated).
>
> Right. The source term would indicate that GenBank provided them to  
> you, and that that's all you know.
>
>> I then run BLAST of those sequences against some local data, and  
>> record my own features as a result. I also run BLAT, and again  
>> record my own features.
>
> BLAST and BLAT would now be the source terms.
>
>> My colleague also runs BLAST of the same seqs against some data of  
>> his own, and wants our combined feature results to be stored in the  
>> same database. I want to be able to annotate all these new features  
>> both with the algorithm used to generate them (BLAST or BLAT)
>
> You use the source term for that.
>
>> and who did it (myself or my colleague at the institute down the  
>> road)
>
> Ah - that's provenance information, not the source as the term is  
> normally used. BioSQL at present doesn't have an explicit provenance  
> model, but you can still record provenance information through  
> ontology-typed tag/value annotation in seqfeature_qualifier_value,  
> with the terms coming from a provenance ontology (that you make up  
> yourself or grab from somewhere else).
>
>> , in addition to retaining the original features that came from  
>> Genbank (and making sure they're annotated as such).
>
> That shouldn't be a problem - certainly it's not for BioSQL.
>
>> Hence I'd need a source attribute for the sequence (Genbank in this  
>> case), a source attribute for each feature (Genbank, Me, or  
>> Colleague X, in this case), and an algorithm/technique/protocol  
>> attribute for each feature (BLAST or BLAT or 'don't know it just  
>> came from Genbank' in this example).
>
> Not quite - source really is what provided the feature to you, not  
> who or when, or using which BLAST database, genome assembly, or how  
> you parsed the results, etc etc. That's all provenance information.
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
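
A minimal sketch of the split described in the quoted reply: the
algorithm (or data provider) goes into seqfeature.source_term_id, while
"who did it" goes into seqfeature_qualifier_value as an ontology-typed
tag/value pair. The tables are a simplified subset of the BioSQL
schema, run here in an in-memory SQLite database; the full schema
carries further columns and constraints that are not reproduced. The
ontology names, the "analyst" tag, and the helper functions get_term,
add_feature and describe are assumptions made up for illustration, not
part of BioSQL or of any language binding.

```python
import sqlite3

# Simplified subset of the BioSQL tables involved (assumption: trimmed for the sketch).
SCHEMA = """
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)
);
CREATE TABLE seqfeature (
    seqfeature_id  INTEGER PRIMARY KEY,
    bioentry_id    INTEGER NOT NULL,
    type_term_id   INTEGER NOT NULL REFERENCES term(term_id),
    source_term_id INTEGER NOT NULL REFERENCES term(term_id)
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    term_id       INTEGER NOT NULL REFERENCES term(term_id),
    rank          INTEGER NOT NULL DEFAULT 0,
    value         TEXT NOT NULL,
    PRIMARY KEY (seqfeature_id, term_id, rank)
);
"""

# Hypothetical ontology names; pick whatever your own tooling expects.
TYPE_ONT = "Feature Types (hypothetical)"
SOURCE_ONT = "Feature Sources (hypothetical)"
PROVENANCE_ONT = "Provenance (hypothetical)"


def get_term(conn, ontology, name):
    """Return the term_id for (ontology, name), creating both rows if needed."""
    conn.execute("INSERT OR IGNORE INTO ontology (name) VALUES (?)", (ontology,))
    (ont_id,) = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (ontology,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO term (name, ontology_id) VALUES (?, ?)", (name, ont_id)
    )
    (term_id,) = conn.execute(
        "SELECT term_id FROM term WHERE name = ? AND ontology_id = ?", (name, ont_id)
    ).fetchone()
    return term_id


def add_feature(conn, bioentry_id, feature_type, source, provenance=None):
    """Insert a feature; `source` is what provided it, `provenance` is tag/value extras."""
    cur = conn.execute(
        "INSERT INTO seqfeature (bioentry_id, type_term_id, source_term_id) "
        "VALUES (?, ?, ?)",
        (bioentry_id, get_term(conn, TYPE_ONT, feature_type),
         get_term(conn, SOURCE_ONT, source)),
    )
    feature_id = cur.lastrowid
    for tag, value in (provenance or {}).items():
        conn.execute(
            "INSERT INTO seqfeature_qualifier_value (seqfeature_id, term_id, value) "
            "VALUES (?, ?, ?)",
            (feature_id, get_term(conn, PROVENANCE_ONT, tag), value),
        )
    return feature_id


def describe(conn):
    """List (feature id, source term, provenance tag, provenance value) rows."""
    return conn.execute(
        "SELECT f.seqfeature_id, src.name, qt.name, q.value "
        "FROM seqfeature f "
        "JOIN term src ON src.term_id = f.source_term_id "
        "LEFT JOIN seqfeature_qualifier_value q ON q.seqfeature_id = f.seqfeature_id "
        "LEFT JOIN term qt ON qt.term_id = q.term_id "
        "ORDER BY f.seqfeature_id, q.rank"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_feature(conn, 1, "gene", "GenBank")  # came with the entry, nothing more known
add_feature(conn, 1, "similarity", "BLAST", {"analyst": "Richard"})
add_feature(conn, 1, "similarity", "BLAST", {"analyst": "Colleague X"})
add_feature(conn, 1, "similarity", "BLAT", {"analyst": "Richard"})
for row in describe(conn):
    print(row)
```

Both BLAST features share a single source term; only the provenance
qualifier tells them apart. Further provenance (date, BLAST database,
assembly version) would be added as more tags in the same ontology.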

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/



