[BioSQL-l] What should source_term_id in table seqfeature refer to?

Richard Holland holland at eaglegenomics.com
Sat Aug 15 16:32:35 UTC 2009


On 15 Aug 2009, at 15:29, Hilmar Lapp wrote:

>
> On Aug 15, 2009, at 6:44 AM, Richard Holland wrote:
>
>> [...]
>> What I mean is this:
>>
>> 1. The sequence itself could be downloaded from Genbank, EMBL, or  
>> elsewhere, or I could have discovered it in-house.
>
> That's actually what I meant.
>
>> 2. The features on the sequence could have been generated by  
>> running BLAST, miRBase, etc., or they could be manually annotated.
>> 3. The features on the sequence could have been downloaded from  
>> Genbank, EMBL, etc., or they could have been made locally, or by a  
>> collaborator at another institute.
>
> Right, but if a feature is the result of you running some algorithm  
> against some sequences, then it's not been downloaded or given to  
> you. Features on one and the same sequence can have different  
> sources, obviously, so I'm a bit confused - I think we're talking  
> about the same thing in different words, but I'm not sure.

Probably. :)

Case study: I download some seqs from Genbank (which then need to be
annotated as having come from Genbank, at the sequence level). They
already have some features on them (which need to be annotated as
having come from Genbank, at the feature level, but with an unknown
algorithm, as Genbank doesn't usually specify how they were generated).
I then run BLAST of those sequences against some local data, and  
record my own features as a result. I also run BLAT, and again record  
my own features. My colleague also runs BLAST of the same seqs against  
some data of his own, and wants our combined feature results to be  
stored in the same database. I want to be able to annotate all these  
new features both with the algorithm used to generate them (BLAST or  
BLAT) and who did it (myself or my colleague at the institute down the  
road), in addition to retaining the original features that came from  
Genbank (and making sure they're annotated as such). Hence I'd need a  
source attribute for the sequence (Genbank in this case), a source  
attribute for each feature (Genbank, Me, or Colleague X, in this  
case), and an algorithm/technique/protocol attribute for each feature  
(BLAST or BLAT or 'don't know, it just came from Genbank' in this  
example).
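
To make the three attributes concrete, here is a minimal sketch in Python
using an in-memory SQLite database. The table and column names (sequence,
feature, source, method) are hypothetical and chosen only for illustration;
they are not the BioSQL schema, and this is not a proposal for how
source_term_id should be defined. A NULL method stands for "don't know, it
just came from Genbank".

```python
import sqlite3


def build_demo():
    """Create a toy database holding the case study's sequence and features."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified tables; NOT the actual BioSQL schema.
    conn.executescript("""
        CREATE TABLE sequence (
            sequence_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            source      TEXT NOT NULL      -- where the sequence came from
        );
        CREATE TABLE feature (
            feature_id  INTEGER PRIMARY KEY,
            sequence_id INTEGER NOT NULL REFERENCES sequence(sequence_id),
            label       TEXT NOT NULL,
            source      TEXT NOT NULL,     -- who or what supplied the feature
            method      TEXT               -- algorithm used; NULL if unknown
        );
    """)
    conn.execute("INSERT INTO sequence VALUES (1, 'seq1', 'Genbank')")
    conn.executemany(
        "INSERT INTO feature (sequence_id, label, source, method) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "gene", "Genbank", None),         # shipped with the record
            (1, "hit_a", "Me", "BLAST"),          # my own BLAST run
            (1, "hit_b", "Me", "BLAT"),           # my own BLAT run
            (1, "hit_c", "Colleague X", "BLAST"), # colleague's BLAST run
        ],
    )
    return conn


def features_by(conn, source, method):
    """Return feature labels for one source and one method (None = unknown)."""
    rows = conn.execute(
        # IS (rather than =) so that a None parameter matches NULL methods.
        "SELECT label FROM feature "
        "WHERE source = ? AND method IS ? ORDER BY label",
        (source, method),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo()
    print(features_by(demo, "Me", "BLAST"))
    print(features_by(demo, "Genbank", None))
```

Keeping source and method as separate columns is what lets my BLAST hits
and my colleague's BLAST hits be told apart while still sharing a method.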

cheers,
Richard

> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/



