[Biojava-l] Best way to represent multiple annotation "tracks"

Mon Sep 8 19:53:09 EDT 2003

Hi again,

[ Please excuse the poor netiquette of posting a follow-up to my own message. ]

>So, let's say I have a bunch of ESTs, each of which has been mapped 
>to a location on the genome. To each EST, my data maps a floating 
>point number indicating the expression level.
>
>In this case, each Feature would represent one of the ESTs, right? 
>So what would be the Feature(s) representing the annotation track? 
>Would they be Features owned by the EST Feature (via the inheritance 
>of FeatureHolder by interface Feature)? Or -- ??

Alternatively, would each EST be a SubSequence with Features attached 
to it, with each Feature representing one of the data "tracks"? If 
so, then how does one decide when something is a Feature or a 
SubSequence? I'm not a biologist, so I'm probably way confused here, 
but it seems to me there is a gray area (ORFs?).

So I guess there are several competing implementations.

1. SEQUENCE BASED, FEATURES REPRESENT DATA TRACKS

    "Chromosome 1" : Sequence
       "AA504327" : SubSequence
          "RAT2N" : Feature
              <anon> : Annotation
                 "Value" => 2.34312
          "RAT2_MEAN" : Feature
              <anon> : Annotation
                 "Value" => 1.8342
       "AA432030" : SubSequence
          ...

2. SEQUENCE BASED, ANNOTATIONS REPRESENT DATA TRACKS

    "Chromosome 1" : Sequence
       "AA504327" : SubSequence
          "Microarray Experiment B2341v" : Feature
              <anon> : Annotation
                 "RAT2N" => 2.34312
                 "RAT2_MEAN" => 1.8342
       "AA432030" : SubSequence
          ...

3. FEATURE BASED, SUB-FEATURES REPRESENT DATA TRACKS

    (Very similar to #1, except using Feature instead of SubSequence.)

    "Chromosome 1" : Sequence
       "AA504327" : Feature
          "RAT2N" : Feature
              <anon> : Annotation
                 "Value" => 2.34312
          "RAT2_MEAN" : Feature
              <anon> : Annotation
                 "Value" => 1.8342
       "AA432030" : Feature
          ...

4. FEATURE BASED, ANNOTATIONS REPRESENT DATA TRACKS

    (Very similar to #2, except using Feature instead of SubSequence.)

    "Chromosome 1" : Sequence
       "AA504327" : Feature
          "Microarray Experiment B2341v" : Feature
              <anon> : Annotation
                 "RAT2N" => 2.34312
                 "RAT2_MEAN" => 1.8342
       "AA432030" : Feature
          ...
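
To make the difference between #3 and #4 concrete, here is a rough sketch in plain Java. Note that Node, layout3, layout4, track3 and track4 are hypothetical stand-ins I made up for illustration; they are NOT the BioJava Feature, FeatureHolder or Annotation interfaces, and I am only assuming the real ones could be nested in a similar way. The point is just how a lookup for one track value would differ between the two layouts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types for illustration only; not the BioJava API.
class TrackSketch {

    /** A named node carrying a key/value annotation and child nodes. */
    static class Node {
        final String name;
        final Map<String, Object> annotation = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node add(Node child) { children.add(child); return this; }

        Node put(String key, Object value) { annotation.put(key, value); return this; }

        /** Returns the first child with the given name, or null if none. */
        Node child(String childName) {
            for (Node c : children) {
                if (c.name.equals(childName)) return c;
            }
            return null;
        }
    }

    // Layout #3: one sub-feature per data track, each holding a single "Value".
    static Node layout3() {
        Node est = new Node("AA504327")
            .add(new Node("RAT2N").put("Value", 2.34312))
            .add(new Node("RAT2_MEAN").put("Value", 1.8342));
        return new Node("Chromosome 1").add(est);
    }

    // Layout #4: one sub-feature per experiment, one annotation key per track.
    static Node layout4() {
        Node experiment = new Node("Microarray Experiment B2341v")
            .put("RAT2N", 2.34312)
            .put("RAT2_MEAN", 1.8342);
        return new Node("Chromosome 1").add(new Node("AA504327").add(experiment));
    }

    // In #3 the track name selects a child; the annotation key is always "Value".
    static double track3(Node chromosome, String est, String track) {
        return (Double) chromosome.child(est).child(track).annotation.get("Value");
    }

    // In #4 the experiment name selects a child; the track name is the annotation key.
    static double track4(Node chromosome, String est, String experiment, String track) {
        return (Double) chromosome.child(est).child(experiment).annotation.get(track);
    }

    public static void main(String[] args) {
        System.out.println(track3(layout3(), "AA504327", "RAT2N"));
        System.out.println(track4(layout4(), "AA504327",
                                  "Microarray Experiment B2341v", "RAT2_MEAN"));
    }
}
```

In this sketch, #3 makes each track something you can find by name in the feature tree, while #4 keeps all values from one experiment together, so the caller has to know the experiment name to reach a track.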

Thanks a lot again & peace,

Ihab

-- 
Ihab A.B. Awad <ihab at stanford.edu>
Snr Scientific Programmer, Dept of Genetics