[Biojava-l] ResidueList & Annotatable

Matthew Pocock mrp@sanger.ac.uk
Wed, 26 Jan 2000 17:28:50 +0000


Ann Loraine wrote:

> One last concern I have about getting away from Strings:
>
> I'd like to use java regular expression packages to examine sequence,
> and such packages* most likely would deal with Strings.  If there's a way
> to get a String out of a ResidueList then I will be satisfied.
>
> (e.g., savarese.org, ORO software)
>

For historical reasons ResidueList does not provide this, but Sequence has
two methods, seqString() and subStr(start, end), that return Strings. In the
SimpleSequence implementation each residue is converted to a single char via
getSymbol, so you effectively get the sequence, or part of it, as a String.
There is a little ambiguity here: you may actually want proteins to print
their three-letter codes rather than single letters, but it works for now.
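
As a rough sketch of the regular-expression use case, here is how a motif scan
over such a String might look. This uses java.util.regex from the standard
library in place of the ORO package mentioned above, and the literal DNA
string merely stands in for whatever seqString() returns. The class name
MotifScan, the helper matchStarts, and the choice to report 1-based positions
are all assumptions made for illustration, not part of BioJava.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MotifScan {

    // Returns the 1-based start position of every non-overlapping match
    // of regex in seq (hypothetical helper, not a BioJava method).
    static List<Integer> matchStarts(String seq, String regex) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(seq);
        while (m.find()) {
            starts.add(m.start() + 1); // Matcher.start() is 0-based
        }
        return starts;
    }

    public static void main(String[] args) {
        // Stand-in for the String you would get from seq.seqString().
        String dna = "GGTATAAAGGCTATATAGG";
        System.out.println(matchStarts(dna, "TATA[AT]A")); // prints [3, 12]
    }
}
```

The same helper should work on the result of subStr(start, end), as long as
you remember to offset the reported positions by the start of the sub-string.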

>
> > No, I certainly wouldn't want to impose the one-type-of-annotation-
> > per-Sequence limit.  If you look at the bio.seq.Annotation interface,
> > you will see that it actually represents a set of keyed Objects
> > associated with a Sequence (or some other BioJava object) -- there
> > should be no problem associating many different pieces of data
> > with the Sequence.
> >
> > Note that the Annotation mechanism is only really meant for
> > storing data which corresponds to the whole sequence -- for
> > instance, information about how a sequence was obtained,
> > references to journals, etc..  Annotations which apply to
> > specific locations on the sequence (e.g. promoter elements)
> > would be better represented using the more structured
> > bio.seq.Feature interface.
> >
> >
> > Thomas.
> > --
> > ``Science is magic that works''  -- Kurt Vonnegut.
> >
>
> Okay, I think I'm understanding this now.
>
> So a Sequence would have a single Annotation Object, which itself has
> numerous Features, all retrievable if I know what "key" to use?
>
> So if I wanted all the exons in a sequence, I could do something like:
>
> Object exon_list = annotation.get("exons") ?
>
> And exon_list would be a Set or some other data structure which
> contained Features representing exons?
>
> -Ann
>

It is nearly like that. The Annotation object is used all over the place to
add per-object annotations. Features, however, are quite special things and
apply only to sequences. The API for dealing with them is as follows:

Both Sequence and Feature extend the FeatureHolder interface. This defines
five methods for querying and editing the features within a sequence: you can
add or remove features, count them, get an Iterator over them, or obtain a new
FeatureHolder that contains a filtered subset. The filters are simple objects
that return yay-or-nay for each feature. A couple of default ones are static
final members of FeatureFilter.

So, to get all exons from a sequence and print out their types and locations,
do something like:

Sequence seq; // get seq from somewhere
// keep only the features whose type is "Exon"
FeatureFilter exonFilter = new FeatureFilter.ByType("Exon");
FeatureHolder exons = seq.filter(exonFilter, true);

System.out.println("Exons for sequence " + seq.getName());
for (Iterator i = exons.features(); i.hasNext(); ) {
  Feature f = (Feature) i.next();
  System.out.println("Feature " + f.getType() + " (" + f.getLocation() + ")");
}
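
If it helps to see the filtering idea in isolation, below is a small
self-contained model of it. None of these types are the real BioJava ones:
Feat, Filter, ByType and Holder are hypothetical stand-ins for Feature,
FeatureFilter, FeatureFilter.ByType and FeatureHolder, and the accept method
name is an assumption. The sketch also leaves out the boolean second argument
to filter shown above, and feature removal.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FilterSketch {

    // Hypothetical stand-in for a feature: a type plus a start/end location.
    static final class Feat {
        final String type;
        final int start;
        final int end;

        Feat(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Stand-in for a filter: answers yes or no for each feature.
    interface Filter {
        boolean accept(Feat f);
    }

    // Accepts only the features whose type equals the given string.
    static final class ByType implements Filter {
        private final String type;

        ByType(String type) {
            this.type = type;
        }

        public boolean accept(Feat f) {
            return type.equals(f.type);
        }
    }

    // Stand-in for a feature holder: add, count, iterate, filter.
    static final class Holder {
        private final List<Feat> feats = new ArrayList<>();

        void add(Feat f) {
            feats.add(f);
        }

        int count() {
            return feats.size();
        }

        Iterator<Feat> features() {
            return feats.iterator();
        }

        // Returns a new holder with only the accepted features;
        // the original holder is left unchanged.
        Holder filter(Filter ff) {
            Holder result = new Holder();
            for (Feat f : feats) {
                if (ff.accept(f)) {
                    result.add(f);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Holder seq = new Holder();
        seq.add(new Feat("Exon", 10, 50));
        seq.add(new Feat("Promoter", 1, 9));
        seq.add(new Feat("Exon", 80, 120));

        Holder exons = seq.filter(new ByType("Exon"));
        for (Iterator<Feat> i = exons.features(); i.hasNext(); ) {
            Feat f = i.next();
            System.out.println("Feature " + f.type + " (" + f.start + ".." + f.end + ")");
        }
    }
}
```

Because filter hands back another holder rather than a bare list, the result
can itself be counted, iterated or filtered again.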

>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l