[Biojava-l] Unexpected behavior from GFF3 parser

Doug Brown debrown at unity.ncsu.edu
Wed May 21 18:18:53 UTC 2008


Hi All,

  While putting together a piece of code to load GFF3 
features into an existing BioSQL database, I ran across a scenario in which 
the note values stored in the database (seqfeature_qualifier_value 
table) are of the form
      "[=      54.9]"
whereas the values as they occurred in the GFF3 file were of the form
      "... ;someName=      54.9; ..."

There are two things that trouble me about this:
1) the brackets are puzzling. It looks as if some piece of code is 
calling SimpleRichAnnotation.toString()
2) inclusion of the equals sign and lack of trimming for the value.

As for #2, a quick inspection of the code suggests that line 343 of 
GFF3Parser.java may be erroneous in that it does not remove the equals 
sign. With the equals sign still at the front of the value, trimming 
would leave the spaces that follow it in place.
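
To illustrate what I mean, here is a minimal sketch of the suspected 
off-by-one and of the split I would have expected. This is not the actual 
GFF3Parser code; the class AttributeSplitSketch and the method 
splitAttribute are hypothetical names I made up for the example.

```java
class AttributeSplitSketch {
    // Hypothetical helper (not part of BioJava): split one GFF3 attribute
    // of the form key=value at the first '=' and trim both halves.
    static String[] splitAttribute(String attribute) {
        int eq = attribute.indexOf('=');
        if (eq < 0) {
            // no '=' present: treat the whole token as a key with no value
            return new String[] { attribute.trim(), "" };
        }
        String key = attribute.substring(0, eq).trim();
        // eq + 1 skips the '=' itself, so trim() can remove the padding
        String value = attribute.substring(eq + 1).trim();
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        String attribute = "someName=      54.9";
        int eq = attribute.indexOf('=');

        // Starting the substring at eq keeps the '='. trim() only strips
        // leading and trailing whitespace, so the inner spaces survive.
        System.out.println("\"" + attribute.substring(eq).trim() + "\"");

        String[] kv = splitAttribute(attribute);
        System.out.println(kv[0] + " -> \"" + kv[1] + "\"");
    }
}
```

The first println shows the same "=      54.9" text I see in the database 
(minus the brackets); the second shows the key and the trimmed value.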

As for #1, the problem might be due to my ignorance of proper BioJavaX 
programming style. Or it may be due to something in Hibernate calling 
toString() on the Set returned by SimpleRichAnnotation.getNoteSet().
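
For what it is worth, the brackets match what toString() produces for any 
java.util collection. The sketch below only demonstrates that standard 
behavior with a plain Set of strings; it says nothing about where in 
BioJavaX or Hibernate such a call might happen, and the class name 
SetToStringSketch is made up for the example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SetToStringSketch {
    // Render a collection the way an implicit toString() call would.
    static String render(Set<String> values) {
        return values.toString();
    }

    public static void main(String[] args) {
        Set<String> notes = new LinkedHashSet<>();
        notes.add("=      54.9");
        // A one-element collection is rendered as "[" + element + "]"
        System.out.println(render(notes));
    }
}
```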

 I am attempting to add GFF3 features to bioentries in a BioSQL 
database.  Being unable to find any code similar to that which lives in 
the org.biojava.bio.program.gff package, I started rolling my own. If 
there is an easier way, would somebody please set me on the correct path?

Here is a code snippet from the GFF3DocumentHandler implementation:

    public void recordLine( GFF3Record record)
      {

    . . .

        // use the HQL approach to get a thin sequence
        Query q = session.createQuery(
            "from ThinSequence as s where s.name = :acc");
        q.setString( "acc", record.getSequenceID());
        seq = (Sequence)q.uniqueResult();

        // add the GFF annotation(s) to the sequence
        if ( seq != null)
          this.getAnnotator( (GFF3Record.Impl)record, true).annotate( seq);

    . . .
   }

    . . .

    public SequenceAnnotator getAnnotator( final GFF3Record.Impl rec,
        final boolean checkSeqName)
      {
      return new SequenceAnnotator()
        {
        public Sequence annotate(Sequence seq)
            throws BioException, ChangeVetoException
        {
          if (!checkSeqName || rec.getSequenceID().equals(seq.getName()))
            {
            Feature.Template thisTemplate = null;

            // Build one of three types of annotation: 1) non-stranded,
            // non-phased, 2) stranded, non-phased, or 3) stranded, phased.
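            if (rec.getStrand() == StrandedFeature.UNKNOWN)
              {
              RichFeature.Template plain = new RichFeature.Template();
              plain.annotation = Annotation.EMPTY_ANNOTATION;
              // without this assignment thisTemplate stays null and the
              // common setup further down throws a NullPointerException
              thisTemplate = plain;
              }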
            else if (rec.getPhase() == GFFTools.NO_FRAME)
              {
              StrandedFeature.Template stranded =
                  new StrandedFeature.Template();
              stranded.annotation = Annotation.EMPTY_ANNOTATION;
              stranded.strand = rec.getStrand();
              thisTemplate = stranded;
              }
            else
              {
              // translate GFF phases into Biojava phases
              FramedFeature.Template framed = new FramedFeature.Template();
              framed.annotation = Annotation.EMPTY_ANNOTATION;
              framed.strand = rec.getStrand();
              switch (rec.getPhase())
                {
                case 0:
                  framed.readingFrame = FramedFeature.FRAME_0;
                  break;
                case 1:
                  framed.readingFrame = FramedFeature.FRAME_1;
                  break;
                case 2:
                  framed.readingFrame = FramedFeature.FRAME_2;
                  break;
                }
              thisTemplate = framed;
              }
            // set the items common to all three types
            thisTemplate.location =
                new RangeLocation( rec.getStart(), rec.getEnd());
            thisTemplate.typeTerm =  rec.getType();
            thisTemplate.sourceTerm =  rec.getSource();

            // the annotation was already filled out by the parser
            thisTemplate.annotation = rec.getAnnotation();

            // annotate the sequence
            seq.createFeature(thisTemplate);
            }
          return seq;
        }
        };
      }

Regards,
Doug Brown

-- 
Doug Brown - Bioinformatics
Fungal Genomics Laboratory
Center for Integrated Fungal Research
North Carolina State University
Campus Box 7251, Raleigh, NC 27695-7251
https://www.fungalgenomics.ncsu.edu/~debrown/
Tel: (919) 513-0394, Fax (919) 513-0024
e-mail: doug_brown AtSign ncsu.edu



