[Biojava-l] Unexpected behavior from GFF3 parser

Richard Holland dicknetherlands at gmail.com
Wed May 21 18:49:57 UTC 2008


Hello. The GFF3 parser is an 'old' pre-BioJavaX one which basically
stuffs everything from the file into string annotations. It's fine as
long as you don't attempt to round-trip it to the database, but then
things get interesting. Still, you're right about the equals sign - it's
a bug that the parser isn't stripping it before trimming the value.
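
For illustration only, here is a minimal sketch of the behaviour one would expect for a single key=value attribute: split at the first '=', drop the '=' itself, then trim both halves. This is not the code in GFF3Parser.java; the class and method names (GffAttributeSplit, split) are made up for the example.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper, not part of BioJava.
final class GffAttributeSplit {

    // Splits one "key=value" attribute at the first '=' and trims both halves.
    static Map.Entry<String, String> split(String attribute) {
        int eq = attribute.indexOf('=');
        if (eq < 0) {
            // No '=' present: treat the whole token as a key with an empty value.
            return new SimpleImmutableEntry<>(attribute.trim(), "");
        }
        String key = attribute.substring(0, eq).trim();
        // eq + 1 skips the '=' so it cannot end up at the front of the value.
        String value = attribute.substring(eq + 1).trim();
        return new SimpleImmutableEntry<>(key, value);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = split("someName=      54.9");
        System.out.println(e.getKey() + " -> '" + e.getValue() + "'");
        // prints: someName -> '54.9'
    }
}
```

If the substring started at the index of the '=' rather than one past it, the value would begin with '=' and a later trim() would leave the padding after it untouched, which matches the "=      54.9" you are seeing.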

But, the string is getting stored verbatim because the BioJavaX
persistence layer always stores all 'old' annotations as toStrings of
the actual objects (it's a best-fit solution for translating 'old'
data into BioJavaX data as in BioJavaX all annotations are strings).
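
To see where the brackets come from, here is a small sketch in plain Java. It assumes the 'old' annotation value is held in a java.util collection (a List is used here, which is a guess about the parser's internals), and that the persistence layer then stores the result of toString() on that object. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo, not part of BioJava or BioJavaX.
final class CollectionToStringDemo {

    // Wraps the raw value in a list and returns what toString() would store.
    static String storedForm(String rawValue) {
        List<String> values = new ArrayList<>();
        values.add(rawValue);
        // java.util collections render as "[elem1, elem2, ...]".
        return values.toString();
    }

    public static void main(String[] args) {
        System.out.println(storedForm("=      54.9"));
        // prints: [=      54.9]
    }
}
```

So the brackets are just the collection's own string form, and the untrimmed "=      54.9" inside them is the parser bug showing through.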

The problem really lies in the GFF3 parser not having been
updated/replaced to work with the new BioJavaX-style parsing, where
things are parsed more deeply and filed into more appropriate
locations in the object model. If there's a volunteer to convert it,
we'd appreciate hearing from them....

cheers,
Richard

2008/5/21 Doug Brown <debrown at unity.ncsu.edu>:
> Hi All,
>
>  In the process of putting together a piece of code to load GFF3 features
> into an extant BioSQL database, I ran across a scenario wherein the values
> for notes stored in the database (seqfeature_qualifier_value table) are of
> the form
>     "[=      54.9]"
> and the values as they occurred in the GFF3 file were of the form
>     "... ;someName=      54.9; ..."
>
> There are two things that trouble me about this:
> 1) the brackets are puzzling. It looks as if some piece of code is calling
> SimpleRichAnnotation.toString()
> 2) inclusion of the equals sign and lack of trimming for the value.
>
> As for #2, a quick inspection of the code shows line 343 of GFF3Parser.java
> may be erroneous in that it does not eliminate the equals sign. That then
> would cause a failure of the trimming.
>
> As for #1, the problem might be due to my high level of ignorance of proper
> Biojavax programming style.  OR, it may be due to something in hibernate
> calling toString on the Set returned by SimpleRichAnnotation.getNoteSet().
>
> I am attempting to add GFF3 features to bioentries in a BioSQL database.
>  Being unable to find any code similar to that which lives in the
> org.biojava.bio.program.gff package, I started rolling my own. If there is
> an easier way,  would somebody please set me on the correct path?
>
> Here is a code snippet from the GFF3DocumentHandler implementation:
>
>   public void recordLine( GFF3Record record)
>     {
>
>   . . .
>
>       // use the HQL approach to get a thin sequence
>       Query q = session.createQuery( "from ThinSequence as s where s.name = :acc");
>       q.setString( "acc", record.getSequenceID());
>       seq = (Sequence)q.uniqueResult();
>
>       // add the GFF annotation(s) to the sequence
>       if ( seq != null)
>         this.getAnnotator( (GFF3Record.Impl)record, true).annotate( seq);
>
>   . . .
>  }
>
>   . . .
>
>   public SequenceAnnotator getAnnotator( final GFF3Record.Impl rec,
>       final boolean checkSeqName)
>     {
>     return new SequenceAnnotator()
>       {
>       public Sequence annotate(Sequence seq)
>           throws BioException, ChangeVetoException
>       {
>         if (!checkSeqName || rec.getSequenceID().equals(seq.getName()))
>           {
>           Feature.Template thisTemplate = null;
>
>           // Build the three types of annotations: 1) non-stranded, non-phased,
>           // 2) stranded, non-phased, or 3) stranded, phased.
>           if (rec.getStrand() == StrandedFeature.UNKNOWN)
>             {
>             RichFeature.Template plain = new RichFeature.Template();
>             plain.annotation = Annotation.EMPTY_ANNOTATION;
>             // assign it, otherwise thisTemplate is still null further down
>             thisTemplate = plain;
>             }
>           else if (rec.getPhase() == GFFTools.NO_FRAME)
>             {
>             StrandedFeature.Template stranded = new StrandedFeature.Template();
>             stranded.annotation = Annotation.EMPTY_ANNOTATION;
>             stranded.strand = rec.getStrand();
>             thisTemplate = stranded;
>             }
>           else
>             {
>             // translate GFF phases into Biojava phases
>             FramedFeature.Template framed = new FramedFeature.Template();
>             framed.annotation = Annotation.EMPTY_ANNOTATION;
>             framed.strand = rec.getStrand();
>             switch (rec.getPhase())
>               {
>               case 0:
>                 framed.readingFrame = FramedFeature.FRAME_0;
>                 break;
>               case 1:
>                 framed.readingFrame = FramedFeature.FRAME_1;
>                 break;
>               case 2:
>                 framed.readingFrame = FramedFeature.FRAME_2;
>                 break;
>               }
>             thisTemplate = framed;
>             }
>           // set the items common to all three types
>           thisTemplate.location = new RangeLocation( rec.getStart(), rec.getEnd());
>           thisTemplate.typeTerm =  rec.getType();
>           thisTemplate.sourceTerm =  rec.getSource();
>
>           // the annotation was already filled out by the parser
>           thisTemplate.annotation = rec.getAnnotation();
>
>           // annotate the sequence
>           seq.createFeature(thisTemplate);
>           }
>         return seq;
>       }
>       };
>     }
>
> Regards,
> Doug Brown
>
> --
> Doug Brown - Bioinformatics
> Fungal Genomics Laboratory
> Center for Integrated Fungal Research
> North Carolina State University
> Campus Box 7251, Raleigh, NC 27695-7251
> https://www.fungalgenomics.ncsu.edu/~debrown/
> Tel: (919) 513-0394, Fax (919) 513-0024
> e-mail: doug_brown AtSign ncsu.edu
>


