[Biojava-l] Unexpected behavior from GFF3 parser
Richard Holland
dicknetherlands at gmail.com
Wed May 21 18:49:57 UTC 2008
Hello. The GFF3 parser is an 'old' pre-BioJavaX one which basically
stuffs everything from the file into string annotations. It's fine as
long as you don't attempt to round-trip it to the database, but then
things get interesting. Still, you're right about the equals sign - it's a
bug that it isn't trimming it.
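To illustrate the equals-sign bug, here is a minimal, stand-alone sketch of splitting a GFF3 attribute column (tag=value pairs separated by ';', multiple values separated by ','). It is not the BioJava GFF3Parser code. The class and method names (Gff3AttributeSketch, parseAttributes) are hypothetical, and GFF3 percent-decoding is skipped for brevity. The point is the off-by-one: taking the value from the '=' onward leaves '=' as its first character, so trim() has no leading whitespace to remove.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Gff3AttributeSketch {

    // Hypothetical helper: splits a GFF3 column 9 string such as
    // "ID=gene1;someName= 54.9" into tag -> list of trimmed values.
    // Percent-decoding of escaped characters is deliberately omitted.
    static Map<String, List<String>> parseAttributes(String column9) {
        Map<String, List<String>> attrs = new LinkedHashMap<>();
        for (String pair : column9.split(";")) {
            if (pair.trim().isEmpty()) {
                continue; // tolerate a trailing or doubled ';'
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // no tag=value separator: skip the malformed pair
            }
            String tag = pair.substring(0, eq).trim();
            List<String> values = new ArrayList<>();
            // eq + 1 skips the '=' itself before the value is split and trimmed
            for (String v : pair.substring(eq + 1).split(",")) {
                values.add(v.trim());
            }
            attrs.put(tag, values);
        }
        return attrs;
    }

    public static void main(String[] args) {
        String pair = "someName= 54.9";
        int eq = pair.indexOf('=');
        // Off by one: the value keeps its leading '=', so trim() has no
        // leading whitespace to strip.
        System.out.println("\"" + pair.substring(eq).trim() + "\"");     // "= 54.9"
        // Skipping the '=' first lets trim() do its job.
        System.out.println("\"" + pair.substring(eq + 1).trim() + "\""); // "54.9"
        System.out.println(parseAttributes("ID=gene1;someName= 54.9;Note=a,b"));
        // {ID=[gene1], someName=[54.9], Note=[a, b]}
    }
}
```

The first two lines of output show the difference between the off-by-one substring and the corrected one.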
But the string is getting stored verbatim because the BioJavaX
persistence layer always stores all 'old' annotations as the toString() of
the actual objects (it's a best-fit solution for translating 'old'
data into BioJavaX data, as in BioJavaX all annotations are strings).
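The bracketed value in the question ("[= 54.9]") matches what a java.util collection's toString() produces. A minimal sketch, assuming (not confirmed here) that the old parser keeps each attribute's values in a List and that the persistence layer calls toString() on whatever object it finds; storeAsString is a hypothetical stand-in for that layer.

```java
import java.util.Arrays;
import java.util.List;

public class ToStringStorageSketch {

    // Hypothetical stand-in for a persistence layer that stores an
    // arbitrary annotation value by calling toString() on it.
    static String storeAsString(Object annotationValue) {
        return String.valueOf(annotationValue);
    }

    public static void main(String[] args) {
        // Assumption: the value list still carries the untrimmed "= " prefix.
        List<String> values = Arrays.asList("= 54.9");
        System.out.println(storeAsString(values)); // [= 54.9]

        // Storing the elements one by one avoids the collection brackets.
        for (String v : values) {
            System.out.println(storeAsString(v)); // = 54.9
        }
    }
}
```

Both symptoms in the report (brackets and the leftover "= ") fall out of this one toString() call on an untrimmed list.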
The problem really lies in the GFF3 parser not having been
updated/replaced to work with the new BioJavaX-style parsing, where
things are parsed more deeply and filed into more appropriate
locations in the object model. If there's a volunteer to convert it,
we'd appreciate hearing from them...
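For notes, filing values "into more appropriate locations" could mean one string per value in BioSQL's seqfeature_qualifier_value table (seqfeature_id, term_id, rank, value) rather than one toString() of the whole collection. Below is a minimal stand-alone sketch of that flattening. QualifierRow and flatten are hypothetical names, the term is kept as a plain name instead of a term_id, and numbering ranks from 1 is an arbitrary choice for the sketch, not necessarily what BioJavaX does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QualifierFlattenSketch {

    // One row of a seqfeature_qualifier_value-style table: a term name,
    // a rank distinguishing repeated values, and a plain string value.
    static final class QualifierRow {
        final String term;
        final int rank;
        final String value;

        QualifierRow(String term, int rank, String value) {
            this.term = term;
            this.rank = rank;
            this.value = value;
        }

        @Override
        public String toString() {
            return term + "#" + rank + "=" + value;
        }
    }

    // Files each value of each attribute as its own row instead of
    // storing the toString() of the whole value collection.
    static List<QualifierRow> flatten(Map<String, List<String>> attributes) {
        List<QualifierRow> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : attributes.entrySet()) {
            int rank = 1; // hypothetical: ranks numbered from 1 per term
            for (String v : e.getValue()) {
                rows.add(new QualifierRow(e.getKey(), rank++, v));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<String>> attrs = new LinkedHashMap<>();
        attrs.put("someName", Arrays.asList("54.9"));
        attrs.put("Note", Arrays.asList("first note", "second note"));
        for (QualifierRow row : flatten(attrs)) {
            System.out.println(row);
        }
        // someName#1=54.9
        // Note#1=first note
        // Note#2=second note
    }
}
```

Keeping one row per value means a multi-valued GFF3 attribute round-trips without any bracketed collection text reaching the database.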
cheers,
Richard
2008/5/21 Doug Brown <debrown at unity.ncsu.edu>:
> Hi All,
>
> In the process of putting together a piece of code to load GFF3 features
> into an extant BioSQL database, I ran across a scenario wherein the values
> for notes stored in the database (seqfeature_qualifier_value table) are of
> the form
> "[= 54.9]"
> and the value as it occurred in the GFF3 file were of the form
> "... ;someName= 54.9; ..."
>
> There are two things that trouble me about this:
> 1) the brackets are puzzling. It looks as if some piece of code is calling
> SimpleRichAnnotation.toString()
> 2) inclusion of the equals sign and lack of trimming for the value.
>
> As for #2, a quick inspection of the code shows that line 343 of GFF3Parser.java
> may be erroneous in that it does not eliminate the equals sign. That, in turn,
> would cause the trimming to fail.
>
> As for #1, the problem might be due to my high level of ignorance of proper
> Biojavax programming style. OR, it may be due to something in Hibernate
> calling toString on the Set returned by SimpleRichAnnotation.getNoteSet().
>
> I am attempting to add GFF3 features to bioentries in a BioSQL database.
> Being unable to find any code similar to that which lives in the
> org.biojava.bio.program.gff package, I started rolling my own. If there is
> an easier way, would somebody please set me on the correct path?
>
> Here is a code snippet from the GFF3DocumentHandler implementation:
>
> public void recordLine(GFF3Record record)
> {
>
>     . . .
>
>     // use the HQL approach to get a thin sequence
>     Query q = session.createQuery(
>             "from ThinSequence as s where s.name = :acc");
>     q.setString("acc", record.getSequenceID());
>     seq = (Sequence) q.uniqueResult();
>
>     // add the GFF annotation(s) to the sequence
>     if (seq != null)
>         this.getAnnotator((GFF3Record.Impl) record, true).annotate(seq);
>
>     . . .
> }
>
> . . .
>
> public SequenceAnnotator getAnnotator(final GFF3Record.Impl rec,
>                                       final boolean checkSeqName)
> {
>     return new SequenceAnnotator()
>     {
>         public Sequence annotate(Sequence seq) throws BioException,
>                 ChangeVetoException
>         {
>             if (!checkSeqName || rec.getSequenceID().equals(seq.getName()))
>             {
>                 Feature.Template thisTemplate = null;
>
>                 // Build the three types of annotations: 1) non-stranded,
>                 // non-phased, 2) stranded, non-phased, or 3) stranded, phased.
>                 if (rec.getStrand() == StrandedFeature.UNKNOWN)
>                 {
>                     RichFeature.Template plain = new RichFeature.Template();
>                     plain.annotation = Annotation.EMPTY_ANNOTATION;
>                     // without this assignment thisTemplate stays null and
>                     // the shared code below throws a NullPointerException
>                     thisTemplate = plain;
>                 }
>                 else if (rec.getPhase() == GFFTools.NO_FRAME)
>                 {
>                     StrandedFeature.Template stranded =
>                             new StrandedFeature.Template();
>                     stranded.annotation = Annotation.EMPTY_ANNOTATION;
>                     stranded.strand = rec.getStrand();
>                     thisTemplate = stranded;
>                 }
>                 else
>                 {
>                     // translate GFF phases into BioJava reading frames
>                     FramedFeature.Template framed = new FramedFeature.Template();
>                     framed.annotation = Annotation.EMPTY_ANNOTATION;
>                     framed.strand = rec.getStrand();
>                     switch (rec.getPhase())
>                     {
>                         case 0:
>                             framed.readingFrame = FramedFeature.FRAME_0;
>                             break;
>                         case 1:
>                             framed.readingFrame = FramedFeature.FRAME_1;
>                             break;
>                         case 2:
>                             framed.readingFrame = FramedFeature.FRAME_2;
>                             break;
>                         default:
>                             // GFF3 only allows phases 0, 1 and 2
>                             throw new BioException("Invalid GFF3 phase: "
>                                     + rec.getPhase());
>                     }
>                     thisTemplate = framed;
>                 }
>
>                 // set the items common to all three types
>                 thisTemplate.location = new RangeLocation(rec.getStart(),
>                                                           rec.getEnd());
>                 thisTemplate.typeTerm = rec.getType();
>                 thisTemplate.sourceTerm = rec.getSource();
>
>                 // the annotation was already filled out by the parser
>                 thisTemplate.annotation = rec.getAnnotation();
>
>                 // annotate the sequence
>                 seq.createFeature(thisTemplate);
>             }
>             return seq;
>         }
>     };
> }
>
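The if/else-if/else in getAnnotator picks one of three template kinds from a record's strand and phase. Below is a stand-alone sketch of the same decision working directly on the raw GFF3 columns 7 (strand: +, -, . or ?) and 8 (phase: ., 0, 1 or 2), with explicit validation of both fields. TemplateKindSketch and classify are hypothetical names, and treating both '.' and '?' as StrandedFeature.UNKNOWN is an assumption, not checked against the BioJava parser.

```java
public class TemplateKindSketch {

    // The three feature-template kinds built by the snippet above.
    enum TemplateKind { PLAIN, STRANDED, FRAMED }

    // Classifies a GFF3 record from its raw strand (column 7) and
    // phase (column 8) fields, mirroring the if/else-if/else order.
    static TemplateKind classify(String strand, String phase) {
        // Assumption: '.' (unstranded) and '?' (strand unknown) both
        // correspond to StrandedFeature.UNKNOWN; phase is then ignored.
        if (strand.equals(".") || strand.equals("?")) {
            return TemplateKind.PLAIN;
        }
        if (!strand.equals("+") && !strand.equals("-")) {
            throw new IllegalArgumentException("bad strand: " + strand);
        }
        if (phase.equals(".")) {
            return TemplateKind.STRANDED;
        }
        // GFF3 allows only 0, 1 or 2; anything else is rejected explicitly.
        if (phase.equals("0") || phase.equals("1") || phase.equals("2")) {
            return TemplateKind.FRAMED;
        }
        throw new IllegalArgumentException("bad phase: " + phase);
    }

    public static void main(String[] args) {
        System.out.println(classify(".", "."));  // PLAIN
        System.out.println(classify("+", "."));  // STRANDED
        System.out.println(classify("-", "2"));  // FRAMED
    }
}
```

Failing fast on an unexpected phase keeps a malformed record from producing a framed feature with no reading frame set.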
> Regards,
> Doug Brown
>
> --
> Doug Brown - Bioinformatics
> Fungal Genomics Laboratory
> Center for Integrated Fungal Research
> North Carolina State University
> Campus Box 7251, Raleigh, NC 27695-7251
> https://www.fungalgenomics.ncsu.edu/~debrown/
> Tel: (919) 513-0394, Fax (919) 513-0024
> e-mail: doug_brown AtSign ncsu.edu
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>