[Biojava-l] Unexpected behavior from GFF3 parser
Doug Brown
debrown at unity.ncsu.edu
Wed May 21 18:18:53 UTC 2008
Hi All,
In the process of putting together a piece of code to load GFF3
features into an extant BioSQL database, I ran across a scenario wherein
the values for notes stored in the database (seqfeature_qualifier_value
table) are of the form
"[= 54.9]"
whereas the values as they occurred in the GFF3 file were of the form
"... ;someName= 54.9; ..."
There are two things that trouble me about this:
1) the brackets are puzzling. It looks as if some piece of code is
calling SimpleRichAnnotation.toString()
2) inclusion of the equals sign and lack of trimming for the value.
As for #2, a quick inspection of the code shows that line 343 of
GFF3Parser.java may be erroneous in that it does not eliminate the equals
sign. That, in turn, would cause the trimming to fail.
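To make the suspicion in #2 concrete, here is a minimal sketch of how an off-by-one substring could produce exactly this value. It is not the actual GFF3Parser code; the class and method names are hypothetical, and the only assumption is that the attribute is cut at the position of the '=' rather than one character past it.

```java
// Hypothetical illustration only; this is NOT the code from GFF3Parser.java.
class AttributeSplitSketch {

    // Assumed faulty variant: the value starts AT the '=' instead of after it.
    // trim() only strips leading/trailing whitespace, so the space that
    // follows the '=' survives as well.
    static String valueKeepingEquals(String attribute) {
        int eq = attribute.indexOf('=');
        return attribute.substring(eq).trim();
    }

    // Intended behaviour: split on the first '=' and trim both halves.
    // Returns a two-element array {key, value}.
    static String[] splitAttribute(String attribute) {
        int eq = attribute.indexOf('=');
        if (eq < 0) {
            // no '=' present: treat the token as a key with an empty value
            return new String[] { attribute.trim(), "" };
        }
        String key = attribute.substring(0, eq).trim();
        // eq + 1 skips the '=' so that trim() can remove the leading space
        String value = attribute.substring(eq + 1).trim();
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        String attribute = "someName= 54.9";
        System.out.println("faulty:   \"" + valueKeepingEquals(attribute) + "\"");
        String[] kv = splitAttribute(attribute);
        System.out.println("intended: " + kv[0] + " -> \"" + kv[1] + "\"");
    }
}
```

With the faulty variant the stored value is "= 54.9", which matches what ends up inside the brackets in the database.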
As for #1, the problem might be due to my high level of ignorance of
proper Biojavax programming style. OR, it may be due to something in
hibernate calling toString on the Set returned by
SimpleRichAnnotation.getNoteSet().
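The bracket theory in #1 is easy to check in isolation. The sketch below uses only a plain java.util.Set, with no Hibernate or SimpleRichAnnotation involved, so it shows where brackets could come from but not which piece of code is actually responsible. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical illustration: how a one-element Set renders via toString().
class SetToStringSketch {

    // What you get if the Set itself is converted to a String.
    static String render(Set<String> values) {
        return values.toString();
    }

    // What you get if the element is taken out of the Set first.
    static String firstValue(Set<String> values) {
        return values.iterator().next();
    }

    public static void main(String[] args) {
        Set<String> values = Collections.singleton("= 54.9");
        System.out.println(render(values));     // prints [= 54.9]
        System.out.println(firstValue(values)); // prints = 54.9
    }
}
```

If the stored string is produced by calling toString() on a collection that wraps the value, the brackets would be explained without any change to the value itself.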
I am attempting to add GFF3 features to bioentries in a BioSQL
database. Being unable to find any code similar to that which lives in
the org.biojava.bio.program.gff package, I started rolling my own. If
there is an easier way, would somebody please set me on the correct path?
Here is a code snippet from the GFF3DocumentHandler implementation:
public void recordLine( GFF3Record record)
{
    . . .
    // use the HQL approach to get a thin sequence
    Query q = session.createQuery(
        "from ThinSequence as s where s.name = :acc");
    q.setString( "acc", record.getSequenceID());
    seq = (Sequence)q.uniqueResult();
    // add the GFF annotation(s) to the sequence
    if ( seq != null)
        this.getAnnotator( (GFF3Record.Impl)record, true).annotate( seq);
    . . .
}
. . .
public SequenceAnnotator getAnnotator( final GFF3Record.Impl rec,
                                       final boolean checkSeqName)
{
    return new SequenceAnnotator()
    {
        public Sequence annotate(Sequence seq) throws BioException,
                                                      ChangeVetoException
        {
            if (!checkSeqName || rec.getSequenceID().equals(seq.getName()))
            {
                Feature.Template thisTemplate = null;
                // Build the three types of annotations:
                // 1) non-stranded, non-phased
                // 2) stranded, non-phased, or 3) stranded, phased.
                if (rec.getStrand() == StrandedFeature.UNKNOWN)
                {
                    RichFeature.Template plain = new RichFeature.Template();
                    plain.annotation = Annotation.EMPTY_ANNOTATION;
                    // without this assignment thisTemplate stays null below
                    thisTemplate = plain;
                }
                else if (rec.getPhase() == GFFTools.NO_FRAME)
                {
                    StrandedFeature.Template stranded =
                        new StrandedFeature.Template();
                    stranded.annotation = Annotation.EMPTY_ANNOTATION;
                    stranded.strand = rec.getStrand();
                    thisTemplate = stranded;
                }
                else
                {
                    // translate GFF phases into Biojava phases
                    FramedFeature.Template framed = new FramedFeature.Template();
                    framed.annotation = Annotation.EMPTY_ANNOTATION;
                    framed.strand = rec.getStrand();
                    switch (rec.getPhase())
                    {
                        case 0:
                            framed.readingFrame = FramedFeature.FRAME_0;
                            break;
                        case 1:
                            framed.readingFrame = FramedFeature.FRAME_1;
                            break;
                        case 2:
                            framed.readingFrame = FramedFeature.FRAME_2;
                            break;
                    }
                    thisTemplate = framed;
                }
                // set the items common to all three types
                thisTemplate.location = new RangeLocation( rec.getStart(),
                                                           rec.getEnd());
                thisTemplate.typeTerm = rec.getType();
                thisTemplate.sourceTerm = rec.getSource();
                // the annotation was already filled out by the parser
                thisTemplate.annotation = rec.getAnnotation();
                // annotate the sequence
                seq.createFeature(thisTemplate);
            }
            return seq;
        }
    };
}
Regards,
Doug Brown
--
Doug Brown - Bioinformatics
Fungal Genomics Laboratory
Center for Integrated Fungal Research
North Carolina State University
Campus Box 7251, Raleigh, NC 27695-7251
https://www.fungalgenomics.ncsu.edu/~debrown/
Tel: (919) 513-0394, Fax (919) 513-0024
e-mail: doug_brown AtSign ncsu.edu