[Bioperl-l] writing gff - what's the minimum set of objects?

Wed Dec 1 19:31:21 EST 2004

Greetings all,

I'm trying to determine the **minimum** set of objects
required to write a line of GFF. I'm doing this so that I can expand the
types of annotations that can be generated from TIGR's Arabidopsis
annotations.

Parsing the XML is not a problem. What _is_ a problem is writing the
gff.

So far, I have found that this[1], which writes this[2], is the minimum
set of operations required to write, say, part of the reference
sequence.

Am I missing something or is this **really** complicated? Is there a
simpler way to do this?

As part of this effort, I've done a bit of work in gff.pm and
Annotation.pm. I hope that this doesn't collide with similar work going
on.

In addition, can Hilmar or someone familiar with it tell me why the 'type'
annotation must be a Bio::Annotation::OntologyTerm? Why can't it be an
arbitrary string?

An additional question:
in gff.pm, can I write an algorithm such that if a certain annotation
(say, GROUP) occurs in an Annotated feature, it is used in the ninth
field of the GFF rather than anything else?

The reason is that people who want to control which grouping is used
need control over that field of the GFF.
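
To make the idea concrete, here is a rough sketch of the selection rule in
plain Perl. It is not the gff.pm code and does not use any BioPerl classes:
ninth_field() and gff_line() are hypothetical helpers, and the annotations
are assumed to be a simple hash of tag => array-ref of values. If the
preferred tag (GROUP by default) is present, its value alone fills the ninth
column; otherwise all annotations are written as tag=value pairs.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of gff.pm: build the ninth GFF column from
# a hash of annotation tag => array-ref of values.
sub ninth_field {
    my ($annotations, $preferred_tag) = @_;
    $preferred_tag = 'GROUP' unless defined $preferred_tag;

    # If the preferred annotation is present and non-empty, it alone
    # fills the column.
    if (my $values = $annotations->{$preferred_tag}) {
        return join ',', @$values if @$values;
    }

    # Otherwise fall back to tag=value pairs, sorted for a stable order.
    return join ';',
        map  { $_ . '=' . join(',', @{ $annotations->{$_} }) }
        grep { @{ $annotations->{$_} } }
        sort keys %$annotations;
}

# Hypothetical helper: join the eight fixed columns and the ninth with tabs.
sub gff_line {
    my ($fixed_columns, $annotations) = @_;
    return join "\t", @$fixed_columns, ninth_field($annotations);
}

print gff_line(
    [ 'F8G22', 'example', 'Chromosome', 1, 42801, '.', '+', '.' ],
    { Name => ['F8G22'] },
), "\n";
```

With only a Name annotation the sketch falls back to Name=F8G22, as in [2];
adding a GROUP annotation would replace the whole column with its value.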

Thanks for the input,

Chad Matsalla

[1]
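use Bio::SeqFeature::Annotated;
use Bio::Annotation::OntologyTerm;
use Bio::Annotation::SimpleValue;

# Build one annotated feature for the named sequence from a coordinate set.
# $type and $annotation_string are accepted but not used yet.
sub get_annotated_feature {
     my ($name,$coordset,$type,$annotation_string) = @_;
     my $feature = Bio::SeqFeature::Annotated->new(
          -seq_id   => $name,
          -source   => "example",
          -primary  => $name,
          -type     => Bio::Annotation::OntologyTerm->new(-name => 'Chromosome'),
          -start    => $coordset->getEND5()->getData(),
          -end      => $coordset->getEND3()->getData(),
          -strand   => '+'
     );
     # The Name annotation ends up in the ninth GFF field.
     $feature->add_Annotation('Name',
          Bio::Annotation::SimpleValue->new(-value => $name));
     return $feature;
}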

[2]
F8G22	example	Chromosome	1	42801	.	+	.	Name=F8G22