[Bioperl-l] reading and writing GFF3

Robert Buels rmb32 at cornell.edu
Fri Jun 16 18:36:22 UTC 2006


Thanks for the reply, Scott.  It's good that BSF::Annotated features 
constrain the type to a term in the SO.  I came to the "BTG is only 
GFF3-/like/" conclusion myself as I poked around in the two modules in 
question, so I'd much rather use BFIO::gff.  So I guess the question now 
is (and this will probably be a pretty common use case): how does one 
take an "old" Bio::SeqFeature::Generic or similar object and make it 
into a Bio::SeqFeature::Annotated?


Rob

Scott Cain wrote:
> Hi Rob,
>
> I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
> but that is actually a good thing.  The tighter constraints result in a
> better, more consistent file format.
>
> The reason only BSF::Annotated features are writable is that there needs
> to be tight control on the 'type' of the feature, to ensure that the
> type is part of the Sequence Ontology.  It also makes it much easier to
> properly write out the attributes in the ninth column, particularly the
> ones that are 'reserved', like Parent, Dbxref, and Ontology_term.
>
> BTG is still usable, but the GFF3 it puts out is actually more
> 'GFF3-like'; that is, it looks like GFF3, but because there are no
> constraints on the type and the terms that are used in the ninth column,
> you have to be very careful using it to produce GFF3, by making sure
> that your feature objects conform to the standard before BTG tries to
> write them out.  (Of course, one way to do that would be to convert your
> feature objects to BSF::Annotated objects, but then you could use
> BFIO::gff :-)
>
> [Long pause while Scott goes and monkeys with Bio::Tools::GFF]
>
> OK, I just committed a fix to Bio::Tools::GFF, which now produces valid
> GFF3 for your sample data.  Conveniently, since 'nucleotide_motif' is a
> SO term, this is completely valid.  (I even fixed the escaping of the
> stray '=' in 'hind_R=2046'.)  The output I get is this:
>
> ##gff-version 3
> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
>
> Scott
>
>
>
> On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote:
>   
>> There is stuff in bioperl for reading and writing GFF3.  There's 
>> Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which 
>> is the 'best' one to use?
>>
>> Neither of these is working very well for me.
>>
>> My proximate use case is reading in a RepeatMasker report with 
>> Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then 
>> writing those out to a GFF3 file.
>>
>> Bio::Tools::GFF will take these things and write out something that 
>> closely resembles GFF3, but with Target attributes that don't seem to 
>> comply with Lincoln's GFF3 spec, since their coordinates are join()ed with 
>> commas instead of spaces.  I'm attaching a little script that 
>> illustrates this.
>>
>> Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the 
>> features contained in them, throwing 'only Bio::SeqFeature::Annotated 
>> objects are writeable'.  This seems a bit silly, since one of the whole 
>> points of Bioperl is using polymorphism to make it easy to connect 
>> things together.  I've attached a little script to illustrate this one too.
>>
>> So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF 
>> deprecated?  Why does Bio::FeatureIO::gff only accept 
>> Bio::SeqFeature::Annotated objects?
>>
>> Thanks in advance.
>>
>> Rob
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
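
For anyone following the thread, the two Target problems discussed above (coordinates joined with commas instead of spaces, and the unescaped '=' in 'hind_R=2046') can be sketched in plain Perl.  This is only an illustration, not the code that was committed to Bio::Tools::GFF.  The helper names gff3_escape and gff3_target are hypothetical, and the set of escaped characters is an assumption based on the reserved characters the GFF3 spec lists for the ninth column.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode characters that are reserved in
# GFF3 column 9 (control characters, '%', ';', '=', '&', ',').
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([\x00-\x1F\x7F%;=&,])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Hypothetical helper: build a Target attribute in the form
# "target_id start end [strand]", separated by spaces, not commas.
sub gff3_target {
    my ($id, $start, $end, $strand) = @_;
    my $target_id = gff3_escape($id);
    $target_id =~ s/ /%20/g;    # a space inside the ID would split the field
    my @parts = ($target_id, $start, $end);
    push @parts, $strand if defined $strand;
    return 'Target=' . join(' ', @parts);
}

print gff3_target('hind_R=2046', 59, 120), "\n";
# prints: Target=hind_R%3D2046 59 120
```

The printed value matches the last line of Scott's sample output.  Keeping the escaping in one small function means the same rule can be applied to any other attribute value before it is written out.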

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu




