[Bioperl-l] Added sequence parsing code to Bio::Tools::GFF

Chris Mungall cjm at fruitfly.org
Thu Jul 15 20:09:57 EDT 2004


I'm filling in the stubs, yep - see the checked in code

although it turns out the ##FASTA part is in the footer, not the header
(see the spec*). I've still filled in your stub to allow for featureless
GFF3 files that contain sequence, which is perfectly valid.
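
To illustrate the layout described above, here is a minimal sketch in plain
Perl. It is not the Bio::Tools::GFF code; split_gff3 is a hypothetical helper,
the sample data is made up, and the sketch assumes the sequence section is
always introduced by an explicit ##FASTA line. It shows the two cases: features
followed by a ##FASTA footer, and a file that carries only sequence.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split GFF3 text into feature
# rows and a hash of sequences keyed by ID. Every line after '##FASTA' is
# treated as part of a FASTA record.
sub split_gff3 {
    my ($text) = @_;
    my (@features, %seqs);
    my ($in_fasta, $id) = (0, undef);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;              # skip blank lines
        if (!$in_fasta) {
            if ($line =~ /^##FASTA\b/) { $in_fasta = 1; next }
            next if $line =~ /^#/;             # other directives and comments
            push @features, [ split /\t/, $line ];
        }
        elsif ($line =~ /^>(\S+)/) {           # FASTA header starts a new record
            $id = $1;
            $seqs{$id} = '';
        }
        elsif (defined $id) {                  # sequence line for current record
            $line =~ s/\s+//g;
            $seqs{$id} .= $line;
        }
    }
    return (\@features, \%seqs);
}

# Made-up example: one feature, then the sequence in the footer.
my $gff3 = join "\n",
    '##gff-version 3',
    join("\t", 'ctg1', '.', 'gene', 1, 8, '.', '+', '.', 'ID=g1'),
    '##FASTA',
    '>ctg1',
    'ACGT',
    'ACGT';

my ($features, $seqs) = split_gff3($gff3);
printf "%d feature(s), ctg1 = %s\n", scalar @$features, $seqs->{ctg1};

# A file with no feature lines at all is still accepted.
my (undef, $only_seqs) = split_gff3("##gff-version 3\n##FASTA\n>ctg2\nGATTACA\n");
print "ctg2 = $only_seqs->{ctg2}\n";
```

Because the sequence arrives last, anything that needs both a feature and its
sequence has to hold the features in memory until the footer is read.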

On Thu, 15 Jul 2004, Allen Day wrote:

> you're handling the '##FASTA' directive?  this is using the
> _parse_header() method I added for '##sequence-region' lines, I take it?
> i added a stub in this method for handling all GFF3 '##*' directives.
>
> -allen
>
>
> On Mon, 12 Jul 2004, Chris Mungall wrote:
>
> >
> > I have added sequence parsing code to the GFF parser; note that sequence
> > data is only available in GFF3.
> >
> > It should now be possible to create a Bio::SeqIO::gff3 class, which would
> > be a short wrapper to Bio::Tools::GFF. Most people would still want to use
> > the Tools parser to parse on a per-feature basis, but the option of
> > treating gff3 in a similar fashion to genbank/embl/chadoxml/etc via SeqIO
> > would be there.
> >
> > According to the GFF3 spec the sequence data can come after or before the
> > relevant features; this means that the parser has the potential to be a
> > memory hog (but then the existing SeqIO classes already are with genbank
> > whole-chromosome entries).
> >
> > I've included the new docs from the gff parser below; if people agree with
> > this general means of handling sequence data then I'll go ahead and add a
> > Bio::SeqIO::gff3 as well.
> >
> > =head1 GFF3 AND SEQUENCE DATA
> >
> > [added by cjm 2004/07/09]
> >
> > GFF3 supports sequence data; see
> > http://song.sourceforge.net/gff3-jan04.shtml
> >
> > There are a number of ways to deal with this -
> >
> > If you call
> >
> >   $gffio->ignore_sequence_data_toggle(1)
> >
> > prior to parsing, the sequence data is ignored; this is useful if you
> > just want the features. It avoids the memory overhead of building and
> > caching sequences.
> >
> > Alternatively, you can call either
> >
> >   $gffio->get_all_seqs()
> >
> > Or
> >
> >   $gffio->seq_id_by_h()
> >
> > At the B<end> of parsing to get either a list or hashref of Bio::Seq
> > objects (see the documentation for each of these methods)
> >
> > Note that these objects will not have the features attached - you have
> > to do this yourself, OR call
> >
> >   $gffio->features_attached_to_seqs_toggle(1)
> >
> > PRIOR to parsing; this will ensure that the Seqs have the features
> > attached; i.e. you will then be able to call
> >
> >   $seq->get_SeqFeatures();
> >
> > and use Bio::SeqIO methods.
> >
> > Note that auto-attaching the features to seqs will incur a higher
> > memory overhead as the features must be cached until the sequence data
> > is found
> >
> > =cut
> >
>


