[Bioperl-l] Added sequence parsing code to Bio::Tools::GFF
Allen Day
allenday at ucla.edu
Thu Jul 15 21:09:11 EDT 2004
So you're handling the '##FASTA' directive? I take it this uses the
_parse_header() method I added for '##sequence-region' lines?
I added a stub in that method for handling all GFF3 '##*' directives.
-allen
On Mon, 12 Jul 2004, Chris Mungall wrote:
>
> I have added sequence parsing code to the GFF parser; note that sequence
> data is only available in GFF3.
>
> It should now be possible to create a Bio::SeqIO::gff3 class, which would
> be a short wrapper to Bio::Tools::GFF. Most people would still want to use
> the Tools parser to parse on a per-feature basis, but the option of
> treating gff3 in a similar fashion to genbank/embl/chadoxml/etc via SeqIO
> would be there.
>
> According to the GFF3 spec, the sequence data can come before or after the
> relevant features; this means that the parser has the potential to be a
> memory hog (but then the existing SeqIO classes already are when given
> whole-chromosome genbank entries).
>
> I've included the new docs from the gff parser below; if people agree with
> this general means of handling sequence data then I'll go ahead and add a
> Bio::SeqIO::gff3 as well.
>
> =head1 GFF3 AND SEQUENCE DATA
>
> [added by cjm 2004/07/09]
>
> GFF3 supports sequence data; see
> http://song.sourceforge.net/gff3-jan04.shtml
>
> There are a number of ways to deal with this:
>
> If you call
>
> $gffio->ignore_sequence_data_toggle(1)
>
> prior to parsing, the sequence data is ignored; this is useful if you
> just want the features. It avoids the memory overhead of building and
> caching sequences.
>
> Alternatively, you can call either
>
> $gffio->get_all_seqs()
>
> or
>
> $gffio->seq_id_by_h()
>
> at the B<end> of parsing to get either a list or a hashref of Bio::Seq
> objects (see the documentation for each of these methods).
>
> Note that these objects will not have the features attached - you have
> to do this yourself, OR call
>
> $gffio->features_attached_to_seqs_toggle(1)
>
> PRIOR to parsing; this will ensure that the Seqs have the features
> attached; i.e. you will then be able to call
>
> $seq->get_SeqFeatures();
>
> and use Bio::SeqIO methods.
>
> Note that auto-attaching the features to seqs will incur a higher
> memory overhead, as the features must be cached until the sequence data
> is found.
>
> =cut
>
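For illustration, the caching trade-off described in the quoted docs can be
sketched in plain Perl, independent of Bio::Tools::GFF. This is only a
minimal sketch under stated assumptions: parse_gff3_text() and its
ignore_sequence option are hypothetical names invented for this example (they
are not part of the BioPerl API), the sample data is made up, and the sketch
assumes the '##FASTA' section comes last, so it does not cover sequence data
that precedes its features.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Tools::GFF: split GFF3 text into
# feature rows (grouped by seqid) and sequences from the '##FASTA' section.
sub parse_gff3_text {
    my ($text, %opt) = @_;
    my (%features, %seqs);
    my ($in_fasta, $current_id) = (0, undef);

    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^##FASTA/) {
            # Assumption: everything after this directive is FASTA, so a
            # caller who only wants features can stop reading here.
            last if $opt{ignore_sequence};
            $in_fasta = 1;
            next;
        }
        if ($in_fasta) {
            if ($line =~ /^>(\S+)/) {
                $current_id = $1;
                $seqs{$current_id} = '';
            }
            elsif (defined $current_id) {
                $seqs{$current_id} .= $line;
            }
            next;
        }
        next if $line =~ /^#/;    # other '##' directives and comments
        my @col = split /\t/, $line;
        next unless @col >= 9;
        # Features have to be cached by seqid, because the matching
        # sequence may only show up much later in the stream.
        push @{ $features{ $col[0] } },
            { type => $col[2], start => $col[3], end => $col[4] };
    }
    return (\%features, \%seqs);
}

# Made-up sample input: two features followed by their sequence.
my $gff3 = join "\n",
    '##gff-version 3',
    '##sequence-region ctg1 1 12',
    join("\t", 'ctg1', 'example', 'gene', 1, 9, '.', '+', '.', 'ID=gene1'),
    join("\t", 'ctg1', 'example', 'exon', 1, 4, '.', '+', '.', 'Parent=gene1'),
    '##FASTA',
    '>ctg1',
    'ACGTAC',
    'GTACGT';

my ($features, $seqs) = parse_gff3_text($gff3);
for my $id (sort keys %$seqs) {
    my $n = scalar @{ $features->{$id} || [] };
    printf "%s: %d bp, %d features\n", $id, length($seqs->{$id}), $n;
}
```

With ignore_sequence => 1 the sketch returns the same features and an empty
sequence hash, which mirrors the "just want the features" case; without it,
both the feature rows and the sequences stay in memory until parsing ends.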