[Bioperl-l] More on bioperl-live/Bio/FeatureIO gff.pm
Steffen Grossmann
grossman at molgen.mpg.de
Wed Nov 24 09:23:18 EST 2004
Here are some comments on some parts of your last emails:
Lincoln Stein wrote:
>A group is absolutely not required to end with a ### directive. It is
>just a hint to the GFF parser that it no longer has to keep track of
>previously-loaded features in case a child appears somewhere toward
>the end of the file.
>
>Lincoln
>
>
>
That's exactly how I understand it.
From Allen:
>i interpret group to mean a set of items, each of which has 0..N
>connections to other members of the set, and 0 connections to members in
>other sets.
>
This is another true definition, but
>on a related note, maybe Bio::FeatureIO::GFF should (optionally) write a
>'###' into the filehandle after each time write_feature() is called.
>
this would not work, because a feature can appear as a subfeature of
more than one higher-level feature. E.g. an exon can appear as part of
two different transcripts. When writing out those two transcripts, you
are not allowed to put a '###' in between.
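To make the shared-exon case concrete, here is a minimal sketch in Python (not BioPerl code). The GFF3 lines and the helper names parse_attributes and dangling_parents are made up for illustration. It reports every Parent reference that cannot be resolved inside its own '###'-delimited group:

```python
# Hypothetical GFF3 lines: exon e1 is shared by transcripts t1 and t2.
T1 = "chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=t1"
T2 = "chr1\t.\tmRNA\t100\t700\t.\t+\t.\tID=t2"
E1 = "chr1\t.\texon\t100\t300\t.\t+\t.\tID=e1;Parent=t1,t2"


def parse_attributes(column9):
    """Split the ninth GFF3 column into a dict of tag -> list of values."""
    attrs = {}
    for pair in column9.split(";"):
        if pair:
            tag, _, value = pair.partition("=")
            attrs[tag] = value.split(",")
    return attrs


def dangling_parents(lines):
    """Return (child_id, parent_id) pairs whose parent is missing from the child's group."""
    dangling = []
    ids, refs = set(), []

    def close_group():
        # A group is closed by '###' or by the end of the input.
        dangling.extend((c, p) for c, p in refs if p not in ids)
        ids.clear()
        refs.clear()

    for line in lines:
        if line.strip() == "###":
            close_group()
            continue
        if line.startswith("#") or not line.strip():
            continue  # other directives, comments, blank lines
        attrs = parse_attributes(line.split("\t")[8])
        fid = attrs.get("ID", [None])[0]
        if fid is not None:
            ids.add(fid)
        for parent in attrs.get("Parent", []):
            refs.append((fid, parent))
    close_group()
    return dangling


ok = dangling_parents([T1, T2, E1])             # both transcripts in one group
broken = dangling_parents([T1, E1, "###", T2])  # '###' written after the first transcript
```

With both transcripts in one group every reference resolves. With a '###' written right after the first transcript, the exon's reference to t2 points across the group boundary.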
Here are some proposals for the concrete implementation:
1) Apart from 'next_feature' we implement two further methods
'next_feature_group' and 'next_seq'. As discussed, a group either ends
with '###' or with the EOF. 'next_seq' of course only makes sense when
there is a '##FASTA' directive at the end of the file.
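The group boundary rule in 1) can be sketched as follows. This is a Python illustration only, not the proposed Bio::FeatureIO code. The function name iter_feature_groups and the sample data are assumptions, and features are kept as raw lines rather than parsed objects:

```python
import io


def iter_feature_groups(handle):
    """Yield lists of feature lines; a group ends at '###', at '##FASTA', or at EOF."""
    group = []
    for raw in handle:
        line = raw.rstrip("\n")
        if line == "###":
            if group:
                yield group
                group = []
        elif line.startswith("##FASTA"):
            break  # the sequence section follows; no more features
        elif line and not line.startswith("#"):
            group.append(line)
    if group:
        yield group  # last group, closed by EOF or by '##FASTA'


# Made-up input: placeholders stand in for real nine-column GFF3 lines.
SAMPLE = "##gff-version 3\nfeature_a\nfeature_b\n###\nfeature_c\n##FASTA\n>chr1\nACGT\n"
groups = list(iter_feature_groups(io.StringIO(SAMPLE)))
```

Here the first group is closed by the '###' line and the second by the '##FASTA' directive.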
2a) To be able to deal with large gff-files we introduce two switches
'track_feature_groups' and 'track_seqs' which default to 0 and can be
set to 1 when creating the Bio::FeatureIO object. Only when those
switches are set, users are allowed to call 'next_feature_group' or
'next_seq', respectively. The reason for this is that group or seq tracking
can be very memory consuming in large gff-files without '###'s (because
you never know what is to come...).
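A rough sketch of the switch idea in 2a), again in Python and with made-up names (GFFReaderSketch is not an existing class). Only the 'track_feature_groups' switch is shown, '##FASTA' handling is left out, and features are plain strings:

```python
class GFFReaderSketch:
    """Toy reader: group access must be requested when the object is created."""

    def __init__(self, lines, track_feature_groups=False):
        self._lines = iter(lines)
        self._track = track_feature_groups  # defaults to off, as proposed

    def next_feature(self):
        """Return the next feature line, or None at the end of the input."""
        for line in self._lines:
            if line and not line.startswith("#"):
                return line
        return None

    def next_feature_group(self):
        """Return the next group as a list, or None at the end of the input."""
        if not self._track:
            raise RuntimeError("create the reader with track_feature_groups=True")
        group = []
        for line in self._lines:
            if line == "###":
                if group:
                    return group
            elif line and not line.startswith("#"):
                group.append(line)  # without '###' lines this list grows until EOF
        return group or None


plain = GFFReaderSketch(["a", "b", "###", "c"])
grouped = GFFReaderSketch(["a", "b", "###", "c"], track_feature_groups=True)
```

The comment in the loop marks the memory concern: in a file without any '###' the whole feature list has to be held until EOF.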
Alternatively:
2b) As soon as one of the three methods to access the data in the
gff-file has been used for the first time after creating the
Bio::FeatureIO object, the other two don't work any longer...
3) On the writing side implement methods like 'write_feature_group' (can
be ended with '###') and 'write_seq'. In 'write_seq' we would have to
internally remember all written sequences until the file is closed (to
be realized with DESTROY).
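The deferred sequence output in 3) could look roughly like this. It is a Python sketch with hypothetical names (GFFWriterSketch and its methods are assumptions), and an explicit close() stands in for Perl's DESTROY:

```python
import io


class GFFWriterSketch:
    """Toy writer: features go out immediately, sequences are held until close()."""

    def __init__(self, handle):
        self._handle = handle
        self._pending_seqs = []  # (id, residues) pairs waiting for the FASTA section
        self._closed = False

    def write_feature_group(self, gff_lines, terminate=True):
        for line in gff_lines:
            self._handle.write(line + "\n")
        if terminate:
            self._handle.write("###\n")  # optional end-of-group hint

    def write_seq(self, seq_id, residues):
        # Sequences must come after all features, so only remember them here.
        self._pending_seqs.append((seq_id, residues))

    def close(self):
        if self._closed:
            return
        self._closed = True
        if self._pending_seqs:
            self._handle.write("##FASTA\n")
            for seq_id, residues in self._pending_seqs:
                self._handle.write(f">{seq_id}\n{residues}\n")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


out = io.StringIO()
with GFFWriterSketch(out) as writer:
    writer.write_seq("chr1", "ACGT")                  # remembered, not yet written
    writer.write_feature_group(["feature_a", "feature_b"])
result = out.getvalue()
```

Although write_seq is called first, the sequence only appears after the '##FASTA' line at the end of the output.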
Tell me your opinions and whether you have other ideas. I will then
start coding...
Steffen
P.S. I see that some of this functionality is already available in
Bio::Tools::GFF, but there seems to be a tendency to move away from it.
I have no preference, I would just like to have full GFF3 functionality
somewhere in bioperl...
--
%---------------------------------------------%
% Steffen Grossmann %
% %
% Max Planck Institute for Molecular Genetics %
% Computational Molecular Biology %
%---------------------------------------------%
% Ihnestrasse 73 %
% 14195 Berlin %
% Germany %
%---------------------------------------------%
% Tel: (++49 +30) 8413-1167 %
% Fax: (++49 +30) 8413-1152 %
%---------------------------------------------%