[Bioperl-l] More on bioperl-live/Bio/FeatureIO gff.pm

Steffen Grossmann grossman at molgen.mpg.de
Wed Nov 24 09:23:18 EST 2004


Here are some comments on parts of your last emails:

Lincoln Stein wrote:

>A group is absolutely not required to end with a ### directive.  It is 
>just a hint to the GFF parser that it no longer has to keep track of 
>previously-loaded features in case a child appears somewhere toward 
>the end of the file.
>
>Lincoln
>
That's exactly how I understand it.
From Allen:

>i interpret group to mean a set of items, each of which has 0..N
>conections to other members of the set, and 0 connections to members in
>other sets.
>
This is another valid definition, but

>on a related note, maybe Bio::FeatureIO::GFF should (optionally) write a
>'###' into the filehandle after each time write_feature() is called.
>
this would not be correct, because a feature can appear as a
subfeature of more than one higher-level feature. E.g. an exon can
be part of two different transcripts. When writing out those two
transcripts, you aren't allowed to put a '###' in between.
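
To make the shared-exon case concrete, here is a minimal sketch (in
Python rather than Perl, purely for illustration). The feature IDs and
the helper name connected_groups are made up; it only assumes that each
feature carries an ID and a list of Parent IDs. Features connected
through any Parent link end up in the same group, and a '###' would
only be safe between such groups:

```python
from collections import defaultdict

def connected_groups(features):
    """Group features that are linked through Parent relations.

    features: list of (feature_id, [parent_ids]) tuples (hypothetical layout).
    Returns a sorted list of sorted ID lists, one list per group.
    """
    root = {}

    def find(x):
        # Union-find lookup with path halving.
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            root[rb] = ra

    for fid, parents in features:
        find(fid)
        for p in parents:
            union(fid, p)

    groups = defaultdict(list)
    for fid, _ in features:
        groups[find(fid)].append(fid)
    return sorted(sorted(g) for g in groups.values())

# exon1 belongs to both mRNA1 and mRNA2, so all three form one group.
features = [
    ("mRNA1", []),
    ("mRNA2", []),
    ("exon1", ["mRNA1", "mRNA2"]),
    ("mRNA3", []),
    ("exon2", ["mRNA3"]),
]
groups = connected_groups(features)
```

Here mRNA1, mRNA2 and exon1 form a single group, so a '###' may follow
that group as a whole but may not separate the two transcripts.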

Here are some proposals for the concrete implementation:

1) Apart from 'next_feature' we implement two further methods 
'next_feature_group' and 'next_seq'. As discussed, a group either ends 
with '###' or with the EOF. 'next_seq' of course only makes sense when 
there is a '##FASTA' directive at the end of the file.
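
A rough sketch of the reading side of proposal 1, again in Python only
for illustration: the function name iter_feature_groups is
hypothetical, and treating '##FASTA' as the end of the feature section
is an assumption of the sketch. It collects raw feature lines until it
sees '###' or runs out of input:

```python
import io

def iter_feature_groups(handle):
    """Yield lists of raw feature lines; a group ends at '###' or EOF."""
    group = []
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            # Assumption: everything after ##FASTA is sequence data.
            break
        if line.startswith("###"):
            if group:
                yield group
                group = []
            continue
        if not line or line.startswith("#"):
            # Skip blank lines, comments and other directives.
            continue
        group.append(line)
    if group:
        # Last group, closed by EOF or by the ##FASTA section.
        yield group

sample = (
    "##gff-version 3\n"
    "chr1\t.\tmRNA\t1\t100\t.\t+\t.\tID=m1\n"
    "chr1\t.\texon\t1\t50\t.\t+\t.\tID=e1;Parent=m1\n"
    "###\n"
    "chr1\t.\tmRNA\t200\t300\t.\t+\t.\tID=m2\n"
    "##FASTA\n"
    ">chr1\n"
    "ACGT\n"
)
feature_groups = list(iter_feature_groups(io.StringIO(sample)))
```

Without any '###' in the file, the whole feature section comes back as
one group, which is exactly the memory concern for large files.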

2a) To be able to deal with large gff-files we introduce two switches
'track_feature_groups' and 'track_seqs' which default to 0 and can be
set to 1 when creating the Bio::FeatureIO object. Only when those
switches are set are users allowed to call 'next_feature_group' or
'next_seq', respectively. The reason for this is that group or seq
tracking can consume a lot of memory in large gff-files without '###'s
(because you never know what is to come...).

Alternatively:

2b) As soon as one of the three methods to access the data in the 
gff-file has been used for the first time after creating the 
Bio::FeatureIO object, the other two don't work any longer...

3) On the writing side implement methods like 'write_feature_group' (can 
be ended with '###') and 'write_seq'. In 'write_seq' we would have to 
internally remember all written sequences until the file is closed (to 
be realized with DESTROY).
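
The deferred writing in proposal 3 could look roughly like the sketch
below. It is Python for illustration only; the class GFFWriterSketch
and its method signatures are made up and are not an existing bioperl
API. An explicit close() (also called when leaving the 'with' block)
stands in for what DESTROY would do in Perl:

```python
import io

class GFFWriterSketch:
    """Hypothetical writer: features go out immediately, sequences at close."""

    def __init__(self, handle):
        self._handle = handle
        self._seqs = []       # sequences remembered until close()
        self._closed = False

    def write_feature_group(self, lines, end_group=True):
        # Write all lines of one group, optionally terminated by '###'.
        for line in lines:
            self._handle.write(line + "\n")
        if end_group:
            self._handle.write("###\n")

    def write_seq(self, seq_id, sequence):
        # Only remember the sequence; the FASTA section must come last.
        self._seqs.append((seq_id, sequence))

    def close(self):
        if self._closed:
            return
        self._closed = True
        if self._seqs:
            self._handle.write("##FASTA\n")
            for seq_id, sequence in self._seqs:
                self._handle.write(f">{seq_id}\n{sequence}\n")

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False

out = io.StringIO()
with GFFWriterSketch(out) as writer:
    writer.write_seq("chr1", "ACGT")
    writer.write_feature_group(["chr1\t.\tgene\t1\t4\t.\t+\t.\tID=g1"])
result = out.getvalue()
```

Even though write_seq is called first here, the sequence only appears
after the features, in the '##FASTA' section at the end of the output.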

Tell me your opinions and whether you have other ideas. I will then
start coding...

Steffen

P.S. I see that some of this functionality is already available in
Bio::Tools::GFF, but there seems to be a tendency to move away from it.
I have no preference, I would just like to have full GFF3 functionality
somewhere in bioperl...

-- 
%---------------------------------------------%
%            Steffen Grossmann                %
%                                             %
% Max Planck Institute for Molecular Genetics %
%      Computational Molecular Biology        %
%---------------------------------------------%
%              Ihnestrasse 73                 %
%               14195 Berlin                  %
%                 Germany                     %
%---------------------------------------------%
%         Tel: (++49 +30) 8413-1167           %
%         Fax: (++49 +30) 8413-1152           %
%---------------------------------------------%



