[Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Dec 9 15:12:56 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 10:12 EST -------
(In reply to comment #12)
> 
> Bio.Graphics.GenomeDiagram.Utilities
> ====================================
> This is a collection of utilities for getting information useful for graph
> values.  From the docstring,
> 
>     o apply_to_window (sequence, window_size, function, step=None)  Apply a
>                         passed function to fragments of the passed sequence of
>                         size window_size, with each window separated by the
>                         passed step.

This windowing function is rather specific to GenomeDiagram by the nature of
how it returns the values and their positions.  The handling of the end of the
sequence is also non-general.  Suppose we put apply_to_window somewhere under
Bio.Graphics.GenomeDiagram.  It can then be used with any sequence analysis
function which takes a sequence/string and returns a float, returning the
scores and window positions as expected by GenomeDiagram for drawing graphical
tracks.
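
To make the return format concrete, here is a rough sketch of this kind of
windowing.  It is not the code from Utilities.py: the name windowed_scores, the
half-window default step, the use of the window midpoint as the position, and
the extra final window flush with the end of the sequence are all assumptions
made for illustration.

```python
def windowed_scores(sequence, window_size, function, step=None):
    """Hypothetical helper: score successive windows of a sequence.

    Returns a list of (midpoint, value) pairs, one per window.
    """
    if window_size <= 0 or window_size > len(sequence):
        raise ValueError("window_size must be between 1 and len(sequence)")
    if step is None:
        # Assumed default: half-overlapping windows.
        step = max(1, window_size // 2)
    if step <= 0:
        raise ValueError("step must be positive")

    last_start = len(sequence) - window_size
    starts = list(range(0, last_start + 1, step))
    # Non-general end handling: if the regular steps stop short of the end,
    # add one final window that finishes exactly at the end of the sequence.
    if starts[-1] != last_start:
        starts.append(last_start)

    results = []
    for start in starts:
        fragment = sequence[start:start + window_size]
        midpoint = start + window_size // 2
        results.append((midpoint, function(fragment)))
    return results


def gc_fraction(fragment):
    """Example scoring function: fraction of G and C in a string."""
    fragment = str(fragment).upper()
    return (fragment.count("G") + fragment.count("C")) / len(fragment)


scores = windowed_scores("GGGGAAAACC", 4, gc_fraction, step=4)
```

Any function taking a sequence/string and returning a float can be passed in
place of gc_fraction.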

That would leave the following general non-windowed functions from
Utilities.py,

calc_gc_content - returns a float in the range 0 to 1.
calc_at_content - returns a float in the range 0 to 1.
calc_gc_skew - returns a float, gives zero if there is no GC content.
calc_at_skew - returns a float, gives zero if there is no AT content.
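
For reference, the behaviour described above could be sketched as follows.
This is my own minimal version and not the Utilities.py code.  It assumes the
usual skew definitions, (G - C)/(G + C) and (A - T)/(A + T), counts only
unambiguous letters (so S and W are ignored), and returning 0.0 for an empty
sequence is an assumption of the sketch.

```python
def calc_gc_content(sequence):
    """Fraction of G and C in the sequence, in the range 0 to 1."""
    seq = str(sequence).upper()
    if not seq:
        return 0.0  # assumption: empty input scores zero
    return (seq.count("G") + seq.count("C")) / len(seq)


def calc_at_content(sequence):
    """Fraction of A and T in the sequence, in the range 0 to 1."""
    seq = str(sequence).upper()
    if not seq:
        return 0.0  # assumption: empty input scores zero
    return (seq.count("A") + seq.count("T")) / len(seq)


def calc_gc_skew(sequence):
    """(G - C) / (G + C), or zero if there is no GC content."""
    seq = str(sequence).upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def calc_at_skew(sequence):
    """(A - T) / (A + T), or zero if there is no AT content."""
    seq = str(sequence).upper()
    a, t = seq.count("A"), seq.count("T")
    if a + t == 0:
        return 0.0
    return (a - t) / (a + t)
```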

Bio.SeqUtils already has several functions including:

GC - returns a float in the range 0 to 100 (i.e. 100 times the actual fraction)
GC_skew - returns a list of floats using a default window size of 100bp.  Gives
a floating point (division by zero) exception if any window contains no G or C.

Personally I don't like the fact that the existing GC function returns a number
between 0 and 100, but otherwise this code is fine.

I don't think the current GC_skew function is intuitive, and it doesn't cover
the non-windowed use case where you want the GC skew of the whole sequence
passed in.  This is important if you want to do your own windowing (e.g.
comparing the GC skew of individual genes to that of the whole genome).
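
As an illustration of that use case, something like the following would work
with a non-windowed skew function.  The names whole_sequence_gc_skew and
skew_relative_to_whole are hypothetical, the (start, end) regions stand in for
gene coordinates, and the skew is assumed to be (G - C)/(G + C) with zero
returned when there is no G or C.

```python
def whole_sequence_gc_skew(sequence):
    """GC skew of the entire sequence passed in (no windowing)."""
    seq = str(sequence).upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def skew_relative_to_whole(genome, regions):
    """Map each (start, end) region to its GC skew minus the genome's GC skew.

    Coordinates are Python-style: zero-based start, exclusive end.
    """
    baseline = whole_sequence_gc_skew(genome)
    differences = {}
    for start, end in regions:
        region_skew = whole_sequence_gc_skew(genome[start:end])
        differences[(start, end)] = region_skew - baseline
    return differences


# Toy data only; real gene coordinates would come from the annotation.
toy_genome = "GGGGCCGCAT"
toy_regions = [(0, 4), (4, 8)]
relative = skew_relative_to_whole(toy_genome, toy_regions)
```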

Because they differ from the existing Bio.SeqUtils code, I think there is a
case for adding the four non-windowed functions from GenomeDiagram's
Utilities.py under Bio.SeqUtils, perhaps in a submodule such as
Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils.  The existing GC functions
in Bio.SeqUtils could be deprecated or at least declared obsolete.

