[Biopython-dev] Sequences and simple plots

Peter biopython at maubp.freeserve.co.uk
Fri Sep 26 10:15:52 UTC 2008


On Thu, Sep 25, 2008 at 8:39 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Sep 25, 2008 at 7:34 PM, Jared Flatow <jflatow at northwestern.edu> wrote:
>>
>> Hi Peter,
>>
>> Good ideas for some useful examples! (though I can't actually find them in
>> the cookbook...)
>
> They are in CVS only at the moment - I can send you the PDF of the
> current tutorial if you like off list.  We don't normally update the
> tutorial on the website except as part of making a new release - this
> avoids the tutorial talking about unreleased code.

Cut and pasted here for people to comment on directly:

The first shows a histogram of sequence lengths in a FASTA file (based on
having recently done this for some real assembly data).  Sample output:
http://biopython.org/DIST/docs/tutorial/images/hist_plot.png

from Bio import SeqIO
handle = open("ls_orchid.fasta")
sizes = [len(seq_record) for seq_record in SeqIO.parse(handle, "fasta")]
handle.close()

import pylab
pylab.hist(sizes, bins=20)
pylab.title("%i orchid sequences\nLengths %i to %i" \
         % (len(sizes),min(sizes),max(sizes)))
pylab.xlabel("Sequence length (bp)")
pylab.ylabel("Count")
pylab.show()
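
For anyone wondering what bins=20 does with the lengths, here is a rough
pure-Python sketch of equal-width binning.  It assumes the bins span
min(sizes) to max(sizes) and that the top edge belongs to the last bin;
bin_counts is a made-up helper name for illustration, not part of pylab
or Biopython, and pylab's own handling of edge cases may differ.

```python
def bin_counts(values, bins=20):
    """Count values in equal-width bins between min and max (illustrative only)."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    width = (hi - lo) / float(bins)
    for v in values:
        if width == 0:
            # Assumption: if all values are identical, use the first bin.
            idx = 0
        else:
            # Clamp so the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Toy lengths standing in for the real FASTA data.
example_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
example_counts = bin_counts(example_sizes, bins=3)
```

The counts always add up to the number of sequences, which is a handy
sanity check against the title of the plot.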

The second is based on the GC% example we used for the BOSC 2008
presentation: http://biopython.org/DIST/docs/tutorial/images/gc_plot.png

from Bio import SeqIO
from Bio.SeqUtils import GC
handle = open("ls_orchid.fasta")
gc_values = [GC(seq_record.seq) for seq_record in SeqIO.parse(handle, "fasta")]
gc_values.sort()
handle.close()

import pylab
pylab.plot(gc_values)
pylab.title("%i orchid sequences\nGC%% %0.1f to %0.1f" \
         % (len(gc_values),min(gc_values),max(gc_values)))
pylab.xlabel("Genes")
pylab.ylabel("GC%")
pylab.show()
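
In case it helps readers follow the second example, here is a minimal
sketch of the quantity being plotted, assuming GC% just means the
percentage of G and C letters in the sequence.  gc_percent is a
hypothetical stand-in written for this message; the real
Bio.SeqUtils.GC function may treat ambiguous letters differently.

```python
def gc_percent(seq):
    """Percentage of G and C letters in seq (illustrative stand-in only)."""
    text = str(seq).upper()
    if not text:
        # Assumption: report 0.0 for an empty sequence rather than failing.
        return 0.0
    gc = text.count("G") + text.count("C")
    return 100.0 * gc / len(text)

# Toy sequences standing in for the orchid records.
toy_values = sorted(gc_percent(s) for s in ["ATGC", "GGCC", "ATAT"])
```

Sorting the values first, as in the example above, is what gives the
plot its monotonic curve from lowest to highest GC%.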

Peter


