[Bioperl-l] (no subject)

Jason Stajich jason@cgt.mc.duke.edu
Thu, 14 Feb 2002 12:31:32 -0500 (EST)


Seth -
Thanks for the volunteer - actually this method exists in the
Bio::Tools::SeqStats module.  It actually allows you to calculate GC
content - but perhaps it isn't obvious to new users how to get it to do
what you want.

I'm starting a FAQ maybe that will make its way into it eventually.

-jason

Date: Thu, 14 Feb 2002 11:18:11 -0500
From: Seth Purcell <purcell@genome.wi.mit.edu>
To: volunteer@open-bio.org
Subject: [Volunteer] gc_content

Hi -

I am very unfamiliar with BioPerl, but it seems like there isn't a
built-in method to get a sequence's gc content. I am assuming you just
don't want to clutter your code with something so trivial, but it is a
commonly repeated task, so if it would be useful to you please feel free
to incorporate the following small code snippet as a method. I think it
would make sense in either Seq or PrimarySeq. I can see how you might
not want to clutter PrimarySeq, but if you put it there you could avoid
both breaking the abstraction and copying the sequence just to get the
gc content. However, it seems like the Seq and PrimarySeq methods copy
the sequence all over the place, so you may not care about duplicating
the sequence all the time. Sorry to bother you if you already have this
functionality, I just didn't see it in the online documentation.

sub gc_content
{
        # calculate the gc content of the chunk of sequence passed as a
parameter

        my $seq = shift;

        return ($seq =~ tr/gGcC//)/length($seq);
}

I don't know if BioPerl users would rather have a percent than a
fraction, or if it would be useful to generalize this to be able to
calculate the content of letters besides g and c, etc., but these are
easy changes. The version I use optionally takes a reference to avoid
copying long sequences a lot, but I didn't think this was necessary for
a member function.

Seth Purcell
Scientific Programmer

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu