[Bioperl-l] [Fwd: [Volunteer] gc_content]

Thu, 14 Feb 2002 11:53:32 -0500

From: Seth Purcell <purcell@genome.wi.mit.edu>
Organization: Whitehead Institute Center for Genome Research
To: volunteer@open-bio.org
Subject: [Volunteer] gc_content
Date: Thu, 14 Feb 2002 11:18:11 -0500

Hi -

I am very unfamiliar with BioPerl, but it seems like there isn't a
built-in method to get a sequence's gc content. I am assuming you just
don't want to clutter your code with something so trivial, but it is a
commonly repeated task, so if it would be useful to you please feel free
to incorporate the following small code snippet as a method. I think it
would make sense in either Seq or PrimarySeq. I can see how you might
not want to clutter PrimarySeq, but if you put it there you could avoid
both breaking the abstraction and copying the sequence just to get the
gc content. However, it seems like the Seq and PrimarySeq methods copy
the sequence all over the place, so you may not care about duplicating
the sequence all the time. Sorry to bother you if you already have this
functionality; I just didn't see it in the online documentation.

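sub gc_content
{
	# Calculate the gc content (as a fraction between 0 and 1) of the
	# chunk of sequence passed as a parameter.

	my $seq = shift;

	# Avoid dividing by zero on an empty or undefined sequence.
	return 0 unless defined $seq && length $seq;

	# tr/// with an empty replacement list counts the matching
	# characters without modifying $seq.
	return ($seq =~ tr/gGcC//) / length($seq);
}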

I don't know if BioPerl users would rather have a percent than a
fraction, or if it would be useful to generalize this to be able to
calculate the content of letters besides g and c, etc., but these are
easy changes. The version I use optionally takes a reference to avoid
copying long sequences a lot, but I didn't think this was necessary for
a member function.
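
The variations mentioned above (a percent instead of a fraction, letters
other than g and c, an optional reference to avoid copying) could be
sketched roughly as follows. The name base_content, its argument order,
and the default letter set of 'GC' are assumptions made for illustration;
this is not an existing BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns the fraction, or
# optionally the percentage, of characters in a sequence that belong to
# a given set of letters. Matching is case-insensitive.
sub base_content
{
	my ($seq, $letters, $as_percent) = @_;

	# Accept either a plain string or a reference to one, so that a
	# caller holding a long sequence can avoid copying it.
	my $seq_ref = ref($seq) ? $seq : \$seq;

	# Default to G and C when no letter set is given.
	$letters = 'GC' unless defined $letters && length $letters;

	my $length = length $$seq_ref;
	return 0 unless $length;

	# tr/// does not interpolate variables, so count with a regex
	# character class built from the requested letters instead.
	my $class = quotemeta $letters;
	my $count = () = $$seq_ref =~ /[$class]/gi;

	my $fraction = $count / $length;
	return $as_percent ? 100 * $fraction : $fraction;
}

# Fraction of G and C, passing the sequence by value.
print base_content('ATGCGC'), "\n";

# Percentage of A and T, passing the sequence by reference.
my $long = 'AATTAATTGG';
print base_content(\$long, 'AT', 1), "\n";
```

The regex count is slower than tr/// on very long sequences, so a fixed
tr/gGcC// is probably still the better choice when only gc content is
needed.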

Seth Purcell
Scientific Programmer
Whitehead Institute/MIT Center for Genome Research