[BioPython] Question about Seq.count()

Jimmy Musselwhite jimmy.musselwhite at gmail.com
Wed Oct 17 23:04:26 UTC 2007


In response to the first reply you gave me, where you said this:

I'd never noticed that - I would call it a bug...

 >>> from Bio.Seq import Seq
 >>> my_seq = Seq("AAACACACGGTTTT")
 >>> my_seq.data.count("GG")
1
 >>> my_seq.data.count("G")
2
 >>> my_seq.tostring().count("G")
2
 >>> my_seq.tostring().count("GG")
1
 >>> my_seq.count("G")
2
 >>> my_seq.count("GG")
0


I've tried that many, many times, and I always get 0 when I call
my_seq.count("GG").
I just rebuilt Biopython from the latest CVS tarball and it still does not
work. I have no idea why yours works and mine doesn't.
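
For reference, here is a minimal sketch of what counting a multi-character
pattern means on a plain Python string, which is what .data holds according
to the reply quoted below. It does not call Biopython at all; the helper names
count_non_overlapping and count_overlapping are made up for illustration and
are not part of any library.

```python
def count_non_overlapping(sequence_text, pattern):
    """Count non-overlapping matches, the same way str.count() does."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    total = 0
    start = sequence_text.find(pattern)
    while start != -1:
        total += 1
        # Resume after the end of this match so matches cannot overlap.
        start = sequence_text.find(pattern, start + len(pattern))
    return total


def count_overlapping(sequence_text, pattern):
    """Count every position where the pattern starts, overlaps included."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    total = 0
    start = sequence_text.find(pattern)
    while start != -1:
        total += 1
        # Resume one character later so overlapping matches are found too.
        start = sequence_text.find(pattern, start + 1)
    return total


sequence_text = "AAACACACGGTTTT"
print(count_non_overlapping(sequence_text, "GG"))   # 1, same as str.count
print(count_non_overlapping(sequence_text, "ACA"))  # 1
print(count_overlapping(sequence_text, "ACA"))      # 2
```

The non-overlapping version should agree with sequence_text.count(pattern);
the overlapping version is only there because motif searches sometimes want
every start position.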

On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
>
> Just kidding, it didn't work great. It only "fixed" it because I was
> printing out the output of count() and so it was just executing 100 times
> slower and thus eating RAM 100 times slower :(
>
> It doesn't seem like there is a good way for me to fix this.
>
> On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
> >
> > Thanks guys! That worked great.
> >
> > On 10/17/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > >
> > > Jimmy Musselwhite wrote:
> > > > Now the code I want to do is
> > > > record.seq.count(search)
> > > >
> > > > but what I am forced to do is
> > > > record.seq.tostring().count(search)
> > > >
> > > > The problem here is that when I am forced to use .tostring() on every
> > > > single seq object it devastates my memory usage in a BIG way. It eats
> > > > up about 1.2 gigs and then crashes. If I remove the .tostring() and
> > > > just tell it to search for 'A', it will run fine and use memory at
> > > > about 1/100th the rate
> > >
> > > In the short term, try record.seq.data.count(search), which is what the
> > > tostring() method is doing anyway (the Seq object stores the sequence
> > > internally as a string).  Does that help?
> > >
> > > We might be tweaking the Seq object after the next release to act a bit
> > > more like a string - at which point the .data property might go away.
> > >
> > > > So my question boils down to: is there any way to make .count() able
> > > > to search for strings and not just characters?
> > >
> > > I'd never noticed that - I would call it a bug...
> > >
> > > >>> from Bio.Seq import Seq
> > > >>> my_seq = Seq("AAACACACGGTTTT")
> > > >>> my_seq.data.count("GG")
> > > 1
> > > >>> my_seq.data.count("G")
> > > 2
> > > >>> my_seq.tostring().count("G")
> > > 2
> > > >>> my_seq.tostring().count("GG")
> > > 1
> > > >>> my_seq.count("G")
> > > 2
> > > >>> my_seq.count("GG")
> > > 0
> > >
> > > Peter
> > >
> > >
> >
>


