[BioRuby] BioRuby, performance, cut_with_enzyme

Trevor Wennblom trevor at corevx.com
Thu Sep 9 21:11:53 UTC 2010


On Sep 9, 2010, at 4:01 PM, Maciej Łopatka wrote:

> I've got a big problem with BioRuby's performance. I need to cut a sequence
> (between about 5MB and 17MB) with restriction enzymes. Cutting it with even
> one enzyme takes ages; it just doesn't stop, and I have no idea what is going on.
> A friend of mine did the same task with BioPython and it took just 2 minutes
> for 20 enzymes.



Hi Maciek,

I'm the author of that particular extension. One of the key goals of the design, several years ago, was to accommodate the specific cut patterns on both strands of the sequence. If I recall correctly, other libraries concentrate on only one strand.
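To make the two-strand point concrete, here is a minimal Ruby sketch, assuming plain A/C/G/T recognition sites with no ambiguity codes. It searches for the site and for its reverse complement and reports where each strand is cut, in top-strand coordinates. The helper names (`reverse_complement`, `find_all`, `cut_sites`) and the offset convention are hypothetical and are not the internals of BioRuby's `cut_with_enzyme`.

```ruby
# Hypothetical helpers for illustration; this is not BioRuby's implementation.

# Reverse complement of a plain A/C/G/T string.
def reverse_complement(seq)
  seq.reverse.tr('ACGT', 'TGCA')
end

# Every start index of +pattern+ in +seq+, including overlapping hits.
def find_all(seq, pattern)
  hits = []
  i = seq.index(pattern)
  while i
    hits << i
    i = seq.index(pattern, i + 1)
  end
  hits
end

# Cut positions on both strands, in top-strand coordinates, where a value c
# means "between base c-1 and base c". top_cut is the cut on the strand that
# carries the site as written 5'->3', bottom_cut the cut on its complement,
# both counted from the first base of the site (EcoRI, G^AATTC: 1 and 5).
def cut_sites(seq, site, top_cut, bottom_cut)
  seq = seq.upcase
  len = site.length
  cuts = find_all(seq, site).map { |p| { top: p + top_cut, bottom: p + bottom_cut } }
  rc = reverse_complement(site)
  unless rc == site # a palindromic site was already found on the top strand
    # Site read on the bottom strand: mirror the offsets around the site.
    find_all(seq, rc).each do |p|
      cuts << { top: p + len - bottom_cut, bottom: p + len - top_cut }
    end
  end
  # Keep only cuts that fall strictly inside the sequence.
  cuts.select { |c| c.values.all? { |x| x.between?(1, seq.length - 1) } }.sort_by { |c| c[:top] }
end
```

For a palindromic site such as EcoRI's GAATTC both searches would find the same positions, so the second one is skipped. A non-palindromic site, for example BsaI with GGTCTC(1/5), called here as `cut_sites(seq, 'GGTCTC', 7, 11)`, has to be searched for on both strands, which in this sketch roughly doubles the scanning work.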

This does come with a considerable performance hit, interpreted language aside. I'm also fairly certain that I didn't implement the most efficient algorithm, since it was my first attempt at solving the problem back then.
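On the efficiency side, one common way to avoid rescanning a multi-megabase sequence once per enzyme (and once per strand) is to combine every site and its reverse complement into a single regular expression and collect match positions in one pass. The sketch below illustrates that technique only; it is not how BioRuby works. `ENZYMES` and `site_hits` are hypothetical names, and it again assumes plain A/C/G/T sites.

```ruby
# Hypothetical sample table: enzyme name => recognition site (plain A/C/G/T only).
ENZYMES = { 'EcoRI' => 'GAATTC', 'BamHI' => 'GGATCC' }.freeze

# Scan +seq+ once for every enzyme's site on both strands.
# Returns { enzyme_name => [[start_index, :top or :bottom], ...] }.
# Limitation: if two different sites begin at the same position, only the
# first alternative that matches there is reported.
def site_hits(seq, enzymes)
  lookup = {}
  enzymes.each do |name, site|
    lookup[site] = [name, :top]
    # The reverse complement is how the site appears when it lies on the
    # bottom strand; palindromic sites keep their :top entry.
    lookup[site.reverse.tr('ACGT', 'TGCA')] ||= [name, :bottom]
  end
  alternatives = lookup.keys.map { |s| Regexp.escape(s) }.join('|')
  # Zero-width lookahead, so overlapping occurrences are all reported.
  pattern = Regexp.new("(?=(#{alternatives}))")
  hits = Hash.new { |h, k| h[k] = [] }
  seq.upcase.scan(pattern) do
    m = Regexp.last_match
    name, strand = lookup[m[1]]
    hits[name] << [m.begin(0), strand]
  end
  hits
end
```

For example, `site_hits("GAATTCGGATCC", ENZYMES)` reports EcoRI at index 0 and BamHI at index 6, both on the top strand. The cut offsets from the previous idea can then be applied to these positions per enzyme.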

If you'd like to email me a link to the particular dataset you're using, along with what you're trying to do with it, I'd be happy to take a look in the next few days. Another question: are you interested in results that take both strands into account, which would be more accurate, or just a single strand?


