[Bioperl-l] LargeSeq performance

Wed Oct 29 11:54:55 EST 2003

I have a problem with the performance of LargeSeq. I am working with 
whole chromosomes (mouse, human) and next_seq takes forever.
I do not know if it is worth the effort, since any portion can already 
be read with random access, but I am still curious to know if people 
think it might be a good idea to create an object that handles extremely 
large sequences (whole chromosomes, for example) without an impact on 
performance.
If you think it is worth it, I can try to do it. What I have in mind is 
to use grep to map the record separators ">" (in case you are mad enough 
to put more than one chromosome in a single file). That way next_seq 
will know where to look for the next sequence, and can parse the id line 
and calculate the length. I doubt anyone will use this under Windows 
(and in any case the OS can be checked to avoid problems). Also, the 
object will use random access instead of Bio::Root::IO to get sequence 
data.
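
To make the idea concrete, here is a rough sketch of the offset map and 
the random-access read. It is written in Python purely for illustration 
and is not BioPerl code. The function names (index_fasta, seq_length, 
read_subseq) are hypothetical, and it assumes a FASTA file whose sequence 
lines all have the same width (except the last line of each record) and 
all end with a newline.

```python
def index_fasta(path):
    """Map each '>' record to (seq_id, header_offset, data_offset, end_offset)."""
    starts = []
    offset = 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                fields = line[1:].split()
                seq_id = fields[0].decode() if fields else ""
                # Sequence data begins right after the header line.
                starts.append((seq_id, offset, offset + len(line)))
            offset += len(line)
    file_size = offset
    index = []
    for i, (seq_id, header_off, data_off) in enumerate(starts):
        # A record ends where the next header begins, or at end of file.
        end_off = starts[i + 1][1] if i + 1 < len(starts) else file_size
        index.append((seq_id, header_off, data_off, end_off))
    return index


def seq_length(data_off, end_off, line_width, newline_len=1):
    """Compute the sequence length from byte offsets alone.

    Assumption: every sequence line, including the last, ends with a newline.
    """
    full, rest = divmod(end_off - data_off, line_width + newline_len)
    return full * line_width + (rest - newline_len if rest else 0)


def read_subseq(path, data_off, start, end, line_width, newline_len=1):
    """Fetch residues start..end (1-based, inclusive) with a single seek."""
    def to_byte(pos0):
        # Skip one newline for every complete line before this residue.
        return data_off + pos0 + (pos0 // line_width) * newline_len

    first = to_byte(start - 1)
    last = to_byte(end - 1)
    with open(path, "rb") as fh:
        fh.seek(first)
        chunk = fh.read(last - first + 1)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()
```

The index still costs one pass over the file (the job grep would do), 
but after that next_seq would only need the stored offsets, and a 
subsequence costs one seek and one short read however large the 
chromosome is. Files with uneven line widths would need a different 
scheme.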
Let me know what you think...
Stefan Kirov