[Bioperl-l] dealing with large files
Chris Fields
cjfields at uiuc.edu
Thu Dec 20 20:39:48 UTC 2007
On Dec 20, 2007, at 12:52 PM, Amir Karger wrote:
>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related
>>>> parsers; maybe for the next dev release.
>>>
>>> Also, BLAST parsing. Blasting the proteome against the genome makes for
>>> rather large result files.
>>
>> This has already been done. Use Bio::SearchIO::blast_pull. In a
>> situation like yours I dropped run time from 20223s to 951s (~20x
>> faster) and memory usage from over 8GB to less than 5GB (~40% less).
>
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.
>
> -Amir
It's in CVS.
Just to note: there have been a lot of changes between 1.5.1 and
1.5.2, and probably as many from 1.5.2 to now. We are cleaning up
some code introduced prior to the 1.5 release and working on other
fixes and code docs, with the final aim of a new 1.6 release; I'm
hoping that release will have routine point releases for bug fixes. Of
course, that'll have to wait until after the SVN migration!
There are a few discussions on the list about speeding up parsing using
lightweight/featherweight objects or even straight hashes (for
instance, Jason has a lightweight seqfeature implementation committed
on a branch which is quite fast, and there are Sendu's Bio::SearchIO
PullParser implementations). My feeling is that these will be part of
the next dev release, along with GFF3 integration and code cleanup.
chris