[Bioperl-l] Next-gen modules

Mark A. Jensen maj at fortinbras.us
Wed Jun 17 20:54:00 UTC 2009


unintended! Does that mean your delete key's broke...?
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Elia Stupka" <e.stupka at ucl.ac.uk>
Cc: <bioperl-l at lists.open-bio.org>; <tristan.lefebure at gmail.com>
Sent: Wednesday, June 17, 2009 4:35 PM
Subject: Re: [Bioperl-l] Next-gen modules


> So, #1 priority is to get fastq up-to-speed, then maybe assess other  
> options.
> 
> Illuminating discussion, thanks Elia!
> 
> urgh, excuse unintended bad pun above...
> 
> chris
> 
> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
> 
>> Interesting that you mention the database issue. We found that for  
>> specific memory/CPU intensive things we also switch to using dbs.  
>> For example, after many years of loyal use of disconnected_ranges we  
>> switched to a simple SQL implementation of it, because of the large  
>> performance gains it would give us.  Similarly in Ensembl as well as  
>> in the old days of bioperl-db we opted for doing subseq within SQL  
>> where possible.
>>
>> Some lean way of SQL'izing specific components could be less  
>> "disruptive" than avoiding object creation and provide significant  
>> gains in performance. Could be set as an optional flag, and could  
>> use temporary ad hoc SQL databases?
>>
>> Still, priority now is to make SeqIO compliant with all those  
>> formats, then we can worry about performance :)
>>
>> Elia
>>
>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>
>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>
>>>> Tristan Lefebure wrote:
>>>>> Hello,
>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>>> you've got to be patient (but maybe I missed some shortcuts...).
>>>>
>>>> This is my concern as well. Or, rather, is there actually a  
>>>> significant set of users out there who are dealing with next-gen  
>>>> sequencing and would consider using BioPerl for their work?
>>>>
>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>> at least are probably never going to use BioPerl for the work.
>>>
>>> Are you using pure perl or (gasp) something else?  ;>
>>>
>>> Judging by the feedback there is definitely a set of users who  
>>> would like to integrate nextgen into bioperl somehow, probably to  
>>> take advantage of other aspects of bioperl.
>>>
>>>>> A pure perl solution will be between 100 and 1000x faster... Would  
>>>>> it be possible to have an ultra-light quality object with few  
>>>>> simple methods for next-gen reads?
>>>>
>>>> The fastq parser itself already seems pretty fast. The way to get  
>>>> the speedup is to not create any Bio::Seq* objects but just return  
>>>> the data directly. At that point it's not taking much advantage of  
>>>> BioPerl. But certainly it could be done...
>>>
>>>
>>> I suppose the best way to assess what needs to be done is come up  
>>> with a set of 'use cases' specifying what users want so we can  
>>> design around them, otherwise we're shooting in the dark.
>>>
>>> I'm personally wondering if this could be done as a sequence  
>>> database, something similar in theme to Lincoln's  
>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>> feasible, but it appears at least scalable.
>>>
>>> chris
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> ---
>> Senior Lecturer, Bioinformatics
>> UCL Cancer Institute
>> Paul O' Gorman Building
>> University College London
>> Gower Street
>> WC1E 6BT
>> London
>> UK
>>
>> Office (UCL): +44 207 679 6493
>> Office (ICMS): +44 0207 8822374
>>
>> Mobile: +44 7597 566 194
>> Mobile (Italy): +39 338 8448801
>>
> 
>
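
To make the "just return the data directly" idea from the thread concrete, here is a minimal sketch of end-trimming reads without building any sequence objects. It is written in Python purely for illustration and is not BioPerl code or anyone's actual implementation. The names read_fastq and trim_ends, the min_q=20 cutoff, and the offset=33 (Sanger-style) quality encoding are all assumptions; older Solexa/Illumina files used an offset of 64 (and, for Solexa, a different quality scale), so those would need adjusting. It also assumes strict four-line FASTQ records.

```python
def read_fastq(handle):
    """Yield (name, sequence, quality_string) tuples from a FASTQ stream.

    Assumes strict 4-line records (no wrapped sequence or quality lines).
    """
    while True:
        header = handle.readline()
        if not header:            # end of file
            return
        if not header.strip():    # tolerate stray blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        handle.readline()         # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def trim_ends(seq, qual, min_q=20, offset=33):
    """Strip bases with quality below min_q from both ends of a read.

    min_q and offset are illustrative defaults, not values from the thread.
    Returns the trimmed (sequence, quality_string) pair.
    """
    cutoff = chr(min_q + offset)  # compare characters directly, no decoding
    start, end = 0, len(qual)
    while start < end and qual[start] < cutoff:
        start += 1
    while end > start and qual[end - 1] < cutoff:
        end -= 1
    return seq[start:end], qual[start:end]
```

Each record is a plain tuple of strings, so the per-read cost is a few readline calls and two short loops. Wrapping the result in a richer object only for the reads that need it would be one way to keep the rest of the toolkit available.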


