[Bioperl-l] New Bioperl dependency? Sort::Naturally

Florent Angly florent.angly at gmail.com
Sun May 9 05:12:03 UTC 2010


Within one assembly file, contig IDs tend to follow a single 
formatting convention. The two most popular are a purely numerical ID 
and an alphanumeric ID such as 'contig13'. The latter case already 
requires natural sorting. There is no way to know in advance which 
format to expect. In fact, since the format is specified by the user, 
it could be arbitrarily complicated, although in most cases the IDs 
would still be expected to sort naturally.
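
For the two common conventions described above, a core-Perl sketch could 
look like the following. It is only an illustration, not what 
Sort::Naturally does internally: the helper names id_key and 
natural_sort_ids are hypothetical, and it assumes each ID is an 
optional text prefix followed by an optional trailing number.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ID into a text prefix and a trailing number.
# Assumption: IDs look like '13' or 'contig13'; anything else is only
# handled approximately (the last run of digits is taken as the number).
sub id_key {
    my ($id) = @_;
    my ($prefix, $num) = $id =~ /^(.*?)(\d*)$/;
    # IDs without a trailing number get -1 so they sort before numbered ones.
    return [ $prefix, (length($num) ? $num : -1), $id ];
}

# Schwartzian transform: compare prefixes as strings, then numbers
# numerically, then fall back to the full ID to break ties.
sub natural_sort_ids {
    return map  { $_->[2] }
           sort {    $a->[0] cmp $b->[0]
                  or $a->[1] <=> $b->[1]
                  or $a->[2] cmp $b->[2] }
           map  { id_key($_) } @_;
}

print join(' ', natural_sort_ids(qw(singlet1 contig10 contig2 101 3))), "\n";
# prints: 3 101 contig2 contig10 singlet1
```

Purely numerical IDs have an empty prefix, so they come first, which 
gives one possible answer to how '10', 'contig10' and 'singlet10' 
should be ordered relative to each other.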

I will follow Chris's recommendation and make Sort::Naturally a 
recommended (optional) package. Users who don't have it installed will 
have their IDs sorted in a safe way, lexically.
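
A minimal sketch of this optional-dependency fallback, assuming a 
hypothetical helper named sort_ids (the actual Bio::Assembly code may 
be organized differently):

```perl
use strict;
use warnings;

# Try to load Sort::Naturally at run time; this is true only if it is installed.
my $have_nsort = eval { require Sort::Naturally; 1 } ? 1 : 0;

# Hypothetical helper: natural sort when available, lexical sort otherwise.
sub sort_ids {
    my @ids = @_;
    return $have_nsort ? Sort::Naturally::nsort(@ids) : sort @ids;
}

print join(' ', sort_ids(qw(singlet1 contig10 contig2 101 3))), "\n";
```

Loading the module inside eval means a missing Sort::Naturally does 
not abort the program; the only visible difference is the order of 
the IDs.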

Florent


On 09/05/10 02:12, Jason Stajich wrote:
> Unless necessary I don't know if adding yet another dependency is 
> warranted here.
>
> I don't know how complicated the words will be but can't you just 
> strip out the numbers and do this in a schwartzian transformation?
>
> #!/usr/bin/perl -w
> use strict;
> my @arr = qw(single1 contig10 101 contig2 3);
> my @sorted = map  { $_->[1] }
>              sort { $a->[0] <=> $b->[0] }
>              map  { [ /(\d+)/, $_ ] } @arr;
> print join("\n", @sorted), "\n";
>
> But I'm not sure how you want to sort
> 10 vs contig10 vs singlet10 reliably?
>
> -jason
>
> Florent Angly wrote, On 5/7/10 9:42 PM:
>> Hi all,
>>
>> I am working on updating some of the Bio::Assembly::* modules right now.
>> I need to sort a list of IDs. These IDs could be numbers, "words" or 
>> a mix of the two, for example:
>>     @arr = ('singlet1', 'contig10', 'contig2', '101', '3');
>>
>> I cannot sort them with the numerical sort: sort { $a <=> $b } @array
>> This would generate warnings because some of the IDs, such as 
>> 'singlet1', are not numbers.
>>
>> I cannot sort them lexically: sort @array
>> Lexical sorting would not order the numbers properly and would 
>> result in:
>>     101 3 contig10 contig2 singlet1
>>
>> So, what I really need is natural sorting, which is not in any core 
>> function of Perl. I'd like to use the CPAN module Sort::Naturally for 
>> this purpose: nsort @arr
>> The results would be what we expect, i.e.:
>>     3 101 contig2 contig10 singlet1
>>
>> Can I add this module as an additional dependency of BioPerl? I 
>> imagine that some other modules might want to use this. On the 
>> assembly side, it would be used by the writing methods of 
>> Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around 
>> my problem that doesn't require any external module?
>>
>> Florent



