[Bioperl-l] how-to-remove-redundant-lines
Terry Jones
tcj25 at cam.ac.uk
Wed Jun 29 07:16:28 EDT 2005
>>>>> "vijayaraj" == vijayaraj nagarajan <bioinfovijayaraj at yahoo.com> writes:
vijayaraj> i have a cluster file with contents like this:
vijayaraj> 1 2 5 7 8 11
vijayaraj> 2 5 7 8 11
vijayaraj> 3 13 17 19
vijayaraj> 4 21 45 67
vijayaraj> 5 7 8 11
vijayaraj> Now the 1,2 and 5th lines are redundant. i need to
vijayaraj> remove the 2nd and 5th line from the file, while
vijayaraj> retaining only the first line, since the first line
vijayaraj> contains all the members present in 2 and 5th line...
Here's something much better. It tries to be somewhat efficient.
Terry
#!/usr/bin/perl -w
use strict;
my @lines;
while (<>){
my @nums = split;
my $nums = {};
map { $nums->{$_} = undef } @nums;
push @lines, [ $nums, scalar(@nums) ];
}
my @sorted = sort { $lines[$b]->[1] <=> $lines[$a]->[1] } 0 .. $#lines;
for (my $i = 0; $i < @lines; $i++){
print join(' ', sort { $a <=> $b } keys %{$lines[$sorted[$i]]->[0]}), "\n" unless match($i);
}
sub match {
my $index = shift;
my $target_set = $lines[$sorted[$index]]->[0];
for (my $i = 0; $i < $index; $i++){
my $is_subset = 1;
my $bigger_set = $lines[$sorted[$i]]->[0];
for my $element (keys %$target_set){
unless (exists $bigger_set->{$element}){
$is_subset = 0;
last;
}
}
return 1 if $is_subset;
}
}
exit(0);
More information about the Bioperl-l
mailing list