[Bioperl-l] how-to-remove-redundant-lines
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Wed Jun 29 05:00:38 EDT 2005
Vijayaraj,
In mathematical terms, your problem is one of comparing sets.
In pseudocode:
parse the first line, create a set, write the line
add the set to an array
for each subsequent line {
    parse the line, create a set
    for each old set in the array {
        if this set is a subset of the old set {
            skip to the next line
        }
    }
    # if we are here, the set is not contained in any earlier set
    add the set to the array, write the line
}
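The output will then contain only the lines whose members are not all contained in an earlier line. Note that this relies on each line appearing before the lines it contains, as in your example; if that is not guaranteed, process the lines in order of decreasing size.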
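A minimal sketch of the pseudocode in plain Perl follows, using hashes as sets instead of a CPAN module. It is an illustration, not code from the original thread: the names remove_redundant and is_subset are hypothetical, and it assumes the members on each line are separated by whitespace. It also visits the largest lines first, so a line is dropped even if the line containing it comes later in the file, and then returns the kept lines in their original order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only lines whose members are not all contained
# in another kept line. Each line is treated as a set of whitespace-separated
# members.
sub remove_redundant {
    my @lines = @_;

    # Parse every line into a set: a hash whose keys are the members.
    my @sets;
    for my $line (@lines) {
        my %set = map { $_ => 1 } split ' ', $line;
        push @sets, \%set;
    }
    my @size = map { scalar keys %$_ } @sets;

    # Visit the largest sets first (ties broken by file order), so a
    # superset is always kept before any of its subsets is examined.
    my @order = sort { $size[$b] <=> $size[$a] || $a <=> $b } 0 .. $#lines;

    my @kept;    # indices of the lines kept so far
    LINE: for my $i (@order) {
        for my $j (@kept) {
            # Skip line $i if every one of its members is also in line $j.
            next LINE if is_subset($sets[$i], $sets[$j]);
        }
        # Not contained in any kept line: keep it.
        push @kept, $i;
    }

    # Return the kept lines in their original file order.
    return map { $lines[$_] } sort { $a <=> $b } @kept;
}

# Hypothetical helper: true if every key of %$small is also a key of %$big.
sub is_subset {
    my ($small, $big) = @_;
    for my $member (keys %$small) {
        return 0 unless exists $big->{$member};
    }
    return 1;
}

# The example cluster file from the question.
my @input = (
    '1 2 5 7 8 11',
    '2 5 7 8 11',
    '3 13 17 19',
    '4 21 45 67',
    '5 7 8 11',
);
print "$_\n" for remove_redundant(@input);
# prints:
# 1 2 5 7 8 11
# 3 13 17 19
# 4 21 45 67
```

To filter a real cluster file, the hard-coded @input could be replaced by `chomp(my @input = <>);`. Every line is compared against every kept line, so the running time grows quadratically with the number of lines; that should be fine for modest files but may be slow for very large ones. Set::Scalar also provides subset tests that could stand in for the hand-written is_subset helper (check its documentation for the exact method names).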
Several CPAN modules can do the set algebra for you; one of
them is Set::Scalar: http://search.cpan.org/~jhi/Set-Scalar-1.19/
Yours,
-Heikki
On Wednesday 29 June 2005 08:19, vijayaraj nagarajan wrote:
> hi
> I have a cluster file with contents like this:
>
> 1 2 5 7 8 11
> 2 5 7 8 11
> 3 13 17 19
> 4 21 45 67
> 5 7 8 11
>
> Now the 1st, 2nd and 5th lines are redundant. I need to
> remove the 2nd and 5th lines from the file while
> retaining only the first line, since the first line
> contains all the members present in the 2nd and 5th lines...
>
> Could anyone suggest how to parse this file to
> remove such redundant lines using Perl?
> Any help and suggestions in this regard would be
> greatly appreciated.
>
> thanks
>
> Vijayaraj Nagarajan
> Research Assistant
> The University of Southern Mississippi
> MS, USA
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
Heikki Lehvaslaiho    heikki at ebi.ac.uk
http://www.ebi.ac.uk/mutations/
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468