[Bioperl-l] how-to-remove-redundant-lines
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Wed Jun 29 05:20:40 EDT 2005
... and if your sets are not in decreasing order, you cannot print them out
immediately, but you have to store them in a hash and test whether the new set is
a superset or a subset of each existing set, remove and add sets in the
hash accordingly, and print the hash elements at the end.
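
A minimal sketch of that idea in core Perl, in case it helps. It is only an
illustration under a few assumptions: members are whitespace-separated tokens,
the kept sets live in an array of small records rather than a single hash (so
the surviving lines come out in input order), and the names is_subset and
remove_redundant are made up for this example. The input is the sample data
from the original question, deliberately shuffled so that a subset arrives
before its superset.

```perl
use strict;
use warnings;

# True if every key of %$small is also a key of %$big.
sub is_subset {
    my ($small, $big) = @_;
    return 0 if keys(%$small) > keys(%$big);
    for my $member (keys %$small) {
        return 0 unless exists $big->{$member};
    }
    return 1;
}

# Given a list of lines, return those whose member set is not contained
# in the set of another line. Of exact duplicates, the first copy is kept.
sub remove_redundant {
    my @lines = @_;
    my @kept;    # each entry: { line => $text, set => \%members }
    LINE: for my $line (@lines) {
        my @members = split ' ', $line;
        next LINE unless @members;    # skip blank lines
        my %set = map { $_ => 1 } @members;

        # The new set is already covered by a stored set: drop the line.
        for my $old (@kept) {
            next LINE if is_subset(\%set, $old->{set});
        }

        # Otherwise discard stored sets that the new set covers, then store it.
        @kept = grep { !is_subset($_->{set}, \%set) } @kept;
        push @kept, { line => $line, set => \%set };
    }
    return map { $_->{line} } @kept;
}

# Sample data from the question, in shuffled order.
my @input = (
    '2 5 7 8 11',
    '3 13 17 19',
    '1 2 5 7 8 11',
    '4 21 45 67',
    '5 7 8 11',
);
print "$_\n" for remove_redundant(@input);
```

With this input the sketch prints the lines "3 13 17 19", "1 2 5 7 8 11" and
"4 21 45 67". For a real cluster file you would read the lines from a
filehandle and chomp them before passing them in. Every new line is compared
against every stored set, so this is quadratic in the number of lines, which
should be fine for modest files.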
-Heikki
On Wednesday 29 June 2005 10:00, Heikki Lehvaslaiho wrote:
> Vijayraj,
>
> Your problem in mathematical terms is comparing sets.
>
> In pseudocode:
>
>   parse first line, create a set, write the line
>   add set to an array
>   for each subsequent line {
>       parse the line, create a set
>       for each old set in the array {
>           if this set is a subset of the old set {
>               next line
>           }
>       }
>       # if we are here, we have not seen the set before
>       add set to an array, write the line
>   }
>
> The output will contain only the non-redundant lines.
>
> There are a lot of modules in CPAN that can do the algebra for you. One of
> them is Set::Scalar: http://search.cpan.org/~jhi/Set-Scalar-1.19/
>
>
> Yours,
> -Heikki
>
> On Wednesday 29 June 2005 08:19, vijayaraj nagarajan wrote:
> > hi
> > i have a cluster file with contents like this:
> >
> > 1 2 5 7 8 11
> > 2 5 7 8 11
> > 3 13 17 19
> > 4 21 45 67
> > 5 7 8 11
> >
> > Now the 1st, 2nd and 5th lines are redundant. I need to
> > remove the 2nd and 5th lines from the file, while
> > retaining only the first line, since the first line
> > contains all the members present in the 2nd and 5th lines...
> >
> > Could anyone suggest how to parse this file to
> > remove such redundant lines using Perl?
> > Any help and suggestions in this regard would be
> > greatly appreciated.
> >
> > thanks
> >
> > vijayaraj nagarajan
> > research assistant
> > the university of southern mississippi
> > ms, usa
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________