[Bioperl-l] How to remove redundancy?
Fri, 15 Nov 2002 08:12:23 -0800
Perhaps you could be more specific about what you mean by "redundancy"? And
what format is your data set in? For example, assuming FASTA format and
redundancy meaning duplicate entries in the data set, are you referring to
duplicated primary IDs, accession numbers, descriptions, or the sequences
themselves? If so, you could roll your own solution with Bio::SeqIO. Read in
the file and, for each sequence object, pull out the information you are
treating as redundant with one of the accessor methods (such as $obj->desc).
Test that value against a hash populated as you go: if it is already a key in
the hash, skip to the next record; otherwise add it to the hash and write the
record out to a new file.
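A minimal sketch of that approach, assuming FASTA input and that BioPerl's
Bio::SeqIO is installed. The subroutine name remove_redundancy and the
criterion names id/desc/seq are hypothetical, chosen here for illustration;
only the first record seen for each distinct value is kept. Accession numbers
are left out because plain FASTA headers usually don't carry them as a
separate field.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical mapping from a redundancy criterion to the value compared:
# 'id'   = the ID on the FASTA header line (display_id),
# 'desc' = the rest of the header line,
# 'seq'  = the residues themselves, compared case-insensitively.
my %KEY_FOR = (
    id   => sub { $_[0]->display_id },
    desc => sub { $_[0]->desc // '' },
    seq  => sub { uc $_[0]->seq },
);

# Copy the first record for each distinct key from $infile to $outfile.
# Returns the number of records written.
sub remove_redundancy {
    my ($infile, $outfile, $criterion) = @_;
    my $get_key = $KEY_FOR{$criterion}
        or die "Unknown criterion '$criterion' (use id, desc or seq)\n";

    my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

    my %seen;          # keys encountered so far
    my $written = 0;
    while (my $seq = $in->next_seq) {
        my $key = $get_key->($seq);
        next if $seen{$key}++;   # already seen: skip this record
        $out->write_seq($seq);   # first occurrence: keep it
        $written++;
    }
    $in->close;
    $out->close;
    return $written;
}

# Usage (hypothetical script name): perl dedup.pl in.fa out.fa [id|desc|seq]
if (@ARGV >= 2) {
    my ($infile, $outfile, $criterion) = @ARGV;
    my $n = remove_redundancy($infile, $outfile, $criterion // 'seq');
    print "Wrote $n non-redundant records to $outfile\n";
}
```

Swapping the key function is the only change needed to redefine what counts
as redundant; the hash-lookup loop stays the same.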
Center for Biomedical Research,
Dept. of Biology,
University of Victoria
>===== Original Message From Giuseppe Torelli <email@example.com> =====
>Which software do you use to remove redundancy
>from a gene dataset?
>Laboratory of Molecular Evolution
>Stazione Zoologica A. Dohrn
>80121 Naples - Italy
>Tel. 0039 81 5833311
>Fax: 0039 81 7641355
>Bioperl-l mailing list