[Bioperl-l] removing redundant accession numbers
kamesh narasimhan
nkamesh at gmail.com
Thu Sep 7 21:51:50 UTC 2006
Hi ppl,
I am a newbie to Perl/BioPerl programming.
I currently have a task (which looks a bit daunting to me right now). I
have a text file containing a set of accession numbers, one per line.
accession_numbers.txt contains lines like these (a '>' followed by two
lowercase letters followed by ten digits):
>ci0100130090
>ci0100130320
>ci0100130340
>ci0100130574
>ci0100130090
>ci0100130804
>ci0100130945
>ci0100130986
>ci0100130090
>ci0100131137
>ci0100131140
>ci0100130320
>ci0100130340
>ci0100130804
>ci0100130945
Some of the accession numbers are repeated in the file. For example,
>ci0100130090 appears 3 times, while >ci0100130320 and >ci0100130340
appear 2 times each.
I would like a program that writes an output file telling me how many
times each accession number occurs, like this:
output file.txt:
>ci0100130090 - 3 times
>ci0100130320 - 2 times
.......
I tried writing a Perl script that sets $/ = '>' so that each record is
read into an array element. However, I am not able to proceed from there
and seem to be going nowhere.
Any help with the script (with comments, if possible) would be greatly
appreciated.
Thanks a zillion in advance
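
One possible way to approach the counting described above is sketched here. It is only a sketch under some assumptions: each accession sits on its own line (so the default line-by-line reading works and $/ does not need to be changed), the input file is called accession_numbers.txt, and the output file name output_file.txt and the helper count_accessions are made up for illustration. The idea is to use a hash keyed on the accession so that each repeat just increments a counter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each accession occurs in a list of lines.
# Returns a hash reference (accession => count) and an array reference
# holding the accessions in order of first appearance.
# count_accessions is a hypothetical helper name, not a BioPerl function.
sub count_accessions {
    my @lines = @_;
    my (%count, @order);
    for my $line (@lines) {
        $line =~ s/\s+\z//;    # strip the newline and any trailing whitespace
        # Assumed format: '>' + two lowercase letters + ten digits;
        # anything else (blank lines, stray text) is skipped.
        next unless $line =~ /^>[a-z]{2}\d{10}$/;
        # $count{$line}++ returns the old value, so the accession is
        # remembered in @order only the first time it is seen.
        push @order, $line unless $count{$line}++;
    }
    return (\%count, \@order);
}

# File names are assumptions; adjust them to match your own files.
my $in_file  = 'accession_numbers.txt';
my $out_file = 'output_file.txt';

if (-e $in_file) {
    open my $in, '<', $in_file or die "Cannot read $in_file: $!";
    my ($count, $order) = count_accessions(<$in>);
    close $in;

    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    for my $acc (@$order) {
        print {$out} "$acc - $count->{$acc} times\n";
    }
    close $out;
}
else {
    warn "Input file $in_file not found, nothing to do\n";
}
```

Accessions that occur only once are also written (as "1 times"). To list only the repeated ones, a condition such as `next if $count->{$acc} < 2;` could be added inside the final loop.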