[Bioperl-l] removing redundant accession numbers

Thu Sep 7 21:21:33 UTC 2006

Hi ppl,

I am a newbie to Perl/BioPerl programming.

I currently have a task that looks a bit daunting to me right now. I
have a text file containing a set of accession numbers, which look
like this:

acession_numbers.txt contains (each entry is a '>' followed by two
lowercase letters followed by ten digits):

>ci0100130090
>ci0100130320
>ci0100130340
>ci0100130574
>ci0100130090
>ci0100130804
>ci0100130945
>ci0100130986
>ci0100130090
>ci0100131137
>ci0100131140
>ci0100130320
>ci0100130340
>ci0100130804
>ci0100130945

Some of the accession numbers are repeated in the file. For example,
>ci0100130090 appears 3 times, while >ci0100130320 and >ci0100130340
appear 2 times each.

I would like a program that writes an output file listing how many
times each accession number occurs, like this:

output file.txt

>ci0100130090 - 3 times
>ci0100130320 - 2 times
.......
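
A minimal sketch of one way to produce counts like these, assuming one
accession per line and a hash keyed on the accession. The sub names
count_accessions and format_counts are made up for illustration, and the
sample data is just a few of the lines from above read through an
in-memory filehandle:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each accession appears on a filehandle.
# Returns a hash reference: accession => number of occurrences.
sub count_accessions {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                          # tolerate Windows line endings
        next unless $line =~ /^>[a-z]{2}\d{10}$/;  # skip lines not shaped like an accession
        $count{$line}++;                           # the hash does the bookkeeping
    }
    return \%count;
}

# Turn the counts into output lines, sorted by accession.
sub format_counts {
    my ($count) = @_;
    return map { "$_ - $count->{$_} times\n" } sort keys %$count;
}

# Demo on a handful of sample lines (assumption: the real input has the same shape).
my $sample = join "\n",
    qw(>ci0100130090 >ci0100130320 >ci0100130090 >ci0100130574 >ci0100130090 >ci0100130320), '';
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
print format_counts(count_accessions($fh));
close $fh;
```

For a real file, the in-memory open could be swapped for something like
open my $fh, '<', 'acession_numbers.txt' and the print redirected to a
second filehandle opened for writing.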

I tried writing a Perl script with the idea of setting $/ = '>' and
reading each element into an array. However, I am not able to proceed
and seem to be going nowhere.
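
For reference, a rough sketch of how the $/ = '>' idea might look,
assuming the records are separated only by '>' and whitespace. The sub
name count_records and the sample string are invented for illustration;
the counting goes into a hash rather than an array:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count accessions by reading '>'-delimited records instead of lines.
sub count_records {
    my ($fh) = @_;
    my %count;
    local $/ = '>';                # a "line" now ends at each '>' (restored on return)
    while (my $record = <$fh>) {
        chomp $record;             # chomp strips the trailing $/, i.e. the '>'
        $record =~ s/\s+\z//;      # drop the newline left at the end of the record
        next if $record eq '';     # first record is empty because the data starts with '>'
        $count{">$record"}++;      # put the '>' back so keys match the file
    }
    return \%count;
}

# Small in-memory sample (assumption: same layout as the real file).
my $data = ">ci0100130090\n>ci0100130320\n>ci0100130090\n";
open my $in, '<', \$data or die "Cannot open in-memory sample: $!";
my $count = count_records($in);
close $in;
print "$_ - $count->{$_} times\n" for sort keys %$count;
```

Since each accession already sits on its own line, changing $/ is not
strictly needed here; plain line-by-line reading would give the same
counts.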

Any help with the script (with comments, if possible) would be
greatly appreciated.

Thanks a zillion in advance