[Bioperl-l] counting gaps in sequence data
Michael Robeson
popgen23 at mac.com
Thu Oct 14 18:05:57 EDT 2004
I have a set of data that looks something like the following:
>human
acgtt---cgatacg---acgact-----t
>chimp
acgtacgatac---actgca---ac
>mouse
acgata---acgatcg----acgt
I am having trouble setting up a hash etc., to count the number and
types of continuous gaps. For example the 'human' sequence above has 2
sets of 3 gaps and 1 set of 5 gaps. The 'chimp' has 2 sets of 3 gaps
and finally the 'mouse' has 1 set of 3 gaps and 1 set of 4 gaps.
So, I am having trouble being able to assign a dynamic variable (i.e.
gap length) and place that in a pattern match so that it can count how
many gaps of that length are in that particular sequence. I know how to
set up a hash to count the number of times a gap appears:
'$gaptype{$gap}++' or something. The problem is: what is the best way
(and how) can I set '$gap' to be dynamic.
I need to know the length of each consecutive string of gaps. I know
how to count the gaps by using the 'tr' function. But it gets confusing
when I need to add counts to every instance of that gap length. I also
need to know the position of each gap (denoted by the position of the
first gap in that particular instance). I know that I can use the
'pos()' command for this.
So, my problem is that I think I know some of the bits of code to put
into place the problem is I am getting lost on how to structure it all
together. For now I am just trying to get my output to look like this:
Human
number of 3 base pair gaps: 2
at positions: 6, 16
number of 5 base pair gaps: 1
at positions: 25
Chimp
.... and so on ...
So, any suggestions would be greatly appreciated. If anyone can help me
out with all or even just bits of this I would greatly appreciate it.
This should help me get started on some more advanced parsing I need to
do after this. I like to try and figure things out on my own if I can,
so even pseudo code would be of great help!
-Thanks
-Mike
More information about the Bioperl-l
mailing list