Best way to create a large,iterative database of frequency count, of binary numbers varying by 1 bit?
정보
이 질문은 마감되었습니다. 편집하거나 답변을 올리려면 질문을 다시 여십시오.
이전 댓글 표시
So, I have a general question, not necessarily looking for code, but just a general idea of how to accomplish this task in the best way possible.
To start, I will have a string, that will be several thousands bits long, of simple 1's and 0's. What I want to do, is for each new bit (1 or 0) added to my string, take the last n bits off the end, and store that sequence into a database.
If the string is a NEW/unique string not seen before, I'd like to add it to the database. If it is a repeated sequence, I'd like to add to the count.
What I initially did, since these are binary numbers, using n-bits, was to create a matrix of all possible permutations, where the possible events would be 2^n. So if n==3, I would have 000,001,010,011,100,101,110,111.
And run through my big string 1 bit at a time, look at the latest n bits on the end, and compare to each value in my list and update the count of each occurrence.
What is important, is that I am able to compare the frequency of the sequences that differ by the last bit. So if n==3, I would like to compare for example, the frequency/percentage of occurrence of:
- 00 0 vs 00 1
- 01 0 vs 01 1
- 10 0 vs 10 1
- 11 0 vs 11 1
For now I'm not too concerned with efficiency, just a basic working framework in place.
So in general I am wondering if there is a better way to accomplish what I am trying to do other than the way I just described above?
Thanks!
댓글 수: 0
답변 (1개)
Walter Roberson
2013년 11월 8일
Let B be the bit values in numeric form (0 vs 1), already arranged n wide per row
IDX = 2.^(n-1:-1:0).' * B + 1; %matrix multiplication not .*
numents = length(IDX);
counts = accumarray(IDX(:), 1, [1, 2^n]);
Now counts(K) is the number of occurrences of the bit sequence whose decimal equivalent is K-1.
The marginals are now
T = 2 * (0:(2^(n-1)-1));
marginals = counts([T + 1, T + 2]);
I did not convert the marginal counts into ratios or relative percentages because of the possibility that one or both might be 0.
댓글 수: 6
Walter Roberson
2013년 11월 8일
Numeric array, and it should come from your actual data. For example if you have the signal processing toolbox then you could use buffer with a width "n" and an overlap "n-1", and then transpose the result of buffer()
For example if the input was 0110011 and n = 3 then you would generate
[0 1 1
1 1 0
1 0 0
0 0 1
0 1 1]
Walter Roberson
2013년 11월 12일
There is not much point in creating the list of all possible permutations first: you can just take any particular vector of the right length and compute an integer array index number from it directly.
Counting as you go along is less efficient for the counting work than creating the kind of B array I describe and doing all the counting in a couple of instructions; on the other hand taking that approach requires more memory.
Counting on the fly or reading everything and then counting are both workable solutions.
Forrest
2013년 11월 12일
Forrest
2013년 11월 12일
이 질문은 마감되었습니다.
제품
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!