s is a vector of length 2^14, whose elements are -1 or +1.
I need to repeatedly permute s in the following way:
the elements are divided into cycles, each of which I need to cyclically permute independently.
The first n_1 elements belong to the first "cycle", the next n_2 elements belong to the second, and so forth.
I would love ideas for how to optimize this process.
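To make the operation concrete, here is a minimal sketch of one way it could be expressed as a precomputed index vector. The data, the cycle lengths in `n`, and the shift direction (last element of each cycle moves to the front) are all assumptions made for illustration, not values from my actual problem.

```matlab
% Hypothetical example data: 8 elements, cycle lengths summing to 8.
s = [1 -1 -1 1 1 1 -1 -1].';
n = [3 2 3];                      % assumed cycle lengths n_1, n_2, n_3

% Build the permutation index vector once, outside any hot loop.
starts = cumsum([1 n(1:end-1)]);  % first index of each cycle
stops  = cumsum(n);               % last index of each cycle
idx = zeros(numel(s), 1);
for k = 1:numel(n)
    % Assumed direction: shift each block by one, last element goes first.
    idx(starts(k):stops(k)) = [stops(k), starts(k):stops(k)-1];
end

% Applying the permutation is then a single indexing operation.
s_new = s(idx);
```

Because `idx` depends only on the cycle lengths, it can be reused for every repetition; only the final `s(idx)` line needs to run each time.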
* Edit for clarification:
Suppose instead that s had 2^3 = 8 elements. Suppose that vector is:
and suppose the cycle lengths are:
Then the desired output is:
What I have done:
I created a block matrix J consisting of cyclic permutation blocks for each cycle, so I have:
I use single-precision gpuArray for J and s.
Because J is a constant matrix consisting only of 0s and 1s, I hope there is a way to speed up this multiplication.
Or perhaps there is another way to solve this?
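One possibility, sketched here under assumptions: since a block matrix of cyclic permutation blocks is a permutation matrix, multiplying by it should be equivalent to indexing with a fixed index vector. The small `J` and `s` below are hypothetical stand-ins (block sizes 3 and 2), and the sketch runs on the CPU; I have not benchmarked it on a gpuArray.

```matlab
% Hypothetical small J: block-diagonal with cyclic shift blocks of sizes 3 and 2.
J = blkdiag(circshift(eye(3), 1), circshift(eye(2), 1));
s = [1 -1 -1 1 -1].';

% Each row of a permutation matrix holds a single 1;
% its column is the source index for that output element.
[~, idx] = max(J, [], 2);

s_mult  = J * s;      % dense matrix-vector product
s_index = s(idx);     % pure indexing, no arithmetic
assert(isequal(s_mult, s_index));

% Repeated application composes the index vector instead of multiplying again.
idx2 = idx(idx);      % corresponds to applying J twice
assert(isequal(J * (J * s), s(idx2)));
```

The dense product touches all N^2 entries of J, while indexing touches only N elements, so for N = 2^14 the indexing form would be expected to do far less work. A sparse J would be another option to try.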