Checking repetition of random data

Question

0 votes

I need your help, please. I generated random data, for example T1 = randn(1000,1); T2 = randn(1000,1); ... T100 = randn(1000,1); and I want to check whether any of the T vectors is repeated and, if so, remove the repeats. How can I do that? Thanks in advance :)

Regards, Ahmed

Comments: 11

Star Strider, 7 January 2018

@Ahmed — See the documentation on rng (link), and more generally, the discussion on Generate Random Numbers That Are Repeatable (link).

Student for ever, 7 January 2018

Edited: Jan, 7 January 2018

Dear Jan,

I am new to MATLAB :). Maybe my question was not clear, but I think your answer is very close to what I want to do. I have 200 time series that came from a parallel computation, and I just want to make sure that none of the 200 is repeated (that is, if I plot them, they should give different graphs). So I put all 200 time series into a matrix with 200 columns, and I want to check that no two columns are the same.


Answer 1

Jan, 4 January 2018

Edited: Jan, 8 January 2018

2 votes

Do not create a list of variables called T1, T2, ... See https://www.mathworks.com/matlabcentral/answers/57445-faq-how-can-i-create-variables-a1-a2-a10-in-a-loop. Use a cell or multidimensional array instead.

I assume you want no repeated values within each vector or across vectors. In that case, first draw 1000*100 distinct random numbers:

ready = false;
while ~ready
  Pool  = rand(1, 100000);
  ready = (length(unique(Pool)) == length(Pool));
end
T = reshape(Pool, 1000, 100);

Maybe this is faster:

ready = all(diff(sort(Pool)));
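
To see why the two tests agree, here is a small sketch on hand-picked vectors (the names PoolDup and PoolOk are only illustrative, not from the original post). After sorting, equal values sit next to each other, so a zero in diff(sort(...)) signals a repeat:

```matlab
% Illustrative data, chosen by hand
PoolDup = [0.3 0.7 0.1 0.7];   % the value 0.7 appears twice
PoolOk  = [0.3 0.7 0.1 0.9];   % all values are distinct

% Test 1: the number of unique values equals the number of elements
uniqDup = numel(unique(PoolDup)) == numel(PoolDup);
uniqOk  = numel(unique(PoolOk))  == numel(PoolOk);

% Test 2: after sorting, no two neighbouring values are equal
diffDup = all(diff(sort(PoolDup)));
diffOk  = all(diff(sort(PoolOk)));
```

Both tests compare values exactly, so they should give the same verdict on the same data.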

[EDITED] If all you want is to create a unique set of vectors, and randn was just an example to create test data for the forum:

[T, Idx] = unique(T, 'rows')

[EDITED] And for unique columns:

T = unique(T.', 'rows').'
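
If you also want to know which columns were repeated, and to keep the remaining columns in their original order, a minimal sketch could look like the following. The test data, the seed, and the names firstIdx, dupCols and Tclean are assumptions made for illustration:

```matlab
rng(0);                       % fixed seed so the sketch is repeatable
T = randn(1000, 5);           % hypothetical data: 5 series stored as columns
T(:, 4) = T(:, 2);            % make column 4 an exact copy of column 2

% Transpose so that 'rows' compares whole series; 'stable' keeps the
% first occurrence of each series in its original order.
[~, firstIdx] = unique(T.', 'rows', 'stable');

isDup = true(1, size(T, 2));
isDup(firstIdx) = false;      % first occurrences are not duplicates
dupCols = find(isDup);        % indices of the repeated columns
Tclean  = T(:, firstIdx);     % matrix with repeated columns removed
```

Note that unique compares values exactly, so two series that differ only by rounding error would count as different.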

Comments: 1

Student for ever, 8 January 2018

Thank you so much


Answer 2

Birdman, 4 January 2018

1 vote

Firstly, generate random data as follows:

T=randn(1000,100);

Second, as Adam said, use the unique function to check for repetitions.

Tun=unique(T,'stable');

The 'stable' option preserves the original order of the values.
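
Without the 'rows' option, unique treats the matrix as one long list of values and returns a column vector, so this form checks for repeated values rather than repeated columns. A minimal sketch with a tiny hand-made matrix (the variable hasRepeatedValues is an illustrative name):

```matlab
T = [1 2; 3 1];                              % the value 1 appears twice
Tun = unique(T, 'stable');                   % values in order of first appearance, read column by column
hasRepeatedValues = numel(Tun) < numel(T);   % true when at least one value repeats
```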

Comments: 5

Jan, 4 January 2018

A "habit"? :-) I'd suggest using time-consuming methods only if they are needed for the result.

Birdman, 4 January 2018

It is needed for the result, exactly.


Answer 3

John BG, 5 January 2018

0 votes

Hi Ahmed

This is John BG (jgb2012@sky.com).

So far, the supplied answers increase the probability of generating all-different random Ts.

Each of the answers improves the randomness of the generation. But if you really want to make sure that all T sequences are different once generated (say you have no control over the randomness of the data, and the suggested randn(1000,1) is only your model), then there is no other way than comparing them in pairs.

1.

Let N be the number of T sequences

N=5

2.

then all possible pairs of T sequences are

L=combinator(N,2,'c')
L =
   1     2
   1     3
   1     4
   1     5
   2     3
   2     4
   2     5
   3     4
   3     5
   4     5

3.

As Jan Simon mentions, sometimes it's more practical to put all data in a structure that can be indexed, instead of working with N different sequence names.

Let T be all your input Ti sequences combined into a single matrix, one sequence per row

T=randi([1 10],N)
T =
   2     3     9     3
   5     8    10     9
  10     3     6     3
   4     6     2     9
   6     7     2     3

4.

Check whether any two sequences are equal

D=[0 0];   % placeholder first row, removed in step 5
for k=1:1:size(L,1)
    if isequal(T(L(k,1),:),T(L(k,2),:))
        D=[D;L(k,:)];   % record the pair of identical rows
    end
end

5.

Removing repeated sequences

if size(D,1)>1
    D(1,:)=[];        % drop the placeholder row
    T(D(:,1),:)=[];   % removing one of the repeated identical pairs
end
T


Ahmed, I overwrote some sequences on purpose so that D records the repeated sequences it spots. These few lines then remove every repetition without losing data, even when a given sequence is repeated more than once, and it works.
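
For readers who do not have combinator available (it is not part of base MATLAB), the same pairwise check can be sketched with the built-in nchoosek, which returns the pairs in the same order for this case. The small test matrix and the names pairs, isDupPair and dupPairs are assumptions made for illustration:

```matlab
% Hypothetical data: row 3 repeats row 1, and row 5 repeats row 2
T = [1 2 3; 4 5 6; 1 2 3; 7 8 9; 4 5 6];

pairs = nchoosek(1:size(T, 1), 2);        % all index pairs (i, j) with i < j
isDupPair = false(size(pairs, 1), 1);     % one flag per pair
for k = 1:size(pairs, 1)
    isDupPair(k) = isequal(T(pairs(k, 1), :), T(pairs(k, 2), :));
end
dupPairs = pairs(isDupPair, :);           % pairs of identical rows

% Delete the earlier row of each identical pair, keeping the last copy
T(unique(dupPairs(:, 1)), :) = [];
```

Because the earlier row of every identical pair is deleted, the last copy of each repeated sequence is the one that survives.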

If you find this answer useful would you please be so kind to consider marking my answer as Accepted Answer?

To any other reader, if you find this answer useful please consider clicking on the thumbs-up vote link

thanks in advance for time and attention

John BG

jgb2012@sky.com

Comments: 12

John BG, 6 January 2018

Timing the candidate solutions on 100 strings shows that unique is the fastest option:

N=100
M=1000
% T=randi([1000 9999],N,M);
T=repmat(randi([1000 9999],1,M),N,1);
tic
D=[0 0];
L=combinator(N,2,'c');
for k=1:1:size(L,1)
    if isequal(T(L(k,1),:),T(L(k,2),:))
        D=[D;L(k,:)];
    end
end
if size(D,1)>1
    D(1,:)=[];
    T(D(:,1),:)=[]; % removing one of the repeated identical pairs
end
toc

100: Elapsed time is 0.068783 seconds.

1000: Elapsed time is 7.953736 seconds.

single string repeated 100 times: Elapsed time is 0.112921 seconds.

tic
L=combinator(N,2,'c');
D  = zeros(size(L, 1), 2);   % Pre-allocation!!!
iD = 0;
for k = 1:size(L,1)
    if isequal(T(L(k,1),:),T(L(k,2),:))
        iD       = iD + 1;
        D(iD, :) = L(k, :);
    end
end
D = D(1:iD, :);
toc

100: Elapsed time is 0.075002 seconds.

1000: Elapsed time is 7.884542 seconds.

single string repeated 100 times: Elapsed time is 0.085103 seconds.

tic
L=combinator(N,2,'c');
dup = false(size(L, 1), 1);   % Pre-allocation: one flag per pair
for k = 1:size(L,1)
    if isequal(T(L(k,1),:),T(L(k,2),:))
        dup(k) = true;
        break;   % stop at the first identical pair found
    end
end
L = L(dup, :);
toc

100: Elapsed time is 0.062778 seconds.

1000: Elapsed time is 7.863167 seconds.

single string repeated 100 times: Elapsed time is 0.030683 seconds.

tic
nT   = size(T, 1);
keep = true(nT, 1);  % Pre-allocation!!!
for i1 = 1:nT
    for i2 = i1 + 1:nT
        if isequal(T(i1, :), T(i2, :))
            keep(i1) = false;
            break;         % No need to proceed the search
        end
    end
end
T = T(keep, :);
toc

100: Elapsed time is 0.068909 seconds.

1000: Elapsed time is 7.784486 seconds.

single string repeated 100 times: Elapsed time is 0.034376 seconds.

tic
[T2, Idx] = unique(T, 'rows');
toc

100: Elapsed time is 0.023476 seconds.

1000: Elapsed time is 0.061907 seconds.

single string repeated 100 times: Elapsed time is 0.024031 seconds.

As the number of strings increases, unique outperforms every other solution in run time.
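
A rough way to reproduce this comparison on your own machine is sketched below. It assumes a MATLAB release that provides timeit; the sizes and the variable names tLoop and tUnique are illustrative, and the measured times will differ from the figures quoted above:

```matlab
N = 200;  M = 1000;                   % illustrative sizes
T = randi([1000 9999], N, M);         % random integer rows, as in the tests above

% Pairwise loop: a row is dropped when an identical row appears later
tic
keep = true(N, 1);
for i1 = 1:N
    for i2 = i1 + 1:N
        if isequal(T(i1, :), T(i2, :))
            keep(i1) = false;
            break;                    % one later match is enough
        end
    end
end
tLoop = toc;

% unique is timed with timeit, which repeats the call for a steadier estimate
tUnique = timeit(@() unique(T, 'rows'));

fprintf('pairwise loop: %.4f s, unique: %.4f s\n', tLoop, tUnique);
```

The loop needs on the order of N^2 row comparisons when there are no duplicates, which is consistent with the gap widening as N grows.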

Regards

John BG

jgb2012@sky.com

Stephen23, 8 January 2018

Edited: Stephen23, 8 January 2018

The help clearly states that "Answers can only be accepted by someone other than the author of the question after 7 days of inactivity from the author".

Student for ever, 9 January 2018

Edited: Student for ever, 9 January 2018

Thanks, all, for helping; your comments are really useful to me. @John BG, I have already accepted Jan's answer.
