Function ecdf breaks down for large datasets

Martin, 24 Feb 2011
Hi,
I have a very large vector x (around 130 million elements). When I try to compute the empirical cumulative distribution function of its values using MATLAB's ecdf(x), the function breaks down: the plot shows the ECDF only for the smaller values of x and nothing at all for the larger values. When I run ecdf on only part of the vector (say, 10 million elements), the results seem OK. Does anyone know what could be wrong with ecdf that makes it break down like this for very large datasets?
Thank you very much for your help.
Martin
Comments: 1
Martin, 8 Mar 2011
Is there anyone who can help me with this issue? Thanks.


Answers (1)

Mathieu Boutin, 8 Sep 2011
Hi Martin. You could try my new homemade function and see if it works fine:
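%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [v_f,v_x] = homemade_ecdf(v_data)
% Empirical CDF of v_data: v_f(k) is the fraction of samples <= v_x(k).
v_data = v_data(:).';                 % force a row vector so the concatenations below work
nb_data = numel(v_data);
v_unique_data = unique(v_data);       % sorted distinct values
nb_unique_data = numel(v_unique_data);
v_data_ecdf = zeros(1,nb_unique_data);
% Each pass scans the whole vector, so the cost grows with nb_data*nb_unique_data.
for index = 1:nb_unique_data
    current_data = v_unique_data(index);
    v_data_ecdf(index) = sum(v_data <= current_data)/nb_data;
end
% Repeat the smallest value with F = 0 so the curve starts at zero.
v_x = [v_unique_data(1) v_unique_data];
v_f = [0 v_data_ecdf];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%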
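The loop in the function above rescans the data once per unique value, which may become very slow if a 130-million-element vector has many distinct values. Below is a minimal sketch of a single-sort alternative that should return the same outputs. The name sorted_ecdf is hypothetical (it is not a MATLAB built-in), the code assumes a non-empty numeric input with no NaN values, and it has not been tried on the original dataset.

```matlab
function [v_f, v_x] = sorted_ecdf(v_data)
% Sketch only: sorted_ecdf is a hypothetical name, not a toolbox function.
% Assumes v_data is non-empty, numeric, and contains no NaN values.
v_sorted = sort(v_data(:).');            % ascending row vector
nb_data = numel(v_sorted);
% Mark the last position of each run of equal values. That position is
% the number of samples <= the value, i.e. the cumulative count.
is_last = [diff(v_sorted) ~= 0, true];
v_unique = v_sorted(is_last);            % distinct values, ascending
v_counts = find(is_last);                % cumulative counts per distinct value
% Same output layout as homemade_ecdf: smallest value repeated with F = 0.
v_x = [v_unique(1), v_unique];
v_f = [0, v_counts / nb_data];
end
```

Calling [f, x] = sorted_ecdf(x_data) followed by stairs(x, f) would plot the step function. The sort dominates the cost here, so the work no longer depends on the number of distinct values.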
