Function ecdf breaks down for large datasets
Hi,
I have a very large vector x (around 130 million elements). When I try to compute the empirical cumulative distribution function of its values with MATLAB's "ecdf(x)" command, the function breaks down: the plot shows the ECDF only for the smaller values of x, and nothing at all for the larger values. When I run ecdf on only part of the vector (say 10 million elements), the results look fine. Does anyone know what could be wrong with ecdf that makes it break down like this for very large datasets?
Thank you very much for your help.
Martin
Answers (1)
Mathieu Boutin
8 Sep 2011
Hi Martin. You could try my homemade function and see whether it works for you:
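%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [v_f,v_x] = homemade_ecdf(v_data)
% Empirical CDF of v_data, returned as row vectors v_f (values) and v_x (points).
v_data = v_data(:).';                    % force a row vector so the concatenations below also work for column input
nb_data = numel(v_data);
v_sorted_data = sort(v_data);
v_unique_data = unique(v_data);          % distinct values, in ascending order
nb_unique_data = numel(v_unique_data);
v_data_ecdf = zeros(1,nb_unique_data);
for index = 1:nb_unique_data
    current_data = v_unique_data(index);
    % fraction of samples <= the current value (one full pass over the data per distinct value)
    v_data_ecdf(index) = sum(v_sorted_data <= current_data)/nb_data;
end
% repeat the smallest value with F = 0 so the curve starts at zero
v_x = [v_unique_data(1) v_unique_data];
v_f = [0 v_data_ecdf];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%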
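The loop in the function above scans the whole vector once per distinct value, which may be too slow for 130 million elements. A possible alternative is sketched below: after a single sort, the position of the last occurrence of each distinct value is already the number of samples less than or equal to it. This is only a sketch under stated assumptions, not a replacement for ecdf: the example vector is made up, NaNs are simply dropped, and at least one non-NaN value is assumed.

```matlab
% Example data, made up for illustration; replace with the real vector.
x = [3 1 2 2 NaN];

x = x(:);                       % work on a column vector
x = x(~isnan(x));               % assumption: NaN entries are dropped
n = numel(x);                   % assumption: n >= 1
xs = sort(x);                   % one sort of the data

% true at the last occurrence of each distinct value in the sorted vector
is_last = [xs(1:end-1) ~= xs(2:end); true];

% index of a last occurrence = number of samples <= that value
counts = find(is_last);

% same layout as the function above: smallest value repeated with F = 0
v_x = [xs(1); xs(is_last)];
v_f = [0; counts / n];
```

For the example vector this gives v_x = [1; 1; 2; 3] and v_f = [0; 0.25; 0.75; 1], and stairs(v_x, v_f) would plot the step function. Note that sort makes a second copy of the data, so for 130 million doubles (about 1 GB per copy) memory may still be the limiting factor.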