Create (equal density) spaced vector in MATLAB
I would like to create an unevenly spaced x-axis vector. linspace() produces an equally spaced vector, but applying it to my data (image shown below) loses a major chunk of information in the high-density region. So I would like to produce an unequally spaced vector whose spacing adapts to the data density. (Please suggest any other method that you think would work better for my dataset.)
Thanks,

Accepted Answer
William Rose
27 Sep 2024
Edited: William Rose
27 Sep 2024
[Sorry if this is a duplicate answer. My first attempt to post produced an error message.]
Here are commands that work for me. I can't run them in this window, since the data file, prec_ev_data.mat, is too big to attach, even if I zip it.
data=load('prec_ev_data.mat');
x=data.prec_ev_data(:,1);
y=data.prec_ev_data(:,2);
xs=sort(x);
idx=1:10000:length(x);
xp=xs(idx);
xp=[xp;max(x)]; % add the max value
After the commands above, xp (x for plotting) is a column vector of length 1448. The first and last elements are the minimum and maximum x values in the data. The other values are spaced so there will be 10000 data points between each value in xp.
The trouble with the result above is that you have more than 10000 data pairs sharing the same x-value. Therefore the first 4 values of xp() are identical, the next 7 values of xp() are identical, and so on. Eliminate the duplicate values in xp:
xpu=unique(xp);
disp([length(xp),length(xpu)])
1448 1033
Now xpu has 1033 unique x-values for plotting. They are unevenly spaced and increasing. There are 10000, or sometimes more, data pairs with x-values between each value in xpu.
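For anyone who wants to try the idea without the original data file, here is a minimal sketch on synthetic data. The skewed vector x and the step k below are made up for illustration (they are not from prec_ev_data.mat). The sketch builds the grid the same way and then counts the samples in each interval, to confirm that the spacing follows the density.

```matlab
% Synthetic, strongly skewed data (hypothetical stand-in for prec_ev_data)
N = 100000;
x = 1e6 * ((1:N)' / N).^4;     % most values crowd near zero

k   = 5000;                    % number of samples per interval (arbitrary)
xs  = sort(x);
xp  = xs(1:k:N);               % every k-th sorted value
xp  = [xp; xs(end)];           % append the maximum
xpu = unique(xp);              % drop repeats caused by tied x-values

% Count the samples falling in each interval [xpu(i), xpu(i+1))
counts = zeros(numel(xpu)-1, 1);
for i = 1:numel(xpu)-1
    counts(i) = sum(x >= xpu(i) & x < xpu(i+1));
end
disp([numel(xpu), min(counts), max(counts)])
disp([min(diff(xpu)), max(diff(xpu))])   % narrow spacing where data is dense
```

The interval widths differ by many orders of magnitude, while the sample count per interval stays essentially constant.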
Comments: 10
@Abinesh G, if you want a finer grid of x-values for plotting and analysis, adjust idx in the code above, or combine 2 lines. For example,
% idx=1:10000:length(x);
% xp=xs(idx);
xp=xs(1:1000:length(x)); % 1000 data pairs between xp values
You will still need to use xpu=unique(xp); to remove duplicates.
William Rose
27 Sep 2024
Edited: William Rose
27 Sep 2024
Here is an example of how you could use the vector xpu in your analysis: Find the mean y-value of the samples in each bin.
This code doesn't run in this window, since the data file is too big to attach. It assumes the data vectors x and y are available, from commands listed above, and xpu has been computed using the commands above. size(xpu)=1033x1.
ymn=zeros(size(xpu)); % allocate vector for mean values of y
for i=1:length(xpu)-1, ymn(i)=mean(y(x>=xpu(i) & x<xpu(i+1))); end
ymn(end)=mean(y(x==xpu(end))); % last value in ymn
Plot the results. Include a plot of all the mean values and a separate plot of the low-x range, where most of the data is concentrated.
figure; subplot(211); plot(xpu,ymn,'-r.');
grid on; xlabel('X'); ylabel('mean(Y)')
subplot(212); plot(xpu,ymn,'-r.');
grid on; xlabel('X'); ylabel('mean(Y)'); xlim([0 5e5])
The commands above produce the figure below.

Most of the bins have 10000 elements in them, but some bins have more, and the last bin has only one sample in it. The low end x-value of each bin is used as the x-value for plotting.
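The loop above can be slow when there are many bins. A possible vectorized alternative, sketched here on small made-up data, uses discretize and accumarray. Note one difference from the loop: discretize puts samples equal to the last edge into the last bin, so the result has one element fewer than xpu and there is no separate bin for x==xpu(end).

```matlab
% Made-up example data (not the poster's data)
x = (1:12)';
y = 2*x;
edges = [1; 5; 9; 12];            % plays the role of xpu

% Bin index of every sample; assumes every x lies within
% [edges(1), edges(end)], which holds when edges contains min(x) and max(x)
bin = discretize(x, edges);
nb  = numel(edges) - 1;           % number of bins
ymn = accumarray(bin, y, [nb 1], @mean);   % mean of y within each bin
disp([edges(1:nb), ymn])          % left bin edge and bin mean
```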
Thanks a lot for the detailed response. I am attaching sampled data, kept within 5 MB. My goal is to build a quantile decision tree using 'quantilePredict' and extract the data points lying outside the 90th percentile line. 'quantilePredict' needs predictor data, which I initially generated with 'linspace'. However, that gives an equally spaced vector, so I cannot capture the behaviour of the data in the high-density regions.
Your suggestion of indexing the unique sorted data is sensible. I will give it a try. If you have any other suggestions for my problem, kindly let me know.
I could be wrong (because I don't fully understand your problem) but I think if you wanted to spread out the x axis to make it more uniform you'd want to use inverse transform sampling. Basically you find the CDF of your data and invert it. See https://en.wikipedia.org/wiki/Inverse_transform_sampling for more info.
However I suspect this may be an XY Problem ( https://en.wikipedia.org/wiki/XY_problem ) where you're asking us to solve something that is not ultimately what you should be wanting to do.
"extract the data points existing outside the 90 percentile line"
The commands below extract the x,y pairs with the bottom 5%, middle 90%, and top 5% of x values. I realize this is not exactly what you want, but it is related.
data=load('prec_ev_data.mat');
x=data.prec_ev_data(:,1);
y=data.prec_ev_data(:,2);
N=length(x);
[xs,xOrder]=sort(x);
ys=y(xOrder); % y, sorted by sort order of x
xsLo=xs(1:round(N/20)); % lowest 5% of x values
ysLo=ys(1:round(N/20)); % y values corresponding to xsLo
xsMid=xs(round(N/20)+1:round(0.95*N)); % middle 90% of x values
ysMid=ys(round(N/20)+1:round(0.95*N)); % y values corresponding to xsMid
xsHi=xs(round(0.95*N)+1:end); % top 5% of x values
ysHi=ys(round(0.95*N)+1:end); % y values corresponding to xsHi
figure;
subplot(311), scatter(xsLo,ysLo,24,'r')
xlabel('X_{Low}'); ylabel('Y'); grid on
subplot(312), scatter(xsMid,ysMid,24,'g')
xlabel('X_{Mid}'); ylabel('Y'); grid on
subplot(313), scatter(xsHi,ysHi,24,'b')
xlabel('X_{High}'); ylabel('Y'); grid on
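If the aim is to flag the points lying outside a 5% to 95% band of y (rather than of x), one rough option is to compute empirical quantiles of y within each equal-count x bin. This is a simple binned estimate, not quantilePredict and not a quantile regression forest, and the data, the bin size k, and the variable names below are invented for illustration.

```matlab
% Invented data: x is heavily skewed, the spread of y grows with x
N = 20000;
n = (1:N)';
x = 1e6 * (n/N).^4;
y = sin(0.37*n) .* (1 + x/1e6);

k     = 2000;                          % samples per bin (arbitrary)
xs    = sort(x);
edges = unique([xs(1:k:N); xs(end)]);  % equal-count bin edges
bin   = discretize(x, edges);          % bin index of each sample
nb    = numel(edges) - 1;

isOut = false(N, 1);                   % true for points outside the band
for b = 1:nb
    inBin = find(bin == b);
    ysb   = sort(y(inBin));
    m     = numel(ysb);
    lo    = ysb(max(1, round(0.05*m)));   % approximate 5th percentile
    hi    = ysb(round(0.95*m));           % approximate 95th percentile
    isOut(inBin) = y(inBin) < lo | y(inBin) > hi;
end
fprintf('%d of %d points flagged (%.1f%%)\n', nnz(isOut), N, 100*mean(isOut));
```

The flagged pairs are then x(isOut) and y(isOut). Because every bin holds about the same number of samples, the quantile estimates are equally well supported in the dense and the sparse regions.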

Thanks for responding. I may not have communicated my problem properly. But inverse transform sampling, as mentioned by @Image Analyst, and your suggestion of indexing the unique sorted data may help me produce an unequally spaced ordered vector (with some modifications). So for now I am closing the question by accepting your answer.
@Abinesh G, you're welcome. The suggestions of @Star Strider and @Image Analyst are always valuable.
Are you trying to make a decision tree (or a decision tree forest) to predict Y from X, where X and Y are vectors?
data=load('prec_ev_data.mat');
x1=data.prec_ev_data(:,1);
y1=data.prec_ev_data(:,2);
N1=length(x1);
% There are 900K x1,y1 pairs. Use a random 1% of them for this example.
idx1=randperm(N1); % random rearrangement of indices
x=x1(idx1(1:round(N1/100))); % random 1% of the x1 values
y=y1(idx1(1:round(N1/100))); % corresponding y1 values
Mdl1= TreeBagger(50,x,y,Method="regression")
Mdl1 =
TreeBagger
Ensemble with 50 bagged decision trees:
Training X: [9000x1]
Training Y: [9000x1]
Method: regression
NumPredictors: 1
NumPredictorsToSample: 1
MinLeafSize: 5
InBagFraction: 1
SampleWithReplacement: 1
ComputeOOBPrediction: 0
ComputeOOBPredictorImportance: 0
Proximity: []
For predX, use 21 x-values, chosen so there are approximately equal numbers of samples between successive values of predX.
N=length(x);
xs=sort(x);
idx=[1,round(N*(.05:.05:1))];
predX=xs(idx);
%predX=linspace(min(x),max(x),21)'
YQuantiles = quantilePredict(Mdl1,predX,'Quantile',[0.05,0.5,0.95]);
Error using sparse
Third input must be double or logical.
Error in CompactTreeBagger>localGetTrainingNodes (line 2224)
S = sparse(1:sum(ibtf),tnode,tw(ibtf));
Error in CompactTreeBagger/quantileNode (line 1880)
[trainNode,TW] = localGetTrainingNodes(trees,Xtrain,wtrain,ibIdx);
Error in CompactTreeBagger/quantilePredictCompact (line 1924)
quantileNode(bagger,X,Xtrain,Wtrain,varargin{:});
Error in TreeBagger/quantilePredict (line 1537)
[varargout{1:nargout}] = quantilePredictCompact(bagger.Compact,X,...
figure; hold on
plot(x,y,"r.");
plot(predX,YQuantiles)
xlabel('X'); ylabel('Y');
legend('Data','5%','Median','95%')
I am not sure why this error happens. Maybe you can figure it out. I have tried various changes (different number of trees, different length of predX, different values for Quantile vector, changing "...,'Quantile',[...])" to "...,Quantile=[...])", changing predX to a linspace vector, etc. Nothing helped. I also viewed vectors idx and predX to confirm that they look reasonable.
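The error message itself says that sparse received a third input that is neither double nor logical. The thread never establishes the cause, so the following is only a guess worth checking: if the variables in the .mat file are stored as single (or an integer type), it may help to look at the class of x and y and cast them to double before training. The check and cast are shown here on stand-in values rather than the real data.

```matlab
% Stand-in values; with the real data, x and y come from prec_ev_data
x = single([0.5; 1.5; 2.5]);
y = single([1; 2; 3]);

fprintf('class(x) = %s, class(y) = %s\n', class(x), class(y));

% Cast to double before training and predicting (an untested assumption
% about the cause of the error, not a confirmed fix)
x = double(x);
y = double(y);
```

After the cast, TreeBagger and quantilePredict would be called exactly as above.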
@Abinesh G for what it's worth, I'm attaching my demo of using inverse transform sampling to draw samples from a Rayleigh distribution using the formula for one input, and a uniform sampling of random numbers for the other input. Output is random numbers as if they were drawn from a Rayleigh distribution even though we started with random numbers drawn from a uniform distribution.
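The attached demo is not reproduced here. As a rough sketch of the same idea, assuming the standard Rayleigh CDF F(r) = 1 - exp(-r^2/(2*sigma^2)), whose inverse is r = sigma*sqrt(-2*log(1-u)), uniform random numbers can be mapped to Rayleigh-distributed ones as follows (sigma and the sample count are arbitrary choices):

```matlab
sigma = 2;                           % Rayleigh scale parameter (arbitrary)
nSamp = 10000;                       % number of samples (arbitrary)
u = rand(nSamp, 1);                  % uniform random numbers in (0,1)
r = sigma * sqrt(-2 * log(1 - u));   % inverse CDF applied to u

% Check: pushing r back through the CDF should recover u
uBack = 1 - exp(-r.^2 / (2*sigma^2));
fprintf('max round-trip error: %.2e\n', max(abs(uBack - u)));
fprintf('sample mean %.3f, theoretical mean %.3f\n', mean(r), sigma*sqrt(pi/2));
```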
@Abinesh G, @Image Analyst suggested above that you find the cumulative distribution function, then invert it. That is what my code above does, using your data. By sorting the x-data, then taking equally spaced values (equal in terms of indices along the sorted x-vector), then finding the corresponding x-values at those indices, you are, in effect, finding equally spaced points on the vertical axis of the CDF, then finding the corresponding horizontal axis values (i.e. x-axis values).
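To make the connection concrete, here is a small sketch with made-up data that builds the empirical CDF explicitly and inverts it with interp1. The data vector and the 20 probability levels are illustrative choices. For real data with repeated x-values, the duplicates have to be removed first (for example with unique), because interp1 needs distinct sample points.

```matlab
% Made-up skewed data with no repeated values
N  = 1000;
x  = 100 * ((1:N)' / N).^3;
xs = sort(x);
F  = (1:N)' / N;                 % empirical CDF evaluated at xs

p     = (1:20)' / 20;            % equally spaced levels on the CDF axis
xGrid = interp1(F, xs, p);       % invert the CDF: x-value at each level

% Same points obtained by indexing the sorted data directly
xIdx = xs(round(N*p));
disp(max(abs(xGrid - xIdx)))     % expected to be at rounding-error level
```

Equal steps in p correspond to equal numbers of samples between successive grid values, which is why the indexing approach and the CDF inversion agree.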
Thanks a lot for responding. I have gone through your demo. I understand the concept of inverse transform sampling from the cumulative distribution function.
@William Rose: Agreed. Your approach of deriving the intervals from the sorted values roughly performs this inverse transform.
Although the concept is straightforward, it didn't occur to me. Thanks to both of you for an elegant solution.