How do I categorize a feature?

HelpAStudent on 14 May 2022
Commented: the cyclist on 14 May 2022
Hi! I have a dataset like the one in the histogram below, with some values around 0 and others around 1, 2, 3, 4 and 5.
I would like to make the features categorical as the amount at witch are they roughly equal in value.
This is the histogram of the features:
Please help me
  1 Comment
Image Analyst on 14 May 2022
It may or may not be possible. How were those data values determined?


Accepted Answer

the cyclist on 14 May 2022
Edited: the cyclist on 14 May 2022
Do you mean that you have numerical values, and you want to treat those as categorical instead? You can convert numeric to categorical using the categorical function.
x = 1:5
x = 1×5
1 2 3 4 5
c = categorical(x)
c = 1×5 categorical array
1 2 3 4 5
You said "roughly" equal in value, so maybe you need to do some rounding first?
x = [1.1 2.2 2.9 3.8 5.1]
x = 1×5
1.1000 2.2000 2.9000 3.8000 5.1000
c = categorical(round(x))
c = 1×5 categorical array
1 2 3 4 5
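If plain rounding is not quite the rule you want, explicit bin edges are another option. Here is a minimal sketch using discretize, assuming edges placed halfway between the integer levels. The edge values and the extra sample near 0 are my assumptions for illustration, not taken from your data:

```matlab
% Illustrative values scattered around the levels 0..5 (assumed data)
x = [0.1 1.1 2.2 2.9 3.8 5.1];

% Bin edges halfway between the levels: -0.5, 0.5, ..., 5.5 (6 bins)
edges = -0.5:1:5.5;

% Numeric bin index for each value (NaN for values outside the edges)
idx = discretize(x, edges);

% Same binning, returned directly as a categorical array whose
% category names are derived from the bin edges
c = discretize(x, edges, 'categorical');
```

Values outside the outermost edges come back as NaN in idx and as undefined in c, which makes out-of-range data easy to spot.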
  1 Comment
the cyclist on 14 May 2022
When I wrote this answer, I hadn't noticed that your values are not 1,2,3,4,5, but rather 10^-3 times that. So, you'll need to round differently:
x = [1.1 2.2 2.9 3.8 5.1]*1.e-3
x = 1×5
0.0011 0.0022 0.0029 0.0038 0.0051
rx = round(x,3)
rx = 1×5
0.0010 0.0020 0.0030 0.0040 0.0050
c = categorical(rx)
c = 1×5 categorical array
0.001 0.002 0.003 0.004 0.005
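A variation that may be a little more robust: scale the values to integer levels before rounding, so that matching values to categories does not rely on floating-point equality of numbers like 0.003. This is only a sketch. The factor of 1000, the 0..5 value set, and the category names are assumptions based on the histogram description:

```matlab
% Example values near multiples of 1e-3 (same illustrative data as above)
x = [1.1 2.2 2.9 3.8 5.1] * 1e-3;

% Scale to integer levels first, then round
level = round(x * 1000);

% Declare all six expected levels (0..5), including any that do not
% occur in x, and give each one a readable name
valueset = 0:5;
catnames = {'0', '0.001', '0.002', '0.003', '0.004', '0.005'};
c = categorical(level, valueset, catnames);

% Number of observations in each category, in valueset order
counts = countcats(c);
```

Declaring the value set up front keeps empty levels as categories, so countcats reports a zero for them instead of dropping them.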


More Answers (1)

Image Analyst on 14 May 2022
You can add a tiny bit of noise then recompute the histogram edges such that the bins will be equal percentages (heights). Like this:
data = [zeros(1, 1580), ones(1, 50), 2*ones(1, 70), 2*ones(1, 50), 3*ones(1, 40), 4*ones(1, 25), 4.7*ones(1, 10)]/1000;
subplot(2, 1, 1);
[counts, edges] = histcounts(data);
bar(edges(1:end-1), counts);
grid on;
title('Uneven Bars', 'FontSize', 20);
% Now add a tiny bit of noise and sort
noisyData = data + 0.000001 * rand(size(data));
sortedData = sort(noisyData);
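% Cumulative sum of the sorted values, rescaled to 0-100.
% (Computed for reference only; c is not used for the bin edges below.)
c = cumsum(sortedData);
c = rescale(c, 0, 100); % Convert to percent.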
% Find 6 bins
numBins = 6;
indexes = round(linspace(1, length(data), numBins+1))
indexes = 1×7
1 305 609 913 1217 1521 1825
edges2 = sortedData(indexes)
edges2 = 1×7
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0047
subplot(2, 1, 2);
counts2 = histcounts(noisyData, edges2)
counts2 = 1×6
304 304 304 304 304 305
bar(edges2(1:end-1), counts2);
grid on;
title('Even Bars', 'FontSize', 20);
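To turn those equal-count bins into a categorical feature, each observation can be assigned to its bin with discretize. A rough sketch that reuses the variable names above; the fixed random seed and the plain integer bin labels are my assumptions, added so the result is reproducible:

```matlab
rng(0);  % fixed seed (assumption) so the added noise is reproducible

% Same sample data as above (the two groups at the value 2 are merged)
data = [zeros(1, 1580), ones(1, 50), 2*ones(1, 120), 3*ones(1, 40), ...
        4*ones(1, 25), 4.7*ones(1, 10)] / 1000;

% Tiny noise breaks ties so that sorted values can serve as bin edges
noisyData = data + 0.000001 * rand(size(data));
sortedData = sort(noisyData);

% Edges at equally spaced positions in the sorted data
numBins = 6;
indexes = round(linspace(1, numel(data), numBins + 1));
edges2 = sortedData(indexes);

% Bin number (1..numBins) for every observation, then as categorical
binIdx = discretize(noisyData, edges2);
c = categorical(binIdx);
counts = countcats(c);
```

Note that with this particular data the first five bins contain only values that were originally 0, separated purely by the added noise, so the resulting categories may not be meaningful for your purpose.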
  1 Comment
the cyclist on 14 May 2022
I'll point out here that @Image Analyst seems to have interpreted your phrase "as the amount at witch are they roughly equal in value" to mean you want the bar heights to be equal.
I interpreted that differently, and took it to mean that you wanted data values that are roughly equal to be treated as exactly equal, which is why our two approaches are very different.


Release

R2021a
