Boxplot for multiple categorical data sets

Hi,
I want to plot boxplots for 3 repeated variables collected for 4 data sets, where each data set has 15x3 values. So I actually want to plot 4 categories on the x-axis, where each category has 3 vertical boxplots.
Can anyone please help me with that?
I have attached the file, named features.
Thanks in advance.

Accepted Answer

Cris LaPierre on 25 Oct 2019
Edited: Cris LaPierre on 25 Oct 2019


You could use the 'BoxStyle','filled' name,value pair when creating the boxplot. I don't like how that looks. The best I could find to create it the way I like was this post. Note that the fill is a colored object being placed on top of the box plot. That means it will cover up the median line unless you adjust its transparency.
I've moved the plotting of the mean so that it is on top of the new object creating the fill. I've also added it to the legend so that others know what that non-standard marker represents.
Final answer would be this:
load 'Data for plot.mat'
nDataSets = 7;
nVars = 3;
nVals = 15;
% Create column vector to indicate dataset
dataSet = categorical([ones(nVars*nVals,1); ...
ones(nVars*nVals,1)*2; ...
ones(nVars*nVals,1)*3; ...
ones(nVars*nVals,1)*4;...
ones(nVars*nVals,1)*5;...
ones(nVars*nVals,1)*6;...
ones(nVars*nVals,1)*7]);
% Create column vector to indicate the variable
clear var
var(1:nVals,1) = "Var1";
var(end+1:end+nVals,1) = "Var2";
var(end+1:end+nVals,1) = "Var3";
Var = categorical([var;var;var;var;var;var;var]);
% Create a table
testData = table(data,dataSet,Var);
h = boxplot(testData.data,{testData.dataSet,testData.Var},...
'ColorGroup',testData.Var,...
'Labels',{'','Data1','','','Data2','','','Data3','','','Data4','','','Data5','','','Data6','','','Data7',''});
% set(gca,'XTickLabel',{' '})
% Don't display outliers
ol = findobj(h,'Tag','Outliers');
set(ol,'Visible','off');
% Find all boxes
box_vars = findall(h,'Tag','Box');
% Fill boxes
for j = 1:length(box_vars)
    patch(get(box_vars(j),'XData'),get(box_vars(j),'YData'),box_vars(j).Color,'FaceAlpha',.1,'EdgeColor','none');
end
% Add legend
Lg = legend(box_vars(1:3), {'G1','G2','G3'},'Location','northoutside','Orientation','horizontal');
%% Add Mean to boxplots
summaryTbl = groupsummary(testData,{'dataSet','Var'},"mean")
hold on
plot(summaryTbl.mean_data, '+k')
hold off
Lg.String{4} = 'mean';
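The 'Labels' cell array above has to contain exactly one entry per box, that is nDataSets*nVars entries; with a different count, boxplot reports that there must be the same number of labels as groups. Here is a minimal sketch of building that list programmatically instead of typing it out. It assumes the name should sit on the middle box of each group, and the 'Data%d' naming simply follows the convention used above.

```matlab
nDataSets = 7;
nVars = 3;

% Start with one empty label per box
labels = repmat({''}, 1, nDataSets*nVars);

% Slot within each group that carries the data set name (middle box for odd nVars)
mid = ceil(nVars/2);

% Fill in 'Data1', 'Data2', ... at that slot of every group
names = arrayfun(@(k) sprintf('Data%d', k), 1:nDataSets, 'UniformOutput', false);
labels(mid:nVars:end) = names;
```

The result can then be passed as 'Labels',labels in the boxplot call. With an even nVars there is no middle box, so the name lands on the box just left of center.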

5 Comments

Joana on 25 Oct 2019
Thanks a lot. :)
Anna Nickoloff on 8 Feb 2023
Edited: Anna Nickoloff on 8 Feb 2023
Thank you! I found this incredibly helpful :)
Clara Yang on 7 Apr 2023
Edited: Clara Yang on 7 Apr 2023
Thanks for sharing this method. I don't know why when I try it, even after adjusting all the parameters, I get this error:
Error using boxplot>assignUserLabels
There must be the same number of labels as groups or as the number of elements in X.
Error in boxplot>identifyGroups (line 1254)
assignUserLabels(labels,groupIndexByPoint,numFlatGroups,xlen,...
Error in boxplot (line 290)
identifyGroups (gDat,grouporder,positions,colorgroup,...
Do you know why? Thank you so much!
Clara Yang on 7 Apr 2023
Edited: Clara Yang on 7 Apr 2023
Hi, I figured out the reason: I only have 2 nVars, so I need to delete some of the extra '' entries in the labels. If possible, is there a way for me to put the label in the middle in this case? Thank you so much again for writing this method!
It is probably best to ask a new question of your own, as more people will see it.
I don't see a good way to do this with boxplot, but boxchart can really simplify the code (it's come a long way since the question was originally asked). It does require a little manipulation to get the mean values to align, but nothing difficult.
The X tick label names come directly from the categorical information used to group the data. You don't have to use categorical for grouping, but it does make it convenient to group on non-numeric data.
Here, I've renamed the categories just to demonstrate.
% Create a test data set
nDataSets = 7;
nVars = 2;
nVals = 15;
data = rand(nVals*nVars*nDataSets,1);
% Create column vector to indicate dataset
dataSet = categorical([ones(nVars*nVals,1); ...
ones(nVars*nVals,1)*2; ...
ones(nVars*nVals,1)*3; ...
ones(nVars*nVals,1)*4;...
ones(nVars*nVals,1)*5;...
ones(nVars*nVals,1)*6;...
ones(nVars*nVals,1)*7]);
dataSet = renamecats(dataSet,{'Red','Orange','Yellow','Green','Purple','Indigo','Violet'});
% Create column vector to indicate the variable
clear var
var(1:nVals,1) = "Var1";
var(end+1:end+nVals,1) = "Var2";
Var = categorical([var;var;var;var;var;var;var]);
% Create a table
testData = table(data,dataSet,Var);
% ########################################
% Actual visualization code using boxchart
boxchart(testData.dataSet,testData.data,"GroupByColor",testData.Var)
%% Add Mean to boxplots
summaryTbl = groupsummary(testData,{'dataSet','Var'},"mean");
hold on
plot((1:nDataSets*nVars)/2 + 0.25, summaryTbl.mean_data, '+k')
hold off
legend(["G1","G2","Mean"],'Location','northoutside','Orientation','horizontal')
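The x positions used for the mean markers above are specific to nVars = 2. Here is a hedged sketch of how they might be generalized. It assumes boxchart splits each category's unit-wide slot evenly among the nVars color groups; this reproduces the 0.75, 1.25, ... positions above, but it is an assumption about boxchart's layout and worth checking against your own plot.

```matlab
nDataSets = 7;
nVars = 2;

% j = variable index (varies fastest), i = data set index,
% matching the row order of groupsummary(testData,{'dataSet','Var'},"mean")
[j, i] = ndgrid(1:nVars, 1:nDataSets);

% Center of each box: category position plus an offset within the group
xMean = i(:) + (j(:) - (nVars + 1)/2) / nVars;
```

The means would then be drawn with plot(xMean, summaryTbl.mean_data, '+k') between hold on and hold off.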


More Answers (3)

Cris LaPierre on 24 Oct 2019
Edited: Cris LaPierre on 24 Oct 2019


Not sure what you are hoping it looks like in the end, but here's one way.
load features.mat
data1 = features{1};
data2 = features{2};
data3 = features{3};
data4 = features{4};
subplot(1,4,1)
boxplot(data1)
title('Data 1')
subplot(1,4,2)
boxplot(data2)
title('Data 2')
subplot(1,4,3)
boxplot(data3)
title('Data 3')
subplot(1,4,4)
boxplot(data4)
title('Data 4')
naina_features_4boxplot.png
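The four near-identical subplot blocks can also be written as a loop. This is a minimal sketch that uses random stand-in data, because features.mat is not reproduced here; the 1x4 cell array of 15x3 matrices is an assumption based on the question.

```matlab
% Stand-in for features.mat (assumption: 1x4 cell, each cell 15x3)
rng(1)
features = arrayfun(@(k) randn(15,3) + k, 1:4, 'UniformOutput', false);

fig = figure;
nSets = numel(features);
for k = 1:nSets
    subplot(1, nSets, k)          % one panel per data set
    boxplot(features{k})          % one box per column (variable)
    title(sprintf('Data %d', k))
end
```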

1 Comment

Cris LaPierre on 24 Oct 2019
Edited: Cris LaPierre on 24 Oct 2019
One potentially cool thing is to take advantage of the grouping option (second syntax described in the doc). To do so, I'd recommend getting your data into a table. Create one variable with all the data, one with categorical info on the data set, and one with categorical info on the variable.
% Create column vector of all data
data = [data1(:); data2(:); data3(:); data4(:)];
% Create column vector to indicate dataset
dataSet = categorical([ones(numel(data1),1); ...
ones(numel(data2),1)*2; ...
ones(numel(data3),1)*3; ...
ones(numel(data4),1)*4]);
% Create column vector to indicate the variable
clear var
var(1:length(data1),1) = "Var1";
var(end+1:end+length(data1),1) = "Var2";
var(end+1:end+length(data1),1) = "Var3";
Var = categorical([var;var;var;var]);
% Create a table
testData = table(data,dataSet,Var);
Naina_table.png
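The same grouping columns can be built more compactly with repelem and repmat. The sketch below uses random numbers in place of the real data; the sizes (4 data sets, 3 variables, 15 values each) are taken from the question.

```matlab
nDataSets = 4;
nVars = 3;
nVals = 15;

% Placeholder values standing in for [data1(:); data2(:); data3(:); data4(:)]
data = rand(nDataSets*nVars*nVals, 1);

% Data set index: 45 ones, then 45 twos, and so on
dataSet = categorical(repelem((1:nDataSets)', nVars*nVals));

% Variable name: 15 x "Var1", 15 x "Var2", 15 x "Var3", repeated per data set
varNames = "Var" + (1:nVars)';
Var = categorical(repmat(repelem(varNames, nVals), nDataSets, 1));

testData = table(data, dataSet, Var);
```

This ordering matches data1(:), which stacks the columns of a 15x3 matrix one after another.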
Now you can use a single boxplot command to create the boxplot you describe. You can use multiple grouping variables to organize the data into separate boxplots (enclose them in curly braces). Here, I group first by dataSet, then by Var.
boxplot(testData.data,{testData.dataSet,testData.Var})
The two X-axis labels indicate 1) dataSet and 2) Variable.
If you want to see all the boxplots for a specific variable next to each other, change the order of your grouping variables to first group by Var, then by dataSet.
boxplot(testData.data,{testData.Var,testData.dataSet})
Notice the X-axis labels can still be used to correctly identify each boxplot.


Joana on 24 Oct 2019


Hi Cris,
Thanks a lot for the reply. The second approach works fine for me, but I have a few more questions; I'd highly appreciate it if you could help with those as well.
1: How can I color Var1, Var2, and Var3 with the same color scheme?
2: Is it possible to add a legend for Var1, Var2, and Var3 instead of showing them on the x-axis?
3: Can I show the data type (1, 2, 3, 4) as nominal data and just once, instead of repeating it for all 3 variables? E.g. show 'Data1' on the x-axis once for Var1, Var2, and Var3.
4: The plot is showing the outliers; how can I hide them?
TIA.

5 Comments

Cris LaPierre on 24 Oct 2019
Edited: Cris LaPierre on 24 Oct 2019
This is a good time to recommend going through the documentation for boxplot. What is possible is documented there.
For example:
  1. Coloring groups
  2. Changing labels
  3. Modifying outliers
As for adding a legend, see these forum posts: post1, post2.
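As a rough illustration of those documented options, here is a sketch with made-up data; the variables y, g1, and g2 are hypothetical. Each parameter name and its value are passed as separate arguments, and an empty 'Symbol' is one documented way to leave the outliers undrawn.

```matlab
rng(2)
y  = randn(90, 1);                        % made-up measurements
g1 = repelem((1:2)', 45);                 % hypothetical data set index
g2 = repmat(repelem((1:3)', 15), 2, 1);   % hypothetical variable index

figure
h = boxplot(y, {g1, g2}, ...
    'ColorGroup', g2, ...   % boxes with the same g2 value share a color
    'Colors', 'rmb', ...    % one color per color group
    'Symbol', '');          % empty symbol: outliers are not shown
```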
Joana on 25 Oct 2019
Edited: Cris LaPierre on 25 Oct 2019
Yes, I tried 'ColorGroup,Var1', 'b' and it says 'Invalid parameter name: ColorGroup,var.'.
Can you please correct it?
I'm new to MATLAB, so I don't understand how to do that. I also want to add the mean to each boxplot; how can I do that?
I have attached the data, and this is what I am doing for now:
% Create column vector to indicate dataset
dataSet = categorical([ones(numel(data1),1); ...
ones(numel(data2),1)*2; ...
ones(numel(data3),1)*3; ...
ones(numel(data4),1)*4;...
ones(numel(data4),1)*5;...
ones(numel(data4),1)*6;...
ones(numel(e),1)*7]);
% Create column vector to indicate the variable
clear var
var(1:length(data1),1) = "Var1";
var(end+1:end+length(data1),1) = "Var2";
var(end+1:end+length(data1),1) = "Var3";
Var = categorical([var;var;var;var;var;var;var]);
% Create a table
testData = table(data,dataSet,Var);
boxplot(testData.data,{testData.dataSet,testData.Var},'MedianStyle','target','PlotStyle','compact','Color','rmb','Widths',0.1,'Whisker',6.5)
set(gca,'XTickLabel',{' '})
box_vars = findall(gca,'Tag','Box');
hLegend = legend(box_vars([3,2,4]), {'G1','G2','G3'});
%% Add Mean on boxplots
hold on
plot(mean(testData.dataSet), 'dg')
hold off
Box plots display the median value. If you want to display mean, you'll have to write additional plot commands. It looks like you already found it, but see this forum post.
If you are new to MATLAB, I'd recommend first completing MATLAB Onramp. It'll walk you through the fundamentals incrementally and interactively.
You also need to update your code and variable names to match the new variable name.
Here's how I might do it.
load 'Data for plot.mat'
nDataSets = 7;
nVars = 3;
nVals = 15;
% Create column vector to indicate dataset
dataSet = categorical([ones(nVars*nVals,1); ...
ones(nVars*nVals,1)*2; ...
ones(nVars*nVals,1)*3; ...
ones(nVars*nVals,1)*4;...
ones(nVars*nVals,1)*5;...
ones(nVars*nVals,1)*6;...
ones(nVars*nVals,1)*7]);
% Create column vector to indicate the variable
clear var
var(1:nVals,1) = "Var1";
var(end+1:end+nVals,1) = "Var2";
var(end+1:end+nVals,1) = "Var3";
Var = categorical([var;var;var;var;var;var;var]);
% Create a table
testData = table(data,dataSet,Var);
h = boxplot(testData.data,{testData.dataSet,testData.Var},...
'ColorGroup',testData.Var,...
'Labels',{'','Data1','','','Data2','','','Data3','','','Data4','','','Data5','','','Data6','','','Data7',''});
% Don't display outliers
ol = findobj(h,'Tag','Outliers');
set(ol,'Visible','off');
%% Add Mean on boxplots
summaryTbl = groupsummary(testData,{'dataSet','Var'},"mean")
hold on
plot(summaryTbl.mean_data, 'dg')
hold off
% Add legend
box_vars = findall(h,'Tag','Box');
legend(box_vars(1:3), {'G1','G2','G3'},'Location','northoutside','Orientation','horizontal');
naina_features_7boxplots_grouping1.png
Joana on 21 May 2021
Dear Cris
I have been trying to work out how to plot the boxplots in a different color for each x-axis group. For example, for the 1st group, named '1', I need to plot the 3 variables in red, but one would be a shaded boxplot and another could be dashed and shaded, just to differentiate the variables within each group.
Likewise, I need to plot group '2' in blue, and the corresponding G1, G2, and G3 could be shaded, dashed, or circled boxplots.
Would there be any possible way to do that?
I will highly appreciate any help. :)
Regards
Adam Danz on 22 May 2021
@Joana I'd probably follow Cris's advice and use the grouping options in boxplot, but there are several functions on the File Exchange that offer additional grouping methods.


Asked: 24 Oct 2019
Edited: 7 Apr 2023