How can I create new variables based on groups?
Hello everyone,
I want to create new variables in order to perform a t-test based on the group membership of my subjects. I have this code here:
clearvars
close all
filepath = ['filepath'];
T =readtable('filename');
G = findgroups(T(:,1))
if G == 1
X = T(:,:)
else G == 2
Y = T(:,:)
end
I am encountering the following problem: it does not work. I only get table T again for Y, not what I want: two entirely separate tables based on whether a subject is in group 1 or 2. Any help or tips would be appreciated.
Thank you
Comments: 18
Rik
27 Apr 2020
If you set a breakpoint you will see what is happening: only one of the branches will be executed.
It is a common mistake that people make: if you use an array as the conditional in an if-statement, it may not do what you expect. Either use a loop or an array operation.
If you want specific help: share your data or write code that will generate plausible data.
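A minimal sketch of the behaviour described above, using a made-up group vector (not the poster's data). An if-statement with an array condition runs its branch only when every element is true, so with mixed group numbers the else branch is taken:

```matlab
% Hypothetical group vector: two subjects in each group.
G = [1; 1; 2; 2];

% G == 1 is the logical array [1; 1; 0; 0].
% "if" treats an array condition as true only when ALL elements are nonzero.
if G == 1
    branch = 'if';
else
    branch = 'else';
end
disp(branch)        % the else branch runs: not every element equals 1

% An array operation answers the question row by row instead.
isGroup1 = (G == 1);
disp(isGroup1.')
```

The logical vector isGroup1 can then be used directly as a row index, which avoids the if-statement altogether.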
Hannah_Mad
27 Apr 2020
Thank you Rik,
Please see below an excerpt from my data.
1 '0,1188' '0,1103' '1,4' '1,3' '-13,00950292' '-1,000894239' '3,728322672' '12,81289888' '0,468820547' '1,169608552'
1 '0,1103' '0,2376' '1,3' '2,8' '-11,8' '-2' '3,6' '13,4' '-0,9' '2,9'
1 '0,1313' '0,1717' '1,3' '1,7' '-13,28540783' '-3,043789654' '1,401630356' '13,32603837' '-2,987182197' '0,545827005'
1 '0,0971' '0,0883' '1,1' '1' '-15,71450602' '-3,962745391' '3,050642807' '13,45261762' '-1,497263892' '3,083489585'
2 '0,295' '0,295' '2,8' '2,8' '-14,5881751' '-2,603528618' '3,518819139' '14,33740562' '-1,870682366' '3,525744346'
2 '0,0883' '0,0883' '1' '1' '-12,86394769' '-5,766465114' '3,120227299' '13,97601291' '-4,209455419' '3,276772679'
2 '0,2191' '0,402' '2' '3,3' '' '' '' '' '' ''
2 '0,1424' '0,1442' '1,6' '1,5' '-17,17220026' '2,691067249' '6,865599728' '14,59057189' '4,206039042' '5,34181054'
2 '' '' '' '' '-13,1' '-4,9' '1,5' '12,7' '-2,7' '3,1'
If I try and use a loop:
clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
T =readtable('ET2mat.csv');
G = findgroups(T(:,1))
for k = 1:44
if T(:,1) == 1
x = T(:,:)
else T(:,1) == 2
y = T(:,:)
end
end
I will get the following error message: Undefined operator '==' for input arguments of type 'table'.
So what do I need to do? Make it an array? I understand that maybe this will not work because the grouping variable is not a vector but part of the table.
Thank you
Stephen23
27 Apr 2020
Using if is a red herring and rather unsuitable. The MATLAB way would be to use logical indexing, e.g.:
G = findgroups(T(:,1))
X = T(G==1,:);
Y = T(G==2,:);
But note that splitting up your table into separate variables is unlikely to be required, nor is it a good approach. The recommended approach is to use the split-apply-combine workflow on one table.
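As a hedged illustration of both ideas, here is a sketch on a small made-up table. The variable names Gruppe and CV_left are borrowed from later in this thread and the values are invented. It shows logical indexing first, then a split-apply-combine summary on the single table:

```matlab
% Made-up data: a group label and one measurement per subject.
T = table([1; 1; 2; 2], [0.10; 0.20; 0.30; 0.50], ...
    'VariableNames', {'Gruppe', 'CV_left'});

% Group numbers 1..N, one per row.
G = findgroups(T.Gruppe);

% Logical indexing: rows of each group as separate tables.
X = T(G == 1, :);
Y = T(G == 2, :);

% Split-apply-combine: one result per group, without splitting T.
groupMeans = splitapply(@mean, T.CV_left, G);
disp(groupMeans)
```

Keeping everything in T and letting G select the rows means the code does not need to change when a third group is added.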
Hannah_Mad
27 Apr 2020
I use splitapply for most things, such as mean, standard deviation etc., however, it does not work for the t-test - do you have another suggestion for this perhaps? Thank you.
Hannah_Mad
27 Apr 2020
So this is my code then:
G = findgroups(T(:,1))
splitapply(ttest,(T(:,2)), G)
Which results in this error message:
Not enough input arguments.
Error in ttest (line 124)
dim = find(size(x) ~= 1, 1);
Error in test (line 7)
splitapply(ttest,(T(:,2)), G)
>>
Stephen23
27 Apr 2020
You called ttest with no input arguments, thus the error. You forgot to use @ to create a function handle:
splitapply(@ttest,...)
% ^ you forgot this
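To make the difference concrete, here is a small sketch with invented data. It uses numel instead of ttest so that it needs no toolbox; the point is only the presence or absence of the @:

```matlab
% Made-up data: five values in two groups.
x = [10; 20; 30; 40; 50];
G = [1; 1; 2; 2; 2];

% With @, a function handle is passed and splitapply calls it once per group.
fh = @numel;
counts = splitapply(fh, x, G);
disp(counts)

% Without @, numel is called immediately with no inputs,
% so the error is raised before splitapply even starts.
try
    splitapply(numel, x, G);
    errMsg = '';
catch err
    errMsg = err.message;
end
disp(errMsg)
```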
Hannah_Mad
27 Apr 2020
Thank you very much!
However, I still get the following error:
Error using splitapply (line 132)
Applying the function 'ttest' to the 1st group of data generated the following error:
Undefined function 'isnan' for input arguments of type 'cell'.
Error in test (line 7)
splitapply(@ttest,(T(:,11)), G)
Stephen23
27 Apr 2020
Hannah_Mad's "Answer" moved here:
Well. I keep getting error messages, different ones though.
clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
T = readtable('ET2mat.csv');
F = rmmissing(T)
[row col] = size(F)
G = findgroups(F(:,1))
for n = 2:col
fprintf('This is column %d. \n' , n)
splitapply(@ttest,F,G)
end
Will result in:
Error using splitapply (line 132)
Applying the function 'ttest' to the 1st group of data generated the following error:
Undefined function 'minus' for input arguments of type 'cell'.
Error in test (line 12)
splitapply(@ttest,F,G)
So what can I do from here - I do in fact have negative values in my table. Is that the reason?
Stephen23
27 Apr 2020
"I do in fact have negative values in my table. Is that the reason?"
The actual reason is your data file, which is imported as character, not as numeric. The reasons are:
- The file is typical of regions which use a decimal comma, namely tab-separated values (and a misleading .CSV file extension). Whilst readtable can cope with the tab delimiter, it cannot parse decimal commas.
- Single quotes around all "numeric" values. I cannot imagine what badly written application does that.
Because of these, readtable imports that data (which you think is numeric) as character vectors in cell vectors, complete with single quotes. You can check this quite easily (because you did not upload a sample file I had to create it myself based on your earlier comment, attached, including column headers):
>> T = readtable('test.txt','delimiter','\t')
T =
    AA    BB            CC            DD         EE         FF                  GG                  HH                 II                 JJ                  KK
    __    __________    __________    _______    _______    ________________    ________________    _______________    _______________    ________________    _______________
    1     ''0,1188''    ''0,1103''    ''1,4''    ''1,3''    ''-13,00950292''    ''-1,000894239''    ''3,728322672''    ''12,81289888''    ''0,468820547''     ''1,169608552''
    1     ''0,1103''    ''0,2376''    ''1,3''    ''2,8''    ''-11,8''           ''-2''              ''3,6''            ''13,4''           ''-0,9''            ''2,9''
    1     ''0,1313''    ''0,1717''    ''1,3''    ''1,7''    ''-13,28540783''    ''-3,043789654''    ''1,401630356''    ''13,32603837''    ''-2,987182197''    ''0,545827005''
    1     ''0,0971''    ''0,0883''    ''1,1''    ''1''      ''-15,71450602''    ''-3,962745391''    ''3,050642807''    ''13,45261762''    ''-1,497263892''    ''3,083489585''
    2     ''0,295''     ''0,295''     ''2,8''    ''2,8''    ''-14,5881751''     ''-2,603528618''    ''3,518819139''    ''14,33740562''    ''-1,870682366''    ''3,525744346''
    2     ''0,0883''    ''0,0883''    ''1''      ''1''      ''-12,86394769''    ''-5,766465114''    ''3,120227299''    ''13,97601291''    ''-4,209455419''    ''3,276772679''
    2     ''0,2191''    ''0,402''     ''2''      ''3,3''    ''''                ''''                ''''               ''''               ''''                ''''
    2     ''0,1424''    ''0,1442''    ''1,6''    ''1,5''    ''-17,17220026''    ''2,691067249''     ''6,865599728''    ''14,59057189''    ''4,206039042''     ''5,34181054''
    2     ''''          ''''          ''''       ''''       ''-13,1''           ''-4,9''            ''1,5''            ''12,7''           ''-2,7''            ''3,1''
>> cellfun(@class,T.BB,'uni',0)
ans =
'char'
'char'
'char'
'char'
'char'
'char'
'char'
'char'
'char'
>> +T.BB{1} % first and last characters are single-quotes.
ans =
39 48 44 49 49 56 56 39
Essentially you have two choices:
- write or edit the file so that all numeric data are written without single quotes and using decimal points, then efficiently import the whole file in one step using readtable, or
- parse those character vectors inside of MATLAB, replacing the decimal commas with decimal points and then converting to numeric. Not particularly efficient, but it can work with your existing data files, e.g.:
T.KKnum = str2double(strrep(strrep(T.KK,'''',''),',','.'));
You can then apply numeric functions to that numeric data. I recommend that you use the variable names to refer to the data columns, rather than indexing.
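For readers who want to see what that one-liner does step by step, here is a sketch on three invented cells in the same style as the file: quoted, with decimal commas, and one empty entry.

```matlab
% Made-up cells mimicking the imported text: '0,1188', '-13,00950292' and ''.
raw = {'''0,1188'''; '''-13,00950292'''; ''''''};

% Step 1: strip the single quotes.
noQuotes = strrep(raw, '''', '');

% Step 2: replace the decimal comma with a decimal point.
withPoints = strrep(noQuotes, ',', '.');

% Step 3: convert to double. Empty text becomes NaN.
num = str2double(withPoints);
disp(num)
```

The NaN for the empty entry is worth keeping in mind, because functions such as rmmissing or the 'omitnan' option can then deal with it in the usual way.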
Hannah_Mad
28 Apr 2020
So, unfortunately it is still not working. Dataset will be provided.
This is my code:
clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
N = readtable('ET2mat.txt','delimiter','\t')
NNUM = str2double(strrep(strrep(N.Gruppe, N.CV_left, N.CV_right, N.Amplitude_left, N.Amplitude_right, N.x_left, N.y_left, N.z_left, N.x_right, N.y_right, N.z_right),',','.'));
F = rmmissing(NNUM)
[row col] = size(F)
N = F(:,1:col)
G = findgroups(N(:,1))
splitapply(@ttest,N,G)
for n = 2:col
fprintf('This is column %d. \n' , n)
splitapply(@ttest,F,G)
end
The error will always be
Error using strrep
Too many input arguments.
Any ideas on that?
Also: how can I choose any of your answers and rate them? I heard I am supposed to do that but it won't work here.
Thank you for your help. I know I am a beginner to MATLAB but it is quite tedious.
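A note on the error above: strrep accepts exactly three inputs (the text, the pattern to find, and the replacement), so the conversion has to be applied to one table variable at a time. The following is a sketch of one way to do that, not a tested solution for the actual file. The table is made up; only the variable names Gruppe, CV_left and CV_right are taken from the code above.

```matlab
% Made-up table in the same style as the import: quoted text with decimal commas.
T = table([1; 1; 2], ...
    {'''0,5'''; '''1,5'''; '''2,5'''}, ...
    {'''-1,0'''; ''''''; '''3,25'''}, ...
    'VariableNames', {'Gruppe', 'CV_left', 'CV_right'});

% Convert every cell-array variable in place, one variable per iteration.
names = T.Properties.VariableNames;
for k = 1:numel(names)
    col = T.(names{k});
    if iscell(col)
        cleaned = strrep(strrep(col, '''', ''), ',', '.');
        T.(names{k}) = str2double(cleaned);   % empty text becomes NaN
    end
end
disp(T)
```

The numeric Gruppe column is skipped by the iscell check, so it is left untouched.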
Hannah_Mad
28 Apr 2020
I have now changed the commas to dots in Excel. So far everything seems fine, but I am getting a different error message.
This is my code now:
clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
N = readtable('ET2mat.txt','delimiter','\t')
F = rmmissing(N)
[row col] = size(F)
G = findgroups(F{:,1})
for n = 2:col
fprintf('This is column %d. \n' , n)
[h, p ] = splitapply(@ttest,F,G,'Alpha',0.05 )
end
The error I get is:
Error using splitapply (line 87)
Group numbers must be a vector of positive integers, and cannot be a sparse vector.
Error in test (line 14)
[h, p ] = splitapply(@ttest,F,G,'Alpha',0.05 )
Tommy
28 Apr 2020
Edited: Tommy
28 Apr 2020
Have you also gotten rid of the quotes in your text file?
This line:
[h, p ] = splitapply(@ttest,F,G)
would pass every column within F to ttest at once, as separate arguments. If you want to consider each column individually, you could use
for n = 2:col
fprintf('This is column %d. \n' , n)
[h, p ] = splitapply(@ttest,F{:,n},G)
end
(although this does use indexing rather than variable names.)
Then, the last argument to splitapply must be G, so you cannot have
[h, p ] = splitapply(@ttest,F{:,n},G,'Alpha',0.05 )
because of the 'Alpha' and 0.05. splitapply thinks the 0.05 specifies the group numbers, which is not allowed because the group numbers need to be positive integers. If you want, you could use this syntax:
[h, p ] = splitapply(@(x,y) ttest(x,y,'Alpha',0.05),F{:,n},?,G)
or this syntax:
[h, p ] = splitapply(@(x,m) ttest(x,m,'Alpha',0.05),F{:,n},?,G)
both of which are explained in the documentation for ttest, but this would require you to pass a y or m to ttest, perhaps in place of the ?s above. However, the default alpha value is 0.05, so you shouldn't need to provide it anyway.
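The wrapping idea can be tried out without any toolbox. In the sketch below, the extra option is fixed inside an anonymous function so that G stays the last argument of splitapply. It uses mean with 'omitnan' as a stand-in on invented data; the analogous call for a one-sample test would presumably be @(v) ttest(v, 0, 'Alpha', 0.05), where the 0 is an assumed hypothesised mean (ttest's default).

```matlab
% Made-up data with one missing value.
x = [1; NaN; 3; 4; 6];
G = [1; 1; 1; 2; 2];

% The option 'omitnan' is bound inside the anonymous function,
% so splitapply still sees only: function, data, group numbers.
m = splitapply(@(v) mean(v, 'omitnan'), x, G);
disp(m)
```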
(edit) You can only choose and vote for answers, but so far everything here is a comment.
Walter Roberson
28 Apr 2020
Group numbers must be a vector of positive integers, and cannot be a sparse vector.
You could get that if your G is empty. Check whether F is empty.
Hannah_Mad
28 Apr 2020
Thank you very much for your kind explanations and detailed information. However I am not entirely sure that this script does what I believe it does: compare the means of two groups (1 and 2, hence the splitapply approach) - as I get two h-values for each column. Shouldn't it be only one value? As there are two groups being compared per column. Do you have any idea about that?
Again, I can only apologize for my basic questions.
Thank you!
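A possible explanation, offered as a sketch rather than a definitive answer: splitapply calls the function once per group, and ttest with a single input is a one-sample test of that group's mean against zero, which gives one h per group. Comparing the means of two independent groups is what ttest2 does. The example below uses invented values and assumes the Statistics and Machine Learning Toolbox is available; the names Gruppe and CV_left are taken from earlier in the thread.

```matlab
% Made-up data: group label and one measurement per subject.
Gruppe  = [1; 1; 1; 1; 2; 2; 2; 2];
CV_left = [0.12; 0.11; 0.13; 0.10; 0.30; 0.09; 0.22; 0.14];
G = findgroups(Gruppe);

% One-sample t-test per group (each group's mean against 0): one h per group.
hPerGroup = splitapply(@ttest, CV_left, G);

% Two-sample t-test of group 1 against group 2: a single h and a single p.
[h, p] = ttest2(CV_left(G == 1), CV_left(G == 2));

fprintf('h per group: %s\n', mat2str(hPerGroup.'));
fprintf('two-sample test: h = %d, p = %.4f\n', h, p);
```

Whether ttest2 (independent groups) or a paired ttest(x, y) is appropriate depends on the study design, which the thread does not state.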
Hannah_Mad
29 Apr 2020
Hello Walter,
I got the following:
class(F{:,1}) : double
size(F{:,1}) 38 1
size(G) 38 1
I think that is alright, isn't it?
Thank you,
Hannah
Answers (0)