Run statistical tests on multiple CSV files

I have many days' worth of heart rate data stored in separate CSV files, and I'm currently running a t-test on the data from the 3 hours before and after 12 am. I was going to run this manually on all 30 different days, but I was wondering whether there is a way to loop through all the days and return all the p-values at once.

Comments: 1

Rik on 20 Apr 2021
That seems very likely. Once you create a file list (e.g. with dir), you should be able to do the processing in a loop.
Do you have a specific question about implementing this?


Answers (1)

Manash Sahoo on 20 Apr 2021
Edited: Manash Sahoo on 20 Apr 2021

Store your data in a folder, use the "dir" command to return the filenames, and loop through them.
For example:
% Load your heart rate data. You can get the names of files and folders
% using the "dir" command.
files = dir(strcat(filepath,"\*.csv")) % Filepath would be the path where your csvs are located.
pval = {};
for i = 1:length(files)
    HRDat = readmatrix(files.name); % You may need to edit this per your filepath.
    % Do your analysis here, and return your pvalue to pval{i}.
end
The p-values in the cell array "pval" will then correspond, in order, to the files in the struct array "files" (see "files.name"). This is usually the way I do things with heart rate data. Let me know if you have any further questions!
EDIT: Fixed the code.
MS
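To make the correspondence between file names and p-values explicit, the two can be collected into one table. This is only a sketch: the file names and p-values below are made-up placeholders standing in for the output of dir and the loop, and the table variable names File and PValue are my own choice.

```matlab
% Hypothetical placeholders; in practice "files" comes from dir and
% "pval" is filled inside the loop.
files = struct('name', {'day01.csv', 'day02.csv', 'day03.csv'});
pval = [0.03; 0.41; 0.0007];

% Pair each file name with its p-value, one row per file.
results = table({files.name}', pval, 'VariableNames', {'File', 'PValue'});
disp(results)
```

Keeping the names next to the values avoids having to remember that row n of pval belongs to files(n).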

Comments: 7

  • Pre-allocation tends to be faster, and since a p-value is a numeric value, using a double array is probably fine as well.
  • You're using the length function. Consider using numel or size instead.
  • I personally prefer avoiding i as a variable name, so I changed that to n.
  • Try to avoid using strcat to create a path. Using fullfile gives you the same flexibility, without having to worry about the correct filesep.
  • As a last point: you forgot to index the struct inside the loop.
% Load your heart rate data. You can get the names of files and folders
% using the "dir" command.
files = dir(fullfile(filepath,'*.csv')) % Filepath would be the path where your csvs are located.
pval = NaN(numel(files),1);
for n = 1:numel(files)
    HRDat = readmatrix(files(n).name); % You may need to edit this per your filepath.
    % Do your analysis here, and return your pvalue to pval(n).
end
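On the "edit this per your filepath" remark: files(n).name holds only the file name, so readmatrix will only find the file if the CSV folder is the current folder or on the path. One way around this, sketched here with made-up data in a temporary folder, is to build the full path from the folder field that dir also returns. The mean is just a placeholder for the real analysis.

```matlab
% Create a temporary folder with two small CSV files (made-up values).
folder = tempname;
mkdir(folder);
writematrix([60; 62; 61; 75; 77; 76], fullfile(folder, 'day01.csv'));
writematrix([58; 59; 60; 59; 58; 60], fullfile(folder, 'day02.csv'));

files = dir(fullfile(folder, '*.csv'));
meanHR = NaN(numel(files), 1);
for n = 1:numel(files)
    % Build the full path so the loop works from any current folder.
    HRDat = readmatrix(fullfile(files(n).folder, files(n).name));
    meanHR(n) = mean(HRDat); % placeholder for the real analysis
end

rmdir(folder, 's'); % remove the temporary files again
```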
Manash Sahoo on 20 Apr 2021
Ah. Thanks for the pointers! This is indeed a much better solution.
Ross Thompson on 20 Apr 2021
Thank you both!
Ross Thompson on 20 Apr 2021
Edited: Ross Thompson on 20 Apr 2021
I've got it to run; however, it's returning the same p-value for all 24 days. Any ideas why this may be?
files = dir(fullfile('/Users/rossthompson/Documents/MATLAB/HR_data','*.csv'));
pval = NaN(numel(files),1);
for n = 1:numel(files);
    HRDat = readmatrix(files(n).name);
    noonindex = find(data.noonTime==1);
    noontime = data.Timestamp(noonindex);
    A = data.HeartRate(noonindex-36:noonindex);
    B = data.HeartRate(noonindex+1:noonindex+37);
    [h, p, ci] = ttest2(A,B);
    pval(n) = p;
end
pval
pval =
1.0e-11 *
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
Rik on 20 Apr 2021
That is a very low value: 0.4e-11. You could conclude that all your analyses have a p-value of effectively 0.
Otherwise you will have to look at the data you're using each iteration. When you do that, you will notice that you aren't actually using HRDat in the rest of your loop. The analysis reads from the variable data, which never changes inside the loop, so each iteration is using the exact same data.
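A possible corrected loop is sketched below. It assumes each CSV has the columns Timestamp, HeartRate and noonTime that the snippet above refers to, and it swaps readmatrix for readtable so those columns can be accessed by name (that swap is my assumption, not something confirmed in the thread). The CSV files are synthetic and written to a temporary folder purely so the example runs on its own; ttest2 needs the Statistics and Machine Learning Toolbox.

```matlab
% Write two synthetic days of data (made-up values) to a temporary folder.
folder = tempname;
mkdir(folder);
for d = 1:2
    Timestamp = (1:80)';
    % The jump after sample 40 differs per day, so the p-values should differ.
    HeartRate = 60 + mod(Timestamp, 5) + 3*d*(Timestamp > 40);
    noonTime = double(Timestamp == 40); % marks the reference sample
    writetable(table(Timestamp, HeartRate, noonTime), ...
        fullfile(folder, sprintf('day%02d.csv', d)));
end

files = dir(fullfile(folder, '*.csv'));
pval = NaN(numel(files), 1);
for n = 1:numel(files)
    % Read this iteration's file into "data", the variable used below.
    data = readtable(fullfile(files(n).folder, files(n).name));
    noonindex = find(data.noonTime == 1, 1);
    A = data.HeartRate(noonindex-36:noonindex);
    B = data.HeartRate(noonindex+1:noonindex+37);
    [~, p] = ttest2(A, B);
    pval(n) = p;
end

rmdir(folder, 's'); % remove the temporary files again
```

The key change is that the variable filled from the file and the variable used in the analysis are now the same one, so each iteration works on its own day.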
Ross Thompson on 20 Apr 2021
Silly me! Thanks


Asked: 20 Apr 2021

Commented: 20 Apr 2021
