Run statistical tests on multiple csvs
I have many days' worth of heart rate data stored in separate CSV files, and I'm currently running a t-test on the data from the 3 hours before and the 3 hours after 12 am. I was going to run this manually for all 30 different days, but I was wondering whether there is a way to loop through all the days and return all the p-values at once.
Comments: 1
Rik
20 Apr 2021
That seems very likely. Once you create a file list (e.g. with dir), you should be able to do the processing in a loop.
Do you have a specific question about implementing this?
Answers (1)
Manash Sahoo
20 Apr 2021
Edited: Manash Sahoo
20 Apr 2021
Store your data in a folder, use the "dir" command to return the file names, and loop through them.
For example:
% Load your heart rate data. You can get the names of files and folders
% using the "dir" command.
files = dir(strcat(filepath,"\*.csv")) % Filepath would be the path where your csvs are located.
pval = {};
for i = 1:length(files)
HRDat = readmatrix(files.name); % You may need to edit this per your filepath.
% Do your analysis here, and return your pvalue to pval{i}.
end
Your pvalues in the cell array "pval" will thus correspond to the files in the struct array "files.name". This is usually the way I do things with heart rate data. Let me know if you have any further questions!
EDIT: Fixed the code.
MS
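The correspondence between p-values and file names described in this answer can be made explicit by collecting both in a table. The following is a minimal sketch, not part of the original answer: the file names and p-values are made-up placeholders standing in for the outputs of the loop, and the 0.05 threshold is only an illustrative choice. If pval is a cell array as in the answer, it would first need converting to a numeric column (for example with cell2mat).

```matlab
% Hypothetical stand-ins for what the loop would produce.
% In real use, "files" comes from dir(...) and "pval" from your analysis.
files = struct('name', {'day01.csv'; 'day02.csv'; 'day03.csv'});
pval  = [0.03; 0.41; 0.002];   % made-up numbers, for illustration only

% One row per file: the name in one column, its p-value in the other.
results = table(string({files.name})', pval, ...
    'VariableNames', {'File', 'PValue'});

% Keep only the days below an (assumed) significance threshold of 0.05.
significant = results(results.PValue < 0.05, :);
disp(results)
```

Keeping the names and values in one table avoids having to remember that element n of pval belongs to files(n).name.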
Comments: 7
Rik
20 Apr 2021
- Pre-allocation tends to be faster, and since a p-value is a numeric value, using a double array is probably fine as well.
- You're using the length function. Consider using numel or size instead.
- I personally prefer avoiding i as a variable name, so I changed that to n.
- Try to avoid using strcat to create a path. Using fullfile gives you the same flexibility, without having to worry about the correct filesep.
- As a last point: you forgot to index the struct inside the loop.
% Load your heart rate data. You can get the names of files and folders
% using the "dir" command.
files = dir(fullfile(filepath,'*.csv')); % filepath is the folder where your CSVs are located.
pval = NaN(numel(files),1);
for n = 1:numel(files)
    % Build the full path so this also works when filepath is not the current folder.
    HRDat = readmatrix(fullfile(files(n).folder,files(n).name));
    % Do your analysis here, and return your p-value to pval(n).
end
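Two of the points above (numel or size instead of length, and fullfile instead of strcat) can be seen in a small sketch. The matrix and the folder name 'data' are hypothetical and only serve as an illustration.

```matlab
% length returns the size of the longest dimension, which is rarely
% what you mean for anything other than a vector.
A = zeros(3, 5);
nLongest = length(A);    % 5: size of the longest dimension
nTotal   = numel(A);     % 15: total number of elements
nRows    = size(A, 1);   % 3: size along one explicitly chosen dimension

% fullfile inserts the separator that is correct for the current OS,
% so the same code runs on Windows, macOS and Linux.
folderName = 'data';                       % hypothetical folder name
pattern = fullfile(folderName, '*.csv');
sameAsManual = strcmp(pattern, [folderName filesep '*.csv']);
```

For the struct array returned by dir, length and numel happen to agree, so the suggestion is about stating the intent clearly rather than fixing a wrong result.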
Manash Sahoo
20 Apr 2021
Ah. Thanks for the pointers! This is indeed a much better solution.
Ross Thompson
20 Apr 2021
Ross Thompson
20 Apr 2021
Edited: Ross Thompson
20 Apr 2021
Ross Thompson
20 Apr 2021
Rik
20 Apr 2021
That is a very low value: 0.4e-11. You could conclude that all your analyses have a p value of 0.
Otherwise you will have to look at the data you're using each iteration. When you do that, you will notice that you aren't actually using HRDat in the rest of your loop, so each iteration is using the exact same data.
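A minimal sketch of a loop in which every iteration really does use the data it just loaded might look like the following. Several things here are assumptions rather than the setup from this thread: the CSV layout (column 1 = hours relative to midnight, column 2 = heart rate) is hypothetical, the files are synthetic ones written to a temporary folder so the sketch runs on its own, and ttest2 (two independent samples, from the Statistics and Machine Learning Toolbox) stands in for whichever test fits your data. If your before and after samples are paired, ttest(before, after) would be the analogous call.

```matlab
% Write three synthetic "days" to a temporary folder (illustration only).
folder = tempname;
mkdir(folder);
rng(0);                                     % reproducible random numbers
for d = 1:3
    t  = linspace(-3, 3, 120)';             % 3 h before to 3 h after midnight
    hr = 70 - 5*(t >= 0) + randn(size(t));  % made-up heart rate values
    writematrix([t hr], fullfile(folder, sprintf('day%02d.csv', d)));
end

files = dir(fullfile(folder, '*.csv'));
pval  = NaN(numel(files), 1);
for n = 1:numel(files)
    % Load this iteration's file ...
    HRDat = readmatrix(fullfile(files(n).folder, files(n).name));
    % ... and derive both samples from HRDat, not from a variable
    % that was created once outside the loop.
    before = HRDat(HRDat(:,1) <  0, 2);
    after  = HRDat(HRDat(:,1) >= 0, 2);
    [~, pval(n)] = ttest2(before, after);
end
disp(pval)
```

If the p-values still come out identical for every file, checking size(HRDat) and a few rows of each file inside the loop is a quick way to confirm that different data is being read on each iteration.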
Ross Thompson
20 Apr 2021