Large datasets: Any way to perform regression analyses on select variables within a large table based on row name?
I have a large dataset of soil profiles, and I am trying to calculate regressions of organic carbon against profile depth. The dataset is a CSV file with columns 'profile_name', 'top_depth', 'bottom_depth', and 'organic_carbon'. There are other columns of spatial data that are not relevant here.
The data are organized with multiple rows per profile: the 'profile_name' value is the same for anywhere from 2 to 10 rows, while 'top_depth' and 'bottom_depth' change to reflect the sample interval within the soil profile, and 'organic_carbon' gives how much carbon is in the soil.
What I want to do is write a script that runs linear and/or logarithmic regressions of 'organic_carbon' against 'bottom_depth' within each distinct 'profile_name'. I may want to go further with some calculations, but I think that would be the best start. The hurdle for me is binning the profile data by 'profile_name'. Any clues would be greatly appreciated!
Answers (2)
Tom Lane
January 5, 2013
If you have the Statistics Toolbox, you might find it handy to use "dataset" to read in the csv file and create a dataset array from it. Then I recommend that you convert the profile_name variable to a nominal variable. The following illustrates how you can operate on different subsets based on values of a nominal variable:
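% Origin, Displacement and Weight are sample variables (for example, from "load carsmall");
% you would read your own data from a file instead
d = dataset(Origin,Displacement,Weight);
d.Origin = nominal(d.Origin);   % convert text to nominal
org = unique(d.Origin);         % distinct group labels
for j = 1:length(org)
    t = d.Origin == org(j);     % logical index of the rows in group j
    p = polyfit(d.Weight(t),d.Displacement(t),1);   % [slope intercept] for this group
    fprintf('%s: %s\n',char(org(j)),num2str(p))
end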
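Adapted to the column names in the question, the same grouping pattern might look like the sketch below. It is only an illustration under several assumptions: the data values and units are made up, the three variables are typed in rather than read from the CSV file, and "logarithmic regression" is taken to mean fitting organic carbon against log(bottom_depth). It uses only base functions (unique, polyfit), so it does not need the Statistics Toolbox.

```matlab
% Hypothetical sample data standing in for the CSV columns in the question.
profile_name   = {'P1';'P1';'P1';'P2';'P2';'P2';'P2'};
bottom_depth   = [10; 30; 60; 5; 20; 50; 100];          % assumed units: cm
organic_carbon = [3.2; 1.9; 0.8; 4.1; 2.7; 1.5; 0.6];   % assumed units: percent

% names holds the distinct profiles; idx(i) is the group number of row i.
[names, ~, idx] = unique(profile_name);
nProfiles = numel(names);

linCoef = zeros(nProfiles, 2);   % [slope intercept] of OC vs depth
logCoef = zeros(nProfiles, 2);   % [slope intercept] of OC vs log(depth)

for j = 1:nProfiles
    rows = (idx == j);                       % logical mask for profile j
    z  = bottom_depth(rows);
    oc = organic_carbon(rows);
    linCoef(j,:) = polyfit(z, oc, 1);        % OC = a*z + b
    logCoef(j,:) = polyfit(log(z), oc, 1);   % OC = a*log(z) + b, requires z > 0
    fprintf('%s: linear [%s], log [%s]\n', names{j}, ...
        num2str(linCoef(j,:)), num2str(logCoef(j,:)));
end
```

Keeping the coefficients in one row per profile makes it easy to carry on with further calculations afterwards. Note that a profile with only 2 rows gives an exact straight-line fit, so its coefficients say nothing about how well the model describes the data.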