Using unbalanced data with fitlme

Question

Tobias Averbeck on 2 Jan 2023


https://kr.mathworks.com/matlabcentral/answers/1887477-using-unbalanced-data-with-fitlme

Answered: the cyclist on 4 Jan 2023

Hi,

I am trying to make fitlme work with unbalanced data, but I always get the error "Fixed Effects design matrix X must be of full column rank." I looked into the code to see what the problem is: fitlme truncates my data but retains the category names from the input table, which leads to a rank-deficient design matrix.

My data has complete rows but also rows with missing values, for example [2012, 'String1', 1, NaN, 44.91, 62.9]. The last column is the response; the rest are predictors. When I step into the fitlme function, it truncates my 12042-row input table to 628 rows, so apparently every row that contains at least one NaN is deleted.

Shashank Prasanna talks about unbalanced data in this video, but how exactly does that work? I have tried everything I could think of and don't know how to proceed.
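The row loss and the leftover category names described above can be checked before calling fitlme. The following is a minimal sketch on a hypothetical toy table (the values and the reduced set of columns are assumptions, not the real data): it counts the complete rows and lists the categories that are still declared but have no complete rows left.

```matlab
% Hypothetical toy table standing in for the real data.
Predictor1 = [2012; 2012; 2013; 2013; 2014; 2014];
Predictor2 = categorical({'String1'; 'String2'; 'String1'; 'String3'; 'String2'; 'String3'});
Predictor4 = [NaN; 3.1; 2.7; NaN; 1.9; NaN];
Response   = [62.9; 58.3; 60.1; 55.0; 57.4; 61.2];
T = table(Predictor1, Predictor2, Predictor4, Response);

% Flag rows with at least one missing value in any variable.
incomplete = any(ismissing(T), 2);
fprintf('%d of %d rows are complete\n', nnz(~incomplete), height(T));

% Subsetting a categorical keeps all declared categories,
% so some of them can end up with zero observations.
Tc = T(~incomplete, :);
declared  = categories(Tc.Predictor2);
counts    = countcats(Tc.Predictor2);
emptyCats = declared(counts == 0);
disp(emptyCats)
```

If emptyCats is not empty, a dummy variable built for one of those categories would be an all-zero column, which is one way the design matrix can lose rank.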

Comments: 6 (4 older comments not shown)

the cyclist on 2 Jan 2023

If you are OK with using only the data where you have complete rows, then you can just remove the incomplete rows yourself, before calling fitlme and before the dummy variables are created.

If you are not OK with that, then I would again say that you need to solve your missing data problem, not your rank deficiency issue.
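A minimal sketch of the first option, assuming a synthetic table whose variable names only mimic the ones in this thread: rows with missing model variables are dropped with rmmissing, unused categories are dropped with removecats, and only then is fitlme called (with its default dummy variable coding).

```matlab
rng(0);                                   % reproducible toy data
n = 60;
Predictor1 = repmat((2010:2014)', n/5, 1);                    % grouping variable
Predictor2 = categorical(randi(3, n, 1), 1:3, {'A','B','C'});
Predictor3 = randn(n, 1);
Response   = 2*Predictor3 + 0.5*double(Predictor2) + randn(n, 1);
Predictor3(1:7:n) = NaN;                  % inject 9 missing values
T = table(Predictor1, Predictor2, Predictor3, Response);

% Drop rows that have a missing value in any variable used by the model.
modelVars = {'Predictor1','Predictor2','Predictor3','Response'};
Tc = rmmissing(T, 'DataVariables', modelVars);

% Drop categories that no longer occur after the rows were removed.
Tc.Predictor2 = removecats(Tc.Predictor2);

% Fit on the cleaned table (requires Statistics and Machine Learning Toolbox).
lme = fitlme(Tc, 'Response ~ 1 + Predictor2 + Predictor3 + (1|Predictor1)');
disp(lme.Coefficients)
```

This only helps when enough complete rows remain; it does nothing for a dataset in which no row is complete.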

Tobias Averbeck on 2 Jan 2023

DataSampleTable.mat

This is sample data only. If I were to remove all rows that contain NaN values from the entire dataset, there would be none left, so I need a way to fit a model with missing data points. The database is also as complete as it will get; there is no way to obtain more data points, so I can't solve this any other way.

The following code with the attached table reproduces the error:

Formula = 'Response ~ 1 + Predictor2 + Predictor3 + Predictor4 + Predictor5 + (1|Predictor1)';
lme = fitlme(DataSampleTable,Formula,'DummyVarCoding','full','FitMethod','REML');
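Separately from the missing values, it may be worth checking whether the 'full' dummy coding contributes to the error. If any fixed-effect predictor is categorical (an assumption here, suggested by the 'String1' value in the example row), an intercept together with one indicator column per category is linearly dependent, because the indicators sum to the intercept column. The sketch below builds both codings by hand on a hypothetical three-level factor and compares the ranks; it does not call fitlme.

```matlab
g = categorical({'a'; 'b'; 'c'; 'a'; 'b'; 'c'});   % hypothetical 3-level factor
k = numel(categories(g));

% One indicator column per category ("full" coding), built in base MATLAB.
D = double(double(g) == 1:k);

Xfull = [ones(numel(g), 1), D];            % intercept + all indicators
Xref  = [ones(numel(g), 1), D(:, 2:end)];  % intercept + indicators, first category as reference

fprintf('full coding:      rank %d, %d columns\n', rank(Xfull), size(Xfull, 2));
fprintf('reference coding: rank %d, %d columns\n', rank(Xref),  size(Xref, 2));
```

If this applies to the real table, leaving 'DummyVarCoding' at its default or removing the intercept from the formula would be the things to try first.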


Answer 1

Sulaymon Eshkabilov on 2 Jan 2023


https://kr.mathworks.com/matlabcentral/answers/1887477-using-unbalanced-data-with-fitlme#answer_1139937

Suggestion: if you are not using all columns of your data, it is reasonable to clean up the data first (only the columns that are being used) by removing the rows where data is missing (NaN). You can use isnan() or ismissing() to clean up the data before processing it with fitlm() or fitlme(). Note that the example data used in the demo video has excessive data points.
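A minimal sketch of this suggestion, with hypothetical column names: the missing-value test is restricted to the columns that actually enter the model, so a sparse column that is not used does not cost any rows.

```matlab
Used1  = [1; 2; NaN; 4];
Used2  = [10; 20; 30; 40];
Unused = [NaN; NaN; NaN; NaN];     % column that is not part of the model
T = table(Used1, Used2, Unused);

% Test for missing values only in the columns the model will use.
usedVars = {'Used1', 'Used2'};
keep = ~any(ismissing(T(:, usedVars)), 2);
Tclean = T(keep, :);

fprintf('kept %d of %d rows\n', height(Tclean), height(T));
```

Testing all columns here would discard every row, because Unused is entirely NaN.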

Comments: 1

Tobias Averbeck on 2 Jan 2023

With the data sample this is possible, but not with the whole dataset. The full dataset has many more columns, and there is no single row with a value in every column. So how do you solve the problem when the data is this patchy but you want to know how much influence each predictor has on the response?


Answer 2

the cyclist on 4 Jan 2023


https://kr.mathworks.com/matlabcentral/answers/1887477-using-unbalanced-data-with-fitlme#answer_1141532

You need to learn about data imputation methods. This is not, at its core, a MATLAB problem.
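As a starting point only, here is a minimal sketch of the simplest kind of imputation, on hypothetical values: each numeric predictor has its missing entries replaced by the column median with fillmissing, and an indicator column records which entries were imputed. This is single imputation chosen for illustration, not a method recommended in this thread; it tends to understate uncertainty, and whether it is defensible depends on why the values are missing.

```matlab
Predictor4 = [NaN; 3.1; 2.7; NaN; 1.9; 4.0];
Predictor5 = [44.91; NaN; 40.2; 38.5; NaN; 42.0];
T = table(Predictor4, Predictor5);

vars = {'Predictor4', 'Predictor5'};
for k = 1:numel(vars)
    x = T.(vars{k});
    % Remember which entries were originally missing.
    T.([vars{k} '_wasMissing']) = isnan(x);
    % Replace missing entries with the median of the observed values.
    T.(vars{k}) = fillmissing(x, 'constant', median(x, 'omitnan'));
end

disp(T)
```

The imputed table has no NaN values left in these columns, so fitlme would no longer drop those rows; the _wasMissing columns (a hypothetical naming choice) make it possible to check how sensitive the fit is to the imputed entries.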
