Trying to arrange big data

Question

harel yadid 2021년 3월 3일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/761371-trying-to-arrange-big-data

댓글: harel yadid 2021년 3월 9일

hi everyone, im having trouble to reach max speed reading huge csv files, and i would like to hear your ideas

i'm using datastore in order to pre - arrange 1910 headers, and than i want to fix my output to hold them in a specific struct, and start reading individually because not all 1910 headers are full of data. my main problem is that every header seperated with '_' and it makes it hard to read.

for example:

FruitDs = datastore(Fruits.csv) %im keeping it short, but the function is going fine
NumOfHeaders = length(FruitDs.Variablenames); %e.g Headers : "Apples_colour_S1_red_dated" "Apples_colour_S1_red_dated" "Apples_colour_S_green_dated"....
for n = 1:NumOfHeaders
    if strfind(Ds.VariableName{n},'Apples')
        tmp = Ds.VariableName{n};
        A =strfind(tmp,'_');
        tmp(A[1,3])='.';
        Ds.SelectedVariableName = Ds.VariableName(n);
        ApplesData = readall(Ds);
        eval([tmp '= ApplesData']);  % the struct i need is that out.Apples.colour_XXXXX will contain all data of the specific apple
        Fruits.Apples = tmp;
    end
end

this function works fine, so my questions as follows:

is there any faster way to do it?
do you have a smart and fast logical way to avoid reading empty headers (because 1910x390000 can be too much and not all of them are full (i filled them with NA in the datastore..))
i have some cases which the headers are different only by number, and i do want to seperate them. let's say "Apples_colour_S1_....", "Apples_colour_S2_...". is there a way to avoid second loop (loop that runs over all the SX)?

thanks in advance

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Walter Roberson 2021년 3월 3일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/761371-trying-to-arrange-big-data#answer_637966

don't use eval there. Use setfield. Split the name at _ into a cell array and do cell expansion to insert the field names https://www.mathworks.com/help/matlab/ref/setfield.html

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

Walter Roberson 2021년 3월 3일

MATLAB Online에서 열기

out = struct;
S = "Apples_colour_S_green_dated";
parts = split(S, '_');
out = setfield(out, parts{:}, 12345);
out.Apples.colour.S.green
ans = struct with fields:
    dated: 12345

harel yadid 2021년 3월 9일

thank you very much!

댓글을 달려면 로그인하십시오.

Trying to arrange big data

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

Trying to arrange big data

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기