How to convert a cell array of structs into a table?

Jakub Sekula on 17 Apr 2019
Commented: Peter Perkins on 3 May 2019
I am trying to write a script to analyse data downloaded from Facebook Messenger, such as the message count, most common times, a word cloud, etc. However, I keep running into an issue with accessing the data. The file I've downloaded is a JSON file that, when imported and decoded using jsondecode(), becomes a 1x1 struct with a few fields. I am only interested in the 'messages' field within it, so I extracted it to a separate variable called messages.
data=fileread('message_test.json');
data=jsondecode(data);
messages=data.messages;
Now messages is a Nx1 cell array of structs, which in turn have 4 fields each: sender_name, content, timestamp_ms and type - 3 of these are chars and one is a number. What I would like to do is convert this Nx1 cell array into a Nx4 table with headers corresponding to the struct fields. I have tried using cell2table and struct2table on it, but unsuccessfully.
A = cell2table(messages)
B = struct2table(messages{1,1})
C = struct2table(messages{2,1})
As expected, A is a 21x1 table of structs, which is not what I'm after. I think B and C get me somewhat closer to a solution, because they are 1x4 tables, which would give me what I need if only I could join them. However, I cannot concatenate them using either vertcat() or the regular [B;C]:
D = [B; C]
D = vertcat(B,C)
Both give the following error:
Could not concatenate the table variable 'sender_name' using VERTCAT.
Caused by:
Error using vertcat
Dimensions of arrays being concatenated are not consistent.
I don't quite understand that error, because both B and C are 1x4 tables, so the dimensions seem compatible?
I have also considered using a loop to extract each row of the cell array as a 1x4 table and then merging them but that does not work, because apparently the chars have different lengths and cannot be joined. Could somebody please help me out? I've attached the sample file below.
Thank you in advance.
  Comments: 5
Jakub Sekula on 19 Apr 2019
Here's a link to the file on Google Drive - for some reason it doesn't let me attach a JSON here as it's not a supported format.
Peter Perkins on 3 May 2019
Jakub, I'm not sure if you got your question answered yet or not. I played around with your data a bit. It's not too hard to turn that JSON file into a table, albeit one with a lot of "holes" -- 4 of the 21 records have an audio or video uri.
I'm trying to understand what you are hoping to create out of these data. You've got two levels of hierarchy, but again, only for 4/21 of the records. So a table containing other tables is one choice, or a table containing two struct arrays, or maybe even three tables.


Answers (1)

per isakson on 18 Apr 2019
Edited: per isakson on 19 Apr 2019
So far so good. What to do with the field, "photos"?
>> S=data.messages;
>> sas = [S{[1:10]}];
>> struct2table(sas)
ans =
10×4 table
sender_name            timestamp_ms    content                                                              type
___________________    ____________    _________________________________________________________________    _________
'Abdul-Haq Hussain'    1.5541e+12      'Me neither I can't really be bothered 😂'                           'Generic'
'Jakub Sekuła'         1.5541e+12      'but I haven't done it yet'                                          'Generic'
'Jakub Sekuła'         1.5541e+12      'i think the turbines'                                               'Generic'
'Abdul-Haq Hussain'    1.5541e+12      'Which is easier ?'                                                  'Generic'
'Abdul-Haq Hussain'    1.5541e+12      'And I'm not sure yet'                                               'Generic'
'Abdul-Haq Hussain'    1.5541e+12      'No problem'                                                         'Generic'
'Jakub Sekuła'         1.5541e+12      'are you writing your report on this one or on the condensation?'    'Generic'
'Jakub Sekuła'         1.5541e+12      'thank you'                                                          'Generic'
'Jakub Sekuła'         1.5541e+12      'sick'                                                               'Generic'
'Abdul-Haq Hussain'    1.5541e+12      'Those are the results'                                              'Generic'
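As a side note on why this works while [B; C] in the question fails: struct2table on a single struct stores each text field as a char row vector, and char rows of different lengths cannot be stacked. Building a struct array first lets struct2table put the text into cells instead. The sketch below uses two made-up messages (the names and values are hypothetical, not taken from the real file) and assumes both structs have identical fields.

```matlab
% Hypothetical stand-ins for two decoded messages (not the real data).
m1 = struct('sender_name', 'Ann',    'timestamp_ms', 1, 'content', 'hi',          'type', 'Generic');
m2 = struct('sender_name', 'Robert', 'timestamp_ms', 2, 'content', 'hello there', 'type', 'Generic');
messages = {m1; m2};

% Row by row: each text variable is a char row vector, so [B; C] would
% have to stack a 1x3 char on top of a 1x6 char, which vertcat rejects.
B = struct2table(messages{1});
C = struct2table(messages{2});
class(B.sender_name)        % char

% Struct array first: text values of different lengths are stored in cells.
sas = [messages{:}];        % 1x2 struct array; requires identical fields
T = struct2table(sas);
class(T.sender_name)        % cell
```

The struct array route only works as long as every element has the same set of fields.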
One problem is that a message may contain text or photos, possibly both:
>> S{13:14}
ans =
struct with fields:
sender_name: 'Abdul-Haq Hussain'
timestamp_ms: 1.5541e+12
photos: [1×1 struct]
type: 'Generic'
ans =
struct with fields:
sender_name: 'Abdul-Haq Hussain'
timestamp_ms: 1.5541e+12
content: 'I have just found the lab sheet'
type: 'Generic'
>> S{13}.photos
ans =
struct with fields:
uri: 'messages/inbox/AbdulHaqHussain_GLUSzzYHEQ/photos/55849416_307437946611321_6026839667574308864_n_307437926611323.jpg'
creation_timestamp: 1.5541e+09
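Because the structs do not all share the same fields, [S{:}] over the whole cell array would error. A quick way to see which messages are affected is to test each struct for the field, roughly as below. The two-element cell array is a made-up stand-in for S, so the values are assumptions for illustration only.

```matlab
% Hypothetical stand-in for S: one text message and one photo-only message.
S = { struct('sender_name', 'Ann', 'content', 'hi'); ...
      struct('sender_name', 'Ann', 'photos', struct('uri', 'x.jpg')) };

% Logical vector: true where the message has a 'content' field.
hasContent = cellfun(@(s) isfield(s, 'content'), S);

% Indices of the messages that would need a placeholder value.
missingIdx = find(~hasContent)
```

With the real data, the same call on data.messages should list the records that have photos (or other media) instead of text.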
Answer to comment:
The function cssm() reads all messages and transfers the four required fields to a table. All other fields are ignored, and the value of any missing field is replaced by an empty character array. A problem with the character encoding remains (on my system).
>> ffs = 'message_test.json';
>> out = cssm( ffs )
out =
21×4 table
sender_name timestamp_ms content type
___________________ ____________ ____________________________________________________________________________________________________________________________________________ _________
'Abdul-Haq Hussain' 1.5541e+12 'Me neither I can't really be bothered 😂' 'Generic'
'Jakub Sekuła' 1.5541e+12 'but I haven't done it yet' 'Generic'
...
'Abdul-Haq Hussain' 1.5541e+12 'Those are the results' 'Generic'
'Abdul-Haq Hussain' 1.5541e+12 '' 'Generic'
'Abdul-Haq Hussain' 1.5541e+12 '' 'Generic'
'Abdul-Haq Hussain' 1.5541e+12 '' 'Generic'
'Abdul-Haq Hussain' 1.5541e+12 'I have just found the lab sheet' 'Generic'
where
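function out = cssm( ffs )
    % Read the JSON file and decode it into a struct.
    data = fileread( ffs );
    data = jsondecode( data );
    % Cell array of message structs; their fields may differ.
    cac = data.messages;
    len = length( cac );
    % Preallocate a struct array with every required field set to ''.
    required_fields = {'sender_name','timestamp_ms','content','type'};
    sas = cell2struct( repmat({''}, len, 4 ), required_fields, 2 );
    % Copy each required field that the message actually has.
    for jj = 1 : len
        for name = required_fields
            if isfield( cac{jj}, name{:} )
                sas(jj).(name{:}) = cac{jj}.(name{:});
            end
        end
    end
    % Convert the struct array to a table with one row per message.
    out = struct2table( sas );
end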
Run profile() before you even think about replacing the for-loops.
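A minimal sketch of such a profiling run is shown below. The work handle is a hypothetical placeholder workload so that the snippet runs on its own; in practice you would call cssm( ffs ) between profile on and profile off.

```matlab
% Hypothetical workload standing in for cssm(); swap in the real call.
work = @() arrayfun(@(k) sprintf('%d', k), 1:1000, 'UniformOutput', false);

profile on
result = work();
profile off

% Profiling data as a struct; FunctionTable has one entry per function.
info  = profile('info');
names = {info.FunctionTable.FunctionName};

% profile viewer   % uncomment to open the interactive report
```

If the loops account for only a small share of the total time, vectorising them is unlikely to be worth the loss in readability.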
  Comments: 5
per isakson on 19 Apr 2019
Edited: per isakson on 19 Apr 2019
cssm() is defined in my answer. Read
  1. function Declare function name, inputs, and outputs
  2. What Is the MATLAB Search Path?
and do (not browse) a couple of the examples that are included in the description of functions.
While you are waiting for our answers, I recommend that you try MATLAB's Self-Paced Courses, see especially:
  1. MATLAB Onramp, Get started quickly with the basics of MATLAB.
  2. MATLAB Fundamentals, Learn core MATLAB functionality for data analysis, modeling, and programming. (15.3 Function Files: (1/7) Introduction)
Jakub Sekula on 19 Apr 2019
Thanks, I'm actually going through the MATLAB Fundamentals course right now but I have not reached the functions chapter yet :))

Release

R2019a
