
How to extract text from .json files and combine them?

Susan on 28 Mar 2020
Commented: Ameer Hamza on 31 Mar 2020
Hello everyone,
I have a question, and any input would be greatly appreciated. I have a bunch of .json files, say 1000. To read each file I run the following code:
fname = 'C:\Users\...\d90f3c62681e.json';
val = jsondecode(fileread(fname));
The output is shown below. The paper_id, the size of abstract, and the size of body_text differ from file to file. I am interested in the text stored in "abstract" and "body_text". How can I extract the text from abstract and body_text, and combine the results from all these .json files into one file?
val =
    struct with fields:
        paper_id: 'd90f3c62681e'
        metadata: [1×1 struct]
        abstract: [1×1 struct]
        body_text: [4×1 struct]
        bib_entries: [1×1 struct]
        ref_entries: [1×1 struct]
        back_matter: []
val.abstract =
    struct with fields:
        text: '300 words)
        cite_spans: []
        ref_spans: []
        section: 'Abstract'
val.body_text =
    4×1 struct array with fields:
        text
        cite_spans
        ref_spans
        section
4 Comments
Walter Roberson on 28 Mar 2020
Which release are you using? When I try in R2020a, I get
paper_id: '0a43046c154d0e521a6c425df215d90f3c62681e'
>> val.abstract
struct with fields:
text: '300 words) 33 Quantification of aerosolized influenza virus [and a bunch more]
Susan on 29 Mar 2020
Hi Walter,
Thanks for your reply. I am using R2019a and get the same results as you. My main question is this: some of these json files don't have any text for the abstract, i.e., val.abstract = []. Could you please tell me how I can put all the available val.abstract.text and val.body_text.text into one file? Do I need a for loop that goes through every paper_id and extracts the text from each paper? If so, how?
Many thanks in advance!!
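A minimal sketch of the empty-abstract case described here. The two JSON strings are hand-written, hypothetical stand-ins for the real files, and the sketch assumes every entry of body_text has the same fields (otherwise jsondecode returns a cell array rather than a struct array). It shows that an empty JSON array decodes to [], so isempty is enough to tell the two cases apart:

```matlab
% Hypothetical JSON strings standing in for two of the files.
withAbstract = ['{"paper_id":"a1",' ...
                '"abstract":[{"text":"Some abstract."}],' ...
                '"body_text":[{"text":"P1"},{"text":"P2"}]}'];
withoutAbstract = ['{"paper_id":"b2",' ...
                   '"abstract":[],' ...
                   '"body_text":[{"text":"Only body."}]}'];

v1 = jsondecode(withAbstract);
v2 = jsondecode(withoutAbstract);

% An empty JSON array decodes to [], a non-empty one to a struct (array).
hasAbstract1 = ~isempty(v1.abstract);   % abstract is a 1x1 struct
hasAbstract2 = ~isempty(v2.abstract);   % abstract is []

% Collect the text of every body_text entry into a cell array of char vectors.
bodyParagraphs = {v1.body_text.text};
```

The same isempty test can be applied per file inside a loop over all papers.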


Accepted Answer

Ameer Hamza on 31 Mar 2020
Edited: Ameer Hamza on 31 Mar 2020
As I answered in the comment on your other question, the following code builds a single struct by appending the abstract and body_text text from each file, skipping any field that is empty. It then writes that struct to one combined JSON file:
files = dir('JSON files/*.json');   % every .json file in the folder
s = struct('abstract', [], 'body_text', []);
for i = 1:numel(files)
    filename = fullfile(files(i).folder, files(i).name);
    data = jsondecode(fileread(filename));
    % Skip papers without an abstract (jsondecode returns [] for them)
    if ~isempty(data.abstract)
        s.abstract = [s.abstract; cell2struct({data.abstract.text}, 'text', 1)];
    end
    % Append one struct element per body_text paragraph
    if ~isempty(data.body_text)
        s.body_text = [s.body_text; cell2struct({data.body_text.text}, 'text', 1)];
    end
end
% Encode the combined struct and write it to a single JSON file
str = jsonencode(s);
f = fopen('filename.json', 'w');
fprintf(f, '%s', str);
fclose(f);
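If a plain-text file is preferred over combined JSON, the same loop can write one paragraph per line instead. This is only a sketch under stated assumptions: the temporary folder, the two sample files, and the output name combined.txt are all made up for illustration and mimic the layout shown in the question.

```matlab
% Create a throwaway folder with two tiny, hypothetical JSON files.
tmpDir = tempname;
mkdir(tmpDir);
sample1 = ['{"paper_id":"a1","abstract":[{"text":"Abstract A"}],' ...
           '"body_text":[{"text":"Body A1"},{"text":"Body A2"}]}'];
sample2 = '{"paper_id":"b2","abstract":[],"body_text":[{"text":"Body B1"}]}';
samples = {sample1, sample2};
for k = 1:numel(samples)
    fid = fopen(fullfile(tmpDir, sprintf('sample%d.json', k)), 'w');
    fprintf(fid, '%s', samples{k});
    fclose(fid);
end

% Write every available abstract and body_text paragraph to one text file.
files = dir(fullfile(tmpDir, '*.json'));
outFile = fullfile(tmpDir, 'combined.txt');   % hypothetical output name
fid = fopen(outFile, 'w');
assert(fid ~= -1, 'Could not open %s for writing.', outFile);
for i = 1:numel(files)
    data = jsondecode(fileread(fullfile(files(i).folder, files(i).name)));
    paragraphs = {};
    if ~isempty(data.abstract)
        paragraphs = [paragraphs, {data.abstract.text}];
    end
    if ~isempty(data.body_text)
        paragraphs = [paragraphs, {data.body_text.text}];
    end
    for p = 1:numel(paragraphs)
        fprintf(fid, '%s\n', paragraphs{p});   % one paragraph per line
    end
end
fclose(fid);

% Read the result back as a cell array of lines.
lines = strsplit(strtrim(fileread(outFile)), newline);
```

Passing the text through '%s' rather than using it as the format string keeps any percent signs in the papers from being interpreted by fprintf.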
2 Comments
Susan on 31 Mar 2020
Thank you so much! Your answer completely solved my issue. Thanks again!
Ameer Hamza on 31 Mar 2020
Glad to be of help.


More Answers (1)

Mohammad Sami on 30 Mar 2020
You can import your data into cell arrays:
filelist = {};   % placeholder: fill with the full paths of your .json files
vals = cell(length(filelist),1);
haveabstract = false(length(filelist),1);
havebody = false(length(filelist),1);
data = cell(length(filelist),3);
% first col paper_id, second col abstract, third col body
for i = 1:length(filelist)
    vals{i} = jsondecode(fileread(filelist{i}));
    % Record which papers actually contain an abstract / body text
    haveabstract(i) = ~isempty(vals{i}.abstract);
    havebody(i) = ~isempty(vals{i}.body_text);
    data{i,1} = vals{i}.paper_id;
    if haveabstract(i)
        data{i,2} = vals{i}.abstract;
    end
    if havebody(i)
        data{i,3} = vals{i}.body_text;
    end
end
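The data cell array above stores the abstract and body_text structs, not the text itself. A possible follow-up step, sketched here with a single hypothetical paper standing in for the real data, joins the text fields of each struct array into one char vector per paper and leaves missing entries as empty text. The names texts and entry are introduced only for this illustration.

```matlab
% Hypothetical one-row stand-in for the `data` cell array built above.
v = jsondecode(['{"paper_id":"a1","abstract":[],' ...
                '"body_text":[{"text":"First."},{"text":"Second."}]}']);
data = {v.paper_id, v.abstract, v.body_text};

% texts(:,1) holds the abstract text, texts(:,2) the body text.
texts = cell(size(data,1), 2);
for i = 1:size(data,1)
    for c = 2:3
        entry = data{i,c};
        if isempty(entry)
            texts{i,c-1} = '';   % no abstract / body for this paper
        else
            % Join all paragraphs of this field, one per line.
            texts{i,c-1} = strjoin({entry.text}, newline);
        end
    end
end
```

The isempty guard matters because dot indexing into [] (a missing abstract) would raise an error.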
