How to deal with a single TSV file that is too large to fit in memory?

ziyou teng on 17 Oct 2020
Commented: ziyou teng on 10 Nov 2020
Hi, guys.
I have a TSV file that is 20 GB, while my PC has only 16 GB of memory.
Whenever I try to read the file, I get errors.
I tried a tall array as follows, but it fails because the TSV file is not recognized.
So I am looking for advice.
ds = tabularTextDatastore('D:\database1\scipatlinkage\paperauthoridaffiliationname.tsv');
ds.TreatAsMissing = 'NA';
ds.SelectedVariableNames = {'paperid','authorid','affiliationame'};
ds.SelectedFormats(2:3) = {'%s','%s'};
pre = preview(ds)
My MATLAB version is R2020a.
  1 Comment
Athul Prakash on 24 Oct 2020
What is the exact error you're getting? If it's a File Not Found issue, you can use the 'isfile' function to confirm that the file exists before running the script. See this doc for 'isfile': https://in.mathworks.com/help/matlab/ref/isfile.html
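
A minimal sketch of the 'isfile' check suggested above. The temporary file is a hypothetical stand-in so the snippet runs anywhere; in practice you would pass your own path instead.

```matlab
% Hypothetical stand-in path; replace with the real TSV path in practice.
tmpFile = [tempname '.tsv'];

% The file has not been created yet, so isfile returns false.
existsBefore = isfile(tmpFile);

% Create a tiny tab-separated file at that path.
fid = fopen(tmpFile, 'w');
fprintf(fid, 'a\tb\n1\t2\n');
fclose(fid);

% Now the same check returns true.
existsAfter = isfile(tmpFile);

% Guard pattern: stop early with a clear message if the file is missing.
if ~isfile(tmpFile)
    error('File not found: %s', tmpFile);
end

delete(tmpFile);   % clean up the stand-in file
```

'isfile' is available from R2017b onward, so it should work in R2020a.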


Answers (1)

Athul Prakash on 24 Oct 2020
I noticed that the attached code never creates the tall array. Perhaps this line is missing:
t = tall(ds);
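
As a rough illustration of the tall-array route, here is a self-contained sketch. The small temporary file and its column names are hypothetical stand-ins for the real data. Passing 'FileExtensions' and 'Delimiter' explicitly is an assumption about what may help when a '.tsv' file is not recognized; it is not a confirmed fix for the error in the question.

```matlab
% Build a tiny stand-in TSV file (hypothetical data and column names).
tmpFile = [tempname '.tsv'];
fid = fopen(tmpFile, 'w');
fprintf(fid, 'paperid\tauthorid\taffiliation\n');
for k = 1:10
    fprintf(fid, '%d\tA%d\tUniv%d\n', k, mod(k, 3), mod(k, 2));
end
fclose(fid);

% Assumption: declare the extension and the tab delimiter explicitly.
ds = tabularTextDatastore(tmpFile, ...
    'FileExtensions', '.tsv', ...
    'Delimiter', '\t');

% Run tall computations in the local MATLAB session (no parallel pool).
mapreducer(0);

% Create the tall table; nothing is read into memory yet.
t = tall(ds);

% Operations on tall arrays are deferred until gather is called.
avgId = gather(mean(t.paperid));

delete(tmpFile);   % clean up the stand-in file
```

Because evaluation is deferred until 'gather', the whole file never has to fit in memory at once; only the final (small) result does.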
As an alternative to tall arrays, you may try using this datastore with 'mapreduce'. See the documentation for 'mapreduce'.
Alternatively, you may write your own code to 'read' from the datastore iteratively and process the data in chunks that fit in your RAM.
Hope it helps!
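
The chunked approach could look roughly like the sketch below. The stand-in file, its column names, and the chunk size of 4 rows are all hypothetical choices for illustration; for a 20 GB file you would pick a much larger 'ReadSize' and replace the row count with your own per-chunk processing.

```matlab
% Build a tiny stand-in TSV file (hypothetical data and column names).
tmpFile = [tempname '.tsv'];
fid = fopen(tmpFile, 'w');
fprintf(fid, 'paperid\tauthorid\taffiliation\n');
for k = 1:10
    fprintf(fid, '%d\tA%d\tUniv%d\n', k, mod(k, 3), mod(k, 2));
end
fclose(fid);

% Assumption: declare the extension and the tab delimiter explicitly.
ds = tabularTextDatastore(tmpFile, ...
    'FileExtensions', '.tsv', ...
    'Delimiter', '\t');

% Upper bound on the number of rows returned by each call to read.
ds.ReadSize = 4;

totalRows = 0;
reset(ds);                      % start from the beginning of the data
while hasdata(ds)
    chunk = read(ds);           % table holding at most ReadSize rows
    % Process the chunk here; this sketch only counts its rows.
    totalRows = totalRows + height(chunk);
end

delete(tmpFile);   % clean up the stand-in file
```

Only one chunk is held in memory at a time, so the peak memory use is governed by 'ReadSize' rather than by the size of the file.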
  1 Comment
ziyou teng on 10 Nov 2020
Thank you! It's really helpful. My problem is that the file is too large to fit in my PC's memory. Your answer has given me useful hints on that. Thanks again!
