Stacking more than 200K files (same #columns)
Hello, I have more than 200,000 CSV files that I would like to stack (append/concatenate) on top of each other.
All CSV files are in the same folder.
All CSV files have the same number of columns (80).
But they may have different numbers of rows (from 5 to 5000).
I am currently using a loop and readtable for each file:
out1 = readtable(csvfilename{ii}, opts);
out2 = [out2; out1];
But it has taken forever, and out2 will likely be too big. Is this the correct way to deal with 200k files? Should I use tall tables?
Of the 80 columns in each CSV, I only need the same 30. I mention this in case I can read just those 30 directly, to avoid making the final result too big.
Answers (1)
Cris LaPierre
13 Oct 2021
Edited: Cris LaPierre
13 Oct 2021
Consider using a datastore. You can see an example in our Importing Multiple Data Files video from our Practical Data Science with MATLAB specialization on Coursera.
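As a rough illustration of the datastore idea, here is a minimal sketch. It is not taken from the video; it assumes tabularTextDatastore is the appropriate datastore type for these CSV files. The temporary folder, the three small files, and the column names A to D are hypothetical stand-ins for the real 200k files and their 80 columns.

```matlab
% Build a tiny stand-in for the real folder (hypothetical data: 3 files, 4 columns).
folder = tempname;
mkdir(folder);
for k = 1:3
    nRows = 2 + k;   % files differ in row count, as in the question
    T = array2table(rand(nRows, 4), 'VariableNames', {'A','B','C','D'});
    writetable(T, fullfile(folder, sprintf('file%d.csv', k)));
end

% One datastore covering every CSV file in the folder.
ds = tabularTextDatastore(folder, 'FileExtensions', '.csv');

% Import only the columns that are needed (placeholder names here).
ds.SelectedVariableNames = {'A','C'};

% Read all files and stack them into a single table.
out = readall(ds);
disp(size(out))
```

With the real data, SelectedVariableNames would list the 30 columns of interest, so the other 50 are never imported. readall still has to fit the whole result in memory.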
I'm not sure what 'taking forever' means, but it is going to take a while to load 200k files. Because the array size hasn't been preallocated, you are going to encounter memory issues as the array grows, as MATLAB has to keep moving it to larger and larger blocks of contiguous memory (see here). If the size gets too big, you may need to look into using a tall array in order to work with the final result. In that case, you may want to look into a TallDatastore.
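To make the two points above concrete, here is a hedged sketch of one possible way to do it. The first half avoids growing out2 inside the loop by storing each file's table in a preallocated cell array and concatenating once at the end. The second half wraps a datastore in a tall table for the case where the result does not fit in memory. The temporary folder, file contents, and column names A to D are hypothetical.

```matlab
% Hypothetical stand-in data: 3 small CSV files with 4 columns each.
folder = tempname;
mkdir(folder);
for k = 1:3
    T = array2table(rand(2 + k, 4), 'VariableNames', {'A','B','C','D'});
    writetable(T, fullfile(folder, sprintf('file%d.csv', k)));
end

% --- Option 1: preallocate a cell array, concatenate once ---
files = dir(fullfile(folder, '*.csv'));
opts = detectImportOptions(fullfile(files(1).folder, files(1).name));
opts.SelectedVariableNames = {'A','C'};   % read only the needed columns

parts = cell(numel(files), 1);            % one slot per file
for ii = 1:numel(files)
    parts{ii} = readtable(fullfile(files(ii).folder, files(ii).name), opts);
end
out2 = vertcat(parts{:});                 % single concatenation at the end

% --- Option 2: tall table backed by a datastore ---
ds = tabularTextDatastore(folder, 'FileExtensions', '.csv');
ds.SelectedVariableNames = {'A','C'};
tt = tall(ds);                            % evaluation is deferred

% Operations on tt are only carried out when gather is called.
meanA = gather(mean(tt.A));
```

Option 1 still needs the full result to fit in memory. Option 2 processes the files in chunks, so only the gathered result (here a single number) has to fit.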